Files
everyone-can-use-english/1000-hours/sounds-of-american-english/8.2-cepd-phonetics-and-sound.md
2024-08-25 12:35:12 +08:00

105 lines
3.5 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# 8.2. 获取 CEPD 音标
在 Github 上有一个开源的仓库,提供了《剑桥英语发声词典》的 json 格式数据库:
> https://github.com/zelic91/camdict
将这个仓库里的 [cam_dict.refined.json](https://github.com/zelic91/camdict/raw/main/cam_dict.refined.json) 下载并保存到本地某个位置。
我写了一个 Alfred 的 workflow使用的是 macOS 系统自带的 *python3*`/usr/bin/python3`
> [CEPD-phonetic-transcription.alfredworkflow](https:///1000h.org/public/alfred-workflows/CEPD-phonetic-transcription.alfredworkflow)
下载这个文件之后,导入 Alfred。
在使用之前要注意:
> * 修改各个 Python 脚本内的 `cam_dict.refined.json` 的文件路径
这个 workflow 可用的启动关键字分别是:
> * `cams`:查询音标(美式发音)
> * `camk`:查询音标(英式发音)
> * `camsd`:用浏览器打开 CEPD 真人示范录音(美式发音)在线网址
> * `camsd`:用浏览器打开 CEPD 真人示范录音(英式发音)在线网址
> * `camw`:用浏览器打开 CEPD 查询页面
> * `ipa`:返回 CMU卡耐基梅隆大学音标库中的音标
以下是查询音标的 workflow启动关键字为 `cams`)中的 python 脚本:
```python
#
# NOTE: Python 2 is deprecated in macOS, and has been removed from macOS 12.3+
#
import sys
import json
# 假设你的 JSON 数据库是一个 JSON 文件,我们将从文件中加载数据
# 如果 JSON 数据在内存中或其他格式,你可能需要修改这部分代码
def load_json_database(file_path):
records = []
with open(file_path, 'r') as file:
for line in file:
try:
record = json.loads(line)
records.append(record)
except json.JSONDecodeError as e:
print(f"Error parsing JSON: {e}")
return records
# 在 JSON 数据库中检索 word
def search_in_json_database(database, search_word, region):
for record in database:
# 检查 word 字段是否匹配
if record.get('word') == search_word:
# 找到匹配项后,获取美式发音信息
pos_items = record.get('pos_items', [])
for pos_item in pos_items:
pronunciations = pos_item.get('pronunciations', [])
for pronunciation in pronunciations:
if pronunciation.get('region') == region:
# 找到美式发音,返回相关信息
return {
'pronunciation': pronunciation.get('pronunciation'),
'audio': pronunciation.get('audio')
}
# 如果没有找到匹配的 word 字段,返回 'not exist'
return 'not exist'
# cam_dict.refined.json 的文件路径
json_db_file_path = '/Users/joker/github/camdict/cam_dict.refined.json'
# 要检索的单词
search_word = sys.argv[1]
region = "us"
json_database = load_json_database(json_db_file_path)
# replace punctuations in text with space
punctuations = ",.?!;"
for p in punctuations:
search_word = search_word.replace(p, " ")
words = [word for word in search_word.split() if word.strip() != '']
phonetics = []
for w in words:
# 检索并获取结果
w = w.strip().lower()
if w[-1] in punctuations:
w = w.rstrip(",.?!;")
result = search_in_json_database(json_database, w, region)
if result == 'not exist':
phonetics.append(w+"*")
else:
phonetics.append(result['pronunciation'])
returnvalue = ''
for p in phonetics:
returnvalue += p + ' '
sys.stdout.write(returnvalue.strip())
```