110 lines
4.0 KiB
Markdown
110 lines
4.0 KiB
Markdown
# 8.2. 获取 CEPD 音标
|
||
|
||
macOS 上有一个收费软件,[Alfred](https://www.alfredapp.com/),可以用来定义很多快捷流程(workflow)去完成相对复杂的任务。比如,通过设定关键字启动一个 Python 脚本,查询某个单词(甚至整个句子)在《剑桥英语发声词典》(CEPD)中的音标。
|
||
|
||
> Alfred 的使用方法,参见:
|
||
> https://github.com/xiaolai/apple-computer-literacy/blob/main/alfred.md
|
||
|
||
在 Github 上有一个开源的仓库,提供了《剑桥英语发声词典》的 json 格式数据库:
|
||
|
||
> https://github.com/zelic91/camdict
|
||
|
||
将这个仓库里的 [cam_dict.refined.json](https://github.com/zelic91/camdict/raw/main/cam_dict.refined.json) 下载并保存到本地某个位置。
|
||
|
||
我写了一个 Alfred 的 workflow,使用的是 macOS 系统自带的 python3:`/usr/bin/python3`:
|
||
|
||
> [CEPD-phonetic-transcription.alfredworkflow](https:///1000h.org/public/alfred-workflows/CEPD-phonetic-transcription.alfredworkflow)
|
||
|
||
下载这个文件之后,导入 Alfred。
|
||
|
||
在使用之前要注意:
|
||
|
||
> * 修改各个 Python 脚本内的 `cam_dict.refined.json` 的文件路径
|
||
|
||
这个 workflow 可用的启动关键字分别是:
|
||
|
||
> * `cams`:查询音标(美式发音)
|
||
> * `camk`:查询音标(英式发音)
|
||
> * `camsd`:用浏览器打开 CEPD 真人示范录音(美式发音)在线网址
|
||
> * `camsd`:用浏览器打开 CEPD 真人示范录音(英式发音)在线网址
|
||
> * `camw`:用浏览器打开 CEPD 查询页面
|
||
> * `ipa`:返回 CMU(卡耐基梅隆大学)音标库中的音标
|
||
|
||
以下是查询音标的 workflow(启动关键字为 `cams`)中的 python 脚本:
|
||
|
||
```python
|
||
#
|
||
# NOTE: Python 2 is deprecated in macOS, and has been removed from macOS 12.3+
|
||
#
|
||
import sys
|
||
import json
|
||
|
||
# 假设你的 JSON 数据库是一个 JSON 文件,我们将从文件中加载数据
|
||
# 如果 JSON 数据在内存中或其他格式,你可能需要修改这部分代码
|
||
def load_json_database(file_path):
|
||
records = []
|
||
with open(file_path, 'r') as file:
|
||
for line in file:
|
||
try:
|
||
record = json.loads(line)
|
||
records.append(record)
|
||
except json.JSONDecodeError as e:
|
||
print(f"Error parsing JSON: {e}")
|
||
return records
|
||
|
||
# 在 JSON 数据库中检索 word
|
||
def search_in_json_database(database, search_word, region):
|
||
for record in database:
|
||
# 检查 word 字段是否匹配
|
||
if record.get('word') == search_word:
|
||
# 找到匹配项后,获取美式发音信息
|
||
pos_items = record.get('pos_items', [])
|
||
for pos_item in pos_items:
|
||
pronunciations = pos_item.get('pronunciations', [])
|
||
for pronunciation in pronunciations:
|
||
if pronunciation.get('region') == region:
|
||
# 找到美式发音,返回相关信息
|
||
return {
|
||
'pronunciation': pronunciation.get('pronunciation'),
|
||
'audio': pronunciation.get('audio')
|
||
}
|
||
# 如果没有找到匹配的 word 字段,返回 'not exist'
|
||
return 'not exist'
|
||
|
||
# cam_dict.refined.json 的文件路径
|
||
json_db_file_path = '/Users/joker/github/camdict/cam_dict.refined.json'
|
||
|
||
# 要检索的单词
|
||
search_word = sys.argv[1]
|
||
|
||
region = "us"
|
||
|
||
json_database = load_json_database(json_db_file_path)
|
||
|
||
# replace punctuations in text with space
|
||
punctuations = ",.?!;"
|
||
for p in punctuations:
|
||
search_word = search_word.replace(p, " ")
|
||
words = [word for word in search_word.split() if word.strip() != '']
|
||
|
||
phonetics = []
|
||
|
||
for w in words:
|
||
# 检索并获取结果
|
||
w = w.strip().lower()
|
||
|
||
if w[-1] in punctuations:
|
||
w = w.rstrip(",.?!;")
|
||
result = search_in_json_database(json_database, w, region)
|
||
|
||
if result == 'not exist':
|
||
phonetics.append(w+"*")
|
||
else:
|
||
phonetics.append(result['pronunciation'])
|
||
|
||
returnvalue = ''
|
||
for p in phonetics:
|
||
returnvalue += p + ' '
|
||
|
||
sys.stdout.write(returnvalue.strip())
|
||
``` |