
# 8.2. Getting CEPD Phonetic Transcriptions

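On macOS there is a paid app, [Alfred](https://www.alfredapp.com/), that lets you define shortcut workflows for relatively complex tasks. For example, you can set a keyword that launches a Python script to look up the phonetic transcription of a word (or even a whole sentence) in the *Cambridge English Pronouncing Dictionary* (CEPD).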

> For how to use Alfred, see:
> https://github.com/xiaolai/apple-computer-literacy/blob/main/alfred.md

There is an open-source repository on GitHub that provides the *Cambridge English Pronouncing Dictionary* as a JSON database:

> https://github.com/zelic91/camdict

Download [cam_dict.refined.json](https://github.com/zelic91/camdict/raw/main/cam_dict.refined.json) from this repository and save it somewhere on your local disk.
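
Before wiring the file into a workflow, it can help to check that the download is readable. The sketch below assumes the file stores one JSON object per line, which matches how the lookup script in this section reads it; `peek_records` is a hypothetical helper name, not part of the repository.

```python
import json
from itertools import islice


def peek_records(file_path, count=3):
    """Parse the records found in the first `count` lines of the file."""
    records = []
    # Read as UTF-8 explicitly, since the transcriptions contain IPA symbols
    with open(file_path, "r", encoding="utf-8") as file:
        for line in islice(file, count):
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records
```

Calling `peek_records('/path/to/cam_dict.refined.json')` and printing each record's keys should show which fields are available. If parsing fails on the first line, the one-object-per-line assumption does not hold for your copy of the file.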

I wrote an Alfred workflow that uses the python3 bundled with macOS, `/usr/bin/python3`:

> [CEPD-phonetic-transcription.alfredworkflow](https://1000h.org/public/alfred-workflows/CEPD-phonetic-transcription.alfredworkflow)

After downloading this file, import it into Alfred.

Before using it, note:

> * Change the file path of `cam_dict.refined.json` inside each Python script
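
One way to avoid editing the path in every script is to read it from an environment variable and fall back to a default. This is only a sketch: the variable name `CEPD_JSON_PATH` and the default location are assumptions, and it presumes the variable is defined in the workflow's configuration or is otherwise present in the script's environment.

```python
import os

# Hypothetical variable name; it must be defined wherever the script runs
ENV_VAR = "CEPD_JSON_PATH"
# Fallback location, also an assumption; "~" expands to the user's home directory
DEFAULT_PATH = "~/github/camdict/cam_dict.refined.json"


def resolve_db_path(environ=os.environ):
    """Return the database path from the environment, or the default."""
    raw_path = environ.get(ENV_VAR) or DEFAULT_PATH
    return os.path.expanduser(raw_path)


json_db_file_path = resolve_db_path()
```

In each script, this would take the place of the hard-coded `json_db_file_path` assignment.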

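The launch keywords available in this workflow are:

> * `cams`: look up the phonetic transcription (American pronunciation)
> * `camk`: look up the phonetic transcription (British pronunciation)
> * `camsd`: open the online page with the CEPD human-voice recording in the browser (American pronunciation)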
> * `camkd`: open the online page with the CEPD human-voice recording in the browser (British pronunciation)
> * `camw`: open the CEPD lookup page in the browser
> * `ipa`: return the transcription from the CMU (Carnegie Mellon University) pronunciation dictionary

Below is the Python script in the transcription-lookup workflow (launch keyword `cams`):

```python
#
# NOTE: Python 2 is deprecated in macOS, and has been removed from macOS 12.3+
#
import sys
import json

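# Assume the JSON database is a file on disk holding one JSON record per line;
# load every record from that file.
# If your JSON data lives in memory or uses another format, adapt this part.
def load_json_database(file_path):
    records = []
    # Read as UTF-8 explicitly: the IPA symbols are non-ASCII
    with open(file_path, 'r', encoding='utf-8') as file:
        for line in file:
            try:
                record = json.loads(line)
                records.append(record)
            except json.JSONDecodeError as e:
                # Report on stderr so the message stays out of the result on stdout
                print(f"Error parsing JSON: {e}", file=sys.stderr)
    return records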

# Look up a word in the JSON database
def search_in_json_database(database, search_word, region):
    for record in database:
        # Check whether the 'word' field matches
        if record.get('word') == search_word:
            # On a match, scan its pronunciations for the requested region
            pos_items = record.get('pos_items', [])
            for pos_item in pos_items:
                pronunciations = pos_item.get('pronunciations', [])
                for pronunciation in pronunciations:
                    if pronunciation.get('region') == region:
                        # Found a pronunciation for that region; return its details
                        return {
                            'pronunciation': pronunciation.get('pronunciation'),
                            'audio': pronunciation.get('audio')
                        }
    # No record with that word and region was found; return 'not exist'
    return 'not exist'

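# Path to cam_dict.refined.json; change this to where you saved the file
json_db_file_path = '/Users/joker/github/camdict/cam_dict.refined.json'

# The word (or whole sentence) to look up, passed in as the first argument
search_word = sys.argv[1]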

region = "us"

json_database = load_json_database(json_db_file_path)

# Replace punctuation in the text with spaces, then split into words
punctuations = ",.?!;"
for p in punctuations:
    search_word = search_word.replace(p, " ")
# split() with no arguments already drops empty strings
words = search_word.split()

phonetics = []

for w in words:
    # Look up each word; punctuation was already replaced with spaces above
    w = w.lower()
    result = search_in_json_database(json_database, w, region)

    if result == 'not exist':
        # Mark words missing from the dictionary with a trailing asterisk
        phonetics.append(w + "*")
    else:
        phonetics.append(result['pronunciation'])

# Join the transcriptions with single spaces
sys.stdout.write(' '.join(phonetics))
```
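
The script scans the whole record list once for every word in the input. For longer sentences, one possible variation is to build a word-to-transcription dictionary once and look each word up in it. The sketch below assumes the same record layout the script relies on (`word`, `pos_items`, `pronunciations`, `region`, `pronunciation`). `build_index` and `transcribe` are hypothetical helper names, and the sample record is made up for illustration rather than copied from the database.

```python
def build_index(records, region):
    """Map each word to its first pronunciation for the given region."""
    index = {}
    for record in records:
        word = record.get("word")
        if word in index:
            continue  # keep the first match, as the linear search does
        for pos_item in record.get("pos_items", []):
            for pron in pos_item.get("pronunciations", []):
                if pron.get("region") == region and word not in index:
                    index[word] = pron.get("pronunciation")
    return index


def transcribe(text, index, punctuations=",.?!;"):
    """Replace punctuation with spaces, then look up each lowercased word."""
    for p in punctuations:
        text = text.replace(p, " ")
    # Words missing from the index are returned with a trailing asterisk
    return " ".join(index.get(w) or w + "*" for w in text.lower().split())


# Made-up sample record, for illustration only
sample_records = [
    {"word": "hello", "pos_items": [{"pronunciations": [
        {"region": "uk", "pronunciation": "/heˈləʊ/"},
        {"region": "us", "pronunciation": "/heˈloʊ/"},
    ]}]},
]

us_index = build_index(sample_records, "us")
print(transcribe("Hello, world!", us_index))  # prints: /heˈloʊ/ world*
```

Building the index costs one pass over the records, after which each lookup is a single dictionary access. For a one-word query the difference is negligible, so the simpler linear search remains a reasonable choice there.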