8.appendix added.
This commit is contained in:
@@ -359,9 +359,9 @@
|
||||
</tr>
|
||||
<tr>
|
||||
<td><span class="pho">ŋ</span><span class="speak-word-inline" data-audio-uk-male="/audios/uk_phonetics_sound_sing_2023feb.mp3"></span></td>
|
||||
<td><b>th</b>ank <span class="pho alt not-display">θæŋk</span><span class="speak-word-inline" data-audio-uk-female="/audios/thank-uk-female.mp3" data-audio-uk-male="/audios/thank-uk-male.mp3"></span></td>
|
||||
<td>tha<b>n</b>k <span class="pho alt not-display">θæŋk</span><span class="speak-word-inline" data-audio-uk-female="/audios/thank-uk-female.mp3" data-audio-uk-male="/audios/thank-uk-male.mp3"></span></td>
|
||||
<td><span class="pho">ŋ</span><span class="speak-word-inline" data-audio-us-male="/audios/us_phonetics_sound_sing_2023feb.mp3"></span></td>
|
||||
<td><b>th</b>ank <span class="pho alt not-display">θæŋk</span><span class="speak-word-inline" data-audio-us-female="/audios/thank-us-female.mp3" data-audio-us-male="/audios/thank-us-male.mp3"></span></td>
|
||||
<td>tha<b>n</b>k <span class="pho alt not-display">θæŋk</span><span class="speak-word-inline" data-audio-us-female="/audios/thank-us-female.mp3" data-audio-us-male="/audios/thank-us-male.mp3"></span></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><span class="pho">l</span><span class="speak-word-inline" data-audio-uk-male="/audios/uk_phonetics_sound_look_2023feb.mp3"></span></td>
|
||||
|
||||
@@ -28,9 +28,9 @@
|
||||
</tr>
|
||||
<tr>
|
||||
<td><span class="pho">ŋ</span><span class="speak-word-inline" data-audio-uk-male="/audios/uk_phonetics_sound_sing_2023feb.mp3"></span></td>
|
||||
<td><b>th</b>ank <span class="pho alt">θæŋk</span><span class="speak-word-inline" data-audio-uk-female="/audios/thank-uk-female.mp3" data-audio-uk-male="/audios/thank-uk-male.mp3"></span></td>
|
||||
<td>tha<b>n</b>k <span class="pho alt">θæŋk</span><span class="speak-word-inline" data-audio-uk-female="/audios/thank-uk-female.mp3" data-audio-uk-male="/audios/thank-uk-male.mp3"></span></td>
|
||||
<td><span class="pho">ŋ</span><span class="speak-word-inline" data-audio-us-male="/audios/us_phonetics_sound_sing_2023feb.mp3"></span></td>
|
||||
<td><b>th</b>ank <span class="pho alt">θæŋk</span><span class="speak-word-inline" data-audio-us-female="/audios/thank-us-female.mp3" data-audio-us-male="/audios/thank-us-male.mp3"></span></td>
|
||||
<td>tha<b>n</b>k <span class="pho alt">θæŋk</span><span class="speak-word-inline" data-audio-us-female="/audios/thank-us-female.mp3" data-audio-us-male="/audios/thank-us-male.mp3"></span></td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
|
||||
3
1000-hours/sounds-of-american-english/8-appendix.md
Normal file
3
1000-hours/sounds-of-american-english/8-appendix.md
Normal file
@@ -0,0 +1,3 @@
|
||||
# 附录
|
||||
|
||||
这里补充的是一些日常可以使用的桌面版工具。
|
||||
@@ -0,0 +1,63 @@
|
||||
# 8.1. 输入音标与特殊符号
|
||||
|
||||
在电子文档中输入音标符号(及其其它特殊符号)从来都很麻烦。
|
||||
|
||||
再一次,我用 Alfred 作为辅助,以下是 workflow 文件:
|
||||
|
||||
> [IPA-Phonetic-Symbols](https:///1000h.org/public/alfred-workflows/IPA-Phonetic-Symbols.alfredworkflow)
|
||||
|
||||
以启动关键字 `ipae` 为例 —— 呼出 Alfred:
|
||||
|
||||

|
||||
|
||||
这时,就可以用 `CMD + 数字` 的方式,将对应的符号插入当前文本。比如,`CMD + 4` 就是将 <span class="pho">ɝː</span> 插入当前文本编辑器。
|
||||
|
||||
以下罗列的是各个符号对应的 Alfred 关键字(Keywords):
|
||||
|
||||
| 关键字(Keyword) | 符号(Symbol) |
|
||||
| ----- | ----- |
|
||||
| `ipaa` | <span class="pho">ʌ</span> |
|
||||
| `ipaaa` | <span class="pho">ɑ</span> |
|
||||
| `ipaae` | <span class="pho">æ</span> |
|
||||
| `ipae` | <span class="pho">ə</span> |
|
||||
| `ipaeeer` | <span class="pho">ɝː</span> |
|
||||
| `ipaer` | <span class="pho">ɚ</span> |
|
||||
| `ipaes` | <span class="pho">ᵊ</span> |
|
||||
| `ipai` | <span class="pho">ɪ</span> |
|
||||
| `ipau` | <span class="pho">ʊ</span> |
|
||||
| `ipao` | <span class="pho">ɒ</span> |
|
||||
| `ipaoo` | <span class="pho">ɔ</span> |
|
||||
| `ipal` | <span class="pho">ɤ</span> |
|
||||
| `ipatd` | <span class="pho">t̠</span> |
|
||||
| `ipatg` | <span class="pho">ʔ</span> |
|
||||
| `ipats` | <span class="pho">ᵗ</span> |
|
||||
| `ipan` | <span class="pho">ŋ</span> |
|
||||
| `ipath` | <span class="pho">θ</span> |
|
||||
| `ipad` | <span class="pho">ð</span> |
|
||||
| `ipas` | <span class="pho">ʃ</span> |
|
||||
| `ipaz` | <span class="pho">ʒ</span> |
|
||||
| `ipaj` | <span class="pho">ʲ</span> |
|
||||
| `ipaw` | <span class="pho">ʷ</span> |
|
||||
| `ipa1` | <span class="pho">◌̅</span> flat |
|
||||
| `ipa2` | <span class="pho">◌́</span> rise |
|
||||
| `ipa3` | <span class="pho">◌̌</span> fall-rise |
|
||||
| `ipa4` | <span class="pho">◌̀</span> fall |
|
||||
| `ipa5` | <span class="pho">◌̂</span> pitch raise |
|
||||
| `ipa6` | <span class="pho">◌̲</span> long vowel |
|
||||
| `ipa7` | <span class="pho">◌̩</span> syllabic consonant |
|
||||
| `ipa8` | <span class="pho">◌̥</span> voiceless |
|
||||
| `ipa9` | <span class="pho">◌̚</span> stop |
|
||||
| `ipa0` | <span class="pho">◌</span> |
|
||||
| `ipa`: | <span class="pho">ː</span> long vowel symbol |
|
||||
| `ipa`" | <span class="pho">ˈ</span> prime stress |
|
||||
| `ipa`' | <span class="pho">ˌ</span> secondary stress |
|
||||
| `ipa-` | <span class="pho">◌‿◌</span> linking |
|
||||
| `ipa\|` | <span class="pho">‖</span> grouping boundary |
|
||||
| `-->` | <span class="pho">⭢</span> |
|
||||
| `<--` | <span class="pho">⭠</span> |
|
||||
| `<->` | <span class="pho">⭤</span> |
|
||||
| `irise` | <span class="pho">⤴</span> senetence intonation rise |
|
||||
| `idown` | <span class="pho">⤵</span> senetence intonation fall |
|
||||
|
||||
|
||||
|
||||
@@ -0,0 +1,110 @@
|
||||
# 8.2. 获取 CEPD 音标
|
||||
|
||||
macOS 上有一个收费软件,[Alfred](https://www.alfredapp.com/),可以用来定义很多快捷流程(workflow)去完成相对复杂的任务。比如,通过设定关键字启动一个 Python 脚本,查询某个单词(甚至整个句子)在《剑桥英语发声词典》(CEPD)中的音标。
|
||||
|
||||
> Alfred 的使用方法,参见:
|
||||
> https://github.com/xiaolai/apple-computer-literacy/blob/main/alfred.md
|
||||
|
||||
在 Github 上有一个开源的仓库,提供了《剑桥英语发声词典》的 json 格式数据库:
|
||||
|
||||
> https://github.com/zelic91/camdict
|
||||
|
||||
将这个仓库里的 [cam_dict.refined.json](https://github.com/zelic91/camdict/raw/main/cam_dict.refined.json) 下载并保存到本地某个位置。
|
||||
|
||||
我写了一个 Alfred 的 workflow,使用的是 macOS 系统自带的 python3:`/usr/bin/python3`:
|
||||
|
||||
> [CEPD-phonetic-transcription.alfredworkflow](https:///1000h.org/public/alfred-workflows/CEPD-phonetic-transcription.alfredworkflow)
|
||||
|
||||
下载这个文件之后,导入 Alfred。
|
||||
|
||||
在使用之前要注意:
|
||||
|
||||
> * 修改各个 Python 脚本内的 `cam_dict.refined.json` 的文件路径
|
||||
|
||||
这个 workflow 可用的启动关键字分别是:
|
||||
|
||||
> * `cams`:查询音标(美式发音)
|
||||
> * `camk`:查询音标(英式发音)
|
||||
> * `camsd`:用浏览器打开 CEPD 真人示范录音(美式发音)在线网址
|
||||
> * `camsd`:用浏览器打开 CEPD 真人示范录音(英式发音)在线网址
|
||||
> * `camw`:用浏览器打开 CEPD 查询页面
|
||||
> * `ipa`:返回 CMU(卡耐基梅隆大学)音标库中的音标
|
||||
|
||||
以下是查询音标的 workflow(启动关键字为 `cams`)中的 python 脚本:
|
||||
|
||||
```python
|
||||
#
|
||||
# NOTE: Python 2 is deprecated in macOS, and has been removed from macOS 12.3+
|
||||
#
|
||||
import sys
|
||||
import json
|
||||
|
||||
# 假设你的 JSON 数据库是一个 JSON 文件,我们将从文件中加载数据
|
||||
# 如果 JSON 数据在内存中或其他格式,你可能需要修改这部分代码
|
||||
def load_json_database(file_path):
|
||||
records = []
|
||||
with open(file_path, 'r') as file:
|
||||
for line in file:
|
||||
try:
|
||||
record = json.loads(line)
|
||||
records.append(record)
|
||||
except json.JSONDecodeError as e:
|
||||
print(f"Error parsing JSON: {e}")
|
||||
return records
|
||||
|
||||
# 在 JSON 数据库中检索 word
|
||||
def search_in_json_database(database, search_word, region):
|
||||
for record in database:
|
||||
# 检查 word 字段是否匹配
|
||||
if record.get('word') == search_word:
|
||||
# 找到匹配项后,获取美式发音信息
|
||||
pos_items = record.get('pos_items', [])
|
||||
for pos_item in pos_items:
|
||||
pronunciations = pos_item.get('pronunciations', [])
|
||||
for pronunciation in pronunciations:
|
||||
if pronunciation.get('region') == region:
|
||||
# 找到美式发音,返回相关信息
|
||||
return {
|
||||
'pronunciation': pronunciation.get('pronunciation'),
|
||||
'audio': pronunciation.get('audio')
|
||||
}
|
||||
# 如果没有找到匹配的 word 字段,返回 'not exist'
|
||||
return 'not exist'
|
||||
|
||||
# cam_dict.refined.json 的文件路径
|
||||
json_db_file_path = '/Users/joker/github/camdict/cam_dict.refined.json'
|
||||
|
||||
# 要检索的单词
|
||||
search_word = sys.argv[1]
|
||||
|
||||
region = "us"
|
||||
|
||||
json_database = load_json_database(json_db_file_path)
|
||||
|
||||
# replace punctuations in text with space
|
||||
punctuations = ",.?!;"
|
||||
for p in punctuations:
|
||||
search_word = search_word.replace(p, " ")
|
||||
words = [word for word in search_word.split() if word.strip() != '']
|
||||
|
||||
phonetics = []
|
||||
|
||||
for w in words:
|
||||
# 检索并获取结果
|
||||
w = w.strip().lower()
|
||||
|
||||
if w[-1] in punctuations:
|
||||
w = w.rstrip(",.?!;")
|
||||
result = search_in_json_database(json_database, w, region)
|
||||
|
||||
if result == 'not exist':
|
||||
phonetics.append(w+"*")
|
||||
else:
|
||||
phonetics.append(result['pronunciation'])
|
||||
|
||||
returnvalue = ''
|
||||
for p in phonetics:
|
||||
returnvalue += p + ' '
|
||||
|
||||
sys.stdout.write(returnvalue.strip())
|
||||
```
|
||||
127
1000-hours/sounds-of-american-english/8.3-phoneme-exercises.md
Normal file
127
1000-hours/sounds-of-american-english/8.3-phoneme-exercises.md
Normal file
@@ -0,0 +1,127 @@
|
||||
# 8.3. 音标练习
|
||||
|
||||
这是一个 Jupyter Notebook,用来建立音标符号与声音之间的关联。
|
||||
|
||||
每次执行,随即从《剑桥英语发声词典》中选取一个词汇,播放真人朗读语音,而后要求对元音或者辅音填空……
|
||||
|
||||
> [phonetics-fill-in-exercise.ipynb](https://1000h.org/public/jupyter-notebooks/phonetics-fill-in-exercise.ipynb)
|
||||
|
||||
执行结果如下:
|
||||
|
||||

|
||||
|
||||
Jupyter Notebook 代码如下:
|
||||
|
||||
``` Python
|
||||
# %%
|
||||
%pip install python-vlc
|
||||
|
||||
# %%
|
||||
import requests
|
||||
import json
|
||||
import vlc
|
||||
import re
|
||||
import random
|
||||
from IPython.display import Audio
|
||||
|
||||
import json
|
||||
import requests
|
||||
|
||||
def load_json_database(source):
|
||||
records = []
|
||||
|
||||
def parse_json_lines(lines):
|
||||
for line in lines:
|
||||
if line:
|
||||
try:
|
||||
record = json.loads(line)
|
||||
records.append(record)
|
||||
except json.JSONDecodeError as e:
|
||||
print(f"Error parsing JSON: {e}")
|
||||
|
||||
try:
|
||||
if source.startswith('http://') or source.startswith('https://'):
|
||||
# Handle as URL
|
||||
response = requests.get(source)
|
||||
response.raise_for_status() # Raise an error for bad status codes
|
||||
parse_json_lines(response.iter_lines(decode_unicode=True))
|
||||
else:
|
||||
# Handle as file
|
||||
with open(source, 'r', encoding='utf-8') as file:
|
||||
parse_json_lines(file)
|
||||
except requests.exceptions.RequestException as e:
|
||||
print(f"Error fetching data from URL: {e}")
|
||||
except FileNotFoundError as e:
|
||||
print(f"Error opening file: {e}")
|
||||
except Exception as e:
|
||||
print(f"An unexpected error occurred: {e}")
|
||||
|
||||
return records
|
||||
|
||||
url = "https://raw.githubusercontent.com/zelic91/camdict/main/cam_dict.refined.json"
|
||||
json_database = load_json_database(url)
|
||||
|
||||
|
||||
# %%
|
||||
def search_in_json_database(database, search_word, region):
|
||||
for record in database:
|
||||
# 检查 word 字段是否匹配
|
||||
if record.get('word') == search_word:
|
||||
# 找到匹配项后,获取美式发音信息
|
||||
pos_items = record.get('pos_items', [])
|
||||
for pos_item in pos_items:
|
||||
pronunciations = pos_item.get('pronunciations', [])
|
||||
for pronunciation in pronunciations:
|
||||
if pronunciation.get('region') == region:
|
||||
# 找到美式发音,返回相关信息
|
||||
return {
|
||||
'pronunciation': pronunciation.get('pronunciation'),
|
||||
'audio': pronunciation.get('audio')
|
||||
}
|
||||
# 如果没有找到匹配的 word 字段,返回 'not exist'
|
||||
return 'not exist'
|
||||
|
||||
def replace_with_underscores(match):
|
||||
return '_' * len(match.group(0))
|
||||
|
||||
# %%
|
||||
# get a random word from the database
|
||||
|
||||
vowel_phonetics = re.compile(r'ɑː|ɑːr|ʌ||iː|ɪ|i|ɪr|ʊ|ʊr|uː|ʊr|e|er|æ|ə|ɚ|ɝː|ɒ|ɔː|ɔːr|ɔɪ|aɪ|aɪr|eɪ|aʊ|aʊr|oʊ|')
|
||||
consonant_phonetics = re.compile(r'p|b|t|d|k|ɡ|f|v|θ|ð|s|z|ʃ|ʒ|tʃ|dʒ|r|h|l|t̬|j|w|ŋ|n|m|tr|dr|ts|dz|br|pr|fr|ɡr|θr|dr|ʃr|kr|bl|kl|ɡl|fl|pl|sl|sp|st|sk|sm|sn|sw|str|spr|skr|spl|sfr|skw|skr|skl|')
|
||||
|
||||
# if the word is with certain enddings such as 'es, ed, ing', get another word
|
||||
random_word = random.choice(json_database)
|
||||
while random_word['word'].endswith(('ed', 'ing', 'es', 'ts', 'ks', 'ds', 'ps', 'bs', 'gs', 'ls', 'rs', 'ms', 'ns', 'er', 'est')):
|
||||
random_word = random.choice(json_database)
|
||||
|
||||
# get pronunciation of the random word with region 'us'
|
||||
random_word_us = search_in_json_database(json_database, random_word['word'], 'us')
|
||||
|
||||
# get the word's phonetics
|
||||
random_word_entry = random_word['word']
|
||||
print(random_word_entry)
|
||||
|
||||
random_word_phonetics = random_word_us['pronunciation']
|
||||
|
||||
# get the audio url of the word
|
||||
random_word_us_audio_url = random_word_us['audio']
|
||||
print(random_word_us_audio_url)
|
||||
|
||||
blank_vowel_phonetics = re.sub(vowel_phonetics, replace_with_underscores, random_word_phonetics)
|
||||
blank_consonant_phonetics = re.sub(consonant_phonetics, replace_with_underscores, random_word_phonetics)
|
||||
|
||||
# fill vowels in blanks
|
||||
print(f'Fill vowels in blanks: {blank_vowel_phonetics}')
|
||||
|
||||
# fill consonants in blanks
|
||||
print(f'Fill in consonants in blanks: {blank_consonant_phonetics}')
|
||||
|
||||
# play the audio
|
||||
player = vlc.MediaPlayer(random_word_us['audio'])
|
||||
player.play()
|
||||
|
||||
# display the audio
|
||||
Audio(url=random_word_us_audio_url)
|
||||
```
|
||||
|
||||
@@ -0,0 +1,10 @@
|
||||
# 8.4. 每日练习语音生成
|
||||
|
||||
这是一个 Jupyter Notebook —— 需要有自己的 OpenAI API Key。指定 `user-prompt`,而后生成
|
||||
|
||||
> * 一个篇章及其 markdown 文件,以及由 alloy 和 nova 朗读的 mp3 文件
|
||||
> * 同一话题的两个对话,及其 mp3 文件
|
||||
|
||||
压缩包链接:
|
||||
|
||||
> [8.4-daily-speech-exercises.zip](https://1000h.org/public/jupyter-notebooks/8.4-daily-speech-exercises.zip)
|
||||
Reference in New Issue
Block a user