IPA scheme
The ipa scheme produces a broad phonemic IPA transcription in the format used by Wiktionary's Thai pronunciation entries. It is the phonological baseline: the scheme used for accuracy benchmarking.
```python
from thaiphon import transcribe

transcribe("น้ำ", scheme="ipa")
# '/naːm˦˥/'

transcribe("สวัสดี", scheme="ipa")
# '/sa˨˩.wat̚˨˩.diː˧/'
```
Format conventions
| Convention | Detail |
|---|---|
| Word delimiters | /…/ phonemic slashes wrap the entire word |
| Syllable separator | . (full stop) |
| Long vowel | ː (IPA length mark, U+02D0) |
| Unreleased stops | p̚ t̚ k̚ (U+031A combining left angle above) |
| Tones | Chao tone letters after each syllable |
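Because these conventions are purely textual, a transcription can be taken apart with ordinary string operations. The following is a minimal sketch and not part of thaiphon: `split_syllables` is a hypothetical helper name, and it assumes the input is a single word in the `/…/` format described in the table.

```python
def split_syllables(transcription: str) -> list[str]:
    """Strip the phonemic slashes and split on the syllable separator.

    Hypothetical helper for illustration; thaiphon itself is not required.
    """
    wrapped = (
        len(transcription) >= 2
        and transcription.startswith("/")
        and transcription.endswith("/")
    )
    if not wrapped:
        raise ValueError(f"expected a /…/ wrapped transcription, got {transcription!r}")
    # Drop the leading and trailing slash, then split on the full stop.
    return transcription[1:-1].split(".")


# Three syllables: sa˨˩, wat̚˨˩, diː˧
syllables = split_syllables("/sa˨˩.wat̚˨˩.diː˧/")
```

Each returned syllable still carries its tone letters and any unreleased-stop diacritic, so later processing can inspect them.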
Tone notation
Thai has five lexical tones. The IPA scheme uses Chao tone letters:
| Tone | Chao letters | Example | TLC equivalent |
|---|---|---|---|
| Mid | ˧ | /kaː˧/ | {M} |
| Low | ˨˩ | /kaː˨˩/ | {L} |
| Falling | ˥˩ | /kaː˥˩/ | {F} |
| High | ˦˥ | /kaː˦˥/ | {H} |
| Rising | ˩˩˦ | /kaː˩˩˦/ | {R} |
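The tone table can be read as a lookup from a trailing run of Chao letters to a tone name. A minimal sketch of that lookup follows, assuming each syllable ends in exactly one of the five contours listed; `tone_of` and `CHAO_TO_TONE` are hypothetical names introduced here, not thaiphon API.

```python
# Contours copied from the tone table; names are the table's "Tone" column.
CHAO_TO_TONE = {
    "˧": "mid",
    "˨˩": "low",
    "˥˩": "falling",
    "˦˥": "high",
    "˩˩˦": "rising",
}
TONE_LETTERS = "˩˨˧˦˥"  # the five Chao tone letters, U+02E9 to U+02E5


def tone_of(syllable: str) -> str:
    """Return the tone name for one IPA syllable such as 'kaː˨˩'."""
    # Everything after the last non-tone character is the contour.
    body = syllable.rstrip(TONE_LETTERS)
    contour = syllable[len(body):]
    if contour not in CHAO_TO_TONE:
        raise ValueError(f"unrecognised tone contour in {syllable!r}")
    return CHAO_TO_TONE[contour]
```

Matching the whole trailing contour, rather than a single letter, keeps the one-letter mid tone from being confused with the multi-letter contours.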
Onset inventory
All onset phonemes in broad IPA as produced by thaiphon:
| IPA onset | Example Thai | Gloss |
|---|---|---|
| k | กา | crow |
| kʰ | ขา | leg |
| tɕ | จาน | plate |
| tɕʰ | ชา | tea |
| d | ดา | to be covered |
| t | ตา | eye / grandfather |
| tʰ | ทา | to apply |
| b | บา | to be thin |
| p | ปา | to throw |
| pʰ | พา | to bring |
| f | ฝา | lid, cover |
| s | สา | to name |
| h | หา | to find |
| ʔ | อา | uncle/aunt (glottal onset, rendered as empty in IPA output) |
| m | มา | to come |
| n | นา | rice field |
| ŋ | งา | sesame |
| j | ยา | medicine |
| r | รา | mould, fungus |
| l | ลา | donkey |
| w | วา | fathom (unit) |
Clusters: pr pl pʰr pʰl tr tʰr kr kl kʰr kʰl kʰw. Each is written as the two phonemes concatenated with no joiner.
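Since clusters carry no joiner, recognising one in a transcribed syllable comes down to prefix matching against the list above. The sketch below is illustrative only: `leading_cluster` is a hypothetical helper, and the sample syllables in the usage note are hand-written strings in this scheme's format rather than verified thaiphon output.

```python
# Cluster inventory copied from the list above.
CLUSTERS = ("pr", "pl", "pʰr", "pʰl", "tr", "tʰr", "kr", "kl", "kʰr", "kʰl", "kʰw")


def leading_cluster(syllable: str):
    """Return the onset cluster a syllable starts with, or None."""
    # Try longer clusters first so aspirated ones are matched in full.
    for cluster in sorted(CLUSTERS, key=len, reverse=True):
        if syllable.startswith(cluster):
            return cluster
    return None
```

For example, `leading_cluster("kʰwaːm˧")` returns `"kʰw"`, while `leading_cluster("kaː˧")` returns `None` because a lone /k/ is a simple onset.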
Vowel inventory
| Quality | Short IPA | Long IPA |
|---|---|---|
| /a/ | a | aː |
| /i/ | i | iː |
| /u/ | u | uː |
| /e/ | e | eː |
| /ɛ/ | ɛ | ɛː |
| /o/ | o | oː |
| /ɔ/ | ɔ | ɔː |
| /ɯ/ | ɯ | ɯː |
| /ɤ/ | ɤ | ɤː |
| /ia/ (centring diphthong) | iə | iə |
| /ɯa/ (centring diphthong) | ɯə | ɯə |
| /ua/ (centring diphthong) | uə | uə |
The centring diphthongs do not distinguish short/long in broad IPA; both length values surface as the same symbol.
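One practical consequence is that vowel length can be read off a syllable from the length mark alone, except for the centring diphthongs, where the output carries no length information. A minimal sketch under that reading follows; `vowel_length` is a hypothetical helper, and the label `"unmarked"` is a name chosen here for illustration.

```python
CENTRING_DIPHTHONGS = ("iə", "ɯə", "uə")
LENGTH_MARK = "ː"  # U+02D0, as listed under the format conventions


def vowel_length(syllable: str) -> str:
    """Classify a syllable's vowel as 'short', 'long', or 'unmarked'."""
    # Centring diphthongs surface identically for both length values.
    if any(diphthong in syllable for diphthong in CENTRING_DIPHTHONGS):
        return "unmarked"
    return "long" if LENGTH_MARK in syllable else "short"
```

Checking for the diphthongs first matters because the absence of a length mark on them does not mean the vowel is short.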
Coda inventory
Native Thai phonotactics allows six coda consonants:
| Coda IPA | Orthographic sources | Example |
|---|---|---|
| m | ม | /kaːm˧/ (กาม) |
| n | น ณ ญ ร ล ฬ | /kaːn˧/ (กาน) |
| ŋ | ง | /kaːŋ˧/ (กาง) |
| p̚ | บ ป พ ภ ฟ | /kaːp̚˨˩/ (กาบ) |
| t̚ | จ ช ฌ ด ต ถ ท ธ ฎ ฏ ฐ ฑ ฒ ซ ศ ษ ส | /kaːt̚˨˩/ (กาด) |
| k̚ | ก ข ค ฆ | /kaːk̚˨˩/ (กาก) |
Semi-vowel offglides /w/ and /j/ also appear as codas in diphthongs.
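The coda table is a many-to-one mapping from final consonant letters to six sounds, which can be expressed as an inverted dictionary. This is a sketch built only from the table above, not thaiphon's internal data: `CODA_SOURCES`, `LETTER_TO_CODA`, and `native_coda` are hypothetical names, and the offglides /w/ and /j/ are deliberately left out.

```python
# Coda IPA -> orthographic source letters, copied from the coda table.
CODA_SOURCES = {
    "m": "ม",
    "n": "นณญรลฬ",
    "ŋ": "ง",
    "p̚": "บปพภฟ",
    "t̚": "จชฌดตถทธฎฏฐฑฒซศษส",
    "k̚": "กขคฆ",
}

# Invert to letter -> coda so a final consonant can be looked up directly.
LETTER_TO_CODA = {
    letter: coda
    for coda, letters in CODA_SOURCES.items()
    for letter in letters
}


def native_coda(letter: str) -> str:
    """Return the native coda for a final consonant letter."""
    if letter not in LETTER_TO_CODA:
        raise ValueError(f"{letter!r} is not listed as a coda source")
    return LETTER_TO_CODA[letter]
```

For instance, `native_coda("ส")` gives `"t̚"` and `native_coda("ล")` gives `"n"`, matching the neutralisation the table describes.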
Loanword codas
Modern loanwords may preserve foreign phonemes in coda position under the everyday and careful_educated profiles:
| Preserved coda | Source letters | Example |
|---|---|---|
| f | ฟ | ลิฟต์ → /lif˦˥/ |
| s | ส ศ ษ | (in applicable loanwords) |
| l | ล | (in applicable loanwords) |
Under etalon_compat, all foreign codas collapse to their native equivalents (f→p̚, s→t̚, l→n).
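The collapse rule can be illustrated as a rewrite of a single transcribed syllable. The sketch below is a post-hoc string transformation written for this page and makes no claim about how thaiphon implements profiles internally; `collapse_foreign_coda` is a hypothetical helper, and it assumes the syllable ends in its tone letters.

```python
# Foreign coda -> native equivalent, as stated for etalon_compat.
FOREIGN_TO_NATIVE = {"f": "p̚", "s": "t̚", "l": "n"}
TONE_LETTERS = "˩˨˧˦˥"


def collapse_foreign_coda(syllable: str) -> str:
    """Replace a preserved foreign coda with its native equivalent."""
    # Separate the trailing tone contour from the segmental body.
    body = syllable.rstrip(TONE_LETTERS)
    tone = syllable[len(body):]
    if body and body[-1] in FOREIGN_TO_NATIVE:
        body = body[:-1] + FOREIGN_TO_NATIVE[body[-1]]
    return body + tone
```

Applied to `"lif˦˥"`, this yields `"lip̚˦˥"`; syllables that already end in a native coda or a vowel pass through unchanged.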
Profile sensitivity
```python
from thaiphon import transcribe

# everyday (default): preserve /f/ in well-integrated loans
transcribe("ลิฟต์", scheme="ipa", profile="everyday")
# '/lif˦˥/'

# etalon_compat: collapse to native /p̚/
transcribe("ลิฟต์", scheme="ipa", profile="etalon_compat")
# '/lip̚˦˥/'
```
See Reading profiles for full details.