The phonological model
This section describes how thaiphon analyses Thai orthography into phonological representations. It is written for readers familiar with basic linguistics — phoneme, syllable, tone — but does not assume prior knowledge of Thai.
Audience
This section is aimed at linguists, Thai grammar specialists, and curious developers. It explains the phonological reasoning behind thaiphon's rules, not just the code that implements them.
Overview
Thai is a tonal language written in an alphabetic script. The relationship between spelling and pronunciation is systematic but non-trivial:
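- There are 44 consonant letters but only 21 distinct consonant phonemes in onset position, and only 8 in coda position: the six stops and nasals /p t k m n ŋ/ plus the glides /w j/.
- A tone mark alone does not determine tone. Tone follows from four factors together: the consonant class of the onset, whether the syllable is live or dead, the length of the vowel (which matters for dead syllables with a low-class onset), and the tone mark, if any.
- Vowel symbols are written before, after, above, or below the onset consonant, and one vowel may occupy several of these positions at once: in เรียน, เ precedes the onset ร, ี sits above it, and ย follows it.
- Some written consonants are not pronounced. Some change the tone without being heard, such as the leading ห in หมา /mǎː/ 'dog'; others are silenced by the thanthakhat mark, such as the ร in เสาร์ /sǎw/.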
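The interaction of class, live/dead status, vowel length, and tone mark can be sketched as a small lookup following the standard textbook tone rules. The function and table names below are hypothetical, and the enums are local stand-ins named after the ones in the Syllable field table, not thaiphon imports. Live/dead classification is included because the tone lookup depends on it. Treating mai tri and mai chattawa as class-independent is a simplifying assumption: in standard spelling they appear only on mid-class syllables.

```python
from enum import Enum
from typing import Optional

# Local stand-ins mirroring the enum names in the Syllable field table (hypothetical).
EffectiveClass = Enum("EffectiveClass", "HIGH MID LOW")
SyllableType = Enum("SyllableType", "LIVE DEAD")
VowelLength = Enum("VowelLength", "SHORT LONG")
ToneMark = Enum("ToneMark", "NONE MAI_EK MAI_THO MAI_TRI MAI_JATTAWA")
Tone = Enum("Tone", "MID LOW FALLING HIGH RISING")

# Stop codas (including a glottal stop, if an analysis writes one) make a syllable dead.
STOP_CODAS = {"p", "t", "k", "ʔ"}


def classify_syllable_type(length: VowelLength, coda: Optional[str]) -> SyllableType:
    """Dead: ends in a stop, or is open with a short vowel. Everything else is live."""
    if coda in STOP_CODAS:
        return SyllableType.DEAD
    if coda is None and length is VowelLength.SHORT:
        return SyllableType.DEAD
    return SyllableType.LIVE


# Unmarked syllables. Low-class dead syllables are handled separately (length decides).
_UNMARKED = {
    (EffectiveClass.MID, SyllableType.LIVE): Tone.MID,
    (EffectiveClass.MID, SyllableType.DEAD): Tone.LOW,
    (EffectiveClass.HIGH, SyllableType.LIVE): Tone.RISING,
    (EffectiveClass.HIGH, SyllableType.DEAD): Tone.LOW,
    (EffectiveClass.LOW, SyllableType.LIVE): Tone.MID,
}

# Mai ek and mai tho: the result depends on the class only.
_MARKED = {
    (EffectiveClass.MID, ToneMark.MAI_EK): Tone.LOW,
    (EffectiveClass.HIGH, ToneMark.MAI_EK): Tone.LOW,
    (EffectiveClass.LOW, ToneMark.MAI_EK): Tone.FALLING,
    (EffectiveClass.MID, ToneMark.MAI_THO): Tone.FALLING,
    (EffectiveClass.HIGH, ToneMark.MAI_THO): Tone.FALLING,
    (EffectiveClass.LOW, ToneMark.MAI_THO): Tone.HIGH,
}


def assign_tone(cls: EffectiveClass, syl_type: SyllableType,
                length: VowelLength, mark: ToneMark) -> Tone:
    # Assumption: mai tri and mai chattawa are applied regardless of class.
    if mark is ToneMark.MAI_TRI:
        return Tone.HIGH
    if mark is ToneMark.MAI_JATTAWA:
        return Tone.RISING
    if mark is not ToneMark.NONE:
        return _MARKED[(cls, mark)]
    if cls is EffectiveClass.LOW and syl_type is SyllableType.DEAD:
        return Tone.HIGH if length is VowelLength.SHORT else Tone.FALLING
    return _UNMARKED[(cls, syl_type)]


# มาก 'much': low-class ม, long vowel, stop coda /k/ -> dead -> falling.
st = classify_syllable_type(VowelLength.LONG, "k")
print(assign_tone(EffectiveClass.LOW, st, VowelLength.LONG, ToneMark.NONE))  # Tone.FALLING
```

Keeping the low-class dead case outside the table makes the one place where vowel length matters explicit.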
thaiphon models Thai phonology in layers, each implemented as a separate derivation module:
| Layer | What it does |
|---|---|
| Unicode normalisation | NFC + mark reordering + variation-selector strip |
| Expansion | Sara Am decomposition, ๆ repetition, ฯลฯ, Thai digit spelling |
| Syllabification | Segment the word into syllable-sized chunks; rank candidates |
| Onset resolution | Identify the onset consonant(s) and their IPA phoneme(s) |
| Vowel resolution | Identify the vowel nucleus, length, and any offglide |
| Coda resolution | Identify the final consonant and its collapsed phoneme |
| Syllable-type classification | Live vs. dead syllable |
| Tone assignment | Lookup in the tone matrix |
| PhonologicalWord assembly | Immutable frozen tuple of Syllables |
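As a rough illustration of the first layer, the sketch below strips variation selectors, applies NFC, and swaps a tone mark that was typed before an above-base vowel. It assumes that "mark reordering" covers at least this common input error; the function name `normalise` and the exact reordering rule are hypothetical and may not match thaiphon's module. The explicit swap is needed because Thai above-base vowels have canonical combining class 0, so NFC alone never moves a tone mark past them (it does, however, already put the below-base vowels ุ and ู before tone marks).

```python
import unicodedata

# Thai tone marks U+0E48-U+0E4B: mai ek, mai tho, mai tri, mai chattawa (combining class 107).
TONE_MARKS = {"\u0e48", "\u0e49", "\u0e4a", "\u0e4b"}
# Above-base vowels: mai han-akat and sara i, ii, ue, uee (combining class 0).
UPPER_VOWELS = {"\u0e31", "\u0e34", "\u0e35", "\u0e36", "\u0e37"}


def is_variation_selector(ch: str) -> bool:
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def normalise(text: str) -> str:
    # Strip variation selectors first so they cannot block canonical reordering.
    text = "".join(c for c in text if not is_variation_selector(c))
    text = unicodedata.normalize("NFC", text)
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        # Hypothetical rule: a tone mark directly before an upper vowel is moved after it.
        if chars[i] in TONE_MARKS and chars[i + 1] in UPPER_VOWELS:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2
        else:
            i += 1
    return "".join(chars)


# กี่ typed as ก + mai ek + sara ii comes out as ก + sara ii + mai ek.
print(normalise("\u0e01\u0e48\u0e35") == "\u0e01\u0e35\u0e48")  # True
```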
The phonological word
Every thaiphon analysis produces a PhonologicalWord — an immutable tuple of Syllable objects. Each Syllable carries:
| Field | Type | Description |
|---|---|---|
| onset | Phoneme \| Cluster \| None | The initial consonant or cluster, in IPA |
| vowel | Phoneme | The nucleus vowel quality, in IPA |
| vowel_length | VowelLength | SHORT or LONG |
| coda | Phoneme \| None | The final consonant phoneme, or None for open syllables |
| tone | Tone | MID, LOW, FALLING, HIGH, or RISING |
| tone_mark | ToneMark | NONE, MAI_EK, MAI_THO, MAI_TRI, or MAI_JATTAWA |
| effective_class | EffectiveClass | HIGH, MID, or LOW (after leading ห adjustments) |
| syllable_type | SyllableType | LIVE or DEAD |
| raw | str | The raw Thai orthographic slice for this syllable |
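A minimal sketch of how the fields above could be modelled with frozen dataclasses. The enum and class names follow the table, but they are local stand-ins, not thaiphon imports; representing `Phoneme` as a wrapper around an IPA string, `Cluster` as a pair of phonemes, and `PhonologicalWord` as a frozen dataclass around a tuple are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple, Union

# Enum names mirror the Syllable table; local stand-ins, not thaiphon imports.
VowelLength = Enum("VowelLength", "SHORT LONG")
Tone = Enum("Tone", "MID LOW FALLING HIGH RISING")
ToneMark = Enum("ToneMark", "NONE MAI_EK MAI_THO MAI_TRI MAI_JATTAWA")
EffectiveClass = Enum("EffectiveClass", "HIGH MID LOW")
SyllableType = Enum("SyllableType", "LIVE DEAD")


@dataclass(frozen=True)
class Phoneme:
    ipa: str  # IPA symbol, e.g. "m" or "a"


@dataclass(frozen=True)
class Cluster:
    first: Phoneme   # e.g. /k/ in an onset like กร
    second: Phoneme  # e.g. /r/


@dataclass(frozen=True)
class Syllable:
    onset: Optional[Union[Phoneme, Cluster]]
    vowel: Phoneme
    vowel_length: VowelLength
    coda: Optional[Phoneme]
    tone: Tone
    tone_mark: ToneMark
    effective_class: EffectiveClass
    syllable_type: SyllableType
    raw: str


@dataclass(frozen=True)
class PhonologicalWord:
    syllables: Tuple[Syllable, ...]


# หมา /mǎː/ 'dog': leading ห makes ม behave as high class; live, unmarked -> rising.
maa = Syllable(
    onset=Phoneme("m"),
    vowel=Phoneme("a"),
    vowel_length=VowelLength.LONG,
    coda=None,
    tone=Tone.RISING,
    tone_mark=ToneMark.NONE,
    effective_class=EffectiveClass.HIGH,
    syllable_type=SyllableType.LIVE,
    raw="หมา",
)
word = PhonologicalWord(syllables=(maa,))
```

Because every class is frozen, assigning to a field (for example `maa.tone = Tone.MID`) raises `dataclasses.FrozenInstanceError`, and whole words are hashable, so they can be cached or used as dictionary keys.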
This intermediate representation is scheme-independent: output schemes such as IPA, RTGS, or TLC Enhanced Phonemic read it but never modify it.
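To illustrate the read-only contract, the sketch below defines a scheme as any object with a `render` method and implements a toy IPA-like renderer over a reduced, hypothetical syllable record (`Syl`). The tone diacritics used here (grave for low, circumflex for falling, acute for high, caron for rising, mid unmarked) are a common convention for Thai and an assumption; they are not necessarily what thaiphon's IPA scheme emits.

```python
import unicodedata
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Protocol, Tuple

Tone = Enum("Tone", "MID LOW FALLING HIGH RISING")


@dataclass(frozen=True)
class Syl:
    """Hypothetical reduced stand-in for a Syllable, holding IPA strings only."""
    onset: Optional[str]
    vowel: str
    long: bool
    coda: Optional[str]
    tone: Tone


class Scheme(Protocol):
    """A scheme turns a word into a string without changing the word."""
    def render(self, word: Tuple[Syl, ...]) -> str: ...


# Combining diacritics placed after the first vowel symbol; mid tone is unmarked.
_TONE_DIACRITIC = {
    Tone.MID: "",
    Tone.LOW: "\u0300",      # combining grave
    Tone.FALLING: "\u0302",  # combining circumflex
    Tone.HIGH: "\u0301",     # combining acute
    Tone.RISING: "\u030c",   # combining caron
}


class ToyIPAScheme:
    def render(self, word: Tuple[Syl, ...]) -> str:
        parts = []
        for s in word:
            nucleus = s.vowel[0] + _TONE_DIACRITIC[s.tone] + s.vowel[1:]
            if s.long:
                nucleus += "\u02d0"  # IPA length mark ː
            parts.append((s.onset or "") + nucleus + (s.coda or ""))
        # NFC folds e.g. "a" + combining caron into the precomposed "ǎ".
        return unicodedata.normalize("NFC", ".".join(parts))


# Two unrelated syllables, หมา and ไก่, rendered as one string.
word = (Syl("m", "a", True, None, Tone.RISING), Syl("k", "a", False, "j", Tone.LOW))
scheme: Scheme = ToyIPAScheme()
print(scheme.render(word))  # mǎː.kàj
```

Since `Syl` is frozen, a scheme physically cannot write back into the analysis, which keeps every scheme a pure function of the shared representation.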
Sections in this chapter
- Consonant classes — the three-class system and why it matters for tone.
- Tone derivation — the tone matrix and how tone is calculated.
- Vowels — the vowel inventory, length, and orthographic patterns.
- Codas and the six-way merge — how 26 letters collapse to 6 coda phonemes.
- Onset clusters — genuine clusters vs. aksornam patterns.
- Special cases — leading ห, Sara Am, thanthakhat (killer mark), ทร, ฤ.
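The coda merge is essentially a many-to-one lookup from written final consonants to phonemes. The sketch below is illustrative only: its letter list was assembled for this example and is not necessarily the 26-letter set the codas chapter counts, and the names `CODA_MERGE` and `coda_phoneme` are hypothetical. The glides /w/ and /j/ (from final ว and ย) are kept outside the merge, which is how six merged codas and eight coda phonemes can both hold.

```python
# Hypothetical lookup: written final consonant -> coda phoneme (six-way merge).
# Illustrative, not exhaustive; thaiphon's own table may differ.
CODA_MERGE = {
    **dict.fromkeys("กขคฆ", "k"),
    **dict.fromkeys("จชซฎฏฐฑฒดตถทธศษส", "t"),
    **dict.fromkeys("บปพฟภ", "p"),
    **dict.fromkeys("ญณนรลฬ", "n"),
    "ม": "m",
    "ง": "ŋ",
}

# Final ว and ย surface as the glides /w/ and /j/; they are outside the six-way merge.
GLIDE_CODAS = {"ว": "w", "ย": "j"}


def coda_phoneme(letter: str) -> str:
    if letter in CODA_MERGE:
        return CODA_MERGE[letter]
    if letter in GLIDE_CODAS:
        return GLIDE_CODAS[letter]
    raise ValueError(f"{letter!r} is not treated as a final consonant in this sketch")


# Many letters, one phoneme: ด, ต, ท, ส and ศ all surface as /t/.
print(sorted({coda_phoneme(c) for c in "ดตทสศ"}))  # ['t']
```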
Key references
thaiphon's phonological rules draw on publicly available descriptions of Thai phonology. Key references include:
- Mary Haas, Thai Reader and Thai-English Student's Dictionary — foundational description of Thai phonology.
- Appendix:Thai pronunciation on the English Wiktionary — the notation standard for the IPA scheme.
- Royal Institute of Thailand, Royal Thai General System of Transcription — the official RTGS standard.
- thai-language.com — the source of the TLC Enhanced Phonemic notation.