The phonological model

This section describes how thaiphon analyses Thai orthography into phonological representations. It is written for readers familiar with basic linguistics — phoneme, syllable, tone — but does not assume prior knowledge of Thai.

Audience

This section is aimed at linguists, Thai grammar specialists, and curious developers. It explains the phonological reasoning behind thaiphon's rules, not just the code that implements them.


Overview

Thai is a tonal language written in an abugida, a script in which vowel signs attach to consonant letters. The relationship between spelling and pronunciation is systematic but non-trivial:

  • There are 44 consonant letters but only 21 distinct consonant phonemes in onset position (and 8 in coda position).
  • Tone is not written with a single symbol. It follows from a combination of: the consonant class of the onset, the shape of the syllable (live or dead), the length of the vowel, and any tone mark present.
  • Vowels are written in a discontinuous, multi-directional arrangement around the onset consonant.
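  • Some written consonants are purely orthographic: they shift tone but are not pronounced. For example, the leading ห in หมา /mǎː/ "dog" is silent; it only makes the low-class ม behave as high class, so the syllable takes a rising tone instead of a mid tone.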
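
The way tone falls out of consonant class, syllable type, vowel length, and tone mark can be sketched as a small lookup. This is a minimal illustration of the standard textbook tone rules, not thaiphon's actual implementation: the function name `assign_tone` and its parameters are hypothetical, and the enums only borrow the value names used elsewhere in this section. It also assumes that mai tri and mai jattawa always yield high and rising tone, which matches their regular use on mid-class onsets.

```python
from enum import Enum, auto


class EffectiveClass(Enum):
    HIGH = auto()
    MID = auto()
    LOW = auto()


class ToneMark(Enum):
    NONE = auto()
    MAI_EK = auto()
    MAI_THO = auto()
    MAI_TRI = auto()
    MAI_JATTAWA = auto()


class Tone(Enum):
    MID = auto()
    LOW = auto()
    FALLING = auto()
    HIGH = auto()
    RISING = auto()


def assign_tone(cls, mark, live, long_vowel):
    """Hypothetical helper: derive the tone of one syllable."""
    # A tone mark overrides syllable type; only the consonant class matters.
    if mark is ToneMark.MAI_EK:
        return Tone.FALLING if cls is EffectiveClass.LOW else Tone.LOW
    if mark is ToneMark.MAI_THO:
        return Tone.HIGH if cls is EffectiveClass.LOW else Tone.FALLING
    if mark is ToneMark.MAI_TRI:
        return Tone.HIGH
    if mark is ToneMark.MAI_JATTAWA:
        return Tone.RISING
    # Unmarked live syllables: high class rises, everything else is mid.
    if live:
        return Tone.RISING if cls is EffectiveClass.HIGH else Tone.MID
    # Unmarked dead syllables: vowel length only matters for low class.
    if cls is EffectiveClass.LOW:
        return Tone.FALLING if long_vowel else Tone.HIGH
    return Tone.LOW


# มา "come": low-class onset, live syllable, no mark.
tone_maa = assign_tone(EffectiveClass.LOW, ToneMark.NONE, True, True)
# หมา "dog": silent ห makes the effective class high.
tone_dog = assign_tone(EffectiveClass.HIGH, ToneMark.NONE, True, True)
```

Here `tone_maa` is `Tone.MID` and `tone_dog` is `Tone.RISING`, which is the contrast that the silent ห creates.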

thaiphon models Thai phonology in layers, each implemented as a separate derivation module:

| Layer | What it does |
| --- | --- |
| Unicode normalisation | NFC + mark reordering + variation-selector strip |
| Expansion | Sara Am decomposition, ๆ repetition, ฯลฯ, Thai digit spelling |
| Syllabification | Segment the word into syllable-sized chunks; rank candidates |
| Onset resolution | Identify the onset consonant(s) and their IPA phoneme(s) |
| Vowel resolution | Identify the vowel nucleus, length, and any offglide |
| Coda resolution | Identify the final consonant and its collapsed phoneme |
| Syllable-type classification | Live vs. dead syllable |
| Tone assignment | Lookup in the tone matrix |
| PhonologicalWord assembly | Immutable frozen tuple of Syllables |
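
As a rough illustration of the first two layers, the sketch below chains a normalisation step and a Sara Am decomposition step using only the Python standard library. The function names (`normalise`, `decompose_sara_am`, `run_layers`) are hypothetical and are not thaiphon's API. The sketch skips the extra mark reordering, ๆ repetition, ฯลฯ, and digit spelling, and it strips only the basic variation selectors U+FE00 to U+FE0F.

```python
import unicodedata

SARA_AM = "\u0e33"   # ำ
NIKHAHIT = "\u0e4d"  # ํ
SARA_AA = "\u0e32"   # า


def normalise(text):
    """Apply NFC, then drop variation selectors U+FE00..U+FE0F."""
    text = unicodedata.normalize("NFC", text)
    return "".join(ch for ch in text if not "\ufe00" <= ch <= "\ufe0f")


def decompose_sara_am(text):
    """Split Sara Am into nikhahit + Sara Aa (NFC leaves it composed)."""
    return text.replace(SARA_AM, NIKHAHIT + SARA_AA)


def run_layers(text, layers):
    """Feed the output of each layer into the next one."""
    for layer in layers:
        text = layer(text)
    return text


# น้ำ "water": consonant, mai tho, Sara Am, plus a stray variation selector.
prepared = run_layers("\u0e19\u0e49\u0e33\ufe0f", [normalise, decompose_sara_am])
```

After both steps, `prepared` holds น, mai tho, nikhahit, and Sara Aa, with the variation selector removed. Keeping each layer as a plain function from string to string makes the steps easy to test in isolation.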

The phonological word

Every thaiphon analysis produces a PhonologicalWord — an immutable tuple of Syllable objects. Each Syllable carries:

| Field | Type | Description |
| --- | --- | --- |
| onset | Phoneme \| Cluster \| None | The initial consonant or cluster, in IPA |
| vowel | Phoneme | The nucleus vowel quality, in IPA |
| vowel_length | VowelLength | SHORT or LONG |
| coda | Phoneme \| None | The final consonant phoneme, or None for open syllables |
| tone | Tone | MID, LOW, FALLING, HIGH, or RISING |
| tone_mark | ToneMark | NONE, MAI_EK, MAI_THO, MAI_TRI, MAI_JATTAWA |
| effective_class | EffectiveClass | HIGH, MID, or LOW (after leading ห adjustments) |
| syllable_type | SyllableType | LIVE or DEAD |
| raw | str | The raw Thai orthographic slice for this syllable |

This intermediate representation is completely scheme-independent. Schemes only read it; they never modify it.
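
One way to picture a read-only intermediate representation is a tuple of frozen dataclasses. The sketch below is an assumption about shape only: it uses plain strings where the table above lists dedicated types, keeps just a subset of the fields, and the `render_plain` function is a hypothetical stand-in for a scheme that ignores tone.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Optional, Tuple


@dataclass(frozen=True)
class Syllable:
    """Simplified stand-in: strings instead of Phoneme/Tone types."""
    onset: Optional[str]
    vowel: str
    vowel_length: str  # "SHORT" or "LONG"
    coda: Optional[str]
    tone: str
    raw: str


PhonologicalWord = Tuple[Syllable, ...]


def render_plain(word):
    """Hypothetical scheme: reads the syllables, never mutates them."""
    parts = []
    for syl in word:
        length = "ː" if syl.vowel_length == "LONG" else ""
        parts.append((syl.onset or "") + syl.vowel + length + (syl.coda or ""))
    return ".".join(parts)


# นาที "minute": two open, long, mid-tone syllables.
word = (
    Syllable("n", "a", "LONG", None, "MID", "นา"),
    Syllable("tʰ", "i", "LONG", None, "MID", "ที"),
)
rendered = render_plain(word)

try:
    word[0].tone = "LOW"  # rejected because the dataclass is frozen
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Here `rendered` is `naː.tʰiː` and `mutated` stays `False`, since a frozen dataclass raises on attribute assignment. Any number of schemes can therefore share one analysis without defensive copying.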


Key references

thaiphon's phonological rules draw on publicly available descriptions of Thai phonology, including:

  • Mary Haas, Thai Reader and Thai-English Student's Dictionary — foundational description of Thai phonology.
  • Appendix:Thai pronunciation on the English Wiktionary — the notation standard for the IPA scheme.
  • Royal Institute of Thailand, Royal Thai General System of Transcription — the official RTGS standard.
  • thai-language.com — the source of the TLC Enhanced Phonemic notation.