Vowels


Thai has a rich vowel system with a phonemic length distinction (short vs. long) for every vowel quality, and three centring diphthongs. In writing, vowel symbols are placed above, below, before, and after the consonant they belong to — sometimes in combination.


The vowel inventory

thaiphon uses IPA vowel quality labels as internal phoneme symbols. The vowel map below shows both the IPA representation and the typical orthographic pattern.

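| Quality | IPA | Short Thai spelling | Long Thai spelling | Notes |
|---|---|---|---|---|
| /a/ | a | ◌ะ, ◌ั | ◌า | Short is ◌ะ (sara a) in open syllables and ◌ั (mai han-akat) before a coda; long uses ◌า (sara aa) |
| /i/ | i | ◌ิ | ◌ี | Short sara i; long sara ii |
| /u/ | u | ◌ุ | ◌ู | Short sara u; long sara uu |
| /e/ | e | เ◌ะ, เ◌็ | เ◌ | Short is เ◌ะ in open syllables and takes mai tai khu (เ◌็) before a coda; long uses just เ |
| /ɛ/ | ɛ | แ◌ะ, แ◌็ | แ◌ | Short is แ◌ะ in open syllables and takes mai tai khu (แ◌็) before a coda; long uses just แ |
| /o/ | o | โ◌ะ, unwritten | โ◌ | Short is โ◌ะ in open syllables and has no written vowel before a coda (คน); long uses โ◌ |
| /ɔ/ | ɔ | เ◌าะ | ◌อ | Short uses the เ◌าะ frame; long uses ◌อ |
| /ɯ/ | ɯ | ◌ึ | ◌ื | Short sara ue; long sara uee (written ◌ือ in open syllables) |
| /ɤ/ | ɤ | เ◌อะ | เ◌อ, เ◌ิ | Spelling depends on syllable shape: short is เ◌อะ; long is เ◌อ in open syllables and เ◌ิ before most codas (เดิน) |
| /ia/ | iə | เ◌ียะ | เ◌ีย | Centring diphthong |
| /ɯa/ | ɯə | เ◌ือะ | เ◌ือ | Centring diphthong |
| /ua/ | uə | ◌ัวะ | ◌ัว | Centring diphthong |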
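To make the idea of a vowel map concrete, here is a minimal sketch of its open-syllable half as a Python dictionary. It is hypothetical and is not thaiphon's internal data. `OPEN_FRAMES` and `spell_open` are invented names, and `C` is a placeholder for the onset consonant. Closed-syllable spellings such as ◌ั, เ◌็, and unwritten short /o/ are deliberately left out.

```python
# Hypothetical sketch, not thaiphon's internal vowel map.
# Open-syllable (no coda) frames only; "C" marks where the onset consonant goes.
OPEN_FRAMES = {
    # quality: (short frame, long frame)
    "a": ("Cะ", "Cา"),
    "i": ("Cิ", "Cี"),
    "u": ("Cุ", "Cู"),
    "e": ("เCะ", "เC"),
    "ɛ": ("แCะ", "แC"),
    "o": ("โCะ", "โC"),
    "ɔ": ("เCาะ", "Cอ"),
    "ɯ": ("Cึ", "Cือ"),
    "ɤ": ("เCอะ", "เCอ"),
    "ia": ("เCียะ", "เCีย"),
    "ɯa": ("เCือะ", "เCือ"),
    "ua": ("Cัวะ", "Cัว"),
}


def spell_open(quality: str, is_long: bool, onset: str) -> str:
    """Spell an open syllable by dropping the onset letter into the vowel frame."""
    short_frame, long_frame = OPEN_FRAMES[quality]
    frame = long_frame if is_long else short_frame
    return frame.replace("C", onset)


print(spell_open("ɔ", False, "ก"))  # เกาะ ('island')
print(spell_open("a", True, "ก"))   # กา ('crow')
```

Because every quality maps to a short/long pair, the phonemic length contrast maps directly onto a choice between two frames.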

Vowel length

Vowel length is phonemically distinctive in Thai. Minimal pairs based on length alone are common:

| Short | Long |
|---|---|
| ขัน /kʰan˩˩˦/ (bowl) | ขาน /kʰaːn˩˩˦/ (to answer) |
| คน /kʰon˧/ (person) | โคน /kʰoːn˧/ (base, stump) |

The vowel_length field on a Syllable records VowelLength.SHORT or VowelLength.LONG.
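Below is a minimal sketch of how a length-only minimal pair can be detected. Only `VowelLength.SHORT`, `VowelLength.LONG`, and the `vowel_length` field come from the text above. The `Syllable` dataclass is a hypothetical stand-in, and its other fields (`onset`, `vowel`, `coda`, `tone`) are assumptions rather than thaiphon's actual class.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional


class VowelLength(Enum):
    SHORT = "short"
    LONG = "long"


@dataclass(frozen=True)
class Syllable:
    # Hypothetical stand-in: only vowel_length mirrors the documented field.
    onset: str
    vowel: str                  # IPA quality label, e.g. "o"
    vowel_length: VowelLength
    coda: Optional[str]
    tone: str                   # Chao tone letters, e.g. "˧"


def is_length_minimal_pair(a: Syllable, b: Syllable) -> bool:
    """True if a and b differ in vowel length and in nothing else."""
    if a.vowel_length == b.vowel_length:
        return False
    # Copy a with b's length; if everything now matches, length was the only difference.
    return replace(a, vowel_length=b.vowel_length) == b


kon = Syllable("kʰ", "o", VowelLength.SHORT, "n", "˧")   # คน 'person'
koon = Syllable("kʰ", "o", VowelLength.LONG, "n", "˧")   # โคน 'base, stump'
print(is_length_minimal_pair(kon, koon))  # True
```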


Sara Am: a special case

Sara Am (◌ำ, U+0E33) is not a simple vowel mark. It decomposes as:

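  • A vowel nucleus /a/. It is short in most words (ทำ /tʰam˧/), but a few common words, such as น้ำ, are pronounced with long /aː/.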
  • A nasal /m/ coda.

thaiphon expands Sara Am in the normalisation phase. The word น้ำ (water) contains น + ◌้ + ◌ำ. It is analysed as onset /n/ + vowel /aː/ LONG + coda /m/ + high tone (from mai tho ◌้ on a low-class (LC) onset).
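Here is a minimal sketch of the Sara Am split. It assumes the input is a single, already-isolated syllable made of an onset, an optional tone mark, and ◌ำ. `expand_sara_am` is a hypothetical helper, not thaiphon's normalisation code. It only separates the pieces into nucleus /a/ and coda /m/, and leaves vowel length and tone to later steps.

```python
from typing import Dict, Optional

SARA_AM = "\u0e33"                                      # ◌ำ
TONE_MARKS = {"\u0e48", "\u0e49", "\u0e4a", "\u0e4b"}   # mai ek, mai tho, mai tri, mai chattawa


def expand_sara_am(syllable: str) -> Dict[str, Optional[str]]:
    """Hypothetical: split onset + optional tone mark + ◌ำ into its parts.

    Sara Am contributes the nucleus /a/ and the nasal coda /m/.
    Vowel length and tone are decided elsewhere.
    """
    if not syllable.endswith(SARA_AM):
        raise ValueError("syllable does not end in Sara Am")
    body = syllable[:-1]
    tone_mark: Optional[str] = None
    if body and body[-1] in TONE_MARKS:
        tone_mark, body = body[-1], body[:-1]
    return {"onset": body, "tone_mark": tone_mark, "vowel": "a", "coda": "m"}


parts = expand_sara_am("\u0e19\u0e49\u0e33")   # น้ำ 'water'
print(parts["onset"], parts["vowel"], parts["coda"])
```

In this sketch, tone marks sit between the onset and ◌ำ in stored text. That is why the helper peels off at most one tone mark before treating the rest as the onset.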

See Special cases for the full expansion logic.


Orthographic vowel frames

Thai vowel notation is positional. Some vowels are written before the consonant (pre-vowels), some above, some below, and some after. A single vowel phoneme may involve characters in multiple positions around the onset consonant.

Pre-vowels (written before the onset in text, but phonemically part of the nucleus):

  • เ: used in เ◌ (long /eː/), เ◌็ (short /e/), เ◌าะ (short /ɔ/), เ◌อ (long /ɤː/)
  • แ: used in แ◌ (long /ɛː/)
  • โ: used in โ◌ (long /oː/)

Post-base vowel marks (stored after the onset in text, but rendered above or below it):

  • Above: ◌ั ◌ิ ◌ี ◌ึ ◌ื ◌็
  • Below: ◌ุ ◌ู

thaiphon identifies these marks during syllabification and uses them alongside the presence/absence of a coda and the pre-vowel to determine the vowel phoneme.
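The lookup described above can be sketched as a table keyed by the written parts of the frame, with the coda flag used as a tie-breaker. This is a hypothetical illustration, not thaiphon's syllabifier. `FRAMES`, `CLOSED_ONLY`, and `identify_vowel` are invented names, and only a handful of frames are listed.

```python
from typing import Optional, Tuple

# Hypothetical frame table keyed by (pre-vowel, combining marks, following letters).
# Values are (IPA quality label, is_long). Only a few frames are listed.
FRAMES = {
    ("", "\u0e34", ""): ("i", False),               # ◌ิ
    ("", "\u0e35", ""): ("i", True),                # ◌ี
    ("", "\u0e38", ""): ("u", False),               # ◌ุ
    ("", "\u0e39", ""): ("u", True),                # ◌ู
    ("", "\u0e31", ""): ("a", False),               # ◌ั
    ("", "", "\u0e32"): ("a", True),                # ◌า
    ("\u0e40", "", ""): ("e", True),                # เ◌
    ("\u0e40", "\u0e47", ""): ("e", False),         # เ◌็
    ("\u0e41", "", ""): ("ɛ", True),                # แ◌
    ("\u0e42", "", ""): ("o", True),                # โ◌
    ("\u0e40", "", "\u0e32\u0e30"): ("ɔ", False),   # เ◌าะ
    ("", "", "\u0e2d"): ("ɔ", True),                # ◌อ
}

# Frames that only occur when a coda follows.
CLOSED_ONLY = {("", "\u0e31", ""), ("\u0e40", "\u0e47", "")}


def identify_vowel(pre: str, marks: str, post: str, has_coda: bool) -> Optional[Tuple[str, bool]]:
    """Return (quality, is_long) for one syllable's vowel frame, or None if unknown."""
    key = (pre, marks, post)
    if key == ("", "", ""):
        # No written vowel: short /o/ before a coda (as in คน); open case not handled.
        return ("o", False) if has_coda else None
    if key in CLOSED_ONLY and not has_coda:
        return None
    return FRAMES.get(key)


print(identify_vowel("", "", "", True))              # ('o', False)
print(identify_vowel("\u0e40", "\u0e47", "", True))  # ('e', False), as in เย็น
```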


Centring diphthongs

The three centring diphthongs, /ia/, /ɯa/ and /ua/, glide from a high vowel (front /i/, back /ɯ/ and /u/) toward the central schwa. Orthographically they are written as multi-part frames:

| Diphthong | Short form | Long form |
|---|---|---|
| /ia/ (เ◌ีย) | เ◌ียะ | เ◌ีย |
| /ɯa/ (เ◌ือ) | เ◌ือะ | เ◌ือ |
| /ua/ (◌ัว) | ◌ัวะ | ◌ัว |

In broad IPA, the long and short centring diphthongs are not distinguished. Both surface as iə, ɯə and uə. The length distinction is preserved in the internal representation but does not appear in the IPA output.
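This collapse can be sketched as a renderer that marks length on monophthongs only. `broad_ipa` and `DIPHTHONG_IPA` are hypothetical names, not thaiphon's scheme API. The sketch takes the internal quality label and a length flag, as described above.

```python
LENGTH_MARK = "\u02d0"  # ː (IPA length mark)

# Internal quality labels of the centring diphthongs mapped to their broad IPA form.
DIPHTHONG_IPA = {"ia": "iə", "ɯa": "ɯə", "ua": "uə"}


def broad_ipa(quality: str, is_long: bool) -> str:
    """Hypothetical renderer: monophthongs show length, diphthongs do not."""
    if quality in DIPHTHONG_IPA:
        # Short and long centring diphthongs share one broad form; length is dropped.
        return DIPHTHONG_IPA[quality]
    return quality + LENGTH_MARK if is_long else quality


print(broad_ipa("o", True), broad_ipa("ia", True), broad_ipa("ia", False))  # oː iə iə
```

The length flag is still passed in for diphthongs, so the information survives internally even though this output form ignores it.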


Offglides

Thai has two semi-vowel offglides that function as coda-like elements at the end of diphthongs:

  • /w/: from ว, or implicit in the frame เ◌า (/aw/)
  • /j/: from ย, or implicit in the frames ไ◌ and ใ◌ (/aj/)

These are classified as Phoneme objects with is_sonorant=True in the coda position, and they make the syllable live (not dead). Schemes render them as letters or diacritics appropriate to their notation.
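The live/dead distinction that these offglides feed into can be sketched as a small predicate. `is_live` is a simplified, hypothetical helper, not a documented thaiphon function. It takes the coda as an IPA string, or None for an open syllable. It encodes the standard rule: open syllables with long vowels are live, as are syllables closed by a nasal or an offglide. Open short syllables and syllables closed by /p t k/ are dead.

```python
from typing import Optional

SONORANT_CODAS = {"m", "n", "ŋ", "w", "j"}  # nasals plus the two offglides
STOP_CODAS = {"p", "t", "k"}                # unreleased stops


def is_live(is_long: bool, coda: Optional[str]) -> bool:
    """Hypothetical live/dead test for a Thai syllable."""
    if coda is None:
        # Open syllable: a long vowel makes it live, a short vowel makes it dead.
        return is_long
    if coda in SONORANT_CODAS:
        return True
    if coda in STOP_CODAS:
        return False
    raise ValueError(f"unexpected coda: {coda!r}")


print(is_live(False, "w"))  # True: short vowel + /w/ offglide, as in เขา
print(is_live(True, "k"))   # False: stop coda
```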