Phonological model

The phonological model is thaiphon's universal intermediate representation — the scheme-independent data structure that sits between orthography parsing and rendering.

Design principles

Immutable. Every object in the model is a frozen dataclass. Once computed, a PhonologicalWord cannot be changed. This makes the model safe to cache, pass between threads, and use as a dictionary key.
IPA-based. All phoneme symbols use IPA notation internally. Schemes translate from IPA to their target notation — they do not invent a private phoneme representation.
Slots. All frozen dataclasses use __slots__ for reduced memory overhead.
No I/O. The model objects contain no file paths, database references, or network handles.
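
The immutability and slots principles can be illustrated with a standalone sketch. `DemoPhoneme` below is a hypothetical stand-in, not thaiphon's actual class, and the way it declares `__slots__` is only one of several options the library might use.

```python
from dataclasses import FrozenInstanceError, dataclass


# Hypothetical stand-in for a model class; not thaiphon's real definition.
@dataclass(frozen=True)
class DemoPhoneme:
    __slots__ = ("symbol",)
    symbol: str  # IPA symbol


kh = DemoPhoneme("kʰ")

# Frozen: assigning to a field raises instead of mutating the object.
try:
    kh.symbol = "k"
    mutated = True
except FrozenInstanceError:
    mutated = False

# Frozen dataclasses with the default eq=True are hashable, so an equal
# instance finds the same dictionary entry.
cache = {kh: "cached rendering"}
hit = cache[DemoPhoneme("kʰ")]

# Slots: instances carry no per-object __dict__.
has_dict = hasattr(kh, "__dict__")
```

Equality and hashing are both derived from the field values, which is what makes a frozen object usable as a cache key.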

The class hierarchy

PhonologicalWord
  └── syllables: tuple[Syllable, ...]
        ├── onset:  Phoneme | Cluster | None
        │     ├── Phoneme.symbol: str   (IPA)
        │     └── Cluster.first + .second: Phoneme
        ├── vowel:  Phoneme
        ├── vowel_length: VowelLength   (SHORT | LONG)
        ├── coda:   Phoneme | None
        ├── tone:   Tone               (MID | LOW | FALLING | HIGH | RISING)
        ├── tone_mark: ToneMark        (NONE | MAI_EK | MAI_THO | MAI_TRI | MAI_JATTAWA)
        ├── effective_class: EffectiveClass (HIGH | MID | LOW)
        ├── syllable_type: SyllableType    (LIVE | DEAD)
        ├── raw: str                   (orthographic slice)
        └── inserted_vowel: bool       (True when a vowel was implicitly inserted)

PhonologicalWord

from thaiphon.model.word import PhonologicalWord

Field	Type	Description
`syllables`	`tuple[Syllable, ...]`	The syllables of the word, in order
`morpheme_boundaries`	`tuple[int, ...]`	Indices of morpheme boundaries (may be empty)
`confidence`	`float`	Syllabification confidence score (1.0 = lexicon hit)
`source`	`str`	`"lexicon"`, `"derivation"`, or `"derivation+lexicon"`
`raw`	`str`	The original input string

PhonologicalWord supports len() and iteration over its syllables:

from thaiphon import analyze

result = analyze("สวัสดี")
word = result.best

len(word)         # 3 (three syllables)
for syl in word:
    print(syl.tone.name)   # prints LOW, LOW, MID (one per line)
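
One way a frozen dataclass can support `len()` and iteration is to delegate both to its `syllables` tuple. The sketch below uses a hypothetical `DemoWord` with plain strings standing in for syllables; how thaiphon actually implements these methods is an assumption here.

```python
from dataclasses import dataclass
from typing import Iterator


# Hypothetical stand-in; syllables are plain strings to keep the sketch short.
@dataclass(frozen=True)
class DemoWord:
    syllables: tuple  # tuple of syllable objects, in order

    def __len__(self) -> int:
        # len(word) is the number of syllables.
        return len(self.syllables)

    def __iter__(self) -> Iterator:
        # Iterating the word iterates its syllables in order.
        return iter(self.syllables)


word = DemoWord(syllables=("sa", "wat", "dii"))
count = len(word)
collected = list(word)
```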

Syllable

from thaiphon.model.syllable import Syllable

Field	Type	Default	Description
`onset`	`Phoneme | Cluster | None`	—	Initial consonant(s)
`vowel`	`Phoneme`	—	Nucleus vowel
`vowel_length`	`VowelLength`	—	SHORT or LONG
`coda`	`Phoneme | None`	—	Final consonant, or None for open syllables
`tone`	`Tone`	—	Derived tone
`tone_mark`	`ToneMark`	`NONE`	Written tone mark, if any
`effective_class`	`EffectiveClass`	`MID`	Class used for tone lookup (after leading-ห adjustment)
`syllable_type`	`SyllableType`	`LIVE`	LIVE or DEAD
`raw`	`str`	`""`	Orthographic slice for this syllable
`inserted_vowel`	`bool`	`False`	True when an inherent vowel was inserted
`notes`	`tuple[str, ...]`	`()`	Diagnostic notes (for debugging)
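
The `tone_mark`, `effective_class`, `syllable_type`, and `vowel_length` fields are the inputs to the standard Thai tone rules, and `tone` is their output. The sketch below encodes the textbook rule table with plain strings in place of the enums; `lookup_tone` and `MARKED` are hypothetical names, and this is not thaiphon's derivation code.

```python
# Textbook tone rules for syllables that carry a written tone mark,
# keyed by (effective class, tone mark).
MARKED = {
    ("MID", "MAI_EK"): "LOW",
    ("MID", "MAI_THO"): "FALLING",
    ("MID", "MAI_TRI"): "HIGH",
    ("MID", "MAI_JATTAWA"): "RISING",
    ("HIGH", "MAI_EK"): "LOW",
    ("HIGH", "MAI_THO"): "FALLING",
    ("LOW", "MAI_EK"): "FALLING",
    ("LOW", "MAI_THO"): "HIGH",
}


def lookup_tone(effective_class, tone_mark, syllable_type, vowel_length):
    """Return the tone name for one syllable under the textbook rules."""
    if tone_mark != "NONE":
        # Raises KeyError for combinations outside the standard table.
        return MARKED[(effective_class, tone_mark)]
    if syllable_type == "LIVE":
        return "RISING" if effective_class == "HIGH" else "MID"
    # Unmarked dead syllables: only low class depends on vowel length.
    if effective_class == "LOW":
        return "HIGH" if vowel_length == "SHORT" else "FALLING"
    return "LOW"


unmarked_mid_live = lookup_tone("MID", "NONE", "LIVE", "LONG")
unmarked_high_dead = lookup_tone("HIGH", "NONE", "DEAD", "SHORT")
low_with_mai_tho = lookup_tone("LOW", "MAI_THO", "LIVE", "SHORT")
```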

Phoneme and Cluster

from thaiphon.model.phoneme import Phoneme, Cluster

Phoneme is a single IPA phoneme:

Field	Type	Description
`symbol`	`str`	IPA symbol (e.g. `"kʰ"`, `"aː"`, `"m"`)
`is_aspirated`	`bool`	True for aspirated stops
`is_sonorant`	`bool`	True for sonorants (/m n ŋ j w r l/)

Cluster is a two-phoneme onset cluster:

Field	Type	Description
`first`	`Phoneme`	First consonant of the cluster
`second`	`Phoneme`	Second consonant (typically /r/, /l/, or /w/)
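
Because an onset may be a `Phoneme`, a `Cluster`, or `None`, code that consumes it usually branches on the type before reading symbols. The sketch below uses hypothetical stand-in classes that mirror only the `symbol`, `first`, and `second` fields, plus a made-up ASCII table (`TOY_SCHEME`) to illustrate a scheme translating from IPA. None of it is thaiphon's own code.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


# Stand-ins mirroring the documented fields; not the library's classes.
@dataclass(frozen=True)
class DemoPhoneme:
    symbol: str


@dataclass(frozen=True)
class DemoCluster:
    first: DemoPhoneme
    second: DemoPhoneme


Onset = Optional[Union[DemoPhoneme, DemoCluster]]

# Made-up IPA-to-ASCII table, for illustration only.
TOY_SCHEME = {"kʰ": "kh", "k": "k", "r": "r", "l": "l", "w": "w"}


def onset_phonemes(onset: Onset) -> Tuple[DemoPhoneme, ...]:
    """Flatten an onset into zero, one, or two phonemes."""
    if onset is None:
        return ()
    if isinstance(onset, DemoCluster):
        return (onset.first, onset.second)
    return (onset,)


def render_onset(onset: Onset) -> str:
    """Translate each IPA symbol; unknown symbols pass through unchanged."""
    return "".join(TOY_SCHEME.get(p.symbol, p.symbol) for p in onset_phonemes(onset))


single = render_onset(DemoPhoneme("kʰ"))
cluster = render_onset(DemoCluster(DemoPhoneme("kʰ"), DemoPhoneme("w")))
empty = render_onset(None)
```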

Enumerations

from thaiphon.model.enums import Tone, VowelLength, SyllableType, ToneMark, EffectiveClass, ConsonantClass

All enumerations are str enums: each member is also a string and compares equal to its string value, which matches the member name:

from thaiphon.model.enums import Tone
Tone.MID == "MID"   # True

Enum	Values
`Tone`	`MID`, `LOW`, `FALLING`, `HIGH`, `RISING`
`VowelLength`	`SHORT`, `LONG`
`SyllableType`	`LIVE`, `DEAD`
`ToneMark`	`NONE`, `MAI_EK`, `MAI_THO`, `MAI_TRI`, `MAI_JATTAWA`
`EffectiveClass`	`HIGH`, `MID`, `LOW`
`ConsonantClass`	`HIGH`, `MID`, `LOW_PAIRED`, `LOW_SONORANT`
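
The comparison behaviour follows from mixing `str` into the enum. The sketch below defines a hypothetical `DemoTone` whose values equal the member names; that declaration style is an assumption about how the library's enums are written.

```python
from enum import Enum


# Hypothetical stand-in; assumes each value is the member name.
class DemoTone(str, Enum):
    MID = "MID"
    LOW = "LOW"
    FALLING = "FALLING"
    HIGH = "HIGH"
    RISING = "RISING"


# Comparison with a plain string uses the member's str value.
equal_to_string = DemoTone.MID == "MID"

# Calling the enum with a value returns the existing member.
from_string = DemoTone("LOW")

# Members hash like their string values, so a plain-string key finds them.
labels = {DemoTone.HIGH: "high tone"}
by_plain_key = labels["HIGH"]
```

This makes the enums convenient to serialise or to compare against values read from configuration, at the cost of members being interchangeable with arbitrary strings.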

AnalysisResult

from thaiphon.model.candidate import AnalysisResult

Returned by analyze() and analyze_word():

Field	Type	Description
`best`	`PhonologicalWord`	Top-ranked phonological word
`alternatives`	`tuple[PhonologicalWord, ...]`	Lower-ranked candidates (may be empty)
`source`	`str`	`"lexicon"` or `"derivation"`
`raw`	`str`	The normalised input string
`loan_analysis`	`LoanAnalysis | None`	Foreignness detector output (observational only)

from thaiphon import analyze

result = analyze("น้ำ")
result.best           # PhonologicalWord
result.best.syllables # tuple of Syllable
result.raw            # 'น้ำ'
result.source         # 'lexicon' when the word is found in the lexicon
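
A common pattern is to treat `best` and `alternatives` as one ranked sequence. The helper below is a hypothetical convenience, not part of thaiphon. It relies only on the two documented fields and is exercised here with a `SimpleNamespace` stand-in rather than a real `AnalysisResult`.

```python
from types import SimpleNamespace


def ranked_candidates(result):
    """Return the best candidate followed by the alternatives, in rank order."""
    return (result.best, *result.alternatives)


# Stand-in object with the same two attributes; strings replace PhonologicalWord.
fake_result = SimpleNamespace(best="reading-1", alternatives=("reading-2", "reading-3"))
candidates = ranked_candidates(fake_result)

# An empty alternatives tuple yields a one-element sequence.
only_best = ranked_candidates(SimpleNamespace(best="reading-1", alternatives=()))
```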