Types¶
Public types and data structures in the thaiphon API.
AnalysisResult¶
Returned by analyze() and analyze_word().
@dataclass(frozen=True, slots=True)
class AnalysisResult:
    best: PhonologicalWord
    alternatives: tuple[PhonologicalWord, ...] = ()
    source: str = "derivation"
    raw: str = ""
    loan_analysis: LoanAnalysis | None = None
| Field | Type | Description |
|---|---|---|
| `best` | `PhonologicalWord` | Top-ranked phonological word |
| `alternatives` | `tuple[PhonologicalWord, ...]` | Lower-ranked candidates (often empty) |
| `source` | `str` | `"lexicon"`, `"derivation"`, or `"derivation+lexicon"` |
| `raw` | `str` | Normalised input string |
| `loan_analysis` | `LoanAnalysis \| None` | Foreignness detector result (observational) |
PhonologicalWord¶
An immutable tuple of syllables.
@dataclass(frozen=True, slots=True)
class PhonologicalWord:
    syllables: tuple[Syllable, ...]
    morpheme_boundaries: tuple[int, ...] = ()
    confidence: float = 1.0
    source: str = "derivation"
    raw: str = ""
| Field | Description |
|---|---|
| `syllables` | The word's syllables, in order |
| `morpheme_boundaries` | Syllable indices where morpheme breaks occur (may be empty) |
| `confidence` | Syllabification confidence (1.0 = lexicon; < 1.0 = ranked derivation) |
| `source` | `"lexicon"`, `"derivation"`, or `"derivation+lexicon"` |
| `raw` | Original Thai orthographic string |
Supports len() and iteration:
word = analyze("กรุงเทพ").best
len(word) # 2
list(word) # [Syllable(...), Syllable(...)]
word.syllables[0] # Syllable for กรุง
Syllable¶
One syllable in the phonological representation.
@dataclass(frozen=True, slots=True)
class Syllable:
    onset: Phoneme | Cluster | None
    vowel: Phoneme
    vowel_length: VowelLength
    coda: Phoneme | None
    tone: Tone
    tone_mark: ToneMark = ToneMark.NONE
    effective_class: EffectiveClass = EffectiveClass.MID
    syllable_type: SyllableType = SyllableType.LIVE
    raw: str = ""
    inserted_vowel: bool = False
    notes: tuple[str, ...] = ()
| Field | Type | Description |
|---|---|---|
| `onset` | `Phoneme \| Cluster \| None` | Initial consonant(s) |
| `vowel` | `Phoneme` | Nucleus vowel (IPA symbol) |
| `vowel_length` | `VowelLength` | `SHORT` or `LONG` |
| `coda` | `Phoneme \| None` | Final consonant, or `None` for open syllables |
| `tone` | `Tone` | Derived tone |
| `tone_mark` | `ToneMark` | Written tone mark (`NONE` if absent) |
| `effective_class` | `EffectiveClass` | Class used for tone lookup |
| `syllable_type` | `SyllableType` | `LIVE` or `DEAD` |
| `raw` | `str` | Orthographic slice for this syllable |
| `inserted_vowel` | `bool` | `True` when an inherent vowel was inserted |
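To show how `effective_class`, `syllable_type`, `vowel_length`, and `tone_mark` relate to `tone`, here is a sketch of the standard textbook Thai tone rules. The function `textbook_tone` is hypothetical and the enums are local stand-ins; thaiphon's internal tone matrix may differ in structure and coverage.

```python
from enum import Enum


class Tone(str, Enum):
    MID = "MID"
    LOW = "LOW"
    FALLING = "FALLING"
    HIGH = "HIGH"
    RISING = "RISING"


class ToneMark(str, Enum):
    NONE = "NONE"
    MAI_EK = "MAI_EK"
    MAI_THO = "MAI_THO"
    MAI_TRI = "MAI_TRI"
    MAI_JATTAWA = "MAI_JATTAWA"


class EffectiveClass(str, Enum):
    HIGH = "HIGH"
    MID = "MID"
    LOW = "LOW"


class SyllableType(str, Enum):
    LIVE = "LIVE"
    DEAD = "DEAD"


class VowelLength(str, Enum):
    SHORT = "SHORT"
    LONG = "LONG"


def textbook_tone(cls, syl_type, length, mark=ToneMark.NONE):
    """Hypothetical helper: textbook tone rules, not thaiphon's own table."""
    if mark is ToneMark.MAI_EK:
        return Tone.FALLING if cls is EffectiveClass.LOW else Tone.LOW
    if mark is ToneMark.MAI_THO:
        return Tone.HIGH if cls is EffectiveClass.LOW else Tone.FALLING
    if mark in (ToneMark.MAI_TRI, ToneMark.MAI_JATTAWA):
        # The textbook table defines these two marks for mid class only.
        if cls is not EffectiveClass.MID:
            raise LookupError(f"no textbook entry for {cls.value} + {mark.value}")
        return Tone.HIGH if mark is ToneMark.MAI_TRI else Tone.RISING
    # Unmarked syllables: class, liveness, and vowel length decide the tone.
    if syl_type is SyllableType.LIVE:
        return Tone.RISING if cls is EffectiveClass.HIGH else Tone.MID
    if cls is EffectiveClass.LOW:
        return Tone.HIGH if length is VowelLength.SHORT else Tone.FALLING
    return Tone.LOW


# กรุง: mid class, live, short vowel, no tone mark.
krung_tone = textbook_tone(EffectiveClass.MID, SyllableType.LIVE, VowelLength.SHORT)
# เทพ: low class, dead, long vowel, no tone mark.
thep_tone = textbook_tone(EffectiveClass.LOW, SyllableType.DEAD, VowelLength.LONG)
```

The sketch raises `LookupError` where the textbook table has no entry; this is only an analogy for the missing-entry case, not a statement about which combinations thaiphon rejects.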
Phoneme¶
A single IPA phoneme.
@dataclass(frozen=True, slots=True)
class Phoneme:
    symbol: str
    is_aspirated: bool = False
    is_sonorant: bool = False
| Field | Description |
|---|---|
| `symbol` | IPA symbol (e.g. `"kʰ"`, `"aː"`, `"m"`, `"p̚"`) |
| `is_aspirated` | `True` for aspirated stop onsets |
| `is_sonorant` | `True` for sonorants (/m n ŋ j w r l/) |
Cluster¶
A two-phoneme onset cluster.
Example — the onset of ปลา:
result = analyze("ปลา")
onset = result.best.syllables[0].onset
isinstance(onset, Cluster) # True
onset.first.symbol # 'p'
onset.second.symbol # 'l'
Enumerations¶
All enums are str enums: Tone.MID == "MID" is True.
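In practice, a `str` enum member compares equal to its plain string value, can be looked up from that value, and serialises to JSON as an ordinary string. The sketch below redefines `Tone` locally so that it runs without thaiphon installed; the members mirror the definition documented on this page.

```python
import json
from enum import Enum


class Tone(str, Enum):
    # Local copy of the documented members, for illustration only.
    MID = "MID"
    LOW = "LOW"
    FALLING = "FALLING"
    HIGH = "HIGH"
    RISING = "RISING"


# Members compare equal to their string values.
equals_plain_string = Tone.MID == "MID"

# A member can be recovered from its value.
from_value = Tone("FALLING")

# Members serialise as plain JSON strings, with no custom encoder.
as_json = json.dumps({"tone": Tone.RISING})
```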
Tone¶
from thaiphon.model.enums import Tone
class Tone(str, Enum):
    MID = "MID"
    LOW = "LOW"
    FALLING = "FALLING"
    HIGH = "HIGH"
    RISING = "RISING"
VowelLength¶
from thaiphon.model.enums import VowelLength
class VowelLength(str, Enum):
    SHORT = "SHORT"
    LONG = "LONG"
SyllableType¶
from thaiphon.model.enums import SyllableType
class SyllableType(str, Enum):
    LIVE = "LIVE"
    DEAD = "DEAD"
ToneMark¶
from thaiphon.model.enums import ToneMark
class ToneMark(str, Enum):
    NONE = "NONE"
    MAI_EK = "MAI_EK"            # ◌่
    MAI_THO = "MAI_THO"          # ◌้
    MAI_TRI = "MAI_TRI"          # ◌๊
    MAI_JATTAWA = "MAI_JATTAWA"  # ◌๋
EffectiveClass¶
from thaiphon.model.enums import EffectiveClass
class EffectiveClass(str, Enum):
    HIGH = "HIGH"
    MID = "MID"
    LOW = "LOW"
ConsonantClass¶
from thaiphon.model.enums import ConsonantClass
class ConsonantClass(str, Enum):
    HIGH = "HIGH"
    MID = "MID"
    LOW_PAIRED = "LOW_PAIRED"      # has a high-class counterpart
    LOW_SONORANT = "LOW_SONORANT"  # no high-class counterpart
SchemeMapping¶
See Write your own scheme for the full field reference.
Errors¶
from thaiphon.errors import (
    ThaiphonError,
    ParseError,
    NormalizationError,
    DerivationError,
    UnsupportedSchemeError,
    AmbiguousAnalysisError,
)
| Exception | When raised |
|---|---|
| `ThaiphonError` | Base class for all thaiphon exceptions |
| `ParseError` | Tokenization or orthography rejects input |
| `NormalizationError` | A combining mark appears at string start without a base character |
| `DerivationError` | A tone matrix lookup has no entry (non-standard mark combination) |
| `UnsupportedSchemeError` | The requested scheme is not registered |
| `AmbiguousAnalysisError` | Strict mode: multiple candidates tie after ranking |
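Because every exception derives from `ThaiphonError`, callers can catch the base class once and branch on the concrete type. The sketch below defines stand-in exception classes so that it runs without thaiphon installed. Direct inheritance from `ThaiphonError` is an assumption, and `classify_failure`, `safe_analyze`, `fake_analyze`, and the category labels are all hypothetical.

```python
class ThaiphonError(Exception):
    """Stand-in for the documented base class."""


class ParseError(ThaiphonError): ...
class NormalizationError(ThaiphonError): ...
class DerivationError(ThaiphonError): ...
class UnsupportedSchemeError(ThaiphonError): ...
class AmbiguousAnalysisError(ThaiphonError): ...


def classify_failure(exc: ThaiphonError) -> str:
    """Map an exception to a coarse, application-defined label."""
    if isinstance(exc, (ParseError, NormalizationError)):
        return "bad-input"
    if isinstance(exc, UnsupportedSchemeError):
        return "bad-config"
    if isinstance(exc, AmbiguousAnalysisError):
        return "ambiguous"
    return "derivation-failure"


def safe_analyze(func, text):
    """Call func(text); return (result, None) or (None, label) on failure."""
    try:
        return func(text), None
    except ThaiphonError as exc:
        return None, classify_failure(exc)


def fake_analyze(text: str) -> str:
    # Hypothetical stand-in for an analysis call: rejects a leading tone mark.
    if text and text[0] in "\u0e48\u0e49\u0e4a\u0e4b":
        raise NormalizationError("combining mark without a base character")
    return text


outcome_ok = safe_analyze(fake_analyze, "ปลา")
outcome_bad = safe_analyze(fake_analyze, "\u0e48ก")
```

Catching only `ThaiphonError` lets unrelated exceptions such as `TypeError` propagate, so programming errors are not reported as analysis failures.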