Reading profiles¶
A reading profile tells thaiphon which register of pronunciation to produce. Thai speech varies by formality, and some words — especially loanwords and Sanskrit/Pali-derived vocabulary — are pronounced differently in casual conversation versus broadcast speech versus scholarly citation.
Pass a profile to any transcribe, transcribe_word, transcribe_sentence, or analyze call:
```python
from thaiphon import transcribe

transcribe("ลิฟต์", scheme="ipa", profile="everyday")
# '/lif˦˥/'
transcribe("ลิฟต์", scheme="ipa", profile="etalon_compat")
# '/lip̚˦˥/'
```
The four profiles¶
everyday (default)¶
Colloquial urban pronunciation — the way educated Bangkok speakers say a word in normal conversation.
- Foreign codas in modern loanwords are preserved when the word is well-integrated into everyday speech. ลิฟต์ (elevator) keeps its /f/ coda in this profile.
- Foreign codas in other loans collapse to their native-Thai equivalents in everyday speech: กราฟ (graph) ends in /p̚/ and บัส (bus) in /t̚/, even though both keep their foreign codas in careful_educated.
- Cluster simplification applies where it is standard in colloquial speech.
Preservation is per-word, driven by the lexicon and attested usage — not by blanket rules that paper over lexical convention. Whether a given loanword keeps its foreign coda in everyday speech is recorded individually in the lexicon; the engine does not infer it from the spelling alone. This is one reason the thaiphon-data-volubilis package makes a meaningful difference to loanword accuracy.
This is the right profile for conversational Thai, teaching materials aimed at spoken language, and most general-purpose uses.
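The per-word preservation described above can be pictured as a lookup keyed by word rather than a spelling rule. The sketch below is a toy illustration only: `PRESERVING_PROFILES` and `keeps_foreign_coda` are hypothetical names, not thaiphon's lexicon format or internals, and the entries simply restate the three example words from this page (learned_full is left out because no values are given for it here).

```python
# Toy illustration only: NOT thaiphon's lexicon format or internals.
# Each entry lists the profiles in which a loanword keeps its foreign coda,
# restating the example words used on this page.
PRESERVING_PROFILES = {
    "ลิฟต์": {"everyday", "careful_educated"},
    "กราฟ": {"careful_educated"},
    "บัส": {"careful_educated"},
}


def keeps_foreign_coda(word: str, profile: str) -> bool:
    """Return True if the toy lexicon marks `word` as coda-preserving in `profile`."""
    # Words missing from the toy lexicon fall back to the native-Thai coda.
    return profile in PRESERVING_PROFILES.get(word, set())


for word in PRESERVING_PROFILES:
    print(word, keeps_foreign_coda(word, "everyday"))
```

The point of the design is that the decision lives in data: two words with the same final letter can behave differently in the same profile.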
careful_educated¶
Formal broadcast register — the pronunciation used by news anchors, teachers in formal settings, and educated speakers in careful speech.
- Preserves more foreign codas than everyday. กราฟ (graph) retains its /f/ coda in this profile, where everyday collapses it to /p/.
- Suitable for materials aimed at formal written or broadcast contexts.
learned_full¶
Full Indic-derived readings for Sanskrit and Pali loanwords — the way Pali/Sanskrit scholars or monks pronounce words in a liturgical or academic context.
- Appropriate for texts analysing Classical Thai literature, religious texts, or Indic etymology. The set of words affected is determined by the lexicon.
etalon_compat¶
Dictionary-citation style — collapses every foreign coda to its native-Thai phonological equivalent, regardless of register or lexical convention.
- All /f/ codas become /p̚/, all /s/ codas become /t̚/, all /l/ codas become /n/.
- This profile matches the transcription style used in many Thai pronunciation dictionaries that record the phonologically "corrected" form rather than the attested spoken form.
- Also used as the baseline for accuracy benchmarking against Wiktionary IPA, because Wiktionary editors generally record dictionary-style citations.
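The blanket collapse listed above amounts to a fixed mapping from foreign coda to native-Thai coda. A minimal sketch, assuming codas are handled as bare IPA symbols; `NATIVE_CODA` and `collapse_coda` are hypothetical helpers for illustration, not part of thaiphon:

```python
# Hypothetical helper, not part of thaiphon: the blanket coda mapping
# described for etalon_compat, applied to a bare coda symbol.
NATIVE_CODA = {"f": "p̚", "s": "t̚", "l": "n"}


def collapse_coda(coda: str) -> str:
    """Map a foreign coda to its native-Thai equivalent; leave other codas unchanged."""
    return NATIVE_CODA.get(coda, coda)


print([collapse_coda(c) for c in ("f", "s", "l", "n")])
```

Unlike the other profiles, nothing here depends on which word the coda belongs to.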
Examples¶
The most visible difference between profiles is in loanwords that end in consonants foreign to Thai phonotactics.
| Word | Gloss | everyday | careful_educated | etalon_compat |
|---|---|---|---|---|
| ลิฟต์ | elevator | /lif˦˥/ | /lif˦˥/ | /lip̚˦˥/ |
| กราฟ | graph | /kraːp̚˦˥/ | /kraːf˦˥/ | /kraːp̚˦˥/ |
| บัส | bus | /bat̚˨˩/ | /bas˨˩/ | /bat̚˨˩/ |
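A comparison like the one in the table can be produced for any word by calling the transcriber once per profile. The sketch below assumes only the `transcribe(word, scheme=..., profile=...)` call shape shown on this page; `compare_profiles` is a hypothetical helper, and the transcriber is passed in as an argument so the block runs without thaiphon installed.

```python
from typing import Callable, Dict, Iterable

# Profile names as listed on this page.
PROFILES = ("everyday", "careful_educated", "learned_full", "etalon_compat")


def compare_profiles(
    word: str,
    transcribe_fn: Callable[..., str],
    profiles: Iterable[str] = PROFILES,
) -> Dict[str, str]:
    """Transcribe `word` once per profile and return {profile: transcription}."""
    # Hypothetical helper: relies only on the scheme/profile keyword arguments.
    return {p: transcribe_fn(word, scheme="ipa", profile=p) for p in profiles}
```

With thaiphon installed, `compare_profiles("กราฟ", transcribe)` should return one IPA string per profile, which makes it easy to spot the words where the choice of profile matters.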
Which profile should I use?¶
For most teaching and learning purposes, everyday is the right choice. It reflects how Thai people actually speak.
Use careful_educated if you are producing materials for formal contexts — a broadcast transcript, a formal speech, or educational materials that emphasise careful pronunciation.
Use learned_full only when you specifically need the full Indic citation form, for example in a phonological analysis of Sanskrit borrowings in Thai.
Use etalon_compat only for benchmarking or if you specifically want to match dictionary-citation style.
Profile handling in the API¶
All four profile strings are accepted by every entry point:
```python
from thaiphon import transcribe, transcribe_sentence, analyze

# All three accept a profile kwarg.
transcribe("กราฟ", scheme="ipa", profile="careful_educated")
transcribe("บัส", scheme="ipa", profile="everyday")
analyze("ลิฟต์", profile="etalon_compat")
```
An unrecognised profile name raises ValueError: