Reading profiles¶

A reading profile tells thaiphon which register of pronunciation to produce. Thai speech varies by formality, and some words, especially loanwords and Sanskrit/Pali-derived vocabulary, are pronounced differently in casual conversation, broadcast speech, and scholarly citation.

Pass a profile to any transcribe, transcribe_word, transcribe_sentence, or analyze call:

from thaiphon import transcribe

transcribe("ลิฟต์", scheme="ipa", profile="everyday")
# '/lif˦˥/'

transcribe("ลิฟต์", scheme="ipa", profile="etalon_compat")
# '/lip̚˦˥/'

The four profiles¶

`everyday` (default)¶

Colloquial urban pronunciation — the way educated Bangkok speakers say a word in normal conversation.

Foreign codas in modern loanwords are preserved when the word is well-integrated into everyday speech. ลิฟต์ (elevator) keeps its /f/ coda in this profile.
In other loans the foreign coda is collapsed to its native-Thai equivalent. กราฟ (graph) and บัส (bus) end in /p̚/ and /t̚/ respectively in everyday speech.
Cluster simplification applies where it is standard in colloquial speech.

Preservation is per-word, driven by the lexicon and attested usage — not by blanket rules that paper over lexical convention. Whether a given loanword keeps its foreign coda in everyday speech is recorded individually in the lexicon; the engine does not infer it from the spelling alone. This is one reason the thaiphon-data-volubilis package makes a meaningful difference to loanword accuracy.
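One way to picture per-word preservation is as a lookup table keyed by word and profile, rather than a rule applied to spelling. The sketch below is a toy model only: `TOY_LEXICON`, `coda_for`, and `preserves_foreign_coda` are hypothetical names, the entries simply restate the three example words on this page, and thaiphon's real lexicon format is not shown here.

```python
# Toy model of per-word coda preservation. All names are hypothetical;
# this is not thaiphon's actual lexicon format.
FOREIGN_CODAS = {"f", "s", "l"}

# word -> {profile: coda}. Values restate the example words on this page.
# "p\u031a" and "t\u031a" are unreleased stops (p or t + U+031A).
TOY_LEXICON = {
    "ลิฟต์": {"everyday": "f", "careful_educated": "f", "etalon_compat": "p\u031a"},
    "กราฟ": {"everyday": "p\u031a", "careful_educated": "f", "etalon_compat": "p\u031a"},
    "บัส": {"everyday": "t\u031a", "careful_educated": "s", "etalon_compat": "t\u031a"},
}


def coda_for(word: str, profile: str) -> str:
    """Look up the recorded coda; nothing is inferred from the spelling."""
    return TOY_LEXICON[word][profile]


def preserves_foreign_coda(word: str, profile: str) -> bool:
    """True when the recorded coda is one of the foreign codas."""
    return coda_for(word, profile) in FOREIGN_CODAS
```

In this model ลิฟต์ and กราฟ share the same spelled final consonant but differ in `everyday`, which is the kind of distinction a spelling-based rule could not make.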

This is the right profile for conversational Thai, teaching materials aimed at spoken language, and most general-purpose uses.

`careful_educated`¶

Formal broadcast register — the pronunciation used by news anchors, teachers in formal settings, and educated speakers in careful speech.

Preserves more foreign codas than everyday. กราฟ (graph) retains its /f/ coda in this profile, where everyday collapses it to /p̚/.
Suitable for materials aimed at formal written or broadcast contexts.

`learned_full`¶

Full Indic-derived readings for Sanskrit and Pali loanwords — the way Pali/Sanskrit scholars or monks pronounce words in a liturgical or academic context.

Appropriate for texts analysing Classical Thai literature, religious texts, or Indic etymology. The set of words affected is determined by the lexicon.

`etalon_compat`¶

Dictionary-citation style — collapses every foreign coda to its native-Thai phonological equivalent, regardless of register or lexical convention.

All /f/ codas become /p̚/, all /s/ codas become /t̚/, all /l/ codas become /n/.
This profile matches the transcription style used in many Thai pronunciation dictionaries that record the phonologically "corrected" form rather than the attested spoken form.
It is also used as the baseline for accuracy benchmarking against Wiktionary IPA, because Wiktionary editors generally record dictionary-style citations.
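The blanket collapse described above can be sketched as a small string rewrite over IPA output. This is an illustration of the mapping only, not thaiphon's implementation: `collapse_foreign_codas` is a hypothetical helper, and it assumes a coda is an `f`, `s`, or `l` that sits directly before a tone letter.

```python
import re

# Coda mapping stated for etalon_compat on this page.
# "p\u031a" and "t\u031a" are unreleased stops (p or t + U+031A).
NATIVE_CODA = {"f": "p\u031a", "s": "t\u031a", "l": "n"}

# Assumption: a coda consonant is immediately followed by a tone letter
# (U+02E5 to U+02E9). Onset consonants are followed by a vowel, so they
# are left untouched.
_FOREIGN_CODA = re.compile(r"[fsl](?=[\u02e5-\u02e9])")


def collapse_foreign_codas(ipa: str) -> str:
    """Replace every foreign coda in an IPA string with its native equivalent."""
    return _FOREIGN_CODA.sub(lambda m: NATIVE_CODA[m.group(0)], ipa)


print(collapse_foreign_codas("/lif˦˥/"))  # /lip̚˦˥/
print(collapse_foreign_codas("/bas˨˩/"))  # /bat̚˨˩/
```

Unlike the other profiles, this rewrite needs no lexicon lookup, which is why it works as a fixed baseline.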

Examples¶

The most visible difference between profiles is in loanwords that end in consonants foreign to Thai phonotactics.

Word	Gloss	`everyday`	`careful_educated`	`etalon_compat`
ลิฟต์	elevator	`/lif˦˥/`	`/lif˦˥/`	`/lip̚˦˥/`
กราฟ	graph	`/kraːp̚˦˥/`	`/kraːf˦˥/`	`/kraːp̚˦˥/`
บัส	bus	`/bat̚˨˩/`	`/bas˨˩/`	`/bat̚˨˩/`
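A comparison like the table above can be produced for any word by calling the transcriber once per profile. The helper below is a minimal sketch: `compare_profiles` is a hypothetical name, and the transcriber is passed in as an argument so the helper does not depend on how thaiphon is installed. It assumes only the `scheme` and `profile` keyword arguments shown on this page.

```python
from typing import Callable, Dict, Iterable

# Profile names as listed on this page.
PROFILES = ("everyday", "careful_educated", "learned_full", "etalon_compat")


def compare_profiles(
    word: str,
    transcribe_fn: Callable[..., str],
    profiles: Iterable[str] = PROFILES,
    scheme: str = "ipa",
) -> Dict[str, str]:
    """Transcribe `word` once per profile and return {profile: transcription}."""
    return {p: transcribe_fn(word, scheme=scheme, profile=p) for p in profiles}


# Usage, assuming thaiphon is installed:
#   from thaiphon import transcribe
#   for profile, ipa in compare_profiles("กราฟ", transcribe).items():
#       print(f"{profile:18} {ipa}")
```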

Which profile should I use?¶

For most teaching and learning purposes, everyday is the right choice. It reflects how educated Bangkok speakers say a word in normal conversation.

Use careful_educated if you are producing materials for formal contexts — a broadcast transcript, a formal speech, or educational materials that emphasise careful pronunciation.

Use learned_full only when you specifically need the full Indic citation form, for example in a phonological analysis of Sanskrit borrowings in Thai.

Use etalon_compat only for benchmarking or if you specifically want to match dictionary-citation style.

Profile handling in the API¶

All four profile strings are accepted by every entry point:

from thaiphon import transcribe, transcribe_sentence, analyze

# All three accept a profile kwarg.
transcribe("กราฟ", scheme="ipa", profile="careful_educated")
transcribe("บัส", scheme="ipa", profile="everyday")
analyze("ลิฟต์", profile="etalon_compat")

An unrecognised profile name raises ValueError:

transcribe("กา", scheme="tlc", profile="bad_profile")
# ValueError: unknown reading profile 'bad_profile'; expected one of [...]
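When the profile name comes from user input or a config file, it can help to validate it before calling the library. The following is a sketch under assumptions: `resolve_profile` and `KNOWN_PROFILES` are hypothetical names, and the set is copied from the four profiles documented on this page, so it would need updating if the library's list changes.

```python
from typing import Optional

# Copied from this page; not read from thaiphon itself.
KNOWN_PROFILES = frozenset(
    {"everyday", "careful_educated", "learned_full", "etalon_compat"}
)
DEFAULT_PROFILE = "everyday"


def resolve_profile(name: Optional[str]) -> str:
    """Return a valid profile name, using the default when none is given."""
    if name is None:
        return DEFAULT_PROFILE
    if name not in KNOWN_PROFILES:
        expected = ", ".join(sorted(KNOWN_PROFILES))
        raise ValueError(f"unknown reading profile {name!r}; expected one of: {expected}")
    return name
```

The resolved name can then be passed as the `profile` argument of any entry point.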
