Reading profiles

A reading profile tells thaiphon which register of pronunciation to produce. Thai speech varies by formality, and some words — especially loanwords and Sanskrit/Pali-derived vocabulary — are pronounced differently in casual conversation versus broadcast speech versus scholarly citation.

Pass a profile to any transcribe, transcribe_word, transcribe_sentence, or analyze call:

from thaiphon import transcribe

transcribe("ลิฟต์", scheme="ipa", profile="everyday")
# '/lif˦˥/'

transcribe("ลิฟต์", scheme="ipa", profile="etalon_compat")
# '/lip̚˦˥/'

The four profiles

everyday (default)

Colloquial urban pronunciation — the way educated Bangkok speakers say a word in normal conversation.

  • Foreign codas in modern loanwords are preserved when the word is well-integrated into everyday speech. ลิฟต์ (elevator) keeps its /f/ coda in this profile.
  • Loanwords that are not conventionally pronounced with a foreign coda in casual speech have that coda collapsed to its native-Thai equivalent. กราฟ (graph) and บัส (bus) lose their foreign codas in this profile.
  • Cluster simplification applies where it is standard in colloquial speech.

Preservation is decided per word, from the lexicon and attested usage, rather than by blanket rules that would override lexical convention. Whether a given loanword keeps its foreign coda in everyday speech is recorded individually in the lexicon; the engine does not infer it from the spelling alone. This is one reason the thaiphon-data-volubilis package makes a meaningful difference to loanword accuracy.

This is the right profile for conversational Thai, teaching materials aimed at spoken language, and most general-purpose uses.
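
The per-word preservation described above can be pictured with a toy lookup. This is a minimal sketch, not thaiphon's actual lexicon schema: the `LoanEntry` class, its field names, `TOY_LEXICON`, and `coda_for` are all hypothetical, and the three entries simply restate the behaviour this page documents for ลิฟต์, กราฟ, and บัส. It covers only the three profiles whose coda behaviour is described here.

```python
from dataclasses import dataclass

# Profiles whose coda behaviour for these loanwords is documented on this page.
TOY_PROFILES = ("everyday", "careful_educated", "etalon_compat")


@dataclass(frozen=True)
class LoanEntry:
    """Hypothetical lexicon record; field names are illustrative only."""
    foreign_coda: str            # coda as borrowed
    native_coda: str             # coda after collapsing to Thai phonotactics
    keeps_foreign_in: frozenset  # profiles in which the foreign coda survives


# Assumed toy data, mirroring the examples given in this document.
TOY_LEXICON = {
    "ลิฟต์": LoanEntry("f", "p̚", frozenset({"everyday", "careful_educated"})),
    "กราฟ": LoanEntry("f", "p̚", frozenset({"careful_educated"})),
    "บัส": LoanEntry("s", "t̚", frozenset({"careful_educated"})),
}


def coda_for(word: str, profile: str) -> str:
    """Look up the coda for a word and profile; nothing is inferred from spelling."""
    if profile not in TOY_PROFILES:
        raise ValueError(f"profile {profile!r} is not covered by this toy sketch")
    entry = TOY_LEXICON[word]
    if profile in entry.keeps_foreign_in:
        return entry.foreign_coda
    return entry.native_coda


print(coda_for("ลิฟต์", "everyday"))  # f
print(coda_for("กราฟ", "everyday"))   # p̚
```

Both words are borrowed with an /f/ coda, yet they behave differently in everyday, which is why the decision has to live in per-word data.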

careful_educated

Formal broadcast register — the pronunciation used by news anchors, teachers in formal settings, and educated speakers in careful speech.

  • Preserves more foreign codas than everyday. กราฟ (graph) retains its /f/ coda in this profile, where everyday collapses it to /p/.
  • Suitable for materials aimed at formal written or broadcast contexts.

learned_full

Full Indic-derived readings for Sanskrit and Pali loanwords — the way Pali/Sanskrit scholars or monks pronounce words in a liturgical or academic context.

  • Appropriate for texts analysing Classical Thai literature, religious texts, or Indic etymology. The set of words affected is determined by the lexicon.

etalon_compat

Dictionary-citation style — collapses every foreign coda to its native-Thai phonological equivalent, regardless of register or lexical convention.

  • All /f/ codas become /p̚/, all /s/ codas become /t̚/, all /l/ codas become /n/.
  • This profile matches the transcription style used in many Thai pronunciation dictionaries that record the phonologically "corrected" form rather than the attested spoken form.
  • Also used as the baseline for accuracy benchmarking against Wiktionary IPA, because Wiktionary editors generally record dictionary-style citations.
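
The blanket collapse listed above amounts to a fixed mapping. The sketch below is illustrative only: `NATIVE_CODA` and `collapse_coda` are hypothetical names, not part of thaiphon, and the mapping contains just the three substitutions stated in this section.

```python
# Foreign coda -> native-Thai equivalent, as listed for etalon_compat above.
NATIVE_CODA = {
    "f": "p̚",
    "s": "t̚",
    "l": "n",
}


def collapse_coda(coda: str) -> str:
    """Return the native-Thai equivalent of a foreign coda.

    Codas that are not in the mapping are returned unchanged.
    """
    return NATIVE_CODA.get(coda, coda)


print(collapse_coda("f"))  # p̚
print(collapse_coda("n"))  # n
```

Unlike the per-word decisions made in everyday and careful_educated, this rule never consults the word itself.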

Examples

The most visible difference between profiles is in loanwords that end in consonants foreign to Thai phonotactics.

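| Word | Gloss | everyday | careful_educated | etalon_compat |
|------|-------|----------|------------------|---------------|
| ลิฟต์ | elevator | /lif˦˥/ | /lif˦˥/ | /lip̚˦˥/ |
| กราฟ | graph | /kraːp̚˦˥/ | /kraːf˦˥/ | /kraːp̚˦˥/ |
| บัส | bus | /bat̚˨˩/ | /bas˨˩/ | /bat̚˨˩/ |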
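
A comparison like this can be produced for any word by calling the transcriber once per profile. The helpers below are a hedged sketch: `compare_profiles` and `group_by_reading` are hypothetical names, not thaiphon functions. The transcriber is passed in as an argument, on the assumption that it accepts `scheme` and `profile` keyword arguments and returns a string, as `transcribe` does in the examples on this page.

```python
from typing import Callable, Dict, List, Sequence

# The four profile names documented on this page.
DOCUMENTED_PROFILES = ("everyday", "careful_educated", "learned_full", "etalon_compat")


def compare_profiles(
    word: str,
    transcribe_fn: Callable[..., str],
    profiles: Sequence[str] = DOCUMENTED_PROFILES,
) -> Dict[str, str]:
    """Transcribe one word under each profile and return {profile: reading}."""
    return {p: transcribe_fn(word, scheme="ipa", profile=p) for p in profiles}


def group_by_reading(readings: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert {profile: reading} into {reading: [profiles]} to show which profiles agree."""
    groups: Dict[str, List[str]] = {}
    for profile, ipa in readings.items():
        groups.setdefault(ipa, []).append(profile)
    return groups


# Usage with the real library (requires thaiphon to be installed):
#   from thaiphon import transcribe
#   print(group_by_reading(compare_profiles("กราฟ", transcribe)))
```

Passing the transcriber in keeps the helpers independent of the installed lexicon, so they can also be exercised with a stub.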

Which profile should I use?

For most teaching and learning purposes, everyday is the right choice. It reflects how educated Bangkok speakers say words in normal conversation.

Use careful_educated if you are producing materials for formal contexts — a broadcast transcript, a formal speech, or educational materials that emphasise careful pronunciation.

Use learned_full only when you specifically need the full Indic citation form, for example in a phonological analysis of Sanskrit borrowings in Thai.

Use etalon_compat only for benchmarking or if you specifically want to match dictionary-citation style.


Profile handling in the API

All four profile strings are accepted by every entry point:

from thaiphon import transcribe, transcribe_sentence, analyze

# Each of these entry points accepts a profile keyword argument (as does transcribe_word).
transcribe("กราฟ", scheme="ipa", profile="careful_educated")
transcribe("บัส", scheme="ipa", profile="everyday")
analyze("ลิฟต์", profile="etalon_compat")

An unrecognised profile name raises ValueError:

transcribe("กา", scheme="tlc", profile="bad_profile")
# ValueError: unknown reading profile 'bad_profile'; expected one of [...]
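
When the profile name comes from user input or a config file, it can be convenient to check it before calling the library. The following is a minimal sketch under stated assumptions: `KNOWN_PROFILES` and `require_known_profile` are hypothetical helpers, not part of thaiphon, and the error text is this sketch's own wording rather than the library's exact message.

```python
# The four profile names documented on this page.
KNOWN_PROFILES = ("everyday", "careful_educated", "learned_full", "etalon_compat")


def require_known_profile(name: str) -> str:
    """Return the name unchanged if it is a documented profile, else raise ValueError."""
    if name not in KNOWN_PROFILES:
        raise ValueError(
            f"unknown reading profile {name!r}; expected one of {list(KNOWN_PROFILES)}"
        )
    return name


try:
    require_known_profile("bad_profile")
except ValueError as exc:
    print(f"rejected: {exc}")
```

The same try/except shape works around a direct transcribe call, since the library raises ValueError for an unrecognised profile.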