Your first transcription

This page walks through the most common operations you will perform with thaiphon, with explanations of what each piece of output means.

Before you start

Make sure thaiphon is installed. If not, see Install or Install without Python experience.

The `transcribe` function

transcribe is the main entry point. It takes a Thai string and returns its transcription as a string in the chosen scheme.

from thaiphon import transcribe

transcribe("สวัสดี", format="html")
# 'sa<sup>L</sup> wat<sup>L</sup> dee<sup>M</sup>'

The default scheme is tlc (thai-language.com Enhanced Phonemic). Pass format="html" to render tone tags as <sup> elements, the form shown throughout these docs. The default, format="text", writes them as plain-text tags in curly braces ({L}, {M}, etc.).
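To make the relationship between the two formats concrete, here is a minimal sketch that rewrites plain-text tone tags as <sup> elements. It is not part of thaiphon: tone_tags_to_html is a hypothetical helper, and the input string is an assumed plain-text rendering of สวัสดี, inferred from the HTML output above.

```python
import re

# Matches a plain-text tone tag such as {L} or {M}.
TONE_TAG = re.compile(r"\{([MLHFR])\}")


def tone_tags_to_html(text: str) -> str:
    """Rewrite {X} tone tags as <sup>X</sup>.

    Hypothetical helper for illustration; thaiphon does this itself
    when you pass format="html".
    """
    return TONE_TAG.sub(r"<sup>\1</sup>", text)


# Assumed plain-text form of the tlc output for สวัสดี.
print(tone_tags_to_html("sa{L} wat{L} dee{M}"))
# sa<sup>L</sup> wat<sup>L</sup> dee<sup>M</sup>
```

In practice you would simply pass format="html" to transcribe; the sketch only shows that the two forms carry the same information.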

Choosing a scheme

# International Phonetic Alphabet
transcribe("สวัสดี", scheme="ipa")
# '/sa˨˩.wat̚˨˩.diː˧/'

# Cyrillic (Morev tradition) — format="html" gives superscript aspiration marks
transcribe("สวัสดี", scheme="morev", format="html")
# 'саˆ-ватˆ-дӣ'

# thai-language.com notation with superscript tones (html mode)
transcribe("สวัสดี", scheme="tlc", format="html")
# 'sa<sup>L</sup> wat<sup>L</sup> dee<sup>M</sup>'

See Schemes for a full comparison.

Sentence-level input

Use transcribe_sentence when your input contains pre-segmented words (separated by spaces or punctuation). It transcribes each token and joins results with spaces:

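from thaiphon import transcribe_sentence, transcribe_word

# Transcribe individual words: most reliable for single known words.
transcribe_word("ฉัน", scheme="ipa")    # '/tɕʰan˩˩˦/'
transcribe_word("ชอบ", scheme="ipa")    # '/tɕʰɔːp̚˥˩/'
transcribe_word("กิน", scheme="ipa")    # '/kin˧/'
transcribe_word("ข้าว", scheme="ipa")   # '/kʰaːw˥˩/'

# Transcribe the same words, space-separated, in one call (default scheme).
# Output is not shown here because it depends on the built-in dictionary.
transcribe_sentence("ฉัน ชอบ กิน ข้าว")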

Word segmentation

transcribe_sentence uses a dictionary-based longest-match segmenter. Results depend on which words are in the built-in dictionary. For best results with sentences, pre-segment your text and pass individual words to transcribe_word, or install pythainlp for improved automatic segmentation.
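To show why results depend on dictionary coverage, the following is a minimal sketch of greedy longest-match segmentation in general. It is not thaiphon's internal segmenter: longest_match_segment and the toy dictionary are assumptions made for illustration only.

```python
def longest_match_segment(text: str, dictionary: set[str]) -> list[str]:
    """Greedy longest-match segmentation over a word set (illustrative only)."""
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
        else:
            # No entry matched: emit one character so the scan always advances.
            tokens.append(text[i])
            i += 1
    return tokens


# Toy dictionary (hypothetical); a real one holds many thousands of entries.
toy_dictionary = {"ฉัน", "ชอบ", "กิน", "ข้าว", "ข้า"}
print(longest_match_segment("ฉันชอบกินข้าว", toy_dictionary))
# ['ฉัน', 'ชอบ', 'กิน', 'ข้าว']
```

When a word is missing from the dictionary, the fallback branch breaks it into single characters, which is the failure mode that pre-segmenting your text avoids.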

For single known words, transcribe_word is equivalent to transcribe:

from thaiphon import transcribe_word

transcribe_word("น้ำ", scheme="ipa")
# '/naːm˦˥/'

Reading the IPA output

If you choose the ipa scheme, the output uses standard IPA conventions:

| Symbol | Meaning |
|---|---|
| `/…/` | phonemic slashes wrapping the whole word |
| `.` | syllable boundary |
| `ː` | long vowel (e.g. `aː` = long /a/) |
| `p̚` `t̚` `k̚` | unreleased stop codas |
| `˧` | mid tone |
| `˨˩` | low tone |
| `˥˩` | falling tone |
| `˦˥` | high tone |
| `˩˩˦` | rising tone |

Example: /naːm˦˥/ = onset /n/, long /aː/ vowel, /m/ coda, high tone.
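The table can be applied mechanically. The sketch below splits an ipa-scheme string into syllables and names each tone; describe_ipa is a hypothetical helper written for this page, and it assumes the output always has the shape shown above (slashes, dot-separated syllables, tone letters at the end of each syllable).

```python
# Tone contours from the table above, mapped to their names.
TONE_NAMES = {
    "˧": "mid",
    "˨˩": "low",
    "˥˩": "falling",
    "˦˥": "high",
    "˩˩˦": "rising",
}
TONE_LETTERS = "˥˦˧˨˩"


def describe_ipa(ipa: str) -> list[tuple[str, str]]:
    """Return (segments, tone name) for each syllable of an IPA string."""
    result = []
    for syllable in ipa.strip("/").split("."):
        # Everything before the trailing tone letters is the segmental part.
        segments = syllable.rstrip(TONE_LETTERS)
        contour = syllable[len(segments):]
        result.append((segments, TONE_NAMES.get(contour, "unknown")))
    return result


print(describe_ipa("/sa˨˩.wat̚˨˩.diː˧/"))
# [('sa', 'low'), ('wat̚', 'low'), ('diː', 'mid')]
```

If you need this information reliably, the structured result of analyze is a better source than parsing strings.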

Reading the TLC output

The tlc scheme uses plain ASCII letters and is readable without special fonts:

| Element | Notation |
|---|---|
| Long vowels | doubled letter: `aa`, `ee`, `uu` |
| Aspirated stops | `kh`, `th`, `ph`, `ch` |
| Unaspirated stops | `g` (k), `dt` (t), `bp` (p), `j` (tɕ) |
| Tones | `{M}` mid, `{L}` low, `{H}` high, `{F}` falling, `{R}` rising |

Example: naam{H} = /n/ onset, long /aa/, /m/ coda, high tone.
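Because the plain-text form is ASCII, the tone of each syllable is easy to pull out with a regular expression. This is a rough sketch, assuming every syllable ends in exactly one tag from the table above; split_tlc is a hypothetical helper, not a thaiphon function.

```python
import re

# Tone tags from the table above, mapped to their names.
TLC_TONES = {"M": "mid", "L": "low", "H": "high", "F": "falling", "R": "rising"}

# A run of non-space, non-brace characters followed by one tone tag.
TAGGED_SYLLABLE = re.compile(r"([^\s{}]+)\{([MLHFR])\}")


def split_tlc(text: str) -> list[tuple[str, str]]:
    """Return (letters, tone name) for each tagged syllable."""
    return [(letters, TLC_TONES[tag]) for letters, tag in TAGGED_SYLLABLE.findall(text)]


print(split_tlc("naam{H}"))
# [('naam', 'high')]
```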

A sample word list

from thaiphon import transcribe

words = {
    "สวัสดี": "hello",
    "น้ำ":    "water",
    "ข้าว":   "rice",
    "รัก":    "love",
    "ปลา":    "fish",
    "ภาษาไทย": "Thai language",
    "กรุงเทพ": "Bangkok",
    "ผลไม้":  "fruit",
}

for thai, gloss in words.items():
    ipa = transcribe(thai, scheme="ipa")
    tlc = transcribe(thai, scheme="tlc", format="html")
    print(f"{thai:12} ({gloss:15}) IPA: {ipa:30} TLC: {tlc}")

Accessing the phonological structure

If you need more than a string — for instance, to know the tone of each syllable individually — use analyze:

from thaiphon import analyze

result = analyze("ผลไม้")

for syl in result.best.syllables:
    print(
        f"onset={syl.onset.symbol if syl.onset else '∅':6} "
        f"vowel={syl.vowel.symbol:4} "
        f"length={syl.vowel_length.name:6} "
        f"coda={syl.coda.symbol if syl.coda else '∅':4} "
        f"tone={syl.tone.name}"
    )

Output (ผลไม้ has three syllables: ผล = pʰ+ɔ+n, ล = l+a, ไม้ = m+aː+j):

onset=pʰ     vowel=ɔ    length=SHORT  coda=n    tone=RISING
onset=l      vowel=a    length=SHORT  coda=∅    tone=HIGH
onset=m      vowel=a    length=LONG   coda=j    tone=HIGH

See analyze for the full API documentation.

Next steps

Reading profiles — adjust pronunciation register.
Schemes — understand what each scheme produces.
Override lexicons — inject authoritative pronunciations for specific words.
Troubleshooting — common problems.
