Your first transcription
This page walks through the most common operations you will perform with thaiphon, with explanations of what each piece of output means.
Before you start
Make sure thaiphon is installed. If not, see Install or Install without Python experience.
The transcribe function
transcribe is the main entry point. It takes a Thai string and returns its transcription in the chosen scheme.
from thaiphon import transcribe
transcribe("สวัสดี", format="html")
# 'sa<sup>L</sup> wat<sup>L</sup> dee<sup>M</sup>'
The default scheme is tlc (thai-language.com Enhanced Phonemic). Pass format="html" to get tone tags as <sup> elements, the form shown throughout these docs. With the default format="text", tone tags are written in curly braces instead ({L}, {M}, etc.).
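The two formats differ only in how the tone tag is written. As a rough illustration, the standard-library sketch below converts between them. It assumes the plain-text form of the example above is `sa{L} wat{L} dee{M}`, which is inferred from the description here rather than copied from thaiphon output. The helper names `text_to_html` and `html_to_text` are hypothetical and not part of thaiphon.

```python
import re

# Tone letters used by the tlc scheme: mid, low, high, falling, rising.
_TEXT_TAG = re.compile(r"\{([MLHFR])\}")
_HTML_TAG = re.compile(r"<sup>([MLHFR])</sup>")


def text_to_html(s: str) -> str:
    """Rewrite {X} tone tags as <sup>X</sup> (hypothetical helper)."""
    return _TEXT_TAG.sub(r"<sup>\1</sup>", s)


def html_to_text(s: str) -> str:
    """Rewrite <sup>X</sup> tone tags as {X} (hypothetical helper)."""
    return _HTML_TAG.sub(r"{\1}", s)


html = "sa<sup>L</sup> wat<sup>L</sup> dee<sup>M</sup>"
text = html_to_text(html)
print(text)
print(text_to_html(text) == html)
```

In practice you would simply call transcribe with the format you need; the sketch only makes the correspondence between the two notations explicit.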
Choosing a scheme
# International Phonetic Alphabet
transcribe("สวัสดี", scheme="ipa")
# '/sa˨˩.wat̚˨˩.diː˧/'
# Cyrillic (Morev tradition) — format="html" gives superscript aspiration marks
transcribe("สวัสดี", scheme="morev", format="html")
# 'саˆ-ватˆ-дӣ'
# thai-language.com notation with superscript tones (html mode)
transcribe("สวัสดี", scheme="tlc", format="html")
# 'sa<sup>L</sup> wat<sup>L</sup> dee<sup>M</sup>'
See Schemes for a full comparison.
Sentence-level input
Use transcribe_sentence when your input contains pre-segmented words (separated by spaces or punctuation). It transcribes each token and joins the results with spaces. For single known words, transcribe_word is the most reliable choice:
from thaiphon import transcribe_sentence, transcribe_word
# Transcribe individual words — most reliable for single known words.
transcribe_word("ฉัน", scheme="ipa") # '/tɕʰan˩˩˦/'
transcribe_word("ชอบ", scheme="ipa") # '/tɕʰɔːp̚˥˩/'
transcribe_word("กิน", scheme="ipa") # '/kin˧/'
transcribe_word("ข้าว", scheme="ipa") # '/kʰaːw˥˩/'
Word segmentation
transcribe_sentence uses a dictionary-based longest-match segmenter. Results depend on which words are in the built-in dictionary. For best results with sentences, pre-segment your text and pass individual words to transcribe_word, or install pythainlp for improved automatic segmentation.
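The pre-segmentation advice above can be sketched as a small helper that transcribes each word and joins the results with spaces. This is an illustration only: `transcribe_presegmented` is a hypothetical name, and the lookup table is a stand-in for transcribe_word so that the snippet runs without thaiphon installed. Its values are the IPA outputs listed earlier on this page.

```python
from typing import Callable, Iterable


def transcribe_presegmented(
    words: Iterable[str], transcribe_one: Callable[[str], str]
) -> str:
    """Transcribe each pre-segmented word and join the results with spaces."""
    return " ".join(transcribe_one(word) for word in words)


# Stand-in for transcribe_word: the IPA outputs shown earlier on this page.
KNOWN_IPA = {
    "ฉัน": "/tɕʰan˩˩˦/",
    "ชอบ": "/tɕʰɔːp̚˥˩/",
    "กิน": "/kin˧/",
    "ข้าว": "/kʰaːw˥˩/",
}

sentence = ["ฉัน", "ชอบ", "กิน", "ข้าว"]  # already segmented by hand
joined = transcribe_presegmented(sentence, KNOWN_IPA.__getitem__)
print(joined)
```

With thaiphon installed, you would pass something like `lambda w: transcribe_word(w, scheme="ipa")` as the second argument instead of the lookup table.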
For single known words, transcribe_word is equivalent to transcribe.
Reading the IPA output
If you choose scheme ipa, the output uses standard IPA conventions:
| Symbol | Meaning |
|---|---|
| /…/ | phonemic slashes wrapping the whole word |
| . | syllable boundary |
| ː | long vowel (e.g. aː = long /a/) |
| p̚ t̚ k̚ | unreleased stop codas |
| ˧ | mid tone |
| ˨˩ | low tone |
| ˥˩ | falling tone |
| ˦˥ | high tone |
| ˩˩˦ | rising tone |
Example: /naːm˦˥/ = onset /n/, long /aː/ vowel, /m/ coda, high tone.
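To make these conventions concrete, here is a minimal sketch that splits an IPA string into syllables and names each tone from its trailing tone letters. It relies only on the symbols listed in the table and uses the standard library. `split_ipa` is a hypothetical helper, not a thaiphon function.

```python
# Tone-letter sequences from the table above, mapped to tone names.
TONES = {
    "˧": "mid",
    "˨˩": "low",
    "˥˩": "falling",
    "˦˥": "high",
    "˩˩˦": "rising",
}
TONE_LETTERS = "˥˦˧˨˩"


def split_ipa(word: str) -> list[tuple[str, str]]:
    """Return (segments, tone name) pairs for one IPA transcription."""
    syllables = word.strip("/").split(".")
    result = []
    for syl in syllables:
        body = syl.rstrip(TONE_LETTERS)  # everything before the tone letters
        contour = syl[len(body):]        # the trailing tone letters
        result.append((body, TONES.get(contour, "unknown")))
    return result


print(split_ipa("/sa˨˩.wat̚˨˩.diː˧/"))
# [('sa', 'low'), ('wat̚', 'low'), ('diː', 'mid')]
```

If you need structured data rather than string parsing, the analyze function described later on this page is the better tool.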
Reading the TLC output
The tlc scheme uses plain ASCII letters and is readable without special fonts:
| Element | Notation |
|---|---|
| Long vowels | doubled letter: aa, ee, uu |
| Aspirated stops | kh, th, ph, ch |
| Unaspirated stops | g (k), dt (t), bp (p), j (tɕ) |
| Tones | {M} mid, {L} low, {H} high, {F} falling, {R} rising |
Example: naam{H} = /n/ onset, long /aa/, /m/ coda, high tone.
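The same reading can be done mechanically. The sketch below pulls syllables and tone names out of a plain-text TLC string with a regular expression. It assumes each syllable is a run of lowercase ASCII letters followed directly by a `{X}` tag, as in the example above. `split_tlc` is a hypothetical helper, not part of thaiphon.

```python
import re

TONE_NAMES = {"M": "mid", "L": "low", "H": "high", "F": "falling", "R": "rising"}

# Assumed shape of one syllable: lowercase letters, then a {X} tone tag.
SYLLABLE = re.compile(r"([a-z]+)\{([MLHFR])\}")


def split_tlc(text: str) -> list[tuple[str, str]]:
    """Return (letters, tone name) pairs for a plain-text TLC string."""
    return [(letters, TONE_NAMES[tag]) for letters, tag in SYLLABLE.findall(text)]


print(split_tlc("naam{H}"))
# [('naam', 'high')]
```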
A sample word list
from thaiphon import transcribe
words = {
    "สวัสดี": "hello",
    "น้ำ": "water",
    "ข้าว": "rice",
    "รัก": "love",
    "ปลา": "fish",
    "ภาษาไทย": "Thai language",
    "กรุงเทพ": "Bangkok",
    "ผลไม้": "fruit",
}
for thai, gloss in words.items():
    ipa = transcribe(thai, scheme="ipa")
    tlc = transcribe(thai, scheme="tlc", format="html")
    print(f"{thai:12} ({gloss:15}) IPA: {ipa:30} TLC: {tlc}")
Accessing the phonological structure
If you need more than a string — for instance, to know the tone of each syllable individually — use analyze:
from thaiphon import analyze
result = analyze("ผลไม้")
for syl in result.best.syllables:
    print(
        f"onset={syl.onset.symbol if syl.onset else '∅':6} "
        f"vowel={syl.vowel.symbol:4} "
        f"length={syl.vowel_length.name:6} "
        f"coda={syl.coda.symbol if syl.coda else '∅':4} "
        f"tone={syl.tone.name}"
    )
Output (ผลไม้ has three syllables: ผล = pʰ+ɔ+n, ล = l+a, ไม้ = m+aː+j):
onset=pʰ vowel=ɔ length=SHORT coda=n tone=RISING
onset=l vowel=a length=SHORT coda=∅ tone=HIGH
onset=m vowel=a length=LONG coda=j tone=HIGH
See analyze for the full API documentation.
Next steps
- Reading profiles — adjust pronunciation register.
- Schemes — understand what each scheme produces.
- Troubleshooting — common problems.