thaiphon
You give it Thai text. You get back a pronunciation guide.
thaiphon is a zero-dependency Python library for Thai romanisation. It ships with eight built-in output schemes: IPA, the thai-language.com "Enhanced Phonemic" notation, two Cyrillic transliterations (Morev and LMT), the official RTGS romanisation, the RTL School romanisation, and two Paiboon variants. You can add your own scheme with a single data declaration.
from thaiphon import transcribe
transcribe("สวัสดี", scheme="ipa") # '/sa˨˩.wat̚˨˩.diː˧/'
transcribe("น้ำ", scheme="tlc", format="html") # 'naam<sup>H</sup>'
transcribe("รัก", scheme="morev", format="html") # 'ракˇ'
transcribe("ข้าว", scheme="rtl") # 'khâaw'
transcribe("ปลา", scheme="paiboon") # 'bplaa'
transcribe("เสือ", scheme="paiboon_plus") # 'sʉ̌ʉa'
Pick a thread
You want to produce phonetic guides for your students or personal study — and you'd rather not wrestle with a command line.
- What thaiphon does — plain-English explanation
- Install without Python experience — step-by-step, including installing Python itself
- Your first transcription — copy-paste and run
- Reading profiles — everyday vs. formal register
At a glance
| Version | 0.4.1 |
| Python | 3.10+ |
| License | Apache-2.0 |
| Runtime dependencies | Zero |
| Accuracy | ~75% exact-match vs. Wiktionary IPA (17,014 words) with thaiphon-data-volubilis; ~57% base engine alone |
| Built-in schemes | ipa, tlc, morev, lmt, rtl, paiboon, paiboon_plus, rtgs |
| Reading profiles | everyday, careful_educated, learned_full, etalon_compat |
| Source | github.com/5w0rdf15h/thaiphon |
| Package | pip install thaiphon thaiphon-data-volubilis (recommended) |
Install
# Recommended — engine + lexicon package (~57% → ~75% accuracy):
pip install thaiphon thaiphon-data-volubilis
# or
uv add thaiphon thaiphon-data-volubilis
The lexicon package (thaiphon-data-volubilis) ships a ~35,000-entry Thai lexicon derived from the VOLUBILIS Mundo Dictionary (CC-BY-SA 4.0). The engine picks it up on import if it's installed. Nothing to configure.
The base engine alone (pip install thaiphon) works without it; the lexicon package is what gets you from ~57% to ~75% on the Wiktionary IPA benchmark. See Install for full details.
Quick example
from thaiphon import transcribe, transcribe_sentence, analyze, list_schemes
# Which schemes are available?
list_schemes()
# ('ipa', 'lmt', 'morev', 'paiboon', 'paiboon_plus', 'rtgs', 'rtl', 'tlc')
# Transcribe a single word. The default scheme is 'tlc'; format="html" renders tones as superscripts.
transcribe("ข้าว", format="html")
# 'khaao<sup>F</sup>'
# Choose a scheme explicitly.
transcribe("ข้าว", scheme="ipa")
# '/kʰaːw˥˩/'
# Inspect the phonological structure directly.
result = analyze("รัก")
for syl in result.best.syllables:
    print(syl.onset.symbol, syl.vowel.symbol, syl.vowel_length.name, syl.tone.name)
# r a SHORT HIGH
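The HIGH tone reported for รัก is what the standard Thai tone rules predict: ร is a low-class consonant, and a short vowel followed by a stop final makes a "dead" syllable. The sketch below encodes those textbook rules for unmarked syllables and for the two most common tone marks (mai ek and mai tho). It is an illustration only, not thaiphon's internal tone matrix: `ConsonantClass`, `Tone`, and `derive_tone` are hypothetical names, and the consonant class, liveness, and vowel length are supplied by hand rather than derived from Thai text.

```python
from enum import Enum


class ConsonantClass(Enum):
    MID = "mid"
    HIGH = "high"
    LOW = "low"


class Tone(Enum):
    MID = "mid"
    LOW = "low"
    FALLING = "falling"
    HIGH = "high"
    RISING = "rising"


# Tone for syllables that carry a tone mark, keyed by (onset class, mark).
# Mai tri and mai chattawa are left out to keep the sketch short.
MARKED = {
    (ConsonantClass.MID, "mai_ek"): Tone.LOW,
    (ConsonantClass.MID, "mai_tho"): Tone.FALLING,
    (ConsonantClass.HIGH, "mai_ek"): Tone.LOW,
    (ConsonantClass.HIGH, "mai_tho"): Tone.FALLING,
    (ConsonantClass.LOW, "mai_ek"): Tone.FALLING,
    (ConsonantClass.LOW, "mai_tho"): Tone.HIGH,
}


def derive_tone(cls, *, live, long_vowel, mark=None):
    """Apply the textbook tone rules to one syllable."""
    if mark is not None:
        return MARKED[(cls, mark)]
    if live:
        # Unmarked live syllables: rising after high class, mid otherwise.
        return Tone.RISING if cls is ConsonantClass.HIGH else Tone.MID
    if cls is ConsonantClass.LOW:
        # Unmarked dead syllables after low class depend on vowel length.
        return Tone.FALLING if long_vowel else Tone.HIGH
    return Tone.LOW


# รัก: low-class onset, dead syllable, short vowel.
print(derive_tone(ConsonantClass.LOW, live=False, long_vowel=False).name)
# HIGH

# ข้าว: high-class onset with mai tho.
print(derive_tone(ConsonantClass.HIGH, live=True, long_vowel=True, mark="mai_tho").name)
# FALLING
```

Both results agree with the transcriptions shown earlier on this page (HIGH for รัก, a falling tone for ข้าว).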
How it works
thaiphon converts Thai text through four deterministic stages:
Thai text
↓
Unicode normalisation + expansion (Sara Am → /aː/ + /m/, ๆ repetition, digits)
↓
Syllabification → candidate ranking
↓
Rule-based derivation (onset class → tone matrix → phoneme assignment)
↓
PhonologicalWord ← universal intermediate, scheme-independent
↓
SchemeMapping → surface form (IPA / TLC / Morev / your own)
Every output scheme is a pure transformation of the same PhonologicalWord. Fix a derivation bug once and all schemes benefit. See Architecture for the full picture.
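The idea of a scheme as a pure transformation over a shared intermediate can be sketched in a few lines. Everything here is hypothetical and deliberately simplified: `Syllable`, `Scheme`, `render`, and the two toy schemes are invented for illustration and do not reflect thaiphon's actual `PhonologicalWord` or `SchemeMapping` types, nor the exact output of any built-in scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Syllable:
    """Scheme-independent description of one syllable (toy version)."""
    onset: str   # abstract phoneme label, e.g. "r" or "kh"
    vowel: str   # abstract vowel quality, e.g. "a"
    long: bool   # vowel length
    coda: str    # "" when the syllable has no final
    tone: str    # "MID", "LOW", "FALLING", "HIGH" or "RISING"


@dataclass(frozen=True)
class Scheme:
    """A scheme is plain data: lookup tables and nothing else."""
    onsets: dict[str, str]
    vowels: dict[tuple[str, bool], str]
    codas: dict[str, str]
    tones: dict[str, str]


def render(word: tuple[Syllable, ...], scheme: Scheme) -> str:
    """Map every syllable through the scheme's tables; no phonology happens here."""
    return ".".join(
        scheme.onsets[s.onset]
        + scheme.vowels[(s.vowel, s.long)]
        + scheme.codas[s.coda]
        + scheme.tones[s.tone]
        for s in word
    )


# Two toy schemes covering only the phonemes used below.
ASCII_TOY = Scheme(
    onsets={"r": "r", "kh": "kh"},
    vowels={("a", False): "a", ("a", True): "aa"},
    codas={"": "", "k": "k", "w": "w"},
    tones={"HIGH": "{H}", "FALLING": "{F}"},
)
IPA_LIKE_TOY = Scheme(
    onsets={"r": "r", "kh": "kʰ"},
    vowels={("a", False): "a", ("a", True): "aː"},
    codas={"": "", "k": "k̚", "w": "w"},
    tones={"HIGH": "˦˥", "FALLING": "˥˩"},
)

rak = Syllable(onset="r", vowel="a", long=False, coda="k", tone="HIGH")
khaaw = Syllable(onset="kh", vowel="a", long=True, coda="w", tone="FALLING")

print(render((rak,), ASCII_TOY))        # rak{H}
print(render((rak, khaaw), ASCII_TOY))  # rak{H}.khaaw{F}
print(render((khaaw,), IPA_LIKE_TOY))
```

Because `render` only consults lookup tables, correcting a `Syllable` upstream changes the output of both toy schemes at once, which mirrors the design choice described above.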