Python 3.10+ · Zero runtime dependencies

ภาษาไทย → transliteration,
one phonological model at a time.

Thai phonological transliteration engine — zero runtime dependencies, pluggable schemes.

thaiphon

You give it Thai text. You get back a pronunciation guide.

thaiphon is a Python library for Thai transliteration with zero runtime dependencies. It ships with eight built-in output schemes: IPA, the thai-language.com "Enhanced Phonemic" notation (TLC), two Cyrillic transliterations (Morev and LMT), the official RTGS romanisation, the RTL School romanisation, and two Paiboon variants (paiboon and paiboon_plus). You can add your own scheme with a single data declaration.

from thaiphon import transcribe

transcribe("สวัสดี", scheme="ipa")                     # '/sa˨˩.wat̚˨˩.diː˧/'
transcribe("น้ำ",    scheme="tlc",   format="html")  # 'naam<sup>H</sup>'
transcribe("รัก",    scheme="morev", format="html")  # 'ракˇ'
transcribe("ข้าว",   scheme="rtl")          # 'khâaw'
transcribe("ปลา",    scheme="paiboon")      # 'bplaa'
transcribe("เสือ",   scheme="paiboon_plus") # 'sʉ̌ʉa'
Try it in your browser
No install needed
The online Thai Transliteration Tool runs thaiphon directly. Paste Thai text and get IPA, TLC, and Morev output right away. Install the Python package only if you need thaiphon in your own code or in an offline workflow.

Pick a thread

You want to produce phonetic guides for your students or personal study — and you'd rather not wrestle with a command line.


At a glance

Version: 0.4.1
Python: 3.10+
License: Apache-2.0
Runtime dependencies: Zero
Accuracy: ~75% exact-match vs. Wiktionary IPA (17,014 words) with thaiphon-data-volubilis; ~57% base engine alone
Built-in schemes: ipa, tlc, morev, lmt, rtl, paiboon, paiboon_plus, rtgs
Reading profiles: everyday, careful_educated, learned_full, etalon_compat
Source: github.com/5w0rdf15h/thaiphon
Package: pip install thaiphon thaiphon-data-volubilis (recommended)

Install

# Recommended — engine + lexicon package (~57% → ~75% accuracy):
pip install thaiphon thaiphon-data-volubilis
# or
uv add thaiphon thaiphon-data-volubilis

The lexicon package (thaiphon-data-volubilis) ships a ~35,000-entry Thai lexicon derived from the VOLUBILIS Mundo Dictionary (CC-BY-SA 4.0). The engine picks it up on import if it's installed. Nothing to configure.

The base engine alone (pip install thaiphon) works without it; the lexicon package is what gets you from ~57% to ~75% on the Wiktionary IPA benchmark. See Install for full details.


Quick example

from thaiphon import transcribe, transcribe_sentence, analyze, list_schemes

# Which schemes are available?
list_schemes()
# ('ipa', 'lmt', 'morev', 'paiboon', 'paiboon_plus', 'rtgs', 'rtl', 'tlc')

# Transcribe a single word. The default scheme is 'tlc'; format="html" renders tones as superscripts.
transcribe("ข้าว", format="html")
# 'khaao<sup>F</sup>'

# Choose a scheme explicitly.
transcribe("ข้าว", scheme="ipa")
# '/kʰaːw˥˩/'

# Inspect the phonological structure directly.
result = analyze("รัก")
for syl in result.best.syllables:
    print(syl.onset.symbol, syl.vowel.symbol, syl.vowel_length.name, syl.tone.name)
# r a SHORT HIGH
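To compare schemes side by side, you can loop over whatever `list_schemes()` returns. The sketch below relies only on the two calls shown above, `transcribe(text, scheme=...)` and `list_schemes()`. The helper `compare_schemes` is a hypothetical name for this example; it takes the transcriber as a parameter so it can also be exercised with a stub.

```python
from typing import Callable, Iterable


def compare_schemes(
    word: str,
    transcribe_fn: Callable[..., str],
    schemes: Iterable[str],
) -> dict[str, str]:
    """Render one word in every given scheme and return {scheme: output}."""
    return {scheme: transcribe_fn(word, scheme=scheme) for scheme in schemes}


if __name__ == "__main__":
    try:
        from thaiphon import list_schemes, transcribe
    except ImportError:
        print("thaiphon is not installed; run: pip install thaiphon")
    else:
        # One line per scheme, scheme name right-aligned for readability.
        for scheme, output in compare_schemes("ข้าว", transcribe, list_schemes()).items():
            print(f"{scheme:>12}  {output}")
```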

How it works

thaiphon converts Thai text through four deterministic stages:

Thai text
  1. Unicode normalisation + expansion (Sara Am → /aː/ + /m/, ๆ repetition, digits)
  2. Syllabification → candidate ranking
  3. Rule-based derivation (onset class → tone matrix → phoneme assignment)
PhonologicalWord  ←  universal intermediate, scheme-independent
  4. SchemeMapping → surface form (IPA / TLC / Morev / your own)

Every output scheme is a pure transformation of the same PhonologicalWord. Fix a derivation bug once and all schemes benefit. See Architecture for the full picture.
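To make the "pure transformation" idea concrete, here is a toy version: one scheme-independent syllable record and two notations that are nothing more than data tables. Everything in it (`ToySyllable`, `TOY_SCHEMES`, `render`, and the two notations) is invented for this example. thaiphon's real PhonologicalWord and SchemeMapping are richer, and the strings produced here are simplified stand-ins, not the library's actual output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToySyllable:
    """Scheme-independent description of one syllable."""
    onset: str
    vowel: str
    long: bool
    coda: str
    tone: str  # 'MID', 'LOW', 'FALLING', 'HIGH' or 'RISING'


# Each toy scheme is pure data: a template for long vowels and a tone table.
TOY_SCHEMES = {
    "letters": {
        "long": "{v}{v}",
        "tone": {"MID": "^M", "LOW": "^L", "FALLING": "^F", "HIGH": "^H", "RISING": "^R"},
    },
    "ipa_like": {
        "long": "{v}ː",
        "tone": {"MID": "˧", "LOW": "˨˩", "FALLING": "˥˩", "HIGH": "˦˥", "RISING": "˩˩˦"},
    },
}


def render(syl: ToySyllable, scheme_name: str) -> str:
    """Turn the shared syllable record into one scheme's surface form."""
    scheme = TOY_SCHEMES[scheme_name]
    vowel = scheme["long"].format(v=syl.vowel) if syl.long else syl.vowel
    return f"{syl.onset}{vowel}{syl.coda}{scheme['tone'][syl.tone]}"


rak = ToySyllable(onset="r", vowel="a", long=False, coda="k", tone="HIGH")
print(render(rak, "letters"))    # rak^H
print(render(rak, "ipa_like"))   # rak˦˥
```

Adding a third notation means adding one more entry to `TOY_SCHEMES`; `render` and the syllable record stay untouched, which mirrors the "one data declaration" idea on a small scale.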