Write your own scheme
thaiphon's output is controlled by a SchemeMapping — a plain Python data structure that maps each phoneme in the engine's internal representation to a string in your target notation. You do not need to touch any phonological code. If you know what sounds Thai has and how you want to spell them, you can write a complete new scheme.
What a scheme needs to specify
A scheme maps these phoneme categories to surface strings:
- Onset consonants: one IPA symbol → your string (e.g. kʰ → "kh")
- Vowels: a (quality, length) pair → your string (e.g. ("a", LONG) → "aa")
- Codas: one IPA coda symbol → your string (e.g. "ŋ" → "ng")
- Tone: a function that takes a base syllable string and a Syllable object and returns the tone-decorated string
- Optional extras: syllable separator, cluster joiner, context-dependent overrides
All of this lives in a SchemeMapping dataclass. Instantiate one, pass it to a MappingRenderer, and register it.
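To make the division of labour concrete, here is a minimal sketch of how the three lookup tables and the tone function could combine into one rendered syllable. It is not thaiphon's renderer: `ToySyllable` and `render_toy` are hypothetical names invented for this illustration, and the local `VowelLength` enum is only a stand-in for the one in `thaiphon.model.enums`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class VowelLength(Enum):  # stand-in for thaiphon.model.enums.VowelLength
    SHORT = "short"
    LONG = "long"


@dataclass(frozen=True)
class ToySyllable:  # hypothetical, simplified stand-in for thaiphon's Syllable
    onset: str
    vowel: str
    length: VowelLength
    coda: Optional[str] = None


def render_toy(syl, onset_map, vowel_map, coda_map, tone_format):
    """Concatenate the onset, vowel and coda spellings, then apply the tone formatter."""
    base = onset_map[syl.onset] + vowel_map[(syl.vowel, syl.length)]
    if syl.coda is not None:
        base += coda_map[syl.coda]
    return tone_format(base, syl)


# The three example entries from the list above.
onset_map = {"kʰ": "kh"}
vowel_map = {("a", VowelLength.LONG): "aa"}
coda_map = {"ŋ": "ng"}

result = render_toy(
    ToySyllable("kʰ", "a", VowelLength.LONG, "ŋ"),
    onset_map,
    vowel_map,
    coda_map,
    tone_format=lambda base, syl: base,  # no tone decoration
)
print(result)  # khaang
```

The real engine also handles clusters, separators and context overrides, so treat this only as a mental model of what each table contributes.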
A complete working example: RTGS-inspired scheme
The Royal Thai General System (RTGS) is an official romanization published by the Royal Institute of Thailand. The following example implements a close approximation:
from thaiphon.model.enums import VowelLength, Tone
from thaiphon.model.syllable import Syllable
from thaiphon.renderers.mapping import SchemeMapping, MappingRenderer
from thaiphon.registry import RENDERERS
from thaiphon import transcribe
# Step 1: Map onset IPA symbols to RTGS letters.
# The IPA symbols are the engine's internal representation — not the Thai letters.
ONSET_MAP = {
    "k": "k",
    "kʰ": "kh",
    "tɕ": "ch",   # RTGS uses 'ch' for the unaspirated palatal affricate
    "tɕʰ": "ch",  # RTGS does not distinguish aspirated vs unaspirated here
    "d": "d",
    "t": "t",
    "tʰ": "th",
    "b": "b",
    "p": "p",
    "pʰ": "ph",
    "f": "f",
    "s": "s",
    "h": "h",
    "ʔ": "",      # glottal onset: RTGS does not write it
    "m": "m",
    "n": "n",
    "ŋ": "ng",
    "j": "y",
    "r": "r",
    "l": "l",
    "w": "w",
}
# Step 2: Map (vowel quality, length) pairs to surface strings.
# VowelLength.SHORT and VowelLength.LONG come from the enum.
VOWEL_MAP = {
    ("a", VowelLength.SHORT): "a",
    ("a", VowelLength.LONG): "a",  # RTGS does not mark vowel length
    ("i", VowelLength.SHORT): "i",
    ("i", VowelLength.LONG): "i",
    ("u", VowelLength.SHORT): "u",
    ("u", VowelLength.LONG): "u",
    ("e", VowelLength.SHORT): "e",
    ("e", VowelLength.LONG): "e",
    ("ɛ", VowelLength.SHORT): "ae",
    ("ɛ", VowelLength.LONG): "ae",
    ("o", VowelLength.SHORT): "o",
    ("o", VowelLength.LONG): "o",
    ("ɔ", VowelLength.SHORT): "o",
    ("ɔ", VowelLength.LONG): "o",
    ("ɯ", VowelLength.SHORT): "ue",
    ("ɯ", VowelLength.LONG): "ue",
    ("ɤ", VowelLength.SHORT): "oe",
    ("ɤ", VowelLength.LONG): "oe",
    ("iə", VowelLength.SHORT): "ia",
    ("iə", VowelLength.LONG): "ia",
    ("ɯə", VowelLength.SHORT): "uea",
    ("ɯə", VowelLength.LONG): "uea",
    ("uə", VowelLength.SHORT): "ua",
    ("uə", VowelLength.LONG): "ua",
}
# Step 3: Map coda IPA symbols to surface strings.
CODA_MAP = {
    "m": "m",
    "n": "n",
    "ŋ": "ng",
    "p̚": "p",
    "t̚": "t",
    "k̚": "k",
    "w": "o",  # offglide /w/ → 'o' in RTGS convention
    "j": "i",  # offglide /j/ → 'i'
    "f": "f",
}
# Step 4: Tone formatter.
# RTGS does not write tones, so we simply return the base string unchanged.
def tone_format(base: str, syl: Syllable) -> str:
    return base
# Step 5: Assemble the SchemeMapping.
RTGS_MAPPING = SchemeMapping(
    scheme_id="rtgs",
    onset_map=ONSET_MAP,
    vowel_map=VOWEL_MAP,
    coda_map=CODA_MAP,
    tone_format=tone_format,
    syllable_separator="",  # RTGS writes multi-syllable words without a separator
    cluster_joiner="",
    empty_onset="",
    unknown_fallback="?",
)
# Step 6: Register the scheme.
# This only needs to run once — typically at import time.
if "rtgs" not in RENDERERS:
    RENDERERS.register("rtgs", lambda: MappingRenderer(RTGS_MAPPING))
# Step 7: Use it.
transcribe("สวัสดี", scheme="rtgs")
# 'sawatdi'
transcribe("กรุงเทพ", scheme="rtgs")
# 'krungthep'
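Because RTGS does not mark vowel length, every quality in VOWEL_MAP appears twice with the same spelling. One way to avoid the repetition is to write each spelling once and expand it to both lengths. The sketch below uses a local stand-in for `VowelLength` so that it runs without thaiphon installed; `RTGS_VOWELS` is a hypothetical helper name. With the real enum, import it from `thaiphon.model.enums` instead.

```python
from enum import Enum


class VowelLength(Enum):  # stand-in; the real enum lives in thaiphon.model.enums
    SHORT = "short"
    LONG = "long"


# One spelling per vowel quality, copied from the VOWEL_MAP above.
RTGS_VOWELS = {
    "a": "a", "i": "i", "u": "u", "e": "e", "ɛ": "ae", "o": "o", "ɔ": "o",
    "ɯ": "ue", "ɤ": "oe", "iə": "ia", "ɯə": "uea", "uə": "ua",
}

# Expand to the (quality, length) keys that vowel_map expects.
# The two lengths are listed explicitly rather than iterating over the enum,
# in case the real enum defines members beyond SHORT and LONG.
VOWEL_MAP = {
    (quality, length): spelling
    for quality, spelling in RTGS_VOWELS.items()
    for length in (VowelLength.SHORT, VowelLength.LONG)
}

print(len(VOWEL_MAP))  # 24
```

A scheme that does distinguish length can start from the same expansion and then overwrite only the long entries.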
SchemeMapping field reference
| Field | Type | Required | Description |
|---|---|---|---|
| scheme_id | str | yes | The string key used in transcribe(..., scheme="your_id") |
| onset_map | Mapping[str, str] | yes | IPA onset symbol → surface string |
| vowel_map | Mapping[tuple[str, VowelLength], str] | yes | (IPA quality, VowelLength) → surface string |
| coda_map | Mapping[str, str] | yes | IPA coda symbol → surface string |
| tone_format | Callable[[str, Syllable], str] | yes | Function that adds tone decoration to the base syllable string. Used for format="text" and as fallback for format="html". |
| tone_format_html | Callable[[str, Syllable], str] \| None | no | Alternate tone formatter used only when format="html". When None, tone_format is used for both formats. |
| coda_context_map | Mapping[tuple[str, VowelLength, str], str] \| None | no | Context-dependent coda overrides keyed by (vowel, length, coda-IPA) |
| vowel_context_map | Mapping[tuple[str, VowelLength, str], str] \| None | no | Context-dependent vowel overrides keyed by (vowel, length, coda-IPA) |
| word_coda_override | Callable[[str, Syllable, str, str], str \| None] \| None | no | Per-word coda override for loanword/profile-sensitive codas |
| cluster_joiner | str | no | String inserted between the two phonemes of an onset cluster. Default: "" |
| syllable_separator | str | no | Inserted between syllables. Default: "-" |
| empty_onset | str | no | Rendered when a syllable has no onset consonant. Default: "" |
| unknown_fallback | str | no | Rendered when a phoneme has no entry in the map. Default: "?" |
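The four optional string fields are easiest to understand by their effect on output. The sketch below imitates what the descriptions in the table say, using two hypothetical helpers (`join_onset` and `join_syllables`) written for this page. It is an illustration of the documented behaviour, not the library's code, and the real renderer may differ in detail.

```python
def join_onset(phonemes, onset_map, cluster_joiner="", empty_onset="", unknown_fallback="?"):
    """Spell an onset given as a tuple of IPA symbols (empty tuple = no onset)."""
    if not phonemes:
        return empty_onset
    return cluster_joiner.join(onset_map.get(p, unknown_fallback) for p in phonemes)


def join_syllables(syllables, syllable_separator="-"):
    """Join already-rendered syllable strings into a word."""
    return syllable_separator.join(syllables)


onset_map = {"k": "k", "r": "r"}

cluster = join_onset(("k", "r"), onset_map)                     # default joiner ""
dotted = join_onset(("k", "r"), onset_map, cluster_joiner=".")  # visible cluster boundary
unknown = join_onset(("x",), onset_map)                         # symbol missing from the map
bare = join_onset((), onset_map, empty_onset="'")               # syllable without an onset

hyphenated = join_syllables(["sa", "wat", "di"])                        # default separator "-"
solid = join_syllables(["sa", "wat", "di"], syllable_separator="")     # RTGS-style

print(cluster, dotted, unknown, bare, hyphenated, solid)
```

A visible unknown_fallback such as "?" makes gaps in a map easy to spot while a scheme is still being developed.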
The tone_format function
The tone_format callable receives two arguments:
- base: str, the syllable rendered so far (onset + vowel + coda, concatenated).
- syl: Syllable, the Syllable object, giving access to syl.tone (a Tone enum: MID, LOW, FALLING, HIGH, RISING).
Return the final syllable string, with whatever tone decoration your scheme uses.
No tone (RTGS-style):
def tone_format(base: str, syl: Syllable) -> str:
    return base
Bracketed tags (TLC-style):
from thaiphon.model.enums import Tone
_TAG = {Tone.MID: "{M}", Tone.LOW: "{L}", Tone.HIGH: "{H}", Tone.FALLING: "{F}", Tone.RISING: "{R}"}
def tone_format(base: str, syl: Syllable) -> str:
    return base + _TAG[syl.tone]
Superscript digit (pedagogical):
_DIGIT = {Tone.MID: "3", Tone.LOW: "2", Tone.HIGH: "4", Tone.FALLING: "5", Tone.RISING: "1"}
def tone_format(base: str, syl: Syllable) -> str:
    return base + _DIGIT[syl.tone]
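A tone formatter does not have to append anything; it can also modify the base string. As a further illustration, the sketch below places a combining accent on the first vowel letter, in the manner of diacritic-based romanizations. The accent assignments are one common convention chosen for this example, not something thaiphon prescribes. `Tone` and `ToySyllable` are local stand-ins so the block runs on its own, and the code assumes the scheme spells vowels with plain a, e, i, o, u.

```python
import unicodedata
from dataclasses import dataclass
from enum import Enum


class Tone(Enum):  # stand-in for thaiphon.model.enums.Tone
    MID = "mid"
    LOW = "low"
    FALLING = "falling"
    HIGH = "high"
    RISING = "rising"


@dataclass(frozen=True)
class ToySyllable:  # hypothetical stand-in; only the tone attribute is used here
    tone: Tone


_MARK = {
    Tone.MID: "",             # mid tone left unmarked
    Tone.LOW: "\u0300",       # combining grave accent
    Tone.FALLING: "\u0302",   # combining circumflex
    Tone.HIGH: "\u0301",      # combining acute accent
    Tone.RISING: "\u030c",    # combining caron
}
_VOWEL_LETTERS = set("aeiou")


def tone_format(base: str, syl: ToySyllable) -> str:
    """Attach the tone accent to the first vowel letter of the base string."""
    mark = _MARK[syl.tone]
    for i, ch in enumerate(base):
        if ch in _VOWEL_LETTERS:
            marked = base[: i + 1] + mark + base[i + 1 :]
            # NFC composes letter + combining mark into one code point where possible.
            return unicodedata.normalize("NFC", marked)
    return base  # no vowel letter found: leave the syllable undecorated


print(tone_format("maa", ToySyllable(Tone.FALLING)))  # mâa
```

Schemes whose vowel spellings include digraphs or non-ASCII letters would need a different rule for choosing the accented position.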
Per-format tone markup with tone_format_html
If your scheme needs a different tone representation in HTML output — for example, wrapping the tone tag in a <sup> element — you can declare an optional tone_format_html callable on the mapping:
from thaiphon.model.enums import Tone
from thaiphon.model.syllable import Syllable
_TAG = {Tone.MID: "M", Tone.LOW: "L", Tone.HIGH: "H", Tone.FALLING: "F", Tone.RISING: "R"}
def tone_format_text(base: str, syl: Syllable) -> str:
    return base + "{" + _TAG[syl.tone] + "}"
def tone_format_html(base: str, syl: Syllable) -> str:
    return base + "<sup>" + _TAG[syl.tone] + "</sup>"
MY_MAPPING = SchemeMapping(
    scheme_id="my_scheme",
    ...,  # remaining fields elided
    tone_format=tone_format_text,
    tone_format_html=tone_format_html,  # used only when format="html"
)
When transcribe(..., format="html") is called, the renderer uses tone_format_html. For format="text" (the default), it uses tone_format. If tone_format_html is None (the default when omitted), tone_format is used for both formats — so schemes that do not need the distinction need not declare it at all.
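The fallback rule in the paragraph above can be written as a small selection function. This is a sketch of the documented behaviour only: `pick_tone_formatter` is a hypothetical name, and the `SimpleNamespace` object stands in for a real Syllable (whose tone would be a Tone enum member rather than a string).

```python
from types import SimpleNamespace


def pick_tone_formatter(tone_format, tone_format_html=None, output_format="text"):
    """Return the HTML formatter only for HTML output, and only if one was declared."""
    if output_format == "html" and tone_format_html is not None:
        return tone_format_html
    return tone_format


def text_fmt(base, syl):
    return base + "{" + syl.tone + "}"


def html_fmt(base, syl):
    return base + "<sup>" + syl.tone + "</sup>"


syl = SimpleNamespace(tone="F")  # hypothetical stand-in for a Syllable

as_text = pick_tone_formatter(text_fmt, html_fmt, "text")("maa", syl)
as_html = pick_tone_formatter(text_fmt, html_fmt, "html")("maa", syl)
html_fallback = pick_tone_formatter(text_fmt, None, "html")("maa", syl)

print(as_text, as_html, html_fallback)
```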
Context-dependent spellings
Some schemes need the coda or vowel representation to vary based on the surrounding phonemes. The coda_context_map and vowel_context_map fields support this.
Example: a scheme where the /j/ coda spells as "y" after /ɔː/ but "i" elsewhere:
CODA_CONTEXT = {
    ("ɔ", VowelLength.LONG, "j"): "y",
}
mapping = SchemeMapping(
    ...,
    coda_context_map=CODA_CONTEXT,
    coda_map={..., "j": "i", ...},
)
When thaiphon renders a syllable with vowel /ɔː/ and coda /j/, it checks coda_context_map first and finds "y". Any other vowel before /j/ falls through to coda_map and gets "i".
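The lookup order described above, context map first and plain map second, might be sketched as follows. `spell_coda` is a hypothetical helper written for this page, and the local `VowelLength` enum is a stand-in for the one in `thaiphon.model.enums`; the library's own lookup may be organised differently.

```python
from enum import Enum


class VowelLength(Enum):  # stand-in for thaiphon.model.enums.VowelLength
    SHORT = "short"
    LONG = "long"


def spell_coda(vowel, length, coda, coda_map, coda_context_map=None):
    """Prefer a (vowel, length, coda) override; otherwise use the plain coda map."""
    if coda_context_map is not None:
        override = coda_context_map.get((vowel, length, coda))
        if override is not None:
            return override
    return coda_map[coda]


CODA_CONTEXT = {("ɔ", VowelLength.LONG, "j"): "y"}
CODA_MAP = {"j": "i"}

after_long_open_o = spell_coda("ɔ", VowelLength.LONG, "j", CODA_MAP, CODA_CONTEXT)
after_long_a = spell_coda("a", VowelLength.LONG, "j", CODA_MAP, CODA_CONTEXT)
after_short_open_o = spell_coda("ɔ", VowelLength.SHORT, "j", CODA_MAP, CODA_CONTEXT)

print(after_long_open_o, after_long_a, after_short_open_o)  # y i i
```

Note that the key includes the vowel length, so an override for /ɔː/ does not apply to short /ɔ/ unless it is listed separately.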
Sharing your scheme
If you have written a scheme that would be useful to other thaiphon users, consider contributing it:
- See Add a scheme for the conceptual walk-through.
- See Pull requests for how to propose the addition on GitHub.