transcribe, transcribe_word, transcribe_sentence¶

The three transcription functions convert Thai text to a romanized or phonetic string.

`transcribe`¶

def transcribe(
    text: str,
    scheme: str = "tlc",
    *,
    format: Literal["text", "html"] = "text",
    profile: ReadingProfile = "everyday",
) -> str:

Transcribe a Thai word or short phrase into the target scheme.

Parameters¶

Parameter	Type	Default	Description
`text`	`str`	—	Thai text to transcribe. NFC normalisation is applied automatically.
`scheme`	`str`	`"tlc"`	The romanization scheme to use. Must be a registered scheme name.
`format`	`"text"` \| `"html"`	`"text"`	Output format. `"text"` returns a plain string. `"html"` activates per-scheme HTML rendering (e.g. superscript tone tags for TLC, superscript aspiration markup for Morev). Schemes without HTML-specific output return the same string as `"text"`.
`profile`	`ReadingProfile`	`"everyday"`	Reading profile. Controls register-sensitive pronunciation decisions.
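
The difference between the two `format` values can be seen in the TLC examples on this page: text mode writes the tone as a braced tag, HTML mode as a superscript. The sketch below only illustrates that relationship with a regular expression. `braces_to_sup` is a hypothetical helper, not part of thaiphon, and it assumes a tone tag is always one uppercase letter in braces. In real code, pass `format="html"` to `transcribe` instead.

```python
import re

# Hypothetical helper (not part of thaiphon): rewrites TLC text-mode tone
# tags such as "{H}" into the superscript form shown for format="html".
# Assumption: a tone tag is exactly one uppercase letter inside braces.
_TONE_TAG = re.compile(r"\{([A-Z])\}")


def braces_to_sup(tlc_text: str) -> str:
    return _TONE_TAG.sub(r"<sup>\1</sup>", tlc_text)


print(braces_to_sup("naam{H}"))  # naam<sup>H</sup>
```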

Returns¶

str — the transcribed text in the requested scheme and format.

Raises¶

UnsupportedSchemeError — if scheme is not registered.
ValueError — if profile is not one of the four valid profile strings.

Examples¶

from thaiphon import transcribe

# Default scheme is TLC; use format="html" for superscript tone tags.
transcribe("น้ำ", format="html")
# 'naam<sup>H</sup>'

# IPA scheme: no HTML-specific output, so format="text" and format="html" return the same string.
transcribe("น้ำ", scheme="ipa")
# '/naːm˦˥/'

# Morev scheme — format="html" emits superscript aspiration markup.
transcribe("ขอ", scheme="morev", format="html")
# 'к<sup>х</sup>ɔ̄´'

# Morev text mode: no markup. Aspirated initials, where present, are written as plain digraphs (кх / тх / пх).
transcribe("น้ำ", scheme="morev")
# 'на̄мˇ'

# TLC text mode — bracketed tags instead of superscripts.
transcribe("น้ำ", scheme="tlc")
# 'naam{H}'

# Reading profile.
transcribe("ลิฟต์", scheme="ipa", profile="everyday")
# '/lif˦˥/'

transcribe("ลิฟต์", scheme="ipa", profile="etalon_compat")
# '/lip̚˦˥/'

# Empty input returns empty string.
transcribe("", scheme="tlc")
# ''

Notes¶

transcribe calls analyze internally. For multiple transcriptions of the same word in different schemes, it is more efficient to call analyze once and render with each scheme's renderer.render_word.
The default scheme is "tlc". To check which schemes are available, call list_schemes().
NFC normalisation ensures NFD and NFC input produce identical output.
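
The effect of normalisation can be shown with the standard library alone. For Thai the visible change is canonical reordering: a tone mark typed before a below-base vowel sign is moved after it. The sketch makes no claim about thaiphon's internals and only shows what NFC does to such input.

```python
import unicodedata

# Two spellings of the same syllable: ko kai + sara u + mai ek.
vowel_first = "\u0e01\u0e38\u0e48"  # vowel sign, then tone mark
tone_first = "\u0e01\u0e48\u0e38"   # tone mark, then vowel sign

# The raw code point sequences differ even though they render alike.
print(vowel_first == tone_first)

# NFC applies canonical ordering, so both collapse to one sequence.
nfc_a = unicodedata.normalize("NFC", vowel_first)
nfc_b = unicodedata.normalize("NFC", tone_first)
print(nfc_a == nfc_b)
```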

`transcribe_word`¶

def transcribe_word(
    text: str,
    scheme: str = "tlc",
    *,
    format: Literal["text", "html"] = "text",
    profile: ReadingProfile = "everyday",
) -> str:

Identical to transcribe. Provided as an explicit alternative when the caller wants to signal that the input is a single known word (rather than a possibly multi-word phrase).

Example¶

from thaiphon import transcribe_word

transcribe_word("สวัสดี", scheme="ipa")
# '/sa˨˩.wat̚˨˩.diː˧/'

`transcribe_sentence`¶

def transcribe_sentence(
    text: str,
    scheme: str = "tlc",
    *,
    format: Literal["text", "html"] = "text",
    profile: ReadingProfile = "everyday",
    segmenter: Callable[[str], Sequence[str]] | None = None,
) -> str:

Segment text into words, transcribe each word, and join the results with spaces.

Parameters¶

Parameter	Type	Default	Description
`text`	`str`	—	Full sentence or multi-word string.
`scheme`	`str`	`"tlc"`	Romanization scheme.
`format`	`"text"` \| `"html"`	`"text"`	Output format.
`profile`	`ReadingProfile`	`"everyday"`	Reading profile.
`segmenter`	`Callable[[str], Sequence[str]] \| None`	`None`	Custom word segmenter. If `None`, the default segmenter is used: pythainlp when it is installed and importable, otherwise the built-in longest-match segmenter.
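
Any callable with this signature can be passed as `segmenter`. To illustrate the longest-match idea, here is a minimal greedy segmenter over a toy lexicon. `make_longest_match_segmenter` and the lexicon are hypothetical. This is not thaiphon's built-in segmenter, whose dictionary and tie-breaking rules are not described on this page.

```python
from collections.abc import Callable, Sequence


def make_longest_match_segmenter(
    lexicon: set[str],
) -> Callable[[str], Sequence[str]]:
    """Build a greedy longest-match segmenter over a toy lexicon (illustrative only)."""
    max_len = max((len(word) for word in lexicon), default=1)

    def segment(text: str) -> list[str]:
        tokens: list[str] = []
        pos = 0
        while pos < len(text):
            # Try the longest candidate first, shrinking until a lexicon hit.
            for size in range(min(max_len, len(text) - pos), 0, -1):
                candidate = text[pos : pos + size]
                if candidate in lexicon:
                    break
            else:
                # No lexicon entry starts here: emit a single character.
                candidate = text[pos]
            tokens.append(candidate)
            pos += len(candidate)
        return tokens

    return segment


words = ["ฉัน", "ชอบ", "กิน", "ข้าว"]
segment = make_longest_match_segmenter(set(words))
print(segment("".join(words)))
```

The resulting `segment` function could then be supplied as `segmenter=segment` in a call to `transcribe_sentence`.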

Returns¶

str — transcribed words joined by spaces. Empty string if input is empty or whitespace-only.

Examples¶

from thaiphon import transcribe_sentence

# Use a custom segmenter (or pythainlp) for reliable sentence splitting.
def my_segmenter(text: str) -> list[str]:
    return text.split()   # split on spaces (pre-segmented input)

transcribe_sentence("ฉัน ชอบ กิน ข้าว", scheme="ipa", segmenter=my_segmenter)
# '/tɕʰan˩˩˦/ /tɕʰɔːp̚˥˩/ /kin˧/ /kʰaːw˥˩/'

transcribe_sentence("ฉัน ชอบ กิน ข้าว", scheme="tlc", format="html", segmenter=my_segmenter)
# 'chan<sup>R</sup> chaawp<sup>F</sup> gin<sup>M</sup> khaao<sup>F</sup>'

transcribe_sentence("กา นก ปลา", scheme="tlc", format="html", segmenter=my_segmenter)
# 'gaa<sup>M</sup> nohk<sup>H</sup> bplaa<sup>M</sup>'

Notes¶

Words appearing mid-compound have their vowel-length overrides suppressed, matching the colloquial shortening of vowels in non-final position. Words at the end of the sentence receive the full override.
If pythainlp is installed and importable, the default segmenter uses it. Otherwise, the built-in longest-match segmenter is used.
Whitespace tokens in the segmentation output are skipped.
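
Taken together, the description and notes suggest the following flow: segment, drop whitespace tokens, transcribe each word, join with single spaces. The sketch below mirrors that flow with injected callables so that it runs without thaiphon. `sentence_pipeline` and the stand-in transcriber are hypothetical, and the position-dependent vowel-length handling from the first note is deliberately left out.

```python
from collections.abc import Callable, Sequence


def sentence_pipeline(
    text: str,
    segmenter: Callable[[str], Sequence[str]],
    transcribe_word: Callable[[str], str],
) -> str:
    """Illustrative flow only; not thaiphon's implementation."""
    # Whitespace-only (or empty) tokens from the segmenter are skipped.
    words = [token for token in segmenter(text) if token.strip()]
    # Each word is transcribed on its own and the results joined by spaces.
    return " ".join(transcribe_word(word) for word in words)


def fake_transcribe(word: str) -> str:
    """Stand-in transcriber so the example is self-contained."""
    return f"<{word}>"


def split_on_space(text: str) -> list[str]:
    """Naive segmenter that may yield empty tokens between repeated spaces."""
    return text.split(" ")


print(sentence_pipeline("a  b", split_on_space, fake_transcribe))        # <a> <b>
print(repr(sentence_pipeline("   ", split_on_space, fake_transcribe)))   # ''
```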

`list_schemes`¶

def list_schemes() -> tuple[str, ...]:

Return a sorted tuple of registered scheme identifiers.

Returns¶

tuple[str, ...] — sorted tuple of registered scheme IDs, e.g. ('ipa', 'morev', 'paiboon', 'paiboon_plus', 'rtl', 'tlc').

Example¶

from thaiphon import list_schemes

list_schemes()
# ('ipa', 'morev', 'paiboon', 'paiboon_plus', 'rtl', 'tlc')

Notes¶

list_schemes() triggers the import of the built-in renderers module, which registers all six built-in schemes. Any additional schemes you have registered with RENDERERS.register also appear.
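
Because `transcribe` raises `UnsupportedSchemeError` for unknown names, callers that accept a scheme from user input may want to check it first. A minimal sketch, assuming the available names come from `list_schemes()`. `resolve_scheme` is a hypothetical helper, and the tuple is copied from the example above so the block runs without thaiphon.

```python
def resolve_scheme(
    requested: str,
    available: tuple[str, ...],
    fallback: str = "tlc",
) -> str:
    """Return `requested` if it is registered, else `fallback` (hypothetical helper)."""
    if requested in available:
        return requested
    if fallback not in available:
        raise ValueError(f"fallback scheme {fallback!r} is not registered")
    return fallback


# In real use this would be: available = list_schemes()
available = ("ipa", "morev", "paiboon", "paiboon_plus", "rtl", "tlc")

print(resolve_scheme("ipa", available))      # ipa
print(resolve_scheme("unknown", available))  # tlc
```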

ReadingProfile¶

ReadingProfile = Literal["everyday", "careful_educated", "learned_full", "etalon_compat"]

The four valid profile strings:

Value	Register
`"everyday"`	Colloquial urban speech (default)
`"careful_educated"`	Formal broadcast register
`"learned_full"`	Full Indic/Sanskrit citation forms
`"etalon_compat"`	Dictionary-citation, collapses foreign codas

See Reading profiles for details and examples.
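
Since an invalid `profile` raises `ValueError`, values read from configuration or user input can be checked against the literal's members beforehand. The sketch redefines `ReadingProfile` locally so that it runs on its own; whether the alias can be imported from `thaiphon` in this exact form is an assumption. `check_profile` is a hypothetical helper.

```python
from typing import Literal, get_args

# Local copy of the alias documented above, so the block is self-contained.
ReadingProfile = Literal["everyday", "careful_educated", "learned_full", "etalon_compat"]

# get_args returns the literal's members as a tuple of strings.
VALID_PROFILES: tuple[str, ...] = get_args(ReadingProfile)


def check_profile(value: str) -> str:
    """Return `value` unchanged if it is a valid profile, else raise ValueError."""
    if value not in VALID_PROFILES:
        raise ValueError(f"profile must be one of {VALID_PROFILES}, got {value!r}")
    return value


print(check_profile("etalon_compat"))  # etalon_compat
```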
