Morev scheme (Cyrillic)
The morev scheme implements the Cyrillic transliteration system used in L. N. Morev, Yu. Ya. Plam, and M. F. Fomicheva's 1964 Большой тайско-русский словарь (Big Thai-Russian Dictionary) and in subsequent Russian-language Thai teaching materials. It is intended for audiences who read Cyrillic.
```python
from thaiphon import transcribe

transcribe("น้ำ", scheme="morev", format="html")
# 'на̄мˇ'
transcribe("สวัสดี", scheme="morev", format="html")
# 'саˆ-ватˆ-дӣ'
transcribe("ภาษาไทย", scheme="morev", format="html")
# 'п<sup>х</sup>а̄-са̄´-т<sup>х</sup>ай'
```
Format conventions
| Convention | Detail |
|---|---|
| Syllable separator | - (hyphen) |
| Long vowel | combining macron (◌̄, U+0304) on the first vowel letter of the syllable |
| Tone | spacing modifier letter appended after the coda (see table below) |
| Mid tone | unmarked |
| Output encoding | NFC-normalised; diacritics precomposed where a precomposed form exists |
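The normalisation rule in the last row can be checked with Python's standard `unicodedata` module. The sketch below does not use thaiphon at all; the helper name `nfc_code_points` is made up for illustration, and it only shows which macron vowels have a precomposed Cyrillic code point.

```python
import unicodedata

MACRON = "\u0304"  # COMBINING MACRON


def nfc_code_points(base: str) -> list[str]:
    """Return the code points of base + macron after NFC normalisation."""
    composed = unicodedata.normalize("NFC", base + MACRON)
    return [f"U+{ord(ch):04X}" for ch in composed]


# Cyrillic и and у have precomposed macron forms; а does not.
print(nfc_code_points("\u0438"))  # ['U+04E3']
print(nfc_code_points("\u0443"))  # ['U+04EF']
print(nfc_code_points("\u0430"))  # ['U+0430', 'U+0304']
```

This is why ӣ and ӯ are single characters in the output while а̄, е̄, о̄ and the other long vowels remain a base letter followed by U+0304.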
Tone notation
Tone marks are spacing characters written at the end of the syllable, after any coda consonant. They are not placed on the vowel letter.
| Tone | Modifier | Unicode | Example |
|---|---|---|---|
| Mid | (none) | — | на |
| Low | ˆ | U+02C6 | декˆ |
| Falling | \` | U+0060 | тхе̄п\` |
| High | ˇ | U+02C7 | ракˇ |
| Rising | ´ | U+00B4 | кхо̄´ |
So เด็ก (dead syllable, low tone) renders as декˆ, not де̂к — the mark sits at the end of the whole syllable string.
The HIGH↔RISING correspondence is worth noting. Morev's "third (rising-falling)" tone describes what modern phonological notation labels HIGH (a characteristically high-pitched syllable with a rise-then-fall contour in citation speech). His "fourth (rising)" describes what modern notation labels RISING (a low-to-high contour). The renderer maps HIGH → ˇ and RISING → ´ accordingly.
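The tone table can be read as a simple lookup followed by string concatenation. The following is a minimal sketch of that idea, not the library's actual code: `TONE_MARKS` and `append_tone` are hypothetical names, and only the code points come from the table.

```python
# Hypothetical names; the code points are taken from the tone table.
TONE_MARKS = {
    "mid": "",            # unmarked
    "low": "\u02c6",      # MODIFIER LETTER CIRCUMFLEX ACCENT
    "falling": "\u0060",  # GRAVE ACCENT
    "high": "\u02c7",     # CARON
    "rising": "\u00b4",   # ACUTE ACCENT
}


def append_tone(syllable: str, tone: str) -> str:
    """Append the spacing tone mark after the whole syllable, coda included."""
    return syllable + TONE_MARKS[tone]


print(append_tone("дек", "low"))   # mark follows the coda к
print(append_tone("на", "mid"))    # mid tone leaves the syllable unchanged
```

Because the marks are spacing characters rather than combining ones, appending them never interacts with the macron on the vowel.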
Onset map (IPA → Cyrillic)
| IPA onset | Text mode | HTML mode | Note |
|---|---|---|---|
| k | к | к | |
| kʰ | кх | к<sup>х</sup> | aspiration digraph |
| tɕ | ть | ть | т + soft sign |
| tɕʰ | ч | ч | bare ч; no aspiration mark in either format |
| d | д | д | |
| t | т | т | |
| tʰ | тх | т<sup>х</sup> | aspiration digraph |
| b | б | б | |
| p | п | п | |
| pʰ | пх | п<sup>х</sup> | aspiration digraph |
| f | ф | ф | |
| s | с | с | |
| h | х | х | |
| ʔ | (empty) | (empty) | glottal onset not written |
| m | м | м | |
| n | н | н | |
| ŋ | нг | нг | two-letter digraph |
| j | й | й | |
| r | р | р | |
| l | л | л | |
| w | в | в | as bare initial; see cluster note below |
Aspiration: The three plain stops gain a second letter х to mark aspiration — /kʰ/ → кх, /tʰ/ → тх, /pʰ/ → пх. In HTML mode the second element becomes a superscript (к<sup>х</sup>, т<sup>х</sup>, п<sup>х</sup>). The aspirated palatal /tɕʰ/ is written as bare ч in both modes — the alphabet table in the dictionary lists ฉ, ช, and ฌ all as ч without an aspiration mark.
Velar nasal: /ŋ/ is the two-letter string нг, not a single Cyrillic character. This applies in both onset and coda positions.
CC onset clusters: When ว fills the second slot of a true onset cluster (e.g. /kw/, /kʰw/), it surfaces as у rather than в. กวาง (deer) renders as куа̄нг. As a bare initial onset, ว is still в.
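The position-dependent treatment of /w/ can be sketched as below. This assumes the onset has already been split into IPA segments; `ONSETS` is a partial copy of the table and `render_onset` is a hypothetical helper, not part of the thaiphon API.

```python
# Partial onset table in text mode; keys are IPA segments (illustrative subset).
ONSETS = {
    "k": "к",
    "p": "п",
    "t": "т",
    "r": "р",
    "l": "л",
    "w": "в",
}


def render_onset(segments: list[str]) -> str:
    """Render an onset, writing /w/ as у when it is the second cluster slot."""
    out = []
    for position, segment in enumerate(segments):
        if segment == "w" and position > 0:
            out.append("у")  # cluster-medial /w/
        else:
            out.append(ONSETS[segment])
    return "".join(out)


print(render_onset(["k", "w"]))  # ку, as in куа̄нг
print(render_onset(["w"]))       # в, bare initial
```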
Vowel map (IPA → Cyrillic)
| IPA quality | Short | Long |
|---|---|---|
| /a/ | а | а̄ |
| /i/ | и | ӣ |
| /u/ | у | ӯ |
| /e/ | е | е̄ |
| /ɛ/ | э | э̄ |
| /o/ | о | о̄ |
| /ɔ/ | о | о̄ |
| /ɯ/ | ы | ы̄ |
| /ɤ/ | ə | ə̄ |
| /iə/ short | иа | — |
| /iə/ long | — | ӣа |
| /ɯə/ short | ыа | — |
| /ɯə/ long | — | ы̄а |
| /uə/ short | уа | — |
| /uə/ long | — | ӯа |
Note on /o/ and /ɔ/: Both render as Cyrillic о/о̄. The source dictionary uses these as its default for both modern Thai /oː/ and /ɔː/ in long open syllables; the Latin glyphs ɔ/ɔ̄ appear only sporadically in the dictionary body without a derivable phonological pattern, so the renderer emits о/о̄ for both vowels.
Mid-central /ɤ/: Uses the schwa ə (U+0259), which is intentionally non-Cyrillic and reproduces the dictionary's typesetting convention for this vowel.
Long diphthongs: The macron sits on the first element only — ӣа, ы̄а, ӯа.
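Macron placement on the first element can be expressed in a few lines. This is a sketch under the assumption that the short form of the vowel is already known as a string; `lengthen` is an illustrative name, not a thaiphon function.

```python
import unicodedata

MACRON = "\u0304"  # COMBINING MACRON


def lengthen(short_vowel: str) -> str:
    """Put a macron on the first vowel letter only, then NFC-normalise."""
    first, rest = short_vowel[0], short_vowel[1:]
    return unicodedata.normalize("NFC", first + MACRON + rest)


print(lengthen("иа"))  # ӣа: macron on и, second element untouched
print(lengthen("ыа"))  # ы̄а: no precomposed form, so the macron stays combining
print(lengthen("а"))   # а̄
```

The same function covers monophthongs and diphthongs, since in both cases only the first letter receives the macron.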
Coda map (IPA → Cyrillic)
| IPA coda | Cyrillic | Note |
|---|---|---|
| m | м | |
| n | н | |
| ŋ | нг | two-letter digraph |
| p̚ | п | |
| t̚ | т | |
| k̚ | к | |
| w (offglide) | у | |
| j (offglide) | й | |
| f (loanword) | п | collapses to nearest native stop |
| s (loanword) | т | collapses to nearest native stop |
| l (loanword) | н | collapses to nearest native nasal |
Foreign codas that no native Thai syllable supports collapse to the nearest native segment. This matches the dictionary's treatment of loanwords: ก๊าซ → ка̄тˇ, ฟุตบอล → футˇ-бо̄н, ปรู๊ฟ → прӯпˇ.
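One way to picture the collapse is as a two-step lookup: map a foreign coda to its native counterpart, then render the native coda. The sketch below is an illustration only. The dictionary names and `render_coda` are hypothetical, and the unreleased-stop diacritic is dropped from the IPA keys for simplicity.

```python
# Native codas, keyed by plain IPA (the unreleased mark is omitted here).
NATIVE_CODAS = {
    "m": "м", "n": "н", "ŋ": "нг",
    "p": "п", "t": "т", "k": "к",
    "w": "у", "j": "й",
}

# Loanword codas and the native segment each collapses to.
FOREIGN_TO_NATIVE = {"f": "p", "s": "t", "l": "n"}


def render_coda(ipa_coda: str) -> str:
    """Collapse a foreign coda to its native counterpart, then render it."""
    native = FOREIGN_TO_NATIVE.get(ipa_coda, ipa_coda)
    return NATIVE_CODAS[native]


print(render_coda("s"))  # т, as in ка̄тˇ
print(render_coda("l"))  # н, as in бо̄н
print(render_coda("f"))  # п, as in прӯпˇ
```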
HTML mode
Pass format="html" to receive aspirated stop onsets with superscript markup. All other output is identical to text mode.
```python
from thaiphon import transcribe

transcribe("ขอ", scheme="morev", format="text")
# 'кхо̄´'
transcribe("ขอ", scheme="morev", format="html")
# 'к<sup>х</sup>о̄´'
```
Schemes without per-format differences return the same string for both values of format. IPA and TLC are in that category; Morev provides an HTML overlay only for the three aspirated stop onsets (кх, тх, пх).
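The relationship between the two formats can be illustrated by deriving the HTML form from a text-mode string. This is a post-processing sketch, not how the library produces its output. It assumes syllables are separated by hyphens and that the digraphs кх, тх, пх occur only at the start of a syllable; `text_to_html` is a hypothetical helper.

```python
ASPIRATED_DIGRAPHS = ("кх", "тх", "пх")


def text_to_html(text: str) -> str:
    """Wrap the х of a syllable-initial aspiration digraph in <sup> tags."""
    syllables = []
    for syllable in text.split("-"):
        if syllable.startswith(ASPIRATED_DIGRAPHS):
            syllable = f"{syllable[0]}<sup>{syllable[1]}</sup>{syllable[2:]}"
        syllables.append(syllable)
    return "-".join(syllables)


print(text_to_html("пха̄-са̄´-тхай"))
print(text_to_html("ча̄нг`"))  # bare ч is left unchanged
```

Only the first two characters of a syllable are inspected, so a plain х onset (IPA /h/) is never turned into a superscript.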
Examples
All outputs are verified against the engine.
| Thai | Gloss | Morev text | Morev HTML |
|---|---|---|---|
| กา | crow | ка̄ | ка̄ |
| ขา | leg | кха̄´ | к<sup>х</sup>а̄´ |
| ขอ | to ask/want | кхо̄´ | к<sup>х</sup>о̄´ |
| ถุง | bag | тхунг´ | т<sup>х</sup>унг´ |
| เด็ก | child | декˆ | декˆ |
| รัก | love | ракˇ | ракˇ |
| ปลา | fish | пла̄ | пла̄ |
| น้ำ | water | на̄мˇ | на̄мˇ |
| สวัสดี | hello | саˆ-ватˆ-дӣ | саˆ-ватˆ-дӣ |
| ทหาร | soldier | тхаˇ-ха̄н´ | т<sup>х</sup>аˇ-ха̄н´ |
| กรุงเทพ | Bangkok | крунг-тхе̄п\` | крунг-т<sup>х</sup>е̄п\` |
| ภาษาไทย | Thai language | пха̄-са̄´-тхай | п<sup>х</sup>а̄-са̄´-т<sup>х</sup>ай |
| กวาง | deer | куа̄нг | куа̄нг |
| ปรู๊ฟ | proof | прӯпˇ | прӯпˇ |
| ฟุตบอล | football | футˇ-бо̄н | футˇ-бо̄н |