Morev scheme (Cyrillic)

The morev scheme implements the Cyrillic transliteration system used in L. N. Morev, Yu. Ya. Plam, and M. F. Fomicheva's 1964 Большой тайско-русский словарь (Big Thai-Russian Dictionary) and in subsequent Russian-language Thai teaching materials. It is intended for audiences who read Cyrillic.

from thaiphon import transcribe

transcribe("น้ำ", scheme="morev", format="html")
# 'на̄мˇ'

transcribe("สวัสดี", scheme="morev", format="html")
# 'саˆ-ватˆ-дӣ'

transcribe("ภาษาไทย", scheme="morev", format="html")
# 'п<sup>х</sup>а̄-са̄´-т<sup>х</sup>ай'

Format conventions

| Convention | Detail |
|---|---|
| Syllable separator | - (hyphen) |
| Long vowel | combining macron (◌̄, U+0304) on the first vowel letter of the syllable |
| Tone | spacing modifier letter appended after the coda (see table below) |
| Mid tone | unmarked |
| Output encoding | NFC-normalised; diacritics precomposed where a precomposed form exists |
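
The encoding convention can be checked with the standard library alone. The sketch below does not call thaiphon; the helper name `long_form` is made up for illustration. It shows that a vowel letter plus U+0304 composes to a single code point under NFC when Unicode defines one (и, у) and otherwise stays as two code points (а, е, о).

```python
import unicodedata

MACRON = "\u0304"  # combining macron

def long_form(vowel: str) -> str:
    """Attach a macron to a vowel letter and NFC-normalise the result."""
    return unicodedata.normalize("NFC", vowel + MACRON)

# Print each long vowel with the code points that make it up.
for letter in ["\u0430", "\u0438", "\u0443", "\u0435", "\u043e"]:  # а и у е о
    out = long_form(letter)
    print(out, [f"U+{ord(ch):04X}" for ch in out])
```

A consumer that compares Morev strings byte for byte would presumably want to normalise its own input to NFC first for the same reason.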

Tone notation

Tone marks are spacing characters written at the end of the syllable, after any coda consonant. They are not placed on the vowel letter.

| Tone | Modifier | Unicode | Example |
|---|---|---|---|
| Mid | (none) | | на |
| Low | ˆ | U+02C6 | декˆ |
| Falling | ` | U+0060 | тхе̄п` |
| High | ˇ | U+02C7 | ракˇ |
| Rising | ´ | U+00B4 | кхо̄´ |

So เด็ก (dead syllable, low tone) renders as декˆ, not де̂к — the mark sits at the end of the whole syllable string.
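
As a minimal sketch of the placement rule, assuming the syllable body (onset, vowel, coda) has already been rendered to Cyrillic, the tone mark is a plain string suffix. The names `TONE_MARKS` and `add_tone` are illustrative and are not part of thaiphon.

```python
# Spacing tone marks from the tone table; the mid tone is unmarked.
TONE_MARKS = {
    "MID": "",
    "LOW": "\u02c6",      # ˆ modifier letter circumflex accent
    "FALLING": "\u0060",  # ` grave accent
    "HIGH": "\u02c7",     # ˇ caron
    "RISING": "\u00b4",   # ´ acute accent
}

def add_tone(syllable: str, tone: str) -> str:
    """Append the tone mark after the whole syllable, coda included."""
    return syllable + TONE_MARKS[tone]

print(add_tone("дек", "LOW"))
print(add_tone("рак", "HIGH"))
print(add_tone("на", "MID"))
```

Because the mark is never combined with the vowel, the syllable body is identical across tones.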

The HIGH and RISING correspondence is worth noting. Morev's "third (rising-falling)" tone describes what modern phonological notation labels HIGH (a characteristically high-pitched syllable with a rise-then-fall contour in citation speech). His "fourth (rising)" describes what modern notation labels RISING (a low-to-high contour). The renderer maps HIGH → ˇ and RISING → ´ accordingly.


Onset map (IPA → Cyrillic)

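| IPA onset | Text mode | HTML mode | Note |
|---|---|---|---|
| k | к | к | |
| kʰ | кх | к<sup>х</sup> | aspiration digraph |
| tɕ | ть | ть | т + soft sign |
| tɕʰ | ч | ч | bare ч; no aspiration mark in either format |
| d | д | д | |
| t | т | т | |
| tʰ | тх | т<sup>х</sup> | aspiration digraph |
| b | б | б | |
| p | п | п | |
| pʰ | пх | п<sup>х</sup> | aspiration digraph |
| f | ф | ф | |
| s | с | с | |
| h | х | х | |
| ʔ | (empty) | (empty) | glottal onset not written |
| m | м | м | |
| n | н | н | |
| ŋ | нг | нг | two-letter digraph |
| j | й | й | |
| r | р | р | |
| l | л | л | |
| w | в | в | as bare initial; see cluster note below |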

Aspiration: The three plain stops gain a second letter х to mark aspiration: /kʰ/ → кх, /tʰ/ → тх, /pʰ/ → пх. In HTML mode the second element becomes a superscript (к<sup>х</sup>, т<sup>х</sup>, п<sup>х</sup>). The aspirated palatal /tɕʰ/ is written as bare ч in both modes; the alphabet table in the dictionary lists ฉ, ช, and ฌ all as ч without an aspiration mark.
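
The aspiration rule can be pictured as a small lookup holding one text form and one HTML form per aspirated onset. This is a sketch only: `ASPIRATED` and `aspirated_onset` are hypothetical names, and nothing here reflects how thaiphon stores its tables internally.

```python
# Hypothetical table: IPA onset -> (text form, HTML form).
ASPIRATED = {
    "kʰ": ("кх", "к<sup>х</sup>"),
    "tʰ": ("тх", "т<sup>х</sup>"),
    "pʰ": ("пх", "п<sup>х</sup>"),
    "tɕʰ": ("ч", "ч"),  # bare ч in both modes, no aspiration mark
}

def aspirated_onset(ipa: str, fmt: str = "text") -> str:
    """Return the Cyrillic onset for the requested output format."""
    text_form, html_form = ASPIRATED[ipa]
    return html_form if fmt == "html" else text_form

print(aspirated_onset("kʰ"))
print(aspirated_onset("kʰ", fmt="html"))
print(aspirated_onset("tɕʰ", fmt="html"))
```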

Velar nasal: /ŋ/ is the two-letter string нг, not a single Cyrillic character. This applies in both onset and coda positions.

CC onset clusters: When /w/ fills the second slot of a true onset cluster (e.g. /kw/, /kʰw/), it surfaces as у rather than в. กวาง (gaur) renders as куа̄нг. As a bare initial onset, /w/ is still в.
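
A minimal sketch of the cluster rule, assuming the onset arrives as a sequence of IPA segments. The table covers only the segments needed for the demonstration, and `render_onset` is an illustrative name rather than a thaiphon function.

```python
from typing import Sequence

# Partial, illustrative onset table (text mode only).
SINGLE_ONSETS = {
    "k": "к", "kʰ": "кх", "p": "п", "t": "т",
    "r": "р", "l": "л", "w": "в",
}

def render_onset(segments: Sequence[str]) -> str:
    """Render an onset; /w/ in the second slot of a cluster becomes у."""
    out = []
    for position, segment in enumerate(segments):
        if segment == "w" and position > 0:
            out.append("у")  # cluster-internal /w/
        else:
            out.append(SINGLE_ONSETS[segment])
    return "".join(out)

print(render_onset(["k", "w"]))   # cluster: у
print(render_onset(["w"]))        # bare initial: в
print(render_onset(["p", "l"]))   # other clusters are unaffected
```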


Vowel map (IPA → Cyrillic)

| IPA quality | Short | Long |
|---|---|---|
| /a/ | а | а̄ |
| /i/ | и | ӣ |
| /u/ | у | ӯ |
| /e/ | е | е̄ |
| /ɛ/ | э | э̄ |
| /o/ | о | о̄ |
| /ɔ/ | о | о̄ |
| /ɯ/ | ы | ы̄ |
| /ɤ/ | ə | ə̄ |
| /iə/ | иа | ӣа |
| /ɯə/ | ыа | ы̄а |
| /uə/ | уа | ӯа |

Note on /o/ and /ɔ/: Both render as Cyrillic о/о̄. The source dictionary uses these as its default for both modern Thai /oː/ and /ɔː/ in long open syllables; the Latin glyphs ɔ/ɔ̄ appear only sporadically in the dictionary body without a derivable phonological pattern, so the renderer emits о/о̄ for both vowels.

Mid-central /ɤ/: Uses the schwa ə (U+0259), which is intentionally non-Cyrillic and reproduces the dictionary's typesetting convention for this vowel.

Long diphthongs: The macron sits on the first element only — ӣа, ы̄а, ӯа.
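
The vowel rules above reduce to one operation: take the short form, put a combining macron after its first letter, and NFC-normalise. The sketch below assumes that reading; `SHORT_VOWELS` and `render_vowel` are illustrative names and do not come from thaiphon.

```python
import unicodedata

# Short forms from the vowel table; /o/ and /ɔ/ share one letter.
SHORT_VOWELS = {
    "a": "а", "i": "и", "u": "у", "e": "е", "ɛ": "э",
    "o": "о", "ɔ": "о", "ɯ": "ы", "ɤ": "\u0259",  # schwa, deliberately non-Cyrillic
    "iə": "иа", "ɯə": "ыа", "uə": "уа",
}

def render_vowel(ipa: str, long: bool = False) -> str:
    """Return the short form, or the long form with a macron on the first letter."""
    short = SHORT_VOWELS[ipa]
    if not long:
        return short
    # The macron follows the first letter only, so diphthongs keep a plain second element.
    return unicodedata.normalize("NFC", short[0] + "\u0304" + short[1:])

for ipa in ["a", "i", "ɔ", "ɤ", "iə", "uə"]:
    print(ipa, render_vowel(ipa), render_vowel(ipa, long=True))
```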


Coda map (IPA → Cyrillic)

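| IPA coda | Cyrillic | Note |
|---|---|---|
| m | м | |
| n | н | |
| ŋ | нг | two-letter digraph |
| p̚ | п | |
| t̚ | т | |
| k̚ | к | |
| w (offglide) | у | |
| j (offglide) | й | |
| f (loanword) | п | collapses to nearest native stop |
| s (loanword) | т | collapses to nearest native stop |
| l (loanword) | н | collapses to nearest native nasal |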

Foreign codas that no native Thai syllable supports collapse to the nearest native segment. This matches the dictionary's treatment of loanwords: ก๊าซ → ка̄тˇ, ฟุตบอล → футˇ-бо̄н, ปรู๊ฟ → прӯпˇ.
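
One way to picture the collapse is a two-step lookup: a foreign coda is first replaced by its nearest native segment, and the native segment is then mapped to Cyrillic. The sketch below is an assumption about structure, not thaiphon's implementation. The stop codas are keyed as unreleased p̚, t̚, k̚ (written with U+031A), which is also an assumed notation.

```python
# Native codas -> Cyrillic. Stops are keyed with U+031A (assumed notation).
NATIVE_CODAS = {
    "m": "м", "n": "н", "ŋ": "нг",
    "p\u031a": "п", "t\u031a": "т", "k\u031a": "к",
    "w": "у", "j": "й",
}

# Loanword codas and the native segment each one collapses to.
FOREIGN_TO_NATIVE = {"f": "p\u031a", "s": "t\u031a", "l": "n"}

def render_coda(ipa: str) -> str:
    """Collapse a foreign coda to its native counterpart, then render it."""
    native = FOREIGN_TO_NATIVE.get(ipa, ipa)
    return NATIVE_CODAS[native]

for coda in ["f", "s", "l", "ŋ"]:
    print(coda, render_coda(coda))
```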


HTML mode

Pass format="html" to receive aspirated stop onsets with superscript markup. All other output is identical to text mode.

transcribe("ขอ", scheme="morev", format="text")
# 'кхо̄´'

transcribe("ขอ", scheme="morev", format="html")
# 'к<sup>х</sup>о̄´'

Schemes without per-format differences return the same string for both values of format. IPA and TLC are in that category; Morev provides an HTML overlay only for the three aspirated stop onsets (/kʰ/, /tʰ/, /pʰ/).
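
Since the superscript tags are the only difference between the two formats, stripping them from an HTML-mode string should give back the text-mode string. The helper below is a hypothetical convenience for checking that, not part of thaiphon.

```python
import re

def html_to_text(morev_html: str) -> str:
    """Drop <sup> and </sup> so an HTML-mode string can be compared with text mode."""
    return re.sub(r"</?sup>", "", morev_html)

html_form = "к<sup>х</sup>о̄´"
print(html_to_text(html_form))
```

A check like this could be run over paired outputs of `transcribe(..., format="text")` and `transcribe(..., format="html")` to confirm that nothing else differs.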


Examples

All outputs are verified against the engine.

| Thai | Gloss | Morev text | Morev HTML |
|---|---|---|---|
| กา | crow | ка̄ | ка̄ |
| ขา | leg | кха̄´ | к<sup>х</sup>а̄´ |
| ขอ | to ask/want | кхо̄´ | к<sup>х</sup>о̄´ |
| ถุง | bag | тхунг´ | т<sup>х</sup>унг´ |
| เด็ก | child | декˆ | декˆ |
| รัก | love | ракˇ | ракˇ |
| ปลา | fish | пла̄ | пла̄ |
| น้ำ | water | на̄мˇ | на̄мˇ |
| สวัสดี | hello | саˆ-ватˆ-дӣ | саˆ-ватˆ-дӣ |
| ทหาร | soldier | тхаˇ-ха̄н´ | т<sup>х</sup>аˇ-ха̄н´ |
| กรุงเทพ | Bangkok | крунг-тхе̄п` | крунг-т<sup>х</sup>е̄п` |
| ภาษาไทย | Thai language | пха̄-са̄´-тхай | п<sup>х</sup>а̄-са̄´-т<sup>х</sup>ай |
| กวาง | gaur | куа̄нг | куа̄нг |
| ปรู๊ฟ | proof | прӯпˇ | прӯпˇ |
| ฟุตบอล | football | футˇ-бо̄н | футˇ-бо̄н |
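
Because the tone mark is always the last character of a syllable and syllables are joined with hyphens, the tones can be read back from a text-mode string. The sketch below assumes text-mode input and uses the modern tone labels; `MARK_TO_TONE` and `tones` are illustrative names.

```python
# Reverse of the tone table: trailing mark -> modern tone label.
MARK_TO_TONE = {
    "\u02c6": "LOW",      # ˆ
    "\u0060": "FALLING",  # `
    "\u02c7": "HIGH",     # ˇ
    "\u00b4": "RISING",   # ´
}

def tones(morev_text: str) -> list:
    """Return one tone label per hyphen-separated syllable."""
    result = []
    for syllable in morev_text.split("-"):
        # A syllable with no trailing mark carries the mid tone.
        result.append(MARK_TO_TONE.get(syllable[-1:], "MID"))
    return result

print(tones("саˆ-ватˆ-дӣ"))
print(tones("крунг-тхе̄п`"))
```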