Troubleshooting¶
Common problems and how to fix them.
Installation issues¶
Recommended install
For best results, install both the engine and the lexicon data package together:
pip install thaiphon thaiphon-data-volubilis
The data package raises accuracy from ~57% to ~75% and is needed for correct output on many common words. See the Install guide for full details.
ModuleNotFoundError: No module named 'thaiphon'¶
thaiphon is not installed in the Python environment you are running.
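Check 1: Multiple Python versions. Run python --version (or python3 --version). If the version differs from the one you used when running pip install, you have more than one Python installation, and pip installed the package into a different one. Invoke pip through the interpreter that runs your script: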
python -m pip install thaiphon thaiphon-data-volubilis
# or
python3 -m pip install thaiphon thaiphon-data-volubilis
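Check 2: Virtual environment not activated. If you use a virtual environment, make sure it is activated before you run your script. The standard activation commands are source <venv>/bin/activate on macOS/Linux and <venv>\Scripts\activate on Windows, where <venv> stands for your environment's directory.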
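Both checks can also be confirmed from inside Python. The following is a minimal sketch, using only the standard library, that reports which interpreter is running, whether it looks like a virtual environment, and whether thaiphon can be imported from it. The helper name describe_environment is hypothetical and not part of thaiphon.

```python
import importlib.util
import sys


def describe_environment(module_name: str = "thaiphon") -> dict:
    """Collect facts about the interpreter that is executing this script."""
    return {
        # Full path of the running interpreter; compare it with the one pip used.
        "executable": sys.executable,
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
        # Inside a venv-style environment, sys.prefix differs from sys.base_prefix.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # find_spec returns None when a top-level module cannot be located.
        "module_found": importlib.util.find_spec(module_name) is not None,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Run it with the same command you use for your script. If module_found is False, install the packages with the interpreter path printed as executable, followed by -m pip install thaiphon thaiphon-data-volubilis.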
pip: command not found¶
On macOS/Linux, try pip3 or python3 -m pip install thaiphon.
On Windows, this error usually means that Python's Scripts folder was not added to PATH during installation. Reinstall Python and check "Add Python to PATH" on the first screen of the installer.
Display issues¶
Thai characters appear as boxes or question marks in the terminal¶
Your terminal's font does not include Thai characters, or the terminal encoding is not UTF-8.
This does not affect the correctness of thaiphon's output. The transcription is computed the same way regardless of how your terminal renders the characters.
Fix for terminal display: On Windows, set the terminal code page to UTF-8 by running chcp 65001 before you run your script.
Better fix: write to a file instead. UTF-8 text files will display correctly in any modern text editor:
from thaiphon import transcribe

results = [transcribe(w, scheme="ipa") for w in ["สวัสดี", "น้ำ", "ข้าว"]]
with open("output.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(results))
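If you still want to print to the terminal, Python 3.7 and later can switch the standard output stream to UTF-8 from inside the script. A minimal sketch follows; ensure_utf8_stdout is a hypothetical helper name, and the terminal font must still contain Thai glyphs for the text to render.

```python
import sys


def ensure_utf8_stdout() -> bool:
    """Try to switch sys.stdout to UTF-8; return True if it reports UTF-8 afterwards."""
    # Regular text streams expose reconfigure(); replaced or captured
    # streams may not, so look the method up defensively.
    reconfigure = getattr(sys.stdout, "reconfigure", None)
    if reconfigure is not None:
        # errors="replace" avoids UnicodeEncodeError on unencodable characters.
        reconfigure(encoding="utf-8", errors="replace")
    encoding = getattr(sys.stdout, "encoding", None) or ""
    return encoding.lower().replace("-", "").replace("_", "") == "utf8"


if __name__ == "__main__":
    if not ensure_utf8_stdout():
        print("stdout is not UTF-8; consider writing to a file instead")
```

Call the helper once at the top of the script, before the first print.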
The IPA output is garbled or shows replacement characters¶
Make sure you open the output file with UTF-8 encoding. IPA symbols (˧ ˦ ˩ ː p̚ t̚ k̚ etc.) are standard Unicode characters; they display correctly in any editor that supports Unicode.
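To rule out the editor as the cause, the file can be read back in Python with an explicit encoding. This sketch assumes the file was written as UTF-8, as in the example above; the helper names are hypothetical. A UnicodeDecodeError on reading, or the presence of U+FFFD (the Unicode replacement character), suggests the file was written or converted with a different encoding.

```python
from pathlib import Path

REPLACEMENT_CHAR = "\ufffd"  # shown when a decoder could not interpret some bytes


def read_utf8(path) -> str:
    """Read a text file strictly as UTF-8; raises UnicodeDecodeError on invalid bytes."""
    return Path(path).read_text(encoding="utf-8")


def looks_garbled(text: str) -> bool:
    """Report whether the text already contains replacement characters."""
    return REPLACEMENT_CHAR in text
```

If read_utf8 succeeds and looks_garbled returns False, the file content is intact and the remaining problem is in how the viewer decodes or renders it.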
Input issues¶
Copy-pasting Thai loses tone marks¶
Some messaging apps and web forms strip or mangle Unicode combining characters, which Thai uses for tone marks and for vowels written above or below a consonant.
- Use a text editor that preserves Unicode when preparing your input (VS Code, Notepad++, BBEdit, etc.).
- If you received the text by copy-paste from a PDF, check the original source — PDFs sometimes extract Thai characters in a garbled order.
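To see whether a pasted string still carries its tone marks, you can list its code points. The sketch below relies only on the standard unicodedata module; the helper names are hypothetical. The four Thai tone marks occupy U+0E48 to U+0E4B.

```python
import unicodedata

# Thai tone marks: mai ek, mai tho, mai tri, mai chattawa (U+0E48 to U+0E4B).
THAI_TONE_MARKS = {"\u0e48", "\u0e49", "\u0e4a", "\u0e4b"}


def describe_code_points(text: str) -> list:
    """Return one 'U+XXXX NAME' entry per character, in stored order."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}" for ch in text]


def has_tone_mark(text: str) -> bool:
    """Report whether any Thai tone mark is present in the string."""
    return any(ch in THAI_TONE_MARKS for ch in text)


if __name__ == "__main__":
    for line in describe_code_points("น้ำ"):
        print(line)
```

Comparing the listing for the pasted text with the listing for the same word typed by hand shows at once whether a mark was dropped or characters were reordered.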
thaiphon produces wrong output for a word I know is correct¶
There are several possibilities:
- Register mismatch. Try different reading profiles. The default everyday profile may differ from the form you expected.
- Loanword not in the lexicon. For foreign loanwords, thaiphon uses a lexicon plus a heuristic. Words not in the lexicon fall back to the derivation pipeline, which may not preserve foreign codas.
- Genuine engine error. Please open an issue at github.com/5w0rdf15h/thaiphon/issues with the word and what you expected.
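Before filing an issue, it can help to see the same word under every reading profile side by side. The sketch below is deliberately independent of thaiphon's API: compare_profiles is a hypothetical helper that takes any callable accepting (word, profile_name), so you supply the actual call yourself. The keyword shown in the commented wiring is an assumption; check the API reference for the real parameter name.

```python
from typing import Callable, Dict, Iterable

# Profile names as listed in this guide.
PROFILES = ("everyday", "careful_educated", "learned_full", "etalon_compat")


def compare_profiles(
    word: str,
    transcribe_with_profile: Callable[[str, str], str],
    profiles: Iterable[str] = PROFILES,
) -> Dict[str, str]:
    """Transcribe one word under each profile and collect the results."""
    results = {}
    for name in profiles:
        try:
            results[name] = transcribe_with_profile(word, name)
        except Exception as exc:  # keep going so the table stays complete
            results[name] = f"<error: {exc}>"
    return results


# Example wiring (ASSUMPTION: the keyword argument is called `profile`):
# from thaiphon import transcribe
# table = compare_profiles("น้ำ", lambda w, p: transcribe(w, scheme="ipa", profile=p))
```

Pasting the resulting table into the issue shows the maintainers which profiles, if any, produce the form you expected.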
Output issues¶
UnsupportedSchemeError: no such scheme: 'xyz'¶
The scheme name is not registered. Built-in scheme names are ipa, tlc, and morev — all lowercase. Check for typos.
ValueError: unknown reading profile ...¶
The profile name is not one of the four valid strings. Valid values are everyday, careful_educated, learned_full, etalon_compat.
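Both name errors in this section usually come from typos or wrong letter case, so the name can be checked before it reaches thaiphon. The sketch below uses the scheme and profile names listed in this guide and the standard difflib module; closest_name is a hypothetical helper. Lowercasing the input is an assumption that matches the all-lowercase names shown here.

```python
import difflib
from typing import Optional, Sequence

SCHEMES = ("ipa", "tlc", "morev")
PROFILES = ("everyday", "careful_educated", "learned_full", "etalon_compat")


def closest_name(name: str, valid: Sequence[str]) -> Optional[str]:
    """Return the valid name matching `name`, the nearest candidate, or None."""
    cleaned = name.strip().lower()
    if cleaned in valid:
        return cleaned
    # get_close_matches ranks candidates by similarity ratio (0.0 to 1.0).
    matches = difflib.get_close_matches(cleaned, valid, n=1, cutoff=0.6)
    return matches[0] if matches else None
```

A return value of None means the input resembles none of the valid names, which points to a wrong argument rather than a typo.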
NormalizationError: combining mark at string start without base¶
The input string begins with a combining character (a vowel mark or tone mark) without a consonant before it. This is malformed Thai input. Check the source of the string — it may have been truncated or partially garbled.
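When processing many strings, it can be useful to screen them for this condition first. The sketch below approximates it with Unicode general categories from the standard unicodedata module; starts_with_combining_mark is a hypothetical helper, and the engine's own check may differ in detail.

```python
import unicodedata


def starts_with_combining_mark(text: str) -> bool:
    """Report whether the first character is a Unicode mark (category M*)."""
    # Thai tone marks and the vowel signs written above or below a
    # consonant have general category Mn (nonspacing mark).
    return bool(text) and unicodedata.category(text[0]).startswith("M")
```

Strings for which the helper returns True are best traced back to their source rather than repaired by deleting the leading mark, since the missing consonant cannot be recovered.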
Getting help¶
If none of the above solves your problem, open a GitHub issue at github.com/5w0rdf15h/thaiphon/issues. Include:
- What you typed or ran.
- What you expected.
- What actually happened (the full error message or the wrong output).
- Your Python version (python --version) and thaiphon version (python -c "import thaiphon; print(thaiphon.__version__)").
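The version details can be gathered in one step. This is a minimal sketch using the standard importlib.metadata module (Python 3.8 and later); the function names are hypothetical, and the distribution names are the ones used in the pip commands earlier in this guide.

```python
import platform
import sys
from importlib import metadata


def package_version(dist_name: str) -> str:
    """Return the installed version of a distribution, or 'not installed'."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return "not installed"


def build_report(packages=("thaiphon", "thaiphon-data-volubilis")) -> str:
    """Assemble a short environment summary suitable for a bug report."""
    lines = [
        f"Python: {platform.python_version()} ({sys.executable})",
        f"Platform: {platform.platform()}",
    ]
    lines += [f"{name}: {package_version(name)}" for name in packages]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_report())
```

Paste the printed lines into the issue together with the input and the output you observed.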