# Override lexicons — API
Full reference for the three public functions that manage the override lexicon registry.
For motivation, worked examples, and guidance on constructing `PhonologicalWord` instances, see Override lexicons.
## register_lexicon

```python
def register_lexicon(
    lookup: Callable[[str], PhonologicalWord | None],
    *,
    name: str,
    priority: int = 0,
) -> None:
```
Register a word-level override lookup with the pipeline.
### Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `lookup` | `Callable[[str], PhonologicalWord \| None]` | required | Callable that takes a post-normalisation Thai word string and returns a `PhonologicalWord` for a hit, or `None` to defer. |
| `name` | `str` | required | Identifier for this layer. Used for unregistration, the source tag, and `registered_lexicons()` output. Must be non-empty and unique across all currently-registered layers. |
| `priority` | `int` | `0` | Resolution priority. Higher values resolve first. Layers with equal priority resolve in registration order. |
### Returns

`None`.

### Raises

| Exception | Condition |
|---|---|
| `ValueError` | `name` is empty. |
| `ValueError` | A lexicon named `name` is already registered. |
### Behaviour

The `lookup` callable is called with the Thai word after Unicode normalisation and Sara-Am expansion have been applied. Your lookup does not need to replicate thaiphon's normalisation itself.

When `lookup` returns a `PhonologicalWord`, the pipeline attaches `source='override:<name>'` to both the `AnalysisResult` and the returned word, then short-circuits: built-in lexicons and rule-based derivation are not consulted for that word.

When `lookup` returns `None`, the next layer in priority order is tried. If all registered layers return `None`, the built-in pipeline continues as normal.
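The resolution rules above can be modelled in a few lines of plain Python. This is a hypothetical sketch for reasoning about layer order, not thaiphon's implementation: `Layer` and `resolve` are illustrative names, a plain string stands in for `PhonologicalWord`, and the layers are assumed to arrive already in resolution order (as `registered_lexicons()` reports them).

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stand-in: a plain string plays the role of PhonologicalWord.
Lookup = Callable[[str], Optional[str]]


@dataclass(frozen=True)
class Layer:
    """One registered override layer (illustrative, not a thaiphon type)."""
    name: str
    lookup: Lookup


def resolve(word: str, layers: list[Layer]) -> Optional[tuple[str, str]]:
    """Return (result, source) from the first layer that hits.

    `layers` is assumed to be in resolution order already. Returning None
    models "no override hit; continue with the built-in pipeline".
    """
    for layer in layers:
        hit = layer.lookup(word)
        if hit is not None:
            # Short-circuit: later layers are never consulted for this word.
            return hit, f"override:{layer.name}"
    return None


premium = Layer("premium", {"กรุงเทพ": "from-premium"}.get)
base = Layer("base", {"กรุงเทพ": "from-base", "สวัสดี": "from-base"}.get)
order = [premium, base]  # e.g. "premium" registered with a higher priority

print(resolve("กรุงเทพ", order))  # ('from-premium', 'override:premium')
print(resolve("สวัสดี", order))   # ('from-base', 'override:base')
print(resolve("แมว", order))      # None
```

Because `dict.get` already returns `None` on a miss, a bound `get` method is enough to act as a deferring lookup.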
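### Example

```python
from thaiphon import register_lexicon
from thaiphon.model.word import PhonologicalWord

VOCAB: dict[str, PhonologicalWord] = {
    "กรุงเทพ": PhonologicalWord(...),  # construct as described in Override lexicons
}

# dict.get returns None on a miss, so unknown words defer to the next layer.
register_lexicon(lambda w: VOCAB.get(w), name="my-site")
```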
## unregister_lexicon

Remove a previously-registered lexicon by name.

### Parameters

| Parameter | Type | Description |
|---|---|---|
| `name` | `str` | The name passed to `register_lexicon` when the layer was registered. |
### Returns

- `True` if a lexicon with that name was found and removed.
- `False` if no lexicon with that name was registered.

### Raises

Nothing. Unregistering a name that was never registered is not an error.
### Example

```python
from thaiphon import unregister_lexicon

removed = unregister_lexicon("my-site")
print(removed)  # True if the layer existed, False otherwise
```
## registered_lexicons

Return the names of all currently-registered override lexicons, in resolution order.

### Returns

A `tuple[str, ...]` of layer names, sorted from highest priority to lowest. Layers with equal priority appear in the order they were registered.

An empty tuple is returned when no override lexicons are registered.
### Example

```python
from thaiphon import register_lexicon, registered_lexicons

register_lexicon(lambda w: None, name="base", priority=0)
register_lexicon(lambda w: None, name="premium", priority=10)

print(registered_lexicons())
# ('premium', 'base')
```
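For unit tests that should not touch the registry shared by the module-level functions, a small in-memory double that follows the contract documented on this page can stand in for them. `FakeRegistry` is hypothetical and not part of thaiphon; it only mirrors the documented rules: empty or duplicate names raise `ValueError`, `unregister` returns a bool and never raises, and names are listed by descending priority with ties in registration order.

```python
from typing import Callable, Optional

Lookup = Callable[[str], Optional[object]]


class FakeRegistry:
    """Hypothetical in-memory double mirroring the documented contract."""

    def __init__(self) -> None:
        # name -> (priority, registration sequence number, lookup)
        self._layers: dict[str, tuple[int, int, Lookup]] = {}
        self._seq = 0

    def register(self, lookup: Lookup, *, name: str, priority: int = 0) -> None:
        # Mirrors register_lexicon's documented ValueError conditions.
        if not name:
            raise ValueError("name must be non-empty")
        if name in self._layers:
            raise ValueError(f"a lexicon named {name!r} is already registered")
        self._layers[name] = (priority, self._seq, lookup)
        self._seq += 1

    def unregister(self, name: str) -> bool:
        # Mirrors unregister_lexicon: unknown names return False, never raise.
        return self._layers.pop(name, None) is not None

    def names(self) -> tuple[str, ...]:
        # Highest priority first; equal priorities keep registration order.
        ordered = sorted(self._layers.items(), key=lambda kv: (-kv[1][0], kv[1][1]))
        return tuple(name for name, _ in ordered)


reg = FakeRegistry()
reg.register(lambda w: None, name="base", priority=0)
reg.register(lambda w: None, name="premium", priority=10)
reg.register(lambda w: None, name="extra", priority=0)
print(reg.names())             # ('premium', 'base', 'extra')
print(reg.unregister("base"))  # True
print(reg.unregister("base"))  # False
```

A layer that is unregistered and registered again gets a fresh sequence number in this sketch, so it moves behind other layers of equal priority.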
## LookupCallable type alias

The type of the callable accepted by `register_lexicon`. Exposed for use in type annotations:

```python
from thaiphon.overrides import LookupCallable
from thaiphon.model.word import PhonologicalWord


def make_lookup(vocab: dict[str, PhonologicalWord]) -> LookupCallable:
    # A bound dict.get returns the entry on a hit and None on a miss.
    return vocab.get
```
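If a single layer should draw on several vocabularies, one option is to build its lookup from a `collections.ChainMap`, which searches its mappings left to right so earlier vocabularies win. The sketch below is hypothetical: `make_chained_lookup` is not part of thaiphon, and the type variable `W` stands in for `PhonologicalWord` so the example runs without thaiphon installed.

```python
from collections import ChainMap
from typing import Callable, Optional, TypeVar

W = TypeVar("W")  # stand-in for PhonologicalWord in this sketch


def make_chained_lookup(*vocabs: dict[str, W]) -> Callable[[str], Optional[W]]:
    """Hypothetical helper: one lookup over several vocabularies.

    ChainMap searches the dicts in the order given, so an entry in an
    earlier vocabulary shadows the same key in a later one. Misses return
    None, which tells the pipeline to defer to the next layer.
    """
    return ChainMap(*vocabs).get


site_vocab = {"กรุงเทพ": "site entry"}
shared_vocab = {"กรุงเทพ": "shared entry", "สวัสดี": "shared entry"}

lookup = make_chained_lookup(site_vocab, shared_vocab)
print(lookup("กรุงเทพ"))  # site entry
print(lookup("สวัสดี"))   # shared entry
print(lookup("แมว"))      # None
```

Compared with registering each vocabulary as its own layer, this keeps one `name` and therefore one `override:<name>` source tag for all of them; separate layers fit better when you need per-vocabulary priorities or source tags. `ChainMap` stores references rather than copies, so later edits to the underlying dicts are visible to the lookup.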
## Source tagging

When an override lookup returns a result, thaiphon sets `source='override:<name>'` on both:

- `AnalysisResult.source`: visible in the return value of `analyze()`.
- `PhonologicalWord.source`: carried on the word itself.
```python
from thaiphon import analyze, register_lexicon
from thaiphon.model.word import PhonologicalWord

# ... register a lexicon named "my-site" with an entry for กรุงเทพ ...

result = analyze("กรุงเทพ")
print(result.source)       # override:my-site
print(result.best.source)  # override:my-site
```
For words served by the normal pipeline, source is 'lexicon', 'derivation', or 'derivation+lexicon'. See Types — AnalysisResult for the full list.
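Since override hits are tagged `override:<name>`, downstream code can recover which layer served a word by checking for that prefix. A minimal sketch, assuming only the tag format described above; `override_layer` is a hypothetical helper, not a thaiphon function.

```python
from typing import Optional

OVERRIDE_PREFIX = "override:"


def override_layer(source: str) -> Optional[str]:
    """Return the layer name from an 'override:<name>' source tag, else None.

    Pipeline tags such as 'lexicon', 'derivation', or 'derivation+lexicon'
    return None. Only the leading prefix is stripped, so a layer name that
    itself contains ':' is returned intact.
    """
    if source.startswith(OVERRIDE_PREFIX):
        return source[len(OVERRIDE_PREFIX):]
    return None


print(override_layer("override:my-site"))    # my-site
print(override_layer("derivation+lexicon"))  # None
```

With thaiphon, you would pass `result.source` (or `result.best.source`) from `analyze()`.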