Override lexicons — API

Full reference for the three public functions that manage the override lexicon registry.

For motivation, worked examples, and guidance on constructing PhonologicalWord instances, see Override lexicons.


register_lexicon

from thaiphon import register_lexicon
def register_lexicon(
    lookup: Callable[[str], PhonologicalWord | None],
    *,
    name: str,
    priority: int = 0,
) -> None:

Register a word-level override lookup with the pipeline.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| lookup | Callable[[str], PhonologicalWord \| None] | required | Callable that takes a post-normalisation Thai word string and returns a PhonologicalWord for a hit, or None to defer. |
| name | str | required | Identifier for this layer. Used for unregistration, the source tag, and registered_lexicons() output. Must be non-empty and unique across all currently-registered layers. |
| priority | int | 0 | Resolution priority. Higher values resolve first. Layers with equal priority resolve in registration order. |

Returns

None.

Raises

| Exception | Condition |
| --- | --- |
| ValueError | name is empty. |
| ValueError | A lexicon named name is already registered. |

Behaviour

The pipeline calls lookup with the Thai word after Unicode normalisation and Sara-Am expansion have been applied, so the callable does not need to replicate thaiphon's normalisation.

When lookup returns a PhonologicalWord, the pipeline attaches source='override:<name>' to both the AnalysisResult and the returned word, then short-circuits — built-in lexicons and rule-based derivation are not consulted for that word.

When lookup returns None, the next layer in priority order is tried. If all registered layers return None, the built-in pipeline continues as normal.
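
The resolution rules above can be modelled in a few lines of plain Python. The sketch below is an illustrative stand-in, not thaiphon's implementation: ToyRegistry and its methods are hypothetical names, and plain strings take the place of PhonologicalWord so the example runs without thaiphon installed.

```python
from typing import Callable, Optional

# Stand-in for PhonologicalWord (assumption: a plain string is enough here).
Word = str
Lookup = Callable[[str], Optional[Word]]


class ToyRegistry:
    """Hypothetical model of the documented resolution rules."""

    def __init__(self) -> None:
        self._layers: list[tuple[int, str, Lookup]] = []

    def register(self, lookup: Lookup, *, name: str, priority: int = 0) -> None:
        if not name:
            raise ValueError("name must be non-empty")
        if any(existing == name for _, existing, _ in self._layers):
            raise ValueError(f"lexicon {name!r} is already registered")
        self._layers.append((priority, name, lookup))
        # list.sort is stable, so equal priorities keep registration order.
        self._layers.sort(key=lambda layer: -layer[0])

    def unregister(self, name: str) -> bool:
        before = len(self._layers)
        self._layers = [layer for layer in self._layers if layer[1] != name]
        return len(self._layers) < before

    def names(self) -> tuple[str, ...]:
        return tuple(name for _, name, _ in self._layers)

    def resolve(self, word: str) -> Optional[tuple[Word, str]]:
        # The first layer that returns a non-None value wins.
        for _, name, lookup in self._layers:
            hit = lookup(word)
            if hit is not None:
                return hit, f"override:{name}"
        return None  # every layer deferred; the built-in pipeline would run


registry = ToyRegistry()
registry.register({"กรุงเทพ": "base reading"}.get, name="base")
registry.register({"กรุงเทพ": "premium reading"}.get, name="premium", priority=10)
registry.register(lambda w: None, name="late", priority=0)

hit = registry.resolve("กรุงเทพ")   # served by the "premium" layer
miss = registry.resolve("ไม่มี")     # no layer has this word
```

In this model, "premium" answers first because of its higher priority, and "base" precedes "late" only because it was registered earlier.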

Example

from thaiphon import register_lexicon
from thaiphon.model.word import PhonologicalWord

VOCAB: dict[str, PhonologicalWord] = {
    "กรุงเทพ": PhonologicalWord(...),
}

register_lexicon(lambda w: VOCAB.get(w), name="my-site")

unregister_lexicon

from thaiphon import unregister_lexicon
def unregister_lexicon(name: str) -> bool:

Remove a previously-registered lexicon by name.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| name | str | The name passed to register_lexicon when the layer was registered. |

Returns

True if a lexicon with that name was found and removed. False if no lexicon with that name was registered.

Raises

Nothing. Unregistering a name that was never registered is not an error.

Example

from thaiphon import unregister_lexicon

removed = unregister_lexicon("my-site")
print(removed)   # True if the layer existed, False otherwise
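
Because unregister_lexicon returns a bool instead of raising, it is safe to call from cleanup code. One possible pattern is a context manager that registers a layer for the duration of a block, which can be handy in tests. temporary_lexicon is a hypothetical helper, not part of thaiphon; it receives the register and unregister functions as arguments so the sketch runs without thaiphon installed.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def temporary_lexicon(
    register: Callable[..., None],
    unregister: Callable[[str], bool],
    lookup: Callable[[str], Any],
    *,
    name: str,
    priority: int = 0,
) -> Iterator[None]:
    """Register a layer on entry and remove it on exit, even on error."""
    register(lookup, name=name, priority=priority)
    try:
        yield
    finally:
        # Returns False if the layer is already gone; no exception either way.
        unregister(name)
```

With thaiphon, the first two arguments would presumably be register_lexicon and unregister_lexicon, for example temporary_lexicon(register_lexicon, unregister_lexicon, VOCAB.get, name="test-layer").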

registered_lexicons

from thaiphon import registered_lexicons
def registered_lexicons() -> tuple[str, ...]:

Return the names of all currently-registered override lexicons, in resolution order.

Returns

A tuple[str, ...] of layer names, sorted from highest priority to lowest. Layers with equal priority appear in the order they were registered.

An empty tuple is returned when no override lexicons are registered.

Example

from thaiphon import register_lexicon, registered_lexicons

register_lexicon(lambda w: None, name="base",    priority=0)
register_lexicon(lambda w: None, name="premium", priority=10)

print(registered_lexicons())
# ('premium', 'base')

LookupCallable type alias

from thaiphon.overrides import LookupCallable
LookupCallable = Callable[[str], PhonologicalWord | None]

The type of the callable accepted by register_lexicon. Exposed for use in type annotations:

from thaiphon.overrides import LookupCallable
from thaiphon.model.word import PhonologicalWord

def make_lookup(vocab: dict[str, PhonologicalWord]) -> LookupCallable:
    return vocab.get
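
Any callable with this shape qualifies, not only dict.get. As a sketch, several vocabularies can be folded into a single lookup with collections.ChainMap from the standard library, whose get method returns the first match or None. The Word class and make_layered_lookup below are hypothetical names; Word is a placeholder for PhonologicalWord so the example runs on its own.

```python
from collections import ChainMap
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Word:
    """Hypothetical placeholder for thaiphon's PhonologicalWord."""
    reading: str


# Same shape as LookupCallable, with Word standing in for PhonologicalWord.
Lookup = Callable[[str], Optional[Word]]


def make_layered_lookup(*vocabs: dict[str, Word]) -> Lookup:
    # ChainMap searches its mappings left to right, so earlier ones win.
    return ChainMap(*vocabs).get


site = {"กรุงเทพ": Word("site reading")}
shared = {"กรุงเทพ": Word("shared reading"), "ไทย": Word("shared reading 2")}

lookup = make_layered_lookup(site, shared)
```

This keeps precedence between vocabularies inside one registered layer; separate layers with distinct priority values remain the documented way to order independent lexicons.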

Source tagging

When an override lookup returns a result, thaiphon sets source='override:<name>' on both:

  • AnalysisResult.source: visible in the return value of analyze().
  • PhonologicalWord.source: carried on the word itself.

from thaiphon import analyze, register_lexicon
from thaiphon.model.word import PhonologicalWord

# ... register a lexicon named "my-site" with an entry for กรุงเทพ ...

result = analyze("กรุงเทพ")
print(result.source)        # 'override:my-site'
print(result.best.source)   # 'override:my-site'

For words served by the normal pipeline, source is 'lexicon', 'derivation', or 'derivation+lexicon'. See Types — AnalysisResult for the full list.
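
Since the tag has the fixed form 'override:<name>', downstream code can recover the layer name with ordinary string handling. The helper below is a minimal sketch; override_layer is a hypothetical name, not a thaiphon function, and it operates on the source string only.

```python
from typing import Optional

OVERRIDE_PREFIX = "override:"


def override_layer(source: str) -> Optional[str]:
    """Return the layer name from an 'override:<name>' tag, else None."""
    if source.startswith(OVERRIDE_PREFIX):
        return source[len(OVERRIDE_PREFIX):]
    return None  # e.g. 'lexicon', 'derivation', 'derivation+lexicon'
```

Applied to the example above, override_layer(result.source) would give the registered layer name for an override hit and None for words served by the normal pipeline.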