Languages

The pymkv.Languages module provides a normalized API for comparing and translating language codes across ISO 639-1, /B, /T, 639-3, BCP 47, and English language names. It backs the lenient MKVTrack language setter and the matches_language() convenience helper.

The authoritative source for “what mkvmerge accepts” is mkvmerge --list-languages (loaded once per process via _load_mkvmerge_table). python-iso639 provides the offline fallback and resolves /T-variant codes that mkvmerge does not emit.

pymkv.get_iso639_2(language: str | None) str | None

Resolve a language identifier to its canonical ISO 639-2 /B code.

Accepts any ISO 639-1 ("en"), 639-2 /B ("eng"), 639-2 /T ("fra""fre"), 639-3 ("eng") code, or English language name ("English", "english"). Returns None for unrecognized input, non-string input, empty strings, or for languages that exist only in 639-3 (e.g. "tok" / Toki Pona) and therefore have no /B code.

Parameters:
language : str | None

The language identifier to resolve. None, non-strings (including unhashable values such as lists), and empty strings all return None without raising.

Returns:

The canonical ISO 639-2 /B code, or None if no mapping exists.

Return type:

str | None

Examples

>>> get_iso639_2("eng")
'eng'
>>> get_iso639_2("fra")
'fre'
>>> get_iso639_2("English")
'eng'
>>> get_iso639_2("tok") is None
True
>>> get_iso639_2(None) is None
True
pymkv.normalize_language(code: str | None, mkvmerge_path: tuple[str, ...] | None = None) str | None

Normalize a language identifier to its canonical ISO 639-2 /B code.

Strips BCP 47 subtags before lookup ("zh-Hans-CN""chi") and treats None, the empty string, and "und" (the BCP 47 / mkvmerge “undefined” sentinel) as “no language” by returning None.

Resolution order:

  1. _load_mkvmerge_table() lookup (authoritative for mkvmerge).

  2. pymkv.ISO639_2.get_iso639_2() (covers /T → /B and English names).

  3. None.

Parameters:
code : str | None

Any 639-1/2/B/2/T/3 code, English language name, or BCP 47 tag. None, "", "und" and unrecognized input return None.

mkvmerge_path : tuple[str, ...], optional

Tuple form of the mkvmerge invocation (as produced by pymkv.utils.prepare_mkvtoolnix_path()) used to load the authoritative language table. Defaults to plain "mkvmerge" from $PATH. Pass MKVFile.mkvmerge_path / MKVTrack.mkvmerge_path to honor a per-object override.

Returns:

Canonical ISO 639-2 /B code, or None.

Return type:

str | None

pymkv.is_known_language(code: str | None, mkvmerge_path: tuple[str, ...] | None = None) bool

Return True when code is a tag mkvmerge or python-iso639 recognizes.

Strips BCP 47 subtags before lookup. Differs from normalize_language() by recognizing 639-3-only languages such as Toki Pona ("tok") and Ghotuo ("aaa") — mkvmerge accepts these even though they have no /B form, so normalize_language() returns None for them. Use this predicate when you need to gate on “would mkvmerge accept this tag” rather than “does it normalize to /B”.

The BCP 47 "und" sentinel is treated as recognized: mkvmerge accepts it as the explicit “undefined language” marker.

Parameters:
code : str | None

Any 639-1/2/B/2/T/3 code, English language name, or BCP 47 tag.

mkvmerge_path : tuple[str, ...], optional

Tuple form of the mkvmerge invocation (as produced by pymkv.utils.prepare_mkvtoolnix_path()). Defaults to plain "mkvmerge" from $PATH. Pass MKVFile.mkvmerge_path / MKVTrack.mkvmerge_path to honor a per-object override.

Returns:

True if the input is well-formed BCP 47 and the primary subtag appears in the mkvmerge table or is recognized by python-iso639. False for None, non-strings, empty strings, malformed BCP 47 (whitespace, consecutive hyphens, empty subtags), and unrecognized input.

Return type:

bool

pymkv.language_equivalents(code: str | None, mkvmerge_path: tuple[str, ...] | None = None) frozenset[str]

Return every known code for the same language as code.

Merges results from both data sources (mkvmerge frozenset and python-iso639 attributes) into a single frozenset. Includes 639-1, 639-2 /B, 639-2 /T and 639-3 codes when known. BCP 47 subtags on the input are stripped before lookup so "zh-Hans-CN" resolves to the same set as "chi".

Parameters:
code : str | None

A language identifier.

mkvmerge_path : tuple[str, ...], optional

Tuple form of the mkvmerge invocation. Defaults to plain "mkvmerge" from $PATH. Pass a custom path to consult a non-default binary.

Returns:

All equivalent codes. Empty frozenset for None, "", "und", and malformed BCP 47 input ("en--US", names containing hyphens such as "Pa-O"). For an unrecognized but well-formed string, returns a frozenset containing only the lower-cased primary subtag.

Return type:

frozenset[str]

pymkv.languages_match(a: str | None, b: str | None, mkvmerge_path: tuple[str, ...] | None = None) bool

Return True when a and b refer to the same language.

Equality is computed across all formats this module understands, including BCP 47 subtags, so e.g. languages_match("chi", "zh-Hans") is True. 639-3-only codes mkvmerge accepts ("tok", "aaa") compare equal to themselves even though they have no canonical /B form.

Either side being None, "" or "und" always returns False — “unknown matches unknown” is not useful in the contexts this API was designed for (track filtering, “is this the original-language audio?” questions).

Parameters:
a : str | None

Language identifiers to compare.

b : str | None

Language identifiers to compare.

mkvmerge_path : tuple[str, ...], optional

Tuple form of the mkvmerge invocation. Defaults to plain "mkvmerge" from $PATH.

Returns:

True if both inputs resolve to the same canonical key.

Return type:

bool