Languages¶
The pymkv.Languages module provides a normalized API for comparing and
translating language codes across ISO 639-1, /B, /T, 639-3, BCP 47, and
English language names. It backs the lenient MKVTrack
language setter and the matches_language()
convenience helper.
The authoritative source for “what mkvmerge accepts” is
mkvmerge --list-languages (loaded once per process via
_load_mkvmerge_table). python-iso639 provides the offline fallback
and resolves /T-variant codes that mkvmerge does not emit.
- pymkv.get_iso639_2(language: str | None) str | None¶
Resolve a language identifier to its canonical ISO 639-2 /B code.
Accepts any ISO 639-1 (
"en"), 639-2 /B ("eng"), 639-2 /T ("fra"→"fre"), 639-3 ("eng") code, or English language name ("English","english"). ReturnsNonefor unrecognized input, non-string input, empty strings, or for languages that exist only in 639-3 (e.g."tok"/ Toki Pona) and therefore have no /B code.- Parameters:¶
- language : str | None¶
The language identifier to resolve.
None, non-strings (including unhashable values such as lists), and empty strings all returnNonewithout raising.
- Returns:¶
The canonical ISO 639-2 /B code, or
Noneif no mapping exists.- Return type:¶
str | None
Examples
>>> get_iso639_2("eng") 'eng' >>> get_iso639_2("fra") 'fre' >>> get_iso639_2("English") 'eng' >>> get_iso639_2("tok") is None True >>> get_iso639_2(None) is None True
-
pymkv.normalize_language(code: str | None, mkvmerge_path: tuple[str, ...] | None =
None) str | None¶ Normalize a language identifier to its canonical ISO 639-2 /B code.
Strips BCP 47 subtags before lookup (
"zh-Hans-CN"→"chi") and treatsNone, the empty string, and"und"(the BCP 47 / mkvmerge “undefined” sentinel) as “no language” by returningNone.Resolution order:
_load_mkvmerge_table()lookup (authoritative for mkvmerge).pymkv.ISO639_2.get_iso639_2()(covers /T → /B and English names).None.
- Parameters:¶
- code : str | None¶
Any 639-1/2/B/2/T/3 code, English language name, or BCP 47 tag.
None,"","und"and unrecognized input returnNone.- mkvmerge_path : tuple[str, ...], optional¶
Tuple form of the mkvmerge invocation (as produced by
pymkv.utils.prepare_mkvtoolnix_path()) used to load the authoritative language table. Defaults to plain"mkvmerge"from$PATH. PassMKVFile.mkvmerge_path/MKVTrack.mkvmerge_pathto honor a per-object override.
- Returns:¶
Canonical ISO 639-2 /B code, or
None.- Return type:¶
str | None
-
pymkv.is_known_language(code: str | None, mkvmerge_path: tuple[str, ...] | None =
None) bool¶ Return
Truewhencodeis a tag mkvmerge or python-iso639 recognizes.Strips BCP 47 subtags before lookup. Differs from
normalize_language()by recognizing 639-3-only languages such as Toki Pona ("tok") and Ghotuo ("aaa") — mkvmerge accepts these even though they have no /B form, sonormalize_language()returnsNonefor them. Use this predicate when you need to gate on “would mkvmerge accept this tag” rather than “does it normalize to /B”.The BCP 47
"und"sentinel is treated as recognized: mkvmerge accepts it as the explicit “undefined language” marker.- Parameters:¶
- code : str | None¶
Any 639-1/2/B/2/T/3 code, English language name, or BCP 47 tag.
- mkvmerge_path : tuple[str, ...], optional¶
Tuple form of the mkvmerge invocation (as produced by
pymkv.utils.prepare_mkvtoolnix_path()). Defaults to plain"mkvmerge"from$PATH. PassMKVFile.mkvmerge_path/MKVTrack.mkvmerge_pathto honor a per-object override.
- Returns:¶
Trueif the input is well-formed BCP 47 and the primary subtag appears in the mkvmerge table or is recognized by python-iso639.FalseforNone, non-strings, empty strings, malformed BCP 47 (whitespace, consecutive hyphens, empty subtags), and unrecognized input.- Return type:¶
bool
-
pymkv.language_equivalents(code: str | None, mkvmerge_path: tuple[str, ...] | None =
None) frozenset[str]¶ Return every known code for the same language as
code.Merges results from both data sources (mkvmerge frozenset and python-iso639 attributes) into a single frozenset. Includes 639-1, 639-2 /B, 639-2 /T and 639-3 codes when known. BCP 47 subtags on the input are stripped before lookup so
"zh-Hans-CN"resolves to the same set as"chi".- Parameters:¶
- Returns:¶
All equivalent codes. Empty frozenset for
None,"","und", and malformed BCP 47 input ("en--US", names containing hyphens such as"Pa-O"). For an unrecognized but well-formed string, returns a frozenset containing only the lower-cased primary subtag.- Return type:¶
frozenset[str]
-
pymkv.languages_match(a: str | None, b: str | None, mkvmerge_path: tuple[str, ...] | None =
None) bool¶ Return
Truewhenaandbrefer to the same language.Equality is computed across all formats this module understands, including BCP 47 subtags, so e.g.
languages_match("chi", "zh-Hans") is True. 639-3-only codes mkvmerge accepts ("tok","aaa") compare equal to themselves even though they have no canonical /B form.Either side being
None,""or"und"always returnsFalse— “unknown matches unknown” is not useful in the contexts this API was designed for (track filtering, “is this the original-language audio?” questions).