Account for plusses in the OCR language setting

Trenton H 2022-09-28 14:24:34 -07:00
parent 2d71415ede
commit e88d911984

@@ -719,7 +719,10 @@ def _get_nltk_language_setting(ocr_lang: str) -> Optional[str]:
     Maps an ISO-639-1 language code supported by Tesseract into
     an optional NLTK language name. This is the set of common supported
     languages for all the NLTK data used.
+
+    Assumption: The primary language is first
     """
+    ocr_lang = ocr_lang.split("+")[0]
     iso_code_to_nltk = {
         "dan": "danish",
         "nld": "dutch",