Merge pull request #2830 from tooomm/patch-1

docs: better language code help
2025-07-16 17:25:11 -05:00 · 2023-03-06 16:56:08 -08:00 · 2023-03-06 16:56:08 -08:00 · 9564a9c28d
commit 9564a9c28d
parent 64b2037eda c5b701f99d
1 changed file with 7 additions and 6 deletions
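This change documents two value formats. `PAPERLESS_OCR_LANGUAGE` takes one or more codes joined with `+` (for example `deu+eng`), and a `-` in a code must be written as `_` (for example `chi_sim`). `PAPERLESS_OCR_LANGUAGES` takes a space-separated list (for example `tur ces`). The sketch below is a rough illustration of those two formats. The helper names `parse_ocr_language`, `parse_ocr_languages` and `_check` are hypothetical. The regex is also an assumption that only approximates Tesseract's code shapes: it is not paperless's own validation. The linked Tesseract page remains the authoritative list of codes.

```python
import re

# Assumption for illustration: a Tesseract language code is three lowercase
# letters, optionally followed by "_suffix" parts (e.g. "chi_sim").
# This is NOT paperless's own validation logic; all helper names are hypothetical.
_CODE = re.compile(r"[a-z]{3}(?:_[a-z]+)*")


def _check(code: str) -> str:
    """Reject codes that use '-' or do not look like a Tesseract code."""
    if "-" in code:
        raise ValueError(f"use '_' instead of '-': {code!r}")
    if not _CODE.fullmatch(code):
        raise ValueError(f"not a Tesseract-style language code: {code!r}")
    return code


def parse_ocr_language(value: str) -> list[str]:
    """Split a PAPERLESS_OCR_LANGUAGE value such as 'deu+eng' on '+'."""
    return [_check(code) for code in value.split("+")]


def parse_ocr_languages(value: str) -> list[str]:
    """Split a PAPERLESS_OCR_LANGUAGES value such as 'tur ces' on whitespace."""
    return [_check(code) for code in value.split()]


print(parse_ocr_language("deu+eng"))  # ['deu', 'eng']
print(parse_ocr_languages("tur ces"))  # ['tur', 'ces']
```

Under these assumptions, a comma-separated value such as `tur,ces` or a hyphenated code such as `chi-sim` raises `ValueError`. This mirrors the two pitfalls the updated docs warn about.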
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -383,21 +383,20 @@ needs.
 : Customize the language that paperless will attempt to use when
 parsing documents.

-    It should be a 3-letter language code consistent with ISO 639:
-    https://www.loc.gov/standards/iso639-2/php/code_list.php
+    It should be a 3-letter code; see the list of [languages Tesseract supports](https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html).

    Set this to the language most of your documents are written in.

    This can be a combination of multiple languages such as `deu+eng`,
-    in which case tesseract will use whatever language matches best.
-    Keep in mind that tesseract uses much more cpu time with multiple
+    in which case Tesseract will use whatever language matches best.
+    Keep in mind that Tesseract uses much more CPU time with multiple
    languages enabled.

    Defaults to "eng".

    !!! note

-        If your language contains a '-' such as chi-sim, you must use chi_sim
+        If your language code contains a '-', such as chi-sim, use an underscore instead: `chi_sim`.

 `PAPERLESS_OCR_MODE=<mode>`

@@ -1097,12 +1096,14 @@ actual group ID on the host system, which you can get by executing
 : Additional OCR languages to install. By default, paperless comes
 with English, German, Italian, Spanish and French. If your language
 is not in this list, install additional languages with this
-configuration option:
+configuration option ([find the right language codes](https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html)):

    ``` bash
    PAPERLESS_OCR_LANGUAGES=tur ces
    ```

+    When specifying several languages, separate them with spaces.
+
    To actually use these languages, also set the default OCR language
    of paperless: