Merge pull request #2830 from tooomm/patch-1

docs: better language code help
shamoon 2023-03-06 16:56:08 -08:00 committed by GitHub
commit 9564a9c28d
GPG Key ID: 4AEE18F83AFDEB23


@@ -383,21 +383,20 @@ needs.
 : Customize the language that paperless will attempt to use when
 parsing documents.
-It should be a 3-letter language code consistent with ISO 639:
-https://www.loc.gov/standards/iso639-2/php/code_list.php
+It should be a 3-letter code, see the list of [languages Tesseract supports](https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html).
 Set this to the language most of your documents are written in.
 This can be a combination of multiple languages such as `deu+eng`,
-in which case tesseract will use whatever language matches best.
-Keep in mind that tesseract uses much more cpu time with multiple
+in which case Tesseract will use whatever language matches best.
+Keep in mind that Tesseract uses much more CPU time with multiple
 languages enabled.
 Defaults to "eng".
 !!! note
-If your language contains a '-' such as chi-sim, you must use chi_sim
+If your language contains a '-' such as chi-sim, you must use `chi_sim`.
 `PAPERLESS_OCR_MODE=<mode>`
@@ -1097,12 +1096,14 @@ actual group ID on the host system, which you can get by executing
 : Additional OCR languages to install. By default, paperless comes
 with English, German, Italian, Spanish and French. If your language
 is not in this list, install additional languages with this
-configuration option:
+configuration option ([find the right LangCodes](https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html)):
 ``` bash
 PAPERLESS_OCR_LANGUAGES=tur ces
 ```
+Make sure it's a space separated list when using several values.
 To actually use these languages, also set the default OCR language
 of paperless: