Add info re tesseract language codes

Closes #3065
shamoon 2023-04-10 14:04:30 -07:00
parent 3425d01853
commit d872423a76

@@ -1088,10 +1088,13 @@ actual group ID on the host system, which you can get by executing
 : Additional OCR languages to install. By default, paperless comes
 with English, German, Italian, Spanish and French. If your language
 is not in this list, install additional languages with this
-configuration option ([find the right LangCodes](https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html)):
+configuration option. You will need to [find the right LangCodes](https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html)
+but note that (tesseract-ocr-\* package names)[https://packages.debian.org/bullseye/graphics/]
+do not always correspond with the language codes e.g. "chi_tra" should be
+specified as "chi-tra".
 ``` bash
-PAPERLESS_OCR_LANGUAGES=tur ces
+PAPERLESS_OCR_LANGUAGES=tur ces chi-tra
 ```
 Make sure it's a space separated list when using several values.
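
The mapping described in this change can be sketched in bash. This is only an illustration: `to_ocr_language` is a hypothetical helper that is not part of paperless, and it assumes the only difference between a Tesseract LangCode and the value expected here is an underscore replaced by a hyphen, as in the "chi_tra" example. Since the package names do not always correspond to the language codes, check the result against the Debian package list before relying on it.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not part of paperless): convert a Tesseract-style
# language code to the hyphenated form by replacing every "_" with "-".
# Assumption: this is the only difference, as in chi_tra -> chi-tra.
to_ocr_language() {
  printf '%s\n' "${1//_/-}"
}

# Tesseract LangCodes as listed in the tessdoc data files table.
codes=(tur ces chi_tra)

# Convert each code, then join them with single spaces.
converted=()
for code in "${codes[@]}"; do
  converted+=("$(to_ocr_language "$code")")
done
PAPERLESS_OCR_LANGUAGES="${converted[*]}"

echo "PAPERLESS_OCR_LANGUAGES=${PAPERLESS_OCR_LANGUAGES}"
```

With the three codes above, the script prints `PAPERLESS_OCR_LANGUAGES=tur ces chi-tra`, which matches the space-separated example in the diff.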