From d872423a7649540d5d23311bc8765b993329e1cc Mon Sep 17 00:00:00 2001
From: shamoon <4887959+shamoon@users.noreply.github.com>
Date: Mon, 10 Apr 2023 14:04:30 -0700
Subject: [PATCH] Add info re tesseract language codes

Closes #3065
---
 docs/configuration.md | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/docs/configuration.md b/docs/configuration.md
index 046904eaf..aca9961e2 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -1088,10 +1088,13 @@ actual group ID on the host system, which you can get by executing
 : Additional OCR languages to install. By default, paperless comes
 with English, German, Italian, Spanish and French. If your language
 is not in this list, install additional languages with this
-configuration option ([find the right LangCodes](https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html)):
+configuration option. You will need to [find the right LangCodes](https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html)
+but note that (tesseract-ocr-\* package names)[https://packages.debian.org/bullseye/graphics/]
+do not always correspond with the language codes e.g. "chi_tra" should be
+specified as "chi-tra".
 
 ``` bash
- PAPERLESS_OCR_LANGUAGES=tur ces
+ PAPERLESS_OCR_LANGUAGES=tur ces chi-tra
 ```
 
 Make sure it's a space separated list when using several values.