Mirror of https://github.com/paperless-ngx/paperless-ngx.git (synced 2025-04-15 10:13:15 -05:00)
Merge pull request #2830 from tooomm/patch-1
docs: better language code help
commit 9564a9c28d
@@ -383,21 +383,20 @@ needs.
 : Customize the language that paperless will attempt to use when
 parsing documents.
 
-It should be a 3-letter language code consistent with ISO 639:
-https://www.loc.gov/standards/iso639-2/php/code_list.php
+It should be a 3-letter code, see the list of [languages Tesseract supports](https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html).
 
 Set this to the language most of your documents are written in.
 
 This can be a combination of multiple languages such as `deu+eng`,
-in which case tesseract will use whatever language matches best.
-Keep in mind that tesseract uses much more cpu time with multiple
+in which case Tesseract will use whatever language matches best.
+Keep in mind that Tesseract uses much more CPU time with multiple
 languages enabled.
 
 Defaults to "eng".
 
 !!! note
 
-If your language contains a '-' such as chi-sim, you must use chi_sim
+If your language contains a '-' such as chi-sim, you must use `chi_sim`.
 
 `PAPERLESS_OCR_MODE=<mode>`
 
@@ -1097,12 +1096,14 @@ actual group ID on the host system, which you can get by executing
 : Additional OCR languages to install. By default, paperless comes
 with English, German, Italian, Spanish and French. If your language
 is not in this list, install additional languages with this
-configuration option:
+configuration option ([find the right LangCodes](https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html)):
 
 ``` bash
 PAPERLESS_OCR_LANGUAGES=tur ces
 ```
 
+Make sure it's a space separated list when using several values.
+
 To actually use these languages, also set the default OCR language
 of paperless:
 
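The two settings touched by this change use different separators: `PAPERLESS_OCR_LANGUAGES` is a space-separated list, while `PAPERLESS_OCR_LANGUAGE` joins codes with `+`. The following is a minimal sketch of how the two values could be derived from one list of codes. The helper name `ocr_settings` is hypothetical and not part of paperless-ngx. Applying the `-` to `_` rule only to `PAPERLESS_OCR_LANGUAGE`, and passing the install list through unchanged, are assumptions based on where the note appears in the docs.

```python
def ocr_settings(codes):
    """Build both OCR settings from a list of Tesseract language codes.

    Hypothetical helper for illustration; not part of paperless-ngx.
    """
    if not codes:
        raise ValueError("at least one language code is required")

    # Assumption: the '-' -> '_' rule from the note applies to the
    # PAPERLESS_OCR_LANGUAGE value (e.g. chi-sim becomes chi_sim).
    ocr_codes = [code.replace("-", "_") for code in codes]

    return {
        # Space-separated list of languages to install (passed through as given).
        "PAPERLESS_OCR_LANGUAGES": " ".join(codes),
        # '+'-joined combination, as in the `deu+eng` example.
        "PAPERLESS_OCR_LANGUAGE": "+".join(ocr_codes),
    }


if __name__ == "__main__":
    # Print the settings in KEY=value form for the languages from the docs example.
    for key, value in ocr_settings(["tur", "ces"]).items():
        print(f"{key}={value}")
```

Keep in mind the docs' own caveat that combining several languages in `PAPERLESS_OCR_LANGUAGE` costs noticeably more CPU time, so the joined value is best kept short.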