documentation for the new configuration options

jonaswinkler 2021-02-21 00:23:01 +01:00
parent 6da237dd9e
commit b978994525


@ -202,7 +202,6 @@ Paperless uses `OCRmyPDF <https://ocrmypdf.readthedocs.io/en/latest/>`_ for
performing OCR on documents and images. Paperless uses sensible defaults for
most settings, but all of them can be configured to your needs.
PAPERLESS_OCR_LANGUAGE=<lang>
Customize the language that paperless will attempt to use when
parsing documents.
@ -245,6 +244,39 @@ PAPERLESS_OCR_MODE=<mode>
The default is ``skip``, which only performs OCR when necessary and always
creates archived documents.
PAPERLESS_OCR_CLEAN=<mode>
Tells paperless to use ``unpaper`` to clean any input document before
sending it to tesseract. This uses more resources, but generally improves
OCR results. The following modes are available:
* ``clean``: Apply unpaper.
* ``clean-final``: Apply unpaper, and use the cleaned images to build the
output file instead of the original images.
* ``none``: Do not apply unpaper.
Defaults to ``clean``.
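The three modes above can be read as a small lookup. The following is a minimal sketch, not paperless' actual code: the helper name ``clean_args`` is hypothetical, and the keyword names ``clean`` and ``clean_final`` are assumed to mirror OCRmyPDF's ``--clean`` and ``--clean-final`` options.

```python
import os


def clean_args(mode=None):
    """Translate a PAPERLESS_OCR_CLEAN value into keyword arguments.

    Hypothetical helper for illustration only. The returned keys are
    assumed to match OCRmyPDF's ``clean`` / ``clean_final`` options.
    """
    if mode is None:
        # Fall back to the documented default when the variable is unset.
        mode = os.environ.get("PAPERLESS_OCR_CLEAN", "clean")

    if mode == "clean":
        # Clean pages before OCR, keep the original images in the output.
        return {"clean": True}
    if mode == "clean-final":
        # Clean pages and build the output file from the cleaned images.
        return {"clean_final": True}
    if mode == "none":
        # Do not run unpaper at all.
        return {}
    raise ValueError(f"unknown PAPERLESS_OCR_CLEAN mode: {mode!r}")
```

Rejecting unknown values, rather than silently falling back to the default, is a design choice of this sketch.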
PAPERLESS_OCR_DESKEW=<bool>
Tells paperless to correct skewing (slight rotation of input images, mostly
due to improper scanning).
Defaults to ``false``, which disables this feature.
PAPERLESS_OCR_ROTATE_PAGES=<bool>
Tells paperless to correct page rotation (90°, 180° and 270° rotation).
Defaults to ``false``, which disables this feature.
PAPERLESS_OCR_ROTATE_PAGES_THRESHOLD=<num>
Adjust the threshold for automatic page rotation by ``PAPERLESS_OCR_ROTATE_PAGES``.
This is an arbitrary value reported by tesseract. "15" is a very conservative value,
whereas "2" is a very aggressive option and will often result in correctly rotated pages
being rotated as well.
Defaults to "10".
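To illustrate how the threshold interacts with ``PAPERLESS_OCR_ROTATE_PAGES``, here is a rough sketch. It assumes a page is rotated only when rotation is enabled and the orientation confidence reported by tesseract is at least the threshold. The function names and the set of accepted "true" spellings are hypothetical and not taken from paperless itself.

```python
import os

# Assumed spellings for a "true" boolean value; illustrative only.
TRUTHY = {"1", "true", "yes", "y", "t"}


def rotation_settings(env=None):
    """Read the two rotation options, applying the documented defaults."""
    env = os.environ if env is None else env
    raw_enabled = env.get("PAPERLESS_OCR_ROTATE_PAGES", "false")
    enabled = raw_enabled.strip().lower() in TRUTHY
    threshold = float(env.get("PAPERLESS_OCR_ROTATE_PAGES_THRESHOLD", "10"))
    return enabled, threshold


def should_rotate(confidence, enabled, threshold):
    """Rotate only if enabled and the confidence reaches the threshold."""
    return enabled and confidence >= threshold
```

Under these assumptions, a page with an orientation confidence of 4 would be rotated with a threshold of 2 but left alone with a threshold of 15, which is why low thresholds risk rotating pages that were already upright.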
PAPERLESS_OCR_OUTPUT_TYPE=<type>
Specify the type of PDF documents that paperless should produce.
@ -271,7 +303,6 @@ PAPERLESS_OCR_PAGES=<num>
Defaults to 0, which disables this feature and always uses all pages.
PAPERLESS_OCR_IMAGE_DPI=<num>
Paperless will OCR any images you put into the system and convert them
into PDF documents. This is useful if your scanner produces images.
@ -282,8 +313,8 @@ PAPERLESS_OCR_IMAGE_DPI=<num>
Set this to the DPI your scanner produces images at.
- Default is none, which causes paperless to fail if no DPI information is
- present in an image.
+ Default is none, which will automatically calculate image DPI so that
+ the produced PDF documents are A4 sized.
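The arithmetic behind an A4-sized default can be sketched as follows. This is an assumption about how such a DPI might be derived, not a description of paperless' implementation: it fits the image width to the 210 mm width of a portrait A4 page, and the function name ``a4_dpi`` is hypothetical.

```python
# Width of an A4 page in inches (210 mm, portrait orientation).
A4_WIDTH_INCHES = 21.0 / 2.54


def a4_dpi(width_px):
    """Return the DPI at which an image of this pixel width spans A4 width.

    Illustrative only: assumes the image is a portrait scan that should
    fill the page width exactly.
    """
    if width_px <= 0:
        raise ValueError("image width must be positive")
    return round(width_px / A4_WIDTH_INCHES)
```

For example, an image 2480 pixels wide works out to roughly 300 DPI under this assumption, and one 1240 pixels wide to roughly 150 DPI.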
PAPERLESS_OCR_USER_ARGS=<json>