documentation

jonaswinkler 2020-12-03 00:15:03 +01:00
parent 657c41ab37
commit a47623dbaf
2 changed files with 14 additions and 6 deletions

@@ -183,11 +183,11 @@ PAPERLESS_OCR_MODE=<mode>
are available:
* ``skip``: Paperless skips pages that already contain text and will perform OCR only on pages
- where no text is present. This is the safest and fastest option.
+ where no text is present. This is the safest option.
* ``skip_noarchive``: In addition to skip, paperless won't create an
archived version of your documents when it finds any text in them.
This is useful if you don't want to have two almost-identical versions
- of your digital documents in the media folder.
+ of your digital documents in the media folder. This is the fastest option.
* ``redo``: Paperless will OCR all pages of your documents and attempt to
replace any existing text layers with new text. This will be useful for
documents from scanners that already performed OCR with insufficient

@@ -220,16 +220,24 @@ writing. Windows is not and will never be supported.
* ``python3-dev``
* ``imagemagick`` >= 6 for PDF conversion
- * ``unpaper`` for cleaning documents before OCR
- * ``ghostscript``
* ``optipng`` for optimising thumbnails
- * ``tesseract-ocr`` >= 4.0.0 for OCR
- * ``tesseract-ocr`` language packs (``tesseract-ocr-eng``, ``tesseract-ocr-deu``, etc)
* ``gnupg`` for handling encrypted documents
* ``libpoppler-cpp-dev`` for PDF to text conversion
* ``libmagic-dev`` for mime type detection
* ``libpq-dev`` for PostgreSQL
+ These dependencies are required for OCRmyPDF, which is used for text recognition.
+ * ``unpaper``
+ * ``ghostscript``
+ * ``icc-profiles-free``
+ * ``liblept5``
+ * ``libxml2``
+ * ``pngquant``
+ * ``zlib1g``
+ * ``tesseract-ocr`` >= 4.0.0 for OCR
+ * ``tesseract-ocr`` language packs (``tesseract-ocr-eng``, ``tesseract-ocr-deu``, etc)
You will also need ``build-essential``, ``python3-setuptools`` and ``python3-wheel``
for installing some of the Python dependencies. You can remove these packages
again after installation.
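
As a rough illustration of the dependency list in this hunk, the packages could be installed on a Debian or Ubuntu system roughly as sketched below. The use of ``apt-get``, the variable names ``OCR_PACKAGES`` and ``BUILD_PACKAGES``, and the choice of the English language pack only are assumptions for this example and are not part of the commit. The commands are printed rather than executed, so the sketch is safe to run as written.

```shell
#!/bin/sh
# Packages required for OCRmyPDF, copied from the list in the documentation.
# Only the English tesseract language pack is included here (an assumption);
# add further packs such as tesseract-ocr-deu as needed.
OCR_PACKAGES="unpaper ghostscript icc-profiles-free liblept5 libxml2 pngquant zlib1g tesseract-ocr tesseract-ocr-eng"

# Packages needed only while installing the Python dependencies.
BUILD_PACKAGES="build-essential python3-setuptools python3-wheel"

# Dry run: print the commands. Remove the leading `echo` to actually run them.
echo sudo apt-get install -y $OCR_PACKAGES $BUILD_PACKAGES

# After the Python dependencies are installed, the build-time packages
# can be removed again, as the documentation notes.
echo sudo apt-get remove -y $BUILD_PACKAGES
```

Keeping the build-time packages in their own variable makes the later removal step a one-liner and avoids accidentally removing a runtime dependency.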