documentation

2026-02-22 00:49:35 -06:00 · 2020-12-03 00:15:03 +01:00
parent 657c41ab37
commit a47623dbaf
2 changed files with 14 additions and 6 deletions
--- a/docs/configuration.rst
+++ b/docs/configuration.rst
@@ -183,11 +183,11 @@ PAPERLESS_OCR_MODE=<mode>
    are available:

    *   ``skip``: Paperless skips pages that contain text and performs OCR only on pages
-        where no text is present. This is the safest and fastest option.
+        where no text is present. This is the safest option.
    *   ``skip_noarchive``: In addition to skip, paperless won't create an
        archived version of your documents when it finds any text in them.
        This is useful if you don't want to have two almost-identical versions
-        of your digital documents in the media folder.
+        of your digital documents in the media folder. This is the fastest option.
    *   ``redo``: Paperless will OCR all pages of your documents and attempt to
        replace any existing text layers with new text. This will be useful for
        documents from scanners that already performed OCR with insufficient
--- a/docs/setup.rst
+++ b/docs/setup.rst
@@ -220,16 +220,24 @@ writing. Windows is not and will never be supported.
    *   ``python3-dev``

    *   ``imagemagick`` >= 6 for PDF conversion
-    *   ``unpaper`` for cleaning documents before OCR
-    *   ``ghostscript``
    *   ``optipng`` for optimising thumbnails
-    *   ``tesseract-ocr`` >= 4.0.0 for OCR
-    *   ``tesseract-ocr`` language packs (``tesseract-ocr-eng``, ``tesseract-ocr-deu``, etc)
    *   ``gnupg`` for handling encrypted documents
    *   ``libpoppler-cpp-dev`` for PDF to text conversion
    *   ``libmagic-dev`` for mime type detection
    *   ``libpq-dev`` for PostgreSQL

+    These dependencies are required for OCRmyPDF, which is used for text recognition.
+
+    *   ``unpaper``
+    *   ``ghostscript``
+    *   ``icc-profiles-free``
+    *   ``liblept5``
+    *   ``libxml2``
+    *   ``pngquant``
+    *   ``zlib1g``
+    *   ``tesseract-ocr`` >= 4.0.0 for OCR
+    *   ``tesseract-ocr`` language packs (``tesseract-ocr-eng``, ``tesseract-ocr-deu``, etc)
+
    You will also need ``build-essential``, ``python3-setuptools`` and ``python3-wheel``
    for installing some of the python dependencies. You can remove that
    again after installation.
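
As a rough way to confirm that the OCR tooling listed in this change is installed, the following sketch checks whether the relevant executables are on ``PATH``. It is not part of the commit. The executable names (``unpaper``, ``gs``, ``pngquant``, ``tesseract``) are an assumption about what the corresponding Debian/Ubuntu packages install, and ``missing_tools`` is a hypothetical helper name. Library-only packages such as ``liblept5`` or ``zlib1g`` are not covered by this check.

```shell
#!/bin/sh
# Print every command name passed as an argument that is not found on PATH.
# "missing_tools" is a hypothetical helper, not something shipped by paperless.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Assumed executable names for unpaper, ghostscript, pngquant and tesseract-ocr.
missing=$(missing_tools unpaper gs pngquant tesseract)

if [ -n "$missing" ]; then
    # Unquoted on purpose so the names print on one line.
    echo "Missing OCR tools:" $missing
else
    echo "All OCR command line tools found."
    # The docs ask for tesseract >= 4.0.0; show the version line for a manual check.
    tesseract --version 2>&1 | head -n 1
fi
```

If a tool is reported missing, installing the package of the same name from the list above (``ghostscript`` for ``gs``, ``tesseract-ocr`` for ``tesseract``) should resolve it on Debian-based systems.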