Add PAPERLESS_OCR_SKIP_ARCHIVE_FILE config setting

2026-02-11 23:59:31 -06:00 · 2023-02-23 22:42:57 -05:00
parent 8a89f5ae27
commit ca412e0184
8 changed files with 185 additions and 14 deletions
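
The three values documented for `PAPERLESS_OCR_SKIP_ARCHIVE_FILE` in this commit amount to a small decision rule. The following is a minimal sketch of that rule, not the code from this commit: the names `SkipArchiveFile` and `should_skip_archive` are hypothetical, and only the values `never`, `with_text`, and `always` come from the documentation changes.

```python
from enum import Enum


class SkipArchiveFile(str, Enum):
    """Hypothetical mirror of the three documented setting values."""

    NEVER = "never"
    WITH_TEXT = "with_text"
    ALWAYS = "always"


def should_skip_archive(mode: str, has_embedded_text: bool) -> bool:
    """Return True when no archived version should be created.

    `mode` is the raw setting value; `has_embedded_text` says whether
    the incoming document already contains a text layer.
    """
    setting = SkipArchiveFile(mode)  # raises ValueError for unknown values
    if setting is SkipArchiveFile.ALWAYS:
        return True
    if setting is SkipArchiveFile.WITH_TEXT:
        return has_embedded_text
    # NEVER (the documented default): always create the archived version.
    return False
```

Under this sketch, `should_skip_archive("with_text", True)` skips the archive file, while the same mode with a document that has no embedded text still produces one.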
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -415,12 +415,6 @@ modes are available:
    -   `skip`: Paperless skips all pages and will perform ocr only on
        pages where no text is present. This is the safest option.

-    -   `skip_noarchive`: In addition to skip, paperless won't create
-        an archived version of your documents when it finds any text in
-        them. This is useful if you don't want to have two
-        almost-identical versions of your digital documents in the media
-        folder. This is the fastest option.
-
    -   `redo`: Paperless will OCR all pages of your documents and
        attempt to replace any existing text layers with new text. This
        will be useful for documents from scanners that already
@@ -443,6 +437,19 @@ modes are available:
    Read more about this in the [OCRmyPDF
    documentation](https://ocrmypdf.readthedocs.io/en/latest/advanced.html#when-ocr-is-skipped).

+`PAPERLESS_OCR_SKIP_ARCHIVE_FILE=<mode>`
+
+: Specify when you would like paperless to skip creating an archived
+version of your documents. This is useful if you don't want to have two
+almost-identical versions of your documents in the media folder.
+
+    -   `never`: Never skip creating an archived version.
+    -   `with_text`: Skip creating an archived version for documents
+        that already have embedded text.
+    -   `always`: Always skip creating an archived version.
+
+    The default is `never`.
+
 `PAPERLESS_OCR_CLEAN=<mode>`

 : Tells paperless to use `unpaper` to clean any input document before
--- a/docs/setup.md
+++ b/docs/setup.md
@@ -818,9 +818,10 @@ performance immensely:
  other tasks).
 - Keep `PAPERLESS_OCR_MODE` at its default value `skip` and consider
  OCR'ing your documents before feeding them into paperless. Some
-  scanners are able to do this! You might want to even specify
-  `skip_noarchive` to skip archive file generation for already ocr'ed
-  documents entirely.
+  scanners are able to do this!
+- Set `PAPERLESS_OCR_SKIP_ARCHIVE_FILE` to `with_text` to skip archive
+  file generation for already OCR'ed documents, or `always` to skip it
+  for all documents.
 - If you want to perform OCR on the device, consider using
  `PAPERLESS_OCR_CLEAN=none`. This will speed up OCR times and use
  less memory at the expense of slightly worse OCR results.
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -60,8 +60,8 @@ following operations on your documents:

    This process can be configured to fit your needs. If you don't want
    paperless to create archived versions for digital documents, you can
-    configure that by configuring `PAPERLESS_OCR_MODE=skip_noarchive`.
-    Please read the
+    configure that by setting
+    `PAPERLESS_OCR_SKIP_ARCHIVE_FILE=with_text`. Please read the
    [relevant section in the documentation](/configuration#ocr).

 !!! note