Feature: collate two single-sided multipage scans (#3784)

* Feature: collate two single-sided scans

  Some ADFs only support single-sided scans, making scanning double-sided documents a bit annoying. This new feature enables Paperless to do most of the work by merging two separate scans into a single one, collating the even and odd numbered pages.

* Documentation: clarify that collation is disabled by default
* Apply suggestions from code review

  Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>

* Address code review remarks
* Grammar fixes

---------

Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
2026-01-28 22:59:03 -06:00 · 2023-07-24 09:29:04 +02:00
parent 9f5d47c320
commit 8c7554e081
11 changed files with 584 additions and 48 deletions
--- a/docs/advanced_usage.md
+++ b/docs/advanced_usage.md
@@ -528,7 +528,7 @@ For how to enable barcode usage, see [the configuration](/configuration#barcodes
 The two settings may be enabled independently, but do have interactions as explained
 below.

-### Document Splitting
+### Document Splitting {#document-splitting}

 When enabled, Paperless will look for a barcode with the configured value and create a new document
 starting from the next page. The page with the barcode on it will _not_ be retained. It
@@ -543,3 +543,69 @@ If document splitting via barcode is also enabled, documents will be split when
 barcode is located. However, differing from the splitting, the page with the
 barcode _will_ be retained. This allows application of a barcode to any page, including
 one which holds data to keep in the document.
+
+## Automatic collation of double-sided documents {#collate}
+
+!!! note
+
+    If your scanner supports double-sided scanning natively, you do not need this feature.
+
+This feature is turned off by default. See the [configuration](/configuration#collate) for how to turn it on.
+
+### Summary
+
+If you have a scanner with an automatic document feeder (ADF) that only scans a single side,
+this feature makes scanning double-sided documents much more convenient by automatically
+collating two separate scans into one document, reordering the pages as necessary.
+
+### Usage example
+
+Suppose you have a double-sided document with 6 pages (3 sheets of paper). First,
+put the stack into your ADF as normal, ensuring that page 1 is scanned first. Your ADF
+will now scan pages 1, 3, and 5. Then you (or your scanner, if it supports it) upload
+the scan into the correct sub-directory of the consume folder (`double-sided` by default;
+keep in mind that Paperless will _not_ automatically create the directory for you).
+Paperless will then process the scan and move it into an internal staging area.
+
+The next step is to turn your stack upside down (without reordering the sheets of paper)
+and scan it once again. Your ADF will now scan pages 6, 4, and 2, in that order. Once this
+scan is copied into the sub-directory, Paperless will collate the previous scan with the
+new one, reversing the order of the pages on the second, "even numbered" scan. The
+resulting document will have the pages 1-6 in the correct order, and this new file will
+then be processed as normal.
+
+!!! tip
+
+    When scanning the even numbered pages, you can omit trailing empty pages, if there are
+    any. For example, if page 6 is empty, you only need to scan pages 2 and 4. _Do not_ omit
+    empty pages in the middle of the document.
+
+### Things that could go wrong
+
+Paperless will notice when the first, "odd numbered" scan has fewer pages than the second
+scan (this can happen, e.g., when the ADF skipped a few pages in the first pass). In that
+case, Paperless will remove the staging copy as well as the scan, and give you an error
+message asking you to restart the process from scratch, by scanning the odd pages again,
+followed by the even pages.
+
+Another thing that might happen is that you start a double-sided scan, but then forget
+to upload the second file. To avoid collating the wrong documents if you then come back
+a day later to scan a new double-sided document, Paperless will only keep an "odd numbered
+pages" file for up to 30 minutes. If more time passes, it will consider the next incoming
+scan a completely new "odd numbered pages" one. The old staging file will get discarded.
+
+### Interaction with "subdirs as tags"
+
+The collation feature can be used together with the "subdirs as tags" feature (but this is not
+a requirement). Just create a correctly named double-sided subdir in the hierarchy and upload
+your scans there. For example, both `double-sided/foo/bar` and `foo/bar/double-sided` will
+cause the collated document to be treated as if it were uploaded into `foo/bar` and receive both
+`foo` and `bar` tags, but not `double-sided`.
+
+### Interaction with document splitting
+
+You can use the [document splitting](#document-splitting) feature, but if you use a normal
+single-sided split marker page, the split document(s) will have an empty page at the front (or
+whatever else was on the back of the split marker page). You can work around that by having
+a split marker page that has the split barcode on _both_ sides. This way, the extra page will
+get automatically removed.
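
The page ordering described in the usage example above can be sketched in Python. This is a minimal illustration under stated assumptions, not the code added by this commit: the function name `collate_page_order` and the `("odd" | "even", index)` return shape are hypothetical, and only the ordering rule, the trailing-page allowance, and the "fewer pages" error case come from the documentation text.

```python
def collate_page_order(odd_count: int, even_count: int) -> list[tuple[str, int]]:
    """Return (scan, zero-based page index) pairs in final reading order.

    "odd" is the first scan (pages 1, 3, 5, ...). "even" is the second scan,
    whose pages arrive in reverse order (for example 6, 4, 2).
    Hypothetical helper, for illustration only.
    """
    if even_count > odd_count:
        # Documented error case: the first scan has fewer pages than the second.
        raise ValueError("first scan has fewer pages than the second; rescan both")

    order: list[tuple[str, int]] = []
    for i in range(odd_count):
        order.append(("odd", i))
        # The even scan is reversed: its last page is document page 2.
        j = even_count - 1 - i
        if j >= 0:  # trailing empty even pages may have been omitted
            order.append(("even", j))
    return order


# Usage: the 6-page document from the usage example.
scans = {"odd": [1, 3, 5], "even": [6, 4, 2]}
print([scans[name][idx] for name, idx in collate_page_order(3, 3)])
# [1, 2, 3, 4, 5, 6]
```

If the empty page 6 is left out of the second pass, the even scan holds only pages 4 and 2, and `collate_page_order(3, 2)` yields pages 1 through 5 in order.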
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -1116,6 +1116,43 @@ combination with PAPERLESS_CONSUMER_BARCODE_UPSCALE bigger than 1.0.

    Defaults to "300"

+## Collate Double-Sided Documents {#collate}
+
+`PAPERLESS_CONSUMER_ENABLE_COLLATE_DOUBLE_SIDED=<bool>`
+
+: Enables automatic collation of two single-sided scans into a double-sided
+document.
+
+    This is useful if you have an automatic document feeder that only supports
+    single-sided scans, but you need to scan a double-sided document. If your
+    ADF supports double-sided scans natively, you do not need this feature.
+
+    `PAPERLESS_CONSUMER_RECURSIVE` must be enabled for this to work.
+
+    For more information, read the [corresponding section in the advanced
+    documentation](/advanced_usage#collate).
+
+    Defaults to false.
+
+`PAPERLESS_CONSUMER_COLLATE_DOUBLE_SIDED_SUBDIR_NAME=<str>`
+
+: The name of the subdirectory in which the collate feature expects documents to
+arrive.
+
+    This only has an effect if `PAPERLESS_CONSUMER_ENABLE_COLLATE_DOUBLE_SIDED`
+    has been enabled. Note that Paperless will not automatically create the
+    directory.
+
+    Defaults to "double-sided".
+
+`PAPERLESS_CONSUMER_COLLATE_DOUBLE_SIDED_TIFF_SUPPORT=<bool>`
+: Whether TIFF image files should be supported when collating documents.
+This will automatically convert any TIFF image(s) to PDFs for later
+processing. This only has an effect if
+`PAPERLESS_CONSUMER_ENABLE_COLLATE_DOUBLE_SIDED` has been enabled.
+
+    Defaults to false.
+
 ## Binaries

 There are a few external software packages that Paperless expects to
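
How the three collate settings and their documented defaults fit together can be sketched as follows. This is an illustrative reading of the configuration text above, not Paperless's actual settings code: `read_collate_settings`, the `TRUTHY` set of accepted boolean spellings, and the choice to raise an error when `PAPERLESS_CONSUMER_RECURSIVE` is off are all assumptions made for the example.

```python
import os

# Assumption: which spellings count as "true" may differ in Paperless itself.
TRUTHY = {"1", "true", "yes", "y", "t"}


def read_collate_settings(env=None) -> dict:
    """Collect the collate settings from an environment mapping (hypothetical helper)."""
    env = os.environ if env is None else env

    def flag(name: str) -> bool:
        # Both boolean collate settings are documented to default to false.
        return env.get(name, "false").strip().lower() in TRUTHY

    enabled = flag("PAPERLESS_CONSUMER_ENABLE_COLLATE_DOUBLE_SIDED")
    if enabled and not flag("PAPERLESS_CONSUMER_RECURSIVE"):
        # The docs say recursion must be enabled; raising here is an assumption.
        raise ValueError("collation requires PAPERLESS_CONSUMER_RECURSIVE")

    return {
        "enabled": enabled,
        # Documented default subdirectory name; Paperless does not create it.
        "subdir_name": env.get(
            "PAPERLESS_CONSUMER_COLLATE_DOUBLE_SIDED_SUBDIR_NAME", "double-sided"
        ),
        "tiff_support": flag("PAPERLESS_CONSUMER_COLLATE_DOUBLE_SIDED_TIFF_SUPPORT"),
    }
```

Passing a plain dictionary instead of `os.environ` makes the defaults easy to inspect: an empty mapping gives a disabled feature with the `double-sided` subdirectory name and TIFF support off.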