Merge branch 'dev'

2025-12-08 00:51:13 -06:00 · 2023-08-04 11:38:12 -07:00
parent 3a597b8030 702e108b67
commit b51e639328
151 changed files with 22313 additions and 14471 deletions
--- a/docs/administration.md
+++ b/docs/administration.md
@@ -68,23 +68,23 @@ $ docker-compose down

 After that, [make a backup](#backup).

-1. If you pull the image from the docker hub, all you need to do is:
+1.  If you pull the image from Docker Hub, all you need to do is:

-   ```shell-session
-   $ docker-compose pull
-   $ docker-compose up
-   ```
+    ```shell-session
+    $ docker-compose pull
+    $ docker-compose up
+    ```

-   The docker-compose files refer to the `latest` version, which is
-   always the latest stable release.
+    The docker-compose files refer to the `latest` version, which is
+    always the latest stable release.

-2. If you built the image yourself, do the following:
+1.  If you built the image yourself, do the following:

-   ```shell-session
-   $ git pull
-   $ docker-compose build
-   $ docker-compose up
-   ```
+    ```shell-session
+    $ git pull
+    $ docker-compose build
+    $ docker-compose up
+    ```

 Running `docker-compose up` will also apply any new database migrations.
 If you see everything working, press CTRL+C once to gracefully stop
@@ -470,7 +470,7 @@ The issues detected by the sanity checker are as follows:
 - Inaccessible thumbnails due to improper permissions.
 - Documents without any content (warning).
 - Orphaned files in the media directory (warning). These are files
-  that are not referenced by any document im paperless.
+  that are not referenced by any document in paperless.

 ```
 document_sanity_checker
--- a/docs/advanced_usage.md
+++ b/docs/advanced_usage.md
@@ -1,6 +1,6 @@
 # Advanced Topics

-Paperless offers a couple features that automate certain tasks and make
+Paperless offers a couple of features that automate certain tasks and make
 your life easier.

 ## Matching tags, correspondents, document types, and storage paths {#matching}
@@ -35,9 +35,9 @@ The following algorithms are available:
  (i.e. preserve ordering) in the PDF.
 - **Regular expression:** Parses the match as a regular expression and
  tries to find a match within the document.
- **Fuzzy match:** I don't know. Look at the source.
+- **Fuzzy match:** I don't know. Look at [the source](https://github.com/paperless-ngx/paperless-ngx/blob/main/src/documents/matching.py).
 - **Auto:** Tries to automatically match new documents. This does not
-  require you to set a match. See the notes below.
+  require you to set a match. See the [notes below](#automatic-matching).

 When using the _any_ or _all_ matching algorithms, you can search for
 terms that consist of multiple words by enclosing them in double quotes.
@@ -92,7 +92,7 @@ when using this feature:
  decide when not to assign a certain tag, correspondent, document
  type, or storage path. This will usually be the case as you start
  filling up paperless with documents. Example: If all your documents
-  are either from "Webshop" and "Bank", paperless will assign one
+  are either from "Webshop" or "Bank", paperless will assign one
  of these correspondents to ANY new document, if both are set to
  automatic matching.

@@ -101,7 +101,7 @@ when using this feature:
 Sometimes you may want to do something arbitrary whenever a document is
 consumed. Rather than try to predict what you may want to do, Paperless
 lets you execute scripts of your own choosing just before or after a
-document is consumed using a couple simple hooks.
+document is consumed using a couple of simple hooks.

 Just write a script, put it somewhere that Paperless can read & execute,
 and then put the path to that script in `paperless.conf` or
@@ -197,7 +197,7 @@ The script can be in any language, A simple shell script example:
 !!! warning

    The post consumption script should not modify the document files
-    directly
+    directly.

 The script's stdout and stderr will be logged line by line to the
 webserver log, along with the exit code of the script.
@@ -311,6 +311,7 @@ Paperless provides the following placeholders within filenames:
 - `{added_day}`: Day added only (number 01-31).
 - `{owner_username}`: Username of document owner, if any, or "none"
 - `{original_name}`: Document original filename, minus the extension, if any, or "none"
+- `{doc_pk}`: The paperless identifier (primary key) for the document.

 Paperless will try to conserve the information from your database as
 much as possible. However, some characters that you can use in document
@@ -528,7 +529,7 @@ For how to enable barcode usage, see [the configuration](/configuration#barcodes
 The two settings may be enabled independently, but do have interactions as explained
 below.

-### Document Splitting
+### Document Splitting {#document-splitting}

 When enabled, Paperless will look for a barcode with the configured value and create a new document
 starting from the next page. The page with the barcode on it will _not_ be retained. It
@@ -543,3 +544,69 @@ If document splitting via barcode is also enabled, documents will be split when
 barcode is located. However, differing from the splitting, the page with the
 barcode _will_ be retained. This allows application of a barcode to any page, including
 one which holds data to keep in the document.
+
+## Automatic collation of double-sided documents {#collate}
+
+!!! note
+
+    If your scanner supports double-sided scanning natively, you do not need this feature.
+
+This feature is turned off by default. See the [configuration](/configuration#collate) for how to turn it on.
+
+### Summary
+
+If you have a scanner with an automatic document feeder (ADF) that only scans a single side,
+this feature makes scanning double-sided documents much more convenient by automatically
+collating two separate scans into one document, reordering the pages as necessary.
+
+### Usage example
+
+Suppose you have a double-sided document with 6 pages (3 sheets of paper). First,
+put the stack into your ADF as normal, ensuring that page 1 is scanned first. Your ADF
+will now scan pages 1, 3, and 5. Then you (or your scanner, if it supports it) upload
+the scan into the correct sub-directory of the consume folder (`double-sided` by default;
+keep in mind that Paperless will _not_ automatically create the directory for you).
+Paperless will then process the scan and move it into an internal staging area.
+
+The next step is to turn your stack upside down (without reordering the sheets of paper),
+and scan it once again. Your ADF will now scan pages 6, 4, and 2, in that order. Once this
+scan is copied into the sub-directory, Paperless will collate the previous scan with the
+new one, reversing the order of the pages on the second, "even numbered" scan. The
+resulting document will have the pages 1-6 in the correct order, and this new file will
+then be processed as normal.
+
+!!! tip
+
+    When scanning the even-numbered pages, you can omit trailing empty pages, if there are
+    any. For example, if page 6 is empty, you only need to scan pages 2 and 4. _Do not_ omit
+    empty pages in the middle of the document.
+
+### Things that could go wrong
+
+Paperless will notice when the first, "odd numbered" scan has fewer pages than the second
+scan (this can happen when e.g. the ADF skipped a few pages in the first pass). In that
+case, Paperless will remove the staging copy as well as the scan, and give you an error
+message asking you to restart the process from scratch, by scanning the odd pages again,
+followed by the even pages.
+
+Another thing that might happen is that you start a double-sided scan, but then forget
+to upload the second file. To avoid collating the wrong documents if you then come back
+a day later to scan a new double-sided document, Paperless will only keep an "odd numbered
+pages" file for up to 30 minutes. If more time passes, it will consider the next incoming
+scan a completely new "odd numbered pages" one. The old staging file will get discarded.
+
+### Interaction with "subdirs as tags"
+
+The collation feature can be used together with the "subdirs as tags" feature (but this is not
+a requirement). Just create a correctly named double-sided subdir in the hierarchy and upload
+your scans there. For example, both `double-sided/foo/bar` as well as `foo/bar/double-sided` will
+cause the collated document to be treated as if it were uploaded into `foo/bar` and receive both
+`foo` and `bar` tags, but not `double-sided`.
+
+### Interaction with document splitting
+
+You can use the [document splitting](#document-splitting) feature, but if you use a normal
+single-sided split marker page, the split document(s) will have an empty page at the front (or
+whatever else was on the back side of the split marker page). You can work around that by having
+a split marker page that has the split barcode on _both_ sides. This way, the extra page will
+get automatically removed.
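
The page ordering described in the new collation section can be sketched as follows. This is a minimal illustration on plain lists of page labels, not the code this commit adds to Paperless; the function name `collate_double_sided` and its error message are assumptions made for the example.

```python
def collate_double_sided(odd_pages, even_pages_reversed):
    """Interleave a front-side scan with a back-side scan.

    Hypothetical helper for illustration only, not part of Paperless.
    `odd_pages` holds the first scan in reading order (pages 1, 3, 5, ...).
    `even_pages_reversed` holds the second scan in the order the ADF
    produced it after the stack was flipped (pages 6, 4, 2, ...).
    """
    # The first scan must not be shorter than the second; the docs say
    # Paperless reports an error and asks for both scans to be redone.
    if len(odd_pages) < len(even_pages_reversed):
        raise ValueError("first scan has fewer pages than the second; rescan both")

    # Undo the reversal caused by flipping the stack.
    even_pages = list(reversed(even_pages_reversed))

    collated = []
    for index, odd in enumerate(odd_pages):
        collated.append(odd)
        # Trailing even pages may have been omitted because they were empty.
        if index < len(even_pages):
            collated.append(even_pages[index])
    return collated


if __name__ == "__main__":
    print(collate_double_sided([1, 3, 5], [6, 4, 2]))
    print(collate_double_sided([1, 3, 5], [4, 2]))
```

The second call mirrors the tip about omitting a trailing empty page: only pages 4 and 2 are scanned in the second pass, and the result ends at page 5.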
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -524,7 +524,7 @@ parsing documents.

 `PAPERLESS_OCR_MODE=<mode>`

-: Tell paperless when and how to perform ocr on your documents. Four
+: Tell paperless when and how to perform ocr on your documents. Three
 modes are available:

    -   `skip`: Paperless skips all pages and will perform ocr only on
@@ -1116,6 +1116,43 @@ combination with PAPERLESS_CONSUMER_BARCODE_UPSCALE bigger than 1.0.

    Defaults to "300"

+## Collate Double-Sided Documents {#collate}
+
+`PAPERLESS_CONSUMER_ENABLE_COLLATE_DOUBLE_SIDED=<bool>`
+
+: Enables automatic collation of two single-sided scans into a double-sided
+document.
+
+    This is useful if you have an automatic document feeder that only supports
+    single-sided scans, but you need to scan a double-sided document. If your
+    ADF supports double-sided scans natively, you do not need this feature.
+
+    `PAPERLESS_CONSUMER_RECURSIVE` must be enabled for this to work.
+
+    For more information, read the [corresponding section in the advanced
+    documentation](/advanced_usage#collate).
+
+    Defaults to false.
+
+`PAPERLESS_CONSUMER_COLLATE_DOUBLE_SIDED_SUBDIR_NAME=<str>`
+
+: The name of the subdirectory in which the collate feature expects documents to
+arrive.
+
+    This only has an effect if `PAPERLESS_CONSUMER_ENABLE_COLLATE_DOUBLE_SIDED`
+    has been enabled. Note that Paperless will not automatically create the
+    directory.
+
+    Defaults to "double-sided".
+
+`PAPERLESS_CONSUMER_COLLATE_DOUBLE_SIDED_TIFF_SUPPORT=<bool>`
+: Whether TIFF image files should be supported when collating documents.
+This will automatically convert any TIFF image(s) to PDFs for later
+processing. This only has an effect if
+`PAPERLESS_CONSUMER_ENABLE_COLLATE_DOUBLE_SIDED` has been enabled.
+
+    Defaults to false.
+
 ## Binaries

 There are a few external software packages that Paperless expects to
@@ -1123,7 +1160,7 @@ find on your system when it starts up. Unless you've done something
 creative with their installation, you probably won't need to edit any
 of these. However, if you've installed these programs somewhere where
 simply typing the name of the program doesn't automatically execute it
-(ie. the program isn't in your \$PATH), then you'll need to specify
+(ie. the program isn't in your $PATH), then you'll need to specify
 the literal path for that program.

 `PAPERLESS_CONVERT_BINARY=<path>`
@@ -1207,7 +1244,7 @@ actual group ID on the host system, which you can get by executing
 with English, German, Italian, Spanish and French. If your language
 is not in this list, install additional languages with this
 configuration option. You will need to [find the right LangCodes](https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html)
-but note that (tesseract-ocr-\* package names)[https://packages.debian.org/bullseye/graphics/]
+but note that [tesseract-ocr-\* package names](https://packages.debian.org/bullseye/graphics/)
 do not always correspond with the language codes e.g. "chi_tra" should be
 specified as "chi-tra".

--- a/docs/development.md
+++ b/docs/development.md
@@ -58,7 +58,7 @@ first-time setup.

 !!! note

-    Every command is executed directly from the root folder of the project unless specified otherwise.
+      Every command is executed directly from the root folder of the project unless specified otherwise.

 1.  Install prerequisites + pipenv as mentioned in
    [Bare metal route](/setup#bare_metal).
@@ -177,68 +177,69 @@ The front end is built using AngularJS. In order to get started, you need Node.j

    The following commands are all performed in the `src-ui`-directory. You will need a running back end (including an active session) to connect to the back end API. To spin it up refer to the commands under the section [above](#back-end-development).

-1. Install the Angular CLI. You might need sudo privileges
-   to perform this command:
+1.  Install the Angular CLI. You might need sudo privileges to perform this command:

-   ```bash
-   $ npm install -g @angular/cli
-   ```
+    ```bash
+    $ npm install -g @angular/cli
+    ```

-2. Make sure that it's on your path.
+2.  Make sure that it's on your path.

-3. Install all necessary modules:
+3.  Install all necessary modules:

-   ```bash
-   $ npm install
-   ```
+    ```bash
+    $ npm install
+    ```

-4. You can launch a development server by running:
+4.  You can launch a development server by running:

-   ```bash
-   $ ng serve
-   ```
+    ```bash
+    $ ng serve
+    ```

-   This will automatically update whenever you save. However, in-place
-   compilation might fail on syntax errors, in which case you need to
-   restart it.
+    This will automatically update whenever you save. However, in-place
+    compilation might fail on syntax errors, in which case you need to
+    restart it.

-   By default, the development server is available on `http://localhost:4200/` and is configured to access the API at
-   `http://localhost:8000/api/`, which is the default of the backend. If you enabled `DEBUG` on the back end, several security overrides for allowed hosts, CORS and X-Frame-Options are in place so that the front end behaves exactly as in production.
+    By default, the development server is available on `http://localhost:4200/` and is configured to access the API at
+    `http://localhost:8000/api/`, which is the default of the backend. If you enabled `DEBUG` on the back end, several security overrides for allowed hosts, CORS and X-Frame-Options are in place so that the front end behaves exactly as in production.

 ### Testing and code style

- The front end code (.ts, .html, .scss) use `prettier` for code
-  formatting via the Git `pre-commit` hooks which run automatically on
-  commit. See [above](#code-formatting-with-pre-commit-hooks) for installation instructions. You can also run this via the CLI with a
-  command such as
+The front end code (.ts, .html, .scss) uses `prettier` for code
+formatting via the Git `pre-commit` hooks which run automatically on
+commit. See [above](#code-formatting-with-pre-commit-hooks) for installation instructions. You can also run this via the CLI with a
+command such as

-  ```bash
-  $ git ls-files -- '*.ts' | xargs pre-commit run prettier --files
-  ```
+```bash
+$ git ls-files -- '*.ts' | xargs pre-commit run prettier --files
+```

- Front end testing uses Jest and Playwright. Unit tests and e2e tests,
-  respectively, can be run non-interactively with:
+Front end testing uses Jest and Playwright. Unit tests and e2e tests,
+respectively, can be run non-interactively with:

-  ```bash
-  $ ng test
-  $ npx playwright test
-  ```
+```bash
+$ ng test
+$ npx playwright test
+```

-  - Playwright also includes a UI which can be run with:
+Playwright also includes a UI which can be run with:

-    ```bash
-    $ npx playwright test --ui
-    ```
+```bash
+$ npx playwright test --ui
+```

- In order to build the front end and serve it as part of Django, execute:
+### Building the front end

-  ```bash
-  $ ng build --configuration production
-  ```
+In order to build the front end and serve it as part of Django, execute:

-  This will build the front end and put it in a location from which the
-  Django server will serve it as static content. This way, you can verify
-  that authentication is working.
+```bash
+$ ng build --configuration production
+```
+
+This will build the front end and put it in a location from which the
+Django server will serve it as static content. This way, you can verify
+that authentication is working.

 ## Localization

--- a/docs/faq.md
+++ b/docs/faq.md
@@ -3,10 +3,11 @@
 ## _What's the general plan for Paperless-ngx?_

 **A:** While Paperless-ngx is already considered largely
-"feature-complete" it is a community-driven project and development
-will be guided in this way. New features can be submitted via GitHub
-discussions and "up-voted" by the community but this is not a
-guarantee the feature will be implemented. This project will always be
+"feature-complete", it is a community-driven project and development
+will be guided in this way. New features can be submitted via
+[GitHub discussions](https://github.com/paperless-ngx/paperless-ngx/discussions)
+and "up-voted" by the community, but this is not a
+guarantee that the feature will be implemented. This project will always be
 open to collaboration in the form of PRs, ideas etc.

 ## _I'm using docker. Where are my documents?_
@@ -58,7 +59,7 @@ elsewhere. Here are a couple notes about that.
  WebP images are processed with OCR and converted into PDF documents.
 - Plain text documents are supported as well and are added verbatim to
  paperless.
- With the optional Tika integration enabled (see [Tika configuration](/configuration#tika),
+- With the optional Tika integration enabled (see [Tika configuration](https://docs.paperless-ngx.com/configuration#tika)),
  Paperless also supports various Office documents (.docx, .doc, odt,
  .ppt, .pptx, .odp, .xls, .xlsx, .ods).

@@ -82,7 +83,7 @@ has to do much less work to serve the data.
 ## _How do I install paperless-ngx on Raspberry Pi?_

 **A:** Docker images are available for armv7 and arm64 hardware, so just
-follow the docker-compose instructions. Apart from more required disk
+follow the [docker-compose instructions](https://docs.paperless-ngx.com/setup/#installation). Apart from more required disk
 space compared to a bare metal installation, docker comes with close to
 zero overhead, even on Raspberry Pi.