Merge branch 'dev'

2025-11-17 04:16:54 -06:00 · 2025-08-16 09:47:48 -07:00
parent 5d6ea70434 243b3bc812
commit 00e629d957
358 changed files with 51680 additions and 45489 deletions
--- a/docs/administration.md
+++ b/docs/administration.md
@@ -179,10 +179,14 @@ following:

 ### Database Upgrades

-In general, paperless does not require a specific version of PostgreSQL or MariaDB and it is
+Paperless-ngx is compatible with Django-supported versions of PostgreSQL and MariaDB, and it is generally
 safe to update them to newer versions. However, you should always take a backup and follow
 the instructions from your database's documentation for how to upgrade between major versions.

+!!! note
+
+    As of Paperless-ngx v2.18, the minimum supported version of PostgreSQL is 13.
+
 For PostgreSQL, refer to [Upgrading a PostgreSQL Cluster](https://www.postgresql.org/docs/current/upgrading.html).

 For MariaDB, refer to [Upgrading MariaDB](https://mariadb.com/kb/en/upgrading/)
@@ -306,7 +310,7 @@ in dedicated folders according to their nature: `archive`, `originals`,
 If `-sm` or `--split-manifest` is provided, information about document
 will be placed in individual json files, instead of a single JSON file. The main
 manifest.json will still contain application wide information (e.g. tags, correspondent,
-documenttype, etc)
+document type, etc)

 If `-z` or `--zip` is provided, the export will be a zip file
 in the target directory, named according to the current local date or the
@@ -457,6 +461,22 @@ of the index and usually makes queries faster and also ensures that the
 autocompletion works properly. This command is regularly invoked by the
 task scheduler.

+### Clearing the database read cache
+
+If the database read cache is enabled, **you must run this command** after making any changes to the database outside the application context.
+This includes operations such as restoring a database backup or executing SQL statements like UPDATE, INSERT, DELETE, ALTER, CREATE, or DROP.
+
+Failing to invalidate the cache after such modifications can lead to stale data being served from the cache, and **may cause data corruption** or inconsistent behavior in the application.
+
+Use the following management command to clear the cache:
+
+```
+invalidate_cachalot
+```
+
+!!! info
+    The database read cache is based on Django-Cachalot. You can refer to their [documentation](https://django-cachalot.readthedocs.io/en/latest/quickstart.html#manage-py-command).
+
 ### Managing filenames {#renamer}

 If you use paperless' feature to
--- a/docs/advanced_usage.md
+++ b/docs/advanced_usage.md
@@ -434,6 +434,136 @@ provided. The template is provided as a string, potentially multiline, and rende
 In addition, the entire Document instance is available to be utilized in a more advanced way, as well as some variables which only make sense to be accessed
 with more complex logic.

+#### Custom Jinja2 Filters
+
+##### Custom Field Access
+
+The `get_cf_value` filter retrieves a value from custom field data, with an optional default fallback.
+
+###### Syntax
+
+```jinja2
+{{ custom_fields | get_cf_value('field_name') }}
+{{ custom_fields | get_cf_value('field_name', 'default_value') }}
+```
+
+###### Parameters
+
+-   `custom_fields`: This _must_ be the provided custom field data
+-   `name` (str): Name of the custom field to retrieve
+-   `default` (str, optional): Default value to return if field is not found or has no value
+
+###### Returns
+
+-   `str | None`: The field value, default value, or `None` if neither exists
+
+###### Examples
+
+```jinja2
+<!-- Basic usage -->
+{{ custom_fields | get_cf_value('department') }}
+
+<!-- With default value -->
+{{ custom_fields | get_cf_value('phone', 'Not provided') }}
+```
+
+##### Datetime Formatting
+
+The `format_datetime` filter formats a datetime string or datetime object using Python's strftime formatting.
+
+###### Syntax
+
+```jinja2
+{{ datetime_value | format_datetime('%Y-%m-%d %H:%M:%S') }}
+```
+
+###### Parameters
+
+-   `value` (str | datetime): Date/time value to format (strings will be parsed automatically)
+-   `format` (str): Python strftime format string
+
+###### Returns
+
+-   `str`: Formatted datetime string
+
+###### Examples
+
+```jinja2
+<!-- Format datetime object -->
+{{ created_at | format_datetime('%B %d, %Y at %I:%M %p') }}
+<!-- Output: "January 15, 2024 at 02:30 PM" -->
+
+<!-- Format datetime string -->
+{{ "2024-01-15T14:30:00" | format_datetime('%m/%d/%Y') }}
+<!-- Output: "01/15/2024" -->
+
+<!-- Custom formatting -->
+{{ timestamp | format_datetime('%A, %B %d, %Y') }}
+<!-- Output: "Monday, January 15, 2024" -->
+```
+
+See the [strftime format code documentation](https://docs.python.org/3.13/library/datetime.html#strftime-and-strptime-format-codes)
+for the possible codes and their meanings.
+
+##### Date Localization
+
+The `localize_date` filter formats a date or datetime object into a localized string using Babel internationalization.
+This takes into account the provided locale for translation.
+
+###### Syntax
+
+```jinja2
+{{ date_value | localize_date('medium', 'en_US') }}
+{{ datetime_value | localize_date('short', 'fr_FR') }}
+```
+
+###### Parameters
+
+-   `value` (date | datetime): Date or datetime object to format (datetime should be timezone-aware)
+-   `format` (str): Format type - either a Babel preset ('short', 'medium', 'long', 'full') or custom pattern
+-   `locale` (str): Locale code for localization (e.g., 'en_US', 'fr_FR', 'de_DE')
+
+###### Returns
+
+-   `str`: Localized, formatted date string
+
+###### Examples
+
+```jinja2
+<!-- Preset formats -->
+{{ created_date | localize_date('short', 'en_US') }}
+<!-- Output: "1/15/24" -->
+
+{{ created_date | localize_date('medium', 'en_US') }}
+<!-- Output: "Jan 15, 2024" -->
+
+{{ created_date | localize_date('long', 'en_US') }}
+<!-- Output: "January 15, 2024" -->
+
+{{ created_date | localize_date('full', 'en_US') }}
+<!-- Output: "Monday, January 15, 2024" -->
+
+<!-- Different locales -->
+{{ created_date | localize_date('medium', 'fr_FR') }}
+<!-- Output: "15 janv. 2024" -->
+
+{{ created_date | localize_date('medium', 'de_DE') }}
+<!-- Output: "15.01.2024" -->
+
+<!-- Custom patterns -->
+{{ created_date | localize_date('dd/MM/yyyy', 'en_GB') }}
+<!-- Output: "15/01/2024" -->
+```
+
+See the [supported format codes](https://unicode.org/reports/tr35/tr35-dates.html#Date_Format_Patterns) for more options.
+
+###### Format Presets
+
+-   **short**: Abbreviated format (e.g., "1/15/24")
+-   **medium**: Medium-length format (e.g., "Jan 15, 2024")
+-   **long**: Long format with full month name (e.g., "January 15, 2024")
+-   **full**: Full format including day of week (e.g., "Monday, January 15, 2024")
+
 #### Additional Variables

 -   `{{ tag_name_list }}`: A list of tag names applied to the document, ordered by the tag name. Note this is a list, not a single string
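The `format_datetime` examples added in this hunk can be reproduced outside a template with the standard library alone. The following is a minimal sketch of the described behavior, not Paperless-ngx's implementation: the function body is an assumption, and string input is assumed to be ISO 8601 (the real filter may accept more formats).

```python
from datetime import datetime


def format_datetime(value, fmt):
    """Format a datetime, or a datetime string, with a strftime pattern."""
    if isinstance(value, str):
        # Assumption: strings are ISO 8601. The actual filter parses strings
        # "automatically" and may be more lenient than this.
        value = datetime.fromisoformat(value)
    return value.strftime(fmt)


# Mirrors the "Format datetime string" example from the docs above.
print(format_datetime("2024-01-15T14:30:00", "%m/%d/%Y"))  # 01/15/2024

# A datetime object works the same way.
print(format_datetime(datetime(2024, 1, 15, 14, 30), "%Y-%m-%d %H:%M:%S"))
```

Codes such as `%A` and `%B` produce names in the process locale, which is why the docs route locale-specific output through `localize_date` instead.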
--- a/docs/api.md
+++ b/docs/api.md
@@ -282,6 +282,18 @@ The following methods are supported:
        -   `"merge": true or false` (defaults to false)
    -   The `merge` flag determines if the supplied permissions will overwrite all existing permissions (including
        removing them) or be merged with existing permissions.
+-   `edit_pdf`
+    -   Requires `parameters`:
+        -   `"doc_ids": [DOCUMENT_ID]` A list containing the single document ID to edit.
+        -   `"operations": [OPERATION, ...]` A list of operations to perform on the document. Each operation is a dictionary
+            with the following keys:
+            -   `"page": PAGE_NUMBER` The page number to edit (1-based).
+            -   `"rotate": DEGREES` Optional rotation in degrees (90, 180, 270).
+            -   `"doc": OUTPUT_DOCUMENT_INDEX` Optional index of the output document for split operations.
+    -   Optional `parameters`:
+        -   `"delete_original": true` to delete the original documents after editing.
+        -   `"update_document": true` to update the existing document with the edited PDF.
+        -   `"include_metadata": true` to copy metadata from the original document to the edited document.
 -   `merge`
    -   No additional `parameters` required.
    -   The ordering of the merged document is determined by the list of IDs.
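To make the `edit_pdf` parameter list in this hunk concrete, here is a hedged sketch that assembles the `parameters` object in Python. The helper name, the client-side validation, and the document ID `42` are illustrative assumptions. The diff does not show how the object is submitted, so that part is left out.

```python
import json

# Rotation values listed in the docs above.
ALLOWED_ROTATIONS = {90, 180, 270}


def build_edit_pdf_parameters(doc_id, operations, delete_original=False,
                              update_document=False, include_metadata=False):
    """Build the `parameters` object for `edit_pdf` (hypothetical helper)."""
    cleaned = []
    for op in operations:
        if op["page"] < 1:
            raise ValueError("page numbers are 1-based")
        entry = {"page": op["page"]}
        if "rotate" in op:
            if op["rotate"] not in ALLOWED_ROTATIONS:
                raise ValueError("rotate must be 90, 180 or 270")
            entry["rotate"] = op["rotate"]
        if "doc" in op:
            # Index of the output document when splitting.
            entry["doc"] = op["doc"]
        cleaned.append(entry)
    return {
        "doc_ids": [doc_id],  # a list holding exactly one document ID
        "operations": cleaned,
        "delete_original": delete_original,
        "update_document": update_document,
        "include_metadata": include_metadata,
    }


# Rotate page 1 into output document 0 and move page 2 to output document 1.
params = build_edit_pdf_parameters(
    42,  # hypothetical document ID
    [{"page": 1, "rotate": 90, "doc": 0}, {"page": 2, "doc": 1}],
)
print(json.dumps(params, indent=2))
```

The checks here only mirror the constraints stated in the docs (1-based pages, three rotation values); the server may enforce additional rules.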
--- a/docs/changelog.md
+++ b/docs/changelog.md
@@ -6004,7 +6004,6 @@ primarily.
        a very good job at ocr'ing a document with the default
        language. Certain language specifics such as umlauts may not get
        picked up properly.
-    -   `PAPERLESS_DEBUG` defaults to `false`.
    -   The presence of `PAPERLESS_DBHOST` now determines whether to use
        PostgreSQL or SQLite.
    -   `PAPERLESS_OCR_THREADS` is gone and replaced with
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -159,6 +159,58 @@ Available options are `postgresql` and `mariadb`.

    Defaults to unset, which uses Django’s built-in defaults.

+#### [`PAPERLESS_DB_POOLSIZE=<int>`](#PAPERLESS_DB_POOLSIZE) {#PAPERLESS_DB_POOLSIZE}
+
+: Defines the maximum number of database connections to keep in the pool.
+
+    Only applies to PostgreSQL. This setting is ignored for other database engines.
+
+    The value must be greater than or equal to 1 to be used.
+    Defaults to unset, which disables connection pooling.
+
+    !!! note
+
+        A small pool is typically sufficient, for example a size of 4.
+        Make sure your PostgreSQL server's `max_connections` setting is large enough to handle:
+        `(Paperless workers + Celery workers) × pool size + safety margin`
+        For example, with 4 Paperless workers and 2 Celery workers, and a pool size of 4:
+        (4 + 2) × 4 + 10 = 34 connections required.
+
+#### [`PAPERLESS_DB_READ_CACHE_ENABLED=<bool>`](#PAPERLESS_DB_READ_CACHE_ENABLED) {#PAPERLESS_DB_READ_CACHE_ENABLED}
+
+: Caches the database read query results into Redis. This can significantly improve application response times by caching database queries, at the cost of slightly increased memory usage.
+
+    Defaults to `false`.
+
+    !!! danger
+
+        **Do not modify the database outside the application while it is running.**
+        This includes actions such as restoring a backup, upgrading the database, or performing manual inserts. All external modifications must be done **only when the application is stopped**.
+        After making any such changes, you **must invalidate the DB read cache** using the `invalidate_cachalot` management command.
+
+#### [`PAPERLESS_READ_CACHE_TTL=<int>`](#PAPERLESS_READ_CACHE_TTL) {#PAPERLESS_READ_CACHE_TTL}
+
+: Specifies how long (in seconds) read data should be cached.
+
+    Allowed values are between `1` (one second) and `31536000` (one year). Defaults to `3600` (one hour).
+
+    !!! warning
+
+        A high TTL increases memory usage over time. Memory may be used until end of TTL, even if the cache is invalidated with the `invalidate_cachalot` command.
+
+In case of an out-of-memory (OOM) situation, Redis may stop accepting new data — including cache entries, scheduled tasks, and documents to consume.
+If your system has limited RAM, consider configuring a dedicated Redis instance for the read cache, with a memory limit and the eviction policy set to `allkeys-lru`.
+For more details, refer to the [Redis eviction policy documentation](https://redis.io/docs/latest/develop/reference/eviction/), and see the `PAPERLESS_READ_CACHE_REDIS_URL` setting to specify a separate Redis broker.
+
+#### [`PAPERLESS_READ_CACHE_REDIS_URL=<url>`](#PAPERLESS_READ_CACHE_REDIS_URL) {#PAPERLESS_READ_CACHE_REDIS_URL}
+
+: Defines the Redis instance used for the read cache.
+
+    Defaults to `None`.
+
+    !!! note
+        If this value is not set, the same Redis instance used for scheduled tasks will be used for caching as well.
+
 ## Optional Services

 ### Tika {#tika}
@@ -968,6 +1020,22 @@ still perform some basic text pre-processing before matching.

    Defaults to 1.

+#### [`PAPERLESS_DATE_PARSER_LANGUAGES=<lang>`](#PAPERLESS_DATE_PARSER_LANGUAGES) {#PAPERLESS_DATE_PARSER_LANGUAGES}
+
+: Specifies which language(s) Paperless should use when parsing dates from documents.
+
+    This should be a language code supported by the dateparser library,
+    for example: "en".
+    Locales are also supported (e.g., "en-AU").
+    Multiple languages can be combined using "+", for example: "en+de" or "en-AU+de".
+    For valid values, refer to the list of supported languages and locales in the [dateparser documentation](https://dateparser.readthedocs.io/en/latest/supported_locales.html).
+
+    Set this to match the languages in which most of your documents are written.
+    If not set, Paperless will attempt to infer the language(s) from the OCR configuration (`PAPERLESS_OCR_LANGUAGE`).
+
+!!! note
+    This format differs from the `PAPERLESS_OCR_LANGUAGE` setting, which uses ISO 639-2 codes (3 letters, e.g., "eng+deu" for Tesseract OCR).
+
 #### [`PAPERLESS_EMAIL_TASK_CRON=<cron expression>`](#PAPERLESS_EMAIL_TASK_CRON) {#PAPERLESS_EMAIL_TASK_CRON}

 : Configures the scheduled email fetching frequency. The value
@@ -1214,6 +1282,30 @@ within your documents.

    Defaults to false.

+## Workflow webhooks
+
+#### [`PAPERLESS_WEBHOOKS_ALLOWED_SCHEMES=<str>`](#PAPERLESS_WEBHOOKS_ALLOWED_SCHEMES) {#PAPERLESS_WEBHOOKS_ALLOWED_SCHEMES}
+
+: A comma-separated list of allowed schemes for webhooks. This setting
+controls which URL schemes are permitted for webhook URLs.
+
+    Defaults to `http,https`.
+
+#### [`PAPERLESS_WEBHOOKS_ALLOWED_PORTS=<str>`](#PAPERLESS_WEBHOOKS_ALLOWED_PORTS) {#PAPERLESS_WEBHOOKS_ALLOWED_PORTS}
+
+: A comma-separated list of allowed ports for webhooks. This setting
+controls which ports are permitted for webhook URLs. For example, if you
+set this to `80,443`, webhooks will only be sent to URLs that use these
+ports.
+
+    Defaults to an empty list, which allows all ports.
+
+#### [`PAPERLESS_WEBHOOKS_ALLOW_INTERNAL_REQUESTS=<bool>`](#PAPERLESS_WEBHOOKS_ALLOW_INTERNAL_REQUESTS) {#PAPERLESS_WEBHOOKS_ALLOW_INTERNAL_REQUESTS}
+
+: If set to false, webhooks cannot be sent to internal URLs (e.g., localhost).
+
+    Defaults to true, which allows internal requests.
+
 ### Polling {#polling}

 #### [`PAPERLESS_CONSUMER_POLLING=<num>`](#PAPERLESS_CONSUMER_POLLING) {#PAPERLESS_CONSUMER_POLLING}
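The sizing rule given for `PAPERLESS_DB_POOLSIZE` in this file is plain arithmetic, sketched below. The function name is hypothetical, and the default safety margin of 10 is taken from the worked example in the note rather than from any documented setting.

```python
def required_max_connections(paperless_workers, celery_workers, pool_size,
                             safety_margin=10):
    """Estimate the PostgreSQL `max_connections` needed for a given pool size.

    Implements: (Paperless workers + Celery workers) x pool size + safety margin
    """
    if pool_size < 1:
        # The docs state the pool size must be at least 1 to be used.
        raise ValueError("pool_size must be greater than or equal to 1")
    return (paperless_workers + celery_workers) * pool_size + safety_margin


# The example from the note: 4 Paperless workers, 2 Celery workers, pool size 4.
print(required_max_connections(4, 2, 4))  # 34
```

Compare the result with the server's current `max_connections` value before enabling pooling.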
--- a/docs/development.md
+++ b/docs/development.md
@@ -95,13 +95,13 @@ first-time setup.

 7.  You can now either ...

-    -   install redis or
+    -   install Redis or

-    -   use the included `scripts/start_services.sh` to use docker to fire
-        up a redis instance (and some other services such as tika,
-        gotenberg and a database server) or
+    -   use the included `scripts/start_services.sh` to use Docker to fire
+        up a Redis instance (and some other services such as Tika,
+        Gotenberg and a database server) or

-    -   spin up a bare redis container
+    -   spin up a bare Redis container

        ```
        docker run -d -p 6379:6379 --restart unless-stopped redis:latest
@@ -147,7 +147,7 @@ $ ng build --configuration production
 ### Testing

 -   Run `pytest` in the `src/` directory to execute all tests. This also
-    generates a HTML coverage report. When runnings test, `paperless.conf`
+    generates an HTML coverage report. When running tests, `paperless.conf`
    is loaded as well. However, the tests rely on the default
    configuration. This is not ideal. But for now, make sure no settings
    except for DEBUG are overridden when testing.
--- a/docs/index.md
+++ b/docs/index.md
@@ -30,7 +30,7 @@ physical documents into a searchable online archive so you can keep, well, _less
 -   Utilizes the open-source Tesseract engine to recognize more than 100 languages.
 -   Documents are saved as PDF/A format which is designed for long term storage, alongside the unaltered originals.
 -   Uses machine-learning to automatically add tags, correspondents and document types to your documents.
--   Supports PDF documents, images, plain text files, Office documents (Word, Excel, Powerpoint, and LibreOffice equivalents)[^1] and more.
+-   Supports PDF documents, images, plain text files, Office documents (Word, Excel, PowerPoint, and LibreOffice equivalents)[^1] and more.
 -   Paperless stores your documents plain on disk. Filenames and folders are managed by paperless and their format can be configured freely with different configurations assigned to different documents.
 -   **Beautiful, modern web application** that features:
    -   Customizable dashboard with statistics.
--- a/docs/setup.md
+++ b/docs/setup.md
@@ -445,7 +445,7 @@ are released, dependency support is confirmed, etc.
 13. Configure ImageMagick to allow processing of PDF documents. Most
    distributions have this disabled by default, since PDF documents can
    contain malware. If you don't do this, paperless will fall back to
-    ghostscript for certain steps such as thumbnail generation.
+    Ghostscript for certain steps such as thumbnail generation.

    Edit `/etc/ImageMagick-6/policy.xml` and adjust

--- a/docs/troubleshooting.md
+++ b/docs/troubleshooting.md
@@ -335,7 +335,7 @@ You may see errors when deleting documents like:
 Data too long for column 'transaction_id' at row 1
 ```

-This error can occur in installations which have upgraded from a version of Paperless-ngx that used Django 4 (Paperless-ngx versions prior to v2.13.0) with a MariaDB/MySQL database. Due to the backawards-incompatible change in Django 5, the column "documents_document.transaction_id" will need to be re-created, which can be done with a one-time run of the following management command:
+This error can occur in installations which have upgraded from a version of Paperless-ngx that used Django 4 (Paperless-ngx versions prior to v2.13.0) with a MariaDB/MySQL database. Due to the backwards-incompatible change in Django 5, the column "documents_document.transaction_id" will need to be re-created, which can be done with a one-time run of the following management command:

 ```shell-session
 $ python3 manage.py convert_mariadb_uuid
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -30,6 +30,9 @@ Each document has data fields that you can assign to them:
 -   A _document type_ is used to demarcate the type of a document such
    as letter, bank statement, invoice, contract, etc. It is used to
    identify what a document is about.
+-   The document _storage path_ is the location where the document files
+    are stored. See [Storage Paths](advanced_usage.md#storage-paths) for
+    more information.
 -   The _date added_ of a document is the date the document was scanned
    into paperless. You cannot and should not change this date.
 -   The _date created_ of a document is the date the document was
@@ -496,6 +499,10 @@ The following workflow action types are available:
 -   Encoding for the request body, either JSON or form data
 -   The request headers as key-value pairs

+For security reasons, webhooks can be limited to specific ports and disallowed from connecting to local URLs. See the relevant
+[configuration settings](configuration.md#workflow-webhooks) to change this behavior. If you are allowing non-admins to create workflows,
+you may want to adjust these settings to prevent abuse.
+
 #### Workflow placeholders

 Some workflow text can include placeholders but the available options differ depending on the type of
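The webhook restrictions referenced in the hunk above (allowed schemes, allowed ports, and blocking internal targets) can be pictured with a small standard-library check. This is an illustrative sketch only, not Paperless-ngx's implementation: the function name and logic are assumptions, and a production check would also need to resolve hostnames before deciding whether a target is internal.

```python
import ipaddress
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def is_webhook_url_allowed(url, allowed_schemes=("http", "https"),
                           allowed_ports=(), allow_internal=True):
    """Return True if `url` passes the scheme, port and internal-host checks."""
    parts = urlsplit(url)
    if parts.scheme not in allowed_schemes:
        return False
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    # An empty port list means "all ports allowed", matching the documented default.
    if allowed_ports and port not in allowed_ports:
        return False
    if not allow_internal:
        host = parts.hostname or ""
        if host == "localhost":
            return False
        try:
            ip = ipaddress.ip_address(host)
        except ValueError:
            # A hostname, not an IP literal. This sketch does not resolve DNS.
            return True
        if ip.is_private or ip.is_loopback or ip.is_link_local:
            return False
    return True


print(is_webhook_url_allowed("https://example.com/hook", allowed_ports=(80, 443)))
print(is_webhook_url_allowed("http://localhost/hook", allow_internal=False))
```

With the documented defaults (all ports, internal requests allowed), only the scheme check rejects anything, which is why tightening these settings matters when non-admins can create workflows.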
@@ -573,12 +580,14 @@ The following custom field types are supported:

 ## PDF Actions

-Paperless-ngx supports four basic editing operations for PDFs (these operations currently cannot be performed on non-PDF files):
+Paperless-ngx supports basic editing operations for PDFs (these operations currently cannot be performed on non-PDF files). When viewing an individual document, you can
+open the 'PDF Editor' to use a simple UI for re-arranging, rotating, and deleting pages, and for splitting documents.

 -   Merging documents: available when selecting multiple documents for 'bulk editing'.
--   Rotating documents: available when selecting multiple documents for 'bulk editing' and from an individual document's details page.
--   Splitting documents: available from an individual document's details page.
--   Deleting pages: available from an individual document's details page.
+-   Rotating documents: available when selecting multiple documents for 'bulk editing' and via the PDF Editor on an individual document's details page.
+-   Splitting documents: available via the PDF Editor on an individual document's details page.
+-   Deleting pages: available via the PDF Editor on an individual document's details page.
+-   Re-arranging pages: available via the PDF Editor on an individual document's details page.

 !!! important