Merge branch 'dev'

This commit is contained in:
shamoon
2025-08-16 09:47:48 -07:00
358 changed files with 51680 additions and 45489 deletions

View File

@@ -179,10 +179,14 @@ following:
### Database Upgrades
In general, paperless does not require a specific version of PostgreSQL or MariaDB and it is
Paperless-ngx is compatible with Django-supported versions of PostgreSQL and MariaDB and it is generally
safe to update them to newer versions. However, you should always take a backup and follow
the instructions from your database's documentation for how to upgrade between major versions.
!!! note
As of Paperless-ngx v2.18, the minimum supported version of PostgreSQL is 13.
For PostgreSQL, refer to [Upgrading a PostgreSQL Cluster](https://www.postgresql.org/docs/current/upgrading.html).
For MariaDB, refer to [Upgrading MariaDB](https://mariadb.com/kb/en/upgrading/)
@@ -306,7 +310,7 @@ in dedicated folders according to their nature: `archive`, `originals`,
If `-sm` or `--split-manifest` is provided, information about document
will be placed in individual json files, instead of a single JSON file. The main
manifest.json will still contain application wide information (e.g. tags, correspondent,
documenttype, etc)
document type, etc)
If `-z` or `--zip` is provided, the export will be a zip file
in the target directory, named according to the current local date or the
@@ -457,6 +461,22 @@ of the index and usually makes queries faster and also ensures that the
autocompletion works properly. This command is regularly invoked by the
task scheduler.
### Clearing the database read cache
If the database read cache is enabled, **you must run this command** after making any changes to the database outside the application context.
This includes operations such as restoring a database backup or executing SQL statements like UPDATE, INSERT, DELETE, ALTER, CREATE, or DROP.
Failing to invalidate the cache after such modifications can lead to stale data being served from the cache, and **may cause data corruption** or inconsistent behavior in the application.
Use the following management command to clear the cache:
```
invalidate_cachalot
```
!!! info
The database read cache is based on Django-Cachalot. You can refer to their [documentation](https://django-cachalot.readthedocs.io/en/latest/quickstart.html#manage-py-command).
### Managing filenames {#renamer}
If you use paperless' feature to

View File

@@ -434,6 +434,136 @@ provided. The template is provided as a string, potentially multiline, and rende
In addition, the entire Document instance is available to be utilized in a more advanced way, as well as some variables which only make sense to be accessed
with more complex logic.
#### Custom Jinja2 Filters
##### Custom Field Access
The `get_cf_value` filter retrieves a value from custom field data with optional default fallback.
###### Syntax
```jinja2
{{ custom_fields | get_cf_value('field_name') }}
{{ custom_fields | get_cf_value('field_name', 'default_value') }}
```
###### Parameters
- `custom_fields`: This _must_ be the provided custom field data
- `name` (str): Name of the custom field to retrieve
- `default` (str, optional): Default value to return if field is not found or has no value
###### Returns
- `str | None`: The field value, default value, or `None` if neither exists
###### Examples
```jinja2
<!-- Basic usage -->
{{ custom_fields | get_cf_value('department') }}
<!-- With default value -->
{{ custom_fields | get_cf_value('phone', 'Not provided') }}
```
##### Datetime Formatting
The `format_datetime`filter formats a datetime string or datetime object using Python's strftime formatting.
###### Syntax
```jinja2
{{ datetime_value | format_datetime('%Y-%m-%d %H:%M:%S') }}
```
###### Parameters
- `value` (str | datetime): Date/time value to format (strings will be parsed automatically)
- `format` (str): Python strftime format string
###### Returns
- `str`: Formatted datetime string
###### Examples
```jinja2
<!-- Format datetime object -->
{{ created_at | format_datetime('%B %d, %Y at %I:%M %p') }}
<!-- Output: "January 15, 2024 at 02:30 PM" -->
<!-- Format datetime string -->
{{ "2024-01-15T14:30:00" | format_datetime('%m/%d/%Y') }}
<!-- Output: "01/15/2024" -->
<!-- Custom formatting -->
{{ timestamp | format_datetime('%A, %B %d, %Y') }}
<!-- Output: "Monday, January 15, 2024" -->
```
See the [strftime format code documentation](https://docs.python.org/3.13/library/datetime.html#strftime-and-strptime-format-codes)
for the possible codes and their meanings.
##### Date Localization
The `localize_date` filter formats a date or datetime object into a localized string using Babel internationalization.
This takes into account the provided locale for translation.
###### Syntax
```jinja2
{{ date_value | localize_date('medium', 'en_US') }}
{{ datetime_value | localize_date('short', 'fr_FR') }}
```
###### Parameters
- `value` (date | datetime): Date or datetime object to format (datetime should be timezone-aware)
- `format` (str): Format type - either a Babel preset ('short', 'medium', 'long', 'full') or custom pattern
- `locale` (str): Locale code for localization (e.g., 'en_US', 'fr_FR', 'de_DE')
###### Returns
- `str`: Localized, formatted date string
###### Examples
```jinja2
<!-- Preset formats -->
{{ created_date | localize_date('short', 'en_US') }}
<!-- Output: "1/15/24" -->
{{ created_date | localize_date('medium', 'en_US') }}
<!-- Output: "Jan 15, 2024" -->
{{ created_date | localize_date('long', 'en_US') }}
<!-- Output: "January 15, 2024" -->
{{ created_date | localize_date('full', 'en_US') }}
<!-- Output: "Monday, January 15, 2024" -->
<!-- Different locales -->
{{ created_date | localize_date('medium', 'fr_FR') }}
<!-- Output: "15 janv. 2024" -->
{{ created_date | localize_date('medium', 'de_DE') }}
<!-- Output: "15.01.2024" -->
<!-- Custom patterns -->
{{ created_date | localize_date('dd/MM/yyyy', 'en_GB') }}
<!-- Output: "15/01/2024" -->
```
See the [supported format codes](https://unicode.org/reports/tr35/tr35-dates.html#Date_Format_Patterns) for more options.
### Format Presets
- **short**: Abbreviated format (e.g., "1/15/24")
- **medium**: Medium-length format (e.g., "Jan 15, 2024")
- **long**: Long format with full month name (e.g., "January 15, 2024")
- **full**: Full format including day of week (e.g., "Monday, January 15, 2024")
#### Additional Variables
- `{{ tag_name_list }}`: A list of tag names applied to the document, ordered by the tag name. Note this is a list, not a single string

View File

@@ -282,6 +282,18 @@ The following methods are supported:
- `"merge": true or false` (defaults to false)
- The `merge` flag determines if the supplied permissions will overwrite all existing permissions (including
removing them) or be merged with existing permissions.
- `edit_pdf`
- Requires `parameters`:
- `"doc_ids": [DOCUMENT_ID]` A list of a single document ID to edit.
- `"operations": [OPERATION, ...]` A list of operations to perform on the documents. Each operation is a dictionary
with the following keys:
- `"page": PAGE_NUMBER` The page number to edit (1-based).
- `"rotate": DEGREES` Optional rotation in degrees (90, 180, 270).
- `"doc": OUTPUT_DOCUMENT_INDEX` Optional index of the output document for split operations.
- Optional `parameters`:
- `"delete_original": true` to delete the original documents after editing.
- `"update_document": true` to update the existing document with the edited PDF.
- `"include_metadata": true` to copy metadata from the original document to the edited document.
- `merge`
- No additional `parameters` required.
- The ordering of the merged document is determined by the list of IDs.

View File

@@ -6004,7 +6004,6 @@ primarily.
a very good job at ocr'ing a document with the default
language. Certain language specifics such as umlauts may not get
picked up properly.
- `PAPERLESS_DEBUG` defaults to `false`.
- The presence of `PAPERLESS_DBHOST` now determines whether to use
PostgreSQL or SQLite.
- `PAPERLESS_OCR_THREADS` is gone and replaced with

View File

@@ -159,6 +159,58 @@ Available options are `postgresql` and `mariadb`.
Defaults to unset, which uses Djangos built-in defaults.
#### [`PAPERLESS_DB_POOLSIZE=<int>`](#PAPERLESS_DB_POOLSIZE) {#PAPERLESS_DB_POOLSIZE}
: Defines the maximum number of database connections to keep in the pool.
Only applies to PostgreSQL. This setting is ignored for other database engines.
The value must be greater than or equal to 1 to be used.
Defaults to unset, which disables connection pooling.
!!! note
A small pool is typically sufficient — for example, a size of 4.
Make sure your PostgreSQL server's max_connections setting is large enough to handle:
```(Paperless workers + Celery workers) × pool size + safety margin```
For example, with 4 Paperless workers and 2 Celery workers, and a pool size of 4:
(4 + 2) × 4 + 10 = 34 connections required.
#### [`PAPERLESS_DB_READ_CACHE_ENABLED=<bool>`](#PAPERLESS_DB_READ_CACHE_ENABLED) {#PAPERLESS_DB_READ_CACHE_ENABLED}
: Caches the database read query results into Redis. This can significantly improve application response times by caching database queries, at the cost of slightly increased memory usage.
Defaults to `false`.
!!! danger
**Do not modify the database outside the application while it is running.**
This includes actions such as restoring a backup, upgrading the database, or performing manual inserts. All external modifications must be done **only when the application is stopped**.
After making any such changes, you **must invalidate the DB read cache** using the `invalidate_cachalot` management command.
#### [`PAPERLESS_READ_CACHE_TTL=<int>`](#PAPERLESS_READ_CACHE_TTL) {#PAPERLESS_READ_CACHE_TTL}
: Specifies how long (in seconds) read data should be cached.
Allowed values are between `1` (one second) and `31536000` (one year). Defaults to `3600` (one hour).
!!! warning
A high TTL increases memory usage over time. Memory may be used until end of TTL, even if the cache is invalidated with the `invalidate_cachalot` command.
In case of an out-of-memory (OOM) situation, Redis may stop accepting new data — including cache entries, scheduled tasks, and documents to consume.
If your system has limited RAM, consider configuring a dedicated Redis instance for the read cache, with a memory limit and the eviction policy set to `allkeys-lru`.
For more details, refer to the [Redis eviction policy documentation](https://redis.io/docs/latest/develop/reference/eviction/), and see the `PAPERLESS_READ_CACHE_REDIS_URL` setting to specify a separate Redis broker.
#### [`PAPERLESS_READ_CACHE_REDIS_URL=<url>`](#PAPERLESS_READ_CACHE_REDIS_URL) {#PAPERLESS_READ_CACHE_REDIS_URL}
: Defines the Redis instance used for the read cache.
Defaults to `None`.
!!! Note
If this value is not set, the same Redis instance used for scheduled tasks will be used for caching as well.
## Optional Services
### Tika {#tika}
@@ -968,6 +1020,22 @@ still perform some basic text pre-processing before matching.
Defaults to 1.
#### [`PAPERLESS_DATE_PARSER_LANGUAGES=<lang>`](#PAPERLESS_DATE_PARSER_LANGUAGES) {#PAPERLESS_DATE_PARSER_LANGUAGES}
Specifies which language Paperless should use when parsing dates from documents.
This should be a language code supported by the dateparser library,
for example: "en", or a combination such as "en+de".
Locales are also supported (e.g., "en-AU").
Multiple languages can be combined using "+", for example: "en+de" or "en-AU+de".
For valid values, refer to the list of supported languages and locales in the [dateparser documentation](https://dateparser.readthedocs.io/en/latest/supported_locales.html).
Set this to match the languages in which most of your documents are written.
If not set, Paperless will attempt to infer the language(s) from the OCR configuration (`PAPERLESS_OCR_LANGUAGE`).
!!! note
This format differs from the `PAPERLESS_OCR_LANGUAGE` setting, which uses ISO 639-2 codes (3 letters, e.g., "eng+deu" for Tesseract OCR).
#### [`PAPERLESS_EMAIL_TASK_CRON=<cron expression>`](#PAPERLESS_EMAIL_TASK_CRON) {#PAPERLESS_EMAIL_TASK_CRON}
: Configures the scheduled email fetching frequency. The value
@@ -1214,6 +1282,30 @@ within your documents.
Defaults to false.
## Workflow webhooks
#### [`PAPERLESS_WEBHOOKS_ALLOWED_SCHEMES=<str>`](#PAPERLESS_WEBHOOKS_ALLOWED_SCHEMES) {#PAPERLESS_WEBHOOKS_ALLOWED_SCHEMES}
: A comma-separated list of allowed schemes for webhooks. This setting
controls which URL schemes are permitted for webhook URLs.
Defaults to `http,https`.
#### [`PAPERLESS_WEBHOOKS_ALLOWED_PORTS=<str>`](#PAPERLESS_WEBHOOKS_ALLOWED_PORTS) {#PAPERLESS_WEBHOOKS_ALLOWED_PORTS}
: A comma-separated list of allowed ports for webhooks. This setting
controls which ports are permitted for webhook URLs. For example, if you
set this to `80,443`, webhooks will only be sent to URLs that use these
ports.
Defaults to empty list, which allows all ports.
#### [`PAPERLESS_WEBHOOKS_ALLOW_INTERNAL_REQUESTS=<bool>`](#PAPERLESS_WEBHOOKS_ALLOW_INTERNAL_REQUESTS) {#PAPERLESS_WEBHOOKS_ALLOW_INTERNAL_REQUESTS}
: If set to false, webhooks cannot be sent to internal URLs (e.g., localhost).
Defaults to true, which allows internal requests.
### Polling {#polling}
#### [`PAPERLESS_CONSUMER_POLLING=<num>`](#PAPERLESS_CONSUMER_POLLING) {#PAPERLESS_CONSUMER_POLLING}

View File

@@ -95,13 +95,13 @@ first-time setup.
7. You can now either ...
- install redis or
- install Redis or
- use the included `scripts/start_services.sh` to use docker to fire
up a redis instance (and some other services such as tika,
gotenberg and a database server) or
- use the included `scripts/start_services.sh` to use Docker to fire
up a Redis instance (and some other services such as Tika,
Gotenberg and a database server) or
- spin up a bare redis container
- spin up a bare Redis container
```
docker run -d -p 6379:6379 --restart unless-stopped redis:latest
@@ -147,7 +147,7 @@ $ ng build --configuration production
### Testing
- Run `pytest` in the `src/` directory to execute all tests. This also
generates a HTML coverage report. When runnings test, `paperless.conf`
generates a HTML coverage report. When running tests, `paperless.conf`
is loaded as well. However, the tests rely on the default
configuration. This is not ideal. But for now, make sure no settings
except for DEBUG are overridden when testing.

View File

@@ -30,7 +30,7 @@ physical documents into a searchable online archive so you can keep, well, _less
- Utilizes the open-source Tesseract engine to recognize more than 100 languages.
- Documents are saved as PDF/A format which is designed for long term storage, alongside the unaltered originals.
- Uses machine-learning to automatically add tags, correspondents and document types to your documents.
- Supports PDF documents, images, plain text files, Office documents (Word, Excel, Powerpoint, and LibreOffice equivalents)[^1] and more.
- Supports PDF documents, images, plain text files, Office documents (Word, Excel, PowerPoint, and LibreOffice equivalents)[^1] and more.
- Paperless stores your documents plain on disk. Filenames and folders are managed by paperless and their format can be configured freely with different configurations assigned to different documents.
- **Beautiful, modern web application** that features:
- Customizable dashboard with statistics.

View File

@@ -445,7 +445,7 @@ are released, dependency support is confirmed, etc.
13. Configure ImageMagick to allow processing of PDF documents. Most
distributions have this disabled by default, since PDF documents can
contain malware. If you don't do this, paperless will fall back to
ghostscript for certain steps such as thumbnail generation.
Ghostscript for certain steps such as thumbnail generation.
Edit `/etc/ImageMagick-6/policy.xml` and adjust

View File

@@ -335,7 +335,7 @@ You may see errors when deleting documents like:
Data too long for column 'transaction_id' at row 1
```
This error can occur in installations which have upgraded from a version of Paperless-ngx that used Django 4 (Paperless-ngx versions prior to v2.13.0) with a MariaDB/MySQL database. Due to the backawards-incompatible change in Django 5, the column "documents_document.transaction_id" will need to be re-created, which can be done with a one-time run of the following management command:
This error can occur in installations which have upgraded from a version of Paperless-ngx that used Django 4 (Paperless-ngx versions prior to v2.13.0) with a MariaDB/MySQL database. Due to the backwards-incompatible change in Django 5, the column "documents_document.transaction_id" will need to be re-created, which can be done with a one-time run of the following management command:
```shell-session
$ python3 manage.py convert_mariadb_uuid

View File

@@ -30,6 +30,9 @@ Each document has data fields that you can assign to them:
- A _document type_ is used to demarcate the type of a document such
as letter, bank statement, invoice, contract, etc. It is used to
identify what a document is about.
- The document _storage path_ is the location where the document files
are stored. See [Storage Paths](advanced_usage.md#storage-paths) for
more information.
- The _date added_ of a document is the date the document was scanned
into paperless. You cannot and should not change this date.
- The _date created_ of a document is the date the document was
@@ -496,6 +499,10 @@ The following workflow action types are available:
- Encoding for the request body, either JSON or form data
- The request headers as key-value pairs
For security reasons, webhooks can be limited to specific ports and disallowed from connecting to local URLs. See the relevant
[configuration settings](configuration.md#workflow-webhooks) to change this behavior. If you are allowing non-admins to create workflows,
you may want to adjust these settings to prevent abuse.
#### Workflow placeholders
Some workflow text can include placeholders but the available options differ depending on the type of
@@ -573,12 +580,14 @@ The following custom field types are supported:
## PDF Actions
Paperless-ngx supports four basic editing operations for PDFs (these operations currently cannot be performed on non-PDF files):
Paperless-ngx supports basic editing operations for PDFs (these operations currently cannot be performed on non-PDF files). When viewing an individual document you can
open the 'PDF Editor' to use a simple UI for re-arranging, rotating, deleting pages and splitting documents.
- Merging documents: available when selecting multiple documents for 'bulk editing'.
- Rotating documents: available when selecting multiple documents for 'bulk editing' and from an individual document's details page.
- Splitting documents: available from an individual document's details page.
- Deleting pages: available from an individual document's details page.
- Rotating documents: available when selecting multiple documents for 'bulk editing' and via the pdf editor on an individual document's details page.
- Splitting documents: via the pdf editor on an individual document's details page.
- Deleting pages: via the pdf editor on an individual document's details page.
- Re-arranging pages: via the pdf editor on an individual document's details page.
!!! important