mirror of
https://github.com/paperless-ngx/paperless-ngx.git
synced 2025-08-18 00:46:25 +00:00
Merge branch 'dev'
This commit is contained in:
@@ -179,10 +179,14 @@ following:
|
||||
|
||||
### Database Upgrades
|
||||
|
||||
In general, paperless does not require a specific version of PostgreSQL or MariaDB and it is
|
||||
Paperless-ngx is compatible with Django-supported versions of PostgreSQL and MariaDB and it is generally
|
||||
safe to update them to newer versions. However, you should always take a backup and follow
|
||||
the instructions from your database's documentation for how to upgrade between major versions.
|
||||
|
||||
!!! note
|
||||
|
||||
As of Paperless-ngx v2.18, the minimum supported version of PostgreSQL is 13.
|
||||
|
||||
For PostgreSQL, refer to [Upgrading a PostgreSQL Cluster](https://www.postgresql.org/docs/current/upgrading.html).
|
||||
|
||||
For MariaDB, refer to [Upgrading MariaDB](https://mariadb.com/kb/en/upgrading/)
|
||||
@@ -306,7 +310,7 @@ in dedicated folders according to their nature: `archive`, `originals`,
|
||||
If `-sm` or `--split-manifest` is provided, information about document
|
||||
will be placed in individual json files, instead of a single JSON file. The main
|
||||
manifest.json will still contain application wide information (e.g. tags, correspondent,
|
||||
documenttype, etc)
|
||||
document type, etc)
|
||||
|
||||
If `-z` or `--zip` is provided, the export will be a zip file
|
||||
in the target directory, named according to the current local date or the
|
||||
@@ -457,6 +461,22 @@ of the index and usually makes queries faster and also ensures that the
|
||||
autocompletion works properly. This command is regularly invoked by the
|
||||
task scheduler.
|
||||
|
||||
### Clearing the database read cache
|
||||
|
||||
If the database read cache is enabled, **you must run this command** after making any changes to the database outside the application context.
|
||||
This includes operations such as restoring a database backup or executing SQL statements like UPDATE, INSERT, DELETE, ALTER, CREATE, or DROP.
|
||||
|
||||
Failing to invalidate the cache after such modifications can lead to stale data being served from the cache, and **may cause data corruption** or inconsistent behavior in the application.
|
||||
|
||||
Use the following management command to clear the cache:
|
||||
|
||||
```
|
||||
invalidate_cachalot
|
||||
```
|
||||
|
||||
!!! info
|
||||
The database read cache is based on Django-Cachalot. You can refer to their [documentation](https://django-cachalot.readthedocs.io/en/latest/quickstart.html#manage-py-command).
|
||||
|
||||
### Managing filenames {#renamer}
|
||||
|
||||
If you use paperless' feature to
|
||||
|
@@ -434,6 +434,136 @@ provided. The template is provided as a string, potentially multiline, and rende
|
||||
In addition, the entire Document instance is available to be utilized in a more advanced way, as well as some variables which only make sense to be accessed
|
||||
with more complex logic.
|
||||
|
||||
#### Custom Jinja2 Filters
|
||||
|
||||
##### Custom Field Access
|
||||
|
||||
The `get_cf_value` filter retrieves a value from custom field data with optional default fallback.
|
||||
|
||||
###### Syntax
|
||||
|
||||
```jinja2
|
||||
{{ custom_fields | get_cf_value('field_name') }}
|
||||
{{ custom_fields | get_cf_value('field_name', 'default_value') }}
|
||||
```
|
||||
|
||||
###### Parameters
|
||||
|
||||
- `custom_fields`: This _must_ be the provided custom field data
|
||||
- `name` (str): Name of the custom field to retrieve
|
||||
- `default` (str, optional): Default value to return if field is not found or has no value
|
||||
|
||||
###### Returns
|
||||
|
||||
- `str | None`: The field value, default value, or `None` if neither exists
|
||||
|
||||
###### Examples
|
||||
|
||||
```jinja2
|
||||
<!-- Basic usage -->
|
||||
{{ custom_fields | get_cf_value('department') }}
|
||||
|
||||
<!-- With default value -->
|
||||
{{ custom_fields | get_cf_value('phone', 'Not provided') }}
|
||||
```
|
||||
|
||||
##### Datetime Formatting
|
||||
|
||||
The `format_datetime`filter formats a datetime string or datetime object using Python's strftime formatting.
|
||||
|
||||
###### Syntax
|
||||
|
||||
```jinja2
|
||||
{{ datetime_value | format_datetime('%Y-%m-%d %H:%M:%S') }}
|
||||
```
|
||||
|
||||
###### Parameters
|
||||
|
||||
- `value` (str | datetime): Date/time value to format (strings will be parsed automatically)
|
||||
- `format` (str): Python strftime format string
|
||||
|
||||
###### Returns
|
||||
|
||||
- `str`: Formatted datetime string
|
||||
|
||||
###### Examples
|
||||
|
||||
```jinja2
|
||||
<!-- Format datetime object -->
|
||||
{{ created_at | format_datetime('%B %d, %Y at %I:%M %p') }}
|
||||
<!-- Output: "January 15, 2024 at 02:30 PM" -->
|
||||
|
||||
<!-- Format datetime string -->
|
||||
{{ "2024-01-15T14:30:00" | format_datetime('%m/%d/%Y') }}
|
||||
<!-- Output: "01/15/2024" -->
|
||||
|
||||
<!-- Custom formatting -->
|
||||
{{ timestamp | format_datetime('%A, %B %d, %Y') }}
|
||||
<!-- Output: "Monday, January 15, 2024" -->
|
||||
```
|
||||
|
||||
See the [strftime format code documentation](https://docs.python.org/3.13/library/datetime.html#strftime-and-strptime-format-codes)
|
||||
for the possible codes and their meanings.
|
||||
|
||||
##### Date Localization
|
||||
|
||||
The `localize_date` filter formats a date or datetime object into a localized string using Babel internationalization.
|
||||
This takes into account the provided locale for translation.
|
||||
|
||||
###### Syntax
|
||||
|
||||
```jinja2
|
||||
{{ date_value | localize_date('medium', 'en_US') }}
|
||||
{{ datetime_value | localize_date('short', 'fr_FR') }}
|
||||
```
|
||||
|
||||
###### Parameters
|
||||
|
||||
- `value` (date | datetime): Date or datetime object to format (datetime should be timezone-aware)
|
||||
- `format` (str): Format type - either a Babel preset ('short', 'medium', 'long', 'full') or custom pattern
|
||||
- `locale` (str): Locale code for localization (e.g., 'en_US', 'fr_FR', 'de_DE')
|
||||
|
||||
###### Returns
|
||||
|
||||
- `str`: Localized, formatted date string
|
||||
|
||||
###### Examples
|
||||
|
||||
```jinja2
|
||||
<!-- Preset formats -->
|
||||
{{ created_date | localize_date('short', 'en_US') }}
|
||||
<!-- Output: "1/15/24" -->
|
||||
|
||||
{{ created_date | localize_date('medium', 'en_US') }}
|
||||
<!-- Output: "Jan 15, 2024" -->
|
||||
|
||||
{{ created_date | localize_date('long', 'en_US') }}
|
||||
<!-- Output: "January 15, 2024" -->
|
||||
|
||||
{{ created_date | localize_date('full', 'en_US') }}
|
||||
<!-- Output: "Monday, January 15, 2024" -->
|
||||
|
||||
<!-- Different locales -->
|
||||
{{ created_date | localize_date('medium', 'fr_FR') }}
|
||||
<!-- Output: "15 janv. 2024" -->
|
||||
|
||||
{{ created_date | localize_date('medium', 'de_DE') }}
|
||||
<!-- Output: "15.01.2024" -->
|
||||
|
||||
<!-- Custom patterns -->
|
||||
{{ created_date | localize_date('dd/MM/yyyy', 'en_GB') }}
|
||||
<!-- Output: "15/01/2024" -->
|
||||
```
|
||||
|
||||
See the [supported format codes](https://unicode.org/reports/tr35/tr35-dates.html#Date_Format_Patterns) for more options.
|
||||
|
||||
### Format Presets
|
||||
|
||||
- **short**: Abbreviated format (e.g., "1/15/24")
|
||||
- **medium**: Medium-length format (e.g., "Jan 15, 2024")
|
||||
- **long**: Long format with full month name (e.g., "January 15, 2024")
|
||||
- **full**: Full format including day of week (e.g., "Monday, January 15, 2024")
|
||||
|
||||
#### Additional Variables
|
||||
|
||||
- `{{ tag_name_list }}`: A list of tag names applied to the document, ordered by the tag name. Note this is a list, not a single string
|
||||
|
12
docs/api.md
12
docs/api.md
@@ -282,6 +282,18 @@ The following methods are supported:
|
||||
- `"merge": true or false` (defaults to false)
|
||||
- The `merge` flag determines if the supplied permissions will overwrite all existing permissions (including
|
||||
removing them) or be merged with existing permissions.
|
||||
- `edit_pdf`
|
||||
- Requires `parameters`:
|
||||
- `"doc_ids": [DOCUMENT_ID]` A list of a single document ID to edit.
|
||||
- `"operations": [OPERATION, ...]` A list of operations to perform on the documents. Each operation is a dictionary
|
||||
with the following keys:
|
||||
- `"page": PAGE_NUMBER` The page number to edit (1-based).
|
||||
- `"rotate": DEGREES` Optional rotation in degrees (90, 180, 270).
|
||||
- `"doc": OUTPUT_DOCUMENT_INDEX` Optional index of the output document for split operations.
|
||||
- Optional `parameters`:
|
||||
- `"delete_original": true` to delete the original documents after editing.
|
||||
- `"update_document": true` to update the existing document with the edited PDF.
|
||||
- `"include_metadata": true` to copy metadata from the original document to the edited document.
|
||||
- `merge`
|
||||
- No additional `parameters` required.
|
||||
- The ordering of the merged document is determined by the list of IDs.
|
||||
|
@@ -6004,7 +6004,6 @@ primarily.
|
||||
a very good job at ocr'ing a document with the default
|
||||
language. Certain language specifics such as umlauts may not get
|
||||
picked up properly.
|
||||
- `PAPERLESS_DEBUG` defaults to `false`.
|
||||
- The presence of `PAPERLESS_DBHOST` now determines whether to use
|
||||
PostgreSQL or SQLite.
|
||||
- `PAPERLESS_OCR_THREADS` is gone and replaced with
|
||||
|
@@ -159,6 +159,58 @@ Available options are `postgresql` and `mariadb`.
|
||||
|
||||
Defaults to unset, which uses Django’s built-in defaults.
|
||||
|
||||
#### [`PAPERLESS_DB_POOLSIZE=<int>`](#PAPERLESS_DB_POOLSIZE) {#PAPERLESS_DB_POOLSIZE}
|
||||
|
||||
: Defines the maximum number of database connections to keep in the pool.
|
||||
|
||||
Only applies to PostgreSQL. This setting is ignored for other database engines.
|
||||
|
||||
The value must be greater than or equal to 1 to be used.
|
||||
Defaults to unset, which disables connection pooling.
|
||||
|
||||
!!! note
|
||||
|
||||
A small pool is typically sufficient — for example, a size of 4.
|
||||
Make sure your PostgreSQL server's max_connections setting is large enough to handle:
|
||||
```(Paperless workers + Celery workers) × pool size + safety margin```
|
||||
For example, with 4 Paperless workers and 2 Celery workers, and a pool size of 4:
|
||||
(4 + 2) × 4 + 10 = 34 connections required.
|
||||
|
||||
#### [`PAPERLESS_DB_READ_CACHE_ENABLED=<bool>`](#PAPERLESS_DB_READ_CACHE_ENABLED) {#PAPERLESS_DB_READ_CACHE_ENABLED}
|
||||
|
||||
: Caches the database read query results into Redis. This can significantly improve application response times by caching database queries, at the cost of slightly increased memory usage.
|
||||
|
||||
Defaults to `false`.
|
||||
|
||||
!!! danger
|
||||
|
||||
**Do not modify the database outside the application while it is running.**
|
||||
This includes actions such as restoring a backup, upgrading the database, or performing manual inserts. All external modifications must be done **only when the application is stopped**.
|
||||
After making any such changes, you **must invalidate the DB read cache** using the `invalidate_cachalot` management command.
|
||||
|
||||
#### [`PAPERLESS_READ_CACHE_TTL=<int>`](#PAPERLESS_READ_CACHE_TTL) {#PAPERLESS_READ_CACHE_TTL}
|
||||
|
||||
: Specifies how long (in seconds) read data should be cached.
|
||||
|
||||
Allowed values are between `1` (one second) and `31536000` (one year). Defaults to `3600` (one hour).
|
||||
|
||||
!!! warning
|
||||
|
||||
A high TTL increases memory usage over time. Memory may be used until end of TTL, even if the cache is invalidated with the `invalidate_cachalot` command.
|
||||
|
||||
In case of an out-of-memory (OOM) situation, Redis may stop accepting new data — including cache entries, scheduled tasks, and documents to consume.
|
||||
If your system has limited RAM, consider configuring a dedicated Redis instance for the read cache, with a memory limit and the eviction policy set to `allkeys-lru`.
|
||||
For more details, refer to the [Redis eviction policy documentation](https://redis.io/docs/latest/develop/reference/eviction/), and see the `PAPERLESS_READ_CACHE_REDIS_URL` setting to specify a separate Redis broker.
|
||||
|
||||
#### [`PAPERLESS_READ_CACHE_REDIS_URL=<url>`](#PAPERLESS_READ_CACHE_REDIS_URL) {#PAPERLESS_READ_CACHE_REDIS_URL}
|
||||
|
||||
: Defines the Redis instance used for the read cache.
|
||||
|
||||
Defaults to `None`.
|
||||
|
||||
!!! Note
|
||||
If this value is not set, the same Redis instance used for scheduled tasks will be used for caching as well.
|
||||
|
||||
## Optional Services
|
||||
|
||||
### Tika {#tika}
|
||||
@@ -968,6 +1020,22 @@ still perform some basic text pre-processing before matching.
|
||||
|
||||
Defaults to 1.
|
||||
|
||||
#### [`PAPERLESS_DATE_PARSER_LANGUAGES=<lang>`](#PAPERLESS_DATE_PARSER_LANGUAGES) {#PAPERLESS_DATE_PARSER_LANGUAGES}
|
||||
|
||||
Specifies which language Paperless should use when parsing dates from documents.
|
||||
|
||||
This should be a language code supported by the dateparser library,
|
||||
for example: "en", or a combination such as "en+de".
|
||||
Locales are also supported (e.g., "en-AU").
|
||||
Multiple languages can be combined using "+", for example: "en+de" or "en-AU+de".
|
||||
For valid values, refer to the list of supported languages and locales in the [dateparser documentation](https://dateparser.readthedocs.io/en/latest/supported_locales.html).
|
||||
|
||||
Set this to match the languages in which most of your documents are written.
|
||||
If not set, Paperless will attempt to infer the language(s) from the OCR configuration (`PAPERLESS_OCR_LANGUAGE`).
|
||||
|
||||
!!! note
|
||||
This format differs from the `PAPERLESS_OCR_LANGUAGE` setting, which uses ISO 639-2 codes (3 letters, e.g., "eng+deu" for Tesseract OCR).
|
||||
|
||||
#### [`PAPERLESS_EMAIL_TASK_CRON=<cron expression>`](#PAPERLESS_EMAIL_TASK_CRON) {#PAPERLESS_EMAIL_TASK_CRON}
|
||||
|
||||
: Configures the scheduled email fetching frequency. The value
|
||||
@@ -1214,6 +1282,30 @@ within your documents.
|
||||
|
||||
Defaults to false.
|
||||
|
||||
## Workflow webhooks
|
||||
|
||||
#### [`PAPERLESS_WEBHOOKS_ALLOWED_SCHEMES=<str>`](#PAPERLESS_WEBHOOKS_ALLOWED_SCHEMES) {#PAPERLESS_WEBHOOKS_ALLOWED_SCHEMES}
|
||||
|
||||
: A comma-separated list of allowed schemes for webhooks. This setting
|
||||
controls which URL schemes are permitted for webhook URLs.
|
||||
|
||||
Defaults to `http,https`.
|
||||
|
||||
#### [`PAPERLESS_WEBHOOKS_ALLOWED_PORTS=<str>`](#PAPERLESS_WEBHOOKS_ALLOWED_PORTS) {#PAPERLESS_WEBHOOKS_ALLOWED_PORTS}
|
||||
|
||||
: A comma-separated list of allowed ports for webhooks. This setting
|
||||
controls which ports are permitted for webhook URLs. For example, if you
|
||||
set this to `80,443`, webhooks will only be sent to URLs that use these
|
||||
ports.
|
||||
|
||||
Defaults to empty list, which allows all ports.
|
||||
|
||||
#### [`PAPERLESS_WEBHOOKS_ALLOW_INTERNAL_REQUESTS=<bool>`](#PAPERLESS_WEBHOOKS_ALLOW_INTERNAL_REQUESTS) {#PAPERLESS_WEBHOOKS_ALLOW_INTERNAL_REQUESTS}
|
||||
|
||||
: If set to false, webhooks cannot be sent to internal URLs (e.g., localhost).
|
||||
|
||||
Defaults to true, which allows internal requests.
|
||||
|
||||
### Polling {#polling}
|
||||
|
||||
#### [`PAPERLESS_CONSUMER_POLLING=<num>`](#PAPERLESS_CONSUMER_POLLING) {#PAPERLESS_CONSUMER_POLLING}
|
||||
|
@@ -95,13 +95,13 @@ first-time setup.
|
||||
|
||||
7. You can now either ...
|
||||
|
||||
- install redis or
|
||||
- install Redis or
|
||||
|
||||
- use the included `scripts/start_services.sh` to use docker to fire
|
||||
up a redis instance (and some other services such as tika,
|
||||
gotenberg and a database server) or
|
||||
- use the included `scripts/start_services.sh` to use Docker to fire
|
||||
up a Redis instance (and some other services such as Tika,
|
||||
Gotenberg and a database server) or
|
||||
|
||||
- spin up a bare redis container
|
||||
- spin up a bare Redis container
|
||||
|
||||
```
|
||||
docker run -d -p 6379:6379 --restart unless-stopped redis:latest
|
||||
@@ -147,7 +147,7 @@ $ ng build --configuration production
|
||||
### Testing
|
||||
|
||||
- Run `pytest` in the `src/` directory to execute all tests. This also
|
||||
generates a HTML coverage report. When runnings test, `paperless.conf`
|
||||
generates a HTML coverage report. When running tests, `paperless.conf`
|
||||
is loaded as well. However, the tests rely on the default
|
||||
configuration. This is not ideal. But for now, make sure no settings
|
||||
except for DEBUG are overridden when testing.
|
||||
|
@@ -30,7 +30,7 @@ physical documents into a searchable online archive so you can keep, well, _less
|
||||
- Utilizes the open-source Tesseract engine to recognize more than 100 languages.
|
||||
- Documents are saved as PDF/A format which is designed for long term storage, alongside the unaltered originals.
|
||||
- Uses machine-learning to automatically add tags, correspondents and document types to your documents.
|
||||
- Supports PDF documents, images, plain text files, Office documents (Word, Excel, Powerpoint, and LibreOffice equivalents)[^1] and more.
|
||||
- Supports PDF documents, images, plain text files, Office documents (Word, Excel, PowerPoint, and LibreOffice equivalents)[^1] and more.
|
||||
- Paperless stores your documents plain on disk. Filenames and folders are managed by paperless and their format can be configured freely with different configurations assigned to different documents.
|
||||
- **Beautiful, modern web application** that features:
|
||||
- Customizable dashboard with statistics.
|
||||
|
@@ -445,7 +445,7 @@ are released, dependency support is confirmed, etc.
|
||||
13. Configure ImageMagick to allow processing of PDF documents. Most
|
||||
distributions have this disabled by default, since PDF documents can
|
||||
contain malware. If you don't do this, paperless will fall back to
|
||||
ghostscript for certain steps such as thumbnail generation.
|
||||
Ghostscript for certain steps such as thumbnail generation.
|
||||
|
||||
Edit `/etc/ImageMagick-6/policy.xml` and adjust
|
||||
|
||||
|
@@ -335,7 +335,7 @@ You may see errors when deleting documents like:
|
||||
Data too long for column 'transaction_id' at row 1
|
||||
```
|
||||
|
||||
This error can occur in installations which have upgraded from a version of Paperless-ngx that used Django 4 (Paperless-ngx versions prior to v2.13.0) with a MariaDB/MySQL database. Due to the backawards-incompatible change in Django 5, the column "documents_document.transaction_id" will need to be re-created, which can be done with a one-time run of the following management command:
|
||||
This error can occur in installations which have upgraded from a version of Paperless-ngx that used Django 4 (Paperless-ngx versions prior to v2.13.0) with a MariaDB/MySQL database. Due to the backwards-incompatible change in Django 5, the column "documents_document.transaction_id" will need to be re-created, which can be done with a one-time run of the following management command:
|
||||
|
||||
```shell-session
|
||||
$ python3 manage.py convert_mariadb_uuid
|
||||
|
@@ -30,6 +30,9 @@ Each document has data fields that you can assign to them:
|
||||
- A _document type_ is used to demarcate the type of a document such
|
||||
as letter, bank statement, invoice, contract, etc. It is used to
|
||||
identify what a document is about.
|
||||
- The document _storage path_ is the location where the document files
|
||||
are stored. See [Storage Paths](advanced_usage.md#storage-paths) for
|
||||
more information.
|
||||
- The _date added_ of a document is the date the document was scanned
|
||||
into paperless. You cannot and should not change this date.
|
||||
- The _date created_ of a document is the date the document was
|
||||
@@ -496,6 +499,10 @@ The following workflow action types are available:
|
||||
- Encoding for the request body, either JSON or form data
|
||||
- The request headers as key-value pairs
|
||||
|
||||
For security reasons, webhooks can be limited to specific ports and disallowed from connecting to local URLs. See the relevant
|
||||
[configuration settings](configuration.md#workflow-webhooks) to change this behavior. If you are allowing non-admins to create workflows,
|
||||
you may want to adjust these settings to prevent abuse.
|
||||
|
||||
#### Workflow placeholders
|
||||
|
||||
Some workflow text can include placeholders but the available options differ depending on the type of
|
||||
@@ -573,12 +580,14 @@ The following custom field types are supported:
|
||||
|
||||
## PDF Actions
|
||||
|
||||
Paperless-ngx supports four basic editing operations for PDFs (these operations currently cannot be performed on non-PDF files):
|
||||
Paperless-ngx supports basic editing operations for PDFs (these operations currently cannot be performed on non-PDF files). When viewing an individual document you can
|
||||
open the 'PDF Editor' to use a simple UI for re-arranging, rotating, deleting pages and splitting documents.
|
||||
|
||||
- Merging documents: available when selecting multiple documents for 'bulk editing'.
|
||||
- Rotating documents: available when selecting multiple documents for 'bulk editing' and from an individual document's details page.
|
||||
- Splitting documents: available from an individual document's details page.
|
||||
- Deleting pages: available from an individual document's details page.
|
||||
- Rotating documents: available when selecting multiple documents for 'bulk editing' and via the pdf editor on an individual document's details page.
|
||||
- Splitting documents: via the pdf editor on an individual document's details page.
|
||||
- Deleting pages: via the pdf editor on an individual document's details page.
|
||||
- Re-arranging pages: via the pdf editor on an individual document's details page.
|
||||
|
||||
!!! important
|
||||
|
||||
|
Reference in New Issue
Block a user