Documentation: Fix list indentation (#8050)

---------

Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
This commit is contained in:
marph91 2024-10-28 01:02:06 +01:00 committed by GitHub
parent 149d770ad1
commit 605aa50b00
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
10 changed files with 4458 additions and 4458 deletions

View File

@ -7,9 +7,9 @@
"trailingComma": "es5", "trailingComma": "es5",
"overrides": [ "overrides": [
{ {
"files": ["index.md", "administration.md"], "files": ["docs/*.md"],
"options": { "options": {
"tabWidth": 4 "tabWidth": 4,
} }
} }
] ]

View File

@ -25,20 +25,20 @@ documents.
The following algorithms are available: The following algorithms are available:
- **None:** No matching will be performed. - **None:** No matching will be performed.
- **Any:** Looks for any occurrence of any word provided in match in - **Any:** Looks for any occurrence of any word provided in match in
the PDF. If you define the match as `Bank1 Bank2`, it will match the PDF. If you define the match as `Bank1 Bank2`, it will match
documents containing either of these terms. documents containing either of these terms.
- **All:** Requires that every word provided appears in the PDF, - **All:** Requires that every word provided appears in the PDF,
albeit not in the order provided. albeit not in the order provided.
- **Exact:** Matches only if the match appears exactly as provided - **Exact:** Matches only if the match appears exactly as provided
(i.e. preserve ordering) in the PDF. (i.e. preserve ordering) in the PDF.
- **Regular expression:** Parses the match as a regular expression and - **Regular expression:** Parses the match as a regular expression and
tries to find a match within the document. tries to find a match within the document.
- **Fuzzy match:** Uses a partial matching based on locating the tag text - **Fuzzy match:** Uses a partial matching based on locating the tag text
inside the document, using a [partial ratio](https://rapidfuzz.github.io/RapidFuzz/Usage/fuzz.html#partial-ratio) inside the document, using a [partial ratio](https://rapidfuzz.github.io/RapidFuzz/Usage/fuzz.html#partial-ratio)
- **Auto:** Tries to automatically match new documents. This does not - **Auto:** Tries to automatically match new documents. This does not
require you to set a match. See the [notes below](#automatic-matching). require you to set a match. See the [notes below](#automatic-matching).
When using the _any_ or _all_ matching algorithms, you can search for When using the _any_ or _all_ matching algorithms, you can search for
terms that consist of multiple words by enclosing them in double quotes. terms that consist of multiple words by enclosing them in double quotes.
@ -69,33 +69,33 @@ Paperless tries to hide much of the involved complexity with this
approach. However, there are a couple caveats you need to keep in mind approach. However, there are a couple caveats you need to keep in mind
when using this feature: when using this feature:
- Changes to your documents are not immediately reflected by the - Changes to your documents are not immediately reflected by the
matching algorithm. The neural network needs to be _trained_ on your matching algorithm. The neural network needs to be _trained_ on your
documents after changes. Paperless periodically (default: once each documents after changes. Paperless periodically (default: once each
hour) checks for changes and does this automatically for you. hour) checks for changes and does this automatically for you.
- The Auto matching algorithm only takes documents into account which - The Auto matching algorithm only takes documents into account which
are NOT placed in your inbox (i.e. have any inbox tags assigned to are NOT placed in your inbox (i.e. have any inbox tags assigned to
them). This ensures that the neural network only learns from them). This ensures that the neural network only learns from
documents which you have correctly tagged before. documents which you have correctly tagged before.
- The matching algorithm can only work if there is a correlation - The matching algorithm can only work if there is a correlation
between the tag, correspondent, document type, or storage path and between the tag, correspondent, document type, or storage path and
the document itself. Your bank statements usually contain your bank the document itself. Your bank statements usually contain your bank
account number and the name of the bank, so this works reasonably account number and the name of the bank, so this works reasonably
well, However, tags such as "TODO" cannot be automatically well, However, tags such as "TODO" cannot be automatically
assigned. assigned.
- The matching algorithm needs a reasonable number of documents to - The matching algorithm needs a reasonable number of documents to
identify when to assign tags, correspondents, storage paths, and identify when to assign tags, correspondents, storage paths, and
types. If one out of a thousand documents has the correspondent types. If one out of a thousand documents has the correspondent
"Very obscure web shop I bought something five years ago", it will "Very obscure web shop I bought something five years ago", it will
probably not assign this correspondent automatically if you buy probably not assign this correspondent automatically if you buy
something from them again. The more documents, the better. something from them again. The more documents, the better.
- Paperless also needs a reasonable amount of negative examples to - Paperless also needs a reasonable amount of negative examples to
decide when not to assign a certain tag, correspondent, document decide when not to assign a certain tag, correspondent, document
type, or storage path. This will usually be the case as you start type, or storage path. This will usually be the case as you start
filling up paperless with documents. Example: If all your documents filling up paperless with documents. Example: If all your documents
are either from "Webshop" or "Bank", paperless will assign one are either from "Webshop" or "Bank", paperless will assign one
of these correspondents to ANY new document, if both are set to of these correspondents to ANY new document, if both are set to
automatic matching. automatic matching.
## Hooking into the consumption process {#consume-hooks} ## Hooking into the consumption process {#consume-hooks}
@ -242,12 +242,12 @@ webserver:
Troubleshooting: Troubleshooting:
- Monitor the Docker Compose log - Monitor the Docker Compose log
`cd ~/paperless-ngx; docker compose logs -f` `cd ~/paperless-ngx; docker compose logs -f`
- Check your script's permission e.g. in case of permission error - Check your script's permission e.g. in case of permission error
`sudo chmod 755 post-consumption-example.sh` `sudo chmod 755 post-consumption-example.sh`
- Pipe your scripts's output to a log file e.g. - Pipe your scripts's output to a log file e.g.
`echo "${DOCUMENT_ID}" | tee --append /usr/src/paperless/scripts/post-consumption-example.log` `echo "${DOCUMENT_ID}" | tee --append /usr/src/paperless/scripts/post-consumption-example.log`
## File name handling {#file-name-handling} ## File name handling {#file-name-handling}
@ -302,35 +302,35 @@ will create a directory structure as follows:
Paperless provides the following variables for use within filenames: Paperless provides the following variables for use within filenames:
- `{{ asn }}`: The archive serial number of the document, or "none". - `{{ asn }}`: The archive serial number of the document, or "none".
- `{{ correspondent }}`: The name of the correspondent, or "none". - `{{ correspondent }}`: The name of the correspondent, or "none".
- `{{ document_type }}`: The name of the document type, or "none". - `{{ document_type }}`: The name of the document type, or "none".
- `{{ tag_list }}`: A comma separated list of all tags assigned to the - `{{ tag_list }}`: A comma separated list of all tags assigned to the
document. document.
- `{{ title }}`: The title of the document. - `{{ title }}`: The title of the document.
- `{{ created }}`: The full date (ISO format) the document was created. - `{{ created }}`: The full date (ISO format) the document was created.
- `{{ created_year }}`: Year created only, formatted as the year with - `{{ created_year }}`: Year created only, formatted as the year with
century. century.
- `{{ created_year_short }}`: Year created only, formatted as the year - `{{ created_year_short }}`: Year created only, formatted as the year
without century, zero padded. without century, zero padded.
- `{{ created_month }}`: Month created only (number 01-12). - `{{ created_month }}`: Month created only (number 01-12).
- `{{ created_month_name }}`: Month created name, as per locale - `{{ created_month_name }}`: Month created name, as per locale
- `{{ created_month_name_short }}`: Month created abbreviated name, as per - `{{ created_month_name_short }}`: Month created abbreviated name, as per
locale locale
- `{{ created_day }}`: Day created only (number 01-31). - `{{ created_day }}`: Day created only (number 01-31).
- `{{ added }}`: The full date (ISO format) the document was added to - `{{ added }}`: The full date (ISO format) the document was added to
paperless. paperless.
- `{{ added_year }}`: Year added only. - `{{ added_year }}`: Year added only.
- `{{ added_year_short }}`: Year added only, formatted as the year without - `{{ added_year_short }}`: Year added only, formatted as the year without
century, zero padded. century, zero padded.
- `{{ added_month }}`: Month added only (number 01-12). - `{{ added_month }}`: Month added only (number 01-12).
- `{{ added_month_name }}`: Month added name, as per locale - `{{ added_month_name }}`: Month added name, as per locale
- `{{ added_month_name_short }}`: Month added abbreviated name, as per - `{{ added_month_name_short }}`: Month added abbreviated name, as per
locale locale
- `{{ added_day }}`: Day added only (number 01-31). - `{{ added_day }}`: Day added only (number 01-31).
- `{{ owner_username }}`: Username of document owner, if any, or "none" - `{{ owner_username }}`: Username of document owner, if any, or "none"
- `{{ original_name }}`: Document original filename, minus the extension, if any, or "none" - `{{ original_name }}`: Document original filename, minus the extension, if any, or "none"
- `{{ doc_pk }}`: The paperless identifier (primary key) for the document. - `{{ doc_pk }}`: The paperless identifier (primary key) for the document.
!!! warning !!! warning
@ -381,10 +381,10 @@ before empty placeholders are removed as well, empty directories are omitted.
When a single storage layout is not sufficient for your use case, storage paths allow for more complex When a single storage layout is not sufficient for your use case, storage paths allow for more complex
structure to set precisely where each document is stored in the file system. structure to set precisely where each document is stored in the file system.
- Each storage path is a [`PAPERLESS_FILENAME_FORMAT`](configuration.md#PAPERLESS_FILENAME_FORMAT) and - Each storage path is a [`PAPERLESS_FILENAME_FORMAT`](configuration.md#PAPERLESS_FILENAME_FORMAT) and
follows the rules described above follows the rules described above
- Each document is assigned a storage path using the matching algorithms described above, but can be - Each document is assigned a storage path using the matching algorithms described above, but can be
overwritten at any time overwritten at any time
For example, you could define the following two storage paths: For example, you could define the following two storage paths:
@ -435,8 +435,8 @@ with more complex logic.
#### Additional Variables #### Additional Variables
- `{{ tag_name_list }}`: A list of tag names applied to the document, ordered by the tag name. Note this is a list, not a single string - `{{ tag_name_list }}`: A list of tag names applied to the document, ordered by the tag name. Note this is a list, not a single string
- `{{ custom_fields }}`: A mapping of custom field names to their type and value. A user can access the mapping by field name or check if a field is applied by checking its existence in the variable. - `{{ custom_fields }}`: A mapping of custom field names to their type and value. A user can access the mapping by field name or check if a field is applied by checking its existence in the variable.
!!! tip !!! tip
@ -532,15 +532,15 @@ installation, you can use volumes to accomplish this:
```yaml ```yaml
services: services:
# ...
webserver:
environment:
- PAPERLESS_ENABLE_FLOWER
ports:
- 5555:5555 # (2)!
# ... # ...
volumes: webserver:
- /path/to/my/flowerconfig.py:/usr/src/paperless/src/paperless/flowerconfig.py:ro # (1)! environment:
- PAPERLESS_ENABLE_FLOWER
ports:
- 5555:5555 # (2)!
# ...
volumes:
- /path/to/my/flowerconfig.py:/usr/src/paperless/src/paperless/flowerconfig.py:ro # (1)!
``` ```
1. Note the `:ro` tag means the file will be mounted as read only. 1. Note the `:ro` tag means the file will be mounted as read only.
@ -571,11 +571,11 @@ For example, using Docker Compose:
```yaml ```yaml
services: services:
# ...
webserver:
# ... # ...
volumes: webserver:
- /path/to/my/scripts:/custom-cont-init.d:ro # (1)! # ...
volumes:
- /path/to/my/scripts:/custom-cont-init.d:ro # (1)!
``` ```
1. Note the `:ro` tag means the folder will be mounted as read only. This is for extra security against changes 1. Note the `:ro` tag means the folder will be mounted as read only. This is for extra security against changes
@ -623,16 +623,16 @@ Paperless is able to utilize barcodes for automatically performing some tasks.
At this time, the library utilized for detection of barcodes supports the following types: At this time, the library utilized for detection of barcodes supports the following types:
- AN-13/UPC-A - AN-13/UPC-A
- UPC-E - UPC-E
- EAN-8 - EAN-8
- Code 128 - Code 128
- Code 93 - Code 93
- Code 39 - Code 39
- Codabar - Codabar
- Interleaved 2 of 5 - Interleaved 2 of 5
- QR Code - QR Code
- SQ Code - SQ Code
You may check for updates on the [zbar library homepage](https://github.com/mchehab/zbar). You may check for updates on the [zbar library homepage](https://github.com/mchehab/zbar).
For usage in Paperless, the type of barcode does not matter, only the contents of it. For usage in Paperless, the type of barcode does not matter, only the contents of it.
@ -819,9 +819,9 @@ If using docker, you'll need to add the following volume mounts to your `docker-
```yaml ```yaml
webserver: webserver:
volumes: volumes:
- /home/user/.gnupg/pubring.gpg:/usr/src/paperless/.gnupg/pubring.gpg - /home/user/.gnupg/pubring.gpg:/usr/src/paperless/.gnupg/pubring.gpg
- <path to gpg-agent.extra socket>:/usr/src/paperless/.gnupg/S.gpg-agent - <path to gpg-agent.extra socket>:/usr/src/paperless/.gnupg/S.gpg-agent
``` ```
For a 'bare-metal' installation no further configuration is necessary. If you For a 'bare-metal' installation no further configuration is necessary. If you
@ -829,9 +829,9 @@ want to use a separate `GNUPG_HOME`, you can do so by configuring the [PAPERLESS
### Troubleshooting ### Troubleshooting
- Make sure, that `gpg-agent` is running on your host machine - Make sure, that `gpg-agent` is running on your host machine
- Make sure, that encryption and decryption works from inside the container using the `gpg` commands from above. - Make sure, that encryption and decryption works from inside the container using the `gpg` commands from above.
- Check that all files in `/usr/src/paperless/.gnupg` have correct permissions - Check that all files in `/usr/src/paperless/.gnupg` have correct permissions
```shell ```shell
paperless@9da1865df327:~/.gnupg$ ls -al paperless@9da1865df327:~/.gnupg$ ls -al

View File

@ -8,23 +8,23 @@ most of the available filters and ordering fields.
The API provides the following main endpoints: The API provides the following main endpoints:
- `/api/correspondents/`: Full CRUD support. - `/api/correspondents/`: Full CRUD support.
- `/api/custom_fields/`: Full CRUD support. - `/api/custom_fields/`: Full CRUD support.
- `/api/documents/`: Full CRUD support, except POSTing new documents. - `/api/documents/`: Full CRUD support, except POSTing new documents.
See [below](#file-uploads). See [below](#file-uploads).
- `/api/document_types/`: Full CRUD support. - `/api/document_types/`: Full CRUD support.
- `/api/groups/`: Full CRUD support. - `/api/groups/`: Full CRUD support.
- `/api/logs/`: Read-Only. - `/api/logs/`: Read-Only.
- `/api/mail_accounts/`: Full CRUD support. - `/api/mail_accounts/`: Full CRUD support.
- `/api/mail_rules/`: Full CRUD support. - `/api/mail_rules/`: Full CRUD support.
- `/api/profile/`: GET, PATCH - `/api/profile/`: GET, PATCH
- `/api/share_links/`: Full CRUD support. - `/api/share_links/`: Full CRUD support.
- `/api/storage_paths/`: Full CRUD support. - `/api/storage_paths/`: Full CRUD support.
- `/api/tags/`: Full CRUD support. - `/api/tags/`: Full CRUD support.
- `/api/tasks/`: Read-only. - `/api/tasks/`: Read-only.
- `/api/users/`: Full CRUD support. - `/api/users/`: Full CRUD support.
- `/api/workflows/`: Full CRUD support. - `/api/workflows/`: Full CRUD support.
- `/api/search/` GET, see [below](#global-search). - `/api/search/` GET, see [below](#global-search).
All of these endpoints except for the logging endpoint allow you to All of these endpoints except for the logging endpoint allow you to
fetch (and edit and delete where appropriate) individual objects by fetch (and edit and delete where appropriate) individual objects by
@ -33,32 +33,32 @@ appending their primary key to the path, e.g. `/api/documents/454/`.
The objects served by the document endpoint contain the following The objects served by the document endpoint contain the following
fields: fields:
- `id`: ID of the document. Read-only. - `id`: ID of the document. Read-only.
- `title`: Title of the document. - `title`: Title of the document.
- `content`: Plain text content of the document. - `content`: Plain text content of the document.
- `tags`: List of IDs of tags assigned to this document, or empty - `tags`: List of IDs of tags assigned to this document, or empty
list. list.
- `document_type`: Document type of this document, or null. - `document_type`: Document type of this document, or null.
- `correspondent`: Correspondent of this document or null. - `correspondent`: Correspondent of this document or null.
- `created`: The date time at which this document was created. - `created`: The date time at which this document was created.
- `created_date`: The date (YYYY-MM-DD) at which this document was - `created_date`: The date (YYYY-MM-DD) at which this document was
created. Optional. If also passed with created, this is ignored. created. Optional. If also passed with created, this is ignored.
- `modified`: The date at which this document was last edited in - `modified`: The date at which this document was last edited in
paperless. Read-only. paperless. Read-only.
- `added`: The date at which this document was added to paperless. - `added`: The date at which this document was added to paperless.
Read-only. Read-only.
- `archive_serial_number`: The identifier of this document in a - `archive_serial_number`: The identifier of this document in a
physical document archive. physical document archive.
- `original_file_name`: Verbose filename of the original document. - `original_file_name`: Verbose filename of the original document.
Read-only. Read-only.
- `archived_file_name`: Verbose filename of the archived document. - `archived_file_name`: Verbose filename of the archived document.
Read-only. Null if no archived document is available. Read-only. Null if no archived document is available.
- `notes`: Array of notes associated with the document. - `notes`: Array of notes associated with the document.
- `page_count`: Number of pages. - `page_count`: Number of pages.
- `set_permissions`: Allows setting document permissions. Optional, - `set_permissions`: Allows setting document permissions. Optional,
write-only. See [below](#permissions). write-only. See [below](#permissions).
- `custom_fields`: Array of custom fields & values, specified as - `custom_fields`: Array of custom fields & values, specified as
`{ field: CUSTOM_FIELD_ID, value: VALUE }` `{ field: CUSTOM_FIELD_ID, value: VALUE }`
!!! note !!! note
@ -69,11 +69,11 @@ fields:
In addition to that, the document endpoint offers these additional In addition to that, the document endpoint offers these additional
actions on individual documents: actions on individual documents:
- `/api/documents/<pk>/download/`: Download the document. - `/api/documents/<pk>/download/`: Download the document.
- `/api/documents/<pk>/preview/`: Display the document inline, without - `/api/documents/<pk>/preview/`: Display the document inline, without
downloading it. downloading it.
- `/api/documents/<pk>/thumb/`: Download the PNG thumbnail of a - `/api/documents/<pk>/thumb/`: Download the PNG thumbnail of a
document. document.
Paperless generates archived PDF/A documents from consumed files and Paperless generates archived PDF/A documents from consumed files and
stores both the original files as well as the archived files. By stores both the original files as well as the archived files. By
@ -107,30 +107,30 @@ Access the metadata of a document with an ID `id` at
The endpoint reports the following data: The endpoint reports the following data:
- `original_checksum`: MD5 checksum of the original document. - `original_checksum`: MD5 checksum of the original document.
- `original_size`: Size of the original document, in bytes. - `original_size`: Size of the original document, in bytes.
- `original_mime_type`: Mime type of the original document. - `original_mime_type`: Mime type of the original document.
- `media_filename`: Current filename of the document, under which it - `media_filename`: Current filename of the document, under which it
is stored inside the media directory. is stored inside the media directory.
- `has_archive_version`: True, if this document is archived, false - `has_archive_version`: True, if this document is archived, false
otherwise. otherwise.
- `original_metadata`: A list of metadata associated with the original - `original_metadata`: A list of metadata associated with the original
document. See below. document. See below.
- `archive_checksum`: MD5 checksum of the archived document, or null. - `archive_checksum`: MD5 checksum of the archived document, or null.
- `archive_size`: Size of the archived document in bytes, or null. - `archive_size`: Size of the archived document in bytes, or null.
- `archive_metadata`: Metadata associated with the archived document, - `archive_metadata`: Metadata associated with the archived document,
or null. See below. or null. See below.
File metadata is reported as a list of objects in the following form: File metadata is reported as a list of objects in the following form:
```json ```json
[ [
{ {
"namespace": "http://ns.adobe.com/pdf/1.3/", "namespace": "http://ns.adobe.com/pdf/1.3/",
"prefix": "pdf", "prefix": "pdf",
"key": "Producer", "key": "Producer",
"value": "SparklePDF, Fancy edition" "value": "SparklePDF, Fancy edition"
} }
] ]
``` ```
@ -140,9 +140,9 @@ document. Paperless only reports PDF metadata at this point.
## Documents additional endpoints ## Documents additional endpoints
- `/api/documents/<id>/notes/`: Retrieve notes for a document. - `/api/documents/<id>/notes/`: Retrieve notes for a document.
- `/api/documents/<id>/share_links/`: Retrieve share links for a document. - `/api/documents/<id>/share_links/`: Retrieve share links for a document.
- `/api/documents/<id>/history/`: Retrieve history of changes for a document. - `/api/documents/<id>/history/`: Retrieve history of changes for a document.
## Authorization ## Authorization
@ -228,10 +228,10 @@ Full text searching is available on the `/api/documents/` endpoint. Two
specific query parameters cause the API to return full text search specific query parameters cause the API to return full text search
results: results:
- `/api/documents/?query=your%20search%20query`: Search for a document - `/api/documents/?query=your%20search%20query`: Search for a document
using a full text query. For details on the syntax, see [Basic Usage - Searching](usage.md#basic-usage_searching). using a full text query. For details on the syntax, see [Basic Usage - Searching](usage.md#basic-usage_searching).
- `/api/documents/?more_like_id=1234`: Search for documents similar to - `/api/documents/?more_like_id=1234`: Search for documents similar to
the document with id 1234. the document with id 1234.
Pagination works exactly the same as it does for normal requests on this Pagination works exactly the same as it does for normal requests on this
endpoint. endpoint.
@ -268,12 +268,12 @@ attribute with various information about the search results:
} }
``` ```
- `score` is an indication how well this document matches the query - `score` is an indication how well this document matches the query
relative to the other search results. relative to the other search results.
- `highlights` is an excerpt from the document content and highlights - `highlights` is an excerpt from the document content and highlights
the search terms with `<span>` tags as shown above. the search terms with `<span>` tags as shown above.
- `rank` is the index of the search results. The first result will - `rank` is the index of the search results. The first result will
have rank 0. have rank 0.
### Filtering by custom fields ### Filtering by custom fields
@ -284,33 +284,33 @@ use cases:
1. Documents with a custom field "due" (date) between Aug 1, 2024 and 1. Documents with a custom field "due" (date) between Aug 1, 2024 and
Sept 1, 2024 (inclusive): Sept 1, 2024 (inclusive):
`?custom_field_query=["due", "range", ["2024-08-01", "2024-09-01"]]` `?custom_field_query=["due", "range", ["2024-08-01", "2024-09-01"]]`
2. Documents with a custom field "customer" (text) that equals "bob" 2. Documents with a custom field "customer" (text) that equals "bob"
(case sensitive): (case sensitive):
`?custom_field_query=["customer", "exact", "bob"]` `?custom_field_query=["customer", "exact", "bob"]`
3. Documents with a custom field "answered" (boolean) set to `true`: 3. Documents with a custom field "answered" (boolean) set to `true`:
`?custom_field_query=["answered", "exact", true]` `?custom_field_query=["answered", "exact", true]`
4. Documents with a custom field "favorite animal" (select) set to either 4. Documents with a custom field "favorite animal" (select) set to either
"cat" or "dog": "cat" or "dog":
`?custom_field_query=["favorite animal", "in", ["cat", "dog"]]` `?custom_field_query=["favorite animal", "in", ["cat", "dog"]]`
5. Documents with a custom field "address" (text) that is empty: 5. Documents with a custom field "address" (text) that is empty:
`?custom_field_query=["OR", ["address", "isnull", true], ["address", "exact", ""]]` `?custom_field_query=["OR", ["address", "isnull", true], ["address", "exact", ""]]`
6. Documents that don't have a field called "foo": 6. Documents that don't have a field called "foo":
`?custom_field_query=["foo", "exists", false]` `?custom_field_query=["foo", "exists", false]`
7. Documents that have document links "references" to both document 3 and 7: 7. Documents that have document links "references" to both document 3 and 7:
`?custom_field_query=["references", "contains", [3, 7]]` `?custom_field_query=["references", "contains", [3, 7]]`
All field types support basic operations including `exact`, `in`, `isnull`, All field types support basic operations including `exact`, `in`, `isnull`,
and `exists`. String, URL, and monetary fields support case-insensitive and `exists`. String, URL, and monetary fields support case-insensitive
@ -326,8 +326,8 @@ Get auto completions for a partial search term.
Query parameters: Query parameters:
- `term`: The incomplete term. - `term`: The incomplete term.
- `limit`: Amount of results. Defaults to 10. - `limit`: Amount of results. Defaults to 10.
Results returned by the endpoint are ordered by importance of the term Results returned by the endpoint are ordered by importance of the term
in the document index. The first result is the term that has the highest in the document index. The first result is the term that has the highest
@ -351,19 +351,19 @@ from there.
The endpoint supports the following optional form fields: The endpoint supports the following optional form fields:
- `title`: Specify a title that the consumer should use for the - `title`: Specify a title that the consumer should use for the
document. document.
- `created`: Specify a DateTime where the document was created (e.g. - `created`: Specify a DateTime where the document was created (e.g.
"2016-04-19" or "2016-04-19 06:15:00+02:00"). "2016-04-19" or "2016-04-19 06:15:00+02:00").
- `correspondent`: Specify the ID of a correspondent that the consumer - `correspondent`: Specify the ID of a correspondent that the consumer
should use for the document. should use for the document.
- `document_type`: Similar to correspondent. - `document_type`: Similar to correspondent.
- `storage_path`: Similar to correspondent. - `storage_path`: Similar to correspondent.
- `tags`: Similar to correspondent. Specify this multiple times to - `tags`: Similar to correspondent. Specify this multiple times to
have multiple tags added to the document. have multiple tags added to the document.
- `archive_serial_number`: An optional archive serial number to set. - `archive_serial_number`: An optional archive serial number to set.
- `custom_fields`: An array of custom field ids to assign (with an empty - `custom_fields`: An array of custom field ids to assign (with an empty
value) to the document. value) to the document.
The endpoint will immediately return HTTP 200 if the document consumption The endpoint will immediately return HTTP 200 if the document consumption
process was started successfully, with the UUID of the consumption task process was started successfully, with the UUID of the consumption task
@ -429,50 +429,50 @@ a json payload of the format:
The following methods are supported: The following methods are supported:
- `set_correspondent` - `set_correspondent`
- Requires `parameters`: `{ "correspondent": CORRESPONDENT_ID }` - Requires `parameters`: `{ "correspondent": CORRESPONDENT_ID }`
- `set_document_type` - `set_document_type`
- Requires `parameters`: `{ "document_type": DOCUMENT_TYPE_ID }` - Requires `parameters`: `{ "document_type": DOCUMENT_TYPE_ID }`
- `set_storage_path` - `set_storage_path`
- Requires `parameters`: `{ "storage_path": STORAGE_PATH_ID }` - Requires `parameters`: `{ "storage_path": STORAGE_PATH_ID }`
- `add_tag` - `add_tag`
- Requires `parameters`: `{ "tag": TAG_ID }` - Requires `parameters`: `{ "tag": TAG_ID }`
- `remove_tag` - `remove_tag`
- Requires `parameters`: `{ "tag": TAG_ID }` - Requires `parameters`: `{ "tag": TAG_ID }`
- `modify_tags` - `modify_tags`
- Requires `parameters`: `{ "add_tags": [LIST_OF_TAG_IDS] }` and / or `{ "remove_tags": [LIST_OF_TAG_IDS] }` - Requires `parameters`: `{ "add_tags": [LIST_OF_TAG_IDS] }` and / or `{ "remove_tags": [LIST_OF_TAG_IDS] }`
- `delete` - `delete`
- No `parameters` required - No `parameters` required
- `reprocess` - `reprocess`
- No `parameters` required - No `parameters` required
- `set_permissions` - `set_permissions`
- Requires `parameters`: - Requires `parameters`:
- `"set_permissions": PERMISSIONS_OBJ` (see format [above](#permissions)) and / or - `"set_permissions": PERMISSIONS_OBJ` (see format [above](#permissions)) and / or
- `"owner": OWNER_ID or null` - `"owner": OWNER_ID or null`
- `"merge": true or false` (defaults to false) - `"merge": true or false` (defaults to false)
- The `merge` flag determines if the supplied permissions will overwrite all existing permissions (including - The `merge` flag determines if the supplied permissions will overwrite all existing permissions (including
removing them) or be merged with existing permissions. removing them) or be merged with existing permissions.
- `merge` - `merge`
- No additional `parameters` required. - No additional `parameters` required.
- The ordering of the merged document is determined by the list of IDs. - The ordering of the merged document is determined by the list of IDs.
- Optional `parameters`: - Optional `parameters`:
- `"metadata_document_id": DOC_ID` apply metadata (tags, correspondent, etc.) from this document to the merged document. - `"metadata_document_id": DOC_ID` apply metadata (tags, correspondent, etc.) from this document to the merged document.
- `"delete_originals": true` to delete the original documents. This requires the calling user being the owner of - `"delete_originals": true` to delete the original documents. This requires the calling user being the owner of
all documents that are merged. all documents that are merged.
- `split` - `split`
- Requires `parameters`: - Requires `parameters`:
- `"pages": [..]` The list should be a list of pages and/or a ranges, separated by commas e.g. `"[1,2-3,4,5-7]"` - `"pages": [..]` The list should be a list of pages and/or a ranges, separated by commas e.g. `"[1,2-3,4,5-7]"`
- Optional `parameters`: - Optional `parameters`:
- `"delete_originals": true` to delete the original document after consumption. This requires the calling user being the owner of - `"delete_originals": true` to delete the original document after consumption. This requires the calling user being the owner of
the document. the document.
- The split operation only accepts a single document. - The split operation only accepts a single document.
- `rotate` - `rotate`
- Requires `parameters`: - Requires `parameters`:
- `"degrees": DEGREES`. Must be an integer i.e. 90, 180, 270 - `"degrees": DEGREES`. Must be an integer i.e. 90, 180, 270
- `delete_pages` - `delete_pages`
- Requires `parameters`: - Requires `parameters`:
- `"pages": [..]` The list should be a list of integers e.g. `"[2,3,4]"` - `"pages": [..]` The list should be a list of integers e.g. `"[2,3,4]"`
- The delete_pages operation only accepts a single document. - The delete_pages operation only accepts a single document.
### Objects ### Objects
@ -494,16 +494,16 @@ operations, using the endpoint: `/api/bulk_edit_objects/`, which requires a json
The REST API is versioned since Paperless-ngx 1.3.0. The REST API is versioned since Paperless-ngx 1.3.0.
- Versioning ensures that changes to the API don't break older - Versioning ensures that changes to the API don't break older
clients. clients.
- Clients specify the specific version of the API they wish to use - Clients specify the specific version of the API they wish to use
with every request and Paperless will handle the request using the with every request and Paperless will handle the request using the
specified API version. specified API version.
- Even if the underlying data model changes, older API versions will - Even if the underlying data model changes, older API versions will
always serve compatible data. always serve compatible data.
- If no version is specified, Paperless will serve version 1 to ensure - If no version is specified, Paperless will serve version 1 to ensure
compatibility with older clients that do not request a specific API compatibility with older clients that do not request a specific API
version. version.
API versions are specified by submitting an additional HTTP `Accept` API versions are specified by submitting an additional HTTP `Accept`
header with every request: header with every request:
@ -540,19 +540,19 @@ Initial API version.
#### Version 2 #### Version 2
- Added field `Tag.color`. This read/write string field contains a hex - Added field `Tag.color`. This read/write string field contains a hex
color such as `#a6cee3`. color such as `#a6cee3`.
- Added read-only field `Tag.text_color`. This field contains the text - Added read-only field `Tag.text_color`. This field contains the text
color to use for a specific tag, which is either black or white color to use for a specific tag, which is either black or white
depending on the brightness of `Tag.color`. depending on the brightness of `Tag.color`.
- Removed field `Tag.colour`. - Removed field `Tag.colour`.
#### Version 3 #### Version 3
- Permissions endpoints have been added. - Permissions endpoints have been added.
- The format of the `/api/ui_settings/` has changed. - The format of the `/api/ui_settings/` has changed.
#### Version 4 #### Version 4
- Consumption templates were refactored to workflows and API endpoints - Consumption templates were refactored to workflows and API endpoints
changed as such. changed as such.

File diff suppressed because it is too large Load Diff

View File

@ -8,17 +8,17 @@ common [OCR](#ocr) related settings and some frontend settings. If set, these wi
preference over the settings via environment variables. If not set, the environment setting preference over the settings via environment variables. If not set, the environment setting
or applicable default will be utilized instead. or applicable default will be utilized instead.
- If you run paperless on docker, `paperless.conf` is not used. - If you run paperless on docker, `paperless.conf` is not used.
Rather, configure paperless by copying necessary options to Rather, configure paperless by copying necessary options to
`docker-compose.env`. `docker-compose.env`.
- If you are running paperless on anything else, paperless will search - If you are running paperless on anything else, paperless will search
for the configuration file in these locations and use the first one for the configuration file in these locations and use the first one
it finds: it finds:
- The environment variable `PAPERLESS_CONFIGURATION_PATH` - The environment variable `PAPERLESS_CONFIGURATION_PATH`
- `/path/to/paperless/paperless.conf` - `/path/to/paperless/paperless.conf`
- `/etc/paperless.conf` - `/etc/paperless.conf`
- `/usr/local/etc/paperless.conf` - `/usr/local/etc/paperless.conf`
## Required services ## Required services

View File

@ -6,23 +6,23 @@ on Paperless-ngx.
Check out the source from GitHub. The repository is organized in the Check out the source from GitHub. The repository is organized in the
following way: following way:
- `main` always represents the latest release and will only see - `main` always represents the latest release and will only see
changes when a new release is made. changes when a new release is made.
- `dev` contains the code that will be in the next release. - `dev` contains the code that will be in the next release.
- `feature-X` contains bigger changes that will be in some release, but - `feature-X` contains bigger changes that will be in some release, but
not necessarily the next one. not necessarily the next one.
When making functional changes to Paperless-ngx, _always_ make your changes When making functional changes to Paperless-ngx, _always_ make your changes
on the `dev` branch. on the `dev` branch.
Apart from that, the folder structure is as follows: Apart from that, the folder structure is as follows:
- `docs/` - Documentation. - `docs/` - Documentation.
- `src-ui/` - Code of the front end. - `src-ui/` - Code of the front end.
- `src/` - Code of the back end. - `src/` - Code of the back end.
- `scripts/` - Various scripts that help with different parts of - `scripts/` - Various scripts that help with different parts of
development. development.
- `docker/` - Files required to build the docker image. - `docker/` - Files required to build the docker image.
## Contributing to Paperless-ngx ## Contributing to Paperless-ngx
@ -99,17 +99,17 @@ first-time setup.
7. You can now either ... 7. You can now either ...
- install redis or - install redis or
- use the included `scripts/start_services.sh` to use docker to fire - use the included `scripts/start_services.sh` to use docker to fire
up a redis instance (and some other services such as tika, up a redis instance (and some other services such as tika,
gotenberg and a database server) or gotenberg and a database server) or
- spin up a bare redis container - spin up a bare redis container
``` ```
$ docker run -d -p 6379:6379 --restart unless-stopped redis:latest $ docker run -d -p 6379:6379 --restart unless-stopped redis:latest
``` ```
8. Continue with either back-end or front-end development or both :-). 8. Continue with either back-end or front-end development or both :-).
@ -122,9 +122,9 @@ work well for development, but you can use whatever you want.
Configure the IDE to use the `src/`-folder as the base source folder. Configure the IDE to use the `src/`-folder as the base source folder.
Configure the following launch configurations in your IDE: Configure the following launch configurations in your IDE:
- `python3 manage.py runserver` - `python3 manage.py runserver`
- `python3 manage.py document_consumer` - `python3 manage.py document_consumer`
- `celery --app paperless worker -l DEBUG` (or any other log level) - `celery --app paperless worker -l DEBUG` (or any other log level)
To start them all: To start them all:
@ -150,11 +150,11 @@ $ ng build --configuration production
### Testing ### Testing
- Run `pytest` in the `src/` directory to execute all tests. This also - Run `pytest` in the `src/` directory to execute all tests. This also
generates a HTML coverage report. When runnings test, `paperless.conf` generates a HTML coverage report. When runnings test, `paperless.conf`
is loaded as well. However, the tests rely on the default is loaded as well. However, the tests rely on the default
configuration. This is not ideal. But for now, make sure no settings configuration. This is not ideal. But for now, make sure no settings
except for DEBUG are overridden when testing. except for DEBUG are overridden when testing.
!!! note !!! note
@ -245,14 +245,14 @@ these parts have to be translated separately.
### Front end localization ### Front end localization
- The AngularJS front end does localization according to the [Angular - The AngularJS front end does localization according to the [Angular
documentation](https://angular.io/guide/i18n). documentation](https://angular.io/guide/i18n).
- The source language of the project is "en_US". - The source language of the project is "en_US".
- The source strings end up in the file `src-ui/messages.xlf`. - The source strings end up in the file `src-ui/messages.xlf`.
- The translated strings need to be placed in the - The translated strings need to be placed in the
`src-ui/src/locale/` folder. `src-ui/src/locale/` folder.
- In order to extract added or changed strings from the source files, - In order to extract added or changed strings from the source files,
call `ng extract-i18n`. call `ng extract-i18n`.
Adding new languages requires adding the translated files in the Adding new languages requires adding the translated files in the
`src-ui/src/locale/` folder and adjusting a couple files. `src-ui/src/locale/` folder and adjusting a couple files.
@ -298,18 +298,18 @@ A majority of the strings that appear in the back end appear only when
the admin is used. However, some of these are still shown on the front the admin is used. However, some of these are still shown on the front
end (such as error messages). end (such as error messages).
- The django application does localization according to the [Django - The django application does localization according to the [Django
documentation](https://docs.djangoproject.com/en/3.1/topics/i18n/translation/). documentation](https://docs.djangoproject.com/en/3.1/topics/i18n/translation/).
- The source language of the project is "en_US". - The source language of the project is "en_US".
- Localization files end up in the folder `src/locale/`. - Localization files end up in the folder `src/locale/`.
- In order to extract strings from the application, call - In order to extract strings from the application, call
`python3 manage.py makemessages -l en_US`. This is important after `python3 manage.py makemessages -l en_US`. This is important after
making changes to translatable strings. making changes to translatable strings.
- The message files need to be compiled for them to show up in the - The message files need to be compiled for them to show up in the
application. Call `python3 manage.py compilemessages` to do this. application. Call `python3 manage.py compilemessages` to do this.
The generated files don't get committed into git, since these are The generated files don't get committed into git, since these are
derived artifacts. The build pipeline takes care of executing this derived artifacts. The build pipeline takes care of executing this
command. command.
Adding new languages requires adding the translated files in the Adding new languages requires adding the translated files in the
`src/locale/`-folder and adjusting the file `src/locale/`-folder and adjusting the file
@ -378,10 +378,10 @@ base code.
Paperless-ngx uses parsers to add documents. A parser is Paperless-ngx uses parsers to add documents. A parser is
responsible for: responsible for:
- Retrieving the content from the original - Retrieving the content from the original
- Creating a thumbnail - Creating a thumbnail
- _optional:_ Retrieving a created date from the original - _optional:_ Retrieving a created date from the original
- _optional:_ Creating an archived document from the original - _optional:_ Creating an archived document from the original
Custom parsers can be added to Paperless-ngx to support more file types. In Custom parsers can be added to Paperless-ngx to support more file types. In
order to do that, you need to write the parser itself and announce its order to do that, you need to write the parser itself and announce its
@ -439,14 +439,14 @@ def myparser_consumer_declaration(sender, **kwargs):
} }
``` ```
- `parser` is a reference to a class that extends `DocumentParser`. - `parser` is a reference to a class that extends `DocumentParser`.
- `weight` is used whenever two or more parsers are able to parse a - `weight` is used whenever two or more parsers are able to parse a
file: The parser with the higher weight wins. This can be used to file: The parser with the higher weight wins. This can be used to
override the parsers provided by Paperless-ngx. override the parsers provided by Paperless-ngx.
- `mime_types` is a dictionary. The keys are the mime types your - `mime_types` is a dictionary. The keys are the mime types your
parser supports and the value is the default file extension that parser supports and the value is the default file extension that
Paperless-ngx should use when storing files and serving them for Paperless-ngx should use when storing files and serving them for
download. We could guess that from the file extensions, but some download. We could guess that from the file extensions, but some
mime types have many extensions associated with them and the Python mime types have many extensions associated with them and the Python
methods responsible for guessing the extension do not always return methods responsible for guessing the extension do not always return
the same value. the same value.

View File

@ -40,28 +40,28 @@ system. On Linux, chances are high that this location is
You can always drag those files out of that folder to use them You can always drag those files out of that folder to use them
elsewhere. Here are a couple notes about that. elsewhere. Here are a couple notes about that.
- Paperless-ngx never modifies your original documents. It keeps - Paperless-ngx never modifies your original documents. It keeps
checksums of all documents and uses a scheduled sanity checker to checksums of all documents and uses a scheduled sanity checker to
check that they remain the same. check that they remain the same.
- By default, paperless uses the internal ID of each document as its - By default, paperless uses the internal ID of each document as its
filename. This might not be very convenient for export. However, you filename. This might not be very convenient for export. However, you
can adjust the way files are stored in paperless by can adjust the way files are stored in paperless by
[configuring the filename format](advanced_usage.md#file-name-handling). [configuring the filename format](advanced_usage.md#file-name-handling).
- [The exporter](administration.md#exporter) is - [The exporter](administration.md#exporter) is
another easy way to get your files out of paperless with reasonable another easy way to get your files out of paperless with reasonable
file names. file names.
## _What file types does paperless-ngx support?_ ## _What file types does paperless-ngx support?_
**A:** Currently, the following files are supported: **A:** Currently, the following files are supported:
- PDF documents, PNG images, JPEG images, TIFF images, GIF images and - PDF documents, PNG images, JPEG images, TIFF images, GIF images and
WebP images are processed with OCR and converted into PDF documents. WebP images are processed with OCR and converted into PDF documents.
- Plain text documents are supported as well and are added verbatim to - Plain text documents are supported as well and are added verbatim to
paperless. paperless.
- With the optional Tika integration enabled (see [Tika configuration](https://docs.paperless-ngx.com/configuration#tika)), - With the optional Tika integration enabled (see [Tika configuration](https://docs.paperless-ngx.com/configuration#tika)),
Paperless also supports various Office documents (.docx, .doc, odt, Paperless also supports various Office documents (.docx, .doc, odt,
.ppt, .pptx, .odp, .xls, .xlsx, .ods). .ppt, .pptx, .odp, .xls, .xlsx, .ods).
Paperless-ngx determines the type of a file by inspecting its content. Paperless-ngx determines the type of a file by inspecting its content.
The file extensions do not matter. The file extensions do not matter.
@ -127,11 +127,11 @@ ASGI-enabled web server as well that processes WebSocket connections,
and configure Apache to redirect WebSocket connections to this server. and configure Apache to redirect WebSocket connections to this server.
Multiple options for ASGI servers exist: Multiple options for ASGI servers exist:
- `gunicorn` with `uvicorn` as the worker implementation (the default - `gunicorn` with `uvicorn` as the worker implementation (the default
of paperless) of paperless)
- `daphne` as a standalone server, which is the reference - `daphne` as a standalone server, which is the reference
implementation for ASGI. implementation for ASGI.
- `uvicorn` as a standalone server - `uvicorn` as a standalone server
## _What about the Redis licensing change and using one of the open source forks_? ## _What about the Redis licensing change and using one of the open source forks_?

View File

@ -2,11 +2,11 @@
You can go multiple routes to setup and run Paperless: You can go multiple routes to setup and run Paperless:
- [Use the easy install docker script](#docker_script) - [Use the easy install docker script](#docker_script)
- [Pull the image from Docker Hub](#docker_hub) - [Pull the image from Docker Hub](#docker_hub)
- [Build the Docker image yourself](#docker_build) - [Build the Docker image yourself](#docker_build)
- [Install Paperless directly on your system manually (bare metal)](#bare_metal) - [Install Paperless directly on your system manually (bare metal)](#bare_metal)
- A user-maintained list of commercial hosting providers can be found [in the wiki](https://github.com/paperless-ngx/paperless-ngx/wiki/Related-Projects) - A user-maintained list of commercial hosting providers can be found [in the wiki](https://github.com/paperless-ngx/paperless-ngx/wiki/Related-Projects)
The Docker routes are quick & easy. These are the recommended routes. The Docker routes are quick & easy. These are the recommended routes.
This configures all the stuff from the above automatically so that it This configures all the stuff from the above automatically so that it
@ -105,14 +105,14 @@ steps described in [Docker setup](#docker_hub) automatically.
```yaml ```yaml
ports: ports:
- 8000:8000 - 8000:8000
``` ```
Replace the part BEFORE the colon with a port of your choice: Replace the part BEFORE the colon with a port of your choice:
```yaml ```yaml
ports: ports:
- 8010:8000 - 8010:8000
``` ```
Don't change the part after the colon or edit other lines that Don't change the part after the colon or edit other lines that
@ -129,11 +129,11 @@ steps described in [Docker setup](#docker_hub) automatically.
If you want to run Paperless as a rootless container, you will need If you want to run Paperless as a rootless container, you will need
to do the following in your `docker-compose.yml`: to do the following in your `docker-compose.yml`:
- set the `user` running the container to map to the `paperless` - set the `user` running the container to map to the `paperless`
user in the container. This value (`user_id` below), should be user in the container. This value (`user_id` below), should be
the same id that `USERMAP_UID` and `USERMAP_GID` are set to in the same id that `USERMAP_UID` and `USERMAP_GID` are set to in
the next step. See `USERMAP_UID` and `USERMAP_GID` the next step. See `USERMAP_UID` and `USERMAP_GID`
[here](configuration.md#docker). [here](configuration.md#docker).
Your entry for Paperless should contain something like: Your entry for Paperless should contain something like:
@ -222,7 +222,7 @@ steps described in [Docker setup](#docker_hub) automatically.
```yaml ```yaml
webserver: webserver:
image: ghcr.io/paperless-ngx/paperless-ngx:latest image: ghcr.io/paperless-ngx/paperless-ngx:latest
``` ```
and replace it with a line that instructs Docker Compose to build and replace it with a line that instructs Docker Compose to build
@ -230,8 +230,8 @@ steps described in [Docker setup](#docker_hub) automatically.
```yaml ```yaml
webserver: webserver:
build: build:
context: . context: .
``` ```
4. Follow steps 3 to 8 of [Docker Setup](#docker_hub). When asked to run 4. Follow steps 3 to 8 of [Docker Setup](#docker_hub). When asked to run
@ -257,20 +257,20 @@ are released, dependency support is confirmed, etc.
1. Install dependencies. Paperless requires the following packages. 1. Install dependencies. Paperless requires the following packages.
- `python3` - `python3`
- `python3-pip` - `python3-pip`
- `python3-dev` - `python3-dev`
- `default-libmysqlclient-dev` for MariaDB - `default-libmysqlclient-dev` for MariaDB
- `pkg-config` for mysqlclient (python dependency) - `pkg-config` for mysqlclient (python dependency)
- `fonts-liberation` for generating thumbnails for plain text - `fonts-liberation` for generating thumbnails for plain text
files files
- `imagemagick` >= 6 for PDF conversion - `imagemagick` >= 6 for PDF conversion
- `gnupg` for handling encrypted documents - `gnupg` for handling encrypted documents
- `libpq-dev` for PostgreSQL - `libpq-dev` for PostgreSQL
- `libmagic-dev` for mime type detection - `libmagic-dev` for mime type detection
- `mariadb-client` for MariaDB compile time - `mariadb-client` for MariaDB compile time
- `libzbar0` for barcode detection - `libzbar0` for barcode detection
- `poppler-utils` for barcode detection - `poppler-utils` for barcode detection
Use this list for your preferred package management: Use this list for your preferred package management:
@ -281,17 +281,17 @@ are released, dependency support is confirmed, etc.
These dependencies are required for OCRmyPDF, which is used for text These dependencies are required for OCRmyPDF, which is used for text
recognition. recognition.
- `unpaper` - `unpaper`
- `ghostscript` - `ghostscript`
- `icc-profiles-free` - `icc-profiles-free`
- `qpdf` - `qpdf`
- `liblept5` - `liblept5`
- `libxml2` - `libxml2`
- `pngquant` (suggested for certain PDF image optimizations) - `pngquant` (suggested for certain PDF image optimizations)
- `zlib1g` - `zlib1g`
- `tesseract-ocr` >= 4.0.0 for OCR - `tesseract-ocr` >= 4.0.0 for OCR
- `tesseract-ocr` language packs (`tesseract-ocr-eng`, - `tesseract-ocr` language packs (`tesseract-ocr-eng`,
`tesseract-ocr-deu`, etc) `tesseract-ocr-deu`, etc)
Use this list for your preferred package management: Use this list for your preferred package management:
@ -301,15 +301,15 @@ are released, dependency support is confirmed, etc.
On Raspberry Pi, these libraries are required as well: On Raspberry Pi, these libraries are required as well:
- `libatlas-base-dev` - `libatlas-base-dev`
- `libxslt1-dev` - `libxslt1-dev`
- `mime-support` - `mime-support`
You will also need these for installing some of the python dependencies: You will also need these for installing some of the python dependencies:
- `build-essential` - `build-essential`
- `python3-setuptools` - `python3-setuptools`
- `python3-wheel` - `python3-wheel`
Use this list for your preferred package management: Use this list for your preferred package management:
@ -361,33 +361,33 @@ are released, dependency support is confirmed, etc.
needs. Required settings for getting needs. Required settings for getting
paperless running are: paperless running are:
- [`PAPERLESS_REDIS`](configuration.md#PAPERLESS_REDIS) should point to your redis server, such as - [`PAPERLESS_REDIS`](configuration.md#PAPERLESS_REDIS) should point to your redis server, such as
<redis://localhost:6379>. <redis://localhost:6379>.
- [`PAPERLESS_DBENGINE`](configuration.md#PAPERLESS_DBENGINE) optional, and should be one of `postgres`, - [`PAPERLESS_DBENGINE`](configuration.md#PAPERLESS_DBENGINE) optional, and should be one of `postgres`,
`mariadb`, or `sqlite` `mariadb`, or `sqlite`
- [`PAPERLESS_DBHOST`](configuration.md#PAPERLESS_DBHOST) should be the hostname on which your - [`PAPERLESS_DBHOST`](configuration.md#PAPERLESS_DBHOST) should be the hostname on which your
PostgreSQL server is running. Do not configure this to use PostgreSQL server is running. Do not configure this to use
SQLite instead. Also configure port, database name, user and SQLite instead. Also configure port, database name, user and
password as necessary. password as necessary.
- [`PAPERLESS_CONSUMPTION_DIR`](configuration.md#PAPERLESS_CONSUMPTION_DIR) should point to a folder which - [`PAPERLESS_CONSUMPTION_DIR`](configuration.md#PAPERLESS_CONSUMPTION_DIR) should point to a folder which
paperless should watch for documents. You might want to have paperless should watch for documents. You might want to have
this somewhere else. Likewise, [`PAPERLESS_DATA_DIR`](configuration.md#PAPERLESS_DATA_DIR) and this somewhere else. Likewise, [`PAPERLESS_DATA_DIR`](configuration.md#PAPERLESS_DATA_DIR) and
[`PAPERLESS_MEDIA_ROOT`](configuration.md#PAPERLESS_MEDIA_ROOT) define where paperless stores its data. [`PAPERLESS_MEDIA_ROOT`](configuration.md#PAPERLESS_MEDIA_ROOT) define where paperless stores its data.
If you like, you can point both to the same directory. If you like, you can point both to the same directory.
- [`PAPERLESS_SECRET_KEY`](configuration.md#PAPERLESS_SECRET_KEY) should be a random sequence of - [`PAPERLESS_SECRET_KEY`](configuration.md#PAPERLESS_SECRET_KEY) should be a random sequence of
characters. It's used for authentication. Failure to do so characters. It's used for authentication. Failure to do so
allows third parties to forge authentication credentials. allows third parties to forge authentication credentials.
- [`PAPERLESS_URL`](configuration.md#PAPERLESS_URL) if you are behind a reverse proxy. This should - [`PAPERLESS_URL`](configuration.md#PAPERLESS_URL) if you are behind a reverse proxy. This should
point to your domain. Please see point to your domain. Please see
[configuration](configuration.md) for more [configuration](configuration.md) for more
information. information.
Many more adjustments can be made to paperless, especially the OCR Many more adjustments can be made to paperless, especially the OCR
part. The following options are recommended for everyone: part. The following options are recommended for everyone:
- Set [`PAPERLESS_OCR_LANGUAGE`](configuration.md#PAPERLESS_OCR_LANGUAGE) to the language most of your - Set [`PAPERLESS_OCR_LANGUAGE`](configuration.md#PAPERLESS_OCR_LANGUAGE) to the language most of your
documents are written in. documents are written in.
- Set [`PAPERLESS_TIME_ZONE`](configuration.md#PAPERLESS_TIME_ZONE) to your local time zone. - Set [`PAPERLESS_TIME_ZONE`](configuration.md#PAPERLESS_TIME_ZONE) to your local time zone.
!!! warning !!! warning
@ -395,9 +395,9 @@ are released, dependency support is confirmed, etc.
7. Create the following directories if they are missing: 7. Create the following directories if they are missing:
- `/opt/paperless/media` - `/opt/paperless/media`
- `/opt/paperless/data` - `/opt/paperless/data`
- `/opt/paperless/consume` - `/opt/paperless/consume`
Adjust as necessary if you configured different folders. Adjust as necessary if you configured different folders.
Ensure that the paperless user has write permissions for every one Ensure that the paperless user has write permissions for every one
@ -586,21 +586,21 @@ your setup depending on how you installed paperless.
This setup describes how to update an existing paperless Docker This setup describes how to update an existing paperless Docker
installation. The important things to keep in mind are as follows: installation. The important things to keep in mind are as follows:
- Read the [changelog](changelog.md) and - Read the [changelog](changelog.md) and
take note of breaking changes. take note of breaking changes.
- You should decide if you want to stick with SQLite or want to - You should decide if you want to stick with SQLite or want to
migrate your database to PostgreSQL. See [documentation](#sqlite_to_psql) migrate your database to PostgreSQL. See [documentation](#sqlite_to_psql)
for details on for details on
how to move your data from SQLite to PostgreSQL. Both work fine with how to move your data from SQLite to PostgreSQL. Both work fine with
paperless. However, if you already have a database server running paperless. However, if you already have a database server running
for other services, you might as well use it for paperless as well. for other services, you might as well use it for paperless as well.
- The task scheduler of paperless, which is used to execute periodic - The task scheduler of paperless, which is used to execute periodic
tasks such as email checking and maintenance, requires a tasks such as email checking and maintenance, requires a
[redis](https://redis.io/) message broker instance. The [redis](https://redis.io/) message broker instance. The
Docker Compose route takes care of that. Docker Compose route takes care of that.
- The layout of the folder structure for your documents and data - The layout of the folder structure for your documents and data
remains the same, so you can just plug your old docker volumes into remains the same, so you can just plug your old docker volumes into
paperless-ngx and expect it to find everything where it should be. paperless-ngx and expect it to find everything where it should be.
Migration to paperless-ngx is then performed in a few simple steps: Migration to paperless-ngx is then performed in a few simple steps:
@ -763,30 +763,30 @@ Paperless runs on Raspberry Pi. However, some things are rather slow on
the Pi and configuring some options in paperless can help improve the Pi and configuring some options in paperless can help improve
performance immensely: performance immensely:
- Stick with SQLite to save some resources. - Stick with SQLite to save some resources.
- Consider setting [`PAPERLESS_OCR_PAGES`](configuration.md#PAPERLESS_OCR_PAGES) to 1, so that paperless will - Consider setting [`PAPERLESS_OCR_PAGES`](configuration.md#PAPERLESS_OCR_PAGES) to 1, so that paperless will
only OCR the first page of your documents. In most cases, this page only OCR the first page of your documents. In most cases, this page
contains enough information to be able to find it. contains enough information to be able to find it.
- [`PAPERLESS_TASK_WORKERS`](configuration.md#PAPERLESS_TASK_WORKERS) and [`PAPERLESS_THREADS_PER_WORKER`](configuration.md#PAPERLESS_THREADS_PER_WORKER) are - [`PAPERLESS_TASK_WORKERS`](configuration.md#PAPERLESS_TASK_WORKERS) and [`PAPERLESS_THREADS_PER_WORKER`](configuration.md#PAPERLESS_THREADS_PER_WORKER) are
configured to use all cores. The Raspberry Pi models 3 and up have 4 configured to use all cores. The Raspberry Pi models 3 and up have 4
cores, meaning that paperless will use 2 workers and 2 threads per cores, meaning that paperless will use 2 workers and 2 threads per
worker. This may result in sluggish response times during worker. This may result in sluggish response times during
consumption, so you might want to lower these settings (example: 2 consumption, so you might want to lower these settings (example: 2
workers and 1 thread to always have some computing power left for workers and 1 thread to always have some computing power left for
other tasks). other tasks).
- Keep [`PAPERLESS_OCR_MODE`](configuration.md#PAPERLESS_OCR_MODE) at its default value `skip` and consider - Keep [`PAPERLESS_OCR_MODE`](configuration.md#PAPERLESS_OCR_MODE) at its default value `skip` and consider
OCR'ing your documents before feeding them into paperless. Some OCR'ing your documents before feeding them into paperless. Some
scanners are able to do this! scanners are able to do this!
- Set [`PAPERLESS_OCR_SKIP_ARCHIVE_FILE`](configuration.md#PAPERLESS_OCR_SKIP_ARCHIVE_FILE) to `with_text` to skip archive - Set [`PAPERLESS_OCR_SKIP_ARCHIVE_FILE`](configuration.md#PAPERLESS_OCR_SKIP_ARCHIVE_FILE) to `with_text` to skip archive
file generation for already ocr'ed documents, or `always` to skip it file generation for already ocr'ed documents, or `always` to skip it
for all documents. for all documents.
- If you want to perform OCR on the device, consider using - If you want to perform OCR on the device, consider using
`PAPERLESS_OCR_CLEAN=none`. This will speed up OCR times and use `PAPERLESS_OCR_CLEAN=none`. This will speed up OCR times and use
less memory at the expense of slightly worse OCR results. less memory at the expense of slightly worse OCR results.
- If using docker, consider setting [`PAPERLESS_WEBSERVER_WORKERS`](configuration.md#PAPERLESS_WEBSERVER_WORKERS) to 1. This will save some memory. - If using docker, consider setting [`PAPERLESS_WEBSERVER_WORKERS`](configuration.md#PAPERLESS_WEBSERVER_WORKERS) to 1. This will save some memory.
- Consider setting [`PAPERLESS_ENABLE_NLTK`](configuration.md#PAPERLESS_ENABLE_NLTK) to false, to disable the - Consider setting [`PAPERLESS_ENABLE_NLTK`](configuration.md#PAPERLESS_ENABLE_NLTK) to false, to disable the
more advanced language processing, which can take more memory and more advanced language processing, which can take more memory and
processing time. processing time.
For details, refer to [configuration](configuration.md). For details, refer to [configuration](configuration.md).

View File

@ -4,27 +4,27 @@
Check for the following issues: Check for the following issues:
- Ensure that the directory you're putting your documents in is the - Ensure that the directory you're putting your documents in is the
folder paperless is watching. With docker, this setting is performed folder paperless is watching. With docker, this setting is performed
in the `docker-compose.yml` file. Without Docker, look at the in the `docker-compose.yml` file. Without Docker, look at the
`CONSUMPTION_DIR` setting. Don't adjust this setting if you're `CONSUMPTION_DIR` setting. Don't adjust this setting if you're
using docker. using docker.
- Ensure that redis is up and running. Paperless does its task - Ensure that redis is up and running. Paperless does its task
processing asynchronously, and for documents to arrive at the task processing asynchronously, and for documents to arrive at the task
processor, it needs redis to run. processor, it needs redis to run.
- Ensure that the task processor is running. Docker does this - Ensure that the task processor is running. Docker does this
automatically. Manually invoke the task processor by executing automatically. Manually invoke the task processor by executing
```shell-session ```shell-session
$ celery --app paperless worker $ celery --app paperless worker
``` ```
- Look at the output of paperless and inspect it for any errors. - Look at the output of paperless and inspect it for any errors.
- Go to the admin interface, and check if there are failed tasks. If - Go to the admin interface, and check if there are failed tasks. If
so, the tasks will contain an error message. so, the tasks will contain an error message.
## Consumer warns `OCR for XX failed` ## Consumer warns `OCR for XX failed`
@ -78,12 +78,12 @@ Ensure that `chown` is possible on these directories.
This indicates that the Auto matching algorithm found no documents to This indicates that the Auto matching algorithm found no documents to
learn from. This may have two reasons: learn from. This may have two reasons:
- You don't use the Auto matching algorithm: The error can be safely - You don't use the Auto matching algorithm: The error can be safely
ignored in this case. ignored in this case.
- You are using the Auto matching algorithm: The classifier explicitly - You are using the Auto matching algorithm: The classifier explicitly
excludes documents with Inbox tags. Verify that there are documents excludes documents with Inbox tags. Verify that there are documents
in your archive without inbox tags. The algorithm will only learn in your archive without inbox tags. The algorithm will only learn
from documents not in your inbox. from documents not in your inbox.
## UserWarning in sklearn on every single document ## UserWarning in sklearn on every single document
@ -127,10 +127,10 @@ change in the `docker-compose.yml` file:
# The gotenberg chromium route is used to convert .eml files. We do not # The gotenberg chromium route is used to convert .eml files. We do not
# want to allow external content like tracking pixels or even javascript. # want to allow external content like tracking pixels or even javascript.
command: command:
- 'gotenberg' - 'gotenberg'
- '--chromium-disable-javascript=true' - '--chromium-disable-javascript=true'
- '--chromium-allow-list=file:///tmp/.*' - '--chromium-allow-list=file:///tmp/.*'
- '--api-timeout=60' - '--api-timeout=60'
``` ```
## Permission denied errors in the consumption directory ## Permission denied errors in the consumption directory

View File

@ -10,37 +10,37 @@ and provides many utilities for finding and managing your documents.
Paperless essentially consists of two different parts for managing your Paperless essentially consists of two different parts for managing your
documents: documents:
- The _consumer_ watches a specified folder and adds all documents in - The _consumer_ watches a specified folder and adds all documents in
that folder to paperless. that folder to paperless.
- The _web server_ provides a UI that you use to manage and search for - The _web server_ provides a UI that you use to manage and search for
your scanned documents. your scanned documents.
Each document has a couple of fields that you can assign to them: Each document has a couple of fields that you can assign to them:
- A _Document_ is a piece of paper that sometimes contains valuable - A _Document_ is a piece of paper that sometimes contains valuable
information. information.
- The _correspondent_ of a document is the person, institution or - The _correspondent_ of a document is the person, institution or
company that a document either originates from, or is sent to. company that a document either originates from, or is sent to.
- A _tag_ is a label that you can assign to documents. Think of labels - A _tag_ is a label that you can assign to documents. Think of labels
as more powerful folders: Multiple documents can be grouped together as more powerful folders: Multiple documents can be grouped together
with a single tag, however, a single document can also have multiple with a single tag, however, a single document can also have multiple
tags. This is not possible with folders. The reason folders are not tags. This is not possible with folders. The reason folders are not
implemented in paperless is simply that tags are much more versatile implemented in paperless is simply that tags are much more versatile
than folders. than folders.
- A _document type_ is used to demarcate the type of a document such - A _document type_ is used to demarcate the type of a document such
as letter, bank statement, invoice, contract, etc. It is used to as letter, bank statement, invoice, contract, etc. It is used to
identify what a document is about. identify what a document is about.
- The _date added_ of a document is the date the document was scanned - The _date added_ of a document is the date the document was scanned
into paperless. You cannot and should not change this date. into paperless. You cannot and should not change this date.
- The _date created_ of a document is the date the document was - The _date created_ of a document is the date the document was
initially issued. This can be the date you bought a product, the initially issued. This can be the date you bought a product, the
date you signed a contract, or the date a letter was sent to you. date you signed a contract, or the date a letter was sent to you.
- The _archive serial number_ (short: ASN) of a document is the - The _archive serial number_ (short: ASN) of a document is the
identifier of the document in your physical document binders. See identifier of the document in your physical document binders. See
[recommended workflow](#usage-recommended-workflow) below. [recommended workflow](#usage-recommended-workflow) below.
- The _content_ of a document is the text that was OCR'ed from the - The _content_ of a document is the text that was OCR'ed from the
document. This text is fed into the search engine and is used for document. This text is fed into the search engine and is used for
matching tags, correspondents and document types. matching tags, correspondents and document types.
## Adding documents to paperless ## Adding documents to paperless
@ -142,21 +142,21 @@ patterns can include wildcards and multiple patterns separated by a comma.
The actions all ensure that the same mail is not consumed twice by The actions all ensure that the same mail is not consumed twice by
different means. These are as follows: different means. These are as follows:
- **Delete:** Immediately deletes mail that paperless has consumed - **Delete:** Immediately deletes mail that paperless has consumed
documents from. Use with caution. documents from. Use with caution.
- **Mark as read:** Mark consumed mail as read. Paperless will not - **Mark as read:** Mark consumed mail as read. Paperless will not
consume documents from already read mails. If you read a mail before consume documents from already read mails. If you read a mail before
paperless sees it, it will be ignored. paperless sees it, it will be ignored.
- **Flag:** Sets the 'important' flag on mails with consumed - **Flag:** Sets the 'important' flag on mails with consumed
documents. Paperless will not consume flagged mails. documents. Paperless will not consume flagged mails.
- **Move to folder:** Moves consumed mails out of the way so that - **Move to folder:** Moves consumed mails out of the way so that
paperless won't consume them again. paperless won't consume them again.
- **Add custom Tag:** Adds a custom tag to mails with consumed - **Add custom Tag:** Adds a custom tag to mails with consumed
documents (the IMAP standard calls these "keywords"). Paperless documents (the IMAP standard calls these "keywords"). Paperless
will not consume mails already tagged. Not all mail servers support will not consume mails already tagged. Not all mail servers support
this feature! this feature!
- **Apple Mail support:** Apple Mail clients allow differently colored tags. For this to work use `apple:<color>` (e.g. _apple:green_) as a custom tag. Available colors are _red_, _orange_, _yellow_, _blue_, _green_, _violet_ and _grey_. - **Apple Mail support:** Apple Mail clients allow differently colored tags. For this to work use `apple:<color>` (e.g. _apple:green_) as a custom tag. Available colors are _red_, _orange_, _yellow_, _blue_, _green_, _violet_ and _grey_.
!!! warning !!! warning
@ -360,32 +360,32 @@ flowchart TD
Workflows allow you to filter by: Workflows allow you to filter by:
- Source, e.g. documents uploaded via consume folder, API (& the web UI) and mail fetch - Source, e.g. documents uploaded via consume folder, API (& the web UI) and mail fetch
- File name, including wildcards e.g. \*.pdf will apply to all pdfs - File name, including wildcards e.g. \*.pdf will apply to all pdfs
- File path, including wildcards. Note that enabling `PAPERLESS_CONSUMER_RECURSIVE` would allow, for - File path, including wildcards. Note that enabling `PAPERLESS_CONSUMER_RECURSIVE` would allow, for
example, automatically assigning documents to different owners based on the upload directory. example, automatically assigning documents to different owners based on the upload directory.
- Mail rule. Choosing this option will force 'mail fetch' to be the workflow source. - Mail rule. Choosing this option will force 'mail fetch' to be the workflow source.
- Content matching (`Added` and `Updated` triggers only). Filter document content using the matching settings. - Content matching (`Added` and `Updated` triggers only). Filter document content using the matching settings.
- Tags (`Added` and `Updated` triggers only). Filter for documents with any of the specified tags - Tags (`Added` and `Updated` triggers only). Filter for documents with any of the specified tags
- Document type (`Added` and `Updated` triggers only). Filter documents with this doc type - Document type (`Added` and `Updated` triggers only). Filter documents with this doc type
- Correspondent (`Added` and `Updated` triggers only). Filter documents with this correspondent - Correspondent (`Added` and `Updated` triggers only). Filter documents with this correspondent
### Workflow Actions ### Workflow Actions
There are currently two types of workflow actions, "Assignment", which can assign: There are currently two types of workflow actions, "Assignment", which can assign:
- Title, see [title placeholders](usage.md#title-placeholders) below - Title, see [title placeholders](usage.md#title-placeholders) below
- Tags, correspondent, document type and storage path - Tags, correspondent, document type and storage path
- Document owner - Document owner
- View and / or edit permissions to users or groups - View and / or edit permissions to users or groups
- Custom fields. Note that no value for the field will be set - Custom fields. Note that no value for the field will be set
and "Removal" actions, which can remove either all of or specific sets of the following: and "Removal" actions, which can remove either all of or specific sets of the following:
- Tags, correspondents, document types or storage paths - Tags, correspondents, document types or storage paths
- Document owner - Document owner
- View and / or edit permissions - View and / or edit permissions
- Custom fields - Custom fields
#### Title placeholders #### Title placeholders
@ -393,29 +393,29 @@ Workflow titles can include placeholders but the available options differ depend
workflow trigger. This is because at the time of consumption (when the title is to be set), no automatic tags etc. have been workflow trigger. This is because at the time of consumption (when the title is to be set), no automatic tags etc. have been
applied. You can use the following placeholders with any trigger type: applied. You can use the following placeholders with any trigger type:
- `{correspondent}`: assigned correspondent name - `{correspondent}`: assigned correspondent name
- `{document_type}`: assigned document type name - `{document_type}`: assigned document type name
- `{owner_username}`: assigned owner username - `{owner_username}`: assigned owner username
- `{added}`: added datetime - `{added}`: added datetime
- `{added_year}`: added year - `{added_year}`: added year
- `{added_year_short}`: added year - `{added_year_short}`: added year
- `{added_month}`: added month - `{added_month}`: added month
- `{added_month_name}`: added month name - `{added_month_name}`: added month name
- `{added_month_name_short}`: added month short name - `{added_month_name_short}`: added month short name
- `{added_day}`: added day - `{added_day}`: added day
- `{added_time}`: added time in HH:MM format - `{added_time}`: added time in HH:MM format
- `{original_filename}`: original file name without extension - `{original_filename}`: original file name without extension
The following placeholders are only available for "added" or "updated" triggers The following placeholders are only available for "added" or "updated" triggers
- `{created}`: created datetime - `{created}`: created datetime
- `{created_year}`: created year - `{created_year}`: created year
- `{created_year_short}`: created year - `{created_year_short}`: created year
- `{created_month}`: created month - `{created_month}`: created month
- `{created_month_name}`: created month name - `{created_month_name}`: created month name
- `{created_month_name_short}`: created month short name - `{created_month_name_short}`: created month short name
- `{created_day}`: created day - `{created_day}`: created day
- `{created_time}`: created time in HH:MM format - `{created_time}`: created time in HH:MM format
### Workflow permissions ### Workflow permissions
@ -450,24 +450,24 @@ Multiple fields may be attached to a document but the same field name cannot be
The following custom field types are supported: The following custom field types are supported:
- `Text`: any text - `Text`: any text
- `Boolean`: true / false (check / unchecked) field - `Boolean`: true / false (check / unchecked) field
- `Date`: date - `Date`: date
- `URL`: a valid url - `URL`: a valid url
- `Integer`: integer number e.g. 12 - `Integer`: integer number e.g. 12
- `Number`: float number e.g. 12.3456 - `Number`: float number e.g. 12.3456
- `Monetary`: [ISO 4217 currency code](https://en.wikipedia.org/wiki/ISO_4217#List_of_ISO_4217_currency_codes) and a number with exactly two decimals, e.g. USD12.30 - `Monetary`: [ISO 4217 currency code](https://en.wikipedia.org/wiki/ISO_4217#List_of_ISO_4217_currency_codes) and a number with exactly two decimals, e.g. USD12.30
- `Document Link`: reference(s) to other document(s) displayed as links, automatically creates a symmetrical link in reverse - `Document Link`: reference(s) to other document(s) displayed as links, automatically creates a symmetrical link in reverse
- `Select`: a pre-defined list of strings from which the user can choose - `Select`: a pre-defined list of strings from which the user can choose
## Share Links ## Share Links
Paperless-ngx added the ability to create shareable links to files in version 2.0. You can find the button for this on the document detail screen. Paperless-ngx added the ability to create shareable links to files in version 2.0. You can find the button for this on the document detail screen.
- Share links do not require a user to login and thus link directly to a file. - Share links do not require a user to login and thus link directly to a file.
- Links are unique and are of the form `{paperless-url}/share/{randomly-generated-slug}`. - Links are unique and are of the form `{paperless-url}/share/{randomly-generated-slug}`.
- Links can optionally have an expiration time set. - Links can optionally have an expiration time set.
- After a link expires or is deleted users will be redirected to the regular paperless-ngx login. - After a link expires or is deleted users will be redirected to the regular paperless-ngx login.
!!! tip !!! tip
@ -477,10 +477,10 @@ Paperless-ngx added the ability to create shareable links to files in version 2.
Paperless-ngx supports four basic editing operations for PDFs (these operations currently cannot be performed on non-PDF files): Paperless-ngx supports four basic editing operations for PDFs (these operations currently cannot be performed on non-PDF files):
- Merging documents: available when selecting multiple documents for 'bulk editing'. - Merging documents: available when selecting multiple documents for 'bulk editing'.
- Rotating documents: available when selecting multiple documents for 'bulk editing' and from an individual document's details page. - Rotating documents: available when selecting multiple documents for 'bulk editing' and from an individual document's details page.
- Splitting documents: available from an individual document's details page. - Splitting documents: available from an individual document's details page.
- Deleting pages: available from an individual document's details page. - Deleting pages: available from an individual document's details page.
!!! important !!! important
@ -558,18 +558,18 @@ the system.
Here are a couple examples of tags and types that you could use in your Here are a couple examples of tags and types that you could use in your
collection. collection.
- An `inbox` tag for newly added documents that you haven't manually - An `inbox` tag for newly added documents that you haven't manually
edited yet. edited yet.
- A tag `car` for everything car related (repairs, registration, - A tag `car` for everything car related (repairs, registration,
insurance, etc) insurance, etc)
- A tag `todo` for documents that you still need to do something with, - A tag `todo` for documents that you still need to do something with,
such as reply, or perform some task online. such as reply, or perform some task online.
- A tag `bank account x` for all bank statement related to that - A tag `bank account x` for all bank statement related to that
account. account.
- A tag `mail` for anything that you added to paperless via its mail - A tag `mail` for anything that you added to paperless via its mail
processing capabilities. processing capabilities.
- A tag `missing_metadata` when you still need to add some metadata to - A tag `missing_metadata` when you still need to add some metadata to
a document, but can't or don't want to do this right now. a document, but can't or don't want to do this right now.
## Searching {#basic-usage_searching} ## Searching {#basic-usage_searching}
@ -658,8 +658,8 @@ The following diagram shows how easy it is to manage your documents.
### Preparations in paperless ### Preparations in paperless
- Create an inbox tag that gets assigned to all new documents. - Create an inbox tag that gets assigned to all new documents.
- Create a TODO tag. - Create a TODO tag.
### Processing of the physical documents ### Processing of the physical documents
@ -733,78 +733,78 @@ Some documents require attention and require you to act on the document.
You may take two different approaches to handle these documents based on You may take two different approaches to handle these documents based on
how regularly you intend to scan documents and use paperless. how regularly you intend to scan documents and use paperless.
- If you scan and process your documents in paperless regularly, - If you scan and process your documents in paperless regularly,
assign a TODO tag to all scanned documents that you need to process. assign a TODO tag to all scanned documents that you need to process.
Create a saved view on the dashboard that shows all documents with Create a saved view on the dashboard that shows all documents with
this tag. this tag.
- If you do not scan documents regularly and use paperless solely for - If you do not scan documents regularly and use paperless solely for
archiving, create a physical todo box next to your physical inbox archiving, create a physical todo box next to your physical inbox
and put documents you need to process in the TODO box. When you and put documents you need to process in the TODO box. When you
performed the task associated with the document, move it to the performed the task associated with the document, move it to the
inbox. inbox.
## Architecture ## Architecture
Paperless-ngx consists of the following components: Paperless-ngx consists of the following components:
- **The webserver:** This serves the administration pages, the API, - **The webserver:** This serves the administration pages, the API,
and the new frontend. This is the main tool you'll be using to interact and the new frontend. This is the main tool you'll be using to interact
with paperless. You may start the webserver directly with with paperless. You may start the webserver directly with
```shell-session ```shell-session
$ cd /path/to/paperless/src/ $ cd /path/to/paperless/src/
$ gunicorn -c ../gunicorn.conf.py paperless.wsgi $ gunicorn -c ../gunicorn.conf.py paperless.wsgi
``` ```
or by any other means such as Apache `mod_wsgi`. or by any other means such as Apache `mod_wsgi`.
- **The consumer:** This is what watches your consumption folder for - **The consumer:** This is what watches your consumption folder for
documents. However, the consumer itself does not really consume your documents. However, the consumer itself does not really consume your
documents. Now it notifies a task processor that a new file is ready documents. Now it notifies a task processor that a new file is ready
for consumption. I suppose it should be named differently. This was for consumption. I suppose it should be named differently. This was
also used to check your emails, but that's now done elsewhere as also used to check your emails, but that's now done elsewhere as
well. well.
Start the consumer with the management command `document_consumer`: Start the consumer with the management command `document_consumer`:
```shell-session ```shell-session
$ cd /path/to/paperless/src/ $ cd /path/to/paperless/src/
$ python3 manage.py document_consumer $ python3 manage.py document_consumer
``` ```
- **The task processor:** Paperless relies on [Celery - Distributed - **The task processor:** Paperless relies on [Celery - Distributed
Task Queue](https://docs.celeryq.dev/en/stable/index.html) for doing Task Queue](https://docs.celeryq.dev/en/stable/index.html) for doing
most of the heavy lifting. This is a task queue that accepts tasks most of the heavy lifting. This is a task queue that accepts tasks
from multiple sources and processes these in parallel. It also comes from multiple sources and processes these in parallel. It also comes
with a scheduler that executes certain commands periodically. with a scheduler that executes certain commands periodically.
This task processor is responsible for: This task processor is responsible for:
- Consuming documents. When the consumer finds new documents, it - Consuming documents. When the consumer finds new documents, it
notifies the task processor to start a consumption task. notifies the task processor to start a consumption task.
- The task processor also performs the consumption of any - The task processor also performs the consumption of any
documents you upload through the web interface. documents you upload through the web interface.
- Consuming emails. It periodically checks your configured - Consuming emails. It periodically checks your configured
accounts for new emails and notifies the task processor to accounts for new emails and notifies the task processor to
consume the attachment of an email. consume the attachment of an email.
- Maintaining the search index and the automatic matching - Maintaining the search index and the automatic matching
algorithm. These are things that paperless needs to do from time algorithm. These are things that paperless needs to do from time
to time in order to operate properly. to time in order to operate properly.
This allows paperless to process multiple documents from your This allows paperless to process multiple documents from your
consumption folder in parallel! On a modern multi core system, this consumption folder in parallel! On a modern multi core system, this
makes the consumption process with full OCR blazingly fast. makes the consumption process with full OCR blazingly fast.
The task processor comes with a built-in admin interface that you The task processor comes with a built-in admin interface that you
can use to check whenever any of the tasks fail and inspect the can use to check whenever any of the tasks fail and inspect the
errors (i.e., wrong email credentials, errors during consuming a errors (i.e., wrong email credentials, errors during consuming a
specific file, etc). specific file, etc).
- A [redis](https://redis.io/) message broker: This is a really - A [redis](https://redis.io/) message broker: This is a really
lightweight service that is responsible for getting the tasks from lightweight service that is responsible for getting the tasks from
the webserver and the consumer to the task scheduler. These run in a the webserver and the consumer to the task scheduler. These run in a
different process (maybe even on different machines!), and different process (maybe even on different machines!), and
therefore, this is necessary. therefore, this is necessary.
- Optional: A database server. Paperless supports PostgreSQL, MariaDB - Optional: A database server. Paperless supports PostgreSQL, MariaDB
and SQLite for storing its data. and SQLite for storing its data.