{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Home","text":"
Paperless-ngx is a community-supported open-source document management system that transforms your physical documents into a searchable online archive so you can keep, well, less paper.
"},{"location":"#features","title":"Features","text":"
Paperless-ngx is the official successor to the original Paperless & Paperless-ng projects and is designed to distribute the responsibility of advancing and supporting the project among a team of people. Consider joining us!
Further discussion of the transition between these projects can be found at ng#1599 and ng#1632.
"},{"location":"#screenshots","title":"Screenshots","text":"Paperless-ngx aims to be as nice to use as it is useful. Check out some screenshots below.
The dashboard shows saved views which can be sorted. Documents can be uploaded with the button or dropped anywhere in the application.
The document list provides three different styles to browse your documents.
Use the 'slim' sidebar to focus on your docs and minimize the UI.
Of course, Paperless-ngx also supports dark mode:
Quickly find documents with extensive filtering mechanisms.
And perform bulk edit operations to set tags, correspondents, etc. as well as permissions.
Side-by-side editing of documents.
Support for custom fields.
A robust permissions system with support for 'global' and document / object permissions.
Searching provides auto complete and highlights the results.
Tag, correspondent, document type and storage path editing.
Mail rules support various filters and actions for incoming e-mails.
Workflows provide finer control over the document pipeline and trigger actions.
Mobile devices are supported.
"},{"location":"#support","title":"Support","text":"
Community support is available via GitHub Discussions and the Matrix chat room.
"},{"location":"#feature-requests","title":"Feature Requests","text":"Feature requests can be submitted via GitHub Discussions where you can search for existing ideas, add your own and vote for the ones you care about.
"},{"location":"#bugs","title":"Bugs","text":"For bugs please open an issue or start a discussion if you have questions.
"},{"location":"#contributing","title":"Contributing","text":"People interested in continuing the work on paperless-ngx are encouraged to reach out on GitHub or the Matrix chat room. If you would like to contribute to the project on an ongoing basis there are multiple teams (frontend, ci/cd, etc) that could use your help so please reach out!
"},{"location":"#translation","title":"Translation","text":"Paperless-ngx is available in many languages that are coordinated on Crowdin. If you want to help out by translating paperless-ngx into your language, please head over to the Paperless-ngx project at Crowdin, and thank you!
"},{"location":"#scanners-software","title":"Scanners & Software","text":"Paperless-ngx is compatible with many different scanners and scanning tools. A user-maintained list of scanners and other software is available on the wiki.
Office document and email consumption support is optional and provided by Apache Tika (see configuration).
Multiple options exist for making backups of your paperless instance, depending on how you installed paperless.
Before making a backup, it's probably best to make sure that paperless is not actively consuming documents at that time.
Options available to any installation of paperless:
Use the document exporter. The document exporter exports all your documents, thumbnails, metadata, and database contents to a specific folder. You may import your documents and settings into a fresh instance of paperless again or store your documents in another DMS with this export.
The document exporter is also able to update an already existing export. Therefore, incremental backups with rsync
are entirely possible.
The exporter does not include API tokens and they will need to be re-generated after importing.
Caution
You cannot import the export generated with one version of paperless in a different version of paperless. The export contains an exact image of the database, and migrations may change the database layout.
Options available to docker installations:
Back up the docker volumes. These usually reside within /var/lib/docker/volumes
on the host and you need to be root in order to access them.
Paperless uses 4 volumes:
paperless_media: This is where your documents are stored.
paperless_data: This is where auxiliary data is stored. This folder also contains the SQLite database, if you use it.
paperless_pgdata: Exists only if you use PostgreSQL and contains the database.
paperless_dbdata: Exists only if you use MariaDB and contains the database.
Options available to bare-metal and non-docker installations:
Back up the entire paperless folder. This ensures that if your paperless instance crashes at some point or your disk fails, you can simply copy the folder back into place and it works.
When using PostgreSQL or MariaDB, you'll also have to back up the database.
If you've backed up Paperless-ngx using the document exporter, restoring can simply be done with the document importer.
Of course, other backup strategies require restoring any volumes, folders and database copies you created in the steps above.
"},{"location":"administration/#updating","title":"Updating Paperless","text":""},{"location":"administration/#docker-updating","title":"Docker Route","text":"If a new release of paperless-ngx is available, upgrading depends on how you installed paperless-ngx in the first place. The releases are available at the release page.
First of all, make sure no active processes (like consumption) are running, then make a backup.
After that, ensure that paperless is stopped:
$ cd /path/to/paperless\n$ docker compose down\n
If you pull the image from the docker hub, all you need to do is:
docker compose pull\ndocker compose up\n
The Docker Compose files refer to the latest
version, which is always the latest stable release.
If you built the image yourself, do the following:
git pull\ndocker compose build\ndocker compose up\n
Running docker compose up
will also apply any new database migrations. If you see everything working, press CTRL+C once to gracefully stop paperless. Then you can start paperless-ngx with -d
to have it run in the background.
Note
In version 0.9.14, the update process was changed. In 0.9.13 and earlier, the Docker Compose files specified exact versions, so pull did not automatically update to newer versions. In order to enable updates as described above, either get the new docker-compose.yml
file from here or edit the docker-compose.yml
file, find the line that says
image: ghcr.io/paperless-ngx/paperless-ngx:0.9.x\n
and replace the version with latest
:
image: ghcr.io/paperless-ngx/paperless-ngx:latest\n
Note
From version 1.7.1 onwards, the Docker image can be pinned to a release series. This is often combined with automatic updaters such as Watchtower to allow safer unattended upgrading to new bugfix releases only. It is still recommended to always review release notes before upgrading. To pin your install to a release series, edit the docker-compose.yml
file and find the line that says
image: ghcr.io/paperless-ngx/paperless-ngx:latest\n
and replace the version with the series you want to track, for example:
image: ghcr.io/paperless-ngx/paperless-ngx:1.7\n
"},{"location":"administration/#bare-metal-updating","title":"Bare Metal Route","text":"After grabbing the new release and unpacking the contents, do the following:
Update dependencies. A new paperless version may require additional dependencies. The required dependencies are listed in the section about bare metal installations.
Update python requirements. Keep in mind to activate your virtual environment before that, if you use one.
pip install -r requirements.txt\n
Note
At times, some dependencies will be removed from requirements.txt. Comparing the versions and removing no longer needed dependencies will keep your system or virtual environment clean and prevent possible conflicts.
Migrate the database.
cd src\npython3 manage.py migrate # (1)\n
(1) sudo -Hu <paperless_user>
may be required.
This step might not actually do anything. Not every new paperless version comes with new database migrations.
In general, paperless does not require a specific version of PostgreSQL or MariaDB and it is safe to update them to newer versions. However, you should always take a backup and follow the instructions from your database's documentation for how to upgrade between major versions.
For PostgreSQL, refer to Upgrading a PostgreSQL Cluster.
For MariaDB, refer to Upgrading MariaDB.
You may also use the exporter and importer with the --data-only
flag, after creating a new database with the updated version of PostgreSQL or MariaDB.
Warning
You should not change any settings, especially paths, when doing this, or there is a risk of data loss.
"},{"location":"administration/#management-commands","title":"Management utilities","text":"Paperless comes with some management commands that perform various maintenance tasks on your paperless instance. You can invoke these commands in the following way:
With Docker Compose, while paperless is running:
$ cd /path/to/paperless\n$ docker compose exec webserver <command> <arguments>\n
With docker, while paperless is running:
$ docker exec -it <container-name> <command> <arguments>\n
Bare metal:
$ cd /path/to/paperless/src\n$ python3 manage.py <command> <arguments> # (1)\n
(1) sudo -Hu <paperless_user>
may be required.
All commands have built-in help, which can be accessed by executing them with the argument --help
.
The document exporter exports all your data (including your settings and database contents) from paperless into a folder for backup or migration to another DMS.
If you use the document exporter within a cronjob to back up your data, you may need to add the -T
flag after exec to suppress \"The input device is not a TTY\" errors. For example: docker compose exec -T webserver document_exporter ../export
document_exporter target [-c] [-d] [-f] [-na] [-nt] [-p] [-sm] [-z]\n\noptional arguments:\n-c, --compare-checksums\n-cj, --compare-json\n-d, --delete\n-f, --use-filename-format\n-na, --no-archive\n-nt, --no-thumbnail\n-p, --use-folder-prefix\n-sm, --split-manifest\n-z, --zip\n-zn, --zip-name\n--data-only\n--no-progress-bar\n--passphrase\n
target
is a folder to which the data gets written. This includes documents, thumbnails and a manifest.json
file. The manifest contains all metadata from the database (correspondents, tags, etc).
When you use the provided docker compose script, specify ../export
as the target. This path inside the container is automatically mounted on your host on the folder export
.
If the target directory already exists and contains files, paperless will assume that the contents of the export directory are a previous export and will attempt to update the previous export. Paperless will only export changed and added files. Paperless determines whether a file has changed by inspecting the file attributes \"date/time modified\" and \"size\". If that does not work out for you, specify -c
or --compare-checksums
and paperless will attempt to compare file checksums instead. This is slower. The manifest and metadata json files are always updated, unless -cj
or --compare-json
is specified.
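The two change-detection strategies described above can be sketched in Python. This is a minimal illustration, not the exporter's actual code: the function names file_checksum and needs_copy are hypothetical, and SHA-256 is an assumed choice of checksum.

```python
import hashlib
import os


def file_checksum(path):
    # Hash the file in chunks so large documents are not read into memory at once.
    digest = hashlib.sha256()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(65536), b''):
            digest.update(chunk)
    return digest.hexdigest()


def needs_copy(source, target, compare_checksums=False):
    # A file that is missing from the previous export always has to be written.
    if not os.path.exists(target):
        return True
    if compare_checksums:
        # Slower: both files are read completely.
        return file_checksum(source) != file_checksum(target)
    src, dst = os.stat(source), os.stat(target)
    # Faster: only the modification time and the size are inspected.
    return int(src.st_mtime) != int(dst.st_mtime) or src.st_size != dst.st_size
```

The attribute comparison can miss a file whose content changed while its size and modification time stayed the same, which is the case the checksum comparison is meant to catch.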
Paperless will not remove any existing files in the export directory. If you want paperless to also remove files that do not belong to the current export such as files from deleted documents, specify -d
or --delete
. Be careful when pointing paperless to a directory that already contains other files.
The filenames generated by this command follow the format [date created] [correspondent] [title].[extension]
. If you want paperless to use PAPERLESS_FILENAME_FORMAT
for exported filenames instead, specify -f
or --use-filename-format
.
If -na
or --no-archive
is provided, no archive files will be exported, only the original files.
If -nt
or --no-thumbnail
is provided, thumbnail files will not be exported.
Note
When using the -na
/--no-archive
or -nt
/--no-thumbnail
options the exporter will not output these files for backup. After importing, the sanity checker will warn about missing thumbnails and archive files until they are regenerated with document_thumbnails
or document_archiver
. It can make sense to omit these files from backup as their content and checksum can change (new archiver algorithm) and may then cause additional used space in a deduplicated backup.
If -p
or --use-folder-prefix
is provided, files will be exported in dedicated folders according to their nature: archive
, originals
, thumbnails
or json.
If -sm
or --split-manifest
is provided, information about each document will be placed in individual JSON files instead of a single JSON file. The main manifest.json will still contain application-wide information (e.g. tags, correspondents, document types, etc.).
If -z
or --zip
is provided, the export will be a zip file in the target directory, named according to the current local date or the value set in -zn
or --zip-name
.
If --data-only
is provided, only the database will be exported. This option is intended to facilitate database upgrades without needing to clean documents and thumbnails from the media directory.
If --no-progress-bar
is provided, the progress bar will be hidden, rendering the exporter quiet. This option is useful for scripting scenarios, such as when using the exporter with crontab
.
If --passphrase
is provided, it will be used to encrypt certain fields in the export. This value must be provided to import. If this value is lost, the export cannot be imported.
Warning
If exporting with the file name format, there may be errors due to your operating system's maximum path lengths. Try adjusting the export target or consider not using the filename format.
"},{"location":"administration/#importer","title":"Document importer","text":"The document importer takes the export produced by the Document exporter and imports it into paperless.
The importer works just like the exporter. You point it at a directory, and the script does the rest of the work:
document_importer source\n
Option | Required | Default | Description
source | Yes | N/A | The directory containing an export
--no-progress-bar | No | False | If provided, the progress bar will be hidden
--data-only | No | False | If provided, only import data, do not import document files or thumbnails
--passphrase | No | N/A | If your export was encrypted with a passphrase, must be provided
When you use the provided docker compose script, put the export inside the export
folder in your paperless source directory. Specify ../export
as the source
.
Note that .zip files (as can be generated from the exporter) are not supported. You must unzip them into the target directory first.
Note
Importing from a previous version of Paperless may work, but for best results it is suggested to match the versions.
Warning
The importer should be run against a completely empty installation (database and directories) of Paperless-ngx. If using a data only import, only the database must be empty.
"},{"location":"administration/#retagger","title":"Document retagger","text":"Say you've imported a few hundred documents and now want to introduce a tag or set up a new correspondent, and apply its matching to all of the currently-imported docs. This problem is common enough that there are tools for it.
document_retagger [-h] [-c] [-T] [-t] [-s] [-i] [--id-range] [--use-first] [-f]\n\noptional arguments:\n-c, --correspondent\n-T, --tags\n-t, --document_type\n-s, --storage_path\n-i, --inbox-only\n--id-range\n--use-first\n-f, --overwrite\n
Run this after changing or adding matching rules. It'll loop over all of the documents in your database and attempt to match documents according to the new rules.
Specify any combination of -c
, -T
, -t
and -s
to have the retagger perform matching of the specified metadata type. If you don't specify any of these options, the document retagger won't do anything.
Specify -i
to have the document retagger work on documents tagged with inbox tags only. This is useful when you don't want to mess with your already processed documents.
Specify --id-range 1 100
to have the document retagger work only on a specific range of document IDs. This can be useful if you have a lot of documents and want to test the matching rules only on a subset of documents.
When multiple document types or correspondents match a single document, the retagger won't assign these to the document. Specify --use-first
to override this behavior and just use the first correspondent or type it finds. This option does not apply to tags, since any number of tags can be applied to a document.
Finally, -f
specifies that you wish to overwrite already assigned correspondents, types and/or tags. The default behavior is to not assign correspondents and types to documents that have this data already assigned. -f
works differently for tags: By default, only additional tags get added to documents, no tags will be removed. With -f
, tags that don't match a document anymore get removed as well.
The Auto matching algorithm requires a trained neural network to work. This network needs to be updated whenever something in your data changes. The docker image takes care of that automatically with the task scheduler. You can manually renew the classifier by invoking the following management command:
document_create_classifier\n
This command takes no arguments.
"},{"location":"administration/#thumbnails","title":"Document thumbnails","text":"Use this command to re-create document thumbnails. Optionally include the --document {id}
option to generate thumbnails for a specific document only.
You may also specify --processes
to control the number of processes used to generate new thumbnails. The default is to utilize a quarter of the available processors.
document_thumbnails\n
"},{"location":"administration/#index","title":"Managing the document search index","text":"The document search index is responsible for delivering search results for the website. The document index is automatically updated whenever documents get added to, changed, or removed from paperless. However, if the search yields non-existing documents or won't find anything, you may need to recreate the index manually.
document_index {reindex,optimize}\n
Specify reindex
to have the index created from scratch. This may take some time.
Specify optimize
to optimize the index. This updates certain aspects of the index and usually makes queries faster and also ensures that the autocompletion works properly. This command is regularly invoked by the task scheduler.
If you use paperless' feature to assign custom filenames to your documents, you can use this command to move all your files after changing the naming scheme.
Warning
Since this command moves your documents, it is advised to make a backup beforehand. The renaming logic is robust and will never overwrite or delete a file, but you can never be too careful.
document_renamer\n
The command takes no arguments and processes all your documents at once.
Learn how to use Management Utilities.
"},{"location":"administration/#sanity-checker","title":"Sanity checker","text":"Paperless has a built-in sanity checker that inspects your document collection for issues.
The issues detected by the sanity checker are as follows:
document_sanity_checker\n
The command takes no arguments. Depending on the size of your document archive, this may take some time.
"},{"location":"administration/#fetching-e-mail","title":"Fetching e-mail","text":"Paperless automatically fetches your e-mail every 10 minutes by default. If you want to invoke the email consumer manually, call the following management command:
mail_fetcher\n
The command takes no arguments and processes all your mail accounts and rules.
Tip
To use OAuth access tokens for mail fetching, select the box to indicate the password is actually a token when creating or editing a mail account. The details for creating a token depend on your email provider.
"},{"location":"administration/#archiver","title":"Creating archived documents","text":"Paperless stores archived PDF/A documents alongside your original documents. These archived documents will also contain selectable text for image-only originals. These documents are derived from the originals, which are always stored unmodified. If coming from an earlier version of paperless, your documents won't have archived versions.
This command creates PDF/A documents for your documents.
document_archiver --overwrite --document <id>\n
This command will only attempt to create archived documents when no archived document exists yet, unless --overwrite
is specified. If --document <id>
is specified, the archiver will only process that document.
Note
This command essentially performs OCR on all your documents again, according to your settings. If you run this with PAPERLESS_OCR_MODE=redo
, it will potentially run for a very long time. You can cancel the command at any time, since this command will skip already archived versions the next time it is run.
Note
Some documents will cause errors and cannot be converted into PDF/A documents, such as encrypted PDF documents. The archiver will skip over these documents each time it sees them.
"},{"location":"administration/#encryption","title":"Managing encryption","text":"Warning
Encryption was removed in paperless-ng 0.9 because it did not really provide any additional security: the passphrase was stored in a configuration file on the same system as the documents. Furthermore, the entire text content of the documents is stored in plain text in the database, even if your documents are encrypted. Filenames are not encrypted either. Finally, the web server provides transparent access to your encrypted documents.
Consider running paperless on an encrypted filesystem instead, which will then at least provide security against physical hardware theft.
"},{"location":"administration/#enabling-encryption","title":"Enabling encryption","text":"Enabling encryption is no longer supported.
"},{"location":"administration/#disabling-encryption","title":"Disabling encryption","text":"Basic usage to disable encryption of your document store:
(Note: If PAPERLESS_PASSPHRASE
isn't set already, you need to specify it here)
decrypt_documents [--passphrase SECR3TP4SSPHRA$E]\n
"},{"location":"administration/#fuzzy_duplicate","title":"Detecting duplicates","text":"Paperless already catches and prevents the upload of exactly matching documents. However, a new scan of an existing document may not produce an exact bit-for-bit duplicate, even though its content should be identical or very close, which makes detection possible.
This tool does a fuzzy match over document content, looking for those which look close according to a given ratio.
At this time, other metadata (such as correspondent or type) is not taken into account by the detection.
document_fuzzy_match [--ratio] [--processes N] [--delete]\n
Option | Required | Default | Description
--ratio | No | 85.0 | A number between 0 and 100, setting how similar a document must be for it to be reported. Higher numbers mean more similarity.
--processes | No | 1/4 of system cores | Number of processes to use for matching. Setting 1 disables multiple processes.
--delete | No | False | If provided, one document of a matched pair above the ratio will be deleted.
Warning
If providing the --delete
option, it is highly recommended to have a backup. While every effort has been taken to ensure proper operation, there is always the chance of deletion of a file you want to keep.
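The ratio-based comparison used for duplicate detection can be sketched as follows. This is only an illustration of the idea, not the tool's implementation: it uses difflib from the Python standard library as a stand-in for whatever matcher Paperless-ngx uses internally, the names similarity and find_fuzzy_duplicates are hypothetical, and treating a score equal to the ratio as a match is an assumption.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(text_a, text_b):
    # Scale difflib's 0..1 ratio to the 0..100 range used by --ratio.
    return SequenceMatcher(None, text_a, text_b).ratio() * 100


def find_fuzzy_duplicates(contents, ratio=85.0):
    # contents maps a document id to the text extracted from that document.
    matches = []
    for (id_a, text_a), (id_b, text_b) in combinations(sorted(contents.items()), 2):
        score = similarity(text_a, text_b)
        if score >= ratio:
            matches.append((id_a, id_b, round(score, 1)))
    return matches
```

Every pair of documents is compared, so the work grows quadratically with the number of documents. That is why spreading the comparisons over several processes, as --processes does, can help.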
If the audit log is enabled, Paperless-ngx keeps an audit log of all changes made to documents. Functionality to automatically remove entries for deleted documents was added later, but entries created before that are not removed. This command allows you to prune the audit log of entries that are no longer needed.
prune_audit_logs\n
"},{"location":"administration/#create-superuser","title":"Create superuser","text":"If you need to create a superuser, use the following command:
createsuperuser\n
"},{"location":"advanced_usage/","title":"Advanced Topics","text":"Paperless offers a couple of features that automate certain tasks and make your life easier.
"},{"location":"advanced_usage/#matching","title":"Matching tags, correspondents, document types, and storage paths","text":"Paperless will compare the matching algorithms defined by every tag, correspondent, document type, and storage path in your database to see if they apply to the text in a document. In other words, if you define a tag called Home Utility
that has a match
property of bc hydro
and a matching_algorithm
of Exact
, Paperless will automatically tag your newly-consumed document with your Home Utility
tag so long as the text bc hydro
appears in the body of the document somewhere.
The matching logic is quite powerful. It supports searching the text of your document with different algorithms, and as such, some experimentation may be necessary to get things right.
In order to have a tag, correspondent, document type, or storage path assigned automatically to newly consumed documents, assign a match and matching algorithm using the web interface. These settings define when to assign tags, correspondents, document types, and storage paths to documents.
The following algorithms are available:
Bank1 Bank2
, it will match documents containing either of these terms.
When using the any or all matching algorithms, you can search for terms that consist of multiple words by enclosing them in double quotes. For example, defining a match text of \"Bank of America\" BofA
using the any algorithm, will match documents that contain either \"Bank of America\" or \"BofA\", but will not match documents containing \"Bank of South America\".
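The quoting behaviour described above can be illustrated with a short Python sketch. It is not the matcher Paperless uses: the helper names words and matches_any are hypothetical, and the case-insensitive, whole-word comparison is an assumption made for the example.

```python
import re
import shlex


def words(text):
    # Lower-case the text and keep only runs of letters and digits.
    return re.findall('[a-z0-9]+', text.lower())


def matches_any(match_text, document_text):
    # shlex.split keeps a double-quoted phrase together as a single term.
    terms = shlex.split(match_text)
    haystack = ' ' + ' '.join(words(document_text)) + ' '
    for term in terms:
        term_words = words(term)
        if not term_words:
            continue
        # Padding with spaces makes sure only whole words are matched.
        if ' ' + ' '.join(term_words) + ' ' in haystack:
            return True
    return False


rule = '\"Bank of America\" BofA'
print(matches_any(rule, 'Statement from Bank of America'))
print(matches_any(rule, 'Greetings from Bank of South America'))
```

The first call prints True because the quoted phrase appears as a whole. The second prints False because neither the complete phrase nor BofA occurs in the text.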
Then just save your tag, correspondent, document type, or storage path and run another document through the consumer. Once complete, you should see the newly-created document, automatically tagged with the appropriate data.
"},{"location":"advanced_usage/#automatic-matching","title":"Automatic matching","text":"Paperless-ngx comes with a new matching algorithm called Auto. This matching algorithm tries to assign tags, correspondents, document types, and storage paths to your documents based on how you have already assigned these on existing documents. It uses a neural network under the hood.
If, for example, all your bank statements of your account 123 at the Bank of America are tagged with the tag \"bofa123\" and the matching algorithm of this tag is set to Auto, this neural network will examine your documents and automatically learn when to assign this tag.
Paperless tries to hide much of the involved complexity with this approach. However, there are a couple caveats you need to keep in mind when using this feature:
Sometimes you may want to do something arbitrary whenever a document is consumed. Rather than try to predict what you may want to do, Paperless lets you execute scripts of your own choosing just before or after a document is consumed using a couple of simple hooks.
Just write a script, put it somewhere that Paperless can read & execute, and then put the path to that script in paperless.conf
or docker-compose.env
with the variable name of either PAPERLESS_PRE_CONSUME_SCRIPT
or PAPERLESS_POST_CONSUME_SCRIPT
.
Info
These scripts are executed in a blocking process, which means that if a script takes a long time to run, it can significantly slow down your document consumption flow. If you want things to run asynchronously, you'll have to fork the process in your script and exit.
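One way to hand slow work off to the background from a hook written in Python is sketched below. The function name run_detached is hypothetical, and the sketch assumes a POSIX system, where start_new_session puts the child into its own session. The child's output is discarded here, so it will not show up in the log.

```python
import subprocess
import sys


def run_detached(command):
    # Start the long-running work in its own session so it keeps going
    # after the hook script itself has exited. Redirecting the standard
    # streams keeps the child from holding on to the hook's pipes.
    process = subprocess.Popen(
        command,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,
    )
    return process.pid
```

A hook would call run_detached with the command for the slow job, for example a list such as [sys.executable, '/path/to/slow_job.py'] (a placeholder path), and then exit immediately so that consumption can continue.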
"},{"location":"advanced_usage/#pre-consume-script","title":"Pre-consumption script","text":"Executed after the consumer sees a new document in the consumption folder, but before any processing of the document is performed. The script can access the following environment variables:
Environment Variable | Description
DOCUMENT_SOURCE_PATH | Original path of the consumed document
DOCUMENT_WORKING_PATH | Path to a copy of the original that consumption will work on
TASK_ID | UUID of the task used to process the new document (if any)
Note
Pre-consume scripts which modify the document should only change the DOCUMENT_WORKING_PATH
file. Otherwise a second consume task may be triggered, leading to failures as two tasks work on the same document path.
Warning
If your script modifies DOCUMENT_WORKING_PATH
in a non-deterministic way, this may allow duplicate documents to be stored.
A simple but common example is a script like this:
/usr/local/bin/ocr-pdf
#!/usr/bin/env bash\npdf2pdfocr.py -i \"${DOCUMENT_WORKING_PATH}\"\n
/etc/paperless.conf
...\nPAPERLESS_PRE_CONSUME_SCRIPT=\"/usr/local/bin/ocr-pdf\"\n...\n
This will pass the path to the document about to be consumed to /usr/local/bin/ocr-pdf
, which in turn calls pdf2pdfocr.py on your document, overwrites the file with an OCR'd version, and exits. The consumption process then begins with the newly modified file.
The script's stdout and stderr will be logged line by line to the webserver log, along with the exit code of the script.
"},{"location":"advanced_usage/#post-consume-script","title":"Post-consumption script","text":"Executed after the consumer has successfully processed a document and has moved it into paperless. It receives the following environment variables:
Environment Variable | Description
DOCUMENT_ID | Database primary key of the document
DOCUMENT_FILE_NAME | Formatted filename, not including paths
DOCUMENT_CREATED | Date & time when the document was created
DOCUMENT_MODIFIED | Date & time when the document was last modified
DOCUMENT_ADDED | Date & time when the document was added
DOCUMENT_SOURCE_PATH | Path to the original document file
DOCUMENT_ARCHIVE_PATH | Path to the generated archive file (if any)
DOCUMENT_THUMBNAIL_PATH | Path to the generated thumbnail
DOCUMENT_DOWNLOAD_URL | URL for document download
DOCUMENT_THUMBNAIL_URL | URL for the document thumbnail
DOCUMENT_OWNER | Username of the document owner (if any)
DOCUMENT_CORRESPONDENT | Assigned correspondent (if any)
DOCUMENT_TAGS | Comma separated list of tags applied (if any)
DOCUMENT_ORIGINAL_FILENAME | Filename of original document
TASK_ID | Task UUID used to import the document (if any)
The script can be in any language. A simple shell script example:
post-consumption-example
#!/usr/bin/env bash\n\necho \"\n\nA document with an id of ${DOCUMENT_ID} was just consumed. I know the\nfollowing additional information about it:\n\n* Generated File Name: ${DOCUMENT_FILE_NAME}\n* Archive Path: ${DOCUMENT_ARCHIVE_PATH}\n* Source Path: ${DOCUMENT_SOURCE_PATH}\n* Created: ${DOCUMENT_CREATED}\n* Added: ${DOCUMENT_ADDED}\n* Modified: ${DOCUMENT_MODIFIED}\n* Thumbnail Path: ${DOCUMENT_THUMBNAIL_PATH}\n* Download URL: ${DOCUMENT_DOWNLOAD_URL}\n* Thumbnail URL: ${DOCUMENT_THUMBNAIL_URL}\n* Owner Name: ${DOCUMENT_OWNER}\n* Correspondent: ${DOCUMENT_CORRESPONDENT}\n* Tags: ${DOCUMENT_TAGS}\n\nIt was consumed with the passphrase ${PASSPHRASE}\n\n\"\n
Note
The post consumption script cannot cancel the consumption process.
Warning
The post consumption script should not modify the document files directly.
The script's stdout and stderr will be logged line by line to the webserver log, along with the exit code of the script.
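Since the script can be written in any language, the same idea can be sketched in Python. This is a hypothetical example: the function name describe_document and the shape of the returned dictionary are assumptions, and only environment variables from the table above are read.

```python
import os


def describe_document(env):
    # DOCUMENT_TAGS is a comma separated list and may be missing or empty.
    raw_tags = env.get('DOCUMENT_TAGS', '')
    tags = [tag.strip() for tag in raw_tags.split(',') if tag.strip()]
    return {
        'id': env.get('DOCUMENT_ID'),
        'file_name': env.get('DOCUMENT_FILE_NAME'),
        'correspondent': env.get('DOCUMENT_CORRESPONDENT') or None,
        'tags': tags,
    }


if __name__ == '__main__':
    info = describe_document(os.environ)
    # Anything printed here ends up in the log along with the exit code.
    print('Consumed document', info['id'], 'with tags', info['tags'])
```

Splitting on commas is a simplification: a tag whose name itself contains a comma would be split into two entries.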
"},{"location":"advanced_usage/#docker-consume-hooks","title":"Docker","text":"To hook into the consumption process when using Docker, you will need to pass the scripts into the container via a host mount in your docker-compose.yml
.
Assume you have /home/paperless-ngx/scripts/post-consumption-example.sh
as a script which you'd like to run.
You can pass that script into the consumer container via a host mount:
...\nwebserver:\n ...\n volumes:\n ...\n - /home/paperless-ngx/scripts:/path/in/container/scripts/ # (1)!\n environment: # (3)!\n ...\n PAPERLESS_POST_CONSUME_SCRIPT: /path/in/container/scripts/post-consumption-example.sh # (2)!\n...\n
docker-compose.env
Troubleshooting:
cd ~/paperless-ngx; docker compose logs -f
sudo chmod 755 post-consumption-example.sh
echo \"${DOCUMENT_ID}\" | tee --append /usr/src/paperless/scripts/post-consumption-example.log
By default, paperless stores your documents in the media directory and renames them using the identifier which it has assigned to each document. You will end up getting files like 0000123.pdf
in your media directory. This isn't necessarily a bad thing, because you normally don't have to access these files manually. However, if you wish to name your files differently, you can do that by adjusting the PAPERLESS_FILENAME_FORMAT
configuration option or using storage paths (see below). Paperless adds the correct file extension e.g. .pdf
, .jpg
automatically.
This variable allows you to configure the filename (folders are allowed) using placeholders. For example, configuring this to
PAPERLESS_FILENAME_FORMAT={{ created_year }}/{{ correspondent }}/{{ title }}\n
will create a directory structure as follows:
2019/\n My bank/\n Statement January.pdf\n Statement February.pdf\n2020/\n My bank/\n Statement January.pdf\n Letter.pdf\n Letter_01.pdf\n Shoe store/\n My new shoes.pdf\n
Warning
Do not manually move your files in the media folder. Paperless remembers the last filename a document was stored as. If you do rename a file, paperless will report your files as missing and won't be able to find them.
Tip
Paperless checks the filename of a document whenever it is saved. Changing (or deleting) a storage path will automatically be reflected in the file system. However, when changing PAPERLESS_FILENAME_FORMAT
you will need to manually run the document renamer
to move any existing documents.
Paperless provides the following variables for use within filenames:
{{ asn }}
: The archive serial number of the document, or \"none\".
{{ correspondent }}
: The name of the correspondent, or \"none\".
{{ document_type }}
: The name of the document type, or \"none\".
{{ tag_list }}
: A comma separated list of all tags assigned to the document.
{{ title }}
: The title of the document.
{{ created }}
: The full date (ISO 8601 format, e.g. 2024-03-14) the document was created.
{{ created_year }}
: Year created only, formatted as the year with century.
{{ created_year_short }}
: Year created only, formatted as the year without century, zero padded.
{{ created_month }}
: Month created only (number 01-12).
{{ created_month_name }}
: Month created name, as per locale.
{{ created_month_name_short }}
: Month created abbreviated name, as per locale.
{{ created_day }}
: Day created only (number 01-31).
{{ added }}
: The full date (ISO format) the document was added to paperless.
{{ added_year }}
: Year added only.
{{ added_year_short }}
: Year added only, formatted as the year without century, zero padded.
{{ added_month }}
: Month added only (number 01-12).
{{ added_month_name }}
: Month added name, as per locale.
{{ added_month_name_short }}
: Month added abbreviated name, as per locale.
{{ added_day }}
: Day added only (number 01-31).
{{ owner_username }}
: Username of document owner, if any, or \"none\".
{{ original_name }}
: Document original filename, minus the extension, if any, or \"none\".
{{ doc_pk }}
: The paperless identifier (primary key) for the document.
Warning
When using file name placeholders, in particular when using {{ tag_list }}
, you may run into the limits of your operating system's maximum path lengths. In that case, files will retain the previous path instead and the issue will be logged.
Tip
These variables are all simple strings, but the format can be a full template. See Filename Templates for even more advanced formatting.
Paperless will try to conserve the information from your database as much as possible. However, some characters that you can use in document titles and correspondent names (such as : \\ /
and a couple more) are not allowed in filenames and will be replaced with dashes.
If paperless detects that two documents share the same filename, paperless will automatically append _01
, _02
, etc. to the filename. This happens if all the placeholders in a filename evaluate to the same value.
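The suffixing behaviour described above can be illustrated with a small sketch. This is not the Paperless-ngx implementation; assign_unique_names is a hypothetical helper that only mirrors the documented outcome (the first document keeps its name, later ones receive _01, _02 and so on).

```python
def assign_unique_names(stems, extension='.pdf'):
    # Hypothetical helper: count how often each rendered name has been seen
    # and append a zero padded counter to every repeat.
    seen = {}
    result = []
    for stem in stems:
        count = seen.get(stem, 0)
        suffix = '' if count == 0 else f'_{count:02d}'
        result.append(f'{stem}{suffix}{extension}')
        seen[stem] = count + 1
    return result


# Two documents whose placeholders both render to Letter:
print(assign_unique_names(['Letter', 'Letter']))
```

The sketch ignores the corner case where a title itself already ends in a suffix such as _01.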
If there are any errors in the placeholders included in PAPERLESS_FILENAME_FORMAT
, paperless will fall back to using the default naming scheme instead.
Caution
As of now, you could potentially tell paperless to store your files anywhere outside the media directory by setting
PAPERLESS_FILENAME_FORMAT=../../my/custom/location/{{ title }}\n
However, keep in mind that inside docker, if files get stored outside of the predefined volumes, they will be lost after a restart.
"},{"location":"advanced_usage/#empty-placeholders","title":"Empty placeholders","text":"You can affect how empty placeholders are treated by changing the PAPERLESS_FILENAME_FORMAT_REMOVE_NONE
setting.
Enabling this results in all empty placeholders resolving to \"\" instead of \"none\" as stated above. Spaces before empty placeholders are removed as well, and empty directories are omitted.
"},{"location":"advanced_usage/#storage-paths","title":"Storage paths","text":"When a single storage layout is not sufficient for your use case, storage paths allow for more complex structure to set precisely where each document is stored in the file system.
PAPERLESS_FILENAME_FORMAT
and follows the rules described above.
For example, you could define the following two storage paths:
year/correspondent
By Year = {{ created_year }}/{{ correspondent }}/{{ title }}\nInsurances = Insurances/{{ correspondent }}/{{ created_year }}-{{ created_month }}-{{ created_day }} {{ title }}\n
If you then map these storage paths to the documents, you might get the following result. For simplicity, By Year
defines the same structure as in the previous example above.
2019/ # By Year\n My bank/\n Statement January.pdf\n Statement February.pdf\n\nInsurances/ # Insurances\n Healthcare 123/\n 2022-01-01 Statement January.pdf\n 2022-02-02 Letter.pdf\n 2022-02-03 Letter.pdf\n Dental 456/\n 2021-12-01 New Conditions.pdf\n
Tip
Defining a storage path is optional. If no storage path is defined for a document, the global PAPERLESS_FILENAME_FORMAT
is applied.
The filename formatting uses Jinja templates to build the filename. This allows for complex logic to be included in the format, including logical structures and filters to manipulate the variables provided. The template is provided as a string, potentially multiline, and rendered into a single line.
In addition, the entire Document instance is available to be utilized in a more advanced way, as well as some variables which only make sense to be accessed with more complex logic.
"},{"location":"advanced_usage/#additional-variables","title":"Additional Variables","text":"{{ tag_name_list }}
: A list of tag names applied to the document, ordered by the tag name. Note this is a list, not a single string.
{{ custom_fields }}
: A mapping of custom field names to their type and value. A user can access the mapping by field name or check if a field is applied by checking its existence in the variable.
Tip
To access a custom field which has a space in the name, use the get_cf_value
filter. See the examples below. This helps get fields by name and handle a default value if the named field is not attached to a Document.
This example will construct a path based on the archive serial number range:
somepath/\n{% if document.archive_serial_number >= 0 and document.archive_serial_number <= 200 %}\n asn-000-200\n{% elif document.archive_serial_number >= 201 and document.archive_serial_number <= 400 %}\n asn-201-400\n {% if document.archive_serial_number >= 201 and document.archive_serial_number < 300 %}\n /asn-2xx\n {% elif document.archive_serial_number >= 300 and document.archive_serial_number <= 400 %}\n /asn-3xx\n {% endif %}\n{% endif %}\n/{{ title }}\n
For a document with an ASN of 205, it would result in somepath/asn-201-400/asn-2xx/Title.pdf
, but a document with an ASN of 355 would be placed in somepath/asn-201-400/asn-3xx/Title.pdf
.
{% if document.mime_type == \"application/pdf\" %}\n pdfs\n{% elif document.mime_type == \"image/png\" %}\n pngs\n{% else %}\n others\n{% endif %}\n/{{ title }}\n
For a PDF document, it would result in pdfs/Title.pdf
, but for a PNG document, the path would be pngs/Title.png
.
To use custom fields:
{% if \"Invoice\" in custom_fields %}\n invoices/{{ custom_fields.Invoice.value }}\n{% else %}\n not-invoices/{{ title }}\n{% endif %}\n
If the document has a custom field named \"Invoice\" with a value of 123, it would be filed as invoices/123.pdf
, but a document without the custom field would be filed as not-invoices/Title.pdf.
If the custom field is named \"Invoice Number\", you would access the value of it via the get_cf_value
filter due to quirks of the Django Template Language:
\"invoices/{{ custom_fields|get_cf_value('Invoice Number') }}\"\n
You can also use a custom datetime
filter to format dates:
invoices/\n{{ custom_fields|get_cf_value(\"Date Field\",\"2024-01-01\")|datetime('%Y') }}/\n{{ custom_fields|get_cf_value(\"Date Field\",\"2024-01-01\")|datetime('%m') }}/\n{{ custom_fields|get_cf_value(\"Date Field\",\"2024-01-01\")|datetime('%d') }}/\nInvoice_{{ custom_fields|get_cf_value(\"Select Field\") }}_{{ custom_fields|get_cf_value(\"Date Field\",\"2024-01-01\")|replace(\"-\", \"\") }}.pdf\n
This will create a path like invoices/2022/01/01/Invoice_OptionTwo_20220101.pdf
if the custom field \"Date Field\" is set to January 1, 2022 and \"Select Field\" is set to OptionTwo
.
You can also use a custom slugify
filter to slugify text:
{{ title | slugify }}\n
"},{"location":"advanced_usage/#pdf-recovery","title":"Automatic recovery of invalid PDFs","text":"Paperless will attempt to \"clean\" certain invalid PDFs with qpdf
before processing if, for example, the mime_type detection is incorrect. This can happen if the PDF is not properly formatted or contains errors.
The monitoring tool Flower can be used to view more detailed information about the health of the celery workers used for asynchronous tasks. This includes details on currently running, queued and completed tasks, timing and more. Flower can also be used with Prometheus, as it exports metrics. For details on its capabilities, refer to the Flower documentation.
Flower can be enabled with the setting PAPERLESS_ENABLE_FLOWER. To configure Flower further, create a flowerconfig.py
and place it into the src/paperless
directory. For a Docker installation, you can use volumes to accomplish this:
services:\n # ...\n webserver:\n environment:\n - PAPERLESS_ENABLE_FLOWER\n ports:\n - 5555:5555 # (2)!\n # ...\n volumes:\n - /path/to/my/flowerconfig.py:/usr/src/paperless/src/paperless/flowerconfig.py:ro # (1)!\n
:ro
tag means the file will be mounted as read only.
The Docker image includes the ability to run custom user scripts during startup. This could be utilized for installing additional tools or Python packages, for example. Scripts are expected to be shell scripts.
To utilize this, mount a folder containing your scripts to the custom initialization directory, /custom-cont-init.d
and place scripts you wish to run inside. For security, the folder must be owned by root
and should have permissions of a=rx
. Additionally, scripts must only be writable by root
.
Your scripts will be run directly before the webserver completes startup. Scripts will be run by the root
user. If you would like to switch users, the utility gosu
is available and preferred over sudo
.
This is an advanced functionality with which you could break functionality or lose data. If you experience issues, please disable any custom scripts and try again before reporting an issue.
For example, using Docker Compose:
services:\n # ...\n webserver:\n # ...\n volumes:\n - /path/to/my/scripts:/custom-cont-init.d:ro # (1)!\n
:ro
tag means the folder will be mounted as read only. This is for extra security against changes.
The database interface does not provide a method to configure a MySQL database to be case-sensitive. A case-insensitive database prevents a user from creating a tag Name
and NAME
as they are considered the same.
However, there is a downside to turning on case sensitivity, as it also makes searches case-sensitive, so for example a document with the title Invoice
won't be found when searching for invoice
.
Per Django documentation, making a database case-sensitive requires manual intervention. To enable case sensitive tables, you can execute the following command against each table:
ALTER TABLE <table_name> CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
You can also set the default for new tables (this does NOT affect existing tables) with:
ALTER DATABASE <db_name> CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
Warning
Using MariaDB version 10.4+ is recommended. Using the utf8mb3
character set on an older system may fix issues that can arise while setting up Paperless-ngx but utf8mb3
can cause issues with consumption (where utf8mb4
does not).
For more information on this topic, you can refer to this Django issue.
"},{"location":"advanced_usage/#missing-timezones","title":"Missing timezones","text":"MySQL as well as MariaDB do not have any timezone information by default (though some docker images such as the official MariaDB image take care of this for you) which will cause unexpected behavior with date-based queries.
To fix this, execute one of the following commands:
MySQL: mysql_tzinfo_to_sql /usr/share/zoneinfo | mysql -u root mysql -p
MariaDB: mariadb-tzinfo-to-sql /usr/share/zoneinfo | mariadb -u root mysql -p
Paperless is able to utilize barcodes for automatically performing some tasks.
At this time, the library utilized for detection of barcodes supports the following types:
You may check for updates on the zbar library homepage. For usage in Paperless, the type of barcode does not matter, only the contents of it.
For how to enable barcode usage, see the configuration. The two settings may be enabled independently, but do have interactions as explained below.
"},{"location":"advanced_usage/#document-splitting","title":"Document Splitting","text":"When enabled, Paperless will look for a barcode with the configured value and create a new document starting from the next page. The page with the barcode on it will not be retained. It is expected to be a page existing only for triggering the split.
"},{"location":"advanced_usage/#archive-serial-number-assignment","title":"Archive Serial Number Assignment","text":"When enabled, the value of the barcode (as an integer) will be used to set the document's archive serial number, allowing quick reference back to the original, paper document.
If document splitting via barcode is also enabled, documents will be split when an ASN barcode is located. However, differing from the splitting, the page with the barcode will be retained. This allows application of a barcode to any page, including one which holds data to keep in the document.
"},{"location":"advanced_usage/#tag-assignment","title":"Tag Assignment","text":"When enabled, Paperless will parse barcodes and attempt to interpret and assign tags.
See the relevant settings PAPERLESS_CONSUMER_ENABLE_TAG_BARCODE
and PAPERLESS_CONSUMER_TAG_BARCODE_MAPPING
for more information.
Note
If your scanner supports double-sided scanning natively, you do not need this feature.
This feature is turned off by default, see configuration on how to turn it on.
"},{"location":"advanced_usage/#summary","title":"Summary","text":"If you have a scanner with an automatic document feeder (ADF) that only scans a single side, this feature makes scanning double-sided documents much more convenient by automatically collating two separate scans into one document, reordering the pages as necessary.
"},{"location":"advanced_usage/#usage-example","title":"Usage example","text":"Suppose you have a double-sided document with 6 pages (3 sheets of paper). First, put the stack into your ADF as normal, ensuring that page 1 is scanned first. Your ADF will now scan pages 1, 3, and 5. Then you (or your scanner, if it supports it) upload the scan into the correct sub-directory of the consume folder (double-sided
by default; keep in mind that Paperless will not automatically create the directory for you.) Paperless will then process the scan and move it into an internal staging area.
The next step is to turn your stack upside down (without reordering the sheets of paper) and scan it once again. Your ADF will now scan pages 6, 4, and 2, in that order. Once this scan is copied into the sub-directory, Paperless will collate the previous scan with the new one, reversing the order of the pages on the second, \"even numbered\" scan. The resulting document will have the pages 1-6 in the correct order, and this new file will then be processed as normal.
Tip
When scanning the even numbered pages, you can omit the last empty pages, if there are any. For example, if page 6 is empty, you only need to scan pages 2 and 4. Do not omit empty pages in the middle of the document.
"},{"location":"advanced_usage/#things-that-could-go-wrong","title":"Things that could go wrong","text":"Paperless will notice when the first, \"odd numbered\" scan has less pages than the second scan (this can happen when e.g. the ADF skipped a few pages in the first pass). In that case, Paperless will remove the staging copy as well as the scan, and give you an error message asking you to restart the process from scratch, by scanning the odd pages again, followed by the even pages.
It's important that the scan files get consumed in the correct order, and one at a time. You therefore need to make sure that Paperless is running while you upload the files into the directory; and if you're using polling, make sure that PAPERLESS_CONSUMER_POLLING
is set to a value lower than the time it takes for the second scan to appear, like 5-10 or even lower.
Another thing that might happen is that you start a double sided scan, but then forget to upload the second file. To avoid collating the wrong documents if you then come back a day later to scan a new double-sided document, Paperless will only keep an \"odd numbered pages\" file for up to 30 minutes. If more time passes, it will consider the next incoming scan a completely new \"odd numbered pages\" one. The old staging file will get discarded.
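The collation logic described in this section can be sketched as follows. This is an illustration only, not the code Paperless-ngx runs: collate_double_sided is a hypothetical function, and pages are represented as plain list items instead of PDF pages.

```python
def collate_double_sided(odd_scan, even_scan):
    # odd_scan: pages in scan order of the first pass (1, 3, 5, ...).
    # even_scan: pages in scan order of the flipped stack (..., 4, 2).
    if len(odd_scan) < len(even_scan):
        # Mirrors the documented error case: the process has to be restarted.
        raise ValueError('odd numbered scan has fewer pages than the even numbered scan')
    even_pages = list(reversed(even_scan))
    collated = []
    for index, odd_page in enumerate(odd_scan):
        collated.append(odd_page)
        # Trailing empty even pages may have been omitted from the second scan.
        if index < len(even_pages):
            collated.append(even_pages[index])
    return collated


print(collate_double_sided([1, 3, 5], [6, 4, 2]))
```

Reversing the second scan before interleaving is what restores the natural page order, and allowing a shorter second scan covers the case where trailing empty pages were left out.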
"},{"location":"advanced_usage/#interaction-with-subdirs-as-tags","title":"Interaction with \"subdirs as tags\"","text":"The collation feature can be used together with the subdirs as tags feature (but this is not a requirement). Just create a correctly named double-sided subdir in the hierarchy and upload your scans there. For example, both double-sided/foo/bar
as well as foo/bar/double-sided
will cause the collated document to be treated as if it were uploaded into foo/bar
and receive both foo
and bar
tags, but not double-sided
.
You can use the document splitting feature, but if you use a normal single-sided split marker page, the split document(s) will have an empty page at the front (or whatever else was on the backside of the split marker page.) You can work around that by having a split marker page that has the split barcode on both sides. This way, the extra page will get automatically removed.
"},{"location":"advanced_usage/#sso-and-third-party-authentication-with-paperless-ngx","title":"SSO and third party authentication with Paperless-ngx","text":"Paperless-ngx has a built-in authentication system from Django but you can easily integrate an external authentication solution using one of the following methods:
"},{"location":"advanced_usage/#remote-user-authentication","title":"Remote User authentication","text":"This is a simple option that uses remote user authentication made available by certain SSO applications. See the relevant configuration options for more information: PAPERLESS_ENABLE_HTTP_REMOTE_USER, PAPERLESS_HTTP_REMOTE_USER_HEADER_NAME and PAPERLESS_LOGOUT_REDIRECT_URL
"},{"location":"advanced_usage/#openid-connect-and-social-authentication","title":"OpenID Connect and social authentication","text":"Version 2.5.0 of Paperless-ngx added support for integrating other authentication systems via the django-allauth package. Once set up, users can either log in or (optionally) sign up using any third party systems you integrate. See the relevant configuration settings and django-allauth docs for more information.
To associate an existing Paperless-ngx account with a social account, first log in with your regular credentials, then choose \"My Profile\" from the user dropdown in the app, where you will see options to connect social account(s). If enabled, signup options will be available on the login page.
As an example, to set up login via GitHub, the following environment variables would need to be set:
PAPERLESS_APPS=\"allauth.socialaccount.providers.github\"\nPAPERLESS_SOCIALACCOUNT_PROVIDERS='{\"github\": {\"APPS\": [{\"provider_id\": \"github\",\"name\": \"Github\",\"client_id\": \"<CLIENT_ID>\",\"secret\": \"<CLIENT_SECRET>\"}]}}'\n
Or, to use OpenID Connect (\"OIDC\"), via Keycloak in this example:
PAPERLESS_APPS=\"allauth.socialaccount.providers.openid_connect\"\nPAPERLESS_SOCIALACCOUNT_PROVIDERS='\n{\"openid_connect\": {\"APPS\": [{\"provider_id\": \"keycloak\",\"name\": \"Keycloak\",\"client_id\": \"paperless\",\"secret\": \"<CLIENT_SECRET>\",\"settings\": { \"server_url\": \"https://<KEYCLOAK_SERVER>/realms/<REALM>/.well-known/openid-configuration\"}}]}}'\n
More details about configuration options for various providers can be found in the allauth documentation.
"},{"location":"advanced_usage/#disabling-regular-login","title":"Disabling Regular Login","text":"Once external auth is set up, 'regular' login can be disabled with the PAPERLESS_DISABLE_REGULAR_LOGIN setting and / or users can be automatically redirected with the PAPERLESS_REDIRECT_LOGIN_TO_SSO setting.
"},{"location":"advanced_usage/#gpg-decryptor","title":"Decryption of encrypted emails before consumption","text":"Paperless-ngx can be configured to decrypt gpg encrypted emails before consumption.
"},{"location":"advanced_usage/#requirements","title":"Requirements","text":"You need a recent version of gpg-agent >= 2.1.1
installed on your host. Your host needs to be set up for decrypting your emails via gpg-agent
, see this tutorial for instance. Test your setup and make sure that you can encrypt and decrypt files using your key:
gpg --encrypt --armor -r person@email.com name_of_file\ngpg --decrypt name_of_file.asc\n
"},{"location":"advanced_usage/#setup","title":"Setup","text":"First, enable the PAPERLESS_ENABLE_GPG_DECRYPTOR environment variable.
Then determine your local gpg-agent
socket by invoking
gpgconf --list-dir agent-socket\n
on your host. A possible output is ~/.gnupg/S.gpg-agent
. Also find the location of your public keyring.
If using docker, you'll need to add the following volume mounts to your docker-compose.yml
file:
webserver:\n volumes:\n - /home/user/.gnupg/pubring.gpg:/usr/src/paperless/.gnupg/pubring.gpg\n - <path to gpg-agent socket>:/usr/src/paperless/.gnupg/S.gpg-agent\n
For a 'bare-metal' installation no further configuration is necessary. If you want to use a separate GNUPG_HOME
, you can do so by configuring the PAPERLESS_EMAIL_GNUPG_HOME environment variable.
gpg-agent
is running on your host machine
gpg
commands from above.
/usr/src/paperless/.gnupg
have correct permissions
paperless@9da1865df327:~/.gnupg$ ls -al\ndrwx------ 1 paperless paperless 4096 Aug 18 17:52 .\ndrwxr-xr-x 1 paperless paperless 4096 Aug 18 17:52 ..\nsrw------- 1 paperless paperless 0 Aug 18 17:22 S.gpg-agent\n-rw------- 1 paperless paperless 147940 Jul 24 10:23 pubring.gpg\n
"},{"location":"api/","title":"The REST API","text":"Paperless-ngx now ships with a fully-documented REST API and a browsable web interface to explore it. The API browsable interface is available at /api/schema/view/
.
Further documentation is provided here for some endpoints and features.
"},{"location":"api/#authorization","title":"Authorization","text":"The REST api provides four different forms of authentication.
Basic authentication
Authorize by providing an HTTP header in the form
Authorization: Basic <credentials>\n
where credentials
is a base64-encoded string of <username>:<password>
Session authentication
When you're logged into paperless in your browser, you're automatically logged into the API as well and don't need to provide any authorization headers.
Token authentication
You can create (or re-create) an API token by opening the \"My Profile\" link in the user dropdown found in the web UI and clicking the circular arrow button.
Paperless also offers an endpoint to acquire authentication tokens.
POST a username and password as a form or json string to /api/token/
and paperless will respond with a token, if the login data is correct. This token can be used to authenticate other requests with the following HTTP header:
Authorization: Token <token>\n
Tokens can also be managed in the Django admin.
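As a rough illustration of the two header formats above, the following sketch builds them with the Python standard library. The function names are hypothetical helpers, and the resulting dictionaries are meant to be passed to whichever HTTP client you use.

```python
import base64


def basic_auth_header(username, password):
    # Basic authentication: base64 of <username>:<password>.
    credentials = f'{username}:{password}'.encode('utf-8')
    return {'Authorization': 'Basic ' + base64.b64encode(credentials).decode('ascii')}


def token_auth_header(token):
    # Token authentication: the token obtained from the profile page or /api/token/.
    return {'Authorization': f'Token {token}'}


print(basic_auth_header('user', 'pass'))
print(token_auth_header('<token>'))
```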
Remote User authentication
If enabled (see configuration), you can authenticate against the API using Remote User auth.
Full text searching is available on the /api/documents/
endpoint. Two specific query parameters cause the API to return full text search results:
/api/documents/?query=your%20search%20query
: Search for a document using a full text query. For details on the syntax, see Basic Usage - Searching.
/api/documents/?more_like_id=1234
: Search for documents similar to the document with id 1234.
Pagination works exactly the same as it does for normal requests on this endpoint.
Furthermore, each returned document has an additional __search_hit__
attribute with various information about the search results:
{\n \"count\": 31,\n \"next\": \"http://localhost:8000/api/documents/?page=2&query=test\",\n \"previous\": null,\n \"results\": [\n\n ...\n\n {\n \"id\": 123,\n \"title\": \"title\",\n \"content\": \"content\",\n\n ...\n\n \"__search_hit__\": {\n \"score\": 0.343,\n \"highlights\": \"text <span class=\"match\">Test</span> text\",\n \"rank\": 23\n }\n },\n\n ...\n\n ]\n}\n
score
is an indication of how well this document matches the query relative to the other search results.
highlights
is an excerpt from the document content and highlights the search terms with <span>
tags as shown above.
rank
is the index of the search results. The first result will have rank 0.
You can filter documents by their custom field values by specifying the custom_field_query
query parameter. Here are some recipes for common use cases:
Documents with a custom field \"due\" (date) between Aug 1, 2024 and Sept 1, 2024 (inclusive):
?custom_field_query=[\"due\", \"range\", [\"2024-08-01\", \"2024-09-01\"]]
Documents with a custom field \"customer\" (text) that equals \"bob\" (case sensitive):
?custom_field_query=[\"customer\", \"exact\", \"bob\"]
Documents with a custom field \"answered\" (boolean) set to true
:
?custom_field_query=[\"answered\", \"exact\", true]
Documents with a custom field \"favorite animal\" (select) set to either \"cat\" or \"dog\":
?custom_field_query=[\"favorite animal\", \"in\", [\"cat\", \"dog\"]]
Documents with a custom field \"address\" (text) that is empty:
?custom_field_query=[\"OR\", [\"address\", \"isnull\", true], [\"address\", \"exact\", \"\"]]
Documents that don't have a field called \"foo\":
?custom_field_query=[\"foo\", \"exists\", false]
Documents that have document links \"references\" to both document 3 and 7:
?custom_field_query=[\"references\", \"contains\", [3, 7]]
All field types support basic operations including exact
, in
, isnull
, and exists
. String, URL, and monetary fields support case-insensitive substring matching operations including icontains
, istartswith
, and iendswith
. Integer, float, and date fields support arithmetic comparisons including gt
(>), gte
(>=), lt
(<), lte
(<=), and range
. Lastly, document link fields support a contains
operator that behaves like an \"is superset of\" check.
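Because the query expression is a JSON array, it has to be serialized and percent-encoded before it is placed in a URL. The sketch below shows one way to do that with the Python standard library. It assumes the parameter is used on the /api/documents/ endpoint and uses http://localhost:8000 purely as a placeholder host; custom_field_query_url is a hypothetical helper.

```python
import json
from urllib.parse import urlencode


def custom_field_query_url(base_url, expression):
    # Serialize the expression as JSON and percent-encode it for the query string.
    query = urlencode({'custom_field_query': json.dumps(expression)})
    return f'{base_url}/api/documents/?{query}'


# Documents whose custom field customer equals bob:
print(custom_field_query_url('http://localhost:8000', ['customer', 'exact', 'bob']))
```

Nested expressions such as the OR recipe above work the same way, since json.dumps handles nested lists and booleans.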
/api/search/autocomplete/
","text":"Get auto completions for a partial search term.
Query parameters:
term
: The incomplete term.
limit
: Amount of results. Defaults to 10.
Results returned by the endpoint are ordered by importance of the term in the document index. The first result is the term that has the highest TF-IDF score in the index.
[\"term1\", \"term3\", \"term6\", \"term4\"]\n
"},{"location":"api/#file-uploads","title":"POSTing documents","text":"The API provides a special endpoint for file uploads:
/api/documents/post_document/
POST a multipart form to this endpoint, where the form field document
contains the document that you want to upload to paperless. The filename is sanitized and then used to store the document in a temporary directory, and the consumer will be instructed to consume the document from there.
The endpoint supports the following optional form fields:
title
: Specify a title that the consumer should use for the document.
created
: Specify a DateTime when the document was created (e.g. \"2016-04-19\" or \"2016-04-19 06:15:00+02:00\").
correspondent
: Specify the ID of a correspondent that the consumer should use for the document.
document_type
: Similar to correspondent.
storage_path
: Similar to correspondent.
tags
: Similar to correspondent. Specify this multiple times to have multiple tags added to the document.
archive_serial_number
: An optional archive serial number to set.
custom_fields
: An array of custom field ids to assign (with an empty value) to the document.
The endpoint will immediately return HTTP 200 if the document consumption process was started successfully, with the UUID of the consumption task as the data. No additional status information about the consumption process itself is available immediately, since that happens in a different process. However, querying the tasks endpoint with the returned UUID e.g. /api/tasks/?task_id={uuid}
will provide information on the state of the consumption including the ID of a created document if consumption succeeded.
All objects (documents, tags, etc.) allow setting object-level permissions with the optional owner
and / or set_permissions
parameters, which are of the form:
\"owner\": ...,\n\"set_permissions\": {\n \"view\": {\n \"users\": [...],\n \"groups\": [...],\n },\n \"change\": {\n \"users\": [...],\n \"groups\": [...],\n },\n}\n
Note
Arrays should contain user or group ID numbers.
If these parameters are supplied the object's permissions will be overwritten, assuming the authenticated user has permission to do so (the user must be the object owner or a superuser).
"},{"location":"api/#retrieving-full-permissions","title":"Retrieving full permissions","text":"By default, the API will return a truncated version of object-level permissions, returning user_can_change
indicating whether the current user can edit the object (either because they are the object owner or have permissions granted). You can pass the parameter full_perms=true
to API calls to view the full permissions of objects in a format that mirrors the set_permissions
parameter above.
The API supports various bulk-editing operations which are executed asynchronously.
"},{"location":"api/#documents","title":"Documents","text":"For bulk operations on documents, use the endpoint /api/documents/bulk_edit/
which accepts a json payload of the format:
{\n \"documents\": [LIST_OF_DOCUMENT_IDS],\n \"method\": METHOD, // see below\n \"parameters\": args // see below\n}\n
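A request body in this format could be assembled as in the sketch below. The helper name bulk_edit_payload is hypothetical, and add_tag with a tag parameter is used only as an example of one of the supported methods; sending the request (including authentication) is left to your HTTP client.

```python
import json


def bulk_edit_payload(document_ids, method, parameters=None):
    # Build the JSON body expected by /api/documents/bulk_edit/.
    return json.dumps({
        'documents': list(document_ids),
        'method': method,
        'parameters': parameters or {},
    })


# Add the tag with id 5 to documents 1, 2 and 3:
print(bulk_edit_payload([1, 2, 3], 'add_tag', {'tag': 5}))
```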
The following methods are supported:
set_correspondent
parameters
: { \"correspondent\": CORRESPONDENT_ID }
set_document_type
parameters
: { \"document_type\": DOCUMENT_TYPE_ID }
set_storage_path
parameters
: { \"storage_path\": STORAGE_PATH_ID }
add_tag
parameters
: { \"tag\": TAG_ID }
remove_tag
parameters
: { \"tag\": TAG_ID }
modify_tags
parameters
: { \"add_tags\": [LIST_OF_TAG_IDS] }
and { \"remove_tags\": [LIST_OF_TAG_IDS] }
delete
No parameters required.
reprocess
No parameters required.
set_permissions
parameters
: \"set_permissions\": PERMISSIONS_OBJ (see format above) and / or \"owner\": OWNER_ID or null
: \"merge\": true or false (defaults to false)
The merge flag determines if the supplied permissions will overwrite all existing permissions (including removing them) or be merged with existing permissions.
merge
No additional parameters required.
Optional parameters
: \"metadata_document_id\": DOC_ID applies metadata (tags, correspondent, etc.) from this document to the merged document.
: \"delete_originals\": true deletes the original documents. This requires the calling user to be the owner of all documents that are merged.
split
parameters
: \"pages\": [..] The list should be a list of pages and/or ranges, separated by commas, e.g. \"[1,2-3,4,5-7]\"
Optional parameters
: \"delete_originals\": true deletes the original document after consumption. This requires the calling user to be the owner of the document.
rotate
parameters
: \"degrees\": DEGREES. Must be an integer, e.g. 90, 180, 270.
delete_pages
parameters
: \"pages\": [..] The list should be a list of integers, e.g. \"[2,3,4]\"
modify_custom_fields
parameters
: \"add_custom_fields\": { CUSTOM_FIELD_ID: VALUE }: JSON object consisting of custom field id:value pairs to add to the document. This can also be a list of custom field IDs to add with empty values.
: \"remove_custom_fields\": [CUSTOM_FIELD_ID]: custom field ids to remove from the document.
Bulk editing for objects (tags, document types etc.) currently supports set permissions or delete operations, using the endpoint: /api/bulk_edit_objects/
, which requires a json payload of the format:
{\n \"objects\": [LIST_OF_OBJECT_IDS],\n \"object_type\": \"tags\", \"correspondents\", \"document_types\" or \"storage_paths\",\n \"operation\": \"set_permissions\" or \"delete\",\n \"owner\": OWNER_ID, // optional\n \"permissions\": { \"view\": { \"users\": [] ... }, \"change\": { ... } }, // (see 'set_permissions' format above)\n \"merge\": true / false // defaults to false, see above\n}\n
"},{"location":"api/#api-versioning","title":"API Versioning","text":"The REST API is versioned since Paperless-ngx 1.3.0.
API versions are specified by submitting an additional HTTP Accept
header with every request:
Accept: application/json; version=6\n
If an invalid version is specified, Paperless 1.3.0 will respond with \"406 Not Acceptable\" and an error message in the body. Earlier versions of Paperless will serve API version 1 regardless of whether a version is specified via the Accept
header.
If a client wishes to verify whether it is compatible with any given server, the following procedure should be performed:
Perform an authenticated request against any API endpoint. If the server is on version 1.3.0 or newer, the server will add two custom headers to the response:
X-Api-Version: 2\nX-Version: 1.3.0\n
Determine whether the client is compatible with this server based on the presence/absence of these headers and their values if present.
Older API versions are guaranteed to be supported for at least one year after the release of a new API version. After that, support for older API versions may be (but is not guaranteed to be) dropped.
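The compatibility procedure above can be sketched as follows. The function names and the idea of a client-side supported range (min_version, max_version) are assumptions made for illustration. The header names and the fallback to API version 1 for servers that send no version headers are taken from the text.

```python
def accept_header(version):
    # Value of the HTTP Accept header that requests a specific API version.
    return 'application/json; version={}'.format(version)


def server_api_version(response_headers):
    # Read X-Api-Version from a response header mapping.
    # Servers older than 1.3.0 send no such header and serve API version 1.
    lowered = {key.lower(): value for key, value in response_headers.items()}
    value = lowered.get('x-api-version')
    return int(value) if value is not None else 1


def is_compatible(response_headers, min_version, max_version):
    # min_version and max_version describe what the client itself supports.
    return min_version <= server_api_version(response_headers) <= max_version


headers = {'X-Api-Version': '2', 'X-Version': '1.3.0'}
print(accept_header(6))
print(server_api_version(headers), is_compatible(headers, 1, 6))
```

Header names are compared case-insensitively because HTTP clients differ in how they expose them.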
"},{"location":"api/#api-changelog","title":"API Changelog","text":""},{"location":"api/#version-1","title":"Version 1","text":"Initial API version.
"},{"location":"api/#version-2","title":"Version 2","text":"Tag.color. This read/write string field contains a hex color such as #a6cee3.
Tag.text_color. This field contains the text color to use for a specific tag, which is either black or white depending on the brightness of Tag.color.
Tag.colour.
/api/ui_settings/ has changed.
/api/tasks/acknowledge/.
id and label fields as opposed to a simple list of strings. When creating or updating a custom field value of a document for a select type custom field, the value should be the id of the option, whereas previously it was the index of the option.
Exports generated in Paperless-ngx v2.0.0\u20132.0.1 will not contain consumption templates or custom fields, we recommend users upgrade to at least v2.1.
"},{"location":"changelog/#bug-fixes_62","title":"Bug Fixes","text":"Exports generated in Paperless-ngx v2.0.0\u20132.0.1 will not contain consumption templates or custom fields, we recommend users upgrade to at least v2.1.
"},{"location":"changelog/#breaking-changes_4","title":"Breaking Changes","text":"PAPERLESS_DB_TIMEOUT
for all db types @shamoon (#3576)Chore: Run tests which require convert in the CI @stumpylog (#2570)
Feature: split documents on ASN barcode @muued (#2554)
Note: Version 1.12.x introduced searching of comments which will work for comments added after the upgrade but a reindex of the search index is required in order to be able to search older comments. The Docker image will automatically perform this reindex, bare metal installations will have to perform this manually, see the docs.
"},{"location":"changelog/#bug-fixes_84","title":"Bug Fixes","text":"Note: PR #2279 could represent a breaking change to the API which may affect third party applications that were only checking the post_document
endpoint for e.g. result = 'OK' as opposed to e.g. HTTP status = 200
Versions 1.11.1 and 1.11.2 contain bug fixes from v1.11.0 that prevented use of the new email consumption feature
"},{"location":"changelog/#bug-fixes_88","title":"Bug Fixes","text":"CONSUMER_SUBDIRS_AS_TAGS
causes failure with Celery in dev
@shamoon (#1942)dev
after move to celery @shamoon (#1934)PAPERLESS_DBPASS
@shamoon (#1683)dev
trying to build Pillow or lxml @stumpylog (#1909)CONSUMER_SUBDIRS_AS_TAGS
causes failure with Celery in dev
@shamoon (#1942)dev
after move to celery @shamoon (#1934)myst-parser
to fix readthedocs @qcasey (#982)myst-parser
to fix readthedocs @qcasey (#982)PAPERLESS_URL
is now required when using a reverse proxy. See #674.PAPERLESS_URL
env variable & CSRF var @shamoon (#674)PAPERLESS_OCR_MAX_IMAGE_PIXELS
@hacker-h (#441)PAPERLESS_URL
env variable & CSRF var @shamoon (#674)PAPERLESS_OCR_MAX_IMAGE_PIXELS
@hacker-h (#441)PAPERLESS_URL
env variable & CSRF var @shamoon (#674)This is the first release of the revived paperless-ngx project \ud83c\udf89. Thank you to everyone on the paperless-ngx team for your initiative and excellent teamwork!
Version 1.6.0 merges several pending PRs from jonaswinkler's repo and includes new feature updates and bug fixes. Major backend and UI changes include:
PAPERLESS_LOGOUT_REDIRECT_URL
.PAPERLESS_TRASH_DIR
.PAPERLESS_WORKER_TIMEOUT
.PAPERLESS_PORT
.Known issues:
Dockerfile
to RUN npm update npm -g && npm install --legacy-peer-deps
.Thank you to the following people for their documentation updates, fixes, and comprehensive testing:
@m0veax, @a17t, @fignew, @muued, @bauerj, @isigmund, @denilsonsa, @mweimerskirch, @alexander-bauer, @apeltzer, @tribut, @yschroeder, @gador, @sAksham-Ar, @sbrunner, @philpagel, @davemachado, @2600box, @qcasey, @Nicarim, @kpj, @filcuk, @Timoms, @mattlamb99, @padraigkitterick, @ajkavanagh, @Tooa, @Unkn0wnCat, @pewter77, @stumpylog, @Toxix, @azapater, @jschpp
Another big thanks to the people who have contributed translations:
Support for Python 3.6 was dropped.
PAPERLESS_CONSUMER_IGNORE_PATTERNS
.This is a maintenance release.
.DS_STORE
and ._XXXXX.pdf
.PAPERLESS_FORCE_SCRIPT_NAME
. You can now host paperless on sub paths such as https://localhost:8000/paperless/
.sudo
that caused paperless to not start on many Raspberry Pi devices. Thank you WhiteHatTux!PAPERLESS_ADMIN_USER
and PAPERLESS_ADMIN_PASSWORD
as environment variables to the docker container.Note
The changes to the full text search require you to reindex your documents. The docker image does this automatically, you don't need to do anything. To do this, execute the document_index reindex
management command (see Managing the document search index).
AUTO_LOGIN_USERNAME
: Unable to perform POST/PUT/DELETE requests and unable to receive WebSocket messages.This release contains new database migrations.
unpaper
.libpoppler-cpp-dev
.This release contains new database migrations.
PAPERLESS_FILENAME_FORMAT
is used and the filenames of two or more documents are the same, except for the file extension.Document processing status
Live updates to document lists and saved views when new documents are added.
Tip
For status notifications and live updates to work, paperless now requires an ASGI-enabled web server. The docker image uses gunicorn
and an ASGI-enabled worker called uvicorn, and there is no need to configure anything.
For bare metal installations, changes are required for the notifications to work. Adapt the service paperless-webserver.service
to use the supplied gunicorn.conf.py
configuration file and adapt the reference to the ASGI application as follows:
ExecStart=/opt/paperless/.local/bin/gunicorn -c /opt/paperless/gunicorn.conf.py paperless.asgi:application\n
Paperless will continue to work with WSGI, but you will not get any status notifications.
Apache mod_wsgi
users, see this note.
Paperless now offers suggestions for tags, correspondents and types on the document detail page.
Added an interactive easy install script that automatically downloads, configures and starts paperless with docker.
Official support for Python 3.9.
Other changes and fixes
PAPERLESS_DATA_DIR/log/
. Logging settings can be adjusted with PAPERLESS_LOGGING_DIR
, PAPERLESS_LOGROTATE_MAX_SIZE
and PAPERLESS_LOGROTATE_MAX_BACKUPS
.Nothing special about this release, but since there are relatively few bug reports coming in, I think that this is reasonably stable.
rsync
.PAPERLESS_FILENAME_FORMAT
.Starting with this version, releases are getting built automatically. This release also comes with changes on how to install and update paperless.
-dockerfiles.tar.xz
release archive is gone. Instead, simply grab the docker files from /docker/compose
in the repository if you wish to install paperless by pulling from the hub./docker/compose
were changed to always use the latest
version automatically. In order to do further updates, simply do a docker-compose pull
. The documentation has been updated.compilemessages
and collectstatic
are now obsolete.PAPERLESS_IGNORE_DATES
was added by jayme-github. This can be used to instruct paperless to ignore certain dates (such as your date of birth) when guessing the date from the document content. This was actually introduced in 0.9.12, I just forgot to mention it in the changelog.sslmode
to force encryption of the connection to PostgreSQL.jbig2enc
, which is a lossless image encoder for PDF documents and decreases the size of certain PDF/A documents.USERMAP_UID
and USERMAP_GID
was used in the docker-compose.env
file.Note
The bulk delete operations did not update the search index. Therefore, documents that you deleted remained in the index and caused the search to return messages about missing documents when searching. Further bulk operations will properly update the index.
However, this change is not retroactive: If you used the delete method of the bulk editor, you need to reindex your search index by running the management command document_index
with the argument reindex
.
Christmas release!
PAPERLESS_COOKIE_PREFIX
is used was fixed.This release addresses two severe issues with the previous release.
{tag_list}
inserts a list of tags into the filename, separated by comma.document_retagger
no longer removes inbox tags or tags without matching rules.PAPERLESS_COOKIE_PREFIX
allows you to run multiple instances of paperless on different ports. This option enables you to be logged in into multiple instances by specifying different cookie names for each instance.{created_month}
and {created_day}
now use a leading zero for single digit values.{tags}
can no longer be used without arguments.:
not being accepted for upload.This release focusses primarily on many small issues with the UI.
FILENAME_FORMAT
placeholder for document types._01
, _02
, etc when it detects duplicate filenames.Note
The changes to the filename format will apply to newly added documents and changed documents. If you want all files to reflect these changes, execute the document_renamer
management command.
This release concludes the big changes I wanted to get rolled into paperless. The next releases before 1.0 will focus on fixing issues, primarily.
PAPERLESS_OCR_LANGUAGE
. Be sure to set this to the language the majority of your documents are in. Multiple languages can be specified, but that requires more CPU time.document_archiver
can be used to create archived versions for already existing documents.PAPERLESS_CONSUMER_RECURSIVE
and PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS
.correspondent
, document_type
and tags
. The *_id
versions are gone. These fields are read/write.document_index reindex
management command (see document search index) that adds the data to the search index. You only need to do this once, since the schema of the search index changed. Paperless keeps the index updated after that whenever something changes.PAPERLESS_AUTO_LOGIN_USERNAME
replaces PAPERLESS_DISABLE_LOGIN
. You have to specify your username.application/octet-stream
)PAPERLESS_OCR_PAGES
limits the tesseract parser to the first n pages of scanned documents.PAPERLESS_TASK_WORKERS
and PAPERLESS_THREADS_PER_WORKER
. See TODO for details on concurrency./fetch/
urls. Redirects are in place./push
url. Redirects are in place.PAPERLESS_DBHOST
is specified in the settings, paperless uses PostgreSQL instead of SQLite. Username, database and password all default to paperless
if not specified.document_correspondents
management command.supervisord
to run everything paperless-related in a single container.PAPERLESS_FORGIVING_OCR
is now default and gone. Reason: Even if langdetect
fails to detect a language, tesseract still does a very good job at ocr'ing a document with the default language. Certain language specifics such as umlauts may not get picked up properly.PAPERLESS_DEBUG
defaults to false
.PAPERLESS_DBHOST
now determines whether to use PostgreSQL or SQLite.PAPERLESS_OCR_THREADS
is gone and replaced with PAPERLESS_TASK_WORKERS
and PAPERLESS_THREADS_PER_WORKER
. Refer to the config example for details.PAPERLESS_OPTIMIZE_THUMBNAILS
allows you to disable or enable thumbnail optimization. This is useful on less powerful devices.psycopg2
to the Pipfile #489. He also fixed a syntax error in docker-compose.yml.example
#488 and added DjangoQL, which allows a litany of handy search functionality #492.DEBUG
value. The paperless.conf.example
file was also updated to mirror the project defaults.added
property to the REST API #471.RecentCorrespondentsFilter
correspondents filter that was added in 2.4 to play nice with the defaults. Thanks to tsia and Sblop who pointed this out. #423..editorconfig
file to better specify coding style.PAPERLESS_OPTIPNG_BINARY
. The Docker image has already been updated on the Docker Hub, so you just need to pull the latest one from there if you're a Docker user.slugify()
function. The slug value is still visible in the admin though..slug
values to ones conforming to the slugify()
rules..save()
in determining a slug and using that to check for an existing tag/correspondent.get_date()
functionality of the parsers has been consolidated onto the DocumentParser
class since much of that code was redundant anyway.unencrypted
, since exports are by their nature unencrypted. It's now in the import step that we decide the storage type. This allows you to export from an encrypted system and import into an unencrypted one, or vice-versa.PAPERLESS_DBUSER
in your environment. This will attempt to connect to your Postgres database without a password unless you also set PAPERLESS_DBPASS
.DISABLE_LOGIN
feature: #392.overrides.css
and/or overrides.js
in the root of your media directory. Thanks to Mark McFate for this idea: #371This is a big release as we've changed a core-functionality of Paperless: we no longer encrypt files with GPG by default.
The reasons for this are many, but it boils down to that the encryption wasn't really all that useful, as files on-disk were still accessible so long as you had the key, and the key was most typically stored in the config file. In other words, your files are only as safe as the paperless
user is. In addition to that, the contents of the documents were never encrypted, so important numbers etc. were always accessible simply by querying the database. Still, it was better than nothing, but the consensus from users appears to be that it was more an annoyance than anything else, so this feature is now turned off unless you explicitly set a passphrase in your config file.
Encryption isn't gone, it's just off for new users. So long as you have PAPERLESS_PASSPHRASE
set in your config or your environment, Paperless should continue to operate as it always has. If however, you want to drop encryption too, you only need to do two things:
./manage.py migrate && ./manage.py change_storage_type gpg unencrypted
. This will go through your entire database and Decrypt All The Things.PAPERLESS_PASSPHRASE
from your paperless.conf
file, or simply stop declaring it in your environment.Special thanks to erikarvstedt, matthewmoto, and mcronce who did the bulk of the work on this big change.
"},{"location":"changelog/#140","title":"1.4.0","text":"--directory
, limit the --loop-time
, set the time between mail server checks with --mail-delta
or just run it as a one-off with --one-shot
. See #305 & #313 for more information..travis.yml
and setup.cfg
.paperless.conf
. Thanks to Martin Arendtsen who provided this (#322).--noreload
to the default server start process. This helps reduce the load imposed by the running webservice.PAPERLESS_DISABLE_LOGIN=\"true\"
in your environment or in /etc/paperless.conf
.convert
and unpaper
and fail-out nicely.PAPERLESS_OCR_ALWAYS=YES
either in your paperless.conf
or in the environment. Note that this also means that Paperless now requires libpoppler-cpp-dev
to be installed. Important: You'll need to run pip install -r requirements.txt
after the usual git pull
to properly update./paperless
), rather than always running in the root (/
) thanks to maphy-psd's work on #255..tif
files properly. Thanks to ayounggun for reporting this one and to Kusti Skyt\u00e9n for posting the correct solution in the GitHub issue.PAPERLESS_SHARED_SECRET
as it was being used both for the API (now replaced with a normal auth) and for email polling. Now that we're only using it for email, this variable has been renamed to PAPERLESS_EMAIL_SECRET
. The old value will still work for a while, but you should change your config if you've been using the email polling feature. Thanks to Joshua Gilman for all the help with this feature..PDF
content
field is now optional, to allow for the edge case of a purely graphical document.SECRET_KEY
value.[\"*\"]
and allowing the user to set her own value via PAPERLESS_ALLOWED_HOSTS
should the need arise.CONVERT_BINARY
PAPERLESS_CONSUMER_LOOP_TIME
to a number of seconds. The default is 10.PAPERLESS_CONVERT
, PAPERLESS_CONSUME
, and PAPERLESS_SECRET
. Please use PAPERLESS_CONVERT_BINARY
, PAPERLESS_CONSUMPTION_DIR
, and PAPERLESS_SHARED_SECRET
respectively instead.paperless.conf
.paperless.conf
./fetch
URL./etc/paperless.conf
and modified the systemd unit files to use it.settings.py
..jpeg
and .JPG
images to be imported but made unavailable.User
and Group
from the adminpytz
to the list of requirementsdocument_exporter
.settings.TESSERACT_LANGUAGE
to settings.OCR_LANGUAGE
.Paperless provides a wide range of customizations. Depending on how you run paperless, these settings have to be defined in different places.
Certain configuration options may be set via the UI. This currently includes common OCR related settings and some frontend settings. If set, these will take preference over the settings via environment variables. If not set, the environment setting or applicable default will be utilized instead.
If you run paperless on docker, paperless.conf
is not used. Rather, configure paperless by copying necessary options to docker-compose.env
.
If you are running paperless on anything else, paperless will search for the configuration file in these locations and use the first one it finds:
PAPERLESS_CONFIGURATION_PATH
/path/to/paperless/paperless.conf
/etc/paperless.conf
/usr/local/etc/paperless.conf
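The search order above amounts to a first-match lookup, sketched below. This is an illustration only and not the project's actual settings code: the function name find_config is hypothetical, and /path/to/paperless/ is the placeholder used in the list above, to be replaced with the real installation directory.

```python
import os

# Candidate locations from the list above, in order.
# The first entry keeps the placeholder path from the documentation.
DEFAULT_CANDIDATES = [
    '/path/to/paperless/paperless.conf',
    '/etc/paperless.conf',
    '/usr/local/etc/paperless.conf',
]


def find_config(candidates=DEFAULT_CANDIDATES, env=None):
    # Return the first configuration file that exists, or None.
    # A path set in PAPERLESS_CONFIGURATION_PATH is checked first.
    env = os.environ if env is None else env
    ordered = []
    override = env.get('PAPERLESS_CONFIGURATION_PATH')
    if override:
        ordered.append(override)
    ordered.extend(candidates)
    for path in ordered:
        if os.path.isfile(path):
            return path
    return None


print(find_config())
```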
PAPERLESS_REDIS=<url>
","text":"This is required for processing scheduled tasks such as email fetching, index optimization and for training the automatic document matcher.
redis://<username>:<password>@<host>:<port>
redis://:<password>@<host>:<port>
redis://<username>:<password>@<host>:<port>/<DBIndex>
More information on securing your Redis Instance.
Defaults to redis://localhost:6379
.
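To check which parts of a Redis URL are actually set, the forms listed above can be split with the Python standard library, as in this sketch. The function name describe_redis_url and the example host names and credentials are hypothetical.

```python
from urllib.parse import urlsplit


def describe_redis_url(url):
    # Split a redis:// URL into the pieces used in the forms above.
    parts = urlsplit(url)
    db_index = parts.path.lstrip('/')
    return {
        'username': parts.username or None,  # empty when only a password is given
        'password': parts.password,
        'host': parts.hostname,
        'port': parts.port,
        'db_index': int(db_index) if db_index else None,
    }


for example in (
    'redis://localhost:6379',
    'redis://:secret@cache:6379',
    'redis://paperless:secret@cache:6379/2',
):
    print(example, describe_redis_url(example))
```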
PAPERLESS_REDIS_PREFIX=<prefix>
","text":"Prefix to be used in Redis for keys and channels. Useful for sharing one Redis server among multiple Paperless instances.
Defaults to no prefix.
"},{"location":"configuration/#database","title":"Database","text":""},{"location":"configuration/#PAPERLESS_DBENGINE","title":"PAPERLESS_DBENGINE=<engine_name>
","text":"Optional. Selects PostgreSQL or MariaDB as the database engine. Available options are postgresql
and mariadb
.
Default is postgresql
.
Warning
Using MariaDB comes with some caveats. See MySQL Caveats.
"},{"location":"configuration/#PAPERLESS_DBHOST","title":"PAPERLESS_DBHOST=<hostname>
","text":"By default, sqlite is used as the database backend. This can be changed here.
Set PAPERLESS_DBHOST and another database will be used instead of sqlite.
"},{"location":"configuration/#PAPERLESS_DBPORT","title":"PAPERLESS_DBPORT=<port>
","text":"Adjust port if necessary.
Default is 5432.
"},{"location":"configuration/#PAPERLESS_DBNAME","title":"PAPERLESS_DBNAME=<name>
","text":"Database name in PostgreSQL or MariaDB.
Defaults to \"paperless\".
"},{"location":"configuration/#PAPERLESS_DBUSER","title":"PAPERLESS_DBUSER=<name>
","text":"Database user in PostgreSQL or MariaDB.
Defaults to \"paperless\".
"},{"location":"configuration/#PAPERLESS_DBPASS","title":"PAPERLESS_DBPASS=<password>
","text":"Database password for PostgreSQL or MariaDB.
Defaults to \"paperless\".
"},{"location":"configuration/#PAPERLESS_DBSSLMODE","title":"PAPERLESS_DBSSLMODE=<mode>
","text":"SSL mode to use when connecting to PostgreSQL or MariaDB.
See the official documentation about sslmode for PostgreSQL.
See the official documentation about sslmode for MySQL and MariaDB.
Note: SSL mode values differ between PostgreSQL and MariaDB.
Default is prefer
for PostgreSQL and PREFERRED
for MariaDB.
PAPERLESS_DBSSLROOTCERT=<ca-path>
","text":"SSL root certificate path
See the official documentation about sslmode for PostgreSQL. Changes path of root.crt
.
See the official documentation about sslmode for MySQL and MariaDB.
Defaults to unset, using the documented path in the home directory.
"},{"location":"configuration/#PAPERLESS_DBSSLCERT","title":"PAPERLESS_DBSSLCERT=<client-cert-path>
","text":"SSL client certificate path
See the official documentation about sslmode for PostgreSQL.
See the official documentation about sslmode for MySQL and MariaDB.
Changes path of postgresql.crt
.
Defaults to unset, using the documented path in the home directory.
"},{"location":"configuration/#PAPERLESS_DBSSLKEY","title":"PAPERLESS_DBSSLKEY=<client-cert-key>
","text":"SSL client key path
See the official documentation about sslmode for PostgreSQL.
See the official documentation about sslmode for MySQL and MariaDB.
Changes path of postgresql.key
.
Defaults to unset, using the documented path in the home directory.
"},{"location":"configuration/#PAPERLESS_DB_TIMEOUT","title":"PAPERLESS_DB_TIMEOUT=<int>
","text":"Amount of time for a database connection to wait for the database to unlock. Mostly applicable to sqlite-based installations. Consider changing to postgresql if you are having concurrency problems with sqlite.
Defaults to unset, keeping the Django defaults.
"},{"location":"configuration/#optional-services","title":"Optional Services","text":""},{"location":"configuration/#tika","title":"Tika","text":"Paperless can make use of Tika and Gotenberg for parsing and converting \"Office\" documents (such as \".doc\", \".xlsx\" and \".odt\"). Tika and Gotenberg are also needed to allow parsing of E-Mails (.eml).
If you wish to use this, you must provide a Tika server and a Gotenberg server, configure their endpoints, and enable the feature.
"},{"location":"configuration/#PAPERLESS_TIKA_ENABLED","title":"PAPERLESS_TIKA_ENABLED=<bool>
","text":"Enable (or disable) the Tika parser.
Defaults to false.
"},{"location":"configuration/#PAPERLESS_TIKA_ENDPOINT","title":"PAPERLESS_TIKA_ENDPOINT=<url>
","text":"Set the endpoint URL where Paperless can reach your Tika server.
Defaults to \"http://localhost:9998\".
"},{"location":"configuration/#PAPERLESS_TIKA_GOTENBERG_ENDPOINT","title":"PAPERLESS_TIKA_GOTENBERG_ENDPOINT=<url>
","text":"Set the endpoint URL where Paperless can reach your Gotenberg server.
Defaults to \"http://localhost:3000\".
If you run paperless on docker, you can add those services to the Docker Compose file (see the provided docker-compose.sqlite-tika.yml
file for reference).
Add all three configuration parameters to your configuration. If using Docker, this may be the environment
key of the webserver or a docker-compose.env
file. Bare metal installations may have a .conf
file containing the configuration parameters. Be sure to use the correct format and watch out for indentation if editing the YAML file.
PAPERLESS_EMAIL_PARSE_DEFAULT_LAYOUT=<int>
(#PAPERLESS_EMAIL_PARSE_DEFAULT_LAYOUT)","text":"The default layout to use for emails that are consumed as documents. Must be one of the integer choices below. Note that mail rules can specify this setting, thus this fallback is used for the default selection and for .eml files consumed by other means.
1
= Text, then HTML2
= HTML, then text3
= HTML only4
= Text onlyPAPERLESS_CONSUMPTION_DIR=<path>
","text":"This is where your documents should go to be consumed. Make sure that it exists and that the user running the paperless service can read/write its contents before you start Paperless.
Don't change this when using docker, as it only changes the path within the container. Change the local consumption directory in the docker-compose.yml file instead.
Defaults to \"../consume/\", relative to the \"src\" directory.
"},{"location":"configuration/#PAPERLESS_DATA_DIR","title":"PAPERLESS_DATA_DIR=<path>
","text":"This is where paperless stores all its data (search index, SQLite database, classification model, etc).
Defaults to \"../data/\", relative to the \"src\" directory.
"},{"location":"configuration/#PAPERLESS_EMPTY_TRASH_DIR","title":"PAPERLESS_EMPTY_TRASH_DIR=<path>
","text":"When documents are deleted (e.g. after emptying the trash) the original files will be moved here instead of being removed from the filesystem. Only the original version is kept.
This must be writeable by the user running paperless. When running inside docker, ensure that this path is within a permanent volume (such as \"../media/trash\") so it won't get lost on upgrades.
Note that the directory must exist prior to using this setting.
Defaults to empty (i.e. really delete files).
This setting was previously named PAPERLESS_TRASH_DIR.
"},{"location":"configuration/#PAPERLESS_MEDIA_ROOT","title":"PAPERLESS_MEDIA_ROOT=<path>
","text":"This is where your documents and thumbnails are stored.
You can set this and PAPERLESS_DATA_DIR to the same folder to have paperless store all its data within the same volume.
Defaults to \"../media/\", relative to the \"src\" directory.
"},{"location":"configuration/#PAPERLESS_STATICDIR","title":"PAPERLESS_STATICDIR=<path>
","text":"Override the default STATIC_ROOT here. This is where all static files created using the \"collectstatic\" management command are stored.
Unless you're doing something fancy, there is no need to override this. If this is changed, you may need to run collectstatic
again.
Defaults to \"../static/\", relative to the \"src\" directory.
"},{"location":"configuration/#PAPERLESS_FILENAME_FORMAT","title":"PAPERLESS_FILENAME_FORMAT=<format>
","text":"Changes the filenames paperless uses to store documents in the media directory. See File name handling for details.
Default is none, which disables this feature.
"},{"location":"configuration/#PAPERLESS_FILENAME_FORMAT_REMOVE_NONE","title":"PAPERLESS_FILENAME_FORMAT_REMOVE_NONE=<bool>
","text":"Tells paperless to omit placeholders in PAPERLESS_FILENAME_FORMAT
that would resolve to 'none' from the resulting filename. This also holds true for directory names. See File name handling for details.
Defaults to false
which disables this feature.
PAPERLESS_LOGGING_DIR=<path>
","text":"This is where paperless will store log files.
Defaults to PAPERLESS_DATA_DIR/log/
.
PAPERLESS_NLTK_DIR=<path>
","text":"This is where paperless will search for the data required for NLTK processing, if you are using it. If you are using the Docker image, this should not be changed, as the data is included in the image already. Previously, the location defaulted to PAPERLESS_DATA_DIR/nltk
. Unless you are using this in a bare metal install or other setup, this folder is no longer needed and can be removed manually.
Defaults to /usr/share/nltk_data
PAPERLESS_MODEL_FILE=<path>
","text":"This is where paperless will store the classification model.
Defaults to PAPERLESS_DATA_DIR/classification_model.pickle
.
PAPERLESS_LOGROTATE_MAX_SIZE=<num>
","text":"Maximum file size for log files before they are rotated, in bytes.
Defaults to 1 MiB.
"},{"location":"configuration/#PAPERLESS_LOGROTATE_MAX_BACKUPS","title":"PAPERLESS_LOGROTATE_MAX_BACKUPS=<num>
","text":"Number of rotated log files to keep.
Defaults to 20.
"},{"location":"configuration/#hosting-and-security","title":"Hosting & Security","text":""},{"location":"configuration/#PAPERLESS_SECRET_KEY","title":"PAPERLESS_SECRET_KEY=<key>
","text":"Paperless uses this to make session tokens. If you expose paperless on the internet, you need to change this, since the default secret is well known.
Use any sequence of characters. The more, the better. You don't need to remember this. Just face-roll your keyboard.
Default is listed in the file src/paperless/settings.py
.
PAPERLESS_URL=<url>
","text":"This setting can be used to set the three options below (ALLOWED_HOSTS, CORS_ALLOWED_HOSTS and CSRF_TRUSTED_ORIGINS). If the other options are set the values will be combined with this one. Do not include a trailing slash. E.g. https://paperless.domain.com
Defaults to empty string, leaving the other settings unaffected.
Note
This value cannot contain a path (e.g. domain.com/path), even if you are installing paperless-ngx at a subpath.
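The constraints on PAPERLESS_URL (no trailing slash, no path) can be checked before deployment with a small validator such as the sketch below. The function name check_paperless_url is hypothetical, and restricting the scheme to http or https is an assumption made here, not something Paperless-ngx documents.

```python
from urllib.parse import urlsplit


def check_paperless_url(url):
    # Return a list of problems; an empty list means the value looks fine.
    parts = urlsplit(url)
    problems = []
    # Assumption: only http and https URLs are meaningful here.
    if parts.scheme not in ('http', 'https') or not parts.hostname:
        problems.append('expected a URL with scheme and host')
    if url.endswith('/'):
        problems.append('must not end with a trailing slash')
    if parts.path not in ('', '/'):
        problems.append('must not contain a path')
    return problems


for candidate in ('https://paperless.domain.com',
                  'https://paperless.domain.com/',
                  'https://domain.com/path'):
    print(candidate, check_paperless_url(candidate))
```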
"},{"location":"configuration/#PAPERLESS_CSRF_TRUSTED_ORIGINS","title":"PAPERLESS_CSRF_TRUSTED_ORIGINS=<comma-separated-list>
","text":"A list of trusted origins for unsafe requests (e.g. POST). As of Django 4.0 this is required to access the Django admin via the web. See the Django project documentation on the settings.
Can also be set using PAPERLESS_URL (see above).
Defaults to empty string, which does not add any origins to the trusted list.
"},{"location":"configuration/#PAPERLESS_ALLOWED_HOSTS","title":"PAPERLESS_ALLOWED_HOSTS=<comma-separated-list>
","text":"If you're planning on putting Paperless on the open internet, then you really should set this value to the domain name you're using. Failing to do so leaves you open to HTTP host header attacks. You can read more about this in the Django project's documentation
Just remember that this is a comma-separated list, so \"example.com\" is fine, as is \"example.com,www.example.com\", but NOT \" example.com\" or \"example.com,\"
Can also be set using PAPERLESS_URL (see above).
\"localhost\" is always allowed for docker healthcheck
Defaults to \"*\", which is all hosts.
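The formatting rules for the comma-separated list can be enforced with a strict check like the one sketched here. It is deliberately stricter than necessary and does not claim to mirror how Paperless-ngx parses the value. The function name parse_host_list is hypothetical.

```python
def parse_host_list(value):
    # Split a comma-separated host list and reject the malformed forms
    # called out above: surrounding whitespace and empty entries.
    hosts = value.split(',')
    for host in hosts:
        if host == '' or host != host.strip():
            raise ValueError('malformed entry in host list: {!r}'.format(host))
    return hosts


print(parse_host_list('example.com,www.example.com'))
```

A value with a leading space or a trailing comma raises ValueError instead of silently producing an entry that never matches a request.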
"},{"location":"configuration/#PAPERLESS_CORS_ALLOWED_HOSTS","title":"PAPERLESS_CORS_ALLOWED_HOSTS=<comma-separated-list>
","text":"You need to add your servers to the list of allowed hosts that can do CORS calls. Set this to your public domain name.
Can also be set using PAPERLESS_URL (see above).
Defaults to \"http://localhost:8000\".
"},{"location":"configuration/#PAPERLESS_TRUSTED_PROXIES","title":"PAPERLESS_TRUSTED_PROXIES=<comma-separated-list>
","text":"This may be needed to prevent IP address spoofing if you are using e.g. fail2ban with log entries for failed authorization attempts. Value should be IP address(es).
Defaults to empty string.
"},{"location":"configuration/#PAPERLESS_FORCE_SCRIPT_NAME","title":"PAPERLESS_FORCE_SCRIPT_NAME=<path>
","text":"To host paperless under a subpath url like example.com/paperless you set this value to /paperless. No trailing slash!
Defaults to none, which hosts paperless at \"/\".
"},{"location":"configuration/#PAPERLESS_STATIC_URL","title":"PAPERLESS_STATIC_URL=<path>
","text":"Override the STATIC_URL here. Unless you're hosting Paperless off a specific path like /paperless/, you probably don't need to change this. If you do change it, be sure to include the trailing slash.
Defaults to \"/static/\".
Note
When hosting paperless behind a reverse proxy like Traefik or Nginx at a subpath e.g. example.com/paperlessngx you will also need to set PAPERLESS_FORCE_SCRIPT_NAME (see above).
"},{"location":"configuration/#PAPERLESS_AUTO_LOGIN_USERNAME","title":"PAPERLESS_AUTO_LOGIN_USERNAME=<username>
","text":"Specify a username here so that paperless will automatically perform login with the selected user.
Danger
Do not use this when exposing paperless on the internet. There are no checks in place that would prevent you from doing this.
Defaults to none, which disables this feature.
"},{"location":"configuration/#PAPERLESS_ADMIN_USER","title":"PAPERLESS_ADMIN_USER=<username>
","text":"If this environment variable is specified, Paperless automatically creates a superuser with the provided username at start. This is useful in cases where you cannot run the createsuperuser command separately, such as Kubernetes or AWS ECS.
Requires PAPERLESS_ADMIN_PASSWORD be set.
Note
This will not change an existing [super]user's password, nor will it recreate a user that already exists. You can leave this throughout the lifecycle of the containers.
"},{"location":"configuration/#PAPERLESS_ADMIN_MAIL","title":"PAPERLESS_ADMIN_MAIL=<email>
","text":"(Optional) Specify superuser email address. Only used when PAPERLESS_ADMIN_USER is set.
Defaults to root@localhost.
"},{"location":"configuration/#PAPERLESS_ADMIN_PASSWORD","title":"PAPERLESS_ADMIN_PASSWORD=<password>
","text":"Only used when PAPERLESS_ADMIN_USER is set. This will be the password of the automatically created superuser."},{"location":"configuration/#PAPERLESS_COOKIE_PREFIX","title":"PAPERLESS_COOKIE_PREFIX=<str>
","text":"Specify a prefix that is added to the cookies used by paperless to identify the currently logged in user. This is useful for when you're running two instances of paperless on the same host.
After changing this, you will have to log in again.
Defaults to \"\", which does not alter the cookie names.
"},{"location":"configuration/#PAPERLESS_ENABLE_HTTP_REMOTE_USER","title":"PAPERLESS_ENABLE_HTTP_REMOTE_USER=<bool>
","text":"Allows authentication via HTTP_REMOTE_USER which is used by some SSO applications.
Warning
This will allow authentication by simply adding a Remote-User: <username>
header to a request. Use with care! You especially must ensure that any such header is not passed from external requests to your reverse-proxy to paperless (that would effectively bypass all authentication).
If you're exposing paperless to the internet directly (i.e. without a reverse proxy), do not use this.
Also see the warning in the official documentation.
Defaults to \"false\" which disables this feature.
"},{"location":"configuration/#PAPERLESS_ENABLE_HTTP_REMOTE_USER_API","title":"PAPERLESS_ENABLE_HTTP_REMOTE_USER_API=<bool>
","text":"Allows authentication via HTTP_REMOTE_USER directly against the API
Warning
See the warning above about securing your installation when using remote user header authentication. This setting is separate from PAPERLESS_ENABLE_HTTP_REMOTE_USER
to avoid introducing a security vulnerability to existing reverse proxy setups. As above, ensure that your reverse proxy does not simply pass the Remote-User
header from the internet to paperless.
Defaults to \"false\" which disables this feature.
"},{"location":"configuration/#PAPERLESS_HTTP_REMOTE_USER_HEADER_NAME","title":"PAPERLESS_HTTP_REMOTE_USER_HEADER_NAME=<str>
","text":"If \"PAPERLESS_ENABLE_HTTP_REMOTE_USER\" or PAPERLESS_ENABLE_HTTP_REMOTE_USER_API
is enabled, this property lets you customize the name of the HTTP header from which the authenticated username is extracted. Values are in terms of HttpRequest.META. Thus, the configured value must start with HTTP_ followed by the normalized actual header name.
Defaults to \"HTTP_REMOTE_USER\".
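The normalization mentioned above follows Django's rule for HttpRequest.META keys: the header name is upper-cased, hyphens become underscores and HTTP_ is prepended. A minimal Python sketch, where the helper name header_to_meta_key and the second header name are made up for illustration:

```python
def header_to_meta_key(header_name):
    # Upper-case the name, replace hyphens with underscores, add the HTTP_ prefix.
    return 'HTTP_' + header_name.upper().replace('-', '_')


print(header_to_meta_key('Remote-User'))      # HTTP_REMOTE_USER
print(header_to_meta_key('X-Auth-Username'))  # hypothetical header name
```

The printed value is what would be placed in PAPERLESS_HTTP_REMOTE_USER_HEADER_NAME for the corresponding header.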
"},{"location":"configuration/#PAPERLESS_LOGOUT_REDIRECT_URL","title":"PAPERLESS_LOGOUT_REDIRECT_URL=<str>
","text":"URL to redirect the user to after a logout. This can be used together with PAPERLESS_ENABLE_HTTP_REMOTE_USER and SSO to redirect the user back to the SSO application's logout page to complete the logout process.
Defaults to None, which disables this feature.
"},{"location":"configuration/#PAPERLESS_USE_X_FORWARD_HOST","title":"PAPERLESS_USE_X_FORWARD_HOST=<bool>
","text":"Configures the Django setting USE_X_FORWARDED_HOST which may be needed for hosting behind a proxy.
Defaults to False
"},{"location":"configuration/#PAPERLESS_USE_X_FORWARD_PORT","title":"PAPERLESS_USE_X_FORWARD_PORT=<bool>
","text":"Configures the Django setting USE_X_FORWARDED_PORT which may be needed for hosting behind a proxy.
Defaults to False
"},{"location":"configuration/#PAPERLESS_PROXY_SSL_HEADER","title":"PAPERLESS_PROXY_SSL_HEADER=<json-list>
","text":"Configures the Django setting SECURE_PROXY_SSL_HEADER which may be needed for hosting behind a proxy. The two values in the list will form the tuple of HTTP header/value expected by Django, e.g. '[\"HTTP_X_FORWARDED_PROTO\", \"https\"]'.
Defaults to None
Warning
Setting this value has security implications. Read the Django documentation and be sure you understand its usage before setting it.
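How a JSON list turns into the header/value tuple can be sketched in Python as follows. The function parse_proxy_ssl_header is hypothetical and is not taken from the Paperless source; it only shows the shape of the conversion.

```python
import json


def parse_proxy_ssl_header(raw):
    # Hypothetical parser: a JSON list of exactly two strings becomes a tuple.
    values = json.loads(raw)
    if not isinstance(values, list) or len(values) != 2:
        raise ValueError('expected a JSON list with two entries')
    return tuple(values)


# json.dumps builds the JSON text with the double quotes that JSON requires.
raw_value = json.dumps(['HTTP_X_FORWARDED_PROTO', 'https'])
print(parse_proxy_ssl_header(raw_value))
```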
"},{"location":"configuration/#PAPERLESS_EMAIL_CERTIFICATE_LOCATION","title":"PAPERLESS_EMAIL_CERTIFICATE_LOCATION=<path>
","text":"Configures an additional SSL certificate file containing a certificate or certificate chain which should be trusted for validating SSL connections against mail providers. This is for use with self-signed certificates against local IMAP servers.
Defaults to None.
Warning
Setting this value has implications for the security of your email. Understand what it does and be sure you need it before setting it.
"},{"location":"configuration/#authentication","title":"Authentication & SSO","text":""},{"location":"configuration/#PAPERLESS_ACCOUNT_ALLOW_SIGNUPS","title":"PAPERLESS_ACCOUNT_ALLOW_SIGNUPS=<bool>
","text":"Allow users to signup for a new Paperless-ngx account.
Defaults to False
"},{"location":"configuration/#PAPERLESS_ACCOUNT_DEFAULT_GROUPS","title":"PAPERLESS_ACCOUNT_DEFAULT_GROUPS=<comma-separated-list>
","text":"A list of group names that users will be added to when they sign up for a new account. Groups listed here must already exist.
Defaults to None
"},{"location":"configuration/#PAPERLESS_SOCIALACCOUNT_PROVIDERS","title":"PAPERLESS_SOCIALACCOUNT_PROVIDERS=<json>
","text":"This variable is used to setup login and signup via social account providers which are compatible with django-allauth. See the corresponding django-allauth documentation for a list of provider configurations. You will also need to include the relevant Django 'application' inside the PAPERLESS_APPS setting to activate that specific authentication provider (e.g. allauth.socialaccount.providers.openid_connect
for the OpenID Connect provider).
Defaults to None, which does not enable any third party authentication systems.
"},{"location":"configuration/#PAPERLESS_SOCIAL_AUTO_SIGNUP","title":"PAPERLESS_SOCIAL_AUTO_SIGNUP=<bool>
","text":"Attempt to signup the user using retrieved email, username etc from the third party authentication system. See the corresponding django-allauth documentation
Defaults to False
"},{"location":"configuration/#PAPERLESS_SOCIALACCOUNT_ALLOW_SIGNUPS","title":"PAPERLESS_SOCIALACCOUNT_ALLOW_SIGNUPS=<bool>
","text":"Allow users to sign up for a new Paperless-ngx account using any configured third-party authentication system.
Defaults to True
"},{"location":"configuration/#PAPERLESS_SOCIAL_ACCOUNT_SYNC_GROUPS","title":"PAPERLESS_SOCIAL_ACCOUNT_SYNC_GROUPS=<bool>
","text":"Sync groups from the third party authentication system (e.g. OIDC) to Paperless-ngx. When enabled, users will be added or removed from groups based on their group membership in the third party authentication system. Groups must already exist in Paperless-ngx and have the same name as in the third party authentication system. Groups are updated upon logging in via the third party authentication system, see the corresponding django-allauth documentation. In order to pass groups from the authentication system you will need to update your PAPERLESS_SOCIALACCOUNT_PROVIDERS setting by adding a top-level \"SCOPE\" setting which includes \"groups\", e.g.:
{\"openid_connect\":{\"SCOPE\": [\"openid\",\"profile\",\"email\",\"groups\"]...\n
Defaults to False
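One way to avoid quoting mistakes in the provider JSON is to build it from a Python dictionary. This is a sketch under assumptions: only the scope list comes from the example above, and a real provider entry needs further keys that are not shown here.

```python
import json

# Hypothetical provider configuration; only the scope list comes from the text above.
providers = {
    'openid_connect': {
        'SCOPE': ['openid', 'profile', 'email', 'groups'],
    },
}

# Serialise to the single-line JSON string expected in the environment variable.
env_value = json.dumps(providers)
print('PAPERLESS_SOCIALACCOUNT_PROVIDERS=' + env_value)
```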
"},{"location":"configuration/#PAPERLESS_SOCIAL_ACCOUNT_DEFAULT_GROUPS","title":"PAPERLESS_SOCIAL_ACCOUNT_DEFAULT_GROUPS=<comma-separated-list>
","text":"A list of group names that users who signup via social accounts will be added to upon signup. Groups listed here must already exist. If both the PAPERLESS_ACCOUNT_DEFAULT_GROUPS setting and this setting are used, the user will be added to both sets of groups.
Defaults to None
"},{"location":"configuration/#PAPERLESS_ACCOUNT_DEFAULT_HTTP_PROTOCOL","title":"PAPERLESS_ACCOUNT_DEFAULT_HTTP_PROTOCOL=<string>
","text":"The protocol used when generating URLs, e.g. login callback URLs. See the corresponding django-allauth documentation
Defaults to 'https'
"},{"location":"configuration/#PAPERLESS_ACCOUNT_EMAIL_VERIFICATION","title":"PAPERLESS_ACCOUNT_EMAIL_VERIFICATION=<string>
","text":"Determines whether email addresses are verified during signup (as performed by Django allauth). See the relevant paperless settings and the allauth docs
Defaults to 'optional'
Note
If you do not have a working email server set up you should set this to 'none'.
"},{"location":"configuration/#PAPERLESS_DISABLE_REGULAR_LOGIN","title":"PAPERLESS_DISABLE_REGULAR_LOGIN=<bool>
","text":"Disables the regular frontend username / password login, e.g. once you have set up SSO. Note that this setting does not disable the Django admin login nor logging in with local credentials via the API. To prevent access to the Django admin, consider blocking /admin/ in your web server or reverse proxy configuration.
You can optionally also automatically redirect users to the SSO login with PAPERLESS_REDIRECT_LOGIN_TO_SSO.
Defaults to False
"},{"location":"configuration/#PAPERLESS_REDIRECT_LOGIN_TO_SSO","title":"PAPERLESS_REDIRECT_LOGIN_TO_SSO=<bool>
","text":"When this setting is enabled users will automatically be redirected (using javascript) to the first SSO provider login. You may still want to disable the frontend login form for clarity.
Defaults to False
"},{"location":"configuration/#PAPERLESS_ACCOUNT_SESSION_REMEMBER","title":"PAPERLESS_ACCOUNT_SESSION_REMEMBER=<bool>
","text":"If false, sessions expire when the browser is closed. If true, sessions use PAPERLESS_SESSION_COOKIE_AGE for expiration. See the corresponding django-allauth documentation.
Defaults to True
"},{"location":"configuration/#PAPERLESS_SESSION_COOKIE_AGE","title":"PAPERLESS_SESSION_COOKIE_AGE=<int>
","text":"Login session cookie expiration. Applies if PAPERLESS_ACCOUNT_SESSION_REMEMBER
is enabled. See the corresponding django documentation
Defaults to 1209600 (2 weeks)
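The value is given in seconds. A short Python calculation shows where the default comes from and how another duration could be derived; the twelve-hour figure is only an example.

```python
from datetime import timedelta

# Two weeks expressed in seconds, matching the documented default.
default_age = int(timedelta(weeks=2).total_seconds())
print(default_age)  # 1209600

# Hypothetical alternative: expire sessions after 12 hours.
twelve_hours = int(timedelta(hours=12).total_seconds())
print(twelve_hours)  # 43200
```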
"},{"location":"configuration/#ocr","title":"OCR settings","text":"Paperless uses OCRmyPDF for performing OCR on documents and images. Paperless uses sensible defaults for most settings, but all of them can be configured to your needs.
"},{"location":"configuration/#PAPERLESS_OCR_LANGUAGE","title":"PAPERLESS_OCR_LANGUAGE=<lang>
","text":"Customize the language that paperless will attempt to use when parsing documents.
It should be a 3-letter code, see the list of languages Tesseract supports.
Set this to the language most of your documents are written in.
This can be a combination of multiple languages such as deu+eng, in which case Tesseract will use whatever language matches best. Keep in mind that Tesseract uses much more CPU time with multiple languages enabled.
If you are including languages that are not installed by default, you will need to also set PAPERLESS_OCR_LANGUAGES for docker deployments or install the tesseract language packages manually for bare metal installations.
Defaults to \"eng\".
Note
If your language contains a '-' such as chi-sim, you must use chi_sim.
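The two formatting rules above (languages joined with +, hyphens written as underscores) can be expressed as a small Python sketch. The helper normalize_ocr_language is hypothetical and does not check whether a language is actually installed.

```python
def normalize_ocr_language(value):
    # Hypothetical helper: split on +, replace hyphens with underscores,
    # and join the codes again.
    codes = [code.strip().replace('-', '_') for code in value.split('+')]
    if any(code == '' for code in codes):
        raise ValueError('empty language code in ' + repr(value))
    return '+'.join(codes)


print(normalize_ocr_language('deu+eng'))
print(normalize_ocr_language('chi-sim'))
```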
"},{"location":"configuration/#PAPERLESS_OCR_MODE","title":"PAPERLESS_OCR_MODE=<mode>
","text":"Tell paperless when and how to perform OCR on your documents. Three modes are available:
skip: Paperless skips pages that already contain text and performs OCR only on pages where no text is present. This is the safest option.
redo: Paperless will OCR all pages of your documents and attempt to replace any existing text layers with new text. This will be useful for documents from scanners that already performed OCR with insufficient results. It will also perform OCR on purely digital documents.
This option may fail on some documents that have features that cannot be removed, such as forms. In this case, the text from the document is used instead.
force: Paperless rasterizes your documents, converting any text into images and puts the OCRed text on top. This works for all documents, however, the resulting document may be significantly larger and text won't appear as sharp when zoomed in.
The default is skip, which only performs OCR when necessary and always creates archived documents.
Read more about this in the OCRmyPDF documentation.
"},{"location":"configuration/#PAPERLESS_OCR_SKIP_ARCHIVE_FILE","title":"PAPERLESS_OCR_SKIP_ARCHIVE_FILE=<mode>
","text":"Specify when you would like paperless to skip creating an archived version of your documents. This is useful if you don't want to have two almost-identical versions of your documents in the media folder.
never: Never skip creating an archived version.
with_text: Skip creating an archived version for documents that already have embedded text.
always: Always skip creating an archived version.
The default is never.
"},{"location":"configuration/#PAPERLESS_OCR_CLEAN","title":"PAPERLESS_OCR_CLEAN=<mode>
","text":"Tells paperless to use unpaper to clean any input document before sending it to tesseract. This uses more resources, but generally results in better OCR results. The following modes are available:
clean: Apply unpaper.
clean-final: Apply unpaper, and use the cleaned images to build the output file instead of the original images.
none: Do not apply unpaper.
Defaults to clean.
Note
clean-final is incompatible with ocr mode redo. When both clean-final and the ocr mode redo are configured, clean is used instead.
"},{"location":"configuration/#PAPERLESS_OCR_DESKEW","title":"PAPERLESS_OCR_DESKEW=<bool>
","text":"Tells paperless to correct skewing (slight rotation of input images mainly due to improper scanning).
Defaults to true, which enables this feature.
Note
Deskewing is incompatible with ocr mode redo. Deskewing will get disabled automatically if redo is used as the ocr mode.
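The interaction between the ocr mode redo and the clean and deskew options, as described in the notes above, can be summarised in a Python sketch. The function effective_ocr_options is hypothetical and only restates the documented fallbacks; it is not the Paperless implementation.

```python
def effective_ocr_options(mode, clean, deskew):
    # Hypothetical helper mirroring the two notes: with mode redo,
    # clean-final falls back to clean and deskewing is switched off.
    if mode == 'redo':
        if clean == 'clean-final':
            clean = 'clean'
        deskew = False
    return {'mode': mode, 'clean': clean, 'deskew': deskew}


print(effective_ocr_options('redo', 'clean-final', True))
print(effective_ocr_options('skip', 'clean-final', True))
```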
"},{"location":"configuration/#PAPERLESS_OCR_ROTATE_PAGES","title":"PAPERLESS_OCR_ROTATE_PAGES=<bool>
","text":"Tells paperless to correct page rotation (90\u00b0, 180\u00b0 and 270\u00b0 rotation).
If you notice that paperless is not rotating incorrectly rotated pages (or vice versa), try adjusting the threshold up or down (see below).
Defaults to true, which enables this feature.
"},{"location":"configuration/#PAPERLESS_OCR_ROTATE_PAGES_THRESHOLD","title":"PAPERLESS_OCR_ROTATE_PAGES_THRESHOLD=<num>
","text":"Adjust the threshold for automatic page rotation by PAPERLESS_OCR_ROTATE_PAGES. This is an arbitrary value reported by tesseract. \"15\" is a very conservative value, whereas \"2\" is a very aggressive option and will often result in correctly rotated pages being rotated as well.
Defaults to \"12\".
"},{"location":"configuration/#PAPERLESS_OCR_OUTPUT_TYPE","title":"PAPERLESS_OCR_OUTPUT_TYPE=<type>
","text":"Specify the type of PDF documents that paperless should produce.
pdf: Modify the PDF document as little as possible.
pdfa: Convert PDF documents into PDF/A-2b documents, which is a subset of the entire PDF specification and meant for storing documents long term.
pdfa-1, pdfa-2, pdfa-3 to specify the exact version of PDF/A you wish to use.
If not specified, pdfa is used. Remember that paperless also keeps the original input file as well as the archived version.
"},{"location":"configuration/#PAPERLESS_OCR_PAGES","title":"PAPERLESS_OCR_PAGES=<num>
","text":"Tells paperless to use only the specified number of pages for OCR. Documents with fewer pages than specified get OCR'ed completely.
Specifying 1 here will only use the first page.
The value must be greater than or equal to 1 to be used.
When combined with PAPERLESS_OCR_MODE=redo or PAPERLESS_OCR_MODE=force, paperless will not modify any text it finds on excluded pages and copy it verbatim.
Defaults to unset, which disables this feature and always uses all pages.
"},{"location":"configuration/#PAPERLESS_OCR_IMAGE_DPI","title":"PAPERLESS_OCR_IMAGE_DPI=<num>
","text":"Paperless will OCR any images you put into the system and convert them into PDF documents. This is useful if your scanner produces images. In order to do so, paperless needs to know the DPI of the image. Most images from scanners will have this information embedded and paperless will detect and use that information. In case this fails, it uses this value as a fallback.
Set this to the DPI your scanner produces images at.
Defaults to unset, which will automatically calculate image DPI so that the produced PDF documents are A4 sized.
"},{"location":"configuration/#PAPERLESS_OCR_MAX_IMAGE_PIXELS","title":"PAPERLESS_OCR_MAX_IMAGE_PIXELS=<num>
","text":"Paperless will raise a warning when OCRing images which are over this limit and will not OCR images which are more than twice this limit. Note this does not prevent the document from being consumed, but could result in missing text content.
If unset, will default to the value determined by Pillow.
Setting this value to 0 will entirely disable the limit. See the below warning.
Note
Increasing this limit could cause Paperless to consume additional resources when consuming a file. Be sure you have sufficient system resources.
Warning
The limit is intended to prevent malicious files from consuming system resources and causing crashes and other errors. Only change this value if you are certain your documents are not malicious and you need the text which was not OCRed
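The thresholds described above can be written down as a Python sketch. The function classify_image and its return labels are hypothetical; the sketch only restates the documented rules (warning above the limit, no OCR above twice the limit, 0 disables the limit).

```python
def classify_image(pixel_count, limit):
    # Hypothetical helper; a limit of 0 disables the check entirely.
    if limit == 0 or pixel_count <= limit:
        return 'ok'
    # Above the limit but at most twice the limit: warn and still OCR.
    if pixel_count <= 2 * limit:
        return 'warning'
    # More than twice the limit: the image is not OCRed.
    return 'not OCRed'


print(classify_image(150, 100))
```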
"},{"location":"configuration/#PAPERLESS_OCR_COLOR_CONVERSION_STRATEGY","title":"PAPERLESS_OCR_COLOR_CONVERSION_STRATEGY=<RGB>
","text":"Controls the Ghostscript color conversion strategy when creating the archive file. This setting will only be utilized if the output is a version of PDF/A.
Valid options are CMYK, Gray, LeaveColorUnchanged, RGB or UseDeviceIndependentColor.
You can find more on the settings here in the Ghostscript documentation.
Warning
Utilizing some of the options may result in errors when creating archive files from PDFs.
"},{"location":"configuration/#PAPERLESS_OCR_USER_ARGS","title":"PAPERLESS_OCR_USER_ARGS=<json>
","text":"OCRmyPDF offers many more options. Use this parameter to specify any additional arguments you wish to pass to OCRmyPDF. Since Paperless uses the API of OCRmyPDF, you have to specify these in a format that can be passed to the API. See the API reference of OCRmyPDF for valid parameters. All command line options are supported, but they use underscores instead of dashes.
Warning
Paperless has been tested to work with the OCR options provided above. There are many options that are incompatible with each other, so specifying invalid options may prevent paperless from consuming any documents. Use with caution!
Specify arguments as a JSON dictionary. Keep note of lower case booleans and double quoted parameter names and strings. Examples:
{\"deskew\": true, \"optimize\": 3, \"unpaper_args\": \"--pre-rotate 90\"}\n
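If hand-writing the JSON is error-prone, the value can be generated from a Python dictionary. This sketch reuses the options from the example above and only shows the formatting; whether a given option combination works is not checked.

```python
import json

# The same options as in the example above, written as a Python dictionary.
user_args = {'deskew': True, 'optimize': 3, 'unpaper_args': '--pre-rotate 90'}

# json.dumps emits lower-case booleans and double-quoted names and strings.
env_value = json.dumps(user_args)
print('PAPERLESS_OCR_USER_ARGS=' + env_value)
```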
"},{"location":"configuration/#software_tweaks","title":"Software tweaks","text":""},{"location":"configuration/#PAPERLESS_TASK_WORKERS","title":"PAPERLESS_TASK_WORKERS=<num>
","text":"Paperless does multiple things in the background: Maintain the search index, maintain the automatic matching algorithm, check emails, consume documents, etc. This variable specifies how many things it will do in parallel.
Defaults to 1
"},{"location":"configuration/#PAPERLESS_THREADS_PER_WORKER","title":"PAPERLESS_THREADS_PER_WORKER=<num>
","text":"Furthermore, paperless uses multiple threads when consuming documents to speed up OCR. This variable specifies how many pages paperless will process in parallel on a single document.
Warning
Ensure that the product PAPERLESS_TASK_WORKERS * PAPERLESS_THREADS_PER_WORKER does not exceed your CPU core count or else paperless will be extremely slow. If you want paperless to process many documents in parallel, choose a high worker count. If you want paperless to process very large documents faster, use a higher thread per worker count.
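The constraint in the warning above is easy to verify before changing either value. A minimal Python sketch, in which the function name check_parallelism is made up and the core count falls back to what the operating system reports:

```python
import os


def check_parallelism(task_workers, threads_per_worker, cpu_cores=None):
    # Hypothetical check: the product must not exceed the CPU core count.
    if cpu_cores is None:
        cpu_cores = os.cpu_count() or 1
    return task_workers * threads_per_worker <= cpu_cores


print(check_parallelism(2, 4, cpu_cores=8))  # True
print(check_parallelism(3, 4, cpu_cores=8))  # False
```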
The default is a balance between the two, according to your CPU core count, with a slight favor towards threads per worker:
CPU core count | Workers | Threads
> 1 | > 1 | > 1
> 2 | > 2 | > 1
> 4 | > 2 | > 2
> 6 | > 2 | > 3
> 8 | > 2 | > 4
> 12 | > 3 | > 4
> 16 | > 4 | > 4
If you only specify PAPERLESS_TASK_WORKERS, paperless will adjust PAPERLESS_THREADS_PER_WORKER automatically.
"},{"location":"configuration/#PAPERLESS_WORKER_TIMEOUT","title":"PAPERLESS_WORKER_TIMEOUT=<num>
","text":"Machines with few cores or weak ones might not be able to finish OCR on large documents within the default 1800 seconds. So extending this timeout may prove to be useful on weak hardware setups."},{"location":"configuration/#PAPERLESS_TIME_ZONE","title":"PAPERLESS_TIME_ZONE=<timezone>
","text":"Set the time zone here. See the Django project documentation for details on why and how to set it.
Defaults to UTC.
"},{"location":"configuration/#PAPERLESS_ENABLE_NLTK","title":"PAPERLESS_ENABLE_NLTK=<bool>
","text":"Enables or disables the advanced natural language processing used during automatic classification. If disabled, paperless will still perform some basic text pre-processing before matching. See also PAPERLESS_NLTK_DIR.
Defaults to 1.
"},{"location":"configuration/#PAPERLESS_EMAIL_TASK_CRON","title":"PAPERLESS_EMAIL_TASK_CRON=<cron expression>
","text":"Configures the scheduled email fetching frequency. The value should be a valid crontab(5) expression describing when to run. If set to the string \"disable\", no emails will be fetched automatically.
Defaults to */10 * * * * or every ten minutes.
"},{"location":"configuration/#PAPERLESS_TRAIN_TASK_CRON","title":"PAPERLESS_TRAIN_TASK_CRON=<cron expression>
","text":"Configures the scheduled automatic classifier training frequency. The value should be a valid crontab(5) expression describing when to run. If set to the string \"disable\", the classifier will not be trained automatically.
Defaults to 5 */1 * * * or every hour at 5 minutes past the hour.
"},{"location":"configuration/#PAPERLESS_INDEX_TASK_CRON","title":"PAPERLESS_INDEX_TASK_CRON=<cron expression>
","text":"Configures the scheduled search index update frequency. The value should be a valid crontab(5) expression describing when to run. If set to the string \"disable\", the search index will not be automatically updated.
Defaults to 0 0 * * * or daily at midnight.
"},{"location":"configuration/#PAPERLESS_SANITY_TASK_CRON","title":"PAPERLESS_SANITY_TASK_CRON=<cron expression>
","text":"Configures the scheduled sanity checker frequency. If set to the string \"disable\", the sanity checker will not run automatically.
Defaults to 30 0 * * sun or Sunday at 30 minutes past midnight.
"},{"location":"configuration/#PAPERLESS_ENABLE_COMPRESSION","title":"PAPERLESS_ENABLE_COMPRESSION=<bool>
","text":"Enables compression of the responses from the webserver. Defaults to 1, enabling compression.
Note
If you are using a proxy such as nginx, it is likely more efficient to enable compression in your proxy configuration rather than the webserver
"},{"location":"configuration/#PAPERLESS_CONVERT_MEMORY_LIMIT","title":"PAPERLESS_CONVERT_MEMORY_LIMIT=<num>
","text":"On smaller systems, or even in the case of Very Large Documents, the consumer may explode, complaining about how it's \"unable to extend pixel cache\". In such cases, try setting this to a reasonably low value, like 32. The default is to use whatever is necessary to do everything without writing to disk, and units are in megabytes.
For more information on how to use this value, you should search the web for \"MAGICK_MEMORY_LIMIT\".
Defaults to 0, which disables the limit.
"},{"location":"configuration/#PAPERLESS_CONVERT_TMPDIR","title":"PAPERLESS_CONVERT_TMPDIR=<path>
","text":"Similar to the memory limit, if you've got a small system and your OS mounts /tmp as tmpfs, you should set this to a path that's on a physical disk, like /home/your_user/tmp or something. ImageMagick will use this as scratch space when crunching through very large documents.
For more information on how to use this value, you should search the web for \"MAGICK_TMPDIR\".
Default is none, which disables the temporary directory.
"},{"location":"configuration/#PAPERLESS_APPS","title":"PAPERLESS_APPS=<string>
","text":"A comma-separated list of Django apps to be included in Django's INSTALLED_APPS
. This setting should be used with caution!
Defaults to None, which does not add any additional apps.
"},{"location":"configuration/#PAPERLESS_MAX_IMAGE_PIXELS","title":"PAPERLESS_MAX_IMAGE_PIXELS=<number>
","text":"Configures the maximum size of an image PIL will allow to load without warning or error. If unset, will default to the value determined by Pillow.
Defaults to None, which does not change the limit.
Warning
This limit is designed to prevent denial of service from malicious files. It should only be raised or disabled in certain circumstances and with great care.
"},{"location":"configuration/#consume_config","title":"Document Consumption","text":""},{"location":"configuration/#PAPERLESS_CONSUMER_DISABLE","title":"PAPERLESS_CONSUMER_DISABLE=<bool>
","text":"Completely disable the directory-based consumer in docker. If you don't plan to consume documents via the consumption directory, you can disable the consumer to save resources."},{"location":"configuration/#PAPERLESS_CONSUMER_DELETE_DUPLICATES","title":"PAPERLESS_CONSUMER_DELETE_DUPLICATES=<bool>
","text":"When the consumer detects a duplicate document, it will not touch the original document. This default behavior can be changed here.
Defaults to false.
"},{"location":"configuration/#PAPERLESS_CONSUMER_RECURSIVE","title":"PAPERLESS_CONSUMER_RECURSIVE=<bool>
","text":"Enable recursive watching of the consumption directory. Paperless will then pick up files from subdirectories within your consumption directory as well.
Defaults to false.
"},{"location":"configuration/#PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS","title":"PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS=<bool>
","text":"Set the names of subdirectories as tags for consumed files. E.g. <CONSUMPTION_DIR>/foo/bar/file.pdf will add the tags \"foo\" and \"bar\" to the consumed file. Paperless will create any tags that don't exist yet.
This is useful for sorting documents with certain tags such as car or todo prior to consumption. These folders won't be deleted.
PAPERLESS_CONSUMER_RECURSIVE must be enabled for this to work.
Defaults to false.
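The mapping from subdirectories to tags described above can be illustrated in Python. The helper tags_from_path and the /consume directory are hypothetical; the sketch only shows which path components would become tags.

```python
from pathlib import PurePosixPath


def tags_from_path(consumption_dir, file_path):
    # Hypothetical helper: every directory between the consumption directory
    # and the file itself becomes one tag.
    relative = PurePosixPath(file_path).relative_to(consumption_dir)
    return list(relative.parts[:-1])


print(tags_from_path('/consume', '/consume/foo/bar/file.pdf'))  # ['foo', 'bar']
```

A file placed directly in the consumption directory yields an empty list, so no tags are added.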
"},{"location":"configuration/#PAPERLESS_CONSUMER_IGNORE_PATTERNS","title":"PAPERLESS_CONSUMER_IGNORE_PATTERNS=<json>
","text":"By default, paperless ignores certain files and folders in the consumption directory, such as system files created by the Mac OS or hidden folders some tools use to store data.
This can be adjusted by configuring a custom json array with patterns to exclude.
For example, .DS_STORE/* will ignore any files found in a folder named .DS_STORE, including .DS_STORE/bar.pdf and foo/.DS_STORE/bar.pdf.
A pattern like ._* will ignore anything starting with ._, including: ._foo.pdf and ._bar/foo.pdf.
Defaults to [\".DS_Store\", \".DS_STORE\", \"._*\", \".stfolder/*\", \".stversions/*\", \".localized/*\", \"desktop.ini\", \"@eaDir/*\", \"Thumbs.db\"].
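The matching behaviour in the examples above can be approximated with the standard fnmatch module. This is a sketch, not the matcher Paperless uses: the function is_ignored is hypothetical, and testing each pattern against every trailing part of the path is an assumption chosen to reproduce the documented examples.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_ignored(relative_path, patterns):
    # Sketch only: test each pattern against every trailing part of the path,
    # so that a pattern can match a folder at any depth.
    parts = PurePosixPath(relative_path).parts
    suffixes = ['/'.join(parts[i:]) for i in range(len(parts))]
    return any(fnmatchcase(s, p) for s in suffixes for p in patterns)


patterns = ['.DS_STORE/*', '._*']
print(is_ignored('foo/.DS_STORE/bar.pdf', patterns))  # True
print(is_ignored('._bar/foo.pdf', patterns))          # True
print(is_ignored('invoices/2023.pdf', patterns))      # False
```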
"},{"location":"configuration/#PAPERLESS_CONSUMER_BARCODE_SCANNER","title":"PAPERLESS_CONSUMER_BARCODE_SCANNER=<string>
","text":"Sets the barcode scanner used for barcode functionality.
Currently, \"PYZBAR\" (the default) or \"ZXING\" can be selected. If your barcodes or QR codes are not detected (especially with bad scan quality and/or small codes), try the other one.
"},{"location":"configuration/#PAPERLESS_PRE_CONSUME_SCRIPT","title":"PAPERLESS_PRE_CONSUME_SCRIPT=<filename>
","text":"After some initial validation, Paperless can trigger an arbitrary script if you like before beginning consumption. This script will be provided data for it to work with via the environment.
For more information, take a look at pre-consumption script.
The default is blank, which means nothing will be executed.
"},{"location":"configuration/#PAPERLESS_POST_CONSUME_SCRIPT","title":"PAPERLESS_POST_CONSUME_SCRIPT=<filename>
","text":"After a document is consumed, Paperless can trigger an arbitrary script if you like. This script will be provided data for it to work with via the environment.
For more information, take a look at Post-consumption script.
The default is blank, which means nothing will be executed.
"},{"location":"configuration/#PAPERLESS_FILENAME_DATE_ORDER","title":"PAPERLESS_FILENAME_DATE_ORDER=<format>
","text":"Paperless will check the document text for document date information. Use this setting to enable checking the document filename for date information. The date order can be set to any option as specified in https://dateparser.readthedocs.io/en/latest/settings.html#date-order. The filename will be checked first, and if nothing is found, the document text will be checked as normal.
A date in a filename must have some separators (., ,, -, /, etc.) for it to be parsed.
Defaults to none, which disables this feature.
"},{"location":"configuration/#PAPERLESS_NUMBER_OF_SUGGESTED_DATES","title":"PAPERLESS_NUMBER_OF_SUGGESTED_DATES=<num>
","text":"Paperless searches an entire document for dates. The first date found will be used as the initial value for the created date. When this variable is greater than 0 (or left to its default value), paperless will also suggest other dates found in the document, up to a maximum of this setting. Note that duplicates will be removed, which can result in fewer dates displayed in the frontend than this setting value.
The task to find all dates can be time-consuming and increases with a higher (maximum) number of suggested dates and slower hardware.
Defaults to 3. Set to 0 to disable this feature.
"},{"location":"configuration/#PAPERLESS_THUMBNAIL_FONT_NAME","title":"PAPERLESS_THUMBNAIL_FONT_NAME=<filename>
","text":"Paperless creates thumbnails for plain text files by rendering the content of the file on an image and uses a predefined font for that. This font can be changed here.
Note that this won't have any effect on already generated thumbnails.
Defaults to /usr/share/fonts/liberation/LiberationSerif-Regular.ttf.
"},{"location":"configuration/#PAPERLESS_IGNORE_DATES","title":"PAPERLESS_IGNORE_DATES=<string>
","text":"Paperless parses a document's creation date from filename and file content. You may specify a comma separated list of dates that should be ignored during this process. This is useful for special dates (like date of birth) that appear in documents regularly but are very unlikely to be the document's creation date.
The dates are parsed using the order specified in PAPERLESS_DATE_ORDER.
Defaults to an empty string to not ignore any dates.
"},{"location":"configuration/#PAPERLESS_DATE_ORDER","title":"PAPERLESS_DATE_ORDER=<format>
","text":"Paperless will try to determine the document creation date from its contents. Specify the date format Paperless should expect to see within your documents.
This option defaults to DMY which translates to day first, month second, and year last order. Characters D, M, or Y can be shuffled to meet the required order.
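The effect of the date order is easiest to see on an ambiguous date. The following Python sketch uses only the standard library and a hypothetical helper parse_with_order; it is not how Paperless parses dates and handles only purely numeric dates with separators.

```python
import re
from datetime import date


def parse_with_order(text, order='DMY'):
    # Hypothetical parser: split on any non-digit separator and assign the
    # three numbers to day, month and year according to the order string.
    numbers = [int(n) for n in re.split('[^0-9]+', text) if n]
    if len(numbers) != 3 or sorted(order) != ['D', 'M', 'Y']:
        raise ValueError('expected three numbers and an order such as DMY')
    fields = dict(zip(order, numbers))
    return date(fields['Y'], fields['M'], fields['D'])


print(parse_with_order('01.02.2023', 'DMY'))  # 2023-02-01
print(parse_with_order('01.02.2023', 'MDY'))  # 2023-01-02
```

The same text yields the first of February under DMY and the second of January under MDY, which is why the order should match the documents you consume.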
"},{"location":"configuration/#PAPERLESS_ENABLE_GPG_DECRYPTOR","title":"PAPERLESS_ENABLE_GPG_DECRYPTOR=<bool>
","text":"Enable or disable the GPG decryptor for encrypted emails. See GPG Decryptor for more information.
Defaults to false.
"},{"location":"configuration/#polling","title":"Polling","text":""},{"location":"configuration/#PAPERLESS_CONSUMER_POLLING","title":"PAPERLESS_CONSUMER_POLLING=<num>
","text":"If paperless doesn't find documents added to your consume folder, it might not be able to automatically detect filesystem changes. In that case, specify a polling interval in seconds here, which will then cause paperless to periodically check your consumption directory for changes. This will also disable listening for file system changes with inotify
.
Defaults to 0, which disables polling and uses filesystem notifications.
"},{"location":"configuration/#PAPERLESS_CONSUMER_POLLING_RETRY_COUNT","title":"PAPERLESS_CONSUMER_POLLING_RETRY_COUNT=<num>
","text":"If consumer polling is enabled, sets the maximum number of times paperless will check for a file to remain unmodified. If a file's modification time and size are identical for two consecutive checks, it will be consumed.
Defaults to 5.
"},{"location":"configuration/#PAPERLESS_CONSUMER_POLLING_DELAY","title":"PAPERLESS_CONSUMER_POLLING_DELAY=<num>
","text":"If consumer polling is enabled, sets the delay in seconds between each check (above) paperless will do while waiting for a file to remain unmodified.
Defaults to 5.
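The retry count and delay work together as a simple stability check. The following is a minimal sketch under stated assumptions, not the actual consumer code: wait_until_stable is a hypothetical name, and the loop simply compares modification time and size between consecutive checks.

```python
import os
import time


def wait_until_stable(path, retry_count=5, delay=5.0):
    # Hypothetical helper: returns True once two consecutive checks see the
    # same modification time and size, False if the retries run out first.
    last = None
    for _ in range(retry_count):
        stat = os.stat(path)
        current = (stat.st_mtime, stat.st_size)
        if current == last:
            return True
        last = current
        time.sleep(delay)
    return False
```

With the defaults, a file that is no longer being written is considered ready after the second check, i.e. roughly one delay period after it is first seen.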
"},{"location":"configuration/#inotify","title":"iNotify","text":""},{"location":"configuration/#PAPERLESS_CONSUMER_INOTIFY_DELAY","title":"PAPERLESS_CONSUMER_INOTIFY_DELAY=<num>
","text":"Sets the time in seconds the consumer will wait for additional events from inotify before the consumer will consider a file ready and begin consumption. Certain scanners or network setups may generate multiple events for a single file, leading to multiple consumers working on the same file. Configure this to prevent that.
Defaults to 0.5 seconds.
"},{"location":"configuration/#incoming_mail","title":"Incoming Mail","text":""},{"location":"configuration/#email_oauth","title":"Email OAuth","text":""},{"location":"configuration/#PAPERLESS_OAUTH_CALLBACK_BASE_URL","title":"PAPERLESS_OAUTH_CALLBACK_BASE_URL=<str>
","text":"The base URL for the OAuth callback. This is used to construct the full URL for the OAuth callback. This should be the URL that the Paperless instance is accessible at. If not set, defaults to the PAPERLESS_URL
setting. At least one of these settings must be set to enable OAuth Email setup.
Defaults to none (thus will use PAPERLESS_URL).
Note
This setting only applies to OAuth Email setup (not to the SSO setup).
"},{"location":"configuration/#PAPERLESS_GMAIL_OAUTH_CLIENT_ID","title":"PAPERLESS_GMAIL_OAUTH_CLIENT_ID=<str>
","text":"The OAuth client ID for Gmail. This is required for Gmail OAuth Email setup. See OAuth Email Setup for more information.
Defaults to none.
"},{"location":"configuration/#PAPERLESS_GMAIL_OAUTH_CLIENT_SECRET","title":"PAPERLESS_GMAIL_OAUTH_CLIENT_SECRET=<str>
","text":"The OAuth client secret for Gmail. This is required for Gmail OAuth Email setup. See OAuth Email Setup for more information.
Defaults to none.
"},{"location":"configuration/#PAPERLESS_OUTLOOK_OAUTH_CLIENT_ID","title":"PAPERLESS_OUTLOOK_OAUTH_CLIENT_ID=<str>
","text":"The OAuth client ID for Outlook. This is required for Outlook OAuth Email setup. See OAuth Email Setup for more information.
Defaults to none.
"},{"location":"configuration/#PAPERLESS_OUTLOOK_OAUTH_CLIENT_SECRET","title":"PAPERLESS_OUTLOOK_OAUTH_CLIENT_SECRET=<str>
","text":"The OAuth client secret for Outlook. This is required for Outlook OAuth Email setup. See OAuth Email Setup for more information.
Defaults to none.
"},{"location":"configuration/#encrypted_emails","title":"Encrypted Emails","text":""},{"location":"configuration/#PAPERLESS_EMAIL_GNUPG_HOME","title":"PAPERLESS_EMAIL_GNUPG_HOME=<str>
","text":"Optional, sets the GNUPG_HOME
path to use with GPG decryptor for encrypted emails. See GPG Decryptor for more information. If not set, defaults to the default GNUPG_HOME
path.
Defaults to not set."},{"location":"configuration/#barcodes","title":"Barcodes","text":""},{"location":"configuration/#PAPERLESS_CONSUMER_ENABLE_BARCODES","title":"PAPERLESS_CONSUMER_ENABLE_BARCODES=<bool>
","text":"
Enables the scanning and page separation based on detected barcodes. This allows for scanning and adding multiple documents per uploaded file, which are separated by one or multiple barcode pages.
For ease of use, it is suggested to use a standardized separation page, e.g. here.
If no barcodes are detected in the uploaded file, no page separation will happen.
The original document will be removed and the separated pages will be saved as pdf.
See additional information in the advanced usage documentation
Defaults to false.
"},{"location":"configuration/#PAPERLESS_CONSUMER_BARCODE_TIFF_SUPPORT","title":"PAPERLESS_CONSUMER_BARCODE_TIFF_SUPPORT=<bool>
","text":"Whether TIFF image files should be scanned for barcodes. This will automatically convert any TIFF image(s) to pdfs for later processing. This only has an effect, if PAPERLESS_CONSUMER_ENABLE_BARCODES has been enabled.
Defaults to false.
"},{"location":"configuration/#PAPERLESS_CONSUMER_BARCODE_STRING","title":"PAPERLESS_CONSUMER_BARCODE_STRING=<string>
","text":"Defines the string to be detected as a separator barcode. If paperless is used with the PATCH-T separator pages, users shouldn't change this.
Defaults to \"PATCHT\"
"},{"location":"configuration/#PAPERLESS_CONSUMER_BARCODE_RETAIN_SPLIT_PAGES","title":"PAPERLESS_CONSUMER_BARCODE_RETAIN_SPLIT_PAGES=<bool>
","text":"If set to true, all pages that are split by a barcode (such as PATCHT) will be kept.
Defaults to false.
"},{"location":"configuration/#PAPERLESS_CONSUMER_ENABLE_ASN_BARCODE","title":"PAPERLESS_CONSUMER_ENABLE_ASN_BARCODE=<bool>
","text":"Enables the detection of barcodes in the scanned document and setting the ASN (archive serial number) if a properly formatted barcode is detected.
The barcode must consist of a (configurable) prefix and the ASN to be set, for instance ASN00123
. The content after the prefix is cleaned of non-numeric characters.
This option is compatible with barcode page separation, since pages will be split up before reading the ASN.
If no ASN barcodes are detected in the uploaded file, no ASN will be set. If a barcode with an existing ASN is detected, the document will not be consumed and an error logged.
Defaults to false.
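The prefix handling described above can be sketched like this. It is an illustration only: parse_asn is a hypothetical helper, not the function Paperless-ngx uses, and it assumes a barcode without the prefix is simply ignored.

```python
import re


def parse_asn(barcode_text, prefix='ASN'):
    # Only barcodes starting with the configured prefix are ASN barcodes.
    if not barcode_text.startswith(prefix):
        return None
    # Everything after the prefix is cleaned of non-numeric characters.
    digits = re.sub('[^0-9]', '', barcode_text[len(prefix):])
    return int(digits) if digits else None


asn = parse_asn('ASN00123')
```

Here the barcode ASN00123 yields the archive serial number 123, and a value such as ASN-00 123 would yield the same number after cleaning.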
"},{"location":"configuration/#PAPERLESS_CONSUMER_ASN_BARCODE_PREFIX","title":"PAPERLESS_CONSUMER_ASN_BARCODE_PREFIX=<string>
","text":"Defines the prefix that is used to identify a barcode as an ASN barcode.
Defaults to \"ASN\"
"},{"location":"configuration/#PAPERLESS_CONSUMER_BARCODE_UPSCALE","title":"PAPERLESS_CONSUMER_BARCODE_UPSCALE=<float>
","text":"Defines the upscale factor used in barcode detection. Upscaling the document before the detection process, e.g. with a value of 1.5, improves the detection of small barcodes. Upscaling only takes place if the value is greater than 1.0; otherwise it is skipped to save resources. Try using this in combination with PAPERLESS_CONSUMER_BARCODE_DPI set to a value higher than the default.
Defaults to 0.0
"},{"location":"configuration/#PAPERLESS_CONSUMER_BARCODE_DPI","title":"PAPERLESS_CONSUMER_BARCODE_DPI=<int>
","text":"During barcode detection, every page of a PDF document needs to be converted to an image. A DPI value can be specified for this conversion. If the detection of small barcodes fails, a higher DPI value, e.g. 600, can fix the issue. Try using this in combination with PAPERLESS_CONSUMER_BARCODE_UPSCALE set to a value greater than 1.0.
Defaults to \"300\"
"},{"location":"configuration/#PAPERLESS_CONSUMER_BARCODE_MAX_PAGES","title":"PAPERLESS_CONSUMER_BARCODE_MAX_PAGES=<int>
","text":"Because barcode detection is a computationally-intensive operation, this setting limits the detection of barcodes to a number of first pages. If your scanner has a limit for the number of pages that can be scanned it would be sensible to set this as the limit here.
Defaults to \"0\", allowing all pages to be checked for barcodes.
"},{"location":"configuration/#PAPERLESS_CONSUMER_ENABLE_TAG_BARCODE","title":"PAPERLESS_CONSUMER_ENABLE_TAG_BARCODE=<bool>
","text":"Enables the detection of barcodes in the scanned document and assigns or creates tags if a properly formatted barcode is detected.
The barcode must match one of the (configurable) regular expressions. If the barcode text contains ',' (comma), it is split into multiple barcodes which are individually processed for tagging.
Matching is case insensitive.
Defaults to false.
"},{"location":"configuration/#PAPERLESS_CONSUMER_TAG_BARCODE_MAPPING","title":"PAPERLESS_CONSUMER_TAG_BARCODE_MAPPING=<json dict>
","text":"Defines a dictionary of filter regex and substitute expressions.
Syntax: {\"<regex>\": \"<substitute>\" [,...]}
A barcode is considered for tagging if the barcode text matches at least one of the provided patterns.
If a match is found, the rule is applied. This allows very versatile reformatting and mapping of barcode pattern to tag values.
If a tag is not found it will be created.
Defaults to:
{\"TAG:(.*)\": \"\\\\g<1>\"}
which defines - a regex TAG:(.*) which includes barcodes beginning with TAG: followed by any text that gets stored into match group #1 and - a substitute \\\\g<1>
that replaces the original barcode text by the content in match group #1. Consequently, the tag is the barcode text without its TAG: prefix.
More examples:
{\"ASN12.*\": \"JOHN\", \"ASN13.*\": \"SMITH\"}
for example maps - ASN12nnnn barcodes to the tag JOHN and - ASN13nnnn barcodes to the tag SMITH.
{\"T-J\": \"JOHN\", \"T-S\": \"SMITH\", \"T-D\": \"DOE\"}
directly maps - T-J barcodes to the tag JOHN, - T-S barcodes to the tag SMITH and - T-D barcodes to the tag DOE.
Please refer to the Python regex documentation for more information.
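The mapping rules above can be tried out with a few lines of Python. This is a rough sketch, not the Paperless-ngx implementation: tags_from_barcode is a hypothetical helper, the mappings are written as Python dicts rather than JSON, and it assumes that the first matching pattern wins for each barcode.

```python
import re


def tags_from_barcode(barcode_text, mapping):
    # Hypothetical helper: returns the tag names derived from one barcode text.
    tags = []
    # A comma splits the text into several barcodes, handled individually.
    for raw in barcode_text.split(','):
        raw = raw.strip()
        for pattern, substitute in mapping.items():
            # Matching is case insensitive.
            if re.match(pattern, raw, flags=re.IGNORECASE):
                tags.append(re.sub(pattern, substitute, raw, flags=re.IGNORECASE))
                break  # assumption: only the first matching rule is applied
    return tags


default_mapping = {'TAG:(.*)': '\\g<1>'}
name_mapping = {'ASN12.*': 'JOHN', 'ASN13.*': 'SMITH'}

default_tags = tags_from_barcode('TAG:invoice,TAG:2024', default_mapping)
name_tags = tags_from_barcode('asn130001', name_mapping)
```

With the default mapping, the barcode text TAG:invoice,TAG:2024 produces the tags invoice and 2024, while a barcode that matches no pattern produces no tag at all.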
"},{"location":"configuration/#audit-trail","title":"Audit Trail","text":""},{"location":"configuration/#PAPERLESS_AUDIT_LOG_ENABLED","title":"PAPERLESS_AUDIT_LOG_ENABLED=<bool>
","text":"Enables the audit trail for documents, document types, correspondents, and tags.
Defaults to true.
"},{"location":"configuration/#collate","title":"Collate Double-Sided Documents","text":""},{"location":"configuration/#PAPERLESS_CONSUMER_ENABLE_COLLATE_DOUBLE_SIDED","title":"PAPERLESS_CONSUMER_ENABLE_COLLATE_DOUBLE_SIDED=<bool>
","text":"Enables automatic collation of two single-sided scans into a double-sided document.
This is useful if you have an automatic document feeder that only supports single-sided scans, but you need to scan a double-sided document. If your ADF supports double-sided scans natively, you do not need this feature.
PAPERLESS_CONSUMER_RECURSIVE
must be enabled for this to work.
For more information, read the corresponding section in the advanced documentation.
Defaults to false.
"},{"location":"configuration/#PAPERLESS_CONSUMER_COLLATE_DOUBLE_SIDED_SUBDIR_NAME","title":"PAPERLESS_CONSUMER_COLLATE_DOUBLE_SIDED_SUBDIR_NAME=<str>
","text":"The name of the subdirectory in which the collate feature expects documents to arrive.
This only has an effect if PAPERLESS_CONSUMER_ENABLE_COLLATE_DOUBLE_SIDED
has been enabled. Note that Paperless will not automatically create the directory.
Defaults to \"double-sided\".
"},{"location":"configuration/#PAPERLESS_CONSUMER_COLLATE_DOUBLE_SIDED_TIFF_SUPPORT","title":"PAPERLESS_CONSUMER_COLLATE_DOUBLE_SIDED_TIFF_SUPPORT=<bool>
","text":"Whether TIFF image files should be supported when collating documents. This will automatically convert any TIFF image(s) to pdfs for later processing. This only has an effect if PAPERLESS_CONSUMER_ENABLE_COLLATE_DOUBLE_SIDED
has been enabled.
Defaults to false.
"},{"location":"configuration/#trash","title":"Trash","text":""},{"location":"configuration/#PAPERLESS_EMPTY_TRASH_DELAY","title":"PAPERLESS_EMPTY_TRASH_DELAY=<num>
","text":"Sets how long in days documents remain in the 'trash' before they are permanently deleted.
Defaults to 30 days, minimum of 1 day.
"},{"location":"configuration/#PAPERLESS_EMPTY_TRASH_TASK_CRON","title":"PAPERLESS_EMPTY_TRASH_TASK_CRON=<cron expression>
","text":"Configures the schedule to empty the trash of expired deleted documents.
Defaults to 0 1 * * *
, once per day.
There are a few external software packages that Paperless expects to find on your system when it starts up. Unless you've done something creative with their installation, you probably won't need to edit any of these. However, if you've installed these programs somewhere where simply typing the name of the program doesn't automatically execute it (i.e. the program isn't in your $PATH), then you'll need to specify the literal path for that program.
"},{"location":"configuration/#PAPERLESS_CONVERT_BINARY","title":"PAPERLESS_CONVERT_BINARY=<path>
","text":"Defaults to \"convert\"."},{"location":"configuration/#PAPERLESS_GS_BINARY","title":"PAPERLESS_GS_BINARY=<path>
","text":"Defaults to \"gs\"."},{"location":"configuration/#docker","title":"Docker-specific options","text":"These options don't have any effect in paperless.conf
. These options adjust the behavior of the docker container. Configure these in docker-compose.env
.
PAPERLESS_WEBSERVER_WORKERS=<num>
","text":"The number of worker processes the webserver should spawn. More worker processes usually result in the front end loading data much quicker. However, each worker process also loads the entire application into memory separately, so increasing this value will increase RAM usage.
Defaults to 1.
Note
This option may also be set with GRANIAN_WORKERS
and this option may be removed in the future
PAPERLESS_BIND_ADDR=<ip address>
","text":"The IP address the webserver will listen on inside the container. There are special setups where you may need to configure this value to restrict the IP address or interface the webserver listens on.
Defaults to ::
, meaning all interfaces, including IPv6.
Note
This option may also be set with GRANIAN_HOST
and this option may be removed in the future
PAPERLESS_PORT=<port>
","text":"The port number the webserver will listen on inside the container. There are special setups where you may need this to avoid collisions with other services (like using podman with multiple containers in one pod).
Don't change this when using Docker. To change the port on which the webserver is reachable outside of the container, refer instead to the \"ports\" key in docker-compose.yml
.
Defaults to 8000.
Note
This option may also be set with GRANIAN_PORT
and this option may be removed in the future
USERMAP_UID=<uid>
","text":"The ID of the paperless user in the container. Set this to your actual user ID on the host system, which you can get by executing
id -u\n
Paperless will change ownership on its folders to this user, so you need to get this right in order to be able to write to the consumption directory.
Defaults to 1000.
"},{"location":"configuration/#USERMAP_GID","title":"USERMAP_GID=<gid>
","text":"The ID of the paperless Group in the container. Set this to your actual group ID on the host system, which you can get by executing
id -g\n
Paperless will change ownership on its folders to this group, so you need to get this right in order to be able to write to the consumption directory.
Defaults to 1000.
"},{"location":"configuration/#PAPERLESS_OCR_LANGUAGES","title":"PAPERLESS_OCR_LANGUAGES=<list>
","text":"Additional OCR languages to install. By default, paperless comes with English, German, Italian, Spanish and French. If your language is not in this list, install additional languages with this configuration option. You will need to find the right LangCodes but note that tesseract-ocr-* package names do not always correspond with the language codes e.g. \"chi_tra\" should be specified as \"chi-tra\".
PAPERLESS_OCR_LANGUAGES=tur ces chi-tra\n
Make sure it's a space-separated list when using several values.
To actually use these languages, also set the default OCR language of paperless:
PAPERLESS_OCR_LANGUAGE=tur\n
Defaults to none, which does not install any additional languages.
Warning
This option must not be used in rootless containers.
"},{"location":"configuration/#PAPERLESS_ENABLE_FLOWER","title":"PAPERLESS_ENABLE_FLOWER=<defined>
","text":"If this environment variable is defined, the Celery monitoring tool Flower will be started by the container.
You can read more about this in the advanced documentation.
"},{"location":"configuration/#PAPERLESS_SUPERVISORD_WORKING_DIR","title":"PAPERLESS_SUPERVISORD_WORKING_DIR=<defined>
","text":"Warning
This option is deprecated and has no effect. For read only file system support,\nsee [S6_READ_ONLY_ROOT](https://github.com/just-containers/s6-overlay#customizing-s6-overlay-behaviour)\nfrom s6-overlay.\n
"},{"location":"configuration/#frontend-settings","title":"Frontend Settings","text":""},{"location":"configuration/#PAPERLESS_APP_TITLE","title":"PAPERLESS_APP_TITLE=<str>
","text":"If set, overrides the default name \"Paperless-ngx\""},{"location":"configuration/#PAPERLESS_APP_LOGO","title":"PAPERLESS_APP_LOGO=<path>
","text":"Path to an image file in the /media/logo directory, must include 'logo', e.g. /logo/Atari_logo.svg
"},{"location":"configuration/#PAPERLESS_ENABLE_UPDATE_CHECK","title":"PAPERLESS_ENABLE_UPDATE_CHECK=<bool>
","text":"Note
This setting was deprecated in favor of a frontend setting after v1.9.2. A one-time migration is performed for users who have this setting set. This setting is always ignored if the corresponding frontend setting has been set.
"},{"location":"configuration/#email-sending","title":"Email sending","text":"Setting an SMTP server for the backend will allow you to use the Email workflow action, send documents from the UI as well as reset your password. All of these options come from their similarly-named Django settings
"},{"location":"configuration/#PAPERLESS_EMAIL_HOST","title":"PAPERLESS_EMAIL_HOST=<str>
","text":"Defaults to 'localhost'."},{"location":"configuration/#PAPERLESS_EMAIL_PORT","title":"PAPERLESS_EMAIL_PORT=<int>
","text":"Defaults to port 25."},{"location":"configuration/#PAPERLESS_EMAIL_HOST_USER","title":"PAPERLESS_EMAIL_HOST_USER=<str>
","text":"Defaults to ''."},{"location":"configuration/#PAPERLESS_EMAIL_FROM","title":"PAPERLESS_EMAIL_FROM=<str>
","text":"Defaults to PAPERLESS_EMAIL_HOST_USER if not set."},{"location":"configuration/#PAPERLESS_EMAIL_HOST_PASSWORD","title":"PAPERLESS_EMAIL_HOST_PASSWORD=<str>
","text":"Defaults to ''."},{"location":"configuration/#PAPERLESS_EMAIL_USE_TLS","title":"PAPERLESS_EMAIL_USE_TLS=<bool>
","text":"Defaults to false."},{"location":"configuration/#PAPERLESS_EMAIL_USE_SSL","title":"PAPERLESS_EMAIL_USE_SSL=<bool>
","text":"Defaults to false."},{"location":"development/","title":"Development","text":"This section describes the steps you need to take to start development on Paperless-ngx.
Check out the source from GitHub. The repository is organized in the following way:
main
always represents the latest release and will only see changes when a new release is made.
dev
contains the code that will be in the next release.
feature-X
contains bigger changes that will be in some release, but not necessarily the next one.
When making functional changes to Paperless-ngx, always make your changes on the dev
branch.
Apart from that, the folder structure is as follows:
docs/
- Documentation.
src-ui/
- Code of the front end.
src/
- Code of the back end.
scripts/
- Various scripts that help with different parts of development.
docker/
- Files required to build the docker image.
Maybe you've been using Paperless-ngx for a while and want to add a feature or two, or maybe you've come across a bug that you have some ideas how to solve. The beauty of open source software is that you can see what's wrong and help to get it fixed for everyone!
Before contributing please review our code of conduct and other important information in the contributing guidelines.
"},{"location":"development/#code-formatting-with-pre-commit-hooks","title":"Code formatting with pre-commit hooks","text":"To ensure a consistent style and formatting across the project source, the project utilizes Git pre-commit
hooks to perform some formatting and linting before a commit is allowed. That way, everyone uses the same style and some common issues can be caught early on.
Once installed, hooks will run when you commit. If the formatting isn't quite right or a linter catches something, the commit will be rejected. You'll need to look at the output and fix the issue. Some hooks, such as the Python linting and formatting tool ruff
, will format failing files, so all you need to do is git add
those files again and retry your commit.
After you have forked and cloned the code from GitHub, you need to perform a first-time setup.
Note
Every command is executed directly from the root folder of the project unless specified otherwise.
Install prerequisites + uv as mentioned in Bare metal route.
Copy paperless.conf.example
to paperless.conf
and enable debug mode within the file via PAPERLESS_DEBUG=true
.
Create consume
and media
directories:
mkdir -p consume media\n
Install the Python dependencies:
$ uv sync --group dev\n
Install pre-commit hooks:
$ uv run pre-commit install\n
Apply migrations and create a superuser (also can be done via the web UI) for your development instance:
# src/\n\n$ uv run manage.py migrate\n$ uv run manage.py createsuperuser\n
You can now either ...
install redis or
use the included scripts/start_services.sh
to use docker to fire up a redis instance (and some other services such as tika, gotenberg and a database server) or
spin up a bare redis container
docker run -d -p 6379:6379 --restart unless-stopped redis:latest\n
Continue with either back-end or front-end development \u2013 or both :-).
The back end is a Django application. PyCharm as well as Visual Studio Code work well for development, but you can use whatever you want.
Configure the IDE to use the src/
-folder as the base source folder. Configure the following launch configurations in your IDE:
python3 manage.py runserver
python3 manage.py document_consumer
celery --app paperless worker -l DEBUG
(or any other log level)To start them all:
# src/\n\n$ python3 manage.py runserver & \\\n python3 manage.py document_consumer & \\\n celery --app paperless worker -l DEBUG\n
You might need the front end to test your back end code. This assumes that you have Angular installed on your system. Go to the Front end development section for further details. To build the front end once, use this command:
# src-ui/\n\n$ pnpm install\n$ ng build --configuration production\n
"},{"location":"development/#testing","title":"Testing","text":"pytest
in the src/
directory to execute all tests. This also generates an HTML coverage report. When running tests, paperless.conf
is loaded as well. However, the tests rely on the default configuration. This is not ideal, but for now, make sure no settings except for DEBUG are overridden when testing.
Note
The line length rule E501 is generally useful for getting multiple source files next to each other on the screen. However, in some cases, it's just not possible to make some lines fit, especially complicated IF cases. Append # noqa: E501
to disable this check for certain lines.
Paperless uses uv
to manage packages and virtual environments for both development and production. To accomplish some common tasks using uv
, follow the shortcuts below:
To upgrade all locked packages to the latest allowed versions: uv lock --upgrade
To upgrade a single locked package: uv lock --upgrade-package <package>
To add a new package: uv add <package>
To add a new development package: uv add --dev <package>
The front end is built using Angular. In order to get started, you need Node.js (version 14.15+) and pnpm
.
Note
The following commands are all performed in the src-ui
-directory. You will need a running back end (including an active session) to connect to the back end API. To spin it up refer to the commands under the section above.
Install the Angular CLI. You might need sudo privileges to perform this command:
pnpm install -g @angular/cli\n
Make sure that it's on your path.
Install all necessary modules:
pnpm install\n
You can launch a development server by running:
ng serve\n
This will automatically update whenever you save. However, in-place compilation might fail on syntax errors, in which case you need to restart it.
By default, the development server is available on http://localhost:4200/
and is configured to access the API at http://localhost:8000/api/
, which is the default of the backend. If you enabled DEBUG
on the back end, several security overrides for allowed hosts and CORS are in place so that the front end behaves exactly as in production.
The front end code (.ts, .html, .scss) uses prettier
for code formatting via the Git pre-commit
hooks which run automatically on commit. See above for installation instructions. You can also run this via the CLI with a command such as
$ git ls-files -- '*.ts' | xargs pre-commit run prettier --files\n
Front end testing uses Jest and Playwright. Unit tests and e2e tests, respectively, can be run non-interactively with:
$ ng test\n$ npx playwright test\n
Playwright also includes a UI which can be run with:
$ npx playwright test --ui\n
"},{"location":"development/#building-the-frontend","title":"Building the frontend","text":"In order to build the front end and serve it as part of Django, execute:
$ ng build --configuration production\n
This will build the front end and put it in a location from which the Django server will serve it as static content. This way, you can verify that authentication is working.
"},{"location":"development/#localization","title":"Localization","text":"Paperless-ngx is available in many different languages. Since Paperless-ngx consists of both a Django application and an Angular front end, both of these parts have to be translated separately.
"},{"location":"development/#front-end-localization","title":"Front end localization","text":"src-ui/messages.xlf
.src-ui/src/locale/
folder.ng extract-i18n
.Adding new languages requires adding the translated files in the src-ui/src/locale/
folder and adjusting a couple files.
Adjust src-ui/angular.json
:
\"i18n\": {\n \"sourceLocale\": \"en-US\",\n \"locales\": {\n \"de\": \"src/locale/messages.de.xlf\",\n \"nl-NL\": \"src/locale/messages.nl_NL.xlf\",\n \"fr\": \"src/locale/messages.fr.xlf\",\n \"en-GB\": \"src/locale/messages.en_GB.xlf\",\n \"pt-BR\": \"src/locale/messages.pt_BR.xlf\",\n \"language-code\": \"language-file\"\n }\n}\n
Add the language to the LANGUAGE_OPTIONS
array in src-ui/src/app/services/settings.service.ts
:
`dateInputFormat` is a special string that defines the behavior of\nthe date input fields and absolutely needs to contain \"dd\", \"mm\"\nand \"yyyy\".\n
Import and register the Angular data for this locale in src-ui/src/app/app.module.ts
:
import localeDe from '@angular/common/locales/de'\nregisterLocaleData(localeDe)\n
A majority of the strings that appear in the back end appear only when the admin is used. However, some of these are still shown on the front end (such as error messages).
src/locale/
.python3 manage.py makemessages -l en_US
. This is important after making changes to translatable strings.python3 manage.py compilemessages
to do this. The generated files don't get committed into git, since these are derived artifacts. The build pipeline takes care of executing this command.Adding new languages requires adding the translated files in the src/locale/
-folder and adjusting the file src/paperless/settings.py
to include the new language:
LANGUAGES = [\n (\"en-us\", _(\"English (US)\")),\n (\"en-gb\", _(\"English (GB)\")),\n (\"de\", _(\"German\")),\n (\"nl-nl\", _(\"Dutch\")),\n (\"fr\", _(\"French\")),\n (\"pt-br\", _(\"Portuguese (Brazil)\")),\n # Add language here.\n]\n
"},{"location":"development/#building-the-documentation","title":"Building the documentation","text":"The documentation is built using material-mkdocs, see their documentation. If you want to build the documentation locally, this is how you do it:
Build the documentation
$ uv run mkdocs build --config-file mkdocs.yml\n
alternatively...
Serve the documentation. This will spin up a copy of the documentation at http://127.0.0.1:8000 that will automatically refresh every time you change something.
$ uv run mkdocs serve\n
The docker image is primarily built by the GitHub actions workflow, but it can be faster when developing to build and tag an image locally.
Make sure you have the docker-buildx
package installed. Building the image works as with any image:
docker build --file Dockerfile --tag paperless:local .\n
"},{"location":"development/#extending-paperless-ngx","title":"Extending Paperless-ngx","text":"Paperless-ngx does not have any fancy plugin systems and will probably never have. However, some parts of the application have been designed to allow easy integration of additional features without any modification to the base code.
"},{"location":"development/#making-custom-parsers","title":"Making custom parsers","text":"Paperless-ngx uses parsers to add documents. A parser is responsible for:
Custom parsers can be added to Paperless-ngx to support more file types. In order to do that, you need to write the parser itself and announce its existence to Paperless-ngx.
The parser itself must extend documents.parsers.DocumentParser
and must implement the methods parse
and get_thumbnail
. You can provide your own implementation to get_date
if you don't want to rely on Paperless-ngx's default date guessing mechanisms.
class MyCustomParser(DocumentParser):\n\n def parse(self, document_path, mime_type):\n # This method does not return anything. Rather, you should assign\n # whatever you got from the document to the following fields:\n\n # The content of the document.\n self.text = \"content\"\n\n # Optional: path to a PDF document that you created from the original.\n self.archive_path = os.path.join(self.tempdir, \"archived.pdf\")\n\n # Optional: \"created\" date of the document.\n self.date = get_created_from_metadata(document_path)\n\n def get_thumbnail(self, document_path, mime_type):\n # This should return the path to a thumbnail you created for this\n # document.\n return os.path.join(self.tempdir, \"thumb.webp\")\n
If you encounter any issues during parsing, raise a documents.parsers.ParseError
.
The self.tempdir
directory is a temporary directory that is guaranteed to be empty and removed after consumption finished. You can use that directory to store any intermediate files and also use it to store the thumbnail / archived document.
After that, you need to announce your parser to Paperless-ngx. You need to connect a handler to the document_consumer_declaration
signal. Have a look in the file src/paperless_tesseract/apps.py
on how that's done. The handler is a method that returns information about your parser:
def myparser_consumer_declaration(sender, **kwargs):\n return {\n \"parser\": MyCustomParser,\n \"weight\": 0,\n \"mime_types\": {\n \"application/pdf\": \".pdf\",\n \"image/jpeg\": \".jpg\",\n }\n }\n
parser
is a reference to a class that extends DocumentParser
.
weight
is used whenever two or more parsers are able to parse a file: The parser with the higher weight wins. This can be used to override the parsers provided by Paperless-ngx.
mime_types
is a dictionary. The keys are the mime types your parser supports and the value is the default file extension that Paperless-ngx should use when storing files and serving them for download. We could guess that from the file extensions, but some mime types have many extensions associated with them and the Python methods responsible for guessing the extension do not always return the same value.
Another easy way to get started with development is to use Visual Studio Code devcontainers. This approach will create a preconfigured development environment with all of the required tools and dependencies. Learn more about devcontainers. The .devcontainer/vscode/tasks.json and .devcontainer/vscode/launch.json files contain more information about the specific tasks and launch configurations (see the non-standard \"description\" field).
To get started:
Clone the repository on your machine and open the Paperless-ngx folder in VS Code.
VS Code will prompt you with \"Reopen in container\". Do so and wait for the environment to start.
Initialize the project by running the task Project Setup: Run all Init Tasks. This will initialize the database tables and create a superuser. Then you can compile the front end for production or run the frontend in debug mode.
The project is ready for debugging. Start either the fullstack debug or the individual debug processes. To spin up the project without debugging, run the task Project Start: Run all Services.
A: While Paperless-ngx is already considered largely \"feature-complete\", it is a community-driven project and development will be guided in this way. New features can be submitted via GitHub discussions and \"up-voted\" by the community, but this is not a guarantee that the feature will be implemented. This project will always be open to collaboration in the form of PRs, ideas etc.
"},{"location":"faq/#im-using-docker-where-are-my-documents","title":"I'm using docker. Where are my documents?","text":"A: By default, your documents are stored inside the docker volume paperless_media
. Docker manages this volume automatically for you. It is a persistent storage and will persist as long as you don't explicitly delete it. The actual location depends on your host operating system. On Linux, chances are high that this location is
/var/lib/docker/volumes/paperless_media/_data\n
Warning
Do not mess with this folder. Don't change permissions and don't move files around manually. This folder is meant to be entirely managed by docker and paperless.
Note
Files consumed from the consumption directory are re-created inside this media directory and are removed from the consumption directory itself.
"},{"location":"faq/#lets-say-i-want-to-switch-tools-in-a-year-can-i-easily-move-to-other-systems","title":"Let's say I want to switch tools in a year. Can I easily move to other systems?","text":"A: Your documents are stored as plain files inside the media folder. You can always drag those files out of that folder to use them elsewhere. Here are a couple notes about that.
A: Currently, the following files are supported:
Paperless-ngx determines the type of a file by inspecting its content. The file extensions do not matter.
"},{"location":"faq/#will-paperless-ngx-run-on-raspberry-pi","title":"Will paperless-ngx run on Raspberry Pi?","text":"A: The short answer is yes. I've tested it on a Raspberry Pi 3 B. The long answer is that certain parts of Paperless will run very slow, such as the OCR. On Raspberry Pi, try to OCR documents before feeding them into paperless so that paperless can reuse the text. The web interface is a lot snappier, since it runs in your browser and paperless has to do much less work to serve the data.
Note
You can adjust some of the settings so that paperless uses less processing power. See setup for details.
"},{"location":"faq/#how-do-i-install-paperless-ngx-on-raspberry-pi","title":"How do I install paperless-ngx on Raspberry Pi?","text":"A: Docker images are available for arm64 hardware, so just follow the Docker Compose instructions. Apart from more required disk space compared to a bare metal installation, docker comes with close to zero overhead, even on Raspberry Pi.
If you decide to go with the bare metal route, be aware that some of the python requirements do not have precompiled packages for ARM / ARM64. Installation of these will require additional development libraries and compilation will take a long time.
Note
For ARMv7 (32-bit) systems, paperless may still function, but it could require modifications to the Dockerfile (if using Docker) or additional tools for installing bare metal. It is suggested to upgrade to arm64 instead.
"},{"location":"faq/#how-do-i-run-this-on-unraid","title":"How do I run this on Unraid?","text":"A: Paperless-ngx is available as community app in Unraid. Uli Fahrer created a container template for that.
"},{"location":"faq/#how-do-i-run-this-on-my-toaster","title":"How do I run this on my toaster?","text":"A: I honestly don't know! As for all other devices that might be able to run paperless, you're a bit on your own. If you can't run the docker image, the documentation has instructions for bare metal installs.
"},{"location":"faq/#what-about-the-redis-licensing-change-and-using-one-of-the-open-source-forks","title":"What about the Redis licensing change and using one of the open source forks?","text":"Currently (October 2024), forks of Redis such as Valkey or Redirect are not officially supported by our upstream libraries, so using one of these to replace Redis is not officially supported.
However, they do claim to be compatible with the Redis protocol and will likely work. We will not officially move away from Redis as the broker just yet.
"},{"location":"setup/","title":"Setup","text":""},{"location":"setup/#installation","title":"Installation","text":"You can go multiple routes to setup and run Paperless:
The Docker routes are quick & easy. These are the recommended routes. This configures all the stuff from the above automatically so that it just works and uses sensible defaults for all configuration options. Here you find a cheat-sheet for docker beginners: CLI Basics
The bare metal route is complicated to setup but makes it easier should you want to contribute some code back. You need to configure and run the above mentioned components yourself.
"},{"location":"setup/#docker_script","title":"Use the Installation Script","text":"Paperless provides an interactive installation script to setup a Docker Compose installation. The script asks for a couple configuration options, and will then create the necessary configuration files, pull the docker image, start Paperless-ngx and create your superuser account. The script essentially automatically performs the steps described in Docker setup.
Make sure that Docker and Docker Compose are installed.
Download and run the installation script:
bash -c \"$(curl --location --silent --show-error https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/install-paperless-ngx.sh)\"\n
Note
macOS users will need to install gnu-sed with support for running as sed
as well as wget.
Make sure that Docker and Docker Compose are installed.
Go to the /docker/compose directory on the project page and download one of the docker-compose.*.yml
files, depending on which database backend you want to use. Place the file in a local directory and rename it docker-compose.yml
. Download the docker-compose.env
file and the .env
file as well in the same directory.
If you want to enable optional support for Office and other documents, download a file with -tika
in the file name.
Tip
For new installations, it is recommended to use PostgreSQL as the database backend.
Modify docker-compose.yml
as needed. For example, you may want to change the paths to the consumption, media etc. directories to use 'bind mounts'. Find the line that specifies where to mount the directory, e.g.:
- ./consume:/usr/src/paperless/consume\n
Replace the part before the colon with a local directory of your choice:
- /home/jonaswinkler/paperless-inbox:/usr/src/paperless/consume\n
You may also want to change the default port that the webserver will use from the default (8000) to something else, e.g. for port 8010:
ports:\n - 8010:8000\n
Rootless
Warning
It is currently not possible to run the container rootless if additional languages are specified via PAPERLESS_OCR_LANGUAGES
.
If you want to run Paperless as a rootless container, you will need to do the following in your docker-compose.yml
:
user
running the container to map to the paperless
user in the container. This value (user_id
below), should be the same id that USERMAP_UID
and USERMAP_GID
are set to in the next step. See USERMAP_UID
and USERMAP_GID
here.
Your entry for Paperless should contain something like:
webserver:\n image: ghcr.io/paperless-ngx/paperless-ngx:latest\n user: <user_id>\n
Modify docker-compose.env
with any configuration options you'd like. See the configuration documentation for all options.
You may also need to set USERMAP_UID
and USERMAP_GID
to the uid and gid of your user on the host system. Use id -u
and id -g
to get these. This ensures that both the container and the host user have write access to the consumption directory. If your UID and GID on the host system are 1000 (the default for the first normal user on most systems), it will work out of the box without any modifications. Run id \"username\"
to check.
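As a minimal sketch of the lookup described above, the following Python snippet reads the current user's numeric ids (the same values that id -u and id -g print) and formats them as lines for docker-compose.env. It assumes a Unix-like host and is only an illustration, not part of Paperless-ngx.

```python
import os

# Numeric user and group id of the current host user (Unix only);
# these are the values that id -u and id -g print.
uid = os.getuid()
gid = os.getgid()

# Format them as lines that could be pasted into docker-compose.env.
env_lines = [f'USERMAP_UID={uid}', f'USERMAP_GID={gid}']
for line in env_lines:
    print(line)
```

Run it as the user who owns the consumption directory on the host.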
Note
You can utilize Docker secrets for configuration settings by appending _FILE
to configuration values. For example PAPERLESS_DBUSER
can be set using PAPERLESS_DBUSER_FILE=/var/run/secrets/password.txt
.
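The _FILE convention can be pictured with a short Python sketch. The helper read_setting below is hypothetical and is not how Paperless-ngx itself reads its configuration. It simply assumes that a NAME_FILE variable holds a path whose content is the value, and that it takes precedence over a plain NAME variable.

```python
import os
import tempfile


def read_setting(name, environ):
    # Hypothetical helper: prefer NAME_FILE (a path whose content is the
    # value) and fall back to the plain NAME variable.
    file_path = environ.get(name + '_FILE')
    if file_path:
        with open(file_path, encoding='utf-8') as handle:
            return handle.read().strip()
    return environ.get(name)


# Usage: simulate a mounted secret with a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    secret_file = os.path.join(tmp, 'dbuser.txt')
    with open(secret_file, 'w', encoding='utf-8') as handle:
        handle.write('paperless')
    from_file = read_setting('PAPERLESS_DBUSER', {'PAPERLESS_DBUSER_FILE': secret_file})
    from_env = read_setting('PAPERLESS_DBUSER', {'PAPERLESS_DBUSER': 'someone'})
    print(from_file, from_env)
```

Stripping whitespace is a design choice of this sketch, made so that a trailing newline in the secret file does not end up in the value.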
Warning
Some file systems such as NFS network shares don't support file system notifications with inotify
. When storing the consumption directory on such a file system, paperless will not pick up new files with the default configuration. You will need to use PAPERLESS_CONSUMER_POLLING
, which will disable inotify. See here.
Run docker compose pull
. This will pull the image from the GitHub container registry by default but you can change the image to pull from Docker Hub by changing the image
line to image: paperlessngx/paperless-ngx:latest
.
Run docker compose up -d
. This will create and start the necessary containers.
Congratulations! Your Paperless-ngx instance should now be accessible at http://127.0.0.1:8000
(or similar, depending on your configuration). When you first access the web interface, you will be prompted to create a superuser account.
Clone the entire repository of paperless:
git clone https://github.com/paperless-ngx/paperless-ngx\n
The main branch always reflects the latest stable version.
Copy one of the docker/compose/docker-compose.*.yml
to docker-compose.yml
in the root folder, depending on which database backend you want to use. Copy docker-compose.env
into the project root as well.
In the docker-compose.yml
file, find the line that instructs Docker Compose to pull the paperless image from the container registry:
webserver:\n image: ghcr.io/paperless-ngx/paperless-ngx:latest\n
and replace it with a line that instructs Docker Compose to build the image from the current working directory instead:
webserver:\n build:\n context: .\n
Follow the Docker setup above except when asked to run docker compose pull
to pull the image, run
docker compose build\n
instead to build the image.
Paperless runs on Linux only. The following procedure has been tested on a minimal installation of Debian/Buster, which is the current stable release at the time of writing. Windows is not and will never be supported.
Paperless requires Python 3. At this time, 3.10 - 3.12 are tested versions. Newer versions may work, but some dependencies may not fully support newer versions. Support for older Python versions may be dropped as they reach end of life or as newer versions are released, dependency support is confirmed, etc.
Install dependencies. Paperless requires the following packages.
python3
python3-pip
python3-dev
default-libmysqlclient-dev for MariaDB
pkg-config for mysqlclient (python dependency)
fonts-liberation for generating thumbnails for plain text files
imagemagick >= 6 for PDF conversion
gnupg for handling encrypted documents
libpq-dev for PostgreSQL
libmagic-dev for mime type detection
mariadb-client for MariaDB compile time
libzbar0 for barcode detection
poppler-utils for barcode detection
Use this list for your preferred package management:
python3 python3-pip python3-dev imagemagick fonts-liberation gnupg libpq-dev default-libmysqlclient-dev pkg-config libmagic-dev libzbar0 poppler-utils\n
These dependencies are required for OCRmyPDF, which is used for text recognition.
unpaper
ghostscript
icc-profiles-free
qpdf
liblept5
libxml2
pngquant (suggested for certain PDF image optimizations)
zlib1g
tesseract-ocr >= 4.0.0 for OCR
tesseract-ocr language packs (tesseract-ocr-eng, tesseract-ocr-deu, etc)
Use this list for your preferred package management:
unpaper ghostscript icc-profiles-free qpdf liblept5 libxml2 pngquant zlib1g tesseract-ocr\n
On Raspberry Pi, these libraries are required as well:
libatlas-base-dev
libxslt1-dev
mime-support
You will also need these for installing some of the python dependencies:
build-essential
python3-setuptools
python3-wheel
Use this list for your preferred package management:
build-essential python3-setuptools python3-wheel\n
Install redis
>= 6.0 and configure it to start automatically.
Optional. Install postgresql
and configure a database, user and password for paperless. If you do not wish to use PostgreSQL, MariaDB and SQLite are available as well.
Note
On bare-metal installations using SQLite, ensure the JSON1 extension is enabled. This is usually the case, but not always.
Create a system user with a new home folder under which you wish to run paperless.
adduser paperless --system --home /opt/paperless --group\n
Get the release archive from https://github.com/paperless-ngx/paperless-ngx/releases for example with
curl -O -L https://github.com/paperless-ngx/paperless-ngx/releases/download/v1.10.2/paperless-ngx-v1.10.2.tar.xz\n
Extract the archive with
tar -xf paperless-ngx-v1.10.2.tar.xz\n
and copy the contents to the home folder of the user you created before (/opt/paperless
).
Optional: If you cloned the git repo, you will have to compile the frontend yourself, see here and use the build
step, not serve
.
Configure paperless. See configuration for details. Edit the included paperless.conf
and adjust the settings to your needs. Required settings for getting paperless running are:
PAPERLESS_REDIS should point to your redis server, such as .
PAPERLESS_DBENGINE is optional and should be one of postgres, mariadb, or sqlite.
PAPERLESS_DBHOST should be the hostname on which your PostgreSQL server is running. Do not configure this to use SQLite instead. Also configure port, database name, user and password as necessary.
PAPERLESS_CONSUMPTION_DIR should point to a folder which paperless should watch for documents. You might want to have this somewhere else. Likewise, PAPERLESS_DATA_DIR and PAPERLESS_MEDIA_ROOT define where paperless stores its data. If you like, you can point both to the same directory.
PAPERLESS_SECRET_KEY should be a random sequence of characters. It's used for authentication. Failing to set it allows third parties to forge authentication credentials.
PAPERLESS_URL if you are behind a reverse proxy. This should point to your domain. Please see configuration for more information.
Many more adjustments can be made to paperless, especially the OCR part. The following options are recommended for everyone:
PAPERLESS_OCR_LANGUAGE to the language most of your documents are written in.
PAPERLESS_TIME_ZONE to your local time zone.
Warning
Ensure your Redis instance is secured.
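For the PAPERLESS_SECRET_KEY setting mentioned above, one possible way to produce a random sequence of characters is Python's standard secrets module. This is only a sketch. The length of 64 random bytes is an assumption made here, not a documented requirement.

```python
import secrets

# token_urlsafe(64) draws 64 random bytes from the operating system's
# cryptographically secure generator and encodes them as URL-safe text.
secret_key = secrets.token_urlsafe(64)

# Print a line that could be copied into paperless.conf.
print(f'PAPERLESS_SECRET_KEY={secret_key}')
```

Generate the key once and keep it private. Changing it later invalidates anything that was signed with the old key.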
Create the following directories if they are missing:
/opt/paperless/media
/opt/paperless/data
/opt/paperless/consume
Adjust as necessary if you configured different folders. Ensure that the paperless user has write permissions for every one of these folders with
ls -l -d /opt/paperless/media\n
If needed, change the owner with
sudo chown paperless:paperless /opt/paperless/media\nsudo chown paperless:paperless /opt/paperless/data\nsudo chown paperless:paperless /opt/paperless/consume\n
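The directory check above can also be scripted. The sketch below uses a hypothetical helper, missing_or_unwritable, that reports folders which do not exist or which the current user cannot write to. To be meaningful for a real installation it would have to run as the paperless user against the configured folders.

```python
import os
import tempfile


def missing_or_unwritable(paths):
    # Return the paths that are not existing directories or that the
    # current user cannot write to.
    problems = []
    for path in paths:
        if not os.path.isdir(path) or not os.access(path, os.W_OK):
            problems.append(path)
    return problems


# Usage with a throwaway directory instead of /opt/paperless.
with tempfile.TemporaryDirectory() as tmp:
    existing = os.path.join(tmp, 'media')
    os.makedirs(existing)
    absent = os.path.join(tmp, 'consume')
    result = missing_or_unwritable([existing, absent])
    print(result)
```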
Install python requirements from the requirements.txt
file.
sudo -Hu paperless pip3 install -r requirements.txt\n
This will install all python dependencies in the home directory of the new paperless user.
Tip
It is up to you whether to use a virtual environment for the Python dependencies. This is an alternative to the above and may require adjusting the example scripts to use the virtual environment paths.
Tip
If you use modern Python tooling, such as uv
, installation will not include dependencies for Postgres or Mariadb. You can select those extras with --extra <EXTRA>
or all with --all-extras.
Go to /opt/paperless/src
, and execute the following command:
# This creates the database schema.\nsudo -Hu paperless python3 manage.py migrate\n
When you first access the web interface you will be prompted to create a superuser account.
Optional: Test that paperless is working by executing
# Manually starts the webserver\nsudo -Hu paperless python3 manage.py runserver\n
and pointing your browser to http://localhost:8000 if accessing from the same device on which paperless is installed. If accessing from another machine, set up systemd services. You may need to set PAPERLESS_DEBUG=true
in order for the development server to work normally in your browser.
Warning
This is a development server which should not be used in production. It is not audited for security and performance is inferior to production ready web servers.
Tip
This will not start the consumer. Paperless does this in a separate process.
Setup systemd services to run paperless automatically. You may use the service definition files included in the scripts
folder as a starting point.
Paperless needs the webserver
script to run the webserver, the consumer
script to watch the input folder, taskqueue
for the background workers used to handle things like document consumption and the scheduler
script to run tasks such as email checking at certain times.
Note
The socket
script enables granian
to run on port 80 without root privileges. For this you need to uncomment the Require=paperless-webserver.socket
in the webserver
script and configure granian
to listen on port 80 (set GRANIAN_PORT
).
These services rely on redis and optionally the database server, but don't need to be started in any particular order. The example files depend on redis being started. If you use a database server, you should add additional dependencies.
Note
For instructions on using a reverse proxy, see the wiki.
Warning
If celery won't start (check with sudo systemctl status paperless-task-queue.service
for paperless-task-queue.service and paperless-scheduler.service), you need to change the path to celery in the service files. Example: ExecStart=/opt/paperless/.local/bin/celery --app paperless worker --loglevel INFO
Optional: Install a samba server and make the consumption folder available as a network share.
Configure ImageMagick to allow processing of PDF documents. Most distributions have this disabled by default, since PDF documents can contain malware. If you don't do this, paperless will fall back to ghostscript for certain steps such as thumbnail generation.
Edit /etc/ImageMagick-6/policy.xml
and adjust
<policy domain=\"coder\" rights=\"none\" pattern=\"PDF\" />\n
to
<policy domain=\"coder\" rights=\"read|write\" pattern=\"PDF\" />\n
Optional: Install the jbig2enc encoder. This will reduce the size of generated PDF documents. You'll most likely need to compile this by yourself, because this software has been patented until around 2017 and binary packages are not available for most distributions.
Optional: If using the NLTK machine learning processing (see PAPERLESS_ENABLE_NLTK
for details), download the NLTK data for the Snowball Stemmer, Stopwords and Punkt tokenizer to /usr/share/nltk_data
. Refer to the NLTK instructions for details on how to download the data.
Migration is possible both from Paperless-ng and directly from the 'original' Paperless.
"},{"location":"setup/#migrating-from-paperless-ng","title":"Migrating from Paperless-ng","text":"Paperless-ngx is meant to be a drop-in replacement for Paperless-ng and thus upgrading should be trivial for most users, especially when using docker. However, as with any major change, it is recommended to take a full backup first. Once you are ready, simply change the docker image to point to the new source. E.g. if using Docker Compose, edit docker-compose.yml
and change:
image: jonaswinkler/paperless-ng:latest\n
to
image: ghcr.io/paperless-ngx/paperless-ngx:latest\n
and then run docker compose up -d
which will pull the new image and recreate the container. That's it!
Users who installed with the bare-metal route should also update their Git clone to point to https://github.com/paperless-ngx/paperless-ngx
, e.g. using the command git remote set-url origin https://github.com/paperless-ngx/paperless-ngx
and then pull the latest version.
At its core, paperless-ngx is still paperless and fully compatible. However, some things have changed under the hood, so you need to adapt your setup depending on how you installed paperless.
This setup describes how to update an existing paperless Docker installation. The important things to keep in mind are as follows:
Migration to paperless-ngx is then performed in a few simple steps:
Stop paperless.
cd /path/to/current/paperless\ndocker compose down\n
Do a backup for two purposes: First, if something goes wrong, you still have your data. Second, if you don't like paperless-ngx, you can switch back to paperless.
Download the latest release of paperless-ngx. You can either go with the Docker Compose files from here or clone the repository to build the image yourself (see above). You can either replace your current paperless folder or put paperless-ngx in a different location.
Warning
Paperless-ngx includes a .env
file. This will set the project name for docker compose to paperless
, which will also define the name of the volumes by paperless-ngx. However, if you experience that paperless-ngx is not using your old paperless volumes, verify the names of your volumes with
docker volume ls | grep _data\n
and adjust the project name in the .env
file so that it matches the name of the volumes before the _data
part.
Download the docker-compose.sqlite.yml
file to docker-compose.yml
. If you want to switch to PostgreSQL, do that after you migrated your existing SQLite database.
Adjust docker-compose.yml
and docker-compose.env
to your needs. See Docker setup for details on which edits are advised.
Update paperless.
In order to find your existing documents with the new search feature, you need to invoke a one-time operation that will create the search index:
docker compose run --rm webserver document_index reindex\n
This will migrate your database and create the search index. After that, paperless will take care of maintaining the index by itself.
Start paperless-ngx.
docker compose up -d\n
This will run paperless in the background and automatically start it on system boot.
Paperless installed a permanent redirect to admin/
in your browser. This redirect is still in place and prevents access to the new UI. Clear your browsing cache in order to fix this.
Optionally, follow the instructions below to migrate your existing data to PostgreSQL.
As with any upgrades and large changes, it is highly recommended to create a backup before starting. This assumes the image was running using Docker Compose, but the instructions are translatable to Docker commands as well.
Update Redis configuration
If REDIS_URL
is already set, change it to PAPERLESS_REDIS
and continue to step 4.
Otherwise, in the docker-compose.yml
add a new service for Redis, following the example compose files
Set the environment variable PAPERLESS_REDIS
so it points to the new Redis container
Update user mapping
If set, change the environment variable PUID
to USERMAP_UID
If set, change the environment variable PGID
to USERMAP_GID
Update configuration paths
PAPERLESS_DATA_DIR
to /config
Update media paths
PAPERLESS_MEDIA_ROOT
to /data/media
Update timezone
PAPERLESS_TIME_ZONE
to the same value as TZ
Modify the image:
to point to ghcr.io/paperless-ngx/paperless-ngx:latest
or a specific version if preferred.
docker compose
.
The best way to migrate between database types is to perform an export and then import into a clean installation of Paperless-ngx.
"},{"location":"setup/#moving-back-to-paperless","title":"Moving back to Paperless","text":"Lets say you migrated to Paperless-ngx and used it for a while, but decided that you don't like it and want to move back (If you do, send me a mail about what part you didn't like!), you can totally do that with a few simple steps.
Paperless-ngx modified the database schema slightly, however, these changes can be reverted while keeping your current data, so that your current data will be compatible with original Paperless. Thumbnails were also changed from PNG to WEBP format and will need to be re-generated.
Execute this:
$ cd /path/to/paperless\n$ docker compose run --rm webserver migrate documents 0023\n
Or without docker:
$ cd /path/to/paperless/src\n$ python3 manage.py migrate documents 0023\n
After regenerating thumbnails, you'll need to clear your cookies (Paperless-ngx comes with updated dependencies that do cookie-processing differently) and probably your cache as well.
"},{"location":"setup/#less-powerful-devices","title":"Considerations for less powerful devices","text":"Paperless runs on Raspberry Pi. However, some things are rather slow on the Pi and configuring some options in paperless can help improve performance immensely:
PAPERLESS_CONSUMER_DISABLE
to true
.PAPERLESS_OCR_PAGES
to 1, so that paperless will only OCR the first page of your documents. In most cases, this page contains enough information to be able to find it.PAPERLESS_TASK_WORKERS
and PAPERLESS_THREADS_PER_WORKER
are configured to use all cores. The Raspberry Pi models 3 and up have 4 cores, meaning that paperless will use 2 workers and 2 threads per worker. This may result in sluggish response times during consumption, so you might want to lower these settings (example: 2 workers and 1 thread to always have some computing power left for other tasks).PAPERLESS_OCR_MODE
at its default value skip
and consider OCR'ing your documents before feeding them into paperless. Some scanners are able to do this!PAPERLESS_OCR_SKIP_ARCHIVE_FILE
to with_text
to skip archive file generation for already ocr'ed documents, or always
to skip it for all documents.PAPERLESS_OCR_CLEAN=none
. This will speed up OCR times and use less memory at the expense of slightly worse OCR results.PAPERLESS_WEBSERVER_WORKERS
to 1. This will save some memory.PAPERLESS_ENABLE_NLTK
to false, to disable the more advanced language processing, which can take more memory and processing time.
For details, refer to configuration.
Note
Updating the automatic matching algorithm takes quite a bit of time. However, the update mechanism checks if your data has changed before doing the heavy lifting. If you experience the algorithm taking too much cpu time, consider changing the schedule in the admin interface to daily. You can also manually invoke the task by changing the date and time of the next run to today/now.
The actual matching of the algorithm is fast and works on Raspberry Pi as well as on any other device.
"},{"location":"setup/#nginx","title":"Using nginx as a reverse proxy","text":"Please see the wiki for user-maintained documentation of using nginx with Paperless-ngx.
"},{"location":"setup/#security","title":"Enhancing security","text":"Please see the wiki for user-maintained documentation of how to configure security tools like Fail2ban with Paperless-ngx.
"},{"location":"troubleshooting/","title":"Troubleshooting","text":""},{"location":"troubleshooting/#no-files-are-added-by-the-consumer","title":"No files are added by the consumer","text":"Check for the following issues:
Ensure that the directory you're putting your documents in is the folder paperless is watching. With docker, this setting is performed in the docker-compose.yml
file. Without Docker, look at the CONSUMPTION_DIR
setting. Don't adjust this setting if you're using docker.
Ensure that redis is up and running. Paperless does its task processing asynchronously, and for documents to arrive at the task processor, it needs redis to run.
Ensure that the task processor is running. Docker does this automatically. Manually invoke the task processor by executing
celery --app paperless worker\n
Look at the output of paperless and inspect it for any errors.
Go to the admin interface, and check if there are failed tasks. If so, the tasks will contain an error message.
OCR for XX failed
","text":"If you find the OCR accuracy to be too low, and/or the document consumer warns that OCR for XX failed, but we're going to stick with what we've got since FORGIVING_OCR is enabled
, then you might need to install the Tesseract language files matching your document's languages.
As an example, if you are running Paperless-ngx from any Ubuntu or Debian box, and your documents are written in Spanish you may need to run:
apt-get install -y tesseract-ocr-spa\n
"},{"location":"troubleshooting/#consumer-fails-to-pickup-any-new-files","title":"Consumer fails to pickup any new files","text":"If you notice that the consumer will only pickup files in the consumption directory at startup, but won't find any other files added later, you will need to enable filesystem polling with the configuration option PAPERLESS_CONSUMER_POLLING
.
This will disable listening to filesystem changes with inotify and paperless will manually check the consumption directory for changes instead.
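To illustrate the difference from inotify, polling can be sketched as repeatedly listing the directory and comparing the result with what was seen before. The function poll_once below is a simplified, hypothetical illustration and not the consumer code that Paperless-ngx actually runs.

```python
import os
import tempfile


def poll_once(directory, seen):
    # Compare the current directory listing with the set of names seen so
    # far and return the names that are new, sorted for stable output.
    current = set(os.listdir(directory))
    new_files = sorted(current - seen)
    seen.update(current)
    return new_files


# Usage: a file added between two polls shows up exactly once.
with tempfile.TemporaryDirectory() as tmp:
    seen = set()
    first = poll_once(tmp, seen)
    open(os.path.join(tmp, 'scan.pdf'), 'w').close()
    second = poll_once(tmp, seen)
    third = poll_once(tmp, seen)
    print(first, second, third)
```

Because this approach relies only on listing the directory, it also works on file systems that do not deliver change notifications, at the cost of a delay of up to one polling interval.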
"},{"location":"troubleshooting/#paperless-always-redirects-to-admin","title":"Paperless always redirects to /admin","text":"You probably had the old paperless installed at some point. Paperless installed a permanent redirect to /admin in your browser, and you need to clear your browsing data / cache to fix that.
"},{"location":"troubleshooting/#operation-not-permitted","title":"Operation not permitted","text":"You might see errors such as:
chown: changing ownership of '../export': Operation not permitted\n
The container tries to set file ownership on the listed directories. This is required so that the user running paperless inside docker has write permissions to these folders. The error occurs when the ownership change is not allowed, for example when these directories point to NFS shares.
Ensure that chown
is possible on these directories.
This indicates that the Auto matching algorithm found no documents to learn from. This may have two reasons:
You may encounter warnings like this:
/usr/local/lib/python3.7/site-packages/sklearn/base.py:315:\nUserWarning: Trying to unpickle estimator CountVectorizer from version 0.23.2 when using version 0.24.0.\nThis might lead to breaking code or invalid results. Use at your own risk.\n
This happens when certain dependencies of paperless that are responsible for the auto matching algorithm are updated. After updating these, your current training data might not be compatible anymore. This can be ignored in most cases. This warning will disappear automatically when paperless updates the training data.
If you want to get rid of the warning or actually experience issues with automatic matching, delete the file classification_model.pickle
in the data directory and let paperless recreate it.
You may experience these errors when using the optional TIKA integration:
requests.exceptions.HTTPError: 504 Server Error: Gateway Timeout for url: http://gotenberg:3000/forms/libreoffice/convert\n
Gotenberg is a server that converts Office documents into PDF documents and has a default timeout of 30 seconds. When conversion takes longer, Gotenberg raises this error.
You can increase the timeout by configuring a command flag for Gotenberg (see also here). If using Docker Compose, this is achieved by the following configuration change in the docker-compose.yml
file:
# The gotenberg chromium route is used to convert .eml files. We do not\n# want to allow external content like tracking pixels or even javascript.\ncommand:\n - 'gotenberg'\n - '--chromium-disable-javascript=true'\n - '--chromium-allow-list=file:///tmp/.*'\n - '--api-timeout=60'\n
"},{"location":"troubleshooting/#permission-denied-errors-in-the-consumption-directory","title":"Permission denied errors in the consumption directory","text":"You might encounter errors such as:
The following error occurred while consuming document.pdf: [Errno 13] Permission denied: '/usr/src/paperless/src/../consume/document.pdf'\n
This happens when paperless does not have permission to delete files inside the consumption directory. Ensure that USERMAP_UID
and USERMAP_GID
are set to the user id and group id you use on the host operating system, if these are different from 1000
. See Docker setup.
Also ensure that you are able to read and write to the consumption directory on the host.
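The checks above can be sketched as a small diagnostic run on the host. This is a hypothetical helper and not part of Paperless-ngx; the default ids of 1000 mirror the value mentioned above, and the path is whatever directory you mount as the consumption directory.

```python
import os


def check_consume_dir(path, expected_uid=1000, expected_gid=1000):
    # Report whether the current process can read, write and traverse the
    # directory, and whether its owner matches the expected ids.
    info = os.stat(path)
    return {
        'readable': os.access(path, os.R_OK),
        'writable': os.access(path, os.W_OK),
        'traversable': os.access(path, os.X_OK),
        'uid_matches': info.st_uid == expected_uid,
        'gid_matches': info.st_gid == expected_gid,
    }
```

If uid_matches or gid_matches is False, the ids of the directory owner are candidates for USERMAP_UID and USERMAP_GID.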
"},{"location":"troubleshooting/#oserror-errno-19-no-such-device-when-consuming-files","title":"OSError: [Errno 19] No such device when consuming files","text":"If you experience errors such as:
File \"/usr/local/lib/python3.7/site-packages/whoosh/codec/base.py\", line 570, in open_compound_file\nreturn CompoundStorage(dbfile, use_mmap=storage.supports_mmap)\nFile \"/usr/local/lib/python3.7/site-packages/whoosh/filedb/compound.py\", line 75, in __init__\nself._source = mmap.mmap(fileno, 0, access=mmap.ACCESS_READ)\nOSError: [Errno 19] No such device\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\nFile \"/usr/local/lib/python3.7/site-packages/django_q/cluster.py\", line 436, in worker\nres = f(*task[\"args\"], **task[\"kwargs\"])\nFile \"/usr/src/paperless/src/documents/tasks.py\", line 73, in consume_file\noverride_tag_ids=override_tag_ids)\nFile \"/usr/src/paperless/src/documents/consumer.py\", line 271, in try_consume_file\nraise ConsumerError(e)\n
Paperless uses a search index to provide better and faster full text searching. This search index is stored inside the data
folder. The search index uses memory-mapped files (mmap). The above error indicates that paperless was unable to create and open these files.
This happens when you're trying to store the data directory on certain file systems (mostly network shares) that don't support memory-mapped files.
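To find out whether a given directory supports memory-mapped files, you can try to map a small file there. The sketch below is a hypothetical check and not something Paperless ships; it uses the same mmap call that appears in the traceback above.

```python
import mmap
import os
import tempfile


def supports_mmap(directory):
    # Try to memory-map a small temporary file created inside the directory.
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.write(fd, b'paperless')
        try:
            with mmap.mmap(fd, 0, access=mmap.ACCESS_READ):
                return True
        except OSError:
            return False
    finally:
        os.close(fd)
        os.remove(path)
```

A result of False for the data directory suggests moving it to a local file system.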
"},{"location":"troubleshooting/#web-ui-stuck-at-loading","title":"Web-UI stuck at \"Loading...\"","text":"This might have multiple reasons.
If you built the docker image yourself or deployed using the bare metal route, make sure that there are files in <paperless-root>/static/frontend/<lang-code>/
. If there are no files, make sure that you executed collectstatic
successfully, either manually or as part of the docker image build.
If the front end is still missing, make sure that the front end is compiled (files present in src/documents/static/frontend
). If it is not, you need to compile the front end yourself or download the release archive instead of cloning the repository.
You might find messages like these in your log files:
[WARNING] [paperless.parsing.tesseract] Error while reading metadata\n
This indicates that paperless failed to read PDF metadata from one of your documents. This happens when you open the affected documents in paperless for editing. Paperless will continue to work, and will simply not show the invalid metadata.
"},{"location":"troubleshooting/#consumer-fails-with-a-filenotfounderror","title":"Consumer fails with a FileNotFoundError","text":"You might find messages like these in your log files:
[ERROR] [paperless.consumer] Error while consuming document SCN_0001.pdf: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/ocrmypdf.io.yhk3zbv0/origin.pdf'\nTraceback (most recent call last):\n File \"/app/paperless/src/paperless_tesseract/parsers.py\", line 261, in parse\n ocrmypdf.ocr(**args)\n File \"/usr/local/lib/python3.8/dist-packages/ocrmypdf/api.py\", line 337, in ocr\n return run_pipeline(options=options, plugin_manager=plugin_manager, api=True)\n File \"/usr/local/lib/python3.8/dist-packages/ocrmypdf/_sync.py\", line 385, in run_pipeline\n exec_concurrent(context, executor)\n File \"/usr/local/lib/python3.8/dist-packages/ocrmypdf/_sync.py\", line 302, in exec_concurrent\n pdf = post_process(pdf, context, executor)\n File \"/usr/local/lib/python3.8/dist-packages/ocrmypdf/_sync.py\", line 235, in post_process\n pdf_out = metadata_fixup(pdf_out, context)\n File \"/usr/local/lib/python3.8/dist-packages/ocrmypdf/_pipeline.py\", line 798, in metadata_fixup\n with pikepdf.open(context.origin) as original, pikepdf.open(working_file) as pdf:\n File \"/usr/local/lib/python3.8/dist-packages/pikepdf/_methods.py\", line 923, in open\n pdf = Pdf._open(\nFileNotFoundError: [Errno 2] No such file or directory: '/tmp/ocrmypdf.io.yhk3zbv0/origin.pdf'\n
This probably indicates paperless tried to consume the same file twice. This can happen for a number of reasons, depending on how documents are placed into the consume folder. If paperless is using inotify (the default) to check for documents, try adjusting the inotify configuration. If polling is enabled, try adjusting the polling configuration.
"},{"location":"troubleshooting/#consumer-fails-waiting-for-file-to-remain-unmodified","title":"Consumer fails waiting for file to remain unmodified.","text":"You might find messages like these in your log files:
[ERROR] [paperless.management.consumer] Timeout while waiting on file /usr/src/paperless/src/../consume/SCN_0001.pdf to remain unmodified.\n
This indicates paperless timed out while waiting for the file to be completely written to the consume folder. Adjusting polling configuration values should resolve the issue.
Note
The user will need to manually move the file out of the consume folder and back in, for the initial failing file to be consumed.
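To illustrate what waiting for a file to remain unmodified means, the sketch below polls a file until its size and modification time stop changing. It is an assumption-laden illustration, not the actual Paperless consumer code; the function name, parameters and defaults are all hypothetical.

```python
import os
import time


def wait_until_unmodified(path, checks=3, delay=0.1, timeout=5.0):
    # Return True once size and mtime were identical for the given number
    # of consecutive comparisons, or False if the timeout passes first.
    deadline = time.monotonic() + timeout
    last = None
    stable = 0
    while time.monotonic() < deadline:
        info = os.stat(path)
        current = (info.st_size, info.st_mtime_ns)
        stable = stable + 1 if current == last else 0
        if stable >= checks:
            return True
        last = current
        time.sleep(delay)
    return False
```

A scanner that writes slowly over the network needs a longer delay or more checks, which is the kind of trade-off the polling configuration controls.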
"},{"location":"troubleshooting/#consumer-fails-reporting-os-reports-file-as-busy-still","title":"Consumer fails reporting \"OS reports file as busy still\".","text":"You might find messages like these in your log files:
[WARNING] [paperless.management.consumer] Not consuming file /usr/src/paperless/src/../consume/SCN_0001.pdf: OS reports file as busy still\n
This indicates paperless was unable to open the file, as the OS reported the file as still being in use. To prevent a crash, paperless did not try to consume the file. If paperless is using inotify (the default) to check for documents, try adjusting the inotify configuration. If polling is enabled, try adjusting the polling configuration.
Note
The user will need to manually move the file out of the consume folder and back in, for the initial failing file to be consumed.
"},{"location":"troubleshooting/#log-reports-creating-paperlesstask-failed","title":"Log reports \"Creating PaperlessTask failed\".","text":"You might find messages like these in your log files:
[ERROR] [paperless.management.consumer] Creating PaperlessTask failed: db locked\n
You are likely using an SQLite-based installation with an increased number of workers and are running into SQLite's concurrency limitations. Uploading or consuming multiple files at once results in many workers attempting to access the database simultaneously.
Consider switching to PostgreSQL if you will often process many documents at once. Otherwise, try tweaking the PAPERLESS_DB_TIMEOUT
setting to allow more time for the database to unlock. Additionally, you can change your SQLite database to use \"Write-Ahead Logging\". These changes may have minor performance implications but can help prevent database locking issues.
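Switching an SQLite database to write-ahead logging can be sketched with the Python standard library. This assumes Paperless is stopped while the command runs and that you pass the path of your own database file; the helper name is hypothetical.

```python
import sqlite3


def enable_wal(db_path):
    # Switch the database to write-ahead logging and return the mode
    # SQLite reports afterwards (expected to be 'wal' on success).
    conn = sqlite3.connect(db_path)
    try:
        (mode,) = conn.execute('PRAGMA journal_mode=WAL;').fetchone()
        return mode
    finally:
        conn.close()
```

The journal mode is stored in the database file, so it only needs to be set once. Make a backup before changing it.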
You are likely running Paperless-ngx on Kubernetes, which automatically creates an environment variable named ${serviceName}_PORT
. This is the same environment variable which is used by Paperless to optionally change the port granian listens on.
To fix this, set PAPERLESS_PORT
again to your desired port, or the default of 8000.
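For a Service named paperless, Kubernetes typically injects a value of the form tcp://<cluster-ip>:<port> into PAPERLESS_PORT, which is not a plain port number. The sketch below shows one way to detect such a value; the helper is hypothetical and does not reflect how Paperless itself reads the setting.

```python
def usable_port(value, default=8000):
    # Return the value as an integer port, or the default when it is
    # missing or not a plain number (e.g. a Kubernetes service URL).
    if value is not None and value.isdigit() and 0 < int(value) < 65536:
        return int(value)
    return default
```

For example, usable_port('tcp://10.0.0.11:8000') falls back to 8000, while usable_port('9000') returns 9000.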
You may see database log lines like:
ERROR: duplicate key value violates unique constraint \"documents_tag_name_uniq\"\nDETAIL: Key (name)=(NameF) already exists.\nSTATEMENT: INSERT INTO \"documents_tag\" (\"owner_id\", \"name\", \"match\", \"matching_algorithm\", \"is_insensitive\", \"color\", \"is_inbox_tag\") VALUES (NULL, 'NameF', '', 1, true, '#a6cee3', false) RETURNING \"documents_tag\".\"id\"\n
This can happen during heavy consumption when using polling. Paperless will handle it correctly and the file will still be consumed.
"},{"location":"troubleshooting/#consumption-fails-with-ghostscript-pdfa-rendering-failed","title":"Consumption fails with \"Ghostscript PDF/A rendering failed\"","text":"Newer versions of OCRmyPDF will fail if it encounters errors during processing. This is intentional as the output archive file may differ in unexpected or undesired ways from the original. As the logs indicate, if you encounter this error you can set PAPERLESS_OCR_USER_ARGS: '{\"continue_on_soft_render_error\": true}'
to try to 'force' processing documents with this issue.
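The value of PAPERLESS_OCR_USER_ARGS must be valid JSON, which is easy to get wrong by hand (lowercase true, double-quoted keys). A small sketch that builds the string with the standard library, using only the option named above:

```python
import json

user_args = {'continue_on_soft_render_error': True}
# json.dumps emits lowercase true and double-quoted keys, as JSON requires.
env_value = json.dumps(user_args)
print('PAPERLESS_OCR_USER_ARGS=' + env_value)
```

How the resulting string needs to be quoted depends on where you set it (shell, env file or Compose file).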
You may see errors when deleting documents like:
Data too long for column 'transaction_id' at row 1\n
This error can occur in installations which have upgraded from a version of Paperless-ngx that used Django 4 (Paperless-ngx versions prior to v2.13.0) with a MariaDB/MySQL database. Due to a backwards-incompatible change in Django 5, the column \"documents_document.transaction_id\" will need to be re-created, which can be done with a one-time run of the following management command:
$ python3 manage.py convert_mariadb_uuid\n
"},{"location":"troubleshooting/#platform-specific-deployment-troubleshooting","title":"Platform-Specific Deployment Troubleshooting","text":"A user-maintained wiki page is available to help troubleshoot issues that may arise when trying to deploy Paperless-ngx on specific platforms, for example SELinux. Please see the wiki.
"},{"location":"usage/","title":"Usage Overview","text":"Paperless-ngx is an application that manages your personal documents. With the (optional) help of a document scanner (see the scanners wiki), Paperless-ngx transforms your unwieldy physical documents into a searchable archive and provides many utilities for finding and managing your documents.
"},{"location":"usage/#terms-and-definitions","title":"Terms and definitions","text":"Paperless essentially consists of two different parts for managing your documents:
Each document has data fields that you can assign to them:
The web UI is the primary way to interact with Paperless-ngx. It is a single-page application that is built with modern web technologies and is designed to be fast and responsive. The web UI includes a robust interface for filtering, viewing, searching and editing documents. You can also manage tags, correspondents, document types, and other settings from the web UI.
The web UI also includes a 'tour' feature that can be accessed from the settings page or from the dashboard for new users. The tour highlights some of the key features of the web UI and can be useful for new users.
"},{"location":"usage/#dashboard","title":"Dashboard","text":"The dashboard is the first page you see when you log in. By default, it does not show any documents, but you can add saved views to the dashboard to show documents that match certain criteria. The dashboard also includes a button to upload documents to Paperless-ngx but you can also drag and drop files anywhere in the app to initiate the consumption process.
"},{"location":"usage/#document-list","title":"Document List","text":"The document list is the primary way to view and interact with your documents. You can filter the list by tags, correspondents, document types, and other criteria. You can also edit documents in bulk including assigning tags, correspondents, document types, and custom fields. Selecting document(s) from the list will allow you to perform the various bulk edit operations. The document list also includes a search bar that allows you to search for documents by title, ASN, and use advanced search syntax.
"},{"location":"usage/#document-detail","title":"Document Detail","text":"The document detail page shows all the information about a single document. You can view the document, edit its metadata, assign tags, correspondents, document types, and custom fields. You can also view the document history, download the document or share it via a share link.
"},{"location":"usage/#management-lists","title":"Management Lists","text":"Paperless-ngx includes management lists for tags, correspondents, document types and more. These areas allow you to view, add, edit, delete and manage permissions for these objects. You can also manage saved views, mail accounts, mail rules, workflows and more from the management sections.
"},{"location":"usage/#adding-documents-to-paperless-ngx","title":"Adding documents to Paperless-ngx","text":"Once you've got Paperless setup, you need to start feeding documents into it. When adding documents to paperless, it will perform the following operations on your documents:
Tip
This process can be configured to fit your needs. If you don't want paperless to create archived versions for digital documents, you can do so by setting PAPERLESS_OCR_SKIP_ARCHIVE_FILE=with_text
. Please read the relevant section in the documentation.
Note
No matter which options you choose, Paperless will always store the original document that it found in the consumption directory or in the mail and will never overwrite that document (except when using certain document actions, which make that clear). Archived versions are stored alongside the original versions. Any files found in the consumption directory will be stored inside the Paperless-ngx file structure and will not be retained in the consumption directory.
"},{"location":"usage/#the-consumption-directory","title":"The consumption directory","text":"The primary method of getting documents into your database is by putting them in the consumption directory. The consumer waits patiently, looking for new additions to this directory. When it finds them, the consumer goes about the process of parsing them with the OCR, indexing what it finds, and storing it in the media directory. You should think of this folder as a temporary location, as files will be re-created inside Paperless-ngx and removed from the consumption folder.
Getting stuff into this directory is up to you. If you're running Paperless on your local computer, you might just want to drag and drop files there, but if you're running this on a server and want your scanner to automatically push files to this directory, you'll need to set up some sort of service to accept the files from the scanner. Typically, you're looking at an FTP server like Proftpd or a Windows folder share with Samba.
Warning
Files found in the consumption directory that are consumed will be removed from the consumption directory and stored inside the Paperless-ngx file structure using any settings / storage paths you have specified. This action is performed as safely as possible but this means it is expected that files in the consumption directory will no longer exist (there) after being consumed.
"},{"location":"usage/#web-ui-upload","title":"Web UI Upload","text":"The dashboard has a button to upload documents to paperless or you can simply drag a file anywhere into the app to initiate the consumption process.
"},{"location":"usage/#usage-mobile_upload","title":"Mobile upload","text":"Please see the wiki for a user-maintained list of related projects and software (e.g. for mobile devices) that is compatible with Paperless-ngx.
"},{"location":"usage/#incoming-mail","title":"Incoming Email","text":"You can tell paperless-ngx to consume documents from your email accounts. This is a very flexible and powerful feature, if you regularly received documents via mail that you need to archive. The mail consumer can be configured via the frontend settings (/settings/mail) in the following manner:
These rules perform the following:
Paperless will check all emails only once and completely ignore messages that do not match your filters. It will also only perform the rule action on e-mails that it has consumed documents from. The filename attachment patterns can include wildcards and multiple patterns separated by a comma.
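The idea of comma-separated wildcard patterns for attachment file names can be sketched with the fnmatch module from the standard library. This is an illustration only: the helper is hypothetical, and details such as case handling are choices made for this sketch rather than documented Paperless behavior.

```python
from fnmatch import fnmatch


def attachment_matches(filename, patterns):
    # patterns is a comma-separated string such as '*.pdf, invoice_*'.
    # Matching here ignores case; that is a choice made for this sketch.
    name = filename.lower()
    return any(
        fnmatch(name, pattern.strip().lower())
        for pattern in patterns.split(',')
        if pattern.strip()
    )
```

With this sketch, attachment_matches('Invoice_2024.PDF', '*.pdf, scan_*') is True and attachment_matches('photo.jpg', '*.pdf,*.png') is False.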
The actions all ensure that the same mail is not consumed twice by different means. These are as follows:
Add custom Tag: Adds a custom tag to mails with consumed documents (the IMAP standard calls these \"keywords\"). Paperless will not consume mails already tagged. Not all mail servers support this feature!
apple:<color>
(e.g. apple:green) as a custom tag. Available colors are red, orange, yellow, blue, green, violet and grey.
Warning
The mail consumer will perform these actions on all mails it has consumed documents from. Keep in mind that the actual consumption process may fail for some reason, leaving you with missing documents in paperless.
Note
With the correct set of rules, you can completely automate your email documents. Create rules for every correspondent you receive digital documents from and paperless will read them automatically. The default action \"mark as read\" is pretty tame and will not cause any damage or data loss whatsoever.
You can also set up a special folder in your mail account for paperless and use your favorite mail client to move to-be-consumed mails into that folder, automatically or manually, and tell paperless to move them to yet another folder after consumption. It's up to you.
Note
When defining a mail rule with a folder, you may need to try different characters to define how the sub-folders are separated. Common values include \".\", \"/\" or \"|\", but this varies by the mail server. Check the documentation for your mail server. In the event of an error fetching mail from a certain folder, check the Paperless logs. When a folder is not located, Paperless will attempt to list all folders found in the account to the Paperless logs.
Note
Paperless will process the rules in the order defined in the admin page.
You can define catch-all rules and have them executed last to consume any documents not matched by previous rules. Such a rule may assign an \"Unknown mail document\" tag to consumed documents so you can inspect them further.
Paperless is set up to check your mails every 10 minutes. This can be configured via PAPERLESS_EMAIL_TASK_CRON.
Paperless-ngx supports OAuth2 authentication for Gmail and Outlook email accounts. To set up an email account with OAuth2, you will need to create a 'developer' app with the respective provider and obtain the client ID and client secret and set the appropriate configuration variables. You will also need to set either PAPERLESS_OAUTH_CALLBACK_BASE_URL
or PAPERLESS_URL
to the correct value for the OAuth2 flow to work correctly.
Specific instructions for setting up the required 'developer' app with Google or Microsoft are beyond the scope of this documentation, but you can find user-maintained instructions in the wiki or by searching the web.
Once set up, navigating to the email settings page in Paperless-ngx will allow you to add an email account for Gmail or Outlook using OAuth2. After authenticating, you will be presented with the newly-created account where you will need to enter and save your email address. After this, the account will work as any other email account in Paperless-ngx and refreshing tokens will be handled automatically.
"},{"location":"usage/#rest-api","title":"REST API","text":"You can also submit a document using the REST API, see POSTing documents for details.
"},{"location":"usage/#sharing-documents-from-paperless-ngx","title":"Sharing documents from Paperless-ngx","text":"Paperless-ngx supports sharing documents with other users by assigning them permissions to the document. Document files can also be shared externally via share links, email or using email or webhook actions in workflows.
"},{"location":"usage/#share-links","title":"Share Links","text":"\"Share links\" are shareable public links to files and can be created and managed under the 'Send' button on the document detail screen.
{paperless-url}/share/{randomly-generated-slug}
.
Tip
If your paperless-ngx instance is behind a reverse-proxy you may want to create an exception to bypass any authentication layers that are part of your setup in order to make links truly publicly-accessible. Of course, do so with caution.
"},{"location":"usage/#email-sharing","title":"Email Sharing","text":"Paperless-ngx supports directly sending documents via email. If an email server has been configured the \"Send\" button on the document detail page will include an \"Email\" option. You can also share files via email automatically by using a workflow action.
"},{"location":"usage/#permissions","title":"Permissions","text":"Permissions in Paperless-ngx are based around 'global' permissions as well as 'object-level' permissions. Global permissions determine which parts of the application a user can access (e.g. Documents, Tags, Settings) and object-level determine which objects are visible or editable. All objects have an 'owner' and 'view' and 'edit' permissions which can be granted to other users or groups. The paperless-ngx permissions system uses the built-in user model of the backend framework, Django.
Tip
Object-level permissions only apply to the object itself. In other words, setting permissions for a Tag will not affect the permissions of documents that have the Tag.
Permissions can be set using the new \"Permissions\" tab when editing documents, or bulk-applied in the UI by selecting documents and choosing the \"Permissions\" button.
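The object-level visibility rules described in this documentation (an owner, 'view' and 'edit' grants, and unowned objects being visible to any user) can be modeled in a few lines. This is a simplified sketch with hypothetical names; it ignores groups, superusers and global permissions, and it is not the Django-based implementation Paperless-ngx uses.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class SharedObject:
    owner: Optional[str] = None
    viewers: Set[str] = field(default_factory=set)
    editors: Set[str] = field(default_factory=set)


def can_view(obj, user):
    # Objects without an owner are visible to any user.
    if obj.owner is None:
        return True
    # The owner can always see the object, and 'edit' implies 'view'.
    return user == obj.owner or user in obj.viewers or user in obj.editors
```

For example, a tag owned by 'alice' with 'bob' as a viewer is visible to both, but not to 'carol', who would see 'Private' in its place.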
"},{"location":"usage/#default-permissions","title":"Default permissions","text":"Workflows provide advanced ways to control permissions.
For objects created via the web UI (tags, doc types, etc.) the default is to set the current user as owner and no extra permissions, but you can explicitly set these under Settings > Permissions.
Documents consumed via the consumption directory do not have an owner or additional permissions set by default, but again, can be controlled with Workflows.
"},{"location":"usage/#users-and-groups","title":"Users and Groups","text":"Paperless-ngx supports editing users and groups via the 'frontend' UI, which can be found under Settings > Users & Groups, assuming the user has access. If a user is designated as a member of a group those permissions will be inherited and this is reflected in the UI. Explicit permissions can be granted to limit access to certain parts of the UI (and corresponding API endpoints).
Tip
By default, new users are not granted any permissions, except those inherited from any group(s) of which they are a member.
"},{"location":"usage/#superusers","title":"Superusers","text":"Superusers can access all parts of the front and backend application as well as any and all objects. Superuser status can only be granted by another superuser.
"},{"location":"usage/#admin-status","title":"Admin Status","text":"Admin status (Django 'staff status') grants access to viewing the paperless logs and the system status dialog as well as accessing the Django backend.
"},{"location":"usage/#global-permissions","title":"Detailed Explanation of Global Permissions","text":"Global permissions define what areas of the app and API endpoints users can access. For example, they determine if a user can create, edit, delete or view any documents, but individual documents themselves still have \"object-level\" permissions.
Type: Details
AppConfig: Change or higher permissions grant access to the \"Application Configuration\" area.
Correspondent: Add, edit, delete or view Correspondents.
CustomField: Add, edit, delete or view Custom Fields.
Document: Add, edit, delete or view Documents.
DocumentType: Add, edit, delete or view Document Types.
Group: Add, edit, delete or view Groups.
MailAccount: Add, edit, delete or view Mail Accounts.
MailRule: Add, edit, delete or view Mail Rules.
Note: Add, edit, delete or view Notes.
PaperlessTask: View or dismiss (Change) File Tasks.
SavedView: Add, edit, delete or view Saved Views.
ShareLink: Add, delete or view Share Links.
StoragePath: Add, edit, delete or view Storage Paths.
Tag: Add, edit, delete or view Tags.
UISettings: Add, edit, delete or view the UI settings that are used by the web app. Users that will access the web UI must be granted at least View permissions.
User: Add, edit, delete or view Users.
Workflow: Add, edit, delete or view Workflows. Note that Workflows are global; in other words, all users who can access workflows have access to the same set of them."},{"location":"usage/#object-permissions","title":"Detailed Explanation of Object Permissions","text":"Type: Details
Owner: By default objects are only visible and editable by their owner. Only the object owner can grant permissions to other users or groups. Additionally, only document owners can create share links and add / remove custom fields. For backwards compatibility objects can have no owner, which makes them visible to any user.
View: Confers the ability to view (not edit) a document, tag, etc. Users without 'view' (or higher) permissions will be shown 'Private' in place of the object name, for example when viewing a document with a tag for which the user doesn't have permissions.
Edit: Confers the ability to edit (and view) a document, tag, etc."},{"location":"usage/#password-reset","title":"Password reset","text":"In order to enable the password reset feature you will need to set up an SMTP backend, see PAPERLESS_EMAIL_HOST
. If your installation does not have PAPERLESS_URL
set, the reset link included in emails will use the server host.
Users can enable two-factor authentication (2FA) for their accounts from the 'My Profile' dialog. Opening the dropdown reveals a QR code that can be scanned by a 2FA app (e.g. Google Authenticator) to generate a code. The code must then be entered in the dialog to enable 2FA. If the code is accepted and 2FA is enabled, the user will be shown a set of 10 recovery codes that can be used to log in in the event that the 2FA device is lost or unavailable. These codes should be stored securely and cannot be retrieved again. Once enabled, users will be required to enter a code from their 2FA app when logging in.
Should a user lose access to their 2FA device and all recovery codes, a superuser can disable 2FA for the user from the 'Users & Groups' management screen.
"},{"location":"usage/#workflows","title":"Workflows","text":"Note
v2.3 added \"Workflows\" and existing \"Consumption Templates\" were converted automatically to the new more powerful format.
Workflows allow hooking into the Paperless-ngx document pipeline, for example to alter what metadata (tags, doc types) and permissions (owner, privileges) are assigned to documents. Workflows can have multiple 'triggers' and 'actions'. Triggers are events (with optional filtering rules) that will cause the workflow to be run and actions are the set of sequential actions to apply.
In general, workflows and any actions they contain are applied sequentially by sort order. For \"assignment\" actions, subsequent workflow actions will override previous assignments, except for assignments that accept multiple items e.g. tags, custom fields and permissions, which will be merged.
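The override-versus-merge behavior of assignment actions described above can be sketched as follows. The field names and the set of multi-value keys are hypothetical and chosen for illustration; this is not the Paperless-ngx implementation.

```python
MULTI_VALUE_KEYS = {'tags', 'custom_fields', 'view_users'}


def apply_assignments(actions):
    # Apply assignment actions in order: single-value fields are
    # overridden by later actions, multi-value fields are merged.
    result = {}
    for action in actions:
        for key, value in action.items():
            if key in MULTI_VALUE_KEYS:
                result.setdefault(key, set()).update(value)
            else:
                result[key] = value
    return result
```

Given two actions that assign correspondents 'A' then 'B' and tags 'inbox' then 'tax', the result has correspondent 'B' and both tags.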
"},{"location":"usage/#workflow-triggers","title":"Workflow Triggers","text":""},{"location":"usage/#workflow-trigger-types","title":"Types","text":"Currently, there are three events that correspond to workflow trigger 'types':
The following flow diagram illustrates the three document trigger types:
flowchart TD\n consumption{\"Matching\n 'Consumption'\n trigger(s)\"}\n\n added{\"Matching\n 'Added'\n trigger(s)\"}\n\n updated{\"Matching\n 'Updated'\n trigger(s)\"}\n\n A[New Document] --> consumption\n consumption --> |Yes| C[Workflow Actions Run]\n consumption --> |No| D\n C --> D[Document Added]\n D -- Paperless-ngx 'matching' of tags, etc. --> added\n added --> |Yes| F[Workflow Actions Run]\n added --> |No| G\n F --> G[Document Finalized]\n H[Existing Document Changed] --> updated\n updated --> |Yes| J[Workflow Actions Run]\n updated --> |No| K\n J --> K[Document Saved]
"},{"location":"usage/#workflow-trigger-filters","title":"Filters","text":"Workflows allow you to filter by:
PAPERLESS_CONSUMER_RECURSIVE
would allow, for example, automatically assigning documents to different owners based on the upload directory.
(Added and Updated triggers only). Filter document content using the matching settings.
(Added and Updated triggers only). Filter for documents with any of the specified tags.
(Added and Updated triggers only). Filter documents with this doc type.
(Added and Updated triggers only). Filter documents with this correspondent.
The following workflow action types are available:
"},{"location":"usage/#workflow-action-assignment","title":"Assignment","text":"\"Assignment\" actions can assign:
\"Removal\" actions can remove either all of or specific sets of the following:
\"Email\" actions can send documents via email. This action requires a mail server to be configured. You can specify:
\"Webhook\" actions send a POST request to a specified URL. You can specify:
Some workflow text can include placeholders but the available options differ depending on the type of workflow trigger. This is because at the time of consumption (when the text is to be set), no automatic tags etc. have been applied. You can use the following placeholders with any trigger type:
{correspondent}: assigned correspondent name
{document_type}: assigned document type name
{owner_username}: assigned owner username
{added}: added datetime
{added_year}: added year
{added_year_short}: added year (two digits)
{added_month}: added month
{added_month_name}: added month name
{added_month_name_short}: added month short name
{added_day}: added day
{added_time}: added time in HH:MM format
{original_filename}: original file name without extension
{filename}: current file name without extension
The following placeholders are only available for \"added\" or \"updated\" triggers:
{created}: created datetime
{created_year}: created year
{created_year_short}: created year (two digits)
{created_month}: created month
{created_month_name}: created month name
{created_month_name_short}: created month short name
{created_day}: created day
{created_time}: created time in HH:MM format
{doc_url}: URL to the document in the web UI. Requires the PAPERLESS_URL setting to be set.
All users who have application permissions for editing workflows can see the same set of workflows. In other words, workflows themselves intentionally do not have an owner or permissions.
Given their potentially far-reaching capabilities, you may want to restrict access to workflows.
Upon migration, existing installs will grant access to workflows to users who can add documents (and superusers who can always access all parts of the app).
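As an illustration of how the workflow placeholders listed earlier in this section expand, the sketch below fills a few of them with Python string formatting. The exact formats (for example zero-padded months) are assumptions made for this example, and the helper is hypothetical rather than the code Paperless-ngx runs.

```python
from datetime import datetime


def render_title(template, correspondent, document_type, added):
    # Fill a subset of the placeholders that are valid for every trigger.
    values = {
        'correspondent': correspondent,
        'document_type': document_type,
        'added': added.isoformat(),
        'added_year': added.strftime('%Y'),
        'added_year_short': added.strftime('%y'),
        'added_month': added.strftime('%m'),
        'added_day': added.strftime('%d'),
        'added_time': added.strftime('%H:%M'),
    }
    return template.format(**values)


title = render_title(
    '{correspondent} {document_type} {added_year}-{added_month}',
    'ACME', 'Invoice', datetime(2024, 3, 5, 14, 30),
)
print(title)  # ACME Invoice 2024-03
```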
"},{"location":"usage/#custom-fields","title":"Custom Fields","text":"Paperless-ngx supports the use of custom fields for documents as of v2.0, allowing a user to optionally attach data to documents which does not fit in the existing set of fields Paperless-ngx provides.
Important
Added / removed fields, as well as any data, is not saved to the document until you actually hit the \"Save\" button, similar to other changes on the document details page.
Note
Once the data type for a field is set, it cannot be changed.
Multiple fields may be attached to a document but the same field name cannot be assigned multiple times to a single document.
The following custom field types are supported:
Text: any text
Boolean: true / false (checked / unchecked) field
Date: date
URL: a valid URL
Integer: integer number, e.g. 12
Number: float number, e.g. 12.3456
Monetary: ISO 4217 currency code and a number with exactly two decimals, e.g. USD12.30
Document Link: reference(s) to other document(s) displayed as links; automatically creates a symmetrical link in reverse
Select: a pre-defined list of strings from which the user can choose
Paperless-ngx supports four basic editing operations for PDFs (these operations currently cannot be performed on non-PDF files):
Important
Note that rotation and deleting pages alter the Paperless-ngx original file, which would, for example, invalidate a digital signature.
"},{"location":"usage/#document-history","title":"Document History","text":"As of version 2.7, Paperless-ngx automatically records all changes to a document and records this in an audit log. The feature requires PAPERLESS_AUDIT_LOG_ENABLED
to be enabled, which it is by default as of version 2.7. Changes to documents are visible under the \"History\" tab. Note that certain changes, such as those made by workflows, record the 'actor' as \"System\".
When you first delete a document it is moved to the 'trash' until it is either explicitly deleted or automatically removed after a set amount of time has passed. You can set how long documents remain in the trash before being automatically deleted with PAPERLESS_EMPTY_TRASH_DELAY, which defaults to 30 days. Until the file is actually deleted (i.e. the trash is emptied), all files and database content remain intact and can be restored at any point.
Additionally, you may configure a directory to which deleted files are moved when the trash is emptied with PAPERLESS_EMPTY_TRASH_DIR. Note that the empty trash directory only stores the original file; the archive file and all database information are permanently removed once a document is fully deleted.
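As a rough illustration of the retention rule described above, the following Python sketch computes when a trashed document would become eligible for automatic removal. The function name and the way the delay is read from the environment are assumptions made for this example; this is not how Paperless-ngx implements the feature internally.

```python
import os
from datetime import datetime, timedelta

# Hypothetical helper for illustration only; not part of Paperless-ngx.
def trash_expiry(deleted_at, delay_days=None):
    # Fall back to the documented default of 30 days when
    # PAPERLESS_EMPTY_TRASH_DELAY is not set in the environment.
    if delay_days is None:
        delay_days = int(os.environ.get('PAPERLESS_EMPTY_TRASH_DELAY', '30'))
    return deleted_at + timedelta(days=delay_days)

# A document trashed on 1 January with a 30 day delay.
expiry = trash_expiry(datetime(2024, 1, 1), delay_days=30)
print(expiry.date())
```

Until that date is reached and the trash is emptied, the document can still be restored.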
Paperless offers a couple tools that help you organize your document collection. However, it is up to you to use them in a way that helps you organize documents and find specific documents when you need them. This section offers a couple ideas for managing your collection.
Document types allow you to classify documents according to what they are. You can define types such as \"Receipt\", \"Invoice\", or \"Contract\". If you used to collect all your receipts in a single binder, you can recreate that system in paperless by defining a document type, assigning documents to that type and then filtering by that type to see only receipts.
Not all documents need document types. Sometimes it's hard to determine what the type of a document is, or it is hard to justify creating a document type that you only need once or twice. This is okay. As long as the types you define help you organize your collection in the way you want, paperless is doing its job.
Tags can be used in many different ways. Think of tags as more versatile folders or binders. If you have a binder for documents related to university, your car or health care, you can create these binders in paperless by creating tags and assigning them to relevant documents. Just as with document types, you can filter the document list by tags and only see documents of a certain topic.
With physical documents, you'll often need to decide which folder the document belongs to. The advantage of tags over folders and binders is that a single document can have multiple tags. A physical document cannot magically appear in two different folders, but with tags, this is entirely possible.
Tip
This can be used in many different ways. One example: Imagine you're working on a particular task, such as signing up for university. Usually you'll need to collect a bunch of different documents that are already sorted into various folders. With the tag system of paperless, you can create a new group of documents that are relevant to this task without destroying the already existing organization. When you're done with the task, you could delete the tag again, which would be equivalent to sorting documents back into the folders they belong in. Or keep the tag, up to you.
All of the logic above applies to correspondents as well. Attach them to documents if you feel that they help you organize your collection.
When you've started organizing your documents, create a couple saved views for document collections you regularly access. This is equal to having labeled physical binders on your desk, except that these saved views are dynamic and simply update themselves as you add documents to the system.
Here are a couple examples of tags and types that you could use in your collection.
inbox: tag for newly added documents that you haven't manually edited yet.
car: for everything car related (repairs, registration, insurance, etc.).
todo: for documents that you still need to do something with, such as replying or performing some task online.
bank account x: for all bank statements related to that account.
mail: for anything that you added to paperless via its mail processing capabilities.
missing_metadata: for when you still need to add some metadata to a document, but can't or don't want to do this right now.
The top search bar in the web UI performs a \"global\" search of the various objects Paperless-ngx uses, including documents, tags, workflows, etc. Only objects for which the user has appropriate permissions are returned. For documents, if there are fewer than 3 results, \"advanced\" search results (which use the document index) will also be included. This can be disabled under settings.
"},{"location":"usage/#document-searches","title":"Document searches","text":"Paperless offers an extensive searching mechanism that is designed to allow you to quickly find a document you're looking for (for example, that thing that just broke and you bought a couple months ago, that contract you signed 8 years ago).
When you search paperless for a document, it tries to match this query against your documents. Paperless will look for matching documents by inspecting their content, title, correspondent, type and tags. Paperless returns a scored list of results, so that documents matching your query better will appear further up in the search results.
By default, paperless returns only documents which contain all words typed in the search bar. However, paperless also offers advanced search syntax if you want to drill down the results further.
Matching documents with logical expressions:
shopname AND (product1 OR product2)\n
Matching specific tags, correspondents or types:
type:invoice tag:unpaid\ncorrespondent:university certificate\n
Matching dates:
created:[2005 to 2009]\nadded:yesterday\nmodified:today\n
Matching inexact words:
produ*name\n
Note
Inexact terms are hard for search indexes. These queries might take a while to execute. That's why paperless offers auto complete and query correction.
All of these constructs can be combined as you see fit. Paperless uses Whoosh's default query language; to learn more about it, head over to Whoosh query language. For details on what date parsing utilities are available, see Date parsing.
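Because the constructs above are plain strings, they are easy to assemble programmatically before pasting them into the search bar. The following Python sketch shows one way to combine free-text terms with field filters. The helper build_query is hypothetical and exists only for this example; Paperless-ngx does not provide it.

```python
# Hypothetical helper for illustration only; not part of Paperless-ngx.
def build_query(terms=(), **fields):
    # Free-text terms come first; by default every term must match.
    parts = list(terms)
    # Field filters use the field:value form shown in the examples above.
    for name, value in fields.items():
        parts.append(f'{name}:{value}')
    return ' '.join(parts)

print(build_query(type='invoice', tag='unpaid'))
print(build_query(['certificate'], correspondent='university'))
print(build_query(created='[2005 to 2009]'))
```

The helper does no escaping or validation, so values containing spaces or special characters would need extra care.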
"},{"location":"usage/#keyboard-shortcuts-hotkeys","title":"Keyboard shortcuts / hotkeys","text":"A list of available hotkeys can be shown on any page using Shift + ?. The help dialog shows only the keys that are currently available based on which area of Paperless-ngx you are using.
"},{"location":"usage/#usage-recommended-workflow","title":"The recommended workflow","text":"Once you have familiarized yourself with paperless and are ready to use it for all your documents, the recommended workflow for managing your documents is as follows. This workflow also takes into account that some documents have to be kept in physical form, but still ensures that you get all the advantages for these documents as well.
The following diagram shows how easy it is to manage your documents.
"},{"location":"usage/#preparations-in-paperless","title":"Preparations in paperless","text":"Keep a physical inbox. Whenever you receive a document that you need to archive, put it into your inbox. Regularly, do the following for all documents in your inbox:
Tip
Instead of writing a number on the document by hand, you may also prepare a spool of barcode labels with ascending serial numbers, formatted like ASN00001. This also enables Paperless to automatically parse and process the ASN (if enabled in the config), so that you don't need to manually assign it.
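To make the label format concrete, here is a small Python sketch that generates and parses label text of the form ASN00001. The function names and the fixed five-digit width are assumptions taken from the example above; this is not the barcode handling code used by Paperless-ngx.

```python
import re

# Assumed label layout: the prefix ASN followed by digits, as in ASN00001.
ASN_PATTERN = re.compile('ASN([0-9]+)')

# Hypothetical helpers for illustration only; not part of Paperless-ngx.
def parse_asn(barcode_text):
    # Return the serial number as an integer, or None if the text
    # does not look like an ASN label.
    match = ASN_PATTERN.fullmatch(barcode_text.strip())
    return int(match.group(1)) if match else None

def format_asn(number, width=5):
    # Zero-pad to a fixed width so that printed labels sort naturally.
    return f'ASN{number:0{width}d}'

print(parse_asn('ASN00001'))
print(format_asn(343))
```

Generating the label text in ascending order with format_asn gives the strings you would encode as barcodes when printing a spool of labels.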
Over time, you will notice that your physical binder will fill up. When it is full, label the binder with the range of ASNs in this binder (e.g., \"Documents 1 to 343\"), store the binder in your cellar or elsewhere, and start a new binder.
The idea behind this process is that you will never have to use the physical binders to find a document. If you need a specific physical document, you may find this document by:
Once you have scanned in a document, proceed in paperless as follows.
Tip
You can set up manual matching rules for your correspondents and tags and paperless will assign them automatically. After consuming a couple of documents, you can even ask paperless to learn when to assign tags and correspondents by itself. For details on this feature, see advanced matching.
"},{"location":"usage/#task-management","title":"Task management","text":"Some documents require attention and require you to act on the document. You may take two different approaches to handle these documents based on how regularly you intend to scan documents and use paperless.
Paperless-ngx consists of the following components:
The webserver: This serves the administration pages, the API, and the new frontend. This is the main tool you'll be using to interact with paperless. You may start the webserver directly with
cd /path/to/paperless/src/\ngranian --interface asginl --ws \"paperless.asgi:application\"\n
or by any other means such as Apache mod_wsgi.
The consumer: This is what watches your consumption folder for documents. However, the consumer itself does not really consume your documents. Instead, it now notifies a task processor that a new file is ready for consumption. I suppose it should be named differently. This was also used to check your emails, but that's now done elsewhere as well.
Start the consumer with the management command document_consumer:
cd /path/to/paperless/src/\npython3 manage.py document_consumer\n
The task processor: Paperless relies on Celery - Distributed Task Queue for doing most of the heavy lifting. This is a task queue that accepts tasks from multiple sources and processes these in parallel. It also comes with a scheduler that executes certain commands periodically.
This task processor is responsible for:
This allows paperless to process multiple documents from your consumption folder in parallel! On a modern multi core system, this makes the consumption process with full OCR blazingly fast.
The task processor comes with a built-in admin interface that you can use to check whether any of the tasks have failed and inspect the errors (e.g., wrong email credentials, errors while consuming a specific file, etc.).
A redis message broker: This is a really lightweight service that is responsible for getting the tasks from the webserver and the consumer to the task scheduler. These run in a different process (maybe even on different machines!), and therefore, this is necessary.
Optional: A database server. Paperless supports PostgreSQL, MariaDB and SQLite for storing its data.