.. _configuration:

*************
Configuration
*************

Paperless provides a wide range of customizations.
Depending on how you run paperless, these settings have to be defined in different
places.

* If you run paperless on docker, ``paperless.conf`` is not used. Rather, configure
  paperless by copying necessary options to ``docker-compose.env``.
* If you are running paperless on anything else, paperless will search for the
  configuration file in these locations and use the first one it finds:

  .. code::

    /path/to/paperless/paperless.conf
    /etc/paperless.conf
    /usr/local/etc/paperless.conf

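As a rough sketch, both ``paperless.conf`` and ``docker-compose.env`` are
assumed here to hold one ``KEY=value`` setting per line, with lines starting
with ``#`` treated as comments. The values below are simply the documented
defaults, written out explicitly:

.. code::

    # Illustrative fragment only; each value repeats a documented default.
    PAPERLESS_REDIS=redis://localhost:6379
    PAPERLESS_TIME_ZONE=UTC
    PAPERLESS_OCR_LANGUAGE=eng
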
Required services
#################

PAPERLESS_REDIS=<url>
    This is required for processing scheduled tasks such as email fetching, index
    optimization and for training the automatic document matcher.

    Defaults to redis://localhost:6379.

PAPERLESS_DBHOST=<hostname>
    By default, sqlite is used as the database backend. This can be changed here.
    Set PAPERLESS_DBHOST and PostgreSQL will be used instead of sqlite.

PAPERLESS_DBPORT=<port>
    Adjust port if necessary.

    Default is 5432.

PAPERLESS_DBNAME=<name>
    Database name in PostgreSQL.

    Defaults to "paperless".

PAPERLESS_DBUSER=<name>
    Database user in PostgreSQL.

    Defaults to "paperless".

PAPERLESS_DBPASS=<password>
    Database password for PostgreSQL.

    Defaults to "paperless".

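To illustrate how the database settings fit together, the following sketch
switches paperless to PostgreSQL. The host name ``localhost`` is an
assumption for a database running on the same machine. The remaining values
repeat the defaults and could be left out:

.. code::

    # Setting PAPERLESS_DBHOST is what enables PostgreSQL.
    PAPERLESS_DBHOST=localhost
    PAPERLESS_DBPORT=5432
    PAPERLESS_DBNAME=paperless
    PAPERLESS_DBUSER=paperless
    PAPERLESS_DBPASS=paperless
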
Paths and folders
#################

PAPERLESS_CONSUMPTION_DIR=<path>
    This is where your documents should go to be consumed. Make sure that it exists
    and that the user running the paperless service can read/write its contents
    before you start Paperless.

    Don't change this when using docker, as it only changes the path within the
    container. Change the local consumption directory in the docker-compose.yml
    file instead.

    Defaults to "../consume", relative to the "src" directory.

PAPERLESS_DATA_DIR=<path>
    This is where paperless stores all its data (search index, sqlite database,
    classification model, etc).

    Defaults to "../data", relative to the "src" directory.

PAPERLESS_MEDIA_ROOT=<path>
    This is where your documents and thumbnails are stored.

    You can set this and PAPERLESS_DATA_DIR to the same folder to have paperless
    store all its data within the same volume.

    Defaults to "../media", relative to the "src" directory.

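A possible layout for a non-docker installation that keeps data and media in
the same folder might look like this. The ``/srv/paperless`` paths are
hypothetical and only serve as an illustration:

.. code::

    # Hypothetical paths; create them and fix permissions before starting.
    PAPERLESS_CONSUMPTION_DIR=/srv/paperless/consume
    PAPERLESS_DATA_DIR=/srv/paperless/data
    PAPERLESS_MEDIA_ROOT=/srv/paperless/data
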
PAPERLESS_STATICDIR=<path>
    Override the default STATIC_ROOT here. This is where all static files
    created using the "collectstatic" management command are stored.

    Unless you're doing something fancy, there is no need to override this.

    Defaults to "../static", relative to the "src" directory.

PAPERLESS_FILENAME_FORMAT=<format>
    Changes the filenames paperless uses to store documents in the media directory.
    See :ref:`advanced-file_name_handling` for details.

    Default is none, which disables this feature.

Hosting & Security
##################

PAPERLESS_SECRET_KEY=<key>
    Paperless uses this to make session tokens. If you expose paperless on the
    internet, you need to change this, since the default secret is well known.

    Use any sequence of characters. The more, the better. You don't need to
    remember this. Just face-roll your keyboard.

    Default is listed in the file ``src/paperless/settings.py``.

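In a configuration file this could look as follows. The value shown is a
placeholder, not a usable secret. Replace it with your own long random
string:

.. code::

    # Placeholder only; substitute a long random sequence of characters.
    PAPERLESS_SECRET_KEY=replace-this-with-a-long-random-string
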
PAPERLESS_ALLOWED_HOSTS=<comma-separated-list>
    If you're planning on putting Paperless on the open internet, then you
    really should set this value to the domain name you're using. Failing to do
    so leaves you open to HTTP host header attacks:
    https://docs.djangoproject.com/en/3.1/topics/security/#host-header-validation

    Just remember that this is a comma-separated list, so "example.com" is fine,
    as is "example.com,www.example.com", but NOT " example.com" (leading space)
    or "example.com," (trailing comma).

    Defaults to "*", which is all hosts.

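For instance, assuming paperless is reachable under the two domain names used
in the description above, a valid entry has no spaces and no trailing comma:

.. code::

    # Valid: comma-separated, no spaces, no trailing comma.
    PAPERLESS_ALLOWED_HOSTS=example.com,www.example.com
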
PAPERLESS_CORS_ALLOWED_HOSTS=<comma-separated-list>
    You need to add your servers to the list of allowed hosts that can do CORS
    calls. Set this to your public domain name.

    Defaults to "http://localhost:8000".

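Since the default value includes the URL scheme, a custom value presumably
needs one as well. The sketch below assumes a hypothetical public address of
``https://paperless.example.com``:

.. code::

    # Hypothetical public address; the scheme is included, as in the default.
    PAPERLESS_CORS_ALLOWED_HOSTS=https://paperless.example.com
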
PAPERLESS_FORCE_SCRIPT_NAME=<path>
    To host paperless under a subpath URL like example.com/paperless, set
    this value to /paperless. No trailing slash!

    .. note::

        I don't know if this works in paperless-ng. Probably not.

    Defaults to none, which hosts paperless at "/".

PAPERLESS_STATIC_URL=<path>
    Override the STATIC_URL here. Unless you're hosting Paperless off a
    subpath like /paperless/, you probably don't need to change this.

    Defaults to "/static/".

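A subpath setup would probably combine both settings. The following is only a
sketch under that assumption. The ``/paperless/static/`` value is not taken
from the documentation, and the whole setup is untested:

.. code::

    # No trailing slash on the script name.
    PAPERLESS_FORCE_SCRIPT_NAME=/paperless
    # Assumed to need the same prefix so static files resolve under the subpath.
    PAPERLESS_STATIC_URL=/paperless/static/
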
Software tweaks
###############

PAPERLESS_TASK_WORKERS=<num>
    Paperless does multiple things in the background: maintain the search index,
    maintain the automatic matching algorithm, check emails, consume documents,
    etc. This variable specifies how many of these tasks it will run in parallel.

PAPERLESS_THREADS_PER_WORKER=<num>
    Furthermore, paperless uses multiple threads when consuming documents to
    speed up OCR. This variable specifies how many pages paperless will process
    in parallel on a single document.

    .. caution::

        Ensure that the product

            PAPERLESS_TASK_WORKERS * PAPERLESS_THREADS_PER_WORKER

        does not exceed your CPU core count or else paperless will be extremely slow.
        If you want paperless to process many documents in parallel, choose a high
        worker count. If you want paperless to process very large documents faster,
        use a higher threads per worker count.

    The default is a balance between the two, according to your CPU core count,
    with a slight favor towards threads per worker, and using as many cores as
    possible.

    If you only specify PAPERLESS_TASK_WORKERS, paperless will adjust
    PAPERLESS_THREADS_PER_WORKER automatically.

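As a worked example of the rule above, assume a machine with 8 CPU cores
(the core count is hypothetical). Two workers with four threads each give
2 * 4 = 8, which stays within the limit and favors large documents:

.. code::

    # 2 workers * 4 threads per worker = 8, matching the assumed 8 cores.
    PAPERLESS_TASK_WORKERS=2
    PAPERLESS_THREADS_PER_WORKER=4

Swapping the two values (4 workers, 2 threads each) keeps the same product
but favors processing many small documents in parallel.
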
PAPERLESS_TIME_ZONE=<timezone>
    Set the time zone here.
    See https://docs.djangoproject.com/en/3.1/ref/settings/#std:setting-TIME_ZONE
    for details on how to set it.

    Defaults to UTC.

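Django expects a name from the tz database here. ``Europe/Berlin`` below is
just an example value:

.. code::

    # Example tz database name; pick the one for your location.
    PAPERLESS_TIME_ZONE=Europe/Berlin
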
PAPERLESS_OCR_PAGES=<num>
    Tells paperless to use only the specified number of pages for OCR. Documents
    with fewer pages than the specified number are OCR'ed completely.

    Specifying 1 here will only use the first page.

    Defaults to 0, which disables this feature and always uses all pages.

PAPERLESS_OCR_LANGUAGE=<lang>
    Customize the default language that tesseract will attempt to use when
    parsing documents. The default language is used whenever:

    * No language could be detected on a document
    * No tesseract data files are available for the detected language

    It should be a 3-letter language code consistent with ISO
    639: https://www.loc.gov/standards/iso639-2/php/code_list.php

    Set this to the language most of your documents are written in.

    Defaults to "eng".

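For example, a collection of mostly German documents where only the first
page should be OCR'ed could be configured as sketched below. This assumes the
tesseract data files for ``deu`` are installed:

.. code::

    # "deu" is the 3-letter code for German.
    PAPERLESS_OCR_LANGUAGE=deu
    # Only OCR the first page of each document.
    PAPERLESS_OCR_PAGES=1
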
PAPERLESS_OCR_ALWAYS=<bool>
    By default Paperless does not OCR a document if the text can be retrieved from
    the document directly. Set to true to always OCR documents.

    Defaults to false.

PAPERLESS_CONSUMER_POLLING=<num>
    If paperless does not find documents added to your consume folder, it might
    not be able to automatically detect filesystem changes. In that case,
    specify a polling interval in seconds here, which will then cause paperless
    to periodically check your consumption directory for changes.

    Defaults to 0, which disables polling and uses filesystem notifications.

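A sketch of a polling setup follows. The interval of 30 seconds is an
arbitrary example value, not a recommendation from the documentation:

.. code::

    # Check the consumption directory every 30 seconds instead of
    # relying on filesystem notifications.
    PAPERLESS_CONSUMER_POLLING=30
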
PAPERLESS_CONSUMER_DELETE_DUPLICATES=<bool>
    When the consumer detects a duplicate document, it will not touch the
    original document. This default behavior can be changed here.

    Defaults to false.

PAPERLESS_CONVERT_MEMORY_LIMIT=<num>
    On smaller systems, or even in the case of Very Large Documents, the consumer
    may explode, complaining about how it's "unable to extend pixel cache". In
    such cases, try setting this to a reasonably low value, like 32. The
    default is to use whatever is necessary to do everything without writing to
    disk, and units are in megabytes.

    For more information on how to use this value, you should search
    the web for "MAGICK_MEMORY_LIMIT".

    Defaults to 0, which disables the limit.

PAPERLESS_CONVERT_TMPDIR=<path>
    Similar to the memory limit, if you've got a small system and your OS mounts
    /tmp as tmpfs, you should set this to a path that's on a physical disk, like
    /home/your_user/tmp or something. ImageMagick will use this as scratch space
    when crunching through very large documents.

    For more information on how to use this value, you should search
    the web for "MAGICK_TMPDIR".

    Default is none, which disables the temporary directory.

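Putting the two suggestions above together, a small system might use
something like the following. The values are the examples from the text, and
``your_user`` is a placeholder for an actual user name:

.. code::

    # Limit ImageMagick to 32 megabytes of memory.
    PAPERLESS_CONVERT_MEMORY_LIMIT=32
    # Scratch space on a physical disk rather than a tmpfs-backed /tmp.
    PAPERLESS_CONVERT_TMPDIR=/home/your_user/tmp
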
PAPERLESS_CONVERT_DENSITY=<num>
    This setting strongly affects the physical size of tmp page files and
    the speed of document conversion, and can affect the accuracy of OCR
    results. Individual results can vary, so test this setting
    thoroughly against the documents you are importing to see whether it has
    any negative or positive impact.
    Testing on limited document sets has shown that a setting of 200 can cut the
    size of tmp files by 1/3 and speed up conversion by up to 4x,
    with little impact on OCR accuracy.

    Default is 300.

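To try the lower density mentioned above, the setting would look like this.
Whether it suits your documents is something to verify on a sample first:

.. code::

    # Lower than the default of 300; trades some detail for speed and size.
    PAPERLESS_CONVERT_DENSITY=200
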
PAPERLESS_OPTIMIZE_THUMBNAILS=<bool>
    Use optipng to optimize thumbnails. This usually reduces the size of
    thumbnails by about 20%, but uses considerable compute time during
    consumption.

    Defaults to true.

PAPERLESS_POST_CONSUME_SCRIPT=<filename>
    After a document is consumed, Paperless can trigger an arbitrary script if
    you like. This script will be passed a number of arguments for you to work
    with. For more information, take a look at :ref:`advanced-post_consume_script`.

    The default is blank, which means nothing will be executed.

PAPERLESS_FILENAME_DATE_ORDER=<format>
    Paperless will check the document text for document date information.
    Use this setting to enable checking the document filename for date
    information. The date order can be set to any option as specified in
    https://dateparser.readthedocs.io/en/latest/settings.html#date-order.
    The filename will be checked first, and if nothing is found, the document
    text will be checked as normal.

    Defaults to none, which disables this feature.

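As an illustration, if your files are named with the year first (for example
a hypothetical ``2020-12-31 invoice.pdf``), the dateparser order ``YMD``
would be the matching choice:

.. code::

    # Year, then month, then day, as in "2020-12-31 invoice.pdf".
    PAPERLESS_FILENAME_DATE_ORDER=YMD
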
PAPERLESS_FILENAME_PARSE_TRANSFORMS
    Transforms filenames before they are processed by paperless. See
    :ref:`advanced-transforming_filenames` for details.

    Defaults to none, which disables this feature.

Binaries
########

There are a few external software packages that Paperless expects to find on
your system when it starts up. Unless you've done something creative with
their installation, you probably won't need to edit any of these. However,
if you've installed these programs somewhere where simply typing the name of
the program doesn't automatically execute it (i.e. the program isn't in your
$PATH), then you'll need to specify the literal path for that program.

PAPERLESS_CONVERT_BINARY=<path>
    Defaults to "/usr/bin/convert".

PAPERLESS_GS_BINARY=<path>
    Defaults to "/usr/bin/gs".

PAPERLESS_UNPAPER_BINARY=<path>
    Defaults to "/usr/bin/unpaper".

PAPERLESS_OPTIPNG_BINARY=<path>
    Defaults to "/usr/bin/optipng".
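
For example, if Ghostscript and optipng were installed under ``/usr/local``
(a hypothetical location), only those two paths would need overriding:

.. code::

    # Hypothetical install locations outside of $PATH.
    PAPERLESS_GS_BINARY=/usr/local/bin/gs
    PAPERLESS_OPTIPNG_BINARY=/usr/local/bin/optipng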