.. _configuration:

*************
Configuration
*************

Paperless provides a wide range of customizations.
Depending on how you run paperless, these settings have to be defined in different
places.

*   If you run paperless on docker, ``paperless.conf`` is not used. Rather, configure
    paperless by copying necessary options to ``docker-compose.env``.
*   If you are running paperless on anything else, paperless will search for the
    configuration file in these locations and use the first one it finds:

    .. code::

        /path/to/paperless/paperless.conf
        /etc/paperless.conf
        /usr/local/etc/paperless.conf
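
In both cases, settings are written as plain ``KEY=value`` lines. A minimal
sketch of such a file, using two options documented below with purely
illustrative values:

.. code::

    # Illustrative values only; adjust them to your setup.
    PAPERLESS_TIME_ZONE=Europe/Berlin
    PAPERLESS_OCR_LANGUAGE=deu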


Required services
#################

PAPERLESS_REDIS=<url>
    This is required for processing scheduled tasks such as email fetching, index
    optimization, and training of the automatic document matcher.

    Defaults to redis://localhost:6379.

PAPERLESS_DBHOST=<hostname>
    By default, SQLite is used as the database backend. This can be changed here.
    Set PAPERLESS_DBHOST and PostgreSQL will be used instead of SQLite.

PAPERLESS_DBPORT=<port>
    Adjust port if necessary.

    Default is 5432.

PAPERLESS_DBNAME=<name>
    Database name in PostgreSQL.

    Defaults to "paperless".

PAPERLESS_DBUSER=<name>
    Database user in PostgreSQL.

    Defaults to "paperless".

PAPERLESS_DBPASS=<password>
    Database password for PostgreSQL.

    Defaults to "paperless".
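
As an illustration, a PostgreSQL setup could be configured as follows. The host
name and password are placeholders, not defaults:

.. code::

    # Setting PAPERLESS_DBHOST switches the backend from SQLite to PostgreSQL.
    # db.example.com and change-me are placeholders.
    PAPERLESS_DBHOST=db.example.com
    PAPERLESS_DBPORT=5432
    PAPERLESS_DBNAME=paperless
    PAPERLESS_DBUSER=paperless
    PAPERLESS_DBPASS=change-me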


Paths and folders
#################

PAPERLESS_CONSUMPTION_DIR=<path>
    This is where your documents should go to be consumed.  Make sure that it exists
    and that the user running the paperless service can read/write its contents
    before you start Paperless.

    Don't change this when using docker, as it only changes the path within the
    container. Change the local consumption directory in the docker-compose.yml
    file instead.

    Defaults to "../consume", relative to the "src" directory.

PAPERLESS_DATA_DIR=<path>
    This is where paperless stores all its data (search index, SQLite database,
    classification model, etc).

    Defaults to "../data", relative to the "src" directory.

PAPERLESS_MEDIA_ROOT=<path>
    This is where your documents and thumbnails are stored.

    You can set this and PAPERLESS_DATA_DIR to the same folder to have paperless
    store all its data within the same volume.
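
    A sketch of such a shared setup, assuming a hypothetical directory
    ``/var/lib/paperless`` that the user running paperless can write to:

    .. code::

        # Hypothetical path; both settings point at the same folder.
        PAPERLESS_DATA_DIR=/var/lib/paperless
        PAPERLESS_MEDIA_ROOT=/var/lib/paperless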

    Defaults to "../media", relative to the "src" directory.

PAPERLESS_STATICDIR=<path>
    Override the default STATIC_ROOT here.  This is where all static files
    created using "collectstatic" manager command are stored.

    Unless you're doing something fancy, there is no need to override this.

    Defaults to "../static", relative to the "src" directory.

PAPERLESS_FILENAME_FORMAT=<format>
    Changes the filenames paperless uses to store documents in the media directory.
    See :ref:`advanced-file_name_handling` for details.

    Default is none, which disables this feature.

Hosting & Security
##################

PAPERLESS_SECRET_KEY=<key>
    Paperless uses this to make session tokens. If you expose paperless on the
    internet, you need to change this, since the default secret is well known.

    Use any sequence of characters. The more, the better. You don't need to
    remember this. Just face-roll your keyboard.

    Default is listed in the file ``src/paperless/settings.py``.

PAPERLESS_ALLOWED_HOSTS=<comma-separated-list>
    If you're planning on putting Paperless on the open internet, then you
    really should set this value to the domain name you're using.  Failing to do
    so leaves you open to HTTP host header attacks:
    https://docs.djangoproject.com/en/3.1/topics/security/#host-header-validation

    Just remember that this is a comma-separated list, so "example.com" is fine,
    as is "example.com,www.example.com", but NOT " example.com" or "example.com,"
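
    For example, to accept requests for two host names (both are placeholders):

    .. code::

        # No spaces and no trailing comma.
        PAPERLESS_ALLOWED_HOSTS=example.com,www.example.com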

    Defaults to "*", which is all hosts.

PAPERLESS_CORS_ALLOWED_HOSTS=<comma-separated-list>
    You need to add your servers to the list of allowed hosts that can do CORS
    calls. Set this to your public domain name.
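
    Judging by the default value, entries include the URL scheme. A sketch with
    a placeholder domain, assuming paperless is served over HTTPS:

    .. code::

        # paperless.example.com is a placeholder for your public domain.
        PAPERLESS_CORS_ALLOWED_HOSTS=https://paperless.example.com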

    Defaults to "http://localhost:8000".

PAPERLESS_FORCE_SCRIPT_NAME=<path>
    To host paperless under a subpath url like example.com/paperless you set
    this value to /paperless. No trailing slash!

    .. note::

        I don't know if this works in paperless-ng. Probably not.

    Defaults to none, which hosts paperless at "/".

PAPERLESS_STATIC_URL=<path>
    Override the STATIC_URL here.  Unless you're hosting Paperless under a
    subpath like /paperless/, you probably don't need to change this.

    Defaults to "/static/".

PAPERLESS_AUTO_LOGIN_USERNAME=<username>
    Specify a username here so that paperless will automatically perform login
    with the selected user.

    .. danger::

        Do not use this when exposing paperless on the internet. There are no
        checks in place that would prevent you from doing this.

    Defaults to none, which disables this feature.

Software tweaks
###############

PAPERLESS_TASK_WORKERS=<num>
    Paperless does multiple things in the background: Maintain the search index,
    maintain the automatic matching algorithm, check emails, consume documents,
    etc. This variable specifies how many things it will do in parallel.

PAPERLESS_THREADS_PER_WORKER=<num>
    Furthermore, paperless uses multiple threads when consuming documents to
    speed up OCR. This variable specifies how many pages paperless will process
    in parallel on a single document.

    .. caution::

        Ensure that the product

            PAPERLESS_TASK_WORKERS * PAPERLESS_THREADS_PER_WORKER

        does not exceed your CPU core count or else paperless will be extremely slow.
        If you want paperless to process many documents in parallel, choose a high
        worker count. If you want paperless to process very large documents faster,
        use a higher threads-per-worker count.

    The default is a balance between the two, according to your CPU core count,
    with a slight preference for threads per worker, and uses as many cores as
    possible.

    If you only specify PAPERLESS_TASK_WORKERS, paperless will adjust
    PAPERLESS_THREADS_PER_WORKER automatically.
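
    As a worked example, on a machine with 4 CPU cores (an assumed figure), the
    following keeps the product at ``2 * 2 = 4``:

    .. code::

        # 2 workers * 2 threads per worker = 4, matching an assumed 4-core CPU.
        PAPERLESS_TASK_WORKERS=2
        PAPERLESS_THREADS_PER_WORKER=2

    Shifting the split to 4 workers with 1 thread each would favor many small
    documents, while 1 worker with 4 threads would favor very large ones.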



PAPERLESS_TIME_ZONE=<timezone>
    Set the time zone here.
    See https://docs.djangoproject.com/en/3.1/ref/settings/#std:setting-TIME_ZONE
    for details on how to set it.
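
    For example, using a time zone name from the tz database (the zone itself
    is only an illustration):

    .. code::

        PAPERLESS_TIME_ZONE=Europe/Berlin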

    Defaults to UTC.



PAPERLESS_OCR_PAGES=<num>
    Tells paperless to use only the specified number of pages for OCR. Documents
    with fewer pages than that are OCR'ed completely.

    Specifying 1 here will only use the first page.

    Defaults to 0, which disables this feature and always uses all pages.



PAPERLESS_OCR_LANGUAGE=<lang>
    Customize the default language that tesseract will attempt to use when
    parsing documents. The default language is used whenever

    * No language could be detected on a document
    * No tesseract data files are available for the detected language

    It should be a 3-letter language code consistent with ISO
    639: https://www.loc.gov/standards/iso639-2/php/code_list.php

    Set this to the language most of your documents are written in.
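
    For instance, for a collection of mostly German documents, assuming the
    matching tesseract data files are installed:

    .. code::

        # "deu" is the 3-letter code for German.
        PAPERLESS_OCR_LANGUAGE=deu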

    Defaults to "eng".

PAPERLESS_OCR_ALWAYS=<bool>
    By default Paperless does not OCR a document if the text can be retrieved from
    the document directly. Set to true to always OCR documents.

    Defaults to false.

PAPERLESS_CONSUMER_POLLING=<num>
    If paperless does not find documents added to your consume folder, it might
    not be able to automatically detect filesystem changes. In that case,
    specify a polling interval in seconds here, which will then cause paperless
    to periodically check your consumption directory for changes.
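
    A sketch that checks the consumption directory every 60 seconds (the
    interval is an arbitrary example value):

    .. code::

        PAPERLESS_CONSUMER_POLLING=60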

    Defaults to 0, which disables polling and uses filesystem notifications.

PAPERLESS_CONSUMER_DELETE_DUPLICATES=<bool>
    When the consumer detects a duplicate document, it refuses to consume it and
    leaves the duplicate file in the consumption directory untouched. Set this to
    true to have paperless delete the duplicate file instead.

    Defaults to false.

PAPERLESS_CONVERT_MEMORY_LIMIT=<num>
    On smaller systems, or even in the case of Very Large Documents, the consumer
    may explode, complaining about how it's "unable to extend pixel cache".  In
    such cases, try setting this to a reasonably low value, like 32.  The
    default is to use whatever is necessary to do everything without writing to
    disk, and units are in megabytes.

    For more information on how to use this value, you should search
    the web for "MAGICK_MEMORY_LIMIT".

    Defaults to 0, which disables the limit.

PAPERLESS_CONVERT_TMPDIR=<path>
    Similar to the memory limit, if you've got a small system and your OS mounts
    /tmp as tmpfs, you should set this to a path that's on a physical disk, like
    /home/your_user/tmp or something.  ImageMagick will use this as scratch space
    when crunching through very large documents.

    For more information on how to use this value, you should search
    the web for "MAGICK_TMPDIR".
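
    A sketch for a small system that combines this with the memory limit
    described above. The path is a placeholder and has to exist on a physical
    disk:

    .. code::

        # Limit in megabytes; 32 is the low value suggested above.
        PAPERLESS_CONVERT_MEMORY_LIMIT=32
        # Placeholder path; replace with a directory on a physical disk.
        PAPERLESS_CONVERT_TMPDIR=/home/your_user/tmp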

    Default is none, which disables the temporary directory.

PAPERLESS_CONVERT_DENSITY=<num>
    This setting strongly affects the size of temporary page files and the
    speed of document conversion, and it can affect the accuracy of OCR
    results. Results vary, so test this setting thoroughly against the
    documents you are importing to see whether it helps or hurts.
    Testing on limited document sets has shown that a setting of 200 can cut
    the size of temporary files by 1/3 and speed up conversion by up to 4x
    with little impact on OCR accuracy.
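
    To try the lower value mentioned above, and then compare OCR results
    against your own documents:

    .. code::

        PAPERLESS_CONVERT_DENSITY=200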

    Default is 300.

PAPERLESS_OPTIMIZE_THUMBNAILS=<bool>
    Use optipng to optimize thumbnails. This usually reduces the size of
    thumbnails by about 20%, but uses considerable compute time during
    consumption.

    Defaults to true.

PAPERLESS_POST_CONSUME_SCRIPT=<filename>
    After a document is consumed, Paperless can trigger an arbitrary script if
    you like.  This script will be passed a number of arguments for you to work
    with. For more information, take a look at :ref:`advanced-post_consume_script`.

    The default is blank, which means nothing will be executed.

PAPERLESS_FILENAME_DATE_ORDER=<format>
    Paperless will check the document text for document date information.
    Use this setting to enable checking the document filename for date
    information. The date order can be set to any option as specified in
    https://dateparser.readthedocs.io/en/latest/settings.html#date-order.
    The filename will be checked first, and if nothing is found, the document
    text will be checked as normal.
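
    For example, to read dates in filenames as year, month, day (``YMD`` is one
    of the date orders listed in the dateparser documentation linked above):

    .. code::

        PAPERLESS_FILENAME_DATE_ORDER=YMD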

    Defaults to none, which disables this feature.

PAPERLESS_FILENAME_PARSE_TRANSFORMS
    Transforms filenames before they are processed by paperless. See
    :ref:`advanced-transforming_filenames` for details.

    Defaults to none, which disables this feature.

Binaries
########

There are a few external software packages that Paperless expects to find on
your system when it starts up.  Unless you've done something creative with
their installation, you probably won't need to edit any of these.  However,
if you've installed these programs somewhere where simply typing the name of
the program doesn't automatically execute it (i.e. the program isn't in your
$PATH), then you'll need to specify the literal path for that program.

PAPERLESS_CONVERT_BINARY=<path>
    Defaults to "/usr/bin/convert".

PAPERLESS_GS_BINARY=<path>
    Defaults to "/usr/bin/gs".

PAPERLESS_UNPAPER_BINARY=<path>
    Defaults to "/usr/bin/unpaper".

PAPERLESS_OPTIPNG_BINARY=<path>
    Defaults to "/usr/bin/optipng".
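
If, for example, these tools were installed under ``/usr/local/bin`` (an assumed
location, not a default), the overrides could look like this:

.. code::

    # Assumed install location; use the actual paths on your system.
    PAPERLESS_CONVERT_BINARY=/usr/local/bin/convert
    PAPERLESS_GS_BINARY=/usr/local/bin/gs
    PAPERLESS_UNPAPER_BINARY=/usr/local/bin/unpaper
    PAPERLESS_OPTIPNG_BINARY=/usr/local/bin/optipng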