paperless-ngx/docs/configuration.rst

.. _configuration:

*************
Configuration
*************

Paperless provides a wide range of customizations.
Depending on how you run paperless, these settings have to be defined in different
places.

*   If you run paperless on docker, ``paperless.conf`` is not used. Rather, configure
    paperless by copying necessary options to ``docker-compose.env``.
*   If you are running paperless on anything else, paperless will search for the
    configuration file in these locations and use the first one it finds:

    .. code::

        /path/to/paperless/paperless.conf
        /etc/paperless.conf
        /usr/local/etc/paperless.conf


Required services
#################

PAPERLESS_REDIS=<url>
    This is required for processing scheduled tasks such as email fetching, index
    optimization and for training the automatic document matcher.

    Defaults to redis://localhost:6379.

PAPERLESS_DBHOST=<hostname>
    By default, sqlite is used as the database backend. This can be changed here.
    Set PAPERLESS_DBHOST and PostgreSQL will be used instead of sqlite.

PAPERLESS_DBPORT=<port>
    Adjust port if necessary.

    Default is 5432.

PAPERLESS_DBNAME=<name>
    Database name in PostgreSQL.

    Defaults to "paperless".

PAPERLESS_DBUSER=<name>
    Database user in PostgreSQL.

    Defaults to "paperless".

PAPERLESS_DBPASS=<password>
    Database password for PostgreSQL.

    Defaults to "paperless".

PAPERLESS_DBSSLMODE=<mode>
    SSL mode to use when connecting to PostgreSQL.

    See `the official documentation about sslmode <https://www.postgresql.org/docs/current/libpq-ssl.html>`_.

    Default is ``prefer``.

PAPERLESS_DB_TIMEOUT=<float>
    Amount of time for a database connection to wait for the database to unlock.
    Mostly applicable for an sqlite based installation, consider changing to postgresql
    if you need to increase this.

    Defaults to unset, keeping the Django defaults.

Paths and folders
#################

PAPERLESS_CONSUMPTION_DIR=<path>
    This where your documents should go to be consumed.  Make sure that it exists
    and that the user running the paperless service can read/write its contents
    before you start Paperless.

    Don't change this when using docker, as it only changes the path within the
    container. Change the local consumption directory in the docker-compose.yml
    file instead.

    Defaults to "../consume/", relative to the "src" directory.

PAPERLESS_DATA_DIR=<path>
    This is where paperless stores all its data (search index, SQLite database,
    classification model, etc).

    Defaults to "../data/", relative to the "src" directory.

PAPERLESS_TRASH_DIR=<path>
    Instead of removing deleted documents, they are moved to this directory.

    This must be writeable by the user running paperless. When running inside
    docker, ensure that this path is within a permanent volume (such as
    "../media/trash") so it won't get lost on upgrades.

    Defaults to empty (i.e. really delete documents).

PAPERLESS_MEDIA_ROOT=<path>
    This is where your documents and thumbnails are stored.

    You can set this and PAPERLESS_DATA_DIR to the same folder to have paperless
    store all its data within the same volume.

    Defaults to "../media/", relative to the "src" directory.

PAPERLESS_STATICDIR=<path>
    Override the default STATIC_ROOT here.  This is where all static files
    created using "collectstatic" manager command are stored.

    Unless you're doing something fancy, there is no need to override this.

    Defaults to "../static/", relative to the "src" directory.

PAPERLESS_FILENAME_FORMAT=<format>
    Changes the filenames paperless uses to store documents in the media directory.
    See :ref:`advanced-file_name_handling` for details.

    Default is none, which disables this feature.

PAPERLESS_FILENAME_FORMAT_REMOVE_NONE=<bool>
    Tells paperless to replace placeholders in `PAPERLESS_FILENAME_FORMAT` that would resolve
    to 'none' to be omitted from the resulting filename. This also holds true for directory
    names.
    See :ref:`advanced-file_name_handling` for details.

    Defaults to `false` which disables this feature.

PAPERLESS_LOGGING_DIR=<path>
    This is where paperless will store log files.

    Defaults to "``PAPERLESS_DATA_DIR``/log/".


Logging
#######

PAPERLESS_LOGROTATE_MAX_SIZE=<num>
    Maximum file size for log files before they are rotated, in bytes.

    Defaults to 1 MiB.

PAPERLESS_LOGROTATE_MAX_BACKUPS=<num>
    Number of rotated log files to keep.

    Defaults to 20.

.. _hosting-and-security:

Hosting & Security
##################

PAPERLESS_SECRET_KEY=<key>
    Paperless uses this to make session tokens. If you expose paperless on the
    internet, you need to change this, since the default secret is well known.

    Use any sequence of characters. The more, the better. You don't need to
    remember this. Just face-roll your keyboard.

    Default is listed in the file ``src/paperless/settings.py``.

PAPERLESS_URL=<url>
    This setting can be used to set the three options below (ALLOWED_HOSTS,
    CORS_ALLOWED_HOSTS and CSRF_TRUSTED_ORIGINS). If the other options are
    set the values will be combined with this one. Do not include a trailing
    slash. E.g. https://paperless.domain.com

    Defaults to empty string, leaving the other settings unaffected.

PAPERLESS_CSRF_TRUSTED_ORIGINS=<comma-separated-list>
    A list of trusted origins for unsafe requests (e.g. POST). As of Django 4.0
    this is required to access the Django admin via the web.
    See https://docs.djangoproject.com/en/4.0/ref/settings/#csrf-trusted-origins

    Can also be set using PAPERLESS_URL (see above).

    Defaults to empty string, which does not add any origins to the trusted list.

PAPERLESS_ALLOWED_HOSTS=<comma-separated-list>
    If you're planning on putting Paperless on the open internet, then you
    really should set this value to the domain name you're using.  Failing to do
    so leaves you open to HTTP host header attacks:
    https://docs.djangoproject.com/en/3.1/topics/security/#host-header-validation

    Just remember that this is a comma-separated list, so "example.com" is fine,
    as is "example.com,www.example.com", but NOT " example.com" or "example.com,"

    Can also be set using PAPERLESS_URL (see above).

    If manually set, please remember to include "localhost". Otherwise docker
    healthcheck will fail.

    Defaults to "*", which is all hosts.

PAPERLESS_CORS_ALLOWED_HOSTS=<comma-separated-list>
    You need to add your servers to the list of allowed hosts that can do CORS
    calls. Set this to your public domain name.

    Can also be set using PAPERLESS_URL (see above).

    Defaults to "http://localhost:8000".

PAPERLESS_FORCE_SCRIPT_NAME=<path>
    To host paperless under a subpath url like example.com/paperless you set
    this value to /paperless. No trailing slash!

    Defaults to none, which hosts paperless at "/".

PAPERLESS_STATIC_URL=<path>
    Override the STATIC_URL here.  Unless you're hosting Paperless off a
    subdomain like /paperless/, you probably don't need to change this.

    Defaults to "/static/".

PAPERLESS_AUTO_LOGIN_USERNAME=<username>
    Specify a username here so that paperless will automatically perform login
    with the selected user.

    .. danger::

        Do not use this when exposing paperless on the internet. There are no
        checks in place that would prevent you from doing this.

    Defaults to none, which disables this feature.

PAPERLESS_ADMIN_USER=<username>
    If this environment variable is specified, Paperless automatically creates
    a superuser with the provided username at start. This is useful in cases
    where you can not run the `createsuperuser` command separately, such as Kubernetes
    or AWS ECS.

    Requires `PAPERLESS_ADMIN_PASSWORD` to be set.

    .. note::

        This will not change an existing [super]user's password, nor will
        it recreate a user that already exists. You can leave this throughout
        the lifecycle of the containers.

PAPERLESS_ADMIN_MAIL=<email>
    (Optional) Specify superuser email address. Only used when
    `PAPERLESS_ADMIN_USER` is set.

    Defaults to ``root@localhost``.

PAPERLESS_ADMIN_PASSWORD=<password>
    Only used when `PAPERLESS_ADMIN_USER` is set.
    This will be the password of the automatically created superuser.


PAPERLESS_COOKIE_PREFIX=<str>
    Specify a prefix that is added to the cookies used by paperless to identify
    the currently logged in user. This is useful for when you're running two
    instances of paperless on the same host.

    After changing this, you will have to login again.

    Defaults to ``""``, which does not alter the cookie names.

PAPERLESS_ENABLE_HTTP_REMOTE_USER=<bool>
    Allows authentication via HTTP_REMOTE_USER which is used by some SSO
    applications.

    .. warning::

        This will allow authentication by simply adding a ``Remote-User: <username>`` header
        to a request. Use with care! You especially *must* ensure that any such header is not
        passed from your proxy server to paperless.

        If you're exposing paperless to the internet directly, do not use this.

        Also see the warning `in the official documentation <https://docs.djangoproject.com/en/3.1/howto/auth-remote-user/#configuration>`.

    Defaults to `false` which disables this feature.

PAPERLESS_HTTP_REMOTE_USER_HEADER_NAME=<str>
    If `PAPERLESS_ENABLE_HTTP_REMOTE_USER` is enabled, this property allows to
    customize the name of the HTTP header from which the authenticated username
    is extracted. Values are in terms of
    [HttpRequest.META](https://docs.djangoproject.com/en/3.1/ref/request-response/#django.http.HttpRequest.META).
    Thus, the configured value must start with `HTTP_` followed by the
    normalized actual header name.

    Defaults to `HTTP_REMOTE_USER`.

PAPERLESS_LOGOUT_REDIRECT_URL=<str>
    URL to redirect the user to after a logout. This can be used together with
    `PAPERLESS_ENABLE_HTTP_REMOTE_USER` to redirect the user back to the SSO
    application's logout page.

    Defaults to None, which disables this feature.

.. _configuration-ocr:

OCR settings
############

Paperless uses `OCRmyPDF <https://ocrmypdf.readthedocs.io/en/latest/>`_ for
performing OCR on documents and images. Paperless uses sensible defaults for
most settings, but all of them can be configured to your needs.

PAPERLESS_OCR_LANGUAGE=<lang>
    Customize the language that paperless will attempt to use when
    parsing documents.

    It should be a 3-letter language code consistent with ISO
    639: https://www.loc.gov/standards/iso639-2/php/code_list.php

    Set this to the language most of your documents are written in.

    This can be a combination of multiple languages such as ``deu+eng``,
    in which case tesseract will use whatever language matches best.
    Keep in mind that tesseract uses much more cpu time with multiple
    languages enabled.

    Defaults to "eng".

		Note: If your language contains a '-' such as chi-sim, you must use chi_sim

PAPERLESS_OCR_MODE=<mode>
    Tell paperless when and how to perform ocr on your documents. Four modes
    are available:

    *   ``skip``: Paperless skips all pages and will perform ocr only on pages
        where no text is present. This is the safest option.
    *   ``skip_noarchive``: In addition to skip, paperless won't create an
        archived version of your documents when it finds any text in them.
        This is useful if you don't want to have two almost-identical versions
        of your digital documents in the media folder. This is the fastest option.
    *   ``redo``: Paperless will OCR all pages of your documents and attempt to
        replace any existing text layers with new text. This will be useful for
        documents from scanners that already performed OCR with insufficient
        results. It will also perform OCR on purely digital documents.

        This option may fail on some documents that have features that cannot
        be removed, such as forms. In this case, the text from the document is
        used instead.
    *   ``force``: Paperless rasterizes your documents, converting any text
        into images and puts the OCRed text on top. This works for all documents,
        however, the resulting document may be significantly larger and text
        won't appear as sharp when zoomed in.

    The default is ``skip``, which only performs OCR when necessary and always
    creates archived documents.

    Read more about this in the `OCRmyPDF documentation <https://ocrmypdf.readthedocs.io/en/latest/advanced.html#when-ocr-is-skipped>`_.

PAPERLESS_OCR_CLEAN=<mode>
    Tells paperless to use ``unpaper`` to clean any input document before
    sending it to tesseract. This uses more resources, but generally results
    in better OCR results. The following modes are available:

    *   ``clean``: Apply unpaper.
    *   ``clean-final``: Apply unpaper, and use the cleaned images to build the
        output file instead of the original images.
    *   ``none``: Do not apply unpaper.

    Defaults to ``clean``.

    .. note::

        ``clean-final`` is incompatible with ocr mode ``redo``. When both
        ``clean-final`` and the ocr mode ``redo`` is configured, ``clean``
        is used instead.

PAPERLESS_OCR_DESKEW=<bool>
    Tells paperless to correct skewing (slight rotation of input images mainly
    due to improper scanning)

    Defaults to ``true``, which enables this feature.

    .. note::

        Deskewing is incompatible with ocr mode ``redo``. Deskewing will get
        disabled automatically if ``redo`` is used as the ocr mode.

PAPERLESS_OCR_ROTATE_PAGES=<bool>
    Tells paperless to correct page rotation (90°, 180° and 270° rotation).

    If you notice that paperless is not rotating incorrectly rotated
    pages (or vice versa), try adjusting the threshold up or down (see below).

    Defaults to ``true``, which enables this feature.


PAPERLESS_OCR_ROTATE_PAGES_THRESHOLD=<num>
    Adjust the threshold for automatic page rotation by ``PAPERLESS_OCR_ROTATE_PAGES``.
    This is an arbitrary value reported by tesseract. "15" is a very conservative value,
    whereas "2" is a very aggressive option and will often result in correctly rotated pages
    being rotated as well.

    Defaults to "12".

PAPERLESS_OCR_OUTPUT_TYPE=<type>
    Specify the the type of PDF documents that paperless should produce.

    *   ``pdf``: Modify the PDF document as little as possible.
    *   ``pdfa``: Convert PDF documents into PDF/A-2b documents, which is a
        subset of the entire PDF specification and meant for storing
        documents long term.
    *   ``pdfa-1``, ``pdfa-2``, ``pdfa-3`` to specify the exact version of
        PDF/A you wish to use.

    If not specified, ``pdfa`` is used. Remember that paperless also keeps
    the original input file as well as the archived version.


PAPERLESS_OCR_PAGES=<num>
    Tells paperless to use only the specified amount of pages for OCR. Documents
    with less than the specified amount of pages get OCR'ed completely.

    Specifying 1 here will only use the first page.

    When combined with ``PAPERLESS_OCR_MODE=redo`` or ``PAPERLESS_OCR_MODE=force``,
    paperless will not modify any text it finds on excluded pages and copy it
    verbatim.

    Defaults to 0, which disables this feature and always uses all pages.

PAPERLESS_OCR_IMAGE_DPI=<num>
    Paperless will OCR any images you put into the system and convert them
    into PDF documents. This is useful if your scanner produces images.
    In order to do so, paperless needs to know the DPI of the image.
    Most images from scanners will have this information embedded and
    paperless will detect and use that information. In case this fails, it
    uses this value as a fallback.

    Set this to the DPI your scanner produces images at.

    Default is none, which will automatically calculate image DPI so that
    the produced PDF documents are A4 sized.

PAPERLESS_OCR_MAX_IMAGE_PIXELS=<num>
    Paperless will raise a warning when OCRing images which are over this limit and
    will not OCR images which are more than twice this limit.  Note this does not
    prevent the document from being consumed, but could result in missing text content.

    If unset, will default to the value determined by
    `Pillow <https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.MAX_IMAGE_PIXELS>`_.

    .. note::

        Increasing this limit could cause Paperless to consume additional resources
        when consuming a file.  Be sure you have sufficient system resources.

    .. caution::

        The limit is intended to prevent malicious files from consuming system resources
        and causing crashes and other errors.  Only increase this value if you are certain
        your documents are not malicious and you need the text which was not OCRed

PAPERLESS_OCR_USER_ARGS=<json>
    OCRmyPDF offers many more options. Use this parameter to specify any
    additional arguments you wish to pass to OCRmyPDF. Since Paperless uses
    the API of OCRmyPDF, you have to specify these in a format that can be
    passed to the API. See `the API reference of OCRmyPDF <https://ocrmypdf.readthedocs.io/en/latest/api.html#reference>`_
    for valid parameters. All command line options are supported, but they
    use underscores instead of dashes.

    .. caution::

        Paperless has been tested to work with the OCR options provided
        above. There are many options that are incompatible with each other,
        so specifying invalid options may prevent paperless from consuming
        any documents.

    Specify arguments as a JSON dictionary. Keep note of lower case booleans
    and double quoted parameter names and strings. Examples:

    .. code:: json

        {"deskew": true, "optimize": 3, "unpaper_args": "--pre-rotate 90"}

.. _configuration-tika:

Tika settings
#############

Paperless can make use of `Tika <https://tika.apache.org/>`_ and
`Gotenberg <https://gotenberg.dev/>`_ for parsing and
converting "Office" documents (such as ".doc", ".xlsx" and ".odt"). If you
wish to use this, you must provide a Tika server and a Gotenberg server,
configure their endpoints, and enable the feature.

PAPERLESS_TIKA_ENABLED=<bool>
    Enable (or disable) the Tika parser.

    Defaults to false.

PAPERLESS_TIKA_ENDPOINT=<url>
    Set the endpoint URL were Paperless can reach your Tika server.

    Defaults to "http://localhost:9998".

PAPERLESS_TIKA_GOTENBERG_ENDPOINT=<url>
    Set the endpoint URL were Paperless can reach your Gotenberg server.

    Defaults to "http://localhost:3000".

If you run paperless on docker, you can add those services to the docker-compose
file (see the provided ``docker-compose.sqlite-tika.yml`` file for reference). The changes
requires are as follows:

.. code:: yaml

    services:
        # ...

        webserver:
            # ...

            environment:
                # ...

                PAPERLESS_TIKA_ENABLED: 1
                PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
                PAPERLESS_TIKA_ENDPOINT: http://tika:9998

        # ...

        gotenberg:
            image: gotenberg/gotenberg:7.4
            restart: unless-stopped
            command:
                - "gotenberg"
                - "--chromium-disable-routes=true"

        tika:
            image: ghcr.io/paperless-ngx/tika:latest
            restart: unless-stopped

Add the configuration variables to the environment of the webserver (alternatively
put the configuration in the ``docker-compose.env`` file) and add the additional
services below the webserver service. Watch out for indentation.

Make sure to use the correct format `PAPERLESS_TIKA_ENABLED = 1` so python_dotenv can parse the statement correctly.

Software tweaks
###############

PAPERLESS_TASK_WORKERS=<num>
    Paperless does multiple things in the background: Maintain the search index,
    maintain the automatic matching algorithm, check emails, consume documents,
    etc. This variable specifies how many things it will do in parallel.

    Defaults to 1


PAPERLESS_THREADS_PER_WORKER=<num>
    Furthermore, paperless uses multiple threads when consuming documents to
    speed up OCR. This variable specifies how many pages paperless will process
    in parallel on a single document.

    .. caution::

        Ensure that the product

            PAPERLESS_TASK_WORKERS * PAPERLESS_THREADS_PER_WORKER

        does not exceed your CPU core count or else paperless will be extremely slow.
        If you want paperless to process many documents in parallel, choose a high
        worker count. If you want paperless to process very large documents faster,
        use a higher thread per worker count.

    The default is a balance between the two, according to your CPU core count,
    with a slight favor towards threads per worker:

    +----------------+---------+---------+
    | CPU core count | Workers | Threads |
    +----------------+---------+---------+
    |              1 |       1 |       1 |
    +----------------+---------+---------+
    |              2 |       2 |       1 |
    +----------------+---------+---------+
    |              4 |       2 |       2 |
    +----------------+---------+---------+
    |              6 |       2 |       3 |
    +----------------+---------+---------+
    |              8 |       2 |       4 |
    +----------------+---------+---------+
    |             12 |       3 |       4 |
    +----------------+---------+---------+
    |             16 |       4 |       4 |
    +----------------+---------+---------+

    If you only specify PAPERLESS_TASK_WORKERS, paperless will adjust
    PAPERLESS_THREADS_PER_WORKER automatically.


PAPERLESS_WORKER_TIMEOUT=<num>
    Machines with few cores or weak ones might not be able to finish OCR on
    large documents within the default 1800 seconds. So extending this timeout
    may prove to be useful on weak hardware setups.

PAPERLESS_WORKER_RETRY=<num>
    If PAPERLESS_WORKER_TIMEOUT has been configured, the retry time for a task can
    also be configured.  By default, this value will be set to 10s more than the
    worker timeout.  This value should never be set less than the worker timeout.

PAPERLESS_TIME_ZONE=<timezone>
    Set the time zone here.
    See https://docs.djangoproject.com/en/3.1/ref/settings/#std:setting-TIME_ZONE
    for details on how to set it.

    Defaults to UTC.


.. _configuration-polling:

PAPERLESS_CONSUMER_POLLING=<num>
    If paperless won't find documents added to your consume folder, it might
    not be able to automatically detect filesystem changes. In that case,
    specify a polling interval in seconds here, which will then cause paperless
    to periodically check your consumption directory for changes. This will also
    disable listening for file system changes with ``inotify``.

    Defaults to 0, which disables polling and uses filesystem notifications.

PAPERLESS_CONSUMER_POLLING_RETRY_COUNT=<num>
    If consumer polling is enabled, sets the number of times paperless will check for a
    file to remain unmodified.

    Defaults to 5.

PAPERLESS_CONSUMER_POLLING_DELAY=<num>
    If consumer polling is enabled, sets the delay in seconds between each check (above) paperless
    will do while waiting for a file to remain unmodified.

    Defaults to 5.

.. _configuration-inotify:

PAPERLESS_CONSUMER_INOTIFY_DELAY=<num>
    Sets the time in seconds the consumer will wait for additional events
    from inotify before the consumer will consider a file ready and begin consumption.
    Certain scanners or network setups may generate multiple events for a single file,
    leading to multiple consumers working on the same file.  Configure this to
    prevent that.

    Defaults to 0.5 seconds.

PAPERLESS_CONSUMER_DELETE_DUPLICATES=<bool>
    When the consumer detects a duplicate document, it will not touch the
    original document. This default behavior can be changed here.

    Defaults to false.


PAPERLESS_CONSUMER_RECURSIVE=<bool>
    Enable recursive watching of the consumption directory. Paperless will
    then pickup files from files in subdirectories within your consumption
    directory as well.

    Defaults to false.


PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS=<bool>
    Set the names of subdirectories as tags for consumed files.
    E.g. <CONSUMPTION_DIR>/foo/bar/file.pdf will add the tags "foo" and "bar" to
    the consumed file. Paperless will create any tags that don't exist yet.

    This is useful for sorting documents with certain tags such as ``car`` or
    ``todo`` prior to consumption. These folders won't be deleted.

    PAPERLESS_CONSUMER_RECURSIVE must be enabled for this to work.

    Defaults to false.

PAPERLESS_CONSUMER_ENABLE_BARCODES=<bool>
    Enables the scanning and page separation based on detected barcodes.
    This allows for scanning and adding multiple documents per uploaded
    file, which are separated by one or multiple barcode pages.

    For ease of use, it is suggested to use a standardized separation page,
    e.g. `here <https://www.alliancegroup.co.uk/patch-codes.htm>`_.

    If no barcodes are detected in the uploaded file, no page separation
    will happen.

    The original document will be removed and the separated pages will be
    saved as pdf.

    Defaults to false.

PAPERLESS_CONSUMER_BARCODE_TIFF_SUPPORT=<bool>
    Whether TIFF image files should be scanned for barcodes.
    This will automatically convert any TIFF image(s) to pdfs for later
    processing.
    This only has an effect, if PAPERLESS_CONSUMER_ENABLE_BARCODES has been
    enabled.

    Defaults to false.

PAPERLESS_CONSUMER_BARCODE_STRING=PATCHT
  Defines the string to be detected as a separator barcode.
  If paperless is used with the PATCH-T separator pages, users
  shouldn't change this.

  Defaults to "PATCHT"

PAPERLESS_CONVERT_MEMORY_LIMIT=<num>
    On smaller systems, or even in the case of Very Large Documents, the consumer
    may explode, complaining about how it's "unable to extend pixel cache".  In
    such cases, try setting this to a reasonably low value, like 32.  The
    default is to use whatever is necessary to do everything without writing to
    disk, and units are in megabytes.

    For more information on how to use this value, you should search
    the web for "MAGICK_MEMORY_LIMIT".

    Defaults to 0, which disables the limit.

PAPERLESS_CONVERT_TMPDIR=<path>
    Similar to the memory limit, if you've got a small system and your OS mounts
    /tmp as tmpfs, you should set this to a path that's on a physical disk, like
    /home/your_user/tmp or something.  ImageMagick will use this as scratch space
    when crunching through very large documents.

    For more information on how to use this value, you should search
    the web for "MAGICK_TMPDIR".

    Default is none, which disables the temporary directory.

PAPERLESS_POST_CONSUME_SCRIPT=<filename>
    After a document is consumed, Paperless can trigger an arbitrary script if
    you like.  This script will be passed a number of arguments for you to work
    with. For more information, take a look at :ref:`advanced-post_consume_script`.

    The default is blank, which means nothing will be executed.

PAPERLESS_FILENAME_DATE_ORDER=<format>
    Paperless will check the document text for document date information.
    Use this setting to enable checking the document filename for date
    information. The date order can be set to any option as specified in
    https://dateparser.readthedocs.io/en/latest/settings.html#date-order.
    The filename will be checked first, and if nothing is found, the document
    text will be checked as normal.

    A date in a filename must have some separators (`.`, `-`, `/`, etc)
    for it to be parsed.

    Defaults to none, which disables this feature.

PAPERLESS_THUMBNAIL_FONT_NAME=<filename>
    Paperless creates thumbnails for plain text files by rendering the content
    of the file on an image and uses a predefined font for that. This
    font can be changed here.

    Note that this won't have any effect on already generated thumbnails.

    Defaults to ``/usr/share/fonts/liberation/LiberationSerif-Regular.ttf``.

PAPERLESS_IGNORE_DATES=<string>
    Paperless parses a documents creation date from filename and file content.
    You may specify a comma separated list of dates that should be ignored during
    this process. This is useful for special dates (like date of birth) that appear
    in documents regularly but are very unlikely to be the documents creation date.

    The date is parsed using the order specified in PAPERLESS_DATE_ORDER

    Defaults to an empty string to not ignore any dates.

PAPERLESS_DATE_ORDER=<format>
    Paperless will try to determine the document creation date from its contents.
    Specify the date format Paperless should expect to see within your documents.

    This option defaults to DMY which translates to day first, month second, and year
    last order. Characters D, M, or Y can be shuffled to meet the required order.

PAPERLESS_CONSUMER_IGNORE_PATTERNS=<json>
    By default, paperless ignores certain files and folders in the consumption
    directory, such as system files created by the Mac OS.

    This can be adjusted by configuring a custom json array with patterns to exclude.

    Defaults to ``[".DS_STORE/*", "._*", ".stfolder/*", ".stversions/*", ".localized/*", "desktop.ini"]``.

Binaries
########

There are a few external software packages that Paperless expects to find on
your system when it starts up.  Unless you've done something creative with
their installation, you probably won't need to edit any of these.  However,
if you've installed these programs somewhere where simply typing the name of
the program doesn't automatically execute it (ie. the program isn't in your
$PATH), then you'll need to specify the literal path for that program.

PAPERLESS_CONVERT_BINARY=<path>
    Defaults to "/usr/bin/convert".

PAPERLESS_GS_BINARY=<path>
    Defaults to "/usr/bin/gs".


.. _configuration-docker:

Docker-specific options
#######################

These options don't have any effect in ``paperless.conf``. These options adjust
the behavior of the docker container. Configure these in `docker-compose.env`.

PAPERLESS_WEBSERVER_WORKERS=<num>
    The number of worker processes the webserver should spawn. More worker processes
    usually result in the front end to load data much quicker. However, each worker process
    also loads the entire application into memory separately, so increasing this value
    will increase RAM usage.

    Defaults to 1.

PAPERLESS_PORT=<port>
    The port number the webserver will listen on inside the container. There are
    special setups where you may need this to avoid collisions with other
    services (like using podman with multiple containers in one pod).

    Don't change this when using Docker. To change the port the webserver is
    reachable outside of the container, instead refer to the "ports" key in
    ``docker-compose.yml``.

    Defaults to 8000.

USERMAP_UID=<uid>
    The ID of the paperless user in the container. Set this to your actual user ID on the
    host system, which you can get by executing

    .. code:: shell-session

        $ id -u

    Paperless will change ownership on its folders to this user, so you need to get this right
    in order to be able to write to the consumption directory.

    Defaults to 1000.

USERMAP_GID=<gid>
    The ID of the paperless Group in the container. Set this to your actual group ID on the
    host system, which you can get by executing

    .. code:: shell-session

        $ id -g

    Paperless will change ownership on its folders to this group, so you need to get this right
    in order to be able to write to the consumption directory.

    Defaults to 1000.

PAPERLESS_OCR_LANGUAGES=<list>
    Additional OCR languages to install. By default, paperless comes with
    English, German, Italian, Spanish and French. If your language is not in this list, install
    additional languages with this configuration option:

    .. code:: bash

        PAPERLESS_OCR_LANGUAGES=tur ces

    To actually use these languages, also set the default OCR language of paperless:

    .. code:: bash

        PAPERLESS_OCR_LANGUAGE=tur

    Defaults to none, which does not install any additional languages.


.. _configuration-update-checking:

Update Checking
###############

PAPERLESS_ENABLE_UPDATE_CHECK=<bool>
    Enable (or disable) the automatic check for available updates. This feature is disabled
    by default but if it is not explicitly set Paperless-ngx will show a message about this.

    If enabled, the feature works by pinging the the Github API for the latest release e.g.
    https://api.github.com/repos/paperless-ngx/paperless-ngx/releases/latest
    to determine whether a new version is available.

    Actual updating of the app must still be performed manually.

    Note that for users of thirdy-party containers e.g. linuxserver.io this notification
    may be 'ahead' of a new release from the third-party maintainers.

    In either case, no tracking data is collected by the app in any way.

    Defaults to none, which disables the feature.