Merge branch 'dev' into beta

2025-11-09 03:46:12 -06:00 · 2022-11-09 13:51:10 -08:00
parent 3be5cc3f77 24453f34b8
commit fbce827583
168 changed files with 19419 additions and 23870 deletions
--- a/docs/advanced_usage.rst
+++ b/docs/advanced_usage.rst
@@ -258,12 +258,18 @@ Paperless provides the following placeholders within filenames:
 * ``{tag_list}``: A comma separated list of all tags assigned to the document.
 * ``{title}``: The title of the document.
 * ``{created}``: The full date (ISO format) the document was created.
-* ``{created_year}``: Year created only.
+* ``{created_year}``: Year created only, formatted as the year with century.
+* ``{created_year_short}``: Year created only, formatted as the year without century, zero padded.
 * ``{created_month}``: Month created only (number 01-12).
+* ``{created_month_name}``: Name of the month created, as per locale.
+* ``{created_month_name_short}``: Abbreviated name of the month created, as per locale.
 * ``{created_day}``: Day created only (number 01-31).
 * ``{added}``: The full date (ISO format) the document was added to paperless.
 * ``{added_year}``: Year added only.
+* ``{added_year_short}``: Year added only, formatted as the year without century, zero padded.
 * ``{added_month}``: Month added only (number 01-12).
+* ``{added_month_name}``: Name of the month added, as per locale.
+* ``{added_month_name_short}``: Abbreviated name of the month added, as per locale.
 * ``{added_day}``: Day added only (number 01-31).


@@ -364,3 +370,50 @@ For simplicity, `By Year` defines the same structure as in the previous example

    If you adjust the format of an existing storage path, old documents don't get relocated automatically.
    You need to run the :ref:`document renamer <utilities-renamer>` to adjust their pathes.
+
+.. _advanced-celery-monitoring:
+
+Celery Monitoring
+#################
+
+The monitoring tool `Flower <https://flower.readthedocs.io/en/latest/index.html>`_ can be used to view more
+detailed information about the health of the Celery workers used for asynchronous tasks.  This includes details
+on currently running, queued and completed tasks, timing and more.  Flower can also be used with Prometheus, as it
+exports metrics.  For details on its capabilities, refer to the Flower documentation.
+
+To configure Flower further, create a ``flowerconfig.py`` and place it into the ``src/paperless`` directory.  For
+a Docker installation, you can use volumes to accomplish this:
+
+.. code:: yaml
+
+    services:
+      # ...
+      webserver:
+        # ...
+        volumes:
+          - /path/to/my/flowerconfig.py:/usr/src/paperless/src/paperless/flowerconfig.py:ro
+
+Custom Container Initialization
+###############################
+
+The Docker image includes the ability to run custom user scripts during startup.  This could be
+utilized for installing additional tools or Python packages, for example.
+
+To utilize this, mount a folder containing your scripts to the custom initialization directory, ``/custom-cont-init.d``,
+and place the scripts you wish to run inside.  For security, the folder and its contents must be owned by ``root``.
+Additionally, scripts must only be writable by ``root``.
+
+Your scripts will be run directly before the webserver completes startup.  Scripts will be run by the ``root`` user.
+This is an advanced feature: a faulty script could break functionality or cause data loss.
+
+For example, using Docker Compose:
+
+
+.. code:: yaml
+
+    services:
+      # ...
+      webserver:
+        # ...
+        volumes:
+          - /path/to/my/scripts:/custom-cont-init.d:ro
--- a/docs/configuration.rst
+++ b/docs/configuration.rst
@@ -538,7 +538,7 @@ requires are as follows:
        # ...

        gotenberg:
-            image: gotenberg/gotenberg:7.4
+            image: gotenberg/gotenberg:7.6
            restart: unless-stopped
            command:
                - "gotenberg"
@@ -701,6 +701,7 @@ PAPERLESS_CONSUMER_ENABLE_BARCODES=<bool>

    Defaults to false.

+
 PAPERLESS_CONSUMER_BARCODE_TIFF_SUPPORT=<bool>
    Whether TIFF image files should be scanned for barcodes.
    This will automatically convert any TIFF image(s) to pdfs for later
@@ -901,6 +902,14 @@ PAPERLESS_OCR_LANGUAGES=<list>

    Defaults to none, which does not install any additional languages.

+PAPERLESS_ENABLE_FLOWER=<defined>
+    If this environment variable is defined, the Celery monitoring tool
+    `Flower <https://flower.readthedocs.io/en/latest/index.html>`_ will
+    be started by the container.
+
+    You can read more about this in the :ref:`advanced setup <advanced-celery-monitoring>`
+    documentation.
+

 .. _configuration-update-checking:

@@ -908,18 +917,9 @@ Update Checking
 ###############

 PAPERLESS_ENABLE_UPDATE_CHECK=<bool>
-    Enable (or disable) the automatic check for available updates. This feature is disabled
-    by default but if it is not explicitly set Paperless-ngx will show a message about this.

-    If enabled, the feature works by pinging the the Github API for the latest release e.g.
-    https://api.github.com/repos/paperless-ngx/paperless-ngx/releases/latest
-    to determine whether a new version is available.
+    .. note::

-    Actual updating of the app must still be performed manually.
-
-    Note that for users of thirdy-party containers e.g. linuxserver.io this notification
-    may be 'ahead' of a new release from the third-party maintainers.
-
-    In either case, no tracking data is collected by the app in any way.
-
-    Defaults to none, which disables the feature.
+            This setting was deprecated in favor of a frontend setting after v1.9.2. A one-time
+            migration is performed for users who have this setting set. This setting is always
+            ignored if the corresponding frontend setting has been set.
--- a/docs/extending.rst
+++ b/docs/extending.rst
@@ -112,7 +112,7 @@ To do the setup you need to perform the steps from the following chapters in a c

    .. code:: shell-session

-        python3 manage.py runserver & python3 manage.py document_consumer & python3 manage.py qcluster
+        python3 manage.py runserver & python3 manage.py document_consumer & celery --app paperless worker

 11. Login with the superuser credentials provided in step 8 at ``http://localhost:8000`` to create a session that enables you to use the backend.

@@ -128,14 +128,14 @@ Configure the IDE to use the src/ folder as the base source folder. Configure th
 launch configurations in your IDE:

 *   python3 manage.py runserver
-*   python3 manage.py qcluster
+*   celery --app paperless worker
 *   python3 manage.py document_consumer

 To start them all:

 .. code:: shell-session

-    python3 manage.py runserver & python3 manage.py document_consumer & python3 manage.py qcluster
+    python3 manage.py runserver & python3 manage.py document_consumer & celery --app paperless worker

 Testing and code style:

--- a/docs/requirements.txt
+++ b/docs/requirements.txt
@@ -1 +1 @@
-myst-parser==0.17.2
+myst-parser==0.18.1
--- a/docs/setup.rst
+++ b/docs/setup.rst
@@ -39,7 +39,7 @@ Paperless consists of the following components:

    .. _setup-task_processor:

-*   **The task processor:** Paperless relies on `Django Q <https://django-q.readthedocs.io/en/latest/>`_
+*   **The task processor:** Paperless relies on `Celery - Distributed Task Queue <https://docs.celeryq.dev/en/stable/index.html>`_
    for doing most of the heavy lifting. This is a task queue that accepts tasks from
    multiple sources and processes these in parallel. It also comes with a scheduler that executes
    certain commands periodically.
@@ -62,13 +62,6 @@ Paperless consists of the following components:
    tasks fail and inspect the errors (i.e., wrong email credentials, errors during consuming a specific
    file, etc).

-    You may start the task processor by executing:
-
-    .. code:: shell-session
-
-        $ cd /path/to/paperless/src/
-        $ python3 manage.py qcluster
-
 *   A `redis <https://redis.io/>`_ message broker: This is a really lightweight service that is responsible
    for getting the tasks from the webserver and the consumer to the task scheduler. These run in a different
    process (maybe even on different machines!), and therefore, this is necessary.
@@ -291,7 +284,20 @@ Build the Docker image yourself
    .. code:: yaml

        webserver:
-            build: .
+            build:
+              context: .
+              args:
+                QPDF_VERSION: x.y.z
+                PIKEPDF_VERSION: x.y.z
+                PSYCOPG2_VERSION: x.y.z
+                JBIG2ENC_VERSION: 0.29
+
+    .. note::
+
+        You should match the build argument versions to those of the release you have
+        checked out.  They refer to pre-built images with certain, more up-to-date software.
+        If you want to build these images yourself, that is possible, but beyond
+        the scope of these steps.

 4.  Follow steps 3 to 8 of :ref:`setup-docker_hub`. When asked to run
    ``docker-compose pull`` to pull the image, do
@@ -332,7 +338,7 @@ writing. Windows is not and will never be supported.

    .. code::

-        python3 python3-pip python3-dev imagemagick fonts-liberation gnupg libpq-dev libmagic-dev mime-support libzbar0 poppler-utils
+        python3 python3-pip python3-dev imagemagick fonts-liberation gnupg libpq-dev default-libmysqlclient-dev libmagic-dev mime-support libzbar0 poppler-utils

    These dependencies are required for OCRmyPDF, which is used for text recognition.

@@ -361,7 +367,7 @@ writing. Windows is not and will never be supported.
    You will also need ``build-essential``, ``python3-setuptools`` and ``python3-wheel``
    for installing some of the python dependencies.

-2.  Install ``redis`` >= 5.0 and configure it to start automatically.
+2.  Install ``redis`` >= 6.0 and configure it to start automatically.

 3.  Optional. Install ``postgresql`` and configure a database, user and password for paperless. If you do not wish
    to use PostgreSQL, MariaDB and SQLite are available as well.
@@ -461,8 +467,9 @@ writing. Windows is not and will never be supported.
    as a starting point.

    Paperless needs the ``webserver`` script to run the webserver, the
-    ``consumer`` script to watch the input folder, and the ``scheduler``
-    script to run tasks such as email checking and document consumption.
+    ``consumer`` script to watch the input folder, the ``taskqueue`` script for the background workers
+    used to handle things like document consumption, and the ``scheduler`` script to run tasks such as
+    email checking at certain times.

 		The ``socket`` script enables ``gunicorn`` to run on port 80 without
 		root privileges. For this you need to uncomment the ``Require=paperless-webserver.socket``
@@ -513,6 +520,13 @@ writing. Windows is not and will never be supported.
    to compile this by yourself, because this software has been patented until around 2017 and
    binary packages are not available for most distributions.

+15. Optional: If using the NLTK machine learning processing (see ``PAPERLESS_ENABLE_NLTK`` in
+    :ref:`configuration` for details), download the NLTK data for the Snowball Stemmer, Stopwords
+    and Punkt tokenizer to your ``PAPERLESS_DATA_DIR/nltk``.  Refer to
+    the `NLTK instructions <https://www.nltk.org/data.html>`_ for details on how to
+    download the data.
+
+
 Migrating to Paperless-ngx
 ##########################

@@ -809,6 +823,8 @@ configuring some options in paperless can help improve performance immensely:
    OCR results.
 *   If using docker, consider setting ``PAPERLESS_WEBSERVER_WORKERS`` to
    1. This will save some memory.
+*   Consider setting ``PAPERLESS_ENABLE_NLTK`` to false, to disable the more
+    advanced language processing, which can take more memory and processing time.

 For details, refer to :ref:`configuration`.

--- a/docs/troubleshooting.rst
+++ b/docs/troubleshooting.rst
@@ -19,7 +19,7 @@ Check for the following issues:

    .. code:: shell-session

-        $ python3 manage.py qcluster
+        $ celery --app paperless worker

 *   Look at the output of paperless and inspect it for any errors.
 *   Go to the admin interface, and check if there are failed tasks. If so, the
@@ -125,7 +125,7 @@ If using docker-compose, this is achieved by the following configuration change
 .. code:: yaml

    gotenberg:
-        image: gotenberg/gotenberg:7.4
+        image: gotenberg/gotenberg:7.6
        restart: unless-stopped
        command:
            - "gotenberg"