Merge branch 'dev' into feature-websockets-status

2025-10-28 03:46:06 -05:00 · 2021-01-30 16:08:50 +01:00
parent 014c0a4fb8 8dd2d7bbb7
commit fce76d0671
45 changed files with 1017 additions and 515 deletions
--- a/docs/administration.rst
+++ b/docs/administration.rst
@@ -121,27 +121,19 @@ After grabbing the new release and unpacking the contents, do the following:
    dependencies. The dependencies required are listed in the section about
    :ref:`bare metal installations <setup-bare_metal>`.

-2.  Update python requirements. If you use Pipenv, this is done with the following steps.
+2.  Update python requirements. Remember to activate your virtual environment
+    first, if you use one.

    .. code:: shell-session

-        $ pip install --upgrade pipenv
-        $ cd /path/to/paperless
-        $ pipenv clean
-        $ pipenv install
-
-    This creates a new virtual environment (or uses your existing environment)
-    and installs all dependencies into it.
-
-    You can also use the included ``requirements.txt`` file instead and create the virtual
-    environment yourself. This file includes exactly the same dependencies.
+        $ pip install -r requirements.txt

 3.  Migrate the database.

    .. code:: shell-session

        $ cd src
-        $ pipenv run python3 manage.py migrate
+        $ python3 manage.py migrate

    This might not actually do anything. Not every new paperless version comes with new
    database migrations.
@@ -195,7 +187,7 @@ or
 .. code:: shell-session

    $ cd /path/to/paperless/src
-    $ pipenv run python manage.py <command> <arguments>
+    $ python3 manage.py <command> <arguments>

 depending on whether you use docker or not.

@@ -462,6 +454,3 @@ Basic usage to disable encryption of your document store:
 .. code::

    decrypt_documents [--passphrase SECR3TP4SSPHRA$E]
-
-
-.. _Pipenv: https://pipenv.pypa.io/en/latest/
--- a/docs/configuration.rst
+++ b/docs/configuration.rst
@@ -376,25 +376,24 @@ PAPERLESS_THREADS_PER_WORKER=<num>
        use a higher thread per worker count.

    The default is a balance between the two, according to your CPU core count,
-    with a slight favor towards threads per worker, and leaving at least one core
-    free for other tasks:
+    with a slight favor towards threads per worker:

    +----------------+---------+---------+
    | CPU core count | Workers | Threads |
    +----------------+---------+---------+
    |              1 |       1 |       1 |
    +----------------+---------+---------+
-    |              2 |       1 |       1 |
+    |              2 |       2 |       1 |
    +----------------+---------+---------+
-    |              4 |       1 |       3 |
+    |              4 |       2 |       2 |
    +----------------+---------+---------+
-    |              6 |       2 |       2 |
+    |              6 |       2 |       3 |
    +----------------+---------+---------+
-    |              8 |       2 |       3 |
+    |              8 |       2 |       4 |
    +----------------+---------+---------+
-    |             12 |       3 |       3 |
+    |             12 |       3 |       4 |
    +----------------+---------+---------+
-    |             16 |       3 |       5 |
+    |             16 |       4 |       4 |
    +----------------+---------+---------+

    If you only specify PAPERLESS_TASK_WORKERS, paperless will adjust
--- a/docs/setup.rst
+++ b/docs/setup.rst
@@ -20,45 +20,45 @@ Paperless consists of the following components:
    .. code:: shell-session

        $ cd /path/to/paperless/src/
-        $ pipenv run gunicorn -c /usr/src/paperless/gunicorn.conf.py -b 0.0.0.0:8000 paperless.wsgi
+        $ gunicorn -c ../gunicorn.conf.py -b 0.0.0.0:8000 paperless.wsgi

    or by any other means such as Apache ``mod_wsgi``.

 *   **The consumer:** This is what watches your consumption folder for documents.
-    However, the consumer itself does not consume really consume your documents anymore.
-    It rather notifies a task processor that a new file is ready for consumption.
+    However, the consumer itself does not really consume your documents.
+    Now it notifies a task processor that a new file is ready for consumption.
    I suppose it should be named differently.
-    This also used to check your emails, but that's now gone elsewhere as well.
+    This was also used to check your emails, but that's now done elsewhere as well.

    Start the consumer with the management command ``document_consumer``:

    .. code:: shell-session

        $ cd /path/to/paperless/src/
-        $ pipenv run python3 manage.py document_consumer
+        $ python3 manage.py document_consumer

    .. _setup-task_processor:

 *   **The task processor:** Paperless relies on `Django Q <https://django-q.readthedocs.io/en/latest/>`_
-    for doing much of the heavy lifting. This is a task queue that accepts tasks from
-    multiple sources and processes tasks in parallel. It also comes with a scheduler that executes
+    for doing most of the heavy lifting. This is a task queue that accepts tasks from
+    multiple sources and processes these in parallel. It also comes with a scheduler that executes
    certain commands periodically.

    This task processor is responsible for:

    *   Consuming documents. When the consumer finds new documents, it notifies the task processor to
        start a consumption task.
-    *   Consuming emails. It periodically checks your configured accounts for new mails and
-        produces consumption tasks for any documents it finds.
    *   The task processor also performs the consumption of any documents you upload through
        the web interface.
-    *   Maintain the search index and the automatic matching algorithm. These are things that paperless
+    *   Consuming emails. It periodically checks your configured accounts for new emails and
+        notifies the task processor to consume the attachment of an email.
+    *   Maintaining the search index and the automatic matching algorithm. These are things that paperless
        needs to do from time to time in order to operate properly.

    This allows paperless to process multiple documents from your consumption folder in parallel! On
-    a modern multi core system, consumption with full ocr is blazing fast.
+    a modern multi-core system, this makes the consumption process with full OCR blazingly fast.

-    The task processor comes with a built-in admin interface that you can use to see whenever any of the
+    The task processor comes with a built-in admin interface that you can use to check whenever any of the
    tasks fail and inspect the errors (i.e., wrong email credentials, errors during consuming a specific
    file, etc).

@@ -67,11 +67,11 @@ Paperless consists of the following components:
    .. code:: shell-session

        $ cd /path/to/paperless/src/
-        $ pipenv run python3 manage.py qcluster
+        $ python3 manage.py qcluster

 *   A `redis <https://redis.io/>`_ message broker: This is a really lightweight service that is responsible
-    for getting the tasks from the webserver and consumer to the task scheduler. These run in different
-    processes (maybe even on different machines!), and therefore, this is necessary.
+    for getting the tasks from the webserver and the consumer to the task scheduler. These run in different
+    processes (maybe even on different machines!), and therefore, this is necessary.

 *   Optional: A database server. Paperless supports both PostgreSQL and SQLite for storing its data.

@@ -79,7 +79,7 @@ Paperless consists of the following components:
 Installation
 ############

-You can go multiple routes with setting up and running Paperless:
+You can go multiple routes to set up and run Paperless:

 * :ref:`Pull the image from Docker Hub <setup-docker_hub>`
 * :ref:`Build the Docker image yourself <setup-docker_build>`
@@ -87,26 +87,31 @@ You can go multiple routes with setting up and running Paperless:
 * :ref:`Use ansible to install Paperless on your system automatically (bare metal) <setup-ansible>`

 The Docker routes are quick & easy. These are the recommended routes. This configures all the stuff
-from above automatically so that it just works and uses sensible defaults for all configuration options.
+from the above automatically so that it just works and uses sensible defaults for all configuration options.
+Here is a cheat sheet for Docker beginners: `CLI Basics <https://sehn.tech/post/devops-with-docker/>`_

-The bare metal route is more complicated to setup but makes it easier
+The bare metal route is complicated to set up but makes it easier
 should you want to contribute some code back. You need to configure and
 run the above mentioned components yourself.

-The ansible route cobines benefits from both options:
-the setup process is fully automated, reproducible and idempotent,
-it includes the same sensible defaults,
-and it simultaneously provides the flexibility of a bare metal installation.
+The ansible route combines benefits of both options:
+the setup process is fully automated, reproducible and `idempotent <https://docs.ansible.com/ansible/latest/reference_appendices/glossary.html#Idempotency>`_,
+it includes the same sensible defaults, and it simultaneously provides the flexibility of a bare metal installation.
+
+.. _CLI Basics: https://sehn.tech/post/devops-with-docker/
+.. _idempotent: https://docs.ansible.com/ansible/latest/reference_appendices/glossary.html#Idempotency

 .. _setup-docker_hub:

 Install Paperless from Docker Hub
 =================================

-1.  Go to the `/docker/compose directory on the project page <https://github.com/jonaswinkler/paperless-ng/tree/master/docker/compose>`_
-    and download one of the ``docker-compose.*.yml`` files, depending on which database backend you
+1.  Log in with your user and create a folder in your home directory with ``mkdir -v ~/paperless-ng`` to have a place for your configuration files and consumption directory.
+
+2.  Go to the `/docker/compose directory on the project page <https://github.com/jonaswinkler/paperless-ng/tree/master/docker/compose>`_
+    and download one of the ``docker-compose.*.yml`` files, depending on which database backend you
    want to use. Rename this file to `docker-compose.yml`.
-    If you want to enable optional support for Office documents, download a file with ``-tika`` in its name.
+    If you want to enable optional support for Office documents, download a file with ``-tika`` in the file name.
    Download the ``docker-compose.env`` file and the ``.env`` file as well and store them
    in the same directory.

@@ -115,25 +120,26 @@ Install Paperless from Docker Hub
        For new installations, it is recommended to use PostgreSQL as the database
        backend.

-2.  Install `Docker`_ and `docker-compose`_.
+3.  Install `Docker`_ and `docker-compose`_.

    .. caution::

        If you want to use the included ``docker-compose.*.yml`` file, you
        need to have at least Docker version **17.09.0** and docker-compose
        version **1.17.0**.
+        To check your versions, run ``docker -v`` and ``docker-compose -v``.

        See the `Docker installation guide`_ on how to install the current
        version of Docker for your operating system or Linux distribution of
-        choice. To get an up-to-date version of docker-compose, follow the
+        choice. To get the latest version of docker-compose, follow the
        `docker-compose installation guide`_ if your package repository doesn't
        include it.

        .. _Docker installation guide: https://docs.docker.com/engine/installation/
        .. _docker-compose installation guide: https://docs.docker.com/compose/install/

-3.  Modify ``docker-compose.yml`` to your preferences. You may want to change the path
-    to the consumption directory in this file. Find the line that specifies where
+4.  Modify ``docker-compose.yml`` to your preferences. You may want to change the path
+    to the consumption directory. Find the line that specifies where
    to mount the consumption directory:

    .. code::
@@ -149,31 +155,35 @@ Install Paperless from Docker Hub
    Don't change the part after the colon or paperless wont find your documents.


-4.  Modify ``docker-compose.env``, following the comments in the file. The
+5.  Modify ``docker-compose.env``, following the comments in the file. The
    most important change is to set ``USERMAP_UID`` and ``USERMAP_GID``
    to the uid and gid of your user on the host system. This ensures that
    both the docker container and you on the host machine have write access
    to the consumption directory. If your UID and GID on the host system is
    1000 (the default for the first normal user on most systems), it will
-    work out of the box without any modifications.
+    work out of the box without any modifications. Run ``id <username>`` to check your UID and GID.

    .. note::

-        You can use any settings from the file ``paperless.conf.example`` in this file.
-        Have a look at :ref:`configuration` to see whats available.
+        You can copy any setting from the file ``paperless.conf.example`` and paste it here.
+        Have a look at :ref:`configuration` to see what's available.

    .. caution::

-        Certain file systems such as NFS network shares don't support file system
+        Some file systems such as NFS network shares don't support file system
        notifications with ``inotify``. When storing the consumption directory
-        on such a file system, paperless will be unable to pick up new files
+        on such a file system, paperless will not pick up new files
        with the default configuration. You will need to use ``PAPERLESS_CONSUMER_POLLING``,
        which will disable inotify. See :ref:`here <configuration-polling>`.

-5.  Run ``docker-compose up -d``. This will create and start the necessary
-    containers.
+6.  Now head over to https://hub.docker.com/r/jonaswinkler/paperless-ng, choose your preferred
+    image, and note its name. To download this image, run ``docker pull`` followed by the image name (for example ``jonaswinkler/paperless-ng:latest``). Do this within the directory with the .yml files.
+    Depending on your network connection and CPU, this will take a while. You have time to get a beverage.

-6.  To be able to login, you will need a super user. To create it, execute the
+7.  Run ``docker-compose up -d``. This will create and start the necessary
+    containers, but you are not done yet!
+
+8.  To be able to login, you will need a super user. To create it, execute the
    following command:

    .. code-block:: shell-session
@@ -181,12 +191,12 @@ Install Paperless from Docker Hub
        $ docker-compose run --rm webserver createsuperuser

    This will prompt you to set a username, an optional e-mail address and
-    finally a password.
+    finally a password (at least 8 characters).

-7.  The default ``docker-compose.yml`` exports the webserver on your local port
+9.  The default ``docker-compose.yml`` exports the webserver on your local port
    8000. If you haven't adapted this, you should now be able to visit your
-    Paperless instance at ``http://127.0.0.1:8000``. You can login with the
-    user and password you just created.
+    Paperless instance at ``http://127.0.0.1:8000`` or at port 8000 of your server's IP address.
+    Use the login credentials you created in the previous step.

 .. _Docker: https://www.docker.com/
 .. _docker-compose: https://docs.docker.com/compose/install/
@@ -214,7 +224,7 @@ Build the docker image yourself

        webserver:
            image: jonaswinkler/paperless-ng:latest
-    
+
    and replace it with a line that instructs docker-compose to build the image from the current working directory instead:

    .. code:: yaml
@@ -245,7 +255,7 @@ writing. Windows is not and will never be supported.
 1.  Install dependencies. Paperless requires the following packages.

    *   ``python3`` 3.6, 3.7, 3.8, 3.9
-    *   ``python3-pip``, optionally ``pipenv`` for package installation
+    *   ``python3-pip``
    *   ``python3-dev``

    *   ``fonts-liberation`` for generating thumbnails for plain text files
@@ -314,8 +324,13 @@ writing. Windows is not and will never be supported.

    Adjust as necessary if you configured different folders.

-7.  Install python requirements. Paperless comes with both Pipfiles for ``pipenv`` as well as with a ``requirements.txt``.
-    Both will install exactly the same requirements. It is up to you if you wish to use a virtual environment or not.
+7.  Install python requirements from the ``requirements.txt`` file.
+    It is up to you if you wish to use a virtual environment or not.
+
+    .. code:: shell-session
+
+        $ pip3 install -r requirements.txt
+

 8.  Go to ``/opt/paperless/src``, and execute the following commands:

@@ -339,7 +354,8 @@ writing. Windows is not and will never be supported.
    .. warning::

        This is a development server which should not be used in
-        production.
+        production. It is not audited for security, and its performance
+        is inferior to production-ready web servers.

    .. hint::

@@ -354,6 +370,11 @@ writing. Windows is not and will never be supported.
    ``consumer`` script to watch the input folder, and the ``scheduler``
    script to run tasks such as email checking and document consumption.

+    You may need to adjust the path to the ``gunicorn`` executable. This
+    will be installed as part of the python dependencies, and is either located
+    in the ``bin`` folder of your virtual environment, or in ``~/.local/bin/`` if
+    no virtual environment is used.
+
    These services rely on redis and optionally the database server, but
    don't need to be started in any particular order. The example files
    depend on redis being started. If you use a database server, you should
@@ -406,7 +427,7 @@ Install Paperless using ansible

        This role currently only supports Debian 10 Buster and Ubuntu 20.04 Focal or later as target hosts.

-1.  Install ansible 2.7+ on the management node. 
+1.  Install ansible 2.7+ on the management node.
    This may be the target host paperless-ng is being installed on or any remote host which can access the target host.
    For further details, check the ansible `inventory <https://docs.ansible.com/ansible/latest/user_guide/intro_inventory.html>`_ documentation.

@@ -518,7 +539,10 @@ Migration to paperless-ng

 At its core, paperless-ng is still paperless and fully compatible. However, some
 things have changed under the hood, so you need to adapt your setup depending on
-how you installed paperless. The important things to keep in mind are as follows.
+how you installed paperless.
+
+This section describes how to update an existing paperless Docker installation.
+The important things to keep in mind are as follows:

 * Read the :ref:`changelog <paperless_changelog>` and take note of breaking changes.
 * You should decide if you want to stick with SQLite or want to migrate your database
@@ -553,11 +577,18 @@ Migration to paperless-ng is then performed in a few simple steps:

    .. caution::

-        Paperless includes a ``.env`` file. This will set the
-        project name for docker compose to ``paperless`` so that paperless-ng will
-        automatically reuse your existing paperless volumes. When you start it, it
-        will migrate your existing data. After that, your old paperless installation
-        will be incompatible with the migrated volumes.
+        Paperless-ng includes a ``.env`` file. This will set the
+        project name for docker compose to ``paperless``, which also defines the name
+        of the volumes used by paperless-ng. However, if you notice that paperless-ng
+        is not using your old paperless volumes, verify the names of your volumes with
+
+        .. code:: shell-session
+
+            $ docker volume ls | grep _data
+
+        and adjust the project name in the ``.env`` file so that it matches the name
+        of the volumes before the ``_data`` part.
+

 4.  Download the ``docker-compose.sqlite.yml`` file to ``docker-compose.yml``.
    If you want to switch to PostgreSQL, do that after you migrated your existing
@@ -638,14 +669,12 @@ management commands as below.

        This will launch the container and initialize the PostgreSQL database.

-    b)  Without docker, open a shell in your virtual environment, switch to
+    b)  Without docker, remember to activate any virtual environment, switch to
        the ``src`` directory and create the database schema:

        .. code:: shell-session

-            $ cd /path/to/paperless
-            $ pipenv shell
-            $ cd src
+            $ cd /path/to/paperless/src
            $ python3 manage.py migrate

        This will not copy any data yet.
@@ -662,7 +691,7 @@ management commands as below.

        $ python3 manage.py loaddata data.json

-6.  Exit the shell.
+6.  If operating inside Docker, you may exit the shell now.

    .. code:: shell-session

--- a/docs/troubleshooting.rst
+++ b/docs/troubleshooting.rst
@@ -30,13 +30,22 @@ Consumer fails to pickup any new files
 ######################################

 If you notice that the consumer will only pickup files in the consumption
-directory at startup, but won't find any other files added later, check out
-the configuration file and enable filesystem polling with the setting
-``PAPERLESS_CONSUMER_POLLING``.
+directory at startup, but won't find any other files added later, you will need to
+enable filesystem polling with the configuration option
+``PAPERLESS_CONSUMER_POLLING``, see :ref:`here <configuration-polling>`.

 This will disable listening to filesystem changes with inotify and paperless will
 manually check the consumption directory for changes instead.

+
+Paperless always redirects to /admin
+####################################
+
+You probably had the old paperless installed at some point. Paperless installed
+a permanent redirect to /admin in your browser, and you need to clear your
+browsing data / cache to fix that.
+
+
 Operation not permitted
 #######################

@@ -64,6 +73,24 @@ This may have two reasons:
    with Inbox tags. Verify that there are documents in your archive without inbox tags.
    The algorithm will only learn from documents not in your inbox.

+UserWarning in sklearn on every single document
+###############################################
+
+You may encounter warnings like this:
+
+.. code::
+    
+    /usr/local/lib/python3.7/site-packages/sklearn/base.py:315:
+    UserWarning: Trying to unpickle estimator CountVectorizer from version 0.23.2 when using version 0.24.0.
+    This might lead to breaking code or invalid results. Use at your own risk.
+
+This happens when certain dependencies of paperless that are responsible for the auto matching algorithm are
+updated. After updating these, your current training data *might* not be compatible anymore. This can be ignored
+in most cases. This warning will disappear automatically when paperless updates the training data.
+
+If you want to get rid of the warning or actually experience issues with automatic matching, delete
+the file ``classification_model.pickle`` in the data directory and let paperless recreate it.
+
 Permission denied errors in the consumption directory
 #####################################################

@@ -78,3 +105,47 @@ Ensure that ``USERMAP_UID`` and ``USERMAP_GID`` are set to the user id and group
 different from ``1000``. See :ref:`setup-docker_hub`.

 Also ensure that you are able to read and write to the consumption directory on the host.
+
+Web-UI stuck at "Loading..."
+############################
+
+This can have multiple causes.
+
+
+1.  If you built the docker image yourself or deployed using the bare metal route,
+    make sure that there are files in ``<paperless-root>/static/frontend/<lang-code>/``.
+    If there are no files, make sure that you executed ``collectstatic`` successfully, either
+    manually or as part of the docker image build.
+
+    If the front end is still missing, make sure that the front end is compiled (files present in
+    ``src/documents/static/frontend``). If it is not, you need to compile the front end yourself
+    or download the release archive instead of cloning the repository.
+
+2.  Check the output of the web server. You might see errors like this:
+
+
+    .. code::
+
+        [2021-01-25 10:08:04 +0000] [40] [ERROR] Socket error processing request.
+        Traceback (most recent call last):
+        File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 134, in handle
+            self.handle_request(listener, req, client, addr)
+        File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 190, in handle_request
+            util.reraise(*sys.exc_info())
+        File "/usr/local/lib/python3.7/site-packages/gunicorn/util.py", line 625, in reraise
+            raise value
+        File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 178, in handle_request
+            resp.write_file(respiter)
+        File "/usr/local/lib/python3.7/site-packages/gunicorn/http/wsgi.py", line 396, in write_file
+            if not self.sendfile(respiter):
+        File "/usr/local/lib/python3.7/site-packages/gunicorn/http/wsgi.py", line 386, in sendfile
+            sent += os.sendfile(sockno, fileno, offset + sent, count)
+        OSError: [Errno 22] Invalid argument
+    
+    To fix this issue, add
+
+    .. code::
+
+        SENDFILE=0
+    
+    to your ``docker-compose.env`` file.