***************
Troubleshooting
***************

No files are added by the consumer
##################################

Check for the following issues:

* Ensure that the directory you're putting your documents in is the folder
  paperless is watching. With docker, this setting is performed in the
  ``docker-compose.yml`` file. Without docker, look at the ``CONSUMPTION_DIR``
  setting. Don't adjust this setting if you're using docker.
* Ensure that redis is up and running. Paperless does its task processing
  asynchronously, and for documents to arrive at the task processor, it needs
  redis to run.
* Ensure that the task processor is running. Docker does this automatically.
  Manually invoke the task processor by executing

  .. code:: shell-session

     $ python3 manage.py qcluster

* Look at the output of paperless and inspect it for any errors.
* Go to the admin interface, and check if there are failed tasks. If so, the
  tasks will contain an error message.

Consumer warns ``OCR for XX failed``
####################################
If you find the OCR accuracy to be too low, and/or the document consumer warns
that ``OCR for XX failed, but we're going to stick with what we've got since
FORGIVING_OCR is enabled``, then you might need to install the
`Tesseract language files <http://packages.ubuntu.com/search?keywords=tesseract-ocr>`_
matching your document's languages.

As an example, if you are running Paperless-ngx from any Ubuntu or Debian
box, and your documents are written in Spanish, you may need to run::

    apt-get install -y tesseract-ocr-spa

Consumer fails to pick up any new files
########################################

If you notice that the consumer only picks up files in the consumption
directory at startup, but won't find any other files added later, you will need to
enable filesystem polling with the configuration option
``PAPERLESS_CONSUMER_POLLING``, see :ref:`here <configuration-polling>`.

This will disable listening to filesystem changes with inotify, and paperless will
manually check the consumption directory for changes instead.
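
As a rough illustration of what polling means here, the sketch below compares
directory snapshots instead of waiting for inotify events. It is not the code
paperless uses; the function names ``snapshot`` and ``new_files`` are
hypothetical and exist only for this example.

.. code:: python

    import os
    import tempfile


    def snapshot(directory):
        """Return the set of regular file names currently in ``directory``."""
        with os.scandir(directory) as entries:
            return {entry.name for entry in entries if entry.is_file()}


    def new_files(previous, directory):
        """Return the names added since ``previous`` and the new snapshot."""
        current = snapshot(directory)
        return sorted(current - previous), current


    if __name__ == "__main__":
        # Demonstration on a throwaway directory standing in for the
        # consumption directory.
        with tempfile.TemporaryDirectory() as consume_dir:
            seen = snapshot(consume_dir)
            open(os.path.join(consume_dir, "scan.pdf"), "w").close()
            added, seen = new_files(seen, consume_dir)
            print(added)

A polling consumer would call ``new_files`` in a loop with a sleep between
calls, which works even on file systems that do not deliver inotify events.
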
Paperless always redirects to /admin
####################################
You probably had the old paperless installed at some point. Paperless installed
a permanent redirect to /admin in your browser, and you need to clear your
browsing data / cache to fix that.
Operation not permitted
#######################
You might see errors such as:

.. code:: shell-session

    chown: changing ownership of '../export': Operation not permitted

The container tries to set file ownership on the listed directories. This is
required so that the user running paperless inside docker has write permissions
to these folders. The error occurs when ``chown`` is not permitted on them, for
example when they point to NFS shares.

Ensure that ``chown`` is possible on these directories.
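
One way to probe this from the host is sketched below. It assumes a Unix-like
system and re-applies the owner and group a path already has, so nothing
changes on disk. The helper name ``can_chown`` is hypothetical, and the result
only reflects the user running the script, which may differ from the user
inside the container.

.. code:: python

    import os


    def can_chown(path):
        """Return True if re-applying the current owner of ``path`` succeeds.

        The call is a no-op in terms of ownership, but file systems that
        refuse ownership changes (some NFS exports, for example) may still
        reject it.
        """
        info = os.stat(path)
        try:
            os.chown(path, info.st_uid, info.st_gid)
        except PermissionError:
            return False
        return True


    if __name__ == "__main__":
        # "../export" is the directory named in the error message above.
        print(can_chown("../export"))
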
Classifier error: No training data available
############################################
This indicates that the Auto matching algorithm found no documents to learn from.
There are two possible causes:

* You don't use the Auto matching algorithm: The error can be safely ignored in this case.
* You are using the Auto matching algorithm: The classifier explicitly excludes documents
  with Inbox tags. Verify that there are documents in your archive without inbox tags.
  The algorithm will only learn from documents not in your inbox.

UserWarning in sklearn on every single document
###############################################
You may encounter warnings like this:

.. code::

    /usr/local/lib/python3.7/site-packages/sklearn/base.py:315:
    UserWarning: Trying to unpickle estimator CountVectorizer from version 0.23.2 when using version 0.24.0.
    This might lead to breaking code or invalid results. Use at your own risk.

This happens when certain dependencies of paperless that are responsible for the auto matching algorithm are
updated. After updating these, your current training data *might* not be compatible anymore. This can be ignored
in most cases. This warning will disappear automatically when paperless updates the training data.

If you want to get rid of the warning or actually experience issues with automatic matching, delete
the file ``classification_model.pickle`` in the data directory and let paperless recreate it.

504 Server Error: Gateway Timeout when adding Office documents
##############################################################
You may experience these errors when using the optional Tika integration:

.. code::

    requests.exceptions.HTTPError: 504 Server Error: Gateway Timeout for url: http://gotenberg:3000/forms/libreoffice/convert

Gotenberg is a server that converts Office documents into PDF documents and has a default timeout of 30 seconds.
When conversion takes longer, Gotenberg raises this error.

You can increase the timeout by configuring a command flag for Gotenberg (see also `here <https://gotenberg.dev/docs/modules/api#properties>`__).
If using docker-compose, this is achieved by the following configuration change in the ``docker-compose.yml`` file:

.. code:: yaml

    gotenberg:
        image: gotenberg/gotenberg:7.4
        restart: unless-stopped
        command:
            - "gotenberg"
            - "--chromium-disable-routes=true"
            - "--api-timeout=60"

Permission denied errors in the consumption directory
#####################################################
You might encounter errors such as:

.. code:: shell-session

    The following error occured while consuming document.pdf: [Errno 13] Permission denied: '/usr/src/paperless/src/../consume/document.pdf'

This happens when paperless does not have permission to delete files inside the consumption directory.
Ensure that ``USERMAP_UID`` and ``USERMAP_GID`` are set to the user id and group id you use on the host operating system, if these are
different from ``1000``. See :ref:`setup-docker_hub`.

Also ensure that you are able to read and write to the consumption directory on the host.
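
The following sketch, meant to be run on the host as the user who owns the
documents, prints the ids to compare against ``USERMAP_UID`` and
``USERMAP_GID`` and checks access to a directory. The helper name
``describe_access`` and the example path are hypothetical, and the script
assumes a Unix-like host.

.. code:: python

    import os


    def describe_access(directory):
        """Report the current user's ids and access rights on ``directory``."""
        return {
            "uid": os.getuid(),
            "gid": os.getgid(),
            "readable": os.access(directory, os.R_OK | os.X_OK),
            # Deleting a file needs write and search permission on the
            # directory that contains it.
            "writable": os.access(directory, os.W_OK | os.X_OK),
        }


    if __name__ == "__main__":
        # Replace "./consume" with the host path of your consumption directory.
        print(describe_access("./consume"))
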
OSError: [Errno 19] No such device when consuming files
#######################################################
If you experience errors such as:

.. code:: shell-session

      File "/usr/local/lib/python3.7/site-packages/whoosh/codec/base.py", line 570, in open_compound_file
        return CompoundStorage(dbfile, use_mmap=storage.supports_mmap)
      File "/usr/local/lib/python3.7/site-packages/whoosh/filedb/compound.py", line 75, in __init__
        self._source = mmap.mmap(fileno, 0, access=mmap.ACCESS_READ)
    OSError: [Errno 19] No such device

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/site-packages/django_q/cluster.py", line 436, in worker
        res = f(*task["args"], **task["kwargs"])
      File "/usr/src/paperless/src/documents/tasks.py", line 73, in consume_file
        override_tag_ids=override_tag_ids)
      File "/usr/src/paperless/src/documents/consumer.py", line 271, in try_consume_file
        raise ConsumerError(e)

Paperless uses a search index to provide better and faster full text searching. This search index is stored inside
the ``data`` folder. The search index uses memory-mapped files (mmap). The above error indicates that paperless
was unable to create and open these files.

This happens when you're trying to store the data directory on certain file systems (mostly network shares)
that don't support memory-mapped files.
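
To check a candidate location before moving the data directory there, a small
probe along these lines may help. It uses the same ``mmap.mmap`` call that
appears in the traceback above, but the helper name ``supports_mmap`` is
hypothetical and the probe is only an approximation of what the search index
does.

.. code:: python

    import mmap
    import os
    import tempfile


    def supports_mmap(directory):
        """Return True if a file inside ``directory`` can be memory-mapped."""
        fd, path = tempfile.mkstemp(dir=directory)
        try:
            os.write(fd, b"probe")  # an empty file cannot be mapped
            try:
                with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as mapped:
                    return mapped[:5] == b"probe"
            except OSError:
                return False
        finally:
            os.close(fd)
            os.remove(path)


    if __name__ == "__main__":
        # Replace "." with the directory you intend to use for ``data``.
        print(supports_mmap("."))
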
Web-UI stuck at "Loading..."
############################
There are several possible causes.

1.  If you built the docker image yourself or deployed using the bare metal route,
    make sure that there are files in ``<paperless-root>/static/frontend/<lang-code>/``.
    If there are no files, make sure that you executed ``collectstatic`` successfully, either
    manually or as part of the docker image build.

    If the front end is still missing, make sure that the front end is compiled (files present in
    ``src/documents/static/frontend``). If it is not, you need to compile the front end yourself
    or download the release archive instead of cloning the repository.

2.  Check the output of the web server. You might see errors like this:

    .. code::

        [2021-01-25 10:08:04 +0000] [40] [ERROR] Socket error processing request.
        Traceback (most recent call last):
          File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 134, in handle
            self.handle_request(listener, req, client, addr)
          File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 190, in handle_request
            util.reraise(*sys.exc_info())
          File "/usr/local/lib/python3.7/site-packages/gunicorn/util.py", line 625, in reraise
            raise value
          File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 178, in handle_request
            resp.write_file(respiter)
          File "/usr/local/lib/python3.7/site-packages/gunicorn/http/wsgi.py", line 396, in write_file
            if not self.sendfile(respiter):
          File "/usr/local/lib/python3.7/site-packages/gunicorn/http/wsgi.py", line 386, in sendfile
            sent += os.sendfile(sockno, fileno, offset + sent, count)
        OSError: [Errno 22] Invalid argument

    To fix this issue, add

    .. code::

        SENDFILE=0

    to your ``docker-compose.env`` file.

Error while reading metadata
############################
You might find messages like these in your log files:

.. code::

    [WARNING] [paperless.parsing.tesseract] Error while reading metadata

This indicates that paperless failed to read PDF metadata from one of your documents. This happens when you
open the affected documents in paperless for editing. Paperless will continue to work, and will simply not
show the invalid metadata.

Consumer fails with a FileNotFoundError
#######################################
You might find messages like these in your log files:

.. code::

    [ERROR] [paperless.consumer] Error while consuming document SCN_0001.pdf: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/ocrmypdf.io.yhk3zbv0/origin.pdf'
    Traceback (most recent call last):
      File "/app/paperless/src/paperless_tesseract/parsers.py", line 261, in parse
        ocrmypdf.ocr(**args)
      File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/api.py", line 337, in ocr
        return run_pipeline(options=options, plugin_manager=plugin_manager, api=True)
      File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_sync.py", line 385, in run_pipeline
        exec_concurrent(context, executor)
      File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_sync.py", line 302, in exec_concurrent
        pdf = post_process(pdf, context, executor)
      File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_sync.py", line 235, in post_process
        pdf_out = metadata_fixup(pdf_out, context)
      File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_pipeline.py", line 798, in metadata_fixup
        with pikepdf.open(context.origin) as original, pikepdf.open(working_file) as pdf:
      File "/usr/local/lib/python3.8/dist-packages/pikepdf/_methods.py", line 923, in open
        pdf = Pdf._open(
    FileNotFoundError: [Errno 2] No such file or directory: '/tmp/ocrmypdf.io.yhk3zbv0/origin.pdf'

This probably indicates paperless tried to consume the same file twice. This can happen for a number of reasons,
depending on how documents are placed into the consume folder. If paperless is using inotify (the default) to
check for documents, try adjusting the :ref:`inotify configuration <configuration-inotify>`. If polling is enabled,
try adjusting the :ref:`polling configuration <configuration-polling>`.

Consumer fails waiting for file to remain unmodified.
#####################################################
You might find messages like these in your log files:

.. code::

    [ERROR] [paperless.management.consumer] Timeout while waiting on file /usr/src/paperless/src/../consume/SCN_0001.pdf to remain unmodified.

This indicates paperless timed out while waiting for the file to be completely written to the consume folder.
Adjusting :ref:`polling configuration <configuration-polling>` values should resolve the issue.

.. note::

    The user will need to manually move the file out of the consume folder and
    back in, for the initial failing file to be consumed.
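
The idea of waiting for a file to remain unmodified can be sketched as
follows. This is an illustration only, not the logic paperless implements:
the function name ``wait_until_stable`` and its parameters ``interval``,
``checks`` and ``timeout`` are hypothetical stand-ins for the kind of values
the polling configuration controls.

.. code:: python

    import os
    import time


    def wait_until_stable(path, interval=0.5, checks=3, timeout=30.0):
        """Return True once size and mtime are unchanged for ``checks`` polls.

        Returns False if ``timeout`` seconds pass first, which corresponds to
        the timeout error shown above.
        """
        deadline = time.monotonic() + timeout
        last = None
        stable = 0
        while time.monotonic() < deadline:
            info = os.stat(path)
            current = (info.st_size, info.st_mtime_ns)
            if current == last:
                stable += 1
                if stable >= checks:
                    return True
            else:
                # The file changed (or this is the first look): start over.
                stable = 0
                last = current
            time.sleep(interval)
        return False

A slow scanner or network copy keeps changing the file, so the counter keeps
resetting. Allowing more time or more retries gives such transfers a chance to
finish before the consumer gives up.
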
Consumer fails reporting "OS reports file as busy still".
#########################################################
You might find messages like these in your log files:

.. code::

    [WARNING] [paperless.management.consumer] Not consuming file /usr/src/paperless/src/../consume/SCN_0001.pdf: OS reports file as busy still

This indicates paperless was unable to open the file, as the OS reported the file as still being in use. To prevent a
crash, paperless did not try to consume the file. If paperless is using inotify (the default) to
check for documents, try adjusting the :ref:`inotify configuration <configuration-inotify>`. If polling is enabled,
try adjusting the :ref:`polling configuration <configuration-polling>`.

.. note::

    The user will need to manually move the file out of the consume folder and
    back in, for the initial failing file to be consumed.

Log reports "Creating PaperlessTask failed".
#########################################################
You might find messages like these in your log files:

.. code::

    [ERROR] [paperless.management.consumer] Creating PaperlessTask failed: db locked

You are likely using an SQLite-based installation with an increased number of workers and are running into SQLite's concurrency limitations.
Uploading or consuming multiple files at once results in many workers attempting to access the database simultaneously.

Consider changing to the PostgreSQL database if you will often be processing many documents at once. Otherwise,
try tweaking the ``PAPERLESS_DB_TIMEOUT`` setting to allow more time for the database to unlock. This may have
minor performance implications.
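
The effect of a lock timeout can be reproduced with Python's standard
``sqlite3`` module. The sketch below assumes that ``PAPERLESS_DB_TIMEOUT``
behaves like the ``timeout`` argument of ``sqlite3.connect``, which the text
above does not state. The function name ``write_during_lock`` and the table
name ``task`` are hypothetical.

.. code:: python

    import sqlite3
    import threading
    import time


    def write_during_lock(db_path, hold_seconds, timeout):
        """Try to insert a row while another connection holds the write lock."""
        setup = sqlite3.connect(db_path)
        setup.execute("CREATE TABLE IF NOT EXISTS task (id INTEGER)")
        setup.commit()
        setup.close()

        locked = threading.Event()

        def hold_lock():
            # isolation_level=None lets us issue BEGIN and COMMIT ourselves.
            holder = sqlite3.connect(db_path, isolation_level=None)
            holder.execute("BEGIN IMMEDIATE")  # take the write lock
            locked.set()
            time.sleep(hold_seconds)
            holder.execute("COMMIT")
            holder.close()

        thread = threading.Thread(target=hold_lock)
        thread.start()
        locked.wait()

        # ``timeout`` is how long this connection waits for the lock.
        writer = sqlite3.connect(db_path, timeout=timeout)
        try:
            writer.execute("INSERT INTO task VALUES (1)")
            writer.commit()
            return "ok"
        except sqlite3.OperationalError as exc:
            return str(exc)
        finally:
            writer.close()
            thread.join()

Calling the function with a timeout shorter than ``hold_seconds`` returns the
SQLite error text, while a longer timeout lets the write succeed after a
delay. That delay is the performance cost mentioned above.
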