paperless-ngx mirror: https://github.com/paperless-ngx/paperless-ngx.git

commit 38a651c42a (parent 9ee21f081f)

    docs
@@ -8,6 +8,9 @@ Changelog
 paperless-ng 0.9.5
 ##################
 
+This release concludes the big changes I wanted to get rolled into paperless. The next releases before 1.0 will
+focus on fixing issues, primarily.
+
 * OCR
 
 * Paperless now uses `OCRmyPDF <https://github.com/jbarlow83/OCRmyPDF>`_ to perform OCR on documents.
docs/faq.rst (12 changed lines)

@@ -86,3 +86,15 @@ the documentation has instructions for bare metal installs. I'm running
 paperless on an i3 processor from 2015 or so. This is also what I use to test
 new releases with. Apart from that, I also have a Raspberry Pi, which I
 occasionally build the image on and see if it works.
+
+**Q:** *How do I proxy this with NGINX?*
+
+.. code::
+
+    location / {
+        proxy_pass http://localhost:8000/;
+    }
+
+And that's about it. Paperless serves everything, including static files, by itself
+when running the docker image. If you want to do anything fancy, you have to
+install paperless bare metal.
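The ``location`` block added above is only a fragment. A minimal sketch of a complete nginx server block it could live in might look like the following; the listen port and server name are assumptions, not something this FAQ entry specifies.

.. code::

    server {
        listen 80;
        server_name paperless.example.com;   # placeholder hostname

        location / {
            # Forward everything to the paperless web server on port 8000.
            proxy_pass http://localhost:8000/;
        }
    }

Nothing more elaborate is needed for the basic setup described in the answer, since paperless serves its own static files when running the docker image.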
@@ -29,75 +29,23 @@ Check for the following issues:
 Consumer fails to pickup any new files
 ######################################
 
-If you notice, that the consumer will only pickup files in the consumption
+If you notice that the consumer will only pickup files in the consumption
 directory at startup, but won't find any other files added later, check out
 the configuration file and enable filesystem polling with the setting
 ``PAPERLESS_CONSUMER_POLLING``.
 
-Consumer warns ``OCR for XX failed``
-####################################
-
-If you find the OCR accuracy to be too low, and/or the document consumer warns
-that ``OCR for XX failed, but we're going to stick with what we've got since
-FORGIVING_OCR is enabled``, then you might need to install the
-`Tesseract language files <http://packages.ubuntu.com/search?keywords=tesseract-ocr>`_
-matching your document's languages.
-
-As an example, if you are running Paperless from any Ubuntu or Debian
-box, and your documents are written in Spanish you may need to run::
-
-    apt-get install -y tesseract-ocr-spa
-
-Consumer dies with ``convert: unable to extent pixel cache``
-############################################################
-
-During the consumption process, Paperless invokes ImageMagick's ``convert``
-program to translate the source document into something that the OCR engine can
-understand and this can burn a Very Large amount of memory if the original
-document is rather long. Similarly, if your system doesn't have a lot of
-memory to begin with (i.e. a Raspberry Pi), then this can happen for even
-medium-sized documents.
-
-The solution is to tell ImageMagick *not* to Use All The RAM, as is its
-default, and instead tell it to use a fixed amount. ``convert`` will then
-break up the job into hundreds of individual files and use them to slowly
-compile the finished image. Simply set ``PAPERLESS_CONVERT_MEMORY_LIMIT`` in
-``/etc/paperless.conf`` to something like ``32000000`` and you'll limit
-``convert`` to 32MB. Fiddle with this value as you like.
-
-**HOWEVER**: Simply setting this value may not be enough on systems where
-``/tmp`` is mounted as tmpfs, as this is where ``convert`` will write its
-temporary files. In these cases (most Systemd machines), you need to tell
-ImageMagick to use a different space for its scratch work. You do this by
-setting ``PAPERLESS_CONVERT_TMPDIR`` in ``/etc/paperless.conf`` to somewhere
-that's actually on a physical disk (and writable by the user running
-Paperless), like ``/var/tmp/paperless`` or ``/home/my_user/tmp`` in a pinch.
-
-DecompressionBombWarning and/or no text in the OCR output
-#########################################################
-
-Some users have had issues using Paperless to consume PDFs that were created
-by merging Very Large Scanned Images into one PDF. If this happens to you,
-it's likely because the PDF you've created contains some very large pages
-(millions of pixels) and the process of converting the PDF to an OCR-friendly
-image is exploding.
-
-Typically, this happens because the scanned images are created with a high
-DPI and then rolled into the PDF with an assumed DPI of 72 (the default).
-The best solution then is to specify the DPI used in the scan in the
-conversion-to-PDF step. So for example, if you scanned the original image
-with a DPI of 300, then merging the images into the single PDF with
-``convert`` should look like this:
-
-.. code:: bash
-
-   $ convert -density 300 *.jpg finished.pdf
-
-For more information on this and situations like it, you should take a look
-at `Issue #118`_ as that's where this tip originated.
-
-.. _Issue #118: https://github.com/the-paperless-project/paperless/issues/118
+Operation not permitted
+#######################
+
+You might see errors such as:
+
+.. code::
+
+    chown: changing ownership of '../export': Operation not permitted
+
+The container tries to set file ownership on the listed directories. This is
+required so that the user running paperless inside docker has write permissions
+to these folders. This happens when pointing these directories to NFS shares,
+for example.
+
+Ensure that `chown` is possible on these directories.
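The polling fix kept in the hunk above comes down to a single setting. A minimal sketch of enabling it is shown below; the interval value is an assumption (the configuration reference documents the exact semantics), and the setting can equally be supplied as an environment variable in a docker-compose setup.

.. code::

    # paperless.conf: check the consumption directory on a fixed schedule
    # instead of relying on filesystem notifications, which tend to be
    # unreliable on network mounts. The value is the polling interval in seconds.
    PAPERLESS_CONSUMER_POLLING=10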
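The removed ImageMagick section names two settings without showing them as configuration lines. Written out, using the values the text itself suggests (the tmp directory is just one of its examples), they would look roughly like this:

.. code::

    # /etc/paperless.conf
    # Cap ImageMagick's memory use at roughly 32MB and keep its scratch
    # files on a real disk instead of a tmpfs-backed /tmp.
    PAPERLESS_CONVERT_MEMORY_LIMIT=32000000
    PAPERLESS_CONVERT_TMPDIR=/var/tmp/paperless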
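The ``-density 300`` example in the removed DecompressionBombWarning section assumes you already know the DPI your scanner produced. If you don't, ImageMagick can usually report it; a small sketch, with a placeholder filename:

.. code:: bash

   # Print the horizontal and vertical resolution stored in the scan.
   $ identify -format "%x x %y\n" scan-page-1.jpg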
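For the new ``Operation not permitted`` section, a quick way to tell whether ``chown`` can succeed on a mounted directory at all is to try it by hand before starting the container. The path and the UID/GID below are placeholders; substitute your own mounts and the IDs paperless runs as.

.. code:: bash

   # If this fails on an NFS share (for example because of root squashing),
   # the chown performed inside the container will fail the same way.
   $ sudo chown -R 1000:1000 /path/to/paperless/export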