Mirror of https://github.com/paperless-ngx/paperless-ngx.git
Commit 38a651c42a (parent 9ee21f081f): docs
@ -8,6 +8,9 @@ Changelog
paperless-ng 0.9.5
##################

This release concludes the big changes I wanted to get rolled into paperless. The next releases before 1.0 will
focus on fixing issues, primarily.

* OCR

  * Paperless now uses `OCRmyPDF <https://github.com/jbarlow83/OCRmyPDF>`_ to perform OCR on documents.

12
docs/faq.rst
@ -86,3 +86,15 @@ the documentation has instructions for bare metal installs. I'm running
paperless on an i3 processor from 2015 or so. This is also what I use to test
new releases with. Apart from that, I also have a Raspberry Pi, which I
occasionally build the image on and see if it works.

**Q:** *How do I proxy this with NGINX?*

.. code::

    location / {
        proxy_pass http://localhost:8000/;
    }

And that's about it. Paperless serves everything, including static files, by itself
when running the docker image. If you want to do anything fancy, you have to
install paperless bare metal.

@ -29,75 +29,23 @@ Check for the following issues:
Consumer fails to pick up any new files
#######################################

If you notice that the consumer will only pick up files in the consumption
directory at startup, but won't find any other files added later, check out
the configuration file and enable filesystem polling with the setting
``PAPERLESS_CONSUMER_POLLING``.
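
As a minimal sketch, enabling polling in the configuration file might look like
the line below. It assumes the setting takes a polling interval in seconds,
which the text above does not state, and the value ``10`` is only an
illustration.

```shell
# Assumption: the value is a polling interval in seconds; 10 is an arbitrary example.
PAPERLESS_CONSUMER_POLLING=10
```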

Operation not permitted
#######################

You might see errors such as:

.. code::

    chown: changing ownership of '../export': Operation not permitted

The container tries to set file ownership on the listed directories. This is
required so that the user running paperless inside docker has write permissions
to these folders. The error occurs when these directories point to NFS shares,
for example.

Consumer warns ``OCR for XX failed``
####################################

If you find the OCR accuracy to be too low, and/or the document consumer warns
that ``OCR for XX failed, but we're going to stick with what we've got since
FORGIVING_OCR is enabled``, then you might need to install the
`Tesseract language files <http://packages.ubuntu.com/search?keywords=tesseract-ocr>`_
matching your document's languages.

As an example, if you are running Paperless from any Ubuntu or Debian
box, and your documents are written in Spanish, you may need to run::

    apt-get install -y tesseract-ocr-spa

Consumer dies with ``convert: unable to extend pixel cache``
############################################################

During the consumption process, Paperless invokes ImageMagick's ``convert``
program to translate the source document into something that the OCR engine can
understand, and this can burn a Very Large amount of memory if the original
document is rather long. Similarly, if your system doesn't have a lot of
memory to begin with (e.g. a Raspberry Pi), then this can happen for even
medium-sized documents.

The solution is to tell ImageMagick *not* to Use All The RAM, as is its
default, and instead tell it to use a fixed amount. ``convert`` will then
break up the job into hundreds of individual files and use them to slowly
compile the finished image. Simply set ``PAPERLESS_CONVERT_MEMORY_LIMIT`` in
``/etc/paperless.conf`` to something like ``32000000`` and you'll limit
``convert`` to 32MB. Fiddle with this value as you like.

**HOWEVER**: Simply setting this value may not be enough on systems where
``/tmp`` is mounted as tmpfs, as this is where ``convert`` will write its
temporary files. In these cases (most Systemd machines), you need to tell
ImageMagick to use a different space for its scratch work. You do this by
setting ``PAPERLESS_CONVERT_TMPDIR`` in ``/etc/paperless.conf`` to somewhere
that's actually on a physical disk (and writable by the user running
Paperless), like ``/var/tmp/paperless`` or ``/home/my_user/tmp`` in a pinch.
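
Put together, the two settings described above might look like this in
``/etc/paperless.conf``. This is only a sketch using the example values from
the text; pick a limit and a directory that suit your machine.

```shell
# Limit ImageMagick's convert to roughly 32MB of memory (example value from the text).
PAPERLESS_CONVERT_MEMORY_LIMIT=32000000
# Scratch space on a physical disk, writable by the user running Paperless.
PAPERLESS_CONVERT_TMPDIR=/var/tmp/paperless
```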

DecompressionBombWarning and/or no text in the OCR output
#########################################################

Some users have had issues using Paperless to consume PDFs that were created
by merging Very Large Scanned Images into one PDF. If this happens to you,
it's likely because the PDF you've created contains some very large pages
(millions of pixels) and the process of converting the PDF to an OCR-friendly
image is exploding.

Typically, this happens because the scanned images are created with a high
DPI and then rolled into the PDF with an assumed DPI of 72 (the default).
The best solution then is to specify the DPI used in the scan in the
conversion-to-PDF step. So for example, if you scanned the original image
with a DPI of 300, then merging the images into the single PDF with
``convert`` should look like this:

.. code:: bash

    $ convert -density 300 *.jpg finished.pdf

For more information on this and situations like it, you should take a look
at `Issue #118`_ as that's where this tip originated.

.. _Issue #118: https://github.com/the-paperless-project/paperless/issues/118
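
To see why the assumed DPI matters, the arithmetic can be sketched with a small
helper. ``page_inches`` is a hypothetical function written for this
illustration, and the pixel dimensions are an assumed example (roughly an A4
page scanned at 300 DPI).

```shell
# Hypothetical helper: physical page size in inches for a pixel size and a DPI.
page_inches() {
    # $1 = width in pixels, $2 = height in pixels, $3 = DPI
    awk -v w="$1" -v h="$2" -v dpi="$3" \
        'BEGIN { printf "%.1f x %.1f\n", w / dpi, h / dpi }'
}

# Assumed example: an A4 page scanned at 300 DPI is about 2480 x 3508 pixels.
page_inches 2480 3508 300   # page size when the PDF records the real scan DPI
page_inches 2480 3508 72    # page size when the PDF assumes 72 DPI
```

With the real DPI the page stays close to A4 (about 8.3 x 11.7 inches). With
the assumed 72 DPI the same pixels describe a page of about 34.4 x 48.7 inches,
which is the kind of oversized page described above.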

To resolve the ``Operation not permitted`` error, ensure that ``chown`` is
possible on these directories.
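
One rough way to check this is to try ``chown`` on a throwaway file inside the
directory. The sketch below is not part of paperless: ``can_chown`` is a
hypothetical helper, and it assumes ``mktemp`` and ``chown`` are available.
For a meaningful result, run it as the same user and against the same mount
that the container uses.

```shell
# Hypothetical helper: can this user chown files inside a directory?
# Usage: can_chown DIRECTORY OWNER[:GROUP]
# Returns 0 if chown works, 1 if it is refused, 2 if no probe file could be created.
can_chown() {
    dir="$1"
    owner="$2"
    # Create a throwaway probe file so existing files are never modified.
    probe="$(mktemp "$dir/.chown-probe.XXXXXX")" || return 2
    if chown "$owner" "$probe" 2>/dev/null; then
        status=0
    else
        status=1
    fi
    rm -f "$probe"
    return "$status"
}
```

For example, ``can_chown ../export 1000:1000 && echo ok`` prints ``ok`` only if
ownership can be changed there. The directory is the one from the error
message above, and the ``1000:1000`` IDs are placeholders.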