Daniel Quinn 2016-04-30 13:57:31 +01:00
parent 3c79b55ae6
commit a0d268ebbc

@@ -47,3 +47,30 @@ ImageMagick to use a different space for its scratch work. You do this by
setting ``PAPERLESS_CONVERT_TMPDIR`` in ``/etc/paperless.conf`` to somewhere
that's actually on a physical disk (and writable by the user running
Paperless), like ``/var/tmp/paperless`` or ``/home/my_user/tmp`` in a pinch.

.. _troubleshooting-decompressionbombwarning:

DecompressionBombWarning and/or no text in the OCR output
---------------------------------------------------------

Some users have had issues using Paperless to consume PDFs that were created
by merging Very Large Scanned Images into a single file. If this happens to
you, it's likely because the PDF you've created contains some very large pages
(millions of pixels) and the process of converting the PDF to an OCR-friendly
image is blowing up, which shows up as a ``DecompressionBombWarning``, as no
text in the OCR output, or both.

Typically, this happens because the images are scanned at a high DPI and then
rolled into the PDF with an assumed DPI of 72 (the default), so each page ends
up with far larger physical dimensions than the paper it came from.
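
To get a feel for how much the wrong DPI inflates a page, here is a rough
sketch of the arithmetic. The ``rendered_pixels`` helper is hypothetical (it is
not part of Paperless or ImageMagick), the A4 scan dimensions are only an
example, and the 300 DPI render resolution is an assumption about the OCR
step rather than a confirmed setting:

.. code:: bash

    # Hypothetical helper: pixel count of a page once it is re-rasterised.
    # Args: width_px height_px assumed_dpi render_dpi
    rendered_pixels() {
        local w=$1 h=$2 assumed=$3 render=$4
        # Physical size is px / assumed_dpi, so re-rendering at render_dpi
        # scales each side by render / assumed (integer maths, rounded down).
        local rw=$(( w * render / assumed ))
        local rh=$(( h * render / assumed ))
        echo $(( rw * rh ))
    }

    # An A4 page scanned at 300 DPI is about 2480 x 3508 pixels.
    rendered_pixels 2480 3508 72 300    # tagged as 72 DPI:  151027128
    rendered_pixels 2480 3508 300 300   # tagged as 300 DPI: 8699840

Under those assumptions, a page that should be under 9 million pixels comes
out at roughly 151 million, which is well past the point (about 89 million
pixels by default) where Pillow starts raising ``DecompressionBombWarning``.
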

The best solution, then, is to tell the conversion-to-PDF step what DPI was
used for the scan. For example, if you scanned the original images at 300 DPI,
then merging them into a single PDF with ``convert`` should look like this:

.. code:: bash

    $ convert -density 300 *.jpg finished.pdf

For more information on this and situations like it, take a look at
`Issue #118`_, which is where this tip originated.

.. _Issue #118: https://github.com/danielquinn/paperless/issues/118