diff --git a/docs/troubleshooting.rst b/docs/troubleshooting.rst
index 07257481d..6550ef9cf 100644
--- a/docs/troubleshooting.rst
+++ b/docs/troubleshooting.rst
@@ -47,3 +47,30 @@ ImageMagick to use a different space for its scratch work. You do this by
 setting ``PAPERLESS_CONVERT_TMPDIR`` in ``/etc/paperless.conf`` to somewhere
 that's actually on a physical disk (and writable by the user running
 Paperless), like ``/var/tmp/paperless`` or ``/home/my_user/tmp`` in a pinch.
+
+
+.. _troubleshooting-decompressionbombwarning:
+
+DecompressionBombWarning and/or no text in the OCR output
+---------------------------------------------------------
+Some users have had issues using Paperless to consume PDFs that were created
+by merging Very Large Scanned Images into one PDF. If this happens to you,
+it's likely because the PDF you've created contains some very large pages
+(millions of pixels) and the process of converting the PDF to an OCR-friendly
+image is exploding.
+
+Typically, this happens because the scanned images are created with a high
+DPI and then rolled into the PDF with an assumed DPI of 72 (the default).
+The best solution then is to specify the DPI used in the scan in the
+conversion-to-PDF step. So for example, if you scanned the original image
+with a DPI of 300, then merging the images into the single PDF with
+``convert`` should look like this:
+
+.. code:: bash
+
+    $ convert -density 300 *.jpg finished.pdf
+
+For more information on this and situations like it, you should take a look
+at `Issue #118`_ as that's where this tip originated.
+
+.. _Issue #118: https://github.com/danielquinn/paperless/issues/118