From 23aa79f3073a5ffb04c105350f7e8c99dbd9979e Mon Sep 17 00:00:00 2001
From: Daniel Quinn
Date: Fri, 25 Mar 2016 20:51:22 +0000
Subject: [PATCH] Documented the new variables and updated the changelog

---
 docs/changelog.rst       |  6 ++++++
 docs/troubleshooting.rst | 44 +++++++++++++++++++++++++++++++++-------
 2 files changed, 43 insertions(+), 7 deletions(-)

diff --git a/docs/changelog.rst b/docs/changelog.rst
index c1397bb6c..da0d0ec56 100644
--- a/docs/changelog.rst
+++ b/docs/changelog.rst
@@ -3,6 +3,10 @@ Changelog
 
 * 0.2.0
 
+  * `#98`_: Added optional environment variables for ImageMagick so that it
+    doesn't explode when handling Very Large Documents or when it's just
+    running on a low-memory system. Thanks to `Florian Harr`_ for his help on
+    this one.
   * Added support for guessing the date from the file name along with the
     correspondent, title, and tags. Thanks to `Tikitu de Jager`_ for his pull
     request that I took forever to merge and to `Pit`_ for his efforts on the
@@ -97,6 +101,7 @@ Changelog
 .. _zedster: https://github.com/zedster
 .. _Martin Honermeyer: https://github.com/djmaze
 .. _Tim White: https://github.com/timwhite
+.. _Florian Harr: https://github.com/evils
 
 .. _#20: https://github.com/danielquinn/paperless/issues/20
 .. _#44: https://github.com/danielquinn/paperless/issues/44
@@ -111,3 +116,4 @@ Changelog
 .. _#68: https://github.com/danielquinn/paperless/issues/68
 .. _#71: https://github.com/danielquinn/paperless/issues/71
 .. _#94: https://github.com/danielquinn/paperless/issues/71
+.. _#98: https://github.com/danielquinn/paperless/issues/98
diff --git a/docs/troubleshooting.rst b/docs/troubleshooting.rst
index 0fa7c1a29..39228ed48 100644
--- a/docs/troubleshooting.rst
+++ b/docs/troubleshooting.rst
@@ -3,17 +3,47 @@
 Troubleshooting
 ===============
 
-.. _troubleshooting_ocr_language_files_missing:
+.. _troubleshooting-languagemissing:
 
 Consumer warns ``OCR for XX failed``
 ------------------------------------
 
-If you find the OCR accuracy to be too low, and/or the document consumer warns that ``OCR for
-XX failed, but we're going to stick with what we've got since FORGIVING_OCR is enabled``, then you
-might need to install the `Tesseract language files
-`_ marching your documents languages.
+If you find the OCR accuracy to be too low, and/or the document consumer warns
+that ``OCR for XX failed, but we're going to stick with what we've got since
+FORGIVING_OCR is enabled``, then you might need to install the
+`Tesseract language files `_
+matching your documents' languages.
 
-As an example, if you are running Paperless from the Vagrant setup provided (or from any Ubuntu or Debian
-box), and your documents are written in Spanish you may need to run::
+As an example, if you are running Paperless from the Vagrant setup provided
+(or from any Ubuntu or Debian box), and your documents are written in Spanish
+you may need to run::
 
     apt-get install -y tesseract-ocr-spa
+
+
+.. _troubleshooting-convertpixelcache:
+
+Consumer dies with ``convert: unable to extend pixel cache``
+------------------------------------------------------------
+
+During the consumption process, Paperless invokes ImageMagick's ``convert``
+program to translate the source document into something that the OCR engine can
+understand and this can burn a Very Large amount of memory if the original
+document is rather long. Similarly, if your system doesn't have a lot of
+memory to begin with (e.g. a Raspberry Pi), then this can happen for even
+medium-sized documents.
+
+The solution is to tell ImageMagick *not* to Use All The RAM, as is its
+default, and instead tell it to use a fixed amount. ``convert`` will then
+break up the job into hundreds of individual files and use them to slowly
+compile the finished image. Simply set ``PAPERLESS_CONVERT_MEMORY_LIMIT`` in
+``/etc/paperless.conf`` to something like ``32000000`` and you'll limit
+``convert`` to 32MB. Fiddle with this value as you like.
+
+**HOWEVER**: Simply setting this value may not be enough on systems where
+``/tmp`` is mounted as tmpfs, as this is where ``convert`` will write its
+temporary files. In these cases (most systemd machines), you need to tell
+ImageMagick to use a different space for its scratch work. You do this by
+setting ``PAPERLESS_CONVERT_TMPDIR`` in ``/etc/paperless.conf`` to somewhere
+that's actually on a physical disk (and writable by the user running
+Paperless), like ``/var/tmp/paperless`` or ``/home/my_user/tmp`` in a pinch.
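
For illustration only, the two settings documented in this patch might end up
looking like the fragment below. This is a sketch that assumes
``/etc/paperless.conf`` takes plain ``KEY=value`` lines with ``#`` comments
(the patch itself does not show the file format), and the values are simply the
examples from the text rather than recommendations::

    # Assumed syntax: one KEY=value pair per line.
    # Cap ImageMagick's convert at roughly 32MB of memory.
    PAPERLESS_CONVERT_MEMORY_LIMIT=32000000
    # Scratch directory on a physical disk, writable by the Paperless user.
    PAPERLESS_CONVERT_TMPDIR=/var/tmp/paperless

The scratch directory presumably has to exist, and be writable by the user
running Paperless, before the consumer is restarted.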