Merge pull request #96 from JaimeObregon/master

Improves the docs: OCRing files in languages other than English + fixes typos
Daniel Quinn 2016-03-23 11:27:20 +00:00
commit ef54e2f94a
5 changed files with 23 additions and 3 deletions

@ -59,7 +59,7 @@ powerful tools.
* `ImageMagick`_ converts the images between colour and greyscale.
* `Tesseract`_ does the character recognition.
* `Unpaper`_ despeckles and and deskews the scanned image.
* `Unpaper`_ despeckles and deskews the scanned image.
* `GNU Privacy Guard`_ is used as the encryption backend.
* `Python 3`_ is the language of the project.

@ -128,7 +128,7 @@ following name/value pairs:
don't start uploading stuff to your server. The means of generating this
signature is defined below.
Specify ``enctype="multipart/form-data"``, and then POST your file with:::
Specify ``enctype="multipart/form-data"``, and then POST your file with::
Content-Disposition: form-data; name="document"; filename="whatever.pdf"

@ -33,4 +33,5 @@ Contents
api
utilities
migrating
troubleshooting
changelog

@ -8,7 +8,7 @@ should work) that has the following software installed on it:
* `Python3`_ (with development libraries, pip and virtualenv)
* `GNU Privacy Guard`_
* `Tesseract`_
* `Tesseract`_, plus its language files matching your document base.
* `Imagemagick`_
* `unpaper`_

docs/troubleshooting.rst Normal file

@ -0,0 +1,19 @@
.. _troubleshooting:
Troubleshooting
===============
.. _troubleshooting_ocr_language_files_missing:
Consumer warns ``OCR for XX failed``
------------------------------------
If you find the OCR accuracy to be too low, and/or the document consumer warns that ``OCR for
XX failed, but we're going to stick with what we've got since FORGIVING_OCR is enabled``, then you
might need to install the `Tesseract language files
<http://packages.ubuntu.com/search?keywords=tesseract-ocr>`_ matching your documents' languages.
As an example, if you are running Paperless from the Vagrant setup provided (or from any Ubuntu or Debian
box) and your documents are written in Spanish, you may need to run::
apt-get install -y tesseract-ocr-spa
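
Before installing anything, it may help to check which language files Tesseract can already see. The following is a minimal sketch, not part of Paperless: the ``has_tesseract_lang`` helper is a hypothetical name, and the usage line assumes ``tesseract`` is on your ``PATH`` and that the three-letter code (``spa`` here) matches the package suffix used above.

```shell
# Hypothetical helper: reads a language listing on stdin and succeeds
# (exit status 0) only if the code given as $1 appears as a whole line.
has_tesseract_lang() {
    grep -qxF -- "$1"
}

# Example usage (not executed here). Some Tesseract versions print the
# listing on stderr, hence the 2>&1 redirection:
#
#   tesseract --list-langs 2>&1 | has_tesseract_lang spa \
#       || echo "Spanish language files do not appear to be installed"
```

If the check fails for a language your documents use, installing the matching ``tesseract-ocr-*`` package as shown above is the likely fix.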