Improves the docs: OCRing files in languages other than English + fixes typos

Jaime Gómez 2016-03-21 21:57:36 +01:00
parent 840626e571
commit 8115cf8905
5 changed files with 22 additions and 3 deletions

@@ -59,7 +59,7 @@ powerful tools.
* `ImageMagick`_ converts the images between colour and greyscale.
* `Tesseract`_ does the character recognition.
-* `Unpaper`_ despeckles and and deskews the scanned image.
+* `Unpaper`_ despeckles and deskews the scanned image.
* `GNU Privacy Guard`_ is used as the encryption backend.
* `Python 3`_ is the language of the project.

@@ -128,7 +128,7 @@ following name/value pairs:
don't start uploading stuff to your server. The means of generating this
signature is defined below.
-Specify ``enctype="multipart/form-data"``, and then POST your file with:::
+Specify ``enctype="multipart/form-data"``, and then POST your file with::
Content-Disposition: form-data; name="document"; filename="whatever.pdf"
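
For readers scripting the upload, here is a minimal Python 3 sketch that builds the multipart/form-data body described above using only the standard library. It assumes the other name/value pairs (including the signature, whose generation is not shown in this excerpt) are passed in through the ``fields`` dict; ``build_multipart`` is a hypothetical helper, not part of the project.

```python
import uuid


def build_multipart(fields, filename, file_bytes, boundary=None):
    """Build a multipart/form-data body: plain fields plus a "document" file part.

    `fields` holds the other name/value pairs the consumer expects (for example
    the signature); their exact names are an assumption left to the caller.
    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    delimiter = b"--" + boundary.encode()
    crlf = b"\r\n"
    parts = []
    # One part per ordinary form field.
    for name, value in fields.items():
        parts.append(
            delimiter + crlf
            + f'Content-Disposition: form-data; name="{name}"'.encode() + crlf
            + crlf
            + str(value).encode() + crlf
        )
    # The file itself goes in a part named "document", as the docs require.
    parts.append(
        delimiter + crlf
        + f'Content-Disposition: form-data; name="document"; filename="{filename}"'.encode() + crlf
        + b"Content-Type: application/octet-stream" + crlf
        + crlf
        + file_bytes + crlf
    )
    # The closing delimiter ends with two extra hyphens.
    body = b"".join(parts) + delimiter + b"--" + crlf
    return body, f"multipart/form-data; boundary={boundary}"
```

The returned body and header value could then be sent with ``urllib.request.Request(url, data=body, headers={"Content-Type": content_type}, method="POST")``, where ``url`` is whatever endpoint your installation exposes (not given here).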

@@ -33,4 +33,5 @@ Contents
api
utilities
migrating
+troubleshooting
changelog

@@ -8,7 +8,7 @@ should work) that has the following software installed on it:
* `Python3`_ (with development libraries, pip and virtualenv)
* `GNU Privacy Guard`_
-* `Tesseract`_
+* `Tesseract`_, plus its language files matching your document base.
* `Imagemagick`_
* `unpaper`_
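
A quick way to confirm that the external tools listed above are reachable is to look them up on ``PATH``. The sketch below assumes the executable names ``gpg``, ``tesseract``, ``convert`` and ``unpaper``; these are common defaults but vary by distribution (for example ``gpg2``, or ``magick`` for ImageMagick 7). It does not check the Python 3 requirement or Tesseract's language files.

```python
import shutil

# Executable names assumed for each dependency; adjust for your distribution
# (e.g. "gpg2" instead of "gpg", or "magick" for ImageMagick 7).
REQUIRED_BINARIES = {
    "GNU Privacy Guard": "gpg",
    "Tesseract": "tesseract",
    "ImageMagick": "convert",
    "unpaper": "unpaper",
}


def missing_dependencies(which=shutil.which):
    """Return the names of required tools whose executable is not on PATH.

    `which` is injectable so the check can be tested without the real tools.
    """
    return [name for name, binary in REQUIRED_BINARIES.items() if which(binary) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All external tools found.")
```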

docs/troubleshooting.rst
@@ -0,0 +1,18 @@
.. _troubleshooting:

Troubleshooting
===============

.. _troubleshooting_ocr_language_files_missing:

Consumer warns ``OCR for XX failed``
------------------------------------

If you find the OCR accuracy to be too low, and/or the document consumer warns that ``OCR for
XX failed, but we're going to stick with what we've got since FORGIVING_OCR is enabled``, then you
might need to install the `Tesseract language files
<http://packages.ubuntu.com/search?keywords=tesseract-ocr>`_ matching your documents' languages.

As an example, if your documents are written in Spanish you may need to run::

    apt-get install -y tesseract-ocr-spa
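
To see which language files Tesseract can already use before installing more, you can ask Tesseract itself with ``tesseract --list-langs``. The Python sketch below wraps that command and parses its output; the helper names are hypothetical, and the parser assumes the usual output shape of a "List of available languages (N):" header followed by one code per line.

```python
import subprocess


def parse_list_langs(output):
    """Parse the text printed by `tesseract --list-langs` into a set of codes."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Everything after the header line is one language code per line.
    for i, line in enumerate(lines):
        if line.startswith("List of available languages"):
            return set(lines[i + 1:])
    # No header found: treat every non-empty line as a code.
    return set(lines)


def installed_languages():
    """Run Tesseract and return the language codes it reports."""
    # Older Tesseract releases print this list to stderr and newer ones to
    # stdout, so both streams are merged before parsing.
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        check=True,
    )
    return parse_list_langs(result.stdout)
```

For example, ``{"eng", "spa"} - installed_languages()`` yields the codes still missing; on Debian/Ubuntu the matching package is typically ``tesseract-ocr-<code>``, as in the Spanish example.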