mirror of
https://github.com/paperless-ngx/paperless-ngx.git
synced 2025-04-02 13:45:10 -05:00
Improves the docs: OCRing files in languages other than English + fixes typos
parent 840626e571
commit 8115cf8905
@@ -59,7 +59,7 @@ powerful tools.

 * `ImageMagick`_ converts the images between colour and greyscale.
 * `Tesseract`_ does the character recognition.
-* `Unpaper`_ despeckles and and deskews the scanned image.
+* `Unpaper`_ despeckles and deskews the scanned image.
 * `GNU Privacy Guard`_ is used as the encryption backend.
 * `Python 3`_ is the language of the project.

@@ -128,7 +128,7 @@ following name/value pairs:
 don't start uploading stuff to your server. The means of generating this
 signature is defined below.

-Specify ``enctype="multipart/form-data"``, and then POST your file with:::
+Specify ``enctype="multipart/form-data"``, and then POST your file with::

     Content-Disposition: form-data; name="document"; filename="whatever.pdf"

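As a rough illustration of the request this hunk documents, the sketch below assembles a ``multipart/form-data`` body whose file part carries the ``Content-Disposition`` line quoted above. Only the ``document`` field name and the ``filename`` attribute come from the text. The helper name ``build_multipart``, the ``extra_fields`` parameter and the ``application/octet-stream`` part type are assumptions made for illustration. The other name/value pairs (including the signature) are not shown in this excerpt, so they are left to the caller.

```python
import uuid
from typing import Dict, Optional, Tuple


def build_multipart(
    filename: str,
    payload: bytes,
    extra_fields: Optional[Dict[str, str]] = None,
) -> Tuple[bytes, str]:
    """Return (body, content_type) for a multipart/form-data POST.

    Hypothetical helper: the API's real field list is not reproduced here.
    """
    boundary = uuid.uuid4().hex
    chunks = []

    # Plain text fields (whatever name/value pairs the API expects).
    for name, value in (extra_fields or {}).items():
        header = (
            "--" + boundary + "\r\n"
            'Content-Disposition: form-data; name="' + name + '"\r\n\r\n'
        )
        chunks.append(header.encode("utf-8") + value.encode("utf-8") + b"\r\n")

    # The file part, using the field name "document" from the docs.
    file_header = (
        "--" + boundary + "\r\n"
        'Content-Disposition: form-data; name="document"; '
        'filename="' + filename + '"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"  # assumed part type
    )
    chunks.append(file_header.encode("utf-8") + payload + b"\r\n")

    # Closing boundary.
    chunks.append(("--" + boundary + "--\r\n").encode("utf-8"))

    content_type = "multipart/form-data; boundary=" + boundary
    return b"".join(chunks), content_type
```

The returned body and content type can then be handed to any HTTP client, with the content type sent as the ``Content-Type`` request header.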
@@ -33,4 +33,5 @@ Contents
 api
 utilities
 migrating
+troubleshooting
 changelog
@@ -8,7 +8,7 @@ should work) that has the following software installed on it:

 * `Python3`_ (with development libraries, pip and virtualenv)
 * `GNU Privacy Guard`_
-* `Tesseract`_
+* `Tesseract`_, plus its language files matching your document base.
 * `Imagemagick`_
 * `unpaper`_

18
docs/troubleshooting.rst
Normal file
@@ -0,0 +1,18 @@
+.. _troubleshooting:
+
+Troubleshooting
+===============
+
+.. _troubleshooting_ocr_language_files_missing:
+
+Consumer warns ``OCR for XX failed``
+------------------------------------
+
+If you find the OCR accuracy to be too low, and/or the document consumer warns that ``OCR for
+XX failed, but we're going to stick with what we've got since FORGIVING_OCR is enabled``, then you
+might need to install the `Tesseract language files
+<http://packages.ubuntu.com/search?keywords=tesseract-ocr>`_ matching your documents' languages.
+
+As an example, if your documents are written in Spanish you may need to run::
+
+    apt-get install -y tesseract-ocr-spa
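Before installing a language package as the troubleshooting note describes, it can help to check which languages Tesseract already has. The sketch below is one way to do that, assuming the ``tesseract`` binary is on ``PATH`` and supports ``--list-langs``. The function names are hypothetical, and the header line the parser skips is an assumption about the command's output format that may differ between Tesseract versions.

```python
import subprocess
from typing import Set


def parse_langs(output: str) -> Set[str]:
    """Extract language codes from ``tesseract --list-langs`` style output."""
    langs = set()
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the assumed "List of available languages" header.
        if not line or line.lower().startswith("list of available languages"):
            continue
        langs.add(line)
    return langs


def installed_langs() -> Set[str]:
    """Ask the local Tesseract which language files it can see.

    Raises FileNotFoundError if the tesseract binary is not installed.
    """
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Some versions print the list to stderr, so both streams are parsed.
    return parse_langs(result.stdout + "\n" + result.stderr)


def is_lang_installed(code: str) -> bool:
    """Return True if a code such as "spa" is among the installed languages."""
    return code in installed_langs()
```

If ``is_lang_installed("spa")`` returns ``False`` on a machine that consumes Spanish documents, installing the matching language package is the likely fix.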