diff --git a/README.rst b/README.rst
index 71b59750b..6eec07d72 100644
--- a/README.rst
+++ b/README.rst
@@ -59,7 +59,7 @@ powerful tools.
 
 * `ImageMagick`_ converts the images between colour and greyscale.
 * `Tesseract`_ does the character recognition.
-* `Unpaper`_ despeckles and and deskews the scanned image.
+* `Unpaper`_ despeckles and deskews the scanned image.
 * `GNU Privacy Guard`_ is used as the encryption backend.
 * `Python 3`_ is the language of the project.
 
diff --git a/docs/consumption.rst b/docs/consumption.rst
index eadf12823..6e5bd8574 100644
--- a/docs/consumption.rst
+++ b/docs/consumption.rst
@@ -128,7 +128,7 @@ following name/value pairs:
   don't start uploading stuff to your server. The means of generating this
   signature is defined below.
 
-Specify ``enctype="multipart/form-data"``, and then POST your file with:::
+Specify ``enctype="multipart/form-data"``, and then POST your file with::
 
     Content-Disposition: form-data; name="document"; filename="whatever.pdf"
 
diff --git a/docs/index.rst b/docs/index.rst
index 47710d376..43f77b15a 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -33,4 +33,5 @@ Contents
    api
    utilities
    migrating
+   troubleshooting
    changelog
diff --git a/docs/requirements.rst b/docs/requirements.rst
index 5d4fddaf0..a1567361a 100644
--- a/docs/requirements.rst
+++ b/docs/requirements.rst
@@ -8,7 +8,7 @@ should work) that has the following software installed on it:
 
 * `Python3`_ (with development libraries, pip and virtualenv)
 * `GNU Privacy Guard`_
-* `Tesseract`_
+* `Tesseract`_, plus the language files for the languages your documents are written in.
 * `Imagemagick`_
 * `unpaper`_
 
diff --git a/docs/troubleshooting.rst b/docs/troubleshooting.rst
new file mode 100644
index 000000000..0fa7c1a29
--- /dev/null
+++ b/docs/troubleshooting.rst
@@ -0,0 +1,19 @@
+.. _troubleshooting:
+
+Troubleshooting
+===============
+
+.. _troubleshooting_ocr_language_files_missing:
+
+Consumer warns ``OCR for XX failed``
+------------------------------------
+
+If you find the OCR accuracy to be too low, and/or the document consumer warns that ``OCR for
+XX failed, but we're going to stick with what we've got since FORGIVING_OCR is enabled``, then you
+might need to install the `Tesseract language files
+`_ matching your documents' languages.
+
+As an example, if you are running Paperless from the Vagrant setup provided (or from any Ubuntu or Debian
+box), and your documents are written in Spanish, you may need to run::
+
+    apt-get install -y tesseract-ocr-spa
diff --git a/src/documents/models.py b/src/documents/models.py
index 8880935e3..cf32fabe3 100644
--- a/src/documents/models.py
+++ b/src/documents/models.py
@@ -155,7 +155,7 @@ class Document(models.Model):
     )
     tags = models.ManyToManyField(
         Tag, related_name="documents", blank=True)
-    created = models.DateTimeField(default=timezone.now, editable=False)
+    created = models.DateTimeField(default=timezone.now)
     modified = models.DateTimeField(auto_now=True, editable=False)
 
     class Meta(object):
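If you are unsure which language files Tesseract can actually see, `tesseract --list-langs` prints the installed language codes. The Python snippet below is a minimal sketch that wraps that command, assuming the `tesseract` binary is on `PATH` and supports `--list-langs`. The helper names (`parse_list_langs`, `installed_tesseract_languages`, `missing_languages`) are hypothetical and are not part of Paperless.

```python
import re
import shutil
import subprocess
from typing import Iterable, Set

# Hypothetical helpers, not part of Paperless.
# Language codes look like "eng", "spa", "chi_sim" or "script/Latin";
# header and warning lines contain spaces, so they never match.
_LANG_CODE = re.compile(r"^[A-Za-z0-9_/-]+$")


def parse_list_langs(output: str) -> Set[str]:
    """Extract language codes from the text printed by ``tesseract --list-langs``."""
    langs = set()
    for line in output.splitlines():
        line = line.strip()
        # Keep only lines that look like a bare language code; this drops
        # blank lines and the "List of available languages ..." header.
        if _LANG_CODE.match(line):
            langs.add(line)
    return langs


def installed_tesseract_languages() -> Set[str]:
    """Run ``tesseract --list-langs`` and return the installed language codes."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract is not on PATH")
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # Depending on the Tesseract version, the list may be printed to stdout
    # or to stderr, so both streams are parsed.
    return parse_list_langs(result.stdout + "\n" + result.stderr)


def missing_languages(required: Iterable[str], available: Set[str]) -> Set[str]:
    """Return the required language codes that are not installed."""
    return set(required) - set(available)


if __name__ == "__main__":
    # Example: documents written in Spanish and English.
    try:
        available = installed_tesseract_languages()
    except FileNotFoundError:
        print("tesseract is not installed or not on PATH")
    else:
        for code in sorted(missing_languages(["spa", "eng"], available)):
            print("missing Tesseract language:", code)
```

Codes that the sketch reports as missing correspond to the language files described in the troubleshooting entry; on Debian or Ubuntu, for example, `spa` is provided by the `tesseract-ocr-spa` package.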