As per requirements.txt we are using Django version 1.10. It makes sense
to link to the documentation for that version as well.
Also, the documentation for the previous version shows a notice at the
top warning that the version is unsafe, which is a bit disconcerting.
The problem with the original instructions is that systemd creates a
symlink pointing to the service file in the paperless directory. A user
is unlikely to commit their changes to the service files (especially
not on a master branch checkout), so the changes are easily lost and
the services then fail to start for no obvious reason.
To avoid this we simply copy the service files to the systemd directory
directly and use the files in the repository only as an example.
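
For illustration, the copy step could be sketched in Python as below.
The function name and both directory arguments are my own placeholders,
not something the repository provides; on a real system the target
would be the systemd unit directory and the copy would need root.

```python
import shutil
from pathlib import Path


def install_service_files(example_dir, systemd_dir):
    """Copy every *.service file from example_dir into systemd_dir.

    Hypothetical helper: both directories are supplied by the caller.
    Returns the installed paths, sorted by file name.
    """
    target = Path(systemd_dir)
    installed = []
    for unit in sorted(Path(example_dir).glob("*.service")):
        # A real copy, not a symlink: switching branches or resetting
        # the checkout later cannot change the installed unit file.
        installed.append(Path(shutil.copy(unit, target / unit.name)))
    return installed
```

Local edits are then made to the copies, and the files in the
repository stay untouched.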
This makes it clear that only a specific set of characters is allowed
in email titles. It is worth mentioning this in the documentation, as
it otherwise has to be figured out from the Paperless sources [0].
[0] SAFE_REGEX in src/documents/models.py
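
As a rough sketch of what such a check looks like: the pattern below is
only an assumed stand-in, the authoritative one is SAFE_REGEX in the
sources. The helper name is hypothetical as well.

```python
import re

# Assumed example pattern, NOT copied from src/documents/models.py:
# word characters, hyphens, spaces, commas, periods and apostrophes.
EXAMPLE_SAFE_REGEX = re.compile(r"[\w\- ,.']+")


def is_safe_title(title):
    """Return True if the whole title consists of allowed characters."""
    return EXAMPLE_SAFE_REGEX.fullmatch(title) is not None
```

With this stand-in pattern a title such as "Invoice 2017-03, ACME" would
pass, while "Invoice #42" would be rejected because of the "#".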
The configuration no longer has to be hardcoded in settings.py; it now
happens in the config file. Also, we added a note that the emails are
checked at startup [0].
[0] see commit 3153bbd6a8d674362eccb4d48b8458b33298f6a9
Especially when first setting up the configuration for consuming
documents from emails, it makes sense to quickly test the changes.
Having to wait for 10 minutes is not acceptable.
Two ways around this come to mind: the simple approach is to always
fetch the emails when Paperless first starts. This way the fetching of
emails can be tested straight away.
The alternative would be a configuration option that sets the interval
at which emails are checked. The user could then reduce it to test the
setup and increase it again later on. This seems needlessly complicated
though, so fetching at startup it is.
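
The idea can be sketched roughly as follows. The class and its
attribute names are made up for illustration and are not the actual
Paperless code; only the 10 minute interval comes from the description
above.

```python
import datetime


class MailPoller:
    """Decides when the mailbox should be checked (illustrative only)."""

    def __init__(self, interval=datetime.timedelta(minutes=10)):
        self.interval = interval
        # None means "never checked", so the first call always fetches.
        self.last_checked = None

    def should_fetch(self, now):
        """Return True and record the time if a fetch is due at `now`."""
        if self.last_checked is None or now - self.last_checked >= self.interval:
            self.last_checked = now
            return True
        return False
```

The first call to should_fetch() returns True regardless of the
interval, which is what makes a fresh setup testable right after
startup.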
The latest pyocr version now allows running it with the latest tesseract
version. Hopefully this means better OCR results.
I am not sure whether there are binary packages for the latest
tesseract, but on my setup it was simply a case of checking out the
master branch [0] and compiling and installing from there. It seems to
work fine with paperless as well.
[0] https://github.com/tesseract-ocr/tesseract