Introduce some creative code for setting ALLOWED_HOSTS, which defaults to ['*']. Also add PAPERLESS_ALLOWED_HOSTS to paperless.conf.example with an explanation of what it's for.
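As a rough illustration of the idea (not the code from this commit), a settings module could derive ALLOWED_HOSTS from the environment along these lines. The comma-separated format and the helper name `allowed_hosts_from_env` are assumptions made for this sketch.

```python
import os


def allowed_hosts_from_env(environ=os.environ):
    """Build the ALLOWED_HOSTS list from PAPERLESS_ALLOWED_HOSTS.

    Assumption: the variable holds a comma-separated list of host names.
    When it is unset or empty, fall back to ["*"] (accept any host).
    """
    raw = environ.get("PAPERLESS_ALLOWED_HOSTS", "")
    hosts = [host.strip() for host in raw.split(",") if host.strip()]
    return hosts or ["*"]


# In a Django settings module this would be the module-level setting.
ALLOWED_HOSTS = allowed_hosts_from_env()
```

With `PAPERLESS_ALLOWED_HOSTS="example.com, localhost"` this sketch yields `["example.com", "localhost"]`; leaving the variable unset keeps the permissive `["*"]` default.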
There appears to be quite the mess out there with regard to how DRF
handles filtering. DRF has its own built-in stuff, but recommends
django_filter for the advanced stuff, which has its own overriding
module that explodes with this message when used as per the
documentation:
AttributeError: 'NoneType' object has no attribute 'DjangoFilterBackend'
Then there's djangorestframework-filter, another package that claims to
do the same thing, that does everything just differently enough that
nothing worked while I had it enabled.
I ended up using django_filter, but importing each element explicitly
rather than using the recommended (and broken, at least in this project)
method of:
import django_filters.rest_framework as filters
Anyway, this should bring the dependencies up to date, and strips out a
lot of redundant code.
* PEP8 conformity
* rename run_post_consume_external_script to run_post_consume_script
* rename run_pre_consume_external_script to run_pre_consume_script
* change order of declaration and use from post...pre to pre...post
At the moment, every page in a PDF will be processed one by one using
tesseract. Since the processing of a single page is independent from every
other page, one can make use of multi-core machines.
This PR introduces a multiprocessing pool to process multiple pages
simultaneously. The number of threads to use can be specified in the
environment variable `PAPERLESS_OCR_THREADS`. This will default to the
number of cores/hyperthreads Python detects for your system.
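A minimal sketch of the approach described above, assuming one image file per page. `ocr_page` is a hypothetical stand-in for the real tesseract call, and the helper names are invented for illustration; only `PAPERLESS_OCR_THREADS` comes from this PR.

```python
import os
from multiprocessing import Pool


def ocr_worker_count(environ=os.environ):
    """Read PAPERLESS_OCR_THREADS, falling back to the detected core count."""
    value = environ.get("PAPERLESS_OCR_THREADS")
    if value:
        return max(1, int(value))
    # os.cpu_count() may return None when the count cannot be determined.
    return os.cpu_count() or 1


def ocr_page(page_path):
    """Hypothetical placeholder for running tesseract on a single page."""
    return "text of {}".format(page_path)


def ocr_document(page_paths):
    """OCR all pages in a worker pool.

    Pool.map returns results in the same order as the input, so the
    page order of the document is preserved.
    """
    with Pool(processes=ocr_worker_count()) as pool:
        return pool.map(ocr_page, page_paths)
```

When calling `ocr_document` from a script, wrap the call in an `if __name__ == "__main__":` guard so that platforms which spawn fresh worker processes do not re-run it on import.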
Rename exporter to export and fix some debugging
Account for files not matching the sender/title pattern
Added a safety note
Fix wrong regex in the name parser
Renamed the command to something slightly less ambiguous