There appears to be quite a mess out there with regard to how DRF
handles filtering. DRF has its own built-in filtering, but recommends
django_filter for anything more advanced, and django_filter ships its own
overriding module that explodes with this message when used as the
documentation describes:
AttributeError: 'NoneType' object has no attribute 'DjangoFilterBackend'
Then there's djangorestframework-filter, another package that claims to
do the same thing but does everything just differently enough that
nothing worked while I had it enabled.
I ended up using django_filter, but imported each element explicitly
rather than using the recommended (and, at least in this project, broken)
method of:
import django_filters.rest_framework as filters
Anyway, this should bring the dependencies up to date and strip out a
lot of redundant code.
* PEP8 conformity
* rename run_post_consume_external_script to run_post_consume_script
* rename run_pre_consume_external_script to run_pre_consume_script
* change order of declaration and use from post...pre to pre...post
At the moment, every page in a PDF is processed one at a time by
Tesseract. Since each page can be processed independently of every other
page, the work can be spread across the cores of a multi-core machine.
This PR introduces a multiprocessing pool to process multiple pages
simultaneously. The number of threads to use can be set with the
environment variable `PAPERLESS_OCR_THREADS`; it defaults to the number
of cores/hyperthreads Python detects on your system.
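A minimal sketch of the idea, assuming a per-page OCR callable. The names `get_ocr_threads`, `ocr_pages` and `fake_ocr` are hypothetical, and `fake_ocr` stands in for a real Tesseract call. It uses `multiprocessing.pool.ThreadPool`, which has the same `map` API as `multiprocessing.Pool`. Swapping in `Pool` gives separate worker processes instead of threads.

```python
import os
from multiprocessing.pool import ThreadPool
from typing import Callable, List, Optional


def get_ocr_threads() -> int:
    """Read PAPERLESS_OCR_THREADS, falling back to the detected CPU count."""
    value = os.environ.get("PAPERLESS_OCR_THREADS")
    if value:
        return max(1, int(value))
    # os.cpu_count() counts logical cores (including hyperthreads) and may return None.
    return os.cpu_count() or 1


def ocr_pages(pages: List[str], ocr_page: Callable[[str], str],
              threads: Optional[int] = None) -> List[str]:
    """Run ocr_page on every page concurrently; results keep page order."""
    with ThreadPool(threads or get_ocr_threads()) as pool:
        # map() blocks until every page is done and returns results in input order.
        return pool.map(ocr_page, pages)


def fake_ocr(page: str) -> str:
    # Hypothetical stand-in for running Tesseract on one page image.
    return page.upper()


if __name__ == "__main__":
    print(ocr_pages(["page one", "page two"], fake_ocr))
```

Because `map` preserves input order, the recognised text can be joined back together page by page even though the pages finish in arbitrary order.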
Rename exporter to export and fix some debugging
Account for files not matching the sender/title pattern
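A sketch of what accounting for non-matching files can look like. It assumes a hypothetical `<sender> - <title>.pdf` naming convention; the actual pattern and the helper names `SENDER_TITLE`, `ParsedName` and `parse_file_name` are illustrative only. Files that don't match fall back to having no sender and the bare file name as the title.

```python
import os
import re
from typing import NamedTuple, Optional

# Assumed naming convention "<sender> - <title>.pdf" (hypothetical, for illustration).
# The lazy sender group splits on the first " - ", so titles may contain " - " themselves.
SENDER_TITLE = re.compile(r"^(?P<sender>.+?) - (?P<title>.+)\.pdf$", re.IGNORECASE)


class ParsedName(NamedTuple):
    sender: Optional[str]
    title: str


def parse_file_name(file_name: str) -> ParsedName:
    match = SENDER_TITLE.match(file_name)
    if match:
        return ParsedName(match.group("sender"), match.group("title"))
    # Fallback for files that don't follow the pattern: no sender, and the
    # title is the file name with its extension stripped.
    stem, _ext = os.path.splitext(file_name)
    return ParsedName(None, stem)
```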
Added a safety note
Fix wrong regex in the name parser
Renamed the command to something slightly less ambiguous