157 Commits

Author SHA1 Message Date
Daniel Quinn
7843ea5037 Added and implemented a rudimentary logger 2016-02-14 16:09:52 +00:00
Pit Kleyersburg
20b2408dbb Ensure OCR_THREADS is integer, add documentation 2016-02-14 16:37:38 +01:00
Pit Kleyersburg
f5beda9c56 Enable parallel OCR processing
At the moment, every page in a PDF will be processed one by one using
tesseract. Since the processing of a single page is independent from every
other page, one can make use of multi-core machines.

This PR introduces a multiprocessing pool to process multiple pages
simultaneously. The amount of threads to use can be specified in the
environment variable `PAPERLESS_OCR_THREADS`. This will default to the
number of cores/hyperthreads Python detects for your system.
2016-02-14 15:57:42 +01:00
Daniel Quinn
a846b3f7b8 Adding some more debugging 2016-02-13 00:57:05 +00:00
Daniel Quinn
2421f559be Simpler regex 2016-02-12 08:27:09 +00:00
Daniel Quinn
a022fcb8f1 Fixed the auto-naming regexes 2016-02-11 22:05:55 +00:00
Daniel Quinn
48761911b3 Image imports and consumption by mail work 2016-02-06 17:05:36 +00:00