3078 Commits

Author SHA1 Message Date
Daniel Quinn
cebc44f2c9 API is halfway there 2016-02-16 09:28:34 +00:00
Daniel Quinn
2f0da8ab25 Added download_url to the Document model 2016-02-15 22:38:18 +00:00
Pit Kleyersburg
7b227ffa2f Fix matching if user supplied an empty value 2016-02-14 19:47:05 +01:00
Daniel Quinn
aea4af5d3b Version bump and feature update 2016-02-14 17:18:28 +00:00
Daniel Quinn
a0f4f6c5f2 Fixed merge conflict and did some pep8 2016-02-14 17:13:48 +00:00
Pit Kleyersburg
aeab9a0e81 Detect language only on one page of PDF
Currently, the entire document is processed to detect its language. If
the detected language differs from the default one, the entire document
is processed again for the new language.

This PR analyzes the middle page for its language and either processes
the remaining pages with the default language if it didn't differ, or
processes all pages for the new guessed language.

The number of pages processed drops from a worst case of `2n` to a
worst case of `n+1`.
2016-02-14 17:55:13 +01:00
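
A minimal sketch of the middle-page approach described in aeab9a0e81, not the code from that commit. The names `ocr_document`, `ocr`, and `guess_language` are hypothetical; the two callables stand in for the OCR engine and the language detector, which are passed in so the call count is easy to observe.

```python
from typing import Callable, List, Sequence


def ocr_document(
    pages: Sequence[str],
    default_lang: str,
    ocr: Callable[[str, str], str],         # hypothetical: (page, lang) -> text
    guess_language: Callable[[str], str],   # hypothetical: text -> language code
) -> List[str]:
    """OCR every page, probing only the middle page for its language."""
    if not pages:
        return []

    # Probe the middle page with the default language first.
    middle = len(pages) // 2
    probe_text = ocr(pages[middle], default_lang)
    guessed = guess_language(probe_text) or default_lang

    if guessed == default_lang:
        # Language matches: reuse the probe result and OCR only the
        # remaining pages, for n OCR calls in total.
        return [
            probe_text if index == middle else ocr(page, default_lang)
            for index, page in enumerate(pages)
        ]

    # Language differs: process all pages again with the guessed language,
    # for n + 1 OCR calls in total (the probe plus every page).
    return [ocr(page, guessed) for page in pages]
```

With this shape, the probe is the only page that can ever be processed twice, which is where the `n+1` worst case comes from.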
Daniel Quinn
7843ea5037 Added and implemented a rudimentary logger 2016-02-14 16:09:52 +00:00
Pit Kleyersburg
20b2408dbb Ensure OCR_THREADS is integer, add documentation 2016-02-14 16:37:38 +01:00
Pit Kleyersburg
f5beda9c56 Enable parallel OCR processing
At the moment, every page in a PDF is processed one by one using
tesseract. Since each page is processed independently of every other
page, the work can be spread across the cores of a multi-core machine.

This PR introduces a multiprocessing pool to process multiple pages
simultaneously. The number of threads to use can be set with the
environment variable `PAPERLESS_OCR_THREADS`, which defaults to the
number of cores/hyperthreads Python detects for your system.
2016-02-14 15:57:42 +01:00
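
A rough sketch of the pooled processing described in f5beda9c56, with the integer coercion mentioned in 20b2408dbb. This is an illustration under assumptions, not the project's code: `ocr_threads`, `ocr_page`, and `ocr_pages` are hypothetical names, and `ocr_page` is a placeholder where the real tesseract call would go. Only `PAPERLESS_OCR_THREADS` comes from the commit message.

```python
import os
from multiprocessing import Pool, cpu_count
from typing import List, Mapping, Optional, Sequence


def ocr_threads(environ: Mapping[str, str] = os.environ) -> int:
    """Read PAPERLESS_OCR_THREADS as an integer, defaulting to the CPU count."""
    raw = environ.get("PAPERLESS_OCR_THREADS")
    if raw is None:
        return cpu_count()
    # Environment variables are always strings, so coerce explicitly.
    return int(raw)


def ocr_page(page_path: str) -> str:
    """Placeholder for the per-page OCR step (assumption: one file per page)."""
    return f"text of {page_path}"


def ocr_pages(page_paths: Sequence[str], processes: Optional[int] = None) -> List[str]:
    """OCR pages in parallel; Pool.map returns results in input order."""
    with Pool(processes=processes or ocr_threads()) as pool:
        return pool.map(ocr_page, page_paths)
```

Because each page is independent, `Pool.map` can hand pages to workers in any order and still return the text in page order. On platforms that spawn worker processes rather than fork them, call `ocr_pages` from under an `if __name__ == "__main__":` guard.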
Daniel Quinn
6b0a537bff Added support for a shared secret in email 2016-02-14 03:01:24 +00:00
Daniel Quinn
3b5d4cdd39 Added some error handling 2016-02-14 01:32:25 +00:00
Daniel Quinn
fc5d89c6fc Added a default algorithm 2016-02-14 01:30:41 +00:00
Daniel Quinn
d9b7851de9 Added a default algorithm 2016-02-14 01:30:18 +00:00
Daniel Quinn
330dfa544b Fixed a typo in the description. There's no need for a new migration here. 2016-02-14 00:10:37 +00:00
Daniel Quinn
a846b3f7b8 Adding some more debugging 2016-02-13 00:57:05 +00:00
Daniel Quinn
840472071c Added the required verbosity reference 2016-02-12 08:27:28 +00:00
Daniel Quinn
2421f559be Simpler regex 2016-02-12 08:27:09 +00:00
Daniel Quinn
a022fcb8f1 Fixed the auto-naming regexes 2016-02-11 22:05:55 +00:00
Daniel Quinn
7aadab23cc Added the Renderable mixin because DRY 2016-02-11 22:05:38 +00:00
Daniel Quinn
ef1639208c Tests for the consumer 2016-02-11 12:25:23 +00:00
Daniel Quinn
cef4abc01d version bump 2016-02-11 12:25:12 +00:00
Daniel Quinn
c423a13f85 Added a simple re-tagger 2016-02-11 12:24:18 +00:00
Daniel Quinn
39134b517e Cleaned up file_name() 2016-02-10 23:53:48 +00:00
Daniel Quinn
4a078dcfbc Merge branch 'master' into feature/images-as-docs 2016-02-09 17:20:45 +00:00
Daniel Quinn
0eaed36420 The 'API' is written but untested 2016-02-08 23:46:16 +00:00
Daniel Quinn
212752f46e Fixt the tags to be optional 2016-02-08 17:28:59 +00:00
Daniel Quinn
0c729e5675 Changed the name, forgot to change the check.
Closes #17
2016-02-08 11:14:57 +00:00
Daniel Quinn
c4311af263 Cleaned up the tests 2016-02-06 17:41:11 +00:00
Daniel Quinn
febb45af81 Prettied up the interface a little 2016-02-06 17:27:17 +00:00
Daniel Quinn
ce69e37256 Linked tag labels 2016-02-06 17:14:44 +00:00
Daniel Quinn
48761911b3 Image imports and consumption by mail work 2016-02-06 17:05:36 +00:00
Daniel Quinn
71075a691a The mailconsumer isn't a consumer at all. Best fixt that 2016-02-05 20:15:08 +00:00
Daniel Quinn
d8ad6b589b Added pytest and broke up the consumer into file and mail 2016-02-05 00:23:36 +00:00
Daniel Quinn
3bc89d23c8 Sorting the filters 2016-02-03 17:20:12 +00:00
Daniel Quinn
a70b40f618 Broke the consumer script into separate files and started on a mail consumer 2016-01-30 01:18:52 +00:00
Daniel Quinn
84d5f8cc5d Merge branch 'master' into feature/images-as-docs 2016-01-29 23:41:13 +00:00
Daniel Quinn
ace9389e5f #12: Support image documents 2016-01-29 23:18:03 +00:00
Daniel Quinn
10e4f0f5f3 Added some better admin for tags 2016-01-28 18:37:27 +00:00
Daniel Quinn
a7d041a9f5 Prettied-up the admin 2016-01-28 08:16:29 +00:00
Daniel Quinn
3026593d6c Version bump for automated tagging 2016-01-28 07:29:25 +00:00
Daniel Quinn
0ec63ae1f9 #11: automatic tagging support 2016-01-28 07:23:11 +00:00
Daniel Quinn
286292dbf9 Added some documentation 2016-01-24 20:15:50 -05:00
Daniel Quinn
04bcb1cdad Forced python3 for setups not using a virtualenv 2016-01-24 12:31:02 +00:00
Daniel Quinn
669cf1cb70 Add labels (#9) 2016-01-23 04:40:35 +00:00
Daniel Quinn
1219e81e77 Moved changes to where it should be 2016-01-23 03:44:51 +00:00
Daniel Quinn
65074b4375 Smarter check positions 2016-01-23 03:42:39 +00:00
Daniel Quinn
0eb0c88d3d Now the exporter sets the proper dates 2016-01-23 03:22:15 +00:00
Daniel Quinn
796e977894 Django insists on adding every little thing as a migration 2016-01-23 03:14:55 +00:00
Daniel Quinn
4f1bf81d5b Better variable names 2016-01-23 03:05:40 +00:00
Daniel Quinn
9e596953a3 pep8 2016-01-23 02:58:03 +00:00