10079 Commits

Author SHA1 Message Date
Daniel Quinn
a256d5ee2f Merge pull request #37 from jat255/DOCFIX_documentation_badge
Make docs badge in readme redirect to documentation, not image
2016-02-15 16:59:30 +00:00
Joshua Taillon
d2757707b3 Make docs badge in readme redirect to documentation, not image 2016-02-15 11:58:07 -05:00
Daniel Quinn
9a437dc9f6 Merge pull request #35 from pitkley/fix/matching-logic
Fix matching if user supplied an empty value
2016-02-14 19:21:50 +00:00
Pit Kleyersburg
7b227ffa2f Fix matching if user supplied an empty value 2016-02-14 19:47:05 +01:00
Daniel Quinn
aea4af5d3b Version bump and feature update 2016-02-14 17:18:28 +00:00
Daniel Quinn
a0f4f6c5f2 Fixed merge conflict and did some pep8 2016-02-14 17:13:48 +00:00
Daniel Quinn
4689e2b975 Merge pull request #32 from pitkley/feature/single-page-langdetect
Detect language only on first page of PDF
2016-02-14 16:56:30 +00:00
Pit Kleyersburg
aeab9a0e81 Detect language only on one page of PDF
Currently, the entire document is processed to detect its language. If
the detected language differs from the default one, the entire document
is processed again with the new language.

This PR analyzes only the middle page for its language, then either
processes the remaining pages with the default language if it did not
differ, or processes all pages with the newly guessed language.

The number of processed pages drops from a worst case of `2n` to a
worst case of `n+1`.
2016-02-14 17:55:13 +01:00
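The single-page detection described in aeab9a0e81 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the commit: `ocr_document`, the `ocr` and `guess_language` callables, and the `"eng"` default are hypothetical names introduced here.

```python
def ocr_document(pages, ocr, guess_language, default_lang="eng"):
    """Return (language, per-page texts), calling ocr at most len(pages) + 1 times.

    `ocr(page, lang)` and `guess_language(text)` are hypothetical callables
    supplied by the caller; they are not part of any real library API.
    """
    if not pages:
        return default_lang, []

    # OCR only the middle page with the default language to sample the text.
    middle = len(pages) // 2
    sample = ocr(pages[middle], default_lang)
    guessed = guess_language(sample) or default_lang

    if guessed == default_lang:
        # Reuse the sample; only the other pages still need OCR (n calls total).
        texts = [
            sample if index == middle else ocr(page, default_lang)
            for index, page in enumerate(pages)
        ]
        return default_lang, texts

    # The language differs: redo every page, sample included (n + 1 calls total).
    return guessed, [ocr(page, guessed) for page in pages]
```

Passing the OCR and language-guessing steps in as callables keeps the sketch independent of any particular OCR or detection library, and makes the `n` versus `n+1` call counts easy to check with fakes.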
Daniel Quinn
7843ea5037 Added and implemented a rudimentary logger 2016-02-14 16:09:52 +00:00
Daniel Quinn
9162e41507 Merge pull request #33 from pitkley/fix/parallelism
Ensure `OCR_THREADS` is integer, add documentation
2016-02-14 15:40:20 +00:00
Pit Kleyersburg
20b2408dbb Ensure OCR_THREADS is integer, add documentation 2016-02-14 16:37:38 +01:00
Daniel Quinn
88acf50fe0 Merge pull request #31 from pitkley/feature/paralellism
This is great. It seriously sped up the OCR time.
2016-02-14 15:29:05 +00:00
Pit Kleyersburg
f5beda9c56 Enable parallel OCR processing
At the moment, every page in a PDF is processed one by one using
tesseract. Since each page is processed independently of every other
page, multi-core machines can work on several pages at once.

This PR introduces a multiprocessing pool to process multiple pages
simultaneously. The number of threads to use can be specified in the
environment variable `PAPERLESS_OCR_THREADS`. This will default to the
number of cores/hyperthreads Python detects for your system.
2016-02-14 15:57:42 +01:00
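The pool-based approach from f5beda9c56, together with the integer check from 20b2408dbb, might look something like the sketch below. Only the `PAPERLESS_OCR_THREADS` variable name comes from the commit message; `ocr_thread_count`, `ocr_page`, `ocr_pages`, and the `use_processes` switch are hypothetical, and `ocr_page` is a stand-in rather than a real tesseract call.

```python
import os
from multiprocessing import Pool, cpu_count
from multiprocessing.pool import ThreadPool


def ocr_thread_count(environ=os.environ):
    """Read PAPERLESS_OCR_THREADS as an integer, defaulting to the CPU count."""
    raw = environ.get("PAPERLESS_OCR_THREADS")
    if raw is None or raw.strip() == "":
        return cpu_count()
    count = int(raw)  # raises ValueError on non-numeric input
    if count < 1:
        raise ValueError("PAPERLESS_OCR_THREADS must be at least 1")
    return count


def ocr_page(page_path):
    """Hypothetical stand-in for running tesseract on a single page."""
    return "text of {}".format(page_path)


def ocr_pages(page_paths, worker=ocr_page, threads=None, use_processes=True):
    """OCR every page through a pool; results come back in page order.

    With use_processes=True the worker must be a picklable, module-level
    function. use_processes=False swaps in a thread-backed pool with the
    same interface, which is an addition made for this sketch.
    """
    threads = threads or ocr_thread_count()
    pool_cls = Pool if use_processes else ThreadPool
    with pool_cls(processes=threads) as pool:
        return pool.map(worker, page_paths)
```

Because `pool.map` preserves input order, the page texts can be joined directly afterwards. On platforms that spawn rather than fork worker processes, a call such as `ocr_pages(paths)` would need to sit under an `if __name__ == "__main__":` guard.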
Daniel Quinn
6b0a537bff Added support for a shared secret in email 2016-02-14 03:01:24 +00:00
Daniel Quinn
3b5d4cdd39 Added some error handling 2016-02-14 01:32:25 +00:00
Daniel Quinn
fc5d89c6fc Added a default algorithm 2016-02-14 01:30:41 +00:00
Daniel Quinn
d9b7851de9 Added a default algorithm 2016-02-14 01:30:18 +00:00
Daniel Quinn
cec9968cdb Documented consumption 2016-02-14 00:10:49 +00:00
Daniel Quinn
330dfa544b Fixed a typo in the description. There's no need for a new migration here. 2016-02-14 00:10:37 +00:00
Daniel Quinn
294f104474 Merge branch 'master' into feature/images-as-docs 2016-02-13 01:01:10 +00:00
Daniel Quinn
68fa7d68fa Merge branch 'master' of github.com:danielquinn/paperless 2016-02-13 00:59:36 +00:00
Daniel Quinn
2ed2d641b5 Added a note about the plight of Apple users. 2016-02-13 00:59:19 +00:00
Daniel Quinn
a846b3f7b8 Adding some more debugging 2016-02-13 00:57:05 +00:00
Daniel Quinn
b7859a0ff3 Merge pull request #26 from wttw/master
Document cloning from public URL rather than ssh
2016-02-12 20:30:07 +00:00
Steve Atkins
a4903049a3 Document cloning from public URL rather than ssh 2016-02-12 11:36:07 -08:00
Daniel Quinn
9ed8a2b2d7 Merge branch 'master' into feature/images-as-docs 2016-02-12 09:03:46 +00:00
Daniel Quinn
1d4b87ee46 Update for #22 2016-02-12 08:54:04 +00:00
Daniel Quinn
840472071c Added the required verbosity reference 2016-02-12 08:27:28 +00:00
Daniel Quinn
2421f559be Simpler regex 2016-02-12 08:27:09 +00:00
Daniel Quinn
a022fcb8f1 Fixed the auto-naming regexes 2016-02-11 22:05:55 +00:00
Daniel Quinn
7aadab23cc Added the Renderable mixin because DRY 2016-02-11 22:05:38 +00:00
Daniel Quinn
ef1639208c Tests for the consumer 2016-02-11 12:25:23 +00:00
Daniel Quinn
cef4abc01d version bump 2016-02-11 12:25:12 +00:00
Daniel Quinn
78ee138ad7 Added migration and changelog updates 2016-02-11 12:25:00 +00:00
Daniel Quinn
c423a13f85 Added a simple re-tagger 2016-02-11 12:24:18 +00:00
Daniel Quinn
39134b517e Cleaned up file_name() 2016-02-10 23:53:48 +00:00
Daniel Quinn
a892abc701 Added dateutil 2016-02-10 23:50:58 +00:00
Daniel Quinn
4a078dcfbc Merge branch 'master' into feature/images-as-docs 2016-02-09 17:20:45 +00:00
Daniel Quinn
642b2f7ee3 Merge pull request #18 from mrwacky42/master
Add other prerequisites for Vagrant
2016-02-09 09:41:53 +00:00
Sharif Nassar
6115b2f03d Add other prerequisites
Vagrant setup didn't work for me unless I manually installed tesseract and ImageMagick.
2016-02-09 01:07:48 -08:00
Daniel Quinn
0eaed36420 The 'API' is written but untested 2016-02-08 23:46:16 +00:00
Daniel Quinn
212752f46e Fixed the tags to be optional 2016-02-08 17:28:59 +00:00
Daniel Quinn
0c729e5675 Changed the name, forgot to change the check.
Closes #17
2016-02-08 11:14:57 +00:00
Daniel Quinn
e5e4ee0350 Added file magic 2016-02-08 11:12:14 +00:00
Daniel Quinn
c4311af263 Cleaned up the tests 2016-02-06 17:41:11 +00:00
Daniel Quinn
febb45af81 Prettied up the interface a little 2016-02-06 17:27:17 +00:00
Daniel Quinn
ce69e37256 Linked tag labels 2016-02-06 17:14:44 +00:00
Daniel Quinn
48761911b3 Image imports and consumption by mail work 2016-02-06 17:05:36 +00:00
Daniel Quinn
71075a691a The mailconsumer isn't a consumer at all. Best fix that 2016-02-05 20:15:08 +00:00
Daniel Quinn
d8ad6b589b Added pytest and broke up the consumer into file and mail 2016-02-05 00:23:36 +00:00