10039 Commits

Author SHA1 Message Date
Daniel Quinn
1c45ca10d4 Patched sorting 2016-02-17 00:11:57 +00:00
Daniel Quinn
550184cbae Patched sorting 2016-02-17 00:11:46 +00:00
Daniel Quinn
52f242574f Merge branch 'pitkley-fix/secure-temporary-files' 2016-02-17 00:10:54 +00:00
Daniel Quinn
6f95b05287 Support appropriate sorting for long documents 2016-02-17 00:10:05 +00:00
Pit Kleyersburg
46f8f492f5 Safely and non-randomly create scratch directory
Creating the scratch files in `_get_grayscale` using a random integer is
inherently unsafe, because two runs can collide on the same name. It
should also be unnecessary, given that the files are cleaned up after
the OCR run.

Since we don't know if OCR runs might be parallel in the future, this
commit implements thread-safe and deterministic directory-creation.

Additionally, it fixes the call to `_cleanup` by `consume`. In the
current implementation, `_cleanup` is not called if the last consumed
document failed with an `OCRError`.
2016-02-16 12:15:57 +01:00
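The idea in the commit above can be illustrated with a minimal sketch. It is not the code from the commit: it assumes `tempfile.mkdtemp` for collision-free directory creation and a `try`/`finally` block so that cleanup also runs when the last document fails. The names `consume`, `process`, and `scratch_root` here are hypothetical stand-ins, and `OCRError` is redefined locally for the example.

```python
import os
import shutil
import tempfile


class OCRError(Exception):
    """Stand-in for the project's OCR failure exception (assumed name)."""


def consume(documents, process, scratch_root=None):
    """Run `process(doc, scratch_dir)` for each document.

    Each document gets its own scratch directory, which is removed
    afterwards whether or not processing succeeded.
    Returns the list of documents that raised OCRError.
    """
    failed = []
    for doc in documents:
        # mkdtemp creates a uniquely named directory atomically, so two
        # concurrent runs cannot end up sharing the same path.
        scratch = tempfile.mkdtemp(prefix="paperless-", dir=scratch_root)
        try:
            process(doc, scratch)
        except OCRError:
            failed.append(doc)
        finally:
            # Runs on success and on failure, including for the last document.
            shutil.rmtree(scratch, ignore_errors=True)
    return failed
```

Placing the cleanup in `finally` is what guarantees that a failure on the final document does not leave scratch files behind.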
Daniel Quinn
cebc44f2c9 API is halfway there 2016-02-16 09:28:34 +00:00
Daniel Quinn
bbe7a02b4d Added a screenshot and cleaned things up a bit. 2016-02-16 09:22:51 +00:00
Daniel Quinn
5de4951a46 Added a screenshot, now I have to figure out how to put it in the readme. 2016-02-16 09:08:35 +00:00
Daniel Quinn
8a5d4b1cc8 Merge branch 'master' of github.com:danielquinn/paperless 2016-02-15 22:38:25 +00:00
Daniel Quinn
2f0da8ab25 Added download_url to the Document model 2016-02-15 22:38:18 +00:00
Daniel Quinn
a256d5ee2f Merge pull request #37 from jat255/DOCFIX_documentation_badge
Make docs badge in readme redirect to documentation, not image
2016-02-15 16:59:30 +00:00
Joshua Taillon
d2757707b3 Make docs badge in readme redirect to documentation, not image 2016-02-15 11:58:07 -05:00
Daniel Quinn
9a437dc9f6 Merge pull request #35 from pitkley/fix/matching-logic
Fix matching if user supplied an empty value
2016-02-14 19:21:50 +00:00
Pit Kleyersburg
7b227ffa2f Fix matching if user supplied an empty value 2016-02-14 19:47:05 +01:00
Daniel Quinn
aea4af5d3b Version bump and feature update 2016-02-14 17:18:28 +00:00
Daniel Quinn
a0f4f6c5f2 Fixed merge conflict and did some pep8 2016-02-14 17:13:48 +00:00
Daniel Quinn
4689e2b975 Merge pull request #32 from pitkley/feature/single-page-langdetect
Detect language only on first page of PDF
2016-02-14 16:56:30 +00:00
Pit Kleyersburg
aeab9a0e81 Detect language only on one page of PDF
Currently, the entire document is processed to detect the language. If
a language other than the default is detected, the entire document is
processed again for the new language.

This PR analyzes the middle page for its language and either processes
the remaining pages with the default language if it didn't differ, or
processes all pages for the newly guessed language.

The number of processed pages comes down from a worst case of `2n` to a
worst case of `n+1`.
2016-02-14 17:55:13 +01:00
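The page counts described in the commit above can be checked with a small sketch. This is an illustration under assumptions, not the project's code: `ocr_page` and `guess_language` are hypothetical callables supplied by the caller, and `"eng"` is an assumed default language code.

```python
def ocr_document(pages, ocr_page, guess_language, default_lang="eng"):
    """OCR a list of pages, detecting the language from the middle page.

    `ocr_page(page, lang)` returns the text of one page and
    `guess_language(text)` returns a language code (both hypothetical).
    """
    if not pages:
        return ""

    middle = len(pages) // 2
    # One OCR pass over a single page is enough to guess the language.
    sample = ocr_page(pages[middle], default_lang)
    lang = guess_language(sample) or default_lang

    texts = []
    for index, page in enumerate(pages):
        if index == middle and lang == default_lang:
            # The sample was already produced with the right language.
            texts.append(sample)
        else:
            texts.append(ocr_page(page, lang))
    return "\n".join(texts)
```

With `n` pages this makes `n` calls to `ocr_page` when the guess matches the default and `n+1` calls when it does not, since only the middle page is processed twice.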
Daniel Quinn
7843ea5037 Added and implemented a rudimentary logger 2016-02-14 16:09:52 +00:00
Daniel Quinn
9162e41507 Merge pull request #33 from pitkley/fix/parallelism
Ensure `OCR_THREADS` is integer, add documentation
2016-02-14 15:40:20 +00:00
Pit Kleyersburg
20b2408dbb Ensure OCR_THREADS is integer, add documentation 2016-02-14 16:37:38 +01:00
Daniel Quinn
88acf50fe0 Merge pull request #31 from pitkley/feature/paralellism
This is great.  It seriously sped up the OCR time.
2016-02-14 15:29:05 +00:00
Pit Kleyersburg
f5beda9c56 Enable parallel OCR processing
At the moment, every page in a PDF is processed one by one using
tesseract. Since the processing of a single page is independent of every
other page, the work can be spread across the cores of a multi-core
machine.

This PR introduces a multiprocessing pool to process multiple pages
simultaneously. The number of threads to use can be specified in the
environment variable `PAPERLESS_OCR_THREADS`. This will default to the
number of cores/hyperthreads Python detects for your system.
2016-02-14 15:57:42 +01:00
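A rough sketch of the approach described in the commit above, assuming Python's standard `multiprocessing.Pool`. Only the environment variable name `PAPERLESS_OCR_THREADS` comes from the commit message; the function names are hypothetical, and `ocr_page` is a placeholder that does not call tesseract.

```python
import multiprocessing
import os


def ocr_threads():
    """Return the pool size: the env var if set, else the detected core count."""
    raw = os.getenv("PAPERLESS_OCR_THREADS")
    # Environment variables are strings, so convert explicitly to int.
    return int(raw) if raw else multiprocessing.cpu_count()


def ocr_page(page):
    """Placeholder for the per-page OCR step (a real one would run tesseract)."""
    return page.upper()


def ocr_pages(pages):
    """Process pages in parallel; results come back in page order."""
    with multiprocessing.Pool(ocr_threads()) as pool:
        return pool.map(ocr_page, pages)


if __name__ == "__main__":
    print(ocr_pages(["page one", "page two"]))
```

`Pool.map` preserves input order, so the page texts can be joined afterwards without re-sorting. Note that the pool starts worker processes rather than threads, despite the variable's name.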
Daniel Quinn
6b0a537bff Added support for a shared secret in email 2016-02-14 03:01:24 +00:00
Daniel Quinn
3b5d4cdd39 Added some error handling 2016-02-14 01:32:25 +00:00
Daniel Quinn
fc5d89c6fc Added a default algorithm 2016-02-14 01:30:41 +00:00
Daniel Quinn
d9b7851de9 Added a default algorithm 2016-02-14 01:30:18 +00:00
Daniel Quinn
cec9968cdb Documented consumption 2016-02-14 00:10:49 +00:00
Daniel Quinn
330dfa544b Fixed a typo in the description. There's no need for a new migration here. 2016-02-14 00:10:37 +00:00
Daniel Quinn
294f104474 Merge branch 'master' into feature/images-as-docs 2016-02-13 01:01:10 +00:00
Daniel Quinn
68fa7d68fa Merge branch 'master' of github.com:danielquinn/paperless 2016-02-13 00:59:36 +00:00
Daniel Quinn
2ed2d641b5 Added a note about the plight of Apple users. 2016-02-13 00:59:19 +00:00
Daniel Quinn
a846b3f7b8 Adding some more debugging 2016-02-13 00:57:05 +00:00
Daniel Quinn
b7859a0ff3 Merge pull request #26 from wttw/master
Document cloning from public URL rather than ssh
2016-02-12 20:30:07 +00:00
Steve Atkins
a4903049a3 Document cloning from public URL rather than ssh 2016-02-12 11:36:07 -08:00
Daniel Quinn
9ed8a2b2d7 Merge branch 'master' into feature/images-as-docs 2016-02-12 09:03:46 +00:00
Daniel Quinn
1d4b87ee46 Update for #22 2016-02-12 08:54:04 +00:00
Daniel Quinn
840472071c Added the required verbosity reference 2016-02-12 08:27:28 +00:00
Daniel Quinn
2421f559be Simpler regex 2016-02-12 08:27:09 +00:00
Daniel Quinn
a022fcb8f1 Fixed the auto-naming regexes 2016-02-11 22:05:55 +00:00
Daniel Quinn
7aadab23cc Added the Renderable mixin because DRY 2016-02-11 22:05:38 +00:00
Daniel Quinn
ef1639208c Tests for the consumer 2016-02-11 12:25:23 +00:00
Daniel Quinn
cef4abc01d version bump 2016-02-11 12:25:12 +00:00
Daniel Quinn
78ee138ad7 Added migration and changelog updates 2016-02-11 12:25:00 +00:00
Daniel Quinn
c423a13f85 Added a simple re-tagger 2016-02-11 12:24:18 +00:00
Daniel Quinn
39134b517e Cleaned up file_name() 2016-02-10 23:53:48 +00:00
Daniel Quinn
a892abc701 Added dateutil 2016-02-10 23:50:58 +00:00
Daniel Quinn
4a078dcfbc Merge branch 'master' into feature/images-as-docs 2016-02-09 17:20:45 +00:00
Daniel Quinn
642b2f7ee3 Merge pull request #18 from mrwacky42/master
Add other prerequisites for Vagrant
2016-02-09 09:41:53 +00:00
Sharif Nassar
6115b2f03d Add other prerequisites
Vagrant setup didn't work for me unless I manually installed tesseract and ImageMagick.
2016-02-09 01:07:48 -08:00