Daniel Quinn
a256d5ee2f
Merge pull request #37 from jat255/DOCFIX_documentation_badge
Make docs badge in readme redirect to documentation, not image
2016-02-15 16:59:30 +00:00
Joshua Taillon
d2757707b3
Make docs badge in readme redirect to documentation, not image
2016-02-15 11:58:07 -05:00
Daniel Quinn
9a437dc9f6
Merge pull request #35 from pitkley/fix/matching-logic
Fix matching if user supplied an empty value
2016-02-14 19:21:50 +00:00
Pit Kleyersburg
7b227ffa2f
Fix matching if user supplied an empty value
2016-02-14 19:47:05 +01:00
Daniel Quinn
aea4af5d3b
Version bump and feature update
2016-02-14 17:18:28 +00:00
Daniel Quinn
a0f4f6c5f2
Fixed merge conflict and did some pep8
2016-02-14 17:13:48 +00:00
Daniel Quinn
4689e2b975
Merge pull request #32 from pitkley/feature/single-page-langdetect
Detect language only on first page of PDF
2016-02-14 16:56:30 +00:00
Pit Kleyersburg
aeab9a0e81
Detect language only on one page of PDF
Currently, the entire document is processed to detect its language. If
the detected language differs from the default one, the entire document
is processed again with the new language.
This PR detects the language from the middle page only. If it matches
the default, the remaining pages are processed with the default
language; otherwise, all pages are processed with the guessed language.
The number of processed pages drops from `2n` to `n+1` in the worst
case.
2016-02-14 17:55:13 +01:00
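The single-page detection strategy described in the commit above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code from the PR: pages are identified by strings (for example image paths), and `ocr_with_language_guess`, `ocr_page` and `detect_language` are hypothetical names standing in for the tesseract call and the language detector. The middle page is OCRed once with the default language, so the total is `n` OCR calls when the guess matches and `n+1` when it does not.

```python
from typing import Callable, List, Sequence

# Hypothetical signatures: ocr_page(page, lang) -> text, detect_language(text) -> lang code.
OcrFn = Callable[[str, str], str]
DetectFn = Callable[[str], str]


def ocr_with_language_guess(
    pages: Sequence[str],
    default_lang: str,
    ocr_page: OcrFn,
    detect_language: DetectFn,
) -> List[str]:
    """OCR every page, guessing the document language from the middle page only."""
    if not pages:
        return []
    middle = len(pages) // 2
    # One OCR pass on the middle page with the default language.
    sample = ocr_page(pages[middle], default_lang)
    guessed = detect_language(sample)
    if guessed == default_lang:
        # Guess matches: reuse the sample and OCR the other n-1 pages (n calls total).
        return [
            sample if i == middle else ocr_page(page, default_lang)
            for i, page in enumerate(pages)
        ]
    # Guess differs: OCR all n pages with the guessed language (n+1 calls total).
    return [ocr_page(page, guessed) for page in pages]
```

Reusing the middle-page sample in the matching case is what keeps the best case at exactly `n` OCR calls.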
Daniel Quinn
7843ea5037
Added and implemented a rudimentary logger
2016-02-14 16:09:52 +00:00
Daniel Quinn
9162e41507
Merge pull request #33 from pitkley/fix/parallelism
Ensure `OCR_THREADS` is integer, add documentation
2016-02-14 15:40:20 +00:00
Pit Kleyersburg
20b2408dbb
Ensure `OCR_THREADS` is integer, add documentation
2016-02-14 16:37:38 +01:00
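A minimal sketch of reading the thread count as an integer, assuming (as the parallelism commit further down describes) that the value comes from the `PAPERLESS_OCR_THREADS` environment variable. The helper name `ocr_threads` is hypothetical. Environment variables are always strings, so the value has to be converted before it is passed to a pool; returning `None` when the variable is unset or empty lets `multiprocessing.Pool` fall back to `os.cpu_count()`.

```python
import os
from typing import Mapping, Optional


def ocr_threads(environ: Mapping[str, str] = os.environ) -> Optional[int]:
    """Read PAPERLESS_OCR_THREADS as an int, or None to use the CPU count."""
    raw = environ.get("PAPERLESS_OCR_THREADS")
    if raw is None or raw.strip() == "":
        # Unset or empty: Pool(processes=None) will use os.cpu_count().
        return None
    # int() raises ValueError for non-numeric input instead of silently passing a str.
    threads = int(raw)
    if threads < 1:
        raise ValueError("PAPERLESS_OCR_THREADS must be a positive integer")
    return threads
```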
Daniel Quinn
88acf50fe0
Merge pull request #31 from pitkley/feature/paralellism
This is great. It seriously sped up the OCR time.
2016-02-14 15:29:05 +00:00
Pit Kleyersburg
f5beda9c56
Enable parallel OCR processing
At the moment, every page in a PDF is processed one at a time by
tesseract. Since each page can be processed independently of the others,
we can take advantage of multi-core machines.
This PR introduces a multiprocessing pool to process multiple pages
simultaneously. The number of threads to use can be specified in the
environment variable `PAPERLESS_OCR_THREADS`. It defaults to the
number of cores/hyperthreads Python detects on your system.
2016-02-14 15:57:42 +01:00
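The parallel OCR idea above can be sketched with `multiprocessing.Pool.map`. This is an illustration under assumptions rather than the PR's code: `ocr_page` is a hypothetical stand-in that upper-cases a string instead of running tesseract, and `ocr_pages_parallel` is likewise a made-up name. Passing `processes=None` makes the pool start `os.cpu_count()` workers, matching the default described in the commit. Note that the pool runs worker processes rather than threads, despite the setting's name.

```python
from multiprocessing import Pool
from typing import List, Optional, Sequence


def ocr_page(page: str) -> str:
    # Hypothetical stand-in for running tesseract on one page image.
    return page.upper()


def ocr_pages_parallel(pages: Sequence[str], threads: Optional[int] = None) -> List[str]:
    # processes=None makes Pool start os.cpu_count() worker processes.
    with Pool(processes=threads) as pool:
        # map() blocks until all pages are done and keeps the input order.
        return pool.map(ocr_page, pages)


if __name__ == "__main__":
    print(ocr_pages_parallel(["page one", "page two"], threads=2))
    # prints ['PAGE ONE', 'PAGE TWO']
```

Because `Pool.map` returns results in input order, the per-page texts can be joined directly into the document text without re-sorting.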
Daniel Quinn
6b0a537bff
Added support for a shared secret in email
2016-02-14 03:01:24 +00:00
Daniel Quinn
3b5d4cdd39
Added some error handling
2016-02-14 01:32:25 +00:00
Daniel Quinn
fc5d89c6fc
Added a default algorithm
2016-02-14 01:30:41 +00:00
Daniel Quinn
d9b7851de9
Added a default algorithm
2016-02-14 01:30:18 +00:00
Daniel Quinn
cec9968cdb
Documented consumption
2016-02-14 00:10:49 +00:00
Daniel Quinn
330dfa544b
Fixed a typo in the description. There's no need for a new migration here.
2016-02-14 00:10:37 +00:00
Daniel Quinn
294f104474
Merge branch 'master' into feature/images-as-docs
2016-02-13 01:01:10 +00:00
Daniel Quinn
68fa7d68fa
Merge branch 'master' of github.com:danielquinn/paperless
2016-02-13 00:59:36 +00:00
Daniel Quinn
2ed2d641b5
Added a note about the plight of Apple users.
2016-02-13 00:59:19 +00:00
Daniel Quinn
a846b3f7b8
Adding some more debugging
2016-02-13 00:57:05 +00:00
Daniel Quinn
b7859a0ff3
Merge pull request #26 from wttw/master
Document cloning from public URL rather than ssh
2016-02-12 20:30:07 +00:00
Steve Atkins
a4903049a3
Document cloning from public URL rather than ssh
2016-02-12 11:36:07 -08:00
Daniel Quinn
9ed8a2b2d7
Merge branch 'master' into feature/images-as-docs
2016-02-12 09:03:46 +00:00
Daniel Quinn
1d4b87ee46
Update for #22
2016-02-12 08:54:04 +00:00
Daniel Quinn
840472071c
Added the required verbosity reference
2016-02-12 08:27:28 +00:00
Daniel Quinn
2421f559be
Simpler regex
2016-02-12 08:27:09 +00:00
Daniel Quinn
a022fcb8f1
Fixed the auto-naming regexes
2016-02-11 22:05:55 +00:00
Daniel Quinn
7aadab23cc
Added the Renderable mixin because DRY
2016-02-11 22:05:38 +00:00
Daniel Quinn
ef1639208c
Tests for the consumer
2016-02-11 12:25:23 +00:00
Daniel Quinn
cef4abc01d
version bump
2016-02-11 12:25:12 +00:00
Daniel Quinn
78ee138ad7
Added migration and changelog updates
2016-02-11 12:25:00 +00:00
Daniel Quinn
c423a13f85
Added a simple re-tagger
2016-02-11 12:24:18 +00:00
Daniel Quinn
39134b517e
Cleaned up file_name()
2016-02-10 23:53:48 +00:00
Daniel Quinn
a892abc701
Added dateutil
2016-02-10 23:50:58 +00:00
Daniel Quinn
4a078dcfbc
Merge branch 'master' into feature/images-as-docs
2016-02-09 17:20:45 +00:00
Daniel Quinn
642b2f7ee3
Merge pull request #18 from mrwacky42/master
Add other prerequisites for Vagrant
2016-02-09 09:41:53 +00:00
Sharif Nassar
6115b2f03d
Add other prerequisites
Vagrant setup didn't work for me unless I manually installed tesseract and ImageMagick.
2016-02-09 01:07:48 -08:00
Daniel Quinn
0eaed36420
The 'API' is written but untested
2016-02-08 23:46:16 +00:00
Daniel Quinn
212752f46e
Fixed the tags to be optional
2016-02-08 17:28:59 +00:00
Daniel Quinn
0c729e5675
Changed the name, forgot to change the check.
Closes #17
2016-02-08 11:14:57 +00:00
Daniel Quinn
e5e4ee0350
Added file magic
2016-02-08 11:12:14 +00:00
Daniel Quinn
c4311af263
Cleaned up the tests
2016-02-06 17:41:11 +00:00
Daniel Quinn
febb45af81
Prettied up the interface a little
2016-02-06 17:27:17 +00:00
Daniel Quinn
ce69e37256
Linked tag labels
2016-02-06 17:14:44 +00:00
Daniel Quinn
48761911b3
Image imports and consumption by mail work
2016-02-06 17:05:36 +00:00
Daniel Quinn
71075a691a
The mailconsumer isn't a consumer at all. Best fix that
2016-02-05 20:15:08 +00:00
Daniel Quinn
d8ad6b589b
Added pytest and broke up the consumer into file and mail
2016-02-05 00:23:36 +00:00