Daniel Quinn 
							
						 
					 
					
						
						
							
						
						52f242574f 
					 
					
						
						
							
							Merge branch 'pitkley-fix/secure-temporary-files'  
						
						
						
						
					 
					
						2016-02-17 00:10:54 +00:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						6f95b05287 
					 
					
						
						
							
							Support appropriate sorting for long documents  
						
						
						
						
					 
					
						2016-02-17 00:10:05 +00:00 
						 
				 
			
				
					
						
							
							
								Pit Kleyersburg 
							
						 
					 
					
						
						
							
						
						46f8f492f5 
					 
					
						
						
							
							Safely and non-randomly create scratch directory  
						
						
						
						
						Creating the scratch files in `_get_grayscale` using a random integer is
inherently unsafe, because it can cause a collision. It should also be
unnecessary, given that the files are cleaned up after the OCR run.
Since we don't know whether OCR runs might become parallel in the future,
this commit implements thread-safe and deterministic directory creation.
Additionally, it fixes the call to `_cleanup` by `consume`: in the
current implementation, `_cleanup` is not called if the last
consumed document failed with an `OCRError`.
						
						
					 
					
						2016-02-16 12:15:57 +01:00 
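A minimal sketch of the idea described in the commit above, not the commit's actual code. It assumes `tempfile.mkdtemp` as the collision-free way to create the scratch directory (the commit's own mechanism is not shown here) and uses `try`/`finally` so that cleanup also runs when the last document fails. The names `make_scratch_dir`, the `ocr` callback, and the signature of `consume` are hypothetical; only `OCRError` and `consume` appear in the commit message.

```python
import os
import shutil
import tempfile


class OCRError(Exception):
    """Stand-in for the OCRError named in the commit message."""


def make_scratch_dir(base_dir):
    # mkdtemp creates the directory atomically with a unique name, so two
    # concurrent callers can never end up sharing one scratch directory.
    os.makedirs(base_dir, exist_ok=True)
    return tempfile.mkdtemp(prefix="paperless-", dir=base_dir)


def consume(documents, ocr, base_dir):
    """Run the (hypothetical) ocr callback on each document, always cleaning up."""
    results = []
    for doc in documents:
        scratch = make_scratch_dir(base_dir)
        try:
            results.append(ocr(doc, scratch))
        except OCRError:
            # Skip the failed document; the finally clause still runs.
            continue
        finally:
            # Runs on success and on failure, including for the last document.
            shutil.rmtree(scratch, ignore_errors=True)
    return results
```

Because the cleanup sits in `finally`, an `OCRError` on the final document no longer leaves scratch files behind.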
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						cebc44f2c9 
					 
					
						
						
							
							API is halfway there  
						
						
						
						
					 
					
						2016-02-16 09:28:34 +00:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						bbe7a02b4d 
					 
					
						
						
							
							Added a screenshot and cleaned things up a bit.  
						
						
						
						
					 
					
						2016-02-16 09:22:51 +00:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						5de4951a46 
					 
					
						
						
							
							Added a screenshot, now I have to figure out how to put it in the readme.  
						
						
						
						
					 
					
						2016-02-16 09:08:35 +00:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						8a5d4b1cc8 
					 
					
						
						
							
							Merge branch 'master' of github.com:danielquinn/paperless  
						
						
						
						
					 
					
						2016-02-15 22:38:25 +00:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						2f0da8ab25 
					 
					
						
						
							
							Added download_url to the Document model  
						
						
						
						
					 
					
						2016-02-15 22:38:18 +00:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						a256d5ee2f 
					 
					
						
						
							
							Merge pull request  #37  from jat255/DOCFIX_documentation_badge  
						
						
						
						
						Make docs badge in readme redirect to documentation, not image 
						
						
					 
					
						2016-02-15 16:59:30 +00:00 
						 
				 
			
				
					
						
							
							
								Joshua Taillon 
							
						 
					 
					
						
						
							
						
						d2757707b3 
					 
					
						
						
							
							Make docs badge in readme redirect to documentation, not image  
						
						
						
						
					 
					
						2016-02-15 11:58:07 -05:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						9a437dc9f6 
					 
					
						
						
							
							Merge pull request  #35  from pitkley/fix/matching-logic  
						
						
						
						
						Fix matching if user supplied an empty value 
						
						
					 
					
						2016-02-14 19:21:50 +00:00 
						 
				 
			
				
					
						
							
							
								Pit Kleyersburg 
							
						 
					 
					
						
						
							
						
						7b227ffa2f 
					 
					
						
						
							
							Fix matching if user supplied an empty value  
						
						
						
						
					 
					
						2016-02-14 19:47:05 +01:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						aea4af5d3b 
					 
					
						
						
							
							Version bump and feature update  
						
						
						
						
					 
					
						2016-02-14 17:18:28 +00:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						a0f4f6c5f2 
					 
					
						
						
							
							Fixed merge conflict and did some pep8  
						
						
						
						
					 
					
						2016-02-14 17:13:48 +00:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						4689e2b975 
					 
					
						
						
							
							Merge pull request  #32  from pitkley/feature/single-page-langdetect  
						
						
						
						
						Detect language only on first page of PDF 
						
						
					 
					
						2016-02-14 16:56:30 +00:00 
						 
				 
			
				
					
						
							
							
								Pit Kleyersburg 
							
						 
					 
					
						
						
							
						
						aeab9a0e81 
					 
					
						
						
							
							Detect language only on one page of PDF  
						
						
						
						
						Currently, detecting the language requires processing the entire document.
If the detected language differs from the default one, the entire
document is processed again for the new language.
This PR analyzes the middle page for its language and then either processes
the remaining pages with the default language if it didn't differ, or
processes all pages with the newly guessed language.
The number of processed pages drops from a worst case of `2n` to a
worst case of `n+1`.
						
						
					 
					
						2016-02-14 17:55:13 +01:00 
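The page-count argument in the commit above can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: `ocr_document`, the `ocr_page(page, lang)` callback, the `guess_language(text)` callback, and the default language code `"eng"` are all hypothetical names.

```python
def ocr_document(pages, ocr_page, guess_language, default_lang="eng"):
    """OCR all pages, detecting the language from the middle page only."""
    if not pages:
        return []

    # One OCR pass over a single page, using the default language.
    middle = len(pages) // 2
    sample = ocr_page(pages[middle], default_lang)
    guessed = guess_language(sample) or default_lang

    if guessed == default_lang:
        # Language matches: reuse the sample and OCR only the other pages,
        # for n page passes in total.
        return [
            sample if index == middle else ocr_page(page, default_lang)
            for index, page in enumerate(pages)
        ]

    # Language differs: redo every page with the guessed language,
    # for n + 1 page passes in total (the sample pass plus n).
    return [ocr_page(page, guessed) for page in pages]
```

With three pages, the matching case calls `ocr_page` three times and the differing case four times, which is the `n` versus `n+1` behaviour the message describes.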
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						7843ea5037 
					 
					
						
						
							
							Added and implemented a rudimentary logger  
						
						
						
						
					 
					
						2016-02-14 16:09:52 +00:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						9162e41507 
					 
					
						
						
							
							Merge pull request  #33  from pitkley/fix/parallelism  
						
						
						
						
						Ensure `OCR_THREADS` is integer, add documentation 
						
						
					 
					
						2016-02-14 15:40:20 +00:00 
						 
				 
			
				
					
						
							
							
								Pit Kleyersburg 
							
						 
					 
					
						
						
							
						
						20b2408dbb 
					 
					
						
						
							
							Ensure OCR_THREADS is integer, add documentation  
						
						
						
						
					 
					
						2016-02-14 16:37:38 +01:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						88acf50fe0 
					 
					
						
						
							
							Merge pull request  #31  from pitkley/feature/paralellism  
						
						
						
						
						This is great.  It seriously sped up the OCR time. 
						
						
					 
					
						2016-02-14 15:29:05 +00:00 
						 
				 
			
				
					
						
							
							
								Pit Kleyersburg 
							
						 
					 
					
						
						
							
						
						f5beda9c56 
					 
					
						
						
							
							Enable parallel OCR processing  
						
						
						
						
						At the moment, every page in a PDF is processed one by one using
tesseract. Since processing a single page is independent of every
other page, multi-core machines can work on several pages at once.
This PR introduces a multiprocessing pool to process multiple pages
simultaneously. The number of threads to use can be specified in the
environment variable `PAPERLESS_OCR_THREADS`. It defaults to the
number of cores/hyperthreads Python detects for your system.
						
						
					 
					
						2016-02-14 15:57:42 +01:00 
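A rough sketch of the approach described in the commit above, assuming the standard library's `multiprocessing.Pool`. Only the environment variable `PAPERLESS_OCR_THREADS` comes from the commit message; `ocr_thread_count`, `ocr_page`, and `ocr_document` are hypothetical names, and the real per-page work (invoking tesseract) is replaced by a placeholder.

```python
import multiprocessing
import os


def ocr_thread_count(environ=os.environ):
    """Read PAPERLESS_OCR_THREADS as an integer, defaulting to the CPU count."""
    raw = environ.get("PAPERLESS_OCR_THREADS")
    if raw is None:
        return multiprocessing.cpu_count()
    # Environment variables are always strings; Pool needs an int.
    return int(raw)


def ocr_page(page_path):
    # Placeholder for the per-page OCR step. It must be a module-level
    # function so the pool can pickle it and send it to worker processes.
    return "text of " + page_path


def ocr_document(page_paths):
    """Process all pages in a pool of workers; results keep page order."""
    with multiprocessing.Pool(ocr_thread_count()) as pool:
        return pool.map(ocr_page, page_paths)
```

When run as a script, the call to `ocr_document` should sit under `if __name__ == "__main__":`, since some platforms start worker processes by re-importing the main module. The explicit `int(...)` conversion mirrors the later fix "Ensure OCR_THREADS is integer".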
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						6b0a537bff 
					 
					
						
						
							
							Added support for a shared secret in email  
						
						
						
						
					 
					
						2016-02-14 03:01:24 +00:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						3b5d4cdd39 
					 
					
						
						
							
							Added some error handling  
						
						
						
						
					 
					
						2016-02-14 01:32:25 +00:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						fc5d89c6fc 
					 
					
						
						
							
							Added a default algorithm  
						
						
						
						
					 
					
						2016-02-14 01:30:41 +00:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						d9b7851de9 
					 
					
						
						
							
							Added a default algorithm  
						
						
						
						
					 
					
						2016-02-14 01:30:18 +00:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						cec9968cdb 
					 
					
						
						
							
							Documented consumption  
						
						
						
						
					 
					
						2016-02-14 00:10:49 +00:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						330dfa544b 
					 
					
						
						
							
							Fixed a typo in the description. There's no need for a new migration here.  
						
						
						
						
					 
					
						2016-02-14 00:10:37 +00:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						294f104474 
					 
					
						
						
							
							Merge branch 'master' into feature/images-as-docs  
						
						
						
						
					 
					
						2016-02-13 01:01:10 +00:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						68fa7d68fa 
					 
					
						
						
							
							Merge branch 'master' of github.com:danielquinn/paperless  
						
						
						
						
					 
					
						2016-02-13 00:59:36 +00:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						2ed2d641b5 
					 
					
						
						
							
							Added a note about the plight of Apple users.  
						
						
						
						
					 
					
						2016-02-13 00:59:19 +00:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						a846b3f7b8 
					 
					
						
						
							
							Adding some more debugging  
						
						
						
						
					 
					
						2016-02-13 00:57:05 +00:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						b7859a0ff3 
					 
					
						
						
							
							Merge pull request  #26  from wttw/master  
						
						
						
						
						Document cloning from public URL rather than ssh 
						
						
					 
					
						2016-02-12 20:30:07 +00:00 
						 
				 
			
				
					
						
							
							
								Steve Atkins 
							
						 
					 
					
						
						
							
						
						a4903049a3 
					 
					
						
						
							
							Document cloning from public URL rather than ssh  
						
						
						
						
					 
					
						2016-02-12 11:36:07 -08:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						9ed8a2b2d7 
					 
					
						
						
							
							Merge branch 'master' into feature/images-as-docs  
						
						
						
						
					 
					
						2016-02-12 09:03:46 +00:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						1d4b87ee46 
					 
					
						
						
							
							Update for  #22  
						
						
						
						
					 
					
						2016-02-12 08:54:04 +00:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						840472071c 
					 
					
						
						
							
							Added the required verbosity reference  
						
						
						
						
					 
					
						2016-02-12 08:27:28 +00:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						2421f559be 
					 
					
						
						
							
							Simpler regex  
						
						
						
						
					 
					
						2016-02-12 08:27:09 +00:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						a022fcb8f1 
					 
					
						
						
							
							Fixed the auto-naming regexes  
						
						
						
						
					 
					
						2016-02-11 22:05:55 +00:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						7aadab23cc 
					 
					
						
						
							
							Added the Renderable mixin because DRY  
						
						
						
						
					 
					
						2016-02-11 22:05:38 +00:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						ef1639208c 
					 
					
						
						
							
							Tests for the consumer  
						
						
						
						
					 
					
						2016-02-11 12:25:23 +00:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						cef4abc01d 
					 
					
						
						
							
							version bump  
						
						
						
						
					 
					
						2016-02-11 12:25:12 +00:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						78ee138ad7 
					 
					
						
						
							
							Added migration and changelog updates  
						
						
						
						
					 
					
						2016-02-11 12:25:00 +00:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						c423a13f85 
					 
					
						
						
							
							Added a simple re-tagger  
						
						
						
						
					 
					
						2016-02-11 12:24:18 +00:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						39134b517e 
					 
					
						
						
							
							Cleaned up file_name()  
						
						
						
						
					 
					
						2016-02-10 23:53:48 +00:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						a892abc701 
					 
					
						
						
							
							Added dateutil  
						
						
						
						
					 
					
						2016-02-10 23:50:58 +00:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						4a078dcfbc 
					 
					
						
						
							
							Merge branch 'master' into feature/images-as-docs  
						
						
						
						
					 
					
						2016-02-09 17:20:45 +00:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						642b2f7ee3 
					 
					
						
						
							
							Merge pull request  #18  from mrwacky42/master  
						
						
						
						
						Add other prerequisites for Vagrant 
						
						
					 
					
						2016-02-09 09:41:53 +00:00 
						 
				 
			
				
					
						
							
							
								Sharif Nassar 
							
						 
					 
					
						
						
							
						
						6115b2f03d 
					 
					
						
						
							
							Add other prerequisites  
						
						
						
						
						Vagrant setup didn't work for me unless I manually installed tesseract and ImageMagick. 
						
						
					 
					
						2016-02-09 01:07:48 -08:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						0eaed36420 
					 
					
						
						
							
							The 'API' is written but untested  
						
						
						
						
					 
					
						2016-02-08 23:46:16 +00:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						212752f46e 
					 
					
						
						
							
							Fixed the tags to be optional  
						
						
						
						
					 
					
						2016-02-08 17:28:59 +00:00