jonaswinkler 
							
						 
					 
					
						
						
							
						
						12fa844c7f 
					 
					
						
						
							
							testing the new noarchive option.  
						
						
						
						
					 
					
						2020-12-01 14:30:13 +01:00 
						 
				 
			
				
					
						
							
							
								jonaswinkler 
							
						 
					 
					
						
						
							
						
						fd3df1ec58 
					 
					
						
						
							
							some more tests.  
						
						
						
						
					 
					
						2020-12-01 14:15:43 +01:00 
						 
				 
			
				
					
						
							
							
								jonaswinkler 
							
						 
					 
					
						
						
							
						
						aaa6599283 
					 
					
						
						
							
							Merge branch 'dev' into feature-ocrmypdf  
						
						
						
						
					 
					
						2020-11-30 16:48:09 +01:00 
						 
				 
			
				
					
						
							
							
								jonaswinkler 
							
						 
					 
					
						
						
							
						
						f51207fc32 
					 
					
						
						
							
							added file type checks to the parsers to prevent temporary files from being consumed. Also: parsers announce file types they wish to use as default for each mime type.  
						
						
						
						
					 
					
						2020-11-30 00:40:04 +01:00 
						 
				 
			
				
					
						
							
							
								jonaswinkler 
							
						 
					 
					
						
						
							
						
						ac1b701000 
					 
					
						
						
							
							more tests!  
						
						
						
						
					 
					
						2020-11-29 19:58:48 +01:00 
						 
				 
			
				
					
						
							
							
								jonaswinkler 
							
						 
					 
					
						
						
							
						
						fca98b411e 
					 
					
						
						
							
							reorganised settings documentation and added OCR_USER_ARGS  
						
						
						
						
					 
					
						2020-11-29 12:38:32 +01:00 
						 
				 
			
				
					
						
							
							
								jonaswinkler 
							
						 
					 
					
						
						
							
						
						0565118a01 
					 
					
						
						
							
							fixed checking the installed languages.  
						
						
						
						
					 
					
						2020-11-29 12:31:42 +01:00 
						 
				 
			
				
					
						
							
							
								jonaswinkler 
							
						 
					 
					
						
						
							
						
						06cfc3113a 
					 
					
						
						
							
							test case fixes.  
						
						
						
						
					 
					
						2020-11-27 14:06:37 +01:00 
						 
				 
			
				
					
						
							
							
								Jonas Winkler 
							
						 
					 
					
						
						
							
						
						e87575240d 
					 
					
						
						
							
							more tests of the new parser  
						
						
						
						
					 
					
						2020-11-26 00:08:23 +01:00 
						 
				 
			
				
					
						
							
							
								Jonas Winkler 
							
						 
					 
					
						
						
							
						
						f51d2be303 
					 
					
						
						
							
							fixed the test cases  
						
						
						
						
					 
					
						2020-11-25 19:51:09 +01:00 
						 
				 
			
				
					
						
							
							
								Jonas Winkler 
							
						 
					 
					
						
						
							
						
						a60a4babf6 
					 
					
						
						
							
							OMP_THREAD_LIMIT  
						
						
						
						
					 
					
						2020-11-25 19:37:59 +01:00 
						 
				 
			
				
					
						
							
							
								Jonas Winkler 
							
						 
					 
					
						
						
							
						
						a03315102a 
					 
					
						
						
							
							added image DPI detection to the tesseract parser.  
						
						
						
						
					 
					
						2020-11-25 19:37:48 +01:00 
						 
				 
			
				
					
						
							
							
								Jonas Winkler 
							
						 
					 
					
						
						
							
						
						df801d17e1 
					 
					
						
						
							
							reworked the interface of the parsers.  
						
						
						
						
					 
					
						2020-11-25 19:36:39 +01:00 
						 
				 
			
				
					
						
							
							
								Jonas Winkler 
							
						 
					 
					
						
						
							
						
						b269af7572 
					 
					
						
						
							
							Merge branch 'dev' into feature-ocrmypdf  
						
						
						
						
					 
					
						2020-11-25 16:58:20 +01:00 
						 
				 
			
				
					
						
							
							
								Jonas Winkler 
							
						 
					 
					
						
						
							
						
						d92214d412 
					 
					
						
						
							
							codestyle  
						
						
						
						
					 
					
						2020-11-25 16:05:52 +01:00 
						 
				 
			
				
					
						
							
							
								Jonas Winkler 
							
						 
					 
					
						
						
							
						
						56ce267f89 
					 
					
						
						
							
							removed obsolete tests.  
						
						
						
						
					 
					
						2020-11-25 14:51:32 +01:00 
						 
				 
			
				
					
						
							
							
								Jonas Winkler 
							
						 
					 
					
						
						
							
						
						2d559d330d 
					 
					
						
						
							
							reworked PDF parser that uses OCRmyPDF and produces archive files.  
						
						
						
						
					 
					
						2020-11-25 14:50:43 +01:00 
						 
				 
			
				
					
						
							
							
								Jonas Winkler 
							
						 
					 
					
						
						
							
						
						dd83364326 
					 
					
						
						
							
							default language check  
						
						
						
						
					 
					
						2020-11-25 10:52:38 +01:00 
						 
				 
			
				
					
						
							
							
								Jonas Winkler 
							
						 
					 
					
						
						
							
						
						fec9e54049 
					 
					
						
						
							
							new setting: PAPERLESS_OCR_PAGES  
						
						
						
						
					 
					
						2020-11-22 12:54:08 +01:00 
						 
				 
			
				
					
						
							
							
								Jonas Winkler 
							
						 
					 
					
						
						
							
						
						450fb877f6 
					 
					
						
						
							
							code cleanup  
						
						
						
						
					 
					
						2020-11-21 15:34:00 +01:00 
						 
				 
			
				
					
						
							
							
								Jonas Winkler 
							
						 
					 
					
						
						
							
						
						b44f8383e4 
					 
					
						
						
							
							code cleanup  
						
						
						
						
					 
					
						2020-11-21 14:03:45 +01:00 
						 
				 
			
				
					
						
							
							
								Jonas Winkler 
							
						 
					 
					
						
						
							
						
						41650f20f4 
					 
					
						
						
							
							mime type handling  
						
						
						
						
					 
					
						2020-11-20 13:31:03 +01:00 
						 
				 
			
				
					
						
							
							
								Jonas Winkler 
							
						 
					 
					
						
						
							
						
						1655d85a53 
					 
					
						
						
							
							testing the tesseract parser  
						
						
						
						
					 
					
						2020-11-19 20:31:08 +01:00 
						 
				 
			
				
					
						
							
							
								Jonas Winkler 
							
						 
					 
					
						
						
							
						
						8908bc259e 
					 
					
						
						
							
							updated logging, logging for the mail consumer to see what's happening  
						
						
						
						
					 
					
						2020-11-18 13:23:30 +01:00 
						 
				 
			
				
					
						
							
							
								Jonas Winkler 
							
						 
					 
					
						
						
							
						
						d2e22e3f27 
					 
					
						
						
							
							Changed the way parsers are discovered. This also prepares for upcoming changes regarding content types and file types: parsers should declare what they support, and actual file extensions should not be hardcoded everywhere.  
						
						
						
						
					 
					
						2020-11-16 23:53:12 +01:00 
						 
				 
			
				
					
						
							
							
								Jonas Winkler 
							
						 
					 
					
						
						
							
						
						8dca459573 
					 
					
						
						
							
							first version of the new consumer.  
						
						
						
						
					 
					
						2020-11-16 18:26:54 +01:00 
						 
				 
			
				
					
						
							
							
								Jonas Winkler 
							
						 
					 
					
						
						
							
						
						2e04ba1c04 
					 
					
						
						
							
							code style fixes  
						
						
						
						
					 
					
						2020-11-12 21:09:45 +01:00 
						 
				 
			
				
					
						
							
							
								Jonas Winkler 
							
						 
					 
					
						
						
							
						
						f182709fdd 
					 
					
						
						
							
							fixed most of the tests  
						
						
						
						
					 
					
						2020-11-02 19:42:23 +01:00 
						 
				 
			
				
					
						
							
							
								Jonas Winkler 
							
						 
					 
					
						
						
							
						
						3a08a2d206 
					 
					
						
						
							
							made unpaper and convert a little bit nicer to interact with  
						
						
						
						
					 
					
						2020-11-02 19:31:04 +01:00 
						 
				 
			
				
					
						
							
							
								Jonas Winkler 
							
						 
					 
					
						
						
							
						
						7d282a4e4e 
					 
					
						
						
							
							removed unused code, small fixes  
						
						
						
						
					 
					
						2020-11-02 18:20:04 +01:00 
						 
				 
			
				
					
						
							
							
								Jonas Winkler 
							
						 
					 
					
						
						
							
						
						d15405ef56 
					 
					
						
						
							
							reworked most of the tesseract parser, better logging  
						
						
						
						
					 
					
						2020-11-02 15:40:44 +01:00 
						 
				 
			
				
					
						
							
							
								Jonas Winkler 
							
						 
					 
					
						
						
							
						
						06ad212320 
					 
					
						
						
							
							bugfix  
						
						
						
						
					 
					
						2020-11-02 01:26:42 +01:00 
						 
				 
			
				
					
						
							
							
								Jonas Winkler 
							
						 
					 
					
						
						
							
						
						9f55fb668d 
					 
					
						
						
							
							silenced unpaper, optipng for cleaner output  
						
						
						
						
						moved parser settings to settings
removed forgiving ocr (now default) since tesseract is plenty accurate even without defining the correct language. 
						
						
					 
					
						2020-11-01 23:23:42 +01:00 
						 
				 
			
				
					
						
							
							
								Jonas Winkler 
							
						 
					 
					
						
						
							
						
						743ce1dc14 
					 
					
						
						
							
							better thumbnail generation for smaller files  
						
						
						
						
					 
					
						2020-10-26 01:05:23 +01:00 
						 
				 
			
				
					
						
							
							
								Johannes Wienke 
							
						 
					 
					
						
						
							
						
						a311cd498c 
					 
					
						
						
							
							Handle dateparser ValueErrors  
						
						
						
						
						When parsing dates from the document text or filenames, correctly handle value
errors indicating broken dates. Newly added tests ensure that this handling
works properly. 
						
						
					 
					
						2020-03-08 18:44:15 +01:00 
						 
				 
			
				
					
						
							
							
								Johannes Wienke 
							
						 
					 
					
						
						
							
						
						a3aab0cb48 
					 
					
						
						
							
							Remove duplicated date parsing test  
						
						
						
						
						The exact same tests existed twice in the file. 
						
						
					 
					
						2020-03-08 18:26:29 +01:00 
						 
				 
			
				
					
						
							
							
								Stéphane Brunner 
							
						 
					 
					
						
						
							
						
						daca77cc1b 
					 
					
						
						
							
							Strip the thumbnails  
						
						
						
						
					 
					
						2019-03-17 16:37:47 +01:00 
						 
				 
			
				
					
						
							
							
								jenspfeifle 
							
						 
					 
					
						
						
							
						
						336f747f16 
					 
					
						
						
							
							make pycodestyle happy  
						
						
						
						
					 
					
						2019-03-03 20:41:17 +01:00 
						 
				 
			
				
					
						
							
							
								JensPfeifle 
							
						 
					 
					
						
						
							
						
						29b0886950 
					 
					
						
						
							
							try to run convert, but fall back on gs if needed  
						
						
						
						
					 
					
						2019-03-03 20:31:52 +01:00 
						 
				 
			
				
					
						
							
							
								JensPfeifle 
							
						 
					 
					
						
						
							
						
						ea282c22ba 
					 
					
						
						
							
							Add GS_BINARY to settings to avoid hardcoded call of "gs"  
						
						
						
						
					 
					
						2019-03-03 20:31:52 +01:00 
						 
				 
			
				
					
						
							
							
								Pit 
							
						 
					 
					
						
						
							
						
						cbf008f37b 
					 
					
						
						
							
							Fix quoting in call to run_convert  
						
						
						
						
						Co-Authored-By: JensPfeifle <jens@pfeifle.tech> 
						
						
					 
					
						2019-03-03 20:31:52 +01:00 
						 
				 
			
				
					
						
							
							
								JensPfeifle 
							
						 
					 
					
						
						
							
						
						50504c3fd8 
					 
					
						
						
							
							remove unnecessary env arg in Popen  
						
						
						
						
					 
					
						2019-03-03 20:31:52 +01:00 
						 
				 
			
				
					
						
							
							
								Jens Pfeifle 
							
						 
					 
					
						
						
							
						
						0220199766 
					 
					
						
						
							
							fix parse error of some documents by using gs  
						
						
						
						
					 
					
						2019-03-03 20:31:52 +01:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						637b0d4cc2 
					 
					
						
						
							
							Drop problematic tests  
						
						
						
						
						Some tests had differing outcomes depending on the version of Tesseract
installed on the test system.  This led to a bunch of false test
failures, which led to people (including me) just ignoring the Travis
results.
This commit removes those tests, and while it reduces our coverage, at
least the results are predictable. 
						
						
					 
					
						2018-12-30 17:32:45 +00:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						27af2603f5 
					 
					
						
						
							
							Use modern languages for sample test files  
						
						
						
						
					 
					
						2018-12-30 14:09:17 +00:00 
						 
				 
			
				
					
						
							
							
								Erik Arvstedt 
							
						 
					 
					
						
						
							
						
						a19f0ef97e 
					 
					
						
						
							
							Fix date test sample image  
						
						
						
						
						The previous version of `tests_date_3.png` had too much spacing
between the `0` and the `8` glyphs, which resulted in the year getting
parsed as `200 8` in Tesseract 3.05.00 (+ tessdata 3.04.00).
This caused the date parsing test to fail. 
						
						
					 
					
						2018-12-02 15:10:21 +01:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						d544f269e0 
					 
					
						
						
							
							Conform everything to the coding standards  
						
						
						
						
						https://paperless.readthedocs.io/en/latest/contributing.html#additional-style-guides  
					
						2018-12-01 17:09:12 +00:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						650db75c2b 
					 
					
						
						
							
							Merge branch 'ENH_filename_date_parsing' of https://github.com/jat255/paperless into jat255-ENH_filename_date_parsing  
						
						
						
						
					 
					
						2018-12-01 16:57:16 +00:00 
						 
				 
			
				
					
						
							
							
								Daniel Quinn 
							
						 
					 
					
						
						
							
						
						c1d18c1e83 
					 
					
						
						
							
							Fix language guesses in tests  
						
						
						
						
						It turns out that the Lorem ipsum text in the sample files was confusing the language guesser, causing it to think the file was in Catalan and not English or German. 
						
						
					 
					
						2018-12-01 15:55:59 +00:00 
						 
				 
			
				
					
						
							
							
								Joshua Taillon 
							
						 
					 
					
						
						
							
						
						730daa3d6d 
					 
					
						
						
							
							Merge branch 'master' of github.com:danielquinn/paperless into ENH_filename_date_parsing  
						
						
						
						
					 
					
						2018-11-15 23:17:59 -05:00