| 
							
							
								 shamoon | e14f4c94c2 | Fix: ghostscript rendering error doesn't trigger frontend failure message (#4092)
* Raise ParseError from gs rendering error
* catch all parser errors as generic exception
* Differentiate generic vs parse errors during consumption | 2023-08-31 19:49:00 -07:00 |  | 
			
				
					| 
							
							
								 Trenton H | 7e768bfe23 | When PDF/A rendering fails, add a warning the user may want to allow it to continue | 2023-08-28 18:10:11 -07:00 |  | 
			
				
					| 
							
							
								 Dennis Brakhane | 93009c1eed | Don't consider better OCR as failing
Tesseract 5.3.0 does a better job at OCR, and correctly
reads "a webp" instead of "awebp", this is good, so we
don't want the test to fail. | 2023-07-11 16:44:18 +02:00 |  | 
			
				
					| 
							
							
								 Trenton H | 70f3f98363 | Let ruff autofix some things from the newest version | 2023-06-13 20:15:18 -07:00 |  | 
			
				
					| 
							
							
								 Trenton H | 452c79f9a1 | Improves the logging mixin and allows it to be typed better | 2023-05-23 17:16:39 -07:00 |  | 
			
				
					| 
							
							
								 Trenton H | 111960c530 | Adds better handling for files with invalid utf8 content | 2023-05-13 09:29:18 -07:00 |  | 
			
				
					| 
							
							
								 Trenton H | 6f163111ce | Upgrades black to v23, upgrades ruff | 2023-04-26 09:35:27 -07:00 |  | 
			
				
					| 
							
							
								 Trenton H | 3bcbd05252 | Fixes ruff not running isort against the codebase | 2023-04-26 09:35:27 -07:00 |  | 
			
				
					| 
							
							
								 Trenton H | ce41ac9158 | Configures ruff as the one stop linter and resolves warnings it raised | 2023-04-01 17:03:52 -07:00 |  | 
			
				
					| 
							
							
								 Brandon Rothweiler | ca412e0184 | Add PAPERLESS_OCR_SKIP_ARCHIVE_FILE config setting | 2023-02-23 22:42:57 -05:00 |  | 
			
				
					| 
							
							
								 Brandon Rothweiler | 8a89f5ae27 | Revert "Merge pull request #2732 from bdr99/skip_neverarchive"
This reverts commit 77b23d3acb, reversing
changes made to 5d8aa27831. | 2023-02-23 21:26:53 -05:00 |  | 
			
				
					| 
							
							
								 Brandon Rothweiler | 93a6391f96 | Add a setting to disable creating an archive file | 2023-02-22 15:27:17 -05:00 |  | 
			
				
					| 
							
							
								 Trenton Holmes | 0df91c31f1 | Creates a mix-in for asserting file system states | 2023-02-20 10:25:21 -08:00 |  | 
			
				
					| 
							
							
								 Trenton H | bdcba570cb | Adding more test coverage, in particular around Tika and its parser | 2023-02-05 11:01:55 -08:00 |  | 
			
				
					| 
							
							
								 shamoon | 985f298c46 | Merge pull request #2302 from paperless-ngx/feature-fix-display-rtl-content | 2023-01-10 07:30:52 -08:00 |  | 
			
				
					| 
							
							
								 Trenton H | d7939ca958 | Fixes some sample test files showing as modified after running tests | 2023-01-05 08:39:48 -08:00 |  | 
			
				
					| 
							
							
								 Trenton H | 1e4923835b | Small tweak to use the existing tempdir instead of a new one | 2023-01-03 13:05:44 -08:00 |  | 
			
				
					| 
							
							
								 Trenton Holmes | 7be9ae9c02 | Try a new way of extracting text from a given PDF file | 2023-01-03 12:43:31 -08:00 |  | 
			
				
					| 
							
							
								 Trenton H | 0fd51e35e1 | Adds testing coverage of multipage TIFF with alpha, without and with alpha/sRGB | 2023-01-03 09:56:19 -08:00 |  | 
			
				
					| 
							
							
								 Trenton H | 59e0c1fe4e | Let convert handle the removal of the alpha channel | 2023-01-03 09:56:19 -08:00 |  | 
			
				
					| 
							
							
								 Trenton Holmes | 26c7fad005 | If extracting text from a fallback file (ie forced), allow the text to be used | 2023-01-01 09:57:15 -08:00 |  | 
			
				
					| 
							
							
								 Trenton H | a2b7687c3b | In the case of an RTL language being extracted via pdfminer.six, fall back to forced OCR, which handles RTL text better | 2022-12-29 16:02:02 -08:00 |  | 
			
				
					| 
							
							
								 Trenton Holmes | 55ef0d4a1b | Fixes language code checks around two part languages | 2022-12-04 12:23:12 -08:00 |  | 
			
				
					| 
							
							
								 shamoon | 5d3a6e230d | Merge pull request #2057 from paperless-ngx/fix/2044-lang-code-diffs Bugfix: Some tesseract languages aren't detected as installed. | 2022-11-28 11:04:44 -08:00 |  | 
			
				
					| 
							
							
								 Trenton H | e96d65f945 | Allows parsing of WebP format images | 2022-11-28 09:35:54 -08:00 |  | 
			
				
					| 
							
							
								 Trenton Holmes | f0497e7744 | Fixes how a language code like chi-sim is treated in the checks | 2022-11-27 08:28:22 -08:00 |  | 
			
				
					| 
							
							
								 Trenton H | f015556562 | Adds a test to cover this edge case | 2022-11-22 07:22:41 -08:00 |  | 
			
				
					| 
							
							
								 Trenton H | b897d6de2e | Don't use the sidecar file when redoing the OCR, it only contains new text | 2022-11-22 07:22:41 -08:00 |  | 
			
				
					| 
							
							
								 Trenton Holmes | d1aa08850d | Reverts the change around skip_noarchive to align with how it is documented to work | 2022-10-20 13:34:41 -07:00 |  | 
			
				
					| 
							
							
								 Trenton Holmes | b3b2519bf0 | Fixes the creation of an archive file, even if noarchive was specified | 2022-08-20 13:47:56 -07:00 |  | 
			
				
					| 
							
							
								 Trenton Holmes | b70e21a6d5 | When raising an exception during exception handling, chain them together for slightly cleaner logs | 2022-08-03 09:00:56 -07:00 |  | 
			
				
					| 
							
							
								 Trenton Holmes | 49a843dcdd | Changes the simple-alpha parsing test to use a tempdir so the original isn't modified in Git | 2022-07-02 16:19:22 +02:00 |  | 
			
				
					| 
							
							
								 Trenton Holmes | fc26fe0ac0 | Updates to provide the user provided max pixel size to ocrmypdf | 2022-05-22 16:56:08 -07:00 |  | 
			
				
					| 
							
							
								 Trenton Holmes | 3003bdd507 | Runs pyupgrade to Python 3.8+ and adds a hook for it | 2022-05-06 09:04:08 -07:00 |  | 
			
				
					| 
							
							
								 Henning Häcker | 3b4da70c85 | extract OCR_MAX_IMAGE_PIXELS into settings.py | 2022-03-30 09:23:45 +02:00 |  | 
			
				
					| 
							
							
								 Henning Häcker | 95199bd325 | formatting according to black | 2022-03-30 09:23:45 +02:00 |  | 
			
				
					| 
							
							
								 Henning Häcker | a8887b211e | implement PAPERLESS_OCR_MAX_IMAGE_PIXELS | 2022-03-30 09:23:45 +02:00 |  | 
			
				
					| 
							
							
								 Trenton Holmes | 1771d18a21 | Runs the pre-commit hooks over all the Python files | 2022-03-11 11:34:28 -08:00 |  | 
			
				
					| 
							
							
								 Trenton Holmes | 85b210ebf6 | Reduces number of warnings from testing from 165 to 128.  In doing so, fixes a few minor things in the decrypt and export commands | 2022-03-10 18:12:48 -08:00 |  | 
			
				
					| 
							
							
								 kpj | fc695896dd | Format Python code with black | 2022-02-27 15:26:41 +01:00 |  | 
			
				
					| 
							
							
								 Martin Müller | 1e288100a9 | Remove unneeded exception handler from has_alpha() | 2022-02-21 22:58:19 +01:00 |  | 
			
				
					| 
							
							
								 Martin Müller | 73a8569d21 | Modify test for PNG image with alpha | 2022-02-21 22:38:25 +01:00 |  | 
			
				
					| 
							
							
								 Martin Müller | 2a47b3f1a1 | Fix code style (line too long) | 2022-02-21 22:34:34 +01:00 |  | 
			
				
					| 
							
							
								 Martin Müller | 41494ee689 | Remove alpha layer from PNG files for img2pdf Fixes issue #1254 | 2022-02-21 22:06:43 +01:00 |  | 
			
				
					| 
							
							
								 jonaswinkler | 23c6f849d6 | fix bug with DPI calculation | 2021-08-18 18:33:33 +02:00 |  | 
			
				
					| 
							
							
								 jonaswinkler | 1f707e86cc | fix logging getting spammed with pdfminer warnings on JPG files | 2021-06-13 12:09:16 +02:00 |  | 
			
				
					| 
							
							
								 jonaswinkler | 814d90745b | Workaround for all PDFminer.six issues. | 2021-05-15 12:15:32 +02:00 |  | 
			
				
					| 
							
							
								 jonaswinkler | 0e596bd1fc | also apply \0 removal to sidecar contents | 2021-03-22 23:08:34 +01:00 |  | 
			
				
					| 
							
							
								 jonaswinkler | fda2bfbea7 | better exception logging | 2021-03-22 23:00:15 +01:00 |  | 
			
				
					| 
							
							
								 jonaswinkler | d26c46e034 | fixes #794 | 2021-03-22 22:46:35 +01:00 |  |