paperless-ngx

Mirror of https://github.com/paperless-ngx/paperless-ngx.git, synced 2026-02-28 01:19:36 -06:00

Author	SHA1	Message	Date
shamoon	e1d52f4884	Merge pull request #2302 from paperless-ngx/feature-fix-display-rtl-content	2023-01-10 07:30:52 -08:00
Trenton H	b91217064b	Fixes some sample test files showing as modified after running tests	2023-01-05 08:39:48 -08:00
Trenton H	cd42d17ffb	Small tweak to use the existing tempdir instead of a new one	2023-01-03 13:05:44 -08:00
Trenton Holmes	a185f94c4b	Try a new way of extracting text from a given PDF file	2023-01-03 12:43:31 -08:00
Trenton H	fb20c92c51	Adds testing coverage of multipage TIFF with alpha, without and with alpha/sRGB	2023-01-03 09:56:19 -08:00
Trenton H	911d3cb567	Let convert handle the removal of the alpha channel	2023-01-03 09:56:19 -08:00
Trenton Holmes	22620caf6e	If extracting text from a fallback file (ie forced), allow the text to be used	2023-01-01 09:57:15 -08:00
Trenton H	79aecebbd2	In the case of an RTL language being extracted via pdfminer.six, fall back to forced OCR, which handles RTL text better	2022-12-29 16:02:02 -08:00
Trenton Holmes	c83d2da67e	Fixes language code checks around two part languages	2022-12-04 12:23:12 -08:00
shamoon	7edf178019	Merge pull request #2057 from paperless-ngx/fix/2044-lang-code-diffs Bugfix: Some tesseract languages aren't detected as installed.	2022-11-28 11:04:44 -08:00
Trenton H	68c62f3857	Allows parsing of WebP format images	2022-11-28 09:35:54 -08:00
Trenton Holmes	90f3266900	Fixes how a language code like chi-sim is treated in the checks	2022-11-27 08:28:22 -08:00
Trenton H	ffd9cd721d	Adds a test to cover this edge case	2022-11-22 07:22:41 -08:00
Trenton H	be8fa418bb	Don't use the sidecar file when redoing the OCR, it only contains new text	2022-11-22 07:22:41 -08:00
Trenton Holmes	1be8f39aa0	Reverts the change around skip_noarchive to align with how it is documented to work	2022-10-20 13:34:41 -07:00
Trenton Holmes	43d2545321	Fixes the creation of an archive file, even if noarchive was specified	2022-08-20 13:47:56 -07:00
Trenton Holmes	024fd8bc9b	When raising an exception during exception handling, chain them together for slightly cleaner logs	2022-08-03 09:00:56 -07:00
Trenton Holmes	8660103563	Changes the simple-alpha parsing test to use a tempdir so the original isn't modified in Git	2022-07-02 16:19:22 +02:00
Trenton Holmes	95bbf47995	Updates to provide the user provided max pixel size to ocrmypdf	2022-05-22 16:56:08 -07:00
Trenton Holmes	f62193099c	Runs pyupgrade to Python 3.8+ and adds a hook for it	2022-05-06 09:04:08 -07:00
Henning Häcker	f4a0d8c040	extract OCR_MAX_IMAGE_PIXELS into settings.py	2022-03-30 09:23:45 +02:00
Henning Häcker	6bc2cb0607	formatting according to black	2022-03-30 09:23:45 +02:00
Henning Häcker	2c1f0cd3ee	implement PAPERLESS_OCR_MAX_IMAGE_PIXELS	2022-03-30 09:23:45 +02:00
Trenton Holmes	6635fa5f0d	Runs the pre-commit hooks over all the Python files	2022-03-11 11:34:28 -08:00
Trenton Holmes	55486ac151	Reduces number of warnings from testing from 165 to 128. In doing so, fixes a few minor things in the decrypt and export commands	2022-03-10 18:12:48 -08:00
kpj	c56cb25b5f	Format Python code with black	2022-02-27 15:26:41 +01:00
Martin Müller	5fa3ec6704	Remove unneded exception handler from has_alpha()	2022-02-21 22:58:19 +01:00
Martin Müller	a662ce03ea	Modify test for PNG image with alpha	2022-02-21 22:38:25 +01:00
Martin Müller	b0afdc4841	Fix code style (line too long)	2022-02-21 22:34:34 +01:00
Martin Müller	01310b9742	Remove alpha layer from PNG files for img2pdf Fixes issue #1254	2022-02-21 22:06:43 +01:00
jonaswinkler	95abc7d6d7	fix bug with DPI calculation	2021-08-18 18:33:33 +02:00
jonaswinkler	1402f11dc8	fix logging getting spammed with pdfminer warnings on JPG files	2021-06-13 12:09:16 +02:00
jonaswinkler	271f9001dd	Workaround for all PDFminer.six issues.	2021-05-15 12:15:32 +02:00
jonaswinkler	c9d76322eb	also apply \0 removal to sidecar contents	2021-03-22 23:08:34 +01:00
jonaswinkler	d85a0f950f	better exception logging	2021-03-22 23:00:15 +01:00
jonaswinkler	62f829ae82	fixes #794	2021-03-22 22:46:35 +01:00
jonaswinkler	3a67462396	fixes #631	2021-03-14 14:42:48 +01:00
jonaswinkler	81b787635e	update dependencies	2021-02-28 13:01:26 +01:00
jonaswinkler	96088716d9	tests	2021-02-22 00:17:16 +01:00
jonaswinkler	8dd2e1098b	fix up the ocrmypdf parameter construction for clean-final and redo	2021-02-21 23:39:19 +01:00
jonaswinkler	3f920a84da	use archived file for thumbnail, if available	2021-02-21 23:30:14 +01:00
jonaswinkler	dce65dc0fa	more parameter checking	2021-02-21 22:19:24 +01:00
jonaswinkler	3cfd97aa08	pycodestyle	2021-02-21 00:21:43 +01:00
jonaswinkler	26c65b29d5	tests	2021-02-21 00:18:34 +01:00
jonaswinkler	e3dd1863a9	completely reworked the OCRmyPDF parser.	2021-02-21 00:16:57 +01:00
jonaswinkler	99cb371483	add some test files	2021-02-21 00:13:08 +01:00
jonaswinkler	94cc9876d9	local import of ocrmypdf so that the webserver does not load that	2021-02-15 12:18:10 +01:00
jonaswinkler	b04d91d68c	fix a bug with thumbnail generation when TIKA was enabled	2021-02-09 22:12:43 +01:00
jonaswinkler	e5a7dc0cc7	rework most of the logging	2021-02-05 01:10:29 +01:00
jonaswinkler	95f5c9f3a6	lazy loading for parsers	2021-02-04 13:17:24 +01:00


160 Commits