paperless-ngx

mirror of https://github.com/paperless-ngx/paperless-ngx.git synced 2025-04-02 13:45:10 -05:00

Author	SHA1	Message	Date
Trenton H	1e4923835b	Small tweak to use the existing tempdir instead of a new one	2023-01-03 13:05:44 -08:00
Trenton Holmes	7be9ae9c02	Try a new way of extracting text from a given PDF file	2023-01-03 12:43:31 -08:00
Trenton H	59e0c1fe4e	Let convert handle the removal of the alpha channel	2023-01-03 09:56:19 -08:00
Trenton Holmes	26c7fad005	If extracting text from a fallback file (ie forced), allow the text to be used	2023-01-01 09:57:15 -08:00
Trenton H	a2b7687c3b	In the case of an RTL language being extracted via pdfminer.six, fall back to forced OCR, which handles RTL text better	2022-12-29 16:02:02 -08:00
Trenton H	e96d65f945	Allows parsing of WebP format images	2022-11-28 09:35:54 -08:00
Trenton H	b897d6de2e	Don't use the sidecar file when redoing the OCR, it only contains new text	2022-11-22 07:22:41 -08:00
Trenton Holmes	d1aa08850d	Reverts the change around skip_noarchive to align with how it is documented to work	2022-10-20 13:34:41 -07:00
Trenton Holmes	b3b2519bf0	Fixes the creation of an archive file, even if noarchive was specified	2022-08-20 13:47:56 -07:00
Trenton Holmes	b70e21a6d5	When raising an exception during exception handling, chain them together for slightly cleaner logs	2022-08-03 09:00:56 -07:00
Trenton Holmes	fc26fe0ac0	Updates to provide the user provided max pixel size to ocrmypdf	2022-05-22 16:56:08 -07:00
Trenton Holmes	3003bdd507	Runs pyupgrade to Python 3.8+ and adds a hook for it	2022-05-06 09:04:08 -07:00
Henning Häcker	3b4da70c85	extract OCR_MAX_IMAGE_PIXELS into settings.py	2022-03-30 09:23:45 +02:00
Henning Häcker	95199bd325	formatting according to black	2022-03-30 09:23:45 +02:00
Henning Häcker	a8887b211e	implement PAPERLESS_OCR_MAX_IMAGE_PIXELS	2022-03-30 09:23:45 +02:00
Trenton Holmes	1771d18a21	Runs the pre-commit hooks over all the Python files	2022-03-11 11:34:28 -08:00
Trenton Holmes	85b210ebf6	Reduces number of warnings from testing from 165 to 128. In doing so, fixes a few minor things in the decrypt and export commands	2022-03-10 18:12:48 -08:00
kpj	fc695896dd	Format Python code with black	2022-02-27 15:26:41 +01:00
Martin Müller	1e288100a9	Remove unneded exception handler from has_alpha()	2022-02-21 22:58:19 +01:00
Martin Müller	2a47b3f1a1	Fix code style (line too long)	2022-02-21 22:34:34 +01:00
Martin Müller	41494ee689	Remove alpha layer from PNG files for img2pdf Fixes issue #1254	2022-02-21 22:06:43 +01:00
jonaswinkler	23c6f849d6	fix bug with DPI calculation	2021-08-18 18:33:33 +02:00
jonaswinkler	1f707e86cc	fix logging getting spammed with pdfminer warnings on JPG files	2021-06-13 12:09:16 +02:00
jonaswinkler	814d90745b	Workaround for all PDFminer.six issues.	2021-05-15 12:15:32 +02:00
jonaswinkler	0e596bd1fc	also apply \0 removal to sidecar contents	2021-03-22 23:08:34 +01:00
jonaswinkler	fda2bfbea7	better exception logging	2021-03-22 23:00:15 +01:00
jonaswinkler	d26c46e034	fixes #794	2021-03-22 22:46:35 +01:00
jonaswinkler	40ce38254b	fixes #631	2021-03-14 14:42:48 +01:00
jonaswinkler	265432f2a5	fix up the ocrmypdf parameter construction for clean-final and redo	2021-02-21 23:39:19 +01:00
jonaswinkler	a13e9f23b1	use archived file for thumbnail, if available	2021-02-21 23:30:14 +01:00
jonaswinkler	14e2ad7bc4	more parameter checking	2021-02-21 22:19:24 +01:00
jonaswinkler	6da237dd9e	pycodestyle	2021-02-21 00:21:43 +01:00
jonaswinkler	ce121a261d	completely reworked the OCRmyPDF parser.	2021-02-21 00:16:57 +01:00
jonaswinkler	56bd966c02	local import of ocrmypdf so that the webserver does not load that	2021-02-15 12:18:10 +01:00
jonaswinkler	8d6071e977	fix a bug with thumbnail generation when TIKA was enabled	2021-02-09 22:12:43 +01:00
jonaswinkler	431d4fd8e4	rework most of the logging	2021-02-05 01:10:29 +01:00
jonaswinkler	d17de45791	fix typo	2021-02-03 14:51:04 +01:00
jonaswinkler	bdc247ce49	code style	2021-02-02 23:58:25 +01:00
jonaswinkler	b0ed06003b	better error messages	2021-01-27 17:56:06 +01:00
jonaswinkler	40ef375c15	supply file_name for tika parser	2021-01-01 22:19:43 +01:00
jonaswinkler	c05bfb894a	remove duplicate code	2021-01-01 21:50:45 +01:00
jonaswinkler	713985f259	fixes #218	2020-12-30 15:12:16 +01:00
jonaswinkler	a0631413d6	fixes bauerj/paperless_app#23 and most of all other scanner apps out there.	2020-12-12 18:25:15 +01:00
jonaswinkler	2f7bb01f34	moved metadata extraction to the parsers	2020-12-10 14:57:53 +01:00
jonaswinkler	dab4b1253a	fixes for the parser.	2020-12-04 16:44:34 +01:00
jonaswinkler	991a46c4f0	disabled thumbnail trimming.	2020-12-04 12:44:02 +01:00
jonaswinkler	6a04e95f69	catch encrypted pdf documents	2020-12-03 01:02:37 +01:00
jonaswinkler	e3ce573fbb	a couple fixes and more supported image files	2020-12-02 17:39:49 +01:00
jonaswinkler	fd3df1ec58	some more tests.	2020-12-01 14:15:43 +01:00
jonaswinkler	fca98b411e	reorganised settings documentation and added OCR_USER_ARGS	2020-11-29 12:38:32 +01:00

106 Commits