145 Commits

Author SHA1 Message Date
Trenton Holmes
b3b2519bf0 Fixes the creation of an archive file, even if noarchive was specified 2022-08-20 13:47:56 -07:00
Trenton Holmes
b70e21a6d5 When raising an exception during exception handling, chain them together for slightly cleaner logs 2022-08-03 09:00:56 -07:00
Trenton Holmes
49a843dcdd Changes the simple-alpha parsing test to use a tempdir so the original isn't modified in Git 2022-07-02 16:19:22 +02:00
Trenton Holmes
fc26fe0ac0 Updates to provide the user provided max pixel size to ocrmypdf 2022-05-22 16:56:08 -07:00
Trenton Holmes
3003bdd507 Runs pyupgrade to Python 3.8+ and adds a hook for it 2022-05-06 09:04:08 -07:00
Henning Häcker
3b4da70c85 extract OCR_MAX_IMAGE_PIXELS into settings.py 2022-03-30 09:23:45 +02:00
Henning Häcker
95199bd325 formatting according to black 2022-03-30 09:23:45 +02:00
Henning Häcker
a8887b211e implement PAPERLESS_OCR_MAX_IMAGE_PIXELS 2022-03-30 09:23:45 +02:00
Trenton Holmes
1771d18a21 Runs the pre-commit hooks over all the Python files 2022-03-11 11:34:28 -08:00
Trenton Holmes
85b210ebf6 Reduces number of warnings from testing from 165 to 128. In doing so, fixes a few minor things in the decrypt and export commands 2022-03-10 18:12:48 -08:00
kpj
fc695896dd Format Python code with black 2022-02-27 15:26:41 +01:00
Martin Müller
1e288100a9 Remove unneded exception handler from has_alpha() 2022-02-21 22:58:19 +01:00
Martin Müller
73a8569d21 Modify test for PNG image with alpha 2022-02-21 22:38:25 +01:00
Martin Müller
2a47b3f1a1 Fix code style (line too long) 2022-02-21 22:34:34 +01:00
Martin Müller
41494ee689 Remove alpha layer from PNG files for img2pdf (Fixes issue #1254) 2022-02-21 22:06:43 +01:00
jonaswinkler
23c6f849d6 fix bug with DPI calculation 2021-08-18 18:33:33 +02:00
jonaswinkler
1f707e86cc fix logging getting spammed with pdfminer warnings on JPG files 2021-06-13 12:09:16 +02:00
jonaswinkler
814d90745b Workaround for all PDFminer.six issues. 2021-05-15 12:15:32 +02:00
jonaswinkler
0e596bd1fc also apply \0 removal to sidecar contents 2021-03-22 23:08:34 +01:00
jonaswinkler
fda2bfbea7 better exception logging 2021-03-22 23:00:15 +01:00
jonaswinkler
d26c46e034 fixes #794 2021-03-22 22:46:35 +01:00
jonaswinkler
40ce38254b fixes #631 2021-03-14 14:42:48 +01:00
jonaswinkler
6ab884a95c update dependencies 2021-02-28 13:01:26 +01:00
jonaswinkler
99a18516b2 tests 2021-02-22 00:17:16 +01:00
jonaswinkler
265432f2a5 fix up the ocrmypdf parameter construction for clean-final and redo 2021-02-21 23:39:19 +01:00
jonaswinkler
a13e9f23b1 use archived file for thumbnail, if available 2021-02-21 23:30:14 +01:00
jonaswinkler
14e2ad7bc4 more parameter checking 2021-02-21 22:19:24 +01:00
jonaswinkler
6da237dd9e pycodestyle 2021-02-21 00:21:43 +01:00
jonaswinkler
50c1978d36 tests 2021-02-21 00:18:34 +01:00
jonaswinkler
ce121a261d completely reworked the OCRmyPDF parser. 2021-02-21 00:16:57 +01:00
jonaswinkler
9cbb1c5726 add some test files 2021-02-21 00:13:08 +01:00
jonaswinkler
56bd966c02 local import of ocrmypdf so that the webserver does not load that 2021-02-15 12:18:10 +01:00
jonaswinkler
8d6071e977 fix a bug with thumbnail generation when TIKA was enabled 2021-02-09 22:12:43 +01:00
jonaswinkler
431d4fd8e4 rework most of the logging 2021-02-05 01:10:29 +01:00
jonaswinkler
44ec3a3d9c lazy loading for parsers 2021-02-04 13:17:24 +01:00
jonaswinkler
d17de45791 fix typo 2021-02-03 14:51:04 +01:00
jonaswinkler
bdc247ce49 code style 2021-02-02 23:58:25 +01:00
jonaswinkler
b0ed06003b better error messages 2021-01-27 17:56:06 +01:00
jonaswinkler
89d6e422f5 fix bugs and test cases 2021-01-02 15:37:27 +01:00
jonaswinkler
40ef375c15 supply file_name for tika parser 2021-01-01 22:19:43 +01:00
jonaswinkler
c05bfb894a remove duplicate code 2021-01-01 21:50:45 +01:00
jonaswinkler
713985f259 fixes #218 2020-12-30 15:12:16 +01:00
jonaswinkler
ee31fdc650 removed unused code 2020-12-20 14:00:24 +01:00
jonaswinkler
1b1b57eb6a more tests 2020-12-19 15:54:13 +01:00
jonaswinkler
a0631413d6 fixes bauerj/paperless_app#23 and most of all other scanner apps out there. 2020-12-12 18:25:15 +01:00
jonaswinkler
2f7bb01f34 moved metadata extraction to the parsers 2020-12-10 14:57:53 +01:00
jonaswinkler
dab4b1253a fixes for the parser. 2020-12-04 16:44:34 +01:00
jonaswinkler
991a46c4f0 disabled thumbnail trimming. 2020-12-04 12:44:02 +01:00
jonaswinkler
6a04e95f69 catch encrypted pdf documents 2020-12-03 01:02:37 +01:00
jonaswinkler
e3ce573fbb a couple fixes and more supported image files 2020-12-02 17:39:49 +01:00