91 Commits

Author SHA1 Message Date
Trenton Holmes
6635fa5f0d Runs the pre-commit hooks over all the Python files 2022-03-11 11:34:28 -08:00
Trenton Holmes
55486ac151 Reduces number of warnings from testing from 165 to 128. In doing so, fixes a few minor things in the decrypt and export commands 2022-03-10 18:12:48 -08:00
kpj
c56cb25b5f Format Python code with black 2022-02-27 15:26:41 +01:00
Martin Müller
5fa3ec6704 Remove unneeded exception handler from has_alpha() 2022-02-21 22:58:19 +01:00
Martin Müller
b0afdc4841 Fix code style (line too long) 2022-02-21 22:34:34 +01:00
Martin Müller
01310b9742 Remove alpha layer from PNG files for img2pdf. Fixes issue #1254 2022-02-21 22:06:43 +01:00
jonaswinkler
95abc7d6d7 fix bug with DPI calculation 2021-08-18 18:33:33 +02:00
jonaswinkler
1402f11dc8 fix logging getting spammed with pdfminer warnings on JPG files 2021-06-13 12:09:16 +02:00
jonaswinkler
271f9001dd Workaround for all PDFminer.six issues. 2021-05-15 12:15:32 +02:00
jonaswinkler
c9d76322eb also apply \0 removal to sidecar contents 2021-03-22 23:08:34 +01:00
jonaswinkler
d85a0f950f better exception logging 2021-03-22 23:00:15 +01:00
jonaswinkler
62f829ae82 fixes #794 2021-03-22 22:46:35 +01:00
jonaswinkler
3a67462396 fixes #631 2021-03-14 14:42:48 +01:00
jonaswinkler
8dd2e1098b fix up the ocrmypdf parameter construction for clean-final and redo 2021-02-21 23:39:19 +01:00
jonaswinkler
3f920a84da use archived file for thumbnail, if available 2021-02-21 23:30:14 +01:00
jonaswinkler
dce65dc0fa more parameter checking 2021-02-21 22:19:24 +01:00
jonaswinkler
3cfd97aa08 pycodestyle 2021-02-21 00:21:43 +01:00
jonaswinkler
e3dd1863a9 completely reworked the OCRmyPDF parser. 2021-02-21 00:16:57 +01:00
jonaswinkler
94cc9876d9 local import of ocrmypdf so that the webserver does not load that 2021-02-15 12:18:10 +01:00
jonaswinkler
b04d91d68c fix a bug with thumbnail generation when TIKA was enabled 2021-02-09 22:12:43 +01:00
jonaswinkler
e5a7dc0cc7 rework most of the logging 2021-02-05 01:10:29 +01:00
jonaswinkler
701897dc3c fix typo 2021-02-03 14:51:04 +01:00
jonaswinkler
eeff7b3bdb code style 2021-02-02 23:58:25 +01:00
jonaswinkler
14c61d72f3 better error messages 2021-01-27 17:56:06 +01:00
jonaswinkler
755f950cd2 supply file_name for tika parser 2021-01-01 22:19:43 +01:00
jonaswinkler
f1e9b414f9 remove duplicate code 2021-01-01 21:50:45 +01:00
jonaswinkler
4b7138f477 fixes #218 2020-12-30 15:12:16 +01:00
jonaswinkler
45d31f9735 fixes bauerj/paperless_app#23 and most of all other scanner apps out there. 2020-12-12 18:25:15 +01:00
jonaswinkler
0c6c4a62d8 moved metadata extraction to the parsers 2020-12-10 14:57:53 +01:00
jonaswinkler
905c090908 fixes for the parser. 2020-12-04 16:44:34 +01:00
jonaswinkler
884eec9b61 disabled thumbnail trimming. 2020-12-04 12:44:02 +01:00
jonaswinkler
e2a375c9aa catch encrypted pdf documents 2020-12-03 01:02:37 +01:00
jonaswinkler
1d073d2cfd a couple fixes and more supported image files 2020-12-02 17:39:49 +01:00
jonaswinkler
1f90d50833 some more tests. 2020-12-01 14:15:43 +01:00
jonaswinkler
388f6cfbe6 reorganised settings documentation and added OCR_USER_ARGS 2020-11-29 12:38:32 +01:00
Jonas Winkler
f901def797 more tests of the new parser 2020-11-26 00:08:23 +01:00
Jonas Winkler
e55d1ff9cc OMP_THREAD_LIMIT 2020-11-25 19:37:59 +01:00
Jonas Winkler
3b655c95d9 added image DPI detection to the tesseract parser. 2020-11-25 19:37:48 +01:00
Jonas Winkler
9bfa088eb5 reworked the interface of the parsers. 2020-11-25 19:36:39 +01:00
Jonas Winkler
15935ab61f reworked PDF parser that uses OCRmyPDF and produces archive files. 2020-11-25 14:50:43 +01:00
Jonas Winkler
ae198f0767 new setting: PAPERLESS_OCR_PAGES 2020-11-22 12:54:08 +01:00
Jonas Winkler
a532200d10 code cleanup 2020-11-21 15:34:00 +01:00
Jonas Winkler
afc3753e58 code cleanup 2020-11-21 14:03:45 +01:00
Jonas Winkler
680ab3d56b updated logging, logging for the mail consumer to see what's happening 2020-11-18 13:23:30 +01:00
Jonas Winkler
bd04c966c5 first version of the new consumer. 2020-11-16 18:26:54 +01:00
Jonas Winkler
eb6805e37e code style fixes 2020-11-12 21:09:45 +01:00
Jonas Winkler
d42979842e made unpaper and convert a little bit nicer to interact with 2020-11-02 19:31:04 +01:00
Jonas Winkler
a89773ad71 removed unused code, small fixes 2020-11-02 18:20:04 +01:00
Jonas Winkler
def3a85858 reworked most of the tesseract parser, better logging 2020-11-02 15:40:44 +01:00
Jonas Winkler
972a6a2333 bugfix 2020-11-02 01:26:42 +01:00