Martin Müller
|
1e288100a9
|
Remove unneeded exception handler from has_alpha()
|
2022-02-21 22:58:19 +01:00 |
|
Martin Müller
|
2a47b3f1a1
|
Fix code style (line too long)
|
2022-02-21 22:34:34 +01:00 |
|
Martin Müller
|
41494ee689
|
Remove alpha layer from PNG files for img2pdf
Fixes issue #1254
|
2022-02-21 22:06:43 +01:00 |
|
jonaswinkler
|
23c6f849d6
|
fix bug with DPI calculation
|
2021-08-18 18:33:33 +02:00 |
|
jonaswinkler
|
1f707e86cc
|
fix logging getting spammed with pdfminer warnings on JPG files
|
2021-06-13 12:09:16 +02:00 |
|
jonaswinkler
|
814d90745b
|
Workaround for all PDFminer.six issues.
|
2021-05-15 12:15:32 +02:00 |
|
jonaswinkler
|
0e596bd1fc
|
also apply \0 removal to sidecar contents
|
2021-03-22 23:08:34 +01:00 |
|
jonaswinkler
|
fda2bfbea7
|
better exception logging
|
2021-03-22 23:00:15 +01:00 |
|
jonaswinkler
|
d26c46e034
|
fixes #794
|
2021-03-22 22:46:35 +01:00 |
|
jonaswinkler
|
40ce38254b
|
fixes #631
|
2021-03-14 14:42:48 +01:00 |
|
jonaswinkler
|
265432f2a5
|
fix up the ocrmypdf parameter construction for clean-final and redo
|
2021-02-21 23:39:19 +01:00 |
|
jonaswinkler
|
a13e9f23b1
|
use archived file for thumbnail, if available
|
2021-02-21 23:30:14 +01:00 |
|
jonaswinkler
|
14e2ad7bc4
|
more parameter checking
|
2021-02-21 22:19:24 +01:00 |
|
jonaswinkler
|
6da237dd9e
|
pycodestyle
|
2021-02-21 00:21:43 +01:00 |
|
jonaswinkler
|
ce121a261d
|
completely reworked the OCRmyPDF parser.
|
2021-02-21 00:16:57 +01:00 |
|
jonaswinkler
|
56bd966c02
|
local import of ocrmypdf so that the webserver does not load that
|
2021-02-15 12:18:10 +01:00 |
|
jonaswinkler
|
8d6071e977
|
fix a bug with thumbnail generation when TIKA was enabled
|
2021-02-09 22:12:43 +01:00 |
|
jonaswinkler
|
431d4fd8e4
|
rework most of the logging
|
2021-02-05 01:10:29 +01:00 |
|
jonaswinkler
|
d17de45791
|
fix typo
|
2021-02-03 14:51:04 +01:00 |
|
jonaswinkler
|
bdc247ce49
|
code style
|
2021-02-02 23:58:25 +01:00 |
|
jonaswinkler
|
b0ed06003b
|
better error messages
|
2021-01-27 17:56:06 +01:00 |
|
jonaswinkler
|
40ef375c15
|
supply file_name for tika parser
|
2021-01-01 22:19:43 +01:00 |
|
jonaswinkler
|
c05bfb894a
|
remove duplicate code
|
2021-01-01 21:50:45 +01:00 |
|
jonaswinkler
|
713985f259
|
fixes #218
|
2020-12-30 15:12:16 +01:00 |
|
jonaswinkler
|
a0631413d6
|
fixes bauerj/paperless_app#23 and most of all other scanner apps out there.
|
2020-12-12 18:25:15 +01:00 |
|
jonaswinkler
|
2f7bb01f34
|
moved metadata extraction to the parsers
|
2020-12-10 14:57:53 +01:00 |
|
jonaswinkler
|
dab4b1253a
|
fixes for the parser.
|
2020-12-04 16:44:34 +01:00 |
|
jonaswinkler
|
991a46c4f0
|
disabled thumbnail trimming.
|
2020-12-04 12:44:02 +01:00 |
|
jonaswinkler
|
6a04e95f69
|
catch encrypted pdf documents
|
2020-12-03 01:02:37 +01:00 |
|
jonaswinkler
|
e3ce573fbb
|
a couple fixes and more supported image files
|
2020-12-02 17:39:49 +01:00 |
|
jonaswinkler
|
fd3df1ec58
|
some more tests.
|
2020-12-01 14:15:43 +01:00 |
|
jonaswinkler
|
fca98b411e
|
reorganised settings documentation and added OCR_USER_ARGS
|
2020-11-29 12:38:32 +01:00 |
|
Jonas Winkler
|
e87575240d
|
more tests of the new parser
|
2020-11-26 00:08:23 +01:00 |
|
Jonas Winkler
|
a60a4babf6
|
OMP_THREAD_LIMIT
|
2020-11-25 19:37:59 +01:00 |
|
Jonas Winkler
|
a03315102a
|
added image DPI detection to the tesseract parser.
|
2020-11-25 19:37:48 +01:00 |
|
Jonas Winkler
|
df801d17e1
|
reworked the interface of the parsers.
|
2020-11-25 19:36:39 +01:00 |
|
Jonas Winkler
|
2d559d330d
|
reworked PDF parser that uses OCRmyPDF and produces archive files.
|
2020-11-25 14:50:43 +01:00 |
|
Jonas Winkler
|
fec9e54049
|
new setting: PAPERLESS_OCR_PAGES
|
2020-11-22 12:54:08 +01:00 |
|
Jonas Winkler
|
450fb877f6
|
code cleanup
|
2020-11-21 15:34:00 +01:00 |
|
Jonas Winkler
|
b44f8383e4
|
code cleanup
|
2020-11-21 14:03:45 +01:00 |
|
Jonas Winkler
|
8908bc259e
|
updated logging, logging for the mail consumer to see what's happening
|
2020-11-18 13:23:30 +01:00 |
|
Jonas Winkler
|
8dca459573
|
first version of the new consumer.
|
2020-11-16 18:26:54 +01:00 |
|
Jonas Winkler
|
2e04ba1c04
|
code style fixes
|
2020-11-12 21:09:45 +01:00 |
|
Jonas Winkler
|
3a08a2d206
|
made unpaper and convert a little bit nicer to interact with
|
2020-11-02 19:31:04 +01:00 |
|
Jonas Winkler
|
7d282a4e4e
|
removed unused code, small fixes
|
2020-11-02 18:20:04 +01:00 |
|
Jonas Winkler
|
d15405ef56
|
reworked most of the tesseract parser, better logging
|
2020-11-02 15:40:44 +01:00 |
|
Jonas Winkler
|
06ad212320
|
bugfix
|
2020-11-02 01:26:42 +01:00 |
|
Jonas Winkler
|
9f55fb668d
|
silenced unpaper, optipng for cleaner output
moved parser settings to settings
removed forgiving ocr (now default) since tesseract is plenty accurate even without defining the correct language.
|
2020-11-01 23:23:42 +01:00 |
|
Jonas Winkler
|
743ce1dc14
|
better thumbnail generation for smaller files
|
2020-10-26 01:05:23 +01:00 |
|
Stéphane Brunner
|
daca77cc1b
|
Strip the thumbnails
|
2019-03-17 16:37:47 +01:00 |
|