Author | Commit | Message | Date
Trenton Holmes | 95bbf47995 | Updates to provide the user provided max pixel size to ocrmypdf | 2022-05-22 16:56:08 -07:00
Trenton Holmes | f62193099c | Runs pyupgrade to Python 3.8+ and adds a hook for it | 2022-05-06 09:04:08 -07:00
Henning Häcker | f4a0d8c040 | extract OCR_MAX_IMAGE_PIXELS into settings.py | 2022-03-30 09:23:45 +02:00
Henning Häcker | 6bc2cb0607 | formatting according to black | 2022-03-30 09:23:45 +02:00
Henning Häcker | 2c1f0cd3ee | implement PAPERLESS_OCR_MAX_IMAGE_PIXELS | 2022-03-30 09:23:45 +02:00
Trenton Holmes | 6635fa5f0d | Runs the pre-commit hooks over all the Python files | 2022-03-11 11:34:28 -08:00
Trenton Holmes | 55486ac151 | Reduces number of warnings from testing from 165 to 128. In doing so, fixes a few minor things in the decrypt and export commands | 2022-03-10 18:12:48 -08:00
kpj | c56cb25b5f | Format Python code with black | 2022-02-27 15:26:41 +01:00
Martin Müller | 5fa3ec6704 | Remove unneded exception handler from has_alpha() | 2022-02-21 22:58:19 +01:00
Martin Müller | b0afdc4841 | Fix code style (line too long) | 2022-02-21 22:34:34 +01:00
Martin Müller | 01310b9742 | Remove alpha layer from PNG files for img2pdf. Fixes issue #1254 | 2022-02-21 22:06:43 +01:00
jonaswinkler | 95abc7d6d7 | fix bug with DPI calculation | 2021-08-18 18:33:33 +02:00
jonaswinkler | 1402f11dc8 | fix logging getting spammed with pdfminer warnings on JPG files | 2021-06-13 12:09:16 +02:00
jonaswinkler | 271f9001dd | Workaround for all PDFminer.six issues. | 2021-05-15 12:15:32 +02:00
jonaswinkler | c9d76322eb | also apply \0 removal to sidecar contents | 2021-03-22 23:08:34 +01:00
jonaswinkler | d85a0f950f | better exception logging | 2021-03-22 23:00:15 +01:00
jonaswinkler | 62f829ae82 | fixes #794 | 2021-03-22 22:46:35 +01:00
jonaswinkler | 3a67462396 | fixes #631 | 2021-03-14 14:42:48 +01:00
jonaswinkler | 8dd2e1098b | fix up the ocrmypdf parameter construction for clean-final and redo | 2021-02-21 23:39:19 +01:00
jonaswinkler | 3f920a84da | use archived file for thumbnail, if available | 2021-02-21 23:30:14 +01:00
jonaswinkler | dce65dc0fa | more parameter checking | 2021-02-21 22:19:24 +01:00
jonaswinkler | 3cfd97aa08 | pycodestyle | 2021-02-21 00:21:43 +01:00
jonaswinkler | e3dd1863a9 | completely reworked the OCRmyPDF parser. | 2021-02-21 00:16:57 +01:00
jonaswinkler | 94cc9876d9 | local import of ocrmypdf so that the webserver does not load that | 2021-02-15 12:18:10 +01:00
jonaswinkler | b04d91d68c | fix a bug with thumbnail generation when TIKA was enabled | 2021-02-09 22:12:43 +01:00
jonaswinkler | e5a7dc0cc7 | rework most of the logging | 2021-02-05 01:10:29 +01:00
jonaswinkler | 701897dc3c | fix typo | 2021-02-03 14:51:04 +01:00
jonaswinkler | eeff7b3bdb | code style | 2021-02-02 23:58:25 +01:00
jonaswinkler | 14c61d72f3 | better error messages | 2021-01-27 17:56:06 +01:00
jonaswinkler | 755f950cd2 | supply file_name for tika parser | 2021-01-01 22:19:43 +01:00
jonaswinkler | f1e9b414f9 | remove duplicate code | 2021-01-01 21:50:45 +01:00
jonaswinkler | 4b7138f477 | fixes #218 | 2020-12-30 15:12:16 +01:00
jonaswinkler | 45d31f9735 | fixes bauerj/paperless_app#23 and most of all other scanner apps out there. | 2020-12-12 18:25:15 +01:00
jonaswinkler | 0c6c4a62d8 | moved metadata extraction to the parsers | 2020-12-10 14:57:53 +01:00
jonaswinkler | 905c090908 | fixes for the parser. | 2020-12-04 16:44:34 +01:00
jonaswinkler | 884eec9b61 | disabled thumbnail trimming. | 2020-12-04 12:44:02 +01:00
jonaswinkler | e2a375c9aa | catch encrypted pdf documents | 2020-12-03 01:02:37 +01:00
jonaswinkler | 1d073d2cfd | a couple fixes and more supported image files | 2020-12-02 17:39:49 +01:00
jonaswinkler | 1f90d50833 | some more tests. | 2020-12-01 14:15:43 +01:00
jonaswinkler | 388f6cfbe6 | reorganised settings documentation and added OCR_USER_ARGS | 2020-11-29 12:38:32 +01:00
Jonas Winkler | f901def797 | more tests of the new parser | 2020-11-26 00:08:23 +01:00
Jonas Winkler | e55d1ff9cc | OMP_THREAD_LIMIT | 2020-11-25 19:37:59 +01:00
Jonas Winkler | 3b655c95d9 | added image DPI detection to the tesseract parser. | 2020-11-25 19:37:48 +01:00
Jonas Winkler | 9bfa088eb5 | reworked the interface of the parsers. | 2020-11-25 19:36:39 +01:00
Jonas Winkler | 15935ab61f | reworked PDF parser that uses OCRmyPDF and produces archive files. | 2020-11-25 14:50:43 +01:00
Jonas Winkler | ae198f0767 | new setting: PAPERLESS_OCR_PAGES | 2020-11-22 12:54:08 +01:00
Jonas Winkler | a532200d10 | code cleanup | 2020-11-21 15:34:00 +01:00
Jonas Winkler | afc3753e58 | code cleanup | 2020-11-21 14:03:45 +01:00
Jonas Winkler | 680ab3d56b | updated logging, logging for the mail consumer to see whats happening | 2020-11-18 13:23:30 +01:00
Jonas Winkler | bd04c966c5 | first version of the new consumer. | 2020-11-16 18:26:54 +01:00