jonaswinkler | 56bd966c02 | local import of ocrmypdf so that the webserver does not load that | 2021-02-15 12:18:10 +01:00
jonaswinkler | 8d6071e977 | fix a bug with thumbnail generation when TIKA was enabled | 2021-02-09 22:12:43 +01:00
jonaswinkler | 431d4fd8e4 | rework most of the logging | 2021-02-05 01:10:29 +01:00
jonaswinkler | 44ec3a3d9c | lazy loading for parsers | 2021-02-04 13:17:24 +01:00
jonaswinkler | d17de45791 | fix typo | 2021-02-03 14:51:04 +01:00
jonaswinkler | bdc247ce49 | code style | 2021-02-02 23:58:25 +01:00
jonaswinkler | b0ed06003b | better error messages | 2021-01-27 17:56:06 +01:00
jonaswinkler | 89d6e422f5 | fix bugs and test cases | 2021-01-02 15:37:27 +01:00
jonaswinkler | 40ef375c15 | supply file_name for tika parser | 2021-01-01 22:19:43 +01:00
jonaswinkler | c05bfb894a | remove duplicate code | 2021-01-01 21:50:45 +01:00
jonaswinkler | 713985f259 | fixes #218 | 2020-12-30 15:12:16 +01:00
jonaswinkler | ee31fdc650 | removed unused code | 2020-12-20 14:00:24 +01:00
jonaswinkler | 1b1b57eb6a | more tests | 2020-12-19 15:54:13 +01:00
jonaswinkler | a0631413d6 | fixes bauerj/paperless_app#23 and most of all other scanner apps out there. | 2020-12-12 18:25:15 +01:00
jonaswinkler | 2f7bb01f34 | moved metadata extraction to the parsers | 2020-12-10 14:57:53 +01:00
jonaswinkler | dab4b1253a | fixes for the parser. | 2020-12-04 16:44:34 +01:00
jonaswinkler | 991a46c4f0 | disabled thumbnail trimming. | 2020-12-04 12:44:02 +01:00
jonaswinkler | 6a04e95f69 | catch encrypted pdf documents | 2020-12-03 01:02:37 +01:00
jonaswinkler | e3ce573fbb | a couple fixes and more supported image files | 2020-12-02 17:39:49 +01:00
jonaswinkler | 12fa844c7f | testing the new noarchive option. | 2020-12-01 14:30:13 +01:00
jonaswinkler | fd3df1ec58 | some more tests. | 2020-12-01 14:15:43 +01:00
jonaswinkler | aaa6599283 | Merge branch 'dev' into feature-ocrmypdf | 2020-11-30 16:48:09 +01:00
jonaswinkler | f51207fc32 | added file type checks to the parsers to prevent temporary files from being consumed. Also: parsers announce file types they wish to use as default for each mime type. | 2020-11-30 00:40:04 +01:00
jonaswinkler | ac1b701000 | more tests! | 2020-11-29 19:58:48 +01:00
jonaswinkler | fca98b411e | reorganised settings documentation and added OCR_USER_ARGS | 2020-11-29 12:38:32 +01:00
jonaswinkler | 0565118a01 | fixed checking the installed languages. | 2020-11-29 12:31:42 +01:00
jonaswinkler | 06cfc3113a | test case fixes. | 2020-11-27 14:06:37 +01:00
Jonas Winkler | e87575240d | more tests of the new parser | 2020-11-26 00:08:23 +01:00
Jonas Winkler | f51d2be303 | fixed the test cases | 2020-11-25 19:51:09 +01:00
Jonas Winkler | a60a4babf6 | OMP_THREAD_LIMIT | 2020-11-25 19:37:59 +01:00
Jonas Winkler | a03315102a | added image DPI detection to the tesseract parser. | 2020-11-25 19:37:48 +01:00
Jonas Winkler | df801d17e1 | reworked the interface of the parsers. | 2020-11-25 19:36:39 +01:00
Jonas Winkler | b269af7572 | Merge branch 'dev' into feature-ocrmypdf | 2020-11-25 16:58:20 +01:00
Jonas Winkler | d92214d412 | codestyle | 2020-11-25 16:05:52 +01:00
Jonas Winkler | 56ce267f89 | removed obsolete tests. | 2020-11-25 14:51:32 +01:00
Jonas Winkler | 2d559d330d | reworked PDF parser that uses OCRmyPDF and produces archive files. | 2020-11-25 14:50:43 +01:00
Jonas Winkler | dd83364326 | default language check | 2020-11-25 10:52:38 +01:00
Jonas Winkler | fec9e54049 | new setting: PAPERLESS_OCR_PAGES | 2020-11-22 12:54:08 +01:00
Jonas Winkler | 450fb877f6 | code cleanup | 2020-11-21 15:34:00 +01:00
Jonas Winkler | b44f8383e4 | code cleanup | 2020-11-21 14:03:45 +01:00
Jonas Winkler | 41650f20f4 | mime type handling | 2020-11-20 13:31:03 +01:00
Jonas Winkler | 1655d85a53 | testing the tesseract parser | 2020-11-19 20:31:08 +01:00
Jonas Winkler | 8908bc259e | updated logging, logging for the mail consumer to see whats happening | 2020-11-18 13:23:30 +01:00
Jonas Winkler | d2e22e3f27 | Changed the way parsers are discovered. This also prepares for upcoming changes regarding content types and file types: parsers should declare what they support, and actual file extensions should not be hardcoded everywhere. | 2020-11-16 23:53:12 +01:00
Jonas Winkler | 8dca459573 | first version of the new consumer. | 2020-11-16 18:26:54 +01:00
Jonas Winkler | 2e04ba1c04 | code style fixes | 2020-11-12 21:09:45 +01:00
Jonas Winkler | f182709fdd | fixed most of the tests | 2020-11-02 19:42:23 +01:00
Jonas Winkler | 3a08a2d206 | made unpaper and convert a little bit nicer to interact with | 2020-11-02 19:31:04 +01:00
Jonas Winkler | 7d282a4e4e | removed unused code, small fixes | 2020-11-02 18:20:04 +01:00
Jonas Winkler | d15405ef56 | reworked most of the tesseract parser, better logging | 2020-11-02 15:40:44 +01:00