Trenton Holmes | 3003bdd507 | 2022-05-06 09:04:08 -07:00
Runs pyupgrade to Python 3.8+ and adds a hook for it

Henning Häcker | 3b4da70c85 | 2022-03-30 09:23:45 +02:00
extract OCR_MAX_IMAGE_PIXELS into settings.py

Henning Häcker | 95199bd325 | 2022-03-30 09:23:45 +02:00
formatting according to black

Henning Häcker | a8887b211e | 2022-03-30 09:23:45 +02:00
implement PAPERLESS_OCR_MAX_IMAGE_PIXELS

Trenton Holmes | 1771d18a21 | 2022-03-11 11:34:28 -08:00
Runs the pre-commit hooks over all the Python files

kpj | fc695896dd | 2022-02-27 15:26:41 +01:00
Format Python code with black

jonaswinkler | 8d6071e977 | 2021-02-09 22:12:43 +01:00
fix a bug with thumbnail generation when TIKA was enabled

jonaswinkler | 431d4fd8e4 | 2021-02-05 01:10:29 +01:00
rework most of the logging

jonaswinkler | 44ec3a3d9c | 2021-02-04 13:17:24 +01:00
lazy loading for parsers

jonaswinkler | 40ef375c15 | 2021-01-01 22:19:43 +01:00
supply file_name for tika parser

jonaswinkler | f964dd5935 | 2020-12-29 12:26:41 +01:00
added configuration option for the font #197 #207

jonaswinkler | ee31fdc650 | 2020-12-20 14:00:24 +01:00
removed unused code

jonaswinkler | b2e0a8c884 | 2020-12-16 14:19:11 +01:00
thumbnail generation

jonaswinkler | e47b105185 | 2020-12-16 14:17:05 +01:00
fixes #7 and some test cases.

jonaswinkler | 7e0aa7136a | 2020-12-15 13:26:01 +01:00
more tests

jonaswinkler | aaa6599283 | 2020-11-30 16:48:09 +01:00
Merge branch 'dev' into feature-ocrmypdf

jonaswinkler | f51207fc32 | 2020-11-30 00:40:04 +01:00
added file type checks to the parsers to prevent temporary files from being consumed. Also: parsers announce file types they wish to use as default for each mime type.

Jonas Winkler | df801d17e1 | 2020-11-25 19:36:39 +01:00
reworked the interface of the parsers.

Jonas Winkler | 41650f20f4 | 2020-11-20 13:31:03 +01:00
mime type handling

Jonas Winkler | d2e22e3f27 | 2020-11-16 23:53:12 +01:00
Changed the way parsers are discovered. This also prepares for upcoming changes regarding content types and file types: parsers should declare what they support, and actual file extensions should not be hardcoded everywhere.

Jonas Winkler | 2e04ba1c04 | 2020-11-12 21:09:45 +01:00
code style fixes

Jonas Winkler | d15405ef56 | 2020-11-02 15:40:44 +01:00
reworked most of the tesseract parser, better logging

Daniel Quinn | 750ab5bf85 | 2018-10-07 14:56:38 +01:00
Use optipng to optimise document thumbnails

Daniel Quinn | 2a3f766b93 | 2018-10-07 14:56:02 +01:00
Consolidate get_date onto the DocumentParser parent class

Daniel Quinn | c99f5923d5 | 2018-09-09 21:02:30 +01:00
Rename parsers to DATE_REGEX
In moving the `parsers` variable into the package-level, it lost the
context, so a more descriptive name was needed.

Daniel Quinn | ef302abed7 | 2018-09-09 20:55:37 +01:00
Fix pycodestyle complaints

Joshua Taillon | 72c828170e | 2018-09-05 21:13:36 -04:00
move date-matching regex pattern to base parser module for use by all subclasses

Joshua Taillon | 4849249d86 | 2018-09-03 23:46:13 -04:00
explicitly add txt, md, and csv types for consumer and viewer; fix thumbnail generation

Joshua Taillon | d6fedbec52 | 2018-08-30 23:32:41 -04:00
first stab at text consumer