jonaswinkler | e5a7dc0cc7 | rework most of the logging | 2021-02-05 01:10:29 +01:00
jonaswinkler | 95f5c9f3a6 | lazy loading for parsers | 2021-02-04 13:17:24 +01:00
jonaswinkler | 755f950cd2 | supply file_name for tika parser | 2021-01-01 22:19:43 +01:00
jonaswinkler | fe73f42495 | added configuration option for the font #197 #207 | 2020-12-29 12:26:41 +01:00
jonaswinkler | d329b371ef | removed unused code | 2020-12-20 14:00:24 +01:00
jonaswinkler | e28f741fac | thumbnail generation | 2020-12-16 14:19:11 +01:00
jonaswinkler | b1cee55edb | fixes #7 and some test cases. | 2020-12-16 14:17:05 +01:00
jonaswinkler | f7db27de70 | more tests | 2020-12-15 13:26:01 +01:00
jonaswinkler | 1df64e3129 | Merge branch 'dev' into feature-ocrmypdf | 2020-11-30 16:48:09 +01:00
jonaswinkler | 7658c07b4d | added file type checks to the parsers to prevent temporary files from being consumed. Also: parsers announce file types they wish to use as default for each mime type. | 2020-11-30 00:40:04 +01:00
Jonas Winkler | 9bfa088eb5 | reworked the interface of the parsers. | 2020-11-25 19:36:39 +01:00
Jonas Winkler | f976a0b4ba | mime type handling | 2020-11-20 13:31:03 +01:00
Jonas Winkler | 9a48d6c577 | Changed the way parsers are discovered. This also prepares for upcoming changes regarding content types and file types: parsers should declare what they support, and actual file extensions should not be hardcoded everywhere. | 2020-11-16 23:53:12 +01:00
Jonas Winkler | eb6805e37e | code style fixes | 2020-11-12 21:09:45 +01:00
Jonas Winkler | def3a85858 | reworked most of the tesseract parser, better logging | 2020-11-02 15:40:44 +01:00
Daniel Quinn | bc898c1992 | Use optipng to optimise document thumbnails | 2018-10-07 14:56:38 +01:00
Daniel Quinn | 074609e1fc | Consolidate get_date onto the DocumentParser parent class | 2018-10-07 14:56:02 +01:00
Daniel Quinn | ef7f98281d | Rename parsers to DATE_REGEX. In moving the `parsers` variable into the package-level, it lost the context, so a more descriptive name was needed. | 2018-09-09 21:02:30 +01:00
Daniel Quinn | 69fc0d6d80 | Fix pycodestyle complaints | 2018-09-09 20:55:37 +01:00
Joshua Taillon | 5326895334 | move date-matching regex pattern to base parser module for use by all subclasses | 2018-09-05 21:13:36 -04:00
Joshua Taillon | cc7a341e75 | explicitly add txt, md, and csv types for consumer and viewer; fix thumbnail generation | 2018-09-03 23:46:13 -04:00
Joshua Taillon | 3c074d9e36 | first stab at text consumer | 2018-08-30 23:32:41 -04:00