jonaswinkler | aaa6599283 | Merge branch 'dev' into feature-ocrmypdf | 2020-11-30 16:48:09 +01:00
jonaswinkler | f51207fc32 | added file type checks to the parsers to prevent temporary files from being consumed. Also: parsers announce file types they wish to use as default for each mime type. | 2020-11-30 00:40:04 +01:00
Jonas Winkler | df801d17e1 | reworked the interface of the parsers. | 2020-11-25 19:36:39 +01:00
Jonas Winkler | 41650f20f4 | mime type handling | 2020-11-20 13:31:03 +01:00
Jonas Winkler | d2e22e3f27 | Changed the way parsers are discovered. This also prepares for upcoming changes regarding content types and file types: parsers should declare what they support, and actual file extensions should not be hardcoded everywhere. | 2020-11-16 23:53:12 +01:00
Jonas Winkler | 2e04ba1c04 | code style fixes | 2020-11-12 21:09:45 +01:00
Jonas Winkler | d15405ef56 | reworked most of the tesseract parser, better logging | 2020-11-02 15:40:44 +01:00
Daniel Quinn | 750ab5bf85 | Use optipng to optimise document thumbnails | 2018-10-07 14:56:38 +01:00
Daniel Quinn | 2a3f766b93 | Consolidate get_date onto the DocumentParser parent class | 2018-10-07 14:56:02 +01:00
Daniel Quinn | c99f5923d5 | Rename parsers to DATE_REGEX. In moving the `parsers` variable into the package-level, it lost the context, so a more descriptive name was needed. | 2018-09-09 21:02:30 +01:00
Daniel Quinn | ef302abed7 | Fix pycodestyle complaints | 2018-09-09 20:55:37 +01:00
Joshua Taillon | 72c828170e | move date-matching regex pattern to base parser module for use by all subclasses | 2018-09-05 21:13:36 -04:00
Joshua Taillon | 4849249d86 | explicitly add txt, md, and csv types for consumer and viewer; fix thumbnail generation | 2018-09-03 23:46:13 -04:00
Joshua Taillon | d6fedbec52 | first stab at text consumer | 2018-08-30 23:32:41 -04:00