Trenton Holmes
27d1d790f9
Try waiting a little bit after a parser error during the live testing
2022-11-02 15:55:12 -07:00
Trenton Holmes
d13ca98223
Enables some basic live testing against a tika server with actual sample documents to catch some more errors mocking won't catch
2022-10-07 18:06:06 -07:00
Trenton Holmes
fddd2af922
Ensure the tika parse function gets a string, not a PathLike
2022-09-14 07:48:12 -07:00
Trenton Holmes
02e5cea0f1
Fixes a minor TODO in settings, and enables flake8 for settings.py
2022-09-09 11:42:50 -07:00
Trenton Holmes
024fd8bc9b
When raising an exception during exception handling, chain them together for slightly cleaner logs
2022-08-03 09:00:56 -07:00
Trenton Holmes
6635fa5f0d
Runs the pre-commit hooks over all the Python files
2022-03-11 11:34:28 -08:00
kpj
c56cb25b5f
Format Python code with black
2022-02-27 15:26:41 +01:00
Uli Fahrer
3ea164e4d2
fix(tika): adapt to Gotenberg 7 API
This commit adapts to the latest breaking changes from Gotenberg 7.
It also freezes the usage of the Gotenberg server to v7.x. Doing
this prevents further breaking changes leaking in our code base.
* refs #1250
2021-08-27 08:32:16 +02:00
Jo Vandeginste
8720a76bec
Add support for rtf
Signed-off-by: Jo Vandeginste <Jo.Vandeginste@kuleuven.be>
2021-04-30 13:19:12 +02:00
jonaswinkler
b04d91d68c
fix a bug with thumbnail generation when TIKA was enabled
2021-02-09 22:12:43 +01:00
jonaswinkler
e5a7dc0cc7
rework most of the logging
2021-02-05 01:10:29 +01:00
jonaswinkler
95f5c9f3a6
lazy loading for parsers
2021-02-04 13:17:24 +01:00
jonaswinkler
e2680b7113
code style
2021-01-02 15:26:09 +01:00
jonaswinkler
f0e2088b28
test cases
2021-01-02 15:25:13 +01:00
jonaswinkler
755f950cd2
supply file_name for tika parser
2021-01-01 22:19:43 +01:00
jonaswinkler
f800fdf66a
fix up the tika parser
2021-01-01 21:59:21 +01:00
jonaswinkler
f1e9b414f9
remove duplicate code
2021-01-01 21:50:45 +01:00
Jo Vandeginste
aa88f25267
Refactor after feedback:
- rename PAPERLESS_TIKA to PAPERLESS_TIKA_ENABLED
- all other env params now start with PAPERLESS_TIKA
- convert_to_pdf as class instance method
- smaller details
Signed-off-by: Jo Vandeginste <Jo.Vandeginste@kuleuven.be>
2020-12-31 14:41:47 +01:00
Jo Vandeginste
bf8739864d
Add the new paperless_tika parser
This parser will use an external Tika and Gotenberg server to parse
"Office" documents (.doc, .xls, .odt, etc.)
Signed-off-by: Jo Vandeginste <Jo.Vandeginste@kuleuven.be>
2020-12-29 21:51:21 +01:00