Trenton H
7a63bcc817
Updates the httpx timeout to be 30s for all operations
2023-06-19 08:59:51 -07:00
Trenton Holmes
48ab961c68
Adds better error handling/checking around getting content of a document via Tika
Signed-off-by: Trenton Holmes <797416+stumpylog@users.noreply.github.com>
2023-06-18 08:39:17 -07:00
Trenton H
1b3492a01f
Rewrites the email parsing to be more clear and concise.
Adds testing to use httpx mocked responses to stand in as a server even offline
2023-06-06 09:05:26 -07:00
Trenton H
36f09c4974
Swaps out tika and replaces requests with httpx
2023-06-06 09:05:26 -07:00
Trenton H
bad8d304cb
Improves the logging mixin and allows it to be typed better
2023-05-23 17:16:39 -07:00
Trenton H
30655f1b73
Fixes ruff not running isort against the codebase
2023-04-26 09:35:27 -07:00
Trenton H
d2c02b9102
Configures ruff as the one stop linter and resolves warnings it raised
2023-04-01 17:03:52 -07:00
Trenton Holmes
6644ccc33f
Changes testing to use more declarative status code names from DRF
2023-02-20 10:25:21 -08:00
Trenton H
09ac404148
Adding more test coverage, in particular around Tika and its parser
2023-02-05 11:01:55 -08:00
Kexogg
b2009e842e
Fix importing files with non-ascii names (#2555)
2023-01-31 11:33:06 -08:00
Trenton H
dc95cc3cd4
Adds setting to Gotenberg API call for outputting the correct PDF/A format
2023-01-27 11:05:23 -08:00
Trenton H
6f23cfe78c
Resolves minor flake8 warnings in the test suite
2023-01-05 08:39:48 -08:00
Trenton Holmes
27d1d790f9
Try waiting a little bit after a parser error during the live testing
2022-11-02 15:55:12 -07:00
Trenton Holmes
d13ca98223
Enables some basic live testing against a tika server with actual sample documents to catch some more errors mocking won't catch
2022-10-07 18:06:06 -07:00
Trenton Holmes
fddd2af922
Ensure the tika parse function gets a string, not a PathLike
2022-09-14 07:48:12 -07:00
Trenton Holmes
02e5cea0f1
Fixes a minor TODO in settings, and enables flake8 for settings.py
2022-09-09 11:42:50 -07:00
Trenton Holmes
024fd8bc9b
When raising an exception during exception handling, chain them together for slightly cleaner logs
2022-08-03 09:00:56 -07:00
Trenton Holmes
6635fa5f0d
Runs the pre-commit hooks over all the Python files
2022-03-11 11:34:28 -08:00
kpj
c56cb25b5f
Format Python code with black
2022-02-27 15:26:41 +01:00
Uli Fahrer
3ea164e4d2
fix(tika): adapt to Gotenberg 7 API
This commit adapts to the latest breaking changes from Gotenberg 7.
It also freezes the usage of the Gotenberg server to v7.x. Doing
this prevents further breaking changes leaking in our code base.
* refs #1250
2021-08-27 08:32:16 +02:00
Jo Vandeginste
8720a76bec
Add support for rtf
Signed-off-by: Jo Vandeginste <Jo.Vandeginste@kuleuven.be>
2021-04-30 13:19:12 +02:00
jonaswinkler
b04d91d68c
fix a bug with thumbnail generation when TIKA was enabled
2021-02-09 22:12:43 +01:00
jonaswinkler
e5a7dc0cc7
rework most of the logging
2021-02-05 01:10:29 +01:00
jonaswinkler
95f5c9f3a6
lazy loading for parsers
2021-02-04 13:17:24 +01:00
jonaswinkler
e2680b7113
code style
2021-01-02 15:26:09 +01:00
jonaswinkler
f0e2088b28
test cases
2021-01-02 15:25:13 +01:00
jonaswinkler
755f950cd2
supply file_name for tika parser
2021-01-01 22:19:43 +01:00
jonaswinkler
f800fdf66a
fix up the tika parser
2021-01-01 21:59:21 +01:00
jonaswinkler
f1e9b414f9
remove duplicate code
2021-01-01 21:50:45 +01:00
Jo Vandeginste
aa88f25267
Refactor after feedback:
- rename PAPERLESS_TIKA to PAPERLESS_TIKA_ENABLED
- all other env params now start with PAPERLESS_TIKA
- convert_to_pdf as class instance method
- smaller details
Signed-off-by: Jo Vandeginste <Jo.Vandeginste@kuleuven.be>
2020-12-31 14:41:47 +01:00
Jo Vandeginste
bf8739864d
Add the new paperless_tika parser
This parser will use an external Tika and Gotenberg server to parse
"Office" documents (.doc, .xls, .odt, etc.)
Signed-off-by: Jo Vandeginste <Jo.Vandeginste@kuleuven.be>
2020-12-29 21:51:21 +01:00