27 Commits

Author SHA1 Message Date
Trenton Holmes
dbe916f957 Sets the timezone of creation, if the date is known and naive 2023-08-03 09:57:52 -07:00
Simon Siebert
29877d1ca3 Working around current TIKA Library Bugs - lint 2023-08-03 09:55:10 -07:00
Simon Siebert
4664ff2f00 Working around current TIKA Library Bugs 2023-07-06 23:26:01 +02:00
Trenton H
97d9edda96 Updates tika client library and handle the changes to it 2023-06-26 10:41:05 -06:00
Trenton H
7a63bcc817 Updates the httpx timeout to be 30s for all operations 2023-06-19 08:59:51 -07:00
Trenton Holmes
48ab961c68 Adds better error handling/checking around getting content of a document via Tika
Signed-off-by: Trenton Holmes <797416+stumpylog@users.noreply.github.com>
2023-06-18 08:39:17 -07:00
Trenton H
1b3492a01f Rewrites the email parsing to be more clear and concise.
Adds testing to use httpx mocked responses to stand in as a server even offline
2023-06-06 09:05:26 -07:00
Trenton H
36f09c4974 Swaps out the tika library and replaces requests with httpx 2023-06-06 09:05:26 -07:00
Trenton H
bad8d304cb Improves the logging mixin and allows it to be typed better 2023-05-23 17:16:39 -07:00
Trenton H
30655f1b73 Fixes ruff not running isort against the codebase 2023-04-26 09:35:27 -07:00
Trenton H
d2c02b9102 Configures ruff as the one stop linter and resolves warnings it raised 2023-04-01 17:03:52 -07:00
Kexogg
b2009e842e Fix importing files with non-ascii names (#2555) 2023-01-31 11:33:06 -08:00
Trenton H
dc95cc3cd4 Adds setting to Gotenberg API call for outputting the correct PDF/A format 2023-01-27 11:05:23 -08:00
Trenton Holmes
fddd2af922 Ensure the tika parse function gets a string, not a PathLike 2022-09-14 07:48:12 -07:00
Trenton Holmes
02e5cea0f1 Fixes a minor TODO in settings, and enables flake8 for settings.py 2022-09-09 11:42:50 -07:00
Trenton Holmes
024fd8bc9b When raising an exception during exception handling, chain them together for slightly cleaner logs 2022-08-03 09:00:56 -07:00
Trenton Holmes
6635fa5f0d Runs the pre-commit hooks over all the Python files 2022-03-11 11:34:28 -08:00
kpj
c56cb25b5f Format Python code with black 2022-02-27 15:26:41 +01:00
Uli Fahrer
3ea164e4d2 fix(tika): adapt to Gotenberg 7 API
This commit adapts to the latest breaking changes from Gotenberg 7.
It also freezes the usage of the Gotenberg server to v7.x. Doing
this prevents further breaking changes leaking in our code base.

* refs #1250
2021-08-27 08:32:16 +02:00
jonaswinkler
b04d91d68c fix a bug with thumbnail generation when TIKA was enabled 2021-02-09 22:12:43 +01:00
jonaswinkler
e5a7dc0cc7 rework most of the logging 2021-02-05 01:10:29 +01:00
jonaswinkler
f0e2088b28 test cases 2021-01-02 15:25:13 +01:00
jonaswinkler
755f950cd2 supply file_name for tika parser 2021-01-01 22:19:43 +01:00
jonaswinkler
f800fdf66a fix up the tika parser 2021-01-01 21:59:21 +01:00
jonaswinkler
f1e9b414f9 remove duplicate code 2021-01-01 21:50:45 +01:00
Jo Vandeginste
aa88f25267 Refactor after feedback:
- rename PAPERLESS_TIKA to PAPERLESS_TIKA_ENABLED
- all other env params now start with PAPERLESS_TIKA
- convert_to_pdf as class instance method
- smaller details

Signed-off-by: Jo Vandeginste <Jo.Vandeginste@kuleuven.be>
2020-12-31 14:41:47 +01:00
Jo Vandeginste
bf8739864d Add the new paperless_tika parser
This parser will use an external Tika and Gotenberg server to parse
"Office" documents (.doc, .xls, .odt, etc.)

Signed-off-by: Jo Vandeginste <Jo.Vandeginste@kuleuven.be>
2020-12-29 21:51:21 +01:00