Trenton Holmes
dbe916f957
Sets the timezone of creation, if the date is known and naive
2023-08-03 09:57:52 -07:00
Simon Siebert
29877d1ca3
Working around current TIKA Library Bugs - lint
2023-08-03 09:55:10 -07:00
Simon Siebert
4664ff2f00
Working around current TIKA Library Bugs
2023-07-06 23:26:01 +02:00
Trenton H
97d9edda96
Updates tika client library and handle the changes to it
2023-06-26 10:41:05 -06:00
Trenton H
7a63bcc817
Updates the httpx timeout to be 30s for all operations
2023-06-19 08:59:51 -07:00
Trenton Holmes
48ab961c68
Adds better error handling/checking around getting content of a document via Tika
Signed-off-by: Trenton Holmes <797416+stumpylog@users.noreply.github.com>
2023-06-18 08:39:17 -07:00
Trenton H
1b3492a01f
Rewrites the email parsing to be more clear and concise.
Adds testing to use httpx mocked responses to stand in as a server even offline
2023-06-06 09:05:26 -07:00
Trenton H
36f09c4974
Swaps out the tika library and replaces requests with httpx
2023-06-06 09:05:26 -07:00
Trenton H
bad8d304cb
Improves the logging mixin and allows it to be typed better
2023-05-23 17:16:39 -07:00
Trenton H
30655f1b73
Fixes ruff not running isort against the codebase
2023-04-26 09:35:27 -07:00
Trenton H
d2c02b9102
Configures ruff as the one stop linter and resolves warnings it raised
2023-04-01 17:03:52 -07:00
Kexogg
b2009e842e
Fix importing files with non-ascii names (#2555)
2023-01-31 11:33:06 -08:00
Trenton H
dc95cc3cd4
Adds setting to Gotenberg API call for outputting the correct PDF/A format
2023-01-27 11:05:23 -08:00
Trenton Holmes
fddd2af922
Ensure the tika parse function gets a string, not a PathLike
2022-09-14 07:48:12 -07:00
Trenton Holmes
02e5cea0f1
Fixes a minor TODO in settings, and enables flake8 for settings.py
2022-09-09 11:42:50 -07:00
Trenton Holmes
024fd8bc9b
When raising an exception during exception handling, chain them together for slightly cleaner logs
2022-08-03 09:00:56 -07:00
Trenton Holmes
6635fa5f0d
Runs the pre-commit hooks over all the Python files
2022-03-11 11:34:28 -08:00
kpj
c56cb25b5f
Format Python code with black
2022-02-27 15:26:41 +01:00
Uli Fahrer
3ea164e4d2
fix(tika): adapt to Gotenberg 7 API
This commit adapts to the latest breaking changes from Gotenberg 7.
It also freezes the usage of the Gotenberg server to v7.x. Doing
this prevents further breaking changes leaking in our code base.
* refs #1250
2021-08-27 08:32:16 +02:00
jonaswinkler
b04d91d68c
fix a bug with thumbnail generation when TIKA was enabled
2021-02-09 22:12:43 +01:00
jonaswinkler
e5a7dc0cc7
rework most of the logging
2021-02-05 01:10:29 +01:00
jonaswinkler
f0e2088b28
test cases
2021-01-02 15:25:13 +01:00
jonaswinkler
755f950cd2
supply file_name for tika parser
2021-01-01 22:19:43 +01:00
jonaswinkler
f800fdf66a
fix up the tika parser
2021-01-01 21:59:21 +01:00
jonaswinkler
f1e9b414f9
remove duplicate code
2021-01-01 21:50:45 +01:00
Jo Vandeginste
aa88f25267
Refactor after feedback:
- rename PAPERLESS_TIKA to PAPERLESS_TIKA_ENABLED
- all other env params now start with PAPERLESS_TIKA
- convert_to_pdf as class instance method
- smaller details
Signed-off-by: Jo Vandeginste <Jo.Vandeginste@kuleuven.be>
2020-12-31 14:41:47 +01:00
Jo Vandeginste
bf8739864d
Add the new paperless_tika parser
This parser will use an external Tika and Gotenberg server to parse
"Office" documents (.doc, .xls, .odt, etc.)
Signed-off-by: Jo Vandeginste <Jo.Vandeginste@kuleuven.be>
2020-12-29 21:51:21 +01:00