30 Commits

Author SHA1 Message Date
Trenton H
999ae678c2 Feature: Switches to a new client to handle communication with Gotenberg (#4391)
Switches to a new client to handle communication with Gotenberg for merging and generating PDFs
2023-10-20 00:27:29 +00:00
Trenton H
ada67bd54e Retry Tika parsing with PUT instead of form data in the event of a 500 error response (#4334) 2023-10-07 18:36:27 -07:00
Trenton Holmes
fe1f88ce5d Sets the http timeouts equal to the task timeout, so it's either done or really done 2023-08-23 18:40:22 -07:00
Trenton Holmes
6bcc26b487 Sets the timezone of creation, if the date is known and naive 2023-08-03 09:57:52 -07:00
Simon Siebert
56fcb3fee1 Working arround current TIKA Library Bugs - lint 2023-08-03 09:55:10 -07:00
Simon Siebert
d875be60d4 Working arround current TIKA Library Bugs 2023-07-06 23:26:01 +02:00
Trenton H
e05b3441de Updates tika client library and handle the changes to it 2023-06-26 10:41:05 -06:00
Trenton H
74fe7c586b Updates the httpx timeout to be 30s for all operations 2023-06-19 08:59:51 -07:00
Trenton Holmes
4782b4da07 Adds better error handling/checking around getting content of a document via Tika
Signed-off-by: Trenton Holmes <797416+stumpylog@users.noreply.github.com>
2023-06-18 08:39:17 -07:00
Trenton H
2c1cd25be4 Rewrites the email parsing to be more clear and concise.
Adds testing to use httpx mocked responses to stand in as a server even offline
2023-06-06 09:05:26 -07:00
Trenton H
6e65558ea4 Swapping out the tika and replaces requests with httpx 2023-06-06 09:05:26 -07:00
Trenton H
452c79f9a1 Improves the logging mixin and allows it to be typed better 2023-05-23 17:16:39 -07:00
Trenton H
3bcbd05252 Fixes ruff not running isort against the codebase 2023-04-26 09:35:27 -07:00
Trenton H
ce41ac9158 Configures ruff as the one stop linter and resolves warnings it raised 2023-04-01 17:03:52 -07:00
Kexogg
dae4550bc3 Fix importing files with non-ascii names (#2555) 2023-01-31 11:33:06 -08:00
Trenton H
1b2cb13a21 Adds setting to Gotenberg API call for outputting the correct PDF/A format 2023-01-27 11:05:23 -08:00
Trenton Holmes
d4cb84ff76 Ensure the tika parse function gets a string, not a PathLike 2022-09-14 07:48:12 -07:00
Trenton Holmes
0bf9e55ca7 Fixes a minor TODO in settings, and enables flake8 for settings.py 2022-09-09 11:42:50 -07:00
Trenton Holmes
b70e21a6d5 When raising an exception during exception handling, chain them together for slightly cleaner logs 2022-08-03 09:00:56 -07:00
Trenton Holmes
1771d18a21 Runs the pre-commit hooks over all the Python files 2022-03-11 11:34:28 -08:00
kpj
fc695896dd Format Python code with black 2022-02-27 15:26:41 +01:00
Uli Fahrer
2dcacaee14 fix(tika): adapt to Gotenberg 7 API
This commit adapts to the latest breaking changes from Gotenberg 7.
It also freezes the usage of the Gotenberg server to v7.x. Doing
this prevents further breaking changes leaking in our code base.

* refs #1250
2021-08-27 08:32:16 +02:00
jonaswinkler
8d6071e977 fix a bug with thumbnail generation when TIKA was enabled 2021-02-09 22:12:43 +01:00
jonaswinkler
431d4fd8e4 rework most of the logging 2021-02-05 01:10:29 +01:00
jonaswinkler
97e96d02f2 test cases 2021-01-02 15:25:13 +01:00
jonaswinkler
40ef375c15 supply file_name for tika parser 2021-01-01 22:19:43 +01:00
jonaswinkler
de32addf76 fix up the tika parser 2021-01-01 21:59:21 +01:00
jonaswinkler
c05bfb894a remove duplicate code 2021-01-01 21:50:45 +01:00
Jo Vandeginste
5236f4e58d Refactor after feedback:
- rename PAPERLESS_TIKA to PAPERLESS_TIKA_ENABLED
- all other env params now start with PAPERLESS_TIKA
- convert_to_pdf as class instance method
- smaller details

Signed-off-by: Jo Vandeginste <Jo.Vandeginste@kuleuven.be>
2020-12-31 14:41:47 +01:00
Jo Vandeginste
b8e8bf3dd4 Add the new paperless_tika parser
This parser will use an external Tika and Gotenberg server to parse
"Office" documents (.doc, .xls, .odt, etc.)

Signed-off-by: Jo Vandeginste <Jo.Vandeginste@kuleuven.be>
2020-12-29 21:51:21 +01:00