Trenton H
facb7226fe
Chore: Backend bulk updates (#4509)
2023-11-13 17:09:56 +00:00
Trenton H
999ae678c2
Feature: Switches to a new client to handle communication with Gotenberg (#4391)
Switches to a new client to handle communication with Gotenberg for merging and generating PDFs
2023-10-20 00:27:29 +00:00
Trenton H
ada67bd54e
Retry Tika parsing with PUT instead of form data in the event of a 500 error response (#4334)
2023-10-07 18:36:27 -07:00
Trenton H
8d60506884
Standardizes the imports across all the files and modules (#4248)
2023-09-23 20:17:01 -07:00
Trenton Holmes
650c816a7b
Removes support for Python 3.8 and lower from the code base
2023-09-10 11:42:59 -07:00
Trenton H
a1697ff21c
Combine and extend the utility for calling the live services to be more robust against failures, reporting, etc
2023-09-08 19:20:08 -07:00
Trenton Holmes
fe1f88ce5d
Sets the http timeouts equal to the task timeout, so it's either done or really done
2023-08-23 18:40:22 -07:00
Trenton Holmes
6bcc26b487
Sets the timezone of creation, if the date is known and naive
2023-08-03 09:57:52 -07:00
Simon Siebert
56fcb3fee1
Working around current TIKA Library Bugs - lint
2023-08-03 09:55:10 -07:00
Simon Siebert
d875be60d4
Working around current TIKA Library Bugs
2023-07-06 23:26:01 +02:00
Trenton H
e05b3441de
Updates tika client library and handle the changes to it
2023-06-26 10:41:05 -06:00
Trenton H
74fe7c586b
Updates the httpx timeout to be 30s for all operations
2023-06-19 08:59:51 -07:00
Trenton Holmes
4782b4da07
Adds better error handling/checking around getting content of a document via Tika
Signed-off-by: Trenton Holmes <797416+stumpylog@users.noreply.github.com>
2023-06-18 08:39:17 -07:00
Trenton H
2c1cd25be4
Rewrites the email parsing to be more clear and concise.
Adds testing to use httpx mocked responses to stand in as a server even offline
2023-06-06 09:05:26 -07:00
Trenton H
6e65558ea4
Swaps out tika and replaces requests with httpx
2023-06-06 09:05:26 -07:00
Trenton H
452c79f9a1
Improves the logging mixin and allows it to be typed better
2023-05-23 17:16:39 -07:00
Trenton H
3bcbd05252
Fixes ruff not running isort against the codebase
2023-04-26 09:35:27 -07:00
Trenton H
ce41ac9158
Configures ruff as the one stop linter and resolves warnings it raised
2023-04-01 17:03:52 -07:00
Trenton Holmes
a6e2708605
Changes testing to use more declarative status code names from DRF
2023-02-20 10:25:21 -08:00
Trenton H
bdcba570cb
Adding more test coverage, in particular around Tika and its parser
2023-02-05 11:01:55 -08:00
Kexogg
dae4550bc3
Fix importing files with non-ascii names (#2555)
2023-01-31 11:33:06 -08:00
Trenton H
1b2cb13a21
Adds setting to Gotenberg API call for outputting the correct PDF/A format
2023-01-27 11:05:23 -08:00
Trenton H
6ff28c92a4
Resolves minor flake8 warnings in the test suite
2023-01-05 08:39:48 -08:00
Trenton Holmes
3c325582d9
Try waiting a little bit after a parser error during the live testing
2022-11-02 15:55:12 -07:00
Trenton Holmes
9c0c734b34
Enables some basic live testing against a tika server with actual sample documents to catch some more errors mocking won't catch
2022-10-07 18:06:06 -07:00
Trenton Holmes
d4cb84ff76
Ensure the tika parse function gets a string, not a PathLike
2022-09-14 07:48:12 -07:00
Trenton Holmes
0bf9e55ca7
Fixes a minor TODO in settings, and enables flake8 for settings.py
2022-09-09 11:42:50 -07:00
Trenton Holmes
b70e21a6d5
When raising an exception during exception handling, chain them together for slightly cleaner logs
2022-08-03 09:00:56 -07:00
Trenton Holmes
1771d18a21
Runs the pre-commit hooks over all the Python files
2022-03-11 11:34:28 -08:00
kpj
fc695896dd
Format Python code with black
2022-02-27 15:26:41 +01:00
Uli Fahrer
2dcacaee14
fix(tika): adapt to Gotenberg 7 API
This commit adapts to the latest breaking changes from Gotenberg 7.
It also freezes the usage of the Gotenberg server to v7.x. Doing
this prevents further breaking changes leaking in our code base.
* refs #1250
2021-08-27 08:32:16 +02:00
Jo Vandeginste
ec0af59596
Add support for rtf
Signed-off-by: Jo Vandeginste <Jo.Vandeginste@kuleuven.be>
2021-04-30 13:19:12 +02:00
jonaswinkler
8d6071e977
fix a bug with thumbnail generation when TIKA was enabled
2021-02-09 22:12:43 +01:00
jonaswinkler
431d4fd8e4
rework most of the logging
2021-02-05 01:10:29 +01:00
jonaswinkler
44ec3a3d9c
lazy loading for parsers
2021-02-04 13:17:24 +01:00
jonaswinkler
e97ff3d671
code style
2021-01-02 15:26:09 +01:00
jonaswinkler
97e96d02f2
test cases
2021-01-02 15:25:13 +01:00
jonaswinkler
40ef375c15
supply file_name for tika parser
2021-01-01 22:19:43 +01:00
jonaswinkler
de32addf76
fix up the tika parser
2021-01-01 21:59:21 +01:00
jonaswinkler
c05bfb894a
remove duplicate code
2021-01-01 21:50:45 +01:00
Jo Vandeginste
5236f4e58d
Refactor after feedback:
- rename PAPERLESS_TIKA to PAPERLESS_TIKA_ENABLED
- all other env params now start with PAPERLESS_TIKA
- convert_to_pdf as class instance method
- smaller details
Signed-off-by: Jo Vandeginste <Jo.Vandeginste@kuleuven.be>
2020-12-31 14:41:47 +01:00
Jo Vandeginste
b8e8bf3dd4
Add the new paperless_tika parser
This parser will use an external Tika and Gotenberg server to parse
"Office" documents (.doc, .xls, .odt, etc.)
Signed-off-by: Jo Vandeginste <Jo.Vandeginste@kuleuven.be>
2020-12-29 21:51:21 +01:00