47 Commits

Author SHA1 Message Date
Trenton H
6f3bc54c41 Chore: Initial conversion to pytest fixtures (#7110) 2024-07-08 07:46:20 -07:00
Trenton H
d92659a63d Feature: Upgrade Gotenberg to v8 (#7094) 2024-06-27 02:37:50 +00:00
Daniel Böhme
5bdb4136a5 Enhancement: allow consumption of odg files (#6940) 2024-06-09 07:34:22 -07:00
Trenton H
16584328f1 Chore: Change the code formatter to Ruff (#6756)
* Changing the formatting to ruff-format

* Replaces references to black to ruff or ruff format, removes black from dependencies
2024-05-18 02:26:50 +00:00
Trenton H
c8a62715ec Feature: Allow setting backend configuration settings via the UI (#5126)
* Saving some start on this

* At least partially working for the tesseract parser

* Problems with migration testing need to figure out

* Work around that error

* Fixes max m_pixels

* Moving the settings to main paperless application

* Starting some consumer options

* More fixes and work

* Fixes these last tests

* Fix max_length on OcrSettings.mode field

* Fix all fields on Common & Ocr settings serializers

* Umbrella config view

* Revert "Umbrella config view"

This reverts commit fbaf9f4be30f89afeb509099180158a3406416a5.

* Updates to use a single configuration object for all settings

* Squashed commit of the following:

commit 8a0a49dd5766094f60462fbfbe62e9921fbd2373
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date:   Tue Dec 19 23:02:47 2023 -0800

    Fix formatting

commit 66b2d90c507b8afd9507813ff555e46198ea33b9
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date:   Tue Dec 19 22:36:35 2023 -0800

    Refactor frontend data models

commit 5723bd8dd823ee855625e250df39393e26709d48
Author: Adam Bogdał <adam@bogdal.pl>
Date:   Wed Dec 20 01:17:43 2023 +0100

    Fix: speed up admin panel for installs with a large number of documents (#5052)

commit 9b08ce176199bf9011a6634bb88f616846150d2b
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date:   Tue Dec 19 15:18:51 2023 -0800

    Update PULL_REQUEST_TEMPLATE.md

commit a6248bec2d793b7690feed95fcaf5eb34a75bfb6
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date:   Tue Dec 19 15:02:05 2023 -0800

    Chore: Update Angular to v17 (#4980)

commit b1f6f52486d5ba5c04af99b41315eb6428fd1fa8
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date:   Tue Dec 19 13:53:56 2023 -0800

    Fix: Dont allow null custom_fields property via API (#5063)

commit 638d9970fd468d8c02c91d19bd28f8b0796bdcb1
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date:   Tue Dec 19 13:43:50 2023 -0800

    Enhancement: symmetric document links (#4907)

commit 5e8de4c1da6eb4eb8f738b20962595c7536b30ec
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date:   Tue Dec 19 12:45:04 2023 -0800

    Enhancement: shared icon & shared by me filter (#4859)

commit 088bad90306025d3f6b139cbd0ad264a1cbecfe5
Author: Trenton H <797416+stumpylog@users.noreply.github.com>
Date:   Tue Dec 19 12:04:03 2023 -0800

    Bulk updates all the backend libraries (#5061)

* Saving some work on frontend config

* Very basic but dynamically-generated config form

* Saving work on slightly less ugly frontend config

* JSON validation for user_args field

* Fully dynamic config form

* Adds in some additional validators for a nicer error message

* Cleaning up the testing and coverage more

* Reverts unintentional change

* Adds documentation about the settings and the precedence

* Couple more commenting and style fixes

---------

Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
2023-12-29 15:42:56 -08:00
Trenton H
ab9e561317 Chore: Backend bulk updates (#4509) 2023-11-13 17:09:56 +00:00
Trenton H
61d0459e3c Feature: Switches to a new client to handle communication with Gotenberg (#4391)
Switches to a new client to handle communication with Gotenberg for merging and generating PDFs
2023-10-20 00:27:29 +00:00
Trenton H
9d13af81c0 Retry Tika parsing with PUT instead of form data in the event of a 500 error response (#4334) 2023-10-07 18:36:27 -07:00
Trenton H
dc642152d1 Standardizes the imports across all the files and modules (#4248) 2023-09-23 20:17:01 -07:00
Trenton Holmes
34b80a4d8e Removes support for Python 3.8 and lower from the code base 2023-09-10 11:42:59 -07:00
Trenton H
226cda9d77 Combine and extend the utility for calling the live services to be more robust against failures, reporting, etc 2023-09-08 19:20:08 -07:00
Trenton Holmes
0ddb62943c Sets the http timeouts equal to the task timeout, so it's either done or really done 2023-08-23 18:40:22 -07:00
Trenton Holmes
dbe916f957 Sets the timezone of creation, if the date is known and naive 2023-08-03 09:57:52 -07:00
Simon Siebert
29877d1ca3 Working around current TIKA Library Bugs - lint 2023-08-03 09:55:10 -07:00
Simon Siebert
4664ff2f00 Working around current TIKA Library Bugs 2023-07-06 23:26:01 +02:00
Trenton H
97d9edda96 Updates tika client library and handle the changes to it 2023-06-26 10:41:05 -06:00
Trenton H
7a63bcc817 Updates the httpx timeout to be 30s for all operations 2023-06-19 08:59:51 -07:00
Trenton Holmes
48ab961c68 Adds better error handling/checking around getting content of a document via Tika
Signed-off-by: Trenton Holmes <797416+stumpylog@users.noreply.github.com>
2023-06-18 08:39:17 -07:00
Trenton H
1b3492a01f Rewrites the email parsing to be more clear and concise.
Adds testing to use httpx mocked responses to stand in as a server even offline
2023-06-06 09:05:26 -07:00
Trenton H
36f09c4974 Swapping out the tika and replaces requests with httpx 2023-06-06 09:05:26 -07:00
Trenton H
bad8d304cb Improves the logging mixin and allows it to be typed better 2023-05-23 17:16:39 -07:00
Trenton H
30655f1b73 Fixes ruff not running isort against the codebase 2023-04-26 09:35:27 -07:00
Trenton H
d2c02b9102 Configures ruff as the one stop linter and resolves warnings it raised 2023-04-01 17:03:52 -07:00
Trenton Holmes
6644ccc33f Changes testing to use more declarative status code names from DRF 2023-02-20 10:25:21 -08:00
Trenton H
09ac404148 Adding more test coverage, in particular around Tika and its parser 2023-02-05 11:01:55 -08:00
Kexogg
b2009e842e Fix importing files with non-ascii names (#2555) 2023-01-31 11:33:06 -08:00
Trenton H
dc95cc3cd4 Adds setting to Gotenberg API call for outputting the correct PDF/A format 2023-01-27 11:05:23 -08:00
Trenton H
6f23cfe78c Resolves minor flake8 warnings in the test suite 2023-01-05 08:39:48 -08:00
Trenton Holmes
27d1d790f9 Try waiting a little bit after a parser error during the live testing 2022-11-02 15:55:12 -07:00
Trenton Holmes
d13ca98223 Enables some basic live testing against a tika server with actual sample documents to catch some more errors mocking won't catch 2022-10-07 18:06:06 -07:00
Trenton Holmes
fddd2af922 Ensure the tika parse function gets a string, not a PathLike 2022-09-14 07:48:12 -07:00
Trenton Holmes
02e5cea0f1 Fixes a minor TODO in settings, and enables flake8 for settings.py 2022-09-09 11:42:50 -07:00
Trenton Holmes
024fd8bc9b When raising an exception during exception handling, chain them together for slightly cleaner logs 2022-08-03 09:00:56 -07:00
Trenton Holmes
6635fa5f0d Runs the pre-commit hooks over all the Python files 2022-03-11 11:34:28 -08:00
kpj
c56cb25b5f Format Python code with black 2022-02-27 15:26:41 +01:00
Uli Fahrer
3ea164e4d2 fix(tika): adapt to Gotenberg 7 API
This commit adapts to the latest breaking changes from Gotenberg 7.
It also freezes the usage of the Gotenberg server to v7.x. Doing
this prevents further breaking changes leaking in our code base.

* refs #1250
2021-08-27 08:32:16 +02:00
Jo Vandeginste
8720a76bec Add support for rtf
Signed-off-by: Jo Vandeginste <Jo.Vandeginste@kuleuven.be>
2021-04-30 13:19:12 +02:00
jonaswinkler
b04d91d68c fix a bug with thumbnail generation when TIKA was enabled 2021-02-09 22:12:43 +01:00
jonaswinkler
e5a7dc0cc7 rework most of the logging 2021-02-05 01:10:29 +01:00
jonaswinkler
95f5c9f3a6 lazy loading for parsers 2021-02-04 13:17:24 +01:00
jonaswinkler
e2680b7113 code style 2021-01-02 15:26:09 +01:00
jonaswinkler
f0e2088b28 test cases 2021-01-02 15:25:13 +01:00
jonaswinkler
755f950cd2 supply file_name for tika parser 2021-01-01 22:19:43 +01:00
jonaswinkler
f800fdf66a fix up the tika parser 2021-01-01 21:59:21 +01:00
jonaswinkler
f1e9b414f9 remove duplicate code 2021-01-01 21:50:45 +01:00
Jo Vandeginste
aa88f25267 Refactor after feedback:
- rename PAPERLESS_TIKA to PAPERLESS_TIKA_ENABLED
- all other env params now start with PAPERLESS_TIKA
- convert_to_pdf as class instance method
- smaller details

Signed-off-by: Jo Vandeginste <Jo.Vandeginste@kuleuven.be>
2020-12-31 14:41:47 +01:00
Jo Vandeginste
bf8739864d Add the new paperless_tika parser
This parser will use an external Tika and Gotenberg server to parse
"Office" documents (.doc, .xls, .odt, etc.)

Signed-off-by: Jo Vandeginste <Jo.Vandeginste@kuleuven.be>
2020-12-29 21:51:21 +01:00