68 Commits

Author SHA1 Message Date
Trenton H
4813a7bc70 Chore: Adds additional rules for Ruff linter (#5660) 2024-02-05 21:46:59 +00:00
Trenton H
25542c56b9 Feature: Cache metadata and suggestions in Redis (#5638) 2024-02-04 10:42:21 -08:00
Trenton H
e16645b146 Feature: Add additional caching support to suggestions and metadata (#5414)
* Adds ETag and Last-Modified headers to suggestions, metadata and previews

* Slight update to the suggestions etag

* Small user message for why classifier didn't train again
2024-01-16 17:01:07 +00:00
Trenton H
41a3c7c89b Fix: Catch new warning when loading the classifier (#5395) 2024-01-14 13:21:17 -08:00
shamoon
f525ac0af6 Chore: add pre-commit hook for codespell (#5324) 2024-01-08 13:03:05 -08:00
Trenton H
061f33fb05 Feature: Allow setting backend configuration settings via the UI (#5126)
* Saving some start on this

* At least partially working for the tesseract parser

* Problems with migration testing need to figure out

* Work around that error

* Fixes max m_pixels

* Moving the settings to main paperless application

* Starting some consumer options

* More fixes and work

* Fixes these last tests

* Fix max_length on OcrSettings.mode field

* Fix all fields on Common & Ocr settings serializers

* Umbrellla config view

* Revert "Umbrellla config view"

This reverts commit fbaf9f4be30f89afeb509099180158a3406416a5.

* Updates to use a single configuration object for all settings

* Squashed commit of the following:

commit 8a0a49dd5766094f60462fbfbe62e9921fbd2373
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date:   Tue Dec 19 23:02:47 2023 -0800

    Fix formatting

commit 66b2d90c507b8afd9507813ff555e46198ea33b9
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date:   Tue Dec 19 22:36:35 2023 -0800

    Refactor frontend data models

commit 5723bd8dd823ee855625e250df39393e26709d48
Author: Adam Bogdał <adam@bogdal.pl>
Date:   Wed Dec 20 01:17:43 2023 +0100

    Fix: speed up admin panel for installs with a large number of documents (#5052)

commit 9b08ce176199bf9011a6634bb88f616846150d2b
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date:   Tue Dec 19 15:18:51 2023 -0800

    Update PULL_REQUEST_TEMPLATE.md

commit a6248bec2d793b7690feed95fcaf5eb34a75bfb6
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date:   Tue Dec 19 15:02:05 2023 -0800

    Chore: Update Angular to v17 (#4980)

commit b1f6f52486d5ba5c04af99b41315eb6428fd1fa8
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date:   Tue Dec 19 13:53:56 2023 -0800

    Fix: Dont allow null custom_fields property via API (#5063)

commit 638d9970fd468d8c02c91d19bd28f8b0796bdcb1
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date:   Tue Dec 19 13:43:50 2023 -0800

    Enhancement: symmetric document links (#4907)

commit 5e8de4c1da6eb4eb8f738b20962595c7536b30ec
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date:   Tue Dec 19 12:45:04 2023 -0800

    Enhancement: shared icon & shared by me filter (#4859)

commit 088bad90306025d3f6b139cbd0ad264a1cbecfe5
Author: Trenton H <797416+stumpylog@users.noreply.github.com>
Date:   Tue Dec 19 12:04:03 2023 -0800

    Bulk updates all the backend libraries (#5061)

* Saving some work on frontend config

* Very basic but dynamically-generated config form

* Saving work on slightly less ugly frontend config

* JSON validation for user_args field

* Fully dynamic config form

* Adds in some additional validators for a nicer error message

* Cleaning up the testing and coverage more

* Reverts unintentional change

* Adds documentation about the settings and the precedence

* Couple more commenting and style fixes

---------

Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
2023-12-29 15:42:56 -08:00
Trenton H
facb7226fe Chore: Backend bulk updates (#4509) 2023-11-13 17:09:56 +00:00
Trenton Holmes
650c816a7b Removes support for Python 3.8 and lower from the code base 2023-09-10 11:42:59 -07:00
Trenton Holmes
d376f9e7a3 Adding more typing around the classification and matching 2023-07-26 07:03:43 -07:00
Trenton H
8aa5ecde62 Updates some Python dependencies and the hooks 2023-07-20 18:30:11 -07:00
Trenton H
c1641f6fb8 Just in case, catch a sometimes nltk error and return the basic processed content instead 2023-05-24 19:34:49 -07:00
Trenton H
6f163111ce Upgrades black to v23, upgrades ruff 2023-04-26 09:35:27 -07:00
Trenton H
3bcbd05252 Fixes ruff not running isort against the codebase 2023-04-26 09:35:27 -07:00
Trenton H
ce41ac9158 Configures ruff as the one stop linter and resolves warnings it raised 2023-04-01 17:03:52 -07:00
Trenton H
41bcfcaffe Changes out the settings and a decent amount of test code to be pathlib compatible 2023-03-06 09:16:07 -08:00
Trenton Holmes
6b939f7567 Returns to using hashing against primary keys, at least for fields. Improves testing coverage 2023-02-28 08:13:10 -08:00
Trenton Holmes
c958a7c593 Changes from a hash based system to a time based system to prevent extra retrains 2023-02-28 08:13:10 -08:00
Trenton H
8709ea4df0 Changes classifier training to hold less data in memory at the same time 2023-02-28 08:13:10 -08:00
Trenton H
1e891414a3 Allows disabling NLTK, adds it as a consideration for low power devices 2022-10-10 08:58:23 -07:00
Trenton Holmes
c44c914d3d Changes the NLTK language to be based on the Tesseract OCR language, with fallback to the default processing 2022-10-10 08:58:23 -07:00
Trenton H
d10d2f5a54 Allows configuration of the NLTK processing language 2022-10-10 08:58:23 -07:00
Trenton Holmes
6523cf0c4b Fixes the download and usage of the downloaded data 2022-10-10 08:58:23 -07:00
Trenton Holmes
d856e48045 Updates the pre-processing of document content to be much more robust, with tokenization, stemming and stop word removal 2022-10-10 08:58:23 -07:00
Trenton Holmes
b70e21a6d5 When raising an exception during exception handling, chain them together for slightly cleaner logs 2022-08-03 09:00:56 -07:00
Trenton Holmes
55dadea98e No need for a branch here, the loop takes care of it 2022-07-05 08:20:35 +02:00
Trenton Holmes
77fbbe95ff Updates the classifier to catch warnings from scikit-learn and rebuild the model file when this happens 2022-07-05 08:20:35 +02:00
Markus
69ef26dab0 Feature: Dynamic document storage pathes (#916)
* Added devcontainer

* Add feature storage pathes

* Exclude tests and add versioning

* Check escaping

* Check escaping

* Check quoting

* Echo

* Escape

* Escape :

* Double escape \

* Escaping

* Remove if

* Escape colon

* Missing \

* Esacpe :

* Escape all

* test

* Remove sed

* Fix exclude

* Remove SED command

* Add LD_LIBRARY_PATH

* Adjusted to v1.7

* Updated test-cases

* Remove devcontainer

* Removed internal build-file

* Run pre-commit

* Corrected flak8 error

* Adjusted to v1.7

* Updated test-cases

* Corrected flak8 error

* Adjusted to new plural translations

* Small adjustments due to code-review backend

* Adjusted line-break

* Removed PAPERLESS prefix from settings variables

* Corrected style change due to search+replace

* First documentation draft

* Revert changes to Pipfile

* Add sphinx-autobuild with keep-outdated

* Revert merge error that results in wrong storage path is evaluated

* Adjust styles of generated files ...

* Adds additional testing to cover dynamic storage path functionality

* Remove unnecessary condition

* Add hint to edit storage path dialog

* Correct spelling of pathes to paths

* Minor documentation tweaks

* Minor typo

* improving wrapping of filter editor buttons with new storage path button

* Update .gitignore

* Fix select border radius in non input-groups

* Better storage path edit hint

* Add note to edit storage path dialog re document_renamer

* Add note to bulk edit storage path re document_renamer

* Rename FILTER_STORAGE_DIRECTORY to PATH

* Fix broken filter rule parsing

* Show default storage if unspecified

* Remove note re storage path on bulk edit

* Add basic validation of filename variables

Co-authored-by: Markus Kling <markus@markus-kling.net>
Co-authored-by: Trenton Holmes <holmes.trenton@gmail.com>
Co-authored-by: Michael Shamoon <4887959+shamoon@users.noreply.github.com>
Co-authored-by: Quinn Casey <quinn@quinncasey.com>
2022-05-19 14:42:25 -07:00
Trenton Holmes
3003bdd507 Runs pyupgrade to Python 3.8+ and adds a hook for it 2022-05-06 09:04:08 -07:00
Trenton Holmes
9bb5568d8e Un-pickle and re-pickle the test models to resolve the version difference warning 2022-03-22 09:37:17 +01:00
Johann Bauer
cffdaefe2f Fix model test 2022-03-21 18:53:53 +01:00
Johann Bauer
9de4ca61e8 Increase FORMAT_VERSION to force model re-creation 2022-03-21 18:11:18 +01:00
Trenton Holmes
1771d18a21 Runs the pre-commit hooks over all the Python files 2022-03-11 11:34:28 -08:00
kpj
fc695896dd Format Python code with black 2022-02-27 15:26:41 +01:00
jonaswinkler
a3dae02cfb write classifier model to temporary file before copying to final location 2021-06-13 12:03:20 +02:00
jonaswinkler
635c96accf better exception handling 2021-05-19 23:11:24 +02:00
jonaswinkler
ca1e838c52 catch another exception regarding classifier loading 2021-05-19 22:57:52 +02:00
Jonas Winkler
61b47e358f correct file mode 2021-05-16 01:22:51 +02:00
jonaswinkler
12235cc853 fixes #689 2021-03-03 23:35:26 +01:00
jonaswinkler
7e88085377 load sklearn modules only when training data has changed 2021-02-15 11:25:25 +01:00
jonaswinkler
b48e67d714 revert a faulty change that caused memory usage to explode #537 2021-02-13 19:51:04 +01:00
jonaswinkler
ed0b1fe115 better exception logging 2021-02-11 22:16:41 +01:00
jonaswinkler
7702f5012b classifier cache timeout 2021-02-06 21:03:32 +01:00
jonaswinkler
ffe96c8fff classifier caching 2021-02-06 20:54:58 +01:00
jonaswinkler
431d4fd8e4 rework most of the logging 2021-02-05 01:10:29 +01:00
jonaswinkler
d8e0ef257e don't load sklearn libraries unless needed 2021-02-04 15:15:11 +01:00
jonaswinkler
4c6a02aee7 pycodestyle 2021-01-30 15:22:51 +01:00
jonaswinkler
87a18eae2d centralized classifier loading, better error handling, no error messages when auto matching is not used 2021-01-30 14:22:23 +01:00
jonaswinkler
bc4192e7d1 more tests and bugfixes. 2020-11-27 15:36:32 +01:00
Jonas Winkler
30acfdd3f1 tests for the classifier and fixes for edge cases with minimal data. 2020-11-26 14:18:34 +01:00
Jonas Winkler
450fb877f6 code cleanup 2020-11-21 15:34:00 +01:00