Trenton H
|
f7e6361206
|
Just in case, catch an occasional NLTK error and return the basic processed content instead
|
2023-05-24 19:34:49 -07:00 |
|
Trenton H
|
aabcc9a1c4
|
Upgrades black to v23, upgrades ruff
|
2023-04-26 09:35:27 -07:00 |
|
Trenton H
|
30655f1b73
|
Fixes ruff not running isort against the codebase
|
2023-04-26 09:35:27 -07:00 |
|
Trenton H
|
d2c02b9102
|
Configures ruff as the one stop linter and resolves warnings it raised
|
2023-04-01 17:03:52 -07:00 |
|
Trenton H
|
ec2b0eb308
|
Changes out the settings and a decent amount of test code to be pathlib compatible
|
2023-03-06 09:16:07 -08:00 |
|
Trenton Holmes
|
73dc928832
|
Returns to using hashing against primary keys, at least for fields. Improves testing coverage
|
2023-02-28 08:13:10 -08:00 |
|
Trenton Holmes
|
303e81eb79
|
Changes from a hash based system to a time based system to prevent extra retrains
|
2023-02-28 08:13:10 -08:00 |
|
Trenton H
|
21cd76a181
|
Changes classifier training to hold less data in memory at the same time
|
2023-02-28 08:13:10 -08:00 |
|
Trenton H
|
2d71415ede
|
Allows disabling NLTK, adds it as a consideration for low power devices
|
2022-10-10 08:58:23 -07:00 |
|
Trenton Holmes
|
a78d44ec5f
|
Changes the NLTK language to be based on the Tesseract OCR language, with fallback to the default processing
|
2022-10-10 08:58:23 -07:00 |
|
Trenton H
|
0bc13c2a72
|
Allows configuration of the NLTK processing language
|
2022-10-10 08:58:23 -07:00 |
|
Trenton Holmes
|
70b1988a55
|
Fixes the download and usage of the downloaded data
|
2022-10-10 08:58:23 -07:00 |
|
Trenton Holmes
|
66884ea035
|
Updates the pre-processing of document content to be much more robust, with tokenization, stemming and stop word removal
|
2022-10-10 08:58:23 -07:00 |
|
Trenton Holmes
|
024fd8bc9b
|
When raising an exception during exception handling, chain them together for slightly cleaner logs
|
2022-08-03 09:00:56 -07:00 |
|
Trenton Holmes
|
64be6dcb36
|
No need for a branch here, the loop takes care of it
|
2022-07-05 08:20:35 +02:00 |
|
Trenton Holmes
|
6bd585a9a0
|
Updates the classifier to catch warnings from scikit-learn and rebuild the model file when this happens
|
2022-07-05 08:20:35 +02:00 |
|
Markus
|
dd3b5c129c
|
Feature: Dynamic document storage pathes (#916)
* Added devcontainer
* Add feature storage pathes
* Exclude tests and add versioning
* Check escaping
* Check escaping
* Check quoting
* Echo
* Escape
* Escape :
* Double escape \
* Escaping
* Remove if
* Escape colon
* Missing \
* Escape :
* Escape all
* test
* Remove sed
* Fix exclude
* Remove SED command
* Add LD_LIBRARY_PATH
* Adjusted to v1.7
* Updated test-cases
* Remove devcontainer
* Removed internal build-file
* Run pre-commit
* Corrected flake8 error
* Adjusted to v1.7
* Updated test-cases
* Corrected flake8 error
* Adjusted to new plural translations
* Small adjustments due to code-review backend
* Adjusted line-break
* Removed PAPERLESS prefix from settings variables
* Corrected style change due to search+replace
* First documentation draft
* Revert changes to Pipfile
* Add sphinx-autobuild with keep-outdated
* Revert merge error that results in wrong storage path is evaluated
* Adjust styles of generated files ...
* Adds additional testing to cover dynamic storage path functionality
* Remove unnecessary condition
* Add hint to edit storage path dialog
* Correct spelling of pathes to paths
* Minor documentation tweaks
* Minor typo
* improving wrapping of filter editor buttons with new storage path button
* Update .gitignore
* Fix select border radius in non input-groups
* Better storage path edit hint
* Add note to edit storage path dialog re document_renamer
* Add note to bulk edit storage path re document_renamer
* Rename FILTER_STORAGE_DIRECTORY to PATH
* Fix broken filter rule parsing
* Show default storage if unspecified
* Remove note re storage path on bulk edit
* Add basic validation of filename variables
Co-authored-by: Markus Kling <markus@markus-kling.net>
Co-authored-by: Trenton Holmes <holmes.trenton@gmail.com>
Co-authored-by: Michael Shamoon <4887959+shamoon@users.noreply.github.com>
Co-authored-by: Quinn Casey <quinn@quinncasey.com>
|
2022-05-19 14:42:25 -07:00 |
|
Trenton Holmes
|
f62193099c
|
Runs pyupgrade to Python 3.8+ and adds a hook for it
|
2022-05-06 09:04:08 -07:00 |
|
Trenton Holmes
|
e3f8531c2d
|
Un-pickle and re-pickle the test models to resolve the version difference warning
|
2022-03-22 09:37:17 +01:00 |
|
Johann Bauer
|
5efa551946
|
Fix model test
|
2022-03-21 18:53:53 +01:00 |
|
Johann Bauer
|
9ceae3e0db
|
Increase FORMAT_VERSION to force model re-creation
|
2022-03-21 18:11:18 +01:00 |
|
Trenton Holmes
|
6635fa5f0d
|
Runs the pre-commit hooks over all the Python files
|
2022-03-11 11:34:28 -08:00 |
|
kpj
|
c56cb25b5f
|
Format Python code with black
|
2022-02-27 15:26:41 +01:00 |
|
jonaswinkler
|
ddd9ac9a07
|
write classifier model to temporary file before copying to final location
|
2021-06-13 12:03:20 +02:00 |
|
jonaswinkler
|
ac9bd6c908
|
better exception handling
|
2021-05-19 23:11:24 +02:00 |
|
jonaswinkler
|
0f960755ae
|
catch another exception regarding classifier loading
|
2021-05-19 22:57:52 +02:00 |
|
Jonas Winkler
|
dc565bd035
|
correct file mode
|
2021-05-16 01:22:51 +02:00 |
|
jonaswinkler
|
e4655866f3
|
fixes #689
|
2021-03-03 23:35:26 +01:00 |
|
jonaswinkler
|
dac21862fe
|
load sklearn modules only when training data has changed
|
2021-02-15 11:25:25 +01:00 |
|
jonaswinkler
|
c946263f31
|
revert a faulty change that caused memory usage to explode #537
|
2021-02-13 19:51:04 +01:00 |
|
jonaswinkler
|
555e37958f
|
better exception logging
|
2021-02-11 22:16:41 +01:00 |
|
jonaswinkler
|
85366024ec
|
classifier cache timeout
|
2021-02-06 21:03:32 +01:00 |
|
jonaswinkler
|
a4c1252a3b
|
classifier caching
|
2021-02-06 20:54:58 +01:00 |
|
jonaswinkler
|
e5a7dc0cc7
|
rework most of the logging
|
2021-02-05 01:10:29 +01:00 |
|
jonaswinkler
|
d08a530701
|
don't load sklearn libraries unless needed
|
2021-02-04 15:15:11 +01:00 |
|
jonaswinkler
|
3461e6f354
|
pycodestyle
|
2021-01-30 15:22:51 +01:00 |
|
jonaswinkler
|
a37e41ef0c
|
centralized classifier loading, better error handling, no error messages when auto matching is not used
|
2021-01-30 14:22:23 +01:00 |
|
jonaswinkler
|
0bc68d7d1a
|
more tests and bugfixes.
|
2020-11-27 15:36:32 +01:00 |
|
Jonas Winkler
|
c4f5f640ee
|
tests for the classifier and fixes for edge cases with minimal data.
|
2020-11-26 14:18:34 +01:00 |
|
Jonas Winkler
|
a532200d10
|
code cleanup
|
2020-11-21 15:34:00 +01:00 |
|
Jonas Winkler
|
eb6805e37e
|
code style fixes
|
2020-11-12 21:09:45 +01:00 |
|
Jonas Winkler
|
1c50b7693d
|
fixes #31
|
2020-11-12 10:04:01 +01:00 |
|
Jonas Winkler
|
33f1c82943
|
updated the classifier. It's now much faster and does not retrain when data hasn't changed.
|
2020-11-06 14:46:06 +01:00 |
|
Jonas Winkler
|
9a4ff3f807
|
replaced usages of .id with .pk, fixed filename issue in exporter
|
2020-11-03 12:37:37 +01:00 |
|
Jonas Winkler
|
6ce493e3a7
|
the document classifier is now stateless
|
2020-10-29 14:33:42 +01:00 |
|
Jonas Winkler
|
dd16b7262e
|
unified document matching, legacy and automatching work alongside now
|
2020-10-28 11:45:11 +01:00 |
|
Jonas Winkler
|
b71657964b
|
Code style changes
|
2018-09-26 10:51:42 +02:00 |
|
Jonas Winkler
|
efc7bf1d23
|
Code style adjustments
|
2018-09-25 16:09:33 +02:00 |
|
Jonas Winkler
|
20233a1706
|
Code style changed
|
2018-09-13 14:15:16 +02:00 |
|
Jonas Winkler
|
35ea0f2add
|
Merge branch 'machine-learning' into dev
|
2018-09-11 14:36:21 +02:00 |
|