Jonas Winkler
9bd0bee2f6
codestyle
2020-11-25 19:51:02 +01:00
Jonas Winkler
df801d17e1
reworked the interface of the parsers.
2020-11-25 19:36:39 +01:00
Jonas Winkler
8069c2eb6a
add support for archive files.
2020-11-25 14:47:17 +01:00
Jonas Winkler
9a33f191a7
added archive directory.
2020-11-25 14:45:21 +01:00
Jonas Winkler
d252a1dcda
Merge branch 'dev' into celery-tasks
2020-11-22 22:49:37 +01:00
Jonas Winkler
b44f8383e4
code cleanup
2020-11-21 14:03:45 +01:00
Jonas Winkler
3d5b66c2b7
FileType does not care about the extension anymore.
2020-11-20 16:18:59 +01:00
Jonas Winkler
41650f20f4
mime type handling
2020-11-20 13:31:03 +01:00
Jonas Winkler
391020a2b0
small changes
2020-11-20 10:58:17 +01:00
Jonas Winkler
17430210a1
Merge branch 'dev' into celery-tasks
2020-11-19 22:10:57 +01:00
Jonas Winkler
727f86c369
codestyle
2020-11-18 22:41:14 +01:00
Jonas Winkler
8908bc259e
updated logging, logging for the mail consumer to see what's happening
2020-11-18 13:23:30 +01:00
Jonas Winkler
c7c6be42be
refactor
2020-11-17 11:49:44 +01:00
Jonas Winkler
70d8e8bc56
added more testing
2020-11-16 23:16:37 +01:00
Jonas Winkler
8dca459573
first version of the new consumer.
2020-11-16 18:26:54 +01:00
Jonas Winkler
2e04ba1c04
code style fixes
2020-11-12 21:09:45 +01:00
Jonas Winkler
734da28b69
fixed the file handling implementation. The feature is cool, but the original implementation had so many small flaws it wasn't even funny.
2020-11-11 14:21:33 +01:00
Jonas Winkler
02ef7cb038
small consumer fixes
2020-11-11 14:14:21 +01:00
Jonas Winkler
83f82f3caf
added a setting: delete duplicate documents
2020-11-10 01:47:58 +01:00
Jonas Winkler
572e40ca27
backend that supports asgi and status update sockets with channels
2020-11-07 11:31:04 +01:00
Jonas Winkler
296c113b16
updated the classifier. It's now much faster and does not retrain when data hasn't changed.
2020-11-06 14:46:06 +01:00
Jonas Winkler
f4cebda085
A handy script to redo ocr on all documents,
2020-11-03 14:04:11 +01:00
Jonas Winkler
7d282a4e4e
removed unused code, small fixes
2020-11-02 18:20:04 +01:00
Jonas Winkler
d15405ef56
reworked most of the tesseract parser, better logging
2020-11-02 15:40:44 +01:00
Jonas Winkler
9f29dc2863
updated consumer: now using watchdog
2020-11-01 23:07:54 +01:00
Jonas Winkler
05f20c19c3
the document classifier is now stateless
2020-10-29 14:33:42 +01:00
Jonas Winkler
11af74ba36
unified document matching, legacy and automatching work alongside now
2020-10-28 11:45:11 +01:00
Jonas Winkler
052c1680f3
added
- document index
- api access for thumbnails/downloads
- more api filters
updated
- pipfile
removed
- filename handling
- legacy thumb/download access
- obsolete admin gui settings (per page items, FY, inline view)
2020-10-25 23:03:02 +01:00
Jonas Winkler
421dab786d
Merge branch 'master' into dev
2020-10-16 15:02:57 +02:00
JOKer
8698f92ac9
Merge pull request #593 from BastianPoe/feature-293
Give stored documents a structured and configurable filename
2020-05-02 08:33:49 +02:00
Johann Bauer
22c7f309a7
Warn if consume directory contains subdirectories
.
2020-01-04 01:09:54 +01:00
Wolf-Bastian Poettner
6813805712
Allows to configure directory and filename formats for documents stored in paperless
Default configuration is as before (incrementing numbers), but additional fields can be added at will
2019-12-27 14:25:38 +00:00
Jonas Winkler
ea58c66fd4
Merge branch 'master' into dev
2018-12-11 12:38:15 +01:00
Jonas Winkler
766109ae4e
Merge remote-tracking branch 'upstream/master'
2018-12-11 12:06:15 +01:00
Daniel Quinn
750ab5bf85
Use optipng to optimise document thumbnails
2018-10-07 14:56:38 +01:00
Daniel Quinn
14bb52b6a4
Wrap document consumption in a transaction #262
2018-10-07 13:12:22 +01:00
Jonas Winkler
b347e3347d
Restored tagging functionality
2018-09-27 20:41:16 +02:00
Jonas Winkler
11adc94e5e
mode change
2018-09-06 12:00:01 +02:00
Jonas Winkler
70bd05450a
removed matching model fields, automatic classifier reloading, added autmatic_classification field to matching model
2018-09-04 18:40:26 +02:00
Erik Arvstedt
742b01d1f5
Update Consumer class documentation
2018-06-17 20:17:40 +01:00
Daniel Quinn
90cd9f3eb7
Drop lines thanks to @erikarvstedt's eagle-eye
2018-06-17 17:10:45 +01:00
Daniel Quinn
c9f35a7da2
Merge branch 'master' into mcronce-disable_encryption
2018-06-17 16:32:51 +01:00
Daniel Quinn
81a8cb45d7
It's exist_ok=, not exists_ok= -- my bad.
2018-05-28 13:08:00 +01:00
Daniel Quinn
6e1f2b3f03
Drop STORAGE_TYPE in favour of just using PAPERLESS_PASSPHRASE
2018-05-28 12:58:28 +01:00
Daniel Quinn
d8740ee5ca
Make the consumer aware of the different storage types
2018-05-28 12:58:28 +01:00
Erik Arvstedt
bccac5017c
fixup: remove helper fn 'make_dirs'
2018-05-21 00:45:00 +02:00
Erik Arvstedt
e65e27d11f
Consider mtime of ignored files, garbage-collect ignore list
1. Store the mtime of ignored files so that we can reconsider them if
they have changed.
2. Regularly reset the ignore list to files that still exist in the
consumption dir. Previously, the list could grow indefinitely.
2018-05-11 14:05:30 +02:00
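The ignore-list behaviour described in e65e27d11f can be sketched as follows. The idea is to remember each ignored file's mtime, reconsider the file once that mtime changes, and drop entries for files that have left the consumption directory. This is a minimal illustration and not the paperless consumer code. The class `IgnoreList` and its methods are hypothetical names, and pruning with `os.path.isfile` on the stored paths assumes that every ignored path lives inside the consumption directory.

```python
import os
from typing import Dict


class IgnoreList:
    """Hypothetical sketch (not the paperless implementation): remember
    ignored files together with the mtime they had when ignored."""

    def __init__(self) -> None:
        # Maps path -> mtime recorded at the moment the file was ignored.
        self._ignored: Dict[str, float] = {}

    def __len__(self) -> int:
        return len(self._ignored)

    def ignore(self, path: str) -> None:
        self._ignored[path] = os.path.getmtime(path)

    def is_ignored(self, path: str) -> bool:
        # A file stays ignored only while its mtime is unchanged; once it
        # has been modified it is reconsidered for consumption.
        recorded = self._ignored.get(path)
        if recorded is None:
            return False
        try:
            return os.path.getmtime(path) == recorded
        except OSError:
            # The file is gone, so there is nothing left to ignore.
            return False

    def collect_garbage(self) -> None:
        # Keep only entries whose files still exist, so the list cannot grow
        # indefinitely. Assumes stored paths all lie in the consumption dir.
        self._ignored = {
            path: mtime
            for path, mtime in self._ignored.items()
            if os.path.isfile(path)
        }
```

Storing the mtime alongside the path is what lets a rewritten file escape the ignore list without any extra bookkeeping.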
Erik Arvstedt
12488c9634
Simplify ignoring docs
2018-05-11 14:05:29 +02:00
Erik Arvstedt
61cd050e24
Ensure docs have been unmodified for some time before consuming
Previously, the second mtime check for new files usually happened right
after the first one, which could have caused consumption of docs that
were still being modified.
We're now waiting for at least FILES_MIN_UNMODIFIED_DURATION (0.5s).
This also cleans up the logic by eliminating the consumer.stats attribute
and the weird double call to consumer.run().
Additionally, this fixes a memory leak in consumer.stats where paths could be
added but never removed if the corresponding files disappeared from
the consumer dir before being considered ready.
2018-05-11 14:05:29 +02:00
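The readiness rule from 61cd050e24 boils down to checking how long ago a file was last modified. A minimal, stateless sketch follows. Only `FILES_MIN_UNMODIFIED_DURATION` and its 0.5s value come from the commit message. The helper `file_is_ready` and its `now` parameter are hypothetical, and the actual consumer logic may be structured differently.

```python
import os
import time
from typing import Optional

# Value quoted in the commit message; modelled here as a module constant.
FILES_MIN_UNMODIFIED_DURATION = 0.5  # seconds


def file_is_ready(path: str, now: Optional[float] = None) -> bool:
    """Hypothetical helper: True once `path` has not been modified for at
    least FILES_MIN_UNMODIFIED_DURATION seconds."""
    if now is None:
        now = time.time()
    try:
        mtime = os.path.getmtime(path)
    except FileNotFoundError:
        # The file disappeared before becoming ready; report it as not ready.
        return False
    return now - mtime >= FILES_MIN_UNMODIFIED_DURATION
```

Because the helper keeps no per-path state between calls, a file that vanishes before it becomes ready simply stops being reported. The commit message describes this kind of leftover entry as the memory leak in the old consumer.stats bookkeeping.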
Erik Arvstedt
f018e8e54f
Refactor: extract fn try_consume_file
The main purpose of this change is to make the following commits more
readable.
2018-05-11 14:05:28 +02:00