Jonas Winkler
8c40c54421
codestyle
2020-11-18 22:41:14 +01:00
Jonas Winkler
680ab3d56b
updated logging, added logging for the mail consumer to see what's happening
2020-11-18 13:23:30 +01:00
Jonas Winkler
39ba14aac1
refactor
2020-11-17 11:49:44 +01:00
Jonas Winkler
e30f0b274b
added more testing
2020-11-16 23:16:37 +01:00
Jonas Winkler
bd04c966c5
first version of the new consumer.
2020-11-16 18:26:54 +01:00
Jonas Winkler
eb6805e37e
code style fixes
2020-11-12 21:09:45 +01:00
Jonas Winkler
8b8a2af053
fixed the file handling implementation. The feature is cool, but the original implementation had so many small flaws it wasn't even funny.
2020-11-11 14:21:33 +01:00
Jonas Winkler
a91e46364a
small consumer fixes
2020-11-11 14:14:21 +01:00
Jonas Winkler
3048342de7
added a setting: delete duplicate documents
2020-11-10 01:47:58 +01:00
Jonas Winkler
d46203c114
backend that supports ASGI and status update sockets with Channels
2020-11-07 11:31:04 +01:00
Jonas Winkler
33f1c82943
updated the classifier. It's now much faster and does not retrain when data hasn't changed.
2020-11-06 14:46:06 +01:00
Jonas Winkler
9757e261f2
A handy script to redo OCR on all documents.
2020-11-03 14:04:11 +01:00
Jonas Winkler
a89773ad71
removed unused code, small fixes
2020-11-02 18:20:04 +01:00
Jonas Winkler
def3a85858
reworked most of the tesseract parser, better logging
2020-11-02 15:40:44 +01:00
Jonas Winkler
6fd73a04b8
updated consumer: now using watchdog
2020-11-01 23:07:54 +01:00
Jonas Winkler
6ce493e3a7
the document classifier is now stateless
2020-10-29 14:33:42 +01:00
Jonas Winkler
dd16b7262e
unified document matching, legacy matching and automatic matching now work alongside each other
2020-10-28 11:45:11 +01:00
Jonas Winkler
93d963ed4e
added
- document index
- api access for thumbnails/downloads
- more api filters
updated
- pipfile
removed
- filename handling
- legacy thumb/download access
- obsolete admin gui settings (per page items, FY, inline view)
2020-10-25 23:03:02 +01:00
Jonas Winkler
b71049ad16
Merge branch 'master' into dev
2020-10-16 15:02:57 +02:00
JOKer
5f8120add1
Merge pull request #593 from BastianPoe/feature-293
Give stored documents a structured and configurable filename
2020-05-02 08:33:49 +02:00
Johann Bauer
cea6dcce23
Warn if consume directory contains subdirectories
.
2020-01-04 01:09:54 +01:00
Wolf-Bastian Poettner
d1a54d6576
Allow configuring directory and filename formats for documents stored in paperless
Default configuration is as before (incrementing numbers), but additional fields can be added at will
2019-12-27 14:25:38 +00:00
Jonas Winkler
f711b146e1
Merge branch 'master' into dev
2018-12-11 12:38:15 +01:00
Jonas Winkler
8f0d53c54a
Merge remote-tracking branch 'upstream/master'
2018-12-11 12:06:15 +01:00
Daniel Quinn
bc898c1992
Use optipng to optimise document thumbnails
2018-10-07 14:56:38 +01:00
Daniel Quinn
40b9e44bfe
Wrap document consumption in a transaction #262
2018-10-07 13:12:22 +01:00
Jonas Winkler
001a80a528
Restored tagging functionality
2018-09-27 20:41:16 +02:00
Jonas Winkler
1c8576cfb9
mode change
2018-09-06 12:00:01 +02:00
Jonas Winkler
9d4155a907
removed matching model fields, automatic classifier reloading, added automatic_classification field to matching model
2018-09-04 18:40:26 +02:00
Erik Arvstedt
88a05947f7
Update Consumer class documentation
2018-06-17 20:17:40 +01:00
Daniel Quinn
d566a014d2
Drop lines thanks to @erikarvstedt's eagle-eye
2018-06-17 17:10:45 +01:00
Daniel Quinn
e7fefc40fe
Merge branch 'master' into mcronce-disable_encryption
2018-06-17 16:32:51 +01:00
Daniel Quinn
d1b6e9329f
It's exist_ok=, not exists_ok= -- my bad.
2018-05-28 13:08:00 +01:00
Daniel Quinn
a9382ffd1a
Drop STORAGE_TYPE in favour of just using PAPERLESS_PASSPHRASE
2018-05-28 12:58:28 +01:00
Daniel Quinn
92d9506a2e
Make the consumer aware of the different storage types
2018-05-28 12:58:28 +01:00
Erik Arvstedt
d132e2b9f5
fixup: remove helper fn 'make_dirs'
2018-05-21 00:45:00 +02:00
Erik Arvstedt
8b37af994a
Consider mtime of ignored files, garbage-collect ignore list
1. Store the mtime of ignored files so that we can reconsider them if
they have changed.
2. Regularly reset the ignore list to files that still exist in the
consumption dir. Previously, the list could grow indefinitely.
2018-05-11 14:05:30 +02:00
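The two rules described in this commit message can be illustrated with a small Python sketch. It assumes each file is identified by a (path, mtime) pair, as the message implies; the IgnoreList class and its method names are hypothetical and not taken from the paperless code.

```python
class IgnoreList:
    """Remember files that failed to consume, keyed by (path, mtime).

    Hypothetical sketch of the two rules in the commit message,
    not the actual paperless implementation.
    """

    def __init__(self):
        self._ignored = set()

    def ignore(self, path, mtime):
        self._ignored.add((path, mtime))

    def is_ignored(self, path, mtime):
        # Rule 1: a modified file has a new mtime, hence a new key,
        # so it is no longer ignored and gets reconsidered.
        return (path, mtime) in self._ignored

    def garbage_collect(self, current_files):
        """Rule 2: keep only entries that still match a file on disk.

        current_files is an iterable of (path, mtime) pairs from the
        latest scan of the consumption directory.
        """
        self._ignored &= set(current_files)
```

Keying on the mtime as well as the path means no explicit "file changed" check is needed, and intersecting with the latest scan bounds the list by the number of files currently present.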
Erik Arvstedt
cc22204e5a
Simplify ignoring docs
2018-05-11 14:05:29 +02:00
Erik Arvstedt
f56ec70aad
Ensure docs have been unmodified for some time before consuming
Previously, the second mtime check for new files usually happened right
after the first one, which could have caused consumption of docs that
were still being modified.
We're now waiting for at least FILES_MIN_UNMODIFIED_DURATION (0.5s).
This also cleans up the logic by eliminating the consumer.stats attribute
and the weird double call to consumer.run().
Additionally, this fixes a memory leak in consumer.stats where paths could be
added but never removed if the corresponding files disappeared from
the consumer dir before being considered ready.
2018-05-11 14:05:29 +02:00
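A minimal Python sketch of the "unmodified for some time" check described above: snapshot each file's mtime, wait at least FILES_MIN_UNMODIFIED_DURATION, then keep only files whose mtime is unchanged. The function name unmodified_files is hypothetical, and this is an illustration of the idea rather than the code from the commit. Because the snapshot is a local variable, paths of files that disappear are simply dropped instead of accumulating.

```python
import os
import time

# The 0.5 s threshold comes from the commit message; the constant is
# reused here for illustration only.
FILES_MIN_UNMODIFIED_DURATION = 0.5


def unmodified_files(paths, min_unmodified=FILES_MIN_UNMODIFIED_DURATION):
    """Return the paths whose mtime did not change while we waited.

    Hypothetical helper: snapshot mtimes, sleep, then re-check.
    """
    snapshot = {}
    for path in paths:
        try:
            snapshot[path] = os.stat(path).st_mtime
        except FileNotFoundError:
            pass  # vanished before the first check; nothing to remember
    # Wait long enough that a writer still working on a file would
    # very likely have updated its mtime in the meantime.
    time.sleep(min_unmodified)
    ready = []
    for path, mtime in snapshot.items():
        try:
            if os.stat(path).st_mtime == mtime:
                ready.append(path)
        except FileNotFoundError:
            pass  # vanished while we waited; dropped, not leaked
    return ready
```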
Erik Arvstedt
0db6ed225b
Refactor: extract fn try_consume_file
The main purpose of this change is to make the following commits more
readable.
2018-05-11 14:05:28 +02:00
Erik Arvstedt
312a6a91b5
Use os.scandir instead of os.listdir
It's simpler and better suited for use cases introduced in later commits.
2018-05-11 14:05:25 +02:00
Erik Arvstedt
2c64e70754
Consume documents in order of increasing mtime
This increases overall usability, especially for multi-page scans.
Previously, the consumption order was undefined (see os.listdir()).
2018-05-11 14:04:37 +02:00
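Combining this commit with the switch to os.scandir, ordering by increasing mtime can be sketched in Python as below. The helper name files_by_mtime is hypothetical; this is a sketch of the technique, not the paperless implementation. Ties in mtime are broken by path so the order stays deterministic.

```python
import os


def files_by_mtime(consume_dir):
    """Return regular files in consume_dir, oldest modification first.

    Hypothetical helper illustrating "consume in order of increasing mtime".
    """
    # os.scandir yields DirEntry objects; DirEntry.stat() caches its
    # result on the entry, and is_file() skips subdirectories.
    with os.scandir(consume_dir) as entries:
        files = [(e.stat().st_mtime, e.path) for e in entries if e.is_file()]
    files.sort()  # sort by mtime, then by path for equal mtimes
    return [path for _, path in files]
```

For multi-page scans saved one file per page, this makes the page scanned first the one consumed first.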
Erik Arvstedt
9320230100
Refactor: extract fn 'make_dirs'
2018-05-11 14:04:36 +02:00
Daniel Quinn
13452ba33b
Clean up docstring to be properly rst
2018-03-03 18:43:20 +00:00
Ovv
b10c2c770c
style & test
2018-03-03 18:43:20 +00:00
Ovv
d89dbbe537
Configuration cli argument for document_consumer
2018-03-03 18:43:20 +00:00
Daniel Quinn
4f726e1991
Monitor return codes of calls to convert and unpaper
...and handle the failures nicely. Addresses #303.
2018-02-18 16:02:27 +00:00
Daniel Quinn
caf44146db
Style and removal of Python 2.7 stuff
2018-02-18 15:55:55 +00:00
Wolf-Bastian Pöttner
21fc51c09a
Add support for a heuristic that extracts the document date from its text
2018-01-28 19:37:10 +01:00
Daniel Quinn
e7d4ca92ba
fix: allow for caps in file name suffixes #206
@schinkelg ran afoul of this one and I took the opportunity to add a
test to catch this sort of thing for next time.
2017-03-28 21:14:24 +00:00
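The fix for #206 amounts to comparing file suffixes case-insensitively. A minimal Python sketch of that idea follows; the SUPPORTED_SUFFIXES set and the is_supported function are hypothetical and do not reflect the exact list of formats paperless accepted at the time.

```python
from pathlib import Path

# Hypothetical set of accepted suffixes, for illustration only.
SUPPORTED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def is_supported(filename):
    """Match the suffix case-insensitively, so "SCAN.PDF" is accepted."""
    # Path.suffix keeps the original case; lower() normalises it.
    return Path(filename).suffix.lower() in SUPPORTED_SUFFIXES
```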