130 Commits

Author SHA1 Message Date
jonaswinkler
f373211281 tests for pre and post consume script 2021-01-06 14:08:44 +01:00
jonaswinkler
e07128a145 don't run post-consume script inside transaction #259 2021-01-04 00:03:31 +01:00
jonaswinkler
4b74cd5677 fix #236 2021-01-01 23:27:55 +01:00
jonaswinkler
40ef375c15 supply file_name for tika parser 2021-01-01 22:19:43 +01:00
jonaswinkler
208f4291d6 fixes #222 2020-12-30 21:54:36 +01:00
jonaswinkler
99b5e25731 clarify error messages #176 2020-12-25 19:26:27 +01:00
jonaswinkler
cffe9fa354 debug log mime type #176 2020-12-23 15:22:28 +01:00
jonaswinkler
46b0776714 fix typo #176 2020-12-23 15:14:24 +01:00
jonaswinkler
7f9a0204b5 removed most of the logic that extracts data from filename patterns #156 2020-12-20 00:08:05 +01:00
jonaswinkler
6003122b06 fixes #112 2020-12-09 22:16:57 +01:00
jonaswinkler
74a99cf330 removed slugs entirely, since their only purpose was purely cosmetic anyway. 2020-12-09 00:04:37 +01:00
jonaswinkler
550a74347c a test that "verifies" that the file renaming lock works and no inconsistencies are created. 2020-12-08 21:08:44 +01:00
jonaswinkler
9da11f29c7 fixes #90 2020-12-08 13:54:49 +01:00
jonaswinkler
834352130c checking file types against parsers in the consumer. 2020-12-01 15:26:05 +01:00
jonaswinkler
8a5c782425 filename handling for archive files. 2020-11-30 21:38:42 +01:00
jonaswinkler
aaa6599283 Merge branch 'dev' into feature-ocrmypdf 2020-11-30 16:48:09 +01:00
jonaswinkler
f51207fc32 added file type checks to the parsers to prevent temporary files from being consumed. Also: parsers announce file types they wish to use as default for each mime type. 2020-11-30 00:40:04 +01:00
jonaswinkler
39c682dc07 Merge branch 'dev' into feature-ocrmypdf 2020-11-29 18:37:38 +01:00
jonaswinkler
023aeea7ea test cases for #67 2020-11-29 15:47:56 +01:00
jonaswinkler
a27daaebe9 fixes an issue with paperless not assigning metadata when FILENAME_FORMAT is specified and resolves an invalid warning about missing files fixes #67 2020-11-29 14:45:43 +01:00
jonaswinkler
9677631bb2 error logging. 2020-11-29 12:37:11 +01:00
jonaswinkler
24767f62c7 added checksums for archived documents. 2020-11-29 12:31:26 +01:00
jonaswinkler
ea9de1bcf1 Merge branch 'dev' into feature-ocrmypdf 2020-11-27 14:03:19 +01:00
jonaswinkler
d04b54140c moved consumption dir check into the correct spot 2020-11-27 13:12:13 +01:00
Jonas Winkler
7e84863beb Merge branch 'dev' into feature-ocrmypdf 2020-11-25 21:13:02 +01:00
Jonas Winkler
ef15de18a9 Paperless will continue to operate with encrypted files, however, all new files will be stored unencrypted. 2020-11-25 21:03:06 +01:00
Jonas Winkler
6f30ceea38 GnuPG for archive file. 2020-11-25 20:16:27 +01:00
Jonas Winkler
9bd0bee2f6 codestyle 2020-11-25 19:51:02 +01:00
Jonas Winkler
df801d17e1 reworked the interface of the parsers. 2020-11-25 19:36:39 +01:00
Jonas Winkler
8069c2eb6a add support for archive files. 2020-11-25 14:47:17 +01:00
Jonas Winkler
9a33f191a7 added archive directory. 2020-11-25 14:45:21 +01:00
Jonas Winkler
b44f8383e4 code cleanup 2020-11-21 14:03:45 +01:00
Jonas Winkler
3d5b66c2b7 FileType does not care about the extension anymore. 2020-11-20 16:18:59 +01:00
Jonas Winkler
41650f20f4 mime type handling 2020-11-20 13:31:03 +01:00
Jonas Winkler
727f86c369 codestyle 2020-11-18 22:41:14 +01:00
Jonas Winkler
8908bc259e updated logging, logging for the mail consumer to see whats happening 2020-11-18 13:23:30 +01:00
Jonas Winkler
c7c6be42be refactor 2020-11-17 11:49:44 +01:00
Jonas Winkler
70d8e8bc56 added more testing 2020-11-16 23:16:37 +01:00
Jonas Winkler
8dca459573 first version of the new consumer. 2020-11-16 18:26:54 +01:00
Jonas Winkler
2e04ba1c04 code style fixes 2020-11-12 21:09:45 +01:00
Jonas Winkler
734da28b69 fixed the file handling implementation. The feature is cool, but the original implementation had so many small flaws it wasn't even funny. 2020-11-11 14:21:33 +01:00
Jonas Winkler
02ef7cb038 small consumer fixes 2020-11-11 14:14:21 +01:00
Jonas Winkler
83f82f3caf added a setting: delete duplicate documents 2020-11-10 01:47:58 +01:00
Jonas Winkler
296c113b16 updated the classifier. Its now much faster and does not retrain when data hasnt changed. 2020-11-06 14:46:06 +01:00
Jonas Winkler
f4cebda085 A handy script to redo ocr on all documents, 2020-11-03 14:04:11 +01:00
Jonas Winkler
7d282a4e4e removed unused code, small fixes 2020-11-02 18:20:04 +01:00
Jonas Winkler
d15405ef56 reworked most of the tesseract parser, better logging 2020-11-02 15:40:44 +01:00
Jonas Winkler
9f29dc2863 updated consumer: now using watchdog 2020-11-01 23:07:54 +01:00
Jonas Winkler
05f20c19c3 the document classifier is now stateless 2020-10-29 14:33:42 +01:00
Jonas Winkler
11af74ba36 unified document matching, legacy and automatching work alongside now 2020-10-28 11:45:11 +01:00