57 Commits

Author SHA1 Message Date
jonaswinkler
57f77c4657 fix test case 2021-01-02 15:52:02 +01:00
jonaswinkler
4b74cd5677 fix #236 2021-01-01 23:27:55 +01:00
jonaswinkler
40ef375c15 supply file_name for tika parser 2021-01-01 22:19:43 +01:00
jonaswinkler
fb83069975 fix test case. 2020-12-27 14:50:57 +01:00
jonaswinkler
7f9a0204b5 removed most of the logic that extracts data from filename patterns #156 2020-12-20 00:08:05 +01:00
jonaswinkler
32224f187d test CONSUMER_DELETE_DUPLICATES 2020-12-20 00:06:33 +01:00
jonaswinkler
74a99cf330 removed slugs entirely, since their only purpose was purely cosmetic anyway. 2020-12-09 00:04:37 +01:00
jonaswinkler
9da11f29c7 fixes #90 2020-12-08 13:54:49 +01:00
jonaswinkler
a079c310b4 changes to filename generation, partially addresses #90 2020-12-06 16:13:37 +01:00
jonaswinkler
834352130c checking file types against parsers in the consumer. 2020-12-01 15:26:05 +01:00
jonaswinkler
24b8c358cc Merge branch 'dev' into feature-ocrmypdf 2020-11-30 23:53:19 +01:00
jonaswinkler
e431a658cc more testing. 2020-11-30 22:04:25 +01:00
jonaswinkler
aaa6599283 Merge branch 'dev' into feature-ocrmypdf 2020-11-30 16:48:09 +01:00
jonaswinkler
f51207fc32 added file type checks to the parsers to prevent temporary files from being consumed. Also: parsers announce file types they wish to use as default for each mime type. 2020-11-30 00:40:04 +01:00
jonaswinkler
a3143ec512 more tests! 2020-11-29 19:22:49 +01:00
jonaswinkler
39c682dc07 Merge branch 'dev' into feature-ocrmypdf 2020-11-29 18:37:38 +01:00
jonaswinkler
023aeea7ea test cases for #67 2020-11-29 15:47:56 +01:00
jonaswinkler
ea9de1bcf1 Merge branch 'dev' into feature-ocrmypdf 2020-11-27 14:03:19 +01:00
jonaswinkler
6834e563a8 refactored the test cases to use a mixin for setting up temporary directories. 2020-11-27 14:00:41 +01:00
jonaswinkler
d04b54140c moved consumption dir check into the correct spot 2020-11-27 13:12:13 +01:00
jonaswinkler
8bcc40a182 Pipfile.lock post merge 2020-11-27 00:10:40 +01:00
jonaswinkler
24381ad5dc Merge branch 'dev' into feature-ocrmypdf 2020-11-27 00:06:20 +01:00
jonaswinkler
4bf0d834a0 improved test cases. Python 3.6 compatibility. 2020-11-26 22:17:14 +01:00
Jonas Winkler
f51d2be303 fixed the test cases 2020-11-25 19:51:09 +01:00
Jonas Winkler
3d5b66c2b7 FileType does not care about the extension anymore. 2020-11-20 16:18:59 +01:00
Jonas Winkler
41650f20f4 mime type handling 2020-11-20 13:31:03 +01:00
Jonas Winkler
727f86c369 codestyle 2020-11-18 22:41:14 +01:00
Jonas Winkler
bd322a0ce6 fixed test case. 2020-11-17 18:35:45 +01:00
Jonas Winkler
c7c6be42be refactor 2020-11-17 11:49:44 +01:00
Jonas Winkler
70d8e8bc56 added more testing 2020-11-16 23:16:37 +01:00
Jonas Winkler
6d14e111b6 fixed most of the test cases 2020-11-08 13:49:15 +01:00
Michael Gmelin
4f85d9ed9f Add unit test for PAPERLESS_FILENAME_PARSE_TRANSFORMS feature. 2019-09-08 20:58:13 +02:00
Daniel Quinn
f1e1bb4deb Fix #384: duplicate tags due to case insensitivity 2018-09-02 20:48:51 +01:00
Daniel Quinn
20a4a66a57 Clean up test formatting a bit 2018-04-22 16:28:21 +01:00
Daniel Quinn
7223ea3c3f Don't explode on invalid dates 2018-04-22 16:27:43 +01:00
Ovv
f8c6c07bb7 use tmp dir 2018-03-03 18:43:20 +00:00
Ovv
8fefafb844 style & test 2018-03-03 18:43:20 +00:00
Daniel Quinn
ede274386b Detect .tif files properly
Fixes #232
2017-07-15 19:02:11 +01:00
Daniel Quinn
6af58203dd fix: travis doesn't like my new tests 2017-03-28 21:23:42 +00:00
Daniel Quinn
fa4924d5ba fix: allow for caps in file name suffixes #206
@schinkelg ran aground of this one and I took the opportunity to add a
test to catch this sort of thing for next time.
2017-03-28 21:14:24 +00:00
Daniel Quinn
7611c2b3d5 fix: pep8 + travis & tox env updates 2017-03-25 15:10:51 +00:00
Daniel Quinn
55e81ca4bb feat: refactor for pluggable consumers
I've broken out the OCR-specific code from the consumers and dumped it
all into its own app, `paperless_tesseract`.  This new app should serve
as a sample of how to create one's own consumer for different file
types.

Documentation for how to do this isn't ready yet, but for the impatient:

* Create a new app
    * containing a `parsers.py` for your parser modelled after
      `paperless_tesseract.parsers.RasterisedDocumentParser`
    * containing a `signals.py` with a handler modelled after
      `paperless_tesseract.signals.ConsumerDeclaration`
    * connect the signal handler to
      `documents.signals.document_consumer_declaration` in
      `your_app.apps`
* Install the app into Paperless by declaring
  `PAPERLESS_INSTALLED_APPS=your_app`.  Additional apps should be
  separated with commas.
* Restart the consumer
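
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `Signal` is a stand-in for Django's signal class so the example runs without Django, `PlainTextParser` and `pick_parser` are hypothetical names, and the handler's return shape (a test function that yields a parser class and a weight) is an assumption that the commit message does not specify.

```python
import re


class Signal:
    """Stand-in for django.dispatch.Signal so the sketch runs without Django."""

    def __init__(self):
        self._receivers = []

    def connect(self, receiver):
        self._receivers.append(receiver)

    def send(self, sender, **kwargs):
        # Like Django, return a list of (receiver, response) pairs.
        return [(r, r(sender, **kwargs)) for r in self._receivers]


# Stand-in for documents.signals.document_consumer_declaration.
document_consumer_declaration = Signal()


class PlainTextParser:
    """Hypothetical parser that would live in your_app/parsers.py."""

    def __init__(self, path):
        self.path = path


class ConsumerDeclaration:
    """Hypothetical handler that would live in your_app/signals.py."""

    MATCHING_FILES = re.compile(r"^.*\.txt$")

    @classmethod
    def handle(cls, sender, **kwargs):
        # Assumed contract: hand back a callable that tests a file path.
        return cls.test

    @classmethod
    def test(cls, path):
        # Lower-case the path so upper-case suffixes also match.
        if cls.MATCHING_FILES.match(path.lower()):
            return {"parser": PlainTextParser, "weight": 0}
        return None


# In a real app this connection would be made in your_app.apps,
# typically inside AppConfig.ready().
document_consumer_declaration.connect(ConsumerDeclaration.handle)


def pick_parser(path):
    """Hypothetical consumer-side lookup: ask every declared parser."""
    candidates = []
    for _receiver, test in document_consumer_declaration.send(sender=None):
        result = test(path)
        if result is not None:
            candidates.append(result)
    if not candidates:
        return None
    # Prefer the declaration with the highest weight.
    return max(candidates, key=lambda c: c["weight"])["parser"]
```

With this in place, `pick_parser("Notes.TXT")` would select `PlainTextParser`, while a path no declaration matches, such as `scan.pdf`, would return `None`.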
2017-03-25 15:10:25 +00:00
Daniel Quinn
fddd330e75 Fixed reference to wrong file 2017-01-01 16:40:29 +00:00
Daniel Quinn
6183e1ce5f pep8 2016-11-27 15:10:07 +00:00
Daniel Quinn
18495ce9da Fix for #154
* Added a test with a faked pyocr and tesseract
* Added a catch for pyocr's *other* TesseractError
2016-11-27 15:06:45 +00:00
Daniel Quinn
8e58406881 pep8 corrections 2016-10-26 09:32:59 +00:00
Aleksandr Bogdanov
63de2ca1b0 Collapsing excess whitespace after OCR 2016-10-12 01:46:34 +02:00
Daniel Quinn
1ce76a5486 Actually write the date found in the file name 2016-08-20 18:11:51 +01:00
Daniel Quinn
f5daded930 Fix for #131: delete files on document.delete 2016-08-16 19:13:37 +01:00
Daniel Quinn
0aa0513004 Modifications for support for dates 2016-03-24 19:18:33 +00:00