40 Commits

Author SHA1 Message Date
jonaswinkler
a3334293af more tests 2020-12-19 15:54:13 +01:00
jonaswinkler
45d31f9735 fixes bauerj/paperless_app#23 and most of all other scanner apps out there. 2020-12-12 18:25:15 +01:00
jonaswinkler
1d073d2cfd a couple fixes and more supported image files 2020-12-02 17:39:49 +01:00
jonaswinkler
0fb294d556 testing the new noarchive option. 2020-12-01 14:30:13 +01:00
jonaswinkler
20cc7e3dc0 more tests! 2020-11-29 19:58:48 +01:00
jonaswinkler
99e6906b51 test case fixes. 2020-11-27 14:06:37 +01:00
Jonas Winkler
f901def797 more tests of the new parser 2020-11-26 00:08:23 +01:00
Jonas Winkler
c00c63c639 fixed the test cases 2020-11-25 19:51:09 +01:00
Jonas Winkler
f5656222e2 removed obsolete tests. 2020-11-25 14:51:32 +01:00
Jonas Winkler
f976a0b4ba mime type handling 2020-11-20 13:31:03 +01:00
Jonas Winkler
cbee56ae8c testing the tesseract parser 2020-11-19 20:31:08 +01:00
Jonas Winkler
9a48d6c577 Changed the way parsers are discovered. This also prepares for upcoming changes regarding content types and file types: parsers should declare what they support, and actual file extensions should not be hardcoded everywhere. 2020-11-16 23:53:12 +01:00
Jonas Winkler
eb6805e37e code style fixes 2020-11-12 21:09:45 +01:00
Jonas Winkler
340f9f141f fixed most of the tests 2020-11-02 19:42:23 +01:00
Jonas Winkler
a89773ad71 removed unused code, small fixes 2020-11-02 18:20:04 +01:00
Johannes Wienke
ebcfcea05b Handle dateparser ValueErrors
When parsing dates from the document text or filenames, correctly handle values
errors indicating broken dates. Newly added tests ensure that this handling
works properly.
2020-03-08 18:44:15 +01:00
Johannes Wienke
6531a67940 Remove duplicated date parsing test
The exact same tests existed twice in the file.
2020-03-08 18:26:29 +01:00
Daniel Quinn
e395b0e081 Drop problematic tests
Some tests had differing outcomes depending on the version of Tesseract
installed on the test system.  This lead to a bunch of false test
failures, which lead to people (including me) just ignoring the Travis
results.

This commit removes those tests, and while it reduces our coverage, at
least the results are predictable.
2018-12-30 17:32:45 +00:00
Daniel Quinn
86b0d08377 Use modern languages for sample test files 2018-12-30 14:09:17 +00:00
Erik Arvstedt
f38ac7f62b Fix date test sample image
The previous version of `tests_date_3.png` had too much spacing
between the `0` and the `8` glyphs, which resulted in the year getting
parsed as `200 8` in Tesseract 3.05.00 (+ tessdata 3.04.00).
This caused the date parsing test to fail.
2018-12-02 15:10:21 +01:00
Daniel Quinn
0d59844567 Conform everything to the coding standards
https://paperless.readthedocs.io/en/latest/contributing.html#additional-style-guides
2018-12-01 17:09:12 +00:00
Daniel Quinn
4e186ede0e Merge branch 'ENH_filename_date_parsing' of https://github.com/jat255/paperless into jat255-ENH_filename_date_parsing 2018-12-01 16:57:16 +00:00
Daniel Quinn
9c6b8629a3 Fix language guesses in tests
It turns out that the Lorem ipsum text in the sample files was confuing the language guesser, causing it to think the file was in Catalan and not English or German.
2018-12-01 15:55:59 +00:00
Joshua Taillon
b0326b5a19 Merge branch 'master' of github.com:danielquinn/paperless into ENH_filename_date_parsing 2018-11-15 23:17:59 -05:00
Joshua Taillon
a2422cc529 Add option for parsing of date from filename (and associated tests) 2018-11-15 20:32:15 -05:00
Joshua Taillon
8b69aa1e52 Update date tests to be more explicit with settings and allow tests to pass if using a timezone other than UTC 2018-11-15 20:30:23 -05:00
Daniel Quinn
074609e1fc Consolidate get_date onto the DocumentParser parent class 2018-10-07 14:56:02 +01:00
Daniel Quinn
0a4338143a Tweak the date guesser to not allow dates prior to 1900 (#414) 2018-10-01 20:03:47 +01:00
Erik Arvstedt
4fa9ff60fc Stop tests from writing to the source tree 2018-07-19 23:48:23 +02:00
Wolf-Bastian Pöttner
328330eb08 Increase testcoverage by testing two more date detection cases 2018-02-19 21:36:48 +01:00
Daniel Quinn
fc6d2d5e0c Fix formatting 2018-02-18 18:00:34 +00:00
Daniel Quinn
9e26e7b39e Fix tests to use _text instead of TEXT_CACHE 2018-02-18 18:00:22 +00:00
Daniel Quinn
7c5ca5f505 Merge pull request #302 from BastianPoe/bugfix/extend_regex_to_find_more_dates
Extends the regex to find dates in documents as reported by @isaacsando
2018-02-18 17:23:49 +01:00
Wolf-Bastian Pöttner
fc81feb32e Add more (fast-running) unit tests 2018-02-14 21:41:01 +01:00
Wolf-Bastian Pöttner
3899763261 Extends the regex to find dates in documents as reported by @isaacsando 2018-02-12 22:41:15 +01:00
Daniel Quinn
4c0b908a41 Rework tests to write to /tmp
Originally the test wrote scratch data inside the repo dir, which meant
manual cleanup.  Now it writes to `/tmp/paperless-tests-<random-string>`
and cleans up after itself.
2018-02-03 14:49:48 +00:00
Wolf-Bastian Pöttner
4f725cf4d2 Add test cases for date parsing 2018-02-03 00:28:49 +01:00
Daniel Quinn
67844dff0c Update test for #259 fix 2017-10-16 10:53:18 +01:00
Daniel Quinn
e7d4ca92ba fix: allow for caps in file name suffixes #206
@schinkelg ran aground of this one and I took the opportunity to add a
test to catch this sort of thing for next time.
2017-03-28 21:14:24 +00:00
Daniel Quinn
d2c283582b feat: refactor for pluggable consumers
I've broken out the OCR-specific code from the consumers and dumped it
all into its own app, `paperless_tesseract`.  This new app should serve
as a sample of how to create one's own consumer for different file
types.

Documentation for how to do this isn't ready yet, but for the impatient:

* Create a new app
    * containing a `parsers.py` for your parser modelled after
      `paperless_tesseract.parsers.RasterisedDocumentParser`
    * containing a `signals.py` with a handler moddelled after
      `paperless_tesseract.signals.ConsumerDeclaration`
    * connect the signal handler to
      `documents.signals.document_consumer_declaration` in
      `your_app.apps`
* Install the app into Paperless by declaring
  `PAPERLESS_INSTALLED_APPS=your_app`.  Additional apps should be
  separated with commas.
* Restart the consumer
2017-03-25 15:10:25 +00:00