paperless-ngx

mirror of https://github.com/paperless-ngx/paperless-ngx.git synced 2025-12-24 02:05:48 -06:00

Author	SHA1	Message	Date
Jonas Winkler	d2e22e3f27	Changed the way parsers are discovered. This also prepares for upcoming changes regarding content types and file types: parsers should declare what they support, and actual file extensions should not be hardcoded everywhere.	2020-11-16 23:53:12 +01:00
Jonas Winkler	2e04ba1c04	code style fixes	2020-11-12 21:09:45 +01:00
Jonas Winkler	f182709fdd	fixed most of the tests	2020-11-02 19:42:23 +01:00
Jonas Winkler	7d282a4e4e	removed unused code, small fixes	2020-11-02 18:20:04 +01:00
Johannes Wienke	a311cd498c	Handle dateparser ValueErrors When parsing dates from the document text or filenames, correctly handle value errors indicating broken dates. Newly added tests ensure that this handling works properly.	2020-03-08 18:44:15 +01:00
Johannes Wienke	a3aab0cb48	Remove duplicated date parsing test The exact same tests existed twice in the file.	2020-03-08 18:26:29 +01:00
Daniel Quinn	637b0d4cc2	Drop problematic tests Some tests had differing outcomes depending on the version of Tesseract installed on the test system. This led to a bunch of false test failures, which led to people (including me) just ignoring the Travis results. This commit removes those tests, and while it reduces our coverage, at least the results are predictable.	2018-12-30 17:32:45 +00:00
Daniel Quinn	27af2603f5	Use modern languages for sample test files	2018-12-30 14:09:17 +00:00
Erik Arvstedt	a19f0ef97e	Fix date test sample image The previous version of `tests_date_3.png` had too much spacing between the `0` and the `8` glyphs, which resulted in the year getting parsed as `200 8` in Tesseract 3.05.00 (+ tessdata 3.04.00). This caused the date parsing test to fail.	2018-12-02 15:10:21 +01:00
Daniel Quinn	d544f269e0	Conform everything to the coding standards https://paperless.readthedocs.io/en/latest/contributing.html#additional-style-guides	2018-12-01 17:09:12 +00:00
Daniel Quinn	650db75c2b	Merge branch 'ENH_filename_date_parsing' of https://github.com/jat255/paperless into jat255-ENH_filename_date_parsing	2018-12-01 16:57:16 +00:00
Daniel Quinn	c1d18c1e83	Fix language guesses in tests It turns out that the Lorem ipsum text in the sample files was confusing the language guesser, causing it to think the file was in Catalan and not English or German.	2018-12-01 15:55:59 +00:00
Joshua Taillon	730daa3d6d	Merge branch 'master' of github.com:danielquinn/paperless into ENH_filename_date_parsing	2018-11-15 23:17:59 -05:00
Joshua Taillon	e1d8744c66	Add option for parsing of date from filename (and associated tests)	2018-11-15 20:32:15 -05:00
Joshua Taillon	4409f65840	Update date tests to be more explicit with settings and allow tests to pass if using a timezone other than UTC	2018-11-15 20:30:23 -05:00
Daniel Quinn	2a3f766b93	Consolidate get_date onto the DocumentParser parent class	2018-10-07 14:56:02 +01:00
Daniel Quinn	8010d72f18	Tweak the date guesser to not allow dates prior to 1900 (#414)	2018-10-01 20:03:47 +01:00
Erik Arvstedt	be2cbebaf7	Stop tests from writing to the source tree	2018-07-19 23:48:23 +02:00
Wolf-Bastian Pöttner	fba58f3bdd	Increase test coverage by testing two more date detection cases	2018-02-19 21:36:48 +01:00
Daniel Quinn	6662ca3467	Fix formatting	2018-02-18 18:00:34 +00:00
Daniel Quinn	6f1ed89e26	Fix tests to use _text instead of TEXT_CACHE	2018-02-18 18:00:22 +00:00
Daniel Quinn	5d01410dc0	Merge pull request #302 from BastianPoe/bugfix/extend_regex_to_find_more_dates Extends the regex to find dates in documents as reported by @isaacsando	2018-02-18 17:23:49 +01:00
Wolf-Bastian Pöttner	1737e27b34	Add more (fast-running) unit tests	2018-02-14 21:41:01 +01:00
Wolf-Bastian Pöttner	07d06d9aee	Extends the regex to find dates in documents as reported by @isaacsando	2018-02-12 22:41:15 +01:00
Daniel Quinn	c90ed2da1d	Rework tests to write to /tmp Originally the test wrote scratch data inside the repo dir, which meant manual cleanup. Now it writes to `/tmp/paperless-tests-<random-string>` and cleans up after itself.	2018-02-03 14:49:48 +00:00
Wolf-Bastian Pöttner	bef2d94374	Add test cases for date parsing	2018-02-03 00:28:49 +01:00
Daniel Quinn	bd67b53d50	Update test for #259 fix	2017-10-16 10:53:18 +01:00
Daniel Quinn	fa4924d5ba	fix: allow for caps in file name suffixes #206 @schinkelg ran aground of this one and I took the opportunity to add a test to catch this sort of thing for next time.	2017-03-28 21:14:24 +00:00
Daniel Quinn	55e81ca4bb	feat: refactor for pluggable consumers I've broken out the OCR-specific code from the consumers and dumped it all into its own app, `paperless_tesseract`. This new app should serve as a sample of how to create one's own consumer for different file types. Documentation for how to do this isn't ready yet, but for the impatient: * Create a new app * containing a `parsers.py` for your parser modelled after `paperless_tesseract.parsers.RasterisedDocumentParser` * containing a `signals.py` with a handler modelled after `paperless_tesseract.signals.ConsumerDeclaration` * connect the signal handler to `documents.signals.document_consumer_declaration` in `your_app.apps` * Install the app into Paperless by declaring `PAPERLESS_INSTALLED_APPS=your_app`. Additional apps should be separated with commas. * Restart the consumer	2017-03-25 15:10:25 +00:00
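
The plug-in steps listed in commit 55e81ca4bb can be sketched roughly as below. This is a framework-free stand-in, not the project's code: the `Signal` class, `TextDocumentParser`, `text_consumer_declaration`, `pick_parser`, and the `parser`/`weight` dictionary keys are all hypothetical names chosen for illustration. Only the signal name `document_consumer_declaration` comes from the commit message, and the real project wires its handlers through Django signals rather than the toy class used here.

```python
import re


class Signal:
    """Tiny stand-in for a Django-style signal (hypothetical, illustration only)."""

    def __init__(self):
        self._receivers = []

    def connect(self, receiver):
        """Register a handler to be called on every send()."""
        self._receivers.append(receiver)

    def send(self, **kwargs):
        """Call every handler and return (receiver, response) pairs."""
        return [(receiver, receiver(**kwargs)) for receiver in self._receivers]


# Named after the signal mentioned in the commit message; the object itself
# is the toy Signal above, not the real Django signal.
document_consumer_declaration = Signal()


class TextDocumentParser:
    """Hypothetical parser for plain-text files (would live in parsers.py)."""

    def __init__(self, path):
        self.path = path

    def get_text(self):
        with open(self.path, encoding="utf-8") as handle:
            return handle.read()


def text_consumer_declaration(file_name, **kwargs):
    """Hypothetical handler (would live in signals.py).

    Claims the file when its suffix matches, otherwise declines with None.
    """
    if re.match(r".*\.txt$", file_name, re.IGNORECASE):  # tolerate caps in suffix
        return {"parser": TextDocumentParser, "weight": 0}
    return None


# Would happen in your_app.apps when the app is loaded.
document_consumer_declaration.connect(text_consumer_declaration)


def pick_parser(file_name):
    """Ask every registered handler and keep the highest-weight claim."""
    claims = [
        response
        for _, response in document_consumer_declaration.send(file_name=file_name)
        if response is not None
    ]
    if not claims:
        return None
    return max(claims, key=lambda claim: claim["weight"])["parser"]
```

Under these assumptions, `pick_parser("scan.TXT")` yields `TextDocumentParser`, while a file no handler claims, such as `pick_parser("scan.pdf")`, yields `None`.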

79 Commits