paperless-ngx

mirror of https://github.com/paperless-ngx/paperless-ngx.git synced 2026-02-24 00:59:35 -06:00

Author	SHA1	Message	Date
Daniel Quinn	6b63ce9201	Fix pycodestyle complaints Apparently, pycodestyle updated itself to now check for invalid escape sequences, which only complain if the regex in use isn't a raw string (r"").	2018-09-09 20:00:12 +01:00
Erik Arvstedt	4fa9ff60fc	Stop tests from writing to the source tree	2018-07-19 23:48:23 +02:00
Daniel Quinn	bce2d3dd22	Account for KeyError problem in #345	2018-04-28 12:20:43 +01:00
Daniel Quinn	f3f86242de	Account for KeyError problem in #345	2018-04-28 12:19:53 +01:00
Ovv	32c440cbd9	Log detected document date with isoformat	2018-03-04 13:10:49 +01:00
Wolf-Bastian Pöttner	328330eb08	Increase test coverage by testing two more date detection cases	2018-02-19 21:36:48 +01:00
Daniel Quinn	fc6d2d5e0c	Fix formatting	2018-02-18 18:00:34 +00:00
Daniel Quinn	9e26e7b39e	Fix tests to use _text instead of TEXT_CACHE	2018-02-18 18:00:22 +00:00
Daniel Quinn	7c5ca5f505	Merge pull request #302 from BastianPoe/bugfix/extend_regex_to_find_more_dates Extends the regex to find dates in documents as reported by @isaacsando	2018-02-18 17:23:49 +01:00
Daniel Quinn	4f726e1991	Monitor return codes of calls to `convert` and `unpaper` ...and handle the failures nicely. Addresses #303.	2018-02-18 16:02:27 +00:00
Daniel Quinn	e53033d1b3	Rename .TEXT_CACHE to .text Properties should use snake_case, and only constants should be ALL_CAPS. This change also makes use of the convention of "private" properties being prefixed with `_`.	2018-02-18 16:00:43 +00:00
Daniel Quinn	3302ee2a78	Make isort happy	2018-02-18 16:00:03 +00:00
Daniel Quinn	caf44146db	Style and removal of Python 2.7 stuff	2018-02-18 15:55:55 +00:00
Wolf-Bastian Pöttner	5fed7ba6d4	Improved regular expression to only match for (unicode) characters in month names + parsed one regex match after another until one gave a parsable date	2018-02-14 21:41:04 +01:00
Wolf-Bastian Pöttner	fc81feb32e	Add more (fast-running) unit tests	2018-02-14 21:41:01 +01:00
Wolf-Bastian Pöttner	3e65054e39	Extended exception handling	2018-02-12 22:43:16 +01:00
Wolf-Bastian Pöttner	c0c20f99e9	Added log output for date detected in document	2018-02-12 22:41:19 +01:00
Wolf-Bastian Pöttner	3899763261	Extends the regex to find dates in documents as reported by @isaacsando	2018-02-12 22:41:15 +01:00
Daniel Quinn	c6e671f2fa	No need to extend object	2018-02-03 15:26:28 +00:00
Daniel Quinn	4c0b908a41	Rework tests to write to /tmp Originally the test wrote scratch data inside the repo dir, which meant manual cleanup. Now it writes to `/tmp/paperless-tests-<random-string>` and cleans up after itself.	2018-02-03 14:49:48 +00:00
Wolf-Bastian Pöttner	acfacaac4f	Added a text cache to optimize performance of date detection	2018-02-03 00:28:52 +01:00
Wolf-Bastian Pöttner	4f725cf4d2	Add test cases for date parsing	2018-02-03 00:28:49 +01:00
Wolf-Bastian Pöttner	73d261484a	Merge branch 'master' of https://github.com/danielquinn/paperless into feature/heuristically-extract-date-from-document-text	2018-02-02 22:44:03 +01:00
Wolf-Bastian Pöttner	3dc730808e	Add support for using pre-existing text from PDFs	2018-02-02 22:37:58 +01:00
Matt	bc5c45a705	Fixing error sentinel for pdftotext when the PDF has no text (scanned images). It was causing a crash previously.	2018-02-01 10:08:57 -05:00
Daniel Quinn	269c32ce6a	Add support for using pre-existing text from PDFs	2018-01-30 20:13:35 +00:00
Wolf-Bastian Pöttner	21fc51c09a	Add support for a heuristic that extracts the document date from its text	2018-01-28 19:37:10 +01:00
Daniel Quinn	67844dff0c	Update test for #259 fix	2017-10-16 10:53:18 +01:00
Daniel Quinn	2820767f29	Support .jpeg as well as .jpg	2017-10-16 09:00:38 +01:00
Daniel Quinn	e7d4ca92ba	fix: allow for caps in file name suffixes #206 @schinkelg ran aground of this one and I took the opportunity to add a test to catch this sort of thing for next time.	2017-03-28 21:14:24 +00:00
Daniel Quinn	d2c283582b	feat: refactor for pluggable consumers I've broken out the OCR-specific code from the consumers and dumped it all into its own app, `paperless_tesseract`. This new app should serve as a sample of how to create one's own consumer for different file types. Documentation for how to do this isn't ready yet, but for the impatient: * Create a new app * containing a `parsers.py` for your parser modelled after `paperless_tesseract.parsers.RasterisedDocumentParser` * containing a `signals.py` with a handler modelled after `paperless_tesseract.signals.ConsumerDeclaration` * connect the signal handler to `documents.signals.document_consumer_declaration` in `your_app.apps` * Install the app into Paperless by declaring `PAPERLESS_INSTALLED_APPS=your_app`. Additional apps should be separated with commas. * Restart the consumer	2017-03-25 15:10:25 +00:00
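
The pluggable-consumer steps described in commit d2c283582b can be sketched roughly as follows. This is a minimal illustration only, not the project's actual code: the `Signal` class is a hypothetical stand-in for Django's signal dispatcher so the example runs without Django, `PlainTextParser` and `pick_parser` are invented names, and the handler's return shape (a dict with `parser` and `weight` keys) is an assumption that the commit message does not confirm.

```python
import re


class Signal:
    """Hypothetical stand-in for django.dispatch.Signal (assumption)."""

    def __init__(self):
        self._receivers = []

    def connect(self, receiver):
        self._receivers.append(receiver)

    def send(self, sender, **kwargs):
        # Like Django, return a list of (receiver, response) pairs.
        return [(r, r(sender, **kwargs)) for r in self._receivers]


# Stands in for documents.signals.document_consumer_declaration.
document_consumer_declaration = Signal()


class PlainTextParser:
    """Hypothetical parser; a real one would live in your_app/parsers.py."""

    def __init__(self, path):
        self.path = path


class ConsumerDeclaration:
    """Handler sketch; a real one would live in your_app/signals.py."""

    # Raw string, so the regex escape is not flagged by pycodestyle.
    MATCHING_FILES = re.compile(r"^.*\.txt$")

    @classmethod
    def handle(cls, sender, **kwargs):
        # The signal response is a callable that tests a file path.
        return cls.test

    @classmethod
    def test(cls, doc):
        # Lower-case the path so suffixes such as .TXT also match.
        if cls.MATCHING_FILES.match(doc.lower()):
            return {"parser": PlainTextParser, "weight": 0}
        return None


# In a real app this connection would be made in your_app/apps.py.
document_consumer_declaration.connect(ConsumerDeclaration.handle)


def pick_parser(path):
    """Ask every declared consumer and keep the highest-weight match."""
    best = None
    for _receiver, test in document_consumer_declaration.send(sender=None):
        result = test(path)
        if result and (best is None or result["weight"] > best["weight"]):
            best = result
    return best["parser"] if best else None
```

Under these assumptions, `pick_parser("notes.TXT")` selects `PlainTextParser`, while a path that no declared consumer claims yields `None`. In an actual installation, the app would also need to be listed in `PAPERLESS_INSTALLED_APPS` and the consumer restarted, as the commit message notes.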

31 Commits