paperless-ngx

mirror of https://github.com/paperless-ngx/paperless-ngx.git synced 2025-12-14 01:21:14 -06:00

Author	SHA1	Message	Date
Wolf-Bastian Pöttner	96c7222269	Improved regular expression to only match for (unicode) characters in month names + parsed one regex match after another until one gave a parsable date	2018-02-14 21:41:04 +01:00
Wolf-Bastian Pöttner	1737e27b34	Add more (fast-running) unit tests	2018-02-14 21:41:01 +01:00
Wolf-Bastian Pöttner	39f198138a	Extended exception handling	2018-02-12 22:43:16 +01:00
Wolf-Bastian Pöttner	c74bb84c83	Added log output for date detected in document	2018-02-12 22:41:19 +01:00
Wolf-Bastian Pöttner	07d06d9aee	Extends the regex to find dates in documents as reported by @isaacsando	2018-02-12 22:41:15 +01:00
Daniel Quinn	73163d893f	No need to extend object	2018-02-03 15:26:28 +00:00
Daniel Quinn	c90ed2da1d	Rework tests to write to /tmp. Originally the test wrote scratch data inside the repo dir, which meant manual cleanup. Now it writes to `/tmp/paperless-tests-<random-string>` and cleans up after itself.	2018-02-03 14:49:48 +00:00
Wolf-Bastian Pöttner	40f8ba23a4	Added a text cache to optimize performance of date detection	2018-02-03 00:28:52 +01:00
Wolf-Bastian Pöttner	bef2d94374	Add test cases for date parsing	2018-02-03 00:28:49 +01:00
Wolf-Bastian Pöttner	f39c7654a0	Merge branch 'master' of https://github.com/danielquinn/paperless into feature/heuristically-extract-date-from-document-text	2018-02-02 22:44:03 +01:00
Wolf-Bastian Pöttner	87e466c47c	Add support for using pre-existing text from PDFs	2018-02-02 22:37:58 +01:00
Matt	ce98019b49	Fixing error sentinel for pdftotext when the PDF has no text (scanned images). It was causing a crash previously.	2018-02-01 10:08:57 -05:00
Daniel Quinn	cd92c005e3	Add support for using pre-existing text from PDFs	2018-01-30 20:13:35 +00:00
Wolf-Bastian Pöttner	b140935843	Add support for a heuristic that extracts the document date from its text	2018-01-28 19:37:10 +01:00
Daniel Quinn	bd67b53d50	Update test for #259 fix	2017-10-16 10:53:18 +01:00
Daniel Quinn	e32ed09da3	Support .jpeg as well as .jpg	2017-10-16 09:00:38 +01:00
Daniel Quinn	fa4924d5ba	fix: allow for caps in file name suffixes #206. @schinkelg ran aground on this one, and I took the opportunity to add a test to catch this sort of thing next time.	2017-03-28 21:14:24 +00:00
Daniel Quinn	55e81ca4bb	feat: refactor for pluggable consumers. I've broken out the OCR-specific code from the consumers and dumped it all into its own app, `paperless_tesseract`. This new app should serve as a sample of how to create one's own consumer for different file types. Documentation for how to do this isn't ready yet, but for the impatient: * Create a new app * containing a `parsers.py` for your parser modelled after `paperless_tesseract.parsers.RasterisedDocumentParser` * containing a `signals.py` with a handler modelled after `paperless_tesseract.signals.ConsumerDeclaration` * connect the signal handler to `documents.signals.document_consumer_declaration` in `your_app.apps` * Install the app into Paperless by declaring `PAPERLESS_INSTALLED_APPS=your_app`. Additional apps should be separated with commas. * Restart the consumer	2017-03-25 15:10:25 +00:00
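
Commit 96c7222269 describes two ideas for date detection: the month-name part of the regex should match only (Unicode) letters, and each regex match is tried in turn until one yields a parsable date. The following is a minimal Python sketch of those two ideas, not the project's actual code. The pattern `DATE_REGEX`, the `FORMATS` tuple and the function names are hypothetical, and `datetime.strptime` stands in for whatever date parser the project used. Note that `%B`/`%b` are locale-dependent, so non-English month names like "März" are matched by the regex but only parse under a matching locale.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical pattern: numeric dates (14.02.2018, 14/02/2018, 2018-02-14) or
# "14 February 2018", where the month is 3 to 9 Unicode letters.
# [^\W\d_] means "a word character that is neither a digit nor an underscore",
# i.e. any Unicode letter.
DATE_REGEX = re.compile(
    r"\b(\d{1,2}[./-]\d{1,2}[./-]\d{4}"
    r"|\d{4}-\d{1,2}-\d{1,2}"
    r"|\d{1,2}\.? [^\W\d_]{3,9} \d{4})\b"
)

# Hypothetical formats tried for every candidate string.
FORMATS = ("%d.%m.%Y", "%d/%m/%Y", "%d-%m-%Y", "%Y-%m-%d",
           "%d %B %Y", "%d. %B %Y", "%d %b %Y")


def parse_candidate(text: str) -> Optional[datetime]:
    """Return the first successful strptime() result, or None."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


def extract_date(document_text: str) -> Optional[datetime]:
    """Try regex matches in document order; the first parsable one wins."""
    for match in DATE_REGEX.finditer(document_text):
        parsed = parse_candidate(match.group(0))
        if parsed is not None:
            return parsed
    return None
```

For example, in `"Invoice 99.99.2018 dated 14 February 2018"` the first match is not a valid date, so the loop moves on and returns the second one. This is the "one regex match after another" behaviour the commit message describes.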

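Commit c90ed2da1d moves test scratch data out of the repository and into `/tmp/paperless-tests-<random-string>`, with automatic cleanup. A minimal sketch of that pattern using the standard library follows. The class and test names are hypothetical. `tempfile.mkdtemp` places the directory under the system temp directory, which is typically but not always `/tmp`.

```python
import os
import shutil
import tempfile
import unittest


class ScratchDirTestCase(unittest.TestCase):
    """Hypothetical test case: each test gets its own scratch directory."""

    def setUp(self):
        # mkdtemp creates a uniquely named directory such as
        # /tmp/paperless-tests-k3j2x9a1 and returns its path.
        self.scratch_dir = tempfile.mkdtemp(prefix="paperless-tests-")
        # addCleanup runs even when the test fails, so nothing is left behind.
        self.addCleanup(shutil.rmtree, self.scratch_dir, ignore_errors=True)

    def test_writes_scratch_file(self):
        path = os.path.join(self.scratch_dir, "output.txt")
        with open(path, "w") as handle:
            handle.write("scratch data")
        self.assertTrue(os.path.exists(path))
```

Registering the removal with `addCleanup` inside `setUp`, instead of in `tearDown`, means the directory is removed even when a test fails.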

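Commits fa4924d5ba and e32ed09da3 both concern which file names the consumer accepts. Upper-case suffixes must be recognised (#206), and `.jpeg` must be accepted alongside `.jpg`. The sketch below assumes suffixes are checked with a regular expression and covers both cases with one case-insensitive pattern. The extension list and the `is_consumable` name are hypothetical.

```python
import re

# Hypothetical list of accepted suffixes. "jpe?g" covers .jpg and .jpeg,
# "tiff?" covers .tif and .tiff, and re.IGNORECASE accepts .PDF, .JPEG, etc.
MATCHING_FILES = re.compile(r"^.*\.(pdf|jpe?g|gif|png|tiff?|pnm)$", re.IGNORECASE)


def is_consumable(filename: str) -> bool:
    """Return True if the file name ends in a supported suffix, in any case."""
    return MATCHING_FILES.match(filename) is not None
```

Adding the flag to the pattern avoids lower-casing every file name before matching. A regression test with a mixed-case name, as the commit describes, catches any later pattern that drops the flag.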
68 Commits
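
Commit 55e81ca4bb describes the pluggable-consumer layout. An app supplies a parser modelled on `RasterisedDocumentParser` and a handler connected to `documents.signals.document_consumer_declaration`. The framework-free sketch below illustrates the general shape of that dispatch. It is an illustration under stated assumptions, not Paperless's actual contract. A plain Python list stands in for the Django signal. Each declaration is assumed to return a dict with `"parser"` and `"weight"` keys, and the highest weight is assumed to win. All class and function names here are hypothetical.

```python
import re
from typing import Callable, List, Optional, Type


class DocumentParser:
    """Hypothetical base class standing in for a parser in an app's parsers.py."""

    def __init__(self, path: str):
        self.path = path

    def get_text(self) -> str:
        raise NotImplementedError


class RasterisedParser(DocumentParser):
    # Hypothetical stand-in for an OCR-based parser.
    def get_text(self) -> str:
        return f"OCR text of {self.path}"


class PlainTextParser(DocumentParser):
    # Hypothetical parser a third-party app might contribute.
    def get_text(self) -> str:
        return f"raw text of {self.path}"


# Assumed declaration shape: path -> {"parser": cls, "weight": int} or None.
Declaration = Callable[[str], Optional[dict]]

# A plain list standing in for the Django signal that apps connect to.
DECLARATIONS: List[Declaration] = []


def declare(func: Declaration) -> Declaration:
    """Register a declaration, playing the role of connecting a signal handler."""
    DECLARATIONS.append(func)
    return func


@declare
def raster_declaration(path: str) -> Optional[dict]:
    if re.search(r"\.(pdf|jpe?g|png|tiff?)$", path, re.IGNORECASE):
        return {"parser": RasterisedParser, "weight": 0}
    return None


@declare
def text_declaration(path: str) -> Optional[dict]:
    if path.lower().endswith(".txt"):
        return {"parser": PlainTextParser, "weight": 10}
    return None


def get_parser_class(path: str) -> Optional[Type[DocumentParser]]:
    """Ask every declared consumer and keep the highest-weighted match."""
    options = [d for d in (decl(path) for decl in DECLARATIONS) if d]
    if not options:
        return None
    return max(options, key=lambda option: option["weight"])["parser"]
```

Because the core only iterates over whatever has been registered, adding a file type means adding a new app with its own parser and declaration, without editing the consumer itself.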