paperless-ngx

mirror of https://github.com/paperless-ngx/paperless-ngx.git synced 2025-05-01 11:19:32 -05:00

Author	SHA1	Message	Date
Jonas Winkler	6adc870a20	silenced unpaper, optipng for cleaner output; moved parser settings to settings; removed forgiving ocr (now default) since tesseract is plenty accurate even without defining the correct language.	2020-11-01 23:23:42 +01:00
Johannes Wienke	ebcfcea05b	Handle dateparser ValueErrors. When parsing dates from the document text or filenames, correctly handle ValueErrors indicating broken dates. Newly added tests ensure that this handling works properly.	2020-03-08 18:44:15 +01:00
Daniel Quinn	0d59844567	Conform everything to the coding standards https://paperless.readthedocs.io/en/latest/contributing.html#additional-style-guides	2018-12-01 17:09:12 +00:00
Joshua Taillon	b0326b5a19	Merge branch 'master' of github.com:danielquinn/paperless into ENH_filename_date_parsing	2018-11-15 23:17:59 -05:00
Joshua Taillon	6e88634fa8	Change the massive regex to match boundaries with _ or - characters (not just word breaks); add line for year-first formats like YYYY-MM-DD	2018-11-15 20:38:53 -05:00
Daniel Quinn	bc898c1992	Use optipng to optimise document thumbnails	2018-10-07 14:56:38 +01:00
Daniel Quinn	074609e1fc	Consolidate get_date onto the DocumentParser parent class	2018-10-07 14:56:02 +01:00
Daniel Quinn	ef7f98281d	Rename `parsers` to `DATE_REGEX`. In moving the `parsers` variable to the package level, it lost its context, so a more descriptive name was needed.	2018-09-09 21:02:30 +01:00
Joshua Taillon	5326895334	move date-matching regex pattern to base parser module for use by all subclasses	2018-09-05 21:13:36 -04:00
Daniel Quinn	5cc10a282b	Use `paperless-` instead of `paperless` for tempdir name. This is purely aesthetic.	2018-02-03 14:49:17 +00:00
Daniel Quinn	648e7b6d4f	No need to explicitly extend object	2018-02-03 14:49:01 +00:00
Wolf-Bastian Pöttner	21fc51c09a	Add support for a heuristic that extracts the document date from its text	2018-01-28 19:37:10 +01:00
Daniel Quinn	d2c283582b	feat: refactor for pluggable consumers. I've broken out the OCR-specific code from the consumers and dumped it all into its own app, `paperless_tesseract`. This new app should serve as a sample of how to create one's own consumer for different file types. Documentation for how to do this isn't ready yet, but for the impatient: (1) create a new app containing a `parsers.py` for your parser modelled after `paperless_tesseract.parsers.RasterisedDocumentParser` and a `signals.py` with a handler modelled after `paperless_tesseract.signals.ConsumerDeclaration`; (2) connect the signal handler to `documents.signals.document_consumer_declaration` in `your_app.apps`; (3) install the app into Paperless by declaring `PAPERLESS_INSTALLED_APPS=your_app` (additional apps should be separated with commas); (4) restart the consumer.	2017-03-25 15:10:25 +00:00

13 Commits
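
Two of the commits above describe date handling: 6e88634fa8 widens the date regex so that `_` and `-` count as boundaries and adds year-first formats such as YYYY-MM-DD, and ebcfcea05b handles the ValueErrors raised by broken dates. The sketch below illustrates both ideas using only the standard library. It is not the project's code: `DATE_PATTERN` and `extract_date` are hypothetical names, the pattern covers only the year-first case, and `datetime` stands in for the `dateparser` library the project uses.

```python
import re
from datetime import datetime

# Hypothetical pattern, not the project's DATE_REGEX.
# "\b" treats "_" as a word character, so it finds no boundary in
# "scan_2018-11-15". Here a boundary is the start or end of the string,
# or any character that is not a letter or digit.
DATE_PATTERN = re.compile(
    r"(?:^|[^0-9A-Za-z])"                    # leading boundary
    r"(\d{4})[-_./](\d{1,2})[-_./](\d{1,2})"  # year-first: YYYY-MM-DD
    r"(?=$|[^0-9A-Za-z])"                    # trailing boundary, not consumed
)


def extract_date(text):
    """Return the first valid year-first date found in text, or None."""
    for match in DATE_PATTERN.finditer(text):
        year, month, day = (int(group) for group in match.groups())
        try:
            return datetime(year, month, day)
        except ValueError:
            # A broken date such as 2018-13-45: skip it and keep looking.
            continue
    return None


print(extract_date("scan_2018-11-15_invoice.pdf"))
print(extract_date("report-2018_13_45.pdf"))
```

Catching the ValueError inside the loop means one malformed candidate does not stop the search; a later, valid date in the same text can still be returned.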