paperless-ngx

mirror of https://github.com/paperless-ngx/paperless-ngx.git synced 2025-05-01 11:19:32 -05:00

Author	SHA1	Message	Date
Jonas Winkler	4230a0a474	a new setting that allows you to skip thumbnail optimization.	2020-11-18 22:42:05 +01:00
Jonas Winkler	680ab3d56b	updated logging, logging for the mail consumer to see what's happening	2020-11-18 13:23:30 +01:00
Jonas Winkler	9a48d6c577	Changed the way parsers are discovered. This also prepares for upcoming changes regarding content types and file types: parsers should declare what they support, and actual file extensions should not be hardcoded everywhere.	2020-11-16 23:53:12 +01:00
Jonas Winkler	4734dec465	add some more checks.	2020-11-12 21:20:12 +01:00
Jonas Winkler	eb6805e37e	code style fixes	2020-11-12 21:09:45 +01:00
Jonas Winkler	cf5e463b9b	silenced unpaper once and for all	2020-11-03 14:04:21 +01:00
Jonas Winkler	9757e261f2	A handy script to redo ocr on all documents,	2020-11-03 14:04:11 +01:00
Jonas Winkler	d42979842e	made unpaper and convert a little bit nicer to interact with	2020-11-02 19:31:04 +01:00
Jonas Winkler	def3a85858	reworked most of the tesseract parser, better logging	2020-11-02 15:40:44 +01:00
Jonas Winkler	ffdb517b73	removed settings constants	2020-11-01 23:37:56 +01:00
Jonas Winkler	6adc870a20	silenced unpaper, optipng for cleaner output; moved parser settings to settings; removed forgiving ocr (now default) since tesseract is plenty accurate even without defining the correct language.	2020-11-01 23:23:42 +01:00
Johannes Wienke	ebcfcea05b	Handle dateparser ValueErrors. When parsing dates from the document text or filenames, correctly handle value errors indicating broken dates. Newly added tests ensure that this handling works properly.	2020-03-08 18:44:15 +01:00
Daniel Quinn	0d59844567	Conform everything to the coding standards https://paperless.readthedocs.io/en/latest/contributing.html#additional-style-guides	2018-12-01 17:09:12 +00:00
Joshua Taillon	b0326b5a19	Merge branch 'master' of github.com:danielquinn/paperless into ENH_filename_date_parsing	2018-11-15 23:17:59 -05:00
Joshua Taillon	6e88634fa8	Change the massive regex to match boundaries with _ or - characters (not just word breaks); add line for year first formats like YYYY-MM-DD	2018-11-15 20:38:53 -05:00
Daniel Quinn	bc898c1992	Use optipng to optimise document thumbnails	2018-10-07 14:56:38 +01:00
Daniel Quinn	074609e1fc	Consolidate get_date onto the DocumentParser parent class	2018-10-07 14:56:02 +01:00
Daniel Quinn	ef7f98281d	Rename `parsers` to `DATE_REGEX`. In moving the `parsers` variable into the package-level, it lost the context, so a more descriptive name was needed.	2018-09-09 21:02:30 +01:00
Joshua Taillon	5326895334	move date-matching regex pattern to base parser module for use by all subclasses	2018-09-05 21:13:36 -04:00
Daniel Quinn	5cc10a282b	Use `paperless-` instead of `paperless` for tempdir name. This is purely aesthetic.	2018-02-03 14:49:17 +00:00
Daniel Quinn	648e7b6d4f	No need to explicitly extend object	2018-02-03 14:49:01 +00:00
Wolf-Bastian Pöttner	21fc51c09a	Add support for a heuristic that extracts the document date from its text	2018-01-28 19:37:10 +01:00
Daniel Quinn	d2c283582b	feat: refactor for pluggable consumers. I've broken out the OCR-specific code from the consumers and dumped it all into its own app, `paperless_tesseract`. This new app should serve as a sample of how to create one's own consumer for different file types. Documentation for how to do this isn't ready yet, but for the impatient: * Create a new app * containing a `parsers.py` for your parser modelled after `paperless_tesseract.parsers.RasterisedDocumentParser` * containing a `signals.py` with a handler modelled after `paperless_tesseract.signals.ConsumerDeclaration` * connect the signal handler to `documents.signals.document_consumer_declaration` in `your_app.apps` * Install the app into Paperless by declaring `PAPERLESS_INSTALLED_APPS=your_app`. Additional apps should be separated with commas. * Restart the consumer	2017-03-25 15:10:25 +00:00

23 Commits
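
The oldest commit listed above (d2c283582b) outlines a plug-in pattern: each app connects a handler to a declaration signal and offers a parser for the files it can handle. The following is a minimal, framework-free sketch of that idea, not the actual Paperless code. The `Signal` class is a hypothetical stand-in for a Django-style signal, and `PlainTextParser`, `declare_text_parser`, `pick_parser`, and the `parser`/`weight` offer format are all assumptions made for illustration.

```python
class Signal:
    """Hypothetical stand-in for a Django-style signal (not the Paperless API)."""

    def __init__(self):
        self._receivers = []

    def connect(self, receiver):
        # Register a callable that will be asked about each file.
        self._receivers.append(receiver)

    def send(self, **kwargs):
        # Call every receiver and collect (receiver, response) pairs.
        return [(receiver, receiver(**kwargs)) for receiver in self._receivers]


# Assumed name, mirroring the signal mentioned in the commit message.
document_consumer_declaration = Signal()


class PlainTextParser:
    """Hypothetical parser for plain-text files."""

    def __init__(self, path):
        self.path = path

    def get_text(self):
        with open(self.path, encoding="utf-8") as handle:
            return handle.read()


def declare_text_parser(filename, **kwargs):
    """Handler: offer a parser if this app supports the file, else None."""
    if filename.lower().endswith(".txt"):
        # The offer format (parser class plus weight) is an assumption.
        return {"parser": PlainTextParser, "weight": 0}
    return None


# In a Django app this connection would typically happen in the app config.
document_consumer_declaration.connect(declare_text_parser)


def pick_parser(filename):
    """Ask all handlers and return the highest-weight parser class, or None."""
    offers = [
        offer
        for _, offer in document_consumer_declaration.send(filename=filename)
        if offer is not None
    ]
    if not offers:
        return None
    return max(offers, key=lambda offer: offer["weight"])["parser"]
```

With this sketch, `pick_parser("notes.txt")` returns the `PlainTextParser` class, while a file no handler claims, such as `scan.pdf`, yields `None`. Letting handlers decline by returning `None` keeps file-type knowledge inside each app rather than in the consumer.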