7 Commits

Author SHA1 Message Date
Jonas Winkler
41650f20f4 mime type handling 2020-11-20 13:31:03 +01:00
Jonas Winkler
d2e22e3f27 Changed the way parsers are discovered. This also prepares for upcoming changes regarding content types and file types: parsers should declare what they support, and actual file extensions should not be hardcoded everywhere. 2020-11-16 23:53:12 +01:00
Daniel Quinn
5342db6ada Fix pycodestyle complaints
Apparently, pycodestyle updated itself to now check for invalid escape
sequences, which only complain if the regex in use isn't a raw string
(r"").
2018-09-09 20:00:12 +01:00
Daniel Quinn
73163d893f No need to extend object 2018-02-03 15:26:28 +00:00
Daniel Quinn
e32ed09da3 Support .jpeg as well as .jpg 2017-10-16 09:00:38 +01:00
Daniel Quinn
fa4924d5ba fix: allow for caps in file name suffixes #206
@schinkelg ran aground of this one and I took the opportunity to add a
test to catch this sort of thing for next time.
2017-03-28 21:14:24 +00:00
Daniel Quinn
55e81ca4bb feat: refactor for pluggable consumers
I've broken out the OCR-specific code from the consumers and dumped it
all into its own app, `paperless_tesseract`.  This new app should serve
as a sample of how to create one's own consumer for different file
types.

Documentation for how to do this isn't ready yet, but for the impatient:

* Create a new app
    * containing a `parsers.py` for your parser modelled after
      `paperless_tesseract.parsers.RasterisedDocumentParser`
    * containing a `signals.py` with a handler moddelled after
      `paperless_tesseract.signals.ConsumerDeclaration`
    * connect the signal handler to
      `documents.signals.document_consumer_declaration` in
      `your_app.apps`
* Install the app into Paperless by declaring
  `PAPERLESS_INSTALLED_APPS=your_app`.  Additional apps should be
  separated with commas.
* Restart the consumer
2017-03-25 15:10:25 +00:00