Documented consumption

2025-11-21 04:36:53 -06:00 · 2016-02-14 00:10:49 +00:00
parent 330dfa544b
commit cec9968cdb
2 changed files with 155 additions and 0 deletions
--- a/docs/consumption.rst
+++ b/docs/consumption.rst
@@ -0,0 +1,154 @@
+.. _consumption:
+
+Consumption
+###########
+
+Once you've got *Paperless* set up, you need to start feeding documents into it.
+Currently, there are three options: the consumption directory, IMAP (email), and
+HTTP POST.
+
+
+.. _consumption-directory:
+
+The Consumption Directory
+=========================
+
+The primary method of getting documents into your database is putting them in
+the consumption directory.  The ``document_consumer`` script runs in an infinite
+loop looking for new additions to this directory.  When it finds one, it parses
+it with OCR, indexes what it finds, encrypts the PDF, and stores it in the media
+directory.
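+
+Conceptually, that loop is just polling a directory.  The sketch below only
+illustrates the idea and is not the actual ``document_consumer`` code:
+``scan_once`` and ``poll`` are hypothetical names, the 10-second interval is an
+arbitrary choice, and the OCR, indexing, and encryption steps are left to
+whatever ``handle`` callable you pass in.
+
+.. code:: python
+
+    import os
+    import time
+
+
+    def scan_once(directory, seen):
+        """Return paths of files in ``directory`` that aren't in ``seen``.
+
+        Each new path is added to ``seen`` so it is only reported once.
+        """
+        new = []
+        for name in sorted(os.listdir(directory)):
+            path = os.path.join(directory, name)
+            if os.path.isfile(path) and path not in seen:
+                seen.add(path)
+                new.append(path)
+        return new
+
+
+    def poll(directory, handle, interval=10):
+        """Check ``directory`` forever, passing each new file to ``handle``."""
+        seen = set()
+        while True:
+            for path in scan_once(directory, seen):
+                handle(path)  # hypothetical: OCR, index, encrypt, store
+            time.sleep(interval)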
+
+Getting stuff into this directory is up to you.  If you're running *Paperless*
+on your local computer, you might just want to drag and drop files there, but if
+you're running it on a server and want your scanner to automatically push files
+to this directory, you'll need to set up some sort of service to accept the
+files from the scanner.  Typically, you're looking at an FTP server like
+`Proftpd`_ or a network file share like `Samba`_.
+
+.. _Proftpd: http://www.proftpd.org/
+.. _Samba: http://www.samba.org/
+
+So where is this consumption directory?  It's wherever you define it.  Look for
+the ``CONSUMPTION_DIR`` value in ``settings.py``.  Set that to somewhere
+appropriate for your use and put some documents in there.  When you're ready,
+follow the :ref:`consumer <utilities-consumer>` instructions to get it running.
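+
+For example, the line in ``settings.py`` might end up looking something like
+this.  The path is only a placeholder, so use whatever suits your setup.
+
+.. code:: python
+
+    # settings.py (placeholder path; point it wherever suits your setup)
+    CONSUMPTION_DIR = "/home/paperless/consume"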
+
+
+.. _consumption-directory-naming:
+
+A Note on File Naming
+---------------------
+
+Any document you put into the consumption directory will be consumed, but if you
+name the file right, it'll automatically set some values in the database for
+you.  This is the logic the consumer follows:
+
+1. Try to find the sender, title, and tags in the file name following the
+   pattern: ``Sender - Title - tag,tag,tag.pdf``.
+2. If that doesn't work, try to find the sender and title in the file name
+   following the pattern: ``Sender - Title.pdf``.
+3. If that doesn't work, just assume that the name of the file is the title.
+
+So given the above, the following examples would work as you'd expect:
+
+* ``Some Company Name - Invoice 2016-01-01 - money,invoices.pdf``
+* ``Another Company - Letter of Reference.jpg``
+* ``Dad's Recipe for Pancakes.png``
+
+These, however, wouldn't work, because neither separates the parts with space,
+dash, space:
+
+* ``Some Company Name, Invoice 2016-01-01, money, invoices.pdf``
+* ``Another Company- Letter of Reference.jpg``
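+
+To make the three rules concrete, here's a rough sketch of that logic in
+Python.  ``parse_filename`` is a hypothetical helper that splits on the
+space-dash-space separator.  It isn't the consumer's actual code, which may
+handle edge cases differently.
+
+.. code:: python
+
+    import os
+
+
+    def parse_filename(filename):
+        """Return (sender, title, tags) using the three rules above.
+
+        ``sender`` is None and ``tags`` is empty when they can't be found.
+        """
+        stem = os.path.splitext(filename)[0]  # drop ".pdf", ".jpg", etc.
+        parts = stem.split(" - ")
+        if len(parts) == 3:
+            # Rule 1: "Sender - Title - tag,tag,tag"
+            sender, title, tags = parts
+            return sender, title, tags.split(",")
+        if len(parts) == 2:
+            # Rule 2: "Sender - Title"
+            sender, title = parts
+            return sender, title, []
+        # Rule 3: treat the whole name as the title
+        return None, stem, []
+
+Run against the examples above, the first three split as you'd expect.  The
+last two contain no space-dash-space separator, so they fall through to rule 3
+and the whole name becomes the title.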
+
+
+.. _consumption-imap:
+
+IMAP (Email)
+============
+
+Another handy way to get documents into your database is to email them to
+yourself.  A typical use case: you're out for lunch and want to send a copy of
+the receipt back to your system at home.  *Paperless* can be taught to pull
+emails down from an arbitrary account and drop the attachments into the
+consumption directory, where the consumer picks them up as described
+:ref:`above <consumption-directory>`.
+
+Some things you need to know about this feature:
+
+* It's disabled by default.  Setting the values below enables it.
+* It's been tested in a limited environment, so it may not work for you (please
+  submit a pull request if you can!)
+* It's designed to **delete mail from the server once consumed**.  So don't go
+  pointing this to your personal email account and wonder where all your stuff
+  went.
+* Currently, only one photo (attachment) per email will work.
+
+So, with all that in mind, here's what you do to get it running:
+
+1. Set up a new email account somewhere, or if you're feeling daring, create a
+   folder in an existing email box and note the path to that folder.
+2. In ``settings.py`` set all of the appropriate values in ``MAIL_CONSUMPTION``.
+   If you decided to use a subfolder of an existing account, then make sure you
+   set ``INBOX`` accordingly here.
+3. Restart the :ref:`consumer <utilities-consumer>`.  The consumer will check
+   the configured email account every 10 minutes for something new and pull down
+   whatever it finds.
+4. Send yourself an email!  Note that the subject is treated as the file name,
+   so if you set the subject to ``Sender - Title - tag,tag,tag``, you'll get
+   what you expect.
+5. On its next check, the consumer will poll your mailbox, pull down the
+   message, and place the attachment in the consumption directory with the
+   appropriate name.  A few minutes later, the consumer will import it like any
+   other file.
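+
+Steps 4 and 5 amount to turning an email into a file name and its contents.
+Here's a sketch of that mapping using Python's standard ``email`` package.  It
+assumes the message has already been fetched from the server as raw bytes.
+``attachment_to_file`` is a hypothetical helper, not part of *Paperless*, and
+the IMAP fetching and deletion are left out.
+
+.. code:: python
+
+    import email
+    import os
+    from email import policy
+
+
+    def attachment_to_file(raw_bytes):
+        """Return (file_name, payload) for the first attachment, or None.
+
+        The subject becomes the file name, and the attachment's own
+        extension (e.g. ".pdf") is kept.
+        """
+        msg = email.message_from_bytes(raw_bytes, policy=policy.default)
+        for part in msg.iter_attachments():
+            ext = os.path.splitext(part.get_filename() or "")[1]
+            # Only the first attachment is used, matching the one-per-email limit
+            return str(msg["subject"] or "") + ext, part.get_content()
+        return None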
+
+
+.. _consumption-http:
+
+HTTP POST
+=========
+
+Currently, the API is limited to handling file uploads: it doesn't do tags
+yet, and the URL schema isn't concrete, but it's a start.  It's also not much of
+a real API; it's just a URL that accepts an HTTP POST.
+
+To push your document to *Paperless*, send an HTTP POST to the server with the
+following name/value pairs:
+
+* ``sender``: The name of the document's sender.  Note that there are
+  restrictions on what characters you can use here.  Specifically, alphanumeric
+  characters, ``-``, ``,``, ``.``, and ``'`` are ok; everything else is out.
+  You also can't use the sequence space, dash, space.
+* ``title``: The title of the document.  The same character rules apply here
+  as for ``sender``.
+* ``signature``: For security reasons, the client sends a signature generated
+  with a "shared secret" method, to make sure that random strangers don't start
+  uploading stuff to your server.  How to generate this signature is described
+  :ref:`below <consumption-http-signature>`.
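+
+If you want to catch bad values before sending them, a client-side check along
+these lines might help.  It's a sketch: ``is_valid_field`` is hypothetical,
+"alphanumeric" is taken to mean ASCII letters and digits, and spaces are
+assumed to be allowed, since the rules only forbid the space-dash-space
+sequence.
+
+.. code:: python
+
+    import re
+
+    # Assumption: ASCII letters and digits, "-", ",", ".", "'", and spaces
+    _ALLOWED = re.compile(r"[A-Za-z0-9\-,.' ]+")
+
+
+    def is_valid_field(value):
+        """Return True if ``value`` looks acceptable as a sender or title."""
+        return _ALLOWED.fullmatch(value) is not None and " - " not in value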
+
+Specify ``enctype="multipart/form-data"``, and then POST your file with::
+
+    Content-Disposition: form-data; name="document"; filename="whatever.pdf"
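+
+Putting those pieces together, here's a sketch of a client that uses only
+Python's standard library.  The upload URL you pass in is up to you (the URL
+schema isn't settled), ``build_upload`` and ``upload`` are hypothetical
+helpers, and the signature uses the sender + title + secret SHA-256 scheme
+described :ref:`below <consumption-http-signature>`.
+
+.. code:: python
+
+    import os
+    import urllib.request
+    import uuid
+    from hashlib import sha256
+
+
+    def build_upload(sender, title, secret, filename, content):
+        """Return (body, content_type) for a multipart/form-data upload."""
+        boundary = uuid.uuid4().hex
+        signature = sha256((sender + title + secret).encode("utf-8")).hexdigest()
+        lines = []
+        # One form-data part per name/value pair
+        for name, value in (("sender", sender), ("title", title),
+                            ("signature", signature)):
+            lines += [
+                f"--{boundary}".encode(),
+                f'Content-Disposition: form-data; name="{name}"'.encode(),
+                b"",
+                value.encode("utf-8"),
+            ]
+        # The file itself goes in a part named "document"
+        lines += [
+            f"--{boundary}".encode(),
+            f'Content-Disposition: form-data; name="document"; filename="{filename}"'.encode(),
+            b"Content-Type: application/octet-stream",
+            b"",
+            content,
+            f"--{boundary}--".encode(),
+            b"",
+        ]
+        return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"
+
+
+    def upload(url, sender, title, secret, path):
+        """POST the file at ``path`` to ``url`` and return the HTTP status."""
+        with open(path, "rb") as f:
+            body, content_type = build_upload(
+                sender, title, secret, os.path.basename(path), f.read())
+        request = urllib.request.Request(
+            url, data=body, headers={"Content-Type": content_type},
+            method="POST")
+        with urllib.request.urlopen(request) as response:
+            return response.status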
+
+
+.. _consumption-http-signature:
+
+Generating the Signature
+------------------------
+
+Generating a signature based on a shared secret is pretty simple: define a
+secret, and store it on both the server and the client.  Then use that secret,
+along with the text you want to verify, to generate a string that you can use
+for verification.
+
+In the case of *Paperless*, you configure the server with the secret by setting
+``UPLOAD_SHARED_SECRET``.  Then, on your client, you generate your signature by
+concatenating the sender, title, and secret, and then using SHA-256 to generate
+a hex digest.
+
+If you're using Python, this is what that looks like:
+
+.. code:: python
+
+    from hashlib import sha256
+
+    # Python 3 hashes bytes, so encode the concatenated strings (UTF-8 here)
+    signature = sha256((sender + title + secret).encode("utf-8")).hexdigest()
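+
+On the server side, checking a signature means recomputing it from the
+submitted sender and title plus the shared secret, then comparing the result
+with what the client sent.  The sketch below shows one way to do this, not
+necessarily how *Paperless* does it.  ``verify_signature`` is hypothetical, and
+the constant-time comparison via ``hmac.compare_digest`` is a choice made here.
+
+.. code:: python
+
+    import hmac
+    from hashlib import sha256
+
+
+    def verify_signature(sender, title, signature, secret):
+        """Return True if ``signature`` matches sender + title + secret."""
+        expected = sha256((sender + title + secret).encode("utf-8")).hexdigest()
+        # compare_digest avoids leaking where the first mismatching byte is
+        return hmac.compare_digest(expected.encode(), signature.encode("utf-8"))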
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -29,6 +29,7 @@ Contents

   requirements
   setup
+   consumption
   utilities
   migrating
   changelog