diff --git a/docs/consumption.rst b/docs/consumption.rst
new file mode 100644
index 000000000..03430b2e0
--- /dev/null
+++ b/docs/consumption.rst
@@ -0,0 +1,154 @@
+.. _consumption:
+
+Consumption
+###########
+
+Once you've got *Paperless* set up, you need to start feeding documents into it.
+Currently, there are three options: the consumption directory, IMAP (email), and
+HTTP POST.
+
+
+.. _consumption-directory:
+
+The Consumption Directory
+=========================
+
+The primary method of getting documents into your database is by putting them in
+the consumption directory. The ``document_consumer`` script runs in an infinite
+loop looking for new additions to this directory, and when it finds them, it goes
+about the process of parsing them with OCR, indexing what it finds, and then
+encrypting the PDF and storing it in the media directory.
+
+Getting stuff into this directory is up to you. If you're running *Paperless*
+on your local computer, you might just want to drag and drop files there, but if
+you're running this on a server and want your scanner to automatically push
+files to this directory, you'll need to set up some sort of service to accept the
+files from the scanner. Typically, you're looking at an FTP server like
+`Proftpd`_ or an SMB file share served by `Samba`_.
+
+.. _Proftpd: http://www.proftpd.org/
+.. _Samba: http://www.samba.org/
+
+So where is this consumption directory? It's wherever you define it. Look for
+the ``CONSUMPTION_DIR`` value in ``settings.py``. Set that to somewhere
+appropriate for your use and put some documents in there. When you're ready,
+follow the :ref:`consumer ` instructions to get it running.
+
+
+.. _consumption-directory-naming:
+
+A Note on File Naming
+---------------------
+
+Any document you put into the consumption directory will be consumed, but if you
+name the file right, it'll automatically set some values in the database for
+you. This is the logic the consumer follows:
+
+1. Try to find the sender, title, and tags in the file name following the
+   pattern: ``Sender - Title - tag,tag,tag.pdf``.
+2. If that doesn't work, try to find the sender and title in the file name
+   following the pattern: ``Sender - Title.pdf``.
+3. If that doesn't work, just assume that the name of the file is the title.
+
+So given the above, the following examples would work as you'd expect:
+
+* ``Some Company Name - Invoice 2016-01-01 - money,invoices.pdf``
+* ``Another Company - Letter of Reference.jpg``
+* ``Dad's Recipe for Pancakes.png``
+
+These, however, wouldn't work:
+
+* ``Some Company Name, Invoice 2016-01-01, money, invoices.pdf``
+* ``Another Company- Letter of Reference.jpg``
+
+
+.. _consumption-imap:
+
+IMAP (Email)
+============
+
+Another handy way to get documents into your database is to email them to
+yourself. A typical use case: you're out for lunch and want to send a
+copy of the receipt back to your system at home. *Paperless* can be taught to
+pull emails down from an arbitrary account and dump them into the consumption
+directory where the process :ref:`above ` will follow the
+usual pattern of consuming the document.
+
+Some things you need to know about this feature:
+
+* It's disabled by default. Setting the values below enables it.
+* It's been tested in a limited environment, so it may not work for you (please
+  submit a pull request if you can!).
+* It's designed to **delete mail from the server once consumed**. So don't go
+  pointing this to your personal email account and wonder where all your stuff
+  went.
+* Currently, only one photo (attachment) per email will work.
+
+So, with all that in mind, here's what you do to get it running:
+
+1. Set up a new email account somewhere, or if you're feeling daring, create a
+   folder in an existing email box and note the path to that folder.
+2. In ``settings.py`` set all of the appropriate values in ``MAIL_CONSUMPTION``.
+   If you decided to use a subfolder of an existing account, then make sure you
+   set ``INBOX`` accordingly here.
+3. Restart the :ref:`consumer `. The consumer will check
+   the configured email account every 10 minutes for something new and pull down
+   whatever it finds.
+4. Send yourself an email! Note that the subject is treated as the file name,
+   so if you set the subject to ``Sender - Title - tag,tag,tag``, you'll get
+   what you expect.
+5. After a few minutes, the consumer will poll your mailbox, pull down the
+   message, and place the attachment in the consumption directory with the
+   appropriate name. A few minutes later, the consumer will import it like any
+   other file.
+
+
+.. _consumption-http:
+
+HTTP POST
+=========
+
+Currently, the API is limited to handling file uploads: it doesn't do tags
+yet, and the URL schema isn't concrete, but it's a start. It's also not much of
+a real API; it's just a URL that accepts an HTTP POST.
+
+To push your document to *Paperless*, send an HTTP POST to the server with the
+following name/value pairs:
+
+* ``sender``: The name of the document's sender. Note that there are
+  restrictions on what characters you can use here. Specifically, alphanumeric
+  characters, ``-``, ``,``, ``.``, and ``'`` are ok; everything else is out. You
+  also can't use a dash with a space on either side (space, dash, space).
+* ``title``: The title of the document. The rules for characters are the same
+  here as for the sender.
+* ``signature``: For security reasons, we have the sender send a signature using
+  a "shared secret" method to make sure that random strangers don't start
+  uploading stuff to your server. The means of generating this signature is
+  defined below.
+
+Specify ``enctype="multipart/form-data"``, and then POST your file with::
+
+    Content-Disposition: form-data; name="document"; filename="whatever.pdf"
+
+
+.. _consumption-http-signature:
+
+Generating the Signature
+------------------------
+
+Generating a signature based on a shared secret is pretty simple: define a secret,
+and store it on the server and the client. Then use that secret, along with
+the text you want to verify, to generate a string that you can use for
+verification.
+
+In the case of *Paperless*, you configure the server with the secret by setting
+``UPLOAD_SHARED_SECRET``. Then on your client, you generate your signature by
+concatenating the sender, title, and the secret, and then using sha256 to
+generate a hexdigest.
+
+If you're using Python, this is what that looks like:
+
+.. code:: python
+
+    from hashlib import sha256  # Python 3 hashes bytes, so the text is encoded below (UTF-8 assumed)
+    signature = sha256((sender + title + secret).encode("utf-8")).hexdigest()
diff --git a/docs/index.rst b/docs/index.rst
index 2bf21633b..fc78f6f23 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -29,6 +29,7 @@ Contents
 
    requirements
    setup
+   consumption
    utilities
    migrating
    changelog