mirror of
https://github.com/paperless-ngx/paperless-ngx.git
synced 2025-04-02 13:45:10 -05:00
Documented consumption
This commit is contained in:
parent
330dfa544b
commit
cec9968cdb
154
docs/consumption.rst
Normal file
@ -0,0 +1,154 @@
.. _consumption:

Consumption
###########

Once you've got *Paperless* set up, you need to start feeding documents into it.
Currently, there are three options: the consumption directory, IMAP (email), and
HTTP POST.

.. _consumption-directory:

The Consumption Directory
=========================

The primary method of getting documents into your database is by putting them in
the consumption directory. The ``document_consumer`` script runs in an infinite
loop looking for new additions to this directory. When it finds one, it runs OCR
on the file, indexes the text it finds, encrypts the PDF, and stores the result
in the media directory.
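
The "look for new additions" step of that loop can be sketched as follows. This
is a minimal illustration, not the actual ``document_consumer`` code: the
function name ``find_new_files`` and the in-memory ``seen`` set are assumptions
made for the example, and the OCR, indexing, and encryption steps are left out.

```python
import os
import tempfile


def find_new_files(directory, seen):
    """Return paths in `directory` that are not yet in `seen`, and record them.

    Hypothetical helper: the real consumer may track files differently.
    """
    new_files = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Skip sub-directories and anything handled on an earlier pass.
        if os.path.isfile(path) and path not in seen:
            seen.add(path)
            new_files.append(path)
    return new_files


# Two passes of the loop against a throwaway directory.
with tempfile.TemporaryDirectory() as consumption_dir:
    seen = set()
    open(os.path.join(consumption_dir, "scan.pdf"), "wb").close()
    first_pass = find_new_files(consumption_dir, seen)   # picks up scan.pdf
    second_pass = find_new_files(consumption_dir, seen)  # nothing new
```

A real loop would call something like this repeatedly with a pause in between,
handing each new path on to the OCR and storage steps.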

Getting stuff into this directory is up to you. If you're running *Paperless*
on your local computer, you might just want to drag and drop files there, but if
you're running this on a server and want your scanner to automatically push
files to this directory, you'll need to set up some sort of service to accept the
files from the scanner. Typically, you're looking at an FTP server like
`Proftpd`_ or a file-sharing server like `Samba`_.

.. _Proftpd: http://www.proftpd.org/
.. _Samba: http://www.samba.org/

So where is this consumption directory? It's wherever you define it. Look for
the ``CONSUMPTION_DIR`` value in ``settings.py``. Set that to somewhere
appropriate for your use and put some documents in there. When you're ready,
follow the :ref:`consumer <utilities-consumer>` instructions to get it running.


.. _consumption-directory-naming:

A Note on File Naming
---------------------

Any document you put into the consumption directory will be consumed, but if you
name the file right, it'll automatically set some values in the database for
you. This is the logic the consumer follows:

1. Try to find the sender, title, and tags in the file name following the
   pattern: ``Sender - Title - tag,tag,tag.pdf``.
2. If that doesn't work, try to find the sender and title in the file name
   following the pattern: ``Sender - Title.pdf``.
3. If that doesn't work, just assume that the name of the file is the title.

So given the above, the following examples would work as you'd expect:

* ``Some Company Name - Invoice 2016-01-01 - money,invoices.pdf``
* ``Another Company - Letter of Reference.jpg``
* ``Dad's Recipe for Pancakes.png``

These, however, wouldn't work, because the first separates its parts with
commas and the second is missing the space before the dash:

* ``Some Company Name, Invoice 2016-01-01, money, invoices.pdf``
* ``Another Company- Letter of Reference.jpg``
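
The three-step fallback above can be sketched in a few lines of Python. This is
an illustration under assumptions, not the consumer's actual parser: the
function name ``parse_file_name`` and the returned dictionary are invented for
the example, and it simply splits on space, dash, space.

```python
import os


def parse_file_name(file_name):
    """Guess sender, title, and tags from a file name (hypothetical helper)."""
    stem, _extension = os.path.splitext(file_name)
    parts = stem.split(" - ")

    # 1. Sender - Title - tag,tag,tag
    if len(parts) == 3:
        sender, title, tags = parts
        return {
            "sender": sender,
            "title": title,
            "tags": [tag for tag in tags.split(",") if tag],
        }

    # 2. Sender - Title
    if len(parts) == 2:
        sender, title = parts
        return {"sender": sender, "title": title, "tags": []}

    # 3. Anything else: the whole name is the title.
    return {"sender": None, "title": stem, "tags": []}


full = parse_file_name("Some Company Name - Invoice 2016-01-01 - money,invoices.pdf")
partial = parse_file_name("Another Company - Letter of Reference.jpg")
title_only = parse_file_name("Dad's Recipe for Pancakes.png")
missing_space = parse_file_name("Another Company- Letter of Reference.jpg")
```

Note that the last call falls through to step 3, so the whole name becomes the
title, which is why that file name doesn't behave as you might hope.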


.. _consumption-imap:

IMAP (Email)
============

Another handy way to get documents into your database is to email them to
yourself. The typical use case: you're out for lunch and want to send a copy of
the receipt back to your system at home. *Paperless* can be taught to pull
emails down from an arbitrary account and dump their attachments into the
consumption directory, where the process described
:ref:`above <consumption-directory>` consumes them as usual.

Some things you need to know about this feature:

* It's disabled by default. Setting the values below enables it.
* It's been tested in a limited environment, so it may not work for you (please
  submit a pull request if you can!)
* It's designed to **delete mail from the server once consumed**. So don't go
  pointing this to your personal email account and wonder where all your stuff
  went.
* Currently, only one photo (attachment) per email will work.

So, with all that in mind, here's what you do to get it running:

1. Set up a new email account somewhere, or if you're feeling daring, create a
   folder in an existing mailbox and note the path to that folder.
2. In ``settings.py`` set all of the appropriate values in ``MAIL_CONSUMPTION``.
   If you decided to use a subfolder of an existing account, then make sure you
   set ``INBOX`` accordingly here.
3. Restart the :ref:`consumer <utilities-consumer>`. The consumer will check
   the configured email account every 10 minutes for something new and pull down
   whatever it finds.
4. Send yourself an email! Note that the subject is treated as the file name,
   so if you set the subject to ``Sender - Title - tag,tag,tag``, you'll get
   what you expect.
5. After a few minutes, the consumer will poll your mailbox, pull down the
   message, and place the attachment in the consumption directory with the
   appropriate name. A few minutes later, the consumer will import it like any
   other file.
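
To make steps 4 and 5 concrete, here is a rough sketch of turning one message
into a file for the consumption directory, using only Python's standard
``email`` package. It is not the consumer's real code: the function name
``attachment_for_consumption`` is hypothetical, keeping the attachment's
original extension is an assumption, and fetching or deleting mail over IMAP
is left out entirely.

```python
import os
from email import message_from_bytes, policy
from email.message import EmailMessage


def attachment_for_consumption(raw_message):
    """Return (file_name, payload) for the first attachment, or None.

    Hypothetical helper: the subject becomes the file name, and only the
    first attachment is used, mirroring the one-attachment limit above.
    """
    message = message_from_bytes(raw_message, policy=policy.default)
    subject = str(message["Subject"] or "untitled")
    for part in message.iter_attachments():
        # Assumption: keep the extension of the attached file.
        extension = os.path.splitext(part.get_filename() or "")[1]
        return subject + extension, part.get_payload(decode=True)
    return None


# Build a demo message in memory instead of talking to a mail server.
demo = EmailMessage()
demo["Subject"] = "Cafe Example - Lunch Receipt - receipts,food"
demo.set_content("Receipt attached.")
demo.add_attachment(
    b"%PDF-1.4 demo",
    maintype="application",
    subtype="pdf",
    filename="IMG_0001.pdf",
)

file_name, payload = attachment_for_consumption(demo.as_bytes())
```

Writing ``payload`` to ``file_name`` inside the consumption directory would
then hand the document over to the normal file naming logic.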


.. _consumption-http:

HTTP POST
=========

Currently, the API is limited to handling file uploads. It doesn't do tags
yet, and the URL schema isn't concrete, but it's a start. It's also not much of
a real API: it's just a URL that accepts an HTTP POST.

To push your document to *Paperless*, send an HTTP POST to the server with the
following name/value pairs:

* ``sender``: The name of the document's sender. Note that there are
  restrictions on what characters you can use here. Specifically, alphanumeric
  characters, ``-``, ``,``, ``.``, and ``'`` are ok; everything else is out. You
  also can't use the sequence ``" - "`` (space, dash, space).
* ``title``: The title of the document. The rules for characters are the same
  here as for the sender.
* ``signature``: For security reasons, we have the sender send a signature using
  a "shared secret" method to make sure that random strangers don't start
  uploading stuff to your server. The means of generating this signature is
  defined below.

Specify ``enctype="multipart/form-data"``, and then POST your file with::

    Content-Disposition: form-data; name="document"; filename="whatever.pdf"
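
A client may want to check ``sender`` and ``title`` against the character rules
above before uploading. The following is a minimal sketch of such a check, not
the server's actual validation: the helper name ``is_valid_field`` is made up,
"alphanumeric" is read as ASCII letters and digits, and single spaces are
assumed to be allowed (otherwise the space, dash, space rule would be
redundant).

```python
import re

# Assumption: ASCII letters, digits, spaces, and the four listed characters.
ALLOWED_CHARACTERS = re.compile(r"[A-Za-z0-9 ,.'-]+")


def is_valid_field(value):
    """Return True if `value` looks acceptable as a sender or title."""
    if not ALLOWED_CHARACTERS.fullmatch(value):
        return False
    # The separator used in file names must not appear inside a field.
    return " - " not in value


accepted = is_valid_field("Dad's Bakery, Inc.")
rejected_separator = is_valid_field("Some Company - Invoice")
rejected_character = is_valid_field("Invoice #42")
```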


.. _consumption-http-signature:

Generating the Signature
------------------------

Generating a signature based on a shared secret is pretty simple: define a secret,
and store it on the server and the client. Then use that secret, along with
the text you want to verify, to generate a string that you can use for
verification.

In the case of *Paperless*, you configure the server with the secret by setting
``UPLOAD_SHARED_SECRET``. Then on your client, you generate your signature by
concatenating the sender, title, and the secret, and then using sha256 to
generate a hexdigest.

If you're using Python, this is what that looks like:

.. code:: python

    from hashlib import sha256

    # Python 3 hashes bytes, so encode first (UTF-8 is assumed here).
    signature = sha256((sender + title + secret).encode("utf-8")).hexdigest()
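
Putting the pieces together, a client upload might be assembled roughly like
this. Treat it as a sketch under assumptions rather than a reference client:
the URL is a placeholder (the URL schema isn't settled), the helper names are
invented, UTF-8 encoding of the signed text is assumed, and the request is only
built here, never sent.

```python
import uuid
from hashlib import sha256
from urllib.request import Request


def make_signature(sender, title, secret):
    """sha256 hexdigest of sender + title + secret (UTF-8 is an assumption)."""
    return sha256((sender + title + secret).encode("utf-8")).hexdigest()


def build_upload_request(url, sender, title, secret, file_name, file_bytes):
    """Build a multipart/form-data POST carrying the three fields and the file."""
    boundary = uuid.uuid4().hex
    fields = {
        "sender": sender,
        "title": title,
        "signature": make_signature(sender, title, secret),
    }
    chunks = []
    for name, value in fields.items():
        chunks.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    # The file part uses the "document" field name described above.
    chunks.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="document"; filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n".encode("utf-8")
    )
    chunks.append(file_bytes)
    chunks.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return Request(url, data=b"".join(chunks), headers=headers, method="POST")


# Placeholder URL and secret, for illustration only.
request = build_upload_request(
    "http://paperless.example/upload",
    sender="Some Company Name",
    title="Invoice 2016-01-01",
    secret="not-a-real-secret",
    file_name="whatever.pdf",
    file_bytes=b"%PDF-1.4 demo",
)
```

Passing ``request`` to ``urllib.request.urlopen`` would send it. Building the
body by hand keeps the example free of third-party dependencies.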

@ -29,6 +29,7 @@ Contents

requirements
setup
consumption
utilities
migrating
changelog