s/Sender/Correspondent & reworked the (im|ex)porter

This commit is contained in:
Daniel Quinn
2016-03-03 20:52:42 +00:00
parent fad466477b
commit 070463b85a
14 changed files with 342 additions and 184 deletions

View File

@@ -44,10 +44,10 @@ Any document you put into the consumption directory will be consumed, but if you
name the file right, it'll automatically set some values in the database for
you. This is is the logic the consumer follows:
1. Try to find the sender, title, and tags in the file name following the
pattern: ``Sender - Title - tag,tag,tag.pdf``.
2. If that doesn't work, try to find the sender and title in the file name
following the pattern: ``Sender - Title.pdf``.
1. Try to find the correspondent, title, and tags in the file name following
the pattern: ``Correspondent - Title - tag,tag,tag.pdf``.
2. If that doesn't work, try to find the correspondent and title in the file
name following the pattern: ``Correspondent - Title.pdf``.
3. If that doesn't work, just assume that the name of the file is the title.
So given the above, the following examples would work as you'd expect:
@@ -97,9 +97,9 @@ So, with all that in mind, here's what you do to get it running:
the configured email account every 10 minutes for something new and pull down
whatever it finds.
4. Send yourself an email! Note that the subject is treated as the file name,
so if you set the subject to ``Sender - Title - tag,tag,tag``, you'll get
what you expect. Also, you must include the aforementioned secret string in
every email so the fetcher knows that it's safe to import.
so if you set the subject to ``Correspondent - Title - tag,tag,tag``, you'll
get what you expect. Also, you must include the aforementioned secret
string in every email so the fetcher knows that it's safe to import.
5. After a few minutes, the consumer will poll your mailbox, pull down the
message, and place the attachment in the consumption directory with the
appropriate name. A few minutes later, the consumer will import it like any
@@ -118,16 +118,16 @@ a real API, it's just a URL that accepts an HTTP POST.
To push your document to *Paperless*, send an HTTP POST to the server with the
following name/value pairs:
* ``sender``: The name of the document's sender. Note that there are
restrictions on what characters you can use here. Specifically, alphanumeric
characters, `-`, `,`, `.`, and `'` are ok, everything else it out. You also
can't use the sequence ` - ` (space, dash, space).
* ``correspondent``: The name of the document's correspondent. Note that there
are restrictions on what characters you can use here. Specifically,
alphanumeric characters, `-`, `,`, `.`, and `'` are ok, everything else it
out. You also can't use the sequence ` - ` (space, dash, space).
* ``title``: The title of the document. The rules for characters is the same
here as the sender.
* ``signature``: For security reasons, we have the sender send a signature using
a "shared secret" method to make sure that random strangers don't start
uploading stuff to your server. The means of generating this signature is
defined below.
here as the correspondent.
* ``signature``: For security reasons, we have the correspondent send a
signature using a "shared secret" method to make sure that random strangers
don't start uploading stuff to your server. The means of generating this
signature is defined below.
Specify ``enctype="multipart/form-data"``, and then POST your file with:::
@@ -146,12 +146,12 @@ verification.
In the case of *Paperless*, you configure the server with the secret by setting
``UPLOAD_SHARED_SECRET``. Then on your client, you generate your signature by
concatenating the sender, title, and the secret, and then using sha256 to
generate a hexdigest.
concatenating the correspondent, title, and the secret, and then using sha256
to generate a hexdigest.
If you're using Python, this is what that looks like:
.. code:: python
from hashlib import sha256
signature = sha256(sender + title + secret).hexdigest()
signature = sha256(correspondent + title + secret).hexdigest()

View File

@@ -4,10 +4,68 @@ Migrating, Updates, and Backups
===============================
As *Paperless* is still under active development, there's a lot that can change
as software updates roll out. The thing you just need to remember for all of
this is that for the most part, **the database is expendable** so long as you
have your files. This is because the file name of the exported files includes
the name of the sender, the title, and the tags (if any) on each file.
as software updates roll out. You should backup often, so if anything goes
wrong during an update, you at least have a means of restoring to something
usable. Thankfully, there are automated ways of backing up, restoring, and
updating the software.
.. _migrating-backup:
Backing Up
----------
So you're bored of this whole project, or you want to make a remote backup of
the unencrypted files for whatever reason. This is easy to do, simply use the
:ref:`exporter <utilities-exporter>` to dump your documents and database out
into an arbitrary directory.
.. _migrating-restoring:
Restoring
---------
Restoring your data is just as easy, since nearly all of your data exists either
in the file names, or in the contents of the files themselves. You just need to
create an empty database (just follow the
:ref:`installation instructions <setup-installation>` again) and then import the
``tags.json`` file you created as part of your backup. Lastly, copy your
exported documents into the consumption directory and start up the consumer.
.. code-block:: shell-session
$ cd /path/to/project
$ rm data/db.sqlite3 # Delete the database
$ cd src
$ ./manage.py migrate # Create the database
$ ./manage.py createsuperuser
$ ./manage.py loaddata /path/to/arbitrary/place/tags.json
$ cp /path/to/exported/docs/* /path/to/consumption/dir/
$ ./manage.py document_consumer
Importing your data if you are :ref:`using Docker <setup-installation-docker>`
is almost as simple:
.. code-block:: shell-session
# Stop and remove your current containers
$ docker-compose stop
$ docker-compose rm -f
# Recreate them, add the superuser
$ docker-compose up -d
$ docker-compose run --rm webserver createsuperuser
# Load the tags
$ cat /path/to/arbitrary/place/tags.json | docker-compose run --rm webserver loaddata_stdin -
# Load your exported documents into the consumption directory
# (How you do this highly depends on how you have set this up)
$ cp /path/to/exported/docs/* /path/to/mounted/consumption/dir/
After loading the documents into the consumption directory the consumer will
immediately start consuming the documents.
.. _migrating-updates:
@@ -20,7 +78,7 @@ on the directory containing the project files, and then use Django's ``migrate``
command to execute any database schema updates that might have been rolled in
as part of the update:
.. code:: bash
.. code-block:: shell-session
$ cd /path/to/project
$ git pull
@@ -43,112 +101,3 @@ requires only one additional step:
If ``git pull`` doesn't report any changes, there is no need to continue with
the remaining steps.
.. _migrating-backup:
Backing Up
----------
So you're bored of this whole project, or you want to make a remote backup of
the unencrypted files for whatever reason. This is easy to do, simply use the
:ref:`exporter <utilities-exporter>` to dump your documents out into an
arbitrary directory.
Additionally however, you'll need to back up the tags themselves. The file
names contain the tag names, but you still need to define the tags and their
matching algorithms in the database for things to work properly. We do this
with Django's ``dumpdata`` command, which produces JSON output.
.. code:: bash
$ cd /path/to/project
$ cd src
$ ./manage.py document_export /path/to/arbitrary/place/
$ ./manage.py dumpdata documents.Tag > /path/to/arbitrary/place/tags.json
If you are :ref:`using Docker <setup-installation-docker>`, exporting your tags
as JSON is almost as easy:
.. code-block:: shell-session
$ docker-compose run --rm webserver dumpdata documents.Tag > /path/to/arbitrary/place/tags.json
To export the documents you can either use ``docker run`` directly, specifying all
the commandline options by hand, or (more simply) mount a second volume for export.
To mount a volume for exports, follow the instructions in the
``docker-compose.yml.example`` file for the ``/export`` volume (making the changes
in your own ``docker-compose.yml`` file, of course). Once you have the
volume mounted, the command to run an export is:
.. code-block:: console
$ docker-compose run --rm consumer document_exporter /export
If you prefer to use ``docker run`` directly, supplying the necessary commandline
options:
.. code-block:: shell-session
$ # Identify your containers
$ docker-compose ps
Name Command State Ports
-------------------------------------------------------------------------
paperless_consumer_1 /sbin/docker-entrypoint.sh ... Exit 0
paperless_webserver_1 /sbin/docker-entrypoint.sh ... Exit 0
$ # Make sure to replace your passphrase and remove or adapt the id mapping
$ docker run --rm \
--volumes-from paperless_data_1 \
--volume /path/to/arbitrary/place:/export \
-e PAPERLESS_PASSPHRASE=YOUR_PASSPHRASE \
-e USERMAP_UID=1000 -e USERMAP_GID=1000 \
paperless document_exporter /export
.. _migrating-restoring:
Restoring
---------
Restoring your data is just as easy, since nearly all of your data exists either
in the file names, or in the contents of the files themselves. You just need to
create an empty database (just follow the
:ref:`installation instructions <setup-installation>` again) and then import the
``tags.json`` file you created as part of your backup. Lastly, copy your
exported documents into the consumption directory and start up the consumer.
.. code:: bash
$ cd /path/to/project
$ rm data/db.sqlite3 # Delete the database
$ cd src
$ ./manage.py migrate # Create the database
$ ./manage.py createsuperuser
$ ./manage.py loaddata /path/to/arbitrary/place/tags.json
$ cp /path/to/exported/docs/* /path/to/consumption/dir/
$ ./manage.py document_consumer
Importing your data if you are :ref:`using Docker <setup-installation-docker>`
is almost as simple:
.. code-block:: shell-session
$ # Stop and remove your current containers
$ docker-compose stop
$ docker-compose rm -f
$ # Recreate them, add the superuser
$ docker-compose up -d
$ docker-compose run --rm webserver createsuperuser
$ # Load the tags
$ cat /path/to/arbitrary/place/tags.json | docker-compose run --rm webserver loaddata_stdin -
$ # Load your exported documents into the consumption directory
$ # (How you do this highly depends on how you have set this up)
$ cp /path/to/exported/docs/* /path/to/mounted/consumption/dir/
After loading the documents into the consumption directory the consumer will
immediately start consuming the documents.

View File

@@ -26,7 +26,7 @@ How to Use It
The webserver is started via the ``manage.py`` script:
.. code:: bash
.. code-block:: shell-session
$ /path/to/paperless/src/manage.py runserver
@@ -64,7 +64,7 @@ How to Use It
The consumer is started via the ``manage.py`` script:
.. code:: bash
.. code-block:: shell-session
$ /path/to/paperless/src/manage.py document_consumer
@@ -95,16 +95,86 @@ How to Use It
This too is done via the ``manage.py`` script:
.. code:: bash
.. code-block:: shell-session
$ /path/to/paperless/src/manage.py document_exporter /path/to/somewhere
$ /path/to/paperless/src/manage.py document_exporter /path/to/somewhere/
This will dump all of your PDFs into ``/path/to/somewhere`` for you to do with
as you please. The naming scheme on export is identical to that used for
import, so should you can now safely delete the entire project directly,
database, encrypted PDFs and all, and later create it all again simply by
running the consumer again and dumping all of these files into
``CONSUMPTION_DIR``.
This will dump all of your unencrypted PDFs into ``/path/to/somewhere`` for you
to do with as you please. The files are accompanied with a special file,
``manifest.json`` which can be used to
:ref:`import the files <utilities-importer>` at a later date if you wish.
.. _utilities-exporter-howto-docker:
Docker
______
If you are :ref:`using Docker <setup-installation-docker>`, running the
expoorter is almost as easy. To mount a volume for exports, follow the
instructions in the ``docker-compose.yml.example`` file for the ``/export``
volume (making the changes in your own ``docker-compose.yml`` file, of course).
Once you have the volume mounted, the command to run an export is:
.. code-block:: shell-session
$ docker-compose run --rm consumer document_exporter /export
If you prefer to use ``docker run`` directly, supplying the necessary commandline
options:
.. code-block:: shell-session
$ # Identify your containers
$ docker-compose ps
Name Command State Ports
-------------------------------------------------------------------------
paperless_consumer_1 /sbin/docker-entrypoint.sh ... Exit 0
paperless_webserver_1 /sbin/docker-entrypoint.sh ... Exit 0
$ # Make sure to replace your passphrase and remove or adapt the id mapping
$ docker run --rm \
--volumes-from paperless_data_1 \
--volume /path/to/arbitrary/place:/export \
-e PAPERLESS_PASSPHRASE=YOUR_PASSPHRASE \
-e USERMAP_UID=1000 -e USERMAP_GID=1000 \
paperless document_exporter /export
.. _utilities-importer:
The Importer
------------
Looking to transfer Paperless data from one instance to another, or just want
to restore from a backup? This is your go-to toy.
.. _utilities-importer-howto:
How to Use It
.............
The importer works just like the exporter. You point it at a directory, and
the script does the rest of the work:
.. code-block:: shell-session
$ /path/to/paperless/src/manage.py document_importer /path/to/somewhere/
Docker
______
Assuming that you've already gone through the steps above in the
:ref:`export <utilities-exporter-howto-docker>` section, then the easiest thing
to do is just re-use the ``/export`` path you already setup:
.. code-block:: shell-session
$ docker-compose run --rm consumer document_importer /export
Similarly, if you're not using docker-compose, you can adjust the export
instructions above to do the import.
.. _utilities-retagger: