s/Sender/Correspondent & reworked the (im|ex)porter
@@ -44,10 +44,10 @@ Any document you put into the consumption directory will be consumed, but if you
 name the file right, it'll automatically set some values in the database for
 you. This is the logic the consumer follows:

-1. Try to find the sender, title, and tags in the file name following the
-   pattern: ``Sender - Title - tag,tag,tag.pdf``.
-2. If that doesn't work, try to find the sender and title in the file name
-   following the pattern: ``Sender - Title.pdf``.
+1. Try to find the correspondent, title, and tags in the file name following
+   the pattern: ``Correspondent - Title - tag,tag,tag.pdf``.
+2. If that doesn't work, try to find the correspondent and title in the file
+   name following the pattern: ``Correspondent - Title.pdf``.
 3. If that doesn't work, just assume that the name of the file is the title.

 So given the above, the following examples would work as you'd expect:
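To make the renamed pattern concrete, here is a minimal, hypothetical sketch of the matching order the list above describes. The regular expressions and the function name are illustrative only, not the consumer's actual implementation.

.. code-block:: python

    import re

    def guess_attributes(file_name):
        """Illustrative only: mirror the three matching steps above."""
        stem = re.sub(r"\.[^.]+$", "", file_name)  # strip the extension

        # 1. "Correspondent - Title - tag,tag,tag"
        match = re.match(r"^(.+) - (.+) - ([a-z0-9,\-]+)$", stem)
        if match:
            correspondent, title, tags = match.groups()
            return correspondent, title, tags.split(",")

        # 2. "Correspondent - Title"
        match = re.match(r"^(.+) - (.+)$", stem)
        if match:
            correspondent, title = match.groups()
            return correspondent, title, []

        # 3. Fall back to treating the whole file name as the title.
        return None, stem, []

    print(guess_attributes("ACME Inc - Invoice 42 - finance,2016.pdf"))
    # ('ACME Inc', 'Invoice 42', ['finance', '2016'])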
@@ -97,9 +97,9 @@ So, with all that in mind, here's what you do to get it running:
 the configured email account every 10 minutes for something new and pull down
 whatever it finds.
 4. Send yourself an email! Note that the subject is treated as the file name,
-   so if you set the subject to ``Sender - Title - tag,tag,tag``, you'll get
-   what you expect. Also, you must include the aforementioned secret string in
-   every email so the fetcher knows that it's safe to import.
+   so if you set the subject to ``Correspondent - Title - tag,tag,tag``, you'll
+   get what you expect. Also, you must include the aforementioned secret
+   string in every email so the fetcher knows that it's safe to import.
 5. After a few minutes, the consumer will poll your mailbox, pull down the
    message, and place the attachment in the consumption directory with the
    appropriate name. A few minutes later, the consumer will import it like any
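As a concrete illustration of step 4, the sketch below sends such an email from Python. The SMTP host, the addresses, the attachment, and the secret are placeholders; the only requirements taken from the text above are the subject format and the secret string appearing in the body.

.. code-block:: python

    import smtplib
    from email.message import EmailMessage

    msg = EmailMessage()
    msg["Subject"] = "ACME Inc - Invoice 42 - finance,2016"  # treated as the file name
    msg["From"] = "you@example.com"                          # placeholder
    msg["To"] = "paperless@example.com"                      # the mailbox Paperless polls
    msg.set_content("your-shared-secret")                    # the secret must be in every email

    with open("invoice.pdf", "rb") as fh:
        msg.add_attachment(fh.read(), maintype="application",
                           subtype="pdf", filename="invoice.pdf")

    with smtplib.SMTP("smtp.example.com") as smtp:           # placeholder SMTP server
        smtp.send_message(msg)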
@@ -118,16 +118,16 @@ a real API, it's just a URL that accepts an HTTP POST.
 To push your document to *Paperless*, send an HTTP POST to the server with the
 following name/value pairs:

-* ``sender``: The name of the document's sender. Note that there are
-  restrictions on what characters you can use here. Specifically, alphanumeric
-  characters, `-`, `,`, `.`, and `'` are ok, everything else it out. You also
-  can't use the sequence ` - ` (space, dash, space).
+* ``correspondent``: The name of the document's correspondent. Note that there
+  are restrictions on what characters you can use here. Specifically,
+  alphanumeric characters, `-`, `,`, `.`, and `'` are ok, everything else is
+  out. You also can't use the sequence ` - ` (space, dash, space).
 * ``title``: The title of the document. The rules for characters are the same
-  here as the sender.
-* ``signature``: For security reasons, we have the sender send a signature using
-  a "shared secret" method to make sure that random strangers don't start
-  uploading stuff to your server. The means of generating this signature is
-  defined below.
+  here as the correspondent.
+* ``signature``: For security reasons, we have the correspondent send a
+  signature using a "shared secret" method to make sure that random strangers
+  don't start uploading stuff to your server. The means of generating this
+  signature is defined below.

 Specify ``enctype="multipart/form-data"``, and then POST your file with:::
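Since the same character rules apply to both fields, it can be handy to check values client-side before uploading. The regular expression below is one interpretation of the prose above (it assumes ordinary spaces are fine, since only the exact space-dash-space sequence is singled out); it is not code taken from Paperless itself.

.. code-block:: python

    import re

    # Alphanumerics, "-", ",", ".", "'" and spaces are accepted; the exact
    # sequence " - " is rejected because it is the field separator.
    SAFE_CHARACTERS = re.compile(r"^[a-zA-Z0-9\-,.' ]+$")

    def is_valid_field(value):
        return bool(SAFE_CHARACTERS.match(value)) and " - " not in value

    print(is_valid_field("ACME Inc."))       # True
    print(is_valid_field("ACME - Invoice"))  # False: contains " - "
    print(is_valid_field("ACME/Invoices"))   # False: "/" is not allowed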
@@ -146,12 +146,12 @@ verification.

 In the case of *Paperless*, you configure the server with the secret by setting
 ``UPLOAD_SHARED_SECRET``. Then on your client, you generate your signature by
-concatenating the sender, title, and the secret, and then using sha256 to
-generate a hexdigest.
+concatenating the correspondent, title, and the secret, and then using sha256
+to generate a hexdigest.

 If you're using Python, this is what that looks like:

 .. code:: python

     from hashlib import sha256
-    signature = sha256(sender + title + secret).hexdigest()
+    signature = sha256(correspondent + title + secret).hexdigest()
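Putting the pieces together, here is a sketch of a complete push from Python 3. The upload URL and the name of the file field are assumptions (the excerpt above cuts off before listing them), and the ``.encode()`` call is needed because Python 3's ``sha256`` expects bytes; only the ``correspondent``, ``title``, and ``signature`` fields and the concatenation order come from the text above.

.. code-block:: python

    from hashlib import sha256

    import requests  # third-party: pip install requests

    correspondent = "ACME Inc"
    title = "Invoice 42"
    secret = "your-UPLOAD_SHARED_SECRET"

    # correspondent + title + secret, hashed with sha256, as described above
    signature = sha256((correspondent + title + secret).encode("utf-8")).hexdigest()

    with open("invoice.pdf", "rb") as fh:
        response = requests.post(
            "http://localhost:8000/push",   # assumed URL; check your installation
            data={
                "correspondent": correspondent,
                "title": title,
                "signature": signature,
            },
            files={"document": fh},         # the file field name is an assumption
        )
    response.raise_for_status()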
@@ -4,10 +4,68 @@ Migrating, Updates, and Backups
 ===============================

 As *Paperless* is still under active development, there's a lot that can change
-as software updates roll out. The thing you just need to remember for all of
-this is that for the most part, **the database is expendable** so long as you
-have your files. This is because the file name of the exported files includes
-the name of the sender, the title, and the tags (if any) on each file.
+as software updates roll out. You should back up often, so if anything goes
+wrong during an update, you at least have a means of restoring to something
+usable. Thankfully, there are automated ways of backing up, restoring, and
+updating the software.
+
+
+.. _migrating-backup:
+
+Backing Up
+----------
+
+So you're bored of this whole project, or you want to make a remote backup of
+the unencrypted files for whatever reason. This is easy to do: simply use the
+:ref:`exporter <utilities-exporter>` to dump your documents and database out
+into an arbitrary directory.
+
+
+.. _migrating-restoring:
+
+Restoring
+---------
+
+Restoring your data is just as easy, since nearly all of your data exists either
+in the file names, or in the contents of the files themselves. You just need to
+create an empty database (just follow the
+:ref:`installation instructions <setup-installation>` again) and then import the
+``tags.json`` file you created as part of your backup. Lastly, copy your
+exported documents into the consumption directory and start up the consumer.
+
+.. code-block:: shell-session
+
+    $ cd /path/to/project
+    $ rm data/db.sqlite3 # Delete the database
+    $ cd src
+    $ ./manage.py migrate # Create the database
+    $ ./manage.py createsuperuser
+    $ ./manage.py loaddata /path/to/arbitrary/place/tags.json
+    $ cp /path/to/exported/docs/* /path/to/consumption/dir/
+    $ ./manage.py document_consumer
+
+Importing your data if you are :ref:`using Docker <setup-installation-docker>`
+is almost as simple:
+
+.. code-block:: shell-session
+
+    # Stop and remove your current containers
+    $ docker-compose stop
+    $ docker-compose rm -f
+
+    # Recreate them, add the superuser
+    $ docker-compose up -d
+    $ docker-compose run --rm webserver createsuperuser
+
+    # Load the tags
+    $ cat /path/to/arbitrary/place/tags.json | docker-compose run --rm webserver loaddata_stdin -
+
+    # Load your exported documents into the consumption directory
+    # (How you do this highly depends on how you have set this up)
+    $ cp /path/to/exported/docs/* /path/to/mounted/consumption/dir/
+
+After loading the documents into the consumption directory the consumer will
+immediately start consuming the documents.


 .. _migrating-updates:
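Before running ``loaddata``, it can be worth sanity-checking the backup file. The sketch below relies only on the standard Django ``dumpdata`` layout (a JSON list of objects with ``model``, ``pk``, and ``fields`` keys); the path is the same placeholder used above.

.. code-block:: python

    import json

    with open("/path/to/arbitrary/place/tags.json") as fh:
        entries = json.load(fh)

    # dumpdata output is a list of {"model": ..., "pk": ..., "fields": {...}}
    print("Found {} fixture entries".format(len(entries)))
    for entry in entries:
        print(entry["model"], entry["pk"], sorted(entry["fields"].keys()))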
@@ -20,7 +78,7 @@ on the directory containing the project files, and then use Django's ``migrate``
 command to execute any database schema updates that might have been rolled in
 as part of the update:

-.. code:: bash
+.. code-block:: shell-session

     $ cd /path/to/project
     $ git pull
@@ -43,112 +101,3 @@ requires only one additional step:

 If ``git pull`` doesn't report any changes, there is no need to continue with
 the remaining steps.
-
-
-.. _migrating-backup:
-
-Backing Up
-----------
-
-So you're bored of this whole project, or you want to make a remote backup of
-the unencrypted files for whatever reason. This is easy to do, simply use the
-:ref:`exporter <utilities-exporter>` to dump your documents out into an
-arbitrary directory.
-
-Additionally however, you'll need to back up the tags themselves. The file
-names contain the tag names, but you still need to define the tags and their
-matching algorithms in the database for things to work properly. We do this
-with Django's ``dumpdata`` command, which produces JSON output.
-
-.. code:: bash
-
-    $ cd /path/to/project
-    $ cd src
-    $ ./manage.py document_export /path/to/arbitrary/place/
-    $ ./manage.py dumpdata documents.Tag > /path/to/arbitrary/place/tags.json
-
-If you are :ref:`using Docker <setup-installation-docker>`, exporting your tags
-as JSON is almost as easy:
-
-.. code-block:: shell-session
-
-    $ docker-compose run --rm webserver dumpdata documents.Tag > /path/to/arbitrary/place/tags.json
-
-To export the documents you can either use ``docker run`` directly, specifying all
-the commandline options by hand, or (more simply) mount a second volume for export.
-
-To mount a volume for exports, follow the instructions in the
-``docker-compose.yml.example`` file for the ``/export`` volume (making the changes
-in your own ``docker-compose.yml`` file, of course). Once you have the
-volume mounted, the command to run an export is:
-
-.. code-block:: console
-
-    $ docker-compose run --rm consumer document_exporter /export
-
-If you prefer to use ``docker run`` directly, supplying the necessary commandline
-options:
-
-.. code-block:: shell-session
-
-    $ # Identify your containers
-    $ docker-compose ps
-            Name                       Command                 State     Ports
-    -------------------------------------------------------------------------
-    paperless_consumer_1    /sbin/docker-entrypoint.sh ...    Exit 0
-    paperless_webserver_1   /sbin/docker-entrypoint.sh ...    Exit 0
-
-    $ # Make sure to replace your passphrase and remove or adapt the id mapping
-    $ docker run --rm \
-        --volumes-from paperless_data_1 \
-        --volume /path/to/arbitrary/place:/export \
-        -e PAPERLESS_PASSPHRASE=YOUR_PASSPHRASE \
-        -e USERMAP_UID=1000 -e USERMAP_GID=1000 \
-        paperless document_exporter /export
-
-
-.. _migrating-restoring:
-
-Restoring
----------
-
-Restoring your data is just as easy, since nearly all of your data exists either
-in the file names, or in the contents of the files themselves. You just need to
-create an empty database (just follow the
-:ref:`installation instructions <setup-installation>` again) and then import the
-``tags.json`` file you created as part of your backup. Lastly, copy your
-exported documents into the consumption directory and start up the consumer.
-
-.. code:: bash
-
-    $ cd /path/to/project
-    $ rm data/db.sqlite3 # Delete the database
-    $ cd src
-    $ ./manage.py migrate # Create the database
-    $ ./manage.py createsuperuser
-    $ ./manage.py loaddata /path/to/arbitrary/place/tags.json
-    $ cp /path/to/exported/docs/* /path/to/consumption/dir/
-    $ ./manage.py document_consumer
-
-Importing your data if you are :ref:`using Docker <setup-installation-docker>`
-is almost as simple:
-
-.. code-block:: shell-session
-
-    $ # Stop and remove your current containers
-    $ docker-compose stop
-    $ docker-compose rm -f
-
-    $ # Recreate them, add the superuser
-    $ docker-compose up -d
-    $ docker-compose run --rm webserver createsuperuser
-
-    $ # Load the tags
-    $ cat /path/to/arbitrary/place/tags.json | docker-compose run --rm webserver loaddata_stdin -
-
-    $ # Load your exported documents into the consumption directory
-    $ # (How you do this highly depends on how you have set this up)
-    $ cp /path/to/exported/docs/* /path/to/mounted/consumption/dir/
-
-After loading the documents into the consumption directory the consumer will
-immediately start consuming the documents.
@@ -26,7 +26,7 @@ How to Use It

 The webserver is started via the ``manage.py`` script:

-.. code:: bash
+.. code-block:: shell-session

     $ /path/to/paperless/src/manage.py runserver

@@ -64,7 +64,7 @@ How to Use It

 The consumer is started via the ``manage.py`` script:

-.. code:: bash
+.. code-block:: shell-session

     $ /path/to/paperless/src/manage.py document_consumer

@@ -95,16 +95,86 @@ How to Use It

 This too is done via the ``manage.py`` script:

-.. code:: bash
+.. code-block:: shell-session

-    $ /path/to/paperless/src/manage.py document_exporter /path/to/somewhere
+    $ /path/to/paperless/src/manage.py document_exporter /path/to/somewhere/

-This will dump all of your PDFs into ``/path/to/somewhere`` for you to do with
-as you please. The naming scheme on export is identical to that used for
-import, so should you can now safely delete the entire project directly,
-database, encrypted PDFs and all, and later create it all again simply by
-running the consumer again and dumping all of these files into
-``CONSUMPTION_DIR``.
+This will dump all of your unencrypted PDFs into ``/path/to/somewhere`` for you
+to do with as you please. The files are accompanied by a special file,
+``manifest.json``, which can be used to
+:ref:`import the files <utilities-importer>` at a later date if you wish.
+
+
+.. _utilities-exporter-howto-docker:
+
+Docker
+______
+
+If you are :ref:`using Docker <setup-installation-docker>`, running the
+exporter is almost as easy. To mount a volume for exports, follow the
+instructions in the ``docker-compose.yml.example`` file for the ``/export``
+volume (making the changes in your own ``docker-compose.yml`` file, of course).
+Once you have the volume mounted, the command to run an export is:
+
+.. code-block:: shell-session
+
+    $ docker-compose run --rm consumer document_exporter /export
+
+If you prefer to use ``docker run`` directly, supply the necessary commandline
+options yourself:
+
+.. code-block:: shell-session
+
+    $ # Identify your containers
+    $ docker-compose ps
+            Name                       Command                 State     Ports
+    -------------------------------------------------------------------------
+    paperless_consumer_1    /sbin/docker-entrypoint.sh ...    Exit 0
+    paperless_webserver_1   /sbin/docker-entrypoint.sh ...    Exit 0
+
+    $ # Make sure to replace your passphrase and remove or adapt the id mapping
+    $ docker run --rm \
+        --volumes-from paperless_data_1 \
+        --volume /path/to/arbitrary/place:/export \
+        -e PAPERLESS_PASSPHRASE=YOUR_PASSPHRASE \
+        -e USERMAP_UID=1000 -e USERMAP_GID=1000 \
+        paperless document_exporter /export
+
+
+.. _utilities-importer:
+
+The Importer
+------------
+
+Looking to transfer Paperless data from one instance to another, or just want
+to restore from a backup? This is your go-to toy.
+
+
+.. _utilities-importer-howto:
+
+How to Use It
+.............
+
+The importer works just like the exporter. You point it at a directory, and
+the script does the rest of the work:
+
+.. code-block:: shell-session
+
+    $ /path/to/paperless/src/manage.py document_importer /path/to/somewhere/
+
+Docker
+______
+
+Assuming that you've already gone through the steps above in the
+:ref:`export <utilities-exporter-howto-docker>` section, the easiest thing
+to do is just re-use the ``/export`` path you already set up:
+
+.. code-block:: shell-session
+
+    $ docker-compose run --rm consumer document_importer /export
+
+Similarly, if you're not using docker-compose, you can adjust the export
+instructions above to do the import.


 .. _utilities-retagger:
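To round things off, here is a small sketch that inspects an export directory before pointing the importer at it. The only assumption beyond the text above is that ``manifest.json`` is plain JSON sitting next to the exported documents; its exact structure is not described here, so the sketch only counts entries.

.. code-block:: python

    import json
    import os

    export_dir = "/path/to/somewhere"  # the directory you exported into

    with open(os.path.join(export_dir, "manifest.json")) as fh:
        manifest = json.load(fh)  # len() works whether this is a list or an object

    documents = [name for name in os.listdir(export_dir) if name != "manifest.json"]
    print("{} exported files, {} manifest entries".format(len(documents), len(manifest)))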