Add Dockerfile for application and documentation

This commit adds a `Dockerfile` to the root of the project, accompanied
by a `docker-compose.yml.example` for simplified deployment. The
`Dockerfile` is agnostic to whether it will be the webserver, the
consumer, or if it is run for a one-off command (i.e. creation of a
superuser, migration of the database, document export, ...).

The containers entrypoint is the `scripts/docker-entrypoint.sh` script.
This script verifies that the required permissions are set, remaps the
default users and/or groups id if required and installs additional
languages if the user wishes to.

After initialization, it analyzes the command the user supplied:

  - If the command starts with a slash, it is expected that the user
    wants to execute a binary file and the command will be executed
    without further intervention. (Using `exec` to effectively replace
    the started shell-script and not have any reaping-issues.)

  - If the command does not start with a slash, the command will be
    passed directly to the `manage.py` script without further
    modification. (Again using `exec`.)

The default command is set to `--help`.

If the user wants to execute a command that is not meant for `manage.py`
but doesn't start with a slash, the Docker `--entrypoint` parameter can
be used to circumvent the mechanics of `docker-entrypoint.sh`.

Further information can be found in `docs/setup.rst` and in
`docs/migrating.rst`.

For additional convenience, a `Dockerfile` has been added to the `docs/`
directory which allows for easy building and serving of the
documentation. This is documented in `docs/requirements.rst`.
This commit is contained in:
Pit Kleyersburg
2016-02-17 18:45:04 +01:00
parent 57bcb883bf
commit 724afa59c7
10 changed files with 474 additions and 6 deletions

18
docs/Dockerfile Normal file
View File

@@ -0,0 +1,18 @@
FROM python:3.5.1
MAINTAINER Pit Kleyersburg <pitkley@googlemail.com>
# Install Sphinx and Pygments
RUN pip install Sphinx Pygments
# Setup directories, copy data
RUN mkdir /build
COPY . /build
WORKDIR /build/docs
# Build documentation
RUN make html
# Start webserver
WORKDIR /build/docs/_build/html
EXPOSE 8000/tcp
CMD ["python3", "-m", "http.server"]

View File

@@ -30,6 +30,20 @@ as part of the update:
Note that it's possible (even likely) that while ``git pull`` may update some
files, the ``migrate`` step may not update anything. This is totally normal.
If you are :ref:`using Docker <setup-installation-docker>` the update process
requires only one additional step:
.. code-block:: shell-session
$ cd /path/to/project
$ git pull
$ docker build -t paperless .
$ docker-compose up -d
$ docker-compose run --rm webserver migrate
If ``git pull`` doesn't report any changes, there is no need to continue with
the remaining steps.
.. _migrating-backup:
@@ -53,6 +67,65 @@ with Django's ``dumpdata`` command, which produces JSON output.
$ ./manage.py document_export /path/to/arbitrary/place/
$ ./manage.py dumpdata documents.Tag > /path/to/arbitrary/place/tags.json
If you are :ref:`using Docker <setup-installation-docker>`, exporting your tags
as JSON is almost as easy:
.. code-block:: shell-session
$ docker-compose run --rm webserver dumpdata documents.Tag > /path/to/arbitrary/place/tags.json
Exporting the documents though is a little more involved, since docker-compose
doesn't support mounting additional volumes with the ``run`` command. You have
three general options:
1. Use the consumption directory if you happen to already have it mounted to a
host directory.
.. code-block:: console
$ # Stop the consumer so that it doesn't consume the exported documents
$ docker-compose stop consumer
$ # Export into the consumption directory
$ docker-compose run --rm consumer document_exporter /consume
2. Add another volume to ``docker-compose.yml`` for exports and use
``docker-compose run``:
.. code-block:: diff
diff --git a/docker-compose.yml b/docker-compose.yml
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -17,9 +18,8 @@ services:
volumes:
- paperless-data:/usr/src/paperless/data
- paperless-media:/usr/src/paperless/media
- /consume
+ - /path/to/arbitrary/place:/export
.. code-block:: shell-session
$ docker-compose run --rm consumer document_exporter /export
3. Use ``docker run`` directly, supplying the necessary commandline options:
.. code-block:: shell-session
$ # Identify your containers
$ docker-compose ps
Name Command State Ports
-------------------------------------------------------------------------
paperless_consumer_1 /sbin/docker-entrypoint.sh ... Exit 0
paperless_webserver_1 /sbin/docker-entrypoint.sh ... Exit 0
$ # Make sure to replace your passphrase and remove or adapt the id mapping
$ docker run --rm \
--volumes-from paperless_data_1 \
--volume /path/to/arbitrary/place:/export \
-e PAPERLESS_PASSPHRASE=YOUR_PASSPHRASE \
-e USERMAP_UID=1000 -e USERMAP_GID=1000 \
paperless document_exporter /export
.. _migrating-restoring:
@@ -77,3 +150,25 @@ exported documents into the consumption directory and start up the consumer.
$ cp /path/to/exported/docs/* /path/to/consumption/dir/
$ ./manage.py document_consumer
Importing your data if you are :ref:`using Docker <setup-installation-docker>`
is almost as simple:
.. code-block:: shell-session
$ # Stop and remove your current containers
$ docker-compose stop
$ docker-compose rm -f
$ # Recreate them, add the superuser
$ docker-compose up -d
$ docker-compose run --rm webserver createsuperuser
$ # Load the tags
$ cat /path/to/arbitrary/place/tags.json | docker-compose run --rm webserver loaddata_stdin -
$ # Load your exported documents into the consumption directory
$ # (How you do this highly depends on how you have set this up)
$ cp /path/to/exported/docs/* /path/to/mounted/consumption/dir/
After loading the documents into the consumption directory the consumer will
immediately start consuming the documents.

View File

@@ -101,3 +101,16 @@ you'd like to generate your own docs locally, you'll need to:
$ pip install sphinx
and then cd into the ``docs`` directory and type ``make html``.
If you are using Docker, you can use the following commands to build the
documentation and run a webserver serving it on `port 8001`_:
.. code:: bash
$ pwd
/path/to/paperless
$ docker build -t paperless:docs -f docs/Dockerfile .
$ docker run --rm -it -p "8001:8000" paperless:docs
.. _port 8001: http://127.0.0.1:8001

View File

@@ -37,11 +37,18 @@ or just download the tarball and go that route:
Installation & Configuration
----------------------------
You can go two routes with setting up and running Paperless. The *Vagrant*
route is quick & easy, but means you're running a VM which comes with memory
consumption etc. Alternatively the standard, "bare metal" approach is a little
more complicated.
You can go multiple routes with setting up and running Paperless. The `Vagrant
route`_ is quick & easy, but means you're running a VM which comes with memory
consumption etc. We also `support Docker`_, which you can use natively under
Linux and in a VM with `Docker Machine`_ (this guide was written for native
Docker usage under Linux, you might have to adapt it for Docker Machine.)
Alternatively the standard, `bare metal`_ approach is a little more complicated.
.. _Vagrant route: setup-installation-vagrant_
.. _support Docker: setup-installation-docker_
.. _bare metal: setup-installation-standard_
.. _Docker Machine: https://docs.docker.com/machine/
.. _setup-installation-standard:
@@ -118,6 +125,150 @@ Vagrant Method
.. _Paperless server: http://172.28.128.4:8000
.. _setup-installation-docker:
Docker Method
.............
1. Install `Docker`_.
.. caution::
As mentioned earlier, this guide assumes that you use Docker natively
under Linux. If you are using `Docker Machine`_ under Mac OS X or Windows,
you will have to adapt IP addresses, volume-mounting, command execution
and maybe more.
2. Install `docker-compose`_. [#compose]_
.. caution::
If you want to use the included ``docker-compose.yml.example`` file, you
need to have at least Docker version **1.10.0** and docker-compose
version **1.6.0**.
See the `Docker installation guide`_ on how to install the current
version of Docker for your operating system or Linux distribution of
choice. To get an up-to-date version of docker-compose, follow the
`docker-compose installation guide`_ if your package repository doesn't
include it.
.. _Docker installation guide: https://docs.docker.com/engine/installation/
.. _docker-compose installation guide: https://docs.docker.com/compose/install/
3. Create a copy of ``docker-compose.yml.example`` as ``docker-compose.yml``.
4. Modify ``docker-compose.env`` and adapt the following environment variables:
``PAPERLESS_PASSPHRASE``
This is the passphrase Paperless uses to encrypt/decrypt the original
document.
``PAPERLESS_OCR_THREADS``
This is the number of threads the OCR process will spawn to process
document pages in parallel. If the variable is not set, Python determines
the core-count of your CPU and uses that value.
``PAPERLESS_OCR_LANGUAGES``
If you want the OCR to recognize other languages in addition to the default
English, set this parameter to a space separated list of three-letter
language-codes after `ISO 639-2/T`_. For a list of available languages --
including their three letter codes -- see the `Debian packagelist`_.
``USERMAP_UID`` and ``USERMAP_GID``
If you want to mount the consumption volume (directory ``/consume`` within
the containers) to a host-directory -- which you probably want to do --
access rights might be an issue. The default user and group ``paperless``
in the containers have an id of 1000. The containers will enforce that the
owning group of the consumption directory will be ``paperless`` to be able
to delete consumed documents. If your host-system has a group with an id of
1000 and you don't want this group to have access rights to the consumption
directory, you can use ``USERMAP_GID`` to change the id in the container
and thus the one of the consumption directory. Furthermore, you can change
the id of the default user as well using ``USERMAP_UID``.
5. Run ``docker-compose up -d``. This will create and start the necessary
containers.
6. To be able to login, you will need a super user. To create it, execute the
following command:
.. code-block:: shell-session
$ docker-compose run --rm webserver createsuperuser
This will prompt you to set a username (default ``paperless``), an optional
e-mail address and finally a password.
7. The default ``docker-compose.yml`` exports the webserver on your local port
8000. If you haven't adapted this, you should now be able to visit your
`Paperless webserver`_ at ``http://127.0.0.1:8000``. You can login with the
user and password you just created.
8. Add files to consumption directory the way you prefer to. Following are two
possible options:
1. Mount the consumption directory to a local host path by modifying your
``docker-compose.yml``:
.. code-block:: diff
diff --git a/docker-compose.yml b/docker-compose.yml
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -17,9 +18,8 @@ services:
volumes:
- paperless-data:/usr/src/paperless/data
- paperless-media:/usr/src/paperless/media
- - /consume
+ - /local/path/you/choose:/consume
.. danger::
While the consumption container will ensure at startup that it can
**delete** a consumed file from a host-mounted directory, it might not
be able to **read** the document in the first place if the access
rights to the file are incorrect.
Make sure that the documents you put into the consumption directory
will either be readable by everyone (``chmod o+r file.pdf``) or
readable by the default user or group id 1000 (or the one you have set
with ``USERMAP_UID`` or ``USERMAP_GID`` respectively).
2. Use ``docker cp`` to copy your files directly into the container:
.. code-block:: shell-session
$ # Identify your containers
$ docker-compose ps
Name Command State Ports
-------------------------------------------------------------------------
paperless_consumer_1 /sbin/docker-entrypoint.sh ... Exit 0
paperless_webserver_1 /sbin/docker-entrypoint.sh ... Exit 0
$ docker cp /path/to/your/file.pdf paperless_consumer_1:/consume
``docker cp`` is a one-shot-command, just like ``cp``. This means that
every time you want to consume a new document, you will have to execute
``docker cp`` again. You can of course automate this process, but option 1
is generally the preferred one.
.. danger::
``docker cp`` will change the owning user and group of a copied file
to the acting user at the destination, which will be ``root``.
You therefore need to ensure that the documents you want to copy into
the container are readable by everyone (``chmod o+r file.pdf``) before
copying them.
.. _Docker: https://www.docker.com/
.. _docker-compose: https://docs.docker.com/compose/install/
.. _ISO 639-2/T: https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
.. _Debian packagelist: https://packages.debian.org/search?suite=jessie&searchon=names&keywords=tesseract-ocr-
.. [#compose] You of course don't have to use docker-compose, but it
simplifies deployment immensely. If you know your way around Docker, feel
free to tinker around without using compose!
.. _making-things-a-little-more-permanent:
Making Things a Little more Permanent
@@ -126,5 +277,9 @@ Making Things a Little more Permanent
Once you've tested things and are happy with the work flow, you can automate the
process of starting the webserver and consumer automatically. If you're running
on a bare metal system that's using Systemd, you can use the service unit files
in the ``scripts`` directory to set this up. If you're on a SysV or other
startup system (like the Vagrant box), then you're currently on your own.
in the ``scripts`` directory to set this up. If you're on another startup
system or are using a Vagrant box, then you're currently on your own. If you are
using Docker, you can set a restart-policy_ in the ``docker-compose.yml`` to
have the containers automatically start with the Docker daemon.
.. _restart-policy: https://docs.docker.com/engine/reference/commandline/run/#restart-policies-restart