paperless-ngx/docs/setup.rst
Joshua Taillon 597a7bb391
Update setup.rst
The provided `gunicorn` command did not work for me, failing with the following error:

```
ModuleNotFoundError: No module named '/home/paperless/paperless/src/paperless' 
```

The solution was to provide only `paperless.wsgi` as the argument to `gunicorn`, and provide a flag for `--pythonpath`. After changing it to this, the server started up fine.
2018-11-16 09:20:08 -05:00


.. _setup:
Setup
=====
Paperless isn't a very complicated app, but there are a few components, so some
basic documentation is in order. If you follow along in this document and
still have trouble, please open an `issue on GitHub`_ so I can fill in the
gaps.
.. _issue on GitHub: https://github.com/danielquinn/paperless/issues
.. _setup-download:
Download
--------
The source is currently only available via GitHub, so grab it from there,
either by using ``git``:
.. code:: bash
$ git clone https://github.com/danielquinn/paperless.git
$ cd paperless
or just download the zip archive and go that route:
.. code:: bash
$ cd <directory where you want to run Paperless>
$ wget https://github.com/danielquinn/paperless/archive/master.zip
$ unzip master.zip
$ cd paperless-master
.. _setup-installation:
Installation & Configuration
----------------------------
You can go multiple routes with setting up and running Paperless:
* The `bare metal route`_
* The `docker route`_
The `docker route`_ is quick and easy.
The `bare metal route`_ is a bit more complicated to set up, but makes it easier
should you want to contribute some code back.
.. _docker route: setup-installation-docker_
.. _bare metal route: setup-installation-bare-metal_
.. _Docker Machine: https://docs.docker.com/machine/
.. _setup-installation-bare-metal:
Standard (Bare Metal)
+++++++++++++++++++++
1. Install the requirements as per the :ref:`requirements <requirements>` page.
2. Within the extracted ``master.zip`` (or your git clone), go to the ``src``
directory.
3. Copy ``../paperless.conf.example`` to ``/etc/paperless.conf`` and open it in
your favourite editor. As this file contains passwords, it should only be
readable by the users ``root`` and ``paperless``. Set the values for:
* ``PAPERLESS_CONSUMPTION_DIR``: this is where your documents will be
dumped to be consumed by Paperless.
* ``PAPERLESS_OCR_THREADS``: this is the number of threads the OCR process
will spawn to process document pages in parallel.
* ``PAPERLESS_PASSPHRASE``: this is only required if you want to use GPG to
encrypt your document files. This is the passphrase Paperless uses to
encrypt/decrypt the original documents. Don't worry about defining this
if you don't want to use encryption (the default).
4. Initialise the SQLite database with ``./manage.py migrate``.
5. Create a user for your Paperless instance with
``./manage.py createsuperuser``. Follow the prompts to create your user.
6. Start the webserver with ``./manage.py runserver <IP>:<PORT>``.
If no specific IP or port is given, the default is ``127.0.0.1:8000``,
also known as http://localhost:8000/.
You should now be able to visit your (empty) installation at
`Paperless webserver`_ or whatever you chose before. You can log in with the
username and password you created in step 5.
7. In a separate window, change to the ``src`` directory in this repo again,
but this time, you should start the consumer script with
``./manage.py document_consumer``.
8. Scan something or put a file into the directory you configured as
``PAPERLESS_CONSUMPTION_DIR``.
9. Wait a few minutes.
10. Visit the document list on your webserver, and it should be there, indexed
and downloadable.
.. caution::
This installation is not secure. Once everything is working, head over to
`Making things more permanent`_.
.. _Paperless webserver: http://127.0.0.1:8000
.. _Making things more permanent: setup-permanent_
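The permission advice in step 3 of the bare metal route can be sketched roughly as follows. This is only an illustration: it works on a throwaway stand-in file so that it can be run without root, and the ``paperless`` group in the commented ``chown`` line is an assumption about how your system is set up. On a real install the target would be ``/etc/paperless.conf``.

```shell
# Sketch only: CONF defaults to a throwaway file so this can run without root.
# On a real install you would point it at /etc/paperless.conf (as root).
CONF="${CONF:-$(mktemp)}"

# Owner may read and write, group may read, everyone else gets nothing.
chmod 640 "$CONF"

# Changing ownership needs root, so it is only shown here as a comment.
# It assumes a group called "paperless" exists on your system:
#   chown root:paperless "$CONF"

# Print the resulting mode for a quick visual check.
ls -l "$CONF"
```

With ownership set to ``root`` and the ``paperless`` group, mode ``640`` would let both read the file while keeping everyone else out.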
.. _setup-installation-docker:
Docker Method
+++++++++++++
1. Install `Docker`_.
.. caution::
As mentioned earlier, this guide assumes that you use Docker natively
under Linux. If you are using `Docker Machine`_ under Mac OS X or
Windows, you will have to adapt IP addresses, volume-mounting, command
execution and maybe more.
2. Install `docker-compose`_. [#compose]_
.. caution::
If you want to use the included ``docker-compose.yml.example`` file, you
need to have at least Docker version **1.10.0** and docker-compose
version **1.6.0**.
See the `Docker installation guide`_ on how to install the current
version of Docker for your operating system or Linux distribution of
choice. To get an up-to-date version of docker-compose, follow the
`docker-compose installation guide`_ if your package repository doesn't
include it.
.. _Docker installation guide: https://docs.docker.com/engine/installation/
.. _docker-compose installation guide: https://docs.docker.com/compose/install/
3. Create a copy of ``docker-compose.yml.example`` as ``docker-compose.yml``
and a copy of ``docker-compose.env.example`` as ``docker-compose.env``.
You'll be editing both these files: taking a copy ensures that you can
``git pull`` to receive updates without risking merge conflicts with your
modified versions of the configuration files.
4. Modify ``docker-compose.yml`` to your preferences, following the
instructions in comments in the file. The only change that is a hard
requirement is to specify where the consumption directory should
mount. [#dockercomposeyml]_
5. Modify ``docker-compose.env`` and adapt the following environment variables:
``PAPERLESS_PASSPHRASE``
This is the passphrase Paperless uses to encrypt/decrypt the original
document. If you aren't planning on using GPG encryption, you can just
leave this undefined.
``PAPERLESS_OCR_THREADS``
This is the number of threads the OCR process will spawn to process
document pages in parallel. If the variable is not set, Python determines
the core-count of your CPU and uses that value.
``PAPERLESS_OCR_LANGUAGES``
If you want the OCR to recognize other languages in addition to the
default English, set this parameter to a space-separated list of
three-letter language codes as defined by `ISO 639-2/T`_. For a list of
available languages, including their three-letter codes, see the
`Alpine packagelist`_.
``USERMAP_UID`` and ``USERMAP_GID``
If you want to mount the consumption volume (directory ``/consume`` within
the containers) to a host-directory -- which you probably want to do --
access rights might be an issue. The default user and group ``paperless``
in the containers have an id of 1000. The containers will enforce that the
owning group of the consumption directory will be ``paperless`` to be able
to delete consumed documents. If your host-system has a group with an ID
of 1000 and you don't want this group to have access rights to the
consumption directory, you can use ``USERMAP_GID`` to change the id in the
container and thus the one of the consumption directory. Furthermore, you
can change the id of the default user as well using ``USERMAP_UID``.
6. Run ``docker-compose up -d``. This will create and start the necessary
containers.
7. To be able to log in, you will need a superuser. To create one, execute the
following command:
.. code-block:: shell-session
$ docker-compose run --rm webserver createsuperuser
This will prompt you to set a username (default ``paperless``), an optional
e-mail address and finally a password.
8. The default ``docker-compose.yml`` exports the webserver on your local
port 8000. If you haven't changed this, you should now be able to visit your
`Paperless webserver`_ at ``http://127.0.0.1:8000``. You can log in with the
user and password you just created.
9. Add files to the consumption directory in whichever way you prefer. Two
possible options follow:
1. Mount the consumption directory to a local host path by modifying your
``docker-compose.yml``:
.. code-block:: diff
diff --git a/docker-compose.yml b/docker-compose.yml
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -17,9 +18,8 @@ services:
volumes:
- paperless-data:/usr/src/paperless/data
- paperless-media:/usr/src/paperless/media
- - /consume
+ - /local/path/you/choose:/consume
.. danger::
While the consumption container will ensure at startup that it can
**delete** a consumed file from a host-mounted directory, it might
not be able to **read** the document in the first place if the access
rights to the file are incorrect.
Make sure that the documents you put into the consumption directory
will either be readable by everyone (``chmod o+r file.pdf``) or
readable by the default user or group id 1000 (or the one you have
set with ``USERMAP_UID`` or ``USERMAP_GID`` respectively).
2. Use ``docker cp`` to copy your files directly into the container:
.. code-block:: shell-session
$ # Identify your containers
$ docker-compose ps
Name                     Command                         State    Ports
-------------------------------------------------------------------------
paperless_consumer_1     /sbin/docker-entrypoint.sh ...  Exit 0
paperless_webserver_1    /sbin/docker-entrypoint.sh ...  Exit 0
$ docker cp /path/to/your/file.pdf paperless_consumer_1:/consume
``docker cp`` is a one-shot command, just like ``cp``. This means that
every time you want to consume a new document, you will have to execute
``docker cp`` again. You can of course automate this process, but option
1 is generally preferred.
.. danger::
``docker cp`` will change the owning user and group of a copied file
to the acting user at the destination, which will be ``root``.
You therefore need to ensure that the documents you want to copy into
the container are readable by everyone (``chmod o+r file.pdf``)
before copying them.
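Both warnings above come down to the same thing: the documents need to be readable by the user inside the container. As a rough sketch, the ``chmod o+r`` advice can be applied to a whole directory at once. ``DEMO_DIR`` and ``sample.pdf`` are invented for the demonstration; in practice you would run the ``find`` line against your own host-side consumption directory.

```shell
# Sketch only: DEMO_DIR is a throwaway directory standing in for your
# host-side consumption directory.
DEMO_DIR="$(mktemp -d)"

# Create a sample document with restrictive permissions to show the effect.
touch "$DEMO_DIR/sample.pdf"
chmod 600 "$DEMO_DIR/sample.pdf"

# Add read permission for "others" on every regular file in the directory.
find "$DEMO_DIR" -type f -exec chmod o+r {} +

# List the files so the new permissions are visible.
ls -l "$DEMO_DIR"
```

If you would rather not make documents world-readable, the other route described above is to make them readable by user or group id 1000 (or whatever you set via ``USERMAP_UID`` and ``USERMAP_GID``).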
.. _Docker: https://www.docker.com/
.. _docker-compose: https://docs.docker.com/compose/install/
.. _ISO 639-2/T: https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
.. _Alpine packagelist: https://pkgs.alpinelinux.org/packages?name=tesseract-ocr-data*&arch=x86_64
.. [#compose] You of course don't have to use docker-compose, but it
simplifies deployment immensely. If you know your way around Docker, feel
free to tinker around without using compose!
.. [#dockercomposeyml] If you're upgrading your docker-compose images from
version 1.1.0 or earlier, you might need to change the
``image: pitkley/paperless`` directive in both the ``webserver`` and
``consumer`` sections of your ``docker-compose.yml`` file to ``build: ./``,
as per the newer ``docker-compose.yml.example`` file.
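To make step 5 of the Docker method a little more concrete, here is a sketch of what the relevant lines of ``docker-compose.env`` might look like. Every value is a placeholder chosen for illustration (two OCR threads, German and Italian as extra languages, id 1001 for user and group), not a recommendation. The snippet writes to a temporary file rather than touching your real configuration.

```shell
# Sketch only: ENV_FILE is a throwaway file; copy the lines you need into
# your own docker-compose.env instead of overwriting it.
ENV_FILE="$(mktemp)"

cat > "$ENV_FILE" <<'EOF'
# Leave PAPERLESS_PASSPHRASE undefined unless you use GPG encryption.
# PAPERLESS_PASSPHRASE=change-me
PAPERLESS_OCR_THREADS=2
PAPERLESS_OCR_LANGUAGES=deu ita
USERMAP_UID=1001
USERMAP_GID=1001
EOF

# Show the result.
cat "$ENV_FILE"
```

Note that the language list is separated by spaces and not quoted, matching the description of ``PAPERLESS_OCR_LANGUAGES`` above.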
.. _setup-permanent:
Making Things a Little More Permanent
-------------------------------------
Once you've tested things and are happy with the workflow, you should secure
the installation and automate the process of starting the webserver and
consumer.
.. _setup-permanent-webserver:
Using a Real Webserver
++++++++++++++++++++++
The default is to use Django's development server, as that's easy and does the
job well enough on a home network. However, using it for anything more than
that is strongly discouraged.
If you want to do things right you should use a real webserver capable of
handling more than one thread. You will also have to let the webserver serve
the static files (CSS, JavaScript) from the directory configured in
``PAPERLESS_STATICDIR``. The default static files directory is ``../static``.
For that you need to activate your virtual environment and collect the static
files with the command:
.. code:: bash
$ cd <paperless directory>/src
$ ./manage.py collectstatic
Apache
~~~~~~
This is a configuration supplied by `steckerhalter`_ on GitHub. It uses Apache
and mod_wsgi, with a Paperless installation in ``/home/paperless/``:
.. code:: apache
<VirtualHost *:80>
ServerName example.com
Alias /static/ /home/paperless/paperless/static/
<Directory /home/paperless/paperless/static>
Require all granted
</Directory>
WSGIScriptAlias / /home/paperless/paperless/src/paperless/wsgi.py
WSGIDaemonProcess example.com user=paperless group=paperless threads=5 python-path=/home/paperless/paperless/src:/home/paperless/.env/lib/python3.4/site-packages
WSGIProcessGroup example.com
<Directory /home/paperless/paperless/src/paperless>
<Files wsgi.py>
Require all granted
</Files>
</Directory>
</VirtualHost>
.. _steckerhalter: https://github.com/steckerhalter
Nginx + Gunicorn
~~~~~~~~~~~~~~~~
If you're using Nginx, the most common setup is to combine it with a
Python-based server like Gunicorn, so that Nginx acts as a proxy. Below is
a simple Nginx configuration fragment that makes use of a Gunicorn
instance listening on localhost port 8000.
.. code:: nginx
server {
listen 80;
index index.html index.htm index.php;
access_log /var/log/nginx/paperless_access.log;
error_log /var/log/nginx/paperless_error.log;
location /static {
autoindex on;
alias <path-to-paperless-static-directory>;
}
location / {
proxy_set_header Host $http_host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_pass http://127.0.0.1:8000;
}
}
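The Gunicorn server can be started with the command below. Note that
``paperless.wsgi`` is a Python module name rather than a file path, which is
why the ``src`` directory is supplied separately via ``--pythonpath``: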
.. code-block:: shell
$ <path-to-paperless-virtual-environment>/bin/gunicorn --pythonpath=<path-to-paperless>/src paperless.wsgi -w 2
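If you'd like to keep that command in a small wrapper script, something along these lines might do. It is a sketch under a few assumptions: ``VENV`` and ``PAPERLESS`` are placeholder variables for your own paths, and ``--bind`` is spelled out only to make the link to the ``proxy_pass`` target in the Nginx fragment explicit (``127.0.0.1:8000`` is also Gunicorn's default).

```shell
# Sketch only: both paths are placeholders for your own install locations.
VENV="${VENV:-/path/to/virtualenv}"
PAPERLESS="${PAPERLESS:-/path/to/paperless}"

# Build the command as positional parameters so that paths with spaces
# survive intact. paperless.wsgi is a module name, resolved via --pythonpath.
set -- "$VENV/bin/gunicorn" \
    --pythonpath="$PAPERLESS/src" \
    --bind 127.0.0.1:8000 \
    -w 2 \
    paperless.wsgi

# Run Gunicorn if it exists at that location; otherwise just show the command.
if [ -x "$1" ]; then
    exec "$@"
else
    echo "would run: $*"
fi
```

Passing a file path such as ``.../src/paperless`` instead of the module name is what produces the ``ModuleNotFoundError`` quoted in the commit message at the top of this page.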
.. _setup-permanent-standard-systemd:
Standard (Bare Metal + Systemd)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you're running on a bare metal system that's using Systemd, you can use the
service unit files in the ``scripts`` directory to set this up.
1. You'll need to create a group and user called ``paperless`` (without login).
2. Set up Paperless in a location that this new user can read from and write to.
3. Ensure ``/etc/paperless.conf`` is readable by the ``paperless`` user.
4. Copy the service files from the ``scripts`` directory to
``/etc/systemd/system``.
.. code-block:: bash
$ cp /path/to/paperless/scripts/paperless-consumer.service /etc/systemd/system/
$ cp /path/to/paperless/scripts/paperless-webserver.service /etc/systemd/system/
5. Edit each service file so that its ``ExecStart`` line points to the proper
location of your Paperless install and references the appropriate Python
binary. For example:
``ExecStart=/path/to/python3 /path/to/paperless/src/manage.py document_consumer``.
6. Enable the services (so they start on boot) and start them.
.. code-block:: bash
$ systemctl enable paperless-consumer
$ systemctl enable paperless-webserver
$ systemctl start paperless-consumer
$ systemctl start paperless-webserver
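The ``ExecStart`` edit from step 5 can also be scripted. The following is a sketch only: it operates on a throwaway file whose contents are invented purely so there is an ``ExecStart`` line to rewrite, and the two paths are placeholders. It also assumes GNU ``sed`` (for the ``-i`` flag), which is what you would normally find on a Systemd-based Linux system.

```shell
# Sketch only: UNIT is a throwaway stand-in for a copied service file, and
# its contents are made up just to have an ExecStart line to rewrite.
UNIT="$(mktemp)"
printf '%s\n' '[Service]' 'ExecStart=/old/path/manage.py document_consumer' > "$UNIT"

# Placeholder locations; substitute your own interpreter and install path.
PYTHON="/path/to/python3"
PAPERLESS="/path/to/paperless"

# Replace the whole ExecStart line. "|" is used as the sed delimiter
# because the paths themselves contain "/".
sed -i "s|^ExecStart=.*|ExecStart=$PYTHON $PAPERLESS/src/manage.py document_consumer|" "$UNIT"

# Show the edited file.
cat "$UNIT"
```

If you edit a unit file that Systemd has already loaded, running ``systemctl daemon-reload`` makes it pick up the change.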
.. _setup-permanent-standard-upstart:
Standard (Bare Metal + Upstart)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Ubuntu 14.04 and earlier use the `Upstart`_ init system to start services
during the boot process. To configure Upstart to run Paperless automatically
after restarting your system:
1. Change to the directory where Upstart's configuration files are kept:
``cd /etc/init``
2. Create a new file: ``sudo nano paperless-server.conf``
3. In the newly-created file enter::
start on (local-filesystems and net-device-up IFACE=eth0)
stop on shutdown
respawn
respawn limit 10 5
script
exec <path to paperless virtual environment>/bin/gunicorn --pythonpath=<path to paperless>/src paperless.wsgi -w 2
end script
Note that you'll need to replace the two path placeholders with the paths to
your virtual environment and your Paperless installation directory.
If you are using a network interface other than ``eth0``, you will have to
change ``IFACE=eth0``. For example, if you are connected via WiFi, you will
likely need to replace ``eth0`` above with ``wlan0``. To see all interfaces,
run ``ifconfig -a``.
Save the file.
4. Create a new file: ``sudo nano paperless-consumer.conf``
5. In the newly-created file enter::
start on (local-filesystems and net-device-up IFACE=eth0)
stop on shutdown
respawn
respawn limit 10 5
script
exec <path to paperless virtual environment>/bin/python <path to paperless>/src/manage.py document_consumer
end script
Replace the path placeholders and ``eth0`` with the appropriate values and save the file.
These two configuration files together will start both the Paperless webserver
and document consumer processes once the file system and the specified network
interface are available after boot. Furthermore, if either process ever exits
unexpectedly, Upstart will try to restart it a maximum of 10 times within a
5-second period.
.. _Upstart: http://upstart.ubuntu.com/
.. _setup-permanent-docker:
Docker
~~~~~~
If you're using Docker, you can set a restart-policy_ in the
``docker-compose.yml`` to have the containers automatically start with the
Docker daemon.
.. _restart-policy: https://docs.docker.com/engine/reference/commandline/run/#restart-policies-restart