Merge branch 'master' into dev

Jonas Winkler
2020-10-16 15:02:57 +02:00
39 changed files with 1939 additions and 398 deletions

44
docs/_static/lxc-install.svg vendored Normal file

File diff suppressed because one or more lines are too long



@@ -0,0 +1,158 @@
#!/usr/bin/env bash
# Bash script to install paperless in an lxc container
# paperless.lan
#
# Will set-up paperless, apache2 and proftpd
#
# lxc launch ubuntu: paperless
# lxc exec paperless -- sh -c "sudo apt-get update && sudo apt-get install -y wget"
# lxc exec paperless -- sh -c "wget https://raw.githubusercontent.com/the-paperless-project/paperless/master/docs/examples/lxc/lxc-install.sh && /bin/bash lxc-install.sh --email "
#
#
set +e
PASSWORD=$(< /dev/urandom tr -dc _A-Z-a-z-0-9+@%^{} | head -c20;echo;)
EMAIL=
function displayHelp() {
echo "available parameters:
-e <email> | --email <email>
-p <password> | --password <password>
"
}
POSITIONAL=()
while [[ $# -gt 0 ]]
do
key="$1"
i=$key
case $i in
-e|--email)
EMAIL="${2}"
shift
shift
;;
-p|--password)
PASSWORD="${2}"
shift
shift
;;
--default|-h|--help)
shift
displayHelp
exit 0
;;
*)
echo "argument: $i not recognized"
exit 2
;;
esac
done
set -- "${POSITIONAL[@]}" # restore positional parameters
if [ -z "$EMAIL" ]; then
echo "missing email, try running with -h "
exit 3
fi
if [[ $(/usr/bin/id -u) -ne 0 ]]; then
echo "Not running as root"
exit 1
fi
if [ $(grep -c paperless /etc/passwd) -eq 0 ]; then
# Add paperless user with no password
adduser --disabled-password --gecos "" paperless
fi
if [ $(grep -c ftpupload /etc/passwd) -eq 0 ]; then
# Add ftpupload
adduser --disabled-password --gecos "" ftpupload
echo "Set ftpupload password: "
#passwd ftpupload
#TODO: generate some password and allow parameter
echo "ftpupload:ftpuploadpassword" | chpasswd
fi
if [ $(id -nG paperless | grep -Fcw ftpupload) -eq 0 ]; then
# Allow paperless group to access
adduser paperless ftpupload
chmod g+w /home/ftpupload
fi
# Get apt up to date
apt-get update
# Needed for plain Paperless
apt-get -y install unpaper gnupg libpoppler-cpp-dev python3-pyocr tesseract-ocr imagemagick optipng git
# Needed for Apache
apt-get -y install apache2 libapache2-mod-wsgi-py3
if [ ! -f /etc/proftpd/proftpd.conf ]; then
# Install ftp server and make sure all uploaded files are owned by paperless
apt-get -y install proftpd
fi
if [ $(grep -c paperless /etc/proftpd/proftpd.conf) -eq 0 ]; then
cat <<EOF >> /etc/proftpd/proftpd.conf
<Directory /home/ftpupload/>
UserOwner paperless
GroupOwner paperless
</Directory>
EOF
systemctl restart proftpd
fi
#Get Paperless from git
su -c "cd /home/paperless ; git clone https://github.com/the-paperless-project/paperless" paperless
# Install Pip Requirements
apt-get -y install python3-pip python3-venv
cd /home/paperless/paperless
pip3 install -r requirements.txt
# Take paperless.conf.example and set consumption dir (ftp dir)
sed -e '/PAPERLESS_CONSUMPTION_DIR=/s/=.*/=\"\/home\/ftpupload\/\"/' \
/home/paperless/paperless/paperless.conf.example >/etc/paperless.conf
# Update /etc/paperless.conf with PAPERLESS_SECRET_KEY
SECRET=$(strings /dev/urandom | grep -o '[[:alnum:]]' | head -n 30 | tr -d '\n'; echo)
sed -i "s/#PAPERLESS_SECRET_KEY.*/PAPERLESS_SECRET_KEY=$SECRET/" /etc/paperless.conf
#Initialise the SQLite database
su -c "cd /home/paperless/paperless/src/ ; ./manage.py migrate" paperless
echo "if superuser doesn't exist, create one with login: paperless and password: ${PASSWORD}"
#Create a user for your Paperless instance
su -c "cd /home/paperless/paperless/src/ ; echo ./manage.py create_superuser_with_password --username paperless --email ${EMAIL} --password ${PASSWORD} --preserve" paperless
su -c "cd /home/paperless/paperless/src/ ; ./manage.py create_superuser_with_password --username paperless --email ${EMAIL} --password ${PASSWORD} --preserve" paperless
if [ ! -d /home/paperless/paperless/static ]; then
# 167 static files copied to '/home/paperless/paperless/static'.
su -c "cd /home/paperless/paperless/src/ ; ./manage.py collectstatic" paperless
fi
if [ ! -f /etc/apache2/sites-available/paperless.conf ]; then
# Set-up apache
cp /home/paperless/paperless/docs/examples/lxc/paperless.conf /etc/apache2/sites-available/
a2dissite 000-default.conf
a2ensite paperless.conf
systemctl reload apache2
fi
sed -e "s:home/paperless/project/virtualenv/bin/python:usr/bin/python3:" \
/home/paperless/paperless/scripts/paperless-consumer.service \
>/etc/systemd/system/paperless-consumer.service
sed -i "s:/home/paperless/project/src/manage.py:/home/paperless/paperless/src/manage.py:" \
/etc/systemd/system/paperless-consumer.service
systemctl enable paperless-consumer
systemctl start paperless-consumer
# convert-im6.q16: not authorized
# Security risk ?
# https://stackoverflow.com/questions/42928765/convertnot-authorized-aaaa-error-constitute-c-readimage-453
if [ -f /etc/ImageMagick-6/policy.xml ]; then
mv /etc/ImageMagick-6/policy.xml /etc/ImageMagick-6/policy.xmlout
fi


@@ -0,0 +1,18 @@
<VirtualHost *:80>
ServerName paperless.lan
Alias /static/ /home/paperless/paperless/static/
<Directory /home/paperless/paperless/static>
Require all granted
</Directory>
WSGIScriptAlias / /home/paperless/paperless/src/paperless/wsgi.py
WSGIDaemonProcess paperless.lan user=paperless group=paperless threads=5 python-path=/home/paperless/paperless/src
WSGIProcessGroup paperless.lan
<Directory /home/paperless/paperless/src/paperless>
<Files wsgi.py>
Require all granted
</Files>
</Directory>
</VirtualHost>


@@ -54,6 +54,34 @@ filename as described above.
.. _dateparser: https://github.com/scrapinghub/dateparser/blob/v0.7.0/docs/usage.rst#settings
Transforming filenames for parsing
----------------------------------
Some devices can't produce filenames that can be parsed by the default
parser. By configuring the option ``PAPERLESS_FILENAME_PARSE_TRANSFORMS`` in
``paperless.conf`` one can add transformations that are applied to the filename
before it's parsed.
The option contains a list of dictionaries of regular expressions (key:
``pattern``) and replacements (key: ``repl``) in JSON format, which are
applied in order by passing them to ``re.subn``. Transformation stops
after the first match, so at most one transformation is applied. The general
syntax is
.. code:: python
[{"pattern":"pattern1", "repl":"repl1"}, {"pattern":"pattern2", "repl":"repl2"}, ..., {"pattern":"patternN", "repl":"replN"}]
The example below is for a Brother ADS-2400N, a scanner that allows
assigning different names to different hardware buttons (useful for handling
multiple entities in one instance), but insists on adding ``_<count>``
to the filename.
.. code:: python
# Brother profile configuration, support "Name_Date_Count" (the default
# setting) and "Name_Count" (use "Name" as tag and "Count" as title).
PAPERLESS_FILENAME_PARSE_TRANSFORMS=[{"pattern":"^([a-z]+)_(\\d{8})_(\\d{6})_([0-9]+)\\.", "repl":"\\2\\3Z - \\4 - \\1."}, {"pattern":"^([a-z]+)_([0-9]+)\\.", "repl":" - \\2 - \\1."}]
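The first-match-wins behaviour described above can be sketched in a few lines of Python. This is only an illustration, not Paperless's actual code: the helper name ``apply_transforms`` and the sample filenames are made up for the example, and it assumes the patterns are passed to ``re.subn`` without extra flags.

```python
import json
import re

# The JSON value as it would follow PAPERLESS_FILENAME_PARSE_TRANSFORMS=
# in paperless.conf (copied from the Brother example).
RAW = r'''[{"pattern":"^([a-z]+)_(\\d{8})_(\\d{6})_([0-9]+)\\.", "repl":"\\2\\3Z - \\4 - \\1."}, {"pattern":"^([a-z]+)_([0-9]+)\\.", "repl":" - \\2 - \\1."}]'''


def apply_transforms(filename, transforms):
    """Hypothetical helper: apply the first matching transformation only."""
    for t in transforms:
        new_name, count = re.subn(t["pattern"], t["repl"], filename)
        if count:
            # Stop after the first pattern that matched.
            return new_name
    # No pattern matched: the filename is parsed unchanged.
    return filename


transforms = json.loads(RAW)

# "Name_Date_Count" (sample filename, not from the docs)
print(apply_transforms("invoices_20201016_150257_0001.pdf", transforms))
# "Name_Count" (sample filename, not from the docs)
print(apply_transforms("invoices_0001.pdf", transforms))
```

With these sample names, the first call yields ``20201016150257Z - 0001 - invoices.pdf`` and the second yields `` - 0001 - invoices.pdf``, which the default parser can then pick apart into date, title and tag.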
.. _guesswork-content:
Reading the Document Contents


@@ -92,7 +92,7 @@ files, the ``migrate`` step may not update anything. This is totally normal.
Additionally, as new features are added, the ability to control those features
is typically added by way of an environment variable set in ``paperless.conf``.
You may want to take a look at the ``paperless.conf.example`` file to see if
there's anything new in there compared to what you've got int ``/etc``.
there's anything new in there compared to what you've got in ``/etc``.
If you are :ref:`using Docker <setup-installation-docker>` the update process
is similar:


@@ -18,7 +18,7 @@ should work) that has the following software installed:
.. _GNU Privacy Guard: https://gnupg.org
.. _Tesseract: https://github.com/tesseract-ocr
.. _Imagemagick: http://imagemagick.org/
.. _unpaper: https://www.flameeyes.eu/projects/unpaper
.. _unpaper: https://github.com/unpaper/unpaper
.. _libpoppler-cpp-dev: https://poppler.freedesktop.org/
.. _optipng: http://optipng.sourceforge.net/


@@ -19,15 +19,20 @@ that works right for you based on recommentations from other Paperless users.
+---------+----------------+-----+-----+-----+----------------+
| Brother | `MFC-J5910DW`_ | yes | | | `bmsleight`_ |
+---------+----------------+-----+-----+-----+----------------+
| Brother | `MFC-9142CDN`_ | yes | | yes | `REOLDEV`_ |
+---------+----------------+-----+-----+-----+----------------+
| Fujitsu | `ix500`_ | yes | | yes | `eonist`_ |
+---------+----------------+-----+-----+-----+----------------+
.. _ADS-1500W: https://www.brother.ca/en/p/ads1500w
.. _MFC-J6930DW: https://www.brother.ca/en/p/MFCJ6930DW
.. _MFC-J5910DW: https://www.brother.co.uk/printers/inkjet-printers/mfcj5910dw
.. _MFC-9142CDN: https://www.brother.co.uk/printers/laser-printers/mfc9140cdn
.. _ix500: http://www.fujitsu.com/us/products/computing/peripheral/scanners/scansnap/ix500/
.. _danielquinn: https://github.com/danielquinn
.. _ayounggun: https://github.com/ayounggun
.. _bmsleight: https://github.com/bmsleight
.. _eonist: https://github.com/eonist
.. _REOLDEV: https://github.com/REOLDEV


@@ -43,6 +43,7 @@ You can go multiple routes with setting up and running Paperless:
* The `bare metal route`_
* The `docker route`_
* A suggested `linux containers route`_
The `docker route`_ is quick & easy.
@@ -50,10 +51,14 @@ The `docker route`_ is quick & easy.
The `bare metal route`_ is a bit more complicated to setup but makes it easier
should you want to contribute some code back.
The `linux containers route`_ is quick, but makes a lot of assumptions about the
set-up. On the other hand, the script could be used to install on a base
Debian or Ubuntu server.
.. _docker route: setup-installation-docker_
.. _bare metal route: setup-installation-bare-metal_
.. _Docker Machine: https://docs.docker.com/machine/
.. _linux containers route: setup-installation-linux-containers_
.. _setup-installation-bare-metal:
@@ -82,21 +87,22 @@ Standard (Bare Metal)
this is the default.
4. Initialise the SQLite database with ``./manage.py migrate``.
5. Create a user for your Paperless instance with
5. Collect the static files for the webserver with ``./manage.py collectstatic``.
6. Create a user for your Paperless instance with
``./manage.py createsuperuser``. Follow the prompts to create your user.
6. Start the webserver with ``./manage.py runserver <IP>:<PORT>``.
7. Start the webserver with ``./manage.py runserver <IP>:<PORT>``.
If no specific IP or port is given, the default is ``127.0.0.1:8000`` also
known as http://localhost:8000/.
You should now be able to visit your (empty) installation at
`Paperless webserver`_ or whatever you chose before. You can login with the
user/pass you created in #6.
7. In a separate window, change to the ``src`` directory in this repo again,
8. In a separate window, change to the ``src`` directory in this repo again,
but this time, you should start the consumer script with
``./manage.py document_consumer``.
8. Scan something or put a file into the ``CONSUMPTION_DIR``.
9. Wait a few minutes
10. Visit the document list on your webserver, and it should be there, indexed
9. Scan something or put a file into the ``CONSUMPTION_DIR``.
10. Wait a few minutes
11. Visit the document list on your webserver, and it should be there, indexed
and downloadable.
.. caution::
@@ -126,8 +132,8 @@ Docker Method
.. caution::
If you want to use the included ``docker-compose.yml.example`` file, you
need to have at least Docker version **1.10.0** and docker-compose
version **1.6.0**.
need to have at least Docker version **1.12.0** and docker-compose
version **1.9.0**.
See the `Docker installation guide`_ on how to install the current
version of Docker for your operating system or Linux distribution of
@@ -153,7 +159,7 @@ Docker Method
If you are using NFS mounts for the consume directory you also need to
change the command to turn off inotify as it doesn't work with NFS
`command: ["document_consumer", "--no-inotify"]`
``command: ["document_consumer", "--no-inotify"]``
5. Modify ``docker-compose.env`` and adapt the following environment variables:
@@ -187,6 +193,13 @@ Docker Method
container and thus the one of the consumption directory. Furthermore, you
can change the id of the default user as well using ``USERMAP_UID``.
``PAPERLESS_USE_SSL``
If you want Paperless to use SSL for the user interface, set this variable
to ``true``. You also need to copy your certificate and key to the ``data``
directory, named ``ssl.cert`` and ``ssl.key``.
This is not an ideal solution and, if possible, a reverse proxy with nginx
is preferred.
6. Run ``docker-compose up -d``. This will create and start the necessary
containers.
7. To be able to login, you will need a super user. To create it, execute the
@@ -200,7 +213,8 @@ Docker Method
e-mail address and finally a password.
8. The default ``docker-compose.yml`` exports the webserver on your local port
8000. If you haven't adapted this, you should now be able to visit your
`Paperless webserver`_ at ``http://127.0.0.1:8000``. You can login with the
`Paperless webserver`_ at ``http://127.0.0.1:8000`` (or
``https://127.0.0.1:8000`` if you enabled SSL). You can login with the
user and password you just created.
9. Add files to consumption directory the way you prefer to. Following are two
possible options:
@@ -326,7 +340,7 @@ and mod_wsgi, with a Paperless installation in ``/home/paperless/``:
</Directory>
WSGIScriptAlias / /home/paperless/paperless/src/paperless/wsgi.py
WSGIDaemonProcess example.com user=paperless group=paperless threads=5 python-path=/home/paperless/paperless/src:/home/paperless/.env/lib/python3.4/site-packages
WSGIDaemonProcess example.com user=paperless group=paperless threads=5 python-path=/home/paperless/paperless/src:/home/paperless/.env/lib/python3.6/site-packages
WSGIProcessGroup example.com
<Directory /home/paperless/paperless/src/paperless>
@@ -484,3 +498,45 @@ If you're using Docker, you can set a restart-policy_ in the
Docker daemon.
.. _restart-policy: https://docs.docker.com/engine/reference/commandline/run/#restart-policies-restart
.. _setup-installation-linux-containers:
Suggested way for Linux Container Method
++++++++++++++++++++++++++++++++++++++++
This method makes some rigid assumptions for the best set-up:
* Ubuntu LTS as the container
* Apache as the webserver
* proftpd as the FTP server
* ftpupload as the FTP user
* paperless as the main user for the website
* http://paperless.lan is the desired LAN URL
* LXC set to give IP addresses on your LAN
This could also be used as an install on a base Debian/Ubuntu server,
if the above assumptions are acceptable.
1. Install lxc
2. Launch the paperless container
.. code:: bash
$ lxc launch ubuntu: paperless
3. Run the install script within the container
.. code:: bash
$ lxc exec paperless -- sh -c "wget https://raw.githubusercontent.com/the-paperless-project/paperless/master/docs/examples/lxc/lxc-install.sh && /bin/bash lxc-install.sh --email"
The script sets a password for the ftpupload user and creates
the super-user for the paperless web front-end.
After around 10 minutes, http://paperless.lan and
ftp://paperless.lan (user: ftpupload) are ready.
See the `Installation recording <_static/lxc-install.svg>`_.
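Because the install takes a while, it can be handy to poll until the services answer. The following is a minimal sketch, not part of the install script: ``wait_for_port`` is a hypothetical helper, port 80 matches the example Apache virtual host, and port 21 is assumed to be the FTP port proftpd listens on.

```python
import socket
import time


def wait_for_port(host, port, timeout=600.0, interval=5.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example usage (assumes paperless.lan resolves on your LAN):
#   wait_for_port("paperless.lan", 80)   # Apache
#   wait_for_port("paperless.lan", 21)   # proftpd (assumed default port)
```

A successful connection only shows that the port is open; it does not confirm that Paperless itself finished setting up, so a quick look at the web front-end is still worthwhile.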


@@ -72,4 +72,4 @@ with a DPI of 300, then merging the images into the single PDF with
For more information on this and situations like it, you should take a look
at `Issue #118`_ as that's where this tip originated.
.. _Issue #118: https://github.com/the-paperless-project/paperless/issues/118
.. _Issue #118: https://github.com/the-paperless-project/paperless/issues/118


@@ -193,18 +193,19 @@ instructions above to do the import.
.. _utilities-retagger:
The Re-tagger
-------------
Re-running your tagging and correspondent matchers
--------------------------------------------------
Say you've imported a few hundred documents and now want to introduce a tag
and apply its matching to all of the currently-imported docs. This problem is
common enough that there's a tool for it.
Say you've imported a few hundred documents and now want to introduce
a tag or set up a new correspondent, and apply its matching to all of
the currently-imported docs. This problem is common enough that
there are tools for it.
.. _utilities-retagger-howto:
How to Use It
.............
How to Do It
............
This too is done via the ``manage.py`` script:
@@ -212,10 +213,16 @@ This too is done via the ``manage.py`` script:
$ /path/to/paperless/src/manage.py document_retagger
That's it. It'll loop over all of the documents in your database and attempt
to match all of your tags to them. If one matches, it'll be applied. And
don't worry, you can run this as often as you like, it won't double-tag
a document.
Run this after changing or adding tagging rules. It'll loop over all
of the documents in your database and attempt to match all of your
tags to them. If one matches, it'll be applied. And don't worry, you
can run this as often as you like, it won't double-tag a document.
.. code:: bash
$ /path/to/paperless/src/manage.py document_correspondents
This is the equivalent command to run after adding or changing a correspondent.
.. _utilities-encyption:
@@ -232,10 +239,10 @@ Basic Syntax
Again we'll use the ``manage.py`` script, passing ``change_storage_type``:
.. code:: bash
.. code:: console
$ /path/to/paperless/src/manage.py change_storage_type --help
usage: manage.py change_storage_type [-h] [--version] [-v {0,1,2,3}]
usage: manage.py change_storage_type [-h] [--version] [-v {0,1,2,3}]
[--settings SETTINGS]
[--pythonpath PYTHONPATH] [--traceback]
[--no-color] [--passphrase PASSPHRASE]