mirror of
https://github.com/paperless-ngx/paperless-ngx.git
synced 2025-04-09 09:58:20 -05:00
redirect all RTD pages
parent
3e22e8e0b9
commit
304cfc42a9
8
docs/_static/css/custom.css
vendored
@@ -595,3 +595,11 @@ html.writer-html5 .rst-content dl.footnote code {
 .wy-nav-content-wrap {
   z-index: 20;
 }
+
+.rst-content .toctree-wrapper {
+  display: none;
+}
+
+.redirect-notice {
+  font-size: 2.5rem;
+}
||||||
|
25
docs/_templates/layout.html
vendored
@@ -8,6 +8,31 @@
 
 document.documentElement.classList.toggle("dark-mode", darkModeState);
 document.documentElement.classList.toggle("light-mode", !darkModeState);
+
+const RTD_TO_MKD = {
+  "index.html": "",
+  "setup.html": "setup",
+  "usage_overview.html": "usage",
+  "advanced_usage.html": "advanced_usage",
+  "administration.html": "administration",
+  "configuration.html": "configuration",
+  "api.html": "api",
+  "faq.html": "faq",
+  "troubleshooting.html": "troubleshooting",
+  "extending.html": "development",
+  "scanners.html": "",
+  "screenshots.html": "",
+  "changelog.html": "changelog",
+}
+
+const path = RTD_TO_MKD[window.location.pathname.substring(window.location.pathname.lastIndexOf("/") + 1)] + "/";
+const hash = window.location.hash;
+const redirectURL = new URL(path + hash, "https://paperless-ngx.com/");
+console.log(`Redirecting to ${redirectURL} in 3 seconds...`);
+
+setTimeout(() => {
+  window.location.replace(redirectURL);
+}, 3000);
 </script>
 {{ super() }}
 {% endblock %}
||||||
|
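Note on the redirect script in ``docs/_templates/layout.html``: the table lookup yields ``undefined`` for any file name that is not a key of ``RTD_TO_MKD``, so such a page would be sent to ``undefined/``. The following is a minimal sketch of a more defensive variant, not part of this commit. The helper name ``buildRedirectURL``, its parameters and the shortened table are assumptions made for illustration only.

```javascript
// Hypothetical helper, not part of the commit: same mapping idea, but
// unknown pages fall back to the site root instead of "undefined/".
const RTD_TO_MKD_SAMPLE = {
  "index.html": "",
  "setup.html": "setup",
  "usage_overview.html": "usage",
  "extending.html": "development",
};

function buildRedirectURL(pathname, hash, base = "https://paperless-ngx.com/") {
  // Take the last path segment, e.g. "/en/latest/setup.html" -> "setup.html".
  const page = pathname.substring(pathname.lastIndexOf("/") + 1);
  // Use "" (site root) when the page is not a key of the table.
  const known = Object.prototype.hasOwnProperty.call(RTD_TO_MKD_SAMPLE, page);
  const target = known ? RTD_TO_MKD_SAMPLE[page] : "";
  // Only append a trailing slash when there is a path segment to close.
  const path = target === "" ? "" : target + "/";
  return new URL(path + hash, base);
}
```

In the template this would be called as ``buildRedirectURL(window.location.pathname, window.location.hash)``; keeping it a pure function makes the mapping testable outside a browser.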
@ -1,531 +1,11 @@
|
|||||||
|
.. _administration:
|
||||||
|
|
||||||
**************
|
**************
|
||||||
Administration
|
Administration
|
||||||
**************
|
**************
|
||||||
|
|
||||||
.. _administration-backup:
|
.. cssclass:: redirect-notice
|
||||||
|
|
||||||
Making backups
|
The Paperless-ngx documentation has permanently moved.
|
||||||
##############
|
|
||||||
|
|
||||||
Multiple options exist for making backups of your paperless instance,
|
You will be redirected shortly...
|
||||||
depending on how you installed paperless.
|
|
||||||
|
|
||||||
Before making backups, make sure that paperless is not running.
|
|
||||||
|
|
||||||
Options available to any installation of paperless:
|
|
||||||
|
|
||||||
* Use the :ref:`document exporter <utilities-exporter>`.
|
|
||||||
The document exporter exports all your documents, thumbnails and
|
|
||||||
metadata to a specific folder. You may import your documents into a
|
|
||||||
fresh instance of paperless again or store your documents in another
|
|
||||||
DMS with this export.
|
|
||||||
* The document exporter is also able to update an already existing export.
|
|
||||||
Therefore, incremental backups with ``rsync`` are entirely possible.
|
|
||||||
|
|
||||||
.. caution::
|
|
||||||
|
|
||||||
You cannot import the export generated with one version of paperless in a
|
|
||||||
different version of paperless. The export contains an exact image of the
|
|
||||||
database, and migrations may change the database layout.
|
|
||||||
|
|
||||||
Options available to docker installations:
|
|
||||||
|
|
||||||
* Back up the docker volumes. These usually reside within
|
|
||||||
``/var/lib/docker/volumes`` on the host and you need to be root in order
|
|
||||||
to access them.
|
|
||||||
|
|
||||||
Paperless uses 4 volumes:
|
|
||||||
|
|
||||||
* ``paperless_media``: This is where your documents are stored.
|
|
||||||
* ``paperless_data``: This is where auxiliary data is stored. This
|
|
||||||
folder also contains the SQLite database, if you use it.
|
|
||||||
* ``paperless_pgdata``: Exists only if you use PostgreSQL and contains
|
|
||||||
the database.
|
|
||||||
* ``paperless_dbdata``: Exists only if you use MariaDB and contains
|
|
||||||
the database.
|
|
||||||
|
|
||||||
Options available to bare-metal and non-docker installations:
|
|
||||||
|
|
||||||
* Back up the entire paperless folder. This ensures that if your paperless instance
|
|
||||||
crashes at some point or your disk fails, you can simply copy the folder back
|
|
||||||
into place and it works.
|
|
||||||
|
|
||||||
When using PostgreSQL or MariaDB, you'll also have to back up the database.
|
|
||||||
|
|
||||||
.. _migrating-restoring:
|
|
||||||
|
|
||||||
Restoring
|
|
||||||
=========
|
|
||||||
|
|
||||||
.. _administration-updating:
|
|
||||||
|
|
||||||
Updating Paperless
|
|
||||||
##################
|
|
||||||
|
|
||||||
Docker Route
|
|
||||||
============
|
|
||||||
|
|
||||||
If a new release of paperless-ngx is available, upgrading depends on how you
|
|
||||||
installed paperless-ngx in the first place. The releases are available at the
|
|
||||||
`release page <https://github.com/paperless-ngx/paperless-ngx/releases>`_.
|
|
||||||
|
|
||||||
First of all, ensure that paperless is stopped.
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
$ cd /path/to/paperless
|
|
||||||
$ docker-compose down
|
|
||||||
|
|
||||||
After that, :ref:`make a backup <administration-backup>`.
|
|
||||||
|
|
||||||
A. If you pull the image from the docker hub, all you need to do is:
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
$ docker-compose pull
|
|
||||||
$ docker-compose up
|
|
||||||
|
|
||||||
The docker-compose files refer to the ``latest`` version, which is always the latest
|
|
||||||
stable release.
|
|
||||||
|
|
||||||
B. If you built the image yourself, do the following:
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
$ git pull
|
|
||||||
$ docker-compose build
|
|
||||||
$ docker-compose up
|
|
||||||
|
|
||||||
Running ``docker-compose up`` will also apply any new database migrations.
|
|
||||||
If you see everything working, press CTRL+C once to gracefully stop paperless.
|
|
||||||
Then you can start paperless-ngx with ``-d`` to have it run in the background.
|
|
||||||
|
|
||||||
.. note::
|
|
||||||
|
|
||||||
In version 0.9.14, the update process was changed. In 0.9.13 and earlier, the
|
|
||||||
docker-compose files specified exact versions and pull won't automatically
|
|
||||||
update to newer versions. In order to enable updates as described above, either
|
|
||||||
get the new ``docker-compose.yml`` file from `here <https://github.com/paperless-ngx/paperless-ngx/tree/master/docker/compose>`_
|
|
||||||
or edit the ``docker-compose.yml`` file, find the line that says
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
image: ghcr.io/paperless-ngx/paperless-ngx:0.9.x
|
|
||||||
|
|
||||||
and replace the version with ``latest``:
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
image: ghcr.io/paperless-ngx/paperless-ngx:latest
|
|
||||||
|
|
||||||
.. note::
|
|
||||||
In version 1.7.1 and onwards, the Docker image can now be pinned to a release series.
|
|
||||||
This is often combined with automatic updaters such as Watchtower to allow safer
|
|
||||||
unattended upgrading to new bugfix releases only. It is still recommended to always
|
|
||||||
review release notes before upgrading. To pin your install to a release series, edit
|
|
||||||
the ``docker-compose.yml`` file and find the line that says
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
image: ghcr.io/paperless-ngx/paperless-ngx:latest
|
|
||||||
|
|
||||||
and replace the version with the series you want to track, for example:
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
image: ghcr.io/paperless-ngx/paperless-ngx:1.7
|
|
||||||
|
|
||||||
Bare Metal Route
|
|
||||||
================
|
|
||||||
|
|
||||||
After grabbing the new release and unpacking the contents, do the following:
|
|
||||||
|
|
||||||
1. Update dependencies. New paperless versions may require additional
|
|
||||||
dependencies. The dependencies required are listed in the section about
|
|
||||||
:ref:`bare metal installations <setup-bare_metal>`.
|
|
||||||
|
|
||||||
2. Update python requirements. Keep in mind to activate your virtual environment
|
|
||||||
before that, if you use one.
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
$ pip install -r requirements.txt
|
|
||||||
|
|
||||||
3. Migrate the database.
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
$ cd src
|
|
||||||
$ python3 manage.py migrate
|
|
||||||
|
|
||||||
This might not actually do anything. Not every new paperless version comes with new
|
|
||||||
database migrations.
|
|
||||||
|
|
||||||
Downgrading Paperless
|
|
||||||
#####################
|
|
||||||
|
|
||||||
Downgrades are possible. However, some updates also contain database migrations (these change the layout of the database and may move data).
|
|
||||||
In order to move back from a version that applied database migrations, you'll have to revert the database migration *before* downgrading,
|
|
||||||
and then downgrade paperless.
|
|
||||||
|
|
||||||
This table lists the compatible versions for each database migration number.
|
|
||||||
|
|
||||||
+------------------+-----------------+
|
|
||||||
| Migration number | Version range |
|
|
||||||
+------------------+-----------------+
|
|
||||||
| 1011 | 1.0.0 |
|
|
||||||
+------------------+-----------------+
|
|
||||||
| 1012 | 1.1.0 - 1.2.1 |
|
|
||||||
+------------------+-----------------+
|
|
||||||
| 1014 | 1.3.0 - 1.3.1 |
|
|
||||||
+------------------+-----------------+
|
|
||||||
| 1016 | 1.3.2 - current |
|
|
||||||
+------------------+-----------------+
|
|
||||||
|
|
||||||
Execute the following management command to migrate your database:
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
$ python3 manage.py migrate documents <migration number>
|
|
||||||
|
|
||||||
.. note::
|
|
||||||
|
|
||||||
Some migrations cannot be undone. The command will issue errors if that happens.
|
|
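The table lookup above can be sketched in code. This is an illustration only: it assumes each range starts at the first version listed and runs until the next row begins, and it is only meaningful for the versions the table covers. The names ``MIGRATIONS``, ``compareVersions`` and ``migrationFor`` are hypothetical and not part of Paperless.

```javascript
// Rows of the table: first version of each range and its migration number.
const MIGRATIONS = [
  { since: "1.0.0", migration: 1011 },
  { since: "1.1.0", migration: 1012 },
  { since: "1.3.0", migration: 1014 },
  { since: "1.3.2", migration: 1016 },
];

// Compare dotted version strings numerically; returns <0, 0 or >0.
function compareVersions(a, b) {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// Migration number to revert to before downgrading to `version`.
function migrationFor(version) {
  let result; // stays undefined for versions older than the first row
  for (const row of MIGRATIONS) {
    if (compareVersions(version, row.since) >= 0) result = row.migration;
  }
  return result;
}
```

For a downgrade to 1.2.1, for example, the lookup selects the ``1.1.0 - 1.2.1`` row, and the resulting number is the one to pass to ``migrate documents``.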
||||||
|
|
||||||
.. _utilities-management-commands:
|
|
||||||
|
|
||||||
Management utilities
|
|
||||||
####################
|
|
||||||
|
|
||||||
Paperless comes with some management commands that perform various maintenance
|
|
||||||
tasks on your paperless instance. You can invoke these commands in the following way:
|
|
||||||
|
|
||||||
With docker-compose, while paperless is running:
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
$ cd /path/to/paperless
|
|
||||||
$ docker-compose exec webserver <command> <arguments>
|
|
||||||
|
|
||||||
With docker, while paperless is running:
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
$ docker exec -it <container-name> <command> <arguments>
|
|
||||||
|
|
||||||
Bare metal:
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
$ cd /path/to/paperless/src
|
|
||||||
$ python3 manage.py <command> <arguments>
|
|
||||||
|
|
||||||
All commands have built-in help, which can be accessed by executing them with
|
|
||||||
the argument ``--help``.
|
|
||||||
|
|
||||||
.. _utilities-exporter:
|
|
||||||
|
|
||||||
Document exporter
|
|
||||||
=================
|
|
||||||
|
|
||||||
The document exporter exports all your data from paperless into a folder for
|
|
||||||
backup or migration to another DMS.
|
|
||||||
|
|
||||||
If you use the document exporter within a cronjob to back up your data, you can pass the ``-T`` flag after ``exec`` to suppress "The input device is not a TTY" errors. For example: ``docker-compose exec -T webserver document_exporter ../export``
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
document_exporter target [-c] [-f] [-d]
|
|
||||||
|
|
||||||
optional arguments:
|
|
||||||
-c, --compare-checksums
|
|
||||||
-f, --use-filename-format
|
|
||||||
-d, --delete
|
|
||||||
|
|
||||||
``target`` is a folder to which the data gets written. This includes documents,
|
|
||||||
thumbnails and a ``manifest.json`` file. The manifest contains all metadata from
|
|
||||||
the database (correspondents, tags, etc).
|
|
||||||
|
|
||||||
When you use the provided docker compose script, specify ``../export`` as the
|
|
||||||
target. This path inside the container is automatically mounted on your host on
|
|
||||||
the folder ``export``.
|
|
||||||
|
|
||||||
If the target directory already exists and contains files, paperless will assume
|
|
||||||
that the contents of the export directory are a previous export and will attempt
|
|
||||||
to update the previous export. Paperless will only export changed and added files.
|
|
||||||
Paperless determines whether a file has changed by inspecting the file attributes
|
|
||||||
"date/time modified" and "size". If that does not work out for you, specify
|
|
||||||
``--compare-checksums`` and paperless will attempt to compare file checksums instead.
|
|
||||||
This is slower.
|
|
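The change detection described above can be pictured roughly as follows. This is a sketch under stated assumptions, not the exporter's actual code: the function name ``needsExport`` and the field names ``mtimeMs``, ``size`` and ``checksum`` are hypothetical.

```javascript
// Decide whether a file has to be written to the export directory.
// `source` describes the file in paperless, `exported` the copy from a
// previous export (undefined when there is none).
function needsExport(source, exported, compareChecksums = false) {
  // Nothing exported yet: the file is new and must be written.
  if (exported === undefined) return true;
  if (compareChecksums) {
    // Slower path: both files must have been hashed beforehand.
    return source.checksum !== exported.checksum;
  }
  // Default path: cheap comparison of modification time and size.
  return source.mtimeMs !== exported.mtimeMs || source.size !== exported.size;
}
```

The default comparison never reads file contents, which is why it is faster than the checksum comparison but can be misled when timestamps are not preserved.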
||||||
|
|
||||||
Paperless will not remove any existing files in the export directory. If you want
|
|
||||||
paperless to also remove files that do not belong to the current export such as files
|
|
||||||
from deleted documents, specify ``--delete``. Be careful when pointing paperless to
|
|
||||||
a directory that already contains other files.
|
|
||||||
|
|
||||||
The filenames generated by this command follow the format
|
|
||||||
``[date created] [correspondent] [title].[extension]``.
|
|
||||||
If you want paperless to use ``PAPERLESS_FILENAME_FORMAT`` for exported filenames
|
|
||||||
instead, specify ``--use-filename-format``.
|
|
||||||
|
|
||||||
|
|
||||||
.. _utilities-importer:
|
|
||||||
|
|
||||||
Document importer
|
|
||||||
=================
|
|
||||||
|
|
||||||
The document importer takes the export produced by the `Document exporter`_ and
|
|
||||||
imports it into paperless.
|
|
||||||
|
|
||||||
The importer works just like the exporter. You point it at a directory, and
|
|
||||||
the script does the rest of the work:
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
document_importer source
|
|
||||||
|
|
||||||
When you use the provided docker compose script, put the export inside the
|
|
||||||
``export`` folder in your paperless source directory. Specify ``../export``
|
|
||||||
as the ``source``.
|
|
||||||
|
|
||||||
.. note::
|
|
||||||
|
|
||||||
Importing from a previous version of Paperless may work, but for best results
|
|
||||||
it is suggested to match the versions.
|
|
||||||
|
|
||||||
.. _utilities-retagger:
|
|
||||||
|
|
||||||
Document retagger
|
|
||||||
=================
|
|
||||||
|
|
||||||
Say you've imported a few hundred documents and now want to introduce
|
|
||||||
a tag or set up a new correspondent, and apply its matching to all of
|
|
||||||
the currently-imported docs. This problem is common enough that
|
|
||||||
there are tools for it.
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
document_retagger [-h] [-c] [-T] [-t] [-s] [-i] [--use-first] [-f]
|
|
||||||
|
|
||||||
optional arguments:
|
|
||||||
-c, --correspondent
|
|
||||||
-T, --tags
|
|
||||||
-t, --document_type
|
|
||||||
-s, --storage_path
|
|
||||||
-i, --inbox-only
|
|
||||||
--use-first
|
|
||||||
-f, --overwrite
|
|
||||||
|
|
||||||
Run this after changing or adding matching rules. It'll loop over all
|
|
||||||
of the documents in your database and attempt to match documents
|
|
||||||
according to the new rules.
|
|
||||||
|
|
||||||
Specify any combination of ``-c``, ``-T``, ``-t`` and ``-s`` to have the
|
|
||||||
retagger perform matching of the specified metadata type. If you don't
|
|
||||||
specify any of these options, the document retagger won't do anything.
|
|
||||||
|
|
||||||
Specify ``-i`` to have the document retagger work on documents tagged
|
|
||||||
with inbox tags only. This is useful when you don't want to mess with
|
|
||||||
your already processed documents.
|
|
||||||
|
|
||||||
When multiple document types or correspondents match a single document,
|
|
||||||
the retagger won't assign these to the document. Specify ``--use-first``
|
|
||||||
to override this behavior and just use the first correspondent or type
|
|
||||||
it finds. This option does not apply to tags, since any amount of tags
|
|
||||||
can be applied to a document.
|
|
||||||
|
|
||||||
Finally, ``-f`` specifies that you wish to overwrite already assigned
|
|
||||||
correspondents, types and/or tags. The default behavior is to not
|
|
||||||
assign correspondents and types to documents that have this data already
|
|
||||||
assigned. ``-f`` works differently for tags: By default, only additional tags get
|
|
||||||
added to documents, no tags will be removed. With ``-f``, tags that don't
|
|
||||||
match a document anymore get removed as well.
|
|
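The different effect of ``-f`` on tags can be shown with a small sketch. It is a simplified model of the rule as stated above and ignores any special cases the real retagger may handle; the function name ``retagTags`` and its parameters are hypothetical.

```javascript
// Returns the set of tags a document ends up with after retagging.
// currentTags:  tags already assigned to the document
// matchingTags: tags whose matching rules apply to the document now
// overwrite:    true when the retagger is run with -f
function retagTags(currentTags, matchingTags, overwrite = false) {
  if (overwrite) {
    // With -f: tags that no longer match are removed as well.
    return new Set(matchingTags);
  }
  // Default: only add newly matching tags, never remove existing ones.
  return new Set([...currentTags, ...matchingTags]);
}
```

With current tags ``a, b`` and matching tags ``b, c``, the default run yields ``a, b, c``, while a run with ``-f`` yields ``b, c``.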
||||||
|
|
||||||
|
|
||||||
Managing the Automatic matching algorithm
|
|
||||||
=========================================
|
|
||||||
|
|
||||||
The *Auto* matching algorithm requires a trained neural network to work.
|
|
||||||
This network needs to be updated whenever something in your data
|
|
||||||
changes. The docker image takes care of that automatically with the task
|
|
||||||
scheduler. You can manually renew the classifier by invoking the following
|
|
||||||
management command:
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
document_create_classifier
|
|
||||||
|
|
||||||
This command takes no arguments.
|
|
||||||
|
|
||||||
.. _`administration-index`:
|
|
||||||
|
|
||||||
Managing the document search index
|
|
||||||
==================================
|
|
||||||
|
|
||||||
The document search index is responsible for delivering search results for the
|
|
||||||
website. The document index is automatically updated whenever documents get
|
|
||||||
added to, changed, or removed from paperless. However, if the search yields
|
|
||||||
non-existing documents or won't find anything, you may need to recreate the
|
|
||||||
index manually.
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
document_index {reindex,optimize}
|
|
||||||
|
|
||||||
Specify ``reindex`` to have the index created from scratch. This may take some
|
|
||||||
time.
|
|
||||||
|
|
||||||
Specify ``optimize`` to optimize the index. This updates certain aspects of
|
|
||||||
the index and usually makes queries faster and also ensures that the
|
|
||||||
autocompletion works properly. This command is regularly invoked by the task
|
|
||||||
scheduler.
|
|
||||||
|
|
||||||
.. _utilities-renamer:
|
|
||||||
|
|
||||||
Managing filenames
|
|
||||||
==================
|
|
||||||
|
|
||||||
If you use paperless' feature to
|
|
||||||
:ref:`assign custom filenames to your documents <advanced-file_name_handling>`,
|
|
||||||
you can use this command to move all your files after changing
|
|
||||||
the naming scheme.
|
|
||||||
|
|
||||||
.. warning::
|
|
||||||
|
|
||||||
Since this command moves your documents, it is advised to do
|
|
||||||
a backup beforehand. The renaming logic is robust and will never overwrite
|
|
||||||
or delete a file, but you can never be too careful.
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
document_renamer
|
|
||||||
|
|
||||||
The command takes no arguments and processes all your documents at once.
|
|
||||||
|
|
||||||
Learn how to use :ref:`Management Utilities<utilities-management-commands>`.
|
|
||||||
|
|
||||||
|
|
||||||
.. _utilities-sanity-checker:
|
|
||||||
|
|
||||||
Sanity checker
|
|
||||||
==============
|
|
||||||
|
|
||||||
Paperless has a built-in sanity checker that inspects your document collection for issues.
|
|
||||||
|
|
||||||
The issues detected by the sanity checker are as follows:
|
|
||||||
|
|
||||||
* Missing original files.
|
|
||||||
* Missing archive files.
|
|
||||||
* Inaccessible original files due to improper permissions.
|
|
||||||
* Inaccessible archive files due to improper permissions.
|
|
||||||
* Corrupted original documents by comparing their checksum against what is stored in the database.
|
|
||||||
* Corrupted archive documents by comparing their checksum against what is stored in the database.
|
|
||||||
* Missing thumbnails.
|
|
||||||
* Inaccessible thumbnails due to improper permissions.
|
|
||||||
* Documents without any content (warning).
|
|
||||||
* Orphaned files in the media directory (warning). These are files that are not referenced by any document in paperless.
|
|
||||||
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
document_sanity_checker
|
|
||||||
|
|
||||||
The command takes no arguments. Depending on the size of your document archive, this may take some time.
|
|
||||||
|
|
||||||
|
|
||||||
Fetching e-mail
|
|
||||||
===============
|
|
||||||
|
|
||||||
Paperless automatically fetches your e-mail every 10 minutes by default. If
|
|
||||||
you want to invoke the email consumer manually, call the following management
|
|
||||||
command:
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
mail_fetcher
|
|
||||||
|
|
||||||
The command takes no arguments and processes all your mail accounts and rules.
|
|
||||||
|
|
||||||
.. note::
|
|
||||||
|
|
||||||
As of October 2022 Microsoft no longer supports IMAP authentication for Exchange
|
|
||||||
servers, thus Exchange is no longer supported until a solution is implemented in
|
|
||||||
the Python IMAP library used by Paperless. See `learn.microsoft.com`_
|
|
||||||
|
|
||||||
.. _learn.microsoft.com: https://learn.microsoft.com/en-us/exchange/clients-and-mobile-in-exchange-online/deprecation-of-basic-authentication-exchange-online
|
|
||||||
|
|
||||||
.. _utilities-archiver:
|
|
||||||
|
|
||||||
Creating archived documents
|
|
||||||
===========================
|
|
||||||
|
|
||||||
Paperless stores archived PDF/A documents alongside your original documents.
|
|
||||||
These archived documents will also contain selectable text for image-only
|
|
||||||
originals.
|
|
||||||
These documents are derived from the originals, which are always stored
|
|
||||||
unmodified. If coming from an earlier version of paperless, your documents
|
|
||||||
won't have archived versions.
|
|
||||||
|
|
||||||
This command creates PDF/A documents for your documents.
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
document_archiver --overwrite --document <id>
|
|
||||||
|
|
||||||
This command will only attempt to create archived documents when no archived
|
|
||||||
document exists yet, unless ``--overwrite`` is specified. If ``--document <id>``
|
|
||||||
is specified, the archiver will only process that document.
|
|
||||||
|
|
||||||
.. note::
|
|
||||||
|
|
||||||
This command essentially performs OCR on all your documents again,
|
|
||||||
according to your settings. If you run this with ``PAPERLESS_OCR_MODE=redo``,
|
|
||||||
it will potentially run for a very long time. You can cancel the command
|
|
||||||
at any time, since this command will skip already archived versions the next time
|
|
||||||
it is run.
|
|
||||||
|
|
||||||
.. note::
|
|
||||||
|
|
||||||
Some documents will cause errors and cannot be converted into PDF/A documents,
|
|
||||||
such as encrypted PDF documents. The archiver will skip over these documents
|
|
||||||
each time it sees them.
|
|
||||||
|
|
||||||
.. _utilities-encyption:
|
|
||||||
|
|
||||||
Managing encryption
|
|
||||||
===================
|
|
||||||
|
|
||||||
Documents can be stored in Paperless using GnuPG encryption.
|
|
||||||
|
|
||||||
.. danger::
|
|
||||||
|
|
||||||
Encryption is deprecated since paperless-ngx 0.9 and doesn't really provide any
|
|
||||||
additional security, since you have to store the passphrase in a configuration
|
|
||||||
file on the same system as the encrypted documents for paperless to work.
|
|
||||||
Furthermore, the entire text content of the documents is stored plain in the
|
|
||||||
database, even if your documents are encrypted. Filenames are not encrypted
|
|
||||||
either.
|
|
||||||
|
|
||||||
Also, the web server provides transparent access to your encrypted documents.
|
|
||||||
|
|
||||||
Consider running paperless on an encrypted filesystem instead, which will then
|
|
||||||
at least provide security against physical hardware theft.
|
|
||||||
|
|
||||||
|
|
||||||
Enabling encryption
|
|
||||||
-------------------
|
|
||||||
|
|
||||||
Enabling encryption is no longer supported.
|
|
||||||
|
|
||||||
|
|
||||||
Disabling encryption
|
|
||||||
--------------------
|
|
||||||
|
|
||||||
Basic usage to disable encryption of your document store:
|
|
||||||
|
|
||||||
(Note: If ``PAPERLESS_PASSPHRASE`` isn't set already, you need to specify it here)
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
decrypt_documents [--passphrase SECR3TP4SSPHRA$E]
|
|
||||||
|
@ -1,447 +1,11 @@
|
|||||||
|
.. _advanced_usage:
|
||||||
|
|
||||||
***************
|
***************
|
||||||
Advanced topics
|
Advanced topics
|
||||||
***************
|
***************
|
||||||
|
|
||||||
Paperless offers a couple features that automate certain tasks and make your life
|
.. cssclass:: redirect-notice
|
||||||
easier.
|
|
||||||
|
|
||||||
.. _advanced-matching:
|
The Paperless-ngx documentation has permanently moved.
|
||||||
|
|
||||||
Matching tags, correspondents, document types, and storage paths
|
You will be redirected shortly...
|
||||||
################################################################
|
|
||||||
|
|
||||||
Paperless will compare the matching algorithms defined by every tag, correspondent,
|
|
||||||
document type, and storage path in your database to see if they apply to the text
|
|
||||||
in a document. In other words, if you define a tag called ``Home Utility``
|
|
||||||
that had a ``match`` property of ``bc hydro`` and a ``matching_algorithm`` of
|
|
||||||
``literal``, Paperless will automatically tag your newly-consumed document with
|
|
||||||
your ``Home Utility`` tag so long as the text ``bc hydro`` appears in the body
|
|
||||||
of the document somewhere.
|
|
||||||
|
|
||||||
The matching logic is quite powerful. It supports searching the text of your
|
|
||||||
document with different algorithms, and as such, some experimentation may be
|
|
||||||
necessary to get things right.
|
|
||||||
|
|
||||||
In order to have a tag, correspondent, document type, or storage path assigned
|
|
||||||
automatically to newly consumed documents, assign a match and matching algorithm
|
|
||||||
using the web interface. These settings define when to assign tags, correspondents,
|
|
||||||
document types, and storage paths to documents.
|
|
||||||
|
|
||||||
The following algorithms are available:
|
|
||||||
|
|
||||||
* **Any:** Looks for any occurrence of any word provided in match in the PDF.
|
|
||||||
If you define the match as ``Bank1 Bank2``, it will match documents containing
|
|
||||||
either of these terms.
|
|
||||||
* **All:** Requires that every word provided appears in the PDF, albeit not in the
|
|
||||||
order provided.
|
|
||||||
* **Literal:** Matches only if the match appears exactly as provided (i.e. preserve ordering) in the PDF.
|
|
||||||
* **Regular expression:** Parses the match as a regular expression and tries to
|
|
||||||
find a match within the document.
|
|
||||||
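* **Fuzzy match:** Matches when the document contains text that is approximately equal to the match, so small differences such as OCR errors are tolerated.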
|
|
||||||
* **Auto:** Tries to automatically match new documents. This does not require you
|
|
||||||
to set a match. See the notes below.
|
|
||||||
|
|
||||||
When using the *any* or *all* matching algorithms, you can search for terms
|
|
||||||
that consist of multiple words by enclosing them in double quotes. For example,
|
|
||||||
defining a match text of ``"Bank of America" BofA`` using the *any* algorithm,
|
|
||||||
will match documents that contain either "Bank of America" or "BofA", but will
|
|
||||||
not match documents containing "Bank of South America".
|
|
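The quoting rule for the *any* and *all* algorithms can be illustrated with a short sketch. It assumes case-insensitive, whole-word matching and is not the matching code Paperless actually uses; all function names here are hypothetical.

```javascript
// Split a match string into terms, keeping "quoted phrases" together.
function splitTerms(match) {
  const terms = [];
  const pattern = /"([^"]+)"|(\S+)/g;
  let m;
  while ((m = pattern.exec(match)) !== null) {
    terms.push(m[1] !== undefined ? m[1] : m[2]);
  }
  return terms;
}

// Escape regex metacharacters so a term is searched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// True when the term occurs in the text as whole words, ignoring case.
function containsTerm(text, term) {
  return new RegExp("\\b" + escapeRegExp(term) + "\\b", "i").test(text);
}

// "Any": at least one term must occur in the text.
function matchesAny(text, match) {
  return splitTerms(match).some((term) => containsTerm(text, term));
}

// "All": every term must occur, in any order.
function matchesAll(text, match) {
  return splitTerms(match).every((term) => containsTerm(text, term));
}
```

Because the quoted phrase is kept as one term, a text containing "Bank of South America" does not satisfy the term ``Bank of America``, which mirrors the example in the paragraph above.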
||||||
|
|
||||||
Then just save your tag, correspondent, document type, or storage path and run
|
|
||||||
another document through the consumer. Once complete, you should see the
|
|
||||||
newly-created document, automatically tagged with the appropriate data.
|
|
||||||
|
|
||||||
|
|
||||||
.. _advanced-automatic_matching:
|
|
||||||
|
|
||||||
Automatic matching
|
|
||||||
==================
|
|
||||||
|
|
||||||
Paperless-ngx comes with a new matching algorithm called *Auto*. This matching
|
|
||||||
algorithm tries to assign tags, correspondents, document types, and storage paths
|
|
||||||
to your documents based on how you have already assigned these on existing documents.
|
|
||||||
It uses a neural network under the hood.
|
|
||||||
|
|
||||||
If, for example, all your bank statements of your account 123 at the Bank of
|
|
||||||
America are tagged with the tag "bofa_123" and the matching algorithm of this
|
|
||||||
tag is set to *Auto*, this neural network will examine your documents and
|
|
||||||
automatically learn when to assign this tag.
|
|
||||||
|
|
||||||
Paperless tries to hide much of the involved complexity with this approach.
|
|
||||||
However, there are a couple caveats you need to keep in mind when using this
|
|
||||||
feature:
|
|
||||||
|
|
||||||
* Changes to your documents are not immediately reflected by the matching
|
|
||||||
algorithm. The neural network needs to be *trained* on your documents after
|
|
||||||
changes. Paperless periodically (default: once each hour) checks for changes
|
|
||||||
and does this automatically for you.
|
|
||||||
* The Auto matching algorithm only takes documents into account which are NOT
|
|
||||||
placed in your inbox (i.e. have no inbox tags assigned to them). This ensures
|
|
||||||
that the neural network only learns from documents which you have correctly
|
|
||||||
tagged before.
|
|
||||||
* The matching algorithm can only work if there is a correlation between the
|
|
||||||
tag, correspondent, document type, or storage path and the document itself.
|
|
||||||
Your bank statements usually contain your bank account number and the name
|
|
||||||
of the bank, so this works reasonably well. However, tags such as "TODO"
|
|
||||||
cannot be automatically assigned.
|
|
||||||
* The matching algorithm needs a reasonable number of documents to identify when
|
|
||||||
to assign tags, correspondents, storage paths, and types. If one out of a
|
|
||||||
thousand documents has the correspondent "Very obscure web shop I bought
|
|
||||||
something five years ago", it will probably not assign this correspondent
|
|
||||||
automatically if you buy something from them again. The more documents, the better.
|
|
||||||
* Paperless also needs a reasonable amount of negative examples to decide when
|
|
||||||
not to assign a certain tag, correspondent, document type, or storage path. This will
|
|
||||||
usually be the case as you start filling up paperless with documents.
|
|
||||||
Example: If all your documents are either from "Webshop" and "Bank", paperless
|
|
||||||
will assign one of these correspondents to ANY new document, if both are set
|
|
||||||
to automatic matching.
|
|
||||||
|
|
||||||
Hooking into the consumption process
|
|
||||||
####################################
|
|
||||||
|
|
||||||
Sometimes you may want to do something arbitrary whenever a document is
|
|
||||||
consumed. Rather than try to predict what you may want to do, Paperless lets
|
|
||||||
you execute scripts of your own choosing just before or after a document is
|
|
||||||
consumed using a couple simple hooks.
|
|
||||||
|
|
||||||
Just write a script, put it somewhere that Paperless can read & execute, and
|
|
||||||
then put the path to that script in ``paperless.conf`` or ``docker-compose.env`` with the variable name
|
|
||||||
of either ``PAPERLESS_PRE_CONSUME_SCRIPT`` or
|
|
||||||
``PAPERLESS_POST_CONSUME_SCRIPT``.
|
|
||||||
|
|
||||||
.. important::
|
|
||||||
|
|
||||||
These scripts are executed in a **blocking** process, which means that if
|
|
||||||
a script takes a long time to run, it can significantly slow down your
|
|
||||||
document consumption flow. If you want things to run asynchronously,
|
|
||||||
you'll have to fork the process in your script and exit.
|
|
||||||
|
|
||||||
|
|
||||||
Pre-consumption script
|
|
||||||
======================
|
|
||||||
|
|
||||||
Executed after the consumer sees a new document in the consumption folder, but
|
|
||||||
before any processing of the document is performed. This script can access the
|
|
||||||
following environment variable:
|
|
||||||
|
|
||||||
* ``DOCUMENT_SOURCE_PATH``
|
|
||||||
|
|
||||||
A common example for this would be creating a simple script like
|
|
||||||
this:
|
|
||||||
|
|
||||||
``/usr/local/bin/ocr-pdf``
|
|
||||||
|
|
||||||
.. code:: bash
|
|
||||||
|
|
||||||
#!/usr/bin/env bash
|
|
||||||
pdf2pdfocr.py -i "${DOCUMENT_SOURCE_PATH}"
|
|
||||||
|
|
||||||
``/etc/paperless.conf``
|
|
||||||
|
|
||||||
.. code:: bash
|
|
||||||
|
|
||||||
...
|
|
||||||
PAPERLESS_PRE_CONSUME_SCRIPT="/usr/local/bin/ocr-pdf"
|
|
||||||
...
|
|
||||||
|
|
||||||
This will pass the path to the document about to be consumed to ``/usr/local/bin/ocr-pdf``,
|
|
||||||
which will in turn call `pdf2pdfocr.py`_ on your document, which will then
|
|
||||||
overwrite the file with an OCR'd version and exit. At that point,
|
|
||||||
the consumption process will begin with the newly modified file.
|
|
||||||
|
|
||||||
The script's stdout and stderr will be logged line by line to the webserver log, along
|
|
||||||
with the exit code of the script.
|
|
||||||
|
|
||||||
.. _pdf2pdfocr.py: https://github.com/LeoFCardoso/pdf2pdfocr
|
|
||||||
|
|
||||||
.. _advanced-post_consume_script:
|
|
||||||
|
|
||||||
Post-consumption script
|
|
||||||
=======================
|
|
||||||
|
|
||||||
Executed after the consumer has successfully processed a document and has moved it
|
|
||||||
into paperless. It receives the following environment variables:
|
|
||||||
|
|
||||||
* ``DOCUMENT_ID``
|
|
||||||
* ``DOCUMENT_FILE_NAME``
|
|
||||||
* ``DOCUMENT_CREATED``
|
|
||||||
* ``DOCUMENT_MODIFIED``
|
|
||||||
* ``DOCUMENT_ADDED``
|
|
||||||
* ``DOCUMENT_SOURCE_PATH``
|
|
||||||
* ``DOCUMENT_ARCHIVE_PATH``
|
|
||||||
* ``DOCUMENT_THUMBNAIL_PATH``
|
|
||||||
* ``DOCUMENT_DOWNLOAD_URL``
|
|
||||||
* ``DOCUMENT_THUMBNAIL_URL``
|
|
||||||
* ``DOCUMENT_CORRESPONDENT``
|
|
||||||
* ``DOCUMENT_TAGS``
|
|
||||||
* ``DOCUMENT_ORIGINAL_FILENAME``
|
|
||||||
|
|
||||||
The script can be in any language, but for a simple shell script
|
|
||||||
example, you can take a look at `post-consumption-example.sh`_ in this project.
|
|
||||||
|
|
||||||
The post consumption script cannot cancel the consumption process.
|
|
||||||
|
|
||||||
The script's stdout and stderr will be logged line by line to the webserver log, along
|
|
||||||
with the exit code of the script.
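
Since the script can be written in any language, here is a minimal Python sketch of a post-consumption hook. It only assumes the environment variable names listed above; the helper names ``collect_document_info`` and ``format_log_line`` are hypothetical. It prints one JSON line to stdout, which Paperless then writes to the webserver log as described.

```python
#!/usr/bin/env python3
"""Hypothetical post-consumption hook: print the DOCUMENT_* variables as one JSON line."""
import json
import os

# Variable names taken from the list above.
DOCUMENT_VARS = [
    "DOCUMENT_ID",
    "DOCUMENT_FILE_NAME",
    "DOCUMENT_CREATED",
    "DOCUMENT_MODIFIED",
    "DOCUMENT_ADDED",
    "DOCUMENT_SOURCE_PATH",
    "DOCUMENT_ARCHIVE_PATH",
    "DOCUMENT_THUMBNAIL_PATH",
    "DOCUMENT_DOWNLOAD_URL",
    "DOCUMENT_THUMBNAIL_URL",
    "DOCUMENT_CORRESPONDENT",
    "DOCUMENT_TAGS",
    "DOCUMENT_ORIGINAL_FILENAME",
]


def collect_document_info(environ):
    """Return only the DOCUMENT_* variables that are present in the mapping."""
    return {name: environ[name] for name in DOCUMENT_VARS if name in environ}


def format_log_line(info):
    """Serialize the collected variables as a single, stable JSON line."""
    return json.dumps(info, sort_keys=True)


if __name__ == "__main__":
    # stdout is logged line by line, so one line per document is enough.
    print(format_log_line(collect_document_info(os.environ)))
```

The script exits quickly, which matters because hooks run in a blocking process.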
|
|
||||||
|
|
||||||
|
|
||||||
Docker
|
|
||||||
------
|
|
||||||
Assume you have ``/home/foo/paperless-ngx/scripts/post-consumption-example.sh``.
|
|
||||||
|
|
||||||
You can pass that script into the consumer container via a host mount in your ``docker-compose.yml``.
|
|
||||||
|
|
||||||
.. code:: yaml
|
|
||||||
|
|
||||||
...
|
|
||||||
consumer:
|
|
||||||
...
|
|
||||||
volumes:
|
|
||||||
...
|
|
||||||
- /home/foo/paperless-ngx/scripts:/path/in/container/scripts/
|
|
||||||
...
|
|
||||||
|
|
||||||
Example (docker-compose.yml): ``- /home/foo/paperless-ngx/scripts:/usr/src/paperless/scripts``
|
|
||||||
|
|
||||||
which in turn requires the variable ``PAPERLESS_POST_CONSUME_SCRIPT`` in ``docker-compose.env`` to point to ``/path/in/container/scripts/post-consumption-example.sh``.
|
|
||||||
|
|
||||||
Example (docker-compose.env): ``PAPERLESS_POST_CONSUME_SCRIPT=/usr/src/paperless/scripts/post-consumption-example.sh``
|
|
||||||
|
|
||||||
Troubleshooting:
|
|
||||||
|
|
||||||
- Monitor the docker-compose log ``cd ~/paperless-ngx; docker-compose logs -f``
|
|
||||||
- Check your script's permissions, e.g. in case of a permission error: ``sudo chmod 755 post-consumption-example.sh``
|
|
||||||
- Pipe your script's output to a log file, e.g. ``echo "${DOCUMENT_ID}" | tee --append /usr/src/paperless/scripts/post-consumption-example.log``
|
|
||||||
|
|
||||||
.. _post-consumption-example.sh: https://github.com/paperless-ngx/paperless-ngx/blob/main/scripts/post-consumption-example.sh
|
|
||||||
|
|
||||||
.. _advanced-file_name_handling:
|
|
||||||
|
|
||||||
File name handling
|
|
||||||
##################
|
|
||||||
|
|
||||||
By default, paperless stores your documents in the media directory and renames them
|
|
||||||
using the identifier which it has assigned to each document. You will end up getting
|
|
||||||
files like ``0000123.pdf`` in your media directory. This isn't necessarily a bad
|
|
||||||
thing, because you normally don't have to access these files manually. However, if
|
|
||||||
you wish to name your files differently, you can do that by adjusting the
|
|
||||||
``PAPERLESS_FILENAME_FORMAT`` configuration option. Paperless adds the correct
|
|
||||||
file extension e.g. ``.pdf``, ``.jpg`` automatically.
|
|
||||||
|
|
||||||
This variable allows you to configure the filename (folders are allowed) using
|
|
||||||
placeholders. For example, configuring this to
|
|
||||||
|
|
||||||
.. code:: bash
|
|
||||||
|
|
||||||
PAPERLESS_FILENAME_FORMAT={created_year}/{correspondent}/{title}
|
|
||||||
|
|
||||||
will create a directory structure as follows:
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
2019/
|
|
||||||
My bank/
|
|
||||||
Statement January.pdf
|
|
||||||
Statement February.pdf
|
|
||||||
2020/
|
|
||||||
My bank/
|
|
||||||
Statement January.pdf
|
|
||||||
Letter.pdf
|
|
||||||
Letter_01.pdf
|
|
||||||
Shoe store/
|
|
||||||
My new shoes.pdf
|
|
||||||
|
|
||||||
.. danger::
|
|
||||||
|
|
||||||
Do not manually move your files in the media folder. Paperless remembers the
|
|
||||||
last filename a document was stored as. If you do rename a file, paperless will
|
|
||||||
report your files as missing and won't be able to find them.
|
|
||||||
|
|
||||||
Paperless provides the following placeholders within filenames:
|
|
||||||
|
|
||||||
* ``{asn}``: The archive serial number of the document, or "none".
|
|
||||||
* ``{correspondent}``: The name of the correspondent, or "none".
|
|
||||||
* ``{document_type}``: The name of the document type, or "none".
|
|
||||||
* ``{tag_list}``: A comma separated list of all tags assigned to the document.
|
|
||||||
* ``{title}``: The title of the document.
|
|
||||||
* ``{created}``: The full date (ISO format) the document was created.
|
|
||||||
* ``{created_year}``: Year created only, formatted as the year with century.
|
|
||||||
* ``{created_year_short}``: Year created only, formatted as the year without century, zero padded.
|
|
||||||
* ``{created_month}``: Month created only (number 01-12).
|
|
||||||
* ``{created_month_name}``: Month created name, as per locale
|
|
||||||
* ``{created_month_name_short}``: Month created abbreviated name, as per locale
|
|
||||||
* ``{created_day}``: Day created only (number 01-31).
|
|
||||||
* ``{added}``: The full date (ISO format) the document was added to paperless.
|
|
||||||
* ``{added_year}``: Year added only.
|
|
||||||
* ``{added_year_short}``: Year added only, formatted as the year without century, zero padded.
|
|
||||||
* ``{added_month}``: Month added only (number 01-12).
|
|
||||||
* ``{added_month_name}``: Month added name, as per locale
|
|
||||||
* ``{added_month_name_short}``: Month added abbreviated name, as per locale
|
|
||||||
* ``{added_day}``: Day added only (number 01-31).
|
|
||||||
|
|
||||||
|
|
||||||
Paperless will try to conserve the information from your database as much as possible.
|
|
||||||
However, some characters that you can use in document titles and correspondent names (such
|
|
||||||
as ``: \ /`` and a couple more) are not allowed in filenames and will be replaced with dashes.
|
|
||||||
|
|
||||||
If paperless detects that two documents share the same filename, paperless will automatically
|
|
||||||
append ``_01``, ``_02``, etc. to the filename. This happens if all the placeholders in a filename
|
|
||||||
evaluate to the same value.
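
The placeholder rules above can be illustrated with a small Python sketch. This is not Paperless's implementation: the function ``render_filename`` and the ``taken`` set are hypothetical, and only the three characters named above (``: \ /``) are replaced, although Paperless replaces a few more.

```python
import re

# Characters replaced with dashes in this sketch (an incomplete set, see lead-in).
UNSAFE = re.compile(r"[:\\/]")


def render_filename(fmt, values, taken):
    """Fill the placeholders in fmt and make the result unique within taken."""
    safe = {
        key: "none" if value in (None, "") else UNSAFE.sub("-", str(value))
        for key, value in values.items()
    }
    base = fmt.format(**safe)

    # Append _01, _02, ... while the name collides with an existing one.
    candidate = base
    counter = 0
    while candidate in taken:
        counter += 1
        candidate = f"{base}_{counter:02d}"

    taken.add(candidate)
    return candidate


taken = set()
fmt = "{created_year}/{correspondent}/{title}"
doc = {"created_year": "2020", "correspondent": "My bank", "title": "Letter"}
first = render_filename(fmt, doc, taken)
second = render_filename(fmt, doc, taken)
```

Here ``first`` and ``second`` come from identical placeholder values, so the second name receives the ``_01`` suffix, matching the ``Letter.pdf`` / ``Letter_01.pdf`` pair in the directory listing above.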
|
|
||||||
|
|
||||||
.. hint::
|
|
||||||
You can affect how empty placeholders are treated by changing the following setting to
|
|
||||||
`true`.
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
PAPERLESS_FILENAME_FORMAT_REMOVE_NONE=True
|
|
||||||
|
|
||||||
Doing this results in all empty placeholders resolving to "" instead of "none" as stated above.
|
|
||||||
Spaces before empty placeholders are removed as well, and empty directories are omitted.
|
|
||||||
|
|
||||||
.. hint::
|
|
||||||
|
|
||||||
Paperless checks the filename of a document whenever it is saved. Therefore,
|
|
||||||
you need to update the filenames of your documents and move them after altering
|
|
||||||
this setting by invoking the :ref:`document renamer <utilities-renamer>`.
|
|
||||||
|
|
||||||
.. warning::
|
|
||||||
|
|
||||||
Make absolutely sure you get the spelling of the placeholders right, or else
|
|
||||||
paperless will use the default naming scheme instead.
|
|
||||||
|
|
||||||
.. caution::
|
|
||||||
|
|
||||||
As of now, you can tell paperless to store your files anywhere outside
|
|
||||||
the media directory by setting
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
PAPERLESS_FILENAME_FORMAT=../../my/custom/location/{title}
|
|
||||||
|
|
||||||
However, keep in mind that inside docker, if files get stored outside of the
|
|
||||||
predefined volumes, they will be lost after a restart of paperless.
|
|
||||||
|
|
||||||
|
|
||||||
Storage paths
|
|
||||||
#############
|
|
||||||
|
|
||||||
One of the best things in Paperless is that you can not only access the documents via the
|
|
||||||
web interface, but also via the file system.
|
|
||||||
|
|
||||||
When a single storage layout is not sufficient for your use case, storage paths come to
|
|
||||||
the rescue. Storage paths allow you to configure more precisely where each document is stored
|
|
||||||
in the file system.
|
|
||||||
|
|
||||||
- Each storage path is a `PAPERLESS_FILENAME_FORMAT` and follows the rules described above
|
|
||||||
- Each document is assigned a storage path using the matching algorithms described above, but
|
|
||||||
can be overwritten at any time
|
|
||||||
|
|
||||||
For example, you could define the following two storage paths:
|
|
||||||
|
|
||||||
1. Normal communications are put into a folder structure sorted by `year/correspondent`
|
|
||||||
2. Communications with insurance companies are stored in a flat structure with longer file names,
|
|
||||||
but containing the full date of the correspondence.
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
By Year = {created_year}/{correspondent}/{title}
|
|
||||||
Insurances = Insurances/{correspondent}/{created_year}-{created_month}-{created_day} {title}
|
|
||||||
|
|
||||||
|
|
||||||
If you then map these storage paths to the documents, you might get the following result.
|
|
||||||
For simplicity, `By Year` defines the same structure as in the previous example above.
|
|
||||||
|
|
||||||
.. code:: text
|
|
||||||
|
|
||||||
2019/ # By Year
|
|
||||||
My bank/
|
|
||||||
Statement January.pdf
|
|
||||||
Statement February.pdf
|
|
||||||
|
|
||||||
Insurances/ # Insurances
|
|
||||||
Healthcare 123/
|
|
||||||
2022-01-01 Statement January.pdf
|
|
||||||
2022-02-02 Letter.pdf
|
|
||||||
2022-02-03 Letter.pdf
|
|
||||||
Dental 456/
|
|
||||||
2021-12-01 New Conditions.pdf
|
|
||||||
|
|
||||||
|
|
||||||
.. hint::
|
|
||||||
|
|
||||||
Defining a storage path is optional. If no storage path is defined for a document, the global
|
|
||||||
`PAPERLESS_FILENAME_FORMAT` is applied.
|
|
||||||
|
|
||||||
.. caution::
|
|
||||||
|
|
||||||
If you adjust the format of an existing storage path, old documents don't get relocated automatically.
|
|
||||||
You need to run the :ref:`document renamer <utilities-renamer>` to adjust their paths.
|
|
||||||
|
|
||||||
.. _advanced-celery-monitoring:
|
|
||||||
|
|
||||||
Celery Monitoring
|
|
||||||
#################
|
|
||||||
|
|
||||||
The monitoring tool `Flower <https://flower.readthedocs.io/en/latest/index.html>`_ can be used to view more
|
|
||||||
detailed information about the health of the celery workers used for asynchronous tasks. This includes details
|
|
||||||
on currently running, queued and completed tasks, timing and more. Flower can also be used with Prometheus, as it
|
|
||||||
exports metrics. For details on its capabilities, refer to the Flower documentation.
|
|
||||||
|
|
||||||
To configure Flower further, create a `flowerconfig.py` and place it into the `src/paperless` directory. For
|
|
||||||
a Docker installation, you can use volumes to accomplish this:
|
|
||||||
|
|
||||||
.. code:: yaml
|
|
||||||
|
|
||||||
services:
|
|
||||||
# ...
|
|
||||||
webserver:
|
|
||||||
# ...
|
|
||||||
volumes:
|
|
||||||
- /path/to/my/flowerconfig.py:/usr/src/paperless/src/paperless/flowerconfig.py:ro
|
|
||||||
|
|
||||||
Custom Container Initialization
|
|
||||||
###############################
|
|
||||||
|
|
||||||
The Docker image includes the ability to run custom user scripts during startup. This could be
|
|
||||||
utilized for installing additional tools or Python packages, for example.
|
|
||||||
|
|
||||||
To utilize this, mount a folder containing your scripts to the custom initialization directory, `/custom-cont-init.d`
|
|
||||||
and place scripts you wish to run inside. For security, the folder and its contents must be owned by `root`.
|
|
||||||
Additionally, scripts must only be writable by `root`.
|
|
||||||
|
|
||||||
Your scripts will be run directly before the webserver completes startup. Scripts will be run by the `root` user.
|
|
||||||
This is advanced functionality; misusing it could break your installation or cause data loss.
|
|
||||||
|
|
||||||
For example, using Docker Compose:
|
|
||||||
|
|
||||||
|
|
||||||
.. code:: yaml
|
|
||||||
|
|
||||||
services:
|
|
||||||
# ...
|
|
||||||
webserver:
|
|
||||||
# ...
|
|
||||||
volumes:
|
|
||||||
- /path/to/my/scripts:/custom-cont-init.d:ro
|
|
||||||
|
|
||||||
.. _advanced-mysql-caveats:
|
|
||||||
|
|
||||||
MySQL Caveats
|
|
||||||
#############
|
|
||||||
|
|
||||||
Case Sensitivity
|
|
||||||
================
|
|
||||||
|
|
||||||
The database interface does not provide a method to configure a MySQL database to
|
|
||||||
be case sensitive. As a result, a user cannot create both a tag ``Name`` and a tag ``NAME``,
|
|
||||||
as they are considered the same.
|
|
||||||
|
|
||||||
Per the Django documentation, enabling this requires manual intervention. To enable
|
|
||||||
case-sensitive tables, you can execute the following command against each table:
|
|
||||||
|
|
||||||
``ALTER TABLE <table_name> CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;``
|
|
||||||
|
|
||||||
You can also set the default for new tables (this does NOT affect existing tables) with:
|
|
||||||
|
|
||||||
``ALTER DATABASE <db_name> CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;``
|
|
||||||
|
299
docs/api.rst
@ -1,303 +1,12 @@
|
|||||||
|
.. _api:
|
||||||
|
|
||||||
************
|
************
|
||||||
The REST API
|
The REST API
|
||||||
************
|
************
|
||||||
|
|
||||||
|
|
||||||
Paperless makes use of the `Django REST Framework`_ standard API interface.
|
.. cssclass:: redirect-notice
|
||||||
It provides a browsable API for most of its endpoints, which you can inspect
|
|
||||||
at ``http://<paperless-host>:<port>/api/``. This also documents most of the
|
|
||||||
available filters and ordering fields.
|
|
||||||
|
|
||||||
.. _Django REST Framework: http://django-rest-framework.org/
|
The Paperless-ngx documentation has permanently moved.
|
||||||
|
|
||||||
The API provides 5 main endpoints:
|
You will be redirected shortly...
|
||||||
|
|
||||||
* ``/api/documents/``: Full CRUD support, except POSTing new documents. See below.
|
|
||||||
* ``/api/correspondents/``: Full CRUD support.
|
|
||||||
* ``/api/document_types/``: Full CRUD support.
|
|
||||||
* ``/api/logs/``: Read-Only.
|
|
||||||
* ``/api/tags/``: Full CRUD support.
|
|
||||||
|
|
||||||
All of these endpoints except for the logging endpoint
|
|
||||||
allow you to fetch, edit and delete individual objects
|
|
||||||
by appending their primary key to the path, for example ``/api/documents/454/``.
|
|
||||||
|
|
||||||
The objects served by the document endpoint contain the following fields:
|
|
||||||
|
|
||||||
* ``id``: ID of the document. Read-only.
|
|
||||||
* ``title``: Title of the document.
|
|
||||||
* ``content``: Plain text content of the document.
|
|
||||||
* ``tags``: List of IDs of tags assigned to this document, or empty list.
|
|
||||||
* ``document_type``: Document type of this document, or null.
|
|
||||||
* ``correspondent``: Correspondent of this document or null.
|
|
||||||
* ``created``: The date time at which this document was created.
|
|
||||||
* ``created_date``: The date (YYYY-MM-DD) at which this document was created. Optional. If also passed with created, this is ignored.
|
|
||||||
* ``modified``: The date at which this document was last edited in paperless. Read-only.
|
|
||||||
* ``added``: The date at which this document was added to paperless. Read-only.
|
|
||||||
* ``archive_serial_number``: The identifier of this document in a physical document archive.
|
|
||||||
* ``original_file_name``: Verbose filename of the original document. Read-only.
|
|
||||||
* ``archived_file_name``: Verbose filename of the archived document. Read-only. Null if no archived document is available.
|
|
||||||
|
|
||||||
|
|
||||||
Downloading documents
|
|
||||||
#####################
|
|
||||||
|
|
||||||
In addition to that, the document endpoint offers these additional actions on
|
|
||||||
individual documents:
|
|
||||||
|
|
||||||
* ``/api/documents/<pk>/download/``: Download the document.
|
|
||||||
* ``/api/documents/<pk>/preview/``: Display the document inline,
|
|
||||||
without downloading it.
|
|
||||||
* ``/api/documents/<pk>/thumb/``: Download the PNG thumbnail of a document.
|
|
||||||
|
|
||||||
Paperless generates archived PDF/A documents from consumed files and stores both
|
|
||||||
the original files as well as the archived files. By default, the endpoints
|
|
||||||
for previews and downloads serve the archived file, if it is available.
|
|
||||||
Otherwise, the original file is served.
|
|
||||||
Some documents cannot be archived.
|
|
||||||
|
|
||||||
The endpoints correctly serve the response header fields ``Content-Disposition``
|
|
||||||
and ``Content-Type`` to indicate the filename for download and the type of content of
|
|
||||||
the document.
|
|
||||||
|
|
||||||
In order to download or preview the original document when an archived document is available,
|
|
||||||
supply the query parameter ``original=true``.
|
|
||||||
|
|
||||||
.. hint::
|
|
||||||
|
|
||||||
Paperless used to provide this functionality at ``/fetch/<pk>/preview``,
|
|
||||||
``/fetch/<pk>/thumb`` and ``/fetch/<pk>/doc``. Redirects to the new URLs
|
|
||||||
are in place. However, if you use these old URLs to access documents, you
|
|
||||||
should update your app or script to use the new URLs.
|
|
||||||
|
|
||||||
|
|
||||||
Getting document metadata
|
|
||||||
#########################
|
|
||||||
|
|
||||||
The API also has an endpoint to retrieve read-only metadata about specific documents. This
|
|
||||||
information is not served along with the document objects, since it requires reading
|
|
||||||
files and would therefore slow down document lists considerably.
|
|
||||||
|
|
||||||
Access the metadata of a document with an ID ``id`` at ``/api/documents/<id>/metadata/``.
|
|
||||||
|
|
||||||
The endpoint reports the following data:
|
|
||||||
|
|
||||||
* ``original_checksum``: MD5 checksum of the original document.
|
|
||||||
* ``original_size``: Size of the original document, in bytes.
|
|
||||||
* ``original_mime_type``: Mime type of the original document.
|
|
||||||
* ``media_filename``: Current filename of the document, under which it is stored inside the media directory.
|
|
||||||
* ``has_archive_version``: True, if this document is archived, false otherwise.
|
|
||||||
* ``original_metadata``: A list of metadata associated with the original document. See below.
|
|
||||||
* ``archive_checksum``: MD5 checksum of the archived document, or null.
|
|
||||||
* ``archive_size``: Size of the archived document in bytes, or null.
|
|
||||||
* ``archive_metadata``: Metadata associated with the archived document, or null. See below.
|
|
||||||
|
|
||||||
File metadata is reported as a list of objects in the following form:
|
|
||||||
|
|
||||||
.. code:: json
|
|
||||||
|
|
||||||
[
|
|
||||||
{
|
|
||||||
"namespace": "http://ns.adobe.com/pdf/1.3/",
|
|
||||||
"prefix": "pdf",
|
|
||||||
"key": "Producer",
|
|
||||||
"value": "SparklePDF, Fancy edition"
|
|
||||||
}
|
|
||||||
]
|
|
||||||
|
|
||||||
``namespace`` and ``prefix`` can be null. The actual metadata reported depends on the file type and the metadata
|
|
||||||
available in that specific document. Paperless only reports PDF metadata at this point.
|
|
||||||
|
|
||||||
Authorization
|
|
||||||
#############
|
|
||||||
|
|
||||||
The REST API provides three different forms of authentication.
|
|
||||||
|
|
||||||
1. Basic authentication
|
|
||||||
|
|
||||||
Authorize by providing an HTTP header in the form
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
Authorization: Basic <credentials>
|
|
||||||
|
|
||||||
where ``credentials`` is a base64-encoded string of ``<username>:<password>``
|
|
||||||
|
|
||||||
2. Session authentication
|
|
||||||
|
|
||||||
When you're logged into paperless in your browser, you're automatically
|
|
||||||
logged into the API as well and don't need to provide any authorization
|
|
||||||
headers.
|
|
||||||
|
|
||||||
3. Token authentication
|
|
||||||
|
|
||||||
Paperless also offers an endpoint to acquire authentication tokens.
|
|
||||||
|
|
||||||
POST a username and password as a form or json string to ``/api/token/``
|
|
||||||
and paperless will respond with a token, if the login data is correct.
|
|
||||||
This token can be used to authenticate other requests with the
|
|
||||||
following HTTP header:
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
Authorization: Token <token>
|
|
||||||
|
|
||||||
Tokens can be managed and revoked in the paperless admin.
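
As a minimal sketch of the two header-based schemes above, the following Python helpers build the ``Authorization`` header for basic and token authentication. The function names are hypothetical, and sending the request is left to whichever HTTP client you use.

```python
import base64


def basic_auth_header(username, password):
    """Build the header for basic authentication: base64 of '<username>:<password>'."""
    raw = f"{username}:{password}".encode("utf-8")
    credentials = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {credentials}"}


def token_auth_header(token):
    """Build the header for token authentication with a token from /api/token/."""
    return {"Authorization": f"Token {token}"}
```

Basic authentication sends the password with every request, so a token is usually the better choice for scripts that run unattended.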
|
|
||||||
|
|
||||||
Searching for documents
|
|
||||||
#######################
|
|
||||||
|
|
||||||
Full text searching is available on the ``/api/documents/`` endpoint. Two specific
|
|
||||||
query parameters cause the API to return full text search results:
|
|
||||||
|
|
||||||
* ``/api/documents/?query=your%20search%20query``: Search for a document using a full text query.
|
|
||||||
For details on the syntax, see :ref:`basic-usage_searching`.
|
|
||||||
|
|
||||||
* ``/api/documents/?more_like=1234``: Search for documents similar to the document with id 1234.
|
|
||||||
|
|
||||||
Pagination works exactly the same as it does for normal requests on this endpoint.
|
|
||||||
|
|
||||||
Certain limitations apply to full text queries:
|
|
||||||
|
|
||||||
* Results are always sorted by search score. The results matching the query best will show up first.
|
|
||||||
|
|
||||||
* Only a small subset of filtering parameters are supported.
|
|
||||||
|
|
||||||
Furthermore, each returned document has an additional ``__search_hit__`` attribute with various information
|
|
||||||
about the search results:
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
{
|
|
||||||
"count": 31,
|
|
||||||
"next": "http://localhost:8000/api/documents/?page=2&query=test",
|
|
||||||
"previous": null,
|
|
||||||
"results": [
|
|
||||||
|
|
||||||
...
|
|
||||||
|
|
||||||
{
|
|
||||||
"id": 123,
|
|
||||||
"title": "title",
|
|
||||||
"content": "content",
|
|
||||||
|
|
||||||
...
|
|
||||||
|
|
||||||
"__search_hit__": {
|
|
||||||
"score": 0.343,
|
|
||||||
"highlights": "text <span class=\"match\">Test</span> text",
|
|
||||||
"rank": 23
|
|
||||||
}
|
|
||||||
},
|
|
||||||
|
|
||||||
...
|
|
||||||
|
|
||||||
]
|
|
||||||
}
|
|
||||||
|
|
||||||
* ``score`` indicates how well this document matches the query relative to the other search results.
|
|
||||||
* ``highlights`` is an excerpt from the document content and highlights the search terms with ``<span>`` tags as shown above.
|
|
||||||
* ``rank`` is the index of this result among the search results. The first result will have rank 0.
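
The two search parameters can be assembled into a request URL as in the Python sketch below. The function name ``search_url`` is hypothetical, and the base URL is the ``localhost:8000`` address from the sample response above.

```python
from urllib.parse import quote, urlencode


def search_url(base_url, query=None, more_like=None):
    """Build a /api/documents/ URL for a full text or a 'more like this' search."""
    params = {}
    if query is not None:
        params["query"] = query
    if more_like is not None:
        params["more_like"] = more_like
    # quote_via=quote encodes spaces as %20 rather than '+'.
    encoded = urlencode(params, quote_via=quote)
    return f"{base_url.rstrip('/')}/api/documents/?{encoded}"


full_text = search_url("http://localhost:8000", query="your search query")
similar = search_url("http://localhost:8000", more_like=1234)
```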
|
|
||||||
|
|
||||||
``/api/search/autocomplete/``
|
|
||||||
=============================
|
|
||||||
|
|
||||||
Get auto completions for a partial search term.
|
|
||||||
|
|
||||||
Query parameters:
|
|
||||||
|
|
||||||
* ``term``: The incomplete term.
|
|
||||||
* ``limit``: Maximum number of results. Defaults to 10.
|
|
||||||
|
|
||||||
Results returned by the endpoint are ordered by importance of the term in the
|
|
||||||
document index. The first result is the term that has the highest Tf/Idf score
|
|
||||||
in the index.
|
|
||||||
|
|
||||||
.. code:: json
|
|
||||||
|
|
||||||
[
|
|
||||||
"term1",
|
|
||||||
"term3",
|
|
||||||
"term6",
|
|
||||||
"term4"
|
|
||||||
]
|
|
||||||
|
|
||||||
|
|
||||||
.. _api-file_uploads:
|
|
||||||
|
|
||||||
POSTing documents
|
|
||||||
#################
|
|
||||||
|
|
||||||
The API provides a special endpoint for file uploads:
|
|
||||||
|
|
||||||
``/api/documents/post_document/``
|
|
||||||
|
|
||||||
POST a multipart form to this endpoint, where the form field ``document`` contains
|
|
||||||
the document that you want to upload to paperless. The filename is sanitized and
|
|
||||||
then used to store the document in a temporary directory, and the consumer will
|
|
||||||
be instructed to consume the document from there.
|
|
||||||
|
|
||||||
The endpoint supports the following optional form fields:
|
|
||||||
|
|
||||||
* ``title``: Specify a title that the consumer should use for the document.
|
|
||||||
* ``created``: Specify the DateTime when the document was created (e.g. "2016-04-19" or "2016-04-19 06:15:00+02:00").
|
|
||||||
* ``correspondent``: Specify the ID of a correspondent that the consumer should use for the document.
|
|
||||||
* ``document_type``: Similar to correspondent.
|
|
||||||
* ``tags``: Similar to correspondent. Specify this multiple times to have multiple tags added
|
|
||||||
to the document.
|
|
||||||
|
|
||||||
|
|
||||||
The endpoint will immediately return "OK" if the document consumption process
|
|
||||||
was started successfully. No additional status information about the consumption
|
|
||||||
process itself is available, since that happens in a different process.
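
The optional form fields can be prepared as a list of name/value pairs, which is how most HTTP clients accept repeated fields such as ``tags``. The Python helper below is a hypothetical sketch that only builds these pairs. The file itself still has to be sent in the ``document`` field of the multipart form by your HTTP client.

```python
def upload_form_fields(title=None, created=None, correspondent=None,
                       document_type=None, tags=()):
    """Return the optional post_document form fields as (name, value) pairs."""
    fields = []
    if title is not None:
        fields.append(("title", title))
    if created is not None:
        fields.append(("created", created))
    if correspondent is not None:
        fields.append(("correspondent", str(correspondent)))
    if document_type is not None:
        fields.append(("document_type", str(document_type)))
    # 'tags' is specified once per tag ID to attach multiple tags.
    for tag_id in tags:
        fields.append(("tags", str(tag_id)))
    return fields


fields = upload_form_fields(title="Invoice", created="2016-04-19", tags=[3, 7])
```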
|
|
||||||
|
|
||||||
|
|
||||||
.. _api-versioning:
|
|
||||||
|
|
||||||
API Versioning
|
|
||||||
##############
|
|
||||||
|
|
||||||
The REST API is versioned since Paperless-ngx 1.3.0.
|
|
||||||
|
|
||||||
* Versioning ensures that changes to the API don't break older clients.
|
|
||||||
* Clients specify the specific version of the API they wish to use with every request and Paperless will handle the request using the specified API version.
|
|
||||||
* Even if the underlying data model changes, older API versions will always serve compatible data.
|
|
||||||
* If no version is specified, Paperless will serve version 1 to ensure compatibility with older clients that do not request a specific API version.
|
|
||||||
|
|
||||||
API versions are specified by submitting an additional HTTP ``Accept`` header with every request:
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
Accept: application/json; version=6
|
|
||||||
|
|
||||||
If an invalid version is specified, Paperless 1.3.0 will respond with "406 Not Acceptable" and an error message in the body.
|
|
||||||
Earlier versions of Paperless will serve API version 1 regardless of whether a version is specified via the ``Accept`` header.
|
|
||||||
|
|
||||||
If a client wishes to verify whether it is compatible with any given server, the following procedure should be performed:
|
|
||||||
|
|
||||||
1. Perform an *authenticated* request against any API endpoint. If the server is on version 1.3.0 or newer, the server will
|
|
||||||
add two custom headers to the response:
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
X-Api-Version: 2
|
|
||||||
X-Version: 1.3.0
|
|
||||||
|
|
||||||
2. Determine whether the client is compatible with this server based on the presence/absence of these headers and their values if present.
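
The compatibility check can be sketched in Python as follows. The function names are hypothetical. The sketch assumes that ``X-Api-Version`` reports the highest API version the server supports, and it treats a missing header as version 1, since older servers only serve that version.

```python
def accept_header(api_version):
    """Build the Accept header that requests a specific API version."""
    return {"Accept": f"application/json; version={api_version}"}


def server_api_version(response_headers):
    """Return the version from X-Api-Version, or 1 if the header is absent."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in response_headers.items()}
    value = lowered.get("x-api-version")
    return int(value) if value is not None else 1


def is_compatible(response_headers, required_version):
    """Check whether the server can serve the API version the client needs."""
    return server_api_version(response_headers) >= required_version
```

A client that needs version 2 would call ``is_compatible(headers, 2)`` on the headers of its first authenticated response and fall back or abort if the result is false.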
|
|
||||||
|
|
||||||
|
|
||||||
API Changelog
|
|
||||||
=============
|
|
||||||
|
|
||||||
Version 1
|
|
||||||
---------
|
|
||||||
|
|
||||||
Initial API version.
|
|
||||||
|
|
||||||
Version 2
|
|
||||||
---------
|
|
||||||
|
|
||||||
* Added field ``Tag.color``. This read/write string field contains a hex color such as ``#a6cee3``.
|
|
||||||
* Added read-only field ``Tag.text_color``. This field contains the text color to use for a specific tag, which is either black or white depending on the brightness of ``Tag.color``.
|
|
||||||
* Removed field ``Tag.colour``.
|
|
||||||
|
2445
docs/changelog.md
File diff suppressed because it is too large
11
docs/changelog.rst
Normal file
@ -0,0 +1,11 @@
|
|||||||
|
.. _changelog:
|
||||||
|
|
||||||
|
*********
|
||||||
|
Changelog
|
||||||
|
*********
|
||||||
|
|
||||||
|
.. cssclass:: redirect-notice
|
||||||
|
|
||||||
|
The Paperless-ngx documentation has permanently moved.
|
||||||
|
|
||||||
|
You will be redirected shortly...
|
@ -4,928 +4,9 @@
|
|||||||
Configuration
|
Configuration
|
||||||
*************
|
*************
|
||||||
|
|
||||||
Paperless provides a wide range of customizations.
|
|
||||||
Depending on how you run paperless, these settings have to be defined in different
|
|
||||||
places.
|
|
||||||
|
|
||||||
* If you run paperless on docker, ``paperless.conf`` is not used. Rather, configure
|
.. cssclass:: redirect-notice
|
||||||
paperless by copying necessary options to ``docker-compose.env``.
|
|
||||||
* If you are running paperless on anything else, paperless will search for the
|
|
||||||
configuration file in these locations and use the first one it finds:
|
|
||||||
|
|
||||||
.. code::
|
The Paperless-ngx documentation has permanently moved.
|
||||||
|
|
||||||
/path/to/paperless/paperless.conf
|
You will be redirected shortly...
|
||||||
/etc/paperless.conf
|
|
||||||
/usr/local/etc/paperless.conf
|
|
||||||
|
|
||||||
|
|
||||||
Required services
|
|
||||||
#################
|
|
||||||
|
|
||||||
PAPERLESS_REDIS=<url>
|
|
||||||
This is required for processing scheduled tasks such as email fetching, index
|
|
||||||
optimization and for training the automatic document matcher.
|
|
||||||
|
|
||||||
* If your Redis server needs login credentials PAPERLESS_REDIS = ``redis://<username>:<password>@<host>:<port>``
|
|
||||||
|
|
||||||
* With the requirepass option PAPERLESS_REDIS = ``redis://:<password>@<host>:<port>``
|
|
||||||
|
|
||||||
`More information on securing your Redis Instance <https://redis.io/docs/getting-started/#securing-redis>`_.
|
|
||||||
|
|
||||||
Defaults to redis://localhost:6379.
|
|
||||||
|
|
||||||
PAPERLESS_DBENGINE=<engine_name>
|
|
||||||
Optional. Selects PostgreSQL or MariaDB as the database engine.
|
|
||||||
Available options are `postgresql` and `mariadb`.
|
|
||||||
|
|
||||||
Default is `postgresql`.
|
|
||||||
|
|
||||||
.. warning::
|
|
||||||
|
|
||||||
Using MariaDB comes with some caveats. See :ref:`advanced-mysql-caveats` for details.
|
|
||||||
|
|
||||||
|
|
||||||
PAPERLESS_DBHOST=<hostname>
|
|
||||||
By default, sqlite is used as the database backend. This can be changed here.
|
|
||||||
|
|
||||||
Set PAPERLESS_DBHOST and another database will be used instead of sqlite.
|
|
||||||
|
|
||||||
PAPERLESS_DBPORT=<port>
|
|
||||||
Adjust port if necessary.
|
|
||||||
|
|
||||||
Default is 5432.
|
|
||||||
|
|
||||||
PAPERLESS_DBNAME=<name>
|
|
||||||
Database name in PostgreSQL or MariaDB.
|
|
||||||
|
|
||||||
Defaults to "paperless".
|
|
||||||
|
|
||||||
PAPERLESS_DBUSER=<name>
|
|
||||||
Database user in PostgreSQL or MariaDB.
|
|
||||||
|
|
||||||
Defaults to "paperless".
|
|
||||||
|
|
||||||
PAPERLESS_DBPASS=<password>
|
|
||||||
Database password for PostgreSQL or MariaDB.
|
|
||||||
|
|
||||||
Defaults to "paperless".
|
|
||||||
|
|
||||||
PAPERLESS_DBSSLMODE=<mode>
|
|
||||||
SSL mode to use when connecting to PostgreSQL.
|
|
||||||
|
|
||||||
See `the official documentation about sslmode <https://www.postgresql.org/docs/current/libpq-ssl.html>`_.
|
|
||||||
|
|
||||||
Default is ``prefer``.
|
|
||||||
|
|
||||||
PAPERLESS_DB_TIMEOUT=<float>
|
|
||||||
Amount of time for a database connection to wait for the database to unlock.
|
|
||||||
Mostly applicable to an sqlite-based installation; consider changing to postgresql
|
|
||||||
if you need to increase this.
|
|
||||||
|
|
||||||
Defaults to unset, keeping the Django defaults.
|
|
||||||
|
|
||||||
Paths and folders
|
|
||||||
#################
|
|
||||||
|
|
||||||
PAPERLESS_CONSUMPTION_DIR=<path>
|
|
||||||
This is where your documents should go to be consumed. Make sure that it exists
|
|
||||||
and that the user running the paperless service can read/write its contents
|
|
||||||
before you start Paperless.
|
|
||||||
|
|
||||||
Don't change this when using docker, as it only changes the path within the
|
|
||||||
container. Change the local consumption directory in the docker-compose.yml
|
|
||||||
file instead.
|
|
||||||
|
|
||||||
Defaults to "../consume/", relative to the "src" directory.
|
|
||||||
|
|
||||||
PAPERLESS_DATA_DIR=<path>
|
|
||||||
This is where paperless stores all its data (search index, SQLite database,
|
|
||||||
classification model, etc).
|
|
||||||
|
|
||||||
Defaults to "../data/", relative to the "src" directory.
|
|
||||||
|
|
||||||
PAPERLESS_TRASH_DIR=<path>
|
|
||||||
Instead of removing deleted documents, they are moved to this directory.
|
|
||||||
|
|
||||||
This must be writeable by the user running paperless. When running inside
|
|
||||||
docker, ensure that this path is within a permanent volume (such as
|
|
||||||
"../media/trash") so it won't get lost on upgrades.
|
|
||||||
|
|
||||||
Defaults to empty (i.e. really delete documents).
|
|
||||||
|
|
||||||
PAPERLESS_MEDIA_ROOT=<path>
|
|
||||||
This is where your documents and thumbnails are stored.
|
|
||||||
|
|
||||||
You can set this and PAPERLESS_DATA_DIR to the same folder to have paperless
|
|
||||||
store all its data within the same volume.
|
|
||||||
|
|
||||||
Defaults to "../media/", relative to the "src" directory.
|
|
||||||
|
|
||||||
PAPERLESS_STATICDIR=<path>
|
|
||||||
Override the default STATIC_ROOT here. This is where all static files
|
|
||||||
created using "collectstatic" manager command are stored.
|
|
||||||
|
|
||||||
Unless you're doing something fancy, there is no need to override this.
|
|
||||||
|
|
||||||
Defaults to "../static/", relative to the "src" directory.
|
|
||||||
|
|
||||||
PAPERLESS_FILENAME_FORMAT=<format>
|
|
||||||
Changes the filenames paperless uses to store documents in the media directory.
|
|
||||||
See :ref:`advanced-file_name_handling` for details.
|
|
||||||
|
|
||||||
Default is none, which disables this feature.
|
|
||||||
|
|
||||||
PAPERLESS_FILENAME_FORMAT_REMOVE_NONE=<bool>
|
|
||||||
Tells paperless to replace placeholders in `PAPERLESS_FILENAME_FORMAT` that would resolve
|
|
||||||
to 'none' to be omitted from the resulting filename. This also holds true for directory
|
|
||||||
names.
|
|
||||||
See :ref:`advanced-file_name_handling` for details.
|
|
||||||
|
|
||||||
Defaults to `false` which disables this feature.
|
|
||||||
|
|
||||||
PAPERLESS_LOGGING_DIR=<path>
|
|
||||||
This is where paperless will store log files.
|
|
||||||
|
|
||||||
Defaults to "``PAPERLESS_DATA_DIR``/log/".
|
|
||||||
|
|
||||||
|
|
||||||
Logging
|
|
||||||
#######
|
|
||||||
|
|
||||||
PAPERLESS_LOGROTATE_MAX_SIZE=<num>
|
|
||||||
Maximum file size for log files before they are rotated, in bytes.
|
|
||||||
|
|
||||||
Defaults to 1 MiB.
|
|
||||||
|
|
||||||
PAPERLESS_LOGROTATE_MAX_BACKUPS=<num>
|
|
||||||
Number of rotated log files to keep.
|
|
||||||
|
|
||||||
Defaults to 20.
|
|
||||||
|
|
||||||
.. _hosting-and-security:
|
|
||||||
|
|
||||||
Hosting & Security
|
|
||||||
##################
|
|
||||||
|
|
||||||
PAPERLESS_SECRET_KEY=<key>
|
|
||||||
Paperless uses this to make session tokens. If you expose paperless on the
|
|
||||||
internet, you need to change this, since the default secret is well known.
|
|
||||||
|
|
||||||
Use any sequence of characters. The more, the better. You don't need to
|
|
||||||
remember this. Just face-roll your keyboard.
|
|
||||||
|
|
||||||
Default is listed in the file ``src/paperless/settings.py``.
|
|
||||||
|
|
||||||
PAPERLESS_URL=<url>
|
|
||||||
This setting can be used to set the three options below (ALLOWED_HOSTS,
|
|
||||||
CORS_ALLOWED_HOSTS and CSRF_TRUSTED_ORIGINS). If the other options are
|
|
||||||
set the values will be combined with this one. Do not include a trailing
|
|
||||||
slash. E.g. https://paperless.domain.com
|
|
||||||
|
|
||||||
Defaults to empty string, leaving the other settings unaffected.
|
|
||||||
|
|
||||||
PAPERLESS_CSRF_TRUSTED_ORIGINS=<comma-separated-list>
|
|
||||||
A list of trusted origins for unsafe requests (e.g. POST). As of Django 4.0
|
|
||||||
this is required to access the Django admin via the web.
|
|
||||||
See https://docs.djangoproject.com/en/4.0/ref/settings/#csrf-trusted-origins
|
|
||||||
|
|
||||||
Can also be set using PAPERLESS_URL (see above).
|
|
||||||
|
|
||||||
Defaults to empty string, which does not add any origins to the trusted list.
|
|
||||||
|
|
||||||
PAPERLESS_ALLOWED_HOSTS=<comma-separated-list>
|
|
||||||
If you're planning on putting Paperless on the open internet, then you
|
|
||||||
really should set this value to the domain name you're using. Failing to do
|
|
||||||
so leaves you open to HTTP host header attacks:
|
|
||||||
https://docs.djangoproject.com/en/3.1/topics/security/#host-header-validation
|
|
||||||
|
|
||||||
Just remember that this is a comma-separated list, so "example.com" is fine,
|
|
||||||
as is "example.com,www.example.com", but NOT " example.com" or "example.com,"
|
|
||||||
|
|
||||||
Can also be set using PAPERLESS_URL (see above).
|
|
||||||
|
|
||||||
If manually set, please remember to include "localhost". Otherwise docker
|
|
||||||
healthcheck will fail.
|
|
||||||
|
|
||||||
Defaults to "*", which is all hosts.
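The comma-separated format rules above (no surrounding spaces, no trailing comma) can be checked before deploying with a small Python sketch. The function ``check_host_list`` is hypothetical and is not part of Paperless; it only mirrors the examples given in this section.

```python
from typing import List


def check_host_list(value: str) -> List[str]:
    """Split a comma-separated host list, rejecting blanks and stray whitespace."""
    hosts = value.split(",")
    for host in hosts:
        # An empty entry comes from a leading/trailing/double comma;
        # a stripped mismatch comes from spaces around an entry.
        if host == "" or host != host.strip():
            raise ValueError(f"invalid entry {host!r} in {value!r}")
    return hosts


if __name__ == "__main__":
    print(check_host_list("example.com,www.example.com,localhost"))
```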
|
|
||||||
|
|
||||||
PAPERLESS_CORS_ALLOWED_HOSTS=<comma-separated-list>
|
|
||||||
You need to add your servers to the list of allowed hosts that can do CORS
|
|
||||||
calls. Set this to your public domain name.
|
|
||||||
|
|
||||||
Can also be set using PAPERLESS_URL (see above).
|
|
||||||
|
|
||||||
Defaults to "http://localhost:8000".
|
|
||||||
|
|
||||||
PAPERLESS_FORCE_SCRIPT_NAME=<path>
|
|
||||||
To host paperless under a subpath url like example.com/paperless you set
|
|
||||||
this value to /paperless. No trailing slash!
|
|
||||||
|
|
||||||
Defaults to none, which hosts paperless at "/".
|
|
||||||
|
|
||||||
PAPERLESS_STATIC_URL=<path>
|
|
||||||
Override the STATIC_URL here. Unless you're hosting Paperless off a
|
|
||||||
subpath like /paperless/, you probably don't need to change this.
|
|
||||||
If you do change it, be sure to include the trailing slash.
|
|
||||||
|
|
||||||
Defaults to "/static/".
|
|
||||||
|
|
||||||
.. note::
|
|
||||||
|
|
||||||
When hosting paperless behind a reverse proxy like Traefik or Nginx at a subpath e.g.
|
|
||||||
example.com/paperlessngx you will also need to set ``PAPERLESS_FORCE_SCRIPT_NAME``
|
|
||||||
(see above).
|
|
||||||
|
|
||||||
PAPERLESS_AUTO_LOGIN_USERNAME=<username>
|
|
||||||
Specify a username here so that paperless will automatically perform login
|
|
||||||
with the selected user.
|
|
||||||
|
|
||||||
.. danger::
|
|
||||||
|
|
||||||
Do not use this when exposing paperless on the internet. There are no
|
|
||||||
checks in place that would prevent you from doing this.
|
|
||||||
|
|
||||||
Defaults to none, which disables this feature.
|
|
||||||
|
|
||||||
PAPERLESS_ADMIN_USER=<username>
|
|
||||||
If this environment variable is specified, Paperless automatically creates
|
|
||||||
a superuser with the provided username at start. This is useful in cases
|
|
||||||
where you cannot run the `createsuperuser` command separately, such as Kubernetes
|
|
||||||
or AWS ECS.
|
|
||||||
|
|
||||||
Requires `PAPERLESS_ADMIN_PASSWORD` to be set.
|
|
||||||
|
|
||||||
.. note::
|
|
||||||
|
|
||||||
This will not change an existing [super]user's password, nor will
|
|
||||||
it recreate a user that already exists. You can leave this throughout
|
|
||||||
the lifecycle of the containers.
|
|
||||||
|
|
||||||
PAPERLESS_ADMIN_MAIL=<email>
|
|
||||||
(Optional) Specify superuser email address. Only used when
|
|
||||||
`PAPERLESS_ADMIN_USER` is set.
|
|
||||||
|
|
||||||
Defaults to ``root@localhost``.
|
|
||||||
|
|
||||||
PAPERLESS_ADMIN_PASSWORD=<password>
|
|
||||||
Only used when `PAPERLESS_ADMIN_USER` is set.
|
|
||||||
This will be the password of the automatically created superuser.
|
|
||||||
|
|
||||||
|
|
||||||
PAPERLESS_COOKIE_PREFIX=<str>
|
|
||||||
Specify a prefix that is added to the cookies used by paperless to identify
|
|
||||||
the currently logged in user. This is useful for when you're running two
|
|
||||||
instances of paperless on the same host.
|
|
||||||
|
|
||||||
After changing this, you will have to log in again.
|
|
||||||
|
|
||||||
Defaults to ``""``, which does not alter the cookie names.
|
|
||||||
|
|
||||||
PAPERLESS_ENABLE_HTTP_REMOTE_USER=<bool>
|
|
||||||
Allows authentication via HTTP_REMOTE_USER which is used by some SSO
|
|
||||||
applications.
|
|
||||||
|
|
||||||
.. warning::
|
|
||||||
|
|
||||||
This will allow authentication by simply adding a ``Remote-User: <username>`` header
|
|
||||||
to a request. Use with care! You especially *must* ensure that any such header is not
|
|
||||||
passed from your proxy server to paperless.
|
|
||||||
|
|
||||||
If you're exposing paperless to the internet directly, do not use this.
|
|
||||||
|
|
||||||
Also see the warning `in the official documentation <https://docs.djangoproject.com/en/3.1/howto/auth-remote-user/#configuration>`_.
|
|
||||||
|
|
||||||
Defaults to `false` which disables this feature.
|
|
||||||
|
|
||||||
PAPERLESS_HTTP_REMOTE_USER_HEADER_NAME=<str>
|
|
||||||
If `PAPERLESS_ENABLE_HTTP_REMOTE_USER` is enabled, this property allows you to
|
|
||||||
customize the name of the HTTP header from which the authenticated username
|
|
||||||
is extracted. Values are in terms of
|
|
||||||
`HttpRequest.META <https://docs.djangoproject.com/en/3.1/ref/request-response/#django.http.HttpRequest.META>`_.
|
|
||||||
Thus, the configured value must start with `HTTP_` followed by the
|
|
||||||
normalized actual header name.
|
|
||||||
|
|
||||||
Defaults to `HTTP_REMOTE_USER`.
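To illustrate the "normalized actual header name" mentioned above: Django derives ``META`` keys from request headers by upper-casing the name, replacing hyphens with underscores, and adding the ``HTTP_`` prefix (``Content-Length`` and ``Content-Type`` are the documented exceptions). The helper below, with the hypothetical name ``meta_key``, sketches that mapping so you can work out the value to configure for a given proxy header.

```python
def meta_key(header_name: str) -> str:
    """Map an HTTP header name to the key Django uses in HttpRequest.META."""
    return "HTTP_" + header_name.upper().replace("-", "_")


if __name__ == "__main__":
    # The header used in the warning above maps to the documented default.
    print(meta_key("Remote-User"))
    # A hypothetical custom header set by a reverse proxy.
    print(meta_key("X-Forwarded-User"))
```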
|
|
||||||
|
|
||||||
PAPERLESS_LOGOUT_REDIRECT_URL=<str>
|
|
||||||
URL to redirect the user to after a logout. This can be used together with
|
|
||||||
`PAPERLESS_ENABLE_HTTP_REMOTE_USER` to redirect the user back to the SSO
|
|
||||||
application's logout page.
|
|
||||||
|
|
||||||
Defaults to None, which disables this feature.
|
|
||||||
|
|
||||||
.. _configuration-ocr:
|
|
||||||
|
|
||||||
OCR settings
|
|
||||||
############
|
|
||||||
|
|
||||||
Paperless uses `OCRmyPDF <https://ocrmypdf.readthedocs.io/en/latest/>`_ for
|
|
||||||
performing OCR on documents and images. Paperless uses sensible defaults for
|
|
||||||
most settings, but all of them can be configured to your needs.
|
|
||||||
|
|
||||||
PAPERLESS_OCR_LANGUAGE=<lang>
|
|
||||||
Customize the language that paperless will attempt to use when
|
|
||||||
parsing documents.
|
|
||||||
|
|
||||||
It should be a 3-letter language code consistent with ISO
|
|
||||||
639: https://www.loc.gov/standards/iso639-2/php/code_list.php
|
|
||||||
|
|
||||||
Set this to the language most of your documents are written in.
|
|
||||||
|
|
||||||
This can be a combination of multiple languages such as ``deu+eng``,
|
|
||||||
in which case tesseract will use whatever language matches best.
|
|
||||||
Keep in mind that tesseract uses much more cpu time with multiple
|
|
||||||
languages enabled.
|
|
||||||
|
|
||||||
Defaults to "eng".
|
|
||||||
|
|
||||||
Note: If your language code contains a '-', such as chi-sim, replace it with '_' and use chi_sim.
|
|
||||||
|
|
||||||
PAPERLESS_OCR_MODE=<mode>
|
|
||||||
Tell paperless when and how to perform ocr on your documents. Four modes
|
|
||||||
are available:
|
|
||||||
|
|
||||||
* ``skip``: Paperless skips pages that already contain text and will perform ocr only on pages
|
|
||||||
where no text is present. This is the safest option.
|
|
||||||
* ``skip_noarchive``: In addition to skip, paperless won't create an
|
|
||||||
archived version of your documents when it finds any text in them.
|
|
||||||
This is useful if you don't want to have two almost-identical versions
|
|
||||||
of your digital documents in the media folder. This is the fastest option.
|
|
||||||
* ``redo``: Paperless will OCR all pages of your documents and attempt to
|
|
||||||
replace any existing text layers with new text. This will be useful for
|
|
||||||
documents from scanners that already performed OCR with insufficient
|
|
||||||
results. It will also perform OCR on purely digital documents.
|
|
||||||
|
|
||||||
This option may fail on some documents that have features that cannot
|
|
||||||
be removed, such as forms. In this case, the text from the document is
|
|
||||||
used instead.
|
|
||||||
* ``force``: Paperless rasterizes your documents, converting any text
|
|
||||||
into images and puts the OCRed text on top. This works for all documents,
|
|
||||||
however, the resulting document may be significantly larger and text
|
|
||||||
won't appear as sharp when zoomed in.
|
|
||||||
|
|
||||||
The default is ``skip``, which only performs OCR when necessary and always
|
|
||||||
creates archived documents.
|
|
||||||
|
|
||||||
Read more about this in the `OCRmyPDF documentation <https://ocrmypdf.readthedocs.io/en/latest/advanced.html#when-ocr-is-skipped>`_.
|
|
||||||
|
|
||||||
PAPERLESS_OCR_CLEAN=<mode>
|
|
||||||
Tells paperless to use ``unpaper`` to clean any input document before
|
|
||||||
sending it to tesseract. This uses more resources, but generally results
|
|
||||||
in better OCR results. The following modes are available:
|
|
||||||
|
|
||||||
* ``clean``: Apply unpaper.
|
|
||||||
* ``clean-final``: Apply unpaper, and use the cleaned images to build the
|
|
||||||
output file instead of the original images.
|
|
||||||
* ``none``: Do not apply unpaper.
|
|
||||||
|
|
||||||
Defaults to ``clean``.
|
|
||||||
|
|
||||||
.. note::
|
|
||||||
|
|
||||||
``clean-final`` is incompatible with ocr mode ``redo``. When both
|
|
||||||
``clean-final`` and the ocr mode ``redo`` are configured, ``clean``
|
|
||||||
is used instead.
|
|
||||||
|
|
||||||
PAPERLESS_OCR_DESKEW=<bool>
|
|
||||||
Tells paperless to correct skewing (slight rotation of input images mainly
|
|
||||||
due to improper scanning).
|
|
||||||
|
|
||||||
Defaults to ``true``, which enables this feature.
|
|
||||||
|
|
||||||
.. note::
|
|
||||||
|
|
||||||
Deskewing is incompatible with ocr mode ``redo``. Deskewing will get
|
|
||||||
disabled automatically if ``redo`` is used as the ocr mode.
|
|
||||||
|
|
||||||
PAPERLESS_OCR_ROTATE_PAGES=<bool>
|
|
||||||
Tells paperless to correct page rotation (90°, 180° and 270° rotation).
|
|
||||||
|
|
||||||
If you notice that paperless is not rotating incorrectly rotated
|
|
||||||
pages (or vice versa), try adjusting the threshold up or down (see below).
|
|
||||||
|
|
||||||
Defaults to ``true``, which enables this feature.
|
|
||||||
|
|
||||||
|
|
||||||
PAPERLESS_OCR_ROTATE_PAGES_THRESHOLD=<num>
|
|
||||||
Adjust the threshold for automatic page rotation by ``PAPERLESS_OCR_ROTATE_PAGES``.
|
|
||||||
This is an arbitrary value reported by tesseract. "15" is a very conservative value,
|
|
||||||
whereas "2" is a very aggressive option and will often result in correctly rotated pages
|
|
||||||
being rotated as well.
|
|
||||||
|
|
||||||
Defaults to "12".
|
|
||||||
|
|
||||||
PAPERLESS_OCR_OUTPUT_TYPE=<type>
|
|
||||||
Specify the type of PDF documents that paperless should produce.
|
|
||||||
|
|
||||||
* ``pdf``: Modify the PDF document as little as possible.
|
|
||||||
* ``pdfa``: Convert PDF documents into PDF/A-2b documents, which is a
|
|
||||||
subset of the entire PDF specification and meant for storing
|
|
||||||
documents long term.
|
|
||||||
* ``pdfa-1``, ``pdfa-2``, ``pdfa-3`` to specify the exact version of
|
|
||||||
PDF/A you wish to use.
|
|
||||||
|
|
||||||
If not specified, ``pdfa`` is used. Remember that paperless also keeps
|
|
||||||
the original input file as well as the archived version.
|
|
||||||
|
|
||||||
|
|
||||||
PAPERLESS_OCR_PAGES=<num>
|
|
||||||
Tells paperless to use only the specified number of pages for OCR. Documents
|
|
||||||
with fewer pages than the specified number get OCR'ed completely.
|
|
||||||
|
|
||||||
Specifying 1 here will only use the first page.
|
|
||||||
|
|
||||||
When combined with ``PAPERLESS_OCR_MODE=redo`` or ``PAPERLESS_OCR_MODE=force``,
|
|
||||||
paperless will not modify any text it finds on excluded pages and copy it
|
|
||||||
verbatim.
|
|
||||||
|
|
||||||
Defaults to 0, which disables this feature and always uses all pages.
|
|
||||||
|
|
||||||
PAPERLESS_OCR_IMAGE_DPI=<num>
|
|
||||||
Paperless will OCR any images you put into the system and convert them
|
|
||||||
into PDF documents. This is useful if your scanner produces images.
|
|
||||||
In order to do so, paperless needs to know the DPI of the image.
|
|
||||||
Most images from scanners will have this information embedded and
|
|
||||||
paperless will detect and use that information. In case this fails, it
|
|
||||||
uses this value as a fallback.
|
|
||||||
|
|
||||||
Set this to the DPI your scanner produces images at.
|
|
||||||
|
|
||||||
Default is none, which will automatically calculate image DPI so that
|
|
||||||
the produced PDF documents are A4 sized.
|
|
||||||
|
|
||||||
PAPERLESS_OCR_MAX_IMAGE_PIXELS=<num>
|
|
||||||
Paperless will raise a warning when OCRing images which are over this limit and
|
|
||||||
will not OCR images which are more than twice this limit. Note this does not
|
|
||||||
prevent the document from being consumed, but could result in missing text content.
|
|
||||||
|
|
||||||
If unset, will default to the value determined by
|
|
||||||
`Pillow <https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.MAX_IMAGE_PIXELS>`_.
|
|
||||||
|
|
||||||
.. note::
|
|
||||||
|
|
||||||
Increasing this limit could cause Paperless to consume additional resources
|
|
||||||
when consuming a file. Be sure you have sufficient system resources.
|
|
||||||
|
|
||||||
.. caution::
|
|
||||||
|
|
||||||
The limit is intended to prevent malicious files from consuming system resources
|
|
||||||
and causing crashes and other errors. Only increase this value if you are certain
|
|
||||||
your documents are not malicious and you need the text which was not OCRed.
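The two thresholds described for ``PAPERLESS_OCR_MAX_IMAGE_PIXELS`` (a warning above the limit, no OCR above twice the limit) can be summarized in a short Python sketch. The function ``ocr_decision`` and its return values are hypothetical labels for illustration; they are not names used by Paperless or Pillow.

```python
def ocr_decision(pixels: int, limit: int) -> str:
    """Classify an image by pixel count against the configured limit."""
    if pixels > 2 * limit:
        return "skip"  # more than twice the limit: the image is not OCRed
    if pixels > limit:
        return "warn"  # over the limit: OCR proceeds with a warning
    return "ok"


if __name__ == "__main__":
    limit = 100_000_000  # example value only, not a recommended setting
    for pixels in (50_000_000, 150_000_000, 250_000_000):
        print(pixels, ocr_decision(pixels, limit))
```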
|
|
||||||
|
|
||||||
PAPERLESS_OCR_USER_ARGS=<json>
|
|
||||||
OCRmyPDF offers many more options. Use this parameter to specify any
|
|
||||||
additional arguments you wish to pass to OCRmyPDF. Since Paperless uses
|
|
||||||
the API of OCRmyPDF, you have to specify these in a format that can be
|
|
||||||
passed to the API. See `the API reference of OCRmyPDF <https://ocrmypdf.readthedocs.io/en/latest/api.html#reference>`_
|
|
||||||
for valid parameters. All command line options are supported, but they
|
|
||||||
use underscores instead of dashes.
|
|
||||||
|
|
||||||
.. caution::
|
|
||||||
|
|
||||||
Paperless has been tested to work with the OCR options provided
|
|
||||||
above. There are many options that are incompatible with each other,
|
|
||||||
so specifying invalid options may prevent paperless from consuming
|
|
||||||
any documents.
|
|
||||||
|
|
||||||
Specify arguments as a JSON dictionary. Keep note of lower case booleans
|
|
||||||
and double quoted parameter names and strings. Examples:
|
|
||||||
|
|
||||||
.. code:: json
|
|
||||||
|
|
||||||
{"deskew": true, "optimize": 3, "unpaper_args": "--pre-rotate 90"}
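Because the value must be valid JSON, it can be worth parsing it once before putting it into your configuration. The sketch below uses Python's standard ``json`` module; the helper name ``parse_user_args`` is hypothetical, and it only checks the syntax, not whether OCRmyPDF accepts the individual options.

```python
import json
from typing import Any, Dict


def parse_user_args(raw: str) -> Dict[str, Any]:
    """Parse a PAPERLESS_OCR_USER_ARGS value and require a JSON dictionary."""
    args = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(args, dict):
        raise ValueError("PAPERLESS_OCR_USER_ARGS must be a JSON dictionary")
    return args


if __name__ == "__main__":
    print(parse_user_args('{"deskew": true, "optimize": 3, "unpaper_args": "--pre-rotate 90"}'))
```

A Python-style ``True`` or single-quoted keys are rejected by the parser, which is the mistake the note about lower case booleans and double quotes warns about.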
|
|
||||||
|
|
||||||
.. _configuration-tika:
|
|
||||||
|
|
||||||
Tika settings
|
|
||||||
#############
|
|
||||||
|
|
||||||
Paperless can make use of `Tika <https://tika.apache.org/>`_ and
|
|
||||||
`Gotenberg <https://gotenberg.dev/>`_ for parsing and
|
|
||||||
converting "Office" documents (such as ".doc", ".xlsx" and ".odt"). If you
|
|
||||||
wish to use this, you must provide a Tika server and a Gotenberg server,
|
|
||||||
configure their endpoints, and enable the feature.
|
|
||||||
|
|
||||||
PAPERLESS_TIKA_ENABLED=<bool>
|
|
||||||
Enable (or disable) the Tika parser.
|
|
||||||
|
|
||||||
Defaults to false.
|
|
||||||
|
|
||||||
PAPERLESS_TIKA_ENDPOINT=<url>
|
|
||||||
Set the endpoint URL where Paperless can reach your Tika server.
|
|
||||||
|
|
||||||
Defaults to "http://localhost:9998".
|
|
||||||
|
|
||||||
PAPERLESS_TIKA_GOTENBERG_ENDPOINT=<url>
|
|
||||||
Set the endpoint URL where Paperless can reach your Gotenberg server.
|
|
||||||
|
|
||||||
Defaults to "http://localhost:3000".
|
|
||||||
|
|
||||||
If you run paperless on docker, you can add those services to the docker-compose
|
|
||||||
file (see the provided ``docker-compose.sqlite-tika.yml`` file for reference). The changes
|
|
||||||
required are as follows:
|
|
||||||
|
|
||||||
.. code:: yaml
|
|
||||||
|
|
||||||
services:
|
|
||||||
# ...
|
|
||||||
|
|
||||||
webserver:
|
|
||||||
# ...
|
|
||||||
|
|
||||||
environment:
|
|
||||||
# ...
|
|
||||||
|
|
||||||
PAPERLESS_TIKA_ENABLED: 1
|
|
||||||
PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
|
|
||||||
PAPERLESS_TIKA_ENDPOINT: http://tika:9998
|
|
||||||
|
|
||||||
# ...
|
|
||||||
|
|
||||||
gotenberg:
|
|
||||||
image: gotenberg/gotenberg:7.6
|
|
||||||
restart: unless-stopped
|
|
||||||
command:
|
|
||||||
- "gotenberg"
|
|
||||||
- "--chromium-disable-routes=true"
|
|
||||||
|
|
||||||
tika:
|
|
||||||
image: ghcr.io/paperless-ngx/tika:latest
|
|
||||||
restart: unless-stopped
|
|
||||||
|
|
||||||
Add the configuration variables to the environment of the webserver (alternatively
|
|
||||||
put the configuration in the ``docker-compose.env`` file) and add the additional
|
|
||||||
services below the webserver service. Watch out for indentation.
|
|
||||||
|
|
||||||
Make sure to use the correct format `PAPERLESS_TIKA_ENABLED = 1` so python_dotenv can parse the statement correctly.
|
|
||||||
|
|
||||||
Software tweaks
|
|
||||||
###############
|
|
||||||
|
|
||||||
PAPERLESS_TASK_WORKERS=<num>
|
|
||||||
Paperless does multiple things in the background: Maintain the search index,
|
|
||||||
maintain the automatic matching algorithm, check emails, consume documents,
|
|
||||||
etc. This variable specifies how many things it will do in parallel.
|
|
||||||
|
|
||||||
Defaults to 1.
|
|
||||||
|
|
||||||
|
|
||||||
PAPERLESS_THREADS_PER_WORKER=<num>
|
|
||||||
Furthermore, paperless uses multiple threads when consuming documents to
|
|
||||||
speed up OCR. This variable specifies how many pages paperless will process
|
|
||||||
in parallel on a single document.
|
|
||||||
|
|
||||||
.. caution::
|
|
||||||
|
|
||||||
Ensure that the product
|
|
||||||
|
|
||||||
PAPERLESS_TASK_WORKERS * PAPERLESS_THREADS_PER_WORKER
|
|
||||||
|
|
||||||
does not exceed your CPU core count or else paperless will be extremely slow.
|
|
||||||
If you want paperless to process many documents in parallel, choose a high
|
|
||||||
worker count. If you want paperless to process very large documents faster,
|
|
||||||
use a higher thread per worker count.
|
|
||||||
|
|
||||||
The default is a balance between the two, according to your CPU core count,
|
|
||||||
with a slight favor towards threads per worker:
|
|
||||||
|
|
||||||
+----------------+---------+---------+
|
|
||||||
| CPU core count | Workers | Threads |
|
|
||||||
+----------------+---------+---------+
|
|
||||||
| 1 | 1 | 1 |
|
|
||||||
+----------------+---------+---------+
|
|
||||||
| 2 | 2 | 1 |
|
|
||||||
+----------------+---------+---------+
|
|
||||||
| 4 | 2 | 2 |
|
|
||||||
+----------------+---------+---------+
|
|
||||||
| 6 | 2 | 3 |
|
|
||||||
+----------------+---------+---------+
|
|
||||||
| 8 | 2 | 4 |
|
|
||||||
+----------------+---------+---------+
|
|
||||||
| 12 | 3 | 4 |
|
|
||||||
+----------------+---------+---------+
|
|
||||||
| 16 | 4 | 4 |
|
|
||||||
+----------------+---------+---------+
|
|
||||||
|
|
||||||
If you only specify PAPERLESS_TASK_WORKERS, paperless will adjust
|
|
||||||
PAPERLESS_THREADS_PER_WORKER automatically.
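The caution about the product of workers and threads can be checked with a few lines of Python. This sketch does not reproduce how Paperless computes its defaults; the function ``within_cpu_budget`` is hypothetical, and ``DEFAULTS_TABLE`` simply copies the rows of the table above so they can be checked against the rule.

```python
import os
from typing import Optional


def within_cpu_budget(workers: int, threads: int, cores: Optional[int] = None) -> bool:
    """True if workers * threads does not exceed the CPU core count."""
    if cores is None:
        cores = os.cpu_count() or 1  # cpu_count() may return None
    return workers * threads <= cores


# core count -> (workers, threads), copied from the table above
DEFAULTS_TABLE = {1: (1, 1), 2: (2, 1), 4: (2, 2), 6: (2, 3), 8: (2, 4), 12: (3, 4), 16: (4, 4)}

if __name__ == "__main__":
    for cores, (workers, threads) in DEFAULTS_TABLE.items():
        print(cores, workers, threads, within_cpu_budget(workers, threads, cores))
```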
|
|
||||||
|
|
||||||
|
|
||||||
PAPERLESS_WORKER_TIMEOUT=<num>
|
|
||||||
Machines with few cores or weak ones might not be able to finish OCR on
|
|
||||||
large documents within the default 1800 seconds. So extending this timeout
|
|
||||||
may prove to be useful on weak hardware setups.
|
|
||||||
|
|
||||||
PAPERLESS_WORKER_RETRY=<num>
|
|
||||||
If PAPERLESS_WORKER_TIMEOUT has been configured, the retry time for a task can
|
|
||||||
also be configured. By default, this value will be set to 10s more than the
|
|
||||||
worker timeout. This value should never be set less than the worker timeout.
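The relationship between the two settings can be sketched as follows. The function ``effective_retry`` is hypothetical, and raising an error for a too-small retry value is an illustrative choice made here; whether Paperless itself validates the value is not stated in this document.

```python
from typing import Optional


def effective_retry(timeout: int = 1800, retry: Optional[int] = None) -> int:
    """Return the retry time in seconds for a given worker timeout."""
    if retry is None:
        return timeout + 10  # documented default: 10s more than the timeout
    if retry < timeout:
        raise ValueError("retry must not be less than the worker timeout")
    return retry


if __name__ == "__main__":
    print(effective_retry())             # default timeout, derived retry
    print(effective_retry(timeout=3600)) # longer timeout for weak hardware
```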
|
|
||||||
|
|
||||||
PAPERLESS_TIME_ZONE=<timezone>
|
|
||||||
Set the time zone here.
|
|
||||||
See https://docs.djangoproject.com/en/3.1/ref/settings/#std:setting-TIME_ZONE
|
|
||||||
for details on how to set it.
|
|
||||||
|
|
||||||
Defaults to UTC.
|
|
||||||
|
|
||||||
|
|
||||||
.. _configuration-polling:
|
|
||||||
|
|
||||||
PAPERLESS_CONSUMER_POLLING=<num>
|
|
||||||
If paperless won't find documents added to your consume folder, it might
|
|
||||||
not be able to automatically detect filesystem changes. In that case,
|
|
||||||
specify a polling interval in seconds here, which will then cause paperless
|
|
||||||
to periodically check your consumption directory for changes. This will also
|
|
||||||
disable listening for file system changes with ``inotify``.
|
|
||||||
|
|
||||||
Defaults to 0, which disables polling and uses filesystem notifications.
|
|
||||||
|
|
||||||
PAPERLESS_CONSUMER_POLLING_RETRY_COUNT=<num>
|
|
||||||
If consumer polling is enabled, sets the number of times paperless will check for a
|
|
||||||
file to remain unmodified.
|
|
||||||
|
|
||||||
Defaults to 5.
|
|
||||||
|
|
||||||
PAPERLESS_CONSUMER_POLLING_DELAY=<num>
|
|
||||||
If consumer polling is enabled, sets the delay in seconds between each check (above) paperless
|
|
||||||
will do while waiting for a file to remain unmodified.
|
|
||||||
|
|
||||||
Defaults to 5.
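How a retry count and a delay can work together while waiting for a file to remain unmodified is sketched below. This is not Paperless's implementation; the function ``wait_until_stable`` is hypothetical, and comparing size and modification time between consecutive checks is an assumption about what "unmodified" means.

```python
import os
import time


def wait_until_stable(path: str, retry_count: int = 5, delay: float = 5.0) -> bool:
    """Return True once two consecutive checks see the same size and mtime."""
    previous = None
    for _ in range(retry_count):
        stat = os.stat(path)
        current = (stat.st_size, stat.st_mtime_ns)
        if current == previous:
            return True  # unchanged since the previous check
        previous = current
        time.sleep(delay)
    return False  # still changing (or too few checks to compare)


if __name__ == "__main__":
    import tempfile

    with tempfile.NamedTemporaryFile(delete=False) as handle:
        handle.write(b"scan data")
    print(wait_until_stable(handle.name, retry_count=3, delay=0.1))
    os.remove(handle.name)
```

A slow scanner or network share that writes files in bursts would need a longer delay or more retries under this scheme.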
|
|
||||||
|
|
||||||
.. _configuration-inotify:
|
|
||||||
|
|
||||||
PAPERLESS_CONSUMER_INOTIFY_DELAY=<num>
|
|
||||||
Sets the time in seconds the consumer will wait for additional events
|
|
||||||
from inotify before the consumer will consider a file ready and begin consumption.
|
|
||||||
Certain scanners or network setups may generate multiple events for a single file,
|
|
||||||
leading to multiple consumers working on the same file. Configure this to
|
|
||||||
prevent that.
|
|
||||||
|
|
||||||
Defaults to 0.5 seconds.
|
|
||||||
|
|
||||||
PAPERLESS_CONSUMER_DELETE_DUPLICATES=<bool>
|
|
||||||
When the consumer detects a duplicate document, it will not touch the
|
|
||||||
original document. This default behavior can be changed here.
|
|
||||||
|
|
||||||
Defaults to false.
|
|
||||||
|
|
||||||
|
|
||||||
PAPERLESS_CONSUMER_RECURSIVE=<bool>
|
|
||||||
Enable recursive watching of the consumption directory. Paperless will
|
|
||||||
then pick up files in subdirectories within your consumption
|
|
||||||
directory as well.
|
|
||||||
|
|
||||||
Defaults to false.
|
|
||||||
|
|
||||||
|
|
||||||
PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS=<bool>
|
|
||||||
Set the names of subdirectories as tags for consumed files.
|
|
||||||
E.g. <CONSUMPTION_DIR>/foo/bar/file.pdf will add the tags "foo" and "bar" to
|
|
||||||
the consumed file. Paperless will create any tags that don't exist yet.
|
|
||||||
|
|
||||||
This is useful for sorting documents with certain tags such as ``car`` or
|
|
||||||
``todo`` prior to consumption. These folders won't be deleted.
|
|
||||||
|
|
||||||
PAPERLESS_CONSUMER_RECURSIVE must be enabled for this to work.
|
|
||||||
|
|
||||||
Defaults to false.
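The mapping from subdirectory names to tags in the example above can be sketched with ``pathlib``. The function ``tags_from_path`` is hypothetical and only shows how the tag names follow from the path relative to the consumption directory; it does not create tags or touch Paperless.

```python
from pathlib import PurePosixPath
from typing import List


def tags_from_path(consumption_dir: str, file_path: str) -> List[str]:
    """Return the subdirectory names between the consumption dir and the file."""
    # relative_to raises ValueError if file_path is outside consumption_dir
    relative = PurePosixPath(file_path).relative_to(consumption_dir)
    return list(relative.parts[:-1])  # drop the filename itself


if __name__ == "__main__":
    print(tags_from_path("/consume", "/consume/foo/bar/file.pdf"))
```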
|
|
||||||
|
|
||||||
PAPERLESS_CONSUMER_ENABLE_BARCODES=<bool>
|
|
||||||
Enables the scanning and page separation based on detected barcodes.
|
|
||||||
This allows for scanning and adding multiple documents per uploaded
|
|
||||||
file, which are separated by one or multiple barcode pages.
|
|
||||||
|
|
||||||
For ease of use, it is suggested to use a standardized separation page,
|
|
||||||
e.g. `here <https://www.alliancegroup.co.uk/patch-codes.htm>`_.
|
|
||||||
|
|
||||||
If no barcodes are detected in the uploaded file, no page separation
|
|
||||||
will happen.
|
|
||||||
|
|
||||||
The original document will be removed and the separated pages will be
|
|
||||||
saved as pdf.
|
|
||||||
|
|
||||||
Defaults to false.
|
|
||||||
|
|
||||||
|
|
||||||
PAPERLESS_CONSUMER_BARCODE_TIFF_SUPPORT=<bool>
|
|
||||||
Whether TIFF image files should be scanned for barcodes.
|
|
||||||
This will automatically convert any TIFF image(s) to pdfs for later
|
|
||||||
processing.
|
|
||||||
This only has an effect if PAPERLESS_CONSUMER_ENABLE_BARCODES has been
|
|
||||||
enabled.
|
|
||||||
|
|
||||||
Defaults to false.
|
|
||||||
|
|
||||||
PAPERLESS_CONSUMER_BARCODE_STRING=PATCHT
|
|
||||||
Defines the string to be detected as a separator barcode.
|
|
||||||
If paperless is used with the PATCH-T separator pages, users
|
|
||||||
shouldn't change this.
|
|
||||||
|
|
||||||
Defaults to "PATCHT"
|
|
||||||
|
|
||||||
PAPERLESS_CONVERT_MEMORY_LIMIT=<num>
|
|
||||||
On smaller systems, or even in the case of Very Large Documents, the consumer
|
|
||||||
may explode, complaining about how it's "unable to extend pixel cache". In
|
|
||||||
such cases, try setting this to a reasonably low value, like 32. The
|
|
||||||
default is to use whatever is necessary to do everything without writing to
|
|
||||||
disk, and units are in megabytes.
|
|
||||||
|
|
||||||
For more information on how to use this value, you should search
|
|
||||||
the web for "MAGICK_MEMORY_LIMIT".
|
|
||||||
|
|
||||||
Defaults to 0, which disables the limit.
|
|
||||||
|
|
||||||
PAPERLESS_CONVERT_TMPDIR=<path>
|
|
||||||
Similar to the memory limit, if you've got a small system and your OS mounts
|
|
||||||
/tmp as tmpfs, you should set this to a path that's on a physical disk, like
|
|
||||||
/home/your_user/tmp or something. ImageMagick will use this as scratch space
|
|
||||||
when crunching through very large documents.
|
|
||||||
|
|
||||||
For more information on how to use this value, you should search
|
|
||||||
the web for "MAGICK_TMPDIR".
|
|
||||||
|
|
||||||
Default is none, which disables the temporary directory.
|
|
||||||
|
|
||||||
PAPERLESS_POST_CONSUME_SCRIPT=<filename>
|
|
||||||
After a document is consumed, Paperless can trigger an arbitrary script if
|
|
||||||
you like. This script will be passed a number of arguments for you to work
|
|
||||||
with. For more information, take a look at :ref:`advanced-post_consume_script`.
|
|
||||||
|
|
||||||
The default is blank, which means nothing will be executed.
|
|
||||||
|
|
||||||
PAPERLESS_FILENAME_DATE_ORDER=<format>
|
|
||||||
Paperless will check the document text for document date information.
|
|
||||||
Use this setting to enable checking the document filename for date
|
|
||||||
information. The date order can be set to any option as specified in
|
|
||||||
https://dateparser.readthedocs.io/en/latest/settings.html#date-order.
|
|
||||||
The filename will be checked first, and if nothing is found, the document
|
|
||||||
text will be checked as normal.
|
|
||||||
|
|
||||||
A date in a filename must have some separators (`.`, `-`, `/`, etc)
|
|
||||||
for it to be parsed.
|
|
||||||
|
|
||||||
Defaults to none, which disables this feature.
|
|
||||||
|
|
||||||
PAPERLESS_NUMBER_OF_SUGGESTED_DATES=<num>
|
|
||||||
Paperless searches an entire document for dates. The first date found will
|
|
||||||
be used as the initial value for the created date. When this variable is
|
|
||||||
greater than 0 (or left at its default value), paperless will also suggest
|
|
||||||
other dates found in the document, up to a maximum of this setting. Note that
|
|
||||||
duplicates will be removed, which can result in fewer dates displayed in the
|
|
||||||
frontend than this setting value.
|
|
||||||
|
|
||||||
The task to find all dates can be time-consuming and increases with a higher
|
|
||||||
(maximum) number of suggested dates and slower hardware.
|
|
||||||
|
|
||||||
Defaults to 3. Set to 0 to disable this feature.
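The interplay between the limit and duplicate removal can be sketched as follows. This is a minimal illustration, not the actual Paperless implementation: the helper name ``suggest_dates`` and the order of operations (cap the list first, then drop duplicates) are assumptions inferred from the description above.

```python
from datetime import date


def suggest_dates(found_dates, limit=3):
    """Hypothetical helper: cap the dates found, then drop duplicates."""
    if limit <= 0:
        # A limit of 0 disables date suggestions entirely.
        return []
    unique = []
    # Only the first `limit` dates are considered; duplicates among them
    # are removed, so fewer than `limit` dates may remain.
    for candidate in found_dates[:limit]:
        if candidate not in unique:
            unique.append(candidate)
    return unique


found = [date(2022, 3, 1), date(2022, 3, 1), date(2021, 12, 24), date(2020, 1, 5)]
print(suggest_dates(found))
```

With the sample list, the first three dates contain one duplicate, so only two suggestions remain even though the limit is 3.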
|
|
||||||
|
|
||||||
PAPERLESS_THUMBNAIL_FONT_NAME=<filename>
|
|
||||||
Paperless creates thumbnails for plain text files by rendering the content
|
|
||||||
of the file on an image and uses a predefined font for that. This
|
|
||||||
font can be changed here.
|
|
||||||
|
|
||||||
Note that this won't have any effect on already generated thumbnails.
|
|
||||||
|
|
||||||
Defaults to ``/usr/share/fonts/liberation/LiberationSerif-Regular.ttf``.
|
|
||||||
|
|
||||||
PAPERLESS_IGNORE_DATES=<string>
|
|
||||||
Paperless parses a document's creation date from the filename and file content.
|
|
||||||
You may specify a comma-separated list of dates that should be ignored during
|
|
||||||
this process. This is useful for special dates (like date of birth) that appear
|
|
||||||
in documents regularly but are very unlikely to be the document's creation date.
|
|
||||||
|
|
||||||
The date is parsed using the order specified in PAPERLESS_DATE_ORDER.
|
|
||||||
|
|
||||||
Defaults to an empty string to not ignore any dates.
|
|
||||||
|
|
||||||
PAPERLESS_DATE_ORDER=<format>
|
|
||||||
Paperless will try to determine the document creation date from its contents.
|
|
||||||
Specify the date format Paperless should expect to see within your documents.
|
|
||||||
|
|
||||||
This option defaults to DMY which translates to day first, month second, and year
|
|
||||||
last order. Characters D, M, or Y can be shuffled to meet the required order.
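How a date order string such as DMY drives parsing, and how a comma-separated PAPERLESS_IGNORE_DATES value could be interpreted with it, can be sketched with the standard library alone. This is an illustration under stated assumptions: the helper names ``parse_with_order`` and ``parse_ignore_dates`` are hypothetical, only purely numeric dates with a single separator are handled, and the real date parsing in Paperless is considerably more flexible.

```python
from datetime import datetime

# Map each letter of the order string to a strptime directive.
_DIRECTIVES = {"D": "%d", "M": "%m", "Y": "%Y"}


def parse_with_order(text, order="DMY", separator="."):
    """Hypothetical helper: parse a numeric date such as 31.12.2021."""
    fmt = separator.join(_DIRECTIVES[letter] for letter in order)
    return datetime.strptime(text, fmt).date()


def parse_ignore_dates(value, order="DMY", separator="."):
    """Hypothetical helper: turn a comma-separated value into a set of dates."""
    return {
        parse_with_order(part.strip(), order, separator)
        for part in value.split(",")
        if part.strip()  # an empty setting ignores no dates
    }


print(parse_with_order("01.02.2021", order="DMY"))
print(parse_with_order("01.02.2021", order="MDY"))
```

The same input string yields different dates depending on the configured order, which is why the ignore list and the document contents need to agree on it.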
|
|
||||||
|
|
||||||
PAPERLESS_CONSUMER_IGNORE_PATTERNS=<json>
|
|
||||||
By default, paperless ignores certain files and folders in the consumption
|
|
||||||
directory, such as system files created by macOS.
|
|
||||||
|
|
||||||
This can be adjusted by configuring a custom JSON array with patterns to exclude.
|
|
||||||
|
|
||||||
Defaults to ``[".DS_STORE/*", "._*", ".stfolder/*", ".stversions/*", ".localized/*", "desktop.ini"]``.
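To see how such a JSON array of patterns might be applied, here is a minimal sketch. It assumes glob-style matching against the path relative to the consumption directory using ``PurePosixPath.match``; the helper name ``is_ignored`` is hypothetical, and the matching used by Paperless itself may differ in detail.

```python
import json
from pathlib import PurePosixPath

# The default value from above, as it would appear in the environment.
DEFAULT_PATTERNS = (
    '[".DS_STORE/*", "._*", ".stfolder/*", '
    '".stversions/*", ".localized/*", "desktop.ini"]'
)


def is_ignored(relative_path, patterns_json=DEFAULT_PATTERNS):
    """Hypothetical helper: True if the path matches any ignore pattern."""
    patterns = json.loads(patterns_json)
    path = PurePosixPath(relative_path)
    # Relative patterns are matched from the right-hand end of the path.
    return any(path.match(pattern) for pattern in patterns)


print(is_ignored("._scan.pdf"))
print(is_ignored("invoices/2021.pdf"))
```

A custom value is passed the same way, for example ``is_ignored("drafts/a.pdf", '["drafts/*"]')``.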
|
|
||||||
|
|
||||||
Binaries
|
|
||||||
########
|
|
||||||
|
|
||||||
There are a few external software packages that Paperless expects to find on
|
|
||||||
your system when it starts up. Unless you've done something creative with
|
|
||||||
their installation, you probably won't need to edit any of these. However,
|
|
||||||
if you've installed these programs somewhere where simply typing the name of
|
|
||||||
the program doesn't automatically execute it (i.e. the program isn't in your
|
|
||||||
$PATH), then you'll need to specify the literal path for that program.
|
|
||||||
|
|
||||||
PAPERLESS_CONVERT_BINARY=<path>
|
|
||||||
Defaults to "convert".
|
|
||||||
|
|
||||||
PAPERLESS_GS_BINARY=<path>
|
|
||||||
Defaults to "gs".
|
|
||||||
|
|
||||||
|
|
||||||
.. _configuration-docker:
|
|
||||||
|
|
||||||
Docker-specific options
|
|
||||||
#######################
|
|
||||||
|
|
||||||
These options don't have any effect in ``paperless.conf``. These options adjust
|
|
||||||
the behavior of the docker container. Configure these in ``docker-compose.env``.
|
|
||||||
|
|
||||||
PAPERLESS_WEBSERVER_WORKERS=<num>
|
|
||||||
The number of worker processes the webserver should spawn. More worker processes
|
|
||||||
usually let the front end load data much more quickly. However, each worker process
|
|
||||||
also loads the entire application into memory separately, so increasing this value
|
|
||||||
will increase RAM usage.
|
|
||||||
|
|
||||||
Defaults to 1.
|
|
||||||
|
|
||||||
PAPERLESS_BIND_ADDR=<ip address>
|
|
||||||
The IP address the webserver will listen on inside the container. There are
|
|
||||||
special setups where you may need to configure this value to restrict the
|
|
||||||
IP address or interface the webserver listens on.
|
|
||||||
|
|
||||||
Defaults to [::], meaning all interfaces, including IPv6.
|
|
||||||
|
|
||||||
PAPERLESS_PORT=<port>
|
|
||||||
The port number the webserver will listen on inside the container. There are
|
|
||||||
special setups where you may need this to avoid collisions with other
|
|
||||||
services (like using podman with multiple containers in one pod).
|
|
||||||
|
|
||||||
Don't change this when using Docker. To change the port the webserver is
|
|
||||||
reachable outside of the container, instead refer to the "ports" key in
|
|
||||||
``docker-compose.yml``.
|
|
||||||
|
|
||||||
Defaults to 8000.
|
|
||||||
|
|
||||||
USERMAP_UID=<uid>
|
|
||||||
The ID of the paperless user in the container. Set this to your actual user ID on the
|
|
||||||
host system, which you can get by executing
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
$ id -u
|
|
||||||
|
|
||||||
Paperless will change ownership on its folders to this user, so you need to get this right
|
|
||||||
in order to be able to write to the consumption directory.
|
|
||||||
|
|
||||||
Defaults to 1000.
|
|
||||||
|
|
||||||
USERMAP_GID=<gid>
|
|
||||||
The ID of the paperless group in the container. Set this to your actual group ID on the
|
|
||||||
host system, which you can get by executing
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
$ id -g
|
|
||||||
|
|
||||||
Paperless will change ownership on its folders to this group, so you need to get this right
|
|
||||||
in order to be able to write to the consumption directory.
|
|
||||||
|
|
||||||
Defaults to 1000.
|
|
||||||
|
|
||||||
PAPERLESS_OCR_LANGUAGES=<list>
|
|
||||||
Additional OCR languages to install. By default, paperless comes with
|
|
||||||
English, German, Italian, Spanish and French. If your language is not in this list, install
|
|
||||||
additional languages with this configuration option:
|
|
||||||
|
|
||||||
.. code:: bash
|
|
||||||
|
|
||||||
PAPERLESS_OCR_LANGUAGES=tur ces
|
|
||||||
|
|
||||||
To actually use these languages, also set the default OCR language of paperless:
|
|
||||||
|
|
||||||
.. code:: bash
|
|
||||||
|
|
||||||
PAPERLESS_OCR_LANGUAGE=tur
|
|
||||||
|
|
||||||
Defaults to none, which does not install any additional languages.
|
|
||||||
|
|
||||||
PAPERLESS_ENABLE_FLOWER=<defined>
|
|
||||||
If this environment variable is defined, the Celery monitoring tool
|
|
||||||
`Flower <https://flower.readthedocs.io/en/latest/index.html>`_ will
|
|
||||||
be started by the container.
|
|
||||||
|
|
||||||
You can read more about this in the :ref:`advanced setup <advanced-celery-monitoring>`
|
|
||||||
documentation.
|
|
||||||
|
|
||||||
|
|
||||||
.. _configuration-update-checking:
|
|
||||||
|
|
||||||
Update Checking
|
|
||||||
###############
|
|
||||||
|
|
||||||
PAPERLESS_ENABLE_UPDATE_CHECK=<bool>
|
|
||||||
|
|
||||||
.. note::
|
|
||||||
|
|
||||||
This setting was deprecated in favor of a frontend setting after v1.9.2. A one-time
|
|
||||||
migration is performed for users who have this setting set. This setting is always
|
|
||||||
ignored if the corresponding frontend setting has been set.
|
|
||||||
|
@ -1,431 +1,12 @@
|
|||||||
.. _extending:
|
.. _extending:
|
||||||
|
|
||||||
|
*************************
|
||||||
Paperless-ngx Development
|
Paperless-ngx Development
|
||||||
#########################
|
*************************
|
||||||
|
|
||||||
This section describes the steps you need to take to start development on paperless-ngx.
|
|
||||||
|
|
||||||
Check out the source from GitHub. The repository is organized in the following way:
|
.. cssclass:: redirect-notice
|
||||||
|
|
||||||
* ``main`` always represents the latest release and will only see changes
|
The Paperless-ngx documentation has permanently moved.
|
||||||
when a new release is made.
|
|
||||||
* ``dev`` contains the code that will be in the next release.
|
|
||||||
* ``feature-X`` branches contain bigger changes that will be in some release, but not
|
|
||||||
necessarily the next one.
|
|
||||||
|
|
||||||
When making functional changes to paperless, *always* make your changes on the ``dev`` branch.
|
You will be redirected shortly...
|
||||||
|
|
||||||
Apart from that, the folder structure is as follows:
|
|
||||||
|
|
||||||
* ``docs/`` - Documentation.
|
|
||||||
* ``src-ui/`` - Code of the front end.
|
|
||||||
* ``src/`` - Code of the back end.
|
|
||||||
* ``scripts/`` - Various scripts that help with different parts of development.
|
|
||||||
* ``docker/`` - Files required to build the docker image.
|
|
||||||
|
|
||||||
Contributing to Paperless
|
|
||||||
=========================
|
|
||||||
|
|
||||||
Maybe you've been using Paperless for a while and want to add a feature or two,
|
|
||||||
or maybe you've come across a bug that you have some ideas how to solve. The
|
|
||||||
beauty of open source software is that you can see what's wrong and help to get
|
|
||||||
it fixed for everyone!
|
|
||||||
|
|
||||||
Before contributing please review our `code of conduct`_ and other important
|
|
||||||
information in the `contributing guidelines`_.
|
|
||||||
|
|
||||||
.. _code-formatting-with-pre-commit-hooks:
|
|
||||||
|
|
||||||
Code formatting with pre-commit Hooks
|
|
||||||
=====================================
|
|
||||||
|
|
||||||
To ensure a consistent style and formatting across the project source, the project
|
|
||||||
utilizes a Git ``pre-commit`` hook to perform some formatting and linting before a
|
|
||||||
commit is allowed. That way, everyone uses the same style and some common issues
|
|
||||||
can be caught early on. See below for installation instructions.
|
|
||||||
|
|
||||||
Once installed, hooks will run when you commit. If the formatting isn't quite right
|
|
||||||
or a linter catches something, the commit will be rejected. You'll need to look at the
|
|
||||||
output and fix the issue. Some hooks, such as the Python formatting tool ``black``,
|
|
||||||
will format failing files, so all you need to do is ``git add`` those files again and
|
|
||||||
retry your commit.
|
|
||||||
|
|
||||||
Initial setup and first start
|
|
||||||
=============================
|
|
||||||
|
|
||||||
After you have forked and cloned the code from GitHub, you need to perform a first-time setup.
|
|
||||||
To do the setup you need to perform the steps from the following chapters in a certain order:
|
|
||||||
|
|
||||||
1. Install prerequisites + pipenv as mentioned in :ref:`Bare metal route <setup-bare_metal>`
|
|
||||||
2. Copy ``paperless.conf.example`` to ``paperless.conf`` and enable debug mode.
|
|
||||||
3. Install the Angular CLI:
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
$ npm install -g @angular/cli
|
|
||||||
|
|
||||||
4. Install pre-commit
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
pre-commit install
|
|
||||||
|
|
||||||
5. Create ``consume`` and ``media`` folders in the cloned root folder.
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
mkdir -p consume media
|
|
||||||
|
|
||||||
6. You can now either ...
|
|
||||||
|
|
||||||
* install redis or
|
|
||||||
* use the included scripts/start-services.sh to use docker to fire up a redis instance (and some other services such as tika, gotenberg and a database server) or
|
|
||||||
* spin up a bare redis container
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
docker run -d -p 6379:6379 --restart unless-stopped redis:latest
|
|
||||||
|
|
||||||
7. Install the Python dependencies by running the following in the src/ directory:
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
pipenv install --dev
|
|
||||||
|
|
||||||
* Make sure you're using python 3.9.x or lower. Otherwise you might get issues with building dependencies. You can use `pyenv <https://github.com/pyenv/pyenv>`_ to install a specific python version.
|
|
||||||
|
|
||||||
8. Generate the static UI so you can perform a login to get a session that is required for frontend development (this needs to be done one time only). From the src-ui directory:
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
npm install .
|
|
||||||
./node_modules/.bin/ng build --configuration production
|
|
||||||
|
|
||||||
9. Apply migrations and create a superuser for your dev instance:
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
python3 manage.py migrate
|
|
||||||
python3 manage.py createsuperuser
|
|
||||||
|
|
||||||
10. Now spin up the dev backend. Depending on which part of paperless you're developing for, you need to have some or all of them running.
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
python3 manage.py runserver & python3 manage.py document_consumer & celery --app paperless worker
|
|
||||||
|
|
||||||
11. Log in with the superuser credentials created in step 9 at ``http://localhost:8000`` to create a session that enables you to use the backend.
|
|
||||||
|
|
||||||
The backend development environment is now ready. To start frontend development, go to ``src-ui/`` and run ``ng serve``. From there you can use ``http://localhost:4200`` for a preview.
|
|
||||||
|
|
||||||
Back end development
|
|
||||||
====================
|
|
||||||
|
|
||||||
The backend is a django application. PyCharm works well for development, but you can use whatever
|
|
||||||
you want.
|
|
||||||
|
|
||||||
Configure the IDE to use the src/ folder as the base source folder. Configure the following
|
|
||||||
launch configurations in your IDE:
|
|
||||||
|
|
||||||
* python3 manage.py runserver
|
|
||||||
* celery --app paperless worker
|
|
||||||
* python3 manage.py document_consumer
|
|
||||||
|
|
||||||
To start them all:
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
python3 manage.py runserver & python3 manage.py document_consumer & celery --app paperless worker
|
|
||||||
|
|
||||||
Testing and code style:
|
|
||||||
|
|
||||||
* Run ``pytest`` in the src/ directory to execute all tests. This also generates a HTML coverage
|
|
||||||
report. When running tests, paperless.conf is loaded as well. However, the tests rely on the default
|
|
||||||
configuration. This is not ideal. But for now, make sure no settings except for DEBUG are overridden when testing.
|
|
||||||
* Coding style is enforced by the Git pre-commit hooks. These will ensure your code is formatted and do some
|
|
||||||
linting when you do a ``git commit``.
|
|
||||||
* You can also run ``black`` manually to format your code
|
|
||||||
|
|
||||||
.. note::
|
|
||||||
|
|
||||||
The line length rule E501 is generally useful for getting multiple source files
|
|
||||||
next to each other on the screen. However, in some cases, it's just not possible
|
|
||||||
to make some lines fit, especially complicated IF cases. Append ``# NOQA: E501``
|
|
||||||
to disable this check for certain lines.
|
|
||||||
|
|
||||||
Front end development
|
|
||||||
=====================
|
|
||||||
|
|
||||||
The front end is built using Angular. In order to get started, you need ``npm``.
|
|
||||||
Install the Angular CLI with
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
$ npm install -g @angular/cli
|
|
||||||
|
|
||||||
and make sure that it's on your path. Next, in the src-ui/ directory, install the
|
|
||||||
required dependencies of the project.
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
$ npm install
|
|
||||||
|
|
||||||
You can launch a development server by running
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
$ ng serve
|
|
||||||
|
|
||||||
This will automatically update whenever you save. However, in-place compilation might fail
|
|
||||||
on syntax errors, in which case you need to restart it.
|
|
||||||
|
|
||||||
By default, the development server is available on ``http://localhost:4200/`` and is configured
|
|
||||||
to access the API at ``http://localhost:8000/api/``, which is the default of the backend.
|
|
||||||
If you enabled DEBUG on the back end, several security overrides for allowed hosts, CORS and
|
|
||||||
X-Frame-Options are in place so that the front end behaves exactly as in production. This also
|
|
||||||
relies on you being logged into the back end. Without a valid session, the front end will simply
|
|
||||||
not work.
|
|
||||||
|
|
||||||
Testing and code style:
|
|
||||||
|
|
||||||
* The frontend code (.ts, .html, .scss) uses ``prettier`` for code formatting via the Git
|
|
||||||
``pre-commit`` hooks which run automatically on commit. See
|
|
||||||
:ref:`above <code-formatting-with-pre-commit-hooks>` for installation. You can also run this
|
|
||||||
via cli with a command such as
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
$ git ls-files -- '*.ts' | xargs pre-commit run prettier --files
|
|
||||||
|
|
||||||
* Frontend testing uses jest and cypress. There is currently a need for significantly more
|
|
||||||
frontend tests. Unit tests and e2e tests, respectively, can be run non-interactively with:
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
$ ng test
|
|
||||||
$ npm run e2e:ci
|
|
||||||
|
|
||||||
Cypress also includes a UI which can be run from within the ``src-ui`` directory with
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
$ ./node_modules/.bin/cypress open
|
|
||||||
|
|
||||||
In order to build the front end and serve it as part of django, execute
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
$ ng build --configuration production
|
|
||||||
|
|
||||||
This will build the front end and put it in a location from which the Django server will serve
|
|
||||||
it as static content. This way, you can verify that authentication is working.
|
|
||||||
|
|
||||||
|
|
||||||
Localization
|
|
||||||
============
|
|
||||||
|
|
||||||
Paperless is available in many different languages. Since paperless consists both of a django
|
|
||||||
application and an Angular front end, both these parts have to be translated separately.
|
|
||||||
|
|
||||||
Front end localization
|
|
||||||
----------------------
|
|
||||||
|
|
||||||
* The Angular front end does localization according to the `Angular documentation <https://angular.io/guide/i18n>`_.
|
|
||||||
* The source language of the project is "en_US".
|
|
||||||
* The source strings end up in the file "src-ui/messages.xlf".
|
|
||||||
* The translated strings need to be placed in the "src-ui/src/locale/" folder.
|
|
||||||
* In order to extract added or changed strings from the source files, call ``ng xi18n --ivy``.
|
|
||||||
|
|
||||||
Adding new languages requires adding the translated files in the "src-ui/src/locale/" folder and adjusting a couple files.
|
|
||||||
|
|
||||||
1. Adjust "src-ui/angular.json":
|
|
||||||
|
|
||||||
.. code:: json
|
|
||||||
|
|
||||||
"i18n": {
|
|
||||||
"sourceLocale": "en-US",
|
|
||||||
"locales": {
|
|
||||||
"de": "src/locale/messages.de.xlf",
|
|
||||||
"nl-NL": "src/locale/messages.nl_NL.xlf",
|
|
||||||
"fr": "src/locale/messages.fr.xlf",
|
|
||||||
"en-GB": "src/locale/messages.en_GB.xlf",
|
|
||||||
"pt-BR": "src/locale/messages.pt_BR.xlf",
|
|
||||||
"language-code": "language-file"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
2. Add the language to the available options in "src-ui/src/app/services/settings.service.ts":
|
|
||||||
|
|
||||||
.. code:: typescript
|
|
||||||
|
|
||||||
getLanguageOptions(): LanguageOption[] {
|
|
||||||
return [
|
|
||||||
{code: "en-us", name: $localize`English (US)`, englishName: "English (US)", dateInputFormat: "mm/dd/yyyy"},
|
|
||||||
{code: "en-gb", name: $localize`English (GB)`, englishName: "English (GB)", dateInputFormat: "dd/mm/yyyy"},
|
|
||||||
{code: "de", name: $localize`German`, englishName: "German", dateInputFormat: "dd.mm.yyyy"},
|
|
||||||
{code: "nl", name: $localize`Dutch`, englishName: "Dutch", dateInputFormat: "dd-mm-yyyy"},
|
|
||||||
{code: "fr", name: $localize`French`, englishName: "French", dateInputFormat: "dd/mm/yyyy"},
|
|
||||||
{code: "pt-br", name: $localize`Portuguese (Brazil)`, englishName: "Portuguese (Brazil)", dateInputFormat: "dd/mm/yyyy"}
|
|
||||||
// Add your new language here
|
|
||||||
]
|
|
||||||
}
|
|
||||||
|
|
||||||
``dateInputFormat`` is a special string that defines the behavior of the date input fields and absolutely needs to contain "dd", "mm" and "yyyy".
|
|
||||||
|
|
||||||
3. Import and register the Angular data for this locale in "src-ui/src/app/app.module.ts":
|
|
||||||
|
|
||||||
.. code:: typescript
|
|
||||||
|
|
||||||
import localeDe from '@angular/common/locales/de';
|
|
||||||
registerLocaleData(localeDe)
|
|
||||||
|
|
||||||
Back end localization
|
|
||||||
---------------------
|
|
||||||
|
|
||||||
A majority of the strings that appear in the back end appear only when the admin is used. However,
|
|
||||||
some of these are still shown on the front end (such as error messages).
|
|
||||||
|
|
||||||
* The django application does localization according to the `django documentation <https://docs.djangoproject.com/en/3.1/topics/i18n/translation/>`_.
|
|
||||||
* The source language of the project is "en_US".
|
|
||||||
* Localization files end up in the folder "src/locale/".
|
|
||||||
* In order to extract strings from the application, call ``python3 manage.py makemessages -l en_US``. This is important after making changes to translatable strings.
|
|
||||||
* The message files need to be compiled for them to show up in the application. Call ``python3 manage.py compilemessages`` to do this. The generated files don't get
|
|
||||||
committed into git, since these are derived artifacts. The build pipeline takes care of executing this command.
|
|
||||||
|
|
||||||
Adding new languages requires adding the translated files in the "src/locale/" folder and adjusting the file "src/paperless/settings.py" to include the new language:
|
|
||||||
|
|
||||||
.. code:: python
|
|
||||||
|
|
||||||
LANGUAGES = [
|
|
||||||
("en-us", _("English (US)")),
|
|
||||||
("en-gb", _("English (GB)")),
|
|
||||||
("de", _("German")),
|
|
||||||
("nl-nl", _("Dutch")),
|
|
||||||
("fr", _("French")),
|
|
||||||
("pt-br", _("Portuguese (Brazil)")),
|
|
||||||
# Add language here.
|
|
||||||
]
|
|
||||||
|
|
||||||
|
|
||||||
Building the documentation
|
|
||||||
==========================
|
|
||||||
|
|
||||||
The documentation is built using sphinx. I've configured ReadTheDocs to automatically build
|
|
||||||
the documentation when changes are pushed. If you want to build the documentation locally,
|
|
||||||
this is how you do it:
|
|
||||||
|
|
||||||
1. Install python dependencies.
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
$ cd /path/to/paperless
|
|
||||||
$ pipenv install --dev
|
|
||||||
|
|
||||||
2. Build the documentation
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
$ cd /path/to/paperless/docs
|
|
||||||
$ pipenv run make clean html
|
|
||||||
|
|
||||||
This will build the HTML documentation, and put the resulting files in the ``_build/html``
|
|
||||||
directory.
|
|
||||||
|
|
||||||
Building the Docker image
|
|
||||||
=========================
|
|
||||||
|
|
||||||
The docker image is primarily built by the GitHub actions workflow, but it can be
|
|
||||||
faster when developing to build and tag an image locally.
|
|
||||||
|
|
||||||
To provide the build arguments automatically, build the image using the helper
|
|
||||||
script ``build-docker-image.sh``.
|
|
||||||
|
|
||||||
Building the docker image from source:
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
./build-docker-image.sh Dockerfile -t <your-tag>
|
|
||||||
|
|
||||||
Extending Paperless
|
|
||||||
===================
|
|
||||||
|
|
||||||
Paperless does not have any fancy plugin systems and will probably never have. However,
|
|
||||||
some parts of the application have been designed to allow easy integration of additional
|
|
||||||
features without any modification to the base code.
|
|
||||||
|
|
||||||
Making custom parsers
|
|
||||||
---------------------
|
|
||||||
|
|
||||||
Paperless uses parsers to add documents to paperless. A parser is responsible for:
|
|
||||||
|
|
||||||
* Retrieving the content from the original
* Creating a thumbnail
* Optional: Retrieving a created date from the original
* Optional: Creating an archived document from the original
|
|
||||||
|
|
||||||
Custom parsers can be added to paperless to support more file types. In order to do that,
|
|
||||||
you need to write the parser itself and announce its existence to paperless.
|
|
||||||
|
|
||||||
The parser itself must extend ``documents.parsers.DocumentParser`` and must implement the
|
|
||||||
methods ``parse`` and ``get_thumbnail``. You can provide your own implementation of
|
|
||||||
``get_date`` if you don't want to rely on paperless' default date guessing mechanisms.
|
|
||||||
|
|
||||||
.. code:: python
|
|
||||||
|
|
||||||
import os

from documents.parsers import DocumentParser


class MyCustomParser(DocumentParser):
|
|
||||||
|
|
||||||
def parse(self, document_path, mime_type):
|
|
||||||
# This method does not return anything. Rather, you should assign
|
|
||||||
# whatever you got from the document to the following fields:
|
|
||||||
|
|
||||||
# The content of the document.
|
|
||||||
self.text = "content"
|
|
||||||
|
|
||||||
# Optional: path to a PDF document that you created from the original.
|
|
||||||
self.archive_path = os.path.join(self.tempdir, "archived.pdf")
|
|
||||||
|
|
||||||
# Optional: "created" date of the document.
|
|
||||||
self.date = get_created_from_metadata(document_path)
|
|
||||||
|
|
||||||
def get_thumbnail(self, document_path, mime_type):
|
|
||||||
# This should return the path to a thumbnail you created for this
|
|
||||||
# document.
|
|
||||||
return os.path.join(self.tempdir, "thumb.png")
|
|
||||||
|
|
||||||
If you encounter any issues during parsing, raise a ``documents.parsers.ParseError``.
|
|
||||||
|
|
||||||
The ``self.tempdir`` directory is a temporary directory that is guaranteed to be empty
|
|
||||||
and is removed after consumption has finished. You can use that directory to store any
|
|
||||||
intermediate files and also use it to store the thumbnail / archived document.
|
|
||||||
|
|
||||||
After that, you need to announce your parser to paperless. You need to connect a
|
|
||||||
handler to the ``document_consumer_declaration`` signal. Have a look in the file
|
|
||||||
``src/paperless_tesseract/apps.py`` on how that's done. The handler is a method
|
|
||||||
that returns information about your parser:
|
|
||||||
|
|
||||||
.. code:: python
|
|
||||||
|
|
||||||
def myparser_consumer_declaration(sender, **kwargs):
|
|
||||||
return {
|
|
||||||
"parser": MyCustomParser,
|
|
||||||
"weight": 0,
|
|
||||||
"mime_types": {
|
|
||||||
"application/pdf": ".pdf",
|
|
||||||
"image/jpeg": ".jpg",
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
* ``parser`` is a reference to a class that extends ``DocumentParser``.
|
|
||||||
|
|
||||||
* ``weight`` is used whenever two or more parsers are able to parse a file: The parser with
|
|
||||||
the higher weight wins. This can be used to override the parsers provided by
|
|
||||||
paperless.
|
|
||||||
|
|
||||||
* ``mime_types`` is a dictionary. The keys are the mime types your parser supports and the value
|
|
||||||
is the default file extension that paperless should use when storing files and serving them for
|
|
||||||
download. We could guess that from the file extensions, but some mime types have many extensions
|
|
||||||
associated with them and the python methods responsible for guessing the extension do not always
|
|
||||||
return the same value.
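The role of ``weight`` can be illustrated with a small, self-contained sketch. It is not the selection code used by paperless: the helper name ``pick_parser`` is hypothetical, and the parser classes are replaced by plain strings so the example runs without Django.

```python
def pick_parser(declarations, mime_type):
    """Hypothetical helper: choose the highest-weight declaration for a MIME type."""
    candidates = [d for d in declarations if mime_type in d["mime_types"]]
    if not candidates:
        # No registered parser supports this MIME type.
        return None
    return max(candidates, key=lambda d: d["weight"])


# Strings stand in for the parser classes a real handler would return.
declarations = [
    {
        "parser": "BuiltInPdfParser",
        "weight": 0,
        "mime_types": {"application/pdf": ".pdf"},
    },
    {
        "parser": "MyCustomParser",
        "weight": 10,
        "mime_types": {"application/pdf": ".pdf", "image/jpeg": ".jpg"},
    },
]

chosen = pick_parser(declarations, "application/pdf")
print(chosen["parser"], chosen["mime_types"]["application/pdf"])
```

Because both declarations support ``application/pdf``, the one with the higher weight is chosen, which is how a custom parser can override one shipped with paperless.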
|
|
||||||
|
|
||||||
.. _code of conduct: https://github.com/paperless-ngx/paperless-ngx/blob/main/CODE_OF_CONDUCT.md
|
|
||||||
.. _contributing guidelines: https://github.com/paperless-ngx/paperless-ngx/blob/main/CONTRIBUTING.md
|
|
||||||
|
113
docs/faq.rst
@ -1,117 +1,12 @@
|
|||||||
|
.. _faq:
|
||||||
|
|
||||||
**************************
|
**************************
|
||||||
Frequently asked questions
|
Frequently asked questions
|
||||||
**************************
|
**************************
|
||||||
|
|
||||||
**Q:** *What's the general plan for Paperless-ngx?*
|
|
||||||
|
|
||||||
**A:** While Paperless-ngx is already considered largely "feature-complete" it is a community-driven
|
.. cssclass:: redirect-notice
|
||||||
project and development will be guided in this way. New features can be submitted via
|
|
||||||
GitHub discussions and "up-voted" by the community but this is not a guarantee the feature
|
|
||||||
will be implemented. This project will always be open to collaboration in the form of PRs,
|
|
||||||
ideas etc.
|
|
||||||
|
|
||||||
**Q:** *I'm using docker. Where are my documents?*
|
The Paperless-ngx documentation has permanently moved.
|
||||||
|
|
||||||
**A:** Your documents are stored inside the docker volume ``paperless_media``.
|
You will be redirected shortly...
|
||||||
Docker manages this volume automatically for you. It is a persistent storage
|
|
||||||
and will persist as long as you don't explicitly delete it. The actual location
|
|
||||||
depends on your host operating system. On Linux, chances are high that this location
|
|
||||||
is
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
/var/lib/docker/volumes/paperless_media/_data
|
|
||||||
|
|
||||||
.. caution::
|
|
||||||
|
|
||||||
Do not mess with this folder. Don't change permissions and don't move
|
|
||||||
files around manually. This folder is meant to be entirely managed by docker
|
|
||||||
and paperless.
|
|
||||||
|
|
||||||
**Q:** *Let's say I want to switch tools in a year. Can I easily move to other systems?*
|
|
||||||
|
|
||||||
**A:** Your documents are stored as plain files inside the media folder. You can always drag those files
|
|
||||||
out of that folder to use them elsewhere. Here are a couple notes about that.
|
|
||||||
|
|
||||||
* Paperless-ngx never modifies your original documents. It keeps checksums of all documents and uses a
|
|
||||||
scheduled sanity checker to check that they remain the same.
|
|
||||||
* By default, paperless uses the internal ID of each document as its filename. This might not be very
|
|
||||||
convenient for export. However, you can adjust the way files are stored in paperless by
|
|
||||||
:ref:`configuring the filename format <advanced-file_name_handling>`.
|
|
||||||
* :ref:`The exporter <utilities-exporter>` is another easy way to get your files out of paperless with reasonable file names.
|
|
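The checksum idea from the first bullet above can be sketched as follows. This is only an illustration under stated assumptions, not the actual Paperless-ngx sanity checker: the names `file_checksum` and `find_modified` are hypothetical, and SHA-256 is chosen here purely for the example.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_modified(recorded: dict[str, str], folder: Path) -> list[str]:
    """List recorded file names that are missing or whose checksum changed."""
    problems = []
    for name, expected in sorted(recorded.items()):
        candidate = folder / name
        if not candidate.is_file() or file_checksum(candidate) != expected:
            problems.append(name)
    return problems
```

A periodic job could call `find_modified` with the stored checksums and report any names it returns.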
||||||
|
|
||||||
**Q:** *What file types does paperless-ngx support?*
|
|
||||||
|
|
||||||
**A:** Currently, the following files are supported:
|
|
||||||
|
|
||||||
* PDF documents, PNG images, JPEG images, TIFF images and GIF images are processed with OCR and converted into PDF documents.
|
|
||||||
* Plain text documents are supported as well and are added verbatim
|
|
||||||
to paperless.
|
|
||||||
* With the optional Tika integration enabled (see :ref:`Configuration <configuration-tika>`), Paperless also supports various
|
|
||||||
Office documents (.docx, .doc, .odt, .ppt, .pptx, .odp, .xls, .xlsx, .ods).
|
|
||||||
|
|
||||||
Paperless-ngx determines the type of a file by inspecting its content. The
|
|
||||||
file extensions do not matter.
|
|
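To illustrate what "inspecting its content" can mean, here is a minimal sketch that recognises a few of the formats listed above by their leading bytes. It is an assumption-laden toy: the function name `sniff_mime_type` and the signature table are made up for this example, and a real detector such as libmagic covers far more formats and edge cases.

```python
# Leading bytes ("magic numbers") for a few of the formats listed above.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"II*\x00", "image/tiff"),  # little-endian TIFF
    (b"MM\x00*", "image/tiff"),  # big-endian TIFF
]


def sniff_mime_type(data: bytes) -> str:
    """Guess a MIME type from the first bytes of a file, ignoring its name."""
    for prefix, mime_type in SIGNATURES:
        if data.startswith(prefix):
            return mime_type
    return "application/octet-stream"
```

Because only the bytes are examined, a PDF saved as `scan.txt` would still be reported as `application/pdf`.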
||||||
|
|
||||||
**Q:** *Will paperless-ngx run on Raspberry Pi?*
|
|
||||||
|
|
||||||
**A:** The short answer is yes. I've tested it on a Raspberry Pi 3 B.
|
|
||||||
The long answer is that certain parts of
|
|
||||||
Paperless will run very slowly, such as the OCR. On Raspberry Pi,
|
|
||||||
try to OCR documents before feeding them into paperless so that paperless can
|
|
||||||
reuse the text. The web interface is a lot snappier, since it runs
|
|
||||||
in your browser and paperless has to do much less work to serve the data.
|
|
||||||
|
|
||||||
.. note::
|
|
||||||
|
|
||||||
You can adjust some of the settings so that paperless uses less processing
|
|
||||||
power. See :ref:`setup-less_powerful_devices` for details.
|
|
||||||
|
|
||||||
|
|
||||||
**Q:** *How do I install paperless-ngx on Raspberry Pi?*
|
|
||||||
|
|
||||||
**A:** Docker images are available for arm and arm64 hardware, so just follow
|
|
||||||
the docker-compose instructions. Apart from more required disk space compared to
|
|
||||||
a bare metal installation, docker comes with close to zero overhead, even on
|
|
||||||
Raspberry Pi.
|
|
||||||
|
|
||||||
If you decide to go with the bare metal route, be aware that some of the
|
|
||||||
python requirements do not have precompiled packages for ARM / ARM64. Installation
|
|
||||||
of these will require additional development libraries and compilation will take
|
|
||||||
a long time.
|
|
||||||
|
|
||||||
**Q:** *How do I run this on Unraid?*
|
|
||||||
|
|
||||||
**A:** Paperless-ngx is available as `community app <https://unraid.net/community/apps?q=paperless-ngx>`_
|
|
||||||
in Unraid. `Uli Fahrer <https://github.com/Tooa>`_ created a container template for that.
|
|
||||||
|
|
||||||
**Q:** *How do I run this on my toaster?*
|
|
||||||
|
|
||||||
**A:** I honestly don't know! As for all other devices that might be able
|
|
||||||
to run paperless, you're a bit on your own. If you can't run the docker image,
|
|
||||||
the documentation has instructions for bare metal installs. I'm running
|
|
||||||
paperless on an i3 processor from 2015 or so. This is also what I use to test
|
|
||||||
new releases with. Apart from that, I also have a Raspberry Pi, which I
|
|
||||||
occasionally build the image on and see if it works.
|
|
||||||
|
|
||||||
**Q:** *How do I proxy this with NGINX?*
|
|
||||||
|
|
||||||
**A:** See :ref:`here <setup-nginx>`.
|
|
||||||
|
|
||||||
.. _faq-mod_wsgi:
|
|
||||||
|
|
||||||
**Q:** *How do I get WebSocket support with Apache mod_wsgi?*
|
|
||||||
|
|
||||||
**A:** ``mod_wsgi`` by itself does not support ASGI. Paperless will continue
|
|
||||||
to work with WSGI, but certain features such as status notifications about
|
|
||||||
document consumption won't be available.
|
|
||||||
|
|
||||||
If you want to continue using ``mod_wsgi``, you will have to run an ASGI-enabled
|
|
||||||
web server as well that processes WebSocket connections, and configure Apache to
|
|
||||||
redirect WebSocket connections to this server. Multiple options for ASGI servers
|
|
||||||
exist:
|
|
||||||
|
|
||||||
* ``gunicorn`` with ``uvicorn`` as the worker implementation (the default of paperless)
|
|
||||||
* ``daphne`` as a standalone server, which is the reference implementation for ASGI.
|
|
||||||
* ``uvicorn`` as a standalone server
|
|
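For readers unfamiliar with ASGI, the sketch below shows why an ASGI-capable server is needed: an ASGI application is an async callable that receives a connection ``scope`` and can handle ``websocket`` scopes, which WSGI has no equivalent for. This is a generic illustration, not Paperless code; the `app` and `drive` names and the message text are hypothetical, and `drive` merely stands in for a real server such as those listed above.

```python
import asyncio


async def app(scope, receive, send):
    """Tiny ASGI application that handles WebSocket and HTTP scopes."""
    if scope["type"] == "websocket":
        await receive()  # the server first delivers "websocket.connect"
        await send({"type": "websocket.accept"})
        await send({"type": "websocket.send", "text": "document consumed"})
        await send({"type": "websocket.close", "code": 1000})
    elif scope["type"] == "http":
        await send({
            "type": "http.response.start",
            "status": 200,
            "headers": [(b"content-type", b"text/plain")],
        })
        await send({"type": "http.response.body", "body": b"ok"})


async def drive(scope_type):
    """Call the app with stub receive/send callables; record sent message types."""
    sent = []

    async def receive():
        return {"type": "websocket.connect"}

    async def send(message):
        sent.append(message["type"])

    await app({"type": scope_type}, receive, send)
    return sent


if __name__ == "__main__":
    print(asyncio.run(drive("websocket")))
```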
||||||
|
@ -2,74 +2,24 @@
|
|||||||
Paperless
|
Paperless
|
||||||
*********
|
*********
|
||||||
|
|
||||||
Paperless is a simple Django application running in two parts:
|
|
||||||
a *Consumer* (the thing that does the indexing) and
|
|
||||||
the *Web server* (the part that lets you search &
|
|
||||||
download already-indexed documents). If you want to learn more about its
|
|
||||||
functions keep on reading after the installation section.
|
|
||||||
|
|
||||||
|
.. cssclass:: redirect-notice
|
||||||
|
|
||||||
Why This Exists
|
The Paperless-ngx documentation has permanently moved.
|
||||||
===============
|
|
||||||
|
|
||||||
Paper is a nightmare. Environmental issues aside, there's no excuse for it in
|
You will be redirected shortly...
|
||||||
the 21st century. It takes up space, collects dust, doesn't support any form
|
|
||||||
of a search feature, indexing is tedious, it's heavy and prone to damage &
|
|
||||||
loss.
|
|
||||||
|
|
||||||
I wrote this to make "going paperless" easier. I do not have to worry about
|
|
||||||
finding stuff again. I feed documents right from the post box into the scanner
|
|
||||||
and then shred them. Perhaps you might find it useful too.
|
|
||||||
|
|
||||||
|
|
||||||
Paperless-ngx
|
|
||||||
=============
|
|
||||||
|
|
||||||
Paperless-ngx is a document management system that transforms your physical
|
|
||||||
documents into a searchable online archive so you can keep, well, *less paper*.
|
|
||||||
|
|
||||||
Paperless-ngx forked from paperless-ng to continue the great work and
|
|
||||||
distribute responsibility of supporting and advancing the project among a team
|
|
||||||
of people.
|
|
||||||
|
|
||||||
NG stands for both Angular (the framework used for the
|
|
||||||
Frontend) and next-gen. Publishing this project under a different name also
|
|
||||||
avoids confusion between paperless and paperless-ngx.
|
|
||||||
|
|
||||||
If you want to learn about what's different in paperless-ngx from Paperless, check out these
|
|
||||||
resources in the documentation:
|
|
||||||
|
|
||||||
* :ref:`Some screenshots <screenshots>` of the new UI are available.
|
|
||||||
* Read :ref:`this section <advanced-automatic_matching>` if you want to
|
|
||||||
learn about how paperless automates all tagging using machine learning.
|
|
||||||
* Paperless now comes with a :ref:`proper email consumer <usage-email>`
|
|
||||||
that's fully tested and production ready.
|
|
||||||
* Paperless creates searchable PDF/A documents from whatever you put into
|
|
||||||
the consumption directory. This means that you can select text in
|
|
||||||
image-only documents coming from your scanner.
|
|
||||||
* See :ref:`this note <utilities-encyption>` about GnuPG encryption in
|
|
||||||
paperless-ngx.
|
|
||||||
* Paperless is now integrated with a
|
|
||||||
:ref:`task processing queue <setup-task_processor>` that tells you
|
|
||||||
at a glance when and why something is not working.
|
|
||||||
* The :doc:`changelog </changelog>` contains a detailed list of all changes
|
|
||||||
in paperless-ngx.
|
|
||||||
|
|
||||||
Contents
|
|
||||||
========
|
|
||||||
|
|
||||||
.. toctree::
|
.. toctree::
|
||||||
:maxdepth: 1
|
|
||||||
|
|
||||||
setup
|
screenshots
|
||||||
usage_overview
|
scanners
|
||||||
advanced_usage
|
administration
|
||||||
administration
|
advanced_usage
|
||||||
configuration
|
usage_overview
|
||||||
api
|
setup
|
||||||
faq
|
troubleshooting
|
||||||
troubleshooting
|
changelog
|
||||||
extending
|
configuration
|
||||||
scanners
|
extending
|
||||||
screenshots
|
api
|
||||||
changelog
|
faq
|
||||||
|
@ -1,8 +1,12 @@
|
|||||||
|
|
||||||
.. _scanners:
|
.. _scanners:
|
||||||
|
|
||||||
*******************
|
*******************
|
||||||
Scanners & Software
|
Scanners & Software
|
||||||
*******************
|
*******************
|
||||||
|
|
||||||
Paperless-ngx is compatible with many different scanners and scanning tools. A user-maintained list of scanners and other software is available on `the wiki <https://github.com/paperless-ngx/paperless-ngx/wiki/Scanner-&-Software-Recommendations>`_.
|
|
||||||
|
.. cssclass:: redirect-notice
|
||||||
|
|
||||||
|
The Paperless-ngx documentation has permanently moved.
|
||||||
|
|
||||||
|
You will be redirected shortly...
|
||||||
|
@ -4,60 +4,9 @@
|
|||||||
Screenshots
|
Screenshots
|
||||||
***********
|
***********
|
||||||
|
|
||||||
This is what Paperless-ngx looks like.
|
|
||||||
|
|
||||||
The dashboard shows customizable views on your document and allows document uploads:
|
.. cssclass:: redirect-notice
|
||||||
|
|
||||||
.. image:: _static/screenshots/dashboard.png
|
The Paperless-ngx documentation has permanently moved.
|
||||||
:target: _static/screenshots/dashboard.png
|
|
||||||
|
|
||||||
The document list provides three different styles to scroll through your documents:
|
You will be redirected shortly...
|
||||||
|
|
||||||
.. image:: _static/screenshots/documents-table.png
|
|
||||||
:target: _static/screenshots/documents-table.png
|
|
||||||
.. image:: _static/screenshots/documents-smallcards.png
|
|
||||||
:target: _static/screenshots/documents-smallcards.png
|
|
||||||
.. image:: _static/screenshots/documents-largecards.png
|
|
||||||
:target: _static/screenshots/documents-largecards.png
|
|
||||||
|
|
||||||
Paperless-ngx also supports "dark mode":
|
|
||||||
|
|
||||||
.. image:: _static/screenshots/documents-smallcards-dark.png
|
|
||||||
:target: _static/screenshots/documents-smallcards-dark.png
|
|
||||||
|
|
||||||
Extensive filtering mechanisms:
|
|
||||||
|
|
||||||
.. image:: _static/screenshots/documents-filter.png
|
|
||||||
:target: _static/screenshots/documents-filter.png
|
|
||||||
|
|
||||||
Bulk editing of document tags, correspondents, etc.:
|
|
||||||
|
|
||||||
.. image:: _static/screenshots/bulk-edit.png
|
|
||||||
:target: _static/screenshots/bulk-edit.png
|
|
||||||
|
|
||||||
Side-by-side editing of documents:
|
|
||||||
|
|
||||||
.. image:: _static/screenshots/editing.png
|
|
||||||
:target: _static/screenshots/editing.png
|
|
||||||
|
|
||||||
Tag editing. This looks about the same for correspondents and document types.
|
|
||||||
|
|
||||||
.. image:: _static/screenshots/new-tag.png
|
|
||||||
:target: _static/screenshots/new-tag.png
|
|
||||||
|
|
||||||
Searching provides auto complete and highlights the results.
|
|
||||||
|
|
||||||
.. image:: _static/screenshots/search-preview.png
|
|
||||||
:target: _static/screenshots/search-preview.png
|
|
||||||
.. image:: _static/screenshots/search-results.png
|
|
||||||
:target: _static/screenshots/search-results.png
|
|
||||||
|
|
||||||
Fancy mail filters!
|
|
||||||
|
|
||||||
.. image:: _static/screenshots/mail-rules-edited.png
|
|
||||||
:target: _static/screenshots/mail-rules-edited.png
|
|
||||||
|
|
||||||
Mobile devices are supported.
|
|
||||||
|
|
||||||
.. image:: _static/screenshots/mobile.png
|
|
||||||
:target: _static/screenshots/mobile.png
|
|
||||||
|
890
docs/setup.rst
@ -1,894 +1,12 @@
|
|||||||
|
.. _setup:
|
||||||
|
|
||||||
*****
|
*****
|
||||||
Setup
|
Setup
|
||||||
*****
|
*****
|
||||||
|
|
||||||
Overview of Paperless-ngx
|
|
||||||
#########################
|
|
||||||
|
|
||||||
Compared to paperless, paperless-ngx works a little differently under the hood and has
|
.. cssclass:: redirect-notice
|
||||||
more moving parts that work together. While this increases the complexity of
|
|
||||||
the system, it also brings many benefits.
|
|
||||||
|
|
||||||
Paperless consists of the following components:
|
The Paperless-ngx documentation has permanently moved.
|
||||||
|
|
||||||
* **The webserver:** This is pretty much the same as in paperless. It serves
|
You will be redirected shortly...
|
||||||
the administration pages, the API, and the new frontend. This is the main
|
|
||||||
tool you'll be using to interact with paperless. You may start the webserver
|
|
||||||
with
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
$ cd /path/to/paperless/src/
|
|
||||||
$ gunicorn -c ../gunicorn.conf.py paperless.wsgi
|
|
||||||
|
|
||||||
or by any other means such as Apache ``mod_wsgi``.
|
|
||||||
|
|
||||||
* **The consumer:** This is what watches your consumption folder for documents.
|
|
||||||
However, the consumer itself does not really consume your documents.
|
|
||||||
Now it notifies a task processor that a new file is ready for consumption.
|
|
||||||
I suppose it should be named differently.
|
|
||||||
This was also used to check your emails, but that's now done elsewhere as well.
|
|
||||||
|
|
||||||
Start the consumer with the management command ``document_consumer``:
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
$ cd /path/to/paperless/src/
|
|
||||||
$ python3 manage.py document_consumer
|
|
||||||
|
|
||||||
.. _setup-task_processor:
|
|
||||||
|
|
||||||
* **The task processor:** Paperless relies on `Celery - Distributed Task Queue <https://docs.celeryq.dev/en/stable/index.html>`_
|
|
||||||
for doing most of the heavy lifting. This is a task queue that accepts tasks from
|
|
||||||
multiple sources and processes these in parallel. It also comes with a scheduler that executes
|
|
||||||
certain commands periodically.
|
|
||||||
|
|
||||||
This task processor is responsible for:
|
|
||||||
|
|
||||||
* Consuming documents. When the consumer finds new documents, it notifies the task processor to
|
|
||||||
start a consumption task.
|
|
||||||
* The task processor also performs the consumption of any documents you upload through
|
|
||||||
the web interface.
|
|
||||||
* Consuming emails. It periodically checks your configured accounts for new emails and
|
|
||||||
notifies the task processor to consume the attachment of an email.
|
|
||||||
* Maintaining the search index and the automatic matching algorithm. These are things that paperless
|
|
||||||
needs to do from time to time in order to operate properly.
|
|
||||||
|
|
||||||
This allows paperless to process multiple documents from your consumption folder in parallel! On
|
|
||||||
a modern multi core system, this makes the consumption process with full OCR blazingly fast.
|
|
||||||
|
|
||||||
The task processor comes with a built-in admin interface that you can use to check whenever any of the
|
|
||||||
tasks fail and inspect the errors (e.g., wrong email credentials, errors while consuming a specific
|
|
||||||
file, etc).
|
|
||||||
|
|
||||||
* A `redis <https://redis.io/>`_ message broker: This is a really lightweight service that is responsible
|
|
||||||
for getting the tasks from the webserver and the consumer to the task scheduler. These run in a different
|
|
||||||
process (maybe even on different machines!), and therefore, this is necessary.
|
|
||||||
|
|
||||||
* Optional: A database server. Paperless supports PostgreSQL, MariaDB and SQLite for storing its data.
|
|
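The division of labour between the consumer and the task processor can be pictured with a small stand-in. The sketch below does not use Celery or Redis; it replaces both with an in-process thread pool from the standard library purely to show tasks being submitted from one place and processed in parallel elsewhere. The names `consume` and `run_tasks` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def consume(document_name: str) -> str:
    """Stand-in for a consumption task (OCR, indexing, and so on)."""
    return f"consumed {document_name}"


def run_tasks(document_names: list[str], workers: int = 2) -> list[str]:
    """Submit one task per document and collect results in submission order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(consume, name) for name in document_names]
        return [future.result() for future in futures]
```

In the real system the submitter and the workers are separate processes, which is why a message broker sits between them.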
||||||
|
|
||||||
|
|
||||||
Installation
|
|
||||||
############
|
|
||||||
|
|
||||||
You can go multiple routes to set up and run Paperless:
|
|
||||||
|
|
||||||
* :ref:`Use the easy install docker script <setup-docker_script>`
|
|
||||||
* :ref:`Pull the image from Docker Hub <setup-docker_hub>`
|
|
||||||
* :ref:`Build the Docker image yourself <setup-docker_build>`
|
|
||||||
* :ref:`Install Paperless directly on your system manually (bare metal) <setup-bare_metal>`
|
|
||||||
|
|
||||||
The Docker routes are quick & easy. These are the recommended routes. This configures all the stuff
|
|
||||||
from the above automatically so that it just works and uses sensible defaults for all configuration options.
|
|
||||||
Here is a cheat sheet for Docker beginners: `CLI Basics <https://www.sehn.tech/refs/devops-with-docker/>`_
|
|
||||||
|
|
||||||
The bare metal route is complicated to set up but makes it easier
|
|
||||||
should you want to contribute some code back. You need to configure and
|
|
||||||
run the above mentioned components yourself.
|
|
||||||
|
|
||||||
.. _CLI Basics: https://www.sehn.tech/refs/devops-with-docker/
|
|
||||||
|
|
||||||
.. _setup-docker_script:
|
|
||||||
|
|
||||||
Install Paperless from Docker Hub using the installation script
|
|
||||||
===============================================================
|
|
||||||
|
|
||||||
Paperless provides an interactive installation script. This script will ask you
|
|
||||||
for a couple configuration options, download and create the necessary configuration files, pull the docker image, start paperless and create your user account. This script essentially
|
|
||||||
performs all the steps described in :ref:`setup-docker_hub` automatically.
|
|
||||||
|
|
||||||
1. Make sure that docker and docker-compose are installed.
|
|
||||||
2. Download and run the installation script:
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
$ bash -c "$(curl -L https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/install-paperless-ngx.sh)"
|
|
||||||
|
|
||||||
.. _setup-docker_hub:
|
|
||||||
|
|
||||||
Install Paperless from Docker Hub
|
|
||||||
=================================
|
|
||||||
|
|
||||||
1. Log in with your user and create a folder in your home directory (`mkdir -v ~/paperless-ngx`) to hold your configuration files and consumption directory.
|
|
||||||
|
|
||||||
2. Go to the `/docker/compose directory on the project page <https://github.com/paperless-ngx/paperless-ngx/tree/master/docker/compose>`_
|
|
||||||
and download one of the `docker-compose.*.yml` files, depending on which database backend you
|
|
||||||
want to use. Rename this file to `docker-compose.yml`.
|
|
||||||
If you want to enable optional support for Office documents, download a file with `-tika` in the file name.
|
|
||||||
Download the ``docker-compose.env`` file and the ``.env`` file as well and store them
|
|
||||||
in the same directory.
|
|
||||||
|
|
||||||
.. hint::
|
|
||||||
|
|
||||||
For new installations, it is recommended to use PostgreSQL as the database
|
|
||||||
backend.
|
|
||||||
|
|
||||||
3. Install `Docker`_ and `docker-compose`_.
|
|
||||||
|
|
||||||
.. caution::
|
|
||||||
|
|
||||||
If you want to use the included ``docker-compose.*.yml`` file, you
|
|
||||||
need to have at least Docker version **17.09.0** and docker-compose
|
|
||||||
version **1.17.0**.
|
|
||||||
To check, run `docker-compose -v` or `docker -v`.
|
|
||||||
|
|
||||||
See the `Docker installation guide`_ on how to install the current
|
|
||||||
version of Docker for your operating system or Linux distribution of
|
|
||||||
choice. To get the latest version of docker-compose, follow the
|
|
||||||
`docker-compose installation guide`_ if your package repository doesn't
|
|
||||||
include it.
|
|
||||||
|
|
||||||
.. _Docker installation guide: https://docs.docker.com/engine/installation/
|
|
||||||
.. _docker-compose installation guide: https://docs.docker.com/compose/install/
|
|
||||||
|
|
||||||
4. Modify ``docker-compose.yml`` to your preferences. You may want to change the path
|
|
||||||
to the consumption directory. Find the line that specifies where
|
|
||||||
to mount the consumption directory:
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
- ./consume:/usr/src/paperless/consume
|
|
||||||
|
|
||||||
Replace the part BEFORE the colon with a local directory of your choice:
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
- /home/jonaswinkler/paperless-inbox:/usr/src/paperless/consume
|
|
||||||
|
|
||||||
Don't change the part after the colon or paperless won't find your documents.
|
|
||||||
|
|
||||||
You may also need to change the default port that the webserver will use
|
|
||||||
from the default (8000):
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
ports:
|
|
||||||
- 8000:8000
|
|
||||||
|
|
||||||
Replace the part BEFORE the colon with a port of your choice:
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
ports:
|
|
||||||
- 8010:8000
|
|
||||||
|
|
||||||
Don't change the part after the colon or edit other lines that refer to
|
|
||||||
port 8000. Modifying the part before the colon will map requests on another
|
|
||||||
port to the webserver running on the default port.
|
|
||||||
|
|
||||||
**Rootless**
|
|
||||||
|
|
||||||
If you want to run Paperless as a rootless container, you will need to do the
|
|
||||||
following in your ``docker-compose.yml``:
|
|
||||||
|
|
||||||
- set the ``user`` running the container to map to the ``paperless`` user in the
|
|
||||||
container.
|
|
||||||
This value (``user_id`` below), should be the same id that ``USERMAP_UID`` and
|
|
||||||
``USERMAP_GID`` are set to in the next step.
|
|
||||||
See ``USERMAP_UID`` and ``USERMAP_GID`` :ref:`here <configuration-docker>`.
|
|
||||||
|
|
||||||
Your entry for Paperless should contain something like:
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
webserver:
|
|
||||||
image: ghcr.io/paperless-ngx/paperless-ngx:latest
|
|
||||||
user: <user_id>
|
|
||||||
|
|
||||||
5. Modify ``docker-compose.env``, following the comments in the file. The
|
|
||||||
most important change is to set ``USERMAP_UID`` and ``USERMAP_GID``
|
|
||||||
to the uid and gid of your user on the host system. Use ``id -u`` and
|
|
||||||
``id -g`` to get these.
|
|
||||||
|
|
||||||
This ensures that
|
|
||||||
both the docker container and you on the host machine have write access
|
|
||||||
to the consumption directory. If your UID and GID on the host system are
|
|
||||||
1000 (the default for the first normal user on most systems), it will
|
|
||||||
work out of the box without any modifications. Run `id "username"` to check.
|
|
||||||
|
|
||||||
.. note::
|
|
||||||
|
|
||||||
You can copy any setting from the file ``paperless.conf.example`` and paste it here.
|
|
||||||
Have a look at :ref:`configuration` to see what's available.
|
|
||||||
|
|
||||||
.. note::
|
|
||||||
|
|
||||||
You can utilize Docker secrets for some configuration settings by
|
|
||||||
appending `_FILE` to some configuration values. This is supported currently
|
|
||||||
only by:
|
|
||||||
|
|
||||||
* PAPERLESS_DBUSER
|
|
||||||
* PAPERLESS_DBPASS
|
|
||||||
* PAPERLESS_SECRET_KEY
|
|
||||||
* PAPERLESS_AUTO_LOGIN_USERNAME
|
|
||||||
* PAPERLESS_ADMIN_USER
|
|
||||||
* PAPERLESS_ADMIN_MAIL
|
|
||||||
* PAPERLESS_ADMIN_PASSWORD
|
|
||||||
|
|
||||||
.. caution::
|
|
||||||
|
|
||||||
Some file systems such as NFS network shares don't support file system
|
|
||||||
notifications with ``inotify``. When storing the consumption directory
|
|
||||||
on such a file system, paperless will not pick up new files
|
|
||||||
with the default configuration. You will need to use ``PAPERLESS_CONSUMER_POLLING``,
|
|
||||||
which will disable inotify. See :ref:`here <configuration-polling>`.
|
|
||||||
|
|
||||||
6. Run ``docker-compose pull``, followed by ``docker-compose up -d``.
|
|
||||||
This will pull the image, create and start the necessary containers.
|
|
||||||
|
|
||||||
7. To be able to log in, you will need a superuser. To create it, execute the
|
|
||||||
following command:
|
|
||||||
|
|
||||||
.. code-block:: shell-session
|
|
||||||
|
|
||||||
$ docker-compose run --rm webserver createsuperuser
|
|
||||||
|
|
||||||
This will prompt you to set a username, an optional e-mail address and
|
|
||||||
finally a password (at least 8 characters).
|
|
||||||
|
|
||||||
8. The default ``docker-compose.yml`` exports the webserver on your local port
|
|
||||||
8000. If you did not change this, you should now be able to visit your
|
|
||||||
Paperless instance at ``http://127.0.0.1:8000`` or at port 8000 of your server's IP address.
|
|
||||||
Use the login credentials you have created with the previous step.
|
|
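The "replace the part BEFORE the colon" rule from step 4 applies to both the volume and the port entries. A minimal sketch of that rule, assuming simple two-part ``host:container`` entries (no third field such as a read-only flag); the helper names `split_mapping` and `remap_host_side` are hypothetical and not part of docker-compose:

```python
def split_mapping(entry: str) -> tuple[str, str]:
    """Split a two-part 'host:container' entry at its last colon."""
    host_side, container_side = entry.rsplit(":", 1)
    return host_side, container_side


def remap_host_side(entry: str, new_host_side: str) -> str:
    """Replace only the host side, leaving the container side untouched."""
    _, container_side = split_mapping(entry)
    return f"{new_host_side}:{container_side}"
```

For example, `remap_host_side("8000:8000", "8010")` yields the `8010:8000` entry shown in step 4, while the container side stays at the port Paperless listens on.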
||||||
|
|
||||||
.. _Docker: https://www.docker.com/
|
|
||||||
.. _docker-compose: https://docs.docker.com/compose/install/
|
|
||||||
|
|
||||||
.. _setup-docker_build:
|
|
||||||
|
|
||||||
Build the Docker image yourself
|
|
||||||
===============================
|
|
||||||
|
|
||||||
1. Clone the entire repository of paperless:
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
git clone https://github.com/paperless-ngx/paperless-ngx
|
|
||||||
|
|
||||||
The master branch always reflects the latest stable version.
|
|
||||||
|
|
||||||
2. Copy one of the ``docker/compose/docker-compose.*.yml`` to ``docker-compose.yml`` in the root folder,
|
|
||||||
depending on which database backend you want to use. Copy
|
|
||||||
``docker-compose.env`` into the project root as well.
|
|
||||||
|
|
||||||
3. In the ``docker-compose.yml`` file, find the line that instructs docker-compose to pull the paperless image from Docker Hub:
|
|
||||||
|
|
||||||
.. code:: yaml
|
|
||||||
|
|
||||||
webserver:
|
|
||||||
image: ghcr.io/paperless-ngx/paperless-ngx:latest
|
|
||||||
|
|
||||||
and replace it with a line that instructs docker-compose to build the image from the current working directory instead:
|
|
||||||
|
|
||||||
.. code:: yaml
|
|
||||||
|
|
||||||
webserver:
|
|
||||||
build:
|
|
||||||
context: .
|
|
||||||
args:
|
|
||||||
QPDF_VERSION: x.y.z
|
|
||||||
PIKEPDF_VERSION: x.y.z
|
|
||||||
PSYCOPG2_VERSION: x.y.z
|
|
||||||
JBIG2ENC_VERSION: 0.29
|
|
||||||
|
|
||||||
.. note::
|
|
||||||
|
|
||||||
You should match the build argument versions to the version for the release you have
|
|
||||||
checked out. These are pre-built images with certain, more updated software.
|
|
||||||
If you want to build these images yourself, that is possible, but beyond
|
|
||||||
the scope of these steps.
|
|
||||||
|
|
||||||
4. Follow steps 3 to 8 of :ref:`setup-docker_hub`. When asked to run
|
|
||||||
``docker-compose pull`` to pull the image, do
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
$ docker-compose build
|
|
||||||
|
|
||||||
instead to build the image.
|
|
||||||
|
|
||||||
.. _setup-bare_metal:
|
|
||||||
|
|
||||||
Bare Metal Route
|
|
||||||
================
|
|
||||||
|
|
||||||
Paperless runs on Linux only. The following procedure has been tested on a minimal
|
|
||||||
installation of Debian/Buster, which is the current stable release at the time of
|
|
||||||
writing. Windows is not and will never be supported.
|
|
||||||
|
|
||||||
1. Install dependencies. Paperless requires the following packages.
|
|
||||||
|
|
||||||
* ``python3`` 3.8, 3.9
|
|
||||||
* ``python3-pip``
|
|
||||||
* ``python3-dev``
|
|
||||||
|
|
||||||
* ``default-libmysqlclient-dev`` for MariaDB
|
|
||||||
* ``fonts-liberation`` for generating thumbnails for plain text files
|
|
||||||
* ``imagemagick`` >= 6 for PDF conversion
|
|
||||||
* ``gnupg`` for handling encrypted documents
|
|
||||||
* ``libpq-dev`` for PostgreSQL
|
|
||||||
* ``libmagic-dev`` for mime type detection
|
|
||||||
* ``mariadb-client`` for MariaDB compile time
|
|
||||||
* ``mime-support`` for mime type detection
|
|
||||||
* ``libzbar0`` for barcode detection
|
|
||||||
* ``poppler-utils`` for barcode detection
|
|
||||||
|
|
||||||
Use this list for your preferred package management:
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
python3 python3-pip python3-dev imagemagick fonts-liberation gnupg libpq-dev default-libmysqlclient-dev libmagic-dev mime-support libzbar0 poppler-utils
|
|
||||||
|
|
||||||
These dependencies are required for OCRmyPDF, which is used for text recognition.
|
|
||||||
|
|
||||||
* ``unpaper``
|
|
||||||
* ``ghostscript``
|
|
||||||
* ``icc-profiles-free``
|
|
||||||
* ``qpdf``
|
|
||||||
* ``liblept5``
|
|
||||||
* ``libxml2``
|
|
||||||
* ``pngquant`` (suggested for certain PDF image optimizations)
|
|
||||||
* ``zlib1g``
|
|
||||||
* ``tesseract-ocr`` >= 4.0.0 for OCR
|
|
||||||
* ``tesseract-ocr`` language packs (``tesseract-ocr-eng``, ``tesseract-ocr-deu``, etc)
|
|
||||||
|
|
||||||
Use this list for your preferred package management:
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
unpaper ghostscript icc-profiles-free qpdf liblept5 libxml2 pngquant zlib1g tesseract-ocr
|
|
||||||
|
|
||||||
On Raspberry Pi, these libraries are required as well:
|
|
||||||
|
|
||||||
* ``libatlas-base-dev``
|
|
||||||
* ``libxslt1-dev``
|
|
||||||
|
|
||||||
You will also need ``build-essential``, ``python3-setuptools`` and ``python3-wheel``
|
|
||||||
for installing some of the python dependencies.
|
|
||||||
|
|
||||||
2. Install ``redis`` >= 6.0 and configure it to start automatically.
|
|
||||||
|
|
||||||
3. Optional. Install ``postgresql`` and configure a database, user and password for paperless. If you do not wish
|
|
||||||
to use PostgreSQL, MariaDB and SQLite are available as well.
|
|
||||||
|
|
||||||
.. note::
|
|
||||||
|
|
||||||
On bare-metal installations using SQLite, ensure the
|
|
||||||
`JSON1 extension <https://code.djangoproject.com/wiki/JSON1Extension>`_ is enabled. This is
|
|
||||||
usually the case, but not always.
|
|
||||||
|
|
||||||
4. Get the release archive from `<https://github.com/paperless-ngx/paperless-ngx/releases>`_.
|
|
||||||
If you clone the git repo as it is, you also have to compile the front end by yourself.
|
|
||||||
Extract the archive to a place from where you wish to execute it, such as ``/opt/paperless``.
|
|
||||||
|
|
||||||
5. Configure paperless. See :ref:`configuration` for details. Edit the included ``paperless.conf`` and adjust the
|
|
||||||
settings to your needs. Required settings for getting paperless running are:
|
|
||||||
|
|
||||||
* ``PAPERLESS_REDIS`` should point to your redis server, such as redis://localhost:6379.
|
|
||||||
* ``PAPERLESS_DBENGINE`` optional, and should be one of `postgres, mariadb, or sqlite`
|
|
||||||
* ``PAPERLESS_DBHOST`` should be the hostname on which your PostgreSQL server is running. Do not configure this
|
|
||||||
to use SQLite instead. Also configure port, database name, user and password as necessary.
|
|
||||||
* ``PAPERLESS_CONSUMPTION_DIR`` should point to a folder which paperless should watch for documents. You might
|
|
||||||
want to have this somewhere else. Likewise, ``PAPERLESS_DATA_DIR`` and ``PAPERLESS_MEDIA_ROOT`` define where
|
|
||||||
paperless stores its data. If you like, you can point both to the same directory.
|
|
||||||
* ``PAPERLESS_SECRET_KEY`` should be a random sequence of characters. It's used for authentication. Failure
|
|
||||||
to do so allows third parties to forge authentication credentials.
|
|
||||||
* ``PAPERLESS_URL`` if you are behind a reverse proxy. This should point to your domain. Please see
|
|
||||||
:ref:`configuration` for more information.
|
|
||||||
|
|
||||||
Many more adjustments can be made to paperless, especially the OCR part. The following options are recommended
|
|
||||||
for everyone:
|
|
||||||
|
|
||||||
* Set ``PAPERLESS_OCR_LANGUAGE`` to the language most of your documents are written in.
|
|
||||||
* Set ``PAPERLESS_TIME_ZONE`` to your local time zone.
|
|
||||||
|
|
||||||
6. Create a system user under which you wish to run paperless.
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
adduser paperless --system --home /opt/paperless --group
|
|
||||||
|
|
||||||
7. Ensure that the following directories exist
|
|
||||||
and that the paperless user has write permissions to them:
|
|
||||||
|
|
||||||
* ``/opt/paperless/media``
|
|
||||||
* ``/opt/paperless/data``
|
|
||||||
* ``/opt/paperless/consume``
|
|
||||||
|
|
||||||
Adjust as necessary if you configured different folders.
|
|
||||||
|
|
||||||
8. Install python requirements from the ``requirements.txt`` file.
|
|
||||||
It is up to you whether to use a virtual environment or not. First, update pip so that it fetches current packages.
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
sudo -Hu paperless pip3 install --upgrade pip
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
sudo -Hu paperless pip3 install -r requirements.txt
|
|
||||||
|
|
||||||
This will install all python dependencies in the home directory of
|
|
||||||
the new paperless user.
|
|
||||||
|
|
||||||
9. Go to ``/opt/paperless/src``, and execute the following commands:
|
|
||||||
|
|
||||||
.. code:: bash
|
|
||||||
|
|
||||||
# This creates the database schema.
|
|
||||||
sudo -Hu paperless python3 manage.py migrate
|
|
||||||
|
|
||||||
# This creates your first paperless user
|
|
||||||
sudo -Hu paperless python3 manage.py createsuperuser
|
|
||||||
|
|
||||||
10. Optional: Test that paperless is working by executing
|
|
||||||
|
|
||||||
.. code:: bash
|
|
||||||
|
|
||||||
# This starts the Django development server.
|
|
||||||
sudo -Hu paperless python3 manage.py runserver
|
|
||||||
|
|
||||||
and pointing your browser to http://localhost:8000/.
|
|
||||||
|
|
||||||
.. warning::
|
|
||||||
|
|
||||||
This is a development server which should not be used in
|
|
||||||
production. It is not audited for security and performance
|
|
||||||
is inferior to production ready web servers.
|
|
||||||
|
|
||||||
.. hint::
|
|
||||||
|
|
||||||
This will not start the consumer. Paperless does this in a
|
|
||||||
separate process.
|
|
||||||
|
|
||||||
11. Set up systemd services to run paperless automatically. You may
|
|
||||||
use the service definition files included in the ``scripts`` folder
|
|
||||||
as a starting point.
|
|
||||||
|
|
||||||
Paperless needs the ``webserver`` script to run the webserver, the
|
|
||||||
``consumer`` script to watch the input folder, ``taskqueue`` for the background workers
|
|
||||||
used to handle things like document consumption and the ``scheduler`` script to run tasks such as
|
|
||||||
email checking at certain times.
|
|
||||||
|
|
||||||
The ``socket`` script enables ``gunicorn`` to run on port 80 without
|
|
||||||
root privileges. For this you need to uncomment the ``Requires=paperless-webserver.socket`` line
|
|
||||||
in the ``webserver`` script and configure ``gunicorn`` to listen on port 80 (see ``paperless/gunicorn.conf.py``).
|
|
||||||
|
|
||||||
You may need to adjust the path to the ``gunicorn`` executable. This
|
|
||||||
will be installed as part of the python dependencies, and is either located
|
|
||||||
in the ``bin`` folder of your virtual environment, or in ``~/.local/bin/`` if
|
|
||||||
no virtual environment is used.
|
|
||||||
|
|
||||||
These services rely on redis and optionally the database server, but
|
|
||||||
don't need to be started in any particular order. The example files
|
|
||||||
depend on redis being started. If you use a database server, you should
|
|
||||||
add additional dependencies.
|
|
||||||
|
|
||||||
.. caution::
|
|
||||||
|
|
||||||
The included scripts run a ``gunicorn`` standalone server,
|
|
||||||
which is fine for running paperless. It does support SSL,
|
|
||||||
however, the gunicorn documentation states that you should
|
|
||||||
use a proxy server in front of gunicorn instead.
|
|
||||||
|
|
||||||
For instructions on how to use nginx for that,
|
|
||||||
:ref:`see the instructions below <setup-nginx>`.
|
|
||||||
|
|
||||||
12. Optional: Install a samba server and make the consumption folder
|
|
||||||
available as a network share.
|
|
||||||
|
|
||||||
13. Configure ImageMagick to allow processing of PDF documents. Most distributions have
|
|
||||||
this disabled by default, since PDF documents can contain malware. If
|
|
||||||
you don't do this, paperless will fall back to ghostscript for certain steps
|
|
||||||
such as thumbnail generation.
|
|
||||||
|
|
||||||
Edit ``/etc/ImageMagick-6/policy.xml`` and adjust
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
<policy domain="coder" rights="none" pattern="PDF" />
|
|
||||||
|
|
||||||
to
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
<policy domain="coder" rights="read|write" pattern="PDF" />
|
|
||||||
|
|
||||||
14. Optional: Install the `jbig2enc <https://ocrmypdf.readthedocs.io/en/latest/jbig2.html>`_
|
|
||||||
encoder. This will reduce the size of generated PDF documents. You'll most likely need
|
|
||||||
to compile this yourself, because this software was patented until around 2017 and
|
|
||||||
binary packages are not available for most distributions.
|
|
||||||
|
|
||||||
15. Optional: If using the NLTK machine learning processing (see ``PAPERLESS_ENABLE_NLTK`` in
|
|
||||||
:ref:`configuration` for details), download the NLTK data for the Snowball Stemmer, Stopwords
|
|
||||||
and Punkt tokenizer to your ``PAPERLESS_DATA_DIR/nltk``. Refer to
|
|
||||||
the `NLTK instructions <https://www.nltk.org/data.html>`_ for details on how to
|
|
||||||
download the data.
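
As a rough sketch of step 15, the data can be fetched with NLTK's own downloader. The package identifiers ``snowball_data``, ``stopwords`` and ``punkt`` are assumed to be the NLTK names for the three resources listed above, and the helper names are made up for this example; check both against the NLTK instructions before relying on them.

```python
from pathlib import Path

# Assumed NLTK identifiers for the Snowball Stemmer, Stopwords and Punkt tokenizer.
NLTK_PACKAGES = ["snowball_data", "stopwords", "punkt"]


def nltk_dir(data_dir):
    """Return the nltk sub-directory of the given PAPERLESS_DATA_DIR."""
    return Path(data_dir) / "nltk"


def download_nltk_data(data_dir):
    """Download each package into <data_dir>/nltk (needs network access)."""
    import nltk  # imported lazily so nltk_dir() works without NLTK installed

    target = nltk_dir(data_dir)
    target.mkdir(parents=True, exist_ok=True)
    for package in NLTK_PACKAGES:
        nltk.download(package, download_dir=str(target))
```

With the folders used in this guide, this would be called as ``download_nltk_data("/opt/paperless/data")``, run as the paperless user so that the files end up with the right ownership.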
|
|
||||||
|
|
||||||
|
|
||||||
Migrating to Paperless-ngx
|
|
||||||
##########################
|
|
||||||
|
|
||||||
Migration is possible both from Paperless-ng and directly from the 'original' Paperless.
|
|
||||||
|
|
||||||
Migrating from Paperless-ng
|
|
||||||
===========================
|
|
||||||
|
|
||||||
Paperless-ngx is meant to be a drop-in replacement for Paperless-ng and thus upgrading should be
|
|
||||||
trivial for most users, especially when using docker. However, as with any major change, it is
|
|
||||||
recommended to take a full backup first. Once you are ready, simply change the docker image to
|
|
||||||
point to the new source. E.g. if using Docker Compose, edit ``docker-compose.yml`` and change:
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
image: jonaswinkler/paperless-ng:latest
|
|
||||||
|
|
||||||
to
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
image: ghcr.io/paperless-ngx/paperless-ngx:latest
|
|
||||||
|
|
||||||
and then run ``docker-compose up -d``, which will pull the new image and recreate the container.
|
|
||||||
That's it!
|
|
||||||
|
|
||||||
Users who installed with the bare-metal route should also update their Git clone to point to
|
|
||||||
``https://github.com/paperless-ngx/paperless-ngx``, e.g. using the command
|
|
||||||
``git remote set-url origin https://github.com/paperless-ngx/paperless-ngx`` and then pull the
|
|
||||||
latest version.
|
|
||||||
|
|
||||||
Migrating from Paperless
|
|
||||||
========================
|
|
||||||
|
|
||||||
At its core, paperless-ngx is still paperless and fully compatible. However, some
|
|
||||||
things have changed under the hood, so you need to adapt your setup depending on
|
|
||||||
how you installed paperless.
|
|
||||||
|
|
||||||
This section describes how to update an existing paperless Docker installation.
|
|
||||||
The important things to keep in mind are as follows:
|
|
||||||
|
|
||||||
* Read the :doc:`changelog </changelog>` and take note of breaking changes.
|
|
||||||
* You should decide if you want to stick with SQLite or want to migrate your database
|
|
||||||
to PostgreSQL. See :ref:`setup-sqlite_to_psql` for details on how to move your data from
|
|
||||||
SQLite to PostgreSQL. Both work fine with paperless. However, if you already have a
|
|
||||||
database server running for other services, you might as well use it for paperless as well.
|
|
||||||
* The task scheduler of paperless, which is used to execute periodic tasks
|
|
||||||
such as email checking and maintenance, requires a `redis`_ message broker
|
|
||||||
instance. The docker-compose route takes care of that.
|
|
||||||
* The layout of the folder structure for your documents and data remains the
|
|
||||||
same, so you can just plug your old docker volumes into paperless-ngx and
|
|
||||||
expect it to find everything where it should be.
|
|
||||||
|
|
||||||
Migration to paperless-ngx is then performed in a few simple steps:
|
|
||||||
|
|
||||||
1. Stop paperless.
|
|
||||||
|
|
||||||
.. code:: bash
|
|
||||||
|
|
||||||
$ cd /path/to/current/paperless
|
|
||||||
$ docker-compose down
|
|
||||||
|
|
||||||
2. Make a backup, for two reasons: first, if something goes wrong, you still have your
|
|
||||||
data. Second, if you don't like paperless-ngx, you can switch back to
|
|
||||||
paperless.
|
|
||||||
|
|
||||||
3. Download the latest release of paperless-ngx. You can either go with the
|
|
||||||
docker-compose files from `here <https://github.com/paperless-ngx/paperless-ngx/tree/master/docker/compose>`__
|
|
||||||
or clone the repository to build the image yourself (see :ref:`above <setup-docker_build>`).
|
|
||||||
You can either replace your current paperless folder or put paperless-ngx
|
|
||||||
in a different location.
|
|
||||||
|
|
||||||
.. caution::
|
|
||||||
|
|
||||||
Paperless-ngx includes a ``.env`` file. This will set the
|
|
||||||
project name for docker compose to ``paperless``, which will also define the name
|
|
||||||
of the volumes by paperless-ngx. However, if you find that paperless-ngx
|
|
||||||
is not using your old paperless volumes, verify the names of your volumes with
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
$ docker volume ls | grep _data
|
|
||||||
|
|
||||||
and adjust the project name in the ``.env`` file so that it matches the name
|
|
||||||
of the volumes before the ``_data`` part.
|
|
||||||
|
|
||||||
|
|
||||||
4. Download the ``docker-compose.sqlite.yml`` file to ``docker-compose.yml``.
|
|
||||||
If you want to switch to PostgreSQL, do that after you have migrated your existing
|
|
||||||
SQLite database.
|
|
||||||
|
|
||||||
5. Adjust ``docker-compose.yml`` and ``docker-compose.env`` to your needs.
|
|
||||||
See :ref:`setup-docker_hub` for details on which edits are advised.
|
|
||||||
|
|
||||||
6. :ref:`Update paperless. <administration-updating>`
|
|
||||||
|
|
||||||
7. In order to find your existing documents with the new search feature, you need
|
|
||||||
to invoke a one-time operation that will create the search index:
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
$ docker-compose run --rm webserver document_index reindex
|
|
||||||
|
|
||||||
This will migrate your database and create the search index. After that,
|
|
||||||
paperless will take care of maintaining the index by itself.
|
|
||||||
|
|
||||||
8. Start paperless-ngx.
|
|
||||||
|
|
||||||
.. code:: bash
|
|
||||||
|
|
||||||
$ docker-compose up -d
|
|
||||||
|
|
||||||
This will run paperless in the background and automatically start it on system boot.
|
|
||||||
|
|
||||||
9. Paperless installed a permanent redirect to ``admin/`` in your browser. This
|
|
||||||
redirect is still in place and prevents access to the new UI. Clear your
|
|
||||||
browsing cache in order to fix this.
|
|
||||||
|
|
||||||
10. Optionally, follow the instructions below to migrate your existing data to PostgreSQL.
|
|
||||||
|
|
||||||
|
|
||||||
Migrating from LinuxServer.io Docker Image
|
|
||||||
==========================================
|
|
||||||
|
|
||||||
As with any upgrades and large changes, it is highly recommended to create a backup before
|
|
||||||
starting. This assumes the image was running using Docker Compose, but the instructions
|
|
||||||
are translatable to Docker commands as well.
|
|
||||||
|
|
||||||
1. Stop and remove the paperless container
|
|
||||||
2. If using an external database, stop the container
|
|
||||||
3. Update Redis configuration
|
|
||||||
|
|
||||||
a) If ``REDIS_URL`` is already set, change it to ``PAPERLESS_REDIS`` and continue
|
|
||||||
to step 4.
|
|
||||||
b) Otherwise, in the ``docker-compose.yml`` add a new service for Redis,
|
|
||||||
following `the example compose files <https://github.com/paperless-ngx/paperless-ngx/tree/main/docker/compose>`_
|
|
||||||
c) Set the environment variable ``PAPERLESS_REDIS`` so it points to the new Redis container
|
|
||||||
|
|
||||||
4. Update user mapping
|
|
||||||
|
|
||||||
a) If set, change the environment variable ``PUID`` to ``USERMAP_UID``
|
|
||||||
b) If set, change the environment variable ``PGID`` to ``USERMAP_GID``
|
|
||||||
|
|
||||||
5. Update configuration paths
|
|
||||||
|
|
||||||
a) Set the environment variable ``PAPERLESS_DATA_DIR``
|
|
||||||
to ``/config``
|
|
||||||
|
|
||||||
6. Update media paths
|
|
||||||
|
|
||||||
a) Set the environment variable ``PAPERLESS_MEDIA_ROOT``
|
|
||||||
to ``/data/media``
|
|
||||||
|
|
||||||
7. Update timezone
|
|
||||||
|
|
||||||
a) Set the environment variable ``PAPERLESS_TIME_ZONE``
|
|
||||||
to the same value as ``TZ``
|
|
||||||
|
|
||||||
8. Modify the ``image:`` to point to ``ghcr.io/paperless-ngx/paperless-ngx:latest`` or
|
|
||||||
a specific version if preferred.
|
|
||||||
|
|
||||||
9. Start the containers as before, using ``docker-compose``.
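
The variable changes in steps 3 to 7 can be summarised as a small mapping. The following is only an illustrative sketch: the function name is made up for this example, it works on a plain dictionary rather than on ``docker-compose.yml``, and it does not add the Redis service from step 3b.

```python
def migrate_linuxserver_env(env):
    """Return a copy of env with the renames and fixed paths from steps 3 to 7."""
    renames = {
        "REDIS_URL": "PAPERLESS_REDIS",  # step 3a
        "PUID": "USERMAP_UID",           # step 4a
        "PGID": "USERMAP_GID",           # step 4b
    }
    migrated = {renames.get(key, key): value for key, value in env.items()}
    migrated["PAPERLESS_DATA_DIR"] = "/config"        # step 5
    migrated["PAPERLESS_MEDIA_ROOT"] = "/data/media"  # step 6
    if "TZ" in env:
        migrated["PAPERLESS_TIME_ZONE"] = env["TZ"]   # step 7
    return migrated
```

For example, ``migrate_linuxserver_env({"PUID": "1000", "TZ": "Europe/Berlin"})`` yields ``USERMAP_UID``, the two path variables, and ``PAPERLESS_TIME_ZONE`` alongside the untouched ``TZ``.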
|
|
||||||
|
|
||||||
.. _setup-sqlite_to_psql:
|
|
||||||
|
|
||||||
Moving data from SQLite to PostgreSQL or MySQL/MariaDB
|
|
||||||
======================================================
|
|
||||||
|
|
||||||
Moving your data from SQLite to PostgreSQL or MySQL/MariaDB is done by executing a series of Django
|
|
||||||
management commands as below. The commands below use PostgreSQL, but are applicable to MySQL/MariaDB
|
|
||||||
with the
|
|
||||||
|
|
||||||
.. caution::
|
|
||||||
|
|
||||||
Make sure that your SQLite database is migrated to the latest version.
|
|
||||||
Starting paperless will make sure that this is the case. If you try to
|
|
||||||
load data from an old database schema in SQLite into a newer database
|
|
||||||
schema in PostgreSQL, you will run into trouble.
|
|
||||||
|
|
||||||
.. warning::
|
|
||||||
|
|
||||||
On some database fields, PostgreSQL enforces predefined limits on maximum
|
|
||||||
length, whereas SQLite does not. The fields in question are the title of documents
|
|
||||||
(128 characters), names of document types, tags and correspondents (128 characters),
|
|
||||||
and filenames (1024 characters). If you have data in these fields that surpasses these
|
|
||||||
limits, migration to PostgreSQL is not possible and will fail with an error.
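
One way to look for such values before migrating is to query the SQLite file directly. This is a minimal sketch using Python's standard ``sqlite3`` module; the helper name is made up, and the table and column names you pass in (for instance a ``documents_document`` table with a ``title`` column) are assumptions that should be checked against your actual schema.

```python
import sqlite3


def find_overlong(conn, table, column, limit):
    """Return (rowid, length) for every value in table.column longer than limit."""
    # Identifiers cannot be bound as parameters, so only pass trusted names here.
    query = (
        f'SELECT rowid, length("{column}") FROM "{table}" '
        f'WHERE length("{column}") > ?'
    )
    return conn.execute(query, (limit,)).fetchall()
```

Open the database with ``sqlite3.connect("/path/to/db.sqlite3")`` and call, for example, ``find_overlong(conn, "documents_document", "title", 128)``; an empty list means no value exceeds the limit.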
|
|
||||||
|
|
||||||
.. warning::
|
|
||||||
|
|
||||||
MySQL is case insensitive by default, treating values like "Name" and "NAME" as identical.
|
|
||||||
See :ref:`advanced-mysql-caveats` for details.
|
|
||||||
|
|
||||||
|
|
||||||
1. Stop paperless, if it is running.
|
|
||||||
2. Tell paperless to use PostgreSQL:
|
|
||||||
|
|
||||||
a) With docker, copy the provided ``docker-compose.postgres.yml`` file to
|
|
||||||
``docker-compose.yml``. Remember to adjust the consumption directory,
|
|
||||||
if necessary.
|
|
||||||
b) Without docker, configure the database in your ``paperless.conf`` file.
|
|
||||||
See :ref:`configuration` for details.
|
|
||||||
|
|
||||||
3. Open a shell and initialize the database:
|
|
||||||
|
|
||||||
a) With docker, run the following command to open a shell within the paperless
|
|
||||||
container:
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
$ cd /path/to/paperless
|
|
||||||
$ docker-compose run --rm webserver /bin/bash
|
|
||||||
|
|
||||||
This will launch the container and initialize the PostgreSQL database.
|
|
||||||
|
|
||||||
b) Without docker, remember to activate any virtual environment, switch to
|
|
||||||
the ``src`` directory and create the database schema:
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
$ cd /path/to/paperless/src
|
|
||||||
$ python3 manage.py migrate
|
|
||||||
|
|
||||||
This will not copy any data yet.
|
|
||||||
|
|
||||||
4. Dump your data from SQLite:
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
$ python3 manage.py dumpdata --database=sqlite --exclude=contenttypes --exclude=auth.Permission > data.json
|
|
||||||
|
|
||||||
5. Load your data into PostgreSQL:
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
$ python3 manage.py loaddata data.json
|
|
||||||
|
|
||||||
6. If operating inside Docker, you may exit the shell now.
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
$ exit
|
|
||||||
|
|
||||||
7. Start paperless.
|
|
||||||
|
|
||||||
|
|
||||||
Moving back to Paperless
|
|
||||||
========================
|
|
||||||
|
|
||||||
Let's say you migrated to Paperless-ngx and used it for a while, but decided that
|
|
||||||
you don't like it and want to move back (If you do, send me a mail about what
|
|
||||||
part you didn't like!), you can totally do that with a few simple steps.
|
|
||||||
|
|
||||||
Paperless-ngx modified the database schema slightly; however, these changes can
|
|
||||||
be reverted while keeping your current data, so that your current data will
|
|
||||||
be compatible with original Paperless.
|
|
||||||
|
|
||||||
Execute this:
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
$ cd /path/to/paperless
|
|
||||||
$ docker-compose run --rm webserver migrate documents 0023
|
|
||||||
|
|
||||||
Or without docker:
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
$ cd /path/to/paperless/src
|
|
||||||
$ python3 manage.py migrate documents 0023
|
|
||||||
|
|
||||||
After that, you need to clear your cookies (Paperless-ngx comes with updated
|
|
||||||
dependencies that do cookie-processing differently) and probably your cache
|
|
||||||
as well.
|
|
||||||
|
|
||||||
.. _setup-less_powerful_devices:
|
|
||||||
|
|
||||||
|
|
||||||
Considerations for less powerful devices
|
|
||||||
########################################
|
|
||||||
|
|
||||||
Paperless runs on Raspberry Pi. However, some things are rather slow on the Pi and
|
|
||||||
configuring some options in paperless can help improve performance immensely:
|
|
||||||
|
|
||||||
* Stick with SQLite to save some resources.
|
|
||||||
* Consider setting ``PAPERLESS_OCR_PAGES`` to 1, so that paperless will only OCR
|
|
||||||
the first page of your documents. In most cases, this page contains enough
|
|
||||||
information to be able to find it.
|
|
||||||
* ``PAPERLESS_TASK_WORKERS`` and ``PAPERLESS_THREADS_PER_WORKER`` are configured
|
|
||||||
to use all cores. The Raspberry Pi models 3 and up have 4 cores, meaning that
|
|
||||||
paperless will use 2 workers and 2 threads per worker. This may result in
|
|
||||||
sluggish response times during consumption, so you might want to lower these
|
|
||||||
settings (example: 2 workers and 1 thread to always have some computing power
|
|
||||||
left for other tasks).
|
|
||||||
* Keep ``PAPERLESS_OCR_MODE`` at its default value ``skip`` and consider OCR'ing
|
|
||||||
your documents before feeding them into paperless. Some scanners are able to
|
|
||||||
do this! You might want to even specify ``skip_noarchive`` to skip archive
|
|
||||||
file generation for already ocr'ed documents entirely.
|
|
||||||
* If you want to perform OCR on the device, consider using ``PAPERLESS_OCR_CLEAN=none``.
|
|
||||||
This will speed up OCR times and use less memory at the expense of slightly worse
|
|
||||||
OCR results.
|
|
||||||
* If using docker, consider setting ``PAPERLESS_WEBSERVER_WORKERS`` to
|
|
||||||
1. This will save some memory.
|
|
||||||
* Consider setting ``PAPERLESS_ENABLE_NLTK`` to false, to disable the more
|
|
||||||
advanced language processing, which can take more memory and processing time.
|
|
||||||
|
|
||||||
For details, refer to :ref:`configuration`.
|
|
||||||
|
|
||||||
.. note::
|
|
||||||
|
|
||||||
Updating the :ref:`automatic matching algorithm <advanced-automatic_matching>`
|
|
||||||
takes quite a bit of time. However, the update mechanism checks if your
|
|
||||||
data has changed before doing the heavy lifting. If you experience the
|
|
||||||
algorithm taking too much CPU time, consider changing the schedule in the
|
|
||||||
admin interface to daily. You can also manually invoke the task
|
|
||||||
by changing the date and time of the next run to today/now.
|
|
||||||
|
|
||||||
The actual matching of the algorithm is fast and works on Raspberry Pi as
|
|
||||||
well as on any other device.
|
|
||||||
|
|
||||||
.. _redis: https://redis.io/
|
|
||||||
|
|
||||||
|
|
||||||
.. _setup-nginx:
|
|
||||||
|
|
||||||
Using nginx as a reverse proxy
|
|
||||||
##############################
|
|
||||||
|
|
||||||
If you want to expose paperless to the internet, you should hide it behind a
|
|
||||||
reverse proxy with SSL enabled.
|
|
||||||
|
|
||||||
In addition to the usual configuration for SSL,
|
|
||||||
the following configuration is required for paperless to operate:
|
|
||||||
|
|
||||||
.. code:: nginx
|
|
||||||
|
|
||||||
http {
|
|
||||||
|
|
||||||
# Adjust as required. This is the maximum size for file uploads.
|
|
||||||
# The default value 1M might be a little too small.
|
|
||||||
client_max_body_size 10M;
|
|
||||||
|
|
||||||
server {
|
|
||||||
|
|
||||||
location / {
|
|
||||||
|
|
||||||
# Adjust host and port as required.
|
|
||||||
proxy_pass http://localhost:8000/;
|
|
||||||
|
|
||||||
# These configuration options are required for WebSockets to work.
|
|
||||||
proxy_http_version 1.1;
|
|
||||||
proxy_set_header Upgrade $http_upgrade;
|
|
||||||
proxy_set_header Connection "upgrade";
|
|
||||||
|
|
||||||
proxy_redirect off;
|
|
||||||
proxy_set_header Host $host;
|
|
||||||
proxy_set_header X-Real-IP $remote_addr;
|
|
||||||
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
|
|
||||||
proxy_set_header X-Forwarded-Host $server_name;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
The ``PAPERLESS_URL`` configuration variable is also required when using a reverse proxy. Please refer to the :ref:`hosting-and-security` docs.
|
|
||||||
|
|
||||||
Also read `this <https://channels.readthedocs.io/en/stable/deploying.html#nginx-supervisor-ubuntu>`__, towards the end of the section.
|
|
||||||
|
@ -1,328 +1,12 @@
|
|||||||
|
.. _troubleshooting:
|
||||||
|
|
||||||
***************
|
***************
|
||||||
Troubleshooting
|
Troubleshooting
|
||||||
***************
|
***************
|
||||||
|
|
||||||
No files are added by the consumer
|
|
||||||
##################################
|
|
||||||
|
|
||||||
Check for the following issues:
|
.. cssclass:: redirect-notice
|
||||||
|
|
||||||
* Ensure that the directory you're putting your documents in is the folder
|
The Paperless-ngx documentation has permanently moved.
|
||||||
paperless is watching. With docker, this setting is performed in the
|
|
||||||
``docker-compose.yml`` file. Without docker, look at the ``CONSUMPTION_DIR``
|
|
||||||
setting. Don't adjust this setting if you're using docker.
|
|
||||||
* Ensure that redis is up and running. Paperless does its task processing
|
|
||||||
asynchronously, and for documents to arrive at the task processor, it needs
|
|
||||||
redis to run.
|
|
||||||
* Ensure that the task processor is running. Docker does this automatically.
|
|
||||||
Manually invoke the task processor by executing
|
|
||||||
|
|
||||||
.. code:: shell-session
|
You will be redirected shortly...
|
||||||
|
|
||||||
$ celery --app paperless worker
|
|
||||||
|
|
||||||
* Look at the output of paperless and inspect it for any errors.
|
|
||||||
* Go to the admin interface, and check if there are failed tasks. If so, the
|
|
||||||
tasks will contain an error message.
|
|
||||||
|
|
||||||
Consumer warns ``OCR for XX failed``
|
|
||||||
####################################
|
|
||||||
|
|
||||||
If you find the OCR accuracy to be too low, and/or the document consumer warns
|
|
||||||
that ``OCR for XX failed, but we're going to stick with what we've got since
|
|
||||||
FORGIVING_OCR is enabled``, then you might need to install the
|
|
||||||
`Tesseract language files <http://packages.ubuntu.com/search?keywords=tesseract-ocr>`_
|
|
||||||
matching your documents' languages.
|
|
||||||
|
|
||||||
As an example, if you are running Paperless-ngx from any Ubuntu or Debian
|
|
||||||
box, and your documents are written in Spanish you may need to run::
|
|
||||||
|
|
||||||
apt-get install -y tesseract-ocr-spa
|
|
||||||
|
|
||||||
Consumer fails to pick up any new files
|
|
||||||
#######################################
|
|
||||||
|
|
||||||
If you notice that the consumer will only pick up files in the consumption
|
|
||||||
directory at startup, but won't find any other files added later, you will need to
|
|
||||||
enable filesystem polling with the configuration option
|
|
||||||
``PAPERLESS_CONSUMER_POLLING``, see :ref:`here <configuration-polling>`.
|
|
||||||
|
|
||||||
This will disable listening to filesystem changes with inotify and paperless will
|
|
||||||
manually check the consumption directory for changes instead.
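
To illustrate the difference, polling amounts to taking repeated snapshots of the directory and comparing them, instead of waiting for inotify events. The sketch below shows the general idea only; it is not the consumer's actual implementation, and the function names are made up for this example.

```python
import os


def snapshot(directory):
    """Map each regular file in directory to its (mtime, size)."""
    state = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                info = entry.stat()
                state[entry.name] = (info.st_mtime, info.st_size)
    return state


def new_or_changed(previous, current):
    """Return the names that are new or whose (mtime, size) changed."""
    return sorted(
        name for name, meta in current.items() if previous.get(name) != meta
    )
```

A polling loop would call ``snapshot()`` at a fixed interval and hand the result of ``new_or_changed()`` to the consumer. This works on file systems that deliver no change notifications, at the cost of a delay of up to one interval.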
|
|
||||||
|
|
||||||
|
|
||||||
Paperless always redirects to /admin
|
|
||||||
####################################
|
|
||||||
|
|
||||||
You probably had the old paperless installed at some point. Paperless installed
|
|
||||||
a permanent redirect to /admin in your browser, and you need to clear your
|
|
||||||
browsing data / cache to fix that.
|
|
||||||
|
|
||||||
|
|
||||||
Operation not permitted
|
|
||||||
#######################
|
|
||||||
|
|
||||||
You might see errors such as:
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
chown: changing ownership of '../export': Operation not permitted
|
|
||||||
|
|
||||||
The container tries to set file ownership on the listed directories. This is
|
|
||||||
required so that the user running paperless inside docker has write permissions
|
|
||||||
to these folders. This happens when pointing these directories to NFS shares,
|
|
||||||
for example.
|
|
||||||
|
|
||||||
Ensure that ``chown`` is possible on these directories.
|
|
||||||
|
|
||||||
|
|
||||||
Classifier error: No training data available
|
|
||||||
############################################
|
|
||||||
|
|
||||||
This indicates that the Auto matching algorithm found no documents to learn from.
|
|
||||||
This may have two reasons:
|
|
||||||
|
|
||||||
* You don't use the Auto matching algorithm: The error can be safely ignored in this case.
|
|
||||||
* You are using the Auto matching algorithm: The classifier explicitly excludes documents
|
|
||||||
with Inbox tags. Verify that there are documents in your archive without inbox tags.
|
|
||||||
The algorithm will only learn from documents not in your inbox.
|
|
||||||
|
|
||||||
|
|
||||||
UserWarning in sklearn on every single document
|
|
||||||
###############################################
|
|
||||||
|
|
||||||
You may encounter warnings like this:
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
/usr/local/lib/python3.7/site-packages/sklearn/base.py:315:
|
|
||||||
UserWarning: Trying to unpickle estimator CountVectorizer from version 0.23.2 when using version 0.24.0.
|
|
||||||
This might lead to breaking code or invalid results. Use at your own risk.
|
|
||||||
|
|
||||||
This happens when certain dependencies of paperless that are responsible for the auto matching algorithm are
|
|
||||||
updated. After updating these, your current training data *might* not be compatible anymore. This can be ignored
|
|
||||||
in most cases. This warning will disappear automatically when paperless updates the training data.
|
|
||||||
|
|
||||||
If you want to get rid of the warning or actually experience issues with automatic matching, delete
|
|
||||||
the file ``classification_model.pickle`` in the data directory and let paperless recreate it.
|
|
||||||
|
|
||||||
|
|
||||||
504 Server Error: Gateway Timeout when adding Office documents
|
|
||||||
##############################################################
|
|
||||||
|
|
||||||
You may experience these errors when using the optional TIKA integration:
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
requests.exceptions.HTTPError: 504 Server Error: Gateway Timeout for url: http://gotenberg:3000/forms/libreoffice/convert
|
|
||||||
|
|
||||||
Gotenberg is a server that converts Office documents into PDF documents and has a default timeout of 30 seconds.
|
|
||||||
When conversion takes longer, Gotenberg raises this error.
|
|
||||||
|
|
||||||
You can increase the timeout by configuring a command flag for Gotenberg (see also `here <https://gotenberg.dev/docs/modules/api#properties>`__).
|
|
||||||
If using docker-compose, this is achieved by the following configuration change in the ``docker-compose.yml`` file:
|
|
||||||
|
|
||||||
.. code:: yaml
|
|
||||||
|
|
||||||
gotenberg:
|
|
||||||
image: gotenberg/gotenberg:7.6
|
|
||||||
restart: unless-stopped
|
|
||||||
command:
|
|
||||||
- "gotenberg"
|
|
||||||
- "--chromium-disable-routes=true"
|
|
||||||
- "--api-timeout=60"
|
|
||||||
|
|
||||||
Permission denied errors in the consumption directory
|
|
||||||
#####################################################
|
|
||||||
|
|
||||||
You might encounter errors such as:
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
The following error occured while consuming document.pdf: [Errno 13] Permission denied: '/usr/src/paperless/src/../consume/document.pdf'
|
|
||||||
|
|
||||||
This happens when paperless does not have permission to delete files inside the consumption directory.
|
|
||||||
Ensure that ``USERMAP_UID`` and ``USERMAP_GID`` are set to the user id and group id you use on the host operating system, if these are
|
|
||||||
different from ``1000``. See :ref:`setup-docker_hub`.
|
|
||||||
|
|
||||||
Also ensure that you are able to read and write to the consumption directory on the host.
|
|
||||||
|
|
||||||
|
|
||||||
OSError: [Errno 19] No such device when consuming files
|
|
||||||
#######################################################
|
|
||||||
|
|
||||||
If you experience errors such as:
|
|
||||||
|
|
||||||
.. code:: shell-session
|
|
||||||
|
|
||||||
File "/usr/local/lib/python3.7/site-packages/whoosh/codec/base.py", line 570, in open_compound_file
|
|
||||||
return CompoundStorage(dbfile, use_mmap=storage.supports_mmap)
|
|
||||||
File "/usr/local/lib/python3.7/site-packages/whoosh/filedb/compound.py", line 75, in __init__
|
|
||||||
self._source = mmap.mmap(fileno, 0, access=mmap.ACCESS_READ)
|
|
||||||
OSError: [Errno 19] No such device
|
|
||||||
|
|
||||||
During handling of the above exception, another exception occurred:
|
|
||||||
|
|
||||||
Traceback (most recent call last):
|
|
||||||
File "/usr/local/lib/python3.7/site-packages/django_q/cluster.py", line 436, in worker
|
|
||||||
res = f(*task["args"], **task["kwargs"])
|
|
||||||
File "/usr/src/paperless/src/documents/tasks.py", line 73, in consume_file
|
|
||||||
override_tag_ids=override_tag_ids)
|
|
||||||
File "/usr/src/paperless/src/documents/consumer.py", line 271, in try_consume_file
|
|
||||||
raise ConsumerError(e)
|
|
||||||
|
|
||||||
Paperless uses a search index to provide better and faster full text searching. This search index is stored inside
|
|
||||||
the ``data`` folder. The search index uses memory-mapped files (mmap). The above error indicates that paperless
|
|
||||||
was unable to create and open these files.
|
|
||||||
|
|
||||||
This happens when you're trying to store the data directory on certain file systems (mostly network shares)
|
|
||||||
that don't support memory-mapped files.
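
If you are unsure whether a given location supports memory-mapped files, a quick check along the following lines may help. It is a sketch using only the Python standard library; the function name is made up, and a successful result does not guarantee that the search index will work there in every respect.

```python
import mmap
import os
import tempfile


def supports_mmap(directory):
    """Return True if a file created in directory can be memory-mapped for reading."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.write(fd, b"paperless")  # an empty file cannot be mapped
        try:
            with mmap.mmap(fd, 0, access=mmap.ACCESS_READ):
                return True
        except OSError:
            return False
    finally:
        os.close(fd)
        os.remove(path)
```

Run it against the intended data directory, for example ``supports_mmap("/opt/paperless/data")``, as the user that runs paperless. A result of ``False`` suggests keeping the data directory on a local file system.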
|
|
||||||
|
|
||||||
|
|
||||||
Web-UI stuck at "Loading..."
|
|
||||||
############################
|
|
||||||
|
|
||||||
This might have multiple reasons.
|
|
||||||
|
|
||||||
|
|
||||||
1. If you built the docker image yourself or deployed using the bare metal route,
|
|
||||||
make sure that there are files in ``<paperless-root>/static/frontend/<lang-code>/``.
|
|
||||||
If there are no files, make sure that you executed ``collectstatic`` successfully, either
|
|
||||||
manually or as part of the docker image build.
|
|
||||||
|
|
||||||
If the front end is still missing, make sure that the front end is compiled (files present in
|
|
||||||
``src/documents/static/frontend``). If it is not, you need to compile the front end yourself
|
|
||||||
or download the release archive instead of cloning the repository.
|
|
||||||
|
|
||||||
2. Check the output of the web server. You might see errors like this:
|
|
||||||
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
[2021-01-25 10:08:04 +0000] [40] [ERROR] Socket error processing request.
|
|
||||||
Traceback (most recent call last):
|
|
||||||
File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 134, in handle
|
|
||||||
self.handle_request(listener, req, client, addr)
|
|
||||||
File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 190, in handle_request
|
|
||||||
util.reraise(*sys.exc_info())
|
|
||||||
File "/usr/local/lib/python3.7/site-packages/gunicorn/util.py", line 625, in reraise
|
|
||||||
raise value
|
|
||||||
File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 178, in handle_request
|
|
||||||
resp.write_file(respiter)
|
|
||||||
File "/usr/local/lib/python3.7/site-packages/gunicorn/http/wsgi.py", line 396, in write_file
|
|
||||||
if not self.sendfile(respiter):
|
|
||||||
File "/usr/local/lib/python3.7/site-packages/gunicorn/http/wsgi.py", line 386, in sendfile
|
|
||||||
sent += os.sendfile(sockno, fileno, offset + sent, count)
|
|
||||||
OSError: [Errno 22] Invalid argument
|
|
||||||
|
|
||||||
To fix this issue, add
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
SENDFILE=0
|
|
||||||
|
|
||||||
to your ``docker-compose.env`` file.
|
|
||||||
|
|
||||||
Error while reading metadata
|
|
||||||
############################
|
|
||||||
|
|
||||||
You might find messages like these in your log files:
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
[WARNING] [paperless.parsing.tesseract] Error while reading metadata
|
|
||||||
|
|
||||||
This indicates that paperless failed to read PDF metadata from one of your documents. This happens when you
|
|
||||||
open the affected documents in paperless for editing. Paperless will continue to work, and will simply not
|
|
||||||
show the invalid metadata.
|
|
||||||
|
|
||||||
Consumer fails with a FileNotFoundError
|
|
||||||
#######################################
|
|
||||||
|
|
||||||
You might find messages like these in your log files:
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
[ERROR] [paperless.consumer] Error while consuming document SCN_0001.pdf: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/ocrmypdf.io.yhk3zbv0/origin.pdf'
|
|
||||||
Traceback (most recent call last):
|
|
||||||
File "/app/paperless/src/paperless_tesseract/parsers.py", line 261, in parse
|
|
||||||
ocrmypdf.ocr(**args)
|
|
||||||
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/api.py", line 337, in ocr
|
|
||||||
return run_pipeline(options=options, plugin_manager=plugin_manager, api=True)
|
|
||||||
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_sync.py", line 385, in run_pipeline
|
|
||||||
exec_concurrent(context, executor)
|
|
||||||
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_sync.py", line 302, in exec_concurrent
|
|
||||||
pdf = post_process(pdf, context, executor)
|
|
||||||
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_sync.py", line 235, in post_process
|
|
||||||
pdf_out = metadata_fixup(pdf_out, context)
|
|
||||||
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_pipeline.py", line 798, in metadata_fixup
|
|
||||||
with pikepdf.open(context.origin) as original, pikepdf.open(working_file) as pdf:
|
|
||||||
File "/usr/local/lib/python3.8/dist-packages/pikepdf/_methods.py", line 923, in open
|
|
||||||
pdf = Pdf._open(
|
|
||||||
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/ocrmypdf.io.yhk3zbv0/origin.pdf'
|
|
||||||
|
|
||||||
This probably indicates paperless tried to consume the same file twice. This can happen for a number of reasons,
|
|
||||||
depending on how documents are placed into the consume folder. If paperless is using inotify (the default) to
|
|
||||||
check for documents, try adjusting the :ref:`inotify configuration <configuration-inotify>`. If polling is enabled,
|
|
||||||
try adjusting the :ref:`polling configuration <configuration-polling>`.
|
|
||||||
|
|
||||||
Consumer fails waiting for file to remain unmodified.
|
|
||||||
#####################################################
|
|
||||||
|
|
||||||
You might find messages like these in your log files:
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
[ERROR] [paperless.management.consumer] Timeout while waiting on file /usr/src/paperless/src/../consume/SCN_0001.pdf to remain unmodified.
|
|
||||||
|
|
||||||
This indicates paperless timed out while waiting for the file to be completely written to the consume folder.
|
|
||||||
Adjusting :ref:`polling configuration <configuration-polling>` values should resolve the issue.
|
|
||||||
|
|
||||||
.. note::
|
|
||||||
|
|
||||||
The user will need to manually move the file out of the consume folder and
|
|
||||||
back in, for the initial failing file to be consumed.
|
|
||||||
|
|
||||||
Consumer fails reporting "OS reports file as busy still".
|
|
||||||
#########################################################
|
|
||||||
|
|
||||||
You might find messages like these in your log files:
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
[WARNING] [paperless.management.consumer] Not consuming file /usr/src/paperless/src/../consume/SCN_0001.pdf: OS reports file as busy still
|
|
||||||
|
|
||||||
This indicates paperless was unable to open the file, as the OS reported the file as still being in use. To prevent a
|
|
||||||
crash, paperless did not try to consume the file. If paperless is using inotify (the default) to
|
|
||||||
check for documents, try adjusting the :ref:`inotify configuration <configuration-inotify>`. If polling is enabled,
|
|
||||||
try adjusting the :ref:`polling configuration <configuration-polling>`.
|
|
||||||
|
|
||||||
.. note::
|
|
||||||
|
|
||||||
The user will need to manually move the file out of the consume folder and
|
|
||||||
back in, for the initial failing file to be consumed.
|
|
||||||
|
|
||||||
Log reports "Creating PaperlessTask failed".
|
|
||||||
#########################################################
|
|
||||||
|
|
||||||
You might find messages like these in your log files:
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
[ERROR] [paperless.management.consumer] Creating PaperlessTask failed: db locked
|
|
||||||
|
|
||||||
You are likely using an SQLite-based installation with an increased number of workers, and are running into SQLite's concurrency limitations.
|
|
||||||
Uploading or consuming multiple files at once results in many workers attempting to access the database simultaneously.
|
|
||||||
|
|
||||||
Consider changing to the PostgreSQL database if you will be processing many documents at once often. Otherwise,
|
|
||||||
try tweaking the ``PAPERLESS_DB_TIMEOUT`` setting to allow more time for the database to unlock. This may have
|
|
||||||
minor performance implications.
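The locking behaviour can be reproduced with Python's standard ``sqlite3`` module. The sketch below is not paperless code: the table name and helper names are made up, and it assumes that ``PAPERLESS_DB_TIMEOUT`` ends up playing the same role as the ``timeout`` argument of ``sqlite3.connect``.

```python
import os
import shutil
import sqlite3
import tempfile


def write_with_timeout(db_path: str, timeout_seconds: float) -> bool:
    """Try one insert; return False if the database stayed locked past the timeout."""
    conn = sqlite3.connect(db_path, timeout=timeout_seconds)
    try:
        conn.execute("INSERT INTO tasks (name) VALUES ('consume')")
        conn.commit()
        return True
    except sqlite3.OperationalError:
        # Raised with "database is locked" when the lock is not released in time.
        return False
    finally:
        conn.close()


def demo():
    """Return (result while another connection holds the lock, result afterwards)."""
    tmpdir = tempfile.mkdtemp()
    try:
        db_path = os.path.join(tmpdir, "demo.sqlite3")
        # isolation_level=None lets us control the transaction explicitly.
        holder = sqlite3.connect(db_path, isolation_level=None)
        holder.execute("CREATE TABLE tasks (name TEXT)")
        holder.execute("BEGIN EXCLUSIVE")  # simulate a busy worker
        blocked = write_with_timeout(db_path, 0.1)
        holder.execute("COMMIT")
        holder.close()
        free = write_with_timeout(db_path, 0.1)
        return blocked, free
    finally:
        shutil.rmtree(tmpdir)
```

A larger timeout only helps when the competing worker finishes within that window, which is why PostgreSQL is the better fit for frequent bulk imports.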
|
|
||||||
|
|
||||||
|
|
||||||
gunicorn fails to start with "is not a valid port number"
|
|
||||||
#########################################################
|
|
||||||
|
|
||||||
You are likely running on Kubernetes, which automatically creates an environment variable named ``${serviceName}_PORT``.
|
|
||||||
This is the same environment variable which is used by Paperless to optionally change the port gunicorn listens on.
|
|
||||||
|
|
||||||
To fix this, set ``PAPERLESS_PORT`` explicitly to your desired port, or to the default of 8000.
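Kubernetes typically fills such service variables with a URL-like value (for example ``tcp://10.0.0.1:8000``) rather than a bare number, which is why the port check fails. A small sanity check might look like this; the function name is hypothetical and the example address is made up.

```python
import os


def is_valid_port(value: str) -> bool:
    """Return True if `value` is a plain decimal TCP port number (1-65535)."""
    return value.isascii() and value.isdigit() and 0 < int(value) <= 65535


if __name__ == "__main__":
    # Falls back to the default port of 8000 when the variable is unset.
    configured = os.environ.get("PAPERLESS_PORT", "8000")
    print(configured, is_valid_port(configured))
```

Running the check inside the container shows whether the variable holds the value you set or one injected by the cluster.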
|
|
||||||
|
@ -1,420 +1,12 @@
|
|||||||
|
.. _usage_overview:
|
||||||
|
|
||||||
**************
|
**************
|
||||||
Usage Overview
|
Usage Overview
|
||||||
**************
|
**************
|
||||||
|
|
||||||
Paperless is an application that manages your personal documents. With
|
|
||||||
the help of a document scanner (see :ref:`scanners`), paperless transforms
|
|
||||||
your unwieldy physical document binders into a searchable archive and
|
|
||||||
provides many utilities for finding and managing your documents.
|
|
||||||
|
|
||||||
|
.. cssclass:: redirect-notice
|
||||||
|
|
||||||
Terms and definitions
|
The Paperless-ngx documentation has permanently moved.
|
||||||
#####################
|
|
||||||
|
|
||||||
Paperless essentially consists of two different parts for managing your
|
You will be redirected shortly...
|
||||||
documents:
|
|
||||||
|
|
||||||
* The *consumer* watches a specified folder and adds all documents in that
|
|
||||||
folder to paperless.
|
|
||||||
* The *web server* provides a UI that you use to manage and search for your
|
|
||||||
scanned documents.
|
|
||||||
|
|
||||||
Each document has a couple of fields that you can assign to them:
|
|
||||||
|
|
||||||
* A *Document* is a piece of paper that sometimes contains valuable
|
|
||||||
information.
|
|
||||||
* The *correspondent* of a document is the person, institution or company that
|
|
||||||
a document either originates from, or is sent to.
|
|
||||||
* A *tag* is a label that you can assign to documents. Think of tags as more
|
|
||||||
powerful folders: Multiple documents can be grouped together with a single
|
|
||||||
tag, however, a single document can also have multiple tags. This is not
|
|
||||||
possible with folders. The reason folders are not implemented in paperless
|
|
||||||
is simply that tags are much more versatile than folders.
|
|
||||||
* A *document type* is used to demarcate the type of a document such as letter,
|
|
||||||
bank statement, invoice, contract, etc. It is used to identify what a document
|
|
||||||
is about.
|
|
||||||
* The *date added* of a document is the date the document was scanned into
|
|
||||||
paperless. You cannot and should not change this date.
|
|
||||||
* The *date created* of a document is the date the document was initially issued.
|
|
||||||
This can be the date you bought a product, the date you signed a contract, or
|
|
||||||
the date a letter was sent to you.
|
|
||||||
* The *archive serial number* (short: ASN) of a document is the identifier of
|
|
||||||
the document in your physical document binders. See
|
|
||||||
:ref:`usage-recommended_workflow` below.
|
|
||||||
* The *content* of a document is the text that was OCR'ed from the document.
|
|
||||||
This text is fed into the search engine and is used for matching tags,
|
|
||||||
correspondents and document types.
|
|
||||||
|
|
||||||
|
|
||||||
Frontend overview
|
|
||||||
#################
|
|
||||||
|
|
||||||
.. warning::
|
|
||||||
|
|
||||||
TBD. Add some fancy screenshots!
|
|
||||||
|
|
||||||
Adding documents to paperless
|
|
||||||
#############################
|
|
||||||
|
|
||||||
Once you've got Paperless set up, you need to start feeding documents into it.
|
|
||||||
When adding documents to paperless, it will perform the following operations on
|
|
||||||
your documents:
|
|
||||||
|
|
||||||
1. OCR the document, if it has no text. Digital documents usually have text,
|
|
||||||
and this step will be skipped for those documents.
|
|
||||||
2. Paperless will create an archivable PDF/A document from your document.
|
|
||||||
If this document is coming from your scanner, it will have embedded selectable text.
|
|
||||||
3. Paperless performs automatic matching of tags, correspondents and types on the
|
|
||||||
document before storing it in the database.
|
|
||||||
|
|
||||||
.. hint::
|
|
||||||
|
|
||||||
This process can be configured to fit your needs. If you don't want paperless
|
|
||||||
to create archived versions for digital documents, you can configure that by
|
|
||||||
setting ``PAPERLESS_OCR_MODE=skip_noarchive``. Please read the
|
|
||||||
:ref:`relevant section in the documentation <configuration-ocr>`.
|
|
||||||
|
|
||||||
.. note::
|
|
||||||
|
|
||||||
No matter which options you choose, Paperless will always store the original
|
|
||||||
document that it found in the consumption directory or in the mail and
|
|
||||||
will never overwrite that document. Archived versions are stored alongside the
|
|
||||||
original versions.
|
|
||||||
|
|
||||||
|
|
||||||
The consumption directory
|
|
||||||
=========================
|
|
||||||
|
|
||||||
The primary method of getting documents into your database is by putting them in
|
|
||||||
the consumption directory. The consumer runs in an infinite loop, looking for new
|
|
||||||
additions to this directory. When it finds them, the consumer goes about the process
|
|
||||||
of parsing them with the OCR, indexing what it finds, and storing it in the media directory.
|
|
||||||
|
|
||||||
Getting stuff into this directory is up to you. If you're running Paperless
|
|
||||||
on your local computer, you might just want to drag and drop files there, but if
|
|
||||||
you're running this on a server and want your scanner to automatically push
|
|
||||||
files to this directory, you'll need to set up some sort of service to accept the
|
|
||||||
files from the scanner. Typically, you're looking at an FTP server like
|
|
||||||
`Proftpd`_ or a Windows folder share with `Samba`_.
|
|
||||||
|
|
||||||
.. _Proftpd: http://www.proftpd.org/
|
|
||||||
.. _Samba: http://www.samba.org/
|
|
||||||
|
|
||||||
.. TODO: hyperref to configuration of the location of this magic folder.
|
|
||||||
|
|
||||||
Web UI Upload
|
|
||||||
=============
|
|
||||||
|
|
||||||
The dashboard has a file drop field to upload documents to paperless. Simply drag a file
|
|
||||||
onto this field or select a file with the file dialog. Multiple files are supported.
|
|
||||||
|
|
||||||
You can also upload documents on any other page of the web UI by dragging-and-dropping
|
|
||||||
files into your browser window.
|
|
||||||
|
|
||||||
.. _usage-mobile_upload:
|
|
||||||
|
|
||||||
Mobile upload
|
|
||||||
=============
|
|
||||||
|
|
||||||
The mobile app over at `<https://github.com/qcasey/paperless_share>`_ allows Android users
|
|
||||||
to share any documents with paperless. This can be combined with any of the mobile
|
|
||||||
scanning apps out there, such as Office Lens.
|
|
||||||
|
|
||||||
Furthermore, there is the `Paperless App <https://github.com/bauerj/paperless_app>`_ as well,
|
|
||||||
which not only has document upload, but also document browsing and download features.
|
|
||||||
|
|
||||||
.. _usage-email:
|
|
||||||
|
|
||||||
IMAP (Email)
|
|
||||||
============
|
|
||||||
|
|
||||||
You can tell paperless-ngx to consume documents from your email accounts.
|
|
||||||
This is a very flexible and powerful feature if you regularly receive documents
|
|
||||||
via mail that you need to archive. The mail consumer can be configured by using the
|
|
||||||
admin interface in the following manner:
|
|
||||||
|
|
||||||
1. Define e-mail accounts.
|
|
||||||
2. Define mail rules for your account.
|
|
||||||
|
|
||||||
These rules perform the following:
|
|
||||||
|
|
||||||
1. Connect to the mail server.
|
|
||||||
2. Fetch all matching mails (as defined by folder, maximum age and the filters)
|
|
||||||
3. Check if there are any consumable attachments.
|
|
||||||
4. If so, instruct paperless to consume the attachments and optionally
|
|
||||||
use the metadata provided in the rule for the new document.
|
|
||||||
5. If documents were consumed from a mail, the rule action is performed
|
|
||||||
on that mail.
|
|
||||||
|
|
||||||
Paperless will completely ignore mails that do not match your filters. It will also
|
|
||||||
only perform the action on mails that it has consumed documents from.
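The rule steps above can be modelled in a few lines. This is a simplified sketch with made-up names (``Mail``, ``process_rule``, ``mark_as_read``, the list of consumable extensions); it does not talk to a mail server and is not how paperless implements mail rules, but it shows that non-matching mails are ignored and that the action only runs on mails that yielded documents.

```python
from dataclasses import dataclass, field


@dataclass
class Mail:
    subject: str
    attachments: list = field(default_factory=list)
    seen: bool = False


# Assumed set of consumable file types, for illustration only.
CONSUMABLE_SUFFIXES = (".pdf", ".png", ".jpg")


def mark_as_read(mail: Mail) -> None:
    """Rule action: flag the mail as read."""
    mail.seen = True


def process_rule(mails, subject_filter, action):
    """Apply one rule to already-fetched mails and return the consumed file names."""
    consumed = []
    for mail in mails:
        # Mails that do not match the filter are ignored completely.
        if subject_filter.lower() not in mail.subject.lower():
            continue
        documents = [
            name for name in mail.attachments
            if name.lower().endswith(CONSUMABLE_SUFFIXES)
        ]
        if not documents:
            continue
        consumed.extend(documents)
        # The action is only performed on mails that documents were consumed from.
        action(mail)
    return consumed
```

For example, a rule filtering on "invoice" would leave a newsletter untouched and would not mark an invoice mail without attachments as read.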
|
|
||||||
|
|
||||||
The actions all ensure that the same mail is not consumed twice by different means.
|
|
||||||
These are as follows:
|
|
||||||
|
|
||||||
* **Delete:** Immediately deletes mail that paperless has consumed documents from.
|
|
||||||
Use with caution.
|
|
||||||
* **Mark as read:** Mark consumed mail as read. Paperless will not consume documents
|
|
||||||
from already read mails. If you read a mail before paperless sees it, it will be
|
|
||||||
ignored.
|
|
||||||
* **Flag:** Sets the 'important' flag on mails with consumed documents. Paperless
|
|
||||||
will not consume flagged mails.
|
|
||||||
* **Move to folder:** Moves consumed mails out of the way so that paperless won't
|
|
||||||
consume them again.
|
|
||||||
* **Add custom Tag:** Adds a custom tag to mails with consumed documents (the IMAP
|
|
||||||
standard calls these "keywords"). Paperless will not consume mails already tagged.
|
|
||||||
Not all mail servers support this feature!
|
|
||||||
|
|
||||||
.. caution::
|
|
||||||
|
|
||||||
The mail consumer will perform these actions on all mails it has consumed
|
|
||||||
documents from. Keep in mind that the actual consumption process may fail
|
|
||||||
for some reason, leaving you with missing documents in paperless.
|
|
||||||
|
|
||||||
.. note::
|
|
||||||
|
|
||||||
With the correct set of rules, you can completely automate your email documents.
|
|
||||||
Create rules for every correspondent you receive digital documents from and
|
|
||||||
paperless will read them automatically. The default action "mark as read" is
|
|
||||||
pretty tame and will not cause any damage or data loss whatsoever.
|
|
||||||
|
|
||||||
You can also set up a special folder in your mail account for paperless and use
|
|
||||||
your favorite mail client to move to-be-consumed mails into that folder
|
|
||||||
automatically or manually and tell paperless to move them to yet another folder
|
|
||||||
after consumption. It's up to you.
|
|
||||||
|
|
||||||
.. note::
|
|
||||||
|
|
||||||
When defining a mail rule with a folder, you may need to try different characters to
|
|
||||||
define how the sub-folders are separated. Common values include ".", "/" or "|", but
|
|
||||||
this varies by the mail server. Check the documentation for your mail server. In the
|
|
||||||
event of an error fetching mail from a certain folder, check the Paperless logs. When
|
|
||||||
a folder is not located, Paperless will attempt to list all folders found in the account
|
|
||||||
to the Paperless logs.
|
|
||||||
|
|
||||||
.. note::
|
|
||||||
|
|
||||||
Paperless will process the rules in the order defined in the admin page.
|
|
||||||
|
|
||||||
You can define catch-all rules and have them executed last to consume
|
|
||||||
any documents not matched by previous rules. Such a rule may assign an "Unknown
|
|
||||||
mail document" tag to consumed documents so you can inspect them further.
|
|
||||||
|
|
||||||
Paperless is set up to check your mails every 10 minutes. This can be configured on the
|
|
||||||
'Scheduled tasks' page in the admin.
|
|
||||||
|
|
||||||
|
|
||||||
REST API
|
|
||||||
========
|
|
||||||
|
|
||||||
You can also submit a document using the REST API, see :ref:`api-file_uploads` for details.
|
|
||||||
|
|
||||||
.. _basic-searching:
|
|
||||||
|
|
||||||
|
|
||||||
Best practices
|
|
||||||
##############
|
|
||||||
|
|
||||||
Paperless offers a couple tools that help you organize your document collection. However,
|
|
||||||
it is up to you to use them in a way that helps you organize documents and find specific
|
|
||||||
documents when you need them. This section offers a couple ideas for managing your collection.
|
|
||||||
|
|
||||||
Document types allow you to classify documents according to what they are. You can define
|
|
||||||
types such as "Receipt", "Invoice", or "Contract". If you used to collect all your receipts
|
|
||||||
in a single binder, you can recreate that system in paperless by defining a document type,
|
|
||||||
assigning documents to that type and then filtering by that type to only see all receipts.
|
|
||||||
|
|
||||||
Not all documents need document types. Sometimes it's hard to determine what the type of a
|
|
||||||
document is or it is hard to justify creating a document type that you only need once or twice.
|
|
||||||
This is okay. As long as the types you define help you organize your collection in the way
|
|
||||||
you want, paperless is doing its job.
|
|
||||||
|
|
||||||
Tags can be used in many different ways. Think of tags as more versatile folders or binders.
|
|
||||||
If you have a binder for documents related to university / your car or health care, you can
|
|
||||||
create these binders in paperless by creating tags and assigning them to relevant documents.
|
|
||||||
Just as with document types, you can filter the document list by tags and only see documents of
|
|
||||||
a certain topic.
|
|
||||||
|
|
||||||
With physical documents, you'll often need to decide which folder the document belongs to.
|
|
||||||
The advantage of tags over folders and binders is that a single document can have multiple
|
|
||||||
tags. A physical document cannot magically appear in two different folders, but with tags,
|
|
||||||
this is entirely possible.
|
|
||||||
|
|
||||||
.. hint::
|
|
||||||
|
|
||||||
This can be used in many different ways. One example: Imagine you're working on a particular
|
|
||||||
task, such as signing up for university. Usually you'll need to collect a bunch of different
|
|
||||||
documents that are already sorted into various folders. With the tag system of paperless,
|
|
||||||
you can create a new group of documents that are relevant to this task without destroying
|
|
||||||
the already existing organization. When you're done with the task, you could delete the
|
|
||||||
tag again, which would be equal to sorting documents back into the folder they belong into.
|
|
||||||
Or keep the tag, up to you.
|
|
||||||
|
|
||||||
All of the logic above applies to correspondents as well. Attach them to documents if you
|
|
||||||
feel that they help you organize your collection.
|
|
||||||
|
|
||||||
When you've started organizing your documents, create a couple saved views for document collections
|
|
||||||
you regularly access. This is equal to having labeled physical binders on your desk, except
|
|
||||||
that these saved views are dynamic and simply update themselves as you add documents to the system.
|
|
||||||
|
|
||||||
Here are a couple examples of tags and types that you could use in your collection.
|
|
||||||
|
|
||||||
* An ``inbox`` tag for newly added documents that you haven't manually edited yet.
|
|
||||||
* A tag ``car`` for everything car related (repairs, registration, insurance, etc)
|
|
||||||
* A tag ``todo`` for documents that you still need to do something with, such as reply, or
|
|
||||||
perform some task online.
|
|
||||||
* A tag ``bank account x`` for all bank statements related to that account.
|
|
||||||
* A tag ``mail`` for anything that you added to paperless via its mail processing capabilities.
|
|
||||||
* A tag ``missing_metadata`` when you still need to add some metadata to a document, but can't
|
|
||||||
or don't want to do this right now.
|
|
||||||
|
|
||||||
.. _basic-usage_searching:
|
|
||||||
|
|
||||||
Searching
|
|
||||||
#########
|
|
||||||
|
|
||||||
Paperless offers an extensive searching mechanism that is designed to allow you to quickly
|
|
||||||
find a document you're looking for (for example, that thing that just broke and you bought
|
|
||||||
a couple months ago, that contract you signed 8 years ago).
|
|
||||||
|
|
||||||
When you search paperless for a document, it tries to match this query against your documents.
|
|
||||||
Paperless will look for matching documents by inspecting their content, title, correspondent,
|
|
||||||
type and tags. Paperless returns a scored list of results, so that documents matching your query
|
|
||||||
better will appear further up in the search results.
|
|
||||||
|
|
||||||
By default, paperless returns only documents which contain all words typed in the search bar.
|
|
||||||
However, paperless also offers advanced search syntax if you want to drill down the results
|
|
||||||
further.
|
|
||||||
|
|
||||||
Matching documents with logical expressions:
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
shopname AND (product1 OR product2)
|
|
||||||
|
|
||||||
Matching specific tags, correspondents or types:
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
type:invoice tag:unpaid
|
|
||||||
correspondent:university certificate
|
|
||||||
|
|
||||||
Matching dates:
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
created:[2005 to 2009]
|
|
||||||
added:yesterday
|
|
||||||
modified:today
|
|
||||||
|
|
||||||
Matching inexact words:
|
|
||||||
|
|
||||||
.. code::
|
|
||||||
|
|
||||||
produ*name
|
|
||||||
|
|
||||||
.. note::
|
|
||||||
|
|
||||||
Inexact terms are hard for search indexes. These queries might take a while to execute. That's why paperless offers
|
|
||||||
auto complete and query correction.
|
|
||||||
|
|
||||||
All of these constructs can be combined as you see fit.
|
|
||||||
If you want to learn more about the query language used by paperless, paperless uses Whoosh's default query language.
|
|
||||||
Head over to `Whoosh query language <https://whoosh.readthedocs.io/en/latest/querylang.html>`_.
|
|
||||||
For details on what date parsing utilities are available, see
|
|
||||||
`Date parsing <https://whoosh.readthedocs.io/en/latest/dates.html#parsing-date-queries>`_.
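If you assemble such queries in a script, a small helper can keep the field syntax consistent. The function below is a hypothetical convenience, not part of paperless or Whoosh; it only concatenates the constructs shown above (plain words, ``type:``, ``tag:``, ``correspondent:`` and a ``created:[... to ...]`` range) into one query string.

```python
def build_query(words=(), doc_type=None, tag=None, correspondent=None, created=None):
    """Combine search constructs into a single query string.

    `created` is an optional (start, end) pair, for example (2005, 2009).
    """
    parts = list(words)
    if doc_type:
        parts.append(f"type:{doc_type}")
    if tag:
        parts.append(f"tag:{tag}")
    if correspondent:
        parts.append(f"correspondent:{correspondent}")
    if created:
        start, end = created
        parts.append(f"created:[{start} to {end}]")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_query(doc_type="invoice", tag="unpaid"))
    print(build_query(words=["certificate"], correspondent="university", created=(2005, 2009)))
```

The resulting string can be pasted into the search bar as-is.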
|
|
||||||
|
|
||||||
|
|
||||||
.. _usage-recommended_workflow:
|
|
||||||
|
|
||||||
The recommended workflow
|
|
||||||
########################
|
|
||||||
|
|
||||||
Once you have familiarized yourself with paperless and are ready to use it
|
|
||||||
for all your documents, the recommended workflow for managing your documents
|
|
||||||
is as follows. This workflow also takes into account that some documents
|
|
||||||
have to be kept in physical form, but still ensures that you get all the
|
|
||||||
advantages for these documents as well.
|
|
||||||
|
|
||||||
The following diagram shows how easy it is to manage your documents.
|
|
||||||
|
|
||||||
.. image:: _static/recommended_workflow.png
|
|
||||||
|
|
||||||
Preparations in paperless
|
|
||||||
=========================
|
|
||||||
|
|
||||||
* Create an inbox tag that gets assigned to all new documents.
|
|
||||||
* Create a TODO tag.
|
|
||||||
|
|
||||||
Processing of the physical documents
|
|
||||||
====================================
|
|
||||||
|
|
||||||
Keep a physical inbox. Whenever you receive a document that you need to
|
|
||||||
archive, put it into your inbox. Regularly, do the following for all documents
|
|
||||||
in your inbox:
|
|
||||||
|
|
||||||
1. For each document, decide if you need to keep the document in physical
|
|
||||||
form. This applies to certain important documents, such as contracts and
|
|
||||||
certificates.
|
|
||||||
2. If you need to keep the document, write a running number on the document
|
|
||||||
before scanning, starting at one and counting upwards. This is the archive
|
|
||||||
serial number, or ASN in short.
|
|
||||||
3. Scan the document.
|
|
||||||
4. If the document has an ASN assigned, store it in a *single* binder, sorted
|
|
||||||
by ASN. Don't order this binder in any other way.
|
|
||||||
5. If the document has no ASN, throw it away. Yay!
|
|
||||||
|
|
||||||
Over time, you will notice that your physical binder will fill up. If it is
|
|
||||||
full, label the binder with the range of ASNs in this binder (e.g., "Documents
|
|
||||||
1 to 343"), store the binder in your cellar or elsewhere, and start a new
|
|
||||||
binder.
|
|
||||||
|
|
||||||
The idea behind this process is that you will never have to use the physical
|
|
||||||
binders to find a document. If you need a specific physical document, you
|
|
||||||
may find this document by:
|
|
||||||
|
|
||||||
1. Searching in paperless for the document.
|
|
||||||
2. Identifying the ASN of the document, since it appears on the scan.
|
|
||||||
3. Grabbing the relevant document binder and getting the document. This is easy since
|
|
||||||
they are sorted by ASN.
|
|
||||||
|
|
||||||
Processing of documents in paperless
|
|
||||||
====================================
|
|
||||||
|
|
||||||
Once you have scanned in a document, proceed in paperless as follows.
|
|
||||||
|
|
||||||
1. If the document has an ASN, assign the ASN to the document.
|
|
||||||
2. Assign a correspondent to the document (e.g., your employer, bank, etc.).
|
|
||||||
This isn't strictly necessary but helps in finding a document when you need
|
|
||||||
it.
|
|
||||||
3. Assign a document type (e.g., invoice, bank statement, etc.) to the document.
|
|
||||||
This isn't strictly necessary but helps in finding a document when you need
|
|
||||||
it.
|
|
||||||
4. Assign a proper title to the document (the name of an item you bought, the
|
|
||||||
subject of the letter, etc)
|
|
||||||
5. Check that the date of the document is correct. Paperless tries to read
|
|
||||||
the date from the content of the document, but this fails sometimes if the
|
|
||||||
OCR is bad or multiple dates appear on the document.
|
|
||||||
6. Remove inbox tags from the documents.
|
|
||||||
|
|
||||||
.. hint::
|
|
||||||
|
|
||||||
You can set up manual matching rules for your correspondents and tags and
|
|
||||||
paperless will assign them automatically. After consuming a couple documents,
|
|
||||||
you can even ask paperless to *learn* when to assign tags and correspondents
|
|
||||||
by itself. For details on this feature, see :ref:`advanced-matching`.
|
|
||||||
|
|
||||||
Task management
|
|
||||||
===============
|
|
||||||
|
|
||||||
Some documents require attention and require you to act on the document. You
|
|
||||||
may take two different approaches to handle these documents based on how
|
|
||||||
regularly you intend to scan documents and use paperless.
|
|
||||||
|
|
||||||
* If you scan and process your documents in paperless regularly, assign a
|
|
||||||
TODO tag to all scanned documents that you need to process. Create a saved
|
|
||||||
view on the dashboard that shows all documents with this tag.
|
|
||||||
* If you do not scan documents regularly and use paperless solely for archiving,
|
|
||||||
create a physical todo box next to your physical inbox and put documents you
|
|
||||||
need to process in the TODO box. When you have performed the task associated with
|
|
||||||
the document, move it to the inbox.
|
|
||||||
|