mirror of
https://github.com/paperless-ngx/paperless-ngx.git
synced 2025-04-02 13:45:10 -05:00
redirect all RTD pages
This commit is contained in:
parent
3e22e8e0b9
commit
304cfc42a9
8
docs/_static/css/custom.css
vendored
8
docs/_static/css/custom.css
vendored
@ -595,3 +595,11 @@ html.writer-html5 .rst-content dl.footnote code {
|
||||
.wy-nav-content-wrap {
|
||||
z-index: 20;
|
||||
}
|
||||
|
||||
.rst-content .toctree-wrapper {
|
||||
display: none;
|
||||
}
|
||||
|
||||
.redirect-notice {
|
||||
font-size: 2.5rem;
|
||||
}
|
||||
|
25
docs/_templates/layout.html
vendored
25
docs/_templates/layout.html
vendored
@ -8,6 +8,31 @@
|
||||
|
||||
document.documentElement.classList.toggle("dark-mode", darkModeState);
|
||||
document.documentElement.classList.toggle("light-mode", !darkModeState);
|
||||
|
||||
const RTD_TO_MKD = {
|
||||
"index.html": "",
|
||||
"setup.html": "setup",
|
||||
"usage_overview.html": "usage",
|
||||
"advanced_usage.html": "advanced_usage",
|
||||
"administration.html": "administration",
|
||||
"configuration.html": "configuration",
|
||||
"api.html": "api",
|
||||
"faq.html": "faq",
|
||||
"troubleshooting.html": "troubleshooting",
|
||||
"extending.html": "development",
|
||||
"scanners.html": "",
|
||||
"screenshots.html": "",
|
||||
"changelog.html": "changelog",
|
||||
}
|
||||
|
||||
const path = RTD_TO_MKD[window.location.pathname.substring(window.location.pathname.lastIndexOf("/") + 1)] + "/";
|
||||
const hash = window.location.hash;
|
||||
const redirectURL = new URL(path + hash, "https://paperless-ngx.com/");
|
||||
console.log(`Redirecting to ${redirectURL} in 3 seconds...`);
|
||||
|
||||
setTimeout(() => {
|
||||
window.location.replace(redirectURL);
|
||||
}, 3000);
|
||||
</script>
|
||||
{{ super() }}
|
||||
{% endblock %}
|
||||
|
@ -1,531 +1,11 @@
|
||||
.. _administration:
|
||||
|
||||
**************
|
||||
Administration
|
||||
**************
|
||||
|
||||
.. _administration-backup:
|
||||
.. cssclass:: redirect-notice
|
||||
|
||||
Making backups
|
||||
##############
|
||||
The Paperless-ngx documentation has permanently moved.
|
||||
|
||||
Multiple options exist for making backups of your paperless instance,
|
||||
depending on how you installed paperless.
|
||||
|
||||
Before making backups, make sure that paperless is not running.
|
||||
|
||||
Options available to any installation of paperless:
|
||||
|
||||
* Use the :ref:`document exporter <utilities-exporter>`.
|
||||
The document exporter exports all your documents, thumbnails and
|
||||
metadata to a specific folder. You may import your documents into a
|
||||
fresh instance of paperless again or store your documents in another
|
||||
DMS with this export.
|
||||
* The document exporter is also able to update an already existing export.
|
||||
Therefore, incremental backups with ``rsync`` are entirely possible.
|
||||
|
||||
.. caution::
|
||||
|
||||
You cannot import the export generated with one version of paperless in a
|
||||
different version of paperless. The export contains an exact image of the
|
||||
database, and migrations may change the database layout.
|
||||
|
||||
Options available to docker installations:
|
||||
|
||||
* Backup the docker volumes. These usually reside within
|
||||
``/var/lib/docker/volumes`` on the host and you need to be root in order
|
||||
to access them.
|
||||
|
||||
Paperless uses 4 volumes:
|
||||
|
||||
* ``paperless_media``: This is where your documents are stored.
|
||||
* ``paperless_data``: This is where auxillary data is stored. This
|
||||
folder also contains the SQLite database, if you use it.
|
||||
* ``paperless_pgdata``: Exists only if you use PostgreSQL and contains
|
||||
the database.
|
||||
* ``paperless_dbdata``: Exists only if you use MariaDB and contains
|
||||
the database.
|
||||
|
||||
Options available to bare-metal and non-docker installations:
|
||||
|
||||
* Backup the entire paperless folder. This ensures that if your paperless instance
|
||||
crashes at some point or your disk fails, you can simply copy the folder back
|
||||
into place and it works.
|
||||
|
||||
When using PostgreSQL or MariaDB, you'll also have to backup the database.
|
||||
|
||||
.. _migrating-restoring:
|
||||
|
||||
Restoring
|
||||
=========
|
||||
|
||||
.. _administration-updating:
|
||||
|
||||
Updating Paperless
|
||||
##################
|
||||
|
||||
Docker Route
|
||||
============
|
||||
|
||||
If a new release of paperless-ngx is available, upgrading depends on how you
|
||||
installed paperless-ngx in the first place. The releases are available at the
|
||||
`release page <https://github.com/paperless-ngx/paperless-ngx/releases>`_.
|
||||
|
||||
First of all, ensure that paperless is stopped.
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
$ cd /path/to/paperless
|
||||
$ docker-compose down
|
||||
|
||||
After that, :ref:`make a backup <administration-backup>`.
|
||||
|
||||
A. If you pull the image from the docker hub, all you need to do is:
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
$ docker-compose pull
|
||||
$ docker-compose up
|
||||
|
||||
The docker-compose files refer to the ``latest`` version, which is always the latest
|
||||
stable release.
|
||||
|
||||
B. If you built the image yourself, do the following:
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
$ git pull
|
||||
$ docker-compose build
|
||||
$ docker-compose up
|
||||
|
||||
Running ``docker-compose up`` will also apply any new database migrations.
|
||||
If you see everything working, press CTRL+C once to gracefully stop paperless.
|
||||
Then you can start paperless-ngx with ``-d`` to have it run in the background.
|
||||
|
||||
.. note::
|
||||
|
||||
In version 0.9.14, the update process was changed. In 0.9.13 and earlier, the
|
||||
docker-compose files specified exact versions and pull won't automatically
|
||||
update to newer versions. In order to enable updates as described above, either
|
||||
get the new ``docker-compose.yml`` file from `here <https://github.com/paperless-ngx/paperless-ngx/tree/master/docker/compose>`_
|
||||
or edit the ``docker-compose.yml`` file, find the line that says
|
||||
|
||||
.. code::
|
||||
|
||||
image: ghcr.io/paperless-ngx/paperless-ngx:0.9.x
|
||||
|
||||
and replace the version with ``latest``:
|
||||
|
||||
.. code::
|
||||
|
||||
image: ghcr.io/paperless-ngx/paperless-ngx:latest
|
||||
|
||||
.. note::
|
||||
In version 1.7.1 and onwards, the Docker image can now be pinned to a release series.
|
||||
This is often combined with automatic updaters such as Watchtower to allow safer
|
||||
unattended upgrading to new bugfix releases only. It is still recommended to always
|
||||
review release notes before upgrading. To pin your install to a release series, edit
|
||||
the ``docker-compose.yml`` find the line that says
|
||||
|
||||
.. code::
|
||||
|
||||
image: ghcr.io/paperless-ngx/paperless-ngx:latest
|
||||
|
||||
and replace the version with the series you want to track, for example:
|
||||
|
||||
.. code::
|
||||
|
||||
image: ghcr.io/paperless-ngx/paperless-ngx:1.7
|
||||
|
||||
Bare Metal Route
|
||||
================
|
||||
|
||||
After grabbing the new release and unpacking the contents, do the following:
|
||||
|
||||
1. Update dependencies. New paperless version may require additional
|
||||
dependencies. The dependencies required are listed in the section about
|
||||
:ref:`bare metal installations <setup-bare_metal>`.
|
||||
|
||||
2. Update python requirements. Keep in mind to activate your virtual environment
|
||||
before that, if you use one.
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
$ pip install -r requirements.txt
|
||||
|
||||
3. Migrate the database.
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
$ cd src
|
||||
$ python3 manage.py migrate
|
||||
|
||||
This might not actually do anything. Not every new paperless version comes with new
|
||||
database migrations.
|
||||
|
||||
Downgrading Paperless
|
||||
#####################
|
||||
|
||||
Downgrades are possible. However, some updates also contain database migrations (these change the layout of the database and may move data).
|
||||
In order to move back from a version that applied database migrations, you'll have to revert the database migration *before* downgrading,
|
||||
and then downgrade paperless.
|
||||
|
||||
This table lists the compatible versions for each database migration number.
|
||||
|
||||
+------------------+-----------------+
|
||||
| Migration number | Version range |
|
||||
+------------------+-----------------+
|
||||
| 1011 | 1.0.0 |
|
||||
+------------------+-----------------+
|
||||
| 1012 | 1.1.0 - 1.2.1 |
|
||||
+------------------+-----------------+
|
||||
| 1014 | 1.3.0 - 1.3.1 |
|
||||
+------------------+-----------------+
|
||||
| 1016 | 1.3.2 - current |
|
||||
+------------------+-----------------+
|
||||
|
||||
Execute the following management command to migrate your database:
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
$ python3 manage.py migrate documents <migration number>
|
||||
|
||||
.. note::
|
||||
|
||||
Some migrations cannot be undone. The command will issue errors if that happens.
|
||||
|
||||
.. _utilities-management-commands:
|
||||
|
||||
Management utilities
|
||||
####################
|
||||
|
||||
Paperless comes with some management commands that perform various maintenance
|
||||
tasks on your paperless instance. You can invoke these commands in the following way:
|
||||
|
||||
With docker-compose, while paperless is running:
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
$ cd /path/to/paperless
|
||||
$ docker-compose exec webserver <command> <arguments>
|
||||
|
||||
With docker, while paperless is running:
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
$ docker exec -it <container-name> <command> <arguments>
|
||||
|
||||
Bare metal:
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
$ cd /path/to/paperless/src
|
||||
$ python3 manage.py <command> <arguments>
|
||||
|
||||
All commands have built-in help, which can be accessed by executing them with
|
||||
the argument ``--help``.
|
||||
|
||||
.. _utilities-exporter:
|
||||
|
||||
Document exporter
|
||||
=================
|
||||
|
||||
The document exporter exports all your data from paperless into a folder for
|
||||
backup or migration to another DMS.
|
||||
|
||||
If you use the document exporter within a cronjob to backup your data you might use the ``-T`` flag behind exec to suppress "The input device is not a TTY" errors. For example: ``docker-compose exec -T webserver document_exporter ../export``
|
||||
|
||||
.. code::
|
||||
|
||||
document_exporter target [-c] [-f] [-d]
|
||||
|
||||
optional arguments:
|
||||
-c, --compare-checksums
|
||||
-f, --use-filename-format
|
||||
-d, --delete
|
||||
|
||||
``target`` is a folder to which the data gets written. This includes documents,
|
||||
thumbnails and a ``manifest.json`` file. The manifest contains all metadata from
|
||||
the database (correspondents, tags, etc).
|
||||
|
||||
When you use the provided docker compose script, specify ``../export`` as the
|
||||
target. This path inside the container is automatically mounted on your host on
|
||||
the folder ``export``.
|
||||
|
||||
If the target directory already exists and contains files, paperless will assume
|
||||
that the contents of the export directory are a previous export and will attempt
|
||||
to update the previous export. Paperless will only export changed and added files.
|
||||
Paperless determines whether a file has changed by inspecting the file attributes
|
||||
"date/time modified" and "size". If that does not work out for you, specify
|
||||
``--compare-checksums`` and paperless will attempt to compare file checksums instead.
|
||||
This is slower.
|
||||
|
||||
Paperless will not remove any existing files in the export directory. If you want
|
||||
paperless to also remove files that do not belong to the current export such as files
|
||||
from deleted documents, specify ``--delete``. Be careful when pointing paperless to
|
||||
a directory that already contains other files.
|
||||
|
||||
The filenames generated by this command follow the format
|
||||
``[date created] [correspondent] [title].[extension]``.
|
||||
If you want paperless to use ``PAPERLESS_FILENAME_FORMAT`` for exported filenames
|
||||
instead, specify ``--use-filename-format``.
|
||||
|
||||
|
||||
.. _utilities-importer:
|
||||
|
||||
Document importer
|
||||
=================
|
||||
|
||||
The document importer takes the export produced by the `Document exporter`_ and
|
||||
imports it into paperless.
|
||||
|
||||
The importer works just like the exporter. You point it at a directory, and
|
||||
the script does the rest of the work:
|
||||
|
||||
.. code::
|
||||
|
||||
document_importer source
|
||||
|
||||
When you use the provided docker compose script, put the export inside the
|
||||
``export`` folder in your paperless source directory. Specify ``../export``
|
||||
as the ``source``.
|
||||
|
||||
.. note::
|
||||
|
||||
Importing from a previous version of Paperless may work, but for best results
|
||||
it is suggested to match the versions.
|
||||
|
||||
.. _utilities-retagger:
|
||||
|
||||
Document retagger
|
||||
=================
|
||||
|
||||
Say you've imported a few hundred documents and now want to introduce
|
||||
a tag or set up a new correspondent, and apply its matching to all of
|
||||
the currently-imported docs. This problem is common enough that
|
||||
there are tools for it.
|
||||
|
||||
.. code::
|
||||
|
||||
document_retagger [-h] [-c] [-T] [-t] [-i] [--use-first] [-f]
|
||||
|
||||
optional arguments:
|
||||
-c, --correspondent
|
||||
-T, --tags
|
||||
-t, --document_type
|
||||
-s, --storage_path
|
||||
-i, --inbox-only
|
||||
--use-first
|
||||
-f, --overwrite
|
||||
|
||||
Run this after changing or adding matching rules. It'll loop over all
|
||||
of the documents in your database and attempt to match documents
|
||||
according to the new rules.
|
||||
|
||||
Specify any combination of ``-c``, ``-T``, ``-t`` and ``-s`` to have the
|
||||
retagger perform matching of the specified metadata type. If you don't
|
||||
specify any of these options, the document retagger won't do anything.
|
||||
|
||||
Specify ``-i`` to have the document retagger work on documents tagged
|
||||
with inbox tags only. This is useful when you don't want to mess with
|
||||
your already processed documents.
|
||||
|
||||
When multiple document types or correspondents match a single document,
|
||||
the retagger won't assign these to the document. Specify ``--use-first``
|
||||
to override this behavior and just use the first correspondent or type
|
||||
it finds. This option does not apply to tags, since any amount of tags
|
||||
can be applied to a document.
|
||||
|
||||
Finally, ``-f`` specifies that you wish to overwrite already assigned
|
||||
correspondents, types and/or tags. The default behavior is to not
|
||||
assign correspondents and types to documents that have this data already
|
||||
assigned. ``-f`` works differently for tags: By default, only additional tags get
|
||||
added to documents, no tags will be removed. With ``-f``, tags that don't
|
||||
match a document anymore get removed as well.
|
||||
|
||||
|
||||
Managing the Automatic matching algorithm
|
||||
=========================================
|
||||
|
||||
The *Auto* matching algorithm requires a trained neural network to work.
|
||||
This network needs to be updated whenever somethings in your data
|
||||
changes. The docker image takes care of that automatically with the task
|
||||
scheduler. You can manually renew the classifier by invoking the following
|
||||
management command:
|
||||
|
||||
.. code::
|
||||
|
||||
document_create_classifier
|
||||
|
||||
This command takes no arguments.
|
||||
|
||||
.. _`administration-index`:
|
||||
|
||||
Managing the document search index
|
||||
==================================
|
||||
|
||||
The document search index is responsible for delivering search results for the
|
||||
website. The document index is automatically updated whenever documents get
|
||||
added to, changed, or removed from paperless. However, if the search yields
|
||||
non-existing documents or won't find anything, you may need to recreate the
|
||||
index manually.
|
||||
|
||||
.. code::
|
||||
|
||||
document_index {reindex,optimize}
|
||||
|
||||
Specify ``reindex`` to have the index created from scratch. This may take some
|
||||
time.
|
||||
|
||||
Specify ``optimize`` to optimize the index. This updates certain aspects of
|
||||
the index and usually makes queries faster and also ensures that the
|
||||
autocompletion works properly. This command is regularly invoked by the task
|
||||
scheduler.
|
||||
|
||||
.. _utilities-renamer:
|
||||
|
||||
Managing filenames
|
||||
==================
|
||||
|
||||
If you use paperless' feature to
|
||||
:ref:`assign custom filenames to your documents <advanced-file_name_handling>`,
|
||||
you can use this command to move all your files after changing
|
||||
the naming scheme.
|
||||
|
||||
.. warning::
|
||||
|
||||
Since this command moves your documents, it is advised to do
|
||||
a backup beforehand. The renaming logic is robust and will never overwrite
|
||||
or delete a file, but you can't ever be careful enough.
|
||||
|
||||
.. code::
|
||||
|
||||
document_renamer
|
||||
|
||||
The command takes no arguments and processes all your documents at once.
|
||||
|
||||
Learn how to use :ref:`Management Utilities<utilities-management-commands>`.
|
||||
|
||||
|
||||
.. _utilities-sanity-checker:
|
||||
|
||||
Sanity checker
|
||||
==============
|
||||
|
||||
Paperless has a built-in sanity checker that inspects your document collection for issues.
|
||||
|
||||
The issues detected by the sanity checker are as follows:
|
||||
|
||||
* Missing original files.
|
||||
* Missing archive files.
|
||||
* Inaccessible original files due to improper permissions.
|
||||
* Inaccessible archive files due to improper permissions.
|
||||
* Corrupted original documents by comparing their checksum against what is stored in the database.
|
||||
* Corrupted archive documents by comparing their checksum against what is stored in the database.
|
||||
* Missing thumbnails.
|
||||
* Inaccessible thumbnails due to improper permissions.
|
||||
* Documents without any content (warning).
|
||||
* Orphaned files in the media directory (warning). These are files that are not referenced by any document im paperless.
|
||||
|
||||
|
||||
.. code::
|
||||
|
||||
document_sanity_checker
|
||||
|
||||
The command takes no arguments. Depending on the size of your document archive, this may take some time.
|
||||
|
||||
|
||||
Fetching e-mail
|
||||
===============
|
||||
|
||||
Paperless automatically fetches your e-mail every 10 minutes by default. If
|
||||
you want to invoke the email consumer manually, call the following management
|
||||
command:
|
||||
|
||||
.. code::
|
||||
|
||||
mail_fetcher
|
||||
|
||||
The command takes no arguments and processes all your mail accounts and rules.
|
||||
|
||||
.. note::
|
||||
|
||||
As of October 2022 Microsoft no longer supports IMAP authentication for Exchange
|
||||
servers, thus Exchange is no longer supported until a solution is implemented in
|
||||
the Python IMAP library used by Paperless. See `learn.microsoft.com`_
|
||||
|
||||
.. _learn.microsoft.com: https://learn.microsoft.com/en-us/exchange/clients-and-mobile-in-exchange-online/deprecation-of-basic-authentication-exchange-online
|
||||
|
||||
.. _utilities-archiver:
|
||||
|
||||
Creating archived documents
|
||||
===========================
|
||||
|
||||
Paperless stores archived PDF/A documents alongside your original documents.
|
||||
These archived documents will also contain selectable text for image-only
|
||||
originals.
|
||||
These documents are derived from the originals, which are always stored
|
||||
unmodified. If coming from an earlier version of paperless, your documents
|
||||
won't have archived versions.
|
||||
|
||||
This command creates PDF/A documents for your documents.
|
||||
|
||||
.. code::
|
||||
|
||||
document_archiver --overwrite --document <id>
|
||||
|
||||
This command will only attempt to create archived documents when no archived
|
||||
document exists yet, unless ``--overwrite`` is specified. If ``--document <id>``
|
||||
is specified, the archiver will only process that document.
|
||||
|
||||
.. note::
|
||||
|
||||
This command essentially performs OCR on all your documents again,
|
||||
according to your settings. If you run this with ``PAPERLESS_OCR_MODE=redo``,
|
||||
it will potentially run for a very long time. You can cancel the command
|
||||
at any time, since this command will skip already archived versions the next time
|
||||
it is run.
|
||||
|
||||
.. note::
|
||||
|
||||
Some documents will cause errors and cannot be converted into PDF/A documents,
|
||||
such as encrypted PDF documents. The archiver will skip over these documents
|
||||
each time it sees them.
|
||||
|
||||
.. _utilities-encyption:
|
||||
|
||||
Managing encryption
|
||||
===================
|
||||
|
||||
Documents can be stored in Paperless using GnuPG encryption.
|
||||
|
||||
.. danger::
|
||||
|
||||
Encryption is deprecated since paperless-ngx 0.9 and doesn't really provide any
|
||||
additional security, since you have to store the passphrase in a configuration
|
||||
file on the same system as the encrypted documents for paperless to work.
|
||||
Furthermore, the entire text content of the documents is stored plain in the
|
||||
database, even if your documents are encrypted. Filenames are not encrypted as
|
||||
well.
|
||||
|
||||
Also, the web server provides transparent access to your encrypted documents.
|
||||
|
||||
Consider running paperless on an encrypted filesystem instead, which will then
|
||||
at least provide security against physical hardware theft.
|
||||
|
||||
|
||||
Enabling encryption
|
||||
-------------------
|
||||
|
||||
Enabling encryption is no longer supported.
|
||||
|
||||
|
||||
Disabling encryption
|
||||
--------------------
|
||||
|
||||
Basic usage to disable encryption of your document store:
|
||||
|
||||
(Note: If ``PAPERLESS_PASSPHRASE`` isn't set already, you need to specify it here)
|
||||
|
||||
.. code::
|
||||
|
||||
decrypt_documents [--passphrase SECR3TP4SSPHRA$E]
|
||||
You will be redirected shortly...
|
||||
|
@ -1,447 +1,11 @@
|
||||
.. _advanced_usage:
|
||||
|
||||
***************
|
||||
Advanced topics
|
||||
***************
|
||||
|
||||
Paperless offers a couple features that automate certain tasks and make your life
|
||||
easier.
|
||||
.. cssclass:: redirect-notice
|
||||
|
||||
.. _advanced-matching:
|
||||
The Paperless-ngx documentation has permanently moved.
|
||||
|
||||
Matching tags, correspondents, document types, and storage paths
|
||||
################################################################
|
||||
|
||||
Paperless will compare the matching algorithms defined by every tag, correspondent,
|
||||
document type, and storage path in your database to see if they apply to the text
|
||||
in a document. In other words, if you define a tag called ``Home Utility``
|
||||
that had a ``match`` property of ``bc hydro`` and a ``matching_algorithm`` of
|
||||
``literal``, Paperless will automatically tag your newly-consumed document with
|
||||
your ``Home Utility`` tag so long as the text ``bc hydro`` appears in the body
|
||||
of the document somewhere.
|
||||
|
||||
The matching logic is quite powerful. It supports searching the text of your
|
||||
document with different algorithms, and as such, some experimentation may be
|
||||
necessary to get things right.
|
||||
|
||||
In order to have a tag, correspondent, document type, or storage path assigned
|
||||
automatically to newly consumed documents, assign a match and matching algorithm
|
||||
using the web interface. These settings define when to assign tags, correspondents,
|
||||
document types, and storage paths to documents.
|
||||
|
||||
The following algorithms are available:
|
||||
|
||||
* **Any:** Looks for any occurrence of any word provided in match in the PDF.
|
||||
If you define the match as ``Bank1 Bank2``, it will match documents containing
|
||||
either of these terms.
|
||||
* **All:** Requires that every word provided appears in the PDF, albeit not in the
|
||||
order provided.
|
||||
* **Literal:** Matches only if the match appears exactly as provided (i.e. preserve ordering) in the PDF.
|
||||
* **Regular expression:** Parses the match as a regular expression and tries to
|
||||
find a match within the document.
|
||||
* **Fuzzy match:** I don't know. Look at the source.
|
||||
* **Auto:** Tries to automatically match new documents. This does not require you
|
||||
to set a match. See the notes below.
|
||||
|
||||
When using the *any* or *all* matching algorithms, you can search for terms
|
||||
that consist of multiple words by enclosing them in double quotes. For example,
|
||||
defining a match text of ``"Bank of America" BofA`` using the *any* algorithm,
|
||||
will match documents that contain either "Bank of America" or "BofA", but will
|
||||
not match documents containing "Bank of South America".
|
||||
|
||||
Then just save your tag, correspondent, document type, or storage path and run
|
||||
another document through the consumer. Once complete, you should see the
|
||||
newly-created document, automatically tagged with the appropriate data.
|
||||
|
||||
|
||||
.. _advanced-automatic_matching:
|
||||
|
||||
Automatic matching
|
||||
==================
|
||||
|
||||
Paperless-ngx comes with a new matching algorithm called *Auto*. This matching
|
||||
algorithm tries to assign tags, correspondents, document types, and storage paths
|
||||
to your documents based on how you have already assigned these on existing documents.
|
||||
It uses a neural network under the hood.
|
||||
|
||||
If, for example, all your bank statements of your account 123 at the Bank of
|
||||
America are tagged with the tag "bofa_123" and the matching algorithm of this
|
||||
tag is set to *Auto*, this neural network will examine your documents and
|
||||
automatically learn when to assign this tag.
|
||||
|
||||
Paperless tries to hide much of the involved complexity with this approach.
|
||||
However, there are a couple caveats you need to keep in mind when using this
|
||||
feature:
|
||||
|
||||
* Changes to your documents are not immediately reflected by the matching
|
||||
algorithm. The neural network needs to be *trained* on your documents after
|
||||
changes. Paperless periodically (default: once each hour) checks for changes
|
||||
and does this automatically for you.
|
||||
* The Auto matching algorithm only takes documents into account which are NOT
|
||||
placed in your inbox (i.e. have any inbox tags assigned to them). This ensures
|
||||
that the neural network only learns from documents which you have correctly
|
||||
tagged before.
|
||||
* The matching algorithm can only work if there is a correlation between the
|
||||
tag, correspondent, document type, or storage path and the document itself.
|
||||
Your bank statements usually contain your bank account number and the name
|
||||
of the bank, so this works reasonably well, However, tags such as "TODO"
|
||||
cannot be automatically assigned.
|
||||
* The matching algorithm needs a reasonable number of documents to identify when
|
||||
to assign tags, correspondents, storage paths, and types. If one out of a
|
||||
thousand documents has the correspondent "Very obscure web shop I bought
|
||||
something five years ago", it will probably not assign this correspondent
|
||||
automatically if you buy something from them again. The more documents, the better.
|
||||
* Paperless also needs a reasonable amount of negative examples to decide when
|
||||
not to assign a certain tag, correspondent, document type, or storage path. This will
|
||||
usually be the case as you start filling up paperless with documents.
|
||||
Example: If all your documents are either from "Webshop" and "Bank", paperless
|
||||
will assign one of these correspondents to ANY new document, if both are set
|
||||
to automatic matching.
|
||||
|
||||
Hooking into the consumption process
|
||||
####################################
|
||||
|
||||
Sometimes you may want to do something arbitrary whenever a document is
|
||||
consumed. Rather than try to predict what you may want to do, Paperless lets
|
||||
you execute scripts of your own choosing just before or after a document is
|
||||
consumed using a couple simple hooks.
|
||||
|
||||
Just write a script, put it somewhere that Paperless can read & execute, and
|
||||
then put the path to that script in ``paperless.conf`` or ``docker-compose.env`` with the variable name
|
||||
of either ``PAPERLESS_PRE_CONSUME_SCRIPT`` or
|
||||
``PAPERLESS_POST_CONSUME_SCRIPT``.
|
||||
|
||||
.. important::
|
||||
|
||||
These scripts are executed in a **blocking** process, which means that if
|
||||
a script takes a long time to run, it can significantly slow down your
|
||||
document consumption flow. If you want things to run asynchronously,
|
||||
you'll have to fork the process in your script and exit.
|
||||
|
||||
|
||||
Pre-consumption script
|
||||
======================
|
||||
|
||||
Executed after the consumer sees a new document in the consumption folder, but
|
||||
before any processing of the document is performed. This script can access the
|
||||
following relevant environment variables set:
|
||||
|
||||
* ``DOCUMENT_SOURCE_PATH``
|
||||
|
||||
A simple but common example for this would be creating a simple script like
|
||||
this:
|
||||
|
||||
``/usr/local/bin/ocr-pdf``
|
||||
|
||||
.. code:: bash
|
||||
|
||||
#!/usr/bin/env bash
|
||||
pdf2pdfocr.py -i ${DOCUMENT_SOURCE_PATH}
|
||||
|
||||
``/etc/paperless.conf``
|
||||
|
||||
.. code:: bash
|
||||
|
||||
...
|
||||
PAPERLESS_PRE_CONSUME_SCRIPT="/usr/local/bin/ocr-pdf"
|
||||
...
|
||||
|
||||
This will pass the path to the document about to be consumed to ``/usr/local/bin/ocr-pdf``,
|
||||
which will in turn call `pdf2pdfocr.py`_ on your document, which will then
|
||||
overwrite the file with an OCR'd version of the file and exit. At which point,
|
||||
the consumption process will begin with the newly modified file.
|
||||
|
||||
The script's stdout and stderr will be logged line by line to the webserver log, along
|
||||
with the exit code of the script.
|
||||
|
||||
.. _pdf2pdfocr.py: https://github.com/LeoFCardoso/pdf2pdfocr
|
||||
|
||||
.. _advanced-post_consume_script:
|
||||
|
||||
Post-consumption script
|
||||
=======================
|
||||
|
||||
Executed after the consumer has successfully processed a document and has moved it
|
||||
into paperless. It receives the following environment variables:
|
||||
|
||||
* ``DOCUMENT_ID``
|
||||
* ``DOCUMENT_FILE_NAME``
|
||||
* ``DOCUMENT_CREATED``
|
||||
* ``DOCUMENT_MODIFIED``
|
||||
* ``DOCUMENT_ADDED``
|
||||
* ``DOCUMENT_SOURCE_PATH``
|
||||
* ``DOCUMENT_ARCHIVE_PATH``
|
||||
* ``DOCUMENT_THUMBNAIL_PATH``
|
||||
* ``DOCUMENT_DOWNLOAD_URL``
|
||||
* ``DOCUMENT_THUMBNAIL_URL``
|
||||
* ``DOCUMENT_CORRESPONDENT``
|
||||
* ``DOCUMENT_TAGS``
|
||||
* ``DOCUMENT_ORIGINAL_FILENAME``
|
||||
|
||||
The script can be in any language, but for a simple shell script
|
||||
example, you can take a look at `post-consumption-example.sh`_ in this project.
|
||||
|
||||
The post consumption script cannot cancel the consumption process.
|
||||
|
||||
The script's stdout and stderr will be logged line by line to the webserver log, along
|
||||
with the exit code of the script.
|
||||
|
||||
|
||||
Docker
|
||||
------
|
||||
Assumed you have ``/home/foo/paperless-ngx/scripts/post-consumption-example.sh``.
|
||||
|
||||
You can pass that script into the consumer container via a host mount in your ``docker-compose.yml``.
|
||||
|
||||
.. code:: bash
|
||||
|
||||
...
|
||||
consumer:
|
||||
...
|
||||
volumes:
|
||||
...
|
||||
- /home/paperless-ngx/scripts:/path/in/container/scripts/
|
||||
...
|
||||
|
||||
Example (docker-compose.yml): ``- /home/foo/paperless-ngx/scripts:/usr/src/paperless/scripts``
|
||||
|
||||
which in turn requires the variable ``PAPERLESS_POST_CONSUME_SCRIPT`` in ``docker-compose.env`` to point to ``/path/in/container/scripts/post-consumption-example.sh``.
|
||||
|
||||
Example (docker-compose.env): ``PAPERLESS_POST_CONSUME_SCRIPT=/usr/src/paperless/scripts/post-consumption-example.sh``
|
||||
|
||||
Troubleshooting:
|
||||
|
||||
- Monitor the docker-compose log ``cd ~/paperless-ngx; docker-compose logs -f``
|
||||
- Check your script's permission e.g. in case of permission error ``sudo chmod 755 post-consumption-example.sh``
|
||||
- Pipe your scripts's output to a log file e.g. ``echo "${DOCUMENT_ID}" | tee --append /usr/src/paperless/scripts/post-consumption-example.log``
|
||||
|
||||
.. _post-consumption-example.sh: https://github.com/paperless-ngx/paperless-ngx/blob/main/scripts/post-consumption-example.sh
|
||||
|
||||
.. _advanced-file_name_handling:
|
||||
|
||||
File name handling
|
||||
##################
|
||||
|
||||
By default, paperless stores your documents in the media directory and renames them
|
||||
using the identifier which it has assigned to each document. You will end up getting
|
||||
files like ``0000123.pdf`` in your media directory. This isn't necessarily a bad
|
||||
thing, because you normally don't have to access these files manually. However, if
|
||||
you wish to name your files differently, you can do that by adjusting the
|
||||
``PAPERLESS_FILENAME_FORMAT`` configuration option. Paperless adds the correct
|
||||
file extension e.g. ``.pdf``, ``.jpg`` automatically.
|
||||
|
||||
This variable allows you to configure the filename (folders are allowed) using
|
||||
placeholders. For example, configuring this to
|
||||
|
||||
.. code:: bash
|
||||
|
||||
PAPERLESS_FILENAME_FORMAT={created_year}/{correspondent}/{title}
|
||||
|
||||
will create a directory structure as follows:
|
||||
|
||||
.. code::
|
||||
|
||||
2019/
|
||||
My bank/
|
||||
Statement January.pdf
|
||||
Statement February.pdf
|
||||
2020/
|
||||
My bank/
|
||||
Statement January.pdf
|
||||
Letter.pdf
|
||||
Letter_01.pdf
|
||||
Shoe store/
|
||||
My new shoes.pdf
|
||||
|
||||
.. danger::
|
||||
|
||||
Do not manually move your files in the media folder. Paperless remembers the
|
||||
last filename a document was stored as. If you do rename a file, paperless will
|
||||
report your files as missing and won't be able to find them.
|
||||
|
||||
Paperless provides the following placeholders within filenames:
|
||||
|
||||
* ``{asn}``: The archive serial number of the document, or "none".
|
||||
* ``{correspondent}``: The name of the correspondent, or "none".
|
||||
* ``{document_type}``: The name of the document type, or "none".
|
||||
* ``{tag_list}``: A comma separated list of all tags assigned to the document.
|
||||
* ``{title}``: The title of the document.
|
||||
* ``{created}``: The full date (ISO format) the document was created.
|
||||
* ``{created_year}``: Year created only, formatted as the year with century.
|
||||
* ``{created_year_short}``: Year created only, formatted as the year without century, zero padded.
|
||||
* ``{created_month}``: Month created only (number 01-12).
|
||||
* ``{created_month_name}``: Month created name, as per locale
|
||||
* ``{created_month_name_short}``: Month created abbreviated name, as per locale
|
||||
* ``{created_day}``: Day created only (number 01-31).
|
||||
* ``{added}``: The full date (ISO format) the document was added to paperless.
|
||||
* ``{added_year}``: Year added only.
|
||||
* ``{added_year_short}``: Year added only, formatted as the year without century, zero padded.
|
||||
* ``{added_month}``: Month added only (number 01-12).
|
||||
* ``{added_month_name}``: Month added name, as per locale
|
||||
* ``{added_month_name_short}``: Month added abbreviated name, as per locale
|
||||
* ``{added_day}``: Day added only (number 01-31).
|
||||
|
||||
|
||||
Paperless will try to conserve the information from your database as much as possible.
|
||||
However, some characters that you can use in document titles and correspondent names (such
|
||||
as ``: \ /`` and a couple more) are not allowed in filenames and will be replaced with dashes.
|
||||
|
||||
If paperless detects that two documents share the same filename, paperless will automatically
|
||||
append ``_01``, ``_02``, etc to the filename. This happens if all the placeholders in a filename
|
||||
evaluate to the same value.
|
||||
|
||||
.. hint::
|
||||
You can affect how empty placeholders are treated by changing the following setting to
|
||||
`true`.
|
||||
|
||||
.. code::
|
||||
|
||||
PAPERLESS_FILENAME_FORMAT_REMOVE_NONE=True
|
||||
|
||||
Doing this results in all empty placeholders resolving to "" instead of "none" as stated above.
|
||||
Spaces before empty placeholders are removed as well, empty directories are omitted.
|
||||
|
||||
.. hint::
|
||||
|
||||
Paperless checks the filename of a document whenever it is saved. Therefore,
|
||||
you need to update the filenames of your documents and move them after altering
|
||||
this setting by invoking the :ref:`document renamer <utilities-renamer>`.
|
||||
|
||||
.. warning::
|
||||
|
||||
Make absolutely sure you get the spelling of the placeholders right, or else
|
||||
paperless will use the default naming scheme instead.
|
||||
|
||||
.. caution::
|
||||
|
||||
As of now, you could totally tell paperless to store your files anywhere outside
|
||||
the media directory by setting
|
||||
|
||||
.. code::
|
||||
|
||||
PAPERLESS_FILENAME_FORMAT=../../my/custom/location/{title}
|
||||
|
||||
However, keep in mind that inside docker, if files get stored outside of the
|
||||
predefined volumes, they will be lost after a restart of paperless.
|
||||
|
||||
|
||||
Storage paths
|
||||
#############
|
||||
|
||||
One of the best things in Paperless is that you can not only access the documents via the
|
||||
web interface, but also via the file system.
|
||||
|
||||
When as single storage layout is not sufficient for your use case, storage paths come to
|
||||
the rescue. Storage paths allow you to configure more precisely where each document is stored
|
||||
in the file system.
|
||||
|
||||
- Each storage path is a `PAPERLESS_FILENAME_FORMAT` and follows the rules described above
|
||||
- Each document is assigned a storage path using the matching algorithms described above, but
|
||||
can be overwritten at any time
|
||||
|
||||
For example, you could define the following two storage paths:
|
||||
|
||||
1. Normal communications are put into a folder structure sorted by `year/correspondent`
|
||||
2. Communications with insurance companies are stored in a flat structure with longer file names,
|
||||
but containing the full date of the correspondence.
|
||||
|
||||
.. code::
|
||||
|
||||
By Year = {created_year}/{correspondent}/{title}
|
||||
Insurances = Insurances/{correspondent}/{created_year}-{created_month}-{created_day} {title}
|
||||
|
||||
|
||||
If you then map these storage paths to the documents, you might get the following result.
|
||||
For simplicity, `By Year` defines the same structure as in the previous example above.
|
||||
|
||||
.. code:: text
|
||||
|
||||
2019/ # By Year
|
||||
My bank/
|
||||
Statement January.pdf
|
||||
Statement February.pdf
|
||||
|
||||
Insurances/ # Insurances
|
||||
Healthcare 123/
|
||||
2022-01-01 Statement January.pdf
|
||||
2022-02-02 Letter.pdf
|
||||
2022-02-03 Letter.pdf
|
||||
Dental 456/
|
||||
2021-12-01 New Conditions.pdf
|
||||
|
||||
|
||||
.. hint::
|
||||
|
||||
Defining a storage path is optional. If no storage path is defined for a document, the global
|
||||
`PAPERLESS_FILENAME_FORMAT` is applied.
|
||||
|
||||
.. caution::
|
||||
|
||||
If you adjust the format of an existing storage path, old documents don't get relocated automatically.
|
||||
You need to run the :ref:`document renamer <utilities-renamer>` to adjust their pathes.
|
||||
|
||||
.. _advanced-celery-monitoring:
|
||||
|
||||
Celery Monitoring
|
||||
#################
|
||||
|
||||
The monitoring tool `Flower <https://flower.readthedocs.io/en/latest/index.html>`_ can be used to view more
|
||||
detailed information about the health of the celery workers used for asynchronous tasks. This includes details
|
||||
on currently running, queued and completed tasks, timing and more. Flower can also be used with Prometheus, as it
|
||||
exports metrics. For details on its capabilities, refer to the Flower documentation.
|
||||
|
||||
To configure Flower further, create a `flowerconfig.py` and place it into the `src/paperless` directory. For
|
||||
a Docker installation, you can use volumes to accomplish this:
|
||||
|
||||
.. code:: yaml
|
||||
|
||||
services:
|
||||
# ...
|
||||
webserver:
|
||||
# ...
|
||||
volumes:
|
||||
- /path/to/my/flowerconfig.py:/usr/src/paperless/src/paperless/flowerconfig.py:ro
|
||||
|
||||
Custom Container Initialization
|
||||
###############################
|
||||
|
||||
The Docker image includes the ability to run custom user scripts during startup. This could be
|
||||
utilized for installing additional tools or Python packages, for example.
|
||||
|
||||
To utilize this, mount a folder containing your scripts to the custom initialization directory, `/custom-cont-init.d`
|
||||
and place scripts you wish to run inside. For security, the folder and its contents must be owned by `root`.
|
||||
Additionally, scripts must only be writable by `root`.
|
||||
|
||||
Your scripts will be run directly before the webserver completes startup. Scripts will be run by the `root` user.
|
||||
This is an advanced functionality with which you could break functionality or lose data.
|
||||
|
||||
For example, using Docker Compose:
|
||||
|
||||
|
||||
.. code:: yaml
|
||||
|
||||
services:
|
||||
# ...
|
||||
webserver:
|
||||
# ...
|
||||
volumes:
|
||||
- /path/to/my/scripts:/custom-cont-init.d:ro
|
||||
|
||||
.. _advanced-mysql-caveats:
|
||||
|
||||
MySQL Caveats
|
||||
#############
|
||||
|
||||
Case Sensitivity
|
||||
================
|
||||
|
||||
The database interface does not provide a method to configure a MySQL database to
|
||||
be case sensitive. This would prevent a user from creating a tag ``Name`` and ``NAME``
|
||||
as they are considered the same.
|
||||
|
||||
Per Django documentation, to enable this requires manual intervention. To enable
|
||||
case sensetive tables, you can execute the following command against each table:
|
||||
|
||||
``ALTER TABLE <table_name> CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;``
|
||||
|
||||
You can also set the default for new tables (this does NOT affect existing tables) with:
|
||||
|
||||
``ALTER DATABASE <db_name> CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;``
|
||||
You will be redirected shortly...
|
||||
|
299
docs/api.rst
299
docs/api.rst
@ -1,303 +1,12 @@
|
||||
.. _api:
|
||||
|
||||
************
|
||||
The REST API
|
||||
************
|
||||
|
||||
|
||||
Paperless makes use of the `Django REST Framework`_ standard API interface.
|
||||
It provides a browsable API for most of its endpoints, which you can inspect
|
||||
at ``http://<paperless-host>:<port>/api/``. This also documents most of the
|
||||
available filters and ordering fields.
|
||||
.. cssclass:: redirect-notice
|
||||
|
||||
.. _Django REST Framework: http://django-rest-framework.org/
|
||||
The Paperless-ngx documentation has permanently moved.
|
||||
|
||||
The API provides 5 main endpoints:
|
||||
|
||||
* ``/api/documents/``: Full CRUD support, except POSTing new documents. See below.
|
||||
* ``/api/correspondents/``: Full CRUD support.
|
||||
* ``/api/document_types/``: Full CRUD support.
|
||||
* ``/api/logs/``: Read-Only.
|
||||
* ``/api/tags/``: Full CRUD support.
|
||||
|
||||
All of these endpoints except for the logging endpoint
|
||||
allow you to fetch, edit and delete individual objects
|
||||
by appending their primary key to the path, for example ``/api/documents/454/``.
|
||||
|
||||
The objects served by the document endpoint contain the following fields:
|
||||
|
||||
* ``id``: ID of the document. Read-only.
|
||||
* ``title``: Title of the document.
|
||||
* ``content``: Plain text content of the document.
|
||||
* ``tags``: List of IDs of tags assigned to this document, or empty list.
|
||||
* ``document_type``: Document type of this document, or null.
|
||||
* ``correspondent``: Correspondent of this document or null.
|
||||
* ``created``: The date time at which this document was created.
|
||||
* ``created_date``: The date (YYYY-MM-DD) at which this document was created. Optional. If also passed with created, this is ignored.
|
||||
* ``modified``: The date at which this document was last edited in paperless. Read-only.
|
||||
* ``added``: The date at which this document was added to paperless. Read-only.
|
||||
* ``archive_serial_number``: The identifier of this document in a physical document archive.
|
||||
* ``original_file_name``: Verbose filename of the original document. Read-only.
|
||||
* ``archived_file_name``: Verbose filename of the archived document. Read-only. Null if no archived document is available.
|
||||
|
||||
|
||||
Downloading documents
|
||||
#####################
|
||||
|
||||
In addition to that, the document endpoint offers these additional actions on
|
||||
individual documents:
|
||||
|
||||
* ``/api/documents/<pk>/download/``: Download the document.
|
||||
* ``/api/documents/<pk>/preview/``: Display the document inline,
|
||||
without downloading it.
|
||||
* ``/api/documents/<pk>/thumb/``: Download the PNG thumbnail of a document.
|
||||
|
||||
Paperless generates archived PDF/A documents from consumed files and stores both
|
||||
the original files as well as the archived files. By default, the endpoints
|
||||
for previews and downloads serve the archived file, if it is available.
|
||||
Otherwise, the original file is served.
|
||||
Some document cannot be archived.
|
||||
|
||||
The endpoints correctly serve the response header fields ``Content-Disposition``
|
||||
and ``Content-Type`` to indicate the filename for download and the type of content of
|
||||
the document.
|
||||
|
||||
In order to download or preview the original document when an archived document is available,
|
||||
supply the query parameter ``original=true``.
|
||||
|
||||
.. hint::
|
||||
|
||||
Paperless used to provide these functionality at ``/fetch/<pk>/preview``,
|
||||
``/fetch/<pk>/thumb`` and ``/fetch/<pk>/doc``. Redirects to the new URLs
|
||||
are in place. However, if you use these old URLs to access documents, you
|
||||
should update your app or script to use the new URLs.
|
||||
|
||||
|
||||
Getting document metadata
|
||||
#########################
|
||||
|
||||
The api also has an endpoint to retrieve read-only metadata about specific documents. this
|
||||
information is not served along with the document objects, since it requires reading
|
||||
files and would therefore slow down document lists considerably.
|
||||
|
||||
Access the metadata of a document with an ID ``id`` at ``/api/documents/<id>/metadata/``.
|
||||
|
||||
The endpoint reports the following data:
|
||||
|
||||
* ``original_checksum``: MD5 checksum of the original document.
|
||||
* ``original_size``: Size of the original document, in bytes.
|
||||
* ``original_mime_type``: Mime type of the original document.
|
||||
* ``media_filename``: Current filename of the document, under which it is stored inside the media directory.
|
||||
* ``has_archive_version``: True, if this document is archived, false otherwise.
|
||||
* ``original_metadata``: A list of metadata associated with the original document. See below.
|
||||
* ``archive_checksum``: MD5 checksum of the archived document, or null.
|
||||
* ``archive_size``: Size of the archived document in bytes, or null.
|
||||
* ``archive_metadata``: Metadata associated with the archived document, or null. See below.
|
||||
|
||||
File metadata is reported as a list of objects in the following form:
|
||||
|
||||
.. code:: json
|
||||
|
||||
[
|
||||
{
|
||||
"namespace": "http://ns.adobe.com/pdf/1.3/",
|
||||
"prefix": "pdf",
|
||||
"key": "Producer",
|
||||
"value": "SparklePDF, Fancy edition"
|
||||
},
|
||||
]
|
||||
|
||||
``namespace`` and ``prefix`` can be null. The actual metadata reported depends on the file type and the metadata
|
||||
available in that specific document. Paperless only reports PDF metadata at this point.
|
||||
|
||||
Authorization
|
||||
#############
|
||||
|
||||
The REST api provides three different forms of authentication.
|
||||
|
||||
1. Basic authentication
|
||||
|
||||
Authorize by providing a HTTP header in the form
|
||||
|
||||
.. code::
|
||||
|
||||
Authorization: Basic <credentials>
|
||||
|
||||
where ``credentials`` is a base64-encoded string of ``<username>:<password>``
|
||||
|
||||
2. Session authentication
|
||||
|
||||
When you're logged into paperless in your browser, you're automatically
|
||||
logged into the API as well and don't need to provide any authorization
|
||||
headers.
|
||||
|
||||
3. Token authentication
|
||||
|
||||
Paperless also offers an endpoint to acquire authentication tokens.
|
||||
|
||||
POST a username and password as a form or json string to ``/api/token/``
|
||||
and paperless will respond with a token, if the login data is correct.
|
||||
This token can be used to authenticate other requests with the
|
||||
following HTTP header:
|
||||
|
||||
.. code::
|
||||
|
||||
Authorization: Token <token>
|
||||
|
||||
Tokens can be managed and revoked in the paperless admin.
|
||||
|
||||
Searching for documents
|
||||
#######################
|
||||
|
||||
Full text searching is available on the ``/api/documents/`` endpoint. Two specific
|
||||
query parameters cause the API to return full text search results:
|
||||
|
||||
* ``/api/documents/?query=your%20search%20query``: Search for a document using a full text query.
|
||||
For details on the syntax, see :ref:`basic-usage_searching`.
|
||||
|
||||
* ``/api/documents/?more_like=1234``: Search for documents similar to the document with id 1234.
|
||||
|
||||
Pagination works exactly the same as it does for normal requests on this endpoint.
|
||||
|
||||
Certain limitations apply to full text queries:
|
||||
|
||||
* Results are always sorted by search score. The results matching the query best will show up first.
|
||||
|
||||
* Only a small subset of filtering parameters are supported.
|
||||
|
||||
Furthermore, each returned document has an additional ``__search_hit__`` attribute with various information
|
||||
about the search results:
|
||||
|
||||
.. code::
|
||||
|
||||
{
|
||||
"count": 31,
|
||||
"next": "http://localhost:8000/api/documents/?page=2&query=test",
|
||||
"previous": null,
|
||||
"results": [
|
||||
|
||||
...
|
||||
|
||||
{
|
||||
"id": 123,
|
||||
"title": "title",
|
||||
"content": "content",
|
||||
|
||||
...
|
||||
|
||||
"__search_hit__": {
|
||||
"score": 0.343,
|
||||
"highlights": "text <span class=\"match\">Test</span> text",
|
||||
"rank": 23
|
||||
}
|
||||
},
|
||||
|
||||
...
|
||||
|
||||
]
|
||||
}
|
||||
|
||||
* ``score`` is an indication how well this document matches the query relative to the other search results.
|
||||
* ``highlights`` is an excerpt from the document content and highlights the search terms with ``<span>`` tags as shown above.
|
||||
* ``rank`` is the index of the search results. The first result will have rank 0.
|
||||
|
||||
``/api/search/autocomplete/``
|
||||
=============================
|
||||
|
||||
Get auto completions for a partial search term.
|
||||
|
||||
Query parameters:
|
||||
|
||||
* ``term``: The incomplete term.
|
||||
* ``limit``: Amount of results. Defaults to 10.
|
||||
|
||||
Results returned by the endpoint are ordered by importance of the term in the
|
||||
document index. The first result is the term that has the highest Tf/Idf score
|
||||
in the index.
|
||||
|
||||
.. code:: json
|
||||
|
||||
[
|
||||
"term1",
|
||||
"term3",
|
||||
"term6",
|
||||
"term4"
|
||||
]
|
||||
|
||||
|
||||
.. _api-file_uploads:
|
||||
|
||||
POSTing documents
|
||||
#################
|
||||
|
||||
The API provides a special endpoint for file uploads:
|
||||
|
||||
``/api/documents/post_document/``
|
||||
|
||||
POST a multipart form to this endpoint, where the form field ``document`` contains
|
||||
the document that you want to upload to paperless. The filename is sanitized and
|
||||
then used to store the document in a temporary directory, and the consumer will
|
||||
be instructed to consume the document from there.
|
||||
|
||||
The endpoint supports the following optional form fields:
|
||||
|
||||
* ``title``: Specify a title that the consumer should use for the document.
|
||||
* ``created``: Specify a DateTime where the document was created (e.g. "2016-04-19" or "2016-04-19 06:15:00+02:00").
|
||||
* ``correspondent``: Specify the ID of a correspondent that the consumer should use for the document.
|
||||
* ``document_type``: Similar to correspondent.
|
||||
* ``tags``: Similar to correspondent. Specify this multiple times to have multiple tags added
|
||||
to the document.
|
||||
|
||||
|
||||
The endpoint will immediately return "OK" if the document consumption process
|
||||
was started successfully. No additional status information about the consumption
|
||||
process itself is available, since that happens in a different process.
|
||||
|
||||
|
||||
.. _api-versioning:
|
||||
|
||||
API Versioning
|
||||
##############
|
||||
|
||||
The REST API is versioned since Paperless-ngx 1.3.0.
|
||||
|
||||
* Versioning ensures that changes to the API don't break older clients.
|
||||
* Clients specify the specific version of the API they wish to use with every request and Paperless will handle the request using the specified API version.
|
||||
* Even if the underlying data model changes, older API versions will always serve compatible data.
|
||||
* If no version is specified, Paperless will serve version 1 to ensure compatibility with older clients that do not request a specific API version.
|
||||
|
||||
API versions are specified by submitting an additional HTTP ``Accept`` header with every request:
|
||||
|
||||
.. code::
|
||||
|
||||
Accept: application/json; version=6
|
||||
|
||||
If an invalid version is specified, Paperless 1.3.0 will respond with "406 Not Acceptable" and an error message in the body.
|
||||
Earlier versions of Paperless will serve API version 1 regardless of whether a version is specified via the ``Accept`` header.
|
||||
|
||||
If a client wishes to verify whether it is compatible with any given server, the following procedure should be performed:
|
||||
|
||||
1. Perform an *authenticated* request against any API endpoint. If the server is on version 1.3.0 or newer, the server will
|
||||
add two custom headers to the response:
|
||||
|
||||
.. code::
|
||||
|
||||
X-Api-Version: 2
|
||||
X-Version: 1.3.0
|
||||
|
||||
2. Determine whether the client is compatible with this server based on the presence/absence of these headers and their values if present.
|
||||
|
||||
|
||||
API Changelog
|
||||
=============
|
||||
|
||||
Version 1
|
||||
---------
|
||||
|
||||
Initial API version.
|
||||
|
||||
Version 2
|
||||
---------
|
||||
|
||||
* Added field ``Tag.color``. This read/write string field contains a hex color such as ``#a6cee3``.
|
||||
* Added read-only field ``Tag.text_color``. This field contains the text color to use for a specific tag, which is either black or white depending on the brightness of ``Tag.color``.
|
||||
* Removed field ``Tag.colour``.
|
||||
You will be redirected shortly...
|
||||
|
2445
docs/changelog.md
2445
docs/changelog.md
File diff suppressed because it is too large
Load Diff
11
docs/changelog.rst
Normal file
11
docs/changelog.rst
Normal file
@ -0,0 +1,11 @@
|
||||
.. _changelog:
|
||||
|
||||
*********
|
||||
Changelog
|
||||
*********
|
||||
|
||||
.. cssclass:: redirect-notice
|
||||
|
||||
The Paperless-ngx documentation has permanently moved.
|
||||
|
||||
You will be redirected shortly...
|
@ -4,928 +4,9 @@
|
||||
Configuration
|
||||
*************
|
||||
|
||||
Paperless provides a wide range of customizations.
|
||||
Depending on how you run paperless, these settings have to be defined in different
|
||||
places.
|
||||
|
||||
* If you run paperless on docker, ``paperless.conf`` is not used. Rather, configure
|
||||
paperless by copying necessary options to ``docker-compose.env``.
|
||||
* If you are running paperless on anything else, paperless will search for the
|
||||
configuration file in these locations and use the first one it finds:
|
||||
.. cssclass:: redirect-notice
|
||||
|
||||
.. code::
|
||||
The Paperless-ngx documentation has permanently moved.
|
||||
|
||||
/path/to/paperless/paperless.conf
|
||||
/etc/paperless.conf
|
||||
/usr/local/etc/paperless.conf
|
||||
|
||||
|
||||
Required services
|
||||
#################
|
||||
|
||||
PAPERLESS_REDIS=<url>
|
||||
This is required for processing scheduled tasks such as email fetching, index
|
||||
optimization and for training the automatic document matcher.
|
||||
|
||||
* If your Redis server needs login credentials PAPERLESS_REDIS = ``redis://<username>:<password>@<host>:<port>``
|
||||
|
||||
* With the requirepass option PAPERLESS_REDIS = ``redis://:<password>@<host>:<port>``
|
||||
|
||||
`More information on securing your Redis Instance <https://redis.io/docs/getting-started/#securing-redis>`_.
|
||||
|
||||
Defaults to redis://localhost:6379.
|
||||
|
||||
PAPERLESS_DBENGINE=<engine_name>
|
||||
Optional, gives the ability to choose Postgres or MariaDB for database engine.
|
||||
Available options are `postgresql` and `mariadb`.
|
||||
|
||||
Default is `postgresql`.
|
||||
|
||||
.. warning::
|
||||
|
||||
Using MariaDB comes with some caveats. See :ref:`advanced-mysql-caveats` for details.
|
||||
|
||||
|
||||
PAPERLESS_DBHOST=<hostname>
|
||||
By default, sqlite is used as the database backend. This can be changed here.
|
||||
|
||||
Set PAPERLESS_DBHOST and another database will be used instead of sqlite.
|
||||
|
||||
PAPERLESS_DBPORT=<port>
|
||||
Adjust port if necessary.
|
||||
|
||||
Default is 5432.
|
||||
|
||||
PAPERLESS_DBNAME=<name>
|
||||
Database name in PostgreSQL or MariaDB.
|
||||
|
||||
Defaults to "paperless".
|
||||
|
||||
PAPERLESS_DBUSER=<name>
|
||||
Database user in PostgreSQL or MariaDB.
|
||||
|
||||
Defaults to "paperless".
|
||||
|
||||
PAPERLESS_DBPASS=<password>
|
||||
Database password for PostgreSQL or MariaDB.
|
||||
|
||||
Defaults to "paperless".
|
||||
|
||||
PAPERLESS_DBSSLMODE=<mode>
|
||||
SSL mode to use when connecting to PostgreSQL.
|
||||
|
||||
See `the official documentation about sslmode <https://www.postgresql.org/docs/current/libpq-ssl.html>`_.
|
||||
|
||||
Default is ``prefer``.
|
||||
|
||||
PAPERLESS_DB_TIMEOUT=<float>
|
||||
Amount of time for a database connection to wait for the database to unlock.
|
||||
Mostly applicable for an sqlite based installation, consider changing to postgresql
|
||||
if you need to increase this.
|
||||
|
||||
Defaults to unset, keeping the Django defaults.
|
||||
|
||||
Paths and folders
|
||||
#################
|
||||
|
||||
PAPERLESS_CONSUMPTION_DIR=<path>
|
||||
This where your documents should go to be consumed. Make sure that it exists
|
||||
and that the user running the paperless service can read/write its contents
|
||||
before you start Paperless.
|
||||
|
||||
Don't change this when using docker, as it only changes the path within the
|
||||
container. Change the local consumption directory in the docker-compose.yml
|
||||
file instead.
|
||||
|
||||
Defaults to "../consume/", relative to the "src" directory.
|
||||
|
||||
PAPERLESS_DATA_DIR=<path>
|
||||
This is where paperless stores all its data (search index, SQLite database,
|
||||
classification model, etc).
|
||||
|
||||
Defaults to "../data/", relative to the "src" directory.
|
||||
|
||||
PAPERLESS_TRASH_DIR=<path>
|
||||
Instead of removing deleted documents, they are moved to this directory.
|
||||
|
||||
This must be writeable by the user running paperless. When running inside
|
||||
docker, ensure that this path is within a permanent volume (such as
|
||||
"../media/trash") so it won't get lost on upgrades.
|
||||
|
||||
Defaults to empty (i.e. really delete documents).
|
||||
|
||||
PAPERLESS_MEDIA_ROOT=<path>
|
||||
This is where your documents and thumbnails are stored.
|
||||
|
||||
You can set this and PAPERLESS_DATA_DIR to the same folder to have paperless
|
||||
store all its data within the same volume.
|
||||
|
||||
Defaults to "../media/", relative to the "src" directory.
|
||||
|
||||
PAPERLESS_STATICDIR=<path>
|
||||
Override the default STATIC_ROOT here. This is where all static files
|
||||
created using "collectstatic" manager command are stored.
|
||||
|
||||
Unless you're doing something fancy, there is no need to override this.
|
||||
|
||||
Defaults to "../static/", relative to the "src" directory.
|
||||
|
||||
PAPERLESS_FILENAME_FORMAT=<format>
|
||||
Changes the filenames paperless uses to store documents in the media directory.
|
||||
See :ref:`advanced-file_name_handling` for details.
|
||||
|
||||
Default is none, which disables this feature.
|
||||
|
||||
PAPERLESS_FILENAME_FORMAT_REMOVE_NONE=<bool>
|
||||
Tells paperless to replace placeholders in `PAPERLESS_FILENAME_FORMAT` that would resolve
|
||||
to 'none' to be omitted from the resulting filename. This also holds true for directory
|
||||
names.
|
||||
See :ref:`advanced-file_name_handling` for details.
|
||||
|
||||
Defaults to `false` which disables this feature.
|
||||
|
||||
PAPERLESS_LOGGING_DIR=<path>
|
||||
This is where paperless will store log files.
|
||||
|
||||
Defaults to "``PAPERLESS_DATA_DIR``/log/".
|
||||
|
||||
|
||||
Logging
|
||||
#######
|
||||
|
||||
PAPERLESS_LOGROTATE_MAX_SIZE=<num>
|
||||
Maximum file size for log files before they are rotated, in bytes.
|
||||
|
||||
Defaults to 1 MiB.
|
||||
|
||||
PAPERLESS_LOGROTATE_MAX_BACKUPS=<num>
|
||||
Number of rotated log files to keep.
|
||||
|
||||
Defaults to 20.
|
||||
|
||||
.. _hosting-and-security:
|
||||
|
||||
Hosting & Security
|
||||
##################
|
||||
|
||||
PAPERLESS_SECRET_KEY=<key>
|
||||
Paperless uses this to make session tokens. If you expose paperless on the
|
||||
internet, you need to change this, since the default secret is well known.
|
||||
|
||||
Use any sequence of characters. The more, the better. You don't need to
|
||||
remember this. Just face-roll your keyboard.
|
||||
|
||||
Default is listed in the file ``src/paperless/settings.py``.
|
||||
|
||||
PAPERLESS_URL=<url>
|
||||
This setting can be used to set the three options below (ALLOWED_HOSTS,
|
||||
CORS_ALLOWED_HOSTS and CSRF_TRUSTED_ORIGINS). If the other options are
|
||||
set the values will be combined with this one. Do not include a trailing
|
||||
slash. E.g. https://paperless.domain.com
|
||||
|
||||
Defaults to empty string, leaving the other settings unaffected.
|
||||
|
||||
PAPERLESS_CSRF_TRUSTED_ORIGINS=<comma-separated-list>
|
||||
A list of trusted origins for unsafe requests (e.g. POST). As of Django 4.0
|
||||
this is required to access the Django admin via the web.
|
||||
See https://docs.djangoproject.com/en/4.0/ref/settings/#csrf-trusted-origins
|
||||
|
||||
Can also be set using PAPERLESS_URL (see above).
|
||||
|
||||
Defaults to empty string, which does not add any origins to the trusted list.
|
||||
|
||||
PAPERLESS_ALLOWED_HOSTS=<comma-separated-list>
|
||||
If you're planning on putting Paperless on the open internet, then you
|
||||
really should set this value to the domain name you're using. Failing to do
|
||||
so leaves you open to HTTP host header attacks:
|
||||
https://docs.djangoproject.com/en/3.1/topics/security/#host-header-validation
|
||||
|
||||
Just remember that this is a comma-separated list, so "example.com" is fine,
|
||||
as is "example.com,www.example.com", but NOT " example.com" or "example.com,"
|
||||
|
||||
Can also be set using PAPERLESS_URL (see above).
|
||||
|
||||
If manually set, please remember to include "localhost". Otherwise docker
|
||||
healthcheck will fail.
|
||||
|
||||
Defaults to "*", which is all hosts.
|
||||
|
||||
PAPERLESS_CORS_ALLOWED_HOSTS=<comma-separated-list>
|
||||
You need to add your servers to the list of allowed hosts that can do CORS
|
||||
calls. Set this to your public domain name.
|
||||
|
||||
Can also be set using PAPERLESS_URL (see above).
|
||||
|
||||
Defaults to "http://localhost:8000".
|
||||
|
||||
PAPERLESS_FORCE_SCRIPT_NAME=<path>
|
||||
To host paperless under a subpath url like example.com/paperless you set
|
||||
this value to /paperless. No trailing slash!
|
||||
|
||||
Defaults to none, which hosts paperless at "/".
|
||||
|
||||
PAPERLESS_STATIC_URL=<path>
|
||||
Override the STATIC_URL here. Unless you're hosting Paperless off a
|
||||
subdomain like /paperless/, you probably don't need to change this.
|
||||
If you do change it, be sure to include the trailing slash.
|
||||
|
||||
Defaults to "/static/".
|
||||
|
||||
.. note::
|
||||
|
||||
When hosting paperless behind a reverse proxy like Traefik or Nginx at a subpath e.g.
|
||||
example.com/paperlessngx you will also need to set ``PAPERLESS_FORCE_SCRIPT_NAME``
|
||||
(see above).
|
||||
|
||||
PAPERLESS_AUTO_LOGIN_USERNAME=<username>
|
||||
Specify a username here so that paperless will automatically perform login
|
||||
with the selected user.
|
||||
|
||||
.. danger::
|
||||
|
||||
Do not use this when exposing paperless on the internet. There are no
|
||||
checks in place that would prevent you from doing this.
|
||||
|
||||
Defaults to none, which disables this feature.
|
||||
|
||||
PAPERLESS_ADMIN_USER=<username>
|
||||
If this environment variable is specified, Paperless automatically creates
|
||||
a superuser with the provided username at start. This is useful in cases
|
||||
where you can not run the `createsuperuser` command separately, such as Kubernetes
|
||||
or AWS ECS.
|
||||
|
||||
Requires `PAPERLESS_ADMIN_PASSWORD` to be set.
|
||||
|
||||
.. note::
|
||||
|
||||
This will not change an existing [super]user's password, nor will
|
||||
it recreate a user that already exists. You can leave this throughout
|
||||
the lifecycle of the containers.
|
||||
|
||||
PAPERLESS_ADMIN_MAIL=<email>
|
||||
(Optional) Specify superuser email address. Only used when
|
||||
`PAPERLESS_ADMIN_USER` is set.
|
||||
|
||||
Defaults to ``root@localhost``.
|
||||
|
||||
PAPERLESS_ADMIN_PASSWORD=<password>
|
||||
Only used when `PAPERLESS_ADMIN_USER` is set.
|
||||
This will be the password of the automatically created superuser.
|
||||
|
||||
|
||||
PAPERLESS_COOKIE_PREFIX=<str>
|
||||
Specify a prefix that is added to the cookies used by paperless to identify
|
||||
the currently logged in user. This is useful for when you're running two
|
||||
instances of paperless on the same host.
|
||||
|
||||
After changing this, you will have to login again.
|
||||
|
||||
Defaults to ``""``, which does not alter the cookie names.
|
||||
|
||||
PAPERLESS_ENABLE_HTTP_REMOTE_USER=<bool>
|
||||
Allows authentication via HTTP_REMOTE_USER which is used by some SSO
|
||||
applications.
|
||||
|
||||
.. warning::
|
||||
|
||||
This will allow authentication by simply adding a ``Remote-User: <username>`` header
|
||||
to a request. Use with care! You especially *must* ensure that any such header is not
|
||||
passed from your proxy server to paperless.
|
||||
|
||||
If you're exposing paperless to the internet directly, do not use this.
|
||||
|
||||
Also see the warning `in the official documentation <https://docs.djangoproject.com/en/3.1/howto/auth-remote-user/#configuration>`.
|
||||
|
||||
Defaults to `false` which disables this feature.
|
||||
|
||||
PAPERLESS_HTTP_REMOTE_USER_HEADER_NAME=<str>
|
||||
If `PAPERLESS_ENABLE_HTTP_REMOTE_USER` is enabled, this property allows to
|
||||
customize the name of the HTTP header from which the authenticated username
|
||||
is extracted. Values are in terms of
|
||||
[HttpRequest.META](https://docs.djangoproject.com/en/3.1/ref/request-response/#django.http.HttpRequest.META).
|
||||
Thus, the configured value must start with `HTTP_` followed by the
|
||||
normalized actual header name.
|
||||
|
||||
Defaults to `HTTP_REMOTE_USER`.
|
||||
|
||||
PAPERLESS_LOGOUT_REDIRECT_URL=<str>
|
||||
URL to redirect the user to after a logout. This can be used together with
|
||||
`PAPERLESS_ENABLE_HTTP_REMOTE_USER` to redirect the user back to the SSO
|
||||
application's logout page.
|
||||
|
||||
Defaults to None, which disables this feature.
|
||||
|
||||
.. _configuration-ocr:
|
||||
|
||||
OCR settings
|
||||
############
|
||||
|
||||
Paperless uses `OCRmyPDF <https://ocrmypdf.readthedocs.io/en/latest/>`_ for
|
||||
performing OCR on documents and images. Paperless uses sensible defaults for
|
||||
most settings, but all of them can be configured to your needs.
|
||||
|
||||
PAPERLESS_OCR_LANGUAGE=<lang>
|
||||
Customize the language that paperless will attempt to use when
|
||||
parsing documents.
|
||||
|
||||
It should be a 3-letter language code consistent with ISO
|
||||
639: https://www.loc.gov/standards/iso639-2/php/code_list.php
|
||||
|
||||
Set this to the language most of your documents are written in.
|
||||
|
||||
This can be a combination of multiple languages such as ``deu+eng``,
|
||||
in which case tesseract will use whatever language matches best.
|
||||
Keep in mind that tesseract uses much more cpu time with multiple
|
||||
languages enabled.
|
||||
|
||||
Defaults to "eng".
|
||||
|
||||
Note: If your language contains a '-' such as chi-sim, you must use chi_sim
|
||||
|
||||
PAPERLESS_OCR_MODE=<mode>
|
||||
Tell paperless when and how to perform ocr on your documents. Four modes
|
||||
are available:
|
||||
|
||||
* ``skip``: Paperless skips all pages and will perform ocr only on pages
|
||||
where no text is present. This is the safest option.
|
||||
* ``skip_noarchive``: In addition to skip, paperless won't create an
|
||||
archived version of your documents when it finds any text in them.
|
||||
This is useful if you don't want to have two almost-identical versions
|
||||
of your digital documents in the media folder. This is the fastest option.
|
||||
* ``redo``: Paperless will OCR all pages of your documents and attempt to
|
||||
replace any existing text layers with new text. This will be useful for
|
||||
documents from scanners that already performed OCR with insufficient
|
||||
results. It will also perform OCR on purely digital documents.
|
||||
|
||||
This option may fail on some documents that have features that cannot
|
||||
be removed, such as forms. In this case, the text from the document is
|
||||
used instead.
|
||||
* ``force``: Paperless rasterizes your documents, converting any text
|
||||
into images and puts the OCRed text on top. This works for all documents,
|
||||
however, the resulting document may be significantly larger and text
|
||||
won't appear as sharp when zoomed in.
|
||||
|
||||
The default is ``skip``, which only performs OCR when necessary and always
|
||||
creates archived documents.
|
||||
|
||||
Read more about this in the `OCRmyPDF documentation <https://ocrmypdf.readthedocs.io/en/latest/advanced.html#when-ocr-is-skipped>`_.
|
||||
|
||||
PAPERLESS_OCR_CLEAN=<mode>
|
||||
Tells paperless to use ``unpaper`` to clean any input document before
|
||||
sending it to tesseract. This uses more resources, but generally results
|
||||
in better OCR results. The following modes are available:
|
||||
|
||||
* ``clean``: Apply unpaper.
|
||||
* ``clean-final``: Apply unpaper, and use the cleaned images to build the
|
||||
output file instead of the original images.
|
||||
* ``none``: Do not apply unpaper.
|
||||
|
||||
Defaults to ``clean``.
|
||||
|
||||
.. note::
|
||||
|
||||
``clean-final`` is incompatible with ocr mode ``redo``. When both
|
||||
``clean-final`` and the ocr mode ``redo`` is configured, ``clean``
|
||||
is used instead.
|
||||
|
||||
PAPERLESS_OCR_DESKEW=<bool>
|
||||
Tells paperless to correct skewing (slight rotation of input images mainly
|
||||
due to improper scanning)
|
||||
|
||||
Defaults to ``true``, which enables this feature.
|
||||
|
||||
.. note::
|
||||
|
||||
Deskewing is incompatible with ocr mode ``redo``. Deskewing will get
|
||||
disabled automatically if ``redo`` is used as the ocr mode.
|
||||
|
||||
PAPERLESS_OCR_ROTATE_PAGES=<bool>
|
||||
Tells paperless to correct page rotation (90°, 180° and 270° rotation).
|
||||
|
||||
If you notice that paperless is not rotating incorrectly rotated
|
||||
pages (or vice versa), try adjusting the threshold up or down (see below).
|
||||
|
||||
Defaults to ``true``, which enables this feature.
|
||||
|
||||
|
||||
PAPERLESS_OCR_ROTATE_PAGES_THRESHOLD=<num>
|
||||
Adjust the threshold for automatic page rotation by ``PAPERLESS_OCR_ROTATE_PAGES``.
|
||||
This is an arbitrary value reported by tesseract. "15" is a very conservative value,
|
||||
whereas "2" is a very aggressive option and will often result in correctly rotated pages
|
||||
being rotated as well.
|
||||
|
||||
Defaults to "12".
|
||||
|
||||
PAPERLESS_OCR_OUTPUT_TYPE=<type>
|
||||
Specify the the type of PDF documents that paperless should produce.
|
||||
|
||||
* ``pdf``: Modify the PDF document as little as possible.
|
||||
* ``pdfa``: Convert PDF documents into PDF/A-2b documents, which is a
|
||||
subset of the entire PDF specification and meant for storing
|
||||
documents long term.
|
||||
* ``pdfa-1``, ``pdfa-2``, ``pdfa-3`` to specify the exact version of
|
||||
PDF/A you wish to use.
|
||||
|
||||
If not specified, ``pdfa`` is used. Remember that paperless also keeps
|
||||
the original input file as well as the archived version.
|
||||
|
||||
|
||||
PAPERLESS_OCR_PAGES=<num>
|
||||
Tells paperless to use only the specified amount of pages for OCR. Documents
|
||||
with less than the specified amount of pages get OCR'ed completely.
|
||||
|
||||
Specifying 1 here will only use the first page.
|
||||
|
||||
When combined with ``PAPERLESS_OCR_MODE=redo`` or ``PAPERLESS_OCR_MODE=force``,
|
||||
paperless will not modify any text it finds on excluded pages and copy it
|
||||
verbatim.
|
||||
|
||||
Defaults to 0, which disables this feature and always uses all pages.
|
||||
|
||||
PAPERLESS_OCR_IMAGE_DPI=<num>
|
||||
Paperless will OCR any images you put into the system and convert them
|
||||
into PDF documents. This is useful if your scanner produces images.
|
||||
In order to do so, paperless needs to know the DPI of the image.
|
||||
Most images from scanners will have this information embedded and
|
||||
paperless will detect and use that information. In case this fails, it
|
||||
uses this value as a fallback.
|
||||
|
||||
Set this to the DPI your scanner produces images at.
|
||||
|
||||
Default is none, which will automatically calculate image DPI so that
|
||||
the produced PDF documents are A4 sized.
|
||||
|
||||
PAPERLESS_OCR_MAX_IMAGE_PIXELS=<num>
|
||||
Paperless will raise a warning when OCRing images which are over this limit and
|
||||
will not OCR images which are more than twice this limit. Note this does not
|
||||
prevent the document from being consumed, but could result in missing text content.
|
||||
|
||||
If unset, will default to the value determined by
|
||||
`Pillow <https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.MAX_IMAGE_PIXELS>`_.
|
||||
|
||||
.. note::
|
||||
|
||||
Increasing this limit could cause Paperless to consume additional resources
|
||||
when consuming a file. Be sure you have sufficient system resources.
|
||||
|
||||
.. caution::
|
||||
|
||||
The limit is intended to prevent malicious files from consuming system resources
|
||||
and causing crashes and other errors. Only increase this value if you are certain
|
||||
your documents are not malicious and you need the text which was not OCRed
|
||||
|
||||
PAPERLESS_OCR_USER_ARGS=<json>
|
||||
OCRmyPDF offers many more options. Use this parameter to specify any
|
||||
additional arguments you wish to pass to OCRmyPDF. Since Paperless uses
|
||||
the API of OCRmyPDF, you have to specify these in a format that can be
|
||||
passed to the API. See `the API reference of OCRmyPDF <https://ocrmypdf.readthedocs.io/en/latest/api.html#reference>`_
|
||||
for valid parameters. All command line options are supported, but they
|
||||
use underscores instead of dashes.
|
||||
|
||||
.. caution::
|
||||
|
||||
Paperless has been tested to work with the OCR options provided
|
||||
above. There are many options that are incompatible with each other,
|
||||
so specifying invalid options may prevent paperless from consuming
|
||||
any documents.
|
||||
|
||||
Specify arguments as a JSON dictionary. Keep note of lower case booleans
|
||||
and double quoted parameter names and strings. Examples:
|
||||
|
||||
.. code:: json
|
||||
|
||||
{"deskew": true, "optimize": 3, "unpaper_args": "--pre-rotate 90"}
|
||||
|
||||
.. _configuration-tika:
|
||||
|
||||
Tika settings
|
||||
#############
|
||||
|
||||
Paperless can make use of `Tika <https://tika.apache.org/>`_ and
|
||||
`Gotenberg <https://gotenberg.dev/>`_ for parsing and
|
||||
converting "Office" documents (such as ".doc", ".xlsx" and ".odt"). If you
|
||||
wish to use this, you must provide a Tika server and a Gotenberg server,
|
||||
configure their endpoints, and enable the feature.
|
||||
|
||||
PAPERLESS_TIKA_ENABLED=<bool>
|
||||
Enable (or disable) the Tika parser.
|
||||
|
||||
Defaults to false.
|
||||
|
||||
PAPERLESS_TIKA_ENDPOINT=<url>
|
||||
Set the endpoint URL were Paperless can reach your Tika server.
|
||||
|
||||
Defaults to "http://localhost:9998".
|
||||
|
||||
PAPERLESS_TIKA_GOTENBERG_ENDPOINT=<url>
|
||||
Set the endpoint URL were Paperless can reach your Gotenberg server.
|
||||
|
||||
Defaults to "http://localhost:3000".
|
||||
|
||||
If you run paperless on docker, you can add those services to the docker-compose
|
||||
file (see the provided ``docker-compose.sqlite-tika.yml`` file for reference). The changes
|
||||
requires are as follows:
|
||||
|
||||
.. code:: yaml
|
||||
|
||||
services:
|
||||
# ...
|
||||
|
||||
webserver:
|
||||
# ...
|
||||
|
||||
environment:
|
||||
# ...
|
||||
|
||||
PAPERLESS_TIKA_ENABLED: 1
|
||||
PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
|
||||
PAPERLESS_TIKA_ENDPOINT: http://tika:9998
|
||||
|
||||
# ...
|
||||
|
||||
gotenberg:
|
||||
image: gotenberg/gotenberg:7.6
|
||||
restart: unless-stopped
|
||||
command:
|
||||
- "gotenberg"
|
||||
- "--chromium-disable-routes=true"
|
||||
|
||||
tika:
|
||||
image: ghcr.io/paperless-ngx/tika:latest
|
||||
restart: unless-stopped
|
||||
|
||||
Add the configuration variables to the environment of the webserver (alternatively
|
||||
put the configuration in the ``docker-compose.env`` file) and add the additional
|
||||
services below the webserver service. Watch out for indentation.
|
||||
|
||||
Make sure to use the correct format `PAPERLESS_TIKA_ENABLED = 1` so python_dotenv can parse the statement correctly.
|
||||
|
||||
Software tweaks
|
||||
###############
|
||||
|
||||
PAPERLESS_TASK_WORKERS=<num>
|
||||
Paperless does multiple things in the background: Maintain the search index,
|
||||
maintain the automatic matching algorithm, check emails, consume documents,
|
||||
etc. This variable specifies how many things it will do in parallel.
|
||||
|
||||
Defaults to 1
|
||||
|
||||
|
||||
PAPERLESS_THREADS_PER_WORKER=<num>
|
||||
Furthermore, paperless uses multiple threads when consuming documents to
|
||||
speed up OCR. This variable specifies how many pages paperless will process
|
||||
in parallel on a single document.
|
||||
|
||||
.. caution::
|
||||
|
||||
Ensure that the product
|
||||
|
||||
PAPERLESS_TASK_WORKERS * PAPERLESS_THREADS_PER_WORKER
|
||||
|
||||
does not exceed your CPU core count or else paperless will be extremely slow.
|
||||
If you want paperless to process many documents in parallel, choose a high
|
||||
worker count. If you want paperless to process very large documents faster,
|
||||
use a higher thread per worker count.
|
||||
|
||||
The default is a balance between the two, according to your CPU core count,
|
||||
with a slight favor towards threads per worker:
|
||||
|
||||
+----------------+---------+---------+
|
||||
| CPU core count | Workers | Threads |
|
||||
+----------------+---------+---------+
|
||||
| 1 | 1 | 1 |
|
||||
+----------------+---------+---------+
|
||||
| 2 | 2 | 1 |
|
||||
+----------------+---------+---------+
|
||||
| 4 | 2 | 2 |
|
||||
+----------------+---------+---------+
|
||||
| 6 | 2 | 3 |
|
||||
+----------------+---------+---------+
|
||||
| 8 | 2 | 4 |
|
||||
+----------------+---------+---------+
|
||||
| 12 | 3 | 4 |
|
||||
+----------------+---------+---------+
|
||||
| 16 | 4 | 4 |
|
||||
+----------------+---------+---------+
|
||||
|
||||
If you only specify PAPERLESS_TASK_WORKERS, paperless will adjust
|
||||
PAPERLESS_THREADS_PER_WORKER automatically.
|
||||
|
||||
|
||||
PAPERLESS_WORKER_TIMEOUT=<num>
|
||||
Machines with few cores or weak ones might not be able to finish OCR on
|
||||
large documents within the default 1800 seconds. So extending this timeout
|
||||
may prove to be useful on weak hardware setups.
|
||||
|
||||
PAPERLESS_WORKER_RETRY=<num>
|
||||
If PAPERLESS_WORKER_TIMEOUT has been configured, the retry time for a task can
|
||||
also be configured. By default, this value will be set to 10s more than the
|
||||
worker timeout. This value should never be set less than the worker timeout.
|
||||
|
||||
PAPERLESS_TIME_ZONE=<timezone>
|
||||
Set the time zone here.
|
||||
See https://docs.djangoproject.com/en/3.1/ref/settings/#std:setting-TIME_ZONE
|
||||
for details on how to set it.
|
||||
|
||||
Defaults to UTC.
|
||||
|
||||
|
||||
.. _configuration-polling:
|
||||
|
||||
PAPERLESS_CONSUMER_POLLING=<num>
|
||||
If paperless won't find documents added to your consume folder, it might
|
||||
not be able to automatically detect filesystem changes. In that case,
|
||||
specify a polling interval in seconds here, which will then cause paperless
|
||||
to periodically check your consumption directory for changes. This will also
|
||||
disable listening for file system changes with ``inotify``.
|
||||
|
||||
Defaults to 0, which disables polling and uses filesystem notifications.
|
||||
|
||||
PAPERLESS_CONSUMER_POLLING_RETRY_COUNT=<num>
|
||||
If consumer polling is enabled, sets the number of times paperless will check for a
|
||||
file to remain unmodified.
|
||||
|
||||
Defaults to 5.
|
||||
|
||||
PAPERLESS_CONSUMER_POLLING_DELAY=<num>
|
||||
If consumer polling is enabled, sets the delay in seconds between each check (above) paperless
|
||||
will do while waiting for a file to remain unmodified.
|
||||
|
||||
Defaults to 5.
|
||||
|
||||
.. _configuration-inotify:
|
||||
|
||||
PAPERLESS_CONSUMER_INOTIFY_DELAY=<num>
|
||||
Sets the time in seconds the consumer will wait for additional events
|
||||
from inotify before the consumer will consider a file ready and begin consumption.
|
||||
Certain scanners or network setups may generate multiple events for a single file,
|
||||
leading to multiple consumers working on the same file. Configure this to
|
||||
prevent that.
|
||||
|
||||
Defaults to 0.5 seconds.
|
||||
|
||||
PAPERLESS_CONSUMER_DELETE_DUPLICATES=<bool>
|
||||
When the consumer detects a duplicate document, it will not touch the
|
||||
original document. This default behavior can be changed here.
|
||||
|
||||
Defaults to false.
|
||||
|
||||
|
||||
PAPERLESS_CONSUMER_RECURSIVE=<bool>
|
||||
Enable recursive watching of the consumption directory. Paperless will
|
||||
then pickup files from files in subdirectories within your consumption
|
||||
directory as well.
|
||||
|
||||
Defaults to false.
|
||||
|
||||
|
||||
PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS=<bool>
|
||||
Set the names of subdirectories as tags for consumed files.
|
||||
E.g. <CONSUMPTION_DIR>/foo/bar/file.pdf will add the tags "foo" and "bar" to
|
||||
the consumed file. Paperless will create any tags that don't exist yet.
|
||||
|
||||
This is useful for sorting documents with certain tags such as ``car`` or
|
||||
``todo`` prior to consumption. These folders won't be deleted.
|
||||
|
||||
PAPERLESS_CONSUMER_RECURSIVE must be enabled for this to work.
|
||||
|
||||
Defaults to false.
|
||||
|
||||
PAPERLESS_CONSUMER_ENABLE_BARCODES=<bool>
|
||||
Enables the scanning and page separation based on detected barcodes.
|
||||
This allows for scanning and adding multiple documents per uploaded
|
||||
file, which are separated by one or multiple barcode pages.
|
||||
|
||||
For ease of use, it is suggested to use a standardized separation page,
|
||||
e.g. `here <https://www.alliancegroup.co.uk/patch-codes.htm>`_.
|
||||
|
||||
If no barcodes are detected in the uploaded file, no page separation
|
||||
will happen.
|
||||
|
||||
The original document will be removed and the separated pages will be
|
||||
saved as pdf.
|
||||
|
||||
Defaults to false.
|
||||
|
||||
|
||||
PAPERLESS_CONSUMER_BARCODE_TIFF_SUPPORT=<bool>
|
||||
Whether TIFF image files should be scanned for barcodes.
|
||||
This will automatically convert any TIFF image(s) to pdfs for later
|
||||
processing.
|
||||
This only has an effect, if PAPERLESS_CONSUMER_ENABLE_BARCODES has been
|
||||
enabled.
|
||||
|
||||
Defaults to false.
|
||||
|
||||
PAPERLESS_CONSUMER_BARCODE_STRING=PATCHT
|
||||
Defines the string to be detected as a separator barcode.
|
||||
If paperless is used with the PATCH-T separator pages, users
|
||||
shouldn't change this.
|
||||
|
||||
Defaults to "PATCHT"
|
||||
|
||||
PAPERLESS_CONVERT_MEMORY_LIMIT=<num>
|
||||
On smaller systems, or even in the case of Very Large Documents, the consumer
|
||||
may explode, complaining about how it's "unable to extend pixel cache". In
|
||||
such cases, try setting this to a reasonably low value, like 32. The
|
||||
default is to use whatever is necessary to do everything without writing to
|
||||
disk, and units are in megabytes.
|
||||
|
||||
For more information on how to use this value, you should search
|
||||
the web for "MAGICK_MEMORY_LIMIT".
|
||||
|
||||
Defaults to 0, which disables the limit.
|
||||
|
||||
PAPERLESS_CONVERT_TMPDIR=<path>
|
||||
Similar to the memory limit, if you've got a small system and your OS mounts
|
||||
/tmp as tmpfs, you should set this to a path that's on a physical disk, like
|
||||
/home/your_user/tmp or something. ImageMagick will use this as scratch space
|
||||
when crunching through very large documents.
|
||||
|
||||
For more information on how to use this value, you should search
|
||||
the web for "MAGICK_TMPDIR".
|
||||
|
||||
Default is none, which disables the temporary directory.
|
||||
|
||||
PAPERLESS_POST_CONSUME_SCRIPT=<filename>
|
||||
After a document is consumed, Paperless can trigger an arbitrary script if
|
||||
you like. This script will be passed a number of arguments for you to work
|
||||
with. For more information, take a look at :ref:`advanced-post_consume_script`.
|
||||
|
||||
The default is blank, which means nothing will be executed.
|
||||
|
||||
PAPERLESS_FILENAME_DATE_ORDER=<format>
|
||||
Paperless will check the document text for document date information.
|
||||
Use this setting to enable checking the document filename for date
|
||||
information. The date order can be set to any option as specified in
|
||||
https://dateparser.readthedocs.io/en/latest/settings.html#date-order.
|
||||
The filename will be checked first, and if nothing is found, the document
|
||||
text will be checked as normal.
|
||||
|
||||
A date in a filename must have some separators (`.`, `-`, `/`, etc)
|
||||
for it to be parsed.
|
||||
|
||||
Defaults to none, which disables this feature.
|
||||
|
||||
PAPERLESS_NUMBER_OF_SUGGESTED_DATES=<num>
|
||||
Paperless searches an entire document for dates. The first date found will
|
||||
be used as the initial value for the created date. When this variable is
|
||||
greater than 0 (or left to it's default value), paperless will also suggest
|
||||
other dates found in the document, up to a maximum of this setting. Note that
|
||||
duplicates will be removed, which can result in fewer dates displayed in the
|
||||
frontend than this setting value.
|
||||
|
||||
The task to find all dates can be time-consuming and increases with a higher
|
||||
(maximum) number of suggested dates and slower hardware.
|
||||
|
||||
Defaults to 3. Set to 0 to disable this feature.
|
||||
|
||||
PAPERLESS_THUMBNAIL_FONT_NAME=<filename>
|
||||
Paperless creates thumbnails for plain text files by rendering the content
|
||||
of the file on an image and uses a predefined font for that. This
|
||||
font can be changed here.
|
||||
|
||||
Note that this won't have any effect on already generated thumbnails.
|
||||
|
||||
Defaults to ``/usr/share/fonts/liberation/LiberationSerif-Regular.ttf``.
|
||||
|
||||
PAPERLESS_IGNORE_DATES=<string>
|
||||
Paperless parses a documents creation date from filename and file content.
|
||||
You may specify a comma separated list of dates that should be ignored during
|
||||
this process. This is useful for special dates (like date of birth) that appear
|
||||
in documents regularly but are very unlikely to be the documents creation date.
|
||||
|
||||
The date is parsed using the order specified in PAPERLESS_DATE_ORDER
|
||||
|
||||
Defaults to an empty string to not ignore any dates.
|
||||
|
||||
PAPERLESS_DATE_ORDER=<format>
|
||||
Paperless will try to determine the document creation date from its contents.
|
||||
Specify the date format Paperless should expect to see within your documents.
|
||||
|
||||
This option defaults to DMY which translates to day first, month second, and year
|
||||
last order. Characters D, M, or Y can be shuffled to meet the required order.
|
||||
|
||||
PAPERLESS_CONSUMER_IGNORE_PATTERNS=<json>
|
||||
By default, paperless ignores certain files and folders in the consumption
|
||||
directory, such as system files created by the Mac OS.
|
||||
|
||||
This can be adjusted by configuring a custom json array with patterns to exclude.
|
||||
|
||||
Defaults to ``[".DS_STORE/*", "._*", ".stfolder/*", ".stversions/*", ".localized/*", "desktop.ini"]``.
|
||||
|
||||
Binaries
|
||||
########
|
||||
|
||||
There are a few external software packages that Paperless expects to find on
|
||||
your system when it starts up. Unless you've done something creative with
|
||||
their installation, you probably won't need to edit any of these. However,
|
||||
if you've installed these programs somewhere where simply typing the name of
|
||||
the program doesn't automatically execute it (ie. the program isn't in your
|
||||
$PATH), then you'll need to specify the literal path for that program.
|
||||
|
||||
PAPERLESS_CONVERT_BINARY=<path>
|
||||
Defaults to "convert".
|
||||
|
||||
PAPERLESS_GS_BINARY=<path>
|
||||
Defaults to "gs".
|
||||
|
||||
|
||||
.. _configuration-docker:
|
||||
|
||||
Docker-specific options
|
||||
#######################
|
||||
|
||||
These options don't have any effect in ``paperless.conf``. These options adjust
|
||||
the behavior of the docker container. Configure these in `docker-compose.env`.
|
||||
|
||||
PAPERLESS_WEBSERVER_WORKERS=<num>
|
||||
The number of worker processes the webserver should spawn. More worker processes
|
||||
usually result in the front end to load data much quicker. However, each worker process
|
||||
also loads the entire application into memory separately, so increasing this value
|
||||
will increase RAM usage.
|
||||
|
||||
Defaults to 1.
|
||||
|
||||
PAPERLESS_BIND_ADDR=<ip address>
|
||||
The IP address the webserver will listen on inside the container. There are
|
||||
special setups where you may need to configure this value to restrict the
|
||||
Ip address or interface the webserver listens on.
|
||||
|
||||
Defaults to [::], meaning all interfaces, including IPv6.
|
||||
|
||||
PAPERLESS_PORT=<port>
|
||||
The port number the webserver will listen on inside the container. There are
|
||||
special setups where you may need this to avoid collisions with other
|
||||
services (like using podman with multiple containers in one pod).
|
||||
|
||||
Don't change this when using Docker. To change the port the webserver is
|
||||
reachable outside of the container, instead refer to the "ports" key in
|
||||
``docker-compose.yml``.
|
||||
|
||||
Defaults to 8000.
|
||||
|
||||
USERMAP_UID=<uid>
|
||||
The ID of the paperless user in the container. Set this to your actual user ID on the
|
||||
host system, which you can get by executing
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
$ id -u
|
||||
|
||||
Paperless will change ownership on its folders to this user, so you need to get this right
|
||||
in order to be able to write to the consumption directory.
|
||||
|
||||
Defaults to 1000.
|
||||
|
||||
USERMAP_GID=<gid>
|
||||
The ID of the paperless Group in the container. Set this to your actual group ID on the
|
||||
host system, which you can get by executing
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
$ id -g
|
||||
|
||||
Paperless will change ownership on its folders to this group, so you need to get this right
|
||||
in order to be able to write to the consumption directory.
|
||||
|
||||
Defaults to 1000.
|
||||
|
||||
PAPERLESS_OCR_LANGUAGES=<list>
|
||||
Additional OCR languages to install. By default, paperless comes with
|
||||
English, German, Italian, Spanish and French. If your language is not in this list, install
|
||||
additional languages with this configuration option:
|
||||
|
||||
.. code:: bash
|
||||
|
||||
PAPERLESS_OCR_LANGUAGES=tur ces
|
||||
|
||||
To actually use these languages, also set the default OCR language of paperless:
|
||||
|
||||
.. code:: bash
|
||||
|
||||
PAPERLESS_OCR_LANGUAGE=tur
|
||||
|
||||
Defaults to none, which does not install any additional languages.
|
||||
|
||||
PAPERLESS_ENABLE_FLOWER=<defined>
|
||||
If this environment variable is defined, the Celery monitoring tool
|
||||
`Flower <https://flower.readthedocs.io/en/latest/index.html>`_ will
|
||||
be started by the container.
|
||||
|
||||
You can read more about this in the :ref:`advanced setup <advanced-celery-monitoring>`
|
||||
documentation.
|
||||
|
||||
|
||||
.. _configuration-update-checking:
|
||||
|
||||
Update Checking
|
||||
###############
|
||||
|
||||
PAPERLESS_ENABLE_UPDATE_CHECK=<bool>
|
||||
|
||||
.. note::
|
||||
|
||||
This setting was deprecated in favor of a frontend setting after v1.9.2. A one-time
|
||||
migration is performed for users who have this setting set. This setting is always
|
||||
ignored if the corresponding frontend setting has been set.
|
||||
You will be redirected shortly...
|
||||
|
@ -1,431 +1,12 @@
|
||||
.. _extending:
|
||||
|
||||
*************************
|
||||
Paperless-ngx Development
|
||||
#########################
|
||||
*************************
|
||||
|
||||
This section describes the steps you need to take to start development on paperless-ngx.
|
||||
|
||||
Check out the source from github. The repository is organized in the following way:
|
||||
.. cssclass:: redirect-notice
|
||||
|
||||
* ``main`` always represents the latest release and will only see changes
|
||||
when a new release is made.
|
||||
* ``dev`` contains the code that will be in the next release.
|
||||
* ``feature-X`` contain bigger changes that will be in some release, but not
|
||||
necessarily the next one.
|
||||
The Paperless-ngx documentation has permanently moved.
|
||||
|
||||
When making functional changes to paperless, *always* make your changes on the ``dev`` branch.
|
||||
|
||||
Apart from that, the folder structure is as follows:
|
||||
|
||||
* ``docs/`` - Documentation.
|
||||
* ``src-ui/`` - Code of the front end.
|
||||
* ``src/`` - Code of the back end.
|
||||
* ``scripts/`` - Various scripts that help with different parts of development.
|
||||
* ``docker/`` - Files required to build the docker image.
|
||||
|
||||
Contributing to Paperless
|
||||
=========================
|
||||
|
||||
Maybe you've been using Paperless for a while and want to add a feature or two,
|
||||
or maybe you've come across a bug that you have some ideas how to solve. The
|
||||
beauty of open source software is that you can see what's wrong and help to get
|
||||
it fixed for everyone!
|
||||
|
||||
Before contributing please review our `code of conduct`_ and other important
|
||||
information in the `contributing guidelines`_.
|
||||
|
||||
.. _code-formatting-with-pre-commit-hooks:
|
||||
|
||||
Code formatting with pre-commit Hooks
|
||||
=====================================
|
||||
|
||||
To ensure a consistent style and formatting across the project source, the project
|
||||
utilizes a Git `pre-commit` hook to perform some formatting and linting before a
|
||||
commit is allowed. That way, everyone uses the same style and some common issues
|
||||
can be caught early on. See below for installation instructions.
|
||||
|
||||
Once installed, hooks will run when you commit. If the formatting isn't quite right
|
||||
or a linter catches something, the commit will be rejected. You'll need to look at the
|
||||
output and fix the issue. Some hooks, such as the Python formatting tool `black`,
|
||||
will format failing files, so all you need to do is `git add` those files again and
|
||||
retry your commit.
|
||||
|
||||
Initial setup and first start
|
||||
=============================
|
||||
|
||||
After you forked and cloned the code from github you need to perform a first-time setup.
|
||||
To do the setup you need to perform the steps from the following chapters in a certain order:
|
||||
|
||||
1. Install prerequisites + pipenv as mentioned in :ref:`Bare metal route <setup-bare_metal>`
|
||||
2. Copy ``paperless.conf.example`` to ``paperless.conf`` and enable debug mode.
|
||||
3. Install the Angular CLI interface:
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
$ npm install -g @angular/cli
|
||||
|
||||
4. Install pre-commit
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
pre-commit install
|
||||
|
||||
5. Create ``consume`` and ``media`` folders in the cloned root folder.
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
mkdir -p consume media
|
||||
|
||||
6. You can now either ...
|
||||
|
||||
* install redis or
|
||||
* use the included scripts/start-services.sh to use docker to fire up a redis instance (and some other services such as tika, gotenberg and a database server) or
|
||||
* spin up a bare redis container
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
docker run -d -p 6379:6379 --restart unless-stopped redis:latest
|
||||
|
||||
7. Install the python dependencies by performing in the src/ directory.
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
pipenv install --dev
|
||||
|
||||
* Make sure you're using python 3.9.x or lower. Otherwise you might get issues with building dependencies. You can use `pyenv <https://github.com/pyenv/pyenv>`_ to install a specific python version.
|
||||
|
||||
8. Generate the static UI so you can perform a login to get session that is required for frontend development (this needs to be done one time only). From src-ui directory:
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
npm install .
|
||||
./node_modules/.bin/ng build --configuration production
|
||||
|
||||
9. Apply migrations and create a superuser for your dev instance:
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
python3 manage.py migrate
|
||||
python3 manage.py createsuperuser
|
||||
|
||||
10. Now spin up the dev backend. Depending on which part of paperless you're developing for, you need to have some or all of them running.
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
python3 manage.py runserver & python3 manage.py document_consumer & celery --app paperless worker
|
||||
|
||||
11. Login with the superuser credentials provided in step 8 at ``http://localhost:8000`` to create a session that enables you to use the backend.
|
||||
|
||||
Backend development environment is now ready, to start Frontend development go to ``/src-ui`` and run ``ng serve``. From there you can use ``http://localhost:4200`` for a preview.
|
||||
|
||||
Back end development
|
||||
====================
|
||||
|
||||
The backend is a django application. PyCharm works well for development, but you can use whatever
|
||||
you want.
|
||||
|
||||
Configure the IDE to use the src/ folder as the base source folder. Configure the following
|
||||
launch configurations in your IDE:
|
||||
|
||||
* python3 manage.py runserver
|
||||
* celery --app paperless worker
|
||||
* python3 manage.py document_consumer
|
||||
|
||||
To start them all:
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
python3 manage.py runserver & python3 manage.py document_consumer & celery --app paperless worker
|
||||
|
||||
Testing and code style:
|
||||
|
||||
* Run ``pytest`` in the src/ directory to execute all tests. This also generates a HTML coverage
|
||||
report. When runnings test, paperless.conf is loaded as well. However: the tests rely on the default
|
||||
configuration. This is not ideal. But for now, make sure no settings except for DEBUG are overridden when testing.
|
||||
* Coding style is enforced by the Git pre-commit hooks. These will ensure your code is formatted and do some
|
||||
linting when you do a `git commit`.
|
||||
* You can also run ``black`` manually to format your code
|
||||
|
||||
.. note::
|
||||
|
||||
The line length rule E501 is generally useful for getting multiple source files
|
||||
next to each other on the screen. However, in some cases, its just not possible
|
||||
to make some lines fit, especially complicated IF cases. Append ``# NOQA: E501``
|
||||
to disable this check for certain lines.
|
||||
|
||||
Front end development
|
||||
=====================
|
||||
|
||||
The front end is built using Angular. In order to get started, you need ``npm``.
|
||||
Install the Angular CLI interface with
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
$ npm install -g @angular/cli
|
||||
|
||||
and make sure that it's on your path. Next, in the src-ui/ directory, install the
|
||||
required dependencies of the project.
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
$ npm install
|
||||
|
||||
You can launch a development server by running
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
$ ng serve
|
||||
|
||||
This will automatically update whenever you save. However, in-place compilation might fail
|
||||
on syntax errors, in which case you need to restart it.
|
||||
|
||||
By default, the development server is available on ``http://localhost:4200/`` and is configured
|
||||
to access the API at ``http://localhost:8000/api/``, which is the default of the backend.
|
||||
If you enabled DEBUG on the back end, several security overrides for allowed hosts, CORS and
|
||||
X-Frame-Options are in place so that the front end behaves exactly as in production. This also
|
||||
relies on you being logged into the back end. Without a valid session, The front end will simply
|
||||
not work.
|
||||
|
||||
Testing and code style:
|
||||
|
||||
* The frontend code (.ts, .html, .scss) use ``prettier`` for code formatting via the Git
|
||||
``pre-commit`` hooks which run automatically on commit. See
|
||||
:ref:`above <code-formatting-with-pre-commit-hooks>` for installation. You can also run this
|
||||
via cli with a command such as
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
$ git ls-files -- '*.ts' | xargs pre-commit run prettier --files
|
||||
|
||||
* Frontend testing uses jest and cypress. There is currently a need for significantly more
|
||||
frontend tests. Unit tests and e2e tests, respectively, can be run non-interactively with:
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
$ ng test
|
||||
$ npm run e2e:ci
|
||||
|
||||
Cypress also includes a UI which can be run from within the ``src-ui`` directory with
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
$ ./node_modules/.bin/cypress open
|
||||
|
||||
In order to build the front end and serve it as part of django, execute
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
$ ng build --prod
|
||||
|
||||
This will build the front end and put it in a location from which the Django server will serve
|
||||
it as static content. This way, you can verify that authentication is working.
|
||||
|
||||
|
||||
Localization
|
||||
============
|
||||
|
||||
Paperless is available in many different languages. Since paperless consists both of a django
|
||||
application and an Angular front end, both these parts have to be translated separately.
|
||||
|
||||
Front end localization
|
||||
----------------------
|
||||
|
||||
* The Angular front end does localization according to the `Angular documentation <https://angular.io/guide/i18n>`_.
|
||||
* The source language of the project is "en_US".
|
||||
* The source strings end up in the file "src-ui/messages.xlf".
|
||||
* The translated strings need to be placed in the "src-ui/src/locale/" folder.
|
||||
* In order to extract added or changed strings from the source files, call ``ng xi18n --ivy``.
|
||||
|
||||
Adding new languages requires adding the translated files in the "src-ui/src/locale/" folder and adjusting a couple files.
|
||||
|
||||
1. Adjust "src-ui/angular.json":
|
||||
|
||||
.. code:: json
|
||||
|
||||
"i18n": {
|
||||
"sourceLocale": "en-US",
|
||||
"locales": {
|
||||
"de": "src/locale/messages.de.xlf",
|
||||
"nl-NL": "src/locale/messages.nl_NL.xlf",
|
||||
"fr": "src/locale/messages.fr.xlf",
|
||||
"en-GB": "src/locale/messages.en_GB.xlf",
|
||||
"pt-BR": "src/locale/messages.pt_BR.xlf",
|
||||
"language-code": "language-file"
|
||||
}
|
||||
}
|
||||
|
||||
2. Add the language to the available options in "src-ui/src/app/services/settings.service.ts":
|
||||
|
||||
.. code:: typescript
|
||||
|
||||
getLanguageOptions(): LanguageOption[] {
|
||||
return [
|
||||
{code: "en-us", name: $localize`English (US)`, englishName: "English (US)", dateInputFormat: "mm/dd/yyyy"},
|
||||
{code: "en-gb", name: $localize`English (GB)`, englishName: "English (GB)", dateInputFormat: "dd/mm/yyyy"},
|
||||
{code: "de", name: $localize`German`, englishName: "German", dateInputFormat: "dd.mm.yyyy"},
|
||||
{code: "nl", name: $localize`Dutch`, englishName: "Dutch", dateInputFormat: "dd-mm-yyyy"},
|
||||
{code: "fr", name: $localize`French`, englishName: "French", dateInputFormat: "dd/mm/yyyy"},
|
||||
{code: "pt-br", name: $localize`Portuguese (Brazil)`, englishName: "Portuguese (Brazil)", dateInputFormat: "dd/mm/yyyy"}
|
||||
// Add your new language here
|
||||
]
|
||||
}
|
||||
|
||||
``dateInputFormat`` is a special string that defines the behavior of the date input fields and absolutely needs to contain "dd", "mm" and "yyyy".
|
||||
|
||||
3. Import and register the Angular data for this locale in "src-ui/src/app/app.module.ts":
|
||||
|
||||
.. code:: typescript
|
||||
|
||||
import localeDe from '@angular/common/locales/de';
|
||||
registerLocaleData(localeDe)
|
||||
|
||||
Back end localization
|
||||
---------------------
|
||||
|
||||
A majority of the strings that appear in the back end appear only when the admin is used. However,
|
||||
some of these are still shown on the front end (such as error messages).
|
||||
|
||||
* The django application does localization according to the `django documentation <https://docs.djangoproject.com/en/3.1/topics/i18n/translation/>`_.
|
||||
* The source language of the project is "en_US".
|
||||
* Localization files end up in the folder "src/locale/".
|
||||
* In order to extract strings from the application, call ``python3 manage.py makemessages -l en_US``. This is important after making changes to translatable strings.
|
||||
* The message files need to be compiled for them to show up in the application. Call ``python3 manage.py compilemessages`` to do this. The generated files don't get
|
||||
committed into git, since these are derived artifacts. The build pipeline takes care of executing this command.
|
||||
|
||||
Adding new languages requires adding the translated files in the "src/locale/" folder and adjusting the file "src/paperless/settings.py" to include the new language:
|
||||
|
||||
.. code:: python
|
||||
|
||||
LANGUAGES = [
|
||||
("en-us", _("English (US)")),
|
||||
("en-gb", _("English (GB)")),
|
||||
("de", _("German")),
|
||||
("nl-nl", _("Dutch")),
|
||||
("fr", _("French")),
|
||||
("pt-br", _("Portuguese (Brazil)")),
|
||||
# Add language here.
|
||||
]
|
||||
|
||||
|
||||
Building the documentation
|
||||
==========================
|
||||
|
||||
The documentation is built using sphinx. I've configured ReadTheDocs to automatically build
|
||||
the documentation when changes are pushed. If you want to build the documentation locally,
|
||||
this is how you do it:
|
||||
|
||||
1. Install python dependencies.
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
$ cd /path/to/paperless
|
||||
$ pipenv install --dev
|
||||
|
||||
2. Build the documentation
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
$ cd /path/to/paperless/docs
|
||||
$ pipenv run make clean html
|
||||
|
||||
This will build the HTML documentation, and put the resulting files in the ``_build/html``
|
||||
directory.
|
||||
|
||||
Building the Docker image
|
||||
=========================
|
||||
|
||||
The docker image is primarily built by the GitHub actions workflow, but it can be
|
||||
faster when developing to build and tag an image locally.
|
||||
|
||||
To provide the build arguments automatically, build the image using the helper
|
||||
script ``build-docker-image.sh``.
|
||||
|
||||
Building the docker image from source:
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
./build-docker-image.sh Dockerfile -t <your-tag>
|
||||
|
||||
Extending Paperless
|
||||
===================
|
||||
|
||||
Paperless does not have any fancy plugin systems and will probably never have. However,
|
||||
some parts of the application have been designed to allow easy integration of additional
|
||||
features without any modification to the base code.
|
||||
|
||||
Making custom parsers
|
||||
---------------------
|
||||
|
||||
Paperless uses parsers to add documents to paperless. A parser is responsible for:
|
||||
|
||||
* Retrieve the content from the original
|
||||
* Create a thumbnail
|
||||
* Optional: Retrieve a created date from the original
|
||||
* Optional: Create an archived document from the original
|
||||
|
||||
Custom parsers can be added to paperless to support more file types. In order to do that,
|
||||
you need to write the parser itself and announce its existence to paperless.
|
||||
|
||||
The parser itself must extend ``documents.parsers.DocumentParser`` and must implement the
|
||||
methods ``parse`` and ``get_thumbnail``. You can provide your own implementation to
|
||||
``get_date`` if you don't want to rely on paperless' default date guessing mechanisms.
|
||||
|
||||
.. code:: python
|
||||
|
||||
class MyCustomParser(DocumentParser):
|
||||
|
||||
def parse(self, document_path, mime_type):
|
||||
# This method does not return anything. Rather, you should assign
|
||||
# whatever you got from the document to the following fields:
|
||||
|
||||
# The content of the document.
|
||||
self.text = "content"
|
||||
|
||||
# Optional: path to a PDF document that you created from the original.
|
||||
self.archive_path = os.path.join(self.tempdir, "archived.pdf")
|
||||
|
||||
# Optional: "created" date of the document.
|
||||
self.date = get_created_from_metadata(document_path)
|
||||
|
||||
def get_thumbnail(self, document_path, mime_type):
|
||||
# This should return the path to a thumbnail you created for this
|
||||
# document.
|
||||
return os.path.join(self.tempdir, "thumb.png")
|
||||
|
||||
If you encounter any issues during parsing, raise a ``documents.parsers.ParseError``.
|
||||
|
||||
The ``self.tempdir`` directory is a temporary directory that is guaranteed to be empty
|
||||
and removed after consumption finished. You can use that directory to store any
|
||||
intermediate files and also use it to store the thumbnail / archived document.
|
||||
|
||||
After that, you need to announce your parser to paperless. You need to connect a
|
||||
handler to the ``document_consumer_declaration`` signal. Have a look in the file
|
||||
``src/paperless_tesseract/apps.py`` on how that's done. The handler is a method
|
||||
that returns information about your parser:
|
||||
|
||||
.. code:: python
|
||||
|
||||
def myparser_consumer_declaration(sender, **kwargs):
|
||||
return {
|
||||
"parser": MyCustomParser,
|
||||
"weight": 0,
|
||||
"mime_types": {
|
||||
"application/pdf": ".pdf",
|
||||
"image/jpeg": ".jpg",
|
||||
}
|
||||
}
|
||||
|
||||
* ``parser`` is a reference to a class that extends ``DocumentParser``.
|
||||
|
||||
* ``weight`` is used whenever two or more parsers are able to parse a file: The parser with
|
||||
the higher weight wins. This can be used to override the parsers provided by
|
||||
paperless.
|
||||
|
||||
* ``mime_types`` is a dictionary. The keys are the mime types your parser supports and the value
|
||||
is the default file extension that paperless should use when storing files and serving them for
|
||||
download. We could guess that from the file extensions, but some mime types have many extensions
|
||||
associated with them and the python methods responsible for guessing the extension do not always
|
||||
return the same value.
|
||||
|
||||
.. _code of conduct: https://github.com/paperless-ngx/paperless-ngx/blob/main/CODE_OF_CONDUCT.md
|
||||
.. _contributing guidelines: https://github.com/paperless-ngx/paperless-ngx/blob/main/CONTRIBUTING.md
|
||||
You will be redirected shortly...
|
||||
|
113
docs/faq.rst
113
docs/faq.rst
@ -1,117 +1,12 @@
|
||||
.. _faq:
|
||||
|
||||
**************************
|
||||
Frequently asked questions
|
||||
**************************
|
||||
|
||||
**Q:** *What's the general plan for Paperless-ngx?*
|
||||
|
||||
**A:** While Paperless-ngx is already considered largely "feature-complete" it is a community-driven
|
||||
project and development will be guided in this way. New features can be submitted via
|
||||
GitHub discussions and "up-voted" by the community but this is not a guarantee the feature
|
||||
will be implemented. This project will always be open to collaboration in the form of PRs,
|
||||
ideas etc.
|
||||
.. cssclass:: redirect-notice
|
||||
|
||||
**Q:** *I'm using docker. Where are my documents?*
|
||||
The Paperless-ngx documentation has permanently moved.
|
||||
|
||||
**A:** Your documents are stored inside the docker volume ``paperless_media``.
|
||||
Docker manages this volume automatically for you. It is a persistent storage
|
||||
and will persist as long as you don't explicitly delete it. The actual location
|
||||
depends on your host operating system. On Linux, chances are high that this location
|
||||
is
|
||||
|
||||
.. code::
|
||||
|
||||
/var/lib/docker/volumes/paperless_media/_data
|
||||
|
||||
.. caution::
|
||||
|
||||
Do not mess with this folder. Don't change permissions and don't move
|
||||
files around manually. This folder is meant to be entirely managed by docker
|
||||
and paperless.
|
||||
|
||||
**Q:** *Let's say I want to switch tools in a year. Can I easily move to other systems?*
|
||||
|
||||
**A:** Your documents are stored as plain files inside the media folder. You can always drag those files
|
||||
out of that folder to use them elsewhere. Here are a couple notes about that.
|
||||
|
||||
* Paperless-ngx never modifies your original documents. It keeps checksums of all documents and uses a
|
||||
scheduled sanity checker to check that they remain the same.
|
||||
* By default, paperless uses the internal ID of each document as its filename. This might not be very
|
||||
convenient for export. However, you can adjust the way files are stored in paperless by
|
||||
:ref:`configuring the filename format <advanced-file_name_handling>`.
|
||||
* :ref:`The exporter <utilities-exporter>` is another easy way to get your files out of paperless with reasonable file names.
|
||||
|
||||
**Q:** *What file types does paperless-ngx support?*
|
||||
|
||||
**A:** Currently, the following files are supported:
|
||||
|
||||
* PDF documents, PNG images, JPEG images, TIFF images and GIF images are processed with OCR and converted into PDF documents.
|
||||
* Plain text documents are supported as well and are added verbatim
|
||||
to paperless.
|
||||
* With the optional Tika integration enabled (see :ref:`Configuration <configuration-tika>`), Paperless also supports various
|
||||
Office documents (.docx, .doc, odt, .ppt, .pptx, .odp, .xls, .xlsx, .ods).
|
||||
|
||||
Paperless-ngx determines the type of a file by inspecting its content. The
|
||||
file extensions do not matter.
|
||||
|
||||
**Q:** *Will paperless-ngx run on Raspberry Pi?*
|
||||
|
||||
**A:** The short answer is yes. I've tested it on a Raspberry Pi 3 B.
|
||||
The long answer is that certain parts of
|
||||
Paperless will run very slow, such as the OCR. On Raspberry Pi,
|
||||
try to OCR documents before feeding them into paperless so that paperless can
|
||||
reuse the text. The web interface is a lot snappier, since it runs
|
||||
in your browser and paperless has to do much less work to serve the data.
|
||||
|
||||
.. note::
|
||||
|
||||
You can adjust some of the settings so that paperless uses less processing
|
||||
power. See :ref:`setup-less_powerful_devices` for details.
|
||||
|
||||
|
||||
**Q:** *How do I install paperless-ngx on Raspberry Pi?*
|
||||
|
||||
**A:** Docker images are available for arm and arm64 hardware, so just follow
|
||||
the docker-compose instructions. Apart from more required disk space compared to
|
||||
a bare metal installation, docker comes with close to zero overhead, even on
|
||||
Raspberry Pi.
|
||||
|
||||
If you decide to got with the bare metal route, be aware that some of the
|
||||
python requirements do not have precompiled packages for ARM / ARM64. Installation
|
||||
of these will require additional development libraries and compilation will take
|
||||
a long time.
|
||||
|
||||
**Q:** *How do I run this on Unraid?*
|
||||
|
||||
**A:** Paperless-ngx is available as `community app <https://unraid.net/community/apps?q=paperless-ngx>`_
|
||||
in Unraid. `Uli Fahrer <https://github.com/Tooa>`_ created a container template for that.
|
||||
|
||||
**Q:** *How do I run this on my toaster?*
|
||||
|
||||
**A:** I honestly don't know! As for all other devices that might be able
|
||||
to run paperless, you're a bit on your own. If you can't run the docker image,
|
||||
the documentation has instructions for bare metal installs. I'm running
|
||||
paperless on an i3 processor from 2015 or so. This is also what I use to test
|
||||
new releases with. Apart from that, I also have a Raspberry Pi, which I
|
||||
occasionally build the image on and see if it works.
|
||||
|
||||
**Q:** *How do I proxy this with NGINX?*
|
||||
|
||||
**A:** See :ref:`here <setup-nginx>`.
|
||||
|
||||
.. _faq-mod_wsgi:
|
||||
|
||||
**Q:** *How do I get WebSocket support with Apache mod_wsgi*?
|
||||
|
||||
**A:** ``mod_wsgi`` by itself does not support ASGI. Paperless will continue
|
||||
to work with WSGI, but certain features such as status notifications about
|
||||
document consumption won't be available.
|
||||
|
||||
If you want to continue using ``mod_wsgi``, you will have to run an ASGI-enabled
|
||||
web server as well that processes WebSocket connections, and configure Apache to
|
||||
redirect WebSocket connections to this server. Multiple options for ASGI servers
|
||||
exist:
|
||||
|
||||
* ``gunicorn`` with ``uvicorn`` as the worker implementation (the default of paperless)
|
||||
* ``daphne`` as a standalone server, which is the reference implementation for ASGI.
|
||||
* ``uvicorn`` as a standalone server
|
||||
You will be redirected shortly...
|
||||
|
@ -2,74 +2,24 @@
|
||||
Paperless
|
||||
*********
|
||||
|
||||
Paperless is a simple Django application running in two parts:
|
||||
a *Consumer* (the thing that does the indexing) and
|
||||
the *Web server* (the part that lets you search &
|
||||
download already-indexed documents). If you want to learn more about its
|
||||
functions keep on reading after the installation section.
|
||||
|
||||
.. cssclass:: redirect-notice
|
||||
|
||||
Why This Exists
|
||||
===============
|
||||
The Paperless-ngx documentation has permanently moved.
|
||||
|
||||
Paper is a nightmare. Environmental issues aside, there's no excuse for it in
|
||||
the 21st century. It takes up space, collects dust, doesn't support any form
|
||||
of a search feature, indexing is tedious, it's heavy and prone to damage &
|
||||
loss.
|
||||
|
||||
I wrote this to make "going paperless" easier. I do not have to worry about
|
||||
finding stuff again. I feed documents right from the post box into the scanner
|
||||
and then shred them. Perhaps you might find it useful too.
|
||||
|
||||
|
||||
Paperless-ngx
|
||||
=============
|
||||
|
||||
Paperless-ngx is a document management system that transforms your physical
|
||||
documents into a searchable online archive so you can keep, well, *less paper*.
|
||||
|
||||
Paperless-ngx forked from paperless-ng to continue the great work and
|
||||
distribute responsibility of supporting and advancing the project among a team
|
||||
of people.
|
||||
|
||||
NG stands for both Angular (the framework used for the
|
||||
Frontend) and next-gen. Publishing this project under a different name also
|
||||
avoids confusion between paperless and paperless-ngx.
|
||||
|
||||
If you want to learn about what's different in paperless-ngx from Paperless, check out these
|
||||
resources in the documentation:
|
||||
|
||||
* :ref:`Some screenshots <screenshots>` of the new UI are available.
|
||||
* Read :ref:`this section <advanced-automatic_matching>` if you want to
|
||||
learn about how paperless automates all tagging using machine learning.
|
||||
* Paperless now comes with a :ref:`proper email consumer <usage-email>`
|
||||
that's fully tested and production ready.
|
||||
* Paperless creates searchable PDF/A documents from whatever you put into
|
||||
the consumption directory. This means that you can select text in
|
||||
image-only documents coming from your scanner.
|
||||
* See :ref:`this note <utilities-encyption>` about GnuPG encryption in
|
||||
paperless-ngx.
|
||||
* Paperless is now integrated with a
|
||||
:ref:`task processing queue <setup-task_processor>` that tells you
|
||||
at a glance when and why something is not working.
|
||||
* The :doc:`changelog </changelog>` contains a detailed list of all changes
|
||||
in paperless-ngx.
|
||||
|
||||
Contents
|
||||
========
|
||||
You will be redirected shortly...
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
setup
|
||||
usage_overview
|
||||
advanced_usage
|
||||
administration
|
||||
configuration
|
||||
api
|
||||
faq
|
||||
troubleshooting
|
||||
extending
|
||||
scanners
|
||||
screenshots
|
||||
changelog
|
||||
screenshots
|
||||
scanners
|
||||
administration
|
||||
advanced_usage
|
||||
usage_overview
|
||||
setup
|
||||
troubleshooting
|
||||
changelog
|
||||
configuration
|
||||
extending
|
||||
api
|
||||
faq
|
||||
|
@ -1,8 +1,12 @@
|
||||
|
||||
.. _scanners:
|
||||
|
||||
*******************
|
||||
Scanners & Software
|
||||
*******************
|
||||
|
||||
Paperless-ngx is compatible with many different scanners and scanning tools. A user-maintained list of scanners and other software is available on `the wiki <https://github.com/paperless-ngx/paperless-ngx/wiki/Scanner-&-Software-Recommendations>`_.
|
||||
|
||||
.. cssclass:: redirect-notice
|
||||
|
||||
The Paperless-ngx documentation has permanently moved.
|
||||
|
||||
You will be redirected shortly...
|
||||
|
@ -4,60 +4,9 @@
|
||||
Screenshots
|
||||
***********
|
||||
|
||||
This is what Paperless-ngx looks like.
|
||||
|
||||
The dashboard shows customizable views on your document and allows document uploads:
|
||||
.. cssclass:: redirect-notice
|
||||
|
||||
.. image:: _static/screenshots/dashboard.png
|
||||
:target: _static/screenshots/dashboard.png
|
||||
The Paperless-ngx documentation has permanently moved.
|
||||
|
||||
The document list provides three different styles to scroll through your documents:
|
||||
|
||||
.. image:: _static/screenshots/documents-table.png
|
||||
:target: _static/screenshots/documents-table.png
|
||||
.. image:: _static/screenshots/documents-smallcards.png
|
||||
:target: _static/screenshots/documents-smallcards.png
|
||||
.. image:: _static/screenshots/documents-largecards.png
|
||||
:target: _static/screenshots/documents-largecards.png
|
||||
|
||||
Paperless-ngx also supports "dark mode":
|
||||
|
||||
.. image:: _static/screenshots/documents-smallcards-dark.png
|
||||
:target: _static/screenshots/documents-smallcards-dark.png
|
||||
|
||||
Extensive filtering mechanisms:
|
||||
|
||||
.. image:: _static/screenshots/documents-filter.png
|
||||
:target: _static/screenshots/documents-filter.png
|
||||
|
||||
Bulk editing of document tags, correspondents, etc.:
|
||||
|
||||
.. image:: _static/screenshots/bulk-edit.png
|
||||
:target: _static/screenshots/bulk-edit.png
|
||||
|
||||
Side-by-side editing of documents:
|
||||
|
||||
.. image:: _static/screenshots/editing.png
|
||||
:target: _static/screenshots/editing.png
|
||||
|
||||
Tag editing. This looks about the same for correspondents and document types.
|
||||
|
||||
.. image:: _static/screenshots/new-tag.png
|
||||
:target: _static/screenshots/new-tag.png
|
||||
|
||||
Searching provides auto complete and highlights the results.
|
||||
|
||||
.. image:: _static/screenshots/search-preview.png
|
||||
:target: _static/screenshots/search-preview.png
|
||||
.. image:: _static/screenshots/search-results.png
|
||||
:target: _static/screenshots/search-results.png
|
||||
|
||||
Fancy mail filters!
|
||||
|
||||
.. image:: _static/screenshots/mail-rules-edited.png
|
||||
:target: _static/screenshots/mail-rules-edited.png
|
||||
|
||||
Mobile devices are supported.
|
||||
|
||||
.. image:: _static/screenshots/mobile.png
|
||||
:target: _static/screenshots/mobile.png
|
||||
You will be redirected shortly...
|
||||
|
890
docs/setup.rst
890
docs/setup.rst
@ -1,894 +1,12 @@
|
||||
.. _setup:
|
||||
|
||||
*****
|
||||
Setup
|
||||
*****
|
||||
|
||||
Overview of Paperless-ngx
|
||||
#########################
|
||||
|
||||
Compared to paperless, paperless-ngx works a little different under the hood and has
|
||||
more moving parts that work together. While this increases the complexity of
|
||||
the system, it also brings many benefits.
|
||||
.. cssclass:: redirect-notice
|
||||
|
||||
Paperless consists of the following components:
|
||||
The Paperless-ngx documentation has permanently moved.
|
||||
|
||||
* **The webserver:** This is pretty much the same as in paperless. It serves
|
||||
the administration pages, the API, and the new frontend. This is the main
|
||||
tool you'll be using to interact with paperless. You may start the webserver
|
||||
with
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
$ cd /path/to/paperless/src/
|
||||
$ gunicorn -c ../gunicorn.conf.py paperless.wsgi
|
||||
|
||||
or by any other means such as Apache ``mod_wsgi``.
|
||||
|
||||
* **The consumer:** This is what watches your consumption folder for documents.
|
||||
However, the consumer itself does not really consume your documents.
|
||||
Now it notifies a task processor that a new file is ready for consumption.
|
||||
I suppose it should be named differently.
|
||||
This was also used to check your emails, but that's now done elsewhere as well.
|
||||
|
||||
Start the consumer with the management command ``document_consumer``:
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
$ cd /path/to/paperless/src/
|
||||
$ python3 manage.py document_consumer
|
||||
|
||||
.. _setup-task_processor:
|
||||
|
||||
* **The task processor:** Paperless relies on `Celery - Distributed Task Queue <https://docs.celeryq.dev/en/stable/index.html>`_
|
||||
for doing most of the heavy lifting. This is a task queue that accepts tasks from
|
||||
multiple sources and processes these in parallel. It also comes with a scheduler that executes
|
||||
certain commands periodically.
|
||||
|
||||
This task processor is responsible for:
|
||||
|
||||
* Consuming documents. When the consumer finds new documents, it notifies the task processor to
|
||||
start a consumption task.
|
||||
* The task processor also performs the consumption of any documents you upload through
|
||||
the web interface.
|
||||
* Consuming emails. It periodically checks your configured accounts for new emails and
|
||||
notifies the task processor to consume the attachment of an email.
|
||||
* Maintaining the search index and the automatic matching algorithm. These are things that paperless
|
||||
needs to do from time to time in order to operate properly.
|
||||
|
||||
This allows paperless to process multiple documents from your consumption folder in parallel! On
|
||||
a modern multi core system, this makes the consumption process with full OCR blazingly fast.
|
||||
|
||||
The task processor comes with a built-in admin interface that you can use to check whenever any of the
|
||||
tasks fail and inspect the errors (i.e., wrong email credentials, errors during consuming a specific
|
||||
file, etc).
|
||||
|
||||
* A `redis <https://redis.io/>`_ message broker: This is a really lightweight service that is responsible
|
||||
for getting the tasks from the webserver and the consumer to the task scheduler. These run in a different
|
||||
process (maybe even on different machines!), and therefore, this is necessary.
|
||||
|
||||
* Optional: A database server. Paperless supports PostgreSQL, MariaDB and SQLite for storing its data.
|
||||
|
||||
|
||||
Installation
|
||||
############
|
||||
|
||||
You can go multiple routes to setup and run Paperless:
|
||||
|
||||
* :ref:`Use the easy install docker script <setup-docker_script>`
|
||||
* :ref:`Pull the image from Docker Hub <setup-docker_hub>`
|
||||
* :ref:`Build the Docker image yourself <setup-docker_build>`
|
||||
* :ref:`Install Paperless directly on your system manually (bare metal) <setup-bare_metal>`
|
||||
|
||||
The Docker routes are quick & easy. These are the recommended routes. This configures all the stuff
|
||||
from the above automatically so that it just works and uses sensible defaults for all configuration options.
|
||||
Here you find a cheat-sheet for docker beginners: `CLI Basics <https://www.sehn.tech/refs/devops-with-docker/>`_
|
||||
|
||||
The bare metal route is complicated to setup but makes it easier
|
||||
should you want to contribute some code back. You need to configure and
|
||||
run the above mentioned components yourself.
|
||||
|
||||
.. _CLI Basics: https://www.sehn.tech/refs/devops-with-docker/
|
||||
|
||||
.. _setup-docker_script:
|
||||
|
||||
Install Paperless from Docker Hub using the installation script
|
||||
===============================================================
|
||||
|
||||
Paperless provides an interactive installation script. This script will ask you
|
||||
for a couple configuration options, download and create the necessary configuration files, pull the docker image, start paperless and create your user account. This script essentially
|
||||
performs all the steps described in :ref:`setup-docker_hub` automatically.
|
||||
|
||||
1. Make sure that docker and docker-compose are installed.
|
||||
2. Download and run the installation script:
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
$ bash -c "$(curl -L https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/install-paperless-ngx.sh)"
|
||||
|
||||
.. _setup-docker_hub:
|
||||
|
||||
Install Paperless from Docker Hub
|
||||
=================================
|
||||
|
||||
1. Login with your user and create a folder in your home-directory `mkdir -v ~/paperless-ngx` to have a place for your configuration files and consumption directory.
|
||||
|
||||
2. Go to the `/docker/compose directory on the project page <https://github.com/paperless-ngx/paperless-ngx/tree/master/docker/compose>`_
|
||||
and download one of the `docker-compose.*.yml` files, depending on which database backend you
|
||||
want to use. Rename this file to `docker-compose.yml`.
|
||||
If you want to enable optional support for Office documents, download a file with `-tika` in the file name.
|
||||
Download the ``docker-compose.env`` file and the ``.env`` file as well and store them
|
||||
in the same directory.
|
||||
|
||||
.. hint::
|
||||
|
||||
For new installations, it is recommended to use PostgreSQL as the database
|
||||
backend.
|
||||
|
||||
3. Install `Docker`_ and `docker-compose`_.
|
||||
|
||||
.. caution::
|
||||
|
||||
If you want to use the included ``docker-compose.*.yml`` file, you
|
||||
need to have at least Docker version **17.09.0** and docker-compose
|
||||
version **1.17.0**.
|
||||
To check do: `docker-compose -v` or `docker -v`
|
||||
|
||||
See the `Docker installation guide`_ on how to install the current
|
||||
version of Docker for your operating system or Linux distribution of
|
||||
choice. To get the latest version of docker-compose, follow the
|
||||
`docker-compose installation guide`_ if your package repository doesn't
|
||||
include it.
|
||||
|
||||
.. _Docker installation guide: https://docs.docker.com/engine/installation/
|
||||
.. _docker-compose installation guide: https://docs.docker.com/compose/install/
|
||||
|
||||
4. Modify ``docker-compose.yml`` to your preferences. You may want to change the path
|
||||
to the consumption directory. Find the line that specifies where
|
||||
to mount the consumption directory:
|
||||
|
||||
.. code::
|
||||
|
||||
- ./consume:/usr/src/paperless/consume
|
||||
|
||||
Replace the part BEFORE the colon with a local directory of your choice:
|
||||
|
||||
.. code::
|
||||
|
||||
- /home/jonaswinkler/paperless-inbox:/usr/src/paperless/consume
|
||||
|
||||
Don't change the part after the colon or paperless wont find your documents.
|
||||
|
||||
You may also need to change the default port that the webserver will use
|
||||
from the default (8000):
|
||||
|
||||
.. code::
|
||||
|
||||
ports:
|
||||
- 8000:8000
|
||||
|
||||
Replace the part BEFORE the colon with a port of your choice:
|
||||
|
||||
.. code::
|
||||
|
||||
ports:
|
||||
- 8010:8000
|
||||
|
||||
Don't change the part after the colon or edit other lines that refer to
|
||||
port 8000. Modifying the part before the colon will map requests on another
|
||||
port to the webserver running on the default port.
|
||||
|
||||
**Rootless**
|
||||
|
||||
If you want to run Paperless as a rootless container, you will need to do the
|
||||
following in your ``docker-compose.yml``:
|
||||
|
||||
- set the ``user`` running the container to map to the ``paperless`` user in the
|
||||
container.
|
||||
This value (``user_id`` below), should be the same id that ``USERMAP_UID`` and
|
||||
``USERMAP_GID`` are set to in the next step.
|
||||
See ``USERMAP_UID`` and ``USERMAP_GID`` :ref:`here <configuration-docker>`.
|
||||
|
||||
Your entry for Paperless should contain something like:
|
||||
|
||||
.. code::
|
||||
|
||||
webserver:
|
||||
image: ghcr.io/paperless-ngx/paperless-ngx:latest
|
||||
user: <user_id>
|
||||
|
||||
5. Modify ``docker-compose.env``, following the comments in the file. The
|
||||
most important change is to set ``USERMAP_UID`` and ``USERMAP_GID``
|
||||
to the uid and gid of your user on the host system. Use ``id -u`` and
|
||||
``id -g`` to get these.
|
||||
|
||||
This ensures that
|
||||
both the docker container and you on the host machine have write access
|
||||
to the consumption directory. If your UID and GID on the host system is
|
||||
1000 (the default for the first normal user on most systems), it will
|
||||
work out of the box without any modifications. `id "username"` to check.
|
||||
|
||||
.. note::
|
||||
|
||||
You can copy any setting from the file ``paperless.conf.example`` and paste it here.
|
||||
Have a look at :ref:`configuration` to see what's available.
|
||||
|
||||
.. note::
|
||||
|
||||
You can utilize Docker secrets for some configuration settings by
|
||||
appending `_FILE` to some configuration values. This is supported currently
|
||||
only by:
|
||||
|
||||
* PAPERLESS_DBUSER
|
||||
* PAPERLESS_DBPASS
|
||||
* PAPERLESS_SECRET_KEY
|
||||
* PAPERLESS_AUTO_LOGIN_USERNAME
|
||||
* PAPERLESS_ADMIN_USER
|
||||
* PAPERLESS_ADMIN_MAIL
|
||||
* PAPERLESS_ADMIN_PASSWORD
|
||||
|
||||
.. caution::
|
||||
|
||||
Some file systems such as NFS network shares don't support file system
|
||||
notifications with ``inotify``. When storing the consumption directory
|
||||
on such a file system, paperless will not pick up new files
|
||||
with the default configuration. You will need to use ``PAPERLESS_CONSUMER_POLLING``,
|
||||
which will disable inotify. See :ref:`here <configuration-polling>`.
|
||||
|
||||
6. Run ``docker-compose pull``, followed by ``docker-compose up -d``.
|
||||
This will pull the image, create and start the necessary containers.
|
||||
|
||||
7. To be able to login, you will need a super user. To create it, execute the
|
||||
following command:
|
||||
|
||||
.. code-block:: shell-session
|
||||
|
||||
$ docker-compose run --rm webserver createsuperuser
|
||||
|
||||
This will prompt you to set a username, an optional e-mail address and
|
||||
finally a password (at least 8 characters).
|
||||
|
||||
8. The default ``docker-compose.yml`` exports the webserver on your local port
|
||||
8000. If you did not change this, you should now be able to visit your
|
||||
Paperless instance at ``http://127.0.0.1:8000`` or your servers IP-Address:8000.
|
||||
Use the login credentials you have created with the previous step.
|
||||
|
||||
.. _Docker: https://www.docker.com/
|
||||
.. _docker-compose: https://docs.docker.com/compose/install/
|
||||
|
||||
.. _setup-docker_build:
|
||||
|
||||
Build the Docker image yourself
|
||||
===============================
|
||||
|
||||
1. Clone the entire repository of paperless:
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
git clone https://github.com/paperless-ngx/paperless-ngx
|
||||
|
||||
The master branch always reflects the latest stable version.
|
||||
|
||||
2. Copy one of the ``docker/compose/docker-compose.*.yml`` to ``docker-compose.yml`` in the root folder,
|
||||
depending on which database backend you want to use. Copy
|
||||
``docker-compose.env`` into the project root as well.
|
||||
|
||||
3. In the ``docker-compose.yml`` file, find the line that instructs docker-compose to pull the paperless image from Docker Hub:
|
||||
|
||||
.. code:: yaml
|
||||
|
||||
webserver:
|
||||
image: ghcr.io/paperless-ngx/paperless-ngx:latest
|
||||
|
||||
and replace it with a line that instructs docker-compose to build the image from the current working directory instead:
|
||||
|
||||
.. code:: yaml
|
||||
|
||||
webserver:
|
||||
build:
|
||||
context: .
|
||||
args:
|
||||
QPDF_VERSION: x.y.x
|
||||
PIKEPDF_VERSION: x.y.z
|
||||
PSYCOPG2_VERSION: x.y.z
|
||||
JBIG2ENC_VERSION: 0.29
|
||||
|
||||
.. note::
|
||||
|
||||
You should match the build argument versions to the version for the release you have
|
||||
checked out. These are pre-built images with certain, more updated software.
|
||||
If you want to build these images your self, that is possible, but beyond
|
||||
the scope of these steps.
|
||||
|
||||
4. Follow steps 3 to 8 of :ref:`setup-docker_hub`. When asked to run
|
||||
``docker-compose pull`` to pull the image, do
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
$ docker-compose build
|
||||
|
||||
instead to build the image.
|
||||
|
||||
.. _setup-bare_metal:
|
||||
|
||||
Bare Metal Route
|
||||
================
|
||||
|
||||
Paperless runs on linux only. The following procedure has been tested on a minimal
|
||||
installation of Debian/Buster, which is the current stable release at the time of
|
||||
writing. Windows is not and will never be supported.
|
||||
|
||||
1. Install dependencies. Paperless requires the following packages.
|
||||
|
||||
* ``python3`` 3.8, 3.9
|
||||
* ``python3-pip``
|
||||
* ``python3-dev``
|
||||
|
||||
* ``default-libmysqlclient-dev`` for MariaDB
|
||||
* ``fonts-liberation`` for generating thumbnails for plain text files
|
||||
* ``imagemagick`` >= 6 for PDF conversion
|
||||
* ``gnupg`` for handling encrypted documents
|
||||
* ``libpq-dev`` for PostgreSQL
|
||||
* ``libmagic-dev`` for mime type detection
|
||||
* ``mariadb-client`` for MariaDB compile time
|
||||
* ``mime-support`` for mime type detection
|
||||
* ``libzbar0`` for barcode detection
|
||||
* ``poppler-utils`` for barcode detection
|
||||
|
||||
Use this list for your preferred package management:
|
||||
|
||||
.. code::
|
||||
|
||||
python3 python3-pip python3-dev imagemagick fonts-liberation gnupg libpq-dev default-libmysqlclient-dev libmagic-dev mime-support libzbar0 poppler-utils
|
||||
|
||||
These dependencies are required for OCRmyPDF, which is used for text recognition.
|
||||
|
||||
* ``unpaper``
|
||||
* ``ghostscript``
|
||||
* ``icc-profiles-free``
|
||||
* ``qpdf``
|
||||
* ``liblept5``
|
||||
* ``libxml2``
|
||||
* ``pngquant`` (suggested for certain PDF image optimizations)
|
||||
* ``zlib1g``
|
||||
* ``tesseract-ocr`` >= 4.0.0 for OCR
|
||||
* ``tesseract-ocr`` language packs (``tesseract-ocr-eng``, ``tesseract-ocr-deu``, etc)
|
||||
|
||||
Use this list for your preferred package management:
|
||||
|
||||
.. code::
|
||||
|
||||
unpaper ghostscript icc-profiles-free qpdf liblept5 libxml2 pngquant zlib1g tesseract-ocr
|
||||
|
||||
On Raspberry Pi, these libraries are required as well:
|
||||
|
||||
* ``libatlas-base-dev``
|
||||
* ``libxslt1-dev``
|
||||
|
||||
You will also need ``build-essential``, ``python3-setuptools`` and ``python3-wheel``
|
||||
for installing some of the python dependencies.
|
||||
|
||||
2. Install ``redis`` >= 6.0 and configure it to start automatically.
|
||||
|
||||
3. Optional. Install ``postgresql`` and configure a database, user and password for paperless. If you do not wish
|
||||
to use PostgreSQL, MariaDB and SQLite are available as well.
|
||||
|
||||
.. note::
|
||||
|
||||
On bare-metal installations using SQLite, ensure the
|
||||
`JSON1 extension <https://code.djangoproject.com/wiki/JSON1Extension>`_ is enabled. This is
|
||||
usually the case, but not always.
|
||||
|
||||
4. Get the release archive from `<https://github.com/paperless-ngx/paperless-ngx/releases>`_.
|
||||
If you clone the git repo as it is, you also have to compile the front end by yourself.
|
||||
Extract the archive to a place from where you wish to execute it, such as ``/opt/paperless``.
|
||||
|
||||
5. Configure paperless. See :ref:`configuration` for details. Edit the included ``paperless.conf`` and adjust the
|
||||
settings to your needs. Required settings for getting paperless running are:
|
||||
|
||||
* ``PAPERLESS_REDIS`` should point to your redis server, such as redis://localhost:6379.
|
||||
* ``PAPERLESS_DBENGINE`` optional, and should be one of `postgres, mariadb, or sqlite`
|
||||
* ``PAPERLESS_DBHOST`` should be the hostname on which your PostgreSQL server is running. Do not configure this
|
||||
to use SQLite instead. Also configure port, database name, user and password as necessary.
|
||||
* ``PAPERLESS_CONSUMPTION_DIR`` should point to a folder which paperless should watch for documents. You might
|
||||
want to have this somewhere else. Likewise, ``PAPERLESS_DATA_DIR`` and ``PAPERLESS_MEDIA_ROOT`` define where
|
||||
paperless stores its data. If you like, you can point both to the same directory.
|
||||
* ``PAPERLESS_SECRET_KEY`` should be a random sequence of characters. It's used for authentication. Failure
|
||||
to do so allows third parties to forge authentication credentials.
|
||||
* ``PAPERLESS_URL`` if you are behind a reverse proxy. This should point to your domain. Please see
|
||||
:ref:`configuration` for more information.
|
||||
|
||||
Many more adjustments can be made to paperless, especially the OCR part. The following options are recommended
|
||||
for everyone:
|
||||
|
||||
* Set ``PAPERLESS_OCR_LANGUAGE`` to the language most of your documents are written in.
|
||||
* Set ``PAPERLESS_TIME_ZONE`` to your local time zone.
|
||||
|
||||
6. Create a system user under which you wish to run paperless.
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
adduser paperless --system --home /opt/paperless --group
|
||||
|
||||
7. Ensure that these directories exist
|
||||
and that the paperless user has write permissions to the following directories:
|
||||
|
||||
* ``/opt/paperless/media``
|
||||
* ``/opt/paperless/data``
|
||||
* ``/opt/paperless/consume``
|
||||
|
||||
Adjust as necessary if you configured different folders.
|
||||
|
||||
8. Install python requirements from the ``requirements.txt`` file.
|
||||
It is up to you if you wish to use a virtual environment or not. First you should update your pip, so it gets the actual packages.
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
sudo -Hu paperless pip3 install --upgrade pip
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
sudo -Hu paperless pip3 install -r requirements.txt
|
||||
|
||||
This will install all python dependencies in the home directory of
|
||||
the new paperless user.
|
||||
|
||||
9. Go to ``/opt/paperless/src``, and execute the following commands:
|
||||
|
||||
.. code:: bash
|
||||
|
||||
# This creates the database schema.
|
||||
sudo -Hu paperless python3 manage.py migrate
|
||||
|
||||
# This creates your first paperless user
|
||||
sudo -Hu paperless python3 manage.py createsuperuser
|
||||
|
||||
10. Optional: Test that paperless is working by executing
|
||||
|
||||
.. code:: bash
|
||||
|
||||
# This collects static files from paperless and django.
|
||||
sudo -Hu paperless python3 manage.py runserver
|
||||
|
||||
and pointing your browser to http://localhost:8000/.
|
||||
|
||||
.. warning::
|
||||
|
||||
This is a development server which should not be used in
|
||||
production. It is not audited for security and performance
|
||||
is inferior to production ready web servers.
|
||||
|
||||
.. hint::
|
||||
|
||||
This will not start the consumer. Paperless does this in a
|
||||
separate process.
|
||||
|
||||
11. Setup systemd services to run paperless automatically. You may
|
||||
use the service definition files included in the ``scripts`` folder
|
||||
as a starting point.
|
||||
|
||||
Paperless needs the ``webserver`` script to run the webserver, the
|
||||
``consumer`` script to watch the input folder, ``taskqueue`` for the background workers
|
||||
used to handle things like document consumption and the ``scheduler`` script to run tasks such as
|
||||
email checking at certain times .
|
||||
|
||||
The ``socket`` script enables ``gunicorn`` to run on port 80 without
|
||||
root privileges. For this you need to uncomment the ``Require=paperless-webserver.socket``
|
||||
in the ``webserver`` script and configure ``gunicorn`` to listen on port 80 (see ``paperless/gunicorn.conf.py``).
|
||||
|
||||
You may need to adjust the path to the ``gunicorn`` executable. This
|
||||
will be installed as part of the python dependencies, and is either located
|
||||
in the ``bin`` folder of your virtual environment, or in ``~/.local/bin/`` if
|
||||
no virtual environment is used.
|
||||
|
||||
These services rely on redis and optionally the database server, but
|
||||
don't need to be started in any particular order. The example files
|
||||
depend on redis being started. If you use a database server, you should
|
||||
add additional dependencies.
|
||||
|
||||
.. caution::
|
||||
|
||||
The included scripts run a ``gunicorn`` standalone server,
|
||||
which is fine for running paperless. It does support SSL,
|
||||
however, the documentation of GUnicorn states that you should
|
||||
use a proxy server in front of gunicorn instead.
|
||||
|
||||
For instructions on how to use nginx for that,
|
||||
:ref:`see the instructions below <setup-nginx>`.
|
||||
|
||||
12. Optional: Install a samba server and make the consumption folder
|
||||
available as a network share.
|
||||
|
||||
13. Configure ImageMagick to allow processing of PDF documents. Most distributions have
|
||||
this disabled by default, since PDF documents can contain malware. If
|
||||
you don't do this, paperless will fall back to ghostscript for certain steps
|
||||
such as thumbnail generation.
|
||||
|
||||
Edit ``/etc/ImageMagick-6/policy.xml`` and adjust
|
||||
|
||||
.. code::
|
||||
|
||||
<policy domain="coder" rights="none" pattern="PDF" />
|
||||
|
||||
to
|
||||
|
||||
.. code::
|
||||
|
||||
<policy domain="coder" rights="read|write" pattern="PDF" />
|
||||
|
||||
14. Optional: Install the `jbig2enc <https://ocrmypdf.readthedocs.io/en/latest/jbig2.html>`_
|
||||
encoder. This will reduce the size of generated PDF documents. You'll most likely need
|
||||
to compile this by yourself, because this software has been patented until around 2017 and
|
||||
binary packages are not available for most distributions.
|
||||
|
||||
15. Optional: If using the NLTK machine learning processing (see ``PAPERLESS_ENABLE_NLTK`` in
|
||||
:ref:`configuration` for details), download the NLTK data for the Snowball Stemmer, Stopwords
|
||||
and Punkt tokenizer to your ``PAPERLESS_DATA_DIR/nltk``. Refer to
|
||||
the `NLTK instructions <https://www.nltk.org/data.html>`_ for details on how to
|
||||
download the data.
|
||||
|
||||
|
||||
Migrating to Paperless-ngx
|
||||
##########################
|
||||
|
||||
Migration is possible both from Paperless-ng or directly from the 'original' Paperless.
|
||||
|
||||
Migrating from Paperless-ng
|
||||
===========================
|
||||
|
||||
Paperless-ngx is meant to be a drop-in replacement for Paperless-ng and thus upgrading should be
|
||||
trivial for most users, especially when using docker. However, as with any major change, it is
|
||||
recommended to take a full backup first. Once you are ready, simply change the docker image to
|
||||
point to the new source. E.g. if using Docker Compose, edit ``docker-compose.yml`` and change:
|
||||
|
||||
.. code::
|
||||
|
||||
image: jonaswinkler/paperless-ng:latest
|
||||
|
||||
to
|
||||
|
||||
.. code::
|
||||
|
||||
image: ghcr.io/paperless-ngx/paperless-ngx:latest
|
||||
|
||||
and then run ``docker-compose up -d`` which will pull the new image recreate the container.
|
||||
That's it!
|
||||
|
||||
Users who installed with the bare-metal route should also update their Git clone to point to
|
||||
``https://github.com/paperless-ngx/paperless-ngx``, e.g. using the command
|
||||
``git remote set-url origin https://github.com/paperless-ngx/paperless-ngx`` and then pull the
|
||||
lastest version.
|
||||
|
||||
Migrating from Paperless
|
||||
========================
|
||||
|
||||
At its core, paperless-ngx is still paperless and fully compatible. However, some
|
||||
things have changed under the hood, so you need to adapt your setup depending on
|
||||
how you installed paperless.
|
||||
|
||||
This setup describes how to update an existing paperless Docker installation.
|
||||
The important things to keep in mind are as follows:
|
||||
|
||||
* Read the :doc:`changelog </changelog>` and take note of breaking changes.
|
||||
* You should decide if you want to stick with SQLite or want to migrate your database
|
||||
to PostgreSQL. See :ref:`setup-sqlite_to_psql` for details on how to move your data from
|
||||
SQLite to PostgreSQL. Both work fine with paperless. However, if you already have a
|
||||
database server running for other services, you might as well use it for paperless as well.
|
||||
* The task scheduler of paperless, which is used to execute periodic tasks
|
||||
such as email checking and maintenance, requires a `redis`_ message broker
|
||||
instance. The docker-compose route takes care of that.
|
||||
* The layout of the folder structure for your documents and data remains the
|
||||
same, so you can just plug your old docker volumes into paperless-ngx and
|
||||
expect it to find everything where it should be.
|
||||
|
||||
Migration to paperless-ngx is then performed in a few simple steps:
|
||||
|
||||
1. Stop paperless.
|
||||
|
||||
.. code:: bash
|
||||
|
||||
$ cd /path/to/current/paperless
|
||||
$ docker-compose down
|
||||
|
||||
2. Do a backup for two purposes: If something goes wrong, you still have your
|
||||
data. Second, if you don't like paperless-ngx, you can switch back to
|
||||
paperless.
|
||||
|
||||
3. Download the latest release of paperless-ngx. You can either go with the
|
||||
docker-compose files from `here <https://github.com/paperless-ngx/paperless-ngx/tree/master/docker/compose>`__
|
||||
or clone the repository to build the image yourself (see :ref:`above <setup-docker_build>`).
|
||||
You can either replace your current paperless folder or put paperless-ngx
|
||||
in a different location.
|
||||
|
||||
.. caution::
|
||||
|
||||
Paperless-ngx includes a ``.env`` file. This will set the
|
||||
project name for docker compose to ``paperless``, which will also define the name
|
||||
of the volumes by paperless-ngx. However, if you experience that paperless-ngx
|
||||
is not using your old paperless volumes, verify the names of your volumes with
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
$ docker volume ls | grep _data
|
||||
|
||||
and adjust the project name in the ``.env`` file so that it matches the name
|
||||
of the volumes before the ``_data`` part.
|
||||
|
||||
|
||||
4. Download the ``docker-compose.sqlite.yml`` file to ``docker-compose.yml``.
|
||||
If you want to switch to PostgreSQL, do that after you migrated your existing
|
||||
SQLite database.
|
||||
|
||||
5. Adjust ``docker-compose.yml`` and ``docker-compose.env`` to your needs.
|
||||
See :ref:`setup-docker_hub` for details on which edits are advised.
|
||||
|
||||
6. :ref:`Update paperless. <administration-updating>`
|
||||
|
||||
7. In order to find your existing documents with the new search feature, you need
|
||||
to invoke a one-time operation that will create the search index:
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
$ docker-compose run --rm webserver document_index reindex
|
||||
|
||||
This will migrate your database and create the search index. After that,
|
||||
paperless will take care of maintaining the index by itself.
|
||||
|
||||
8. Start paperless-ngx.
|
||||
|
||||
.. code:: bash
|
||||
|
||||
$ docker-compose up -d
|
||||
|
||||
This will run paperless in the background and automatically start it on system boot.
|
||||
|
||||
9. Paperless installed a permanent redirect to ``admin/`` in your browser. This
|
||||
redirect is still in place and prevents access to the new UI. Clear your
|
||||
browsing cache in order to fix this.
|
||||
|
||||
10. Optionally, follow the instructions below to migrate your existing data to PostgreSQL.
|
||||
|
||||
|
||||
Migrating from LinuxServer.io Docker Image
|
||||
==========================================
|
||||
|
||||
As with any upgrades and large changes, it is highly recommended to create a backup before
|
||||
starting. This assumes the image was running using Docker Compose, but the instructions
|
||||
are translatable to Docker commands as well.
|
||||
|
||||
1. Stop and remove the paperless container
|
||||
2. If using an external database, stop the container
|
||||
3. Update Redis configuration
|
||||
|
||||
a) If ``REDIS_URL`` is already set, change it to ``PAPERLESS_REDIS`` and continue
|
||||
to step 4.
|
||||
b) Otherwise, in the ``docker-compose.yml`` add a new service for Redis,
|
||||
following `the example compose files <https://github.com/paperless-ngx/paperless-ngx/tree/main/docker/compose>`_
|
||||
c) Set the environment variable ``PAPERLESS_REDIS`` so it points to the new Redis container
|
||||
|
||||
4. Update user mapping
|
||||
|
||||
a) If set, change the environment variable ``PUID`` to ``USERMAP_UID``
|
||||
b) If set, change the environment variable ``PGID`` to ``USERMAP_GID``
|
||||
|
||||
5. Update configuration paths
|
||||
|
||||
a) Set the environment variable ``PAPERLESS_DATA_DIR``
|
||||
to ``/config``
|
||||
|
||||
6. Update media paths
|
||||
|
||||
a) Set the environment variable ``PAPERLESS_MEDIA_ROOT``
|
||||
to ``/data/media``
|
||||
|
||||
7. Update timezone
|
||||
|
||||
a) Set the environment variable ``PAPERLESS_TIME_ZONE``
|
||||
to the same value as ``TZ``
|
||||
|
||||
8. Modify the ``image:`` to point to ``ghcr.io/paperless-ngx/paperless-ngx:latest`` or
|
||||
a specific version if preferred.
|
||||
|
||||
9. Start the containers as before, using ``docker-compose``.
|
||||
|
||||
.. _setup-sqlite_to_psql:
|
||||
|
||||
Moving data from SQLite to PostgreSQL or MySQL/MariaDB
|
||||
======================================================
|
||||
|
||||
Moving your data from SQLite to PostgreSQL or MySQL/MariaDB is done via executing a series of django
|
||||
management commands as below. The commands below use PostgreSQL, but are applicable to MySQL/MariaDB
|
||||
with the
|
||||
|
||||
.. caution::
|
||||
|
||||
Make sure that your SQLite database is migrated to the latest version.
|
||||
Starting paperless will make sure that this is the case. If your try to
|
||||
load data from an old database schema in SQLite into a newer database
|
||||
schema in PostgreSQL, you will run into trouble.
|
||||
|
||||
.. warning::
|
||||
|
||||
On some database fields, PostgreSQL enforces predefined limits on maximum
|
||||
length, whereas SQLite does not. The fields in question are the title of documents
|
||||
(128 characters), names of document types, tags and correspondents (128 characters),
|
||||
and filenames (1024 characters). If you have data in these fields that surpasses these
|
||||
limits, migration to PostgreSQL is not possible and will fail with an error.
|
||||
|
||||
.. warning::
|
||||
|
||||
MySQL is case insensitive by default, treating values like "Name" and "NAME" as identical.
|
||||
See :ref:`advanced-mysql-caveats` for details.
|
||||
|
||||
|
||||
1. Stop paperless, if it is running.
|
||||
2. Tell paperless to use PostgreSQL:
|
||||
|
||||
a) With docker, copy the provided ``docker-compose.postgres.yml`` file to
|
||||
``docker-compose.yml``. Remember to adjust the consumption directory,
|
||||
if necessary.
|
||||
b) Without docker, configure the database in your ``paperless.conf`` file.
|
||||
See :ref:`configuration` for details.
|
||||
|
||||
3. Open a shell and initialize the database:
|
||||
|
||||
a) With docker, run the following command to open a shell within the paperless
|
||||
container:
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
$ cd /path/to/paperless
|
||||
$ docker-compose run --rm webserver /bin/bash
|
||||
|
||||
This will launch the container and initialize the PostgreSQL database.
|
||||
|
||||
b) Without docker, remember to activate any virtual environment, switch to
|
||||
the ``src`` directory and create the database schema:
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
$ cd /path/to/paperless/src
|
||||
$ python3 manage.py migrate
|
||||
|
||||
This will not copy any data yet.
|
||||
|
||||
4. Dump your data from SQLite:
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
$ python3 manage.py dumpdata --database=sqlite --exclude=contenttypes --exclude=auth.Permission > data.json
|
||||
|
||||
5. Load your data into PostgreSQL:
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
$ python3 manage.py loaddata data.json
|
||||
|
||||
6. If operating inside Docker, you may exit the shell now.
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
$ exit
|
||||
|
||||
7. Start paperless.
|
||||
|
||||
|
||||
Moving back to Paperless
|
||||
========================
|
||||
|
||||
Lets say you migrated to Paperless-ngx and used it for a while, but decided that
|
||||
you don't like it and want to move back (If you do, send me a mail about what
|
||||
part you didn't like!), you can totally do that with a few simple steps.
|
||||
|
||||
Paperless-ngx modified the database schema slightly, however, these changes can
|
||||
be reverted while keeping your current data, so that your current data will
|
||||
be compatible with original Paperless.
|
||||
|
||||
Execute this:
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
$ cd /path/to/paperless
|
||||
$ docker-compose run --rm webserver migrate documents 0023
|
||||
|
||||
Or without docker:
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
$ cd /path/to/paperless/src
|
||||
$ python3 manage.py migrate documents 0023
|
||||
|
||||
After that, you need to clear your cookies (Paperless-ngx comes with updated
|
||||
dependencies that do cookie-processing differently) and probably your cache
|
||||
as well.
|
||||
|
||||
.. _setup-less_powerful_devices:
|
||||
|
||||
|
||||
Considerations for less powerful devices
|
||||
########################################
|
||||
|
||||
Paperless runs on Raspberry Pi. However, some things are rather slow on the Pi and
|
||||
configuring some options in paperless can help improve performance immensely:
|
||||
|
||||
* Stick with SQLite to save some resources.
|
||||
* Consider setting ``PAPERLESS_OCR_PAGES`` to 1, so that paperless will only OCR
|
||||
the first page of your documents. In most cases, this page contains enough
|
||||
information to be able to find it.
|
||||
* ``PAPERLESS_TASK_WORKERS`` and ``PAPERLESS_THREADS_PER_WORKER`` are configured
|
||||
to use all cores. The Raspberry Pi models 3 and up have 4 cores, meaning that
|
||||
paperless will use 2 workers and 2 threads per worker. This may result in
|
||||
sluggish response times during consumption, so you might want to lower these
|
||||
settings (example: 2 workers and 1 thread to always have some computing power
|
||||
left for other tasks).
|
||||
* Keep ``PAPERLESS_OCR_MODE`` at its default value ``skip`` and consider OCR'ing
|
||||
your documents before feeding them into paperless. Some scanners are able to
|
||||
do this! You might want to even specify ``skip_noarchive`` to skip archive
|
||||
file generation for already ocr'ed documents entirely.
|
||||
* If you want to perform OCR on the device, consider using ``PAPERLESS_OCR_CLEAN=none``.
|
||||
This will speed up OCR times and use less memory at the expense of slightly worse
|
||||
OCR results.
|
||||
* If using docker, consider setting ``PAPERLESS_WEBSERVER_WORKERS`` to
|
||||
1. This will save some memory.
|
||||
* Consider setting ``PAPERLESS_ENABLE_NLTK`` to false, to disable the more
|
||||
advanced language processing, which can take more memory and processing time.
|
||||
|
||||
For details, refer to :ref:`configuration`.
|
||||
|
||||
.. note::
|
||||
|
||||
Updating the :ref:`automatic matching algorithm <advanced-automatic_matching>`
|
||||
takes quite a bit of time. However, the update mechanism checks if your
|
||||
data has changed before doing the heavy lifting. If you experience the
|
||||
algorithm taking too much cpu time, consider changing the schedule in the
|
||||
admin interface to daily. You can also manually invoke the task
|
||||
by changing the date and time of the next run to today/now.
|
||||
|
||||
The actual matching of the algorithm is fast and works on Raspberry Pi as
|
||||
well as on any other device.
|
||||
|
||||
.. _redis: https://redis.io/
|
||||
|
||||
|
||||
.. _setup-nginx:
|
||||
|
||||
Using nginx as a reverse proxy
|
||||
##############################
|
||||
|
||||
If you want to expose paperless to the internet, you should hide it behind a
|
||||
reverse proxy with SSL enabled.
|
||||
|
||||
In addition to the usual configuration for SSL,
|
||||
the following configuration is required for paperless to operate:
|
||||
|
||||
.. code:: nginx
|
||||
|
||||
http {
|
||||
|
||||
# Adjust as required. This is the maximum size for file uploads.
|
||||
# The default value 1M might be a little too small.
|
||||
client_max_body_size 10M;
|
||||
|
||||
server {
|
||||
|
||||
location / {
|
||||
|
||||
# Adjust host and port as required.
|
||||
proxy_pass http://localhost:8000/;
|
||||
|
||||
# These configuration options are required for WebSockets to work.
|
||||
proxy_http_version 1.1;
|
||||
proxy_set_header Upgrade $http_upgrade;
|
||||
proxy_set_header Connection "upgrade";
|
||||
|
||||
proxy_redirect off;
|
||||
proxy_set_header Host $host;
|
||||
proxy_set_header X-Real-IP $remote_addr;
|
||||
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
|
||||
proxy_set_header X-Forwarded-Host $server_name;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
The ``PAPERLESS_URL`` configuration variable is also required when using a reverse proxy. Please refer to the :ref:`hosting-and-security` docs.
|
||||
|
||||
Also read `this <https://channels.readthedocs.io/en/stable/deploying.html#nginx-supervisor-ubuntu>`__, towards the end of the section.
|
||||
You will be redirected shortly...
|
||||
|
@ -1,328 +1,12 @@
|
||||
.. _troubleshooting:
|
||||
|
||||
***************
|
||||
Troubleshooting
|
||||
***************
|
||||
|
||||
No files are added by the consumer
|
||||
##################################
|
||||
|
||||
Check for the following issues:
|
||||
.. cssclass:: redirect-notice
|
||||
|
||||
* Ensure that the directory you're putting your documents in is the folder
|
||||
paperless is watching. With docker, this setting is performed in the
|
||||
``docker-compose.yml`` file. Without docker, look at the ``CONSUMPTION_DIR``
|
||||
setting. Don't adjust this setting if you're using docker.
|
||||
* Ensure that redis is up and running. Paperless does its task processing
|
||||
asynchronously, and for documents to arrive at the task processor, it needs
|
||||
redis to run.
|
||||
* Ensure that the task processor is running. Docker does this automatically.
|
||||
Manually invoke the task processor by executing
|
||||
The Paperless-ngx documentation has permanently moved.
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
$ celery --app paperless worker
|
||||
|
||||
* Look at the output of paperless and inspect it for any errors.
|
||||
* Go to the admin interface, and check if there are failed tasks. If so, the
|
||||
tasks will contain an error message.
|
||||
|
||||
Consumer warns ``OCR for XX failed``
|
||||
####################################
|
||||
|
||||
If you find the OCR accuracy to be too low, and/or the document consumer warns
|
||||
that ``OCR for XX failed, but we're going to stick with what we've got since
|
||||
FORGIVING_OCR is enabled``, then you might need to install the
|
||||
`Tesseract language files <http://packages.ubuntu.com/search?keywords=tesseract-ocr>`_
|
||||
marching your document's languages.
|
||||
|
||||
As an example, if you are running Paperless-ngx from any Ubuntu or Debian
|
||||
box, and your documents are written in Spanish you may need to run::
|
||||
|
||||
apt-get install -y tesseract-ocr-spa
|
||||
|
||||
Consumer fails to pickup any new files
|
||||
######################################
|
||||
|
||||
If you notice that the consumer will only pickup files in the consumption
|
||||
directory at startup, but won't find any other files added later, you will need to
|
||||
enable filesystem polling with the configuration option
|
||||
``PAPERLESS_CONSUMER_POLLING``, see :ref:`here <configuration-polling>`.
|
||||
|
||||
This will disable listening to filesystem changes with inotify and paperless will
|
||||
manually check the consumption directory for changes instead.
|
||||
|
||||
|
||||
Paperless always redirects to /admin
|
||||
####################################
|
||||
|
||||
You probably had the old paperless installed at some point. Paperless installed
|
||||
a permanent redirect to /admin in your browser, and you need to clear your
|
||||
browsing data / cache to fix that.
|
||||
|
||||
|
||||
Operation not permitted
|
||||
#######################
|
||||
|
||||
You might see errors such as:
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
chown: changing ownership of '../export': Operation not permitted
|
||||
|
||||
The container tries to set file ownership on the listed directories. This is
|
||||
required so that the user running paperless inside docker has write permissions
|
||||
to these folders. This happens when pointing these directories to NFS shares,
|
||||
for example.
|
||||
|
||||
Ensure that ``chown`` is possible on these directories.
|
||||
|
||||
|
||||
Classifier error: No training data available
|
||||
############################################
|
||||
|
||||
This indicates that the Auto matching algorithm found no documents to learn from.
|
||||
This may have two reasons:
|
||||
|
||||
* You don't use the Auto matching algorithm: The error can be safely ignored in this case.
|
||||
* You are using the Auto matching algorithm: The classifier explicitly excludes documents
|
||||
with Inbox tags. Verify that there are documents in your archive without inbox tags.
|
||||
The algorithm will only learn from documents not in your inbox.
|
||||
|
||||
|
||||
UserWarning in sklearn on every single document
|
||||
###############################################
|
||||
|
||||
You may encounter warnings like this:
|
||||
|
||||
.. code::
|
||||
|
||||
/usr/local/lib/python3.7/site-packages/sklearn/base.py:315:
|
||||
UserWarning: Trying to unpickle estimator CountVectorizer from version 0.23.2 when using version 0.24.0.
|
||||
This might lead to breaking code or invalid results. Use at your own risk.
|
||||
|
||||
This happens when certain dependencies of paperless that are responsible for the auto matching algorithm are
|
||||
updated. After updating these, your current training data *might* not be compatible anymore. This can be ignored
|
||||
in most cases. This warning will disappear automatically when paperless updates the training data.
|
||||
|
||||
If you want to get rid of the warning or actually experience issues with automatic matching, delete
|
||||
the file ``classification_model.pickle`` in the data directory and let paperless recreate it.
|
||||
|
||||
|
||||
504 Server Error: Gateway Timeout when adding Office documents
|
||||
##############################################################
|
||||
|
||||
You may experience these errors when using the optional TIKA integration:
|
||||
|
||||
.. code::
|
||||
|
||||
requests.exceptions.HTTPError: 504 Server Error: Gateway Timeout for url: http://gotenberg:3000/forms/libreoffice/convert
|
||||
|
||||
Gotenberg is a server that converts Office documents into PDF documents and has a default timeout of 30 seconds.
|
||||
When conversion takes longer, Gotenberg raises this error.
|
||||
|
||||
You can increase the timeout by configuring a command flag for Gotenberg (see also `here <https://gotenberg.dev/docs/modules/api#properties>`__).
|
||||
If using docker-compose, this is achieved by the following configuration change in the ``docker-compose.yml`` file:
|
||||
|
||||
.. code:: yaml
|
||||
|
||||
gotenberg:
|
||||
image: gotenberg/gotenberg:7.6
|
||||
restart: unless-stopped
|
||||
command:
|
||||
- "gotenberg"
|
||||
- "--chromium-disable-routes=true"
|
||||
- "--api-timeout=60"
|
||||
|
||||
Permission denied errors in the consumption directory
|
||||
#####################################################
|
||||
|
||||
You might encounter errors such as:
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
The following error occured while consuming document.pdf: [Errno 13] Permission denied: '/usr/src/paperless/src/../consume/document.pdf'
|
||||
|
||||
This happens when paperless does not have permission to delete files inside the consumption directory.
|
||||
Ensure that ``USERMAP_UID`` and ``USERMAP_GID`` are set to the user id and group id you use on the host operating system, if these are
|
||||
different from ``1000``. See :ref:`setup-docker_hub`.
|
||||
|
||||
Also ensure that you are able to read and write to the consumption directory on the host.
|
||||
|
||||
|
||||
OSError: [Errno 19] No such device when consuming files
|
||||
#######################################################
|
||||
|
||||
If you experience errors such as:
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
File "/usr/local/lib/python3.7/site-packages/whoosh/codec/base.py", line 570, in open_compound_file
|
||||
return CompoundStorage(dbfile, use_mmap=storage.supports_mmap)
|
||||
File "/usr/local/lib/python3.7/site-packages/whoosh/filedb/compound.py", line 75, in __init__
|
||||
self._source = mmap.mmap(fileno, 0, access=mmap.ACCESS_READ)
|
||||
OSError: [Errno 19] No such device
|
||||
|
||||
During handling of the above exception, another exception occurred:
|
||||
|
||||
Traceback (most recent call last):
|
||||
File "/usr/local/lib/python3.7/site-packages/django_q/cluster.py", line 436, in worker
|
||||
res = f(*task["args"], **task["kwargs"])
|
||||
File "/usr/src/paperless/src/documents/tasks.py", line 73, in consume_file
|
||||
override_tag_ids=override_tag_ids)
|
||||
File "/usr/src/paperless/src/documents/consumer.py", line 271, in try_consume_file
|
||||
raise ConsumerError(e)
|
||||
|
||||
Paperless uses a search index to provide better and faster full text searching. This search index is stored inside
|
||||
the ``data`` folder. The search index uses memory-mapped files (mmap). The above error indicates that paperless
|
||||
was unable to create and open these files.
|
||||
|
||||
This happens when you're trying to store the data directory on certain file systems (mostly network shares)
|
||||
that don't support memory-mapped files.
|
||||
|
||||
|
||||
Web-UI stuck at "Loading..."
|
||||
############################
|
||||
|
||||
This might have multiple reasons.
|
||||
|
||||
|
||||
1. If you built the docker image yourself or deployed using the bare metal route,
|
||||
make sure that there are files in ``<paperless-root>/static/frontend/<lang-code>/``.
|
||||
If there are no files, make sure that you executed ``collectstatic`` successfully, either
|
||||
manually or as part of the docker image build.
|
||||
|
||||
If the front end is still missing, make sure that the front end is compiled (files present in
|
||||
``src/documents/static/frontend``). If it is not, you need to compile the front end yourself
|
||||
or download the release archive instead of cloning the repository.
|
||||
|
||||
2. Check the output of the web server. You might see errors like this:
|
||||
|
||||
|
||||
.. code::
|
||||
|
||||
[2021-01-25 10:08:04 +0000] [40] [ERROR] Socket error processing request.
|
||||
Traceback (most recent call last):
|
||||
File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 134, in handle
|
||||
self.handle_request(listener, req, client, addr)
|
||||
File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 190, in handle_request
|
||||
util.reraise(*sys.exc_info())
|
||||
File "/usr/local/lib/python3.7/site-packages/gunicorn/util.py", line 625, in reraise
|
||||
raise value
|
||||
File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 178, in handle_request
|
||||
resp.write_file(respiter)
|
||||
File "/usr/local/lib/python3.7/site-packages/gunicorn/http/wsgi.py", line 396, in write_file
|
||||
if not self.sendfile(respiter):
|
||||
File "/usr/local/lib/python3.7/site-packages/gunicorn/http/wsgi.py", line 386, in sendfile
|
||||
sent += os.sendfile(sockno, fileno, offset + sent, count)
|
||||
OSError: [Errno 22] Invalid argument
|
||||
|
||||
To fix this issue, add
|
||||
|
||||
.. code::
|
||||
|
||||
SENDFILE=0
|
||||
|
||||
to your `docker-compose.env` file.
|
||||
|
||||
Error while reading metadata
|
||||
############################
|
||||
|
||||
You might find messages like these in your log files:
|
||||
|
||||
.. code::
|
||||
|
||||
[WARNING] [paperless.parsing.tesseract] Error while reading metadata
|
||||
|
||||
This indicates that paperless failed to read PDF metadata from one of your documents. This happens when you
|
||||
open the affected documents in paperless for editing. Paperless will continue to work, and will simply not
|
||||
show the invalid metadata.
|
||||
|
||||
Consumer fails with a FileNotFoundError
|
||||
#######################################
|
||||
|
||||
You might find messages like these in your log files:
|
||||
|
||||
.. code::
|
||||
|
||||
[ERROR] [paperless.consumer] Error while consuming document SCN_0001.pdf: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/ocrmypdf.io.yhk3zbv0/origin.pdf'
|
||||
Traceback (most recent call last):
|
||||
File "/app/paperless/src/paperless_tesseract/parsers.py", line 261, in parse
|
||||
ocrmypdf.ocr(**args)
|
||||
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/api.py", line 337, in ocr
|
||||
return run_pipeline(options=options, plugin_manager=plugin_manager, api=True)
|
||||
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_sync.py", line 385, in run_pipeline
|
||||
exec_concurrent(context, executor)
|
||||
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_sync.py", line 302, in exec_concurrent
|
||||
pdf = post_process(pdf, context, executor)
|
||||
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_sync.py", line 235, in post_process
|
||||
pdf_out = metadata_fixup(pdf_out, context)
|
||||
File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_pipeline.py", line 798, in metadata_fixup
|
||||
with pikepdf.open(context.origin) as original, pikepdf.open(working_file) as pdf:
|
||||
File "/usr/local/lib/python3.8/dist-packages/pikepdf/_methods.py", line 923, in open
|
||||
pdf = Pdf._open(
|
||||
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/ocrmypdf.io.yhk3zbv0/origin.pdf'
|
||||
|
||||
This probably indicates paperless tried to consume the same file twice. This can happen for a number of reasons,
|
||||
depending on how documents are placed into the consume folder. If paperless is using inotify (the default) to
|
||||
check for documents, try adjusting the :ref:`inotify configuration <configuration-inotify>`. If polling is enabled,
|
||||
try adjusting the :ref:`polling configuration <configuration-polling>`.
|
||||
|
||||
Consumer fails waiting for file to remain unmodified.
|
||||
#####################################################
|
||||
|
||||
You might find messages like these in your log files:
|
||||
|
||||
.. code::
|
||||
|
||||
[ERROR] [paperless.management.consumer] Timeout while waiting on file /usr/src/paperless/src/../consume/SCN_0001.pdf to remain unmodified.
|
||||
|
||||
This indicates paperless timed out while waiting for the file to be completely written to the consume folder.
|
||||
Adjusting :ref:`polling configuration <configuration-polling>` values should resolve the issue.
|
||||
|
||||
.. note::
|
||||
|
||||
The user will need to manually move the file out of the consume folder and
|
||||
back in, for the initial failing file to be consumed.
|
||||
|
||||
Consumer fails reporting "OS reports file as busy still".
|
||||
#########################################################
|
||||
|
||||
You might find messages like these in your log files:
|
||||
|
||||
.. code::
|
||||
|
||||
[WARNING] [paperless.management.consumer] Not consuming file /usr/src/paperless/src/../consume/SCN_0001.pdf: OS reports file as busy still
|
||||
|
||||
This indicates paperless was unable to open the file, as the OS reported the file as still being in use. To prevent a
|
||||
crash, paperless did not try to consume the file. If paperless is using inotify (the default) to
|
||||
check for documents, try adjusting the :ref:`inotify configuration <configuration-inotify>`. If polling is enabled,
|
||||
try adjusting the :ref:`polling configuration <configuration-polling>`.
|
||||
|
||||
.. note::
|
||||
|
||||
The user will need to manually move the file out of the consume folder and
|
||||
back in, for the initial failing file to be consumed.
|
||||
|
||||
Log reports "Creating PaperlessTask failed".
|
||||
#########################################################
|
||||
|
||||
You might find messages like these in your log files:
|
||||
|
||||
.. code::
|
||||
|
||||
[ERROR] [paperless.management.consumer] Creating PaperlessTask failed: db locked
|
||||
|
||||
You are likely using an sqlite based installation, with an increased number of workers and are running into sqlite's concurrency limitations.
|
||||
Uploading or consuming multiple files at once results in many workers attempting to access the database simultaneously.
|
||||
|
||||
Consider changing to the PostgreSQL database if you will be processing many documents at once often. Otherwise,
|
||||
try tweaking the ``PAPERLESS_DB_TIMEOUT`` setting to allow more time for the database to unlock. This may have
|
||||
minor performance implications.
|
||||
|
||||
|
||||
gunicorn fails to start with "is not a valid port number"
|
||||
#########################################################
|
||||
|
||||
You are likely running using Kubernetes, which automatically creates an environment variable named `${serviceName}_PORT`.
|
||||
This is the same environment variable which is used by Paperless to optionally change the port gunicorn listens on.
|
||||
|
||||
To fix this, set `PAPERLESS_PORT` again to your desired port, or the default of 8000.
|
||||
You will be redirected shortly...
|
||||
|
@ -1,420 +1,12 @@
|
||||
.. _usage_overview:
|
||||
|
||||
**************
|
||||
Usage Overview
|
||||
**************
|
||||
|
||||
Paperless is an application that manages your personal documents. With
|
||||
the help of a document scanner (see :ref:`scanners`), paperless transforms
|
||||
your wieldy physical document binders into a searchable archive and
|
||||
provides many utilities for finding and managing your documents.
|
||||
|
||||
.. cssclass:: redirect-notice
|
||||
|
||||
Terms and definitions
|
||||
#####################
|
||||
The Paperless-ngx documentation has permanently moved.
|
||||
|
||||
Paperless essentially consists of two different parts for managing your
|
||||
documents:
|
||||
|
||||
* The *consumer* watches a specified folder and adds all documents in that
|
||||
folder to paperless.
|
||||
* The *web server* provides a UI that you use to manage and search for your
|
||||
scanned documents.
|
||||
|
||||
Each document has a couple of fields that you can assign to them:
|
||||
|
||||
* A *Document* is a piece of paper that sometimes contains valuable
|
||||
information.
|
||||
* The *correspondent* of a document is the person, institution or company that
|
||||
a document either originates from, or is sent to.
|
||||
* A *tag* is a label that you can assign to documents. Think of labels as more
|
||||
powerful folders: Multiple documents can be grouped together with a single
|
||||
tag, however, a single document can also have multiple tags. This is not
|
||||
possible with folders. The reason folders are not implemented in paperless
|
||||
is simply that tags are much more versatile than folders.
|
||||
* A *document type* is used to demarcate the type of a document such as letter,
|
||||
bank statement, invoice, contract, etc. It is used to identify what a document
|
||||
is about.
|
||||
* The *date added* of a document is the date the document was scanned into
|
||||
paperless. You cannot and should not change this date.
|
||||
* The *date created* of a document is the date the document was initially issued.
|
||||
This can be the date you bought a product, the date you signed a contract, or
|
||||
the date a letter was sent to you.
|
||||
* The *archive serial number* (short: ASN) of a document is the identifier of
|
||||
the document in your physical document binders. See
|
||||
:ref:`usage-recommended_workflow` below.
|
||||
* The *content* of a document is the text that was OCR'ed from the document.
|
||||
This text is fed into the search engine and is used for matching tags,
|
||||
correspondents and document types.
|
||||
|
||||
|
||||
Frontend overview
|
||||
#################
|
||||
|
||||
.. warning::
|
||||
|
||||
TBD. Add some fancy screenshots!
|
||||
|
||||
Adding documents to paperless
|
||||
#############################
|
||||
|
||||
Once you've got Paperless setup, you need to start feeding documents into it.
|
||||
When adding documents to paperless, it will perform the following operations on
|
||||
your documents:
|
||||
|
||||
1. OCR the document, if it has no text. Digital documents usually have text,
|
||||
and this step will be skipped for those documents.
|
||||
2. Paperless will create an archivable PDF/A document from your document.
|
||||
If this document is coming from your scanner, it will have embedded selectable text.
|
||||
3. Paperless performs automatic matching of tags, correspondents and types on the
|
||||
document before storing it in the database.
|
||||
|
||||
.. hint::
|
||||
|
||||
This process can be configured to fit your needs. If you don't want paperless
|
||||
to create archived versions for digital documents, you can configure that by
|
||||
configuring ``PAPERLESS_OCR_MODE=skip_noarchive``. Please read the
|
||||
:ref:`relevant section in the documentation <configuration-ocr>`.
|
||||
|
||||
.. note::
|
||||
|
||||
No matter which options you choose, Paperless will always store the original
|
||||
document that it found in the consumption directory or in the mail and
|
||||
will never overwrite that document. Archived versions are stored alongside the
|
||||
original versions.
|
||||
|
||||
|
||||
The consumption directory
|
||||
=========================
|
||||
|
||||
The primary method of getting documents into your database is by putting them in
|
||||
the consumption directory. The consumer runs in an infinite loop, looking for new
|
||||
additions to this directory. When it finds them, the consumer goes about the process
|
||||
of parsing them with the OCR, indexing what it finds, and storing it in the media directory.
|
||||
|
||||
Getting stuff into this directory is up to you. If you're running Paperless
|
||||
on your local computer, you might just want to drag and drop files there, but if
|
||||
you're running this on a server and want your scanner to automatically push
|
||||
files to this directory, you'll need to setup some sort of service to accept the
|
||||
files from the scanner. Typically, you're looking at an FTP server like
|
||||
`Proftpd`_ or a Windows folder share with `Samba`_.
|
||||
|
||||
.. _Proftpd: http://www.proftpd.org/
|
||||
.. _Samba: http://www.samba.org/
|
||||
|
||||
.. TODO: hyperref to configuration of the location of this magic folder.
|
||||
|
||||
Web UI Upload
|
||||
=============
|
||||
|
||||
The dashboard has a file drop field to upload documents to paperless. Simply drag a file
|
||||
onto this field or select a file with the file dialog. Multiple files are supported.
|
||||
|
||||
You can also upload documents on any other page of the web UI by dragging-and-dropping
|
||||
files into your browser window.
|
||||
|
||||
.. _usage-mobile_upload:
|
||||
|
||||
Mobile upload
|
||||
=============
|
||||
|
||||
The mobile app over at `<https://github.com/qcasey/paperless_share>`_ allows Android users
|
||||
to share any documents with paperless. This can be combined with any of the mobile
|
||||
scanning apps out there, such as Office Lens.
|
||||
|
||||
Furthermore, there is the `Paperless App <https://github.com/bauerj/paperless_app>`_ as well,
|
||||
which not only has document upload, but also document browsing and download features.
|
||||
|
||||
.. _usage-email:
|
||||
|
||||
IMAP (Email)
|
||||
============
|
||||
|
||||
You can tell paperless-ngx to consume documents from your email accounts.
|
||||
This is a very flexible and powerful feature, if you regularly received documents
|
||||
via mail that you need to archive. The mail consumer can be configured by using the
|
||||
admin interface in the following manner:
|
||||
|
||||
1. Define e-mail accounts.
|
||||
2. Define mail rules for your account.
|
||||
|
||||
These rules perform the following:
|
||||
|
||||
1. Connect to the mail server.
|
||||
2. Fetch all matching mails (as defined by folder, maximum age and the filters)
|
||||
3. Check if there are any consumable attachments.
|
||||
4. If so, instruct paperless to consume the attachments and optionally
|
||||
use the metadata provided in the rule for the new document.
|
||||
5. If documents were consumed from a mail, the rule action is performed
|
||||
on that mail.
|
||||
|
||||
Paperless will completely ignore mails that do not match your filters. It will also
|
||||
only perform the action on mails that it has consumed documents from.
|
||||
|
||||
The actions all ensure that the same mail is not consumed twice by different means.
|
||||
These are as follows:
|
||||
|
||||
* **Delete:** Immediately deletes mail that paperless has consumed documents from.
|
||||
Use with caution.
|
||||
* **Mark as read:** Mark consumed mail as read. Paperless will not consume documents
|
||||
from already read mails. If you read a mail before paperless sees it, it will be
|
||||
ignored.
|
||||
* **Flag:** Sets the 'important' flag on mails with consumed documents. Paperless
|
||||
will not consume flagged mails.
|
||||
* **Move to folder:** Moves consumed mails out of the way so that paperless wont
|
||||
consume them again.
|
||||
* **Add custom Tag:** Adds a custom tag to mails with consumed documents (the IMAP
|
||||
standard calls these "keywords"). Paperless will not consume mails already tagged.
|
||||
Not all mail servers support this feature!
|
||||
|
||||
.. caution::
|
||||
|
||||
The mail consumer will perform these actions on all mails it has consumed
|
||||
documents from. Keep in mind that the actual consumption process may fail
|
||||
for some reason, leaving you with missing documents in paperless.
|
||||
|
||||
.. note::
|
||||
|
||||
With the correct set of rules, you can completely automate your email documents.
|
||||
Create rules for every correspondent you receive digital documents from and
|
||||
paperless will read them automatically. The default action "mark as read" is
|
||||
pretty tame and will not cause any damage or data loss whatsoever.
|
||||
|
||||
You can also setup a special folder in your mail account for paperless and use
|
||||
your favorite mail client to move to be consumed mails into that folder
|
||||
automatically or manually and tell paperless to move them to yet another folder
|
||||
after consumption. It's up to you.
|
||||
|
||||
.. note::
|
||||
|
||||
When defining a mail rule with a folder, you may need to try different characters to
|
||||
define how the sub-folders are separated. Common values include ".", "/" or "|", but
|
||||
this varies by the mail server. Check the documentation for your mail server. In the
|
||||
event of an error fetching mail from a certain folder, check the Paperless logs. When
|
||||
a folder is not located, Paperless will attempt to list all folders found in the account
|
||||
to the Paperless logs.
|
||||
|
||||
.. note::
|
||||
|
||||
Paperless will process the rules in the order defined in the admin page.
|
||||
|
||||
You can define catch-all rules and have them executed last to consume
|
||||
any documents not matched by previous rules. Such a rule may assign an "Unknown
|
||||
mail document" tag to consumed documents so you can inspect them further.
|
||||
|
||||
Paperless is set up to check your mails every 10 minutes. This can be configured on the
|
||||
'Scheduled tasks' page in the admin.
|
||||
|
||||
|
||||
REST API
|
||||
========
|
||||
|
||||
You can also submit a document using the REST API, see :ref:`api-file_uploads` for details.
|
||||
|
||||
.. _basic-searching:
|
||||
|
||||
|
||||
Best practices
|
||||
##############
|
||||
|
||||
Paperless offers a couple tools that help you organize your document collection. However,
|
||||
it is up to you to use them in a way that helps you organize documents and find specific
|
||||
documents when you need them. This section offers a couple ideas for managing your collection.
|
||||
|
||||
Document types allow you to classify documents according to what they are. You can define
|
||||
types such as "Receipt", "Invoice", or "Contract". If you used to collect all your receipts
|
||||
in a single binder, you can recreate that system in paperless by defining a document type,
|
||||
assigning documents to that type and then filtering by that type to only see all receipts.
|
||||
|
||||
Not all documents need document types. Sometimes its hard to determine what the type of a
|
||||
document is or it is hard to justify creating a document type that you only need once or twice.
|
||||
This is okay. As long as the types you define help you organize your collection in the way
|
||||
you want, paperless is doing its job.
|
||||
|
||||
Tags can be used in many different ways. Think of tags are more versatile folders or binders.
|
||||
If you have a binder for documents related to university / your car or health care, you can
|
||||
create these binders in paperless by creating tags and assigning them to relevant documents.
|
||||
Just as with documents, you can filter the document list by tags and only see documents of
|
||||
a certain topic.
|
||||
|
||||
With physical documents, you'll often need to decide which folder the document belongs to.
|
||||
The advantage of tags over folders and binders is that a single document can have multiple
|
||||
tags. A physical document cannot magically appear in two different folders, but with tags,
|
||||
this is entirely possible.
|
||||
|
||||
.. hint::
|
||||
|
||||
This can be used in many different ways. One example: Imagine you're working on a particular
|
||||
task, such as signing up for university. Usually you'll need to collect a bunch of different
|
||||
documents that are already sorted into various folders. With the tag system of paperless,
|
||||
you can create a new group of documents that are relevant to this task without destroying
|
||||
the already existing organization. When you're done with the task, you could delete the
|
||||
tag again, which would be equal to sorting documents back into the folder they belong into.
|
||||
Or keep the tag, up to you.
|
||||
|
||||
All of the logic above applies to correspondents as well. Attach them to documents if you
|
||||
feel that they help you organize your collection.
|
||||
|
||||
When you've started organizing your documents, create a couple saved views for document collections
|
||||
you regularly access. This is equal to having labeled physical binders on your desk, except
|
||||
that these saved views are dynamic and simply update themselves as you add documents to the system.
|
||||
|
||||
Here are a couple examples of tags and types that you could use in your collection.
|
||||
|
||||
* An ``inbox`` tag for newly added documents that you haven't manually edited yet.
|
||||
* A tag ``car`` for everything car related (repairs, registration, insurance, etc)
|
||||
* A tag ``todo`` for documents that you still need to do something with, such as reply, or
|
||||
perform some task online.
|
||||
* A tag ``bank account x`` for all bank statement related to that account.
|
||||
* A tag ``mail`` for anything that you added to paperless via its mail processing capabilities.
|
||||
* A tag ``missing_metadata`` when you still need to add some metadata to a document, but can't
|
||||
or don't want to do this right now.
|
||||
|
||||
.. _basic-usage_searching:
|
||||
|
||||
Searching
|
||||
#########
|
||||
|
||||
Paperless offers an extensive searching mechanism that is designed to allow you to quickly
|
||||
find a document you're looking for (for example, that thing that just broke and you bought
|
||||
a couple months ago, that contract you signed 8 years ago).
|
||||
|
||||
When you search paperless for a document, it tries to match this query against your documents.
|
||||
Paperless will look for matching documents by inspecting their content, title, correspondent,
|
||||
type and tags. Paperless returns a scored list of results, so that documents matching your query
|
||||
better will appear further up in the search results.
|
||||
|
||||
By default, paperless returns only documents which contain all words typed in the search bar.
|
||||
However, paperless also offers advanced search syntax if you want to drill down the results
|
||||
further.
|
||||
|
||||
Matching documents with logical expressions:
|
||||
|
||||
.. code::
|
||||
|
||||
shopname AND (product1 OR product2)
|
||||
|
||||
Matching specific tags, correspondents or types:
|
||||
|
||||
.. code::
|
||||
|
||||
type:invoice tag:unpaid
|
||||
correspondent:university certificate
|
||||
|
||||
Matching dates:
|
||||
|
||||
.. code::
|
||||
|
||||
created:[2005 to 2009]
|
||||
added:yesterday
|
||||
modified:today
|
||||
|
||||
Matching inexact words:
|
||||
|
||||
.. code::
|
||||
|
||||
produ*name
|
||||
|
||||
.. note::
|
||||
|
||||
Inexact terms are hard for search indexes. These queries might take a while to execute. That's why paperless offers
|
||||
auto complete and query correction.
|
||||
|
||||
All of these constructs can be combined as you see fit.
|
||||
If you want to learn more about the query language used by paperless, paperless uses Whoosh's default query language.
|
||||
Head over to `Whoosh query language <https://whoosh.readthedocs.io/en/latest/querylang.html>`_.
|
||||
For details on what date parsing utilities are available, see
|
||||
`Date parsing <https://whoosh.readthedocs.io/en/latest/dates.html#parsing-date-queries>`_.
|
||||
|
||||
|
||||
.. _usage-recommended_workflow:
|
||||
|
||||
The recommended workflow
|
||||
########################
|
||||
|
||||
Once you have familiarized yourself with paperless and are ready to use it
|
||||
for all your documents, the recommended workflow for managing your documents
|
||||
is as follows. This workflow also takes into account that some documents
|
||||
have to be kept in physical form, but still ensures that you get all the
|
||||
advantages for these documents as well.
|
||||
|
||||
The following diagram shows how easy it is to manage your documents.
|
||||
|
||||
.. image:: _static/recommended_workflow.png
|
||||
|
||||
Preparations in paperless
|
||||
=========================
|
||||
|
||||
* Create an inbox tag that gets assigned to all new documents.
|
||||
* Create a TODO tag.
|
||||
|
||||
Processing of the physical documents
|
||||
====================================
|
||||
|
||||
Keep a physical inbox. Whenever you receive a document that you need to
|
||||
archive, put it into your inbox. Regularly, do the following for all documents
|
||||
in your inbox:
|
||||
|
||||
1. For each document, decide if you need to keep the document in physical
|
||||
form. This applies to certain important documents, such as contracts and
|
||||
certificates.
|
||||
2. If you need to keep the document, write a running number on the document
|
||||
before scanning, starting at one and counting upwards. This is the archive
|
||||
serial number, or ASN in short.
|
||||
3. Scan the document.
|
||||
4. If the document has an ASN assigned, store it in a *single* binder, sorted
|
||||
by ASN. Don't order this binder in any other way.
|
||||
5. If the document has no ASN, throw it away. Yay!
|
||||
|
||||
Over time, you will notice that your physical binder will fill up. If it is
|
||||
full, label the binder with the range of ASNs in this binder (i.e., "Documents
|
||||
1 to 343"), store the binder in your cellar or elsewhere, and start a new
|
||||
binder.
|
||||
|
||||
The idea behind this process is that you will never have to use the physical
|
||||
binders to find a document. If you need a specific physical document, you
|
||||
may find this document by:
|
||||
|
||||
1. Searching in paperless for the document.
|
||||
2. Identify the ASN of the document, since it appears on the scan.
|
||||
3. Grab the relevant document binder and get the document. This is easy since
|
||||
they are sorted by ASN.
|
||||
|
||||
Processing of documents in paperless
|
||||
====================================
|
||||
|
||||
Once you have scanned in a document, proceed in paperless as follows.
|
||||
|
||||
1. If the document has an ASN, assign the ASN to the document.
|
||||
2. Assign a correspondent to the document (i.e., your employer, bank, etc)
|
||||
This isn't strictly necessary but helps in finding a document when you need
|
||||
it.
|
||||
3. Assign a document type (i.e., invoice, bank statement, etc) to the document
|
||||
This isn't strictly necessary but helps in finding a document when you need
|
||||
it.
|
||||
4. Assign a proper title to the document (the name of an item you bought, the
|
||||
subject of the letter, etc)
|
||||
5. Check that the date of the document is correct. Paperless tries to read
|
||||
the date from the content of the document, but this fails sometimes if the
|
||||
OCR is bad or multiple dates appear on the document.
|
||||
6. Remove inbox tags from the documents.
|
||||
|
||||
.. hint::
|
||||
|
||||
You can setup manual matching rules for your correspondents and tags and
|
||||
paperless will assign them automatically. After consuming a couple documents,
|
||||
you can even ask paperless to *learn* when to assign tags and correspondents
|
||||
by itself. For details on this feature, see :ref:`advanced-matching`.
|
||||
|
||||
Task management
|
||||
===============
|
||||
|
||||
Some documents require attention and require you to act on the document. You
|
||||
may take two different approaches to handle these documents based on how
|
||||
regularly you intend to scan documents and use paperless.
|
||||
|
||||
* If you scan and process your documents in paperless regularly, assign a
|
||||
TODO tag to all scanned documents that you need to process. Create a saved
|
||||
view on the dashboard that shows all documents with this tag.
|
||||
* If you do not scan documents regularly and use paperless solely for archiving,
|
||||
create a physical todo box next to your physical inbox and put documents you
|
||||
need to process in the TODO box. When you performed the task associated with
|
||||
the document, move it to the inbox.
|
||||
You will be redirected shortly...
|
||||
|
Loading…
x
Reference in New Issue
Block a user