mirror of
https://github.com/paperless-ngx/paperless-ngx.git
synced 2025-04-11 10:00:48 -05:00
reworking the documentation.
This commit is contained in:
parent
04335e4aac
commit
f2dbb74d44
docs/_static/Screenshot_first_logged.png (BIN, vendored): binary file not shown; size before: 60 KiB
docs/_static/Screenshot_first_run_login.png (BIN, vendored): binary file not shown; size before: 26 KiB
docs/_static/Screenshot_upload_and_scanned.png (BIN, vendored): binary file not shown; size before: 113 KiB
354
docs/administration.rst
Normal file
@ -0,0 +1,354 @@
**************
Administration
**************


Making backups
##############

.. warning::

    This section is not updated yet.

So you're bored of this whole project, or you want to make a remote backup of
your files for whatever reason. This is easy to do: simply use the
:ref:`exporter <utilities-exporter>` to dump your documents and database out
into an arbitrary directory.


.. _migrating-restoring:

Restoring
=========

Restoring your data is just as easy, since nearly all of your data exists either
in the file names, or in the contents of the files themselves. You just need to
create an empty database (just follow the
:ref:`installation instructions <setup-installation>` again) and then import the
``tags.json`` file you created as part of your backup. Lastly, copy your
exported documents into the consumption directory and start up the consumer.

.. code-block:: shell-session

    $ cd /path/to/project
    $ rm data/db.sqlite3  # Delete the database
    $ cd src
    $ ./manage.py migrate  # Create the database
    $ ./manage.py createsuperuser
    $ ./manage.py loaddata /path/to/arbitrary/place/tags.json
    $ cp /path/to/exported/docs/* /path/to/consumption/dir/
    $ ./manage.py document_consumer

Importing your data if you are :ref:`using Docker <setup-installation-docker>`
is almost as simple:

.. code-block:: shell-session

    # Stop and remove your current containers
    $ docker-compose stop
    $ docker-compose rm -f

    # Recreate them, add the superuser
    $ docker-compose up -d
    $ docker-compose run --rm webserver createsuperuser

    # Load the tags
    $ cat /path/to/arbitrary/place/tags.json | docker-compose run --rm webserver loaddata_stdin -

    # Load your exported documents into the consumption directory
    # (How you do this highly depends on how you have set this up)
    $ cp /path/to/exported/docs/* /path/to/mounted/consumption/dir/

After loading the documents into the consumption directory the consumer will
immediately start consuming the documents.

.. _administration-updating:

Updating paperless
##################

.. warning::

    This section is not updated yet.

For the most part, all you have to do to update Paperless is run ``git pull``
on the directory containing the project files, and then use Django's
``migrate`` command to execute any database schema updates that might have been
rolled in as part of the update:

.. code-block:: shell-session

    $ cd /path/to/project
    $ git pull
    $ pip install -r requirements.txt
    $ cd src
    $ ./manage.py migrate

Note that it's possible (even likely) that while ``git pull`` may update some
files, the ``migrate`` step may not update anything. This is totally normal.

Additionally, as new features are added, the ability to control those features
is typically added by way of an environment variable set in ``paperless.conf``.
You may want to take a look at the ``paperless.conf.example`` file to see if
there's anything new in there compared to what you've got in ``/etc``.

If you are :ref:`using Docker <setup-installation-docker>` the update process
is similar:

.. code-block:: shell-session

    $ cd /path/to/project
    $ git pull
    $ docker build -t paperless .
    $ docker-compose run --rm consumer migrate
    $ docker-compose up -d

If ``git pull`` doesn't report any changes, there is no need to continue with
the remaining steps.

The remaining steps depend on the route you've chosen to run paperless.

a.  If you are not using docker, update the Python requirements. Paperless uses
    `Pipenv`_ for managing dependencies:

    .. code:: bash

        $ pip install --upgrade pipenv
        $ cd /path/to/paperless
        $ pipenv install

    This creates a new virtual environment (or uses your existing environment)
    and installs all dependencies into it. Running commands inside the environment
    is done via

    .. code:: bash

        $ cd /path/to/paperless/src
        $ pipenv run python3 manage.py my_command

    You will also need to build the frontend each time a new update is pushed.
    See updating paperless for more information. TODO REFERENCE

b.  If you are using docker, build the docker image.

    .. code:: bash

        $ docker build -t jonaswinkler/paperless-ng:latest .

    Copy either ``docker-compose.yml.example`` or ``docker-compose.yml.sqlite.example``
    to ``docker-compose.yml`` and adjust the consumption directory.

Management utilities
####################

Paperless comes with some management commands that perform various maintenance
tasks on your paperless instance. You can invoke these commands either by

.. code:: bash

    $ cd /path/to/paperless
    $ docker-compose run --rm webserver <command> <arguments>

or

.. code:: bash

    $ cd /path/to/paperless/src
    $ pipenv run python manage.py <command> <arguments>

depending on whether you use docker or not.

All commands have built-in help, which can be accessed by executing them with
the argument ``--help``.

.. _utilities-exporter:

Document exporter
=================

The document exporter exports all your data from paperless into a folder for
backup or migration to another DMS.

.. code::

    document_exporter target

``target`` is a folder to which the data gets written. This includes documents,
thumbnails and a ``manifest.json`` file. The manifest contains all metadata from
the database (correspondents, tags, etc).

When you use the provided docker compose script, specify ``../export`` as the
target. This path inside the container is automatically mounted on your host on
the folder ``export``.


.. _utilities-importer:

Document importer
=================

The document importer takes the export produced by the `Document exporter`_ and
imports it into paperless.

The importer works just like the exporter. You point it at a directory, and
the script does the rest of the work:

.. code::

    document_importer source

When you use the provided docker compose script, put the export inside the
``export`` folder in your paperless source directory. Specify ``../export``
as the ``source``.


.. _utilities-retagger:

Document retagger
=================

Say you've imported a few hundred documents and now want to introduce
a tag or set up a new correspondent, and apply its matching to all of
the currently-imported docs. This problem is common enough that
there are tools for it.

.. code::

    document_retagger [-h] [-c] [-T] [-t] [-i] [--use-first] [-f]

    optional arguments:
      -c, --correspondent
      -T, --tags
      -t, --document_type
      -i, --inbox-only
      --use-first
      -f, --overwrite

Run this after changing or adding matching rules. It'll loop over all
of the documents in your database and attempt to match documents
according to the new rules.

Specify any combination of ``-c``, ``-T`` and ``-t`` to have the
retagger perform matching of the specified metadata type. If you don't
specify any of these options, the document retagger won't do anything.

Specify ``-i`` to have the document retagger work on documents tagged
with inbox tags only. This is useful when you don't want to mess with
your already processed documents.

When multiple document types or correspondents match a single document,
the retagger won't assign these to the document. Specify ``--use-first``
to override this behaviour and just use the first correspondent or type
it finds. This option does not apply to tags, since any number of tags
can be applied to a document.

Finally, ``-f`` specifies that you wish to overwrite already assigned
correspondents, types and/or tags. The default behaviour is to not
assign correspondents and types to documents that have this data already
assigned. ``-f`` works differently for tags: by default, only additional tags get
added to documents and no tags are removed. With ``-f``, tags that don't
match a document anymore get removed as well.

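
The option semantics described for the retagger can be sketched in a few lines
of Python. This is only an illustration of the behaviour as worded in this
section, not the retagger's actual code: the function names ``pick_single`` and
``retag_tags`` are hypothetical, and keeping the existing value when ``-f`` is
given but nothing matches is an assumption.

.. code:: python

    def pick_single(current, candidates, use_first=False, overwrite=False):
        """Choose a correspondent or document type for one document."""
        if current is not None and not overwrite:
            # Already assigned: left alone unless -f is given.
            return current
        if len(candidates) == 1 or (candidates and use_first):
            # Unambiguous match, or --use-first picks the first one found.
            return candidates[0]
        # No match, or several matches without --use-first: unchanged
        # (assumption for the "no match" case).
        return current


    def retag_tags(current, matching, overwrite=False):
        """Return the tag set of one document after a retagger pass."""
        current, matching = set(current), set(matching)
        if overwrite:
            # With -f: tags that no longer match are removed as well.
            return matching
        # Default: only add tags, never remove any.
        return current | matching

For example, ``retag_tags({"a"}, {"b"})`` keeps both tags, while the same call
with ``overwrite=True`` leaves only ``{"b"}``.
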
Managing the Automatic matching algorithm
=========================================

The *Auto* matching algorithm requires a trained neural network to work.
This network needs to be updated whenever something in your data
changes. The docker image takes care of that automatically with the task
scheduler. You can manually renew the classifier by invoking the following
management command:

.. code::

    document_create_classifier

This command takes no arguments.


Managing the document search index
==================================

The document search index is responsible for delivering search results for the
website. The document index is automatically updated whenever documents get
added to, changed, or removed from paperless. However, if the search yields
non-existing documents or won't find anything, you may need to recreate the
index manually.

.. code::

    document_index {reindex,optimize}

Specify ``reindex`` to have the index created from scratch. This may take some
time.

Specify ``optimize`` to optimize the index. This updates certain aspects of
the index and usually makes queries faster and also ensures that the
autocompletion works properly. This command is regularly invoked by the task
scheduler.


Managing filenames
==================

.. warning::

    TBD

.. code::

    document_renamer


.. _utilities-encyption:

Managing encryption
===================

Documents can be stored in Paperless using GnuPG encryption.

.. danger::

    Encryption is deprecated since paperless-ng 1.0 and doesn't really provide any
    additional security, since you have to store the passphrase in a configuration
    file on the same system as the encrypted documents for paperless to work. Also,
    paperless provides transparent access to your encrypted documents.

    Consider running paperless on an encrypted filesystem instead, which will then
    at least provide security against physical hardware theft.

.. code::

    change_storage_type [--passphrase PASSPHRASE] {gpg,unencrypted} {gpg,unencrypted}

    positional arguments:
      {gpg,unencrypted}  The state you want to change your documents from
      {gpg,unencrypted}  The state you want to change your documents to

    optional arguments:
      --passphrase PASSPHRASE

Enabling encryption
-------------------

Basic usage to enable encryption of your document store (**USE A MORE SECURE PASSPHRASE**):

(Note: If ``PAPERLESS_PASSPHRASE`` isn't set already, you need to specify it here)

.. code::

    change_storage_type [--passphrase SECR3TP4SSPHRA$E] unencrypted gpg


Disabling encryption
--------------------

Basic usage to disable encryption of your document store:

(Note: Again, if ``PAPERLESS_PASSPHRASE`` isn't set already, you need to specify it here)

.. code::

    change_storage_type [--passphrase SECR3TP4SSPHRA$E] gpg unencrypted


.. _Pipenv: https://pipenv.pypa.io/en/latest/

244
docs/advanced_usage.rst
Normal file
@ -0,0 +1,244 @@
***************
Advanced topics
***************

Paperless offers a couple of features that automate certain tasks and make your life
easier.

Guesswork
#########


Any document you put into the consumption directory will be consumed, but if
you name the file right, it'll automatically set some values in the database
for you. This is the logic the consumer follows:

1.  Try to find the correspondent, title, and tags in the file name following
    the pattern: ``Date - Correspondent - Title - tag,tag,tag.pdf``. Note that
    the format of the date is **rigidly defined** as ``YYYYMMDDHHMMSSZ`` or
    ``YYYYMMDDZ``. The ``Z`` refers to "Zulu time", AKA "UTC".
    The tags are optional, so the format ``Date - Correspondent - Title.pdf``
    works as well.
2.  If that doesn't work, we skip the date and try this pattern:
    ``Correspondent - Title - tag,tag,tag.pdf``.
3.  If that doesn't work, we try to find the correspondent and title in the file
    name following the pattern: ``Correspondent - Title.pdf``.
4.  If that doesn't work, just assume that the name of the file is the title.

So given the above, the following examples would work as you'd expect:

* ``20150314000700Z - Some Company Name - Invoice 2016-01-01 - money,invoices.pdf``
* ``20150314Z - Some Company Name - Invoice 2016-01-01 - money,invoices.pdf``
* ``Some Company Name - Invoice 2016-01-01 - money,invoices.pdf``
* ``Another Company - Letter of Reference.jpg``
* ``Dad's Recipe for Pancakes.png``

These however wouldn't work:

* ``2015-03-14 00:07:00 UTC - Some Company Name, Invoice 2016-01-01, money, invoices.pdf``
* ``2015-03-14 - Some Company Name, Invoice 2016-01-01, money, invoices.pdf``
* ``Some Company Name, Invoice 2016-01-01, money, invoices.pdf``
* ``Another Company- Letter of Reference.jpg``

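
The fallback order of the numbered steps can be sketched with a list of regular
expressions that are tried one after another. This is a minimal illustration,
not the consumer's actual expressions: the names ``PATTERNS`` and ``guess`` and
the exact character class used for tags are assumptions made for this example.

.. code:: python

    import re

    # Building blocks; " - " (space, dash, space) is the field separator.
    DATE = r"(?P<date>\d{14}Z|\d{8}Z)"
    CORR = r"(?P<correspondent>.+?)"
    TITLE = r"(?P<title>.+?)"
    TAGS = r"(?P<tags>[a-z0-9,\-]+)"  # assumed tag alphabet
    EXT = r"\.(?P<ext>\w+)"

    # Tried in order; the last pattern treats the whole name as the title.
    PATTERNS = [re.compile(p) for p in (
        rf"^{DATE} - {CORR} - {TITLE} - {TAGS}{EXT}$",
        rf"^{DATE} - {CORR} - {TITLE}{EXT}$",
        rf"^{CORR} - {TITLE} - {TAGS}{EXT}$",
        rf"^{CORR} - {TITLE}{EXT}$",
        rf"^{TITLE}{EXT}$",
    )]


    def guess(filename):
        """Return the fields guessed from a file name, or None."""
        for pattern in PATTERNS:
            m = pattern.match(filename)
            if m:
                info = {"date": None, "correspondent": None, "title": None,
                        "tags": None, "ext": None}
                info.update(m.groupdict())
                info["tags"] = info["tags"].split(",") if info["tags"] else []
                return info
        return None  # e.g. a name without any extension

With this sketch, ``Another Company- Letter of Reference.jpg`` only matches the
last pattern, so the whole name ends up as the title and no correspondent is
found, which is why the missing space before the dash matters.
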
Do I have to be so strict about naming?
=======================================

Rather than using the strict document naming rules, one can also set the option
``PAPERLESS_FILENAME_DATE_ORDER`` in ``paperless.conf`` to any date order
that is accepted by dateparser_. Doing so will cause ``paperless`` to default
to any date format that is found in the title, instead of a date pulled from
the document's text, without requiring the strict formatting of the document
filename as described above.

.. _dateparser: https://github.com/scrapinghub/dateparser/blob/v0.7.0/docs/usage.rst#settings

Transforming filenames for parsing
==================================

Some devices can't produce filenames that can be parsed by the default
parser. By configuring the option ``PAPERLESS_FILENAME_PARSE_TRANSFORMS`` in
``paperless.conf`` one can add transformations that are applied to the filename
before it's parsed.

The option contains a list of dictionaries of regular expressions (key:
``pattern``) and replacements (key: ``repl``) in JSON format, which are
applied in order by passing them to ``re.subn``. Transformation stops
after the first match, so at most one transformation is applied. The general
syntax is

.. code:: python

    [{"pattern":"pattern1", "repl":"repl1"}, {"pattern":"pattern2", "repl":"repl2"}, ..., {"pattern":"patternN", "repl":"replN"}]

The example below is for a Brother ADS-2400N, a scanner that allows assigning
different names to different hardware buttons (useful for handling
multiple entities in one instance), but insists on adding ``_<count>``
to the filename.

.. code:: python

    # Brother profile configuration, support "Name_Date_Count" (the default
    # setting) and "Name_Count" (use "Name" as tag and "Count" as title).
    PAPERLESS_FILENAME_PARSE_TRANSFORMS=[{"pattern":"^([a-z]+)_(\\d{8})_(\\d{6})_([0-9]+)\\.", "repl":"\\2\\3Z - \\4 - \\1."}, {"pattern":"^([a-z]+)_([0-9]+)\\.", "repl":" - \\2 - \\1."}]


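
To see what such a configuration does to a file name, the rule "apply in order
with ``re.subn``, stop after the first match" can be reproduced in a short
script. This is a sketch under assumptions: the helper name ``transform`` and
the sample file names are made up for this example, and only the JSON value
itself is taken from the Brother configuration.

.. code:: python

    import json
    import re

    # The option value as written in paperless.conf (JSON, so backslashes
    # are doubled); json.loads turns "\\d" into the regex escape "\d".
    RAW = r'''[{"pattern":"^([a-z]+)_(\\d{8})_(\\d{6})_([0-9]+)\\.", "repl":"\\2\\3Z - \\4 - \\1."}, {"pattern":"^([a-z]+)_([0-9]+)\\.", "repl":" - \\2 - \\1."}]'''


    def transform(filename, transforms):
        """Apply the first matching transformation, if any."""
        for t in transforms:
            new_name, count = re.subn(t["pattern"], t["repl"], filename)
            if count > 0:
                return new_name  # stop after the first match
        return filename  # nothing matched: name is left unchanged


    transforms = json.loads(RAW)
    print(transform("invoices_20201030_123456_0001.pdf", transforms))
    print(transform("invoices_0001.pdf", transforms))

The first call rewrites the name into the ``Date - Correspondent - Title``
shape expected by the parser; the second one only matches the second rule.
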
Matching tags, correspondents and document types
################################################

After the consumer has tried to figure out what it could from the file name,
it starts looking at the content of the document itself. It will compare the
matching algorithms defined by every tag and correspondent already set in your
database to see if they apply to the text in that document. In other words,
if you defined a tag called ``Home Utility`` that had a ``match`` property of
``bc hydro`` and a ``matching_algorithm`` of ``literal``, Paperless will
automatically tag your newly-consumed document with your ``Home Utility`` tag
so long as the text ``bc hydro`` appears in the body of the document somewhere.

The matching logic is quite powerful, and supports searching the text of your
document with different algorithms, and as such, some experimentation may be
necessary to get things right.

In order to have a tag, correspondent or type assigned automatically to newly
consumed documents, assign a match and matching algorithm using the web
interface. These settings define when to assign correspondents, tags and types
to documents.

The following algorithms are available:

* **Any:** Looks for any occurrence of any word provided in match in the PDF.
  If you define the match as ``Bank1 Bank2``, it will match documents containing
  either of these terms.
* **All:** Requires that every word provided appears in the PDF, though not
  necessarily in the order provided.
* **Literal:** Matches only if the match appears exactly as provided in the PDF.
* **Regular expression:** Parses the match as a regular expression and tries to
  find a match within the document.
* **Fuzzy match:** Not documented yet. Look at the source.
* **Auto:** Tries to automatically match new documents. This does not require you
  to set a match. See the notes below.

When using the "any" or "all" matching algorithms, you can search for terms
that consist of multiple words by enclosing them in double quotes. For example,
defining a match text of ``"Bank of America" BofA`` using the "any" algorithm
will match documents that contain either "Bank of America" or "BofA", but will
not match documents containing "Bank of South America".

Then just save your tag/correspondent and run another document through the
consumer. Once complete, you should see the newly-created document,
automatically tagged with the appropriate data.


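
The difference between the text-based algorithms is easiest to see side by
side. The following is a rough sketch, not the matching code Paperless ships:
the function name ``matches`` is hypothetical, and ignoring case and matching
on word boundaries are assumptions made for this example. Quoted multi-word
terms are split with ``shlex.split``.

.. code:: python

    import re
    import shlex


    def _has(term, text):
        """True if `term` occurs in `text` as whole words (case ignored)."""
        return re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE) is not None


    def matches(algorithm, match, text):
        """Evaluate one of the text-based matching algorithms."""
        if algorithm == "any":
            return any(_has(term, text) for term in shlex.split(match))
        if algorithm == "all":
            return all(_has(term, text) for term in shlex.split(match))
        if algorithm == "literal":
            return _has(match, text)
        if algorithm == "regex":
            return re.search(match, text, re.IGNORECASE) is not None
        raise ValueError(f"unsupported algorithm: {algorithm}")

With the match text ``"Bank of America" BofA`` and the "any" algorithm, a
document containing "Bank of South America" is not matched, because neither
the quoted phrase nor "BofA" occurs in it.
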
Automatic matching
==================

Paperless-ng comes with a new matching algorithm called *Auto*. This matching
algorithm tries to assign tags, correspondents and document types to your
documents based on how you have assigned these on existing documents. It
uses a neural network under the hood.

If, for example, all your bank statements of your account 123 at the Bank of
America are tagged with the tag "bofa_123" and the matching algorithm of this
tag is set to *Auto*, this neural network will examine your documents and
automatically learn when to assign this tag.

There are a couple of caveats you need to keep in mind when using this feature:

* Changes to your documents are not immediately reflected by the matching
  algorithm. The neural network needs to be *trained* on your documents after
  changes. Paperless periodically (default: once each hour) checks for changes
  and does this automatically for you.
* The Auto matching algorithm only takes documents into account which are NOT
  placed in your inbox (i.e., documents that have no inbox tags assigned to
  them). This ensures that the neural network only learns from documents which
  you have correctly tagged before.
* The matching algorithm can only work if there is a correlation between the
  tag, correspondent or document type and the document itself. Your bank
  statements usually contain your bank account number and the name of the bank,
  so this works reasonably well. However, tags such as "TODO" cannot be
  automatically assigned.
* The matching algorithm needs a reasonable number of documents to identify when
  to assign tags, correspondents, and types. If one out of a thousand documents
  has the correspondent "Very obscure web shop I bought something five years
  ago", it will probably not assign this correspondent automatically if you buy
  something from them again. The more documents, the better.

Hooking into the consumption process
####################################

Sometimes you may want to do something arbitrary whenever a document is
consumed. Rather than try to predict what you may want to do, Paperless lets
you execute scripts of your own choosing just before or after a document is
consumed using a couple of simple hooks.

Just write a script, put it somewhere that Paperless can read & execute, and
then put the path to that script in ``paperless.conf`` with the variable name
of either ``PAPERLESS_PRE_CONSUME_SCRIPT`` or
``PAPERLESS_POST_CONSUME_SCRIPT``.

.. TODO HYPEREF TO CONFIG

.. important::

    These scripts are executed in a **blocking** process, which means that if
    a script takes a long time to run, it can significantly slow down your
    document consumption flow. If you want things to run asynchronously,
    you'll have to fork the process in your script and exit.


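
One way to keep a slow hook from blocking the consumer is to hand the work to
a detached child process and return at once. The sketch below does not call
``os.fork`` directly; it starts a child in its own session via
``subprocess.Popen``, which assumes a POSIX system. The helper name
``run_detached`` and the worker path ``/usr/local/bin/slow-task`` are
placeholders for this example, not part of Paperless.

.. code:: python

    import subprocess
    import sys


    def run_detached(command):
        """Start `command` in its own session and return immediately."""
        return subprocess.Popen(
            command,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            start_new_session=True,  # POSIX: leave the hook's session
        )


    def main(argv):
        """Hook entry point: pass the hook arguments on to the worker."""
        # Hypothetical long-running worker; replace with your own command.
        run_detached(["/usr/local/bin/slow-task", *argv[1:]])
        return 0

A real hook script would end with ``sys.exit(main(sys.argv))``, so that the
hook itself exits right away while the worker keeps running.
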
Pre-consumption script
======================

Executed after the consumer sees a new document in the consumption folder, but
before any processing of the document is performed. This script receives exactly
one argument:

* Document file name

A simple but common example would be a script like this:

``/usr/local/bin/ocr-pdf``

.. code:: bash

    #!/usr/bin/env bash
    # Quote the argument so that paths containing spaces survive.
    pdf2pdfocr.py -i "${1}"

``/etc/paperless.conf``

.. code:: bash

    ...
    PAPERLESS_PRE_CONSUME_SCRIPT="/usr/local/bin/ocr-pdf"
    ...

This will pass the path to the document about to be consumed to ``/usr/local/bin/ocr-pdf``,
which will in turn call `pdf2pdfocr.py`_ on your document, which will then
overwrite the file with an OCR'd version of the file and exit, at which point
the consumption process will begin with the newly modified file.

.. _pdf2pdfocr.py: https://github.com/LeoFCardoso/pdf2pdfocr


.. _consumption-director-hook-variables-post:

Post-consumption script
=======================

Executed after the consumer has successfully processed a document and has moved it
into paperless. It receives the following arguments:

* Document id
* Generated file name
* Source path
* Thumbnail path
* Download URL
* Thumbnail URL
* Correspondent
* Tags

The script can be in any language you like, but for a simple shell script
example, you can take a look at ``post-consumption-example.sh`` in the
``scripts`` directory in this project.

The post consumption script cannot cancel the consumption process.

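
Since the script can be written in any language, a Python hook might start by
giving the positional arguments names. This is a sketch under assumptions: the
field names and the helper ``parse_args`` are made up for this example, the
arguments are assumed to arrive in the order listed for the post-consumption
script, and the tags are assumed to be passed as one comma-separated string.

.. code:: python

    import sys

    # Assumed order, following the argument list of the post-consumption script.
    FIELDS = ("document_id", "file_name", "source_path", "thumbnail_path",
              "download_url", "thumbnail_url", "correspondent", "tags")


    def parse_args(argv):
        """Map the hook's positional arguments to named fields."""
        values = argv[1:]
        if len(values) != len(FIELDS):
            raise SystemExit(f"expected {len(FIELDS)} arguments, got {len(values)}")
        info = dict(zip(FIELDS, values))
        # Assumption: tags come as a single comma-separated string.
        info["tags"] = [tag for tag in info["tags"].split(",") if tag]
        return info

In a real hook, ``parse_args(sys.argv)`` would be the first call, followed by
whatever the script should do with the document.
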
@ -1,7 +1,12 @@
 .. _api:
 
+************
 The REST API
-############
+************
 
+.. warning::
+
+    This section is not updated yet.
+
 Paperless makes use of the `Django REST Framework`_ standard API interface
 because of its inherent awesomeness. Conveniently, the system is also
@ -15,7 +20,7 @@ installation.
 .. _api-uploading:
 
 Uploading
----------
+=========
 
 File uploads in an API are hard and so far as I've been able to tell, there's
 no standard way of accepting them, so rather than crowbar file uploads into the
@ -1,6 +1,79 @@
|
|||||||
|
.. _paperless_changelog:
|
||||||
|
|
||||||
Changelog
|
Changelog
|
||||||
#########
|
#########
|
||||||
|
|
||||||
|
paperless-ng 1.0
|
||||||
|
================
|
||||||
|
|
||||||
|
* **Deprecated:** GnuPG. Don't use it. If you're still using it, be aware that it
|
||||||
|
offers no protection at all, since the passphrase is stored alongside with the
|
||||||
|
encrypted documents itself. This features will most likely be removed in future
|
||||||
|
versions.
|
||||||
|
|
||||||
|
* **Added:** New frontend. Features:
|
||||||
|
|
||||||
|
* Single page application: It's much more responsive than the django admin pages.
|
||||||
|
* Dashboard. Shows recently scanned documents, or todos, or other documents
|
||||||
|
at wish. Allows uploading of documents. Shows basic statistics.
|
||||||
|
* Better document list with multiple display options.
|
||||||
|
* Full text search with result highlighting, auto completion and scoring based
|
||||||
|
on the query. It uses a document search index in the background.
|
||||||
|
* Saveable filters.
|
||||||
|
* Better log viewer.
|
||||||
|
|
||||||
|
* **Added:** Document types. Assign these to documents just as correspondents.
|
||||||
|
They may be used in the future to perform automatic operations on documents
|
||||||
|
depending on the type.
|
||||||
|
* **Added:** Inbox tags. Define an inbox tag and it will automatically be
|
||||||
|
assigned to any new document scanned into the system.
|
||||||
|
* **Added:** Automatic matching. A new matching algorithm that automatically
|
||||||
|
assigns tags, document types and correspondents to your documents. It uses
|
||||||
|
a neural network trained on your data.
|
||||||
|
* **Added:** Archive serial numbers. Assign these to quickly find documents stored in
|
||||||
|
physical binders.
|
||||||
|
* **Added:** Enabled the internal user management of django. This isn't really a
|
||||||
|
multi user solution, however, it allows more than one user to access the website
|
||||||
|
and set some basic permissions / renew passwords.
|
||||||
|
|
||||||
|
* **Modified [breaking]:** REST Api changes:
|
||||||
|
|
||||||
|
* New filters added, other filters removed (case sensitive filters, slug filters)
|
||||||
|
* Endpoints for thumbnails, previews and downloads replace the old ``/fetch/`` urls. Redirects are in place.
|
||||||
|
* Endpoint for document uploads replaces the old ``/push`` url. Redirects are in place.
|
||||||
|
* Foreign key relationships are now served as IDs, not as urls.
|
||||||
|
|
||||||
|
* **Modified [breaking]:** PostgreSQL:
|
||||||
|
|
||||||
|
* If ``PAPERLESS_DBHOST`` is specified in the settings, paperless uses postgresql instead of sqlite.
|
||||||
|
Username, database and password all default to ``paperless`` if not specified.
|
||||||
|
* **docker-compose.yml uses PostgreSQL by default.**
|
||||||
|
|
||||||
|
* **Modified [breaking]:** document_retagger management command rework. See TODO hyperref
|
||||||
|
* **Removed [breaking]:** Reminders.
|
||||||
|
* **Removed:** All customizations made to the django admin pages.
|
||||||
|
|
||||||
|
* **Internal changes:** Mostly code cleanup, including:
|
||||||
|
|
||||||
|
* Rework of the code of the tesseract parser. This is now a lot cleaner.
|
||||||
|
* Rework of the filename handling code. It was a mess.
|
||||||
|
* Fixed some issues with the document exporter not exporting all documents when encountering duplicate filenames.
|
||||||
|
* Consumer rework: now uses the excellent watchdog library, lots of code removed.
|
||||||
|
* Added a task scheduler that takes care of checking mail, training the classifier and maintaining the document search index.
|
||||||
|
* Updated dependencies. Now uses Pipenv all around.
|
||||||
|
* Updated Dockerfile and docker-compose. Now uses ``supervisord`` to run everything paperless-related in a single container.
|
||||||
|
|
||||||
|
* **Settings:**
|
||||||
|
|
||||||
|
* ``PAPERLESS_FORGIVING_OCR`` is now the default behavior and the setting is gone. Reason: Even if ``langdetect`` fails to detect
|
||||||
|
a language, tesseract still does a very good job at ocr'ing a document with the default language.
|
||||||
|
Certain language specifics such as umlauts may not get picked up properly.
|
||||||
|
* ``PAPERLESS_DEBUG`` defaults to ``false``.
|
||||||
|
* The presence of ``PAPERLESS_DBHOST`` now determines whether to use PostgreSQL or
|
||||||
|
sqlite.
|
||||||
|
|
||||||
|
* Many more small changes here and there. The usual stuff.
|
||||||
|
|
||||||
2.7.0
|
2.7.0
|
||||||
=====
|
=====
|
||||||
|
|
||||||
|
@ -1,15 +0,0 @@
|
|||||||
Changelog (jonaswinkler)
|
|
||||||
########################
|
|
||||||
|
|
||||||
1.0.0
|
|
||||||
=====
|
|
||||||
|
|
||||||
* First release based on paperless 2.6.0
|
|
||||||
* Added: Automatic document classification using neural networks (replaces
|
|
||||||
regex-based tagging)
|
|
||||||
* Added: Document types
|
|
||||||
* Added: Archive serial number allows easy referencing of physical document
|
|
||||||
copies
|
|
||||||
* Added: Inbox tags (added automatically to newly consumed documents)
|
|
||||||
* Added: Document viewer on document edit page
|
|
||||||
* Database backend is now configurable
|
|
@ -54,7 +54,7 @@ source_suffix = '.rst'
|
|||||||
master_doc = 'index'
|
master_doc = 'index'
|
||||||
|
|
||||||
# General information about the project.
|
# General information about the project.
|
||||||
project = u'Paperless'
|
project = u'Paperless-ng'
|
||||||
copyright = u'2015, Daniel Quinn'
|
copyright = u'2015, Daniel Quinn'
|
||||||
|
|
||||||
# The version info for the project you're documenting, acts as replacement for
|
# The version info for the project you're documenting, acts as replacement for
|
||||||
@ -205,7 +205,8 @@ try:
|
|||||||
import sphinx_rtd_theme
|
import sphinx_rtd_theme
|
||||||
html_theme = "sphinx_rtd_theme"
|
html_theme = "sphinx_rtd_theme"
|
||||||
html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
|
html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
|
||||||
except ImportError:
|
except ImportError as e:
|
||||||
|
print("error " + str(e))
|
||||||
pass
|
pass
|
||||||
|
|
||||||
# -- Options for LaTeX output ---------------------------------------------
|
# -- Options for LaTeX output ---------------------------------------------
|
||||||
|
@ -1,255 +0,0 @@
|
|||||||
.. _consumption:
|
|
||||||
|
|
||||||
Consumption
|
|
||||||
###########
|
|
||||||
|
|
||||||
Once you've got Paperless set up, you need to start feeding documents into it.
|
|
||||||
Currently, there are three options: the consumption directory, IMAP (email), and
|
|
||||||
HTTP POST.
|
|
||||||
|
|
||||||
|
|
||||||
.. _consumption-directory:
|
|
||||||
|
|
||||||
The Consumption Directory
|
|
||||||
=========================
|
|
||||||
|
|
||||||
The primary method of getting documents into your database is by putting them in
|
|
||||||
the consumption directory. The ``document_consumer`` script runs in an infinite
|
|
||||||
loop looking for new additions to this directory and when it finds them, it goes
|
|
||||||
about the process of parsing them with the OCR, indexing what it finds, and
|
|
||||||
encrypting the PDF (if ``PAPERLESS_PASSPHRASE`` is set), storing it in the
|
|
||||||
media directory.
|
|
||||||
|
|
||||||
Getting stuff into this directory is up to you. If you're running Paperless
|
|
||||||
on your local computer, you might just want to drag and drop files there, but if
|
|
||||||
you're running this on a server and want your scanner to automatically push
|
|
||||||
files to this directory, you'll need to set up some sort of service to accept the
|
|
||||||
files from the scanner. Typically, you're looking at an FTP server like
|
|
||||||
`Proftpd`_ or an SMB file share such as `Samba`_.
|
|
||||||
|
|
||||||
.. _Proftpd: http://www.proftpd.org/
|
|
||||||
.. _Samba: http://www.samba.org/
|
|
||||||
|
|
||||||
So where is this consumption directory? It's wherever you define it. Look for
|
|
||||||
the ``CONSUMPTION_DIR`` value in ``settings.py``. Set that to somewhere
|
|
||||||
appropriate for your use and put some documents in there. When you're ready,
|
|
||||||
follow the :ref:`consumer <utilities-consumer>` instructions to get it running.
|
|
||||||
|
|
||||||
|
|
||||||
.. _consumption-directory-hook:
|
|
||||||
|
|
||||||
Hooking into the Consumption Process
|
|
||||||
------------------------------------
|
|
||||||
|
|
||||||
Sometimes you may want to do something arbitrary whenever a document is
|
|
||||||
consumed. Rather than try to predict what you may want to do, Paperless lets
|
|
||||||
you execute scripts of your own choosing just before or after a document is
|
|
||||||
consumed using a couple of simple hooks.
|
|
||||||
|
|
||||||
Just write a script, put it somewhere that Paperless can read & execute, and
|
|
||||||
then put the path to that script in ``paperless.conf`` with the variable name
|
|
||||||
of either ``PAPERLESS_PRE_CONSUME_SCRIPT`` or
|
|
||||||
``PAPERLESS_POST_CONSUME_SCRIPT``. The script will be executed before or
|
|
||||||
after the document is consumed, respectively.
|
|
||||||
|
|
||||||
.. important::
|
|
||||||
|
|
||||||
These scripts are executed in a **blocking** process, which means that if
|
|
||||||
a script takes a long time to run, it can significantly slow down your
|
|
||||||
document consumption flow. If you want things to run asynchronously,
|
|
||||||
you'll have to fork the process in your script and exit.
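One way to avoid blocking the consumer is to have the hook script start the slow work as a separate process and return immediately. The following is a minimal sketch, assuming the hook is written in Python and runs on a POSIX system; ``spawn_detached`` is a hypothetical helper name, not part of Paperless.

```python
import subprocess


def spawn_detached(command):
    """Start `command` in the background and return without waiting for it.

    `command` is a list such as ["/path/to/slow-worker", "arg"]; the worker
    itself is a hypothetical program that you provide.
    """
    return subprocess.Popen(
        command,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        # POSIX only: run the child in its own session so it is not tied
        # to the hook script's process group.
        start_new_session=True,
    )
```

A hook script built on this would call ``spawn_detached`` with its worker command and the arguments it received, then exit right away so that consumption can continue.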
|
|
||||||
|
|
||||||
|
|
||||||
.. _consumption-directory-hook-variables:
|
|
||||||
|
|
||||||
What Can These Scripts Do?
|
|
||||||
..........................
|
|
||||||
|
|
||||||
It's your script, so you're only limited by your imagination and the laws of
|
|
||||||
physics. However, the following values are passed to the scripts in order:
|
|
||||||
|
|
||||||
|
|
||||||
.. _consumption-director-hook-variables-pre:
|
|
||||||
|
|
||||||
Pre-consumption script
|
|
||||||
::::::::::::::::::::::
|
|
||||||
|
|
||||||
* Document file name
|
|
||||||
|
|
||||||
A common example for this would be creating a simple script like
|
|
||||||
this:
|
|
||||||
|
|
||||||
``/usr/local/bin/ocr-pdf``
|
|
||||||
|
|
||||||
.. code:: bash
|
|
||||||
|
|
||||||
#!/usr/bin/env bash
|
|
||||||
# Quote the argument so that file names containing spaces are passed intact.
pdf2pdfocr.py -i "${1}"
|
|
||||||
|
|
||||||
``/etc/paperless.conf``
|
|
||||||
|
|
||||||
.. code:: bash
|
|
||||||
|
|
||||||
...
|
|
||||||
PAPERLESS_PRE_CONSUME_SCRIPT="/usr/local/bin/ocr-pdf"
|
|
||||||
...
|
|
||||||
|
|
||||||
This will pass the path to the document about to be consumed to ``/usr/local/bin/ocr-pdf``,
|
|
||||||
which will in turn call `pdf2pdfocr.py`_ on your document, which will then
|
|
||||||
overwrite the file with an OCR'd version of the file and exit. At that point,
|
|
||||||
the consumption process will begin with the newly modified file.
|
|
||||||
|
|
||||||
.. _pdf2pdfocr.py: https://github.com/LeoFCardoso/pdf2pdfocr
|
|
||||||
|
|
||||||
|
|
||||||
.. _consumption-director-hook-variables-post:
|
|
||||||
|
|
||||||
Post-consumption script
|
|
||||||
:::::::::::::::::::::::
|
|
||||||
|
|
||||||
* Document id
|
|
||||||
* Generated file name
|
|
||||||
* Source path
|
|
||||||
* Thumbnail path
|
|
||||||
* Download URL
|
|
||||||
* Thumbnail URL
|
|
||||||
* Correspondent
|
|
||||||
* Tags
|
|
||||||
|
|
||||||
The script can be in any language you like, but for a simple shell script
|
|
||||||
example, you can take a look at ``post-consumption-example.sh`` in the
|
|
||||||
``scripts`` directory in this project.
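If you would rather write the post-consumption script in Python, the positional arguments can be mapped to names in the order listed above. This is a minimal sketch: the field names are illustrative choices, not identifiers used by Paperless, and the tags value is kept as the raw string because its exact format is not described here.

```python
from collections import namedtuple

# Field order follows the list of values passed to the post-consumption script.
PostConsumeArgs = namedtuple(
    "PostConsumeArgs",
    [
        "document_id",
        "file_name",
        "source_path",
        "thumbnail_path",
        "download_url",
        "thumbnail_url",
        "correspondent",
        "tags",
    ],
)


def parse_post_consume_args(argv):
    """Map argv (including the script name at index 0) to named fields."""
    values = argv[1:]
    if len(values) != len(PostConsumeArgs._fields):
        raise ValueError(
            "expected {} arguments, got {}".format(
                len(PostConsumeArgs._fields), len(values)
            )
        )
    return PostConsumeArgs(*values)
```

In a real script you would call ``parse_post_consume_args(sys.argv)`` and then act on fields such as ``source_path``. All values arrive as strings.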
|
|
||||||
|
|
||||||
|
|
||||||
.. _consumption-imap:
|
|
||||||
|
|
||||||
IMAP (Email)
|
|
||||||
============
|
|
||||||
|
|
||||||
Another handy way to get documents into your database is to email them to
|
|
||||||
yourself. The typical use-case would be to be out for lunch and want to send a
|
|
||||||
copy of the receipt back to your system at home. Paperless can be taught to
|
|
||||||
pull emails down from an arbitrary account and dump them into the consumption
|
|
||||||
directory where the process :ref:`above <consumption-directory>` will follow the
|
|
||||||
usual pattern of consuming the document.
|
|
||||||
|
|
||||||
Some things you need to know about this feature:
|
|
||||||
|
|
||||||
* It's disabled by default. By setting the values below it will be enabled.
|
|
||||||
* It's been tested in a limited environment, so it may not work for you (please
|
|
||||||
submit a pull request if you can!)
|
|
||||||
* It's designed to **delete mail from the server once consumed**. So don't go
|
|
||||||
pointing this to your personal email account and wonder where all your stuff
|
|
||||||
went.
|
|
||||||
* Currently, only one photo (attachment) per email will work.
|
|
||||||
|
|
||||||
So, with all that in mind, here's what you do to get it running:
|
|
||||||
|
|
||||||
1. Set up a new email account somewhere, or if you're feeling daring, create a
|
|
||||||
folder in an existing email box and note the path to that folder.
|
|
||||||
2. In ``/etc/paperless.conf`` set all of the appropriate values in
|
|
||||||
``PATHS AND FOLDERS`` and ``SECURITY``.
|
|
||||||
If you decided to use a subfolder of an existing account, then make sure you
|
|
||||||
set ``PAPERLESS_CONSUME_MAIL_INBOX`` accordingly here. You also have to set
|
|
||||||
the ``PAPERLESS_EMAIL_SECRET`` to something you can remember 'cause you'll
|
|
||||||
have to include that in every email you send.
|
|
||||||
3. Restart the :ref:`consumer <utilities-consumer>`. The consumer will check
|
|
||||||
the configured email account at startup and from then on every 10 minutes
|
|
||||||
for something new and pulls down whatever it finds.
|
|
||||||
4. Send yourself an email! Note that the subject is treated as the file name,
|
|
||||||
so if you set the subject to ``Correspondent - Title - tag,tag,tag``, you'll
|
|
||||||
get what you expect. Also, you must include the aforementioned secret
|
|
||||||
string in every email so the fetcher knows that it's safe to import.
|
|
||||||
Note that Paperless only allows the email title to consist of safe characters
|
|
||||||
to be imported. These consist of alpha-numeric characters and ``-_ ,.'``.
|
|
||||||
5. After a few minutes, the consumer will poll your mailbox, pull down the
|
|
||||||
message, and place the attachment in the consumption directory with the
|
|
||||||
appropriate name. A few minutes later, the consumer will import it like any
|
|
||||||
other file.
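Because the subject line doubles as the file name, it can be worth checking a subject against the safe character set from step 4 before sending. The sketch below is an illustration only: it assumes "alpha-numeric" means ASCII letters and digits, and ``is_safe_subject`` is a hypothetical helper, not the check Paperless performs internally.

```python
import re

# ASCII letters and digits plus the characters - _ space , . '
SAFE_SUBJECT_RE = re.compile(r"[A-Za-z0-9\-_ ,.']+")


def is_safe_subject(subject):
    """Return True if every character of the subject is in the safe set."""
    return SAFE_SUBJECT_RE.fullmatch(subject) is not None
```

For example, ``is_safe_subject("Some Company - Invoice 2016 - money,invoices")`` is true, while a subject containing ``#`` or ``/`` is rejected.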
|
|
||||||
|
|
||||||
|
|
||||||
.. _consumption-http:
|
|
||||||
|
|
||||||
HTTP POST
|
|
||||||
=========
|
|
||||||
|
|
||||||
You can also submit a document via HTTP POST, so long as you do so after
|
|
||||||
authenticating. To push your document to Paperless, send an HTTP POST to the
|
|
||||||
server with the following name/value pairs:
|
|
||||||
|
|
||||||
* ``correspondent``: The name of the document's correspondent. Note that there
|
|
||||||
are restrictions on what characters you can use here. Specifically,
|
|
||||||
alphanumeric characters, `-`, `,`, `.`, and `'` are ok, everything else is
|
|
||||||
out. You also can't use the sequence ` - ` (space, dash, space).
|
|
||||||
* ``title``: The title of the document. The rules for characters are the same
|
|
||||||
here as the correspondent.
|
|
||||||
* ``document``: The file you're uploading
|
|
||||||
|
|
||||||
Specify ``enctype="multipart/form-data"``, and then POST your file with::
|
|
||||||
|
|
||||||
Content-Disposition: form-data; name="document"; filename="whatever.pdf"
|
|
||||||
|
|
||||||
An example of this in HTML is a typical form:
|
|
||||||
|
|
||||||
.. code:: html
|
|
||||||
|
|
||||||
<form method="post" enctype="multipart/form-data">
|
|
||||||
<input type="text" name="correspondent" value="My Correspondent" />
|
|
||||||
<input type="text" name="title" value="My Title" />
|
|
||||||
<input type="file" name="document" />
|
|
||||||
<input type="submit" name="go" value="Do the thing" />
|
|
||||||
</form>
|
|
||||||
|
|
||||||
But a potentially more useful way to do this would be in Python. Here we use
|
|
||||||
the requests library to handle basic authentication and to send the POST data
|
|
||||||
to the URL.
|
|
||||||
|
|
||||||
.. code:: python
|
|
||||||
|
|
||||||
import os
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
import requests
|
|
||||||
from requests.auth import HTTPBasicAuth
|
|
||||||
|
|
||||||
# You authenticate via BasicAuth or with a session id.
|
|
||||||
# We use BasicAuth here
|
|
||||||
username = "my-username"
|
|
||||||
password = "my-super-secret-password"
|
|
||||||
|
|
||||||
# Where you have Paperless installed and listening
|
|
||||||
url = "http://localhost:8000/push"
|
|
||||||
|
|
||||||
# Document metadata
|
|
||||||
correspondent = "Test Correspondent"
|
|
||||||
title = "Test Title"
|
|
||||||
|
|
||||||
# The local file you want to push
|
|
||||||
path = "/path/to/some/directory/my-document.pdf"
|
|
||||||
|
|
||||||
|
|
||||||
with open(path, "rb") as f:
|
|
||||||
|
|
||||||
response = requests.post(
|
|
||||||
url=url,
|
|
||||||
data={"title": title, "correspondent": correspondent},
|
|
||||||
files={"document": (os.path.basename(path), f, "application/pdf")},
|
|
||||||
auth=HTTPBasicAuth(username, password),
|
|
||||||
allow_redirects=False
|
|
||||||
)
|
|
||||||
|
|
||||||
if response.status_code == 202:
|
|
||||||
|
|
||||||
# Everything worked out ok
|
|
||||||
print("Upload successful")
|
|
||||||
|
|
||||||
else:
|
|
||||||
|
|
||||||
# If you don't get a 202, it's probably because your credentials
|
|
||||||
# are wrong or something. This will give you a rough idea of what
|
|
||||||
# happened.
|
|
||||||
|
|
||||||
print("We got HTTP status code: {}".format(response.status_code))
|
|
||||||
for k, v in response.headers.items():
|
|
||||||
print("{}: {}".format(k, v))
|
|
@ -1,42 +0,0 @@
|
|||||||
.. _customising:
|
|
||||||
|
|
||||||
Customising Paperless
|
|
||||||
#####################
|
|
||||||
|
|
||||||
Currently, Paperless' interface is just the default Django admin, which
|
|
||||||
while powerful, is rather boring. If you'd like to give the site a bit of a
|
|
||||||
face-lift, or if you simply want to adjust the colours, contrast, or font size
|
|
||||||
to make things easier to read, you can do that by adding your own CSS or
|
|
||||||
Javascript quite easily.
|
|
||||||
|
|
||||||
|
|
||||||
.. _customising-overrides:
|
|
||||||
|
|
||||||
Overrides
|
|
||||||
=========
|
|
||||||
|
|
||||||
On every page load, Paperless looks for two files in your media root directory
|
|
||||||
(the directory defined by your ``PAPERLESS_MEDIADIR`` configuration variable or
|
|
||||||
the default, ``<project root>/media/``):
|
|
||||||
|
|
||||||
* ``overrides.css``
|
|
||||||
* ``overrides.js``
|
|
||||||
|
|
||||||
If it finds either or both of those files, they'll be loaded into the page: the
|
|
||||||
CSS in the ``<head>``, and the Javascript stuffed into the last line of the
|
|
||||||
``<body>``.
|
|
||||||
|
|
||||||
|
|
||||||
.. _customising-overrides-note:
|
|
||||||
|
|
||||||
An important note about customisation
|
|
||||||
-------------------------------------
|
|
||||||
|
|
||||||
Any changes you make to the site with your CSS or Javascript are likely to
|
|
||||||
depend on the structure of the current HTML and/or the existing CSS rules. For
|
|
||||||
the most part it's safe to assume that these bits won't change, but *sometimes
|
|
||||||
they do* as features are added or bugs are fixed.
|
|
||||||
|
|
||||||
If you make a change that you think others would appreciate though, submit it
|
|
||||||
as a pull request and maybe we can find a way to work it into the project by
|
|
||||||
default!
|
|
@ -1,131 +0,0 @@
|
|||||||
.. _guesswork:
|
|
||||||
|
|
||||||
Guesswork
|
|
||||||
#########
|
|
||||||
|
|
||||||
During the consumption process, Paperless tries to guess some of the attributes
|
|
||||||
of the document it's looking at. To do this it uses two approaches:
|
|
||||||
|
|
||||||
|
|
||||||
.. _guesswork-naming:
|
|
||||||
|
|
||||||
File Naming
|
|
||||||
===========
|
|
||||||
|
|
||||||
Any document you put into the consumption directory will be consumed, but if
|
|
||||||
you name the file right, it'll automatically set some values in the database
|
|
||||||
for you. This is the logic the consumer follows:
|
|
||||||
|
|
||||||
1. Try to find the correspondent, title, and tags in the file name following
|
|
||||||
the pattern: ``Date - Correspondent - Title - tag,tag,tag.pdf``. Note that
|
|
||||||
the format of the date is **rigidly defined** as ``YYYYMMDDHHMMSSZ`` or
|
|
||||||
``YYYYMMDDZ``. The ``Z`` refers to "Zulu time", AKA "UTC".
|
|
||||||
The tags are optional, so the format ``Date - Correspondent - Title.pdf``
|
|
||||||
works as well.
|
|
||||||
2. If that doesn't work, we skip the date and try this pattern:
|
|
||||||
``Correspondent - Title - tag,tag,tag.pdf``.
|
|
||||||
3. If that doesn't work, we try to find the correspondent and title in the file
|
|
||||||
name following the pattern: ``Correspondent - Title.pdf``.
|
|
||||||
4. If that doesn't work, just assume that the name of the file is the title.
|
|
||||||
|
|
||||||
So given the above, the following examples would work as you'd expect:
|
|
||||||
|
|
||||||
* ``20150314000700Z - Some Company Name - Invoice 2016-01-01 - money,invoices.pdf``
|
|
||||||
* ``20150314Z - Some Company Name - Invoice 2016-01-01 - money,invoices.pdf``
|
|
||||||
* ``Some Company Name - Invoice 2016-01-01 - money,invoices.pdf``
|
|
||||||
* ``Another Company - Letter of Reference.jpg``
|
|
||||||
* ``Dad's Recipe for Pancakes.png``
|
|
||||||
|
|
||||||
These however wouldn't work:
|
|
||||||
|
|
||||||
* ``2015-03-14 00:07:00 UTC - Some Company Name, Invoice 2016-01-01, money, invoices.pdf``
|
|
||||||
* ``2015-03-14 - Some Company Name, Invoice 2016-01-01, money, invoices.pdf``
|
|
||||||
* ``Some Company Name, Invoice 2016-01-01, money, invoices.pdf``
|
|
||||||
* ``Another Company- Letter of Reference.jpg``
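The fallback logic above can be sketched in a few lines of Python. This is a simplified illustration, not the consumer's actual code: it splits the name on ``" - "`` rather than using the consumer's own patterns, and ``guess_attributes`` is a hypothetical name.

```python
import os
import re

# The rigidly defined date formats: YYYYMMDDHHMMSSZ or YYYYMMDDZ.
DATE_RE = re.compile(r"^(\d{14}|\d{8})Z$")


def guess_attributes(filename):
    """Guess date, correspondent, title and tags from a file name."""
    stem, _extension = os.path.splitext(filename)
    # Step 4 is the default: the whole name (without extension) is the title.
    guess = {"date": None, "correspondent": None, "title": stem, "tags": []}

    parts = stem.split(" - ")
    date = None
    # Step 1: a leading date is only accepted if a correspondent and
    # a title follow it.
    if len(parts) > 2 and DATE_RE.match(parts[0]):
        date = parts.pop(0)

    if len(parts) == 3:
        # Correspondent - Title - tag,tag,tag
        correspondent, title, tags = parts
        guess.update(date=date, correspondent=correspondent,
                     title=title, tags=tags.split(","))
    elif len(parts) == 2:
        # Correspondent - Title
        correspondent, title = parts
        guess.update(date=date, correspondent=correspondent, title=title)
    return guess
```

With this sketch, ``Another Company - Letter of Reference.jpg`` yields a correspondent and a title, while ``Another Company- Letter of Reference.jpg`` (no space before the dash) falls through to step 4 and becomes a title only.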
|
|
||||||
|
|
||||||
Do I have to be so strict about naming?
|
|
||||||
---------------------------------------
|
|
||||||
Rather than using the strict document naming rules, one can also set the option
|
|
||||||
``PAPERLESS_FILENAME_DATE_ORDER`` in ``paperless.conf`` to any date order
|
|
||||||
that is accepted by dateparser_. Doing so will cause ``paperless`` to default
|
|
||||||
to any date format that is found in the title, instead of a date pulled from
|
|
||||||
the document's text, without requiring the strict formatting of the document
|
|
||||||
filename as described above.
|
|
||||||
|
|
||||||
.. _dateparser: https://github.com/scrapinghub/dateparser/blob/v0.7.0/docs/usage.rst#settings
|
|
||||||
|
|
||||||
Transforming filenames for parsing
|
|
||||||
----------------------------------
|
|
||||||
Some devices can't produce filenames that can be parsed by the default
|
|
||||||
parser. By configuring the option ``PAPERLESS_FILENAME_PARSE_TRANSFORMS`` in
|
|
||||||
``paperless.conf`` one can add transformations that are applied to the filename
|
|
||||||
before it's parsed.
|
|
||||||
|
|
||||||
The option contains a list of dictionaries of regular expressions (key:
|
|
||||||
``pattern``) and replacements (key: ``repl``) in JSON format, which are
|
|
||||||
applied in order by passing them to ``re.subn``. Transformation stops
|
|
||||||
after the first match, so at most one transformation is applied. The general
|
|
||||||
syntax is
|
|
||||||
|
|
||||||
.. code:: python
|
|
||||||
|
|
||||||
[{"pattern":"pattern1", "repl":"repl1"}, {"pattern":"pattern2", "repl":"repl2"}, ..., {"pattern":"patternN", "repl":"replN"}]
|
|
||||||
|
|
||||||
The example below is for a Brother ADS-2400N, a scanner that allows
|
|
||||||
different names to be assigned to different hardware buttons (useful for handling
|
|
||||||
multiple entities in one instance), but insists on adding ``_<count>``
|
|
||||||
to the filename.
|
|
||||||
|
|
||||||
.. code:: python
|
|
||||||
|
|
||||||
# Brother profile configuration, support "Name_Date_Count" (the default
|
|
||||||
# setting) and "Name_Count" (use "Name" as tag and "Count" as title).
|
|
||||||
PAPERLESS_FILENAME_PARSE_TRANSFORMS=[{"pattern":"^([a-z]+)_(\\d{8})_(\\d{6})_([0-9]+)\\.", "repl":"\\2\\3Z - \\4 - \\1."}, {"pattern":"^([a-z]+)_([0-9]+)\\.", "repl":" - \\2 - \\1."}]
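To check a transform list before putting it into ``paperless.conf``, the described behaviour (apply each pattern in order with ``re.subn`` and stop after the first match) can be reproduced in a short script. This is a sketch of that behaviour as described here, not the parser's own code; ``apply_transforms`` and the sample file names are made up for illustration.

```python
import json
import re

# The Brother example from above, exactly as it would appear in the setting.
BROTHER_TRANSFORMS = (
    r'[{"pattern":"^([a-z]+)_(\\d{8})_(\\d{6})_([0-9]+)\\.", '
    r'"repl":"\\2\\3Z - \\4 - \\1."}, '
    r'{"pattern":"^([a-z]+)_([0-9]+)\\.", "repl":" - \\2 - \\1."}]'
)


def apply_transforms(filename, transforms_json):
    """Apply the first matching transformation and return the new name."""
    for transform in json.loads(transforms_json):
        new_name, count = re.subn(
            transform["pattern"], transform["repl"], filename
        )
        if count > 0:
            # At most one transformation is applied.
            return new_name
    return filename


print(apply_transforms("invoices_20200101_120000_0001.pdf", BROTHER_TRANSFORMS))
# prints "20200101120000Z - 0001 - invoices.pdf"
```

Note the doubled backslashes: the setting is JSON, so ``\\d`` in the configuration becomes ``\d`` in the regular expression.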
|
|
||||||
|
|
||||||
.. _guesswork-content:
|
|
||||||
|
|
||||||
Reading the Document Contents
|
|
||||||
=============================
|
|
||||||
|
|
||||||
After the consumer has tried to figure out what it could from the file name,
|
|
||||||
it starts looking at the content of the document itself. It will compare the
|
|
||||||
matching algorithms defined by every tag and correspondent already set in your
|
|
||||||
database to see if they apply to the text in that document. In other words,
|
|
||||||
if you defined a tag called ``Home Utility`` that had a ``match`` property of
|
|
||||||
``bc hydro`` and a ``matching_algorithm`` of ``literal``, Paperless will
|
|
||||||
automatically tag your newly-consumed document with your ``Home Utility`` tag
|
|
||||||
so long as the text ``bc hydro`` appears in the body of the document somewhere.
|
|
||||||
|
|
||||||
The matching logic is quite powerful, and supports searching the text of your
|
|
||||||
document with different algorithms, and as such, some experimentation may be
|
|
||||||
necessary to get things Just Right.
|
|
||||||
|
|
||||||
|
|
||||||
.. _guesswork-content-howto:
|
|
||||||
|
|
||||||
How Do I Set Up These Matching Algorithms?
|
|
||||||
------------------------------------------
|
|
||||||
|
|
||||||
Setting up of the algorithms is easily done through the admin interface. When
|
|
||||||
you create a new correspondent or tag, there are optional fields for matching
|
|
||||||
text and matching algorithm. From the help info there:
|
|
||||||
|
|
||||||
.. note::
|
|
||||||
|
|
||||||
Which algorithm you want to use when matching text to the OCR'd PDF. Here,
|
|
||||||
"any" looks for any occurrence of any word provided in the PDF, while "all"
|
|
||||||
requires that every word provided appear in the PDF, albeit not in the
|
|
||||||
order provided. A "literal" match means that the text you enter must
|
|
||||||
appear in the PDF exactly as you've entered it, and "regular expression"
|
|
||||||
uses a regex to match the PDF. If you don't know what a regex is, you
|
|
||||||
probably don't want this option.
|
|
||||||
|
|
||||||
When using the "any" or "all" matching algorithms, you can search for terms
|
|
||||||
that consist of multiple words by enclosing them in double quotes. For example,
|
|
||||||
defining a match text of ``"Bank of America" BofA`` using the "any" algorithm,
|
|
||||||
will match documents that contain either "Bank of America" or "BofA", but will
|
|
||||||
not match documents containing "Bank of South America".
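The four algorithms, including the handling of quoted terms, can be sketched roughly as follows. This is an illustration of the rules described here rather than Paperless' implementation: matching on word boundaries and ignoring case are assumptions, and the function names are hypothetical.

```python
import re


def split_terms(match_text):
    """Split match text into terms, keeping double-quoted phrases together."""
    pairs = re.findall(r'"([^"]+)"|(\S+)', match_text)
    return [quoted or bare for quoted, bare in pairs]


def contains(term, text):
    """Case-insensitive search for `term` on word boundaries (assumed)."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


def matches(match_text, algorithm, text):
    """Return True if `text` satisfies `match_text` under `algorithm`."""
    if algorithm == "any":
        return any(contains(term, text) for term in split_terms(match_text))
    if algorithm == "all":
        return all(contains(term, text) for term in split_terms(match_text))
    if algorithm == "literal":
        return contains(match_text, text)
    if algorithm == "regular expression":
        return re.search(match_text, text, re.IGNORECASE) is not None
    raise ValueError("unknown matching algorithm: " + algorithm)
```

With the match text ``"Bank of America" BofA`` and the "any" algorithm, a document containing "Bank of South America" does not match, because neither the quoted phrase nor ``BofA`` occurs in it.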
|
|
||||||
|
|
||||||
Then just save your tag/correspondent and run another document through the
|
|
||||||
consumer. Once complete, you should see the newly-created document,
|
|
||||||
automatically tagged with the appropriate data.
|
|
@ -4,8 +4,8 @@ Paperless
|
|||||||
=========
|
=========
|
||||||
|
|
||||||
Paperless is a simple Django application running in two parts:
|
Paperless is a simple Django application running in two parts:
|
||||||
a :ref:`consumer <utilities-consumer>` (the thing that does the indexing) and
|
a *Consumer* (the thing that does the indexing) and
|
||||||
the :ref:`webserver <utilities-webserver>` (the part that lets you search &
|
the *Web server* (the part that lets you search &
|
||||||
download already-indexed documents). If you want to learn more about its
|
download already-indexed documents). If you want to learn more about its
|
||||||
functions keep on reading after the installation section.
|
functions keep on reading after the installation section.
|
||||||
|
|
||||||
@ -25,26 +25,34 @@ finding stuff again. I feed documents right from the post box into the scanner
|
|||||||
and then shred them. Perhaps you might find it useful too.
|
and then shred them. Perhaps you might find it useful too.
|
||||||
|
|
||||||
|
|
||||||
|
Paperless-ng
|
||||||
|
============
|
||||||
|
|
||||||
|
I wanted to make big changes to the project that will impact the way it is used
|
||||||
|
by its users greatly. Among the users who currently use paperless in production
|
||||||
|
there are probably many that don't want these changes right away. I also wanted
|
||||||
|
to have more control over what goes into the code and what does not. Therefore,
|
||||||
|
paperless-ng was created. NG stands for both Angular (the framework used for the
|
||||||
|
Frontend) and next-gen. Publishing this project under a different name also
|
||||||
|
avoids confusion between paperless and paperless-ng.
|
||||||
|
|
||||||
|
It would be great if this project could eventually merge back into the main
|
||||||
|
repository, but it needs a lot more work before that can happen.
|
||||||
|
|
||||||
|
|
||||||
Contents
|
Contents
|
||||||
========
|
========
|
||||||
|
|
||||||
.. toctree::
|
.. toctree::
|
||||||
:maxdepth: 2
|
:maxdepth: 1
|
||||||
|
|
||||||
requirements
|
|
||||||
setup
|
setup
|
||||||
consumption
|
usage_overview
|
||||||
|
advanced_usage
|
||||||
|
administration
|
||||||
api
|
api
|
||||||
utilities
|
|
||||||
guesswork
|
|
||||||
migrating
|
|
||||||
customising
|
|
||||||
extending
|
extending
|
||||||
troubleshooting
|
troubleshooting
|
||||||
contributing
|
contributing
|
||||||
scanners
|
scanners
|
||||||
screenshots
|
|
||||||
changelog
|
changelog
|
||||||
changelog_jonaswinkler
|
|
||||||
|
@ -1,109 +0,0 @@
|
|||||||
.. _migrating:
|
|
||||||
|
|
||||||
Migrating, Updates, and Backups
|
|
||||||
===============================
|
|
||||||
|
|
||||||
As Paperless is still under active development, there's a lot that can change
|
|
||||||
as software updates roll out. You should backup often, so if anything goes
|
|
||||||
wrong during an update, you at least have a means of restoring to something
|
|
||||||
usable. Thankfully, there are automated ways of backing up, restoring, and
|
|
||||||
updating the software.
|
|
||||||
|
|
||||||
|
|
||||||
.. _migrating-backup:
|
|
||||||
|
|
||||||
Backing Up
|
|
||||||
----------
|
|
||||||
|
|
||||||
So you're bored of this whole project, or you want to make a remote backup of
|
|
||||||
your files for whatever reason. This is easy to do: simply use the
|
|
||||||
:ref:`exporter <utilities-exporter>` to dump your documents and database out
|
|
||||||
into an arbitrary directory.
|
|
||||||
|
|
||||||
|
|
||||||
.. _migrating-restoring:
|
|
||||||
|
|
||||||
Restoring
|
|
||||||
---------
|
|
||||||
|
|
||||||
Restoring your data is just as easy, since nearly all of your data exists either
|
|
||||||
in the file names, or in the contents of the files themselves. You just need to
|
|
||||||
create an empty database (just follow the
|
|
||||||
:ref:`installation instructions <setup-installation>` again) and then import the
|
|
||||||
``tags.json`` file you created as part of your backup. Lastly, copy your
|
|
||||||
exported documents into the consumption directory and start up the consumer.
|
|
||||||
|
|
||||||
.. code-block:: shell-session
|
|
||||||
|
|
||||||
$ cd /path/to/project
|
|
||||||
$ rm data/db.sqlite3 # Delete the database
|
|
||||||
$ cd src
|
|
||||||
$ ./manage.py migrate # Create the database
|
|
||||||
$ ./manage.py createsuperuser
|
|
||||||
$ ./manage.py loaddata /path/to/arbitrary/place/tags.json
|
|
||||||
$ cp /path/to/exported/docs/* /path/to/consumption/dir/
|
|
||||||
$ ./manage.py document_consumer
|
|
||||||
|
|
||||||
Importing your data if you are :ref:`using Docker <setup-installation-docker>`
|
|
||||||
is almost as simple:
|
|
||||||
|
|
||||||
.. code-block:: shell-session
|
|
||||||
|
|
||||||
# Stop and remove your current containers
|
|
||||||
$ docker-compose stop
|
|
||||||
$ docker-compose rm -f
|
|
||||||
|
|
||||||
# Recreate them, add the superuser
|
|
||||||
$ docker-compose up -d
|
|
||||||
$ docker-compose run --rm webserver createsuperuser
|
|
||||||
|
|
||||||
# Load the tags
|
|
||||||
$ cat /path/to/arbitrary/place/tags.json | docker-compose run --rm webserver loaddata_stdin -
|
|
||||||
|
|
||||||
# Load your exported documents into the consumption directory
|
|
||||||
# (How you do this highly depends on how you have set this up)
|
|
||||||
$ cp /path/to/exported/docs/* /path/to/mounted/consumption/dir/
|
|
||||||
|
|
||||||
After loading the documents into the consumption directory the consumer will
|
|
||||||
immediately start consuming the documents.
|
|
||||||
|
|
||||||
|
|
||||||
.. _migrating-updates:
|
|
||||||
|
|
||||||
Updates
|
|
||||||
-------
|
|
||||||
|
|
||||||
For the most part, all you have to do to update Paperless is run ``git pull``
|
|
||||||
on the directory containing the project files, and then use Django's
|
|
||||||
``migrate`` command to execute any database schema updates that might have been
|
|
||||||
rolled in as part of the update:
|
|
||||||
|
|
||||||
.. code-block:: shell-session
|
|
||||||
|
|
||||||
$ cd /path/to/project
|
|
||||||
$ git pull
|
|
||||||
$ pip install -r requirements.txt
|
|
||||||
$ cd src
|
|
||||||
$ ./manage.py migrate
|
|
||||||
|
|
||||||
Note that it's possible (even likely) that while ``git pull`` may update some
|
|
||||||
files, the ``migrate`` step may not update anything. This is totally normal.
|
|
||||||
|
|
||||||
Additionally, as new features are added, the ability to control those features
|
|
||||||
is typically added by way of an environment variable set in ``paperless.conf``.
|
|
||||||
You may want to take a look at the ``paperless.conf.example`` file to see if
|
|
||||||
there's anything new in there compared to what you've got in ``/etc``.
|
|
||||||
|
|
||||||
If you are :ref:`using Docker <setup-installation-docker>` the update process
|
|
||||||
is similar:
|
|
||||||
|
|
||||||
.. code-block:: shell-session
|
|
||||||
|
|
||||||
$ cd /path/to/project
|
|
||||||
$ git pull
|
|
||||||
$ docker build -t paperless .
|
|
||||||
$ docker-compose run --rm consumer migrate
|
|
||||||
$ docker-compose up -d
|
|
||||||
|
|
||||||
If ``git pull`` doesn't report any changes, there is no need to continue with
|
|
||||||
the remaining steps.
|
|
@ -1,125 +0,0 @@
|
|||||||
.. _requirements:
|
|
||||||
|
|
||||||
Requirements
|
|
||||||
============
|
|
||||||
|
|
||||||
You need a Linux machine or Unix-like setup (theoretically an Apple machine
|
|
||||||
should work) that has the following software installed:
|
|
||||||
|
|
||||||
* `Python3`_ (with development libraries, pip and virtualenv)
|
|
||||||
* `GNU Privacy Guard`_
|
|
||||||
* `Tesseract`_, plus its language files matching your document base.
|
|
||||||
* `Imagemagick`_ version 6.7.5 or higher
|
|
||||||
* `unpaper`_
|
|
||||||
* `libpoppler-cpp-dev`_ PDF rendering library
|
|
||||||
* `optipng`_
|
|
||||||
|
|
||||||
.. _Python3: https://python.org/
|
|
||||||
.. _GNU Privacy Guard: https://gnupg.org
|
|
||||||
.. _Tesseract: https://github.com/tesseract-ocr
|
|
||||||
.. _Imagemagick: http://imagemagick.org/
|
|
||||||
.. _unpaper: https://github.com/unpaper/unpaper
|
|
||||||
.. _libpoppler-cpp-dev: https://poppler.freedesktop.org/
|
|
||||||
.. _optipng: http://optipng.sourceforge.net/
|
|
||||||
|
|
||||||
Notably, you should confirm how you access your Python3 installation. Many
|
|
||||||
Linux distributions will install Python3 in parallel to Python2, using the
|
|
||||||
names ``python3`` and ``python`` respectively. The same goes for ``pip3`` and
|
|
||||||
``pip``. Running Paperless with Python2 will likely break things, so make sure
|
|
||||||
that you're using the right version.
|
|
||||||
|
|
||||||
For the purposes of simplicity, ``python`` and ``pip`` are used everywhere to
|
|
||||||
refer to their Python3 versions.
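If you are unsure which interpreter a given command name resolves to, you can ask the interpreter itself. The following is only a minimal sketch and not part of Paperless; the helper name ``is_python3`` and the file name ``check.py`` are made up for illustration.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given interpreter version is Python 3 or newer."""
    # Compare only the major version, e.g. (3,) >= (3,) or (2,) >= (3,).
    return tuple(version_info[:1]) >= (3,)


if __name__ == "__main__":
    # Run this with the same command you plan to use for Paperless,
    # for example ``python check.py`` or ``python3 check.py``.
    print(sys.executable, "is Python 3:", is_python3())
```

If the script reports ``False`` for ``python``, try ``python3`` (and ``pip3``) instead.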
|
|
||||||
|
|
||||||
In addition to the above, there are a number of Python requirements, all of
|
|
||||||
which are listed in a file called ``requirements.txt`` in the project root
|
|
||||||
directory.
|
|
||||||
|
|
||||||
If you're not working on a virtual environment (like Docker), you
|
|
||||||
should probably be using a virtualenv, but that's your call. The reasons why
|
|
||||||
you might choose a virtualenv or not aren't really within the scope of this
|
|
||||||
document. Needless to say if you don't know what a virtualenv is, you should
|
|
||||||
probably figure that out before continuing.
|
|
||||||
|
|
||||||
|
|
||||||
.. _requirements-apple:
|
|
||||||
|
|
||||||
Problems with Imagemagick & PDFs
|
|
||||||
--------------------------------
|
|
||||||
|
|
||||||
Some users have `run into problems`_ with getting ImageMagick to do its thing
|
|
||||||
with PDFs. Often this is the case with Apple systems using HomeBrew, but other
|
|
||||||
Linuxes have been a problem as well. The solution appears to be to install
|
|
||||||
ghostscript as well as ImageMagick:
|
|
||||||
|
|
||||||
.. _run into problems: https://github.com/the-paperless-project/paperless/issues/25
|
|
||||||
|
|
||||||
.. code:: bash
|
|
||||||
|
|
||||||
$ brew install ghostscript
|
|
||||||
$ brew install imagemagick
|
|
||||||
$ brew install libmagic
|
|
||||||
|
|
||||||
|
|
||||||
.. _requirements-baremetal:
|
|
||||||
|
|
||||||
Python-specific Requirements: No Virtualenv
|
|
||||||
-------------------------------------------
|
|
||||||
|
|
||||||
If you don't care to use a virtual env, then installation of the Python
|
|
||||||
dependencies is easy:
|
|
||||||
|
|
||||||
.. code:: bash
|
|
||||||
|
|
||||||
$ pip install --user --requirement /path/to/paperless/requirements.txt
|
|
||||||
|
|
||||||
This will download and install all of the requirements into
|
|
||||||
``${HOME}/.local``. Remember that your distribution may be using ``pip3`` as
|
|
||||||
mentioned above.
|
|
||||||
|
|
||||||
|
|
||||||
.. _requirements-virtualenv:
|
|
||||||
|
|
||||||
Python-specific Requirements: Virtualenv
|
|
||||||
----------------------------------------
|
|
||||||
|
|
||||||
Using a virtualenv for this is pretty straightforward: create a virtualenv,
|
|
||||||
enter it, and install the requirements using the ``requirements.txt`` file:
|
|
||||||
|
|
||||||
.. code:: bash
|
|
||||||
|
|
||||||
$ virtualenv --python=/path/to/python3 /path/to/arbitrary/directory
|
|
||||||
$ . /path/to/arbitrary/directory/bin/activate
|
|
||||||
$ pip install --requirement /path/to/paperless/requirements.txt
|
|
||||||
|
|
||||||
Now you're ready to go. Just remember to enter (activate) your virtualenv
|
|
||||||
whenever you want to use Paperless.
|
|
||||||
|
|
||||||
|
|
||||||
.. _requirements-documentation:
|
|
||||||
|
|
||||||
Documentation
|
|
||||||
-------------
|
|
||||||
|
|
||||||
As generation of the documentation is not required for the use of Paperless,
|
|
||||||
dependencies for this process are not included in ``requirements.txt``. If
|
|
||||||
you'd like to generate your own docs locally, you'll need to:
|
|
||||||
|
|
||||||
.. code:: bash
|
|
||||||
|
|
||||||
$ pip install sphinx
|
|
||||||
|
|
||||||
and then cd into the ``docs`` directory and type ``make html``.
|
|
||||||
|
|
||||||
If you are using Docker, you can use the following commands to build the
|
|
||||||
documentation and run a webserver serving it on `port 8001`_:
|
|
||||||
|
|
||||||
.. code:: bash
|
|
||||||
|
|
||||||
$ pwd
|
|
||||||
/path/to/paperless
|
|
||||||
|
|
||||||
$ docker build -t paperless:docs -f docs/Dockerfile .
|
|
||||||
$ docker run --rm -it -p "8001:8000" paperless:docs
|
|
||||||
|
|
||||||
.. _port 8001: http://127.0.0.1:8001
|
|
@ -1,7 +1,8 @@
|
|||||||
.. _scanners:
|
.. _scanners:
|
||||||
|
|
||||||
Scanner Recommendations
|
***********************
|
||||||
=======================
|
Scanner recommendations
|
||||||
|
***********************
|
||||||
|
|
||||||
As Paperless operates by watching a folder for new files, it doesn't care what
|
As Paperless operates by watching a folder for new files, it doesn't care what
|
||||||
scanner you use, but sometimes finding a scanner that will write to an FTP,
|
scanner you use, but sometimes finding a scanner that will write to an FTP,
|
||||||
@ -23,16 +24,19 @@ that works right for you based on recommentations from other Paperless users.
|
|||||||
+---------+----------------+-----+-----+-----+----------------+
|
+---------+----------------+-----+-----+-----+----------------+
|
||||||
| Fujitsu | `ix500`_ | yes | | yes | `eonist`_ |
|
| Fujitsu | `ix500`_ | yes | | yes | `eonist`_ |
|
||||||
+---------+----------------+-----+-----+-----+----------------+
|
+---------+----------------+-----+-----+-----+----------------+
|
||||||
|
| Fujitsu | `S1300i`_ | yes | | yes | `jonaswinkler`_|
|
||||||
|
+---------+----------------+-----+-----+-----+----------------+
|
||||||
|
|
||||||
.. _ADS-1500W: https://www.brother.ca/en/p/ads1500w
|
.. _ADS-1500W: https://www.brother.ca/en/p/ads1500w
|
||||||
.. _MFC-J6930DW: https://www.brother.ca/en/p/MFCJ6930DW
|
.. _MFC-J6930DW: https://www.brother.ca/en/p/MFCJ6930DW
|
||||||
.. _MFC-J5910DW: https://www.brother.co.uk/printers/inkjet-printers/mfcj5910dw
|
.. _MFC-J5910DW: https://www.brother.co.uk/printers/inkjet-printers/mfcj5910dw
|
||||||
.. _MFC-9142CDN: https://www.brother.co.uk/printers/laser-printers/mfc9140cdn
|
.. _MFC-9142CDN: https://www.brother.co.uk/printers/laser-printers/mfc9140cdn
|
||||||
.. _ix500: http://www.fujitsu.com/us/products/computing/peripheral/scanners/scansnap/ix500/
|
.. _ix500: https://www.fujitsu.com/global/products/computing/peripheral/scanners/scansnap/ix500/
|
||||||
|
.. _S1300i: https://www.fujitsu.com/global/products/computing/peripheral/scanners/soho/s1300i/
|
||||||
|
|
||||||
.. _danielquinn: https://github.com/danielquinn
|
.. _danielquinn: https://github.com/danielquinn
|
||||||
.. _ayounggun: https://github.com/ayounggun
|
.. _ayounggun: https://github.com/ayounggun
|
||||||
.. _bmsleight: https://github.com/bmsleight
|
.. _bmsleight: https://github.com/bmsleight
|
||||||
.. _eonist: https://github.com/eonist
|
.. _eonist: https://github.com/eonist
|
||||||
.. _REOLDEV: https://github.com/REOLDEV
|
.. _REOLDEV: https://github.com/REOLDEV
|
||||||
|
.. _jonaswinkler: https://github.com/jonaswinkler
|
||||||
|
@ -1,16 +0,0 @@
|
|||||||
.. _screenshots:
|
|
||||||
|
|
||||||
Screenshots
|
|
||||||
===========
|
|
||||||
|
|
||||||
Once everything is set up, log in to Paperless using the web front-end
|
|
||||||
|
|
||||||
.. image:: ./_static/Screenshot_first_run_login.png
|
|
||||||
|
|
||||||
Nice clean interface
|
|
||||||
|
|
||||||
.. image:: ./_static/Screenshot_first_logged.png
|
|
||||||
|
|
||||||
Some documents loaded in via FTP or using the scanner's FTP.
|
|
||||||
|
|
||||||
.. image:: ./_static/Screenshot_upload_and_scanned.png
|
|
541
docs/setup.rst
541
docs/setup.rst
@ -1,500 +1,187 @@
|
|||||||
.. _setup:
|
|
||||||
|
|
||||||
|
*****
|
||||||
Setup
|
Setup
|
||||||
=====
|
*****
|
||||||
|
|
||||||
Paperless isn't a very complicated app, but there are a few components, so some
|
|
||||||
basic documentation is in order. If you follow along in this document and
|
|
||||||
still have trouble, please open an `issue on GitHub`_ so I can fill in the
|
|
||||||
gaps.
|
|
||||||
|
|
||||||
.. _issue on GitHub: https://github.com/the-paperless-project/paperless/issues
|
|
||||||
|
|
||||||
|
|
||||||
.. _setup-download:
|
|
||||||
|
|
||||||
Download
|
Download
|
||||||
--------
|
########
|
||||||
|
|
||||||
The source is currently only available via GitHub, so grab it from there,
|
The source is currently only available via GitHub, so grab it from there,
|
||||||
either by using ``git``:
|
by using ``git``:
|
||||||
|
|
||||||
.. code:: bash
|
.. code:: bash
|
||||||
|
|
||||||
$ git clone https://github.com/the-paperless-project/paperless.git
|
$ git clone https://github.com/jonaswinkler/paperless-ng.git
|
||||||
$ cd paperless
|
$ cd paperless
|
||||||
|
|
||||||
or just download the tarball and go that route:
|
Installation
|
||||||
|
############
|
||||||
.. code:: bash
|
|
||||||
|
|
||||||
$ cd to the directory where you want to run Paperless
|
|
||||||
$ wget https://github.com/the-paperless-project/paperless/archive/master.zip
|
|
||||||
$ unzip master.zip
|
|
||||||
$ cd paperless-master
|
|
||||||
|
|
||||||
|
|
||||||
.. _setup-installation:
|
|
||||||
|
|
||||||
Installation & Configuration
|
|
||||||
----------------------------
|
|
||||||
|
|
||||||
You can go multiple routes with setting up and running Paperless:
|
You can go multiple routes with setting up and running Paperless:
|
||||||
|
|
||||||
* The `bare metal route`_
|
* The `docker route`_
|
||||||
* The `docker route`_
|
* The `bare metal route`_
|
||||||
* A suggested `linux containers route`_
|
|
||||||
|
|
||||||
|
The recommended setup route is docker, since it takes care of all dependencies
|
||||||
|
for you.
|
||||||
|
|
||||||
The `docker route`_ is quick & easy.
|
The `docker route`_ is quick & easy.
|
||||||
|
|
||||||
The `bare metal route`_ is a bit more complicated to set up but makes it easier
|
The `bare metal route`_ is more complicated to set up but makes it easier
|
||||||
should you want to contribute some code back.
|
should you want to contribute some code back.
|
||||||
|
|
||||||
The `linux containers route`_ is quick, but makes a lot of assumptions on the
|
Docker Route
|
||||||
set-up, on the other hand the script could be used to install on a base
|
============
|
||||||
debian or ubuntu server.
|
|
||||||
|
|
||||||
.. _docker route: setup-installation-docker_
|
1. Install `Docker`_ and `docker-compose`_. [#compose]_
|
||||||
.. _bare metal route: setup-installation-bare-metal_
|
|
||||||
.. _Docker Machine: https://docs.docker.com/machine/
|
|
||||||
|
|
||||||
.. _setup-installation-bare-metal:
|
.. caution::
|
||||||
|
|
||||||
Standard (Bare Metal)
|
If you want to use the included ``docker-compose.yml.example`` file, you
|
||||||
+++++++++++++++++++++
|
need to have at least Docker version **17.09.0** and docker-compose
|
||||||
|
version **1.17.0**.
|
||||||
|
|
||||||
1. Install the requirements as per the :ref:`requirements <requirements>` page.
|
See the `Docker installation guide`_ on how to install the current
|
||||||
2. Within the extract of master.zip go to the ``src`` directory.
|
version of Docker for your operating system or Linux distribution of
|
||||||
3. Copy ``../paperless.conf.example`` to ``/etc/paperless.conf`` and open it in
|
choice. To get an up-to-date version of docker-compose, follow the
|
||||||
your favourite editor. As this file contains passwords, it should only be
|
`docker-compose installation guide`_ if your package repository doesn't
|
||||||
readable by user root and paperless! Set the values for:
|
include it.
|
||||||
|
|
||||||
Set the values for:
|
.. _Docker installation guide: https://docs.docker.com/engine/installation/
|
||||||
|
.. _docker-compose installation guide: https://docs.docker.com/compose/install/
|
||||||
|
|
||||||
* ``PAPERLESS_CONSUMPTION_DIR``: this is where your documents will be
|
2. Create a copy of ``docker-compose.yml.example`` as ``docker-compose.yml``
|
||||||
dumped to be consumed by Paperless.
|
and a copy of ``docker-compose.env.example`` as ``docker-compose.env``.
|
||||||
* ``PAPERLESS_OCR_THREADS``: this is the number of threads the OCR process
|
You'll be editing both these files: taking a copy ensures that you can
|
||||||
will spawn to process document pages in parallel.
|
``git pull`` to receive updates without risking merge conflicts with your
|
||||||
* ``PAPERLESS_PASSPHRASE``: this is only required if you want to use GPG to
|
modified versions of the configuration files.
|
||||||
encrypt your document files. This is the passphrase Paperless uses to
|
3. Modify ``docker-compose.yml`` to your preferences. You should change the path
|
||||||
encrypt/decrypt the original documents. Don't worry about defining this
|
to the consumption directory in this file. Find the line that specifies where
|
||||||
if you don't want to use encryption (the default).
|
to mount the consumption directory:
|
||||||
|
|
||||||
Note also that if you're using the ``runserver`` as mentioned below, you
|
.. code::
|
||||||
should make sure that PAPERLESS_DEBUG="true" or is just commented out as
|
|
||||||
this is the default.
|
|
||||||
|
|
||||||
4. Initialise the SQLite database with ``./manage.py migrate``.
|
- ./consume:/usr/src/paperless/consume
|
||||||
5. Collect the static files for the webserver with ``./manage.py collectstatic``.
|
|
||||||
6. Create a user for your Paperless instance with
|
|
||||||
``./manage.py createsuperuser``. Follow the prompts to create your user.
|
|
||||||
7. Start the webserver with ``./manage.py runserver <IP>:<PORT>``.
|
|
||||||
If no specific IP or port is given, the default is ``127.0.0.1:8000`` also
|
|
||||||
known as http://localhost:8000/.
|
|
||||||
You should now be able to visit your (empty) installation at
|
|
||||||
`Paperless webserver`_ or whatever you chose before. You can login with the
|
|
||||||
user/pass you created in #6.
|
|
||||||
|
|
||||||
8. In a separate window, change to the ``src`` directory in this repo again,
|
Replace the part BEFORE the colon with a local directory of your choice:
|
||||||
but this time, you should start the consumer script with
|
|
||||||
``./manage.py document_consumer``.
|
|
||||||
9. Scan something or put a file into the ``CONSUMPTION_DIR``.
|
|
||||||
10. Wait a few minutes
|
|
||||||
11. Visit the document list on your webserver, and it should be there, indexed
|
|
||||||
and downloadable.
|
|
||||||
|
|
||||||
.. caution::
|
.. code::
|
||||||
|
|
||||||
This installation is not secure. Once everything is working head over to
|
- /home/jonaswinkler/paperless-inbox:/usr/src/paperless/consume
|
||||||
`Making things more permanent`_
|
|
||||||
|
|
||||||
.. _Paperless webserver: http://127.0.0.1:8000
|
Don't change the part after the colon or paperless won't find your documents.
|
||||||
.. _Making things more permanent: setup-permanent_
|
|
||||||
|
|
||||||
.. _setup-installation-docker:
|
|
||||||
|
|
||||||
Docker Method
|
|
||||||
+++++++++++++
|
|
||||||
|
|
||||||
1. Install `Docker`_.
|
|
||||||
|
|
||||||
.. caution::
|
|
||||||
|
|
||||||
As mentioned earlier, this guide assumes that you use Docker natively
|
|
||||||
under Linux. If you are using `Docker Machine`_ under Mac OS X or
|
|
||||||
Windows, you will have to adapt IP addresses, volume-mounting, command
|
|
||||||
execution and maybe more.
|
|
||||||
|
|
||||||
2. Install `docker-compose`_. [#compose]_
|
|
||||||
|
|
||||||
.. caution::
|
|
||||||
|
|
||||||
If you want to use the included ``docker-compose.yml.example`` file, you
|
|
||||||
need to have at least Docker version **1.12.0** and docker-compose
|
|
||||||
version **1.9.0**.
|
|
||||||
|
|
||||||
See the `Docker installation guide`_ on how to install the current
|
|
||||||
version of Docker for your operating system or Linux distribution of
|
|
||||||
choice. To get an up-to-date version of docker-compose, follow the
|
|
||||||
`docker-compose installation guide`_ if your package repository doesn't
|
|
||||||
include it.
|
|
||||||
|
|
||||||
.. _Docker installation guide: https://docs.docker.com/engine/installation/
|
|
||||||
.. _docker-compose installation guide: https://docs.docker.com/compose/install/
|
|
||||||
|
|
||||||
3. Create a copy of ``docker-compose.yml.example`` as ``docker-compose.yml``
|
|
||||||
and a copy of ``docker-compose.env.example`` as ``docker-compose.env``.
|
|
||||||
You'll be editing both these files: taking a copy ensures that you can
|
|
||||||
``git pull`` to receive updates without risking merge conflicts with your
|
|
||||||
modified versions of the configuration files.
|
|
||||||
4. Modify ``docker-compose.yml`` to your preferences, following the
|
|
||||||
instructions in comments in the file. The only change that is a hard
|
|
||||||
requirement is to specify where the consumption directory should
|
|
||||||
mount. [#dockercomposeyml]_
|
|
||||||
|
|
||||||
.. caution::
|
|
||||||
|
|
||||||
If you are using NFS mounts for the consume directory you also need to
|
|
||||||
change the command to turn off inotify as it doesn't work with NFS
|
|
||||||
|
|
||||||
``command: ["document_consumer", "--no-inotify"]``
|
|
||||||
|
|
||||||
|
|
||||||
5. Modify ``docker-compose.env`` and adapt the following environment variables:
|
4. Modify ``docker-compose.env``, following the comments in the file. The
|
||||||
|
most important change is to set ``USERMAP_UID`` and ``USERMAP_GID``
|
||||||
|
to the uid and gid of your user on the host system. This ensures that
|
||||||
|
both the docker container and you on the host machine have write access
|
||||||
|
to the consumption directory. If your UID and GID on the host system are
|
||||||
|
1000 (the default for the first normal user on most systems), it will
|
||||||
|
work out of the box without any modifications.
|
||||||
|
|
||||||
``PAPERLESS_PASSPHRASE``
|
5. Run ``docker-compose up -d``. This will create and start the necessary
|
||||||
This is the passphrase Paperless uses to encrypt/decrypt the original
|
|
||||||
document. If you aren't planning on using GPG encryption, you can just
|
|
||||||
leave this undefined.
|
|
||||||
|
|
||||||
``PAPERLESS_OCR_THREADS``
|
|
||||||
This is the number of threads the OCR process will spawn to process
|
|
||||||
document pages in parallel. If the variable is not set, Python determines
|
|
||||||
the core-count of your CPU and uses that value.
|
|
||||||
|
|
||||||
``PAPERLESS_OCR_LANGUAGES``
|
|
||||||
If you want the OCR to recognize other languages in addition to the
|
|
||||||
default English, set this parameter to a space separated list of
|
|
||||||
three-letter language-codes after `ISO 639-2/T`_. For a list of available
|
|
||||||
languages -- including their three letter codes -- see the
|
|
||||||
`Alpine packagelist`_.
|
|
||||||
|
|
||||||
``USERMAP_UID`` and ``USERMAP_GID``
|
|
||||||
If you want to mount the consumption volume (directory ``/consume`` within
|
|
||||||
the containers) to a host-directory -- which you probably want to do --
|
|
||||||
access rights might be an issue. The default user and group ``paperless``
|
|
||||||
in the containers have an id of 1000. The containers will enforce that the
|
|
||||||
owning group of the consumption directory will be ``paperless`` to be able
|
|
||||||
to delete consumed documents. If your host-system has a group with an ID
|
|
||||||
of 1000 and you don't want this group to have access rights to the
|
|
||||||
consumption directory, you can use ``USERMAP_GID`` to change the id in the
|
|
||||||
container and thus the one of the consumption directory. Furthermore, you
|
|
||||||
can change the id of the default user as well using ``USERMAP_UID``.
|
|
||||||
|
|
||||||
``PAPERLESS_USE_SSL``
|
|
||||||
If you want Paperless to use SSL for the user interface, set this variable
|
|
||||||
to ``true``. You also need to copy your certificate and key to the ``data``
|
|
||||||
directory, named ``ssl.cert`` and ``ssl.key``.
|
|
||||||
This is not an ideal solution and, if possible, a reverse proxy with nginx
|
|
||||||
is preferred.
|
|
||||||
|
|
||||||
6. Run ``docker-compose up -d``. This will create and start the necessary
|
|
||||||
containers.
|
containers.
|
||||||
7. To be able to login, you will need a super user. To create it, execute the
|
|
||||||
following command:
|
|
||||||
|
|
||||||
.. code-block:: shell-session
|
6. To be able to login, you will need a super user. To create it, execute the
|
||||||
|
following command:
|
||||||
|
|
||||||
$ docker-compose run --rm webserver createsuperuser
|
.. code-block:: shell-session
|
||||||
|
|
||||||
This will prompt you to set a username (default ``paperless``), an optional
|
$ docker-compose run --rm webserver createsuperuser
|
||||||
e-mail address and finally a password.
|
|
||||||
8. The default ``docker-compose.yml`` exports the webserver on your local port
|
|
||||||
8000. If you haven't adapted this, you should now be able to visit your
|
|
||||||
`Paperless webserver`_ at ``http://127.0.0.1:8000`` (or
|
|
||||||
``https://127.0.0.1:8000`` if you enabled SSL). You can login with the
|
|
||||||
user and password you just created.
|
|
||||||
9. Add files to consumption directory the way you prefer to. Following are two
|
|
||||||
possible options:
|
|
||||||
|
|
||||||
1. Mount the consumption directory to a local host path by modifying your
|
This will prompt you to set a username, an optional e-mail address and
|
||||||
``docker-compose.yml``:
|
finally a password.
|
||||||
|
|
||||||
.. code-block:: diff
|
|
||||||
|
|
||||||
diff --git a/docker-compose.yml b/docker-compose.yml
|
|
||||||
--- a/docker-compose.yml
|
|
||||||
+++ b/docker-compose.yml
|
|
||||||
@@ -17,9 +18,8 @@ services:
|
|
||||||
volumes:
|
|
||||||
- paperless-data:/usr/src/paperless/data
|
|
||||||
- paperless-media:/usr/src/paperless/media
|
|
||||||
- - /consume
|
|
||||||
+ - /local/path/you/choose:/consume
|
|
||||||
|
|
||||||
.. danger::
|
|
||||||
|
|
||||||
While the consumption container will ensure at startup that it can
|
|
||||||
**delete** a consumed file from a host-mounted directory, it might
|
|
||||||
not be able to **read** the document in the first place if the access
|
|
||||||
rights to the file are incorrect.
|
|
||||||
|
|
||||||
Make sure that the documents you put into the consumption directory
|
|
||||||
will either be readable by everyone (``chmod o+r file.pdf``) or
|
|
||||||
readable by the default user or group id 1000 (or the one you have
|
|
||||||
set with ``USERMAP_UID`` or ``USERMAP_GID`` respectively).
|
|
||||||
|
|
||||||
2. Use ``docker cp`` to copy your files directly into the container:
|
|
||||||
|
|
||||||
.. code-block:: shell-session
|
|
||||||
|
|
||||||
$ # Identify your containers
|
|
||||||
$ docker-compose ps
|
|
||||||
Name Command State Ports
|
|
||||||
-------------------------------------------------------------------------
|
|
||||||
paperless_consumer_1 /sbin/docker-entrypoint.sh ... Exit 0
|
|
||||||
paperless_webserver_1 /sbin/docker-entrypoint.sh ... Exit 0
|
|
||||||
|
|
||||||
$ docker cp /path/to/your/file.pdf paperless_consumer_1:/consume
|
|
||||||
|
|
||||||
``docker cp`` is a one-shot-command, just like ``cp``. This means that
|
|
||||||
every time you want to consume a new document, you will have to execute
|
|
||||||
``docker cp`` again. You can of course automate this process, but option
|
|
||||||
1 is generally the preferred one.
|
|
||||||
|
|
||||||
.. danger::
|
|
||||||
|
|
||||||
``docker cp`` will change the owning user and group of a copied file
|
|
||||||
to the acting user at the destination, which will be ``root``.
|
|
||||||
|
|
||||||
You therefore need to ensure that the documents you want to copy into
|
|
||||||
the container are readable by everyone (``chmod o+r file.pdf``)
|
|
||||||
before copying them.
|
|
||||||
|
|
||||||
|
7. The default ``docker-compose.yml`` exports the webserver on your local port
|
||||||
|
8000. If you haven't adapted this, you should now be able to visit your
|
||||||
|
Paperless instance at ``http://127.0.0.1:8000``. You can login with the
|
||||||
|
user and password you just created.
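Steps 3 and 4 of the Docker route come down to two small edits: pointing the host side of the consumption volume at a directory of your choice, and setting ``USERMAP_UID`` and ``USERMAP_GID`` to your own ids. The sketch below only illustrates those two edits; the function names are hypothetical, it assumes a POSIX host, and it does not touch your files.

```python
import os


def usermap_lines(uid=None, gid=None):
    """Build the USERMAP_UID / USERMAP_GID lines for docker-compose.env."""
    # Fall back to the ids of the user running this script (POSIX only).
    uid = os.getuid() if uid is None else uid
    gid = os.getgid() if gid is None else gid
    return ["USERMAP_UID={}".format(uid), "USERMAP_GID={}".format(gid)]


def remap_consume_volume(line, host_dir):
    """Replace the part BEFORE the colon of a volume entry with host_dir."""
    # Keep any indentation in front of the "- " list marker.
    prefix, mapping = line.split("- ", 1)
    # Only the host side changes; the container path must stay as it is.
    _old_host, container_path = mapping.split(":", 1)
    return "{}- {}:{}".format(prefix, host_dir, container_path)


if __name__ == "__main__":
    print("\n".join(usermap_lines()))
    print(remap_consume_volume("- ./consume:/usr/src/paperless/consume",
                               "/path/to/your/inbox"))
```

The printed lines can then be copied by hand into ``docker-compose.env`` and ``docker-compose.yml`` respectively.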
|
||||||
|
|
||||||
.. _Docker: https://www.docker.com/
|
.. _Docker: https://www.docker.com/
|
||||||
.. _docker-compose: https://docs.docker.com/compose/install/
|
.. _docker-compose: https://docs.docker.com/compose/install/
|
||||||
.. _ISO 639-2/T: https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
|
|
||||||
.. _Alpine packagelist: https://pkgs.alpinelinux.org/packages?name=tesseract-ocr-data*&arch=x86_64
|
|
||||||
|
|
||||||
.. [#compose] You of course don't have to use docker-compose, but it
|
.. [#compose] You of course don't have to use docker-compose, but it
|
||||||
simplifies deployment immensely. If you know your way around Docker, feel
|
simplifies deployment immensely. If you know your way around Docker, feel
|
||||||
free to tinker around without using compose!
|
free to tinker around without using compose!
|
||||||
|
|
||||||
.. [#dockercomposeyml] If you're upgrading your docker-compose images from
|
|
||||||
version 1.1.0 or earlier, you might need to change in the
|
|
||||||
``docker-compose.yml`` file the ``image: pitkley/paperless`` directive in
|
|
||||||
both the ``webserver`` and ``consumer`` sections to ``build: ./`` as per the
|
|
||||||
newer ``docker-compose.yml.example`` file
|
|
||||||
|
|
||||||
|
Bare Metal Route
|
||||||
|
================
|
||||||
|
|
||||||
.. _setup-permanent:
|
.. warning::
|
||||||
|
|
||||||
Making Things a Little more Permanent
|
TBD. Use docker for now.
|
||||||
-------------------------------------
|
|
||||||
|
|
||||||
Once you've tested things and are happy with the work flow, you should secure
|
Migration to paperless-ng
|
||||||
the installation and automate the process of starting the webserver and
|
#########################
|
||||||
consumer.
|
|
||||||
|
|
||||||
|
At its core, paperless-ng is still paperless and fully compatible. However, some
|
||||||
|
things have changed under the hood, so you need to adapt your setup depending on
|
||||||
|
how you installed paperless. The important things to keep in mind are as follows.
|
||||||
|
|
||||||
.. _setup-permanent-webserver:
|
* Read the :ref:`paperless_changelog` and take note of breaking changes.
|
||||||
|
* It is recommended to use postgresql as the database now. The docker-compose
|
||||||
|
deployment will automatically create a postgresql instance and instruct
|
||||||
|
paperless to use it. This means that if you use the docker-compose script
|
||||||
|
with your current paperless media and data volumes and used the default
|
||||||
|
sqlite database, **it will not use your sqlite database and it may seem
|
||||||
|
as if your documents are gone**. You may use the provided
|
||||||
|
``docker-compose.yml.sqlite.example`` script, which does not use postgresql.
|
||||||
|
* The task scheduler of paperless, which is used to execute periodic tasks
|
||||||
|
such as email checking and maintenance, requires a `redis`_ message broker
|
||||||
|
instance. The docker-compose route takes care of that.
|
||||||
|
* The layout of the folder structure for your documents and data remains the
|
||||||
|
same.
|
||||||
|
* The frontend needs to be built from source. The docker image takes care of
|
||||||
|
that.
|
||||||
|
|
||||||
Using a Real Webserver
|
Migration to paperless-ng is then performed in a few simple steps:
|
||||||
++++++++++++++++++++++
|
|
||||||
|
|
||||||
The default is to use Django's development server, as that's easy and does the
|
1. Do a backup for two purposes: If something goes wrong, you still have your
|
||||||
job well enough on a home network. However it is heavily discouraged to use
|
data. Second, if you don't like paperless-ng, you can switch back to
|
||||||
it for more than that.
|
paperless.
|
||||||
|
|
||||||
If you want to do things right you should use a real webserver capable of
|
2. Replace the paperless source with paperless-ng. If you're using git, this
|
||||||
handling more than one thread. You will also have to let the webserver serve
|
is done by:
|
||||||
the static files (CSS, JavaScript) from the directory configured in
|
|
||||||
``PAPERLESS_STATICDIR``. The default static files directory is ``../static``.
|
|
||||||
|
|
||||||
For that you need to activate your virtual environment and collect the static
|
.. code:: bash
|
||||||
files with the command:
|
|
||||||
|
|
||||||
.. code:: bash
|
$ git remote set-url origin https://github.com/jonaswinkler/paperless-ng
|
||||||
|
$ git pull
|
||||||
|
|
||||||
$ cd <paperless directory>/src
|
3. If you are using docker, copy ``docker-compose.yml.example`` to
|
||||||
$ ./manage.py collectstatic
|
``docker-compose.yml`` and ``docker-compose.env.example`` to
|
||||||
|
``docker-compose.env``. Make adjustments to these files as necessary.
|
||||||
|
See `docker route`_ for details.
|
||||||
|
|
||||||
|
4. Update paperless. See :ref:`administration-updating` for details.
|
||||||
|
|
||||||
Apache
|
5. Start paperless-ng.
|
||||||
~~~~~~
|
|
||||||
|
|
||||||
This is a configuration supplied by `steckerhalter`_ on GitHub. It uses Apache
|
.. code:: bash
|
||||||
and mod_wsgi, with a Paperless installation in ``/home/paperless/``:
|
|
||||||
|
|
||||||
.. code:: apache
|
$ docker-compose up
|
||||||
|
|
||||||
<VirtualHost *:80>
|
This will also migrate your database as usual. Verify by inspecting the
|
||||||
ServerName example.com
|
output that the migration was successfully executed. CTRL-C will then
|
||||||
|
gracefully stop the container. After that, you can start paperless-ng as
|
||||||
|
usual with
|
||||||
|
|
||||||
Alias /static/ /home/paperless/paperless/static/
|
.. code:: bash
|
||||||
<Directory /home/paperless/paperless/static>
|
|
||||||
Require all granted
|
|
||||||
</Directory>
|
|
||||||
|
|
||||||
WSGIScriptAlias / /home/paperless/paperless/src/paperless/wsgi.py
|
$ docker-compose up -d
|
||||||
WSGIDaemonProcess example.com user=paperless group=paperless threads=5 python-path=/home/paperless/paperless/src:/home/paperless/.env/lib/python3.6/site-packages
|
|
||||||
WSGIProcessGroup example.com
|
|
||||||
|
|
||||||
<Directory /home/paperless/paperless/src/paperless>
|
6. Paperless installed a permanent redirect to ``admin/`` in your browser. This
|
||||||
<Files wsgi.py>
|
redirect is still in place and prevents access to the new UI. Clear
|
||||||
Require all granted
|
everything related to paperless in your browser's data in order to fix
|
||||||
</Files>
|
this issue.
|
||||||
</Directory>
|
|
||||||
</VirtualHost>
|
|
||||||
|
|
||||||
.. _steckerhalter: https://github.com/steckerhalter
|
Moving data from sqlite to postgresql
|
||||||
|
=====================================
|
||||||
|
|
||||||
|
.. warning::
|
||||||
|
|
||||||
Nginx + Gunicorn
|
TBD.
|
||||||
~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
If you're using Nginx, the most common setup is to combine it with a
|
|
||||||
Python-based server like Gunicorn so that Nginx is acting as a proxy. Below is
|
|
||||||
a copy of a simple Nginx configuration fragment making use of a gunicorn
|
|
||||||
instance listening on localhost port 8000.
|
|
||||||
|
|
||||||
.. code:: nginx
|
|
||||||
|
|
||||||
server {
|
|
||||||
listen 80;
|
|
||||||
|
|
||||||
index index.html index.htm index.php;
|
|
||||||
access_log /var/log/nginx/paperless_access.log;
|
|
||||||
error_log /var/log/nginx/paperless_error.log;
|
|
||||||
|
|
||||||
location /static {
|
|
||||||
|
|
||||||
autoindex on;
|
|
||||||
alias <path-to-paperless-static-directory>;
|
|
||||||
|
|
||||||
}
|
|
||||||
|
|
||||||
location / {
|
|
||||||
|
|
||||||
proxy_set_header Host $http_host;
|
|
||||||
proxy_set_header X-Real-IP $remote_addr;
|
|
||||||
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
|
|
||||||
proxy_set_header X-Forwarded-Proto $scheme;
|
|
||||||
|
|
||||||
proxy_pass http://127.0.0.1:8000;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
The gunicorn server can be started with the command:
|
|
||||||
|
|
||||||
.. code-block:: shell
|
|
||||||
|
|
||||||
$ <path-to-paperless-virtual-environment>/bin/gunicorn --pythonpath=<path-to-paperless>/src paperless.wsgi -w 2
|
|
||||||
|
|
||||||
|
|
||||||
.. _setup-permanent-standard-systemd:
|
|
||||||
|
|
||||||
Standard (Bare Metal + Systemd)
|
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
If you're running on a bare metal system that's using Systemd, you can use the
|
|
||||||
service unit files in the ``scripts`` directory to set this up.
|
|
||||||
|
|
||||||
1. You'll need to create a group and user called ``paperless`` (without login)
|
|
||||||
2. Set up Paperless to be in a place that this new user can read and write to.
|
|
||||||
3. Ensure ``/etc/paperless`` is readable by the ``paperless`` user.
|
|
||||||
4. Copy the service file from the ``scripts`` directory to
|
|
||||||
``/etc/systemd/system``.
|
|
||||||
|
|
||||||
.. code-block:: bash
|
|
||||||
|
|
||||||
$ cp /path/to/paperless/scripts/paperless-consumer.service /etc/systemd/system/
|
|
||||||
$ cp /path/to/paperless/scripts/paperless-webserver.service /etc/systemd/system/
|
|
||||||
|
|
||||||
5. Edit the service file to point the ``ExecStart`` line to the proper location
|
|
||||||
of your paperless install, referencing the appropriate Python binary. For
|
|
||||||
example:
|
|
||||||
``ExecStart=/path/to/python3 /path/to/paperless/src/manage.py document_consumer``.
|
|
||||||
6. Start and enable (so they start on boot) the services.
|
|
||||||
|
|
||||||
.. code-block:: bash
|
|
||||||
|
|
||||||
$ systemctl enable paperless-consumer
|
|
||||||
$ systemctl enable paperless-webserver
|
|
||||||
$ systemctl start paperless-consumer
|
|
||||||
$ systemctl start paperless-webserver
|
|
||||||
|
|
||||||
|
|
||||||
.. _setup-permanent-standard-upstart:
|
|
||||||
|
|
||||||
Standard (Bare Metal + Upstart)
|
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
Ubuntu 14.04 and earlier use the `Upstart`_ init system to start services
|
|
||||||
during the boot process. To configure Upstart to run Paperless automatically
|
|
||||||
after restarting your system:
|
|
||||||
|
|
||||||
1. Change to the directory where Upstart's configuration files are kept:
|
|
||||||
``cd /etc/init``
|
|
||||||
2. Create a new file: ``sudo nano paperless-server.conf``
|
|
||||||
3. In the newly-created file enter::
|
|
||||||
|
|
||||||
start on (local-filesystems and net-device-up IFACE=eth0)
|
|
||||||
stop on shutdown
|
|
||||||
|
|
||||||
respawn
|
|
||||||
respawn limit 10 5
|
|
||||||
|
|
||||||
script
|
|
||||||
exec <path to paperless virtual environment>/bin/gunicorn --pythonpath=<path to paperless>/src paperless.wsgi -w 2
|
|
||||||
end script
|
|
||||||
|
|
||||||
Note that you'll need to replace ``/srv/paperless/src/manage.py`` with the
|
|
||||||
path to the ``manage.py`` script in your installation directory.
|
|
||||||
|
|
||||||
If you are using a network interface other than ``eth0``, you will have to
|
|
||||||
change ``IFACE=eth0``. For example, if you are connected via WiFi, you will
|
|
||||||
likely need to replace ``eth0`` above with ``wlan0``. To see all interfaces,
|
|
||||||
run ``ifconfig -a``.
|
|
||||||
|
|
||||||
Save the file.
|
|
||||||
|
|
||||||
4. Create a new file: ``sudo nano paperless-consumer.conf``
|
|
||||||
|
|
||||||
5. In the newly-created file enter::
|
|
||||||
|
|
||||||
start on (local-filesystems and net-device-up IFACE=eth0)
|
|
||||||
stop on shutdown
|
|
||||||
|
|
||||||
respawn
|
|
||||||
respawn limit 10 5
|
|
||||||
|
|
||||||
script
|
|
||||||
exec <path to paperless virtual environment>/bin/python <path to paperless>/manage.py document_consumer
|
|
||||||
end script
|
|
||||||
|
|
||||||
Replace the path placeholder and ``eth0`` with the appropriate value and save the file.
|
|
||||||
|
|
||||||
These two configuration files together will start both the Paperless webserver
|
|
||||||
and document consumer processes when the file system and network interface
|
|
||||||
specified is available after boot. Furthermore, if either process ever exits
|
|
||||||
unexpectedly, Upstart will try to restart it a maximum of 10 times within a 5
|
|
||||||
second period.
|
|
||||||
|
|
||||||
.. _Upstart: http://upstart.ubuntu.com/
|
|
||||||
|
|
||||||
|
|
||||||
.. _setup-permanent-docker:
|
|
||||||
|
|
||||||
Docker
|
|
||||||
~~~~~~
|
|
||||||
|
|
||||||
If you're using Docker, you can set a restart-policy_ in the
|
|
||||||
``docker-compose.yml`` to have the containers automatically start with the
|
|
||||||
Docker daemon.
|
|
||||||
|
|
||||||
.. _restart-policy: https://docs.docker.com/engine/reference/commandline/run/#restart-policies-restart
|
|
||||||
|
|
||||||
|
.. _redis: https://redis.io/
|
||||||
|
216
docs/usage_overview.rst
Normal file
@ -0,0 +1,216 @@
|
|||||||
|
**************
|
||||||
|
Usage Overview
|
||||||
|
**************
|
||||||
|
|
||||||
|
Paperless is an application that manages your personal documents. With
|
||||||
|
the help of a document scanner (see :ref:`scanners`), paperless transforms
|
||||||
|
your unwieldy physical document binders into a searchable archive and
|
||||||
|
provides many utilities for finding and managing your documents.
|
||||||
|
|
||||||
|
|
||||||
|
Terms and definitions
|
||||||
|
#####################
|
||||||
|
|
||||||
|
Paperless essentially consists of two different parts for managing your
|
||||||
|
documents:
|
||||||
|
|
||||||
|
* The *consumer* watches a specified folder and adds all documents in that
|
||||||
|
folder to paperless.
|
||||||
|
* The *web server* provides a UI that you use to manage and search for your
|
||||||
|
scanned documents.
|
||||||
|
|
||||||
|
Each document has a number of fields that you can assign to it:
|
||||||
|
|
||||||
|
* A *Document* is a piece of paper that sometimes contains valuable
|
||||||
|
information.
|
||||||
|
* The *correspondent* of a document is the person, institution or company that
|
||||||
|
a document either originates from, or is sent to.
|
||||||
|
* A *tag* is a label that you can assign to documents. Think of tags as more
|
||||||
|
powerful folders: Multiple documents can be grouped together with a single
|
||||||
|
tag, however, a single document can also have multiple tags. This is not
|
||||||
|
possible with folders. The reason folders are not implemented in paperless
|
||||||
|
is simply that tags are much more versatile than folders.
|
||||||
|
* A *document type* is used to demarcate the type of a document such as letter,
|
||||||
|
bank statement, invoice, contract, etc. It is used to identify what a document
|
||||||
|
is about.
|
||||||
|
* The *date added* of a document is the date the document was scanned into
|
||||||
|
paperless. You cannot and should not change this date.
|
||||||
|
* The *date created* of a document is the date the document was initially issued.
|
||||||
|
This can be the date you bought a product, the date you signed a contract, or
|
||||||
|
the date a letter was sent to you.
|
||||||
|
* The *archive serial number* (short: ASN) of a document is the identifier of
|
||||||
|
the document in your physical document binders. See
|
||||||
|
:ref:`usage-recommended_workflow` below.
|
||||||
|
* The *content* of a document is the text that was OCR'ed from the document.
|
||||||
|
This text is fed into the search engine and is used for matching tags,
|
||||||
|
correspondents and document types.
|
||||||
|
|
||||||
|
.. TODO: hyperref
|
||||||
|
|
||||||
|
Frontend overview
|
||||||
|
#################
|
||||||
|
|
||||||
|
.. warning::
|
||||||
|
|
||||||
|
TBD. Add some fancy screenshots!
|
||||||
|
|
||||||
|
Adding documents to paperless
|
||||||
|
#############################
|
||||||
|
|
||||||
|
Once you've got Paperless set up, you need to start feeding documents into it.
|
||||||
|
Currently, there are three options: the consumption directory, IMAP (email), and
|
||||||
|
HTTP POST.
|
||||||
|
|
||||||
|
|
||||||
|
The consumption directory
|
||||||
|
=========================
|
||||||
|
|
||||||
|
The primary method of getting documents into your database is by putting them in
|
||||||
|
the consumption directory. The consumer runs in an infinite
|
||||||
|
loop looking for new additions to this directory and when it finds them, it goes
|
||||||
|
about the process of parsing them with the OCR, indexing what it finds, and storing
|
||||||
|
it in the media directory.
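
The loop described above can be pictured with a small shell sketch. This is not
Paperless's own code: ``consume_once`` and its two directory arguments are
hypothetical names chosen for illustration, and the ``echo`` merely stands in
for the OCR and indexing work.

.. code:: bash

    # One polling pass over a consumption directory (illustrative only).
    # Usage: consume_once CONSUME_DIR MEDIA_DIR
    consume_once() {
        consume_dir="$1"
        media_dir="$2"
        for f in "$consume_dir"/*; do
            # Skip when the glob matched nothing or the entry is not a regular file.
            [ -f "$f" ] || continue
            # Placeholder for the real work: OCR, indexing, database record.
            echo "consuming $(basename "$f")"
            # Store the processed file in the media directory.
            mv -- "$f" "$media_dir/"
        done
    }

Calling ``consume_once`` repeatedly with a pause in between (for example
``while true; do consume_once /path/to/consume /path/to/media; sleep 10; done``)
approximates the infinite loop; the interval is an arbitrary choice here.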
|
||||||
|
|
||||||
|
Getting stuff into this directory is up to you. If you're running Paperless
|
||||||
|
on your local computer, you might just want to drag and drop files there, but if
|
||||||
|
you're running this on a server and want your scanner to automatically push
|
||||||
|
files to this directory, you'll need to set up some sort of service to accept the
|
||||||
|
files from the scanner. Typically, you're looking at an FTP server like
|
||||||
|
`Proftpd`_ or a Windows folder share with `Samba`_.
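
For the local case, dropping a file into the directory can be as simple as a
copy. The helper below is only a sketch: ``submit_document`` is a made-up name,
and ``CONSUME_DIR`` is a placeholder that you would point at your own
consumption directory.

.. code:: bash

    # CONSUME_DIR is a placeholder; point it at your own consumption directory.
    CONSUME_DIR="${CONSUME_DIR:-/path/to/consume}"

    # Copy a single file into the consumption directory.
    # Usage: submit_document /path/to/scan.pdf
    submit_document() {
        # Refuse anything that is not an existing regular file.
        [ -f "$1" ] || { echo "not a file: $1" >&2; return 1; }
        cp -- "$1" "$CONSUME_DIR/"
    }

With the function loaded in your shell, ``submit_document ~/scans/receipt.pdf``
would place a copy of the file where the consumer looks for new documents.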
|
||||||
|
|
||||||
|
.. _Proftpd: http://www.proftpd.org/
|
||||||
|
.. _Samba: http://www.samba.org/
|
||||||
|
|
||||||
|
.. TODO: hyperref to configuration of the location of this magic folder.
|
||||||
|
|
||||||
|
|
||||||
|
IMAP (Email)
|
||||||
|
============
|
||||||
|
|
||||||
|
Another handy way to get documents into your database is to email them to
|
||||||
|
yourself. The typical use-case would be to be out for lunch and want to send a
|
||||||
|
copy of the receipt back to your system at home. Paperless can be taught to
|
||||||
|
pull emails down from an arbitrary account and dump them into the consumption
|
||||||
|
directory where the consumer will follow the
|
||||||
|
usual pattern of consuming the document.
|
||||||
|
|
||||||
|
Some things you need to know about this feature:
|
||||||
|
|
||||||
|
* It's disabled by default. By setting the values below it will be enabled.
|
||||||
|
* It's been tested in a limited environment, so it may not work for you (please
|
||||||
|
submit a pull request if you can!)
|
||||||
|
* It's designed to **delete mail from the server once consumed**. So don't go
|
||||||
|
pointing this to your personal email account and wonder where all your stuff
|
||||||
|
went.
|
||||||
|
* Currently, only one photo (attachment) per email will work.
|
||||||
|
|
||||||
|
So, with all that in mind, here's what you do to get it running:
|
||||||
|
|
||||||
|
1. Set up a new email account somewhere, or if you're feeling daring, create a
|
||||||
|
folder in an existing email box and note the path to that folder.
|
||||||
|
2. In ``/etc/paperless.conf`` set all of the appropriate values in
|
||||||
|
``PATHS AND FOLDERS`` and ``SECURITY``.
|
||||||
|
If you decided to use a subfolder of an existing account, then make sure you
|
||||||
|
set ``PAPERLESS_CONSUME_MAIL_INBOX`` accordingly here. You also have to set
|
||||||
|
the ``PAPERLESS_EMAIL_SECRET`` to something you can remember 'cause you'll
|
||||||
|
have to include that in every email you send.
|
||||||
|
3. Restart paperless. Paperless will check
|
||||||
|
the configured email account at startup and from then on every 10 minutes
|
||||||
|
for something new and pull down whatever it finds.
|
||||||
|
4. Send yourself an email! Note that the subject is treated as the file name,
|
||||||
|
so if you set the subject to ``Correspondent - Title - tag,tag,tag``, you'll
|
||||||
|
get what you expect. Also, you must include the aforementioned secret
|
||||||
|
string in every email so the fetcher knows that it's safe to import.
|
||||||
|
Note that Paperless only allows the email title to consist of safe characters
|
||||||
|
to be imported. These consist of alpha-numeric characters and ``-_ ,.'``.
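
If you want to check a subject line before sending, the safe-character rule
above can be approximated in the shell. This is a sketch, not the check
Paperless itself performs: ``is_safe_subject`` is a hypothetical helper, and it
assumes "alpha-numeric" means ASCII letters and digits.

.. code:: bash

    # Succeeds only if the subject is non-empty and consists solely of
    # ASCII letters, digits, and the characters - _ space , . '
    # (assumption: "alpha-numeric" is limited to ASCII here).
    is_safe_subject() {
        printf '%s\n' "$1" | grep -Eq "^[A-Za-z0-9 _,.'-]+\$"
    }

For example, ``is_safe_subject "Amazon - Invoice 2020 - bills,online"``
succeeds, while a subject containing parentheses or a slash is rejected.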
|
||||||
|
|
||||||
|
|
||||||
|
REST API
|
||||||
|
========
|
||||||
|
|
||||||
|
You can also submit a document using the REST API, see the API section for details.
|
||||||
|
|
||||||
|
|
||||||
|
.. _usage-recommended_workflow:
|
||||||
|
|
||||||
|
The recommended workflow
|
||||||
|
########################
|
||||||
|
|
||||||
|
Once you have familiarized yourself with paperless and are ready to use it
|
||||||
|
for all your documents, the recommended workflow for managing your documents
|
||||||
|
is as follows. This workflow also takes into account that some documents
|
||||||
|
have to be kept in physical form, but still ensures that you get all the
|
||||||
|
advantages for these documents as well.
|
||||||
|
|
||||||
|
Preparations in paperless
|
||||||
|
=========================
|
||||||
|
|
||||||
|
* Create an inbox tag that gets assigned to all new documents.
|
||||||
|
* Create a TODO tag.
|
||||||
|
|
||||||
|
Processing of the physical documents
|
||||||
|
====================================
|
||||||
|
|
||||||
|
Keep a physical inbox. Whenever you receive a document that you need to
|
||||||
|
archive, put it into your inbox. Regularly, do the following for all documents
|
||||||
|
in your inbox:
|
||||||
|
|
||||||
|
1. For each document, decide if you need to keep the document in physical
|
||||||
|
form. This applies to certain important documents, such as contracts and
|
||||||
|
certificates.
|
||||||
|
2. If you need to keep the document, write a running number on the document
|
||||||
|
before scanning, starting at one and counting upwards. This is the archive
|
||||||
|
serial number, or ASN in short.
|
||||||
|
3. Scan the document.
|
||||||
|
4. If the document has an ASN assigned, store it in a *single* binder, sorted
|
||||||
|
by ASN. Don't order this binder in any other way.
|
||||||
|
5. If the document has no ASN, throw it away. Yay!
|
||||||
|
|
||||||
|
Over time, you will notice that your physical binder will fill up. If it is
|
||||||
|
full, label the binder with the range of ASNs in this binder (e.g., "Documents
|
||||||
|
1 to 343"), store the binder in your cellar or elsewhere, and start a new
|
||||||
|
binder.
|
||||||
|
|
||||||
|
The idea behind this process is that you will never have to use the physical
|
||||||
|
binders to find a document. If you need a specific physical document, you
|
||||||
|
may find this document by:
|
||||||
|
|
||||||
|
1. Searching in paperless for the document.
|
||||||
|
2. Identifying the ASN of the document, since it appears on the scan.
|
||||||
|
3. Grabbing the relevant document binder and getting the document. This is easy since
|
||||||
|
they are sorted by ASN.
|
||||||
|
|
||||||
|
Processing of documents in paperless
|
||||||
|
====================================
|
||||||
|
|
||||||
|
Once you have scanned in a document, proceed in paperless as follows.
|
||||||
|
|
||||||
|
1. If the document has an ASN, assign the ASN to the document.
|
||||||
|
2. Assign a correspondent to the document (e.g., your employer, bank, etc.).
|
||||||
|
This isn't strictly necessary but helps in finding a document when you need
|
||||||
|
it.
|
||||||
|
3. Assign a document type (e.g., invoice, bank statement, etc.) to the document.
|
||||||
|
This isn't strictly necessary but helps in finding a document when you need
|
||||||
|
it.
|
||||||
|
4. Assign a proper title to the document (the name of an item you bought, the
|
||||||
|
subject of the letter, etc.).
|
||||||
|
5. Check that the date of the document is correct. Paperless tries to read
|
||||||
|
the date from the content of the document, but this fails sometimes if the
|
||||||
|
OCR is bad or multiple dates appear on the document.
|
||||||
|
6. Remove inbox tags from the documents.
|
||||||
|
|
||||||
|
|
||||||
|
Task management
|
||||||
|
===============
|
||||||
|
|
||||||
|
Some documents require attention and require you to act on the document. You
|
||||||
|
may take two different approaches to handle these documents based on how
|
||||||
|
regularly you intend to use paperless and scan documents.
|
||||||
|
|
||||||
|
* If you scan and process your documents in paperless regularly, assign a
|
||||||
|
TODO tag to all scanned documents that you need to process. Create a saved
|
||||||
|
view on the dashboard that shows all documents with this tag.
|
||||||
|
* If you do not scan documents regularly and use paperless solely for archiving,
|
||||||
|
create a physical todo box next to your physical inbox and put documents you
|
||||||
|
need to process in the TODO box. When you have performed the task associated with
|
||||||
|
the document, move it to the inbox.
|
@ -1,284 +0,0 @@
|
|||||||
.. _utilities:
|
|
||||||
|
|
||||||
Utilities
|
|
||||||
=========
|
|
||||||
|
|
||||||
There are basically three utilities to Paperless: the webserver, consumer, and
|
|
||||||
if needed, the exporter. They're all detailed here.
|
|
||||||
|
|
||||||
|
|
||||||
.. _utilities-webserver:
|
|
||||||
|
|
||||||
The Webserver
|
|
||||||
-------------
|
|
||||||
|
|
||||||
At the heart of it, Paperless is a simple Django webservice, and the entire
|
|
||||||
interface is based on Django's standard admin interface. Once running, visiting
|
|
||||||
the URL for your service delivers the admin, through which you can get a
|
|
||||||
detailed listing of all available documents, search for specific files, and
|
|
||||||
download whatever it is you're looking for.
|
|
||||||
|
|
||||||
|
|
||||||
.. _utilities-webserver-howto:
|
|
||||||
|
|
||||||
How to Use It
|
|
||||||
.............
|
|
||||||
|
|
||||||
The webserver is started via the ``manage.py`` script:
|
|
||||||
|
|
||||||
.. code-block:: shell-session
|
|
||||||
|
|
||||||
$ /path/to/paperless/src/manage.py runserver
|
|
||||||
|
|
||||||
By default, the server runs on localhost, port 8000, but you can change this
|
|
||||||
with a few arguments, run ``manage.py --help`` for more information.
|
|
||||||
|
|
||||||
Add the option ``--noreload`` to reduce resource usage. Otherwise, the server
|
|
||||||
continuously polls all source files for changes to auto-reload them.
|
|
||||||
|
|
||||||
Note that when exiting this command your webserver will disappear.
|
|
||||||
If you want to run this full-time (which is kind of the point)
|
|
||||||
you'll need to have it start in the background -- something you'll need to
|
|
||||||
figure out for your own system. To get you started though, there are Systemd
|
|
||||||
service files in the ``scripts`` directory.
|
|
||||||
|
|
||||||
|
|
||||||
.. _utilities-consumer:
|
|
||||||
|
|
||||||
The Consumer
|
|
||||||
------------
|
|
||||||
|
|
||||||
The consumer script runs in an infinite loop, constantly looking at a directory
|
|
||||||
for documents to parse and index. The process is pretty straightforward:
|
|
||||||
|
|
||||||
1. Look in ``CONSUMPTION_DIR`` for a document. If one is found, go to #2.
|
|
||||||
If not, wait 10 seconds and try again. On Linux, new documents are detected
|
|
||||||
instantly via inotify, so there's no waiting involved.
|
|
||||||
2. Parse the document with Tesseract
|
|
||||||
3. Create a new record in the database with the OCR'd text
|
|
||||||
4. Attempt to automatically assign document attributes by doing some guesswork.
|
|
||||||
Read up on the :ref:`guesswork documentation<guesswork>` for more
|
|
||||||
information about this process.
|
|
||||||
5. Encrypt the document (if you have a passphrase set) and store it in the
|
|
||||||
``media`` directory under ``documents/originals``.
|
|
||||||
6. Go to #1.
|
|
||||||
|
|
||||||
|
|
||||||
.. _utilities-consumer-howto:
|
|
||||||
|
|
||||||
How to Use It
|
|
||||||
.............
|
|
||||||
|
|
||||||
The consumer is started via the ``manage.py`` script:
|
|
||||||
|
|
||||||
.. code-block:: shell-session
|
|
||||||
|
|
||||||
$ /path/to/paperless/src/manage.py document_consumer
|
|
||||||
|
|
||||||
This starts the service that will consume documents as they appear in
|
|
||||||
``CONSUMPTION_DIR``.
|
|
||||||
|
|
||||||
Note that this command runs continuously, so exiting it will mean your consumer
|
|
||||||
disappears. If you want to run this full-time (which is kind of the point)
|
|
||||||
you'll need to have it start in the background -- something you'll need to
|
|
||||||
figure out for your own system. To get you started though, there are Systemd
|
|
||||||
service files in the ``scripts`` directory.
|
|
||||||
|
|
||||||
Some command line arguments are available to customize the behavior of the
|
|
||||||
consumer. By default it will use ``/etc/paperless.conf`` values. Display the
|
|
||||||
help with:
|
|
||||||
|
|
||||||
.. code-block:: shell-session
|
|
||||||
|
|
||||||
$ /path/to/paperless/src/manage.py document_consumer --help
|
|
||||||
|
|
||||||
.. _utilities-exporter:
|
|
||||||
|
|
||||||
The Exporter
|
|
||||||
------------
|
|
||||||
|
|
||||||
Tired of fiddling with Paperless, or just want to do something stupid and are
|
|
||||||
afraid of accidentally damaging your files? You can export all of your
|
|
||||||
documents into neatly named, dated, and unencrypted files.
|
|
||||||
|
|
||||||
|
|
||||||
.. _utilities-exporter-howto:
|
|
||||||
|
|
||||||
How to Use It
|
|
||||||
.............
|
|
||||||
|
|
||||||
This too is done via the ``manage.py`` script:
|
|
||||||
|
|
||||||
.. code-block:: shell-session
|
|
||||||
|
|
||||||
$ /path/to/paperless/src/manage.py document_exporter /path/to/somewhere/
|
|
||||||
|
|
||||||
This will dump all of your unencrypted documents into ``/path/to/somewhere``
|
|
||||||
for you to do with as you please. The files are accompanied with a special
|
|
||||||
file, ``manifest.json`` which can be used to :ref:`import the files
|
|
||||||
<utilities-importer>` at a later date if you wish.
|
|
||||||
|
|
||||||
|
|
||||||
.. _utilities-exporter-howto-docker:
|
|
||||||
|
|
||||||
Docker
|
|
||||||
______
|
|
||||||
|
|
||||||
If you are :ref:`using Docker <setup-installation-docker>`, running the
|
|
||||||
exporter is almost as easy. To mount a volume for exports, follow the
|
|
||||||
instructions in the ``docker-compose.yml.example`` file for the ``/export``
|
|
||||||
volume (making the changes in your own ``docker-compose.yml`` file, of course).
|
|
||||||
Once you have the volume mounted, the command to run an export is:
|
|
||||||
|
|
||||||
.. code-block:: shell-session
|
|
||||||
|
|
||||||
$ docker-compose run --rm consumer document_exporter /export
|
|
||||||
|
|
||||||
If you prefer to use ``docker run`` directly, supply the necessary command-line
|
|
||||||
options:
|
|
||||||
|
|
||||||
.. code-block:: shell-session
|
|
||||||
|
|
||||||
$ # Identify your containers
|
|
||||||
$ docker-compose ps
|
|
||||||
Name Command State Ports
|
|
||||||
-------------------------------------------------------------------------
|
|
||||||
paperless_consumer_1 /sbin/docker-entrypoint.sh ... Exit 0
|
|
||||||
paperless_webserver_1 /sbin/docker-entrypoint.sh ... Exit 0
|
|
||||||
|
|
||||||
$ # Make sure to replace your passphrase and remove or adapt the id mapping
|
|
||||||
$ docker run --rm \
|
|
||||||
--volumes-from paperless_data_1 \
|
|
||||||
--volume /path/to/arbitrary/place:/export \
|
|
||||||
-e PAPERLESS_PASSPHRASE=YOUR_PASSPHRASE \
|
|
||||||
-e USERMAP_UID=1000 -e USERMAP_GID=1000 \
|
|
||||||
paperless document_exporter /export
|
|
||||||
|
|
||||||
|
|
||||||
.. _utilities-importer:
|
|
||||||
|
|
||||||
The Importer
|
|
||||||
------------
|
|
||||||
|
|
||||||
Looking to transfer Paperless data from one instance to another, or just want
|
|
||||||
to restore from a backup? This is your go-to toy.
|
|
||||||
|
|
||||||
|
|
||||||
.. _utilities-importer-howto:
|
|
||||||
|
|
||||||
How to Use It
|
|
||||||
.............
|
|
||||||
|
|
||||||
The importer works just like the exporter. You point it at a directory, and
|
|
||||||
the script does the rest of the work:
|
|
||||||
|
|
||||||
.. code-block:: shell-session
|
|
||||||
|
|
||||||
$ /path/to/paperless/src/manage.py document_importer /path/to/somewhere/
|
|
||||||
|
|
||||||
Docker
|
|
||||||
______
|
|
||||||
|
|
||||||
Assuming that you've already gone through the steps above in the
|
|
||||||
:ref:`export <utilities-exporter-howto-docker>` section, then the easiest thing
|
|
||||||
to do is just re-use the ``/export`` path you already set up:
|
|
||||||
|
|
||||||
.. code-block:: shell-session
|
|
||||||
|
|
||||||
$ docker-compose run --rm consumer document_importer /export
|
|
||||||
|
|
||||||
Similarly, if you're not using docker-compose, you can adjust the export
|
|
||||||
instructions above to do the import.
|
|
||||||
|
|
||||||
|
|
||||||
.. _utilities-retagger:
|
|
||||||
|
|
||||||
Re-running your tagging and correspondent matchers
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
Say you've imported a few hundred documents and now want to introduce
|
|
||||||
a tag or set up a new correspondent, and apply its matching to all of
|
|
||||||
the currently-imported docs. This problem is common enough that
|
|
||||||
there are tools for it.
|
|
||||||
|
|
||||||
|
|
||||||
.. _utilities-retagger-howto:
|
|
||||||
|
|
||||||
How to Do It
|
|
||||||
............
|
|
||||||
|
|
||||||
This too is done via the ``manage.py`` script:
|
|
||||||
|
|
||||||
.. code:: bash
|
|
||||||
|
|
||||||
$ /path/to/paperless/src/manage.py document_retagger
|
|
||||||
|
|
||||||
Run this after changing or adding tagging rules. It'll loop over all
|
|
||||||
of the documents in your database and attempt to match all of your
|
|
||||||
tags to them. If one matches, it'll be applied. And don't worry, you
|
|
||||||
can run this as often as you like, it won't double-tag a document.
|
|
||||||
|
|
||||||
.. code:: bash
|
|
||||||
|
|
||||||
$ /path/to/paperless/src/manage.py document_correspondents
|
|
||||||
|
|
||||||
This is the similar command to run after adding or changing a correspondent.
|
|
||||||
|
|
||||||
.. _utilities-encyption:
|
|
||||||
|
|
||||||
Enabling Encryption
|
|
||||||
-------------------
|
|
||||||
|
|
||||||
Let's say you've imported a few documents to play around with paperless and now
|
|
||||||
you are using it more seriously and want to enable encryption of your files.
|
|
||||||
|
|
||||||
.. _utilities-encryption-howto:
|
|
||||||
|
|
||||||
Basic Syntax
|
|
||||||
.............
|
|
||||||
|
|
||||||
Again we'll use the ``manage.py`` script, passing ``change_storage_type``:
|
|
||||||
|
|
||||||
.. code:: console
|
|
||||||
|
|
||||||
$ /path/to/paperless/src/manage.py change_storage_type --help
|
|
||||||
usage: manage.py change_storage_type [-h] [--version] [-v {0,1,2,3}]
|
|
||||||
[--settings SETTINGS]
|
|
||||||
[--pythonpath PYTHONPATH] [--traceback]
|
|
||||||
[--no-color] [--passphrase PASSPHRASE]
|
|
||||||
{gpg,unencrypted} {gpg,unencrypted}
|
|
||||||
|
|
||||||
This is how you migrate your stored documents from an encrypted state to an
|
|
||||||
unencrypted one (or vice-versa)
|
|
||||||
|
|
||||||
positional arguments:
|
|
||||||
{gpg,unencrypted} The state you want to change your documents from
|
|
||||||
{gpg,unencrypted} The state you want to change your documents to
|
|
||||||
|
|
||||||
optional arguments:
|
|
||||||
--passphrase PASSPHRASE
|
|
||||||
If PAPERLESS_PASSPHRASE isn't set already, you need to
|
|
||||||
specify it here
|
|
||||||
|
|
||||||
Enabling Encryption
|
|
||||||
...................
|
|
||||||
|
|
||||||
Basic usage to enable encryption of your document store (**USE A MORE SECURE PASSPHRASE**):
|
|
||||||
|
|
||||||
(Note: If ``PAPERLESS_PASSPHRASE`` isn't set already, you need to specify it here)
|
|
||||||
|
|
||||||
.. code:: bash
|
|
||||||
|
|
||||||
$ /path/to/paperless/src/manage.py change_storage_type [--passphrase SECR3TP4SSPHRA$E] unencrypted gpg
|
|
||||||
|
|
||||||
|
|
||||||
Disabling Encryption
|
|
||||||
....................
|
|
||||||
|
|
||||||
Basic usage to disable encryption of your document store:
|
|
||||||
|
|
||||||
(Note: Again, if ``PAPERLESS_PASSPHRASE`` isn't set already, you need to specify it here)
|
|
||||||
|
|
||||||
.. code:: bash
|
|
||||||
|
|
||||||
$ /path/to/paperless/src/manage.py change_storage_type [--passphrase SECR3TP4SSPHRA$E] gpg unencrypted
|
|