Merge branch 'master' into master

2025-12-24 02:05:48 -06:00 · 2020-12-19 13:49:01 +01:00
parent 3abf6b6b2f 45bf921a0a
commit 4317f5c9d8
602 changed files with 35301 additions and 35494 deletions
--- a/docs/_static/Screenshot_first_logged.png
+++ b/docs/_static/Screenshot_first_logged.png
--- a/docs/_static/Screenshot_first_run_login.png
+++ b/docs/_static/Screenshot_first_run_login.png
--- a/docs/_static/Screenshot_upload_and_scanned.png
+++ b/docs/_static/Screenshot_upload_and_scanned.png
--- a/docs/_static/lxc-install.svg
+++ b/docs/_static/lxc-install.svg
--- a/docs/_static/recommended_workflow.png
+++ b/docs/_static/recommended_workflow.png
--- a/docs/_static/screenshots/correspondents.png
+++ b/docs/_static/screenshots/correspondents.png
--- a/docs/_static/screenshots/dashboard.png
+++ b/docs/_static/screenshots/dashboard.png
--- a/docs/_static/screenshots/documents-filter.png
+++ b/docs/_static/screenshots/documents-filter.png
--- a/docs/_static/screenshots/documents-largecards.png
+++ b/docs/_static/screenshots/documents-largecards.png
--- a/docs/_static/screenshots/documents-smallcards.png
+++ b/docs/_static/screenshots/documents-smallcards.png
--- a/docs/_static/screenshots/documents-table.png
+++ b/docs/_static/screenshots/documents-table.png
--- a/docs/_static/screenshots/editing.png
+++ b/docs/_static/screenshots/editing.png
--- a/docs/_static/screenshots/logs.png
+++ b/docs/_static/screenshots/logs.png
--- a/docs/_static/screenshots/mail-rules-edited.png
+++ b/docs/_static/screenshots/mail-rules-edited.png
--- a/docs/_static/screenshots/mobile.png
+++ b/docs/_static/screenshots/mobile.png
--- a/docs/_static/screenshots/new-tag.png
+++ b/docs/_static/screenshots/new-tag.png
--- a/docs/_static/screenshots/search-preview.png
+++ b/docs/_static/screenshots/search-preview.png
--- a/docs/_static/screenshots/search-results.png
+++ b/docs/_static/screenshots/search-results.png
--- a/docs/administration.rst
+++ b/docs/administration.rst
@@ -0,0 +1,415 @@
+
+**************
+Administration
+**************
+
+.. _administration-backup:
+
+Making backups
+##############
+
+Multiple options exist for making backups of your paperless instance,
+depending on how you installed paperless.
+
+Before making backups, make sure that paperless is not running.
+
+Options available to any installation of paperless:
+
+*   Use the :ref:`document exporter <utilities-exporter>`.
+    The document exporter exports all your documents, thumbnails and
+    metadata to a specific folder. You may import your documents into a
+    fresh instance of paperless again or store your documents in another
+    DMS with this export.
+
+Options available to docker installations:
+
+*   Backup the docker volumes. These usually reside within
+    ``/var/lib/docker/volumes`` on the host and you need to be root in order
+    to access them.
+
+    Paperless uses 3 volumes:
+
+    *   ``paperless_media``: This is where your documents are stored.
+    *   ``paperless_data``: This is where auxiliary data is stored. This
+        folder also contains the SQLite database, if you use it.
+    *   ``paperless_pgdata``: Exists only if you use PostgreSQL and contains
+        the database.
+
+Options available to bare-metal and non-docker installations:
+
+*   Backup the entire paperless folder. This ensures that if your paperless instance
+    crashes at some point or your disk fails, you can simply copy the folder back
+    into place and it works.
+
+    When using PostgreSQL, you'll also have to backup the database.
+
+.. _migrating-restoring:
+
+Restoring
+=========
+
+
+
+
+.. _administration-updating:
+
+Updating paperless
+##################
+
+If a new release of paperless-ng is available, upgrading depends on how you
+installed paperless-ng in the first place. The releases are available on the
+`release page <https://github.com/jonaswinkler/paperless-ng/releases>`_.
+
+First of all, ensure that paperless is stopped.
+
+.. code:: shell-session
+
+    $ cd /path/to/paperless
+    $ docker-compose down
+
+After that, :ref:`make a backup <administration-backup>`.
+
+A.  If you used the dockerfiles archive, simply download the files of the new release,
+    adjust the settings in the files (e.g., the path to your consumption directory),
+    and replace your existing docker-compose files. Then start paperless as usual,
+    which will pull the new image, and update your database, if necessary:
+
+    .. code:: shell-session
+
+        $ cd /path/to/paperless
+        $ docker-compose up
+
+    If you see everything working, you can start paperless-ng with "-d" to have it
+    run in the background.
+
+    .. hint::
+
+        The released docker-compose files specify exact versions to be pulled from the hub.
+        This is to ensure that if the docker-compose files should change at some point
+        (e.g., services updated or configured differently), you won't run into trouble due to
+        docker pulling the ``latest`` image and running it in an older environment.
+        
+B.  If you built the image yourself, grab the new archive and replace your current
+    paperless folder with the new contents.
+
+    After that, make the necessary adjustments to the docker-compose.yml (e.g.,
+    adjust your consumption directory).
+
+    Build and start the new image with:
+
+    .. code:: shell-session
+
+        $ cd /path/to/paperless
+        $ docker-compose build
+        $ docker-compose up
+
+    If you see everything working, you can start paperless-ng with "-d" to have it
+    run in the background.
+
+.. hint::
+
+    You can usually keep your ``docker-compose.env`` file, since this file will
+    never include mandatory configuration options. However, it is worth checking
+    out the new version of this file, since it might have new recommendations
+    on what to configure.
+
+
+Updating paperless without docker
+=================================
+
+After grabbing the new release and unpacking the contents, do the following:
+
+1.  Update dependencies. New paperless versions may require additional
+    dependencies. The dependencies required are listed in the section about 
+    :ref:`bare metal installations <setup-bare_metal>`.
+
+2.  Update python requirements. If you use Pipenv, this is done with the following steps.
+
+    .. code:: shell-session
+
+        $ pip install --upgrade pipenv
+        $ cd /path/to/paperless
+        $ pipenv clean
+        $ pipenv install
+
+    This creates a new virtual environment (or uses your existing environment)
+    and installs all dependencies into it.
+
+3.  Collect static files.
+
+    .. code:: shell-session
+
+        $ cd src
+        $ pipenv run python3 manage.py collectstatic --clear
+    
+4.  Migrate the database.
+
+    .. code:: shell-session
+
+        $ cd src
+        $ pipenv run python3 manage.py migrate
+
+        
+Management utilities
+####################
+
+Paperless comes with some management commands that perform various maintenance
+tasks on your paperless instance. You can invoke these commands either by
+
+.. code:: shell-session
+
+    $ cd /path/to/paperless
+    $ docker-compose run --rm webserver <command> <arguments>
+
+or
+
+.. code:: shell-session
+
+    $ cd /path/to/paperless/src
+    $ pipenv run python manage.py <command> <arguments>
+
+depending on whether you use docker or not.
+
+All commands have built-in help, which can be accessed by executing them with
+the argument ``--help``.
+
+.. _utilities-exporter:
+
+Document exporter
+=================
+
+The document exporter exports all your data from paperless into a folder for
+backup or migration to another DMS.
+
+.. code::
+
+    document_exporter target
+
+``target`` is a folder to which the data gets written. This includes documents,
+thumbnails and a ``manifest.json`` file. The manifest contains all metadata from
+the database (correspondents, tags, etc).
+
+When you use the provided docker compose script, specify ``../export`` as the
+target. This path inside the container is automatically mounted on your host on
+the folder ``export``.
+
+
+.. _utilities-importer:
+
+Document importer
+=================
+
+The document importer takes the export produced by the `Document exporter`_ and
+imports it into paperless.
+
+The importer works just like the exporter.  You point it at a directory, and
+the script does the rest of the work:
+
+.. code::
+
+    document_importer source
+
+When you use the provided docker compose script, put the export inside the
+``export`` folder in your paperless source directory. Specify ``../export``
+as the ``source``.
+
+
+.. _utilities-retagger:
+
+Document retagger
+=================
+
+Say you've imported a few hundred documents and now want to introduce
+a tag or set up a new correspondent, and apply its matching to all of
+the currently-imported docs. This problem is common enough that
+there are tools for it.
+
+.. code::
+
+    document_retagger [-h] [-c] [-T] [-t] [-i] [--use-first] [-f]
+
+    optional arguments:
+    -c, --correspondent
+    -T, --tags
+    -t, --document_type
+    -i, --inbox-only
+    --use-first
+    -f, --overwrite
+
+Run this after changing or adding matching rules. It'll loop over all
+of the documents in your database and attempt to match documents
+according to the new rules.
+
+Specify any combination of ``-c``, ``-T`` and ``-t`` to have the
+retagger perform matching of the specified metadata type. If you don't
+specify any of these options, the document retagger won't do anything.
+
+Specify ``-i`` to have the document retagger work on documents tagged
+with inbox tags only. This is useful when you don't want to mess with
+your already processed documents.
+
+When multiple document types or correspondents match a single document,
+the retagger won't assign these to the document. Specify ``--use-first``
+to override this behavior and just use the first correspondent or type
+it finds. This option does not apply to tags, since any amount of tags
+can be applied to a document.
+
+Finally, ``-f`` specifies that you wish to overwrite already assigned
+correspondents, types and/or tags. The default behavior is to not
+assign correspondents and types to documents that have this data already
+assigned. ``-f`` works differently for tags: By default, only additional tags get
+added to documents, no tags will be removed. With ``-f``, tags that don't
+match a document anymore get removed as well.
+
+
+Managing the Automatic matching algorithm
+=========================================
+
+The *Auto* matching algorithm requires a trained neural network to work.
+This network needs to be updated whenever something in your data
+changes. The docker image takes care of that automatically with the task
+scheduler. You can manually renew the classifier by invoking the following
+management command:
+
+.. code::
+
+    document_create_classifier
+
+This command takes no arguments.
+
+.. _`administration-index`:
+
+Managing the document search index
+==================================
+
+The document search index is responsible for delivering search results for the
+website. The document index is automatically updated whenever documents get
+added to, changed, or removed from paperless. However, if the search yields
+non-existing documents or won't find anything, you may need to recreate the
+index manually.
+
+.. code::
+
+    document_index {reindex,optimize}
+
+Specify ``reindex`` to have the index created from scratch. This may take some
+time.
+
+Specify ``optimize`` to optimize the index. This updates certain aspects of
+the index and usually makes queries faster and also ensures that the
+autocompletion works properly. This command is regularly invoked by the task
+scheduler.
+
+.. _utilities-renamer:
+
+Managing filenames
+==================
+
+If you use paperless' feature to
+:ref:`assign custom filenames to your documents <advanced-file_name_handling>`,
+you can use this command to move all your files after changing
+the naming scheme.
+
+.. warning::
+
+    Since this command moves your documents around a lot, it is advised to make
+    a backup beforehand. The renaming logic is robust and will never overwrite
+    or delete a file, but you can never be too careful.
+
+.. code::
+
+    document_renamer
+
+The command takes no arguments and processes all your documents at once.
+
+
+Fetching e-mail
+===============
+
+Paperless automatically fetches your e-mail every 10 minutes by default. If
+you want to invoke the email consumer manually, call the following management
+command:
+
+.. code::
+
+    mail_fetcher
+
+The command takes no arguments and processes all your mail accounts and rules.
+
+.. _utilities-archiver:
+
+Creating archived documents
+===========================
+
+Paperless stores archived PDF/A documents alongside your original documents.
+These archived documents will also contain selectable text for image-only
+originals.
+These documents are derived from the originals, which are always stored
+unmodified. If coming from an earlier version of paperless, your documents
+won't have archived versions.
+
+This command creates PDF/A documents for your documents.
+
+.. code::
+
+    document_archiver --overwrite --document <id>
+
+This command will only attempt to create archived documents when no archived
+document exists yet, unless ``--overwrite`` is specified. If ``--document <id>``
+is specified, the archiver will only process that document.
+
+.. note::
+
+    This command essentially performs OCR on all your documents again,
+    according to your settings. If you run this with ``PAPERLESS_OCR_MODE=redo``,
+    it will potentially run for a very long time. You can cancel the command
+    at any time, since this command will skip already archived versions the next time
+    it is run.
+
+.. note::
+
+    Some documents will cause errors and cannot be converted into PDF/A documents,
+    such as encrypted PDF documents. The archiver will skip over these documents
+    each time it sees them.
+
+.. _utilities-encyption:
+
+Managing encryption
+===================
+
+Documents can be stored in Paperless using GnuPG encryption.
+
+.. danger::
+
+    Encryption is deprecated since paperless-ng 0.9 and doesn't really provide any
+    additional security, since you have to store the passphrase in a configuration
+    file on the same system as the encrypted documents for paperless to work.
+    Furthermore, the entire text content of the documents is stored plain in the
+    database, even if your documents are encrypted. Filenames are not encrypted
+    either.
+    
+    Also, the web server provides transparent access to your encrypted documents.
+
+    Consider running paperless on an encrypted filesystem instead, which will then
+    at least provide security against physical hardware theft.
+
+
+Enabling encryption
+-------------------
+
+Enabling encryption is no longer supported.
+
+
+Disabling encryption
+--------------------
+
+Basic usage to disable encryption of your document store:
+
+(Note: If ``PAPERLESS_PASSPHRASE`` isn't set already, you need to specify it here)
+
+.. code::
+
+    decrypt_documents [--passphrase SECR3TP4SSPHRA$E]
+
+
+.. _Pipenv: https://pipenv.pypa.io/en/latest/
--- a/docs/advanced_usage.rst
+++ b/docs/advanced_usage.rst
@@ -0,0 +1,342 @@
+***************
+Advanced topics
+***************
+
+Paperless offers a couple features that automate certain tasks and make your life
+easier.
+
+Guesswork
+#########
+
+
+Any document you put into the consumption directory will be consumed, but if
+you name the file right, it'll automatically set some values in the database
+for you.  This is the logic the consumer follows:
+
+1. Try to find the correspondent, title, and tags in the file name following
+   the pattern: ``Date - Correspondent - Title - tag,tag,tag.pdf``.  Note that
+   the format of the date is **rigidly defined** as ``YYYYMMDDHHMMSSZ`` or
+   ``YYYYMMDDZ``.  The ``Z`` refers to "Zulu time", AKA "UTC".
+   The tags are optional, so the format ``Date - Correspondent - Title.pdf``
+   works as well.
+2. If that doesn't work, we skip the date and try this pattern:
+   ``Correspondent - Title - tag,tag,tag.pdf``.
+3. If that doesn't work, we try to find the correspondent and title in the file
+   name following the pattern: ``Correspondent - Title.pdf``.
+4. If that doesn't work, just assume that the name of the file is the title.
+
+So given the above, the following examples would work as you'd expect:
+
+* ``20150314000700Z - Some Company Name - Invoice 2016-01-01 - money,invoices.pdf``
+* ``20150314Z - Some Company Name - Invoice 2016-01-01 - money,invoices.pdf``
+* ``Some Company Name - Invoice 2016-01-01 - money,invoices.pdf``
+* ``Another Company - Letter of Reference.jpg``
+* ``Dad's Recipe for Pancakes.png``
+
+These however wouldn't work:
+
+* ``2015-03-14 00:07:00 UTC - Some Company Name, Invoice 2016-01-01, money, invoices.pdf``
+* ``2015-03-14 - Some Company Name, Invoice 2016-01-01, money, invoices.pdf``
+* ``Some Company Name, Invoice 2016-01-01, money, invoices.pdf``
+* ``Another Company- Letter of Reference.jpg``
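
The fallback order above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the consumer's actual code: the function name ``guess_metadata`` and the approach of splitting on `` - `` are made up for this example.

```python
import re

# Date prefix as described above: YYYYMMDDHHMMSSZ or YYYYMMDDZ.
DATE_RE = re.compile(r"\d{8}(?:\d{6})?Z")


def guess_metadata(filename):
    """Guess date, correspondent, title and tags from a file name (hypothetical helper)."""
    stem = filename.rsplit(".", 1)[0]  # drop the extension
    parts = stem.split(" - ")
    # Step 4 is the default: the whole name is the title.
    guess = {"date": None, "correspondent": None, "title": stem, "tags": []}
    if len(parts) in (3, 4) and DATE_RE.fullmatch(parts[0]):
        # Step 1: Date - Correspondent - Title [- tag,tag,tag]
        guess.update(date=parts[0], correspondent=parts[1], title=parts[2])
        if len(parts) == 4:
            guess["tags"] = parts[3].split(",")
    elif len(parts) == 3:
        # Step 2: Correspondent - Title - tag,tag,tag
        guess.update(correspondent=parts[0], title=parts[1], tags=parts[2].split(","))
    elif len(parts) == 2:
        # Step 3: Correspondent - Title
        guess.update(correspondent=parts[0], title=parts[1])
    return guess
```

With this sketch, ``Another Company- Letter of Reference.jpg`` falls through to step 4 because the separator `` - `` never occurs, so the whole name becomes the title.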
+
+Do I have to be so strict about naming?
+=======================================
+
+Rather than using the strict document naming rules, one can also set the option
+``PAPERLESS_FILENAME_DATE_ORDER`` in ``paperless.conf`` to any date order
+that is accepted by dateparser_. Doing so will cause ``paperless`` to default
+to any date format that is found in the title, instead of a date pulled from
+the document's text, without requiring the strict formatting of the document
+filename as described above.
+
+.. _dateparser: https://github.com/scrapinghub/dateparser/blob/v0.7.0/docs/usage.rst#settings
+
+.. _advanced-transforming_filenames:
+
+Transforming filenames for parsing
+==================================
+
+Some devices can't produce filenames that can be parsed by the default
+parser. By configuring the option ``PAPERLESS_FILENAME_PARSE_TRANSFORMS`` in
+``paperless.conf`` one can add transformations that are applied to the filename
+before it's parsed.
+
+The option contains a list of dictionaries of regular expressions (key:
+``pattern``) and replacements (key: ``repl``) in JSON format, which are
+applied in order by passing them to ``re.subn``. Transformation stops
+after the first match, so at most one transformation is applied. The general
+syntax is
+
+.. code:: python
+
+   [{"pattern":"pattern1", "repl":"repl1"}, {"pattern":"pattern2", "repl":"repl2"}, ..., {"pattern":"patternN", "repl":"replN"}]
+
+The example below is for a Brother ADS-2400N, a scanner that allows assigning
+different names to different hardware buttons (useful for handling
+multiple entities in one instance), but insists on adding ``_<count>``
+to the filename.
+
+.. code:: python
+
+   # Brother profile configuration, support "Name_Date_Count" (the default
+   # setting) and "Name_Count" (use "Name" as tag and "Count" as title).
+   PAPERLESS_FILENAME_PARSE_TRANSFORMS=[{"pattern":"^([a-z]+)_(\\d{8})_(\\d{6})_([0-9]+)\\.", "repl":"\\2\\3Z - \\4 - \\1."}, {"pattern":"^([a-z]+)_([0-9]+)\\.", "repl":" - \\2 - \\1."}]
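
To see what these two transformations do, the "first match wins" behaviour described above can be reproduced with ``json`` and ``re.subn``. This is a sketch only: the helper name ``apply_transforms`` and the sample file names are assumptions, and paperless may pass additional flags to ``re.subn`` that are not shown here.

```python
import json
import re

# The JSON value from the Brother example above, kept in a raw string so the
# doubled backslashes reach the JSON parser unchanged.
RAW_TRANSFORMS = r'''
[{"pattern": "^([a-z]+)_(\\d{8})_(\\d{6})_([0-9]+)\\.", "repl": "\\2\\3Z - \\4 - \\1."},
 {"pattern": "^([a-z]+)_([0-9]+)\\.", "repl": " - \\2 - \\1."}]
'''


def apply_transforms(filename, transforms):
    """Apply the first transformation whose pattern matches (hypothetical helper)."""
    for transform in transforms:
        new_name, count = re.subn(transform["pattern"], transform["repl"], filename)
        if count:
            # Stop after the first match, so at most one transformation is applied.
            return new_name
    return filename


transforms = json.loads(RAW_TRANSFORMS)
```

For a hypothetical scan named ``invoices_20201219_134901_0001.pdf``, the first pattern rewrites the name into the ``Date - Correspondent - Title`` shape that the default parser understands.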
+
+
+.. _advanced-matching:
+
+Matching tags, correspondents and document types
+################################################
+
+After the consumer has tried to figure out what it could from the file name,
+it starts looking at the content of the document itself.  It will compare the
+matching algorithms defined by every tag and correspondent already set in your
+database to see if they apply to the text in that document.  In other words,
+if you defined a tag called ``Home Utility`` that had a ``match`` property of
+``bc hydro`` and a ``matching_algorithm`` of ``literal``, Paperless will
+automatically tag your newly-consumed document with your ``Home Utility`` tag
+so long as the text ``bc hydro`` appears in the body of the document somewhere.
+
+The matching logic is quite powerful, and supports searching the text of your
+document with different algorithms, and as such, some experimentation may be
+necessary to get things right.
+
+In order to have a tag, correspondent or type assigned automatically to newly
+consumed documents, assign a match and matching algorithm using the web
+interface. These settings define when to assign correspondents, tags and types
+to documents.
+
+The following algorithms are available:
+
+* **Any:** Looks for any occurrence of any word provided in match in the PDF.
+  If you define the match as ``Bank1 Bank2``, it will match documents containing
+  either of these terms.
+* **All:** Requires that every word provided appears in the PDF, albeit not in the
+  order provided.
+* **Literal:** Matches only if the match appears exactly as provided in the PDF.
+* **Regular expression:** Parses the match as a regular expression and tries to
+  find a match within the document.
+* **Fuzzy match:** I don't know. Look at the source.
+* **Auto:** Tries to automatically match new documents. This does not require you
+  to set a match. See the notes below.
+
+When using the "any" or "all" matching algorithms, you can search for terms
+that consist of multiple words by enclosing them in double quotes. For example,
+defining a match text of ``"Bank of America" BofA`` using the "any" algorithm,
+will match documents that contain either "Bank of America" or "BofA", but will
+not match documents containing "Bank of South America".
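
The difference between "any", "all" and "literal", including quoted multi-word terms, can be illustrated with a small sketch. It is not the matching code paperless ships: the function name ``matches``, the case-insensitive comparison and the word-boundary check are assumptions made for this example.

```python
import re
import shlex


def _contains(term, text):
    # Whole-word search; word boundaries are an assumption of this sketch.
    return re.search(r"\b" + re.escape(term) + r"\b", text) is not None


def matches(algorithm, match, text):
    """Return True if ``match`` applies to ``text`` under the given algorithm."""
    text = text.lower()
    if algorithm == "literal":
        return _contains(match.lower(), text)
    # shlex.split keeps double-quoted phrases together as a single term.
    terms = [term.lower() for term in shlex.split(match)]
    if algorithm == "any":
        return any(_contains(term, text) for term in terms)
    if algorithm == "all":
        return all(_contains(term, text) for term in terms)
    raise ValueError(f"unsupported algorithm: {algorithm}")
```

Because the quoted phrase is treated as one term, a document mentioning "Bank of South America" is not matched by ``"Bank of America" BofA``.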
+
+Then just save your tag/correspondent and run another document through the
+consumer.  Once complete, you should see the newly-created document,
+automatically tagged with the appropriate data.
+
+
+.. _advanced-automatic_matching:
+
+Automatic matching
+==================
+
+Paperless-ng comes with a new matching algorithm called *Auto*. This matching
+algorithm tries to assign tags, correspondents and document types to your
+documents based on how you have assigned these on existing documents. It
+uses a neural network under the hood.
+
+If, for example, all your bank statements of your account 123 at the Bank of
+America are tagged with the tag "bofa_123" and the matching algorithm of this
+tag is set to *Auto*, this neural network will examine your documents and
+automatically learn when to assign this tag.
+
+Paperless tries to hide much of the involved complexity with this approach.
+However, there are a couple caveats you need to keep in mind when using this
+feature:
+
+* Changes to your documents are not immediately reflected by the matching
+  algorithm. The neural network needs to be *trained* on your documents after
+  changes. Paperless periodically (default: once each hour) checks for changes
+  and does this automatically for you.
+* The Auto matching algorithm only takes documents into account which are NOT
+  placed in your inbox (i.e., documents that have no inbox tags assigned).
+  This ensures that the neural network only learns from documents which you
+  have correctly tagged before.
+* The matching algorithm can only work if there is a correlation between the
+  tag, correspondent or document type and the document itself. Your bank
+  statements usually contain your bank account number and the name of the bank,
+  so this works reasonably well. However, tags such as "TODO" cannot be
+  automatically assigned.
+* The matching algorithm needs a reasonable number of documents to identify when
+  to assign tags, correspondents, and types. If one out of a thousand documents
+  has the correspondent "Very obscure web shop I bought something five years
+  ago", it will probably not assign this correspondent automatically if you buy
+  something from them again. The more documents, the better.
+* Paperless also needs a reasonable amount of negative examples to decide when
+  not to assign a certain tag, correspondent or type. This will usually be the
+  case as you start filling up paperless with documents. Example: If all your
+  documents are either from "Webshop" or "Bank", paperless will assign one of
+  these correspondents to ANY new document, if both are set to automatic matching.
+
+Hooking into the consumption process
+####################################
+
+Sometimes you may want to do something arbitrary whenever a document is
+consumed.  Rather than try to predict what you may want to do, Paperless lets
+you execute scripts of your own choosing just before or after a document is
+consumed using a couple simple hooks.
+
+Just write a script, put it somewhere that Paperless can read & execute, and
+then put the path to that script in ``paperless.conf`` with the variable name
+of either ``PAPERLESS_PRE_CONSUME_SCRIPT`` or
+``PAPERLESS_POST_CONSUME_SCRIPT``.
+
+.. important::
+
+    These scripts are executed in a **blocking** process, which means that if
+    a script takes a long time to run, it can significantly slow down your
+    document consumption flow.  If you want things to run asynchronously,
+    you'll have to fork the process in your script and exit.
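
If the hook itself is written in Python, one way to avoid blocking the consumer is to start the slow work as a detached child process and return immediately. The sketch below uses only the standard library; the helper names are hypothetical, and whether a detached child suits your setup (for example inside a container) is something to verify yourself.

```python
import subprocess
import sys


def run_detached(command):
    """Start ``command`` in the background and return its PID without waiting."""
    process = subprocess.Popen(
        command,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # detach from the caller's session (POSIX only)
    )
    return process.pid


def run_python_detached(script_args):
    """Convenience wrapper: run the current Python interpreter with the given arguments."""
    return run_detached([sys.executable, *script_args])
```

The hook script would call ``run_python_detached`` with the path of the long-running script and then exit, so the consumer is not held up.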
+
+
+Pre-consumption script
+======================
+
+Executed after the consumer sees a new document in the consumption folder, but
+before any processing of the document is performed. This script receives exactly
+one argument:
+
+* Document file name
+
+A simple but common example for this would be creating a simple script like
+this:
+
+``/usr/local/bin/ocr-pdf``
+
+.. code:: bash
+
+    #!/usr/bin/env bash
+    pdf2pdfocr.py -i "${1}"
+
+``/etc/paperless.conf``
+
+.. code:: bash
+
+    ...
+    PAPERLESS_PRE_CONSUME_SCRIPT="/usr/local/bin/ocr-pdf"
+    ...
+
+This will pass the path to the document about to be consumed to ``/usr/local/bin/ocr-pdf``,
+which will in turn call `pdf2pdfocr.py`_ on your document. That tool overwrites
+the file with an OCR'd version and exits, at which point the consumption
+process begins with the newly modified file.
+
+.. _pdf2pdfocr.py: https://github.com/LeoFCardoso/pdf2pdfocr
+
+.. _advanced-post_consume_script:
+
+Post-consumption script
+=======================
+
+Executed after the consumer has successfully processed a document and has moved it
+into paperless. It receives the following arguments:
+
+* Document id
+* Generated file name
+* Source path
+* Thumbnail path
+* Download URL
+* Thumbnail URL
+* Correspondent
+* Tags
+
+The script can be in any language you like, but for a simple shell script
+example, you can take a look at ``post-consumption-example.sh`` in the
+``scripts`` directory in this project.
+
+The post consumption script cannot cancel the consumption process.
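
Since the script can be written in any language, here is a minimal Python sketch that gives the positional arguments names. It assumes the arguments arrive in exactly the order listed above; the field names and the functions ``parse_arguments`` and ``main`` are illustrative and not part of paperless.

```python
import sys

# Assumed to mirror the order of the argument list above.
FIELDS = (
    "document_id",
    "file_name",
    "source_path",
    "thumbnail_path",
    "download_url",
    "thumbnail_url",
    "correspondent",
    "tags",
)


def parse_arguments(argv):
    """Map the positional arguments (argv[0] is the script name) to named fields."""
    values = argv[1:]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} arguments, got {len(values)}")
    return dict(zip(FIELDS, values))


def main(argv):
    info = parse_arguments(argv)
    print(f"Consumed document {info['document_id']}: {info['file_name']}")
    return 0
```

To use it as a hook, end the file with ``sys.exit(main(sys.argv))``, make it executable and point ``PAPERLESS_POST_CONSUME_SCRIPT`` at it.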
+
+.. _advanced-file_name_handling:
+
+File name handling
+##################
+
+By default, paperless stores your documents in the media directory and renames them
+using the identifier which it has assigned to each document. You will end up getting
+files like ``0000123.pdf`` in your media directory. This isn't necessarily a bad
+thing, because you normally don't have to access these files manually. However, if
+you wish to name your files differently, you can do that by adjusting the
+``PAPERLESS_FILENAME_FORMAT`` configuration option.
+
+This variable allows you to configure the filename (folders are allowed) using
+placeholders. For example, configuring this to
+
+.. code:: bash
+
+    PAPERLESS_FILENAME_FORMAT={created_year}/{correspondent}/{title}
+
+will create a directory structure as follows:
+
+.. code::
+
+    2019/
+      My bank/
+        Statement January.pdf
+        Statement February.pdf
+    2020/
+      My bank/
+        Statement January.pdf
+        Letter.pdf
+        Letter_01.pdf
+      Shoe store/
+        My new shoes.pdf
+
+.. danger::
+
+    Do not manually move your files in the media folder. Paperless remembers the
+    last filename a document was stored as. If you do rename a file, paperless will
+    report your files as missing and won't be able to find them.
+
+Paperless provides the following placeholders within filenames:
+
+* ``{correspondent}``: The name of the correspondent, or "none".
+* ``{document_type}``: The name of the document type, or "none".
+* ``{tag_list}``: A comma separated list of all tags assigned to the document.
+* ``{title}``: The title of the document.
+* ``{created}``: The full date and time the document was created.
+* ``{created_year}``: Year created only.
+* ``{created_month}``: Month created only (number 1-12).
+* ``{created_day}``: Day created only (number 1-31).
+* ``{added}``: The full date and time the document was added to paperless.
+* ``{added_year}``: Year added only.
+* ``{added_month}``: Month added only (number 1-12).
+* ``{added_day}``: Day added only (number 1-31).
+
+
+Paperless will try to conserve the information from your database as much as possible.
+However, some characters that you can use in document titles and correspondent names (such
+as ``: \ /`` and a couple more) are not allowed in filenames and will be replaced with dashes.
+
+If paperless detects that two documents share the same filename, paperless will automatically
+append ``_01``, ``_02``, etc to the filename. This happens if all the placeholders in a filename
+evaluate to the same value.
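
The placeholder substitution, character replacement and ``_01`` suffix can be pictured with a short sketch. It is an approximation under stated assumptions: only ``:``, ``\`` and ``/`` are replaced here (paperless replaces a few more characters), and the function name ``render_filename`` and its parameters are hypothetical.

```python
import re

# Only the three characters named above; the real list is longer.
FORBIDDEN = re.compile(r"[:\\/]")


def render_filename(filename_format, document, existing, extension=".pdf"):
    """Fill in placeholders, sanitize values and avoid clashes with ``existing`` names."""
    safe = {key: FORBIDDEN.sub("-", str(value)) for key, value in document.items()}
    base = filename_format.format(**safe)
    candidate = base + extension
    counter = 0
    while candidate in existing:
        # Append _01, _02, ... until the name is unique.
        counter += 1
        candidate = f"{base}_{counter:02d}{extension}"
    return candidate
```

Note that this sketch raises ``KeyError`` for a misspelled placeholder rather than falling back to a default naming scheme.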
+
+.. hint::
+
+    Paperless checks the filename of a document whenever it is saved. Therefore,
+    you need to update the filenames of your documents and move them after altering
+    this setting by invoking the :ref:`document renamer <utilities-renamer>`.
+
+.. warning::
+
+    Make absolutely sure you get the spelling of the placeholders right, or else
+    paperless will use the default naming scheme instead.
+
+.. caution::
+
+    As of now, you could totally tell paperless to store your files anywhere outside
+    the media directory by setting
+
+    .. code::
+
+        PAPERLESS_FILENAME_FORMAT=../../my/custom/location/{title}
+
+    However, keep in mind that inside docker, if files get stored outside of the
+    predefined volumes, they will be lost after a restart of paperless.
--- a/docs/api.rst
+++ b/docs/api.rst
@@ -1,23 +1,291 @@
-.. _api:

+************
 The REST API
-############
+************

-Paperless makes use of the `Django REST Framework`_ standard API interface
-because of its inherent awesomeness.  Conveniently, the system is also
-self-documenting, so to learn more about the access points, schema, what's
-accepted and what isn't, you need only visit ``/api`` on your local Paperless
-installation.
+
+Paperless makes use of the `Django REST Framework`_ standard API interface.
+It provides a browsable API for most of its endpoints, which you can inspect
+at ``http://<paperless-host>:<port>/api/``. This also documents most of the
+available filters and ordering fields.

 .. _Django REST Framework: http://django-rest-framework.org/

+The API provides 5 main endpoints:

-.. _api-uploading:
+*   ``/api/documents/``: Full CRUD support, except POSTing new documents. See below.
+*   ``/api/correspondents/``: Full CRUD support.
+*   ``/api/document_types/``: Full CRUD support.
+*   ``/api/logs/``: Read-Only.
+*   ``/api/tags/``: Full CRUD support.

-Uploading
---------
+All of these endpoints except for the logging endpoint
+allow you to fetch, edit and delete individual objects
+by appending their primary key to the path, for example ``/api/documents/454/``.

-File uploads in an API are hard and so far as I've been able to tell, there's
-no standard way of accepting them, so rather than crowbar file uploads into the
-REST API and endure that headache, I've left that process to a simple HTTP
-POST, documented on the :ref:`consumption page <consumption-http>`.
+The objects served by the document endpoint contain the following fields:
+
+*   ``id``: ID of the document. Read-only.
+*   ``title``: Title of the document.
+*   ``content``: Plain text content of the document.
+*   ``tags``: List of IDs of tags assigned to this document, or empty list.
+*   ``document_type``: Document type of this document, or null.
+*   ``correspondent``:  Correspondent of this document or null.
+*   ``created``: The date at which this document was created.
+*   ``modified``: The date at which this document was last edited in paperless. Read-only.
+*   ``added``: The date at which this document was added to paperless. Read-only.
+*   ``archive_serial_number``: The identifier of this document in a physical document archive.
+*   ``original_file_name``: Verbose filename of the original document. Read-only.
+*   ``archived_file_name``: Verbose filename of the archived document. Read-only. Null if no archived document is available.
+
+
+Downloading documents
+#####################
+
+In addition to that, the document endpoint offers these additional actions on
+individual documents:
+
+*   ``/api/documents/<pk>/download/``: Download the document.
+*   ``/api/documents/<pk>/preview/``: Display the document inline,
+    without downloading it.
+*   ``/api/documents/<pk>/thumb/``: Download the PNG thumbnail of a document.
+
+Paperless generates archived PDF/A documents from consumed files and stores both
+the original files as well as the archived files. By default, the endpoints
+for previews and downloads serve the archived file, if it is available.
+Otherwise, the original file is served.
+Some documents cannot be archived.
+
+The endpoints correctly serve the response header fields ``Content-Disposition``
+and ``Content-Type`` to indicate the filename for download and the type of content of
+the document.
+
+In order to download or preview the original document when an archived document is available,
+supply the query parameter ``original=true``.
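+
+As a rough sketch, a client could assemble these URLs as follows. The helper
+name ``document_url`` and the base URL are hypothetical and only serve as an
+illustration; they are not part of paperless.
+
```python
from urllib.parse import urlencode


def document_url(base_url, pk, action="download", original=False):
    """Build the URL for the download, preview or thumb action of a document.

    Hypothetical helper: only the URL layout comes from the documentation.
    """
    url = f"{base_url.rstrip('/')}/api/documents/{pk}/{action}/"
    if original:
        # Ask for the original file instead of the archived version.
        url += "?" + urlencode({"original": "true"})
    return url


print(document_url("http://localhost:8000", 454, original=True))
```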
+
+.. hint::
+
+    Paperless used to provide this functionality at ``/fetch/<pk>/preview``,
+    ``/fetch/<pk>/thumb`` and ``/fetch/<pk>/doc``. Redirects to the new URLs
+    are in place. However, if you use these old URLs to access documents, you
+    should update your app or script to use the new URLs.
+
+
+Getting document metadata
+#########################
+
+The API also has an endpoint to retrieve read-only metadata about specific documents. This
+information is not served along with the document objects, since it requires reading
+files and would therefore slow down document lists considerably.
+
+Access the metadata of a document with an ID ``id`` at ``/api/documents/<id>/metadata/``.
+
+The endpoint reports the following data:
+
+*   ``original_checksum``: MD5 checksum of the original document.
+*   ``original_size``: Size of the original document, in bytes.
+*   ``original_mime_type``: Mime type of the original document.
+*   ``media_filename``: Current filename of the document, under which it is stored inside the media directory.
+*   ``has_archive_version``: True, if this document is archived, false otherwise.
+*   ``original_metadata``: A list of metadata associated with the original document. See below.
+*   ``archive_checksum``: MD5 checksum of the archived document, or null.
+*   ``archive_size``: Size of the archived document in bytes, or null.
+*   ``archive_metadata``: Metadata associated with the archived document, or null. See below.
+
+File metadata is reported as a list of objects in the following form:
+
+.. code:: json
+
+    [
+        {
+            "namespace": "http://ns.adobe.com/pdf/1.3/",
+            "prefix": "pdf",
+            "key": "Producer",
+            "value": "SparklePDF, Fancy edition"
+        }
+    ]
+
+``namespace`` and ``prefix`` can be null. The actual metadata reported depends on the file type and the metadata
+available in that specific document. Paperless only reports PDF metadata at this point.
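+
+A client might find it convenient to index this list by name. The following
+is a minimal sketch; the function ``metadata_to_dict`` is a hypothetical
+helper and assumes the entry layout shown above.
+
```python
def metadata_to_dict(entries):
    """Index metadata entries by "prefix:key", or by "key" if prefix is null."""
    result = {}
    for entry in entries:
        prefix = entry.get("prefix")
        name = f"{prefix}:{entry['key']}" if prefix else entry["key"]
        result[name] = entry["value"]
    return result


sample = [
    {
        "namespace": "http://ns.adobe.com/pdf/1.3/",
        "prefix": "pdf",
        "key": "Producer",
        "value": "SparklePDF, Fancy edition",
    }
]
print(metadata_to_dict(sample))
```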
+
+Authorization
+#############
+
+The REST API provides three different forms of authentication.
+
+1.  Basic authentication
+
+    Authorize by providing an HTTP header in the form
+
+    .. code::
+
+        Authorization: Basic <credentials>
+
+    where ``credentials`` is a base64-encoded string of ``<username>:<password>``
+
+2.  Session authentication
+
+    When you're logged into paperless in your browser, you're automatically
+    logged into the API as well and don't need to provide any authorization
+    headers.
+
+3.  Token authentication
+
+    Paperless also offers an endpoint to acquire authentication tokens.
+
+    POST a username and password as a form or json string to ``/api/token/``
+    and paperless will respond with a token, if the login data is correct.
+    This token can be used to authenticate other requests with the
+    following HTTP header:
+
+    .. code::
+
+        Authorization: Token <token>
+
+    Tokens can be managed and revoked in the paperless admin.
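+
+The two header forms above can be produced like this. The helper names are
+hypothetical; only the header layout is taken from the description above.
+
```python
import base64


def basic_auth_header(username, password):
    """Return the Authorization header for basic authentication."""
    credentials = f"{username}:{password}".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}


def token_auth_header(token):
    """Return the Authorization header for token authentication."""
    return {"Authorization": f"Token {token}"}


print(basic_auth_header("user", "pass"))
print(token_auth_header("abc123"))
```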
+
+Searching for documents
+#######################
+
+Paperless-ng offers API endpoints for full text search. These are as follows:
+
+``/api/search/``
+================
+
+Get search results based on a query.
+
+Query parameters:
+
+*   ``query``: The query string. See
+    `here <https://whoosh.readthedocs.io/en/latest/querylang.html>`_
+    for details on the syntax.
+*   ``page``: Specify the page you want to retrieve. Each page
+    contains 10 search results and the first page is ``page=1``, which
+    is the default if this is omitted.
+
+Result list object returned by the endpoint:
+
+.. code:: json
+
+    {
+        "count": 1,
+        "page": 1,
+        "page_count": 1,
+        "corrected_query": "",
+        "results": [
+
+        ]
+    }
+
+*   ``count``: The approximate total number of results.
+*   ``page``: The page returned to you. This might be different from
+    the page you requested, if you requested a page that is beyond
+    the last page. In that case, the last page is returned.
+*   ``page_count``: The total number of pages.
+*   ``corrected_query``: Corrected version of the query string. Can be null.
+    If not null, can be used verbatim to start a new query.
+*   ``results``: A list of result objects on the current page.
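+
+Since the endpoint reports both ``page`` and ``page_count``, a client can walk
+through all pages until the last one is reached. This is a sketch only:
+``fetch_page`` is a hypothetical callable supplied by the caller that performs
+the actual HTTP request and returns the decoded result list object.
+
```python
def iter_search_results(fetch_page, query):
    """Yield every result object for a query, page by page.

    fetch_page(query, page) is assumed to return the decoded JSON
    result list object of /api/search/.
    """
    page = 1
    while True:
        response = fetch_page(query, page)
        yield from response["results"]
        # The server reports the page it actually returned, so stop
        # as soon as that page is the last one.
        if response["page"] >= response["page_count"]:
            break
        page = response["page"] + 1
```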
+
+Result object:
+
+.. code:: json
+
+    {
+        "id": 1,
+        "highlights": [
+
+        ],
+        "score": 6.34234,
+        "rank": 23,
+        "document": {
+
+        }
+    }
+
+*   ``id``: the primary key of the found document
+*   ``highlights``: a list of parsable highlight fragments for the result.
+    See below.
+*   ``score``: The score assigned to the document. A higher score indicates a
+    better match with the query. Search results are sorted descending by score.
+*   ``rank``: the position of the document within the entire search results list.
+*   ``document``: The full json of the document, as returned by
+    ``/api/documents/<id>/``.
+
+Highlights object:
+
+Highlights are provided as a list of fragments. A fragment is a longer section of
+text from the original document.
+Each fragment contains a list of strings, and some of them are marked as a highlight.
+
+.. code:: json
+
+    [
+        [
+            {"text": "This is a sample text with a "},
+            {"text": "highlighted", "term": 0},
+            {"text": " word."}
+        ],
+        [
+            {"text": "Another", "term": 1},
+            {"text": " fragment with a highlight."}
+        ]
+    ]
+
+
+
+When ``term`` is present within a string, the word within ``text`` should be highlighted.
+The term index groups multiple matches together and words with the same index
+should get identical highlighting.
+A client may use this example to produce the following output:
+
+... This is a sample text with a **highlighted** word. ... **Another** fragment with a highlight. ...
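+
+One possible way to produce this output from the highlights list is sketched
+below. The function ``render_highlights`` is a hypothetical client-side
+helper; it marks every string that carries a ``term`` and ignores the term
+index itself.
+
```python
def render_highlights(fragments):
    """Render highlight fragments as text, wrapping highlights in ** **."""
    rendered = []
    for fragment in fragments:
        rendered.append("".join(
            f"**{piece['text']}**" if "term" in piece else piece["text"]
            for piece in fragment
        ))
    return "... " + " ... ".join(rendered) + " ..."


highlights = [
    [
        {"text": "This is a sample text with a "},
        {"text": "highlighted", "term": 0},
        {"text": " word."},
    ],
    [
        {"text": "Another", "term": 1},
        {"text": " fragment with a highlight."},
    ],
]
print(render_highlights(highlights))
```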
+
+``/api/search/autocomplete/``
+=============================
+
+Get auto completions for a partial search term.
+
+Query parameters:
+
+*   ``term``: The incomplete term.
+*   ``limit``: Amount of results. Defaults to 10.
+
+Results returned by the endpoint are ordered by importance of the term in the
+document index. The first result is the term that has the highest Tf/Idf score
+in the index.
+
+.. code:: json
+
+    [
+        "term1",
+        "term3",
+        "term6",
+        "term4"
+    ]
+
+
+.. _api-file_uploads:
+
+POSTing documents
+#################
+
+The API provides a special endpoint for file uploads:
+
+``/api/documents/post_document/``
+
+POST a multipart form to this endpoint, where the form field ``document`` contains
+the document that you want to upload to paperless. The filename is sanitized and
+then used to store the document in a temporary directory, and the consumer will
+be instructed to consume the document from there.
+
+The endpoint supports the following optional form fields:
+
+*   ``title``: Specify a title that the consumer should use for the document.
+*   ``correspondent``: Specify the ID of a correspondent that the consumer should use for the document.
+*   ``document_type``: Similar to correspondent.
+*   ``tags``: Similar to correspondent. Specify this multiple times to have multiple tags added
+    to the document.
+
+The endpoint will immediately return "OK" if the document consumption process
+was started successfully. No additional status information about the consumption
+process itself is available, since that happens in a different process.
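+
+The following sketch shows how such a multipart form could be assembled with
+the Python standard library. ``build_upload_form`` is a hypothetical helper
+and not part of paperless; it only builds the request body and leaves sending
+it (together with an ``Authorization`` header) to the caller. Note how
+``tags`` is repeated once per tag ID.
+
```python
import uuid


def build_upload_form(filename, file_bytes, title=None, correspondent=None,
                      document_type=None, tags=()):
    """Return (content_type, body) for a POST to /api/documents/post_document/."""
    boundary = uuid.uuid4().hex
    fields = []
    if title is not None:
        fields.append(("title", str(title)))
    if correspondent is not None:
        fields.append(("correspondent", str(correspondent)))
    if document_type is not None:
        fields.append(("document_type", str(document_type)))
    # Specify the "tags" field once for every tag ID.
    fields.extend(("tags", str(tag_id)) for tag_id in tags)

    lines = []
    for name, value in fields:
        lines += [
            f"--{boundary}".encode("ascii"),
            f'Content-Disposition: form-data; name="{name}"'.encode("utf-8"),
            b"",
            value.encode("utf-8"),
        ]
    # The file itself goes into the form field "document".
    lines += [
        f"--{boundary}".encode("ascii"),
        f'Content-Disposition: form-data; name="document"; filename="{filename}"'.encode("utf-8"),
        b"Content-Type: application/octet-stream",
        b"",
        file_bytes,
        f"--{boundary}--".encode("ascii"),
        b"",
    ]
    return f"multipart/form-data; boundary={boundary}", b"\r\n".join(lines)
```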
--- a/docs/changelog.rst
+++ b/docs/changelog.rst
@@ -1,4 +1,333 @@
+
+.. _paperless_changelog:
+
+*********
 Changelog
+*********
+
+
+paperless-ng 0.9.8
+##################
+
+This release addresses two severe issues with the previous release.
+
+* The delete buttons for document types, correspondents and tags were not working.
+* The document section in the admin was causing internal server errors (500).
+
+
+paperless-ng 0.9.7
+##################
+
+
+* Front end
+
+  * Thanks to the hard work of `Michael Shamoon`_, paperless now comes with a much more streamlined UI for
+    filtering documents.
+  
+  * `Michael Shamoon`_ replaced the document preview with another component. This should fix compatibility with Safari browsers.
+
+  * Added buttons to the management pages to quickly show all documents with one specific tag, correspondent, or title.
+  
+  * Paperless now stores your saved views on the server and associates them with your user account. 
+    This means that you can access your views on multiple devices and have separate views for different users.
+    You will have to recreate your views.
+
+  * The GitHub and documentation links now open in new tabs/windows. Thanks to `rYR79435`_.
+
+  * Paperless now generates default saved view names when saving views with certain filter rules.
+
+  * Added a small version indicator to the front end.
+
+* Other additions and changes
+
+  * The new filename format field ``{tag_list}`` inserts a list of tags into the filename, separated by comma.
+  * The ``document_retagger`` no longer removes inbox tags or tags without matching rules.
+  * The new configuration option ``PAPERLESS_COOKIE_PREFIX`` allows you to run multiple instances of paperless on different ports.
+    This option enables you to be logged in to multiple instances by specifying different cookie names for each instance.
+
+* Fixes
+  
+  * Sometimes paperless would assign dates in the future to newly consumed documents.
+  * The filename format fields ``{created_month}`` and ``{created_day}`` now use a leading zero for single digit values.
+  * The filename format field ``{tags}`` can no longer be used without arguments.
+  * Paperless was not able to consume many images (especially images from mobile scanners) due to missing DPI information.
+    Paperless now assumes A4 paper size for PDF generation if no DPI information is present.
+  * Documents with empty titles could not be opened from the table view due to the link being empty.
+  * Fixed an issue with filenames containing special characters such as ``:`` not being accepted for upload.
+  * Fixed issues with thumbnail generation for plain text files.
+
+
+paperless-ng 0.9.6
+##################
+
+This release focusses primarily on many small issues with the UI.
+
+* Front end
+
+  * Paperless now has proper window titles.
+  * Fixed an issue with the small cards when more than 7 tags were used.
+  * Navigation of the "Show all" links adjusted. They navigate to the saved view now, if available in the sidebar.
+  * Some indication on the document lists that a filter is active was added.
+  * There's a new filter to filter for documents that do *not* have a certain tag.
+  * The file upload box now shows upload progress.
+  * The document edit page was reorganized.
+  * The document edit page shows various information about a document.
+  * An issue with the height of the preview was fixed.
+  * Table issues with too long document titles fixed.
+
+* API
+
+  * The API now serves file names with documents.
+  * The API now serves various metadata about documents.
+  * API documentation updated.
+
+* Other
+
+  * Fixed an issue with the docker image when a non-standard PostgreSQL port was used.
+  * The docker image was trying to check for installed languages before actually installing them.
+  * ``FILENAME_FORMAT`` placeholder for document types.
+  * The filename formatter is now less restrictive with file names and tries to
+    conserve the original correspondents, types and titles as much as possible.
+  * The filename formatter does not include the document ID in filenames anymore. It will
+    rather append ``_01``, ``_02``, etc when it detects duplicate filenames.
+
+.. note::
+
+  The changes to the filename format will apply to newly added documents and changed documents.
+  If you want all files to reflect these changes, execute the ``document_renamer`` management
+  command.
+
+
+paperless-ng 0.9.5
+##################
+
+This release concludes the big changes I wanted to get rolled into paperless. The next releases before 1.0 will
+focus on fixing issues, primarily.
+
+* OCR
+
+  * Paperless now uses `OCRmyPDF <https://github.com/jbarlow83/OCRmyPDF>`_ to perform OCR on documents.
+    It still uses tesseract under the hood, but the PDF parser of Paperless has changed considerably and
+    will behave differently for some documents.
+  * OCRmyPDF creates archived PDF/A documents with embedded text that can be selected in the front end.
+  * Paperless stores archived versions of documents alongside the originals. The originals can be
+    accessed on the document edit page. If available, a dropdown menu will appear next to the download button.
+  * Many of the configuration options regarding OCR have changed. See :ref:`configuration-ocr` for details.
+  * Paperless no longer guesses the language of your documents. It always uses the language that you
+    specified with ``PAPERLESS_OCR_LANGUAGE``. Be sure to set this to the language the majority of your
+    documents are in. Multiple languages can be specified, but that requires more CPU time.
+  * The management command :ref:`document_archiver <utilities-archiver>` can be used to create archived versions for already
+    existing documents.
+
+* Tags from consumption folder.
+
+  * Thanks to `jayme-github`_, paperless now consumes files from sub folders in the consumption folder and is able to assign tags
+    based on the sub folders a document was found in. This can be configured with ``PAPERLESS_CONSUMER_RECURSIVE`` and
+    ``PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS``.
+
+* API
+
+  * The API now offers token authentication.
+  * The endpoint for uploading documents now supports specifying custom titles, correspondents, tags and types.
+    This can be used by clients to override the default behavior of paperless. See :ref:`api-file_uploads`.
+  * The document endpoint of API now serves documents in this form:
+
+    * correspondents, document types and tags are referenced by their ID in the fields ``correspondent``, ``document_type`` and ``tags``. The ``*_id`` versions are gone. These fields are read/write.
+    * paperless does not serve nested tags, correspondents or types anymore.
+
+* Front end
+
+  * Paperless does some basic caching of correspondents, tags and types and will only request them from the server when necessary or when entirely reloading the page.
+  * Document list fetching is about 10%-30% faster now, especially when lots of tags/correspondents are present.
+  * Some minor improvements to the front end, such as document count in the document list, better highlighting of the current page, and improvements to the filter behavior.
+
+* Fixes:
+
+  * A bug with the generation of filenames for files with unsupported types caused the exporter and
+    document saving to crash.
+  * Mail handling no longer exits entirely when encountering errors. It will skip the account/rule/message on which the error occurred.
+  * Assigning correspondents from mail sender names failed for very long names. Paperless no longer assigns correspondents in these cases.
+
+paperless-ng 0.9.4
+##################
+
+* Searching:
+
+  * Paperless now supports searching by tags, types and dates and correspondents. In order to have this applied to your
+    existing documents, you need to perform a ``document_index reindex`` management command
+    (see :ref:`administration-index`)
+    that adds the data to the search index. You only need to do this once, since the schema of the search index changed.
+    Paperless keeps the index updated after that whenever something changes.
+  * Paperless now has spelling corrections ("Did you mean") for mistyped queries.
+  * The documentation contains :ref:`information about the query syntax <basic-searching>`.
+
+* Front end:
+
+  * Clickable tags, correspondents and types allow quick filtering for related documents.
+  * Saved views are now editable.
+  * Preview documents directly in the browser.
+  * Navigation from the dashboard to saved views.
+
+* Fixes:
+
+  * A severe error when trying to use post consume scripts.
+  * An error in the consumer that caused invalid messages of missing files to show up in the log.
+
+* The documentation now contains information about bare metal installs and a section about
+  how to set up the development environment.
+
+paperless-ng 0.9.3
+##################
+
+* Setting ``PAPERLESS_AUTO_LOGIN_USERNAME`` replaces ``PAPERLESS_DISABLE_LOGIN``.
+  You have to specify your username.
+* Added a simple sanity checker that checks your documents for missing or orphaned files,
+  files with wrong checksums, inaccessible files, and documents with empty content.
+* It is no longer possible to encrypt your documents. For the time being, paperless will
+  continue to operate with already encrypted documents.
+* Fixes:
+
+  * Paperless now uses inotify again, since the watchdog was causing issues which I was not
+    aware of.
+  * Issue with the automatic classifier not working with only one tag.
+  * A couple of issues with the search index being opened too eagerly.
+
+* Added lots of tests for various parts of the application.
+
+paperless-ng 0.9.2
+##################
+
+* Major changes to the front end (colors, logo, shadows, layout of the cards,
+  better mobile support)
+
+* Paperless now uses mime types and libmagic detection to determine
+  if a file type is supported and which parser to use. Removes all
+  file type checks that were present in MANY different places in
+  paperless.
+
+* Mail consumer now correctly consumes documents even when their
+  content type was not set correctly. (i.e. PDF documents with
+  content type ``application/octet-stream``)
+
+* Basic sorting of mail rules added
+
+* Much better admin for mail rule editing.
+
+* Docker entrypoint script awaits the database server if it is
+  configured.
+
+* Disabled editing of logs.
+
+* New setting ``PAPERLESS_OCR_PAGES`` limits the tesseract parser
+  to the first n pages of scanned documents.
+
+* Fixed a bug where tasks with too long task names would not show
+  up in the admin.
+
+paperless-ng 0.9.1
+##################
+
+* Moved documentation of the settings to the actual documentation.
+* Updated release script to force the user to choose between SQLite
+  and PostgreSQL. This avoids confusion when upgrading from paperless.
+
+
+paperless-ng 0.9.0
+##################
+
+* **Deprecated:** GnuPG. :ref:`See this note on the state of GnuPG in paperless-ng. <utilities-encyption>`
+  This feature will most likely be removed in future versions.
+
+* **Added:** New frontend. Features:
+
+  * Single page application: It's much more responsive than the django admin pages.
+  * Dashboard. Shows recently scanned documents, or todo notes, or other documents
+    at wish. Allows uploading of documents. Shows basic statistics.
+  * Better document list with multiple display options.
+  * Full text search with result highlighting, auto completion and scoring based
+    on the query. It uses a document search index in the background.
+  * Saveable filters.
+  * Better log viewer.
+
+* **Added:** Document types. Assign these to documents just as correspondents.
+  They may be used in the future to perform automatic operations on documents
+  depending on the type.
+* **Added:** Inbox tags. Define an inbox tag and it will automatically be
+  assigned to any new document scanned into the system.
+* **Added:** Automatic matching. A new matching algorithm that automatically
+  assigns tags, document types and correspondents to your documents. It uses
+  a neural network trained on your data.
+* **Added:** Archive serial numbers. Assign these to quickly find documents stored in
+  physical binders.
+* **Added:** Enabled the internal user management of django. This isn't really a
+  multi user solution, however, it allows more than one user to access the website
+  and set some basic permissions / renew passwords.
+
+* **Modified [breaking]:** All new mail consumer with customizable filters, actions and
+  multiple account support. Replaces the old mail consumer. The new mail consumer
+  needs different configuration but can be configured to act exactly like the old
+  consumer.
+
+
+* **Modified:** Changes to the consumer:
+
+  * Now uses the excellent watchdog library that should make sure files are
+    discovered no matter what the platform is.
+  * The consumer now uses a task scheduler to run consumption processes in parallel.
+    This means that consuming many documents should be much faster on systems with
+    many cores.
+  * Concurrency is controlled with the new settings ``PAPERLESS_TASK_WORKERS``
+    and ``PAPERLESS_THREADS_PER_WORKER``. See TODO for details on concurrency.
+  * The consumer no longer blocks the database for extended periods of time.
+  * An issue with tesseract running multiple threads per page and slowing down
+    the consumer was fixed.
+
+* **Modified [breaking]:** REST Api changes:
+
+  * New filters added, other filters removed (case sensitive filters, slug filters)
+  * Endpoints for thumbnails, previews and downloads replace the old ``/fetch/`` urls. Redirects are in place.
+  * Endpoint for document uploads replaces the old ``/push`` url. Redirects are in place.
+  * Foreign key relationships are now served as IDs, not as urls.
+
+* **Modified [breaking]:** PostgreSQL:
+
+  * If ``PAPERLESS_DBHOST`` is specified in the settings, paperless uses PostgreSQL instead of SQLite.
+    Username, database and password all default to ``paperless`` if not specified.
+
+* **Modified [breaking]:** document_retagger management command rework. See
+  :ref:`utilities-retagger` for details. Replaces ``document_correspondents``
+  management command.
+* **Removed [breaking]:** Reminders.
+* **Removed:** All customizations made to the django admin pages.
+* **Removed [breaking]:** The docker image no longer supports SSL. If you want to expose
+  paperless to the internet, hide paperless behind a proxy server that handles SSL
+  requests.
+* **Internal changes:** Mostly code cleanup, including:
+
+  * Rework of the code of the tesseract parser. This is now a lot cleaner.
+  * Rework of the filename handling code. It was a mess.
+  * Fixed some issues with the document exporter not exporting all documents when encountering duplicate filenames.
+  * Added a task scheduler that takes care of checking mail, training the classifier, maintaining the document search index
+    and consuming documents.
+  * Updated dependencies. Now uses Pipenv all around.
+  * Updated Dockerfile and docker-compose. Now uses ``supervisord`` to run everything paperless-related in a single container.
+
+* **Settings:**
+
+  * ``PAPERLESS_FORGIVING_OCR`` is now default and gone. Reason: Even if ``langdetect`` fails to detect
+    a language, tesseract still does a very good job at ocr'ing a document with the default language.
+    Certain language specifics such as umlauts may not get picked up properly.
+  * ``PAPERLESS_DEBUG`` defaults to ``false``.
+  * The presence of ``PAPERLESS_DBHOST`` now determines whether to use PostgreSQL or
+    SQLite.
+  * ``PAPERLESS_OCR_THREADS`` is gone and replaced with ``PAPERLESS_TASK_WORKERS`` and
+    ``PAPERLESS_THREADS_PER_WORKER``. Refer to the config example for details.
+  * ``PAPERLESS_OPTIMIZE_THUMBNAILS`` allows you to disable or enable thumbnail
+    optimization. This is useful on less powerful devices.
+
+* Many more small changes here and there. The usual stuff.
+
+Paperless
 #########

 2.7.0
@@ -6,7 +335,7 @@ Changelog

 * `syntonym`_ submitted a pull request to catch IMAP connection errors `#475`_.
 * `Stéphane Brunner`_ added ``psycopg2`` to the Pipfile `#489`_.  He also fixed
-  a syntax error in ``docker-compose.yml.example`` `#488`_ and added [DjangoQL](https://github.com/ivelum/djangoql),
+  a syntax error in ``docker-compose.yml.example`` `#488`_ and added `DjangoQL`_,
  which allows a litany of handy search functionality `#492`_.
 * `CkuT`_ and `JOKer`_ hacked out a simple, but super-helpful optimisation to
  how the thumbnails are served up, improving performance considerably `#481`_.
@@ -194,7 +523,7 @@ that it was more an annoyance than anything else, so this feature is now turned
 off unless you explicitly set a passphrase in your config file.

 Migrating from 1.x
------------------
+==================

 Encryption isn't gone, it's just off for new users.  So long as you have
 ``PAPERLESS_PASSPHRASE`` set in your config or your environment, Paperless
@@ -564,6 +893,9 @@ bulk of the work on this big change.

 * Initial release

+.. _rYR79435: https://github.com/rYR79435
+.. _Michael Shamoon: https://github.com/shamoon
+.. _jayme-github: http://github.com/jayme-github
 .. _Brian Conn: https://github.com/TheConnMan
 .. _Christopher Luu: https://github.com/nuudles
 .. _Florian Jung: https://github.com/the01
@@ -739,6 +1071,6 @@ bulk of the work on this big change.
 .. _#489: https://github.com/the-paperless-project/paperless/pull/489
 .. _#492: https://github.com/the-paperless-project/paperless/pull/492

-.. _pipenv: https://docs.pipenv.org/
 .. _a new home on Docker Hub: https://hub.docker.com/r/danielquinn/paperless/
 .. _optipng: http://optipng.sourceforge.net/
+.. _DjangoQL: https://github.com/ivelum/djangoql
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -1,51 +1,21 @@
-# -*- coding: utf-8 -*-
-#
-# Paperless documentation build configuration file, created by
-# sphinx-quickstart on Mon Oct 26 18:36:52 2015.
-#
-# This file is execfile()d with the current directory set to its
-# containing dir.
-#
-# Note that not all possible configuration values are present in this
-# autogenerated file.
-#
-# All configuration values have a default; values that are commented out
-# serve to show the default.
+import sphinx_rtd_theme

-import sys
-import os

 __version__ = None
 exec(open("../src/paperless/version.py").read())


-# Believe it or not, this is the officially sanctioned way to add custom CSS.
-def setup(app):
-    app.add_stylesheet("custom.css")
-
-# If extensions (or modules to document with autodoc) are in another directory,
-# add these directories to sys.path here. If the directory is relative to the
-# documentation root, use os.path.abspath to make it absolute, like shown here.
-#sys.path.insert(0, os.path.abspath('.'))
-
-# -- General configuration ------------------------------------------------
-
-# If your documentation needs a minimal Sphinx version, state it here.
-#needs_sphinx = '1.0'
-
-# Add any Sphinx extension module names here, as strings. They can be
-# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
-# ones.
 extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.intersphinx',
    'sphinx.ext.todo',
    'sphinx.ext.imgmath',
    'sphinx.ext.viewcode',
+    'sphinx_rtd_theme',
 ]

 # Add any paths that contain templates here, relative to this directory.
-templates_path = ['_templates']
+# templates_path = ['_templates']

 # The suffix of source filenames.
 source_suffix = '.rst'
@@ -57,7 +27,7 @@ source_suffix = '.rst'
 master_doc = 'index'

 # General information about the project.
-project = u'Paperless'
+project = u'Paperless-ng'
 copyright = u'2015, Daniel Quinn'

 # The version info for the project you're documenting, acts as replacement for
@@ -118,7 +88,7 @@ pygments_style = 'sphinx'

 # The theme to use for HTML and HTML Help pages.  See the documentation for
 # a list of builtin themes.
-html_theme = 'default'
+html_theme = 'sphinx_rtd_theme'

 # Theme options are theme-specific and customize the look and feel of a theme
 # further.  For a list of options available for each theme, see the
@@ -198,19 +168,6 @@ html_static_path = ['_static']
 # Output file base name for HTML help builder.
 htmlhelp_basename = 'paperless'

-
-#
-# Attempt to use the ReadTheDocs theme.  If it's not installed, fallback to
-# the default.
-#
-
-try:
-    import sphinx_rtd_theme
-    html_theme = "sphinx_rtd_theme"
-    html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
-except ImportError:
-    pass
-
 # -- Options for LaTeX output ---------------------------------------------

 latex_elements = {
--- a/docs/configuration.rst
+++ b/docs/configuration.rst
@@ -0,0 +1,426 @@
+.. _configuration:
+
+*************
+Configuration
+*************
+
+Paperless provides a wide range of customizations.
+Depending on how you run paperless, these settings have to be defined in different
+places.
+
+*   If you run paperless on docker, ``paperless.conf`` is not used. Rather, configure
+    paperless by copying necessary options to ``docker-compose.env``.
+*   If you are running paperless on anything else, paperless will search for the
+    configuration file in these locations and use the first one it finds:
+
+    .. code::
+
+        /path/to/paperless/paperless.conf
+        /etc/paperless.conf
+        /usr/local/etc/paperless.conf
+
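+The "first one it finds" rule can be pictured with the following sketch. It
+is an illustration only, not the code paperless uses; ``find_config_file`` is
+a hypothetical name.
+
```python
import os


def find_config_file(candidates):
    """Return the first existing path from candidates, or None."""
    for path in candidates:
        if os.path.exists(path):
            return path
    return None


# Search order as listed above.
locations = [
    "/path/to/paperless/paperless.conf",
    "/etc/paperless.conf",
    "/usr/local/etc/paperless.conf",
]
print(find_config_file(locations))
```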
+
+Required services
+#################
+
+PAPERLESS_REDIS=<url>
+    This is required for processing scheduled tasks such as email fetching, index
+    optimization and for training the automatic document matcher.
+
+    Defaults to redis://localhost:6379.
+
+PAPERLESS_DBHOST=<hostname>
+    By default, sqlite is used as the database backend. This can be changed here.
+    Set PAPERLESS_DBHOST and PostgreSQL will be used instead of SQLite.
+
+PAPERLESS_DBPORT=<port>
+    Adjust port if necessary.
+
+    Default is 5432.
+
+PAPERLESS_DBNAME=<name>
+    Database name in PostgreSQL.
+
+    Defaults to "paperless".
+
+PAPERLESS_DBUSER=<name>
+    Database user in PostgreSQL.
+
+    Defaults to "paperless".
+
+PAPERLESS_DBPASS=<password>
+    Database password for PostgreSQL.
+
+    Defaults to "paperless".
+
+
+Paths and folders
+#################
+
+PAPERLESS_CONSUMPTION_DIR=<path>
+    This is where your documents should go to be consumed.  Make sure that it exists
+    and that the user running the paperless service can read/write its contents
+    before you start Paperless.
+
+    Don't change this when using docker, as it only changes the path within the
+    container. Change the local consumption directory in the docker-compose.yml
+    file instead.
+
+    Defaults to "../consume", relative to the "src" directory.
+
+PAPERLESS_DATA_DIR=<path>
+    This is where paperless stores all its data (search index, SQLite database,
+    classification model, etc).
+
+    Defaults to "../data", relative to the "src" directory.
+
+PAPERLESS_MEDIA_ROOT=<path>
+    This is where your documents and thumbnails are stored.
+
+    You can set this and PAPERLESS_DATA_DIR to the same folder to have paperless
+    store all its data within the same volume.
+
+    Defaults to "../media", relative to the "src" directory.
+
+PAPERLESS_STATICDIR=<path>
+    Override the default STATIC_ROOT here.  This is where all static files
+    created using "collectstatic" manager command are stored.
+
+    Unless you're doing something fancy, there is no need to override this.
+
+    Defaults to "../static", relative to the "src" directory.
+
+PAPERLESS_FILENAME_FORMAT=<format>
+    Changes the filenames paperless uses to store documents in the media directory.
+    See :ref:`advanced-file_name_handling` for details.
+
+    Default is none, which disables this feature.
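+
+For a non-docker install, the three main folders could be placed on a single
+volume along these lines. The ``/srv/paperless`` prefix is a hypothetical
+location chosen for illustration only.
+
+.. code:: bash
+
+    # Hypothetical paths; each directory must exist and be readable and
+    # writable by the user running paperless.
+    PAPERLESS_CONSUMPTION_DIR=/srv/paperless/consume
+    PAPERLESS_DATA_DIR=/srv/paperless/data
+    PAPERLESS_MEDIA_ROOT=/srv/paperless/media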
+
+Hosting & Security
+##################
+
+PAPERLESS_SECRET_KEY=<key>
+    Paperless uses this to make session tokens. If you expose paperless on the
+    internet, you need to change this, since the default secret is well known.
+
+    Use any sequence of characters. The more, the better. You don't need to
+    remember this. Just face-roll your keyboard.
+
+    Default is listed in the file ``src/paperless/settings.py``.
+
+PAPERLESS_ALLOWED_HOSTS=<comma-separated-list>
+    If you're planning on putting Paperless on the open internet, then you
+    really should set this value to the domain name you're using.  Failing to do
+    so leaves you open to HTTP host header attacks:
+    https://docs.djangoproject.com/en/3.1/topics/security/#host-header-validation
+
+    Just remember that this is a comma-separated list, so "example.com" is fine,
+    as is "example.com,www.example.com", but NOT " example.com" or "example.com,"
+
+    Defaults to "*", which is all hosts.
+
+PAPERLESS_CORS_ALLOWED_HOSTS=<comma-separated-list>
+    You need to add your servers to the list of allowed hosts that can do CORS
+    calls. Set this to your public domain name.
+
+    Defaults to "http://localhost:8000".
+
+PAPERLESS_FORCE_SCRIPT_NAME=<path>
+    To host paperless under a subpath url like example.com/paperless you set
+    this value to /paperless. No trailing slash!
+
+    .. note::
+
+        I don't know if this works in paperless-ng. Probably not.
+
+    Defaults to none, which hosts paperless at "/".
+
+PAPERLESS_STATIC_URL=<path>
+    Override the STATIC_URL here.  Unless you're hosting Paperless under a
+    subpath like /paperless/, you probably don't need to change this.
+
+    Defaults to "/static/".
+
+PAPERLESS_AUTO_LOGIN_USERNAME=<username>
+    Specify a username here so that paperless will automatically perform login
+    with the selected user.
+
+    .. danger::
+
+        Do not use this when exposing paperless on the internet. There are no
+        checks in place that would prevent you from doing this.
+
+    Defaults to none, which disables this feature.
+
+
+PAPERLESS_COOKIE_PREFIX=<str>
+    Specify a prefix that is added to the cookies used by paperless to identify
+    the currently logged in user. This is useful for when you're running two
+    instances of paperless on the same host.
+
+    After changing this, you will have to log in again.
+
+    Defaults to ``""``, which does not alter the cookie names.
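+
+A sketch of these settings for an instance exposed on the internet could look
+like this. The domain ``paperless.example.com`` and the key are placeholders,
+and the CORS entry is assumed to include the scheme, mirroring the format of
+the documented default.
+
+.. code:: bash
+
+    # Placeholder; use your own long random string.
+    PAPERLESS_SECRET_KEY=replace-with-a-long-random-string
+    # Comma-separated, with no spaces and no trailing comma.
+    PAPERLESS_ALLOWED_HOSTS=paperless.example.com
+    # Assumed to follow the same scheme://host form as the default.
+    PAPERLESS_CORS_ALLOWED_HOSTS=https://paperless.example.com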
+
+.. _configuration-ocr:
+
+OCR settings
+############
+
+Paperless uses `OCRmyPDF <https://ocrmypdf.readthedocs.io/en/latest/>`_ for
+performing OCR on documents and images. Paperless uses sensible defaults for
+most settings, but all of them can be configured to your needs.
+
+
+PAPERLESS_OCR_LANGUAGE=<lang>
+    Customize the language that paperless will attempt to use when
+    parsing documents.
+
+    It should be a 3-letter language code consistent with ISO
+    639: https://www.loc.gov/standards/iso639-2/php/code_list.php
+
+    Set this to the language most of your documents are written in.
+
+    This can be a combination of multiple languages such as ``deu+eng``,
+    in which case tesseract will use whatever language matches best.
+    Keep in mind that tesseract uses much more cpu time with multiple
+    languages enabled.
+
+    Defaults to "eng".
+
+PAPERLESS_OCR_MODE=<mode>
+    Tell paperless when and how to perform ocr on your documents. Four modes
+    are available:
+
+    *   ``skip``: Paperless skips all pages that already contain text and
+        performs OCR only on pages where no text is present. This is the
+        safest option.
+    *   ``skip_noarchive``: In addition to skip, paperless won't create an
+        archived version of your documents when it finds any text in them.
+        This is useful if you don't want to have two almost-identical versions
+        of your digital documents in the media folder. This is the fastest option.
+    *   ``redo``: Paperless will OCR all pages of your documents and attempt to
+        replace any existing text layers with new text. This will be useful for
+        documents from scanners that already performed OCR with insufficient
+        results. It will also perform OCR on purely digital documents.
+
+        This option may fail on some documents that have features that cannot
+        be removed, such as forms. In this case, the text from the document is
+        used instead.
+    *   ``force``: Paperless rasterizes your documents, converting any text
+        into images and puts the OCRed text on top. This works for all documents,
+        however, the resulting document may be significantly larger and text
+        won't appear as sharp when zoomed in.
+    
+    The default is ``skip``, which only performs OCR when necessary and always
+    creates archived documents.
+
+PAPERLESS_OCR_OUTPUT_TYPE=<type>
+    Specify the type of PDF documents that paperless should produce.
+    
+    *   ``pdf``: Modify the PDF document as little as possible.
+    *   ``pdfa``: Convert PDF documents into PDF/A-2b documents, which is a
+        subset of the entire PDF specification and meant for storing
+        documents long term.
+    *   ``pdfa-1``, ``pdfa-2``, ``pdfa-3`` to specify the exact version of
+        PDF/A you wish to use.
+    
+    If not specified, ``pdfa`` is used. Remember that paperless also keeps
+    the original input file as well as the archived version.
+
+
+PAPERLESS_OCR_PAGES=<num>
+    Tells paperless to use only the specified number of pages for OCR. Documents
+    with fewer than the specified number of pages get OCR'ed completely.
+
+    Specifying 1 here will only use the first page.
+
+    When combined with ``PAPERLESS_OCR_MODE=redo`` or ``PAPERLESS_OCR_MODE=force``,
+    paperless will not modify any text it finds on excluded pages and copy it
+    verbatim.
+
+    Defaults to 0, which disables this feature and always uses all pages.
+
+
+PAPERLESS_OCR_IMAGE_DPI=<num>
+    Paperless will OCR any images you put into the system and convert them
+    into PDF documents. This is useful if your scanner produces images.
+    In order to do so, paperless needs to know the DPI of the image.
+    Most images from scanners will have this information embedded and
+    paperless will detect and use that information. In case this fails, it
+    uses this value as a fallback.
+
+    Set this to the DPI your scanner produces images at.
+
+    Default is none, which causes paperless to fail if no DPI information is
+    present in an image.
+
+
+PAPERLESS_OCR_USER_ARG=<json>
+    OCRmyPDF offers many more options. Use this parameter to specify any
+    additional arguments you wish to pass to OCRmyPDF. Since Paperless uses
+    the API of OCRmyPDF, you have to specify these in a format that can be
+    passed to the API. See `the API reference of OCRmyPDF <https://ocrmypdf.readthedocs.io/en/latest/api.html#reference>`_
+    for valid parameters. All command line options are supported, but they
+    use underscores instead of dashes.
+
+    .. caution::
+
+        Paperless has been tested to work with the OCR options provided
+        above. There are many options that are incompatible with each other,
+        so specifying invalid options may prevent paperless from consuming
+        any documents.
+
+    Specify arguments as a JSON dictionary. Note the lowercase booleans
+    and the double-quoted parameter names and strings. Example:
+
+    .. code:: json
+
+        {"deskew": true, "optimize": 3, "unpaper_args": "--pre-rotate 90"}    
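+
+Taken together, the basic OCR settings in this section might be combined as in
+the following sketch. The language pair and the DPI value are assumptions
+about a particular scanner and document collection, not recommendations.
+
+.. code:: bash
+
+    # Assumed: documents are mostly German with some English.
+    PAPERLESS_OCR_LANGUAGE=deu+eng
+    # These two lines restate the documented defaults.
+    PAPERLESS_OCR_MODE=skip
+    PAPERLESS_OCR_OUTPUT_TYPE=pdfa
+    # Assumed fallback for images without embedded DPI information.
+    PAPERLESS_OCR_IMAGE_DPI=300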
+    
+    
+Software tweaks
+###############
+
+PAPERLESS_TASK_WORKERS=<num>
+    Paperless does multiple things in the background: Maintain the search index,
+    maintain the automatic matching algorithm, check emails, consume documents,
+    etc. This variable specifies how many things it will do in parallel.
+
+
+PAPERLESS_THREADS_PER_WORKER=<num>
+    Furthermore, paperless uses multiple threads when consuming documents to
+    speed up OCR. This variable specifies how many pages paperless will process
+    in parallel on a single document.
+
+    .. caution::
+
+        Ensure that the product
+
+            PAPERLESS_TASK_WORKERS * PAPERLESS_THREADS_PER_WORKER
+
+        does not exceed your CPU core count or else paperless will be extremely slow.
+        If you want paperless to process many documents in parallel, choose a high
+        worker count. If you want paperless to process very large documents faster,
+        use a higher thread per worker count.
+
+    The default is a balance between the two, according to your CPU core count,
+    with a slight favor towards threads per worker, and using as many cores as
+    possible.
+
+    If you only specify PAPERLESS_TASK_WORKERS, paperless will adjust
+    PAPERLESS_THREADS_PER_WORKER automatically.
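+
+    As a worked example, on a machine assumed to have four CPU cores, the
+    following split keeps the product at 2 * 2 = 4 and therefore within the
+    core count. The numbers are illustrative only.
+
+    .. code:: bash
+
+        # Two documents in parallel, two pages in parallel per document.
+        PAPERLESS_TASK_WORKERS=2
+        PAPERLESS_THREADS_PER_WORKER=2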
+
+
+PAPERLESS_TIME_ZONE=<timezone>
+    Set the time zone here.
+    See https://docs.djangoproject.com/en/3.1/ref/settings/#std:setting-TIME_ZONE
+    for details on how to set it.
+
+    Defaults to UTC.
+
+
+PAPERLESS_CONSUMER_POLLING=<num>
+    If paperless won't find documents added to your consume folder, it might
+    not be able to automatically detect filesystem changes. In that case,
+    specify a polling interval in seconds here, which will then cause paperless
+    to periodically check your consumption directory for changes.
+
+    Defaults to 0, which disables polling and uses filesystem notifications.
+
+
+PAPERLESS_CONSUMER_DELETE_DUPLICATES=<bool>
+    By default, when the consumer detects a duplicate document, it leaves the
+    duplicate file in the consumption directory untouched. Set this to true
+    to delete such duplicates instead.
+
+    Defaults to false.
+
+
+PAPERLESS_CONSUMER_RECURSIVE=<bool>
+    Enable recursive watching of the consumption directory. Paperless will
+    then pick up files from subdirectories within your consumption
+    directory as well.
+
+    Defaults to false.
+
+
+PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS=<bool>
+    Set the names of subdirectories as tags for consumed files.
+    E.g. <CONSUMPTION_DIR>/foo/bar/file.pdf will add the tags "foo" and "bar" to
+    the consumed file. Paperless will create any tags that don't exist yet.
+
+    PAPERLESS_CONSUMER_RECURSIVE must be enabled for this to work.
+
+    Defaults to false.
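+
+The consumer settings above can be combined as in this sketch, which assumes
+a consumption directory where filesystem notifications do not work (hence
+polling) and where subdirectory names should become tags. The 10 second
+interval is an arbitrary example value.
+
+.. code:: bash
+
+    # Check the consumption directory every 10 seconds (example value).
+    PAPERLESS_CONSUMER_POLLING=10
+    # Required for PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS to work.
+    PAPERLESS_CONSUMER_RECURSIVE=true
+    # <CONSUMPTION_DIR>/foo/bar/file.pdf gets the tags "foo" and "bar".
+    PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS=true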
+
+
+PAPERLESS_CONVERT_MEMORY_LIMIT=<num>
+    On smaller systems, or even in the case of Very Large Documents, the consumer
+    may explode, complaining about how it's "unable to extend pixel cache".  In
+    such cases, try setting this to a reasonably low value, like 32.  The
+    default is to use whatever is necessary to do everything without writing to
+    disk, and units are in megabytes.
+
+    For more information on how to use this value, you should search
+    the web for "MAGICK_MEMORY_LIMIT".
+
+    Defaults to 0, which disables the limit.
+
+PAPERLESS_CONVERT_TMPDIR=<path>
+    Similar to the memory limit, if you've got a small system and your OS mounts
+    /tmp as tmpfs, you should set this to a path that's on a physical disk, like
+    /home/your_user/tmp or something.  ImageMagick will use this as scratch space
+    when crunching through very large documents.
+
+    For more information on how to use this value, you should search
+    the web for "MAGICK_TMPDIR".
+
+    Default is none, which disables the temporary directory.
+
+PAPERLESS_OPTIMIZE_THUMBNAILS=<bool>
+    Use optipng to optimize thumbnails. This usually reduces the size of
+    thumbnails by about 20%, but uses considerable compute time during
+    consumption.
+
+    Defaults to true.
+
+PAPERLESS_POST_CONSUME_SCRIPT=<filename>
+    After a document is consumed, Paperless can trigger an arbitrary script if
+    you like.  This script will be passed a number of arguments for you to work
+    with. For more information, take a look at :ref:`advanced-post_consume_script`.
+
+    The default is blank, which means nothing will be executed.
+
+PAPERLESS_FILENAME_DATE_ORDER=<format>
+    Paperless will check the document text for document date information.
+    Use this setting to enable checking the document filename for date
+    information. The date order can be set to any option as specified in
+    https://dateparser.readthedocs.io/en/latest/settings.html#date-order.
+    The filename will be checked first, and if nothing is found, the document
+    text will be checked as normal.
+
+    Defaults to none, which disables this feature.
+
+PAPERLESS_FILENAME_PARSE_TRANSFORMS
+    Transforms filenames before they are processed by paperless. See
+    :ref:`advanced-transforming_filenames` for details.
+
+    Defaults to none, which disables this feature.
+
+Binaries
+########
+
+There are a few external software packages that Paperless expects to find on
+your system when it starts up.  Unless you've done something creative with
+their installation, you probably won't need to edit any of these.  However,
+if you've installed these programs somewhere where simply typing the name of
+the program doesn't automatically execute it (i.e. the program isn't in your
+$PATH), then you'll need to specify the literal path for that program.
+
+PAPERLESS_CONVERT_BINARY=<path>
+    Defaults to "/usr/bin/convert".
+
+PAPERLESS_GS_BINARY=<path>
+    Defaults to "/usr/bin/gs".
+
+PAPERLESS_OPTIPNG_BINARY=<path>
+    Defaults to "/usr/bin/optipng".
--- a/docs/consumption.rst
+++ b/docs/consumption.rst
@@ -1,255 +0,0 @@
-.. _consumption:
-
-Consumption
-###########
-
-Once you've got Paperless setup, you need to start feeding documents into it.
-Currently, there are three options: the consumption directory, IMAP (email), and
-HTTP POST.
-
-
-.. _consumption-directory:
-
-The Consumption Directory
-=========================
-
-The primary method of getting documents into your database is by putting them in
-the consumption directory.  The ``document_consumer`` script runs in an infinite
-loop looking for new additions to this directory and when it finds them, it goes
-about the process of parsing them with the OCR, indexing what it finds, and
-encrypting the PDF (if ``PAPERLESS_PASSPHRASE`` is set), storing it in the
-media directory.
-
-Getting stuff into this directory is up to you.  If you're running Paperless
-on your local computer, you might just want to drag and drop files there, but if
-you're running this on a server and want your scanner to automatically push
-files to this directory, you'll need to setup some sort of service to accept the
-files from the scanner.  Typically, you're looking at an FTP server like
-`Proftpd`_ or `Samba`_.
-
-.. _Proftpd: http://www.proftpd.org/
-.. _Samba: http://www.samba.org/
-
-So where is this consumption directory?  It's wherever you define it.  Look for
-the ``CONSUMPTION_DIR`` value in ``settings.py``.  Set that to somewhere
-appropriate for your use and put some documents in there.  When you're ready,
-follow the :ref:`consumer <utilities-consumer>` instructions to get it running.
-
-
-.. _consumption-directory-hook:
-
-Hooking into the Consumption Process
------------------------------------
-
-Sometimes you may want to do something arbitrary whenever a document is
-consumed.  Rather than try to predict what you may want to do, Paperless lets
-you execute scripts of your own choosing just before or after a document is
-consumed using a couple simple hooks.
-
-Just write a script, put it somewhere that Paperless can read & execute, and
-then put the path to that script in ``paperless.conf`` with the variable name
-of either ``PAPERLESS_PRE_CONSUME_SCRIPT`` or
-``PAPERLESS_POST_CONSUME_SCRIPT``.  The script will be executed before or
-or after the document is consumed respectively.
-
-.. important::
-
-    These scripts are executed in a **blocking** process, which means that if
-    a script takes a long time to run, it can significantly slow down your
-    document consumption flow.  If you want things to run asynchronously,
-    you'll have to fork the process in your script and exit.
-
-
-.. _consumption-directory-hook-variables:
-
-What Can These Scripts Do?
-..........................
-
-It's your script, so you're only limited by your imagination and the laws of
-physics.  However, the following values are passed to the scripts in order:
-
-
-.. _consumption-director-hook-variables-pre:
-
-Pre-consumption script
-::::::::::::::::::::::
-
-* Document file name
-
-A simple but common example for this would be creating a simple script like
-this:
-
-``/usr/local/bin/ocr-pdf``
-
-.. code:: bash
-
-    #!/usr/bin/env bash
-    pdf2pdfocr.py -i ${1}
-
-``/etc/paperless.conf``
-
-.. code:: bash
-
-    ...
-    PAPERLESS_PRE_CONSUME_SCRIPT="/usr/local/bin/ocr-pdf"
-    ...
-
-This will pass the path to the document about to be consumed to ``/usr/local/bin/ocr-pdf``,
-which will in turn call `pdf2pdfocr.py`_ on your document, which will then
-overwrite the file with an OCR'd version of the file and exit.  At which point,
-the consumption process will begin with the newly modified file.
-
-.. _pdf2pdfocr.py: https://github.com/LeoFCardoso/pdf2pdfocr
-
-
-.. _consumption-director-hook-variables-post:
-
-Post-consumption script
-:::::::::::::::::::::::
-
-* Document id
-* Generated file name
-* Source path
-* Thumbnail path
-* Download URL
-* Thumbnail URL
-* Correspondent
-* Tags
-
-The script can be in any language you like, but for a simple shell script
-example, you can take a look at ``post-consumption-example.sh`` in the
-``scripts`` directory in this project.
-
-
-.. _consumption-imap:
-
-IMAP (Email)
-============
-
-Another handy way to get documents into your database is to email them to
-yourself.  The typical use-case would be to be out for lunch and want to send a
-copy of the receipt back to your system at home.  Paperless can be taught to
-pull emails down from an arbitrary account and dump them into the consumption
-directory where the process :ref:`above <consumption-directory>` will follow the
-usual pattern on consuming the document.
-
-Some things you need to know about this feature:
-
-* It's disabled by default.  By setting the values below it will be enabled.
-* It's been tested in a limited environment, so it may not work for you (please
-  submit a pull request if you can!)
-* It's designed to **delete mail from the server once consumed**.  So don't go
-  pointing this to your personal email account and wonder where all your stuff
-  went.
-* Currently, only one photo (attachment) per email will work.
-
-So, with all that in mind, here's what you do to get it running:
-
-1. Setup a new email account somewhere, or if you're feeling daring, create a
-   folder in an existing email box and note the path to that folder.
-2. In ``/etc/paperless.conf`` set all of the appropriate values in
-   ``PATHS AND FOLDERS`` and ``SECURITY``.
-   If you decided to use a subfolder of an existing account, then make sure you
-   set ``PAPERLESS_CONSUME_MAIL_INBOX`` accordingly here.  You also have to set
-   the ``PAPERLESS_EMAIL_SECRET`` to something you can remember 'cause you'll
-   have to include that in every email you send.
-3. Restart the :ref:`consumer <utilities-consumer>`.  The consumer will check
-   the configured email account at startup and from then on every 10 minutes
-   for something new and pulls down whatever it finds.
-4. Send yourself an email!  Note that the subject is treated as the file name,
-   so if you set the subject to ``Correspondent - Title - tag,tag,tag``, you'll
-   get what you expect.  Also, you must include the aforementioned secret
-   string in every email so the fetcher knows that it's safe to import.
-   Note that Paperless only allows the email title to consist of safe characters
-   to be imported. These consist of alpha-numeric characters and ``-_ ,.'``.
-5. After a few minutes, the consumer will poll your mailbox, pull down the
-   message, and place the attachment in the consumption directory with the
-   appropriate name.  A few minutes later, the consumer will import it like any
-   other file.
-
-
-.. _consumption-http:
-
-HTTP POST
-=========
-
-You can also submit a document via HTTP POST, so long as you do so after
-authenticating.  To push your document to Paperless, send an HTTP POST to the
-server with the following name/value pairs:
-
-* ``correspondent``: The name of the document's correspondent.  Note that there
-  are restrictions on what characters you can use here.  Specifically,
-  alphanumeric characters, `-`, `,`, `.`, and `'` are ok, everything else is
-  out.  You also can't use the sequence ` - ` (space, dash, space).
-* ``title``: The title of the document.  The rules for characters is the same
-  here as the correspondent.
-* ``document``: The file you're uploading
-
-Specify ``enctype="multipart/form-data"``, and then POST your file with::
-
-    Content-Disposition: form-data; name="document"; filename="whatever.pdf"
-
-An example of this in HTML is a typical form:
-
-.. code:: html
-
-    <form method="post" enctype="multipart/form-data">
-        <input type="text" name="correspondent" value="My Correspondent" />
-        <input type="text" name="title" value="My Title" />
-        <input type="file" name="document" />
-        <input type="submit" name="go" value="Do the thing" />
-    </form>
-
-But a potentially more useful way to do this would be in Python.  Here we use
-the requests library to handle basic authentication and to send the POST data
-to the URL.
-
-.. code:: python
-
-    import os
-
-    from hashlib import sha256
-
-    import requests
-    from requests.auth import HTTPBasicAuth
-
-    # You authenticate via BasicAuth or with a session id.
-    # We use BasicAuth here
-    username = "my-username"
-    password = "my-super-secret-password"
-
-    # Where you have Paperless installed and listening
-    url = "http://localhost:8000/push"
-
-    # Document metadata
-    correspondent = "Test Correspondent"
-    title = "Test Title"
-
-    # The local file you want to push
-    path = "/path/to/some/directory/my-document.pdf"
-
-
-    with open(path, "rb") as f:
-
-        response = requests.post(
-            url=url,
-            data={"title": title,  "correspondent": correspondent},
-            files={"document": (os.path.basename(path), f, "application/pdf")},
-            auth=HTTPBasicAuth(username, password),
-            allow_redirects=False
-        )
-
-        if response.status_code == 202:
-
-            # Everything worked out ok
-            print("Upload successful")
-
-        else:
-
-            # If you don't get a 202, it's probably because your credentials
-            # are wrong or something.  This will give you a rough idea of what
-            # happened.
-
-            print("We got HTTP status code: {}".format(response.status_code))
-            for k, v in response.headers.items():
-                print("{}: {}".format(k, v))
--- a/docs/contributing.rst
+++ b/docs/contributing.rst
@@ -3,6 +3,10 @@
 Contributing to Paperless
 #########################

+.. warning::
+
+    This section is not updated to paperless-ng yet.
+    
 Maybe you've been using Paperless for a while and want to add a feature or two,
 or maybe you've come across a bug that you have some ideas how to solve.  The
 beauty of Free software is that you can see what's wrong and help to get it
@@ -81,7 +85,7 @@ quoted, or triple-quoted string will do:
    problematic_string = 'This is a "string" with "quotes" in it'

 In HTML templates, please use double-quotes for tag attributes, and single
-quotes for arguments passed to Django tempalte tags:
+quotes for arguments passed to Django template tags:

 .. code:: html

--- a/docs/customising.rst
+++ b/docs/customising.rst
@@ -1,42 +0,0 @@
-.. _customising:
-
-Customising Paperless
-#####################
-
-Currently, the Paperless' interface is just the default Django admin, which
-while powerful, is rather boring.  If you'd like to give the site a bit of a
-face-lift, or if you simply want to adjust the colours, contrast, or font size
-to make things easier to read, you can do that by adding your own CSS or
-Javascript quite easily.
-
-
-.. _customising-overrides:
-
-Overrides
-=========
-
-On every page load, Paperless looks for two files in your media root directory
-(the directory defined by your ``PAPERLESS_MEDIADIR`` configuration variable or
-the default, ``<project root>/media/``) for two files:
-
-* ``overrides.css``
-* ``overrides.js``
-
-If it finds either or both of those files, they'll be loaded into the page: the
-CSS in the ``<head>``, and the Javascript stuffed into the last line of the
-``<body>``.
-
-
-.. _customising-overrides-note:
-
-An important note about customisation
-------------------------------------
-
-Any changes you make to the site with your CSS or Javascript are likely to
-depend on the structure of the current HTML and/or the existing CSS rules.  For
-the most part it's safe to assume that these bits won't change, but *sometimes
-they do* as features are added or bugs are fixed.
-
-If you make a change that you think others would appreciate though, submit it
-as a pull request and maybe we can find a way to work it into the project by
-default!
--- a/docs/examples/lxc/lxc-install.sh
+++ b/docs/examples/lxc/lxc-install.sh
@@ -1,158 +0,0 @@
-#!/usr/bin/env bash
-
-# Bash script to install paperless in lxc containter
-# paperless.lan
-#
-# Will set-up paperless, apache2 and proftpd
-#
-# lxc launch ubuntu: paperless
-# lxc exec paperless -- sh -c "sudo apt-get update && sudo apt-get install -y wget"
-# lxc exec paperless -- sh -c "wget https://raw.githubusercontent.com/the-paperless-project/paperless/master/docs/examples/lxc/lxc-install.sh && /bin/bash lxc-install.sh --email "
-#
-#
-set +e
-PASSWORD=$(< /dev/urandom tr -dc _A-Z-a-z-0-9+@%^{} | head -c20;echo;)
-EMAIL=
-
-function displayHelp() {
-    echo "available parameters:
-    -e <email> | --email <email> 
-    -p <password> | --password <password>
-    "
-}
-
-POSITIONAL=()
-while [[ $# -gt 0 ]]
-do
-key="$1"
-i=$key
-
-case $i in
-    -e|--email)
-      EMAIL="${2}"
-      shift
-      shift
-    ;;
-    -p|--password)
-      PASSWORD="${2}"
-      shift
-      shift
-    ;;
-    --default|-h|--help)
-      shift
-      displayHelp
-      exit 0
-    ;;
-    *)
-      echo "argument: $i not recognized"
-      exit 2
-    ;;
-esac
-done
-set -- "${POSITIONAL[@]}" # restore positional parameters
-
-if [ -z $EMAIL ]; then
-  echo "missing email, try running with -h "
-  exit 3
-fi
-if [[ $(/usr/bin/id -u) -ne 0 ]]; then
-    echo "Not running as root"
-    exit
-fi
-
-if [ $(grep -c paperless /etc/passwd) -eq 0 ]; then
-  # Add paperless user with no password
-  adduser --disabled-password --gecos "" paperless
-fi
-
-if [ $(grep -c ftpupload /etc/passwd) -eq 0 ]; then
-  # Add ftpupload
-  adduser --disabled-password --gecos "" ftpupload
-  echo "Set ftpupload password: "
-  #passwd ftpupload
-  #TODO: generate some password and allow parameter 
-  echo "ftpupload:ftpuploadpassword" | chpasswd
-fi
-
-if [ $(id -nG paperless | grep -Fcw ftpupload) -eq 0 ]; then
-  # Allow paperless group to access
-  adduser paperless ftpupload
-  chmod g+w /home/ftpupload 
-fi
-
-# Get apt up to date
-apt-get update
-
-# Needed for plain Paperless
-apt-get -y install unpaper gnupg libpoppler-cpp-dev python3-pyocr tesseract-ocr imagemagick optipng git
-
-# Needed for Apache
-apt-get -y install apache2 libapache2-mod-wsgi-py3
-
-if [ ! -f /etc/proftpd/proftpd.conf ]; then
-  # Install ftp server and make sure all uplaoded files are owned by paperless
-  apt-get -y install proftpd
-fi
-if [ $(grep -c paperless /etc/proftpd/proftpd.conf) -eq 0 ]; then
-  cat <<EOF >> /etc/proftpd/proftpd.conf
-  <Directory /home/ftpupload/>
-    UserOwner   paperless
-    GroupOwner  paperless
-  </Directory>
-EOF
-  systemctl restart proftpd
-fi
-
-#Get Paperless from git 
-su -c "cd /home/paperless ; git clone https://github.com/the-paperless-project/paperless" paperless
-
-# Install Pip Requirements
-apt-get -y install python3-pip python3-venv
-cd /home/paperless/paperless
-pip3 install -r requirements.txt
-
-# Take paperless.conf.example and set consumuption dir (ftp dir)
-sed  -e '/PAPERLESS_CONSUMPTION_DIR=/s/=.*/=\"\/home\/ftpupload\/\"/' \
-     /home/paperless/paperless/paperless.conf.example  >/etc/paperless.conf
-
-# Update /etc/paperless.conf with PAPERLESS_SECRET_KEY
-SECRET=$(strings /dev/urandom | grep -o '[[:alnum:]]' | head -n 30 | tr -d '\n'; echo)
-sed  -i "s/#PAPERLESS_SECRET_KEY.*/PAPERLESS_SECRET_KEY=$SECRET/" /etc/paperless.conf 
-
-#Initialise the SQLite database 
-su -c "cd /home/paperless/paperless/src/ ; ./manage.py migrate" paperless
-echo "if superuser doesn't exists, create one with login: paperless and password: ${PASSWORD}"
-#Create a user for your Paperless instance
-su -c "cd /home/paperless/paperless/src/ ; echo ./manage.py create_superuser_with_password --username paperless --email ${EMAIL} --password ${PASSWORD} --preserve" paperless
-su -c "cd /home/paperless/paperless/src/ ; ./manage.py create_superuser_with_password --username paperless --email ${EMAIL} --password ${PASSWORD} --preserve" paperless
-
-if [ ! -d /home/paperless/paperless/static ]; then
-  # 167 static files copied to '/home/paperless/paperless/static'.
-  su -c "cd /home/paperless/paperless/src/ ; ./manage.py collectstatic" paperless
-fi
-
-if [ ! -f /etc/apache2/sites-available/paperless.conf ]; then
-  # Set-up apache
-  cp /home/paperless/paperless/docs/examples/lxc/paperless.conf /etc/apache2/sites-available/
-  a2dissite 000-default.conf
-  a2ensite paperless.conf
-  systemctl reload apache2
-fi
-
-sed -e "s:home/paperless/project/virtualenv/bin/python:usr/bin/python3:" \
-     /home/paperless/paperless/scripts/paperless-consumer.service \
-     >/etc/systemd/system/paperless-consumer.service
-
-sed -i "s:/home/paperless/project/src/manage.py:/home/paperless/paperless/src/manage.py:" \
-      /etc/systemd/system/paperless-consumer.service
-
-
-systemctl enable paperless-consumer
-systemctl start paperless-consumer
-
-# convert-im6.q16: not authorized
-# Security risk ?
-# https://stackoverflow.com/questions/42928765/convertnot-authorized-aaaa-error-constitute-c-readimage-453
-if [ -f /etc/ImageMagick-6/policy.xml ]; then
-  mv /etc/ImageMagick-6/policy.xml /etc/ImageMagick-6/policy.xmlout
-fi
--- a/docs/examples/lxc/paperless.conf
+++ b/docs/examples/lxc/paperless.conf
@@ -1,18 +0,0 @@
-<VirtualHost *:80>
-    ServerName paperless.lan
-
-    Alias /static/ /home/paperless/paperless/static/
-    <Directory /home/paperless/paperless/static>
-        Require all granted
-    </Directory>
-
-    WSGIScriptAlias / /home/paperless/paperless/src/paperless/wsgi.py
-    WSGIDaemonProcess paperless.lan user=paperless group=paperless threads=5 python-path=/home/paperless/paperless/src 
-    WSGIProcessGroup paperless.lan
-
-    <Directory /home/paperless/paperless/src/paperless>
-        <Files wsgi.py>
-            Require all granted
-        </Files>
-    </Directory>
-</VirtualHost>
--- a/docs/extending.rst
+++ b/docs/extending.rst
@@ -1,112 +1,197 @@
 .. _extending:

+Paperless development
+#####################
+
+This section describes the steps you need to take to start development on paperless-ng.
+
+1.  Check out the source from GitHub. The repository is organized in the following way:
+
+    *   ``master`` always represents the latest release and will only see changes
+        when a new release is made.
+    *   ``dev`` contains the code that will be in the next release.
+    *   ``feature-X`` branches contain bigger changes that will be in some release, but not
+        necessarily the next one.
+    
+    Apart from that, the folder structure is as follows:
+
+    *   ``docs/`` - Documentation.
+    *   ``src-ui/`` - Code of the front end.
+    *   ``src/`` - Code of the back end.
+    *   ``scripts/`` - Various scripts that help with different parts of development.
+    *   ``docker/`` - Files required to build the docker image.
+
+2.  Install some dependencies.
+
+    *   Python 3.6.
+    *   All dependencies listed in the :ref:`Bare metal route <setup-bare_metal>`.
+    *   redis. You can either install redis or use the included ``scripts/start-redis.sh``
+        to use docker to fire up a redis instance.
+
+Back end development
+====================
+
+The back end is a Django application. I use PyCharm for development, but you can use whatever
+you want.
+
+Install the python dependencies by performing ``pipenv install --dev`` in the src/ directory.
+This will also create a virtual environment, which you can enter with ``pipenv shell`` or
+execute one-shot commands in with ``pipenv run``.
+
+In ``src/paperless.conf``, enable debug mode.
+
+Configure the IDE to use the src/ folder as the base source folder. Configure the following
+launch configurations in your IDE:
+
+*   python3 manage.py runserver
+*   python3 manage.py qcluster
+*   python3 manage.py consumer
+
+Depending on which part of paperless you're developing for, you need to have some or all of
+them running.
+
+Testing and code style:
+
+*   Run ``pytest`` in the src/ directory to execute all tests. This also generates an HTML coverage
+    report. When running tests, paperless.conf is loaded as well. However, the tests rely on the default
+    configuration. This is not ideal, but for now, make sure no settings except for DEBUG are overridden when testing.
+*   Run ``pycodestyle`` to test your code for issues with the configured code style settings.
+
+    .. note::
+
+        The line length rule E501 is generally useful for getting multiple source files
+        next to each other on the screen. However, in some cases, it's just not possible
+        to make some lines fit, especially complicated IF cases. Append ``  # NOQA: E501``
+        to disable this check for certain lines.
+
+Front end development
+=====================
+
+The front end is built using Angular. I use the ``Code - OSS`` IDE for development.
+
+In order to get started, you need ``npm``. Install the Angular CLI with
+
+.. code:: shell-session
+
+    $ npm install -g @angular/cli
+
+and make sure that it's on your path. Next, in the src-ui/ directory, install the
+required dependencies of the project.
+
+.. code:: shell-session
+
+    $ npm install
+
+You can launch a development server by running
+
+.. code:: shell-session
+
+    $ ng serve
+
+This will automatically update whenever you save. However, in-place compilation might fail
+on syntax errors, in which case you need to restart it.
+
+By default, the development server is available on ``http://localhost:4200/`` and is configured
+to access the API at ``http://localhost:8000/api/``, which is the default of the backend.
+If you enabled DEBUG on the back end, several security overrides for allowed hosts, CORS and
+X-Frame-Options are in place so that the front end behaves exactly as in production. This also
+relies on you being logged into the back end. Without a valid session, the front end will simply
+not work.
+
+In order to build the front end and serve it as part of django, execute
+
+.. code:: shell-session
+
+    $ ng build --prod --output-path ../src/documents/static/frontend/
+
+This will build the front end and put it in a location from which the Django server will serve
+it as static content. This way, you can verify that authentication is working.
+
+Making a release
+================
+
+Execute the ``make-release.sh <ver>`` script.
+
+This will test and assemble everything and also build and tag a docker image.
+
+
 Extending Paperless
 ===================

-For the most part, Paperless is monolithic, so extending it is often best
-managed by way of modifying the code directly and issuing a pull request on
-`GitHub`_.  However, over time the project has been evolving to be a little
-more "pluggable" so that users can write their own stuff that talks to it.
+Paperless does not have a fancy plugin system and probably never will. However,
+some parts of the application have been designed to allow easy integration of additional
+features without any modification to the base code.

-.. _GitHub: https://github.com/the-paperless-project/paperless
+Making custom parsers
+---------------------

+Paperless uses parsers to add documents to paperless. A parser is responsible for:

-.. _extending-parsers:
+*   Retrieving the content from the original
+*   Creating a thumbnail
+*   Optional: Retrieving a created date from the original
+*   Optional: Creating an archived document from the original

-Parsers
-------
+Custom parsers can be added to paperless to support more file types. In order to do that,
+you need to write the parser itself and announce its existence to paperless.

-You can leverage Paperless' consumption model to have it consume files *other*
-than ones handled by default like ``.pdf``, ``.jpg``, and ``.tiff``.  To do so,
-you simply follow Django's convention of creating a new app, with a few key
-requirements.
-
-
-.. _extending-parsers-parserspy:
-
-parsers.py
-..........
-
-In this file, you create a class that extends
-``documents.parsers.DocumentParser`` and go about implementing the three
-required methods:
-
-* ``get_thumbnail()``: Returns the path to a file we can use as a thumbnail for
-  this document.
-* ``get_text()``: Returns the text from the document and only the text.
-* ``get_date()``: If possible, this returns the date of the document, otherwise
-  it should return ``None``.
-
-
-.. _extending-parsers-signalspy:
-
-signals.py
-..........
-
-At consumption time, Paperless emits a ``document_consumer_declaration``
-signal which your module has to react to in order to let the consumer know
-whether or not it's capable of handling a particular file.  Think of it like
-this:
-
-1. Consumer finds a file in the consumption directory.
-2. It asks all the available parsers: *"Hey, can you handle this file?"*
-3. Each parser responds with either ``None`` meaning they can't handle the
-   file, or a dictionary in the following format:
+The parser itself must extend ``documents.parsers.DocumentParser`` and must implement the
+methods ``parse`` and ``get_thumbnail``. You can provide your own implementation to
+``get_date`` if you don't want to rely on paperless' default date guessing mechanisms.

 .. code:: python

-    {
-        "parser": <the class name>,
-        "weight": <an integer>
-    }
+    import os
+
+    from documents.parsers import DocumentParser
+
+
+    class MyCustomParser(DocumentParser):

-The consumer compares the ``weight`` values from all respondents and uses the
-class with the highest value to consume the document.  The default parser,
-``RasterisedDocumentParser`` has a weight of ``0``.
+        def parse(self, document_path, mime_type):
+            # This method does not return anything. Rather, you should assign
+            # whatever you got from the document to the following fields:

+            # The content of the document.
+            self.text = "content"
+            
+            # Optional: path to a PDF document that you created from the original.
+            self.archive_path = os.path.join(self.tempdir, "archived.pdf")

-.. _extending-parsers-appspy:
+            # Optional: "created" date of the document.
+            # get_created_from_metadata is a placeholder for your own helper.
+            self.date = get_created_from_metadata(document_path)

-apps.py
-.......
+        def get_thumbnail(self, document_path, mime_type):
+            # This should return the path to a thumbnail you created for this
+            # document.
+            return os.path.join(self.tempdir, "thumb.png")

-This is a standard Django file, but you'll need to add some code to it to
-connect your parser to the ``document_consumer_declaration`` signal.
+If you encounter any issues during parsing, raise a ``documents.parsers.ParseError``.

+The ``self.tempdir`` directory is a temporary directory that is guaranteed to be empty
+and is removed after consumption has finished. You can use that directory to store any
+intermediate files and also use it to store the thumbnail / archived document.

-.. _extending-parsers-finally:
-
-Finally
-.......
-
-The last step is to update ``settings.py`` to include your new module.
-Eventually, this will be dynamic, but at the moment, you have to edit the
-``INSTALLED_APPS`` section manually.  Simply add the path to your AppConfig to
-the list like this:
+After that, you need to announce your parser to paperless. You need to connect a
+handler to the ``document_consumer_declaration`` signal. Have a look at the file
+``src/paperless_tesseract/apps.py`` to see how that's done. The handler is a function
+that returns information about your parser:

 .. code:: python

-    INSTALLED_APPS = [
-        ...
-        "my_module.apps.MyModuleConfig",
-        ...
-    ]
+    def myparser_consumer_declaration(sender, **kwargs):
+        return {
+            "parser": MyCustomParser,
+            "weight": 0,
+            "mime_types": {
+                "application/pdf": ".pdf",
+                "image/jpeg": ".jpg",
+            }
+        }

-Order doesn't matter, but generally it's a good idea to place your module lower
-in the list so that you don't end up accidentally overriding project defaults
-somewhere.
+*   ``parser`` is a reference to a class that extends ``DocumentParser``.

+*   ``weight`` is used whenever two or more parsers are able to parse a file: The parser with
+    the higher weight wins. This can be used to override the parsers provided by
+    paperless.

-.. _extending-parsers-example:
-
-An Example
-..........
-
-The core Paperless functionality is based on this design, so if you want to see
-what a parser module should look like, have a look at `parsers.py`_,
-`signals.py`_, and `apps.py`_ in the `paperless_tesseract`_ module.
-
-.. _parsers.py: https://github.com/the-paperless-project/paperless/blob/master/src/paperless_tesseract/parsers.py
-.. _signals.py: https://github.com/the-paperless-project/paperless/blob/master/src/paperless_tesseract/signals.py
-.. _apps.py: https://github.com/the-paperless-project/paperless/blob/master/src/paperless_tesseract/apps.py
-.. _paperless_tesseract: https://github.com/the-paperless-project/paperless/blob/master/src/paperless_tesseract/
+*   ``mime_types`` is a dictionary. The keys are the mime types your parser supports and the values
+    are the default file extensions that paperless should use when storing files and serving them for
+    download. We could guess that from the file extensions, but some mime types have many extensions
+    associated with them and the python methods responsible for guessing the extension do not always
+    return the same value.
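+
+The weight rule can be sketched in a few lines of plain Python. This is only an
+illustration of the selection logic described above, not the code paperless actually
+runs: ``pick_parser`` and the two stand-in parser classes are hypothetical names, and
+the declarations are assumed to be dictionaries shaped like the one returned by the
+handler.
+
+.. code:: python
+
+    def pick_parser(declarations, mime_type):
+        """Return the highest-weight parser class that supports mime_type."""
+        best = None
+        for declaration in declarations:
+            # Skip parsers that did not declare this mime type.
+            if mime_type not in declaration["mime_types"]:
+                continue
+            if best is None or declaration["weight"] > best["weight"]:
+                best = declaration
+        return best["parser"] if best else None
+
+
+    class BuiltinPdfParser:
+        """Hypothetical stand-in for a parser shipped with paperless."""
+
+
+    class MyCustomParser:
+        """Hypothetical stand-in for your own parser."""
+
+
+    declarations = [
+        {
+            "parser": BuiltinPdfParser,
+            "weight": 0,
+            "mime_types": {"application/pdf": ".pdf"},
+        },
+        {
+            "parser": MyCustomParser,
+            "weight": 10,
+            "mime_types": {"application/pdf": ".pdf", "image/jpeg": ".jpg"},
+        },
+    ]
+
+    chosen = pick_parser(declarations, "application/pdf")
+
+Both stand-ins claim ``application/pdf``, so the one with the higher weight is chosen.
+For a mime type that no declaration lists, the sketch returns ``None``.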
--- a/docs/faq.rst
+++ b/docs/faq.rst
@@ -0,0 +1,106 @@
+
+**************************
+Frequently asked questions
+**************************
+
+**Q:** *What's the general plan for Paperless-ng?*
+
+**A:** Paperless-ng is already almost feature-complete. This project will remain
+as simple as it is right now. It will see improvements to features that are already there.
+If you need advanced features such as document versions,
+workflows or multi-user with customizable access to individual files, this is
+not the tool for you.
+
+Features that *are* planned are some more quality-of-life extensions for searching
+(e.g., search for similar documents, group results by correspondents with "more from this"
+links, etc.), bulk editing and hierarchical tags.
+
+**Q:** *I'm using docker. Where are my documents?*
+
+**A:** Your documents are stored inside the docker volume ``paperless_media``.
+Docker manages this volume automatically for you. It is a persistent storage
+and will persist as long as you don't explicitly delete it. The actual location
+depends on your host operating system. On Linux, chances are high that this location
+is
+
+.. code::
+
+    /var/lib/docker/volumes/paperless_media/_data
+
+.. caution::
+
+    Do not mess with this folder. Don't change permissions and don't move
+    files around manually. This folder is meant to be entirely managed by docker
+    and paperless.
+
+**Q:** *Let's say you don't support this project anymore in a year. Can I easily move to other systems?*
+
+**A:** Your documents are stored as plain files inside the media folder. You can always drag those files
+out of that folder to use them elsewhere. Here are a couple of notes about that.
+
+*   Paperless never modifies your original documents. It keeps checksums of all documents and uses a
+    scheduled sanity checker to check that they remain the same.
+*   By default, paperless uses the internal ID of each document as its filename. This might not be very
+    convenient for export. However, you can adjust the way files are stored in paperless by
+    :ref:`configuring the filename format <advanced-file_name_handling>`.
+*   :ref:`The exporter <utilities-exporter>` is another easy way to get your files out of paperless with reasonable file names.
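+
+The idea behind such a checksum comparison can be sketched as follows. This is not
+paperless' actual sanity checker: the function names are hypothetical, and the hash
+algorithm (SHA-256 here) is an assumption chosen for illustration.
+
+.. code:: python
+
+    import hashlib
+
+
+    def file_checksum(path, algorithm="sha256", chunk_size=65536):
+        """Return the hex digest of a file, read in chunks to limit memory use."""
+        digest = hashlib.new(algorithm)
+        with open(path, "rb") as handle:
+            for chunk in iter(lambda: handle.read(chunk_size), b""):
+                digest.update(chunk)
+        return digest.hexdigest()
+
+
+    def is_unchanged(path, recorded_checksum, algorithm="sha256"):
+        """Compare a file's current checksum against a previously recorded one."""
+        return file_checksum(path, algorithm) == recorded_checksum
+
+A check like this can also be run by hand on files copied out of the media folder,
+provided you recorded their checksums beforehand.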
+
+**Q:** *What file types does paperless-ng support?*
+
+**A:** Currently, the following files are supported:
+
+*   PDF documents, PNG images, JPEG images, TIFF images and GIF images are processed with OCR and converted into PDF documents.
+*   Plain text documents are supported as well and are added verbatim
+    to paperless.
+
+Paperless determines the type of a file by inspecting its content. The
+file extensions do not matter.
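+
+Content-based detection usually works by looking at the first few bytes of a file.
+The sketch below illustrates the idea with a small hand-written signature table. It is
+an assumption made for illustration, not the detection code paperless uses, and
+``sniff_mime_type`` is a hypothetical name.
+
+.. code:: python
+
+    # Leading byte sequences ("magic numbers") of some well-known formats.
+    SIGNATURES = [
+        (b"%PDF-", "application/pdf"),
+        (b"\x89PNG\r\n\x1a\n", "image/png"),
+        (b"\xff\xd8\xff", "image/jpeg"),
+        (b"GIF87a", "image/gif"),
+        (b"GIF89a", "image/gif"),
+        (b"II*\x00", "image/tiff"),  # little-endian TIFF
+        (b"MM\x00*", "image/tiff"),  # big-endian TIFF
+    ]
+
+
+    def sniff_mime_type(data):
+        """Guess a mime type from the leading bytes, or return None if unknown."""
+        for signature, mime_type in SIGNATURES:
+            if data.startswith(signature):
+                return mime_type
+        return None
+
+A PDF that was saved as ``scan.jpg`` still starts with ``%PDF-`` and is therefore
+recognised as a PDF. Plain text has no signature, so this sketch returns ``None`` for
+it; real detection libraries use additional heuristics for such files.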
+
+**Q:** *Will paperless-ng run on Raspberry Pi?*
+
+**A:** The short answer is yes. I've tested it on a Raspberry Pi 3 B.
+The long answer is that certain parts of
+Paperless, such as the tesseract OCR, will run very slowly. On Raspberry Pi,
+try to OCR documents before feeding them into paperless so that paperless can
+reuse the text. The web interface should be a lot snappier, since it runs
+in your browser and paperless has to do much less work to serve the data.
+
+.. note::
+    
+    You can adjust some of the settings so that paperless uses less processing
+    power. See :ref:`setup-less_powerful_devices` for details.
+    
+
+**Q:** *How do I install paperless-ng on Raspberry Pi?*
+
+**A:** There is no docker image for ARM available. If you know how to build
+that automatically, I'm all ears. For now, you have to grab the latest release
+archive from the project page and build the image yourself. The release comes
+with the front end already compiled, so you don't have to do this on the Pi.
+
+**Q:** *How do I run this on unRaid?*
+
+**A:** Head over to `<https://github.com/selfhosters/unRAID-CA-templates>`_.
+`Uli Fahrer <https://github.com/Tooa>`_ created a container template for that.
+I don't exactly know how to use that though, since I don't use unRaid.
+
+**Q:** *How do I run this on my toaster?*
+
+**A:** I honestly don't know! As for all other devices that might be able
+to run paperless, you're a bit on your own. If you can't run the docker image,
+the documentation has instructions for bare metal installs. I'm running
+paperless on an i3 processor from 2015 or so. This is also what I use to test
+new releases. Apart from that, I also have a Raspberry Pi, which I
+occasionally build the image on and see if it works.
+
+**Q:** *How do I proxy this with NGINX?*
+
+**A:** A minimal ``location`` block is sufficient:
+
+.. code::
+
+    location / {
+        proxy_pass http://localhost:8000/;
+    }
+
+And that's about it. Paperless serves everything, including static files by itself
+when running the docker image. If you want to do anything fancy, you have to
+install paperless bare metal.
--- a/docs/guesswork.rst
+++ b/docs/guesswork.rst
@@ -1,131 +0,0 @@
-.. _guesswork:
-
-Guesswork
-#########
-
-During the consumption process, Paperless tries to guess some of the attributes
-of the document it's looking at.  To do this it uses two approaches:
-
-
-.. _guesswork-naming:
-
-File Naming
-===========
-
-Any document you put into the consumption directory will be consumed, but if
-you name the file right, it'll automatically set some values in the database
-for you.  This is is the logic the consumer follows:
-
-1. Try to find the correspondent, title, and tags in the file name following
-   the pattern: ``Date - Correspondent - Title - tag,tag,tag.pdf``.  Note that
-   the format of the date is **rigidly defined** as ``YYYYMMDDHHMMSSZ`` or
-   ``YYYYMMDDZ``.  The ``Z`` refers "Zulu time" AKA "UTC".
-   The tags are optional, so the format ``Date - Correspondent - Title.pdf``
-   works as well.
-2. If that doesn't work, we skip the date and try this pattern:
-   ``Correspondent - Title - tag,tag,tag.pdf``.
-3. If that doesn't work, we try to find the correspondent and title in the file
-   name following the pattern: ``Correspondent - Title.pdf``.
-4. If that doesn't work, just assume that the name of the file is the title.
-
-So given the above, the following examples would work as you'd expect:
-
-* ``20150314000700Z - Some Company Name - Invoice 2016-01-01 - money,invoices.pdf``
-* ``20150314Z - Some Company Name - Invoice 2016-01-01 - money,invoices.pdf``
-* ``Some Company Name - Invoice 2016-01-01 - money,invoices.pdf``
-* ``Another Company - Letter of Reference.jpg``
-* ``Dad's Recipe for Pancakes.png``
-
-These however wouldn't work:
-
-* ``2015-03-14 00:07:00 UTC - Some Company Name, Invoice 2016-01-01, money, invoices.pdf``
-* ``2015-03-14 - Some Company Name, Invoice 2016-01-01, money, invoices.pdf``
-* ``Some Company Name, Invoice 2016-01-01, money, invoices.pdf``
-* ``Another Company- Letter of Reference.jpg``
-
-Do I have to be so strict about naming?
---------------------------------------
-Rather than using the strict document naming rules, one can also set the option
-``PAPERLESS_FILENAME_DATE_ORDER`` in ``paperless.conf`` to any date order
-that is accepted by dateparser_. Doing so will cause ``paperless`` to default
-to any date format that is found in the title, instead of a date pulled from
-the document's text, without requiring the strict formatting of the document
-filename as described above.
-
-.. _dateparser: https://github.com/scrapinghub/dateparser/blob/v0.7.0/docs/usage.rst#settings
-
-Transforming filenames for parsing
----------------------------------
-Some devices can't produce filenames that can be parsed by the default
-parser. By configuring the option ``PAPERLESS_FILENAME_PARSE_TRANSFORMS`` in
-``paperless.conf`` one can add transformations that are applied to the filename
-before it's parsed.
-
-The option contains a list of dictionaries of regular expressions (key:
-``pattern``) and replacements (key: ``repl``) in JSON format, which are
-applied in order by passing them to ``re.subn``. Transformation stops
-after the first match, so at most one transformation is applied. The general
-syntax is
-
-.. code:: python
-
-   [{"pattern":"pattern1", "repl":"repl1"}, {"pattern":"pattern2", "repl":"repl2"}, ..., {"pattern":"patternN", "repl":"replN"}]
-
-The example below is for a Brother ADS-2400N, a scanner that allows
-different names to different hardware buttons (useful for handling
-multiple entities in one instance), but insists on adding ``_<count>``
-to the filename.
-
-.. code:: python
-
-   # Brother profile configuration, support "Name_Date_Count" (the default
-   # setting) and "Name_Count" (use "Name" as tag and "Count" as title).
-   PAPERLESS_FILENAME_PARSE_TRANSFORMS=[{"pattern":"^([a-z]+)_(\\d{8})_(\\d{6})_([0-9]+)\\.", "repl":"\\2\\3Z - \\4 - \\1."}, {"pattern":"^([a-z]+)_([0-9]+)\\.", "repl":" - \\2 - \\1."}]
-
-.. _guesswork-content:
-
-Reading the Document Contents
-=============================
-
-After the consumer has tried to figure out what it could from the file name,
-it starts looking at the content of the document itself.  It will compare the
-matching algorithms defined by every tag and correspondent already set in your
-database to see if they apply to the text in that document.  In other words,
-if you defined a tag called ``Home Utility`` that had a ``match`` property of
-``bc hydro`` and a ``matching_algorithm`` of ``literal``, Paperless will
-automatically tag your newly-consumed document with your ``Home Utility`` tag
-so long as the text ``bc hydro`` appears in the body of the document somewhere.
-
-The matching logic is quite powerful, and supports searching the text of your
-document with different algorithms, and as such, some experimentation may be
-necessary to get things Just Right.
-
-
-.. _guesswork-content-howto:
-
-How Do I Set Up These Matching Algorithms?
------------------------------------------
-
-Setting up of the algorithms is easily done through the admin interface.  When
-you create a new correspondent or tag, there are optional fields for matching
-text and matching algorithm.  From the help info there:
-
-.. note::
-
-    Which algorithm you want to use when matching text to the OCR'd PDF.  Here,
-    "any" looks for any occurrence of any word provided in the PDF, while "all"
-    requires that every word provided appear in the PDF, albeit not in the
-    order provided.  A "literal" match means that the text you enter must
-    appear in the PDF exactly as you've entered it, and "regular expression"
-    uses a regex to match the PDF.  If you don't know what a regex is, you
-    probably don't want this option.
-
-When using the "any" or "all" matching algorithms, you can search for terms
-that consist of multiple words by enclosing them in double quotes. For example,
-defining a match text of ``"Bank of America" BofA`` using the "any" algorithm,
-will match documents that contain either "Bank of America" or "BofA", but will
-not match documents containing "Bank of South America".
-
-Then just save your tag/correspondent and run another document through the
-consumer.  Once complete, you should see the newly-created document,
-automatically tagged with the appropriate data.
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -1,17 +1,14 @@
-.. _index:
-
+*********
 Paperless
-=========
+*********

 Paperless is a simple Django application running in two parts:
-a :ref:`consumer <utilities-consumer>` (the thing that does the indexing) and
-the :ref:`webserver <utilities-webserver>` (the part that lets you search &
+a *Consumer* (the thing that does the indexing) and
+the *Web server* (the part that lets you search &
 download already-indexed documents). If you want to learn more about its
 functions keep on reading after the installation section.


-.. _index-why-this-exists:
-
 Why This Exists
 ===============

@@ -25,22 +22,54 @@ finding stuff again. I feed documents right from the post box into the scanner
 and then shred them.  Perhaps you might find it useful too.


+Paperless-ng
+============
+
+Paperless-ng is a fork of the original paperless project. It changes many
+things both on the surface and under the hood. Paperless-ng was created
+because I feel that these changes are too big to be pushed into the main
+repository right away.
+
+NG stands for both Angular (the framework used for the
+front end) and next-gen. Publishing this project under a different name also
+avoids confusion between paperless and paperless-ng.
+
+If you want to learn about what's different in paperless-ng, check out these
+resources in the documentation:
+
+*   :ref:`Some screenshots <screenshots>` of the new UI are available.
+*   Read :ref:`this section <advanced-automatic_matching>` if you want to
+    learn about how paperless automates all tagging using machine learning.
+*   Paperless now comes with a :ref:`proper email consumer <usage-email>`
+    that's fully tested and production ready.
+*   Paperless creates searchable PDF/A documents from whatever you put into
+    the consumption directory. This means that you can select text in
+    image-only documents coming from your scanner.
+*   See :ref:`this note <utilities-encyption>` about GnuPG encryption in
+    paperless-ng.
+*   Paperless is now integrated with a
+    :ref:`task processing queue <setup-task_processor>` that tells you
+    at a glance when and why something is not working. 
+*   The :ref:`changelog <paperless_changelog>` contains a detailed list of all changes
+    in paperless-ng.
+
+It would be great if this project could eventually merge back into the main
+repository, but it needs a lot more work before that can happen.


 Contents
 ========

 .. toctree::
-   :maxdepth: 2
+   :maxdepth: 1

-   requirements
   setup
-   consumption
+   usage_overview
+   advanced_usage
+   administration
+   configuration
   api
-   utilities
-   guesswork
-   migrating
-   customising
+   faq
   extending
   troubleshooting
   contributing
--- a/docs/migrating.rst
+++ b/docs/migrating.rst
@@ -1,109 +0,0 @@
-.. _migrating:
-
-Migrating, Updates, and Backups
-===============================
-
-As Paperless is still under active development, there's a lot that can change
-as software updates roll out.  You should backup often, so if anything goes
-wrong during an update, you at least have a means of restoring to something
-usable.  Thankfully, there are automated ways of backing up, restoring, and
-updating the software.
-
-
-.. _migrating-backup:
-
-Backing Up
----------
-
-So you're bored of this whole project, or you want to make a remote backup of
-your files for whatever reason.  This is easy to do, simply use the
-:ref:`exporter <utilities-exporter>` to dump your documents and database out
-into an arbitrary directory.
-
-
-.. _migrating-restoring:
-
-Restoring
---------
-
-Restoring your data is just as easy, since nearly all of your data exists either
-in the file names, or in the contents of the files themselves.  You just need to
-create an empty database (just follow the
-:ref:`installation instructions <setup-installation>` again) and then import the
-``tags.json`` file you created as part of your backup.  Lastly, copy your
-exported documents into the consumption directory and start up the consumer.
-
-.. code-block:: shell-session
-
-    $ cd /path/to/project
-    $ rm data/db.sqlite3  # Delete the database
-    $ cd src
-    $ ./manage.py migrate  # Create the database
-    $ ./manage.py createsuperuser
-    $ ./manage.py loaddata /path/to/arbitrary/place/tags.json
-    $ cp /path/to/exported/docs/* /path/to/consumption/dir/
-    $ ./manage.py document_consumer
-
-Importing your data if you are :ref:`using Docker <setup-installation-docker>`
-is almost as simple:
-
-.. code-block:: shell-session
-
-    # Stop and remove your current containers
-    $ docker-compose stop
-    $ docker-compose rm -f
-
-    # Recreate them, add the superuser
-    $ docker-compose up -d
-    $ docker-compose run --rm webserver createsuperuser
-
-    # Load the tags
-    $ cat /path/to/arbitrary/place/tags.json | docker-compose run --rm webserver loaddata_stdin -
-
-    # Load your exported documents into the consumption directory
-    # (How you do this highly depends on how you have set this up)
-    $ cp /path/to/exported/docs/* /path/to/mounted/consumption/dir/
-
-After loading the documents into the consumption directory the consumer will
-immediately start consuming the documents.
-
-
-.. _migrating-updates:
-
-Updates
-------
-
-For the most part, all you have to do to update Paperless is run ``git pull``
-on the directory containing the project files, and then use Django's
-``migrate`` command to execute any database schema updates that might have been
-rolled in as part of the update:
-
-.. code-block:: shell-session
-
-    $ cd /path/to/project
-    $ git pull
-    $ pip install -r requirements.txt
-    $ cd src
-    $ ./manage.py migrate
-
-Note that it's possible (even likely) that while ``git pull`` may update some
-files, the ``migrate`` step may not update anything.  This is totally normal.
-
-Additionally, as new features are added, the ability to control those features
-is typically added by way of an environment variable set in ``paperless.conf``.
-You may want to take a look at the ``paperless.conf.example`` file to see if
-there's anything new in there compared to what you've got in ``/etc``.
-
-If you are :ref:`using Docker <setup-installation-docker>` the update process
-is similar:
-
-.. code-block:: shell-session
-
-    $ cd /path/to/project
-    $ git pull
-    $ docker build -t paperless .
-    $ docker-compose run --rm consumer migrate
-    $ docker-compose up -d
-
-If ``git pull`` doesn't report any changes, there is no need to continue with
-the remaining steps.
--- a/docs/requirements.rst
+++ b/docs/requirements.rst
@@ -1,125 +0,0 @@
-.. _requirements:
-
-Requirements
-============
-
-You need a Linux machine or Unix-like setup (theoretically an Apple machine
-should work) that has the following software installed:
-
-* `Python3`_ (with development libraries, pip and virtualenv)
-* `GNU Privacy Guard`_
-* `Tesseract`_, plus its language files matching your document base.
-* `Imagemagick`_ version 6.7.5 or higher
-* `unpaper`_
-* `libpoppler-cpp-dev`_ PDF rendering library
-* `optipng`_
-
-.. _Python3: https://python.org/
-.. _GNU Privacy Guard: https://gnupg.org
-.. _Tesseract: https://github.com/tesseract-ocr
-.. _Imagemagick: http://imagemagick.org/
-.. _unpaper: https://github.com/unpaper/unpaper
-.. _libpoppler-cpp-dev: https://poppler.freedesktop.org/
-.. _optipng: http://optipng.sourceforge.net/
-
-Notably, you should confirm how you access your Python3 installation.  Many
-Linux distributions will install Python3 in parallel to Python2, using the
-names ``python3`` and ``python`` respectively.  The same goes for ``pip3`` and
-``pip``.  Running Paperless with Python2 will likely break things, so make sure
-that you're using the right version.
-
-For the purposes of simplicity, ``python`` and ``pip`` is used everywhere to
-refer to their Python3 versions.
-
-In addition to the above, there are a number of Python requirements, all of
-which are listed in a file called ``requirements.txt`` in the project root
-directory.
-
-If you're not working on a virtual environment (like Docker), you
-should probably be using a virtualenv, but that's your call.  The reasons why
-you might choose a virtualenv or not aren't really within the scope of this
-document.  Needless to say if you don't know what a virtualenv is, you should
-probably figure that out before continuing.
-
-
-.. _requirements-apple:
-
-Problems with Imagemagick & PDFs
--------------------------------
-
-Some users have `run into problems`_ with getting ImageMagick to do its thing
-with PDFs.  Often this is the case with Apple systems using HomeBrew, but other
-Linuxes have been a problem as well.  The solution appears to be to install
-ghostscript as well as ImageMagick:
-
-.. _run into problems: https://github.com/the-paperless-project/paperless/issues/25
-
-.. code:: bash
-
-    $ brew install ghostscript
-    $ brew install imagemagick
-    $ brew install libmagic
-
-
-.. _requirements-baremetal:
-
-Python-specific Requirements: No Virtualenv
-------------------------------------------
-
-If you don't care to use a virtual env, then installation of the Python
-dependencies is easy:
-
-.. code:: bash
-
-    $ pip install --user --requirement /path/to/paperless/requirements.txt
-
-This will download and install all of the requirements into
-``${HOME}/.local``.  Remember that your distribution may be using ``pip3`` as
-mentioned above.
-
-
-.. _requirements-virtualenv:
-
-Python-specific Requirements: Virtualenv
----------------------------------------
-
-Using a virtualenv for this is pretty straightforward: create a virtualenv,
-enter it, and install the requirements using the ``requirements.txt`` file:
-
-.. code:: bash
-
-    $ virtualenv --python=/path/to/python3 /path/to/arbitrary/directory
-    $ . /path/to/arbitrary/directory/bin/activate
-    $ pip install  --requirement /path/to/paperless/requirements.txt
-
-Now you're ready to go.  Just remember to enter (activate) your virtualenv
-whenever you want to use Paperless.
-
-
-.. _requirements-documentation:
-
-Documentation
-------------
-
-As generation of the documentation is not required for the use of Paperless,
-dependencies for this process are not included in ``requirements.txt``.  If
-you'd like to generate your own docs locally, you'll need to:
-
-.. code:: bash
-
-    $ pip install sphinx
-
-and then cd into the ``docs`` directory and type ``make html``.
-
-If you are using Docker, you can use the following commands to build the
-documentation and run a webserver serving it on `port 8001`_:
-
-.. code:: bash
-
-    $ pwd
-    /path/to/paperless
-
-    $ docker build -t paperless:docs -f docs/Dockerfile .
-    $ docker run --rm -it -p "8001:8000" paperless:docs
-
-.. _port 8001: http://127.0.0.1:8001
--- a/docs/requirements.txt
+++ b/docs/requirements.txt
--- a/docs/scanners.rst
+++ b/docs/scanners.rst
@@ -1,12 +1,14 @@
+
 .. _scanners:

-Scanner Recommendations
-=======================
+***********************
+Scanner recommendations
+***********************

 As Paperless operates by watching a folder for new files, doesn't care what
 scanner you use, but sometimes finding a scanner that will write to an FTP,
 NFS, or SMB server can be difficult.  This page is here to help you find one
-that works right for you based on recommentations from other Paperless users.
+that works right for you based on recommendations from other Paperless users.

 +---------+----------------+-----+-----+-----+----------------+
 | Brand   | Model          | Supports        | Recommended By |
@@ -25,6 +27,8 @@ that works right for you based on recommentations from other Paperless users.
 +---------+----------------+-----+-----+-----+----------------+
 | Epson   | `WF-7710DWF`_  | yes |     | yes | `Skylinar`_    |
 +---------+----------------+-----+-----+-----+----------------+
+| Fujitsu | `S1300i`_      | yes |     | yes | `jonaswinkler`_|
+---------+----------------+-----+-----+-----+----------------+

 .. _ADS-1500W: https://www.brother.ca/en/p/ads1500w
 .. _MFC-J6930DW: https://www.brother.ca/en/p/MFCJ6930DW
@@ -32,6 +36,7 @@ that works right for you based on recommentations from other Paperless users.
 .. _MFC-9142CDN: https://www.brother.co.uk/printers/laser-printers/mfc9140cdn
 .. _ix500: http://www.fujitsu.com/us/products/computing/peripheral/scanners/scansnap/ix500/
 .. _WF-7710DWF: https://www.epson.de/en/products/printers/inkjet-printers/for-home/workforce-wf-7710dwf
+.. _S1300i: https://www.fujitsu.com/global/products/computing/peripheral/scanners/soho/s1300i/

 .. _danielquinn: https://github.com/danielquinn
 .. _ayounggun: https://github.com/ayounggun
@@ -39,3 +44,4 @@ that works right for you based on recommentations from other Paperless users.
 .. _eonist: https://github.com/eonist
 .. _REOLDEV: https://github.com/REOLDEV
 .. _Skylinar: https://github.com/Skylinar
+.. _jonaswinkler: https://github.com/jonaswinkler
--- a/docs/screenshots.rst
+++ b/docs/screenshots.rst
@@ -1,16 +1,45 @@
 .. _screenshots:

+***********
 Screenshots
-===========
+***********

-Once everything is set-up login to paperless using the web front-end
+This is what paperless-ng looks like. You shouldn't use paperless to index
+research papers though, it's a horrible tool for that job.

-.. image:: ./_static/Screenshot_first_run_login.png 
+The dashboard shows customizable views on your documents and allows document uploads:

-Nice clean interface
+.. image:: _static/screenshots/dashboard.png

-.. image:: ./_static/Screenshot_first_logged.png 
+The document list provides three different styles to scroll through your documents:

-Some documents loaded in via ftp or using the scanners ftp. 
+.. image:: _static/screenshots/documents-table.png
+.. image:: _static/screenshots/documents-smallcards.png
+.. image:: _static/screenshots/documents-largecards.png
+
+Extensive filtering mechanisms:
+
+.. image:: _static/screenshots/documents-filter.png
+
+Side-by-side editing of documents. Optimized for 1080p.
+
+.. image:: _static/screenshots/editing.png
+
+Tag editing. This looks about the same for correspondents and document types.
+
+.. image:: _static/screenshots/new-tag.png
+
+Searching provides autocomplete and highlights the results.
+
+.. image:: _static/screenshots/search-preview.png
+.. image:: _static/screenshots/search-results.png
+
+Fancy mail filters!
+
+.. image:: _static/screenshots/mail-rules-edited.png
+
+Mobile support in the future? This kinda works, however some layouts are still
+too wide.
+
+.. image:: _static/screenshots/mobile.png

-.. image:: ./_static/Screenshot_upload_and_scanned.png 
--- a/docs/setup.rst
+++ b/docs/setup.rst
--- a/docs/troubleshooting.rst
+++ b/docs/troubleshooting.rst
@@ -1,75 +1,51 @@
-.. _troubleshooting:
-
+***************
 Troubleshooting
-===============
+***************

-.. _troubleshooting-languagemissing:
+No files are added by the consumer
+##################################

-Consumer warns ``OCR for XX failed``
------------------------------------
+Check for the following issues:

-If you find the OCR accuracy to be too low, and/or the document consumer warns
-that ``OCR for XX failed, but we're going to stick with what we've got since
-FORGIVING_OCR is enabled``, then you might need to install the
-`Tesseract language files <http://packages.ubuntu.com/search?keywords=tesseract-ocr>`_
-marching your document's languages.
+*   Ensure that the directory you're putting your documents in is the folder
+    paperless is watching. With docker, this setting is performed in the
+    ``docker-compose.yml`` file. Without docker, look at the ``CONSUMPTION_DIR``
+    setting. Don't adjust this setting if you're using docker.
+*   Ensure that redis is up and running. Paperless does its task processing
+    asynchronously, and for documents to arrive at the task processor, it needs
+    redis to run.
+*   Ensure that the task processor is running. Docker does this automatically.
+    Manually invoke the task processor by executing

-As an example, if you are running Paperless from any Ubuntu or Debian
-box, and your documents are written in Spanish you may need to run::
+    .. code:: shell-session

-    apt-get install -y tesseract-ocr-spa
+        $ python3 manage.py qcluster
+
+*   Look at the output of paperless and inspect it for any errors.
+*   Go to the admin interface, and check if there are failed tasks. If so, the
+    tasks will contain an error message.
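
The first checks in this list can be scripted. The following is a minimal sketch, not part of paperless: the function name ``check_consumer_prerequisites`` is hypothetical, and the redis host and port (``localhost``, ``6379``) are assumed defaults that may differ in your setup. It only verifies that the directory exists and is accessible, and that something accepts TCP connections on the redis port.

```python
import os
import socket


def check_consumer_prerequisites(consumption_dir, redis_host="localhost",
                                 redis_port=6379, timeout=1.0):
    """Return a list of human-readable problems; empty if none were found.

    Hypothetical helper for illustration only, not a paperless API.
    """
    problems = []

    # The consumer can only pick up files from an existing directory.
    if not os.path.isdir(consumption_dir):
        problems.append(f"{consumption_dir} is not a directory")
    elif not os.access(consumption_dir, os.R_OK | os.W_OK):
        problems.append(f"{consumption_dir} is not readable and writable")

    # A plain TCP connect only shows that something listens on the port;
    # it does not prove that the listener is a healthy redis instance.
    try:
        with socket.create_connection((redis_host, redis_port), timeout=timeout):
            pass
    except OSError as exc:
        problems.append(f"cannot reach redis at {redis_host}:{redis_port}: {exc}")

    return problems
```

An empty result does not guarantee that consumption works; it only rules out the two most basic causes before you inspect the logs and the failed tasks.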


-.. _troubleshooting-convertpixelcache:
+Consumer fails to pick up any new files
+#######################################

-Consumer dies with ``convert: unable to extent pixel cache``
------------------------------------------------------------
+If you notice that the consumer will only pick up files in the consumption
+directory at startup, but won't find any other files added later, check out
+the configuration file and enable filesystem polling with the setting
+``PAPERLESS_CONSUMER_POLLING``.
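
With polling, the directory is listed again at a fixed interval instead of waiting for change notifications from the filesystem, which may not be delivered in some setups (network shares are a common case). The following is a conceptual sketch of a single polling pass, not the consumer's actual implementation; ``find_new_files`` is a hypothetical name.

```python
import os


def find_new_files(directory, seen):
    """Return files in `directory` that are not yet in the set `seen`.

    Hypothetical helper: calling it repeatedly (for example every few
    seconds) is what "polling" means, as opposed to waiting for the
    operating system to report a change.
    """
    current = {
        entry.path
        for entry in os.scandir(directory)
        if entry.is_file()
    }
    new_files = sorted(current - seen)
    seen.update(new_files)
    return new_files
```

The trade-off is latency and some extra disk activity: a new file is noticed at the next pass at the latest, not immediately.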

-During the consumption process, Paperless invokes ImageMagick's ``convert``
-program to translate the source document into something that the OCR engine can
-understand and this can burn a Very Large amount of memory if the original
-document is rather long.  Similarly, if your system doesn't have a lot of
-memory to begin with (ie. a Raspberry Pi), then this can happen for even
-medium-sized documents.
+Operation not permitted
+#######################

-The solution is to tell ImageMagick *not* to Use All The RAM, as is its
-default, and instead tell it to used a fixed amount.  ``convert`` will then
-break up the job into hundreds of individual files and use them to slowly
-compile the finished image.  Simply set ``PAPERLESS_CONVERT_MEMORY_LIMIT`` in
-``/etc/paperless.conf`` to something like ``32000000`` and you'll limit
-``convert`` to 32MB.  Fiddle with this value as you like.
+You might see errors such as:

-**HOWEVER**: Simply setting this value may not be enough on system where
-``/tmp`` is mounted as tmpfs, as this is where ``convert`` will write its
-temporary files.  In these cases (most Systemd machines), you need to tell
-ImageMagick to use a different space for its scratch work.  You do this by
-setting ``PAPERLESS_CONVERT_TMPDIR`` in ``/etc/paperless.conf`` to somewhere
-that's actually on a physical disk (and writable by the user running
-Paperless), like ``/var/tmp/paperless`` or ``/home/my_user/tmp`` in a pinch.
+.. code::

+    chown: changing ownership of '../export': Operation not permitted

-.. _troubleshooting-decompressionbombwarning:
+The container tries to set file ownership on the listed directories. This is
+required so that the user running paperless inside docker has write permissions
+to these folders. The error occurs when the ownership cannot be changed, for
+example when these directories point to NFS shares.

-DecompressionBombWarning and/or no text in the OCR output
---------------------------------------------------------
-Some users have had issues using Paperless to consume PDFs that were created
-by merging Very Large Scanned Images into one PDF.  If this happens to you,
-it's likely because the PDF you've created contains some very large pages
-(millions of pixels) and the process of converting the PDF to a OCR-friendly
-image is exploding.
-
-Typically, this happens because the scanned images are created with a high
-DPI and then rolled into the PDF with an assumed DPI of 72 (the default).
-The best solution then is to specify the DPI used in the scan in the
-conversion-to-PDF step.  So for example, if you scanned the original image
-with a DPI of 300, then merging the images into the single PDF with
-``convert`` should look like this:
-
-.. code:: bash
-
-    $ convert -density 300 *.jpg finished.pdf
-
-For more information on this and situations like it, you should take a look
-at `Issue #118`_ as that's where this tip originated.
-
-.. _Issue #118: https://github.com/the-paperless-project/paperless/issues/118
+Ensure that ``chown`` is possible on these directories.
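
A rough way to test this from Python on a Unix system is sketched below. It is an illustration only: ``can_chown`` is a hypothetical helper, and the result is only meaningful when it runs as the same user that performs the ``chown`` (inside the container, this is typically root).

```python
import os


def can_chown(path):
    """Return True if the current user may set ownership on `path`.

    Illustrative only: re-applies the owner and group the path already
    has, so a successful call changes nothing.
    """
    info = os.stat(path)
    try:
        os.chown(path, info.st_uid, info.st_gid)
    except PermissionError:
        return False
    return True
```

If this returns ``False`` for one of the directories, the ownership change attempted by the container will most likely fail there as well.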
--- a/docs/usage_overview.rst
+++ b/docs/usage_overview.rst
@@ -0,0 +1,403 @@
+**************
+Usage Overview
+**************
+
+Paperless is an application that manages your personal documents. With
+the help of a document scanner (see :ref:`scanners`), paperless transforms
+your unwieldy physical document binders into a searchable archive and
+provides many utilities for finding and managing your documents.
+
+
+Terms and definitions
+#####################
+
+Paperless essentially consists of two different parts for managing your
+documents:
+
+* The *consumer* watches a specified folder and adds all documents in that
+  folder to paperless.
+* The *web server* provides a UI that you use to manage and search for your
+  scanned documents.
+
+Each document has a couple of fields that you can assign to it:
+
+* A *Document* is a piece of paper that sometimes contains valuable
+  information.
+* The *correspondent* of a document is the person, institution or company that
+  a document either originates from, or is sent to.
+* A *tag* is a label that you can assign to documents. Think of tags as more
+  powerful folders: Multiple documents can be grouped together with a single
+  tag, however, a single document can also have multiple tags. This is not
+  possible with folders. The reason folders are not implemented in paperless
+  is simply that tags are much more versatile than folders.
+* A *document type* is used to demarcate the type of a document such as letter,
+  bank statement, invoice, contract, etc. It is used to identify what a document
+  is about.
+* The *date added* of a document is the date the document was scanned into
+  paperless. You cannot and should not change this date.
+* The *date created* of a document is the date the document was initially issued.
+  This can be the date you bought a product, the date you signed a contract, or
+  the date a letter was sent to you.
+* The *archive serial number* (short: ASN) of a document is the identifier of
+  the document in your physical document binders. See
+  :ref:`usage-recommended_workflow` below.
+* The *content* of a document is the text that was OCR'ed from the document.
+  This text is fed into the search engine and is used for matching tags,
+  correspondents and document types.
+
+
+Frontend overview
+#################
+
+.. warning::
+
+    TBD. Add some fancy screenshots!
+
+Adding documents to paperless
+#############################
+
+Once you've got Paperless set up, you need to start feeding documents into it.
+When adding documents to paperless, it will perform the following operations on
+your documents:
+
+1.  OCR the document, if it has no text. Digital documents usually have text,
+    and this step will be skipped for those documents.
+2.  Paperless will create an archivable PDF/A document from your document.
+    If this document is coming from your scanner, it will have embedded selectable text.
+3.  Paperless performs automatic matching of tags, correspondents and types on the
+    document before storing it in the database.
+
+.. hint::
+
+    This process can be configured to fit your needs. If you don't want paperless
+    to create archived versions for digital documents, you can configure that by
+    setting ``PAPERLESS_OCR_MODE=skip_noarchive``. Please read the
+    :ref:`relevant section in the documentation <configuration-ocr>`.
+
+.. note::
+
+    No matter which options you choose, Paperless will always store the original
+    document that it found in the consumption directory or in the mail and
+    will never overwrite that document. Archived versions are stored alongside the
+    original versions.
+
+
+The consumption directory
+=========================
+
+The primary method of getting documents into your database is by putting them in
+the consumption directory.  The consumer runs in an infinite
+loop looking for new additions to this directory and when it finds them, it goes
+about the process of parsing them with the OCR, indexing what it finds, and storing
+it in the media directory.
+
+Getting stuff into this directory is up to you.  If you're running Paperless
+on your local computer, you might just want to drag and drop files there, but if
+you're running this on a server and want your scanner to automatically push
+files to this directory, you'll need to set up some sort of service to accept the
+files from the scanner.  Typically, you're looking at an FTP server like
+`Proftpd`_ or a Windows folder share with `Samba`_.
+
+.. _Proftpd: http://www.proftpd.org/
+.. _Samba: http://www.samba.org/
+
+.. TODO: hyperref to configuration of the location of this magic folder.
+
+Dashboard upload
+================
+
+The dashboard has a file drop field to upload documents to paperless. Simply drag a file
+onto this field or select a file with the file dialog. Multiple files are supported.
+
+
+Mobile upload
+=============
+
+The mobile app over at `<https://github.com/qcasey/paperless_share>`_ allows Android users
+to share any documents with paperless. This can be combined with any of the mobile
+scanning apps out there, such as Office Lens.
+
+Furthermore, there is the `Paperless App <https://github.com/bauerj/paperless_app>`_ as well,
+which not only has document upload, but also document editing and browsing.
+
+.. _usage-email:
+
+IMAP (Email)
+============
+
+You can tell paperless-ng to consume documents from your email accounts.
+This is a very flexible and powerful feature if you regularly receive documents
+via mail that you need to archive. The mail consumer can be configured by using the
+admin interface in the following manner:
+
+1.  Define e-mail accounts.
+2.  Define mail rules for your account.
+
+These rules perform the following:
+
+1.  Connect to the mail server.
+2.  Fetch all matching mails (as defined by folder, maximum age and the filters).
+3.  Check if there are any consumable attachments.
+4.  If so, instruct paperless to consume the attachments and optionally
+    use the metadata provided in the rule for the new document.
+5.  If documents were consumed from a mail, the rule action is performed
+    on that mail.
+
+Paperless will completely ignore mails that do not match your filters. It will also
+only perform the action on mails that it has consumed documents from.
+
+The actions all ensure that the same mail is not consumed twice by different means.
+These are as follows:
+
+*   **Delete:** Immediately deletes mail that paperless has consumed documents from.
+    Use with caution.
+*   **Mark as read:** Mark consumed mail as read. Paperless will not consume documents
+    from already read mails. If you read a mail before paperless sees it, it will be
+    ignored.
+*   **Flag:** Sets the 'important' flag on mails with consumed documents. Paperless
+    will not consume flagged mails.
+*   **Move to folder:** Moves consumed mails out of the way so that paperless won't
+    consume them again.
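
The interplay of filters, consumption and the rule action can be sketched in a few lines. This is a simplified model for illustration, not the mail consumer's real code: the ``Mail`` class, the ``process_rule`` function, the subject-only filter and the PDF-only attachment check are all assumptions made for the example, and the action shown is "mark as read".

```python
from dataclasses import dataclass, field


@dataclass
class Mail:
    # Hypothetical, simplified stand-in for a message on the mail server.
    subject: str
    attachments: list = field(default_factory=list)
    read: bool = False


def process_rule(mails, subject_filter, consume):
    """Apply one rule with the "mark as read" action to `mails`.

    `consume` is a callable that receives one attachment name.
    Returns the number of consumed attachments.
    """
    consumed = 0
    for mail in mails:
        # Mails that are already read or do not match the filter are ignored.
        if mail.read or subject_filter not in mail.subject:
            continue
        # Only PDF attachments count as consumable in this sketch.
        documents = [a for a in mail.attachments if a.lower().endswith(".pdf")]
        if not documents:
            continue  # nothing consumed, so the action is not performed
        for document in documents:
            consume(document)
            consumed += 1
        mail.read = True  # rule action: mark as read
    return consumed
```

Running the same rule twice over the same mails consumes nothing the second time, which is the property every action is meant to provide.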
+
+.. caution::
+
+    The mail consumer will perform these actions on all mails it has consumed
+    documents from. Keep in mind that the actual consumption process may fail
+    for some reason, leaving you with missing documents in paperless.
+
+.. note::
+
+    With the correct set of rules, you can completely automate your email documents.
+    Create rules for every correspondent you receive digital documents from and
+    paperless will read them automatically. The default action "mark as read" is
+    pretty tame and will not cause any damage or data loss whatsoever.
+
+    You can also set up a special folder in your mail account for paperless and use
+    your favorite mail client to move to-be-consumed mails into that folder
+    automatically or manually and tell paperless to move them to yet another folder
+    after consumption. It's up to you.
+
+.. note::
+
+    Paperless will process the rules in the order defined in the admin page.
+
+    You can define catch-all rules and have them executed last to consume
+    any documents not matched by previous rules. Such a rule may assign an "Unknown
+    mail document" tag to consumed documents so you can inspect them further.
+
+Paperless is set up to check your mails every 10 minutes. This can be configured on the
+'Scheduled tasks' page in the admin.
+
+
+REST API
+========
+
+You can also submit a document using the REST API, see :ref:`api-file_uploads` for details.
+
+.. _basic-searching:
+
+
+Best practices
+##############
+
+Paperless offers a couple tools that help you organize your document collection. However,
+it is up to you to use them in a way that helps you organize documents and find specific
+documents when you need them. This section offers a couple ideas for managing your collection.
+
+Document types allow you to classify documents according to what they are. You can define
+types such as "Receipt", "Invoice", or "Contract". If you used to collect all your receipts
+in a single binder, you can recreate that system in paperless by defining a document type,
+assigning documents to that type and then filtering by that type to only see all receipts.
+
+Not all documents need document types. Sometimes it's hard to determine what the type of a
+document is or it is hard to justify creating a document type that you only need once or twice.
+This is okay. As long as the types you define help you organize your collection in the way
+you want, paperless is doing its job.
+
+Tags can be used in many different ways. Think of tags as more versatile folders or binders.
+If you have a binder for documents related to university, your car or health care, you can
+create these binders in paperless by creating tags and assigning them to relevant documents.
+Just as with document types, you can filter the document list by tags and only see documents of
+a certain topic.
+
+With physical documents, you'll often need to decide which folder the document belongs to.
+The advantage of tags over folders and binders is that a single document can have multiple
+tags. A physical document cannot magically appear in two different folders, but with tags,
+this is entirely possible.
+
+.. hint::
+
+  This can be used in many different ways. One example: Imagine you're working on a particular
+  task, such as signing up for university. Usually you'll need to collect a bunch of different
+  documents that are already sorted into various folders. With the tag system of paperless,
+  you can create a new group of documents that are relevant to this task without destroying
+  the already existing organization. When you're done with the task, you could delete the
+  tag again, which would be equal to sorting documents back into the folder they belong into.
+  Or keep the tag, up to you.
+
+All of the logic above applies to correspondents as well. Attach them to documents if you
+feel that they help you organize your collection.
+
+When you've started organizing your documents, create a couple saved views for document collections
+you regularly access. This is equal to having labeled physical binders on your desk, except
+that these saved views are dynamic and simply update themselves as you add documents to the system.
+
+Here are a couple examples of tags and types that you could use in your collection.
+
+* An ``inbox`` tag for newly added documents that you haven't manually edited yet.
+* A tag ``car`` for everything car related (repairs, registration, insurance, etc)
+* A tag ``todo`` for documents that you still need to do something with, such as reply, or
+  perform some task online.
+* A tag ``bank account x`` for all bank statements related to that account.
+* A tag ``mail`` for anything that you added to paperless via its mail processing capabilities.
+* A tag ``missing_metadata`` when you still need to add some metadata to a document, but can't
+  or don't want to do this right now.
+
+Searching
+#########
+
+Paperless offers an extensive searching mechanism that is designed to allow you to quickly
+find a document you're looking for (for example, that thing that just broke and you bought
+a couple months ago, that contract you signed 8 years ago).
+
+When you search paperless for a document, it tries to match this query against your documents.
+Paperless will look for matching documents by inspecting their content, title, correspondent,
+type and tags. Paperless returns a scored list of results, so that documents matching your query
+better will appear further up in the search results.
+
+By default, paperless returns only documents which contain all words typed in the search bar.
+However, paperless also offers advanced search syntax if you want to drill down the results
+further.
+
+Matching documents with logical expressions:
+
+.. code::
+
+  shopname AND (product1 OR product2)
+
+Matching specific tags, correspondents or types:
+
+.. code::
+
+  type:invoice tag:unpaid
+  correspondent:university certificate
+
+Matching dates:
+
+.. code::
+  
+  created:[2005 to 2009]
+  added:yesterday
+  modified:today
+
+Matching inexact words:
+
+.. code::
+
+  produ*name
+
+.. note::
+
+  Inexact terms are hard for search indexes. These queries might take a while to execute. That's why paperless offers
+  autocomplete and query correction.
+
+All of these constructs can be combined as you see fit.
+Paperless uses Whoosh's default query language. To learn more about it,
+head over to `Whoosh query language <https://whoosh.readthedocs.io/en/latest/querylang.html>`_.
+For details on what date parsing utilities are available, see
+`Date parsing <https://whoosh.readthedocs.io/en/latest/dates.html#parsing-date-queries>`_.
+ 
+
+.. _usage-recommended_workflow:
+
+The recommended workflow
+########################
+
+Once you have familiarized yourself with paperless and are ready to use it
+for all your documents, the recommended workflow for managing your documents
+is as follows. This workflow also takes into account that some documents
+have to be kept in physical form, but still ensures that you get all the
+advantages for these documents as well.
+
+The following diagram shows how easy it is to manage your documents.
+
+.. image:: _static/recommended_workflow.png
+
+Preparations in paperless
+=========================
+
+* Create an inbox tag that gets assigned to all new documents.
+* Create a TODO tag.
+
+Processing of the physical documents
+====================================
+
+Keep a physical inbox. Whenever you receive a document that you need to
+archive, put it into your inbox. Regularly, do the following for all documents
+in your inbox:
+
+1.  For each document, decide if you need to keep the document in physical
+    form. This applies to certain important documents, such as contracts and
+    certificates.
+2.  If you need to keep the document, write a running number on the document
+    before scanning, starting at one and counting upwards. This is the archive
+    serial number, or ASN in short.
+3.  Scan the document.
+4.  If the document has an ASN assigned, store it in a *single* binder, sorted
+    by ASN. Don't order this binder in any other way.
+5.  If the document has no ASN, throw it away. Yay!
+
+Over time, you will notice that your physical binder will fill up. If it is
+full, label the binder with the range of ASNs in this binder (e.g., "Documents
+1 to 343"), store the binder in your cellar or elsewhere, and start a new
+binder.
+
+The idea behind this process is that you will never have to use the physical
+binders to find a document. If you need a specific physical document, you
+may find this document by:
+
+1.  Searching in paperless for the document.
+2.  Identifying the ASN of the document, since it appears on the scan.
+3.  Grabbing the relevant document binder and getting the document. This is easy since
+    they are sorted by ASN.
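
Because each binder covers a contiguous range of ASNs, finding the right binder is a simple range lookup. The sketch below illustrates the idea; ``find_binder`` is a hypothetical helper and the binder boundaries in the docstring (apart from 343) are made-up numbers.

```python
import bisect


def find_binder(asn, binder_starts):
    """Return the index of the binder that holds `asn`.

    `binder_starts` is a sorted list with the first ASN of each binder,
    for example [1, 344, 702] if the first binder holds documents 1 to 343.
    """
    if not binder_starts or asn < binder_starts[0]:
        raise ValueError(f"ASN {asn} is not in any binder")
    # bisect_right counts the binders that start at or before `asn`;
    # the last of them is the one that contains the document.
    return bisect.bisect_right(binder_starts, asn) - 1
```

In practice you do the same thing by reading the labels on the binders, which is why the binders must never be sorted in any other way.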
+
+Processing of documents in paperless
+====================================
+
+Once you have scanned in a document, proceed in paperless as follows.
+
+1.  If the document has an ASN, assign the ASN to the document.
+2.  Assign a correspondent to the document (e.g., your employer or bank).
+    This isn't strictly necessary but helps in finding a document when you need
+    it.
+3.  Assign a document type (e.g., invoice or bank statement) to the document.
+    This isn't strictly necessary but helps in finding a document when you need
+    it.
+4.  Assign a proper title to the document (the name of an item you bought, the
+    subject of the letter, etc.).
+5.  Check that the date of the document is correct. Paperless tries to read
+    the date from the content of the document, but this fails sometimes if the
+    OCR is bad or multiple dates appear on the document.
+6.  Remove inbox tags from the documents.
+
+.. hint::
+    
+    You can set up manual matching rules for your correspondents and tags and
+    paperless will assign them automatically. After consuming a couple documents,
+    you can even ask paperless to *learn* when to assign tags and correspondents
+    by itself. For details on this feature, see :ref:`advanced-matching`.
+
+Task management
+===============
+
+Some documents require attention and require you to act on the document. You
+may take two different approaches to handle these documents based on how
+regularly you intend to use paperless and scan documents.
+
+* If you scan and process your documents in paperless regularly, assign a
+  TODO tag to all scanned documents that you need to process. Create a saved
+  view on the dashboard that shows all documents with this tag.
+* If you do not scan documents regularly and use paperless solely for archiving,
+  create a physical todo box next to your physical inbox and put documents you
+  need to process in the TODO box. When you have performed the task associated with
+  the document, move it to the inbox.
--- a/docs/utilities.rst
+++ b/docs/utilities.rst
@@ -1,284 +0,0 @@
-.. _utilities:
-
-Utilities
-=========
-
-There's basically three utilities to Paperless: the webserver, consumer, and
-if needed, the exporter.  They're all detailed here.
-
-
-.. _utilities-webserver:
-
-The Webserver
-------------
-
-At the heart of it, Paperless is a simple Django webservice, and the entire
-interface is based on Django's standard admin interface.  Once running, visiting
-the URL for your service delivers the admin, through which you can get a
-detailed listing of all available documents, search for specific files, and
-download whatever it is you're looking for.
-
-
-.. _utilities-webserver-howto:
-
-How to Use It
-.............
-
-The webserver is started via the ``manage.py`` script:
-
-.. code-block:: shell-session
-
-    $ /path/to/paperless/src/manage.py runserver
-
-By default, the server runs on localhost, port 8000, but you can change this
-with a few arguments, run ``manage.py --help`` for more information.
-
-Add the option ``--noreload`` to reduce resource usage. Otherwise, the server
-continuously polls all source files for changes to auto-reload them.
-
-Note that when exiting this command your webserver will disappear.
-If you want to run this full-time (which is kind of the point)
-you'll need to have it start in the background -- something you'll need to
-figure out for your own system.  To get you started though, there are Systemd
-service files in the ``scripts`` directory.
-
-
-.. _utilities-consumer:
-
-The Consumer
------------
-
-The consumer script runs in an infinite loop, constantly looking at a directory
-for documents to parse and index.  The process is pretty straightforward:
-
-1. Look in ``CONSUMPTION_DIR`` for a document.  If one is found, go to #2.
-   If not, wait 10 seconds and try again.  On Linux, new documents are detected
-   instantly via inotify, so there's no waiting involved.
-2. Parse the document with Tesseract
-3. Create a new record in the database with the OCR'd text
-4. Attempt to automatically assign document attributes by doing some guesswork.
-   Read up on the :ref:`guesswork documentation<guesswork>` for more
-   information about this process.
-5. Encrypt the document (if you have a passphrase set) and store it in the
-   ``media`` directory under ``documents/originals``.
-6. Go to #1.
-
-
-.. _utilities-consumer-howto:
-
-How to Use It
-.............
-
-The consumer is started via the ``manage.py`` script:
-
-.. code-block:: shell-session
-
-    $ /path/to/paperless/src/manage.py document_consumer
-
-This starts the service that will consume documents as they appear in
-``CONSUMPTION_DIR``.
-
-Note that this command runs continuously, so exiting it will mean your webserver
-disappears.  If you want to run this full-time (which is kind of the point)
-you'll need to have it start in the background -- something you'll need to
-figure out for your own system.  To get you started though, there are Systemd
-service files in the ``scripts`` directory.
-
-Some command line arguments are available to customize the behavior of the
-consumer. By default it will use ``/etc/paperless.conf`` values. Display the
-help with:
-
-.. code-block:: shell-session
-
-    $ /path/to/paperless/src/manage.py document_consumer --help
-
-.. _utilities-exporter:
-
-The Exporter
------------
-
-Tired of fiddling with Paperless, or just want to do something stupid and are
-afraid of accidentally damaging your files?  You can export all of your
-documents into neatly named, dated, and unencrypted files.
-
-
-.. _utilities-exporter-howto:
-
-How to Use It
-.............
-
-This too is done via the ``manage.py`` script:
-
-.. code-block:: shell-session
-
-    $ /path/to/paperless/src/manage.py document_exporter /path/to/somewhere/
-
-This will dump unencrypted copies of all of your documents into
-``/path/to/somewhere`` for you to do with as you please.  The files are
-accompanied by a special file, ``manifest.json``, which can be used to
-:ref:`import the files <utilities-importer>` at a later date if you wish.
-
-
-.. _utilities-exporter-howto-docker:
-
-Docker
-______
-
-If you are :ref:`using Docker <setup-installation-docker>`, running the
-exporter is almost as easy.  To mount a volume for exports, follow the
-instructions in the ``docker-compose.yml.example`` file for the ``/export``
-volume (making the changes in your own ``docker-compose.yml`` file, of course).
-Once you have the volume mounted, the command to run an export is:
-
-.. code-block:: shell-session
-
-   $ docker-compose run --rm consumer document_exporter /export
-
-If you prefer to use ``docker run`` directly, supply the necessary command-line
-options yourself:
-
-.. code-block:: shell-session
-
-   $ # Identify your containers
-   $ docker-compose ps
-           Name                       Command                State     Ports
-   -------------------------------------------------------------------------
-   paperless_consumer_1    /sbin/docker-entrypoint.sh ...   Exit 0
-   paperless_webserver_1   /sbin/docker-entrypoint.sh ...   Exit 0
-
-   $ # Make sure to replace your passphrase and remove or adapt the id mapping
-   $ docker run --rm \
-       --volumes-from paperless_data_1 \
-       --volume /path/to/arbitrary/place:/export \
-       -e PAPERLESS_PASSPHRASE=YOUR_PASSPHRASE \
-       -e USERMAP_UID=1000 -e USERMAP_GID=1000 \
-       paperless document_exporter /export
-
-
-.. _utilities-importer:
-
-The Importer
------------
-
-Looking to transfer Paperless data from one instance to another, or just want
-to restore from a backup?  This is your go-to toy.
-
-
-.. _utilities-importer-howto:
-
-How to Use It
-.............
-
-The importer works just like the exporter.  You point it at a directory, and
-the script does the rest of the work:
-
-.. code-block:: shell-session
-
-    $ /path/to/paperless/src/manage.py document_importer /path/to/somewhere/
-
-Docker
-______
-
-Assuming that you've already gone through the steps above in the
-:ref:`export <utilities-exporter-howto-docker>` section, then the easiest thing
-to do is just re-use the ``/export`` path you already set up:
-
-.. code-block:: shell-session
-
-   $ docker-compose run --rm consumer document_importer /export
-
-Similarly, if you're not using docker-compose, you can adjust the export
-instructions above to do the import.
-
-
-.. _utilities-retagger:
-
-Re-running your tagging and correspondent matchers
--------------------------------------------------
-
-Say you've imported a few hundred documents and now want to introduce
-a tag or set up a new correspondent, and apply its matching to all of
-the currently-imported docs.  This problem is common enough that
-there are tools for it.
-
-
-.. _utilities-retagger-howto:
-
-How to Do It
-............
-
-This too is done via the ``manage.py`` script:
-
-.. code:: bash
-
-    $ /path/to/paperless/src/manage.py document_retagger
-
-Run this after changing or adding tagging rules.  It'll loop over all
-of the documents in your database and attempt to match all of your
-tags to them.  If one matches, it'll be applied.  And don't worry, you
-can run this as often as you like; it won't double-tag a document.
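-
-Repeated runs are harmless when tags are treated as a set.  A minimal sketch,
-assuming a simple case-insensitive substring match (the function ``retag`` and
-the rule format are hypothetical, not Paperless's own matching code):
-
-.. code-block:: python
-
-    def retag(text, current_tags, rules):
-        """Return current_tags plus every tag whose pattern occurs in text.
-
-        rules maps a tag name to the pattern that triggers it.
-        """
-        text = text.lower()
-        matched = {tag for tag, pattern in rules.items() if pattern.lower() in text}
-        return set(current_tags) | matched
-
-    rules = {"bank": "account statement", "tax": "tax office"}
-    tags = retag("Your account statement for May", set(), rules)
-    # Running it again on the same document adds nothing new.
-    tags = retag("Your account statement for May", tags, rules)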
-
-.. code:: bash
-
-    $ /path/to/paperless/src/manage.py document_correspondents
-
-This is the equivalent command to run after adding or changing a correspondent.
-
-.. _utilities-encyption:
-
-Enabling Encryption
--------------------
-
-Let's say you've imported a few documents to play around with paperless and now
-you are using it more seriously and want to enable encryption of your files.
-
-.. _utilities-encryption-howto:
-
-Basic Syntax
-.............
-
-Again we'll use the ``manage.py`` script, passing ``change_storage_type``:
-
-.. code:: console
-
-    $ /path/to/paperless/src/manage.py change_storage_type --help
-    usage: manage.py change_storage_type [-h] [--version] [-v {0,1,2,3}]
-                                     [--settings SETTINGS]
-                                     [--pythonpath PYTHONPATH] [--traceback]
-                                     [--no-color] [--passphrase PASSPHRASE]
-                                     {gpg,unencrypted} {gpg,unencrypted}
-
-    This is how you migrate your stored documents from an encrypted state to an
-    unencrypted one (or vice-versa)
-
-    positional arguments:
-      {gpg,unencrypted}     The state you want to change your documents from
-      {gpg,unencrypted}     The state you want to change your documents to
-
-    optional arguments:
-      --passphrase PASSPHRASE
-                            If PAPERLESS_PASSPHRASE isn't set already, you need to
-                            specify it here
-
-Enabling Encryption
-...................
-
-Basic usage to enable encryption of your document store (**USE A MORE SECURE PASSPHRASE**):
-
-(Note: If ``PAPERLESS_PASSPHRASE`` isn't set already, you need to specify it here)
-
-.. code:: bash
-
-    $ /path/to/paperless/src/manage.py change_storage_type [--passphrase 'SECR3TP4SSPHRA$E'] unencrypted gpg
-
-
-Disabling Encryption
-....................
-
-Basic usage to disable encryption of your document store:
-
-(Note: Again, if ``PAPERLESS_PASSPHRASE`` isn't set already, you need to specify it here)
-
-.. code:: bash
-
-    $ /path/to/paperless/src/manage.py change_storage_type [--passphrase 'SECR3TP4SSPHRA$E'] gpg unencrypted