reworking the documentation.

2025-12-24 02:05:48 -06:00 · 2020-11-13 18:46:19 +01:00
parent 04335e4aac
commit f2dbb74d44
21 changed files with 1042 additions and 1427 deletions
--- a/docs/_static/Screenshot_first_logged.png
+++ b/docs/_static/Screenshot_first_logged.png
--- a/docs/_static/Screenshot_first_run_login.png
+++ b/docs/_static/Screenshot_first_run_login.png
--- a/docs/_static/Screenshot_upload_and_scanned.png
+++ b/docs/_static/Screenshot_upload_and_scanned.png
--- a/docs/administration.rst
+++ b/docs/administration.rst
@@ -0,0 +1,354 @@
 **************
 Administration
 **************
 Making backups
 ##############
 .. warning::
    This section is not updated yet.
 So you're bored of this whole project, or you want to make a remote backup of
 your files for whatever reason.  This is easy to do, simply use the
 :ref:`exporter <utilities-exporter>` to dump your documents and database out
 into an arbitrary directory.
 .. _migrating-restoring:
 Restoring
 =========
 Restoring your data is just as easy, since nearly all of your data exists either
 in the file names, or in the contents of the files themselves.  You just need to
 create an empty database (just follow the
 :ref:`installation instructions <setup-installation>` again) and then import the
 ``tags.json`` file you created as part of your backup.  Lastly, copy your
 exported documents into the consumption directory and start up the consumer.
 .. code-block:: shell-session
    $ cd /path/to/project
    $ rm data/db.sqlite3  # Delete the database
    $ cd src
    $ ./manage.py migrate  # Create the database
    $ ./manage.py createsuperuser
    $ ./manage.py loaddata /path/to/arbitrary/place/tags.json
    $ cp /path/to/exported/docs/* /path/to/consumption/dir/
    $ ./manage.py document_consumer
 Importing your data if you are :ref:`using Docker <setup-installation-docker>`
 is almost as simple:
 .. code-block:: shell-session
    # Stop and remove your current containers
    $ docker-compose stop
    $ docker-compose rm -f
    # Recreate them, add the superuser
    $ docker-compose up -d
    $ docker-compose run --rm webserver createsuperuser
    # Load the tags
    $ cat /path/to/arbitrary/place/tags.json | docker-compose run --rm webserver loaddata_stdin -
    # Load your exported documents into the consumption directory
    # (How you do this highly depends on how you have set this up)
    $ cp /path/to/exported/docs/* /path/to/mounted/consumption/dir/
 After loading the documents into the consumption directory the consumer will
 immediately start consuming the documents.
 .. _administration-updating:
 Updating paperless
 ##################
 .. warning::
    This section is not updated yet.
 For the most part, all you have to do to update Paperless is run ``git pull``
 on the directory containing the project files, and then use Django's
 ``migrate`` command to execute any database schema updates that might have been
 rolled in as part of the update:
 .. code-block:: shell-session
    $ cd /path/to/project
    $ git pull
    $ pip install -r requirements.txt
    $ cd src
    $ ./manage.py migrate
 Note that it's possible (even likely) that while ``git pull`` may update some
 files, the ``migrate`` step may not update anything.  This is totally normal.
 Additionally, as new features are added, the ability to control those features
 is typically added by way of an environment variable set in ``paperless.conf``.
 You may want to take a look at the ``paperless.conf.example`` file to see if
 there's anything new in there compared to what you've got in ``/etc``.
 If you are :ref:`using Docker <setup-installation-docker>` the update process
 is similar:
 .. code-block:: shell-session
    $ cd /path/to/project
    $ git pull
    $ docker build -t paperless .
    $ docker-compose run --rm consumer migrate
    $ docker-compose up -d
 If ``git pull`` doesn't report any changes, there is no need to continue with
 the remaining steps.
 This depends on the route you've chosen to run paperless.
    a.  If you are not using docker, update python requirements. Paperless uses
        `Pipenv`_ for managing dependencies:
        .. code:: bash
            $ pip install --upgrade pipenv
            $ cd /path/to/paperless
            $ pipenv install
        This creates a new virtual environment (or uses your existing environment)
        and installs all dependencies into it. Running commands inside the environment
        is done via
        .. code:: bash
            $ cd /path/to/paperless/src
            $ pipenv run python3 manage.py my_command
        You will also need to build the frontend each time a new update is pushed.
        See updating paperless for more information. TODO REFERENCE
    b.  If you are using docker, build the docker image.
        .. code:: bash
            $ docker build -t jonaswinkler/paperless-ng:latest .
        Copy either docker-compose.yml.example or docker-compose.yml.sqlite.example
        to docker-compose.yml and adjust the consumption directory.
 Management utilities
 ####################
 Paperless comes with some management commands that perform various maintenance
 tasks on your paperless instance. You can invoke these commands either by
 .. code:: bash
    $ cd /path/to/paperless
    $ docker-compose run --rm webserver <command> <arguments>
 or
 .. code:: bash
    $ cd /path/to/paperless/src
    $ pipenv run python manage.py <command> <arguments>
 depending on whether you use docker or not.
 All commands have built-in help, which can be accessed by executing them with
 the argument ``--help``.
 Document exporter
 =================
 The document exporter exports all your data from paperless into a folder for
 backup or migration to another DMS.
 .. code::
    document_exporter target
 ``target`` is a folder to which the data gets written. This includes documents,
 thumbnails and a ``manifest.json`` file. The manifest contains all metadata from
 the database (correspondents, tags, etc).
 When you use the provided docker compose script, specify ``../export`` as the
 target. This path inside the container is automatically mounted on your host on
 the folder ``export``.
 .. _utilities-importer:
 Document importer
 =================
 The document importer takes the export produced by the `Document exporter`_ and
 imports it into paperless.
 The importer works just like the exporter.  You point it at a directory, and
 the script does the rest of the work:
 .. code::
    document_importer source
 When you use the provided docker compose script, put the export inside the
 ``export`` folder in your paperless source directory. Specify ``../export``
 as the ``source``.
 .. _utilities-retagger:
 Document retagger
 =================
 Say you've imported a few hundred documents and now want to introduce
 a tag or set up a new correspondent, and apply its matching to all of
 the currently-imported docs. This problem is common enough that
 there are tools for it.
 .. code::
    document_retagger [-h] [-c] [-T] [-t] [-i] [--use-first] [-f]
    optional arguments:
    -c, --correspondent
    -T, --tags
    -t, --document_type
    -i, --inbox-only
    --use-first
    -f, --overwrite
 Run this after changing or adding matching rules. It'll loop over all
 of the documents in your database and attempt to match documents
 according to the new rules.
 Specify any combination of ``-c``, ``-T`` and ``-t`` to have the
 retagger perform matching of the specified metadata type. If you don't
 specify any of these options, the document retagger won't do anything.
 Specify ``-i`` to have the document retagger work on documents tagged
 with inbox tags only. This is useful when you don't want to mess with
 your already processed documents.
 When multiple document types or correspondents match a single document,
 the retagger won't assign these to the document. Specify ``--use-first``
 to override this behaviour and just use the first correspondent or type
 it finds. This option does not apply to tags, since any number of tags
 can be applied to a document.
 Finally, ``-f`` specifies that you wish to overwrite already assigned
 correspondents, types and/or tags. The default behaviour is to not
 assign correspondents and types to documents that have this data already
 assigned. ``-f`` works differently for tags: By default, only additional tags get
 added to documents, no tags will be removed. With ``-f``, tags that don't
 match a document anymore get removed as well.
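 The effect of ``--use-first`` and ``-f`` described above can be sketched in
 Python. This is only one reading of the rules in this section, not the
 retagger's actual code: the function names ``choose_single`` and
 ``merge_tags`` are made up for this example, and leaving a document unchanged
 when ``-f`` is given but nothing matches is an assumption.

```python
def choose_single(current, candidates, use_first=False, overwrite=False):
    """Pick a correspondent or document type for one document.

    current:    the value already assigned, or None
    candidates: every value whose matching rule applies to the document
    """
    if current is not None and not overwrite:
        return current  # existing data is kept unless -f is given
    if len(candidates) == 1:
        return candidates[0]
    if len(candidates) > 1 and use_first:
        return candidates[0]  # --use-first: take the first match
    return current  # no match, or an ambiguous match: leave unchanged (assumption)


def merge_tags(current, matched, overwrite=False):
    """Combine the tags already on a document with the tags that match now."""
    if overwrite:
        return set(matched)  # -f: tags that no longer match are dropped
    return set(current) | set(matched)  # default: only add tags
```

 For instance, ``merge_tags({"old"}, {"bank"})`` keeps both tags, while the
 same call with ``overwrite=True`` keeps only ``bank``.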
 Managing the Automatic matching algorithm
 =========================================
 The *Auto* matching algorithm requires a trained neural network to work.
 This network needs to be updated whenever something in your data
 changes. The docker image takes care of that automatically with the task
 scheduler. You can manually renew the classifier by invoking the following
 management command:
 .. code::
    document_create_classifier
 This command takes no arguments.
 Managing the document search index
 ==================================
 The document search index is responsible for delivering search results for the
 website. The document index is automatically updated whenever documents get
 added to, changed, or removed from paperless. However, if the search yields
 non-existing documents or won't find anything, you may need to recreate the
 index manually.
 .. code::
    document_index {reindex,optimize}
 Specify ``reindex`` to have the index created from scratch. This may take some
 time.
 Specify ``optimize`` to optimize the index. This updates certain aspects of
 the index and usually makes queries faster and also ensures that the
 autocompletion works properly. This command is regularly invoked by the task
 scheduler.
 Managing filenames
 ==================
 .. warning::
    TBD
 .. code::
    document_renamer
 .. _utilities-encyption:
 Managing encryption
 ===================
 Documents can be stored in Paperless using GnuPG encryption.
 .. danger::
    Encryption is deprecated since paperless-ng 1.0 and doesn't really provide any
    additional security, since you have to store the passphrase in a configuration
    file on the same system as the encrypted documents for paperless to work. Also,
    paperless provides transparent access to your encrypted documents.
    Consider running paperless on an encrypted filesystem instead, which will then
    at least provide security against physical hardware theft.
 .. code::
    change_storage_type [--passphrase PASSPHRASE] {gpg,unencrypted} {gpg,unencrypted}
    positional arguments:
      {gpg,unencrypted}     The state you want to change your documents from
      {gpg,unencrypted}     The state you want to change your documents to
    optional arguments:
      --passphrase PASSPHRASE
 Enabling encryption
 -------------------
 Basic usage to enable encryption of your document store (**USE A MORE SECURE PASSPHRASE**):
 (Note: If ``PAPERLESS_PASSPHRASE`` isn't set already, you need to specify it here)
 .. code::
    change_storage_type [--passphrase SECR3TP4SSPHRA$E] unencrypted gpg
 Disabling encryption
 --------------------
 Basic usage to disable encryption of your document store:
 (Note: Again, if ``PAPERLESS_PASSPHRASE`` isn't set already, you need to specify it here)
 .. code::
    change_storage_type [--passphrase SECR3TP4SSPHRA$E] gpg unencrypted
 .. _Pipenv: https://pipenv.pypa.io/en/latest/
--- a/docs/advanced_usage.rst
+++ b/docs/advanced_usage.rst
@@ -0,0 +1,244 @@
 ***************
 Advanced topics
 ***************
 Paperless offers a couple features that automate certain tasks and make your life
 easier.
 Guesswork
 #########
 Any document you put into the consumption directory will be consumed, but if
 you name the file right, it'll automatically set some values in the database
 for you.  This is the logic the consumer follows:
 1. Try to find the correspondent, title, and tags in the file name following
   the pattern: ``Date - Correspondent - Title - tag,tag,tag.pdf``.  Note that
   the format of the date is **rigidly defined** as ``YYYYMMDDHHMMSSZ`` or
   ``YYYYMMDDZ``.  The ``Z`` refers to "Zulu time" AKA "UTC".
   The tags are optional, so the format ``Date - Correspondent - Title.pdf``
   works as well.
 2. If that doesn't work, we skip the date and try this pattern:
   ``Correspondent - Title - tag,tag,tag.pdf``.
 3. If that doesn't work, we try to find the correspondent and title in the file
   name following the pattern: ``Correspondent - Title.pdf``.
 4. If that doesn't work, just assume that the name of the file is the title.
 So given the above, the following examples would work as you'd expect:
 * ``20150314000700Z - Some Company Name - Invoice 2016-01-01 - money,invoices.pdf``
 * ``20150314Z - Some Company Name - Invoice 2016-01-01 - money,invoices.pdf``
 * ``Some Company Name - Invoice 2016-01-01 - money,invoices.pdf``
 * ``Another Company - Letter of Reference.jpg``
 * ``Dad's Recipe for Pancakes.png``
 These however wouldn't work:
 * ``2015-03-14 00:07:00 UTC - Some Company Name, Invoice 2016-01-01, money, invoices.pdf``
 * ``2015-03-14 - Some Company Name, Invoice 2016-01-01, money, invoices.pdf``
 * ``Some Company Name, Invoice 2016-01-01, money, invoices.pdf``
 * ``Another Company- Letter of Reference.jpg``
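 The fallback order described above can be sketched in Python. This is an
 illustration only, not the consumer's actual code: the regular expressions,
 the allowed tag characters and the function name ``guess_from_filename`` are
 assumptions made for this example.

```python
import re

# Hypothetical patterns for this sketch, tried from most to least specific.
DATE = r"(?P<date>\d{14}Z|\d{8}Z)"
TAGS = r"(?P<tags>[a-z0-9,\-]+)"
EXT = r"\.(?P<ext>[A-Za-z0-9]+)$"
PATTERNS = [
    # Step 1: date, correspondent, title, with or without tags
    re.compile("^" + DATE + r" - (?P<correspondent>.+?) - (?P<title>.+?) - " + TAGS + EXT),
    re.compile("^" + DATE + r" - (?P<correspondent>.+?) - (?P<title>.+)" + EXT),
    # Step 2: correspondent, title and tags
    re.compile(r"^(?P<correspondent>.+?) - (?P<title>.+?) - " + TAGS + EXT),
    # Step 3: correspondent and title
    re.compile(r"^(?P<correspondent>.+?) - (?P<title>.+)" + EXT),
    # Step 4: the whole name is the title
    re.compile(r"^(?P<title>.+)" + EXT),
]


def guess_from_filename(name):
    """Return the metadata guessed from a file name, or None if nothing matches."""
    for pattern in PATTERNS:
        match = pattern.match(name)
        if match:
            found = match.groupdict()
            tags = found.get("tags")
            return {
                "date": found.get("date"),
                "correspondent": found.get("correspondent"),
                "title": found["title"],
                "tags": tags.split(",") if tags else [],
            }
    return None
```

 With these patterns, ``Another Company- Letter of Reference.jpg`` falls
 through to the last step, so the whole name becomes the title and no
 correspondent is set.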
 Do I have to be so strict about naming?
 =======================================
 Rather than using the strict document naming rules, one can also set the option
 ``PAPERLESS_FILENAME_DATE_ORDER`` in ``paperless.conf`` to any date order
 that is accepted by dateparser_. Doing so will cause ``paperless`` to default
 to any date format that is found in the title, instead of a date pulled from
 the document's text, without requiring the strict formatting of the document
 filename as described above.
 .. _dateparser: https://github.com/scrapinghub/dateparser/blob/v0.7.0/docs/usage.rst#settings
 Transforming filenames for parsing
 ==================================
 Some devices can't produce filenames that can be parsed by the default
 parser. By configuring the option ``PAPERLESS_FILENAME_PARSE_TRANSFORMS`` in
 ``paperless.conf`` one can add transformations that are applied to the filename
 before it's parsed.
 The option contains a list of dictionaries of regular expressions (key:
 ``pattern``) and replacements (key: ``repl``) in JSON format, which are
 applied in order by passing them to ``re.subn``. Transformation stops
 after the first match, so at most one transformation is applied. The general
 syntax is
 .. code:: python
   [{"pattern":"pattern1", "repl":"repl1"}, {"pattern":"pattern2", "repl":"repl2"}, ..., {"pattern":"patternN", "repl":"replN"}]
 The example below is for a Brother ADS-2400N, a scanner that lets you assign
 different names to different hardware buttons (useful for handling
 multiple entities in one instance), but insists on adding ``_<count>``
 to the filename.
 .. code:: python
   # Brother profile configuration, support "Name_Date_Count" (the default
   # setting) and "Name_Count" (use "Name" as tag and "Count" as title).
   PAPERLESS_FILENAME_PARSE_TRANSFORMS=[{"pattern":"^([a-z]+)_(\\d{8})_(\\d{6})_([0-9]+)\\.", "repl":"\\2\\3Z - \\4 - \\1."}, {"pattern":"^([a-z]+)_([0-9]+)\\.", "repl":" - \\2 - \\1."}]
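 To see what such a list does to a file name, the rule "apply the first
 matching pattern with ``re.subn`` and stop" can be tried out in a few lines
 of Python. This is a sketch, not Paperless' own code: the function name
 ``apply_transforms`` and the sample file names are assumptions, and the
 value is assumed to be decoded with ``json.loads``.

```python
import json
import re

# The value of the Brother example, as it appears after the "=" sign.
RAW = r'''[{"pattern":"^([a-z]+)_(\\d{8})_(\\d{6})_([0-9]+)\\.", "repl":"\\2\\3Z - \\4 - \\1."}, {"pattern":"^([a-z]+)_([0-9]+)\\.", "repl":" - \\2 - \\1."}]'''


def apply_transforms(filename, transforms):
    """Apply the first transformation whose pattern matches, then stop."""
    for transform in transforms:
        new_name, count = re.subn(transform["pattern"], transform["repl"], filename)
        if count > 0:
            return new_name
    return filename  # no pattern matched: the name is left untouched


transforms = json.loads(RAW)
print(apply_transforms("invoices_20201113_184619_0001.pdf", transforms))
# 20201113184619Z - 0001 - invoices.pdf
```

 Testing a pattern this way before putting it into ``paperless.conf`` makes
 it easier to catch escaping mistakes in the doubled backslashes.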
 Matching tags, correspondents and document types
 ################################################
 After the consumer has tried to figure out what it could from the file name,
 it starts looking at the content of the document itself.  It will compare the
 matching algorithms defined by every tag and correspondent already set in your
 database to see if they apply to the text in that document.  In other words,
 if you defined a tag called ``Home Utility`` that had a ``match`` property of
 ``bc hydro`` and a ``matching_algorithm`` of ``literal``, Paperless will
 automatically tag your newly-consumed document with your ``Home Utility`` tag
 so long as the text ``bc hydro`` appears in the body of the document somewhere.
 The matching logic is quite powerful, and supports searching the text of your
 document with different algorithms, and as such, some experimentation may be
 necessary to get things right.
 In order to have a tag, correspondent or type assigned automatically to newly
 consumed documents, assign a match and matching algorithm using the web
 interface. These settings define when to assign correspondents, tags and types
 to documents.
 The following algorithms are available:
 * **Any:** Looks for any occurrence of any word provided in match in the PDF.
  If you define the match as ``Bank1 Bank2``, it will match documents containing
  either of these terms.
 * **All:** Requires that every word provided appears in the PDF, albeit not in the
  order provided.
 * **Literal:** Matches only if the match appears exactly as provided in the PDF.
 * **Regular expression:** Parses the match as a regular expression and tries to
  find a match within the document.
 * **Fuzzy match:** I don't know. Look at the source.
 * **Auto:** Tries to automatically match new documents. This does not require you
  to set a match. See the notes below.
 When using the "any" or "all" matching algorithms, you can search for terms
 that consist of multiple words by enclosing them in double quotes. For example,
 defining a match text of ``"Bank of America" BofA`` using the "any" algorithm,
 will match documents that contain either "Bank of America" or "BofA", but will
 not match documents containing "Bank of South America".
 Then just save your tag/correspondent and run another document through the
 consumer.  Once complete, you should see the newly-created document,
 automatically tagged with the appropriate data.
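 The *Any*, *All*, *Literal* and *Regular expression* algorithms, including
 the handling of quoted terms, can be approximated in Python as follows. This
 is a sketch for experimenting with match strings, not the code Paperless
 runs: the function names are made up, and case-insensitive whole-word
 matching is an assumption.

```python
import re


def split_match(match):
    """Split a match string into terms; double quotes keep a phrase together."""
    return [quoted or bare for quoted, bare in re.findall(r'"([^"]+)"|(\S+)', match)]


def term_found(term, text):
    """Assumption: terms match case-insensitively and only as whole words."""
    return re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE) is not None


def matches(algorithm, match, text):
    terms = split_match(match)
    if algorithm == "any":
        return any(term_found(term, text) for term in terms)
    if algorithm == "all":
        return bool(terms) and all(term_found(term, text) for term in terms)
    if algorithm == "literal":
        return term_found(match, text)
    if algorithm == "regex":
        return re.search(match, text, re.IGNORECASE) is not None
    raise ValueError("unsupported algorithm: " + algorithm)
```

 For example, ``matches("any", '"Bank of America" BofA', text)`` is true for
 a text containing "Bank of America" and false for one that only mentions
 "Bank of South America".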
 Automatic matching
 ==================
 Paperless-ng comes with a new matching algorithm called *Auto*. This matching
 algorithm tries to assign tags, correspondents and document types to your
 documents based on how you have assigned these on existing documents. It
 uses a neural network under the hood.
 If, for example, all your bank statements of your account 123 at the Bank of
 America are tagged with the tag "bofa_123" and the matching algorithm of this
 tag is set to *Auto*, this neural network will examine your documents and
 automatically learn when to assign this tag.
 There are a couple caveats you need to keep in mind when using this feature:
 * Changes to your documents are not immediately reflected by the matching
  algorithm. The neural network needs to be *trained* on your documents after
  changes. Paperless periodically (default: once each hour) checks for changes
  and does this automatically for you.
 * The Auto matching algorithm only takes documents into account which are NOT
  placed in your inbox (i.e., which have no inbox tags assigned to them). This ensures
  that the neural network only learns from documents which you have correctly
  tagged before.
 * The matching algorithm can only work if there is a correlation between the
  tag, correspondent or document type and the document itself. Your bank
  statements usually contain your bank account number and the name of the bank,
  so this works reasonably well. However, tags such as "TODO" cannot be
  automatically assigned.
 * The matching algorithm needs a reasonable number of documents to identify when
  to assign tags, correspondents, and types. If one out of a thousand documents
  has the correspondent "Very obscure web shop I bought something five years
  ago", it will probably not assign this correspondent automatically if you buy
  something from them again. The more documents, the better.
 Hooking into the consumption process
 ####################################
 Sometimes you may want to do something arbitrary whenever a document is
 consumed.  Rather than try to predict what you may want to do, Paperless lets
 you execute scripts of your own choosing just before or after a document is
 consumed using a couple simple hooks.
 Just write a script, put it somewhere that Paperless can read & execute, and
 then put the path to that script in ``paperless.conf`` with the variable name
 of either ``PAPERLESS_PRE_CONSUME_SCRIPT`` or
 ``PAPERLESS_POST_CONSUME_SCRIPT``.
 .. TODO HYPEREF TO CONFIG
 .. important::
    These scripts are executed in a **blocking** process, which means that if
    a script takes a long time to run, it can significantly slow down your
    document consumption flow.  If you want things to run asynchronously,
    you'll have to fork the process in your script and exit.
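 One way to keep a slow hook from blocking consumption is to start the real
 work as a background process and return immediately. The following Python
 sketch shows the idea under a few assumptions: ``SLOW_JOB`` is a stand-in
 for your own command, the function names are made up for this example, and
 ``start_new_session`` only has an effect on POSIX systems.

```python
import subprocess
import sys

# Stand-in for a slow job; replace it with your own command.
SLOW_JOB = [sys.executable, "-c", "import time; time.sleep(5)"]


def launch_detached(command):
    """Start ``command`` in the background and return without waiting for it."""
    return subprocess.Popen(
        command,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # POSIX: run the job in its own session
    )


def main(argv):
    # Forward the hook arguments (argv[1:]) to the job, then return at once
    # so that the blocking hook call finishes quickly.
    launch_detached(SLOW_JOB + argv[1:])
    return 0
```

 A hook script built on this would end with ``sys.exit(main(sys.argv))``.
 Note that nothing reports back if the background job fails, so it should do
 its own logging.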
 Pre-consumption script
 ======================
 Executed after the consumer sees a new document in the consumption folder, but
 before any processing of the document is performed. This script receives exactly
 one argument:
 * Document file name
 A simple but common example for this would be creating a simple script like
 this:
 ``/usr/local/bin/ocr-pdf``
 .. code:: bash
    #!/usr/bin/env bash
    pdf2pdfocr.py -i "${1}"
 ``/etc/paperless.conf``
 .. code:: bash
    ...
    PAPERLESS_PRE_CONSUME_SCRIPT="/usr/local/bin/ocr-pdf"
    ...
 This will pass the path to the document about to be consumed to ``/usr/local/bin/ocr-pdf``,
 which will in turn call `pdf2pdfocr.py`_ on your document, which will then
 overwrite the file with an OCR'd version of the file and exit.  At which point,
 the consumption process will begin with the newly modified file.
 .. _pdf2pdfocr.py: https://github.com/LeoFCardoso/pdf2pdfocr
 .. _consumption-director-hook-variables-post:
 Post-consumption script
 =======================
 Executed after the consumer has successfully processed a document and has moved it
 into paperless. It receives the following arguments:
 * Document id
 * Generated file name
 * Source path
 * Thumbnail path
 * Download URL
 * Thumbnail URL
 * Correspondent
 * Tags
 The script can be in any language you like, but for a simple shell script
 example, you can take a look at ``post-consumption-example.sh`` in the
 ``scripts`` directory in this project.
 The post consumption script cannot cancel the consumption process.
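 As an illustration of a post-consumption script in another language, the
 following Python sketch maps the positional arguments listed above onto
 names. The field names and the function ``parse_hook_args`` are chosen for
 this example only, and the tags argument is kept as the raw string it
 arrives as, since its exact format is not described here.

```python
# Argument order as listed above; the names themselves are made up.
FIELDS = (
    "document_id",
    "file_name",
    "source_path",
    "thumbnail_path",
    "download_url",
    "thumbnail_url",
    "correspondent",
    "tags",
)


def parse_hook_args(argv):
    """Map the positional hook arguments (argv[1:]) onto named fields."""
    values = argv[1:]
    if len(values) != len(FIELDS):
        raise SystemExit("expected %d arguments, got %d" % (len(FIELDS), len(values)))
    return dict(zip(FIELDS, values))


def main(argv):
    info = parse_hook_args(argv)
    print("Consumed document %s from %s" % (info["document_id"], info["correspondent"]))
    return 0
```

 To use it as a script, add ``import sys`` at the top and call
 ``sys.exit(main(sys.argv))`` at the bottom of the file.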
--- a/docs/api.rst
+++ b/docs/api.rst
@@ -1,7 +1,12 @@
 .. _api:
 ************
 The REST API
-############
+************
 .. warning::
    This section is not updated yet.
 Paperless makes use of the `Django REST Framework`_ standard API interface
 because of its inherent awesomeness.  Conveniently, the system is also
@@ -15,7 +20,7 @@ installation.
 .. _api-uploading:
 Uploading
---------
+=========
 File uploads in an API are hard and so far as I've been able to tell, there's
 no standard way of accepting them, so rather than crowbar file uploads into the
--- a/docs/changelog.rst
+++ b/docs/changelog.rst
@@ -1,6 +1,79 @@
 .. _paperless_changelog:
 Changelog
 #########
 paperless-ng 1.0
 ================
 * **Deprecated:** GnuPG. Don't use it. If you're still using it, be aware that it
  offers no protection at all, since the passphrase is stored alongside the
  encrypted documents themselves. This feature will most likely be removed in future
  versions.
 * **Added:** New frontend. Features:
  * Single page application: It's much more responsive than the django admin pages.
  * Dashboard. Shows recently scanned documents, or todos, or other documents
    at wish. Allows uploading of documents. Shows basic statistics.
  * Better document list with multiple display options.
  * Full text search with result highlighting, auto completion and scoring based
    on the query. It uses a document search index in the background.
  * Saveable filters.
  * Better log viewer.
 * **Added:** Document types. Assign these to documents just as correspondents.
  They may be used in the future to perform automatic operations on documents
  depending on the type.
 * **Added:** Inbox tags. Define an inbox tag and it will automatically be
  assigned to any new document scanned into the system.
 * **Added:** Automatic matching. A new matching algorithm that automatically
  assigns tags, document types and correspondents to your documents. It uses
  a neural network trained on your data.
 * **Added:** Archive serial numbers. Assign these to quickly find documents stored in
  physical binders.
 * **Added:** Enabled the internal user management of django. This isn't really a
  multi user solution, however, it allows more than one user to access the website
  and set some basic permissions / renew passwords.
 * **Modified [breaking]:** REST Api changes:
  * New filters added, other filters removed (case sensitive filters, slug filters)
  * Endpoints for thumbnails, previews and downloads replace the old ``/fetch/`` urls. Redirects are in place.
  * Endpoint for document uploads replaces the old ``/push`` url. Redirects are in place.
  * Foreign key relationships are now served as IDs, not as urls.
 * **Modified [breaking]:** PostgreSQL:
  * If ``PAPERLESS_DBHOST`` is specified in the settings, paperless uses postgresql instead of sqlite.
    Username, database and password all default to ``paperless`` if not specified.
  * **docker-compose.yml uses PostgreSQL by default.**
 * **Modified [breaking]:** document_retagger management command rework. See TODO hyperref
 * **Removed [breaking]:** Reminders.
 * **Removed:** All customizations made to the django admin pages.
 * **Internal changes:** Mostly code cleanup, including:
  * Rework of the code of the tesseract parser. This is now a lot cleaner.
  * Rework of the filename handling code. It was a mess.
  * Fixed some issues with the document exporter not exporting all documents when encountering duplicate filenames.
  * Consumer rework: now uses the excellent watchdog library, lots of code removed.
  * Added a task scheduler that takes care of checking mail, training the classifier and maintaining the document search index.
  * Updated dependencies. Now uses Pipenv all around.
  * Updated Dockerfile and docker-compose. Now uses ``supervisord`` to run everything paperless-related in a single container.
 * **Settings:**
  * ``PAPERLESS_FORGIVING_OCR`` is now default and gone. Reason: Even if ``langdetect`` fails to detect
    a language, tesseract still does a very good job at ocr'ing a document with the default language.
    Certain language specifics such as umlauts may not get picked up properly.
  * ``PAPERLESS_DEBUG`` defaults to ``false``.
  * The presence of ``PAPERLESS_DBHOST`` now determines whether to use PostgreSQL or
    sqlite.
 * Many more small changes here and there. The usual stuff.
 2.7.0
 =====
--- a/docs/changelog_jonaswinkler.rst
+++ b/docs/changelog_jonaswinkler.rst
@@ -1,15 +0,0 @@
 Changelog (jonaswinkler)
 ########################
 1.0.0
 =====
 * First release based on paperless 2.6.0
 * Added: Automatic document classification using neural networks (replaces
  regex-based tagging)
 * Added: Document types
 * Added: Archive serial number allows easy referencing of physical document
  copies
 * Added: Inbox tags (added automatically to newly consumed documents)
 * Added: Document viewer on document edit page
 * Database backend is now configurable
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -54,7 +54,7 @@ source_suffix = '.rst'
 master_doc = 'index'
 # General information about the project.
-project = u'Paperless'
+project = u'Paperless-ng'
 copyright = u'2015, Daniel Quinn'
 # The version info for the project you're documenting, acts as replacement for
@@ -205,7 +205,8 @@ try:
    import sphinx_rtd_theme
    html_theme = "sphinx_rtd_theme"
    html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
-except ImportError:
+except ImportError as e:
    print("error " + str(e))
    pass
 # -- Options for LaTeX output ---------------------------------------------
--- a/docs/consumption.rst
+++ b/docs/consumption.rst
@@ -1,255 +0,0 @@
 .. _consumption:
 Consumption
 ###########
 Once you've got Paperless set up, you need to start feeding documents into it.
 Currently, there are three options: the consumption directory, IMAP (email), and
 HTTP POST.
 .. _consumption-directory:
 The Consumption Directory
 =========================
 The primary method of getting documents into your database is by putting them in
 the consumption directory.  The ``document_consumer`` script runs in an infinite
 loop looking for new additions to this directory and when it finds them, it goes
 about the process of parsing them with the OCR, indexing what it finds, and
 encrypting the PDF (if ``PAPERLESS_PASSPHRASE`` is set), storing it in the
 media directory.
 Getting stuff into this directory is up to you.  If you're running Paperless
 on your local computer, you might just want to drag and drop files there, but if
 you're running this on a server and want your scanner to automatically push
 files to this directory, you'll need to set up some sort of service to accept the
 files from the scanner.  Typically, you're looking at an FTP server like
 `Proftpd`_ or `Samba`_.
 .. _Proftpd: http://www.proftpd.org/
 .. _Samba: http://www.samba.org/
 So where is this consumption directory?  It's wherever you define it.  Look for
 the ``CONSUMPTION_DIR`` value in ``settings.py``.  Set that to somewhere
 appropriate for your use and put some documents in there.  When you're ready,
 follow the :ref:`consumer <utilities-consumer>` instructions to get it running.
 .. _consumption-directory-hook:
 Hooking into the Consumption Process
 ------------------------------------
 Sometimes you may want to do something arbitrary whenever a document is
 consumed.  Rather than try to predict what you may want to do, Paperless lets
 you execute scripts of your own choosing just before or after a document is
 consumed using a couple simple hooks.
 Just write a script, put it somewhere that Paperless can read & execute, and
 then put the path to that script in ``paperless.conf`` with the variable name
 of either ``PAPERLESS_PRE_CONSUME_SCRIPT`` or
 ``PAPERLESS_POST_CONSUME_SCRIPT``.  The script will be executed before
 or after the document is consumed respectively.
 .. important::
    These scripts are executed in a **blocking** process, which means that if
    a script takes a long time to run, it can significantly slow down your
    document consumption flow.  If you want things to run asynchronously,
    you'll have to fork the process in your script and exit.
 .. _consumption-directory-hook-variables:
 What Can These Scripts Do?
 ..........................
 It's your script, so you're only limited by your imagination and the laws of
 physics.  However, the following values are passed to the scripts in order:
 .. _consumption-director-hook-variables-pre:
 Pre-consumption script
 ::::::::::::::::::::::
 * Document file name
 A simple but common example for this would be creating a simple script like
 this:
 ``/usr/local/bin/ocr-pdf``
 .. code:: bash
    #!/usr/bin/env bash
    pdf2pdfocr.py -i "${1}"
 ``/etc/paperless.conf``
 .. code:: bash
    ...
    PAPERLESS_PRE_CONSUME_SCRIPT="/usr/local/bin/ocr-pdf"
    ...
 This will pass the path to the document about to be consumed to ``/usr/local/bin/ocr-pdf``,
 which will in turn call `pdf2pdfocr.py`_ on your document, which will then
 overwrite the file with an OCR'd version of the file and exit.  At which point,
 the consumption process will begin with the newly modified file.
 .. _pdf2pdfocr.py: https://github.com/LeoFCardoso/pdf2pdfocr
 .. _consumption-director-hook-variables-post:
 Post-consumption script
 :::::::::::::::::::::::
 * Document id
 * Generated file name
 * Source path
 * Thumbnail path
 * Download URL
 * Thumbnail URL
 * Correspondent
 * Tags
 The script can be in any language you like, but for a simple shell script
 example, you can take a look at ``post-consumption-example.sh`` in the
 ``scripts`` directory in this project.
 .. _consumption-imap:
 IMAP (Email)
 ============
 Another handy way to get documents into your database is to email them to
 yourself.  A typical use case: you're out for lunch and want to send a
 copy of the receipt back to your system at home.  Paperless can be taught to
 pull emails down from an arbitrary account and dump their attachments into the
 consumption directory, where the process described
 :ref:`above <consumption-directory>` consumes them as usual.
 Some things you need to know about this feature:
 * It's disabled by default.  By setting the values below it will be enabled.
 * It's been tested in a limited environment, so it may not work for you (please
  submit a pull request if you can!)
 * It's designed to **delete mail from the server once consumed**.  So don't go
  pointing this to your personal email account and wonder where all your stuff
  went.
 * Currently, only one photo (attachment) per email will work.
 So, with all that in mind, here's what you do to get it running:
 1. Set up a new email account somewhere, or if you're feeling daring, create a
   folder in an existing email box and note the path to that folder.
 2. In ``/etc/paperless.conf`` set all of the appropriate values in
   ``PATHS AND FOLDERS`` and ``SECURITY``.
   If you decided to use a subfolder of an existing account, then make sure you
   set ``PAPERLESS_CONSUME_MAIL_INBOX`` accordingly here.  You also have to set
   the ``PAPERLESS_EMAIL_SECRET`` to something you can remember 'cause you'll
   have to include that in every email you send.
 3. Restart the :ref:`consumer <utilities-consumer>`.  The consumer checks the
   configured email account at startup and every 10 minutes after that, pulling
   down any new mail it finds.
 4. Send yourself an email!  Note that the subject is treated as the file name,
   so if you set the subject to ``Correspondent - Title - tag,tag,tag``, you'll
   get what you expect.  Also, you must include the aforementioned secret
   string in every email so the fetcher knows that it's safe to import.
   Note that Paperless only imports emails whose subject consists of safe
   characters: alphanumeric characters and ``-_ ,.'``.
 5. After a few minutes, the consumer will poll your mailbox, pull down the
   message, and place the attachment in the consumption directory with the
   appropriate name.  A few minutes later, the consumer will import it like any
   other file.
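
 Before sending, you may want to check a subject line against the character
 rule from step 4.  The sketch below is only an approximation: it assumes
 "alpha-numeric" means ASCII letters and digits, and the helper names
 ``is_safe_subject`` and ``build_subject`` are made up for this example.  It
 does not deal with the secret string at all.

 .. code:: python

    import re

    # Letters, digits, and the characters -_ ,.' (assumed ASCII only).
    SAFE_SUBJECT = re.compile(r"[A-Za-z0-9\-_ ,.']+")


    def is_safe_subject(subject):
        """Return True if every character of the subject is in the safe set."""
        return SAFE_SUBJECT.fullmatch(subject) is not None


    def build_subject(correspondent, title, tags):
        """Build a ``Correspondent - Title - tag,tag,tag`` subject line."""
        subject = "{} - {} - {}".format(correspondent, title, ",".join(tags))
        if not is_safe_subject(subject):
            raise ValueError("subject contains unsafe characters: " + subject)
        return subject
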
 .. _consumption-http:
 HTTP POST
 =========
 You can also submit a document via HTTP POST, so long as you do so after
 authenticating.  To push your document to Paperless, send an HTTP POST to the
 server with the following name/value pairs:
 * ``correspondent``: The name of the document's correspondent.  Note that there
  are restrictions on what characters you can use here.  Specifically,
  alphanumeric characters, ``-``, ``,``, ``.``, and ``'`` are ok, everything else is
  out.  You also can't use the sequence ``" - "`` (space, dash, space).
 * ``title``: The title of the document.  The rules for characters are the same
  here as for the correspondent.
 * ``document``: The file you're uploading.
 Specify ``enctype="multipart/form-data"``, and then POST your file with::
    Content-Disposition: form-data; name="document"; filename="whatever.pdf"
 An example of this in HTML is a typical form:
 .. code:: html
    <form method="post" enctype="multipart/form-data">
        <input type="text" name="correspondent" value="My Correspondent" />
        <input type="text" name="title" value="My Title" />
        <input type="file" name="document" />
        <input type="submit" name="go" value="Do the thing" />
    </form>
 But a potentially more useful way to do this would be in Python.  Here we use
 the requests library to handle basic authentication and to send the POST data
 to the URL.
 .. code:: python
    import os
    import requests
    from requests.auth import HTTPBasicAuth
    # You authenticate via BasicAuth or with a session id.
    # We use BasicAuth here
    username = "my-username"
    password = "my-super-secret-password"
    # Where you have Paperless installed and listening
    url = "http://localhost:8000/push"
    # Document metadata
    correspondent = "Test Correspondent"
    title = "Test Title"
    # The local file you want to push
    path = "/path/to/some/directory/my-document.pdf"
    with open(path, "rb") as f:
        response = requests.post(
            url=url,
            data={"title": title,  "correspondent": correspondent},
            files={"document": (os.path.basename(path), f, "application/pdf")},
            auth=HTTPBasicAuth(username, password),
            allow_redirects=False
        )
        if response.status_code == 202:
            # Everything worked out ok
            print("Upload successful")
        else:
            # If you don't get a 202, it's probably because your credentials
            # are wrong or something.  This will give you a rough idea of what
            # happened.
            print("We got HTTP status code: {}".format(response.status_code))
            for k, v in response.headers.items():
                print("{}: {}".format(k, v))
--- a/docs/customising.rst
+++ b/docs/customising.rst
@@ -1,42 +0,0 @@
 .. _customising:
 Customising Paperless
 #####################
 Currently, Paperless' interface is just the default Django admin, which,
 while powerful, is rather boring.  If you'd like to give the site a bit of a
 face-lift, or if you simply want to adjust the colours, contrast, or font size
 to make things easier to read, you can do that by adding your own CSS or
 Javascript quite easily.
 .. _customising-overrides:
 Overrides
 =========
 On every page load, Paperless looks in your media root directory
 (the directory defined by your ``PAPERLESS_MEDIADIR`` configuration variable or
 the default, ``<project root>/media/``) for two files:
 * ``overrides.css``
 * ``overrides.js``
 If it finds either or both of those files, they'll be loaded into the page: the
 CSS in the ``<head>``, and the Javascript stuffed into the last line of the
 ``<body>``.
 .. _customising-overrides-note:
 An important note about customisation
 -------------------------------------
 Any changes you make to the site with your CSS or Javascript are likely to
 depend on the structure of the current HTML and/or the existing CSS rules.  For
 the most part it's safe to assume that these bits won't change, but *sometimes
 they do* as features are added or bugs are fixed.
 If you make a change that you think others would appreciate though, submit it
 as a pull request and maybe we can find a way to work it into the project by
 default!
--- a/docs/guesswork.rst
+++ b/docs/guesswork.rst
@@ -1,131 +0,0 @@
 .. _guesswork:
 Guesswork
 #########
 During the consumption process, Paperless tries to guess some of the attributes
 of the document it's looking at.  To do this it uses two approaches:
 .. _guesswork-naming:
 File Naming
 ===========
 Any document you put into the consumption directory will be consumed, but if
 you name the file right, it'll automatically set some values in the database
 for you.  This is the logic the consumer follows:
 1. Try to find the correspondent, title, and tags in the file name following
   the pattern: ``Date - Correspondent - Title - tag,tag,tag.pdf``.  Note that
   the format of the date is **rigidly defined** as ``YYYYMMDDHHMMSSZ`` or
   ``YYYYMMDDZ``.  The ``Z`` refers to "Zulu time", AKA "UTC".
   The tags are optional, so the format ``Date - Correspondent - Title.pdf``
   works as well.
 2. If that doesn't work, we skip the date and try this pattern:
   ``Correspondent - Title - tag,tag,tag.pdf``.
 3. If that doesn't work, we try to find the correspondent and title in the file
   name following the pattern: ``Correspondent - Title.pdf``.
 4. If that doesn't work, just assume that the name of the file is the title.
 So given the above, the following examples would work as you'd expect:
 * ``20150314000700Z - Some Company Name - Invoice 2016-01-01 - money,invoices.pdf``
 * ``20150314Z - Some Company Name - Invoice 2016-01-01 - money,invoices.pdf``
 * ``Some Company Name - Invoice 2016-01-01 - money,invoices.pdf``
 * ``Another Company - Letter of Reference.jpg``
 * ``Dad's Recipe for Pancakes.png``
 These however wouldn't work:
 * ``2015-03-14 00:07:00 UTC - Some Company Name, Invoice 2016-01-01, money, invoices.pdf``
 * ``2015-03-14 - Some Company Name, Invoice 2016-01-01, money, invoices.pdf``
 * ``Some Company Name, Invoice 2016-01-01, money, invoices.pdf``
 * ``Another Company- Letter of Reference.jpg``
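
 The fallback order described above can be sketched in Python as follows.
 This is an illustration only, not the consumer's actual code: the regular
 expressions, the assumption that tags are lower-case letters, digits, commas
 and dashes, and the function name ``parse_filename`` are all made up for this
 example.

 .. code:: python

    import re

    DATE = r"(?P<date>\d{8}(?:\d{6})?Z)"  # YYYYMMDDZ or YYYYMMDDHHMMSSZ
    CORRESPONDENT = r"(?P<correspondent>.+?)"
    TITLE = r"(?P<title>.+)"
    TAGS = r"(?P<tags>[a-z0-9,\-]+)"  # assumed tag alphabet
    EXTENSION = r"\.(?P<extension>[A-Za-z0-9]+)$"
    SEP = " - "

    # Tried top to bottom; the first pattern that matches wins.
    PATTERNS = [
        re.compile("^" + DATE + SEP + CORRESPONDENT + SEP + TITLE + SEP + TAGS + EXTENSION),
        re.compile("^" + DATE + SEP + CORRESPONDENT + SEP + TITLE + EXTENSION),
        re.compile("^" + CORRESPONDENT + SEP + TITLE + SEP + TAGS + EXTENSION),
        re.compile("^" + CORRESPONDENT + SEP + TITLE + EXTENSION),
        re.compile("^" + TITLE + EXTENSION),
    ]


    def parse_filename(name):
        """Return the named parts of the first matching pattern, or None."""
        for pattern in PATTERNS:
            match = pattern.match(name)
            if match:
                return match.groupdict()
        return None

 With this sketch, ``Another Company- Letter of Reference.jpg`` falls through
 to the last pattern, so the whole name (minus the extension) becomes the
 title, which is why that example doesn't behave as you might hope.
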
 Do I have to be so strict about naming?
 ---------------------------------------
 Rather than using the strict document naming rules, one can also set the option
 ``PAPERLESS_FILENAME_DATE_ORDER`` in ``paperless.conf`` to any date order
 that is accepted by dateparser_. Doing so will cause ``paperless`` to default
 to any date format that is found in the title, instead of a date pulled from
 the document's text, without requiring the strict formatting of the document
 filename as described above.
 .. _dateparser: https://github.com/scrapinghub/dateparser/blob/v0.7.0/docs/usage.rst#settings
 Transforming filenames for parsing
 ----------------------------------
 Some devices can't produce filenames that can be parsed by the default
 parser. By configuring the option ``PAPERLESS_FILENAME_PARSE_TRANSFORMS`` in
 ``paperless.conf`` one can add transformations that are applied to the filename
 before it's parsed.
 The option contains a list of dictionaries of regular expressions (key:
 ``pattern``) and replacements (key: ``repl``) in JSON format, which are
 applied in order by passing them to ``re.subn``. Transformation stops
 after the first match, so at most one transformation is applied. The general
 syntax is
 .. code:: python
   [{"pattern":"pattern1", "repl":"repl1"}, {"pattern":"pattern2", "repl":"repl2"}, ..., {"pattern":"patternN", "repl":"replN"}]
 The example below is for a Brother ADS-2400N, a scanner that lets you assign
 different names to different hardware buttons (useful for handling
 multiple entities in one instance), but insists on adding ``_<count>``
 to the filename.
 .. code:: python
   # Brother profile configuration, support "Name_Date_Count" (the default
   # setting) and "Name_Count" (use "Name" as tag and "Count" as title).
   PAPERLESS_FILENAME_PARSE_TRANSFORMS=[{"pattern":"^([a-z]+)_(\\d{8})_(\\d{6})_([0-9]+)\\.", "repl":"\\2\\3Z - \\4 - \\1."}, {"pattern":"^([a-z]+)_([0-9]+)\\.", "repl":" - \\2 - \\1."}]
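
 To see what such a list does to a filename, here is a small Python sketch of
 the "apply in order, stop after the first match" behaviour described above.
 It assumes the patterns are passed to ``re.subn`` without any extra flags;
 the function name ``apply_transforms`` and the sample filenames are made up
 for this example.

 .. code:: python

    import json
    import re

    # The Brother transforms from above, decoded from their JSON form.
    BROTHER_TRANSFORMS = json.loads(
        r'[{"pattern": "^([a-z]+)_(\\d{8})_(\\d{6})_([0-9]+)\\.",'
        r'  "repl": "\\2\\3Z - \\4 - \\1."},'
        r' {"pattern": "^([a-z]+)_([0-9]+)\\.",'
        r'  "repl": " - \\2 - \\1."}]'
    )


    def apply_transforms(filename, transforms):
        """Apply the first transform whose pattern matches, then stop."""
        for transform in transforms:
            new_name, count = re.subn(
                transform["pattern"], transform["repl"], filename
            )
            if count:
                return new_name
        # No pattern matched: the filename is left untouched.
        return filename

 For instance, a hypothetical scan named ``invoices_20200101_120000_001.pdf``
 matches the first pattern, so the second one is never tried.
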
 .. _guesswork-content:
 Reading the Document Contents
 =============================
 After the consumer has tried to figure out what it could from the file name,
 it starts looking at the content of the document itself.  It will compare the
 matching algorithms defined by every tag and correspondent already set in your
 database to see if they apply to the text in that document.  In other words,
 if you defined a tag called ``Home Utility`` that had a ``match`` property of
 ``bc hydro`` and a ``matching_algorithm`` of ``literal``, Paperless will
 automatically tag your newly-consumed document with your ``Home Utility`` tag
 so long as the text ``bc hydro`` appears in the body of the document somewhere.
 The matching logic is quite powerful, and supports searching the text of your
 document with different algorithms, and as such, some experimentation may be
 necessary to get things Just Right.
 .. _guesswork-content-howto:
 How Do I Set Up These Matching Algorithms?
 ------------------------------------------
 Setting up the algorithms is easily done through the admin interface.  When
 you create a new correspondent or tag, there are optional fields for matching
 text and matching algorithm.  From the help info there:
 .. note::
    Which algorithm you want to use when matching text to the OCR'd PDF.  Here,
    "any" looks for any occurrence of any word provided in the PDF, while "all"
    requires that every word provided appear in the PDF, albeit not in the
    order provided.  A "literal" match means that the text you enter must
    appear in the PDF exactly as you've entered it, and "regular expression"
    uses a regex to match the PDF.  If you don't know what a regex is, you
    probably don't want this option.
 When using the "any" or "all" matching algorithms, you can search for terms
 that consist of multiple words by enclosing them in double quotes. For example,
 defining a match text of ``"Bank of America" BofA`` using the "any" algorithm
 will match documents that contain either "Bank of America" or "BofA", but will
 not match documents containing "Bank of South America".
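
 The four algorithms can be approximated in a few lines of Python.  This is a
 rough sketch rather than the matching code Paperless actually runs: it
 assumes matching is case-insensitive and that "any" and "all" compare whole
 words, and the function names are invented for this example.

 .. code:: python

    import re


    def split_terms(match_text):
        """Split on whitespace, keeping double-quoted phrases together."""
        pairs = re.findall(r'"([^"]+)"|(\S+)', match_text)
        return [quoted or bare for quoted, bare in pairs]


    def contains_term(term, text):
        """Whole-word, case-insensitive search for a word or phrase."""
        pattern = r"\b" + re.escape(term) + r"\b"
        return re.search(pattern, text, re.IGNORECASE) is not None


    def document_matches(algorithm, match_text, text):
        """Decide whether ``text`` matches under the named algorithm."""
        if algorithm == "any":
            return any(contains_term(t, text) for t in split_terms(match_text))
        if algorithm == "all":
            return all(contains_term(t, text) for t in split_terms(match_text))
        if algorithm == "literal":
            return match_text.lower() in text.lower()
        if algorithm == "regular expression":
            return re.search(match_text, text, re.IGNORECASE) is not None
        raise ValueError("unknown algorithm: " + algorithm)

 Because the quoted phrase is kept as one term, "Bank of South America" does
 not satisfy ``"Bank of America"`` under the "any" algorithm here.
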
 Then just save your tag/correspondent and run another document through the
 consumer.  Once complete, you should see the newly-created document,
 automatically tagged with the appropriate data.
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -4,8 +4,8 @@ Paperless
 =========
 Paperless is a simple Django application running in two parts:
-a :ref:`consumer <utilities-consumer>` (the thing that does the indexing) and
+a *Consumer* (the thing that does the indexing) and
-the :ref:`webserver <utilities-webserver>` (the part that lets you search &
+the *Web server* (the part that lets you search &
 download already-indexed documents). If you want to learn more about its
 functions keep on reading after the installation section.
@@ -25,26 +25,34 @@ finding stuff again. I feed documents right from the post box into the scanner
 and then shred them.  Perhaps you might find it useful too.
 Paperless-ng
 ============
 I wanted to make big changes to the project that will impact the way it is used
 by its users greatly. Among the users who currently use paperless in production
 there are probably many that don't want these changes right away. I also wanted
 to have more control over what goes into the code and what does not. Therefore,
 paperless-ng was created. NG stands for both Angular (the framework used for the
 Frontend) and next-gen. Publishing this project under a different name also
 avoids confusion between paperless and paperless-ng.
 It would be great if this project could eventually merge back into the main
 repository, but it needs a lot more work before that can happen.
 Contents
 ========
 .. toctree::
-   :maxdepth: 2
+   :maxdepth: 1
   requirements
   setup
-   consumption
+   usage_overview
   advanced_usage
   administration
   api
   utilities
   guesswork
   migrating
   customising
   extending
   troubleshooting
   contributing
   scanners
   screenshots
   changelog
   changelog_jonaswinkler
--- a/docs/migrating.rst
+++ b/docs/migrating.rst
@@ -1,109 +0,0 @@
 .. _migrating:
 Migrating, Updates, and Backups
 ===============================
 As Paperless is still under active development, there's a lot that can change
 as software updates roll out.  You should back up often, so if anything goes
 wrong during an update, you at least have a means of restoring to something
 usable.  Thankfully, there are automated ways of backing up, restoring, and
 updating the software.
 .. _migrating-backup:
 Backing Up
 ----------
 So you're bored of this whole project, or you want to make a remote backup of
 your files for whatever reason.  This is easy to do: simply use the
 :ref:`exporter <utilities-exporter>` to dump your documents and database out
 into an arbitrary directory.
 .. _migrating-restoring:
 Restoring
 ---------
 Restoring your data is just as easy, since nearly all of your data exists either
 in the file names, or in the contents of the files themselves.  You just need to
 create an empty database (just follow the
 :ref:`installation instructions <setup-installation>` again) and then import the
 ``tags.json`` file you created as part of your backup.  Lastly, copy your
 exported documents into the consumption directory and start up the consumer.
 .. code-block:: shell-session
    $ cd /path/to/project
    $ rm data/db.sqlite3  # Delete the database
    $ cd src
    $ ./manage.py migrate  # Create the database
    $ ./manage.py createsuperuser
    $ ./manage.py loaddata /path/to/arbitrary/place/tags.json
    $ cp /path/to/exported/docs/* /path/to/consumption/dir/
    $ ./manage.py document_consumer
 Importing your data if you are :ref:`using Docker <setup-installation-docker>`
 is almost as simple:
 .. code-block:: shell-session
    # Stop and remove your current containers
    $ docker-compose stop
    $ docker-compose rm -f
    # Recreate them, add the superuser
    $ docker-compose up -d
    $ docker-compose run --rm webserver createsuperuser
    # Load the tags
    $ cat /path/to/arbitrary/place/tags.json | docker-compose run --rm webserver loaddata_stdin -
    # Load your exported documents into the consumption directory
    # (How you do this highly depends on how you have set this up)
    $ cp /path/to/exported/docs/* /path/to/mounted/consumption/dir/
 After loading the documents into the consumption directory the consumer will
 immediately start consuming the documents.
 .. _migrating-updates:
 Updates
 -------
 For the most part, all you have to do to update Paperless is run ``git pull``
 on the directory containing the project files, and then use Django's
 ``migrate`` command to execute any database schema updates that might have been
 rolled in as part of the update:
 .. code-block:: shell-session
    $ cd /path/to/project
    $ git pull
    $ pip install -r requirements.txt
    $ cd src
    $ ./manage.py migrate
 Note that it's possible (even likely) that while ``git pull`` may update some
 files, the ``migrate`` step may not update anything.  This is totally normal.
 Additionally, as new features are added, the ability to control those features
 is typically added by way of an environment variable set in ``paperless.conf``.
 You may want to take a look at the ``paperless.conf.example`` file to see if
 there's anything new in there compared to what you've got in ``/etc``.
 If you are :ref:`using Docker <setup-installation-docker>` the update process
 is similar:
 .. code-block:: shell-session
    $ cd /path/to/project
    $ git pull
    $ docker build -t paperless .
    $ docker-compose run --rm consumer migrate
    $ docker-compose up -d
 If ``git pull`` doesn't report any changes, there is no need to continue with
 the remaining steps.
--- a/docs/requirements.rst
+++ b/docs/requirements.rst
@@ -1,125 +0,0 @@
 .. _requirements:
 Requirements
 ============
 You need a Linux machine or Unix-like setup (theoretically an Apple machine
 should work) that has the following software installed:
 * `Python3`_ (with development libraries, pip and virtualenv)
 * `GNU Privacy Guard`_
 * `Tesseract`_, plus its language files matching your document base.
 * `Imagemagick`_ version 6.7.5 or higher
 * `unpaper`_
 * `libpoppler-cpp-dev`_ PDF rendering library
 * `optipng`_
 .. _Python3: https://python.org/
 .. _GNU Privacy Guard: https://gnupg.org
 .. _Tesseract: https://github.com/tesseract-ocr
 .. _Imagemagick: http://imagemagick.org/
 .. _unpaper: https://github.com/unpaper/unpaper
 .. _libpoppler-cpp-dev: https://poppler.freedesktop.org/
 .. _optipng: http://optipng.sourceforge.net/
 Notably, you should confirm how you access your Python3 installation.  Many
 Linux distributions will install Python3 in parallel to Python2, using the
 names ``python3`` and ``python`` respectively.  The same goes for ``pip3`` and
 ``pip``.  Running Paperless with Python2 will likely break things, so make sure
 that you're using the right version.
 For the purposes of simplicity, ``python`` and ``pip`` are used everywhere to
 refer to their Python3 versions.
 In addition to the above, there are a number of Python requirements, all of
 which are listed in a file called ``requirements.txt`` in the project root
 directory.
 If you're not working in a virtual environment (like Docker), you
 should probably be using a virtualenv, but that's your call.  The reasons why
 you might choose a virtualenv or not aren't really within the scope of this
 document.  Needless to say, if you don't know what a virtualenv is, you should
 probably figure that out before continuing.
 .. _requirements-apple:
 Problems with Imagemagick & PDFs
 --------------------------------
 Some users have `run into problems`_ with getting ImageMagick to do its thing
 with PDFs.  Often this is the case with Apple systems using Homebrew, but some
 Linux distributions have been a problem as well.  The solution appears to be to install
 ghostscript as well as ImageMagick:
 .. _run into problems: https://github.com/the-paperless-project/paperless/issues/25
 .. code:: bash
    $ brew install ghostscript
    $ brew install imagemagick
    $ brew install libmagic
 .. _requirements-baremetal:
 Python-specific Requirements: No Virtualenv
 -------------------------------------------
 If you don't care to use a virtual env, then installation of the Python
 dependencies is easy:
 .. code:: bash
    $ pip install --user --requirement /path/to/paperless/requirements.txt
 This will download and install all of the requirements into
 ``${HOME}/.local``.  Remember that your distribution may be using ``pip3`` as
 mentioned above.
 .. _requirements-virtualenv:
 Python-specific Requirements: Virtualenv
 ----------------------------------------
 Using a virtualenv for this is pretty straightforward: create a virtualenv,
 enter it, and install the requirements using the ``requirements.txt`` file:
 .. code:: bash
    $ virtualenv --python=/path/to/python3 /path/to/arbitrary/directory
    $ . /path/to/arbitrary/directory/bin/activate
    $ pip install  --requirement /path/to/paperless/requirements.txt
 Now you're ready to go.  Just remember to enter (activate) your virtualenv
 whenever you want to use Paperless.
 .. _requirements-documentation:
 Documentation
 -------------
 As generation of the documentation is not required for the use of Paperless,
 dependencies for this process are not included in ``requirements.txt``.  If
 you'd like to generate your own docs locally, you'll need to:
 .. code:: bash
    $ pip install sphinx
 and then cd into the ``docs`` directory and type ``make html``.
 If you are using Docker, you can use the following commands to build the
 documentation and run a webserver serving it on `port 8001`_:
 .. code:: bash
    $ pwd
    /path/to/paperless
    $ docker build -t paperless:docs -f docs/Dockerfile .
    $ docker run --rm -it -p "8001:8000" paperless:docs
 .. _port 8001: http://127.0.0.1:8001
--- a/docs/requirements.txt
+++ b/docs/requirements.txt
--- a/docs/scanners.rst
+++ b/docs/scanners.rst
@@ -1,7 +1,8 @@
 .. _scanners:
-Scanner Recommendations
+***********************
-=======================
+Scanner recommendations
 ***********************
 As Paperless operates by watching a folder for new files, it doesn't care what
 scanner you use, but sometimes finding a scanner that will write to an FTP,
@@ -23,16 +24,19 @@ that works right for you based on recommentations from other Paperless users.
 +---------+----------------+-----+-----+-----+----------------+
 | Fujitsu | `ix500`_       | yes |     | yes | `eonist`_      |
 +---------+----------------+-----+-----+-----+----------------+
 | Fujitsu | `S1300i`_      | yes |     | yes | `jonaswinkler`_|
 +---------+----------------+-----+-----+-----+----------------+
 .. _ADS-1500W: https://www.brother.ca/en/p/ads1500w
 .. _MFC-J6930DW: https://www.brother.ca/en/p/MFCJ6930DW
 .. _MFC-J5910DW: https://www.brother.co.uk/printers/inkjet-printers/mfcj5910dw
 .. _MFC-9142CDN: https://www.brother.co.uk/printers/laser-printers/mfc9140cdn
-.. _ix500: http://www.fujitsu.com/us/products/computing/peripheral/scanners/scansnap/ix500/
+.. _ix500: https://www.fujitsu.com/global/products/computing/peripheral/scanners/scansnap/ix500/
 .. _S1300i: https://www.fujitsu.com/global/products/computing/peripheral/scanners/soho/s1300i/
 .. _danielquinn: https://github.com/danielquinn
 .. _ayounggun: https://github.com/ayounggun
 .. _bmsleight: https://github.com/bmsleight
 .. _eonist: https://github.com/eonist
 .. _REOLDEV: https://github.com/REOLDEV
-
+.. _jonaswinkler: https://github.com/jonaswinkler
--- a/docs/screenshots.rst
+++ b/docs/screenshots.rst
@@ -1,16 +0,0 @@
 .. _screenshots:
 Screenshots
 ===========
 Once everything is set up, log in to Paperless using the web front-end.
 .. image:: ./_static/Screenshot_first_run_login.png 
 Nice clean interface
 .. image:: ./_static/Screenshot_first_logged.png 
 Some documents loaded in via FTP, or using the scanner's FTP.
 .. image:: ./_static/Screenshot_upload_and_scanned.png 
--- a/docs/setup.rst
+++ b/docs/setup.rst
@@ -1,500 +1,187 @@
 .. _setup:
 *****
 Setup
-=====
+*****
 Paperless isn't a very complicated app, but there are a few components, so some
 basic documentation is in order.  If you follow along in this document and
 still have trouble, please open an `issue on GitHub`_ so I can fill in the
 gaps.
 .. _issue on GitHub: https://github.com/the-paperless-project/paperless/issues
 .. _setup-download:
 Download
--------
+########
 The source is currently only available via GitHub, so grab it from there,
-either by using ``git``:
+by using ``git``:
 .. code:: bash
-    $ git clone https://github.com/the-paperless-project/paperless.git
+    $ git clone https://github.com/jonaswinkler/paperless-ng.git
    $ cd paperless
-or just download the tarball and go that route:
+Installation
-
+############
 .. code:: bash
    $ cd to the directory where you want to run Paperless
    $ wget https://github.com/the-paperless-project/paperless/archive/master.zip
    $ unzip master.zip
    $ cd paperless-master
 .. _setup-installation:
 Installation & Configuration
 ----------------------------
 You can go multiple routes with setting up and running Paperless:
- * The `bare metal route`_
+* The `docker route`_
- * The `docker route`_
+* The `bare metal route`_
 * A suggested `linux containers route`_
 The recommended setup route is docker, since it takes care of all dependencies
 for you.
 The `docker route`_ is quick & easy.
-The `bare metal route`_ is a bit more complicated to setup but makes it easier
+The `bare metal route`_ is more complicated to set up but makes it easier
 should you want to contribute some code back.
-The `linux containers route`_ is quick, but makes alot of assumptions on the
+Docker Route
-set-up, on the other hand the script could be used to install on a base
+============
 debian or ubuntu server.
-.. _docker route: setup-installation-docker_
+1.  Install `Docker`_ and `docker-compose`_. [#compose]_
 .. _bare metal route: setup-installation-bare-metal_
 .. _Docker Machine: https://docs.docker.com/machine/
-.. _setup-installation-bare-metal:
+    .. caution::
-Standard (Bare Metal)
+        If you want to use the included ``docker-compose.yml.example`` file, you
-+++++++++++++++++++++
+        need to have at least Docker version **17.09.0** and docker-compose
        version **1.17.0**.
-1. Install the requirements as per the :ref:`requirements <requirements>` page.
+        See the `Docker installation guide`_ on how to install the current
-2. Within the extract of master.zip go to the ``src`` directory.
+        version of Docker for your operating system or Linux distribution of
-3. Copy ``../paperless.conf.example`` to ``/etc/paperless.conf`` and open it in
+        choice. To get an up-to-date version of docker-compose, follow the
-   your favourite editor.  As this file contains passwords.  It should only be
+        `docker-compose installation guide`_ if your package repository doesn't
-   readable by user root and paperless!  Set the values for:
+        include it.
-   Set the values for:
+        .. _Docker installation guide: https://docs.docker.com/engine/installation/
        .. _docker-compose installation guide: https://docs.docker.com/compose/install/
-    * ``PAPERLESS_CONSUMPTION_DIR``: this is where your documents will be
+2.  Create a copy of ``docker-compose.yml.example`` as ``docker-compose.yml``
-      dumped to be consumed by Paperless.
+    and a copy of ``docker-compose.env.example`` as ``docker-compose.env``.
-    * ``PAPERLESS_OCR_THREADS``: this is the number of threads the OCR process
+    You'll be editing both these files: taking a copy ensures that you can
-      will spawn to process document pages in parallel.
+    ``git pull`` to receive updates without risking merge conflicts with your
-    * ``PAPERLESS_PASSPHRASE``: this is only required if you want to use GPG to
+    modified versions of the configuration files.
-      encrypt your document files.  This is the passphrase Paperless uses to
+3.  Modify ``docker-compose.yml`` to your preferences. You should change the path
-      encrypt/decrypt the original documents.  Don't worry about defining this
+    to the consumption directory in this file. Find the line that specifies where
-      if you don't want to use encryption (the default).
+    to mount the consumption directory:
-   Note also that if you're using the ``runserver`` as mentioned below, you
+    .. code::
   should make sure that PAPERLESS_DEBUG="true" or is just commented out as
   this is the default.
-4. Initialise the SQLite database with ``./manage.py migrate``.
+        - ./consume:/usr/src/paperless/consume
 5. Collect the static files for the webserver with ``./manage.py collectstatic``.
 6. Create a user for your Paperless instance with
   ``./manage.py createsuperuser``. Follow the prompts to create your user.
 7. Start the webserver with ``./manage.py runserver <IP>:<PORT>``.
   If no specific IP or port is given, the default is ``127.0.0.1:8000`` also
   known as http://localhost:8000/.
   You should now be able to visit your (empty) installation at
   `Paperless webserver`_ or whatever you chose before.  You can login with the
   user/pass you created in #5.
-8. In a separate window, change to the ``src`` directory in this repo again,
+    Replace the part BEFORE the colon with a local directory of your choice:
   but this time, you should start the consumer script with
   ``./manage.py document_consumer``.
 9. Scan something or put a file into the  ``CONSUMPTION_DIR``.
 10. Wait a few minutes
 11. Visit the document list on your webserver, and it should be there, indexed
    and downloadable.
-.. caution::
+    .. code::
-    This installation is not secure. Once everything is working head over to
+        - /home/jonaswinkler/paperless-inbox:/usr/src/paperless/consume
    `Making things more permanent`_
-.. _Paperless webserver: http://127.0.0.1:8000
+    Don't change the part after the colon or Paperless won't find your documents.
 .. _Making things more permanent: setup-permanent_
 .. _setup-installation-docker:
 Docker Method
 +++++++++++++
 1. Install `Docker`_.
   .. caution::
      As mentioned earlier, this guide assumes that you use Docker natively
      under Linux. If you are using `Docker Machine`_ under Mac OS X or
      Windows, you will have to adapt IP addresses, volume-mounting, command
      execution and maybe more.
 2. Install `docker-compose`_. [#compose]_
   .. caution::
       If you want to use the included ``docker-compose.yml.example`` file, you
       need to have at least Docker version **1.12.0** and docker-compose
       version **1.9.0**.
       See the `Docker installation guide`_ on how to install the current
       version of Docker for your operating system or Linux distribution of
       choice. To get an up-to-date version of docker-compose, follow the
       `docker-compose installation guide`_ if your package repository doesn't
       include it.
       .. _Docker installation guide: https://docs.docker.com/engine/installation/
       .. _docker-compose installation guide: https://docs.docker.com/compose/install/
 3. Create a copy of ``docker-compose.yml.example`` as ``docker-compose.yml``
   and a copy of ``docker-compose.env.example`` as ``docker-compose.env``.
   You'll be editing both these files: taking a copy ensures that you can
   ``git pull`` to receive updates without risking merge conflicts with your
   modified versions of the configuration files.
 4. Modify ``docker-compose.yml`` to your preferences, following the
   instructions in comments in the file. The only change that is a hard
   requirement is to specify where the consumption directory should
   mount. [#dockercomposeyml]_
 	 .. caution::
 	     If you are using NFS mounts for the consume directory you also need to
 			 change the command to turn off inotify as it doesn't work with NFS
 			 ``command: ["document_consumer", "--no-inotify"]``
-5. Modify ``docker-compose.env`` and adapt the following environment variables:
+4.  Modify ``docker-compose.env``, following the comments in the file. The
    most important change is to set ``USERMAP_UID`` and ``USERMAP_GID``
    to the uid and gid of your user on the host system. This ensures that
    both the docker container and you on the host machine have write access
    to the consumption directory. If your UID and GID on the host system are
    1000 (the default for the first normal user on most systems), it will
    work out of the box without any modifications.
-   ``PAPERLESS_PASSPHRASE``
+5. Run ``docker-compose up -d``. This will create and start the necessary
     This is the passphrase Paperless uses to encrypt/decrypt the original
     document.  If you aren't planning on using GPG encryption, you can just
     leave this undefined.
   ``PAPERLESS_OCR_THREADS``
     This is the number of threads the OCR process will spawn to process
     document pages in parallel. If the variable is not set, Python determines
     the core-count of your CPU and uses that value.
   ``PAPERLESS_OCR_LANGUAGES``
     If you want the OCR to recognize other languages in addition to the
     default English, set this parameter to a space separated list of
     three-letter language-codes after `ISO 639-2/T`_. For a list of available
     languages -- including their three letter codes -- see the
     `Alpine packagelist`_.
   ``USERMAP_UID`` and ``USERMAP_GID``
     If you want to mount the consumption volume (directory ``/consume`` within
     the containers) to a host-directory -- which you probably want to do --
     access rights might be an issue. The default user and group ``paperless``
     in the containers have an id of 1000. The containers will enforce that the
     owning group of the consumption directory will be ``paperless`` to be able
     to delete consumed documents. If your host-system has a group with an ID
     of 1000 and you don't want this group to have access rights to the
     consumption directory, you can use ``USERMAP_GID`` to change the id in the
     container and thus the one of the consumption directory. Furthermore, you
     can change the id of the default user as well using ``USERMAP_UID``.
  ``PAPERLESS_USE_SSL``
    If you want Paperless to use SSL for the user interface, set this variable
    to ``true``. You also need to copy your certificate and key to the ``data``
    directory, named ``ssl.cert`` and ``ssl.key``.
    This is not an ideal solution and, if possible, a reverse proxy with nginx
    is preferred.
 6. Run ``docker-compose up -d``. This will create and start the necessary
   containers.
 7. To be able to log in, you will need a superuser. To create it, execute the
   following command:
-   .. code-block:: shell-session
+6.  To be able to log in, you will need a superuser. To create it, execute the
    following command:
-       $ docker-compose run --rm webserver createsuperuser
+    .. code-block:: shell-session
-   This will prompt you to set a username (default ``paperless``), an optional
+        $ docker-compose run --rm webserver createsuperuser
   e-mail address and finally a password.
 8. The default ``docker-compose.yml`` exports the webserver on your local port
   8000. If you haven't adapted this, you should now be able to visit your
   `Paperless webserver`_ at ``http://127.0.0.1:8000`` (or
   ``https://127.0.0.1:8000`` if you enabled SSL). You can log in with the
   user and password you just created.
 9. Add files to the consumption directory in whichever way you prefer. Here are two
   possible options:
-   1. Mount the consumption directory to a local host path by modifying your
+    This will prompt you to set a username, an optional e-mail address and
-      ``docker-compose.yml``:
+    finally a password.
      .. code-block:: diff
         diff --git a/docker-compose.yml b/docker-compose.yml
         --- a/docker-compose.yml
         +++ b/docker-compose.yml
         @@ -17,9 +18,8 @@ services:
                  volumes:
                      - paperless-data:/usr/src/paperless/data
                      - paperless-media:/usr/src/paperless/media
         -            - /consume
         +            - /local/path/you/choose:/consume
      .. danger::
          While the consumption container will ensure at startup that it can
          **delete** a consumed file from a host-mounted directory, it might
          not be able to **read** the document in the first place if the access
          rights to the file are incorrect.
          Make sure that the documents you put into the consumption directory
          will either be readable by everyone (``chmod o+r file.pdf``) or
          readable by the default user or group id 1000 (or the one you have
          set with ``USERMAP_UID`` or ``USERMAP_GID`` respectively).
   2. Use ``docker cp`` to copy your files directly into the container:
      .. code-block:: shell-session
         $ # Identify your containers
         $ docker-compose ps
                 Name                       Command                State     Ports
         -------------------------------------------------------------------------
         paperless_consumer_1    /sbin/docker-entrypoint.sh ...   Exit 0
         paperless_webserver_1   /sbin/docker-entrypoint.sh ...   Exit 0
         $ docker cp /path/to/your/file.pdf paperless_consumer_1:/consume
      ``docker cp`` is a one-shot-command, just like ``cp``. This means that
      every time you want to consume a new document, you will have to execute
      ``docker cp`` again. You can of course automate this process, but option
      1 is generally the preferred one.
      .. danger::
          ``docker cp`` will change the owning user and group of a copied file
          to the acting user at the destination, which will be ``root``.
          You therefore need to ensure that the documents you want to copy into
          the container are readable by everyone (``chmod o+r file.pdf``)
          before copying them.
 7.  The default ``docker-compose.yml`` exports the webserver on your local port
    8000. If you haven't adapted this, you should now be able to visit your
    Paperless instance at ``http://127.0.0.1:8000``. You can log in with the
    user and password you just created.
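 To find the values for ``USERMAP_UID`` and ``USERMAP_GID`` described above,
 you can ask the host system for your own numeric user and group id. The
 snippet below is a minimal sketch: ``print_usermap`` is a hypothetical helper
 name, and it only prints the two settings as ``KEY=VALUE`` lines, assuming
 that is the form used in your ``docker-compose.env``. Copying them into the
 file is left to you.

 .. code:: bash

     # Hypothetical helper: print the current user's uid and gid
     # as KEY=VALUE lines (assumed to match docker-compose.env).
     print_usermap() {
         echo "USERMAP_UID=$(id -u)"
         echo "USERMAP_GID=$(id -g)"
     }

     print_usermap

 Run it as the user who owns the consumption directory on the host.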
 .. _Docker: https://www.docker.com/
 .. _docker-compose: https://docs.docker.com/compose/install/
 .. _ISO 639-2/T: https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
 .. _Alpine packagelist: https://pkgs.alpinelinux.org/packages?name=tesseract-ocr-data*&arch=x86_64
 .. [#compose] You of course don't have to use docker-compose, but it
   simplifies deployment immensely. If you know your way around Docker, feel
   free to tinker around without using compose!
 .. [#dockercomposeyml] If you're upgrading your docker-compose images from
   version 1.1.0 or earlier, you might need to change in the
   ``docker-compose.yml`` file the ``image: pitkley/paperless`` directive in
   both the ``webserver`` and ``consumer`` sections to ``build: ./`` as per the
   newer ``docker-compose.yml.example`` file
 Bare Metal Route
 ================
-.. _setup-permanent:
+.. warning::
-Making Things a Little more Permanent
+    TBD. Use Docker for now.
 -------------------------------------
-Once you've tested things and are happy with the work flow, you should secure
+Migration to paperless-ng
-the installation and automate the process of starting the webserver and
+#########################
 consumer.
 At its core, paperless-ng is still paperless and fully compatible. However, some
 things have changed under the hood, so you need to adapt your setup depending on
 how you installed paperless. The important things to keep in mind are as follows.
-.. _setup-permanent-webserver:
+* Read the :ref:`paperless_changelog` and take note of breaking changes.
 * It is recommended to use postgresql as the database now. The docker-compose
  deployment will automatically create a postgresql instance and instruct
  paperless to use it. This means that if you use the docker-compose script
  with your current paperless media and data volumes and used the default
  sqlite database, **it will not use your sqlite database and it may seem
  as if your documents are gone**. You may use the provided
  ``docker-compose.yml.sqlite.example`` script, which does not use postgresql.
 * The task scheduler of paperless, which is used to execute periodic tasks
  such as email checking and maintenance, requires a `redis`_ message broker
  instance. The docker-compose route takes care of that.
 * The layout of the folder structure for your documents and data remains the
  same.
 * The frontend needs to be built from source. The docker image takes care of
  that.
-Using a Real Webserver
+Migration to paperless-ng is then performed in a few simple steps:
 ++++++++++++++++++++++
-The default is to use Django's development server, as that's easy and does the
+1.  Do a backup for two purposes: If something goes wrong, you still have your
-job well enough on a home network. However it is heavily discouraged to use
+    data. Second, if you don't like paperless-ng, you can switch back to
-it for more than that.
+    paperless.
-If you want to do things right you should use a real webserver capable of
+2.  Replace the paperless source with paperless-ng. If you're using git, this
-handling more than one thread. You will also have to let the webserver serve
+    is done by:
 the static files (CSS, JavaScript) from the directory configured in
 ``PAPERLESS_STATICDIR``.  The default static files directory is ``../static``.
-For that you need to activate your virtual environment and collect the static
+    .. code:: bash
 files with the command:
-.. code:: bash
+        $ git remote set-url origin https://github.com/jonaswinkler/paperless-ng
        $ git pull
-    $ cd <paperless directory>/src
+3.  If you are using docker, copy ``docker-compose.yml.example`` to
-    $ ./manage.py collectstatic
+    ``docker-compose.yml`` and ``docker-compose.env.example`` to
    ``docker-compose.env``. Make adjustments to these files as necessary.
    See `docker route`_ for details.
 4.  Update paperless. See :ref:`administration-updating` for details.
-Apache
+5.  Start paperless-ng.
 ~~~~~~
-This is a configuration supplied by `steckerhalter`_ on GitHub.  It uses Apache
+    .. code:: bash
 and mod_wsgi, with a Paperless installation in ``/home/paperless/``:
-.. code:: apache
+        $ docker-compose up
-    <VirtualHost *:80>
+    This will also migrate your database as usual. Verify by inspecting the
-        ServerName example.com
+    output that the migration was successfully executed. CTRL-C will then
    gracefully stop the container. After that, you can start paperless-ng as
    usual with:
-        Alias /static/ /home/paperless/paperless/static/
+    .. code:: bash
        <Directory /home/paperless/paperless/static>
            Require all granted
        </Directory>
-        WSGIScriptAlias / /home/paperless/paperless/src/paperless/wsgi.py
+        $ docker-compose up -d
        WSGIDaemonProcess example.com user=paperless group=paperless threads=5 python-path=/home/paperless/paperless/src:/home/paperless/.env/lib/python3.6/site-packages
        WSGIProcessGroup example.com
-        <Directory /home/paperless/paperless/src/paperless>
+6.  Paperless installed a permanent redirect to ``admin/`` in your browser. This
-            <Files wsgi.py>
+    redirect is still in place and prevents access to the new UI. Clear 
-                Require all granted
+    everything related to paperless in your browser's data in order to fix
-            </Files>
+    this issue.
        </Directory>
    </VirtualHost>
-.. _steckerhalter: https://github.com/steckerhalter
+Moving data from sqlite to postgresql
 =====================================
 .. warning::
-Nginx + Gunicorn
+    TBD.
 ~~~~~~~~~~~~~~~~
 If you're using Nginx, the most common setup is to combine it with a
 Python-based server like Gunicorn so that Nginx is acting as a proxy.  Below is
 a copy of a simple Nginx configuration fragment making use of a gunicorn
 instance listening on localhost port 8000.
 .. code:: nginx
    server {
        listen 80;
        index index.html index.htm index.php;
        access_log /var/log/nginx/paperless_access.log;
        error_log /var/log/nginx/paperless_error.log;
        location /static {
            autoindex on;
            alias <path-to-paperless-static-directory>;
        }
        location / {
            proxy_set_header Host $http_host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_pass http://127.0.0.1:8000;
        }
    }
 The gunicorn server can be started with the command:
 .. code-block:: shell
    $ <path-to-paperless-virtual-environment>/bin/gunicorn --pythonpath=<path-to-paperless>/src paperless.wsgi -w 2
 .. _setup-permanent-standard-systemd:
 Standard (Bare Metal + Systemd)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 If you're running on a bare metal system that's using Systemd, you can use the
 service unit files in the ``scripts`` directory to set this up.
 1. You'll need to create a group and user called ``paperless`` (without login)
 2. Setup Paperless to be in a place that this new user can read and write to.
 3. Ensure ``/etc/paperless`` is readable by the ``paperless`` user.
 4. Copy the service file from the ``scripts`` directory to
   ``/etc/systemd/system``.
 .. code-block:: bash
    $ cp /path/to/paperless/scripts/paperless-consumer.service /etc/systemd/system/
    $ cp /path/to/paperless/scripts/paperless-webserver.service /etc/systemd/system/
 5. Edit the service file to point the ``ExecStart`` line to the proper location
   of your paperless install, referencing the appropriate Python binary. For
   example:
   ``ExecStart=/path/to/python3 /path/to/paperless/src/manage.py document_consumer``.
 6. Start and enable (so they start on boot) the services.
 .. code-block:: bash
    $ systemctl enable paperless-consumer
    $ systemctl enable paperless-webserver
    $ systemctl start paperless-consumer
    $ systemctl start paperless-webserver
 .. _setup-permanent-standard-upstart:
 Standard (Bare Metal + Upstart)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Ubuntu 14.04 and earlier use the `Upstart`_ init system to start services
 during the boot process. To configure Upstart to run Paperless automatically
 after restarting your system:
 1. Change to the directory where Upstart's configuration files are kept:
   ``cd /etc/init``
 2. Create a new file: ``sudo nano paperless-server.conf``
 3. In the newly-created file enter::
    start on (local-filesystems and net-device-up IFACE=eth0)
    stop on shutdown
    respawn
    respawn limit 10 5
    script
      exec <path to paperless virtual environment>/bin/gunicorn --pythonpath=<path to paperless>/src paperless.wsgi -w 2
    end script
   Note that you'll need to replace the path placeholders with the paths to
   your virtual environment and to your installation directory.
  If you are using a network interface other than ``eth0``, you will have to
  change ``IFACE=eth0``. For example, if you are connected via WiFi, you will
  likely need to replace ``eth0`` above with ``wlan0``. To see all interfaces,
  run ``ifconfig -a``.
  Save the file.
 4. Create a new file: ``sudo nano paperless-consumer.conf``
 5. In the newly-created file enter::
    start on (local-filesystems and net-device-up IFACE=eth0)
    stop on shutdown
    respawn
    respawn limit 10 5
    script
      exec <path to paperless virtual environment>/bin/python <path to paperless>/manage.py document_consumer
    end script
  Replace the path placeholder and ``eth0`` with the appropriate value and save the file.
 These two configuration files together will start both the Paperless webserver
 and document consumer processes when the file system and network interface
 specified is available after boot. Furthermore, if either process ever exits
 unexpectedly, Upstart will try to restart it a maximum of 10 times within a 5
 second period.
 .. _Upstart: http://upstart.ubuntu.com/
 .. _setup-permanent-docker:
 Docker
 ~~~~~~
 If you're using Docker, you can set a restart-policy_ in the
 ``docker-compose.yml`` to have the containers automatically start with the
 Docker daemon.
 .. _restart-policy: https://docs.docker.com/engine/reference/commandline/run/#restart-policies-restart
  .. _redis: https://redis.io/
--- a/docs/usage_overview.rst
+++ b/docs/usage_overview.rst
@@ -0,0 +1,216 @@
 **************
 Usage Overview
 **************
 Paperless is an application that manages your personal documents. With
 the help of a document scanner (see :ref:`scanners`), paperless transforms
 your unwieldy physical document binders into a searchable archive and
 provides many utilities for finding and managing your documents.
 Terms and definitions
 #####################
 Paperless essentially consists of two different parts for managing your
 documents:
 * The *consumer* watches a specified folder and adds all documents in that
  folder to paperless.
 * The *web server* provides a UI that you use to manage and search for your
  scanned documents.
 Each document has a couple of fields that you can assign to it:
 * A *Document* is a piece of paper that sometimes contains valuable
  information.
 * The *correspondent* of a document is the person, institution or company that
  a document either originates from, or is sent to.
 * A *tag* is a label that you can assign to documents. Think of tags as more
  powerful folders: multiple documents can be grouped together with a single
  tag, but a single document can also have multiple tags. This is not
  possible with folders. The reason folders are not implemented in paperless
  is simply that tags are much more versatile than folders.
 * A *document type* is used to demarcate the type of a document such as letter,
  bank statement, invoice, contract, etc. It is used to identify what a document
  is about.
 * The *date added* of a document is the date the document was scanned into
  paperless. You cannot and should not change this date.
 * The *date created* of a document is the date the document was initially issued.
  This can be the date you bought a product, the date you signed a contract, or
  the date a letter was sent to you.
 * The *archive serial number* (short: ASN) of a document is the identifier of
  the document in your physical document binders. See
  :ref:`usage-recommended_workflow` below.
 * The *content* of a document is the text that was OCR'ed from the document.
  This text is fed into the search engine and is used for matching tags,
  correspondents and document types.
 .. TODO: hyperref
 Frontend overview
 #################
 .. warning::
    TBD. Add some fancy screenshots!
 Adding documents to paperless
 #############################
 Once you've got Paperless set up, you need to start feeding documents into it.
 Currently, there are three options: the consumption directory, IMAP (email), and
 HTTP POST.
 The consumption directory
 =========================
 The primary method of getting documents into your database is by putting them in
 the consumption directory.  The consumer runs in an infinite
 loop looking for new additions to this directory and when it finds them, it goes
 about the process of parsing them with the OCR, indexing what it finds, and storing
 it in the media directory.
 Getting stuff into this directory is up to you.  If you're running Paperless
 on your local computer, you might just want to drag and drop files there, but if
 you're running this on a server and want your scanner to automatically push
 files to this directory, you'll need to set up some sort of service to accept the
 files from the scanner.  Typically, you're looking at an FTP server like
 `Proftpd`_ or a Windows folder share with `Samba`_.
 .. _Proftpd: http://www.proftpd.org/
 .. _Samba: http://www.samba.org/
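 If you copy files into the consumption directory by hand or from a script,
 the consumer must be able to read them. The following is a minimal sketch
 under stated assumptions: ``drop_into_consume`` is a hypothetical helper
 name, both paths are placeholders you choose yourself, and making the copy
 world-readable is only needed when the consumer runs as a different user
 (for example inside a container).

 .. code:: bash

     # Hypothetical helper: copy a scan into the consumption directory
     # and make the copy readable by everyone.
     drop_into_consume() {
         file="$1"
         consume_dir="$2"
         cp -- "$file" "$consume_dir"/
         chmod o+r "$consume_dir/$(basename -- "$file")"
     }

     # Example with placeholder paths:
     # drop_into_consume /path/to/scan.pdf /path/to/consumption/dir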
 .. TODO: hyperref to configuration of the location of this magic folder.
 IMAP (Email)
 ============
 Another handy way to get documents into your database is to email them to
 yourself.  The typical use-case would be to be out for lunch and want to send a
 copy of the receipt back to your system at home.  Paperless can be taught to
 pull emails down from an arbitrary account and dump them into the consumption
 directory where the consumer will follow the
 usual pattern on consuming the document.
 Some things you need to know about this feature:
 * It's disabled by default. By setting the values below it will be enabled.
 * It's been tested in a limited environment, so it may not work for you (please
  submit a pull request if you can!)
 * It's designed to **delete mail from the server once consumed**.  So don't go
  pointing this to your personal email account and wonder where all your stuff
  went.
 * Currently, only one photo (attachment) per email will work.
 So, with all that in mind, here's what you do to get it running:
 1. Set up a new email account somewhere, or if you're feeling daring, create a
   folder in an existing email box and note the path to that folder.
 2. In ``/etc/paperless.conf`` set all of the appropriate values in
   ``PATHS AND FOLDERS`` and ``SECURITY``.
   If you decided to use a subfolder of an existing account, then make sure you
   set ``PAPERLESS_CONSUME_MAIL_INBOX`` accordingly here.  You also have to set
   the ``PAPERLESS_EMAIL_SECRET`` to something you can remember 'cause you'll
   have to include that in every email you send.
 3. Restart paperless.  Paperless will check
   the configured email account at startup and from then on every 10 minutes
   for something new and pull down whatever it finds.
 4. Send yourself an email!  Note that the subject is treated as the file name,
   so if you set the subject to ``Correspondent - Title - tag,tag,tag``, you'll
   get what you expect.  Also, you must include the aforementioned secret
   string in every email so the fetcher knows that it's safe to import.
   Note that Paperless only allows the email title to consist of safe characters
   to be imported. These consist of alpha-numeric characters and ``-_ ,.'``.
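 Before sending a mail, you can check a subject against the character rule
 described above. This is only an illustration of that rule, not the check
 Paperless itself performs: ``is_safe_subject`` is a hypothetical helper name,
 and the pattern assumes that "alpha-numeric" means ASCII letters and digits.

 .. code:: bash

     # Hypothetical helper: succeed only if the subject consists of ASCII
     # letters, digits and the characters - _ space , . '
     is_safe_subject() {
         printf '%s\n' "$1" | grep -Eq "^[A-Za-z0-9_ ,.'-]+\$"
     }

     if is_safe_subject "Bank - Statement 2020 - finance,taxes"; then
         echo "subject looks safe"
     fi

 A subject containing a slash, for example, would be rejected by this check.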
 REST API
 ========
 You can also submit a document using the REST API, see the API section for details.
 .. _usage-recommended_workflow:
 The recommended workflow
 ########################
 Once you have familiarized yourself with paperless and are ready to use it
 for all your documents, the recommended workflow for managing your documents
 is as follows. This workflow also takes into account that some documents
 have to be kept in physical form, but still ensures that you get all the
 advantages for these documents as well.
 Preparations in paperless
 =========================
 * Create an inbox tag that gets assigned to all new documents.
 * Create a TODO tag.
 Processing of the physical documents
 ====================================
 Keep a physical inbox. Whenever you receive a document that you need to 
 archive, put it into your inbox. Regularly, do the following for all documents
 in your inbox:
 1.  For each document, decide if you need to keep the document in physical
    form. This applies to certain important documents, such as contracts and
    certificates.
 2.  If you need to keep the document, write a running number on the document
    before scanning, starting at one and counting upwards. This is the archive
    serial number, or ASN in short.
 3.  Scan the document.
 4.  If the document has an ASN assigned, store it in a *single* binder, sorted
    by ASN. Don't order this binder in any other way.
 5.  If the document has no ASN, throw it away. Yay!
 Over time, you will notice that your physical binder will fill up. If it is
 full, label the binder with the range of ASNs in this binder (e.g., "Documents
 1 to 343"), store the binder in your cellar or elsewhere, and start a new
 binder.
 The idea behind this process is that you will never have to use the physical
 binders to find a document. If you need a specific physical document, you
 may find this document by:
 1.  Searching in paperless for the document.
 2.  Identifying the ASN of the document, since it appears on the scan.
 3.  Grabbing the relevant document binder and getting the document. This is
    easy since they are sorted by ASN.
 Processing of documents in paperless
 ====================================
 Once you have scanned in a document, proceed in paperless as follows.
 1.  If the document has an ASN, assign the ASN to the document.
 2.  Assign a correspondent to the document (e.g., your employer, bank, etc.).
    This isn't strictly necessary but helps in finding a document when you need
    it.
 3.  Assign a document type (e.g., invoice, bank statement, etc.) to the
    document. This isn't strictly necessary but helps in finding a document
    when you need it.
 4.  Assign a proper title to the document (the name of an item you bought, the
    subject of the letter, etc)
 5.  Check that the date of the document is correct. Paperless tries to read
    the date from the content of the document, but this fails sometimes if the
    OCR is bad or multiple dates appear on the document.
 6.  Remove inbox tags from the documents.
 Task management
 ===============
 Some documents require attention and require you to act on the document. You
 may take two different approaches to handle these documents based on how
 regularly you intend to use paperless and scan documents.
 * If you scan and process your documents in paperless regularly, assign a
  TODO tag to all scanned documents that you need to process. Create a saved
  view on the dashboard that shows all documents with this tag.
 * If you do not scan documents regularly and use paperless solely for archiving,
  create a physical todo box next to your physical inbox and put documents you
  need to process in the TODO box. Once you have performed the task associated with
  the document, move it to the inbox.
--- a/docs/utilities.rst
+++ b/docs/utilities.rst
@@ -1,284 +0,0 @@
 .. _utilities:
 Utilities
 =========
 There are basically three utilities to Paperless: the webserver, consumer, and
 if needed, the exporter.  They're all detailed here.
 .. _utilities-webserver:
 The Webserver
 -------------
 At the heart of it, Paperless is a simple Django webservice, and the entire
 interface is based on Django's standard admin interface.  Once running, visiting
 the URL for your service delivers the admin, through which you can get a
 detailed listing of all available documents, search for specific files, and
 download whatever it is you're looking for.
 .. _utilities-webserver-howto:
 How to Use It
 .............
 The webserver is started via the ``manage.py`` script:
 .. code-block:: shell-session
    $ /path/to/paperless/src/manage.py runserver
 By default, the server runs on localhost, port 8000, but you can change this
 with a few arguments, run ``manage.py --help`` for more information.
 Add the option ``--noreload`` to reduce resource usage. Otherwise, the server
 continuously polls all source files for changes to auto-reload them.
 Note that when exiting this command your webserver will disappear.
 If you want to run this full-time (which is kind of the point)
 you'll need to have it start in the background -- something you'll need to
 figure out for your own system.  To get you started though, there are Systemd
 service files in the ``scripts`` directory.
 .. _utilities-consumer:
 The Consumer
 ------------
 The consumer script runs in an infinite loop, constantly looking at a directory
 for documents to parse and index.  The process is pretty straightforward:
 1. Look in ``CONSUMPTION_DIR`` for a document.  If one is found, go to #2.
   If not, wait 10 seconds and try again.  On Linux, new documents are detected
   instantly via inotify, so there's no waiting involved.
 2. Parse the document with Tesseract
 3. Create a new record in the database with the OCR'd text
 4. Attempt to automatically assign document attributes by doing some guesswork.
   Read up on the :ref:`guesswork documentation<guesswork>` for more
   information about this process.
 5. Encrypt the document (if you have a passphrase set) and store it in the
   ``media`` directory under ``documents/originals``.
 6. Go to #1.
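 The polling part of this loop (steps 1 and 6) can be sketched in a few lines
 of shell. This is an illustration only and not how Paperless implements the
 consumer: ``poll_once`` is a hypothetical helper name, the directory is a
 placeholder, and the sketch merely prints the files a pass would pick up
 instead of parsing them.

 .. code:: bash

     # Hypothetical helper: list the regular files one polling pass
     # over the given directory would hand to the parser.
     poll_once() {
         dir="$1"
         for file in "$dir"/*; do
             # Skip directories and the unmatched glob of an empty directory.
             [ -f "$file" ] || continue
             echo "would consume: $file"
         done
     }

     # Steps 1 and 6 as a loop with the 10 second wait (runs until interrupted):
     # while true; do poll_once /path/to/consumption/dir; sleep 10; done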
 .. _utilities-consumer-howto:
 How to Use It
 .............
 The consumer is started via the ``manage.py`` script:
 .. code-block:: shell-session
    $ /path/to/paperless/src/manage.py document_consumer
 This starts the service that will consume documents as they appear in
 ``CONSUMPTION_DIR``.
 Note that this command runs continuously, so exiting it will mean your consumer
 disappears.  If you want to run this full-time (which is kind of the point)
 you'll need to have it start in the background -- something you'll need to
 figure out for your own system.  To get you started though, there are Systemd
 service files in the ``scripts`` directory.
 Some command line arguments are available to customize the behavior of the
 consumer. By default it will use ``/etc/paperless.conf`` values. Display the
 help with:
 .. code-block:: shell-session
    $ /path/to/paperless/src/manage.py document_consumer --help
 .. _utilities-exporter:
 The Exporter
 ------------
 Tired of fiddling with Paperless, or just want to do something stupid and are
 afraid of accidentally damaging your files?  You can export all of your
 documents into neatly named, dated, and unencrypted files.
 .. _utilities-exporter-howto:
 How to Use It
 .............
 This too is done via the ``manage.py`` script:
 .. code-block:: shell-session
    $ /path/to/paperless/src/manage.py document_exporter /path/to/somewhere/
 This will dump all of your unencrypted documents into ``/path/to/somewhere``
 for you to do with as you please.  The files are accompanied with a special
 file, ``manifest.json`` which can be used to :ref:`import the files
 <utilities-importer>` at a later date if you wish.
 .. _utilities-exporter-howto-docker:
 Docker
 ______
 If you are :ref:`using Docker <setup-installation-docker>`, running the
 exporter is almost as easy.  To mount a volume for exports, follow the
 instructions in the ``docker-compose.yml.example`` file for the ``/export``
 volume (making the changes in your own ``docker-compose.yml`` file, of course).
 Once you have the volume mounted, the command to run an export is:
 .. code-block:: shell-session
   $ docker-compose run --rm consumer document_exporter /export
 If you prefer to use ``docker run`` directly, supplying the necessary commandline
 options:
 .. code-block:: shell-session
   $ # Identify your containers
   $ docker-compose ps
           Name                       Command                State     Ports
   -------------------------------------------------------------------------
   paperless_consumer_1    /sbin/docker-entrypoint.sh ...   Exit 0
   paperless_webserver_1   /sbin/docker-entrypoint.sh ...   Exit 0
   $ # Make sure to replace your passphrase and remove or adapt the id mapping
   $ docker run --rm \
       --volumes-from paperless_data_1 \
       --volume /path/to/arbitrary/place:/export \
       -e PAPERLESS_PASSPHRASE=YOUR_PASSPHRASE \
       -e USERMAP_UID=1000 -e USERMAP_GID=1000 \
       paperless document_exporter /export
 .. _utilities-importer:
 The Importer
 ------------
 Looking to transfer Paperless data from one instance to another, or just want
 to restore from a backup?  This is your go-to toy.
 .. _utilities-importer-howto:
 How to Use It
 .............
 The importer works just like the exporter.  You point it at a directory, and
 the script does the rest of the work:
 .. code-block:: shell-session
    $ /path/to/paperless/src/manage.py document_importer /path/to/somewhere/
 Docker
 ______
 Assuming that you've already gone through the steps above in the
 :ref:`export <utilities-exporter-howto-docker>` section, then the easiest thing
 to do is just re-use the ``/export`` path you already set up:
 .. code-block:: shell-session
   $ docker-compose run --rm consumer document_importer /export
 Similarly, if you're not using docker-compose, you can adjust the export
 instructions above to do the import.
 .. _utilities-retagger:
 Re-running your tagging and correspondent matchers
 --------------------------------------------------
 Say you've imported a few hundred documents and now want to introduce
 a tag or set up a new correspondent, and apply its matching to all of
 the currently-imported docs.  This problem is common enough that
 there are tools for it.
 .. _utilities-retagger-howto:
 How to Do It
 ............
 This too is done via the ``manage.py`` script:
 .. code:: bash
    $ /path/to/paperless/src/manage.py document_retagger
 Run this after changing or adding tagging rules.  It'll loop over all
 of the documents in your database and attempt to match all of your
 tags to them.  If one matches, it'll be applied.  And don't worry, you
 can run this as often as you like; it won't double-tag a document.
 .. code:: bash
    $ /path/to/paperless/src/manage.py document_correspondents
 This is the equivalent command to run after adding or changing a correspondent.
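 If you tend to change tags and correspondents together, the two commands can
 be chained.  This is a minimal sketch rather than a Paperless feature:
 ``rematch_all`` and the ``MANAGE`` variable are hypothetical names introduced
 here, and the default path is the same placeholder used above.
 .. code-block:: bash
    #!/usr/bin/env bash
    # Assumed variable: point MANAGE at your own manage.py.
    MANAGE="${MANAGE:-/path/to/paperless/src/manage.py}"
    # Hypothetical wrapper: run both matchers, stopping at the first failure.
    rematch_all() {
        "$MANAGE" document_retagger &&
        "$MANAGE" document_correspondents
    }
 Calling ``rematch_all`` runs the retagger first and only moves on to the
 correspondent matcher if the retagger exits successfully.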
 .. _utilities-encyption:
 Enabling Encryption
 -------------------
 Let's say you've imported a few documents to play around with paperless and now
 you are using it more seriously and want to enable encryption of your files.
 .. _utilities-encryption-howto:
 Basic Syntax
 .............
 Again we'll use the ``manage.py`` script, passing ``change_storage_type``:
 .. code:: console
    $ /path/to/paperless/src/manage.py change_storage_type --help
    usage: manage.py change_storage_type [-h] [--version] [-v {0,1,2,3}]
                                     [--settings SETTINGS]
                                     [--pythonpath PYTHONPATH] [--traceback]
                                     [--no-color] [--passphrase PASSPHRASE]
                                     {gpg,unencrypted} {gpg,unencrypted}
    This is how you migrate your stored documents from an encrypted state to an
    unencrypted one (or vice-versa)
    positional arguments:
      {gpg,unencrypted}     The state you want to change your documents from
      {gpg,unencrypted}     The state you want to change your documents to
    optional arguments:
      --passphrase PASSPHRASE
                            If PAPERLESS_PASSPHRASE isn't set already, you need to
                            specify it here
 Enabling Encryption
 ...................
 Basic usage to enable encryption of your document store (**USE A MORE SECURE PASSPHRASE**):
 (Note: If ``PAPERLESS_PASSPHRASE`` isn't set already, you need to specify it here)
 .. code:: bash
    $ /path/to/paperless/src/manage.py change_storage_type [--passphrase 'SECR3TP4SSPHRA$E'] unencrypted gpg
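 The example passphrase contains a ``$``, which the shell treats as the start of
 a variable reference unless the value is single-quoted.  The sketch below only
 illustrates that quoting behaviour; the variable names ``double`` and
 ``single`` are made up for the demonstration, and the final command is left
 commented out.
 .. code-block:: bash
    #!/usr/bin/env bash
    # Simulate the usual case where no variable named E holds a value.
    E=""
    # Double quotes (or no quotes) let the shell expand $E, truncating the value.
    double="SECR3TP4SSPHRA$E"
    # Single quotes keep every character literal.
    single='SECR3TP4SSPHRA$E'
    printf 'double-quoted: %s\n' "$double"
    printf 'single-quoted: %s\n' "$single"
    # Setting the variable up front avoids passing --passphrase at all:
    export PAPERLESS_PASSPHRASE="$single"
    # /path/to/paperless/src/manage.py change_storage_type unencrypted gpg
 Only the single-quoted form carries the full passphrase through to the
 command, which matters because documents encrypted with a truncated passphrase
 would need that same truncated value to decrypt.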
 Disabling Encryption
 ....................
 Basic usage to disable encryption of your document store:
 (Note: Again, if ``PAPERLESS_PASSPHRASE`` isn't set already, you need to specify it here)
 .. code:: bash
    $ /path/to/paperless/src/manage.py change_storage_type [--passphrase 'SECR3TP4SSPHRA$E'] gpg unencrypted