Fixes #172

Introduce some creative code around setting of ALLOWED_HOSTS that defaults to ['*']. Also added PAPERLESS_ALLOWED_HOSTS to paperless.conf.example with an explanation as to what it's for
2025-08-03 18:54:40 -05:00 · 2017-01-03 09:52:31 +00:00
79 changed files with 620 additions and 2029 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -68,7 +68,6 @@ db.sqlite3
 .idea
 # Other stuff that doesn't belong
 .virtualenv
 virtualenv
 .vagrant
 docker-compose.yml
--- a/.travis.yml
+++ b/.travis.yml
@@ -8,10 +8,8 @@ matrix:
          env: TOXENV=py34
        - python: 3.5
          env: TOXENV=py35
-        - python: 3.6
+        - python: 3.5
-          env: TOXENV=py36
+          env: TOXENV=pep8
        - python: 3.6
          env: TOXENV=pycodestyle
 install:
    - pip install --requirement requirements.txt
--- a/6
+++ b/6
@@ -35,16 +35,12 @@ RUN groupadd -g 1000 paperless \
    && useradd -u 1000 -g 1000 -d /usr/src/paperless paperless \
    && chown -Rh paperless:paperless /usr/src/paperless
 # Set export directory
 ENV PAPERLESS_EXPORT_DIR /export
 RUN mkdir -p $PAPERLESS_EXPORT_DIR
 # Setup entrypoint
 COPY scripts/docker-entrypoint.sh /sbin/docker-entrypoint.sh
 RUN chmod 755 /sbin/docker-entrypoint.sh
 # Mount volumes
-VOLUME ["/usr/src/paperless/data", "/usr/src/paperless/media", "/consume", "/export"]
+VOLUME ["/usr/src/paperless/data", "/usr/src/paperless/media", "/consume"]
 ENTRYPOINT ["/sbin/docker-entrypoint.sh"]
 CMD ["--help"]
--- a/README.rst
+++ b/README.rst
@@ -6,7 +6,7 @@ Paperless
 |Travis|
 |Dependencies|
-Index and archive all of your scanned paper documents
+Scan, index, and archive all of your paper documents
 I hate paper.  Environmental issues aside, it's a tech person's nightmare:
@@ -23,18 +23,13 @@ it... because paper.  I wrote this to make my life easier.
 How it Works
 ============
-Paperless does not control your scanner, it only helps you deal with what your
+1. Buy a document scanner like `this one`_.
 scanner produces
 1. Buy a document scanner that can write to a place on your network.  If you
   need some inspiration, have a look at the `scanner recommendations`_ page.
   recommended by another user.
 2. Set it up to "scan to FTP" or something similar. It should be able to push
   scanned images to a server without you having to do anything.  If your
   scanner doesn't know how to automatically upload the file somewhere, you can
   always do that manually.  Paperless doesn't care how the documents get into
   its local consumption directory.
-3. Have the target server run the Paperless consumption script to OCR the file
+3. Have the target server run the Paperless consumption script to OCR the PDF
   and index it into a local database.
 4. Use the web frontend to sift through the database and find what you want.
 5. Download the PDF you need/want via the web interface and do whatever you
@@ -52,15 +47,16 @@ Stability
 =========
 Paperless is still under active development (just look at the git commit
-history) so don't expect it to be 100% stable.  You can backup the sqlite3
+history) so don't expect it to be 100% stable.  I'm using it for my own
-database, media directory and your configuration file to be on the safe side.
+documents, but I'm crazy like that.  If you use this and it breaks something,
 you get to keep all the shiny pieces.
 Requirements
 ============
-This is all really a quite simple, shiny, user-friendly wrapper around some
+This is all really a quite simple, shiny, user-friendly wrapper around some very
-very powerful tools.
+powerful tools.
 * `ImageMagick`_ converts the images between colour and greyscale.
 * `Tesseract`_ does the character recognition.
@@ -86,22 +82,22 @@ Similar Projects
 There's another project out there called `Mayan EDMS`_ that has a surprising
 amount of technical overlap with Paperless.  Also based on Django and using
-a consumer model with Tesseract and Unpaper, Mayan EDMS is *much* more
+a consumer model with Tesseract and unpaper, Mayan EDMS is *much* more
-featureful and comes with a slick UI as well, but still in Python 2. It may be
+featureful and comes with a slick UI as well.  It may be that Paperless is
-that Paperless consumes fewer resources, but to be honest, this is just a guess
+better suited for low-resource environments (like a Rasberry Pi), but to be
-as I haven't tested this myself.  One thing's for certain though, *Paperless*
+honest, this is just a guess as I haven't tested this myself.  One thing's
-is a **much** better name.
+for certain though, *Paperless* is a **much** better name.
 Important Note
 ==============
 Document scanners are typically used to scan sensitive documents.  Things like
-your social insurance number, tax records, invoices, etc.  While Paperless
+your social insurance number, tax records, invoices, etc.  While paperless
-encrypts the original files via the consumption script, the OCR'd text is *not*
+encrypts the original PDFs via the consumption script, the OCR'd text is *not*
 encrypted and is therefore stored in the clear (it needs to be searchable, so
 if someone has ideas on how to do that on encrypted data, I'm all ears).  This
-means that Paperless should never be run on an untrusted host.  Instead, I
+means that paperless should never be run on an untrusted host.  Instead, I
 recommend that if you do want to use it, run it locally on a server in your own
 home.
@@ -119,7 +115,7 @@ The thing is, I'm doing ok for money, so I would instead ask you to donate to
 the `United Nations High Commissioner for Refugees`_.  They're doing important
 work and they need the money a lot more than I do.
-.. _scanner recommendations: https://paperless.readthedocs.io/en/latest/scanners.html
+.. _this one: http://www.brother.ca/en-CA/Scanners/11/ProductDetail/ADS1500W?ProductDetail=productdetail
 .. _ImageMagick: http://imagemagick.org/
 .. _Tesseract: https://github.com/tesseract-ocr
 .. _Unpaper: https://www.flameeyes.eu/projects/unpaper
@@ -140,5 +136,5 @@ work and they need the money a lot more than I do.
   :target: https://gitter.im/danielquinn/paperless?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge
 .. |Travis| image:: https://travis-ci.org/danielquinn/paperless.svg?branch=master
   :target: https://travis-ci.org/danielquinn/paperless
-.. |Dependencies| image:: https://www.versioneye.com/user/projects/57b33b81d9f1b00016faa500/badge.svg
+.. |Dependencies| image:: https://www.versioneye.com/user/projects/57b33b81d9f1b00016faa500/badge.svg?style=flat-square
   :target: https://www.versioneye.com/user/projects/57b33b81d9f1b00016faa500
--- a/5
+++ b/5
@@ -12,9 +12,4 @@ Vagrant.configure(VAGRANT_API_VERSION) do |config|
  # Networking details
  config.vm.network "private_network", ip: "172.28.128.4"
  config.vm.provider "virtualbox" do |vb|
    # Customize the amount of memory on the VM:
    vb.memory = "1024"
  end
 end
--- a/docker-compose.yml.example
+++ b/docker-compose.yml.example
@@ -17,7 +17,7 @@ services:
        # value with nothing.
        environment:
            - PAPERLESS_OCR_LANGUAGES=
-        command: ["runserver", "--insecure", "0.0.0.0:8000"]
+        command: ["runserver", "0.0.0.0:8000"]
    consumer:
        image: pitkley/paperless
@@ -26,7 +26,7 @@ services:
            - media:/usr/src/paperless/media
            # You have to adapt the local path you want the consumption
            # directory to mount to by modifying the part before the ':'.
-            - ./consume:/consume
+            - /path/to/arbitrary/place:/consume
            # Likewise, you can add a local path to mount a directory for
            # exporting. This is not strictly needed for paperless to
            # function, only if you're exporting your files: uncomment
--- a/docs/changelog.rst
+++ b/docs/changelog.rst
@@ -1,125 +1,9 @@
 Changelog
 #########
 * 1.1.0
  * Fix for `#283`_, a redirect bug which broke interactions with
    paperless-desktop.  Thanks to `chris-aeviator`_ for reporting it.
  * Addition of an optional new financial year filter, courtesy of
    `David Martin`_ `#256`_
  * Fixed a typo in how thumbnails were named in exports `#285`_, courtesy of
    `Dan Panzarella`_
 * 1.0.0
  * Upgrade to Django 1.11.  **You'll need to run
    ``pip install -r requirements.txt`` after the usual ``git pull`` to
    properly update**.
  * Replace the templatetag-based hack we had for document listing in favour of
    a slightly less ugly solution in the form of another template tag with less
    copypasta.
  * Support for multi-word-matches for auto-tagging thanks to an excellent
    patch from `ishirav`_ `#277`_.
  * Fixed a CSS bug reported by `Stefan Hagen`_ that caused an overlapping of
    the text and checkboxes under some resolutions `#272`_.
  * Patched the Docker config to force the serving of static files.  Credit for
    this one goes to `dev-rke`_ via `#248`_.
  * Fix file permissions during Docker start up thanks to `Pit`_ on `#268`_.
  * Date fields in the admin are now expressed as HTML5 date fields thanks to
    `Lukas Winkler`_'s issue `#278`_
 * 0.8.0
  * Paperless can now run in a subdirectory on a host (``/paperless``), rather
    than always running in the root (``/``) thanks to `maphy-psd`_'s work on
    `#255`_.
 * 0.7.0
  * **Potentially breaking change**: As per `#235`_, Paperless will no longer
    automatically delete documents attached to correspondents when those
    correspondents are themselves deleted.  This was Django's default
    behaviour, but didn't make much sense in Paperless' case.  Thanks to
    `Thomas Brueggemann`_ and `David Martin`_ for their input on this one.
  * Fix for `#232`_ wherein Paperless wasn't recognising ``.tif`` files
    properly.  Thanks to `ayounggun`_ for reporting this one and to
    `Kusti Skytén`_ for posting the correct solution in the Github issue.
 * 0.6.0
  * Abandon the shared-secret trick we were using for the POST API in favour
    of BasicAuth or Django session.
  * Fix the POST API so it actually works.  `#236`_
  * **Breaking change**: We've dropped the use of ``PAPERLESS_SHARED_SECRET``
    as it was being used both for the API (now replaced with a normal auth)
    and form email polling.  Now that we're only using it for email, this
    variable has been renamed to ``PAPERLESS_EMAIL_SECRET``.  The old value
    will still work for a while, but you should change your config if you've
    been using the email polling feature.  Thanks to `Joshua Gilman`_ for all
    the help with this feature.
 * 0.5.0
  * Support for fuzzy matching in the auto-tagger & auto-correspondent systems
    thanks to `Jake Gysland`_'s patch `#220`_.
  * Modified the Dockerfile to prepare an export directory (`#212`_).  Thanks
    to combined efforts from `Pit`_ and `Strubbl`_ in working out the kinks on
    this one.
  * Updated the import/export scripts to include support for thumbnails.  Big
    thanks to `CkuT`_ for finding this shortcoming and doing the work to get
    it fixed in `#224`_.
  * All of the following changes are thanks to `David Martin`_:
    * Bumped the dependency on pyocr to 0.4.7 so new users can make use of
    Tesseract 4 if they so prefer (`#226`_).
    * Fixed a number of issues with the automated mail handler (`#227`_, `#228`_)
    * Amended the documentation for better handling of systemd service files (`#229`_)
    * Amended the Django Admin configuration to have nice headers (`#230`_)
 * 0.4.1
  * Fix for `#206`_ wherein the pluggable parser didn't recognise files with
    all-caps suffixes like ``.PDF``
 * 0.4.0
  * Introducing reminders.  See `#199`_ for more information, but the short
    explanation is that you can now attach simple notes & times to documents
    which are made available via the API.  Currently, the default API
    (basically just the Django admin) doesn't really make use of this, but
    `Thomas Brueggemann`_ over at `Paperless Desktop`_ has said that he would
    like to make use of this feature in his project.
 * 0.3.6
  * Fix for `#200`_ (!!) where the API wasn't configured to allow updating the
    correspondent or the tags for a document.
  * The ``content`` field is now optional, to allow for the edge case of a
    purely graphical document.
  * You can no longer add documents via the admin.  This never worked in the
    first place, so all I've done here is remove the link to the broken form.
  * The consumer code has been heavily refactored to support a pluggable
    interface.  Install a paperless consumer via pip and tell paperless about
    it with an environment variable, and you're good to go.  Proper
    documentation is on its way.
 * 0.3.5
  * A serious facelift for the documents listing page wherein we drop the
    tabular layout in favour of a tiled interface.
  * Users can now configure the number of items per page.
  * Fix for `#171`_: Allow users to specify their own ``SECRET_KEY`` value.
  * Moved the dotenv loading to the top of settings.py
  * Fix for `#112`_: Added checks for binaries required for document
    consumption.
 * 0.3.4
  * Removal of django-suit due to a licensing conflict I bumped into in 0.3.3.
    Note that you *can* use Django Suit with Paperless, but only in a
    non-profit situation as their free license prohibits for-profit use.  As a
    result, I can't bundle Suit with Paperless without conflicting with the
    GPL.  Further development will be done against the stock Django admin.
  * I shrunk the thumbnails a little 'cause they were too big for me, even on
    my high-DPI monitor.
  * BasicAuth support for document and thumbnail downloads, as well as the Push
    API thanks to @thomasbrueggemann.  See `#179`_.
 * 0.3.3
  * Thumbnails in the UI and a Django-suit -based face-lift courtesy of @ekw!
  * Timezone, items per page, and default language are now all configurable,
    also thanks to @ekw.
 * 0.3.2
-  * Fix for `#172`_: defaulting ALLOWED_HOSTS to ``["*"]`` and allowing the
+  * Fix for #172: defaulting ALLOWED_HOSTS to ``["*"]`` and allowing the user
-    user to set her own value via ``PAPERLESS_ALLOWED_HOSTS`` should the need
+    to set her own value via ``PAPERLESS_ALLOWED_HOSTS`` should the need
    arise.
 * 0.3.1
@@ -142,8 +26,7 @@ Changelog
    ``paperless.conf``.
  * `#148`_: The database location (sqlite) is now a variable you can set in
    ``paperless.conf``.
-  * `#146`_: Fixed a bug that allowed unauthorised access to the ``/fetch``
+  * `#146`_: Fixed a bug that allowed unauthorised access to the `/fetch` URL.
    URL.
  * `#131`_: Document files are now automatically removed from disk when
    they're deleted in Paperless.
  * `#121`_: Fixed a bug where Paperless wasn't setting document creation time
@@ -252,22 +135,6 @@ Changelog
 .. _Tim White: https://github.com/timwhite
 .. _Florian Harr: https://github.com/evils
 .. _Justin Snyman: https://github.com/stringlytyped
 .. _Thomas Brueggemann: https://github.com/thomasbrueggemann
 .. _Jake Gysland: https://github.com/jgysland
 .. _Strubbl: https://github.com/strubbl
 .. _CkuT: https://github.com/CkuT
 .. _David Martin: https://github.com/ddddavidmartin
 .. _Paperless Desktop: https://github.com/thomasbrueggemann/paperless-desktop
 .. _Joshua Gilman: https://github.com/jmgilman
 .. _ayounggun: https://github.com/ayounggun
 .. _Kusti Skytén: https://github.com/kskyten
 .. _maphy-psd: https://github.com/maphy-psd
 .. _ishirav: https://github.com/ishirav
 .. _Stefan Hagen: https://github.com/xkpd3
 .. _dev-rke: https://github.com/dev-rke
 .. _Lukas Winkler: https://github.com/Findus23
 .. _chris-aeviator: https://github.com/chris-aeviator
 .. _Dan Panzarella: https://github.com/pzl
 .. _#20: https://github.com/danielquinn/paperless/issues/20
 .. _#44: https://github.com/danielquinn/paperless/issues/44
@@ -285,35 +152,8 @@ Changelog
 .. _#89: https://github.com/danielquinn/paperless/issues/89
 .. _#94: https://github.com/danielquinn/paperless/issues/94
 .. _#98: https://github.com/danielquinn/paperless/issues/98
 .. _#112: https://github.com/danielquinn/paperless/issues/112
 .. _#121: https://github.com/danielquinn/paperless/issues/121
 .. _#131: https://github.com/danielquinn/paperless/issues/131
 .. _#146: https://github.com/danielquinn/paperless/issues/146
 .. _#148: https://github.com/danielquinn/paperless/pull/148
 .. _#150: https://github.com/danielquinn/paperless/pull/150
 .. _#171: https://github.com/danielquinn/paperless/issues/171
 .. _#172: https://github.com/danielquinn/paperless/issues/172
 .. _#179: https://github.com/danielquinn/paperless/pull/179
 .. _#199: https://github.com/danielquinn/paperless/issues/199
 .. _#200: https://github.com/danielquinn/paperless/issues/200
 .. _#206: https://github.com/danielquinn/paperless/issues/206
 .. _#212: https://github.com/danielquinn/paperless/pull/212
 .. _#220: https://github.com/danielquinn/paperless/pull/220
 .. _#224: https://github.com/danielquinn/paperless/pull/224
 .. _#226: https://github.com/danielquinn/paperless/pull/226
 .. _#227: https://github.com/danielquinn/paperless/pull/227
 .. _#228: https://github.com/danielquinn/paperless/pull/228
 .. _#229: https://github.com/danielquinn/paperless/pull/229
 .. _#230: https://github.com/danielquinn/paperless/pull/230
 .. _#232: https://github.com/danielquinn/paperless/issues/232
 .. _#235: https://github.com/danielquinn/paperless/issues/235
 .. _#236: https://github.com/danielquinn/paperless/issues/236
 .. _#255: https://github.com/danielquinn/paperless/pull/255
 .. _#268: https://github.com/danielquinn/paperless/pull/268
 .. _#277: https://github.com/danielquinn/paperless/pull/277
 .. _#272: https://github.com/danielquinn/paperless/issues/272
 .. _#248: https://github.com/danielquinn/paperless/issues/248
 .. _#278: https://github.com/danielquinn/paperless/issues/248
 .. _#283: https://github.com/danielquinn/paperless/issues/283
 .. _#256: https://github.com/danielquinn/paperless/pull/256
 .. _#285: https://github.com/danielquinn/paperless/pull/285
--- a/docs/consumption.rst
+++ b/docs/consumption.rst
@@ -121,21 +121,18 @@ So, with all that in mind, here's what you do to get it running:
 1. Setup a new email account somewhere, or if you're feeling daring, create a
   folder in an existing email box and note the path to that folder.
-2. In ``/etc/paperless.conf`` set all of the appropriate values in
+2. In ``settings.py`` set all of the appropriate values in ``MAIL_CONSUMPTION``.
   ``PATHS AND FOLDERS`` and ``SECURITY``.
   If you decided to use a subfolder of an existing account, then make sure you
-   set ``PAPERLESS_CONSUME_MAIL_INBOX`` accordingly here.  You also have to set
+   set ``INBOX`` accordingly here.  You also have to set the
-   the ``PAPERLESS_EMAIL_SECRET`` to something you can remember 'cause you'll
+   ``UPLOAD_SHARED_SECRET`` to something you can remember 'cause you'll have to
-   have to include that in every email you send.
+   include that in every email you send.
 3. Restart the :ref:`consumer <utilities-consumer>`.  The consumer will check
-   the configured email account at startup and from then on every 10 minutes
+   the configured email account every 10 minutes for something new and pull down
-   for something new and pulls down whatever it finds.
+   whatever it finds.
 4. Send yourself an email!  Note that the subject is treated as the file name,
   so if you set the subject to ``Correspondent - Title - tag,tag,tag``, you'll
   get what you expect.  Also, you must include the aforementioned secret
   string in every email so the fetcher knows that it's safe to import.
   Note that Paperless only allows the email title to consist of safe characters
   to be imported. These consist of alpha-numeric characters and ``-_ ,.'``.
 5. After a few minutes, the consumer will poll your mailbox, pull down the
   message, and place the attachment in the consumption directory with the
   appropriate name.  A few minutes later, the consumer will import it like any
@@ -147,83 +144,46 @@ So, with all that in mind, here's what you do to get it running:
 HTTP POST
 =========
-You can also submit a document via HTTP POST, so long as you do so after
+You can also submit a document via HTTP POST.  It doesn't do tags yet, and the
-authenticating.  To push your document to Paperless, send an HTTP POST to the
+URL schema isn't concrete, but it's a start.
-server with the following name/value pairs:
+
 To push your document to Paperless, send an HTTP POST to the server with the
 following name/value pairs:
 * ``correspondent``: The name of the document's correspondent.  Note that there
  are restrictions on what characters you can use here.  Specifically,
-  alphanumeric characters, `-`, `,`, `.`, and `'` are ok, everything else is
+  alphanumeric characters, `-`, `,`, `.`, and `'` are ok, everything else it
  out.  You also can't use the sequence ` - ` (space, dash, space).
 * ``title``: The title of the document.  The rules for characters is the same
  here as the correspondent.
-* ``document``: The file you're uploading
+* ``signature``: For security reasons, we have the correspondent send a
  signature using a "shared secret" method to make sure that random strangers
  don't start uploading stuff to your server.  The means of generating this
  signature is defined below.
 Specify ``enctype="multipart/form-data"``, and then POST your file with::
    Content-Disposition: form-data; name="document"; filename="whatever.pdf"
 An example of this in HTML is a typical form:
-.. code:: html
+.. _consumption-http-signature:
-    <form method="post" enctype="multipart/form-data">
+Generating the Signature
-        <input type="text" name="correspondent" value="My Correspondent" />
+------------------------
        <input type="text" name="title" value="My Title" />
        <input type="file" name="document" />
        <input type="submit" name="go" value="Do the thing" />
    </form>
-But a potentially more useful way to do this would be in Python.  Here we use
+Generating a signature based a shared secret is pretty simple: define a secret,
-the requests library to handle basic authentication and to send the POST data
+and store it on the server and the client.  Then use that secret, along with
-to the URL.
+the text you want to verify to generate a string that you can use for
 verification.
 In the case of Paperless, you configure the server with the secret by setting
 ``UPLOAD_SHARED_SECRET``.  Then on your client, you generate your signature by
 concatenating the correspondent, title, and the secret, and then using sha256
 to generate a hexdigest.
 If you're using Python, this is what that looks like:
 .. code:: python
    import os
    from hashlib import sha256
-
+    signature = sha256(correspondent + title + secret).hexdigest()
    import requests
    from requests.auth import HTTPBasicAuth
    # You authenticate via BasicAuth or with a session id.
    # We use BasicAuth here
    username = "my-username"
    password = "my-super-secret-password"
    # Where you have Paperless installed and listening
    url = "http://localhost:8000/push"
    # Document metadata
    correspondent = "Test Correspondent"
    title = "Test Title"
    # The local file you want to push
    path = "/path/to/some/directory/my-document.pdf"
    with open(path, "rb") as f:
        response = requests.post(
            url=url,
            data={"title": title,  "correspondent": correspondent},
            files={"document": (os.path.basename(path), f, "application/pdf")},
            auth=HTTPBasicAuth(username, password),
            allow_redirects=False
        )
        if response.status_code == 202:
            # Everything worked out ok
            print("Upload successful")
        else:
            # If you don't get a 202, it's probably because your credentials
            # are wrong or something.  This will give you a rough idea of what
            # happened.
            print("We got HTTP status code: {}".format(response.status_code))
            for k, v in response.headers.items():
                print("{}: {}".format(k, v))
--- a/docs/guesswork.rst
+++ b/docs/guesswork.rst
@@ -80,12 +80,6 @@ text and matching algorithm.  From the help info there:
    uses a regex to match the PDF.  If you don't know what a regex is, you
    probably don't want this option.
 When using the "any" or "all" matching algorithms, you can search for terms that
 consist of multiple words by enclosing them in double quotes. For example, defining
 a match text of ``"Bank of America" BofA`` using the "any" algorithm, will match
 documents that contain either "Bank of America" or "BofA", but will not match
 documents containing "Bank of South America".
 Then just save your tag/correspondent and run another document through the
 consumer.  Once complete, you should see the newly-created document,
 automatically tagged with the appropriate data.
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -3,11 +3,7 @@
 Paperless
 =========
-Paperless is a simple Django application running in two parts:
+Scan, index, and archive all of your paper documents.  Say goodbye to paper.
 a :ref:`consumer <utilities-consumer>` (the thing that does the indexing) and
 the :ref:`webserver <utilities-webserver>` (the part that lets you search & download
 already-indexed documents). If you want to learn more about its functions keep on
 reading after the installation section.
 .. _index-why-this-exists:
@@ -19,11 +15,10 @@ Paper is a nightmare.  Environmental issues aside, there's no excuse for it in
 the 21st century.  It takes up space, collects dust, doesn't support any form of
 a search feature, indexing is tedious, it's heavy and prone to damage & loss.
-I wrote this to make "going paperless" easier.  I do not have to worry about
+I wrote this to make "going paperless" easier.  I wanted to be able to feed
-finding stuff again. I feed documents right from the post box into the scanner and
+documents right from the post box into the scanner and then shred them so I
-then shred them.  Perhaps you might find it useful too.
+never have to worry about finding stuff again.  Perhaps you might find it useful
-
+too.
 Contents
@@ -40,5 +35,4 @@ Contents
   guesswork
   migrating
   troubleshooting
   scanners
   changelog
--- a/docs/requirements.rst
+++ b/docs/requirements.rst
@@ -4,7 +4,7 @@ Requirements
 ============
 You need a Linux machine or Unix-like setup (theoretically an Apple machine
-should work) that has the following software installed:
+should work) that has the following software installed on it:
 * `Python3`_ (with development libraries, pip and virtualenv)
 * `GNU Privacy Guard`_
@@ -21,14 +21,14 @@ should work) that has the following software installed:
 Notably, you should confirm how you access your Python3 installation.  Many
 Linux distributions will install Python3 in parallel to Python2, using the names
 ``python3`` and ``python`` respectively.  The same goes for ``pip3`` and
-``pip``.  Running Paperless with Python2 will likely break things, so make sure that 
+``pip``.  Using Python2 will likely break things, so make sure that you're using
-you're using the right version.
+the right version.
 For the purposes of simplicity, ``python`` and ``pip`` is used everywhere to
-refer to their Python3 versions.
+refer to their Python 3 versions.
 In addition to the above, there are a number of Python requirements, all of
-which are listed in a file called ``requirements.txt`` in the project root directory.
+which are listed in a file called ``requirements.txt`` in the project root.
 If you're not working on a virtual environment (like Vagrant or Docker), you
 should probably be using a virtualenv, but that's your call.  The reasons why
@@ -67,7 +67,7 @@ dependencies is easy:
    $ pip install --user --requirement /path/to/paperless/requirements.txt
-This will download and install all of the requirements into
+This should download and install all of the requirements into
 ``${HOME}/.local``.  Remember that your distribution may be using ``pip3`` as
 mentioned above.
@@ -86,8 +86,8 @@ enter it, and install the requirements using the ``requirements.txt`` file:
    $ . /path/to/arbitrary/directory/bin/activate
    $ pip install  --requirement /path/to/paperless/requirements.txt
-Now you're ready to go.  Just remember to enter (activate) your virtualenv 
+Now you're ready to go.  Just remember to enter your virtualenv whenever you
-whenever you want to use Paperless.
+want to use Paperless.
 .. _requirements-documentation:
@@ -95,7 +95,7 @@ whenever you want to use Paperless.
 Documentation
 -------------
-As generation of the documentation is not required for the use of Paperless,
+As generation of the documentation is not required for use of Paperless,
 dependencies for this process are not included in ``requirements.txt``.  If
 you'd like to generate your own docs locally, you'll need to:
--- a/docs/scanners.rst
+++ b/docs/scanners.rst
@@ -1,29 +0,0 @@
 .. _scanners:
 Scanner Recommendations
 =======================
 As Paperless operates by watching a folder for new files, doesn't care what
 scanner you use, but sometimes finding a scanner that will write to an FTP,
 NFS, or SMB server can be difficult.  This page is here to help you find one
 that works right for you based on recommentations from other Paperless users.
 +---------+----------------+-----+-----+-----+----------------+
 | Brand   | Model          | Supports        | Recommended By |
 +---------+----------------+-----+-----+-----+----------------+
 |         |                | FTP | NFS | SMB |                |
 +=========+================+=====+=====+=====+================+
 | Brother | `ADS-1500W`_   | yes | no  | yes | `danielquinn`_ |
 +---------+----------------+-----+-----+-----+----------------+
 | Brother | `MFC-J6930DW`_ | yes |     |     | `ayounggun`_   |
 +---------+----------------+-----+-----+-----+----------------+
 | Fujitsu | `ix500`_       | yes |     | yes | `eonist`_      |
 +---------+----------------+-----+-----+-----+----------------+
 .. _ADS-1500W: https://www.brother.ca/en/p/ads1500w
 .. _MFC-J6930DW: https://www.brother.ca/en/p/MFCJ6930DW
 .. _ix500: http://www.fujitsu.com/us/products/computing/peripheral/scanners/scansnap/ix500/
 .. _danielquinn: https://github.com/danielquinn
 .. _ayounggun: https://github.com/ayounggun
 .. _eonist: https://github.com/eonist
--- a/docs/setup.rst
+++ b/docs/setup.rst
@@ -4,7 +4,7 @@ Setup
 =====
 Paperless isn't a very complicated app, but there are a few components, so some
-basic documentation is in order.  If you follow along in this document and
+basic documentation is in order.  If you go follow along in this document and
 still have trouble, please open an `issue on GitHub`_ so I can fill in the
 gaps.
@@ -28,7 +28,6 @@ or just download the tarball and go that route:
 .. code:: bash
    $ cd to the directory where you want to run Paperless
    $ wget https://github.com/danielquinn/paperless/archive/master.zip
    $ unzip master.zip
    $ cd paperless-master
@@ -44,9 +43,7 @@ route`_ is quick & easy, but means you're running a VM which comes with memory
 consumption etc. We also `support Docker`_, which you can use natively under
 Linux and in a VM with `Docker Machine`_ (this guide was written for native
 Docker usage under Linux, you might have to adapt it for Docker Machine.)
-Not to forget the virtualenv, this is similar to `bare metal`_ with the
+Alternatively the standard, `bare metal`_ approach is a little more
 exception that you have to activate the virtualenv first.
 Last but not least, the standard `bare metal`_ approach is a little more
 complicated, but worth it because it makes it easier should you want to
 contribute some code back.
@@ -62,11 +59,9 @@ Standard (Bare Metal)
 .....................
 1. Install the requirements as per the :ref:`requirements <requirements>` page.
-2. Within the extract of master.zip go to the ``src`` directory.
+2. Change to the ``src`` directory in this repo.
-3. Copy ``paperless.conf.example`` to ``/etc/paperless.conf`` also the virtual
+3. Copy ``paperless.conf.example`` to ``/etc/paperless.conf`` and open it in
-   envrionment look there for it and open it in your favourite editor.
+   your favourite editor.  Set the values for:
   Because this file contains passwords it should only be readable by user root
   and paperless !  Set the values for:
    * ``PAPERLESS_CONSUMPTION_DIR``: this is where your documents will be
      dumped to be consumed by Paperless.
@@ -75,19 +70,18 @@ Standard (Bare Metal)
    * ``PAPERLESS_OCR_THREADS``: this is the number of threads the OCR process
      will spawn to process document pages in parallel.
-4. Initialise the SQLite database with ``./manage.py migrate``.
+4. Initialise the database with ``./manage.py migrate``.
 5. Create a user for your Paperless instance with
   ``./manage.py createsuperuser``. Follow the prompts to create your user.
 6. Start the webserver with ``./manage.py runserver <IP>:<PORT>``.
-   If no specifc IP or port are given, the default is ``127.0.0.1:8000``
+   If no specifc IP or port are given, the default is ``127.0.0.1:8000``.
-   also known as http://localhost:8000/.
+   You should now be able to visit your (empty) `Paperless webserver`_ at
-   You should now be able to visit your (empty) at `Paperless webserver`_ or
+   ``127.0.0.1:8000`` (or whatever you chose).  You can login with the
-   whatever you chose before.  You can login with the user/pass you created in
+   user/pass you created in #5.
   #5.
 7. In a separate window, change to the ``src`` directory in this repo again,
   but this time, you should start the consumer script with
   ``./manage.py document_consumer``.
-8. Scan something or put a file into the  ``CONSUMPTION_DIR``.
+8. Scan something.  Put it in the ``CONSUMPTION_DIR``.
 9. Wait a few minutes
 10. Visit the document list on your webserver, and it should be there, indexed
    and downloadable.
@@ -305,21 +299,17 @@ Standard (Bare Metal, Systemd)
 If you're running on a bare metal system that's using Systemd, you can use the
 service unit files in the ``scripts`` directory to set this up.  You'll need to
-create a user called ``paperless`` (without login (if not already done so #5))
+create a user called ``paperless`` and setup Paperless to be in a place that
-and setup Paperless to be in a place that this new user can read and write to.
+this new user can read and write to. Be sure to edit the service scripts to point
-Be sure to edit the service  scripts to point to the proper location of your
+to the proper location of your paperless install, referencing the appropriate Python
-paperless install, referencing the appropriate Python binary. For example:
+binary. For example: ``ExecStart=/path/to/python3 /path/to/paperless/src/manage.py document_consumer``.
-``ExecStart=/path/to/python3 /path/to/paperless/src/manage.py document_consumer``.
+If you don't want to make a new user, you can change the ``Group`` and ``User`` variables
-If you don't want to make a new user, you can change the ``Group`` and ``User``
+accordingly.
 variables accordingly.
-Then, as ``root`` (or using ``sudo``) you can just copy the ``.service`` files
+Then, you can just tell Systemd as ``root`` (or using ``sudo``) to enable the two ``.service`` files::
 to the Systemd directory and tell it to enable the two services::
-    # cp /path/to/paperless/scripts/paperless-consumer.service /etc/systemd/system/
+    # systemctl enable /path/to/paperless/scripts/paperless-consumer.service
-    # cp /path/to/paperless/scripts/paperless-webserver.service /etc/systemd/system/
+    # systemctl enable /path/to/paperless/scripts/paperless-webserver.service
    # systemctl enable paperless-consumer
    # systemctl enable paperless-webserver
    # systemctl start paperless-consumer
    # systemctl start paperless-webserver
@@ -354,7 +344,7 @@ after restarting your system:
  If you are using a network interface other than ``eth0``, you will have to
  change ``IFACE=eth0``. For example, if you are connected via WiFi, you will
  likely need to replace ``eth0`` above with ``wlan0``. To see all interfaces,
-  run ``ifconfig -a``.
+  run ``ifconfig``.
  Save the file.
@@ -394,10 +384,7 @@ Using a Real Webserver
 The default is to use Django's development server, as that's easy and does the
 job well enough on a home network.  However, if you want to do things right,
 it's probably a good idea to use a webserver capable of handling more than one
-thread. You will also have to let the webserver serve the static files (CSS,
+thread.
 JavaScript) from the directory configured in ``PAPERLESS_STATICDIR``. For that,
 you need to run ``./manage.py collectstatic`` in the ``src`` directory.  The
 default static files directory is ``../static``.
 Apache
 ~~~~~~
@@ -575,28 +562,3 @@ If you're using Docker, you can set a restart-policy_ in the
 Docker daemon.
 .. _restart-policy: https://docs.docker.com/engine/reference/commandline/run/#restart-policies-restart
 .. _setup-subdirectory
 Hosting Paperless in a Subdirectory
 -----------------------------------
 Paperless was designed to run off the root of the hosting domain,
 (ie: ``https://example.com/``) but with a few changes, you can configure
 it to run in a subdirectory on your server
 (ie: ``https://example.com/paperless/``).
 Thanks to the efforts of `maphy-psd`_ on `Github`_, running Paperless in a
 subdirectory is now as easy as setting a config variable.  Simply set
 ``PAPERLESS_FORCE_SCRIPT_NAME`` in your environment or
 ``/etc/paperless.conf`` to the path you want Paperless hosted at, configure
 Nginx/Apache for your needs and you're done.  So, if you want Paperless to live
 at ``https://example.com/arbitrary/path/to/paperless`` then you just set
 ``PAPERLESS_FORCE_SCRIPT_NAME`` to ``/arbitrary/path/to/paperless``.  Note the
 leading ``/`` there.
 As to how to configure Nginx or Apache for this, that's on you :-)
 .. _maphy-psd: https://github.com/maphy-psd
 .. _Github: https://github.com/danielquinn/paperless/pull/255
--- a/paperless.conf.example
+++ b/paperless.conf.example
@@ -1,34 +1,11 @@
 # Sample paperless.conf
 # Copy this file to /etc/paperless.conf and modify it to suit your needs.
 # As this file contains passwords it should only be readable by the user
 # running paperless.
 ###############################################################################
 ####                         Paths & Folders                               ####
 ###############################################################################
 # This where your documents should go to be consumed.  Make sure that it exists
 # and that the user running the paperless service can read/write its contents
 # before you start Paperless.
 PAPERLESS_CONSUMPTION_DIR=""
 # You can specify where you want the SQLite database to be stored instead of
 # the default location of /data/ within the install directory.
 #PAPERLESS_DBDIR=/path/to/database/file
 # Override the default MEDIA_ROOT here.  This is where all files are stored.
 # The default location is /media/documents/ within the install folder.
 #PAPERLESS_MEDIADIR=/path/to/media
 # Override the default STATIC_ROOT here.  This is where all static files
 # created using "collectstatic" manager command are stored.
 #PAPERLESS_STATICDIR=""
 # These values are required if you want paperless to check a particular email
 # box every 10 minutes and attempt to consume documents from there.  If you
 # don't define a HOST, mail checking will just be disabled.
@@ -37,19 +14,6 @@ PAPERLESS_CONSUME_MAIL_PORT=""
 PAPERLESS_CONSUME_MAIL_USER=""
 PAPERLESS_CONSUME_MAIL_PASS=""
 # Override the default IMAP inbox here. If not set Paperless defaults to
 # "INBOX".
 #PAPERLESS_CONSUME_MAIL_INBOX="INBOX"
 # Any email sent to the target account that does not contain this text will be
 # ignored.
 PAPERLESS_EMAIL_SECRET=""
 ###############################################################################
 ####                              Security                                 ####
 ###############################################################################
 # You must have a passphrase in order for Paperless to work at all.  If you set
 # this to "", GNUGPG will "encrypt" your PDF by writing it out as a zero-byte
 # file.
@@ -64,43 +28,20 @@ PAPERLESS_EMAIL_SECRET=""
 # you've since changed it to a new one.
 PAPERLESS_PASSPHRASE="secret"
-
+# If you intend to consume documents either via HTTP POST or by email, you must
-# The secret key has a default that should be fine so long as you're hosting
+# have a shared secret here.
-# Paperless on a closed network.  However, if you're putting this anywhere
+PAPERLESS_SHARED_SECRET=""
 # public, you should change the key to something unique and verbose.
 #PAPERLESS_SECRET_KEY="change-me"
 # If you're planning on putting Paperless on the open internet, then you
 # really should set this value to the domain name you're using.  Failing to do
 # so leaves you open to HTTP host header attacks:
 # https://docs.djangoproject.com/en/1.10/topics/security/#host-headers-virtual-hosting
 #
 # Just remember that this is a comma-separated list, so "example.com" is fine,
 # as is "example.com,www.example.com", but NOT " example.com" or "example.com,"
 #PAPERLESS_ALLOWED_HOSTS="example.com,www.example.com"
 # To host paperless under a subpath url like example.com/paperless you set
 # this value to /paperless. No trailing slash!
 #
 # https://docs.djangoproject.com/en/1.11/ref/settings/#force-script-name
 #PAPERLESS_FORCE_SCRIPT_NAME=""
 ###############################################################################
 ####                          Software Tweaks                              ####
 ###############################################################################
 # After a document is consumed, Paperless can trigger an arbitrary script if
 # you like.  This script will be passed a number of arguments for you to work
 # with.  The default is blank, which means nothing will be executed.  For more
-# information, take a look at the docs:
+# information, take a look at the docs: http://paperless.readthedocs.org/en/latest/consumption.html#hooking-into-the-consumption-process
 # http://paperless.readthedocs.org/en/latest/consumption.html#hooking-into-the-consumption-process
 #PAPERLESS_POST_CONSUME_SCRIPT="/path/to/an/arbitrary/script.sh"
 #
 # The following values use sensible defaults for modern systems, but if you're
-# running Paperless on a low-resource device (like a Raspberry Pi), modifying
+# running Paperless on a low-resource machine (like a Raspberry Pi), modifying
 # some of these values may be necessary.
 #
@@ -110,15 +51,8 @@ PAPERLESS_PASSPHRASE="secret"
 # an integer:
 #PAPERLESS_OCR_THREADS=1
 # Customize the default language that tesseract will attempt to use when
 # parsing documents.  It should be a 3-letter language code consistent with ISO
 # 639: https://www.loc.gov/standards/iso639-2/php/code_list.php
 #PAPERLESS_OCR_LANGUAGE=eng
 # On smaller systems, or even in the case of Very Large Documents, the consumer
-# may explode, complaining about how it's "unable to extend pixel cache".  In
+# may explode, complaining about how it's "unable to extent pixel cache".  In
 # such cases, try setting this to a reasonably low value, like 32000000.  The
 # default is to use whatever is necessary to do everything without writing to
 # disk, and units are in megabytes.
@@ -127,6 +61,16 @@ PAPERLESS_PASSPHRASE="secret"
 # the web for "MAGICK_MEMORY_LIMIT".
 #PAPERLESS_CONVERT_MEMORY_LIMIT=0
 # By default the conversion density setting for documents is 300DPI, in some
 # cases it has proven useful to configure a lesser value.
 # This setting has a high impact on the physical size of tmp page files,
 # the speed of document conversion, and can affect the accuracy of OCR
 # results. Individual results can vary and this setting should be tested 
 # thoroughly against the documents you are importing to see if it has any 
 # impacts either negative or positive. Testing on limited document sets has
 # shown a setting of 200 can cut the size of tmp files by 1/3, and speed up
 # conversion by up to 4x with little impact to OCR accuracy.
 #PAPERLESS_CONVERT_DENSITY=300
 # Similar to the memory limit, if you've got a small system and your OS mounts
 # /tmp as tmpfs, you should set this to a path that's on a physical disk, like
@@ -137,43 +81,22 @@ PAPERLESS_PASSPHRASE="secret"
 # the web for "MAGICK_TMPDIR".
 #PAPERLESS_CONVERT_TMPDIR=/var/tmp/paperless
 # You can specify where you want the SQLite database to be stored instead of 
 # the default location
 #PAPERLESS_DBDIR=/path/to/database/file
-# By default the conversion density setting for documents is 300DPI, in some
+# Override the default MEDIA_ROOT here.  This is where all files are stored.
-# cases it has proven useful to configure a lesser value.
+#PAPERLESS_MEDIADIR=/path/to/media
 # This setting has a high impact on the physical size of tmp page files,
 # the speed of document conversion, and can affect the accuracy of OCR
 # results. Individual results can vary and this setting should be tested
 # thoroughly against the documents you are importing to see if it has any
 # impacts either negative or positive.
 # Testing on limited document sets has shown a setting of 200 can cut the
 # size of tmp files by 1/3, and speed up conversion by up to 4x
 # with little impact to OCR accuracy.
 #PAPERLESS_CONVERT_DENSITY=300
 # The number of seconds that Paperless will wait between checking
 # PAPERLESS_CONSUMPTION_DIR.  If you tend to write documents to this directory
-# rarely, you may want to use a higher value than the default (10).
+# very slowly, you may want to use a higher value than the default (10).
-#PAPERLESS_CONSUMER_LOOP_TIME=10
+# PAPERLESS_CONSUMER_LOOP_TIME=10
 # If you're planning on putting Paperless on the open internet, then you
 # really should set this value to the domain name you're using.  Failing to do
 # so leaves you open to XSS attacks.
 # Just remember that this is a comma-separated list, so "example.com" is fine,
 # as is "example.com,www.example.com", but NOT " example.com" or "example.com,"
 #PAPERLESS_ALLOWED_HOSTS="example.com,www.example.com"
 ###############################################################################
 ####                            Interface                                  ####
 ###############################################################################
 # Override the default UTC time zone here.
 # See https://docs.djangoproject.com/en/1.10/ref/settings/#std:setting-TIME_ZONE
 # for details on how to set it.
 #PAPERLESS_TIME_ZONE=UTC
 # If set, Paperless will show document filters per financial year.
 # The dates must be in the format "mm-dd", for example "07-15" for July 15.
 #PAPERLESS_FINANCIAL_YEAR_START="mm-dd"
 #PAPERLESS_FINANCIAL_YEAR_END="mm-dd"
 # The number of items on each page in the web UI.  This value must be a
 # positive integer, but if you don't define one in paperless.conf, a default of
 # 100 will be used.
 #PAPERLESS_LIST_PER_PAGE=100
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,26 +1,22 @@
-Django>=1.11,<2.0
+Django==1.10.4
 Pillow>=3.1.1
-django-crispy-forms>=1.6.1
+django-crispy-forms>=1.6.0
-django-extensions>=1.7.6
+django-extensions>=1.6.1
 django-filter>=1.0
-django-flat-responsive>=1.2.0
+djangorestframework>=3.4.4
 djangorestframework>=3.5.3
 filemagic>=1.6
-fuzzywuzzy[speedup]==0.15.0
+langdetect>=1.0.5
-langdetect>=1.0.7
+pyocr>=0.3.1
-pyocr>=0.4.7
+python-dateutil>=2.4.2
-python-dateutil>=2.6.0
+python-dotenv>=0.3.0
-python-dotenv>=0.6.2
+python-gnupg>=0.3.8
-python-gnupg>=0.3.9
+pytz>=2015.7
-pytz>=2016.10
+gunicorn==19.6.0
 gunicorn==19.7.1
 # For the tests
 factory-boy
 pytest
 pytest-django
 pytest-sugar
-pytest-env
+pep8
 pycodestyle
 flake8
 tox
--- a/scripts/docker-entrypoint.sh
+++ b/scripts/docker-entrypoint.sh
@@ -7,37 +7,34 @@ map_uidgid() {
    USERMAP_ORIG_UID=$(id -g paperless)
    USERMAP_GID=${USERMAP_GID:-${USERMAP_UID:-$USERMAP_ORIG_GID}}
    USERMAP_UID=${USERMAP_UID:-$USERMAP_ORIG_UID}
-    if [[ ${USERMAP_UID} != "${USERMAP_ORIG_UID}" || ${USERMAP_GID} != "${USERMAP_ORIG_GID}" ]]; then
+    if [[ ${USERMAP_UID} != ${USERMAP_ORIG_UID} || ${USERMAP_GID} != ${USERMAP_ORIG_GID} ]]; then
        echo "Mapping UID and GID for paperless:paperless to $USERMAP_UID:$USERMAP_GID"
-        groupmod -g "${USERMAP_GID}" paperless
+        groupmod -g ${USERMAP_GID} paperless
        sed -i -e "s|:${USERMAP_ORIG_UID}:${USERMAP_GID}:|:${USERMAP_UID}:${USERMAP_GID}:|" /etc/passwd
    fi
 }
 set_permissions() {
-    # Set permissions for consumption and export directory
+    # Set permissions for consumption directory
-    for dir in PAPERLESS_CONSUMPTION_DIR PAPERLESS_EXPORT_DIR; do
+    chgrp paperless "$PAPERLESS_CONSUMPTION_DIR" || {
-      # Extract the name of the current directory from $dir for the error message
+        echo "Changing group of consumption directory:"
-      cur_dir_name=$(echo "$dir" | awk -F'_' '{ print tolower($2); }')
+        echo "  $PAPERLESS_CONSUMPTION_DIR"
-      chgrp paperless "${!dir}" || {
+        echo "failed."
-          echo "Changing group of ${cur_dir_name} directory:"
+        echo ""
-          echo "  ${!dir}"
+        echo "Either try to set it on your host-mounted directory"
-          echo "failed."
+        echo "directly, or make sure that the directory has \`o+x\`"
-          echo ""
+        echo "permissions and the files in it at least \`o+r\`."
-          echo "Either try to set it on your host-mounted directory"
+    } >&2
-          echo "directly, or make sure that the directory has \`g+wx\`"
+    chmod g+x "$PAPERLESS_CONSUMPTION_DIR" || {
-          echo "permissions and the files in it at least \`o+r\`."
+        echo "Changing group permissions of consumption directory:"
-      } >&2
+        echo "  $PAPERLESS_CONSUMPTION_DIR"
-      chmod g+wx "${!dir}" || {
+        echo "failed."
-          echo "Changing group permissions of ${cur_dir_name} directory:"
+        echo ""
-          echo "  ${!dir}"
+        echo "Either try to set it on your host-mounted directory"
-          echo "failed."
+        echo "directly, or make sure that the directory has \`o+x\`"
-          echo ""
+        echo "permissions and the files in it at least \`o+r\`."
-          echo "Either try to set it on your host-mounted directory"
+    } >&2
-          echo "directly, or make sure that the directory has \`g+wx\`"
+
          echo "permissions and the files in it at least \`o+r\`."
      } >&2
    done
    # Set permissions for application directory
    chown -Rh paperless:paperless /usr/src/paperless
 }
@@ -62,11 +59,11 @@ install_languages() {
    # Loop over languages to be installed
    for lang in "${langs[@]}"; do
        pkg="tesseract-ocr-$lang"
-        if dpkg -s "$pkg" > /dev/null 2>&1; then
+        if dpkg -s "$pkg" 2>&1 > /dev/null; then
            continue
        fi
-        if ! apt-cache show "$pkg" > /dev/null 2>&1; then
+        if ! apt-cache show "$pkg" 2>&1 > /dev/null; then
            continue
        fi
--- a/src/documents/admin.py
+++ b/src/documents/admin.py
@@ -1,6 +1,3 @@
 from datetime import datetime
 from django.conf import settings
 from django.contrib import admin
 from django.contrib.auth.models import User, Group
 from django.core.urlresolvers import reverse
@@ -34,90 +31,21 @@ class MonthListFilter(admin.SimpleListFilter):
        return queryset.filter(created__year=year, created__month=month)
-class FinancialYearFilter(admin.SimpleListFilter):
+class CorrespondentAdmin(admin.ModelAdmin):
    title = "Financial Year"
    parameter_name = "fy"
    _fy_wraps = None
    def _fy_start(self, year):
        """Return date of the start of financial year for the given year."""
        fy_start = "{}-{}".format(str(year), settings.FY_START)
        return datetime.strptime(fy_start, "%Y-%m-%d").date()
    def _fy_end(self, year):
        """Return date of the end of financial year for the given year."""
        fy_end = "{}-{}".format(str(year), settings.FY_END)
        return datetime.strptime(fy_end, "%Y-%m-%d").date()
    def _fy_does_wrap(self):
        """Return whether the financial year spans across two years."""
        if self._fy_wraps is None:
            start = "{}".format(settings.FY_START)
            start = datetime.strptime(start, "%m-%d").date()
            end = "{}".format(settings.FY_END)
            end = datetime.strptime(end, "%m-%d").date()
            self._fy_wraps = end < start
        return self._fy_wraps
    def _determine_fy(self, date):
        """Return a (query, display) financial year tuple of the given date."""
        if self._fy_does_wrap():
            fy_start = self._fy_start(date.year)
            if date.date() >= fy_start:
                query = "{}-{}".format(date.year, date.year + 1)
            else:
                query = "{}-{}".format(date.year - 1, date.year)
            # To keep it simple we use the same string for both
            # query parameter and the display.
            return (query, query)
        else:
            query = "{0}-{0}".format(date.year)
            display = "{}".format(date.year)
            return (query, display)
    def lookups(self, request, model_admin):
        if not settings.FY_START or not settings.FY_END:
            return None
        r = []
        for document in Document.objects.all():
            r.append(self._determine_fy(document.created))
        return sorted(set(r), key=lambda x: x[0], reverse=True)
    def queryset(self, request, queryset):
        if not self.value() or not settings.FY_START or not settings.FY_END:
            return None
        start, end = self.value().split("-")
        return queryset.filter(created__gte=self._fy_start(start),
                               created__lte=self._fy_end(end))
 class CommonAdmin(admin.ModelAdmin):
    list_per_page = settings.PAPERLESS_LIST_PER_PAGE
 class CorrespondentAdmin(CommonAdmin):
    list_display = ("name", "match", "matching_algorithm")
    list_filter = ("matching_algorithm",)
    list_editable = ("match", "matching_algorithm")
-class TagAdmin(CommonAdmin):
+class TagAdmin(admin.ModelAdmin):
    list_display = ("name", "colour", "match", "matching_algorithm")
    list_filter = ("colour", "matching_algorithm")
    list_editable = ("colour", "match", "matching_algorithm")
-class DocumentAdmin(CommonAdmin):
+class DocumentAdmin(admin.ModelAdmin):
    class Media:
        css = {
@@ -125,34 +53,12 @@ class DocumentAdmin(CommonAdmin):
        }
    search_fields = ("correspondent__name", "title", "content")
-    list_display = ("title", "created", "thumbnail", "correspondent", "tags_")
+    list_display = ("created", "correspondent", "title", "tags_", "document")
-    list_filter = ("tags", "correspondent", FinancialYearFilter,
+    list_filter = ("tags", "correspondent", MonthListFilter)
-                   MonthListFilter)
+    list_per_page = 25
    ordering = ["-created", "correspondent"]
    def has_add_permission(self, request):
        return False
    def created_(self, obj):
        return obj.created.date().strftime("%Y-%m-%d")
    created_.short_description = "Created"
    def thumbnail(self, obj):
        if settings.FORCE_SCRIPT_NAME:
            src_link = "{}/fetch/thumb/{}".format(
                settings.FORCE_SCRIPT_NAME, obj.id)
        else:
            src_link = "/fetch/thumb/{}".format(obj.id)
        png_img = self._html_tag(
            "img",
            src=src_link,
            width=180,
            alt="Thumbnail of {}".format(obj.file_name),
            title=obj.file_name
        )
        return self._html_tag("a", png_img, href=obj.download_url)
    thumbnail.allow_tags = True
    def tags_(self, obj):
        r = ""
@@ -202,7 +108,7 @@ class DocumentAdmin(CommonAdmin):
        return "<{} {}/>".format(kind, " ".join(attributes))
-class LogAdmin(CommonAdmin):
+class LogAdmin(admin.ModelAdmin):
    list_display = ("created", "message", "level",)
    list_filter = ("level", "created",)
--- a/src/documents/consumer.py
+++ b/src/documents/consumer.py
@@ -1,21 +1,35 @@
 import datetime
 import hashlib
 import logging
 import os
 import re
 import uuid
 import shutil
 import hashlib
 import logging
 import datetime
 import tempfile
 import itertools
 import subprocess
 from multiprocessing.pool import Pool
 import pyocr
 import langdetect
 from PIL import Image
 from django.conf import settings
 from django.utils import timezone
 from paperless.db import GnuPG
 from pyocr.tesseract import TesseractError
 from pyocr.libtesseract.tesseract_raw import \
    TesseractError as OtherTesseractError
-from .models import Document, FileInfo, Tag
+from .models import Tag, Document, FileInfo
 from .parsers import ParseError
 from .signals import (
-    document_consumer_declaration,
+    document_consumption_started,
-    document_consumption_finished,
+    document_consumption_finished
    document_consumption_started
 )
 from .languages import ISO639
 class OCRError(Exception):
    pass
 class ConsumerError(Exception):
@@ -33,7 +47,13 @@ class Consumer(object):
    """
    SCRATCH = settings.SCRATCH_DIR
    CONVERT = settings.CONVERT_BINARY
    UNPAPER = settings.UNPAPER_BINARY
    CONSUME = settings.CONSUMPTION_DIR
    THREADS = int(settings.OCR_THREADS) if settings.OCR_THREADS else None
    DENSITY = settings.CONVERT_DENSITY if settings.CONVERT_DENSITY else 300
    DEFAULT_OCR_LANGUAGE = settings.OCR_LANGUAGE
    def __init__(self):
@@ -58,16 +78,6 @@ class Consumer(object):
            raise ConsumerError(
                "Consumption directory {} does not exist".format(self.CONSUME))
        self.parsers = []
        for response in document_consumer_declaration.send(self):
            self.parsers.append(response[1])
        if not self.parsers:
            raise ConsumerError(
                "No parsers could be found, not even the default.  "
                "This is a problem."
            )
    def log(self, level, message):
        getattr(self.logger, level)(message, extra={
            "group": self.logging_group
@@ -99,13 +109,6 @@ class Consumer(object):
                self._ignore.append(doc)
                continue
            parser_class = self._get_parser_class(doc)
            if not parser_class:
                self.log(
                    "error", "No parsers could be found for {}".format(doc))
                self._ignore.append(doc)
                continue
            self.logging_group = uuid.uuid4()
            self.log("info", "Consuming {}".format(doc))
@@ -116,26 +119,25 @@ class Consumer(object):
                logging_group=self.logging_group
            )
-            parsed_document = parser_class(doc)
+            tempdir = tempfile.mkdtemp(prefix="paperless", dir=self.SCRATCH)
-            thumbnail = parsed_document.get_thumbnail()
+            imgs = self._get_greyscale(tempdir, doc)
            thumbnail = self._get_thumbnail(tempdir, doc)
            try:
-                document = self._store(
+
-                    parsed_document.get_text(),
+                document = self._store(self._get_ocr(imgs), doc, thumbnail)
-                    doc,
+
-                    thumbnail
+            except OCRError as e:
                )
            except ParseError as e:
                self._ignore.append(doc)
-                self.log("error", "PARSE FAILURE for {}: {}".format(doc, e))
+                self.log("error", "OCR FAILURE for {}: {}".format(doc, e))
-                parsed_document.cleanup()
+                self._cleanup_tempdir(tempdir)
                continue
            else:
-                parsed_document.cleanup()
+                self._cleanup_tempdir(tempdir)
                self._cleanup_doc(doc)
                self.log(
@@ -149,30 +151,142 @@ class Consumer(object):
                    logging_group=self.logging_group
                )
-    def _get_parser_class(self, doc):
+    def _get_greyscale(self, tempdir, doc):
        """
-        Determine the appropriate parser class based on the file
+        Greyscale images are easier for Tesseract to OCR
        """
-        options = []
+        self.log("info", "Generating greyscale image from {}".format(doc))
        for parser in self.parsers:
            result = parser(doc)
            if result:
                options.append(result)
-        self.log(
+        # Convert PDF to multiple PNMs
-            "info",
+        pnm = os.path.join(tempdir, "convert-%04d.pnm")
-            "Parsers available: {}".format(
+        run_convert(
-                ", ".join([str(o["parser"].__name__) for o in options])
+            self.CONVERT,
-            )
+            "-density", str(self.DENSITY),
            "-depth", "8",
            "-type", "grayscale",
            doc, pnm,
        )
-        if not options:
+        # Get a list of converted images
-            return None
+        pnms = []
        for f in os.listdir(tempdir):
            if f.endswith(".pnm"):
                pnms.append(os.path.join(tempdir, f))
-        # Return the parser with the highest weight.
+        # Run unpaper in parallel on converted images
-        return sorted(
+        with Pool(processes=self.THREADS) as pool:
-            options, key=lambda _: _["weight"], reverse=True)[0]["parser"]
+            pool.map(run_unpaper, itertools.product([self.UNPAPER], pnms))
        # Return list of converted images, processed with unpaper
        pnms = []
        for f in os.listdir(tempdir):
            if f.endswith(".unpaper.pnm"):
                pnms.append(os.path.join(tempdir, f))
        return sorted(filter(lambda __: os.path.isfile(__), pnms))
    def _get_thumbnail(self, tempdir, doc):
        """
        The thumbnail of a PDF is just a 500px wide image of the first page.
        """
        self.log("info", "Generating the thumbnail")
        run_convert(
            self.CONVERT,
            "-scale", "500x5000",
            "-alpha", "remove",
            doc, os.path.join(tempdir, "convert-%04d.png")
        )
        return os.path.join(tempdir, "convert-0000.png")
    def _guess_language(self, text):
        try:
            guess = langdetect.detect(text)
            self.log("debug", "Language detected: {}".format(guess))
            return guess
        except Exception as e:
            self.log("warning", "Language detection error: {}".format(e))
    def _get_ocr(self, imgs):
        """
        Attempts to do the best job possible OCR'ing the document based on
        simple language detection trial & error.
        """
        if not imgs:
            raise OCRError("No images found")
        self.log("info", "OCRing the document")
        # Since the division gets rounded down by int, this calculation works
        # for every edge-case, i.e. 1
        middle = int(len(imgs) / 2)
        raw_text = self._ocr([imgs[middle]], self.DEFAULT_OCR_LANGUAGE)
        guessed_language = self._guess_language(raw_text)
        if not guessed_language or guessed_language not in ISO639:
            self.log("warning", "Language detection failed!")
            if settings.FORGIVING_OCR:
                self.log(
                    "warning",
                    "As FORGIVING_OCR is enabled, we're going to make the "
                    "best with what we have."
                )
                raw_text = self._assemble_ocr_sections(imgs, middle, raw_text)
                return raw_text
            raise OCRError("Language detection failed")
        if ISO639[guessed_language] == self.DEFAULT_OCR_LANGUAGE:
            raw_text = self._assemble_ocr_sections(imgs, middle, raw_text)
            return raw_text
        try:
            return self._ocr(imgs, ISO639[guessed_language])
        except pyocr.pyocr.tesseract.TesseractError:
            if settings.FORGIVING_OCR:
                self.log(
                    "warning",
                    "OCR for {} failed, but we're going to stick with what "
                    "we've got since FORGIVING_OCR is enabled.".format(
                        guessed_language
                    )
                )
                raw_text = self._assemble_ocr_sections(imgs, middle, raw_text)
                return raw_text
            raise OCRError(
                "The guessed language is not available in this instance of "
                "Tesseract."
            )
    def _assemble_ocr_sections(self, imgs, middle, text):
        """
        Given a `middle` value and the text that middle page represents, we OCR
        the remainder of the document and return the whole thing.
        """
        text = self._ocr(imgs[:middle], self.DEFAULT_OCR_LANGUAGE) + text
        text += self._ocr(imgs[middle + 1:], self.DEFAULT_OCR_LANGUAGE)
        return text
    def _ocr(self, imgs, lang):
        """
        Performs a single OCR attempt.
        """
        if not imgs:
            return ""
        self.log("info", "Parsing for {}".format(lang))
        with Pool(processes=self.THREADS) as pool:
            r = pool.map(image_to_string, itertools.product(imgs, [lang]))
            r = " ".join(r)
        # Strip out excess white space to allow matching to go smoother
        return strip_excess_whitespace(r)
    def _store(self, text, doc, thumbnail):
@@ -218,6 +332,10 @@ class Consumer(object):
        return document
    def _cleanup_tempdir(self, d):
        self.log("debug", "Deleting directory {}".format(d))
        shutil.rmtree(d)
    def _cleanup_doc(self, doc):
        self.log("debug", "Deleting document {}".format(doc))
        os.unlink(doc)
@@ -243,3 +361,41 @@ class Consumer(object):
        with open(doc, "rb") as f:
            checksum = hashlib.md5(f.read()).hexdigest()
        return Document.objects.filter(checksum=checksum).exists()
 def strip_excess_whitespace(text):
    collapsed_spaces = re.sub(r"([^\S\r\n]+)", " ", text)
    no_leading_whitespace = re.sub(
        "([\n\r]+)([^\S\n\r]+)", '\\1', collapsed_spaces)
    no_trailing_whitespace = re.sub("([^\S\n\r]+)$", '', no_leading_whitespace)
    return no_trailing_whitespace
 def image_to_string(args):
    img, lang = args
    ocr = pyocr.get_available_tools()[0]
    with Image.open(os.path.join(Consumer.SCRATCH, img)) as f:
        if ocr.can_detect_orientation():
            try:
                orientation = ocr.detect_orientation(f, lang=lang)
                f = f.rotate(orientation["angle"], expand=1)
            except (TesseractError, OtherTesseractError):
                pass
        return ocr.image_to_string(f, lang=lang)
 def run_unpaper(args):
    unpaper, pnm = args
    subprocess.Popen(
        (unpaper, pnm, pnm.replace(".pnm", ".unpaper.pnm"))).wait()
 def run_convert(*args):
    environment = os.environ.copy()
    if settings.CONVERT_MEMORY_LIMIT:
        environment["MAGICK_MEMORY_LIMIT"] = settings.CONVERT_MEMORY_LIMIT
    if settings.CONVERT_TMPDIR:
        environment["MAGICK_TMPDIR"] = settings.CONVERT_TMPDIR
    subprocess.Popen(args, env=environment).wait()
--- a/src/documents/filters.py
+++ b/src/documents/filters.py
@@ -8,7 +8,7 @@ class CorrespondentFilterSet(FilterSet):
    class Meta(object):
        model = Correspondent
        fields = {
-            "name": [
+            'name': [
                "startswith", "endswith", "contains",
                "istartswith", "iendswith", "icontains"
            ],
@@ -21,7 +21,7 @@ class TagFilterSet(FilterSet):
    class Meta(object):
        model = Tag
        fields = {
-            "name": [
+            'name': [
                "startswith", "endswith", "contains",
                "istartswith", "iendswith", "icontains"
            ],
--- a/src/documents/forms.py
+++ b/src/documents/forms.py
@@ -2,6 +2,7 @@ import magic
 import os
 from datetime import datetime
 from hashlib import sha256
 from time import mktime
 from django import forms
@@ -13,6 +14,7 @@ from .consumer import Consumer
 class UploadForm(forms.Form):
    SECRET = settings.SHARED_SECRET
    TYPE_LOOKUP = {
        "application/pdf": Document.TYPE_PDF,
        "image/png": Document.TYPE_PNG,
@@ -30,9 +32,10 @@ class UploadForm(forms.Form):
        required=False
    )
    document = forms.FileField()
    signature = forms.CharField(max_length=256)
    def __init__(self, *args, **kwargs):
-        forms.Form.__init__(self, *args, **kwargs)
+        forms.Form.__init__(*args, **kwargs)
        self._file_type = None
    def clean_correspondent(self):
@@ -79,6 +82,17 @@ class UploadForm(forms.Form):
        return document
    def clean(self):
        corresp = self.clened_data.get("correspondent")
        title = self.cleaned_data.get("title")
        signature = self.cleaned_data.get("signature")
        if sha256(corresp + title + self.SECRET).hexdigest() == signature:
            return self.cleaned_data
        raise forms.ValidationError("The signature provided did not validate")
    def save(self):
        """
        Since the consumer already does a lot of work, it's easier just to save
@@ -86,11 +100,11 @@ class UploadForm(forms.Form):
        form do that as well.  Think of it as a poor-man's queue server.
        """
-        correspondent = self.cleaned_data.get("correspondent")
+        correspondent = self.clened_data.get("correspondent")
        title = self.cleaned_data.get("title")
        document = self.cleaned_data.get("document")
-        t = int(mktime(datetime.now().timetuple()))
+        t = int(mktime(datetime.now()))
        file_name = os.path.join(
            Consumer.CONSUME,
            "{} - {}.{}".format(correspondent, title, self._file_type)
--- a/src/paperless_tesseract/languages.py
+++ b/src/paperless_tesseract/languages.py
--- a/src/documents/mail.py
+++ b/src/documents/mail.py
@@ -43,10 +43,7 @@ class Message(Loggable):
    and n attachments, and that we don't care about the message body.
    """
-    SECRET = os.getenv(
+    SECRET = settings.SHARED_SECRET
        "PAPERLESS_EMAIL_SECRET",
        os.getenv("PAPERLESS_SHARED_SECRET")  # TODO: Remove after 2017/09
    )
    def __init__(self, data, group=None):
        """
@@ -156,11 +153,11 @@ class MailFetcher(Loggable):
        Loggable.__init__(self)
        self._connection = None
-        self._host = os.getenv("PAPERLESS_CONSUME_MAIL_HOST")
+        self._host = settings.MAIL_CONSUMPTION["HOST"]
-        self._port = os.getenv("PAPERLESS_CONSUME_MAIL_PORT")
+        self._port = settings.MAIL_CONSUMPTION["PORT"]
-        self._username = os.getenv("PAPERLESS_CONSUME_MAIL_USER")
+        self._username = settings.MAIL_CONSUMPTION["USERNAME"]
-        self._password = os.getenv("PAPERLESS_CONSUME_MAIL_PASS")
+        self._password = settings.MAIL_CONSUMPTION["PASSWORD"]
-        self._inbox = os.getenv("PAPERLESS_CONSUME_MAIL_INBOX", "INBOX")
+        self._inbox = settings.MAIL_CONSUMPTION["INBOX"]
        self._enabled = bool(self._host)
@@ -222,7 +219,7 @@ class MailFetcher(Loggable):
        if not login[0] == "OK":
            raise MailFetcherError("Can't log into mail: {}".format(login[1]))
-        inbox = self._connection.select(self._inbox)
+        inbox = self._connection.select("INBOX")
        if not inbox[0] == "OK":
            raise MailFetcherError("Can't find the inbox: {}".format(inbox[1]))
--- a/src/documents/management/commands/document_consumer.py
+++ b/src/documents/management/commands/document_consumer.py
@@ -28,7 +28,6 @@ class Command(BaseCommand):
        self.file_consumer = None
        self.mail_fetcher = None
        self.first_iteration = True
        BaseCommand.__init__(self, *args, **kwargs)
@@ -67,9 +66,6 @@ class Command(BaseCommand):
        self.file_consumer.consume()
        # Occasionally fetch mail and store it to be consumed on the next loop
        # We fetch email when we first start up so that it is not necessary to
        # wait for 10 minutes after making changes to the config file.
        delta = self.mail_fetcher.last_checked + self.MAIL_DELTA
-        if self.first_iteration or delta < datetime.datetime.now():
+        if delta < datetime.datetime.now():
            self.first_iteration = False
            self.mail_fetcher.pull()
--- a/src/documents/management/commands/document_exporter.py
+++ b/src/documents/management/commands/document_exporter.py
@@ -10,7 +10,6 @@ from documents.models import Document, Correspondent, Tag
 from paperless.db import GnuPG
 from ...mixins import Renderable
 from documents.settings import EXPORTER_FILE_NAME, EXPORTER_THUMBNAIL_NAME
 class Command(Renderable, BaseCommand):
@@ -62,24 +61,15 @@ class Command(Renderable, BaseCommand):
            document = document_map[document_dict["pk"]]
-            file_target = os.path.join(self.target, document.file_name)
+            target = os.path.join(self.target, document.file_name)
            document_dict["__exported_file_name__"] = target
-            thumbnail_name = document.file_name + "-thumbnail.png"
+            print("Exporting: {}".format(target))
            thumbnail_target = os.path.join(self.target, thumbnail_name)
-            document_dict[EXPORTER_FILE_NAME] = document.file_name
+            with open(target, "wb") as f:
            document_dict[EXPORTER_THUMBNAIL_NAME] = thumbnail_name
            print("Exporting: {}".format(file_target))
            t = int(time.mktime(document.created.timetuple()))
            with open(file_target, "wb") as f:
                f.write(GnuPG.decrypted(document.source_file))
-                os.utime(file_target, times=(t, t))
+                t = int(time.mktime(document.created.timetuple()))
-
+                os.utime(target, times=(t, t))
            with open(thumbnail_target, "wb") as f:
                f.write(GnuPG.decrypted(document.thumbnail_file))
                os.utime(thumbnail_target, times=(t, t))
        manifest += json.loads(
            serializers.serialize("json", Correspondent.objects.all()))
--- a/src/documents/management/commands/document_importer.py
+++ b/src/documents/management/commands/document_importer.py
@@ -10,8 +10,6 @@ from paperless.db import GnuPG
 from ...mixins import Renderable
 from documents.settings import EXPORTER_FILE_NAME, EXPORTER_THUMBNAIL_NAME
 class Command(Renderable, BaseCommand):
@@ -72,13 +70,13 @@ class Command(Renderable, BaseCommand):
            if not record["model"] == "documents.document":
                continue
-            if EXPORTER_FILE_NAME not in record:
+            if "__exported_file_name__" not in record:
                raise CommandError(
                    'The manifest file contains a record which does not '
                    'refer to an actual document file.'
                )
-            doc_file = record[EXPORTER_FILE_NAME]
+            doc_file = record["__exported_file_name__"]
            if not os.path.exists(os.path.join(self.source, doc_file)):
                raise CommandError(
                    'The manifest file refers to "{}" which does not '
@@ -92,21 +90,10 @@ class Command(Renderable, BaseCommand):
            if not record["model"] == "documents.document":
                continue
-            doc_file = record[EXPORTER_FILE_NAME]
+            doc_file = record["__exported_file_name__"]
            thumb_file = record[EXPORTER_THUMBNAIL_NAME]
            document = Document.objects.get(pk=record["pk"])
-
+            with open(doc_file, "rb") as unencrypted:
            document_path = os.path.join(self.source, doc_file)
            thumbnail_path = os.path.join(self.source, thumb_file)
            with open(document_path, "rb") as unencrypted:
                with open(document.source_path, "wb") as encrypted:
                    print("Encrypting {} and saving it to {}".format(
                        doc_file, document.source_path))
                    encrypted.write(GnuPG.encrypted(unencrypted))
            with open(thumbnail_path, "rb") as unencrypted:
                with open(document.thumbnail_path, "wb") as encrypted:
                    print("Encrypting {} and saving it to {}".format(
                        thumb_file, document.thumbnail_path))
                    encrypted.write(GnuPG.encrypted(unencrypted))
--- a/src/documents/managers.py
+++ b/src/documents/managers.py
@@ -50,7 +50,7 @@ class GroupConcat(models.Aggregate):
    def _get_template(self, separator):
        if self.engine == self.ENGINE_MYSQL:
-            return "%(function)s(%(expressions)s SEPARATOR '{}')".format(
+            return "%(function)s(%(expressions)s, SEPARATOR '{}')".format(
                separator)
        return "%(function)s(%(expressions)s, '{}')".format(separator)
--- a/src/documents/migrations/0001_initial.py
+++ b/src/documents/migrations/0001_initial.py
@@ -3,7 +3,6 @@
 from __future__ import unicode_literals
 from django.db import migrations, models
 from django.conf import settings
 class Migration(migrations.Migration):
@@ -20,7 +19,7 @@ class Migration(migrations.Migration):
                ('id', models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
                ('sender', models.CharField(blank=True, db_index=True, max_length=128)),
                ('title', models.CharField(blank=True, db_index=True, max_length=128)),
-                ('content', models.TextField(db_index=("mysql" not in settings.DATABASES["default"]["ENGINE"]))),
+                ('content', models.TextField(db_index=True)),
                ('created', models.DateTimeField(auto_now_add=True)),
                ('modified', models.DateTimeField(auto_now=True)),
            ],
--- a/src/documents/migrations/0003_sender.py
+++ b/src/documents/migrations/0003_sender.py
@@ -47,11 +47,7 @@ class Migration(migrations.Migration):
            ],
        ),
        migrations.RunPython(move_sender_strings_to_sender_model),
-        migrations.RemoveField(
+        migrations.AlterField(
            model_name='document',
            name='sender',
        ),
        migrations.AddField(
            model_name='document',
            name='sender',
            field=models.ForeignKey(blank=True, on_delete=django.db.models.deletion.CASCADE, to='documents.Sender'),
--- a/src/documents/migrations/0012_auto_20160305_0040.py
+++ b/src/documents/migrations/0012_auto_20160305_0040.py
@@ -38,9 +38,6 @@ class GnuPG(object):
 def move_documents_and_create_thumbnails(apps, schema_editor):
    os.makedirs(os.path.join(settings.MEDIA_ROOT, "documents", "originals"), exist_ok=True)
    os.makedirs(os.path.join(settings.MEDIA_ROOT, "documents", "thumbnails"), exist_ok=True)
    documents = os.listdir(os.path.join(settings.MEDIA_ROOT, "documents"))
    if set(documents) == {"originals", "thumbnails"}:
--- a/src/documents/migrations/0016_auto_20170325_1558.py
+++ b/src/documents/migrations/0016_auto_20170325_1558.py
@@ -1,20 +0,0 @@
 # -*- coding: utf-8 -*-
 # Generated by Django 1.10.5 on 2017-03-25 15:58
 from __future__ import unicode_literals
 from django.db import migrations, models
 class Migration(migrations.Migration):
    dependencies = [
        ('documents', '0015_add_insensitive_to_match'),
    ]
    operations = [
        migrations.AlterField(
            model_name='document',
            name='content',
            field=models.TextField(blank=True, db_index=True, help_text='The raw, text-only data of the document.  This field is primarily used for searching.'),
        ),
    ]
--- a/src/documents/migrations/0017_auto_20170512_0507.py
+++ b/src/documents/migrations/0017_auto_20170512_0507.py
@@ -1,25 +0,0 @@
 # -*- coding: utf-8 -*-
 # Generated by Django 1.10.5 on 2017-05-12 05:07
 from __future__ import unicode_literals
 from django.db import migrations, models
 class Migration(migrations.Migration):
    dependencies = [
        ('documents', '0016_auto_20170325_1558'),
    ]
    operations = [
        migrations.AlterField(
            model_name='correspondent',
            name='matching_algorithm',
            field=models.PositiveIntegerField(choices=[(1, 'Any'), (2, 'All'), (3, 'Literal'), (4, 'Regular Expression'), (5, 'Fuzzy Match')], default=1, help_text='Which algorithm you want to use when matching text to the OCR\'d PDF.  Here, "any" looks for any occurrence of any word provided in the PDF, while "all" requires that every word provided appear in the PDF, albeit not in the order provided.  A "literal" match means that the text you enter must appear in the PDF exactly as you\'ve entered it, and "regular expression" uses a regex to match the PDF.  (If you don\'t know what a regex is, you probably don\'t want this option.)  Finally, a "fuzzy match" looks for words or phrases that are mostly—but not exactly—the same, which can be useful for matching against documents containg imperfections that foil accurate OCR.'),
        ),
        migrations.AlterField(
            model_name='tag',
            name='matching_algorithm',
            field=models.PositiveIntegerField(choices=[(1, 'Any'), (2, 'All'), (3, 'Literal'), (4, 'Regular Expression'), (5, 'Fuzzy Match')], default=1, help_text='Which algorithm you want to use when matching text to the OCR\'d PDF.  Here, "any" looks for any occurrence of any word provided in the PDF, while "all" requires that every word provided appear in the PDF, albeit not in the order provided.  A "literal" match means that the text you enter must appear in the PDF exactly as you\'ve entered it, and "regular expression" uses a regex to match the PDF.  (If you don\'t know what a regex is, you probably don\'t want this option.)  Finally, a "fuzzy match" looks for words or phrases that are mostly—but not exactly—the same, which can be useful for matching against documents containg imperfections that foil accurate OCR.'),
        ),
    ]
--- a/src/documents/migrations/0018_auto_20170715_1712.py
+++ b/src/documents/migrations/0018_auto_20170715_1712.py
@@ -1,21 +0,0 @@
 # -*- coding: utf-8 -*-
 # Generated by Django 1.10.5 on 2017-07-15 17:12
 from __future__ import unicode_literals
 from django.db import migrations, models
 import django.db.models.deletion
 class Migration(migrations.Migration):
    dependencies = [
        ('documents', '0017_auto_20170512_0507'),
    ]
    operations = [
        migrations.AlterField(
            model_name='document',
            name='correspondent',
            field=models.ForeignKey(blank=True, null=True, on_delete=django.db.models.deletion.SET_NULL, related_name='documents', to='documents.Correspondent'),
        ),
    ]
--- a/src/documents/models.py
+++ b/src/documents/models.py
@@ -1,5 +1,3 @@
 # coding=utf-8
 import dateutil.parser
 import logging
 import os
@@ -7,7 +5,6 @@ import re
 import uuid
 from collections import OrderedDict
 from fuzzywuzzy import fuzz
 from django.conf import settings
 from django.core.urlresolvers import reverse
@@ -24,13 +21,11 @@ class MatchingModel(models.Model):
    MATCH_ALL = 2
    MATCH_LITERAL = 3
    MATCH_REGEX = 4
    MATCH_FUZZY = 5
    MATCHING_ALGORITHMS = (
        (MATCH_ANY, "Any"),
        (MATCH_ALL, "All"),
        (MATCH_LITERAL, "Literal"),
        (MATCH_REGEX, "Regular Expression"),
        (MATCH_FUZZY, "Fuzzy Match"),
    )
    name = models.CharField(max_length=128, unique=True)
@@ -47,11 +42,8 @@ class MatchingModel(models.Model):
            "provided appear in the PDF, albeit not in the order provided.  A "
            "\"literal\" match means that the text you enter must appear in "
            "the PDF exactly as you've entered it, and \"regular expression\" "
-            "uses a regex to match the PDF.  (If you don't know what a regex "
+            "uses a regex to match the PDF.  If you don't know what a regex "
-            "is, you probably don't want this option.)  Finally, a \"fuzzy "
+            "is, you probably don't want this option."
            "match\" looks for words or phrases that are mostly—but not "
            "exactly—the same, which can be useful for matching against "
            "documents containg imperfections that foil accurate OCR."
        )
    )
@@ -91,7 +83,7 @@ class MatchingModel(models.Model):
            search_kwargs = {"flags": re.IGNORECASE}
        if self.matching_algorithm == self.MATCH_ALL:
-            for word in self._split_match():
+            for word in self.match.split(" "):
                search_result = re.search(
                    r"\b{}\b".format(word), text, **search_kwargs)
                if not search_result:
@@ -99,7 +91,7 @@ class MatchingModel(models.Model):
            return True
        if self.matching_algorithm == self.MATCH_ANY:
-            for word in self._split_match():
+            for word in self.match.split(" "):
                if re.search(r"\b{}\b".format(word), text, **search_kwargs):
                    return True
            return False
@@ -112,32 +104,8 @@ class MatchingModel(models.Model):
            return bool(re.search(
                re.compile(self.match, **search_kwargs), text))
        if self.matching_algorithm == self.MATCH_FUZZY:
            match = re.sub(r'[^\w\s]', '', self.match)
            text = re.sub(r'[^\w\s]', '', text)
            if self.is_insensitive:
                match = match.lower()
                text = text.lower()
            return True if fuzz.partial_ratio(match, text) >= 90 else False
        raise NotImplementedError("Unsupported matching algorithm")
    def _split_match(self):
        """
        Splits the match to individual keywords, getting rid of unnecessary
        spaces and grouping quoted words together.
        Example:
          '  some random  words "with   quotes  " and   spaces'
            ==>
          ["some", "random", "words", "with\s+quotes", "and", "spaces"]
        """
        findterms = re.compile(r'"([^"]+)"|(\S+)').findall
        normspace = re.compile(r"\s+").sub
        return [normspace(r"\s+", (t[0] or t[1]).strip())
                for t in findterms(self.match)]
    def save(self, *args, **kwargs):
        self.match = self.match.lower()
@@ -189,28 +157,14 @@ class Document(models.Model):
    TYPES = (TYPE_PDF, TYPE_PNG, TYPE_JPG, TYPE_GIF, TYPE_TIF,)
    correspondent = models.ForeignKey(
-        Correspondent,
+        Correspondent, blank=True, null=True, related_name="documents")
        blank=True,
        null=True,
        related_name="documents",
        on_delete=models.SET_NULL
    )
    title = models.CharField(max_length=128, blank=True, db_index=True)
-
+    content = models.TextField(db_index=True)
    content = models.TextField(
        db_index=True,
        blank=True,
        help_text="The raw, text-only data of the document.  This field is "
                  "primarily used for searching."
    )
    file_type = models.CharField(
        max_length=4,
        editable=False,
        choices=tuple([(t, t.upper()) for t in TYPES])
    )
    tags = models.ManyToManyField(
        Tag, related_name="documents", blank=True)
@@ -338,45 +292,45 @@ class FileInfo(object):
            r"(?P<correspondent>.*) - "
            r"(?P<title>.*) - "
            r"(?P<tags>[a-z0-9\-,]*)"
-            r"\.(?P<extension>pdf|jpe?g|png|gif|tiff?)$",
+            r"\.(?P<extension>pdf|jpe?g|png|gif|tiff)$",
            flags=re.IGNORECASE
        )),
        ("created-title-tags", re.compile(
            r"^(?P<created>\d\d\d\d\d\d\d\d(\d\d\d\d\d\d)?Z) - "
            r"(?P<title>.*) - "
            r"(?P<tags>[a-z0-9\-,]*)"
-            r"\.(?P<extension>pdf|jpe?g|png|gif|tiff?)$",
+            r"\.(?P<extension>pdf|jpe?g|png|gif|tiff)$",
            flags=re.IGNORECASE
        )),
        ("created-correspondent-title", re.compile(
            r"^(?P<created>\d\d\d\d\d\d\d\d(\d\d\d\d\d\d)?Z) - "
            r"(?P<correspondent>.*) - "
            r"(?P<title>.*)"
-            r"\.(?P<extension>pdf|jpe?g|png|gif|tiff?)$",
+            r"\.(?P<extension>pdf|jpe?g|png|gif|tiff)$",
            flags=re.IGNORECASE
        )),
        ("created-title", re.compile(
            r"^(?P<created>\d\d\d\d\d\d\d\d(\d\d\d\d\d\d)?Z) - "
            r"(?P<title>.*)"
-            r"\.(?P<extension>pdf|jpe?g|png|gif|tiff?)$",
+            r"\.(?P<extension>pdf|jpe?g|png|gif|tiff)$",
            flags=re.IGNORECASE
        )),
        ("correspondent-title-tags", re.compile(
            r"(?P<correspondent>.*) - "
            r"(?P<title>.*) - "
            r"(?P<tags>[a-z0-9\-,]*)"
-            r"\.(?P<extension>pdf|jpe?g|png|gif|tiff?)$",
+            r"\.(?P<extension>pdf|jpe?g|png|gif|tiff)$",
            flags=re.IGNORECASE
        )),
        ("correspondent-title", re.compile(
            r"(?P<correspondent>.*) - "
            r"(?P<title>.*)?"
-            r"\.(?P<extension>pdf|jpe?g|png|gif|tiff?)$",
+            r"\.(?P<extension>pdf|jpe?g|png|gif|tiff)$",
            flags=re.IGNORECASE
        )),
        ("title", re.compile(
            r"(?P<title>.*)"
-            r"\.(?P<extension>pdf|jpe?g|png|gif|tiff?)$",
+            r"\.(?P<extension>pdf|jpe?g|png|gif|tiff)$",
            flags=re.IGNORECASE
        ))
    ])
@@ -419,8 +373,6 @@ class FileInfo(object):
        r = extension.lower()
        if r == "jpeg":
            return "jpg"
        if r == "tif":
            return "tiff"
        return r
    @classmethod
--- a/src/documents/parsers.py
+++ b/src/documents/parsers.py
@@ -1,45 +0,0 @@
 import logging
 import shutil
 import tempfile
 from django.conf import settings
 class ParseError(Exception):
    pass
 class DocumentParser(object):
    """
    Subclass this to make your own parser.  Have a look at
    `paperless_tesseract.parsers` for inspiration.
    """
    SCRATCH = settings.SCRATCH_DIR
    def __init__(self, path):
        self.document_path = path
        self.tempdir = tempfile.mkdtemp(prefix="paperless", dir=self.SCRATCH)
        self.logger = logging.getLogger(__name__)
        self.logging_group = None
    def get_thumbnail(self):
        """
        Returns the path to a file we can use as a thumbnail for this document.
        """
        raise NotImplementedError()
    def get_text(self):
        """
        Returns the text from the document and only the text.
        """
        raise NotImplementedError()
    def log(self, level, message):
        getattr(self.logger, level)(message, extra={
            "group": self.logging_group
        })
    def cleanup(self):
        self.log("debug", "Deleting directory {}".format(self.tempdir))
        shutil.rmtree(self.tempdir)
--- a/src/documents/serialisers.py
+++ b/src/documents/serialisers.py
@@ -18,21 +18,12 @@ class TagSerializer(serializers.HyperlinkedModelSerializer):
            "id", "slug", "name", "colour", "match", "matching_algorithm")
 class CorrespondentField(serializers.HyperlinkedRelatedField):
    def get_queryset(self):
        return Correspondent.objects.all()
 class TagsField(serializers.HyperlinkedRelatedField):
    def get_queryset(self):
        return Tag.objects.all()
 class DocumentSerializer(serializers.ModelSerializer):
-    correspondent = CorrespondentField(
+    correspondent = serializers.HyperlinkedRelatedField(
-        view_name="drf:correspondent-detail", allow_null=True)
+        read_only=True, view_name="drf:correspondent-detail", allow_null=True)
-    tags = TagsField(view_name="drf:tag-detail", many=True)
+    tags = serializers.HyperlinkedRelatedField(
        read_only=True, view_name="drf:tag-detail", many=True)
    class Meta(object):
        model = Document
--- a/src/documents/settings.py
+++ b/src/documents/settings.py
@@ -1,4 +0,0 @@
 # Defines the names of file/thumbnail for the manifest
 # for exporting/importing commands
 EXPORTER_FILE_NAME = "__exported_file_name__"
 EXPORTER_THUMBNAIL_NAME = "__exported_thumbnail_name__"
--- a/src/documents/signals/init.py
+++ b/src/documents/signals/init.py
@@ -2,4 +2,3 @@ from django.dispatch import Signal
 document_consumption_started = Signal(providing_args=["filename"])
 document_consumption_finished = Signal(providing_args=["document"])
 document_consumer_declaration = Signal(providing_args=[])
--- a/src/documents/signals/handlers.py
+++ b/src/documents/signals/handlers.py
@@ -1,5 +1,6 @@
 import logging
 import os
 from subprocess import Popen
 from django.conf import settings
--- a/src/documents/static/paperless.css
+++ b/src/documents/static/paperless.css
@@ -10,14 +10,3 @@ td a.tag {
  margin: 1px;
  display: inline-block;
 }
 #result_list th.column-note {
  text-align: right;
 }
 #result_list td.field-note {
  text-align: right;
 }
 #result_list td textarea {
  width: 90%;
  height: 5em;
 }
--- a/src/documents/templates/admin/documents/document/change_form.html
+++ b/src/documents/templates/admin/documents/document/change_form.html
@@ -1,13 +0,0 @@
 {% extends 'admin/change_form.html' %}
 {% block footer %}
 	{{ block.super }}
 	{# Hack to force Django to make the created date a date input rather than `text` (the default) #}
 	<script>
 		django.jQuery(".field-created input").first().attr("type", "date")
 	</script>
 {% endblock footer %}
--- a/src/documents/templates/admin/documents/document/change_list.html
+++ b/src/documents/templates/admin/documents/document/change_list.html
@@ -1,12 +0,0 @@
 {% extends 'admin/change_list.html' %}
 {% load admin_actions from admin_list%}
 {% load result_list from hacks %}
 {% block result_list %}
 	{% if action_form and actions_on_top and cl.show_admin_actions %}{% admin_actions %}{% endif %}
 	{% result_list cl %}
 	{% if action_form and actions_on_bottom and cl.show_admin_actions %}{% admin_actions %}{% endif %}
 {% endblock %}
--- a/src/documents/templates/admin/documents/document/change_list_results.html
+++ b/src/documents/templates/admin/documents/document/change_list_results.html
@@ -1,162 +0,0 @@
 {% load i18n %}
 <style>
  .grid *, .grid *:after, .grid *:before {
    -webkit-box-sizing: border-box;
    -moz-box-sizing: border-box;
    box-sizing: border-box;
  }
  .box {
    width: 12.5%;
    padding: 1em;
    float: left;
    opacity: 0.7;
    transition: all 0.5s;
  }
  .box:hover {
    opacity: 1;
    transition: all 0.5s;
  }
  .box:last-of-type {
    padding-right: 0;
  }
  .result {
    border: 1px solid #cccccc;
    border-radius: 2%;
    overflow: hidden;
    height: 300px;
  }
  .result .header {
    padding: 5px;
    background-color: #79AEC8;
  }
  .result .header .checkbox{
    width: 5%;
    float: left;
  }
  .result .header .info {
    margin-left: 10%;
  }
  .result .header a,
  .result a.tag {
    color: #ffffff;
  }
  .result .date {
    padding: 5px;
  }
  .result .tags {
    float: left;
  }
  .result .tags a.tag {
    padding: 2px 5px;
    border-radius: 2px;
    display: inline-block;
    margin: 2px;
  }
  .result .date {
    float: right;
    color: #cccccc;
  }
  .result .image img {
    width: 100%;
  }
  .grid {
    margin-right: 260px;
  }
  .grid:after {
    content: "";
    display: table;
    clear: both;
  }
  @media (max-width: 1600px) {
    .box {
      width: 25%
    }
  }
  @media (max-width: 991px) {
    .grid {
      margin-right: 220px;
    }
    .box {
      width: 50%
    }
  }
  @media (max-width: 767px) {
    .grid {
      margin-right: 0;
    }
  }
  @media (max-width: 500px) {
    .box {
      width: 100%
    }
  }
 </style>
 {# This is just copypasta from the parent change_list_results.html file #}
 <table id="result_list">
 <thead>
 <tr>
 {% for header in result_headers %}
 <th scope="col" {{ header.class_attrib }}>
   {% if header.sortable %}
     {% if header.sort_priority > 0 %}
       <div class="sortoptions">
         <a class="sortremove" href="{{ header.url_remove }}" title="{% trans "Remove from sorting" %}"></a>
         {% if num_sorted_fields > 1 %}<span class="sortpriority" title="{% blocktrans with priority_number=header.sort_priority %}Sorting priority: {{ priority_number }}{% endblocktrans %}">{{ header.sort_priority }}</span>{% endif %}
         <a href="{{ header.url_toggle }}" class="toggle {% if header.ascending %}ascending{% else %}descending{% endif %}" title="{% trans "Toggle sorting" %}"></a>
       </div>
     {% endif %}
   {% endif %}
   <div class="text">{% if header.sortable %}<a href="{{ header.url_primary }}">{{ header.text|capfirst }}</a>{% else %}<span>{{ header.text|capfirst }}</span>{% endif %}</div>
   <div class="clear"></div>
 </th>{% endfor %}
 </tr>
 </thead>
 </table>
 {# /copypasta #}
 <div class="grid">
  {% for result in results %}
    {# 0: Checkbox #}
    {# 1: Title #}
    {# 2: Date #}
    {# 3: Image #}
    {# 4: Correspondent #}
    {# 5: Tags #}
    <div class="box">
      <div class="result">
        <div class="header">
          <div class="checkbox">{{ result.0 }}</div>
          <div class="info">
            {{ result.4 }}<br />
            {{ result.1 }}
          </div>
          <div style="clear: both;"></div>
        </div>
        <div class="tags">{{ result.5 }}</div>
        <div class="date">{{ result.2 }}</div>
        <div style="clear: both;"></div>
        <div class="image">{{ result.3 }}</div>
      </div>
    </div>
  {% endfor %}
 </div>
 <script>
  // We need to re-build the select-all functionality as the old logic pointed
  // to a table and we're using divs now.
  django.jQuery("#action-toggle").on("change", function(){
    django.jQuery(".grid .box .result .checkbox input")
      .prop("checked", this.checked);
  });
 </script>
--- a/src/documents/templates/documents/index.html
+++ b/src/documents/templates/documents/index.html
@@ -6,6 +6,5 @@
    <meta charset="utf-8">
  </head>
  <body>
 		{# One day someone (maybe even myself) is going to write a proper web front-end for Paperless, and this is where it'll start. #}
  </body>
 </html>
--- a/src/documents/templatetags/init.py
+++ b/src/documents/templatetags/init.py
--- a/src/documents/templatetags/hacks.py
+++ b/src/documents/templatetags/hacks.py
@@ -1,28 +0,0 @@
 from django.contrib.admin.templatetags.admin_list import (
    result_headers,
    result_hidden_fields,
    results
 )
 from django.template import Library
 register = Library()
@register.inclusion_tag("admin/documents/document/change_list_results.html")
 def result_list(cl):
    """
    Copy/pasted from django.contrib.admin.templatetags.admin_list just so I can
    modify the value passed to `.inclusion_tag()` in the decorator here.  There
    must be a cleaner way... right?
    """
    headers = list(result_headers(cl))
    num_sorted_fields = 0
    for h in headers:
        if h['sortable'] and h['sorted']:
            num_sorted_fields += 1
    return {'cl': cl,
            'result_hidden_fields': list(result_hidden_fields(cl)),
            'result_headers': headers,
            'num_sorted_fields': num_sorted_fields,
            'results': list(results(cl))}
--- a/src/documents/tests/factories.py
+++ b/src/documents/tests/factories.py
@@ -1,17 +0,0 @@
 import factory
 from ..models import Document, Correspondent
 class CorrespondentFactory(factory.DjangoModelFactory):
    class Meta:
        model = Correspondent
    name = factory.Faker("name")
 class DocumentFactory(factory.DjangoModelFactory):
    class Meta:
        model = Document
--- a/src/paperless_tesseract/tests/samples/no-text.png
+++ b/src/paperless_tesseract/tests/samples/no-text.png
--- a/src/documents/tests/test_consumer.py
+++ b/src/documents/tests/test_consumer.py
@@ -1,66 +1,22 @@
 import os
 from unittest import mock, skipIf
 import pyocr
 from django.test import TestCase
-from unittest import mock
+from pyocr.libtesseract.tesseract_raw import \
    TesseractError as OtherTesseractError
 from ..consumer import Consumer
 from ..models import FileInfo
-
+from ..consumer import image_to_string, strip_excess_whitespace
 class TestConsumer(TestCase):
    class DummyParser(object):
        pass
    def test__get_parser_class_1_parser(self):
        self.assertEqual(
            self._get_consumer()._get_parser_class("doc.pdf"),
            self.DummyParser
        )
    @mock.patch("documents.consumer.Consumer.CONSUME")
    @mock.patch("documents.consumer.os.makedirs")
    @mock.patch("documents.consumer.os.path.exists", return_value=True)
    @mock.patch("documents.consumer.document_consumer_declaration.send")
    def test__get_parser_class_n_parsers(self, m, *args):
        class DummyParser1(object):
            pass
        class DummyParser2(object):
            pass
        m.return_value = (
            (None, lambda _: {"weight": 0, "parser": DummyParser1}),
            (None, lambda _: {"weight": 1, "parser": DummyParser2}),
        )
        self.assertEqual(Consumer()._get_parser_class("doc.pdf"), DummyParser2)
    @mock.patch("documents.consumer.Consumer.CONSUME")
    @mock.patch("documents.consumer.os.makedirs")
    @mock.patch("documents.consumer.os.path.exists", return_value=True)
    @mock.patch("documents.consumer.document_consumer_declaration.send")
    def test__get_parser_class_0_parsers(self, m, *args):
        m.return_value = ((None, lambda _: None),)
        self.assertIsNone(Consumer()._get_parser_class("doc.pdf"))
    @mock.patch("documents.consumer.Consumer.CONSUME")
    @mock.patch("documents.consumer.os.makedirs")
    @mock.patch("documents.consumer.os.path.exists", return_value=True)
    @mock.patch("documents.consumer.document_consumer_declaration.send")
    def _get_consumer(self, m, *args):
        m.return_value = (
            (None, lambda _: {"weight": 0, "parser": self.DummyParser}),
        )
        return Consumer()
 class TestAttributes(TestCase):
    TAGS = ("tag1", "tag2", "tag3")
    EXTENSIONS = (
-        "pdf", "png", "jpg", "jpeg", "gif", "tiff", "tif",
+        "pdf", "png", "jpg", "jpeg", "gif",
-        "PDF", "PNG", "JPG", "JPEG", "GIF", "TIFF", "TIF",
+        "PDF", "PNG", "JPG", "JPEG", "GIF",
-        "PdF", "PnG", "JpG", "JPeG", "GiF", "TiFf", "TiF",
+        "PdF", "PnG", "JpG", "JPeG", "GiF",
    )
    def _test_guess_attributes_from_name(self, path, sender, title, tags):
@@ -80,8 +36,6 @@ class TestAttributes(TestCase):
            self.assertEqual(tuple([t.slug for t in file_info.tags]), tags, f)
            if extension.lower() == "jpeg":
                self.assertEqual(file_info.extension, "jpg", f)
            elif extension.lower() == "tif":
                self.assertEqual(file_info.extension, "tiff", f)
            else:
                self.assertEqual(file_info.extension, extension.lower(), f)
@@ -354,3 +308,71 @@ class TestFieldPermutations(TestCase):
                        }
                        self._test_guessed_attributes(
                            template.format(**spec), **spec)
 class FakeTesseract(object):
    @staticmethod
    def can_detect_orientation():
        return True
    @staticmethod
    def detect_orientation(file_handle, lang):
        raise OtherTesseractError("arbitrary status", "message")
    @staticmethod
    def image_to_string(file_handle, lang):
        return "This is test text"
 class FakePyOcr(object):
    @staticmethod
    def get_available_tools():
        return [FakeTesseract]
 class TestOCR(TestCase):
    text_cases = [
        ("simple     string", "simple string"),
        (
            "simple    newline\n   testing string",
            "simple newline\ntesting string"
        ),
        (
            "utf-8   строка с пробелами в конце  ",
            "utf-8 строка с пробелами в конце"
        )
    ]
    SAMPLE_FILES = os.path.join(os.path.dirname(__file__), "samples")
    TESSERACT_INSTALLED = bool(pyocr.get_available_tools())
    def test_strip_excess_whitespace(self):
        for source, result in self.text_cases:
            actual_result = strip_excess_whitespace(source)
            self.assertEqual(
                result,
                actual_result,
                "strip_exceess_whitespace({}) != '{}', but '{}'".format(
                    source,
                    result,
                    actual_result
                )
            )
    @skipIf(not TESSERACT_INSTALLED, "Tesseract not installed. Skipping")
    @mock.patch("documents.consumer.Consumer.SCRATCH", SAMPLE_FILES)
    @mock.patch("documents.consumer.pyocr", FakePyOcr)
    def test_image_to_string_with_text_free_page(self):
        """
        This test is sort of silly, since it's really just reproducing an odd
        exception thrown by pyocr when it encounters a page with no text.
        Actually running this test against an installation of Tesseract results
        in a segmentation fault rooted somewhere deep inside pyocr where I
        don't care to dig.  Regardless, if you run the consumer normally,
        text-free pages are now handled correctly so long as we work around
        this weird exception.
        """
        image_to_string(["no-text.png", "en"])
--- a/src/documents/tests/test_importer.py
+++ b/src/documents/tests/test_importer.py
@@ -3,8 +3,6 @@ from django.test import TestCase
 from ..management.commands.document_importer import Command
 from documents.settings import EXPORTER_FILE_NAME
 class TestImporter(TestCase):
@@ -29,7 +27,7 @@ class TestImporter(TestCase):
        cmd.manifest = [{
            "model": "documents.document",
-            EXPORTER_FILE_NAME: "noexist.pdf"
+            "__exported_file_name__": "noexist.pdf"
        }]
        # self.assertRaises(CommandError, cmd._check_manifest)
        with self.assertRaises(CommandError) as cm:
--- a/src/documents/tests/test_matchables.py
+++ b/src/documents/tests/test_matchables.py
@@ -1,6 +1,6 @@
 from random import randint
-from django.test import TestCase, override_settings
+from django.test import TestCase
 from ..models import Correspondent, Document, Tag
 from ..signals import document_consumption_finished
@@ -16,15 +16,9 @@ class TestMatching(TestCase):
                matching_algorithm=getattr(klass, algorithm)
            )
            for string in true:
-                self.assertTrue(
+                self.assertTrue(instance.matches(string))
                    instance.matches(string),
                    '"%s" should match "%s" but it does not' % (text, string)
                )
            for string in false:
-                self.assertFalse(
+                self.assertFalse(instance.matches(string))
                    instance.matches(string),
                    '"%s" should not match "%s" but it does' % (text, string)
                )
    def test_match_all(self):
@@ -60,21 +54,6 @@ class TestMatching(TestCase):
            )
        )
        self._test_matching(
            'brown fox "lazy dogs"',
            "MATCH_ALL",
            (
                "the quick brown fox jumped over the lazy dogs",
                "the quick brown fox jumped over the lazy  dogs",
            ),
            (
                "the quick fox jumped over the lazy dogs",
                "the quick brown wolf jumped over the lazy dogs",
                "the quick brown fox jumped over the fat dogs",
                "the quick brown fox jumped over the lazy... dogs",
            )
        )
    def test_match_any(self):
        self._test_matching(
@@ -110,18 +89,6 @@ class TestMatching(TestCase):
            )
        )
        self._test_matching(
            '"brown fox" " lazy  dogs "',
            "MATCH_ANY",
            (
                "the quick brown fox",
                "jumped over the lazy  dogs.",
            ),
            (
                "the lazy fox jumped over the brown dogs",
            )
        )
    def test_match_literal(self):
        self._test_matching(
@@ -182,25 +149,8 @@ class TestMatching(TestCase):
            )
        )
    def test_match_fuzzy(self):
-        self._test_matching(
+class TestApplications(TestCase):
            "Springfield, Miss.",
            "MATCH_FUZZY",
            (
                "1220 Main Street, Springf eld, Miss.",
                "1220 Main Street, Spring field, Miss.",
                "1220 Main Street, Springfeld, Miss.",
                "1220 Main Street Springfield Miss",
            ),
            (
                "1220 Main Street, Springfield, Mich.",
            )
        )
@override_settings(POST_CONSUME_SCRIPT=None)
 class TestDocumentConsumptionFinishedSignal(TestCase):
    """
    We make use of document_consumption_finished, so we should test that it's
    doing what we expect wrt to tag & correspondent matching.
--- a/src/documents/tests/test_models.py
+++ b/src/documents/tests/test_models.py
@@ -1,31 +0,0 @@
 from django.test import TestCase
 from ..models import Document, Correspondent
 from .factories import DocumentFactory, CorrespondentFactory
 class CorrespondentTestCase(TestCase):
    def test___str__(self):
        for s in ("test", "οχι", "test with fun_charÅc'\"terß"):
            correspondent = CorrespondentFactory.create(name=s)
            self.assertEqual(str(correspondent), s)
 class DocumentTestCase(TestCase):
    def test_correspondent_deletion_does_not_cascade(self):
        self.assertEqual(Correspondent.objects.all().count(), 0)
        correspondent = CorrespondentFactory.create()
        self.assertEqual(Correspondent.objects.all().count(), 1)
        self.assertEqual(Document.objects.all().count(), 0)
        DocumentFactory.create(correspondent=correspondent)
        self.assertEqual(Document.objects.all().count(), 1)
        self.assertIsNotNone(Document.objects.all().first().correspondent)
        correspondent.delete()
        self.assertEqual(Correspondent.objects.all().count(), 0)
        self.assertEqual(Document.objects.all().count(), 1)
        self.assertIsNone(Document.objects.all().first().correspondent)
--- a/src/documents/views.py
+++ b/src/documents/views.py
@@ -1,16 +1,17 @@
-from django.http import HttpResponse, HttpResponseBadRequest
+from django.contrib.auth.mixins import LoginRequiredMixin
 from django.http import HttpResponse
 from django.views.decorators.csrf import csrf_exempt
 from django.views.generic import DetailView, FormView, TemplateView
 from django_filters.rest_framework import DjangoFilterBackend
 from rest_framework.filters import SearchFilter, OrderingFilter
 from paperless.db import GnuPG
 from paperless.mixins import SessionOrBasicAuthMixin
 from paperless.views import StandardPagination
 from rest_framework.filters import OrderingFilter, SearchFilter
 from rest_framework.mixins import (
    DestroyModelMixin,
    ListModelMixin,
    RetrieveModelMixin,
    UpdateModelMixin
 )
 from rest_framework.pagination import PageNumberPagination
 from rest_framework.permissions import IsAuthenticated
 from rest_framework.viewsets import (
    GenericViewSet,
@@ -40,7 +41,7 @@ class IndexView(TemplateView):
        return TemplateView.get_context_data(self, **kwargs)
-class FetchView(SessionOrBasicAuthMixin, DetailView):
+class FetchView(LoginRequiredMixin, DetailView):
    model = Document
@@ -73,19 +74,28 @@ class FetchView(SessionOrBasicAuthMixin, DetailView):
        return response
-class PushView(SessionOrBasicAuthMixin, FormView):
+class PushView(LoginRequiredMixin, FormView):
    """
    A crude REST-ish API for creating documents.
    """
    form_class = UploadForm
    @classmethod
    def as_view(cls, **kwargs):
        return csrf_exempt(FormView.as_view(**kwargs))
    def form_valid(self, form):
-        form.save()
+        return HttpResponse("1")
        return HttpResponse("1", status=202)
    def form_invalid(self, form):
-        return HttpResponseBadRequest(str(form.errors))
+        return HttpResponse("0")
 class StandardPagination(PageNumberPagination):
    page_size = 25
    page_size_query_param = "page-size"
    max_page_size = 100000
 class CorrespondentViewSet(ModelViewSet):
--- a/src/paperless/init.py
+++ b/src/paperless/init.py
@@ -1 +1 @@
-from .checks import paths_check, binaries_check
+from .checks import paths_check
--- a/src/paperless/checks.py
+++ b/src/paperless/checks.py
@@ -1,15 +1,10 @@
 import os
 import shutil
 from django.conf import settings
 from django.core.checks import Error, register, Warning
@register()
 def paths_check(app_configs, **kwargs):
    """
    Check the various paths for existence, readability and writeability
    """
    check_messages = []
@@ -49,55 +44,4 @@ def paths_check(app_configs, **kwargs):
                    writeable_hint.format(directory)
                ))
    directory = os.getenv("PAPERLESS_STATICDIR")
    if directory:
        if not os.path.exists(directory):
            check_messages.append(Error(
                exists_message.format("PAPERLESS_STATICDIR"),
                exists_hint.format(directory)
            ))
        if not check_messages:
            if not os.access(directory, os.W_OK | os.X_OK):
                check_messages.append(Error(
                    writeable_message.format("PAPERLESS_STATICDIR"),
                    writeable_hint.format(directory)
                ))
    return check_messages
@register()
 def binaries_check(app_configs, **kwargs):
    """
    Paperless requires the existence of a few binaries, so we do some checks
    for those here.
    """
    error = "Paperless can't find {}. Without it, consumption is impossible."
    hint = "Either it's not in your ${PATH} or it's not installed."
    binaries = (settings.CONVERT_BINARY, settings.UNPAPER_BINARY, "tesseract")
    check_messages = []
    for binary in binaries:
        if shutil.which(binary) is None:
            check_messages.append(Warning(error.format(binary), hint))
    return check_messages
@register()
 def config_check(app_configs, **kwargs):
    warning = (
        "It looks like you have PAPERLESS_SHARED_SECRET defined.  Note that "
        "in the \npast, this variable was used for both API authentication "
        "and as the mail \nkeyword.  As the API no no longer uses it, this "
        "variable has been renamed to \nPAPERLESS_EMAIL_SECRET, so if you're "
        "using the mail feature, you'd best update \nyour variable name.\n\n"
        "The old variable will stop working in a few months."
    )
    if os.getenv("PAPERLESS_SHARED_SECRET"):
        return [Warning(warning)]
    return []
--- a/src/paperless/mixins.py
+++ b/src/paperless/mixins.py
@@ -1,46 +0,0 @@
 from django.contrib.auth.mixins import AccessMixin
 from django.contrib.auth import authenticate, login
 import base64
 class SessionOrBasicAuthMixin(AccessMixin):
    """
    Session or Basic Authentication mixin for Django.
    It determines if the requester is already logged in or if they have
    provided proper http-authorization and returning the view if all goes
    well, otherwise responding with a 401.
    Base for mixin found here: https://djangosnippets.org/snippets/3073/
    """
    def dispatch(self, request, *args, **kwargs):
        # check if user is authenticated via the session
        if request.user.is_authenticated:
            # Already logged in, just return the view.
            return super(SessionOrBasicAuthMixin, self).dispatch(
                request, *args, **kwargs
            )
        # apparently not authenticated via session, maybe via HTTP Basic?
        if 'HTTP_AUTHORIZATION' in request.META:
            auth = request.META['HTTP_AUTHORIZATION'].split()
            if len(auth) == 2:
                # NOTE: Support for only basic authentication
                if auth[0].lower() == "basic":
                    authString = base64.b64decode(auth[1]).decode('utf-8')
                    uname, passwd = authString.split(':')
                    user = authenticate(username=uname, password=passwd)
                    if user is not None:
                        if user.is_active:
                            login(request, user)
                            request.user = user
                            return super(
                                SessionOrBasicAuthMixin, self
                            ).dispatch(
                                request, *args, **kwargs
                            )
        # nope, really not authenticated
        return self.handle_no_permission()
--- a/src/paperless/settings.py
+++ b/src/paperless/settings.py
@@ -4,37 +4,25 @@ Django settings for paperless project.
 Generated by 'django-admin startproject' using Django 1.9.
 For more information on this file, see
-https://docs.djangoproject.com/en/1.10/topics/settings/
+https://docs.djangoproject.com/en/1.9/topics/settings/
 For the full list of settings and their values, see
-https://docs.djangoproject.com/en/1.10/ref/settings/
+https://docs.djangoproject.com/en/1.9/ref/settings/
 """
 import os
 from dotenv import load_dotenv
 # Tap paperless.conf if it's available
 if os.path.exists("/etc/paperless.conf"):
    load_dotenv("/etc/paperless.conf")
 # Build paths inside the project like this: os.path.join(BASE_DIR, ...)
 BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
 # Quick-start development settings - unsuitable for production
-# See https://docs.djangoproject.com/en/1.10/howto/deployment/checklist/
+# See https://docs.djangoproject.com/en/1.9/howto/deployment/checklist/
 # The secret key has a default that should be fine so long as you're hosting
 # Paperless on a closed network.  However, if you're putting this anywhere
 # public, you should change the key to something unique and verbose.
 SECRET_KEY = os.getenv(
    "PAPERLESS_SECRET_KEY",
    "e11fl1oa-*ytql8p)(06fbj4ukrlo+n7k&q5+$1md7i+mge=ee"
 )
 # SECURITY WARNING: keep the secret key used in production secret!
 SECRET_KEY = 'e11fl1oa-*ytql8p)(06fbj4ukrlo+n7k&q5+$1md7i+mge=ee'
 # SECURITY WARNING: don't run with debug turned on in production!
 DEBUG = True
@@ -44,39 +32,34 @@ LOGIN_URL = '/admin/login'
 ALLOWED_HOSTS = ["*"]
 _allowed_hosts = os.getenv("PAPERLESS_ALLOWED_HOSTS")
-if _allowed_hosts:
+if allowed_hosts:
    ALLOWED_HOSTS = _allowed_hosts.split(",")
-FORCE_SCRIPT_NAME = os.getenv("PAPERLESS_FORCE_SCRIPT_NAME")
+# Tap paperless.conf if it's available
-    
+if os.path.exists("/etc/paperless.conf"):
    load_dotenv("/etc/paperless.conf")
 # Application definition
 INSTALLED_APPS = [
-    "django.contrib.auth",
+    'django.contrib.admin',
-    "django.contrib.contenttypes",
+    'django.contrib.auth',
-    "django.contrib.sessions",
+    'django.contrib.contenttypes',
-    "django.contrib.messages",
+    'django.contrib.sessions',
-    "django.contrib.staticfiles",
+    'django.contrib.messages',
    'django.contrib.staticfiles',
    "django_extensions",
    "documents.apps.DocumentsConfig",
    "reminders.apps.RemindersConfig",
    "paperless_tesseract.apps.PaperlessTesseractConfig",
    "flat_responsive",
    "django.contrib.admin",
    "rest_framework",
    "crispy_forms",
    "django_filters"
 ]
 if os.getenv("PAPERLESS_INSTALLED_APPS"):
    INSTALLED_APPS += os.getenv("PAPERLESS_INSTALLED_APPS").split(",")
 MIDDLEWARE_CLASSES = [
    'django.middleware.security.SecurityMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
@@ -110,7 +93,7 @@ WSGI_APPLICATION = 'paperless.wsgi.application'
 # Database
-# https://docs.djangoproject.com/en/1.10/ref/settings/#databases
+# https://docs.djangoproject.com/en/1.9/ref/settings/#databases
 DATABASES = {
    "default": {
@@ -135,7 +118,7 @@ if os.getenv("PAPERLESS_DBUSER") and os.getenv("PAPERLESS_DBPASS"):
 # Password validation
-# https://docs.djangoproject.com/en/1.10/ref/settings/#auth-password-validators
+# https://docs.djangoproject.com/en/1.9/ref/settings/#auth-password-validators
 AUTH_PASSWORD_VALIDATORS = [
    {
@@ -154,11 +137,11 @@ AUTH_PASSWORD_VALIDATORS = [
 # Internationalization
-# https://docs.djangoproject.com/en/1.10/topics/i18n/
+# https://docs.djangoproject.com/en/1.9/topics/i18n/
 LANGUAGE_CODE = 'en-us'
-TIME_ZONE = os.getenv("PAPERLESS_TIME_ZONE", "UTC")
+TIME_ZONE = 'UTC'
 USE_I18N = True
@@ -168,10 +151,9 @@ USE_TZ = True
 # Static files (CSS, JavaScript, Images)
-# https://docs.djangoproject.com/en/1.10/howto/static-files/
+# https://docs.djangoproject.com/en/1.9/howto/static-files/
-STATIC_ROOT = os.getenv(
+STATIC_ROOT = os.path.join(BASE_DIR, "..", "static")
    "PAPERLESS_STATICDIR", os.path.join(BASE_DIR, "..", "static"))
 MEDIA_ROOT = os.getenv(
    "PAPERLESS_MEDIADIR", os.path.join(BASE_DIR, "..", "media"))
@@ -205,7 +187,7 @@ LOGGING = {
 # The default language that tesseract will attempt to use when parsing
 # documents.  It should be a 3-letter language code consistent with ISO 639.
-OCR_LANGUAGE = os.getenv("PAPERLESS_OCR_LANGUAGE", "eng")
+OCR_LANGUAGE = "eng"
 # The amount of threads to use for OCR
 OCR_THREADS = os.getenv("PAPERLESS_OCR_THREADS")
@@ -238,6 +220,18 @@ CONSUMPTION_DIR = os.getenv("PAPERLESS_CONSUMPTION_DIR")
 # slowly, you may want to use a higher value than the default.
 CONSUMER_LOOP_TIME = int(os.getenv("PAPERLESS_CONSUMER_LOOP_TIME", 10))
 # If you want to use IMAP mail consumption, populate this with useful values.
 # If you leave HOST set to None, we assume you're not going to use this
 # feature.
 MAIL_CONSUMPTION = {
    "HOST": os.getenv("PAPERLESS_CONSUME_MAIL_HOST"),
    "PORT": os.getenv("PAPERLESS_CONSUME_MAIL_PORT"),
    "USERNAME": os.getenv("PAPERLESS_CONSUME_MAIL_USER"),
    "PASSWORD": os.getenv("PAPERLESS_CONSUME_MAIL_PASS"),
    "USE_SSL": os.getenv("PAPERLESS_CONSUME_MAIL_USE_SSL", "y").lower() == "y",  # If True, use SSL/TLS to connect
    "INBOX": "INBOX"  # The name of the inbox on the server
 }
 # This is used to encrypt the original documents and decrypt them later when
 # you want to download them.  Set it and change the permissions on this file to
 # 0600, or set it to `None` and you'll be prompted for the passphrase at
@@ -247,14 +241,11 @@ CONSUMER_LOOP_TIME = int(os.getenv("PAPERLESS_CONSUMER_LOOP_TIME", 10))
 # files.
 PASSPHRASE = os.getenv("PAPERLESS_PASSPHRASE")
 # If you intend to use the "API" to push files into the consumer, you'll need
 # to provide a shared secret here.  Leaving this as the default will disable
 # the API.
 SHARED_SECRET = os.getenv("PAPERLESS_SHARED_SECRET", "")
 # Trigger a script after every successful document consumption?
 PRE_CONSUME_SCRIPT = os.getenv("PAPERLESS_PRE_CONSUME_SCRIPT")
 POST_CONSUME_SCRIPT = os.getenv("PAPERLESS_POST_CONSUME_SCRIPT")
 # The number of items on each page in the web UI.  This value must be a
 # positive integer, but if you don't define one in paperless.conf, a default of
 # 100 will be used.
 PAPERLESS_LIST_PER_PAGE = int(os.getenv("PAPERLESS_LIST_PER_PAGE", 100))
 FY_START = os.getenv("PAPERLESS_FINANCIAL_YEAR_START")
 FY_END = os.getenv("PAPERLESS_FINANCIAL_YEAR_END")
--- a/src/paperless/urls.py
+++ b/src/paperless/urls.py
@@ -1,26 +1,35 @@
 """paperless URL Configuration
 The `urlpatterns` list routes URLs to views. For more information please see:
    https://docs.djangoproject.com/en/1.9/topics/http/urls/
 Examples:
 Function views
    1. Add an import:  from my_app import views
    2. Add a URL to urlpatterns:  url(r'^$', views.home, name='home')
 Class-based views
    1. Add an import:  from other_app.views import Home
    2. Add a URL to urlpatterns:  url(r'^$', Home.as_view(), name='home')
 Including another URLconf
    1. Add an import:  from blog import urls as blog_urls
    2. Import the include() function: from django.conf.urls import url, include
    3. Add a URL to urlpatterns:  url(r'^blog/', include(blog_urls))
 """
 from django.conf import settings
-from django.conf.urls import include, static, url
+from django.conf.urls import url, static, include
 from django.contrib import admin
-from django.views.decorators.csrf import csrf_exempt
+
 from django.views.generic import RedirectView
 from rest_framework.routers import DefaultRouter
 from documents.views import (
-    CorrespondentViewSet,
+    IndexView, FetchView, PushView,
-    DocumentViewSet,
+    CorrespondentViewSet, TagViewSet, DocumentViewSet, LogViewSet
    FetchView,
    LogViewSet,
    PushView,
    TagViewSet
 )
 from reminders.views import ReminderViewSet
 router = DefaultRouter()
-router.register(r"correspondents", CorrespondentViewSet)
+router.register(r'correspondents', CorrespondentViewSet)
-router.register(r"documents", DocumentViewSet)
+router.register(r'tags', TagViewSet)
-router.register(r"logs", LogViewSet)
+router.register(r'documents', DocumentViewSet)
-router.register(r"reminders", ReminderViewSet)
+router.register(r'logs', LogViewSet)
 router.register(r"tags", TagViewSet)
 urlpatterns = [
@@ -31,6 +40,9 @@ urlpatterns = [
    ),
    url(r"^api/", include(router.urls, namespace="drf")),
    # Normal pages (coming soon)
    # url(r"^$", IndexView.as_view(), name="index"),
    # File downloads
    url(
        r"^fetch/(?P<kind>doc|thumb)/(?P<pk>\d+)$",
@@ -38,20 +50,11 @@ urlpatterns = [
        name="fetch"
    ),
    # File uploads
    url(r"^push$", csrf_exempt(PushView.as_view()), name="push"),
    # The Django admin
    url(r"admin/", admin.site.urls),
-
+    url(r"", admin.site.urls),  # This is going away
    # Redirect / to /admin
    url(r"^$", RedirectView.as_view(permanent=True, url="/admin/")),
 ] + static.static(settings.MEDIA_URL, document_root=settings.MEDIA_ROOT)
-# Text in each page's <h1> (and above login form).
+if settings.SHARED_SECRET:
-admin.site.site_header = 'Paperless'
+    urlpatterns.insert(0, url(r"^push$", PushView.as_view(), name="push"))
 # Text at the end of each page's <title>.
 admin.site.site_title = 'Paperless'
 # Text at the top of the admin index page.
 admin.site.index_title = 'Paperless administration'
--- a/src/paperless/version.py
+++ b/src/paperless/version.py
@@ -1 +1 @@
-__version__ = (1, 1, 0)
+__version__ = (0, 3, 2)
--- a/src/paperless/views.py
+++ b/src/paperless/views.py
@@ -1,7 +0,0 @@
 from rest_framework.pagination import PageNumberPagination
 class StandardPagination(PageNumberPagination):
    page_size = 25
    page_size_query_param = "page-size"
    max_page_size = 100000
--- a/src/paperless/wsgi.py
+++ b/src/paperless/wsgi.py
@@ -4,7 +4,7 @@ WSGI config for paperless project.
 It exposes the WSGI callable as a module-level variable named ``application``.
 For more information on this file, see
-https://docs.djangoproject.com/en/1.10/howto/deployment/wsgi/
+https://docs.djangoproject.com/en/1.9/howto/deployment/wsgi/
 """
 import os
--- a/src/paperless_tesseract/init.py
+++ b/src/paperless_tesseract/init.py
--- a/src/paperless_tesseract/apps.py
+++ b/src/paperless_tesseract/apps.py
@@ -1,16 +0,0 @@
 from django.apps import AppConfig
 class PaperlessTesseractConfig(AppConfig):
    name = "paperless_tesseract"
    def ready(self):
        from documents.signals import document_consumer_declaration
        from .signals import ConsumerDeclaration
        document_consumer_declaration.connect(ConsumerDeclaration.handle)
        AppConfig.ready(self)
--- a/src/paperless_tesseract/parsers.py
+++ b/src/paperless_tesseract/parsers.py
@@ -1,214 +0,0 @@
 import itertools
 import os
 import re
 import subprocess
 from multiprocessing.pool import Pool
 import langdetect
 import pyocr
 from django.conf import settings
 from documents.parsers import DocumentParser, ParseError
 from PIL import Image
 from pyocr.libtesseract.tesseract_raw import \
    TesseractError as OtherTesseractError
 from pyocr.tesseract import TesseractError
 from .languages import ISO639
 class OCRError(Exception):
    pass
 class RasterisedDocumentParser(DocumentParser):
    """
    This parser uses Tesseract to try and get some text out of a rasterised
    image, whether it's a PDF, or other graphical format (JPEG, TIFF, etc.)
    """
    CONVERT = settings.CONVERT_BINARY
    DENSITY = settings.CONVERT_DENSITY if settings.CONVERT_DENSITY else 300
    THREADS = int(settings.OCR_THREADS) if settings.OCR_THREADS else None
    UNPAPER = settings.UNPAPER_BINARY
    DEFAULT_OCR_LANGUAGE = settings.OCR_LANGUAGE
    def get_thumbnail(self):
        """
        The thumbnail of a PDF is just a 500px wide image of the first page.
        """
        run_convert(
            self.CONVERT,
            "-scale", "500x5000",
            "-alpha", "remove",
            self.document_path, os.path.join(self.tempdir, "convert-%04d.png")
        )
        return os.path.join(self.tempdir, "convert-0000.png")
    def get_text(self):
        images = self._get_greyscale()
        try:
            return self._get_ocr(images)
        except OCRError as e:
            raise ParseError(e)
    def _get_greyscale(self):
        """
        Greyscale images are easier for Tesseract to OCR
        """
        # Convert PDF to multiple PNMs
        pnm = os.path.join(self.tempdir, "convert-%04d.pnm")
        run_convert(
            self.CONVERT,
            "-density", str(self.DENSITY),
            "-depth", "8",
            "-type", "grayscale",
            self.document_path, pnm,
        )
        # Get a list of converted images
        pnms = []
        for f in os.listdir(self.tempdir):
            if f.endswith(".pnm"):
                pnms.append(os.path.join(self.tempdir, f))
        # Run unpaper in parallel on converted images
        with Pool(processes=self.THREADS) as pool:
            pool.map(run_unpaper, itertools.product([self.UNPAPER], pnms))
        # Return list of converted images, processed with unpaper
        pnms = []
        for f in os.listdir(self.tempdir):
            if f.endswith(".unpaper.pnm"):
                pnms.append(os.path.join(self.tempdir, f))
        return sorted(filter(lambda __: os.path.isfile(__), pnms))
    def _guess_language(self, text):
        try:
            guess = langdetect.detect(text)
            self.log("debug", "Language detected: {}".format(guess))
            return guess
        except Exception as e:
            self.log("warning", "Language detection error: {}".format(e))
    def _get_ocr(self, imgs):
        """
        Attempts to do the best job possible OCR'ing the document based on
        simple language detection trial & error.
        """
        if not imgs:
            raise OCRError("No images found")
        self.log("info", "OCRing the document")
        # Since the division gets rounded down by int, this calculation works
        # for every edge-case, i.e. 1
        middle = int(len(imgs) / 2)
        raw_text = self._ocr([imgs[middle]], self.DEFAULT_OCR_LANGUAGE)
        guessed_language = self._guess_language(raw_text)
        if not guessed_language or guessed_language not in ISO639:
            self.log("warning", "Language detection failed!")
            if settings.FORGIVING_OCR:
                self.log(
                    "warning",
                    "As FORGIVING_OCR is enabled, we're going to make the "
                    "best with what we have."
                )
                raw_text = self._assemble_ocr_sections(imgs, middle, raw_text)
                return raw_text
            raise OCRError("Language detection failed")
        if ISO639[guessed_language] == self.DEFAULT_OCR_LANGUAGE:
            raw_text = self._assemble_ocr_sections(imgs, middle, raw_text)
            return raw_text
        try:
            return self._ocr(imgs, ISO639[guessed_language])
        except pyocr.pyocr.tesseract.TesseractError:
            if settings.FORGIVING_OCR:
                self.log(
                    "warning",
                    "OCR for {} failed, but we're going to stick with what "
                    "we've got since FORGIVING_OCR is enabled.".format(
                        guessed_language
                    )
                )
                raw_text = self._assemble_ocr_sections(imgs, middle, raw_text)
                return raw_text
            raise OCRError(
                "The guessed language is not available in this instance of "
                "Tesseract."
            )
    def _ocr(self, imgs, lang):
        """
        Performs a single OCR attempt.
        """
        if not imgs:
            return ""
        self.log("info", "Parsing for {}".format(lang))
        with Pool(processes=self.THREADS) as pool:
            r = pool.map(image_to_string, itertools.product(imgs, [lang]))
            r = " ".join(r)
        # Strip out excess white space to allow matching to go smoother
        return strip_excess_whitespace(r)
    def _assemble_ocr_sections(self, imgs, middle, text):
        """
        Given a `middle` value and the text that middle page represents, we OCR
        the remainder of the document and return the whole thing.
        """
        text = self._ocr(imgs[:middle], self.DEFAULT_OCR_LANGUAGE) + text
        text += self._ocr(imgs[middle + 1:], self.DEFAULT_OCR_LANGUAGE)
        return text
 def run_convert(*args):
    environment = os.environ.copy()
    if settings.CONVERT_MEMORY_LIMIT:
        environment["MAGICK_MEMORY_LIMIT"] = settings.CONVERT_MEMORY_LIMIT
    if settings.CONVERT_TMPDIR:
        environment["MAGICK_TMPDIR"] = settings.CONVERT_TMPDIR
    subprocess.Popen(args, env=environment).wait()
 def run_unpaper(args):
    unpaper, pnm = args
    subprocess.Popen(
        (unpaper, pnm, pnm.replace(".pnm", ".unpaper.pnm"))).wait()
 def strip_excess_whitespace(text):
    collapsed_spaces = re.sub(r"([^\S\r\n]+)", " ", text)
    no_leading_whitespace = re.sub(
        "([\n\r]+)([^\S\n\r]+)", '\\1', collapsed_spaces)
    no_trailing_whitespace = re.sub("([^\S\n\r]+)$", '', no_leading_whitespace)
    return no_trailing_whitespace
 def image_to_string(args):
    img, lang = args
    ocr = pyocr.get_available_tools()[0]
    with Image.open(os.path.join(RasterisedDocumentParser.SCRATCH, img)) as f:
        if ocr.can_detect_orientation():
            try:
                orientation = ocr.detect_orientation(f, lang=lang)
                f = f.rotate(orientation["angle"], expand=1)
            except (TesseractError, OtherTesseractError):
                pass
        return ocr.image_to_string(f, lang=lang)
--- a/src/paperless_tesseract/signals.py
+++ b/src/paperless_tesseract/signals.py
@@ -1,23 +0,0 @@
 import re
 from .parsers import RasterisedDocumentParser
 class ConsumerDeclaration(object):
    MATCHING_FILES = re.compile("^.*\.(pdf|jpe?g|gif|png|tiff?|pnm|bmp)$")
    @classmethod
    def handle(cls, sender, **kwargs):
        return cls.test
    @classmethod
    def test(cls, doc):
        if cls.MATCHING_FILES.match(doc.lower()):
            return {
                "parser": RasterisedDocumentParser,
                "weight": 0
            }
        return None
--- a/src/paperless_tesseract/tests/init.py
+++ b/src/paperless_tesseract/tests/init.py
--- a/src/paperless_tesseract/tests/test_ocr.py
+++ b/src/paperless_tesseract/tests/test_ocr.py
@@ -1,80 +0,0 @@
 import os
 from unittest import mock, skipIf
 import pyocr
 from django.test import TestCase
 from pyocr.libtesseract.tesseract_raw import \
    TesseractError as OtherTesseractError
 from ..parsers import image_to_string, strip_excess_whitespace
 class FakeTesseract(object):
    @staticmethod
    def can_detect_orientation():
        return True
    @staticmethod
    def detect_orientation(file_handle, lang):
        raise OtherTesseractError("arbitrary status", "message")
    @staticmethod
    def image_to_string(file_handle, lang):
        return "This is test text"
 class FakePyOcr(object):
    @staticmethod
    def get_available_tools():
        return [FakeTesseract]
 class TestOCR(TestCase):
    text_cases = [
        ("simple     string", "simple string"),
        (
            "simple    newline\n   testing string",
            "simple newline\ntesting string"
        ),
        (
            "utf-8   строка с пробелами в конце  ",
            "utf-8 строка с пробелами в конце"
        )
    ]
    SAMPLE_FILES = os.path.join(os.path.dirname(__file__), "samples")
    TESSERACT_INSTALLED = bool(pyocr.get_available_tools())
    def test_strip_excess_whitespace(self):
        for source, result in self.text_cases:
            actual_result = strip_excess_whitespace(source)
            self.assertEqual(
                result,
                actual_result,
                "strip_exceess_whitespace({}) != '{}', but '{}'".format(
                    source,
                    result,
                    actual_result
                )
            )
    @skipIf(not TESSERACT_INSTALLED, "Tesseract not installed. Skipping")
    @mock.patch(
        "paperless_tesseract.parsers.RasterisedDocumentParser.SCRATCH",
        SAMPLE_FILES
    )
    @mock.patch("paperless_tesseract.parsers.pyocr", FakePyOcr)
    def test_image_to_string_with_text_free_page(self):
        """
        This test is sort of silly, since it's really just reproducing an odd
        exception thrown by pyocr when it encounters a page with no text.
        Actually running this test against an installation of Tesseract results
        in a segmentation fault rooted somewhere deep inside pyocr where I
        don't care to dig.  Regardless, if you run the consumer normally,
        text-free pages are now handled correctly so long as we work around
        this weird exception.
        """
        image_to_string(["no-text.png", "en"])
--- a/src/paperless_tesseract/tests/test_signals.py
+++ b/src/paperless_tesseract/tests/test_signals.py
@@ -1,36 +0,0 @@
 from django.test import TestCase
 from ..signals import ConsumerDeclaration
 class SignalsTestCase(TestCase):
    def test_test_handles_various_file_names_true(self):
        prefixes = (
            "doc", "My Document", "Μυ Γρεεκ Δοψθμεντ", "Doc -with - tags",
            "A document with a . in it", "Doc with -- in it"
        )
        suffixes = (
            "pdf", "jpg", "jpeg", "gif", "png", "tiff", "tif", "pnm", "bmp",
            "PDF", "JPG", "JPEG", "GIF", "PNG", "TIFF", "TIF", "PNM", "BMP",
            "pDf", "jPg", "jpEg", "gIf", "pNg", "tIff", "tIf", "pNm", "bMp",
        )
        for prefix in prefixes:
            for suffix in suffixes:
                name = "{}.{}".format(prefix, suffix)
                self.assertTrue(ConsumerDeclaration.test(name))
    def test_test_handles_various_file_names_false(self):
        prefixes = ("doc",)
        suffixes = ("txt", "markdown", "",)
        for prefix in prefixes:
            for suffix in suffixes:
                name = "{}.{}".format(prefix, suffix)
                self.assertFalse(ConsumerDeclaration.test(name))
        self.assertFalse(ConsumerDeclaration.test(""))
        self.assertFalse(ConsumerDeclaration.test("doc"))
--- a/src/pytest.ini
+++ b/src/pytest.ini
@@ -1,8 +1,3 @@
 [pytest]
 DJANGO_SETTINGS_MODULE=paperless.settings
-addopts = --pythonwarnings=all
+
 env =
  PAPERLESS_CONSUME=/tmp
  PAPERLESS_PASSPHRASE=THISISNOTASECRET
  PAPERLESS_SECRET=paperless
  PAPERLESS_EMAIL_SECRET=paperless
--- a/src/reminders/init.py
+++ b/src/reminders/init.py
--- a/src/reminders/admin.py
+++ b/src/reminders/admin.py
@@ -1,20 +0,0 @@
 from django.conf import settings
 from django.contrib import admin
 from .models import Reminder
 class ReminderAdmin(admin.ModelAdmin):
    class Media:
        css = {
            "all": ("paperless.css",)
        }
    list_per_page = settings.PAPERLESS_LIST_PER_PAGE
    list_display = ("date", "document", "note")
    list_filter = ("date",)
    list_editable = ("note",)
 admin.site.register(Reminder, ReminderAdmin)
--- a/src/reminders/apps.py
+++ b/src/reminders/apps.py
@@ -1,5 +0,0 @@
 from django.apps import AppConfig
 class RemindersConfig(AppConfig):
    name = "reminders"
--- a/src/reminders/filters.py
+++ b/src/reminders/filters.py
@@ -1,14 +0,0 @@
 from django_filters.rest_framework import CharFilter, FilterSet
 from .models import Reminder
 class ReminderFilterSet(FilterSet):
    class Meta(object):
        model = Reminder
        fields = {
            "document": ["exact"],
            "date": ["gt", "lt", "gte", "lte", "exact"],
            "note": ["istartswith", "iendswith", "icontains"]
        }
--- a/src/reminders/migrations/0001_initial.py
+++ b/src/reminders/migrations/0001_initial.py
@@ -1,27 +0,0 @@
 # -*- coding: utf-8 -*-
 # Generated by Django 1.10.5 on 2017-03-25 15:58
 from __future__ import unicode_literals
 from django.db import migrations, models
 import django.db.models.deletion
 class Migration(migrations.Migration):
    initial = True
    dependencies = [
        ('documents', '0016_auto_20170325_1558'),
    ]
    operations = [
        migrations.CreateModel(
            name='Reminder',
            fields=[
                ('id', models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
                ('date', models.DateTimeField()),
                ('note', models.TextField(blank=True)),
                ('document', models.ForeignKey(on_delete=django.db.models.deletion.CASCADE, to='documents.Document')),
            ],
        ),
    ]
--- a/src/reminders/migrations/init.py
+++ b/src/reminders/migrations/init.py
--- a/src/reminders/models.py
+++ b/src/reminders/models.py
@@ -1,8 +0,0 @@
 from django.db import models
 class Reminder(models.Model):
    document = models.ForeignKey("documents.Document")
    date = models.DateTimeField()
    note = models.TextField(blank=True)
--- a/src/reminders/serialisers.py
+++ b/src/reminders/serialisers.py
@@ -1,14 +0,0 @@
 from documents.models import Document
 from rest_framework import serializers
 from .models import Reminder
 class ReminderSerializer(serializers.HyperlinkedModelSerializer):
    document = serializers.HyperlinkedRelatedField(
        view_name="drf:document-detail", queryset=Document.objects)
    class Meta(object):
        model = Reminder
        fields = ("id", "document", "date", "note")
--- a/src/reminders/tests.py
+++ b/src/reminders/tests.py
@@ -1,3 +0,0 @@
 from django.test import TestCase
 # Create your tests here.
--- a/src/reminders/views.py
+++ b/src/reminders/views.py
@@ -1,22 +0,0 @@
 from django_filters.rest_framework import DjangoFilterBackend
 from rest_framework.filters import OrderingFilter
 from rest_framework.permissions import IsAuthenticated
 from rest_framework.viewsets import (
    ModelViewSet,
 )
 from .filters import ReminderFilterSet
 from .models import Reminder
 from .serialisers import ReminderSerializer
 from paperless.views import StandardPagination
 class ReminderViewSet(ModelViewSet):
    model = Reminder
    queryset = Reminder.objects
    serializer_class = ReminderSerializer
    pagination_class = StandardPagination
    permission_classes = (IsAuthenticated,)
    filter_backends = (DjangoFilterBackend, OrderingFilter)
    filter_class = ReminderFilterSet
    ordering_fields = ("date", "document")
--- a/src/tox.ini
+++ b/src/tox.ini
@@ -5,18 +5,19 @@
 [tox]
 skipsdist = True
-envlist = py34, py35, py36, pycodestyle
+envlist = py34, py35, pep8
 [testenv]
-commands = pytest
+commands = {envpython} manage.py test
 deps = -r{toxinidir}/../requirements.txt
 setenv =
    PAPERLESS_CONSUME=/tmp
    PAPERLESS_PASSPHRASE=THISISNOTASECRET
    PAPERLESS_SECRET=paperless
-[testenv:pycodestyle]
+[testenv:pep8]
-commands=pycodestyle
+commands=pep8
-deps=pycodestyle
+deps=pep8
-[pycodestyle]
+[pep8]
-exclude=
+exclude=.tox,migrations,paperless/settings.py
  .tox,
  migrations,
  paperless/settings.py
`@@ -1 +1 @@`
	`from .checks import paths_check, binaries_check`	`from .checks import paths_check`
`@@ -1 +1 @@`
	`__version__ = (1, 1, 0)`	`__version__ = (0, 3, 2)`
		`@@ -1,3 +0,0 @@`
			`from django.test import TestCase`

			`# Create your tests here.`