Bump to v1.0.0!

Upgrade to Django 1.11.x
Change date fields to actual date fields #278
2025-08-03 18:54:40 -05:00 · 2018-01-06 19:25:33 +00:00 · 2018-01-06 19:24:10 +00:00 · 2018-01-06 19:21:49 +00:00 · 2018-01-06 18:56:37 +00:00 · 2018-01-06 18:51:16 +00:00
49 changed files with 961 additions and 372 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -68,6 +68,7 @@ db.sqlite3
 .idea

 # Other stuff that doesn't belong
+.virtualenv
 virtualenv
 .vagrant
 docker-compose.yml
--- a/6
+++ b/6
@@ -35,12 +35,16 @@ RUN groupadd -g 1000 paperless \
    && useradd -u 1000 -g 1000 -d /usr/src/paperless paperless \
    && chown -Rh paperless:paperless /usr/src/paperless

+# Set export directory
+ENV PAPERLESS_EXPORT_DIR /export
+RUN mkdir -p $PAPERLESS_EXPORT_DIR
+
 # Setup entrypoint
 COPY scripts/docker-entrypoint.sh /sbin/docker-entrypoint.sh
 RUN chmod 755 /sbin/docker-entrypoint.sh

 # Mount volumes
-VOLUME ["/usr/src/paperless/data", "/usr/src/paperless/media", "/consume"]
+VOLUME ["/usr/src/paperless/data", "/usr/src/paperless/media", "/consume", "/export"]

 ENTRYPOINT ["/sbin/docker-entrypoint.sh"]
 CMD ["--help"]
--- a/README.rst
+++ b/README.rst
@@ -6,7 +6,7 @@ Paperless
 |Travis|
 |Dependencies|

-Scan, index, and archive all of your paper documents
+Index and archive all of your scanned paper documents

 I hate paper.  Environmental issues aside, it's a tech person's nightmare:

@@ -23,13 +23,18 @@ it... because paper.  I wrote this to make my life easier.
 How it Works
 ============

-1. Buy a document scanner like `this one`_.
+Paperless does not control your scanner; it only helps you deal with what your
+scanner produces.
+
+1. Buy a document scanner that can write to a place on your network.  If you
+   need some inspiration, have a look at the `scanner recommendations`_ page,
+   which lists models recommended by other users.
 2. Set it up to "scan to FTP" or something similar. It should be able to push
   scanned images to a server without you having to do anything.  If your
   scanner doesn't know how to automatically upload the file somewhere, you can
   always do that manually.  Paperless doesn't care how the documents get into
   its local consumption directory.
-3. Have the target server run the Paperless consumption script to OCR the PDF
+3. Have the target server run the Paperless consumption script to OCR the file
   and index it into a local database.
 4. Use the web frontend to sift through the database and find what you want.
 5. Download the PDF you need/want via the web interface and do whatever you
@@ -47,16 +52,15 @@ Stability
 =========

 Paperless is still under active development (just look at the git commit
-history) so don't expect it to be 100% stable.  I'm using it for my own
-documents, but I'm crazy like that.  If you use this and it breaks something,
-you get to keep all the shiny pieces.
+history) so don't expect it to be 100% stable.  You can back up the sqlite3
+database, media directory and your configuration file to be on the safe side.


 Requirements
 ============

-This is all really a quite simple, shiny, user-friendly wrapper around some very
-powerful tools.
+This is all really a quite simple, shiny, user-friendly wrapper around some
+very powerful tools.

 * `ImageMagick`_ converts the images between colour and greyscale.
 * `Tesseract`_ does the character recognition.
@@ -82,22 +86,22 @@ Similar Projects

 There's another project out there called `Mayan EDMS`_ that has a surprising
 amount of technical overlap with Paperless.  Also based on Django and using
-a consumer model with Tesseract and unpaper, Mayan EDMS is *much* more
-featureful and comes with a slick UI as well.  It may be that Paperless is
-better suited for low-resource environments (like a Rasberry Pi), but to be
-honest, this is just a guess as I haven't tested this myself.  One thing's
-for certain though, *Paperless* is a **much** better name.
+a consumer model with Tesseract and Unpaper, Mayan EDMS is *much* more
+featureful and comes with a slick UI as well, but still in Python 2. It may be
+that Paperless consumes fewer resources, but to be honest, this is just a guess
+as I haven't tested this myself.  One thing's for certain though, *Paperless*
+is a **much** better name.


 Important Note
 ==============

 Document scanners are typically used to scan sensitive documents.  Things like
-your social insurance number, tax records, invoices, etc.  While paperless
-encrypts the original PDFs via the consumption script, the OCR'd text is *not*
+your social insurance number, tax records, invoices, etc.  While Paperless
+encrypts the original files via the consumption script, the OCR'd text is *not*
 encrypted and is therefore stored in the clear (it needs to be searchable, so
 if someone has ideas on how to do that on encrypted data, I'm all ears).  This
-means that paperless should never be run on an untrusted host.  Instead, I
+means that Paperless should never be run on an untrusted host.  Instead, I
 recommend that if you do want to use it, run it locally on a server in your own
 home.

@@ -115,7 +119,7 @@ The thing is, I'm doing ok for money, so I would instead ask you to donate to
 the `United Nations High Commissioner for Refugees`_.  They're doing important
 work and they need the money a lot more than I do.

-.. _this one: http://www.brother.ca/en-CA/Scanners/11/ProductDetail/ADS1500W?ProductDetail=productdetail
+.. _scanner recommendations: https://paperless.readthedocs.io/en/latest/scanners.html
 .. _ImageMagick: http://imagemagick.org/
 .. _Tesseract: https://github.com/tesseract-ocr
 .. _Unpaper: https://www.flameeyes.eu/projects/unpaper
@@ -136,5 +140,5 @@ work and they need the money a lot more than I do.
   :target: https://gitter.im/danielquinn/paperless?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge
 .. |Travis| image:: https://travis-ci.org/danielquinn/paperless.svg?branch=master
   :target: https://travis-ci.org/danielquinn/paperless
-.. |Dependencies| image:: https://www.versioneye.com/user/projects/57b33b81d9f1b00016faa500/badge.svg?style=flat-square
+.. |Dependencies| image:: https://www.versioneye.com/user/projects/57b33b81d9f1b00016faa500/badge.svg
   :target: https://www.versioneye.com/user/projects/57b33b81d9f1b00016faa500
--- a/5
+++ b/5
@@ -12,4 +12,9 @@ Vagrant.configure(VAGRANT_API_VERSION) do |config|

  # Networking details
  config.vm.network "private_network", ip: "172.28.128.4"
+
+  config.vm.provider "virtualbox" do |vb|
+    # Customize the amount of memory on the VM:
+    vb.memory = "1024"
+  end
 end
--- a/docker-compose.yml.example
+++ b/docker-compose.yml.example
@@ -17,7 +17,7 @@ services:
        # value with nothing.
        environment:
            - PAPERLESS_OCR_LANGUAGES=
-        command: ["runserver", "0.0.0.0:8000"]
+        command: ["runserver", "--insecure", "0.0.0.0:8000"]

    consumer:
        image: pitkley/paperless
@@ -26,7 +26,7 @@ services:
            - media:/usr/src/paperless/media
            # You have to adapt the local path you want the consumption
            # directory to mount to by modifying the part before the ':'.
-            - /path/to/arbitrary/place:/consume
+            - ./consume:/consume
            # Likewise, you can add a local path to mount a directory for
            # exporting. This is not strictly needed for paperless to
            # function, only if you're exporting your files: uncomment
--- a/docs/changelog.rst
+++ b/docs/changelog.rst
@@ -1,6 +1,69 @@
 Changelog
 #########

+* 1.0.0
+  * Upgrade to Django 1.11.  **You'll need to run
+    ``pip install -r requirements.txt`` after the usual ``git pull`` to
+    properly update**.
+  * Replace the templatetag-based hack we had for document listing in favour of
+    a slightly less ugly solution in the form of another template tag with less
+    copypasta.
+  * Support for multi-word-matches for auto-tagging thanks to an excellent
+    patch from `ishirav`_ `#277`_.
+  * Fixed a CSS bug reported by `Stefan Hagen`_ that caused an overlapping of
+    the text and checkboxes under some resolutions `#272`_.
+  * Patched the Docker config to force the serving of static files.  Credit for
+    this one goes to `dev-rke`_ via `#248`_.
+  * Fix file permissions during Docker start up thanks to `Pit`_ on `#268`_.
+  * Date fields in the admin are now expressed as HTML5 date fields thanks to
+    `Lukas Winkler`_'s issue `#278`_.
+
+* 0.8.0
+  * Paperless can now run in a subdirectory on a host (``/paperless``), rather
+    than always running in the root (``/``) thanks to `maphy-psd`_'s work on
+    `#255`_.
+
+* 0.7.0
+  * **Potentially breaking change**: As per `#235`_, Paperless will no longer
+    automatically delete documents attached to correspondents when those
+    correspondents are themselves deleted.  This was Django's default
+    behaviour, but didn't make much sense in Paperless' case.  Thanks to
+    `Thomas Brueggemann`_ and `David Martin`_ for their input on this one.
+  * Fix for `#232`_ wherein Paperless wasn't recognising ``.tif`` files
+    properly.  Thanks to `ayounggun`_ for reporting this one and to
+    `Kusti Skytén`_ for posting the correct solution in the Github issue.
+
+* 0.6.0
+  * Abandon the shared-secret trick we were using for the POST API in favour
+    of BasicAuth or Django session.
+  * Fix the POST API so it actually works.  `#236`_
+  * **Breaking change**: We've dropped the use of ``PAPERLESS_SHARED_SECRET``
+    as it was being used both for the API (now replaced with a normal auth)
+    and for email polling.  Now that we're only using it for email, this
+    variable has been renamed to ``PAPERLESS_EMAIL_SECRET``.  The old value
+    will still work for a while, but you should change your config if you've
+    been using the email polling feature.  Thanks to `Joshua Gilman`_ for all
+    the help with this feature.
+* 0.5.0
+  * Support for fuzzy matching in the auto-tagger & auto-correspondent systems
+    thanks to `Jake Gysland`_'s patch `#220`_.
+  * Modified the Dockerfile to prepare an export directory (`#212`_).  Thanks
+    to combined efforts from `Pit`_ and `Strubbl`_ in working out the kinks on
+    this one.
+  * Updated the import/export scripts to include support for thumbnails.  Big
+    thanks to `CkuT`_ for finding this shortcoming and doing the work to get
+    it fixed in `#224`_.
+  * All of the following changes are thanks to `David Martin`_:
+    * Bumped the dependency on pyocr to 0.4.7 so new users can make use of
+      Tesseract 4 if they so prefer (`#226`_).
+    * Fixed a number of issues with the automated mail handler (`#227`_, `#228`_)
+    * Amended the documentation for better handling of systemd service files (`#229`_)
+    * Amended the Django Admin configuration to have nice headers (`#230`_)
+
+* 0.4.1
+  * Fix for `#206`_ wherein the pluggable parser didn't recognise files with
+    all-caps suffixes like ``.PDF``
+
 * 0.4.0
  * Introducing reminders.  See `#199`_ for more information, but the short
    explanation is that you can now attach simple notes & times to documents
@@ -182,7 +245,19 @@ Changelog
 .. _Florian Harr: https://github.com/evils
 .. _Justin Snyman: https://github.com/stringlytyped
 .. _Thomas Brueggemann: https://github.com/thomasbrueggemann
+.. _Jake Gysland: https://github.com/jgysland
+.. _Strubbl: https://github.com/strubbl
+.. _CkuT: https://github.com/CkuT
+.. _David Martin: https://github.com/ddddavidmartin
 .. _Paperless Desktop: https://github.com/thomasbrueggemann/paperless-desktop
+.. _Joshua Gilman: https://github.com/jmgilman
+.. _ayounggun: https://github.com/ayounggun
+.. _Kusti Skytén: https://github.com/kskyten
+.. _maphy-psd: https://github.com/maphy-psd
+.. _ishirav: https://github.com/ishirav
+.. _Stefan Hagen: https://github.com/xkpd3
+.. _dev-rke: https://github.com/dev-rke
+.. _Lukas Winkler: https://github.com/Findus23

 .. _#20: https://github.com/danielquinn/paperless/issues/20
 .. _#44: https://github.com/danielquinn/paperless/issues/44
@@ -211,3 +286,21 @@ Changelog
 .. _#179: https://github.com/danielquinn/paperless/pull/179
 .. _#199: https://github.com/danielquinn/paperless/issues/199
 .. _#200: https://github.com/danielquinn/paperless/issues/200
+.. _#206: https://github.com/danielquinn/paperless/issues/206
+.. _#212: https://github.com/danielquinn/paperless/pull/212
+.. _#220: https://github.com/danielquinn/paperless/pull/220
+.. _#224: https://github.com/danielquinn/paperless/pull/224
+.. _#226: https://github.com/danielquinn/paperless/pull/226
+.. _#227: https://github.com/danielquinn/paperless/pull/227
+.. _#228: https://github.com/danielquinn/paperless/pull/228
+.. _#229: https://github.com/danielquinn/paperless/pull/229
+.. _#230: https://github.com/danielquinn/paperless/pull/230
+.. _#232: https://github.com/danielquinn/paperless/issues/232
+.. _#235: https://github.com/danielquinn/paperless/issues/235
+.. _#236: https://github.com/danielquinn/paperless/issues/236
+.. _#255: https://github.com/danielquinn/paperless/pull/255
+.. _#268: https://github.com/danielquinn/paperless/pull/268
+.. _#277: https://github.com/danielquinn/paperless/pull/277
+.. _#272: https://github.com/danielquinn/paperless/issues/272
+.. _#248: https://github.com/danielquinn/paperless/issues/248
+.. _#278: https://github.com/danielquinn/paperless/issues/278
--- a/docs/consumption.rst
+++ b/docs/consumption.rst
@@ -121,18 +121,21 @@ So, with all that in mind, here's what you do to get it running:

 1. Setup a new email account somewhere, or if you're feeling daring, create a
   folder in an existing email box and note the path to that folder.
-2. In ``settings.py`` set all of the appropriate values in ``MAIL_CONSUMPTION``.
+2. In ``/etc/paperless.conf`` set all of the appropriate values in
+   ``PATHS AND FOLDERS`` and ``SECURITY``.
   If you decided to use a subfolder of an existing account, then make sure you
-   set ``INBOX`` accordingly here.  You also have to set the
-   ``UPLOAD_SHARED_SECRET`` to something you can remember 'cause you'll have to
-   include that in every email you send.
+   set ``PAPERLESS_CONSUME_MAIL_INBOX`` accordingly here.  You also have to set
+   the ``PAPERLESS_EMAIL_SECRET`` to something you can remember 'cause you'll
+   have to include that in every email you send.
 3. Restart the :ref:`consumer <utilities-consumer>`.  The consumer will check
-   the configured email account every 10 minutes for something new and pull down
-   whatever it finds.
+   the configured email account at startup and from then on every 10 minutes
+   for something new and pull down whatever it finds.
 4. Send yourself an email!  Note that the subject is treated as the file name,
   so if you set the subject to ``Correspondent - Title - tag,tag,tag``, you'll
   get what you expect.  Also, you must include the aforementioned secret
   string in every email so the fetcher knows that it's safe to import.
+   Note that Paperless only imports emails whose subject consists of safe
+   characters: alphanumeric characters and ``-_ ,.'``.
 5. After a few minutes, the consumer will poll your mailbox, pull down the
   message, and place the attachment in the consumption directory with the
   appropriate name.  A few minutes later, the consumer will import it like any
@@ -144,46 +147,83 @@ So, with all that in mind, here's what you do to get it running:
 HTTP POST
 =========

-You can also submit a document via HTTP POST.  It doesn't do tags yet, and the
-URL schema isn't concrete, but it's a start.
-
-To push your document to Paperless, send an HTTP POST to the server with the
-following name/value pairs:
+You can also submit a document via HTTP POST, so long as you do so after
+authenticating.  To push your document to Paperless, send an HTTP POST to the
+server with the following name/value pairs:

 * ``correspondent``: The name of the document's correspondent.  Note that there
  are restrictions on what characters you can use here.  Specifically,
-  alphanumeric characters, `-`, `,`, `.`, and `'` are ok, everything else it
+  alphanumeric characters, `-`, `,`, `.`, and `'` are ok, everything else is
  out.  You also can't use the sequence ` - ` (space, dash, space).
 * ``title``: The title of the document.  The rules for characters is the same
  here as the correspondent.
-* ``signature``: For security reasons, we have the correspondent send a
-  signature using a "shared secret" method to make sure that random strangers
-  don't start uploading stuff to your server.  The means of generating this
-  signature is defined below.
+* ``document``: The file you're uploading.

 Specify ``enctype="multipart/form-data"``, and then POST your file with::

    Content-Disposition: form-data; name="document"; filename="whatever.pdf"

+An example of this in HTML is a typical form:

-.. _consumption-http-signature:
+.. code:: html

-Generating the Signature
------------------------
+    <form method="post" enctype="multipart/form-data">
+        <input type="text" name="correspondent" value="My Correspondent" />
+        <input type="text" name="title" value="My Title" />
+        <input type="file" name="document" />
+        <input type="submit" name="go" value="Do the thing" />
+    </form>

-Generating a signature based a shared secret is pretty simple: define a secret,
-and store it on the server and the client.  Then use that secret, along with
-the text you want to verify to generate a string that you can use for
-verification.
-
-In the case of Paperless, you configure the server with the secret by setting
-``UPLOAD_SHARED_SECRET``.  Then on your client, you generate your signature by
-concatenating the correspondent, title, and the secret, and then using sha256
-to generate a hexdigest.
-
-If you're using Python, this is what that looks like:
+But a potentially more useful way to do this would be in Python.  Here we use
+the requests library to handle basic authentication and to send the POST data
+to the URL.

 .. code:: python

+    import os
+
    from hashlib import sha256
-    signature = sha256(correspondent + title + secret).hexdigest()
+
+    import requests
+    from requests.auth import HTTPBasicAuth
+
+    # You authenticate via BasicAuth or with a session id.
+    # We use BasicAuth here
+    username = "my-username"
+    password = "my-super-secret-password"
+
+    # Where you have Paperless installed and listening
+    url = "http://localhost:8000/push"
+
+    # Document metadata
+    correspondent = "Test Correspondent"
+    title = "Test Title"
+
+    # The local file you want to push
+    path = "/path/to/some/directory/my-document.pdf"
+
+
+    with open(path, "rb") as f:
+
+        response = requests.post(
+            url=url,
+            data={"title": title,  "correspondent": correspondent},
+            files={"document": (os.path.basename(path), f, "application/pdf")},
+            auth=HTTPBasicAuth(username, password),
+            allow_redirects=False
+        )
+
+        if response.status_code == 202:
+
+            # Everything worked out ok
+            print("Upload successful")
+
+        else:
+
+            # If you don't get a 202, it's probably because your credentials
+            # are wrong or something.  This will give you a rough idea of what
+            # happened.
+
+            print("We got HTTP status code: {}".format(response.status_code))
+            for k, v in response.headers.items():
+                print("{}: {}".format(k, v))
--- a/docs/guesswork.rst
+++ b/docs/guesswork.rst
@@ -80,6 +80,12 @@ text and matching algorithm.  From the help info there:
    uses a regex to match the PDF.  If you don't know what a regex is, you
    probably don't want this option.

+When using the "any" or "all" matching algorithms, you can search for terms that
+consist of multiple words by enclosing them in double quotes. For example, defining
+a match text of ``"Bank of America" BofA`` using the "any" algorithm will match
+documents that contain either "Bank of America" or "BofA", but will not match
+documents containing "Bank of South America".
+
 Then just save your tag/correspondent and run another document through the
 consumer.  Once complete, you should see the newly-created document,
 automatically tagged with the appropriate data.
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -3,7 +3,11 @@
 Paperless
 =========

-Scan, index, and archive all of your paper documents.  Say goodbye to paper.
+Paperless is a simple Django application running in two parts:
+a :ref:`consumer <utilities-consumer>` (the thing that does the indexing) and
+the :ref:`webserver <utilities-webserver>` (the part that lets you search & download
+already-indexed documents). If you want to learn more about its functions, keep on
+reading after the installation section.


 .. _index-why-this-exists:
@@ -15,10 +19,11 @@ Paper is a nightmare.  Environmental issues aside, there's no excuse for it in
 the 21st century.  It takes up space, collects dust, doesn't support any form of
 a search feature, indexing is tedious, it's heavy and prone to damage & loss.

-I wrote this to make "going paperless" easier.  I wanted to be able to feed
-documents right from the post box into the scanner and then shred them so I
-never have to worry about finding stuff again.  Perhaps you might find it useful
-too.
+I wrote this to make "going paperless" easier.  I do not have to worry about
+finding stuff again. I feed documents right from the post box into the scanner and
+then shred them.  Perhaps you might find it useful too.
+
+


 Contents
@@ -35,4 +40,5 @@ Contents
   guesswork
   migrating
   troubleshooting
+   scanners
   changelog
--- a/docs/requirements.rst
+++ b/docs/requirements.rst
@@ -4,7 +4,7 @@ Requirements
 ============

 You need a Linux machine or Unix-like setup (theoretically an Apple machine
-should work) that has the following software installed on it:
+should work) that has the following software installed:

 * `Python3`_ (with development libraries, pip and virtualenv)
 * `GNU Privacy Guard`_
@@ -21,14 +21,14 @@ should work) that has the following software installed on it:
 Notably, you should confirm how you access your Python3 installation.  Many
 Linux distributions will install Python3 in parallel to Python2, using the names
 ``python3`` and ``python`` respectively.  The same goes for ``pip3`` and
-``pip``.  Using Python2 will likely break things, so make sure that you're using
-the right version.
+``pip``.  Running Paperless with Python2 will likely break things, so make sure that
+you're using the right version.

 For the purposes of simplicity, ``python`` and ``pip`` is used everywhere to
-refer to their Python 3 versions.
+refer to their Python3 versions.

 In addition to the above, there are a number of Python requirements, all of
-which are listed in a file called ``requirements.txt`` in the project root.
+which are listed in a file called ``requirements.txt`` in the project root directory.

 If you're not working on a virtual environment (like Vagrant or Docker), you
 should probably be using a virtualenv, but that's your call.  The reasons why
@@ -67,7 +67,7 @@ dependencies is easy:

    $ pip install --user --requirement /path/to/paperless/requirements.txt

-This should download and install all of the requirements into
+This will download and install all of the requirements into
 ``${HOME}/.local``.  Remember that your distribution may be using ``pip3`` as
 mentioned above.

@@ -86,8 +86,8 @@ enter it, and install the requirements using the ``requirements.txt`` file:
    $ . /path/to/arbitrary/directory/bin/activate
    $ pip install  --requirement /path/to/paperless/requirements.txt

-Now you're ready to go.  Just remember to enter your virtualenv whenever you
-want to use Paperless.
+Now you're ready to go.  Just remember to enter (activate) your virtualenv
+whenever you want to use Paperless.


 .. _requirements-documentation:
@@ -95,7 +95,7 @@ want to use Paperless.
 Documentation
 -------------

-As generation of the documentation is not required for use of Paperless,
+As generation of the documentation is not required for the use of Paperless,
 dependencies for this process are not included in ``requirements.txt``.  If
 you'd like to generate your own docs locally, you'll need to:

--- a/docs/scanners.rst
+++ b/docs/scanners.rst
@@ -0,0 +1,29 @@
+.. _scanners:
+
+Scanner Recommendations
+=======================
+
+As Paperless operates by watching a folder for new files, it doesn't care what
+scanner you use, but sometimes finding a scanner that will write to an FTP,
+NFS, or SMB server can be difficult.  This page is here to help you find one
+that works right for you based on recommendations from other Paperless users.
+
++---------+----------------+-----+-----+-----+----------------+
+| Brand   | Model          | Supports        | Recommended By |
++---------+----------------+-----+-----+-----+----------------+
+|         |                | FTP | NFS | SMB |                |
++=========+================+=====+=====+=====+================+
+| Brother | `ADS-1500W`_   | yes | no  | yes | `danielquinn`_ |
++---------+----------------+-----+-----+-----+----------------+
+| Brother | `MFC-J6930DW`_ | yes |     |     | `ayounggun`_   |
++---------+----------------+-----+-----+-----+----------------+
+| Fujitsu | `ix500`_       | yes |     | yes | `eonist`_      |
++---------+----------------+-----+-----+-----+----------------+
+
+.. _ADS-1500W: https://www.brother.ca/en/p/ads1500w
+.. _MFC-J6930DW: https://www.brother.ca/en/p/MFCJ6930DW
+.. _ix500: http://www.fujitsu.com/us/products/computing/peripheral/scanners/scansnap/ix500/
+
+.. _danielquinn: https://github.com/danielquinn
+.. _ayounggun: https://github.com/ayounggun
+.. _eonist: https://github.com/eonist
--- a/docs/setup.rst
+++ b/docs/setup.rst
@@ -4,7 +4,7 @@ Setup
 =====

 Paperless isn't a very complicated app, but there are a few components, so some
-basic documentation is in order.  If you go follow along in this document and
+basic documentation is in order.  If you follow along in this document and
 still have trouble, please open an `issue on GitHub`_ so I can fill in the
 gaps.

@@ -28,6 +28,7 @@ or just download the tarball and go that route:

 .. code:: bash

+    $ cd /path/to/where/you/want/paperless
    $ wget https://github.com/danielquinn/paperless/archive/master.zip
    $ unzip master.zip
    $ cd paperless-master
@@ -43,7 +44,9 @@ route`_ is quick & easy, but means you're running a VM which comes with memory
 consumption etc. We also `support Docker`_, which you can use natively under
 Linux and in a VM with `Docker Machine`_ (this guide was written for native
 Docker usage under Linux, you might have to adapt it for Docker Machine.)
-Alternatively the standard, `bare metal`_ approach is a little more
+There's also the virtualenv route, which is similar to `bare metal`_ except
+that you have to activate the virtualenv first.
+Last but not least, the standard `bare metal`_ approach is a little more
 complicated, but worth it because it makes it easier should you want to
 contribute some code back.

@@ -59,9 +62,11 @@ Standard (Bare Metal)
 .....................

 1. Install the requirements as per the :ref:`requirements <requirements>` page.
-2. Change to the ``src`` directory in this repo.
-3. Copy ``paperless.conf.example`` to ``/etc/paperless.conf`` and open it in
-   your favourite editor.  Set the values for:
+2. Within the extracted ``master.zip`` directory, change to the ``src`` directory.
+3. Copy ``paperless.conf.example`` to ``/etc/paperless.conf`` (a virtualenv
+   install looks for it there too) and open it in your favourite editor.
+   Because this file contains passwords, it should only be readable by the
+   ``root`` and ``paperless`` users.  Set the values for:

    * ``PAPERLESS_CONSUMPTION_DIR``: this is where your documents will be
      dumped to be consumed by Paperless.
@@ -70,18 +75,19 @@ Standard (Bare Metal)
    * ``PAPERLESS_OCR_THREADS``: this is the number of threads the OCR process
      will spawn to process document pages in parallel.

-4. Initialise the database with ``./manage.py migrate``.
+4. Initialise the SQLite database with ``./manage.py migrate``.
 5. Create a user for your Paperless instance with
   ``./manage.py createsuperuser``. Follow the prompts to create your user.
 6. Start the webserver with ``./manage.py runserver <IP>:<PORT>``.
-   If no specifc IP or port are given, the default is ``127.0.0.1:8000``.
-   You should now be able to visit your (empty) `Paperless webserver`_ at
-   ``127.0.0.1:8000`` (or whatever you chose).  You can login with the
-   user/pass you created in #5.
+   If no specific IP or port is given, the default is ``127.0.0.1:8000``,
+   also known as http://localhost:8000/.
+   You should now be able to visit your (empty) `Paperless webserver`_ there,
+   or at whatever address you chose.  You can log in with the user/pass you
+   created in #5.
 7. In a separate window, change to the ``src`` directory in this repo again,
   but this time, you should start the consumer script with
   ``./manage.py document_consumer``.
-8. Scan something.  Put it in the ``CONSUMPTION_DIR``.
+8. Scan something or put a file into the ``CONSUMPTION_DIR``.
 9. Wait a few minutes
 10. Visit the document list on your webserver, and it should be there, indexed
    and downloadable.
@@ -299,17 +305,21 @@ Standard (Bare Metal, Systemd)

 If you're running on a bare metal system that's using Systemd, you can use the
 service unit files in the ``scripts`` directory to set this up.  You'll need to
-create a user called ``paperless`` and setup Paperless to be in a place that
-this new user can read and write to. Be sure to edit the service scripts to point
-to the proper location of your paperless install, referencing the appropriate Python
-binary. For example: ``ExecStart=/path/to/python3 /path/to/paperless/src/manage.py document_consumer``.
-If you don't want to make a new user, you can change the ``Group`` and ``User`` variables
-accordingly.
+create a user called ``paperless`` (without login, if you haven't already done
+so in #5) and set up Paperless to be in a place that this new user can read and
+write to.  Be sure to edit the service scripts to point to the proper location
+of your Paperless install, referencing the appropriate Python binary.  For example:
+``ExecStart=/path/to/python3 /path/to/paperless/src/manage.py document_consumer``.
+If you don't want to make a new user, you can change the ``Group`` and ``User``
+variables accordingly.

-Then, you can just tell Systemd as ``root`` (or using ``sudo``) to enable the two ``.service`` files::
+Then, as ``root`` (or using ``sudo``) you can just copy the ``.service`` files
+to the Systemd directory and tell it to enable the two services::

-    # systemctl enable /path/to/paperless/scripts/paperless-consumer.service
-    # systemctl enable /path/to/paperless/scripts/paperless-webserver.service
+    # cp /path/to/paperless/scripts/paperless-consumer.service /etc/systemd/system/
+    # cp /path/to/paperless/scripts/paperless-webserver.service /etc/systemd/system/
+    # systemctl enable paperless-consumer
+    # systemctl enable paperless-webserver
    # systemctl start paperless-consumer
    # systemctl start paperless-webserver

@@ -344,7 +354,7 @@ after restarting your system:
  If you are using a network interface other than ``eth0``, you will have to
  change ``IFACE=eth0``. For example, if you are connected via WiFi, you will
  likely need to replace ``eth0`` above with ``wlan0``. To see all interfaces,
-  run ``ifconfig``.
+  run ``ifconfig -a``.

  Save the file.

@@ -384,7 +394,10 @@ Using a Real Webserver
 The default is to use Django's development server, as that's easy and does the
 job well enough on a home network.  However, if you want to do things right,
 it's probably a good idea to use a webserver capable of handling more than one
-thread.
+thread. You will also have to let the webserver serve the static files (CSS,
+JavaScript) from the directory configured in ``PAPERLESS_STATICDIR``. For that,
+you need to run ``./manage.py collectstatic`` in the ``src`` directory.  The
+default static files directory is ``../static``.

 Apache
 ~~~~~~
@@ -562,3 +575,28 @@ If you're using Docker, you can set a restart-policy_ in the
 Docker daemon.

 .. _restart-policy: https://docs.docker.com/engine/reference/commandline/run/#restart-policies-restart
+
+
+.. _setup-subdirectory:
+
+Hosting Paperless in a Subdirectory
+-----------------------------------
+
+Paperless was designed to run off the root of the hosting domain
+(i.e. ``https://example.com/``), but with a few changes you can configure
+it to run in a subdirectory on your server
+(i.e. ``https://example.com/paperless/``).
+
+Thanks to the efforts of `maphy-psd`_ on `Github`_, running Paperless in a
+subdirectory is now as easy as setting a config variable.  Simply set
+``PAPERLESS_FORCE_SCRIPT_NAME`` in your environment or
+``/etc/paperless.conf`` to the path you want Paperless hosted at, configure
+Nginx/Apache for your needs and you're done.  So, if you want Paperless to live
+at ``https://example.com/arbitrary/path/to/paperless`` then you just set
+``PAPERLESS_FORCE_SCRIPT_NAME`` to ``/arbitrary/path/to/paperless``.  Note the
+leading ``/`` there.
+
+As to how to configure Nginx or Apache for this, that's on you :-)
+
+.. _maphy-psd: https://github.com/maphy-psd
+.. _Github: https://github.com/danielquinn/paperless/pull/255
--- a/paperless.conf.example
+++ b/paperless.conf.example
@@ -1,11 +1,34 @@
 # Sample paperless.conf
 # Copy this file to /etc/paperless.conf and modify it to suit your needs.
+# As this file contains passwords it should only be readable by the user
+# running paperless.
+
+
+###############################################################################
+####                         Paths & Folders                               ####
+###############################################################################

 # This is where your documents should go to be consumed.  Make sure that it exists
 # and that the user running the paperless service can read/write its contents
 # before you start Paperless.
 PAPERLESS_CONSUMPTION_DIR=""

+
+# You can specify where you want the SQLite database to be stored instead of
+# the default location of /data/ within the install directory.
+#PAPERLESS_DBDIR=/path/to/database/file
+
+
+# Override the default MEDIA_ROOT here.  This is where all files are stored.
+# The default location is /media/documents/ within the install folder.
+#PAPERLESS_MEDIADIR=/path/to/media
+
+
+# Override the default STATIC_ROOT here.  This is where all static files
+# created using "collectstatic" manager command are stored.
+#PAPERLESS_STATICDIR=""
+
+
 # These values are required if you want paperless to check a particular email
 # box every 10 minutes and attempt to consume documents from there.  If you
 # don't define a HOST, mail checking will just be disabled.
@@ -14,6 +37,19 @@ PAPERLESS_CONSUME_MAIL_PORT=""
 PAPERLESS_CONSUME_MAIL_USER=""
 PAPERLESS_CONSUME_MAIL_PASS=""

+# Override the default IMAP inbox here. If not set Paperless defaults to
+# "INBOX".
+#PAPERLESS_CONSUME_MAIL_INBOX="INBOX"
+
+# Any email sent to the target account that does not contain this text will be
+# ignored.
+PAPERLESS_EMAIL_SECRET=""
+
+
+###############################################################################
+####                              Security                                 ####
+###############################################################################
+
 # You must have a passphrase in order for Paperless to work at all.  If you set
 # this to "", GnuPG will "encrypt" your PDF by writing it out as a zero-byte
 # file.
@@ -28,75 +64,13 @@ PAPERLESS_CONSUME_MAIL_PASS=""
 # you've since changed it to a new one.
 PAPERLESS_PASSPHRASE="secret"

-# If you intend to consume documents either via HTTP POST or by email, you must
-# have a shared secret here.
-PAPERLESS_SHARED_SECRET=""

-# After a document is consumed, Paperless can trigger an arbitrary script if
-# you like.  This script will be passed a number of arguments for you to work
-# with.  The default is blank, which means nothing will be executed.  For more
-# information, take a look at the docs: http://paperless.readthedocs.org/en/latest/consumption.html#hooking-into-the-consumption-process
-#PAPERLESS_POST_CONSUME_SCRIPT="/path/to/an/arbitrary/script.sh"
+# The secret key has a default that should be fine so long as you're hosting
+# Paperless on a closed network.  However, if you're putting this anywhere
+# public, you should change the key to something unique and verbose.
+#PAPERLESS_SECRET_KEY="change-me"


-#
-# The following values use sensible defaults for modern systems, but if you're
-# running Paperless on a low-resource machine (like a Raspberry Pi), modifying
-# some of these values may be necessary.
-#
-
-
-# By default, Paperless will attempt to use all available CPU cores to process
-# a document, but if you would like to limit that, you can set this value to
-# an integer:
-#PAPERLESS_OCR_THREADS=1
-
-# On smaller systems, or even in the case of Very Large Documents, the consumer
-# may explode, complaining about how it's "unable to extent pixel cache".  In
-# such cases, try setting this to a reasonably low value, like 32000000.  The
-# default is to use whatever is necessary to do everything without writing to
-# disk, and units are in megabytes.
-#
-# For more information on how to use this value, you should probably search
-# the web for "MAGICK_MEMORY_LIMIT".
-#PAPERLESS_CONVERT_MEMORY_LIMIT=0
-
-# By default the conversion density setting for documents is 300DPI, in some
-# cases it has proven useful to configure a lesser value.
-# This setting has a high impact on the physical size of tmp page files,
-# the speed of document conversion, and can affect the accuracy of OCR
-# results. Individual results can vary and this setting should be tested
-# thoroughly against the documents you are importing to see if it has any
-# impacts either negative or positive. Testing on limited document sets has
-# shown a setting of 200 can cut the size of tmp files by 1/3, and speed up
-# conversion by up to 4x with little impact to OCR accuracy.
-#PAPERLESS_CONVERT_DENSITY=300
-
-# Similar to the memory limit, if you've got a small system and your OS mounts
-# /tmp as tmpfs, you should set this to a path that's on a physical disk, like
-# /home/your_user/tmp or something.  ImageMagick will use this as scratch space
-# when crunching through very large documents.
-#
-# For more information on how to use this value, you should probably search
-# the web for "MAGICK_TMPDIR".
-#PAPERLESS_CONVERT_TMPDIR=/var/tmp/paperless
-
-# You can specify where you want the SQLite database to be stored instead of
-# the default location
-#PAPERLESS_DBDIR=/path/to/database/file
-
-# Override the default MEDIA_ROOT here.  This is where all files are stored.
-#PAPERLESS_MEDIADIR=/path/to/media
-
-# Override the default STATIC_ROOT here. This is where all static files created
-# using "collectstatic" manager command are stored.
-#PAPERLESS_STATICDIR=""
-
-# The number of seconds that Paperless will wait between checking
-# PAPERLESS_CONSUMPTION_DIR.  If you tend to write documents to this directory
-# very slowly, you may want to use a higher value than the default (10).
-# PAPERLESS_CONSUMER_LOOP_TIME=10
-
 # If you're planning on putting Paperless on the open internet, then you
 # really should set this value to the domain name you're using.  Failing to do
 # so leaves you open to HTTP host header attacks:
@@ -106,22 +80,94 @@ PAPERLESS_SHARED_SECRET=""
 # as is "example.com,www.example.com", but NOT " example.com" or "example.com,"
 #PAPERLESS_ALLOWED_HOSTS="example.com,www.example.com"

-# Override the default UTC time zone here
+# To host paperless under a subpath url like example.com/paperless you set
+# this value to /paperless. No trailing slash!
+#
+# https://docs.djangoproject.com/en/1.11/ref/settings/#force-script-name
+#PAPERLESS_FORCE_SCRIPT_NAME=""
+
+###############################################################################
+####                          Software Tweaks                              ####
+###############################################################################
+
+# After a document is consumed, Paperless can trigger an arbitrary script if
+# you like.  This script will be passed a number of arguments for you to work
+# with.  The default is blank, which means nothing will be executed.  For more
+# information, take a look at the docs:
+# http://paperless.readthedocs.org/en/latest/consumption.html#hooking-into-the-consumption-process
+#PAPERLESS_POST_CONSUME_SCRIPT="/path/to/an/arbitrary/script.sh"
+
+
+#
+# The following values use sensible defaults for modern systems, but if you're
+# running Paperless on a low-resource device (like a Raspberry Pi), modifying
+# some of these values may be necessary.
+#
+
+
+# By default, Paperless will attempt to use all available CPU cores to process
+# a document, but if you would like to limit that, you can set this value to
+# an integer:
+#PAPERLESS_OCR_THREADS=1
+
+
+# Customize the default language that tesseract will attempt to use when
+# parsing documents.  It should be a 3-letter language code consistent with ISO
+# 639: https://www.loc.gov/standards/iso639-2/php/code_list.php
+#PAPERLESS_OCR_LANGUAGE=eng
+
+
+# On smaller systems, or even in the case of Very Large Documents, the consumer
+# may explode, complaining about how it's "unable to extend pixel cache".  In
+# such cases, try setting this to a reasonably low value, like 32000000.  The
+# default is to use whatever is necessary to do everything without writing to
+# disk, and units are in megabytes.
+#
+# For more information on how to use this value, you should probably search
+# the web for "MAGICK_MEMORY_LIMIT".
+#PAPERLESS_CONVERT_MEMORY_LIMIT=0
+
+
+# Similar to the memory limit, if you've got a small system and your OS mounts
+# /tmp as tmpfs, you should set this to a path that's on a physical disk, like
+# /home/your_user/tmp or something.  ImageMagick will use this as scratch space
+# when crunching through very large documents.
+#
+# For more information on how to use this value, you should probably search
+# the web for "MAGICK_TMPDIR".
+#PAPERLESS_CONVERT_TMPDIR=/var/tmp/paperless
+
+
+# By default the conversion density setting for documents is 300DPI, in some
+# cases it has proven useful to configure a lesser value.
+# This setting has a high impact on the physical size of tmp page files,
+# the speed of document conversion, and can affect the accuracy of OCR
+# results. Individual results can vary and this setting should be tested
+# thoroughly against the documents you are importing to see if it has any
+# impacts either negative or positive.
+# Testing on limited document sets has shown a setting of 200 can cut the
+# size of tmp files by 1/3, and speed up conversion by up to 4x
+# with little impact to OCR accuracy.
+#PAPERLESS_CONVERT_DENSITY=300
+
+
+# The number of seconds that Paperless will wait between checking
+# PAPERLESS_CONSUMPTION_DIR.  If you tend to write documents to this directory
+# rarely, you may want to use a higher value than the default (10).
+#PAPERLESS_CONSUMER_LOOP_TIME=10
+
+
+###############################################################################
+####                            Interface                                  ####
+###############################################################################
+
+# Override the default UTC time zone here.
+# See https://docs.djangoproject.com/en/1.10/ref/settings/#std:setting-TIME_ZONE
+# for details on how to set it.
 #PAPERLESS_TIME_ZONE=UTC

-# Customize number of list items to show per page
-#PAPERLESS_LIST_PER_PAGE=50
-
-# Customize the default language that tesseract will attempt to use when parsing
-# documents.  It should be a 3-letter language code consistent with ISO 639.
-#PAPERLESS_OCR_LANGUAGE=eng

 # The number of items on each page in the web UI.  This value must be a
 # positive integer, but if you don't define one in paperless.conf, a default of
 # 100 will be used.
 #PAPERLESS_LIST_PER_PAGE=100
-
-# The secret key has a default that should be fine so long as you're hosting
-# Paperless on a closed network.  However, if you're putting this anywhere
-# public, you should change the key to something unique and verbose.
-#PAPERLESS_SECRET_KEY="change-me"
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,4 +1,4 @@
-Django==1.10.5
+Django>=1.11,<2.0
 Pillow>=3.1.1
 django-crispy-forms>=1.6.1
 django-extensions>=1.7.6
@@ -6,18 +6,21 @@ django-filter>=1.0
 django-flat-responsive>=1.2.0
 djangorestframework>=3.5.3
 filemagic>=1.6
+fuzzywuzzy[speedup]==0.15.0
 langdetect>=1.0.7
-pyocr>=0.4.6
+pyocr>=0.4.7
 python-dateutil>=2.6.0
 python-dotenv>=0.6.2
 python-gnupg>=0.3.9
 pytz>=2016.10
-gunicorn==19.6.0
+gunicorn==19.7.1

 # For the tests
+factory-boy
 pytest
 pytest-django
 pytest-sugar
-pep8
+pytest-env
+pycodestyle
 flake8
 tox
--- a/scripts/docker-entrypoint.sh
+++ b/scripts/docker-entrypoint.sh
@@ -7,34 +7,37 @@ map_uidgid() {
    USERMAP_ORIG_GID=$(id -g paperless)
    USERMAP_GID=${USERMAP_GID:-${USERMAP_UID:-$USERMAP_ORIG_GID}}
    USERMAP_UID=${USERMAP_UID:-$USERMAP_ORIG_UID}
-    if [[ ${USERMAP_UID} != ${USERMAP_ORIG_UID} || ${USERMAP_GID} != ${USERMAP_ORIG_GID} ]]; then
+    if [[ ${USERMAP_UID} != "${USERMAP_ORIG_UID}" || ${USERMAP_GID} != "${USERMAP_ORIG_GID}" ]]; then
        echo "Mapping UID and GID for paperless:paperless to $USERMAP_UID:$USERMAP_GID"
-        groupmod -g ${USERMAP_GID} paperless
+        groupmod -g "${USERMAP_GID}" paperless
        sed -i -e "s|:${USERMAP_ORIG_UID}:${USERMAP_GID}:|:${USERMAP_UID}:${USERMAP_GID}:|" /etc/passwd
    fi
 }

 set_permissions() {
-    # Set permissions for consumption directory
-    chgrp paperless "$PAPERLESS_CONSUMPTION_DIR" || {
-        echo "Changing group of consumption directory:"
-        echo "  $PAPERLESS_CONSUMPTION_DIR"
-        echo "failed."
-        echo ""
-        echo "Either try to set it on your host-mounted directory"
-        echo "directly, or make sure that the directory has \`o+x\`"
-        echo "permissions and the files in it at least \`o+r\`."
-    } >&2
-    chmod g+x "$PAPERLESS_CONSUMPTION_DIR" || {
-        echo "Changing group permissions of consumption directory:"
-        echo "  $PAPERLESS_CONSUMPTION_DIR"
-        echo "failed."
-        echo ""
-        echo "Either try to set it on your host-mounted directory"
-        echo "directly, or make sure that the directory has \`o+x\`"
-        echo "permissions and the files in it at least \`o+r\`."
-    } >&2
-
+    # Set permissions for consumption and export directory
+    for dir in PAPERLESS_CONSUMPTION_DIR PAPERLESS_EXPORT_DIR; do
+      # Extract the name of the current directory from $dir for the error message
+      cur_dir_name=$(echo "$dir" | awk -F'_' '{ print tolower($2); }')
+      chgrp paperless "${!dir}" || {
+          echo "Changing group of ${cur_dir_name} directory:"
+          echo "  ${!dir}"
+          echo "failed."
+          echo ""
+          echo "Either try to set it on your host-mounted directory"
+          echo "directly, or make sure that the directory has \`g+wx\`"
+          echo "permissions and the files in it at least \`o+r\`."
+      } >&2
+      chmod g+wx "${!dir}" || {
+          echo "Changing group permissions of ${cur_dir_name} directory:"
+          echo "  ${!dir}"
+          echo "failed."
+          echo ""
+          echo "Either try to set it on your host-mounted directory"
+          echo "directly, or make sure that the directory has \`g+wx\`"
+          echo "permissions and the files in it at least \`o+r\`."
+      } >&2
+    done
    # Set permissions for application directory
    chown -Rh paperless:paperless /usr/src/paperless
 }
@@ -59,11 +62,11 @@ install_languages() {
    # Loop over languages to be installed
    for lang in "${langs[@]}"; do
        pkg="tesseract-ocr-$lang"
-        if dpkg -s "$pkg" 2>&1 > /dev/null; then
+        if dpkg -s "$pkg" > /dev/null 2>&1; then
            continue
        fi

-        if ! apt-cache show "$pkg" 2>&1 > /dev/null; then
+        if ! apt-cache show "$pkg" > /dev/null 2>&1; then
            continue
        fi

--- a/src/documents/admin.py
+++ b/src/documents/admin.py
@@ -70,9 +70,14 @@ class DocumentAdmin(CommonAdmin):
    created_.short_description = "Created"

    def thumbnail(self, obj):
+        if settings.FORCE_SCRIPT_NAME:
+            src_link = "{}/fetch/thumb/{}".format(
+                settings.FORCE_SCRIPT_NAME, obj.id)
+        else:
+            src_link = "/fetch/thumb/{}".format(obj.id)
        png_img = self._html_tag(
            "img",
-            src="/fetch/thumb/{}".format(obj.id),
+            src=src_link,
            width=180,
            alt="Thumbnail of {}".format(obj.file_name),
            title=obj.file_name
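The branch added to ``thumbnail`` above amounts to prefixing the thumbnail path with the script name whenever one is configured. A minimal standalone sketch of that idea follows. ``thumbnail_src`` and its ``script_name`` argument are hypothetical names used only for illustration; they are not part of Paperless.

```python
def thumbnail_src(doc_id, script_name=None):
    """Return the thumbnail URL, prefixed with script_name when it is set.

    Hypothetical helper: script_name stands in for settings.FORCE_SCRIPT_NAME.
    """
    # An unset or empty script name means "hosted at the root of the domain".
    prefix = script_name or ""
    return "{}/fetch/thumb/{}".format(prefix, doc_id)


print(thumbnail_src(42))                # /fetch/thumb/42
print(thumbnail_src(42, "/paperless"))  # /paperless/fetch/thumb/42
```

Because the prefix is concatenated as-is, a script name with a trailing slash would produce a double slash, which matches the "no trailing slash" advice in the sample config.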
--- a/src/documents/consumer.py
+++ b/src/documents/consumer.py
@@ -102,7 +102,7 @@ class Consumer(object):
            parser_class = self._get_parser_class(doc)
            if not parser_class:
                self.log(
-                    "info", "No parsers could be found for {}".format(doc))
+                    "error", "No parsers could be found for {}".format(doc))
                self._ignore.append(doc)
                continue

@@ -160,6 +160,16 @@ class Consumer(object):
            if result:
                options.append(result)

+        self.log(
+            "info",
+            "Parsers available: {}".format(
+                ", ".join([str(o["parser"].__name__) for o in options])
+            )
+        )
+
+        if not options:
+            return None
+
        # Return the parser with the highest weight.
        return sorted(
            options, key=lambda _: _["weight"], reverse=True)[0]["parser"]
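The new ``if not options`` guard matters because indexing ``[0]`` on an empty sorted list raises ``IndexError``. The selection logic can be sketched on its own as below. ``pick_parser`` and the two parser classes are hypothetical stand-ins, and ``max`` is swapped in for the ``sorted(...)[0]`` used in the diff.

```python
def pick_parser(options):
    """Return the parser with the highest weight, or None if none applied."""
    if not options:
        return None
    # max() picks the same entry as sorting by weight in descending order
    # and taking the first element.
    return max(options, key=lambda option: option["weight"])["parser"]


class TextParser:  # hypothetical stand-in for a real parser class
    pass


class PdfParser:  # hypothetical stand-in for a real parser class
    pass


options = [
    {"parser": TextParser, "weight": 0},
    {"parser": PdfParser, "weight": 10},
]
print(pick_parser(options).__name__)  # PdfParser
print(pick_parser([]))                # None
```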
--- a/src/documents/forms.py
+++ b/src/documents/forms.py
@@ -2,7 +2,6 @@ import magic
 import os

 from datetime import datetime
-from hashlib import sha256
 from time import mktime

 from django import forms
@@ -14,7 +13,6 @@ from .consumer import Consumer

 class UploadForm(forms.Form):

-    SECRET = settings.SHARED_SECRET
    TYPE_LOOKUP = {
        "application/pdf": Document.TYPE_PDF,
        "image/png": Document.TYPE_PNG,
@@ -32,10 +30,9 @@ class UploadForm(forms.Form):
        required=False
    )
    document = forms.FileField()
-    signature = forms.CharField(max_length=256)

    def __init__(self, *args, **kwargs):
-        forms.Form.__init__(*args, **kwargs)
+        forms.Form.__init__(self, *args, **kwargs)
        self._file_type = None

    def clean_correspondent(self):
@@ -82,17 +79,6 @@ class UploadForm(forms.Form):

        return document

-    def clean(self):
-
-        corresp = self.clened_data.get("correspondent")
-        title = self.cleaned_data.get("title")
-        signature = self.cleaned_data.get("signature")
-
-        if sha256(corresp + title + self.SECRET).hexdigest() == signature:
-            return self.cleaned_data
-
-        raise forms.ValidationError("The signature provided did not validate")
-
    def save(self):
        """
        Since the consumer already does a lot of work, it's easier just to save
@@ -100,11 +86,11 @@ class UploadForm(forms.Form):
        form do that as well.  Think of it as a poor-man's queue server.
        """

-        correspondent = self.clened_data.get("correspondent")
+        correspondent = self.cleaned_data.get("correspondent")
        title = self.cleaned_data.get("title")
        document = self.cleaned_data.get("document")

-        t = int(mktime(datetime.now()))
+        t = int(mktime(datetime.now().timetuple()))
        file_name = os.path.join(
            Consumer.CONSUME,
            "{} - {}.{}".format(correspondent, title, self._file_type)
--- a/src/documents/mail.py
+++ b/src/documents/mail.py
@@ -43,7 +43,10 @@ class Message(Loggable):
    and n attachments, and that we don't care about the message body.
    """

-    SECRET = settings.SHARED_SECRET
+    SECRET = os.getenv(
+        "PAPERLESS_EMAIL_SECRET",
+        os.getenv("PAPERLESS_SHARED_SECRET")  # TODO: Remove after 2017/09
+    )

    def __init__(self, data, group=None):
        """
@@ -153,11 +156,11 @@ class MailFetcher(Loggable):
        Loggable.__init__(self)

        self._connection = None
-        self._host = settings.MAIL_CONSUMPTION["HOST"]
-        self._port = settings.MAIL_CONSUMPTION["PORT"]
-        self._username = settings.MAIL_CONSUMPTION["USERNAME"]
-        self._password = settings.MAIL_CONSUMPTION["PASSWORD"]
-        self._inbox = settings.MAIL_CONSUMPTION["INBOX"]
+        self._host = os.getenv("PAPERLESS_CONSUME_MAIL_HOST")
+        self._port = os.getenv("PAPERLESS_CONSUME_MAIL_PORT")
+        self._username = os.getenv("PAPERLESS_CONSUME_MAIL_USER")
+        self._password = os.getenv("PAPERLESS_CONSUME_MAIL_PASS")
+        self._inbox = os.getenv("PAPERLESS_CONSUME_MAIL_INBOX", "INBOX")

        self._enabled = bool(self._host)

@@ -219,7 +222,7 @@ class MailFetcher(Loggable):
        if not login[0] == "OK":
            raise MailFetcherError("Can't log into mail: {}".format(login[1]))

-        inbox = self._connection.select("INBOX")
+        inbox = self._connection.select(self._inbox)
        if not inbox[0] == "OK":
            raise MailFetcherError("Can't find the inbox: {}".format(inbox[1]))

--- a/src/documents/management/commands/document_consumer.py
+++ b/src/documents/management/commands/document_consumer.py
@@ -28,6 +28,7 @@ class Command(BaseCommand):

        self.file_consumer = None
        self.mail_fetcher = None
+        self.first_iteration = True

        BaseCommand.__init__(self, *args, **kwargs)

@@ -66,6 +67,9 @@ class Command(BaseCommand):
        self.file_consumer.consume()

        # Occasionally fetch mail and store it to be consumed on the next loop
+        # We fetch email when we first start up so that it is not necessary to
+        # wait for 10 minutes after making changes to the config file.
        delta = self.mail_fetcher.last_checked + self.MAIL_DELTA
-        if delta < datetime.datetime.now():
+        if self.first_iteration or delta < datetime.datetime.now():
+            self.first_iteration = False
            self.mail_fetcher.pull()
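The condition above can be read as "pull on the very first loop, otherwise only once the interval has elapsed". A minimal sketch of that decision as a pure function follows. ``should_pull`` is a hypothetical name, and the ten-minute ``MAIL_DELTA`` is an assumption taken from the comment rather than from the class definition.

```python
import datetime

# Assumed interval, based on the "10 minutes" mentioned in the comment.
MAIL_DELTA = datetime.timedelta(minutes=10)


def should_pull(first_iteration, last_checked, now, delta=MAIL_DELTA):
    """Return True if mail should be fetched on this loop iteration."""
    return first_iteration or last_checked + delta < now


now = datetime.datetime(2018, 1, 6, 19, 0)
recent = now - datetime.timedelta(minutes=1)
stale = now - datetime.timedelta(minutes=11)

print(should_pull(True, recent, now))   # True: always pull on startup
print(should_pull(False, recent, now))  # False: checked a minute ago
print(should_pull(False, stale, now))   # True: interval has elapsed
```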
--- a/src/documents/management/commands/document_exporter.py
+++ b/src/documents/management/commands/document_exporter.py
@@ -10,6 +10,7 @@ from documents.models import Document, Correspondent, Tag
 from paperless.db import GnuPG

 from ...mixins import Renderable
+from documents.settings import EXPORTER_FILE_NAME, EXPORTER_THUMBNAIL_NAME


 class Command(Renderable, BaseCommand):
@@ -61,15 +62,24 @@ class Command(Renderable, BaseCommand):

            document = document_map[document_dict["pk"]]

-            target = os.path.join(self.target, document.file_name)
-            document_dict["__exported_file_name__"] = target
+            file_target = os.path.join(self.target, document.file_name)

-            print("Exporting: {}".format(target))
+            thumbnail_name = document.file_name + "-thumbnail.png"
+            thumbnail_target = os.path.join(self.target, thumbnail_name)

-            with open(target, "wb") as f:
+            document_dict[EXPORTER_FILE_NAME] = document.file_name
+            document_dict[EXPORTER_THUMBNAIL_NAME] = thumbnail_name
+
+            print("Exporting: {}".format(file_target))
+
+            t = int(time.mktime(document.created.timetuple()))
+            with open(file_target, "wb") as f:
                f.write(GnuPG.decrypted(document.source_file))
-                t = int(time.mktime(document.created.timetuple()))
-                os.utime(target, times=(t, t))
+                os.utime(file_target, times=(t, t))
+
+            with open(thumbnail_target, "wb") as f:
+                f.write(GnuPG.decrypted(document.thumbnail_file))
+                os.utime(thumbnail_target, times=(t, t))

        manifest += json.loads(
            serializers.serialize("json", Correspondent.objects.all()))
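The exporter stamps each written file with the document's creation date. A self-contained sketch of that step is shown below. ``write_with_mtime`` is a hypothetical helper, the payload is dummy bytes rather than decrypted GnuPG output, and, unlike the diff, the sketch calls ``os.utime`` after the ``with`` block so that a late flush on close cannot overwrite the timestamp.

```python
import os
import tempfile
import time
from datetime import datetime


def write_with_mtime(path, payload, created):
    """Write payload to path, then set atime and mtime to the created date."""
    # time.mktime() expects a struct_time, hence the timetuple() call;
    # passing the datetime itself raises TypeError.
    t = int(time.mktime(created.timetuple()))
    with open(path, "wb") as f:
        f.write(payload)
    # Set the timestamps only once the file has been closed.
    os.utime(path, times=(t, t))
    return t


with tempfile.TemporaryDirectory() as target:
    path = os.path.join(target, "example.pdf")
    stamp = write_with_mtime(path, b"%PDF-", datetime(2017, 5, 12, 5, 7))
    print(int(os.stat(path).st_mtime) == stamp)  # True
```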
--- a/src/documents/management/commands/document_importer.py
+++ b/src/documents/management/commands/document_importer.py
@@ -10,6 +10,8 @@ from paperless.db import GnuPG

 from ...mixins import Renderable

+from documents.settings import EXPORTER_FILE_NAME, EXPORTER_THUMBNAIL_NAME
+

 class Command(Renderable, BaseCommand):

@@ -70,13 +72,13 @@ class Command(Renderable, BaseCommand):
            if not record["model"] == "documents.document":
                continue

-            if "__exported_file_name__" not in record:
+            if EXPORTER_FILE_NAME not in record:
                raise CommandError(
                    'The manifest file contains a record which does not '
                    'refer to an actual document file.'
                )

-            doc_file = record["__exported_file_name__"]
+            doc_file = record[EXPORTER_FILE_NAME]
            if not os.path.exists(os.path.join(self.source, doc_file)):
                raise CommandError(
                    'The manifest file refers to "{}" which does not '
@@ -90,10 +92,21 @@ class Command(Renderable, BaseCommand):
            if not record["model"] == "documents.document":
                continue

-            doc_file = record["__exported_file_name__"]
+            doc_file = record[EXPORTER_FILE_NAME]
+            thumb_file = record[EXPORTER_THUMBNAIL_NAME]
            document = Document.objects.get(pk=record["pk"])
-            with open(doc_file, "rb") as unencrypted:
+
+            document_path = os.path.join(self.source, doc_file)
+            thumbnail_path = os.path.join(self.source, thumb_file)
+
+            with open(document_path, "rb") as unencrypted:
                with open(document.source_path, "wb") as encrypted:
                    print("Encrypting {} and saving it to {}".format(
                        doc_file, document.source_path))
                    encrypted.write(GnuPG.encrypted(unencrypted))
+
+            with open(thumbnail_path, "rb") as unencrypted:
+                with open(document.thumbnail_path, "wb") as encrypted:
+                    print("Encrypting {} and saving it to {}".format(
+                        thumb_file, document.thumbnail_path))
+                    encrypted.write(GnuPG.encrypted(unencrypted))
--- a/src/documents/managers.py
+++ b/src/documents/managers.py
@@ -50,7 +50,7 @@ class GroupConcat(models.Aggregate):

    def _get_template(self, separator):
        if self.engine == self.ENGINE_MYSQL:
-            return "%(function)s(%(expressions)s, SEPARATOR '{}')".format(
+            return "%(function)s(%(expressions)s SEPARATOR '{}')".format(
                separator)
        return "%(function)s(%(expressions)s, '{}')".format(separator)

--- a/src/documents/migrations/0012_auto_20160305_0040.py
+++ b/src/documents/migrations/0012_auto_20160305_0040.py
@@ -38,6 +38,9 @@ class GnuPG(object):

 def move_documents_and_create_thumbnails(apps, schema_editor):

+    os.makedirs(os.path.join(settings.MEDIA_ROOT, "documents", "originals"), exist_ok=True)
+    os.makedirs(os.path.join(settings.MEDIA_ROOT, "documents", "thumbnails"), exist_ok=True)
+
    documents = os.listdir(os.path.join(settings.MEDIA_ROOT, "documents"))

    if set(documents) == {"originals", "thumbnails"}:
--- a/src/documents/migrations/0017_auto_20170512_0507.py
+++ b/src/documents/migrations/0017_auto_20170512_0507.py
@@ -0,0 +1,25 @@
+# -*- coding: utf-8 -*-
+# Generated by Django 1.10.5 on 2017-05-12 05:07
+from __future__ import unicode_literals
+
+from django.db import migrations, models
+
+
+class Migration(migrations.Migration):
+
+    dependencies = [
+        ('documents', '0016_auto_20170325_1558'),
+    ]
+
+    operations = [
+        migrations.AlterField(
+            model_name='correspondent',
+            name='matching_algorithm',
+            field=models.PositiveIntegerField(choices=[(1, 'Any'), (2, 'All'), (3, 'Literal'), (4, 'Regular Expression'), (5, 'Fuzzy Match')], default=1, help_text='Which algorithm you want to use when matching text to the OCR\'d PDF.  Here, "any" looks for any occurrence of any word provided in the PDF, while "all" requires that every word provided appear in the PDF, albeit not in the order provided.  A "literal" match means that the text you enter must appear in the PDF exactly as you\'ve entered it, and "regular expression" uses a regex to match the PDF.  (If you don\'t know what a regex is, you probably don\'t want this option.)  Finally, a "fuzzy match" looks for words or phrases that are mostly—but not exactly—the same, which can be useful for matching against documents containing imperfections that foil accurate OCR.'),
+        ),
+        migrations.AlterField(
+            model_name='tag',
+            name='matching_algorithm',
+            field=models.PositiveIntegerField(choices=[(1, 'Any'), (2, 'All'), (3, 'Literal'), (4, 'Regular Expression'), (5, 'Fuzzy Match')], default=1, help_text='Which algorithm you want to use when matching text to the OCR\'d PDF.  Here, "any" looks for any occurrence of any word provided in the PDF, while "all" requires that every word provided appear in the PDF, albeit not in the order provided.  A "literal" match means that the text you enter must appear in the PDF exactly as you\'ve entered it, and "regular expression" uses a regex to match the PDF.  (If you don\'t know what a regex is, you probably don\'t want this option.)  Finally, a "fuzzy match" looks for words or phrases that are mostly—but not exactly—the same, which can be useful for matching against documents containing imperfections that foil accurate OCR.'),
+        ),
+    ]
--- a/src/documents/migrations/0018_auto_20170715_1712.py
+++ b/src/documents/migrations/0018_auto_20170715_1712.py
@@ -0,0 +1,21 @@
+# -*- coding: utf-8 -*-
+# Generated by Django 1.10.5 on 2017-07-15 17:12
+from __future__ import unicode_literals
+
+from django.db import migrations, models
+import django.db.models.deletion
+
+
+class Migration(migrations.Migration):
+
+    dependencies = [
+        ('documents', '0017_auto_20170512_0507'),
+    ]
+
+    operations = [
+        migrations.AlterField(
+            model_name='document',
+            name='correspondent',
+            field=models.ForeignKey(blank=True, null=True, on_delete=django.db.models.deletion.SET_NULL, related_name='documents', to='documents.Correspondent'),
+        ),
+    ]
--- a/src/documents/models.py
+++ b/src/documents/models.py
@@ -1,3 +1,5 @@
+# coding=utf-8
+
 import dateutil.parser
 import logging
 import os
@@ -5,6 +7,7 @@ import re
 import uuid

 from collections import OrderedDict
+from fuzzywuzzy import fuzz

 from django.conf import settings
 from django.core.urlresolvers import reverse
@@ -21,11 +24,13 @@ class MatchingModel(models.Model):
    MATCH_ALL = 2
    MATCH_LITERAL = 3
    MATCH_REGEX = 4
+    MATCH_FUZZY = 5
    MATCHING_ALGORITHMS = (
        (MATCH_ANY, "Any"),
        (MATCH_ALL, "All"),
        (MATCH_LITERAL, "Literal"),
        (MATCH_REGEX, "Regular Expression"),
+        (MATCH_FUZZY, "Fuzzy Match"),
    )

    name = models.CharField(max_length=128, unique=True)
@@ -42,8 +47,11 @@ class MatchingModel(models.Model):
            "provided appear in the PDF, albeit not in the order provided.  A "
            "\"literal\" match means that the text you enter must appear in "
            "the PDF exactly as you've entered it, and \"regular expression\" "
-            "uses a regex to match the PDF.  If you don't know what a regex "
-            "is, you probably don't want this option."
+            "uses a regex to match the PDF.  (If you don't know what a regex "
+            "is, you probably don't want this option.)  Finally, a \"fuzzy "
+            "match\" looks for words or phrases that are mostly—but not "
+            "exactly—the same, which can be useful for matching against "
+            "documents containing imperfections that foil accurate OCR."
        )
    )

@@ -83,7 +91,7 @@ class MatchingModel(models.Model):
            search_kwargs = {"flags": re.IGNORECASE}

        if self.matching_algorithm == self.MATCH_ALL:
-            for word in self.match.split(" "):
+            for word in self._split_match():
                search_result = re.search(
                    r"\b{}\b".format(word), text, **search_kwargs)
                if not search_result:
@@ -91,7 +99,7 @@ class MatchingModel(models.Model):
            return True

        if self.matching_algorithm == self.MATCH_ANY:
-            for word in self.match.split(" "):
+            for word in self._split_match():
                if re.search(r"\b{}\b".format(word), text, **search_kwargs):
                    return True
            return False
@@ -104,8 +112,32 @@ class MatchingModel(models.Model):
            return bool(re.search(
                re.compile(self.match, **search_kwargs), text))

+        if self.matching_algorithm == self.MATCH_FUZZY:
+            match = re.sub(r'[^\w\s]', '', self.match)
+            text = re.sub(r'[^\w\s]', '', text)
+            if self.is_insensitive:
+                match = match.lower()
+                text = text.lower()
+
+            return fuzz.partial_ratio(match, text) >= 90
+
        raise NotImplementedError("Unsupported matching algorithm")

+    def _split_match(self):
+        r"""
+        Splits the match to individual keywords, getting rid of unnecessary
+        spaces and grouping quoted words together.
+
+        Example:
+          '  some random  words "with   quotes  " and   spaces'
+            ==>
+          ["some", "random", "words", "with\s+quotes", "and", "spaces"]
+        """
+        findterms = re.compile(r'"([^"]+)"|(\S+)').findall
+        normspace = re.compile(r"\s+").sub
+        return [normspace(" ", (t[0] or t[1]).strip()).replace(" ", r"\s+")
+                for t in findterms(self.match)]
+
    def save(self, *args, **kwargs):

        self.match = self.match.lower()
@@ -157,7 +189,12 @@ class Document(models.Model):
    TYPES = (TYPE_PDF, TYPE_PNG, TYPE_JPG, TYPE_GIF, TYPE_TIF,)

    correspondent = models.ForeignKey(
-        Correspondent, blank=True, null=True, related_name="documents")
+        Correspondent,
+        blank=True,
+        null=True,
+        related_name="documents",
+        on_delete=models.SET_NULL
+    )

    title = models.CharField(max_length=128, blank=True, db_index=True)

@@ -301,45 +338,45 @@ class FileInfo(object):
            r"(?P<correspondent>.*) - "
            r"(?P<title>.*) - "
            r"(?P<tags>[a-z0-9\-,]*)"
-            r"\.(?P<extension>pdf|jpe?g|png|gif|tiff)$",
+            r"\.(?P<extension>pdf|jpe?g|png|gif|tiff?)$",
            flags=re.IGNORECASE
        )),
        ("created-title-tags", re.compile(
            r"^(?P<created>\d\d\d\d\d\d\d\d(\d\d\d\d\d\d)?Z) - "
            r"(?P<title>.*) - "
            r"(?P<tags>[a-z0-9\-,]*)"
-            r"\.(?P<extension>pdf|jpe?g|png|gif|tiff)$",
+            r"\.(?P<extension>pdf|jpe?g|png|gif|tiff?)$",
            flags=re.IGNORECASE
        )),
        ("created-correspondent-title", re.compile(
            r"^(?P<created>\d\d\d\d\d\d\d\d(\d\d\d\d\d\d)?Z) - "
            r"(?P<correspondent>.*) - "
            r"(?P<title>.*)"
-            r"\.(?P<extension>pdf|jpe?g|png|gif|tiff)$",
+            r"\.(?P<extension>pdf|jpe?g|png|gif|tiff?)$",
            flags=re.IGNORECASE
        )),
        ("created-title", re.compile(
            r"^(?P<created>\d\d\d\d\d\d\d\d(\d\d\d\d\d\d)?Z) - "
            r"(?P<title>.*)"
-            r"\.(?P<extension>pdf|jpe?g|png|gif|tiff)$",
+            r"\.(?P<extension>pdf|jpe?g|png|gif|tiff?)$",
            flags=re.IGNORECASE
        )),
        ("correspondent-title-tags", re.compile(
            r"(?P<correspondent>.*) - "
            r"(?P<title>.*) - "
            r"(?P<tags>[a-z0-9\-,]*)"
-            r"\.(?P<extension>pdf|jpe?g|png|gif|tiff)$",
+            r"\.(?P<extension>pdf|jpe?g|png|gif|tiff?)$",
            flags=re.IGNORECASE
        )),
        ("correspondent-title", re.compile(
            r"(?P<correspondent>.*) - "
            r"(?P<title>.*)?"
-            r"\.(?P<extension>pdf|jpe?g|png|gif|tiff)$",
+            r"\.(?P<extension>pdf|jpe?g|png|gif|tiff?)$",
            flags=re.IGNORECASE
        )),
        ("title", re.compile(
            r"(?P<title>.*)"
-            r"\.(?P<extension>pdf|jpe?g|png|gif|tiff)$",
+            r"\.(?P<extension>pdf|jpe?g|png|gif|tiff?)$",
            flags=re.IGNORECASE
        ))
    ])
@@ -382,6 +419,8 @@ class FileInfo(object):
        r = extension.lower()
        if r == "jpeg":
            return "jpg"
+        if r == "tif":
+            return "tiff"
        return r

    @classmethod
--- a/src/documents/settings.py
+++ b/src/documents/settings.py
@@ -0,0 +1,4 @@
+# Manifest keys that hold each document's exported file and thumbnail names;
+# shared by the document_exporter and document_importer commands
+EXPORTER_FILE_NAME = "__exported_file_name__"
+EXPORTER_THUMBNAIL_NAME = "__exported_thumbnail_name__"
--- a/src/documents/templates/admin/change_list_results.html
+++ b/src/documents/templates/admin/change_list_results.html
@@ -1,6 +0,0 @@
-{% load hacks %}
-
-{# See documents.templatetags.hacks.change_list_results for an explanation #}
-
-{% change_list_results %}
-
--- a/src/documents/templates/admin/documents/document/change_form.html
+++ b/src/documents/templates/admin/documents/document/change_form.html
@@ -0,0 +1,13 @@
+{% extends 'admin/change_form.html' %}
+
+
+{% block footer %}
+
+	{{ block.super }}
+
+	{# Hack to force Django to make the created date a date input rather than `text` (the default) #}
+	<script>
+		django.jQuery(".field-created input").first().attr("type", "date")
+	</script>
+
+{% endblock footer %}
--- a/src/documents/templates/admin/documents/document/change_list.html
+++ b/src/documents/templates/admin/documents/document/change_list.html
@@ -0,0 +1,12 @@
+{% extends 'admin/change_list.html' %}
+
+
+{% load admin_actions from admin_list %}
+{% load result_list from hacks %}
+
+
+{% block result_list %}
+	{% if action_form and actions_on_top and cl.show_admin_actions %}{% admin_actions %}{% endif %}
+	{% result_list cl %}
+	{% if action_form and actions_on_bottom and cl.show_admin_actions %}{% admin_actions %}{% endif %}
+{% endblock %}
--- a/src/documents/templates/admin/documents/document/change_list_results.html
+++ b/src/documents/templates/admin/documents/document/change_list_results.html
@@ -29,18 +29,13 @@
  .result .header {
    padding: 5px;
    background-color: #79AEC8;
-    height: 6em;
-  }
-  .result .header .checkbox {
-    margin-right: 5px;
  }
  .result .header .checkbox{
    width: 5%;
    float: left;
  }
  .result .header .info {
-    width: 90%;
-    float: left;
+    margin-left: 10%;
  }
  .result .header a,
  .result a.tag {
--- a/src/documents/templates/documents/index.html
+++ b/src/documents/templates/documents/index.html
@@ -6,5 +6,6 @@
    <meta charset="utf-8">
  </head>
  <body>
+		{# One day someone (maybe even myself) is going to write a proper web front-end for Paperless, and this is where it'll start. #}
  </body>
 </html>
--- a/src/documents/templatetags/hacks.py
+++ b/src/documents/templatetags/hacks.py
@@ -1,41 +1,28 @@
-import os
-
-from django.contrib import admin
+from django.contrib.admin.templatetags.admin_list import (
+    result_headers,
+    result_hidden_fields,
+    results
+)
 from django.template import Library
-from django.template.loader import get_template
-
-from ..models import Document


 register = Library()


-@register.simple_tag(takes_context=True)
-def change_list_results(context):
+@register.inclusion_tag("admin/documents/document/change_list_results.html")
+def result_list(cl):
    """
-    Django has a lot of places where you can override defaults, but
-    unfortunately, `change_list_results.html` is not one of them.  In fact,
-    it's a downright pain in the ass to override this file on a per-model basis
-    and this is the cleanest way I could come up with.
-
-    Basically all we've done here is defined `change_list_results.html` in an
-    `admin` directory which globally overrides that file for *every* model.
-    That template however simply loads this templatetag which determines
-    whether we're currently looking at a `Document` listing or something else
-    and loads the appropriate file in each case.
-
-    Better work arounds for this are welcome as I hate this myself, but at the
-    moment, it's all I could come up with.
+    Copy/pasted from django.contrib.admin.templatetags.admin_list just so I can
+    modify the value passed to `.inclusion_tag()` in the decorator here.  There
+    must be a cleaner way... right?
    """
-
-    path = os.path.join(
-        os.path.dirname(admin.__file__),
-        "templates",
-        "admin",
-        "change_list_results.html"
-    )
-
-    if context["cl"].model == Document:
-        path = "admin/documents/document/change_list_results.html"
-
-    return get_template(path).render(context)
+    headers = list(result_headers(cl))
+    num_sorted_fields = 0
+    for h in headers:
+        if h['sortable'] and h['sorted']:
+            num_sorted_fields += 1
+    return {'cl': cl,
+            'result_hidden_fields': list(result_hidden_fields(cl)),
+            'result_headers': headers,
+            'num_sorted_fields': num_sorted_fields,
+            'results': list(results(cl))}
--- a/src/documents/tests/factories.py
+++ b/src/documents/tests/factories.py
@@ -0,0 +1,17 @@
+import factory
+
+from ..models import Document, Correspondent
+
+
+class CorrespondentFactory(factory.DjangoModelFactory):
+
+    class Meta:
+        model = Correspondent
+
+    name = factory.Faker("name")
+
+
+class DocumentFactory(factory.DjangoModelFactory):
+
+    class Meta:
+        model = Document
--- a/src/documents/tests/test_consumer.py
+++ b/src/documents/tests/test_consumer.py
@@ -1,15 +1,66 @@
 from django.test import TestCase
+from unittest import mock

+from ..consumer import Consumer
 from ..models import FileInfo


+class TestConsumer(TestCase):
+
+    class DummyParser(object):
+        pass
+
+    def test__get_parser_class_1_parser(self):
+        self.assertEqual(
+            self._get_consumer()._get_parser_class("doc.pdf"),
+            self.DummyParser
+        )
+
+    @mock.patch("documents.consumer.Consumer.CONSUME")
+    @mock.patch("documents.consumer.os.makedirs")
+    @mock.patch("documents.consumer.os.path.exists", return_value=True)
+    @mock.patch("documents.consumer.document_consumer_declaration.send")
+    def test__get_parser_class_n_parsers(self, m, *args):
+
+        class DummyParser1(object):
+            pass
+
+        class DummyParser2(object):
+            pass
+
+        m.return_value = (
+            (None, lambda _: {"weight": 0, "parser": DummyParser1}),
+            (None, lambda _: {"weight": 1, "parser": DummyParser2}),
+        )
+
+        self.assertEqual(Consumer()._get_parser_class("doc.pdf"), DummyParser2)
+
+    @mock.patch("documents.consumer.Consumer.CONSUME")
+    @mock.patch("documents.consumer.os.makedirs")
+    @mock.patch("documents.consumer.os.path.exists", return_value=True)
+    @mock.patch("documents.consumer.document_consumer_declaration.send")
+    def test__get_parser_class_0_parsers(self, m, *args):
+        m.return_value = ((None, lambda _: None),)
+        self.assertIsNone(Consumer()._get_parser_class("doc.pdf"))
+
+    @mock.patch("documents.consumer.Consumer.CONSUME")
+    @mock.patch("documents.consumer.os.makedirs")
+    @mock.patch("documents.consumer.os.path.exists", return_value=True)
+    @mock.patch("documents.consumer.document_consumer_declaration.send")
+    def _get_consumer(self, m, *args):
+        m.return_value = (
+            (None, lambda _: {"weight": 0, "parser": self.DummyParser}),
+        )
+        return Consumer()
+
+
 class TestAttributes(TestCase):

    TAGS = ("tag1", "tag2", "tag3")
    EXTENSIONS = (
-        "pdf", "png", "jpg", "jpeg", "gif",
-        "PDF", "PNG", "JPG", "JPEG", "GIF",
-        "PdF", "PnG", "JpG", "JPeG", "GiF",
+        "pdf", "png", "jpg", "jpeg", "gif", "tiff", "tif",
+        "PDF", "PNG", "JPG", "JPEG", "GIF", "TIFF", "TIF",
+        "PdF", "PnG", "JpG", "JPeG", "GiF", "TiFf", "TiF",
    )

    def _test_guess_attributes_from_name(self, path, sender, title, tags):
@@ -29,6 +80,8 @@ class TestAttributes(TestCase):
            self.assertEqual(tuple([t.slug for t in file_info.tags]), tags, f)
            if extension.lower() == "jpeg":
                self.assertEqual(file_info.extension, "jpg", f)
+            elif extension.lower() == "tif":
+                self.assertEqual(file_info.extension, "tiff", f)
            else:
                self.assertEqual(file_info.extension, extension.lower(), f)

--- a/src/documents/tests/test_importer.py
+++ b/src/documents/tests/test_importer.py
@@ -3,6 +3,8 @@ from django.test import TestCase

 from ..management.commands.document_importer import Command

+from documents.settings import EXPORTER_FILE_NAME
+

 class TestImporter(TestCase):

@@ -27,7 +29,7 @@ class TestImporter(TestCase):

        cmd.manifest = [{
            "model": "documents.document",
-            "__exported_file_name__": "noexist.pdf"
+            EXPORTER_FILE_NAME: "noexist.pdf"
        }]
        # self.assertRaises(CommandError, cmd._check_manifest)
        with self.assertRaises(CommandError) as cm:
--- a/src/documents/tests/test_matchables.py
+++ b/src/documents/tests/test_matchables.py
@@ -1,6 +1,6 @@
 from random import randint

-from django.test import TestCase
+from django.test import TestCase, override_settings

 from ..models import Correspondent, Document, Tag
 from ..signals import document_consumption_finished
@@ -16,9 +16,15 @@ class TestMatching(TestCase):
                matching_algorithm=getattr(klass, algorithm)
            )
            for string in true:
-                self.assertTrue(instance.matches(string))
+                self.assertTrue(
+                    instance.matches(string),
+                    '"%s" should match "%s" but it does not' % (text, string)
+                )
            for string in false:
-                self.assertFalse(instance.matches(string))
+                self.assertFalse(
+                    instance.matches(string),
+                    '"%s" should not match "%s" but it does' % (text, string)
+                )

    def test_match_all(self):

@@ -54,6 +60,21 @@ class TestMatching(TestCase):
            )
        )

+        self._test_matching(
+            'brown fox "lazy dogs"',
+            "MATCH_ALL",
+            (
+                "the quick brown fox jumped over the lazy dogs",
+                "the quick brown fox jumped over the lazy  dogs",
+            ),
+            (
+                "the quick fox jumped over the lazy dogs",
+                "the quick brown wolf jumped over the lazy dogs",
+                "the quick brown fox jumped over the fat dogs",
+                "the quick brown fox jumped over the lazy... dogs",
+            )
+        )
+
    def test_match_any(self):

        self._test_matching(
@@ -89,6 +110,18 @@ class TestMatching(TestCase):
            )
        )

+        self._test_matching(
+            '"brown fox" " lazy  dogs "',
+            "MATCH_ANY",
+            (
+                "the quick brown fox",
+                "jumped over the lazy  dogs.",
+            ),
+            (
+                "the lazy fox jumped over the brown dogs",
+            )
+        )
+
    def test_match_literal(self):

        self._test_matching(
@@ -149,8 +182,25 @@ class TestMatching(TestCase):
            )
        )

+    def test_match_fuzzy(self):

-class TestApplications(TestCase):
+        self._test_matching(
+            "Springfield, Miss.",
+            "MATCH_FUZZY",
+            (
+                "1220 Main Street, Springf eld, Miss.",
+                "1220 Main Street, Spring field, Miss.",
+                "1220 Main Street, Springfeld, Miss.",
+                "1220 Main Street Springfield Miss",
+            ),
+            (
+                "1220 Main Street, Springfield, Mich.",
+            )
+        )
+
+
+@override_settings(POST_CONSUME_SCRIPT=None)
+class TestDocumentConsumptionFinishedSignal(TestCase):
    """
    We make use of document_consumption_finished, so we should test that it's
    doing what we expect wrt to tag & correspondent matching.
--- a/src/documents/tests/test_models.py
+++ b/src/documents/tests/test_models.py
@@ -0,0 +1,31 @@
+from django.test import TestCase
+
+from ..models import Document, Correspondent
+from .factories import DocumentFactory, CorrespondentFactory
+
+
+class CorrespondentTestCase(TestCase):
+
+    def test___str__(self):
+        for s in ("test", "οχι", "test with fun_charÅc'\"terß"):
+            correspondent = CorrespondentFactory.create(name=s)
+            self.assertEqual(str(correspondent), s)
+
+
+class DocumentTestCase(TestCase):
+
+    def test_correspondent_deletion_does_not_cascade(self):
+
+        self.assertEqual(Correspondent.objects.all().count(), 0)
+        correspondent = CorrespondentFactory.create()
+        self.assertEqual(Correspondent.objects.all().count(), 1)
+
+        self.assertEqual(Document.objects.all().count(), 0)
+        DocumentFactory.create(correspondent=correspondent)
+        self.assertEqual(Document.objects.all().count(), 1)
+        self.assertIsNotNone(Document.objects.all().first().correspondent)
+
+        correspondent.delete()
+        self.assertEqual(Correspondent.objects.all().count(), 0)
+        self.assertEqual(Document.objects.all().count(), 1)
+        self.assertIsNone(Document.objects.all().first().correspondent)
--- a/src/documents/views.py
+++ b/src/documents/views.py
@@ -1,5 +1,4 @@
-from django.http import HttpResponse
-from django.views.decorators.csrf import csrf_exempt
+from django.http import HttpResponse, HttpResponseBadRequest
 from django.views.generic import DetailView, FormView, TemplateView
 from django_filters.rest_framework import DjangoFilterBackend
 from paperless.db import GnuPG
@@ -81,15 +80,12 @@ class PushView(SessionOrBasicAuthMixin, FormView):

    form_class = UploadForm

-    @classmethod
-    def as_view(cls, **kwargs):
-        return csrf_exempt(FormView.as_view(**kwargs))
-
    def form_valid(self, form):
-        return HttpResponse("1")
+        form.save()
+        return HttpResponse("1", status=202)

    def form_invalid(self, form):
-        return HttpResponse("0")
+        return HttpResponseBadRequest(str(form.errors))


 class CorrespondentViewSet(ModelViewSet):
--- a/src/paperless/checks.py
+++ b/src/paperless/checks.py
@@ -84,3 +84,20 @@ def binaries_check(app_configs, **kwargs):
            check_messages.append(Warning(error.format(binary), hint))

    return check_messages
+
+
+@register()
+def config_check(app_configs, **kwargs):
+    warning = (
+        "It looks like you have PAPERLESS_SHARED_SECRET defined.  Note that "
+        "in the \npast, this variable was used for both API authentication "
+        "and as the mail \nkeyword.  As the API no longer uses it, this "
+        "variable has been renamed to \nPAPERLESS_EMAIL_SECRET, so if you're "
+        "using the mail feature, you'd best update \nyour variable name.\n\n"
+        "The old variable will stop working in a few months."
+    )
+
+    if os.getenv("PAPERLESS_SHARED_SECRET"):
+        return [Warning(warning)]
+
+    return []
--- a/src/paperless/settings.py
+++ b/src/paperless/settings.py
@@ -4,10 +4,10 @@ Django settings for paperless project.
 Generated by 'django-admin startproject' using Django 1.9.

 For more information on this file, see
-https://docs.djangoproject.com/en/1.9/topics/settings/
+https://docs.djangoproject.com/en/1.10/topics/settings/

 For the full list of settings and their values, see
-https://docs.djangoproject.com/en/1.9/ref/settings/
+https://docs.djangoproject.com/en/1.10/ref/settings/
 """

 import os
@@ -25,7 +25,7 @@ BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))


 # Quick-start development settings - unsuitable for production
-# See https://docs.djangoproject.com/en/1.9/howto/deployment/checklist/
+# See https://docs.djangoproject.com/en/1.10/howto/deployment/checklist/

 # The secret key has a default that should be fine so long as you're hosting
 # Paperless on a closed network.  However, if you're putting this anywhere
@@ -47,7 +47,8 @@ _allowed_hosts = os.getenv("PAPERLESS_ALLOWED_HOSTS")
 if _allowed_hosts:
    ALLOWED_HOSTS = _allowed_hosts.split(",")

-
+FORCE_SCRIPT_NAME = os.getenv("PAPERLESS_FORCE_SCRIPT_NAME")
+
 # Application definition

 INSTALLED_APPS = [
@@ -69,6 +70,7 @@ INSTALLED_APPS = [

    "rest_framework",
    "crispy_forms",
+    "django_filters",

 ]

@@ -108,7 +110,7 @@ WSGI_APPLICATION = 'paperless.wsgi.application'


 # Database
-# https://docs.djangoproject.com/en/1.9/ref/settings/#databases
+# https://docs.djangoproject.com/en/1.10/ref/settings/#databases

 DATABASES = {
    "default": {
@@ -133,7 +135,7 @@ if os.getenv("PAPERLESS_DBUSER") and os.getenv("PAPERLESS_DBPASS"):


 # Password validation
-# https://docs.djangoproject.com/en/1.9/ref/settings/#auth-password-validators
+# https://docs.djangoproject.com/en/1.10/ref/settings/#auth-password-validators

 AUTH_PASSWORD_VALIDATORS = [
    {
@@ -152,7 +154,7 @@ AUTH_PASSWORD_VALIDATORS = [


 # Internationalization
-# https://docs.djangoproject.com/en/1.9/topics/i18n/
+# https://docs.djangoproject.com/en/1.10/topics/i18n/

 LANGUAGE_CODE = 'en-us'

@@ -166,7 +168,7 @@ USE_TZ = True


 # Static files (CSS, JavaScript, Images)
-# https://docs.djangoproject.com/en/1.9/howto/static-files/
+# https://docs.djangoproject.com/en/1.10/howto/static-files/

 STATIC_ROOT = os.getenv(
    "PAPERLESS_STATICDIR", os.path.join(BASE_DIR, "..", "static"))
@@ -236,18 +238,6 @@ CONSUMPTION_DIR = os.getenv("PAPERLESS_CONSUMPTION_DIR")
 # slowly, you may want to use a higher value than the default.
 CONSUMER_LOOP_TIME = int(os.getenv("PAPERLESS_CONSUMER_LOOP_TIME", 10))

-# If you want to use IMAP mail consumption, populate this with useful values.
-# If you leave HOST set to None, we assume you're not going to use this
-# feature.
-MAIL_CONSUMPTION = {
-    "HOST": os.getenv("PAPERLESS_CONSUME_MAIL_HOST"),
-    "PORT": os.getenv("PAPERLESS_CONSUME_MAIL_PORT"),
-    "USERNAME": os.getenv("PAPERLESS_CONSUME_MAIL_USER"),
-    "PASSWORD": os.getenv("PAPERLESS_CONSUME_MAIL_PASS"),
-    "USE_SSL": os.getenv("PAPERLESS_CONSUME_MAIL_USE_SSL", "y").lower() == "y",  # If True, use SSL/TLS to connect
-    "INBOX": "INBOX"  # The name of the inbox on the server
-}
-
 # This is used to encrypt the original documents and decrypt them later when
 # you want to download them.  Set it and change the permissions on this file to
 # 0600, or set it to `None` and you'll be prompted for the passphrase at
@@ -257,11 +247,6 @@ MAIL_CONSUMPTION = {
 # files.
 PASSPHRASE = os.getenv("PAPERLESS_PASSPHRASE")

-# If you intend to use the "API" to push files into the consumer, you'll need
-# to provide a shared secret here.  Leaving this as the default will disable
-# the API.
-SHARED_SECRET = os.getenv("PAPERLESS_SHARED_SECRET", "")
-
 # Trigger a script after every successful document consumption?
 PRE_CONSUME_SCRIPT = os.getenv("PAPERLESS_PRE_CONSUME_SCRIPT")
 POST_CONSUME_SCRIPT = os.getenv("PAPERLESS_POST_CONSUME_SCRIPT")
--- a/src/paperless/urls.py
+++ b/src/paperless/urls.py
@@ -1,28 +1,17 @@
-"""paperless URL Configuration
-
-The `urlpatterns` list routes URLs to views. For more information please see:
-    https://docs.djangoproject.com/en/1.9/topics/http/urls/
-Examples:
-Function views
-    1. Add an import:  from my_app import views
-    2. Add a URL to urlpatterns:  url(r'^$', views.home, name='home')
-Class-based views
-    1. Add an import:  from other_app.views import Home
-    2. Add a URL to urlpatterns:  url(r'^$', Home.as_view(), name='home')
-Including another URLconf
-    1. Add an import:  from blog import urls as blog_urls
-    2. Import the include() function: from django.conf.urls import url, include
-    3. Add a URL to urlpatterns:  url(r'^blog/', include(blog_urls))
-"""
 from django.conf import settings
-from django.conf.urls import url, static, include
+from django.conf.urls import include, static, url
 from django.contrib import admin
-
+from django.views.decorators.csrf import csrf_exempt
+from django.views.generic import RedirectView
 from rest_framework.routers import DefaultRouter

 from documents.views import (
-    IndexView, FetchView, PushView,
-    CorrespondentViewSet, TagViewSet, DocumentViewSet, LogViewSet
+    CorrespondentViewSet,
+    DocumentViewSet,
+    FetchView,
+    LogViewSet,
+    PushView,
+    TagViewSet
 )
 from reminders.views import ReminderViewSet

@@ -42,9 +31,6 @@ urlpatterns = [
    ),
    url(r"^api/", include(router.urls, namespace="drf")),

-    # Normal pages (coming soon)
-    # url(r"^$", IndexView.as_view(), name="index"),
-
    # File downloads
    url(
        r"^fetch/(?P<kind>doc|thumb)/(?P<pk>\d+)$",
@@ -52,11 +38,20 @@ urlpatterns = [
        name="fetch"
    ),

+    # File uploads
+    url(r"^push$", csrf_exempt(PushView.as_view()), name="push"),
+
    # The Django admin
    url(r"admin/", admin.site.urls),
-    url(r"", admin.site.urls),  # This is going away
+
+    # Catch all redirect back to /admin
+    url(r"", RedirectView.as_view(permanent=True, url="/admin/")),

 ] + static.static(settings.MEDIA_URL, document_root=settings.MEDIA_ROOT)

-if settings.SHARED_SECRET:
-    urlpatterns.insert(0, url(r"^push$", PushView.as_view(), name="push"))
+# Text in each page's <h1> (and above login form).
+admin.site.site_header = 'Paperless'
+# Text at the end of each page's <title>.
+admin.site.site_title = 'Paperless'
+# Text at the top of the admin index page.
+admin.site.index_title = 'Paperless administration'
--- a/src/paperless/version.py
+++ b/src/paperless/version.py
@@ -1 +1 @@
-__version__ = (0, 3, 6)
+__version__ = (1, 0, 0)
--- a/src/paperless/wsgi.py
+++ b/src/paperless/wsgi.py
@@ -4,7 +4,7 @@ WSGI config for paperless project.
 It exposes the WSGI callable as a module-level variable named ``application``.

 For more information on this file, see
-https://docs.djangoproject.com/en/1.9/howto/deployment/wsgi/
+https://docs.djangoproject.com/en/1.10/howto/deployment/wsgi/
 """

 import os
--- a/src/paperless_tesseract/signals.py
+++ b/src/paperless_tesseract/signals.py
@@ -5,7 +5,7 @@ from .parsers import RasterisedDocumentParser

 class ConsumerDeclaration(object):

-    MATCHING_FILES = re.compile("^.*\.(pdf|jpg|gif|png|tiff|pnm|bmp)$")
+    MATCHING_FILES = re.compile(r"^.*\.(pdf|jpe?g|gif|png|tiff?|pnm|bmp)$")

    @classmethod
    def handle(cls, sender, **kwargs):
@@ -14,7 +14,7 @@ class ConsumerDeclaration(object):
    @classmethod
    def test(cls, doc):

-        if cls.MATCHING_FILES.match(doc):
+        if cls.MATCHING_FILES.match(doc.lower()):
            return {
                "parser": RasterisedDocumentParser,
                "weight": 0
--- a/src/paperless_tesseract/tests/test_signals.py
+++ b/src/paperless_tesseract/tests/test_signals.py
@@ -0,0 +1,36 @@
+from django.test import TestCase
+
+from ..signals import ConsumerDeclaration
+
+
+class SignalsTestCase(TestCase):
+
+    def test_test_handles_various_file_names_true(self):
+
+        prefixes = (
+            "doc", "My Document", "Μυ Γρεεκ Δοψθμεντ", "Doc -with - tags",
+            "A document with a . in it", "Doc with -- in it"
+        )
+        suffixes = (
+            "pdf", "jpg", "jpeg", "gif", "png", "tiff", "tif", "pnm", "bmp",
+            "PDF", "JPG", "JPEG", "GIF", "PNG", "TIFF", "TIF", "PNM", "BMP",
+            "pDf", "jPg", "jpEg", "gIf", "pNg", "tIff", "tIf", "pNm", "bMp",
+        )
+
+        for prefix in prefixes:
+            for suffix in suffixes:
+                name = "{}.{}".format(prefix, suffix)
+                self.assertTrue(ConsumerDeclaration.test(name))
+
+    def test_test_handles_various_file_names_false(self):
+
+        prefixes = ("doc",)
+        suffixes = ("txt", "markdown", "",)
+
+        for prefix in prefixes:
+            for suffix in suffixes:
+                name = "{}.{}".format(prefix, suffix)
+                self.assertFalse(ConsumerDeclaration.test(name))
+
+        self.assertFalse(ConsumerDeclaration.test(""))
+        self.assertFalse(ConsumerDeclaration.test("doc"))
--- a/src/pytest.ini
+++ b/src/pytest.ini
@@ -1,3 +1,8 @@
 [pytest]
 DJANGO_SETTINGS_MODULE=paperless.settings
-
+addopts = --pythonwarnings=all
+env =
+  PAPERLESS_CONSUME=/tmp
+  PAPERLESS_PASSPHRASE=THISISNOTASECRET
+  PAPERLESS_SECRET=paperless
+  PAPERLESS_EMAIL_SECRET=paperless
--- a/src/tox.ini
+++ b/src/tox.ini
@@ -5,19 +5,18 @@

 [tox]
 skipsdist = True
-envlist = py34, py35, py36, pep8
+envlist = py34, py35, py36, pycodestyle

 [testenv]
-commands = {envpython} manage.py test
+commands = pytest
 deps = -r{toxinidir}/../requirements.txt
-setenv =
-    PAPERLESS_CONSUME=/tmp
-    PAPERLESS_PASSPHRASE=THISISNOTASECRET
-    PAPERLESS_SECRET=paperless

-[testenv:pep8]
-commands=pep8
-deps=pep8
+[testenv:pycodestyle]
+commands=pycodestyle
+deps=pycodestyle

-[pep8]
-exclude=.tox,migrations,paperless/settings.py
+[pycodestyle]
+exclude=
+  .tox,
+  migrations,
+  paperless/settings.py
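
The quoted-term splitting that this change adds to `MatchingModel` can be tried outside Django with a minimal sketch like the one below. The names `split_match` and `matches_all` are hypothetical stand-ins for `_split_match()` and the `MATCH_ALL` branch of `matches()`; they are not part of the codebase. The sketch inserts `\s+` with `str.replace` rather than passing it as the `re.sub` replacement, because newer Python versions reject `\s` in a replacement template.

```python
import re


def split_match(match):
    """Split a match string into regex-ready keywords.

    Hypothetical standalone version of MatchingModel._split_match().
    Quoted phrases stay together; runs of whitespace inside them
    become \\s+ so any amount of whitespace in the text matches.
    """
    find_terms = re.compile(r'"([^"]+)"|(\S+)').findall
    collapse_spaces = re.compile(r"\s+").sub
    terms = []
    for quoted, bare in find_terms(match):
        term = collapse_spaces(" ", (quoted or bare).strip())
        # Turn each single space into the regex token \s+
        terms.append(term.replace(" ", r"\s+"))
    return terms


def matches_all(match, text):
    """Assumed equivalent of the MATCH_ALL branch: every term must appear."""
    return all(
        re.search(r"\b{}\b".format(term), text, re.IGNORECASE)
        for term in split_match(match)
    )
```

Under these assumptions, `matches_all('brown fox "lazy dogs"', text)` accepts text containing `lazy  dogs` (two spaces) but rejects `lazy... dogs`, which mirrors the cases in `test_match_all`.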
Author	SHA1	Message	Date
Daniel Quinn	3ca215e4dc	Bump to v1.0.0!	2018-01-06 19:25:33 +00:00
Daniel Quinn	16c4183333	Upgrade to Django 1.11.x	2018-01-06 19:24:10 +00:00
Daniel Quinn	6fe37678f2	Change date fields to actual date fields #278	2018-01-06 19:21:49 +00:00
Daniel Quinn	b58188f805	Switch from pep8 to pycodestyle	2018-01-06 18:56:37 +00:00
Daniel Quinn	f2a42ab6fe	Add catch-all redirect for /admin/	2018-01-06 18:51:16 +00:00
Daniel Quinn	e236b7bf7b	isort	2018-01-06 18:51:10 +00:00
Daniel Quinn	35004f434b	Add a smarter work-around for the change-list-results hack	2018-01-06 18:47:01 +00:00
Daniel Quinn	75251ad694	Add a note for future development	2018-01-06 18:30:33 +00:00
Daniel Quinn	870357968a	Fix tests to run on boxes with post-consume-scripts set	2018-01-06 17:23:24 +00:00
Daniel Quinn	a593798b4b	Add encoding declaration	2018-01-06 17:23:07 +00:00
Daniel Quinn	4f070ba162	Use double quotes by default	2018-01-06 17:22:57 +00:00
Daniel Quinn	9517d27f40	Add warnings to the test runner	2018-01-06 17:22:40 +00:00
Daniel Quinn	35bb3dbcc2	Clean up CSS for #272	2018-01-06 15:57:25 +00:00
Daniel Quinn	06117929bb	Merge pull request #277 from ishirav/multi-word-match Add multi-word match	2017-12-27 11:21:27 +01:00
ishirav	d1c8241947	break long lines (pep8)	2017-12-23 07:39:40 +02:00
ishirav	4c38b28469	break long lines (pep8)	2017-12-23 06:59:48 +02:00
ishirav	ad0f0a0b5d	Add documentation about multi-word search terms	2017-12-23 06:44:06 +02:00
ishirav	83746a9aeb	Add tests and improve whitespace handling	2017-12-23 06:37:00 +02:00
ishirav	6a36a4ec97	Support search terms that contain multiple words in ANY/ALL matching modes, by surrounding the terms with double quotes.	2017-12-23 06:05:48 +02:00
Daniel Quinn	af4623e605	Merge pull request #270 from dev-rke/patch-2 #248: fix missing CSS I'm not thrilled about this and would much rather have Nginx running to do the job, but I just don't have the time to do that right now. As Pit says, this is better than leaving DEBUG on.	2017-11-05 20:45:28 +00:00
Daniel Quinn	db8e116681	Merge pull request #269 from dev-rke/patch-1 Change default /consume volume	2017-11-05 20:42:08 +00:00
dev-rke	a8616ebfe2	#248 : fix missing CSS Force the server to use --insecure flag to also provide static contents like CSS files. See #248 and #167 for more details.	2017-11-04 16:02:29 +01:00
dev-rke	a38d3bf7f8	Change default /consume volume Change default /consume volume to ./consume on your host, so that no unexpected folder will be generated on the host machine.	2017-11-04 15:57:21 +01:00
Daniel Quinn	1cb5bbd07d	Merge pull request #268 from pitkley/267-dir-permissions Set `g+w` on the consumption/export directories	2017-11-01 11:30:16 +00:00
Daniel Quinn	6edb5b912f	Move all scanner recommendations to new doc page	2017-11-01 11:24:11 +00:00
Daniel Quinn	ec20c7577e	Add a new page for scanner recommendations	2017-11-01 11:15:37 +00:00
Daniel Quinn	d6df9b3656	Strip whitespace	2017-11-01 11:15:22 +00:00
Pit Kleyersburg	80a849fef7	Set `g+w` on the consumption/export directories This should fix issue #267.	2017-10-31 15:30:33 +01:00
Daniel Quinn	bd67b53d50	Update test for #259 fix	2017-10-16 10:53:18 +01:00
Daniel Quinn	e32ed09da3	Support .jpeg as well as .jpg	2017-10-16 09:00:38 +01:00
Daniel Quinn	c5632e5c04	Update changelog for 0.8.0	2017-09-10 12:51:00 +01:00
Daniel Quinn	4d2b71454d	Ignore .virtualenv	2017-09-09 12:22:03 +03:00
Daniel Quinn	5cbb33b02b	Add documentation for the new FORCE_SCRIPT_NAME feature	2017-09-09 12:21:31 +03:00
Daniel Quinn	2c55aad6c0	Merge pull request #255 from maphy-psd/master add FORCE_SCRIPT_NAME to host paperless on a subpath url	2017-09-06 15:56:44 +01:00
Daniel Quinn	1e039dcb32	Bump gunicorn	2017-08-30 00:44:13 +03:00
Daniel Quinn	6ca8da4858	Patch requirements to keep up with Django versions	2017-08-30 00:27:54 +03:00
maphy-psd	82f05e27c3	fix travis ci E510 E501 line too long (85 > 79 characters)	2017-08-20 16:18:39 +02:00
maphy-psd	7a627e4ad8	white spacing and remove var's prefix	2017-08-20 14:29:51 +02:00
maphy-psd	73af9552ec	getenv has "None" as default @MasterofJOKers in PR#255	2017-08-20 14:13:23 +02:00
maphy-psd	e4854f2144	def thumbnail uses FORCE_SCRIPT_NAME; with this edit the thumbnails show up	2017-08-19 18:37:17 +02:00
maphy-psd	6f5c1ac4e1	add FORCE_SCRIPT_NAME setting	2017-08-19 12:39:25 +02:00
maphy-psd	22acc51284	add PAPERLESS_FORCE_SCRIPT_NAME	2017-08-19 12:38:45 +02:00
Daniel Quinn	a05644fc31	Merge pull request #250 from brightdroid/master create documents subfolder if it does not exist	2017-08-12 14:39:22 +01:00
Christoph Roeder	d1aa54caa9	create documents subfolder if it does not exist	2017-07-31 21:35:41 +02:00
Daniel Quinn	e293f70a91	Merge pull request #247 from danielquinn/issue/235 Allow correspondents to be deleted without deleting their documents	2017-07-15 19:41:33 +01:00
Daniel Quinn	347986a2b3	Allow correspondents to be deleted without deleting their documents Fixes #235	2017-07-15 19:13:10 +01:00
Daniel Quinn	ede274386b	Detect .tif files properly Fixes #232	2017-07-15 19:02:11 +01:00
Daniel Quinn	3e083354cc	Merge pull request #246 from kskyten/vb_memory Add memory to the virtual machine	2017-07-10 15:02:45 +01:00
Kusti Skytén	b2b4f6516a	Add memory to the virtual machine Fixes #244	2017-07-10 16:55:51 +03:00
Daniel Quinn	2ae702c7bb	Merge pull request #245 from tooomm/patch-1 README: unify badges (versioneye)	2017-07-09 18:25:28 +01:00
tooomm	b748420a94	unify badges (versioneye) normal > flat style	2017-07-09 15:17:42 +02:00
Daniel Quinn	8a4546ce0d	Merge pull request #242 from MasterofJOKers/setup_collectstatic Mention "collectstatic" in the docs	2017-06-27 13:07:07 +01:00
MasterofJOKers	167412a003	Mention "collectstatic" in the docs When using the built-in webserver in debug mode, the static files are handled automatically. From the Django docs: During development, if you use django.contrib.staticfiles, this will be done automatically by runserver when DEBUG is set to True (see django.contrib.staticfiles.views.serve()). This method is grossly inefficient and probably insecure, so it is unsuitable for production. This means, when using a real webserver, it also has to serve the static files, i.e. CSS and JavaScript. For that, one needs to run `./manage.py collectstatic` first.	2017-06-26 17:08:37 +02:00
Daniel Quinn	e8d90b42a1	Merge pull request #240 from ddddavidmartin/timezone_documentation_clarification Add link to Django documentation for time zone setting in example config.	2017-06-24 09:24:42 +01:00
David Martin	d8c7e9de5f	Add link to documentation for time zone setting in example config. It is not obvious which time zones the option in the config file accepts. Having a link to the official django documentation makes it clear.	2017-06-24 12:27:26 +10:00
Daniel Quinn	2ac1b78a2c	Move testing ENV vars into pytest.ini	2017-06-19 10:57:30 +01:00
Daniel Quinn	e8e38befb7	Fix test for new email secret	2017-06-19 10:24:23 +01:00
Daniel Quinn	b30629dd60	Remove debugging info	2017-06-19 09:22:26 +01:00
Daniel Quinn	f66d7e1c2d	Drop SHARED_SECRET in favour of EMAIL_SECRET Originally we used SHARED secret both for email and for the API. That was a bad idea, and now that we're only using this value for one case, I've renamed it to reflect its actual use.	2017-06-18 22:08:42 +01:00
Daniel Quinn	8417ac7eeb	Merge pull request #237 from danielquinn/fix-http-post Fix http post	2017-06-13 17:52:48 +01:00
Daniel Quinn	6342225b22	Merge pull request #238 from Strubbl/fix-shellcheck-issues docker-entrypoint.sh: fix shellcheck issues	2017-06-13 17:51:49 +01:00
Sven Fischer	4460fb7004	docker-entrypoint.sh: fix shellcheck issues issues found by shellcheck were: ``` $ shellcheck docker-entrypoint.sh In docker-entrypoint.sh line 10: if [[ ${USERMAP_UID} != ${USERMAP_ORIG_UID} || ${USERMAP_GID} != ${USERMAP_ORIG_GID} ]]; then ^-- SC2053: Quote the rhs of != in [[ ]] to prevent glob matching. ^-- SC2053: Quote the rhs of != in [[ ]] to prevent glob matching. In docker-entrypoint.sh line 12: groupmod -g ${USERMAP_GID} paperless ^-- SC2086: Double quote to prevent globbing and word splitting. In docker-entrypoint.sh line 65: if dpkg -s "$pkg" 2>&1 > /dev/null; then ^-- SC2069: The order of the 2>&1 and the redirect matters. The 2>&1 has to be last. In docker-entrypoint.sh line 69: if ! apt-cache show "$pkg" 2>&1 > /dev/null; then ^-- SC2069: The order of the 2>&1 and the redirect matters. The 2>&1 has to be last. ```	2017-06-12 21:09:59 +02:00
Daniel Quinn	6f635c74fc	Fix HTTP POST of documents After tinkering with this for about 2 hours, I'm reasonably sure this never worked. This feature was added by me in haste and poked by the occasional contributor, and it suffered from neglect. * Removed the requirement for signature generation in favour of simply requiring BasicAuth or a valid session id. * Fixed a number of bugs in the form itself that would have ensured that the form never accepted anything. * Documented it all properly so now (hopefully) people will have less trouble figuring it out in the future.	2017-06-11 01:23:37 +01:00
Daniel Quinn	c82d45689c	Remove unused imports & comments	2017-06-11 01:23:08 +01:00
Daniel Quinn	02e0543a02	Merge pull request #233 from lucaskolstad/django_filters_installed_app Add django_filters to INSTALLED_APPS	2017-05-31 10:39:49 +01:00
Lucas Kolstad	fde0276d65	Add django_filters to INSTALLED_APPS	2017-05-30 15:05:34 -07:00
Daniel Quinn	3d6289e4e1	Preparing for 0.5.0 I hadn't realised that I hadn't released 0.5.0 yet, so I've amended the version numbers	2017-05-27 13:23:25 +01:00
Daniel Quinn	5e55b971a8	Update changelog for 0.5.1	2017-05-27 13:21:04 +01:00
Daniel Quinn	0a43b84a96	Merge pull request #228 from ddddavidmartin/extend_email_handling Set email inbox in config file, fetch email at consumer startup and bring documentation up to date	2017-05-27 13:07:17 +01:00
Daniel Quinn	dc74cc2db5	Merge pull request #230 from ddddavidmartin/webserver_paperless_titles Refer to Paperless in Django webserver titles and update Django documentation URLs	2017-05-27 13:00:46 +01:00
Daniel Quinn	fc00a09318	Merge pull request #229 from ddddavidmartin/clarify_systemd_instructions Copy Paperless service files to systemd directory before enabling them.	2017-05-27 12:59:00 +01:00
Daniel Quinn	19cf9d0b9a	Merge pull request #227 from ddddavidmartin/fix_forms_typos Fix clened_data typos in forms.py.	2017-05-27 12:57:43 +01:00
Daniel Quinn	f81780cf88	Merge pull request #226 from ddddavidmartin/bump_pyocr_requirement_for_tesseract_4_support Bump pyocr requirement to version 0.4.7 to support tesseract 4.0.0alpha.	2017-05-27 12:56:54 +01:00
David Martin	c3a55c91dc	Update version of remaining weblinks to Django documentation. We are using Django 1.10 as per requirements.txt and should refer to its documentation as well.	2017-05-27 08:49:03 +10:00
David Martin	482f02fbaa	Update link to Django documentation in urls.py. As per requirements.txt we are using Django version 1.10. It makes sense to link to the documentation for that version as well. Also, the documentation for the previous version has a notice on the top that informs about the version being unsafe which is a bit disconcerting when seeing it.	2017-05-25 20:22:05 +10:00
David Martin	6bf7429ef6	Refer to Paperless instead of Django in webserver pages. It looks better to have the page titles refer to Paperless rather than Django. The same with the login. Setting it in urls.py is based on this stackoverflow response [0]. The proper documentation for the admin page is under [1]. [0] https://stackoverflow.com/a/24983231 [1] https://docs.djangoproject.com/en/1.10/ref/contrib/admin/#adminsite-attributes	2017-05-25 20:16:59 +10:00
David Martin	4198de604f	Copy Paperless service files to systemd directory before enabling them. The problem with the original instruction is that systemd creates a symlink pointing to the service file in the paperless directory. A user is unlikely to leave the changes in the service files committed (especially not on a master branch checkout) and they are easily lost and the services fail to start without obvious reason. To avoid this we simply copy the service files to the systemd directory directly and use the files in the repository only as an example.	2017-05-24 22:48:35 +10:00
David Martin	8c06dc2dd1	Mention safe characters for email titles in documentation. This makes it clear that only a specific set of characters is allowed to be used for email titles. It is worth mentioning this in the documentation as it otherwise needs to be figured out from the Paperless sources [0]. [0] SAFE_REGEX in src/documents/models.py	2017-05-23 11:16:38 +10:00
David Martin	13b4610c1d	Clarify consumption documentation to match the current Paperless behaviour. The configuration does not have to be hardcoded in settings.py anymore, and instead happens in the config file. Also, we added that the emails are checked at startup [0]. [0] see commit `3153bbd6a8`	2017-05-23 11:15:33 +10:00
David Martin	0090128249	Fix clened_data typos in forms.py. This is where linters shine. Either pylint or pyflake discovered these typos and even suggested the correct name.	2017-05-21 17:05:49 +10:00
David Martin	3153bbd6a8	Fetch emails right at startup instead of waiting for 10 minutes. Especially when first setting up the configuration for consuming documents from emails it makes sense to quickly test the changes. Having to wait for 10 minutes is not acceptable. There are two ways around it that come to my mind: the simple approach is to always fetch the emails when Paperless first starts. This way the fetching of emails can be tested straight away. The alternative would be to have a configuration option that allows to set the interval in which emails are checked. The user could then reduce it to test the setup and increase it again later on. This seems needlessly complicated though, so fetching at startup it is.	2017-05-21 14:23:46 +10:00
David Martin	7b1812a9be	Capitalise Paperless in example config. This is in line with how it is spelled in the rest of the config file.	2017-05-21 08:44:41 +10:00
David Martin	c647daace2	Connect to configured inbox instead of hardcoded one. Now the retrieving of emails from the inbox set in the config file works as expected.	2017-05-21 08:34:49 +10:00
David Martin	70dceb3b37	Allow to configure the email inbox via config file. Same as all the other parameters it makes sense to set it in the config file as well.	2017-05-20 16:48:40 +10:00
David Martin	72b1ce5fe6	Bump pyocr requirement to version 0.4.7 to support tesseract 4.0.0alpha. The latest pyocr version now allows running it with the latest tesseract version. Hopefully this means better OCR results. I am not sure about whether there are binary packages for the latest tesseract. But on my setup it was simply a case of checking out the master branch [0] and compiling + installing from there. It seems to work fine with paperless as well. [0] https://github.com/tesseract-ocr/tesseract	2017-05-14 12:59:32 +10:00
Daniel Quinn	731942d855	add: migration for fuzzy matching	2017-05-11 22:09:30 -07:00
Daniel Quinn	058dad7ba7	Merge branch 'master' of github.com:danielquinn/paperless	2017-05-10 16:14:14 -07:00
Daniel Quinn	fe43e5a717	add: credit for ckut's import/export changes	2017-05-10 16:14:05 -07:00
Daniel Quinn	34bab04310	fix: formatting cleanup	2017-05-10 17:38:00 -07:00
Daniel Quinn	18f7c4f31f	Merge pull request #224 from CkuT/exporter_improvements WIP : Exporter improvements	2017-05-10 16:09:11 -07:00
Daniel Quinn	3477b96d87	Merge pull request #222 from tido-/master little changes to reflect as much as possible	2017-05-10 15:25:35 -07:00
Tido-	ac850b64aa	minor changes on documentation files	2017-05-10 22:25:59 +02:00
CkuT	279e421ad5	PEP8	2017-05-08 15:48:37 +02:00
CkuT	22c8049bed	Use relative paths instead of absolute paths for document export/import	2017-05-08 15:23:35 +02:00
CkuT	3f1392769d	Refactor to get the document time once	2017-05-08 15:02:59 +02:00
CkuT	da71eab0ae	Use constants for manifest	2017-05-08 14:54:48 +02:00
CkuT	2e0e6bb8d2	Add thumbnail export	2017-05-06 15:14:36 +02:00
CkuT	1f145c6cba	Fix the source file checking	2017-05-06 15:04:47 +02:00
Tido-	c4d48181ee	find the error in line break 03	2017-05-04 19:39:58 +02:00
Tido-	0c4ecad4a7	find the error in line break 02	2017-05-04 19:36:55 +02:00
Tido-	d25de5592a	find the error in line break 01	2017-05-04 19:35:58 +02:00
Tido-	88fc35d8ea	find the error in line break	2017-05-04 19:31:17 +02:00
Tido-	02730be871	found some additional bits to yours	2017-05-03 22:20:13 +02:00
Daniel Quinn	c7876dbbe8	add: credit for #212	2017-05-03 12:01:04 -07:00
Daniel Quinn	85fcb5fedf	Merge pull request #212 from Strubbl/docker-prepare-export Docker: prepare export directory	2017-05-03 09:55:43 -07:00
Tido-	58cbfeb72a	little changes to reflect as much as possible	2017-05-02 22:48:37 +02:00
Sven Fischer	b2b6cbe9c8	Docker: review refactoring for export directory preparation	2017-05-02 19:52:36 +02:00
Sven Fischer	4c05a511c2	Docker: review fix: if end-user host-mounts the export directory	2017-05-02 19:06:01 +02:00
Sven Fischer	b5bef13b46	Docker: prepare export directory	2017-05-02 13:01:09 +02:00
Daniel Quinn	bb47dc5e06	fix: spacing and typos	2017-05-01 13:25:07 -07:00
Daniel Quinn	511f154e16	Merge pull request #221 from tido-/master adding sections, grouped what belongs together	2017-05-01 13:10:23 -07:00
Tido-	10ae2207df	adding sections, grouped what belongs together	2017-05-01 21:18:34 +02:00
Daniel Quinn	71df99ffb6	add: note for new fuzzy match support	2017-04-30 19:40:58 -07:00
Daniel Quinn	5eb26102d4	Merge pull request #220 from jgysland/add-fuzzy-matching fuzzy matching	2017-04-30 19:37:03 -07:00
jgysland	a7fa82a83f	KISS fuzzy match help text	2017-04-30 16:56:50 -04:00
jgysland	6ce27d225d	add fuzzy matching + tests	2017-04-29 17:13:04 -04:00
Daniel Quinn	819a0e1f57	Merge pull request #219 from Strubbl/remove-duplicate-conf-option paperless.conf.example: remove duplicate option	2017-04-28 17:38:55 -07:00
Sven Fischer	702a60b7e7	paperless.conf.example: remove duplicate option This commit removes the duplicated option in this config. Please see `057d5f149f/paperless.conf.example (L113)` compared with `057d5f149f/paperless.conf.example (L122)`	2017-04-24 23:43:54 +02:00
Daniel Quinn	057d5f149f	Merge pull request #214 from philippeowagner/master Fixes #213 (MySQL syntax error)	2017-04-19 10:42:50 +01:00
Philippe O. Wagner	d047dafd23	Fixes #213 (MySQL syntax error)	2017-04-19 11:30:12 +02:00
Daniel Quinn	b449a7f6e2	feat: add @eonist's recommendation Fixes #211	2017-04-08 20:12:59 +01:00
Daniel Quinn	f302874ae8	Merge pull request #207 from danielquinn/fix/travis fix: travis doesn't like my new tests	2017-03-28 22:26:35 +01:00
Daniel Quinn	6af58203dd	fix: travis doesn't like my new tests	2017-03-28 21:23:42 +00:00
Daniel Quinn	fa4924d5ba	fix: allow for caps in file name suffixes #206 @schinkelg ran aground of this one and I took the opportunity to add a test to catch this sort of thing for next time.	2017-03-28 21:14:24 +00:00