Merge branch 'master' of github.com:danielquinn/paperless into ENH_filename_date_parsing

2025-12-18 01:41:14 -06:00 · 2018-11-15 23:17:59 -05:00
parent 6e88634fa8 8d6825dac0
commit b0326b5a19
42 changed files with 1405 additions and 770 deletions
--- a/docs/changelog.rst
+++ b/docs/changelog.rst
@@ -1,6 +1,83 @@
 Changelog
 #########

+2.6.0
+=====
+
+* Allow an infinite number of logs to be deleted.  Thanks to `Ulli`_ for noting
+  the problem in `#433`_.
+* Fix the ``RecentCorrespondentsFilter`` correspondents filter that was added
+  in 2.4 to play nice with the defaults.  Thanks to `tsia`_ and `Sblop`_ who
+  pointed this out. `#423`_.
+* Updated dependencies to include (among other things) a security patch to
+  requests.
+
+
+2.5.0
+=====
+
+* **New dependency**: Paperless now optimises thumbnail generation with
+  `optipng`_, so you'll need to install that somewhere in your PATH or declare
+  its location in ``PAPERLESS_OPTIPNG_BINARY``.  The Docker image has already
+  been updated on the Docker Hub, so you just need to pull the latest one from
+  there if you're a Docker user.
+
+* "Login free" instances of Paperless were breaking whenever you tried to edit
+  objects in the admin: adding/deleting tags or correspondents, or even fixing
+  spelling.  This was due to the "user hack" we were applying to sessions that
+  weren't using a login, as that hack user didn't have a valid id.  The fix was
+  to attribute the first user id in the system to this hack user.  `#394`_
+
+* A problem in how we handle slug values on Tags and Correspondents required a
+  few changes to how we handle this field `#393`_:
+
+  1. Slugs are no longer editable.  They're derived from the name of the tag or
+     correspondent at save time, so if you wanna change the slug, you have to
+     change the name, and even then you're restricted to the rules of the
+     ``slugify()`` function.  The slug value is still visible in the admin
+     though.
+  2. I've added a migration to go over all existing tags & correspondents and
+     rewrite the ``.slug`` values to ones conforming to the ``slugify()``
+     rules.
+  3. The consumption process now uses the same rules as ``.save()`` in
+     determining a slug and using that to check for an existing
+     tag/correspondent.
+
+* An annoying bug in the date capture code was causing some bogus dates to be
+  attached to documents, which in turn busted the UI.  Thanks to `Andrew Peng`_
+  for reporting this. `#414`_.
+
+* A bug in the Dockerfile meant that Tesseract language files weren't being
+  installed correctly.  `euri10`_ was quick to provide a fix: `#406`_, `#413`_.
+
+* Document consumption is now wrapped in a transaction as per an old ticket
+  `#262`_.
+
+* The ``get_date()`` functionality of the parsers has been consolidated onto
+  the ``DocumentParser`` class since much of that code was redundant anyway.
+
+
+2.4.0
+=====
+
+* A new set of actions is now available thanks to `jonaswinkler`_'s very first
+  pull request!  You can now do nifty things like tag documents in bulk, or set
+  correspondents in bulk.  `#405`_
+* The import/export system is now a little smarter.  By default, documents are
+  tagged as ``unencrypted``, since exports are by their nature unencrypted.
+  It's now in the import step that we decide the storage type.  This allows you
+  to export from an encrypted system and import into an unencrypted one, or
+  vice-versa.
+* The migration history has been slightly modified to accommodate PostgreSQL
+  users.  Additionally, you can now tell paperless to use PostgreSQL simply by
+  declaring ``PAPERLESS_DBUSER`` in your environment.  This will attempt to
+  connect to your Postgres database without a password unless you also set
+  ``PAPERLESS_DBPASS``.
+* A bug was found in the REST API filter system that was the result of an
+  update of django-filter some time ago.  This has now been patched in `#412`_.
+  Thanks to `thepill`_ for spotting it!
+
+
 2.3.0
 =====

@@ -15,7 +92,8 @@ Changelog
 * As his last bit of effort on this release, Joshua also added some code to
  allow you to view the documents inline rather than download them as an
  attachment. `#400`_
-* Finally, `ahyear`_ found a slip in the Docker documentation and patched it. `#401`_
+* Finally, `ahyear`_ found a slip in the Docker documentation and patched it.
+  `#401`_


 2.2.1
@@ -32,14 +110,14 @@ Changelog
  version of Paperless that supports Django 2.0!  As a result of their hard
  work, you can now also run Paperless on Python 3.7 as well: `#386`_ &
  `#390`_.
-* `Stéphane Brunner`_ added a few lines of code that made tagging interface a lot
-  easier on those of us with lots of different tags: `#391`_.
+* `Stéphane Brunner`_ added a few lines of code that made tagging interface a
+  lot easier on those of us with lots of different tags: `#391`_.
 * `Kilian Koeltzsch`_ noticed a bug in how we capture & automatically create
  tags, so that's fixed now too: `#384`_.
 * `erikarvstedt`_ tweaked the behaviour of the test suite to be better behaved
  for packaging environments: `#383`_.
-* `Lukasz Soluch`_ added CORS support to make building a new Javascript-based front-end
-  cleaner & easier: `#387`_.
+* `Lukasz Soluch`_ added CORS support to make building a new Javascript-based
+  front-end cleaner & easier: `#387`_.


 2.1.0
@@ -499,8 +577,15 @@ bulk of the work on this big change.
 .. _Kilian Koeltzsch: https://github.com/kiliankoe
 .. _Lukasz Soluch: https://github.com/LukaszSolo
 .. _Joshua Taillon: https://github.com/jat255
-.. _dubit0:  https://github.com/dubit0
-.. _ahyear:  https://github.com/ahyear
+.. _dubit0: https://github.com/dubit0
+.. _ahyear: https://github.com/ahyear
+.. _jonaswinkler: https://github.com/jonaswinkler
+.. _thepill: https://github.com/thepill
+.. _Andrew Peng: https://github.com/pengc99
+.. _euri10: https://github.com/euri10
+.. _Ulli: https://github.com/Ulli2k
+.. _tsia: https://github.com/tsia
+.. _Sblop: https://github.com/Sblop

 .. _#20: https://github.com/danielquinn/paperless/issues/20
 .. _#44: https://github.com/danielquinn/paperless/issues/44
@@ -566,6 +651,7 @@ bulk of the work on this big change.
 .. _#322: https://github.com/danielquinn/paperless/pull/322
 .. _#328: https://github.com/danielquinn/paperless/pull/328
 .. _#253: https://github.com/danielquinn/paperless/issues/253
+.. _#262: https://github.com/danielquinn/paperless/issues/262
 .. _#323: https://github.com/danielquinn/paperless/issues/323
 .. _#344: https://github.com/danielquinn/paperless/pull/344
 .. _#351: https://github.com/danielquinn/paperless/pull/351
@@ -582,11 +668,21 @@ bulk of the work on this big change.
 .. _#391: https://github.com/danielquinn/paperless/pull/391
 .. _#390: https://github.com/danielquinn/paperless/pull/390
 .. _#392: https://github.com/danielquinn/paperless/issues/392
+.. _#393: https://github.com/danielquinn/paperless/issues/393
 .. _#395: https://github.com/danielquinn/paperless/pull/395
+.. _#394: https://github.com/danielquinn/paperless/issues/394
 .. _#396: https://github.com/danielquinn/paperless/pull/396
 .. _#399: https://github.com/danielquinn/paperless/pull/399
 .. _#400: https://github.com/danielquinn/paperless/pull/400
 .. _#401: https://github.com/danielquinn/paperless/pull/401
+.. _#405: https://github.com/danielquinn/paperless/pull/405
+.. _#406: https://github.com/danielquinn/paperless/issues/406
+.. _#412: https://github.com/danielquinn/paperless/issues/412
+.. _#413: https://github.com/danielquinn/paperless/pull/413
+.. _#414: https://github.com/danielquinn/paperless/issues/414
+.. _#423: https://github.com/danielquinn/paperless/issues/423
+.. _#433: https://github.com/danielquinn/paperless/issues/433

 .. _pipenv: https://docs.pipenv.org/
 .. _a new home on Docker Hub: https://hub.docker.com/r/danielquinn/paperless/
+.. _optipng: http://optipng.sourceforge.net/
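
The slug rules described in the 2.5.0 entry above can be sketched roughly as follows. This is a stdlib-only approximation of what Django's ``slugify()`` does for ASCII output, not the code Paperless ships; ``approx_slugify`` and ``find_existing`` are hypothetical names used only for illustration.

```python
import re
import unicodedata


def approx_slugify(name):
    """Approximate Django's slugify() for ASCII output (illustrative only)."""
    # Decompose accented characters and drop anything that isn't ASCII.
    value = unicodedata.normalize("NFKD", name)
    value = value.encode("ascii", "ignore").decode("ascii")
    # Lowercase, then keep only word characters, whitespace and hyphens.
    value = re.sub(r"[^\w\s-]", "", value.lower())
    # Collapse runs of whitespace and hyphens into a single hyphen.
    return re.sub(r"[-\s]+", "-", value).strip("-_")


def find_existing(name, tags_by_slug):
    """Hypothetical lookup: match an incoming name to a stored tag by slug."""
    return tags_by_slug.get(approx_slugify(name))


tags_by_slug = {"tax-office-2018": "Tax Office (2018)"}
# Differently punctuated spellings resolve to the same stored tag.
match = find_existing("tax office 2018", tags_by_slug)
```

Because both saving and consumption derive the slug the same way, a name that differs only in case or punctuation should map to the existing tag or correspondent instead of creating a near-duplicate.
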
--- a/docs/consumption.rst
+++ b/docs/consumption.rst
@@ -76,6 +76,31 @@ Pre-consumption script

 * Document file name

+A simple but common example of this would be a script like
+this:
+
+``/usr/local/bin/ocr-pdf``
+
+.. code:: bash
+
+    #!/usr/bin/env bash
+    pdf2pdfocr.py -i "${1}"
+
+``/etc/paperless.conf``
+
+.. code:: bash
+
+    ...
+    PAPERLESS_PRE_CONSUME_SCRIPT="/usr/local/bin/ocr-pdf"
+    ...
+
+This passes the path of the document about to be consumed to
+``/usr/local/bin/ocr-pdf``, which in turn calls `pdf2pdfocr.py`_ on it.  That
+script overwrites the file with an OCR'd version and exits, at which point the
+consumption process begins with the newly modified file.
+
+.. _pdf2pdfocr.py: https://github.com/LeoFCardoso/pdf2pdfocr
+

 .. _consumption-director-hook-variables-post:
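
The same pre-consumption hook could also be written in Python. The sketch below assumes, as in the bash example above, that the document path arrives as the first argument and that ``pdf2pdfocr.py`` is on the PATH and accepts ``-i``; ``build_command`` and ``run_ocr`` are hypothetical helper names.

```python
#!/usr/bin/env python3
import subprocess
import sys


def build_command(document_path):
    """Build the argument list; passing a list avoids shell quoting issues."""
    return ["pdf2pdfocr.py", "-i", document_path]


def run_ocr(document_path):
    """Run the OCR tool on the document and return its exit code."""
    return subprocess.run(build_command(document_path)).returncode


# Only act when invoked as a script with exactly one argument (the document path).
if __name__ == "__main__" and len(sys.argv) == 2:
    raise SystemExit(run_ocr(sys.argv[1]))
```

Passing the arguments as a list means paths containing spaces reach the OCR tool intact.
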

--- a/docs/contributing.rst
+++ b/docs/contributing.rst
@@ -0,0 +1,141 @@
+.. _contributing:
+
+Contributing to Paperless
+#########################
+
+Maybe you've been using Paperless for a while and want to add a feature or two,
+or maybe you've come across a bug that you have some ideas how to solve.  The
+beauty of Free software is that you can see what's wrong and help to get it
+fixed for everyone!
+
+
+How to Get Your Changes Rolled Into Paperless
+=============================================
+
+If you've found a bug, but don't know how to fix it, you can always post an
+issue on `GitHub`_ in the hopes that someone will have the time to fix it for
+you.  If however you're the one with the time, pull requests are always
+welcome; you just have to make sure that your code conforms to a few standards:
+
+Pep8
+----
+
+It's the standard for all Python development, so it's `very well documented`_.
+The short version is:
+
+* Lines should wrap at 79 characters
+* Use ``snake_case`` for variables, ``CamelCase`` for classes, and ``ALL_CAPS``
+  for constants.
+* Space out your operators: ``stuff + 7`` instead of ``stuff+7``
+* Two empty lines between classes and functions, but one empty line between
+  class methods.
+
+There's more to it than that, but if you follow those, you'll probably be
+alright.  When you submit your pull request, there's a pep8 checker that'll
+look at your code to see if anything is off.  If it finds anything, it'll
+complain at you until you fix it.
+
+
+Additional Style Guides
+-----------------------
+
+Where pep8 is ambiguous, I've tried to be a little more specific.  These rules
+aren't hard-and-fast, but if you can conform to them, I'll appreciate it and
+spend less time trying to conform your PR before merging:
+
+
+Function calls
+..............
+
+If you're calling a function and that necessitates more than one line of code,
+please format it like this:
+
+.. code:: python
+
+    my_function(
+        argument1,
+        kwarg1="x",
+        kwarg2="y",
+        another_really_long_kwarg="some big value",
+        a_kwarg_calling_another_long_function=another_function(
+            another_arg,
+            another_kwarg="kwarg!"
+        )
+    )
+
+This is all in the interest of code uniformity rather than anything else.  If
+we stick to a style, everything is understandable in the same way.
+
+
+Quoting Strings
+...............
+
+pep8 is a little too open-minded on this for my liking.  Python strings should
+be quoted with double quotes (``"``) except in cases where the resulting string
+would require too much escaping of a double quote, in which case, a single
+quoted, or triple-quoted string will do:
+
+.. code:: python
+
+    my_string = "This is my string"
+    problematic_string = 'This is a "string" with "quotes" in it'
+
+In HTML templates, please use double-quotes for tag attributes, and single
+quotes for arguments passed to Django template tags:
+
+.. code:: html
+
+    <div class="stuff">
+        <a href="{% url 'some-url-name' pk='w00t' %}">link this</a>
+    </div>
+
+This keeps linters happy: otherwise they look at an HTML file and see a ``"``
+closing the attribute before it should have.
+
+--
+
+That's all there is in terms of guidelines, so I hope it's not too daunting.
+
+
+Indentation & Spacing
+.....................
+
+When it comes to indentation:
+
+* For Python, the rule is: follow pep8 and use 4 spaces.
+* For Javascript, CSS, and HTML, please use 1 tab.
+
+Additionally, Django templates making use of block elements like ``{% if %}``,
+``{% for %}``, and ``{% block %}`` etc. should be indented:
+
+Good:
+
+.. code:: html
+
+    {% block stuff %}
+    	<h1>This is the stuff</h1>
+    {% endblock %}
+
+Bad:
+
+.. code:: html
+
+    {% block stuff %}
+    <h1>This is the stuff</h1>
+    {% endblock %}
+
+
+The Code of Conduct
+===================
+
+Paperless has a `code of conduct`_.  It's a lot like the other ones you see out
+there, with a few small changes, but basically it boils down to:
+
+> Don't be an ass, or you might get banned.
+
+I'm proud to say that the CoC has never had to be enforced because everyone has
+been awesome, friendly, and professional.
+
+.. _GitHub: https://github.com/danielquinn/paperless/issues
+.. _very well documented: https://www.python.org/dev/peps/pep-0008/
+.. _code of conduct: https://github.com/danielquinn/paperless/blob/master/CODE_OF_CONDUCT.md
--- a/docs/guesswork.rst
+++ b/docs/guesswork.rst
@@ -43,6 +43,16 @@ These however wouldn't work:
 * ``Some Company Name, Invoice 2016-01-01, money, invoices.pdf``
 * ``Another Company- Letter of Reference.jpg``

+Do I have to be so strict about naming?
+---------------------------------------
+Rather than using the strict document naming rules, you can also set the option
+``PAPERLESS_FILENAME_DATE_ORDER`` in ``paperless.conf`` to any date order
+that is accepted by dateparser_.  Doing so will cause ``paperless`` to prefer
+any date found in the title over a date pulled from the document's text,
+without requiring the strict formatting of the document filename described
+above.
+
+.. _dateparser: https://github.com/scrapinghub/dateparser/blob/v0.7.0/docs/usage.rst#settings

 .. _guesswork-content:
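
To see why a date order matters, consider an ambiguous date such as ``01-02-2018``. The sketch below is not how dateparser works internally; it is a stdlib-only illustration, with a hypothetical ``parse_with_order`` helper, of how an order like ``DMY`` or ``MDY`` changes the result.

```python
from datetime import datetime

# Map each letter of a date order to the matching strptime directive.
_DIRECTIVES = {"D": "%d", "M": "%m", "Y": "%Y"}


def parse_with_order(text, order):
    """Parse a hyphen-separated numeric date using the given order, e.g. "DMY"."""
    fmt = "-".join(_DIRECTIVES[letter] for letter in order.upper())
    return datetime.strptime(text, fmt).date()


day_first = parse_with_order("01-02-2018", "DMY")    # 1 February 2018
month_first = parse_with_order("01-02-2018", "MDY")  # 2 January 2018
```

The same digits produce two different dates, which is the ambiguity that ``PAPERLESS_FILENAME_DATE_ORDER`` is meant to settle.
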

--- a/docs/index.rst
+++ b/docs/index.rst
@@ -43,5 +43,6 @@ Contents
   customising
   extending
   troubleshooting
+   contributing
   scanners
   changelog
--- a/docs/migrating.rst
+++ b/docs/migrating.rst
@@ -82,6 +82,7 @@ rolled in as part of the update:

    $ cd /path/to/project
    $ git pull
+    $ pip install -r requirements.txt
    $ cd src
    $ ./manage.py migrate

--- a/docs/requirements.rst
+++ b/docs/requirements.rst
@@ -33,7 +33,7 @@ In addition to the above, there are a number of Python requirements, all of
 which are listed in a file called ``requirements.txt`` in the project root
 directory.

-If you're not working on a virtual environment (like Vagrant or Docker), you
+If you're not working on a virtual environment (like Docker), you
 should probably be using a virtualenv, but that's your call.  The reasons why
 you might choose a virtualenv or not aren't really within the scope of this
 document.  Needless to say if you don't know what a virtualenv is, you should
--- a/docs/setup.rst
+++ b/docs/setup.rst
@@ -42,18 +42,14 @@ Installation & Configuration
 You can go multiple routes with setting up and running Paperless:

 * The `bare metal route`_
- * The `vagrant route`_
 * The `docker route`_


-The `Vagrant route`_ is quick & easy, but means you're running a VM which comes
-with memory consumption, cpu overhead etc. The `docker route`_ offers the same
-simplicity as Vagrant with lower resource consumption.
+The `docker route`_ is quick & easy.

 The `bare metal route`_ is a bit more complicated to setup but makes it easier
 should you want to contribute some code back.

-.. _Vagrant route: setup-installation-vagrant_
 .. _docker route: setup-installation-docker_
 .. _bare metal route: setup-installation-bare-metal_
 .. _Docker Machine: https://docs.docker.com/machine/
@@ -267,54 +263,6 @@ Docker Method
   newer ``docker-compose.yml.example`` file


-.. _setup-installation-vagrant:
-
-Vagrant Method
-++++++++++++++
-
-1. Install `Vagrant`_.  How you do that is really between you and your OS.
-2. Run ``vagrant up``.  An instance will start up for you.  When it's ready and
-   provisioned...
-3. Run ``vagrant ssh`` and once inside your new vagrant box, edit
-   ``/etc/paperless.conf`` and set the values for:
-
-    * ``PAPERLESS_CONSUMPTION_DIR``: This is where your documents will be
-      dumped to be consumed by Paperless.
-    * ``PAPERLESS_PASSPHRASE``: This is the passphrase Paperless uses to
-      encrypt/decrypt the original document.  It's only required if you want
-      your original files to be encrypted, otherwise, just leave it unset.
-    * ``PAPERLESS_EMAIL_SECRET``: this is the "magic word" used when consuming
-      documents from mail or via the API.  If you don't use either, leaving it
-      blank is just fine.
-
-4. Exit the vagrant box and re-enter it with ``vagrant ssh`` again.  This
-   updates the environment to make use of the changes you made to the config
-   file.
-5. Initialise the database with ``/opt/paperless/src/manage.py migrate``.
-6. Still inside your vagrant box, create a user for your Paperless instance
-   with ``/opt/paperless/src/manage.py createsuperuser``. Follow the prompts to
-   create your user.
-7. Start the webserver with
-   ``/opt/paperless/src/manage.py runserver 0.0.0.0:8000``. You should now be
-   able to visit your (empty) `Paperless webserver`_ at ``172.28.128.4:8000``.
-   You can login with the user/pass you created in #6.
-8. In a separate window, run ``vagrant ssh`` again, but this time once inside
-   your vagrant instance, you should start the consumer script with
-   ``/opt/paperless/src/manage.py document_consumer``.
-9. Scan something.  Put it in the ``CONSUMPTION_DIR``.
-10. Wait a few minutes
-11. Visit the document list on your webserver, and it should be there, indexed
-    and downloadable.
-
-.. caution::
-
-    This installation is not secure. Once everything is working head up to
-    `Making things more permanent`_
-
-.. _Vagrant: https://vagrantup.com/
-.. _Paperless server: http://172.28.128.4:8000
-
-
 .. _setup-permanent:

 Making Things a Little more Permanent
@@ -398,7 +346,7 @@ instance listening on localhost port 8000.
        location /static {

            autoindex on;
-            alias <path-to-paperless-static-directory>
+            alias <path-to-paperless-static-directory>;

        }

@@ -409,7 +357,7 @@ instance listening on localhost port 8000.
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;

-            proxy_pass http://127.0.0.1:8000
+            proxy_pass http://127.0.0.1:8000;
        }
    }

@@ -513,13 +461,6 @@ second period.
 .. _Upstart: http://upstart.ubuntu.com/


-Vagrant
-~~~~~~~
-
-You may use the Ubuntu explanation above. Replace
-``(local-filesystems and net-device-up IFACE=eth0)`` with ``vagrant-mounted``.
-
-
 .. _setup-permanent-docker:

 Docker
--- a/docs/troubleshooting.rst
+++ b/docs/troubleshooting.rst
@@ -14,9 +14,8 @@ FORGIVING_OCR is enabled``, then you might need to install the
 `Tesseract language files <http://packages.ubuntu.com/search?keywords=tesseract-ocr>`_
 matching your document's languages.

-As an example, if you are running Paperless from the Vagrant setup provided
-(or from any Ubuntu or Debian box), and your documents are written in Spanish
-you may need to run::
+As an example, if you are running Paperless on any Ubuntu or Debian
+box, and your documents are written in Spanish, you may need to run::

    apt-get install -y tesseract-ocr-spa