Merge branch 'master' of github.com:danielquinn/paperless into ENH_filename_date_parsing

This commit is contained in:
Joshua Taillon
2018-11-15 23:17:59 -05:00
42 changed files with 1405 additions and 770 deletions


@@ -1,6 +1,83 @@
Changelog
#########
2.6.0
=====
* Allow an infinite number of logs to be deleted. Thanks to `Ulli`_ for noting
the problem in `#433`_.
* Fix the ``RecentCorrespondentsFilter`` correspondents filter that was added
in 2.4 to play nice with the defaults. Thanks to `tsia`_ and `Sblop`_ who
pointed this out. `#423`_.
* Updated dependencies to include (among other things) a security patch to
requests.
2.5.0
=====
* **New dependency**: Paperless now optimises thumbnail generation with
`optipng`_, so you'll need to install that somewhere in your PATH or declare
its location in ``PAPERLESS_OPTIPNG_BINARY``. The Docker image has already
been updated on the Docker Hub, so you just need to pull the latest one from
there if you're a Docker user.
* "Login free" instances of Paperless were breaking whenever you tried to edit
objects in the admin: adding/deleting tags or correspondents, or even fixing
spelling. This was due to the "user hack" we were applying to sessions that
weren't using a login, as that hack user didn't have a valid id. The fix was
to attribute the first user id in the system to this hack user. `#394`_
* A problem in how we handle slug values on Tags and Correspondents required a
few changes to how we handle this field `#393`_:
1. Slugs are no longer editable. They're derived from the name of the tag or
correspondent at save time, so if you wanna change the slug, you have to
change the name, and even then you're restricted to the rules of the
``slugify()`` function. The slug value is still visible in the admin
though.
2. I've added a migration to go over all existing tags & correspondents and
rewrite the ``.slug`` values to ones conforming to the ``slugify()``
rules.
3. The consumption process now uses the same rules as ``.save()`` in
determining a slug and using that to check for an existing
tag/correspondent.
* An annoying bug in the date capture code was causing some bogus dates to be
attached to documents, which in turn busted the UI. Thanks to `Andrew Peng`_
for reporting this. `#414`_.
* A bug in the Dockerfile meant that Tesseract language files weren't being
installed correctly. `euri10`_ was quick to provide a fix: `#406`_, `#413`_.
* Document consumption is now wrapped in a transaction as per an old ticket
`#262`_.
* The ``get_date()`` functionality of the parsers has been consolidated onto
the ``DocumentParser`` class since much of that code was redundant anyway.
2.4.0
=====
* A new set of actions are now available thanks to `jonaswinkler`_'s very first
pull request! You can now do nifty things like tag documents in bulk, or set
correspondents in bulk. `#405`_
* The import/export system is now a little smarter. By default, documents are
tagged as ``unencrypted``, since exports are by their nature unencrypted.
It's now in the import step that we decide the storage type. This allows you
to export from an encrypted system and import into an unencrypted one, or
vice-versa.
* The migration history has been slightly modified to accommodate PostgreSQL
users. Additionally, you can now tell paperless to use PostgreSQL simply by
declaring ``PAPERLESS_DBUSER`` in your environment. This will attempt to
connect to your Postgres database without a password unless you also set
``PAPERLESS_DBPASS``.
* A bug was found in the REST API filter system that was the result of an
update of django-filter some time ago. This has now been patched in `#412`_.
Thanks to `thepill`_ for spotting it!
2.3.0
=====
@@ -15,7 +92,8 @@ Changelog
* As his last bit of effort on this release, Joshua also added some code to
allow you to view the documents inline rather than download them as an
attachment. `#400`_
* Finally, `ahyear`_ found a slip in the Docker documentation and patched it. `#401`_
* Finally, `ahyear`_ found a slip in the Docker documentation and patched it.
`#401`_
2.2.1
@@ -32,14 +110,14 @@ Changelog
version of Paperless that supports Django 2.0! As a result of their hard
work, you can now also run Paperless on Python 3.7 as well: `#386`_ &
`#390`_.
* `Stéphane Brunner`_ added a few lines of code that made tagging interface a lot
easier on those of us with lots of different tags: `#391`_.
* `Stéphane Brunner`_ added a few lines of code that made tagging interface a
lot easier on those of us with lots of different tags: `#391`_.
* `Kilian Koeltzsch`_ noticed a bug in how we capture & automatically create
tags, so that's fixed now too: `#384`_.
* `erikarvstedt`_ tweaked the behaviour of the test suite to be better behaved
for packaging environments: `#383`_.
* `Lukasz Soluch`_ added CORS support to make building a new Javascript-based front-end
cleaner & easier: `#387`_.
* `Lukasz Soluch`_ added CORS support to make building a new Javascript-based
front-end cleaner & easier: `#387`_.
2.1.0
@@ -499,8 +577,15 @@ bulk of the work on this big change.
.. _Kilian Koeltzsch: https://github.com/kiliankoe
.. _Lukasz Soluch: https://github.com/LukaszSolo
.. _Joshua Taillon: https://github.com/jat255
.. _dubit0: https://github.com/dubit0
.. _ahyear: https://github.com/ahyear
.. _dubit0: https://github.com/dubit0
.. _ahyear: https://github.com/ahyear
.. _jonaswinkler: https://github.com/jonaswinkler
.. _thepill: https://github.com/thepill
.. _Andrew Peng: https://github.com/pengc99
.. _euri10: https://github.com/euri10
.. _Ulli: https://github.com/Ulli2k
.. _tsia: https://github.com/tsia
.. _Sblop: https://github.com/Sblop
.. _#20: https://github.com/danielquinn/paperless/issues/20
.. _#44: https://github.com/danielquinn/paperless/issues/44
@@ -566,6 +651,7 @@ bulk of the work on this big change.
.. _#322: https://github.com/danielquinn/paperless/pull/322
.. _#328: https://github.com/danielquinn/paperless/pull/328
.. _#253: https://github.com/danielquinn/paperless/issues/253
.. _#262: https://github.com/danielquinn/paperless/issues/262
.. _#323: https://github.com/danielquinn/paperless/issues/323
.. _#344: https://github.com/danielquinn/paperless/pull/344
.. _#351: https://github.com/danielquinn/paperless/pull/351
@@ -582,11 +668,21 @@ bulk of the work on this big change.
.. _#391: https://github.com/danielquinn/paperless/pull/391
.. _#390: https://github.com/danielquinn/paperless/pull/390
.. _#392: https://github.com/danielquinn/paperless/issues/392
.. _#393: https://github.com/danielquinn/paperless/issues/393
.. _#395: https://github.com/danielquinn/paperless/pull/395
.. _#394: https://github.com/danielquinn/paperless/issues/394
.. _#396: https://github.com/danielquinn/paperless/pull/396
.. _#399: https://github.com/danielquinn/paperless/pull/399
.. _#400: https://github.com/danielquinn/paperless/pull/400
.. _#401: https://github.com/danielquinn/paperless/pull/401
.. _#405: https://github.com/danielquinn/paperless/pull/405
.. _#406: https://github.com/danielquinn/paperless/issues/406
.. _#412: https://github.com/danielquinn/paperless/issues/412
.. _#413: https://github.com/danielquinn/paperless/pull/413
.. _#414: https://github.com/danielquinn/paperless/issues/414
.. _#423: https://github.com/danielquinn/paperless/issues/423
.. _#433: https://github.com/danielquinn/paperless/issues/433
.. _pipenv: https://docs.pipenv.org/
.. _a new home on Docker Hub: https://hub.docker.com/r/danielquinn/paperless/
.. _optipng: http://optipng.sourceforge.net/


@@ -76,6 +76,31 @@ Pre-consumption script
* Document file name
A simple but common example for this would be creating a simple script like
this:
``/usr/local/bin/ocr-pdf``
.. code:: bash
#!/usr/bin/env bash
# Quote the path so file names containing spaces are passed as one argument
pdf2pdfocr.py -i "${1}"
``/etc/paperless.conf``
.. code:: bash
...
PAPERLESS_PRE_CONSUME_SCRIPT="/usr/local/bin/ocr-pdf"
...
This will pass the path to the document about to be consumed to
``/usr/local/bin/ocr-pdf``, which will in turn call `pdf2pdfocr.py`_ on your
document. That overwrites the file with an OCR'd version and exits, at which
point the consumption process begins with the newly modified file.
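
The same hook could also be written in Python rather than bash. The following
is only a sketch: the helper names ``build_command`` and ``main`` are made up
for illustration, and it assumes, as in the shell example, that
``pdf2pdfocr.py`` is on your PATH and takes the input file via ``-i``.

```python
import subprocess
import sys

# Assumed command name, taken from the shell example above.
OCR_COMMAND = "pdf2pdfocr.py"


def build_command(document_path):
    """Build the argument list for the OCR tool.

    Passing a list (rather than a shell string) means paths containing
    spaces need no extra quoting.
    """
    return [OCR_COMMAND, "-i", document_path]


def main(argv):
    # The path to the document about to be consumed arrives as the
    # first command-line argument.
    if len(argv) < 2:
        print("usage: ocr-pdf <document path>", file=sys.stderr)
        return 2
    # Return the OCR tool's exit status as our own.
    return subprocess.call(build_command(argv[1]))
```

To use this as the script itself, add a ``#!/usr/bin/env python3`` line at the
top, end the file with ``sys.exit(main(sys.argv))``, and make it executable.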
.. _pdf2pdfocr.py: https://github.com/LeoFCardoso/pdf2pdfocr
.. _consumption-director-hook-variables-post:

141
docs/contributing.rst Normal file

@@ -0,0 +1,141 @@
.. _contributing:
Contributing to Paperless
#########################
Maybe you've been using Paperless for a while and want to add a feature or two,
or maybe you've come across a bug that you have some ideas how to solve. The
beauty of Free software is that you can see what's wrong and help to get it
fixed for everyone!
How to Get Your Changes Rolled Into Paperless
=============================================
If you've found a bug, but don't know how to fix it, you can always post an
issue on `GitHub`_ in the hopes that someone will have the time to fix it for
you. If however you're the one with the time, pull requests are always
welcome; you just have to make sure that your code conforms to a few standards:
Pep8
----
It's the standard for all Python development, so it's `very well documented`_.
The short version is:
* Lines should wrap at 79 characters
* Use ``snake_case`` for variables, ``CamelCase`` for classes, and ``ALL_CAPS``
for constants.
* Space out your operators: ``stuff + 7`` instead of ``stuff+7``
* Two empty lines between classes and functions, but one empty line between
  class methods.
There's more to it than that, but if you follow those, you'll probably be
alright. When you submit your pull request, there's a pep8 checker that'll
look at your code to see if anything is off. If it finds anything, it'll
complain at you until you fix it.
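
As a rough illustration of the short version above, here is a small sketch.
All of the names in it (``MAX_PAGES``, ``PageCounter``, ``count_pages``) are
invented for this example and don't exist in Paperless:

```python
MAX_PAGES = 500  # ALL_CAPS for constants


class PageCounter:  # CamelCase for classes
    def __init__(self):
        self.page_count = 0  # snake_case for variables

    def add_pages(self, new_pages):
        # Operators are spaced out: "a + b" rather than "a+b"
        self.page_count = min(self.page_count + new_pages, MAX_PAGES)
        return self.page_count


def count_pages(batches):
    """Add up the batches of pages, capped at MAX_PAGES."""
    counter = PageCounter()
    for batch in batches:
        counter.add_pages(batch)
    return counter.page_count
```

Note the single empty line between the two methods, and the two empty lines
around the class and the function.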
Additional Style Guides
-----------------------
Where pep8 is ambiguous, I've tried to be a little more specific. These rules
aren't hard-and-fast, but if you can conform to them, I'll appreciate it and
spend less time trying to conform your PR before merging:
Function calls
..............
If you're calling a function and that necessitates more than one line of code,
please format it like this:
.. code:: python

    my_function(
        argument1,
        kwarg1="x",
        kwarg2="y",
        another_really_long_kwarg="some big value",
        a_kwarg_calling_another_long_function=another_function(
            another_arg,
            another_kwarg="kwarg!"
        )
    )
This is all in the interest of code uniformity rather than anything else. If
we stick to a style, everything is understandable in the same way.
Quoting Strings
...............
pep8 is a little too open-minded on this for my liking. Python strings should
be quoted with double quotes (``"``) except in cases where the resulting string
would require too much escaping of a double quote, in which case, a single
quoted, or triple-quoted string will do:
.. code:: python
my_string = "This is my string"
problematic_string = 'This is a "string" with "quotes" in it'
In HTML templates, please use double-quotes for tag attributes, and single
quotes for arguments passed to Django template tags:

.. code:: html

    <div class="stuff">
        <a href="{% url 'some-url-name' pk='w00t' %}">link this</a>
    </div>

This is to keep linters happy: otherwise, when they look at an HTML file, they
see a ``"`` closing the attribute before it should have been closed.
--
That's all there is in terms of guidelines, so I hope it's not too daunting.
Indentation & Spacing
.....................
When it comes to indentation:
* For Python, the rule is: follow pep8 and use 4 spaces.
* For Javascript, CSS, and HTML, please use 1 tab.
Additionally, Django templates making use of block elements like ``{% if %}``,
``{% for %}``, and ``{% block %}`` etc. should be indented:
Good:

.. code:: html

    {% block stuff %}
        <h1>This is the stuff</h1>
    {% endblock %}

Bad:

.. code:: html

    {% block stuff %}
    <h1>This is the stuff</h1>
    {% endblock %}
The Code of Conduct
===================
Paperless has a `code of conduct`_. It's a lot like the other ones you see out
there, with a few small changes, but basically it boils down to:
> Don't be an ass, or you might get banned.
I'm proud to say that the CoC has never had to be enforced because everyone has
been awesome, friendly, and professional.
.. _GitHub: https://github.com/danielquinn/paperless/issues
.. _very well documented: https://www.python.org/dev/peps/pep-0008/
.. _code of conduct: https://github.com/danielquinn/paperless/blob/master/CODE_OF_CONDUCT.md


@@ -43,6 +43,16 @@ These however wouldn't work:
* ``Some Company Name, Invoice 2016-01-01, money, invoices.pdf``
* ``Another Company- Letter of Reference.jpg``
Do I have to be so strict about naming?
---------------------------------------
Rather than using the strict document naming rules, you can also set the option
``PAPERLESS_FILENAME_DATE_ORDER`` in ``paperless.conf`` to any date order
that is accepted by dateparser_ (for example ``DMY`` or ``MDY``). Paperless
will then use a date found in the file name, read in that order, instead of a
date pulled from the document's text, without requiring the strict formatting
of the document filename as described above.
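
To see why the order matters, consider a file name containing ``01.02.2018``:
it could mean the 1st of February or the 2nd of January. The sketch below uses
only the standard library to show the idea. It is not how Paperless or
dateparser does the parsing, and ``date_from_filename`` is a hypothetical
helper written just for this illustration:

```python
import re
from datetime import date

# Three groups of digits separated by "-", "." or "/".
_DATE_RE = re.compile(r"(\d{1,4})[-./](\d{1,2})[-./](\d{1,4})")


def date_from_filename(filename, order="DMY"):
    """Read the first date-like token in ``filename`` using ``order``.

    ``order`` is a three-letter string made of D, M and Y, such as
    "DMY", "MDY" or "YMD". Returns None if nothing usable is found.
    """
    match = _DATE_RE.search(filename)
    if match is None:
        return None
    # Pair each letter of the order with the number found in that position.
    parts = dict(zip(order, (int(group) for group in match.groups())))
    try:
        return date(parts["Y"], parts["M"], parts["D"])
    except (KeyError, ValueError):
        # Unknown order letters, or numbers that don't form a real date.
        return None
```

With this helper, ``date_from_filename("Scan 01.02.2018.pdf", "DMY")`` gives
the 1st of February 2018, while the same call with ``"MDY"`` gives the 2nd of
January 2018.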
.. _dateparser: https://github.com/scrapinghub/dateparser/blob/v0.7.0/docs/usage.rst#settings
.. _guesswork-content:


@@ -43,5 +43,6 @@ Contents
customising
extending
troubleshooting
contributing
scanners
changelog


@@ -82,6 +82,7 @@ rolled in as part of the update:
$ cd /path/to/project
$ git pull
$ pip install -r requirements.txt
$ cd src
$ ./manage.py migrate


@@ -33,7 +33,7 @@ In addition to the above, there are a number of Python requirements, all of
which are listed in a file called ``requirements.txt`` in the project root
directory.
If you're not working on a virtual environment (like Vagrant or Docker), you
If you're not working on a virtual environment (like Docker), you
should probably be using a virtualenv, but that's your call. The reasons why
you might choose a virtualenv or not aren't really within the scope of this
document. Needless to say if you don't know what a virtualenv is, you should


@@ -42,18 +42,14 @@ Installation & Configuration
You can go multiple routes with setting up and running Paperless:
* The `bare metal route`_
* The `vagrant route`_
* The `docker route`_
The `Vagrant route`_ is quick & easy, but means you're running a VM which comes
with memory consumption, cpu overhead etc. The `docker route`_ offers the same
simplicity as Vagrant with lower resource consumption.
The `docker route`_ is quick & easy.
The `bare metal route`_ is a bit more complicated to set up but makes it easier
should you want to contribute some code back.
.. _Vagrant route: setup-installation-vagrant_
.. _docker route: setup-installation-docker_
.. _bare metal route: setup-installation-bare-metal_
.. _Docker Machine: https://docs.docker.com/machine/
@@ -267,54 +263,6 @@ Docker Method
newer ``docker-compose.yml.example`` file
.. _setup-installation-vagrant:
Vagrant Method
++++++++++++++
1. Install `Vagrant`_. How you do that is really between you and your OS.
2. Run ``vagrant up``. An instance will start up for you. When it's ready and
provisioned...
3. Run ``vagrant ssh`` and once inside your new vagrant box, edit
``/etc/paperless.conf`` and set the values for:
* ``PAPERLESS_CONSUMPTION_DIR``: This is where your documents will be
dumped to be consumed by Paperless.
* ``PAPERLESS_PASSPHRASE``: This is the passphrase Paperless uses to
encrypt/decrypt the original document. It's only required if you want
your original files to be encrypted, otherwise, just leave it unset.
* ``PAPERLESS_EMAIL_SECRET``: this is the "magic word" used when consuming
documents from mail or via the API. If you don't use either, leaving it
blank is just fine.
4. Exit the vagrant box and re-enter it with ``vagrant ssh`` again. This
updates the environment to make use of the changes you made to the config
file.
5. Initialise the database with ``/opt/paperless/src/manage.py migrate``.
6. Still inside your vagrant box, create a user for your Paperless instance
with ``/opt/paperless/src/manage.py createsuperuser``. Follow the prompts to
create your user.
7. Start the webserver with
``/opt/paperless/src/manage.py runserver 0.0.0.0:8000``. You should now be
able to visit your (empty) `Paperless webserver`_ at ``172.28.128.4:8000``.
You can login with the user/pass you created in #6.
8. In a separate window, run ``vagrant ssh`` again, but this time once inside
your vagrant instance, you should start the consumer script with
``/opt/paperless/src/manage.py document_consumer``.
9. Scan something. Put it in the ``CONSUMPTION_DIR``.
10. Wait a few minutes
11. Visit the document list on your webserver, and it should be there, indexed
and downloadable.
.. caution::
This installation is not secure. Once everything is working head up to
`Making things more permanent`_
.. _Vagrant: https://vagrantup.com/
.. _Paperless server: http://172.28.128.4:8000
.. _setup-permanent:
Making Things a Little more Permanent
@@ -398,7 +346,7 @@ instance listening on localhost port 8000.
location /static {
autoindex on;
alias <path-to-paperless-static-directory>
alias <path-to-paperless-static-directory>;
}
@@ -409,7 +357,7 @@ instance listening on localhost port 8000.
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_pass http://127.0.0.1:8000
proxy_pass http://127.0.0.1:8000;
}
}
@@ -513,13 +461,6 @@ second period.
.. _Upstart: http://upstart.ubuntu.com/
Vagrant
~~~~~~~
You may use the Ubuntu explanation above. Replace
``(local-filesystems and net-device-up IFACE=eth0)`` with ``vagrant-mounted``.
.. _setup-permanent-docker:
Docker


@@ -14,9 +14,8 @@ FORGIVING_OCR is enabled``, then you might need to install the
`Tesseract language files <http://packages.ubuntu.com/search?keywords=tesseract-ocr>`_
matching your document's languages.
As an example, if you are running Paperless from the Vagrant setup provided
(or from any Ubuntu or Debian box), and your documents are written in Spanish
you may need to run::
As an example, if you are running Paperless from any Ubuntu or Debian
box, and your documents are written in Spanish you may need to run::
apt-get install -y tesseract-ocr-spa