Compare commits

...

31 Commits
0.8.0 ... 1.0.0

Author SHA1 Message Date
Daniel Quinn
3ca215e4dc Bump to v1.0.0! 2018-01-06 19:25:33 +00:00
Daniel Quinn
16c4183333 Upgrade to Django 1.11.x 2018-01-06 19:24:10 +00:00
Daniel Quinn
6fe37678f2 Change date fields to actual date fields #278 2018-01-06 19:21:49 +00:00
Daniel Quinn
b58188f805 Switch from pep8 to pycodestyle 2018-01-06 18:56:37 +00:00
Daniel Quinn
f2a42ab6fe Add catch-all redirect for /admin/ 2018-01-06 18:51:16 +00:00
Daniel Quinn
e236b7bf7b isort 2018-01-06 18:51:10 +00:00
Daniel Quinn
35004f434b Add a smarter work-around for the change-list-results hack 2018-01-06 18:47:01 +00:00
Daniel Quinn
75251ad694 Add a note for future development 2018-01-06 18:30:33 +00:00
Daniel Quinn
870357968a Fix tests to run on boxes with post-consume-scripts set 2018-01-06 17:23:24 +00:00
Daniel Quinn
a593798b4b Add encoding declaration 2018-01-06 17:23:07 +00:00
Daniel Quinn
4f070ba162 Use double quotes by default 2018-01-06 17:22:57 +00:00
Daniel Quinn
9517d27f40 Add warnings to the test runner 2018-01-06 17:22:40 +00:00
Daniel Quinn
35bb3dbcc2 Clean up CSS for #272 2018-01-06 15:57:25 +00:00
Daniel Quinn
06117929bb Merge pull request #277 from ishirav/multi-word-match
Add multi-word match
2017-12-27 11:21:27 +01:00
ishirav
d1c8241947 break long lines (pep8) 2017-12-23 07:39:40 +02:00
ishirav
4c38b28469 break long lines (pep8) 2017-12-23 06:59:48 +02:00
ishirav
ad0f0a0b5d Add documentation about multi-word search terms 2017-12-23 06:44:06 +02:00
ishirav
83746a9aeb Add tests and improve whitespace handling 2017-12-23 06:37:00 +02:00
ishirav
6a36a4ec97 Support search terms that contain multiple words in ANY/ALL matching modes, by surrounding the terms with double quotes. 2017-12-23 06:05:48 +02:00
Daniel Quinn
af4623e605 Merge pull request #270 from dev-rke/patch-2
#248: fix missing CSS
I'm not thrilled about this and would much rather have Nginx running to do the job, but I just don't have the time to do that right now.  As Pit says, this is better than leaving DEBUG on.
2017-11-05 20:45:28 +00:00
Daniel Quinn
db8e116681 Merge pull request #269 from dev-rke/patch-1
Change default /consume volume
2017-11-05 20:42:08 +00:00
dev-rke
a8616ebfe2 #248: fix missing CSS
Force the server to use --insecure flag to also provide static contents like CSS files.
See #248 and #167 for more details.
2017-11-04 16:02:29 +01:00
dev-rke
a38d3bf7f8 Change default /consume volume
Change default /consume volume to ./consume on your host, so that no unexpected folder will be generated on the host machine.
2017-11-04 15:57:21 +01:00
Daniel Quinn
1cb5bbd07d Merge pull request #268 from pitkley/267-dir-permissions
Set `g+w` on the consumption/export directories
2017-11-01 11:30:16 +00:00
Daniel Quinn
6edb5b912f Move all scanner recommendations to new doc page 2017-11-01 11:24:11 +00:00
Daniel Quinn
ec20c7577e Add a new page for scanner recommendations 2017-11-01 11:15:37 +00:00
Daniel Quinn
d6df9b3656 Strip whitespace 2017-11-01 11:15:22 +00:00
Pit Kleyersburg
80a849fef7 Set g+w on the consumption/export directories
This should fix issue #267.
2017-10-31 15:30:33 +01:00
Daniel Quinn
bd67b53d50 Update test for #259 fix 2017-10-16 10:53:18 +01:00
Daniel Quinn
e32ed09da3 Support .jpeg as well as .jpg 2017-10-16 09:00:38 +01:00
Daniel Quinn
c5632e5c04 Update changelog for 0.8.0 2017-09-10 12:51:00 +01:00
22 changed files with 214 additions and 82 deletions

@@ -26,7 +26,8 @@ How it Works
Paperless does not control your scanner, it only helps you deal with what your
scanner produces
1. Buy a document scanner like `this one`_ (used by me) or `this other one`_
recommended by another user.
1. Buy a document scanner that can write to a place on your network. If you
need some inspiration, have a look at the `scanner recommendations`_ page.
2. Set it up to "scan to FTP" or something similar. It should be able to push
scanned images to a server without you having to do anything. If your
@@ -118,8 +119,7 @@ The thing is, I'm doing ok for money, so I would instead ask you to donate to
the `United Nations High Commissioner for Refugees`_. They're doing important
work and they need the money a lot more than I do.
.. _this one: http://www.brother.ca/en-CA/Scanners/11/ProductDetail/ADS1500W?ProductDetail=productdetail
.. _this other one: http://www.fujitsu.com/us/products/computing/peripheral/scanners/scansnap/ix500/
.. _scanner recommendations: https://paperless.readthedocs.io/en/latest/scanners.html
.. _ImageMagick: http://imagemagick.org/
.. _Tesseract: https://github.com/tesseract-ocr
.. _Unpaper: https://www.flameeyes.eu/projects/unpaper

@@ -17,7 +17,7 @@ services:
# value with nothing.
environment:
- PAPERLESS_OCR_LANGUAGES=
command: ["runserver", "0.0.0.0:8000"]
command: ["runserver", "--insecure", "0.0.0.0:8000"]
consumer:
image: pitkley/paperless
@@ -26,7 +26,7 @@ services:
- media:/usr/src/paperless/media
# You have to adapt the local path you want the consumption
# directory to mount to by modifying the part before the ':'.
- /path/to/arbitrary/place:/consume
- ./consume:/consume
# Likewise, you can add a local path to mount a directory for
# exporting. This is not strictly needed for paperless to
# function, only if you're exporting your files: uncomment

@@ -1,6 +1,28 @@
Changelog
#########
* 1.0.0
* Upgrade to Django 1.11. **You'll need to run
``pip install -r requirements.txt`` after the usual ``git pull`` to
properly update**.
* Replace the templatetag-based hack we had for document listing in favour of
a slightly less ugly solution in the form of another template tag with less
copypasta.
* Support for multi-word-matches for auto-tagging thanks to an excellent
patch from `ishirav`_ `#277`_.
* Fixed a CSS bug reported by `Stefan Hagen`_ that caused an overlapping of
the text and checkboxes under some resolutions `#272`_.
* Patched the Docker config to force the serving of static files. Credit for
this one goes to `dev-rke`_ via `#248`_.
* Fix file permissions during Docker start up thanks to `Pit`_ on `#268`_.
* Date fields in the admin are now expressed as HTML5 date fields thanks to
`Lukas Winkler`_'s issue `#278`_.
* 0.8.0
* Paperless can now run in a subdirectory on a host (``/paperless``), rather
than always running in the root (``/``) thanks to `maphy-psd`_'s work on
`#255`_.
* 0.7.0
* **Potentially breaking change**: As per `#235`_, Paperless will no longer
automatically delete documents attached to correspondents when those
@@ -231,6 +253,11 @@ Changelog
.. _Joshua Gilman: https://github.com/jmgilman
.. _ayounggun: https://github.com/ayounggun
.. _Kusti Skytén: https://github.com/kskyten
.. _maphy-psd: https://github.com/maphy-psd
.. _ishirav: https://github.com/ishirav
.. _Stefan Hagen: https://github.com/xkpd3
.. _dev-rke: https://github.com/dev-rke
.. _Lukas Winkler: https://github.com/Findus23
.. _#20: https://github.com/danielquinn/paperless/issues/20
.. _#44: https://github.com/danielquinn/paperless/issues/44
@@ -271,3 +298,9 @@ Changelog
.. _#232: https://github.com/danielquinn/paperless/issues/232
.. _#235: https://github.com/danielquinn/paperless/issues/235
.. _#236: https://github.com/danielquinn/paperless/issues/236
.. _#255: https://github.com/danielquinn/paperless/pull/255
.. _#268: https://github.com/danielquinn/paperless/pull/268
.. _#277: https://github.com/danielquinn/paperless/pull/277
.. _#272: https://github.com/danielquinn/paperless/issues/272
.. _#248: https://github.com/danielquinn/paperless/issues/248
.. _#278: https://github.com/danielquinn/paperless/issues/278

@@ -80,6 +80,12 @@ text and matching algorithm. From the help info there:
uses a regex to match the PDF. If you don't know what a regex is, you
probably don't want this option.
When using the "any" or "all" matching algorithms, you can search for terms that
consist of multiple words by enclosing them in double quotes. For example, defining
a match text of ``"Bank of America" BofA`` using the "any" algorithm will match
documents that contain either "Bank of America" or "BofA", but will not match
documents containing "Bank of South America".
Then just save your tag/correspondent and run another document through the
consumer. Once complete, you should see the newly-created document,
automatically tagged with the appropriate data.
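
The "any"/"all" behaviour with quoted phrases can be sketched in standalone Python. This is a minimal sketch, not Paperless's code: it mirrors the quoted-phrase splitting from the matching model in this diff, but the function names `split_match` and `matches` are hypothetical, and passing each word through `re.escape()` is an addition so that characters like `.` or `+` in a match text are treated literally.

```python
import re


def split_match(match):
    """Split a match text into regex fragments, keeping quoted phrases whole.

    Words inside a phrase are escaped and joined with \\s+, so extra
    whitespace between them in the document text still matches.
    """
    fragments = []
    for quoted, bare in re.findall(r'"([^"]+)"|(\S+)', match):
        words = (quoted or bare).split()  # also drops stray spaces
        if words:  # skip phrases that were only whitespace, e.g. '" "'
            fragments.append(r"\s+".join(re.escape(w) for w in words))
    return fragments


def matches(match, text, algorithm):
    """Return True if text satisfies the "any" or "all" algorithm."""
    hits = [
        re.search(r"\b{}\b".format(fragment), text, flags=re.IGNORECASE)
        for fragment in split_match(match)
    ]
    if algorithm == "any":
        return any(hits)
    if algorithm == "all":
        return all(hits)
    raise ValueError("unsupported algorithm: {}".format(algorithm))


rule = '"Bank of America" BofA'
print(matches(rule, "Statement from Bank of America", "any"))  # True
print(matches(rule, "Your BofA card", "any"))                  # True
print(matches(rule, "Bank of South America", "any"))           # False
```

"Bank of South America" fails because the phrase fragment only allows whitespace between "of" and "America", and the text contains no standalone "BofA".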

@@ -3,10 +3,10 @@
Paperless
=========
Paperless is a simple Django application running in two parts:
a :ref:`consumer <utilities-consumer>` (the thing that does the indexing) and
Paperless is a simple Django application running in two parts:
a :ref:`consumer <utilities-consumer>` (the thing that does the indexing) and
the :ref:`webserver <utilities-webserver>` (the part that lets you search & download
already-indexed documents). If you want to learn more about its functions keep on
already-indexed documents). If you want to learn more about its functions keep on
reading after the installation section.
@@ -19,8 +19,8 @@ Paper is a nightmare. Environmental issues aside, there's no excuse for it in
the 21st century. It takes up space, collects dust, doesn't support any form of
a search feature, indexing is tedious, it's heavy and prone to damage & loss.
I wrote this to make "going paperless" easier. I do not have to worry about
finding stuff again. I feed documents right from the post box into the scanner and
I wrote this to make "going paperless" easier. I do not have to worry about
finding stuff again. I feed documents right from the post box into the scanner and
then shred them. Perhaps you might find it useful too.
@@ -40,4 +40,5 @@ Contents
guesswork
migrating
troubleshooting
scanners
changelog

docs/scanners.rst (new file, 29 lines)
@@ -0,0 +1,29 @@
.. _scanners:
Scanner Recommendations
=======================
As Paperless operates by watching a folder for new files, it doesn't care what
scanner you use, but sometimes finding a scanner that will write to an FTP,
NFS, or SMB server can be difficult. This page is here to help you find one
that works right for you based on recommendations from other Paperless users.
+---------+----------------+-----+-----+-----+----------------+
| Brand   | Model          | Supports        | Recommended By |
+---------+----------------+-----+-----+-----+----------------+
|         |                | FTP | NFS | SMB |                |
+=========+================+=====+=====+=====+================+
| Brother | `ADS-1500W`_   | yes | no  | yes | `danielquinn`_ |
+---------+----------------+-----+-----+-----+----------------+
| Brother | `MFC-J6930DW`_ | yes |     |     | `ayounggun`_   |
+---------+----------------+-----+-----+-----+----------------+
| Fujitsu | `ix500`_       | yes |     | yes | `eonist`_      |
+---------+----------------+-----+-----+-----+----------------+
.. _ADS-1500W: https://www.brother.ca/en/p/ads1500w
.. _MFC-J6930DW: https://www.brother.ca/en/p/MFCJ6930DW
.. _ix500: http://www.fujitsu.com/us/products/computing/peripheral/scanners/scansnap/ix500/
.. _danielquinn: https://github.com/danielquinn
.. _ayounggun: https://github.com/ayounggun
.. _eonist: https://github.com/eonist

@@ -1,4 +1,4 @@
Django>=1.10,<1.11
Django>=1.11,<2.0
Pillow>=3.1.1
django-crispy-forms>=1.6.1
django-extensions>=1.7.6
@@ -21,6 +21,6 @@ pytest
pytest-django
pytest-sugar
pytest-env
pep8
pycodestyle
flake8
tox

@@ -25,16 +25,16 @@ set_permissions() {
echo "failed."
echo ""
echo "Either try to set it on your host-mounted directory"
echo "directly, or make sure that the directory has \`o+x\`"
echo "directly, or make sure that the directory has \`g+wx\`"
echo "permissions and the files in it at least \`o+r\`."
} >&2
chmod g+x "${!dir}" || {
chmod g+wx "${!dir}" || {
echo "Changing group permissions of ${cur_dir_name} directory:"
echo " ${!dir}"
echo "failed."
echo ""
echo "Either try to set it on your host-mounted directory"
echo "directly, or make sure that the directory has \`o+x\`"
echo "directly, or make sure that the directory has \`g+wx\`"
echo "permissions and the files in it at least \`o+r\`."
} >&2
done
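
The start-up script now runs `chmod g+wx` on the consumption and export directories: `w` lets group members create and delete files there, and `x` lets them traverse the directory. A minimal Python sketch of the same bit manipulation with `os.chmod()`, assuming a POSIX filesystem (on Windows, `os.chmod()` only toggles the read-only flag); `add_group_write_exec` is a hypothetical name.

```python
import os
import stat
import tempfile


def add_group_write_exec(path):
    """Rough equivalent of `chmod g+wx path`: add the group write and
    execute bits while leaving every other permission bit untouched."""
    current = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, current | stat.S_IWGRP | stat.S_IXGRP)


with tempfile.TemporaryDirectory() as consume_dir:
    os.chmod(consume_dir, 0o700)
    add_group_write_exec(consume_dir)
    print(oct(stat.S_IMODE(os.stat(consume_dir).st_mode)))  # 0o730 on POSIX
```

Reading the current mode first matters: passing a fixed mode such as `0o770` would also clear any bits the host had set deliberately.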

@@ -1,3 +1,5 @@
# coding=utf-8
import dateutil.parser
import logging
import os
@@ -89,7 +91,7 @@ class MatchingModel(models.Model):
search_kwargs = {"flags": re.IGNORECASE}
if self.matching_algorithm == self.MATCH_ALL:
for word in self.match.split(" "):
for word in self._split_match():
search_result = re.search(
r"\b{}\b".format(word), text, **search_kwargs)
if not search_result:
@@ -97,7 +99,7 @@ class MatchingModel(models.Model):
return True
if self.matching_algorithm == self.MATCH_ANY:
for word in self.match.split(" "):
for word in self._split_match():
if re.search(r"\b{}\b".format(word), text, **search_kwargs):
return True
return False
@@ -121,6 +123,21 @@ class MatchingModel(models.Model):
raise NotImplementedError("Unsupported matching algorithm")
def _split_match(self):
"""
Splits the match to individual keywords, getting rid of unnecessary
spaces and grouping quoted words together.
Example:
' some random words "with quotes " and spaces'
==>
["some", "random", "words", "with\s+quotes", "and", "spaces"]
"""
findterms = re.compile(r'"([^"]+)"|(\S+)').findall
normspace = re.compile(r"\s+").sub
return [normspace(r"\s+", (t[0] or t[1]).strip())
for t in findterms(self.match)]
def save(self, *args, **kwargs):
self.match = self.match.lower()
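
In `_split_match()` above, the replacement handed to `normspace()` is the string `r"\s+"`. `re.sub()` processes backslash escapes in string replacements: Python 3.5 and 3.6 emit a DeprecationWarning for the unknown `\s` escape, and Python 3.7 and later raise `re.error`, so the code works on the py34 to py36 envlist in tox but not on newer interpreters. A minimal sketch of a variant that sidesteps this with a callable replacement, whose return value is inserted verbatim; `split_match_portable` is a hypothetical name.

```python
import re

# The same two regexes that _split_match() builds.
FINDTERMS = re.compile(r'"([^"]+)"|(\S+)')
NORMSPACE = re.compile(r"\s+")


def split_match_portable(match):
    """Like _split_match(), but safe on Python 3.7+ (hypothetical name).

    A callable replacement is inserted verbatim, so the returned r"\\s+"
    is never parsed as a replacement-template escape.
    """
    return [
        NORMSPACE.sub(lambda _m: r"\s+", (quoted or bare).strip())
        for quoted, bare in FINDTERMS.findall(match)
    ]


print(split_match_portable(' some random  words "with   quotes " and spaces'))
# ['some', 'random', 'words', 'with\\s+quotes', 'and', 'spaces']
```

An equivalent fix is to collapse whitespace to a single space first and then call `.replace(" ", r"\s+")`, since `str.replace()` does no escape processing.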

@@ -1,6 +0,0 @@
{% load hacks %}
{# See documents.templatetags.hacks.change_list_results for an explanation #}
{% change_list_results %}

@@ -0,0 +1,13 @@
{% extends 'admin/change_form.html' %}
{% block footer %}
{{ block.super }}
{# Hack to force Django to make the created date a date input rather than `text` (the default) #}
<script>
django.jQuery(".field-created input").first().attr("type", "date")
</script>
{% endblock footer %}

@@ -0,0 +1,12 @@
{% extends 'admin/change_list.html' %}
{% load admin_actions from admin_list%}
{% load result_list from hacks %}
{% block result_list %}
{% if action_form and actions_on_top and cl.show_admin_actions %}{% admin_actions %}{% endif %}
{% result_list cl %}
{% if action_form and actions_on_bottom and cl.show_admin_actions %}{% admin_actions %}{% endif %}
{% endblock %}

@@ -29,18 +29,13 @@
.result .header {
padding: 5px;
background-color: #79AEC8;
height: 6em;
}
.result .header .checkbox {
margin-right: 5px;
}
.result .header .checkbox{
width: 5%;
float: left;
}
.result .header .info {
width: 90%;
float: left;
margin-left: 10%;
}
.result .header a,
.result a.tag {

@@ -6,5 +6,6 @@
<meta charset="utf-8">
</head>
<body>
{# One day someone (maybe even myself) is going to write a proper web front-end for Paperless, and this is where it'll start. #}
</body>
</html>

@@ -1,41 +1,28 @@
import os
from django.contrib import admin
from django.contrib.admin.templatetags.admin_list import (
result_headers,
result_hidden_fields,
results
)
from django.template import Library
from django.template.loader import get_template
from ..models import Document
register = Library()
@register.simple_tag(takes_context=True)
def change_list_results(context):
@register.inclusion_tag("admin/documents/document/change_list_results.html")
def result_list(cl):
"""
Django has a lot of places where you can override defaults, but
unfortunately, `change_list_results.html` is not one of them. In fact,
it's a downright pain in the ass to override this file on a per-model basis
and this is the cleanest way I could come up with.
Basically all we've done here is defined `change_list_results.html` in an
`admin` directory which globally overrides that file for *every* model.
That template however simply loads this templatetag which determines
whether we're currently looking at a `Document` listing or something else
and loads the appropriate file in each case.
Better work arounds for this are welcome as I hate this myself, but at the
moment, it's all I could come up with.
Copy/pasted from django.contrib.admin.templatetags.admin_list just so I can
modify the value passed to `.inclusion_tag()` in the decorator here. There
must be a cleaner way... right?
"""
path = os.path.join(
os.path.dirname(admin.__file__),
"templates",
"admin",
"change_list_results.html"
)
if context["cl"].model == Document:
path = "admin/documents/document/change_list_results.html"
return get_template(path).render(context)
headers = list(result_headers(cl))
num_sorted_fields = 0
for h in headers:
if h['sortable'] and h['sorted']:
num_sorted_fields += 1
return {'cl': cl,
'result_hidden_fields': list(result_hidden_fields(cl)),
'result_headers': headers,
'num_sorted_fields': num_sorted_fields,
'results': list(results(cl))}
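
The copied `result_list()` counts headers that are both sortable and currently sorted in an explicit loop. The same count fits in one generator expression. This sketch uses plain dicts shaped like the `headers` entries (only the `sortable` and `sorted` keys are assumed, and non-sortable entries may lack `sorted`), and the function name is hypothetical. Django's stock `change_list_results.html` appears to use `num_sorted_fields` to decide whether to show sort-priority numbers.

```python
def count_sorted_fields(headers):
    """Number of column headers that are sortable and currently sorted.

    `and` short-circuits, so headers without a "sorted" key are fine as
    long as their "sortable" value is False.
    """
    return sum(1 for h in headers if h["sortable"] and h["sorted"])


headers = [
    {"sortable": True, "sorted": True},
    {"sortable": True, "sorted": False},
    {"sortable": False},
    {"sortable": True, "sorted": True},
]
print(count_sorted_fields(headers))  # 2
```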

@@ -1,6 +1,6 @@
from random import randint
from django.test import TestCase
from django.test import TestCase, override_settings
from ..models import Correspondent, Document, Tag
from ..signals import document_consumption_finished
@@ -16,9 +16,15 @@ class TestMatching(TestCase):
matching_algorithm=getattr(klass, algorithm)
)
for string in true:
self.assertTrue(instance.matches(string))
self.assertTrue(
instance.matches(string),
'"%s" should match "%s" but it does not' % (text, string)
)
for string in false:
self.assertFalse(instance.matches(string))
self.assertFalse(
instance.matches(string),
'"%s" should not match "%s" but it does' % (text, string)
)
def test_match_all(self):
@@ -54,6 +60,21 @@ class TestMatching(TestCase):
)
)
self._test_matching(
'brown fox "lazy dogs"',
"MATCH_ALL",
(
"the quick brown fox jumped over the lazy dogs",
"the quick brown fox jumped over the lazy dogs",
),
(
"the quick fox jumped over the lazy dogs",
"the quick brown wolf jumped over the lazy dogs",
"the quick brown fox jumped over the fat dogs",
"the quick brown fox jumped over the lazy... dogs",
)
)
def test_match_any(self):
self._test_matching(
@@ -89,6 +110,18 @@ class TestMatching(TestCase):
)
)
self._test_matching(
'"brown fox" " lazy dogs "',
"MATCH_ANY",
(
"the quick brown fox",
"jumped over the lazy dogs.",
),
(
"the lazy fox jumped over the brown dogs",
)
)
def test_match_literal(self):
self._test_matching(
@@ -166,7 +199,8 @@ class TestMatching(TestCase):
)
class TestApplications(TestCase):
@override_settings(POST_CONSUME_SCRIPT=None)
class TestDocumentConsumptionFinishedSignal(TestCase):
"""
We make use of document_consumption_finished, so we should test that it's
doing what we expect wrt to tag & correspondent matching.
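
The updated `_test_matching()` helper attaches a message to each assertion so that a failure names the offending string. `unittest`'s `subTest()` context manager (Python 3.4+) is an alternative: with unittest's own runner it also keeps going after a failure and reports each failing input separately (pytest's handling of subtests differs and may need a plugin). A minimal sketch, with `naive_matches` as a hypothetical stand-in for `MatchingModel.matches()`:

```python
import unittest


def naive_matches(match, text):
    # Hypothetical stand-in for MatchingModel.matches(): every word present.
    return all(word in text.split() for word in match.split())


class MatchingSubTests(unittest.TestCase):
    def test_match_all(self):
        match = "brown fox"
        for text in ("the quick brown fox", "a fox that is brown"):
            # Each subTest is reported with its own `text` parameter.
            with self.subTest(text=text):
                self.assertTrue(naive_matches(match, text))
        for text in ("the quick fox", "a brown dog"):
            with self.subTest(text=text):
                self.assertFalse(naive_matches(match, text))


if __name__ == "__main__":
    unittest.main()
```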

@@ -1,13 +1,17 @@
from django.conf import settings
from django.conf.urls import url, static, include
from django.conf.urls import include, static, url
from django.contrib import admin
from django.views.decorators.csrf import csrf_exempt
from django.views.generic import RedirectView
from rest_framework.routers import DefaultRouter
from documents.views import (
FetchView, PushView,
CorrespondentViewSet, TagViewSet, DocumentViewSet, LogViewSet
CorrespondentViewSet,
DocumentViewSet,
FetchView,
LogViewSet,
PushView,
TagViewSet
)
from reminders.views import ReminderViewSet
@@ -39,7 +43,9 @@ urlpatterns = [
# The Django admin
url(r"admin/", admin.site.urls),
url(r"", admin.site.urls), # This is going away
# Catch all redirect back to /admin
url(r"", RedirectView.as_view(permanent=True, url="/admin/")),
] + static.static(settings.MEDIA_URL, document_root=settings.MEDIA_ROOT)
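
The new catch-all only fires when nothing earlier in `urlpatterns` resolves the path, because Django tries patterns in order and an `include()`d URLconf falls through when none of its own patterns match. The toy resolver below models that behaviour with plain regexes. It is a simplified sketch, not Django's resolver, and the route names and admin sub-routes are hypothetical.

```python
import re

# Toy URLconf (hypothetical names). A list target stands in for include():
# its own patterns are tried on whatever follows the matched prefix.
ADMIN_ROUTES = [
    (re.compile(r"^$"), "admin index"),
    (re.compile(r"^documents/document/$"), "document list"),
]
URLCONF = [
    (re.compile(r"admin/"), ADMIN_ROUTES),  # unanchored, like url(r"admin/", ...)
    (re.compile(r""), ADMIN_ROUTES),  # the include marked "This is going away"
    (re.compile(r""), "redirect to /admin/"),  # catch-all, so it must stay last
]


def resolve(path, patterns=URLCONF):
    """Return the first target whose pattern matches path (no leading slash)."""
    for regex, target in patterns:
        match = regex.search(path)
        if match is None:
            continue
        if isinstance(target, list):
            # Like include(): fall through to the next entry when none of
            # the nested patterns match the rest of the path.
            found = resolve(path[match.end():], target)
            if found is not None:
                return found
            continue
        return target
    return None


for path in ("admin/", "documents/document/", "no/such/page/"):
    print(repr(path), "->", resolve(path))
```

In this model the unanchored `admin/` pattern also matches `foo/admin/`, because the regex is searched rather than anchored. Django 1.11's regex-based `url()` appears to behave the same way, which is why URLconfs usually start patterns with `^`.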

@@ -1 +1 @@
__version__ = (0, 6, 1)
__version__ = (1, 0, 0)

@@ -5,7 +5,7 @@ from .parsers import RasterisedDocumentParser
class ConsumerDeclaration(object):
MATCHING_FILES = re.compile("^.*\.(pdf|jpg|gif|png|tiff?|pnm|bmp)$")
MATCHING_FILES = re.compile("^.*\.(pdf|jpe?g|gif|png|tiff?|pnm|bmp)$")
@classmethod
def handle(cls, sender, **kwargs):
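
`ConsumerDeclaration` appears to use `MATCHING_FILES` to decide which files `RasterisedDocumentParser` will accept. `jpe?g` makes the "e" optional, so both `.jpg` and `.jpeg` now pass. The sketch below exercises a copy of the pattern under two assumptions: the filename is lower-cased before matching (the mixed-case suffixes in the signals test imply some case normalization, but the body of `handle()` isn't shown here), and `is_consumable` is a hypothetical helper. It also writes the pattern as a raw string, because the original `"^.*\."` relies on Python passing the unknown `\.` string escape through, which Python 3.6 and later flag with a warning.

```python
import re

# Raw string: "\." reaches the regex engine untouched, without relying on
# Python passing an unknown string escape through.
MATCHING_FILES = re.compile(r"^.*\.(pdf|jpe?g|gif|png|tiff?|pnm|bmp)$")


def is_consumable(filename):
    """Hypothetical helper: lower-case first so "SCAN.JPEG" also matches."""
    return MATCHING_FILES.match(filename.lower()) is not None


for name in ("scan.JPEG", "scan.jpg", "scan.TIF", "scan.jpgx", "notes.txt"):
    print(name, is_consumable(name))
```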

@@ -12,9 +12,9 @@ class SignalsTestCase(TestCase):
"A document with a . in it", "Doc with -- in it"
)
suffixes = (
"pdf", "jpg", "gif", "png", "tiff", "tif", "pnm", "bmp",
"PDF", "JPG", "GIF", "PNG", "TIFF", "TIF", "PNM", "BMP",
"pDf", "jPg", "gIf", "pNg", "tIff", "tIf", "pNm", "bMp",
"pdf", "jpg", "jpeg", "gif", "png", "tiff", "tif", "pnm", "bmp",
"PDF", "JPG", "JPEG", "GIF", "PNG", "TIFF", "TIF", "PNM", "BMP",
"pDf", "jPg", "jpEg", "gIf", "pNg", "tIff", "tIf", "pNm", "bMp",
)
for prefix in prefixes:

@@ -1,5 +1,6 @@
[pytest]
DJANGO_SETTINGS_MODULE=paperless.settings
addopts = --pythonwarnings=all
env =
PAPERLESS_CONSUME=/tmp
PAPERLESS_PASSPHRASE=THISISNOTASECRET
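
`addopts = --pythonwarnings=all` makes pytest report every warning raised during the run. pytest documents this option as the equivalent of Python's own `-W`, and `all` is accepted there as an alias of the `always` action. The sketch below applies the same filter with the standard `warnings` module, so that a DeprecationWarning (hidden by Python's default filters in most contexts) is recorded on every occurrence. `legacy_helper` is a hypothetical function.

```python
import warnings


def legacy_helper():
    warnings.warn("legacy_helper() is deprecated", DeprecationWarning)
    return 42


# "always" (alias "all") shows every warning, even repeats from the same
# line that the default "once per location" behaviour would suppress.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy_helper()
    legacy_helper()

print(len(caught), caught[0].category.__name__)  # 2 DeprecationWarning
```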

@@ -5,15 +5,18 @@
[tox]
skipsdist = True
envlist = py34, py35, py36, pep8
envlist = py34, py35, py36, pycodestyle
[testenv]
commands = pytest
deps = -r{toxinidir}/../requirements.txt
[testenv:pep8]
commands=pep8
deps=pep8
[testenv:pycodestyle]
commands=pycodestyle
deps=pycodestyle
[pep8]
exclude=.tox,migrations,paperless/settings.py
[pycodestyle]
exclude=
.tox,
migrations,
paperless/settings.py