Merge branch 'dev' into feature-bulk-edit

commit 80420a99f5

README.md · 26 changed lines

@@ -1,11 +1,12 @@
[Travis CI](https://travis-ci.org/jonaswinkler/paperless-ng)
[Read the Docs](https://paperless-ng.readthedocs.io/en/latest/?badge=latest)
[Gitter](https://gitter.im/paperless-ng/community?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
[Docker Hub](https://hub.docker.com/r/jonaswinkler/paperless-ng)
-[Coveralls](https://coveralls.io/github/jonaswinkler/paperless-ng?branch=master)
+[Coveralls](https://coveralls.io/github/jonaswinkler/paperless-ng?branch=master) *<sup><-- green badge, yay :)</sup>*

# Paperless-ng

-[Paperless](https://github.com/the-paperless-project/paperless) is an application by Daniel Quinn and others that indexes your scanned documents and allows you to easily search for documents and store metadata alongside your documents.
+[Paperless](https://github.com/the-paperless-project/paperless) is an application by Daniel Quinn and contributors that indexes your scanned documents and allows you to easily search for documents and store metadata alongside your documents.

Paperless-ng is a fork of the original project, adding a new interface and many other changes under the hood. For a detailed list of changes, have a look at the changelog in the documentation.

@@ -39,14 +40,13 @@ Here's what you get:
  * Auto completion suggests relevant words from your documents.
  * Results are sorted by relevance to your search query.
  * Highlighting shows you which parts of the document matched the query.
  * Searching for similar documents ("More like this")
* Email processing: Paperless adds documents from your email accounts.
  * Configure multiple accounts and filters for each account.
  * When adding documents from emails, paperless can move these emails to a new folder, mark them as read, flag them, or delete them.
* Machine learning powered document matching (see the sketch below).
  * Paperless learns from your documents and will be able to automatically assign tags, correspondents and types to documents once you've stored a few documents in paperless.
* A task processor that processes documents in parallel and also tells you when something goes wrong. On modern multi-core systems, consumption is blazing fast.
* Code cleanup in many, MANY areas. Some of the code from OG paperless was just overly complicated.
* More tests, more stability.
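
The document matching mentioned above is, at its core, text classification over the OCR'd text. As a rough illustration of the idea only (not paperless-ng's actual classifier), here is a toy sketch; the scikit-learn pipeline and the sample data are assumptions made for this example:

```python
# Toy sketch: learn tag assignments from previously tagged documents.
# Library choice (scikit-learn) and the training data are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical OCR'd text of already-tagged documents and the tags a user chose.
texts = [
    "invoice number 4711 total amount due 99.00 EUR",
    "your monthly phone bill is attached",
    "employment contract between the undersigned parties",
]
tags = ["invoice", "invoice", "contract"]

classifier = make_pipeline(TfidfVectorizer(), MultinomialNB())
classifier.fit(texts, tags)

# Suggest a tag for a newly consumed document.
print(classifier.predict(["invoice 4712, amount due 120.00 EUR"])[0])
```

Paperless-ng applies the same idea to tags, correspondents and document types once enough tagged documents exist.
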
If you want to see some screenshots of paperless-ng in action, [some are available in the documentation](https://paperless-ng.readthedocs.io/en/latest/screenshots.html).

@@ -54,10 +54,7 @@ For a complete list of changes from paperless, check out the [changelog](https:/

# Roadmap for 1.0

- **Bulk editing**. Add/remove metadata from multiple documents at once.

- Make the front end nice (except mobile).
- Test coverage at 90%.
- Fix whatever bugs I and you find.

## Roadmap for versions beyond 1.0

@@ -66,13 +63,13 @@ These are things that I want to add to paperless eventually. They are sorted by

- **More search.** The search backend is incredibly versatile and customizable. Searching is the most important feature of this project and thus, I want to implement things like the following (see the sketch after this list):
  - Group and limit search results by correspondent, show “more from this” links in the results.
  - Ability to search for “Similar documents” in the search results.
- **Nested tags**. Organize tags in a hierarchical structure. This will combine the benefits of folders and tags in one coherent system.
- **Localization.** I won't translate paperless into any languages other than English and German; however, I'll add the necessary means so that anyone can translate paperless into their favorite language.
- **An interactive consumer** that shows its progress for documents it processes on the web page.
-  - With live updates ans websockets. This already works on a dev branch, but requires a lot of new dependencies, which I'm not particular happy about.
+  - With live updates and websockets. This already works on a dev branch, but requires a lot of new dependencies, which I'm not particularly happy about.
  - Notifications when a document was added with buttons to open the new document right away.
- **Arbitrary tag colors**. Allow the selection of any color with a color picker.
-- **More file types**. Possibly allow more file types to be processed by paperless, such as office .odt, .doc, .docx documents.
+- **More file types**. Possibly allow more file types to be processed by paperless, such as office .odt, .doc and .docx documents.
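
The search backend in question is Whoosh (the Python hunks below import `whoosh.writing.AsyncWriter`), and Whoosh ships a more-like-this helper that could back the “similar documents” idea. The sketch below is a rough illustration only; the schema, field names and sample data are assumptions and not paperless-ng's actual index:

```python
# Sketch only: "More like this" with Whoosh. The schema and data are made up.
from whoosh.fields import ID, TEXT, Schema
from whoosh.filedb.filestore import RamStorage
from whoosh.qparser import QueryParser

schema = Schema(id=ID(stored=True, unique=True), content=TEXT(stored=True))
ix = RamStorage().create_index(schema)

writer = ix.writer()
writer.add_document(id="1", content="invoice for october phone bill")
writer.add_document(id="2", content="invoice for november phone bill")
writer.add_document(id="3", content="employment contract")
writer.commit()

with ix.searcher() as searcher:
    hits = searcher.search(QueryParser("content", schema).parse("october"))
    # Ask Whoosh for documents whose stored 'content' resembles the first hit.
    for similar in hits[0].more_like_this("content"):
        print(similar["id"], similar["content"])
```

Grouping results by correspondent and “more from this” links could similarly build on Whoosh's faceting support, but that is beyond this sketch.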

Apart from that, paperless is pretty much feature complete.

@@ -80,6 +77,15 @@ Apart from that, paperless is pretty much feature complete.

- **GnuPG encryption.** [Here's a note about encryption in paperless](https://paperless-ng.readthedocs.io/en/latest/administration.html#managing-encryption). The gist of it is that I don't see which attacks this implementation protects against. It gives a false sense of security to users who don't care about how it works.

## Won't-do list.

These features will probably never make it into paperless, since paperless is meant to be an easy-to-use, set-and-forget solution.

- **Document versions.** I might consider adding the ability to update a document with a newer version, but that's about it. The kind of documents that get added to paperless usually don't change at all.
- **Workflows.** I don't see a use case for these yet.
- **Folders.** Tags are superior in just about every way.
- **Apps / extension support.** Again, paperless is meant to be simple.

# Getting started

The recommended way to deploy paperless is docker-compose. Don't clone the repository; grab the latest release instead. The dockerfiles archive contains just the docker files, which pull the image from Docker Hub. The source archive contains everything you need to build the docker image yourself (e.g. if you want to run it on a Raspberry Pi).

@@ -1,4 +1,4 @@
-bind = '127.0.0.1:8000'
+bind = '[::]:8000'
backlog = 2048
workers = 3
worker_class = 'sync'

@@ -8,7 +8,7 @@ loglevel=info ; log level; default info; others: debug,warn,trace
user=root

[program:gunicorn]
-command=gunicorn -c /usr/src/paperless/gunicorn.conf.py -b 0.0.0.0:8000 paperless.wsgi
+command=gunicorn -c /usr/src/paperless/gunicorn.conf.py -b '[::]:8000' paperless.wsgi
user=paperless

stdout_logfile=/dev/stdout

@@ -100,3 +100,13 @@ body {
    padding-top: 1px;
  }
}

@supports (-webkit-touch-callout: none) {
  input[type="number"],
  input[type="search"],
  input[type="text"],
  select:focus,
  textarea {
    font-size: 16px;
  }
}

@@ -2,7 +2,9 @@ import itertools

from django.db.models import Q
from django_q.tasks import async_task
from whoosh.writing import AsyncWriter

from documents import index
from documents.models import Document, Correspondent, DocumentType

@@ -15,6 +17,11 @@ def set_correspondent(doc_ids, correspondent):
    affected_docs = [doc.id for doc in qs]
    qs.update(correspondent=correspondent)

    async_task(
        "documents.tasks.bulk_index_documents",
        document_ids=affected_docs
    )

    async_task("documents.tasks.bulk_rename_files", document_ids=affected_docs)

    return "OK"

@@ -29,6 +36,11 @@ def set_document_type(doc_ids, document_type):
    affected_docs = [doc.id for doc in qs]
    qs.update(document_type=document_type)

    async_task(
        "documents.tasks.bulk_index_documents",
        document_ids=affected_docs
    )

    async_task("documents.tasks.bulk_rename_files", document_ids=affected_docs)

    return "OK"

@@ -46,6 +58,11 @@ def add_tag(doc_ids, tag):
            document_id=doc, tag_id=tag) for doc in affected_docs
    ])

    async_task(
        "documents.tasks.bulk_index_documents",
        document_ids=affected_docs
    )

    async_task("documents.tasks.bulk_rename_files", document_ids=affected_docs)

    return "OK"

@@ -63,6 +80,11 @@ def remove_tag(doc_ids, tag):
        Q(tag_id=tag)
    ).delete()

    async_task(
        "documents.tasks.bulk_index_documents",
        document_ids=affected_docs
    )

    async_task("documents.tasks.bulk_rename_files", document_ids=affected_docs)

    return "OK"

@@ -92,4 +114,9 @@ def modify_tags(doc_ids, add_tags, remove_tags):
def delete(doc_ids):
    Document.objects.filter(id__in=doc_ids).delete()

    ix = index.open_index()
    with AsyncWriter(ix) as writer:
        for id in doc_ids:
            index.remove_document_by_id(writer, id)

    return "OK"

@@ -87,11 +87,6 @@ def open_index(recreate=False):


def update_document(writer, doc):
    # TODO: this line caused many issues all around, since:
    # We need to make sure that this method does not get called with
    # deserialized documents (i.e, document objects that don't come from
    # Django's ORM interfaces directly.
    logger.debug("Indexing {}...".format(doc))
    tags = ",".join([t.name for t in doc.tags.all()])
    writer.update_document(
        id=doc.pk,

@@ -107,9 +102,11 @@ def update_document(writer, doc):


def remove_document(writer, doc):
    # TODO: see above.
    logger.debug("Removing {} from index...".format(doc))
-    writer.delete_by_term('id', doc.pk)
+    remove_document_by_id(writer, doc.pk)


def remove_document_by_id(writer, doc_id):
    writer.delete_by_term('id', doc_id)


def add_or_update_document(document):

@@ -94,3 +94,12 @@ def bulk_rename_files(document_ids):
    qs = Document.objects.filter(id__in=document_ids)
    for doc in qs:
        post_save.send(Document, instance=doc, created=False)


def bulk_index_documents(document_ids):
    documents = Document.objects.filter(id__in=document_ids)

    ix = index.open_index()
    with AsyncWriter(ix) as writer:
        for doc in documents:
            index.update_document(writer, doc)
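
For orientation: the `async_task(...)` calls in the bulk edit hunks above name these task functions by their dotted path, and django-q imports and runs them on a worker process. A minimal sketch of that round trip, assuming a configured Django project with a running django-q cluster; the document IDs below are placeholders:

```python
# Sketch only: how the bulk actions hand work off to django-q (placeholder IDs).
from django_q.tasks import async_task, result

# Queue the index update the same way bulk_edit.set_correspondent() does;
# django-q resolves the dotted path and calls the function on a worker.
task_id = async_task(
    "documents.tasks.bulk_index_documents",
    document_ids=[1, 2, 3],
)

# Optionally block for up to 500 ms and fetch the outcome; bulk_index_documents
# returns None, so this mainly confirms the task has been processed.
print(result(task_id, wait=500))
```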

@@ -699,49 +699,49 @@ class TestBulkEdit(DirectoriesMixin, APITestCase):
        self.assertEqual(Document.objects.filter(correspondent=self.c2).count(), 1)
        bulk_edit.set_correspondent([self.doc1.id, self.doc2.id, self.doc3.id], self.c2.id)
        self.assertEqual(Document.objects.filter(correspondent=self.c2).count(), 3)
-        self.async_task.assert_called_once()
-        args, kwargs = self.async_task.call_args
-        self.assertCountEqual(kwargs['document_ids'], [self.doc1.id, self.doc2.id])
+        self.assertEqual(self.async_task.call_count, 2)
+        self.assertCountEqual(self.async_task.call_args_list[0][1]['document_ids'], [self.doc1.id, self.doc2.id])
+        self.assertCountEqual(self.async_task.call_args_list[0][1]['document_ids'], [self.doc1.id, self.doc2.id])

    def test_unset_correspondent(self):
        self.assertEqual(Document.objects.filter(correspondent=self.c2).count(), 1)
        bulk_edit.set_correspondent([self.doc1.id, self.doc2.id, self.doc3.id], None)
        self.assertEqual(Document.objects.filter(correspondent=self.c2).count(), 0)
-        self.async_task.assert_called_once()
-        args, kwargs = self.async_task.call_args
-        self.assertCountEqual(kwargs['document_ids'], [self.doc2.id, self.doc3.id])
+        self.assertEqual(self.async_task.call_count, 2)
+        self.assertCountEqual(self.async_task.call_args_list[0][1]['document_ids'], [self.doc2.id, self.doc3.id])
+        self.assertCountEqual(self.async_task.call_args_list[0][1]['document_ids'], [self.doc2.id, self.doc3.id])

    def test_set_document_type(self):
        self.assertEqual(Document.objects.filter(document_type=self.dt2).count(), 1)
        bulk_edit.set_document_type([self.doc1.id, self.doc2.id, self.doc3.id], self.dt2.id)
        self.assertEqual(Document.objects.filter(document_type=self.dt2).count(), 3)
-        self.async_task.assert_called_once()
-        args, kwargs = self.async_task.call_args
-        self.assertCountEqual(kwargs['document_ids'], [self.doc1.id, self.doc2.id])
+        self.assertEqual(self.async_task.call_count, 2)
+        self.assertCountEqual(self.async_task.call_args_list[0][1]['document_ids'], [self.doc1.id, self.doc2.id])
+        self.assertCountEqual(self.async_task.call_args_list[0][1]['document_ids'], [self.doc1.id, self.doc2.id])

    def test_unset_document_type(self):
        self.assertEqual(Document.objects.filter(document_type=self.dt2).count(), 1)
        bulk_edit.set_document_type([self.doc1.id, self.doc2.id, self.doc3.id], None)
        self.assertEqual(Document.objects.filter(document_type=self.dt2).count(), 0)
-        self.async_task.assert_called_once()
-        args, kwargs = self.async_task.call_args
-        self.assertCountEqual(kwargs['document_ids'], [self.doc2.id, self.doc3.id])
+        self.assertEqual(self.async_task.call_count, 2)
+        self.assertCountEqual(self.async_task.call_args_list[0][1]['document_ids'], [self.doc2.id, self.doc3.id])
+        self.assertCountEqual(self.async_task.call_args_list[0][1]['document_ids'], [self.doc2.id, self.doc3.id])

    def test_add_tag(self):
        self.assertEqual(Document.objects.filter(tags__id=self.t1.id).count(), 2)
        bulk_edit.add_tag([self.doc1.id, self.doc2.id, self.doc3.id, self.doc4.id], self.t1.id)
        self.assertEqual(Document.objects.filter(tags__id=self.t1.id).count(), 4)
-        self.async_task.assert_called_once()
-        args, kwargs = self.async_task.call_args
-        self.assertCountEqual(kwargs['document_ids'], [self.doc1.id, self.doc3.id])
+        self.assertEqual(self.async_task.call_count, 2)
+        self.assertCountEqual(self.async_task.call_args_list[0][1]['document_ids'], [self.doc1.id, self.doc3.id])
+        self.assertCountEqual(self.async_task.call_args_list[0][1]['document_ids'], [self.doc1.id, self.doc3.id])

    def test_remove_tag(self):
        self.assertEqual(Document.objects.filter(tags__id=self.t1.id).count(), 2)
        bulk_edit.remove_tag([self.doc1.id, self.doc3.id, self.doc4.id], self.t1.id)
        self.assertEqual(Document.objects.filter(tags__id=self.t1.id).count(), 1)
-        self.async_task.assert_called_once()
-        args, kwargs = self.async_task.call_args
-        self.assertCountEqual(kwargs['document_ids'], [self.doc4.id])
+        self.assertEqual(self.async_task.call_count, 2)
+        self.assertCountEqual(self.async_task.call_args_list[0][1]['document_ids'], [self.doc4.id])
+        self.assertCountEqual(self.async_task.call_args_list[0][1]['document_ids'], [self.doc4.id])

    def test_modify_tags(self):
        tag_unrelated = Tag.objects.create(name="unrelated")

@@ -350,7 +350,7 @@ class TestConsumer(DirectoriesMixin, TestCase):
        try:
            self.consumer.try_consume_file(self.get_test_file())
        except ConsumerError as e:
-            self.assertTrue("No parsers abvailable for" in str(e))
+            self.assertEqual("Unsupported mime type application/pdf of file sample.pdf", str(e))
            return

        self.fail("Should throw exception")