From c4286d0a481e17b6d3bed4056c4462fba1005560 Mon Sep 17 00:00:00 2001
From: Jonas Winkler
Date: Tue, 22 Dec 2020 16:33:41 +0100
Subject: [PATCH 01/10] Update README.md

---
 README.md | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/README.md b/README.md
index 32ff2ab4a..526644685 100644
--- a/README.md
+++ b/README.md
@@ -39,14 +39,13 @@ Here's what you get:
     * Auto completion suggests relevant words from your documents.
     * Results are sorted by relevance to your search query.
     * Highlighting shows you which parts of the document matched the query.
+    * Searching for similar documents ("More like this")
 * Email processing: Paperless adds documents from your email accounts.
     * Configure multiple accounts and filters for each account.
     * When adding documents from mails, paperless can move these mails to a new folder, mark them as read, flag them or delete them.
 * Machine learning powered document matching.
     * Paperless learns from your documents and will be able to automatically assign tags, correspondents and types to documents once you've stored a few documents in paperless.
 * A task processor that processes documents in parallel and also tells you when something goes wrong. On modern multi core systems, consumption is blazing fast.
-* Code cleanup in many, MANY areas. Some of the code from OG paperless was just overly complicated.
-* More tests, more stability.
 
 If you want to see some screenshots of paperless-ng in action, [some are available in the documentation](https://paperless-ng.readthedocs.io/en/latest/screenshots.html).
 
@@ -54,10 +53,7 @@ For a complete list of changes from paperless, check out the [changelog](https:/
 
 # Roadmap for 1.0
 
-- **Bulk editing**. Add/remove metadata from multiple documents at once.
-
 - Make the front end nice (except mobile).
-- Test coverage at 90%.
 - Fix whatever bugs I and you find.
 
 ## Roadmap for versions beyond 1.0
@@ -66,7 +62,6 @@ These are things that I want to add to paperless eventually. They are sorted by
 
 - **More search.** The search backend is incredibly versatile and customizable. Searching is the most important feature of this project and thus, I want to implement things like:
     - Group and limit search results by correspondent, show “more from this” links in the results.
-    - Ability to search for “Similar documents” in the search results
 - **Nested tags**. Organize tags in a hierarchical structure. This will combine the benefits of folders and tags in one coherent system.
 - **An interactive consumer** that shows its progress for documents it processes on the web page.
     - With live updates ans websockets. This already works on a dev branch, but requires a lot of new dependencies, which I'm not particular happy about.

From 7d676a75a84dbf1e82a044c0a1dd5569daa31537 Mon Sep 17 00:00:00 2001
From: Jonas Winkler
Date: Tue, 22 Dec 2020 17:26:07 +0100
Subject: [PATCH 02/10] Update README.md

---
 README.md | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 526644685..978684d76 100644
--- a/README.md
+++ b/README.md
@@ -5,7 +5,7 @@
 
 # Paperless-ng
 
-[Paperless](https://github.com/the-paperless-project/paperless) is an application by Daniel Quinn and others that indexes your scanned documents and allows you to easily search for documents and store metadata alongside your documents.
+[Paperless](https://github.com/the-paperless-project/paperless) is an application by Daniel Quinn and contributors that indexes your scanned documents and allows you to easily search for documents and store metadata alongside your documents.
 
 Paperless-ng is a fork of the original project, adding a new interface and many other changes under the hood. For a detailed list of changes, have a look at the changelog in the documentation.
 
@@ -64,10 +64,10 @@ These are things that I want to add to paperless eventually. They are sorted by
     - Group and limit search results by correspondent, show “more from this” links in the results.
 - **Nested tags**. Organize tags in a hierarchical structure. This will combine the benefits of folders and tags in one coherent system.
 - **An interactive consumer** that shows its progress for documents it processes on the web page.
-    - With live updates ans websockets. This already works on a dev branch, but requires a lot of new dependencies, which I'm not particular happy about.
+    - With live updates and websockets. This already works on a dev branch, but requires a lot of new dependencies, which I'm not particularly happy about.
     - Notifications when a document was added with buttons to open the new document right away.
 - **Arbitrary tag colors**. Allow the selection of any color with a color picker.
-- **More file types**. Possibly allow more file types to be processed by paperless, such as office .odt, .doc, .docx documents.
+- **More file types**. Possibly allow more file types to be processed by paperless, such as office .odt, .doc and .docx documents.
 
 Apart from that, paperless is pretty much feature complete.
 
@@ -75,6 +75,15 @@ Apart from that, paperless is pretty much feature complete.
 
 - **GnuPG encrypion.** [Here's a note about encryption in paperless](https://paperless-ng.readthedocs.io/en/latest/administration.html#managing-encryption). The gist of it is that I don't see which attacks this implementation protects against. It gives a false sense of security to users who don't care about how it works.
 
+## Won't-do list
+
+These features will probably never make it into paperless, since paperless is meant to be an easy-to-use, set-and-forget solution.
+
+- **Document versions.** I might consider adding the ability to update a document with a newer version, but that's about it. The kind of documents that get added to paperless usually don't change at all.
+- **Workflows.** I don't see a use case for these yet.
+- **Folders.** Tags are superior in just about every way.
+- **Apps / extension support.** Again, paperless is meant to be simple.
+
 # Getting started
 
 The recommended way to deploy paperless is docker-compose. Don't clone the repository, grab the latest release to get started instead. The dockerfiles archive contains just the docker files which will pull the image from docker hub. The source archive contains everything you need to build the docker image yourself (i.e. if you want to run on Raspberry Pi).

From 9c2c74ad2b67bdfcf8c2b74a76fd82ff2ff9e5f9 Mon Sep 17 00:00:00 2001
From: Jonas Winkler
Date: Wed, 23 Dec 2020 01:51:38 +0100
Subject: [PATCH 03/10] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 978684d76..214f90b6f 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@
 [![Build Status](https://travis-ci.org/jonaswinkler/paperless-ng.svg?branch=master)](https://travis-ci.org/jonaswinkler/paperless-ng)
 [![Documentation Status](https://readthedocs.org/projects/paperless-ng/badge/?version=latest)](https://paperless-ng.readthedocs.io/en/latest/?badge=latest)
 [![Docker Hub Pulls](https://img.shields.io/docker/pulls/jonaswinkler/paperless-ng.svg)](https://hub.docker.com/r/jonaswinkler/paperless-ng)
-[![Coverage Status](https://coveralls.io/repos/github/jonaswinkler/paperless-ng/badge.svg?branch=master)](https://coveralls.io/github/jonaswinkler/paperless-ng?branch=master)
+[![Coverage Status](https://coveralls.io/repos/github/jonaswinkler/paperless-ng/badge.svg?branch=master)](https://coveralls.io/github/jonaswinkler/paperless-ng?branch=master) *<-- green badge, yay :)*
 
 # Paperless-ng
 

From 6589369e1b62505d79801ca8d496e6e6c6950151 Mon Sep 17 00:00:00 2001
From: Jonas Winkler
Date: Wed, 23 Dec 2020 02:29:58 +0100
Subject: [PATCH 04/10] Update README.md

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 214f90b6f..1806fd4bb 100644
--- a/README.md
+++ b/README.md
@@ -63,6 +63,7 @@ These are things that I want to add to paperless eventually. They are sorted by
 - **More search.** The search backend is incredibly versatile and customizable. Searching is the most important feature of this project and thus, I want to implement things like:
     - Group and limit search results by correspondent, show “more from this” links in the results.
 - **Nested tags**. Organize tags in a hierarchical structure. This will combine the benefits of folders and tags in one coherent system.
+- **Localization.** I won't translate paperless into any languages other than English and German; however, I'll add the necessary means so that anyone can translate paperless into their favorite language.
 - **An interactive consumer** that shows its progress for documents it processes on the web page.
     - With live updates and websockets. This already works on a dev branch, but requires a lot of new dependencies, which I'm not particularly happy about.
     - Notifications when a document was added with buttons to open the new document right away.

From c54d26ed195c12a666480111855b7ac34252161a Mon Sep 17 00:00:00 2001
From: Jonas Winkler
Date: Wed, 23 Dec 2020 16:09:13 +0100
Subject: [PATCH 05/10] Update README.md

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 1806fd4bb..7a5f6028d 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,6 @@
 [![Build Status](https://travis-ci.org/jonaswinkler/paperless-ng.svg?branch=master)](https://travis-ci.org/jonaswinkler/paperless-ng)
 [![Documentation Status](https://readthedocs.org/projects/paperless-ng/badge/?version=latest)](https://paperless-ng.readthedocs.io/en/latest/?badge=latest)
+[![Gitter](https://badges.gitter.im/paperless-ng/community.svg)](https://gitter.im/paperless-ng/community?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
 [![Docker Hub Pulls](https://img.shields.io/docker/pulls/jonaswinkler/paperless-ng.svg)](https://hub.docker.com/r/jonaswinkler/paperless-ng)
 [![Coverage Status](https://coveralls.io/repos/github/jonaswinkler/paperless-ng/badge.svg?branch=master)](https://coveralls.io/github/jonaswinkler/paperless-ng?branch=master) *<-- green badge, yay :)*
 

From 37caf6a64a6eaa96dba6cd5ed5d0e6cd7f70dc2f Mon Sep 17 00:00:00 2001
From: Andrew Rowson
Date: Thu, 24 Dec 2020 11:53:20 +0000
Subject: [PATCH 06/10] Gunicorn should bind to both ipv4 and ipv6

As per https://docs.gunicorn.org/en/stable/settings.html#bind
---
 docker/supervisord.conf | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docker/supervisord.conf b/docker/supervisord.conf
index ebe0f005d..ff3ed4311 100644
--- a/docker/supervisord.conf
+++ b/docker/supervisord.conf
@@ -8,7 +8,7 @@ loglevel=info ; log level; default info; others: debug,warn,trace
 user=root
 
 [program:gunicorn]
-command=gunicorn -c /usr/src/paperless/gunicorn.conf.py -b 0.0.0.0:8000 paperless.wsgi
+command=gunicorn -c /usr/src/paperless/gunicorn.conf.py -b '[::]:8000' paperless.wsgi
 user=paperless
 
 stdout_logfile=/dev/stdout

From 3f8c74c4af1c3e15670425ddfd06ff36c6b51e13 Mon Sep 17 00:00:00 2001
From: Andrew Rowson
Date: Sat, 26 Dec 2020 11:53:29 +0000
Subject: [PATCH 07/10] Updated bind param gunicorn config file to listen on ipv6

---
 docker/gunicorn.conf.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docker/gunicorn.conf.py b/docker/gunicorn.conf.py
index a2f456079..88d881664 100644
--- a/docker/gunicorn.conf.py
+++ b/docker/gunicorn.conf.py
@@ -1,4 +1,4 @@
-bind = '127.0.0.1:8000'
+bind = '[::]:8000'
 backlog = 2048
 workers = 3
 worker_class = 'sync'

From 5e5059b2e77f684aae3bd4c84c4699824f766161 Mon Sep 17 00:00:00 2001
From: Michael Shamoon <4887959+nikonratm@users.noreply.github.com>
Date: Sat, 26 Dec 2020 21:16:12 -0800
Subject: [PATCH 08/10] Prevent iOS input zoom

---
 src-ui/src/styles.scss | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/src-ui/src/styles.scss b/src-ui/src/styles.scss
index 6e09db630..7e9a9377a 100644
--- a/src-ui/src/styles.scss
+++ b/src-ui/src/styles.scss
@@ -100,3 +100,13 @@ body {
     padding-top: 1px;
   }
 }
+
+@supports (-webkit-touch-callout: none) {
+  input[type="number"],
+  input[type="search"],
+  input[type="text"],
+  select:focus,
+  textarea {
+    font-size: 16px;
+  }
+}

From fb830699752a13c044dcd00468a203c4ddcc6671 Mon Sep 17 00:00:00 2001
From: jonaswinkler
Date: Sun, 27 Dec 2020 14:50:57 +0100
Subject: [PATCH 09/10] fix test case.

---
 src/documents/tests/test_consumer.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/documents/tests/test_consumer.py b/src/documents/tests/test_consumer.py
index f53981850..795ca7f95 100644
--- a/src/documents/tests/test_consumer.py
+++ b/src/documents/tests/test_consumer.py
@@ -350,7 +350,7 @@ class TestConsumer(DirectoriesMixin, TestCase):
         try:
             self.consumer.try_consume_file(self.get_test_file())
         except ConsumerError as e:
-            self.assertTrue("No parsers abvailable for" in str(e))
+            self.assertEqual("Unsupported mime type application/pdf of file sample.pdf", str(e))
             return
 
         self.fail("Should throw exception")

From 6a70369a77246d298eb741a832736ad0148ec8ad Mon Sep 17 00:00:00 2001
From: jonaswinkler
Date: Sun, 27 Dec 2020 17:05:35 +0100
Subject: [PATCH 10/10] update index after bulk edit operations #195

---
 src/documents/bulk_edit.py      | 27 +++++++++++++++++++++++++
 src/documents/index.py          | 13 +++++-------
 src/documents/tasks.py          |  9 +++++++++
 src/documents/tests/test_api.py | 36 ++++++++++++++++-----------------
 4 files changed, 59 insertions(+), 26 deletions(-)

diff --git a/src/documents/bulk_edit.py b/src/documents/bulk_edit.py
index aa5b8ea3f..fd787f56a 100644
--- a/src/documents/bulk_edit.py
+++ b/src/documents/bulk_edit.py
@@ -1,6 +1,8 @@
 from django.db.models import Q
 from django_q.tasks import async_task
+from whoosh.writing import AsyncWriter
 
+from documents import index
 from documents.models import Document, Correspondent, DocumentType
 
 
@@ -13,6 +15,11 @@ def set_correspondent(doc_ids, correspondent):
     affected_docs = [doc.id for doc in qs]
     qs.update(correspondent=correspondent)
 
+    async_task(
+        "documents.tasks.bulk_index_documents",
+        document_ids=affected_docs
+    )
+
     async_task("documents.tasks.bulk_rename_files", document_ids=affected_docs)
 
     return "OK"
@@ -27,6 +34,11 @@ def set_document_type(doc_ids, document_type):
     affected_docs = [doc.id for doc in qs]
     qs.update(document_type=document_type)
 
+    async_task(
+        "documents.tasks.bulk_index_documents",
+        document_ids=affected_docs
+    )
+
     async_task("documents.tasks.bulk_rename_files", document_ids=affected_docs)
 
     return "OK"
@@ -44,6 +56,11 @@ def add_tag(doc_ids, tag):
             document_id=doc, tag_id=tag) for doc in affected_docs
     ])
 
+    async_task(
+        "documents.tasks.bulk_index_documents",
+        document_ids=affected_docs
+    )
+
     async_task("documents.tasks.bulk_rename_files", document_ids=affected_docs)
 
     return "OK"
@@ -61,6 +78,11 @@ def remove_tag(doc_ids, tag):
         Q(tag_id=tag)
     ).delete()
 
+    async_task(
+        "documents.tasks.bulk_index_documents",
+        document_ids=affected_docs
+    )
+
     async_task("documents.tasks.bulk_rename_files", document_ids=affected_docs)
 
     return "OK"
@@ -69,4 +91,9 @@ def remove_tag(doc_ids, tag):
 def delete(doc_ids):
     Document.objects.filter(id__in=doc_ids).delete()
 
+    ix = index.open_index()
+    with AsyncWriter(ix) as writer:
+        for id in doc_ids:
+            index.remove_document_by_id(writer, id)
+
     return "OK"
diff --git a/src/documents/index.py b/src/documents/index.py
index 308ee932e..51197e252 100644
--- a/src/documents/index.py
+++ b/src/documents/index.py
@@ -87,11 +87,6 @@ def open_index(recreate=False):
 
 
 def update_document(writer, doc):
-    # TODO: this line caused many issues all around, since:
-    # We need to make sure that this method does not get called with
-    # deserialized documents (i.e, document objects that don't come from
-    # Django's ORM interfaces directly.
-    logger.debug("Indexing {}...".format(doc))
     tags = ",".join([t.name for t in doc.tags.all()])
     writer.update_document(
         id=doc.pk,
@@ -107,9 +102,11 @@ def update_document(writer, doc):
 
 
 def remove_document(writer, doc):
-    # TODO: see above.
-    logger.debug("Removing {} from index...".format(doc))
-    writer.delete_by_term('id', doc.pk)
+    remove_document_by_id(writer, doc.pk)
+
+
+def remove_document_by_id(writer, doc_id):
+    writer.delete_by_term('id', doc_id)
 
 
 def add_or_update_document(document):
diff --git a/src/documents/tasks.py b/src/documents/tasks.py
index fafe6e10f..c1f3ffbaa 100644
--- a/src/documents/tasks.py
+++ b/src/documents/tasks.py
@@ -94,3 +94,12 @@ def bulk_rename_files(document_ids):
     qs = Document.objects.filter(id__in=document_ids)
     for doc in qs:
         post_save.send(Document, instance=doc, created=False)
+
+
+def bulk_index_documents(document_ids):
+    documents = Document.objects.filter(id__in=document_ids)
+
+    ix = index.open_index()
+    with AsyncWriter(ix) as writer:
+        for doc in documents:
+            index.update_document(writer, doc)
diff --git a/src/documents/tests/test_api.py b/src/documents/tests/test_api.py
index 0262b6d6a..35b998a9d 100644
--- a/src/documents/tests/test_api.py
+++ b/src/documents/tests/test_api.py
@@ -699,49 +699,49 @@ class TestBulkEdit(DirectoriesMixin, APITestCase):
         self.assertEqual(Document.objects.filter(correspondent=self.c2).count(), 1)
         bulk_edit.set_correspondent([self.doc1.id, self.doc2.id, self.doc3.id], self.c2.id)
         self.assertEqual(Document.objects.filter(correspondent=self.c2).count(), 3)
-        self.async_task.assert_called_once()
-        args, kwargs = self.async_task.call_args
-        self.assertCountEqual(kwargs['document_ids'], [self.doc1.id, self.doc2.id])
+        self.assertEqual(self.async_task.call_count, 2)
+        self.assertCountEqual(self.async_task.call_args_list[0][1]['document_ids'], [self.doc1.id, self.doc2.id])
+        self.assertCountEqual(self.async_task.call_args_list[1][1]['document_ids'], [self.doc1.id, self.doc2.id])
 
     def test_unset_correspondent(self):
         self.assertEqual(Document.objects.filter(correspondent=self.c2).count(), 1)
         bulk_edit.set_correspondent([self.doc1.id, self.doc2.id, self.doc3.id], None)
         self.assertEqual(Document.objects.filter(correspondent=self.c2).count(), 0)
-        self.async_task.assert_called_once()
-        args, kwargs = self.async_task.call_args
-        self.assertCountEqual(kwargs['document_ids'], [self.doc2.id, self.doc3.id])
+        self.assertEqual(self.async_task.call_count, 2)
+        self.assertCountEqual(self.async_task.call_args_list[0][1]['document_ids'], [self.doc2.id, self.doc3.id])
+        self.assertCountEqual(self.async_task.call_args_list[1][1]['document_ids'], [self.doc2.id, self.doc3.id])
 
     def test_set_document_type(self):
         self.assertEqual(Document.objects.filter(document_type=self.dt2).count(), 1)
         bulk_edit.set_document_type([self.doc1.id, self.doc2.id, self.doc3.id], self.dt2.id)
         self.assertEqual(Document.objects.filter(document_type=self.dt2).count(), 3)
-        self.async_task.assert_called_once()
-        args, kwargs = self.async_task.call_args
-        self.assertCountEqual(kwargs['document_ids'], [self.doc1.id, self.doc2.id])
+        self.assertEqual(self.async_task.call_count, 2)
+        self.assertCountEqual(self.async_task.call_args_list[0][1]['document_ids'], [self.doc1.id, self.doc2.id])
+        self.assertCountEqual(self.async_task.call_args_list[1][1]['document_ids'], [self.doc1.id, self.doc2.id])
 
     def test_unset_document_type(self):
         self.assertEqual(Document.objects.filter(document_type=self.dt2).count(), 1)
         bulk_edit.set_document_type([self.doc1.id, self.doc2.id, self.doc3.id], None)
         self.assertEqual(Document.objects.filter(document_type=self.dt2).count(), 0)
-        self.async_task.assert_called_once()
-        args, kwargs = self.async_task.call_args
-        self.assertCountEqual(kwargs['document_ids'], [self.doc2.id, self.doc3.id])
+        self.assertEqual(self.async_task.call_count, 2)
+        self.assertCountEqual(self.async_task.call_args_list[0][1]['document_ids'], [self.doc2.id, self.doc3.id])
+        self.assertCountEqual(self.async_task.call_args_list[1][1]['document_ids'], [self.doc2.id, self.doc3.id])
 
     def test_add_tag(self):
         self.assertEqual(Document.objects.filter(tags__id=self.t1.id).count(), 2)
         bulk_edit.add_tag([self.doc1.id, self.doc2.id, self.doc3.id, self.doc4.id], self.t1.id)
         self.assertEqual(Document.objects.filter(tags__id=self.t1.id).count(), 4)
-        self.async_task.assert_called_once()
-        args, kwargs = self.async_task.call_args
-        self.assertCountEqual(kwargs['document_ids'], [self.doc1.id, self.doc3.id])
+        self.assertEqual(self.async_task.call_count, 2)
+        self.assertCountEqual(self.async_task.call_args_list[0][1]['document_ids'], [self.doc1.id, self.doc3.id])
+        self.assertCountEqual(self.async_task.call_args_list[1][1]['document_ids'], [self.doc1.id, self.doc3.id])
 
     def test_remove_tag(self):
         self.assertEqual(Document.objects.filter(tags__id=self.t1.id).count(), 2)
         bulk_edit.remove_tag([self.doc1.id, self.doc3.id, self.doc4.id], self.t1.id)
         self.assertEqual(Document.objects.filter(tags__id=self.t1.id).count(), 1)
-        self.async_task.assert_called_once()
-        args, kwargs = self.async_task.call_args
-        self.assertCountEqual(kwargs['document_ids'], [self.doc4.id])
+        self.assertEqual(self.async_task.call_count, 2)
+        self.assertCountEqual(self.async_task.call_args_list[0][1]['document_ids'], [self.doc4.id])
+        self.assertCountEqual(self.async_task.call_args_list[1][1]['document_ids'], [self.doc4.id])
 
     def test_delete(self):
         self.assertEqual(Document.objects.count(), 5)
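
With PATCH 10/10, each bulk edit queues two tasks that share the same list of affected ids: first `documents.tasks.bulk_index_documents`, then `documents.tasks.bulk_rename_files`. The updated tests check both calls through `call_args_list`. The sketch below reproduces that pattern using only the standard library. It assumes a plain dict stands in for the Django ORM and a `unittest.mock.Mock` stands in for django_q's `async_task`. `set_correspondent_sketch` and `queued_calls` are hypothetical helpers for illustration only, not paperless-ng functions. The rule "only documents whose correspondent actually changes are affected" follows the expectations in the tests above.

```python
from unittest import mock

# Task names as they appear in the patch; everything else here is hypothetical.
INDEX_TASK = "documents.tasks.bulk_index_documents"
RENAME_TASK = "documents.tasks.bulk_rename_files"


def set_correspondent_sketch(docs, doc_ids, correspondent, async_task):
    """Hypothetical stand-in for bulk_edit.set_correspondent.

    docs maps document id -> correspondent id (None means unset).
    async_task is injected so a Mock can record what would be queued.
    """
    # Only documents whose correspondent actually changes are affected.
    affected_docs = [d for d in doc_ids if docs[d] != correspondent]
    for d in affected_docs:
        docs[d] = correspondent

    # Queue the search index refresh, then the file renaming,
    # both with the same list of affected ids.
    async_task(INDEX_TASK, document_ids=affected_docs)
    async_task(RENAME_TASK, document_ids=affected_docs)
    return "OK"


def queued_calls(docs, doc_ids, correspondent):
    """Run the sketch against a Mock; return (task name, sorted ids) per call."""
    fake_async_task = mock.Mock()
    set_correspondent_sketch(docs, doc_ids, correspondent, fake_async_task)
    # Each recorded call unpacks to (args, kwargs), which is what
    # call_args_list[i][0] and call_args_list[i][1] index into in the tests.
    return [
        (args[0], sorted(kwargs["document_ids"]))
        for args, kwargs in fake_async_task.call_args_list
    ]


if __name__ == "__main__":
    docs = {1: None, 2: 20, 3: 30}
    print(queued_calls(docs, [1, 2, 3], 30))
    # [('documents.tasks.bulk_index_documents', [1, 2]), ('documents.tasks.bulk_rename_files', [1, 2])]
```

Each entry in `call_args_list` is an `(args, kwargs)` pair, so `[i][1]['document_ids']` reads the keyword argument of the i-th queued task. Asserting on `call_args_list[0]` twice only ever inspects the first task; the rename task is at index 1.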