From 758d53d816ae800a5dd0b87090a02770a2bbeccc Mon Sep 17 00:00:00 2001
From: Jonas Winkler
Date: Wed, 18 Nov 2020 00:00:55 +0100
Subject: [PATCH] more documentation.

---
 docs/api.rst            |  8 +++--
 docs/setup.rst          | 77 +++++++++++++++++++++++++++++++++++++++--
 scripts/make-release.sh | 15 ++++++--
 3 files changed, 93 insertions(+), 7 deletions(-)

diff --git a/docs/api.rst b/docs/api.rst
index 16a0861df..34764540e 100644
--- a/docs/api.rst
+++ b/docs/api.rst
@@ -66,7 +66,7 @@ Result list object returned by the endpoint:
         "page": 1,
         "page_count": 1,
         "results": [
-            ...
+
         ]
     }
 
@@ -83,11 +83,13 @@ Result object:
 
     {
         "id": 1,
-        "highlights": ...,
+        "highlights": [
+
+        ],
         "score": 6.34234,
         "rank": 23,
         "document": {
-            ...
+
         }
 
 * ``id``: the primary key of the found document
diff --git a/docs/setup.rst b/docs/setup.rst
index 733ecbd0f..b9b240a10 100644
--- a/docs/setup.rst
+++ b/docs/setup.rst
@@ -23,6 +23,77 @@ There are multiple options available.
   that need to be compiled, and that's already done for you in the release.
 
 
+Overview of Paperless-ng
+########################
+
+Compared to paperless, paperless-ng works a little differently under the hood and has
+more moving parts that work together. While this increases the complexity of
+the system, it also brings many benefits.
+
+Paperless consists of the following components:
+
+* **The webserver:** This is pretty much the same as in paperless. It serves
+  the administration pages, the API, and the new frontend. This is the main
+  tool you'll be using to interact with paperless. You may start the webserver
+  with
+
+  .. code:: shell-session
+
+    $ cd /path/to/paperless/src/
+    $ pipenv run gunicorn -c /usr/src/paperless/gunicorn.conf.py -b 0.0.0.0:8000 paperless.wsgi
+
+  or by any other means such as Apache ``mod_wsgi``.
+
+* **The consumer:** This is what watches your consumption folder for documents.
+  However, the consumer itself does not really consume your documents anymore.
+  It rather notifies a task processor that a new file is ready for consumption.
+  I suppose it should be named differently.
+  This also used to check your emails, but that's now gone elsewhere as well.
+
+  Start the consumer with the management command ``document_consumer``:
+
+  .. code:: shell-session
+
+    $ cd /path/to/paperless/src/
+    $ pipenv run python3 manage.py document_consumer
+
+* **The task processor:** Paperless relies on `Django Q `_
+  for doing much of the heavy lifting. This is a task queue that accepts tasks from
+  multiple sources and processes tasks in parallel. It also comes with a scheduler that executes
+  certain commands periodically.
+
+  This task processor is responsible for:
+
+  * Consuming documents. When the consumer finds new documents, it notifies the task processor to
+    start a consumption task.
+  * Consuming emails. It periodically checks your configured accounts for new mails and
+    produces consumption tasks for any documents it finds.
+  * The task processor also performs the consumption of any documents you upload through
+    the web interface.
+  * Maintaining the search index and the automatic matching algorithm. These are things that paperless
+    needs to do from time to time in order to operate properly.
+
+  This allows paperless to process multiple documents from your consumption folder in parallel! On
+  a modern multicore system, consumption with full OCR is blazing fast.
+
+  The task processor comes with a built-in admin interface that you can use to see whenever any of the
+  tasks fail and inspect the errors.
+
+  You may start the task processor by executing:
+
+  .. code:: shell-session
+
+    $ cd /path/to/paperless/src/
+    $ pipenv run python3 manage.py qcluster
+
+* A `redis `_ message broker: This is a really lightweight service that is responsible
+  for getting the tasks from the webserver and consumer to the task scheduler. These run in different
+  processes (maybe even on different machines!), and therefore, this is necessary.
+
+* A database server. Paperless supports PostgreSQL and sqlite for storing its data. However, with the
+  added concurrency, it is strongly advised to use PostgreSQL, as sqlite has its limits in that regard.
+
+
 Installation
 ############
 
@@ -31,10 +102,12 @@ You can go multiple routes with setting up and running Paperless:
 * The `docker route`_
 * The `bare metal route`_
 
-The `docker route`_ is quick & easy. This is the recommended route.
+The `docker route`_ is quick & easy. This is the recommended route. This configures all the stuff
+from above automatically so that it just works and uses sensible defaults for all configuration options.
 
 The `bare metal route`_ is more complicated to setup but makes it easier
-should you want to contribute some code back.
+should you want to contribute some code back. You need to configure and
+run the above-mentioned components yourself.
 
 Docker Route
 ============
diff --git a/scripts/make-release.sh b/scripts/make-release.sh
index df8c8ab08..4b509b5bf 100755
--- a/scripts/make-release.sh
+++ b/scripts/make-release.sh
@@ -2,6 +2,15 @@
 
 set -e
 
+
+VERSION=$1
+
+if [ -z "$VERSION" ]
+then
+	echo "Need a version string."
+	exit 1
+fi
+
 # source root directory of paperless
 PAPERLESS_ROOT=$(git rev-parse --show-toplevel)
 
@@ -81,10 +90,12 @@ cp "$PAPERLESS_ROOT/docker/supervisord.conf" "$PAPERLESS_DIST_APP/docker/"
 
 cd "$PAPERLESS_DIST_APP"
 
-docker-compose build
+docker build . -t "jonaswinkler/paperless-ng:$VERSION"
+
+docker push "jonaswinkler/paperless-ng:$VERSION"
 
 # works. package the app!
 
 cd "$PAPERLESS_DIST"
 
-tar -cJf paperless-ng.tar.xz paperless-ng/
+tar -cJf "paperless-ng-$VERSION.tar.xz" paperless-ng/
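
Note on the make-release.sh change: the version argument now drives both the Docker image tag and the archive name. The following is only a minimal sketch of that derivation, isolated so it can be tried without Docker. The function name `release_names` is hypothetical and not part of the script, and unlike the patch it prints the error to stderr and returns instead of exiting.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of make-release.sh: prints the two names the
# patched script derives from its first argument.
release_names() {
    version=$1

    # Same guard as in the patch: refuse to continue without a version string.
    # Assumption: the message goes to stderr here so stdout stays clean.
    if [ -z "$version" ]
    then
        echo "Need a version string." >&2
        return 1
    fi

    # Image tag used by `docker build -t` and `docker push` in the patch.
    echo "jonaswinkler/paperless-ng:$version"

    # Archive name used by `tar -cJf` in the patch.
    echo "paperless-ng-$version.tar.xz"
}
```

Called as `release_names 1.2.3` (an arbitrary example version), this prints the image tag on the first line and the archive name on the second. Called with an empty argument, it returns a non-zero status.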