From d7160de9f1ebd7aafe765561af059125fe14275e Mon Sep 17 00:00:00 2001 From: Jonas Winkler Date: Mon, 23 Nov 2020 19:34:52 +0100 Subject: [PATCH] many changes to the documentation, mostly typos --- docs/administration.rst | 12 ++++++------ docs/advanced_usage.rst | 6 ++++-- docs/api.rst | 2 +- docs/changelog.rst | 6 +++--- docs/configuration.rst | 8 ++++---- docs/contributing.rst | 2 +- docs/faq.rst | 6 +++--- docs/scanners.rst | 2 +- docs/screenshots.rst | 2 +- docs/setup.rst | 6 +++--- docs/usage_overview.rst | 24 +++++++++++++++--------- 11 files changed, 42 insertions(+), 34 deletions(-) diff --git a/docs/administration.rst b/docs/administration.rst index a77c559f9..c582e83a0 100644 --- a/docs/administration.rst +++ b/docs/administration.rst @@ -30,7 +30,7 @@ Options available to docker installations: Paperless uses 3 volumes: * ``paperless_media``: This is where your documents are stored. - * ``paperless_data``: This is where auxilliary data is stored. This + * ``paperless_data``: This is where auxiliary data is stored. This folder also contains the SQLite database, if you use it. * ``paperless_pgdata``: Exists only if you use PostgreSQL and contains the database. @@ -109,7 +109,7 @@ B. If you built the image yourself, grab the new archive and replace your curre .. hint:: You can usually keep your ``docker-compose.env`` file, since this file will - never include mandantory configuration options. However, it is worth checking + never include mandatory configuration options. However, it is worth checking out the new version of this file, since it might have new recommendations on what to configure. @@ -126,8 +126,8 @@ After grabbing the new release and unpacking the contents, do the following: $ pip install --upgrade pipenv $ cd /path/to/paperless - $ pipenv install $ pipenv clean + $ pipenv install This creates a new virtual environment (or uses your existing environment) and installs all dependencies into it. 
@@ -247,12 +247,12 @@ your already processed documents. When multiple document types or correspondents match a single document, the retagger won't assign these to the document. Specify ``--use-first`` -to override this behaviour and just use the first correspondent or type +to override this behavior and just use the first correspondent or type it finds. This option does not apply to tags, since any amount of tags can be applied to a document. Finally, ``-f`` specifies that you wish to overwrite already assigned -correspondents, types and/or tags. The default behaviour is to not +correspondents, types and/or tags. The default behavior is to not assign correspondents and types to documents that have this data already assigned. ``-f`` works differently for tags: By default, only additional tags get added to documents, no tags will be removed. With ``-f``, tags that don't @@ -341,7 +341,7 @@ Documents can be stored in Paperless using GnuPG encryption. .. danger:: - Encryption is depreceated since paperless-ng 0.9 and doesn't really provide any + Encryption is deprecated since paperless-ng 0.9 and doesn't really provide any additional security, since you have to store the passphrase in a configuration file on the same system as the encrypted documents for paperless to work. Furthermore, the entire text content of the documents is stored plain in the diff --git a/docs/advanced_usage.rst b/docs/advanced_usage.rst index a6f44ce48..653bee1c6 100644 --- a/docs/advanced_usage.rst +++ b/docs/advanced_usage.rst @@ -84,6 +84,8 @@ to the filename. PAPERLESS_FILENAME_PARSE_TRANSFORMS=[{"pattern":"^([a-z]+)_(\\d{8})_(\\d{6})_([0-9]+)\\.", "repl":"\\2\\3Z - \\4 - \\1."}, {"pattern":"^([a-z]+)_([0-9]+)\\.", "repl":" - \\2 - \\1."}] +.. 
_advanced-matching: + Matching tags, correspondents and document types ################################################ @@ -253,7 +255,7 @@ By default, paperless stores your documents in the media directory and renames t using the identifier which it has assigned to each document. You will end up getting files like ``0000123.pdf`` in your media directory. This isn't necessarily a bad thing, because you normally don't have to access these files manually. However, if -you wish to name your files differently, you can do that by adjustng the +you wish to name your files differently, you can do that by adjusting the ``PAPERLESS_FILENAME_FORMAT`` settings variable. This variable allows you to configure the filename (folders are allowed!) using @@ -278,7 +280,7 @@ will create a directory structure as follows: my_new_shoes-0000004.pdf Paperless appends the unique identifier of each document to the filename. This -avoides filename clashes. +avoids filename clashes. .. danger:: diff --git a/docs/api.rst b/docs/api.rst index e661cc3ff..4f41832de 100644 --- a/docs/api.rst +++ b/docs/api.rst @@ -94,7 +94,7 @@ Result object: } * ``id``: the primary key of the found document -* ``highlights``: an object containing parseable highlights for the result. +* ``highlights``: an object containing parsable highlights for the result. See below. * ``score``: The score assigned to the document. A higher score indicates a better match with the query. Search results are sorted descending by score. diff --git a/docs/changelog.rst b/docs/changelog.rst index 9ab853439..9ef3f4326 100644 --- a/docs/changelog.rst +++ b/docs/changelog.rst @@ -52,7 +52,7 @@ paperless-ng 0.9.0 * **Added:** New frontend. Features: * Single page application: It's much more responsive than the django admin pages. - * Dashboard. Shows recently scanned documents, or todos, or other documents + * Dashboard. Shows recently scanned documents, or todo notes, or other documents at wish. Allows uploading of documents. 
Shows basic statistics. * Better document list with multiple display options. * Full text search with result highlighting, auto completion and scoring based @@ -102,7 +102,7 @@ paperless-ng 0.9.0 * **Modified [breaking]:** PostgreSQL: - * If ``PAPERLESS_DBHOST`` is specified in the settings, paperless uses postgresql instead of sqlite. + * If ``PAPERLESS_DBHOST`` is specified in the settings, paperless uses PostgreSQL instead of SQLite. Username, database and password all default to ``paperless`` if not specified. * **Modified [breaking]:** document_retagger management command rework. See @@ -130,7 +130,7 @@ paperless-ng 0.9.0 Certain language specifics such as umlauts may not get picked up properly. * ``PAPERLESS_DEBUG`` defaults to ``false``. * The presence of ``PAPERLESS_DBHOST`` now determines whether to use PostgreSQL or - sqlite. + SQLite. * ``PAPERLESS_OCR_THREADS`` is gone and replaced with ``PAPERLESS_TASK_WORKERS`` and ``PAPERLESS_THREADS_PER_WORKER``. Refer to the config example for details. * ``PAPERLESS_OPTIMIZE_THUMBNAILS`` allows you to disable or enable thumbnail diff --git a/docs/configuration.rst b/docs/configuration.rst index afb0b5f90..e3f0c0e9f 100644 --- a/docs/configuration.rst +++ b/docs/configuration.rst @@ -69,7 +69,7 @@ PAPERLESS_CONSUMPTION_DIR= Defaults to "../consume", relative to the "src" directory. PAPERLESS_DATA_DIR= - This is where paperless stores all its data (search index, sqlite database, + This is where paperless stores all its data (search index, SQLite database, classification model, etc). Defaults to "../data", relative to the "src" directory. @@ -100,7 +100,7 @@ Hosting & Security ################## PAPERLESS_SECRET_KEY= - Paperless uses this to make session tokens. If you exose paperless on the + Paperless uses this to make session tokens. If you expose paperless on the internet, you need to change this, since the default secret is well known. Use any sequence of characters. The more, the better. 
You don't need to @@ -220,7 +220,7 @@ PAPERLESS_CONSUMER_POLLING= specify a polling interval in seconds here, which will then cause paperless to periodically check your consumption directory for changes. - Defaults to 0, which disables polling and uses filesystem notifiactions. + Defaults to 0, which disables polling and uses filesystem notifications. PAPERLESS_CONSUMER_DELETE_DUPLICATES= When the consumer detects a duplicate document, it will not touch the @@ -264,7 +264,7 @@ PAPERLESS_CONVERT_DENSITY= Default is 300. PAPERLESS_OPTIMIZE_THUMBNAILS= - Use optipng to optimize thumbnails. This usually reduces the sice of + Use optipng to optimize thumbnails. This usually reduces the size of thumbnails by about 20%, but uses considerable compute time during consumption. diff --git a/docs/contributing.rst b/docs/contributing.rst index 540081b7e..30eb9779a 100644 --- a/docs/contributing.rst +++ b/docs/contributing.rst @@ -85,7 +85,7 @@ quoted, or triple-quoted string will do: problematic_string = 'This is a "string" with "quotes" in it' In HTML templates, please use double-quotes for tag attributes, and single -quotes for arguments passed to Django tempalte tags: +quotes for arguments passed to Django template tags: .. code:: html diff --git a/docs/faq.rst b/docs/faq.rst index ea05544a6..7b5432326 100644 --- a/docs/faq.rst +++ b/docs/faq.rst @@ -17,7 +17,7 @@ is .. caution:: - Dont mess with this folder. Don't change permissions and don't move + Do not mess with this folder. Don't change permissions and don't move files around manually. This folder is meant to be entirely managed by docker and paperless. @@ -36,9 +36,9 @@ file extensions do not matter. **A:** The short answer is yes. I've tested it on a Raspberry Pi 3 B. The long answer is that certain parts of -Paperless will run very slow, such as the tesseract OCR. On Rasperry Pi, +Paperless will run very slow, such as the tesseract OCR. 
On Raspberry Pi, try to OCR documents before feeding them into paperless so that paperless can -reuse the text. The web interface should be alot snappier, since it runs +reuse the text. The web interface should be a lot snappier, since it runs in your browser and paperless has to do much less work to serve the data. .. note:: diff --git a/docs/scanners.rst b/docs/scanners.rst index 0c78f79e4..d4ad4dfb1 100644 --- a/docs/scanners.rst +++ b/docs/scanners.rst @@ -8,7 +8,7 @@ Scanner recommendations As Paperless operates by watching a folder for new files, doesn't care what scanner you use, but sometimes finding a scanner that will write to an FTP, NFS, or SMB server can be difficult. This page is here to help you find one -that works right for you based on recommentations from other Paperless users. +that works right for you based on recommendations from other Paperless users. +---------+----------------+-----+-----+-----+----------------+ | Brand | Model | Supports | Recommended By | diff --git a/docs/screenshots.rst b/docs/screenshots.rst index cf99641c5..7ba431563 100644 --- a/docs/screenshots.rst +++ b/docs/screenshots.rst @@ -21,7 +21,7 @@ Extensive filtering mechanisms: .. image:: _static/screenshots/documents-filter.png -Side-by-side editing of documents. Optmized for 1080p. +Side-by-side editing of documents. Optimized for 1080p. .. image:: _static/screenshots/editing.png diff --git a/docs/setup.rst b/docs/setup.rst index af2f47f90..d0e7099c7 100644 --- a/docs/setup.rst +++ b/docs/setup.rst @@ -85,7 +85,7 @@ Paperless consists of the following components: needs to do from time to time in order to operate properly. This allows paperless to process multiple documents from your consumption folder in parallel! On - a modern multicore system, consumption with full ocr is blazing fast. + a modern multi-core system, consumption with full ocr is blazing fast. 
The task processor comes with a built-in admin interface that you can use to see whenever any of the tasks fail and inspect the errors (i.e., wrong email credentials, errors during consuming a specific @@ -322,7 +322,7 @@ management commands as below. $ cd /path/to/paperless $ docker-compose run --rm webserver /bin/bash - This will lauch the container and initialize the PostgreSQL database. + This will launch the container and initialize the PostgreSQL database. b) Without docker, open a shell in your virtual environment, switch to the ``src`` directory and create the database schema: @@ -372,7 +372,7 @@ configuring some options in paperless can help improve performance immensely: * ``PAPERLESS_TASK_WORKERS`` and ``PAPERLESS_THREADS_PER_WORKER`` are configured to use all cores. The Raspberry Pi models 3 and up have 4 cores, meaning that paperless will use 2 workers and 2 threads per worker. This may result in - slugish response times during consumption, so you might want to lower these + sluggish response times during consumption, so you might want to lower these settings (example: 2 workers and 1 thread to always have some computing power left for other tasks). * Keep ``PAPERLESS_OCR_ALWAYS`` at its default value 'false' and consider OCR'ing diff --git a/docs/usage_overview.rst b/docs/usage_overview.rst index 5f47b56a9..0e50dafc2 100644 --- a/docs/usage_overview.rst +++ b/docs/usage_overview.rst @@ -5,13 +5,13 @@ Usage Overview Paperless is an application that manages your personal documents. With the help of a document scanner (see :ref:`scanners`), paperless transforms your wieldy physical document binders into a searchable archive and -provices many utilities for finding and managing your documents. +provides many utilities for finding and managing your documents. 
Terms and definitions ##################### -Paperless esentially consists of two different parts for managing your +Paperless essentially consists of two different parts for managing your documents: * The *consumer* watches a specified folder and adds all documents in that @@ -30,12 +30,12 @@ Each document has a couple of fields that you can assign to them: tag, however, a single document can also have multiple tags. This is not possible with folders. The reason folders are not implemented in paperless is simply that tags are much more versatile than folders. -* A *document type* is used to demarkate the type of a document such as letter, +* A *document type* is used to demarcate the type of a document such as letter, bank statement, invoice, contract, etc. It is used to identify what a document is about. * The *date added* of a document is the date the document was scanned into paperless. You cannot and should not change this date. -* The *date created* of a document is the date the document was intially issued. +* The *date created* of a document is the date the document was initially issued. This can be the date you bought a product, the date you signed a contract, or the date a letter was sent to you. * The *archive serial number* (short: ASN) of a document is the identifier of @@ -131,7 +131,7 @@ These are as follows: With the correct set of rules, you can completely automate your email documents. Create rules for every correspondent you receive digital documents from and - paperless will read them automatically. The default acion "mark as read" is + paperless will read them automatically. The default action "mark as read" is pretty tame and will not cause any damage or data loss whatsoever. You can also setup a special folder in your mail account for paperless and use @@ -182,7 +182,7 @@ Processing of the physical documents ==================================== Keep a physical inbox. 
Whenever you receive a document that you need to -archive, put it into your inbox. Regulary, do the following for all documents +archive, put it into your inbox. Regularly, do the following for all documents in your inbox: 1. For each document, decide if you need to keep the document in physical @@ -217,18 +217,24 @@ Once you have scanned in a document, proceed in paperless as follows. 1. If the document has an ASN, assign the ASN to the document. 2. Assign a correspondent to the document (i.e., your employer, bank, etc) - This isnt strictly necessary but helps in finding a document when you need + This isn't strictly necessary but helps in finding a document when you need it. 3. Assign a document type (i.e., invoice, bank statement, etc) to the document - This isnt strictly necessary but helps in finding a document when you need + This isn't strictly necessary but helps in finding a document when you need it. 4. Assign a proper title to the document (the name of an item you bought, the subject of the letter, etc) -5. Check that the date of the document is corrent. Paperless tries to read +5. Check that the date of the document is correct. Paperless tries to read the date from the content of the document, but this fails sometimes if the OCR is bad or multiple dates appear on the document. 6. Remove inbox tags from the documents. +.. hint:: + + You can set up manual matching rules for your correspondents and tags and + paperless will assign them automatically. After consuming a couple of documents, + you can even ask paperless to *learn* when to assign tags and correspondents + by itself. For details on this feature, see :ref:`advanced-matching`. Task management ===============