Move docs to material-mkdocs

2025-12-31 13:58:04 -06:00 · 2022-11-29 12:49:23 -08:00
parent a96f79f6a3
commit 605f885e19
58 changed files with 5198 additions and 5895 deletions
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -0,0 +1,483 @@
+# Usage Overview
+
+Paperless is an application that manages your personal documents. With
+the help of a document scanner (see [the scanners wiki](https://github.com/paperless-ngx/paperless-ngx/wiki/Scanner-&-Software-Recommendations)), paperless transforms your wieldy physical document binders
+into a searchable archive and provides many utilities for finding and
+managing your documents.
+
+## Terms and definitions
+
+Paperless essentially consists of two different parts for managing your
+documents:
+
+- The _consumer_ watches a specified folder and adds all documents in
+  that folder to paperless.
+- The _web server_ provides a UI that you use to manage and search for
+  your scanned documents.
+
+Each document has a couple of fields that you can assign to them:
+
+- A _Document_ is a piece of paper that sometimes contains valuable
+  information.
+- The _correspondent_ of a document is the person, institution or
+  company that a document either originates from, or is sent to.
+- A _tag_ is a label that you can assign to documents. Think of labels
+  as more powerful folders: Multiple documents can be grouped together
+  with a single tag, however, a single document can also have multiple
+  tags. This is not possible with folders. The reason folders are not
+  implemented in paperless is simply that tags are much more versatile
+  than folders.
+- A _document type_ is used to demarcate the type of a document such
+  as letter, bank statement, invoice, contract, etc. It is used to
+  identify what a document is about.
+- The _date added_ of a document is the date the document was scanned
+  into paperless. You cannot and should not change this date.
+- The _date created_ of a document is the date the document was
+  initially issued. This can be the date you bought a product, the
+  date you signed a contract, or the date a letter was sent to you.
+- The _archive serial number_ (short: ASN) of a document is the
+  identifier of the document in your physical document binders. See
+  `usage-recommended_workflow`{.interpreted-text role="ref"} below.
+- The _content_ of a document is the text that was OCR'ed from the
+  document. This text is fed into the search engine and is used for
+  matching tags, correspondents and document types.
+
+## Adding documents to paperless
+
+Once you've got Paperless setup, you need to start feeding documents
+into it. When adding documents to paperless, it will perform the
+following operations on your documents:
+
+1.  OCR the document, if it has no text. Digital documents usually have
+    text, and this step will be skipped for those documents.
+2.  Paperless will create an archivable PDF/A document from your
+    document. If this document is coming from your scanner, it will have
+    embedded selectable text.
+3.  Paperless performs automatic matching of tags, correspondents and
+    types on the document before storing it in the database.
+
+!!! tip
+
+    This process can be configured to fit your needs. If you don't want
+    paperless to create archived versions for digital documents, you can
+    configure that by configuring `PAPERLESS_OCR_MODE=skip_noarchive`.
+    Please read the
+    [relevant section in the documentation](/configuration#ocr).
+
+!!! note
+
+    No matter which options you choose, Paperless will always store the
+    original document that it found in the consumption directory or in the
+    mail and will never overwrite that document. Archived versions are
+    stored alongside the original versions.
+
+### The consumption directory
+
+The primary method of getting documents into your database is by putting
+them in the consumption directory. The consumer runs in an infinite
+loop, looking for new additions to this directory. When it finds them,
+the consumer goes about the process of parsing them with the OCR,
+indexing what it finds, and storing it in the media directory.
+
+Getting stuff into this directory is up to you. If you're running
+Paperless on your local computer, you might just want to drag and drop
+files there, but if you're running this on a server and want your
+scanner to automatically push files to this directory, you'll need to
+setup some sort of service to accept the files from the scanner.
+Typically, you're looking at an FTP server like
+[Proftpd](http://www.proftpd.org/) or a Windows folder share with
+[Samba](http://www.samba.org/).
+
+### Web UI Upload
+
+The dashboard has a file drop field to upload documents to paperless.
+Simply drag a file onto this field or select a file with the file
+dialog. Multiple files are supported.
+
+You can also upload documents on any other page of the web UI by
+dragging-and-dropping files into your browser window.
+
+### Mobile upload {#usage-mobile_upload}
+
+The mobile app over at <https://github.com/qcasey/paperless_share>
+allows Android users to share any documents with paperless. This can be
+combined with any of the mobile scanning apps out there, such as Office
+Lens.
+
+Furthermore, there is the [Paperless
+App](https://github.com/bauerj/paperless_app) as well, which not only
+has document upload, but also document browsing and download features.
+
+### IMAP (Email) {#usage-email}
+
+You can tell paperless-ngx to consume documents from your email
+accounts. This is a very flexible and powerful feature, if you regularly
+received documents via mail that you need to archive. The mail consumer
+can be configured via the frontend settings (/settings/mail) in the following
+manner:
+
+1.  Define e-mail accounts.
+2.  Define mail rules for your account.
+
+These rules perform the following:
+
+1.  Connect to the mail server.
+2.  Fetch all matching mails (as defined by folder, maximum age and the
+    filters)
+3.  Check if there are any consumable attachments.
+4.  If so, instruct paperless to consume the attachments and optionally
+    use the metadata provided in the rule for the new document.
+5.  If documents were consumed from a mail, the rule action is performed
+    on that mail.
+
+Paperless will completely ignore mails that do not match your filters.
+It will also only perform the action on mails that it has consumed
+documents from.
+
+The actions all ensure that the same mail is not consumed twice by
+different means. These are as follows:
+
+- **Delete:** Immediately deletes mail that paperless has consumed
+  documents from. Use with caution.
+- **Mark as read:** Mark consumed mail as read. Paperless will not
+  consume documents from already read mails. If you read a mail before
+  paperless sees it, it will be ignored.
+- **Flag:** Sets the 'important' flag on mails with consumed
+  documents. Paperless will not consume flagged mails.
+- **Move to folder:** Moves consumed mails out of the way so that
+  paperless wont consume them again.
+- **Add custom Tag:** Adds a custom tag to mails with consumed
+  documents (the IMAP standard calls these "keywords"). Paperless
+  will not consume mails already tagged. Not all mail servers support
+  this feature!
+
+!!! warning
+
+    The mail consumer will perform these actions on all mails it has
+    consumed documents from. Keep in mind that the actual consumption
+    process may fail for some reason, leaving you with missing documents in
+    paperless.
+
+!!! note
+
+    With the correct set of rules, you can completely automate your email
+    documents. Create rules for every correspondent you receive digital
+    documents from and paperless will read them automatically. The default
+    action "mark as read" is pretty tame and will not cause any damage or
+    data loss whatsoever.
+
+    You can also setup a special folder in your mail account for paperless
+    and use your favorite mail client to move to be consumed mails into that
+    folder automatically or manually and tell paperless to move them to yet
+    another folder after consumption. It's up to you.
+
+!!! note
+
+    When defining a mail rule with a folder, you may need to try different
+    characters to define how the sub-folders are separated. Common values
+    include ".", "/" or "\|", but this varies by the mail server.
+    Check the documentation for your mail server. In the event of an error
+    fetching mail from a certain folder, check the Paperless logs. When a
+    folder is not located, Paperless will attempt to list all folders found
+    in the account to the Paperless logs.
+
+!!! note
+
+    Paperless will process the rules in the order defined in the admin page.
+
+    You can define catch-all rules and have them executed last to consume
+    any documents not matched by previous rules. Such a rule may assign an
+    "Unknown mail document" tag to consumed documents so you can inspect
+    them further.
+
+Paperless is set up to check your mails every 10 minutes. This can be
+configured on the 'Scheduled tasks' page in the admin.
+
+### REST API
+
+You can also submit a document using the REST API, see
+`api-file_uploads`{.interpreted-text role="ref"} for details.
+
+## Best practices {#basic-searching}
+
+Paperless offers a couple tools that help you organize your document
+collection. However, it is up to you to use them in a way that helps you
+organize documents and find specific documents when you need them. This
+section offers a couple ideas for managing your collection.
+
+Document types allow you to classify documents according to what they
+are. You can define types such as "Receipt", "Invoice", or
+"Contract". If you used to collect all your receipts in a single
+binder, you can recreate that system in paperless by defining a document
+type, assigning documents to that type and then filtering by that type
+to only see all receipts.
+
+Not all documents need document types. Sometimes its hard to determine
+what the type of a document is or it is hard to justify creating a
+document type that you only need once or twice. This is okay. As long as
+the types you define help you organize your collection in the way you
+want, paperless is doing its job.
+
+Tags can be used in many different ways. Think of tags are more
+versatile folders or binders. If you have a binder for documents related
+to university / your car or health care, you can create these binders in
+paperless by creating tags and assigning them to relevant documents.
+Just as with documents, you can filter the document list by tags and
+only see documents of a certain topic.
+
+With physical documents, you'll often need to decide which folder the
+document belongs to. The advantage of tags over folders and binders is
+that a single document can have multiple tags. A physical document
+cannot magically appear in two different folders, but with tags, this is
+entirely possible.
+
+!!! tip
+
+    This can be used in many different ways. One example: Imagine you're
+    working on a particular task, such as signing up for university. Usually
+    you'll need to collect a bunch of different documents that are already
+    sorted into various folders. With the tag system of paperless, you can
+    create a new group of documents that are relevant to this task without
+    destroying the already existing organization. When you're done with the
+    task, you could delete the tag again, which would be equal to sorting
+    documents back into the folder they belong into. Or keep the tag, up to
+    you.
+
+All of the logic above applies to correspondents as well. Attach them to
+documents if you feel that they help you organize your collection.
+
+When you've started organizing your documents, create a couple saved
+views for document collections you regularly access. This is equal to
+having labeled physical binders on your desk, except that these saved
+views are dynamic and simply update themselves as you add documents to
+the system.
+
+Here are a couple examples of tags and types that you could use in your
+collection.
+
+- An `inbox` tag for newly added documents that you haven't manually
+  edited yet.
+- A tag `car` for everything car related (repairs, registration,
+  insurance, etc)
+- A tag `todo` for documents that you still need to do something with,
+  such as reply, or perform some task online.
+- A tag `bank account x` for all bank statement related to that
+  account.
+- A tag `mail` for anything that you added to paperless via its mail
+  processing capabilities.
+- A tag `missing_metadata` when you still need to add some metadata to
+  a document, but can't or don't want to do this right now.
+
+## Searching {#basic-usage_searching}
+
+Paperless offers an extensive searching mechanism that is designed to
+allow you to quickly find a document you're looking for (for example,
+that thing that just broke and you bought a couple months ago, that
+contract you signed 8 years ago).
+
+When you search paperless for a document, it tries to match this query
+against your documents. Paperless will look for matching documents by
+inspecting their content, title, correspondent, type and tags. Paperless
+returns a scored list of results, so that documents matching your query
+better will appear further up in the search results.
+
+By default, paperless returns only documents which contain all words
+typed in the search bar. However, paperless also offers advanced search
+syntax if you want to drill down the results further.
+
+Matching documents with logical expressions:
+
+```
+shopname AND (product1 OR product2)
+```
+
+Matching specific tags, correspondents or types:
+
+```
+type:invoice tag:unpaid
+correspondent:university certificate
+```
+
+Matching dates:
+
+```
+created:[2005 to 2009]
+added:yesterday
+modified:today
+```
+
+Matching inexact words:
+
+```
+produ*name
+```
+
+!!! note
+
+    Inexact terms are hard for search indexes. These queries might take a
+    while to execute. That's why paperless offers auto complete and query
+    correction.
+
+All of these constructs can be combined as you see fit. If you want to
+learn more about the query language used by paperless, paperless uses
+Whoosh's default query language. Head over to [Whoosh query
+language](https://whoosh.readthedocs.io/en/latest/querylang.html). For
+details on what date parsing utilities are available, see [Date
+parsing](https://whoosh.readthedocs.io/en/latest/dates.html#parsing-date-queries).
+
+## The recommended workflow {#usage-recommended_workflow}
+
+Once you have familiarized yourself with paperless and are ready to use
+it for all your documents, the recommended workflow for managing your
+documents is as follows. This workflow also takes into account that some
+documents have to be kept in physical form, but still ensures that you
+get all the advantages for these documents as well.
+
+The following diagram shows how easy it is to manage your documents.
+
+![image](assets/recommended_workflow.png){width=400}
+
+### Preparations in paperless
+
+- Create an inbox tag that gets assigned to all new documents.
+- Create a TODO tag.
+
+### Processing of the physical documents
+
+Keep a physical inbox. Whenever you receive a document that you need to
+archive, put it into your inbox. Regularly, do the following for all
+documents in your inbox:
+
+1.  For each document, decide if you need to keep the document in
+    physical form. This applies to certain important documents, such as
+    contracts and certificates.
+2.  If you need to keep the document, write a running number on the
+    document before scanning, starting at one and counting upwards. This
+    is the archive serial number, or ASN in short.
+3.  Scan the document.
+4.  If the document has an ASN assigned, store it in a _single_ binder,
+    sorted by ASN. Don't order this binder in any other way.
+5.  If the document has no ASN, throw it away. Yay!
+
+Over time, you will notice that your physical binder will fill up. If it
+is full, label the binder with the range of ASNs in this binder (i.e.,
+"Documents 1 to 343"), store the binder in your cellar or elsewhere,
+and start a new binder.
+
+The idea behind this process is that you will never have to use the
+physical binders to find a document. If you need a specific physical
+document, you may find this document by:
+
+1.  Searching in paperless for the document.
+2.  Identify the ASN of the document, since it appears on the scan.
+3.  Grab the relevant document binder and get the document. This is easy
+    since they are sorted by ASN.
+
+### Processing of documents in paperless
+
+Once you have scanned in a document, proceed in paperless as follows.
+
+1.  If the document has an ASN, assign the ASN to the document.
+2.  Assign a correspondent to the document (i.e., your employer, bank,
+    etc) This isn't strictly necessary but helps in finding a document
+    when you need it.
+3.  Assign a document type (i.e., invoice, bank statement, etc) to the
+    document This isn't strictly necessary but helps in finding a
+    document when you need it.
+4.  Assign a proper title to the document (the name of an item you
+    bought, the subject of the letter, etc)
+5.  Check that the date of the document is correct. Paperless tries to
+    read the date from the content of the document, but this fails
+    sometimes if the OCR is bad or multiple dates appear on the
+    document.
+6.  Remove inbox tags from the documents.
+
+!!! tip
+
+    You can setup manual matching rules for your correspondents and tags and
+    paperless will assign them automatically. After consuming a couple
+    documents, you can even ask paperless to *learn* when to assign tags and
+    correspondents by itself. For details on this feature, see
+    `advanced-matching`{.interpreted-text role="ref"}.
+
+### Task management
+
+Some documents require attention and require you to act on the document.
+You may take two different approaches to handle these documents based on
+how regularly you intend to scan documents and use paperless.
+
+- If you scan and process your documents in paperless regularly,
+  assign a TODO tag to all scanned documents that you need to process.
+  Create a saved view on the dashboard that shows all documents with
+  this tag.
+- If you do not scan documents regularly and use paperless solely for
+  archiving, create a physical todo box next to your physical inbox
+  and put documents you need to process in the TODO box. When you
+  performed the task associated with the document, move it to the
+  inbox.
+
+## Architectue
+
+Paperless-ngx consists of the following components:
+
+- **The webserver:** This serves the administration pages, the API,
+  and the new frontend. This is the main tool you'll be using to interact
+  with paperless. You may start the webserver directly with
+
+  ```shell-session
+  $ cd /path/to/paperless/src/
+  $ gunicorn -c ../gunicorn.conf.py paperless.wsgi
+  ```
+
+  or by any other means such as Apache `mod_wsgi`.
+
+- **The consumer:** This is what watches your consumption folder for
+  documents. However, the consumer itself does not really consume your
+  documents. Now it notifies a task processor that a new file is ready
+  for consumption. I suppose it should be named differently. This was
+  also used to check your emails, but that's now done elsewhere as
+  well.
+
+  Start the consumer with the management command `document_consumer`:
+
+  ```shell-session
+  $ cd /path/to/paperless/src/
+  $ python3 manage.py document_consumer
+  ```
+
+- **The task processor:** Paperless relies on [Celery - Distributed
+  Task Queue](https://docs.celeryq.dev/en/stable/index.html) for doing
+  most of the heavy lifting. This is a task queue that accepts tasks
+  from multiple sources and processes these in parallel. It also comes
+  with a scheduler that executes certain commands periodically.
+
+  This task processor is responsible for:
+
+  - Consuming documents. When the consumer finds new documents, it
+    notifies the task processor to start a consumption task.
+  - The task processor also performs the consumption of any
+    documents you upload through the web interface.
+  - Consuming emails. It periodically checks your configured
+    accounts for new emails and notifies the task processor to
+    consume the attachment of an email.
+  - Maintaining the search index and the automatic matching
+    algorithm. These are things that paperless needs to do from time
+    to time in order to operate properly.
+
+  This allows paperless to process multiple documents from your
+  consumption folder in parallel! On a modern multi core system, this
+  makes the consumption process with full OCR blazingly fast.
+
+  The task processor comes with a built-in admin interface that you
+  can use to check whenever any of the tasks fail and inspect the
+  errors (i.e., wrong email credentials, errors during consuming a
+  specific file, etc).
+
+- A [redis](https://redis.io/) message broker: This is a really
+  lightweight service that is responsible for getting the tasks from
+  the webserver and the consumer to the task scheduler. These run in a
+  different process (maybe even on different machines!), and
+  therefore, this is necessary.
+
+- Optional: A database server. Paperless supports PostgreSQL, MariaDB
+  and SQLite for storing its data.