# Usage Overview

Paperless is an application that manages your personal documents. With
the help of a document scanner (see [the scanners wiki](https://github.com/paperless-ngx/paperless-ngx/wiki/Scanner-&-Software-Recommendations)),
paperless transforms your unwieldy physical document binders into a searchable archive
and provides many utilities for finding and managing your documents.

## Terms and definitions

Paperless essentially consists of two different parts for managing your
documents:

- The _consumer_ watches a specified folder and adds all documents in
  that folder to paperless.
- The _web server_ provides a UI that you use to manage and search for
  your scanned documents.

Each document has several fields that you can assign to it:

- A _Document_ is a piece of paper that sometimes contains valuable
  information.
- The _correspondent_ of a document is the person, institution or
  company that a document either originates from, or is sent to.
- A _tag_ is a label that you can assign to documents. Think of tags
  as more powerful folders: multiple documents can be grouped together
  with a single tag, but a single document can also have multiple
  tags. This is not possible with folders. Folders are not
  implemented in paperless simply because tags are much more versatile.
- A _document type_ is used to demarcate the type of a document such
  as letter, bank statement, invoice, contract, etc. It is used to
  identify what a document is about.
- The _date added_ of a document is the date the document was scanned
  into paperless. You cannot and should not change this date.
- The _date created_ of a document is the date the document was
  initially issued. This can be the date you bought a product, the
  date you signed a contract, or the date a letter was sent to you.
- The _archive serial number_ (short: ASN) of a document is the
  identifier of the document in your physical document binders. See
  [recommended workflow](#usage-recommended-workflow) below.
- The _content_ of a document is the text that was OCR'ed from the
  document. This text is fed into the search engine and is used for
  matching tags, correspondents and document types.

## Adding documents to paperless

Once you've got Paperless set up, you need to start feeding documents
into it. When you add documents to paperless, it performs the
following operations on them:

1. OCR the document, if it has no text. Digital documents usually have
   text, and this step will be skipped for those documents.
2. Paperless will create an archivable PDF/A document from your
   document. If this document is coming from your scanner, it will have
   embedded selectable text.
3. Paperless performs automatic matching of tags, correspondents and
   types on the document before storing it in the database.

!!! tip

    This process can be configured to fit your needs. If you don't want
    paperless to create archived versions for digital documents, you can
    do so by setting `PAPERLESS_OCR_MODE=skip_noarchive`.
    Please read the
    [relevant section in the documentation](/configuration#ocr).

!!! note

    No matter which options you choose, Paperless will always store the
    original document that it found in the consumption directory or in the
    mail and will never overwrite that document. Archived versions are
    stored alongside the original versions.

### The consumption directory

The primary method of getting documents into your database is by putting
them in the consumption directory. The consumer waits patiently, looking
for new additions to this directory. When it finds them,
the consumer goes about the process of parsing them with the OCR,
indexing what it finds, and storing it in the media directory.

Getting stuff into this directory is up to you. If you're running
Paperless on your local computer, you might just want to drag and drop
files there, but if you're running this on a server and want your
scanner to automatically push files to this directory, you'll need to
set up some sort of service to accept the files from the scanner.
Typically, you're looking at an FTP server like
[Proftpd](http://www.proftpd.org/) or a Windows folder share with
[Samba](https://www.samba.org/).

### Web UI Upload

The dashboard has a file drop field to upload documents to paperless.
Simply drag a file onto this field or select a file with the file
dialog. Multiple files are supported.

You can also upload documents on any other page of the web UI by
dragging and dropping files into your browser window.

### Mobile upload {#usage-mobile_upload}

The mobile app over at [https://github.com/qcasey/paperless_share](https://github.com/qcasey/paperless_share)
allows Android users to share any documents with paperless. This can be
combined with any of the mobile scanning apps out there, such as Office
Lens.

There is also the [Paperless
App](https://github.com/bauerj/paperless_app), which offers not only
document upload, but also document browsing and download features.

### IMAP (Email) {#usage-email}

You can tell paperless-ngx to consume documents from your email
accounts. This is a very flexible and powerful feature if you regularly
receive documents via mail that you need to archive. The mail consumer
can be configured via the frontend settings (/settings/mail) in the following
manner:

1. Define e-mail accounts.
2. Define mail rules for your account.

These rules perform the following:

1. Connect to the mail server.
2. Fetch all matching mails (as defined by folder, maximum age and the
   filters).
3. Check if there are any consumable attachments.
4. If so, instruct paperless to consume the attachments and optionally
   use the metadata provided in the rule for the new document.
5. If documents were consumed from a mail, the rule action is performed
   on that mail.

Paperless will completely ignore mails that do not match your filters.
It will also only perform the action on mails that it has consumed
documents from.

The actions all ensure that the same mail is not consumed twice by
different means. These are as follows:

- **Delete:** Immediately deletes mail that paperless has consumed
  documents from. Use with caution.
- **Mark as read:** Marks consumed mail as read. Paperless will not
  consume documents from already read mails. If you read a mail before
  paperless sees it, it will be ignored.
- **Flag:** Sets the 'important' flag on mails with consumed
  documents. Paperless will not consume flagged mails.
- **Move to folder:** Moves consumed mails out of the way so that
  paperless won't consume them again.
- **Add custom Tag:** Adds a custom tag to mails with consumed
  documents (the IMAP standard calls these "keywords"). Paperless
  will not consume mails already tagged. Not all mail servers support
  this feature!

!!! warning

    The mail consumer will perform these actions on all mails it has
    consumed documents from. Keep in mind that the actual consumption
    process may fail for some reason, leaving you with missing documents in
    paperless.

!!! note

    With the correct set of rules, you can completely automate your email
    documents. Create rules for every correspondent you receive digital
    documents from and paperless will read them automatically. The default
    action "mark as read" is pretty tame and will not cause any damage or
    data loss whatsoever.

    You can also set up a special folder in your mail account for paperless
    and use your favorite mail client to move mails to be consumed into that
    folder automatically or manually and tell paperless to move them to yet
    another folder after consumption. It's up to you.

!!! note

    When defining a mail rule with a folder, you may need to try different
    characters to define how the sub-folders are separated. Common values
    include ".", "/" or "\|", but this varies by the mail server.
    Check the documentation for your mail server. In the event of an error
    fetching mail from a certain folder, check the Paperless logs. When a
    folder is not located, Paperless will attempt to list all folders found
    in the account to the Paperless logs.

!!! note

    Paperless will process the rules in the order defined in the admin page.

    You can define catch-all rules and have them executed last to consume
    any documents not matched by previous rules. Such a rule may assign an
    "Unknown mail document" tag to consumed documents so you can inspect
    them further.

Paperless is set up to check your mails every 10 minutes. This can be
configured on the 'Scheduled tasks' page in the admin.

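The way each rule action keeps a mail from being consumed a second time can be modelled in a few lines of Python. This is only an illustrative sketch, not how Paperless implements mail handling: the `Mail` class, the `MailAction` enum, the `is_eligible` and `apply_action` functions, and the `paperless` keyword and `Archive` folder are all hypothetical names, and a mail is reduced to a handful of flags instead of a real IMAP message.

```python
from dataclasses import dataclass, field
from enum import Enum


class MailAction(Enum):
    """The five rule actions described above (names are illustrative)."""
    DELETE = "delete"
    MARK_READ = "mark_read"
    FLAG = "flag"
    MOVE = "move"
    TAG = "tag"


@dataclass
class Mail:
    """A toy mail: only the state that the rule actions change."""
    folder: str = "INBOX"
    seen: bool = False
    flagged: bool = False
    deleted: bool = False
    keywords: set = field(default_factory=set)


def is_eligible(mail, action, rule_folder="INBOX", keyword="paperless"):
    """Return True if a rule using this action would still consider the mail."""
    if mail.deleted or mail.folder != rule_folder:
        return False  # deleted or moved mails are no longer in the rule's folder
    if action is MailAction.MARK_READ:
        return not mail.seen
    if action is MailAction.FLAG:
        return not mail.flagged
    if action is MailAction.TAG:
        return keyword not in mail.keywords
    return True


def apply_action(mail, action, target_folder="Archive", keyword="paperless"):
    """Change the mail the way the action would after documents were consumed."""
    if action is MailAction.DELETE:
        mail.deleted = True
    elif action is MailAction.MARK_READ:
        mail.seen = True
    elif action is MailAction.FLAG:
        mail.flagged = True
    elif action is MailAction.MOVE:
        mail.folder = target_folder
    elif action is MailAction.TAG:
        mail.keywords.add(keyword)
```

In this model, a mail that you have already read is skipped by a "mark as read" rule but would still be picked up by a rule that uses the flag action, which mirrors the behaviour described in the action list.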
### REST API

You can also submit a document using the REST API; see [POSTing documents](/api#file-uploads)
for details.

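As a rough illustration, the following Python sketch builds such an upload request using only the standard library. It rests on several assumptions that you should check against the API documentation linked above: that the upload endpoint is `/api/documents/post_document/`, that the file is sent as a multipart form field named `document` (with an optional `title` field), and that token authentication is enabled. The base URL and token are placeholders, and `build_upload_request` is a hypothetical helper, not part of Paperless.

```python
import urllib.request
import uuid


def build_upload_request(base_url, token, filename, data, title=None):
    """Build (but do not send) a multipart/form-data upload request.

    The endpoint path and the form field names are assumptions taken from
    the API documentation; verify them for your Paperless version.
    """
    boundary = uuid.uuid4().hex
    parts = []
    if title is not None:
        title_part = (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="title"\r\n\r\n'
            f"{title}\r\n"
        )
        parts.append(title_part.encode("utf-8"))
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="document"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8") + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/documents/post_document/",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To actually submit the document, pass the returned request to `urllib.request.urlopen` while your Paperless instance is running, for example `urllib.request.urlopen(build_upload_request("http://localhost:8000", "<your token>", "scan.pdf", pdf_bytes))`.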
## Best practices {#basic-searching}

Paperless offers a couple of tools that help you organize your document
collection. However, it is up to you to use them in a way that helps you
organize documents and find specific documents when you need them. This
section offers a couple of ideas for managing your collection.

Document types allow you to classify documents according to what they
are. You can define types such as "Receipt", "Invoice", or
"Contract". If you used to collect all your receipts in a single
binder, you can recreate that system in paperless by defining a document
type, assigning documents to that type and then filtering by that type
to see only receipts.

Not all documents need document types. Sometimes it's hard to determine
what the type of a document is, or it is hard to justify creating a
document type that you only need once or twice. This is okay. As long as
the types you define help you organize your collection in the way you
want, paperless is doing its job.

Tags can be used in many different ways. Think of tags as more
versatile folders or binders. If you have a binder for documents related
to university, your car or health care, you can recreate these binders in
paperless by creating tags and assigning them to relevant documents.
Just as with document types, you can filter the document list by tags and
only see documents of a certain topic.

With physical documents, you'll often need to decide which folder the
document belongs to. The advantage of tags over folders and binders is
that a single document can have multiple tags. A physical document
cannot magically appear in two different folders, but with tags, this is
entirely possible.

!!! tip

    This can be used in many different ways. One example: Imagine you're
    working on a particular task, such as signing up for university. Usually
    you'll need to collect a bunch of different documents that are already
    sorted into various folders. With the tag system of paperless, you can
    create a new group of documents that are relevant to this task without
    destroying the already existing organization. When you're done with the
    task, you could delete the tag again, which is equivalent to sorting the
    documents back into the folders they belong in. Or keep the tag; it's up
    to you.

All of the logic above applies to correspondents as well. Attach them to
documents if you feel that they help you organize your collection.

When you've started organizing your documents, create a couple of saved
views for document collections you regularly access. This is equivalent to
having labeled physical binders on your desk, except that these saved
views are dynamic and simply update themselves as you add documents to
the system.

Here are a couple of examples of tags and types that you could use in your
collection.

- An `inbox` tag for newly added documents that you haven't manually
  edited yet.
- A tag `car` for everything car related (repairs, registration,
  insurance, etc.)
- A tag `todo` for documents that you still need to do something with,
  such as replying or performing some task online.
- A tag `bank account x` for all bank statements related to that
  account.
- A tag `mail` for anything that you added to paperless via its mail
  processing capabilities.
- A tag `missing_metadata` when you still need to add some metadata to
  a document, but can't or don't want to do this right now.

## Searching {#basic-usage_searching}

Paperless offers an extensive searching mechanism that is designed to
allow you to quickly find a document you're looking for (for example,
that thing you bought a couple of months ago that just broke, or that
contract you signed 8 years ago).

When you search paperless for a document, it tries to match this query
against your documents. Paperless will look for matching documents by
inspecting their content, title, correspondent, type and tags. Paperless
returns a scored list of results, so that documents matching your query
better will appear further up in the search results.

By default, paperless returns only documents which contain all words
typed in the search bar. However, paperless also offers advanced search
syntax if you want to narrow down the results further.

Matching documents with logical expressions:

```
shopname AND (product1 OR product2)
```

Matching specific tags, correspondents or types:

```
type:invoice tag:unpaid
correspondent:university certificate
```

Matching dates:

```
created:[2005 to 2009]
added:yesterday
modified:today
```

Matching inexact words:

```
produ*name
```

!!! note

    Inexact terms are hard for search indexes. These queries might take a
    while to execute. That's why paperless offers auto complete and query
    correction.

All of these constructs can be combined as you see fit. Paperless uses
Whoosh's default query language; to learn more about it, head over to [Whoosh query
language](https://whoosh.readthedocs.io/en/latest/querylang.html). For
details on what date parsing utilities are available, see [Date
parsing](https://whoosh.readthedocs.io/en/latest/dates.html#parsing-date-queries).

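If you build such queries from a script, a few small helpers keep the syntax consistent. The following Python sketch only assembles query strings like the examples above; the helper names `any_of`, `all_of`, `field_query` and `date_range` are hypothetical and not part of Paperless or Whoosh, and no escaping of special characters is attempted.

```python
def any_of(*terms):
    """Combine terms so that a document matching any one of them is found."""
    return "(" + " OR ".join(terms) + ")"


def all_of(*parts):
    """Combine parts so that a document must match every one of them."""
    return " AND ".join(parts)


def field_query(field, value):
    """Restrict a term to one field, such as type, tag or correspondent."""
    return f"{field}:{value}"


def date_range(field, start, end):
    """Match a date field against an inclusive range."""
    return f"{field}:[{start} to {end}]"


print(all_of("shopname", any_of("product1", "product2")))
# prints: shopname AND (product1 OR product2)
print(field_query("type", "invoice"), field_query("tag", "unpaid"))
# prints: type:invoice tag:unpaid
print(date_range("created", 2005, 2009))
# prints: created:[2005 to 2009]
```

The resulting strings can be typed into the search bar as they are.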
## The recommended workflow {#usage-recommended-workflow}

Once you have familiarized yourself with paperless and are ready to use
it for all your documents, the recommended workflow for managing your
documents is as follows. This workflow also takes into account that some
documents have to be kept in physical form, but still ensures that you
get all the advantages for these documents as well.

The following diagram shows how easy it is to manage your documents.

[Figure: diagram of the recommended workflow]

### Preparations in paperless

- Create an inbox tag that gets assigned to all new documents.
- Create a TODO tag.

### Processing of the physical documents

Keep a physical inbox. Whenever you receive a document that you need to
archive, put it into your inbox. Regularly, do the following for all
documents in your inbox:

1. For each document, decide if you need to keep the document in
   physical form. This applies to certain important documents, such as
   contracts and certificates.
2. If you need to keep the document, write a running number on the
   document before scanning, starting at one and counting upwards. This
   is the archive serial number, or ASN for short.
3. Scan the document.
4. If the document has an ASN assigned, store it in a _single_ binder,
   sorted by ASN. Don't order this binder in any other way.
5. If the document has no ASN, throw it away. Yay!

Over time, you will notice that your physical binder will fill up. When it
is full, label the binder with the range of ASNs in this binder (e.g.,
"Documents 1 to 343"), store the binder in your cellar or elsewhere,
and start a new binder.

The idea behind this process is that you will never have to use the
physical binders to find a document. If you need a specific physical
document, you can find it as follows:

1. Search in paperless for the document.
2. Identify the ASN of the document, since it appears on the scan.
3. Grab the relevant document binder and get the document. This is easy
   since they are sorted by ASN.

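Because binders are filled strictly in ASN order, finding the right binder for a given ASN is a simple range lookup. The following Python sketch illustrates the idea under the assumption that you keep a list of the last ASN stored in each full binder; `binder_label` and `find_binder` are hypothetical helpers and not a feature of Paperless.

```python
from bisect import bisect_left


def binder_label(first_asn, last_asn):
    """Return the label to write on a full binder."""
    return f"Documents {first_asn} to {last_asn}"


def find_binder(last_asns, asn):
    """Return the 1-based number of the binder that holds the given ASN.

    last_asns is the sorted list of the highest ASN in each full binder.
    An ASN beyond the last full binder belongs to the binder that is
    currently being filled.
    """
    if asn < 1:
        raise ValueError("ASNs start at one")
    return bisect_left(last_asns, asn) + 1


full_binders = [343, 702]  # binder 1 holds 1 to 343, binder 2 holds 344 to 702
print(binder_label(1, 343))            # prints: Documents 1 to 343
print(find_binder(full_binders, 344))  # prints: 2
print(find_binder(full_binders, 703))  # prints: 3
```

With only a handful of binders you would do this lookup in your head from the labels; the sketch merely shows why sorting by ASN alone is sufficient.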
### Processing of documents in paperless

Once you have scanned in a document, proceed in paperless as follows.

1. If the document has an ASN, assign the ASN to the document.
2. Assign a correspondent to the document (e.g., your employer, bank,
   etc.). This isn't strictly necessary but helps in finding a document
   when you need it.
3. Assign a document type (e.g., invoice, bank statement, etc.) to the
   document. This isn't strictly necessary but helps in finding a
   document when you need it.
4. Assign a proper title to the document (the name of an item you
   bought, the subject of the letter, etc.).
5. Check that the date of the document is correct. Paperless tries to
   read the date from the content of the document, but this fails
   sometimes if the OCR is bad or multiple dates appear on the
   document.
6. Remove inbox tags from the documents.

!!! tip

    You can set up manual matching rules for your correspondents and tags and
    paperless will assign them automatically. After consuming a couple of
    documents, you can even ask paperless to *learn* when to assign tags and
    correspondents by itself. For details on this feature, see
    [advanced matching](/advanced_usage#matching).

### Task management

Some documents require you to act on them.
You may take two different approaches to handle these documents based on
how regularly you intend to scan documents and use paperless.

- If you scan and process your documents in paperless regularly,
  assign a TODO tag to all scanned documents that you need to process.
  Create a saved view on the dashboard that shows all documents with
  this tag.
- If you do not scan documents regularly and use paperless solely for
  archiving, create a physical TODO box next to your physical inbox
  and put documents you need to process in the TODO box. Once you have
  performed the task associated with the document, move it to the
  inbox.

## Architecture

Paperless-ngx consists of the following components:

- **The webserver:** This serves the administration pages, the API,
    and the new frontend. This is the main tool you'll be using to interact
    with paperless. You may start the webserver directly with

    ```shell-session
    $ cd /path/to/paperless/src/
    $ gunicorn -c ../gunicorn.conf.py paperless.wsgi
    ```

    or by any other means such as Apache `mod_wsgi`.

- **The consumer:** This is what watches your consumption folder for
    documents. However, the consumer itself does not really consume your
    documents. Instead, it notifies a task processor that a new file is ready
    for consumption. I suppose it should be named differently. This was
    also used to check your emails, but that's now done elsewhere as
    well.

    Start the consumer with the management command `document_consumer`:

    ```shell-session
    $ cd /path/to/paperless/src/
    $ python3 manage.py document_consumer
    ```

- **The task processor:** Paperless relies on [Celery - Distributed
    Task Queue](https://docs.celeryq.dev/en/stable/index.html) for doing
    most of the heavy lifting. This is a task queue that accepts tasks
    from multiple sources and processes these in parallel. It also comes
    with a scheduler that executes certain commands periodically.

    This task processor is responsible for:

    - Consuming documents. When the consumer finds new documents, it
      notifies the task processor to start a consumption task.
    - The task processor also performs the consumption of any
      documents you upload through the web interface.
    - Consuming emails. It periodically checks your configured
      accounts for new emails and notifies the task processor to
      consume the attachment of an email.
    - Maintaining the search index and the automatic matching
      algorithm. These are things that paperless needs to do from time
      to time in order to operate properly.

    This allows paperless to process multiple documents from your
    consumption folder in parallel! On a modern multi core system, this
    makes the consumption process with full OCR blazingly fast.

    The task processor comes with a built-in admin interface that you
    can use to check whether any of the tasks have failed and inspect the
    errors (e.g., wrong email credentials, errors during consuming a
    specific file, etc.).

- A [redis](https://redis.io/) message broker: This is a really
    lightweight service that is responsible for getting the tasks from
    the webserver and the consumer to the task scheduler. These run in
    different processes (maybe even on different machines!), so
    this service is necessary.

- Optional: A database server. Paperless supports PostgreSQL, MariaDB
    and SQLite for storing its data.

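The division of labour between the consumer, the message broker and the task processor can be illustrated with a small in-process model. The Python sketch below is only an analogy: a `queue.Queue` stands in for redis, threads stand in for Celery workers, and the functions `consumer`, `worker` and `run_demo` as well as the task name `consume_file` are hypothetical. It is not how Paperless is actually wired together.

```python
import queue
import threading


def consumer(new_files, broker):
    """Stand-in for the consumer: it only announces files, it does not process them."""
    for path in new_files:
        broker.put(("consume_file", path))


def worker(broker, results):
    """Stand-in for a task processor worker: take tasks until told to stop."""
    while True:
        task = broker.get()
        if task is None:  # sentinel: no more work for this worker
            break
        name, path = task
        # A real worker would OCR, index and store the document here.
        results.append(f"{name}:{path}")


def run_demo(new_files, worker_count=2):
    """Run the consumer and several workers against one shared broker."""
    broker = queue.Queue()
    results = []
    workers = [
        threading.Thread(target=worker, args=(broker, results))
        for _ in range(worker_count)
    ]
    for thread in workers:
        thread.start()
    consumer(new_files, broker)
    for _ in workers:
        broker.put(None)  # one sentinel per worker, queued after all tasks
    for thread in workers:
        thread.join()
    return sorted(results)


print(run_demo(["invoice.pdf", "contract.pdf"]))
# prints: ['consume_file:contract.pdf', 'consume_file:invoice.pdf']
```

The point of the model is that the consumer returns as soon as a task is queued, while any number of workers can pick up tasks independently, which is what allows several documents to be processed in parallel.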