mirror of
				https://github.com/paperless-ngx/paperless-ngx.git
				synced 2025-10-24 03:26:11 -05:00 
			
		
		
		
	Compare commits
	
		
			3 Commits
		
	
	
		
			e8957de4a7
			...
			sunset-rtd
		
	
	| Author | SHA1 | Date | |
|---|---|---|---|
|   | 15f4808fec | ||
|   | d531805597 | ||
|   | 304cfc42a9 | 
							
								
								
									
										8
									
								
								docs/_static/css/custom.css
									
									
									
									
										vendored
									
									
								
							
							
						
						
									
										8
									
								
								docs/_static/css/custom.css
									
									
									
									
										vendored
									
									
								
							| @@ -595,3 +595,11 @@ html.writer-html5 .rst-content dl.footnote code { | ||||
| .wy-nav-content-wrap { | ||||
|   z-index: 20; | ||||
| } | ||||
|  | ||||
| .rst-content .toctree-wrapper { | ||||
|   display: none; | ||||
| } | ||||
|  | ||||
| .redirect-notice { | ||||
|   font-size: 2.5rem; | ||||
| } | ||||
|   | ||||
							
								
								
									
										25
									
								
								docs/_templates/layout.html
									
									
									
									
										vendored
									
									
								
							
							
						
						
									
										25
									
								
								docs/_templates/layout.html
									
									
									
									
										vendored
									
									
								
							| @@ -8,6 +8,31 @@ | ||||
|  | ||||
|         document.documentElement.classList.toggle("dark-mode", darkModeState); | ||||
|         document.documentElement.classList.toggle("light-mode", !darkModeState); | ||||
|  | ||||
|         const RTD_TO_MKD = { | ||||
|             "index.html": "", | ||||
|             "setup.html": "setup", | ||||
|             "usage_overview.html": "usage", | ||||
|             "advanced_usage.html": "advanced_usage", | ||||
|             "administration.html": "administration", | ||||
|             "configuration.html": "configuration", | ||||
|             "api.html": "api", | ||||
|             "faq.html": "faq", | ||||
|             "troubleshooting.html": "troubleshooting", | ||||
|             "extending.html": "development", | ||||
|             "scanners.html": "", | ||||
|             "screenshots.html": "", | ||||
|             "changelog.html": "changelog", | ||||
|         } | ||||
|  | ||||
|         const path = (RTD_TO_MKD[window.location.pathname.substring(window.location.pathname.lastIndexOf("/") + 1)] ?? "") + "/"; | ||||
|         const hash = window.location.hash; | ||||
|         const redirectURL = new URL(path  + hash, "https://docs.paperless-ngx.com/"); | ||||
|         console.log(`Redirecting to ${redirectURL} in 3 seconds...`); | ||||
|  | ||||
|         setTimeout(() => { | ||||
|             window.location.replace(redirectURL); | ||||
|         }, 3000); | ||||
|     </script> | ||||
|     {{ super() }} | ||||
| {% endblock %} | ||||
|   | ||||
| @@ -1,531 +1,11 @@ | ||||
| .. _administration: | ||||
|  | ||||
| ************** | ||||
| Administration | ||||
| ************** | ||||
|  | ||||
| .. _administration-backup: | ||||
| .. cssclass:: redirect-notice | ||||
|  | ||||
| Making backups | ||||
| ############## | ||||
|     The Paperless-ngx documentation has permanently moved. | ||||
|  | ||||
| Multiple options exist for making backups of your paperless instance, | ||||
| depending on how you installed paperless. | ||||
|  | ||||
| Before making backups, make sure that paperless is not running. | ||||
|  | ||||
| Options available to any installation of paperless: | ||||
|  | ||||
| *   Use the :ref:`document exporter <utilities-exporter>`. | ||||
|     The document exporter exports all your documents, thumbnails and | ||||
|     metadata to a specific folder. You may import your documents into a | ||||
|     fresh instance of paperless again or store your documents in another | ||||
|     DMS with this export. | ||||
| *   The document exporter is also able to update an already existing export. | ||||
|     Therefore, incremental backups with ``rsync`` are entirely possible. | ||||
|  | ||||
| .. caution:: | ||||
|  | ||||
|     You cannot import the export generated with one version of paperless in a | ||||
|     different version of paperless. The export contains an exact image of the | ||||
|     database, and migrations may change the database layout. | ||||
|  | ||||
| Options available to docker installations: | ||||
|  | ||||
| *   Backup the docker volumes. These usually reside within | ||||
|     ``/var/lib/docker/volumes`` on the host and you need to be root in order | ||||
|     to access them. | ||||
|  | ||||
|     Paperless uses 4 volumes: | ||||
|  | ||||
|     *   ``paperless_media``: This is where your documents are stored. | ||||
|     *   ``paperless_data``: This is where auxillary data is stored. This | ||||
|         folder also contains the SQLite database, if you use it. | ||||
|     *   ``paperless_pgdata``: Exists only if you use PostgreSQL and contains | ||||
|         the database. | ||||
|     *   ``paperless_dbdata``: Exists only if you use MariaDB and contains | ||||
|         the database. | ||||
|  | ||||
| Options available to bare-metal and non-docker installations: | ||||
|  | ||||
| *   Backup the entire paperless folder. This ensures that if your paperless instance | ||||
|     crashes at some point or your disk fails, you can simply copy the folder back | ||||
|     into place and it works. | ||||
|  | ||||
|     When using PostgreSQL or MariaDB, you'll also have to backup the database. | ||||
|  | ||||
| .. _migrating-restoring: | ||||
|  | ||||
| Restoring | ||||
| ========= | ||||
|  | ||||
| .. _administration-updating: | ||||
|  | ||||
| Updating Paperless | ||||
| ################## | ||||
|  | ||||
| Docker Route | ||||
| ============ | ||||
|  | ||||
| If a new release of paperless-ngx is available, upgrading depends on how you | ||||
| installed paperless-ngx in the first place. The releases are available at the | ||||
| `release page <https://github.com/paperless-ngx/paperless-ngx/releases>`_. | ||||
|  | ||||
| First of all, ensure that paperless is stopped. | ||||
|  | ||||
| .. code:: shell-session | ||||
|  | ||||
|     $ cd /path/to/paperless | ||||
|     $ docker-compose down | ||||
|  | ||||
| After that, :ref:`make a backup <administration-backup>`. | ||||
|  | ||||
| A.  If you pull the image from the docker hub, all you need to do is: | ||||
|  | ||||
|     .. code:: shell-session | ||||
|  | ||||
|         $ docker-compose pull | ||||
|         $ docker-compose up | ||||
|  | ||||
|     The docker-compose files refer to the ``latest`` version, which is always the latest | ||||
|     stable release. | ||||
|  | ||||
| B.  If you built the image yourself, do the following: | ||||
|  | ||||
|     .. code:: shell-session | ||||
|  | ||||
|         $ git pull | ||||
|         $ docker-compose build | ||||
|         $ docker-compose up | ||||
|  | ||||
| Running ``docker-compose up`` will also apply any new database migrations. | ||||
| If you see everything working, press CTRL+C once to gracefully stop paperless. | ||||
| Then you can start paperless-ngx with ``-d`` to have it run in the background. | ||||
|  | ||||
|     .. note:: | ||||
|  | ||||
|         In version 0.9.14, the update process was changed. In 0.9.13 and earlier, the | ||||
|         docker-compose files specified exact versions and pull won't automatically | ||||
|         update to newer versions. In order to enable updates as described above, either | ||||
|         get the new ``docker-compose.yml`` file from `here <https://github.com/paperless-ngx/paperless-ngx/tree/master/docker/compose>`_ | ||||
|         or edit the ``docker-compose.yml`` file, find the line that says | ||||
|  | ||||
|             .. code:: | ||||
|  | ||||
|                 image: ghcr.io/paperless-ngx/paperless-ngx:0.9.x | ||||
|  | ||||
|         and replace the version with ``latest``: | ||||
|  | ||||
|             .. code:: | ||||
|  | ||||
|                 image: ghcr.io/paperless-ngx/paperless-ngx:latest | ||||
|  | ||||
|     .. note:: | ||||
|         In version 1.7.1 and onwards, the Docker image can now be pinned to a release series. | ||||
|         This is often combined with automatic updaters such as Watchtower to allow safer | ||||
|         unattended upgrading to new bugfix releases only.  It is still recommended to always | ||||
|         review release notes before upgrading.  To pin your install to a release series, edit | ||||
|         the ``docker-compose.yml`` find the line that says | ||||
|  | ||||
|             .. code:: | ||||
|  | ||||
|                 image: ghcr.io/paperless-ngx/paperless-ngx:latest | ||||
|  | ||||
|         and replace the version with the series you want to track, for example: | ||||
|  | ||||
|             .. code:: | ||||
|  | ||||
|                 image: ghcr.io/paperless-ngx/paperless-ngx:1.7 | ||||
|  | ||||
| Bare Metal Route | ||||
| ================ | ||||
|  | ||||
| After grabbing the new release and unpacking the contents, do the following: | ||||
|  | ||||
| 1.  Update dependencies. New paperless version may require additional | ||||
|     dependencies. The dependencies required are listed in the section about | ||||
|     :ref:`bare metal installations <setup-bare_metal>`. | ||||
|  | ||||
| 2.  Update python requirements. Keep in mind to activate your virtual environment | ||||
|     before that, if you use one. | ||||
|  | ||||
|     .. code:: shell-session | ||||
|  | ||||
|         $ pip install -r requirements.txt | ||||
|  | ||||
| 3.  Migrate the database. | ||||
|  | ||||
|     .. code:: shell-session | ||||
|  | ||||
|         $ cd src | ||||
|         $ python3 manage.py migrate | ||||
|  | ||||
|     This might not actually do anything. Not every new paperless version comes with new | ||||
|     database migrations. | ||||
|  | ||||
| Downgrading Paperless | ||||
| ##################### | ||||
|  | ||||
| Downgrades are possible. However, some updates also contain database migrations (these change the layout of the database and may move data). | ||||
| In order to move back from a version that applied database migrations, you'll have to revert the database migration *before* downgrading, | ||||
| and then downgrade paperless. | ||||
|  | ||||
| This table lists the compatible versions for each database migration number. | ||||
|  | ||||
| +------------------+-----------------+ | ||||
| | Migration number | Version range   | | ||||
| +------------------+-----------------+ | ||||
| | 1011             | 1.0.0           | | ||||
| +------------------+-----------------+ | ||||
| | 1012             | 1.1.0 - 1.2.1   | | ||||
| +------------------+-----------------+ | ||||
| | 1014             | 1.3.0 - 1.3.1   | | ||||
| +------------------+-----------------+ | ||||
| | 1016             | 1.3.2 - current | | ||||
| +------------------+-----------------+ | ||||
|  | ||||
| Execute the following management command to migrate your database: | ||||
|  | ||||
| .. code:: shell-session | ||||
|  | ||||
|     $ python3 manage.py migrate documents <migration number> | ||||
|  | ||||
| .. note:: | ||||
|  | ||||
|     Some migrations cannot be undone. The command will issue errors if that happens. | ||||
|  | ||||
| .. _utilities-management-commands: | ||||
|  | ||||
| Management utilities | ||||
| #################### | ||||
|  | ||||
| Paperless comes with some management commands that perform various maintenance | ||||
| tasks on your paperless instance. You can invoke these commands in the following way: | ||||
|  | ||||
| With docker-compose, while paperless is running: | ||||
|  | ||||
| .. code:: shell-session | ||||
|  | ||||
|     $ cd /path/to/paperless | ||||
|     $ docker-compose exec webserver <command> <arguments> | ||||
|  | ||||
| With docker, while paperless is running: | ||||
|  | ||||
| .. code:: shell-session | ||||
|  | ||||
|     $ docker exec -it <container-name> <command> <arguments> | ||||
|  | ||||
| Bare metal: | ||||
|  | ||||
| .. code:: shell-session | ||||
|  | ||||
|     $ cd /path/to/paperless/src | ||||
|     $ python3 manage.py <command> <arguments> | ||||
|  | ||||
| All commands have built-in help, which can be accessed by executing them with | ||||
| the argument ``--help``. | ||||
|  | ||||
| .. _utilities-exporter: | ||||
|  | ||||
| Document exporter | ||||
| ================= | ||||
|  | ||||
| The document exporter exports all your data from paperless into a folder for | ||||
| backup or migration to another DMS. | ||||
|  | ||||
| If you use the document exporter within a cronjob to backup your data you might use the ``-T`` flag behind exec to suppress "The input device is not a TTY" errors. For example: ``docker-compose exec -T webserver document_exporter ../export`` | ||||
|  | ||||
| .. code:: | ||||
|  | ||||
|     document_exporter target [-c] [-f] [-d] | ||||
|  | ||||
|     optional arguments: | ||||
|     -c, --compare-checksums | ||||
|     -f, --use-filename-format | ||||
|     -d, --delete | ||||
|  | ||||
| ``target`` is a folder to which the data gets written. This includes documents, | ||||
| thumbnails and a ``manifest.json`` file. The manifest contains all metadata from | ||||
| the database (correspondents, tags, etc). | ||||
|  | ||||
| When you use the provided docker compose script, specify ``../export`` as the | ||||
| target. This path inside the container is automatically mounted on your host on | ||||
| the folder ``export``. | ||||
|  | ||||
| If the target directory already exists and contains files, paperless will assume | ||||
| that the contents of the export directory are a previous export and will attempt | ||||
| to update the previous export. Paperless will only export changed and added files. | ||||
| Paperless determines whether a file has changed by inspecting the file attributes | ||||
| "date/time modified" and "size". If that does not work out for you, specify | ||||
| ``--compare-checksums`` and paperless will attempt to compare file checksums instead. | ||||
| This is slower. | ||||
|  | ||||
| Paperless will not remove any existing files in the export directory. If you want | ||||
| paperless to also remove files that do not belong to the current export such as files | ||||
| from deleted documents, specify ``--delete``. Be careful when pointing paperless to | ||||
| a directory that already contains other files. | ||||
|  | ||||
| The filenames generated by this command follow the format | ||||
| ``[date created] [correspondent] [title].[extension]``. | ||||
| If you want paperless to use ``PAPERLESS_FILENAME_FORMAT`` for exported filenames | ||||
| instead, specify ``--use-filename-format``. | ||||
|  | ||||
|  | ||||
| .. _utilities-importer: | ||||
|  | ||||
| Document importer | ||||
| ================= | ||||
|  | ||||
| The document importer takes the export produced by the `Document exporter`_ and | ||||
| imports it into paperless. | ||||
|  | ||||
| The importer works just like the exporter.  You point it at a directory, and | ||||
| the script does the rest of the work: | ||||
|  | ||||
| .. code:: | ||||
|  | ||||
|     document_importer source | ||||
|  | ||||
| When you use the provided docker compose script, put the export inside the | ||||
| ``export`` folder in your paperless source directory. Specify ``../export`` | ||||
| as the ``source``. | ||||
|  | ||||
| .. note:: | ||||
|  | ||||
|     Importing from a previous version of Paperless may work, but for best results | ||||
|     it is suggested to match the versions. | ||||
|  | ||||
| .. _utilities-retagger: | ||||
|  | ||||
| Document retagger | ||||
| ================= | ||||
|  | ||||
| Say you've imported a few hundred documents and now want to introduce | ||||
| a tag or set up a new correspondent, and apply its matching to all of | ||||
| the currently-imported docs. This problem is common enough that | ||||
| there are tools for it. | ||||
|  | ||||
| .. code:: | ||||
|  | ||||
|     document_retagger [-h] [-c] [-T] [-t] [-i] [--use-first] [-f] | ||||
|  | ||||
|     optional arguments: | ||||
|     -c, --correspondent | ||||
|     -T, --tags | ||||
|     -t, --document_type | ||||
|     -s, --storage_path | ||||
|     -i, --inbox-only | ||||
|     --use-first | ||||
|     -f, --overwrite | ||||
|  | ||||
| Run this after changing or adding matching rules. It'll loop over all | ||||
| of the documents in your database and attempt to match documents | ||||
| according to the new rules. | ||||
|  | ||||
| Specify any combination of ``-c``, ``-T``, ``-t`` and ``-s`` to have the | ||||
| retagger perform matching of the specified metadata type. If you don't | ||||
| specify any of these options, the document retagger won't do anything. | ||||
|  | ||||
| Specify ``-i`` to have the document retagger work on documents tagged | ||||
| with inbox tags only. This is useful when you don't want to mess with | ||||
| your already processed documents. | ||||
|  | ||||
| When multiple document types or correspondents match a single document, | ||||
| the retagger won't assign these to the document. Specify ``--use-first`` | ||||
| to override this behavior and just use the first correspondent or type | ||||
| it finds. This option does not apply to tags, since any amount of tags | ||||
| can be applied to a document. | ||||
|  | ||||
| Finally, ``-f`` specifies that you wish to overwrite already assigned | ||||
| correspondents, types and/or tags. The default behavior is to not | ||||
| assign correspondents and types to documents that have this data already | ||||
| assigned. ``-f`` works differently for tags: By default, only additional tags get | ||||
| added to documents, no tags will be removed. With ``-f``, tags that don't | ||||
| match a document anymore get removed as well. | ||||
|  | ||||
|  | ||||
| Managing the Automatic matching algorithm | ||||
| ========================================= | ||||
|  | ||||
| The *Auto* matching algorithm requires a trained neural network to work. | ||||
| This network needs to be updated whenever somethings in your data | ||||
| changes. The docker image takes care of that automatically with the task | ||||
| scheduler. You can manually renew the classifier by invoking the following | ||||
| management command: | ||||
|  | ||||
| .. code:: | ||||
|  | ||||
|     document_create_classifier | ||||
|  | ||||
| This command takes no arguments. | ||||
|  | ||||
| .. _`administration-index`: | ||||
|  | ||||
| Managing the document search index | ||||
| ================================== | ||||
|  | ||||
| The document search index is responsible for delivering search results for the | ||||
| website. The document index is automatically updated whenever documents get | ||||
| added to, changed, or removed from paperless. However, if the search yields | ||||
| non-existing documents or won't find anything, you may need to recreate the | ||||
| index manually. | ||||
|  | ||||
| .. code:: | ||||
|  | ||||
|     document_index {reindex,optimize} | ||||
|  | ||||
| Specify ``reindex`` to have the index created from scratch. This may take some | ||||
| time. | ||||
|  | ||||
| Specify ``optimize`` to optimize the index. This updates certain aspects of | ||||
| the index and usually makes queries faster and also ensures that the | ||||
| autocompletion works properly. This command is regularly invoked by the task | ||||
| scheduler. | ||||
|  | ||||
| .. _utilities-renamer: | ||||
|  | ||||
| Managing filenames | ||||
| ================== | ||||
|  | ||||
| If you use paperless' feature to | ||||
| :ref:`assign custom filenames to your documents <advanced-file_name_handling>`, | ||||
| you can use this command to move all your files after changing | ||||
| the naming scheme. | ||||
|  | ||||
| .. warning:: | ||||
|  | ||||
|     Since this command moves your documents, it is advised to do | ||||
|     a backup beforehand. The renaming logic is robust and will never overwrite | ||||
|     or delete a file, but you can't ever be careful enough. | ||||
|  | ||||
| .. code:: | ||||
|  | ||||
|     document_renamer | ||||
|  | ||||
| The command takes no arguments and processes all your documents at once. | ||||
|  | ||||
| Learn how to use :ref:`Management Utilities<utilities-management-commands>`. | ||||
|  | ||||
|  | ||||
| .. _utilities-sanity-checker: | ||||
|  | ||||
| Sanity checker | ||||
| ============== | ||||
|  | ||||
| Paperless has a built-in sanity checker that inspects your document collection for issues. | ||||
|  | ||||
| The issues detected by the sanity checker are as follows: | ||||
|  | ||||
| * Missing original files. | ||||
| * Missing archive files. | ||||
| * Inaccessible original files due to improper permissions. | ||||
| * Inaccessible archive files due to improper permissions. | ||||
| * Corrupted original documents by comparing their checksum against what is stored in the database. | ||||
| * Corrupted archive documents by comparing their checksum against what is stored in the database. | ||||
| * Missing thumbnails. | ||||
| * Inaccessible thumbnails due to improper permissions. | ||||
| * Documents without any content (warning). | ||||
| * Orphaned files in the media directory (warning). These are files that are not referenced by any document im paperless. | ||||
|  | ||||
|  | ||||
| .. code:: | ||||
|  | ||||
|     document_sanity_checker | ||||
|  | ||||
| The command takes no arguments. Depending on the size of your document archive, this may take some time. | ||||
|  | ||||
|  | ||||
| Fetching e-mail | ||||
| =============== | ||||
|  | ||||
| Paperless automatically fetches your e-mail every 10 minutes by default. If | ||||
| you want to invoke the email consumer manually, call the following management | ||||
| command: | ||||
|  | ||||
| .. code:: | ||||
|  | ||||
|     mail_fetcher | ||||
|  | ||||
| The command takes no arguments and processes all your mail accounts and rules. | ||||
|  | ||||
|  .. note:: | ||||
|  | ||||
|     As of October 2022 Microsoft no longer supports IMAP authentication for Exchange | ||||
|     servers, thus Exchange is no longer supported until a solution is implemented in | ||||
|     the Python IMAP library used by Paperless. See  `learn.microsoft.com`_ | ||||
|  | ||||
| .. _learn.microsoft.com: https://learn.microsoft.com/en-us/exchange/clients-and-mobile-in-exchange-online/deprecation-of-basic-authentication-exchange-online | ||||
|  | ||||
| .. _utilities-archiver: | ||||
|  | ||||
| Creating archived documents | ||||
| =========================== | ||||
|  | ||||
| Paperless stores archived PDF/A documents alongside your original documents. | ||||
| These archived documents will also contain selectable text for image-only | ||||
| originals. | ||||
| These documents are derived from the originals, which are always stored | ||||
| unmodified. If coming from an earlier version of paperless, your documents | ||||
| won't have archived versions. | ||||
|  | ||||
| This command creates PDF/A documents for your documents. | ||||
|  | ||||
| .. code:: | ||||
|  | ||||
|     document_archiver --overwrite --document <id> | ||||
|  | ||||
| This command will only attempt to create archived documents when no archived | ||||
| document exists yet, unless ``--overwrite`` is specified. If ``--document <id>`` | ||||
| is specified, the archiver will only process that document. | ||||
|  | ||||
| .. note:: | ||||
|  | ||||
|     This command essentially performs OCR on all your documents again, | ||||
|     according to your settings. If you run this with ``PAPERLESS_OCR_MODE=redo``, | ||||
|     it will potentially run for a very long time. You can cancel the command | ||||
|     at any time, since this command will skip already archived versions the next time | ||||
|     it is run. | ||||
|  | ||||
| .. note:: | ||||
|  | ||||
|     Some documents will cause errors and cannot be converted into PDF/A documents, | ||||
|     such as encrypted PDF documents. The archiver will skip over these documents | ||||
|     each time it sees them. | ||||
|  | ||||
| .. _utilities-encyption: | ||||
|  | ||||
| Managing encryption | ||||
| =================== | ||||
|  | ||||
| Documents can be stored in Paperless using GnuPG encryption. | ||||
|  | ||||
| .. danger:: | ||||
|  | ||||
|     Encryption is deprecated since paperless-ngx 0.9 and doesn't really provide any | ||||
|     additional security, since you have to store the passphrase in a configuration | ||||
|     file on the same system as the encrypted documents for paperless to work. | ||||
|     Furthermore, the entire text content of the documents is stored plain in the | ||||
|     database, even if your documents are encrypted. Filenames are not encrypted as | ||||
|     well. | ||||
|  | ||||
|     Also, the web server provides transparent access to your encrypted documents. | ||||
|  | ||||
|     Consider running paperless on an encrypted filesystem instead, which will then | ||||
|     at least provide security against physical hardware theft. | ||||
|  | ||||
|  | ||||
| Enabling encryption | ||||
| ------------------- | ||||
|  | ||||
| Enabling encryption is no longer supported. | ||||
|  | ||||
|  | ||||
| Disabling encryption | ||||
| -------------------- | ||||
|  | ||||
| Basic usage to disable encryption of your document store: | ||||
|  | ||||
| (Note: If ``PAPERLESS_PASSPHRASE`` isn't set already, you need to specify it here) | ||||
|  | ||||
| .. code:: | ||||
|  | ||||
|     decrypt_documents [--passphrase SECR3TP4SSPHRA$E] | ||||
|     You will be redirected shortly... | ||||
|   | ||||
| @@ -1,447 +1,11 @@ | ||||
| .. _advanced_usage: | ||||
|  | ||||
| *************** | ||||
| Advanced topics | ||||
| *************** | ||||
|  | ||||
| Paperless offers a couple features that automate certain tasks and make your life | ||||
| easier. | ||||
| .. cssclass:: redirect-notice | ||||
|  | ||||
| .. _advanced-matching: | ||||
|     The Paperless-ngx documentation has permanently moved. | ||||
|  | ||||
| Matching tags, correspondents, document types, and storage paths | ||||
| ################################################################ | ||||
|  | ||||
| Paperless will compare the matching algorithms defined by every tag, correspondent, | ||||
| document type, and storage path in your database to see if they apply to the text | ||||
| in a document. In other words, if you define a tag called ``Home Utility`` | ||||
| that had a ``match`` property of ``bc hydro`` and a ``matching_algorithm`` of | ||||
| ``literal``, Paperless will automatically tag your newly-consumed document with | ||||
| your ``Home Utility`` tag so long as the text ``bc hydro`` appears in the body | ||||
| of the document somewhere. | ||||
|  | ||||
| The matching logic is quite powerful. It supports searching the text of your | ||||
| document with different algorithms, and as such, some experimentation may be | ||||
| necessary to get things right. | ||||
|  | ||||
| In order to have a tag, correspondent, document type, or storage path assigned | ||||
| automatically to newly consumed documents, assign a match and matching algorithm | ||||
| using the web interface. These settings define when to assign tags, correspondents, | ||||
| document types, and storage paths to documents. | ||||
|  | ||||
| The following algorithms are available: | ||||
|  | ||||
| * **Any:** Looks for any occurrence of any word provided in match in the PDF. | ||||
|   If you define the match as ``Bank1 Bank2``, it will match documents containing | ||||
|   either of these terms. | ||||
| * **All:** Requires that every word provided appears in the PDF, albeit not in the | ||||
|   order provided. | ||||
| * **Literal:** Matches only if the match appears exactly as provided (i.e. preserve ordering) in the PDF. | ||||
| * **Regular expression:** Parses the match as a regular expression and tries to | ||||
|   find a match within the document. | ||||
| * **Fuzzy match:** I don't know. Look at the source. | ||||
| * **Auto:** Tries to automatically match new documents. This does not require you | ||||
|   to set a match. See the notes below. | ||||
|  | ||||
| When using the *any* or *all* matching algorithms, you can search for terms | ||||
| that consist of multiple words by enclosing them in double quotes. For example, | ||||
| defining a match text of ``"Bank of America" BofA`` using the *any* algorithm, | ||||
| will match documents that contain either "Bank of America" or "BofA", but will | ||||
| not match documents containing "Bank of South America". | ||||
|  | ||||
| Then just save your tag, correspondent, document type, or storage path and run | ||||
| another document through the consumer.  Once complete, you should see the | ||||
| newly-created document, automatically tagged with the appropriate data. | ||||
|  | ||||
|  | ||||
| .. _advanced-automatic_matching: | ||||
|  | ||||
| Automatic matching | ||||
| ================== | ||||
|  | ||||
| Paperless-ngx comes with a new matching algorithm called *Auto*. This matching | ||||
| algorithm tries to assign tags, correspondents, document types, and storage paths | ||||
| to your documents based on how you have already assigned these on existing documents. | ||||
| It uses a neural network under the hood. | ||||
|  | ||||
| If, for example, all your bank statements of your account 123 at the Bank of | ||||
| America are tagged with the tag "bofa_123" and the matching algorithm of this | ||||
| tag is set to *Auto*, this neural network will examine your documents and | ||||
| automatically learn when to assign this tag. | ||||
|  | ||||
| Paperless tries to hide much of the involved complexity with this approach. | ||||
| However, there are a couple caveats you need to keep in mind when using this | ||||
| feature: | ||||
|  | ||||
| * Changes to your documents are not immediately reflected by the matching | ||||
|   algorithm. The neural network needs to be *trained* on your documents after | ||||
|   changes. Paperless periodically (default: once each hour) checks for changes | ||||
|   and does this automatically for you. | ||||
| * The Auto matching algorithm only takes documents into account which are NOT | ||||
|   placed in your inbox (i.e. have any inbox tags assigned to them). This ensures | ||||
|   that the neural network only learns from documents which you have correctly | ||||
|   tagged before. | ||||
| * The matching algorithm can only work if there is a correlation between the | ||||
|   tag, correspondent, document type, or storage path and the document itself. | ||||
|   Your bank statements usually contain your bank account number and the name | ||||
|   of the bank, so this works reasonably well, However, tags such as "TODO" | ||||
|   cannot be automatically assigned. | ||||
| * The matching algorithm needs a reasonable number of documents to identify when | ||||
|   to assign tags, correspondents, storage paths, and types. If one out of a | ||||
|   thousand documents has the correspondent "Very obscure web shop I bought | ||||
|   something five years ago", it will probably not assign this correspondent | ||||
|   automatically if you buy something from them again. The more documents, the better. | ||||
| * Paperless also needs a reasonable amount of negative examples to decide when | ||||
|   not to assign a certain tag, correspondent, document type, or storage path. This will | ||||
|   usually be the case as you start filling up paperless with documents. | ||||
|   Example: If all your documents are either from "Webshop" and "Bank", paperless | ||||
|   will assign one of these correspondents to ANY new document, if both are set | ||||
|   to automatic matching. | ||||
|  | ||||
| Hooking into the consumption process | ||||
| #################################### | ||||
|  | ||||
| Sometimes you may want to do something arbitrary whenever a document is | ||||
| consumed.  Rather than try to predict what you may want to do, Paperless lets | ||||
| you execute scripts of your own choosing just before or after a document is | ||||
| consumed using a couple simple hooks. | ||||
|  | ||||
| Just write a script, put it somewhere that Paperless can read & execute, and | ||||
| then put the path to that script in ``paperless.conf`` or ``docker-compose.env`` with the variable name | ||||
| of either ``PAPERLESS_PRE_CONSUME_SCRIPT`` or | ||||
| ``PAPERLESS_POST_CONSUME_SCRIPT``. | ||||
|  | ||||
| .. important:: | ||||
|  | ||||
|     These scripts are executed in a **blocking** process, which means that if | ||||
|     a script takes a long time to run, it can significantly slow down your | ||||
|     document consumption flow.  If you want things to run asynchronously, | ||||
|     you'll have to fork the process in your script and exit. | ||||
|  | ||||
|  | ||||
| Pre-consumption script | ||||
| ====================== | ||||
|  | ||||
| Executed after the consumer sees a new document in the consumption folder, but | ||||
| before any processing of the document is performed. This script can access the | ||||
| following relevant environment variables set: | ||||
|  | ||||
| * ``DOCUMENT_SOURCE_PATH`` | ||||
|  | ||||
| A simple but common example for this would be creating a simple script like | ||||
| this: | ||||
|  | ||||
| ``/usr/local/bin/ocr-pdf`` | ||||
|  | ||||
| .. code:: bash | ||||
|  | ||||
|     #!/usr/bin/env bash | ||||
|     pdf2pdfocr.py -i ${DOCUMENT_SOURCE_PATH} | ||||
|  | ||||
| ``/etc/paperless.conf`` | ||||
|  | ||||
| .. code:: bash | ||||
|  | ||||
|     ... | ||||
|     PAPERLESS_PRE_CONSUME_SCRIPT="/usr/local/bin/ocr-pdf" | ||||
|     ... | ||||
|  | ||||
| This will pass the path to the document about to be consumed to ``/usr/local/bin/ocr-pdf``, | ||||
| which will in turn call `pdf2pdfocr.py`_ on your document, which will then | ||||
| overwrite the file with an OCR'd version of the file and exit.  At which point, | ||||
| the consumption process will begin with the newly modified file. | ||||
|  | ||||
| The script's stdout and stderr will be logged line by line to the webserver log, along | ||||
| with the exit code of the script. | ||||
|  | ||||
| .. _pdf2pdfocr.py: https://github.com/LeoFCardoso/pdf2pdfocr | ||||
|  | ||||
| .. _advanced-post_consume_script: | ||||
|  | ||||
| Post-consumption script | ||||
| ======================= | ||||
|  | ||||
| Executed after the consumer has successfully processed a document and has moved it | ||||
| into paperless. It receives the following environment variables: | ||||
|  | ||||
| * ``DOCUMENT_ID`` | ||||
| * ``DOCUMENT_FILE_NAME`` | ||||
| * ``DOCUMENT_CREATED`` | ||||
| * ``DOCUMENT_MODIFIED`` | ||||
| * ``DOCUMENT_ADDED`` | ||||
| * ``DOCUMENT_SOURCE_PATH`` | ||||
| * ``DOCUMENT_ARCHIVE_PATH`` | ||||
| * ``DOCUMENT_THUMBNAIL_PATH`` | ||||
| * ``DOCUMENT_DOWNLOAD_URL`` | ||||
| * ``DOCUMENT_THUMBNAIL_URL`` | ||||
| * ``DOCUMENT_CORRESPONDENT`` | ||||
| * ``DOCUMENT_TAGS`` | ||||
| * ``DOCUMENT_ORIGINAL_FILENAME`` | ||||
|  | ||||
| The script can be in any language, but for a simple shell script | ||||
| example, you can take a look at `post-consumption-example.sh`_ in this project. | ||||
|  | ||||
| The post consumption script cannot cancel the consumption process. | ||||
|  | ||||
| The script's stdout and stderr will be logged line by line to the webserver log, along | ||||
| with the exit code of the script. | ||||
|  | ||||
|  | ||||
| Docker | ||||
| ------ | ||||
| Assumed you have ``/home/foo/paperless-ngx/scripts/post-consumption-example.sh``. | ||||
|  | ||||
| You can pass that script into the consumer container via a host mount in your ``docker-compose.yml``. | ||||
|  | ||||
| .. code:: bash | ||||
|  | ||||
|   ... | ||||
|   consumer: | ||||
|     ... | ||||
|     volumes: | ||||
|       ... | ||||
|       - /home/paperless-ngx/scripts:/path/in/container/scripts/ | ||||
|   ... | ||||
|  | ||||
| Example (docker-compose.yml): ``- /home/foo/paperless-ngx/scripts:/usr/src/paperless/scripts`` | ||||
|  | ||||
| which in turn requires the variable ``PAPERLESS_POST_CONSUME_SCRIPT`` in ``docker-compose.env``  to point to ``/path/in/container/scripts/post-consumption-example.sh``. | ||||
|  | ||||
| Example (docker-compose.env): ``PAPERLESS_POST_CONSUME_SCRIPT=/usr/src/paperless/scripts/post-consumption-example.sh`` | ||||
|  | ||||
| Troubleshooting: | ||||
|  | ||||
| - Monitor the docker-compose log ``cd ~/paperless-ngx; docker-compose logs -f`` | ||||
| - Check your script's permission e.g. in case of permission error ``sudo chmod 755 post-consumption-example.sh`` | ||||
| - Pipe your scripts's output to a log file e.g. ``echo "${DOCUMENT_ID}" | tee --append /usr/src/paperless/scripts/post-consumption-example.log`` | ||||
|  | ||||
| .. _post-consumption-example.sh: https://github.com/paperless-ngx/paperless-ngx/blob/main/scripts/post-consumption-example.sh | ||||
|  | ||||
| .. _advanced-file_name_handling: | ||||
|  | ||||
| File name handling | ||||
| ################## | ||||
|  | ||||
| By default, paperless stores your documents in the media directory and renames them | ||||
| using the identifier which it has assigned to each document. You will end up getting | ||||
| files like ``0000123.pdf`` in your media directory. This isn't necessarily a bad | ||||
| thing, because you normally don't have to access these files manually. However, if | ||||
| you wish to name your files differently, you can do that by adjusting the | ||||
| ``PAPERLESS_FILENAME_FORMAT`` configuration option. Paperless adds the correct | ||||
| file extension e.g. ``.pdf``, ``.jpg`` automatically. | ||||
|  | ||||
| This variable allows you to configure the filename (folders are allowed) using | ||||
| placeholders. For example, configuring this to | ||||
|  | ||||
| .. code:: bash | ||||
|  | ||||
|     PAPERLESS_FILENAME_FORMAT={created_year}/{correspondent}/{title} | ||||
|  | ||||
| will create a directory structure as follows: | ||||
|  | ||||
| .. code:: | ||||
|  | ||||
|     2019/ | ||||
|       My bank/ | ||||
|         Statement January.pdf | ||||
|         Statement February.pdf | ||||
|     2020/ | ||||
|       My bank/ | ||||
|         Statement January.pdf | ||||
|         Letter.pdf | ||||
|         Letter_01.pdf | ||||
|       Shoe store/ | ||||
|         My new shoes.pdf | ||||
|  | ||||
| .. danger:: | ||||
|  | ||||
|     Do not manually move your files in the media folder. Paperless remembers the | ||||
|     last filename a document was stored as. If you do rename a file, paperless will | ||||
|     report your files as missing and won't be able to find them. | ||||
|  | ||||
| Paperless provides the following placeholders within filenames: | ||||
|  | ||||
| * ``{asn}``: The archive serial number of the document, or "none". | ||||
| * ``{correspondent}``: The name of the correspondent, or "none". | ||||
| * ``{document_type}``: The name of the document type, or "none". | ||||
| * ``{tag_list}``: A comma separated list of all tags assigned to the document. | ||||
| * ``{title}``: The title of the document. | ||||
| * ``{created}``: The full date (ISO format) the document was created. | ||||
| * ``{created_year}``: Year created only, formatted as the year with century. | ||||
| * ``{created_year_short}``: Year created only, formatted as the year without century, zero padded. | ||||
| * ``{created_month}``: Month created only (number 01-12). | ||||
| * ``{created_month_name}``: Month created name, as per locale | ||||
| * ``{created_month_name_short}``: Month created abbreviated name, as per locale | ||||
| * ``{created_day}``: Day created only (number 01-31). | ||||
| * ``{added}``: The full date (ISO format) the document was added to paperless. | ||||
| * ``{added_year}``: Year added only. | ||||
| * ``{added_year_short}``: Year added only, formatted as the year without century, zero padded. | ||||
| * ``{added_month}``: Month added only (number 01-12). | ||||
| * ``{added_month_name}``: Month added name, as per locale | ||||
| * ``{added_month_name_short}``: Month added abbreviated name, as per locale | ||||
| * ``{added_day}``: Day added only (number 01-31). | ||||
|  | ||||
|  | ||||
| Paperless will try to conserve the information from your database as much as possible. | ||||
| However, some characters that you can use in document titles and correspondent names (such | ||||
| as ``: \ /`` and a couple more) are not allowed in filenames and will be replaced with dashes. | ||||
|  | ||||
| If paperless detects that two documents share the same filename, paperless will automatically | ||||
| append ``_01``, ``_02``, etc to the filename. This happens if all the placeholders in a filename | ||||
| evaluate to the same value. | ||||
|  | ||||
| .. hint:: | ||||
|     You can affect how empty placeholders are treated by changing the following setting to | ||||
|     `true`. | ||||
|  | ||||
|     .. code:: | ||||
|  | ||||
|         PAPERLESS_FILENAME_FORMAT_REMOVE_NONE=True | ||||
|  | ||||
|     Doing this results in all empty placeholders resolving to "" instead of "none" as stated above. | ||||
|     Spaces before empty placeholders are removed as well, empty directories are omitted. | ||||
|  | ||||
| .. hint:: | ||||
|  | ||||
|     Paperless checks the filename of a document whenever it is saved. Therefore, | ||||
|     you need to update the filenames of your documents and move them after altering | ||||
|     this setting by invoking the :ref:`document renamer <utilities-renamer>`. | ||||
|  | ||||
| .. warning:: | ||||
|  | ||||
|     Make absolutely sure you get the spelling of the placeholders right, or else | ||||
|     paperless will use the default naming scheme instead. | ||||
|  | ||||
| .. caution:: | ||||
|  | ||||
|     As of now, you could totally tell paperless to store your files anywhere outside | ||||
|     the media directory by setting | ||||
|  | ||||
|     .. code:: | ||||
|  | ||||
|         PAPERLESS_FILENAME_FORMAT=../../my/custom/location/{title} | ||||
|  | ||||
|     However, keep in mind that inside docker, if files get stored outside of the | ||||
|     predefined volumes, they will be lost after a restart of paperless. | ||||
|  | ||||
|  | ||||
| Storage paths | ||||
| ############# | ||||
|  | ||||
| One of the best things in Paperless is that you can not only access the documents via the | ||||
| web interface, but also via the file system. | ||||
|  | ||||
| When as single storage layout is not sufficient for your use case, storage paths come to | ||||
| the rescue. Storage paths allow you to configure more precisely where each document is stored | ||||
| in the file system. | ||||
|  | ||||
| - Each storage path is a `PAPERLESS_FILENAME_FORMAT` and follows the rules described above | ||||
| - Each document is assigned a storage path using the matching algorithms described above, but | ||||
|   can be overwritten at any time | ||||
|  | ||||
| For example, you could define the following two storage paths: | ||||
|  | ||||
| 1. Normal communications are put into a folder structure sorted by `year/correspondent` | ||||
| 2. Communications with insurance companies are stored in a flat structure with longer file names, | ||||
|    but containing the full date of the correspondence. | ||||
|  | ||||
| .. code:: | ||||
|  | ||||
|     By Year = {created_year}/{correspondent}/{title} | ||||
|     Insurances = Insurances/{correspondent}/{created_year}-{created_month}-{created_day} {title} | ||||
|  | ||||
|  | ||||
| If you then map these storage paths to the documents, you might get the following result. | ||||
| For simplicity, `By Year` defines the same structure as in the previous example above. | ||||
|  | ||||
| .. code:: text | ||||
|  | ||||
|    2019/                                   # By Year | ||||
|       My bank/ | ||||
|         Statement January.pdf | ||||
|         Statement February.pdf | ||||
|  | ||||
|     Insurances/                           # Insurances | ||||
|       Healthcare 123/ | ||||
|         2022-01-01 Statement January.pdf | ||||
|         2022-02-02 Letter.pdf | ||||
|         2022-02-03 Letter.pdf | ||||
|       Dental 456/ | ||||
|         2021-12-01 New Conditions.pdf | ||||
|  | ||||
|  | ||||
| .. hint:: | ||||
|  | ||||
|     Defining a storage path is optional. If no storage path is defined for a document, the global | ||||
|     `PAPERLESS_FILENAME_FORMAT` is applied. | ||||
|  | ||||
| .. caution:: | ||||
|  | ||||
|     If you adjust the format of an existing storage path, old documents don't get relocated automatically. | ||||
|     You need to run the :ref:`document renamer <utilities-renamer>` to adjust their pathes. | ||||
|  | ||||
| .. _advanced-celery-monitoring: | ||||
|  | ||||
| Celery Monitoring | ||||
| ################# | ||||
|  | ||||
| The monitoring tool `Flower <https://flower.readthedocs.io/en/latest/index.html>`_ can be used to view more | ||||
| detailed information about the health of the celery workers used for asynchronous tasks.  This includes details | ||||
| on currently running, queued and completed tasks, timing and more.  Flower can also be used with Prometheus, as it | ||||
| exports metrics.  For details on its capabilities, refer to the Flower documentation. | ||||
|  | ||||
| To configure Flower further, create a `flowerconfig.py` and place it into the `src/paperless` directory.  For | ||||
| a Docker installation, you can use volumes to accomplish this: | ||||
|  | ||||
| .. code:: yaml | ||||
|  | ||||
|     services: | ||||
|       # ... | ||||
|       webserver: | ||||
|         # ... | ||||
|         volumes: | ||||
|           - /path/to/my/flowerconfig.py:/usr/src/paperless/src/paperless/flowerconfig.py:ro | ||||
|  | ||||
| Custom Container Initialization | ||||
| ############################### | ||||
|  | ||||
| The Docker image includes the ability to run custom user scripts during startup.  This could be | ||||
| utilized for installing additional tools or Python packages, for example. | ||||
|  | ||||
| To utilize this, mount a folder containing your scripts to the custom initialization directory, `/custom-cont-init.d` | ||||
| and place scripts you wish to run inside.  For security, the folder and its contents must be owned by `root`. | ||||
| Additionally, scripts must only be writable by `root`. | ||||
|  | ||||
| Your scripts will be run directly before the webserver completes startup.  Scripts will be run by the `root` user. | ||||
| This is an advanced functionality with which you could break functionality or lose data. | ||||
|  | ||||
| For example, using Docker Compose: | ||||
|  | ||||
|  | ||||
| .. code:: yaml | ||||
|  | ||||
|     services: | ||||
|       # ... | ||||
|       webserver: | ||||
|         # ... | ||||
|         volumes: | ||||
|           - /path/to/my/scripts:/custom-cont-init.d:ro | ||||
|  | ||||
| .. _advanced-mysql-caveats: | ||||
|  | ||||
| MySQL Caveats | ||||
| ############# | ||||
|  | ||||
| Case Sensitivity | ||||
| ================ | ||||
|  | ||||
| The database interface does not provide a method to configure a MySQL database to | ||||
| be case sensitive.  This would prevent a user from creating a tag ``Name`` and ``NAME`` | ||||
| as they are considered the same. | ||||
|  | ||||
| Per Django documentation, to enable this requires manual intervention.  To enable | ||||
| case sensetive tables, you can execute the following command against each table: | ||||
|  | ||||
| ``ALTER TABLE <table_name> CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;`` | ||||
|  | ||||
| You can also set the default for new tables (this does NOT affect existing tables) with: | ||||
|  | ||||
| ``ALTER DATABASE <db_name> CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;`` | ||||
|     You will be redirected shortly... | ||||
|   | ||||
							
								
								
									
										299
									
								
								docs/api.rst
									
									
									
									
									
								
							
							
						
						
									
										299
									
								
								docs/api.rst
									
									
									
									
									
								
							| @@ -1,303 +1,12 @@ | ||||
| .. _api: | ||||
|  | ||||
| ************ | ||||
| The REST API | ||||
| ************ | ||||
|  | ||||
|  | ||||
| Paperless makes use of the `Django REST Framework`_ standard API interface. | ||||
| It provides a browsable API for most of its endpoints, which you can inspect | ||||
| at ``http://<paperless-host>:<port>/api/``. This also documents most of the | ||||
| available filters and ordering fields. | ||||
| .. cssclass:: redirect-notice | ||||
|  | ||||
| .. _Django REST Framework: http://django-rest-framework.org/ | ||||
|     The Paperless-ngx documentation has permanently moved. | ||||
|  | ||||
| The API provides 5 main endpoints: | ||||
|  | ||||
| *   ``/api/documents/``: Full CRUD support, except POSTing new documents. See below. | ||||
| *   ``/api/correspondents/``: Full CRUD support. | ||||
| *   ``/api/document_types/``: Full CRUD support. | ||||
| *   ``/api/logs/``: Read-Only. | ||||
| *   ``/api/tags/``: Full CRUD support. | ||||
|  | ||||
| All of these endpoints except for the logging endpoint | ||||
| allow you to fetch, edit and delete individual objects | ||||
| by appending their primary key to the path, for example ``/api/documents/454/``. | ||||
|  | ||||
| The objects served by the document endpoint contain the following fields: | ||||
|  | ||||
| *   ``id``: ID of the document. Read-only. | ||||
| *   ``title``: Title of the document. | ||||
| *   ``content``: Plain text content of the document. | ||||
| *   ``tags``: List of IDs of tags assigned to this document, or empty list. | ||||
| *   ``document_type``: Document type of this document, or null. | ||||
| *   ``correspondent``:  Correspondent of this document or null. | ||||
| *   ``created``: The date time at which this document was created. | ||||
| *   ``created_date``: The date (YYYY-MM-DD) at which this document was created. Optional. If also passed with created, this is ignored. | ||||
| *   ``modified``: The date at which this document was last edited in paperless. Read-only. | ||||
| *   ``added``: The date at which this document was added to paperless. Read-only. | ||||
| *   ``archive_serial_number``: The identifier of this document in a physical document archive. | ||||
| *   ``original_file_name``: Verbose filename of the original document. Read-only. | ||||
| *   ``archived_file_name``: Verbose filename of the archived document. Read-only. Null if no archived document is available. | ||||
|  | ||||
|  | ||||
| Downloading documents | ||||
| ##################### | ||||
|  | ||||
| In addition to that, the document endpoint offers these additional actions on | ||||
| individual documents: | ||||
|  | ||||
| *   ``/api/documents/<pk>/download/``: Download the document. | ||||
| *   ``/api/documents/<pk>/preview/``: Display the document inline, | ||||
|     without downloading it. | ||||
| *   ``/api/documents/<pk>/thumb/``: Download the PNG thumbnail of a document. | ||||
|  | ||||
| Paperless generates archived PDF/A documents from consumed files and stores both | ||||
| the original files as well as the archived files. By default, the endpoints | ||||
| for previews and downloads serve the archived file, if it is available. | ||||
| Otherwise, the original file is served. | ||||
| Some document cannot be archived. | ||||
|  | ||||
| The endpoints correctly serve the response header fields ``Content-Disposition`` | ||||
| and ``Content-Type`` to indicate the filename for download and the type of content of | ||||
| the document. | ||||
|  | ||||
| In order to download or preview the original document when an archived document is available, | ||||
| supply the query parameter ``original=true``. | ||||
|  | ||||
| .. hint:: | ||||
|  | ||||
|     Paperless used to provide these functionality at ``/fetch/<pk>/preview``, | ||||
|     ``/fetch/<pk>/thumb`` and ``/fetch/<pk>/doc``. Redirects to the new URLs | ||||
|     are in place. However, if you use these old URLs to access documents, you | ||||
|     should update your app or script to use the new URLs. | ||||
|  | ||||
|  | ||||
| Getting document metadata | ||||
| ######################### | ||||
|  | ||||
| The api also has an endpoint to retrieve read-only metadata about specific documents. this | ||||
| information is not served along with the document objects, since it requires reading | ||||
| files and would therefore slow down document lists considerably. | ||||
|  | ||||
| Access the metadata of a document with an ID ``id`` at ``/api/documents/<id>/metadata/``. | ||||
|  | ||||
| The endpoint reports the following data: | ||||
|  | ||||
| *   ``original_checksum``: MD5 checksum of the original document. | ||||
| *   ``original_size``: Size of the original document, in bytes. | ||||
| *   ``original_mime_type``: Mime type of the original document. | ||||
| *   ``media_filename``: Current filename of the document, under which it is stored inside the media directory. | ||||
| *   ``has_archive_version``: True, if this document is archived, false otherwise. | ||||
| *   ``original_metadata``: A list of metadata associated with the original document. See below. | ||||
| *   ``archive_checksum``: MD5 checksum of the archived document, or null. | ||||
| *   ``archive_size``: Size of the archived document in bytes, or null. | ||||
| *   ``archive_metadata``: Metadata associated with the archived document, or null. See below. | ||||
|  | ||||
| File metadata is reported as a list of objects in the following form: | ||||
|  | ||||
| .. code:: json | ||||
|  | ||||
|     [ | ||||
|         { | ||||
|             "namespace": "http://ns.adobe.com/pdf/1.3/", | ||||
|             "prefix": "pdf", | ||||
|             "key": "Producer", | ||||
|             "value": "SparklePDF, Fancy edition" | ||||
|         }, | ||||
|     ] | ||||
|  | ||||
| ``namespace`` and ``prefix`` can be null. The actual metadata reported depends on the file type and the metadata | ||||
| available in that specific document. Paperless only reports PDF metadata at this point. | ||||
|  | ||||
| Authorization | ||||
| ############# | ||||
|  | ||||
| The REST api provides three different forms of authentication. | ||||
|  | ||||
| 1.  Basic authentication | ||||
|  | ||||
|     Authorize by providing a HTTP header in the form | ||||
|  | ||||
|     .. code:: | ||||
|  | ||||
|         Authorization: Basic <credentials> | ||||
|  | ||||
|     where ``credentials`` is a base64-encoded string of ``<username>:<password>`` | ||||
|  | ||||
| 2.  Session authentication | ||||
|  | ||||
|     When you're logged into paperless in your browser, you're automatically | ||||
|     logged into the API as well and don't need to provide any authorization | ||||
|     headers. | ||||
|  | ||||
| 3.  Token authentication | ||||
|  | ||||
|     Paperless also offers an endpoint to acquire authentication tokens. | ||||
|  | ||||
|     POST a username and password as a form or json string to ``/api/token/`` | ||||
|     and paperless will respond with a token, if the login data is correct. | ||||
|     This token can be used to authenticate other requests with the | ||||
|     following HTTP header: | ||||
|  | ||||
|     .. code:: | ||||
|  | ||||
|         Authorization: Token <token> | ||||
|  | ||||
|     Tokens can be managed and revoked in the paperless admin. | ||||
|  | ||||
| Searching for documents | ||||
| ####################### | ||||
|  | ||||
| Full text searching is available on the ``/api/documents/`` endpoint. Two specific | ||||
| query parameters cause the API to return full text search results: | ||||
|  | ||||
| *   ``/api/documents/?query=your%20search%20query``: Search for a document using a full text query. | ||||
|     For details on the syntax, see :ref:`basic-usage_searching`. | ||||
|  | ||||
| *   ``/api/documents/?more_like=1234``: Search for documents similar to the document with id 1234. | ||||
|  | ||||
| Pagination works exactly the same as it does for normal requests on this endpoint. | ||||
|  | ||||
| Certain limitations apply to full text queries: | ||||
|  | ||||
| *   Results are always sorted by search score. The results matching the query best will show up first. | ||||
|  | ||||
| *   Only a small subset of filtering parameters are supported. | ||||
|  | ||||
| Furthermore, each returned document has an additional ``__search_hit__`` attribute with various information | ||||
| about the search results: | ||||
|  | ||||
| .. code:: | ||||
|  | ||||
|     { | ||||
|         "count": 31, | ||||
|         "next": "http://localhost:8000/api/documents/?page=2&query=test", | ||||
|         "previous": null, | ||||
|         "results": [ | ||||
|  | ||||
|             ... | ||||
|  | ||||
|             { | ||||
|                 "id": 123, | ||||
|                 "title": "title", | ||||
|                 "content": "content", | ||||
|  | ||||
|                 ... | ||||
|  | ||||
|                 "__search_hit__": { | ||||
|                     "score": 0.343, | ||||
|                     "highlights": "text <span class=\"match\">Test</span> text", | ||||
|                     "rank": 23 | ||||
|                 } | ||||
|             }, | ||||
|  | ||||
|             ... | ||||
|  | ||||
|         ] | ||||
|     } | ||||
|  | ||||
| *   ``score`` is an indication how well this document matches the query relative to the other search results. | ||||
| *   ``highlights`` is an excerpt from the document content and highlights the search terms with ``<span>`` tags as shown above. | ||||
| *   ``rank`` is the index of the search results. The first result will have rank 0. | ||||
|  | ||||
| ``/api/search/autocomplete/`` | ||||
| ============================= | ||||
|  | ||||
| Get auto completions for a partial search term. | ||||
|  | ||||
| Query parameters: | ||||
|  | ||||
| *   ``term``: The incomplete term. | ||||
| *   ``limit``: Amount of results. Defaults to 10. | ||||
|  | ||||
| Results returned by the endpoint are ordered by importance of the term in the | ||||
| document index. The first result is the term that has the highest Tf/Idf score | ||||
| in the index. | ||||
|  | ||||
| .. code:: json | ||||
|  | ||||
|     [ | ||||
|         "term1", | ||||
|         "term3", | ||||
|         "term6", | ||||
|         "term4" | ||||
|     ] | ||||
|  | ||||
|  | ||||
| .. _api-file_uploads: | ||||
|  | ||||
| POSTing documents | ||||
| ################# | ||||
|  | ||||
| The API provides a special endpoint for file uploads: | ||||
|  | ||||
| ``/api/documents/post_document/`` | ||||
|  | ||||
| POST a multipart form to this endpoint, where the form field ``document`` contains | ||||
| the document that you want to upload to paperless. The filename is sanitized and | ||||
| then used to store the document in a temporary directory, and the consumer will | ||||
| be instructed to consume the document from there. | ||||
|  | ||||
| The endpoint supports the following optional form fields: | ||||
|  | ||||
| *   ``title``: Specify a title that the consumer should use for the document. | ||||
| *   ``created``: Specify a DateTime where the document was created (e.g. "2016-04-19" or "2016-04-19 06:15:00+02:00"). | ||||
| *   ``correspondent``: Specify the ID of a correspondent that the consumer should use for the document. | ||||
| *   ``document_type``: Similar to correspondent. | ||||
| *   ``tags``: Similar to correspondent. Specify this multiple times to have multiple tags added | ||||
|     to the document. | ||||
|  | ||||
|  | ||||
| The endpoint will immediately return "OK" if the document consumption process | ||||
| was started successfully. No additional status information about the consumption | ||||
| process itself is available, since that happens in a different process. | ||||
|  | ||||
|  | ||||
| .. _api-versioning: | ||||
|  | ||||
| API Versioning | ||||
| ############## | ||||
|  | ||||
| The REST API is versioned since Paperless-ngx 1.3.0. | ||||
|  | ||||
| * Versioning ensures that changes to the API don't break older clients. | ||||
| * Clients specify the specific version of the API they wish to use with every request and Paperless will handle the request using the specified API version. | ||||
| * Even if the underlying data model changes, older API versions will always serve compatible data. | ||||
| * If no version is specified, Paperless will serve version 1 to ensure compatibility with older clients that do not request a specific API version. | ||||
|  | ||||
| API versions are specified by submitting an additional HTTP ``Accept`` header with every request: | ||||
|  | ||||
| .. code:: | ||||
|  | ||||
|     Accept: application/json; version=6 | ||||
|  | ||||
| If an invalid version is specified, Paperless 1.3.0 will respond with "406 Not Acceptable" and an error message in the body. | ||||
| Earlier versions of Paperless will serve API version 1 regardless of whether a version is specified via the ``Accept`` header. | ||||
|  | ||||
| If a client wishes to verify whether it is compatible with any given server, the following procedure should be performed: | ||||
|  | ||||
| 1.  Perform an *authenticated* request against any API endpoint. If the server is on version 1.3.0 or newer, the server will | ||||
|     add two custom headers to the response: | ||||
|  | ||||
|     .. code:: | ||||
|  | ||||
|         X-Api-Version: 2 | ||||
|         X-Version: 1.3.0 | ||||
|  | ||||
| 2.  Determine whether the client is compatible with this server based on the presence/absence of these headers and their values if present. | ||||
|  | ||||
|  | ||||
| API Changelog | ||||
| ============= | ||||
|  | ||||
| Version 1 | ||||
| --------- | ||||
|  | ||||
| Initial API version. | ||||
|  | ||||
| Version 2 | ||||
| --------- | ||||
|  | ||||
| * Added field ``Tag.color``. This read/write string field contains a hex color such as ``#a6cee3``. | ||||
| * Added read-only field ``Tag.text_color``. This field contains the text color to use for a specific tag, which is either black or white depending on the brightness of ``Tag.color``. | ||||
| * Removed field ``Tag.colour``. | ||||
|     You will be redirected shortly... | ||||
|   | ||||
							
								
								
									
										2445
									
								
								docs/changelog.md
									
									
									
									
									
								
							
							
						
						
									
										2445
									
								
								docs/changelog.md
									
									
									
									
									
								
							
										
											
												File diff suppressed because it is too large
												Load Diff
											
										
									
								
							
							
								
								
									
										11
									
								
								docs/changelog.rst
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
										11
									
								
								docs/changelog.rst
									
									
									
									
									
										Normal file
									
								
							| @@ -0,0 +1,11 @@ | ||||
| .. _changelog: | ||||
|  | ||||
| ********* | ||||
| Changelog | ||||
| ********* | ||||
|  | ||||
| .. cssclass:: redirect-notice | ||||
|  | ||||
|     The Paperless-ngx documentation has permanently moved. | ||||
|  | ||||
|     You will be redirected shortly... | ||||
| @@ -4,928 +4,9 @@ | ||||
| Configuration | ||||
| ************* | ||||
|  | ||||
| Paperless provides a wide range of customizations. | ||||
| Depending on how you run paperless, these settings have to be defined in different | ||||
| places. | ||||
|  | ||||
| *   If you run paperless on docker, ``paperless.conf`` is not used. Rather, configure | ||||
|     paperless by copying necessary options to ``docker-compose.env``. | ||||
| *   If you are running paperless on anything else, paperless will search for the | ||||
|     configuration file in these locations and use the first one it finds: | ||||
| .. cssclass:: redirect-notice | ||||
|  | ||||
|     .. code:: | ||||
|     The Paperless-ngx documentation has permanently moved. | ||||
|  | ||||
|         /path/to/paperless/paperless.conf | ||||
|         /etc/paperless.conf | ||||
|         /usr/local/etc/paperless.conf | ||||
|  | ||||
|  | ||||
| Required services | ||||
| ################# | ||||
|  | ||||
| PAPERLESS_REDIS=<url> | ||||
|     This is required for processing scheduled tasks such as email fetching, index | ||||
|     optimization and for training the automatic document matcher. | ||||
|  | ||||
|     * If your Redis server needs login credentials PAPERLESS_REDIS = ``redis://<username>:<password>@<host>:<port>`` | ||||
|  | ||||
|     * With the requirepass option PAPERLESS_REDIS = ``redis://:<password>@<host>:<port>`` | ||||
|  | ||||
|     `More information on securing your Redis Instance <https://redis.io/docs/getting-started/#securing-redis>`_. | ||||
|  | ||||
|     Defaults to redis://localhost:6379. | ||||
|  | ||||
| PAPERLESS_DBENGINE=<engine_name> | ||||
|     Optional, gives the ability to choose Postgres or MariaDB for database engine. | ||||
|     Available options are `postgresql` and `mariadb`. | ||||
|  | ||||
|     Default is `postgresql`. | ||||
|  | ||||
|     .. warning:: | ||||
|  | ||||
|       Using MariaDB comes with some caveats.  See :ref:`advanced-mysql-caveats` for details. | ||||
|  | ||||
|  | ||||
| PAPERLESS_DBHOST=<hostname> | ||||
|     By default, sqlite is used as the database backend. This can be changed here. | ||||
|  | ||||
|     Set PAPERLESS_DBHOST and another database will be used instead of sqlite. | ||||
|  | ||||
| PAPERLESS_DBPORT=<port> | ||||
|     Adjust port if necessary. | ||||
|  | ||||
|     Default is 5432. | ||||
|  | ||||
| PAPERLESS_DBNAME=<name> | ||||
|     Database name in PostgreSQL or MariaDB. | ||||
|  | ||||
|     Defaults to "paperless". | ||||
|  | ||||
| PAPERLESS_DBUSER=<name> | ||||
|     Database user in PostgreSQL or MariaDB. | ||||
|  | ||||
|     Defaults to "paperless". | ||||
|  | ||||
| PAPERLESS_DBPASS=<password> | ||||
|     Database password for PostgreSQL or MariaDB. | ||||
|  | ||||
|     Defaults to "paperless". | ||||
|  | ||||
| PAPERLESS_DBSSLMODE=<mode> | ||||
|     SSL mode to use when connecting to PostgreSQL. | ||||
|  | ||||
|     See `the official documentation about sslmode <https://www.postgresql.org/docs/current/libpq-ssl.html>`_. | ||||
|  | ||||
|     Default is ``prefer``. | ||||
|  | ||||
| PAPERLESS_DB_TIMEOUT=<float> | ||||
|     Amount of time for a database connection to wait for the database to unlock. | ||||
|     Mostly applicable for an sqlite based installation, consider changing to postgresql | ||||
|     if you need to increase this. | ||||
|  | ||||
|     Defaults to unset, keeping the Django defaults. | ||||
|  | ||||
| Paths and folders | ||||
| ################# | ||||
|  | ||||
| PAPERLESS_CONSUMPTION_DIR=<path> | ||||
|     This where your documents should go to be consumed.  Make sure that it exists | ||||
|     and that the user running the paperless service can read/write its contents | ||||
|     before you start Paperless. | ||||
|  | ||||
|     Don't change this when using docker, as it only changes the path within the | ||||
|     container. Change the local consumption directory in the docker-compose.yml | ||||
|     file instead. | ||||
|  | ||||
|     Defaults to "../consume/", relative to the "src" directory. | ||||
|  | ||||
| PAPERLESS_DATA_DIR=<path> | ||||
|     This is where paperless stores all its data (search index, SQLite database, | ||||
|     classification model, etc). | ||||
|  | ||||
|     Defaults to "../data/", relative to the "src" directory. | ||||
|  | ||||
| PAPERLESS_TRASH_DIR=<path> | ||||
|     Instead of removing deleted documents, they are moved to this directory. | ||||
|  | ||||
|     This must be writeable by the user running paperless. When running inside | ||||
|     docker, ensure that this path is within a permanent volume (such as | ||||
|     "../media/trash") so it won't get lost on upgrades. | ||||
|  | ||||
|     Defaults to empty (i.e. really delete documents). | ||||
|  | ||||
| PAPERLESS_MEDIA_ROOT=<path> | ||||
|     This is where your documents and thumbnails are stored. | ||||
|  | ||||
|     You can set this and PAPERLESS_DATA_DIR to the same folder to have paperless | ||||
|     store all its data within the same volume. | ||||
|  | ||||
|     Defaults to "../media/", relative to the "src" directory. | ||||
|  | ||||
| PAPERLESS_STATICDIR=<path> | ||||
|     Override the default STATIC_ROOT here.  This is where all static files | ||||
|     created using "collectstatic" manager command are stored. | ||||
|  | ||||
|     Unless you're doing something fancy, there is no need to override this. | ||||
|  | ||||
|     Defaults to "../static/", relative to the "src" directory. | ||||
|  | ||||
| PAPERLESS_FILENAME_FORMAT=<format> | ||||
|     Changes the filenames paperless uses to store documents in the media directory. | ||||
|     See :ref:`advanced-file_name_handling` for details. | ||||
|  | ||||
|     Default is none, which disables this feature. | ||||
|  | ||||
| PAPERLESS_FILENAME_FORMAT_REMOVE_NONE=<bool> | ||||
|     Tells paperless to replace placeholders in `PAPERLESS_FILENAME_FORMAT` that would resolve | ||||
|     to 'none' to be omitted from the resulting filename. This also holds true for directory | ||||
|     names. | ||||
|     See :ref:`advanced-file_name_handling` for details. | ||||
|  | ||||
|     Defaults to `false` which disables this feature. | ||||
|  | ||||
| PAPERLESS_LOGGING_DIR=<path> | ||||
|     This is where paperless will store log files. | ||||
|  | ||||
|     Defaults to "``PAPERLESS_DATA_DIR``/log/". | ||||
|  | ||||
|  | ||||
| Logging | ||||
| ####### | ||||
|  | ||||
| PAPERLESS_LOGROTATE_MAX_SIZE=<num> | ||||
|     Maximum file size for log files before they are rotated, in bytes. | ||||
|  | ||||
|     Defaults to 1 MiB. | ||||
|  | ||||
| PAPERLESS_LOGROTATE_MAX_BACKUPS=<num> | ||||
|     Number of rotated log files to keep. | ||||
|  | ||||
|     Defaults to 20. | ||||
|  | ||||
| .. _hosting-and-security: | ||||
|  | ||||
| Hosting & Security | ||||
| ################## | ||||
|  | ||||
| PAPERLESS_SECRET_KEY=<key> | ||||
|     Paperless uses this to make session tokens. If you expose paperless on the | ||||
|     internet, you need to change this, since the default secret is well known. | ||||
|  | ||||
|     Use any sequence of characters. The more, the better. You don't need to | ||||
|     remember this. Just face-roll your keyboard. | ||||
|  | ||||
|     Default is listed in the file ``src/paperless/settings.py``. | ||||
|  | ||||
| PAPERLESS_URL=<url> | ||||
|     This setting can be used to set the three options below (ALLOWED_HOSTS, | ||||
|     CORS_ALLOWED_HOSTS and CSRF_TRUSTED_ORIGINS). If the other options are | ||||
|     set the values will be combined with this one. Do not include a trailing | ||||
|     slash. E.g. https://paperless.domain.com | ||||
|  | ||||
|     Defaults to empty string, leaving the other settings unaffected. | ||||
|  | ||||
| PAPERLESS_CSRF_TRUSTED_ORIGINS=<comma-separated-list> | ||||
|     A list of trusted origins for unsafe requests (e.g. POST). As of Django 4.0 | ||||
|     this is required to access the Django admin via the web. | ||||
|     See https://docs.djangoproject.com/en/4.0/ref/settings/#csrf-trusted-origins | ||||
|  | ||||
|     Can also be set using PAPERLESS_URL (see above). | ||||
|  | ||||
|     Defaults to empty string, which does not add any origins to the trusted list. | ||||
|  | ||||
| PAPERLESS_ALLOWED_HOSTS=<comma-separated-list> | ||||
|     If you're planning on putting Paperless on the open internet, then you | ||||
|     really should set this value to the domain name you're using.  Failing to do | ||||
|     so leaves you open to HTTP host header attacks: | ||||
|     https://docs.djangoproject.com/en/3.1/topics/security/#host-header-validation | ||||
|  | ||||
|     Just remember that this is a comma-separated list, so "example.com" is fine, | ||||
|     as is "example.com,www.example.com", but NOT " example.com" or "example.com," | ||||
|  | ||||
|     Can also be set using PAPERLESS_URL (see above). | ||||
|  | ||||
|     If manually set, please remember to include "localhost". Otherwise docker | ||||
|     healthcheck will fail. | ||||
|  | ||||
|     Defaults to "*", which is all hosts. | ||||
|  | ||||
| PAPERLESS_CORS_ALLOWED_HOSTS=<comma-separated-list> | ||||
|     You need to add your servers to the list of allowed hosts that can do CORS | ||||
|     calls. Set this to your public domain name. | ||||
|  | ||||
|     Can also be set using PAPERLESS_URL (see above). | ||||
|  | ||||
|     Defaults to "http://localhost:8000". | ||||
|  | ||||
| PAPERLESS_FORCE_SCRIPT_NAME=<path> | ||||
|     To host paperless under a subpath url like example.com/paperless you set | ||||
|     this value to /paperless. No trailing slash! | ||||
|  | ||||
|     Defaults to none, which hosts paperless at "/". | ||||
|  | ||||
| PAPERLESS_STATIC_URL=<path> | ||||
|     Override the STATIC_URL here.  Unless you're hosting Paperless off a | ||||
|     subdomain like /paperless/, you probably don't need to change this. | ||||
|     If you do change it, be sure to include the trailing slash. | ||||
|  | ||||
|     Defaults to "/static/". | ||||
|  | ||||
|     .. note:: | ||||
|  | ||||
|         When hosting paperless behind a reverse proxy like Traefik or Nginx at a subpath e.g. | ||||
|         example.com/paperlessngx you will also need to set ``PAPERLESS_FORCE_SCRIPT_NAME`` | ||||
|         (see above). | ||||
|  | ||||
| PAPERLESS_AUTO_LOGIN_USERNAME=<username> | ||||
|     Specify a username here so that paperless will automatically perform login | ||||
|     with the selected user. | ||||
|  | ||||
|     .. danger:: | ||||
|  | ||||
|         Do not use this when exposing paperless on the internet. There are no | ||||
|         checks in place that would prevent you from doing this. | ||||
|  | ||||
|     Defaults to none, which disables this feature. | ||||
|  | ||||
| PAPERLESS_ADMIN_USER=<username> | ||||
|     If this environment variable is specified, Paperless automatically creates | ||||
|     a superuser with the provided username at start. This is useful in cases | ||||
|     where you can not run the `createsuperuser` command separately, such as Kubernetes | ||||
|     or AWS ECS. | ||||
|  | ||||
|     Requires `PAPERLESS_ADMIN_PASSWORD` to be set. | ||||
|  | ||||
|     .. note:: | ||||
|  | ||||
|         This will not change an existing [super]user's password, nor will | ||||
|         it recreate a user that already exists. You can leave this throughout | ||||
|         the lifecycle of the containers. | ||||
|  | ||||
| PAPERLESS_ADMIN_MAIL=<email> | ||||
|     (Optional) Specify superuser email address. Only used when | ||||
|     `PAPERLESS_ADMIN_USER` is set. | ||||
|  | ||||
|     Defaults to ``root@localhost``. | ||||
|  | ||||
| PAPERLESS_ADMIN_PASSWORD=<password> | ||||
|     Only used when `PAPERLESS_ADMIN_USER` is set. | ||||
|     This will be the password of the automatically created superuser. | ||||
|  | ||||
|  | ||||
| PAPERLESS_COOKIE_PREFIX=<str> | ||||
|     Specify a prefix that is added to the cookies used by paperless to identify | ||||
|     the currently logged in user. This is useful for when you're running two | ||||
|     instances of paperless on the same host. | ||||
|  | ||||
|     After changing this, you will have to login again. | ||||
|  | ||||
|     Defaults to ``""``, which does not alter the cookie names. | ||||
|  | ||||
| PAPERLESS_ENABLE_HTTP_REMOTE_USER=<bool> | ||||
|     Allows authentication via HTTP_REMOTE_USER which is used by some SSO | ||||
|     applications. | ||||
|  | ||||
|     .. warning:: | ||||
|  | ||||
|         This will allow authentication by simply adding a ``Remote-User: <username>`` header | ||||
|         to a request. Use with care! You especially *must* ensure that any such header is not | ||||
|         passed from your proxy server to paperless. | ||||
|  | ||||
|         If you're exposing paperless to the internet directly, do not use this. | ||||
|  | ||||
|         Also see the warning `in the official documentation <https://docs.djangoproject.com/en/3.1/howto/auth-remote-user/#configuration>`. | ||||
|  | ||||
|     Defaults to `false` which disables this feature. | ||||
|  | ||||
| PAPERLESS_HTTP_REMOTE_USER_HEADER_NAME=<str> | ||||
|     If `PAPERLESS_ENABLE_HTTP_REMOTE_USER` is enabled, this property allows to | ||||
|     customize the name of the HTTP header from which the authenticated username | ||||
|     is extracted. Values are in terms of | ||||
|     [HttpRequest.META](https://docs.djangoproject.com/en/3.1/ref/request-response/#django.http.HttpRequest.META). | ||||
|     Thus, the configured value must start with `HTTP_` followed by the | ||||
|     normalized actual header name. | ||||
|  | ||||
|     Defaults to `HTTP_REMOTE_USER`. | ||||
|  | ||||
| PAPERLESS_LOGOUT_REDIRECT_URL=<str> | ||||
|     URL to redirect the user to after a logout. This can be used together with | ||||
|     `PAPERLESS_ENABLE_HTTP_REMOTE_USER` to redirect the user back to the SSO | ||||
|     application's logout page. | ||||
|  | ||||
|     Defaults to None, which disables this feature. | ||||
|  | ||||
| .. _configuration-ocr: | ||||
|  | ||||
| OCR settings | ||||
| ############ | ||||
|  | ||||
| Paperless uses `OCRmyPDF <https://ocrmypdf.readthedocs.io/en/latest/>`_ for | ||||
| performing OCR on documents and images. Paperless uses sensible defaults for | ||||
| most settings, but all of them can be configured to your needs. | ||||
|  | ||||
| PAPERLESS_OCR_LANGUAGE=<lang> | ||||
|     Customize the language that paperless will attempt to use when | ||||
|     parsing documents. | ||||
|  | ||||
|     It should be a 3-letter language code consistent with ISO | ||||
|     639: https://www.loc.gov/standards/iso639-2/php/code_list.php | ||||
|  | ||||
|     Set this to the language most of your documents are written in. | ||||
|  | ||||
|     This can be a combination of multiple languages such as ``deu+eng``, | ||||
|     in which case tesseract will use whatever language matches best. | ||||
|     Keep in mind that tesseract uses much more cpu time with multiple | ||||
|     languages enabled. | ||||
|  | ||||
|     Defaults to "eng". | ||||
|  | ||||
| 		Note: If your language contains a '-' such as chi-sim, you must use chi_sim | ||||
|  | ||||
| PAPERLESS_OCR_MODE=<mode> | ||||
|     Tell paperless when and how to perform ocr on your documents. Four modes | ||||
|     are available: | ||||
|  | ||||
|     *   ``skip``: Paperless skips all pages and will perform ocr only on pages | ||||
|         where no text is present. This is the safest option. | ||||
|     *   ``skip_noarchive``: In addition to skip, paperless won't create an | ||||
|         archived version of your documents when it finds any text in them. | ||||
|         This is useful if you don't want to have two almost-identical versions | ||||
|         of your digital documents in the media folder. This is the fastest option. | ||||
|     *   ``redo``: Paperless will OCR all pages of your documents and attempt to | ||||
|         replace any existing text layers with new text. This will be useful for | ||||
|         documents from scanners that already performed OCR with insufficient | ||||
|         results. It will also perform OCR on purely digital documents. | ||||
|  | ||||
|         This option may fail on some documents that have features that cannot | ||||
|         be removed, such as forms. In this case, the text from the document is | ||||
|         used instead. | ||||
|     *   ``force``: Paperless rasterizes your documents, converting any text | ||||
|         into images and puts the OCRed text on top. This works for all documents, | ||||
|         however, the resulting document may be significantly larger and text | ||||
|         won't appear as sharp when zoomed in. | ||||
|  | ||||
|     The default is ``skip``, which only performs OCR when necessary and always | ||||
|     creates archived documents. | ||||
|  | ||||
|     Read more about this in the `OCRmyPDF documentation <https://ocrmypdf.readthedocs.io/en/latest/advanced.html#when-ocr-is-skipped>`_. | ||||
|  | ||||
| PAPERLESS_OCR_CLEAN=<mode> | ||||
|     Tells paperless to use ``unpaper`` to clean any input document before | ||||
|     sending it to tesseract. This uses more resources, but generally results | ||||
|     in better OCR results. The following modes are available: | ||||
|  | ||||
|     *   ``clean``: Apply unpaper. | ||||
|     *   ``clean-final``: Apply unpaper, and use the cleaned images to build the | ||||
|         output file instead of the original images. | ||||
|     *   ``none``: Do not apply unpaper. | ||||
|  | ||||
|     Defaults to ``clean``. | ||||
|  | ||||
|     .. note:: | ||||
|  | ||||
|         ``clean-final`` is incompatible with ocr mode ``redo``. When both | ||||
|         ``clean-final`` and the ocr mode ``redo`` is configured, ``clean`` | ||||
|         is used instead. | ||||
|  | ||||
| PAPERLESS_OCR_DESKEW=<bool> | ||||
|     Tells paperless to correct skewing (slight rotation of input images mainly | ||||
|     due to improper scanning) | ||||
|  | ||||
|     Defaults to ``true``, which enables this feature. | ||||
|  | ||||
|     .. note:: | ||||
|  | ||||
|         Deskewing is incompatible with ocr mode ``redo``. Deskewing will get | ||||
|         disabled automatically if ``redo`` is used as the ocr mode. | ||||
|  | ||||
| PAPERLESS_OCR_ROTATE_PAGES=<bool> | ||||
|     Tells paperless to correct page rotation (90°, 180° and 270° rotation). | ||||
|  | ||||
|     If you notice that paperless is not rotating incorrectly rotated | ||||
|     pages (or vice versa), try adjusting the threshold up or down (see below). | ||||
|  | ||||
|     Defaults to ``true``, which enables this feature. | ||||
|  | ||||
|  | ||||
| PAPERLESS_OCR_ROTATE_PAGES_THRESHOLD=<num> | ||||
|     Adjust the threshold for automatic page rotation by ``PAPERLESS_OCR_ROTATE_PAGES``. | ||||
|     This is an arbitrary value reported by tesseract. "15" is a very conservative value, | ||||
|     whereas "2" is a very aggressive option and will often result in correctly rotated pages | ||||
|     being rotated as well. | ||||
|  | ||||
|     Defaults to "12". | ||||
|  | ||||
| PAPERLESS_OCR_OUTPUT_TYPE=<type> | ||||
|     Specify the the type of PDF documents that paperless should produce. | ||||
|  | ||||
|     *   ``pdf``: Modify the PDF document as little as possible. | ||||
|     *   ``pdfa``: Convert PDF documents into PDF/A-2b documents, which is a | ||||
|         subset of the entire PDF specification and meant for storing | ||||
|         documents long term. | ||||
|     *   ``pdfa-1``, ``pdfa-2``, ``pdfa-3`` to specify the exact version of | ||||
|         PDF/A you wish to use. | ||||
|  | ||||
|     If not specified, ``pdfa`` is used. Remember that paperless also keeps | ||||
|     the original input file as well as the archived version. | ||||
|  | ||||
|  | ||||
| PAPERLESS_OCR_PAGES=<num> | ||||
|     Tells paperless to use only the specified amount of pages for OCR. Documents | ||||
|     with less than the specified amount of pages get OCR'ed completely. | ||||
|  | ||||
|     Specifying 1 here will only use the first page. | ||||
|  | ||||
|     When combined with ``PAPERLESS_OCR_MODE=redo`` or ``PAPERLESS_OCR_MODE=force``, | ||||
|     paperless will not modify any text it finds on excluded pages and copy it | ||||
|     verbatim. | ||||
|  | ||||
|     Defaults to 0, which disables this feature and always uses all pages. | ||||
|  | ||||
| PAPERLESS_OCR_IMAGE_DPI=<num> | ||||
|     Paperless will OCR any images you put into the system and convert them | ||||
|     into PDF documents. This is useful if your scanner produces images. | ||||
|     In order to do so, paperless needs to know the DPI of the image. | ||||
|     Most images from scanners will have this information embedded and | ||||
|     paperless will detect and use that information. In case this fails, it | ||||
|     uses this value as a fallback. | ||||
|  | ||||
|     Set this to the DPI your scanner produces images at. | ||||
|  | ||||
|     Default is none, which will automatically calculate image DPI so that | ||||
|     the produced PDF documents are A4 sized. | ||||
|  | ||||
| PAPERLESS_OCR_MAX_IMAGE_PIXELS=<num> | ||||
|     Paperless will raise a warning when OCRing images which are over this limit and | ||||
|     will not OCR images which are more than twice this limit.  Note this does not | ||||
|     prevent the document from being consumed, but could result in missing text content. | ||||
|  | ||||
|     If unset, will default to the value determined by | ||||
|     `Pillow <https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.MAX_IMAGE_PIXELS>`_. | ||||
|  | ||||
|     .. note:: | ||||
|  | ||||
|         Increasing this limit could cause Paperless to consume additional resources | ||||
|         when consuming a file.  Be sure you have sufficient system resources. | ||||
|  | ||||
|     .. caution:: | ||||
|  | ||||
|         The limit is intended to prevent malicious files from consuming system resources | ||||
|         and causing crashes and other errors.  Only increase this value if you are certain | ||||
|         your documents are not malicious and you need the text which was not OCRed | ||||
|  | ||||
| PAPERLESS_OCR_USER_ARGS=<json> | ||||
|     OCRmyPDF offers many more options. Use this parameter to specify any | ||||
|     additional arguments you wish to pass to OCRmyPDF. Since Paperless uses | ||||
|     the API of OCRmyPDF, you have to specify these in a format that can be | ||||
|     passed to the API. See `the API reference of OCRmyPDF <https://ocrmypdf.readthedocs.io/en/latest/api.html#reference>`_ | ||||
|     for valid parameters. All command line options are supported, but they | ||||
|     use underscores instead of dashes. | ||||
|  | ||||
|     .. caution:: | ||||
|  | ||||
|         Paperless has been tested to work with the OCR options provided | ||||
|         above. There are many options that are incompatible with each other, | ||||
|         so specifying invalid options may prevent paperless from consuming | ||||
|         any documents. | ||||
|  | ||||
|     Specify arguments as a JSON dictionary. Keep note of lower case booleans | ||||
|     and double quoted parameter names and strings. Examples: | ||||
|  | ||||
|     .. code:: json | ||||
|  | ||||
|         {"deskew": true, "optimize": 3, "unpaper_args": "--pre-rotate 90"} | ||||
|  | ||||
| .. _configuration-tika: | ||||
|  | ||||
| Tika settings | ||||
| ############# | ||||
|  | ||||
| Paperless can make use of `Tika <https://tika.apache.org/>`_ and | ||||
| `Gotenberg <https://gotenberg.dev/>`_ for parsing and | ||||
| converting "Office" documents (such as ".doc", ".xlsx" and ".odt"). If you | ||||
| wish to use this, you must provide a Tika server and a Gotenberg server, | ||||
| configure their endpoints, and enable the feature. | ||||
|  | ||||
| PAPERLESS_TIKA_ENABLED=<bool> | ||||
|     Enable (or disable) the Tika parser. | ||||
|  | ||||
|     Defaults to false. | ||||
|  | ||||
| PAPERLESS_TIKA_ENDPOINT=<url> | ||||
|     Set the endpoint URL were Paperless can reach your Tika server. | ||||
|  | ||||
|     Defaults to "http://localhost:9998". | ||||
|  | ||||
| PAPERLESS_TIKA_GOTENBERG_ENDPOINT=<url> | ||||
|     Set the endpoint URL were Paperless can reach your Gotenberg server. | ||||
|  | ||||
|     Defaults to "http://localhost:3000". | ||||
|  | ||||
| If you run paperless on docker, you can add those services to the docker-compose | ||||
| file (see the provided ``docker-compose.sqlite-tika.yml`` file for reference). The changes | ||||
| requires are as follows: | ||||
|  | ||||
| .. code:: yaml | ||||
|  | ||||
|     services: | ||||
|         # ... | ||||
|  | ||||
|         webserver: | ||||
|             # ... | ||||
|  | ||||
|             environment: | ||||
|                 # ... | ||||
|  | ||||
|                 PAPERLESS_TIKA_ENABLED: 1 | ||||
|                 PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000 | ||||
|                 PAPERLESS_TIKA_ENDPOINT: http://tika:9998 | ||||
|  | ||||
|         # ... | ||||
|  | ||||
|         gotenberg: | ||||
|             image: gotenberg/gotenberg:7.6 | ||||
|             restart: unless-stopped | ||||
|             command: | ||||
|                 - "gotenberg" | ||||
|                 - "--chromium-disable-routes=true" | ||||
|  | ||||
|         tika: | ||||
|             image: ghcr.io/paperless-ngx/tika:latest | ||||
|             restart: unless-stopped | ||||
|  | ||||
| Add the configuration variables to the environment of the webserver (alternatively | ||||
| put the configuration in the ``docker-compose.env`` file) and add the additional | ||||
| services below the webserver service. Watch out for indentation. | ||||
|  | ||||
| Make sure to use the correct format `PAPERLESS_TIKA_ENABLED = 1` so python_dotenv can parse the statement correctly. | ||||
|  | ||||
| Software tweaks | ||||
| ############### | ||||
|  | ||||
| PAPERLESS_TASK_WORKERS=<num> | ||||
|     Paperless does multiple things in the background: Maintain the search index, | ||||
|     maintain the automatic matching algorithm, check emails, consume documents, | ||||
|     etc. This variable specifies how many things it will do in parallel. | ||||
|  | ||||
|     Defaults to 1 | ||||
|  | ||||
|  | ||||
| PAPERLESS_THREADS_PER_WORKER=<num> | ||||
|     Furthermore, paperless uses multiple threads when consuming documents to | ||||
|     speed up OCR. This variable specifies how many pages paperless will process | ||||
|     in parallel on a single document. | ||||
|  | ||||
|     .. caution:: | ||||
|  | ||||
|         Ensure that the product | ||||
|  | ||||
|             PAPERLESS_TASK_WORKERS * PAPERLESS_THREADS_PER_WORKER | ||||
|  | ||||
|         does not exceed your CPU core count or else paperless will be extremely slow. | ||||
|         If you want paperless to process many documents in parallel, choose a high | ||||
|         worker count. If you want paperless to process very large documents faster, | ||||
|         use a higher thread per worker count. | ||||
|  | ||||
|     The default is a balance between the two, according to your CPU core count, | ||||
|     with a slight favor towards threads per worker: | ||||
|  | ||||
|     +----------------+---------+---------+ | ||||
|     | CPU core count | Workers | Threads | | ||||
|     +----------------+---------+---------+ | ||||
|     |              1 |       1 |       1 | | ||||
|     +----------------+---------+---------+ | ||||
|     |              2 |       2 |       1 | | ||||
|     +----------------+---------+---------+ | ||||
|     |              4 |       2 |       2 | | ||||
|     +----------------+---------+---------+ | ||||
|     |              6 |       2 |       3 | | ||||
|     +----------------+---------+---------+ | ||||
|     |              8 |       2 |       4 | | ||||
|     +----------------+---------+---------+ | ||||
|     |             12 |       3 |       4 | | ||||
|     +----------------+---------+---------+ | ||||
|     |             16 |       4 |       4 | | ||||
|     +----------------+---------+---------+ | ||||
|  | ||||
|     If you only specify PAPERLESS_TASK_WORKERS, paperless will adjust | ||||
|     PAPERLESS_THREADS_PER_WORKER automatically. | ||||
|  | ||||
|  | ||||
| PAPERLESS_WORKER_TIMEOUT=<num> | ||||
|     Machines with few cores or weak ones might not be able to finish OCR on | ||||
|     large documents within the default 1800 seconds. So extending this timeout | ||||
|     may prove to be useful on weak hardware setups. | ||||
|  | ||||
| PAPERLESS_WORKER_RETRY=<num> | ||||
|     If PAPERLESS_WORKER_TIMEOUT has been configured, the retry time for a task can | ||||
|     also be configured.  By default, this value will be set to 10s more than the | ||||
|     worker timeout.  This value should never be set less than the worker timeout. | ||||
|  | ||||
| PAPERLESS_TIME_ZONE=<timezone> | ||||
|     Set the time zone here. | ||||
|     See https://docs.djangoproject.com/en/3.1/ref/settings/#std:setting-TIME_ZONE | ||||
|     for details on how to set it. | ||||
|  | ||||
|     Defaults to UTC. | ||||
|  | ||||
|  | ||||
| .. _configuration-polling: | ||||
|  | ||||
| PAPERLESS_CONSUMER_POLLING=<num> | ||||
|     If paperless won't find documents added to your consume folder, it might | ||||
|     not be able to automatically detect filesystem changes. In that case, | ||||
|     specify a polling interval in seconds here, which will then cause paperless | ||||
|     to periodically check your consumption directory for changes. This will also | ||||
|     disable listening for file system changes with ``inotify``. | ||||
|  | ||||
|     Defaults to 0, which disables polling and uses filesystem notifications. | ||||
|  | ||||
| PAPERLESS_CONSUMER_POLLING_RETRY_COUNT=<num> | ||||
|     If consumer polling is enabled, sets the number of times paperless will check for a | ||||
|     file to remain unmodified. | ||||
|  | ||||
|     Defaults to 5. | ||||
|  | ||||
| PAPERLESS_CONSUMER_POLLING_DELAY=<num> | ||||
|     If consumer polling is enabled, sets the delay in seconds between each check (above) paperless | ||||
|     will do while waiting for a file to remain unmodified. | ||||
|  | ||||
|     Defaults to 5. | ||||
|  | ||||
| .. _configuration-inotify: | ||||
|  | ||||
| PAPERLESS_CONSUMER_INOTIFY_DELAY=<num> | ||||
|     Sets the time in seconds the consumer will wait for additional events | ||||
|     from inotify before the consumer will consider a file ready and begin consumption. | ||||
|     Certain scanners or network setups may generate multiple events for a single file, | ||||
|     leading to multiple consumers working on the same file.  Configure this to | ||||
|     prevent that. | ||||
|  | ||||
|     Defaults to 0.5 seconds. | ||||
|  | ||||
| PAPERLESS_CONSUMER_DELETE_DUPLICATES=<bool> | ||||
|     When the consumer detects a duplicate document, it will not touch the | ||||
|     original document. This default behavior can be changed here. | ||||
|  | ||||
|     Defaults to false. | ||||
|  | ||||
|  | ||||
| PAPERLESS_CONSUMER_RECURSIVE=<bool> | ||||
|     Enable recursive watching of the consumption directory. Paperless will | ||||
|     then pickup files from files in subdirectories within your consumption | ||||
|     directory as well. | ||||
|  | ||||
|     Defaults to false. | ||||
|  | ||||
|  | ||||
| PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS=<bool> | ||||
|     Set the names of subdirectories as tags for consumed files. | ||||
|     E.g. <CONSUMPTION_DIR>/foo/bar/file.pdf will add the tags "foo" and "bar" to | ||||
|     the consumed file. Paperless will create any tags that don't exist yet. | ||||
|  | ||||
|     This is useful for sorting documents with certain tags such as ``car`` or | ||||
|     ``todo`` prior to consumption. These folders won't be deleted. | ||||
|  | ||||
|     PAPERLESS_CONSUMER_RECURSIVE must be enabled for this to work. | ||||
|  | ||||
|     Defaults to false. | ||||
|  | ||||
| PAPERLESS_CONSUMER_ENABLE_BARCODES=<bool> | ||||
|     Enables the scanning and page separation based on detected barcodes. | ||||
|     This allows for scanning and adding multiple documents per uploaded | ||||
|     file, which are separated by one or multiple barcode pages. | ||||
|  | ||||
|     For ease of use, it is suggested to use a standardized separation page, | ||||
|     e.g. `here <https://www.alliancegroup.co.uk/patch-codes.htm>`_. | ||||
|  | ||||
|     If no barcodes are detected in the uploaded file, no page separation | ||||
|     will happen. | ||||
|  | ||||
|     The original document will be removed and the separated pages will be | ||||
|     saved as pdf. | ||||
|  | ||||
|     Defaults to false. | ||||
|  | ||||
|  | ||||
| PAPERLESS_CONSUMER_BARCODE_TIFF_SUPPORT=<bool> | ||||
|     Whether TIFF image files should be scanned for barcodes. | ||||
|     This will automatically convert any TIFF image(s) to pdfs for later | ||||
|     processing. | ||||
|     This only has an effect, if PAPERLESS_CONSUMER_ENABLE_BARCODES has been | ||||
|     enabled. | ||||
|  | ||||
|     Defaults to false. | ||||
|  | ||||
| PAPERLESS_CONSUMER_BARCODE_STRING=PATCHT | ||||
|   Defines the string to be detected as a separator barcode. | ||||
|   If paperless is used with the PATCH-T separator pages, users | ||||
|   shouldn't change this. | ||||
|  | ||||
|   Defaults to "PATCHT" | ||||
|  | ||||
| PAPERLESS_CONVERT_MEMORY_LIMIT=<num> | ||||
|     On smaller systems, or even in the case of Very Large Documents, the consumer | ||||
|     may explode, complaining about how it's "unable to extend pixel cache".  In | ||||
|     such cases, try setting this to a reasonably low value, like 32.  The | ||||
|     default is to use whatever is necessary to do everything without writing to | ||||
|     disk, and units are in megabytes. | ||||
|  | ||||
|     For more information on how to use this value, you should search | ||||
|     the web for "MAGICK_MEMORY_LIMIT". | ||||
|  | ||||
|     Defaults to 0, which disables the limit. | ||||
|  | ||||
| PAPERLESS_CONVERT_TMPDIR=<path> | ||||
|     Similar to the memory limit, if you've got a small system and your OS mounts | ||||
|     /tmp as tmpfs, you should set this to a path that's on a physical disk, like | ||||
|     /home/your_user/tmp or something.  ImageMagick will use this as scratch space | ||||
|     when crunching through very large documents. | ||||
|  | ||||
|     For more information on how to use this value, you should search | ||||
|     the web for "MAGICK_TMPDIR". | ||||
|  | ||||
|     Default is none, which disables the temporary directory. | ||||
|  | ||||
| PAPERLESS_POST_CONSUME_SCRIPT=<filename> | ||||
|     After a document is consumed, Paperless can trigger an arbitrary script if | ||||
|     you like.  This script will be passed a number of arguments for you to work | ||||
|     with. For more information, take a look at :ref:`advanced-post_consume_script`. | ||||
|  | ||||
|     The default is blank, which means nothing will be executed. | ||||
|  | ||||
| PAPERLESS_FILENAME_DATE_ORDER=<format> | ||||
|     Paperless will check the document text for document date information. | ||||
|     Use this setting to enable checking the document filename for date | ||||
|     information. The date order can be set to any option as specified in | ||||
|     https://dateparser.readthedocs.io/en/latest/settings.html#date-order. | ||||
|     The filename will be checked first, and if nothing is found, the document | ||||
|     text will be checked as normal. | ||||
|  | ||||
|     A date in a filename must have some separators (`.`, `-`, `/`, etc) | ||||
|     for it to be parsed. | ||||
|  | ||||
|     Defaults to none, which disables this feature. | ||||
|  | ||||
| PAPERLESS_NUMBER_OF_SUGGESTED_DATES=<num> | ||||
|     Paperless searches an entire document for dates. The first date found will | ||||
|     be used as the initial value for the created date. When this variable is | ||||
|     greater than 0 (or left to it's default value), paperless will also suggest | ||||
|     other dates found in the document, up to a maximum of this setting. Note that | ||||
|     duplicates will be removed, which can result in fewer dates displayed in the | ||||
|     frontend than this setting value. | ||||
|  | ||||
|     The task to find all dates can be time-consuming and increases with a higher | ||||
|     (maximum) number of suggested dates and slower hardware. | ||||
|  | ||||
|     Defaults to 3. Set to 0 to disable this feature. | ||||
|  | ||||
| PAPERLESS_THUMBNAIL_FONT_NAME=<filename> | ||||
|     Paperless creates thumbnails for plain text files by rendering the content | ||||
|     of the file on an image and uses a predefined font for that. This | ||||
|     font can be changed here. | ||||
|  | ||||
|     Note that this won't have any effect on already generated thumbnails. | ||||
|  | ||||
|     Defaults to ``/usr/share/fonts/liberation/LiberationSerif-Regular.ttf``. | ||||
|  | ||||
| PAPERLESS_IGNORE_DATES=<string> | ||||
|     Paperless parses a documents creation date from filename and file content. | ||||
|     You may specify a comma separated list of dates that should be ignored during | ||||
|     this process. This is useful for special dates (like date of birth) that appear | ||||
|     in documents regularly but are very unlikely to be the documents creation date. | ||||
|  | ||||
|     The date is parsed using the order specified in PAPERLESS_DATE_ORDER | ||||
|  | ||||
|     Defaults to an empty string to not ignore any dates. | ||||
|  | ||||
| PAPERLESS_DATE_ORDER=<format> | ||||
|     Paperless will try to determine the document creation date from its contents. | ||||
|     Specify the date format Paperless should expect to see within your documents. | ||||
|  | ||||
|     This option defaults to DMY which translates to day first, month second, and year | ||||
|     last order. Characters D, M, or Y can be shuffled to meet the required order. | ||||
|  | ||||
| PAPERLESS_CONSUMER_IGNORE_PATTERNS=<json> | ||||
|     By default, paperless ignores certain files and folders in the consumption | ||||
|     directory, such as system files created by the Mac OS. | ||||
|  | ||||
|     This can be adjusted by configuring a custom json array with patterns to exclude. | ||||
|  | ||||
|     Defaults to ``[".DS_STORE/*", "._*", ".stfolder/*", ".stversions/*", ".localized/*", "desktop.ini"]``. | ||||
|  | ||||
| Binaries | ||||
| ######## | ||||
|  | ||||
| There are a few external software packages that Paperless expects to find on | ||||
| your system when it starts up.  Unless you've done something creative with | ||||
| their installation, you probably won't need to edit any of these.  However, | ||||
| if you've installed these programs somewhere where simply typing the name of | ||||
| the program doesn't automatically execute it (ie. the program isn't in your | ||||
| $PATH), then you'll need to specify the literal path for that program. | ||||
|  | ||||
| PAPERLESS_CONVERT_BINARY=<path> | ||||
|     Defaults to "convert". | ||||
|  | ||||
| PAPERLESS_GS_BINARY=<path> | ||||
|     Defaults to "gs". | ||||
|  | ||||
|  | ||||
| .. _configuration-docker: | ||||
|  | ||||
| Docker-specific options | ||||
| ####################### | ||||
|  | ||||
| These options don't have any effect in ``paperless.conf``. These options adjust | ||||
| the behavior of the docker container. Configure these in `docker-compose.env`. | ||||
|  | ||||
| PAPERLESS_WEBSERVER_WORKERS=<num> | ||||
|     The number of worker processes the webserver should spawn. More worker processes | ||||
|     usually result in the front end to load data much quicker. However, each worker process | ||||
|     also loads the entire application into memory separately, so increasing this value | ||||
|     will increase RAM usage. | ||||
|  | ||||
|     Defaults to 1. | ||||
|  | ||||
| PAPERLESS_BIND_ADDR=<ip address> | ||||
|     The IP address the webserver will listen on inside the container. There are | ||||
|     special setups where you may need to configure this value to restrict the | ||||
|     Ip address or interface the webserver listens on. | ||||
|  | ||||
|     Defaults to [::], meaning all interfaces, including IPv6. | ||||
|  | ||||
| PAPERLESS_PORT=<port> | ||||
|     The port number the webserver will listen on inside the container. There are | ||||
|     special setups where you may need this to avoid collisions with other | ||||
|     services (like using podman with multiple containers in one pod). | ||||
|  | ||||
|     Don't change this when using Docker. To change the port the webserver is | ||||
|     reachable outside of the container, instead refer to the "ports" key in | ||||
|     ``docker-compose.yml``. | ||||
|  | ||||
|     Defaults to 8000. | ||||
|  | ||||
| USERMAP_UID=<uid> | ||||
|     The ID of the paperless user in the container. Set this to your actual user ID on the | ||||
|     host system, which you can get by executing | ||||
|  | ||||
|     .. code:: shell-session | ||||
|  | ||||
|         $ id -u | ||||
|  | ||||
|     Paperless will change ownership on its folders to this user, so you need to get this right | ||||
|     in order to be able to write to the consumption directory. | ||||
|  | ||||
|     Defaults to 1000. | ||||
|  | ||||
| USERMAP_GID=<gid> | ||||
|     The ID of the paperless Group in the container. Set this to your actual group ID on the | ||||
|     host system, which you can get by executing | ||||
|  | ||||
|     .. code:: shell-session | ||||
|  | ||||
|         $ id -g | ||||
|  | ||||
|     Paperless will change ownership on its folders to this group, so you need to get this right | ||||
|     in order to be able to write to the consumption directory. | ||||
|  | ||||
|     Defaults to 1000. | ||||
|  | ||||
| PAPERLESS_OCR_LANGUAGES=<list> | ||||
|     Additional OCR languages to install. By default, paperless comes with | ||||
|     English, German, Italian, Spanish and French. If your language is not in this list, install | ||||
|     additional languages with this configuration option: | ||||
|  | ||||
|     .. code:: bash | ||||
|  | ||||
|         PAPERLESS_OCR_LANGUAGES=tur ces | ||||
|  | ||||
|     To actually use these languages, also set the default OCR language of paperless: | ||||
|  | ||||
|     .. code:: bash | ||||
|  | ||||
|         PAPERLESS_OCR_LANGUAGE=tur | ||||
|  | ||||
|     Defaults to none, which does not install any additional languages. | ||||
|  | ||||
| PAPERLESS_ENABLE_FLOWER=<defined> | ||||
|     If this environment variable is defined, the Celery monitoring tool | ||||
|     `Flower <https://flower.readthedocs.io/en/latest/index.html>`_ will | ||||
|     be started by the container. | ||||
|  | ||||
|     You can read more about this in the :ref:`advanced setup <advanced-celery-monitoring>` | ||||
|     documentation. | ||||
|  | ||||
|  | ||||
| .. _configuration-update-checking: | ||||
|  | ||||
| Update Checking | ||||
| ############### | ||||
|  | ||||
| PAPERLESS_ENABLE_UPDATE_CHECK=<bool> | ||||
|  | ||||
|     .. note:: | ||||
|  | ||||
|             This setting was deprecated in favor of a frontend setting after v1.9.2. A one-time | ||||
|             migration is performed for users who have this setting set. This setting is always | ||||
|             ignored if the corresponding frontend setting has been set. | ||||
|     You will be redirected shortly... | ||||
|   | ||||
| @@ -1,431 +1,12 @@ | ||||
| .. _extending: | ||||
|  | ||||
| ************************* | ||||
| Paperless-ngx Development | ||||
| ######################### | ||||
| ************************* | ||||
|  | ||||
| This section describes the steps you need to take to start development on paperless-ngx. | ||||
|  | ||||
| Check out the source from github. The repository is organized in the following way: | ||||
| .. cssclass:: redirect-notice | ||||
|  | ||||
| *   ``main`` always represents the latest release and will only see changes | ||||
|     when a new release is made. | ||||
| *   ``dev`` contains the code that will be in the next release. | ||||
| *   ``feature-X`` contain bigger changes that will be in some release, but not | ||||
|     necessarily the next one. | ||||
|     The Paperless-ngx documentation has permanently moved. | ||||
|  | ||||
| When making functional changes to paperless, *always* make your changes on the ``dev`` branch. | ||||
|  | ||||
| Apart from that, the folder structure is as follows: | ||||
|  | ||||
| *   ``docs/`` - Documentation. | ||||
| *   ``src-ui/`` - Code of the front end. | ||||
| *   ``src/`` - Code of the back end. | ||||
| *   ``scripts/`` - Various scripts that help with different parts of development. | ||||
| *   ``docker/`` - Files required to build the docker image. | ||||
|  | ||||
| Contributing to Paperless | ||||
| ========================= | ||||
|  | ||||
| Maybe you've been using Paperless for a while and want to add a feature or two, | ||||
| or maybe you've come across a bug that you have some ideas how to solve.  The | ||||
| beauty of open source software is that you can see what's wrong and help to get | ||||
| it fixed for everyone! | ||||
|  | ||||
| Before contributing please review our `code of conduct`_ and other important | ||||
| information in the `contributing guidelines`_. | ||||
|  | ||||
| .. _code-formatting-with-pre-commit-hooks: | ||||
|  | ||||
| Code formatting with pre-commit Hooks | ||||
| ===================================== | ||||
|  | ||||
| To ensure a consistent style and formatting across the project source, the project | ||||
| utilizes a Git `pre-commit` hook to perform some formatting and linting before a | ||||
| commit is allowed. That way, everyone uses the same style and some common issues | ||||
| can be caught early on. See below for installation instructions. | ||||
|  | ||||
| Once installed, hooks will run when you commit. If the formatting isn't quite right | ||||
| or a linter catches something, the commit will be rejected. You'll need to look at the | ||||
| output and fix the issue. Some hooks, such as the Python formatting tool `black`, | ||||
| will format failing files, so all you need to do is `git add` those files again and | ||||
| retry your commit. | ||||
|  | ||||
| Initial setup and first start | ||||
| ============================= | ||||
|  | ||||
| After you forked and cloned the code from github you need to perform a first-time setup. | ||||
| To do the setup you need to perform the steps from the following chapters in a certain order: | ||||
|  | ||||
| 1.  Install prerequisites + pipenv as mentioned in :ref:`Bare metal route <setup-bare_metal>` | ||||
| 2.  Copy ``paperless.conf.example`` to ``paperless.conf`` and enable debug mode. | ||||
| 3.  Install the Angular CLI interface: | ||||
|  | ||||
|     .. code:: shell-session | ||||
|  | ||||
|         $ npm install -g @angular/cli | ||||
|  | ||||
| 4.  Install pre-commit | ||||
|  | ||||
|     .. code:: shell-session | ||||
|  | ||||
|         pre-commit install | ||||
|  | ||||
| 5.  Create ``consume`` and ``media`` folders in the cloned root folder. | ||||
|  | ||||
|     .. code:: shell-session | ||||
|  | ||||
|         mkdir -p consume media | ||||
|  | ||||
| 6.  You can now either ... | ||||
|  | ||||
|     *  install redis or | ||||
|     *  use the included scripts/start-services.sh to use docker to fire up a redis instance (and some other services such as tika, gotenberg and a database server) or | ||||
|     *  spin up a bare redis container | ||||
|  | ||||
|         .. code:: shell-session | ||||
|  | ||||
|             docker run -d -p 6379:6379 --restart unless-stopped redis:latest | ||||
|  | ||||
| 7.  Install the python dependencies by performing in the src/ directory. | ||||
|  | ||||
|     .. code:: shell-session | ||||
|  | ||||
|         pipenv install --dev | ||||
|  | ||||
|   * Make sure you're using python 3.9.x or lower. Otherwise you might get issues with building dependencies. You can use `pyenv <https://github.com/pyenv/pyenv>`_ to install a specific python version. | ||||
|  | ||||
| 8.  Generate the static UI so you can perform a login to get session that is required for frontend development (this needs to be done one time only). From src-ui directory: | ||||
|  | ||||
|     .. code:: shell-session | ||||
|  | ||||
|         npm install . | ||||
|         ./node_modules/.bin/ng build --configuration production | ||||
|  | ||||
| 9.  Apply migrations and create a superuser for your dev instance: | ||||
|  | ||||
|     .. code:: shell-session | ||||
|  | ||||
|         python3 manage.py migrate | ||||
|         python3 manage.py createsuperuser | ||||
|  | ||||
| 10.  Now spin up the dev backend. Depending on which part of paperless you're developing for, you need to have some or all of them running. | ||||
|  | ||||
|     .. code:: shell-session | ||||
|  | ||||
|         python3 manage.py runserver & python3 manage.py document_consumer & celery --app paperless worker | ||||
|  | ||||
| 11. Login with the superuser credentials provided in step 8 at ``http://localhost:8000`` to create a session that enables you to use the backend. | ||||
|  | ||||
| Backend development environment is now ready, to start Frontend development go to ``/src-ui`` and run ``ng serve``. From there you can use ``http://localhost:4200`` for a preview. | ||||
|  | ||||
| Back end development | ||||
| ==================== | ||||
|  | ||||
| The backend is a django application. PyCharm works well for development, but you can use whatever | ||||
| you want. | ||||
|  | ||||
| Configure the IDE to use the src/ folder as the base source folder. Configure the following | ||||
| launch configurations in your IDE: | ||||
|  | ||||
| *   python3 manage.py runserver | ||||
| *   celery --app paperless worker | ||||
| *   python3 manage.py document_consumer | ||||
|  | ||||
| To start them all: | ||||
|  | ||||
| .. code:: shell-session | ||||
|  | ||||
|     python3 manage.py runserver & python3 manage.py document_consumer & celery --app paperless worker | ||||
|  | ||||
| Testing and code style: | ||||
|  | ||||
| *   Run ``pytest`` in the src/ directory to execute all tests. This also generates a HTML coverage | ||||
|     report. When runnings test, paperless.conf is loaded as well. However: the tests rely on the default | ||||
|     configuration. This is not ideal. But for now, make sure no settings except for DEBUG are overridden when testing. | ||||
| *   Coding style is enforced by the Git pre-commit hooks.  These will ensure your code is formatted and do some | ||||
|     linting when you do a `git commit`. | ||||
| *   You can also run ``black`` manually to format your code | ||||
|  | ||||
|     .. note:: | ||||
|  | ||||
|         The line length rule E501 is generally useful for getting multiple source files | ||||
|         next to each other on the screen. However, in some cases, its just not possible | ||||
|         to make some lines fit, especially complicated IF cases. Append ``# NOQA: E501`` | ||||
|         to disable this check for certain lines. | ||||
|  | ||||
| Front end development | ||||
| ===================== | ||||
|  | ||||
| The front end is built using Angular. In order to get started, you need ``npm``. | ||||
| Install the Angular CLI interface with | ||||
|  | ||||
| .. code:: shell-session | ||||
|  | ||||
|     $ npm install -g @angular/cli | ||||
|  | ||||
| and make sure that it's on your path. Next, in the src-ui/ directory, install the | ||||
| required dependencies of the project. | ||||
|  | ||||
| .. code:: shell-session | ||||
|  | ||||
|     $ npm install | ||||
|  | ||||
| You can launch a development server by running | ||||
|  | ||||
| .. code:: shell-session | ||||
|  | ||||
|     $ ng serve | ||||
|  | ||||
| This will automatically update whenever you save. However, in-place compilation might fail | ||||
| on syntax errors, in which case you need to restart it. | ||||
|  | ||||
| By default, the development server is available on ``http://localhost:4200/`` and is configured | ||||
| to access the API at ``http://localhost:8000/api/``, which is the default of the backend. | ||||
| If you enabled DEBUG on the back end, several security overrides for allowed hosts, CORS and | ||||
| X-Frame-Options are in place so that the front end behaves exactly as in production. This also | ||||
| relies on you being logged into the back end. Without a valid session, The front end will simply | ||||
| not work. | ||||
|  | ||||
| Testing and code style: | ||||
|  | ||||
| *   The frontend code (.ts, .html, .scss) use ``prettier`` for code formatting via the Git | ||||
|     ``pre-commit`` hooks which run automatically on commit. See | ||||
|     :ref:`above <code-formatting-with-pre-commit-hooks>` for installation. You can also run this | ||||
|     via cli with a command such as | ||||
|  | ||||
|     .. code:: shell-session | ||||
|  | ||||
|         $ git ls-files -- '*.ts' | xargs pre-commit run prettier --files | ||||
|  | ||||
| *   Frontend testing uses jest and cypress. There is currently a need for significantly more | ||||
|     frontend tests. Unit tests and e2e tests, respectively, can be run non-interactively with: | ||||
|  | ||||
|     .. code:: shell-session | ||||
|  | ||||
|         $ ng test | ||||
|         $ npm run e2e:ci | ||||
|  | ||||
|     Cypress also includes a UI which can be run from within the ``src-ui`` directory with | ||||
|  | ||||
|     .. code:: shell-session | ||||
|  | ||||
|         $ ./node_modules/.bin/cypress open | ||||
|  | ||||
| In order to build the front end and serve it as part of django, execute | ||||
|  | ||||
| .. code:: shell-session | ||||
|  | ||||
|     $ ng build --prod | ||||
|  | ||||
| This will build the front end and put it in a location from which the Django server will serve | ||||
| it as static content. This way, you can verify that authentication is working. | ||||
|  | ||||
|  | ||||
| Localization | ||||
| ============ | ||||
|  | ||||
| Paperless is available in many different languages. Since paperless consists both of a django | ||||
| application and an Angular front end, both these parts have to be translated separately. | ||||
|  | ||||
| Front end localization | ||||
| ---------------------- | ||||
|  | ||||
| *   The Angular front end does localization according to the `Angular documentation <https://angular.io/guide/i18n>`_. | ||||
| *   The source language of the project is "en_US". | ||||
| *   The source strings end up in the file "src-ui/messages.xlf". | ||||
| *   The translated strings need to be placed in the "src-ui/src/locale/" folder. | ||||
| *   In order to extract added or changed strings from the source files, call ``ng xi18n --ivy``. | ||||
|  | ||||
| Adding new languages requires adding the translated files in the "src-ui/src/locale/" folder and adjusting a couple files. | ||||
|  | ||||
| 1.  Adjust "src-ui/angular.json": | ||||
|  | ||||
|     .. code:: json | ||||
|  | ||||
|         "i18n": { | ||||
|             "sourceLocale": "en-US", | ||||
|             "locales": { | ||||
|                 "de": "src/locale/messages.de.xlf", | ||||
|                 "nl-NL": "src/locale/messages.nl_NL.xlf", | ||||
|                 "fr": "src/locale/messages.fr.xlf", | ||||
|                 "en-GB": "src/locale/messages.en_GB.xlf", | ||||
|                 "pt-BR": "src/locale/messages.pt_BR.xlf", | ||||
|                 "language-code": "language-file" | ||||
|             } | ||||
|         } | ||||
|  | ||||
| 2.  Add the language to the available options in "src-ui/src/app/services/settings.service.ts": | ||||
|  | ||||
|     .. code:: typescript | ||||
|  | ||||
|         getLanguageOptions(): LanguageOption[] { | ||||
|             return [ | ||||
|                 {code: "en-us", name: $localize`English (US)`, englishName: "English (US)", dateInputFormat: "mm/dd/yyyy"}, | ||||
|                 {code: "en-gb", name: $localize`English (GB)`, englishName: "English (GB)", dateInputFormat: "dd/mm/yyyy"}, | ||||
|                 {code: "de", name: $localize`German`, englishName: "German", dateInputFormat: "dd.mm.yyyy"}, | ||||
|                 {code: "nl", name: $localize`Dutch`, englishName: "Dutch", dateInputFormat: "dd-mm-yyyy"}, | ||||
|                 {code: "fr", name: $localize`French`, englishName: "French", dateInputFormat: "dd/mm/yyyy"}, | ||||
|                 {code: "pt-br", name: $localize`Portuguese (Brazil)`, englishName: "Portuguese (Brazil)", dateInputFormat: "dd/mm/yyyy"} | ||||
|                 // Add your new language here | ||||
|             ] | ||||
|         } | ||||
|  | ||||
|     ``dateInputFormat`` is a special string that defines the behavior of the date input fields and absolutely needs to contain "dd", "mm" and "yyyy". | ||||
|  | ||||
| 3.  Import and register the Angular data for this locale in "src-ui/src/app/app.module.ts": | ||||
|  | ||||
|     .. code:: typescript | ||||
|  | ||||
|         import localeDe from '@angular/common/locales/de'; | ||||
|         registerLocaleData(localeDe) | ||||
|  | ||||
| Back end localization | ||||
| --------------------- | ||||
|  | ||||
| A majority of the strings that appear in the back end appear only when the admin is used. However, | ||||
| some of these are still shown on the front end (such as error messages). | ||||
|  | ||||
| *   The django application does localization according to the `django documentation <https://docs.djangoproject.com/en/3.1/topics/i18n/translation/>`_. | ||||
| *   The source language of the project is "en_US". | ||||
| *   Localization files end up in the folder "src/locale/". | ||||
| *   In order to extract strings from the application, call ``python3 manage.py makemessages -l en_US``. This is important after making changes to translatable strings. | ||||
| *   The message files need to be compiled for them to show up in the application. Call ``python3 manage.py compilemessages`` to do this. The generated files don't get | ||||
|     committed into git, since these are derived artifacts. The build pipeline takes care of executing this command. | ||||
|  | ||||
| Adding new languages requires adding the translated files in the "src/locale/" folder and adjusting the file "src/paperless/settings.py" to include the new language: | ||||
|  | ||||
| .. code:: python | ||||
|  | ||||
|     LANGUAGES = [ | ||||
|         ("en-us", _("English (US)")), | ||||
|         ("en-gb", _("English (GB)")), | ||||
|         ("de", _("German")), | ||||
|         ("nl-nl", _("Dutch")), | ||||
|         ("fr", _("French")), | ||||
|         ("pt-br", _("Portuguese (Brazil)")), | ||||
|         # Add language here. | ||||
|     ] | ||||
|  | ||||
|  | ||||
| Building the documentation | ||||
| ========================== | ||||
|  | ||||
| The documentation is built using sphinx. I've configured ReadTheDocs to automatically build | ||||
| the documentation when changes are pushed. If you want to build the documentation locally, | ||||
| this is how you do it: | ||||
|  | ||||
| 1.  Install python dependencies. | ||||
|  | ||||
|     .. code:: shell-session | ||||
|  | ||||
|         $ cd /path/to/paperless | ||||
|         $ pipenv install --dev | ||||
|  | ||||
| 2.  Build the documentation | ||||
|  | ||||
|     .. code:: shell-session | ||||
|  | ||||
|         $ cd /path/to/paperless/docs | ||||
|         $ pipenv run make clean html | ||||
|  | ||||
| This will build the HTML documentation, and put the resulting files in the ``_build/html`` | ||||
| directory. | ||||
|  | ||||
| Building the Docker image | ||||
| ========================= | ||||
|  | ||||
| The docker image is primarily built by the GitHub actions workflow, but it can be | ||||
| faster when developing to build and tag an image locally. | ||||
|  | ||||
| To provide the build arguments automatically, build the image using the helper | ||||
| script ``build-docker-image.sh``. | ||||
|  | ||||
| Building the docker image from source: | ||||
|  | ||||
|     .. code:: shell-session | ||||
|  | ||||
|         ./build-docker-image.sh Dockerfile -t <your-tag> | ||||
|  | ||||
| Extending Paperless | ||||
| =================== | ||||
|  | ||||
| Paperless does not have any fancy plugin systems and will probably never have. However, | ||||
| some parts of the application have been designed to allow easy integration of additional | ||||
| features without any modification to the base code. | ||||
|  | ||||
| Making custom parsers | ||||
| --------------------- | ||||
|  | ||||
| Paperless uses parsers to add documents to paperless. A parser is responsible for: | ||||
|  | ||||
| *   Retrieve the content from the original | ||||
| *   Create a thumbnail | ||||
| *   Optional: Retrieve a created date from the original | ||||
| *   Optional: Create an archived document from the original | ||||
|  | ||||
| Custom parsers can be added to paperless to support more file types. In order to do that, | ||||
| you need to write the parser itself and announce its existence to paperless. | ||||
|  | ||||
| The parser itself must extend ``documents.parsers.DocumentParser`` and must implement the | ||||
| methods ``parse`` and ``get_thumbnail``. You can provide your own implementation to | ||||
| ``get_date`` if you don't want to rely on paperless' default date guessing mechanisms. | ||||
|  | ||||
| .. code:: python | ||||
|  | ||||
|     class MyCustomParser(DocumentParser): | ||||
|  | ||||
|         def parse(self, document_path, mime_type): | ||||
|             # This method does not return anything. Rather, you should assign | ||||
|             # whatever you got from the document to the following fields: | ||||
|  | ||||
|             # The content of the document. | ||||
|             self.text = "content" | ||||
|  | ||||
|             # Optional: path to a PDF document that you created from the original. | ||||
|             self.archive_path = os.path.join(self.tempdir, "archived.pdf") | ||||
|  | ||||
|             # Optional: "created" date of the document. | ||||
|             self.date = get_created_from_metadata(document_path) | ||||
|  | ||||
|         def get_thumbnail(self, document_path, mime_type): | ||||
|             # This should return the path to a thumbnail you created for this | ||||
|             # document. | ||||
|             return os.path.join(self.tempdir, "thumb.png") | ||||
|  | ||||
| If you encounter any issues during parsing, raise a ``documents.parsers.ParseError``. | ||||
|  | ||||
| The ``self.tempdir`` directory is a temporary directory that is guaranteed to be empty | ||||
| and removed after consumption finished. You can use that directory to store any | ||||
| intermediate files and also use it to store the thumbnail / archived document. | ||||
|  | ||||
| After that, you need to announce your parser to paperless. You need to connect a | ||||
| handler to the ``document_consumer_declaration`` signal. Have a look in the file | ||||
| ``src/paperless_tesseract/apps.py`` on how that's done. The handler is a method | ||||
| that returns information about your parser: | ||||
|  | ||||
| .. code:: python | ||||
|  | ||||
|     def myparser_consumer_declaration(sender, **kwargs): | ||||
|         return { | ||||
|             "parser": MyCustomParser, | ||||
|             "weight": 0, | ||||
|             "mime_types": { | ||||
|                 "application/pdf": ".pdf", | ||||
|                 "image/jpeg": ".jpg", | ||||
|             } | ||||
|         } | ||||
|  | ||||
| *   ``parser`` is a reference to a class that extends ``DocumentParser``. | ||||
|  | ||||
| *   ``weight`` is used whenever two or more parsers are able to parse a file: The parser with | ||||
|     the higher weight wins. This can be used to override the parsers provided by | ||||
|     paperless. | ||||
|  | ||||
| *   ``mime_types`` is a dictionary. The keys are the mime types your parser supports and the value | ||||
|     is the default file extension that paperless should use when storing files and serving them for | ||||
|     download. We could guess that from the file extensions, but some mime types have many extensions | ||||
|     associated with them and the python methods responsible for guessing the extension do not always | ||||
|     return the same value. | ||||
|  | ||||
| .. _code of conduct: https://github.com/paperless-ngx/paperless-ngx/blob/main/CODE_OF_CONDUCT.md | ||||
| .. _contributing guidelines: https://github.com/paperless-ngx/paperless-ngx/blob/main/CONTRIBUTING.md | ||||
|     You will be redirected shortly... | ||||
|   | ||||
							
								
								
									
										113
									
								
								docs/faq.rst
									
									
									
									
									
								
							
							
						
						
									
										113
									
								
								docs/faq.rst
									
									
									
									
									
								
							| @@ -1,117 +1,12 @@ | ||||
| .. _faq: | ||||
|  | ||||
| ************************** | ||||
| Frequently asked questions | ||||
| ************************** | ||||
|  | ||||
| **Q:** *What's the general plan for Paperless-ngx?* | ||||
|  | ||||
| **A:** While Paperless-ngx is already considered largely "feature-complete" it is a community-driven | ||||
| project and development will be guided in this way. New features can be submitted via | ||||
| GitHub discussions and "up-voted" by the community but this is not a guarantee the feature | ||||
| will be implemented. This project will always be open to collaboration in the form of PRs, | ||||
| ideas etc. | ||||
| .. cssclass:: redirect-notice | ||||
|  | ||||
| **Q:** *I'm using docker. Where are my documents?* | ||||
|     The Paperless-ngx documentation has permanently moved. | ||||
|  | ||||
| **A:** Your documents are stored inside the docker volume ``paperless_media``. | ||||
| Docker manages this volume automatically for you. It is a persistent storage | ||||
| and will persist as long as you don't explicitly delete it. The actual location | ||||
| depends on your host operating system. On Linux, chances are high that this location | ||||
| is | ||||
|  | ||||
| .. code:: | ||||
|  | ||||
|     /var/lib/docker/volumes/paperless_media/_data | ||||
|  | ||||
| .. caution:: | ||||
|  | ||||
|     Do not mess with this folder. Don't change permissions and don't move | ||||
|     files around manually. This folder is meant to be entirely managed by docker | ||||
|     and paperless. | ||||
|  | ||||
| **Q:** *Let's say I want to switch tools in a year. Can I easily move to other systems?* | ||||
|  | ||||
| **A:** Your documents are stored as plain files inside the media folder. You can always drag those files | ||||
| out of that folder to use them elsewhere. Here are a couple notes about that. | ||||
|  | ||||
| *   Paperless-ngx never modifies your original documents. It keeps checksums of all documents and uses a | ||||
|     scheduled sanity checker to check that they remain the same. | ||||
| *   By default, paperless uses the internal ID of each document as its filename. This might not be very | ||||
|     convenient for export. However, you can adjust the way files are stored in paperless by | ||||
|     :ref:`configuring the filename format <advanced-file_name_handling>`. | ||||
| *   :ref:`The exporter <utilities-exporter>` is another easy way to get your files out of paperless with reasonable file names. | ||||
|  | ||||
| **Q:** *What file types does paperless-ngx support?* | ||||
|  | ||||
| **A:** Currently, the following files are supported: | ||||
|  | ||||
| *   PDF documents, PNG images, JPEG images, TIFF images and GIF images are processed with OCR and converted into PDF documents. | ||||
| *   Plain text documents are supported as well and are added verbatim | ||||
|     to paperless. | ||||
| *   With the optional Tika integration enabled (see :ref:`Configuration <configuration-tika>`), Paperless also supports various | ||||
|     Office documents (.docx, .doc, odt, .ppt, .pptx, .odp, .xls, .xlsx, .ods). | ||||
|  | ||||
| Paperless-ngx determines the type of a file by inspecting its content. The | ||||
| file extensions do not matter. | ||||
|  | ||||
| **Q:** *Will paperless-ngx run on Raspberry Pi?* | ||||
|  | ||||
| **A:** The short answer is yes. I've tested it on a Raspberry Pi 3 B. | ||||
| The long answer is that certain parts of | ||||
| Paperless will run very slow, such as the OCR. On Raspberry Pi, | ||||
| try to OCR documents before feeding them into paperless so that paperless can | ||||
| reuse the text. The web interface is a lot snappier, since it runs | ||||
| in your browser and paperless has to do much less work to serve the data. | ||||
|  | ||||
| .. note:: | ||||
|  | ||||
|     You can adjust some of the settings so that paperless uses less processing | ||||
|     power. See :ref:`setup-less_powerful_devices` for details. | ||||
|  | ||||
|  | ||||
| **Q:** *How do I install paperless-ngx on Raspberry Pi?* | ||||
|  | ||||
| **A:** Docker images are available for arm and arm64 hardware, so just follow | ||||
| the docker-compose instructions. Apart from more required disk space compared to | ||||
| a bare metal installation, docker comes with close to zero overhead, even on | ||||
| Raspberry Pi. | ||||
|  | ||||
| If you decide to got with the bare metal route, be aware that some of the | ||||
| python requirements do not have precompiled packages for ARM / ARM64. Installation | ||||
| of these will require additional development libraries and compilation will take | ||||
| a long time. | ||||
|  | ||||
| **Q:** *How do I run this on Unraid?* | ||||
|  | ||||
| **A:** Paperless-ngx is available as `community app <https://unraid.net/community/apps?q=paperless-ngx>`_ | ||||
| in Unraid. `Uli Fahrer <https://github.com/Tooa>`_ created a container template for that. | ||||
|  | ||||
| **Q:** *How do I run this on my toaster?* | ||||
|  | ||||
| **A:** I honestly don't know! As for all other devices that might be able | ||||
| to run paperless, you're a bit on your own. If you can't run the docker image, | ||||
| the documentation has instructions for bare metal installs. I'm running | ||||
| paperless on an i3 processor from 2015 or so. This is also what I use to test | ||||
| new releases with. Apart from that, I also have a Raspberry Pi, which I | ||||
| occasionally build the image on and see if it works. | ||||
|  | ||||
| **Q:** *How do I proxy this with NGINX?* | ||||
|  | ||||
| **A:** See :ref:`here <setup-nginx>`. | ||||
|  | ||||
| .. _faq-mod_wsgi: | ||||
|  | ||||
| **Q:** *How do I get WebSocket support with Apache mod_wsgi*? | ||||
|  | ||||
| **A:** ``mod_wsgi`` by itself does not support ASGI. Paperless will continue | ||||
| to work with WSGI, but certain features such as status notifications about | ||||
| document consumption won't be available. | ||||
|  | ||||
| If you want to continue using ``mod_wsgi``, you will have to run an ASGI-enabled | ||||
| web server as well that processes WebSocket connections, and configure Apache to | ||||
| redirect WebSocket connections to this server. Multiple options for ASGI servers | ||||
| exist: | ||||
|  | ||||
| * ``gunicorn`` with ``uvicorn`` as the worker implementation (the default of paperless) | ||||
| * ``daphne`` as a standalone server, which is the reference implementation for ASGI. | ||||
| * ``uvicorn`` as a standalone server | ||||
|     You will be redirected shortly... | ||||
|   | ||||
| @@ -2,74 +2,24 @@ | ||||
| Paperless | ||||
| ********* | ||||
|  | ||||
| Paperless is a simple Django application running in two parts: | ||||
| a *Consumer* (the thing that does the indexing) and | ||||
| the *Web server* (the part that lets you search & | ||||
| download already-indexed documents). If you want to learn more about its | ||||
| functions keep on reading after the installation section. | ||||
|  | ||||
| .. cssclass:: redirect-notice | ||||
|  | ||||
| Why This Exists | ||||
| =============== | ||||
|     The Paperless-ngx documentation has permanently moved. | ||||
|  | ||||
| Paper is a nightmare.  Environmental issues aside, there's no excuse for it in | ||||
| the 21st century.  It takes up space, collects dust, doesn't support any form | ||||
| of a search feature, indexing is tedious, it's heavy and prone to damage & | ||||
| loss. | ||||
|  | ||||
| I wrote this to make "going paperless" easier.  I do not have to worry about | ||||
| finding stuff again. I feed documents right from the post box into the scanner | ||||
| and then shred them.  Perhaps you might find it useful too. | ||||
|  | ||||
|  | ||||
| Paperless-ngx | ||||
| ============= | ||||
|  | ||||
| Paperless-ngx is a document management system that transforms your physical | ||||
| documents into a searchable online archive so you can keep, well, *less paper*. | ||||
|  | ||||
| Paperless-ngx forked from paperless-ng to continue the great work and | ||||
| distribute responsibility of supporting and advancing the project among a team | ||||
| of people. | ||||
|  | ||||
| NG stands for both Angular (the framework used for the | ||||
| Frontend) and next-gen. Publishing this project under a different name also | ||||
| avoids confusion between paperless and paperless-ngx. | ||||
|  | ||||
| If you want to learn about what's different in paperless-ngx from Paperless, check out these | ||||
| resources in the documentation: | ||||
|  | ||||
| *   :ref:`Some screenshots <screenshots>` of the new UI are available. | ||||
| *   Read :ref:`this section <advanced-automatic_matching>` if you want to | ||||
|     learn about how paperless automates all tagging using machine learning. | ||||
| *   Paperless now comes with a :ref:`proper email consumer <usage-email>` | ||||
|     that's fully tested and production ready. | ||||
| *   Paperless creates searchable PDF/A documents from whatever you put into | ||||
|     the consumption directory. This means that you can select text in | ||||
|     image-only documents coming from your scanner. | ||||
| *   See :ref:`this note <utilities-encyption>` about GnuPG encryption in | ||||
|     paperless-ngx. | ||||
| *   Paperless is now integrated with a | ||||
|     :ref:`task processing queue <setup-task_processor>` that tells you | ||||
|     at a glance when and why something is not working. | ||||
| *   The :doc:`changelog </changelog>` contains a detailed list of all changes | ||||
|     in paperless-ngx. | ||||
|  | ||||
| Contents | ||||
| ======== | ||||
|     You will be redirected shortly... | ||||
|  | ||||
| .. toctree:: | ||||
|    :maxdepth: 1 | ||||
|  | ||||
|    setup | ||||
|    usage_overview | ||||
|    advanced_usage | ||||
|    administration | ||||
|    configuration | ||||
|    api | ||||
|    faq | ||||
|    troubleshooting | ||||
|    extending | ||||
|    scanners | ||||
|    screenshots | ||||
|    changelog | ||||
|     screenshots | ||||
|     scanners | ||||
|     administration | ||||
|     advanced_usage | ||||
|     usage_overview | ||||
|     setup | ||||
|     troubleshooting | ||||
|     changelog | ||||
|     configuration | ||||
|     extending | ||||
|     api | ||||
|     faq | ||||
|   | ||||
| @@ -1,8 +1,12 @@ | ||||
|  | ||||
| .. _scanners: | ||||
|  | ||||
| ******************* | ||||
| Scanners & Software | ||||
| ******************* | ||||
|  | ||||
| Paperless-ngx is compatible with many different scanners and scanning tools. A user-maintained list of scanners and other software is available on `the wiki <https://github.com/paperless-ngx/paperless-ngx/wiki/Scanner-&-Software-Recommendations>`_. | ||||
|  | ||||
| .. cssclass:: redirect-notice | ||||
|  | ||||
|     The Paperless-ngx documentation has permanently moved. | ||||
|  | ||||
|     You will be redirected shortly... | ||||
|   | ||||
| @@ -4,60 +4,9 @@ | ||||
| Screenshots | ||||
| *********** | ||||
|  | ||||
| This is what Paperless-ngx looks like. | ||||
|  | ||||
| The dashboard shows customizable views on your document and allows document uploads: | ||||
| .. cssclass:: redirect-notice | ||||
|  | ||||
| .. image:: _static/screenshots/dashboard.png | ||||
|     :target: _static/screenshots/dashboard.png | ||||
|     The Paperless-ngx documentation has permanently moved. | ||||
|  | ||||
| The document list provides three different styles to scroll through your documents: | ||||
|  | ||||
| .. image:: _static/screenshots/documents-table.png | ||||
|     :target: _static/screenshots/documents-table.png | ||||
| .. image:: _static/screenshots/documents-smallcards.png | ||||
|     :target: _static/screenshots/documents-smallcards.png | ||||
| .. image:: _static/screenshots/documents-largecards.png | ||||
|     :target: _static/screenshots/documents-largecards.png | ||||
|  | ||||
| Paperless-ngx also supports "dark mode": | ||||
|  | ||||
| .. image:: _static/screenshots/documents-smallcards-dark.png | ||||
|     :target: _static/screenshots/documents-smallcards-dark.png | ||||
|  | ||||
| Extensive filtering mechanisms: | ||||
|  | ||||
| .. image:: _static/screenshots/documents-filter.png | ||||
|     :target: _static/screenshots/documents-filter.png | ||||
|  | ||||
| Bulk editing of document tags, correspondents, etc.: | ||||
|  | ||||
| .. image:: _static/screenshots/bulk-edit.png | ||||
|     :target: _static/screenshots/bulk-edit.png | ||||
|  | ||||
| Side-by-side editing of documents: | ||||
|  | ||||
| .. image:: _static/screenshots/editing.png | ||||
|     :target: _static/screenshots/editing.png | ||||
|  | ||||
| Tag editing. This looks about the same for correspondents and document types. | ||||
|  | ||||
| .. image:: _static/screenshots/new-tag.png | ||||
|     :target: _static/screenshots/new-tag.png | ||||
|  | ||||
| Searching provides auto complete and highlights the results. | ||||
|  | ||||
| .. image:: _static/screenshots/search-preview.png | ||||
|     :target: _static/screenshots/search-preview.png | ||||
| .. image:: _static/screenshots/search-results.png | ||||
|     :target: _static/screenshots/search-results.png | ||||
|  | ||||
| Fancy mail filters! | ||||
|  | ||||
| .. image:: _static/screenshots/mail-rules-edited.png | ||||
|     :target: _static/screenshots/mail-rules-edited.png | ||||
|  | ||||
| Mobile devices are supported. | ||||
|  | ||||
| .. image:: _static/screenshots/mobile.png | ||||
|     :target: _static/screenshots/mobile.png | ||||
|     You will be redirected shortly... | ||||
|   | ||||
							
								
								
									
										890
									
								
								docs/setup.rst
									
									
									
									
									
								
							
							
						
						
									
										890
									
								
								docs/setup.rst
									
									
									
									
									
								
							| @@ -1,894 +1,12 @@ | ||||
| .. _setup: | ||||
|  | ||||
| ***** | ||||
| Setup | ||||
| ***** | ||||
|  | ||||
| Overview of Paperless-ngx | ||||
| ######################### | ||||
|  | ||||
| Compared to paperless, paperless-ngx works a little different under the hood and has | ||||
| more moving parts that work together. While this increases the complexity of | ||||
| the system, it also brings many benefits. | ||||
| .. cssclass:: redirect-notice | ||||
|  | ||||
| Paperless consists of the following components: | ||||
|     The Paperless-ngx documentation has permanently moved. | ||||
|  | ||||
| *   **The webserver:** This is pretty much the same as in paperless. It serves | ||||
|     the administration pages, the API, and the new frontend. This is the main | ||||
|     tool you'll be using to interact with paperless. You may start the webserver | ||||
|     with | ||||
|  | ||||
|     .. code:: shell-session | ||||
|  | ||||
|         $ cd /path/to/paperless/src/ | ||||
|         $ gunicorn -c ../gunicorn.conf.py paperless.wsgi | ||||
|  | ||||
|     or by any other means such as Apache ``mod_wsgi``. | ||||
|  | ||||
| *   **The consumer:** This is what watches your consumption folder for documents. | ||||
|     However, the consumer itself does not really consume your documents. | ||||
|     Now it notifies a task processor that a new file is ready for consumption. | ||||
|     I suppose it should be named differently. | ||||
|     This was also used to check your emails, but that's now done elsewhere as well. | ||||
|  | ||||
|     Start the consumer with the management command ``document_consumer``: | ||||
|  | ||||
|     .. code:: shell-session | ||||
|  | ||||
|         $ cd /path/to/paperless/src/ | ||||
|         $ python3 manage.py document_consumer | ||||
|  | ||||
|     .. _setup-task_processor: | ||||
|  | ||||
| *   **The task processor:** Paperless relies on `Celery - Distributed Task Queue <https://docs.celeryq.dev/en/stable/index.html>`_ | ||||
|     for doing most of the heavy lifting. This is a task queue that accepts tasks from | ||||
|     multiple sources and processes these in parallel. It also comes with a scheduler that executes | ||||
|     certain commands periodically. | ||||
|  | ||||
|     This task processor is responsible for: | ||||
|  | ||||
|     *   Consuming documents. When the consumer finds new documents, it notifies the task processor to | ||||
|         start a consumption task. | ||||
|     *   The task processor also performs the consumption of any documents you upload through | ||||
|         the web interface. | ||||
|     *   Consuming emails. It periodically checks your configured accounts for new emails and | ||||
|         notifies the task processor to consume the attachment of an email. | ||||
|     *   Maintaining the search index and the automatic matching algorithm. These are things that paperless | ||||
|         needs to do from time to time in order to operate properly. | ||||
|  | ||||
|     This allows paperless to process multiple documents from your consumption folder in parallel! On | ||||
|     a modern multi core system, this makes the consumption process with full OCR blazingly fast. | ||||
|  | ||||
|     The task processor comes with a built-in admin interface that you can use to check whenever any of the | ||||
|     tasks fail and inspect the errors (i.e., wrong email credentials, errors during consuming a specific | ||||
|     file, etc). | ||||
|  | ||||
| *   A `redis <https://redis.io/>`_ message broker: This is a really lightweight service that is responsible | ||||
|     for getting the tasks from the webserver and the consumer to the task scheduler. These run in a different | ||||
|     process (maybe even on different machines!), and therefore, this is necessary. | ||||
|  | ||||
| *   Optional: A database server. Paperless supports PostgreSQL, MariaDB and SQLite for storing its data. | ||||
|  | ||||
|  | ||||
| Installation | ||||
| ############ | ||||
|  | ||||
| You can go multiple routes to setup and run Paperless: | ||||
|  | ||||
| * :ref:`Use the easy install docker script <setup-docker_script>` | ||||
| * :ref:`Pull the image from Docker Hub <setup-docker_hub>` | ||||
| * :ref:`Build the Docker image yourself <setup-docker_build>` | ||||
| * :ref:`Install Paperless directly on your system manually (bare metal) <setup-bare_metal>` | ||||
|  | ||||
| The Docker routes are quick & easy. These are the recommended routes. This configures all the stuff | ||||
| from the above automatically so that it just works and uses sensible defaults for all configuration options. | ||||
| Here you find a cheat-sheet for docker beginners: `CLI Basics <https://www.sehn.tech/refs/devops-with-docker/>`_ | ||||
|  | ||||
| The bare metal route is complicated to setup but makes it easier | ||||
| should you want to contribute some code back. You need to configure and | ||||
| run the above mentioned components yourself. | ||||
|  | ||||
| .. _CLI Basics: https://www.sehn.tech/refs/devops-with-docker/ | ||||
|  | ||||
| .. _setup-docker_script: | ||||
|  | ||||
| Install Paperless from Docker Hub using the installation script | ||||
| =============================================================== | ||||
|  | ||||
| Paperless provides an interactive installation script. This script will ask you | ||||
| for a couple configuration options, download and create the necessary configuration files, pull the docker image, start paperless and create your user account. This script essentially | ||||
| performs all the steps described in :ref:`setup-docker_hub` automatically. | ||||
|  | ||||
| 1.  Make sure that docker and docker-compose are installed. | ||||
| 2.  Download and run the installation script: | ||||
|  | ||||
|     .. code:: shell-session | ||||
|  | ||||
|         $ bash -c "$(curl -L https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/install-paperless-ngx.sh)" | ||||
|  | ||||
| .. _setup-docker_hub: | ||||
|  | ||||
| Install Paperless from Docker Hub | ||||
| ================================= | ||||
|  | ||||
| 1.  Login with your user and create a folder in your home-directory `mkdir -v ~/paperless-ngx` to have a place for your configuration files and consumption directory. | ||||
|  | ||||
| 2.  Go to the `/docker/compose directory on the project page <https://github.com/paperless-ngx/paperless-ngx/tree/master/docker/compose>`_ | ||||
|     and download one of the `docker-compose.*.yml` files, depending on which database backend you | ||||
|     want to use. Rename this file to `docker-compose.yml`. | ||||
|     If you want to enable optional support for Office documents, download a file with `-tika` in the file name. | ||||
|     Download the ``docker-compose.env`` file and the ``.env`` file as well and store them | ||||
|     in the same directory. | ||||
|  | ||||
|     .. hint:: | ||||
|  | ||||
|         For new installations, it is recommended to use PostgreSQL as the database | ||||
|         backend. | ||||
|  | ||||
| 3.  Install `Docker`_ and `docker-compose`_. | ||||
|  | ||||
|     .. caution:: | ||||
|  | ||||
|         If you want to use the included ``docker-compose.*.yml`` file, you | ||||
|         need to have at least Docker version **17.09.0** and docker-compose | ||||
|         version **1.17.0**. | ||||
|         To check do: `docker-compose -v` or `docker -v` | ||||
|  | ||||
|         See the `Docker installation guide`_ on how to install the current | ||||
|         version of Docker for your operating system or Linux distribution of | ||||
|         choice. To get the latest version of docker-compose, follow the | ||||
|         `docker-compose installation guide`_ if your package repository doesn't | ||||
|         include it. | ||||
|  | ||||
|         .. _Docker installation guide: https://docs.docker.com/engine/installation/ | ||||
|         .. _docker-compose installation guide: https://docs.docker.com/compose/install/ | ||||
|  | ||||
| 4.  Modify ``docker-compose.yml`` to your preferences. You may want to change the path | ||||
|     to the consumption directory. Find the line that specifies where | ||||
|     to mount the consumption directory: | ||||
|  | ||||
|     .. code:: | ||||
|  | ||||
|         - ./consume:/usr/src/paperless/consume | ||||
|  | ||||
|     Replace the part BEFORE the colon with a local directory of your choice: | ||||
|  | ||||
|     .. code:: | ||||
|  | ||||
|         - /home/jonaswinkler/paperless-inbox:/usr/src/paperless/consume | ||||
|  | ||||
|     Don't change the part after the colon or paperless wont find your documents. | ||||
|  | ||||
|     You may also need to change the default port that the webserver will use | ||||
|     from the default (8000): | ||||
|  | ||||
|      .. code:: | ||||
|  | ||||
|         ports: | ||||
|           - 8000:8000 | ||||
|  | ||||
|     Replace the part BEFORE the colon with a port of your choice: | ||||
|  | ||||
|      .. code:: | ||||
|  | ||||
|         ports: | ||||
|           - 8010:8000 | ||||
|  | ||||
|     Don't change the part after the colon or edit other lines that refer to | ||||
|     port 8000. Modifying the part before the colon will map requests on another | ||||
|     port to the webserver running on the default port. | ||||
|  | ||||
|     **Rootless** | ||||
|  | ||||
|     If you want to run Paperless as a rootless container, you will need to do the | ||||
|     following in your ``docker-compose.yml``: | ||||
|  | ||||
|     - set the ``user`` running the container to map to the ``paperless`` user in the | ||||
|       container. | ||||
|       This value (``user_id`` below), should be the same id that ``USERMAP_UID`` and | ||||
|       ``USERMAP_GID`` are set to in the next step. | ||||
|       See ``USERMAP_UID`` and ``USERMAP_GID`` :ref:`here <configuration-docker>`. | ||||
|  | ||||
|     Your entry for Paperless should contain something like: | ||||
|  | ||||
|      .. code:: | ||||
|  | ||||
|         webserver: | ||||
|           image: ghcr.io/paperless-ngx/paperless-ngx:latest | ||||
|           user: <user_id> | ||||
|  | ||||
| 5.  Modify ``docker-compose.env``, following the comments in the file. The | ||||
|     most important change is to set ``USERMAP_UID`` and ``USERMAP_GID`` | ||||
|     to the uid and gid of your user on the host system. Use ``id -u`` and | ||||
|     ``id -g`` to get these. | ||||
|  | ||||
|     This ensures that | ||||
|     both the docker container and you on the host machine have write access | ||||
|     to the consumption directory. If your UID and GID on the host system is | ||||
|     1000 (the default for the first normal user on most systems), it will | ||||
|     work out of the box without any modifications. `id "username"` to check. | ||||
|  | ||||
|     .. note:: | ||||
|  | ||||
|         You can copy any setting from the file ``paperless.conf.example`` and paste it here. | ||||
|         Have a look at :ref:`configuration` to see what's available. | ||||
|  | ||||
|     .. note:: | ||||
|  | ||||
|         You can utilize Docker secrets for some configuration settings by | ||||
|         appending `_FILE` to some configuration values.  This is supported currently | ||||
|         only by: | ||||
|  | ||||
|           * PAPERLESS_DBUSER | ||||
|           * PAPERLESS_DBPASS | ||||
|           * PAPERLESS_SECRET_KEY | ||||
|           * PAPERLESS_AUTO_LOGIN_USERNAME | ||||
|           * PAPERLESS_ADMIN_USER | ||||
|           * PAPERLESS_ADMIN_MAIL | ||||
|           * PAPERLESS_ADMIN_PASSWORD | ||||
|  | ||||
|     .. caution:: | ||||
|  | ||||
|         Some file systems such as NFS network shares don't support file system | ||||
|         notifications with ``inotify``. When storing the consumption directory | ||||
|         on such a file system, paperless will not pick up new files | ||||
|         with the default configuration. You will need to use ``PAPERLESS_CONSUMER_POLLING``, | ||||
|         which will disable inotify. See :ref:`here <configuration-polling>`. | ||||
|  | ||||
| 6.  Run ``docker-compose pull``, followed by ``docker-compose up -d``. | ||||
|     This will pull the image, create and start the necessary containers. | ||||
|  | ||||
| 7.  To be able to login, you will need a super user. To create it, execute the | ||||
|     following command: | ||||
|  | ||||
|     .. code-block:: shell-session | ||||
|  | ||||
|         $ docker-compose run --rm webserver createsuperuser | ||||
|  | ||||
|     This will prompt you to set a username, an optional e-mail address and | ||||
|     finally a password (at least 8 characters). | ||||
|  | ||||
| 8.  The default ``docker-compose.yml`` exports the webserver on your local port | ||||
|     8000. If you did not change this, you should now be able to visit your | ||||
|     Paperless instance at ``http://127.0.0.1:8000`` or your servers IP-Address:8000. | ||||
|     Use the login credentials you have created with the previous step. | ||||
|  | ||||
| .. _Docker: https://www.docker.com/ | ||||
| .. _docker-compose: https://docs.docker.com/compose/install/ | ||||
|  | ||||
| .. _setup-docker_build: | ||||
|  | ||||
| Build the Docker image yourself | ||||
| =============================== | ||||
|  | ||||
| 1.  Clone the entire repository of paperless: | ||||
|  | ||||
|     .. code:: shell-session | ||||
|  | ||||
|         git clone https://github.com/paperless-ngx/paperless-ngx | ||||
|  | ||||
|     The master branch always reflects the latest stable version. | ||||
|  | ||||
| 2.  Copy one of the ``docker/compose/docker-compose.*.yml`` to ``docker-compose.yml`` in the root folder, | ||||
|     depending on which database backend you want to use. Copy | ||||
|     ``docker-compose.env`` into the project root as well. | ||||
|  | ||||
| 3.  In the ``docker-compose.yml`` file, find the line that instructs docker-compose to pull the paperless image from Docker Hub: | ||||
|  | ||||
|     .. code:: yaml | ||||
|  | ||||
|         webserver: | ||||
|             image: ghcr.io/paperless-ngx/paperless-ngx:latest | ||||
|  | ||||
|     and replace it with a line that instructs docker-compose to build the image from the current working directory instead: | ||||
|  | ||||
|     .. code:: yaml | ||||
|  | ||||
|         webserver: | ||||
|             build: | ||||
|               context: . | ||||
|               args: | ||||
|                 QPDF_VERSION: x.y.x | ||||
|                 PIKEPDF_VERSION: x.y.z | ||||
|                 PSYCOPG2_VERSION: x.y.z | ||||
|                 JBIG2ENC_VERSION: 0.29 | ||||
|  | ||||
|     .. note:: | ||||
|  | ||||
|         You should match the build argument versions to the version for the release you have | ||||
|         checked out.  These are pre-built images with certain, more updated software. | ||||
|         If you want to build these images your self, that is possible, but beyond | ||||
|         the scope of these steps. | ||||
|  | ||||
| 4.  Follow steps 3 to 8 of :ref:`setup-docker_hub`. When asked to run | ||||
|     ``docker-compose pull`` to pull the image, do | ||||
|  | ||||
|     .. code:: shell-session | ||||
|  | ||||
|         $ docker-compose build | ||||
|  | ||||
|     instead to build the image. | ||||
|  | ||||
| .. _setup-bare_metal: | ||||
|  | ||||
| Bare Metal Route | ||||
| ================ | ||||
|  | ||||
| Paperless runs on linux only. The following procedure has been tested on a minimal | ||||
| installation of Debian/Buster, which is the current stable release at the time of | ||||
| writing. Windows is not and will never be supported. | ||||
|  | ||||
| 1.  Install dependencies. Paperless requires the following packages. | ||||
|  | ||||
|     *   ``python3`` 3.8, 3.9 | ||||
|     *   ``python3-pip`` | ||||
|     *   ``python3-dev`` | ||||
|  | ||||
|     *   ``default-libmysqlclient-dev`` for MariaDB | ||||
|     *   ``fonts-liberation`` for generating thumbnails for plain text files | ||||
|     *   ``imagemagick`` >= 6 for PDF conversion | ||||
|     *   ``gnupg`` for handling encrypted documents | ||||
|     *   ``libpq-dev`` for PostgreSQL | ||||
|     *   ``libmagic-dev`` for mime type detection | ||||
|     *   ``mariadb-client`` for MariaDB compile time | ||||
|     *   ``mime-support`` for mime type detection | ||||
|     *   ``libzbar0`` for barcode detection | ||||
|     *   ``poppler-utils`` for barcode detection | ||||
|  | ||||
|     Use this list for your preferred package management: | ||||
|  | ||||
|     .. code:: | ||||
|  | ||||
|         python3 python3-pip python3-dev imagemagick fonts-liberation gnupg libpq-dev default-libmysqlclient-dev libmagic-dev mime-support libzbar0 poppler-utils | ||||
|  | ||||
|     These dependencies are required for OCRmyPDF, which is used for text recognition. | ||||
|  | ||||
|     *   ``unpaper`` | ||||
|     *   ``ghostscript`` | ||||
|     *   ``icc-profiles-free`` | ||||
|     *   ``qpdf`` | ||||
|     *   ``liblept5`` | ||||
|     *   ``libxml2`` | ||||
|     *   ``pngquant`` (suggested for certain PDF image optimizations) | ||||
|     *   ``zlib1g`` | ||||
|     *   ``tesseract-ocr`` >= 4.0.0 for OCR | ||||
|     *   ``tesseract-ocr`` language packs (``tesseract-ocr-eng``, ``tesseract-ocr-deu``, etc) | ||||
|  | ||||
|     Use this list for your preferred package management: | ||||
|  | ||||
|     .. code:: | ||||
|  | ||||
|         unpaper ghostscript icc-profiles-free qpdf liblept5 libxml2 pngquant zlib1g tesseract-ocr | ||||
|  | ||||
|     On Raspberry Pi, these libraries are required as well: | ||||
|  | ||||
|     *   ``libatlas-base-dev`` | ||||
|     *   ``libxslt1-dev`` | ||||
|  | ||||
|     You will also need ``build-essential``, ``python3-setuptools`` and ``python3-wheel`` | ||||
|     for installing some of the python dependencies. | ||||
|  | ||||
| 2.  Install ``redis`` >= 6.0 and configure it to start automatically. | ||||
|  | ||||
| 3.  Optional. Install ``postgresql`` and configure a database, user and password for paperless. If you do not wish | ||||
|     to use PostgreSQL, MariaDB and SQLite are available as well. | ||||
|  | ||||
|     .. note:: | ||||
|  | ||||
|         On bare-metal installations using SQLite, ensure the | ||||
|         `JSON1 extension <https://code.djangoproject.com/wiki/JSON1Extension>`_ is enabled. This is | ||||
|         usually the case, but not always. | ||||
|  | ||||
| 4.  Get the release archive from `<https://github.com/paperless-ngx/paperless-ngx/releases>`_. | ||||
|     If you clone the git repo as it is, you also have to compile the front end by yourself. | ||||
|     Extract the archive to a place from where you wish to execute it, such as ``/opt/paperless``. | ||||
|  | ||||
| 5.  Configure paperless. See :ref:`configuration` for details. Edit the included ``paperless.conf`` and adjust the | ||||
|     settings to your needs. Required settings for getting paperless running are: | ||||
|  | ||||
|     *   ``PAPERLESS_REDIS`` should point to your redis server, such as redis://localhost:6379. | ||||
|     *   ``PAPERLESS_DBENGINE`` optional, and should be one of `postgres, mariadb, or sqlite` | ||||
|     *   ``PAPERLESS_DBHOST`` should be the hostname on which your PostgreSQL server is running. Do not configure this | ||||
|         to use SQLite instead. Also configure port, database name, user and password as necessary. | ||||
|     *   ``PAPERLESS_CONSUMPTION_DIR`` should point to a folder which paperless should watch for documents. You might | ||||
|         want to have this somewhere else. Likewise, ``PAPERLESS_DATA_DIR`` and ``PAPERLESS_MEDIA_ROOT`` define where | ||||
|         paperless stores its data. If you like, you can point both to the same directory. | ||||
|     *   ``PAPERLESS_SECRET_KEY`` should be a random sequence of characters. It's used for authentication. Failure | ||||
|         to do so allows third parties to forge authentication credentials. | ||||
|     *   ``PAPERLESS_URL`` if you are behind a reverse proxy. This should point to your domain. Please see | ||||
|         :ref:`configuration` for more information. | ||||
|  | ||||
|     Many more adjustments can be made to paperless, especially the OCR part. The following options are recommended | ||||
|     for everyone: | ||||
|  | ||||
|     *   Set ``PAPERLESS_OCR_LANGUAGE`` to the language most of your documents are written in. | ||||
|     *   Set ``PAPERLESS_TIME_ZONE`` to your local time zone. | ||||
|  | ||||
| 6.  Create a system user under which you wish to run paperless. | ||||
|  | ||||
|     .. code:: shell-session | ||||
|  | ||||
|         adduser paperless --system --home /opt/paperless --group | ||||
|  | ||||
| 7.  Ensure that these directories exist | ||||
|     and that the paperless user has write permissions to the following directories: | ||||
|  | ||||
|     *   ``/opt/paperless/media`` | ||||
|     *   ``/opt/paperless/data`` | ||||
|     *   ``/opt/paperless/consume`` | ||||
|  | ||||
|     Adjust as necessary if you configured different folders. | ||||
|  | ||||
| 8.  Install python requirements from the ``requirements.txt`` file. | ||||
|     It is up to you if you wish to use a virtual environment or not. First you should update your pip, so it gets the actual packages. | ||||
|  | ||||
|     .. code:: shell-session | ||||
|  | ||||
|         sudo -Hu paperless pip3 install --upgrade pip | ||||
|  | ||||
|     .. code:: shell-session | ||||
|  | ||||
|         sudo -Hu paperless pip3 install -r requirements.txt | ||||
|  | ||||
|     This will install all python dependencies in the home directory of | ||||
|     the new paperless user. | ||||
|  | ||||
| 9.  Go to ``/opt/paperless/src``, and execute the following commands: | ||||
|  | ||||
|     .. code:: bash | ||||
|  | ||||
|         # This creates the database schema. | ||||
|         sudo -Hu paperless python3 manage.py migrate | ||||
|  | ||||
|         # This creates your first paperless user | ||||
|         sudo -Hu paperless python3 manage.py createsuperuser | ||||
|  | ||||
| 10. Optional: Test that paperless is working by executing | ||||
|  | ||||
|       .. code:: bash | ||||
|  | ||||
|         # This collects static files from paperless and django. | ||||
|         sudo -Hu paperless python3 manage.py runserver | ||||
|  | ||||
|     and pointing your browser to http://localhost:8000/. | ||||
|  | ||||
|     .. warning:: | ||||
|  | ||||
|         This is a development server which should not be used in | ||||
|         production. It is not audited for security and performance | ||||
|         is inferior to production ready web servers. | ||||
|  | ||||
|     .. hint:: | ||||
|  | ||||
|         This will not start the consumer. Paperless does this in a | ||||
|         separate process. | ||||
|  | ||||
| 11. Setup systemd services to run paperless automatically. You may | ||||
|     use the service definition files included in the ``scripts`` folder | ||||
|     as a starting point. | ||||
|  | ||||
|     Paperless needs the ``webserver`` script to run the webserver, the | ||||
|     ``consumer`` script to watch the input folder, ``taskqueue`` for the background workers | ||||
|     used to handle things like document consumption and the ``scheduler`` script to run tasks such as | ||||
|     email checking at certain times . | ||||
|  | ||||
| 		The ``socket`` script enables ``gunicorn`` to run on port 80 without | ||||
| 		root privileges. For this you need to uncomment the ``Require=paperless-webserver.socket`` | ||||
| 		in the ``webserver`` script and configure ``gunicorn`` to listen on port 80 (see ``paperless/gunicorn.conf.py``). | ||||
|  | ||||
|     You may need to adjust the path to the ``gunicorn`` executable. This | ||||
|     will be installed as part of the python dependencies, and is either located | ||||
|     in the ``bin`` folder of your virtual environment, or in ``~/.local/bin/`` if | ||||
|     no virtual environment is used. | ||||
|  | ||||
|     These services rely on redis and optionally the database server, but | ||||
|     don't need to be started in any particular order. The example files | ||||
|     depend on redis being started. If you use a database server, you should | ||||
|     add additional dependencies. | ||||
|  | ||||
|     .. caution:: | ||||
|  | ||||
|         The included scripts run a ``gunicorn`` standalone server, | ||||
|         which is fine for running paperless. It does support SSL, | ||||
|         however, the documentation of GUnicorn states that you should | ||||
|         use a proxy server in front of gunicorn instead. | ||||
|  | ||||
|         For instructions on how to use nginx for that, | ||||
|         :ref:`see the instructions below <setup-nginx>`. | ||||
|  | ||||
| 12. Optional: Install a samba server and make the consumption folder | ||||
|     available as a network share. | ||||
|  | ||||
| 13. Configure ImageMagick to allow processing of PDF documents. Most distributions have | ||||
|     this disabled by default, since PDF documents can contain malware. If | ||||
|     you don't do this, paperless will fall back to ghostscript for certain steps | ||||
|     such as thumbnail generation. | ||||
|  | ||||
|     Edit ``/etc/ImageMagick-6/policy.xml`` and adjust | ||||
|  | ||||
|     .. code:: | ||||
|  | ||||
|         <policy domain="coder" rights="none" pattern="PDF" /> | ||||
|  | ||||
|     to | ||||
|  | ||||
|     .. code:: | ||||
|  | ||||
|         <policy domain="coder" rights="read|write" pattern="PDF" /> | ||||
|  | ||||
| 14. Optional: Install the `jbig2enc <https://ocrmypdf.readthedocs.io/en/latest/jbig2.html>`_ | ||||
|     encoder. This will reduce the size of generated PDF documents. You'll most likely need | ||||
|     to compile this by yourself, because this software has been patented until around 2017 and | ||||
|     binary packages are not available for most distributions. | ||||
|  | ||||
| 15. Optional: If using the NLTK machine learning processing (see ``PAPERLESS_ENABLE_NLTK`` in | ||||
|     :ref:`configuration` for details), download the NLTK data for the Snowball Stemmer, Stopwords | ||||
|     and Punkt tokenizer to your ``PAPERLESS_DATA_DIR/nltk``.  Refer to | ||||
|     the `NLTK instructions <https://www.nltk.org/data.html>`_ for details on how to | ||||
|     download the data. | ||||
|  | ||||
|  | ||||
| Migrating to Paperless-ngx | ||||
| ########################## | ||||
|  | ||||
| Migration is possible both from Paperless-ng or directly from the 'original' Paperless. | ||||
|  | ||||
| Migrating from Paperless-ng | ||||
| =========================== | ||||
|  | ||||
| Paperless-ngx is meant to be a drop-in replacement for Paperless-ng and thus upgrading should be | ||||
| trivial for most users, especially when using docker. However, as with any major change, it is | ||||
| recommended to take a full backup first. Once you are ready, simply change the docker image to | ||||
| point to the new source. E.g. if using Docker Compose, edit ``docker-compose.yml`` and change: | ||||
|  | ||||
| .. code:: | ||||
|  | ||||
|   image: jonaswinkler/paperless-ng:latest | ||||
|  | ||||
| to | ||||
|  | ||||
| .. code:: | ||||
|  | ||||
|   image: ghcr.io/paperless-ngx/paperless-ngx:latest | ||||
|  | ||||
| and then run ``docker-compose up -d`` which will pull the new image recreate the container. | ||||
| That's it! | ||||
|  | ||||
| Users who installed with the bare-metal route should also update their Git clone to point to | ||||
| ``https://github.com/paperless-ngx/paperless-ngx``, e.g. using the command | ||||
| ``git remote set-url origin https://github.com/paperless-ngx/paperless-ngx`` and then pull the | ||||
| lastest version. | ||||
|  | ||||
| Migrating from Paperless | ||||
| ======================== | ||||
|  | ||||
| At its core, paperless-ngx is still paperless and fully compatible. However, some | ||||
| things have changed under the hood, so you need to adapt your setup depending on | ||||
| how you installed paperless. | ||||
|  | ||||
| This setup describes how to update an existing paperless Docker installation. | ||||
| The important things to keep in mind are as follows: | ||||
|  | ||||
| * Read the :doc:`changelog </changelog>` and take note of breaking changes. | ||||
| * You should decide if you want to stick with SQLite or want to migrate your database | ||||
|   to PostgreSQL. See :ref:`setup-sqlite_to_psql` for details on how to move your data from | ||||
|   SQLite to PostgreSQL. Both work fine with paperless. However, if you already have a | ||||
|   database server running for other services, you might as well use it for paperless as well. | ||||
| * The task scheduler of paperless, which is used to execute periodic tasks | ||||
|   such as email checking and maintenance, requires a `redis`_ message broker | ||||
|   instance. The docker-compose route takes care of that. | ||||
| * The layout of the folder structure for your documents and data remains the | ||||
|   same, so you can just plug your old docker volumes into paperless-ngx and | ||||
|   expect it to find everything where it should be. | ||||
|  | ||||
| Migration to paperless-ngx is then performed in a few simple steps: | ||||
|  | ||||
| 1.  Stop paperless. | ||||
|  | ||||
|     .. code:: bash | ||||
|  | ||||
|         $ cd /path/to/current/paperless | ||||
|         $ docker-compose down | ||||
|  | ||||
| 2.  Do a backup for two purposes: If something goes wrong, you still have your | ||||
|     data. Second, if you don't like paperless-ngx, you can switch back to | ||||
|     paperless. | ||||
|  | ||||
| 3.  Download the latest release of paperless-ngx. You can either go with the | ||||
|     docker-compose files from `here <https://github.com/paperless-ngx/paperless-ngx/tree/master/docker/compose>`__ | ||||
|     or clone the repository to build the image yourself (see :ref:`above <setup-docker_build>`). | ||||
|     You can either replace your current paperless folder or put paperless-ngx | ||||
|     in a different location. | ||||
|  | ||||
|     .. caution:: | ||||
|  | ||||
|         Paperless-ngx includes a ``.env`` file. This will set the | ||||
|         project name for docker compose to ``paperless``, which will also define the name | ||||
|         of the volumes by paperless-ngx. However, if you experience that paperless-ngx | ||||
|         is not using your old paperless volumes, verify the names of your volumes with | ||||
|  | ||||
|         .. code:: shell-session | ||||
|  | ||||
|             $ docker volume ls | grep _data | ||||
|  | ||||
|         and adjust the project name in the ``.env`` file so that it matches the name | ||||
|         of the volumes before the ``_data`` part. | ||||
|  | ||||
|  | ||||
| 4.  Download the ``docker-compose.sqlite.yml`` file to ``docker-compose.yml``. | ||||
|     If you want to switch to PostgreSQL, do that after you migrated your existing | ||||
|     SQLite database. | ||||
|  | ||||
| 5.  Adjust ``docker-compose.yml`` and ``docker-compose.env`` to your needs. | ||||
|     See :ref:`setup-docker_hub` for details on which edits are advised. | ||||
|  | ||||
| 6.  :ref:`Update paperless. <administration-updating>` | ||||
|  | ||||
| 7.  In order to find your existing documents with the new search feature, you need | ||||
|     to invoke a one-time operation that will create the search index: | ||||
|  | ||||
|     .. code:: shell-session | ||||
|  | ||||
|         $ docker-compose run --rm webserver document_index reindex | ||||
|  | ||||
|     This will migrate your database and create the search index. After that, | ||||
|     paperless will take care of maintaining the index by itself. | ||||
|  | ||||
| 8.  Start paperless-ngx. | ||||
|  | ||||
|     .. code:: bash | ||||
|  | ||||
|         $ docker-compose up -d | ||||
|  | ||||
|     This will run paperless in the background and automatically start it on system boot. | ||||
|  | ||||
| 9.  Paperless installed a permanent redirect to ``admin/`` in your browser. This | ||||
|     redirect is still in place and prevents access to the new UI. Clear your | ||||
|     browsing cache in order to fix this. | ||||
|  | ||||
| 10.  Optionally, follow the instructions below to migrate your existing data to PostgreSQL. | ||||
|  | ||||
|  | ||||
| Migrating from LinuxServer.io Docker Image | ||||
| ========================================== | ||||
|  | ||||
| As with any upgrades and large changes, it is highly recommended to create a backup before | ||||
| starting.  This assumes the image was running using Docker Compose, but the instructions | ||||
| are translatable to Docker commands as well. | ||||
|  | ||||
| 1.  Stop and remove the paperless container | ||||
| 2.  If using an external database, stop the container | ||||
| 3.  Update Redis configuration | ||||
|  | ||||
|     a)  If ``REDIS_URL`` is already set, change it to ``PAPERLESS_REDIS`` and continue | ||||
|         to step 4. | ||||
|     b)  Otherwise, in the ``docker-compose.yml`` add a new service for Redis, | ||||
|         following `the example compose files <https://github.com/paperless-ngx/paperless-ngx/tree/main/docker/compose>`_ | ||||
|     c)  Set the environment variable ``PAPERLESS_REDIS`` so it points to the new Redis container | ||||
|  | ||||
| 4.  Update user mapping | ||||
|  | ||||
|     a)  If set, change the environment variable ``PUID`` to ``USERMAP_UID`` | ||||
|     b)  If set, change the environment variable ``PGID`` to ``USERMAP_GID`` | ||||
|  | ||||
| 5.  Update configuration paths | ||||
|  | ||||
|     a) Set the environment variable ``PAPERLESS_DATA_DIR`` | ||||
|        to ``/config`` | ||||
|  | ||||
| 6.  Update media paths | ||||
|  | ||||
|     a) Set the environment variable ``PAPERLESS_MEDIA_ROOT`` | ||||
|        to ``/data/media`` | ||||
|  | ||||
| 7.  Update timezone | ||||
|  | ||||
|     a) Set the environment variable ``PAPERLESS_TIME_ZONE`` | ||||
|        to the same value as ``TZ`` | ||||
|  | ||||
| 8.  Modify the ``image:`` to point to ``ghcr.io/paperless-ngx/paperless-ngx:latest`` or | ||||
|     a specific version if preferred. | ||||
|  | ||||
| 9.  Start the containers as before, using ``docker-compose``. | ||||
|  | ||||
| .. _setup-sqlite_to_psql: | ||||
|  | ||||
| Moving data from SQLite to PostgreSQL or MySQL/MariaDB | ||||
| ====================================================== | ||||
|  | ||||
| Moving your data from SQLite to PostgreSQL or MySQL/MariaDB is done via executing a series of django | ||||
| management commands as below.  The commands below use PostgreSQL, but are applicable to MySQL/MariaDB | ||||
| with the | ||||
|  | ||||
| .. caution:: | ||||
|  | ||||
|     Make sure that your SQLite database is migrated to the latest version. | ||||
|     Starting paperless will make sure that this is the case. If your try to | ||||
|     load data from an old database schema in SQLite into a newer database | ||||
|     schema in PostgreSQL, you will run into trouble. | ||||
|  | ||||
| .. warning:: | ||||
|  | ||||
|     On some database fields, PostgreSQL enforces predefined limits on maximum | ||||
|     length, whereas SQLite does not. The fields in question are the title of documents | ||||
|     (128 characters), names of document types, tags and correspondents (128 characters), | ||||
|     and filenames (1024 characters). If you have data in these fields that surpasses these | ||||
|     limits, migration to PostgreSQL is not possible and will fail with an error. | ||||
|  | ||||
| .. warning:: | ||||
|  | ||||
|     MySQL is case insensitive by default, treating values like "Name" and "NAME" as identical. | ||||
|     See :ref:`advanced-mysql-caveats` for details. | ||||
|  | ||||
|  | ||||
| 1.  Stop paperless, if it is running. | ||||
| 2.  Tell paperless to use PostgreSQL: | ||||
|  | ||||
|     a)  With docker, copy the provided ``docker-compose.postgres.yml`` file to | ||||
|         ``docker-compose.yml``. Remember to adjust the consumption directory, | ||||
|         if necessary. | ||||
|     b)  Without docker, configure the database in your ``paperless.conf`` file. | ||||
|         See :ref:`configuration` for details. | ||||
|  | ||||
| 3.  Open a shell and initialize the database: | ||||
|  | ||||
|     a)  With docker, run the following command to open a shell within the paperless | ||||
|         container: | ||||
|  | ||||
|         .. code:: shell-session | ||||
|  | ||||
|             $ cd /path/to/paperless | ||||
|             $ docker-compose run --rm webserver /bin/bash | ||||
|  | ||||
|         This will launch the container and initialize the PostgreSQL database. | ||||
|  | ||||
|     b)  Without docker, remember to activate any virtual environment, switch to | ||||
|         the ``src`` directory and create the database schema: | ||||
|  | ||||
|         .. code:: shell-session | ||||
|  | ||||
|             $ cd /path/to/paperless/src | ||||
|             $ python3 manage.py migrate | ||||
|  | ||||
|         This will not copy any data yet. | ||||
|  | ||||
| 4.  Dump your data from SQLite: | ||||
|  | ||||
|     .. code:: shell-session | ||||
|  | ||||
|         $ python3 manage.py dumpdata --database=sqlite --exclude=contenttypes --exclude=auth.Permission > data.json | ||||
|  | ||||
| 5.  Load your data into PostgreSQL: | ||||
|  | ||||
|     .. code:: shell-session | ||||
|  | ||||
|         $ python3 manage.py loaddata data.json | ||||
|  | ||||
| 6.  If operating inside Docker, you may exit the shell now. | ||||
|  | ||||
|     .. code:: shell-session | ||||
|  | ||||
|         $ exit | ||||
|  | ||||
| 7.  Start paperless. | ||||
|  | ||||
|  | ||||
| Moving back to Paperless | ||||
| ======================== | ||||
|  | ||||
| Lets say you migrated to Paperless-ngx and used it for a while, but decided that | ||||
| you don't like it and want to move back (If you do, send me a mail about what | ||||
| part you didn't like!), you can totally do that with a few simple steps. | ||||
|  | ||||
| Paperless-ngx modified the database schema slightly, however, these changes can | ||||
| be reverted while keeping your current data, so that your current data will | ||||
| be compatible with original Paperless. | ||||
|  | ||||
| Execute this: | ||||
|  | ||||
| .. code:: shell-session | ||||
|  | ||||
|     $ cd /path/to/paperless | ||||
|     $ docker-compose run --rm webserver migrate documents 0023 | ||||
|  | ||||
| Or without docker: | ||||
|  | ||||
| .. code:: shell-session | ||||
|  | ||||
|     $ cd /path/to/paperless/src | ||||
|     $ python3 manage.py migrate documents 0023 | ||||
|  | ||||
| After that, you need to clear your cookies (Paperless-ngx comes with updated | ||||
| dependencies that do cookie-processing differently) and probably your cache | ||||
| as well. | ||||
|  | ||||
| .. _setup-less_powerful_devices: | ||||
|  | ||||
|  | ||||
| Considerations for less powerful devices | ||||
| ######################################## | ||||
|  | ||||
| Paperless runs on Raspberry Pi. However, some things are rather slow on the Pi and | ||||
| configuring some options in paperless can help improve performance immensely: | ||||
|  | ||||
| *   Stick with SQLite to save some resources. | ||||
| *   Consider setting ``PAPERLESS_OCR_PAGES`` to 1, so that paperless will only OCR | ||||
|     the first page of your documents. In most cases, this page contains enough | ||||
|     information to be able to find it. | ||||
| *   ``PAPERLESS_TASK_WORKERS`` and ``PAPERLESS_THREADS_PER_WORKER`` are configured | ||||
|     to use all cores. The Raspberry Pi models 3 and up have 4 cores, meaning that | ||||
|     paperless will use 2 workers and 2 threads per worker. This may result in | ||||
|     sluggish response times during consumption, so you might want to lower these | ||||
|     settings (example: 2 workers and 1 thread to always have some computing power | ||||
|     left for other tasks). | ||||
| *   Keep ``PAPERLESS_OCR_MODE`` at its default value ``skip`` and consider OCR'ing | ||||
|     your documents before feeding them into paperless. Some scanners are able to | ||||
|     do this! You might want to even specify ``skip_noarchive`` to skip archive | ||||
|     file generation for already ocr'ed documents entirely. | ||||
| *   If you want to perform OCR on the device, consider using ``PAPERLESS_OCR_CLEAN=none``. | ||||
|     This will speed up OCR times and use less memory at the expense of slightly worse | ||||
|     OCR results. | ||||
| *   If using docker, consider setting ``PAPERLESS_WEBSERVER_WORKERS`` to | ||||
|     1. This will save some memory. | ||||
| *   Consider setting ``PAPERLESS_ENABLE_NLTK`` to false, to disable the more | ||||
|     advanced language processing, which can take more memory and processing time. | ||||
|  | ||||
| For details, refer to :ref:`configuration`. | ||||
|  | ||||
| .. note:: | ||||
|  | ||||
|     Updating the :ref:`automatic matching algorithm <advanced-automatic_matching>` | ||||
|     takes quite a bit of time. However, the update mechanism checks if your | ||||
|     data has changed before doing the heavy lifting. If you experience the | ||||
|     algorithm taking too much cpu time, consider changing the schedule in the | ||||
|     admin interface to daily. You can also manually invoke the task | ||||
|     by changing the date and time of the next run to today/now. | ||||
|  | ||||
|     The actual matching of the algorithm is fast and works on Raspberry Pi as | ||||
|     well as on any other device. | ||||
|  | ||||
| .. _redis: https://redis.io/ | ||||
|  | ||||
|  | ||||
| .. _setup-nginx: | ||||
|  | ||||
| Using nginx as a reverse proxy | ||||
| ############################## | ||||
|  | ||||
| If you want to expose paperless to the internet, you should hide it behind a | ||||
| reverse proxy with SSL enabled. | ||||
|  | ||||
| In addition to the usual configuration for SSL, | ||||
| the following configuration is required for paperless to operate: | ||||
|  | ||||
| .. code:: nginx | ||||
|  | ||||
|     http { | ||||
|  | ||||
|         # Adjust as required. This is the maximum size for file uploads. | ||||
|         # The default value 1M might be a little too small. | ||||
|         client_max_body_size 10M; | ||||
|  | ||||
|         server { | ||||
|  | ||||
|             location / { | ||||
|  | ||||
|                 # Adjust host and port as required. | ||||
|                 proxy_pass http://localhost:8000/; | ||||
|  | ||||
|                 # These configuration options are required for WebSockets to work. | ||||
|                 proxy_http_version 1.1; | ||||
|                 proxy_set_header Upgrade $http_upgrade; | ||||
|                 proxy_set_header Connection "upgrade"; | ||||
|  | ||||
|                 proxy_redirect off; | ||||
|                 proxy_set_header Host $host; | ||||
|                 proxy_set_header X-Real-IP $remote_addr; | ||||
|                 proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; | ||||
|                 proxy_set_header X-Forwarded-Host $server_name; | ||||
|             } | ||||
|         } | ||||
|     } | ||||
|  | ||||
| The ``PAPERLESS_URL`` configuration variable is also required when using a reverse proxy. Please refer to the :ref:`hosting-and-security` docs. | ||||
|  | ||||
| Also read `this <https://channels.readthedocs.io/en/stable/deploying.html#nginx-supervisor-ubuntu>`__, towards the end of the section. | ||||
|     You will be redirected shortly... | ||||
|   | ||||
| @@ -1,328 +1,12 @@ | ||||
| .. _troubleshooting: | ||||
|  | ||||
| *************** | ||||
| Troubleshooting | ||||
| *************** | ||||
|  | ||||
| No files are added by the consumer | ||||
| ################################## | ||||
|  | ||||
| Check for the following issues: | ||||
| .. cssclass:: redirect-notice | ||||
|  | ||||
| *   Ensure that the directory you're putting your documents in is the folder | ||||
|     paperless is watching. With docker, this setting is performed in the | ||||
|     ``docker-compose.yml`` file. Without docker, look at the ``CONSUMPTION_DIR`` | ||||
|     setting. Don't adjust this setting if you're using docker. | ||||
| *   Ensure that redis is up and running. Paperless does its task processing | ||||
|     asynchronously, and for documents to arrive at the task processor, it needs | ||||
|     redis to run. | ||||
| *   Ensure that the task processor is running. Docker does this automatically. | ||||
|     Manually invoke the task processor by executing | ||||
|     The Paperless-ngx documentation has permanently moved. | ||||
|  | ||||
|     .. code:: shell-session | ||||
|  | ||||
|         $ celery --app paperless worker | ||||
|  | ||||
| *   Look at the output of paperless and inspect it for any errors. | ||||
| *   Go to the admin interface, and check if there are failed tasks. If so, the | ||||
|     tasks will contain an error message. | ||||
|  | ||||
| Consumer warns ``OCR for XX failed`` | ||||
| #################################### | ||||
|  | ||||
| If you find the OCR accuracy to be too low, and/or the document consumer warns | ||||
| that ``OCR for XX failed, but we're going to stick with what we've got since | ||||
| FORGIVING_OCR is enabled``, then you might need to install the | ||||
| `Tesseract language files <http://packages.ubuntu.com/search?keywords=tesseract-ocr>`_ | ||||
| marching your document's languages. | ||||
|  | ||||
| As an example, if you are running Paperless-ngx from any Ubuntu or Debian | ||||
| box, and your documents are written in Spanish you may need to run:: | ||||
|  | ||||
|     apt-get install -y tesseract-ocr-spa | ||||
|  | ||||
| Consumer fails to pickup any new files | ||||
| ###################################### | ||||
|  | ||||
| If you notice that the consumer will only pickup files in the consumption | ||||
| directory at startup, but won't find any other files added later, you will need to | ||||
| enable filesystem polling with the configuration option | ||||
| ``PAPERLESS_CONSUMER_POLLING``, see :ref:`here <configuration-polling>`. | ||||
|  | ||||
| This will disable listening to filesystem changes with inotify and paperless will | ||||
| manually check the consumption directory for changes instead. | ||||
|  | ||||
|  | ||||
| Paperless always redirects to /admin | ||||
| #################################### | ||||
|  | ||||
| You probably had the old paperless installed at some point. Paperless installed | ||||
| a permanent redirect to /admin in your browser, and you need to clear your | ||||
| browsing data / cache to fix that. | ||||
|  | ||||
|  | ||||
| Operation not permitted | ||||
| ####################### | ||||
|  | ||||
| You might see errors such as: | ||||
|  | ||||
| .. code:: shell-session | ||||
|  | ||||
|     chown: changing ownership of '../export': Operation not permitted | ||||
|  | ||||
| The container tries to set file ownership on the listed directories. This is | ||||
| required so that the user running paperless inside docker has write permissions | ||||
| to these folders. This happens when pointing these directories to NFS shares, | ||||
| for example. | ||||
|  | ||||
| Ensure that ``chown`` is possible on these directories. | ||||
|  | ||||
|  | ||||
| Classifier error: No training data available | ||||
| ############################################ | ||||
|  | ||||
| This indicates that the Auto matching algorithm found no documents to learn from. | ||||
| This may have two reasons: | ||||
|  | ||||
| *   You don't use the Auto matching algorithm: The error can be safely ignored in this case. | ||||
| *   You are using the Auto matching algorithm: The classifier explicitly excludes documents | ||||
|     with Inbox tags. Verify that there are documents in your archive without inbox tags. | ||||
|     The algorithm will only learn from documents not in your inbox. | ||||
|  | ||||
|  | ||||
| UserWarning in sklearn on every single document | ||||
| ############################################### | ||||
|  | ||||
| You may encounter warnings like this: | ||||
|  | ||||
| .. code:: | ||||
|  | ||||
|     /usr/local/lib/python3.7/site-packages/sklearn/base.py:315: | ||||
|     UserWarning: Trying to unpickle estimator CountVectorizer from version 0.23.2 when using version 0.24.0. | ||||
|     This might lead to breaking code or invalid results. Use at your own risk. | ||||
|  | ||||
| This happens when certain dependencies of paperless that are responsible for the auto matching algorithm are | ||||
| updated. After updating these, your current training data *might* not be compatible anymore. This can be ignored | ||||
| in most cases. This warning will disappear automatically when paperless updates the training data. | ||||
|  | ||||
| If you want to get rid of the warning or actually experience issues with automatic matching, delete | ||||
| the file ``classification_model.pickle`` in the data directory and let paperless recreate it. | ||||
|  | ||||
|  | ||||
| 504 Server Error: Gateway Timeout when adding Office documents | ||||
| ############################################################## | ||||
|  | ||||
| You may experience these errors when using the optional TIKA integration: | ||||
|  | ||||
| .. code:: | ||||
|  | ||||
|     requests.exceptions.HTTPError: 504 Server Error: Gateway Timeout for url: http://gotenberg:3000/forms/libreoffice/convert | ||||
|  | ||||
| Gotenberg is a server that converts Office documents into PDF documents and has a default timeout of 30 seconds. | ||||
| When conversion takes longer, Gotenberg raises this error. | ||||
|  | ||||
| You can increase the timeout by configuring a command flag for Gotenberg (see also `here <https://gotenberg.dev/docs/modules/api#properties>`__). | ||||
| If using docker-compose, this is achieved by the following configuration change in the ``docker-compose.yml`` file: | ||||
|  | ||||
| .. code:: yaml | ||||
|  | ||||
|     gotenberg: | ||||
|         image: gotenberg/gotenberg:7.6 | ||||
|         restart: unless-stopped | ||||
|         command: | ||||
|             - "gotenberg" | ||||
|             - "--chromium-disable-routes=true" | ||||
|             - "--api-timeout=60" | ||||
|  | ||||
| Permission denied errors in the consumption directory | ||||
| ##################################################### | ||||
|  | ||||
| You might encounter errors such as: | ||||
|  | ||||
| .. code:: shell-session | ||||
|  | ||||
|     The following error occured while consuming document.pdf: [Errno 13] Permission denied: '/usr/src/paperless/src/../consume/document.pdf' | ||||
|  | ||||
| This happens when paperless does not have permission to delete files inside the consumption directory. | ||||
| Ensure that ``USERMAP_UID`` and ``USERMAP_GID`` are set to the user id and group id you use on the host operating system, if these are | ||||
| different from ``1000``. See :ref:`setup-docker_hub`. | ||||
|  | ||||
| Also ensure that you are able to read and write to the consumption directory on the host. | ||||
|  | ||||
|  | ||||
| OSError: [Errno 19] No such device when consuming files | ||||
| ####################################################### | ||||
|  | ||||
| If you experience errors such as: | ||||
|  | ||||
| .. code:: shell-session | ||||
|  | ||||
|     File "/usr/local/lib/python3.7/site-packages/whoosh/codec/base.py", line 570, in open_compound_file | ||||
|     return CompoundStorage(dbfile, use_mmap=storage.supports_mmap) | ||||
|     File "/usr/local/lib/python3.7/site-packages/whoosh/filedb/compound.py", line 75, in __init__ | ||||
|     self._source = mmap.mmap(fileno, 0, access=mmap.ACCESS_READ) | ||||
|     OSError: [Errno 19] No such device | ||||
|  | ||||
|     During handling of the above exception, another exception occurred: | ||||
|  | ||||
|     Traceback (most recent call last): | ||||
|     File "/usr/local/lib/python3.7/site-packages/django_q/cluster.py", line 436, in worker | ||||
|     res = f(*task["args"], **task["kwargs"]) | ||||
|     File "/usr/src/paperless/src/documents/tasks.py", line 73, in consume_file | ||||
|     override_tag_ids=override_tag_ids) | ||||
|     File "/usr/src/paperless/src/documents/consumer.py", line 271, in try_consume_file | ||||
|     raise ConsumerError(e) | ||||
|  | ||||
| Paperless uses a search index to provide better and faster full text searching. This search index is stored inside | ||||
| the ``data`` folder. The search index uses memory-mapped files (mmap). The above error indicates that paperless | ||||
| was unable to create and open these files. | ||||
|  | ||||
| This happens when you're trying to store the data directory on certain file systems (mostly network shares) | ||||
| that don't support memory-mapped files. | ||||
|  | ||||
|  | ||||
| Web-UI stuck at "Loading..." | ||||
| ############################ | ||||
|  | ||||
| This might have multiple reasons. | ||||
|  | ||||
|  | ||||
| 1.  If you built the docker image yourself or deployed using the bare metal route, | ||||
|     make sure that there are files in ``<paperless-root>/static/frontend/<lang-code>/``. | ||||
|     If there are no files, make sure that you executed ``collectstatic`` successfully, either | ||||
|     manually or as part of the docker image build. | ||||
|  | ||||
|     If the front end is still missing, make sure that the front end is compiled (files present in | ||||
|     ``src/documents/static/frontend``). If it is not, you need to compile the front end yourself | ||||
|     or download the release archive instead of cloning the repository. | ||||
|  | ||||
| 2.  Check the output of the web server. You might see errors like this: | ||||
|  | ||||
|  | ||||
|     .. code:: | ||||
|  | ||||
|         [2021-01-25 10:08:04 +0000] [40] [ERROR] Socket error processing request. | ||||
|         Traceback (most recent call last): | ||||
|         File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 134, in handle | ||||
|             self.handle_request(listener, req, client, addr) | ||||
|         File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 190, in handle_request | ||||
|             util.reraise(*sys.exc_info()) | ||||
|         File "/usr/local/lib/python3.7/site-packages/gunicorn/util.py", line 625, in reraise | ||||
|             raise value | ||||
|         File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 178, in handle_request | ||||
|             resp.write_file(respiter) | ||||
|         File "/usr/local/lib/python3.7/site-packages/gunicorn/http/wsgi.py", line 396, in write_file | ||||
|             if not self.sendfile(respiter): | ||||
|         File "/usr/local/lib/python3.7/site-packages/gunicorn/http/wsgi.py", line 386, in sendfile | ||||
|             sent += os.sendfile(sockno, fileno, offset + sent, count) | ||||
|         OSError: [Errno 22] Invalid argument | ||||
|  | ||||
|     To fix this issue, add | ||||
|  | ||||
|     .. code:: | ||||
|  | ||||
|         SENDFILE=0 | ||||
|  | ||||
|     to your `docker-compose.env` file. | ||||
|  | ||||
| Error while reading metadata | ||||
| ############################ | ||||
|  | ||||
| You might find messages like these in your log files: | ||||
|  | ||||
| .. code:: | ||||
|  | ||||
|     [WARNING] [paperless.parsing.tesseract] Error while reading metadata | ||||
|  | ||||
| This indicates that paperless failed to read PDF metadata from one of your documents. This happens when you | ||||
| open the affected documents in paperless for editing. Paperless will continue to work, and will simply not | ||||
| show the invalid metadata. | ||||
|  | ||||
| Consumer fails with a FileNotFoundError | ||||
| ####################################### | ||||
|  | ||||
| You might find messages like these in your log files: | ||||
|  | ||||
| .. code:: | ||||
|  | ||||
|     [ERROR] [paperless.consumer] Error while consuming document SCN_0001.pdf: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/ocrmypdf.io.yhk3zbv0/origin.pdf' | ||||
|     Traceback (most recent call last): | ||||
|       File "/app/paperless/src/paperless_tesseract/parsers.py", line 261, in parse | ||||
|         ocrmypdf.ocr(**args) | ||||
|       File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/api.py", line 337, in ocr | ||||
|         return run_pipeline(options=options, plugin_manager=plugin_manager, api=True) | ||||
|       File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_sync.py", line 385, in run_pipeline | ||||
|         exec_concurrent(context, executor) | ||||
|       File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_sync.py", line 302, in exec_concurrent | ||||
|         pdf = post_process(pdf, context, executor) | ||||
|       File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_sync.py", line 235, in post_process | ||||
|         pdf_out = metadata_fixup(pdf_out, context) | ||||
|       File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_pipeline.py", line 798, in metadata_fixup | ||||
|         with pikepdf.open(context.origin) as original, pikepdf.open(working_file) as pdf: | ||||
|       File "/usr/local/lib/python3.8/dist-packages/pikepdf/_methods.py", line 923, in open | ||||
|         pdf = Pdf._open( | ||||
|     FileNotFoundError: [Errno 2] No such file or directory: '/tmp/ocrmypdf.io.yhk3zbv0/origin.pdf' | ||||
|  | ||||
| This probably indicates paperless tried to consume the same file twice.  This can happen for a number of reasons, | ||||
| depending on how documents are placed into the consume folder.  If paperless is using inotify (the default) to | ||||
| check for documents, try adjusting the :ref:`inotify configuration <configuration-inotify>`.  If polling is enabled, | ||||
| try adjusting the :ref:`polling configuration <configuration-polling>`. | ||||
|  | ||||
| Consumer fails waiting for file to remain unmodified. | ||||
| ##################################################### | ||||
|  | ||||
| You might find messages like these in your log files: | ||||
|  | ||||
| .. code:: | ||||
|  | ||||
|     [ERROR] [paperless.management.consumer] Timeout while waiting on file /usr/src/paperless/src/../consume/SCN_0001.pdf to remain unmodified. | ||||
|  | ||||
| This indicates paperless timed out while waiting for the file to be completely written to the consume folder. | ||||
| Adjusting :ref:`polling configuration <configuration-polling>` values should resolve the issue. | ||||
|  | ||||
| .. note:: | ||||
|  | ||||
|     The user will need to manually move the file out of the consume folder and | ||||
|     back in, for the initial failing file to be consumed. | ||||
|  | ||||
| Consumer fails reporting "OS reports file as busy still". | ||||
| ######################################################### | ||||
|  | ||||
| You might find messages like these in your log files: | ||||
|  | ||||
| .. code:: | ||||
|  | ||||
|     [WARNING] [paperless.management.consumer] Not consuming file /usr/src/paperless/src/../consume/SCN_0001.pdf: OS reports file as busy still | ||||
|  | ||||
| This indicates paperless was unable to open the file, as the OS reported the file as still being in use.  To prevent a | ||||
| crash, paperless did not try to consume the file.  If paperless is using inotify (the default) to | ||||
| check for documents, try adjusting the :ref:`inotify configuration <configuration-inotify>`.  If polling is enabled, | ||||
| try adjusting the :ref:`polling configuration <configuration-polling>`. | ||||
|  | ||||
| .. note:: | ||||
|  | ||||
|     The user will need to manually move the file out of the consume folder and | ||||
|     back in, for the initial failing file to be consumed. | ||||
|  | ||||
| Log reports "Creating PaperlessTask failed". | ||||
| ######################################################### | ||||
|  | ||||
| You might find messages like these in your log files: | ||||
|  | ||||
| .. code:: | ||||
|  | ||||
|     [ERROR] [paperless.management.consumer] Creating PaperlessTask failed: db locked | ||||
|  | ||||
| You are likely using an sqlite based installation, with an increased number of workers and are running into sqlite's concurrency limitations. | ||||
| Uploading or consuming multiple files at once results in many workers attempting to access the database simultaneously. | ||||
|  | ||||
| Consider changing to the PostgreSQL database if you will be processing many documents at once often.  Otherwise, | ||||
| try tweaking the ``PAPERLESS_DB_TIMEOUT`` setting to allow more time for the database to unlock.  This may have | ||||
| minor performance implications. | ||||
|  | ||||
|  | ||||
| gunicorn fails to start with "is not a valid port number" | ||||
| ######################################################### | ||||
|  | ||||
| You are likely running using Kubernetes, which automatically creates an environment variable named `${serviceName}_PORT`. | ||||
| This is the same environment variable which is used by Paperless to optionally change the port gunicorn listens on. | ||||
|  | ||||
| To fix this, set `PAPERLESS_PORT` again to your desired port, or the default of 8000. | ||||
|     You will be redirected shortly... | ||||
|   | ||||
| @@ -1,420 +1,12 @@ | ||||
| .. _usage_overview: | ||||
|  | ||||
| ************** | ||||
| Usage Overview | ||||
| ************** | ||||
|  | ||||
| Paperless is an application that manages your personal documents. With | ||||
| the help of a document scanner (see :ref:`scanners`), paperless transforms | ||||
| your wieldy physical document binders into a searchable archive and | ||||
| provides many utilities for finding and managing your documents. | ||||
|  | ||||
| .. cssclass:: redirect-notice | ||||
|  | ||||
| Terms and definitions | ||||
| ##################### | ||||
|     The Paperless-ngx documentation has permanently moved. | ||||
|  | ||||
| Paperless essentially consists of two different parts for managing your | ||||
| documents: | ||||
|  | ||||
| * The *consumer* watches a specified folder and adds all documents in that | ||||
|   folder to paperless. | ||||
| * The *web server* provides a UI that you use to manage and search for your | ||||
|   scanned documents. | ||||
|  | ||||
| Each document has a couple of fields that you can assign to them: | ||||
|  | ||||
| * A *Document* is a piece of paper that sometimes contains valuable | ||||
|   information. | ||||
| * The *correspondent* of a document is the person, institution or company that | ||||
|   a document either originates from, or is sent to. | ||||
| * A *tag* is a label that you can assign to documents. Think of labels as more | ||||
|   powerful folders: Multiple documents can be grouped together with a single | ||||
|   tag, however, a single document can also have multiple tags. This is not | ||||
|   possible with folders. The reason folders are not implemented in paperless | ||||
|   is simply that tags are much more versatile than folders. | ||||
| * A *document type* is used to demarcate the type of a document such as letter, | ||||
|   bank statement, invoice, contract, etc. It is used to identify what a document | ||||
|   is about. | ||||
| * The *date added* of a document is the date the document was scanned into | ||||
|   paperless. You cannot and should not change this date. | ||||
| * The *date created* of a document is the date the document was initially issued. | ||||
|   This can be the date you bought a product, the date you signed a contract, or | ||||
|   the date a letter was sent to you. | ||||
| * The *archive serial number* (short: ASN) of a document is the identifier of | ||||
|   the document in your physical document binders. See | ||||
|   :ref:`usage-recommended_workflow` below. | ||||
| * The *content* of a document is the text that was OCR'ed from the document. | ||||
|   This text is fed into the search engine and is used for matching tags, | ||||
|   correspondents and document types. | ||||
|  | ||||
|  | ||||
| Frontend overview | ||||
| ################# | ||||
|  | ||||
| .. warning:: | ||||
|  | ||||
|     TBD. Add some fancy screenshots! | ||||
|  | ||||
| Adding documents to paperless | ||||
| ############################# | ||||
|  | ||||
| Once you've got Paperless setup, you need to start feeding documents into it. | ||||
| When adding documents to paperless, it will perform the following operations on | ||||
| your documents: | ||||
|  | ||||
| 1.  OCR the document, if it has no text. Digital documents usually have text, | ||||
|     and this step will be skipped for those documents. | ||||
| 2.  Paperless will create an archivable PDF/A document from your document. | ||||
|     If this document is coming from your scanner, it will have embedded selectable text. | ||||
| 3.  Paperless performs automatic matching of tags, correspondents and types on the | ||||
|     document before storing it in the database. | ||||
|  | ||||
| .. hint:: | ||||
|  | ||||
|     This process can be configured to fit your needs. If you don't want paperless | ||||
|     to create archived versions for digital documents, you can configure that by | ||||
|     configuring ``PAPERLESS_OCR_MODE=skip_noarchive``. Please read the | ||||
|     :ref:`relevant section in the documentation <configuration-ocr>`. | ||||
|  | ||||
| .. note:: | ||||
|  | ||||
|     No matter which options you choose, Paperless will always store the original | ||||
|     document that it found in the consumption directory or in the mail and | ||||
|     will never overwrite that document. Archived versions are stored alongside the | ||||
|     original versions. | ||||
|  | ||||
|  | ||||
| The consumption directory | ||||
| ========================= | ||||
|  | ||||
| The primary method of getting documents into your database is by putting them in | ||||
| the consumption directory.  The consumer runs in an infinite loop, looking for new | ||||
| additions to this directory. When it finds them, the consumer goes about the process | ||||
| of parsing them with the OCR, indexing what it finds, and storing it in the media directory. | ||||
|  | ||||
| Getting stuff into this directory is up to you.  If you're running Paperless | ||||
| on your local computer, you might just want to drag and drop files there, but if | ||||
| you're running this on a server and want your scanner to automatically push | ||||
| files to this directory, you'll need to setup some sort of service to accept the | ||||
| files from the scanner.  Typically, you're looking at an FTP server like | ||||
| `Proftpd`_ or a Windows folder share with `Samba`_. | ||||
|  | ||||
| .. _Proftpd: http://www.proftpd.org/ | ||||
| .. _Samba: http://www.samba.org/ | ||||
|  | ||||
| .. TODO: hyperref to configuration of the location of this magic folder. | ||||
|  | ||||
| Web UI Upload | ||||
| ============= | ||||
|  | ||||
| The dashboard has a file drop field to upload documents to paperless. Simply drag a file | ||||
| onto this field or select a file with the file dialog. Multiple files are supported. | ||||
|  | ||||
| You can also upload documents on any other page of the web UI by dragging-and-dropping | ||||
| files into your browser window. | ||||
|  | ||||
| .. _usage-mobile_upload: | ||||
|  | ||||
| Mobile upload | ||||
| ============= | ||||
|  | ||||
| The mobile app over at `<https://github.com/qcasey/paperless_share>`_ allows Android users | ||||
| to share any documents with paperless. This can be combined with any of the mobile | ||||
| scanning apps out there, such as Office Lens. | ||||
|  | ||||
| Furthermore, there is the  `Paperless App <https://github.com/bauerj/paperless_app>`_ as well, | ||||
| which not only has document upload, but also document browsing and download features. | ||||
|  | ||||
| .. _usage-email: | ||||
|  | ||||
| IMAP (Email) | ||||
| ============ | ||||
|  | ||||
| You can tell paperless-ngx to consume documents from your email accounts. | ||||
| This is a very flexible and powerful feature, if you regularly received documents | ||||
| via mail that you need to archive. The mail consumer can be configured by using the | ||||
| admin interface in the following manner: | ||||
|  | ||||
| 1.  Define e-mail accounts. | ||||
| 2.  Define mail rules for your account. | ||||
|  | ||||
| These rules perform the following: | ||||
|  | ||||
| 1.  Connect to the mail server. | ||||
| 2.  Fetch all matching mails (as defined by folder, maximum age and the filters) | ||||
| 3.  Check if there are any consumable attachments. | ||||
| 4.  If so, instruct paperless to consume the attachments and optionally | ||||
|     use the metadata provided in the rule for the new document. | ||||
| 5.  If documents were consumed from a mail, the rule action is performed | ||||
|     on that mail. | ||||
|  | ||||
| Paperless will completely ignore mails that do not match your filters. It will also | ||||
| only perform the action on mails that it has consumed documents from. | ||||
|  | ||||
| The actions all ensure that the same mail is not consumed twice by different means. | ||||
| These are as follows: | ||||
|  | ||||
| *   **Delete:** Immediately deletes mail that paperless has consumed documents from. | ||||
|     Use with caution. | ||||
| *   **Mark as read:** Mark consumed mail as read. Paperless will not consume documents | ||||
|     from already read mails. If you read a mail before paperless sees it, it will be | ||||
|     ignored. | ||||
| *   **Flag:** Sets the 'important' flag on mails with consumed documents. Paperless | ||||
|     will not consume flagged mails. | ||||
| *   **Move to folder:** Moves consumed mails out of the way so that paperless wont | ||||
|     consume them again. | ||||
| *   **Add custom Tag:** Adds a custom tag to mails with consumed documents (the IMAP | ||||
|     standard calls these "keywords"). Paperless will not consume mails already tagged. | ||||
|     Not all mail servers support this feature! | ||||
|  | ||||
| .. caution:: | ||||
|  | ||||
|     The mail consumer will perform these actions on all mails it has consumed | ||||
|     documents from. Keep in mind that the actual consumption process may fail | ||||
|     for some reason, leaving you with missing documents in paperless. | ||||
|  | ||||
| .. note:: | ||||
|  | ||||
|     With the correct set of rules, you can completely automate your email documents. | ||||
|     Create rules for every correspondent you receive digital documents from and | ||||
|     paperless will read them automatically. The default action "mark as read" is | ||||
|     pretty tame and will not cause any damage or data loss whatsoever. | ||||
|  | ||||
|     You can also setup a special folder in your mail account for paperless and use | ||||
|     your favorite mail client to move to be consumed mails into that folder | ||||
|     automatically or manually and tell paperless to move them to yet another folder | ||||
|     after consumption. It's up to you. | ||||
|  | ||||
| .. note:: | ||||
|  | ||||
|     When defining a mail rule with a folder, you may need to try different characters to | ||||
|     define how the sub-folders are separated.  Common values include ".", "/" or "|", but | ||||
|     this varies by the mail server.  Check the documentation for your mail server.  In the | ||||
|     event of an error fetching mail from a certain folder, check the Paperless logs.  When | ||||
|     a folder is not located, Paperless will attempt to list all folders found in the account | ||||
|     to the Paperless logs. | ||||
|  | ||||
| .. note:: | ||||
|  | ||||
|     Paperless will process the rules in the order defined in the admin page. | ||||
|  | ||||
|     You can define catch-all rules and have them executed last to consume | ||||
|     any documents not matched by previous rules. Such a rule may assign an "Unknown | ||||
|     mail document" tag to consumed documents so you can inspect them further. | ||||
|  | ||||
| Paperless is set up to check your mails every 10 minutes. This can be configured on the | ||||
| 'Scheduled tasks' page in the admin. | ||||
|  | ||||
|  | ||||
| REST API | ||||
| ======== | ||||
|  | ||||
| You can also submit a document using the REST API, see :ref:`api-file_uploads` for details. | ||||
|  | ||||
| .. _basic-searching: | ||||
|  | ||||
|  | ||||
| Best practices | ||||
| ############## | ||||
|  | ||||
| Paperless offers a couple tools that help you organize your document collection. However, | ||||
| it is up to you to use them in a way that helps you organize documents and find specific | ||||
| documents when you need them. This section offers a couple ideas for managing your collection. | ||||
|  | ||||
| Document types allow you to classify documents according to what they are. You can define | ||||
| types such as "Receipt", "Invoice", or "Contract". If you used to collect all your receipts | ||||
| in a single binder, you can recreate that system in paperless by defining a document type, | ||||
| assigning documents to that type and then filtering by that type to only see all receipts. | ||||
|  | ||||
| Not all documents need document types. Sometimes its hard to determine what the type of a | ||||
| document is or it is hard to justify creating a document type that you only need once or twice. | ||||
| This is okay. As long as the types you define help you organize your collection in the way | ||||
| you want, paperless is doing its job. | ||||
|  | ||||
| Tags can be used in many different ways. Think of tags are more versatile folders or binders. | ||||
| If you have a binder for documents related to university / your car or health care, you can | ||||
| create these binders in paperless by creating tags and assigning them to relevant documents. | ||||
| Just as with documents, you can filter the document list by tags and only see documents of | ||||
| a certain topic. | ||||
|  | ||||
| With physical documents, you'll often need to decide which folder the document belongs to. | ||||
| The advantage of tags over folders and binders is that a single document can have multiple | ||||
| tags. A physical document cannot magically appear in two different folders, but with tags, | ||||
| this is entirely possible. | ||||
|  | ||||
| .. hint:: | ||||
|  | ||||
|   This can be used in many different ways. One example: Imagine you're working on a particular | ||||
|   task, such as signing up for university. Usually you'll need to collect a bunch of different | ||||
|   documents that are already sorted into various folders. With the tag system of paperless, | ||||
|   you can create a new group of documents that are relevant to this task without destroying | ||||
|   the already existing organization. When you're done with the task, you could delete the | ||||
|   tag again, which would be equal to sorting documents back into the folder they belong into. | ||||
|   Or keep the tag, up to you. | ||||
|  | ||||
| All of the logic above applies to correspondents as well. Attach them to documents if you | ||||
| feel that they help you organize your collection. | ||||
|  | ||||
| When you've started organizing your documents, create a couple saved views for document collections | ||||
| you regularly access. This is equal to having labeled physical binders on your desk, except | ||||
| that these saved views are dynamic and simply update themselves as you add documents to the system. | ||||
|  | ||||
| Here are a couple examples of tags and types that you could use in your collection. | ||||
|  | ||||
| * An ``inbox`` tag for newly added documents that you haven't manually edited yet. | ||||
| * A tag ``car`` for everything car related (repairs, registration, insurance, etc) | ||||
| * A tag ``todo`` for documents that you still need to do something with, such as reply, or | ||||
|   perform some task online. | ||||
| * A tag ``bank account x`` for all bank statement related to that account. | ||||
| * A tag ``mail`` for anything that you added to paperless via its mail processing capabilities. | ||||
| * A tag ``missing_metadata`` when you still need to add some metadata to a document, but can't | ||||
|   or don't want to do this right now. | ||||
|  | ||||
| .. _basic-usage_searching: | ||||
|  | ||||
| Searching | ||||
| ######### | ||||
|  | ||||
| Paperless offers an extensive searching mechanism that is designed to allow you to quickly | ||||
| find a document you're looking for (for example, that thing that just broke and you bought | ||||
| a couple months ago, that contract you signed 8 years ago). | ||||
|  | ||||
| When you search paperless for a document, it tries to match this query against your documents. | ||||
| Paperless will look for matching documents by inspecting their content, title, correspondent, | ||||
| type and tags. Paperless returns a scored list of results, so that documents matching your query | ||||
| better will appear further up in the search results. | ||||
|  | ||||
| By default, paperless returns only documents which contain all words typed in the search bar. | ||||
| However, paperless also offers advanced search syntax if you want to drill down the results | ||||
| further. | ||||
|  | ||||
| Matching documents with logical expressions: | ||||
|  | ||||
| .. code:: | ||||
|  | ||||
|   shopname AND (product1 OR product2) | ||||
|  | ||||
| Matching specific tags, correspondents or types: | ||||
|  | ||||
| .. code:: | ||||
|  | ||||
|   type:invoice tag:unpaid | ||||
|   correspondent:university certificate | ||||
|  | ||||
| Matching dates: | ||||
|  | ||||
| .. code:: | ||||
|  | ||||
|   created:[2005 to 2009] | ||||
|   added:yesterday | ||||
|   modified:today | ||||
|  | ||||
| Matching inexact words: | ||||
|  | ||||
| .. code:: | ||||
|  | ||||
|   produ*name | ||||
|  | ||||
| .. note:: | ||||
|  | ||||
|   Inexact terms are hard for search indexes. These queries might take a while to execute. That's why paperless offers | ||||
|   auto complete and query correction. | ||||
|  | ||||
| All of these constructs can be combined as you see fit. | ||||
| If you want to learn more about the query language used by paperless, paperless uses Whoosh's default query language. | ||||
| Head over to `Whoosh query language <https://whoosh.readthedocs.io/en/latest/querylang.html>`_. | ||||
| For details on what date parsing utilities are available, see | ||||
| `Date parsing <https://whoosh.readthedocs.io/en/latest/dates.html#parsing-date-queries>`_. | ||||
|  | ||||
|  | ||||
| .. _usage-recommended_workflow: | ||||
|  | ||||
| The recommended workflow | ||||
| ######################## | ||||
|  | ||||
| Once you have familiarized yourself with paperless and are ready to use it | ||||
| for all your documents, the recommended workflow for managing your documents | ||||
| is as follows. This workflow also takes into account that some documents | ||||
| have to be kept in physical form, but still ensures that you get all the | ||||
| advantages for these documents as well. | ||||
|  | ||||
| The following diagram shows how easy it is to manage your documents. | ||||
|  | ||||
| .. image:: _static/recommended_workflow.png | ||||
|  | ||||
| Preparations in paperless | ||||
| ========================= | ||||
|  | ||||
| * Create an inbox tag that gets assigned to all new documents. | ||||
| * Create a TODO tag. | ||||
|  | ||||
| Processing of the physical documents | ||||
| ==================================== | ||||
|  | ||||
| Keep a physical inbox. Whenever you receive a document that you need to | ||||
| archive, put it into your inbox. Regularly, do the following for all documents | ||||
| in your inbox: | ||||
|  | ||||
| 1.  For each document, decide if you need to keep the document in physical | ||||
|     form. This applies to certain important documents, such as contracts and | ||||
|     certificates. | ||||
| 2.  If you need to keep the document, write a running number on the document | ||||
|     before scanning, starting at one and counting upwards. This is the archive | ||||
|     serial number, or ASN in short. | ||||
| 3.  Scan the document. | ||||
| 4.  If the document has an ASN assigned, store it in a *single* binder, sorted | ||||
|     by ASN. Don't order this binder in any other way. | ||||
| 5.  If the document has no ASN, throw it away. Yay! | ||||
|  | ||||
| Over time, you will notice that your physical binder will fill up. If it is | ||||
| full, label the binder with the range of ASNs in this binder (i.e., "Documents | ||||
| 1 to 343"), store the binder in your cellar or elsewhere, and start a new | ||||
| binder. | ||||
|  | ||||
| The idea behind this process is that you will never have to use the physical | ||||
| binders to find a document. If you need a specific physical document, you | ||||
| may find this document by: | ||||
|  | ||||
| 1.  Searching in paperless for the document. | ||||
| 2.  Identify the ASN of the document, since it appears on the scan. | ||||
| 3.  Grab the relevant document binder and get the document. This is easy since | ||||
|     they are sorted by ASN. | ||||
|  | ||||
| Processing of documents in paperless | ||||
| ==================================== | ||||
|  | ||||
| Once you have scanned in a document, proceed in paperless as follows. | ||||
|  | ||||
| 1.  If the document has an ASN, assign the ASN to the document. | ||||
| 2.  Assign a correspondent to the document (i.e., your employer, bank, etc) | ||||
|     This isn't strictly necessary but helps in finding a document when you need | ||||
|     it. | ||||
| 3.  Assign a document type (i.e., invoice, bank statement, etc) to the document | ||||
|     This isn't strictly necessary but helps in finding a document when you need | ||||
|     it. | ||||
| 4.  Assign a proper title to the document (the name of an item you bought, the | ||||
|     subject of the letter, etc) | ||||
| 5.  Check that the date of the document is correct. Paperless tries to read | ||||
|     the date from the content of the document, but this fails sometimes if the | ||||
|     OCR is bad or multiple dates appear on the document. | ||||
| 6.  Remove inbox tags from the documents. | ||||
|  | ||||
| .. hint:: | ||||
|  | ||||
|     You can setup manual matching rules for your correspondents and tags and | ||||
|     paperless will assign them automatically. After consuming a couple documents, | ||||
|     you can even ask paperless to *learn* when to assign tags and correspondents | ||||
|     by itself. For details on this feature, see :ref:`advanced-matching`. | ||||
|  | ||||
| Task management | ||||
| =============== | ||||
|  | ||||
| Some documents require attention and require you to act on the document. You | ||||
| may take two different approaches to handle these documents based on how | ||||
| regularly you intend to scan documents and use paperless. | ||||
|  | ||||
| * If you scan and process your documents in paperless regularly, assign a | ||||
|   TODO tag to all scanned documents that you need to process. Create a saved | ||||
|   view on the dashboard that shows all documents with this tag. | ||||
| * If you do not scan documents regularly and use paperless solely for archiving, | ||||
|   create a physical todo box next to your physical inbox and put documents you | ||||
|   need to process in the TODO box. When you performed the task associated with | ||||
|   the document, move it to the inbox. | ||||
|     You will be redirected shortly... | ||||
|   | ||||
		Reference in New Issue
	
	Block a user