mirror of https://github.com/paperless-ngx/paperless-ngx.git (synced 2025-10-24 03:26:11 -05:00)

Compare commits: v2.13.4...sunset-rtd (3 Commits)

| Author | SHA1 | Date | |
|---|---|---|---|
|   | 15f4808fec | ||
|   | d531805597 | ||
|   | 304cfc42a9 | ||
							
								
								
									
docs/_static/css/custom.css (8 changed lines, vendored)

| @@ -595,3 +595,11 @@ html.writer-html5 .rst-content dl.footnote code { | |||||||
| .wy-nav-content-wrap { | .wy-nav-content-wrap { | ||||||
|   z-index: 20; |   z-index: 20; | ||||||
| } | } | ||||||
|  |  | ||||||
|  | .rst-content .toctree-wrapper { | ||||||
|  |   display: none; | ||||||
|  | } | ||||||
|  |  | ||||||
|  | .redirect-notice { | ||||||
|  |   font-size: 2.5rem; | ||||||
|  | } | ||||||
|   | |||||||
							
								
								
									
docs/_templates/layout.html (25 changed lines, vendored)

| @@ -8,6 +8,31 @@ | |||||||
|  |  | ||||||
|         document.documentElement.classList.toggle("dark-mode", darkModeState); |         document.documentElement.classList.toggle("dark-mode", darkModeState); | ||||||
|         document.documentElement.classList.toggle("light-mode", !darkModeState); |         document.documentElement.classList.toggle("light-mode", !darkModeState); | ||||||
|  |  | ||||||
|  |         const RTD_TO_MKD = { | ||||||
|  |             "index.html": "", | ||||||
|  |             "setup.html": "setup", | ||||||
|  |             "usage_overview.html": "usage", | ||||||
|  |             "advanced_usage.html": "advanced_usage", | ||||||
|  |             "administration.html": "administration", | ||||||
|  |             "configuration.html": "configuration", | ||||||
|  |             "api.html": "api", | ||||||
|  |             "faq.html": "faq", | ||||||
|  |             "troubleshooting.html": "troubleshooting", | ||||||
|  |             "extending.html": "development", | ||||||
|  |             "scanners.html": "", | ||||||
|  |             "screenshots.html": "", | ||||||
|  |             "changelog.html": "changelog", | ||||||
|  |         } | ||||||
|  |  | ||||||
|  |         const path = (RTD_TO_MKD[window.location.pathname.substring(window.location.pathname.lastIndexOf("/") + 1)] ?? "") + "/"; | ||||||
|  |         const hash = window.location.hash; | ||||||
|  |         const redirectURL = new URL(path  + hash, "https://docs.paperless-ngx.com/"); | ||||||
|  |         console.log(`Redirecting to ${redirectURL} in 3 seconds...`); | ||||||
|  |  | ||||||
|  |         setTimeout(() => { | ||||||
|  |             window.location.replace(redirectURL); | ||||||
|  |         }, 3000); | ||||||
|     </script> |     </script> | ||||||
|     {{ super() }} |     {{ super() }} | ||||||
| {% endblock %} | {% endblock %} | ||||||
|   | |||||||
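
The redirect logic added to `docs/_templates/layout.html` above can be tried outside a browser. The following is a minimal sketch, not the project's code: `buildRedirectURL` is a hypothetical helper that takes the pathname and hash as arguments instead of reading `window.location`, the mapping is shortened to a few entries from the diff, and the `/en/latest/` prefix in the sample calls is only an assumed Read the Docs URL layout.

```javascript
// Shortened copy of the RTD_TO_MKD mapping from the diff (old page -> new path).
const RTD_TO_MKD = {
  "index.html": "",
  "usage_overview.html": "usage",
  "extending.html": "development",
  "scanners.html": "",
  "changelog.html": "changelog",
};

// Hypothetical helper: the same steps as the inline script, with the
// pathname and hash passed in so it can run under Node.js.
function buildRedirectURL(pathname, hash, base = "https://docs.paperless-ngx.com/") {
  // Take the last path segment, e.g. "usage_overview.html".
  const page = pathname.substring(pathname.lastIndexOf("/") + 1);
  // Pages without a mapping entry fall back to "", i.e. the site root.
  const path = (RTD_TO_MKD[page] ?? "") + "/";
  // Resolve the new path plus the original fragment against the new site.
  return new URL(path + hash, base);
}

console.log(buildRedirectURL("/en/latest/usage_overview.html", "#basic-searching").href);
console.log(buildRedirectURL("/en/latest/scanners.html", "").href);
```

Pages mapped to an empty string, and pages missing from the mapping, both resolve to the root of the new site. The fragment is carried over unchanged, so it only lands on the intended section if the new documentation uses the same anchor name.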
| @@ -1,531 +1,11 @@ | |||||||
|  | .. _administration: | ||||||
|  |  | ||||||
| ************** | ************** | ||||||
| Administration | Administration | ||||||
| ************** | ************** | ||||||
|  |  | ||||||
| .. _administration-backup: | .. cssclass:: redirect-notice | ||||||
|  |  | ||||||
| Making backups |     The Paperless-ngx documentation has permanently moved. | ||||||
| ############## |  | ||||||
|  |  | ||||||
| Multiple options exist for making backups of your paperless instance, |     You will be redirected shortly... | ||||||
| depending on how you installed paperless. |  | ||||||
|  |  | ||||||
| Before making backups, make sure that paperless is not running. |  | ||||||
|  |  | ||||||
| Options available to any installation of paperless: |  | ||||||
|  |  | ||||||
| *   Use the :ref:`document exporter <utilities-exporter>`. |  | ||||||
|     The document exporter exports all your documents, thumbnails and |  | ||||||
|     metadata to a specific folder. You may import your documents into a |  | ||||||
|     fresh instance of paperless again or store your documents in another |  | ||||||
|     DMS with this export. |  | ||||||
| *   The document exporter is also able to update an already existing export. |  | ||||||
|     Therefore, incremental backups with ``rsync`` are entirely possible. |  | ||||||
|  |  | ||||||
| .. caution:: |  | ||||||
|  |  | ||||||
|     You cannot import the export generated with one version of paperless in a |  | ||||||
|     different version of paperless. The export contains an exact image of the |  | ||||||
|     database, and migrations may change the database layout. |  | ||||||
|  |  | ||||||
| Options available to docker installations: |  | ||||||
|  |  | ||||||
| *   Backup the docker volumes. These usually reside within |  | ||||||
|     ``/var/lib/docker/volumes`` on the host and you need to be root in order |  | ||||||
|     to access them. |  | ||||||
|  |  | ||||||
|     Paperless uses 4 volumes: |  | ||||||
|  |  | ||||||
|     *   ``paperless_media``: This is where your documents are stored. |  | ||||||
|     *   ``paperless_data``: This is where auxillary data is stored. This |  | ||||||
|         folder also contains the SQLite database, if you use it. |  | ||||||
|     *   ``paperless_pgdata``: Exists only if you use PostgreSQL and contains |  | ||||||
|         the database. |  | ||||||
|     *   ``paperless_dbdata``: Exists only if you use MariaDB and contains |  | ||||||
|         the database. |  | ||||||
|  |  | ||||||
| Options available to bare-metal and non-docker installations: |  | ||||||
|  |  | ||||||
| *   Backup the entire paperless folder. This ensures that if your paperless instance |  | ||||||
|     crashes at some point or your disk fails, you can simply copy the folder back |  | ||||||
|     into place and it works. |  | ||||||
|  |  | ||||||
|     When using PostgreSQL or MariaDB, you'll also have to backup the database. |  | ||||||
|  |  | ||||||
| .. _migrating-restoring: |  | ||||||
|  |  | ||||||
| Restoring |  | ||||||
| ========= |  | ||||||
|  |  | ||||||
| .. _administration-updating: |  | ||||||
|  |  | ||||||
| Updating Paperless |  | ||||||
| ################## |  | ||||||
|  |  | ||||||
| Docker Route |  | ||||||
| ============ |  | ||||||
|  |  | ||||||
| If a new release of paperless-ngx is available, upgrading depends on how you |  | ||||||
| installed paperless-ngx in the first place. The releases are available at the |  | ||||||
| `release page <https://github.com/paperless-ngx/paperless-ngx/releases>`_. |  | ||||||
|  |  | ||||||
| First of all, ensure that paperless is stopped. |  | ||||||
|  |  | ||||||
| .. code:: shell-session |  | ||||||
|  |  | ||||||
|     $ cd /path/to/paperless |  | ||||||
|     $ docker-compose down |  | ||||||
|  |  | ||||||
| After that, :ref:`make a backup <administration-backup>`. |  | ||||||
|  |  | ||||||
| A.  If you pull the image from the docker hub, all you need to do is: |  | ||||||
|  |  | ||||||
|     .. code:: shell-session |  | ||||||
|  |  | ||||||
|         $ docker-compose pull |  | ||||||
|         $ docker-compose up |  | ||||||
|  |  | ||||||
|     The docker-compose files refer to the ``latest`` version, which is always the latest |  | ||||||
|     stable release. |  | ||||||
|  |  | ||||||
| B.  If you built the image yourself, do the following: |  | ||||||
|  |  | ||||||
|     .. code:: shell-session |  | ||||||
|  |  | ||||||
|         $ git pull |  | ||||||
|         $ docker-compose build |  | ||||||
|         $ docker-compose up |  | ||||||
|  |  | ||||||
| Running ``docker-compose up`` will also apply any new database migrations. |  | ||||||
| If you see everything working, press CTRL+C once to gracefully stop paperless. |  | ||||||
| Then you can start paperless-ngx with ``-d`` to have it run in the background. |  | ||||||
|  |  | ||||||
|     .. note:: |  | ||||||
|  |  | ||||||
|         In version 0.9.14, the update process was changed. In 0.9.13 and earlier, the |  | ||||||
|         docker-compose files specified exact versions and pull won't automatically |  | ||||||
|         update to newer versions. In order to enable updates as described above, either |  | ||||||
|         get the new ``docker-compose.yml`` file from `here <https://github.com/paperless-ngx/paperless-ngx/tree/master/docker/compose>`_ |  | ||||||
|         or edit the ``docker-compose.yml`` file, find the line that says |  | ||||||
|  |  | ||||||
|             .. code:: |  | ||||||
|  |  | ||||||
|                 image: ghcr.io/paperless-ngx/paperless-ngx:0.9.x |  | ||||||
|  |  | ||||||
|         and replace the version with ``latest``: |  | ||||||
|  |  | ||||||
|             .. code:: |  | ||||||
|  |  | ||||||
|                 image: ghcr.io/paperless-ngx/paperless-ngx:latest |  | ||||||
|  |  | ||||||
|     .. note:: |  | ||||||
|         In version 1.7.1 and onwards, the Docker image can now be pinned to a release series. |  | ||||||
|         This is often combined with automatic updaters such as Watchtower to allow safer |  | ||||||
|         unattended upgrading to new bugfix releases only.  It is still recommended to always |  | ||||||
|         review release notes before upgrading.  To pin your install to a release series, edit |  | ||||||
|         the ``docker-compose.yml`` find the line that says |  | ||||||
|  |  | ||||||
|             .. code:: |  | ||||||
|  |  | ||||||
|                 image: ghcr.io/paperless-ngx/paperless-ngx:latest |  | ||||||
|  |  | ||||||
|         and replace the version with the series you want to track, for example: |  | ||||||
|  |  | ||||||
|             .. code:: |  | ||||||
|  |  | ||||||
|                 image: ghcr.io/paperless-ngx/paperless-ngx:1.7 |  | ||||||
|  |  | ||||||
| Bare Metal Route |  | ||||||
| ================ |  | ||||||
|  |  | ||||||
| After grabbing the new release and unpacking the contents, do the following: |  | ||||||
|  |  | ||||||
| 1.  Update dependencies. New paperless version may require additional |  | ||||||
|     dependencies. The dependencies required are listed in the section about |  | ||||||
|     :ref:`bare metal installations <setup-bare_metal>`. |  | ||||||
|  |  | ||||||
| 2.  Update python requirements. Keep in mind to activate your virtual environment |  | ||||||
|     before that, if you use one. |  | ||||||
|  |  | ||||||
|     .. code:: shell-session |  | ||||||
|  |  | ||||||
|         $ pip install -r requirements.txt |  | ||||||
|  |  | ||||||
| 3.  Migrate the database. |  | ||||||
|  |  | ||||||
|     .. code:: shell-session |  | ||||||
|  |  | ||||||
|         $ cd src |  | ||||||
|         $ python3 manage.py migrate |  | ||||||
|  |  | ||||||
|     This might not actually do anything. Not every new paperless version comes with new |  | ||||||
|     database migrations. |  | ||||||
|  |  | ||||||
| Downgrading Paperless |  | ||||||
| ##################### |  | ||||||
|  |  | ||||||
| Downgrades are possible. However, some updates also contain database migrations (these change the layout of the database and may move data). |  | ||||||
| In order to move back from a version that applied database migrations, you'll have to revert the database migration *before* downgrading, |  | ||||||
| and then downgrade paperless. |  | ||||||
|  |  | ||||||
| This table lists the compatible versions for each database migration number. |  | ||||||
|  |  | ||||||
| +------------------+-----------------+ |  | ||||||
| | Migration number | Version range   | |  | ||||||
| +------------------+-----------------+ |  | ||||||
| | 1011             | 1.0.0           | |  | ||||||
| +------------------+-----------------+ |  | ||||||
| | 1012             | 1.1.0 - 1.2.1   | |  | ||||||
| +------------------+-----------------+ |  | ||||||
| | 1014             | 1.3.0 - 1.3.1   | |  | ||||||
| +------------------+-----------------+ |  | ||||||
| | 1016             | 1.3.2 - current | |  | ||||||
| +------------------+-----------------+ |  | ||||||
|  |  | ||||||
| Execute the following management command to migrate your database: |  | ||||||
|  |  | ||||||
| .. code:: shell-session |  | ||||||
|  |  | ||||||
|     $ python3 manage.py migrate documents <migration number> |  | ||||||
|  |  | ||||||
| .. note:: |  | ||||||
|  |  | ||||||
|     Some migrations cannot be undone. The command will issue errors if that happens. |  | ||||||
|  |  | ||||||
| .. _utilities-management-commands: |  | ||||||
|  |  | ||||||
| Management utilities |  | ||||||
| #################### |  | ||||||
|  |  | ||||||
| Paperless comes with some management commands that perform various maintenance |  | ||||||
| tasks on your paperless instance. You can invoke these commands in the following way: |  | ||||||
|  |  | ||||||
| With docker-compose, while paperless is running: |  | ||||||
|  |  | ||||||
| .. code:: shell-session |  | ||||||
|  |  | ||||||
|     $ cd /path/to/paperless |  | ||||||
|     $ docker-compose exec webserver <command> <arguments> |  | ||||||
|  |  | ||||||
| With docker, while paperless is running: |  | ||||||
|  |  | ||||||
| .. code:: shell-session |  | ||||||
|  |  | ||||||
|     $ docker exec -it <container-name> <command> <arguments> |  | ||||||
|  |  | ||||||
| Bare metal: |  | ||||||
|  |  | ||||||
| .. code:: shell-session |  | ||||||
|  |  | ||||||
|     $ cd /path/to/paperless/src |  | ||||||
|     $ python3 manage.py <command> <arguments> |  | ||||||
|  |  | ||||||
| All commands have built-in help, which can be accessed by executing them with |  | ||||||
| the argument ``--help``. |  | ||||||
|  |  | ||||||
| .. _utilities-exporter: |  | ||||||
|  |  | ||||||
| Document exporter |  | ||||||
| ================= |  | ||||||
|  |  | ||||||
| The document exporter exports all your data from paperless into a folder for |  | ||||||
| backup or migration to another DMS. |  | ||||||
|  |  | ||||||
| If you use the document exporter within a cronjob to backup your data you might use the ``-T`` flag behind exec to suppress "The input device is not a TTY" errors. For example: ``docker-compose exec -T webserver document_exporter ../export`` |  | ||||||
|  |  | ||||||
| .. code:: |  | ||||||
|  |  | ||||||
|     document_exporter target [-c] [-f] [-d] |  | ||||||
|  |  | ||||||
|     optional arguments: |  | ||||||
|     -c, --compare-checksums |  | ||||||
|     -f, --use-filename-format |  | ||||||
|     -d, --delete |  | ||||||
|  |  | ||||||
| ``target`` is a folder to which the data gets written. This includes documents, |  | ||||||
| thumbnails and a ``manifest.json`` file. The manifest contains all metadata from |  | ||||||
| the database (correspondents, tags, etc). |  | ||||||
|  |  | ||||||
| When you use the provided docker compose script, specify ``../export`` as the |  | ||||||
| target. This path inside the container is automatically mounted on your host on |  | ||||||
| the folder ``export``. |  | ||||||
|  |  | ||||||
| If the target directory already exists and contains files, paperless will assume |  | ||||||
| that the contents of the export directory are a previous export and will attempt |  | ||||||
| to update the previous export. Paperless will only export changed and added files. |  | ||||||
| Paperless determines whether a file has changed by inspecting the file attributes |  | ||||||
| "date/time modified" and "size". If that does not work out for you, specify |  | ||||||
| ``--compare-checksums`` and paperless will attempt to compare file checksums instead. |  | ||||||
| This is slower. |  | ||||||
|  |  | ||||||
| Paperless will not remove any existing files in the export directory. If you want |  | ||||||
| paperless to also remove files that do not belong to the current export such as files |  | ||||||
| from deleted documents, specify ``--delete``. Be careful when pointing paperless to |  | ||||||
| a directory that already contains other files. |  | ||||||
|  |  | ||||||
| The filenames generated by this command follow the format |  | ||||||
| ``[date created] [correspondent] [title].[extension]``. |  | ||||||
| If you want paperless to use ``PAPERLESS_FILENAME_FORMAT`` for exported filenames |  | ||||||
| instead, specify ``--use-filename-format``. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| .. _utilities-importer: |  | ||||||
|  |  | ||||||
| Document importer |  | ||||||
| ================= |  | ||||||
|  |  | ||||||
| The document importer takes the export produced by the `Document exporter`_ and |  | ||||||
| imports it into paperless. |  | ||||||
|  |  | ||||||
| The importer works just like the exporter.  You point it at a directory, and |  | ||||||
| the script does the rest of the work: |  | ||||||
|  |  | ||||||
| .. code:: |  | ||||||
|  |  | ||||||
|     document_importer source |  | ||||||
|  |  | ||||||
| When you use the provided docker compose script, put the export inside the |  | ||||||
| ``export`` folder in your paperless source directory. Specify ``../export`` |  | ||||||
| as the ``source``. |  | ||||||
|  |  | ||||||
| .. note:: |  | ||||||
|  |  | ||||||
|     Importing from a previous version of Paperless may work, but for best results |  | ||||||
|     it is suggested to match the versions. |  | ||||||
|  |  | ||||||
| .. _utilities-retagger: |  | ||||||
|  |  | ||||||
| Document retagger |  | ||||||
| ================= |  | ||||||
|  |  | ||||||
| Say you've imported a few hundred documents and now want to introduce |  | ||||||
| a tag or set up a new correspondent, and apply its matching to all of |  | ||||||
| the currently-imported docs. This problem is common enough that |  | ||||||
| there are tools for it. |  | ||||||
|  |  | ||||||
| .. code:: |  | ||||||
|  |  | ||||||
|     document_retagger [-h] [-c] [-T] [-t] [-i] [--use-first] [-f] |  | ||||||
|  |  | ||||||
|     optional arguments: |  | ||||||
|     -c, --correspondent |  | ||||||
|     -T, --tags |  | ||||||
|     -t, --document_type |  | ||||||
|     -s, --storage_path |  | ||||||
|     -i, --inbox-only |  | ||||||
|     --use-first |  | ||||||
|     -f, --overwrite |  | ||||||
|  |  | ||||||
| Run this after changing or adding matching rules. It'll loop over all |  | ||||||
| of the documents in your database and attempt to match documents |  | ||||||
| according to the new rules. |  | ||||||
|  |  | ||||||
| Specify any combination of ``-c``, ``-T``, ``-t`` and ``-s`` to have the |  | ||||||
| retagger perform matching of the specified metadata type. If you don't |  | ||||||
| specify any of these options, the document retagger won't do anything. |  | ||||||
|  |  | ||||||
| Specify ``-i`` to have the document retagger work on documents tagged |  | ||||||
| with inbox tags only. This is useful when you don't want to mess with |  | ||||||
| your already processed documents. |  | ||||||
|  |  | ||||||
| When multiple document types or correspondents match a single document, |  | ||||||
| the retagger won't assign these to the document. Specify ``--use-first`` |  | ||||||
| to override this behavior and just use the first correspondent or type |  | ||||||
| it finds. This option does not apply to tags, since any amount of tags |  | ||||||
| can be applied to a document. |  | ||||||
|  |  | ||||||
| Finally, ``-f`` specifies that you wish to overwrite already assigned |  | ||||||
| correspondents, types and/or tags. The default behavior is to not |  | ||||||
| assign correspondents and types to documents that have this data already |  | ||||||
| assigned. ``-f`` works differently for tags: By default, only additional tags get |  | ||||||
| added to documents, no tags will be removed. With ``-f``, tags that don't |  | ||||||
| match a document anymore get removed as well. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| Managing the Automatic matching algorithm |  | ||||||
| ========================================= |  | ||||||
|  |  | ||||||
| The *Auto* matching algorithm requires a trained neural network to work. |  | ||||||
| This network needs to be updated whenever somethings in your data |  | ||||||
| changes. The docker image takes care of that automatically with the task |  | ||||||
| scheduler. You can manually renew the classifier by invoking the following |  | ||||||
| management command: |  | ||||||
|  |  | ||||||
| .. code:: |  | ||||||
|  |  | ||||||
|     document_create_classifier |  | ||||||
|  |  | ||||||
| This command takes no arguments. |  | ||||||
|  |  | ||||||
| .. _`administration-index`: |  | ||||||
|  |  | ||||||
| Managing the document search index |  | ||||||
| ================================== |  | ||||||
|  |  | ||||||
| The document search index is responsible for delivering search results for the |  | ||||||
| website. The document index is automatically updated whenever documents get |  | ||||||
| added to, changed, or removed from paperless. However, if the search yields |  | ||||||
| non-existing documents or won't find anything, you may need to recreate the |  | ||||||
| index manually. |  | ||||||
|  |  | ||||||
| .. code:: |  | ||||||
|  |  | ||||||
|     document_index {reindex,optimize} |  | ||||||
|  |  | ||||||
| Specify ``reindex`` to have the index created from scratch. This may take some |  | ||||||
| time. |  | ||||||
|  |  | ||||||
| Specify ``optimize`` to optimize the index. This updates certain aspects of |  | ||||||
| the index and usually makes queries faster and also ensures that the |  | ||||||
| autocompletion works properly. This command is regularly invoked by the task |  | ||||||
| scheduler. |  | ||||||
|  |  | ||||||
| .. _utilities-renamer: |  | ||||||
|  |  | ||||||
| Managing filenames |  | ||||||
| ================== |  | ||||||
|  |  | ||||||
| If you use paperless' feature to |  | ||||||
| :ref:`assign custom filenames to your documents <advanced-file_name_handling>`, |  | ||||||
| you can use this command to move all your files after changing |  | ||||||
| the naming scheme. |  | ||||||
|  |  | ||||||
| .. warning:: |  | ||||||
|  |  | ||||||
|     Since this command moves your documents, it is advised to do |  | ||||||
|     a backup beforehand. The renaming logic is robust and will never overwrite |  | ||||||
|     or delete a file, but you can't ever be careful enough. |  | ||||||
|  |  | ||||||
| .. code:: |  | ||||||
|  |  | ||||||
|     document_renamer |  | ||||||
|  |  | ||||||
| The command takes no arguments and processes all your documents at once. |  | ||||||
|  |  | ||||||
| Learn how to use :ref:`Management Utilities<utilities-management-commands>`. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| .. _utilities-sanity-checker: |  | ||||||
|  |  | ||||||
| Sanity checker |  | ||||||
| ============== |  | ||||||
|  |  | ||||||
| Paperless has a built-in sanity checker that inspects your document collection for issues. |  | ||||||
|  |  | ||||||
| The issues detected by the sanity checker are as follows: |  | ||||||
|  |  | ||||||
| * Missing original files. |  | ||||||
| * Missing archive files. |  | ||||||
| * Inaccessible original files due to improper permissions. |  | ||||||
| * Inaccessible archive files due to improper permissions. |  | ||||||
| * Corrupted original documents by comparing their checksum against what is stored in the database. |  | ||||||
| * Corrupted archive documents by comparing their checksum against what is stored in the database. |  | ||||||
| * Missing thumbnails. |  | ||||||
| * Inaccessible thumbnails due to improper permissions. |  | ||||||
| * Documents without any content (warning). |  | ||||||
| * Orphaned files in the media directory (warning). These are files that are not referenced by any document im paperless. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| .. code:: |  | ||||||
|  |  | ||||||
|     document_sanity_checker |  | ||||||
|  |  | ||||||
| The command takes no arguments. Depending on the size of your document archive, this may take some time. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| Fetching e-mail |  | ||||||
| =============== |  | ||||||
|  |  | ||||||
| Paperless automatically fetches your e-mail every 10 minutes by default. If |  | ||||||
| you want to invoke the email consumer manually, call the following management |  | ||||||
| command: |  | ||||||
|  |  | ||||||
| .. code:: |  | ||||||
|  |  | ||||||
|     mail_fetcher |  | ||||||
|  |  | ||||||
| The command takes no arguments and processes all your mail accounts and rules. |  | ||||||
|  |  | ||||||
|  .. note:: |  | ||||||
|  |  | ||||||
|     As of October 2022 Microsoft no longer supports IMAP authentication for Exchange |  | ||||||
|     servers, thus Exchange is no longer supported until a solution is implemented in |  | ||||||
|     the Python IMAP library used by Paperless. See  `learn.microsoft.com`_ |  | ||||||
|  |  | ||||||
| .. _learn.microsoft.com: https://learn.microsoft.com/en-us/exchange/clients-and-mobile-in-exchange-online/deprecation-of-basic-authentication-exchange-online |  | ||||||
|  |  | ||||||
| .. _utilities-archiver: |  | ||||||
|  |  | ||||||
| Creating archived documents |  | ||||||
| =========================== |  | ||||||
|  |  | ||||||
| Paperless stores archived PDF/A documents alongside your original documents. |  | ||||||
| These archived documents will also contain selectable text for image-only |  | ||||||
| originals. |  | ||||||
| These documents are derived from the originals, which are always stored |  | ||||||
| unmodified. If coming from an earlier version of paperless, your documents |  | ||||||
| won't have archived versions. |  | ||||||
|  |  | ||||||
| This command creates PDF/A documents for your documents. |  | ||||||
|  |  | ||||||
| .. code:: |  | ||||||
|  |  | ||||||
|     document_archiver --overwrite --document <id> |  | ||||||
|  |  | ||||||
| This command will only attempt to create archived documents when no archived |  | ||||||
| document exists yet, unless ``--overwrite`` is specified. If ``--document <id>`` |  | ||||||
| is specified, the archiver will only process that document. |  | ||||||
|  |  | ||||||
| .. note:: |  | ||||||
|  |  | ||||||
|     This command essentially performs OCR on all your documents again, |  | ||||||
|     according to your settings. If you run this with ``PAPERLESS_OCR_MODE=redo``, |  | ||||||
|     it will potentially run for a very long time. You can cancel the command |  | ||||||
|     at any time, since this command will skip already archived versions the next time |  | ||||||
|     it is run. |  | ||||||
|  |  | ||||||
| .. note:: |  | ||||||
|  |  | ||||||
|     Some documents will cause errors and cannot be converted into PDF/A documents, |  | ||||||
|     such as encrypted PDF documents. The archiver will skip over these documents |  | ||||||
|     each time it sees them. |  | ||||||
|  |  | ||||||
| .. _utilities-encyption: |  | ||||||
|  |  | ||||||
| Managing encryption |  | ||||||
| =================== |  | ||||||
|  |  | ||||||
| Documents can be stored in Paperless using GnuPG encryption. |  | ||||||
|  |  | ||||||
| .. danger:: |  | ||||||
|  |  | ||||||
|     Encryption is deprecated since paperless-ngx 0.9 and doesn't really provide any |  | ||||||
|     additional security, since you have to store the passphrase in a configuration |  | ||||||
|     file on the same system as the encrypted documents for paperless to work. |  | ||||||
|     Furthermore, the entire text content of the documents is stored plain in the |  | ||||||
|     database, even if your documents are encrypted. Filenames are not encrypted as |  | ||||||
|     well. |  | ||||||
|  |  | ||||||
|     Also, the web server provides transparent access to your encrypted documents. |  | ||||||
|  |  | ||||||
|     Consider running paperless on an encrypted filesystem instead, which will then |  | ||||||
|     at least provide security against physical hardware theft. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| Enabling encryption |  | ||||||
| ------------------- |  | ||||||
|  |  | ||||||
| Enabling encryption is no longer supported. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| Disabling encryption |  | ||||||
| -------------------- |  | ||||||
|  |  | ||||||
| Basic usage to disable encryption of your document store: |  | ||||||
|  |  | ||||||
| (Note: If ``PAPERLESS_PASSPHRASE`` isn't set already, you need to specify it here) |  | ||||||
|  |  | ||||||
| .. code:: |  | ||||||
|  |  | ||||||
|     decrypt_documents [--passphrase SECR3TP4SSPHRA$E] |  | ||||||
|   | |||||||
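
The removed exporter text above describes two ways of deciding whether an already exported file is stale: comparing "date/time modified" and "size", or comparing checksums with `--compare-checksums`. The sketch below only illustrates that idea and is not Paperless' implementation: `needsExport` and `sha256` are hypothetical helpers, and both the choice of SHA-256 and the comparison of source against target file are assumptions.

```javascript
const fs = require("fs");
const crypto = require("crypto");

// Hypothetical helper: hex digest of a file's contents (SHA-256 is an assumption).
function sha256(file) {
  return crypto.createHash("sha256").update(fs.readFileSync(file)).digest("hex");
}

// Hypothetical helper: decide whether `source` must be copied to `target` again.
function needsExport(source, target, compareChecksums = false) {
  // Nothing was exported yet, so the file always needs exporting.
  if (!fs.existsSync(target)) return true;
  // Slower but content-based: read both files and compare digests.
  if (compareChecksums) return sha256(source) !== sha256(target);
  // Fast path: compare size and modification time only.
  const s = fs.statSync(source);
  const t = fs.statSync(target);
  return s.size !== t.size || s.mtimeMs !== t.mtimeMs;
}
```

The fast path reads no file contents, which is why it is cheaper, but it relies on the copy step preserving modification times. The checksum path reads every file in full, which matches the removed text's remark that this mode is slower.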
| @@ -1,447 +1,11 @@ | |||||||
|  | .. _advanced_usage: | ||||||
|  |  | ||||||
| *************** | *************** | ||||||
| Advanced topics | Advanced topics | ||||||
| *************** | *************** | ||||||
|  |  | ||||||
| Paperless offers a couple features that automate certain tasks and make your life | .. cssclass:: redirect-notice | ||||||
| easier. |  | ||||||
|  |  | ||||||
| .. _advanced-matching: |     The Paperless-ngx documentation has permanently moved. | ||||||
|  |  | ||||||
| Matching tags, correspondents, document types, and storage paths |     You will be redirected shortly... | ||||||
| ################################################################ |  | ||||||
|  |  | ||||||
| Paperless will compare the matching algorithms defined by every tag, correspondent, |  | ||||||
| document type, and storage path in your database to see if they apply to the text |  | ||||||
| in a document. In other words, if you define a tag called ``Home Utility`` |  | ||||||
| that had a ``match`` property of ``bc hydro`` and a ``matching_algorithm`` of |  | ||||||
| ``literal``, Paperless will automatically tag your newly-consumed document with |  | ||||||
| your ``Home Utility`` tag so long as the text ``bc hydro`` appears in the body |  | ||||||
| of the document somewhere. |  | ||||||
|  |  | ||||||
| The matching logic is quite powerful. It supports searching the text of your |  | ||||||
| document with different algorithms, and as such, some experimentation may be |  | ||||||
| necessary to get things right. |  | ||||||
|  |  | ||||||
| In order to have a tag, correspondent, document type, or storage path assigned |  | ||||||
| automatically to newly consumed documents, assign a match and matching algorithm |  | ||||||
| using the web interface. These settings define when to assign tags, correspondents, |  | ||||||
| document types, and storage paths to documents. |  | ||||||
|  |  | ||||||
| The following algorithms are available: |  | ||||||
|  |  | ||||||
| * **Any:** Looks for any occurrence of any word provided in match in the PDF. |  | ||||||
|   If you define the match as ``Bank1 Bank2``, it will match documents containing |  | ||||||
|   either of these terms. |  | ||||||
| * **All:** Requires that every word provided appears in the PDF, albeit not in the |  | ||||||
|   order provided. |  | ||||||
| * **Literal:** Matches only if the match appears exactly as provided (i.e. preserve ordering) in the PDF. |  | ||||||
| * **Regular expression:** Parses the match as a regular expression and tries to |  | ||||||
|   find a match within the document. |  | ||||||
| * **Fuzzy match:** I don't know. Look at the source. |  | ||||||
| * **Auto:** Tries to automatically match new documents. This does not require you |  | ||||||
|   to set a match. See the notes below. |  | ||||||
|  |  | ||||||
| When using the *any* or *all* matching algorithms, you can search for terms |  | ||||||
| that consist of multiple words by enclosing them in double quotes. For example, |  | ||||||
| defining a match text of ``"Bank of America" BofA`` using the *any* algorithm, |  | ||||||
| will match documents that contain either "Bank of America" or "BofA", but will |  | ||||||
| not match documents containing "Bank of South America". |  | ||||||
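The behaviour of the *any*, *all* and *literal* algorithms, including quoted multi-word terms, can be sketched roughly as follows. This is a minimal illustration and not the Paperless implementation: the function names are hypothetical, and whole-word, case-insensitive comparison is an assumption made for the example.

```python
import re


def split_match_terms(match: str) -> list[str]:
    """Split a match string into terms, keeping "double quoted" phrases together."""
    pairs = re.findall(r'"([^"]+)"|(\S+)', match)
    return [quoted or bare for quoted, bare in pairs]


def _contains(term: str, text: str) -> bool:
    # Assumption: whole-word, case-insensitive search; the words of a phrase
    # may be separated by any amount of whitespace.
    words = [re.escape(word) for word in term.split()]
    pattern = r"\b" + r"\s+".join(words) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def matches_any(match: str, text: str) -> bool:
    """True if at least one term occurs in the text."""
    return any(_contains(term, text) for term in split_match_terms(match))


def matches_all(match: str, text: str) -> bool:
    """True if every term occurs in the text, in any order."""
    return all(_contains(term, text) for term in split_match_terms(match))


def matches_literal(match: str, text: str) -> bool:
    """True only if the whole match occurs in the text in the given order."""
    return _contains(match, text)
```

With the match text ``"Bank of America" BofA``, ``matches_any`` accepts a document containing "Bank of America" or "BofA" but rejects one that only contains "Bank of South America".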
|  |  | ||||||
| Then just save your tag, correspondent, document type, or storage path and run |  | ||||||
| another document through the consumer.  Once complete, you should see the |  | ||||||
| newly-created document, automatically tagged with the appropriate data. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| .. _advanced-automatic_matching: |  | ||||||
|  |  | ||||||
| Automatic matching |  | ||||||
| ================== |  | ||||||
|  |  | ||||||
| Paperless-ngx comes with a new matching algorithm called *Auto*. This matching |  | ||||||
| algorithm tries to assign tags, correspondents, document types, and storage paths |  | ||||||
| to your documents based on how you have already assigned these on existing documents. |  | ||||||
| It uses a neural network under the hood. |  | ||||||
|  |  | ||||||
| If, for example, all your bank statements of your account 123 at the Bank of |  | ||||||
| America are tagged with the tag "bofa_123" and the matching algorithm of this |  | ||||||
| tag is set to *Auto*, this neural network will examine your documents and |  | ||||||
| automatically learn when to assign this tag. |  | ||||||
|  |  | ||||||
| Paperless tries to hide much of the involved complexity with this approach. |  | ||||||
| However, there are a couple caveats you need to keep in mind when using this |  | ||||||
| feature: |  | ||||||
|  |  | ||||||
| * Changes to your documents are not immediately reflected by the matching |  | ||||||
|   algorithm. The neural network needs to be *trained* on your documents after |  | ||||||
|   changes. Paperless periodically (default: once each hour) checks for changes |  | ||||||
|   and does this automatically for you. |  | ||||||
| * The Auto matching algorithm only takes documents into account which are NOT |  | ||||||
|   placed in your inbox (i.e. have no inbox tags assigned to them). This ensures |  | ||||||
|   that the neural network only learns from documents which you have correctly |  | ||||||
|   tagged before. |  | ||||||
| * The matching algorithm can only work if there is a correlation between the |  | ||||||
|   tag, correspondent, document type, or storage path and the document itself. |  | ||||||
|   Your bank statements usually contain your bank account number and the name |  | ||||||
|   of the bank, so this works reasonably well. However, tags such as "TODO" |  | ||||||
|   cannot be automatically assigned. |  | ||||||
| * The matching algorithm needs a reasonable number of documents to identify when |  | ||||||
|   to assign tags, correspondents, storage paths, and types. If one out of a |  | ||||||
|   thousand documents has the correspondent "Very obscure web shop I bought |  | ||||||
|   something five years ago", it will probably not assign this correspondent |  | ||||||
|   automatically if you buy something from them again. The more documents, the better. |  | ||||||
| * Paperless also needs a reasonable amount of negative examples to decide when |  | ||||||
|   not to assign a certain tag, correspondent, document type, or storage path. This will |  | ||||||
|   usually be the case as you start filling up paperless with documents. |  | ||||||
|   Example: If all your documents are from either "Webshop" or "Bank", paperless |  | ||||||
|   will assign one of these correspondents to ANY new document, if both are set |  | ||||||
|   to automatic matching. |  | ||||||
|  |  | ||||||
| Hooking into the consumption process |  | ||||||
| #################################### |  | ||||||
|  |  | ||||||
| Sometimes you may want to do something arbitrary whenever a document is |  | ||||||
| consumed.  Rather than try to predict what you may want to do, Paperless lets |  | ||||||
| you execute scripts of your own choosing just before or after a document is |  | ||||||
| consumed using a couple simple hooks. |  | ||||||
|  |  | ||||||
| Just write a script, put it somewhere that Paperless can read & execute, and |  | ||||||
| then put the path to that script in ``paperless.conf`` or ``docker-compose.env`` with the variable name |  | ||||||
| of either ``PAPERLESS_PRE_CONSUME_SCRIPT`` or |  | ||||||
| ``PAPERLESS_POST_CONSUME_SCRIPT``. |  | ||||||
|  |  | ||||||
| .. important:: |  | ||||||
|  |  | ||||||
|     These scripts are executed in a **blocking** process, which means that if |  | ||||||
|     a script takes a long time to run, it can significantly slow down your |  | ||||||
|     document consumption flow.  If you want things to run asynchronously, |  | ||||||
|     you'll have to fork the process in your script and exit. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| Pre-consumption script |  | ||||||
| ====================== |  | ||||||
|  |  | ||||||
| Executed after the consumer sees a new document in the consumption folder, but |  | ||||||
| before any processing of the document is performed. This script can access the |  | ||||||
| following environment variable: |  | ||||||
|  |  | ||||||
| * ``DOCUMENT_SOURCE_PATH`` |  | ||||||
|  |  | ||||||
| A simple but common example for this would be creating a simple script like |  | ||||||
| this: |  | ||||||
|  |  | ||||||
| ``/usr/local/bin/ocr-pdf`` |  | ||||||
|  |  | ||||||
| .. code:: bash |  | ||||||
|  |  | ||||||
|     #!/usr/bin/env bash |  | ||||||
|     pdf2pdfocr.py -i "${DOCUMENT_SOURCE_PATH}" |  | ||||||
|  |  | ||||||
| ``/etc/paperless.conf`` |  | ||||||
|  |  | ||||||
| .. code:: bash |  | ||||||
|  |  | ||||||
|     ... |  | ||||||
|     PAPERLESS_PRE_CONSUME_SCRIPT="/usr/local/bin/ocr-pdf" |  | ||||||
|     ... |  | ||||||
|  |  | ||||||
| This passes the path of the document about to be consumed to ``/usr/local/bin/ocr-pdf``, |  | ||||||
| which in turn calls `pdf2pdfocr.py`_ on your document. That tool overwrites the file |  | ||||||
| with an OCR'd version and exits, after which the consumption process begins with the |  | ||||||
| newly modified file. |  | ||||||
|  |  | ||||||
| The script's stdout and stderr will be logged line by line to the webserver log, along |  | ||||||
| with the exit code of the script. |  | ||||||
|  |  | ||||||
| .. _pdf2pdfocr.py: https://github.com/LeoFCardoso/pdf2pdfocr |  | ||||||
|  |  | ||||||
| .. _advanced-post_consume_script: |  | ||||||
|  |  | ||||||
| Post-consumption script |  | ||||||
| ======================= |  | ||||||
|  |  | ||||||
| Executed after the consumer has successfully processed a document and has moved it |  | ||||||
| into paperless. It receives the following environment variables: |  | ||||||
|  |  | ||||||
| * ``DOCUMENT_ID`` |  | ||||||
| * ``DOCUMENT_FILE_NAME`` |  | ||||||
| * ``DOCUMENT_CREATED`` |  | ||||||
| * ``DOCUMENT_MODIFIED`` |  | ||||||
| * ``DOCUMENT_ADDED`` |  | ||||||
| * ``DOCUMENT_SOURCE_PATH`` |  | ||||||
| * ``DOCUMENT_ARCHIVE_PATH`` |  | ||||||
| * ``DOCUMENT_THUMBNAIL_PATH`` |  | ||||||
| * ``DOCUMENT_DOWNLOAD_URL`` |  | ||||||
| * ``DOCUMENT_THUMBNAIL_URL`` |  | ||||||
| * ``DOCUMENT_CORRESPONDENT`` |  | ||||||
| * ``DOCUMENT_TAGS`` |  | ||||||
| * ``DOCUMENT_ORIGINAL_FILENAME`` |  | ||||||
|  |  | ||||||
| The script can be in any language, but for a simple shell script |  | ||||||
| example, you can take a look at `post-consumption-example.sh`_ in this project. |  | ||||||
|  |  | ||||||
| The post consumption script cannot cancel the consumption process. |  | ||||||
|  |  | ||||||
| The script's stdout and stderr will be logged line by line to the webserver log, along |  | ||||||
| with the exit code of the script. |  | ||||||
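Because the script can be written in any language, a post-consumption script might also be a small Python program. The following is a minimal sketch that prints a few of the environment variables listed above as one JSON line; the selection of fields and the function name ``summarize`` are illustrative choices, not part of Paperless.

```python
#!/usr/bin/env python3
import json
import os

# A subset of the variables listed above; extend as needed.
FIELDS = (
    "DOCUMENT_ID",
    "DOCUMENT_FILE_NAME",
    "DOCUMENT_CORRESPONDENT",
    "DOCUMENT_TAGS",
)


def summarize(environ) -> str:
    """Return the selected variables as one JSON line (missing ones become null)."""
    record = {name: environ.get(name) for name in FIELDS}
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    # Anything written to stdout ends up in the webserver log.
    print(summarize(os.environ))
```

The script must be executable and readable by Paperless, and it runs in the blocking consumption process, so it should return quickly.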
|  |  | ||||||
|  |  | ||||||
| Docker |  | ||||||
| ------ |  | ||||||
| Assume you have ``/home/foo/paperless-ngx/scripts/post-consumption-example.sh``. |  | ||||||
|  |  | ||||||
| You can pass that script into the consumer container via a host mount in your ``docker-compose.yml``. |  | ||||||
|  |  | ||||||
| .. code:: yaml |  | ||||||
|  |  | ||||||
|   ... |  | ||||||
|   consumer: |  | ||||||
|     ... |  | ||||||
|     volumes: |  | ||||||
|       ... |  | ||||||
|       - /home/foo/paperless-ngx/scripts:/path/in/container/scripts/ |  | ||||||
|   ... |  | ||||||
|  |  | ||||||
| Example (docker-compose.yml): ``- /home/foo/paperless-ngx/scripts:/usr/src/paperless/scripts`` |  | ||||||
|  |  | ||||||
| This in turn requires the variable ``PAPERLESS_POST_CONSUME_SCRIPT`` in ``docker-compose.env`` to point to ``/path/in/container/scripts/post-consumption-example.sh``. |  | ||||||
|  |  | ||||||
| Example (docker-compose.env): ``PAPERLESS_POST_CONSUME_SCRIPT=/usr/src/paperless/scripts/post-consumption-example.sh`` |  | ||||||
|  |  | ||||||
| Troubleshooting: |  | ||||||
|  |  | ||||||
| - Monitor the docker-compose log: ``cd ~/paperless-ngx; docker-compose logs -f`` |  | ||||||
| - Check your script's permissions, e.g. in case of a permission error run ``sudo chmod 755 post-consumption-example.sh`` |  | ||||||
| - Pipe your script's output to a log file, e.g. ``echo "${DOCUMENT_ID}" | tee --append /usr/src/paperless/scripts/post-consumption-example.log`` |  | ||||||
|  |  | ||||||
| .. _post-consumption-example.sh: https://github.com/paperless-ngx/paperless-ngx/blob/main/scripts/post-consumption-example.sh |  | ||||||
|  |  | ||||||
| .. _advanced-file_name_handling: |  | ||||||
|  |  | ||||||
| File name handling |  | ||||||
| ################## |  | ||||||
|  |  | ||||||
| By default, paperless stores your documents in the media directory and renames them |  | ||||||
| using the identifier which it has assigned to each document. You will end up getting |  | ||||||
| files like ``0000123.pdf`` in your media directory. This isn't necessarily a bad |  | ||||||
| thing, because you normally don't have to access these files manually. However, if |  | ||||||
| you wish to name your files differently, you can do that by adjusting the |  | ||||||
| ``PAPERLESS_FILENAME_FORMAT`` configuration option. Paperless adds the correct |  | ||||||
| file extension e.g. ``.pdf``, ``.jpg`` automatically. |  | ||||||
|  |  | ||||||
| This variable allows you to configure the filename (folders are allowed) using |  | ||||||
| placeholders. For example, configuring this to |  | ||||||
|  |  | ||||||
| .. code:: bash |  | ||||||
|  |  | ||||||
|     PAPERLESS_FILENAME_FORMAT={created_year}/{correspondent}/{title} |  | ||||||
|  |  | ||||||
| will create a directory structure as follows: |  | ||||||
|  |  | ||||||
| .. code:: |  | ||||||
|  |  | ||||||
|     2019/ |  | ||||||
|       My bank/ |  | ||||||
|         Statement January.pdf |  | ||||||
|         Statement February.pdf |  | ||||||
|     2020/ |  | ||||||
|       My bank/ |  | ||||||
|         Statement January.pdf |  | ||||||
|         Letter.pdf |  | ||||||
|         Letter_01.pdf |  | ||||||
|       Shoe store/ |  | ||||||
|         My new shoes.pdf |  | ||||||
|  |  | ||||||
| .. danger:: |  | ||||||
|  |  | ||||||
|     Do not manually move your files in the media folder. Paperless remembers the |  | ||||||
|     last filename a document was stored as. If you do rename a file, paperless will |  | ||||||
|     report your files as missing and won't be able to find them. |  | ||||||
|  |  | ||||||
| Paperless provides the following placeholders within filenames: |  | ||||||
|  |  | ||||||
| * ``{asn}``: The archive serial number of the document, or "none". |  | ||||||
| * ``{correspondent}``: The name of the correspondent, or "none". |  | ||||||
| * ``{document_type}``: The name of the document type, or "none". |  | ||||||
| * ``{tag_list}``: A comma separated list of all tags assigned to the document. |  | ||||||
| * ``{title}``: The title of the document. |  | ||||||
| * ``{created}``: The full date (ISO format) the document was created. |  | ||||||
| * ``{created_year}``: Year created only, formatted as the year with century. |  | ||||||
| * ``{created_year_short}``: Year created only, formatted as the year without century, zero padded. |  | ||||||
| * ``{created_month}``: Month created only (number 01-12). |  | ||||||
| * ``{created_month_name}``: Month created name, as per locale |  | ||||||
| * ``{created_month_name_short}``: Month created abbreviated name, as per locale |  | ||||||
| * ``{created_day}``: Day created only (number 01-31). |  | ||||||
| * ``{added}``: The full date (ISO format) the document was added to paperless. |  | ||||||
| * ``{added_year}``: Year added only. |  | ||||||
| * ``{added_year_short}``: Year added only, formatted as the year without century, zero padded. |  | ||||||
| * ``{added_month}``: Month added only (number 01-12). |  | ||||||
| * ``{added_month_name}``: Month added name, as per locale |  | ||||||
| * ``{added_month_name_short}``: Month added abbreviated name, as per locale |  | ||||||
| * ``{added_day}``: Day added only (number 01-31). |  | ||||||
|  |  | ||||||
|  |  | ||||||
| Paperless will try to conserve the information from your database as much as possible. |  | ||||||
| However, some characters that you can use in document titles and correspondent names (such |  | ||||||
| as ``: \ /`` and a couple more) are not allowed in filenames and will be replaced with dashes. |  | ||||||
|  |  | ||||||
| If paperless detects that two documents share the same filename, paperless will automatically |  | ||||||
| append ``_01``, ``_02``, etc to the filename. This happens if all the placeholders in a filename |  | ||||||
| evaluate to the same value. |  | ||||||
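How placeholders, character replacement and the ``_01`` suffix interact can be illustrated with a short sketch. This is not the Paperless implementation: ``sanitize`` and ``render_path`` are hypothetical helpers, only the characters ``: \ /`` are replaced here, and every empty value falls back to "none" for simplicity.

```python
import re


def sanitize(value) -> str:
    """Replace characters that are not allowed in filenames with dashes."""
    if value is None or value == "":
        return "none"  # simplification: every empty value becomes "none"
    return re.sub(r"[:\\/]", "-", str(value))


def render_path(fmt: str, values: dict, taken: set, extension: str = ".pdf") -> str:
    """Fill the placeholders and append _01, _02, ... if the name is taken."""
    base = fmt.format(**{key: sanitize(value) for key, value in values.items()})
    candidate = base + extension
    counter = 0
    while candidate in taken:
        counter += 1
        candidate = f"{base}_{counter:02d}{extension}"
    taken.add(candidate)
    return candidate


taken_names = set()
letter = {"created_year": 2020, "correspondent": "My bank", "title": "Letter"}
first = render_path("{created_year}/{correspondent}/{title}", letter, taken_names)
second = render_path("{created_year}/{correspondent}/{title}", letter, taken_names)
```

Here ``first`` is ``2020/My bank/Letter.pdf`` and ``second`` is ``2020/My bank/Letter_01.pdf``, mirroring the directory listing shown earlier.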
|  |  | ||||||
| .. hint:: |  | ||||||
|     You can affect how empty placeholders are treated by changing the following setting to |  | ||||||
|     `true`. |  | ||||||
|  |  | ||||||
|     .. code:: |  | ||||||
|  |  | ||||||
|         PAPERLESS_FILENAME_FORMAT_REMOVE_NONE=True |  | ||||||
|  |  | ||||||
|     Doing this results in all empty placeholders resolving to "" instead of "none" as stated above. |  | ||||||
|     Spaces before empty placeholders are removed as well, and empty directories are omitted. |  | ||||||
|  |  | ||||||
| .. hint:: |  | ||||||
|  |  | ||||||
|     Paperless checks the filename of a document whenever it is saved. Therefore, |  | ||||||
|     you need to update the filenames of your documents and move them after altering |  | ||||||
|     this setting by invoking the :ref:`document renamer <utilities-renamer>`. |  | ||||||
|  |  | ||||||
| .. warning:: |  | ||||||
|  |  | ||||||
|     Make absolutely sure you get the spelling of the placeholders right, or else |  | ||||||
|     paperless will use the default naming scheme instead. |  | ||||||
|  |  | ||||||
| .. caution:: |  | ||||||
|  |  | ||||||
|     As of now, you could totally tell paperless to store your files anywhere outside |  | ||||||
|     the media directory by setting |  | ||||||
|  |  | ||||||
|     .. code:: |  | ||||||
|  |  | ||||||
|         PAPERLESS_FILENAME_FORMAT=../../my/custom/location/{title} |  | ||||||
|  |  | ||||||
|     However, keep in mind that inside docker, if files get stored outside of the |  | ||||||
|     predefined volumes, they will be lost after a restart of paperless. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| Storage paths |  | ||||||
| ############# |  | ||||||
|  |  | ||||||
| One of the best things in Paperless is that you can not only access the documents via the |  | ||||||
| web interface, but also via the file system. |  | ||||||
|  |  | ||||||
| When a single storage layout is not sufficient for your use case, storage paths come to |  | ||||||
| the rescue. Storage paths allow you to configure more precisely where each document is stored |  | ||||||
| in the file system. |  | ||||||
|  |  | ||||||
| - Each storage path is a `PAPERLESS_FILENAME_FORMAT` and follows the rules described above |  | ||||||
| - Each document is assigned a storage path using the matching algorithms described above, but |  | ||||||
|   can be overwritten at any time |  | ||||||
|  |  | ||||||
| For example, you could define the following two storage paths: |  | ||||||
|  |  | ||||||
| 1. Normal communications are put into a folder structure sorted by `year/correspondent` |  | ||||||
| 2. Communications with insurance companies are stored in a flat structure with longer file names, |  | ||||||
|    but containing the full date of the correspondence. |  | ||||||
|  |  | ||||||
| .. code:: |  | ||||||
|  |  | ||||||
|     By Year = {created_year}/{correspondent}/{title} |  | ||||||
|     Insurances = Insurances/{correspondent}/{created_year}-{created_month}-{created_day} {title} |  | ||||||
|  |  | ||||||
|  |  | ||||||
| If you then map these storage paths to the documents, you might get the following result. |  | ||||||
| For simplicity, `By Year` defines the same structure as in the previous example above. |  | ||||||
|  |  | ||||||
| .. code:: text |  | ||||||
|  |  | ||||||
|     2019/                                 # By Year |  | ||||||
|       My bank/ |  | ||||||
|         Statement January.pdf |  | ||||||
|         Statement February.pdf |  | ||||||
|  |  | ||||||
|     Insurances/                           # Insurances |  | ||||||
|       Healthcare 123/ |  | ||||||
|         2022-01-01 Statement January.pdf |  | ||||||
|         2022-02-02 Letter.pdf |  | ||||||
|         2022-02-03 Letter.pdf |  | ||||||
|       Dental 456/ |  | ||||||
|         2021-12-01 New Conditions.pdf |  | ||||||
|  |  | ||||||
|  |  | ||||||
| .. hint:: |  | ||||||
|  |  | ||||||
|     Defining a storage path is optional. If no storage path is defined for a document, the global |  | ||||||
|     `PAPERLESS_FILENAME_FORMAT` is applied. |  | ||||||
|  |  | ||||||
| .. caution:: |  | ||||||
|  |  | ||||||
|     If you adjust the format of an existing storage path, old documents don't get relocated automatically. |  | ||||||
|     You need to run the :ref:`document renamer <utilities-renamer>` to adjust their paths. |  | ||||||
|  |  | ||||||
| .. _advanced-celery-monitoring: |  | ||||||
|  |  | ||||||
| Celery Monitoring |  | ||||||
| ################# |  | ||||||
|  |  | ||||||
| The monitoring tool `Flower <https://flower.readthedocs.io/en/latest/index.html>`_ can be used to view more |  | ||||||
| detailed information about the health of the celery workers used for asynchronous tasks.  This includes details |  | ||||||
| on currently running, queued and completed tasks, timing and more.  Flower can also be used with Prometheus, as it |  | ||||||
| exports metrics.  For details on its capabilities, refer to the Flower documentation. |  | ||||||
|  |  | ||||||
| To configure Flower further, create a `flowerconfig.py` and place it into the `src/paperless` directory.  For |  | ||||||
| a Docker installation, you can use volumes to accomplish this: |  | ||||||
|  |  | ||||||
| .. code:: yaml |  | ||||||
|  |  | ||||||
|     services: |  | ||||||
|       # ... |  | ||||||
|       webserver: |  | ||||||
|         # ... |  | ||||||
|         volumes: |  | ||||||
|           - /path/to/my/flowerconfig.py:/usr/src/paperless/src/paperless/flowerconfig.py:ro |  | ||||||
|  |  | ||||||
| Custom Container Initialization |  | ||||||
| ############################### |  | ||||||
|  |  | ||||||
| The Docker image includes the ability to run custom user scripts during startup.  This could be |  | ||||||
| utilized for installing additional tools or Python packages, for example. |  | ||||||
|  |  | ||||||
| To utilize this, mount a folder containing your scripts to the custom initialization directory, `/custom-cont-init.d` |  | ||||||
| and place scripts you wish to run inside.  For security, the folder and its contents must be owned by `root`. |  | ||||||
| Additionally, scripts must only be writable by `root`. |  | ||||||
|  |  | ||||||
| Your scripts will be run directly before the webserver completes startup.  Scripts will be run by the `root` user. |  | ||||||
| This is an advanced functionality with which you could break functionality or lose data. |  | ||||||
|  |  | ||||||
| For example, using Docker Compose: |  | ||||||
|  |  | ||||||
|  |  | ||||||
| .. code:: yaml |  | ||||||
|  |  | ||||||
|     services: |  | ||||||
|       # ... |  | ||||||
|       webserver: |  | ||||||
|         # ... |  | ||||||
|         volumes: |  | ||||||
|           - /path/to/my/scripts:/custom-cont-init.d:ro |  | ||||||
|  |  | ||||||
| .. _advanced-mysql-caveats: |  | ||||||
|  |  | ||||||
| MySQL Caveats |  | ||||||
| ############# |  | ||||||
|  |  | ||||||
| Case Sensitivity |  | ||||||
| ================ |  | ||||||
|  |  | ||||||
| The database interface does not provide a method to configure a MySQL database to |  | ||||||
| be case sensitive.  This prevents a user from creating both a tag ``Name`` and a tag ``NAME``, |  | ||||||
| as they are considered the same. |  | ||||||
|  |  | ||||||
| Per the Django documentation, enabling case sensitivity requires manual intervention.  To enable |  | ||||||
| case sensitive tables, you can execute the following command against each table: |  | ||||||
|  |  | ||||||
| ``ALTER TABLE <table_name> CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;`` |  | ||||||
|  |  | ||||||
| You can also set the default for new tables (this does NOT affect existing tables) with: |  | ||||||
|  |  | ||||||
| ``ALTER DATABASE <db_name> CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;`` |  | ||||||
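Since the ``ALTER TABLE`` command has to be run once per table, the statements can be generated rather than typed by hand. The sketch below only builds the SQL strings and does not connect to a database; the helper name is hypothetical and the table name in the usage line is only an example.

```python
import re

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def case_sensitive_statements(tables):
    """Build one ALTER TABLE statement per table name."""
    statements = []
    for table in tables:
        # Refuse anything that is not a plain identifier, since the name
        # is interpolated directly into the SQL text.
        if not _IDENTIFIER.match(table):
            raise ValueError(f"unexpected table name: {table!r}")
        statements.append(
            f"ALTER TABLE {table} CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;"
        )
    return statements


# Example table name; list the tables of your own database instead.
example = case_sensitive_statements(["documents_tag"])
```

Review the generated statements and take a database backup before executing them.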
|   | |||||||
							
								
								
									
										299
									
								
								docs/api.rst
									
									
									
									
									
								
							
							
						
						
									
									
									
									
									
									
								
							| @@ -1,303 +1,12 @@ | |||||||
|  | .. _api: | ||||||
|  |  | ||||||
| ************ | ************ | ||||||
| The REST API | The REST API | ||||||
| ************ | ************ | ||||||
|  |  | ||||||
|  |  | ||||||
| Paperless makes use of the `Django REST Framework`_ standard API interface. | .. cssclass:: redirect-notice | ||||||
| It provides a browsable API for most of its endpoints, which you can inspect |  | ||||||
| at ``http://<paperless-host>:<port>/api/``. This also documents most of the |  | ||||||
| available filters and ordering fields. |  | ||||||
|  |  | ||||||
| .. _Django REST Framework: http://django-rest-framework.org/ |     The Paperless-ngx documentation has permanently moved. | ||||||
|  |  | ||||||
| The API provides 5 main endpoints: |     You will be redirected shortly... | ||||||
|  |  | ||||||
| *   ``/api/documents/``: Full CRUD support, except POSTing new documents. See below. |  | ||||||
| *   ``/api/correspondents/``: Full CRUD support. |  | ||||||
| *   ``/api/document_types/``: Full CRUD support. |  | ||||||
| *   ``/api/logs/``: Read-Only. |  | ||||||
| *   ``/api/tags/``: Full CRUD support. |  | ||||||
|  |  | ||||||
| All of these endpoints except for the logging endpoint |  | ||||||
| allow you to fetch, edit and delete individual objects |  | ||||||
| by appending their primary key to the path, for example ``/api/documents/454/``. |  | ||||||
|  |  | ||||||
| The objects served by the document endpoint contain the following fields: |  | ||||||
|  |  | ||||||
| *   ``id``: ID of the document. Read-only. |  | ||||||
| *   ``title``: Title of the document. |  | ||||||
| *   ``content``: Plain text content of the document. |  | ||||||
| *   ``tags``: List of IDs of tags assigned to this document, or empty list. |  | ||||||
| *   ``document_type``: Document type of this document, or null. |  | ||||||
| *   ``correspondent``:  Correspondent of this document or null. |  | ||||||
| *   ``created``: The date time at which this document was created. |  | ||||||
| *   ``created_date``: The date (YYYY-MM-DD) at which this document was created. Optional. If also passed with created, this is ignored. |  | ||||||
| *   ``modified``: The date at which this document was last edited in paperless. Read-only. |  | ||||||
| *   ``added``: The date at which this document was added to paperless. Read-only. |  | ||||||
| *   ``archive_serial_number``: The identifier of this document in a physical document archive. |  | ||||||
| *   ``original_file_name``: Verbose filename of the original document. Read-only. |  | ||||||
| *   ``archived_file_name``: Verbose filename of the archived document. Read-only. Null if no archived document is available. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| Downloading documents |  | ||||||
| ##################### |  | ||||||
|  |  | ||||||
| In addition to that, the document endpoint offers these additional actions on |  | ||||||
| individual documents: |  | ||||||
|  |  | ||||||
| *   ``/api/documents/<pk>/download/``: Download the document. |  | ||||||
| *   ``/api/documents/<pk>/preview/``: Display the document inline, |  | ||||||
|     without downloading it. |  | ||||||
| *   ``/api/documents/<pk>/thumb/``: Download the PNG thumbnail of a document. |  | ||||||
|  |  | ||||||
| Paperless generates archived PDF/A documents from consumed files and stores both |  | ||||||
| the original files as well as the archived files. By default, the endpoints |  | ||||||
| for previews and downloads serve the archived file, if it is available. |  | ||||||
| Otherwise, the original file is served. |  | ||||||
| Some documents cannot be archived. |  | ||||||
|  |  | ||||||
| The endpoints correctly serve the response header fields ``Content-Disposition`` |  | ||||||
| and ``Content-Type`` to indicate the filename for download and the type of content of |  | ||||||
| the document. |  | ||||||
|  |  | ||||||
| In order to download or preview the original document when an archived document is available, |  | ||||||
| supply the query parameter ``original=true``. |  | ||||||
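The URLs for these actions can be assembled with the standard library. This is a minimal sketch that only builds the URL string; the helper name ``document_url`` and the host in the usage lines are assumptions for illustration.

```python
from urllib.parse import urlencode, urljoin

ACTIONS = {"download", "preview", "thumb"}


def document_url(base_url: str, pk: int, action: str = "download", original: bool = False) -> str:
    """Build the URL of a per-document action, optionally requesting the original file."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    url = urljoin(base_url.rstrip("/") + "/", f"api/documents/{pk}/{action}/")
    # original=true applies to downloads and previews only.
    if original and action != "thumb":
        url += "?" + urlencode({"original": "true"})
    return url


archived = document_url("http://localhost:8000", 454)
original = document_url("http://localhost:8000", 454, original=True)
```

Requests to these URLs still need one of the authentication methods the API supports.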
|  |  | ||||||
| .. hint:: |  | ||||||
|  |  | ||||||
|     Paperless used to provide this functionality at ``/fetch/<pk>/preview``, |  | ||||||
|     ``/fetch/<pk>/thumb`` and ``/fetch/<pk>/doc``. Redirects to the new URLs |  | ||||||
|     are in place. However, if you use these old URLs to access documents, you |  | ||||||
|     should update your app or script to use the new URLs. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| Getting document metadata |  | ||||||
| ######################### |  | ||||||
|  |  | ||||||
| The API also has an endpoint to retrieve read-only metadata about specific documents. This |  | ||||||
| information is not served along with the document objects, since it requires reading |  | ||||||
| files and would therefore slow down document lists considerably. |  | ||||||
|  |  | ||||||
| Access the metadata of a document with an ID ``id`` at ``/api/documents/<id>/metadata/``. |  | ||||||
|  |  | ||||||
| The endpoint reports the following data: |  | ||||||
|  |  | ||||||
| *   ``original_checksum``: MD5 checksum of the original document. |  | ||||||
| *   ``original_size``: Size of the original document, in bytes. |  | ||||||
| *   ``original_mime_type``: Mime type of the original document. |  | ||||||
| *   ``media_filename``: Current filename of the document, under which it is stored inside the media directory. |  | ||||||
| *   ``has_archive_version``: True, if this document is archived, false otherwise. |  | ||||||
| *   ``original_metadata``: A list of metadata associated with the original document. See below. |  | ||||||
| *   ``archive_checksum``: MD5 checksum of the archived document, or null. |  | ||||||
| *   ``archive_size``: Size of the archived document in bytes, or null. |  | ||||||
| *   ``archive_metadata``: Metadata associated with the archived document, or null. See below. |  | ||||||
|  |  | ||||||
| File metadata is reported as a list of objects in the following form: |  | ||||||
|  |  | ||||||
| .. code:: json |  | ||||||
|  |  | ||||||
|     [ |  | ||||||
|         { |  | ||||||
|             "namespace": "http://ns.adobe.com/pdf/1.3/", |  | ||||||
|             "prefix": "pdf", |  | ||||||
|             "key": "Producer", |  | ||||||
|             "value": "SparklePDF, Fancy edition" |  | ||||||
|         } |  | ||||||
|     ] |  | ||||||
|  |  | ||||||
| ``namespace`` and ``prefix`` can be null. The actual metadata reported depends on the file type and the metadata |  | ||||||
| available in that specific document. Paperless only reports PDF metadata at this point. |  | ||||||
|  |  | ||||||
| Authorization |  | ||||||
| ############# |  | ||||||
|  |  | ||||||
| The REST API provides three different forms of authentication. |  | ||||||
|  |  | ||||||
| 1.  Basic authentication |  | ||||||
|  |  | ||||||
|     Authorize by providing an HTTP header in the form |  | ||||||
|  |  | ||||||
|     .. code:: |  | ||||||
|  |  | ||||||
|         Authorization: Basic <credentials> |  | ||||||
|  |  | ||||||
|     where ``credentials`` is a base64-encoded string of ``<username>:<password>``. |  | ||||||
|  |  | ||||||
| 2.  Session authentication |  | ||||||
|  |  | ||||||
|     When you're logged into paperless in your browser, you're automatically |  | ||||||
|     logged into the API as well and don't need to provide any authorization |  | ||||||
|     headers. |  | ||||||
|  |  | ||||||
| 3.  Token authentication |  | ||||||
|  |  | ||||||
|     Paperless also offers an endpoint to acquire authentication tokens. |  | ||||||
|  |  | ||||||
|     POST a username and password as a form or json string to ``/api/token/`` |  | ||||||
|     and paperless will respond with a token, if the login data is correct. |  | ||||||
|     This token can be used to authenticate other requests with the |  | ||||||
|     following HTTP header: |  | ||||||
|  |  | ||||||
|     .. code:: |  | ||||||
|  |  | ||||||
|         Authorization: Token <token> |  | ||||||
|  |  | ||||||
|     Tokens can be managed and revoked in the paperless admin. |  | ||||||
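The two header-based methods above can be sketched as small helpers that build the ``Authorization`` header. The function names are hypothetical, and the credentials in the usage line are placeholders.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build the header for basic authentication."""
    raw = f"{username}:{password}".encode("utf-8")
    credentials = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {credentials}"}


def token_auth_header(token: str) -> dict:
    """Build the header for token authentication."""
    return {"Authorization": f"Token {token}"}


# Placeholder credentials for illustration only.
headers = basic_auth_header("user", "pass")
```

Either dictionary can be passed as the request headers of whichever HTTP client you use.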
|  |  | ||||||
| Searching for documents |  | ||||||
| ####################### |  | ||||||
|  |  | ||||||
| Full text searching is available on the ``/api/documents/`` endpoint. Two specific |  | ||||||
| query parameters cause the API to return full text search results: |  | ||||||
|  |  | ||||||
| *   ``/api/documents/?query=your%20search%20query``: Search for a document using a full text query. |  | ||||||
|     For details on the syntax, see :ref:`basic-usage_searching`. |  | ||||||
|  |  | ||||||
| *   ``/api/documents/?more_like=1234``: Search for documents similar to the document with id 1234. |  | ||||||
|  |  | ||||||
| Pagination works exactly the same as it does for normal requests on this endpoint. |  | ||||||
|  |  | ||||||
| Certain limitations apply to full text queries: |  | ||||||
|  |  | ||||||
| *   Results are always sorted by search score. The results matching the query best will show up first. |  | ||||||
|  |  | ||||||
| *   Only a small subset of filtering parameters are supported. |  | ||||||
|  |  | ||||||
| Furthermore, each returned document has an additional ``__search_hit__`` attribute with various information |  | ||||||
| about the search results: |  | ||||||
|  |  | ||||||
| .. code:: |  | ||||||
|  |  | ||||||
|     { |  | ||||||
|         "count": 31, |  | ||||||
|         "next": "http://localhost:8000/api/documents/?page=2&query=test", |  | ||||||
|         "previous": null, |  | ||||||
|         "results": [ |  | ||||||
|  |  | ||||||
|             ... |  | ||||||
|  |  | ||||||
|             { |  | ||||||
|                 "id": 123, |  | ||||||
|                 "title": "title", |  | ||||||
|                 "content": "content", |  | ||||||
|  |  | ||||||
|                 ... |  | ||||||
|  |  | ||||||
|                 "__search_hit__": { |  | ||||||
|                     "score": 0.343, |  | ||||||
|                     "highlights": "text <span class=\"match\">Test</span> text", |  | ||||||
|                     "rank": 23 |  | ||||||
|                 } |  | ||||||
|             }, |  | ||||||
|  |  | ||||||
|             ... |  | ||||||
|  |  | ||||||
|         ] |  | ||||||
|     } |  | ||||||
|  |  | ||||||
| *   ``score`` is an indication of how well this document matches the query relative to the other search results. |  | ||||||
| *   ``highlights`` is an excerpt from the document content and highlights the search terms with ``<span>`` tags as shown above. |  | ||||||
| *   ``rank`` is the index of the search results. The first result will have rank 0. |  | ||||||
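As a minimal sketch of how a client might read these attributes, the following Python snippet extracts the rank, document id and score from a response shaped like the example above. The sample payload and the helper name `summarize_hits` are illustrative assumptions, not part of the API.

```python
# Illustrative payload modelled on the response format shown above.
# The ids, titles and scores are made up for this sketch.
SAMPLE_RESPONSE = {
    "count": 2,
    "next": None,
    "previous": None,
    "results": [
        {
            "id": 123,
            "title": "first",
            "__search_hit__": {"score": 0.9, "highlights": "text", "rank": 0},
        },
        {
            "id": 456,
            "title": "second",
            "__search_hit__": {"score": 0.343, "highlights": "text", "rank": 1},
        },
    ],
}


def summarize_hits(response):
    """Return (rank, id, score) tuples for results that carry search metadata."""
    rows = []
    for doc in response.get("results", []):
        hit = doc.get("__search_hit__")
        if hit is None:
            # Not a full text search result; skip it.
            continue
        rows.append((hit["rank"], doc["id"], hit["score"]))
    # Tuples sort by their first element, so this orders by rank.
    return sorted(rows)


for rank, doc_id, score in summarize_hits(SAMPLE_RESPONSE):
    print(rank, doc_id, score)
```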
|  |  | ||||||
| ``/api/search/autocomplete/`` |  | ||||||
| ============================= |  | ||||||
|  |  | ||||||
| Get auto completions for a partial search term. |  | ||||||
|  |  | ||||||
| Query parameters: |  | ||||||
|  |  | ||||||
| *   ``term``: The incomplete term. |  | ||||||
| *   ``limit``: Number of results to return. Defaults to 10. |  | ||||||
|  |  | ||||||
| Results returned by the endpoint are ordered by importance of the term in the |  | ||||||
| document index. The first result is the term that has the highest TF-IDF score |  | ||||||
| in the index. |  | ||||||
|  |  | ||||||
| .. code:: json |  | ||||||
|  |  | ||||||
|     [ |  | ||||||
|         "term1", |  | ||||||
|         "term3", |  | ||||||
|         "term6", |  | ||||||
|         "term4" |  | ||||||
|     ] |  | ||||||
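The query parameters described above can be assembled with the Python standard library. This is a sketch only: the helper name `autocomplete_url` and the base URL `http://localhost:8000` are assumptions chosen for illustration.

```python
from urllib.parse import urlencode


def autocomplete_url(base_url, term, limit=10):
    """Build the autocomplete URL for a partial search term."""
    # urlencode takes care of escaping spaces and special characters.
    query = urlencode({"term": term, "limit": limit})
    return f"{base_url.rstrip('/')}/api/search/autocomplete/?{query}"


print(autocomplete_url("http://localhost:8000", "inv"))
```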
|  |  | ||||||
|  |  | ||||||
| .. _api-file_uploads: |  | ||||||
|  |  | ||||||
| POSTing documents |  | ||||||
| ################# |  | ||||||
|  |  | ||||||
| The API provides a special endpoint for file uploads: |  | ||||||
|  |  | ||||||
| ``/api/documents/post_document/`` |  | ||||||
|  |  | ||||||
| POST a multipart form to this endpoint, where the form field ``document`` contains |  | ||||||
| the document that you want to upload to paperless. The filename is sanitized and |  | ||||||
| then used to store the document in a temporary directory, and the consumer will |  | ||||||
| be instructed to consume the document from there. |  | ||||||
|  |  | ||||||
| The endpoint supports the following optional form fields: |  | ||||||
|  |  | ||||||
| *   ``title``: Specify a title that the consumer should use for the document. |  | ||||||
| *   ``created``: Specify the date and time when the document was created (e.g. "2016-04-19" or "2016-04-19 06:15:00+02:00"). |  | ||||||
| *   ``correspondent``: Specify the ID of a correspondent that the consumer should use for the document. |  | ||||||
| *   ``document_type``: Similar to correspondent. |  | ||||||
| *   ``tags``: Similar to correspondent. Specify this multiple times to have multiple tags added |  | ||||||
|     to the document. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| The endpoint will immediately return "OK" if the document consumption process |  | ||||||
| was started successfully. No additional status information about the consumption |  | ||||||
| process itself is available, since that happens in a different process. |  | ||||||
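The optional form fields listed above, including the repeated ``tags`` field, can be collected as a list of name/value pairs before handing them to an HTTP client. The sketch below covers only that step; the helper name `build_upload_fields` is hypothetical, and sending the multipart request together with the ``document`` file field is left to whichever client library you use.

```python
def build_upload_fields(title=None, created=None, correspondent=None,
                        document_type=None, tags=()):
    """Collect the optional form fields as (name, value) pairs."""
    fields = []
    if title is not None:
        fields.append(("title", title))
    if created is not None:
        fields.append(("created", created))
    if correspondent is not None:
        fields.append(("correspondent", str(correspondent)))
    if document_type is not None:
        fields.append(("document_type", str(document_type)))
    # "tags" is specified once per tag ID, so it may appear several times.
    for tag_id in tags:
        fields.append(("tags", str(tag_id)))
    return fields


print(build_upload_fields(title="Invoice", created="2016-04-19", tags=[1, 2]))
```

A list of pairs is used instead of a dictionary because a dictionary cannot hold the same field name more than once.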
|  |  | ||||||
|  |  | ||||||
| .. _api-versioning: |  | ||||||
|  |  | ||||||
| API Versioning |  | ||||||
| ############## |  | ||||||
|  |  | ||||||
| The REST API is versioned since Paperless-ngx 1.3.0. |  | ||||||
|  |  | ||||||
| * Versioning ensures that changes to the API don't break older clients. |  | ||||||
| * Clients specify the specific version of the API they wish to use with every request and Paperless will handle the request using the specified API version. |  | ||||||
| * Even if the underlying data model changes, older API versions will always serve compatible data. |  | ||||||
| * If no version is specified, Paperless will serve version 1 to ensure compatibility with older clients that do not request a specific API version. |  | ||||||
|  |  | ||||||
| API versions are specified by submitting an additional HTTP ``Accept`` header with every request: |  | ||||||
|  |  | ||||||
| .. code:: |  | ||||||
|  |  | ||||||
|     Accept: application/json; version=6 |  | ||||||
|  |  | ||||||
| If an invalid version is specified, Paperless 1.3.0 will respond with "406 Not Acceptable" and an error message in the body. |  | ||||||
| Earlier versions of Paperless will serve API version 1 regardless of whether a version is specified via the ``Accept`` header. |  | ||||||
|  |  | ||||||
| If a client wishes to verify whether it is compatible with any given server, the following procedure should be performed: |  | ||||||
|  |  | ||||||
| 1.  Perform an *authenticated* request against any API endpoint. If the server is on version 1.3.0 or newer, the server will |  | ||||||
|     add two custom headers to the response: |  | ||||||
|  |  | ||||||
|     .. code:: |  | ||||||
|  |  | ||||||
|         X-Api-Version: 2 |  | ||||||
|         X-Version: 1.3.0 |  | ||||||
|  |  | ||||||
| 2.  Determine whether the client is compatible with this server based on the presence/absence of these headers and their values if present. |  | ||||||
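The procedure above might be sketched in Python as follows. The helper names are hypothetical, and the snippet assumes that ``X-Api-Version`` reports the highest API version the server supports; a missing header is treated as version 1, matching the behaviour described for servers older than 1.3.0.

```python
def accept_header(version):
    """Return the request header that selects a specific API version."""
    return {"Accept": f"application/json; version={version}"}


def server_api_version(response_headers):
    """Read X-Api-Version from response headers; fall back to 1 if absent."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in response_headers.items()}
    value = normalized.get("x-api-version")
    return int(value) if value is not None else 1


def is_compatible(response_headers, required_version):
    """Check whether the server offers at least the required API version."""
    return server_api_version(response_headers) >= required_version


headers = {"X-Api-Version": "2", "X-Version": "1.3.0"}
print(accept_header(2), is_compatible(headers, 2))
```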
|  |  | ||||||
|  |  | ||||||
| API Changelog |  | ||||||
| ============= |  | ||||||
|  |  | ||||||
| Version 1 |  | ||||||
| --------- |  | ||||||
|  |  | ||||||
| Initial API version. |  | ||||||
|  |  | ||||||
| Version 2 |  | ||||||
| --------- |  | ||||||
|  |  | ||||||
| * Added field ``Tag.color``. This read/write string field contains a hex color such as ``#a6cee3``. |  | ||||||
| * Added read-only field ``Tag.text_color``. This field contains the text color to use for a specific tag, which is either black or white depending on the brightness of ``Tag.color``. |  | ||||||
| * Removed field ``Tag.colour``. |  | ||||||
|   | |||||||
2445 docs/changelog.md

File diff suppressed because it is too large

11 docs/changelog.rst (Normal file)
							| @@ -0,0 +1,11 @@ | |||||||
|  | .. _changelog: | ||||||
|  |  | ||||||
|  | ********* | ||||||
|  | Changelog | ||||||
|  | ********* | ||||||
|  |  | ||||||
|  | .. cssclass:: redirect-notice | ||||||
|  |  | ||||||
|  |     The Paperless-ngx documentation has permanently moved. | ||||||
|  |  | ||||||
|  |     You will be redirected shortly... | ||||||
| @@ -4,928 +4,9 @@ | |||||||
| Configuration | Configuration | ||||||
| ************* | ************* | ||||||
|  |  | ||||||
| Paperless provides a wide range of customizations. |  | ||||||
| Depending on how you run paperless, these settings have to be defined in different |  | ||||||
| places. |  | ||||||
|  |  | ||||||
| *   If you run paperless on docker, ``paperless.conf`` is not used. Rather, configure | .. cssclass:: redirect-notice | ||||||
|     paperless by copying necessary options to ``docker-compose.env``. |  | ||||||
| *   If you are running paperless on anything else, paperless will search for the |  | ||||||
|     configuration file in these locations and use the first one it finds: |  | ||||||
|  |  | ||||||
|     .. code:: |     The Paperless-ngx documentation has permanently moved. | ||||||
|  |  | ||||||
|         /path/to/paperless/paperless.conf |     You will be redirected shortly... | ||||||
|         /etc/paperless.conf |  | ||||||
|         /usr/local/etc/paperless.conf |  | ||||||
|  |  | ||||||
|  |  | ||||||
| Required services |  | ||||||
| ################# |  | ||||||
|  |  | ||||||
| PAPERLESS_REDIS=<url> |  | ||||||
|     This is required for processing scheduled tasks such as email fetching, index |  | ||||||
|     optimization and for training the automatic document matcher. |  | ||||||
|  |  | ||||||
|     * If your Redis server needs login credentials PAPERLESS_REDIS = ``redis://<username>:<password>@<host>:<port>`` |  | ||||||
|  |  | ||||||
|     * With the requirepass option PAPERLESS_REDIS = ``redis://:<password>@<host>:<port>`` |  | ||||||
|  |  | ||||||
|     `More information on securing your Redis Instance <https://redis.io/docs/getting-started/#securing-redis>`_. |  | ||||||
|  |  | ||||||
|     Defaults to redis://localhost:6379. |  | ||||||
|  |  | ||||||
| PAPERLESS_DBENGINE=<engine_name> |  | ||||||
|     Optional, gives the ability to choose Postgres or MariaDB for database engine. |  | ||||||
|     Available options are `postgresql` and `mariadb`. |  | ||||||
|  |  | ||||||
|     Default is `postgresql`. |  | ||||||
|  |  | ||||||
|     .. warning:: |  | ||||||
|  |  | ||||||
|       Using MariaDB comes with some caveats.  See :ref:`advanced-mysql-caveats` for details. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| PAPERLESS_DBHOST=<hostname> |  | ||||||
|     By default, sqlite is used as the database backend. This can be changed here. |  | ||||||
|  |  | ||||||
|     Set PAPERLESS_DBHOST and another database will be used instead of sqlite. |  | ||||||
|  |  | ||||||
| PAPERLESS_DBPORT=<port> |  | ||||||
|     Adjust port if necessary. |  | ||||||
|  |  | ||||||
|     Default is 5432. |  | ||||||
|  |  | ||||||
| PAPERLESS_DBNAME=<name> |  | ||||||
|     Database name in PostgreSQL or MariaDB. |  | ||||||
|  |  | ||||||
|     Defaults to "paperless". |  | ||||||
|  |  | ||||||
| PAPERLESS_DBUSER=<name> |  | ||||||
|     Database user in PostgreSQL or MariaDB. |  | ||||||
|  |  | ||||||
|     Defaults to "paperless". |  | ||||||
|  |  | ||||||
| PAPERLESS_DBPASS=<password> |  | ||||||
|     Database password for PostgreSQL or MariaDB. |  | ||||||
|  |  | ||||||
|     Defaults to "paperless". |  | ||||||
|  |  | ||||||
| PAPERLESS_DBSSLMODE=<mode> |  | ||||||
|     SSL mode to use when connecting to PostgreSQL. |  | ||||||
|  |  | ||||||
|     See `the official documentation about sslmode <https://www.postgresql.org/docs/current/libpq-ssl.html>`_. |  | ||||||
|  |  | ||||||
|     Default is ``prefer``. |  | ||||||
|  |  | ||||||
| PAPERLESS_DB_TIMEOUT=<float> |  | ||||||
|     Amount of time for a database connection to wait for the database to unlock. |  | ||||||
|     Mostly applicable for an sqlite based installation, consider changing to postgresql |  | ||||||
|     if you need to increase this. |  | ||||||
|  |  | ||||||
|     Defaults to unset, keeping the Django defaults. |  | ||||||
|  |  | ||||||
| Paths and folders |  | ||||||
| ################# |  | ||||||
|  |  | ||||||
| PAPERLESS_CONSUMPTION_DIR=<path> |  | ||||||
|     This is where your documents should go to be consumed.  Make sure that it exists |  | ||||||
|     and that the user running the paperless service can read/write its contents |  | ||||||
|     before you start Paperless. |  | ||||||
|  |  | ||||||
|     Don't change this when using docker, as it only changes the path within the |  | ||||||
|     container. Change the local consumption directory in the docker-compose.yml |  | ||||||
|     file instead. |  | ||||||
|  |  | ||||||
|     Defaults to "../consume/", relative to the "src" directory. |  | ||||||
|  |  | ||||||
| PAPERLESS_DATA_DIR=<path> |  | ||||||
|     This is where paperless stores all its data (search index, SQLite database, |  | ||||||
|     classification model, etc). |  | ||||||
|  |  | ||||||
|     Defaults to "../data/", relative to the "src" directory. |  | ||||||
|  |  | ||||||
| PAPERLESS_TRASH_DIR=<path> |  | ||||||
|     Instead of removing deleted documents, they are moved to this directory. |  | ||||||
|  |  | ||||||
|     This must be writeable by the user running paperless. When running inside |  | ||||||
|     docker, ensure that this path is within a permanent volume (such as |  | ||||||
|     "../media/trash") so it won't get lost on upgrades. |  | ||||||
|  |  | ||||||
|     Defaults to empty (i.e. really delete documents). |  | ||||||
|  |  | ||||||
| PAPERLESS_MEDIA_ROOT=<path> |  | ||||||
|     This is where your documents and thumbnails are stored. |  | ||||||
|  |  | ||||||
|     You can set this and PAPERLESS_DATA_DIR to the same folder to have paperless |  | ||||||
|     store all its data within the same volume. |  | ||||||
|  |  | ||||||
|     Defaults to "../media/", relative to the "src" directory. |  | ||||||
|  |  | ||||||
| PAPERLESS_STATICDIR=<path> |  | ||||||
|     Override the default STATIC_ROOT here.  This is where all static files |  | ||||||
|     created using the "collectstatic" management command are stored. |  | ||||||
|  |  | ||||||
|     Unless you're doing something fancy, there is no need to override this. |  | ||||||
|  |  | ||||||
|     Defaults to "../static/", relative to the "src" directory. |  | ||||||
|  |  | ||||||
| PAPERLESS_FILENAME_FORMAT=<format> |  | ||||||
|     Changes the filenames paperless uses to store documents in the media directory. |  | ||||||
|     See :ref:`advanced-file_name_handling` for details. |  | ||||||
|  |  | ||||||
|     Default is none, which disables this feature. |  | ||||||
|  |  | ||||||
| PAPERLESS_FILENAME_FORMAT_REMOVE_NONE=<bool> |  | ||||||
|     Tells paperless to replace placeholders in `PAPERLESS_FILENAME_FORMAT` that would resolve |  | ||||||
|     to 'none' to be omitted from the resulting filename. This also holds true for directory |  | ||||||
|     names. |  | ||||||
|     See :ref:`advanced-file_name_handling` for details. |  | ||||||
|  |  | ||||||
|     Defaults to `false` which disables this feature. |  | ||||||
|  |  | ||||||
| PAPERLESS_LOGGING_DIR=<path> |  | ||||||
|     This is where paperless will store log files. |  | ||||||
|  |  | ||||||
|     Defaults to "``PAPERLESS_DATA_DIR``/log/". |  | ||||||
|  |  | ||||||
|  |  | ||||||
| Logging |  | ||||||
| ####### |  | ||||||
|  |  | ||||||
| PAPERLESS_LOGROTATE_MAX_SIZE=<num> |  | ||||||
|     Maximum file size for log files before they are rotated, in bytes. |  | ||||||
|  |  | ||||||
|     Defaults to 1 MiB. |  | ||||||
|  |  | ||||||
| PAPERLESS_LOGROTATE_MAX_BACKUPS=<num> |  | ||||||
|     Number of rotated log files to keep. |  | ||||||
|  |  | ||||||
|     Defaults to 20. |  | ||||||
|  |  | ||||||
| .. _hosting-and-security: |  | ||||||
|  |  | ||||||
| Hosting & Security |  | ||||||
| ################## |  | ||||||
|  |  | ||||||
| PAPERLESS_SECRET_KEY=<key> |  | ||||||
|     Paperless uses this to make session tokens. If you expose paperless on the |  | ||||||
|     internet, you need to change this, since the default secret is well known. |  | ||||||
|  |  | ||||||
|     Use any sequence of characters. The more, the better. You don't need to |  | ||||||
|     remember this. Just face-roll your keyboard. |  | ||||||
|  |  | ||||||
|     Default is listed in the file ``src/paperless/settings.py``. |  | ||||||
|  |  | ||||||
| PAPERLESS_URL=<url> |  | ||||||
|     This setting can be used to set the three options below (ALLOWED_HOSTS, |  | ||||||
|     CORS_ALLOWED_HOSTS and CSRF_TRUSTED_ORIGINS). If the other options are |  | ||||||
|     set, their values will be combined with this one. Do not include a trailing |  | ||||||
|     slash. E.g. https://paperless.domain.com |  | ||||||
|  |  | ||||||
|     Defaults to empty string, leaving the other settings unaffected. |  | ||||||
|  |  | ||||||
| PAPERLESS_CSRF_TRUSTED_ORIGINS=<comma-separated-list> |  | ||||||
|     A list of trusted origins for unsafe requests (e.g. POST). As of Django 4.0 |  | ||||||
|     this is required to access the Django admin via the web. |  | ||||||
|     See https://docs.djangoproject.com/en/4.0/ref/settings/#csrf-trusted-origins |  | ||||||
|  |  | ||||||
|     Can also be set using PAPERLESS_URL (see above). |  | ||||||
|  |  | ||||||
|     Defaults to empty string, which does not add any origins to the trusted list. |  | ||||||
|  |  | ||||||
| PAPERLESS_ALLOWED_HOSTS=<comma-separated-list> |  | ||||||
|     If you're planning on putting Paperless on the open internet, then you |  | ||||||
|     really should set this value to the domain name you're using.  Failing to do |  | ||||||
|     so leaves you open to HTTP host header attacks: |  | ||||||
|     https://docs.djangoproject.com/en/3.1/topics/security/#host-header-validation |  | ||||||
|  |  | ||||||
|     Just remember that this is a comma-separated list, so "example.com" is fine, |  | ||||||
|     as is "example.com,www.example.com", but NOT " example.com" or "example.com," |  | ||||||
|  |  | ||||||
|     Can also be set using PAPERLESS_URL (see above). |  | ||||||
|  |  | ||||||
|     If manually set, please remember to include "localhost". Otherwise docker |  | ||||||
|     healthcheck will fail. |  | ||||||
|  |  | ||||||
|     Defaults to "*", which is all hosts. |  | ||||||
|  |  | ||||||
| PAPERLESS_CORS_ALLOWED_HOSTS=<comma-separated-list> |  | ||||||
|     You need to add your servers to the list of allowed hosts that can do CORS |  | ||||||
|     calls. Set this to your public domain name. |  | ||||||
|  |  | ||||||
|     Can also be set using PAPERLESS_URL (see above). |  | ||||||
|  |  | ||||||
|     Defaults to "http://localhost:8000". |  | ||||||
|  |  | ||||||
| PAPERLESS_FORCE_SCRIPT_NAME=<path> |  | ||||||
|     To host paperless under a subpath url like example.com/paperless you set |  | ||||||
|     this value to /paperless. No trailing slash! |  | ||||||
|  |  | ||||||
|     Defaults to none, which hosts paperless at "/". |  | ||||||
|  |  | ||||||
| PAPERLESS_STATIC_URL=<path> |  | ||||||
|     Override the STATIC_URL here.  Unless you're hosting Paperless off a |  | ||||||
|     subpath like /paperless/, you probably don't need to change this. |  | ||||||
|     If you do change it, be sure to include the trailing slash. |  | ||||||
|  |  | ||||||
|     Defaults to "/static/". |  | ||||||
|  |  | ||||||
|     .. note:: |  | ||||||
|  |  | ||||||
|         When hosting paperless behind a reverse proxy like Traefik or Nginx at a subpath e.g. |  | ||||||
|         example.com/paperlessngx you will also need to set ``PAPERLESS_FORCE_SCRIPT_NAME`` |  | ||||||
|         (see above). |  | ||||||
|  |  | ||||||
| PAPERLESS_AUTO_LOGIN_USERNAME=<username> |  | ||||||
|     Specify a username here so that paperless will automatically perform login |  | ||||||
|     with the selected user. |  | ||||||
|  |  | ||||||
|     .. danger:: |  | ||||||
|  |  | ||||||
|         Do not use this when exposing paperless on the internet. There are no |  | ||||||
|         checks in place that would prevent you from doing this. |  | ||||||
|  |  | ||||||
|     Defaults to none, which disables this feature. |  | ||||||
|  |  | ||||||
| PAPERLESS_ADMIN_USER=<username> |  | ||||||
|     If this environment variable is specified, Paperless automatically creates |  | ||||||
|     a superuser with the provided username at start. This is useful in cases |  | ||||||
|     where you cannot run the `createsuperuser` command separately, such as Kubernetes |  | ||||||
|     or AWS ECS. |  | ||||||
|  |  | ||||||
|     Requires `PAPERLESS_ADMIN_PASSWORD` to be set. |  | ||||||
|  |  | ||||||
|     .. note:: |  | ||||||
|  |  | ||||||
|         This will not change an existing [super]user's password, nor will |  | ||||||
|         it recreate a user that already exists. You can leave this throughout |  | ||||||
|         the lifecycle of the containers. |  | ||||||
|  |  | ||||||
| PAPERLESS_ADMIN_MAIL=<email> |  | ||||||
|     (Optional) Specify superuser email address. Only used when |  | ||||||
|     `PAPERLESS_ADMIN_USER` is set. |  | ||||||
|  |  | ||||||
|     Defaults to ``root@localhost``. |  | ||||||
|  |  | ||||||
| PAPERLESS_ADMIN_PASSWORD=<password> |  | ||||||
|     Only used when `PAPERLESS_ADMIN_USER` is set. |  | ||||||
|     This will be the password of the automatically created superuser. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| PAPERLESS_COOKIE_PREFIX=<str> |  | ||||||
|     Specify a prefix that is added to the cookies used by paperless to identify |  | ||||||
|     the currently logged in user. This is useful for when you're running two |  | ||||||
|     instances of paperless on the same host. |  | ||||||
|  |  | ||||||
|     After changing this, you will have to login again. |  | ||||||
|  |  | ||||||
|     Defaults to ``""``, which does not alter the cookie names. |  | ||||||
|  |  | ||||||
| PAPERLESS_ENABLE_HTTP_REMOTE_USER=<bool> |  | ||||||
|     Allows authentication via HTTP_REMOTE_USER which is used by some SSO |  | ||||||
|     applications. |  | ||||||
|  |  | ||||||
|     .. warning:: |  | ||||||
|  |  | ||||||
|         This will allow authentication by simply adding a ``Remote-User: <username>`` header |  | ||||||
|         to a request. Use with care! You especially *must* ensure that any such header is not |  | ||||||
|         passed from your proxy server to paperless. |  | ||||||
|  |  | ||||||
|         If you're exposing paperless to the internet directly, do not use this. |  | ||||||
|  |  | ||||||
|         Also see the warning `in the official documentation <https://docs.djangoproject.com/en/3.1/howto/auth-remote-user/#configuration>`_. |  | ||||||
|  |  | ||||||
|     Defaults to `false` which disables this feature. |  | ||||||
|  |  | ||||||
| PAPERLESS_HTTP_REMOTE_USER_HEADER_NAME=<str> |  | ||||||
|     If `PAPERLESS_ENABLE_HTTP_REMOTE_USER` is enabled, this setting lets you |  | ||||||
|     customize the name of the HTTP header from which the authenticated username |  | ||||||
|     is extracted. Values are in terms of |  | ||||||
|     `HttpRequest.META <https://docs.djangoproject.com/en/3.1/ref/request-response/#django.http.HttpRequest.META>`_. |  | ||||||
|     Thus, the configured value must start with `HTTP_` followed by the |  | ||||||
|     normalized actual header name. |  | ||||||
|  |  | ||||||
|     Defaults to `HTTP_REMOTE_USER`. |  | ||||||
|  |  | ||||||
| PAPERLESS_LOGOUT_REDIRECT_URL=<str> |  | ||||||
|     URL to redirect the user to after a logout. This can be used together with |  | ||||||
|     `PAPERLESS_ENABLE_HTTP_REMOTE_USER` to redirect the user back to the SSO |  | ||||||
|     application's logout page. |  | ||||||
|  |  | ||||||
|     Defaults to None, which disables this feature. |  | ||||||
|  |  | ||||||
| .. _configuration-ocr: |  | ||||||
|  |  | ||||||
| OCR settings |  | ||||||
| ############ |  | ||||||
|  |  | ||||||
| Paperless uses `OCRmyPDF <https://ocrmypdf.readthedocs.io/en/latest/>`_ for |  | ||||||
| performing OCR on documents and images. Paperless uses sensible defaults for |  | ||||||
| most settings, but all of them can be configured to your needs. |  | ||||||
|  |  | ||||||
| PAPERLESS_OCR_LANGUAGE=<lang> |  | ||||||
|     Customize the language that paperless will attempt to use when |  | ||||||
|     parsing documents. |  | ||||||
|  |  | ||||||
|     It should be a 3-letter language code consistent with ISO |  | ||||||
|     639: https://www.loc.gov/standards/iso639-2/php/code_list.php |  | ||||||
|  |  | ||||||
|     Set this to the language most of your documents are written in. |  | ||||||
|  |  | ||||||
|     This can be a combination of multiple languages such as ``deu+eng``, |  | ||||||
|     in which case tesseract will use whatever language matches best. |  | ||||||
|     Keep in mind that tesseract uses much more cpu time with multiple |  | ||||||
|     languages enabled. |  | ||||||
|  |  | ||||||
|     Defaults to "eng". |  | ||||||
|  |  | ||||||
|     Note: If your language code contains a '-' such as chi-sim, you must use chi_sim instead. |  | ||||||
|  |  | ||||||
| PAPERLESS_OCR_MODE=<mode> |  | ||||||
|     Tell paperless when and how to perform ocr on your documents. Four modes |  | ||||||
|     are available: |  | ||||||
|  |  | ||||||
|     *   ``skip``: Paperless skips all pages that already contain text and performs OCR only on pages |  | ||||||
|         where no text is present. This is the safest option. |  | ||||||
|     *   ``skip_noarchive``: In addition to skip, paperless won't create an |  | ||||||
|         archived version of your documents when it finds any text in them. |  | ||||||
|         This is useful if you don't want to have two almost-identical versions |  | ||||||
|         of your digital documents in the media folder. This is the fastest option. |  | ||||||
|     *   ``redo``: Paperless will OCR all pages of your documents and attempt to |  | ||||||
|         replace any existing text layers with new text. This will be useful for |  | ||||||
|         documents from scanners that already performed OCR with insufficient |  | ||||||
|         results. It will also perform OCR on purely digital documents. |  | ||||||
|  |  | ||||||
|         This option may fail on some documents that have features that cannot |  | ||||||
|         be removed, such as forms. In this case, the text from the document is |  | ||||||
|         used instead. |  | ||||||
|     *   ``force``: Paperless rasterizes your documents, converting any text |  | ||||||
|         into images and puts the OCRed text on top. This works for all documents, |  | ||||||
|         however, the resulting document may be significantly larger and text |  | ||||||
|         won't appear as sharp when zoomed in. |  | ||||||
|  |  | ||||||
|     The default is ``skip``, which only performs OCR when necessary and always |  | ||||||
|     creates archived documents. |  | ||||||
|  |  | ||||||
|     Read more about this in the `OCRmyPDF documentation <https://ocrmypdf.readthedocs.io/en/latest/advanced.html#when-ocr-is-skipped>`_. |  | ||||||
|  |  | ||||||
| PAPERLESS_OCR_CLEAN=<mode> |  | ||||||
|     Tells paperless to use ``unpaper`` to clean any input document before |  | ||||||
|     sending it to tesseract. This uses more resources, but generally results |  | ||||||
|     in better OCR results. The following modes are available: |  | ||||||
|  |  | ||||||
|     *   ``clean``: Apply unpaper. |  | ||||||
|     *   ``clean-final``: Apply unpaper, and use the cleaned images to build the |  | ||||||
|         output file instead of the original images. |  | ||||||
|     *   ``none``: Do not apply unpaper. |  | ||||||
|  |  | ||||||
|     Defaults to ``clean``. |  | ||||||
|  |  | ||||||
|     .. note:: |  | ||||||
|  |  | ||||||
|         ``clean-final`` is incompatible with ocr mode ``redo``. When both |  | ||||||
|         ``clean-final`` and the ocr mode ``redo`` are configured, ``clean`` |  | ||||||
|         is used instead. |  | ||||||
|  |  | ||||||
| PAPERLESS_OCR_DESKEW=<bool> |  | ||||||
|     Tells paperless to correct skewing (slight rotation of input images mainly |  | ||||||
|     due to improper scanning). |  | ||||||
|  |  | ||||||
|     Defaults to ``true``, which enables this feature. |  | ||||||
|  |  | ||||||
|     .. note:: |  | ||||||
|  |  | ||||||
|         Deskewing is incompatible with ocr mode ``redo``. Deskewing will get |  | ||||||
|         disabled automatically if ``redo`` is used as the ocr mode. |  | ||||||
|  |  | ||||||
| PAPERLESS_OCR_ROTATE_PAGES=<bool> |  | ||||||
|     Tells paperless to correct page rotation (90°, 180° and 270° rotation). |  | ||||||
|  |  | ||||||
|     If you notice that paperless is not rotating incorrectly rotated |  | ||||||
|     pages (or vice versa), try adjusting the threshold up or down (see below). |  | ||||||
|  |  | ||||||
|     Defaults to ``true``, which enables this feature. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| PAPERLESS_OCR_ROTATE_PAGES_THRESHOLD=<num> |  | ||||||
|     Adjust the threshold for automatic page rotation by ``PAPERLESS_OCR_ROTATE_PAGES``. |  | ||||||
|     This is an arbitrary value reported by tesseract. "15" is a very conservative value, |  | ||||||
|     whereas "2" is a very aggressive option and will often result in correctly rotated pages |  | ||||||
|     being rotated as well. |  | ||||||
|  |  | ||||||
|     Defaults to "12". |  | ||||||
|  |  | ||||||
| PAPERLESS_OCR_OUTPUT_TYPE=<type> |  | ||||||
|     Specify the type of PDF documents that paperless should produce. |  | ||||||
|  |  | ||||||
|     *   ``pdf``: Modify the PDF document as little as possible. |  | ||||||
|     *   ``pdfa``: Convert PDF documents into PDF/A-2b documents, which is a |  | ||||||
|         subset of the entire PDF specification and meant for storing |  | ||||||
|         documents long term. |  | ||||||
|     *   ``pdfa-1``, ``pdfa-2``, ``pdfa-3`` to specify the exact version of |  | ||||||
|         PDF/A you wish to use. |  | ||||||
|  |  | ||||||
|     If not specified, ``pdfa`` is used. Remember that paperless also keeps |  | ||||||
|     the original input file as well as the archived version. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| PAPERLESS_OCR_PAGES=<num> |  | ||||||
|     Tells paperless to use only the specified number of pages for OCR. Documents |  | ||||||
|     with fewer than the specified number of pages get OCR'ed completely. |  | ||||||
|  |  | ||||||
|     Specifying 1 here will only use the first page. |  | ||||||
|  |  | ||||||
|     When combined with ``PAPERLESS_OCR_MODE=redo`` or ``PAPERLESS_OCR_MODE=force``, |  | ||||||
|     paperless will not modify any text it finds on excluded pages and copy it |  | ||||||
|     verbatim. |  | ||||||
|  |  | ||||||
|     Defaults to 0, which disables this feature and always uses all pages. |  | ||||||
|  |  | ||||||
| PAPERLESS_OCR_IMAGE_DPI=<num> |  | ||||||
|     Paperless will OCR any images you put into the system and convert them |  | ||||||
|     into PDF documents. This is useful if your scanner produces images. |  | ||||||
|     In order to do so, paperless needs to know the DPI of the image. |  | ||||||
|     Most images from scanners will have this information embedded and |  | ||||||
|     paperless will detect and use that information. In case this fails, it |  | ||||||
|     uses this value as a fallback. |  | ||||||
|  |  | ||||||
|     Set this to the DPI your scanner produces images at. |  | ||||||
|  |  | ||||||
|     Default is none, which will automatically calculate image DPI so that |  | ||||||
|     the produced PDF documents are A4 sized. |  | ||||||
|  |  | ||||||
| PAPERLESS_OCR_MAX_IMAGE_PIXELS=<num> |  | ||||||
|     Paperless will raise a warning when OCRing images which are over this limit and |  | ||||||
|     will not OCR images which are more than twice this limit.  Note this does not |  | ||||||
|     prevent the document from being consumed, but could result in missing text content. |  | ||||||
|  |  | ||||||
|     If unset, will default to the value determined by |  | ||||||
|     `Pillow <https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.MAX_IMAGE_PIXELS>`_. |  | ||||||
|  |  | ||||||
|     .. note:: |  | ||||||
|  |  | ||||||
|         Increasing this limit could cause Paperless to consume additional resources |  | ||||||
|         when consuming a file.  Be sure you have sufficient system resources. |  | ||||||
|  |  | ||||||
|     .. caution:: |  | ||||||
|  |  | ||||||
|         The limit is intended to prevent malicious files from consuming system resources |  | ||||||
|         and causing crashes and other errors.  Only increase this value if you are certain |  | ||||||
|         your documents are not malicious and you need the text which was not OCRed. |  | ||||||
|  |  | ||||||
| PAPERLESS_OCR_USER_ARGS=<json> |  | ||||||
|     OCRmyPDF offers many more options. Use this parameter to specify any |  | ||||||
|     additional arguments you wish to pass to OCRmyPDF. Since Paperless uses |  | ||||||
|     the API of OCRmyPDF, you have to specify these in a format that can be |  | ||||||
|     passed to the API. See `the API reference of OCRmyPDF <https://ocrmypdf.readthedocs.io/en/latest/api.html#reference>`_ |  | ||||||
|     for valid parameters. All command line options are supported, but they |  | ||||||
|     use underscores instead of dashes. |  | ||||||
|  |  | ||||||
|     .. caution:: |  | ||||||
|  |  | ||||||
|         Paperless has been tested to work with the OCR options provided |  | ||||||
|         above. There are many options that are incompatible with each other, |  | ||||||
|         so specifying invalid options may prevent paperless from consuming |  | ||||||
|         any documents. |  | ||||||
|  |  | ||||||
|     Specify arguments as a JSON dictionary. Keep note of lower case booleans |  | ||||||
|     and double quoted parameter names and strings. Examples: |  | ||||||
|  |  | ||||||
|     .. code:: json |  | ||||||
|  |  | ||||||
|         {"deskew": true, "optimize": 3, "unpaper_args": "--pre-rotate 90"} |  | ||||||
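A value can be checked before it is placed in the configuration. The following is a minimal sketch, not the parsing code Paperless itself uses: ``parse_user_args`` is a hypothetical helper that only checks the two rules stated above (a JSON dictionary, with underscores instead of dashes in parameter names). It does not verify that OCRmyPDF accepts the options.

```python
import json


def parse_user_args(raw: str) -> dict:
    """Parse a PAPERLESS_OCR_USER_ARGS-style string (hypothetical helper)."""
    args = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(args, dict):
        raise ValueError("expected a JSON dictionary")
    for key in args:
        # Command line options use dashes; the API uses underscores.
        if "-" in key:
            raise ValueError(f"use underscores instead of dashes: {key!r}")
    return args


example = '{"deskew": true, "optimize": 3, "unpaper_args": "--pre-rotate 90"}'
parsed = parse_user_args(example)
print(parsed)
```

Note that JSON ``true`` becomes Python ``True`` after parsing, which is why the booleans must be lower case in the configuration value.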
|  |  | ||||||
| .. _configuration-tika: |  | ||||||
|  |  | ||||||
| Tika settings |  | ||||||
| ############# |  | ||||||
|  |  | ||||||
| Paperless can make use of `Tika <https://tika.apache.org/>`_ and |  | ||||||
| `Gotenberg <https://gotenberg.dev/>`_ for parsing and |  | ||||||
| converting "Office" documents (such as ".doc", ".xlsx" and ".odt"). If you |  | ||||||
| wish to use this, you must provide a Tika server and a Gotenberg server, |  | ||||||
| configure their endpoints, and enable the feature. |  | ||||||
|  |  | ||||||
| PAPERLESS_TIKA_ENABLED=<bool> |  | ||||||
|     Enable (or disable) the Tika parser. |  | ||||||
|  |  | ||||||
|     Defaults to false. |  | ||||||
|  |  | ||||||
| PAPERLESS_TIKA_ENDPOINT=<url> |  | ||||||
|     Set the endpoint URL where Paperless can reach your Tika server. |  | ||||||
|  |  | ||||||
|     Defaults to "http://localhost:9998". |  | ||||||
|  |  | ||||||
| PAPERLESS_TIKA_GOTENBERG_ENDPOINT=<url> |  | ||||||
|     Set the endpoint URL where Paperless can reach your Gotenberg server. |  | ||||||
|  |  | ||||||
|     Defaults to "http://localhost:3000". |  | ||||||
|  |  | ||||||
| If you run paperless on docker, you can add those services to the docker-compose |  | ||||||
| file (see the provided ``docker-compose.sqlite-tika.yml`` file for reference). The required |  | ||||||
| changes are as follows: |  | ||||||
|  |  | ||||||
| .. code:: yaml |  | ||||||
|  |  | ||||||
|     services: |  | ||||||
|         # ... |  | ||||||
|  |  | ||||||
|         webserver: |  | ||||||
|             # ... |  | ||||||
|  |  | ||||||
|             environment: |  | ||||||
|                 # ... |  | ||||||
|  |  | ||||||
|                 PAPERLESS_TIKA_ENABLED: 1 |  | ||||||
|                 PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000 |  | ||||||
|                 PAPERLESS_TIKA_ENDPOINT: http://tika:9998 |  | ||||||
|  |  | ||||||
|         # ... |  | ||||||
|  |  | ||||||
|         gotenberg: |  | ||||||
|             image: gotenberg/gotenberg:7.6 |  | ||||||
|             restart: unless-stopped |  | ||||||
|             command: |  | ||||||
|                 - "gotenberg" |  | ||||||
|                 - "--chromium-disable-routes=true" |  | ||||||
|  |  | ||||||
|         tika: |  | ||||||
|             image: ghcr.io/paperless-ngx/tika:latest |  | ||||||
|             restart: unless-stopped |  | ||||||
|  |  | ||||||
| Add the configuration variables to the environment of the webserver (alternatively |  | ||||||
| put the configuration in the ``docker-compose.env`` file) and add the additional |  | ||||||
| services below the webserver service. Watch out for indentation. |  | ||||||
|  |  | ||||||
| Make sure to use the correct format `PAPERLESS_TIKA_ENABLED = 1` so python_dotenv can parse the statement correctly. |  | ||||||
|  |  | ||||||
| Software tweaks |  | ||||||
| ############### |  | ||||||
|  |  | ||||||
| PAPERLESS_TASK_WORKERS=<num> |  | ||||||
|     Paperless does multiple things in the background: Maintain the search index, |  | ||||||
|     maintain the automatic matching algorithm, check emails, consume documents, |  | ||||||
|     etc. This variable specifies how many things it will do in parallel. |  | ||||||
|  |  | ||||||
|     Defaults to 1 |  | ||||||
|  |  | ||||||
|  |  | ||||||
| PAPERLESS_THREADS_PER_WORKER=<num> |  | ||||||
|     Furthermore, paperless uses multiple threads when consuming documents to |  | ||||||
|     speed up OCR. This variable specifies how many pages paperless will process |  | ||||||
|     in parallel on a single document. |  | ||||||
|  |  | ||||||
|     .. caution:: |  | ||||||
|  |  | ||||||
|         Ensure that the product |  | ||||||
|  |  | ||||||
|             PAPERLESS_TASK_WORKERS * PAPERLESS_THREADS_PER_WORKER |  | ||||||
|  |  | ||||||
|         does not exceed your CPU core count or else paperless will be extremely slow. |  | ||||||
|         If you want paperless to process many documents in parallel, choose a high |  | ||||||
|         worker count. If you want paperless to process very large documents faster, |  | ||||||
|         use a higher thread per worker count. |  | ||||||
|  |  | ||||||
|     The default is a balance between the two, according to your CPU core count, |  | ||||||
|     with a slight favor towards threads per worker: |  | ||||||
|  |  | ||||||
|     +----------------+---------+---------+ |  | ||||||
|     | CPU core count | Workers | Threads | |  | ||||||
|     +----------------+---------+---------+ |  | ||||||
|     |              1 |       1 |       1 | |  | ||||||
|     +----------------+---------+---------+ |  | ||||||
|     |              2 |       2 |       1 | |  | ||||||
|     +----------------+---------+---------+ |  | ||||||
|     |              4 |       2 |       2 | |  | ||||||
|     +----------------+---------+---------+ |  | ||||||
|     |              6 |       2 |       3 | |  | ||||||
|     +----------------+---------+---------+ |  | ||||||
|     |              8 |       2 |       4 | |  | ||||||
|     +----------------+---------+---------+ |  | ||||||
|     |             12 |       3 |       4 | |  | ||||||
|     +----------------+---------+---------+ |  | ||||||
|     |             16 |       4 |       4 | |  | ||||||
|     +----------------+---------+---------+ |  | ||||||
|  |  | ||||||
|     If you only specify PAPERLESS_TASK_WORKERS, paperless will adjust |  | ||||||
|     PAPERLESS_THREADS_PER_WORKER automatically. |  | ||||||
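The caution above can be checked with a few lines of Python. This is only an illustrative sketch: ``fits_cpu`` is a hypothetical helper, and the rows are copied from the table of defaults above. It does not reproduce how Paperless picks its defaults.

```python
import os


def fits_cpu(workers: int, threads: int, cores: int) -> bool:
    """Return True if workers * threads does not exceed the core count."""
    return workers * threads <= cores


# (CPU core count, workers, threads) rows from the table above.
DEFAULTS = [
    (1, 1, 1),
    (2, 2, 1),
    (4, 2, 2),
    (6, 2, 3),
    (8, 2, 4),
    (12, 3, 4),
    (16, 4, 4),
]

all_defaults_fit = all(fits_cpu(w, t, c) for c, w, t in DEFAULTS)
print("all documented defaults fit:", all_defaults_fit)

# Check a candidate setting against the machine this script runs on.
cores_here = os.cpu_count() or 1
print("2 workers x 2 threads fits here:", fits_cpu(2, 2, cores_here))
```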
|  |  | ||||||
|  |  | ||||||
| PAPERLESS_WORKER_TIMEOUT=<num> |  | ||||||
|     Machines with few cores or weak ones might not be able to finish OCR on |  | ||||||
|     large documents within the default 1800 seconds. So extending this timeout |  | ||||||
|     may prove to be useful on weak hardware setups. |  | ||||||
|  |  | ||||||
| PAPERLESS_WORKER_RETRY=<num> |  | ||||||
|     If PAPERLESS_WORKER_TIMEOUT has been configured, the retry time for a task can |  | ||||||
|     also be configured.  By default, this value will be set to 10s more than the |  | ||||||
|     worker timeout.  This value should never be set less than the worker timeout. |  | ||||||
|  |  | ||||||
| PAPERLESS_TIME_ZONE=<timezone> |  | ||||||
|     Set the time zone here. |  | ||||||
|     See https://docs.djangoproject.com/en/3.1/ref/settings/#std:setting-TIME_ZONE |  | ||||||
|     for details on how to set it. |  | ||||||
|  |  | ||||||
|     Defaults to UTC. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| .. _configuration-polling: |  | ||||||
|  |  | ||||||
| PAPERLESS_CONSUMER_POLLING=<num> |  | ||||||
|     If paperless won't find documents added to your consume folder, it might |  | ||||||
|     not be able to automatically detect filesystem changes. In that case, |  | ||||||
|     specify a polling interval in seconds here, which will then cause paperless |  | ||||||
|     to periodically check your consumption directory for changes. This will also |  | ||||||
|     disable listening for file system changes with ``inotify``. |  | ||||||
|  |  | ||||||
|     Defaults to 0, which disables polling and uses filesystem notifications. |  | ||||||
|  |  | ||||||
| PAPERLESS_CONSUMER_POLLING_RETRY_COUNT=<num> |  | ||||||
|     If consumer polling is enabled, sets the number of times paperless will check for a |  | ||||||
|     file to remain unmodified. |  | ||||||
|  |  | ||||||
|     Defaults to 5. |  | ||||||
|  |  | ||||||
| PAPERLESS_CONSUMER_POLLING_DELAY=<num> |  | ||||||
|     If consumer polling is enabled, sets the delay in seconds between each check (above) paperless |  | ||||||
|     will do while waiting for a file to remain unmodified. |  | ||||||
|  |  | ||||||
|     Defaults to 5. |  | ||||||
|  |  | ||||||
| .. _configuration-inotify: |  | ||||||
|  |  | ||||||
| PAPERLESS_CONSUMER_INOTIFY_DELAY=<num> |  | ||||||
|     Sets the time in seconds the consumer will wait for additional events |  | ||||||
|     from inotify before the consumer will consider a file ready and begin consumption. |  | ||||||
|     Certain scanners or network setups may generate multiple events for a single file, |  | ||||||
|     leading to multiple consumers working on the same file.  Configure this to |  | ||||||
|     prevent that. |  | ||||||
|  |  | ||||||
|     Defaults to 0.5 seconds. |  | ||||||
|  |  | ||||||
| PAPERLESS_CONSUMER_DELETE_DUPLICATES=<bool> |  | ||||||
|     When the consumer detects a duplicate document, it will not touch the |  | ||||||
|     original document. This default behavior can be changed here. |  | ||||||
|  |  | ||||||
|     Defaults to false. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| PAPERLESS_CONSUMER_RECURSIVE=<bool> |  | ||||||
|     Enable recursive watching of the consumption directory. Paperless will |  | ||||||
|     then pick up files from subdirectories within your consumption |  | ||||||
|     directory as well. |  | ||||||
|  |  | ||||||
|     Defaults to false. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS=<bool> |  | ||||||
|     Set the names of subdirectories as tags for consumed files. |  | ||||||
|     E.g. <CONSUMPTION_DIR>/foo/bar/file.pdf will add the tags "foo" and "bar" to |  | ||||||
|     the consumed file. Paperless will create any tags that don't exist yet. |  | ||||||
|  |  | ||||||
|     This is useful for sorting documents with certain tags such as ``car`` or |  | ||||||
|     ``todo`` prior to consumption. These folders won't be deleted. |  | ||||||
|  |  | ||||||
|     PAPERLESS_CONSUMER_RECURSIVE must be enabled for this to work. |  | ||||||
|  |  | ||||||
|     Defaults to false. |  | ||||||
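The mapping from subdirectories to tags can be pictured as follows. This is a sketch under assumptions, not the consumer's actual implementation: ``tags_from_path`` is a hypothetical helper and ``/consume`` stands in for your consumption directory.

```python
from pathlib import PurePosixPath


def tags_from_path(consumption_dir: str, file_path: str) -> list:
    """Return the subdirectory names between the consumption dir and the file."""
    relative = PurePosixPath(file_path).relative_to(consumption_dir)
    # Every path component except the file name itself becomes a tag.
    return list(relative.parts[:-1])


tags = tags_from_path("/consume", "/consume/foo/bar/file.pdf")
print(tags)
```

A file placed directly in the consumption directory has no intermediate folders, so it yields no tags in this sketch.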
|  |  | ||||||
| PAPERLESS_CONSUMER_ENABLE_BARCODES=<bool> |  | ||||||
|     Enables the scanning and page separation based on detected barcodes. |  | ||||||
|     This allows for scanning and adding multiple documents per uploaded |  | ||||||
|     file, which are separated by one or multiple barcode pages. |  | ||||||
|  |  | ||||||
|     For ease of use, it is suggested to use a standardized separation page, |  | ||||||
|     e.g. `here <https://www.alliancegroup.co.uk/patch-codes.htm>`_. |  | ||||||
|  |  | ||||||
|     If no barcodes are detected in the uploaded file, no page separation |  | ||||||
|     will happen. |  | ||||||
|  |  | ||||||
|     The original document will be removed and the separated pages will be |  | ||||||
|     saved as pdf. |  | ||||||
|  |  | ||||||
|     Defaults to false. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| PAPERLESS_CONSUMER_BARCODE_TIFF_SUPPORT=<bool> |  | ||||||
|     Whether TIFF image files should be scanned for barcodes. |  | ||||||
|     This will automatically convert any TIFF image(s) to pdfs for later |  | ||||||
|     processing. |  | ||||||
|     This only has an effect, if PAPERLESS_CONSUMER_ENABLE_BARCODES has been |  | ||||||
|     enabled. |  | ||||||
|  |  | ||||||
|     Defaults to false. |  | ||||||
|  |  | ||||||
| PAPERLESS_CONSUMER_BARCODE_STRING=PATCHT |  | ||||||
|   Defines the string to be detected as a separator barcode. |  | ||||||
|   If paperless is used with the PATCH-T separator pages, users |  | ||||||
|   shouldn't change this. |  | ||||||
|  |  | ||||||
|   Defaults to "PATCHT" |  | ||||||
|  |  | ||||||
| PAPERLESS_CONVERT_MEMORY_LIMIT=<num> |  | ||||||
|     On smaller systems, or even in the case of Very Large Documents, the consumer |  | ||||||
|     may explode, complaining about how it's "unable to extend pixel cache".  In |  | ||||||
|     such cases, try setting this to a reasonably low value, like 32.  The |  | ||||||
|     default is to use whatever is necessary to do everything without writing to |  | ||||||
|     disk, and units are in megabytes. |  | ||||||
|  |  | ||||||
|     For more information on how to use this value, you should search |  | ||||||
|     the web for "MAGICK_MEMORY_LIMIT". |  | ||||||
|  |  | ||||||
|     Defaults to 0, which disables the limit. |  | ||||||
|  |  | ||||||
| PAPERLESS_CONVERT_TMPDIR=<path> |  | ||||||
|     Similar to the memory limit, if you've got a small system and your OS mounts |  | ||||||
|     /tmp as tmpfs, you should set this to a path that's on a physical disk, like |  | ||||||
|     /home/your_user/tmp or something.  ImageMagick will use this as scratch space |  | ||||||
|     when crunching through very large documents. |  | ||||||
|  |  | ||||||
|     For more information on how to use this value, you should search |  | ||||||
|     the web for "MAGICK_TMPDIR". |  | ||||||
|  |  | ||||||
|     Default is none, which disables the temporary directory. |  | ||||||
|  |  | ||||||
| PAPERLESS_POST_CONSUME_SCRIPT=<filename> |  | ||||||
|     After a document is consumed, Paperless can trigger an arbitrary script if |  | ||||||
|     you like.  This script will be passed a number of arguments for you to work |  | ||||||
|     with. For more information, take a look at :ref:`advanced-post_consume_script`. |  | ||||||
|  |  | ||||||
|     The default is blank, which means nothing will be executed. |  | ||||||
|  |  | ||||||
| PAPERLESS_FILENAME_DATE_ORDER=<format> |  | ||||||
|     Paperless will check the document text for document date information. |  | ||||||
|     Use this setting to enable checking the document filename for date |  | ||||||
|     information. The date order can be set to any option as specified in |  | ||||||
|     https://dateparser.readthedocs.io/en/latest/settings.html#date-order. |  | ||||||
|     The filename will be checked first, and if nothing is found, the document |  | ||||||
|     text will be checked as normal. |  | ||||||
|  |  | ||||||
|     A date in a filename must have some separators (`.`, `-`, `/`, etc) |  | ||||||
|     for it to be parsed. |  | ||||||
|  |  | ||||||
|     Defaults to none, which disables this feature. |  | ||||||
|  |  | ||||||
| PAPERLESS_NUMBER_OF_SUGGESTED_DATES=<num> |  | ||||||
|     Paperless searches an entire document for dates. The first date found will |  | ||||||
|     be used as the initial value for the created date. When this variable is |  | ||||||
|     greater than 0 (or left to its default value), paperless will also suggest |  | ||||||
|     other dates found in the document, up to a maximum of this setting. Note that |  | ||||||
|     duplicates will be removed, which can result in fewer dates displayed in the |  | ||||||
|     frontend than this setting value. |  | ||||||
|  |  | ||||||
|     The task to find all dates can be time-consuming and increases with a higher |  | ||||||
|     (maximum) number of suggested dates and slower hardware. |  | ||||||
|  |  | ||||||
|     Defaults to 3. Set to 0 to disable this feature. |  | ||||||
|  |  | ||||||
| PAPERLESS_THUMBNAIL_FONT_NAME=<filename> |  | ||||||
|     Paperless creates thumbnails for plain text files by rendering the content |  | ||||||
|     of the file on an image and uses a predefined font for that. This |  | ||||||
|     font can be changed here. |  | ||||||
|  |  | ||||||
|     Note that this won't have any effect on already generated thumbnails. |  | ||||||
|  |  | ||||||
|     Defaults to ``/usr/share/fonts/liberation/LiberationSerif-Regular.ttf``. |  | ||||||
|  |  | ||||||
| PAPERLESS_IGNORE_DATES=<string> |  | ||||||
|     Paperless parses a document's creation date from filename and file content. |  | ||||||
|     You may specify a comma separated list of dates that should be ignored during |  | ||||||
|     this process. This is useful for special dates (like date of birth) that appear |  | ||||||
|     in documents regularly but are very unlikely to be the document's creation date. |  | ||||||
|  |  | ||||||
|     The date is parsed using the order specified in PAPERLESS_DATE_ORDER. |  | ||||||
|  |  | ||||||
|     Defaults to an empty string to not ignore any dates. |  | ||||||
|  |  | ||||||
| PAPERLESS_DATE_ORDER=<format> |  | ||||||
|     Paperless will try to determine the document creation date from its contents. |  | ||||||
|     Specify the date format Paperless should expect to see within your documents. |  | ||||||
|  |  | ||||||
|     This option defaults to DMY which translates to day first, month second, and year |  | ||||||
|     last order. Characters D, M, or Y can be shuffled to meet the required order. |  | ||||||
|  |  | ||||||
| PAPERLESS_CONSUMER_IGNORE_PATTERNS=<json> |  | ||||||
|     By default, paperless ignores certain files and folders in the consumption |  | ||||||
|     directory, such as system files created by macOS. |  | ||||||
|  |  | ||||||
|     This can be adjusted by configuring a custom json array with patterns to exclude. |  | ||||||
|  |  | ||||||
|     Defaults to ``[".DS_STORE/*", "._*", ".stfolder/*", ".stversions/*", ".localized/*", "desktop.ini"]``. |  | ||||||
|  |  | ||||||
| Binaries |  | ||||||
| ######## |  | ||||||
|  |  | ||||||
| There are a few external software packages that Paperless expects to find on |  | ||||||
| your system when it starts up.  Unless you've done something creative with |  | ||||||
| their installation, you probably won't need to edit any of these.  However, |  | ||||||
| if you've installed these programs somewhere where simply typing the name of |  | ||||||
| the program doesn't automatically execute it (i.e. the program isn't in your |  | ||||||
| $PATH), then you'll need to specify the literal path for that program. |  | ||||||
|  |  | ||||||
| PAPERLESS_CONVERT_BINARY=<path> |  | ||||||
|     Defaults to "convert". |  | ||||||
|  |  | ||||||
| PAPERLESS_GS_BINARY=<path> |  | ||||||
|     Defaults to "gs". |  | ||||||
|  |  | ||||||
|  |  | ||||||
| .. _configuration-docker: |  | ||||||
|  |  | ||||||
| Docker-specific options |  | ||||||
| ####################### |  | ||||||
|  |  | ||||||
| These options don't have any effect in ``paperless.conf``. These options adjust |  | ||||||
| the behavior of the docker container. Configure these in `docker-compose.env`. |  | ||||||
|  |  | ||||||
| PAPERLESS_WEBSERVER_WORKERS=<num> |  | ||||||
|     The number of worker processes the webserver should spawn. More worker processes |  | ||||||
|     usually let the front end load data much quicker. However, each worker process |  | ||||||
|     also loads the entire application into memory separately, so increasing this value |  | ||||||
|     will increase RAM usage. |  | ||||||
|  |  | ||||||
|     Defaults to 1. |  | ||||||
|  |  | ||||||
| PAPERLESS_BIND_ADDR=<ip address> |  | ||||||
|     The IP address the webserver will listen on inside the container. There are |  | ||||||
|     special setups where you may need to configure this value to restrict the |  | ||||||
|     IP address or interface the webserver listens on. |  | ||||||
|  |  | ||||||
|     Defaults to [::], meaning all interfaces, including IPv6. |  | ||||||
|  |  | ||||||
| PAPERLESS_PORT=<port> |  | ||||||
|     The port number the webserver will listen on inside the container. There are |  | ||||||
|     special setups where you may need this to avoid collisions with other |  | ||||||
|     services (like using podman with multiple containers in one pod). |  | ||||||
|  |  | ||||||
|     Don't change this when using Docker. To change the port on which the webserver is |  | ||||||
|     reachable outside of the container, refer instead to the "ports" key in |  | ||||||
|     ``docker-compose.yml``. |  | ||||||
|  |  | ||||||
|     Defaults to 8000. |  | ||||||
|  |  | ||||||
| USERMAP_UID=<uid> |  | ||||||
|     The ID of the paperless user in the container. Set this to your actual user ID on the |  | ||||||
|     host system, which you can get by executing |  | ||||||
|  |  | ||||||
|     .. code:: shell-session |  | ||||||
|  |  | ||||||
|         $ id -u |  | ||||||
|  |  | ||||||
|     Paperless will change ownership on its folders to this user, so you need to get this right |  | ||||||
|     in order to be able to write to the consumption directory. |  | ||||||
|  |  | ||||||
|     Defaults to 1000. |  | ||||||
|  |  | ||||||
| USERMAP_GID=<gid> |  | ||||||
|     The ID of the paperless group in the container. Set this to your actual group ID on the |  | ||||||
|     host system, which you can get by executing |  | ||||||
|  |  | ||||||
|     .. code:: shell-session |  | ||||||
|  |  | ||||||
|         $ id -g |  | ||||||
|  |  | ||||||
|     Paperless will change ownership on its folders to this group, so you need to get this right |  | ||||||
|     in order to be able to write to the consumption directory. |  | ||||||
|  |  | ||||||
|     Defaults to 1000. |  | ||||||
|  |  | ||||||
| PAPERLESS_OCR_LANGUAGES=<list> |  | ||||||
|     Additional OCR languages to install. By default, paperless comes with |  | ||||||
|     English, German, Italian, Spanish and French. If your language is not in this list, install |  | ||||||
|     additional languages with this configuration option: |  | ||||||
|  |  | ||||||
|     .. code:: bash |  | ||||||
|  |  | ||||||
|         PAPERLESS_OCR_LANGUAGES=tur ces |  | ||||||
|  |  | ||||||
|     To actually use these languages, also set the default OCR language of paperless: |  | ||||||
|  |  | ||||||
|     .. code:: bash |  | ||||||
|  |  | ||||||
|         PAPERLESS_OCR_LANGUAGE=tur |  | ||||||
|  |  | ||||||
|     Defaults to none, which does not install any additional languages. |  | ||||||
|  |  | ||||||
| PAPERLESS_ENABLE_FLOWER=<defined> |  | ||||||
|     If this environment variable is defined, the Celery monitoring tool |  | ||||||
|     `Flower <https://flower.readthedocs.io/en/latest/index.html>`_ will |  | ||||||
|     be started by the container. |  | ||||||
|  |  | ||||||
|     You can read more about this in the :ref:`advanced setup <advanced-celery-monitoring>` |  | ||||||
|     documentation. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| .. _configuration-update-checking: |  | ||||||
|  |  | ||||||
| Update Checking |  | ||||||
| ############### |  | ||||||
|  |  | ||||||
| PAPERLESS_ENABLE_UPDATE_CHECK=<bool> |  | ||||||
|  |  | ||||||
|     .. note:: |  | ||||||
|  |  | ||||||
|             This setting was deprecated in favor of a frontend setting after v1.9.2. A one-time |  | ||||||
|             migration is performed for users who have this setting set. This setting is always |  | ||||||
|             ignored if the corresponding frontend setting has been set. |  | ||||||
|   | |||||||
| @@ -1,431 +1,12 @@ | |||||||
| .. _extending: | .. _extending: | ||||||
|  |  | ||||||
|  | ************************* | ||||||
| Paperless-ngx Development | Paperless-ngx Development | ||||||
| ######################### | ************************* | ||||||
|  |  | ||||||
| This section describes the steps you need to take to start development on paperless-ngx. |  | ||||||
|  |  | ||||||
| Check out the source from github. The repository is organized in the following way: | .. cssclass:: redirect-notice | ||||||
|  |  | ||||||
| *   ``main`` always represents the latest release and will only see changes |     The Paperless-ngx documentation has permanently moved. | ||||||
|     when a new release is made. |  | ||||||
| *   ``dev`` contains the code that will be in the next release. |  | ||||||
| *   ``feature-X`` contain bigger changes that will be in some release, but not |  | ||||||
|     necessarily the next one. |  | ||||||
|  |  | ||||||
| When making functional changes to paperless, *always* make your changes on the ``dev`` branch. |     You will be redirected shortly... | ||||||
|  |  | ||||||
| Apart from that, the folder structure is as follows: |  | ||||||
|  |  | ||||||
| *   ``docs/`` - Documentation. |  | ||||||
| *   ``src-ui/`` - Code of the front end. |  | ||||||
| *   ``src/`` - Code of the back end. |  | ||||||
| *   ``scripts/`` - Various scripts that help with different parts of development. |  | ||||||
| *   ``docker/`` - Files required to build the docker image. |  | ||||||
|  |  | ||||||
| Contributing to Paperless |  | ||||||
| ========================= |  | ||||||
|  |  | ||||||
| Maybe you've been using Paperless for a while and want to add a feature or two, |  | ||||||
| or maybe you've come across a bug that you have some ideas how to solve.  The |  | ||||||
| beauty of open source software is that you can see what's wrong and help to get |  | ||||||
| it fixed for everyone! |  | ||||||
|  |  | ||||||
| Before contributing please review our `code of conduct`_ and other important |  | ||||||
| information in the `contributing guidelines`_. |  | ||||||
|  |  | ||||||
| .. _code-formatting-with-pre-commit-hooks: |  | ||||||
|  |  | ||||||
| Code formatting with pre-commit Hooks |  | ||||||
| ===================================== |  | ||||||
|  |  | ||||||
| To ensure a consistent style and formatting across the project source, the project |  | ||||||
| utilizes a Git `pre-commit` hook to perform some formatting and linting before a |  | ||||||
| commit is allowed. That way, everyone uses the same style and some common issues |  | ||||||
| can be caught early on. See below for installation instructions. |  | ||||||
|  |  | ||||||
| Once installed, hooks will run when you commit. If the formatting isn't quite right |  | ||||||
| or a linter catches something, the commit will be rejected. You'll need to look at the |  | ||||||
| output and fix the issue. Some hooks, such as the Python formatting tool `black`, |  | ||||||
| will format failing files, so all you need to do is `git add` those files again and |  | ||||||
| retry your commit. |  | ||||||
|  |  | ||||||
| Initial setup and first start |  | ||||||
| ============================= |  | ||||||
|  |  | ||||||
| After you forked and cloned the code from github you need to perform a first-time setup. |  | ||||||
| To do the setup you need to perform the steps from the following chapters in a certain order: |  | ||||||
|  |  | ||||||
| 1.  Install prerequisites + pipenv as mentioned in :ref:`Bare metal route <setup-bare_metal>` |  | ||||||
| 2.  Copy ``paperless.conf.example`` to ``paperless.conf`` and enable debug mode. |  | ||||||
| 3.  Install the Angular CLI interface: |  | ||||||
|  |  | ||||||
|     .. code:: shell-session |  | ||||||
|  |  | ||||||
|         $ npm install -g @angular/cli |  | ||||||
|  |  | ||||||
| 4.  Install pre-commit |  | ||||||
|  |  | ||||||
|     .. code:: shell-session |  | ||||||
|  |  | ||||||
|         pre-commit install |  | ||||||
|  |  | ||||||
| 5.  Create ``consume`` and ``media`` folders in the cloned root folder. |  | ||||||
|  |  | ||||||
|     .. code:: shell-session |  | ||||||
|  |  | ||||||
|         mkdir -p consume media |  | ||||||
|  |  | ||||||
| 6.  You can now either ... |  | ||||||
|  |  | ||||||
|     *  install redis or |  | ||||||
|     *  use the included scripts/start-services.sh to use docker to fire up a redis instance (and some other services such as tika, gotenberg and a database server) or |  | ||||||
|     *  spin up a bare redis container |  | ||||||
|  |  | ||||||
|         .. code:: shell-session |  | ||||||
|  |  | ||||||
|             docker run -d -p 6379:6379 --restart unless-stopped redis:latest |  | ||||||
|  |  | ||||||
| 7.  Install the Python dependencies by running the following in the src/ directory. |  | ||||||
|  |  | ||||||
|     .. code:: shell-session |  | ||||||
|  |  | ||||||
|         pipenv install --dev |  | ||||||
|  |  | ||||||
|   * Make sure you're using python 3.9.x or lower. Otherwise you might get issues with building dependencies. You can use `pyenv <https://github.com/pyenv/pyenv>`_ to install a specific python version. |  | ||||||
|  |  | ||||||
| 8.  Generate the static UI so you can perform a login to get session that is required for frontend development (this needs to be done one time only). From src-ui directory: |  | ||||||
|  |  | ||||||
|     .. code:: shell-session |  | ||||||
|  |  | ||||||
|         npm install . |  | ||||||
|         ./node_modules/.bin/ng build --configuration production |  | ||||||
|  |  | ||||||
| 9.  Apply migrations and create a superuser for your dev instance: |  | ||||||
|  |  | ||||||
|     .. code:: shell-session |  | ||||||
|  |  | ||||||
|         python3 manage.py migrate |  | ||||||
|         python3 manage.py createsuperuser |  | ||||||
|  |  | ||||||
| 10.  Now spin up the dev backend. Depending on which part of paperless you're developing for, you need to have some or all of them running. |  | ||||||
|  |  | ||||||
|     .. code:: shell-session |  | ||||||
|  |  | ||||||
|         python3 manage.py runserver & python3 manage.py document_consumer & celery --app paperless worker |  | ||||||
|  |  | ||||||
| 11. Log in with the superuser credentials created in step 9 at ``http://localhost:8000`` to create a session that enables you to use the backend. |  | ||||||
|  |  | ||||||
| The backend development environment is now ready. To start frontend development, go to ``/src-ui`` and run ``ng serve``. From there you can use ``http://localhost:4200`` for a preview. |  | ||||||
|  |  | ||||||
| Back end development |  | ||||||
| ==================== |  | ||||||
|  |  | ||||||
| The backend is a django application. PyCharm works well for development, but you can use whatever |  | ||||||
| you want. |  | ||||||
|  |  | ||||||
| Configure the IDE to use the src/ folder as the base source folder. Configure the following |  | ||||||
| launch configurations in your IDE: |  | ||||||
|  |  | ||||||
| *   python3 manage.py runserver |  | ||||||
| *   celery --app paperless worker |  | ||||||
| *   python3 manage.py document_consumer |  | ||||||
|  |  | ||||||
| To start them all: |  | ||||||
|  |  | ||||||
| .. code:: shell-session |  | ||||||
|  |  | ||||||
|     python3 manage.py runserver & python3 manage.py document_consumer & celery --app paperless worker |  | ||||||
|  |  | ||||||
| Testing and code style: |  | ||||||
|  |  | ||||||
| *   Run ``pytest`` in the src/ directory to execute all tests. This also generates a HTML coverage |  | ||||||
|     report. When running tests, paperless.conf is loaded as well. However: the tests rely on the default |  | ||||||
|     configuration. This is not ideal. But for now, make sure no settings except for DEBUG are overridden when testing. |  | ||||||
| *   Coding style is enforced by the Git pre-commit hooks.  These will ensure your code is formatted and do some |  | ||||||
|     linting when you do a `git commit`. |  | ||||||
| *   You can also run ``black`` manually to format your code |  | ||||||
|  |  | ||||||
|     .. note:: |  | ||||||
|  |  | ||||||
|         The line length rule E501 is generally useful for getting multiple source files |  | ||||||
|         next to each other on the screen. However, in some cases, it's just not possible |  | ||||||
|         to make some lines fit, especially complicated IF cases. Append ``# NOQA: E501`` |  | ||||||
|         to disable this check for certain lines. |  | ||||||
|  |  | ||||||
| Front end development |  | ||||||
| ===================== |  | ||||||
|  |  | ||||||
| The front end is built using Angular. In order to get started, you need ``npm``. |  | ||||||
| Install the Angular CLI interface with |  | ||||||
|  |  | ||||||
| .. code:: shell-session |  | ||||||
|  |  | ||||||
|     $ npm install -g @angular/cli |  | ||||||
|  |  | ||||||
| and make sure that it's on your path. Next, in the src-ui/ directory, install the |  | ||||||
| required dependencies of the project. |  | ||||||
|  |  | ||||||
| .. code:: shell-session |  | ||||||
|  |  | ||||||
|     $ npm install |  | ||||||
|  |  | ||||||
| You can launch a development server by running |  | ||||||
|  |  | ||||||
| .. code:: shell-session |  | ||||||
|  |  | ||||||
|     $ ng serve |  | ||||||
|  |  | ||||||
| This will automatically update whenever you save. However, in-place compilation might fail |  | ||||||
| on syntax errors, in which case you need to restart it. |  | ||||||
|  |  | ||||||
| By default, the development server is available on ``http://localhost:4200/`` and is configured |  | ||||||
| to access the API at ``http://localhost:8000/api/``, which is the default of the backend. |  | ||||||
| If you enabled DEBUG on the back end, several security overrides for allowed hosts, CORS and |  | ||||||
| X-Frame-Options are in place so that the front end behaves exactly as in production. This also |  | ||||||
| relies on you being logged into the back end. Without a valid session, the front end will simply |  | ||||||
| not work. |  | ||||||
|  |  | ||||||
| Testing and code style: |  | ||||||
|  |  | ||||||
| *   The frontend code (.ts, .html, .scss) uses ``prettier`` for code formatting via the Git |  | ||||||
|     ``pre-commit`` hooks which run automatically on commit. See |  | ||||||
|     :ref:`above <code-formatting-with-pre-commit-hooks>` for installation. You can also run this |  | ||||||
|     via cli with a command such as |  | ||||||
|  |  | ||||||
|     .. code:: shell-session |  | ||||||
|  |  | ||||||
|         $ git ls-files -- '*.ts' | xargs pre-commit run prettier --files |  | ||||||
|  |  | ||||||
| *   Frontend testing uses jest and cypress. There is currently a need for significantly more |  | ||||||
|     frontend tests. Unit tests and e2e tests, respectively, can be run non-interactively with: |  | ||||||
|  |  | ||||||
|     .. code:: shell-session |  | ||||||
|  |  | ||||||
|         $ ng test |  | ||||||
|         $ npm run e2e:ci |  | ||||||
|  |  | ||||||
|     Cypress also includes a UI which can be run from within the ``src-ui`` directory with |  | ||||||
|  |  | ||||||
|     .. code:: shell-session |  | ||||||
|  |  | ||||||
|         $ ./node_modules/.bin/cypress open |  | ||||||
|  |  | ||||||
| In order to build the front end and serve it as part of django, execute |  | ||||||
|  |  | ||||||
| .. code:: shell-session |  | ||||||
|  |  | ||||||
|     $ ng build --prod |  | ||||||
|  |  | ||||||
| This will build the front end and put it in a location from which the Django server will serve |  | ||||||
| it as static content. This way, you can verify that authentication is working. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| Localization |  | ||||||
| ============ |  | ||||||
|  |  | ||||||
| Paperless is available in many different languages. Since paperless consists both of a django |  | ||||||
| application and an Angular front end, both these parts have to be translated separately. |  | ||||||
|  |  | ||||||
| Front end localization |  | ||||||
| ---------------------- |  | ||||||
|  |  | ||||||
| *   The Angular front end does localization according to the `Angular documentation <https://angular.io/guide/i18n>`_. |  | ||||||
| *   The source language of the project is "en_US". |  | ||||||
| *   The source strings end up in the file "src-ui/messages.xlf". |  | ||||||
| *   The translated strings need to be placed in the "src-ui/src/locale/" folder. |  | ||||||
| *   In order to extract added or changed strings from the source files, call ``ng xi18n --ivy``. |  | ||||||
|  |  | ||||||
| Adding new languages requires adding the translated files in the "src-ui/src/locale/" folder and adjusting a couple of files. |  | ||||||
|  |  | ||||||
| 1.  Adjust "src-ui/angular.json": |  | ||||||
|  |  | ||||||
|     .. code:: json |  | ||||||
|  |  | ||||||
|         "i18n": { |  | ||||||
|             "sourceLocale": "en-US", |  | ||||||
|             "locales": { |  | ||||||
|                 "de": "src/locale/messages.de.xlf", |  | ||||||
|                 "nl-NL": "src/locale/messages.nl_NL.xlf", |  | ||||||
|                 "fr": "src/locale/messages.fr.xlf", |  | ||||||
|                 "en-GB": "src/locale/messages.en_GB.xlf", |  | ||||||
|                 "pt-BR": "src/locale/messages.pt_BR.xlf", |  | ||||||
|                 "language-code": "language-file" |  | ||||||
|             } |  | ||||||
|         } |  | ||||||
|  |  | ||||||
| 2.  Add the language to the available options in "src-ui/src/app/services/settings.service.ts": |  | ||||||
|  |  | ||||||
|     .. code:: typescript |  | ||||||
|  |  | ||||||
|         getLanguageOptions(): LanguageOption[] { |  | ||||||
|             return [ |  | ||||||
|                 {code: "en-us", name: $localize`English (US)`, englishName: "English (US)", dateInputFormat: "mm/dd/yyyy"}, |  | ||||||
|                 {code: "en-gb", name: $localize`English (GB)`, englishName: "English (GB)", dateInputFormat: "dd/mm/yyyy"}, |  | ||||||
|                 {code: "de", name: $localize`German`, englishName: "German", dateInputFormat: "dd.mm.yyyy"}, |  | ||||||
|                 {code: "nl", name: $localize`Dutch`, englishName: "Dutch", dateInputFormat: "dd-mm-yyyy"}, |  | ||||||
|                 {code: "fr", name: $localize`French`, englishName: "French", dateInputFormat: "dd/mm/yyyy"}, |  | ||||||
|                 {code: "pt-br", name: $localize`Portuguese (Brazil)`, englishName: "Portuguese (Brazil)", dateInputFormat: "dd/mm/yyyy"}, |  | ||||||
|                 // Add your new language here |  | ||||||
|             ] |  | ||||||
|         } |  | ||||||
|  |  | ||||||
|     ``dateInputFormat`` is a special string that defines the behavior of the date input fields and absolutely needs to contain "dd", "mm" and "yyyy". |  | ||||||
|  |  | ||||||
| 3.  Import and register the Angular data for this locale in "src-ui/src/app/app.module.ts": |  | ||||||
|  |  | ||||||
|     .. code:: typescript |  | ||||||
|  |  | ||||||
|         import localeDe from '@angular/common/locales/de'; |  | ||||||
|         registerLocaleData(localeDe) |  | ||||||
|  |  | ||||||
| Back end localization |  | ||||||
| --------------------- |  | ||||||
|  |  | ||||||
| A majority of the strings that appear in the back end appear only when the admin is used. However, |  | ||||||
| some of these are still shown on the front end (such as error messages). |  | ||||||
|  |  | ||||||
| *   The django application does localization according to the `django documentation <https://docs.djangoproject.com/en/3.1/topics/i18n/translation/>`_. |  | ||||||
| *   The source language of the project is "en_US". |  | ||||||
| *   Localization files end up in the folder "src/locale/". |  | ||||||
| *   In order to extract strings from the application, call ``python3 manage.py makemessages -l en_US``. This is important after making changes to translatable strings. |  | ||||||
| *   The message files need to be compiled for them to show up in the application. Call ``python3 manage.py compilemessages`` to do this. The generated files don't get |  | ||||||
|     committed into git, since these are derived artifacts. The build pipeline takes care of executing this command. |  | ||||||
|  |  | ||||||
| Adding new languages requires adding the translated files in the "src/locale/" folder and adjusting the file "src/paperless/settings.py" to include the new language: |  | ||||||
|  |  | ||||||
| .. code:: python |  | ||||||
|  |  | ||||||
|     LANGUAGES = [ |  | ||||||
|         ("en-us", _("English (US)")), |  | ||||||
|         ("en-gb", _("English (GB)")), |  | ||||||
|         ("de", _("German")), |  | ||||||
|         ("nl-nl", _("Dutch")), |  | ||||||
|         ("fr", _("French")), |  | ||||||
|         ("pt-br", _("Portuguese (Brazil)")), |  | ||||||
|         # Add language here. |  | ||||||
|     ] |  | ||||||
|  |  | ||||||
|  |  | ||||||
| Building the documentation |  | ||||||
| ========================== |  | ||||||
|  |  | ||||||
| The documentation is built using sphinx. I've configured ReadTheDocs to automatically build |  | ||||||
| the documentation when changes are pushed. If you want to build the documentation locally, |  | ||||||
| this is how you do it: |  | ||||||
|  |  | ||||||
| 1.  Install python dependencies. |  | ||||||
|  |  | ||||||
|     .. code:: shell-session |  | ||||||
|  |  | ||||||
|         $ cd /path/to/paperless |  | ||||||
|         $ pipenv install --dev |  | ||||||
|  |  | ||||||
| 2.  Build the documentation |  | ||||||
|  |  | ||||||
|     .. code:: shell-session |  | ||||||
|  |  | ||||||
|         $ cd /path/to/paperless/docs |  | ||||||
|         $ pipenv run make clean html |  | ||||||
|  |  | ||||||
| This will build the HTML documentation, and put the resulting files in the ``_build/html`` |  | ||||||
| directory. |  | ||||||
|  |  | ||||||
| Building the Docker image |  | ||||||
| ========================= |  | ||||||
|  |  | ||||||
| The docker image is primarily built by the GitHub actions workflow, but it can be |  | ||||||
| faster when developing to build and tag an image locally. |  | ||||||
|  |  | ||||||
| To provide the build arguments automatically, build the image using the helper |  | ||||||
| script ``build-docker-image.sh``. |  | ||||||
|  |  | ||||||
| Building the docker image from source: |  | ||||||
|  |  | ||||||
|     .. code:: shell-session |  | ||||||
|  |  | ||||||
|         ./build-docker-image.sh Dockerfile -t <your-tag> |  | ||||||
|  |  | ||||||
| Extending Paperless |  | ||||||
| =================== |  | ||||||
|  |  | ||||||
| Paperless does not have any fancy plugin systems and probably never will. However, |  | ||||||
| some parts of the application have been designed to allow easy integration of additional |  | ||||||
| features without any modification to the base code. |  | ||||||
|  |  | ||||||
| Making custom parsers |  | ||||||
| --------------------- |  | ||||||
|  |  | ||||||
| Paperless uses parsers to add documents to paperless. A parser is responsible for: |  | ||||||
|  |  | ||||||
| *   Retrieving the content from the original |  | ||||||
| *   Creating a thumbnail |  | ||||||
| *   Optional: Retrieving a created date from the original |  | ||||||
| *   Optional: Creating an archived document from the original |  | ||||||
|  |  | ||||||
| Custom parsers can be added to paperless to support more file types. In order to do that, |  | ||||||
| you need to write the parser itself and announce its existence to paperless. |  | ||||||
|  |  | ||||||
| The parser itself must extend ``documents.parsers.DocumentParser`` and must implement the |  | ||||||
| methods ``parse`` and ``get_thumbnail``. You can provide your own implementation to |  | ||||||
| ``get_date`` if you don't want to rely on paperless' default date guessing mechanisms. |  | ||||||
|  |  | ||||||
| .. code:: python |  | ||||||
|  |  | ||||||
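|     import os |  | ||||||
|  |  | ||||||
|     from documents.parsers import DocumentParser |  | ||||||
|  |  | ||||||
|  |  | ||||||
|     class MyCustomParser(DocumentParser): |  | ||||||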
|  |  | ||||||
|         def parse(self, document_path, mime_type): |  | ||||||
|             # This method does not return anything. Rather, you should assign |  | ||||||
|             # whatever you got from the document to the following fields: |  | ||||||
|  |  | ||||||
|             # The content of the document. |  | ||||||
|             self.text = "content" |  | ||||||
|  |  | ||||||
|             # Optional: path to a PDF document that you created from the original. |  | ||||||
|             self.archive_path = os.path.join(self.tempdir, "archived.pdf") |  | ||||||
|  |  | ||||||
|             # Optional: "created" date of the document. |  | ||||||
|             # get_created_from_metadata is a placeholder for your own helper. |  | ||||||
|             self.date = get_created_from_metadata(document_path) |  | ||||||
|  |  | ||||||
|         def get_thumbnail(self, document_path, mime_type): |  | ||||||
|             # This should return the path to a thumbnail you created for this |  | ||||||
|             # document. |  | ||||||
|             return os.path.join(self.tempdir, "thumb.png") |  | ||||||
|  |  | ||||||
| If you encounter any issues during parsing, raise a ``documents.parsers.ParseError``. |  | ||||||
|  |  | ||||||
| The ``self.tempdir`` directory is a temporary directory that is guaranteed to be empty |  | ||||||
| and is removed after consumption has finished. You can use that directory to store any |  | ||||||
| intermediate files and also use it to store the thumbnail / archived document. |  | ||||||
|  |  | ||||||
| After that, you need to announce your parser to paperless. You need to connect a |  | ||||||
| handler to the ``document_consumer_declaration`` signal. Have a look in the file |  | ||||||
| ``src/paperless_tesseract/apps.py`` on how that's done. The handler is a method |  | ||||||
| that returns information about your parser: |  | ||||||
|  |  | ||||||
| .. code:: python |  | ||||||
|  |  | ||||||
|     def myparser_consumer_declaration(sender, **kwargs): |  | ||||||
|         return { |  | ||||||
|             "parser": MyCustomParser, |  | ||||||
|             "weight": 0, |  | ||||||
|             "mime_types": { |  | ||||||
|                 "application/pdf": ".pdf", |  | ||||||
|                 "image/jpeg": ".jpg", |  | ||||||
|             } |  | ||||||
|         } |  | ||||||
|  |  | ||||||
| *   ``parser`` is a reference to a class that extends ``DocumentParser``. |  | ||||||
|  |  | ||||||
| *   ``weight`` is used whenever two or more parsers are able to parse a file: The parser with |  | ||||||
|     the higher weight wins. This can be used to override the parsers provided by |  | ||||||
|     paperless. |  | ||||||
|  |  | ||||||
| *   ``mime_types`` is a dictionary. The keys are the mime types your parser supports and the value |  | ||||||
|     is the default file extension that paperless should use when storing files and serving them for |  | ||||||
|     download. We could guess that from the file extensions, but some mime types have many extensions |  | ||||||
|     associated with them and the python methods responsible for guessing the extension do not always |  | ||||||
|     return the same value. |  | ||||||
|  |  | ||||||
| .. _code of conduct: https://github.com/paperless-ngx/paperless-ngx/blob/main/CODE_OF_CONDUCT.md |  | ||||||
| .. _contributing guidelines: https://github.com/paperless-ngx/paperless-ngx/blob/main/CONTRIBUTING.md |  | ||||||
|   | |||||||
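
The ``weight`` rule described in the removed ``docs/extending.rst`` text above can be illustrated with a minimal Python sketch. This is not Paperless-ngx code: ``pick_parser``, the two stub parser classes, and the ``DECLARATIONS`` list are hypothetical names introduced for illustration. The only assumption taken from the text is that each declaration is a dict with the ``parser``, ``weight``, and ``mime_types`` keys.

```python
from typing import Optional


class StubOcrParser:
    """Stand-in for a parser shipped with the application (hypothetical)."""


class MyCustomParser:
    """Stand-in for a custom parser (hypothetical)."""


# Declarations shaped like the dict returned by a consumer declaration handler.
DECLARATIONS = [
    {
        "parser": StubOcrParser,
        "weight": 0,
        "mime_types": {"application/pdf": ".pdf", "image/jpeg": ".jpg"},
    },
    {
        "parser": MyCustomParser,
        "weight": 10,
        "mime_types": {"application/pdf": ".pdf"},
    },
]


def pick_parser(declarations: list, mime_type: str) -> Optional[type]:
    """Return the parser class with the highest weight that supports mime_type."""
    candidates = [d for d in declarations if mime_type in d["mime_types"]]
    if not candidates:
        return None
    # When several parsers support the type, the higher weight wins.
    return max(candidates, key=lambda d: d["weight"])["parser"]
```

With these declarations, a PDF would go to the custom parser because its weight is higher, while a JPEG would still go to the stub OCR parser, the only one that declares that type.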
							
								
								
									
113 docs/faq.rst
									
									
									
									
									
								
							| @@ -1,117 +1,12 @@ | |||||||
|  | .. _faq: | ||||||
|  |  | ||||||
| ************************** | ************************** | ||||||
| Frequently asked questions | Frequently asked questions | ||||||
| ************************** | ************************** | ||||||
|  |  | ||||||
| **Q:** *What's the general plan for Paperless-ngx?* |  | ||||||
|  |  | ||||||
| **A:** While Paperless-ngx is already considered largely "feature-complete", it is a community-driven | .. cssclass:: redirect-notice | ||||||
| project and development will be guided in this way. New features can be submitted via |  | ||||||
| GitHub discussions and "up-voted" by the community but this is not a guarantee the feature |  | ||||||
| will be implemented. This project will always be open to collaboration in the form of PRs, |  | ||||||
| ideas etc. |  | ||||||
|  |  | ||||||
| **Q:** *I'm using docker. Where are my documents?* |     The Paperless-ngx documentation has permanently moved. | ||||||
|  |  | ||||||
| **A:** Your documents are stored inside the docker volume ``paperless_media``. |     You will be redirected shortly... | ||||||
| Docker manages this volume automatically for you. It is a persistent storage |  | ||||||
| and will persist as long as you don't explicitly delete it. The actual location |  | ||||||
| depends on your host operating system. On Linux, chances are high that this location |  | ||||||
| is |  | ||||||
|  |  | ||||||
| .. code:: |  | ||||||
|  |  | ||||||
|     /var/lib/docker/volumes/paperless_media/_data |  | ||||||
|  |  | ||||||
| .. caution:: |  | ||||||
|  |  | ||||||
|     Do not mess with this folder. Don't change permissions and don't move |  | ||||||
|     files around manually. This folder is meant to be entirely managed by docker |  | ||||||
|     and paperless. |  | ||||||
|  |  | ||||||
| **Q:** *Let's say I want to switch tools in a year. Can I easily move to other systems?* |  | ||||||
|  |  | ||||||
| **A:** Your documents are stored as plain files inside the media folder. You can always drag those files |  | ||||||
| out of that folder to use them elsewhere. Here are a couple of notes about that. |  | ||||||
|  |  | ||||||
| *   Paperless-ngx never modifies your original documents. It keeps checksums of all documents and uses a |  | ||||||
|     scheduled sanity checker to check that they remain the same. |  | ||||||
| *   By default, paperless uses the internal ID of each document as its filename. This might not be very |  | ||||||
|     convenient for export. However, you can adjust the way files are stored in paperless by |  | ||||||
|     :ref:`configuring the filename format <advanced-file_name_handling>`. |  | ||||||
| *   :ref:`The exporter <utilities-exporter>` is another easy way to get your files out of paperless with reasonable file names. |  | ||||||
|  |  | ||||||
| **Q:** *What file types does paperless-ngx support?* |  | ||||||
|  |  | ||||||
| **A:** Currently, the following files are supported: |  | ||||||
|  |  | ||||||
| *   PDF documents, PNG images, JPEG images, TIFF images and GIF images are processed with OCR and converted into PDF documents. |  | ||||||
| *   Plain text documents are supported as well and are added verbatim |  | ||||||
|     to paperless. |  | ||||||
| *   With the optional Tika integration enabled (see :ref:`Configuration <configuration-tika>`), Paperless also supports various |  | ||||||
|     Office documents (.docx, .doc, .odt, .ppt, .pptx, .odp, .xls, .xlsx, .ods). |  | ||||||
|  |  | ||||||
| Paperless-ngx determines the type of a file by inspecting its content. The |  | ||||||
| file extensions do not matter. |  | ||||||
|  |  | ||||||
| **Q:** *Will paperless-ngx run on Raspberry Pi?* |  | ||||||
|  |  | ||||||
| **A:** The short answer is yes. I've tested it on a Raspberry Pi 3 B. |  | ||||||
| The long answer is that certain parts of |  | ||||||
| Paperless will run very slowly, such as the OCR. On Raspberry Pi, |  | ||||||
| try to OCR documents before feeding them into paperless so that paperless can |  | ||||||
| reuse the text. The web interface is a lot snappier, since it runs |  | ||||||
| in your browser and paperless has to do much less work to serve the data. |  | ||||||
|  |  | ||||||
| .. note:: |  | ||||||
|  |  | ||||||
|     You can adjust some of the settings so that paperless uses less processing |  | ||||||
|     power. See :ref:`setup-less_powerful_devices` for details. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| **Q:** *How do I install paperless-ngx on Raspberry Pi?* |  | ||||||
|  |  | ||||||
| **A:** Docker images are available for arm and arm64 hardware, so just follow |  | ||||||
| the docker-compose instructions. Apart from more required disk space compared to |  | ||||||
| a bare metal installation, docker comes with close to zero overhead, even on |  | ||||||
| Raspberry Pi. |  | ||||||
|  |  | ||||||
| If you decide to go with the bare metal route, be aware that some of the |  | ||||||
| python requirements do not have precompiled packages for ARM / ARM64. Installation |  | ||||||
| of these will require additional development libraries and compilation will take |  | ||||||
| a long time. |  | ||||||
|  |  | ||||||
| **Q:** *How do I run this on Unraid?* |  | ||||||
|  |  | ||||||
| **A:** Paperless-ngx is available as a `community app <https://unraid.net/community/apps?q=paperless-ngx>`_ |  | ||||||
| in Unraid. `Uli Fahrer <https://github.com/Tooa>`_ created a container template for that. |  | ||||||
|  |  | ||||||
| **Q:** *How do I run this on my toaster?* |  | ||||||
|  |  | ||||||
| **A:** I honestly don't know! As for all other devices that might be able |  | ||||||
| to run paperless, you're a bit on your own. If you can't run the docker image, |  | ||||||
| the documentation has instructions for bare metal installs. I'm running |  | ||||||
| paperless on an i3 processor from 2015 or so. This is also what I use to test |  | ||||||
| new releases with. Apart from that, I also have a Raspberry Pi, which I |  | ||||||
| occasionally build the image on and see if it works. |  | ||||||
|  |  | ||||||
| **Q:** *How do I proxy this with NGINX?* |  | ||||||
|  |  | ||||||
| **A:** See :ref:`here <setup-nginx>`. |  | ||||||
|  |  | ||||||
| .. _faq-mod_wsgi: |  | ||||||
|  |  | ||||||
| **Q:** *How do I get WebSocket support with Apache mod_wsgi?* |  | ||||||
|  |  | ||||||
| **A:** ``mod_wsgi`` by itself does not support ASGI. Paperless will continue |  | ||||||
| to work with WSGI, but certain features such as status notifications about |  | ||||||
| document consumption won't be available. |  | ||||||
|  |  | ||||||
| If you want to continue using ``mod_wsgi``, you will have to run an ASGI-enabled |  | ||||||
| web server as well that processes WebSocket connections, and configure Apache to |  | ||||||
| redirect WebSocket connections to this server. Multiple options for ASGI servers |  | ||||||
| exist: |  | ||||||
|  |  | ||||||
| * ``gunicorn`` with ``uvicorn`` as the worker implementation (the default of paperless) |  | ||||||
| * ``daphne`` as a standalone server, which is the reference implementation for ASGI. |  | ||||||
| * ``uvicorn`` as a standalone server |  | ||||||
|   | |||||||
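
The removed FAQ text above says that checksums of all documents are kept and that a scheduled sanity checker verifies the originals remain unchanged. A minimal sketch of that idea in Python follows. It is an illustration only, not the project's implementation: the function names, the use of SHA-256, and the ``expected`` mapping of file names to checksums are all assumptions.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large documents are not read into memory at once."""
    digest = hashlib.sha256()  # hash algorithm is an assumption for this sketch
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_modified(expected: dict[str, str], media_dir: Path) -> list[str]:
    """Return the names of files that are missing or whose checksum differs."""
    problems = []
    for name, checksum in expected.items():
        path = media_dir / name
        if not path.is_file() or file_checksum(path) != checksum:
            problems.append(name)
    return problems
```

A scheduled job could call ``find_modified`` with the stored checksums and report any names it returns.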
| @@ -2,74 +2,24 @@ | |||||||
| Paperless | Paperless | ||||||
| ********* | ********* | ||||||
|  |  | ||||||
| Paperless is a simple Django application running in two parts: |  | ||||||
| a *Consumer* (the thing that does the indexing) and |  | ||||||
| the *Web server* (the part that lets you search & |  | ||||||
| download already-indexed documents). If you want to learn more about its |  | ||||||
| functions keep on reading after the installation section. |  | ||||||
|  |  | ||||||
|  | .. cssclass:: redirect-notice | ||||||
|  |  | ||||||
| Why This Exists |     The Paperless-ngx documentation has permanently moved. | ||||||
| =============== |  | ||||||
|  |  | ||||||
| Paper is a nightmare.  Environmental issues aside, there's no excuse for it in |     You will be redirected shortly... | ||||||
| the 21st century.  It takes up space, collects dust, doesn't support any form |  | ||||||
| of a search feature, indexing is tedious, it's heavy and prone to damage & |  | ||||||
| loss. |  | ||||||
|  |  | ||||||
| I wrote this to make "going paperless" easier.  I do not have to worry about |  | ||||||
| finding stuff again. I feed documents right from the post box into the scanner |  | ||||||
| and then shred them.  Perhaps you might find it useful too. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| Paperless-ngx |  | ||||||
| ============= |  | ||||||
|  |  | ||||||
| Paperless-ngx is a document management system that transforms your physical |  | ||||||
| documents into a searchable online archive so you can keep, well, *less paper*. |  | ||||||
|  |  | ||||||
| Paperless-ngx forked from paperless-ng to continue the great work and |  | ||||||
| distribute responsibility of supporting and advancing the project among a team |  | ||||||
| of people. |  | ||||||
|  |  | ||||||
| NG stands for both Angular (the framework used for the |  | ||||||
| Frontend) and next-gen. Publishing this project under a different name also |  | ||||||
| avoids confusion between paperless and paperless-ngx. |  | ||||||
|  |  | ||||||
| If you want to learn about what's different in paperless-ngx from Paperless, check out these |  | ||||||
| resources in the documentation: |  | ||||||
|  |  | ||||||
| *   :ref:`Some screenshots <screenshots>` of the new UI are available. |  | ||||||
| *   Read :ref:`this section <advanced-automatic_matching>` if you want to |  | ||||||
|     learn about how paperless automates all tagging using machine learning. |  | ||||||
| *   Paperless now comes with a :ref:`proper email consumer <usage-email>` |  | ||||||
|     that's fully tested and production ready. |  | ||||||
| *   Paperless creates searchable PDF/A documents from whatever you put into |  | ||||||
|     the consumption directory. This means that you can select text in |  | ||||||
|     image-only documents coming from your scanner. |  | ||||||
| *   See :ref:`this note <utilities-encyption>` about GnuPG encryption in |  | ||||||
|     paperless-ngx. |  | ||||||
| *   Paperless is now integrated with a |  | ||||||
|     :ref:`task processing queue <setup-task_processor>` that tells you |  | ||||||
|     at a glance when and why something is not working. |  | ||||||
| *   The :doc:`changelog </changelog>` contains a detailed list of all changes |  | ||||||
|     in paperless-ngx. |  | ||||||
|  |  | ||||||
| Contents |  | ||||||
| ======== |  | ||||||
|  |  | ||||||
| .. toctree:: | .. toctree:: | ||||||
|    :maxdepth: 1 |  | ||||||
|  |  | ||||||
|    setup |     screenshots | ||||||
|    usage_overview |     scanners | ||||||
|    advanced_usage |     administration | ||||||
|    administration |     advanced_usage | ||||||
|    configuration |     usage_overview | ||||||
|    api |     setup | ||||||
|    faq |     troubleshooting | ||||||
|    troubleshooting |     changelog | ||||||
|    extending |     configuration | ||||||
|    scanners |     extending | ||||||
|    screenshots |     api | ||||||
|    changelog |     faq | ||||||
|   | |||||||
| @@ -1,8 +1,12 @@ | |||||||
|  |  | ||||||
| .. _scanners: | .. _scanners: | ||||||
|  |  | ||||||
| ******************* | ******************* | ||||||
| Scanners & Software | Scanners & Software | ||||||
| ******************* | ******************* | ||||||
|  |  | ||||||
| Paperless-ngx is compatible with many different scanners and scanning tools. A user-maintained list of scanners and other software is available on `the wiki <https://github.com/paperless-ngx/paperless-ngx/wiki/Scanner-&-Software-Recommendations>`_. |  | ||||||
|  | .. cssclass:: redirect-notice | ||||||
|  |  | ||||||
|  |     The Paperless-ngx documentation has permanently moved. | ||||||
|  |  | ||||||
|  |     You will be redirected shortly... | ||||||
|   | |||||||
| @@ -4,60 +4,9 @@ | |||||||
| Screenshots | Screenshots | ||||||
| *********** | *********** | ||||||
|  |  | ||||||
| This is what Paperless-ngx looks like. |  | ||||||
|  |  | ||||||
| The dashboard shows customizable views on your document and allows document uploads: | .. cssclass:: redirect-notice | ||||||
|  |  | ||||||
| .. image:: _static/screenshots/dashboard.png |     The Paperless-ngx documentation has permanently moved. | ||||||
|     :target: _static/screenshots/dashboard.png |  | ||||||
|  |  | ||||||
| The document list provides three different styles to scroll through your documents: |     You will be redirected shortly... | ||||||
|  |  | ||||||
| .. image:: _static/screenshots/documents-table.png |  | ||||||
|     :target: _static/screenshots/documents-table.png |  | ||||||
| .. image:: _static/screenshots/documents-smallcards.png |  | ||||||
|     :target: _static/screenshots/documents-smallcards.png |  | ||||||
| .. image:: _static/screenshots/documents-largecards.png |  | ||||||
|     :target: _static/screenshots/documents-largecards.png |  | ||||||
|  |  | ||||||
| Paperless-ngx also supports "dark mode": |  | ||||||
|  |  | ||||||
| .. image:: _static/screenshots/documents-smallcards-dark.png |  | ||||||
|     :target: _static/screenshots/documents-smallcards-dark.png |  | ||||||
|  |  | ||||||
| Extensive filtering mechanisms: |  | ||||||
|  |  | ||||||
| .. image:: _static/screenshots/documents-filter.png |  | ||||||
|     :target: _static/screenshots/documents-filter.png |  | ||||||
|  |  | ||||||
| Bulk editing of document tags, correspondents, etc.: |  | ||||||
|  |  | ||||||
| .. image:: _static/screenshots/bulk-edit.png |  | ||||||
|     :target: _static/screenshots/bulk-edit.png |  | ||||||
|  |  | ||||||
| Side-by-side editing of documents: |  | ||||||
|  |  | ||||||
| .. image:: _static/screenshots/editing.png |  | ||||||
|     :target: _static/screenshots/editing.png |  | ||||||
|  |  | ||||||
| Tag editing. This looks about the same for correspondents and document types. |  | ||||||
|  |  | ||||||
| .. image:: _static/screenshots/new-tag.png |  | ||||||
|     :target: _static/screenshots/new-tag.png |  | ||||||
|  |  | ||||||
| Searching provides auto complete and highlights the results. |  | ||||||
|  |  | ||||||
| .. image:: _static/screenshots/search-preview.png |  | ||||||
|     :target: _static/screenshots/search-preview.png |  | ||||||
| .. image:: _static/screenshots/search-results.png |  | ||||||
|     :target: _static/screenshots/search-results.png |  | ||||||
|  |  | ||||||
| Fancy mail filters! |  | ||||||
|  |  | ||||||
| .. image:: _static/screenshots/mail-rules-edited.png |  | ||||||
|     :target: _static/screenshots/mail-rules-edited.png |  | ||||||
|  |  | ||||||
| Mobile devices are supported. |  | ||||||
|  |  | ||||||
| .. image:: _static/screenshots/mobile.png |  | ||||||
|     :target: _static/screenshots/mobile.png |  | ||||||
|   | |||||||
							
								
								
									
890 docs/setup.rst
									
									
									
									
									
								
							| @@ -1,894 +1,12 @@ | |||||||
|  | .. _setup: | ||||||
|  |  | ||||||
| ***** | ***** | ||||||
| Setup | Setup | ||||||
| ***** | ***** | ||||||
|  |  | ||||||
| Overview of Paperless-ngx |  | ||||||
| ######################### |  | ||||||
|  |  | ||||||
| Compared to paperless, paperless-ngx works a little differently under the hood and has | .. cssclass:: redirect-notice | ||||||
| more moving parts that work together. While this increases the complexity of |  | ||||||
| the system, it also brings many benefits. |  | ||||||
|  |  | ||||||
| Paperless consists of the following components: |     The Paperless-ngx documentation has permanently moved. | ||||||
|  |  | ||||||
| *   **The webserver:** This is pretty much the same as in paperless. It serves |     You will be redirected shortly... | ||||||
|     the administration pages, the API, and the new frontend. This is the main |  | ||||||
|     tool you'll be using to interact with paperless. You may start the webserver |  | ||||||
|     with |  | ||||||
|  |  | ||||||
|     .. code:: shell-session |  | ||||||
|  |  | ||||||
|         $ cd /path/to/paperless/src/ |  | ||||||
|         $ gunicorn -c ../gunicorn.conf.py paperless.wsgi |  | ||||||
|  |  | ||||||
|     or by any other means such as Apache ``mod_wsgi``. |  | ||||||
|  |  | ||||||
| *   **The consumer:** This is what watches your consumption folder for documents. |  | ||||||
|     However, the consumer itself does not really consume your documents. |  | ||||||
|     Now it notifies a task processor that a new file is ready for consumption. |  | ||||||
|     I suppose it should be named differently. |  | ||||||
|     This was also used to check your emails, but that's now done elsewhere as well. |  | ||||||
|  |  | ||||||
|     Start the consumer with the management command ``document_consumer``: |  | ||||||
|  |  | ||||||
|     .. code:: shell-session |  | ||||||
|  |  | ||||||
|         $ cd /path/to/paperless/src/ |  | ||||||
|         $ python3 manage.py document_consumer |  | ||||||
|  |  | ||||||
|     .. _setup-task_processor: |  | ||||||
|  |  | ||||||
| *   **The task processor:** Paperless relies on `Celery - Distributed Task Queue <https://docs.celeryq.dev/en/stable/index.html>`_ |  | ||||||
|     for doing most of the heavy lifting. This is a task queue that accepts tasks from |  | ||||||
|     multiple sources and processes these in parallel. It also comes with a scheduler that executes |  | ||||||
|     certain commands periodically. |  | ||||||
|  |  | ||||||
|     This task processor is responsible for: |  | ||||||
|  |  | ||||||
|     *   Consuming documents. When the consumer finds new documents, it notifies the task processor to |  | ||||||
|         start a consumption task. |  | ||||||
|     *   The task processor also performs the consumption of any documents you upload through |  | ||||||
|         the web interface. |  | ||||||
|     *   Consuming emails. It periodically checks your configured accounts for new emails and |  | ||||||
|         notifies the task processor to consume the attachment of an email. |  | ||||||
|     *   Maintaining the search index and the automatic matching algorithm. These are things that paperless |  | ||||||
|         needs to do from time to time in order to operate properly. |  | ||||||
|  |  | ||||||
|     This allows paperless to process multiple documents from your consumption folder in parallel! On |  | ||||||
|     a modern multi-core system, this makes the consumption process with full OCR blazingly fast. |  | ||||||
|  |  | ||||||
|     The task processor comes with a built-in admin interface that you can use to check whether any of the |  | ||||||
|     tasks have failed and to inspect the errors (e.g., wrong email credentials or errors while consuming a |  | ||||||
|     specific file). |  | ||||||
|  |  | ||||||
| *   A `redis <https://redis.io/>`_ message broker: This is a really lightweight service that is responsible |  | ||||||
|     for getting the tasks from the webserver and the consumer to the task processor. These run in different |  | ||||||
|     processes (maybe even on different machines!), which is why a broker is necessary. |  | ||||||
|  |  | ||||||
| *   Optional: A database server. Paperless supports PostgreSQL, MariaDB and SQLite for storing its data. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| Installation |  | ||||||
| ############ |  | ||||||
|  |  | ||||||
| You can go multiple routes to set up and run Paperless: |  | ||||||
|  |  | ||||||
| * :ref:`Use the easy install docker script <setup-docker_script>` |  | ||||||
| * :ref:`Pull the image from Docker Hub <setup-docker_hub>` |  | ||||||
| * :ref:`Build the Docker image yourself <setup-docker_build>` |  | ||||||
| * :ref:`Install Paperless directly on your system manually (bare metal) <setup-bare_metal>` |  | ||||||
|  |  | ||||||
| The Docker routes are quick & easy. These are the recommended routes. This configures all the stuff |  | ||||||
| from the above automatically so that it just works and uses sensible defaults for all configuration options. |  | ||||||
| Here you find a cheat-sheet for docker beginners: `CLI Basics <https://www.sehn.tech/refs/devops-with-docker/>`_ |  | ||||||
|  |  | ||||||
| The bare metal route is complicated to set up but makes it easier |  | ||||||
| should you want to contribute some code back. You need to configure and |  | ||||||
| run the above mentioned components yourself. |  | ||||||
|  |  | ||||||
| .. _CLI Basics: https://www.sehn.tech/refs/devops-with-docker/ |  | ||||||
|  |  | ||||||
| .. _setup-docker_script: |  | ||||||
|  |  | ||||||
| Install Paperless from Docker Hub using the installation script |  | ||||||
| =============================================================== |  | ||||||
|  |  | ||||||
| Paperless provides an interactive installation script. This script will ask you |  | ||||||
| for a couple of configuration options, download and create the necessary configuration files, pull the docker image, start paperless and create your user account. This script essentially |  | ||||||
| performs all the steps described in :ref:`setup-docker_hub` automatically. |  | ||||||
|  |  | ||||||
| 1.  Make sure that docker and docker-compose are installed. |  | ||||||
| 2.  Download and run the installation script: |  | ||||||
|  |  | ||||||
|     .. code:: shell-session |  | ||||||
|  |  | ||||||
|         $ bash -c "$(curl -L https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/install-paperless-ngx.sh)" |  | ||||||
|  |  | ||||||
| .. _setup-docker_hub: |  | ||||||
|  |  | ||||||
| Install Paperless from Docker Hub |  | ||||||
| ================================= |  | ||||||
|  |  | ||||||
| 1.  Log in with your user and create a folder in your home directory with ``mkdir -v ~/paperless-ngx`` to have a place for your configuration files and consumption directory. |  | ||||||
|  |  | ||||||
| 2.  Go to the `/docker/compose directory on the project page <https://github.com/paperless-ngx/paperless-ngx/tree/master/docker/compose>`_ |  | ||||||
|     and download one of the ``docker-compose.*.yml`` files, depending on which database backend you |  | ||||||
|     want to use. Rename this file to ``docker-compose.yml``. |  | ||||||
|     If you want to enable optional support for Office documents, download a file with ``-tika`` in the file name. |  | ||||||
|     Download the ``docker-compose.env`` file and the ``.env`` file as well and store them |  | ||||||
|     in the same directory. |  | ||||||
|  |  | ||||||
|     .. hint:: |  | ||||||
|  |  | ||||||
|         For new installations, it is recommended to use PostgreSQL as the database |  | ||||||
|         backend. |  | ||||||
|  |  | ||||||
| 3.  Install `Docker`_ and `docker-compose`_. |  | ||||||
|  |  | ||||||
|     .. caution:: |  | ||||||
|  |  | ||||||
|         If you want to use the included ``docker-compose.*.yml`` file, you |  | ||||||
|         need to have at least Docker version **17.09.0** and docker-compose |  | ||||||
|         version **1.17.0**. |  | ||||||
|         To check, run ``docker-compose -v`` or ``docker -v``. |  | ||||||
|  |  | ||||||
|         See the `Docker installation guide`_ on how to install the current |  | ||||||
|         version of Docker for your operating system or Linux distribution of |  | ||||||
|         choice. To get the latest version of docker-compose, follow the |  | ||||||
|         `docker-compose installation guide`_ if your package repository doesn't |  | ||||||
|         include it. |  | ||||||
|  |  | ||||||
|         .. _Docker installation guide: https://docs.docker.com/engine/installation/ |  | ||||||
|         .. _docker-compose installation guide: https://docs.docker.com/compose/install/ |  | ||||||
|  |  | ||||||
| 4.  Modify ``docker-compose.yml`` to your preferences. You may want to change the path |  | ||||||
|     to the consumption directory. Find the line that specifies where |  | ||||||
|     to mount the consumption directory: |  | ||||||
|  |  | ||||||
|     .. code:: |  | ||||||
|  |  | ||||||
|         - ./consume:/usr/src/paperless/consume |  | ||||||
|  |  | ||||||
|     Replace the part BEFORE the colon with a local directory of your choice: |  | ||||||
|  |  | ||||||
|     .. code:: |  | ||||||
|  |  | ||||||
|         - /home/jonaswinkler/paperless-inbox:/usr/src/paperless/consume |  | ||||||
|  |  | ||||||
|     Don't change the part after the colon or paperless won't find your documents. |  | ||||||
|  |  | ||||||
|     You may also need to change the default port that the webserver will use |  | ||||||
|     from the default (8000): |  | ||||||
|  |  | ||||||
|     .. code:: |  | ||||||
|  |  | ||||||
|         ports: |  | ||||||
|           - 8000:8000 |  | ||||||
|  |  | ||||||
|     Replace the part BEFORE the colon with a port of your choice: |  | ||||||
|  |  | ||||||
|     .. code:: |  | ||||||
|  |  | ||||||
|         ports: |  | ||||||
|           - 8010:8000 |  | ||||||
|  |  | ||||||
|     Don't change the part after the colon or edit other lines that refer to |  | ||||||
|     port 8000. Modifying the part before the colon will map requests on another |  | ||||||
|     port to the webserver running on the default port. |  | ||||||
|  |  | ||||||
|     **Rootless** |  | ||||||
|  |  | ||||||
|     If you want to run Paperless as a rootless container, you will need to do the |  | ||||||
|     following in your ``docker-compose.yml``: |  | ||||||
|  |  | ||||||
|     - set the ``user`` running the container to map to the ``paperless`` user in the |  | ||||||
|       container. |  | ||||||
|       This value (``user_id`` below) should be the same id that ``USERMAP_UID`` and |  | ||||||
|       ``USERMAP_GID`` are set to in the next step. |  | ||||||
|       See ``USERMAP_UID`` and ``USERMAP_GID`` :ref:`here <configuration-docker>`. |  | ||||||
|  |  | ||||||
|     Your entry for Paperless should contain something like: |  | ||||||
|  |  | ||||||
|     .. code:: |  | ||||||
|  |  | ||||||
|         webserver: |  | ||||||
|           image: ghcr.io/paperless-ngx/paperless-ngx:latest |  | ||||||
|           user: <user_id> |  | ||||||
|  |  | ||||||
| 5.  Modify ``docker-compose.env``, following the comments in the file. The |  | ||||||
|     most important change is to set ``USERMAP_UID`` and ``USERMAP_GID`` |  | ||||||
|     to the uid and gid of your user on the host system. Use ``id -u`` and |  | ||||||
|     ``id -g`` to get these. |  | ||||||
|  |  | ||||||
|     This ensures that |  | ||||||
|     both the docker container and you on the host machine have write access |  | ||||||
|     to the consumption directory. If your UID and GID on the host system are |  | ||||||
|     1000 (the default for the first normal user on most systems), it will |  | ||||||
|     work out of the box without any modifications. Run ``id "username"`` to check. |  | ||||||
|  |  | ||||||
|     .. note:: |  | ||||||
|  |  | ||||||
|         You can copy any setting from the file ``paperless.conf.example`` and paste it here. |  | ||||||
|         Have a look at :ref:`configuration` to see what's available. |  | ||||||
|  |  | ||||||
|     .. note:: |  | ||||||
|  |  | ||||||
|         You can utilize Docker secrets for some configuration settings by |  | ||||||
|         appending ``_FILE`` to the names of those settings. This is currently supported |  | ||||||
|         only by: |  | ||||||
|  |  | ||||||
|           * PAPERLESS_DBUSER |  | ||||||
|           * PAPERLESS_DBPASS |  | ||||||
|           * PAPERLESS_SECRET_KEY |  | ||||||
|           * PAPERLESS_AUTO_LOGIN_USERNAME |  | ||||||
|           * PAPERLESS_ADMIN_USER |  | ||||||
|           * PAPERLESS_ADMIN_MAIL |  | ||||||
|           * PAPERLESS_ADMIN_PASSWORD |  | ||||||
|  |  | ||||||
|     .. caution:: |  | ||||||
|  |  | ||||||
|         Some file systems such as NFS network shares don't support file system |  | ||||||
|         notifications with ``inotify``. When storing the consumption directory |  | ||||||
|         on such a file system, paperless will not pick up new files |  | ||||||
|         with the default configuration. You will need to use ``PAPERLESS_CONSUMER_POLLING``, |  | ||||||
|         which will disable inotify. See :ref:`here <configuration-polling>`. |  | ||||||
|  |  | ||||||
| 6.  Run ``docker-compose pull``, followed by ``docker-compose up -d``. |  | ||||||
|     This will pull the image, create and start the necessary containers. |  | ||||||
|  |  | ||||||
| 7.  To be able to log in, you will need a superuser. To create it, execute the |  | ||||||
|     following command: |  | ||||||
|  |  | ||||||
|     .. code-block:: shell-session |  | ||||||
|  |  | ||||||
|         $ docker-compose run --rm webserver createsuperuser |  | ||||||
|  |  | ||||||
|     This will prompt you to set a username, an optional e-mail address and |  | ||||||
|     finally a password (at least 8 characters). |  | ||||||
|  |  | ||||||
| 8.  The default ``docker-compose.yml`` exports the webserver on your local port |  | ||||||
|     8000. If you did not change this, you should now be able to visit your |  | ||||||
|     Paperless instance at ``http://127.0.0.1:8000`` or at port 8000 of your server's IP address. |  | ||||||
|     Use the login credentials you created in the previous step. |  | ||||||
|  |  | ||||||
| .. _Docker: https://www.docker.com/ |  | ||||||
| .. _docker-compose: https://docs.docker.com/compose/install/ |  | ||||||
|  |  | ||||||
| .. _setup-docker_build: |  | ||||||
|  |  | ||||||
| Build the Docker image yourself |  | ||||||
| =============================== |  | ||||||
|  |  | ||||||
| 1.  Clone the entire repository of paperless: |  | ||||||
|  |  | ||||||
|     .. code:: shell-session |  | ||||||
|  |  | ||||||
|         git clone https://github.com/paperless-ngx/paperless-ngx |  | ||||||
|  |  | ||||||
|     The master branch always reflects the latest stable version. |  | ||||||
|  |  | ||||||
| 2.  Copy one of the ``docker/compose/docker-compose.*.yml`` to ``docker-compose.yml`` in the root folder, |  | ||||||
|     depending on which database backend you want to use. Copy |  | ||||||
|     ``docker-compose.env`` into the project root as well. |  | ||||||
|  |  | ||||||
| 3.  In the ``docker-compose.yml`` file, find the line that instructs docker-compose to pull the paperless image from Docker Hub: |  | ||||||
|  |  | ||||||
|     .. code:: yaml |  | ||||||
|  |  | ||||||
|         webserver: |  | ||||||
|             image: ghcr.io/paperless-ngx/paperless-ngx:latest |  | ||||||
|  |  | ||||||
|     and replace it with a line that instructs docker-compose to build the image from the current working directory instead: |  | ||||||
|  |  | ||||||
|     .. code:: yaml |  | ||||||
|  |  | ||||||
|         webserver: |  | ||||||
|             build: |  | ||||||
|               context: . |  | ||||||
|               args: |  | ||||||
|                 QPDF_VERSION: x.y.z |  | ||||||
|                 PIKEPDF_VERSION: x.y.z |  | ||||||
|                 PSYCOPG2_VERSION: x.y.z |  | ||||||
|                 JBIG2ENC_VERSION: 0.29 |  | ||||||
|  |  | ||||||
|     .. note:: |  | ||||||
|  |  | ||||||
|         Match the build argument versions to the versions used by the release you have |  | ||||||
|         checked out. These refer to pre-built images containing newer versions of certain software. |  | ||||||
|         Building these images yourself is possible, but beyond |  | ||||||
|         the scope of these steps. |  | ||||||
|  |  | ||||||
| 4.  Follow steps 3 to 8 of :ref:`setup-docker_hub`. When asked to run |  | ||||||
|     ``docker-compose pull`` to pull the image, do |  | ||||||
|  |  | ||||||
|     .. code:: shell-session |  | ||||||
|  |  | ||||||
|         $ docker-compose build |  | ||||||
|  |  | ||||||
|     instead to build the image. |  | ||||||
|  |  | ||||||
| .. _setup-bare_metal: |  | ||||||
|  |  | ||||||
| Bare Metal Route |  | ||||||
| ================ |  | ||||||
|  |  | ||||||
| Paperless runs on Linux only. The following procedure has been tested on a minimal |  | ||||||
| installation of Debian/Buster, which is the current stable release at the time of |  | ||||||
| writing. Windows is not and will never be supported. |  | ||||||
|  |  | ||||||
| 1.  Install dependencies. Paperless requires the following packages. |  | ||||||
|  |  | ||||||
|     *   ``python3`` 3.8, 3.9 |  | ||||||
|     *   ``python3-pip`` |  | ||||||
|     *   ``python3-dev`` |  | ||||||
|  |  | ||||||
|     *   ``default-libmysqlclient-dev`` for MariaDB |  | ||||||
|     *   ``fonts-liberation`` for generating thumbnails for plain text files |  | ||||||
|     *   ``imagemagick`` >= 6 for PDF conversion |  | ||||||
|     *   ``gnupg`` for handling encrypted documents |  | ||||||
|     *   ``libpq-dev`` for PostgreSQL |  | ||||||
|     *   ``libmagic-dev`` for mime type detection |  | ||||||
|     *   ``mariadb-client`` for MariaDB compile time |  | ||||||
|     *   ``mime-support`` for mime type detection |  | ||||||
|     *   ``libzbar0`` for barcode detection |  | ||||||
|     *   ``poppler-utils`` for barcode detection |  | ||||||
|  |  | ||||||
|     Use this list for your preferred package management: |  | ||||||
|  |  | ||||||
|     .. code:: |  | ||||||
|  |  | ||||||
|         python3 python3-pip python3-dev imagemagick fonts-liberation gnupg libpq-dev default-libmysqlclient-dev libmagic-dev mime-support libzbar0 poppler-utils |  | ||||||
|  |  | ||||||
|     These dependencies are required for OCRmyPDF, which is used for text recognition. |  | ||||||
|  |  | ||||||
|     *   ``unpaper`` |  | ||||||
|     *   ``ghostscript`` |  | ||||||
|     *   ``icc-profiles-free`` |  | ||||||
|     *   ``qpdf`` |  | ||||||
|     *   ``liblept5`` |  | ||||||
|     *   ``libxml2`` |  | ||||||
|     *   ``pngquant`` (suggested for certain PDF image optimizations) |  | ||||||
|     *   ``zlib1g`` |  | ||||||
|     *   ``tesseract-ocr`` >= 4.0.0 for OCR |  | ||||||
|     *   ``tesseract-ocr`` language packs (``tesseract-ocr-eng``, ``tesseract-ocr-deu``, etc) |  | ||||||
|  |  | ||||||
|     Use this list for your preferred package management: |  | ||||||
|  |  | ||||||
|     .. code:: |  | ||||||
|  |  | ||||||
|         unpaper ghostscript icc-profiles-free qpdf liblept5 libxml2 pngquant zlib1g tesseract-ocr |  | ||||||
|  |  | ||||||
|     On Raspberry Pi, these libraries are required as well: |  | ||||||
|  |  | ||||||
|     *   ``libatlas-base-dev`` |  | ||||||
|     *   ``libxslt1-dev`` |  | ||||||
|  |  | ||||||
|     You will also need ``build-essential``, ``python3-setuptools`` and ``python3-wheel`` |  | ||||||
|     for installing some of the python dependencies. |  | ||||||
|  |  | ||||||
| 2.  Install ``redis`` >= 6.0 and configure it to start automatically. |  | ||||||
|  |  | ||||||
| 3.  Optional. Install ``postgresql`` and configure a database, user and password for paperless. If you do not wish |  | ||||||
|     to use PostgreSQL, MariaDB and SQLite are available as well. |  | ||||||
|  |  | ||||||
|     .. note:: |  | ||||||
|  |  | ||||||
|         On bare-metal installations using SQLite, ensure the |  | ||||||
|         `JSON1 extension <https://code.djangoproject.com/wiki/JSON1Extension>`_ is enabled. This is |  | ||||||
|         usually the case, but not always. |  | ||||||
|  |  | ||||||
| 4.  Get the release archive from `<https://github.com/paperless-ngx/paperless-ngx/releases>`_. |  | ||||||
|     If you clone the git repo as it is, you also have to compile the front end by yourself. |  | ||||||
|     Extract the archive to a place from where you wish to execute it, such as ``/opt/paperless``. |  | ||||||
|  |  | ||||||
| 5.  Configure paperless. See :ref:`configuration` for details. Edit the included ``paperless.conf`` and adjust the |  | ||||||
|     settings to your needs. Required settings for getting paperless running are: |  | ||||||
|  |  | ||||||
|     *   ``PAPERLESS_REDIS`` should point to your redis server, such as redis://localhost:6379. |  | ||||||
|     *   ``PAPERLESS_DBENGINE`` optional, and should be one of `postgres, mariadb, or sqlite` |  | ||||||
|     *   ``PAPERLESS_DBHOST`` should be the hostname on which your PostgreSQL server is running. Leave this unset |  | ||||||
|         to use SQLite instead. Also configure port, database name, user and password as necessary. |  | ||||||
|     *   ``PAPERLESS_CONSUMPTION_DIR`` should point to a folder which paperless should watch for documents. You might |  | ||||||
|         want to have this somewhere else. Likewise, ``PAPERLESS_DATA_DIR`` and ``PAPERLESS_MEDIA_ROOT`` define where |  | ||||||
|         paperless stores its data. If you like, you can point both to the same directory. |  | ||||||
|     *   ``PAPERLESS_SECRET_KEY`` should be a random sequence of characters. It's used for authentication. Failing |  | ||||||
|         to set it allows third parties to forge authentication credentials. |  | ||||||
|     *   ``PAPERLESS_URL`` if you are behind a reverse proxy. This should point to your domain. Please see |  | ||||||
|         :ref:`configuration` for more information. |  | ||||||
|  |  | ||||||
|     Many more adjustments can be made to paperless, especially the OCR part. The following options are recommended |  | ||||||
|     for everyone: |  | ||||||
|  |  | ||||||
|     *   Set ``PAPERLESS_OCR_LANGUAGE`` to the language most of your documents are written in. |  | ||||||
|     *   Set ``PAPERLESS_TIME_ZONE`` to your local time zone. |  | ||||||
|  |  | ||||||
| 6.  Create a system user under which you wish to run paperless. |  | ||||||
|  |  | ||||||
|     .. code:: shell-session |  | ||||||
|  |  | ||||||
|         adduser paperless --system --home /opt/paperless --group |  | ||||||
|  |  | ||||||
| 7.  Ensure that these directories exist |  | ||||||
|     and that the paperless user has write permissions to the following directories: |  | ||||||
|  |  | ||||||
|     *   ``/opt/paperless/media`` |  | ||||||
|     *   ``/opt/paperless/data`` |  | ||||||
|     *   ``/opt/paperless/consume`` |  | ||||||
|  |  | ||||||
|     Adjust as necessary if you configured different folders. |  | ||||||
|  |  | ||||||
| 8.  Install python requirements from the ``requirements.txt`` file. |  | ||||||
|     It is up to you if you wish to use a virtual environment or not. First, update pip so that it can install current package versions. |  | ||||||
|  |  | ||||||
|     .. code:: shell-session |  | ||||||
|  |  | ||||||
|         sudo -Hu paperless pip3 install --upgrade pip |  | ||||||
|  |  | ||||||
|     .. code:: shell-session |  | ||||||
|  |  | ||||||
|         sudo -Hu paperless pip3 install -r requirements.txt |  | ||||||
|  |  | ||||||
|     This will install all python dependencies in the home directory of |  | ||||||
|     the new paperless user. |  | ||||||
|  |  | ||||||
| 9.  Go to ``/opt/paperless/src``, and execute the following commands: |  | ||||||
|  |  | ||||||
|     .. code:: bash |  | ||||||
|  |  | ||||||
|         # This creates the database schema. |  | ||||||
|         sudo -Hu paperless python3 manage.py migrate |  | ||||||
|  |  | ||||||
|         # This creates your first paperless user |  | ||||||
|         sudo -Hu paperless python3 manage.py createsuperuser |  | ||||||
|  |  | ||||||
| 10. Optional: Test that paperless is working by executing |  | ||||||
|  |  | ||||||
|     .. code:: bash |  | ||||||
|  |  | ||||||
|         # This starts the Django development server. |  | ||||||
|         sudo -Hu paperless python3 manage.py runserver |  | ||||||
|  |  | ||||||
|     and pointing your browser to http://localhost:8000/. |  | ||||||
|  |  | ||||||
|     .. warning:: |  | ||||||
|  |  | ||||||
|         This is a development server which should not be used in |  | ||||||
|         production. It is not audited for security and performance |  | ||||||
|         is inferior to production ready web servers. |  | ||||||
|  |  | ||||||
|     .. hint:: |  | ||||||
|  |  | ||||||
|         This will not start the consumer. Paperless does this in a |  | ||||||
|         separate process. |  | ||||||
|  |  | ||||||
| 11. Set up systemd services to run paperless automatically. You may |  | ||||||
|     use the service definition files included in the ``scripts`` folder |  | ||||||
|     as a starting point. |  | ||||||
|  |  | ||||||
|     Paperless needs the ``webserver`` script to run the webserver, the |  | ||||||
|     ``consumer`` script to watch the input folder, ``taskqueue`` for the background workers |  | ||||||
|     used to handle things like document consumption and the ``scheduler`` script to run tasks such as |  | ||||||
|     email checking at certain times. |  | ||||||
|  |  | ||||||
|     The ``socket`` script enables ``gunicorn`` to run on port 80 without |  | ||||||
|     root privileges. For this you need to uncomment the ``Require=paperless-webserver.socket`` line |  | ||||||
|     in the ``webserver`` script and configure ``gunicorn`` to listen on port 80 (see ``paperless/gunicorn.conf.py``). |  | ||||||
|  |  | ||||||
|     You may need to adjust the path to the ``gunicorn`` executable. This |  | ||||||
|     will be installed as part of the python dependencies, and is either located |  | ||||||
|     in the ``bin`` folder of your virtual environment, or in ``~/.local/bin/`` if |  | ||||||
|     no virtual environment is used. |  | ||||||
|  |  | ||||||
|     These services rely on redis and optionally the database server, but |  | ||||||
|     don't need to be started in any particular order. The example files |  | ||||||
|     depend on redis being started. If you use a database server, you should |  | ||||||
|     add additional dependencies. |  | ||||||
|  |  | ||||||
|     .. caution:: |  | ||||||
|  |  | ||||||
|         The included scripts run a ``gunicorn`` standalone server, |  | ||||||
|         which is fine for running paperless. It does support SSL, |  | ||||||
|         however, the documentation of Gunicorn states that you should |  | ||||||
|         use a proxy server in front of gunicorn instead. |  | ||||||
|  |  | ||||||
|         For instructions on how to use nginx for that, |  | ||||||
|         :ref:`see the instructions below <setup-nginx>`. |  | ||||||
|  |  | ||||||
| 12. Optional: Install a samba server and make the consumption folder |  | ||||||
|     available as a network share. |  | ||||||
|  |  | ||||||
| 13. Configure ImageMagick to allow processing of PDF documents. Most distributions have |  | ||||||
|     this disabled by default, since PDF documents can contain malware. If |  | ||||||
|     you don't do this, paperless will fall back to ghostscript for certain steps |  | ||||||
|     such as thumbnail generation. |  | ||||||
|  |  | ||||||
|     Edit ``/etc/ImageMagick-6/policy.xml`` and adjust |  | ||||||
|  |  | ||||||
|     .. code:: |  | ||||||
|  |  | ||||||
|         <policy domain="coder" rights="none" pattern="PDF" /> |  | ||||||
|  |  | ||||||
|     to |  | ||||||
|  |  | ||||||
|     .. code:: |  | ||||||
|  |  | ||||||
|         <policy domain="coder" rights="read|write" pattern="PDF" /> |  | ||||||
|  |  | ||||||
| 14. Optional: Install the `jbig2enc <https://ocrmypdf.readthedocs.io/en/latest/jbig2.html>`_ |  | ||||||
|     encoder. This will reduce the size of generated PDF documents. You'll most likely need |  | ||||||
|     to compile this by yourself, because this software was patented until around 2017 and |  | ||||||
|     binary packages are not available for most distributions. |  | ||||||
|  |  | ||||||
| 15. Optional: If using the NLTK machine learning processing (see ``PAPERLESS_ENABLE_NLTK`` in |  | ||||||
|     :ref:`configuration` for details), download the NLTK data for the Snowball Stemmer, Stopwords |  | ||||||
|     and Punkt tokenizer to your ``PAPERLESS_DATA_DIR/nltk``.  Refer to |  | ||||||
|     the `NLTK instructions <https://www.nltk.org/data.html>`_ for details on how to |  | ||||||
|     download the data. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| Migrating to Paperless-ngx |  | ||||||
| ########################## |  | ||||||
|  |  | ||||||
| Migration is possible either from Paperless-ng or directly from the 'original' Paperless. |  | ||||||
|  |  | ||||||
| Migrating from Paperless-ng |  | ||||||
| =========================== |  | ||||||
|  |  | ||||||
| Paperless-ngx is meant to be a drop-in replacement for Paperless-ng and thus upgrading should be |  | ||||||
| trivial for most users, especially when using docker. However, as with any major change, it is |  | ||||||
| recommended to take a full backup first. Once you are ready, simply change the docker image to |  | ||||||
| point to the new source. E.g. if using Docker Compose, edit ``docker-compose.yml`` and change: |  | ||||||
|  |  | ||||||
| .. code:: |  | ||||||
|  |  | ||||||
|   image: jonaswinkler/paperless-ng:latest |  | ||||||
|  |  | ||||||
| to |  | ||||||
|  |  | ||||||
| .. code:: |  | ||||||
|  |  | ||||||
|   image: ghcr.io/paperless-ngx/paperless-ngx:latest |  | ||||||
|  |  | ||||||
| and then run ``docker-compose up -d`` which will pull the new image and recreate the container. |  | ||||||
| That's it! |  | ||||||
|  |  | ||||||
| Users who installed with the bare-metal route should also update their Git clone to point to |  | ||||||
| ``https://github.com/paperless-ngx/paperless-ngx``, e.g. using the command |  | ||||||
| ``git remote set-url origin https://github.com/paperless-ngx/paperless-ngx`` and then pull the |  | ||||||
| latest version. |  | ||||||
|  |  | ||||||
| Migrating from Paperless |  | ||||||
| ======================== |  | ||||||
|  |  | ||||||
| At its core, paperless-ngx is still paperless and fully compatible. However, some |  | ||||||
| things have changed under the hood, so you need to adapt your setup depending on |  | ||||||
| how you installed paperless. |  | ||||||
|  |  | ||||||
| This setup describes how to update an existing paperless Docker installation. |  | ||||||
| The important things to keep in mind are as follows: |  | ||||||
|  |  | ||||||
| * Read the :doc:`changelog </changelog>` and take note of breaking changes. |  | ||||||
| * You should decide if you want to stick with SQLite or want to migrate your database |  | ||||||
|   to PostgreSQL. See :ref:`setup-sqlite_to_psql` for details on how to move your data from |  | ||||||
|   SQLite to PostgreSQL. Both work fine with paperless. However, if you already have a |  | ||||||
|   database server running for other services, you might as well use it for paperless as well. |  | ||||||
| * The task scheduler of paperless, which is used to execute periodic tasks |  | ||||||
|   such as email checking and maintenance, requires a `redis`_ message broker |  | ||||||
|   instance. The docker-compose route takes care of that. |  | ||||||
| * The layout of the folder structure for your documents and data remains the |  | ||||||
|   same, so you can just plug your old docker volumes into paperless-ngx and |  | ||||||
|   expect it to find everything where it should be. |  | ||||||
|  |  | ||||||
| Migration to paperless-ngx is then performed in a few simple steps: |  | ||||||
|  |  | ||||||
| 1.  Stop paperless. |  | ||||||
|  |  | ||||||
|     .. code:: bash |  | ||||||
|  |  | ||||||
|         $ cd /path/to/current/paperless |  | ||||||
|         $ docker-compose down |  | ||||||
|  |  | ||||||
| 2.  Do a backup, for two reasons: First, if something goes wrong, you still have your |  | ||||||
|     data. Second, if you don't like paperless-ngx, you can switch back to |  | ||||||
|     paperless. |  | ||||||
|  |  | ||||||
| 3.  Download the latest release of paperless-ngx. You can either go with the |  | ||||||
|     docker-compose files from `here <https://github.com/paperless-ngx/paperless-ngx/tree/master/docker/compose>`__ |  | ||||||
|     or clone the repository to build the image yourself (see :ref:`above <setup-docker_build>`). |  | ||||||
|     You can either replace your current paperless folder or put paperless-ngx |  | ||||||
|     in a different location. |  | ||||||
|  |  | ||||||
|     .. caution:: |  | ||||||
|  |  | ||||||
|         Paperless-ngx includes a ``.env`` file. This will set the |  | ||||||
|         project name for docker compose to ``paperless``, which will also define the name |  | ||||||
|         of the volumes used by paperless-ngx. However, if you find that paperless-ngx |  | ||||||
|         is not using your old paperless volumes, verify the names of your volumes with |  | ||||||
|  |  | ||||||
|         .. code:: shell-session |  | ||||||
|  |  | ||||||
|             $ docker volume ls | grep _data |  | ||||||
|  |  | ||||||
|         and adjust the project name in the ``.env`` file so that it matches the name |  | ||||||
|         of the volumes before the ``_data`` part. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| 4.  Download the ``docker-compose.sqlite.yml`` file to ``docker-compose.yml``. |  | ||||||
|     If you want to switch to PostgreSQL, do that after you migrated your existing |  | ||||||
|     SQLite database. |  | ||||||
|  |  | ||||||
| 5.  Adjust ``docker-compose.yml`` and ``docker-compose.env`` to your needs. |  | ||||||
|     See :ref:`setup-docker_hub` for details on which edits are advised. |  | ||||||
|  |  | ||||||
| 6.  :ref:`Update paperless. <administration-updating>` |  | ||||||
|  |  | ||||||
| 7.  In order to find your existing documents with the new search feature, you need |  | ||||||
|     to invoke a one-time operation that will create the search index: |  | ||||||
|  |  | ||||||
|     .. code:: shell-session |  | ||||||
|  |  | ||||||
|         $ docker-compose run --rm webserver document_index reindex |  | ||||||
|  |  | ||||||
|     This will migrate your database and create the search index. After that, |  | ||||||
|     paperless will take care of maintaining the index by itself. |  | ||||||
|  |  | ||||||
| 8.  Start paperless-ngx. |  | ||||||
|  |  | ||||||
|     .. code:: bash |  | ||||||
|  |  | ||||||
|         $ docker-compose up -d |  | ||||||
|  |  | ||||||
|     This will run paperless in the background and automatically start it on system boot. |  | ||||||
|  |  | ||||||
| 9.  Paperless installed a permanent redirect to ``admin/`` in your browser. This |  | ||||||
|     redirect is still in place and prevents access to the new UI. Clear your |  | ||||||
|     browsing cache in order to fix this. |  | ||||||
|  |  | ||||||
| 10.  Optionally, follow the instructions below to migrate your existing data to PostgreSQL. |  | ||||||
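
The caution in step 3 says the project name in ``.env`` must match the part of the volume name before ``_data``. The following is a minimal Python sketch of that string rule only; the function names are hypothetical, and the volume names would come from the output of ``docker volume ls``. ``COMPOSE_PROJECT_NAME`` is the standard Docker Compose variable for the project name and is assumed to be what the ``.env`` file sets.

```python
from typing import Iterable, List, Optional


def project_name_from_volume(volume_name: str, suffix: str = "_data") -> Optional[str]:
    """Return the part of a volume name before the ``_data`` suffix, or None.

    Hypothetical helper: it only applies the naming rule described in the
    caution above and does not talk to Docker.
    """
    if volume_name.endswith(suffix) and len(volume_name) > len(suffix):
        return volume_name[: -len(suffix)]
    return None


def candidate_project_names(volume_names: Iterable[str]) -> List[str]:
    """Collect the distinct project names implied by a list of volume names."""
    names = {project_name_from_volume(name) for name in volume_names}
    names.discard(None)  # drop volumes that do not end in ``_data``
    return sorted(names)


if __name__ == "__main__":
    # Example volume names, as they might appear in ``docker volume ls``.
    volumes = ["paperless_data", "paperless_media", "paperless_pgdata"]
    for name in candidate_project_names(volumes):
        print(f"COMPOSE_PROJECT_NAME={name}")
```

If more than one candidate is printed, the old installation's volumes are the ones whose prefix should go into the ``.env`` file.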
|  |  | ||||||
|  |  | ||||||
| Migrating from LinuxServer.io Docker Image |  | ||||||
| ========================================== |  | ||||||
|  |  | ||||||
| As with any upgrades and large changes, it is highly recommended to create a backup before |  | ||||||
| starting.  This assumes the image was running using Docker Compose, but the instructions |  | ||||||
| are translatable to Docker commands as well. |  | ||||||
|  |  | ||||||
| 1.  Stop and remove the paperless container |  | ||||||
| 2.  If using an external database, stop the container |  | ||||||
| 3.  Update Redis configuration |  | ||||||
|  |  | ||||||
|     a)  If ``REDIS_URL`` is already set, change it to ``PAPERLESS_REDIS`` and continue |  | ||||||
|         to step 4. |  | ||||||
|     b)  Otherwise, in the ``docker-compose.yml`` add a new service for Redis, |  | ||||||
|         following `the example compose files <https://github.com/paperless-ngx/paperless-ngx/tree/main/docker/compose>`_ |  | ||||||
|     c)  Set the environment variable ``PAPERLESS_REDIS`` so it points to the new Redis container |  | ||||||
|  |  | ||||||
| 4.  Update user mapping |  | ||||||
|  |  | ||||||
|     a)  If set, change the environment variable ``PUID`` to ``USERMAP_UID`` |  | ||||||
|     b)  If set, change the environment variable ``PGID`` to ``USERMAP_GID`` |  | ||||||
|  |  | ||||||
| 5.  Update configuration paths |  | ||||||
|  |  | ||||||
|     a) Set the environment variable ``PAPERLESS_DATA_DIR`` |  | ||||||
|        to ``/config`` |  | ||||||
|  |  | ||||||
| 6.  Update media paths |  | ||||||
|  |  | ||||||
|     a) Set the environment variable ``PAPERLESS_MEDIA_ROOT`` |  | ||||||
|        to ``/data/media`` |  | ||||||
|  |  | ||||||
| 7.  Update timezone |  | ||||||
|  |  | ||||||
|     a) Set the environment variable ``PAPERLESS_TIME_ZONE`` |  | ||||||
|        to the same value as ``TZ`` |  | ||||||
|  |  | ||||||
| 8.  Modify the ``image:`` to point to ``ghcr.io/paperless-ngx/paperless-ngx:latest`` or |  | ||||||
|     a specific version if preferred. |  | ||||||
|  |  | ||||||
| 9.  Start the containers as before, using ``docker-compose``. |  | ||||||
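
The environment changes in steps 3 to 7 can be summarized as a small mapping. The sketch below is illustrative only: ``migrate_environment`` is a hypothetical helper, it works on a plain dictionary rather than on a real compose file, and it does not add the Redis service that step 3b requires when ``REDIS_URL`` was not set.

```python
# Variables that are renamed in steps 3a, 4a and 4b.
RENAMES = {
    "REDIS_URL": "PAPERLESS_REDIS",
    "PUID": "USERMAP_UID",
    "PGID": "USERMAP_GID",
}


def migrate_environment(env: dict) -> dict:
    """Return a copy of ``env`` with the LinuxServer.io settings translated."""
    new_env = {}
    for key, value in env.items():
        # Rename known variables, keep everything else unchanged.
        new_env[RENAMES.get(key, key)] = value

    # Steps 5 and 6: fixed paths used by the LinuxServer.io image.
    new_env["PAPERLESS_DATA_DIR"] = "/config"
    new_env["PAPERLESS_MEDIA_ROOT"] = "/data/media"

    # Step 7: reuse the value of TZ, if one was set.
    if "TZ" in env:
        new_env["PAPERLESS_TIME_ZONE"] = env["TZ"]
    return new_env


if __name__ == "__main__":
    old = {"PUID": "1000", "PGID": "1000", "TZ": "Europe/Berlin"}
    for key, value in sorted(migrate_environment(old).items()):
        print(f"{key}={value}")
```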
|  |  | ||||||
| .. _setup-sqlite_to_psql: |  | ||||||
|  |  | ||||||
| Moving data from SQLite to PostgreSQL or MySQL/MariaDB |  | ||||||
| ====================================================== |  | ||||||
|  |  | ||||||
| Moving your data from SQLite to PostgreSQL or MySQL/MariaDB is done by executing a series of Django |  | ||||||
| management commands, as shown below.  The commands below use PostgreSQL, but are applicable to MySQL/MariaDB |  | ||||||
| with the |  | ||||||
|  |  | ||||||
| .. caution:: |  | ||||||
|  |  | ||||||
|     Make sure that your SQLite database is migrated to the latest version. |  | ||||||
|     Starting paperless will make sure that this is the case. If you try to |  | ||||||
|     load data from an old database schema in SQLite into a newer database |  | ||||||
|     schema in PostgreSQL, you will run into trouble. |  | ||||||
|  |  | ||||||
| .. warning:: |  | ||||||
|  |  | ||||||
|     On some database fields, PostgreSQL enforces predefined limits on maximum |  | ||||||
|     length, whereas SQLite does not. The fields in question are the title of documents |  | ||||||
|     (128 characters), names of document types, tags and correspondents (128 characters), |  | ||||||
|     and filenames (1024 characters). If you have data in these fields that surpasses these |  | ||||||
|     limits, migration to PostgreSQL is not possible and will fail with an error. |  | ||||||
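
The length limits in the warning above can be checked before attempting the migration. The sketch below only encodes the limits quoted in the warning and operates on ``(field, value)`` pairs; how those values are read from the SQLite database is left out, and the field labels and function name are hypothetical.

```python
# Maximum lengths quoted in the warning above.
LIMITS = {
    "title": 128,      # document titles
    "name": 128,       # document types, tags and correspondents
    "filename": 1024,  # filenames
}


def find_oversized(records, limits=LIMITS):
    """Return (field, length) for every value that exceeds its limit.

    ``records`` is an iterable of (field, value) pairs. Fields without a
    known limit are ignored.
    """
    return [
        (field, len(value))
        for field, value in records
        if field in limits and len(value) > limits[field]
    ]


if __name__ == "__main__":
    sample = [("title", "Invoice 2021"), ("title", "x" * 200)]
    for field, length in find_oversized(sample):
        print(f"{field} is {length} characters long, limit is {LIMITS[field]}")
```

An empty result suggests that these particular limits will not block the migration; it says nothing about other possible causes of failure.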
|  |  | ||||||
| .. warning:: |  | ||||||
|  |  | ||||||
|     MySQL is case insensitive by default, treating values like "Name" and "NAME" as identical. |  | ||||||
|     See :ref:`advanced-mysql-caveats` for details. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| 1.  Stop paperless, if it is running. |  | ||||||
| 2.  Tell paperless to use PostgreSQL: |  | ||||||
|  |  | ||||||
|     a)  With docker, copy the provided ``docker-compose.postgres.yml`` file to |  | ||||||
|         ``docker-compose.yml``. Remember to adjust the consumption directory, |  | ||||||
|         if necessary. |  | ||||||
|     b)  Without docker, configure the database in your ``paperless.conf`` file. |  | ||||||
|         See :ref:`configuration` for details. |  | ||||||
|  |  | ||||||
| 3.  Open a shell and initialize the database: |  | ||||||
|  |  | ||||||
|     a)  With docker, run the following command to open a shell within the paperless |  | ||||||
|         container: |  | ||||||
|  |  | ||||||
|         .. code:: shell-session |  | ||||||
|  |  | ||||||
|             $ cd /path/to/paperless |  | ||||||
|             $ docker-compose run --rm webserver /bin/bash |  | ||||||
|  |  | ||||||
|         This will launch the container and initialize the PostgreSQL database. |  | ||||||
|  |  | ||||||
|     b)  Without docker, remember to activate any virtual environment, switch to |  | ||||||
|         the ``src`` directory and create the database schema: |  | ||||||
|  |  | ||||||
|         .. code:: shell-session |  | ||||||
|  |  | ||||||
|             $ cd /path/to/paperless/src |  | ||||||
|             $ python3 manage.py migrate |  | ||||||
|  |  | ||||||
|         This will not copy any data yet. |  | ||||||
|  |  | ||||||
| 4.  Dump your data from SQLite: |  | ||||||
|  |  | ||||||
|     .. code:: shell-session |  | ||||||
|  |  | ||||||
|         $ python3 manage.py dumpdata --database=sqlite --exclude=contenttypes --exclude=auth.Permission > data.json |  | ||||||
|  |  | ||||||
| 5.  Load your data into PostgreSQL: |  | ||||||
|  |  | ||||||
|     .. code:: shell-session |  | ||||||
|  |  | ||||||
|         $ python3 manage.py loaddata data.json |  | ||||||
|  |  | ||||||
| 6.  If operating inside Docker, you may exit the shell now. |  | ||||||
|  |  | ||||||
|     .. code:: shell-session |  | ||||||
|  |  | ||||||
|         $ exit |  | ||||||
|  |  | ||||||
| 7.  Start paperless. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| Moving back to Paperless |  | ||||||
| ======================== |  | ||||||
|  |  | ||||||
| If you migrated to Paperless-ngx, used it for a while, and decided that you |  | ||||||
| don't like it and want to move back (if you do, send me a mail about what |  | ||||||
| part you didn't like!), you can do that with a few simple steps. |  | ||||||
|  |  | ||||||
| Paperless-ngx modified the database schema slightly; however, these changes can |  | ||||||
| be reverted while keeping your current data, so that your current data will |  | ||||||
| be compatible with original Paperless. |  | ||||||
|  |  | ||||||
| Execute this: |  | ||||||
|  |  | ||||||
| .. code:: shell-session |  | ||||||
|  |  | ||||||
|     $ cd /path/to/paperless |  | ||||||
|     $ docker-compose run --rm webserver migrate documents 0023 |  | ||||||
|  |  | ||||||
| Or without docker: |  | ||||||
|  |  | ||||||
| .. code:: shell-session |  | ||||||
|  |  | ||||||
|     $ cd /path/to/paperless/src |  | ||||||
|     $ python3 manage.py migrate documents 0023 |  | ||||||
|  |  | ||||||
| After that, you need to clear your cookies (Paperless-ngx comes with updated |  | ||||||
| dependencies that do cookie-processing differently) and probably your cache |  | ||||||
| as well. |  | ||||||
|  |  | ||||||
| .. _setup-less_powerful_devices: |  | ||||||
|  |  | ||||||
|  |  | ||||||
| Considerations for less powerful devices |  | ||||||
| ######################################## |  | ||||||
|  |  | ||||||
| Paperless runs on Raspberry Pi. However, some things are rather slow on the Pi and |  | ||||||
| configuring some options in paperless can help improve performance immensely: |  | ||||||
|  |  | ||||||
| *   Stick with SQLite to save some resources. |  | ||||||
| *   Consider setting ``PAPERLESS_OCR_PAGES`` to 1, so that paperless will only OCR |  | ||||||
|     the first page of your documents. In most cases, this page contains enough |  | ||||||
|     information to be able to find it. |  | ||||||
| *   ``PAPERLESS_TASK_WORKERS`` and ``PAPERLESS_THREADS_PER_WORKER`` are configured |  | ||||||
|     to use all cores. The Raspberry Pi models 3 and up have 4 cores, meaning that |  | ||||||
|     paperless will use 2 workers and 2 threads per worker. This may result in |  | ||||||
|     sluggish response times during consumption, so you might want to lower these |  | ||||||
|     settings (example: 2 workers and 1 thread to always have some computing power |  | ||||||
|     left for other tasks). |  | ||||||
| *   Keep ``PAPERLESS_OCR_MODE`` at its default value ``skip`` and consider OCR'ing |  | ||||||
|     your documents before feeding them into paperless. Some scanners are able to |  | ||||||
|     do this! You might want to even specify ``skip_noarchive`` to skip archive |  | ||||||
|     file generation for already OCR'ed documents entirely. |  | ||||||
| *   If you want to perform OCR on the device, consider using ``PAPERLESS_OCR_CLEAN=none``. |  | ||||||
|     This will speed up OCR times and use less memory at the expense of slightly worse |  | ||||||
|     OCR results. |  | ||||||
| *   If using docker, consider setting ``PAPERLESS_WEBSERVER_WORKERS`` to |  | ||||||
|     1. This will save some memory. |  | ||||||
| *   Consider setting ``PAPERLESS_ENABLE_NLTK`` to false, to disable the more |  | ||||||
|     advanced language processing, which can take more memory and processing time. |  | ||||||
|  |  | ||||||
| For details, refer to :ref:`configuration`. |  | ||||||
|  |  | ||||||
| .. note:: |  | ||||||
|  |  | ||||||
|     Updating the :ref:`automatic matching algorithm <advanced-automatic_matching>` |  | ||||||
|     takes quite a bit of time. However, the update mechanism checks if your |  | ||||||
|     data has changed before doing the heavy lifting. If you experience the |  | ||||||
|     algorithm taking too much CPU time, consider changing the schedule in the |  | ||||||
|     admin interface to daily. You can also manually invoke the task |  | ||||||
|     by changing the date and time of the next run to today/now. |  | ||||||
|  |  | ||||||
|     The actual matching of the algorithm is fast and works on Raspberry Pi as |  | ||||||
|     well as on any other device. |  | ||||||
|  |  | ||||||
| .. _redis: https://redis.io/ |  | ||||||
|  |  | ||||||
|  |  | ||||||
| .. _setup-nginx: |  | ||||||
|  |  | ||||||
| Using nginx as a reverse proxy |  | ||||||
| ############################## |  | ||||||
|  |  | ||||||
| If you want to expose paperless to the internet, you should hide it behind a |  | ||||||
| reverse proxy with SSL enabled. |  | ||||||
|  |  | ||||||
| In addition to the usual configuration for SSL, |  | ||||||
| the following configuration is required for paperless to operate: |  | ||||||
|  |  | ||||||
| .. code:: nginx |  | ||||||
|  |  | ||||||
|     http { |  | ||||||
|  |  | ||||||
|         # Adjust as required. This is the maximum size for file uploads. |  | ||||||
|         # The default value 1M might be a little too small. |  | ||||||
|         client_max_body_size 10M; |  | ||||||
|  |  | ||||||
|         server { |  | ||||||
|  |  | ||||||
|             location / { |  | ||||||
|  |  | ||||||
|                 # Adjust host and port as required. |  | ||||||
|                 proxy_pass http://localhost:8000/; |  | ||||||
|  |  | ||||||
|                 # These configuration options are required for WebSockets to work. |  | ||||||
|                 proxy_http_version 1.1; |  | ||||||
|                 proxy_set_header Upgrade $http_upgrade; |  | ||||||
|                 proxy_set_header Connection "upgrade"; |  | ||||||
|  |  | ||||||
|                 proxy_redirect off; |  | ||||||
|                 proxy_set_header Host $host; |  | ||||||
|                 proxy_set_header X-Real-IP $remote_addr; |  | ||||||
|                 proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; |  | ||||||
|                 proxy_set_header X-Forwarded-Host $server_name; |  | ||||||
|             } |  | ||||||
|         } |  | ||||||
|     } |  | ||||||
|  |  | ||||||
| The ``PAPERLESS_URL`` configuration variable is also required when using a reverse proxy. Please refer to the :ref:`hosting-and-security` docs. |  | ||||||
|  |  | ||||||
| Also read `this <https://channels.readthedocs.io/en/stable/deploying.html#nginx-supervisor-ubuntu>`__, towards the end of the section. |  | ||||||
|   | |||||||
| @@ -1,328 +1,12 @@ | |||||||
|  | .. _troubleshooting: | ||||||
|  |  | ||||||
| *************** | *************** | ||||||
| Troubleshooting | Troubleshooting | ||||||
| *************** | *************** | ||||||
|  |  | ||||||
| No files are added by the consumer |  | ||||||
| ################################## |  | ||||||
|  |  | ||||||
| Check for the following issues: | .. cssclass:: redirect-notice | ||||||
|  |  | ||||||
| *   Ensure that the directory you're putting your documents in is the folder |     The Paperless-ngx documentation has permanently moved. | ||||||
|     paperless is watching. With docker, this setting is performed in the |  | ||||||
|     ``docker-compose.yml`` file. Without docker, look at the ``CONSUMPTION_DIR`` |  | ||||||
|     setting. Don't adjust this setting if you're using docker. |  | ||||||
| *   Ensure that redis is up and running. Paperless does its task processing |  | ||||||
|     asynchronously, and for documents to arrive at the task processor, it needs |  | ||||||
|     redis to run. |  | ||||||
| *   Ensure that the task processor is running. Docker does this automatically. |  | ||||||
|     Manually invoke the task processor by executing |  | ||||||
|  |  | ||||||
|     .. code:: shell-session |     You will be redirected shortly... | ||||||
|  |  | ||||||
|         $ celery --app paperless worker |  | ||||||
|  |  | ||||||
| *   Look at the output of paperless and inspect it for any errors. |  | ||||||
| *   Go to the admin interface, and check if there are failed tasks. If so, the |  | ||||||
|     tasks will contain an error message. |  | ||||||
|  |  | ||||||
| Consumer warns ``OCR for XX failed`` |  | ||||||
| #################################### |  | ||||||
|  |  | ||||||
| If you find the OCR accuracy to be too low, and/or the document consumer warns |  | ||||||
| that ``OCR for XX failed, but we're going to stick with what we've got since |  | ||||||
| FORGIVING_OCR is enabled``, then you might need to install the |  | ||||||
| `Tesseract language files <http://packages.ubuntu.com/search?keywords=tesseract-ocr>`_ |  | ||||||
| matching your document's languages. |  | ||||||
|  |  | ||||||
| As an example, if you are running Paperless-ngx from any Ubuntu or Debian |  | ||||||
| box, and your documents are written in Spanish you may need to run:: |  | ||||||
|  |  | ||||||
|     apt-get install -y tesseract-ocr-spa |  | ||||||
|  |  | ||||||
| Consumer fails to pick up any new files |  | ||||||
| ####################################### |  | ||||||
|  |  | ||||||
| If you notice that the consumer will only pick up files in the consumption |  | ||||||
| directory at startup, but won't find any other files added later, you will need to |  | ||||||
| enable filesystem polling with the configuration option |  | ||||||
| ``PAPERLESS_CONSUMER_POLLING``, see :ref:`here <configuration-polling>`. |  | ||||||
|  |  | ||||||
| This will disable listening to filesystem changes with inotify and paperless will |  | ||||||
| manually check the consumption directory for changes instead. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| Paperless always redirects to /admin |  | ||||||
| #################################### |  | ||||||
|  |  | ||||||
| You probably had the old paperless installed at some point. Paperless installed |  | ||||||
| a permanent redirect to /admin in your browser, and you need to clear your |  | ||||||
| browsing data / cache to fix that. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| Operation not permitted |  | ||||||
| ####################### |  | ||||||
|  |  | ||||||
| You might see errors such as: |  | ||||||
|  |  | ||||||
| .. code:: shell-session |  | ||||||
|  |  | ||||||
|     chown: changing ownership of '../export': Operation not permitted |  | ||||||
|  |  | ||||||
| The container tries to set file ownership on the listed directories. This is |  | ||||||
| required so that the user running paperless inside docker has write permissions |  | ||||||
| to these folders. The error occurs when ``chown`` is not permitted there, for |  | ||||||
| example when these directories point to NFS shares. |  | ||||||
|  |  | ||||||
| Ensure that ``chown`` is possible on these directories. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| Classifier error: No training data available |  | ||||||
| ############################################ |  | ||||||
|  |  | ||||||
| This indicates that the Auto matching algorithm found no documents to learn from. |  | ||||||
| There are two possible causes: |  | ||||||
|  |  | ||||||
| *   You don't use the Auto matching algorithm: The error can be safely ignored in this case. |  | ||||||
| *   You are using the Auto matching algorithm: The classifier explicitly excludes documents |  | ||||||
|     with Inbox tags. Verify that there are documents in your archive without inbox tags. |  | ||||||
|     The algorithm will only learn from documents not in your inbox. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| UserWarning in sklearn on every single document |  | ||||||
| ############################################### |  | ||||||
|  |  | ||||||
| You may encounter warnings like this: |  | ||||||
|  |  | ||||||
| .. code:: |  | ||||||
|  |  | ||||||
|     /usr/local/lib/python3.7/site-packages/sklearn/base.py:315: |  | ||||||
|     UserWarning: Trying to unpickle estimator CountVectorizer from version 0.23.2 when using version 0.24.0. |  | ||||||
|     This might lead to breaking code or invalid results. Use at your own risk. |  | ||||||
|  |  | ||||||
| This happens when certain dependencies of paperless that are responsible for the auto matching algorithm are |  | ||||||
| updated. After updating these, your current training data *might* not be compatible anymore. This can be ignored |  | ||||||
| in most cases. This warning will disappear automatically when paperless updates the training data. |  | ||||||
|  |  | ||||||
| If you want to get rid of the warning or actually experience issues with automatic matching, delete |  | ||||||
| the file ``classification_model.pickle`` in the data directory and let paperless recreate it. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| 504 Server Error: Gateway Timeout when adding Office documents |  | ||||||
| ############################################################## |  | ||||||
|  |  | ||||||
| You may experience these errors when using the optional Tika integration: |  | ||||||
|  |  | ||||||
| .. code:: |  | ||||||
|  |  | ||||||
|     requests.exceptions.HTTPError: 504 Server Error: Gateway Timeout for url: http://gotenberg:3000/forms/libreoffice/convert |  | ||||||
|  |  | ||||||
| Gotenberg is a server that converts Office documents into PDF documents and has a default timeout of 30 seconds. |  | ||||||
| When conversion takes longer, Gotenberg raises this error. |  | ||||||
|  |  | ||||||
| You can increase the timeout by configuring a command flag for Gotenberg (see also `here <https://gotenberg.dev/docs/modules/api#properties>`__). |  | ||||||
| If using docker-compose, this is achieved by the following configuration change in the ``docker-compose.yml`` file: |  | ||||||
|  |  | ||||||
| .. code:: yaml |  | ||||||
|  |  | ||||||
|     gotenberg: |  | ||||||
|         image: gotenberg/gotenberg:7.6 |  | ||||||
|         restart: unless-stopped |  | ||||||
|         command: |  | ||||||
|             - "gotenberg" |  | ||||||
|             - "--chromium-disable-routes=true" |  | ||||||
|             - "--api-timeout=60" |  | ||||||
|  |  | ||||||
| Permission denied errors in the consumption directory |  | ||||||
| ##################################################### |  | ||||||
|  |  | ||||||
| You might encounter errors such as: |  | ||||||
|  |  | ||||||
| .. code:: shell-session |  | ||||||
|  |  | ||||||
|     The following error occured while consuming document.pdf: [Errno 13] Permission denied: '/usr/src/paperless/src/../consume/document.pdf' |  | ||||||
|  |  | ||||||
| This happens when paperless does not have permission to delete files inside the consumption directory. |  | ||||||
| Ensure that ``USERMAP_UID`` and ``USERMAP_GID`` are set to the user id and group id you use on the host operating system, if these are |  | ||||||
| different from ``1000``. See :ref:`setup-docker_hub`. |  | ||||||
|  |  | ||||||
| Also ensure that you are able to read and write to the consumption directory on the host. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| OSError: [Errno 19] No such device when consuming files |  | ||||||
| ####################################################### |  | ||||||
|  |  | ||||||
| If you experience errors such as: |  | ||||||
|  |  | ||||||
| .. code:: shell-session |  | ||||||
|  |  | ||||||
|     File "/usr/local/lib/python3.7/site-packages/whoosh/codec/base.py", line 570, in open_compound_file |  | ||||||
|     return CompoundStorage(dbfile, use_mmap=storage.supports_mmap) |  | ||||||
|     File "/usr/local/lib/python3.7/site-packages/whoosh/filedb/compound.py", line 75, in __init__ |  | ||||||
|     self._source = mmap.mmap(fileno, 0, access=mmap.ACCESS_READ) |  | ||||||
|     OSError: [Errno 19] No such device |  | ||||||
|  |  | ||||||
|     During handling of the above exception, another exception occurred: |  | ||||||
|  |  | ||||||
|     Traceback (most recent call last): |  | ||||||
|     File "/usr/local/lib/python3.7/site-packages/django_q/cluster.py", line 436, in worker |  | ||||||
|     res = f(*task["args"], **task["kwargs"]) |  | ||||||
|     File "/usr/src/paperless/src/documents/tasks.py", line 73, in consume_file |  | ||||||
|     override_tag_ids=override_tag_ids) |  | ||||||
|     File "/usr/src/paperless/src/documents/consumer.py", line 271, in try_consume_file |  | ||||||
|     raise ConsumerError(e) |  | ||||||
|  |  | ||||||
| Paperless uses a search index to provide better and faster full text searching. This search index is stored inside |  | ||||||
| the ``data`` folder. The search index uses memory-mapped files (mmap). The above error indicates that paperless |  | ||||||
| was unable to create and open these files. |  | ||||||
|  |  | ||||||
| This happens when you're trying to store the data directory on certain file systems (mostly network shares) |  | ||||||
| that don't support memory-mapped files. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| Web-UI stuck at "Loading..." |  | ||||||
| ############################ |  | ||||||
|  |  | ||||||
| This might have multiple reasons. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| 1.  If you built the docker image yourself or deployed using the bare metal route, |  | ||||||
|     make sure that there are files in ``<paperless-root>/static/frontend/<lang-code>/``. |  | ||||||
|     If there are no files, make sure that you executed ``collectstatic`` successfully, either |  | ||||||
|     manually or as part of the docker image build. |  | ||||||
|  |  | ||||||
|     If the front end is still missing, make sure that the front end is compiled (files present in |  | ||||||
|     ``src/documents/static/frontend``). If it is not, you need to compile the front end yourself |  | ||||||
|     or download the release archive instead of cloning the repository. |  | ||||||
|  |  | ||||||
| 2.  Check the output of the web server. You might see errors like this: |  | ||||||
|  |  | ||||||
|  |  | ||||||
|     .. code:: |  | ||||||
|  |  | ||||||
|         [2021-01-25 10:08:04 +0000] [40] [ERROR] Socket error processing request. |  | ||||||
|         Traceback (most recent call last): |  | ||||||
|         File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 134, in handle |  | ||||||
|             self.handle_request(listener, req, client, addr) |  | ||||||
|         File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 190, in handle_request |  | ||||||
|             util.reraise(*sys.exc_info()) |  | ||||||
|         File "/usr/local/lib/python3.7/site-packages/gunicorn/util.py", line 625, in reraise |  | ||||||
|             raise value |  | ||||||
|         File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 178, in handle_request |  | ||||||
|             resp.write_file(respiter) |  | ||||||
|         File "/usr/local/lib/python3.7/site-packages/gunicorn/http/wsgi.py", line 396, in write_file |  | ||||||
|             if not self.sendfile(respiter): |  | ||||||
|         File "/usr/local/lib/python3.7/site-packages/gunicorn/http/wsgi.py", line 386, in sendfile |  | ||||||
|             sent += os.sendfile(sockno, fileno, offset + sent, count) |  | ||||||
|         OSError: [Errno 22] Invalid argument |  | ||||||
|  |  | ||||||
|     To fix this issue, add |  | ||||||
|  |  | ||||||
|     .. code:: |  | ||||||
|  |  | ||||||
|         SENDFILE=0 |  | ||||||
|  |  | ||||||
|     to your ``docker-compose.env`` file. |  | ||||||
|  |  | ||||||
| Error while reading metadata |  | ||||||
| ############################ |  | ||||||
|  |  | ||||||
| You might find messages like these in your log files: |  | ||||||
|  |  | ||||||
| .. code:: |  | ||||||
|  |  | ||||||
|     [WARNING] [paperless.parsing.tesseract] Error while reading metadata |  | ||||||
|  |  | ||||||
| This indicates that paperless failed to read PDF metadata from one of your documents. This happens when you |  | ||||||
| open the affected documents in paperless for editing. Paperless will continue to work, and will simply not |  | ||||||
| show the invalid metadata. |  | ||||||
|  |  | ||||||
| Consumer fails with a FileNotFoundError |  | ||||||
| ####################################### |  | ||||||
|  |  | ||||||
| You might find messages like these in your log files: |  | ||||||
|  |  | ||||||
| .. code:: |  | ||||||
|  |  | ||||||
|     [ERROR] [paperless.consumer] Error while consuming document SCN_0001.pdf: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/ocrmypdf.io.yhk3zbv0/origin.pdf' |  | ||||||
|     Traceback (most recent call last): |  | ||||||
|       File "/app/paperless/src/paperless_tesseract/parsers.py", line 261, in parse |  | ||||||
|         ocrmypdf.ocr(**args) |  | ||||||
|       File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/api.py", line 337, in ocr |  | ||||||
|         return run_pipeline(options=options, plugin_manager=plugin_manager, api=True) |  | ||||||
|       File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_sync.py", line 385, in run_pipeline |  | ||||||
|         exec_concurrent(context, executor) |  | ||||||
|       File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_sync.py", line 302, in exec_concurrent |  | ||||||
|         pdf = post_process(pdf, context, executor) |  | ||||||
|       File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_sync.py", line 235, in post_process |  | ||||||
|         pdf_out = metadata_fixup(pdf_out, context) |  | ||||||
|       File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_pipeline.py", line 798, in metadata_fixup |  | ||||||
|         with pikepdf.open(context.origin) as original, pikepdf.open(working_file) as pdf: |  | ||||||
|       File "/usr/local/lib/python3.8/dist-packages/pikepdf/_methods.py", line 923, in open |  | ||||||
|         pdf = Pdf._open( |  | ||||||
|     FileNotFoundError: [Errno 2] No such file or directory: '/tmp/ocrmypdf.io.yhk3zbv0/origin.pdf' |  | ||||||
|  |  | ||||||
| This probably indicates paperless tried to consume the same file twice.  This can happen for a number of reasons, |  | ||||||
| depending on how documents are placed into the consume folder.  If paperless is using inotify (the default) to |  | ||||||
| check for documents, try adjusting the :ref:`inotify configuration <configuration-inotify>`.  If polling is enabled, |  | ||||||
| try adjusting the :ref:`polling configuration <configuration-polling>`. |  | ||||||
|  |  | ||||||
| Consumer fails waiting for file to remain unmodified. |  | ||||||
| ##################################################### |  | ||||||
|  |  | ||||||
| You might find messages like these in your log files: |  | ||||||
|  |  | ||||||
| .. code:: |  | ||||||
|  |  | ||||||
|     [ERROR] [paperless.management.consumer] Timeout while waiting on file /usr/src/paperless/src/../consume/SCN_0001.pdf to remain unmodified. |  | ||||||
|  |  | ||||||
| This indicates paperless timed out while waiting for the file to be completely written to the consume folder. |  | ||||||
| Adjusting :ref:`polling configuration <configuration-polling>` values should resolve the issue. |  | ||||||
|  |  | ||||||
| .. note:: |  | ||||||
|  |  | ||||||
|     The user will need to manually move the file out of the consume folder and |  | ||||||
|     back in, for the initial failing file to be consumed. |  | ||||||
|  |  | ||||||
| Consumer fails reporting "OS reports file as busy still". |  | ||||||
| ######################################################### |  | ||||||
|  |  | ||||||
| You might find messages like these in your log files: |  | ||||||
|  |  | ||||||
| .. code:: |  | ||||||
|  |  | ||||||
|     [WARNING] [paperless.management.consumer] Not consuming file /usr/src/paperless/src/../consume/SCN_0001.pdf: OS reports file as busy still |  | ||||||
|  |  | ||||||
| This indicates paperless was unable to open the file, as the OS reported the file as still being in use.  To prevent a |  | ||||||
| crash, paperless did not try to consume the file.  If paperless is using inotify (the default) to |  | ||||||
| check for documents, try adjusting the :ref:`inotify configuration <configuration-inotify>`.  If polling is enabled, |  | ||||||
| try adjusting the :ref:`polling configuration <configuration-polling>`. |  | ||||||
|  |  | ||||||
| .. note:: |  | ||||||
|  |  | ||||||
|     The user will need to manually move the file out of the consume folder and |  | ||||||
|     back in, for the initial failing file to be consumed. |  | ||||||
|  |  | ||||||
| Log reports "Creating PaperlessTask failed". |  | ||||||
| ######################################################### |  | ||||||
|  |  | ||||||
| You might find messages like these in your log files: |  | ||||||
|  |  | ||||||
| .. code:: |  | ||||||
|  |  | ||||||
|     [ERROR] [paperless.management.consumer] Creating PaperlessTask failed: db locked |  | ||||||
|  |  | ||||||
| You are likely using an SQLite-based installation with an increased number of workers, and are running into SQLite's concurrency limitations. |  | ||||||
| Uploading or consuming multiple files at once results in many workers attempting to access the database simultaneously. |  | ||||||
|  |  | ||||||
| Consider switching to PostgreSQL if you will often process many documents at once.  Otherwise, |  | ||||||
| try tweaking the ``PAPERLESS_DB_TIMEOUT`` setting to allow more time for the database to unlock.  This may have |  | ||||||
| minor performance implications. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| gunicorn fails to start with "is not a valid port number" |  | ||||||
| ######################################################### |  | ||||||
|  |  | ||||||
| You are likely running on Kubernetes, which automatically creates an environment variable named ``${serviceName}_PORT``. |  | ||||||
| This is the same environment variable which is used by Paperless to optionally change the port gunicorn listens on. |  | ||||||
|  |  | ||||||
| To fix this, set ``PAPERLESS_PORT`` explicitly to your desired port, or to the default of 8000. |  | ||||||
|   | |||||||
| @@ -1,420 +1,12 @@ | |||||||
|  | .. _usage_overview: | ||||||
|  |  | ||||||
| ************** | ************** | ||||||
| Usage Overview | Usage Overview | ||||||
| ************** | ************** | ||||||
|  |  | ||||||
| Paperless is an application that manages your personal documents. With |  | ||||||
| the help of a document scanner (see :ref:`scanners`), paperless transforms |  | ||||||
| your unwieldy physical document binders into a searchable archive and |  | ||||||
| provides many utilities for finding and managing your documents. |  | ||||||
|  |  | ||||||
|  | .. cssclass:: redirect-notice | ||||||
|  |  | ||||||
| Terms and definitions |     The Paperless-ngx documentation has permanently moved. | ||||||
| ##################### |  | ||||||
|  |  | ||||||
| Paperless essentially consists of two different parts for managing your |     You will be redirected shortly... | ||||||
| documents: |  | ||||||
|  |  | ||||||
| * The *consumer* watches a specified folder and adds all documents in that |  | ||||||
|   folder to paperless. |  | ||||||
| * The *web server* provides a UI that you use to manage and search for your |  | ||||||
|   scanned documents. |  | ||||||
|  |  | ||||||
| Each document has a couple of fields that you can assign to them: |  | ||||||
|  |  | ||||||
| * A *Document* is a piece of paper that sometimes contains valuable |  | ||||||
|   information. |  | ||||||
| * The *correspondent* of a document is the person, institution or company that |  | ||||||
|   a document either originates from, or is sent to. |  | ||||||
| * A *tag* is a label that you can assign to documents. Think of tags as more |  | ||||||
|   powerful folders: Multiple documents can be grouped together with a single |  | ||||||
|   tag, however, a single document can also have multiple tags. This is not |  | ||||||
|   possible with folders. The reason folders are not implemented in paperless |  | ||||||
|   is simply that tags are much more versatile than folders. |  | ||||||
| * A *document type* is used to demarcate the type of a document such as letter, |  | ||||||
|   bank statement, invoice, contract, etc. It is used to identify what a document |  | ||||||
|   is about. |  | ||||||
| * The *date added* of a document is the date the document was scanned into |  | ||||||
|   paperless. You cannot and should not change this date. |  | ||||||
| * The *date created* of a document is the date the document was initially issued. |  | ||||||
|   This can be the date you bought a product, the date you signed a contract, or |  | ||||||
|   the date a letter was sent to you. |  | ||||||
| * The *archive serial number* (short: ASN) of a document is the identifier of |  | ||||||
|   the document in your physical document binders. See |  | ||||||
|   :ref:`usage-recommended_workflow` below. |  | ||||||
| * The *content* of a document is the text that was OCR'ed from the document. |  | ||||||
|   This text is fed into the search engine and is used for matching tags, |  | ||||||
|   correspondents and document types. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| Frontend overview |  | ||||||
| ################# |  | ||||||
|  |  | ||||||
| .. warning:: |  | ||||||
|  |  | ||||||
|     TBD. Add some fancy screenshots! |  | ||||||
|  |  | ||||||
| Adding documents to paperless |  | ||||||
| ############################# |  | ||||||
|  |  | ||||||
| Once you've got Paperless set up, you need to start feeding documents into it. |  | ||||||
| When adding documents to paperless, it will perform the following operations on |  | ||||||
| your documents: |  | ||||||
|  |  | ||||||
| 1.  OCR the document, if it has no text. Digital documents usually have text, |  | ||||||
|     and this step will be skipped for those documents. |  | ||||||
| 2.  Paperless will create an archivable PDF/A document from your document. |  | ||||||
|     If this document is coming from your scanner, it will have embedded selectable text. |  | ||||||
| 3.  Paperless performs automatic matching of tags, correspondents and types on the |  | ||||||
|     document before storing it in the database. |  | ||||||
|  |  | ||||||
| .. hint:: |  | ||||||
|  |  | ||||||
|     This process can be configured to fit your needs. If you don't want paperless |  | ||||||
|     to create archived versions for digital documents, you can configure that by |  | ||||||
|     configuring ``PAPERLESS_OCR_MODE=skip_noarchive``. Please read the |  | ||||||
|     :ref:`relevant section in the documentation <configuration-ocr>`. |  | ||||||
|  |  | ||||||
| .. note:: |  | ||||||
|  |  | ||||||
|     No matter which options you choose, Paperless will always store the original |  | ||||||
|     document that it found in the consumption directory or in the mail and |  | ||||||
|     will never overwrite that document. Archived versions are stored alongside the |  | ||||||
|     original versions. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| The consumption directory |  | ||||||
| ========================= |  | ||||||
|  |  | ||||||
| The primary method of getting documents into your database is by putting them in |  | ||||||
| the consumption directory.  The consumer runs in an infinite loop, looking for new |  | ||||||
| additions to this directory. When it finds them, the consumer goes about the process |  | ||||||
| of parsing them with the OCR, indexing what it finds, and storing it in the media directory. |  | ||||||
|  |  | ||||||
| Getting stuff into this directory is up to you.  If you're running Paperless |  | ||||||
| on your local computer, you might just want to drag and drop files there, but if |  | ||||||
| you're running this on a server and want your scanner to automatically push |  | ||||||
| files to this directory, you'll need to set up some sort of service to accept the |  | ||||||
| files from the scanner.  Typically, you're looking at an FTP server like |  | ||||||
| `Proftpd`_ or a Windows folder share with `Samba`_. |  | ||||||
|  |  | ||||||
| .. _Proftpd: http://www.proftpd.org/ |  | ||||||
| .. _Samba: http://www.samba.org/ |  | ||||||
|  |  | ||||||
| .. TODO: hyperref to configuration of the location of this magic folder. |  | ||||||
|  |  | ||||||
| Web UI Upload |  | ||||||
| ============= |  | ||||||
|  |  | ||||||
| The dashboard has a file drop field to upload documents to paperless. Simply drag a file |  | ||||||
| onto this field or select a file with the file dialog. Multiple files are supported. |  | ||||||
|  |  | ||||||
| You can also upload documents on any other page of the web UI by dragging-and-dropping |  | ||||||
| files into your browser window. |  | ||||||
|  |  | ||||||
| .. _usage-mobile_upload: |  | ||||||
|  |  | ||||||
| Mobile upload |  | ||||||
| ============= |  | ||||||
|  |  | ||||||
| The mobile app over at `<https://github.com/qcasey/paperless_share>`_ allows Android users |  | ||||||
| to share any documents with paperless. This can be combined with any of the mobile |  | ||||||
| scanning apps out there, such as Office Lens. |  | ||||||
|  |  | ||||||
| Furthermore, there is the  `Paperless App <https://github.com/bauerj/paperless_app>`_ as well, |  | ||||||
| which not only has document upload, but also document browsing and download features. |  | ||||||
|  |  | ||||||
| .. _usage-email: |  | ||||||
|  |  | ||||||
| IMAP (Email) |  | ||||||
| ============ |  | ||||||
|  |  | ||||||
| You can tell paperless-ngx to consume documents from your email accounts. |  | ||||||
| This is a very flexible and powerful feature if you regularly receive documents |  | ||||||
| via mail that you need to archive. The mail consumer can be configured by using the |  | ||||||
| admin interface in the following manner: |  | ||||||
|  |  | ||||||
| 1.  Define e-mail accounts. |  | ||||||
| 2.  Define mail rules for your account. |  | ||||||
|  |  | ||||||
| These rules perform the following: |  | ||||||
|  |  | ||||||
| 1.  Connect to the mail server. |  | ||||||
| 2.  Fetch all matching mails (as defined by folder, maximum age and the filters). |  | ||||||
| 3.  Check if there are any consumable attachments. |  | ||||||
| 4.  If so, instruct paperless to consume the attachments and optionally |  | ||||||
|     use the metadata provided in the rule for the new document. |  | ||||||
| 5.  If documents were consumed from a mail, the rule action is performed |  | ||||||
|     on that mail. |  | ||||||
|  |  | ||||||
| Paperless will completely ignore mails that do not match your filters. It will also |  | ||||||
| only perform the action on mails that it has consumed documents from. |  | ||||||
|  |  | ||||||
| The actions all ensure that the same mail is not consumed twice by different means. |  | ||||||
| These are as follows: |  | ||||||
|  |  | ||||||
| *   **Delete:** Immediately deletes mail that paperless has consumed documents from. |  | ||||||
|     Use with caution. |  | ||||||
| *   **Mark as read:** Mark consumed mail as read. Paperless will not consume documents |  | ||||||
|     from already read mails. If you read a mail before paperless sees it, it will be |  | ||||||
|     ignored. |  | ||||||
| *   **Flag:** Sets the 'important' flag on mails with consumed documents. Paperless |  | ||||||
|     will not consume flagged mails. |  | ||||||
| *   **Move to folder:** Moves consumed mails out of the way so that paperless won't |  | ||||||
|     consume them again. |  | ||||||
| *   **Add custom Tag:** Adds a custom tag to mails with consumed documents (the IMAP |  | ||||||
|     standard calls these "keywords"). Paperless will not consume mails already tagged. |  | ||||||
|     Not all mail servers support this feature! |  | ||||||
|  |  | ||||||
| .. caution:: |  | ||||||
|  |  | ||||||
|     The mail consumer will perform these actions on all mails it has consumed |  | ||||||
|     documents from. Keep in mind that the actual consumption process may fail |  | ||||||
|     for some reason, leaving you with missing documents in paperless. |  | ||||||
|  |  | ||||||
| .. note:: |  | ||||||
|  |  | ||||||
|     With the correct set of rules, you can completely automate your email documents. |  | ||||||
|     Create rules for every correspondent you receive digital documents from and |  | ||||||
|     paperless will read them automatically. The default action "mark as read" is |  | ||||||
|     pretty tame and will not cause any damage or data loss whatsoever. |  | ||||||
|  |  | ||||||
|     You can also set up a special folder in your mail account for paperless and use |  | ||||||
|     your favorite mail client to move to-be-consumed mails into that folder |  | ||||||
|     automatically or manually and tell paperless to move them to yet another folder |  | ||||||
|     after consumption. It's up to you. |  | ||||||
|  |  | ||||||
| .. note:: |  | ||||||
|  |  | ||||||
|     When defining a mail rule with a folder, you may need to try different characters to |  | ||||||
|     define how the sub-folders are separated.  Common values include ".", "/" or "|", but |  | ||||||
|     this varies by the mail server.  Check the documentation for your mail server.  In the |  | ||||||
|     event of an error fetching mail from a certain folder, check the Paperless logs.  When |  | ||||||
|     a folder is not located, Paperless will attempt to list all folders found in the account |  | ||||||
|     to the Paperless logs. |  | ||||||
|  |  | ||||||
| .. note:: |  | ||||||
|  |  | ||||||
|     Paperless will process the rules in the order defined in the admin page. |  | ||||||
|  |  | ||||||
|     You can define catch-all rules and have them executed last to consume |  | ||||||
|     any documents not matched by previous rules. Such a rule may assign an "Unknown |  | ||||||
|     mail document" tag to consumed documents so you can inspect them further. |  | ||||||
|  |  | ||||||
| Paperless is set up to check your mails every 10 minutes. This can be configured on the |  | ||||||
| 'Scheduled tasks' page in the admin. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| REST API |  | ||||||
| ======== |  | ||||||
|  |  | ||||||
| You can also submit a document using the REST API, see :ref:`api-file_uploads` for details. |  | ||||||
|  |  | ||||||
| .. _basic-searching: |  | ||||||
|  |  | ||||||
|  |  | ||||||
| Best practices |  | ||||||
| ############## |  | ||||||
|  |  | ||||||
| Paperless offers a couple tools that help you organize your document collection. However, |  | ||||||
| it is up to you to use them in a way that helps you organize documents and find specific |  | ||||||
| documents when you need them. This section offers a couple ideas for managing your collection. |  | ||||||
|  |  | ||||||
| Document types allow you to classify documents according to what they are. You can define |  | ||||||
| types such as "Receipt", "Invoice", or "Contract". If you used to collect all your receipts |  | ||||||
| in a single binder, you can recreate that system in paperless by defining a document type, |  | ||||||
| assigning documents to that type and then filtering by that type to see only receipts. |  | ||||||
|  |  | ||||||
| Not all documents need document types. Sometimes it's hard to determine what the type of a |  | ||||||
| document is or it is hard to justify creating a document type that you only need once or twice. |  | ||||||
| This is okay. As long as the types you define help you organize your collection in the way |  | ||||||
| you want, paperless is doing its job. |  | ||||||
|  |  | ||||||
| Tags can be used in many different ways. Think of tags as more versatile folders or binders. |  | ||||||
| If you have a binder for documents related to university, your car, or health care, you can |  | ||||||
| create these binders in paperless by creating tags and assigning them to relevant documents. |  | ||||||
| Just as with document types, you can filter the document list by tags and only see documents of |  | ||||||
| a certain topic. |  | ||||||
|  |  | ||||||
| With physical documents, you'll often need to decide which folder the document belongs to. |  | ||||||
| The advantage of tags over folders and binders is that a single document can have multiple |  | ||||||
| tags. A physical document cannot magically appear in two different folders, but with tags, |  | ||||||
| this is entirely possible. |  | ||||||
|  |  | ||||||
| .. hint:: |  | ||||||
|  |  | ||||||
|   This can be used in many different ways. One example: Imagine you're working on a particular |  | ||||||
|   task, such as signing up for university. Usually you'll need to collect a bunch of different |  | ||||||
|   documents that are already sorted into various folders. With the tag system of paperless, |  | ||||||
|   you can create a new group of documents that are relevant to this task without destroying |  | ||||||
|   the already existing organization. When you're done with the task, you could delete the |  | ||||||
|   tag again, which would be equal to sorting documents back into the folder they belong into. |  | ||||||
|   Or keep the tag, up to you. |  | ||||||
|  |  | ||||||
| All of the logic above applies to correspondents as well. Attach them to documents if you |  | ||||||
| feel that they help you organize your collection. |  | ||||||
|  |  | ||||||
| When you've started organizing your documents, create a couple saved views for document collections |  | ||||||
| you regularly access. This is equal to having labeled physical binders on your desk, except |  | ||||||
| that these saved views are dynamic and simply update themselves as you add documents to the system. |  | ||||||
|  |  | ||||||
| Here are a couple examples of tags and types that you could use in your collection. |  | ||||||
|  |  | ||||||
| * An ``inbox`` tag for newly added documents that you haven't manually edited yet. |  | ||||||
| * A tag ``car`` for everything car related (repairs, registration, insurance, etc) |  | ||||||
| * A tag ``todo`` for documents that you still need to do something with, such as reply, or |  | ||||||
|   perform some task online. |  | ||||||
| * A tag ``bank account x`` for all bank statements related to that account. |  | ||||||
| * A tag ``mail`` for anything that you added to paperless via its mail processing capabilities. |  | ||||||
| * A tag ``missing_metadata`` when you still need to add some metadata to a document, but can't |  | ||||||
|   or don't want to do this right now. |  | ||||||
|  |  | ||||||
| .. _basic-usage_searching: |  | ||||||
|  |  | ||||||
| Searching |  | ||||||
| ######### |  | ||||||
|  |  | ||||||
| Paperless offers an extensive searching mechanism that is designed to allow you to quickly |  | ||||||
| find a document you're looking for (for example, that thing you bought a couple of months ago |  | ||||||
| that just broke, or that contract you signed 8 years ago). |  | ||||||
|  |  | ||||||
| When you search paperless for a document, it tries to match this query against your documents. |  | ||||||
| Paperless will look for matching documents by inspecting their content, title, correspondent, |  | ||||||
| type and tags. Paperless returns a scored list of results, so that documents matching your query |  | ||||||
| better will appear further up in the search results. |  | ||||||
|  |  | ||||||
| By default, paperless returns only documents which contain all words typed in the search bar. |  | ||||||
| However, paperless also offers advanced search syntax if you want to drill down the results |  | ||||||
| further. |  | ||||||
|  |  | ||||||
| Matching documents with logical expressions: |  | ||||||
|  |  | ||||||
| .. code:: |  | ||||||
|  |  | ||||||
|   shopname AND (product1 OR product2) |  | ||||||
|  |  | ||||||
| Matching specific tags, correspondents or types: |  | ||||||
|  |  | ||||||
| .. code:: |  | ||||||
|  |  | ||||||
|   type:invoice tag:unpaid |  | ||||||
|   correspondent:university certificate |  | ||||||
|  |  | ||||||
| Matching dates: |  | ||||||
|  |  | ||||||
| .. code:: |  | ||||||
|  |  | ||||||
|   created:[2005 to 2009] |  | ||||||
|   added:yesterday |  | ||||||
|   modified:today |  | ||||||
|  |  | ||||||
| Matching inexact words: |  | ||||||
|  |  | ||||||
| .. code:: |  | ||||||
|  |  | ||||||
|   produ*name |  | ||||||
|  |  | ||||||
| .. note:: |  | ||||||
|  |  | ||||||
|   Inexact terms are hard for search indexes. These queries might take a while to execute. That's why paperless offers |  | ||||||
|   auto complete and query correction. |  | ||||||
|  |  | ||||||
| All of these constructs can be combined as you see fit. |  | ||||||
| Paperless uses Whoosh's default query language. To learn more about it, see the |  | ||||||
| `Whoosh query language <https://whoosh.readthedocs.io/en/latest/querylang.html>`_ documentation. |  | ||||||
| For details on what date parsing utilities are available, see |  | ||||||
| `Date parsing <https://whoosh.readthedocs.io/en/latest/dates.html#parsing-date-queries>`_. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| .. _usage-recommended_workflow: |  | ||||||
|  |  | ||||||
| The recommended workflow |  | ||||||
| ######################## |  | ||||||
|  |  | ||||||
| Once you have familiarized yourself with paperless and are ready to use it |  | ||||||
| for all your documents, the recommended workflow for managing your documents |  | ||||||
| is as follows. This workflow also takes into account that some documents |  | ||||||
| have to be kept in physical form, but still ensures that you get all the |  | ||||||
| advantages for these documents as well. |  | ||||||
|  |  | ||||||
| The following diagram shows how easy it is to manage your documents. |  | ||||||
|  |  | ||||||
| .. image:: _static/recommended_workflow.png |  | ||||||
|  |  | ||||||
| Preparations in paperless |  | ||||||
| ========================= |  | ||||||
|  |  | ||||||
| * Create an inbox tag that gets assigned to all new documents. |  | ||||||
| * Create a TODO tag. |  | ||||||
|  |  | ||||||
| Processing of the physical documents |  | ||||||
| ==================================== |  | ||||||
|  |  | ||||||
| Keep a physical inbox. Whenever you receive a document that you need to |  | ||||||
| archive, put it into your inbox. Regularly, do the following for all documents |  | ||||||
| in your inbox: |  | ||||||
|  |  | ||||||
| 1.  For each document, decide if you need to keep the document in physical |  | ||||||
|     form. This applies to certain important documents, such as contracts and |  | ||||||
|     certificates. |  | ||||||
| 2.  If you need to keep the document, write a running number on the document |  | ||||||
|     before scanning, starting at one and counting upwards. This is the archive |  | ||||||
|     serial number, or ASN in short. |  | ||||||
| 3.  Scan the document. |  | ||||||
| 4.  If the document has an ASN assigned, store it in a *single* binder, sorted |  | ||||||
|     by ASN. Don't order this binder in any other way. |  | ||||||
| 5.  If the document has no ASN, throw it away. Yay! |  | ||||||
|  |  | ||||||
| Over time, you will notice that your physical binder will fill up. If it is |  | ||||||
| full, label the binder with the range of ASNs in this binder (e.g., "Documents |  | ||||||
| 1 to 343"), store the binder in your cellar or elsewhere, and start a new |  | ||||||
| binder. |  | ||||||
|  |  | ||||||
| The idea behind this process is that you will never have to use the physical |  | ||||||
| binders to find a document. If you need a specific physical document, you |  | ||||||
| may find this document by: |  | ||||||
|  |  | ||||||
| 1.  Searching in paperless for the document. |  | ||||||
| 2.  Identifying the ASN of the document, since it appears on the scan. |  | ||||||
| 3.  Grabbing the relevant document binder and getting the document. This is easy since |  | ||||||
|     they are sorted by ASN. |  | ||||||
|  |  | ||||||
| Processing of documents in paperless |  | ||||||
| ==================================== |  | ||||||
|  |  | ||||||
| Once you have scanned in a document, proceed in paperless as follows. |  | ||||||
|  |  | ||||||
| 1.  If the document has an ASN, assign the ASN to the document. |  | ||||||
| 2.  Assign a correspondent to the document (e.g., your employer, bank, etc.). |  | ||||||
|     This isn't strictly necessary but helps in finding a document when you need |  | ||||||
|     it. |  | ||||||
| 3.  Assign a document type (e.g., invoice, bank statement, etc.) to the document. |  | ||||||
|     This isn't strictly necessary but helps in finding a document when you need |  | ||||||
|     it. |  | ||||||
| 4.  Assign a proper title to the document (the name of an item you bought, the |  | ||||||
|     subject of the letter, etc) |  | ||||||
| 5.  Check that the date of the document is correct. Paperless tries to read |  | ||||||
|     the date from the content of the document, but this fails sometimes if the |  | ||||||
|     OCR is bad or multiple dates appear on the document. |  | ||||||
| 6.  Remove inbox tags from the documents. |  | ||||||
|  |  | ||||||
| .. hint:: |  | ||||||
|  |  | ||||||
|     You can set up manual matching rules for your correspondents and tags and |  | ||||||
|     paperless will assign them automatically. After consuming a couple documents, |  | ||||||
|     you can even ask paperless to *learn* when to assign tags and correspondents |  | ||||||
|     by itself. For details on this feature, see :ref:`advanced-matching`. |  | ||||||
|  |  | ||||||
| Task management |  | ||||||
| =============== |  | ||||||
|  |  | ||||||
| Some documents require attention and require you to act on the document. You |  | ||||||
| may take two different approaches to handle these documents based on how |  | ||||||
| regularly you intend to scan documents and use paperless. |  | ||||||
|  |  | ||||||
| * If you scan and process your documents in paperless regularly, assign a |  | ||||||
|   TODO tag to all scanned documents that you need to process. Create a saved |  | ||||||
|   view on the dashboard that shows all documents with this tag. |  | ||||||
| * If you do not scan documents regularly and use paperless solely for archiving, |  | ||||||
|   create a physical TODO box next to your physical inbox and put documents you |  | ||||||
|   need to process in the TODO box. When you have performed the task associated with |  | ||||||
|   the document, move it to the inbox. |  | ||||||
|   | |||||||