mirror of https://github.com/paperless-ngx/paperless-ngx.git, synced 2025-10-30 03:56:23 -05:00
	Merge branch 'dev'
		| @@ -15,7 +15,7 @@ services: | ||||
|       POSTGRES_PASSWORD: paperless | ||||
| 
 | ||||
|   webserver: | ||||
|     image: jonaswinkler/paperless-ng:0.9 | ||||
|     image: jonaswinkler/paperless-ng:0.9.1 | ||||
|     restart: always | ||||
|     depends_on: | ||||
|       - db | ||||
| @@ -5,7 +5,7 @@ services: | ||||
|     restart: always | ||||
|  | ||||
|   webserver: | ||||
|     image: jonaswinkler/paperless-ng:0.9 | ||||
|     image: jonaswinkler/paperless-ng:0.9.1 | ||||
|     restart: always | ||||
|     depends_on: | ||||
|       - broker | ||||
|   | ||||
| @@ -1,7 +1,3 @@ | ||||
| ############################################################################### | ||||
| ### Back end                                                                ### | ||||
| ############################################################################### | ||||
|  | ||||
| FROM python:3.7-slim | ||||
|  | ||||
| WORKDIR /usr/src/paperless/ | ||||
|   | ||||
| @@ -82,6 +82,13 @@ A.  If you used the docker-compose file, simply download the files of the new re | ||||
|     If you see everything working, you can start paperless-ng with "-d" to have it | ||||
|     run in the background. | ||||
|  | ||||
|     .. hint:: | ||||
|  | ||||
|         The released docker-compose files specify exact versions to be pulled from the hub. | ||||
|         This is to ensure that if the docker-compose files should change at some point | ||||
|         (i.e., services updated or configured differently), you won't run into trouble due to | ||||
|         docker pulling the ``latest`` image and running it in an older environment. | ||||
|          | ||||
| B.  If you built the image yourself, grab the new archive and replace your current | ||||
|     paperless folder with the new contents. | ||||
|  | ||||
| @@ -120,6 +127,7 @@ After grabbing the new release and unpacking the contents, do the following: | ||||
|         $ pip install --upgrade pipenv | ||||
|         $ cd /path/to/paperless | ||||
|         $ pipenv install | ||||
|         $ pipenv clean | ||||
|  | ||||
|     This creates a new virtual environment (or uses your existing environment) | ||||
|     and installs all dependencies into it. | ||||
| @@ -143,7 +151,7 @@ Management utilities | ||||
| #################### | ||||
|  | ||||
| Paperless comes with some management commands that perform various maintenance | ||||
| tasks on your paperless instance. You can invoce these commands either by | ||||
| tasks on your paperless instance. You can invoke these commands either by | ||||
|  | ||||
| .. code:: bash | ||||
|  | ||||
| @@ -311,6 +319,19 @@ the naming scheme. | ||||
| The command takes no arguments and processes all your documents at once. | ||||
|  | ||||
|  | ||||
| Fetching e-mail | ||||
| =============== | ||||
|  | ||||
| Paperless automatically fetches your e-mail every 10 minutes by default. If | ||||
| you want to invoke the email consumer manually, call the following management | ||||
| command: | ||||
|  | ||||
| .. code:: | ||||
|  | ||||
|     mail_fetcher | ||||
|  | ||||
| The command takes no arguments and processes all your mail accounts and rules. | ||||
|  | ||||
| .. _utilities-encyption: | ||||
|  | ||||
| Managing encryption | ||||
| @@ -320,7 +341,7 @@ Documents can be stored in Paperless using GnuPG encryption. | ||||
|  | ||||
| .. danger:: | ||||
|  | ||||
|     Decryption is depreceated since paperless-ng 0.9 and doesn't really provide any | ||||
|     Encryption is deprecated since paperless-ng 0.9 and doesn't really provide any | ||||
|     additional security, since you have to store the passphrase in a configuration | ||||
|     file on the same system as the encrypted documents for paperless to work. | ||||
|     Furthermore, the entire text content of the documents is stored plain in the | ||||
|   | ||||
| @@ -52,6 +52,8 @@ filename as described above. | ||||
|  | ||||
| .. _dateparser: https://github.com/scrapinghub/dateparser/blob/v0.7.0/docs/usage.rst#settings | ||||
|  | ||||
| .. _advanced-transforming_filenames: | ||||
|  | ||||
| Transforming filenames for parsing | ||||
| ================================== | ||||
|  | ||||
| @@ -219,6 +221,7 @@ the consumption process will begin with the newly modified file. | ||||
|  | ||||
| .. _pdf2pdfocr.py: https://github.com/LeoFCardoso/pdf2pdfocr | ||||
|  | ||||
| .. _advanced-post_consume_script: | ||||
|  | ||||
| Post-consumption script | ||||
| ======================= | ||||
|   | ||||
| @@ -91,6 +91,7 @@ Result object: | ||||
|         "document": { | ||||
|              | ||||
|         } | ||||
|     } | ||||
|  | ||||
| *   ``id``: the primary key of the found document | ||||
| *   ``highlights``: an object containing parseable highlights for the result. | ||||
| @@ -109,7 +110,7 @@ Each fragment contains a list of strings, and some of them are marked as a highl | ||||
|  | ||||
| .. code:: json | ||||
|  | ||||
|     "highlights": [ | ||||
|     [ | ||||
|         [ | ||||
|             {"text": "This is a sample text with a "}, | ||||
|             {"text": "highlighted", "term": 0}, | ||||
| @@ -121,6 +122,8 @@ Each fragment contains a list of strings, and some of them are marked as a highl | ||||
|         ] | ||||
|     ] | ||||
|      | ||||
|  | ||||
|  | ||||
| When ``term`` is present within a string, the word within ``text`` should be highlighted. | ||||
| The term index groups multiple matches together and words with the same index | ||||
| should get identical highlighting. | ||||
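As a rough illustration of the highlight format described above, the following Python sketch walks a list of fragments and wraps every string that carries a ``term`` index in a marker. The function name ``render_highlights``, the bracket markers, and the trailing `" word."` piece in the sample are assumptions made for this example; they are not part of the paperless API.

```python
from typing import Dict, List, Union

Piece = Dict[str, Union[str, int]]


def render_highlights(fragments: List[List[Piece]]) -> List[str]:
    """Turn each fragment into one string, marking highlighted pieces.

    A piece with a "term" key is wrapped as [term:text], so pieces that
    share a term index end up with the same marker.
    """
    rendered = []
    for fragment in fragments:
        parts = []
        for piece in fragment:
            if "term" in piece:
                parts.append(f"[{piece['term']}:{piece['text']}]")
            else:
                parts.append(str(piece["text"]))
        rendered.append("".join(parts))
    return rendered


# Hypothetical sample data modelled on the JSON shown above.
sample = [
    [
        {"text": "This is a sample text with a "},
        {"text": "highlighted", "term": 0},
        {"text": " word."},
    ]
]
print(render_highlights(sample))
```

A real client would likely map each term index to a CSS class or colour instead of a bracket marker.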
|   | ||||
| @@ -66,7 +66,6 @@ paperless-ng 0.9.0 | ||||
|  | ||||
|   * If ``PAPERLESS_DBHOST`` is specified in the settings, paperless uses postgresql instead of sqlite. | ||||
|     Username, database and password all default to ``paperless`` if not specified. | ||||
|   * **docker-compose.yml uses PostgreSQL by default.** | ||||
|  | ||||
| * **Modified [breaking]:** document_retagger management command rework. See | ||||
|   :ref:`utilities-retagger` for details. Replaces ``document_correspondents`` | ||||
|   | ||||
| @@ -1,9 +1,10 @@ | ||||
| .. _configuration: | ||||
|  | ||||
| ************* | ||||
| Configuration | ||||
| ************* | ||||
|  | ||||
| Paperless provides a wide range of customizations. | ||||
| Have a look at ``paperless.conf.example`` for available configuration options. | ||||
| Depending on how you run paperless, these settings have to be defined in different | ||||
| places. | ||||
|  | ||||
| @@ -18,5 +19,288 @@ places. | ||||
|         /etc/paperless.conf | ||||
|         /usr/local/etc/paperless.conf | ||||
|  | ||||
|     Copy ``paperless.conf.example`` to any of these locations and adjust it to your | ||||
|     needs. | ||||
|  | ||||
| Required services | ||||
| ################# | ||||
|  | ||||
| PAPERLESS_REDIS=<url> | ||||
|     This is required for processing scheduled tasks such as email fetching, index | ||||
|     optimization and for training the automatic document matcher. | ||||
|  | ||||
|     Defaults to redis://localhost:6379. | ||||
|  | ||||
| PAPERLESS_DBHOST=<hostname> | ||||
|     By default, sqlite is used as the database backend. This can be changed here. | ||||
|     Set PAPERLESS_DBHOST and PostgreSQL will be used instead of sqlite. | ||||
|  | ||||
| PAPERLESS_DBPORT=<port> | ||||
|     Adjust port if necessary. | ||||
|      | ||||
|     Default is 5432. | ||||
|  | ||||
| PAPERLESS_DBNAME=<name> | ||||
|     Database name in PostgreSQL. | ||||
|      | ||||
|     Defaults to "paperless". | ||||
|  | ||||
| PAPERLESS_DBUSER=<name> | ||||
|     Database user in PostgreSQL. | ||||
|      | ||||
|     Defaults to "paperless". | ||||
|  | ||||
| PAPERLESS_DBPASS=<password> | ||||
|     Database password for PostgreSQL. | ||||
|      | ||||
|     Defaults to "paperless". | ||||
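The database settings above can be summarised in a short Python sketch. It only mirrors the documented behaviour (sqlite unless ``PAPERLESS_DBHOST`` is set, port 5432, and "paperless" as the default name, user and password); the function ``database_settings`` and the returned dictionary layout are hypothetical and are not taken from the actual settings module.

```python
import os
from typing import Dict, Mapping


def database_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Resolve the database backend from PAPERLESS_DB* variables."""
    host = env.get("PAPERLESS_DBHOST")
    if not host:
        # No host configured: fall back to the sqlite default.
        return {"backend": "sqlite"}
    return {
        "backend": "postgresql",
        "host": host,
        "port": env.get("PAPERLESS_DBPORT", "5432"),
        "name": env.get("PAPERLESS_DBNAME", "paperless"),
        "user": env.get("PAPERLESS_DBUSER", "paperless"),
        "password": env.get("PAPERLESS_DBPASS", "paperless"),
    }


print(database_settings({"PAPERLESS_DBHOST": "db"}))
```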
|  | ||||
|  | ||||
| Paths and folders | ||||
| ################# | ||||
|  | ||||
| PAPERLESS_CONSUMPTION_DIR=<path> | ||||
|     This is where your documents should go to be consumed.  Make sure that it exists | ||||
|     and that the user running the paperless service can read/write its contents | ||||
|     before you start Paperless. | ||||
|  | ||||
|     Don't change this when using docker, as it only changes the path within the | ||||
|     container. Change the local consumption directory in the docker-compose.yml | ||||
|     file instead. | ||||
|  | ||||
|     Defaults to "../consume", relative to the "src" directory. | ||||
|  | ||||
| PAPERLESS_DATA_DIR=<path> | ||||
|     This is where paperless stores all its data (search index, sqlite database, | ||||
|     classification model, etc). | ||||
|  | ||||
|     Defaults to "../data", relative to the "src" directory. | ||||
|  | ||||
| PAPERLESS_MEDIA_ROOT=<path> | ||||
|     This is where your documents and thumbnails are stored. | ||||
|  | ||||
|     You can set this and PAPERLESS_DATA_DIR to the same folder to have paperless | ||||
|     store all its data within the same volume. | ||||
|  | ||||
|     Defaults to "../media", relative to the "src" directory. | ||||
|  | ||||
| PAPERLESS_STATICDIR=<path> | ||||
|     Override the default STATIC_ROOT here.  This is where all static files | ||||
|     created using "collectstatic" manager command are stored. | ||||
|  | ||||
|     Unless you're doing something fancy, there is no need to override this. | ||||
|  | ||||
|     Defaults to "../static", relative to the "src" directory. | ||||
|  | ||||
| PAPERLESS_FILENAME_FORMAT=<format> | ||||
|     Changes the filenames paperless uses to store documents in the media directory. | ||||
|     See :ref:`advanced-file_name_handling` for details. | ||||
|  | ||||
|     Default is none, which disables this feature. | ||||
|  | ||||
| Hosting & Security | ||||
| ################## | ||||
|  | ||||
| PAPERLESS_SECRET_KEY=<key> | ||||
|     Paperless uses this to make session tokens. If you expose paperless on the | ||||
|     internet, you need to change this, since the default secret is well known. | ||||
|  | ||||
|     Use any sequence of characters. The more, the better. You don't need to | ||||
|     remember this. Just face-roll your keyboard. | ||||
|  | ||||
|     Default is listed in the file ``src/paperless/settings.py``. | ||||
|  | ||||
| PAPERLESS_ALLOWED_HOSTS=<comma-separated-list> | ||||
|     If you're planning on putting Paperless on the open internet, then you | ||||
|     really should set this value to the domain name you're using.  Failing to do | ||||
|     so leaves you open to HTTP host header attacks: | ||||
|     https://docs.djangoproject.com/en/3.1/topics/security/#host-header-validation | ||||
|      | ||||
|     Just remember that this is a comma-separated list, so "example.com" is fine, | ||||
|     as is "example.com,www.example.com", but NOT " example.com" or "example.com," | ||||
|  | ||||
|     Defaults to "*", which is all hosts. | ||||
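To see why stray spaces and trailing commas matter, here is a small Python sketch. It assumes the value is split on commas without trimming, which the warning above suggests but does not spell out; ``split_hosts`` and ``problem_entries`` are hypothetical helpers, not paperless functions.

```python
from typing import List


def split_hosts(value: str) -> List[str]:
    """Split on commas only, without trimming (assumed behaviour)."""
    return value.split(",")


def problem_entries(value: str) -> List[str]:
    """Return entries that are empty or carry surrounding whitespace."""
    return [h for h in split_hosts(value) if h == "" or h != h.strip()]


for candidate in ("example.com,www.example.com", " example.com", "example.com,"):
    print(repr(candidate), "->", problem_entries(candidate))
```

An empty result means every entry is a clean host name; anything else would be compared literally against the request's host header and never match.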
|  | ||||
| PAPERLESS_CORS_ALLOWED_HOSTS=<comma-separated-list> | ||||
|     You need to add your servers to the list of allowed hosts that can do CORS | ||||
|     calls. Set this to your public domain name. | ||||
|  | ||||
|     Defaults to "http://localhost:8000". | ||||
|  | ||||
| PAPERLESS_FORCE_SCRIPT_NAME=<path> | ||||
|     To host paperless under a subpath URL like example.com/paperless, set | ||||
|     this value to /paperless. No trailing slash! | ||||
|  | ||||
|     .. note:: | ||||
|  | ||||
|         I don't know if this works in paperless-ng. Probably not. | ||||
|      | ||||
|     Defaults to none, which hosts paperless at "/". | ||||
|  | ||||
| PAPERLESS_STATIC_URL=<path> | ||||
|     Override the STATIC_URL here.  Unless you're hosting Paperless under a | ||||
|     subpath like /paperless/, you probably don't need to change this. | ||||
|      | ||||
|     Defaults to "/static/". | ||||
|  | ||||
|  | ||||
| Software tweaks | ||||
| ############### | ||||
|  | ||||
| PAPERLESS_TASK_WORKERS=<num> | ||||
|     Paperless does multiple things in the background: Maintain the search index, | ||||
|     maintain the automatic matching algorithm, check emails, consume documents, | ||||
|     etc. This variable specifies how many things it will do in parallel. | ||||
|  | ||||
| PAPERLESS_THREADS_PER_WORKER=<num> | ||||
|     Furthermore, paperless uses multiple threads when consuming documents to | ||||
|     speed up OCR. This variable specifies how many pages paperless will process | ||||
|     in parallel on a single document. | ||||
|  | ||||
|     .. caution:: | ||||
|          | ||||
|         Ensure that the product | ||||
|          | ||||
|             PAPERLESS_TASK_WORKERS * PAPERLESS_THREADS_PER_WORKER | ||||
|          | ||||
|         does not exceed your CPU core count or else paperless will be extremely slow. | ||||
|         If you want paperless to process many documents in parallel, choose a high | ||||
|         worker count. If you want paperless to process very large documents faster, | ||||
|         use a higher thread per worker count. | ||||
|  | ||||
|     The default is a balance between the two, according to your CPU core count, | ||||
|     slightly favoring threads per worker and using as many cores as | ||||
|     possible. | ||||
|  | ||||
|     If you only specify PAPERLESS_TASK_WORKERS, paperless will adjust | ||||
|     PAPERLESS_THREADS_PER_WORKER automatically. | ||||
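The caution above boils down to a single comparison. The following Python sketch checks whether a given combination exceeds the available cores; ``oversubscribed`` is a hypothetical helper, and it does not reproduce how paperless picks its own defaults.

```python
import os
from typing import Optional


def oversubscribed(workers: int, threads_per_worker: int,
                   cores: Optional[int] = None) -> bool:
    """Return True if workers * threads_per_worker exceeds the core count."""
    if cores is None:
        # os.cpu_count() may return None on unusual platforms.
        cores = os.cpu_count() or 1
    return workers * threads_per_worker > cores


# On a 4-core machine: 2 workers x 2 threads fits, 2 x 4 does not.
print(oversubscribed(2, 2, cores=4))
print(oversubscribed(2, 4, cores=4))
```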
|  | ||||
|  | ||||
|  | ||||
| PAPERLESS_TIME_ZONE=<timezone> | ||||
|     Set the time zone here. | ||||
|     See https://docs.djangoproject.com/en/3.1/ref/settings/#std:setting-TIME_ZONE | ||||
|     for details on how to set it. | ||||
|  | ||||
|     Defaults to UTC. | ||||
|  | ||||
|  | ||||
|  | ||||
| PAPERLESS_OCR_LANGUAGE=<lang> | ||||
|     Customize the default language that tesseract will attempt to use when | ||||
|     parsing documents. The default language is used whenever | ||||
|      | ||||
|     * No language could be detected on a document | ||||
|     * No tesseract data files are available for the detected language | ||||
|      | ||||
|     It should be a 3-letter language code consistent with ISO | ||||
|     639: https://www.loc.gov/standards/iso639-2/php/code_list.php | ||||
|  | ||||
|     Set this to the language most of your documents are written in. | ||||
|  | ||||
|     Defaults to "eng". | ||||
|  | ||||
| PAPERLESS_OCR_ALWAYS=<bool> | ||||
|     By default Paperless does not OCR a document if the text can be retrieved from | ||||
|     the document directly. Set to true to always OCR documents. | ||||
|  | ||||
|     Defaults to false. | ||||
|  | ||||
| PAPERLESS_CONSUMER_POLLING=<num> | ||||
|     If paperless doesn't find documents added to your consume folder, it might | ||||
|     not be able to automatically detect filesystem changes. In that case, | ||||
|     specify a polling interval in seconds here, which will then cause paperless | ||||
|     to periodically check your consumption directory for changes. | ||||
|  | ||||
|     Defaults to 0, which disables polling and uses filesystem notifications. | ||||
|  | ||||
| PAPERLESS_CONSUMER_DELETE_DUPLICATES=<bool> | ||||
|     When the consumer detects a duplicate document, it will not touch the | ||||
|     original document. This default behavior can be changed here. | ||||
|  | ||||
|     Defaults to false. | ||||
|  | ||||
| PAPERLESS_CONVERT_MEMORY_LIMIT=<num> | ||||
|     On smaller systems, or even in the case of Very Large Documents, the consumer | ||||
|     may explode, complaining about how it's "unable to extend pixel cache".  In | ||||
|     such cases, try setting this to a reasonably low value, like 32.  The | ||||
|     default is to use whatever is necessary to do everything without writing to | ||||
|     disk, and units are in megabytes. | ||||
|      | ||||
|     For more information on how to use this value, you should search | ||||
|     the web for "MAGICK_MEMORY_LIMIT". | ||||
|  | ||||
|     Defaults to 0, which disables the limit. | ||||
|  | ||||
| PAPERLESS_CONVERT_TMPDIR=<path> | ||||
|     Similar to the memory limit, if you've got a small system and your OS mounts | ||||
|     /tmp as tmpfs, you should set this to a path that's on a physical disk, like | ||||
|     /home/your_user/tmp or something.  ImageMagick will use this as scratch space | ||||
|     when crunching through very large documents. | ||||
|      | ||||
|     For more information on how to use this value, you should search | ||||
|     the web for "MAGICK_TMPDIR". | ||||
|  | ||||
|     Default is none, which disables the temporary directory. | ||||
|  | ||||
| PAPERLESS_CONVERT_DENSITY=<num> | ||||
|     This setting has a high impact on the physical size of tmp page files, | ||||
|     the speed of document conversion, and can affect the accuracy of OCR | ||||
|     results. Individual results can vary and this setting should be tested | ||||
|     thoroughly against the documents you are importing to see if it has any | ||||
|     impacts either negative or positive. | ||||
|     Testing on limited document sets has shown a setting of 200 can cut the | ||||
|     size of tmp files by 1/3, and speed up conversion by up to 4x | ||||
|     with little impact to OCR accuracy. | ||||
|  | ||||
|     Default is 300. | ||||
|  | ||||
| PAPERLESS_OPTIMIZE_THUMBNAILS=<bool> | ||||
|     Use optipng to optimize thumbnails. This usually reduces the size of | ||||
|     thumbnails by about 20%, but uses considerable compute time during | ||||
|     consumption. | ||||
|  | ||||
|     Defaults to true. | ||||
|  | ||||
| PAPERLESS_POST_CONSUME_SCRIPT=<filename> | ||||
|     After a document is consumed, Paperless can trigger an arbitrary script if | ||||
|     you like.  This script will be passed a number of arguments for you to work | ||||
|     with. For more information, take a look at :ref:`advanced-post_consume_script`. | ||||
|  | ||||
|     The default is blank, which means nothing will be executed. | ||||
|  | ||||
| PAPERLESS_FILENAME_DATE_ORDER=<format> | ||||
|     Paperless will check the document text for document date information. | ||||
|     Use this setting to enable checking the document filename for date | ||||
|     information. The date order can be set to any option as specified in | ||||
|     https://dateparser.readthedocs.io/en/latest/settings.html#date-order. | ||||
|     The filename will be checked first, and if nothing is found, the document  | ||||
|     text will be checked as normal. | ||||
|  | ||||
|     Defaults to none, which disables this feature. | ||||
|  | ||||
| PAPERLESS_FILENAME_PARSE_TRANSFORMS | ||||
|     Transforms filenames before they are processed by paperless. See | ||||
|     :ref:`advanced-transforming_filenames` for details. | ||||
|  | ||||
|     Defaults to none, which disables this feature. | ||||
|  | ||||
| Binaries | ||||
| ######## | ||||
|  | ||||
| There are a few external software packages that Paperless expects to find on | ||||
| your system when it starts up.  Unless you've done something creative with | ||||
| their installation, you probably won't need to edit any of these.  However, | ||||
| if you've installed these programs somewhere where simply typing the name of | ||||
| the program doesn't automatically execute it (i.e., the program isn't in your | ||||
| $PATH), then you'll need to specify the literal path for that program. | ||||
|  | ||||
| PAPERLESS_CONVERT_BINARY=<path> | ||||
|     Defaults to "/usr/bin/convert". | ||||
|  | ||||
| PAPERLESS_GS_BINARY=<path> | ||||
|     Defaults to "/usr/bin/gs". | ||||
|  | ||||
| PAPERLESS_UNPAPER_BINARY=<path> | ||||
|     Defaults to "/usr/bin/unpaper". | ||||
|  | ||||
| PAPERLESS_OPTIPNG_BINARY=<path> | ||||
|     Defaults to "/usr/bin/optipng". | ||||
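Before overriding any of these settings, it can help to check whether the programs are already on your ``$PATH``. The Python sketch below does that with ``shutil.which``; the mapping from setting names to program names is inferred from the default paths listed above, and ``locate_binaries`` is a hypothetical helper rather than something paperless ships.

```python
import shutil
from typing import Dict, Mapping, Optional

# Setting name -> program name, inferred from the defaults listed above.
BINARIES = {
    "PAPERLESS_CONVERT_BINARY": "convert",
    "PAPERLESS_GS_BINARY": "gs",
    "PAPERLESS_UNPAPER_BINARY": "unpaper",
    "PAPERLESS_OPTIPNG_BINARY": "optipng",
}


def locate_binaries(names: Mapping[str, str] = BINARIES) -> Dict[str, Optional[str]]:
    """Map each setting to the resolved path, or None if not on $PATH."""
    return {setting: shutil.which(program) for setting, program in names.items()}


for setting, path in locate_binaries().items():
    print(f"{setting}: {path or 'not found on $PATH, set this explicitly'}")
```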
|   | ||||
| @@ -25,14 +25,28 @@ and then shred them.  Perhaps you might find it useful too. | ||||
| Paperless-ng | ||||
| ============ | ||||
|  | ||||
| I wanted to make big changes to the project that will impact the way it is used | ||||
| by its users greatly. Among the users who currently use paperless in production | ||||
| there are probably many that don't want these changes right away. I also wanted | ||||
| to have more control over what goes into the code and what does not. Therefore, | ||||
| paperless-ng was created. NG stands for both Angular (the framework used for the | ||||
| Paperless-ng is a fork of the original paperless project. It changes many | ||||
| things both on the surface and under the hood. Paperless-ng was created | ||||
| because I feel that these changes are too big to be pushed into the main | ||||
| repository right away. | ||||
|  | ||||
| NG stands for both Angular (the framework used for the | ||||
| Frontend) and next-gen. Publishing this project under a different name also | ||||
| avoids confusion between paperless and paperless-ng. | ||||
|  | ||||
| If you want to learn about what's different in paperless-ng, check out these | ||||
| resources in the documentation: | ||||
|  | ||||
| *   :ref:`Some screenshots <screenshots>` of the new UI are available. | ||||
| *   Read :ref:`this section <advanced-automatic_matching>` if you want to | ||||
|     learn about how paperless automates all tagging using machine learning. | ||||
| *   Paperless now comes with a :ref:`proper email consumer <usage-email>` | ||||
|     that's fully tested and production ready. | ||||
| *   See :ref:`this note <utilities-encyption>` about GnuPG encryption in | ||||
|     paperless-ng. | ||||
| *   The :ref:`changelog <paperless_changelog>` contains a detailed list of all changes | ||||
|     in paperless-ng. | ||||
|  | ||||
| It would be great if this project could eventually merge back into the main | ||||
| repository, but it needs a lot more work before that can happen. | ||||
|  | ||||
|   | ||||
| @@ -1,3 +1,5 @@ | ||||
| .. _screenshots: | ||||
|  | ||||
| *********** | ||||
| Screenshots | ||||
| *********** | ||||
|   | ||||
| @@ -116,7 +116,7 @@ Docker Route | ||||
|  | ||||
|     .. caution:: | ||||
|  | ||||
|         If you want to use the included ``docker-compose.yml.example`` file, you | ||||
|         If you want to use the included ``docker-compose.*.yml`` file, you | ||||
|         need to have at least Docker version **17.09.0** and docker-compose | ||||
|         version **1.17.0**. | ||||
|  | ||||
| @@ -129,6 +129,14 @@ Docker Route | ||||
|         .. _Docker installation guide: https://docs.docker.com/engine/installation/ | ||||
|         .. _docker-compose installation guide: https://docs.docker.com/compose/install/ | ||||
|  | ||||
| 2.  Copy either ``docker-compose.sqlite.yml`` or ``docker-compose.postgres.yml`` to | ||||
|     ``docker-compose.yml``, depending on which database backend you want to use. | ||||
|  | ||||
|     .. hint:: | ||||
|  | ||||
|         For new installations, it is recommended to use postgresql as the database | ||||
|         backend. This is due to the increased amount of concurrency in paperless-ng. | ||||
|  | ||||
| 2.  Modify ``docker-compose.yml`` to your preferences. You should change the path | ||||
|     to the consumption directory in this file. Find the line that specifies where | ||||
|     to mount the consumption directory: | ||||
| @@ -154,6 +162,11 @@ Docker Route | ||||
|     1000 (the default for the first normal user on most systems), it will | ||||
|     work out of the box without any modifications. | ||||
|  | ||||
|     .. note:: | ||||
|  | ||||
|         You can use any settings from the file ``paperless.conf`` in this file. | ||||
|         Have a look at :ref:`configuration` to see what's available. | ||||
|  | ||||
| 4.  Run ``docker-compose up -d``. This will create and start the necessary | ||||
|     containers. This will also build the image of paperless if you grabbed the | ||||
|     source archive. | ||||
| @@ -196,14 +209,9 @@ things have changed under the hood, so you need to adapt your setup depending on | ||||
| how you installed paperless. The important things to keep in mind are as follows. | ||||
|  | ||||
| * Read the :ref:`changelog <paperless_changelog>` and take note of breaking changes. | ||||
| * It is recommended to use postgresql as the database now. The docker-compose | ||||
|   deployment will automatically create a postgresql instance and instruct | ||||
|   paperless to use it. This means that if you use the docker-compose script | ||||
|   with your current paperless media and data volumes and used the default | ||||
|   sqlite database, **it will not use your sqlite database and it may seem | ||||
|   as if your documents are gone**. You may use the provided | ||||
|   ``docker-compose.sqlite.yml`` script instead, which does not use postgresql. See | ||||
|   :ref:`setup-sqlite_to_psql` for details on how to move your data from | ||||
| * It is recommended to use postgresql as the database now. If you want to continue | ||||
|   using SQLite, which is the default of paperless, use ``docker-compose.sqlite.yml``. | ||||
|   See :ref:`setup-sqlite_to_psql` for details on how to move your data from | ||||
|   sqlite to postgres. | ||||
| * The task scheduler of paperless, which is used to execute periodic tasks | ||||
|   such as email checking and maintenance, requires a `redis`_ message broker | ||||
| @@ -228,26 +236,40 @@ Migration to paperless-ng is then performed in a few simple steps: | ||||
| 3.  Download the latest release of paperless-ng. You can either go with the | ||||
|     docker-compose files or use the archive to build the image yourself. | ||||
|     You can either replace your current paperless folder or put paperless-ng | ||||
|     in a different location. Paperless-ng will use the same docker volumes | ||||
|     as paperless. | ||||
|     in a different location. | ||||
|  | ||||
|     .. caution:: | ||||
|  | ||||
|         Make sure you also download the ``.env`` file. This will set the | ||||
|         project name for docker compose to ``paperless`` and then it will | ||||
|         automatically reuse your existing paperless volumes. | ||||
|  | ||||
| 4.  Adjust ``docker-compose.yml`` and | ||||
|     ``docker-compose.env`` to your needs. | ||||
|     See `docker route`_ for details on which edits are required. | ||||
|  | ||||
| 5.  Update paperless. See :ref:`administration-updating` for details. | ||||
| 5.  Start paperless-ng. | ||||
|  | ||||
| 6.  Start paperless-ng. | ||||
|     .. code:: bash | ||||
|  | ||||
|         $ docker-compose up | ||||
|  | ||||
|     If you see everything working (you should see some migrations getting | ||||
|     applied, for instance), you can gracefully stop paperless-ng with Ctrl-C | ||||
|     and then start paperless-ng as usual with | ||||
|  | ||||
|     .. code:: bash | ||||
|  | ||||
|         $ docker-compose up -d | ||||
|  | ||||
| 7.  Paperless installed a permanent redirect to ``admin/`` in your browser. This | ||||
|     This will run paperless in the background and automatically start it on system boot. | ||||
|  | ||||
| 6.  Paperless installed a permanent redirect to ``admin/`` in your browser. This | ||||
|     redirect is still in place and prevents access to the new UI. Clear | ||||
|     everything related to paperless in your browser's data in order to fix | ||||
|     this issue. | ||||
|  | ||||
|  | ||||
| .. _setup-sqlite_to_psql: | ||||
|  | ||||
| Moving data from sqlite to postgresql | ||||
|   | ||||
| @@ -82,6 +82,7 @@ files from the scanner.  Typically, you're looking at an FTP server like | ||||
|  | ||||
| .. TODO: hyperref to configuration of the location of this magic folder. | ||||
|  | ||||
| .. _usage-email: | ||||
|  | ||||
| IMAP (Email) | ||||
| ============ | ||||
| @@ -133,6 +134,11 @@ These are as follows: | ||||
|     paperless will read them automatically. The default action "mark as read" is | ||||
|     pretty tame and will not cause any damage or data loss whatsoever. | ||||
|  | ||||
|     You can also set up a special folder in your mail account for paperless and use | ||||
|     your favorite mail client to move mails that should be consumed into that folder, | ||||
|     automatically or manually, and tell paperless to move them to yet another folder | ||||
|     after consumption. It's up to you. | ||||
|  | ||||
| .. note:: | ||||
|  | ||||
|     Paperless will process the rules in the order defined in the admin page. | ||||
|   | ||||
| @@ -1,287 +1,55 @@ | ||||
| # Sample paperless.conf | ||||
| # Copy this file to /etc/paperless.conf and modify it to suit your needs. | ||||
| # As this file contains passwords it should only be readable by the user | ||||
| # running paperless. | ||||
| # Have a look at the docs for documentation. | ||||
| # https://paperless-ng.readthedocs.io/en/latest/configuration.html | ||||
|  | ||||
| ############################################################################### | ||||
| ####                           Message Broker                              #### | ||||
| ############################################################################### | ||||
| # Debug. Only enable this for development. | ||||
|  | ||||
| #PAPERLESS_DEBUG=false | ||||
|  | ||||
| # Required services | ||||
|  | ||||
| # This is required for processing scheduled tasks such as email fetching, index | ||||
| # optimization and for training the automatic document matcher. | ||||
| # Defaults to localhost:6379. | ||||
| #PAPERLESS_REDIS=redis://localhost:6379 | ||||
|  | ||||
|  | ||||
| ############################################################################### | ||||
| ####                        Database Settings                              #### | ||||
| ############################################################################### | ||||
|  | ||||
| # By default, sqlite is used as the database backend. This can be changed here. | ||||
| # The docker-compose service definition uses a postgresql server. The | ||||
| # configuration for this is already done inside the docker-compose.env file. | ||||
|  | ||||
| #Set PAPERLESS_DBHOST and postgresql will be used instead of mysql. | ||||
| #PAPERLESS_DBHOST=localhost | ||||
|  | ||||
| #Adjust port if necessary | ||||
| #PAPERLESS_DBPORT= | ||||
|  | ||||
| #name, user and pass all default to "paperless" | ||||
| #PAPERLESS_DBPORT=5432 | ||||
| #PAPERLESS_DBNAME=paperless | ||||
| #PAPERLESS_DBUSER=paperless | ||||
| #PAPERLESS_DBPASS=paperless | ||||
|  | ||||
| # Paths and folders | ||||
|  | ||||
| ############################################################################### | ||||
| ####                         Paths & Folders                               #### | ||||
| ############################################################################### | ||||
|  | ||||
| # This where your documents should go to be consumed.  Make sure that it exists | ||||
| # and that the user running the paperless service can read/write its contents | ||||
| # before you start Paperless. | ||||
| PAPERLESS_CONSUMPTION_DIR=../consume | ||||
|  | ||||
| # This is where paperless stores all its data (search index, sqlite database, | ||||
| # classification model, etc). | ||||
| #PAPERLESS_CONSUMPTION_DIR=../consume | ||||
| #PAPERLESS_DATA_DIR=../data | ||||
|  | ||||
| # This is where your documents and thumbnails are stored. | ||||
| #PAPERLESS_MEDIA_ROOT=../media | ||||
|  | ||||
| # Override the default STATIC_ROOT here.  This is where all static files | ||||
| # created using "collectstatic" manager command are stored. | ||||
| #PAPERLESS_STATICDIR=../static | ||||
|  | ||||
|  | ||||
| # Override the STATIC_URL here.  Unless you're hosting Paperless off a | ||||
| # subdomain like /paperless/, you probably don't need to change this. | ||||
| #PAPERLESS_STATIC_URL=/static/ | ||||
|  | ||||
|  | ||||
| # Specify a filename format for the document (directories are supported) | ||||
| # Use the following placeholders: | ||||
| # * {correspondent} | ||||
| # * {title} | ||||
| # * {created} | ||||
| # * {added} | ||||
| # * {tags[KEY]} If your tags conform to key_value or key-value | ||||
| # * {tags[INDEX]} If your tags are strings, select the tag by index | ||||
| # Uniqueness of filenames is ensured, as an incrementing counter is attached | ||||
| # to each filename. | ||||
| #PAPERLESS_FILENAME_FORMAT= | ||||
|  | ||||
| ############################################################################### | ||||
| ####                              Security                                 #### | ||||
| ############################################################################### | ||||
| # Security and hosting | ||||
|  | ||||
| # Controls whether django's debug mode is enabled. Disable this on production | ||||
| # systems. Debug mode is disabled by default. | ||||
| #PAPERLESS_DEBUG=false | ||||
|  | ||||
| # GnuPG encryption is deprecated and will be removed in future versions. | ||||
| # | ||||
| # Don't use it. It does not provide any security at all. | ||||
| # | ||||
| # Paperless can be instructed to attempt to encrypt your PDF files with GPG | ||||
| # using the PAPERLESS_PASSPHRASE specified below.  If however you're not | ||||
| # concerned about encrypting these files (for example if you have disk | ||||
| # encryption locally) then you don't need this and can safely leave this value | ||||
| # un-set. | ||||
| # | ||||
| # One final note about the passphrase.  Once you've consumed a document with | ||||
| # one passphrase, DON'T CHANGE IT.  Paperless assumes this to be a constant and | ||||
| # can't properly export documents that were encrypted with an old passphrase if | ||||
| # you've since changed it to a new one. | ||||
| # | ||||
| # The default is to not use encryption at all. | ||||
| #PAPERLESS_PASSPHRASE=secret | ||||
|  | ||||
|  | ||||
| # The secret key has a default that should be fine so long as you're hosting | ||||
| # Paperless on a closed network.  However, if you're putting this anywhere | ||||
| # public, you should change the key to something unique and verbose. | ||||
| #PAPERLESS_SECRET_KEY=change-me | ||||
|  | ||||
|  | ||||
| # If you're planning on putting Paperless on the open internet, then you | ||||
| # really should set this value to the domain name you're using.  Failing to do | ||||
| # so leaves you open to HTTP host header attacks: | ||||
| # https://docs.djangoproject.com/en/1.10/topics/security/#host-headers-virtual-hosting | ||||
| # | ||||
| # Just remember that this is a comma-separated list, so "example.com" is fine, | ||||
| # as is "example.com,www.example.com", but NOT " example.com" or "example.com," | ||||
| #PAPERLESS_ALLOWED_HOSTS=example.com,www.example.com | ||||
|  | ||||
| # If you decide to use the Paperless API in an ajax call, you need to add your | ||||
| # servers to the list of allowed hosts that can do CORS calls. By default | ||||
| # Paperless allows calls from localhost:8080, but if you'd like to change that, | ||||
| # you can set this value to a comma-separated list. | ||||
| #PAPERLESS_CORS_ALLOWED_HOSTS=localhost:8080,example.com,localhost:8000 | ||||
|  | ||||
| # To host paperless under a subpath url like example.com/paperless you set | ||||
| # this value to /paperless. No trailing slash! | ||||
| # | ||||
| # https://docs.djangoproject.com/en/1.11/ref/settings/#force-script-name | ||||
| #PAPERLESS_FORCE_SCRIPT_NAME= | ||||
| #PAPERLESS_STATIC_URL=/static/ | ||||
|  | ||||
| ############################################################################### | ||||
| ####                          Software Tweaks                              #### | ||||
| ############################################################################### | ||||
| # Software tweaks | ||||
|  | ||||
| # Paperless does multiple things in the background: Maintain the search index, | ||||
| # maintain the automatic matching algorithm, check emails, consume documents, | ||||
| # etc. This variable specifies how many things it will do in parallel. | ||||
| #PAPERLESS_TASK_WORKERS=1 | ||||
|  | ||||
| # Furthermore, paperless uses multiple threads when consuming documents to | ||||
| # speed up OCR. This variable specifies how many pages paperless will process | ||||
| # in parallel on a single document. | ||||
| #PAPERLESS_THREADS_PER_WORKER=1 | ||||
|  | ||||
| # Ensure that the product | ||||
| #   PAPERLESS_TASK_WORKERS * PAPERLESS_THREADS_PER_WORKER | ||||
| # does not exceed your CPU core count or else paperless will be extremely slow. | ||||
| # If you want paperless to process many documents in parallel, choose a high | ||||
| # worker count. If you want paperless to process very large documents faster, | ||||
| # use a higher thread per worker count. | ||||
| # The default is a balance between the two, according to your CPU core count, | ||||
| # with a slight favor towards threads per worker, and using as many cores as | ||||
| # possible. | ||||
| # If you only specify PAPERLESS_TASK_WORKERS, paperless will adjust | ||||
| # PAPERLESS_THREADS_PER_WORKER automatically. | ||||
|  | ||||
| # If paperless won't find documents added to your consume folder, it might | ||||
| # not be able to automatically detect filesystem changes. In that case, | ||||
| # specify a polling interval in seconds below, which will then cause paperless | ||||
| # to periodically check your consumption directory for changes. | ||||
| #PAPERLESS_CONSUMER_POLLING=10 | ||||
|  | ||||
|  | ||||
| # When the consumer detects a duplicate document, it will not touch the | ||||
| # original document. This default behavior can be changed here. | ||||
| #PAPERLESS_CONSUMER_DELETE_DUPLICATES=false | ||||
|  | ||||
| # Use optipng to optimize thumbnails. This usually reduces the size of | ||||
| # thumbnails by about 20%, but uses considerable compute time during | ||||
| # consumption. | ||||
| #PAPERLESS_OPTIMIZE_THUMBNAILS=true | ||||
|  | ||||
| # After a document is consumed, Paperless can trigger an arbitrary script if | ||||
| # you like.  This script will be passed a number of arguments for you to work | ||||
| # with.  The default is blank, which means nothing will be executed.  For more | ||||
| # information, take a look at the docs: | ||||
| # http://paperless.readthedocs.org/en/latest/consumption.html#hooking-into-the-consumption-process | ||||
| #PAPERLESS_POST_CONSUME_SCRIPT=/path/to/an/arbitrary/script.sh | ||||
|  | ||||
| # By default, paperless will check the document text for document date information. | ||||
| # Uncomment the line below to enable checking the document filename for date | ||||
| # information. The date order can be set to any option as specified in | ||||
| # https://dateparser.readthedocs.io/en/latest/#settings. The filename will be | ||||
| # checked first, and if nothing is found, the document text will be checked | ||||
| # as normal. | ||||
| #PAPERLESS_FILENAME_DATE_ORDER=YMD | ||||
|  | ||||
| # Sometimes devices won't create filenames which can be parsed properly | ||||
| # by the filename parser (see | ||||
| # https://paperless.readthedocs.io/en/latest/guesswork.html). | ||||
| # | ||||
| # This setting allows you to specify a list of transformations | ||||
| # in regular expression syntax, which are passed in order to re.sub. | ||||
| # Transformation stops after the first match, so at most one transformation | ||||
| # is applied. | ||||
| # | ||||
| # Syntax is a JSON array of dictionaries containing "pattern" and "repl" | ||||
| # as keys. | ||||
| # | ||||
| # The example below transforms filenames created by a Brother ADS-2400N | ||||
| # document scanner in its standard configuration `Name_Date_Count', so that | ||||
| # count is used as title, name as tag and date can be parsed by paperless. | ||||
| #PAPERLESS_FILENAME_PARSE_TRANSFORMS=[{"pattern":"^([a-z]+)_(\\d{8})_(\\d{6})_([0-9]+)\\.", "repl":"\\2\\3Z - \\4 - \\1."}] | ||||
|  | ||||
| # | ||||
| # The following values use sensible defaults for modern systems, but if you're | ||||
| # running Paperless on a low-resource device (like a Raspberry Pi), modifying | ||||
| # some of these values may be necessary. | ||||
| # | ||||
|  | ||||
|  | ||||
| # Customize the default language that tesseract will attempt to use when | ||||
| # parsing documents. The default language is used whenever | ||||
| #  - No language could be detected on a document | ||||
| #  - No tesseract data files are available for the detected language | ||||
| # It should be a 3-letter language code consistent with ISO | ||||
| # 639: https://www.loc.gov/standards/iso639-2/php/code_list.php | ||||
| #PAPERLESS_OCR_LANGUAGE=eng | ||||
|  | ||||
|  | ||||
| # On smaller systems, or even in the case of Very Large Documents, the consumer | ||||
| # may explode, complaining about how it's "unable to extend pixel cache".  In | ||||
| # such cases, try setting this to a reasonably low value, like 32000000.  The | ||||
| # default is to use whatever is necessary to do everything without writing to | ||||
| # disk, and units are in megabytes. | ||||
| # | ||||
| # For more information on how to use this value, you should probably search | ||||
| # the web for "MAGICK_MEMORY_LIMIT". | ||||
| #PAPERLESS_CONVERT_MEMORY_LIMIT=0 | ||||
|  | ||||
|  | ||||
| # Similar to the memory limit, if you've got a small system and your OS mounts | ||||
| # /tmp as tmpfs, you should set this to a path that's on a physical disk, like | ||||
| # /home/your_user/tmp or something.  ImageMagick will use this as scratch space | ||||
| # when crunching through very large documents. | ||||
| # | ||||
| # For more information on how to use this value, you should probably search | ||||
| # the web for "MAGICK_TMPDIR". | ||||
| #PAPERLESS_CONVERT_TMPDIR=/var/tmp/paperless | ||||
|  | ||||
|  | ||||
| # By default the conversion density setting for documents is 300 DPI. In some | ||||
| # cases it has proven useful to configure a lower value. | ||||
| # This setting has a high impact on the physical size of tmp page files, | ||||
| # the speed of document conversion, and can affect the accuracy of OCR | ||||
| # results. Individual results can vary and this setting should be tested | ||||
| # thoroughly against the documents you are importing to see if it has any | ||||
| # impacts either negative or positive. | ||||
| # Testing on limited document sets has shown a setting of 200 can cut the | ||||
| # size of tmp files by 1/3, and speed up conversion by up to 4x | ||||
| # with little impact to OCR accuracy. | ||||
| #PAPERLESS_CONVERT_DENSITY=300 | ||||
|  | ||||
| # By default Paperless does not OCR a document if the text can be retrieved from | ||||
| # the document directly. Set to true to always OCR documents. | ||||
| #PAPERLESS_OCR_ALWAYS=false | ||||
|  | ||||
|  | ||||
| ############################################################################### | ||||
| ####                            Interface                                  #### | ||||
| ############################################################################### | ||||
|  | ||||
| # Override the default UTC time zone here. | ||||
| # See https://docs.djangoproject.com/en/1.10/ref/settings/#std:setting-TIME_ZONE | ||||
| # for details on how to set it. | ||||
| #PAPERLESS_TIME_ZONE=UTC | ||||
| #PAPERLESS_OCR_LANGUAGE=eng | ||||
| #PAPERLESS_OCR_ALWAYS=false | ||||
| #PAPERLESS_CONSUMER_POLLING=10 | ||||
| #PAPERLESS_CONSUMER_DELETE_DUPLICATES=false | ||||
| #PAPERLESS_CONVERT_MEMORY_LIMIT=0 | ||||
| #PAPERLESS_CONVERT_TMPDIR=/var/tmp/paperless | ||||
| #PAPERLESS_CONVERT_DENSITY=300 | ||||
| #PAPERLESS_OPTIMIZE_THUMBNAILS=true | ||||
| #PAPERLESS_POST_CONSUME_SCRIPT=/path/to/an/arbitrary/script.sh | ||||
| #PAPERLESS_FILENAME_DATE_ORDER=YMD | ||||
| #PAPERLESS_FILENAME_PARSE_TRANSFORMS=[] | ||||
|  | ||||
| # Binaries | ||||
|  | ||||
| ############################################################################### | ||||
| ####                     Third-Party Binaries                              #### | ||||
| ############################################################################### | ||||
|  | ||||
| # There are a few external software packages that Paperless expects to find on | ||||
| # your system when it starts up.  Unless you've done something creative with | ||||
| # their installation, you probably won't need to edit any of these.  However, | ||||
| # if you've installed these programs somewhere where simply typing the name of | ||||
| # the program doesn't automatically execute it (i.e. the program isn't in your | ||||
| # $PATH), then you'll need to specify the literal path for that program here. | ||||
|  | ||||
| # Convert (part of the ImageMagick suite) | ||||
| #PAPERLESS_CONVERT_BINARY=/usr/bin/convert | ||||
|  | ||||
| # Ghostscript | ||||
| #PAPERLESS_GS_BINARY=/usr/bin/gs | ||||
|  | ||||
| # Unpaper | ||||
| #PAPERLESS_UNPAPER_BINARY=/usr/bin/unpaper | ||||
|  | ||||
| # Optipng (for optimising thumbnail sizes) | ||||
| #PAPERLESS_OPTIPNG_BINARY=/usr/bin/optipng | ||||
|   | ||||
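
The PAPERLESS_FILENAME_PARSE_TRANSFORMS comments in the config file above describe a JSON array of "pattern"/"repl" pairs that are tried in order with re.sub, stopping after the first match. A minimal sketch of that behaviour could look like the following. The helper name `apply_transforms` and the sample filename are hypothetical and chosen for illustration only; how paperless itself implements this step is not shown in this diff.

```python
import json
import re

# The example value from the config comments, as it would appear in the
# environment. The doubled backslashes are JSON escapes; after json.loads
# the pattern contains plain regex escapes such as \d.
RAW = r'[{"pattern":"^([a-z]+)_(\\d{8})_(\\d{6})_([0-9]+)\\.", "repl":"\\2\\3Z - \\4 - \\1."}]'


def apply_transforms(filename, transforms):
    """Apply at most one transformation (hypothetical helper).

    Each entry is tried in order; the first pattern that matches is
    applied with re.subn and the result is returned immediately.
    """
    for transform in transforms:
        new_name, count = re.subn(transform["pattern"], transform["repl"], filename)
        if count:
            return new_name
    # No pattern matched, so the filename is left untouched.
    return filename


transforms = json.loads(RAW)

# Hypothetical scanner filename in the Name_Date_Count layout.
result = apply_transforms("invoice_20200101_120000_0001.pdf", transforms)
print(result)
```

With this input the date and time groups are moved to the front, the count follows, and the name comes last. A filename that matches none of the patterns is returned unchanged.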
| @@ -79,6 +79,7 @@ cp "$PAPERLESS_ROOT/docker/docker-compose.env" "$PAPERLESS_DIST_APP" | ||||
|  | ||||
| # docker files for pulling from docker hub | ||||
| cp "$PAPERLESS_ROOT/docker/hub/"* "$PAPERLESS_DIST" | ||||
| cp "$PAPERLESS_ROOT/.env" "$PAPERLESS_DIST" | ||||
| cp "$PAPERLESS_ROOT/docker/docker-compose.env" "$PAPERLESS_DIST" | ||||
|  | ||||
| # auxiliary files required for the docker image | ||||
|   | ||||
| @@ -152,11 +152,11 @@ else: | ||||
|     X_FRAME_OPTIONS = 'SAMEORIGIN' | ||||
|  | ||||
| # We allow CORS from localhost:8080 | ||||
| CORS_ORIGIN_WHITELIST = tuple(os.getenv("PAPERLESS_CORS_ALLOWED_HOSTS", "http://localhost:8080,https://localhost:8080").split(",")) | ||||
| CORS_ALLOWED_ORIGINS = tuple(os.getenv("PAPERLESS_CORS_ALLOWED_HOSTS", "http://localhost:8000").split(",")) | ||||
|  | ||||
| if DEBUG: | ||||
|     # Allow access from the angular development server during debugging | ||||
|     CORS_ORIGIN_WHITELIST += ('http://localhost:4200',) | ||||
|     CORS_ALLOWED_ORIGINS += ('http://localhost:4200',) | ||||
|  | ||||
| # The secret key has a default that should be fine so long as you're hosting | ||||
| # Paperless on a closed network.  However, if you're putting this anywhere | ||||
|   | ||||
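
The settings hunk above builds the CORS origin tuple by splitting PAPERLESS_CORS_ALLOWED_HOSTS on commas and appending the Angular development server when DEBUG is on. The sketch below mirrors that logic in a standalone function so it can be tried without Django. The function names `parse_origins` and `cors_allowed_origins` are hypothetical, and stripping whitespace and dropping empty entries is an added assumption; the line in the diff uses a plain split(",").

```python
def parse_origins(raw):
    """Split a comma-separated value into a tuple of origins.

    Unlike a plain raw.split(","), this also strips surrounding
    whitespace and skips empty entries (an assumption, not part of
    the diff above).
    """
    return tuple(item.strip() for item in raw.split(",") if item.strip())


def cors_allowed_origins(environ, debug=False):
    """Build the origin tuple from an environment mapping (hypothetical helper)."""
    # Same default as the new line in the diff.
    raw = environ.get("PAPERLESS_CORS_ALLOWED_HOSTS", "http://localhost:8000")
    origins = parse_origins(raw)
    if debug:
        # Allow the Angular development server during debugging.
        origins += ("http://localhost:4200",)
    return origins


default_origins = cors_allowed_origins({})
custom_origins = cors_allowed_origins(
    {"PAPERLESS_CORS_ALLOWED_HOSTS": "http://localhost:8000, https://example.com,"},
    debug=True,
)
print(default_origins)
print(custom_origins)
```

Passing a plain dict instead of os.environ keeps the example free of side effects; in a settings module the same function could be called with os.environ.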
	 Jonas Winkler