diff --git a/docs/administration.rst b/docs/administration.rst index 42a91bcdb..2bfce9588 100644 --- a/docs/administration.rst +++ b/docs/administration.rst @@ -82,6 +82,13 @@ A. If you used the docker-compose file, simply download the files of the new re If you see everything working, you can start paperless-ng with "-d" to have it run in the background. + .. hint:: + + The released docker-compose files specify exact versions to be pulled from the hub. + This is to ensure that if the docker-compose files should change at some point + (e.g., services are updated or configured differently), you won't run into trouble due to + docker pulling the ``latest`` image and running it in an older environment. + B. If you built the image yourself, grab the new archive and replace your current paperless folder with the new contents. @@ -120,6 +127,7 @@ After grabbing the new release and unpacking the contents, do the following: $ pip install --upgrade pipenv $ cd /path/to/paperless $ pipenv install + $ pipenv clean This creates a new virtual environment (or uses your existing environment) and installs all dependencies into it. @@ -143,7 +151,7 @@ Management utilities #################### Paperless comes with some management commands that perform various maintenance -tasks on your paperless instance. You can invoce these commands either by +tasks on your paperless instance. You can invoke these commands either by .. code:: bash @@ -311,6 +319,19 @@ the naming scheme. The command takes no arguments and processes all your documents at once. +Fetching e-mail +=============== + +Paperless automatically fetches your e-mail every 10 minutes by default. If +you want to invoke the email consumer manually, call the following management +command: + +.. code:: + + mail_fetcher + +The command takes no arguments and processes all your mail accounts and rules. + .. _utilities-encyption: Managing encryption @@ -320,7 +341,7 @@ Documents can be stored in Paperless using GnuPG encryption. .. 
danger:: - Decryption is depreceated since paperless-ng 0.9 and doesn't really provide any + Encryption has been deprecated since paperless-ng 0.9 and doesn't really provide any additional security, since you have to store the passphrase in a configuration file on the same system as the encrypted documents for paperless to work. Furthermore, the entire text content of the documents is stored plain in the diff --git a/docs/advanced_usage.rst b/docs/advanced_usage.rst index 6183baae1..a6f44ce48 100644 --- a/docs/advanced_usage.rst +++ b/docs/advanced_usage.rst @@ -52,6 +52,8 @@ filename as described above. .. _dateparser: https://github.com/scrapinghub/dateparser/blob/v0.7.0/docs/usage.rst#settings +.. _advanced-transforming_filenames: + Transforming filenames for parsing ================================== @@ -219,6 +221,7 @@ the consumption process will begin with the newly modified file. .. _pdf2pdfocr.py: https://github.com/LeoFCardoso/pdf2pdfocr +.. _advanced-post_consume_script: Post-consumption script ======================= diff --git a/docs/api.rst b/docs/api.rst index 34764540e..e661cc3ff 100644 --- a/docs/api.rst +++ b/docs/api.rst @@ -91,6 +91,7 @@ Result object: "document": { } + } * ``id``: the primary key of the found document * ``highlights``: an object containing parseable highlights for the result. @@ -109,7 +110,7 @@ Each fragment contains a list of strings, and some of them are marked as a highl .. code:: json - "highlights": [ + [ [ {"text": "This is a sample text with a "}, {"text": "highlighted", "term": 0}, @@ -120,6 +121,8 @@ Each fragment contains a list of strings, and some of them are marked as a highl {"text": " fragment with a highlight."} ] ] + + When ``term`` is present within a string, the word within ``text`` should be highlighted. 
The term index groups multiple matches together and words with the same index diff --git a/docs/changelog.rst b/docs/changelog.rst index 9fcf10940..7850b6d9a 100644 --- a/docs/changelog.rst +++ b/docs/changelog.rst @@ -66,7 +66,6 @@ paperless-ng 0.9.0 * If ``PAPERLESS_DBHOST`` is specified in the settings, paperless uses postgresql instead of sqlite. Username, database and password all default to ``paperless`` if not specified. - * **docker-compose.yml uses PostgreSQL by default.** * **Modified [breaking]:** document_retagger management command rework. See :ref:`utilities-retagger` for details. Replaces ``document_correspondents`` diff --git a/docs/configuration.rst b/docs/configuration.rst index 48eea64cb..1ddd7ca0e 100644 --- a/docs/configuration.rst +++ b/docs/configuration.rst @@ -1,9 +1,10 @@ +.. _configuration: + ************* Configuration ************* Paperless provides a wide range of customizations. -Have a look at ``paperless.conf.example`` for available configuration options. Depending on how you run paperless, these settings have to be defined in different places. @@ -18,5 +19,288 @@ places. /etc/paperless.conf /usr/local/etc/paperless.conf - Copy ``paperless.conf.example`` to any of these locations and adjust it to your - needs. + +Required services +################# + +PAPERLESS_REDIS= + This is required for processing scheduled tasks such as email fetching, index + optimization, and training the automatic document matcher. + + Defaults to redis://localhost:6379. + +PAPERLESS_DBHOST= + By default, sqlite is used as the database backend. This can be changed here. + Set PAPERLESS_DBHOST and PostgreSQL will be used instead of SQLite. + +PAPERLESS_DBPORT= + Adjust port if necessary. + + Default is 5432. + +PAPERLESS_DBNAME= + Database name in PostgreSQL. + + Defaults to "paperless". + +PAPERLESS_DBUSER= + Database user in PostgreSQL. + + Defaults to "paperless". + +PAPERLESS_DBPASS= + Database password for PostgreSQL. + + Defaults to "paperless". 
+ + +Paths and folders +################# + +PAPERLESS_CONSUMPTION_DIR= + This is where your documents should go to be consumed. Make sure that it exists + and that the user running the paperless service can read/write its contents + before you start Paperless. + + Don't change this when using docker, as it only changes the path within the + container. Change the local consumption directory in the docker-compose.yml + file instead. + + Defaults to "../consume", relative to the "src" directory. + +PAPERLESS_DATA_DIR= + This is where paperless stores all its data (search index, sqlite database, + classification model, etc.). + + Defaults to "../data", relative to the "src" directory. + +PAPERLESS_MEDIA_ROOT= + This is where your documents and thumbnails are stored. + + You can set this and PAPERLESS_DATA_DIR to the same folder to have paperless + store all its data within the same volume. + + Defaults to "../media", relative to the "src" directory. + +PAPERLESS_STATICDIR= + Override the default STATIC_ROOT here. This is where all static files + created by the "collectstatic" management command are stored. + + Unless you're doing something fancy, there is no need to override this. + + Defaults to "../static", relative to the "src" directory. + +PAPERLESS_FILENAME_FORMAT= + Changes the filenames paperless uses to store documents in the media directory. + See :ref:`advanced-file_name_handling` for details. + + Default is none, which disables this feature. + +Hosting & Security +################## + +PAPERLESS_SECRET_KEY= + Paperless uses this to make session tokens. If you expose paperless on the + internet, you need to change this, since the default secret is well known. + + Use any sequence of characters. The more, the better. You don't need to + remember this. Just face-roll your keyboard. + + Default is listed in the file ``src/paperless/settings.py``. 
+ +PAPERLESS_ALLOWED_HOSTS= + If you're planning on putting Paperless on the open internet, then you + really should set this value to the domain name you're using. Failing to do + so leaves you open to HTTP host header attacks: + https://docs.djangoproject.com/en/3.1/topics/security/#host-header-validation + + Just remember that this is a comma-separated list, so "example.com" is fine, + as is "example.com,www.example.com", but NOT " example.com" or "example.com," + + Defaults to "*", which is all hosts. + +PAPERLESS_CORS_ALLOWED_HOSTS= + You need to add your servers to the list of allowed hosts that can do CORS + calls. Set this to your public domain name. + + Defaults to "http://localhost:8000". + +PAPERLESS_FORCE_SCRIPT_NAME= + To host paperless under a subpath URL like example.com/paperless, set + this value to /paperless. No trailing slash! + + .. note:: + + I don't know if this works in paperless-ng. Probably not. + + Defaults to none, which hosts paperless at "/". + +PAPERLESS_STATIC_URL= + Override the STATIC_URL here. Unless you're hosting Paperless off a + subpath like /paperless/, you probably don't need to change this. + + Defaults to "/static/". + + +Software tweaks +############### + +PAPERLESS_TASK_WORKERS= + Paperless does multiple things in the background: maintain the search index, + maintain the automatic matching algorithm, check emails, consume documents, + etc. This variable specifies how many things it will do in parallel. + +PAPERLESS_THREADS_PER_WORKER= + Furthermore, paperless uses multiple threads when consuming documents to + speed up OCR. This variable specifies how many pages paperless will process + in parallel on a single document. + + .. caution:: + + Ensure that the product + + PAPERLESS_TASK_WORKERS * PAPERLESS_THREADS_PER_WORKER + + does not exceed your CPU core count or else paperless will be extremely slow. + If you want paperless to process many documents in parallel, choose a high + worker count. 
If you want paperless to process very large documents faster, + use a higher threads-per-worker count. + + The default is a balance between the two, according to your CPU core count, + slightly favoring threads per worker, and using as many cores as + possible. + + If you only specify PAPERLESS_TASK_WORKERS, paperless will adjust + PAPERLESS_THREADS_PER_WORKER automatically. + + + +PAPERLESS_TIME_ZONE= + Set the time zone here. + See https://docs.djangoproject.com/en/3.1/ref/settings/#std:setting-TIME_ZONE + for details on how to set it. + + Defaults to UTC. + + + +PAPERLESS_OCR_LANGUAGE= + Customize the default language that tesseract will attempt to use when + parsing documents. The default language is used whenever + + * No language could be detected on a document + * No tesseract data files are available for the detected language + + It should be a 3-letter language code consistent with ISO + 639: https://www.loc.gov/standards/iso639-2/php/code_list.php + + Set this to the language most of your documents are written in. + + Defaults to "eng". + +PAPERLESS_OCR_ALWAYS= + By default Paperless does not OCR a document if the text can be retrieved from + the document directly. Set to true to always OCR documents. + + Defaults to false. + +PAPERLESS_CONSUMER_POLLING= + If paperless doesn't find documents added to your consume folder, it might + not be able to automatically detect filesystem changes. In that case, + specify a polling interval in seconds here, which will then cause paperless + to periodically check your consumption directory for changes. + + Defaults to 0, which disables polling and uses filesystem notifications. + +PAPERLESS_CONSUMER_DELETE_DUPLICATES= + When the consumer detects a duplicate document, it will not touch the + original document. This default behavior can be changed here. + + Defaults to false. 
+ +PAPERLESS_CONVERT_MEMORY_LIMIT= + On smaller systems, or even in the case of Very Large Documents, the consumer + may explode, complaining about how it's "unable to extend pixel cache". In + such cases, try setting this to a reasonably low value, like 32. The + default is to use whatever is necessary to do everything without writing to + disk, and units are in megabytes. + + For more information on how to use this value, you should search + the web for "MAGICK_MEMORY_LIMIT". + + Defaults to 0, which disables the limit. + +PAPERLESS_CONVERT_TMPDIR= + Similar to the memory limit, if you've got a small system and your OS mounts + /tmp as tmpfs, you should set this to a path that's on a physical disk, like + /home/your_user/tmp or something. ImageMagick will use this as scratch space + when crunching through very large documents. + + For more information on how to use this value, you should search + the web for "MAGICK_TMPDIR". + + Default is none, which disables the temporary directory. + +PAPERLESS_CONVERT_DENSITY= + This setting strongly affects the physical size of tmp page files and + the speed of document conversion, and can affect the accuracy of OCR + results. Individual results can vary and this setting should be tested + thoroughly against the documents you are importing to see whether it has + a positive or negative impact. + Testing on limited document sets has shown a setting of 200 can cut the + size of tmp files by 1/3, and speed up conversion by up to 4x + with little impact on OCR accuracy. + + Default is 300. + +PAPERLESS_OPTIMIZE_THUMBNAILS= + Use optipng to optimize thumbnails. This usually reduces the size of + thumbnails by about 20%, but uses considerable compute time during + consumption. + + Defaults to true. + +PAPERLESS_POST_CONSUME_SCRIPT= + After a document is consumed, Paperless can trigger an arbitrary script if + you like. This script will be passed a number of arguments for you to work + with. 
For more information, take a look at :ref:`advanced-post_consume_script`. + + The default is blank, which means nothing will be executed. + +PAPERLESS_FILENAME_DATE_ORDER= + Paperless will check the document text for document date information. + Use this setting to enable checking the document filename for date + information. The date order can be set to any option as specified in + https://dateparser.readthedocs.io/en/latest/settings.html#date-order. + The filename will be checked first, and if nothing is found, the document + text will be checked as normal. + + Defaults to none, which disables this feature. + +PAPERLESS_FILENAME_PARSE_TRANSFORMS= + Transforms filenames before they are processed by paperless. See + :ref:`advanced-transforming_filenames` for details. + + Defaults to none, which disables this feature. + +Binaries +######## + +There are a few external software packages that Paperless expects to find on +your system when it starts up. Unless you've done something creative with +their installation, you probably won't need to edit any of these. However, +if you've installed these programs somewhere where simply typing the name of +the program doesn't automatically execute it (i.e., the program isn't in your +$PATH), then you'll need to specify the literal path for that program. + +PAPERLESS_CONVERT_BINARY= + Defaults to "/usr/bin/convert". + +PAPERLESS_GS_BINARY= + Defaults to "/usr/bin/gs". + +PAPERLESS_UNPAPER_BINARY= + Defaults to "/usr/bin/unpaper". + +PAPERLESS_OPTIPNG_BINARY= + Defaults to "/usr/bin/optipng". diff --git a/docs/index.rst b/docs/index.rst index 95cbd71a8..756fee3b1 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -25,14 +25,28 @@ and then shred them. Perhaps you might find it useful too. Paperless-ng ============ -I wanted to make big changes to the project that will impact the way it is used -by its users greatly. 
Among the users who currently use paperless in production -there are probably many that don't want these changes right away. I also wanted -to have more control over what goes into the code and what does not. Therefore, -paperless-ng was created. NG stands for both Angular (the framework used for the +Paperless-ng is a fork of the original paperless project. It changes many +things both on the surface and under the hood. Paperless-ng was created +because I feel that these changes are too big to be pushed into the main +repository right away. + +NG stands for both Angular (the framework used for the Frontend) and next-gen. Publishing this project under a different name also avoids confusion between paperless and paperless-ng. +If you want to learn about what's different in paperless-ng, check out these +resources in the documentation: + +* :ref:`Some screenshots ` of the new UI are available. +* Read :ref:`this section ` if you want to + learn about how paperless automates all tagging using machine learning. +* Paperless now comes with a :ref:`proper email consumer ` + that's fully tested and production ready. +* See :ref:`this note ` about GnuPG encryption in + paperless-ng. +* The :ref:`changelog ` contains a detailed list of all changes + in paperless-ng. + It would be great if this project could eventually merge back into the main repository, but it needs a lot more work before that can happen. diff --git a/docs/screenshots.rst b/docs/screenshots.rst index bc9487a39..9fbd55634 100644 --- a/docs/screenshots.rst +++ b/docs/screenshots.rst @@ -1,3 +1,5 @@ +.. _screenshots: + *********** Screenshots *********** diff --git a/docs/setup.rst b/docs/setup.rst index b9b240a10..71acfba42 100644 --- a/docs/setup.rst +++ b/docs/setup.rst @@ -28,20 +28,20 @@ Overview of Paperless-ng Compared to paperless, paperless-ng works a little different under the hood and has more moving parts that work together. 
While this increases the complexity of -the system, it also brings many benefits. +the system, it also brings many benefits. Paperless consists of the following components: -* **The webserver:** This is pretty much the same as in paperless. It serves +* **The webserver:** This is pretty much the same as in paperless. It serves the administration pages, the API, and the new frontend. This is the main tool you'll be using to interact with paperless. You may start the webserver with .. code:: shell-session - + $ cd /path/to/paperless/src/ $ pipenv run gunicorn -c /usr/src/paperless/gunicorn.conf.py -b 0.0.0.0:8000 paperless.wsgi - + or by any other means such as Apache ``mod_wsgi``. * **The consumer:** This is what watches your consumption folder for documents. @@ -53,7 +53,7 @@ Paperless consists of the following components: Start the consumer with the management command ``document_consumer``: .. code:: shell-session - + $ cd /path/to/paperless/src/ $ pipenv run python3 manage.py document_consumer @@ -61,7 +61,7 @@ Paperless consists of the following components: for doing much of the heavy lifting. This is a task queue that accepts tasks from multiple sources and processes tasks in parallel. It also comes with a scheduler that executes certain commands periodically. - + This task processor is responsible for: * Consuming documents. When the consumer finds new documents, it notifies the task processor to @@ -72,7 +72,7 @@ Paperless consists of the following components: the web interface. * Maintain the search index and the automatic matching algorithm. These are things that paperless needs to do from time to time in order to operate properly. - + This allows paperless to process multiple documents from your consumption folder in parallel! On a modern multicore system, consumption with full ocr is blazing fast. @@ -82,7 +82,7 @@ Paperless consists of the following components: You may start the task processor by executing: .. 
code:: shell-session - + $ cd /path/to/paperless/src/ $ pipenv run python3 manage.py qcluster @@ -116,7 +116,7 @@ Docker Route .. caution:: - If you want to use the included ``docker-compose.yml.example`` file, you + If you want to use one of the included ``docker-compose.*.yml`` files, you need to have at least Docker version **17.09.0** and docker-compose version **1.17.0**. @@ -129,20 +129,28 @@ Docker Route .. _Docker installation guide: https://docs.docker.com/engine/installation/ .. _docker-compose installation guide: https://docs.docker.com/compose/install/ +2. Copy either ``docker-compose.sqlite.yml`` or ``docker-compose.postgres.yml`` to + ``docker-compose.yml``, depending on which database backend you want to use. + + .. hint:: + + For new installations, it is recommended to use postgresql as the database + backend. This is due to the increased amount of concurrency in paperless-ng. + 2. Modify ``docker-compose.yml`` to your preferences. You should change the path to the consumption directory in this file. Find the line that specifies where to mount the consumption directory: .. code:: - + - ./consume:/usr/src/paperless/consume - + Replace the part BEFORE the colon with a local directory of your choice: .. code:: - /home/jonaswinkler/paperless-inbox:/usr/src/paperless/consume - + Don't change the part after the colon or paperless wont find your documents. @@ -154,6 +162,11 @@ Docker Route 1000 (the default for the first normal user on most systems), it will work out of the box without any modifications. + .. note:: + + You can use any setting from the file ``paperless.conf`` in this file. + Have a look at :ref:`configuration` to see what's available. + 4. Run ``docker-compose up -d``. This will create and start the necessary containers. This will also build the image of paperless if you grabbed the source archive. @@ -196,14 +209,9 @@ things have changed under the hood, so you need to adapt your setup depending on how you installed paperless. 
The important things to keep in mind are as follows. * Read the :ref:`changelog ` and take note of breaking changes. -* It is recommended to use postgresql as the database now. The docker-compose - deployment will automatically create a postgresql instance and instruct - paperless to use it. This means that if you use the docker-compose script - with your current paperless media and data volumes and used the default - sqlite database, **it will not use your sqlite database and it may seem - as if your documents are gone**. You may use the provided - ``docker-compose.sqlite.yml`` script instead, which does not use postgresql. See - :ref:`setup-sqlite_to_psql` for details on how to move your data from +* It is recommended to use postgresql as the database now. If you want to continue + using SQLite, which is the default database of paperless, use ``docker-compose.sqlite.yml``. + See :ref:`setup-sqlite_to_psql` for details on how to move your data from sqlite to postgres. * The task scheduler of paperless, which is used to execute periodic tasks such as email checking and maintenance, requires a `redis`_ message broker @@ -228,26 +236,40 @@ Migration to paperless-ng is then performed in a few simple steps: 3. Download the latest release of paperless-ng. You can either go with the docker-compose files or use the archive to build the image yourself. You can either replace your current paperless folder or put paperless-ng - in a different location. Paperless-ng will use the same docker volumes - as paperless. + in a different location. + + .. caution:: + + Make sure you also download the ``.env`` file. This sets the + project name for docker-compose to ``paperless``, so that docker-compose + automatically reuses your existing paperless volumes. 4. Adjust ``docker-compose.yml`` and ``docker-compose.env`` to your needs. See `docker route`_ for details on which edits are required. -5. Update paperless. See :ref:`administration-updating` for details. +5. Start paperless-ng. -6. 
Start paperless-ng. + .. code:: bash + + $ docker-compose up + + If you see everything working (you should see some migrations getting + applied, for instance), you can gracefully stop paperless-ng with Ctrl-C + and then start paperless-ng as usual with .. code:: bash $ docker-compose up -d -7. Paperless installed a permanent redirect to ``admin/`` in your browser. This - redirect is still in place and prevents access to the new UI. Clear + This will run paperless in the background and automatically start it on system boot. + +6. Paperless installed a permanent redirect to ``admin/`` in your browser. This + redirect is still in place and prevents access to the new UI. Clear everything related to paperless in your browsers data in order to fix this issue. + .. _setup-sqlite_to_psql: Moving data from sqlite to postgresql diff --git a/docs/usage_overview.rst b/docs/usage_overview.rst index 4fa19d01f..5f47b56a9 100644 --- a/docs/usage_overview.rst +++ b/docs/usage_overview.rst @@ -82,6 +82,7 @@ files from the scanner. Typically, you're looking at an FTP server like .. TODO: hyperref to configuration of the location of this magic folder. +.. _usage-email: IMAP (Email) ============ @@ -133,6 +134,11 @@ These are as follows: paperless will read them automatically. The default acion "mark as read" is pretty tame and will not cause any damage or data loss whatsoever. + You can also set up a special folder in your mail account for paperless and use + your favorite mail client to move mails that should be consumed into that folder, + automatically or manually, and tell paperless to move them to yet another folder + after consumption. It's up to you. + .. note:: Paperless will process the rules in the order defined in the admin page.
diff --git a/paperless.conf.example b/paperless.conf.example index afc178bcf..e1fd17a77 100644 --- a/paperless.conf.example +++ b/paperless.conf.example @@ -1,287 +1,55 @@ -# Sample paperless.conf -# Copy this file to /etc/paperless.conf and modify it to suit your needs. -# As this file contains passwords it should only be readable by the user -# running paperless. +# Have a look at the docs for documentation. +# https://paperless-ng.readthedocs.io/en/latest/configuration.html -############################################################################### -#### Message Broker #### -############################################################################### +# Debug. Only enable this for development. + +#PAPERLESS_DEBUG=false + +# Required services -# This is required for processing scheduled tasks such as email fetching, index -# optimization and for training the automatic document matcher. -# Defaults to localhost:6379. #PAPERLESS_REDIS=redis://localhost:6379 - - -############################################################################### -#### Database Settings #### -############################################################################### - -# By default, sqlite is used as the database backend. This can be changed here. -# The docker-compose service definition uses a postgresql server. The -# configuration for this is already done inside the docker-compose.env file. - -#Set PAPERLESS_DBHOST and postgresql will be used instead of mysql. #PAPERLESS_DBHOST=localhost - -#Adjust port if necessary -#PAPERLESS_DBPORT= - -#name, user and pass all default to "paperless" +#PAPERLESS_DBPORT=5432 #PAPERLESS_DBNAME=paperless #PAPERLESS_DBUSER=paperless #PAPERLESS_DBPASS=paperless +# Paths and folders -############################################################################### -#### Paths & Folders #### -############################################################################### - -# This where your documents should go to be consumed. 
Make sure that it exists -# and that the user running the paperless service can read/write its contents -# before you start Paperless. -PAPERLESS_CONSUMPTION_DIR=../consume - -# This is where paperless stores all its data (search index, sqlite database, -# classification model, etc). +#PAPERLESS_CONSUMPTION_DIR=../consume #PAPERLESS_DATA_DIR=../data - -# This is where your documents and thumbnails are stored. #PAPERLESS_MEDIA_ROOT=../media - -# Override the default STATIC_ROOT here. This is where all static files -# created using "collectstatic" manager command are stored. #PAPERLESS_STATICDIR=../static - - -# Override the STATIC_URL here. Unless you're hosting Paperless off a -# subdomain like /paperless/, you probably don't need to change this. -#PAPERLESS_STATIC_URL=/static/ - - -# Specify a filename format for the document (directories are supported) -# Use the following placeholders: -# * {correspondent} -# * {title} -# * {created} -# * {added} -# * {tags[KEY]} If your tags conform to key_value or key-value -# * {tags[INDEX]} If your tags are strings, select the tag by index -# Uniqueness of filenames is ensured, as an incrementing counter is attached -# to each filename. #PAPERLESS_FILENAME_FORMAT= -############################################################################### -#### Security #### -############################################################################### +# Security and hosting -# Controls whether django's debug mode is enabled. Disable this on production -# systems. Debug mode is disabled by default. -#PAPERLESS_DEBUG=false - -# GnuPG encryption is deprecated and will be removed in future versions. -# -# Dont use it. It does not provide any security at all. -# -# Paperless can be instructed to attempt to encrypt your PDF files with GPG -# using the PAPERLESS_PASSPHRASE specified below. 
If however you're not -# concerned about encrypting these files (for example if you have disk -# encryption locally) then you don't need this and can safely leave this value -# un-set. -# -# One final note about the passphrase. Once you've consumed a document with -# one passphrase, DON'T CHANGE IT. Paperless assumes this to be a constant and -# can't properly export documents that were encrypted with an old passphrase if -# you've since changed it to a new one. -# -# The default is to not use encryption at all. -#PAPERLESS_PASSPHRASE=secret - - -# The secret key has a default that should be fine so long as you're hosting -# Paperless on a closed network. However, if you're putting this anywhere -# public, you should change the key to something unique and verbose. #PAPERLESS_SECRET_KEY=change-me - - -# If you're planning on putting Paperless on the open internet, then you -# really should set this value to the domain name you're using. Failing to do -# so leaves you open to HTTP host header attacks: -# https://docs.djangoproject.com/en/1.10/topics/security/#host-headers-virtual-hosting -# -# Just remember that this is a comma-separated list, so "example.com" is fine, -# as is "example.com,www.example.com", but NOT " example.com" or "example.com," #PAPERLESS_ALLOWED_HOSTS=example.com,www.example.com - -# If you decide to use the Paperless API in an ajax call, you need to add your -# servers to the list of allowed hosts that can do CORS calls. By default -# Paperless allows calls from localhost:8080, but you'd like to change that, -# you can set this value to a comma-separated list. #PAPERLESS_CORS_ALLOWED_HOSTS=localhost:8080,example.com,localhost:8000 - -# To host paperless under a subpath url like example.com/paperless you set -# this value to /paperless. No trailing slash! 
-# -# https://docs.djangoproject.com/en/1.11/ref/settings/#force-script-name #PAPERLESS_FORCE_SCRIPT_NAME= +#PAPERLESS_STATIC_URL=/static/ -############################################################################### -#### Software Tweaks #### -############################################################################### +# Software tweaks -# Paperless does multiple things in the background: Maintain the search index, -# maintain the automatic matching algorithm, check emails, consume documents, -# etc. This variable specifies how many things it will do in parallel. #PAPERLESS_TASK_WORKERS=1 - -# Furthermore, paperless uses multiple threads when consuming documents to -# speed up OCR. This variable specifies how many pages paperless will process -# in parallel on a single document. #PAPERLESS_THREADS_PER_WORKER=1 - -# Ensure that the product -# PAPERLESS_TASK_WORKERS * PAPERLESS_THREADS_PER_WORKER -# does not exceed your CPU core count or else paperless will be extremely slow. -# If you want paperless to process many documents in parallel, choose a high -# worker count. If you want paperless to process very large documents faster, -# use a higher thread per worker count. -# The default is a balance between the two, according to your CPU core count, -# with a slight favor towards threads per worker, and using as much cores as -# possible. -# If you only specify PAPERLESS_TASK_WORKERS, paperless will adjust -# PAPERLESS_THREADS_PER_WORKER automatically. - -# If paperless won't find documents added to your consume folder, it might -# not be able to automatically detect filesystem changes. In that case, -# specify a polling interval in seconds below, which will then cause paperless -# to periodically check your consumption directory for changes. -#PAPERLESS_CONSUMER_POLLING=10 - - -# When the consumer detects a duplicate document, it will not touch the -# original document. This default behavior can be changed here. 
-#PAPERLESS_CONSUMER_DELETE_DUPLICATES=false
-
-# Use optipng to optimize thumbnails. This usually reduces the sice of
-# thumbnails by about 20%, but uses considerable compute time during
-# consumption.
-#PAPERLESS_OPTIMIZE_THUMBNAILS=true
-
-# After a document is consumed, Paperless can trigger an arbitrary script if
-# you like. This script will be passed a number of arguments for you to work
-# with. The default is blank, which means nothing will be executed. For more
-# information, take a look at the docs:
-# http://paperless.readthedocs.org/en/latest/consumption.html#hooking-into-the-consumption-process
-#PAPERLESS_POST_CONSUME_SCRIPT=/path/to/an/arbitrary/script.sh
-
-# By default, paperless will check the document text for document date information.
-# Uncomment the line below to enable checking the document filename for date
-# information. The date order can be set to any option as specified in
-# https://dateparser.readthedocs.io/en/latest/#settings. The filename will be
-# checked first, and if nothing is found, the document text will be checked
-# as normal.
-#PAPERLESS_FILENAME_DATE_ORDER=YMD
-
-# Sometimes devices won't create filenames which can be parsed properly
-# by the filename parser (see
-# https://paperless.readthedocs.io/en/latest/guesswork.html).
-#
-# This setting allows to specify a list of transformations
-# in regular expression syntax, which are passed in order to re.sub.
-# Transformation stops after the first match, so at most one transformation
-# is applied.
-#
-# Syntax is a JSON array of dictionaries containing "pattern" and "repl"
-# as keys.
-#
-# The example below transforms filenames created by a Brother ADS-2400N
-# document scanner in its standard configuration `Name_Date_Count', so that
-# count is used as title, name as tag and date can be parsed by paperless.
-#PAPERLESS_FILENAME_PARSE_TRANSFORMS=[{"pattern":"^([a-z]+)_(\\d{8})_(\\d{6})_([0-9]+)\\.", "repl":"\\2\\3Z - \\4 - \\1."}]
-
-#
-# The following values use sensible defaults for modern systems, but if you're
-# running Paperless on a low-resource device (like a Raspberry Pi), modifying
-# some of these values may be necessary.
-#
-
-
-# Customize the default language that tesseract will attempt to use when
-# parsing documents. The default language is used whenever
-# - No language could be detected on a document
-# - No tesseract data files are available for the detected language
-# It should be a 3-letter language code consistent with ISO
-# 639: https://www.loc.gov/standards/iso639-2/php/code_list.php
-#PAPERLESS_OCR_LANGUAGE=eng
-
-
-# On smaller systems, or even in the case of Very Large Documents, the consumer
-# may explode, complaining about how it's "unable to extend pixel cache". In
-# such cases, try setting this to a reasonably low value, like 32000000. The
-# default is to use whatever is necessary to do everything without writing to
-# disk, and units are in megabytes.
-#
-# For more information on how to use this value, you should probably search
-# the web for "MAGICK_MEMORY_LIMIT".
-#PAPERLESS_CONVERT_MEMORY_LIMIT=0
-
-
-# Similar to the memory limit, if you've got a small system and your OS mounts
-# /tmp as tmpfs, you should set this to a path that's on a physical disk, like
-# /home/your_user/tmp or something. ImageMagick will use this as scratch space
-# when crunching through very large documents.
-#
-# For more information on how to use this value, you should probably search
-# the web for "MAGICK_TMPDIR".
-#PAPERLESS_CONVERT_TMPDIR=/var/tmp/paperless
-
-
-# By default the conversion density setting for documents is 300DPI, in some
-# cases it has proven useful to configure a lesser value.
-# This setting has a high impact on the physical size of tmp page files,
-# the speed of document conversion, and can affect the accuracy of OCR
-# results. Individual results can vary and this setting should be tested
-# thoroughly against the documents you are importing to see if it has any
-# impacts either negative or positive.
-# Testing on limited document sets has shown a setting of 200 can cut the
-# size of tmp files by 1/3, and speed up conversion by up to 4x
-# with little impact to OCR accuracy.
-#PAPERLESS_CONVERT_DENSITY=300
-
-# By default Paperless does not OCR a document if the text can be retrieved from
-# the document directly. Set to true to always OCR documents.
-#PAPERLESS_OCR_ALWAYS=false
-
-
-###############################################################################
-#### Interface ####
-###############################################################################
-
-# Override the default UTC time zone here.
-# See https://docs.djangoproject.com/en/1.10/ref/settings/#std:setting-TIME_ZONE
-# for details on how to set it.
 #PAPERLESS_TIME_ZONE=UTC
+#PAPERLESS_OCR_LANGUAGE=eng
+#PAPERLESS_OCR_ALWAYS=false
+#PAPERLESS_CONSUMER_POLLING=10
+#PAPERLESS_CONSUMER_DELETE_DUPLICATES=false
+#PAPERLESS_CONVERT_MEMORY_LIMIT=0
+#PAPERLESS_CONVERT_TMPDIR=/var/tmp/paperless
+#PAPERLESS_CONVERT_DENSITY=300
+#PAPERLESS_OPTIMIZE_THUMBNAILS=true
+#PAPERLESS_POST_CONSUME_SCRIPT=/path/to/an/arbitrary/script.sh
+#PAPERLESS_FILENAME_DATE_ORDER=YMD
+#PAPERLESS_FILENAME_PARSE_TRANSFORMS=[]
+# Binaries
-###############################################################################
-#### Third-Party Binaries ####
-###############################################################################
-
-# There are a few external software packages that Paperless expects to find on
-# your system when it starts up. Unless you've done something creative with
-# their installation, you probably won't need to edit any of these. However,
-# if you've installed these programs somewhere where simply typing the name of
-# the program doesn't automatically execute it (ie. the program isn't in your
-# $PATH), then you'll need to specify the literal path for that program here.
-
-# Convert (part of the ImageMagick suite)
 #PAPERLESS_CONVERT_BINARY=/usr/bin/convert
-
-# Ghostscript
 #PAPERLESS_GS_BINARY=/usr/bin/gs
-
-# Unpaper
 #PAPERLESS_UNPAPER_BINARY=/usr/bin/unpaper
-
-# Optipng (for optimising thumbnail sizes)
 #PAPERLESS_OPTIPNG_BINARY=/usr/bin/optipng
diff --git a/src/paperless/settings.py b/src/paperless/settings.py
index 2713e2b5e..06895e92f 100644
--- a/src/paperless/settings.py
+++ b/src/paperless/settings.py
@@ -152,11 +152,11 @@ else:
     X_FRAME_OPTIONS = 'SAMEORIGIN'
 
 # We allow CORS from localhost:8080
-CORS_ORIGIN_WHITELIST = tuple(os.getenv("PAPERLESS_CORS_ALLOWED_HOSTS", "http://localhost:8080,https://localhost:8080").split(","))
+CORS_ALLOWED_ORIGINS = tuple(os.getenv("PAPERLESS_CORS_ALLOWED_HOSTS", "http://localhost:8000").split(","))
 
 if DEBUG:
     # Allow access from the angular development server during debugging
-    CORS_ORIGIN_WHITELIST += ('http://localhost:4200',)
+    CORS_ALLOWED_ORIGINS += ('http://localhost:4200',)
 
 # The secret key has a default that should be fine so long as you're hosting
 # Paperless on a closed network. However, if you're putting this anywhere
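The new `CORS_ALLOWED_ORIGINS` line splits `PAPERLESS_CORS_ALLOWED_HOSTS` on commas and hands the result unchanged to django-cors-headers. That setting expects full origins that include a scheme (for example `http://localhost:8000`), and django-cors-headers' system checks flag entries that lack a scheme or host, so a bare value such as `localhost:8080` or a stray trailing comma would be rejected. The following is a minimal sketch of a more defensive parse, not Paperless code: `parse_cors_origins` is a hypothetical helper, and raising `ValueError` at import time is an assumption about the preferred failure mode.

```python
import os
from urllib.parse import urlsplit


def parse_cors_origins(raw):
    """Split a comma-separated origin list into a tuple.

    Hypothetical helper (not part of Paperless): it strips whitespace,
    drops empty entries and rejects values without a scheme or host.
    """
    origins = tuple(part.strip() for part in raw.split(",") if part.strip())
    for origin in origins:
        parts = urlsplit(origin)
        # django-cors-headers expects full origins such as "http://example.com"
        if not parts.scheme or not parts.netloc:
            raise ValueError(f"origin {origin!r} is missing a scheme or host")
    return origins


# Same environment variable and default as in the settings.py hunk above.
CORS_ALLOWED_ORIGINS = parse_cors_origins(
    os.getenv("PAPERLESS_CORS_ALLOWED_HOSTS", "http://localhost:8000")
)
```

Stripping and filtering means a value like `http://example.com, http://localhost:8000,` yields two valid origins, whereas the plain `split(",")` would produce one entry with a leading space and one empty string.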