Merge branch 'dev'

Michael Shamoon
2022-12-29 19:39:38 -08:00
181 changed files with 44867 additions and 8345 deletions


@@ -233,6 +233,7 @@ optional arguments:
-c, --compare-checksums
-f, --use-filename-format
-d, --delete
-z, --zip
```
`target` is a folder to which the data gets written. This includes
@@ -258,6 +259,9 @@ current export such as files from deleted documents, specify `--delete`.
Be careful when pointing paperless to a directory that already contains
other files.
If `-z` or `--zip` is provided, the export will be a zipfile
in the target directory, named according to the current date.
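As a rough illustration of what a date-named zip export could look like, here is a minimal sketch. It is not the exporter's actual code: the function name `zip_export` and the `export-YYYY-MM-DD.zip` naming pattern are assumptions made for this example, since the text above only says the archive is named according to the current date.

```python
import zipfile
from datetime import date
from pathlib import Path
from typing import Optional


def zip_export(source_dir: Path, target_dir: Path, today: Optional[date] = None) -> Path:
    """Pack every file under source_dir into one zip inside target_dir.

    Hypothetical helper: the archive name pattern below is an assumption,
    not the name paperless itself produces.
    """
    today = today or date.today()
    target_dir.mkdir(parents=True, exist_ok=True)
    archive = target_dir / f"export-{today.isoformat()}.zip"

    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the export root so the archive
                # unpacks into the same layout as a plain folder export.
                zf.write(path, arcname=path.relative_to(source_dir).as_posix())
    return archive
```

The sketch expects `source_dir` and `target_dir` to be different directories, so the archive never tries to include itself.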
The filenames generated by this command follow the format
`[date created] [correspondent] [title].[extension]`. If you want
paperless to use `PAPERLESS_FILENAME_FORMAT` for exported filenames


@@ -10,12 +10,10 @@ run paperless, these settings have to be defined in different places.
- If you are running paperless on anything else, paperless will search
for the configuration file in these locations and use the first one
it finds:
```
/path/to/paperless/paperless.conf
/etc/paperless.conf
/usr/local/etc/paperless.conf
```
- The environment variable `PAPERLESS_CONFIGURATION_PATH`
- `/path/to/paperless/paperless.conf`
- `/etc/paperless.conf`
- `/usr/local/etc/paperless.conf`
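The "first one it finds" lookup described above can be sketched as follows. This is an illustrative approximation, not the code paperless runs: the function name `find_config_file` is hypothetical, and falling through to the default locations when `PAPERLESS_CONFIGURATION_PATH` points at a missing file is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Sequence

# Default locations in the order listed in the documentation.
DEFAULT_LOCATIONS = (
    "/path/to/paperless/paperless.conf",
    "/etc/paperless.conf",
    "/usr/local/etc/paperless.conf",
)


def find_config_file(
    env: Mapping[str, str] = os.environ,
    defaults: Sequence[str] = DEFAULT_LOCATIONS,
) -> Optional[Path]:
    """Return the first existing configuration file, or None if there is none."""
    candidates = []
    override = env.get("PAPERLESS_CONFIGURATION_PATH")
    if override:
        # The environment variable is checked before the fixed locations.
        candidates.append(override)
    candidates.extend(defaults)

    for candidate in candidates:
        path = Path(candidate)
        if path.is_file():
            return path
    return None
```

Passing `env` and `defaults` as parameters keeps the lookup order easy to test without touching the real filesystem locations.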
## Required services
@@ -170,6 +168,19 @@ details.
Defaults to `PAPERLESS_DATA_DIR/log/`.
`PAPERLESS_NLTK_DIR=<path>`
: This is where paperless will search for the data required for NLTK
processing, if you are using it. If you are using the Docker image,
this should not be changed, as the data is included in the image
already.
Previously, the location defaulted to `PAPERLESS_DATA_DIR/nltk`.
Unless you are using this in a bare metal install or another non-Docker
setup, this folder is no longer needed and can be removed manually.
Defaults to `/usr/local/share/nltk_data`.
## Logging
`PAPERLESS_LOGROTATE_MAX_SIZE=<num>`
@@ -564,8 +575,10 @@ they use underscores instead of dashes.
Paperless can make use of [Tika](https://tika.apache.org/) and
[Gotenberg](https://gotenberg.dev/) for parsing and converting
"Office" documents (such as ".doc", ".xlsx" and ".odt"). If you
wish to use this, you must provide a Tika server and a Gotenberg server,
"Office" documents (such as ".doc", ".xlsx" and ".odt").
Tika and Gotenberg are also needed to parse emails (`.eml`).
If you wish to use this, you must provide a Tika server and a Gotenberg server,
configure their endpoints, and enable the feature.
`PAPERLESS_TIKA_ENABLED=<bool>`
@@ -604,14 +617,17 @@ services:
PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
PAPERLESS_TIKA_ENDPOINT: http://tika:9998
# ...
# ...
gotenberg:
image: gotenberg/gotenberg:7.6
restart: unless-stopped
command:
- 'gotenberg'
- '--chromium-disable-routes=true'
gotenberg:
image: gotenberg/gotenberg:7.6
restart: unless-stopped
# The gotenberg chromium route is used to convert .eml files. We do not
# want to allow external content like tracking pixels or even javascript.
command:
- 'gotenberg'
- '--chromium-disable-javascript=true'
- '--chromium-allow-list=file:///tmp/.*'
tika:
image: ghcr.io/paperless-ngx/tika:latest
@@ -658,7 +674,7 @@ paperless will process in parallel on a single document.
count, with a slight favor towards threads per worker:
| CPU core count | Workers | Threads |
|----------------|---------|---------|
| -------------- | ------- | ------- |
| > 1 | > 1 | > 1 |
| > 2 | > 2 | > 1 |
| > 4 | > 2 | > 2 |
@@ -691,6 +707,16 @@ for details on how to set it.
Defaults to UTC.
`PAPERLESS_ENABLE_NLTK=<bool>`
: Enables or disables the advanced natural language processing
used during automatic classification. If disabled, paperless will
still perform some basic text pre-processing before matching.
See also `PAPERLESS_NLTK_DIR`.
Defaults to 1.
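To show how a `<bool>` setting with a default of 1 might be read from the environment, here is a small sketch. The helper name `get_boolean` and the set of accepted truthy strings are assumptions made for this example; check the paperless settings module for the values it really accepts.

```python
import os
from typing import Mapping

# Assumed set of strings treated as "true"; everything else counts as false.
TRUTHY = {"yes", "y", "1", "t", "true"}


def get_boolean(key: str, default: str = "1", env: Mapping[str, str] = os.environ) -> bool:
    """Read a boolean setting, falling back to `default` when the key is unset."""
    return env.get(key, default).strip().lower() in TRUTHY


# With the variable unset, the default of "1" leaves NLTK processing enabled.
nltk_enabled = get_boolean("PAPERLESS_ENABLE_NLTK", default="1", env={})
```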
## Polling {#polling}
`PAPERLESS_CONSUMER_POLLING=<num>`


@@ -125,13 +125,13 @@ using docker-compose, this is achieved by the following configuration
change in the `docker-compose.yml` file:
```yaml
gotenberg:
image: gotenberg/gotenberg:7.6
restart: unless-stopped
command:
- 'gotenberg'
- '--chromium-disable-routes=true'
- '--api-timeout=60'
# The gotenberg chromium route is used to convert .eml files. We do not
# want to allow external content like tracking pixels or even javascript.
command:
- 'gotenberg'
- '--chromium-disable-javascript=true'
- '--chromium-allow-list=file:///tmp/.*'
- '--api-timeout=60'
```
## Permission denied errors in the consumption directory