Merge branch 'dev'

shamoon
2023-08-04 11:38:12 -07:00
151 changed files with 22313 additions and 14471 deletions

@@ -68,23 +68,23 @@ $ docker-compose down
After that, [make a backup](#backup).
1. If you pull the image from the docker hub, all you need to do is:
```shell-session
$ docker-compose pull
$ docker-compose up
```
The docker-compose files refer to the `latest` version, which is
always the latest stable release.
1. If you built the image yourself, do the following:
```shell-session
$ git pull
$ docker-compose build
$ docker-compose up
```
Running `docker-compose up` will also apply any new database migrations.
If you see everything working, press CTRL+C once to gracefully stop
@@ -470,7 +470,7 @@ The issues detected by the sanity checker are as follows:
- Inaccessible thumbnails due to improper permissions.
- Documents without any content (warning).
- Orphaned files in the media directory (warning). These are files
that are not referenced by any document im paperless.
that are not referenced by any document in paperless.
```
document_sanity_checker
```

@@ -1,6 +1,6 @@
# Advanced Topics
Paperless offers a couple features that automate certain tasks and make
Paperless offers a couple of features that automate certain tasks and make
your life easier.
## Matching tags, correspondents, document types, and storage paths {#matching}
@@ -35,9 +35,9 @@ The following algorithms are available:
(i.e. preserve ordering) in the PDF.
- **Regular expression:** Parses the match as a regular expression and
tries to find a match within the document.
- **Fuzzy match:** I don't know. Look at the source.
- **Fuzzy match:** I don't know. Look at [the source](https://github.com/paperless-ngx/paperless-ngx/blob/main/src/documents/matching.py).
- **Auto:** Tries to automatically match new documents. This does not
require you to set a match. See the notes below.
require you to set a match. See the [notes below](#automatic-matching).
When using the _any_ or _all_ matching algorithms, you can search for
terms that consist of multiple words by enclosing them in double quotes.
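To illustrate how quoting affects the _any_ and _all_ algorithms, here is a rough bash approximation. It is not Paperless's implementation: it assumes case-insensitive, whole-word matching and uses `xargs` to split the match string much like a shell would, so that double-quoted phrases stay together. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Rough approximation of the "any" and "all" algorithms, NOT Paperless's code.
# Assumes case-insensitive, whole-word matching.

# Print one term per line. xargs splits on whitespace but keeps
# "double-quoted phrases" together, much like a shell would.
split_terms() {
  xargs -n1 <<< "$1"
}

# match_any MATCH TEXT: succeeds if at least one term occurs in TEXT.
match_any() {
  local term
  while IFS= read -r term; do
    if grep -qiwF -- "$term" <<< "$2"; then
      return 0
    fi
  done < <(split_terms "$1")
  return 1
}

# match_all MATCH TEXT: succeeds only if every term occurs in TEXT.
match_all() {
  local term
  while IFS= read -r term; do
    if ! grep -qiwF -- "$term" <<< "$2"; then
      return 1
    fi
  done < <(split_terms "$1")
  return 0
}
```

With a match of `invoice "bank of america"`, `match_any` accepts a text that only mentions Bank of America, while `match_all` additionally requires the word "invoice". A text containing "bank" and "america" separately does not satisfy the quoted phrase.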
@@ -92,7 +92,7 @@ when using this feature:
decide when not to assign a certain tag, correspondent, document
type, or storage path. This will usually be the case as you start
filling up paperless with documents. Example: If all your documents
are either from "Webshop" and "Bank", paperless will assign one
are either from "Webshop" or "Bank", paperless will assign one
of these correspondents to ANY new document, if both are set to
automatic matching.
@@ -101,7 +101,7 @@ when using this feature:
Sometimes you may want to do something arbitrary whenever a document is
consumed. Rather than try to predict what you may want to do, Paperless
lets you execute scripts of your own choosing just before or after a
document is consumed using a couple simple hooks.
document is consumed using a couple of simple hooks.
Just write a script, put it somewhere that Paperless can read & execute,
and then put the path to that script in `paperless.conf` or
@@ -197,7 +197,7 @@ The script can be in any language, A simple shell script example:
!!! warning
The post consumption script should not modify the document files
directly
directly.
The script's stdout and stderr will be logged line by line to the
webserver log, along with the exit code of the script.
@@ -311,6 +311,7 @@ Paperless provides the following placeholders within filenames:
- `{added_day}`: Day added only (number 01-31).
- `{owner_username}`: Username of document owner, if any, or "none"
- `{original_name}`: Document original filename, minus the extension, if any, or "none"
- `{doc_pk}`: The paperless identifier (primary key) for the document.
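For example, the new `{doc_pk}` placeholder can be combined with the other placeholders so that every filename contains the document's unique identifier. The format string below is only an illustration, not a recommended default:

```bash
# paperless.conf or docker-compose.env (illustrative value only)
PAPERLESS_FILENAME_FORMAT={created_year}/{correspondent}/{doc_pk}-{title}
```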
Paperless will try to conserve the information from your database as
much as possible. However, some characters that you can use in document
@@ -528,7 +529,7 @@ For how to enable barcode usage, see [the configuration](/configuration#barcodes
The two settings may be enabled independently, but do have interactions as explained
below.
### Document Splitting
### Document Splitting {#document-splitting}
When enabled, Paperless will look for a barcode with the configured value and create a new document
starting from the next page. The page with the barcode on it will _not_ be retained. It
@@ -543,3 +544,69 @@ If document splitting via barcode is also enabled, documents will be split when
barcode is located. However, differing from the splitting, the page with the
barcode _will_ be retained. This allows application of a barcode to any page, including
one which holds data to keep in the document.
## Automatic collation of double-sided documents {#collate}
!!! note
If your scanner supports double-sided scanning natively, you do not need this feature.
This feature is turned off by default; see [configuration](/configuration#collate) for how to turn it on.
### Summary
If you have a scanner with an automatic document feeder (ADF) that only scans a single side,
this feature makes scanning double-sided documents much more convenient by automatically
collating two separate scans into one document, reordering the pages as necessary.
### Usage example
Suppose you have a double-sided document with 6 pages (3 sheets of paper). First,
put the stack into your ADF as normal, ensuring that page 1 is scanned first. Your ADF
will now scan pages 1, 3, and 5. Then you (or your scanner, if it supports it) upload
the scan into the correct sub-directory of the consume folder (`double-sided` by default;
keep in mind that Paperless will _not_ automatically create the directory for you).
Paperless will then process the scan and move it into an internal staging area.
The next step is to turn your stack upside down (without reordering the sheets of paper),
and scan it once again. Your ADF will now scan pages 6, 4, and 2, in that order. Once this
scan is copied into the sub-directory, Paperless will collate the previous scan with the
new one, reversing the order of the pages on the second, "even numbered" scan. The
resulting document will have the pages 1-6 in the correct order, and this new file will
then be processed as normal.
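The reordering itself can be sketched in a few lines of bash. This only illustrates the behavior described above, assuming each scan is given as a list of page labels in the order the ADF produced them; it is not Paperless's own implementation, and the function name `collate` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of the collation step described above; NOT Paperless's actual code.
# Usage: collate "ODD_SCAN" "EVEN_SCAN"
# Each argument lists page labels (space-separated) in scan order.
collate() {
  local -a odd even out=()
  read -r -a odd <<< "$1"
  read -r -a even <<< "$2"
  local n_odd=${#odd[@]} n_even=${#even[@]} i

  # An odd scan with fewer pages than the even scan means pages were skipped.
  if (( n_odd < n_even )); then
    echo "odd scan has fewer pages than even scan; rescan both" >&2
    return 1
  fi

  for ((i = 0; i < n_odd; i++)); do
    out+=("${odd[i]}")
    # The stack was flipped for the second pass, so the even scan is read
    # back to front. Omitted trailing blank pages leave the last odd
    # page(s) without a partner.
    if (( i < n_even )); then
      out+=("${even[n_even - 1 - i]}")
    fi
  done
  echo "${out[*]}"
}
```

For the example above, `collate "1 3 5" "6 4 2"` prints `1 2 3 4 5 6`; if the empty page 6 is left out of the second pass, `collate "1 3 5" "4 2"` prints `1 2 3 4 5`.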
!!! tip
When scanning the even numbered pages, you can omit the last empty pages, if there are
any. For example, if page 6 is empty, you only need to scan pages 2 and 4. _Do not_ omit
empty pages in the middle of the document.
### Things that could go wrong
Paperless will notice when the first, "odd numbered" scan has fewer pages than the second
scan (this can happen when e.g. the ADF skipped a few pages in the first pass). In that
case, Paperless will remove the staging copy as well as the scan, and give you an error
message asking you to restart the process from scratch, by scanning the odd pages again,
followed by the even pages.
Another thing that might happen is that you start a double-sided scan, but then forget
to upload the second file. To avoid collating the wrong documents if you then come back
a day later to scan a new double-sided document, Paperless will only keep an "odd numbered
pages" file for up to 30 minutes. If more time passes, it will consider the next incoming
scan a completely new "odd numbered pages" one. The old staging file will get discarded.
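The 30 minute window amounts to a simple timestamp comparison (30 × 60 = 1800 seconds). The helper below is a hypothetical sketch of that check, not Paperless's code; whether a file staged exactly 30 minutes ago still counts is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not Paperless's code: has a staged "odd numbered
# pages" file outlived the 30 minute window?
# Usage: staging_expired STAGED_AT NOW   (Unix timestamps in seconds)
STAGING_TIMEOUT_SECONDS=$((30 * 60))

staging_expired() {
  local staged_at=$1 now=$2
  # Assumption: exactly 30 minutes is still considered valid.
  (( now - staged_at > STAGING_TIMEOUT_SECONDS ))
}
```

On a system with GNU coreutils, a call could look like `staging_expired "$(stat -c %Y staged.pdf)" "$(date +%s)"`, where `staged.pdf` stands in for the staged file.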
### Interaction with "subdirs as tags"
The collation feature can be used together with the "subdirs as tags" feature (but this is not
a requirement). Just create a correctly named double-sided subdir in the hierarchy and upload
your scans there. For example, both `double-sided/foo/bar` as well as `foo/bar/double-sided` will
cause the collated document to be treated as if it were uploaded into `foo/bar` and receive both
`foo` and `bar` tags, but not `double-sided`.
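The tag derivation described above can be sketched as follows. This is a hypothetical helper that mirrors the documented behavior, not Paperless's code: every directory component becomes a tag, except the collation subdirectory.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the documented behavior, not Paperless's code.
# Usage: tags_for_path RELATIVE_DIR [COLLATE_SUBDIR]
tags_for_path() {
  local dir=$1 subdir=${2:-double-sided} part
  local -a parts tags=()
  IFS=/ read -r -a parts <<< "$dir"
  for part in "${parts[@]}"; do
    # Skip empty components (e.g. from a trailing slash) and the collate dir.
    if [[ -n $part && $part != "$subdir" ]]; then
      tags+=("$part")
    fi
  done
  echo "${tags[*]}"
}
```

Both `tags_for_path double-sided/foo/bar` and `tags_for_path foo/bar/double-sided` print `foo bar`.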
### Interaction with document splitting
You can use the [document splitting](#document-splitting) feature, but if you use a normal
single-sided split marker page, the split document(s) will have an empty page at the front (or
whatever else was on the back of the split marker page). You can work around that by having
a split marker page that has the split barcode on _both_ sides. This way, the extra page will
get automatically removed.

@@ -524,7 +524,7 @@ parsing documents.
`PAPERLESS_OCR_MODE=<mode>`
: Tell paperless when and how to perform ocr on your documents. Four
: Tell paperless when and how to perform ocr on your documents. Three
modes are available:
- `skip`: Paperless skips all pages and will perform ocr only on
@@ -1116,6 +1116,43 @@ combination with PAPERLESS_CONSUMER_BARCODE_UPSCALE bigger than 1.0.
Defaults to "300"
## Collate Double-Sided Documents {#collate}
`PAPERLESS_CONSUMER_ENABLE_COLLATE_DOUBLE_SIDED=<bool>`
: Enables automatic collation of two single-sided scans into a double-sided
document.
This is useful if you have an automatic document feeder that only supports
single-sided scans, but you need to scan a double-sided document. If your
ADF supports double-sided scans natively, you do not need this feature.
`PAPERLESS_CONSUMER_RECURSIVE` must be enabled for this to work.
For more information, read the [corresponding section in the advanced
documentation](/advanced_usage#collate).
Defaults to false.
`PAPERLESS_CONSUMER_COLLATE_DOUBLE_SIDED_SUBDIR_NAME=<str>`
: The name of the subdirectory that the collate feature expects documents to
arrive in.
This only has an effect if `PAPERLESS_CONSUMER_ENABLE_COLLATE_DOUBLE_SIDED`
has been enabled. Note that Paperless will not automatically create the
directory.
Defaults to "double-sided".
`PAPERLESS_CONSUMER_COLLATE_DOUBLE_SIDED_TIFF_SUPPORT=<bool>`
: Whether TIFF image files should be supported when collating documents.
This will automatically convert any TIFF image(s) to PDFs for later
processing. This only has an effect if
`PAPERLESS_CONSUMER_ENABLE_COLLATE_DOUBLE_SIDED` has been enabled.
Defaults to false.
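A minimal sketch of enabling collation in a docker-compose installation might look like this. The env file name `docker-compose.env` and the host-side consume path `./consume` are assumptions about your setup; because Paperless does not create the subdirectory, it is created by hand.

```bash
# Append the collation settings to the env file used by docker-compose
# (the file name docker-compose.env is an assumption about your setup).
cat >> docker-compose.env <<'EOF'
PAPERLESS_CONSUMER_RECURSIVE=true
PAPERLESS_CONSUMER_ENABLE_COLLATE_DOUBLE_SIDED=true
PAPERLESS_CONSUMER_COLLATE_DOUBLE_SIDED_SUBDIR_NAME=double-sided
EOF

# Paperless will not create the subdirectory, so create it inside the
# consume folder (./consume is assumed to be the host-side consume path).
mkdir -p ./consume/double-sided
```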
## Binaries
There are a few external software packages that Paperless expects to
@@ -1123,7 +1160,7 @@ find on your system when it starts up. Unless you've done something
creative with their installation, you probably won't need to edit any
of these. However, if you've installed these programs somewhere where
simply typing the name of the program doesn't automatically execute it
(ie. the program isn't in your \$PATH), then you'll need to specify
(i.e. the program isn't in your $PATH), then you'll need to specify
the literal path for that program.
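To find the literal path, you can ask the shell where a program lives and then put that path into `paperless.conf`. The path in the second line is only an example for an installation outside `$PATH`.

```bash
# Prints the full path of convert if it is on $PATH.
command -v convert || echo "convert is not on \$PATH" >&2

# paperless.conf: point Paperless at the literal path (example path only).
PAPERLESS_CONVERT_BINARY=/opt/imagemagick/bin/convert
```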
`PAPERLESS_CONVERT_BINARY=<path>`
@@ -1207,7 +1244,7 @@ actual group ID on the host system, which you can get by executing
with English, German, Italian, Spanish and French. If your language
is not in this list, install additional languages with this
configuration option. You will need to [find the right LangCodes](https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html)
but note that (tesseract-ocr-\* package names)[https://packages.debian.org/bullseye/graphics/]
but note that [tesseract-ocr-\* package names](https://packages.debian.org/bullseye/graphics/)
do not always correspond with the language codes e.g. "chi_tra" should be
specified as "chi-tra".
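The common case from the note above, an underscore in the LangCode that becomes a hyphen in the package name, can be handled mechanically. The helper below is a hypothetical sketch that only performs that substitution; codes whose package names differ in other ways still need to be checked against the package list by hand.

```bash
# Hypothetical helper, not part of Paperless: convert Tesseract LangCodes
# such as "chi_tra" into the hyphenated form used by the Debian packages.
to_package_codes() {
  echo "${1//_/-}"
}

# Example: build the value for PAPERLESS_OCR_LANGUAGES.
PAPERLESS_OCR_LANGUAGES="$(to_package_codes "chi_tra chi_sim")"
```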

@@ -58,7 +58,7 @@ first-time setup.
!!! note
Every command is executed directly from the root folder of the project unless specified otherwise.
1. Install prerequisites + pipenv as mentioned in
[Bare metal route](/setup#bare_metal).
@@ -177,68 +177,69 @@ The front end is built using AngularJS. In order to get started, you need Node.j
The following commands are all performed in the `src-ui`-directory. You will need a running back end (including an active session) to connect to the back end API. To spin it up refer to the commands under the section [above](#back-end-development).
1. Install the Angular CLI. You might need sudo privileges to perform this command:
```bash
$ npm install -g @angular/cli
```
2. Make sure that it's on your path.
3. Install all necessary modules:
```bash
$ npm install
```
4. You can launch a development server by running:
```bash
$ ng serve
```
This will automatically update whenever you save. However, in-place
compilation might fail on syntax errors, in which case you need to
restart it.
By default, the development server is available on `http://localhost:4200/` and is configured to access the API at
`http://localhost:8000/api/`, which is the back end's default. If you enabled `DEBUG` on the back end, several security overrides for allowed hosts, CORS and X-Frame-Options are in place so that the front end behaves exactly as in production.
### Testing and code style
The front end code (.ts, .html, .scss) uses `prettier` for code
formatting via the Git `pre-commit` hooks which run automatically on
commit. See [above](#code-formatting-with-pre-commit-hooks) for installation instructions. You can also run this via the CLI with a
command such as
```bash
$ git ls-files -- '*.ts' | xargs pre-commit run prettier --files
```
Front end testing uses Jest and Playwright. Unit tests and e2e tests,
respectively, can be run non-interactively with:
```bash
$ ng test
$ npx playwright test
```
Playwright also includes a UI which can be run with:
```bash
$ npx playwright test --ui
```
### Building the frontend
In order to build the front end and serve it as part of Django, execute:
```bash
$ ng build --configuration production
```
This will build the front end and put it in a location from which the
Django server will serve it as static content. This way, you can verify
that authentication is working.
## Localization

@@ -3,10 +3,11 @@
## _What's the general plan for Paperless-ngx?_
**A:** While Paperless-ngx is already considered largely
"feature-complete" it is a community-driven project and development
will be guided in this way. New features can be submitted via GitHub
discussions and "up-voted" by the community but this is not a
guarantee the feature will be implemented. This project will always be
"feature-complete", it is a community-driven project and development
will be guided in this way. New features can be submitted via
[GitHub discussions](https://github.com/paperless-ngx/paperless-ngx/discussions)
and "up-voted" by the community, but this is not a
guarantee that the feature will be implemented. This project will always be
open to collaboration in the form of PRs, ideas etc.
## _I'm using docker. Where are my documents?_
@@ -58,7 +59,7 @@ elsewhere. Here are a couple notes about that.
WebP images are processed with OCR and converted into PDF documents.
- Plain text documents are supported as well and are added verbatim to
paperless.
- With the optional Tika integration enabled (see [Tika configuration](/configuration#tika),
- With the optional Tika integration enabled (see [Tika configuration](https://docs.paperless-ngx.com/configuration#tika)),
Paperless also supports various Office documents (.docx, .doc, .odt,
.ppt, .pptx, .odp, .xls, .xlsx, .ods).
@@ -82,7 +83,7 @@ has to do much less work to serve the data.
## _How do I install paperless-ngx on Raspberry Pi?_
**A:** Docker images are available for armv7 and arm64 hardware, so just
follow the docker-compose instructions. Apart from more required disk
follow the [docker-compose instructions](https://docs.paperless-ngx.com/setup/#installation). Apart from more required disk
space compared to a bare metal installation, docker comes with close to
zero overhead, even on Raspberry Pi.
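To check which of these images matches your board, look at the machine type reported by `uname -m`. The helper below is a hypothetical sketch: it only knows the machine types that correspond to the two images mentioned above (`armv7l` and `aarch64`/`arm64`), and treats anything else, such as the `armv6l` reported by older boards, as unsupported.

```bash
# Hypothetical helper: map the machine type from `uname -m` to the image
# architectures named above. Defaults to the current machine.
pi_image_arch() {
  case "${1:-$(uname -m)}" in
    aarch64 | arm64) echo arm64 ;;
    armv7l) echo armv7 ;;
    *) echo "no ARM image for this machine type" >&2; return 1 ;;
  esac
}
```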