Compare commits

..

1 Commits

Author SHA1 Message Date
Daniel Quinn
717b4a2b49 Fixes #172
Introduce some creative code around setting of ALLOWED_HOSTS that defaults to ['*'].  Also added PAPERLESS_ALLOWED_HOSTS to paperless.conf.example with an explanation as to what it's for
2017-01-03 09:52:31 +00:00
807 changed files with 32403 additions and 228039 deletions

View File

@@ -1,9 +0,0 @@
{
"qpdf": {
"version": "10.6.3"
},
"jbig2enc": {
"version": "0.29",
"git_tag": "0.29"
}
}

View File

@@ -1,21 +0,0 @@
**/__pycache__
/src-ui/.vscode
/src-ui/node_modules
/src-ui/dist
.git
/export
/consume
/media
/data
/docs
.pytest_cache
/dist
/scripts
/resources
**/tests
**/*.spec.ts
**/htmlcov
/src/.pytest_cache
.idea
.venv/
.vscode/

View File

@@ -1,37 +0,0 @@
# EditorConfig: http://EditorConfig.org
root = true
[*]
indent_style = tab
indent_size = 2
insert_final_newline = true
trim_trailing_whitespace = true
end_of_line = lf
charset = utf-8
max_line_length = 79
[{*.html,*.css,*.js}]
max_line_length = off
[*.py]
indent_size = 4
indent_style = space
[*.{yml,yaml}]
indent_style = space
[*.rst]
indent_style = space
[*.md]
indent_style = space
# Tests don't get a line width restriction. It's still a good idea to follow
# the 79 character rule, but in the interests of clarity, tests often need to
# violate it.
[**/test_*.py]
max_line_length = off
[Dockerfile*]
indent_style = space

2
.env
View File

@@ -1,2 +0,0 @@
COMPOSE_PROJECT_NAME=paperless
export PROMPT="(pipenv-projectname)$P$G"

View File

@@ -1,86 +0,0 @@
name: Bug report
description: Something is not working
title: "[BUG] Concise description of the issue"
labels: ["bug", "unconfirmed"]
body:
- type: markdown
attributes:
value: |
Have a question? 👉 [Start a new discussion](https://github.com/paperless-ngx/paperless-ngx/discussions/new) or [ask in chat](https://matrix.to/#/#paperless:adnidor.de).
Before opening an issue, please double check:
- [The troubleshooting documentation](https://paperless-ngx.readthedocs.io/en/latest/troubleshooting.html).
- [The installation instructions](https://paperless-ngx.readthedocs.io/en/latest/setup.html#installation).
- [Existing issues and discussions](https://github.com/paperless-ngx/paperless-ngx/search?q=&type=issues).
If you encounter issues while installing or configuring Paperless-ngx, please post in the ["Support" section of the discussions](https://github.com/paperless-ngx/paperless-ngx/discussions/new?category=support).
- type: textarea
id: description
attributes:
label: Description
description: A clear and concise description of what the bug is. If applicable, add screenshots to help explain your problem.
placeholder: |
Currently Paperless does not work when...
[Screenshot if applicable]
validations:
required: true
- type: textarea
id: reproduction
attributes:
label: Steps to reproduce
description: Steps to reproduce the behavior.
placeholder: |
1. Go to '...'
2. Click on '....'
3. See error
validations:
required: true
- type: textarea
id: logs
attributes:
label: Webserver logs
description: If available, post any logs from the web server related to your issue.
render: bash
- type: input
id: version
attributes:
label: Paperless-ngx version
placeholder: e.g. 1.6.0
validations:
required: true
- type: input
id: host-os
attributes:
label: Host OS
description: Host OS of the machine running paperless-ngx. Please add the architecture (uname -m) if applicable.
placeholder: e.g. Archlinux / Ubuntu 20.04 / Raspberry Pi `arm64`
validations:
required: true
- type: dropdown
id: install-method
attributes:
label: Installation method
options:
- Docker
- Bare metal
- Other (please describe above)
validations:
required: true
- type: input
id: browser
attributes:
label: Browser
description: Which browser you are using, if relevant.
placeholder: e.g. Chrome, Safari
- type: input
id: config-changes
attributes:
label: Configuration changes
description: Any configuration changes you made in `docker-compose.yml`, `docker-compose.env` or `paperless.conf`.
- type: input
id: other
attributes:
label: Other
description: Any other relevant details.

View File

@@ -1,11 +0,0 @@
blank_issues_enabled: false
contact_links:
- name: 🤔 Questions and Help
url: https://github.com/paperless-ngx/paperless-ngx/discussions
about: This issue tracker is not for support questions. Please refer to our Discussions.
- name: 💬 Chat
url: https://matrix.to/#/#paperless:adnidor.de
about: Want to discuss Paperless-ngx with others? Check out our chat.
- name: 🚀 Feature Request
url: https://github.com/paperless-ngx/paperless-ngx/discussions/new?category=feature-requests
about: Remember to search for existing feature requests and "up-vote" any you like

View File

@@ -1,32 +0,0 @@
<!--
Note: All PRs with code changes should be targeted to the `dev` branch, pure documentation changes can target `main`
-->
## Proposed change
<!--
Please include a summary of the change and which issue is fixed (if any) and any relevant motivation / context. List any dependencies that are required for this change. If appropriate, please include an explanation of how your proposed change can be tested. Screenshots and / or videos can also be helpful if appropriate.
-->
Fixes # (issue)
## Type of change
<!--
What type of change does your PR introduce to Paperless-ngx?
NOTE: Please check only one box!
-->
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] Other (please explain)
## Checklist:
- [ ] I have read & agree with the [contributing guidelines](https://github.com/paperless-ngx/paperless-ngx/blob/main/CONTRIBUTING.md).
- [ ] If applicable, I have tested my code for new features & regressions on both mobile & desktop devices, using the latest version of major browsers.
- [ ] If applicable, I have checked that all tests pass, see [documentation](https://paperless-ngx.readthedocs.io/en/latest/extending.html#back-end-development).
- [ ] I have run all `pre-commit` hooks, see [documentation](https://paperless-ngx.readthedocs.io/en/latest/extending.html#code-formatting-with-pre-commit-hooks).
- [ ] I have made corresponding changes to the documentation as needed.
- [ ] I have checked my modifications for any breaking changes.

View File

@@ -1,48 +0,0 @@
# https://docs.github.com/en/code-security/supply-chain-security/keeping-your-dependencies-updated-automatically/configuration-options-for-dependency-updates#package-ecosystem
version: 2
updates:
# Enable version updates for npm
- package-ecosystem: "npm"
target-branch: "dev"
# Look for `package.json` and `lock` files in the `/src-ui` directory
directory: "/src-ui"
# Check the npm registry for updates every month
schedule:
interval: "monthly"
labels:
- "frontend"
- "dependencies"
# Add reviewers
reviewers:
- "paperless-ngx/frontend"
# Enable version updates for Python
- package-ecosystem: "pip"
target-branch: "dev"
# Look for a `Pipfile` in the `root` directory
directory: "/"
# Check for updates once a week
schedule:
interval: "weekly"
labels:
- "backend"
- "dependencies"
# Add reviewers
reviewers:
- "paperless-ngx/backend"
# Enable updates for Github Actions
- package-ecosystem: "github-actions"
target-branch: "dev"
directory: "/"
schedule:
# Check for updates to GitHub Actions every month
interval: "monthly"
labels:
- "ci-cd"
- "dependencies"
# Add reviewers
reviewers:
- "paperless-ngx/ci-cd"

View File

@@ -1,37 +0,0 @@
categories:
- title: 'Breaking Changes'
labels:
- 'breaking-change'
- title: 'Features'
labels:
- 'enhancement'
- title: 'Bug Fixes'
labels:
- 'bug'
- title: 'Documentation'
label: 'documentation'
- title: 'Maintenance'
labels:
- 'chore'
- 'deployment'
- 'translation'
- title: 'Dependencies'
collapse-after: 3
label: 'dependencies'
include-labels:
- 'enhancement'
- 'bug'
- 'chore'
- 'deployment'
- 'translation'
- 'dependencies'
replacers: # Changes "Feature: Update checker" to "Update checker"
- search: '/Feature:|Feat:|\[feature\]/gi'
replace: ''
category-template: '### $TITLE'
change-template: '- $TITLE [@$AUTHOR](https://github.com/$AUTHOR) ([#$NUMBER]($URL))'
change-title-escapes: '\<*_&#@'
template: |
## paperless-ngx $RESOLVED_VERSION
$CHANGES

View File

@@ -1,27 +0,0 @@
#!/usr/bin/env python3
def get_image_tag(
repo_name: str,
pkg_name: str,
pkg_version: str,
) -> str:
"""
Returns a string representing the normal image for a given package
"""
return f"ghcr.io/{repo_name}/builder/{pkg_name}:{pkg_version}"
def get_cache_image_tag(
repo_name: str,
pkg_name: str,
pkg_version: str,
branch_name: str,
) -> str:
"""
Returns a string representing the expected image cache tag for a given package
Registry type caching is utilized for the builder images, to allow fast
rebuilds, generally almost instant for the same version
"""
return f"ghcr.io/{repo_name}/builder/cache/{pkg_name}:{pkg_version}"

View File

@@ -1,102 +0,0 @@
#!/usr/bin/env python3
"""
This is a helper script for the mutli-stage Docker image builder.
It provides a single point of configuration for package version control.
The output JSON object is used by the CI workflow to determine what versions
to build and pull into the final Docker image.
Python package information is obtained from the Pipfile.lock. As this is
kept updated by dependabot, it usually will need no further configuration.
The sole exception currently is pikepdf, which has a dependency on qpdf,
and is configured here to use the latest version of qpdf built by the workflow.
Other package version information is configured directly below, generally by
setting the version and Git information, if any.
"""
import argparse
import json
import os
from pathlib import Path
from typing import Final
from common import get_cache_image_tag
from common import get_image_tag
def _main():
parser = argparse.ArgumentParser(
description="Generate a JSON object of information required to build the given package, based on the Pipfile.lock",
)
parser.add_argument(
"package",
help="The name of the package to generate JSON for",
)
PIPFILE_LOCK_PATH: Final[Path] = Path("Pipfile.lock")
BUILD_CONFIG_PATH: Final[Path] = Path(".build-config.json")
# Read the main config file
build_json: Final = json.loads(BUILD_CONFIG_PATH.read_text())
# Read Pipfile.lock file
pipfile_data: Final = json.loads(PIPFILE_LOCK_PATH.read_text())
args: Final = parser.parse_args()
# Read from environment variables set by GitHub Actions
repo_name: Final[str] = os.environ["GITHUB_REPOSITORY"]
branch_name: Final[str] = os.environ["GITHUB_REF_NAME"]
# Default output values
version = None
git_tag = None
extra_config = {}
if args.package in pipfile_data["default"]:
# Read the version from Pipfile.lock
pkg_data = pipfile_data["default"][args.package]
pkg_version = pkg_data["version"].split("==")[-1]
version = pkg_version
# Based on the package, generate the expected Git tag name
if args.package == "pikepdf":
git_tag = f"v{pkg_version}"
elif args.package == "psycopg2":
git_tag = pkg_version.replace(".", "_")
# Any extra/special values needed
if args.package == "pikepdf":
extra_config["qpdf_version"] = build_json["qpdf"]["version"]
elif args.package in build_json:
version = build_json[args.package]["version"]
if "git_tag" in build_json[args.package]:
git_tag = build_json[args.package]["git_tag"]
else:
raise NotImplementedError(args.package)
# The JSON object we'll output
output = {
"name": args.package,
"version": version,
"git_tag": git_tag,
"image_tag": get_image_tag(repo_name, args.package, version),
"cache_tag": get_cache_image_tag(
repo_name,
args.package,
version,
branch_name,
),
}
# Add anything special a package may need
output.update(extra_config)
# Output the JSON info to stdout
print(json.dumps(output))
if __name__ == "__main__":
_main()

15
.github/stale.yml vendored
View File

@@ -1,15 +0,0 @@
# Number of days of inactivity before an issue becomes stale
daysUntilStale: 30
# Number of days of inactivity before a stale issue is closed
daysUntilClose: 7
onlyLabels:
- unconfirmed
# Label to use when marking an issue as stale
staleLabel: stale
# Comment to post when marking an issue as stale. Set to `false` to disable
markComment: >
This issue has been automatically marked as stale because it has not had
recent activity. It will be closed if no further activity occurs. Thank you
for your contributions.
# Comment to post when closing a stale issue. Set to `false` to disable
closeComment: false

View File

@@ -1,401 +0,0 @@
name: ci
on:
push:
tags:
# https://semver.org/#spec-item-2
- 'v[0-9]+.[0-9]+.[0-9]+'
# https://semver.org/#spec-item-9
- 'v[0-9]+.[0-9]+.[0-9]+-beta.rc[0-9]+'
branches-ignore:
- 'translations**'
pull_request:
branches-ignore:
- 'translations**'
jobs:
documentation:
name: "Build Documentation"
runs-on: ubuntu-20.04
steps:
-
name: Checkout
uses: actions/checkout@v3
-
name: Install pipenv
run: pipx install pipenv
-
name: Set up Python
uses: actions/setup-python@v3
with:
python-version: 3.9
cache: "pipenv"
cache-dependency-path: 'Pipfile.lock'
-
name: Install dependencies
run: |
pipenv sync --dev
-
name: Make documentation
run: |
cd docs/
pipenv run make html
-
name: Upload artifact
uses: actions/upload-artifact@v3
with:
name: documentation
path: docs/_build/html/
ci-backend:
uses: ./.github/workflows/reusable-ci-backend.yml
ci-frontend:
uses: ./.github/workflows/reusable-ci-frontend.yml
prepare-docker-build:
name: Prepare Docker Pipeline Data
if: github.event_name == 'push' && (startsWith(github.ref, 'refs/heads/feature-') || github.ref == 'refs/heads/dev' || github.ref == 'refs/heads/beta' || contains(github.ref, 'beta.rc') || startsWith(github.ref, 'refs/tags/v'))
runs-on: ubuntu-20.04
needs:
- documentation
- ci-backend
- ci-frontend
steps:
-
name: Checkout
uses: actions/checkout@v3
-
name: Set up Python
uses: actions/setup-python@v3
with:
python-version: "3.9"
-
name: Setup qpdf image
id: qpdf-setup
run: |
build_json=$(python ${GITHUB_WORKSPACE}/.github/scripts/get-build-json.py qpdf)
echo ${build_json}
echo ::set-output name=qpdf-json::${build_json}
-
name: Setup psycopg2 image
id: psycopg2-setup
run: |
build_json=$(python ${GITHUB_WORKSPACE}/.github/scripts/get-build-json.py psycopg2)
echo ${build_json}
echo ::set-output name=psycopg2-json::${build_json}
-
name: Setup pikepdf image
id: pikepdf-setup
run: |
build_json=$(python ${GITHUB_WORKSPACE}/.github/scripts/get-build-json.py pikepdf)
echo ${build_json}
echo ::set-output name=pikepdf-json::${build_json}
-
name: Setup jbig2enc image
id: jbig2enc-setup
run: |
build_json=$(python ${GITHUB_WORKSPACE}/.github/scripts/get-build-json.py jbig2enc)
echo ${build_json}
echo ::set-output name=jbig2enc-json::${build_json}
outputs:
qpdf-json: ${{ steps.qpdf-setup.outputs.qpdf-json }}
pikepdf-json: ${{ steps.pikepdf-setup.outputs.pikepdf-json }}
psycopg2-json: ${{ steps.psycopg2-setup.outputs.psycopg2-json }}
jbig2enc-json: ${{ steps.jbig2enc-setup.outputs.jbig2enc-json}}
build-qpdf-debs:
name: qpdf
needs:
- prepare-docker-build
uses: ./.github/workflows/reusable-workflow-builder.yml
with:
dockerfile: ./docker-builders/Dockerfile.qpdf
build-json: ${{ needs.prepare-docker-build.outputs.qpdf-json }}
build-args: |
QPDF_VERSION=${{ fromJSON(needs.prepare-docker-build.outputs.qpdf-json).version }}
build-jbig2enc:
name: jbig2enc
needs:
- prepare-docker-build
uses: ./.github/workflows/reusable-workflow-builder.yml
with:
dockerfile: ./docker-builders/Dockerfile.jbig2enc
build-json: ${{ needs.prepare-docker-build.outputs.jbig2enc-json }}
build-args: |
JBIG2ENC_VERSION=${{ fromJSON(needs.prepare-docker-build.outputs.jbig2enc-json).version }}
build-psycopg2-wheel:
name: psycopg2
needs:
- prepare-docker-build
uses: ./.github/workflows/reusable-workflow-builder.yml
with:
dockerfile: ./docker-builders/Dockerfile.psycopg2
build-json: ${{ needs.prepare-docker-build.outputs.psycopg2-json }}
build-args: |
PSYCOPG2_GIT_TAG=${{ fromJSON(needs.prepare-docker-build.outputs.psycopg2-json).git_tag }}
PSYCOPG2_VERSION=${{ fromJSON(needs.prepare-docker-build.outputs.psycopg2-json).version }}
build-pikepdf-wheel:
name: pikepdf
needs:
- prepare-docker-build
- build-qpdf-debs
uses: ./.github/workflows/reusable-workflow-builder.yml
with:
dockerfile: ./docker-builders/Dockerfile.pikepdf
build-json: ${{ needs.prepare-docker-build.outputs.pikepdf-json }}
build-args: |
REPO=${{ github.repository }}
QPDF_VERSION=${{ fromJSON(needs.prepare-docker-build.outputs.qpdf-json).version }}
PIKEPDF_GIT_TAG=${{ fromJSON(needs.prepare-docker-build.outputs.pikepdf-json).git_tag }}
PIKEPDF_VERSION=${{ fromJSON(needs.prepare-docker-build.outputs.pikepdf-json).version }}
# build and push image to docker hub.
build-docker-image:
runs-on: ubuntu-20.04
concurrency:
group: ${{ github.workflow }}-build-docker-image-${{ github.ref_name }}
cancel-in-progress: true
needs:
- prepare-docker-build
- build-psycopg2-wheel
- build-jbig2enc
- build-qpdf-debs
- build-pikepdf-wheel
steps:
-
name: Check pushing to Docker Hub
id: docker-hub
# Only push to Dockerhub from the main repo
# Otherwise forks would require a Docker Hub account and secrets setup
run: |
if [[ ${{ github.repository }} == "paperless-ngx/paperless-ngx" ]] ; then
echo ::set-output name=enable::"true"
else
echo ::set-output name=enable::"false"
fi
-
name: Gather Docker metadata
id: docker-meta
uses: docker/metadata-action@v3
with:
images: |
ghcr.io/${{ github.repository }}
name=paperlessngx/paperless-ngx,enable=${{ steps.docker-hub.outputs.enable }}
tags: |
# Tag branches with branch name
type=ref,event=branch
# Process semver tags
# For a tag x.y.z or vX.Y.Z, output an x.y.z and x.y image tag
type=semver,pattern={{version}}
type=semver,pattern={{major}}.{{minor}}
-
name: Checkout
uses: actions/checkout@v3
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
-
name: Set up QEMU
uses: docker/setup-qemu-action@v1
-
name: Login to Github Container Registry
uses: docker/login-action@v1
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
-
name: Login to Docker Hub
uses: docker/login-action@v1
# Don't attempt to login is not pushing to Docker Hub
if: steps.docker-hub.outputs.enable == 'true'
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
-
name: Build and push
uses: docker/build-push-action@v2
with:
context: .
file: ./Dockerfile
platforms: linux/amd64,linux/arm/v7,linux/arm64
push: ${{ github.event_name != 'pull_request' }}
tags: ${{ steps.docker-meta.outputs.tags }}
labels: ${{ steps.docker-meta.outputs.labels }}
build-args: |
JBIG2ENC_VERSION=${{ fromJSON(needs.prepare-docker-build.outputs.jbig2enc-json).version }}
QPDF_VERSION=${{ fromJSON(needs.prepare-docker-build.outputs.qpdf-json).version }}
PIKEPDF_VERSION=${{ fromJSON(needs.prepare-docker-build.outputs.pikepdf-json).version }}
PSYCOPG2_VERSION=${{ fromJSON(needs.prepare-docker-build.outputs.psycopg2-json).version }}
# Get cache layers from this branch, then dev, then main
# This allows new branches to get at least some cache benefits, generally from dev
cache-from: |
type=registry,ref=ghcr.io/${{ github.repository }}/builder/cache/app:${{ github.ref_name }}
type=registry,ref=ghcr.io/${{ github.repository }}/builder/cache/app:dev
type=registry,ref=ghcr.io/${{ github.repository }}/builder/cache/app:main
cache-to: |
type=registry,mode=max,ref=ghcr.io/${{ github.repository }}/builder/cache/app:${{ github.ref_name }}
-
name: Inspect image
run: |
docker buildx imagetools inspect ${{ fromJSON(steps.docker-meta.outputs.json).tags[0] }}
-
name: Export frontend artifact from docker
run: |
docker create --name frontend-extract ${{ fromJSON(steps.docker-meta.outputs.json).tags[0] }}
docker cp frontend-extract:/usr/src/paperless/src/documents/static/frontend src/documents/static/frontend/
-
name: Upload frontend artifact
uses: actions/upload-artifact@v3
with:
name: frontend-compiled
path: src/documents/static/frontend/
build-release:
needs:
- build-docker-image
runs-on: ubuntu-20.04
steps:
-
name: Checkout
uses: actions/checkout@v3
-
name: Set up Python
uses: actions/setup-python@v3
with:
python-version: 3.9
-
name: Install dependencies
run: |
sudo apt-get update -qq
sudo apt-get install -qq --no-install-recommends gettext liblept5
pip3 install --upgrade pip setuptools wheel
pip3 install -r requirements.txt
-
name: Download frontend artifact
uses: actions/download-artifact@v3
with:
name: frontend-compiled
path: src/documents/static/frontend/
-
name: Download documentation artifact
uses: actions/download-artifact@v3
with:
name: documentation
path: docs/_build/html/
-
name: Move files
run: |
mkdir dist
mkdir dist/paperless-ngx
mkdir dist/paperless-ngx/scripts
cp .dockerignore .env Dockerfile Pipfile Pipfile.lock LICENSE README.md requirements.txt dist/paperless-ngx/
cp paperless.conf.example dist/paperless-ngx/paperless.conf
cp gunicorn.conf.py dist/paperless-ngx/gunicorn.conf.py
cp docker/ dist/paperless-ngx/docker -r
cp scripts/*.service scripts/*.sh dist/paperless-ngx/scripts/
cp src/ dist/paperless-ngx/src -r
cp docs/_build/html/ dist/paperless-ngx/docs -r
-
name: Compile messages
run: |
cd dist/paperless-ngx/src
python3 manage.py compilemessages
-
name: Collect static files
run: |
cd dist/paperless-ngx/src
python3 manage.py collectstatic --no-input
-
name: Make release package
run: |
cd dist
find . -name __pycache__ | xargs rm -r
tar -cJf paperless-ngx.tar.xz paperless-ngx/
-
name: Upload release artifact
uses: actions/upload-artifact@v3
with:
name: release
path: dist/paperless-ngx.tar.xz
publish-release:
runs-on: ubuntu-20.04
needs:
- build-release
if: github.ref_type == 'tag' && (startsWith(github.ref_name, 'v') || contains(github.ref_name, '-beta.rc'))
steps:
-
name: Download release artifact
uses: actions/download-artifact@v3
with:
name: release
path: ./
-
name: Get version
id: get_version
run: |
echo ::set-output name=version::${{ github.ref_name }}
if [[ ${{ contains(github.ref_name, '-beta.rc') }} == 'true' ]]; then
echo ::set-output name=prerelease::true
else
echo ::set-output name=prerelease::false
fi
-
name: Create Release and Changelog
id: create-release
uses: release-drafter/release-drafter@v5
with:
name: Paperless-ngx ${{ steps.get_version.outputs.version }}
tag: ${{ steps.get_version.outputs.version }}
version: ${{ steps.get_version.outputs.version }}
prerelease: ${{ steps.get_version.outputs.prerelease }}
publish: true # ensures release is not marked as draft
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-
name: Upload release archive
id: upload-release-asset
uses: actions/upload-release-asset@v1
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
upload_url: ${{ steps.create-release.outputs.upload_url }}
asset_path: ./paperless-ngx.tar.xz
asset_name: paperless-ngx-${{ steps.get_version.outputs.version }}.tar.xz
asset_content_type: application/x-xz
-
name: Checkout
uses: actions/checkout@v3
with:
ref: main
-
name: Append Changelog to docs
id: append-Changelog
working-directory: docs
run: |
echo -e "# Changelog\n\n${{ steps.create-release.outputs.body }}\n" > changelog-new.md
CURRENT_CHANGELOG=`tail --lines +2 changelog.md`
echo -e "$CURRENT_CHANGELOG" >> changelog-new.md
mv changelog-new.md changelog.md
git config --global user.name "github-actions"
git config --global user.email "41898282+github-actions[bot]@users.noreply.github.com"
git commit -am "Changelog ${{ steps.get_version.outputs.version }} - GHA"
git push origin HEAD:main

View File

@@ -1,54 +0,0 @@
# For most projects, this workflow file will not need changing; you simply need
# to commit it to your repository.
#
# You may wish to alter this file to override the set of languages analyzed,
# or to provide custom queries or build logic.
#
# ******** NOTE ********
# We have attempted to detect the languages in your repository. Please check
# the `language` matrix defined below to confirm you have the correct set of
# supported CodeQL languages.
#
name: "CodeQL"
on:
push:
branches: [ main, dev ]
pull_request:
# The branches below must be a subset of the branches above
branches: [ dev ]
schedule:
- cron: '28 13 * * 5'
jobs:
analyze:
name: Analyze
runs-on: ubuntu-latest
permissions:
actions: read
contents: read
security-events: write
strategy:
fail-fast: false
matrix:
language: [ 'javascript', 'python' ]
# CodeQL supports [ 'cpp', 'csharp', 'go', 'java', 'javascript', 'python', 'ruby' ]
# Learn more about CodeQL language support at https://git.io/codeql-language-support
steps:
- name: Checkout repository
uses: actions/checkout@v2
# Initializes the CodeQL tools for scanning.
- name: Initialize CodeQL
uses: github/codeql-action/init@v2
with:
languages: ${{ matrix.language }}
# If you wish to specify custom queries, you can do so here or in a config file.
# By default, queries listed here will override any specified in a config file.
# Prefix the list here with "+" to use these queries and those in the config file.
# queries: ./path/to/local/query, your-org/your-repo/queries@main
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v2

View File

@@ -1,47 +0,0 @@
name: Project Automations
on:
issues:
types:
- opened
- reopened
pull_request_target: #_target allows access to secrets
types:
- opened
- reopened
branches:
- main
- dev
env:
todo: Todo
done: Done
in_progress: In Progress
jobs:
issue_opened_or_reopened:
name: issue_opened_or_reopened
runs-on: ubuntu-latest
if: github.event_name == 'issues' && (github.event.action == 'opened' || github.event.action == 'reopened')
steps:
- name: Set issue status to ${{ env.todo }}
uses: leonsteinhaeuser/project-beta-automations@v1.2.1
with:
gh_token: ${{ secrets.GH_TOKEN }}
organization: paperless-ngx
project_id: 2
resource_node_id: ${{ github.event.issue.node_id }}
status_value: ${{ env.todo }} # Target status
pr_opened_or_reopened:
name: pr_opened_or_reopened
runs-on: ubuntu-latest
if: github.event_name == 'pull_request_target' && (github.event.action == 'opened' || github.event.action == 'reopened')
steps:
- name: Set PR status to ${{ env.in_progress }}
uses: leonsteinhaeuser/project-beta-automations@v1.2.1
with:
gh_token: ${{ secrets.GH_TOKEN }}
organization: paperless-ngx
project_id: 2
resource_node_id: ${{ github.event.pull_request.node_id }}
status_value: ${{ env.in_progress }} # Target status

View File

@@ -1,108 +0,0 @@
name: Backend CI Jobs
on:
workflow_call:
jobs:
code-checks-backend:
name: "Code Style Checks"
runs-on: ubuntu-20.04
steps:
-
name: Checkout
uses: actions/checkout@v3
-
name: Install checkers
run: |
pipx install reorder-python-imports
pipx install yesqa
pipx install add-trailing-comma
pipx install flake8
-
name: Run reorder-python-imports
run: |
find src/ -type f -name '*.py' ! -path "*/migrations/*" | xargs reorder-python-imports
-
name: Run yesqa
run: |
find src/ -type f -name '*.py' ! -path "*/migrations/*" | xargs yesqa
-
name: Run add-trailing-comma
run: |
find src/ -type f -name '*.py' ! -path "*/migrations/*" | xargs add-trailing-comma
# black is placed after add-trailing-comma because it may format differently
# if a trailing comma is added
-
name: Run black
uses: psf/black@stable
with:
options: "--check --diff"
version: "22.3.0"
-
name: Run flake8 checks
run: |
cd src/
flake8 --max-line-length=88 --ignore=E203,W503
tests-backend:
name: "Tests (${{ matrix.python-version }})"
runs-on: ubuntu-20.04
needs:
- code-checks-backend
strategy:
matrix:
python-version: ['3.8', '3.9', '3.10']
fail-fast: false
steps:
-
name: Checkout
uses: actions/checkout@v3
with:
fetch-depth: 2
-
name: Install pipenv
run: pipx install pipenv
-
name: Set up Python
uses: actions/setup-python@v3
with:
python-version: "${{ matrix.python-version }}"
cache: "pipenv"
cache-dependency-path: 'Pipfile.lock'
-
name: Install system dependencies
run: |
sudo apt-get update -qq
sudo apt-get install -qq --no-install-recommends unpaper tesseract-ocr imagemagick ghostscript optipng libzbar0 poppler-utils
-
name: Install Python dependencies
run: |
pipenv sync --dev
-
name: Tests
run: |
cd src/
pipenv run pytest
-
name: Get changed files
id: changed-files-specific
uses: tj-actions/changed-files@v19
with:
files: |
src/**
-
name: List all changed files
run: |
for file in ${{ steps.changed-files-specific.outputs.all_changed_files }}; do
echo "${file} was changed"
done
-
name: Publish coverage results
if: matrix.python-version == '3.9' && steps.changed-files-specific.outputs.any_changed == 'true'
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
# https://github.com/coveralls-clients/coveralls-python/issues/251
run: |
cd src/
pipenv run coveralls --service=github

View File

@@ -1,42 +0,0 @@
name: Frontend CI Jobs
on:
workflow_call:
jobs:
code-checks-frontend:
name: "Code Style Checks"
runs-on: ubuntu-20.04
steps:
-
name: Checkout
uses: actions/checkout@v3
- uses: actions/setup-node@v3
with:
node-version: '16'
-
name: Install prettier
run: |
npm install prettier
-
name: Run prettier
run:
npx prettier --check --ignore-path Pipfile.lock **/*.js **/*.ts *.md **/*.md
tests-frontend:
name: "Tests"
runs-on: ubuntu-20.04
needs:
- code-checks-frontend
strategy:
matrix:
node-version: [16.x]
steps:
- uses: actions/checkout@v3
- name: Use Node.js ${{ matrix.node-version }}
uses: actions/setup-node@v3
with:
node-version: ${{ matrix.node-version }}
- run: cd src-ui && npm ci
- run: cd src-ui && npm run test
- run: cd src-ui && npm run e2e:ci

View File

@@ -1,53 +0,0 @@
name: Reusable Image Builder
on:
workflow_call:
inputs:
dockerfile:
required: true
type: string
build-json:
required: true
type: string
build-args:
required: false
default: ""
type: string
concurrency:
group: ${{ github.workflow }}-${{ fromJSON(inputs.build-json).name }}-${{ fromJSON(inputs.build-json).version }}
cancel-in-progress: false
jobs:
build-image:
name: Build ${{ fromJSON(inputs.build-json).name }} @ ${{ fromJSON(inputs.build-json).version }}
runs-on: ubuntu-latest
steps:
-
name: Checkout
uses: actions/checkout@v3
-
name: Login to Github Container Registry
uses: docker/login-action@v1
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
-
name: Set up QEMU
uses: docker/setup-qemu-action@v1
-
name: Build ${{ fromJSON(inputs.build-json).name }}
uses: docker/build-push-action@v2
with:
context: .
file: ${{ inputs.dockerfile }}
tags: ${{ fromJSON(inputs.build-json).image_tag }}
platforms: linux/amd64,linux/arm64,linux/arm/v7
build-args: ${{ inputs.build-args }}
push: true
cache-from: type=registry,ref=${{ fromJSON(inputs.build-json).cache_tag }}
cache-to: type=registry,mode=max,ref=${{ fromJSON(inputs.build-json).cache_tag }}

32
.gitignore vendored
View File

@@ -42,7 +42,6 @@ htmlcov/
nosetests.xml
coverage.xml
*,cover
.pytest_cache
# Translations
*.mo
@@ -57,35 +56,24 @@ docs/_build/
# PyBuilder
target/
# Stored PDFs
media/documents/*.gpg
media/documents/thumbnails/*.gpg
media/documents/originals/*.gpg
# Sqlite database
db.sqlite3
# PyCharm
.idea
# VS Code
.vscode
# Other stuff that doesn't belong
.virtualenv
virtualenv
/venv
/docker-compose.env
/docker-compose.yml
.vagrant
docker-compose.yml
docker-compose.env
# Used for development
scripts/import-for-development
scripts/nuke
# Static files collected by the collectstatic command
/static/
# Stored PDFs
/media/
/data/
/paperless.conf
/consume/
/export/
/src-ui/.vscode
# this is where the compiled frontend is moved to.
/src/documents/static/frontend/
/docs/.vscode/settings.json

View File

@@ -1,94 +0,0 @@
# This file configures pre-commit hooks.
# See https://pre-commit.com/ for general information
# See https://pre-commit.com/hooks.html for a listing of possible hooks
repos:
# General hooks
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.2.0
hooks:
- id: check-docstring-first
- id: check-json
exclude: "tsconfig.*json"
- id: check-yaml
- id: check-toml
- id: check-executables-have-shebangs
- id: end-of-file-fixer
exclude_types:
- svg
- pofile
exclude: "(^LICENSE$)"
- id: mixed-line-ending
args:
- "--fix=lf"
- id: trailing-whitespace
exclude_types:
- svg
- id: check-case-conflict
- id: detect-private-key
- repo: https://github.com/pre-commit/mirrors-prettier
rev: "v2.6.2"
hooks:
- id: prettier
types_or:
- javascript
- ts
- markdown
exclude: "(^Pipfile\\.lock$)"
# Python hooks
- repo: https://github.com/asottile/reorder_python_imports
rev: v3.1.0
hooks:
- id: reorder-python-imports
exclude: "(migrations)"
- repo: https://github.com/asottile/yesqa
rev: "v1.3.0"
hooks:
- id: yesqa
exclude: "(migrations)"
- repo: https://github.com/asottile/add-trailing-comma
rev: "v2.2.3"
hooks:
- id: add-trailing-comma
exclude: "(migrations)"
- repo: https://gitlab.com/pycqa/flake8
rev: 3.9.2
hooks:
- id: flake8
files: ^src/
args:
- "--config=./src/setup.cfg"
- repo: https://github.com/psf/black
rev: 22.3.0
hooks:
- id: black
- repo: https://github.com/asottile/pyupgrade
rev: v2.32.1
hooks:
- id: pyupgrade
exclude: "(migrations)"
args:
- "--py38-plus"
# Dockerfile hooks
- repo: https://github.com/AleksaC/hadolint-py
rev: v2.10.0
hooks:
- id: hadolint
args:
- --ignore
- DL3008 # https://github.com/hadolint/hadolint/wiki/DL3008 (should probably do this at some point)
- --ignore
- DL3013 # https://github.com/hadolint/hadolint/wiki/DL3013 (should probably do this too at some point)
- --ignore
- DL3003 # https://github.com/hadolint/hadolint/wiki/DL3003 (seems excessive to use WORKDIR so much)
# Shell script hooks
- repo: https://github.com/lovesegfault/beautysh
rev: v6.2.1
hooks:
- id: beautysh
args:
- "--tab"
- repo: https://github.com/shellcheck-py/shellcheck-py
rev: "v0.8.0.4"
hooks:
- id: shellcheck

View File

@@ -1,4 +0,0 @@
# https://prettier.io/docs/en/options.html#semicolons
semi: false
# https://prettier.io/docs/en/options.html#quotes
singleQuote: true

View File

@@ -1,16 +0,0 @@
# .readthedocs.yml
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
# Required
version: 2
# Build documentation in the docs/ directory with Sphinx
sphinx:
configuration: docs/conf.py
# Optionally set the version of Python and requirements required to build your docs
python:
version: "3.8"
install:
- requirements: docs/requirements.txt

18
.travis.yml Normal file
View File

@@ -0,0 +1,18 @@
language: python
sudo: false
matrix:
include:
- python: 3.4
env: TOXENV=py34
- python: 3.5
env: TOXENV=py35
- python: 3.5
env: TOXENV=pep8
install:
- pip install --requirement requirements.txt
- pip install tox
script: tox -c src/tox.ini

View File

@@ -1,10 +0,0 @@
/.github/workflows/ @paperless-ngx/ci-cd
/docker/ @paperless-ngx/ci-cd
/scripts/ @paperless-ngx/ci-cd
/src-ui/ @paperless-ngx/frontend
/src/ @paperless-ngx/backend
Pipfile* @paperless-ngx/backend
*.py @paperless-ngx/backend
requirements.txt @paperless-ngx/backend

View File

@@ -1,128 +0,0 @@
# Contributor Covenant Code of Conduct
## Our Pledge
We as members, contributors, and leaders pledge to make participation in our
community a harassment-free experience for everyone, regardless of age, body
size, visible or invisible disability, ethnicity, sex characteristics, gender
identity and expression, level of experience, education, socio-economic status,
nationality, personal appearance, race, religion, or sexual identity
and orientation.
We pledge to act and interact in ways that contribute to an open, welcoming,
diverse, inclusive, and healthy community.
## Our Standards
Examples of behavior that contributes to a positive environment for our
community include:
- Demonstrating empathy and kindness toward other people
- Being respectful of differing opinions, viewpoints, and experiences
- Giving and gracefully accepting constructive feedback
- Accepting responsibility and apologizing to those affected by our mistakes,
and learning from the experience
- Focusing on what is best not just for us as individuals, but for the
overall community
Examples of unacceptable behavior include:
- The use of sexualized language or imagery, and sexual attention or
advances of any kind
- Trolling, insulting or derogatory comments, and personal or political attacks
- Public or private harassment
- Publishing others' private information, such as a physical or email
address, without their explicit permission
- Other conduct which could reasonably be considered inappropriate in a
professional setting
## Enforcement Responsibilities
Community leaders are responsible for clarifying and enforcing our standards of
acceptable behavior and will take appropriate and fair corrective action in
response to any behavior that they deem inappropriate, threatening, offensive,
or harmful.
Community leaders have the right and responsibility to remove, edit, or reject
comments, commits, code, wiki edits, issues, and other contributions that are
not aligned to this Code of Conduct, and will communicate reasons for moderation
decisions when appropriate.
## Scope
This Code of Conduct applies within all community spaces, and also applies when
an individual is officially representing the community in public spaces.
Examples of representing our community include using an official e-mail address,
posting via an official social media account, or acting as an appointed
representative at an online or offline event.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement at
hello@paperless-ngx.com.
All complaints will be reviewed and investigated promptly and fairly.
All community leaders are obligated to respect the privacy and security of the
reporter of any incident.
## Enforcement Guidelines
Community leaders will follow these Community Impact Guidelines in determining
the consequences for any action they deem in violation of this Code of Conduct:
### 1. Correction
**Community Impact**: Use of inappropriate language or other behavior deemed
unprofessional or unwelcome in the community.
**Consequence**: A private, written warning from community leaders, providing
clarity around the nature of the violation and an explanation of why the
behavior was inappropriate. A public apology may be requested.
### 2. Warning
**Community Impact**: A violation through a single incident or series
of actions.
**Consequence**: A warning with consequences for continued behavior. No
interaction with the people involved, including unsolicited interaction with
those enforcing the Code of Conduct, for a specified period of time. This
includes avoiding interactions in community spaces as well as external channels
like social media. Violating these terms may lead to a temporary or
permanent ban.
### 3. Temporary Ban
**Community Impact**: A serious violation of community standards, including
sustained inappropriate behavior.
**Consequence**: A temporary ban from any sort of interaction or public
communication with the community for a specified period of time. No public or
private interaction with the people involved, including unsolicited interaction
with those enforcing the Code of Conduct, is allowed during this period.
Violating these terms may lead to a permanent ban.
### 4. Permanent Ban
**Community Impact**: Demonstrating a pattern of violation of community
standards, including sustained inappropriate behavior, harassment of an
individual, or aggression toward or disparagement of classes of individuals.
**Consequence**: A permanent ban from any sort of public interaction within
the community.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 2.0, available at
https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
Community Impact Guidelines were inspired by [Mozilla's code of conduct
enforcement ladder](https://github.com/mozilla/diversity).
[homepage]: https://www.contributor-covenant.org
For answers to common questions about this code of conduct, see the FAQ at
https://www.contributor-covenant.org/faq. Translations are available at
https://www.contributor-covenant.org/translations.

View File

@@ -1,132 +0,0 @@
# Contributing
If you feel like contributing to the project, please do! Bug fixes and improvements are always welcome.
If you want to implement something big:
- Please start a discussion about that in the issues! Maybe something similar is already in development and we can make it happen together.
- When making additions to the project, consider if the majority of users will benefit from your change. If not, you're probably better of forking the project.
- Also consider if your change will get in the way of other users. A good change is a change that enhances the experience of some users who want that change and does not affect users who do not care about the change.
- Please see the [paperless-ngx merge process](#merging-prs) below.
## Python
Paperless supports python 3.8 and 3.9. We format Python code with [Black](https://github.com/psf/black).
## Branches
`main` always reflects the latest release. Apart from changes to the documentation or readme, absolutely no functional changes on this branch in between releases.
`dev` contains all changes that will be part of the next release. Use this branch to start making your changes.
`feature-X` branches are for experimental stuff that will eventually be merged into dev.
## Testing:
Please format and test your code! I know it's a hassle, but it makes sure that your code works now and will allow us to detect regressions easily.
To test your code, execute `pytest` in the src/ directory. This also generates a html coverage report, which you can use to see if you missed anything important during testing.
Before you can run `pytest`, ensure to [properly set up your local environment](https://paperless-ngx.readthedocs.io/en/latest/extending.html#initial-setup-and-first-start).
## More info:
... is available in the documentation. https://paperless-ngx.readthedocs.io/en/latest/extending.html
# Merging PRs
Once you have submitted a **P**ull **R**equest it will be reviewed, approved, and merged by one or more community members of any team. Automated code tests and formatting checks must be passed.
## Non-Trivial Requests
PRs deemed `non-trivial` will go through a stricter review process before being merged into `dev`. This is to ensure code quality and complete functionality (free of side effects).
Examples of `non-trivial` PRs might include:
- Additional features
- Large changes to many distinct files
- Breaking or depreciation of existing features
Our community review process for `non-trivial` PRs is the following:
1. Must pass usual automated code tests and formatting checks.
2. The PR will be assigned and pinged to the appropriately experienced team (i.e. @paperless-ngx/backend for backend changes).
3. Development team will check and test code manually (possibly over several days).
- You may be asked to make changes or rebase.
- The team may ask for additional testing done by @paperless-ngx/test
4. **At least two** members of the team will approve and finally merge the request into `dev` 🎉.
This process might be slow as community members have different schedules and time to dedicate to the Paperless project. However it ensures community code reviews are as brilliantly thorough as they once were with @jonaswinkler.
# Translating Paperless-ngx
Some notes about translation:
- There are two resources:
- `src-ui/messages.xlf` contains the translation strings for the front end. This is the most important.
- `django.po` contains strings for the administration section of paperless, which is nice to have translated.
- Most of the front-end strings are used on buttons, menu items, etc., so ideally the translated string should not be much longer than the English original.
- Translation units may contain placeholders. These usually mean that there's a name of a tag or document or something in the string. You can click on the placeholders to copy them.
- Translation units may contain plural expressions such as `{PLURAL_VAR, plural, =1 {one result} =0 {no results} other {<placeholder> results}}`. Copy these verbatim and translate only the content in the inner `{}` brackets. Example: `{PLURAL_VAR, plural, =1 {Ein Ergebnis} =0 {Keine Ergebnisse} other {<placeholder> Ergebnisse}}`
- Changes to translations on Crowdin will get pushed into the repository automatically.
## Adding new languages to the codebase
If a language has already been added, and you would like to contribute new translations or change existing translations, please read the "Translation" section in the README.md file for further details on that.
If you would like the project to be translated to another language, first head over to https://crwd.in/paperless-ngx to check if that language has already been enabled for translation.
If not, please request the language to be added by creating an issue on GitHub. The issue should contain:
- English name of the language (the localized name can be added on Crowdin).
- ISO language code. A list of those can be found here: https://support.crowdin.com/enterprise/language-codes/
- Date format commonly used for the language, e.g. dd/mm/yyyy, mm/dd/yyyy, etc.
After the language has been added and some translations have been made on Crowdin, the language needs to be enabled in the code.
Note that there is no need to manually add a .po of .xlf file as those will be automatically generated and imported from Crowdin.
The following files need to be changed:
- src-ui/angular.json (under the _projects/paperless-ui/i18n/locales_ JSON key)
- src/paperless/settings.py (in the _LANGUAGES_ array)
- src-ui/src/app/services/settings.service.ts (inside the _getLanguageOptions_ method)
- src-ui/src/app/app.module.ts (import locale from _angular/common/locales_ and call _registerLocaleData_)
Please add the language in the correct order, alphabetically by locale.
Note that _en-us_ needs to stay on top of the list, as it is the default project language
If you are familiar with Git, feel free to send a Pull Request with those changes.
If not, let us know in the issue you created for the language, so that another developer can make these changes.
# Organization Structure & Membership
Paperless-ngx is a community project. We do our best to delegate permission and responsibility among a team of people to ensure the longevity of the project.
## Structure
As of writing, there are 21 members in paperless-ngx. 4 of these people have complete administrative privileges to the repo:
- [@shamoon](https://github.com/shamoon)
- [@bauerj](https://github.com/bauerj)
- [@qcasey](https://github.com/qcasey)
- [@FrankStrieter](https://github.com/FrankStrieter)
There are 5 teams collaborating on specific tasks within paperless-ngx:
- @paperless-ngx/backend (Python / django)
- @paperless-ngx/frontend (JavaScript / Typescript)
- @paperless-ngx/ci-cd (GitHub Actions / Deployment)
- @paperless-ngx/issues (Issue triage)
- @paperless-ngx/test (General testing for larger PRs)
## Permissions
All team members are notified when mentioned or assigned to a relevant issue or pull request. Additionally, each team has slightly different access to paperless-ngx:
- The **test** team has no special permissions.
- The **issues** team has `triage` access. This means they can organize issues and pull requests.
- The **backend**, **frontend**, and **ci-cd** teams have `write` access. This means they can approve PRs and push code, containers, releases, and more.
## Joining
We are not overly strict with inviting people to the organization. If you have read the [team permissions](#permissions) and think having additional access would enhance your contributions, please reach out to an [admin](#structure) of the team.
The admins occasionally invite contributors directly if we believe having them on a team will accelerate their work.

View File

@@ -1,216 +1,46 @@
# syntax=docker/dockerfile:1.4
FROM python:3.5
MAINTAINER Pit Kleyersburg <pitkley@googlemail.com>
# Pull the installer images from the library
# These are all built previously
# They provide either a .deb or .whl
# Install dependencies
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
sudo \
tesseract-ocr tesseract-ocr-eng imagemagick ghostscript unpaper \
&& rm -rf /var/lib/apt/lists/*
ARG JBIG2ENC_VERSION
ARG QPDF_VERSION
ARG PIKEPDF_VERSION
ARG PSYCOPG2_VERSION
# Install python dependencies
RUN mkdir -p /usr/src/paperless
WORKDIR /usr/src/paperless
COPY requirements.txt /usr/src/paperless/
RUN pip install --no-cache-dir -r requirements.txt
FROM ghcr.io/paperless-ngx/paperless-ngx/builder/jbig2enc:${JBIG2ENC_VERSION} as jbig2enc-builder
FROM ghcr.io/paperless-ngx/paperless-ngx/builder/qpdf:${QPDF_VERSION} as qpdf-builder
FROM ghcr.io/paperless-ngx/paperless-ngx/builder/pikepdf:${PIKEPDF_VERSION} as pikepdf-builder
FROM ghcr.io/paperless-ngx/paperless-ngx/builder/psycopg2:${PSYCOPG2_VERSION} as psycopg2-builder
# Copy application
RUN mkdir -p /usr/src/paperless/src
RUN mkdir -p /usr/src/paperless/data
RUN mkdir -p /usr/src/paperless/media
COPY src/ /usr/src/paperless/src/
COPY data/ /usr/src/paperless/data/
COPY media/ /usr/src/paperless/media/
FROM --platform=$BUILDPLATFORM node:16-bullseye-slim AS compile-frontend
# Set consumption directory
ENV PAPERLESS_CONSUMPTION_DIR /consume
RUN mkdir -p $PAPERLESS_CONSUMPTION_DIR
# This stage compiles the frontend
# This stage runs once for the native platform, as the outputs are not
# dependent on target arch
# Inputs: None
# Migrate database
WORKDIR /usr/src/paperless/src
RUN ./manage.py migrate
COPY ./src-ui /src/src-ui
# Create user
RUN groupadd -g 1000 paperless \
&& useradd -u 1000 -g 1000 -d /usr/src/paperless paperless \
&& chown -Rh paperless:paperless /usr/src/paperless
WORKDIR /src/src-ui
RUN set -eux \
&& npm update npm -g \
&& npm ci --no-optional
RUN set -eux \
&& ./node_modules/.bin/ng build --configuration production
# Setup entrypoint
COPY scripts/docker-entrypoint.sh /sbin/docker-entrypoint.sh
RUN chmod 755 /sbin/docker-entrypoint.sh
FROM python:3.9-slim-bullseye as main-app
LABEL org.opencontainers.image.authors="paperless-ngx team <hello@paperless-ngx.com>"
LABEL org.opencontainers.image.documentation="https://paperless-ngx.readthedocs.io/en/latest/"
LABEL org.opencontainers.image.source="https://github.com/paperless-ngx/paperless-ngx"
LABEL org.opencontainers.image.url="https://github.com/paperless-ngx/paperless-ngx"
LABEL org.opencontainers.image.licenses="GPL-3.0-only"
ARG DEBIAN_FRONTEND=noninteractive
#
# Begin installation and configuration
# Order the steps below from least often changed to most
#
# copy jbig2enc
# Basically will never change again
COPY --from=jbig2enc-builder /usr/src/jbig2enc/src/.libs/libjbig2enc* /usr/local/lib/
COPY --from=jbig2enc-builder /usr/src/jbig2enc/src/jbig2 /usr/local/bin/
COPY --from=jbig2enc-builder /usr/src/jbig2enc/src/*.h /usr/local/include/
# Packages need for running
ARG RUNTIME_PACKAGES="\
curl \
file \
# fonts for text file thumbnail generation
fonts-liberation \
gettext \
ghostscript \
gnupg \
gosu \
icc-profiles-free \
imagemagick \
media-types \
liblept5 \
libpq5 \
libxml2 \
liblcms2-2 \
libtiff5 \
libxslt1.1 \
libfreetype6 \
libwebp6 \
libopenjp2-7 \
libimagequant0 \
libraqm0 \
libgnutls30 \
libjpeg62-turbo \
optipng \
python3 \
python3-pip \
python3-setuptools \
postgresql-client \
# For Numpy
libatlas3-base \
# thumbnail size reduction
pngquant \
# OCRmyPDF dependencies
tesseract-ocr \
tesseract-ocr-eng \
tesseract-ocr-deu \
tesseract-ocr-fra \
tesseract-ocr-ita \
tesseract-ocr-spa \
tzdata \
unpaper \
# Mime type detection
zlib1g \
# Barcode splitter
libzbar0 \
poppler-utils"
# Install basic runtime packages.
# These change very infrequently
RUN set -eux \
echo "Installing system packages" \
&& apt-get update \
&& apt-get install --yes --quiet --no-install-recommends ${RUNTIME_PACKAGES} \
&& rm -rf /var/lib/apt/lists/* \
&& echo "Installing supervisor" \
&& python3 -m pip install --default-timeout=1000 --upgrade --no-cache-dir supervisor==4.2.4
# Copy gunicorn config
# Changes very infrequently
WORKDIR /usr/src/paperless/
COPY gunicorn.conf.py .
# setup docker-specific things
# Use mounts to avoid copying installer files into the image
# These change sometimes, but rarely
WORKDIR /usr/src/paperless/src/docker/
RUN --mount=type=bind,readwrite,source=docker,target=./ \
set -eux \
&& echo "Configuring ImageMagick" \
&& cp imagemagick-policy.xml /etc/ImageMagick-6/policy.xml \
&& echo "Configuring supervisord" \
&& mkdir /var/log/supervisord /var/run/supervisord \
&& cp supervisord.conf /etc/supervisord.conf \
&& echo "Setting up Docker scripts" \
&& cp docker-entrypoint.sh /sbin/docker-entrypoint.sh \
&& chmod 755 /sbin/docker-entrypoint.sh \
&& cp docker-prepare.sh /sbin/docker-prepare.sh \
&& chmod 755 /sbin/docker-prepare.sh \
&& cp wait-for-redis.py /sbin/wait-for-redis.py \
&& chmod 755 /sbin/wait-for-redis.py \
&& echo "Installing managment commands" \
&& chmod +x install_management_commands.sh \
&& ./install_management_commands.sh
# Install the built packages from the installer library images
# Use mounts to avoid copying installer files into the image
# These change sometimes
RUN --mount=type=bind,from=qpdf-builder,target=/qpdf \
--mount=type=bind,from=psycopg2-builder,target=/psycopg2 \
--mount=type=bind,from=pikepdf-builder,target=/pikepdf \
set -eux \
&& echo "Installing qpdf" \
&& apt-get install --yes --no-install-recommends /qpdf/usr/src/qpdf/libqpdf28_*.deb \
&& apt-get install --yes --no-install-recommends /qpdf/usr/src/qpdf/qpdf_*.deb \
&& echo "Installing pikepdf and dependencies" \
&& python3 -m pip install --no-cache-dir /pikepdf/usr/src/pikepdf/wheels/packaging*.whl \
&& python3 -m pip install --no-cache-dir /pikepdf/usr/src/pikepdf/wheels/lxml*.whl \
&& python3 -m pip install --no-cache-dir /pikepdf/usr/src/pikepdf/wheels/Pillow*.whl \
&& python3 -m pip install --no-cache-dir /pikepdf/usr/src/pikepdf/wheels/pyparsing*.whl \
&& python3 -m pip install --no-cache-dir /pikepdf/usr/src/pikepdf/wheels/pikepdf*.whl \
&& python -m pip list \
&& echo "Installing psycopg2" \
&& python3 -m pip install --no-cache-dir /psycopg2/usr/src/psycopg2/wheels/psycopg2*.whl \
&& python -m pip list
# Python dependencies
# Change pretty frequently
COPY requirements.txt ../
# Packages needed only for building a few quick Python
# dependencies
ARG BUILD_PACKAGES="\
build-essential \
python3-dev"
RUN set -eux \
&& echo "Installing build system packages" \
&& apt-get update \
&& apt-get install --yes --quiet --no-install-recommends ${BUILD_PACKAGES} \
&& python3 -m pip install --no-cache-dir --upgrade wheel \
&& echo "Installing Python requirements" \
&& python3 -m pip install --default-timeout=1000 --no-cache-dir -r ../requirements.txt \
&& echo "Cleaning up image" \
&& apt-get -y purge ${BUILD_PACKAGES} \
&& apt-get -y autoremove --purge \
&& apt-get clean --yes \
&& rm -rf /var/lib/apt/lists/* \
&& rm -rf /tmp/* \
&& rm -rf /var/tmp/* \
&& rm -rf /var/cache/apt/archives/* \
&& truncate -s 0 /var/log/*log
WORKDIR /usr/src/paperless/src/
# copy backend
COPY ./src ./
# copy frontend
COPY --from=compile-frontend /src/src/documents/static/frontend/ ./documents/static/frontend/
# add users, setup scripts
RUN set -eux \
&& addgroup --gid 1000 paperless \
&& useradd --uid 1000 --gid paperless --home-dir /usr/src/paperless paperless \
&& chown -R paperless:paperless ../ \
&& gosu paperless python3 manage.py collectstatic --clear --no-input \
&& gosu paperless python3 manage.py compilemessages
VOLUME ["/usr/src/paperless/data", \
"/usr/src/paperless/media", \
"/usr/src/paperless/consume", \
"/usr/src/paperless/export"]
# Mount volumes
VOLUME ["/usr/src/paperless/data", "/usr/src/paperless/media", "/consume"]
ENTRYPOINT ["/sbin/docker-entrypoint.sh"]
EXPOSE 8000
CMD ["/usr/local/bin/supervisord", "-c", "/etc/supervisord.conf"]
CMD ["--help"]

72
Pipfile
View File

@@ -1,72 +0,0 @@
[[source]]
url = "https://pypi.python.org/simple"
verify_ssl = true
name = "pypi"
[[source]]
url = "https://www.piwheels.org/simple"
verify_ssl = true
name = "piwheels"
[packages]
dateparser = "~=1.1"
django = "~=4.0"
django-cors-headers = "*"
django-extensions = "*"
django-filter = "~=21.1"
django-q = "~=1.3"
djangorestframework = "~=3.13"
filelock = "*"
fuzzywuzzy = {extras = ["speedup"], version = "*"}
gunicorn = "*"
imap-tools = "~=0.54.0"
langdetect = "*"
pathvalidate = "*"
pillow = "~=9.1"
# Any version update to pikepdf requires a base image update
pikepdf = "~=5.1"
python-gnupg = "*"
python-dotenv = "*"
python-dateutil = "*"
python-magic = "*"
# Any version update to psycopg2 requires a base image update
psycopg2 = "*"
redis = "*"
# Pinned because aarch64 wheels and updates cause warnings when loading the classifier model.
scikit-learn="==1.0.2"
whitenoise = "~=6.0.0"
watchdog = "~=2.1.0"
whoosh="~=2.7.4"
inotifyrecursive = "~=0.3"
ocrmypdf = "~=13.4"
tqdm = "*"
tika = "*"
# TODO: This will sadly also install daphne+dependencies,
# which an ASGI server we don't need. Adds about 15MB image size.
channels = "~=3.0"
channels-redis = "*"
uvicorn = {extras = ["standard"], version = "*"}
concurrent-log-handler = "*"
"pdfminer.six" = "*"
"backports.zoneinfo" = {version = "*", markers = "python_version < '3.9'"}
"importlib-resources" = {version = "*", markers = "python_version < '3.9'"}
zipp = {version = "*", markers = "python_version < '3.9'"}
pyzbar = "*"
pdf2image = "*"
[dev-packages]
coveralls = "*"
factory-boy = "*"
pycodestyle = "*"
pytest = "*"
pytest-cov = "*"
pytest-django = "*"
pytest-env = "*"
pytest-sugar = "*"
pytest-xdist = "*"
sphinx = "~=4.5.0"
sphinx_rtd_theme = "*"
tox = "*"
black = "*"
pre-commit = "*"
myst-parser = "*"

2177
Pipfile.lock generated

File diff suppressed because it is too large Load Diff

120
README.md
View File

@@ -1,120 +0,0 @@
[![ci](https://github.com/paperless-ngx/paperless-ngx/workflows/ci/badge.svg)](https://github.com/paperless-ngx/paperless-ngx/actions)
[![Crowdin](https://badges.crowdin.net/paperless-ngx/localized.svg)](https://crowdin.com/project/paperless-ngx)
[![Documentation Status](https://readthedocs.org/projects/paperless-ngx/badge/?version=latest)](https://paperless-ngx.readthedocs.io/en/latest/?badge=latest)
[![Coverage Status](https://coveralls.io/repos/github/paperless-ngx/paperless-ngx/badge.svg?branch=master)](https://coveralls.io/github/paperless-ngx/paperless-ngx?branch=master)
[![Chat on Matrix](https://matrix.to/img/matrix-badge.svg)](https://matrix.to/#/#paperless:adnidor.de)
<p align="center">
<img src="https://github.com/paperless-ngx/paperless-ngx/raw/main/resources/logo/web/png/Black%20logo%20-%20no%20background.png#gh-light-mode-only" width="50%" />
<img src="https://github.com/paperless-ngx/paperless-ngx/raw/main/resources/logo/web/png/White%20logo%20-%20no%20background.png#gh-dark-mode-only" width="50%" />
</p>
<!-- omit in toc -->
# Paperless-ngx
Paperless-ngx is a document management system that transforms your physical documents into a searchable online archive so you can keep, well, _less paper_.
Paperless-ngx forked from [paperless-ng](https://github.com/jonaswinkler/paperless-ng) to continue the great work and distribute responsibility of supporting and advancing the project among a team of people. [Consider joining us!](#community-support) Discussion of this transition can be found in issues
[#1599](https://github.com/jonaswinkler/paperless-ng/issues/1599) and [#1632](https://github.com/jonaswinkler/paperless-ng/issues/1632).
A demo is available at [demo.paperless-ngx.com](https://demo.paperless-ngx.com) using login `demo` / `demo`. _Note: demo content is reset frequently and confidential information should not be uploaded._
- [Features](#features)
- [Getting started](#getting-started)
- [Contributing](#contributing)
- [Community Support](#community-support)
- [Translation](#translation)
- [Feature Requests](#feature-requests)
- [Bugs](#bugs)
- [Affiliated Projects](#affiliated-projects)
- [Important Note](#important-note)
# Features
![Dashboard](https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/docs/_static/screenshots/documents-wchrome.png#gh-light-mode-only)
![Dashboard](https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/docs/_static/screenshots/documents-wchrome-dark.png#gh-dark-mode-only)
- Organize and index your scanned documents with tags, correspondents, types, and more.
- Performs OCR on your documents, adds selectable text to image only documents and adds tags, correspondents and document types to your documents.
- Supports PDF documents, images, plain text files, and Office documents (Word, Excel, Powerpoint, and LibreOffice equivalents).
- Office document support is optional and provided by Apache Tika (see [configuration](https://paperless-ngx.readthedocs.io/en/latest/configuration.html#tika-settings))
- Paperless stores your documents plain on disk. Filenames and folders are managed by paperless and their format can be configured freely.
- Single page application front end.
- Includes a dashboard that shows basic statistics and has document upload.
- Filtering by tags, correspondents, types, and more.
- Customizable views can be saved and displayed on the dashboard.
- Full text search helps you find what you need.
- Auto completion suggests relevant words from your documents.
- Results are sorted by relevance to your search query.
- Highlighting shows you which parts of the document matched the query.
- Searching for similar documents ("More like this")
- Email processing: Paperless adds documents from your email accounts.
- Configure multiple accounts and filters for each account.
- When adding documents from mail, paperless can move these mail to a new folder, mark them as read, flag them as important or delete them.
- Machine learning powered document matching.
- Paperless-ngx learns from your documents and will be able to automatically assign tags, correspondents and types to documents once you've stored a few documents in paperless.
- Optimized for multi core systems: Paperless-ngx consumes multiple documents in parallel.
- The integrated sanity checker makes sure that your document archive is in good health.
- [More screenshots are available in the documentation](https://paperless-ngx.readthedocs.io/en/latest/screenshots.html).
# Getting started
The easiest way to deploy paperless is docker-compose. The files in the [`/docker/compose` directory](https://github.com/paperless-ngx/paperless-ngx/tree/main/docker/compose) are configured to pull the image from Github Packages.
If you'd like to jump right in, you can configure a docker-compose environment with our install script:
```bash
bash -c "$(curl -L https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/install-paperless-ngx.sh)"
```
Alternatively, you can install the dependencies and setup apache and a database server yourself. The [documentation](https://paperless-ngx.readthedocs.io/en/latest/setup.html#installation) has a step by step guide on how to do it.
Migrating from Paperless-ng is easy, just drop in the new docker image! See the [documentation on migrating](https://paperless-ngx.readthedocs.io/en/latest/setup.html#migrating-from-paperless-ng) for more details.
<!-- omit in toc -->
### Documentation
The documentation for Paperless-ngx is available on [ReadTheDocs](https://paperless-ngx.readthedocs.io/).
# Contributing
If you feel like contributing to the project, please do! Bug fixes, enhancements, visual fixes etc. are always welcome. If you want to implement something big: Please start a discussion about that! The [documentation](https://paperless-ngx.readthedocs.io/en/latest/extending.html) has some basic information on how to get started.
## Community Support
People interested in continuing the work on paperless-ngx are encouraged to reach out here on github and in the [Matrix Room](https://matrix.to/#/#paperless:adnidor.de). If you would like to contribute to the project on an ongoing basis there are multiple [teams](https://github.com/orgs/paperless-ngx/people) (frontend, ci/cd, etc) that could use your help so please reach out!
## Translation
Paperless-ngx is available in many languages that are coordinated on Crowdin. If you want to help out by translating paperless-ngx into your language, please head over to https://crwd.in/paperless-ngx, and thank you! More details can be found in [CONTRIBUTING.md](https://github.com/paperless-ngx/paperless-ngx/blob/main/CONTRIBUTING.md#translating-paperless-ngx).
## Feature Requests
Feature requests can be submitted via [GitHub Discussions](https://github.com/paperless-ngx/paperless-ngx/discussions/categories/feature-requests), you can search for existing ideas, add your own and vote for the ones you care about.
## Bugs
For bugs please [open an issue](https://github.com/paperless-ngx/paperless-ngx/issues) or [start a discussion](https://github.com/paperless-ngx/paperless-ngx/discussions) if you have questions.
# Affiliated Projects
Paperless has been around a while now, and people are starting to build stuff on top of it. If you're one of those people, we can add your project to this list:
- [Paperless App](https://github.com/bauerj/paperless_app): An Android/iOS app for Paperless-ngx. Also works with the original Paperless and Paperless-ngx.
- [Paperless Share](https://github.com/qcasey/paperless_share). Share any files from your Android application with paperless. Very simple, but works with all of the mobile scanning apps out there that allow you to share scanned documents.
- [Scan to Paperless](https://github.com/sbrunner/scan-to-paperless): Scan and prepare (crop, deskew, OCR, ...) your documents for Paperless.
These projects also exist, but their status and compatibility with paperless-ngx is unknown.
- [paperless-cli](https://github.com/stgarf/paperless-cli): A golang command line binary to interact with a Paperless instance.
This project also exists, but needs updates to be compatible with paperless-ngx.
- [Paperless Desktop](https://github.com/thomasbrueggemann/paperless-desktop): A desktop UI for your Paperless installation. Runs on Mac, Linux, and Windows.
Known issues on Mac: (Could not load reminders and documents)
# Important Note
Document scanners are typically used to scan sensitive documents. Things like your social insurance number, tax records, invoices, etc. Everything is stored in the clear without encryption. This means that Paperless should never be run on an untrusted host. Instead, I recommend that if you do want to use it, run it locally on a server in your own home.

140
README.rst Normal file
View File

@@ -0,0 +1,140 @@
Paperless
#########
|Documentation|
|Chat|
|Travis|
|Dependencies|
Scan, index, and archive all of your paper documents
I hate paper. Environmental issues aside, it's a tech person's nightmare:
* There's no search feature
* It takes up physical space
* Backups mean more paper
In the past few months I've been bitten more than a few times by the problem
of not having the right document around. Sometimes I recycled a document I
needed (who keeps water bills for two years?) and other times I just lost
it... because paper. I wrote this to make my life easier.
How it Works
============
1. Buy a document scanner like `this one`_.
2. Set it up to "scan to FTP" or something similar. It should be able to push
scanned images to a server without you having to do anything. If your
scanner doesn't know how to automatically upload the file somewhere, you can
always do that manually. Paperless doesn't care how the documents get into
its local consumption directory.
3. Have the target server run the Paperless consumption script to OCR the PDF
and index it into a local database.
4. Use the web frontend to sift through the database and find what you want.
5. Download the PDF you need/want via the web interface and do whatever you
like with it. You can even print it and send it as if it's the original.
In most cases, no one will care or notice.
Here's what you get:
.. image:: docs/_static/screenshot.png
:alt: The before and after
:target: docs/_static/screenshot.png
Stability
=========
Paperless is still under active development (just look at the git commit
history) so don't expect it to be 100% stable. I'm using it for my own
documents, but I'm crazy like that. If you use this and it breaks something,
you get to keep all the shiny pieces.
Requirements
============
This is all really a quite simple, shiny, user-friendly wrapper around some very
powerful tools.
* `ImageMagick`_ converts the images between colour and greyscale.
* `Tesseract`_ does the character recognition.
* `Unpaper`_ despeckles and deskews the scanned image.
* `GNU Privacy Guard`_ is used as the encryption backend.
* `Python 3`_ is the language of the project.
* `Pillow`_ loads the image data as a python object to be used with PyOCR.
* `PyOCR`_ is a slick programmatic wrapper around tesseract.
* `Django`_ is the framework this project is written against.
* `Python-GNUPG`_ decrypts the PDFs on-the-fly to allow you to download
unencrypted files, leaving the encrypted ones on-disk.
Documentation
=============
It's all available on `ReadTheDocs`_.
Similar Projects
================
There's another project out there called `Mayan EDMS`_ that has a surprising
amount of technical overlap with Paperless. Also based on Django and using
a consumer model with Tesseract and unpaper, Mayan EDMS is *much* more
featureful and comes with a slick UI as well. It may be that Paperless is
better suited for low-resource environments (like a Rasberry Pi), but to be
honest, this is just a guess as I haven't tested this myself. One thing's
for certain though, *Paperless* is a **much** better name.
Important Note
==============
Document scanners are typically used to scan sensitive documents. Things like
your social insurance number, tax records, invoices, etc. While paperless
encrypts the original PDFs via the consumption script, the OCR'd text is *not*
encrypted and is therefore stored in the clear (it needs to be searchable, so
if someone has ideas on how to do that on encrypted data, I'm all ears). This
means that paperless should never be run on an untrusted host. Instead, I
recommend that if you do want to use it, run it locally on a server in your own
home.
Donations
=========
As with all Free software, the power is less in the finances and more in the
collective efforts. I really appreciate every pull request and bug report
offered up by Paperless' users, so please keep that stuff coming. If however,
you're not one for coding/design/documentation, and would like to contribute
financially, I won't say no ;-)
The thing is, I'm doing ok for money, so I would instead ask you to donate to
the `United Nations High Commissioner for Refugees`_. They're doing important
work and they need the money a lot more than I do.
.. _this one: http://www.brother.ca/en-CA/Scanners/11/ProductDetail/ADS1500W?ProductDetail=productdetail
.. _ImageMagick: http://imagemagick.org/
.. _Tesseract: https://github.com/tesseract-ocr
.. _Unpaper: https://www.flameeyes.eu/projects/unpaper
.. _GNU Privacy Guard: https://gnupg.org/
.. _Python 3: https://python.org/
.. _Pillow: https://pypi.python.org/pypi/pillowfight/
.. _PyOCR: https://github.com/jflesch/pyocr
.. _Django: https://www.djangoproject.com/
.. _Python-GNUPG: http://pythonhosted.org/python-gnupg/
.. _ReadTheDocs: https://paperless.readthedocs.org/
.. _Mayan EDMS: https://mayan.readthedocs.org/en/latest/
.. _United Nations High Commissioner for Refugees: https://donate.unhcr.org/int-en/general
.. |Documentation| image:: https://readthedocs.org/projects/paperless/badge/?version=latest
:alt: Read the documentation at https://paperless.readthedocs.org/
:target: https://paperless.readthedocs.org/
.. |Chat| image:: https://badges.gitter.im/danielquinn/paperless.svg
:alt: Join the chat at https://gitter.im/danielquinn/paperless
:target: https://gitter.im/danielquinn/paperless?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge
.. |Travis| image:: https://travis-ci.org/danielquinn/paperless.svg?branch=master
:target: https://travis-ci.org/danielquinn/paperless
.. |Dependencies| image:: https://www.versioneye.com/user/projects/57b33b81d9f1b00016faa500/badge.svg?style=flat-square
:target: https://www.versioneye.com/user/projects/57b33b81d9f1b00016faa500

15
Vagrantfile vendored Normal file
View File

@@ -0,0 +1,15 @@
# -*- mode: ruby -*-
# vi: set ft=ruby :
VAGRANT_API_VERSION = "2"
Vagrant.configure(VAGRANT_API_VERSION) do |config|
config.vm.box = "ubuntu/trusty64"
# Provision using shell
config.vm.host_name = "dev.paperless"
config.vm.synced_folder ".", "/opt/paperless"
config.vm.provision "shell", path: "scripts/vagrant-provision"
# Networking details
config.vm.network "private_network", ip: "172.28.128.4"
end

View File

@@ -1,42 +0,0 @@
#!/usr/bin/env bash
# Helper script for building the Docker image locally.
# Parses and provides the nessecary versions of other images to Docker
# before passing in the rest of script args.
# First Argument: The Dockerfile to build
# Other Arguments: Additional arguments to docker build
# Example Usage:
# ./build-docker-image.sh Dockerfile -t paperless-ngx:my-awesome-feature
set -eux
if ! command -v jq; then
echo "jq required"
exit 1
elif [ ! -f "$1" ]; then
echo "$1 is not a file, please provide the Dockerfile"
exit 1
fi
# Parse what we can from Pipfile.lock
pikepdf_version=$(jq ".default.pikepdf.version" Pipfile.lock | sed 's/=//g' | sed 's/"//g')
psycopg2_version=$(jq ".default.psycopg2.version" Pipfile.lock | sed 's/=//g' | sed 's/"//g')
# Read this from the other config file
qpdf_version=$(jq ".qpdf.version" .build-config.json | sed 's/"//g')
jbig2enc_version=$(jq ".jbig2enc.version" .build-config.json | sed 's/"//g')
# Get the branch name (used for caching)
branch_name=$(git rev-parse --abbrev-ref HEAD)
# https://docs.docker.com/develop/develop-images/build_enhancements/
# Required to use cache-from
export DOCKER_BUILDKIT=1
docker build --file "$1" \
--cache-from ghcr.io/paperless-ngx/paperless-ngx/builder/cache/app:"${branch_name}" \
--cache-from ghcr.io/paperless-ngx/paperless-ngx/builder/cache/app:dev \
--build-arg JBIG2ENC_VERSION="${jbig2enc_version}" \
--build-arg QPDF_VERSION="${qpdf_version}" \
--build-arg PIKEPDF_VERSION="${pikepdf_version}" \
--build-arg PSYCOPG2_VERSION="${psycopg2_version}" "${@:2}" .

View File

@@ -1,6 +0,0 @@
commit_message: '[ci skip]'
files:
- source: /src/locale/en_US/LC_MESSAGES/django.po
translation: /src/locale/%locale_with_underscore%/LC_MESSAGES/django.po
- source: /src-ui/messages.xlf
translation: /src-ui/src/locale/messages.%locale_with_underscore%.xlf

View File

@@ -1,14 +0,0 @@
# This Dockerfile compiles the frontend
# Inputs: None
FROM node:16-bullseye-slim AS compile-frontend
COPY ./src /src/src
COPY ./src-ui /src/src-ui
WORKDIR /src/src-ui
RUN set -eux \
&& npm update npm -g \
&& npm ci --no-optional
RUN set -eux \
&& ./node_modules/.bin/ng build --configuration production

View File

@@ -1,39 +0,0 @@
# This Dockerfile compiles the jbig2enc library
# Inputs:
# - JBIG2ENC_VERSION - the Git tag to checkout and build
FROM debian:bullseye-slim as main
LABEL org.opencontainers.image.description="A intermediate image with jbig2enc built"
ARG DEBIAN_FRONTEND=noninteractive
ARG BUILD_PACKAGES="\
build-essential \
automake \
libtool \
libleptonica-dev \
zlib1g-dev \
git \
ca-certificates"
WORKDIR /usr/src/jbig2enc
# As this is an base image for a multi-stage final image
# the added size of the install is basically irrelevant
RUN apt-get update --quiet \
&& apt-get install --yes --quiet --no-install-recommends ${BUILD_PACKAGES} \
&& rm -rf /var/lib/apt/lists/*
# Layers after this point change according to required version
# For better caching, seperate the basic installs from
# the building
ARG JBIG2ENC_VERSION
RUN set -eux \
&& git clone --quiet --branch $JBIG2ENC_VERSION https://github.com/agl/jbig2enc .
RUN set -eux \
&& ./autogen.sh
RUN set -eux \
&& ./configure && make

View File

@@ -1,92 +0,0 @@
# This Dockerfile builds the pikepdf wheel
# Inputs:
# - REPO - Docker repository to pull qpdf from
# - QPDF_VERSION - The image qpdf version to copy .deb files from
# - PIKEPDF_GIT_TAG - The Git tag to clone and build from
# - PIKEPDF_VERSION - Used to force the built pikepdf version to match
# Default to pulling from the main repo registry when manually building
ARG REPO="paperless-ngx/paperless-ngx"
ARG QPDF_VERSION
FROM ghcr.io/${REPO}/builder/qpdf:${QPDF_VERSION} as qpdf-builder
# This does nothing, except provide a name for a copy below
FROM python:3.9-slim-bullseye as main
LABEL org.opencontainers.image.description="A intermediate image with pikepdf wheel built"
ARG DEBIAN_FRONTEND=noninteractive
ARG BUILD_PACKAGES="\
build-essential \
python3-dev \
python3-pip \
git \
# qpdf requirement - https://github.com/qpdf/qpdf#crypto-providers
libgnutls28-dev \
# lxml requrements - https://lxml.de/installation.html
libxml2-dev \
libxslt1-dev \
# Pillow requirements - https://pillow.readthedocs.io/en/stable/installation.html#external-libraries
# JPEG functionality
libjpeg62-turbo-dev \
# conpressed PNG
zlib1g-dev \
# compressed TIFF
libtiff-dev \
# type related services
libfreetype-dev \
# color management
liblcms2-dev \
# WebP format
libwebp-dev \
# JPEG 2000
libopenjp2-7-dev \
# improved color quantization
libimagequant-dev \
# complex text layout support
libraqm-dev"
WORKDIR /usr/src
COPY --from=qpdf-builder /usr/src/qpdf/*.deb ./
# As this is an base image for a multi-stage final image
# the added size of the install is basically irrelevant
RUN set -eux \
&& apt-get update --quiet \
&& apt-get install --yes --quiet --no-install-recommends $BUILD_PACKAGES \
&& dpkg --install libqpdf28_*.deb \
&& dpkg --install libqpdf-dev_*.deb \
&& python3 -m pip install --no-cache-dir --upgrade \
pip \
wheel \
# https://pikepdf.readthedocs.io/en/latest/installation.html#requirements
pybind11 \
&& rm -rf /var/lib/apt/lists/*
# Layers after this point change according to required version
# For better caching, seperate the basic installs from
# the building
ARG PIKEPDF_GIT_TAG
ARG PIKEPDF_VERSION
RUN set -eux \
&& echo "building pikepdf wheel" \
# Note the v in the tag name here
&& git clone --quiet --depth 1 --branch "${PIKEPDF_GIT_TAG}" https://github.com/pikepdf/pikepdf.git \
&& cd pikepdf \
# pikepdf seems to specifciy either a next version when built OR
# a post release tag.
# In either case, this won't match what we want from requirements.txt
# Directly modify the setup.py to set the version we just checked out of Git
&& sed -i "s/use_scm_version=True/version=\"${PIKEPDF_VERSION}\"/g" setup.py \
# https://github.com/pikepdf/pikepdf/issues/323
&& rm pyproject.toml \
&& mkdir wheels \
&& python3 -m pip wheel . --wheel-dir wheels \
&& ls -ahl wheels

View File

@@ -1,45 +0,0 @@
# This Dockerfile builds the psycopg2 wheel
# Inputs:
# - PSYCOPG2_GIT_TAG - The Git tag to clone and build from
# - PSYCOPG2_VERSION - Unused, kept for future possible usage
FROM python:3.9-slim-bullseye as main
LABEL org.opencontainers.image.description="A intermediate image with psycopg2 wheel built"
ARG DEBIAN_FRONTEND=noninteractive
ARG BUILD_PACKAGES="\
build-essential \
git \
python3-dev \
python3-pip \
# https://www.psycopg.org/docs/install.html#prerequisites
libpq-dev"
WORKDIR /usr/src
# As this is an base image for a multi-stage final image
# the added size of the install is basically irrelevant
RUN set -eux \
&& apt-get update --quiet \
&& apt-get install --yes --quiet --no-install-recommends $BUILD_PACKAGES \
&& rm -rf /var/lib/apt/lists/* \
&& python3 -m pip install --no-cache-dir --upgrade pip wheel
# Layers after this point change according to required version
# For better caching, seperate the basic installs from
# the building
ARG PSYCOPG2_GIT_TAG
ARG PSYCOPG2_VERSION
RUN set -eux \
&& echo "Building psycopg2 wheel" \
&& cd /usr/src \
&& git clone --quiet --depth 1 --branch ${PSYCOPG2_GIT_TAG} https://github.com/psycopg/psycopg2.git \
&& cd psycopg2 \
&& mkdir wheels \
&& python3 -m pip wheel . --wheel-dir wheels \
&& ls -ahl wheels/

View File

@@ -1,53 +0,0 @@
FROM debian:bullseye-slim as main
LABEL org.opencontainers.image.description="A intermediate image with qpdf built"
ARG DEBIAN_FRONTEND=noninteractive
ARG BUILD_PACKAGES="\
build-essential \
debhelper \
debian-keyring \
devscripts \
equivs \
libtool \
# https://qpdf.readthedocs.io/en/stable/installation.html#system-requirements
libjpeg62-turbo-dev \
libgnutls28-dev \
packaging-dev \
zlib1g-dev"
WORKDIR /usr/src
# As this is an base image for a multi-stage final image
# the added size of the install is basically irrelevant
RUN set -eux \
&& apt-get update --quiet \
&& apt-get install --yes --quiet --no-install-recommends $BUILD_PACKAGES \
&& rm -rf /var/lib/apt/lists/*
# Layers after this point change according to required version
# For better caching, seperate the basic installs from
# the building
# This must match to pikepdf's minimum at least
ARG QPDF_VERSION
# In order to get the required version of qpdf, it is backported from bookwork
# and then built from source
RUN set -eux \
&& echo "Building qpdf" \
&& echo "deb-src http://deb.debian.org/debian/ bookworm main" > /etc/apt/sources.list.d/bookworm-src.list \
&& apt-get update \
&& mkdir qpdf \
&& cd qpdf \
&& apt-get source --yes --quiet qpdf=${QPDF_VERSION}-1/bookworm \
&& rm -rf /var/lib/apt/lists/* \
&& cd qpdf-$QPDF_VERSION \
# We don't need to build the tests (also don't run them)
&& rm -rf libtests \
&& DEBEMAIL=hello@paperless-ngx.com debchange --bpo \
&& export DEB_BUILD_OPTIONS="terse nocheck nodoc parallel=2" \
&& dpkg-buildpackage --build=binary --unsigned-source --unsigned-changes \
&& ls -ahl ../*.deb

View File

@@ -0,0 +1,15 @@
# Environment variables to set for Paperless
# Commented out variables will be replaced by a default within Paperless.
# Passphrase Paperless uses to encrypt and decrypt your documents
PAPERLESS_PASSPHRASE=CHANGE_ME
# The amount of threads to use for text recognition
# PAPERLESS_OCR_THREADS=4
# Additional languages to install for text recognition
# PAPERLESS_OCR_LANGUAGES=deu ita
# You can change the default user and group id to a custom one
# USERMAP_UID=1000
# USERMAP_GID=1000

View File

@@ -0,0 +1,41 @@
version: '2'
services:
webserver:
image: pitkley/paperless
ports:
# You can adapt the port you want Paperless to listen on by
# modifying the part before the `:`.
- "8000:8000"
volumes:
- data:/usr/src/paperless/data
- media:/usr/src/paperless/media
env_file: docker-compose.env
# The reason the line is here is so that the webserver that doesn't do
# any text recognition and doesn't have to install unnecessary
# languages the user might have set in the env-file by overwriting the
# value with nothing.
environment:
- PAPERLESS_OCR_LANGUAGES=
command: ["runserver", "0.0.0.0:8000"]
consumer:
image: pitkley/paperless
volumes:
- data:/usr/src/paperless/data
- media:/usr/src/paperless/media
# You have to adapt the local path you want the consumption
# directory to mount to by modifying the part before the ':'.
- /path/to/arbitrary/place:/consume
# Likewise, you can add a local path to mount a directory for
# exporting. This is not strictly needed for paperless to
# function, only if you're exporting your files: uncomment
# it and fill in a local path if you know you're going to
# want to export your documents.
# - /path/to/another/arbitrary/place:/export
env_file: docker-compose.env
command: ["document_consumer"]
volumes:
data:
media:

View File

@@ -1 +0,0 @@
COMPOSE_PROJECT_NAME=paperless

View File

@@ -1,38 +0,0 @@
# The UID and GID of the user used to run paperless in the container. Set this
# to your UID and GID on the host so that you have write access to the
# consumption directory.
#USERMAP_UID=1000
#USERMAP_GID=1000
# Additional languages to install for text recognition, separated by a
# whitespace. Note that this is
# different from PAPERLESS_OCR_LANGUAGE (default=eng), which defines the
# language used for OCR.
# The container installs English, German, Italian, Spanish and French by
# default.
# See https://packages.debian.org/search?keywords=tesseract-ocr-&searchon=names&suite=buster
# for available languages.
#PAPERLESS_OCR_LANGUAGES=tur ces
###############################################################################
# Paperless-specific settings #
###############################################################################
# All settings defined in the paperless.conf.example can be used here. The
# Docker setup does not use the configuration file.
# A few commonly adjusted settings are provided below.
# This is required if you will be exposing Paperless-ngx on a public domain
# (if doing so please consider security measures such as reverse proxy)
#PAPERLESS_URL=https://paperless.example.com
# Adjust this key if you plan to make paperless available publicly. It should
# be a very long sequence of random characters. You don't need to remember it.
#PAPERLESS_SECRET_KEY=change-me
# Use this variable to set a timezone for the Paperless Docker containers. If not specified, defaults to UTC.
#PAPERLESS_TIME_ZONE=America/Los_Angeles
# The default language to use for OCR. Set this to the language most of your
# documents are written in.
#PAPERLESS_OCR_LANGUAGE=eng

View File

@@ -1,97 +0,0 @@
# docker-compose file for running paperless from the Docker Hub.
# This file contains everything paperless needs to run.
# Paperless supports amd64, arm and arm64 hardware.
#
# All compose files of paperless configure paperless in the following way:
#
# - Paperless is (re)started on system boot, if it was running before shutdown.
# - Docker volumes for storing data are managed by Docker.
# - Folders for importing and exporting files are created in the same directory
# as this file and mounted to the correct folders inside the container.
# - Paperless listens on port 8010.
#
# In addition to that, this docker-compose file adds the following optional
# configurations:
#
# - Instead of SQLite (default), PostgreSQL is used as the database server.
#
# To install and update paperless with this file, do the following:
#
# - Open portainer Stacks list and click 'Add stack'
# - Paste the contents of this file and assign a name, e.g. 'Paperless'
# - Click 'Deploy the stack' and wait for it to be deployed
# - Open the list of containers, select paperless_webserver_1
# - Click 'Console' and then 'Connect' to open the command line inside the container
# - Run 'python3 manage.py createsuperuser' to create a user
# - Exit the console
#
# For more extensive installation and update instructions, refer to the
# documentation.
version: "3.4"
services:
broker:
image: redis:6.0
restart: unless-stopped
volumes:
- redisdata:/data
db:
image: postgres:13
restart: unless-stopped
volumes:
- pgdata:/var/lib/postgresql/data
environment:
POSTGRES_DB: paperless
POSTGRES_USER: paperless
POSTGRES_PASSWORD: paperless
webserver:
image: ghcr.io/paperless-ngx/paperless-ngx:latest
restart: unless-stopped
depends_on:
- db
- broker
ports:
- 8010:8000
healthcheck:
test: ["CMD", "curl", "-fs", "-S", "--max-time", "2", "http://localhost:8000"]
interval: 30s
timeout: 10s
retries: 5
volumes:
- data:/usr/src/paperless/data
- media:/usr/src/paperless/media
- ./export:/usr/src/paperless/export
- ./consume:/usr/src/paperless/consume
environment:
PAPERLESS_REDIS: redis://broker:6379
PAPERLESS_DBHOST: db
# The UID and GID of the user used to run paperless in the container. Set this
# to your UID and GID on the host so that you have write access to the
# consumption directory.
USERMAP_UID: 1000
USERMAP_GID: 100
# Additional languages to install for text recognition, separated by a
# whitespace. Note that this is
# different from PAPERLESS_OCR_LANGUAGE (default=eng), which defines the
# language used for OCR.
# The container installs English, German, Italian, Spanish and French by
# default.
# See https://packages.debian.org/search?keywords=tesseract-ocr-&searchon=names&suite=buster
# for available languages.
#PAPERLESS_OCR_LANGUAGES: tur ces
# Adjust this key if you plan to make paperless available publicly. It should
# be a very long sequence of random characters. You don't need to remember it.
#PAPERLESS_SECRET_KEY: change-me
# Use this variable to set a timezone for the Paperless Docker containers. If not specified, defaults to UTC.
#PAPERLESS_TIME_ZONE: America/Los_Angeles
# The default language to use for OCR. Set this to the language most of your
# documents are written in.
#PAPERLESS_OCR_LANGUAGE: eng
volumes:
data:
media:
pgdata:
redisdata:

View File

@@ -1,94 +0,0 @@
# docker-compose file for running paperless from the docker container registry.
# This file contains everything paperless needs to run.
# Paperless supports amd64, arm and arm64 hardware.
#
# All compose files of paperless configure paperless in the following way:
#
# - Paperless is (re)started on system boot, if it was running before shutdown.
# - Docker volumes for storing data are managed by Docker.
# - Folders for importing and exporting files are created in the same directory
# as this file and mounted to the correct folders inside the container.
# - Paperless listens on port 8000.
#
# In addition to that, this docker-compose file adds the following optional
# configurations:
#
# - Instead of SQLite (default), PostgreSQL is used as the database server.
# - Apache Tika and Gotenberg servers are started with paperless and paperless
# is configured to use these services. These provide support for consuming
# Office documents (Word, Excel, Power Point and their LibreOffice counter-
# parts.
#
# To install and update paperless with this file, do the following:
#
# - Copy this file as 'docker-compose.yml' and the files 'docker-compose.env'
# and '.env' into a folder.
# - Run 'docker-compose pull'.
# - Run 'docker-compose run --rm webserver createsuperuser' to create a user.
# - Run 'docker-compose up -d'.
#
# For more extensive installation and update instructions, refer to the
# documentation.
version: "3.4"
services:
broker:
image: redis:6.0
restart: unless-stopped
volumes:
- redisdata:/data
db:
image: postgres:13
restart: unless-stopped
volumes:
- pgdata:/var/lib/postgresql/data
environment:
POSTGRES_DB: paperless
POSTGRES_USER: paperless
POSTGRES_PASSWORD: paperless
webserver:
image: ghcr.io/paperless-ngx/paperless-ngx:latest
restart: unless-stopped
depends_on:
- db
- broker
- gotenberg
- tika
ports:
- 8000:8000
healthcheck:
test: ["CMD", "curl", "-fs", "-S", "--max-time", "2", "http://localhost:8000"]
interval: 30s
timeout: 10s
retries: 5
volumes:
- data:/usr/src/paperless/data
- media:/usr/src/paperless/media
- ./export:/usr/src/paperless/export
- ./consume:/usr/src/paperless/consume
env_file: docker-compose.env
environment:
PAPERLESS_REDIS: redis://broker:6379
PAPERLESS_DBHOST: db
PAPERLESS_TIKA_ENABLED: 1
PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
PAPERLESS_TIKA_ENDPOINT: http://tika:9998
gotenberg:
image: gotenberg/gotenberg:7.4
restart: unless-stopped
command:
- "gotenberg"
- "--chromium-disable-routes=true"
tika:
image: ghcr.io/paperless-ngx/tika:latest
restart: unless-stopped
volumes:
data:
media:
pgdata:
redisdata:

View File

@@ -1,75 +0,0 @@
# docker-compose file for running paperless from the Docker Hub.
# This file contains everything paperless needs to run.
# Paperless supports amd64, arm and arm64 hardware.
#
# All compose files of paperless configure paperless in the following way:
#
# - Paperless is (re)started on system boot, if it was running before shutdown.
# - Docker volumes for storing data are managed by Docker.
# - Folders for importing and exporting files are created in the same directory
# as this file and mounted to the correct folders inside the container.
# - Paperless listens on port 8000.
#
# In addition to that, this docker-compose file adds the following optional
# configurations:
#
# - Instead of SQLite (default), PostgreSQL is used as the database server.
#
# To install and update paperless with this file, do the following:
#
# - Copy this file as 'docker-compose.yml' and the files 'docker-compose.env'
# and '.env' into a folder.
# - Run 'docker-compose pull'.
# - Run 'docker-compose run --rm webserver createsuperuser' to create a user.
# - Run 'docker-compose up -d'.
#
# For more extensive installation and update instructions, refer to the
# documentation.
version: "3.4"
services:
broker:
image: redis:6.0
restart: unless-stopped
volumes:
- redisdata:/data
db:
image: postgres:13
restart: unless-stopped
volumes:
- pgdata:/var/lib/postgresql/data
environment:
POSTGRES_DB: paperless
POSTGRES_USER: paperless
POSTGRES_PASSWORD: paperless
webserver:
image: ghcr.io/paperless-ngx/paperless-ngx:latest
restart: unless-stopped
depends_on:
- db
- broker
ports:
- 8000:8000
healthcheck:
test: ["CMD", "curl", "-fs", "-S", "--max-time", "2", "http://localhost:8000"]
interval: 30s
timeout: 10s
retries: 5
volumes:
- data:/usr/src/paperless/data
- media:/usr/src/paperless/media
- ./export:/usr/src/paperless/export
- ./consume:/usr/src/paperless/consume
env_file: docker-compose.env
environment:
PAPERLESS_REDIS: redis://broker:6379
PAPERLESS_DBHOST: db
volumes:
data:
media:
pgdata:
redisdata:

View File

@@ -1,81 +0,0 @@
# docker-compose file for running paperless from the docker container registry.
# This file contains everything paperless needs to run.
# Paperless supports amd64, arm and arm64 hardware.
# All compose files of paperless configure paperless in the following way:
#
# - Paperless is (re)started on system boot, if it was running before shutdown.
# - Docker volumes for storing data are managed by Docker.
# - Folders for importing and exporting files are created in the same directory
# as this file and mounted to the correct folders inside the container.
# - Paperless listens on port 8000.
#
# SQLite is used as the database. The SQLite file is stored in the data volume.
#
# In addition to that, this docker-compose file adds the following optional
# configurations:
#
# - Apache Tika and Gotenberg servers are started with paperless and paperless
# is configured to use these services. These provide support for consuming
# Office documents (Word, Excel, Power Point and their LibreOffice counter-
# parts.
#
# To install and update paperless with this file, do the following:
#
# - Copy this file as 'docker-compose.yml' and the files 'docker-compose.env'
# and '.env' into a folder.
# - Run 'docker-compose pull'.
# - Run 'docker-compose run --rm webserver createsuperuser' to create a user.
# - Run 'docker-compose up -d'.
#
# For more extensive installation and update instructions, refer to the
# documentation.
version: "3.4"
services:
broker:
image: redis:6.0
restart: unless-stopped
volumes:
- redisdata:/data
webserver:
image: ghcr.io/paperless-ngx/paperless-ngx:latest
restart: unless-stopped
depends_on:
- broker
- gotenberg
- tika
ports:
- 8000:8000
healthcheck:
test: ["CMD", "curl", "-fs", "-S", "--max-time", "2", "http://localhost:8000"]
interval: 30s
timeout: 10s
retries: 5
volumes:
- data:/usr/src/paperless/data
- media:/usr/src/paperless/media
- ./export:/usr/src/paperless/export
- ./consume:/usr/src/paperless/consume
env_file: docker-compose.env
environment:
PAPERLESS_REDIS: redis://broker:6379
PAPERLESS_TIKA_ENABLED: 1
PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
PAPERLESS_TIKA_ENDPOINT: http://tika:9998
gotenberg:
image: gotenberg/gotenberg:7.4
restart: unless-stopped
command:
- "gotenberg"
- "--chromium-disable-routes=true"
tika:
image: ghcr.io/paperless-ngx/tika:latest
restart: unless-stopped
volumes:
data:
media:
redisdata:

View File

@@ -1,59 +0,0 @@
# docker-compose file for running paperless from the Docker Hub.
# This file contains everything paperless needs to run.
# Paperless supports amd64, arm and arm64 hardware.
#
# All compose files of paperless configure paperless in the following way:
#
# - Paperless is (re)started on system boot, if it was running before shutdown.
# - Docker volumes for storing data are managed by Docker.
# - Folders for importing and exporting files are created in the same directory
# as this file and mounted to the correct folders inside the container.
# - Paperless listens on port 8000.
#
# SQLite is used as the database. The SQLite file is stored in the data volume.
#
# To install and update paperless with this file, do the following:
#
# - Copy this file as 'docker-compose.yml' and the files 'docker-compose.env'
# and '.env' into a folder.
# - Run 'docker-compose pull'.
# - Run 'docker-compose run --rm webserver createsuperuser' to create a user.
# - Run 'docker-compose up -d'.
#
# For more extensive installation and update instructions, refer to the
# documentation.
version: "3.4"
services:
broker:
image: redis:6.0
restart: unless-stopped
volumes:
- redisdata:/data
webserver:
image: ghcr.io/paperless-ngx/paperless-ngx:latest
restart: unless-stopped
depends_on:
- broker
ports:
- 8000:8000
healthcheck:
test: ["CMD", "curl", "-fs", "-S", "--max-time", "2", "http://localhost:8000"]
interval: 30s
timeout: 10s
retries: 5
volumes:
- data:/usr/src/paperless/data
- media:/usr/src/paperless/media
- ./export:/usr/src/paperless/export
- ./consume:/usr/src/paperless/consume
env_file: docker-compose.env
environment:
PAPERLESS_REDIS: redis://broker:6379
volumes:
data:
media:
redisdata:

View File

@@ -1,92 +0,0 @@
#!/usr/bin/env bash
set -e
# Source: https://github.com/sameersbn/docker-gitlab/
map_uidgid() {
USERMAP_ORIG_UID=$(id -u paperless)
USERMAP_ORIG_GID=$(id -g paperless)
USERMAP_NEW_UID=${USERMAP_UID:-$USERMAP_ORIG_UID}
USERMAP_NEW_GID=${USERMAP_GID:-${USERMAP_ORIG_GID:-$USERMAP_NEW_UID}}
if [[ ${USERMAP_NEW_UID} != "${USERMAP_ORIG_UID}" || ${USERMAP_NEW_GID} != "${USERMAP_ORIG_GID}" ]]; then
echo "Mapping UID and GID for paperless:paperless to $USERMAP_NEW_UID:$USERMAP_NEW_GID"
usermod -o -u "${USERMAP_NEW_UID}" paperless
groupmod -o -g "${USERMAP_NEW_GID}" paperless
fi
}
initialize() {
map_uidgid
for dir in export data data/index media media/documents media/documents/originals media/documents/thumbnails; do
if [[ ! -d "../$dir" ]]; then
echo "Creating directory ../$dir"
mkdir ../$dir
fi
done
echo "Creating directory /tmp/paperless"
mkdir -p /tmp/paperless
set +e
echo "Adjusting permissions of paperless files. This may take a while."
chown -R paperless:paperless /tmp/paperless
find .. -not \( -user paperless -and -group paperless \) -exec chown paperless:paperless {} +
set -e
gosu paperless /sbin/docker-prepare.sh
}
install_languages() {
echo "Installing languages..."
local langs="$1"
read -ra langs <<<"$langs"
# Check that it is not empty
if [ ${#langs[@]} -eq 0 ]; then
return
fi
apt-get update
for lang in "${langs[@]}"; do
pkg="tesseract-ocr-$lang"
# English is installed by default
#if [[ "$lang" == "eng" ]]; then
# continue
#fi
if dpkg -s "$pkg" &>/dev/null; then
echo "Package $pkg already installed!"
continue
fi
if ! apt-cache show "$pkg" &>/dev/null; then
echo "Package $pkg not found! :("
continue
fi
echo "Installing package $pkg..."
if ! apt-get -y install "$pkg" &>/dev/null; then
echo "Could not install $pkg"
exit 1
fi
done
}
echo "Paperless-ngx docker container starting..."
# Install additional languages if specified
if [[ -n "$PAPERLESS_OCR_LANGUAGES" ]]; then
install_languages "$PAPERLESS_OCR_LANGUAGES"
fi
initialize
if [[ "$1" != "/"* ]]; then
echo Executing management command "$@"
exec gosu paperless python3 manage.py "$@"
else
echo Executing "$@"
exec "$@"
fi

View File

@@ -1,81 +0,0 @@
#!/usr/bin/env bash
set -e
wait_for_postgres() {
attempt_num=1
max_attempts=5
echo "Waiting for PostgreSQL to start..."
host="${PAPERLESS_DBHOST:=localhost}"
port="${PAPERLESS_DBPORT:=5432}"
while [ ! "$(pg_isready -h $host -p $port)" ]; do
if [ $attempt_num -eq $max_attempts ]; then
echo "Unable to connect to database."
exit 1
else
echo "Attempt $attempt_num failed! Trying again in 5 seconds..."
fi
attempt_num=$(("$attempt_num" + 1))
sleep 5
done
}
wait_for_redis() {
# We use a Python script to send the Redis ping
# instead of installing redis-tools just for 1 thing
if ! python3 /sbin/wait-for-redis.py; then
exit 1
fi
}
migrations() {
(
# flock is in place to prevent multiple containers from doing migrations
# simultaneously. This also ensures that the db is ready when the command
# of the current container starts.
flock 200
echo "Apply database migrations..."
python3 manage.py migrate
) 200>/usr/src/paperless/data/migration_lock
}
search_index() {
index_version=1
index_version_file=/usr/src/paperless/data/.index_version
if [[ (! -f "$index_version_file") || $(<$index_version_file) != "$index_version" ]]; then
echo "Search index out of date. Updating..."
python3 manage.py document_index reindex
echo $index_version | tee $index_version_file >/dev/null
fi
}
superuser() {
if [[ -n "${PAPERLESS_ADMIN_USER}" ]]; then
python3 manage.py manage_superuser
fi
}
do_work() {
if [[ -n "${PAPERLESS_DBHOST}" ]]; then
wait_for_postgres
fi
wait_for_redis
migrations
search_index
superuser
}
do_work

View File

@@ -1,96 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE policymap [
<!ELEMENT policymap (policy)+>
<!ATTLIST policymap xmlns CDATA #FIXED ''>
<!ELEMENT policy EMPTY>
<!ATTLIST policy xmlns CDATA #FIXED '' domain NMTOKEN #REQUIRED
name NMTOKEN #IMPLIED pattern CDATA #IMPLIED rights NMTOKEN #IMPLIED
stealth NMTOKEN #IMPLIED value CDATA #IMPLIED>
]>
<!--
Configure ImageMagick policies.
Domains include system, delegate, coder, filter, path, or resource.
Rights include none, read, write, execute and all. Use | to combine them,
for example: "read | write" to permit read from, or write to, a path.
Use a glob expression as a pattern.
Suppose we do not want users to process MPEG video images:
<policy domain="delegate" rights="none" pattern="mpeg:decode" />
Here we do not want users reading images from HTTP:
<policy domain="coder" rights="none" pattern="HTTP" />
The /repository file system is restricted to read only. We use a glob
expression to match all paths that start with /repository:
<policy domain="path" rights="read" pattern="/repository/*" />
Lets prevent users from executing any image filters:
<policy domain="filter" rights="none" pattern="*" />
Any large image is cached to disk rather than memory:
<policy domain="resource" name="area" value="1GP"/>
Define arguments for the memory, map, area, width, height and disk resources
with SI prefixes (.e.g 100MB). In addition, resource policies are maximums
for each instance of ImageMagick (e.g. policy memory limit 1GB, -limit 2GB
exceeds policy maximum so memory limit is 1GB).
Rules are processed in order. Here we want to restrict ImageMagick to only
read or write a small subset of proven web-safe image types:
<policy domain="delegate" rights="none" pattern="*" />
<policy domain="filter" rights="none" pattern="*" />
<policy domain="coder" rights="none" pattern="*" />
<policy domain="coder" rights="read|write" pattern="{GIF,JPEG,PNG,WEBP}" />
-->
<policymap>
<!-- <policy domain="system" name="shred" value="2"/> -->
<!-- <policy domain="system" name="precision" value="6"/> -->
<!-- <policy domain="system" name="memory-map" value="anonymous"/> -->
<!-- <policy domain="system" name="max-memory-request" value="256MiB"/> -->
<!-- <policy domain="resource" name="temporary-path" value="/tmp"/> -->
<policy domain="resource" name="memory" value="256MiB"/>
<policy domain="resource" name="map" value="512MiB"/>
<policy domain="resource" name="width" value="16KP"/>
<policy domain="resource" name="height" value="16KP"/>
<!-- <policy domain="resource" name="list-length" value="128"/> -->
<policy domain="resource" name="area" value="128MB"/>
<policy domain="resource" name="disk" value="1GiB"/>
<!-- <policy domain="resource" name="file" value="768"/> -->
<!-- <policy domain="resource" name="thread" value="4"/> -->
<!-- <policy domain="resource" name="throttle" value="0"/> -->
<!-- <policy domain="resource" name="time" value="3600"/> -->
<!-- <policy domain="coder" rights="none" pattern="MVG" /> -->
<!-- <policy domain="module" rights="none" pattern="{PS,PDF,XPS}" /> -->
<!-- <policy domain="delegate" rights="none" pattern="HTTPS" /> -->
<!-- <policy domain="path" rights="none" pattern="@*" /> -->
<!-- <policy domain="cache" name="memory-map" value="anonymous"/> -->
<!-- <policy domain="cache" name="synchronize" value="True"/> -->
<!-- <policy domain="cache" name="shared-secret" value="passphrase" stealth="true"/> -->
<!-- <policy domain="system" name="pixel-cache-memory" value="anonymous"/> -->
<!-- <policy domain="system" name="shred" value="2"/> -->
<!-- <policy domain="system" name="precision" value="6"/> -->
<!-- not needed due to the need to use explicitly by mvg: -->
<!-- <policy domain="delegate" rights="none" pattern="MVG" /> -->
<!-- use curl -->
<policy domain="delegate" rights="none" pattern="URL" />
<policy domain="delegate" rights="none" pattern="HTTPS" />
<policy domain="delegate" rights="none" pattern="HTTP" />
<!-- in order to avoid to get image with password text -->
<policy domain="path" rights="none" pattern="@*"/>
<!-- disable ghostscript format types -->
<policy domain="coder" rights="none" pattern="PS" />
<policy domain="coder" rights="none" pattern="PS2" />
<policy domain="coder" rights="none" pattern="PS3" />
<policy domain="coder" rights="none" pattern="EPS" />
<policy domain="coder" rights="read|write" pattern="PDF" />
<policy domain="coder" rights="none" pattern="XPS" />
</policymap>

View File

@@ -1,10 +0,0 @@
#!/usr/bin/env bash
set -eu
for command in document_archiver document_exporter document_importer mail_fetcher document_create_classifier document_index document_renamer document_retagger document_thumbnails document_sanity_checker manage_superuser;
do
echo "installing $command..."
sed "s/management_command/$command/g" management_script.sh > /usr/local/bin/$command
chmod +x /usr/local/bin/$command
done

View File

@@ -1,15 +0,0 @@
#!/usr/bin/env bash
set -e
cd /usr/src/paperless/src/
if [[ $(id -u) == 0 ]] ;
then
gosu paperless python3 manage.py management_command "$@"
elif [[ $(id -un) == "paperless" ]] ;
then
python3 manage.py management_command "$@"
else
echo "Unknown user."
fi

View File

@@ -1,36 +0,0 @@
[supervisord]
nodaemon=true ; start in foreground if true; default false
logfile=/var/log/supervisord/supervisord.log ; main log file; default $CWD/supervisord.log
pidfile=/var/run/supervisord/supervisord.pid ; supervisord pidfile; default supervisord.pid
logfile_maxbytes=50MB ; max main logfile bytes b4 rotation; default 50MB
logfile_backups=10 ; # of main logfile backups; 0 means none, default 10
loglevel=info ; log level; default info; others: debug,warn,trace
user=root
[program:gunicorn]
command=gunicorn -c /usr/src/paperless/gunicorn.conf.py paperless.asgi:application
user=paperless
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0
[program:consumer]
command=python3 manage.py document_consumer
user=paperless
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0
[program:scheduler]
command=python3 manage.py qcluster
user=paperless
stopasgroup = true
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0

View File

@@ -1,42 +0,0 @@
#!/usr/bin/env python3
"""
Simple script which attempts to ping the Redis broker as set in the environment for
a certain number of times, waiting a little bit in between
"""
import os
import sys
import time
from typing import Final
from redis import Redis
if __name__ == "__main__":
MAX_RETRY_COUNT: Final[int] = 5
RETRY_SLEEP_SECONDS: Final[int] = 5
REDIS_URL: Final[str] = os.getenv("PAPERLESS_REDIS", "redis://localhost:6379")
print(f"Waiting for Redis: {REDIS_URL}", flush=True)
attempt = 0
with Redis.from_url(url=REDIS_URL) as client:
while attempt < MAX_RETRY_COUNT:
try:
client.ping()
break
except Exception:
print(
f"Redis ping #{attempt} failed, waiting {RETRY_SLEEP_SECONDS}s",
flush=True,
)
time.sleep(RETRY_SLEEP_SECONDS)
attempt += 1
if attempt >= MAX_RETRY_COUNT:
print(f"Failed to connect to: {REDIS_URL}")
sys.exit(os.EX_UNAVAILABLE)
else:
print(f"Connected to Redis broker: {REDIS_URL}")
sys.exit(os.EX_OK)

View File

@@ -1,10 +1,11 @@
FROM python:3.5.1
MAINTAINER Pit Kleyersburg <pitkley@googlemail.com>
# Install Sphinx and Pygments
RUN pip install --no-cache-dir Sphinx Pygments \
# Setup directories, copy data
&& mkdir /build
RUN pip install Sphinx Pygments
# Setup directories, copy data
RUN mkdir /build
COPY . /build
WORKDIR /build/docs

View File

@@ -1,592 +0,0 @@
/* Variables */
:root {
--color-text-body: #5c5962;
--color-text-body-light: #fcfcfc;
--color-text-anchor: #7253ed;
--color-text-alt: rgba(0, 0, 0, 0.3);
--color-text-title: #27262b;
--color-text-code-inline: #e74c3c;
--color-text-code-nt: #062873;
--color-text-selection: #b19eff;
--color-bg-body: #fcfcfc;
--color-bg-body-alt: #f3f6f6;
--color-bg-side-nav: #f5f6fa;
--color-bg-side-nav-hover: #ebedf5;
--color-bg-code-block: var(--color-bg-side-nav);
--color-border: #eeebee;
--color-btn-neutral-bg: #f3f6f6;
--color-btn-neutral-bg-hover: #e5ebeb;
--color-success-title: #1abc9c;
--color-success-body: #dbfaf4;
--color-warning-title: #f0b37e;
--color-warning-body: #ffedcc;
--color-danger-title: #f29f97;
--color-danger-body: #fdf3f2;
--color-info-title: #6ab0de;
--color-info-body: #e7f2fa;
}
.dark-mode {
--color-text-body: #abb2bf;
--color-text-body-light: #9499a2;
--color-text-alt: rgba(0255, 255, 255, 0.5);
--color-text-title: var(--color-text-anchor);
--color-text-code-inline: #abb2bf;
--color-text-code-nt: #2063f3;
--color-text-selection: #030303;
--color-bg-body: #1d1d20 !important;
--color-bg-body-alt: #131315;
--color-bg-side-nav: #18181a;
--color-bg-side-nav-hover: #101216;
--color-bg-code-block: #101216;
--color-border: #47494f;
--color-btn-neutral-bg: #242529;
--color-btn-neutral-bg-hover: #101216;
--color-success-title: #02120f;
--color-success-body: #041b17;
--color-warning-title: #1b0e03;
--color-warning-body: #371d06;
--color-danger-title: #120902;
--color-danger-body: #1b0503;
--color-info-title: #020608;
--color-info-body: #06141e;
}
* {
transition: background-color 0.3s ease, border-color 0.3s ease;
}
/* Typography */
body {
font-family: system-ui,-apple-system,BlinkMacSystemFont,"Segoe UI",Roboto,"Helvetica Neue",Arial,sans-serif;
font-size: inherit;
line-height: 1.4;
color: var(--color-text-body);
}
h1, h2, h3, h4, h5, h6 {
font-family: inherit;
}
.rst-content .toctree-wrapper>p.caption, .rst-content h1, .rst-content h2, .rst-content h3, .rst-content h4, .rst-content h5, .rst-content h6 {
padding-top: .5em;
}
p, .main-content-wrap, .rst-content .section ul, .rst-content .toctree-wrapper ul, .rst-content section ul, .wy-plain-list-disc, article ul {
line-height: 1.6;
}
pre, .code, .rst-content .linenodiv pre, .rst-content div[class^=highlight] pre, .rst-content pre.literal-block {
font-family: "SFMono-Regular", Menlo,Consolas, Monospace;
font-size: 0.75em;
line-height: 1.8;
}
.wy-menu-vertical li.toctree-l3,.wy-menu-vertical li.toctree-l4 {
font-size: 1rem
}
.rst-versions {
font-family: inherit;
line-height: 1;
}
footer, footer p {
font-size: .8rem;
}
footer .rst-footer-buttons {
font-size: 1rem;
}
@media (max-width: 400px) {
/* break code lines on mobile */
pre, code {
word-break: break-word;
}
}
/* Layout */
.wy-side-nav-search, .wy-menu-vertical {
width: auto;
}
.wy-nav-side {
z-index: 0;
display: flex;
flex-wrap: wrap;
background-color: var(--color-bg-side-nav);
}
.wy-side-scroll {
width: 100%;
overflow-y: auto;
}
@media (min-width: 66.5rem) {
.wy-side-scroll {
width:264px
}
}
@media (min-width: 50rem) {
.wy-nav-side {
flex-wrap: nowrap;
position: fixed;
width: 248px;
height: 100%;
flex-direction: column;
border-right: 1px solid var(--color-border);
align-items:flex-end
}
}
@media (min-width: 66.5rem) {
.wy-nav-side {
width: calc((100% - 1064px) / 2 + 264px);
min-width:264px
}
}
@media (min-width: 50rem) {
.wy-nav-content-wrap {
position: relative;
max-width: 800px;
margin-left:248px
}
}
@media (min-width: 66.5rem) {
.wy-nav-content-wrap {
margin-left:calc((100% - 1064px) / 2 + 264px)
}
}
/* Colors */
body.wy-body-for-nav,
.wy-nav-content {
background: var(--color-bg-body);
}
.wy-nav-side {
border-right: 1px solid var(--color-border);
}
.wy-side-nav-search, .wy-nav-top {
background: var(--color-bg-side-nav);
border-bottom: 1px solid var(--color-border);
}
.wy-nav-content-wrap {
background: inherit;
}
.wy-side-nav-search > a, .wy-nav-top a, .wy-nav-top i {
color: var(--color-text-title);
}
.wy-side-nav-search > a:hover, .wy-nav-top a:hover {
background: transparent;
}
.wy-side-nav-search > div.version {
color: var(--color-text-alt)
}
.wy-side-nav-search > div[role="search"] {
border-top: 1px solid var(--color-border);
}
.wy-menu-vertical li.toctree-l2.current>a, .wy-menu-vertical li.toctree-l2.current li.toctree-l3>a,
.wy-menu-vertical li.toctree-l3.current>a, .wy-menu-vertical li.toctree-l3.current li.toctree-l4>a {
background: var(--color-bg-side-nav);
}
.rst-content .highlighted {
background: #eedd85;
box-shadow: 0 0 0 2px #eedd85;
font-weight: 600;
}
.wy-side-nav-search input[type=text],
html.writer-html5 .rst-content table.docutils th {
color: var(--color-text-body);
}
.rst-content table.docutils:not(.field-list) tr:nth-child(2n-1) td,
.wy-table-backed,
.wy-table-odd td,
.wy-table-striped tr:nth-child(2n-1) td {
background-color: var(--color-bg-body-alt);
}
.rst-content table.docutils,
.wy-table-bordered-all,
html.writer-html5 .rst-content table.docutils th,
.rst-content table.docutils td,
.wy-table-bordered-all td,
hr {
border-color: var(--color-border) !important;
}
::selection {
background: var(--color-text-selection);
}
/* Ridiculous rules are taken from sphinx_rtd */
.rst-content .admonition-title,
.wy-alert-title {
color: var(--color-text-body-light);
}
.rst-content .hint,
.rst-content .important,
.rst-content .tip,
.rst-content .wy-alert-success,
.wy-alert.wy-alert-success {
background: var(--color-success-body);
}
.rst-content .hint .admonition-title,
.rst-content .hint .wy-alert-title,
.rst-content .important .admonition-title,
.rst-content .important .wy-alert-title,
.rst-content .tip .admonition-title,
.rst-content .tip .wy-alert-title,
.rst-content .wy-alert-success .admonition-title,
.rst-content .wy-alert-success .wy-alert-title,
.wy-alert.wy-alert-success .rst-content .admonition-title,
.wy-alert.wy-alert-success .wy-alert-title {
background-color: var(--color-success-title);
}
.rst-content .admonition-todo,
.rst-content .attention,
.rst-content .caution,
.rst-content .warning,
.rst-content .wy-alert-warning,
.wy-alert.wy-alert-warning {
background: var(--color-warning-body);
}
.rst-content .admonition-todo .admonition-title,
.rst-content .admonition-todo .wy-alert-title,
.rst-content .attention .admonition-title,
.rst-content .attention .wy-alert-title,
.rst-content .caution .admonition-title,
.rst-content .caution .wy-alert-title,
.rst-content .warning .admonition-title,
.rst-content .warning .wy-alert-title,
.rst-content .wy-alert-warning .admonition-title,
.rst-content .wy-alert-warning .wy-alert-title,
.rst-content .wy-alert.wy-alert-warning .admonition-title,
.wy-alert.wy-alert-warning .rst-content .admonition-title,
.wy-alert.wy-alert-warning .wy-alert-title {
background: var(--color-warning-title);
}
.rst-content .danger,
.rst-content .error,
.rst-content .wy-alert-danger,
.wy-alert.wy-alert-danger {
background: var(--color-danger-body);
}
.rst-content .danger .admonition-title,
.rst-content .danger .wy-alert-title,
.rst-content .error .admonition-title,
.rst-content .error .wy-alert-title,
.rst-content .wy-alert-danger .admonition-title,
.rst-content .wy-alert-danger .wy-alert-title,
.wy-alert.wy-alert-danger .rst-content .admonition-title,
.wy-alert.wy-alert-danger .wy-alert-title {
background: var(--color-danger-title);
}
.rst-content .note,
.rst-content .seealso,
.rst-content .wy-alert-info,
.wy-alert.wy-alert-info {
background: var(--color-info-body);
}
.rst-content .note .admonition-title,
.rst-content .note .wy-alert-title,
.rst-content .seealso .admonition-title,
.rst-content .seealso .wy-alert-title,
.rst-content .wy-alert-info .admonition-title,
.rst-content .wy-alert-info .wy-alert-title,
.wy-alert.wy-alert-info .rst-content .admonition-title,
.wy-alert.wy-alert-info .wy-alert-title {
background: var(--color-info-title);
}
/* Links */
a, a:visited,
.wy-menu-vertical a,
a.icon.icon-home,
.wy-menu-vertical li.toctree-l1.current > a.current {
color: var(--color-text-anchor);
text-decoration: none;
}
a:hover, .wy-breadcrumbs-aside a {
color: var(--color-text-anchor); /* reset */
}
.rst-versions a, .rst-versions .rst-current-version {
color: #var(--color-text-anchor);
}
.wy-nav-content a.reference, .wy-nav-content a:not([class]) {
background-image: linear-gradient(var(--color-border) 0%, var(--color-border) 100%);
background-repeat: repeat-x;
background-position: 0 100%;
background-size: 1px 1px;
}
.wy-nav-content a.reference:hover, .wy-nav-content a:not([class]):hover {
background-image: linear-gradient(rgba(114,83,237,0.45) 0%, rgba(114,83,237,0.45) 100%);
background-size: 1px 1px;
}
.wy-menu-vertical a:hover,
.wy-menu-vertical li.current a:hover,
.wy-menu-vertical a:active {
background: var(--color-bg-side-nav-hover) !important;
color: var(--color-text-body);
}
.wy-menu-vertical li.toctree-l1.current>a,
.wy-menu-vertical li.current>a,
.wy-menu-vertical li.on a {
background-color: var(--color-bg-side-nav-hover);
border: none;
font-weight: normal;
}
.wy-menu-vertical li.current {
background-color: inherit;
}
.wy-menu-vertical li.current a {
border-right: none;
}
.wy-menu-vertical li.toctree-l2 a,
.wy-menu-vertical li.toctree-l3 a,
.wy-menu-vertical li.toctree-l4 a,
.wy-menu-vertical li.toctree-l5 a,
.wy-menu-vertical li.toctree-l6 a,
.wy-menu-vertical li.toctree-l7 a,
.wy-menu-vertical li.toctree-l8 a,
.wy-menu-vertical li.toctree-l9 a,
.wy-menu-vertical li.toctree-l10 a {
color: var(--color-text-body);
}
a.image-reference, a.image-reference:hover {
background: none !important;
}
a.image-reference img {
cursor: zoom-in;
}
/* Code blocks */
.rst-content code, .rst-content tt, code {
padding: 0.25em;
font-weight: 400;
background-color: var(--color-bg-code-block);
border: 1px solid var(--color-border);
border-radius: 4px;
}
.rst-content div[class^=highlight], .rst-content pre.literal-block {
padding: 0.7rem;
margin-top: 0;
margin-bottom: 0.75rem;
overflow-x: auto;
background-color: var(--color-bg-side-nav);
border-color: var(--color-border);
border-radius: 4px;
box-shadow: none;
}
.rst-content .admonition-title,
.rst-content div.admonition,
.wy-alert-title {
padding: 10px 12px;
border-top-left-radius: 4px;
border-top-right-radius: 4px;
}
.highlight .go {
color: inherit;
}
.highlight .nt {
color: var(--color-text-code-nt);
}
.rst-content code.literal,
.rst-content tt.literal {
border-color: var(--color-border);
background-color: var(--color-border);
color: var(--color-text-code-inline)
}
/* Search */
.wy-side-nav-search input[type=text] {
border: none;
border-radius: 0;
background-color: transparent;
font-family: inherit;
font-size: .85rem;
box-shadow: none;
padding: .7rem 1rem .7rem 2.8rem;
margin: 0;
}
#rtd-search-form {
position: relative;
}
#rtd-search-form:before {
font: normal normal normal 14px/1 FontAwesome;
font-size: inherit;
text-rendering: auto;
-webkit-font-smoothing: antialiased;
-moz-osx-font-smoothing: grayscale;
content: "\f002";
color: var(--color-text-alt);
position: absolute;
left: 1.5rem;
top: .7rem;
}
/* Side nav */
.wy-side-nav-search {
padding: 1rem 0 0 0;
}
.wy-menu-vertical li a button.toctree-expand {
float: right;
margin-right: -1.5em;
padding: 0 .5em;
}
.wy-menu-vertical a,
.wy-menu-vertical li.current>a,
.wy-menu-vertical li.current li>a {
padding-right: 1.5em !important;
}
.wy-menu-vertical li.current li>a.current {
font-weight: 600;
}
/* Misc spacing */
.rst-content .admonition-title, .wy-alert-title {
padding: 10px 12px;
}
/* Buttons */
.btn {
display: inline-block;
box-sizing: border-box;
padding: 0.3em 1em;
margin: 0;
font-family: inherit;
font-size: inherit;
font-weight: 500;
line-height: 1.5;
color: #var(--color-text-anchor);
text-decoration: none;
vertical-align: baseline;
background-color: #f7f7f7;
border-width: 0;
border-radius: 4px;
box-shadow: 0 1px 2px rgba(0,0,0,0.12),0 3px 10px rgba(0,0,0,0.08);
appearance: none;
}
.btn:active {
padding: 0.3em 1em;
}
.rst-content .btn:focus {
outline: 1px solid #ccc;
}
.rst-content .btn-neutral, .rst-content .btn span.fa {
color: var(--color-text-body) !important;
}
.btn-neutral {
background-color: var(--color-btn-neutral-bg) !important;
color: var(--color-btn-neutral-text) !important;
border: 1px solid var(--color-btn-neutral-bg);
}
.btn:hover, .btn-neutral:hover {
background-color: var(--color-btn-neutral-bg-hover) !important;
}
/* Icon overrides */
.wy-side-nav-search a.icon-home:before {
display: none;
}
.fa-minus-square-o:before,.wy-menu-vertical li.current>a button.toctree-expand:before,.wy-menu-vertical li.on a button.toctree-expand:before {
content: "\f106"; /* fa-angle-up */
}
.fa-plus-square-o:before, .wy-menu-vertical li button.toctree-expand:before {
content: "\f107"; /* fa-angle-down */
}
/* Misc */
.wy-nav-top {
line-height: 36px;
}
.wy-nav-top > i {
font-size: 24px;
padding: 8px 0 0 2px;
color:#var(--color-text-anchor);
}
.rst-content table.docutils td,
.rst-content table.docutils th,
.rst-content table.field-list td,
.rst-content table.field-list th,
.wy-table td,
.wy-table th {
padding: 8px 14px;
}
.dark-mode-toggle {
position: absolute;
top: 14px;
right: 12px;
height: 20px;
width: 24px;
z-index: 10;
border: none;
background-color: transparent;
color: inherit;
opacity: 0.7;
}
.wy-nav-content-wrap {
z-index: 20;
}

14
docs/_static/custom.css vendored Normal file
View File

@@ -0,0 +1,14 @@
/* override table width restrictions */
@media screen and (min-width: 767px) {
.wy-table-responsive table td {
/* !important prevents the common CSS stylesheets from
overriding this as on RTD they are loaded after this stylesheet */
white-space: normal !important;
}
.wy-table-responsive {
overflow: visible !important;
}
}

View File

@@ -1,47 +0,0 @@
let toggleButton;
let icon;
function load() {
"use strict";
toggleButton = document.createElement("button");
toggleButton.setAttribute("title", "Toggle dark mode");
toggleButton.classList.add("dark-mode-toggle");
icon = document.createElement("i");
icon.classList.add("fa", darkModeState ? "fa-sun-o" : "fa-moon-o");
toggleButton.appendChild(icon);
document.body.prepend(toggleButton);
// Listen for changes in the OS settings
// addListener is used because older versions of Safari don't support addEventListener
// prefersDarkQuery set in <head>
if (prefersDarkQuery) {
prefersDarkQuery.addListener(function (evt) {
toggleDarkMode(evt.matches);
});
}
// Initial setting depending on the prefers-color-mode or localstorage
// darkModeState should be set in the document <head> to prevent flash
if (darkModeState == undefined) darkModeState = false;
toggleDarkMode(darkModeState);
// Toggles the "dark-mode" class on click and sets localStorage state
toggleButton.addEventListener("click", () => {
darkModeState = !darkModeState;
toggleDarkMode(darkModeState);
localStorage.setItem("dark-mode", darkModeState);
});
}
function toggleDarkMode(state) {
document.documentElement.classList.toggle("dark-mode", state);
document.documentElement.classList.toggle("light-mode", !state);
icon.classList.remove("fa-sun-o");
icon.classList.remove("fa-moon-o");
icon.classList.add(state ? "fa-sun-o" : "fa-moon-o");
darkModeState = state;
}
document.addEventListener("DOMContentLoaded", load);

Binary file not shown.

Before

Width:  |  Height:  |  Size: 67 KiB

BIN
docs/_static/screenshot.png vendored Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 445 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 661 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 457 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 436 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 462 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 608 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 698 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 706 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 480 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 680 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 686 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 848 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 703 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 96 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 388 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 26 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 54 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 517 KiB

View File

@@ -1,13 +0,0 @@
{% extends "!layout.html" %}
{% block extrahead %}
<script>
// MediaQueryList object
const prefersDarkQuery = window.matchMedia("(prefers-color-scheme: dark)");
const lsDark = localStorage.getItem("dark-mode");
let darkModeState = lsDark !== null ? lsDark == "true" : prefersDarkQuery.matches;
document.documentElement.classList.toggle("dark-mode", darkModeState);
document.documentElement.classList.toggle("light-mode", !darkModeState);
</script>
{{ super() }}
{% endblock %}

View File

@@ -1,516 +0,0 @@
**************
Administration
**************
.. _administration-backup:
Making backups
##############
Multiple options exist for making backups of your paperless instance,
depending on how you installed paperless.
Before making backups, make sure that paperless is not running.
Options available to any installation of paperless:
* Use the :ref:`document exporter <utilities-exporter>`.
The document exporter exports all your documents, thumbnails and
metadata to a specific folder. You may import your documents into a
fresh instance of paperless again or store your documents in another
DMS with this export.
* The document exporter is also able to update an already existing export.
Therefore, incremental backups with ``rsync`` are entirely possible.
.. caution::
You cannot import the export generated with one version of paperless in a
different version of paperless. The export contains an exact image of the
database, and migrations may change the database layout.
Options available to docker installations:
* Backup the docker volumes. These usually reside within
``/var/lib/docker/volumes`` on the host and you need to be root in order
to access them.
Paperless uses 3 volumes:
* ``paperless_media``: This is where your documents are stored.
* ``paperless_data``: This is where auxillary data is stored. This
folder also contains the SQLite database, if you use it.
* ``paperless_pgdata``: Exists only if you use PostgreSQL and contains
the database.
Options available to bare-metal and non-docker installations:
* Backup the entire paperless folder. This ensures that if your paperless instance
crashes at some point or your disk fails, you can simply copy the folder back
into place and it works.
When using PostgreSQL, you'll also have to backup the database.
.. _migrating-restoring:
Restoring
=========
.. _administration-updating:
Updating Paperless
##################
Docker Route
============
If a new release of paperless-ngx is available, upgrading depends on how you
installed paperless-ngx in the first place. The releases are available at the
`release page <https://github.com/paperless-ngx/paperless-ngx/releases>`_.
First of all, ensure that paperless is stopped.
.. code:: shell-session
$ cd /path/to/paperless
$ docker-compose down
After that, :ref:`make a backup <administration-backup>`.
A. If you pull the image from the docker hub, all you need to do is:
.. code:: shell-session
$ docker-compose pull
$ docker-compose up
The docker-compose files refer to the ``latest`` version, which is always the latest
stable release.
B. If you built the image yourself, do the following:
.. code:: shell-session
$ git pull
$ docker-compose build
$ docker-compose up
Running ``docker-compose up`` will also apply any new database migrations.
If you see everything working, press CTRL+C once to gracefully stop paperless.
Then you can start paperless-ngx with ``-d`` to have it run in the background.
.. note::
In version 0.9.14, the update process was changed. In 0.9.13 and earlier, the
docker-compose files specified exact versions and pull won't automatically
update to newer versions. In order to enable updates as described above, either
get the new ``docker-compose.yml`` file from `here <https://github.com/paperless-ngx/paperless-ngx/tree/master/docker/compose>`_
or edit the ``docker-compose.yml`` file, find the line that says
.. code::
image: ghcr.io/paperless-ngx/paperless-ngx:0.9.x
and replace the version with ``latest``:
.. code::
image: ghcr.io/paperless-ngx/paperless-ngx:latest
.. note::
In version 1.7.1 and onwards, the Docker image can now pinned to a release series.
This is often combined with automatic updaters such as Watchtower to allow safer
unattended upgrading to new bugfix releases only. It is still recommended to always
review release notes before upgrading. To ping your install to a release series, edit
the ``docker-compose.yml`` find the line that says
.. code::
image: ghcr.io/paperless-ngx/paperless-ngx:latest
and replace the version with the series you want to track, for example:
.. code::
image: ghcr.io/paperless-ngx/paperless-ngx:1.7
Bare Metal Route
================
After grabbing the new release and unpacking the contents, do the following:
1. Update dependencies. New paperless version may require additional
dependencies. The dependencies required are listed in the section about
:ref:`bare metal installations <setup-bare_metal>`.
2. Update python requirements. Keep in mind to activate your virtual environment
before that, if you use one.
.. code:: shell-session
$ pip install -r requirements.txt
3. Migrate the database.
.. code:: shell-session
$ cd src
$ python3 manage.py migrate
This might not actually do anything. Not every new paperless version comes with new
database migrations.
Downgrading Paperless
#####################
Downgrades are possible. However, some updates also contain database migrations (these change the layout of the database and may move data).
In order to move back from a version that applied database migrations, you'll have to revert the database migration *before* downgrading,
and then downgrade paperless.
This table lists the compatible versions for each database migration number.
+------------------+-----------------+
| Migration number | Version range |
+------------------+-----------------+
| 1011 | 1.0.0 |
+------------------+-----------------+
| 1012 | 1.1.0 - 1.2.1 |
+------------------+-----------------+
| 1014 | 1.3.0 - 1.3.1 |
+------------------+-----------------+
| 1016 | 1.3.2 - current |
+------------------+-----------------+
Execute the following management command to migrate your database:
.. code:: shell-session
$ python3 manage.py migrate documents <migration number>
.. note::
Some migrations cannot be undone. The command will issue errors if that happens.
.. _utilities-management-commands:
Management utilities
####################
Paperless comes with some management commands that perform various maintenance
tasks on your paperless instance. You can invoke these commands in the following way:
With docker-compose, while paperless is running:
.. code:: shell-session
$ cd /path/to/paperless
$ docker-compose exec webserver <command> <arguments>
With docker, while paperless is running:
.. code:: shell-session
$ docker exec -it <container-name> <command> <arguments>
Bare metal:
.. code:: shell-session
$ cd /path/to/paperless/src
$ python3 manage.py <command> <arguments>
All commands have built-in help, which can be accessed by executing them with
the argument ``--help``.
.. _utilities-exporter:
Document exporter
=================
The document exporter exports all your data from paperless into a folder for
backup or migration to another DMS.
If you use the document exporter within a cronjob to backup your data you might use the ``-T`` flag behind exec to suppress "The input device is not a TTY" errors. For example: ``docker-compose exec -T webserver document_exporter ../export``
.. code::
document_exporter target [-c] [-f] [-d]
optional arguments:
-c, --compare-checksums
-f, --use-filename-format
-d, --delete
``target`` is a folder to which the data gets written. This includes documents,
thumbnails and a ``manifest.json`` file. The manifest contains all metadata from
the database (correspondents, tags, etc).
When you use the provided docker compose script, specify ``../export`` as the
target. This path inside the container is automatically mounted on your host on
the folder ``export``.
If the target directory already exists and contains files, paperless will assume
that the contents of the export directory are a previous export and will attempt
to update the previous export. Paperless will only export changed and added files.
Paperless determines whether a file has changed by inspecting the file attributes
"date/time modified" and "size". If that does not work out for you, specify
``--compare-checksums`` and paperless will attempt to compare file checksums instead.
This is slower.
Paperless will not remove any existing files in the export directory. If you want
paperless to also remove files that do not belong to the current export such as files
from deleted documents, specify ``--delete``. Be careful when pointing paperless to
a directory that already contains other files.
The filenames generated by this command follow the format
``[date created] [correspondent] [title].[extension]``.
If you want paperless to use ``PAPERLESS_FILENAME_FORMAT`` for exported filenames
instead, specify ``--use-filename-format``.
.. _utilities-importer:
Document importer
=================
The document importer takes the export produced by the `Document exporter`_ and
imports it into paperless.
The importer works just like the exporter. You point it at a directory, and
the script does the rest of the work:
.. code::
document_importer source
When you use the provided docker compose script, put the export inside the
``export`` folder in your paperless source directory. Specify ``../export``
as the ``source``.
.. _utilities-retagger:
Document retagger
=================
Say you've imported a few hundred documents and now want to introduce
a tag or set up a new correspondent, and apply its matching to all of
the currently-imported docs. This problem is common enough that
there are tools for it.
.. code::
document_retagger [-h] [-c] [-T] [-t] [-i] [--use-first] [-f]
optional arguments:
-c, --correspondent
-T, --tags
-t, --document_type
-i, --inbox-only
--use-first
-f, --overwrite
Run this after changing or adding matching rules. It'll loop over all
of the documents in your database and attempt to match documents
according to the new rules.
Specify any combination of ``-c``, ``-T`` and ``-t`` to have the
retagger perform matching of the specified metadata type. If you don't
specify any of these options, the document retagger won't do anything.
Specify ``-i`` to have the document retagger work on documents tagged
with inbox tags only. This is useful when you don't want to mess with
your already processed documents.
When multiple document types or correspondents match a single document,
the retagger won't assign these to the document. Specify ``--use-first``
to override this behavior and just use the first correspondent or type
it finds. This option does not apply to tags, since any amount of tags
can be applied to a document.
Finally, ``-f`` specifies that you wish to overwrite already assigned
correspondents, types and/or tags. The default behavior is to not
assign correspondents and types to documents that have this data already
assigned. ``-f`` works differently for tags: By default, only additional tags get
added to documents, no tags will be removed. With ``-f``, tags that don't
match a document anymore get removed as well.
Managing the Automatic matching algorithm
=========================================
The *Auto* matching algorithm requires a trained neural network to work.
This network needs to be updated whenever somethings in your data
changes. The docker image takes care of that automatically with the task
scheduler. You can manually renew the classifier by invoking the following
management command:
.. code::
document_create_classifier
This command takes no arguments.
.. _`administration-index`:
Managing the document search index
==================================
The document search index is responsible for delivering search results for the
website. The document index is automatically updated whenever documents get
added to, changed, or removed from paperless. However, if the search yields
non-existing documents or won't find anything, you may need to recreate the
index manually.
.. code::
document_index {reindex,optimize}
Specify ``reindex`` to have the index created from scratch. This may take some
time.
Specify ``optimize`` to optimize the index. This updates certain aspects of
the index and usually makes queries faster and also ensures that the
autocompletion works properly. This command is regularly invoked by the task
scheduler.
.. _utilities-renamer:
Managing filenames
==================
If you use paperless' feature to
:ref:`assign custom filenames to your documents <advanced-file_name_handling>`,
you can use this command to move all your files after changing
the naming scheme.
.. warning::
Since this command moves you documents around a lot, it is advised to to
a backup before. The renaming logic is robust and will never overwrite
or delete a file, but you can't ever be careful enough.
.. code::
document_renamer
The command takes no arguments and processes all your documents at once.
Learn how to use :ref:`Management Utilities<utilities-management-commands>`.
.. _utilities-sanity-checker:
Sanity checker
==============
Paperless has a built-in sanity checker that inspects your document collection for issues.
The issues detected by the sanity checker are as follows:
* Missing original files.
* Missing archive files.
* Inaccessible original files due to improper permissions.
* Inaccessible archive files due to improper permissions.
* Corrupted original documents by comparing their checksum against what is stored in the database.
* Corrupted archive documents by comparing their checksum against what is stored in the database.
* Missing thumbnails.
* Inaccessible thumbnails due to improper permissions.
* Documents without any content (warning).
* Orphaned files in the media directory (warning). These are files that are not referenced by any document im paperless.
.. code::
document_sanity_checker
The command takes no arguments. Depending on the size of your document archive, this may take some time.
Fetching e-mail
===============
Paperless automatically fetches your e-mail every 10 minutes by default. If
you want to invoke the email consumer manually, call the following management
command:
.. code::
mail_fetcher
The command takes no arguments and processes all your mail accounts and rules.
.. _utilities-archiver:
Creating archived documents
===========================
Paperless stores archived PDF/A documents alongside your original documents.
These archived documents will also contain selectable text for image-only
originals.
These documents are derived from the originals, which are always stored
unmodified. If coming from an earlier version of paperless, your documents
won't have archived versions.
This command creates PDF/A documents for your documents.
.. code::
document_archiver --overwrite --document <id>
This command will only attempt to create archived documents when no archived
document exists yet, unless ``--overwrite`` is specified. If ``--document <id>``
is specified, the archiver will only process that document.
.. note::
This command essentially performs OCR on all your documents again,
according to your settings. If you run this with ``PAPERLESS_OCR_MODE=redo``,
it will potentially run for a very long time. You can cancel the command
at any time, since this command will skip already archived versions the next time
it is run.
.. note::
Some documents will cause errors and cannot be converted into PDF/A documents,
such as encrypted PDF documents. The archiver will skip over these documents
each time it sees them.
.. _utilities-encyption:
Managing encryption
===================
Documents can be stored in Paperless using GnuPG encryption.
.. danger::
Encryption is deprecated since paperless-ngx 0.9 and doesn't really provide any
additional security, since you have to store the passphrase in a configuration
file on the same system as the encrypted documents for paperless to work.
Furthermore, the entire text content of the documents is stored plain in the
database, even if your documents are encrypted. Filenames are not encrypted as
well.
Also, the web server provides transparent access to your encrypted documents.
Consider running paperless on an encrypted filesystem instead, which will then
at least provide security against physical hardware theft.
Enabling encryption
-------------------
Enabling encryption is no longer supported.
Disabling encryption
--------------------
Basic usage to disable encryption of your document store:
(Note: If ``PAPERLESS_PASSPHRASE`` isn't set already, you need to specify it here)
.. code::
decrypt_documents [--passphrase SECR3TP4SSPHRA$E]

View File

@@ -1,292 +0,0 @@
***************
Advanced topics
***************
Paperless offers a couple features that automate certain tasks and make your life
easier.
.. _advanced-matching:
Matching tags, correspondents and document types
################################################
Paperless will compare the matching algorithms defined by every tag and
correspondent already set in your database to see if they apply to the text in
a document. In other words, if you defined a tag called ``Home Utility``
that had a ``match`` property of ``bc hydro`` and a ``matching_algorithm`` of
``literal``, Paperless will automatically tag your newly-consumed document with
your ``Home Utility`` tag so long as the text ``bc hydro`` appears in the body
of the document somewhere.
The matching logic is quite powerful. It supports searching the text of your
document with different algorithms, and as such, some experimentation may be
necessary to get things right.
In order to have a tag, correspondent, or type assigned automatically to newly
consumed documents, assign a match and matching algorithm using the web
interface. These settings define when to assign correspondents, tags, and types
to documents.
The following algorithms are available:
* **Any:** Looks for any occurrence of any word provided in match in the PDF.
If you define the match as ``Bank1 Bank2``, it will match documents containing
either of these terms.
* **All:** Requires that every word provided appears in the PDF, albeit not in the
order provided.
* **Literal:** Matches only if the match appears exactly as provided (i.e. preserve ordering) in the PDF.
* **Regular expression:** Parses the match as a regular expression and tries to
find a match within the document.
* **Fuzzy match:** I dont know. Look at the source.
* **Auto:** Tries to automatically match new documents. This does not require you
to set a match. See the notes below.
When using the *any* or *all* matching algorithms, you can search for terms
that consist of multiple words by enclosing them in double quotes. For example,
defining a match text of ``"Bank of America" BofA`` using the *any* algorithm,
will match documents that contain either "Bank of America" or "BofA", but will
not match documents containing "Bank of South America".
Then just save your tag/correspondent and run another document through the
consumer. Once complete, you should see the newly-created document,
automatically tagged with the appropriate data.
.. _advanced-automatic_matching:
Automatic matching
==================
Paperless-ngx comes with a new matching algorithm called *Auto*. This matching
algorithm tries to assign tags, correspondents, and document types to your
documents based on how you have already assigned these on existing documents. It
uses a neural network under the hood.
If, for example, all your bank statements of your account 123 at the Bank of
America are tagged with the tag "bofa_123" and the matching algorithm of this
tag is set to *Auto*, this neural network will examine your documents and
automatically learn when to assign this tag.
Paperless tries to hide much of the involved complexity with this approach.
However, there are a couple caveats you need to keep in mind when using this
feature:
* Changes to your documents are not immediately reflected by the matching
algorithm. The neural network needs to be *trained* on your documents after
changes. Paperless periodically (default: once each hour) checks for changes
and does this automatically for you.
* The Auto matching algorithm only takes documents into account which are NOT
placed in your inbox (i.e. have any inbox tags assigned to them). This ensures
that the neural network only learns from documents which you have correctly
tagged before.
* The matching algorithm can only work if there is a correlation between the
tag, correspondent, or document type and the document itself. Your bank
statements usually contain your bank account number and the name of the bank,
so this works reasonably well, However, tags such as "TODO" cannot be
automatically assigned.
* The matching algorithm needs a reasonable number of documents to identify when
to assign tags, correspondents, and types. If one out of a thousand documents
has the correspondent "Very obscure web shop I bought something five years
ago", it will probably not assign this correspondent automatically if you buy
something from them again. The more documents, the better.
* Paperless also needs a reasonable amount of negative examples to decide when
not to assign a certain tag, correspondent or type. This will usually be the
case as you start filling up paperless with documents. Example: If all your
documents are either from "Webshop" and "Bank", paperless will assign one of
these correspondents to ANY new document, if both are set to automatic matching.
Hooking into the consumption process
####################################
Sometimes you may want to do something arbitrary whenever a document is
consumed. Rather than try to predict what you may want to do, Paperless lets
you execute scripts of your own choosing just before or after a document is
consumed using a couple simple hooks.
Just write a script, put it somewhere that Paperless can read & execute, and
then put the path to that script in ``paperless.conf`` or ``docker-compose.env`` with the variable name
of either ``PAPERLESS_PRE_CONSUME_SCRIPT`` or
``PAPERLESS_POST_CONSUME_SCRIPT``.
.. important::
These scripts are executed in a **blocking** process, which means that if
a script takes a long time to run, it can significantly slow down your
document consumption flow. If you want things to run asynchronously,
you'll have to fork the process in your script and exit.
Pre-consumption script
======================
Executed after the consumer sees a new document in the consumption folder, but
before any processing of the document is performed. This script receives exactly
one argument:
* Document file name
A simple but common example for this would be creating a simple script like
this:
``/usr/local/bin/ocr-pdf``
.. code:: bash
#!/usr/bin/env bash
pdf2pdfocr.py -i ${1}
``/etc/paperless.conf``
.. code:: bash
...
PAPERLESS_PRE_CONSUME_SCRIPT="/usr/local/bin/ocr-pdf"
...
This will pass the path to the document about to be consumed to ``/usr/local/bin/ocr-pdf``,
which will in turn call `pdf2pdfocr.py`_ on your document, which will then
overwrite the file with an OCR'd version of the file and exit. At which point,
the consumption process will begin with the newly modified file.
.. _pdf2pdfocr.py: https://github.com/LeoFCardoso/pdf2pdfocr
.. _advanced-post_consume_script:
Post-consumption script
=======================
Executed after the consumer has successfully processed a document and has moved it
into paperless. It receives the following arguments:
* Document id
* Generated file name
* Source path
* Thumbnail path
* Download URL
* Thumbnail URL
* Correspondent
* Tags
The script can be in any language, but for a simple shell script
example, you can take a look at `post-consumption-example.sh`_ in this project.
The post consumption script cannot cancel the consumption process.
Docker
------
Assumed you have ``/home/foo/paperless-ngx/scripts/post-consumption-example.sh``.
You can pass that script into the consumer container via a host mount in your ``docker-compose.yml``.
.. code:: bash
...
consumer:
...
volumes:
...
- /home/paperless-ngx/scripts:/path/in/container/scripts/
...
Example (docker-compose.yml): ``- /home/foo/paperless-ngx/scripts:/usr/src/paperless/scripts``
which in turn requires the variable ``PAPERLESS_POST_CONSUME_SCRIPT`` in ``docker-compose.env`` to point to ``/path/in/container/scripts/post-consumption-example.sh``.
Example (docker-compose.env): ``PAPERLESS_POST_CONSUME_SCRIPT=/usr/src/paperless/scripts/post-consumption-example.sh``
Troubleshooting:
- Monitor the docker-compose log ``cd ~/paperless-ngx; docker-compose logs -f``
- Check your script's permission e.g. in case of permission error ``sudo chmod 755 post-consumption-example.sh``
- Pipe your scripts's output to a log file e.g. ``echo "${DOCUMENT_ID}" | tee --append /usr/src/paperless/scripts/post-consumption-example.log``
.. _post-consumption-example.sh: https://github.com/paperless-ngx/paperless-ngx/blob/main/scripts/post-consumption-example.sh
.. _advanced-file_name_handling:
File name handling
##################
By default, paperless stores your documents in the media directory and renames them
using the identifier which it has assigned to each document. You will end up getting
files like ``0000123.pdf`` in your media directory. This isn't necessarily a bad
thing, because you normally don't have to access these files manually. However, if
you wish to name your files differently, you can do that by adjusting the
``PAPERLESS_FILENAME_FORMAT`` configuration option.
This variable allows you to configure the filename (folders are allowed) using
placeholders. For example, configuring this to
.. code:: bash
PAPERLESS_FILENAME_FORMAT={created_year}/{correspondent}/{title}
will create a directory structure as follows:
.. code::
2019/
My bank/
Statement January.pdf
Statement February.pdf
2020/
My bank/
Statement January.pdf
Letter.pdf
Letter_01.pdf
Shoe store/
My new shoes.pdf
.. danger::
Do not manually move your files in the media folder. Paperless remembers the
last filename a document was stored as. If you do rename a file, paperless will
report your files as missing and won't be able to find them.
Paperless provides the following placeholders within filenames:
* ``{asn}``: The archive serial number of the document, or "none".
* ``{correspondent}``: The name of the correspondent, or "none".
* ``{document_type}``: The name of the document type, or "none".
* ``{tag_list}``: A comma separated list of all tags assigned to the document.
* ``{title}``: The title of the document.
* ``{created}``: The full date (ISO format) the document was created.
* ``{created_year}``: Year created only.
* ``{created_month}``: Month created only (number 01-12).
* ``{created_day}``: Day created only (number 01-31).
* ``{added}``: The full date (ISO format) the document was added to paperless.
* ``{added_year}``: Year added only.
* ``{added_month}``: Month added only (number 01-12).
* ``{added_day}``: Day added only (number 01-31).
Paperless will try to conserve the information from your database as much as possible.
However, some characters that you can use in document titles and correspondent names (such
as ``: \ /`` and a couple more) are not allowed in filenames and will be replaced with dashes.
If paperless detects that two documents share the same filename, paperless will automatically
append ``_01``, ``_02``, etc to the filename. This happens if all the placeholders in a filename
evaluate to the same value.
.. hint::
Paperless checks the filename of a document whenever it is saved. Therefore,
you need to update the filenames of your documents and move them after altering
this setting by invoking the :ref:`document renamer <utilities-renamer>`.
.. warning::
Make absolutely sure you get the spelling of the placeholders right, or else
paperless will use the default naming scheme instead.
.. caution::
As of now, you could totally tell paperless to store your files anywhere outside
the media directory by setting
.. code::
PAPERLESS_FILENAME_FORMAT=../../my/custom/location/{title}
However, keep in mind that inside docker, if files get stored outside of the
predefined volumes, they will be lost after a restart of paperless.

View File

@@ -1,300 +1,23 @@
.. _api:
************
The REST API
************
############
Paperless makes use of the `Django REST Framework`_ standard API interface.
It provides a browsable API for most of its endpoints, which you can inspect
at ``http://<paperless-host>:<port>/api/``. This also documents most of the
available filters and ordering fields.
Paperless makes use of the `Django REST Framework`_ standard API interface
because of its inherent awesomeness. Conveniently, the system is also
self-documenting, so to learn more about the access points, schema, what's
accepted and what isn't, you need only visit ``/api`` on your local Paperless
installation.
.. _Django REST Framework: http://django-rest-framework.org/
The API provides 5 main endpoints:
* ``/api/documents/``: Full CRUD support, except POSTing new documents. See below.
* ``/api/correspondents/``: Full CRUD support.
* ``/api/document_types/``: Full CRUD support.
* ``/api/logs/``: Read-Only.
* ``/api/tags/``: Full CRUD support.
.. _api-uploading:
All of these endpoints except for the logging endpoint
allow you to fetch, edit and delete individual objects
by appending their primary key to the path, for example ``/api/documents/454/``.
The objects served by the document endpoint contain the following fields:
* ``id``: ID of the document. Read-only.
* ``title``: Title of the document.
* ``content``: Plain text content of the document.
* ``tags``: List of IDs of tags assigned to this document, or empty list.
* ``document_type``: Document type of this document, or null.
* ``correspondent``: Correspondent of this document or null.
* ``created``: The date at which this document was created.
* ``modified``: The date at which this document was last edited in paperless. Read-only.
* ``added``: The date at which this document was added to paperless. Read-only.
* ``archive_serial_number``: The identifier of this document in a physical document archive.
* ``original_file_name``: Verbose filename of the original document. Read-only.
* ``archived_file_name``: Verbose filename of the archived document. Read-only. Null if no archived document is available.
Downloading documents
#####################
In addition to that, the document endpoint offers these additional actions on
individual documents:
* ``/api/documents/<pk>/download/``: Download the document.
* ``/api/documents/<pk>/preview/``: Display the document inline,
without downloading it.
* ``/api/documents/<pk>/thumb/``: Download the PNG thumbnail of a document.
Paperless generates archived PDF/A documents from consumed files and stores both
the original files as well as the archived files. By default, the endpoints
for previews and downloads serve the archived file, if it is available.
Otherwise, the original file is served.
Some document cannot be archived.
The endpoints correctly serve the response header fields ``Content-Disposition``
and ``Content-Type`` to indicate the filename for download and the type of content of
the document.
In order to download or preview the original document when an archived document is available,
supply the query parameter ``original=true``.
.. hint::
Paperless used to provide these functionality at ``/fetch/<pk>/preview``,
``/fetch/<pk>/thumb`` and ``/fetch/<pk>/doc``. Redirects to the new URLs
are in place. However, if you use these old URLs to access documents, you
should update your app or script to use the new URLs.
Getting document metadata
#########################
The api also has an endpoint to retrieve read-only metadata about specific documents. this
information is not served along with the document objects, since it requires reading
files and would therefore slow down document lists considerably.
Access the metadata of a document with an ID ``id`` at ``/api/documents/<id>/metadata/``.
The endpoint reports the following data:
* ``original_checksum``: MD5 checksum of the original document.
* ``original_size``: Size of the original document, in bytes.
* ``original_mime_type``: Mime type of the original document.
* ``media_filename``: Current filename of the document, under which it is stored inside the media directory.
* ``has_archive_version``: True, if this document is archived, false otherwise.
* ``original_metadata``: A list of metadata associated with the original document. See below.
* ``archive_checksum``: MD5 checksum of the archived document, or null.
* ``archive_size``: Size of the archived document in bytes, or null.
* ``archive_metadata``: Metadata associated with the archived document, or null. See below.
File metadata is reported as a list of objects in the following form:
.. code:: json
[
{
"namespace": "http://ns.adobe.com/pdf/1.3/",
"prefix": "pdf",
"key": "Producer",
"value": "SparklePDF, Fancy edition"
},
]
``namespace`` and ``prefix`` can be null. The actual metadata reported depends on the file type and the metadata
available in that specific document. Paperless only reports PDF metadata at this point.
Authorization
#############
The REST api provides three different forms of authentication.
1. Basic authentication
Authorize by providing a HTTP header in the form
.. code::
Authorization: Basic <credentials>
where ``credentials`` is a base64-encoded string of ``<username>:<password>``
2. Session authentication
When you're logged into paperless in your browser, you're automatically
logged into the API as well and don't need to provide any authorization
headers.
3. Token authentication
Paperless also offers an endpoint to acquire authentication tokens.
POST a username and password as a form or json string to ``/api/token/``
and paperless will respond with a token, if the login data is correct.
This token can be used to authenticate other requests with the
following HTTP header:
.. code::
Authorization: Token <token>
Tokens can be managed and revoked in the paperless admin.
Searching for documents
#######################
Full text searching is available on the ``/api/documents/`` endpoint. Two specific
query parameters cause the API to return full text search results:
* ``/api/documents/?query=your%20search%20query``: Search for a document using a full text query.
For details on the syntax, see :ref:`basic-usage_searching`.
* ``/api/documents/?more_like=1234``: Search for documents similar to the document with id 1234.
Pagination works exactly the same as it does for normal requests on this endpoint.
Certain limitations apply to full text queries:
* Results are always sorted by search score. The results matching the query best will show up first.
* Only a small subset of filtering parameters are supported.
Furthermore, each returned document has an additional ``__search_hit__`` attribute with various information
about the search results:
.. code::
{
"count": 31,
"next": "http://localhost:8000/api/documents/?page=2&query=test",
"previous": null,
"results": [
...
{
"id": 123,
"title": "title",
"content": "content",
...
"__search_hit__": {
"score": 0.343,
"highlights": "text <span class=\"match\">Test</span> text",
"rank": 23
}
},
...
]
}
* ``score`` is an indication how well this document matches the query relative to the other search results.
* ``highlights`` is an excerpt from the document content and highlights the search terms with ``<span>`` tags as shown above.
* ``rank`` is the index of the search results. The first result will have rank 0.
``/api/search/autocomplete/``
=============================
Get auto completions for a partial search term.
Query parameters:
* ``term``: The incomplete term.
* ``limit``: Amount of results. Defaults to 10.
Results returned by the endpoint are ordered by importance of the term in the
document index. The first result is the term that has the highest Tf/Idf score
in the index.
.. code:: json
[
"term1",
"term3",
"term6",
"term4"
]
.. _api-file_uploads:
POSTing documents
#################
The API provides a special endpoint for file uploads:
``/api/documents/post_document/``
POST a multipart form to this endpoint, where the form field ``document`` contains
the document that you want to upload to paperless. The filename is sanitized and
then used to store the document in a temporary directory, and the consumer will
be instructed to consume the document from there.
The endpoint supports the following optional form fields:
* ``title``: Specify a title that the consumer should use for the document.
* ``correspondent``: Specify the ID of a correspondent that the consumer should use for the document.
* ``document_type``: Similar to correspondent.
* ``tags``: Similar to correspondent. Specify this multiple times to have multiple tags added
to the document.
The endpoint will immediately return "OK" if the document consumption process
was started successfully. No additional status information about the consumption
process itself is available, since that happens in a different process.
.. _api-versioning:
API Versioning
##############
The REST API is versioned since Paperless-ngx 1.3.0.
* Versioning ensures that changes to the API don't break older clients.
* Clients specify the specific version of the API they wish to use with every request and Paperless will handle the request using the specified API version.
* Even if the underlying data model changes, older API versions will always serve compatible data.
* If no version is specified, Paperless will serve version 1 to ensure compatibility with older clients that do not request a specific API version.
API versions are specified by submitting an additional HTTP ``Accept`` header with every request:
.. code::
Accept: application/json; version=6
If an invalid version is specified, Paperless 1.3.0 will respond with "406 Not Acceptable" and an error message in the body.
Earlier versions of Paperless will serve API version 1 regardless of whether a version is specified via the ``Accept`` header.
If a client wishes to verify whether it is compatible with any given server, the following procedure should be performed:
1. Perform an *authenticated* request against any API endpoint. If the server is on version 1.3.0 or newer, the server will
add two custom headers to the response:
.. code::
X-Api-Version: 2
X-Version: 1.3.0
2. Determine whether the client is compatible with this server based on the presence/absence of these headers and their values if present.
API Changelog
=============
Version 1
Uploading
---------
Initial API version.
Version 2
---------
* Added field ``Tag.color``. This read/write string field contains a hex color such as ``#a6cee3``.
* Added read-only field ``Tag.text_color``. This field contains the text color to use for a specific tag, which is either black or white depending on the brightness of ``Tag.color``.
* Removed field ``Tag.colour``.
File uploads in an API are hard and so far as I've been able to tell, there's
no standard way of accepting them, so rather than crowbar file uploads into the
REST API and endure that headache, I've left that process to a simple HTTP
POST, documented on the :ref:`consumption page <consumption-http>`.

File diff suppressed because it is too large Load Diff

159
docs/changelog.rst Normal file
View File

@@ -0,0 +1,159 @@
Changelog
#########
* 0.3.2
* Fix for #172: defaulting ALLOWED_HOSTS to ``["*"]`` and allowing the user
to set her own value via ``PAPERLESS_ALLOWED_HOSTS`` should the need
arise.
* 0.3.1
* Added a default value for ``CONVERT_BINARY``
* 0.3.0
* Updated to using django-filter 1.x
* Added some system checks so new users aren't confused by misconfigurations.
* Consumer loop time is now configurable for systems with slow writes. Just
set ``PAPERLESS_CONSUMER_LOOP_TIME`` to a number of seconds. The default
is 10.
* As per `#44`_, we've removed support for ``PAPERLESS_CONVERT``,
``PAPERLESS_CONSUME``, and ``PAPERLESS_SECRET``. Please use
``PAPERLESS_CONVERT_BINARY``, ``PAPERLESS_CONSUMPTION_DIR``, and
``PAPERLESS_SHARED_SECRET`` respectively instead.
* 0.2.0
* `#150`_: The media root is now a variable you can set in
``paperless.conf``.
* `#148`_: The database location (sqlite) is now a variable you can set in
``paperless.conf``.
* `#146`_: Fixed a bug that allowed unauthorised access to the `/fetch` URL.
* `#131`_: Document files are now automatically removed from disk when
they're deleted in Paperless.
* `#121`_: Fixed a bug where Paperless wasn't setting document creation time
based on the file naming scheme.
* `#81`_: Added a hook to run an arbitrary script after every document is
consumed.
* `#98`_: Added optional environment variables for ImageMagick so that it
doesn't explode when handling Very Large Documents or when it's just
running on a low-memory system. Thanks to `Florian Harr`_ for his help on
this one.
* `#89`_ Ported the auto-tagging code to correspondents as well. Thanks to
`Justin Snyman`_ for the pointers in the issue queue.
* Added support for guessing the date from the file name along with the
correspondent, title, and tags. Thanks to `Tikitu de Jager`_ for his pull
request that I took forever to merge and to `Pit`_ for his efforts on the
regex front.
* `#94`_: Restored support for changing the created date in the UI. Thanks
to `Martin Honermeyer`_ and `Tim White`_ for working with me on this.
* 0.1.1
* Potentially **Breaking Change**: All references to "sender" in the code
have been renamed to "correspondent" to better reflect the nature of the
property (one could quite reasonably scan a document before sending it to
someone.)
* `#67`_: Rewrote the document exporter and added a new importer that allows
for full metadata retention without depending on the file name and
modification time. A big thanks to `Tikitu de Jager`_, `Pit`_,
`Florian Jung`_, and `Christopher Luu`_ for their code snippets and
contributing conversation that lead to this change.
* `#20`_: Added *unpaper* support to help in cleaning up the scanned image
before it's OCR'd. Thanks to `Pit`_ for this one.
* `#71`_ Added (encrypted) thumbnails in anticipation of a proper UI.
* `#68`_: Added support for using a proper config file at
``/etc/paperless.conf`` and modified the systemd unit files to use it.
* Refactored the Vagrant installation process to use environment variables
rather than asking the user to modify ``settings.py``.
* `#44`_: Harmonise environment variable names with constant names.
* `#60`_: Setup logging to actually use the Python native logging framework.
* `#53`_: Fixed an annoying bug that caused ``.jpeg`` and ``.JPG`` images
to be imported but made unavailable.
* 0.1.0
* Docker support! Big thanks to `Wayne Werner`_, `Brian Conn`_, and
`Tikitu de Jager`_ for this one, and especially to `Pit`_
who spearheadded this effort.
* A simple REST API is in place, but it should be considered unstable.
* Cleaned up the consumer to use temporary directories instead of a single
scratch space. (Thanks `Pit`_)
* Improved the efficiency of the consumer by parsing pages more intelligently
and introducing a threaded OCR process (thanks again `Pit`_).
* `#45`_: Cleaned up the logic for tag matching. Reported by `darkmatter`_.
* `#47`_: Auto-rotate landscape documents. Reported by `Paul`_ and fixed by
`Pit`_.
* `#48`_: Matching algorithms should do so on a word boundary (`darkmatter`_)
* `#54`_: Documented the re-tagger (`zedster`_)
* `#57`_: Make sure file is preserved on import failure (`darkmatter`_)
* Added tox with pep8 checking
* 0.0.6
* Added support for parallel OCR (significant work from `Pit`_)
* Sped up the language detection (significant work from `Pit`_)
* Added simple logging
* 0.0.5
* Added support for image files as documents (png, jpg, gif, tiff)
* Added a crude means of HTTP POST for document imports
* Added IMAP mail support
* Added a re-tagging utility
* Documentation for the above as well as data migration
* 0.0.4
* Added automated tagging basted on keyword matching
* Cleaned up the document listing page
* Removed ``User`` and ``Group`` from the admin
* Added ``pytz`` to the list of requirements
* 0.0.3
* Added basic tagging
* 0.0.2
* Added language detection
* Added datestamps to ``document_exporter``.
* Changed ``settings.TESSERACT_LANGUAGE`` to ``settings.OCR_LANGUAGE``.
* 0.0.1
* Initial release
.. _Brian Conn: https://github.com/TheConnMan
.. _Christopher Luu: https://github.com/nuudles
.. _Florian Jung: https://github.com/the01
.. _Tikitu de Jager: https://github.com/tikitu
.. _Paul: https://github.com/polo2ro
.. _Pit: https://github.com/pitkley
.. _Wayne Werner: https://github.com/waynew
.. _darkmatter: https://github.com/darkmatter
.. _zedster: https://github.com/zedster
.. _Martin Honermeyer: https://github.com/djmaze
.. _Tim White: https://github.com/timwhite
.. _Florian Harr: https://github.com/evils
.. _Justin Snyman: https://github.com/stringlytyped
.. _#20: https://github.com/danielquinn/paperless/issues/20
.. _#44: https://github.com/danielquinn/paperless/issues/44
.. _#45: https://github.com/danielquinn/paperless/issues/45
.. _#47: https://github.com/danielquinn/paperless/issues/47
.. _#48: https://github.com/danielquinn/paperless/issues/48
.. _#53: https://github.com/danielquinn/paperless/issues/53
.. _#54: https://github.com/danielquinn/paperless/issues/54
.. _#57: https://github.com/danielquinn/paperless/issues/57
.. _#60: https://github.com/danielquinn/paperless/issues/60
.. _#67: https://github.com/danielquinn/paperless/issues/67
.. _#68: https://github.com/danielquinn/paperless/issues/68
.. _#71: https://github.com/danielquinn/paperless/issues/71
.. _#81: https://github.com/danielquinn/paperless/issues/81
.. _#89: https://github.com/danielquinn/paperless/issues/89
.. _#94: https://github.com/danielquinn/paperless/issues/94
.. _#98: https://github.com/danielquinn/paperless/issues/98
.. _#121: https://github.com/danielquinn/paperless/issues/121
.. _#131: https://github.com/danielquinn/paperless/issues/131
.. _#146: https://github.com/danielquinn/paperless/issues/146
.. _#148: https://github.com/danielquinn/paperless/pull/148
.. _#150: https://github.com/danielquinn/paperless/pull/150

View File

@@ -1,40 +1,64 @@
import sphinx_rtd_theme
# -*- coding: utf-8 -*-
#
# Paperless documentation build configuration file, created by
# sphinx-quickstart on Mon Oct 26 18:36:52 2015.
#
# This file is execfile()d with the current directory set to its
# containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.
import sys
import os
__version__ = None
__full_version_str__ = None
__major_minor_version_str__ = None
exec(open("../src/paperless/version.py").read())
# Believe it or not, this is the officially sanctioned way to add custom CSS.
def setup(app):
app.add_stylesheet("custom.css")
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#sys.path.insert(0, os.path.abspath('.'))
# -- General configuration ------------------------------------------------
# If your documentation needs a minimal Sphinx version, state it here.
#needs_sphinx = '1.0'
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
"sphinx.ext.autodoc",
"sphinx.ext.intersphinx",
"sphinx.ext.todo",
"sphinx.ext.imgmath",
"sphinx.ext.viewcode",
"sphinx_rtd_theme",
"myst_parser",
'sphinx.ext.autodoc',
'sphinx.ext.intersphinx',
'sphinx.ext.todo',
'sphinx.ext.pngmath',
'sphinx.ext.viewcode',
]
# Add any paths that contain templates here, relative to this directory.
templates_path = ["_templates"]
templates_path = ['_templates']
# The suffix of source filenames.
source_suffix = {
".rst": "restructuredtext",
".md": "markdown",
}
source_suffix = '.rst'
# The encoding of source files.
# source_encoding = 'utf-8-sig'
#source_encoding = 'utf-8-sig'
# The master toctree document.
master_doc = "index"
master_doc = 'index'
# General information about the project.
project = "Paperless-ngx"
copyright = "2015-2022, Daniel Quinn, Jonas Winkler, and the paperless-ngx team"
project = u'Paperless'
copyright = u'2015, Daniel Quinn'
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
@@ -47,190 +71,199 @@ copyright = "2015-2022, Daniel Quinn, Jonas Winkler, and the paperless-ngx team"
#
# The short X.Y version.
version = __major_minor_version_str__
version = ".".join([str(_) for _ in __version__[:2]])
# The full version, including alpha/beta/rc tags.
release = __full_version_str__
release = ".".join([str(_) for _ in __version__[:3]])
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
# language = None
#language = None
# There are two options for replacing |today|: either, you set today to some
# non-false value, then it is used:
# today = ''
#today = ''
# Else, today_fmt is used as the format for a strftime call.
# today_fmt = '%B %d, %Y'
#today_fmt = '%B %d, %Y'
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
exclude_patterns = ["_build"]
exclude_patterns = ['_build']
# The reST default role (used for this markup: `text`) to use for all
# documents.
# default_role = None
#default_role = None
# If true, '()' will be appended to :func: etc. cross-reference text.
# add_function_parentheses = True
#add_function_parentheses = True
# If true, the current module name will be prepended to all description
# unit titles (such as .. function::).
# add_module_names = True
#add_module_names = True
# If true, sectionauthor and moduleauthor directives will be shown in the
# output. They are ignored by default.
# show_authors = False
#show_authors = False
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = "sphinx"
pygments_style = 'sphinx'
# A list of ignored prefixes for module index sorting.
# modindex_common_prefix = []
#modindex_common_prefix = []
# If true, keep warnings as "system message" paragraphs in the built documents.
# keep_warnings = False
#keep_warnings = False
# -- Options for HTML output ----------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
html_theme = "sphinx_rtd_theme"
html_theme = 'default'
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
# html_theme_options = {}
#html_theme_options = {}
# Add any paths that contain custom themes here, relative to this directory.
html_theme_path = []
# The name for this set of Sphinx documents. If None, it defaults to
# "<project> v<release> documentation".
# html_title = None
#html_title = None
# A shorter title for the navigation bar. Default is the same as html_title.
# html_short_title = None
#html_short_title = None
# The name of an image file (relative to this directory) to place at the top
# of the sidebar.
# html_logo = None
#html_logo = None
# The name of an image file (within the static path) to use as favicon of the
# docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
# pixels large.
# html_favicon = None
#html_favicon = None
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ["_static"]
# These paths are either relative to html_static_path
# or fully qualified paths (eg. https://...)
html_css_files = [
"css/custom.css",
]
html_js_files = [
"js/darkmode.js",
]
html_static_path = ['_static']
# Add any extra paths that contain custom files (such as robots.txt or
# .htaccess) here, relative to this directory. These files are copied
# directly to the root of the documentation.
# html_extra_path = []
#html_extra_path = []
# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
# using the given strftime format.
# html_last_updated_fmt = '%b %d, %Y'
#html_last_updated_fmt = '%b %d, %Y'
# If true, SmartyPants will be used to convert quotes and dashes to
# typographically correct entities.
# html_use_smartypants = True
#html_use_smartypants = True
# Custom sidebar templates, maps document names to template names.
# html_sidebars = {}
#html_sidebars = {}
# Additional templates that should be rendered to pages, maps page names to
# template names.
# html_additional_pages = {}
#html_additional_pages = {}
# If false, no module index is generated.
# html_domain_indices = True
#html_domain_indices = True
# If false, no index is generated.
# html_use_index = True
#html_use_index = True
# If true, the index is split into individual pages for each letter.
# html_split_index = False
#html_split_index = False
# If true, links to the reST sources are added to the pages.
# html_show_sourcelink = True
#html_show_sourcelink = True
# If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
# html_show_sphinx = True
#html_show_sphinx = True
# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
# html_show_copyright = True
#html_show_copyright = True
# If true, an OpenSearch description file will be output, and all pages will
# contain a <link> tag referring to it. The value of this option must be the
# base URL from which the finished HTML is served.
# html_use_opensearch = ''
#html_use_opensearch = ''
# This is the file name suffix for HTML files (e.g. ".xhtml").
# html_file_suffix = None
#html_file_suffix = None
# Output file base name for HTML help builder.
htmlhelp_basename = "paperless"
htmlhelp_basename = 'paperless'
#
# Attempt to use the ReadTheDocs theme. If it's not installed, fallback to
# the default.
#
try:
import sphinx_rtd_theme
html_theme = "sphinx_rtd_theme"
html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
except ImportError:
pass
# -- Options for LaTeX output ---------------------------------------------
latex_elements = {
# The paper size ('letterpaper' or 'a4paper').
#'papersize': 'letterpaper',
# The font size ('10pt', '11pt' or '12pt').
#'pointsize': '10pt',
# Additional stuff for the LaTeX preamble.
#'preamble': '',
# The paper size ('letterpaper' or 'a4paper').
#'papersize': 'letterpaper',
# The font size ('10pt', '11pt' or '12pt').
#'pointsize': '10pt',
# Additional stuff for the LaTeX preamble.
#'preamble': '',
}
# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
("index", "paperless.tex", "Paperless Documentation", "Daniel Quinn", "manual"),
('index', 'paperless.tex', u'Paperless Documentation',
u'Daniel Quinn', 'manual'),
]
# The name of an image file (relative to this directory) to place at the top of
# the title page.
# latex_logo = None
#latex_logo = None
# For "manual" documents, if this is true, then toplevel headings are parts,
# not chapters.
# latex_use_parts = False
#latex_use_parts = False
# If true, show page references after internal links.
# latex_show_pagerefs = False
#latex_show_pagerefs = False
# If true, show URL addresses after external links.
# latex_show_urls = False
#latex_show_urls = False
# Documents to append as an appendix to all manuals.
# latex_appendices = []
#latex_appendices = []
# If false, no module index is generated.
# latex_domain_indices = True
#latex_domain_indices = True
# -- Options for manual page output ---------------------------------------
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [("index", "paperless", "Paperless Documentation", ["Daniel Quinn"], 1)]
man_pages = [
('index', 'paperless', u'Paperless Documentation',
[u'Daniel Quinn'], 1)
]
# If true, show URL addresses after external links.
# man_show_urls = False
#man_show_urls = False
# -- Options for Texinfo output -------------------------------------------
@@ -239,99 +272,93 @@ man_pages = [("index", "paperless", "Paperless Documentation", ["Daniel Quinn"],
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
(
"index",
"Paperless",
"Paperless Documentation",
"Daniel Quinn",
"paperless",
"Scan, index, and archive all of your paper documents.",
"Miscellaneous",
),
('index', 'Paperless', u'Paperless Documentation',
u'Daniel Quinn', 'paperless', 'Scan, index, and archive all of your paper documents.',
'Miscellaneous'),
]
# Documents to append as an appendix to all manuals.
# texinfo_appendices = []
#texinfo_appendices = []
# If false, no module index is generated.
# texinfo_domain_indices = True
#texinfo_domain_indices = True
# How to display URL addresses: 'footnote', 'no', or 'inline'.
# texinfo_show_urls = 'footnote'
#texinfo_show_urls = 'footnote'
# If true, do not generate a @detailmenu in the "Top" node's menu.
# texinfo_no_detailmenu = False
#texinfo_no_detailmenu = False
# -- Options for Epub output ----------------------------------------------
# Bibliographic Dublin Core info.
epub_title = "Paperless"
epub_author = "Daniel Quinn"
epub_publisher = "Daniel Quinn"
epub_copyright = "2015, Daniel Quinn"
epub_title = u'Paperless'
epub_author = u'Daniel Quinn'
epub_publisher = u'Daniel Quinn'
epub_copyright = u'2015, Daniel Quinn'
# The basename for the epub file. It defaults to the project name.
# epub_basename = u'Paperless'
#epub_basename = u'Paperless'
# The HTML theme for the epub output. Since the default themes are not optimized
# for small screen space, using the same theme for HTML and epub output is
# usually not wise. This defaults to 'epub', a theme designed to save visual
# space.
# epub_theme = 'epub'
#epub_theme = 'epub'
# The language of the text. It defaults to the language option
# or en if the language is not set.
# epub_language = ''
#epub_language = ''
# The scheme of the identifier. Typical schemes are ISBN or URL.
# epub_scheme = ''
#epub_scheme = ''
# The unique identifier of the text. This can be a ISBN number
# or the project homepage.
# epub_identifier = ''
#epub_identifier = ''
# A unique identification for the text.
# epub_uid = ''
#epub_uid = ''
# A tuple containing the cover image and cover page html template filenames.
# epub_cover = ()
#epub_cover = ()
# A sequence of (type, uri, title) tuples for the guide element of content.opf.
# epub_guide = ()
#epub_guide = ()
# HTML files that should be inserted before the pages created by sphinx.
# The format is a list of tuples containing the path and title.
# epub_pre_files = []
#epub_pre_files = []
# HTML files shat should be inserted after the pages created by sphinx.
# The format is a list of tuples containing the path and title.
# epub_post_files = []
#epub_post_files = []
# A list of files that should not be packed into the epub file.
epub_exclude_files = ["search.html"]
epub_exclude_files = ['search.html']
# The depth of the table of contents in toc.ncx.
# epub_tocdepth = 3
#epub_tocdepth = 3
# Allow duplicate toc entries.
# epub_tocdup = True
#epub_tocdup = True
# Choose between 'default' and 'includehidden'.
# epub_tocscope = 'default'
#epub_tocscope = 'default'
# Fix unsupported image types using the PIL.
# epub_fix_images = False
#epub_fix_images = False
# Scale large images.
# epub_max_image_width = 0
#epub_max_image_width = 0
# How to display URL addresses: 'footnote', 'no', or 'inline'.
# epub_show_urls = 'inline'
#epub_show_urls = 'inline'
# If false, no index is generated.
# epub_use_index = True
#epub_use_index = True
# Example configuration for intersphinx: refer to the Python standard library.
intersphinx_mapping = {"http://docs.python.org/": None}
intersphinx_mapping = {'http://docs.python.org/': None}

View File

@@ -1,851 +0,0 @@
.. _configuration:
*************
Configuration
*************
Paperless provides a wide range of customizations.
Depending on how you run paperless, these settings have to be defined in different
places.
* If you run paperless on docker, ``paperless.conf`` is not used. Rather, configure
paperless by copying necessary options to ``docker-compose.env``.
* If you are running paperless on anything else, paperless will search for the
configuration file in these locations and use the first one it finds:
.. code::
/path/to/paperless/paperless.conf
/etc/paperless.conf
/usr/local/etc/paperless.conf
Required services
#################
PAPERLESS_REDIS=<url>
This is required for processing scheduled tasks such as email fetching, index
optimization and for training the automatic document matcher.
Defaults to redis://localhost:6379.
PAPERLESS_DBHOST=<hostname>
By default, sqlite is used as the database backend. This can be changed here.
Set PAPERLESS_DBHOST and PostgreSQL will be used instead of mysql.
PAPERLESS_DBPORT=<port>
Adjust port if necessary.
Default is 5432.
PAPERLESS_DBNAME=<name>
Database name in PostgreSQL.
Defaults to "paperless".
PAPERLESS_DBUSER=<name>
Database user in PostgreSQL.
Defaults to "paperless".
PAPERLESS_DBPASS=<password>
Database password for PostgreSQL.
Defaults to "paperless".
PAPERLESS_DBSSLMODE=<mode>
SSL mode to use when connecting to PostgreSQL.
See `the official documentation about sslmode <https://www.postgresql.org/docs/current/libpq-ssl.html>`_.
Default is ``prefer``.
Paths and folders
#################
PAPERLESS_CONSUMPTION_DIR=<path>
This where your documents should go to be consumed. Make sure that it exists
and that the user running the paperless service can read/write its contents
before you start Paperless.
Don't change this when using docker, as it only changes the path within the
container. Change the local consumption directory in the docker-compose.yml
file instead.
Defaults to "../consume/", relative to the "src" directory.
PAPERLESS_DATA_DIR=<path>
This is where paperless stores all its data (search index, SQLite database,
classification model, etc).
Defaults to "../data/", relative to the "src" directory.
PAPERLESS_TRASH_DIR=<path>
Instead of removing deleted documents, they are moved to this directory.
This must be writeable by the user running paperless. When running inside
docker, ensure that this path is within a permanent volume (such as
"../media/trash") so it won't get lost on upgrades.
Defaults to empty (i.e. really delete documents).
PAPERLESS_MEDIA_ROOT=<path>
This is where your documents and thumbnails are stored.
You can set this and PAPERLESS_DATA_DIR to the same folder to have paperless
store all its data within the same volume.
Defaults to "../media/", relative to the "src" directory.
PAPERLESS_STATICDIR=<path>
Override the default STATIC_ROOT here. This is where all static files
created using "collectstatic" manager command are stored.
Unless you're doing something fancy, there is no need to override this.
Defaults to "../static/", relative to the "src" directory.
PAPERLESS_FILENAME_FORMAT=<format>
Changes the filenames paperless uses to store documents in the media directory.
See :ref:`advanced-file_name_handling` for details.
Default is none, which disables this feature.
PAPERLESS_LOGGING_DIR=<path>
This is where paperless will store log files.
Defaults to "``PAPERLESS_DATA_DIR``/log/".
Logging
#######
PAPERLESS_LOGROTATE_MAX_SIZE=<num>
Maximum file size for log files before they are rotated, in bytes.
Defaults to 1 MiB.
PAPERLESS_LOGROTATE_MAX_BACKUPS=<num>
Number of rotated log files to keep.
Defaults to 20.
.. _hosting-and-security:
Hosting & Security
##################
PAPERLESS_SECRET_KEY=<key>
Paperless uses this to make session tokens. If you expose paperless on the
internet, you need to change this, since the default secret is well known.
Use any sequence of characters. The more, the better. You don't need to
remember this. Just face-roll your keyboard.
Default is listed in the file ``src/paperless/settings.py``.
PAPERLESS_URL=<url>
This setting can be used to set the three options below (ALLOWED_HOSTS,
CORS_ALLOWED_HOSTS and CSRF_TRUSTED_ORIGINS). If the other options are
set the values will be combined with this one. Do not include a trailing
slash. E.g. https://paperless.domain.com
Defaults to empty string, leaving the other settings unaffected.
PAPERLESS_CSRF_TRUSTED_ORIGINS=<comma-separated-list>
A list of trusted origins for unsafe requests (e.g. POST). As of Django 4.0
this is required to access the Django admin via the web.
See https://docs.djangoproject.com/en/4.0/ref/settings/#csrf-trusted-origins
Can also be set using PAPERLESS_URL (see above).
Defaults to empty string, which does not add any origins to the trusted list.
PAPERLESS_ALLOWED_HOSTS=<comma-separated-list>
If you're planning on putting Paperless on the open internet, then you
really should set this value to the domain name you're using. Failing to do
so leaves you open to HTTP host header attacks:
https://docs.djangoproject.com/en/3.1/topics/security/#host-header-validation
Just remember that this is a comma-separated list, so "example.com" is fine,
as is "example.com,www.example.com", but NOT " example.com" or "example.com,"
Can also be set using PAPERLESS_URL (see above).
If manually set, please remember to include "localhost". Otherwise docker
healthcheck will fail.
Defaults to "*", which is all hosts.
PAPERLESS_CORS_ALLOWED_HOSTS=<comma-separated-list>
You need to add your servers to the list of allowed hosts that can do CORS
calls. Set this to your public domain name.
Can also be set using PAPERLESS_URL (see above).
Defaults to "http://localhost:8000".
PAPERLESS_FORCE_SCRIPT_NAME=<path>
To host paperless under a subpath url like example.com/paperless you set
this value to /paperless. No trailing slash!
Defaults to none, which hosts paperless at "/".
PAPERLESS_STATIC_URL=<path>
Override the STATIC_URL here. Unless you're hosting Paperless off a
subdomain like /paperless/, you probably don't need to change this.
Defaults to "/static/".
PAPERLESS_AUTO_LOGIN_USERNAME=<username>
Specify a username here so that paperless will automatically perform login
with the selected user.
.. danger::
Do not use this when exposing paperless on the internet. There are no
checks in place that would prevent you from doing this.
Defaults to none, which disables this feature.
PAPERLESS_ADMIN_USER=<username>
If this environment variable is specified, Paperless automatically creates
a superuser with the provided username at start. This is useful in cases
where you can not run the `createsuperuser` command separately, such as Kubernetes
or AWS ECS.
Requires `PAPERLESS_ADMIN_PASSWORD` to be set.
.. note::
This will not change an existing [super]user's password, nor will
it recreate a user that already exists. You can leave this throughout
the lifecycle of the containers.
PAPERLESS_ADMIN_MAIL=<email>
(Optional) Specify superuser email address. Only used when
`PAPERLESS_ADMIN_USER` is set.
Defaults to ``root@localhost``.
PAPERLESS_ADMIN_PASSWORD=<password>
Only used when `PAPERLESS_ADMIN_USER` is set.
This will be the password of the automatically created superuser.
PAPERLESS_COOKIE_PREFIX=<str>
Specify a prefix that is added to the cookies used by paperless to identify
the currently logged in user. This is useful for when you're running two
instances of paperless on the same host.
After changing this, you will have to login again.
Defaults to ``""``, which does not alter the cookie names.
PAPERLESS_ENABLE_HTTP_REMOTE_USER=<bool>
Allows authentication via HTTP_REMOTE_USER which is used by some SSO
applications.
.. warning::
This will allow authentication by simply adding a ``Remote-User: <username>`` header
to a request. Use with care! You especially *must* ensure that any such header is not
passed from your proxy server to paperless.
If you're exposing paperless to the internet directly, do not use this.
Also see the warning `in the official documentation <https://docs.djangoproject.com/en/3.1/howto/auth-remote-user/#configuration>`.
Defaults to `false` which disables this feature.
PAPERLESS_HTTP_REMOTE_USER_HEADER_NAME=<str>
If `PAPERLESS_ENABLE_HTTP_REMOTE_USER` is enabled, this property allows to
customize the name of the HTTP header from which the authenticated username
is extracted. Values are in terms of
[HttpRequest.META](https://docs.djangoproject.com/en/3.1/ref/request-response/#django.http.HttpRequest.META).
Thus, the configured value must start with `HTTP_` followed by the
normalized actual header name.
Defaults to `HTTP_REMOTE_USER`.
PAPERLESS_LOGOUT_REDIRECT_URL=<str>
URL to redirect the user to after a logout. This can be used together with
`PAPERLESS_ENABLE_HTTP_REMOTE_USER` to redirect the user back to the SSO
application's logout page.
Defaults to None, which disables this feature.
.. _configuration-ocr:
OCR settings
############
Paperless uses `OCRmyPDF <https://ocrmypdf.readthedocs.io/en/latest/>`_ for
performing OCR on documents and images. Paperless uses sensible defaults for
most settings, but all of them can be configured to your needs.
PAPERLESS_OCR_LANGUAGE=<lang>
Customize the language that paperless will attempt to use when
parsing documents.
It should be a 3-letter language code consistent with ISO
639: https://www.loc.gov/standards/iso639-2/php/code_list.php
Set this to the language most of your documents are written in.
This can be a combination of multiple languages such as ``deu+eng``,
in which case tesseract will use whatever language matches best.
Keep in mind that tesseract uses much more cpu time with multiple
languages enabled.
Defaults to "eng".
Note: If your language contains a '-' such as chi-sim, you must use chi_sim
PAPERLESS_OCR_MODE=<mode>
Tell paperless when and how to perform ocr on your documents. Four modes
are available:
* ``skip``: Paperless skips all pages and will perform ocr only on pages
where no text is present. This is the safest option.
* ``skip_noarchive``: In addition to skip, paperless won't create an
archived version of your documents when it finds any text in them.
This is useful if you don't want to have two almost-identical versions
of your digital documents in the media folder. This is the fastest option.
* ``redo``: Paperless will OCR all pages of your documents and attempt to
replace any existing text layers with new text. This will be useful for
documents from scanners that already performed OCR with insufficient
results. It will also perform OCR on purely digital documents.
This option may fail on some documents that have features that cannot
be removed, such as forms. In this case, the text from the document is
used instead.
* ``force``: Paperless rasterizes your documents, converting any text
into images and puts the OCRed text on top. This works for all documents,
however, the resulting document may be significantly larger and text
won't appear as sharp when zoomed in.
The default is ``skip``, which only performs OCR when necessary and always
creates archived documents.
Read more about this in the `OCRmyPDF documentation <https://ocrmypdf.readthedocs.io/en/latest/advanced.html#when-ocr-is-skipped>`_.
PAPERLESS_OCR_CLEAN=<mode>
Tells paperless to use ``unpaper`` to clean any input document before
sending it to tesseract. This uses more resources, but generally results
in better OCR results. The following modes are available:
* ``clean``: Apply unpaper.
* ``clean-final``: Apply unpaper, and use the cleaned images to build the
output file instead of the original images.
* ``none``: Do not apply unpaper.
Defaults to ``clean``.
.. note::
``clean-final`` is incompatible with ocr mode ``redo``. When both
``clean-final`` and the ocr mode ``redo`` is configured, ``clean``
is used instead.
PAPERLESS_OCR_DESKEW=<bool>
Tells paperless to correct skewing (slight rotation of input images mainly
due to improper scanning)
Defaults to ``true``, which enables this feature.
.. note::
Deskewing is incompatible with ocr mode ``redo``. Deskewing will get
disabled automatically if ``redo`` is used as the ocr mode.
PAPERLESS_OCR_ROTATE_PAGES=<bool>
Tells paperless to correct page rotation (90°, 180° and 270° rotation).
If you notice that paperless is not rotating incorrectly rotated
pages (or vice versa), try adjusting the threshold up or down (see below).
Defaults to ``true``, which enables this feature.
PAPERLESS_OCR_ROTATE_PAGES_THRESHOLD=<num>
Adjust the threshold for automatic page rotation by ``PAPERLESS_OCR_ROTATE_PAGES``.
This is an arbitrary value reported by tesseract. "15" is a very conservative value,
whereas "2" is a very aggressive option and will often result in correctly rotated pages
being rotated as well.
Defaults to "12".
PAPERLESS_OCR_OUTPUT_TYPE=<type>
Specify the the type of PDF documents that paperless should produce.
* ``pdf``: Modify the PDF document as little as possible.
* ``pdfa``: Convert PDF documents into PDF/A-2b documents, which is a
subset of the entire PDF specification and meant for storing
documents long term.
* ``pdfa-1``, ``pdfa-2``, ``pdfa-3`` to specify the exact version of
PDF/A you wish to use.
If not specified, ``pdfa`` is used. Remember that paperless also keeps
the original input file as well as the archived version.
PAPERLESS_OCR_PAGES=<num>
Tells paperless to use only the specified amount of pages for OCR. Documents
with less than the specified amount of pages get OCR'ed completely.
Specifying 1 here will only use the first page.
When combined with ``PAPERLESS_OCR_MODE=redo`` or ``PAPERLESS_OCR_MODE=force``,
paperless will not modify any text it finds on excluded pages and copy it
verbatim.
Defaults to 0, which disables this feature and always uses all pages.
PAPERLESS_OCR_IMAGE_DPI=<num>
Paperless will OCR any images you put into the system and convert them
into PDF documents. This is useful if your scanner produces images.
In order to do so, paperless needs to know the DPI of the image.
Most images from scanners will have this information embedded and
paperless will detect and use that information. In case this fails, it
uses this value as a fallback.
Set this to the DPI your scanner produces images at.
Default is none, which will automatically calculate image DPI so that
the produced PDF documents are A4 sized.
PAPERLESS_OCR_MAX_IMAGE_PIXELS=<num>
Paperless will not OCR images that have more pixels than this limit.
This is intended to prevent decompression bombs from overloading paperless.
Increasing this limit is desired if you face a DecompressionBombError despite
the concerning file not being malicious; this could e.g. be caused by invalidly
recognized metadata.
If you have enough resources or if you are certain that your uploaded files
are not malicious you can increase this value to your needs.
The default value is 256000000, an image with more pixels than that would not be parsed.
PAPERLESS_OCR_USER_ARGS=<json>
OCRmyPDF offers many more options. Use this parameter to specify any
additional arguments you wish to pass to OCRmyPDF. Since Paperless uses
the API of OCRmyPDF, you have to specify these in a format that can be
passed to the API. See `the API reference of OCRmyPDF <https://ocrmypdf.readthedocs.io/en/latest/api.html#reference>`_
for valid parameters. All command line options are supported, but they
use underscores instead of dashes.
.. caution::
Paperless has been tested to work with the OCR options provided
above. There are many options that are incompatible with each other,
so specifying invalid options may prevent paperless from consuming
any documents.
Specify arguments as a JSON dictionary. Keep note of lower case booleans
and double quoted parameter names and strings. Examples:
.. code:: json
{"deskew": true, "optimize": 3, "unpaper_args": "--pre-rotate 90"}
.. _configuration-tika:
Tika settings
#############
Paperless can make use of `Tika <https://tika.apache.org/>`_ and
`Gotenberg <https://gotenberg.dev/>`_ for parsing and
converting "Office" documents (such as ".doc", ".xlsx" and ".odt"). If you
wish to use this, you must provide a Tika server and a Gotenberg server,
configure their endpoints, and enable the feature.
PAPERLESS_TIKA_ENABLED=<bool>
Enable (or disable) the Tika parser.
Defaults to false.
PAPERLESS_TIKA_ENDPOINT=<url>
Set the endpoint URL were Paperless can reach your Tika server.
Defaults to "http://localhost:9998".
PAPERLESS_TIKA_GOTENBERG_ENDPOINT=<url>
Set the endpoint URL were Paperless can reach your Gotenberg server.
Defaults to "http://localhost:3000".
If you run paperless on docker, you can add those services to the docker-compose
file (see the provided ``docker-compose.sqlite-tika.yml`` file for reference). The changes
requires are as follows:
.. code:: yaml
services:
# ...
webserver:
# ...
environment:
# ...
PAPERLESS_TIKA_ENABLED: 1
PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
PAPERLESS_TIKA_ENDPOINT: http://tika:9998
# ...
gotenberg:
image: gotenberg/gotenberg:7.4
restart: unless-stopped
command:
- "gotenberg"
- "--chromium-disable-routes=true"
tika:
image: ghcr.io/paperless-ngx/tika:latest
restart: unless-stopped
Add the configuration variables to the environment of the webserver (alternatively
put the configuration in the ``docker-compose.env`` file) and add the additional
services below the webserver service. Watch out for indentation.
Make sure to use the correct format `PAPERLESS_TIKA_ENABLED = 1` so python_dotenv can parse the statement correctly.
Software tweaks
###############
PAPERLESS_TASK_WORKERS=<num>
Paperless does multiple things in the background: Maintain the search index,
maintain the automatic matching algorithm, check emails, consume documents,
etc. This variable specifies how many things it will do in parallel.
PAPERLESS_THREADS_PER_WORKER=<num>
Furthermore, paperless uses multiple threads when consuming documents to
speed up OCR. This variable specifies how many pages paperless will process
in parallel on a single document.
.. caution::
Ensure that the product
PAPERLESS_TASK_WORKERS * PAPERLESS_THREADS_PER_WORKER
does not exceed your CPU core count or else paperless will be extremely slow.
If you want paperless to process many documents in parallel, choose a high
worker count. If you want paperless to process very large documents faster,
use a higher thread per worker count.
The default is a balance between the two, according to your CPU core count,
with a slight favor towards threads per worker:
+----------------+---------+---------+
| CPU core count | Workers | Threads |
+----------------+---------+---------+
| 1 | 1 | 1 |
+----------------+---------+---------+
| 2 | 2 | 1 |
+----------------+---------+---------+
| 4 | 2 | 2 |
+----------------+---------+---------+
| 6 | 2 | 3 |
+----------------+---------+---------+
| 8 | 2 | 4 |
+----------------+---------+---------+
| 12 | 3 | 4 |
+----------------+---------+---------+
| 16 | 4 | 4 |
+----------------+---------+---------+
If you only specify PAPERLESS_TASK_WORKERS, paperless will adjust
PAPERLESS_THREADS_PER_WORKER automatically.
PAPERLESS_WORKER_TIMEOUT=<num>
Machines with few cores or weak ones might not be able to finish OCR on
large documents within the default 1800 seconds. So extending this timeout
may prove to be useful on weak hardware setups.
PAPERLESS_WORKER_RETRY=<num>
If PAPERLESS_WORKER_TIMEOUT has been configured, the retry time for a task can
also be configured. By default, this value will be set to 10s more than the
worker timeout. This value should never be set less than the worker timeout.
PAPERLESS_TIME_ZONE=<timezone>
Set the time zone here.
See https://docs.djangoproject.com/en/3.1/ref/settings/#std:setting-TIME_ZONE
for details on how to set it.
Defaults to UTC.
.. _configuration-polling:
PAPERLESS_CONSUMER_POLLING=<num>
If paperless won't find documents added to your consume folder, it might
not be able to automatically detect filesystem changes. In that case,
specify a polling interval in seconds here, which will then cause paperless
to periodically check your consumption directory for changes. This will also
disable listening for file system changes with ``inotify``.
Defaults to 0, which disables polling and uses filesystem notifications.
PAPERLESS_CONSUMER_DELETE_DUPLICATES=<bool>
When the consumer detects a duplicate document, it will not touch the
original document. This default behavior can be changed here.
Defaults to false.
PAPERLESS_CONSUMER_RECURSIVE=<bool>
Enable recursive watching of the consumption directory. Paperless will
then pickup files from files in subdirectories within your consumption
directory as well.
Defaults to false.
PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS=<bool>
Set the names of subdirectories as tags for consumed files.
E.g. <CONSUMPTION_DIR>/foo/bar/file.pdf will add the tags "foo" and "bar" to
the consumed file. Paperless will create any tags that don't exist yet.
This is useful for sorting documents with certain tags such as ``car`` or
``todo`` prior to consumption. These folders won't be deleted.
PAPERLESS_CONSUMER_RECURSIVE must be enabled for this to work.
Defaults to false.
PAPERLESS_CONSUMER_ENABLE_BARCODES=<bool>
Enables the scanning and page separation based on detected barcodes.
This allows for scanning and adding multiple documents per uploaded
file, which are separated by one or multiple barcode pages.
For ease of use, it is suggested to use a standardized separation page,
e.g. `here <https://www.alliancegroup.co.uk/patch-codes.htm>`_.
If no barcodes are detected in the uploaded file, no page separation
will happen.
The original document will be removed and the separated pages will be
saved as pdf.
Defaults to false.
PAPERLESS_CONSUMER_BARCODE_TIFF_SUPPORT=<bool>
Whether TIFF image files should be scanned for barcodes.
This will automatically convert any TIFF image(s) to pdfs for later
processing.
This only has an effect, if PAPERLESS_CONSUMER_ENABLE_BARCODES has been
enabled.
Defaults to false.
PAPERLESS_CONSUMER_BARCODE_STRING=PATCHT
Defines the string to be detected as a separator barcode.
If paperless is used with the PATCH-T separator pages, users
shouldn't change this.
Defaults to "PATCHT"
PAPERLESS_CONVERT_MEMORY_LIMIT=<num>
On smaller systems, or even in the case of Very Large Documents, the consumer
may explode, complaining about how it's "unable to extend pixel cache". In
such cases, try setting this to a reasonably low value, like 32. The
default is to use whatever is necessary to do everything without writing to
disk, and units are in megabytes.
For more information on how to use this value, you should search
the web for "MAGICK_MEMORY_LIMIT".
Defaults to 0, which disables the limit.
PAPERLESS_CONVERT_TMPDIR=<path>
Similar to the memory limit, if you've got a small system and your OS mounts
/tmp as tmpfs, you should set this to a path that's on a physical disk, like
/home/your_user/tmp or something. ImageMagick will use this as scratch space
when crunching through very large documents.
For more information on how to use this value, you should search
the web for "MAGICK_TMPDIR".
Default is none, which disables the temporary directory.
PAPERLESS_OPTIMIZE_THUMBNAILS=<bool>
Use optipng to optimize thumbnails. This usually reduces the size of
thumbnails by about 20%, but uses considerable compute time during
consumption.
Defaults to true.
PAPERLESS_POST_CONSUME_SCRIPT=<filename>
After a document is consumed, Paperless can trigger an arbitrary script if
you like. This script will be passed a number of arguments for you to work
with. For more information, take a look at :ref:`advanced-post_consume_script`.
The default is blank, which means nothing will be executed.
PAPERLESS_FILENAME_DATE_ORDER=<format>
Paperless will check the document text for document date information.
Use this setting to enable checking the document filename for date
information. The date order can be set to any option as specified in
https://dateparser.readthedocs.io/en/latest/settings.html#date-order.
The filename will be checked first, and if nothing is found, the document
text will be checked as normal.
Defaults to none, which disables this feature.
PAPERLESS_THUMBNAIL_FONT_NAME=<filename>
Paperless creates thumbnails for plain text files by rendering the content
of the file on an image and uses a predefined font for that. This
font can be changed here.
Note that this won't have any effect on already generated thumbnails.
Defaults to ``/usr/share/fonts/liberation/LiberationSerif-Regular.ttf``.
PAPERLESS_IGNORE_DATES=<string>
Paperless parses a documents creation date from filename and file content.
You may specify a comma separated list of dates that should be ignored during
this process. This is useful for special dates (like date of birth) that appear
in documents regularly but are very unlikely to be the documents creation date.
You may specify dates in a multitude of formats supported by dateparser (see
https://dateparser.readthedocs.io/en/latest/#popular-formats) but as the dates
need to be comma separated, the options are limited.
Example: "2020-12-02,22.04.1999"
Defaults to an empty string to not ignore any dates.
PAPERLESS_DATE_ORDER=<format>
Paperless will try to determine the document creation date from its contents.
Specify the date format Paperless should expect to see within your documents.
This option defaults to DMY which translates to day first, month second, and year
last order. Characters D, M, or Y can be shuffled to meet the required order.
PAPERLESS_CONSUMER_IGNORE_PATTERNS=<json>
By default, paperless ignores certain files and folders in the consumption
directory, such as system files created by the Mac OS.
This can be adjusted by configuring a custom json array with patterns to exclude.
Defaults to ``[".DS_STORE/*", "._*", ".stfolder/*", ".stversions/*", ".localized/*", "desktop.ini"]``.
Binaries
########
There are a few external software packages that Paperless expects to find on
your system when it starts up. Unless you've done something creative with
their installation, you probably won't need to edit any of these. However,
if you've installed these programs somewhere where simply typing the name of
the program doesn't automatically execute it (ie. the program isn't in your
$PATH), then you'll need to specify the literal path for that program.
PAPERLESS_CONVERT_BINARY=<path>
Defaults to "/usr/bin/convert".
PAPERLESS_GS_BINARY=<path>
Defaults to "/usr/bin/gs".
PAPERLESS_OPTIPNG_BINARY=<path>
Defaults to "/usr/bin/optipng".
.. _configuration-docker:
Docker-specific options
#######################
These options don't have any effect in ``paperless.conf``. These options adjust
the behavior of the docker container. Configure these in `docker-compose.env`.
PAPERLESS_WEBSERVER_WORKERS=<num>
The number of worker processes the webserver should spawn. More worker processes
usually result in the front end to load data much quicker. However, each worker process
also loads the entire application into memory separately, so increasing this value
will increase RAM usage.
Consider configuring this to 1 on low power devices with limited amount of RAM.
Defaults to 2.
PAPERLESS_PORT=<port>
The port number the webserver will listen on inside the container. There are
special setups where you may need this to avoid collisions with other
services (like using podman with multiple containers in one pod).
Don't change this when using Docker. To change the port the webserver is
reachable outside of the container, instead refer to the "ports" key in
``docker-compose.yml``.
Defaults to 8000.
USERMAP_UID=<uid>
The ID of the paperless user in the container. Set this to your actual user ID on the
host system, which you can get by executing
.. code:: shell-session
$ id -u
Paperless will change ownership on its folders to this user, so you need to get this right
in order to be able to write to the consumption directory.
Defaults to 1000.
USERMAP_GID=<gid>
The ID of the paperless Group in the container. Set this to your actual group ID on the
host system, which you can get by executing
.. code:: shell-session
$ id -g
Paperless will change ownership on its folders to this group, so you need to get this right
in order to be able to write to the consumption directory.
Defaults to 1000.
PAPERLESS_OCR_LANGUAGES=<list>
Additional OCR languages to install. By default, paperless comes with
English, German, Italian, Spanish and French. If your language is not in this list, install
additional languages with this configuration option:
.. code:: bash
PAPERLESS_OCR_LANGUAGES=tur ces
To actually use these languages, also set the default OCR language of paperless:
.. code:: bash
PAPERLESS_OCR_LANGUAGE=tur
Defaults to none, which does not install any additional languages.
.. _configuration-update-checking:
Update Checking
###############
PAPERLESS_ENABLE_UPDATE_CHECK=<bool>
Enable (or disable) the automatic check for available updates. This feature is disabled
by default but if it is not explicitly set Paperless-ngx will show a message about this.
If enabled, the feature works by pinging the the Github API for the latest release e.g.
https://api.github.com/repos/paperless-ngx/paperless-ngx/releases/latest
to determine whether a new version is available.
Actual updating of the app must still be performed manually.
Note that for users of thirdy-party containers e.g. linuxserver.io this notification
may be 'ahead' of a new release from the third-party maintainers.
In either case, no tracking data is collected by the app in any way.
Defaults to none, which disables the feature.

189
docs/consumption.rst Normal file
View File

@@ -0,0 +1,189 @@
.. _consumption:
Consumption
###########
Once you've got Paperless setup, you need to start feeding documents into it.
Currently, there are three options: the consumption directory, IMAP (email), and
HTTP POST.
.. _consumption-directory:
The Consumption Directory
=========================
The primary method of getting documents into your database is by putting them in
the consumption directory. The ``document_consumer`` script runs in an infinite
loop looking for new additions to this directory and when it finds them, it goes
about the process of parsing them with the OCR, indexing what it finds, and
encrypting the PDF, storing it in the media directory.
Getting stuff into this directory is up to you. If you're running Paperless
on your local computer, you might just want to drag and drop files there, but if
you're running this on a server and want your scanner to automatically push
files to this directory, you'll need to setup some sort of service to accept the
files from the scanner. Typically, you're looking at an FTP server like
`Proftpd`_ or `Samba`_.
.. _Proftpd: http://www.proftpd.org/
.. _Samba: http://www.samba.org/
So where is this consumption directory? It's wherever you define it. Look for
the ``CONSUMPTION_DIR`` value in ``settings.py``. Set that to somewhere
appropriate for your use and put some documents in there. When you're ready,
follow the :ref:`consumer <utilities-consumer>` instructions to get it running.
.. _consumption-directory-hook:
Hooking into the Consumption Process
------------------------------------
Sometimes you may want to do something arbitrary whenever a document is
consumed. Rather than try to predict what you may want to do, Paperless lets
you execute scripts of your own choosing just before or after a document is
consumed using a couple simple hooks.
Just write a script, put it somewhere that Paperless can read & execute, and
then put the path to that script in ``paperless.conf`` with the variable name
of either ``PAPERLESS_PRE_CONSUME_SCRIPT`` or
``PAPERLESS_POST_CONSUME_SCRIPT``. The script will be executed before or
or after the document is consumed respectively.
.. important::
These scripts are executed in a **blocking** process, which means that if
a script takes a long time to run, it can significantly slow down your
document consumption flow. If you want things to run asynchronously,
you'll have to fork the process in your script and exit.
.. _consumption-directory-hook-variables:
What Can These Scripts Do?
..........................
It's your script, so you're only limited by your imagination and the laws of
physics. However, the following values are passed to the scripts in order:
.. _consumption-director-hook-variables-pre:
Pre-consumption script
::::::::::::::::::::::
* Document file name
.. _consumption-director-hook-variables-post:
Post-consumption script
:::::::::::::::::::::::
* Document id
* Generated file name
* Source path
* Thumbnail path
* Download URL
* Thumbnail URL
* Correspondent
* Tags
The script can be in any language you like, but for a simple shell script
example, you can take a look at ``post-consumption-example.sh`` in the
``scripts`` directory in this project.
.. _consumption-imap:
IMAP (Email)
============
Another handy way to get documents into your database is to email them to
yourself. The typical use-case would be to be out for lunch and want to send a
copy of the receipt back to your system at home. Paperless can be taught to
pull emails down from an arbitrary account and dump them into the consumption
directory where the process :ref:`above <consumption-directory>` will follow the
usual pattern on consuming the document.
Some things you need to know about this feature:
* It's disabled by default. By setting the values below it will be enabled.
* It's been tested in a limited environment, so it may not work for you (please
submit a pull request if you can!)
* It's designed to **delete mail from the server once consumed**. So don't go
pointing this to your personal email account and wonder where all your stuff
went.
* Currently, only one photo (attachment) per email will work.
So, with all that in mind, here's what you do to get it running:
1. Setup a new email account somewhere, or if you're feeling daring, create a
folder in an existing email box and note the path to that folder.
2. In ``settings.py`` set all of the appropriate values in ``MAIL_CONSUMPTION``.
If you decided to use a subfolder of an existing account, then make sure you
set ``INBOX`` accordingly here. You also have to set the
``UPLOAD_SHARED_SECRET`` to something you can remember 'cause you'll have to
include that in every email you send.
3. Restart the :ref:`consumer <utilities-consumer>`. The consumer will check
the configured email account every 10 minutes for something new and pull down
whatever it finds.
4. Send yourself an email! Note that the subject is treated as the file name,
so if you set the subject to ``Correspondent - Title - tag,tag,tag``, you'll
get what you expect. Also, you must include the aforementioned secret
string in every email so the fetcher knows that it's safe to import.
5. After a few minutes, the consumer will poll your mailbox, pull down the
message, and place the attachment in the consumption directory with the
appropriate name. A few minutes later, the consumer will import it like any
other file.
.. _consumption-http:
HTTP POST
=========
You can also submit a document via HTTP POST. It doesn't do tags yet, and the
URL schema isn't concrete, but it's a start.
To push your document to Paperless, send an HTTP POST to the server with the
following name/value pairs:
* ``correspondent``: The name of the document's correspondent. Note that there
are restrictions on what characters you can use here. Specifically,
alphanumeric characters, `-`, `,`, `.`, and `'` are ok, everything else it
out. You also can't use the sequence ` - ` (space, dash, space).
* ``title``: The title of the document. The rules for characters is the same
here as the correspondent.
* ``signature``: For security reasons, we have the correspondent send a
signature using a "shared secret" method to make sure that random strangers
don't start uploading stuff to your server. The means of generating this
signature is defined below.
Specify ``enctype="multipart/form-data"``, and then POST your file with::
Content-Disposition: form-data; name="document"; filename="whatever.pdf"
.. _consumption-http-signature:
Generating the Signature
------------------------
Generating a signature based a shared secret is pretty simple: define a secret,
and store it on the server and the client. Then use that secret, along with
the text you want to verify to generate a string that you can use for
verification.
In the case of Paperless, you configure the server with the secret by setting
``UPLOAD_SHARED_SECRET``. Then on your client, you generate your signature by
concatenating the correspondent, title, and the secret, and then using sha256
to generate a hexdigest.
If you're using Python, this is what that looks like:
.. code:: python
from hashlib import sha256
signature = sha256(correspondent + title + secret).hexdigest()

View File

@@ -1,431 +0,0 @@
.. _extending:
Paperless-ngx Development
#########################
This section describes the steps you need to take to start development on paperless-ngx.
Check out the source from github. The repository is organized in the following way:
* ``main`` always represents the latest release and will only see changes
when a new release is made.
* ``dev`` contains the code that will be in the next release.
* ``feature-X`` contain bigger changes that will be in some release, but not
necessarily the next one.
When making functional changes to paperless, *always* make your changes on the ``dev`` branch.
Apart from that, the folder structure is as follows:
* ``docs/`` - Documentation.
* ``src-ui/`` - Code of the front end.
* ``src/`` - Code of the back end.
* ``scripts/`` - Various scripts that help with different parts of development.
* ``docker/`` - Files required to build the docker image.
Contributing to Paperless
=========================
Maybe you've been using Paperless for a while and want to add a feature or two,
or maybe you've come across a bug that you have some ideas how to solve. The
beauty of open source software is that you can see what's wrong and help to get
it fixed for everyone!
Before contributing please review our `code of conduct`_ and other important
information in the `contributing guidelines`_.
.. _code-formatting-with-pre-commit-hooks:
Code formatting with pre-commit Hooks
=====================================
To ensure a consistent style and formatting across the project source, the project
utilizes a Git `pre-commit` hook to perform some formatting and linting before a
commit is allowed. That way, everyone uses the same style and some common issues
can be caught early on. See below for installation instructions.
Once installed, hooks will run when you commit. If the formatting isn't quite right
or a linter catches something, the commit will be rejected. You'll need to look at the
output and fix the issue. Some hooks, such as the Python formatting tool `black`,
will format failing files, so all you need to do is `git add` those files again and
retry your commit.
Initial setup and first start
=============================
After you forked and cloned the code from github you need to perform a first-time setup.
To do the setup you need to perform the steps from the following chapters in a certain order:
1. Install prerequisites + pipenv as mentioned in :ref:`Bare metal route <setup-bare_metal>`
2. Copy ``paperless.conf.example`` to ``paperless.conf`` and enable debug mode.
3. Install the Angular CLI interface:
.. code:: shell-session
$ npm install -g @angular/cli
4. Install pre-commit
.. code:: shell-session
pre-commit install
5. Create ``consume`` and ``media`` folders in the cloned root folder.
.. code:: shell-session
mkdir -p consume media
6. You can now either ...
* install redis or
* use the included scripts/start-services.sh to use docker to fire up a redis instance (and some other services such as tika, gotenberg and a postgresql server) or
* spin up a bare redis container
.. code:: shell-session
docker run -d -p 6379:6379 --restart unless-stopped redis:latest
7. Install the python dependencies by performing in the src/ directory.
.. code:: shell-session
pipenv install --dev
* Make sure you're using python 3.9.x or lower. Otherwise you might get issues with building dependencies. You can use `pyenv <https://github.com/pyenv/pyenv>`_ to install a specific python version.
8. Generate the static UI so you can perform a login to get session that is required for frontend development (this needs to be done one time only). From src-ui directory:
.. code:: shell-session
npm install .
./node_modules/.bin/ng build --configuration production
9. Apply migrations and create a superuser for your dev instance:
.. code:: shell-session
python3 manage.py migrate
python3 manage.py createsuperuser
10. Now spin up the dev backend. Depending on which part of paperless you're developing for, you need to have some or all of them running.
.. code:: shell-session
python3 manage.py runserver & python3 manage.py document_consumer & python3 manage.py qcluster
11. Login with the superuser credentials provided in step 8 at ``http://localhost:8000`` to create a session that enables you to use the backend.
Backend development environment is now ready, to start Frontend development go to ``/src-ui`` and run ``ng serve``. From there you can use ``http://localhost:4200`` for a preview.
Back end development
====================
The backend is a django application. PyCharm works well for development, but you can use whatever
you want.
Configure the IDE to use the src/ folder as the base source folder. Configure the following
launch configurations in your IDE:
* python3 manage.py runserver
* python3 manage.py qcluster
* python3 manage.py document_consumer
To start them all:
.. code:: shell-session
python3 manage.py runserver & python3 manage.py document_consumer & python3 manage.py qcluster
Testing and code style:
* Run ``pytest`` in the src/ directory to execute all tests. This also generates a HTML coverage
report. When runnings test, paperless.conf is loaded as well. However: the tests rely on the default
configuration. This is not ideal. But for now, make sure no settings except for DEBUG are overridden when testing.
* Coding style is enforced by the Git pre-commit hooks. These will ensure your code is formatted and do some
linting when you do a `git commit`.
* You can also run ``black`` manually to format your code
.. note::
The line length rule E501 is generally useful for getting multiple source files
next to each other on the screen. However, in some cases, its just not possible
to make some lines fit, especially complicated IF cases. Append ``# NOQA: E501``
to disable this check for certain lines.
Front end development
=====================
The front end is built using Angular. In order to get started, you need ``npm``.
Install the Angular CLI interface with
.. code:: shell-session
$ npm install -g @angular/cli
and make sure that it's on your path. Next, in the src-ui/ directory, install the
required dependencies of the project.
.. code:: shell-session
$ npm install
You can launch a development server by running
.. code:: shell-session
$ ng serve
This will automatically update whenever you save. However, in-place compilation might fail
on syntax errors, in which case you need to restart it.
By default, the development server is available on ``http://localhost:4200/`` and is configured
to access the API at ``http://localhost:8000/api/``, which is the default of the backend.
If you enabled DEBUG on the back end, several security overrides for allowed hosts, CORS and
X-Frame-Options are in place so that the front end behaves exactly as in production. This also
relies on you being logged into the back end. Without a valid session, The front end will simply
not work.
Testing and code style:
* The frontend code (.ts, .html, .scss) use ``prettier`` for code formatting via the Git
``pre-commit`` hooks which run automatically on commit. See
:ref:`above <code-formatting-with-pre-commit-hooks>` for installation. You can also run this
via cli with a command such as
.. code:: shell-session
$ git ls-files -- '*.ts' | xargs pre-commit run prettier --files
* Frontend testing uses jest and cypress. There is currently a need for significantly more
frontend tests. Unit tests and e2e tests, respectively, can be run non-interactively with:
.. code:: shell-session
$ ng test
$ npm run e2e:ci
Cypress also includes a UI which can be run from within the ``src-ui`` directory with
.. code:: shell-session
$ ./node_modules/.bin/cypress open
In order to build the front end and serve it as part of django, execute
.. code:: shell-session
$ ng build --prod
This will build the front end and put it in a location from which the Django server will serve
it as static content. This way, you can verify that authentication is working.
Localization
============
Paperless is available in many different languages. Since paperless consists both of a django
application and an Angular front end, both these parts have to be translated separately.
Front end localization
----------------------
* The Angular front end does localization according to the `Angular documentation <https://angular.io/guide/i18n>`_.
* The source language of the project is "en_US".
* The source strings end up in the file "src-ui/messages.xlf".
* The translated strings need to be placed in the "src-ui/src/locale/" folder.
* In order to extract added or changed strings from the source files, call ``ng xi18n --ivy``.
Adding new languages requires adding the translated files in the "src-ui/src/locale/" folder and adjusting a couple files.
1. Adjust "src-ui/angular.json":
.. code:: json
"i18n": {
"sourceLocale": "en-US",
"locales": {
"de": "src/locale/messages.de.xlf",
"nl-NL": "src/locale/messages.nl_NL.xlf",
"fr": "src/locale/messages.fr.xlf",
"en-GB": "src/locale/messages.en_GB.xlf",
"pt-BR": "src/locale/messages.pt_BR.xlf",
"language-code": "language-file"
}
}
2. Add the language to the available options in "src-ui/src/app/services/settings.service.ts":
.. code:: typescript
getLanguageOptions(): LanguageOption[] {
return [
{code: "en-us", name: $localize`English (US)`, englishName: "English (US)", dateInputFormat: "mm/dd/yyyy"},
{code: "en-gb", name: $localize`English (GB)`, englishName: "English (GB)", dateInputFormat: "dd/mm/yyyy"},
{code: "de", name: $localize`German`, englishName: "German", dateInputFormat: "dd.mm.yyyy"},
{code: "nl", name: $localize`Dutch`, englishName: "Dutch", dateInputFormat: "dd-mm-yyyy"},
{code: "fr", name: $localize`French`, englishName: "French", dateInputFormat: "dd/mm/yyyy"},
{code: "pt-br", name: $localize`Portuguese (Brazil)`, englishName: "Portuguese (Brazil)", dateInputFormat: "dd/mm/yyyy"}
// Add your new language here
]
}
``dateInputFormat`` is a special string that defines the behavior of the date input fields and absolutely needs to contain "dd", "mm" and "yyyy".
3. Import and register the Angular data for this locale in "src-ui/src/app/app.module.ts":
.. code:: typescript
import localeDe from '@angular/common/locales/de';
registerLocaleData(localeDe)
Back end localization
---------------------
A majority of the strings that appear in the back end appear only when the admin is used. However,
some of these are still shown on the front end (such as error messages).
* The django application does localization according to the `django documentation <https://docs.djangoproject.com/en/3.1/topics/i18n/translation/>`_.
* The source language of the project is "en_US".
* Localization files end up in the folder "src/locale/".
* In order to extract strings from the application, call ``python3 manage.py makemessages -l en_US``. This is important after making changes to translatable strings.
* The message files need to be compiled for them to show up in the application. Call ``python3 manage.py compilemessages`` to do this. The generated files don't get
committed into git, since these are derived artifacts. The build pipeline takes care of executing this command.
Adding new languages requires adding the translated files in the "src/locale/" folder and adjusting the file "src/paperless/settings.py" to include the new language:
.. code:: python
LANGUAGES = [
("en-us", _("English (US)")),
("en-gb", _("English (GB)")),
("de", _("German")),
("nl-nl", _("Dutch")),
("fr", _("French")),
("pt-br", _("Portuguese (Brazil)")),
# Add language here.
]
Building the documentation
==========================
The documentation is built using sphinx. I've configured ReadTheDocs to automatically build
the documentation when changes are pushed. If you want to build the documentation locally,
this is how you do it:
1. Install python dependencies.
.. code:: shell-session
$ cd /path/to/paperless
$ pipenv install --dev
2. Build the documentation
.. code:: shell-session
$ cd /path/to/paperless/docs
$ pipenv run make clean html
This will build the HTML documentation, and put the resulting files in the ``_build/html``
directory.
Building the Docker image
=========================
The docker image is primarily built by the GitHub actions workflow, but it can be
faster when developing to build and tag an image locally.
To provide the build arguments automatically, build the image using the helper
script ``build-docker-image.sh``.
Building the docker image from source:
.. code:: shell-session
./build-docker-image.sh Dockerfile -t <your-tag>
Extending Paperless
===================
Paperless does not have any fancy plugin systems and will probably never have. However,
some parts of the application have been designed to allow easy integration of additional
features without any modification to the base code.
Making custom parsers
---------------------
Paperless uses parsers to add documents to paperless. A parser is responsible for:
* Retrieve the content from the original
* Create a thumbnail
* Optional: Retrieve a created date from the original
* Optional: Create an archived document from the original
Custom parsers can be added to paperless to support more file types. In order to do that,
you need to write the parser itself and announce its existence to paperless.
The parser itself must extend ``documents.parsers.DocumentParser`` and must implement the
methods ``parse`` and ``get_thumbnail``. You can provide your own implementation to
``get_date`` if you don't want to rely on paperless' default date guessing mechanisms.
.. code:: python
class MyCustomParser(DocumentParser):
def parse(self, document_path, mime_type):
# This method does not return anything. Rather, you should assign
# whatever you got from the document to the following fields:
# The content of the document.
self.text = "content"
# Optional: path to a PDF document that you created from the original.
self.archive_path = os.path.join(self.tempdir, "archived.pdf")
# Optional: "created" date of the document.
self.date = get_created_from_metadata(document_path)
def get_thumbnail(self, document_path, mime_type):
# This should return the path to a thumbnail you created for this
# document.
return os.path.join(self.tempdir, "thumb.png")
If you encounter any issues during parsing, raise a ``documents.parsers.ParseError``.
The ``self.tempdir`` directory is a temporary directory that is guaranteed to be empty
and removed after consumption finished. You can use that directory to store any
intermediate files and also use it to store the thumbnail / archived document.
After that, you need to announce your parser to paperless. You need to connect a
handler to the ``document_consumer_declaration`` signal. Have a look in the file
``src/paperless_tesseract/apps.py`` on how that's done. The handler is a method
that returns information about your parser:
.. code:: python
def myparser_consumer_declaration(sender, **kwargs):
return {
"parser": MyCustomParser,
"weight": 0,
"mime_types": {
"application/pdf": ".pdf",
"image/jpeg": ".jpg",
}
}
* ``parser`` is a reference to a class that extends ``DocumentParser``.
* ``weight`` is used whenever two or more parsers are able to parse a file: The parser with
the higher weight wins. This can be used to override the parsers provided by
paperless.
* ``mime_types`` is a dictionary. The keys are the mime types your parser supports and the value
is the default file extension that paperless should use when storing files and serving them for
download. We could guess that from the file extensions, but some mime types have many extensions
associated with them and the python methods responsible for guessing the extension do not always
return the same value.
.. _code of conduct: https://github.com/paperless-ngx/paperless-ngx/blob/main/CODE_OF_CONDUCT.md
.. _contributing guidelines: https://github.com/paperless-ngx/paperless-ngx/blob/main/CONTRIBUTING.md

View File

@@ -1,117 +0,0 @@
**************************
Frequently asked questions
**************************
**Q:** *What's the general plan for Paperless-ngx?*
**A:** While Paperless-ngx is already considered largely "feature-complete" it is a community-driven
project and development will be guided in this way. New features can be submitted via
GitHub discussions and "up-voted" by the community but this is not a guarantee the feature
will be implemented. This project will always be open to collaboration in the form of PRs,
ideas etc.
**Q:** *I'm using docker. Where are my documents?*
**A:** Your documents are stored inside the docker volume ``paperless_media``.
Docker manages this volume automatically for you. It is a persistent storage
and will persist as long as you don't explicitly delete it. The actual location
depends on your host operating system. On Linux, chances are high that this location
is
.. code::
/var/lib/docker/volumes/paperless_media/_data
.. caution::
Do not mess with this folder. Don't change permissions and don't move
files around manually. This folder is meant to be entirely managed by docker
and paperless.
**Q:** *Let's say I want to switch tools in a year. Can I easily move to other systems?*
**A:** Your documents are stored as plain files inside the media folder. You can always drag those files
out of that folder to use them elsewhere. Here are a couple notes about that.
* Paperless-ngx never modifies your original documents. It keeps checksums of all documents and uses a
scheduled sanity checker to check that they remain the same.
* By default, paperless uses the internal ID of each document as its filename. This might not be very
convenient for export. However, you can adjust the way files are stored in paperless by
:ref:`configuring the filename format <advanced-file_name_handling>`.
* :ref:`The exporter <utilities-exporter>` is another easy way to get your files out of paperless with reasonable file names.
**Q:** *What file types does paperless-ngx support?*
**A:** Currently, the following files are supported:
* PDF documents, PNG images, JPEG images, TIFF images and GIF images are processed with OCR and converted into PDF documents.
* Plain text documents are supported as well and are added verbatim
to paperless.
* With the optional Tika integration enabled (see :ref:`Configuration <configuration-tika>`), Paperless also supports various
Office documents (.docx, .doc, odt, .ppt, .pptx, .odp, .xls, .xlsx, .ods).
Paperless-ngx determines the type of a file by inspecting its content. The
file extensions do not matter.
**Q:** *Will paperless-ngx run on Raspberry Pi?*
**A:** The short answer is yes. I've tested it on a Raspberry Pi 3 B.
The long answer is that certain parts of
Paperless will run very slow, such as the OCR. On Raspberry Pi,
try to OCR documents before feeding them into paperless so that paperless can
reuse the text. The web interface is a lot snappier, since it runs
in your browser and paperless has to do much less work to serve the data.
.. note::
You can adjust some of the settings so that paperless uses less processing
power. See :ref:`setup-less_powerful_devices` for details.
**Q:** *How do I install paperless-ngx on Raspberry Pi?*
**A:** Docker images are available for arm and arm64 hardware, so just follow
the docker-compose instructions. Apart from more required disk space compared to
a bare metal installation, docker comes with close to zero overhead, even on
Raspberry Pi.
If you decide to got with the bare metal route, be aware that some of the
python requirements do not have precompiled packages for ARM / ARM64. Installation
of these will require additional development libraries and compilation will take
a long time.
**Q:** *How do I run this on Unraid?*
**A:** Paperless-ngx is available as `community app <https://unraid.net/community/apps?q=paperless-ngx>`_
in Unraid. `Uli Fahrer <https://github.com/Tooa>`_ created a container template for that.
**Q:** *How do I run this on my toaster?*
**A:** I honestly don't know! As for all other devices that might be able
to run paperless, you're a bit on your own. If you can't run the docker image,
the documentation has instructions for bare metal installs. I'm running
paperless on an i3 processor from 2015 or so. This is also what I use to test
new releases with. Apart from that, I also have a Raspberry Pi, which I
occasionally build the image on and see if it works.
**Q:** *How do I proxy this with NGINX?*
**A:** See :ref:`here <setup-nginx>`.
.. _faq-mod_wsgi:
**Q:** *How do I get WebSocket support with Apache mod_wsgi*?
**A:** ``mod_wsgi`` by itself does not support ASGI. Paperless will continue
to work with WSGI, but certain features such as status notifications about
document consumption won't be available.
If you want to continue using ``mod_wsgi``, you will have to run an ASGI-enabled
web server as well that processes WebSocket connections, and configure Apache to
redirect WebSocket connections to this server. Multiple options for ASGI servers
exist:
* ``gunicorn`` with ``uvicorn`` as the worker implementation (the default of paperless)
* ``daphne`` as a standalone server, which is the reference implementation for ASGI.
* ``uvicorn`` as a standalone server

85
docs/guesswork.rst Normal file
View File

@@ -0,0 +1,85 @@
.. _guesswork:
Guesswork
#########
During the consumption process, Paperless tries to guess some of the attributes
of the document it's looking at. To do this it uses two approaches:
.. _guesswork-naming:
File Naming
===========
Any document you put into the consumption directory will be consumed, but if
you name the file right, it'll automatically set some values in the database
for you. This is is the logic the consumer follows:
1. Try to find the correspondent, title, and tags in the file name following
the pattern: ``Date - Correspondent - Title - tag,tag,tag.pdf``. Note that
the format of the date is **rigidly defined** as ``YYYYMMDDHHMMSSZ`` or
``YYYYMMDDZ``. The ``Z`` refers "Zulu time" AKA "UTC".
2. If that doesn't work, we skip the date and try this pattern:
``Correspondent - Title - tag,tag,tag.pdf``.
3. If that doesn't work, we try to find the correspondent and title in the file
name following the pattern: ``Correspondent - Title.pdf``.
4. If that doesn't work, just assume that the name of the file is the title.
So given the above, the following examples would work as you'd expect:
* ``20150314000700Z - Some Company Name - Invoice 2016-01-01 - money,invoices.pdf``
* ``20150314Z - Some Company Name - Invoice 2016-01-01 - money,invoices.pdf``
* ``Some Company Name - Invoice 2016-01-01 - money,invoices.pdf``
* ``Another Company - Letter of Reference.jpg``
* ``Dad's Recipe for Pancakes.png``
These however wouldn't work:
* ``2015-03-14 00:07:00 UTC - Some Company Name, Invoice 2016-01-01, money, invoices.pdf``
* ``2015-03-14 - Some Company Name, Invoice 2016-01-01, money, invoices.pdf``
* ``Some Company Name, Invoice 2016-01-01, money, invoices.pdf``
* ``Another Company- Letter of Reference.jpg``
.. _guesswork-content:
Reading the Document Contents
=============================
After the consumer has tried to figure out what it could from the file name,
it starts looking at the content of the document itself. It will compare the
matching algorithms defined by every tag and correspondent already set in your
database to see if they apply to the text in that document. In other words,
if you defined a tag called ``Home Utility`` that had a ``match`` property of
``bc hydro`` and a ``matching_algorithm`` of ``literal``, Paperless will
automatically tag your newly-consumed document with your ``Home Utility`` tag
so long as the text ``bc hydro`` appears in the body of the document somewhere.
The matching logic is quite powerful, and supports searching the text of your
document with different algorithms, and as such, some experimentation may be
necessary to get things Just Right.
.. _guesswork-content-howto:
How Do I Set Up These Matching Algorithms?
------------------------------------------
Setting up of the algorithms is easily done through the admin interface. When
you create a new correspondent or tag, there are optional fields for matching
text and matching algorithm. From the help info there:
.. note::
Which algorithm you want to use when matching text to the OCR'd PDF. Here,
"any" looks for any occurrence of any word provided in the PDF, while "all"
requires that every word provided appear in the PDF, albeit not in the
order provided. A "literal" match means that the text you enter must
appear in the PDF exactly as you've entered it, and "regular expression"
uses a regex to match the PDF. If you don't know what a regex is, you
probably don't want this option.
Then just save your tag/correspondent and run another document through the
consumer. Once complete, you should see the newly-created document,
automatically tagged with the appropriate data.

View File

@@ -1,75 +1,38 @@
*********
.. _index:
Paperless
*********
=========
Paperless is a simple Django application running in two parts:
a *Consumer* (the thing that does the indexing) and
the *Web server* (the part that lets you search &
download already-indexed documents). If you want to learn more about its
functions keep on reading after the installation section.
Scan, index, and archive all of your paper documents. Say goodbye to paper.
.. _index-why-this-exists:
Why This Exists
===============
Paper is a nightmare. Environmental issues aside, there's no excuse for it in
the 21st century. It takes up space, collects dust, doesn't support any form
of a search feature, indexing is tedious, it's heavy and prone to damage &
loss.
the 21st century. It takes up space, collects dust, doesn't support any form of
a search feature, indexing is tedious, it's heavy and prone to damage & loss.
I wrote this to make "going paperless" easier. I do not have to worry about
finding stuff again. I feed documents right from the post box into the scanner
and then shred them. Perhaps you might find it useful too.
I wrote this to make "going paperless" easier. I wanted to be able to feed
documents right from the post box into the scanner and then shred them so I
never have to worry about finding stuff again. Perhaps you might find it useful
too.
Paperless-ngx
=============
Paperless-ngx is a document management system that transforms your physical
documents into a searchable online archive so you can keep, well, *less paper*.
Paperless-ngx forked from paperless-ng to continue the great work and
distribute responsibility of supporting and advancing the project among a team
of people.
NG stands for both Angular (the framework used for the
Frontend) and next-gen. Publishing this project under a different name also
avoids confusion between paperless and paperless-ngx.
If you want to learn about what's different in paperless-ngx from Paperless, check out these
resources in the documentation:
* :ref:`Some screenshots <screenshots>` of the new UI are available.
* Read :ref:`this section <advanced-automatic_matching>` if you want to
learn about how paperless automates all tagging using machine learning.
* Paperless now comes with a :ref:`proper email consumer <usage-email>`
that's fully tested and production ready.
* Paperless creates searchable PDF/A documents from whatever you you put into
the consumption directory. This means that you can select text in
image-only documents coming from your scanner.
* See :ref:`this note <utilities-encyption>` about GnuPG encryption in
paperless-ngx.
* Paperless is now integrated with a
:ref:`task processing queue <setup-task_processor>` that tells you
at a glance when and why something is not working.
* The :doc:`changelog </changelog>` contains a detailed list of all changes
in paperless-ngx.
Contents
========
.. toctree::
:maxdepth: 1
:maxdepth: 2
requirements
setup
usage_overview
advanced_usage
administration
configuration
consumption
api
faq
utilities
guesswork
migrating
troubleshooting
extending
scanners
screenshots
changelog

108
docs/migrating.rst Normal file
View File

@@ -0,0 +1,108 @@
.. _migrating:
Migrating, Updates, and Backups
===============================
As Paperless is still under active development, there's a lot that can change
as software updates roll out. You should backup often, so if anything goes
wrong during an update, you at least have a means of restoring to something
usable. Thankfully, there are automated ways of backing up, restoring, and
updating the software.
.. _migrating-backup:
Backing Up
----------
So you're bored of this whole project, or you want to make a remote backup of
the unencrypted files for whatever reason. This is easy to do, simply use the
:ref:`exporter <utilities-exporter>` to dump your documents and database out
into an arbitrary directory.
.. _migrating-restoring:
Restoring
---------
Restoring your data is just as easy, since nearly all of your data exists either
in the file names, or in the contents of the files themselves. You just need to
create an empty database (just follow the
:ref:`installation instructions <setup-installation>` again) and then import the
``tags.json`` file you created as part of your backup. Lastly, copy your
exported documents into the consumption directory and start up the consumer.
.. code-block:: shell-session
$ cd /path/to/project
$ rm data/db.sqlite3 # Delete the database
$ cd src
$ ./manage.py migrate # Create the database
$ ./manage.py createsuperuser
$ ./manage.py loaddata /path/to/arbitrary/place/tags.json
$ cp /path/to/exported/docs/* /path/to/consumption/dir/
$ ./manage.py document_consumer
Importing your data if you are :ref:`using Docker <setup-installation-docker>`
is almost as simple:
.. code-block:: shell-session
# Stop and remove your current containers
$ docker-compose stop
$ docker-compose rm -f
# Recreate them, add the superuser
$ docker-compose up -d
$ docker-compose run --rm webserver createsuperuser
# Load the tags
$ cat /path/to/arbitrary/place/tags.json | docker-compose run --rm webserver loaddata_stdin -
# Load your exported documents into the consumption directory
# (How you do this highly depends on how you have set this up)
$ cp /path/to/exported/docs/* /path/to/mounted/consumption/dir/
After loading the documents into the consumption directory the consumer will
immediately start consuming the documents.
.. _migrating-updates:
Updates
-------
For the most part, all you have to do to update Paperless is run ``git pull``
on the directory containing the project files, and then use Django's
``migrate`` command to execute any database schema updates that might have been
rolled in as part of the update:
.. code-block:: shell-session
$ cd /path/to/project
$ git pull
$ cd src
$ ./manage.py migrate
Note that it's possible (even likely) that while ``git pull`` may update some
files, the ``migrate`` step may not update anything. This is totally normal.
Additionally, as new features are added, the ability to control those features
is typically added by way of an environment variable set in ``paperless.conf``.
You may want to take a look at the ``paperless.conf.example`` file to see if
there's anything new in there compared to what you've got int ``/etc``.
If you are :ref:`using Docker <setup-installation-docker>` the update process
requires only one additional step:
.. code-block:: shell-session
$ cd /path/to/project
$ git pull
$ docker build -t paperless .
$ docker-compose up -d
$ docker-compose run --rm webserver migrate
If ``git pull`` doesn't report any changes, there is no need to continue with
the remaining steps.

119
docs/requirements.rst Normal file
View File

@@ -0,0 +1,119 @@
.. _requirements:
Requirements
============
You need a Linux machine or Unix-like setup (theoretically an Apple machine
should work) that has the following software installed on it:
* `Python3`_ (with development libraries, pip and virtualenv)
* `GNU Privacy Guard`_
* `Tesseract`_, plus its language files matching your document base.
* `Imagemagick`_ version 6.7.5 or higher
* `unpaper`_
.. _Python3: https://python.org/
.. _GNU Privacy Guard: https://gnupg.org
.. _Tesseract: https://github.com/tesseract-ocr
.. _Imagemagick: http://imagemagick.org/
.. _unpaper: https://www.flameeyes.eu/projects/unpaper
Notably, you should confirm how you access your Python3 installation. Many
Linux distributions will install Python3 in parallel to Python2, using the names
``python3`` and ``python`` respectively. The same goes for ``pip3`` and
``pip``. Using Python2 will likely break things, so make sure that you're using
the right version.
For the purposes of simplicity, ``python`` and ``pip`` is used everywhere to
refer to their Python 3 versions.
In addition to the above, there are a number of Python requirements, all of
which are listed in a file called ``requirements.txt`` in the project root.
If you're not working on a virtual environment (like Vagrant or Docker), you
should probably be using a virtualenv, but that's your call. The reasons why
you might choose a virtualenv or not aren't really within the scope of this
document. Needless to say if you don't know what a virtualenv is, you should
probably figure that out before continuing.
.. _requirements-apple:
Apple-tastic Complications
--------------------------
Some users have `run into problems`_ with installing ImageMagick on Apple
systems using HomeBrew. The solution appears to be to install ghostscript as
well as ImageMagick:
.. _run into problems: https://github.com/danielquinn/paperless/issues/25
.. code:: bash
$ brew install ghostscript
$ brew install imagemagick
$ brew install libmagic
.. _requirements-baremetal:
Python-specific Requirements: No Virtualenv
-------------------------------------------
If you don't care to use a virtual env, then installation of the Python
dependencies is easy:
.. code:: bash
$ pip install --user --requirement /path/to/paperless/requirements.txt
This should download and install all of the requirements into
``${HOME}/.local``. Remember that your distribution may be using ``pip3`` as
mentioned above.
.. _requirements-virtualenv:
Python-specific Requirements: Virtualenv
----------------------------------------
Using a virtualenv for this is pretty straightforward: create a virtualenv,
enter it, and install the requirements using the ``requirements.txt`` file:
.. code:: bash
$ virtualenv --python=/path/to/python3 /path/to/arbitrary/directory
$ . /path/to/arbitrary/directory/bin/activate
$ pip install --requirement /path/to/paperless/requirements.txt
Now you're ready to go. Just remember to enter your virtualenv whenever you
want to use Paperless.
.. _requirements-documentation:
Documentation
-------------
As generation of the documentation is not required for use of Paperless,
dependencies for this process are not included in ``requirements.txt``. If
you'd like to generate your own docs locally, you'll need to:
.. code:: bash
$ pip install sphinx
and then cd into the ``docs`` directory and type ``make html``.
If you are using Docker, you can use the following commands to build the
documentation and run a webserver serving it on `port 8001`_:
.. code:: bash
$ pwd
/path/to/paperless
$ docker build -t paperless:docs -f docs/Dockerfile .
$ docker run --rm -it -p "8001:8000" paperless:docs
.. _port 8001: http://127.0.0.1:8001

View File

@@ -1,133 +0,0 @@
.. _scanners:
***********************
Scanner recommendations
***********************
As Paperless operates by watching a folder for new files, doesn't care what
scanner you use, but sometimes finding a scanner that will write to an FTP,
NFS, or SMB server can be difficult. This page is here to help you find one
that works right for you based on recommendations from other Paperless users.
Physical scanners
=================
+---------+----------------+-----+------+-----+-----+------+----------+----------------+
| Brand | Model | Supports | Recommended By |
+---------+----------------+-----+------+-----+-----+------+----------+----------------+
| | | FTP | SFTP | NFS | SMB | SMTP | API [1]_ | |
+=========+================+=====+======+=====+=====+======+==========+================+
| Brother | `ADS-1700W`_ | yes | | | yes | yes | |`holzhannes`_ |
+---------+----------------+-----+------+-----+-----+------+----------+----------------+
| Brother | `ADS-1600W`_ | yes | | | yes | yes | |`holzhannes`_ |
+---------+----------------+-----+------+-----+-----+------+----------+----------------+
| Brother | `ADS-1500W`_ | yes | | | yes | yes | |`danielquinn`_ |
+---------+----------------+-----+------+-----+-----+------+----------+----------------+
| Brother | `ADS-1100W`_ | yes | | | | | |`ytzelf`_ |
+---------+----------------+-----+------+-----+-----+------+----------+----------------+
| Brother | `ADS-2800W`_ | yes | yes | | yes | yes | |`philpagel`_ |
+---------+----------------+-----+------+-----+-----+------+----------+----------------+
| Brother | `MFC-J6930DW`_ | yes | | | | | |`ayounggun`_ |
+---------+----------------+-----+------+-----+-----+------+----------+----------------+
| Brother | `MFC-L5850DW`_ | yes | | | | yes | |`holzhannes`_ |
+---------+----------------+-----+------+-----+-----+------+----------+----------------+
| Brother | `MFC-L2750DW`_ | yes | | | yes | yes | |`muued`_ |
+---------+----------------+-----+------+-----+-----+------+----------+----------------+
| Brother | `MFC-J5910DW`_ | yes | | | | | |`bmsleight`_ |
+---------+----------------+-----+------+-----+-----+------+----------+----------------+
| Brother | `MFC-8950DW`_ | yes | | | yes | yes | |`philpagel`_ |
+---------+----------------+-----+------+-----+-----+------+----------+----------------+
| Brother | `MFC-9142CDN`_ | yes | | | yes | | |`REOLDEV`_ |
+---------+----------------+-----+------+-----+-----+------+----------+----------------+
| Fujitsu | `ix500`_ | yes | | | yes | | |`eonist`_ |
+---------+----------------+-----+------+-----+-----+------+----------+----------------+
| Epson | `ES-580W`_ | yes | | | yes | yes | |`fignew`_ |
+---------+----------------+-----+------+-----+-----+------+----------+----------------+
| Epson | `WF-7710DWF`_ | yes | | | yes | | |`Skylinar`_ |
+---------+----------------+-----+------+-----+-----+------+----------+----------------+
| Fujitsu | `S1300i`_ | yes | | | yes | | |`jonaswinkler`_ |
+---------+----------------+-----+------+-----+-----+------+----------+----------------+
| Doxie | `Q2`_ | | | | | | yes |`Unkn0wnCat`_ |
+---------+----------------+-----+------+-----+-----+------+----------+----------------+
.. _MFC-L5850DW: https://www.brother-usa.com/products/mfcl5850dw
.. _MFC-L2750DW: https://www.brother.de/drucker/laserdrucker/mfc-l2750dw
.. _ADS-1700W: https://www.brother-usa.com/products/ads1700w
.. _ADS-1600W: https://www.brother-usa.com/products/ads1600w
.. _ADS-1500W: https://www.brother.ca/en/p/ads1500w
.. _ADS-1100W: https://support.brother.com/g/b/downloadtop.aspx?c=fr&lang=fr&prod=ads1100w_eu_as_cn
.. _ADS-2800W: https://www.brother-usa.com/products/ads2800w
.. _MFC-J6930DW: https://www.brother.ca/en/p/MFCJ6930DW
.. _MFC-J5910DW: https://www.brother.co.uk/printers/inkjet-printers/mfcj5910dw
.. _MFC-8950DW: https://www.brother-usa.com/products/mfc8950dw
.. _MFC-9142CDN: https://www.brother.co.uk/printers/laser-printers/mfc9140cdn
.. _ES-580W: https://epson.com/Support/Scanners/ES-Series/Epson-WorkForce-ES-580W/s/SPT_B11B258201
.. _WF-7710DWF: https://www.epson.de/en/products/printers/inkjet-printers/for-home/workforce-wf-7710dwf
.. _ix500: http://www.fujitsu.com/us/products/computing/peripheral/scanners/scansnap/ix500/
.. _S1300i: https://www.fujitsu.com/global/products/computing/peripheral/scanners/soho/s1300i/
.. _Q2: https://www.getdoxie.com/product/doxie-q/
.. _ayounggun: https://github.com/ayounggun
.. _bmsleight: https://github.com/bmsleight
.. _danielquinn: https://github.com/danielquinn
.. _eonist: https://github.com/eonist
.. _fignew: https://github.com/fignew
.. _holzhannes: https://github.com/holzhannes
.. _jonaswinkler: https://github.com/jonaswinkler
.. _REOLDEV: https://github.com/REOLDEV
.. _Skylinar: https://github.com/Skylinar
.. _ytzelf: https://github.com/ytzelf
.. _Unkn0wnCat: https://github.com/Unkn0wnCat
.. _muued: https://github.com/muued
.. _philpagel: https://github.com/philpagel
.. [1] Scanners with API Integration allow to push scanned documents directly to :ref:`Paperless API <api-file_uploads>`, sometimes referred to as Webhook or Document POST.
Mobile phone software
=====================
You can use your phone to "scan" documents. The regular camera app will work, but may have too low contrast for OCR to work well. Apps specifically for scanning are recommended.
+-----------------------------+----------------+-----+-----+-----+-------+--------+------------------+
| Name | OS | Supports | Recommended By |
+-----------------------------+----------------+-----+-----+-----+-------+--------+------------------+
| | | FTP | NFS | SMB | Email | WebDAV | |
+=============================+================+=====+=====+=====+=======+========+==================+
| `Office Lens`_ | Android | ? | ? | ? | ? | ? | `jonaswinkler`_ |
+-----------------------------+----------------+-----+-----+-----+-------+--------+------------------+
| `Genius Scan`_ | Android | yes | no | yes | yes | yes | `hannahswain`_ |
+-----------------------------+----------------+-----+-----+-----+-------+--------+------------------+
| `OpenScan`_ | Android | no | no | no | no | no | `benjaminfrank`_ |
+-----------------------------+----------------+-----+-----+-----+-------+--------+------------------+
| `OCR Scanner - QuickScan`_ | iOS | no | no | no | no | yes | `holzhannes`_ |
+-----------------------------+----------------+-----+-----+-----+-------+--------+------------------+
On Android, you can use these applications in combination with one of the :ref:`Paperless-ngx compatible apps <usage-mobile_upload>` to "Share" the documents produced by these scanner apps with paperless. On iOS, you can share the scanned documents via iOS-Sharing to other mail, WebDav or FTP apps.
.. _Office Lens: https://play.google.com/store/apps/details?id=com.microsoft.office.officelens
.. _Genius Scan: https://play.google.com/store/apps/details?id=com.thegrizzlylabs.geniusscan.free
.. _OCR Scanner - QuickScan: https://apps.apple.com/us/app/quickscan-scanner-text-ocr/id1513790291
.. _OpenScan: https://github.com/Ethereal-Developers-Inc/OpenScan
.. _hannahswain: https://github.com/hannahswain
.. _benjaminfrank: https://github.com/benjaminfrank
API Scanning Setup
==================
This sections contains information on how to set up scanners to post directly to :ref:`Paperless API <api-file_uploads>`.
Doxie Q2
--------
This part assumes your Doxie is connected to WiFi and you know its IP.
1. Open your Doxie web UI by navigating to its IP address
2. Navigate to Options -> Webhook
3. Set the *URL* to ``https://[your-paperless-ngx-instance]/api/documents/post_document/``
4. Set the *File Parameter Name* to ``document``
5. Add the username and password to the respective fields (Consider creating a user just for your Doxie)
6. Click *Submit* at the bottom of the page
Congrats, you can now scan directly from your Doxie to your Paperless-ngx instance!

View File

@@ -1,63 +0,0 @@
.. _screenshots:
***********
Screenshots
***********
This is what Paperless-ngx looks like.
The dashboard shows customizable views on your document and allows document uploads:
.. image:: _static/screenshots/dashboard.png
:target: _static/screenshots/dashboard.png
The document list provides three different styles to scroll through your documents:
.. image:: _static/screenshots/documents-table.png
:target: _static/screenshots/documents-table.png
.. image:: _static/screenshots/documents-smallcards.png
:target: _static/screenshots/documents-smallcards.png
.. image:: _static/screenshots/documents-largecards.png
:target: _static/screenshots/documents-largecards.png
Paperless-ngx also supports "dark mode":
.. image:: _static/screenshots/documents-smallcards-dark.png
:target: _static/screenshots/documents-smallcards-dark.png
Extensive filtering mechanisms:
.. image:: _static/screenshots/documents-filter.png
:target: _static/screenshots/documents-filter.png
Bulk editing of document tags, correspondents, etc.:
.. image:: _static/screenshots/bulk-edit.png
:target: _static/screenshots/bulk-edit.png
Side-by-side editing of documents:
.. image:: _static/screenshots/editing.png
:target: _static/screenshots/editing.png
Tag editing. This looks about the same for correspondents and document types.
.. image:: _static/screenshots/new-tag.png
:target: _static/screenshots/new-tag.png
Searching provides auto complete and highlights the results.
.. image:: _static/screenshots/search-preview.png
:target: _static/screenshots/search-preview.png
.. image:: _static/screenshots/search-results.png
:target: _static/screenshots/search-results.png
Fancy mail filters!
.. image:: _static/screenshots/mail-rules-edited.png
:target: _static/screenshots/mail-rules-edited.png
Mobile devices are supported.
.. image:: _static/screenshots/mobile.png
:target: _static/screenshots/mobile.png

File diff suppressed because it is too large Load Diff

View File

@@ -1,32 +1,12 @@
***************
.. _troubleshooting:
Troubleshooting
***************
===============
No files are added by the consumer
##################################
Check for the following issues:
* Ensure that the directory you're putting your documents in is the folder
paperless is watching. With docker, this setting is performed in the
``docker-compose.yml`` file. Without docker, look at the ``CONSUMPTION_DIR``
setting. Don't adjust this setting if you're using docker.
* Ensure that redis is up and running. Paperless does its task processing
asynchronously, and for documents to arrive at the task processor, it needs
redis to run.
* Ensure that the task processor is running. Docker does this automatically.
Manually invoke the task processor by executing
.. code:: shell-session
$ python3 manage.py qcluster
* Look at the output of paperless and inspect it for any errors.
* Go to the admin interface, and check if there are failed tasks. If so, the
tasks will contain an error message.
.. _troubleshooting-languagemissing:
Consumer warns ``OCR for XX failed``
####################################
------------------------------------
If you find the OCR accuracy to be too low, and/or the document consumer warns
that ``OCR for XX failed, but we're going to stick with what we've got since
@@ -34,204 +14,63 @@ FORGIVING_OCR is enabled``, then you might need to install the
`Tesseract language files <http://packages.ubuntu.com/search?keywords=tesseract-ocr>`_
marching your document's languages.
As an example, if you are running Paperless-ngx from any Ubuntu or Debian
box, and your documents are written in Spanish you may need to run::
As an example, if you are running Paperless from the Vagrant setup provided
(or from any Ubuntu or Debian box), and your documents are written in Spanish
you may need to run::
apt-get install -y tesseract-ocr-spa
Consumer fails to pickup any new files
######################################
If you notice that the consumer will only pickup files in the consumption
directory at startup, but won't find any other files added later, you will need to
enable filesystem polling with the configuration option
``PAPERLESS_CONSUMER_POLLING``, see :ref:`here <configuration-polling>`.
.. _troubleshooting-convertpixelcache:
This will disable listening to filesystem changes with inotify and paperless will
manually check the consumption directory for changes instead.
Consumer dies with ``convert: unable to extent pixel cache``
------------------------------------------------------------
During the consumption process, Paperless invokes ImageMagick's ``convert``
program to translate the source document into something that the OCR engine can
understand and this can burn a Very Large amount of memory if the original
document is rather long. Similarly, if your system doesn't have a lot of
memory to begin with (ie. a Raspberry Pi), then this can happen for even
medium-sized documents.
Paperless always redirects to /admin
####################################
The solution is to tell ImageMagick *not* to Use All The RAM, as is its
default, and instead tell it to used a fixed amount. ``convert`` will then
break up the job into hundreds of individual files and use them to slowly
compile the finished image. Simply set ``PAPERLESS_CONVERT_MEMORY_LIMIT`` in
``/etc/paperless.conf`` to something like ``32000000`` and you'll limit
``convert`` to 32MB. Fiddle with this value as you like.
You probably had the old paperless installed at some point. Paperless installed
a permanent redirect to /admin in your browser, and you need to clear your
browsing data / cache to fix that.
**HOWEVER**: Simply setting this value may not be enough on system where
``/tmp`` is mounted as tmpfs, as this is where ``convert`` will write its
temporary files. In these cases (most Systemd machines), you need to tell
ImageMagick to use a different space for its scratch work. You do this by
setting ``PAPERLESS_CONVERT_TMPDIR`` in ``/etc/paperless.conf`` to somewhere
that's actually on a physical disk (and writable by the user running
Paperless), like ``/var/tmp/paperless`` or ``/home/my_user/tmp`` in a pinch.
Operation not permitted
#######################
.. _troubleshooting-decompressionbombwarning:
You might see errors such as:
DecompressionBombWarning and/or no text in the OCR output
---------------------------------------------------------
Some users have had issues using Paperless to consume PDFs that were created
by merging Very Large Scanned Images into one PDF. If this happens to you,
it's likely because the PDF you've created contains some very large pages
(millions of pixels) and the process of converting the PDF to a OCR-friendly
image is exploding.
.. code:: shell-session
Typically, this happens because the scanned images are created with a high
DPI and then rolled into the PDF with an assumed DPI of 72 (the default).
The best solution then is to specify the DPI used in the scan in the
conversion-to-PDF step. So for example, if you scanned the original image
with a DPI of 300, then merging the images into the single PDF with
``convert`` should look like this:
chown: changing ownership of '../export': Operation not permitted
.. code:: bash
The container tries to set file ownership on the listed directories. This is
required so that the user running paperless inside docker has write permissions
to these folders. This happens when pointing these directories to NFS shares,
for example.
$ convert -density 300 *.jpg finished.pdf
Ensure that ``chown`` is possible on these directories.
For more information on this and situations like it, you should take a look
at `Issue #118`_ as that's where this tip originated.
Classifier error: No training data available
############################################
This indicates that the Auto matching algorithm found no documents to learn from.
This may have two reasons:
* You don't use the Auto matching algorithm: The error can be safely ignored in this case.
* You are using the Auto matching algorithm: The classifier explicitly excludes documents
with Inbox tags. Verify that there are documents in your archive without inbox tags.
The algorithm will only learn from documents not in your inbox.
UserWarning in sklearn on every single document
###############################################
You may encounter warnings like this:
.. code::
/usr/local/lib/python3.7/site-packages/sklearn/base.py:315:
UserWarning: Trying to unpickle estimator CountVectorizer from version 0.23.2 when using version 0.24.0.
This might lead to breaking code or invalid results. Use at your own risk.
This happens when certain dependencies of paperless that are responsible for the auto matching algorithm are
updated. After updating these, your current training data *might* not be compatible anymore. This can be ignored
in most cases. This warning will disappear automatically when paperless updates the training data.
If you want to get rid of the warning or actually experience issues with automatic matching, delete
the file ``classification_model.pickle`` in the data directory and let paperless recreate it.
504 Server Error: Gateway Timeout when adding Office documents
##############################################################
You may experience these errors when using the optional TIKA integration:
.. code::
requests.exceptions.HTTPError: 504 Server Error: Gateway Timeout for url: http://gotenberg:3000/forms/libreoffice/convert
Gotenberg is a server that converts Office documents into PDF documents and has a default timeout of 30 seconds.
When conversion takes longer, Gotenberg raises this error.
You can increase the timeout by configuring a command flag for Gotenberg (see also `here <https://gotenberg.dev/docs/modules/api#properties>`__).
If using docker-compose, this is achieved by the following configuration change in the ``docker-compose.yml`` file:
.. code:: yaml
gotenberg:
image: gotenberg/gotenberg:7.4
restart: unless-stopped
command:
- "gotenberg"
- "--chromium-disable-routes=true"
- "--api-timeout=60"
Permission denied errors in the consumption directory
#####################################################
You might encounter errors such as:
.. code:: shell-session
The following error occured while consuming document.pdf: [Errno 13] Permission denied: '/usr/src/paperless/src/../consume/document.pdf'
This happens when paperless does not have permission to delete files inside the consumption directory.
Ensure that ``USERMAP_UID`` and ``USERMAP_GID`` are set to the user id and group id you use on the host operating system, if these are
different from ``1000``. See :ref:`setup-docker_hub`.
Also ensure that you are able to read and write to the consumption directory on the host.
OSError: [Errno 19] No such device when consuming files
#######################################################
If you experience errors such as:
.. code:: shell-session
File "/usr/local/lib/python3.7/site-packages/whoosh/codec/base.py", line 570, in open_compound_file
return CompoundStorage(dbfile, use_mmap=storage.supports_mmap)
File "/usr/local/lib/python3.7/site-packages/whoosh/filedb/compound.py", line 75, in __init__
self._source = mmap.mmap(fileno, 0, access=mmap.ACCESS_READ)
OSError: [Errno 19] No such device
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/django_q/cluster.py", line 436, in worker
res = f(*task["args"], **task["kwargs"])
File "/usr/src/paperless/src/documents/tasks.py", line 73, in consume_file
override_tag_ids=override_tag_ids)
File "/usr/src/paperless/src/documents/consumer.py", line 271, in try_consume_file
raise ConsumerError(e)
Paperless uses a search index to provide better and faster full text searching. This search index is stored inside
the ``data`` folder. The search index uses memory-mapped files (mmap). The above error indicates that paperless
was unable to create and open these files.
This happens when you're trying to store the data directory on certain file systems (mostly network shares)
that don't support memory-mapped files.
Web-UI stuck at "Loading..."
############################
This might have multiple reasons.
1. If you built the docker image yourself or deployed using the bare metal route,
make sure that there are files in ``<paperless-root>/static/frontend/<lang-code>/``.
If there are no files, make sure that you executed ``collectstatic`` successfully, either
manually or as part of the docker image build.
If the front end is still missing, make sure that the front end is compiled (files present in
``src/documents/static/frontend``). If it is not, you need to compile the front end yourself
or download the release archive instead of cloning the repository.
2. Check the output of the web server. You might see errors like this:
.. code::
[2021-01-25 10:08:04 +0000] [40] [ERROR] Socket error processing request.
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 134, in handle
self.handle_request(listener, req, client, addr)
File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 190, in handle_request
util.reraise(*sys.exc_info())
File "/usr/local/lib/python3.7/site-packages/gunicorn/util.py", line 625, in reraise
raise value
File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 178, in handle_request
resp.write_file(respiter)
File "/usr/local/lib/python3.7/site-packages/gunicorn/http/wsgi.py", line 396, in write_file
if not self.sendfile(respiter):
File "/usr/local/lib/python3.7/site-packages/gunicorn/http/wsgi.py", line 386, in sendfile
sent += os.sendfile(sockno, fileno, offset + sent, count)
OSError: [Errno 22] Invalid argument
To fix this issue, add
.. code::
SENDFILE=0
to your `docker-compose.env` file.
Error while reading metadata
############################
You might find messages like these in your log files:
.. code::
[WARNING] [paperless.parsing.tesseract] Error while reading metadata
This indicates that paperless failed to read PDF metadata from one of your documents. This happens when you
open the affected documents in paperless for editing. Paperless will continue to work, and will simply not
show the invalid metadata.
.. _Issue #118: https://github.com/danielquinn/paperless/issues/118

View File

@@ -1,417 +0,0 @@
**************
Usage Overview
**************
Paperless is an application that manages your personal documents. With
the help of a document scanner (see :ref:`scanners`), paperless transforms
your wieldy physical document binders into a searchable archive and
provides many utilities for finding and managing your documents.
Terms and definitions
#####################
Paperless essentially consists of two different parts for managing your
documents:
* The *consumer* watches a specified folder and adds all documents in that
folder to paperless.
* The *web server* provides a UI that you use to manage and search for your
scanned documents.
Each document has a couple of fields that you can assign to them:
* A *Document* is a piece of paper that sometimes contains valuable
information.
* The *correspondent* of a document is the person, institution or company that
a document either originates from, or is sent to.
* A *tag* is a label that you can assign to documents. Think of labels as more
powerful folders: Multiple documents can be grouped together with a single
tag, however, a single document can also have multiple tags. This is not
possible with folders. The reason folders are not implemented in paperless
is simply that tags are much more versatile than folders.
* A *document type* is used to demarcate the type of a document such as letter,
bank statement, invoice, contract, etc. It is used to identify what a document
is about.
* The *date added* of a document is the date the document was scanned into
paperless. You cannot and should not change this date.
* The *date created* of a document is the date the document was initially issued.
This can be the date you bought a product, the date you signed a contract, or
the date a letter was sent to you.
* The *archive serial number* (short: ASN) of a document is the identifier of
the document in your physical document binders. See
:ref:`usage-recommended_workflow` below.
* The *content* of a document is the text that was OCR'ed from the document.
This text is fed into the search engine and is used for matching tags,
correspondents and document types.
Frontend overview
#################
.. warning::
TBD. Add some fancy screenshots!
Adding documents to paperless
#############################
Once you've got Paperless setup, you need to start feeding documents into it.
When adding documents to paperless, it will perform the following operations on
your documents:
1. OCR the document, if it has no text. Digital documents usually have text,
and this step will be skipped for those documents.
2. Paperless will create an archivable PDF/A document from your document.
If this document is coming from your scanner, it will have embedded selectable text.
3. Paperless performs automatic matching of tags, correspondents and types on the
document before storing it in the database.
.. hint::
This process can be configured to fit your needs. If you don't want paperless
to create archived versions for digital documents, you can configure that by
configuring ``PAPERLESS_OCR_MODE=skip_noarchive``. Please read the
:ref:`relevant section in the documentation <configuration-ocr>`.
.. note::
No matter which options you choose, Paperless will always store the original
document that it found in the consumption directory or in the mail and
will never overwrite that document. Archived versions are stored alongside the
original versions.
The consumption directory
=========================
The primary method of getting documents into your database is by putting them in
the consumption directory. The consumer runs in an infinite loop, looking for new
additions to this directory. When it finds them, the consumer goes about the process
of parsing them with the OCR, indexing what it finds, and storing it in the media directory.
Getting stuff into this directory is up to you. If you're running Paperless
on your local computer, you might just want to drag and drop files there, but if
you're running this on a server and want your scanner to automatically push
files to this directory, you'll need to setup some sort of service to accept the
files from the scanner. Typically, you're looking at an FTP server like
`Proftpd`_ or a Windows folder share with `Samba`_.
.. _Proftpd: http://www.proftpd.org/
.. _Samba: http://www.samba.org/
.. TODO: hyperref to configuration of the location of this magic folder.
Web UI Upload
=============
The dashboard has a file drop field to upload documents to paperless. Simply drag a file
onto this field or select a file with the file dialog. Multiple files are supported.
You can also upload documents on any other page of the web UI by dragging-and-dropping
files into your browser window.
.. _usage-mobile_upload:
Mobile upload
=============
The mobile app over at `<https://github.com/qcasey/paperless_share>`_ allows Android users
to share any documents with paperless. This can be combined with any of the mobile
scanning apps out there, such as Office Lens.
Furthermore, there is the `Paperless App <https://github.com/bauerj/paperless_app>`_ as well,
which not only has document upload, but also document browsing and download features.
.. _usage-email:
IMAP (Email)
============
You can tell paperless-ngx to consume documents from your email accounts.
This is a very flexible and powerful feature, if you regularly received documents
via mail that you need to archive. The mail consumer can be configured by using the
admin interface in the following manner:
1. Define e-mail accounts.
2. Define mail rules for your account.
These rules perform the following:
1. Connect to the mail server.
2. Fetch all matching mails (as defined by folder, maximum age and the filters)
3. Check if there are any consumable attachments.
4. If so, instruct paperless to consume the attachments and optionally
use the metadata provided in the rule for the new document.
5. If documents were consumed from a mail, the rule action is performed
on that mail.
Paperless will completely ignore mails that do not match your filters. It will also
only perform the action on mails that it has consumed documents from.
The actions all ensure that the same mail is not consumed twice by different means.
These are as follows:
* **Delete:** Immediately deletes mail that paperless has consumed documents from.
Use with caution.
* **Mark as read:** Mark consumed mail as read. Paperless will not consume documents
from already read mails. If you read a mail before paperless sees it, it will be
ignored.
* **Flag:** Sets the 'important' flag on mails with consumed documents. Paperless
will not consume flagged mails.
* **Move to folder:** Moves consumed mails out of the way so that paperless wont
consume them again.
.. caution::
The mail consumer will perform these actions on all mails it has consumed
documents from. Keep in mind that the actual consumption process may fail
for some reason, leaving you with missing documents in paperless.
.. note::
With the correct set of rules, you can completely automate your email documents.
Create rules for every correspondent you receive digital documents from and
paperless will read them automatically. The default action "mark as read" is
pretty tame and will not cause any damage or data loss whatsoever.
You can also setup a special folder in your mail account for paperless and use
your favorite mail client to move to be consumed mails into that folder
automatically or manually and tell paperless to move them to yet another folder
after consumption. It's up to you.
.. note::
When defining a mail rule with a folder, you may need to try different characters to
define how the sub-folders are separated. Common values include ".", "/" or "|", but
this varies by the mail server. Check the documentation for your mail server. In the
event of an error fetching mail from a certain folder, check the Paperless logs. When
a folder is not located, Paperless will attempt to list all folders found in the account
to the Paperless logs.
.. note::
Paperless will process the rules in the order defined in the admin page.
You can define catch-all rules and have them executed last to consume
any documents not matched by previous rules. Such a rule may assign an "Unknown
mail document" tag to consumed documents so you can inspect them further.
Paperless is set up to check your mails every 10 minutes. This can be configured on the
'Scheduled tasks' page in the admin.
REST API
========
You can also submit a document using the REST API, see :ref:`api-file_uploads` for details.
.. _basic-searching:
Best practices
##############
Paperless offers a couple tools that help you organize your document collection. However,
it is up to you to use them in a way that helps you organize documents and find specific
documents when you need them. This section offers a couple ideas for managing your collection.
Document types allow you to classify documents according to what they are. You can define
types such as "Receipt", "Invoice", or "Contract". If you used to collect all your receipts
in a single binder, you can recreate that system in paperless by defining a document type,
assigning documents to that type and then filtering by that type to only see all receipts.
Not all documents need document types. Sometimes its hard to determine what the type of a
document is or it is hard to justify creating a document type that you only need once or twice.
This is okay. As long as the types you define help you organize your collection in the way
you want, paperless is doing its job.
Tags can be used in many different ways. Think of tags are more versatile folders or binders.
If you have a binder for documents related to university / your car or health care, you can
create these binders in paperless by creating tags and assigning them to relevant documents.
Just as with documents, you can filter the document list by tags and only see documents of
a certain topic.
With physical documents, you'll often need to decide which folder the document belongs to.
The advantage of tags over folders and binders is that a single document can have multiple
tags. A physical document cannot magically appear in two different folders, but with tags,
this is entirely possible.
.. hint::
This can be used in many different ways. One example: Imagine you're working on a particular
task, such as signing up for university. Usually you'll need to collect a bunch of different
documents that are already sorted into various folders. With the tag system of paperless,
you can create a new group of documents that are relevant to this task without destroying
the already existing organization. When you're done with the task, you could delete the
tag again, which would be equal to sorting documents back into the folder they belong into.
Or keep the tag, up to you.
All of the logic above applies to correspondents as well. Attach them to documents if you
feel that they help you organize your collection.
When you've started organizing your documents, create a couple saved views for document collections
you regularly access. This is equal to having labeled physical binders on your desk, except
that these saved views are dynamic and simply update themselves as you add documents to the system.
Here are a couple examples of tags and types that you could use in your collection.
* An ``inbox`` tag for newly added documents that you haven't manually edited yet.
* A tag ``car`` for everything car related (repairs, registration, insurance, etc)
* A tag ``todo`` for documents that you still need to do something with, such as reply, or
perform some task online.
* A tag ``bank account x`` for all bank statement related to that account.
* A tag ``mail`` for anything that you added to paperless via its mail processing capabilities.
* A tag ``missing_metadata`` when you still need to add some metadata to a document, but can't
or don't want to do this right now.
.. _basic-usage_searching:
Searching
#########
Paperless offers an extensive searching mechanism that is designed to allow you to quickly
find a document you're looking for (for example, that thing that just broke and you bought
a couple months ago, that contract you signed 8 years ago).
When you search paperless for a document, it tries to match this query against your documents.
Paperless will look for matching documents by inspecting their content, title, correspondent,
type and tags. Paperless returns a scored list of results, so that documents matching your query
better will appear further up in the search results.
By default, paperless returns only documents which contain all words typed in the search bar.
However, paperless also offers advanced search syntax if you want to drill down the results
further.
Matching documents with logical expressions:
.. code::
shopname AND (product1 OR product2)
Matching specific tags, correspondents or types:
.. code::
type:invoice tag:unpaid
correspondent:university certificate
Matching dates:
.. code::
created:[2005 to 2009]
added:yesterday
modified:today
Matching inexact words:
.. code::
produ*name
.. note::
Inexact terms are hard for search indexes. These queries might take a while to execute. That's why paperless offers
auto complete and query correction.
All of these constructs can be combined as you see fit.
If you want to learn more about the query language used by paperless, paperless uses Whoosh's default query language.
Head over to `Whoosh query language <https://whoosh.readthedocs.io/en/latest/querylang.html>`_.
For details on what date parsing utilities are available, see
`Date parsing <https://whoosh.readthedocs.io/en/latest/dates.html#parsing-date-queries>`_.
.. _usage-recommended_workflow:
The recommended workflow
########################
Once you have familiarized yourself with paperless and are ready to use it
for all your documents, the recommended workflow for managing your documents
is as follows. This workflow also takes into account that some documents
have to be kept in physical form, but still ensures that you get all the
advantages for these documents as well.
The following diagram shows how easy it is to manage your documents.
.. image:: _static/recommended_workflow.png
Preparations in paperless
=========================
* Create an inbox tag that gets assigned to all new documents.
* Create a TODO tag.
Processing of the physical documents
====================================
Keep a physical inbox. Whenever you receive a document that you need to
archive, put it into your inbox. Regularly, do the following for all documents
in your inbox:
1. For each document, decide if you need to keep the document in physical
form. This applies to certain important documents, such as contracts and
certificates.
2. If you need to keep the document, write a running number on the document
before scanning, starting at one and counting upwards. This is the archive
serial number, or ASN in short.
3. Scan the document.
4. If the document has an ASN assigned, store it in a *single* binder, sorted
by ASN. Don't order this binder in any other way.
5. If the document has no ASN, throw it away. Yay!
Over time, you will notice that your physical binder will fill up. If it is
full, label the binder with the range of ASNs in this binder (i.e., "Documents
1 to 343"), store the binder in your cellar or elsewhere, and start a new
binder.
The idea behind this process is that you will never have to use the physical
binders to find a document. If you need a specific physical document, you
may find this document by:
1. Searching in paperless for the document.
2. Identify the ASN of the document, since it appears on the scan.
3. Grab the relevant document binder and get the document. This is easy since
they are sorted by ASN.
Processing of documents in paperless
====================================
Once you have scanned in a document, proceed in paperless as follows.
1. If the document has an ASN, assign the ASN to the document.
2. Assign a correspondent to the document (i.e., your employer, bank, etc)
This isn't strictly necessary but helps in finding a document when you need
it.
3. Assign a document type (i.e., invoice, bank statement, etc) to the document
This isn't strictly necessary but helps in finding a document when you need
it.
4. Assign a proper title to the document (the name of an item you bought, the
subject of the letter, etc)
5. Check that the date of the document is correct. Paperless tries to read
the date from the content of the document, but this fails sometimes if the
OCR is bad or multiple dates appear on the document.
6. Remove inbox tags from the documents.
.. hint::
You can setup manual matching rules for your correspondents and tags and
paperless will assign them automatically. After consuming a couple documents,
you can even ask paperless to *learn* when to assign tags and correspondents
by itself. For details on this feature, see :ref:`advanced-matching`.
Task management
===============
Some documents require attention and require you to act on the document. You
may take two different approaches to handle these documents based on how
regularly you intend to scan documents and use paperless.
* If you scan and process your documents in paperless regularly, assign a
TODO tag to all scanned documents that you need to process. Create a saved
view on the dashboard that shows all documents with this tag.
* If you do not scan documents regularly and use paperless solely for archiving,
create a physical todo box next to your physical inbox and put documents you
need to process in the TODO box. When you performed the task associated with
the document, move it to the inbox.

207
docs/utilities.rst Normal file
View File

@@ -0,0 +1,207 @@
.. _utilities:
Utilities
=========
There's basically three utilities to Paperless: the webserver, consumer, and
if needed, the exporter. They're all detailed here.
.. _utilities-webserver:
The Webserver
-------------
At the heart of it, Paperless is a simple Django webservice, and the entire
interface is based on Django's standard admin interface. Once running, visiting
the URL for your service delivers the admin, through which you can get a
detailed listing of all available documents, search for specific files, and
download whatever it is you're looking for.
.. _utilities-webserver-howto:
How to Use It
.............
The webserver is started via the ``manage.py`` script:
.. code-block:: shell-session
$ /path/to/paperless/src/manage.py runserver
By default, the server runs on localhost, port 8000, but you can change this
with a few arguments, run ``manage.py --help`` for more information.
Note that this command runs continuously, so exiting it will mean your webserver
disappears. If you want to run this full-time (which is kind of the point)
you'll need to have it start in the background -- something you'll need to
figure out for your own system. To get you started though, there are Systemd
service files in the ``scripts`` directory.
.. _utilities-consumer:
The Consumer
------------
The consumer script runs in an infinite loop, constantly looking at a directory
for PDF files to parse and index. The process is pretty straightforward:
1. Look in ``CONSUMPTION_DIR`` for a PDF. If one is found, go to #2. If not,
wait 10 seconds and try again.
2. Parse the PDF with Tesseract
3. Create a new record in the database with the OCR'd text
4. Attempt to automatically assign document attributes by doing some guesswork.
Read up on the :ref:`guesswork documentation<guesswork>` for more
information about this process.
5. Encrypt the PDF and store it in the ``media`` directory under
``documents/pdf``.
6. Go to #1.
.. _utilities-consumer-howto:
How to Use It
.............
The consumer is started via the ``manage.py`` script:
.. code-block:: shell-session
$ /path/to/paperless/src/manage.py document_consumer
This starts the service that will run in a loop, consuming PDF files as they
appear in ``CONSUMPTION_DIR``.
Note that this command runs continuously, so exiting it will mean your webserver
disappears. If you want to run this full-time (which is kind of the point)
you'll need to have it start in the background -- something you'll need to
figure out for your own system. To get you started though, there are Systemd
service files in the ``scripts`` directory.
.. _utilities-exporter:
The Exporter
------------
Tired of fiddling with Paperless, or just want to do something stupid and are
afraid of accidentally damaging your files? You can export all of your PDFs
into neatly named, dated, and unencrypted.
.. _utilities-exporter-howto:
How to Use It
.............
This too is done via the ``manage.py`` script:
.. code-block:: shell-session
$ /path/to/paperless/src/manage.py document_exporter /path/to/somewhere/
This will dump all of your unencrypted PDFs into ``/path/to/somewhere`` for you
to do with as you please. The files are accompanied with a special file,
``manifest.json`` which can be used to
:ref:`import the files <utilities-importer>` at a later date if you wish.
.. _utilities-exporter-howto-docker:
Docker
______
If you are :ref:`using Docker <setup-installation-docker>`, running the
expoorter is almost as easy. To mount a volume for exports, follow the
instructions in the ``docker-compose.yml.example`` file for the ``/export``
volume (making the changes in your own ``docker-compose.yml`` file, of course).
Once you have the volume mounted, the command to run an export is:
.. code-block:: shell-session
$ docker-compose run --rm consumer document_exporter /export
If you prefer to use ``docker run`` directly, supplying the necessary commandline
options:
.. code-block:: shell-session
$ # Identify your containers
$ docker-compose ps
Name Command State Ports
-------------------------------------------------------------------------
paperless_consumer_1 /sbin/docker-entrypoint.sh ... Exit 0
paperless_webserver_1 /sbin/docker-entrypoint.sh ... Exit 0
$ # Make sure to replace your passphrase and remove or adapt the id mapping
$ docker run --rm \
--volumes-from paperless_data_1 \
--volume /path/to/arbitrary/place:/export \
-e PAPERLESS_PASSPHRASE=YOUR_PASSPHRASE \
-e USERMAP_UID=1000 -e USERMAP_GID=1000 \
paperless document_exporter /export
.. _utilities-importer:
The Importer
------------
Looking to transfer Paperless data from one instance to another, or just want
to restore from a backup? This is your go-to toy.
.. _utilities-importer-howto:
How to Use It
.............
The importer works just like the exporter. You point it at a directory, and
the script does the rest of the work:
.. code-block:: shell-session
$ /path/to/paperless/src/manage.py document_importer /path/to/somewhere/
Docker
______
Assuming that you've already gone through the steps above in the
:ref:`export <utilities-exporter-howto-docker>` section, then the easiest thing
to do is just re-use the ``/export`` path you already setup:
.. code-block:: shell-session
$ docker-compose run --rm consumer document_importer /export
Similarly, if you're not using docker-compose, you can adjust the export
instructions above to do the import.
.. _utilities-retagger:
The Re-tagger
-------------
Say you've imported a few hundred documents and now want to introduce a tag
and apply its matching to all of the currently-imported docs. This problem is
common enough that there's a tool for it.
.. _utilities-retagger-howto:
How to Use It
.............
This too is done via the ``manage.py`` script:
.. code:: bash
$ /path/to/paperless/src/manage.py document_retagger
That's it. It'll loop over all of the documents in your database and attempt
to match all of your tags to them. If one matches, it'll be applied. And
don't worry, you can run this as often as you like, it' won't double-tag
a document.

Some files were not shown because too many files have changed in this diff Show More