Compare commits

..

2 Commits

Author SHA1 Message Date
Trenton Holmes
7d8721c53c (delete me) 2022-04-02 07:14:14 -07:00
Trenton Holmes
aa67c851e3 (delete me) 2022-04-02 07:12:56 -07:00
712 changed files with 76563 additions and 202027 deletions

View File

@@ -1,9 +0,0 @@
{
"qpdf": {
"version": "11.3.0"
},
"jbig2enc": {
"version": "0.29",
"git_tag": "0.29"
}
}

View File

@@ -1,19 +0,0 @@
# https://docs.codecov.com/docs/pull-request-comments
# codecov will only comment if coverage changes
comment:
require_changes: true
coverage:
status:
project:
default:
# https://docs.codecov.com/docs/commit-status#threshold
threshold: 1%
# https://docs.codecov.com/docs/commit-status#only_pulls
only_pulls: true
patch:
default:
# For the changed lines only, target 75% covered, but
# allow as low as 50%
target: 75%
threshold: 25%
only_pulls: true

View File

@@ -27,14 +27,11 @@ indent_style = space
[*.md]
indent_style = space
[Pipfile.lock]
indent_style = space
# Tests don't get a line width restriction. It's still a good idea to follow
# the 79 character rule, but in the interests of clarity, tests often need to
# violate it.
[**/test_*.py]
max_line_length = off
[Dockerfile*]
[Dockerfile]
indent_style = space

View File

@@ -1,14 +0,0 @@
title: "[Feature Request] "
body:
- type: textarea
id: description
attributes:
label: Description
description: A clear and concise description of what you would like to see.
validations:
required: true
- type: textarea
id: other
attributes:
label: Other
description: Add any other context or information about the feature request here.

View File

@@ -1,97 +0,0 @@
name: Bug report
description: Something is not working
title: "[BUG] Concise description of the issue"
labels: ["bug", "unconfirmed"]
body:
- type: markdown
attributes:
value: |
Have a question? 👉 [Start a new discussion](https://github.com/paperless-ngx/paperless-ngx/discussions/new) or [ask in chat](https://matrix.to/#/#paperlessngx:matrix.org).
Before opening an issue, please double check:
- [The troubleshooting documentation](https://docs.paperless-ngx.com/troubleshooting/).
- [The installation instructions](https://docs.paperless-ngx.com/setup/#installation).
- [Existing issues and discussions](https://github.com/paperless-ngx/paperless-ngx/search?q=&type=issues).
- Disable any customer container initialization scripts, if using any
If you encounter issues while installing or configuring Paperless-ngx, please post in the ["Support" section of the discussions](https://github.com/paperless-ngx/paperless-ngx/discussions/new?category=support).
- type: textarea
id: description
attributes:
label: Description
description: A clear and concise description of what the bug is. If applicable, add screenshots to help explain your problem.
placeholder: |
Currently Paperless does not work when...
[Screenshot if applicable]
validations:
required: true
- type: textarea
id: reproduction
attributes:
label: Steps to reproduce
description: Steps to reproduce the behavior.
placeholder: |
1. Go to '...'
2. Click on '....'
3. See error
validations:
required: true
- type: textarea
id: logs
attributes:
label: Webserver logs
description: Logs from the web server related to your issue.
render: bash
validations:
required: true
- type: textarea
id: logs_browser
attributes:
label: Browser logs
description: Logs from the web browser related to your issue, if needed
render: bash
- type: input
id: version
attributes:
label: Paperless-ngx version
placeholder: e.g. 1.6.0
validations:
required: true
- type: input
id: host-os
attributes:
label: Host OS
description: Host OS of the machine running paperless-ngx. Please add the architecture (uname -m) if applicable.
placeholder: e.g. Archlinux / Ubuntu 20.04 / Raspberry Pi `arm64`
validations:
required: true
- type: dropdown
id: install-method
attributes:
label: Installation method
options:
- Docker - official image
- Docker - linuxserver.io image
- Bare metal
- Other (please describe above)
description: Note there are significant differences from the official image and linuxserver.io, please check if your issue is specific to the third-party image.
validations:
required: true
- type: input
id: browser
attributes:
label: Browser
description: Which browser you are using, if relevant.
placeholder: e.g. Chrome, Safari
- type: input
id: config-changes
attributes:
label: Configuration changes
description: Any configuration changes you made in `docker-compose.yml`, `docker-compose.env` or `paperless.conf`.
- type: input
id: other
attributes:
label: Other
description: Any other relevant details.

50
.github/ISSUE_TEMPLATE/bug_report.md vendored Normal file
View File

@@ -0,0 +1,50 @@
---
name: Bug report
about: Something is not working
title: '[BUG] Concise description of the issue'
labels: ''
assignees: ''
---
<!---
=> Before opening an issue, please check the documentation and see if it helps you resolve your issue: https://paperless-ngx.readthedocs.io/en/latest/troubleshooting.html
=> Please also make sure that you followed the installation instructions.
=> Please search the issues and look for similar issues before opening a bug report.
=> If you would like to submit a feature request please submit one under https://github.com/paperless-ngx/paperless-ngx/discussions/categories/feature-requests
=> If you encounter issues while installing of configuring Paperless-ngx, please post that in the "Support" section of the discussions. Remember that Paperless successfully runs on a variety of different systems. If paperless does not start, it's probably an issue with your system, and not an issue of paperless.
=> Don't remove the [BUG] prefix from the title.
-->
**Describe the bug**
A clear and concise description of what the bug is.
**To Reproduce**
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error
**Expected behavior**
A clear and concise description of what you expected to happen.
**Screenshots**
If applicable, add screenshots to help explain your problem.
**Webserver logs**
```
If available, post any logs from the web server related to your issue.
```
**Relevant information**
- Host OS of the machine running paperless: [e.g. Archlinux / Ubuntu 20.04]
- Browser [e.g. chrome, safari]
- Version [e.g. 1.0.0]
- Installation method: [docker / bare metal]
- Any configuration changes you made in `docker-compose.yml`, `docker-compose.env` or `paperless.conf`.

View File

@@ -1,11 +0,0 @@
blank_issues_enabled: false
contact_links:
- name: 🤔 Questions and Help
url: https://github.com/paperless-ngx/paperless-ngx/discussions
about: This issue tracker is not for support questions. Please refer to our Discussions.
- name: 💬 Chat
url: https://matrix.to/#/#paperlessngx:matrix.org
about: Want to discuss Paperless-ngx with others? Check out our chat.
- name: 🚀 Feature Request
url: https://github.com/paperless-ngx/paperless-ngx/discussions/new?category=feature-requests
about: Remember to search for existing feature requests and "up-vote" any you like

19
.github/ISSUE_TEMPLATE/other.md vendored Normal file
View File

@@ -0,0 +1,19 @@
---
name: Other
about: Anything that is not a feature request or bug.
title: '[Other] Title of your issue'
labels: ''
assignees: ''
---
<!--
=> Discussions, Feedback and other suggestions belong in the "Discussion" section and not on the issue tracker.
=> If you would like to submit a feature request please submit one under https://github.com/paperless-ngx/paperless-ngx/discussions/categories/feature-requests
=> If you encounter issues while installing of configuring Paperless-ngx, please post that in the "Support" section of the discussions. Remember that Paperless successfully runs on a variety of different systems. If paperless does not start, it's probably is an issue with your system, and not an issue of paperless.
=> Don't remove the [Other] prefix from the title.
-->

View File

@@ -1,15 +1,23 @@
<!--
<!--
Note: All PRs with code changes should be targeted to the `dev` branch, pure documentation changes can target `main`
-->
## Proposed change
<!--
Please include a summary of the change and which issue is fixed (if any) and any relevant motivation / context. List any dependencies that are required for this change. If appropriate, please include an explanation of how your proposed change can be tested. Screenshots and / or videos can also be helpful if appropriate.
Please include a summary of the change and which issue is fixed (if any) and any relevant motivation / context. List any dependencies that are required for this change. If appropriate, please include an explanation of how your poposed change can be tested. Screenshots and / or videos can also be helpful if appropriate.
-->
Fixes # (issue)
<!--
Please also tag the relevant team to help with review. You can tag any of the following:
@paperless-ngx/backend (Python / django, database, etc.)
@paperless-ngx/frontend (JavaScript/Typescript, HTML, CSS, etc.)
@paperless-ngx/ci-cd (GitHub Actions, deployment)
@paperless-ngx/test (General testing for larger PRs)
-->
## Type of change
<!--
@@ -26,7 +34,7 @@ NOTE: Please check only one box!
- [ ] I have read & agree with the [contributing guidelines](https://github.com/paperless-ngx/paperless-ngx/blob/main/CONTRIBUTING.md).
- [ ] If applicable, I have tested my code for new features & regressions on both mobile & desktop devices, using the latest version of major browsers.
- [ ] If applicable, I have checked that all tests pass, see [documentation](https://docs.paperless-ngx.com/development/#back-end-development).
- [ ] I have run all `pre-commit` hooks, see [documentation](https://docs.paperless-ngx.com/development/#code-formatting-with-pre-commit-hooks).
- [ ] If applicable, I have checked that all tests pass, see [documentation](https://paperless-ngx.readthedocs.io/en/latest/extending.html#back-end-development).
- [ ] I have run all `pre-commit` hooks, see [documentation](https://paperless-ngx.readthedocs.io/en/latest/extending.html#code-formatting-with-pre-commit-hooks).
- [ ] I have made corresponding changes to the documentation as needed.
- [ ] I have checked my modifications for any breaking changes.

View File

@@ -6,14 +6,11 @@ updates:
# Enable version updates for npm
- package-ecosystem: "npm"
target-branch: "dev"
# Look for `package.json` and `lock` files in the `/src-ui` directory
# Look for `package.json` and `lock` files in the `root` directory
directory: "/src-ui"
# Check the npm registry for updates every month
schedule:
interval: "monthly"
labels:
- "frontend"
- "dependencies"
# Add reviewers
reviewers:
- "paperless-ngx/frontend"
@@ -29,13 +26,9 @@ updates:
labels:
- "backend"
- "dependencies"
# Add reviewers
reviewers:
- "paperless-ngx/backend"
# Enable updates for Github Actions
- package-ecosystem: "github-actions"
target-branch: "dev"
directory: "/"
schedule:
# Check for updates to GitHub Actions every month
@@ -45,4 +38,4 @@ updates:
- "dependencies"
# Add reviewers
reviewers:
- "paperless-ngx/ci-cd"
- "paperless-ngx/backend"

View File

@@ -1,65 +0,0 @@
autolabeler:
- label: "bug"
branch:
- '/^fix/'
title:
- "/^fix/i"
- "/^Bugfix/i"
- label: "enhancement"
branch:
- '/^feature/'
title:
- "/^feature/i"
categories:
- title: 'Breaking Changes'
labels:
- 'breaking-change'
- title: 'Notable Changes'
labels:
- 'notable'
- title: 'Features'
labels:
- 'enhancement'
- title: 'Bug Fixes'
labels:
- 'bug'
- title: 'Documentation'
labels:
- 'documentation'
- title: 'Maintenance'
labels:
- 'chore'
- 'deployment'
- 'translation'
- 'ci-cd'
- title: 'Dependencies'
collapse-after: 3
labels:
- 'dependencies'
- title: 'All App Changes'
labels:
- 'frontend'
- 'backend'
collapse-after: 1
include-labels:
- 'enhancement'
- 'bug'
- 'chore'
- 'deployment'
- 'translation'
- 'dependencies'
- 'documentation'
- 'frontend'
- 'backend'
- 'ci-cd'
- 'breaking-change'
- 'notable'
exclude-labels:
- 'skip-changelog'
category-template: '### $TITLE'
change-template: '- $TITLE @$AUTHOR ([#$NUMBER]($URL))'
change-title-escapes: '\<*_&#@'
template: |
## paperless-ngx $RESOLVED_VERSION
$CHANGES

View File

@@ -1,485 +0,0 @@
import json
import logging
import os
import shutil
import subprocess
from argparse import ArgumentParser
from typing import Dict
from typing import Final
from typing import Iterator
from typing import List
from typing import Optional
from common import get_log_level
from github import ContainerPackage
from github import GithubBranchApi
from github import GithubContainerRegistryApi
logger = logging.getLogger("cleanup-tags")
class ImageProperties:
"""
Data class wrapping the properties of an entry in the image index
manifests list. It is NOT an actual image with layers, etc
https://docs.docker.com/registry/spec/manifest-v2-2/
https://github.com/opencontainers/image-spec/blob/main/manifest.md
https://github.com/opencontainers/image-spec/blob/main/descriptor.md
"""
def __init__(self, data: Dict) -> None:
self._data = data
# This is the sha256: digest string. Corresponds to GitHub API name
# if the package is an untagged package
self.digest = self._data["digest"]
platform_data_os = self._data["platform"]["os"]
platform_arch = self._data["platform"]["architecture"]
platform_variant = self._data["platform"].get(
"variant",
"",
)
self.platform = f"{platform_data_os}/{platform_arch}{platform_variant}"
class ImageIndex:
"""
Data class wrapping up logic for an OCI Image Index
JSON data. Primary use is to access the manifests listing
See https://github.com/opencontainers/image-spec/blob/main/image-index.md
"""
def __init__(self, package_url: str, tag: str) -> None:
self.qualified_name = f"{package_url}:{tag}"
logger.info(f"Getting image index for {self.qualified_name}")
try:
proc = subprocess.run(
[
shutil.which("docker"),
"buildx",
"imagetools",
"inspect",
"--raw",
self.qualified_name,
],
capture_output=True,
check=True,
)
self._data = json.loads(proc.stdout)
except subprocess.CalledProcessError as e:
logger.error(
f"Failed to get image index for {self.qualified_name}: {e.stderr}",
)
raise e
@property
def image_pointers(self) -> Iterator[ImageProperties]:
for manifest_data in self._data["manifests"]:
yield ImageProperties(manifest_data)
class RegistryTagsCleaner:
"""
This is the base class for the image registry cleaning. Given a package
name, it will keep all images which are tagged and all untagged images
referred to by a manifest. This results in only images which have been untagged
and cannot be referenced except by their SHA in being removed. None of these
images should be referenced, so it is fine to delete them.
"""
def __init__(
self,
package_name: str,
repo_owner: str,
repo_name: str,
package_api: GithubContainerRegistryApi,
branch_api: Optional[GithubBranchApi],
):
self.actually_delete = False
self.package_api = package_api
self.branch_api = branch_api
self.package_name = package_name
self.repo_owner = repo_owner
self.repo_name = repo_name
self.tags_to_delete: List[str] = []
self.tags_to_keep: List[str] = []
# Get the information about all versions of the given package
# These are active, not deleted, the default returned from the API
self.all_package_versions = self.package_api.get_active_package_versions(
self.package_name,
)
# Get a mapping from a tag like "1.7.0" or "feature-xyz" to the ContainerPackage
# tagged with it. It makes certain lookups easy
self.all_pkgs_tags_to_version: Dict[str, ContainerPackage] = {}
for pkg in self.all_package_versions:
for tag in pkg.tags:
self.all_pkgs_tags_to_version[tag] = pkg
logger.info(
f"Located {len(self.all_package_versions)} versions of package {self.package_name}",
)
self.decide_what_tags_to_keep()
def clean(self):
"""
This method will delete image versions, based on the selected tags to delete.
It behaves more like an unlinking than actual deletion. Removing the tag
simply removes a pointer to an image, but the actual image data remains accessible
if one has the sha256 digest of it.
"""
for tag_to_delete in self.tags_to_delete:
package_version_info = self.all_pkgs_tags_to_version[tag_to_delete]
if self.actually_delete:
logger.info(
f"Deleting {tag_to_delete} (id {package_version_info.id})",
)
self.package_api.delete_package_version(
package_version_info,
)
else:
logger.info(
f"Would delete {tag_to_delete} (id {package_version_info.id})",
)
else:
logger.info("No tags to delete")
def clean_untagged(self, is_manifest_image: bool):
"""
This method will delete untagged images, that is those which are not named. It
handles if the image tag is actually a manifest, which points to images that look otherwise
untagged.
"""
def _clean_untagged_manifest():
"""
Handles the deletion of untagged images, but where the package is a manifest, ie a multi
arch image, which means some "untagged" images need to exist still.
Ok, bear with me, these are annoying.
Our images are multi-arch, so the manifest is more like a pointer to a sha256 digest.
These images are untagged, but pointed to, and so should not be removed (or every pull fails).
So for each image getting kept, parse the manifest to find the digest(s) it points to. Then
remove those from the list of untagged images. The final result is the untagged, not pointed to
version which should be safe to remove.
Example:
Tag: ghcr.io/paperless-ngx/paperless-ngx:1.7.1 refers to
amd64: sha256:b9ed4f8753bbf5146547671052d7e91f68cdfc9ef049d06690b2bc866fec2690
armv7: sha256:81605222df4ba4605a2ba4893276e5d08c511231ead1d5da061410e1bbec05c3
arm64: sha256:374cd68db40734b844705bfc38faae84cc4182371de4bebd533a9a365d5e8f3b
each of which appears as untagged image, but isn't really.
So from the list of untagged packages, remove those digests. Once all tags which
are being kept are checked, the remaining untagged packages are actually untagged
with no referrals in a manifest to them.
"""
# Simplify the untagged data, mapping name (which is a digest) to the version
# At the moment, these are the images which APPEAR untagged.
untagged_versions = {}
for x in self.all_package_versions:
if x.untagged:
untagged_versions[x.name] = x
skips = 0
# Parse manifests to locate digests pointed to
for tag in sorted(self.tags_to_keep):
try:
image_index = ImageIndex(
f"ghcr.io/{self.repo_owner}/{self.package_name}",
tag,
)
for manifest in image_index.image_pointers:
if manifest.digest in untagged_versions:
logger.info(
f"Skipping deletion of {manifest.digest},"
f" referred to by {image_index.qualified_name}"
f" for {manifest.platform}",
)
del untagged_versions[manifest.digest]
skips += 1
except Exception as err:
self.actually_delete = False
logger.exception(err)
return
logger.info(
f"Skipping deletion of {skips} packages referred to by a manifest",
)
# Delete the untagged and not pointed at packages
logger.info(f"Deleting untagged packages of {self.package_name}")
for to_delete_name in untagged_versions:
to_delete_version = untagged_versions[to_delete_name]
if self.actually_delete:
logger.info(
f"Deleting id {to_delete_version.id} named {to_delete_version.name}",
)
self.package_api.delete_package_version(
to_delete_version,
)
else:
logger.info(
f"Would delete {to_delete_name} (id {to_delete_version.id})",
)
def _clean_untagged_non_manifest():
"""
If the package is not a multi-arch manifest, images without tags are safe to delete.
"""
for package in self.all_package_versions:
if package.untagged:
if self.actually_delete:
logger.info(
f"Deleting id {package.id} named {package.name}",
)
self.package_api.delete_package_version(
package,
)
else:
logger.info(
f"Would delete {package.name} (id {package.id})",
)
else:
logger.info(
f"Not deleting tag {package.tags[0]} of package {self.package_name}",
)
logger.info("Beginning untagged image cleaning")
if is_manifest_image:
_clean_untagged_manifest()
else:
_clean_untagged_non_manifest()
def decide_what_tags_to_keep(self):
"""
This method holds the logic to delete what tags to keep and there fore
what tags to delete.
By default, any image with at least 1 tag will be kept
"""
# By default, keep anything which is tagged
self.tags_to_keep = list(set(self.all_pkgs_tags_to_version.keys()))
def check_remaining_tags_valid(self):
"""
Checks the non-deleted tags are still valid. The assumption is if the
manifest is can be inspected and each image manifest if points to can be
inspected, the image will still pull.
https://github.com/opencontainers/image-spec/blob/main/image-index.md
"""
logger.info("Beginning confirmation step")
a_tag_failed = False
for tag in sorted(self.tags_to_keep):
try:
image_index = ImageIndex(
f"ghcr.io/{self.repo_owner}/{self.package_name}",
tag,
)
for manifest in image_index.image_pointers:
logger.info(f"Checking {manifest.digest} for {manifest.platform}")
# This follows the pointer from the index to an actual image, layers and all
# Note the format is @
digest_name = f"ghcr.io/{self.repo_owner}/{self.package_name}@{manifest.digest}"
try:
subprocess.run(
[
shutil.which("docker"),
"buildx",
"imagetools",
"inspect",
"--raw",
digest_name,
],
capture_output=True,
check=True,
)
except subprocess.CalledProcessError as e:
logger.error(f"Failed to inspect digest: {e.stderr}")
a_tag_failed = True
except subprocess.CalledProcessError as e:
a_tag_failed = True
logger.error(f"Failed to inspect: {e.stderr}")
continue
if a_tag_failed:
raise Exception("At least one image tag failed to inspect")
class MainImageTagsCleaner(RegistryTagsCleaner):
def decide_what_tags_to_keep(self):
"""
Overrides the default logic for deciding what images to keep. Images tagged as "feature-"
will be removed, if the corresponding branch no longer exists.
"""
# Default to everything gets kept still
super().decide_what_tags_to_keep()
# Locate the feature branches
feature_branches = {}
for branch in self.branch_api.get_branches(
repo=self.repo_name,
):
if branch.name.startswith("feature-"):
logger.debug(f"Found feature branch {branch.name}")
feature_branches[branch.name] = branch
logger.info(f"Located {len(feature_branches)} feature branches")
if not len(feature_branches):
# Our work here is done, delete nothing
return
# Filter to packages which are tagged with feature-*
packages_tagged_feature: List[ContainerPackage] = []
for package in self.all_package_versions:
if package.tag_matches("feature-"):
packages_tagged_feature.append(package)
# Map tags like "feature-xyz" to a ContainerPackage
feature_pkgs_tags_to_versions: Dict[str, ContainerPackage] = {}
for pkg in packages_tagged_feature:
for tag in pkg.tags:
feature_pkgs_tags_to_versions[tag] = pkg
logger.info(
f'Located {len(feature_pkgs_tags_to_versions)} versions of package {self.package_name} tagged "feature-"',
)
# All the feature tags minus all the feature branches leaves us feature tags
# with no corresponding branch
self.tags_to_delete = list(
set(feature_pkgs_tags_to_versions.keys()) - set(feature_branches.keys()),
)
# All the tags minus the set of going to be deleted tags leaves us the
# tags which will be kept around
self.tags_to_keep = list(
set(self.all_pkgs_tags_to_version.keys()) - set(self.tags_to_delete),
)
logger.info(
f"Located {len(self.tags_to_delete)} versions of package {self.package_name} to delete",
)
class LibraryTagsCleaner(RegistryTagsCleaner):
"""
Exists for the off chance that someday, the installer library images
will need their own logic
"""
def _main():
parser = ArgumentParser(
description="Using the GitHub API locate and optionally delete container"
" tags which no longer have an associated feature branch",
)
# Requires an affirmative command to actually do a delete
parser.add_argument(
"--delete",
action="store_true",
default=False,
help="If provided, actually delete the container tags",
)
# When a tagged image is updated, the previous version remains, but it no longer tagged
# Add this option to remove them as well
parser.add_argument(
"--untagged",
action="store_true",
default=False,
help="If provided, delete untagged containers as well",
)
# If given, the package is assumed to be a multi-arch manifest. Cache packages are
# not multi-arch, all other types are
parser.add_argument(
"--is-manifest",
action="store_true",
default=False,
help="If provided, the package is assumed to be a multi-arch manifest following schema v2",
)
# Allows configuration of log level for debugging
parser.add_argument(
"--loglevel",
default="info",
help="Configures the logging level",
)
# Get the name of the package being processed this round
parser.add_argument(
"package",
help="The package to process",
)
args = parser.parse_args()
logging.basicConfig(
level=get_log_level(args),
datefmt="%Y-%m-%d %H:%M:%S",
format="%(asctime)s %(levelname)-8s %(message)s",
)
# Must be provided in the environment
repo_owner: Final[str] = os.environ["GITHUB_REPOSITORY_OWNER"]
repo: Final[str] = os.environ["GITHUB_REPOSITORY"]
gh_token: Final[str] = os.environ["TOKEN"]
# Find all branches named feature-*
# Note: Only relevant to the main application, but simpler to
# leave in for all packages
with GithubBranchApi(gh_token) as branch_api:
with GithubContainerRegistryApi(gh_token, repo_owner) as container_api:
if args.package in {"paperless-ngx", "paperless-ngx/builder/cache/app"}:
cleaner = MainImageTagsCleaner(
args.package,
repo_owner,
repo,
container_api,
branch_api,
)
else:
cleaner = LibraryTagsCleaner(
args.package,
repo_owner,
repo,
container_api,
None,
)
# Set if actually doing a delete vs dry run
cleaner.actually_delete = args.delete
# Clean images with tags
cleaner.clean()
# Clean images which are untagged
cleaner.clean_untagged(args.is_manifest)
# Verify remaining tags still pull
if args.is_manifest:
cleaner.check_remaining_tags_valid()
if __name__ == "__main__":
_main()

View File

@@ -1,47 +0,0 @@
import logging
def get_image_tag(
repo_name: str,
pkg_name: str,
pkg_version: str,
) -> str:
"""
Returns a string representing the normal image for a given package
"""
return f"ghcr.io/{repo_name.lower()}/builder/{pkg_name}:{pkg_version}"
def get_cache_image_tag(
repo_name: str,
pkg_name: str,
pkg_version: str,
branch_name: str,
) -> str:
"""
Returns a string representing the expected image cache tag for a given package
Registry type caching is utilized for the builder images, to allow fast
rebuilds, generally almost instant for the same version
"""
return f"ghcr.io/{repo_name.lower()}/builder/cache/{pkg_name}:{pkg_version}"
def get_log_level(args) -> int:
"""
Returns a logging level, based
:param args:
:return:
"""
levels = {
"critical": logging.CRITICAL,
"error": logging.ERROR,
"warn": logging.WARNING,
"warning": logging.WARNING,
"info": logging.INFO,
"debug": logging.DEBUG,
}
level = levels.get(args.loglevel.lower())
if level is None:
level = logging.INFO
return level

View File

@@ -1,91 +0,0 @@
"""
This is a helper script for the mutli-stage Docker image builder.
It provides a single point of configuration for package version control.
The output JSON object is used by the CI workflow to determine what versions
to build and pull into the final Docker image.
Python package information is obtained from the Pipfile.lock. As this is
kept updated by dependabot, it usually will need no further configuration.
The sole exception currently is pikepdf, which has a dependency on qpdf,
and is configured here to use the latest version of qpdf built by the workflow.
Other package version information is configured directly below, generally by
setting the version and Git information, if any.
"""
import argparse
import json
import os
from pathlib import Path
from typing import Final
from common import get_cache_image_tag
from common import get_image_tag
def _main():
parser = argparse.ArgumentParser(
description="Generate a JSON object of information required to build the given package, based on the Pipfile.lock",
)
parser.add_argument(
"package",
help="The name of the package to generate JSON for",
)
PIPFILE_LOCK_PATH: Final[Path] = Path("Pipfile.lock")
BUILD_CONFIG_PATH: Final[Path] = Path(".build-config.json")
# Read the main config file
build_json: Final = json.loads(BUILD_CONFIG_PATH.read_text())
# Read Pipfile.lock file
pipfile_data: Final = json.loads(PIPFILE_LOCK_PATH.read_text())
args: Final = parser.parse_args()
# Read from environment variables set by GitHub Actions
repo_name: Final[str] = os.environ["GITHUB_REPOSITORY"]
branch_name: Final[str] = os.environ["GITHUB_REF_NAME"]
# Default output values
version = None
extra_config = {}
if args.package in pipfile_data["default"]:
# Read the version from Pipfile.lock
pkg_data = pipfile_data["default"][args.package]
pkg_version = pkg_data["version"].split("==")[-1]
version = pkg_version
# Any extra/special values needed
if args.package == "pikepdf":
extra_config["qpdf_version"] = build_json["qpdf"]["version"]
elif args.package in build_json:
version = build_json[args.package]["version"]
else:
raise NotImplementedError(args.package)
# The JSON object we'll output
output = {
"name": args.package,
"version": version,
"image_tag": get_image_tag(repo_name, args.package, version),
"cache_tag": get_cache_image_tag(
repo_name,
args.package,
version,
branch_name,
),
}
# Add anything special a package may need
output.update(extra_config)
# Output the JSON info to stdout
print(json.dumps(output))
if __name__ == "__main__":
_main()

View File

@@ -1,270 +0,0 @@
"""
This module contains some useful classes for interacting with the Github API.
The full documentation for the API can be found here: https://docs.github.com/en/rest
Mostly, this focusses on two areas, repo branches and repo packages, as the use case
is cleaning up container images which are no longer referred to.
"""
import functools
import logging
import re
import urllib.parse
from typing import Dict
from typing import List
from typing import Optional
import httpx
logger = logging.getLogger("github-api")
class _GithubApiBase:
"""
A base class for interacting with the Github API. It
will handle the session and setting authorization headers.
"""
def __init__(self, token: str) -> None:
self._token = token
self._client: Optional[httpx.Client] = None
def __enter__(self) -> "_GithubApiBase":
"""
Sets up the required headers for auth and response
type from the API
"""
self._client = httpx.Client()
self._client.headers.update(
{
"Accept": "application/vnd.github.v3+json",
"Authorization": f"token {self._token}",
},
)
return self
def __exit__(self, exc_type, exc_val, exc_tb):
"""
Ensures the authorization token is cleaned up no matter
the reason for the exit
"""
if "Accept" in self._client.headers:
del self._client.headers["Accept"]
if "Authorization" in self._client.headers:
del self._client.headers["Authorization"]
# Close the session as well
self._client.close()
self._client = None
def _read_all_pages(self, endpoint):
"""
Helper function to read all pages of an endpoint, utilizing the
next.url until exhausted. Assumes the endpoint returns a list
"""
internal_data = []
while True:
resp = self._client.get(endpoint)
if resp.status_code == 200:
internal_data += resp.json()
if "next" in resp.links:
endpoint = resp.links["next"]["url"]
else:
logger.debug("Exiting pagination loop")
break
else:
logger.warning(f"Request to {endpoint} return HTTP {resp.status_code}")
resp.raise_for_status()
return internal_data
class _EndpointResponse:
"""
For all endpoint JSON responses, store the full
response data, for ease of extending later, if need be.
"""
def __init__(self, data: Dict) -> None:
self._data = data
class GithubBranch(_EndpointResponse):
"""
Simple wrapper for a repository branch, only extracts name information
for now.
"""
def __init__(self, data: Dict) -> None:
super().__init__(data)
self.name = self._data["name"]
class GithubBranchApi(_GithubApiBase):
"""
Wrapper around branch API.
See https://docs.github.com/en/rest/branches/branches
"""
def __init__(self, token: str) -> None:
super().__init__(token)
self._ENDPOINT = "https://api.github.com/repos/{REPO}/branches"
def get_branches(self, repo: str) -> List[GithubBranch]:
"""
Returns all current branches of the given repository owned by the given
owner or organization.
"""
# The environment GITHUB_REPOSITORY already contains the owner in the correct location
endpoint = self._ENDPOINT.format(REPO=repo)
internal_data = self._read_all_pages(endpoint)
return [GithubBranch(branch) for branch in internal_data]
class ContainerPackage(_EndpointResponse):
"""
Data class wrapping the JSON response from the package related
endpoints
"""
def __init__(self, data: Dict):
super().__init__(data)
# This is a numerical ID, required for interactions with this
# specific package, including deletion of it or restoration
self.id: int = self._data["id"]
# A string name. This might be an actual name or it could be a
# digest string like "sha256:"
self.name: str = self._data["name"]
# URL to the package, including its ID, can be used for deletion
# or restoration without needing to build up a URL ourselves
self.url: str = self._data["url"]
# The list of tags applied to this image. Maybe an empty list
self.tags: List[str] = self._data["metadata"]["container"]["tags"]
@functools.cached_property
def untagged(self) -> bool:
"""
Returns True if the image has no tags applied to it, False otherwise
"""
return len(self.tags) == 0
@functools.cache
def tag_matches(self, pattern: str) -> bool:
"""
Returns True if the image has at least one tag which matches the given regex,
False otherwise
"""
return any(re.match(pattern, tag) is not None for tag in self.tags)
def __repr__(self):
return f"Package {self.name}"
class GithubContainerRegistryApi(_GithubApiBase):
"""
Class wrapper to deal with the Github packages API. This class only deals with
container type packages, the only type published by paperless-ngx.
"""
def __init__(self, token: str, owner_or_org: str) -> None:
super().__init__(token)
self._owner_or_org = owner_or_org
if self._owner_or_org == "paperless-ngx":
# https://docs.github.com/en/rest/packages#get-all-package-versions-for-a-package-owned-by-an-organization
self._PACKAGES_VERSIONS_ENDPOINT = "https://api.github.com/orgs/{ORG}/packages/{PACKAGE_TYPE}/{PACKAGE_NAME}/versions"
# https://docs.github.com/en/rest/packages#delete-package-version-for-an-organization
self._PACKAGE_VERSION_DELETE_ENDPOINT = "https://api.github.com/orgs/{ORG}/packages/{PACKAGE_TYPE}/{PACKAGE_NAME}/versions/{PACKAGE_VERSION_ID}"
else:
# https://docs.github.com/en/rest/packages#get-all-package-versions-for-a-package-owned-by-the-authenticated-user
self._PACKAGES_VERSIONS_ENDPOINT = "https://api.github.com/user/packages/{PACKAGE_TYPE}/{PACKAGE_NAME}/versions"
# https://docs.github.com/en/rest/packages#delete-a-package-version-for-the-authenticated-user
self._PACKAGE_VERSION_DELETE_ENDPOINT = "https://api.github.com/user/packages/{PACKAGE_TYPE}/{PACKAGE_NAME}/versions/{PACKAGE_VERSION_ID}"
self._PACKAGE_VERSION_RESTORE_ENDPOINT = (
f"{self._PACKAGE_VERSION_DELETE_ENDPOINT}/restore"
)
def get_active_package_versions(
self,
package_name: str,
) -> List[ContainerPackage]:
"""
Returns all the versions of a given package (container images) from
the API
"""
package_type: str = "container"
# Need to quote this for slashes in the name
package_name = urllib.parse.quote(package_name, safe="")
endpoint = self._PACKAGES_VERSIONS_ENDPOINT.format(
ORG=self._owner_or_org,
PACKAGE_TYPE=package_type,
PACKAGE_NAME=package_name,
)
pkgs = []
for data in self._read_all_pages(endpoint):
pkgs.append(ContainerPackage(data))
return pkgs
def get_deleted_package_versions(
self,
package_name: str,
) -> List[ContainerPackage]:
package_type: str = "container"
# Need to quote this for slashes in the name
package_name = urllib.parse.quote(package_name, safe="")
endpoint = (
self._PACKAGES_VERSIONS_ENDPOINT.format(
ORG=self._owner_or_org,
PACKAGE_TYPE=package_type,
PACKAGE_NAME=package_name,
)
+ "?state=deleted"
)
pkgs = []
for data in self._read_all_pages(endpoint):
pkgs.append(ContainerPackage(data))
return pkgs
def delete_package_version(self, package_data: ContainerPackage):
"""
Deletes the given package version from the GHCR
"""
resp = self._client.delete(package_data.url)
if resp.status_code != 204:
logger.warning(
f"Request to delete {package_data.url} returned HTTP {resp.status_code}",
)
def restore_package_version(
self,
package_name: str,
package_data: ContainerPackage,
):
package_type: str = "container"
endpoint = self._PACKAGE_VERSION_RESTORE_ENDPOINT.format(
ORG=self._owner_or_org,
PACKAGE_TYPE=package_type,
PACKAGE_NAME=package_name,
PACKAGE_VERSION_ID=package_data.id,
)
resp = self._client.post(endpoint)
if resp.status_code != 204:
logger.warning(
f"Request to delete {endpoint} returned HTTP {resp.status_code}",
)

15
.github/stale.yml vendored
View File

@@ -1,23 +1,18 @@
# Number of days of inactivity before an issue becomes stale
daysUntilStale: 30
# Number of days of inactivity before a stale issue is closed
daysUntilClose: 7
# Only issues or pull requests with all of these labels are check if stale. Defaults to `[]` (disabled)
onlyLabels: [cant-reproduce]
# Issues with these labels will never be considered stale
exemptLabels:
- pinned
- security
- fixpending
# Label to use when marking an issue as stale
staleLabel: stale
# Comment to post when marking an issue as stale. Set to `false` to disable
markComment: >
This issue has been automatically marked as stale because it has not had
recent activity. It will be closed if no further activity occurs. Thank you
for your contributions.
# Comment to post when closing a stale issue. Set to `false` to disable
closeComment: false
# See https://github.com/marketplace/stale for more info on the app
# and https://github.com/probot/stale for the configuration docs

View File

@@ -3,350 +3,221 @@ name: ci
on:
push:
tags:
# https://semver.org/#spec-item-2
- 'v[0-9]+.[0-9]+.[0-9]+'
# https://semver.org/#spec-item-9
- 'v[0-9]+.[0-9]+.[0-9]+-beta.rc[0-9]+'
- ngx-*
- beta-*
branches-ignore:
- 'translations**'
pull_request:
branches-ignore:
- 'translations**'
env:
# This is the version of pipenv all the steps will use
# If changing this, change Dockerfile
DEFAULT_PIP_ENV_VERSION: "2023.3.20"
# This is the default version of Python to use in most steps
# If changing this, change Dockerfile
DEFAULT_PYTHON_VERSION: "3.9"
jobs:
pre-commit:
name: Linting Checks
runs-on: ubuntu-22.04
steps:
-
name: Checkout repository
uses: actions/checkout@v3
-
name: Install python
uses: actions/setup-python@v4
with:
python-version: ${{ env.DEFAULT_PYTHON_VERSION }}
-
name: Check files
uses: pre-commit/action@v3.0.0
documentation:
name: "Build Documentation"
runs-on: ubuntu-22.04
needs:
- pre-commit
runs-on: ubuntu-20.04
steps:
-
name: Checkout
uses: actions/checkout@v3
-
name: Install pipenv
run: pipx install pipenv
-
name: Set up Python
id: setup-python
uses: actions/setup-python@v4
uses: actions/setup-python@v3
with:
python-version: ${{ env.DEFAULT_PYTHON_VERSION }}
python-version: 3.9
cache: "pipenv"
cache-dependency-path: 'Pipfile.lock'
-
name: Install pipenv
run: |
pip install --user pipenv==${DEFAULT_PIP_ENV_VERSION}
-
name: Install dependencies
run: |
pipenv --python ${{ steps.setup-python.outputs.python-version }} sync --dev
-
name: List installed Python dependencies
run: |
pipenv --python ${{ steps.setup-python.outputs.python-version }} run pip list
pipenv sync --dev
-
name: Make documentation
run: |
pipenv --python ${{ steps.setup-python.outputs.python-version }} run mkdocs build --config-file ./mkdocs.yml
cd docs/
pipenv run make html
-
name: Upload artifact
uses: actions/upload-artifact@v3
with:
name: documentation
path: site/
path: docs/_build/html/
documentation-deploy:
name: "Deploy Documentation"
runs-on: ubuntu-22.04
if: github.event_name == 'push' && github.ref == 'refs/heads/main'
needs:
- documentation
code-checks-backend:
name: "Backend Code Checks"
runs-on: ubuntu-20.04
steps:
-
name: Checkout
uses: actions/checkout@v3
-
name: Deploy docs
uses: mhausenblas/mkdocs-deploy-gh-pages@master
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
CUSTOM_DOMAIN: docs.paperless-ngx.com
CONFIG_FILE: mkdocs.yml
EXTRA_PACKAGES: build-base
name: Install checkers
run: |
pipx install reorder-python-imports
pipx install yesqa
pipx install add-trailing-comma
pipx install flake8
-
name: Run reorder-python-imports
run: |
find src/ -type f -name '*.py' ! -path "*/migrations/*" | xargs reorder-python-imports
-
name: Run yesqa
run: |
find src/ -type f -name '*.py' ! -path "*/migrations/*" | xargs yesqa
-
name: Run add-trailing-comma
run: |
find src/ -type f -name '*.py' ! -path "*/migrations/*" | xargs add-trailing-comma
# black is placed after add-trailing-comma because it may format differently
# if a trailing comma is added
-
name: Run black
uses: psf/black@stable
with:
options: "--check --diff"
version: "22.3.0"
-
name: Run flake8 checks
run: |
cd src/
flake8 --max-line-length=88 --ignore=E203,W503
code-checks-frontend:
name: "Frontend Code Checks"
runs-on: ubuntu-20.04
steps:
-
name: Checkout
uses: actions/checkout@v3
- uses: actions/setup-node@v3
with:
node-version: '16'
-
name: Install prettier
run: |
npm install prettier
-
name: Run prettier
run:
npx prettier --check --ignore-path Pipfile.lock **/*.js **/*.ts *.md **/*.md
tests-backend:
name: "Tests (${{ matrix.python-version }})"
runs-on: ubuntu-22.04
needs:
- pre-commit
needs: [code-checks-backend]
name: "Backend Tests (${{ matrix.python-version }})"
runs-on: ubuntu-20.04
strategy:
matrix:
python-version: ['3.8', '3.9', '3.10']
python-version: ['3.8', '3.9']
fail-fast: false
env:
# Enable Tika end to end testing
TIKA_LIVE: 1
# Enable paperless_mail testing against real server
PAPERLESS_MAIL_TEST_HOST: ${{ secrets.TEST_MAIL_HOST }}
PAPERLESS_MAIL_TEST_USER: ${{ secrets.TEST_MAIL_USER }}
PAPERLESS_MAIL_TEST_PASSWD: ${{ secrets.TEST_MAIL_PASSWD }}
# Enable Gotenberg end to end testing
GOTENBERG_LIVE: 1
steps:
-
name: Checkout
uses: actions/checkout@v3
with:
fetch-depth: 2
-
name: Start containers
run: |
docker compose --file ${GITHUB_WORKSPACE}/docker/compose/docker-compose.ci-test.yml pull --quiet
docker compose --file ${GITHUB_WORKSPACE}/docker/compose/docker-compose.ci-test.yml up --detach
name: Install pipenv
run: pipx install pipenv
-
name: Set up Python
id: setup-python
uses: actions/setup-python@v4
uses: actions/setup-python@v3
with:
python-version: "${{ matrix.python-version }}"
cache: "pipenv"
cache-dependency-path: 'Pipfile.lock'
-
name: Install pipenv
run: |
pip install --user pipenv==${DEFAULT_PIP_ENV_VERSION}
-
name: Install system dependencies
run: |
sudo apt-get update -qq
sudo apt-get install -qq --no-install-recommends unpaper tesseract-ocr imagemagick ghostscript libzbar0 poppler-utils
-
name: Configure ImageMagick
run: |
sudo cp docker/imagemagick-policy.xml /etc/ImageMagick-6/policy.xml
sudo apt-get install -qq --no-install-recommends unpaper tesseract-ocr imagemagick ghostscript optipng
-
name: Install Python dependencies
run: |
pipenv --python ${{ steps.setup-python.outputs.python-version }} run python --version
pipenv --python ${{ steps.setup-python.outputs.python-version }} sync --dev
-
name: List installed Python dependencies
run: |
pipenv --python ${{ steps.setup-python.outputs.python-version }} run pip list
pipenv sync --dev
-
name: Tests
run: |
cd src/
pipenv --python ${{ steps.setup-python.outputs.python-version }} run pytest -ra
pipenv run pytest
-
name: Upload coverage to Codecov
if: ${{ matrix.python-version == env.DEFAULT_PYTHON_VERSION }}
uses: codecov/codecov-action@v3
name: Get changed files
id: changed-files-specific
uses: tj-actions/changed-files@v18.1
with:
# not required for public repos, but intermittently fails otherwise
token: ${{ secrets.CODECOV_TOKEN }}
# future expansion
flags: backend
files: |
src/**
-
name: Stop containers
if: always()
name: List all changed files
run: |
docker compose --file ${GITHUB_WORKSPACE}/docker/compose/docker-compose.ci-test.yml logs
docker compose --file ${GITHUB_WORKSPACE}/docker/compose/docker-compose.ci-test.yml down
for file in ${{ steps.changed-files-specific.outputs.all_changed_files }}; do
echo "${file} was changed"
done
-
name: Publish coverage results
if: matrix.python-version == '3.9' && steps.changed-files-specific.outputs.any_changed == 'true'
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
# https://github.com/coveralls-clients/coveralls-python/issues/251
run: |
cd src/
pipenv run coveralls --service=github
tests-frontend:
name: "Tests Frontend"
runs-on: ubuntu-22.04
needs:
- pre-commit
needs: [code-checks-frontend]
name: "Frontend Tests"
runs-on: ubuntu-20.04
strategy:
matrix:
node-version: [16.x]
steps:
- uses: actions/checkout@v3
-
name: Use Node.js ${{ matrix.node-version }}
- name: Use Node.js ${{ matrix.node-version }}
uses: actions/setup-node@v3
with:
node-version: ${{ matrix.node-version }}
cache: 'npm'
cache-dependency-path: 'src-ui/package-lock.json'
- run: cd src-ui && npm ci
- run: cd src-ui && npm run lint
- run: cd src-ui && npm run test
- run: cd src-ui && npm run e2e:ci
prepare-docker-build:
name: Prepare Docker Pipeline Data
if: github.event_name == 'push' && (startsWith(github.ref, 'refs/heads/feature-') || github.ref == 'refs/heads/dev' || github.ref == 'refs/heads/beta' || contains(github.ref, 'beta.rc') || startsWith(github.ref, 'refs/tags/v'))
runs-on: ubuntu-22.04
needs:
- documentation
- tests-backend
- tests-frontend
steps:
-
name: Set ghcr repository name
id: set-ghcr-repository
run: |
ghcr_name=$(echo "${GITHUB_REPOSITORY}" | awk '{ print tolower($0) }')
echo "repository=${ghcr_name}" >> $GITHUB_OUTPUT
-
name: Checkout
uses: actions/checkout@v3
-
name: Set up Python
uses: actions/setup-python@v4
with:
python-version: ${{ env.DEFAULT_PYTHON_VERSION }}
-
name: Setup qpdf image
id: qpdf-setup
run: |
build_json=$(python ${GITHUB_WORKSPACE}/.github/scripts/get-build-json.py qpdf)
echo ${build_json}
echo "qpdf-json=${build_json}" >> $GITHUB_OUTPUT
-
name: Setup psycopg2 image
id: psycopg2-setup
run: |
build_json=$(python ${GITHUB_WORKSPACE}/.github/scripts/get-build-json.py psycopg2)
echo ${build_json}
echo "psycopg2-json=${build_json}" >> $GITHUB_OUTPUT
-
name: Setup pikepdf image
id: pikepdf-setup
run: |
build_json=$(python ${GITHUB_WORKSPACE}/.github/scripts/get-build-json.py pikepdf)
echo ${build_json}
echo "pikepdf-json=${build_json}" >> $GITHUB_OUTPUT
-
name: Setup jbig2enc image
id: jbig2enc-setup
run: |
build_json=$(python ${GITHUB_WORKSPACE}/.github/scripts/get-build-json.py jbig2enc)
echo ${build_json}
echo "jbig2enc-json=${build_json}" >> $GITHUB_OUTPUT
outputs:
ghcr-repository: ${{ steps.set-ghcr-repository.outputs.repository }}
qpdf-json: ${{ steps.qpdf-setup.outputs.qpdf-json }}
pikepdf-json: ${{ steps.pikepdf-setup.outputs.pikepdf-json }}
psycopg2-json: ${{ steps.psycopg2-setup.outputs.psycopg2-json }}
jbig2enc-json: ${{ steps.jbig2enc-setup.outputs.jbig2enc-json}}
# build and push image to docker hub.
build-docker-image:
runs-on: ubuntu-22.04
concurrency:
group: ${{ github.workflow }}-build-docker-image-${{ github.ref_name }}
cancel-in-progress: true
needs:
- prepare-docker-build
if: github.event_name == 'push' && (startsWith(github.ref, 'refs/heads/feature-') || github.ref == 'refs/heads/dev' || github.ref == 'refs/heads/beta' || startsWith(github.ref, 'refs/tags/ngx-') || startsWith(github.ref, 'refs/tags/beta-'))
runs-on: ubuntu-20.04
needs: [tests-backend, tests-frontend]
steps:
-
name: Check pushing to Docker Hub
id: docker-hub
# Only push to Dockerhub from the main repo AND the ref is either:
# main
# dev
# beta
# a tag
# Otherwise forks would require a Docker Hub account and secrets setup
run: |
if [[ ${{ needs.prepare-docker-build.outputs.ghcr-repository }} == "paperless-ngx/paperless-ngx" && ( ${{ github.ref_name }} == "main" || ${{ github.ref_name }} == "dev" || ${{ github.ref_name }} == "beta" || ${{ startsWith(github.ref, 'refs/tags/v') }} == "true" ) ]] ; then
echo "Enabling DockerHub image push"
echo "enable=true" >> $GITHUB_OUTPUT
else
echo "Not pushing to DockerHub"
echo "enable=false" >> $GITHUB_OUTPUT
fi
-
name: Gather Docker metadata
id: docker-meta
uses: docker/metadata-action@v4
uses: docker/metadata-action@v3
with:
images: |
ghcr.io/${{ needs.prepare-docker-build.outputs.ghcr-repository }}
name=paperlessngx/paperless-ngx,enable=${{ steps.docker-hub.outputs.enable }}
name=quay.io/paperlessngx/paperless-ngx,enable=${{ steps.docker-hub.outputs.enable }}
images: ghcr.io/${{ github.repository }}
tags: |
# Tag branches with branch name
type=match,pattern=ngx-(\d.\d.\d),group=1
type=semver,pattern=ngx-{{version}}
type=semver,pattern=ngx-{{major}}.{{minor}}
type=ref,event=branch
# Process semver tags
# For a tag x.y.z or vX.Y.Z, output an x.y.z and x.y image tag
type=semver,pattern={{version}}
type=semver,pattern={{major}}.{{minor}}
-
name: Checkout
uses: actions/checkout@v3
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
uses: docker/setup-buildx-action@v1
-
name: Set up QEMU
uses: docker/setup-qemu-action@v2
uses: docker/setup-qemu-action@v1
-
name: Login to Github Container Registry
uses: docker/login-action@v2
uses: docker/login-action@v1
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
-
name: Login to Docker Hub
uses: docker/login-action@v2
# Don't attempt to login is not pushing to Docker Hub
if: steps.docker-hub.outputs.enable == 'true'
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
-
name: Login to Quay.io
uses: docker/login-action@v2
# Don't attempt to login is not pushing to Docker Hub
if: steps.docker-hub.outputs.enable == 'true'
with:
registry: quay.io
username: ${{ secrets.QUAY_USERNAME }}
password: ${{ secrets.QUAY_ROBOT_TOKEN }}
-
name: Build and push
uses: docker/build-push-action@v4
uses: docker/build-push-action@v2
with:
context: .
file: ./Dockerfile
@@ -354,19 +225,8 @@ jobs:
push: ${{ github.event_name != 'pull_request' }}
tags: ${{ steps.docker-meta.outputs.tags }}
labels: ${{ steps.docker-meta.outputs.labels }}
build-args: |
JBIG2ENC_VERSION=${{ fromJSON(needs.prepare-docker-build.outputs.jbig2enc-json).version }}
QPDF_VERSION=${{ fromJSON(needs.prepare-docker-build.outputs.qpdf-json).version }}
PIKEPDF_VERSION=${{ fromJSON(needs.prepare-docker-build.outputs.pikepdf-json).version }}
PSYCOPG2_VERSION=${{ fromJSON(needs.prepare-docker-build.outputs.psycopg2-json).version }}
# Get cache layers from this branch, then dev, then main
# This allows new branches to get at least some cache benefits, generally from dev
cache-from: |
type=registry,ref=ghcr.io/${{ needs.prepare-docker-build.outputs.ghcr-repository }}/builder/cache/app:${{ github.ref_name }}
type=registry,ref=ghcr.io/${{ needs.prepare-docker-build.outputs.ghcr-repository }}/builder/cache/app:dev
type=registry,ref=ghcr.io/${{ needs.prepare-docker-build.outputs.ghcr-repository }}/builder/cache/app:main
cache-to: |
type=registry,mode=max,ref=ghcr.io/${{ needs.prepare-docker-build.outputs.ghcr-repository }}/builder/cache/app:${{ github.ref_name }}
cache-from: type=gha
cache-to: type=gha,mode=max
-
name: Inspect image
run: |
@@ -384,34 +244,24 @@ jobs:
path: src/documents/static/frontend/
build-release:
needs:
- build-docker-image
runs-on: ubuntu-22.04
needs: [build-docker-image, documentation]
runs-on: ubuntu-20.04
steps:
-
name: Checkout
uses: actions/checkout@v3
-
name: Set up Python
id: setup-python
uses: actions/setup-python@v4
uses: actions/setup-python@v3
with:
python-version: ${{ env.DEFAULT_PYTHON_VERSION }}
cache: "pipenv"
cache-dependency-path: 'Pipfile.lock'
python-version: 3.9
-
name: Install pipenv + tools
run: |
pip install --upgrade --user pipenv==${DEFAULT_PIP_ENV_VERSION} setuptools wheel
-
name: Install Python dependencies
run: |
pipenv --python ${{ steps.setup-python.outputs.python-version }} sync --dev
-
name: Install system dependencies
name: Install dependencies
run: |
sudo apt-get update -qq
sudo apt-get install -qq --no-install-recommends gettext liblept5
pip3 install --upgrade pip setuptools wheel
pip3 install -r requirements.txt
-
name: Download frontend artifact
uses: actions/download-artifact@v3
@@ -425,64 +275,33 @@ jobs:
name: documentation
path: docs/_build/html/
-
name: Generate requirements file
name: Move files
run: |
pipenv --python ${{ steps.setup-python.outputs.python-version }} requirements > requirements.txt
mkdir dist
mkdir dist/paperless-ngx
mkdir dist/paperless-ngx/scripts
cp .dockerignore .env Dockerfile Pipfile Pipfile.lock LICENSE README.md requirements.txt dist/paperless-ngx/
cp paperless.conf.example dist/paperless-ngx/paperless.conf
cp gunicorn.conf.py dist/paperless-ngx/gunicorn.conf.py
cp docker/ dist/paperless-ngx/docker -r
cp scripts/*.service scripts/*.sh dist/paperless-ngx/scripts/
cp src/ dist/paperless-ngx/src -r
cp docs/_build/html/ dist/paperless-ngx/docs -r
-
name: Compile messages
run: |
cd src/
pipenv --python ${{ steps.setup-python.outputs.python-version }} run python3 manage.py compilemessages
cd dist/paperless-ngx/src
python3 manage.py compilemessages
-
name: Collect static files
run: |
cd src/
pipenv --python ${{ steps.setup-python.outputs.python-version }} run python3 manage.py collectstatic --no-input
-
name: Move files
run: |
echo "Making dist folders"
for directory in dist \
dist/paperless-ngx \
dist/paperless-ngx/scripts;
do
mkdir --verbose --parents ${directory}
done
echo "Copying basic files"
for file_name in .dockerignore \
.env \
Dockerfile \
Pipfile \
Pipfile.lock \
requirements.txt \
LICENSE \
README.md \
paperless.conf.example \
gunicorn.conf.py
do
cp --verbose ${file_name} dist/paperless-ngx/
done
mv --verbose dist/paperless-ngx/paperless.conf.example dist/paperless-ngx/paperless.conf
echo "Copying Docker related files"
cp --recursive docker/ dist/paperless-ngx/docker
echo "Copying startup scripts"
cp --verbose scripts/*.service scripts/*.sh scripts/*.socket dist/paperless-ngx/scripts/
echo "Copying source files"
cp --recursive src/ dist/paperless-ngx/src
echo "Copying documentation"
cp --recursive docs/_build/html/ dist/paperless-ngx/docs
mv --verbose static dist/paperless-ngx
cd dist/paperless-ngx/src
python3 manage.py collectstatic --no-input
-
name: Make release package
run: |
echo "Creating release archive"
cd dist
sudo chown -R 1000:1000 paperless-ngx/
find . -name __pycache__ | xargs rm -r
tar -cJf paperless-ngx.tar.xz paperless-ngx/
-
name: Upload release artifact
@@ -492,14 +311,9 @@ jobs:
path: dist/paperless-ngx.tar.xz
publish-release:
runs-on: ubuntu-22.04
outputs:
prerelease: ${{ steps.get_version.outputs.prerelease }}
changelog: ${{ steps.create-release.outputs.body }}
version: ${{ steps.get_version.outputs.version }}
needs:
- build-release
if: github.ref_type == 'tag' && (startsWith(github.ref_name, 'v') || contains(github.ref_name, '-beta.rc'))
runs-on: ubuntu-20.04
needs: build-release
if: contains(github.ref, 'refs/tags/ngx-') || contains(github.ref, 'refs/tags/beta-')
steps:
-
name: Download release artifact
@@ -511,92 +325,35 @@ jobs:
name: Get version
id: get_version
run: |
echo "version=${{ github.ref_name }}" >> $GITHUB_OUTPUT
if [[ ${{ contains(github.ref_name, '-beta.rc') }} == 'true' ]]; then
echo "prerelease=true" >> $GITHUB_OUTPUT
else
echo "prerelease=false" >> $GITHUB_OUTPUT
if [[ $GITHUB_REF == refs/tags/ngx-* ]]; then
echo ::set-output name=version::${GITHUB_REF#refs/tags/ngx-}
echo ::set-output name=prerelease::false
echo ::set-output name=body::"For a complete list of changes, see the changelog at https://paperless-ngx.readthedocs.io/en/latest/changelog.html"
elif [[ $GITHUB_REF == refs/tags/beta-* ]]; then
echo ::set-output name=version::${GITHUB_REF#refs/tags/beta-}
echo ::set-output name=prerelease::true
echo ::set-output name=body::"For a complete list of changes, see the changelog at https://github.com/paperless-ngx/paperless-ngx/blob/beta/docs/changelog.rst"
fi
-
name: Create Release and Changelog
id: create-release
uses: release-drafter/release-drafter@v5
with:
name: Paperless-ngx ${{ steps.get_version.outputs.version }}
tag: ${{ steps.get_version.outputs.version }}
version: ${{ steps.get_version.outputs.version }}
prerelease: ${{ steps.get_version.outputs.prerelease }}
publish: true # ensures release is not marked as draft
name: Create release
id: create_release
uses: actions/create-release@v1
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
tag_name: ngx-${{ steps.get_version.outputs.version }}
release_name: Paperless-ngx ${{ steps.get_version.outputs.version }}
draft: false
prerelease: ${{ steps.get_version.outputs.prerelease }}
body: ${{ steps.get_version.outputs.body }}
-
name: Upload release archive
id: upload-release-asset
uses: shogo82148/actions-upload-release-asset@v1
uses: actions/upload-release-asset@v1
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
upload_url: ${{ steps.create-release.outputs.upload_url }}
upload_url: ${{ steps.create_release.outputs.upload_url }} # This pulls from the CREATE RELEASE step above, referencing it's ID to get its outputs object, which include a `upload_url`. See this blog post for more info: https://jasonet.co/posts/new-features-of-github-actions/#passing-data-to-future-steps
asset_path: ./paperless-ngx.tar.xz
asset_name: paperless-ngx-${{ steps.get_version.outputs.version }}.tar.xz
asset_content_type: application/x-xz
append-changelog:
runs-on: ubuntu-22.04
needs:
- publish-release
if: needs.publish-release.outputs.prerelease == 'false'
steps:
-
name: Checkout
uses: actions/checkout@v3
with:
ref: main
-
name: Set up Python
uses: actions/setup-python@v4
with:
python-version: ${{ env.DEFAULT_PYTHON_VERSION }}
cache: "pipenv"
cache-dependency-path: 'Pipfile.lock'
-
name: Install pipenv + tools
run: |
pip install --upgrade --user pipenv==${DEFAULT_PIP_ENV_VERSION} setuptools wheel
-
name: Append Changelog to docs
id: append-Changelog
working-directory: docs
run: |
git branch ${{ needs.publish-release.outputs.version }}-changelog
git checkout ${{ needs.publish-release.outputs.version }}-changelog
echo -e "# Changelog\n\n${{ needs.publish-release.outputs.changelog }}\n" > changelog-new.md
echo "Manually linking usernames"
sed -i -r 's|@(.+?) \(\[#|[@\1](https://github.com/\1) ([#|ig' changelog-new.md
CURRENT_CHANGELOG=`tail --lines +2 changelog.md`
echo -e "$CURRENT_CHANGELOG" >> changelog-new.md
mv changelog-new.md changelog.md
pipenv run pre-commit run --files changelog.md || true
git config --global user.name "github-actions"
git config --global user.email "41898282+github-actions[bot]@users.noreply.github.com"
git commit -am "Changelog ${{ needs.publish-release.outputs.version }} - GHA"
git push origin ${{ needs.publish-release.outputs.version }}-changelog
-
name: Create Pull Request
uses: actions/github-script@v6
with:
script: |
const { repo, owner } = context.repo;
const result = await github.rest.pulls.create({
title: '[Documentation] Add ${{ needs.publish-release.outputs.version }} changelog',
owner,
repo,
head: '${{ needs.publish-release.outputs.version }}-changelog',
base: 'main',
body: 'This PR is auto-generated by CI.'
});
github.rest.issues.addLabels({
owner,
repo,
issue_number: result.data.number,
labels: ['documentation', 'skip-changelog']
});

View File

@@ -1,83 +0,0 @@
# This workflow runs on certain conditions to check for and potentially
# delete container images from the GHCR which no longer have an associated
# code branch.
# Requires a PAT with the correct scope set in the secrets.
#
# This workflow will not trigger runs on forked repos.
name: Cleanup Image Tags
on:
delete:
push:
paths:
- ".github/workflows/cleanup-tags.yml"
- ".github/scripts/cleanup-tags.py"
- ".github/scripts/github.py"
- ".github/scripts/common.py"
concurrency:
group: registry-tags-cleanup
cancel-in-progress: false
jobs:
cleanup-images:
name: Cleanup Image Tags for ${{ matrix.primary-name }}
if: github.repository_owner == 'paperless-ngx'
runs-on: ubuntu-22.04
strategy:
matrix:
include:
- primary-name: "paperless-ngx"
cache-name: "paperless-ngx/builder/cache/app"
- primary-name: "paperless-ngx/builder/qpdf"
cache-name: "paperless-ngx/builder/cache/qpdf"
- primary-name: "paperless-ngx/builder/pikepdf"
cache-name: "paperless-ngx/builder/cache/pikepdf"
- primary-name: "paperless-ngx/builder/jbig2enc"
cache-name: "paperless-ngx/builder/cache/jbig2enc"
- primary-name: "paperless-ngx/builder/psycopg2"
cache-name: "paperless-ngx/builder/cache/psycopg2"
env:
# Requires a personal access token with the OAuth scope delete:packages
TOKEN: ${{ secrets.GHA_CONTAINER_DELETE_TOKEN }}
steps:
-
name: Checkout
uses: actions/checkout@v3
-
name: Login to Github Container Registry
uses: docker/login-action@v2
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
-
name: Set up Python
uses: actions/setup-python@v4
with:
python-version: "3.10"
-
name: Install Python libraries
run: |
python -m pip install httpx docker
#
# Clean up primary package
#
-
name: Cleanup for package "${{ matrix.primary-name }}"
if: "${{ env.TOKEN != '' }}"
run: |
python ${GITHUB_WORKSPACE}/.github/scripts/cleanup-tags.py --untagged --is-manifest --delete "${{ matrix.primary-name }}"
#
# Clean up registry cache package
#
-
name: Cleanup for package "${{ matrix.cache-name }}"
if: "${{ env.TOKEN != '' }}"
run: |
python ${GITHUB_WORKSPACE}/.github/scripts/cleanup-tags.py --untagged --delete "${{ matrix.cache-name }}"

View File

@@ -23,7 +23,7 @@ on:
jobs:
analyze:
name: Analyze
runs-on: ubuntu-22.04
runs-on: ubuntu-latest
permissions:
actions: read
contents: read
@@ -38,11 +38,11 @@ jobs:
steps:
- name: Checkout repository
uses: actions/checkout@v3
uses: actions/checkout@v2
# Initializes the CodeQL tools for scanning.
- name: Initialize CodeQL
uses: github/codeql-action/init@v2
uses: github/codeql-action/init@v1
with:
languages: ${{ matrix.language }}
# If you wish to specify custom queries, you can do so here or in a config file.
@@ -51,4 +51,4 @@ jobs:
# queries: ./path/to/local/query, your-org/your-repo/queries@main
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v2
uses: github/codeql-action/analyze@v1

View File

@@ -1,310 +0,0 @@
# This workflow will run to update the installer library of
# Docker images. These are the images which provide updated wheels
# .deb installation packages or maybe just some compiled library
name: Build Image Library
on:
push:
# Must match one of these branches AND one of the paths
# to be triggered
branches:
- "main"
- "dev"
- "library-*"
- "feature-*"
paths:
# Trigger the workflow if a Dockerfile changed
- "docker-builders/**"
# Trigger if a package was updated
- ".build-config.json"
- "Pipfile.lock"
# Also trigger on workflow changes related to the library
- ".github/workflows/installer-library.yml"
- ".github/workflows/reusable-workflow-builder.yml"
- ".github/scripts/**"
# Set a workflow level concurrency group so primary workflow
# can wait for this to complete if needed
# DO NOT CHANGE without updating main workflow group
concurrency:
group: build-installer-library
cancel-in-progress: false
jobs:
prepare-docker-build:
name: Prepare Docker Image Version Data
runs-on: ubuntu-22.04
steps:
-
name: Set ghcr repository name
id: set-ghcr-repository
run: |
ghcr_name=$(echo "${GITHUB_REPOSITORY}" | awk '{ print tolower($0) }')
echo "repository=${ghcr_name}" >> $GITHUB_OUTPUT
-
name: Checkout
uses: actions/checkout@v3
-
name: Set up Python
uses: actions/setup-python@v4
with:
python-version: "3.9"
-
name: Install jq
run: |
sudo apt-get update
sudo apt-get install jq
-
name: Setup qpdf image
id: qpdf-setup
run: |
build_json=$(python ${GITHUB_WORKSPACE}/.github/scripts/get-build-json.py qpdf)
echo ${build_json}
echo "qpdf-json=${build_json}" >> $GITHUB_OUTPUT
-
name: Setup psycopg2 image
id: psycopg2-setup
run: |
build_json=$(python ${GITHUB_WORKSPACE}/.github/scripts/get-build-json.py psycopg2)
echo ${build_json}
echo "psycopg2-json=${build_json}" >> $GITHUB_OUTPUT
-
name: Setup pikepdf image
id: pikepdf-setup
run: |
build_json=$(python ${GITHUB_WORKSPACE}/.github/scripts/get-build-json.py pikepdf)
echo ${build_json}
echo "pikepdf-json=${build_json}" >> $GITHUB_OUTPUT
-
name: Setup jbig2enc image
id: jbig2enc-setup
run: |
build_json=$(python ${GITHUB_WORKSPACE}/.github/scripts/get-build-json.py jbig2enc)
echo ${build_json}
echo "jbig2enc-json=${build_json}" >> $GITHUB_OUTPUT
-
name: Setup other versions
id: cache-bust-setup
run: |
pillow_version=$(jq -r '.default.pillow.version | gsub("=";"")' Pipfile.lock)
lxml_version=$(jq -r '.default.lxml.version | gsub("=";"")' Pipfile.lock)
echo "Pillow is ${pillow_version}"
echo "lxml is ${lxml_version}"
echo "pillow-version=${pillow_version}" >> $GITHUB_OUTPUT
echo "lxml-version=${lxml_version}" >> $GITHUB_OUTPUT
outputs:
ghcr-repository: ${{ steps.set-ghcr-repository.outputs.repository }}
qpdf-json: ${{ steps.qpdf-setup.outputs.qpdf-json }}
pikepdf-json: ${{ steps.pikepdf-setup.outputs.pikepdf-json }}
psycopg2-json: ${{ steps.psycopg2-setup.outputs.psycopg2-json }}
jbig2enc-json: ${{ steps.jbig2enc-setup.outputs.jbig2enc-json }}
pillow-version: ${{ steps.cache-bust-setup.outputs.pillow-version }}
lxml-version: ${{ steps.cache-bust-setup.outputs.lxml-version }}
build-qpdf-debs:
name: qpdf
needs:
- prepare-docker-build
uses: ./.github/workflows/reusable-workflow-builder.yml
with:
dockerfile: ./docker-builders/Dockerfile.qpdf
build-platforms: linux/amd64
build-json: ${{ needs.prepare-docker-build.outputs.qpdf-json }}
build-args: |
QPDF_VERSION=${{ fromJSON(needs.prepare-docker-build.outputs.qpdf-json).version }}
build-jbig2enc:
name: jbig2enc
needs:
- prepare-docker-build
uses: ./.github/workflows/reusable-workflow-builder.yml
with:
dockerfile: ./docker-builders/Dockerfile.jbig2enc
build-json: ${{ needs.prepare-docker-build.outputs.jbig2enc-json }}
build-args: |
JBIG2ENC_VERSION=${{ fromJSON(needs.prepare-docker-build.outputs.jbig2enc-json).version }}
build-psycopg2-wheel:
name: psycopg2
needs:
- prepare-docker-build
uses: ./.github/workflows/reusable-workflow-builder.yml
with:
dockerfile: ./docker-builders/Dockerfile.psycopg2
build-json: ${{ needs.prepare-docker-build.outputs.psycopg2-json }}
build-args: |
PSYCOPG2_VERSION=${{ fromJSON(needs.prepare-docker-build.outputs.psycopg2-json).version }}
build-pikepdf-wheel:
name: pikepdf
needs:
- prepare-docker-build
- build-qpdf-debs
uses: ./.github/workflows/reusable-workflow-builder.yml
with:
dockerfile: ./docker-builders/Dockerfile.pikepdf
build-json: ${{ needs.prepare-docker-build.outputs.pikepdf-json }}
build-args: |
REPO=${{ needs.prepare-docker-build.outputs.ghcr-repository }}
QPDF_VERSION=${{ fromJSON(needs.prepare-docker-build.outputs.qpdf-json).version }}
PIKEPDF_VERSION=${{ fromJSON(needs.prepare-docker-build.outputs.pikepdf-json).version }}
PILLOW_VERSION=${{ needs.prepare-docker-build.outputs.pillow-version }}
LXML_VERSION=${{ needs.prepare-docker-build.outputs.lxml-version }}
commit-binary-files:
name: Store installers
needs:
- prepare-docker-build
- build-qpdf-debs
- build-jbig2enc
- build-psycopg2-wheel
- build-pikepdf-wheel
runs-on: ubuntu-22.04
steps:
-
name: Checkout
uses: actions/checkout@v3
with:
ref: binary-library
-
name: Set up Python
uses: actions/setup-python@v4
with:
python-version: "3.9"
-
name: Install system dependencies
run: |
sudo apt-get update -qq
sudo apt-get install -qq --no-install-recommends tree
-
name: Extract qpdf files
run: |
version=${{ fromJSON(needs.prepare-docker-build.outputs.qpdf-json).version }}
tag=${{ fromJSON(needs.prepare-docker-build.outputs.qpdf-json).image_tag }}
docker pull --quiet ${tag}
docker create --name qpdf-extract ${tag}
mkdir --parents qpdf/${version}/amd64
docker cp qpdf-extract:/usr/src/qpdf/${version}/amd64 qpdf/${version}
mkdir --parents qpdf/${version}/arm64
docker cp qpdf-extract:/usr/src/qpdf/${version}/arm64 qpdf/${version}
mkdir --parents qpdf/${version}/armv7
docker cp qpdf-extract:/usr/src/qpdf/${version}/armv7 qpdf/${version}
-
name: Extract psycopg2 files
run: |
version=${{ fromJSON(needs.prepare-docker-build.outputs.psycopg2-json).version }}
tag=${{ fromJSON(needs.prepare-docker-build.outputs.psycopg2-json).image_tag }}
docker pull --quiet --platform linux/amd64 ${tag}
docker create --platform linux/amd64 --name psycopg2-extract ${tag}
mkdir --parents psycopg2/${version}/amd64
docker cp psycopg2-extract:/usr/src/wheels/ psycopg2/${version}/amd64
mv psycopg2/${version}/amd64/wheels/* psycopg2/${version}/amd64
rm -r psycopg2/${version}/amd64/wheels/
docker rm psycopg2-extract
docker pull --quiet --platform linux/arm64 ${tag}
docker create --platform linux/arm64 --name psycopg2-extract ${tag}
mkdir --parents psycopg2/${version}/arm64
docker cp psycopg2-extract:/usr/src/wheels/ psycopg2/${version}/arm64
mv psycopg2/${version}/arm64/wheels/* psycopg2/${version}/arm64
rm -r psycopg2/${version}/arm64/wheels/
docker rm psycopg2-extract
docker pull --quiet --platform linux/arm/v7 ${tag}
docker create --platform linux/arm/v7 --name psycopg2-extract ${tag}
mkdir --parents psycopg2/${version}/armv7
docker cp psycopg2-extract:/usr/src/wheels/ psycopg2/${version}/armv7
mv psycopg2/${version}/armv7/wheels/* psycopg2/${version}/armv7
rm -r psycopg2/${version}/armv7/wheels/
docker rm psycopg2-extract
-
name: Extract pikepdf files
run: |
version=${{ fromJSON(needs.prepare-docker-build.outputs.pikepdf-json).version }}
tag=${{ fromJSON(needs.prepare-docker-build.outputs.pikepdf-json).image_tag }}
docker pull --quiet --platform linux/amd64 ${tag}
docker create --platform linux/amd64 --name pikepdf-extract ${tag}
mkdir --parents pikepdf/${version}/amd64
docker cp pikepdf-extract:/usr/src/wheels/ pikepdf/${version}/amd64
mv pikepdf/${version}/amd64/wheels/* pikepdf/${version}/amd64
rm -r pikepdf/${version}/amd64/wheels/
docker rm pikepdf-extract
docker pull --quiet --platform linux/arm64 ${tag}
docker create --platform linux/arm64 --name pikepdf-extract ${tag}
mkdir --parents pikepdf/${version}/arm64
docker cp pikepdf-extract:/usr/src/wheels/ pikepdf/${version}/arm64
mv pikepdf/${version}/arm64/wheels/* pikepdf/${version}/arm64
rm -r pikepdf/${version}/arm64/wheels/
docker rm pikepdf-extract
docker pull --quiet --platform linux/arm/v7 ${tag}
docker create --platform linux/arm/v7 --name pikepdf-extract ${tag}
mkdir --parents pikepdf/${version}/armv7
docker cp pikepdf-extract:/usr/src/wheels/ pikepdf/${version}/armv7
mv pikepdf/${version}/armv7/wheels/* pikepdf/${version}/armv7
rm -r pikepdf/${version}/armv7/wheels/
docker rm pikepdf-extract
-
name: Extract jbig2enc files
run: |
version=${{ fromJSON(needs.prepare-docker-build.outputs.jbig2enc-json).version }}
tag=${{ fromJSON(needs.prepare-docker-build.outputs.jbig2enc-json).image_tag }}
docker pull --quiet --platform linux/amd64 ${tag}
docker create --platform linux/amd64 --name jbig2enc-extract ${tag}
mkdir --parents jbig2enc/${version}/amd64
docker cp jbig2enc-extract:/usr/src/jbig2enc/build jbig2enc/${version}/amd64/
mv jbig2enc/${version}/amd64/build/* jbig2enc/${version}/amd64/
docker rm jbig2enc-extract
docker pull --quiet --platform linux/arm64 ${tag}
docker create --platform linux/arm64 --name jbig2enc-extract ${tag}
mkdir --parents jbig2enc/${version}/arm64
docker cp jbig2enc-extract:/usr/src/jbig2enc/build jbig2enc/${version}/arm64
mv jbig2enc/${version}/arm64/build/* jbig2enc/${version}/arm64/
docker rm jbig2enc-extract
docker pull --quiet --platform linux/arm/v7 ${tag}
docker create --platform linux/arm/v7 --name jbig2enc-extract ${tag}
mkdir --parents jbig2enc/${version}/armv7
docker cp jbig2enc-extract:/usr/src/jbig2enc/build jbig2enc/${version}/armv7
mv jbig2enc/${version}/armv7/build/* jbig2enc/${version}/armv7/
docker rm jbig2enc-extract
-
name: Show file structure
run: |
tree .
-
name: Commit files
run: |
git config --global user.name "github-actions"
git config --global user.email "41898282+github-actions[bot]@users.noreply.github.com"
git add pikepdf/ qpdf/ psycopg2/ jbig2enc/
git commit -m "Updating installer packages" || true
git push origin || true

View File

@@ -13,9 +13,6 @@ on:
- main
- dev
permissions:
contents: read
env:
todo: Todo
done: Done
@@ -24,11 +21,11 @@ env:
jobs:
issue_opened_or_reopened:
name: issue_opened_or_reopened
runs-on: ubuntu-22.04
runs-on: ubuntu-latest
if: github.event_name == 'issues' && (github.event.action == 'opened' || github.event.action == 'reopened')
steps:
- name: Add issue to project and set status to ${{ env.todo }}
uses: leonsteinhaeuser/project-beta-automations@v2.1.0
- name: Set issue status to ${{ env.todo }}
uses: leonsteinhaeuser/project-beta-automations@v1.2.1
with:
gh_token: ${{ secrets.GH_TOKEN }}
organization: paperless-ngx
@@ -37,21 +34,14 @@ jobs:
status_value: ${{ env.todo }} # Target status
pr_opened_or_reopened:
name: pr_opened_or_reopened
runs-on: ubuntu-22.04
permissions:
# write permission is required for autolabeler
pull-requests: write
if: github.event_name == 'pull_request_target' && (github.event.action == 'opened' || github.event.action == 'reopened') && github.event.pull_request.user.login != 'dependabot'
runs-on: ubuntu-latest
if: github.event_name == 'pull_request_target' && (github.event.action == 'opened' || github.event.action == 'reopened')
steps:
- name: Add PR to project and set status to "Needs Review"
uses: leonsteinhaeuser/project-beta-automations@v2.1.0
- name: Set PR status to ${{ env.in_progress }}
uses: leonsteinhaeuser/project-beta-automations@v1.2.1
with:
gh_token: ${{ secrets.GH_TOKEN }}
organization: paperless-ngx
project_id: 2
resource_node_id: ${{ github.event.pull_request.node_id }}
status_value: "Needs Review" # Target status
- name: Label PR with release-drafter
uses: release-drafter/release-drafter@v5
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
status_value: ${{ env.in_progress }} # Target status

View File

@@ -1,47 +0,0 @@
name: 'Repository Maintenance'
on:
schedule:
- cron: '0 3 * * *'
workflow_dispatch:
permissions:
issues: write
pull-requests: write
concurrency:
group: lock
jobs:
stale:
name: 'Stale'
runs-on: ubuntu-latest
steps:
- uses: actions/stale@v8
with:
days-before-stale: 30
days-before-close: 7
only-labels: 'cant-reproduce'
stale-issue-label: stale
stale-pr-label: stale
stale-issue-message: >
This issue has been automatically marked as stale because it has not had
recent activity. It will be closed if no further activity occurs. Thank you
for your contributions.
lock-threads:
name: 'Lock Old Threads'
runs-on: ubuntu-latest
steps:
- uses: dessant/lock-threads@v4
with:
issue-inactive-days: '30'
pr-inactive-days: '30'
log-output: true
issue-comment: >
This issue has been automatically locked since there
has not been any recent activity after it was closed.
Please open a new discussion or issue for related concerns.
pr-comment: >
This pull request has been automatically locked since there
has not been any recent activity after it was closed.
Please open a new discussion or issue for related concerns.

View File

@@ -1,57 +0,0 @@
name: Reusable Image Builder
on:
workflow_call:
inputs:
dockerfile:
required: true
type: string
build-json:
required: true
type: string
build-args:
required: false
default: ""
type: string
build-platforms:
required: false
default: linux/amd64,linux/arm64,linux/arm/v7
type: string
concurrency:
group: ${{ github.workflow }}-${{ fromJSON(inputs.build-json).name }}-${{ fromJSON(inputs.build-json).version }}
cancel-in-progress: false
jobs:
build-image:
name: Build ${{ fromJSON(inputs.build-json).name }} @ ${{ fromJSON(inputs.build-json).version }}
runs-on: ubuntu-22.04
steps:
-
name: Checkout
uses: actions/checkout@v3
-
name: Login to Github Container Registry
uses: docker/login-action@v2
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
-
name: Set up QEMU
uses: docker/setup-qemu-action@v2
-
name: Build ${{ fromJSON(inputs.build-json).name }}
uses: docker/build-push-action@v4
with:
context: .
file: ${{ inputs.dockerfile }}
tags: ${{ fromJSON(inputs.build-json).image_tag }}
platforms: ${{ inputs.build-platforms }}
build-args: ${{ inputs.build-args }}
push: true
cache-from: type=registry,ref=${{ fromJSON(inputs.build-json).cache_tag }}
cache-to: type=registry,mode=max,ref=${{ fromJSON(inputs.build-json).cache_tag }}

16
.gitignore vendored
View File

@@ -51,8 +51,8 @@ coverage.xml
# Django stuff:
*.log
# MkDocs documentation
site/
# Sphinx documentation
docs/_build/
# PyBuilder
target/
@@ -63,17 +63,13 @@ target/
# VS Code
.vscode
/src-ui/.vscode
/docs/.vscode
# Other stuff that doesn't belong
.virtualenv
virtualenv
/venv
.venv/
/docker-compose.env
/docker-compose.yml
.ruff_cache/
# Used for development
scripts/import-for-development
@@ -88,12 +84,8 @@ scripts/nuke
/paperless.conf
/consume/
/export/
/src-ui/.vscode
# this is where the compiled frontend is moved to.
/src/documents/static/frontend/
# mac os
.DS_Store
# celery schedule file
celerybeat-schedule*
/docs/.vscode/settings.json

View File

@@ -1,8 +0,0 @@
failure-threshold: warning
ignored:
# https://github.com/hadolint/hadolint/wiki/DL3008
- DL3008
# https://github.com/hadolint/hadolint/wiki/DL3013
- DL3013
# https://github.com/hadolint/hadolint/wiki/DL3003
- DL3003

View File

@@ -5,7 +5,7 @@
repos:
# General hooks
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.4.0
rev: v4.1.0
hooks:
- id: check-docstring-first
- id: check-json
@@ -27,7 +27,7 @@ repos:
- id: check-case-conflict
- id: detect-private-key
- repo: https://github.com/pre-commit/mirrors-prettier
rev: "v2.7.1"
rev: "v2.6.1"
hooks:
- id: prettier
types_or:
@@ -36,19 +36,37 @@ repos:
- markdown
exclude: "(^Pipfile\\.lock$)"
# Python hooks
- repo: https://github.com/charliermarsh/ruff-pre-commit
rev: 'v0.0.263'
- repo: https://github.com/asottile/reorder_python_imports
rev: v3.0.1
hooks:
- id: ruff
- id: reorder-python-imports
exclude: "(migrations)"
- repo: https://github.com/asottile/yesqa
rev: "v1.3.0"
hooks:
- id: yesqa
exclude: "(migrations)"
- repo: https://github.com/asottile/add-trailing-comma
rev: "v2.2.1"
hooks:
- id: add-trailing-comma
exclude: "(migrations)"
- repo: https://gitlab.com/pycqa/flake8
rev: 3.9.2
hooks:
- id: flake8
files: ^src/
args:
- "--config=./src/setup.cfg"
- repo: https://github.com/psf/black
rev: 23.3.0
rev: 22.3.0
hooks:
- id: black
# Dockerfile hooks
- repo: https://github.com/AleksaC/hadolint-py
rev: v2.12.0.2
- repo: https://github.com/pryorda/dockerfilelint-precommit-hooks
rev: "v0.1.0"
hooks:
- id: hadolint
- id: dockerfilelint
# Shell script hooks
- repo: https://github.com/lovesegfault/beautysh
rev: v6.2.1
@@ -57,6 +75,6 @@ repos:
args:
- "--tab"
- repo: https://github.com/shellcheck-py/shellcheck-py
rev: "v0.9.0.2"
rev: "v0.8.0.4"
hooks:
- id: shellcheck

View File

@@ -1 +0,0 @@
3.8.16

16
.readthedocs.yml Normal file
View File

@@ -0,0 +1,16 @@
# .readthedocs.yml
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
# Required
version: 2
# Build documentation in the docs/ directory with Sphinx
sphinx:
configuration: docs/conf.py
# Optionally set the version of Python and requirements required to build your docs
python:
version: "3.8"
install:
- requirements: docs/requirements.txt

View File

@@ -1,23 +0,0 @@
# https://beta.ruff.rs/docs/settings/
# https://beta.ruff.rs/docs/rules/
extend-select = ["I", "W", "UP", "COM", "DJ", "EXE", "ISC", "ICN", "G201", "INP", "PIE", "RSE", "SIM", "TID", "PLC", "PLE", "RUF"]
# TODO PTH
ignore = ["DJ001", "SIM105"]
fix = true
line-length = 88
respect-gitignore = true
src = ["src"]
target-version = "py38"
format = "grouped"
show-fixes = true
[per-file-ignores]
".github/scripts/*.py" = ["E501", "INP001", "SIM117"]
"docker/wait-for-redis.py" = ["INP001"]
"*/tests/*.py" = ["E501", "SIM117"]
"*/migrations/*.py" = ["E501", "SIM"]
"src/paperless_tesseract/tests/test_parser.py" = ["RUF001"]
"src/documents/models.py" = ["SIM115"]
[isort]
force-single-line = true

View File

@@ -1,9 +0,0 @@
/.github/workflows/ @paperless-ngx/ci-cd
/docker/ @paperless-ngx/ci-cd
/scripts/ @paperless-ngx/ci-cd
/src-ui/ @paperless-ngx/frontend
/src/ @paperless-ngx/backend
Pipfile* @paperless-ngx/backend
*.py @paperless-ngx/backend

View File

@@ -27,11 +27,11 @@ Please format and test your code! I know it's a hassle, but it makes sure that y
To test your code, execute `pytest` in the src/ directory. This also generates a html coverage report, which you can use to see if you missed anything important during testing.
Before you can run `pytest`, ensure to [properly set up your local environment](https://docs.paperless-ngx.com/development/#initial-setup-and-first-start).
Before you can run `pytest`, ensure to [properly set up your local environment](https://paperless-ngx.readthedocs.io/en/latest/extending.html#initial-setup-and-first-start).
## More info:
... is available [in the documentation](https://docs.paperless-ngx.com/development).
... is available in the documentation. https://paperless-ngx.readthedocs.io/en/latest/extending.html
# Merging PRs

View File

@@ -1,256 +1,61 @@
# syntax=docker/dockerfile:1.4
# https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/reference.md
FROM node:16 AS compile-frontend
# Stage: compile-frontend
# Purpose: Compiles the frontend
# Notes:
# - Does NPM stuff with Typescript and such
FROM --platform=$BUILDPLATFORM node:16-bullseye-slim AS compile-frontend
COPY ./src-ui /src/src-ui
COPY . /src
WORKDIR /src/src-ui
RUN set -eux \
&& npm update npm -g \
&& npm ci --omit=optional
RUN set -eux \
&& ./node_modules/.bin/ng build --configuration production
RUN npm update npm -g && npm ci --no-optional
RUN ./node_modules/.bin/ng build --configuration production
# Stage: pipenv-base
# Purpose: Generates a requirements.txt file for building
# Comments:
# - pipenv dependencies are not left in the final image
# - pipenv can't touch the final image somehow
FROM --platform=$BUILDPLATFORM python:3.9-slim-bullseye as pipenv-base
WORKDIR /usr/src/pipenv
COPY Pipfile* ./
RUN set -eux \
&& echo "Installing pipenv" \
&& python3 -m pip install --no-cache-dir --upgrade pipenv==2023.3.20 \
&& echo "Generating requirement.txt" \
&& pipenv requirements > requirements.txt
# Stage: main-app
# Purpose: The final image
# Comments:
# - Don't leave anything extra in here
FROM python:3.9-slim-bullseye as main-app
FROM ghcr.io/paperless-ngx/builder/ngx-base:1.0 as main-app
LABEL org.opencontainers.image.authors="paperless-ngx team <hello@paperless-ngx.com>"
LABEL org.opencontainers.image.documentation="https://docs.paperless-ngx.com/"
LABEL org.opencontainers.image.documentation="https://paperless-ngx.readthedocs.io/en/latest/"
LABEL org.opencontainers.image.source="https://github.com/paperless-ngx/paperless-ngx"
LABEL org.opencontainers.image.url="https://github.com/paperless-ngx/paperless-ngx"
LABEL org.opencontainers.image.licenses="GPL-3.0-only"
ARG DEBIAN_FRONTEND=noninteractive
#
# Begin installation and configuration
# Order the steps below from least often changed to most
#
# Packages need for running
ARG RUNTIME_PACKAGES="\
# General utils
curl \
# Docker specific
gosu \
# Timezones support
tzdata \
# fonts for text file thumbnail generation
fonts-liberation \
gettext \
ghostscript \
gnupg \
icc-profiles-free \
imagemagick \
# Image processing
liblept5 \
liblcms2-2 \
libtiff5 \
libfreetype6 \
libwebp6 \
libopenjp2-7 \
libimagequant0 \
libraqm0 \
libjpeg62-turbo \
# PostgreSQL
libpq5 \
postgresql-client \
# MySQL / MariaDB
mariadb-client \
# For Numpy
libatlas3-base \
# OCRmyPDF dependencies
tesseract-ocr \
tesseract-ocr-eng \
tesseract-ocr-deu \
tesseract-ocr-fra \
tesseract-ocr-ita \
tesseract-ocr-spa \
unpaper \
pngquant \
# pikepdf / qpdf
jbig2dec \
libxml2 \
libxslt1.1 \
libgnutls30 \
# Mime type detection
file \
libmagic1 \
media-types \
zlib1g \
# Barcode splitter
libzbar0 \
poppler-utils \
# RapidFuzz on armv7
libatomic1"
# Install basic runtime packages.
# These change very infrequently
RUN set -eux \
echo "Installing system packages" \
&& apt-get update \
&& apt-get install --yes --quiet --no-install-recommends ${RUNTIME_PACKAGES} \
&& rm -rf /var/lib/apt/lists/* \
&& echo "Installing supervisor" \
&& python3 -m pip install --default-timeout=1000 --upgrade --no-cache-dir supervisor==4.2.5
# Copy gunicorn config
# Changes very infrequently
WORKDIR /usr/src/paperless/
COPY gunicorn.conf.py .
# setup docker-specific things
# These change sometimes, but rarely
WORKDIR /usr/src/paperless/src/docker/
COPY [ \
"docker/imagemagick-policy.xml", \
"docker/supervisord.conf", \
"docker/docker-entrypoint.sh", \
"docker/docker-prepare.sh", \
"docker/paperless_cmd.sh", \
"docker/wait-for-redis.py", \
"docker/env-from-file.sh", \
"docker/management_script.sh", \
"docker/flower-conditional.sh", \
"docker/install_management_commands.sh", \
"/usr/src/paperless/src/docker/" \
]
RUN set -eux \
&& echo "Configuring ImageMagick" \
&& mv imagemagick-policy.xml /etc/ImageMagick-6/policy.xml \
&& echo "Configuring supervisord" \
&& mkdir /var/log/supervisord /var/run/supervisord \
&& mv supervisord.conf /etc/supervisord.conf \
&& echo "Setting up Docker scripts" \
&& mv docker-entrypoint.sh /sbin/docker-entrypoint.sh \
&& chmod 755 /sbin/docker-entrypoint.sh \
&& mv docker-prepare.sh /sbin/docker-prepare.sh \
&& chmod 755 /sbin/docker-prepare.sh \
&& mv wait-for-redis.py /sbin/wait-for-redis.py \
&& chmod 755 /sbin/wait-for-redis.py \
&& mv env-from-file.sh /sbin/env-from-file.sh \
&& chmod 755 /sbin/env-from-file.sh \
&& mv paperless_cmd.sh /usr/local/bin/paperless_cmd.sh \
&& chmod 755 /usr/local/bin/paperless_cmd.sh \
&& mv flower-conditional.sh /usr/local/bin/flower-conditional.sh \
&& chmod 755 /usr/local/bin/flower-conditional.sh \
&& echo "Installing managment commands" \
&& chmod +x install_management_commands.sh \
&& ./install_management_commands.sh
# Buildx provided, must be defined to use though
ARG TARGETARCH
ARG TARGETVARIANT
# Workflow provided, defaults set for manual building
ARG JBIG2ENC_VERSION=0.29
ARG QPDF_VERSION=11.3.0
ARG PIKEPDF_VERSION=7.1.1
ARG PSYCOPG2_VERSION=2.9.5
# Install the built packages from the installer library images
# These change sometimes
RUN set -eux \
&& echo "Getting binaries" \
&& mkdir paperless-ngx \
&& curl --fail --silent --show-error --output paperless-ngx.tar.gz --location https://github.com/paperless-ngx/paperless-ngx/archive/ba28a1e16c27d121b644b4f6bdb78855a2850561.tar.gz \
&& tar -xf paperless-ngx.tar.gz --directory paperless-ngx --strip-components=1 \
&& cd paperless-ngx \
# Setting a specific revision ensures we know what this installed
# and ensures cache breaking on changes
&& echo "Installing jbig2enc" \
&& cp ./jbig2enc/${JBIG2ENC_VERSION}/${TARGETARCH}${TARGETVARIANT}/jbig2 /usr/local/bin/ \
&& cp ./jbig2enc/${JBIG2ENC_VERSION}/${TARGETARCH}${TARGETVARIANT}/libjbig2enc* /usr/local/lib/ \
&& echo "Installing qpdf" \
&& apt-get install --yes --no-install-recommends ./qpdf/${QPDF_VERSION}/${TARGETARCH}${TARGETVARIANT}/libqpdf29_*.deb \
&& apt-get install --yes --no-install-recommends ./qpdf/${QPDF_VERSION}/${TARGETARCH}${TARGETVARIANT}/qpdf_*.deb \
&& echo "Installing pikepdf and dependencies" \
&& python3 -m pip install --no-cache-dir ./pikepdf/${PIKEPDF_VERSION}/${TARGETARCH}${TARGETVARIANT}/*.whl \
&& python3 -m pip list \
&& echo "Installing psycopg2" \
&& python3 -m pip install --no-cache-dir ./psycopg2/${PSYCOPG2_VERSION}/${TARGETARCH}${TARGETVARIANT}/psycopg2*.whl \
&& python3 -m pip list \
&& echo "Cleaning up image layer" \
&& cd ../ \
&& rm -rf paperless-ngx \
&& rm paperless-ngx.tar.gz
WORKDIR /usr/src/paperless/src/
COPY requirements.txt ../
# Python dependencies
# Change pretty frequently
COPY --from=pipenv-base /usr/src/pipenv/requirements.txt ./
RUN apt-get update \
# python-Levenshtein still needs to be compiled here
&& apt-get -y --no-install-recommends install \
build-essential \
&& python3 -m pip install --upgrade --no-cache-dir pip wheel \
&& python3 -m pip install --default-timeout=1000 --upgrade --no-cache-dir supervisor \
&& python3 -m pip install --default-timeout=1000 --no-cache-dir -r ../requirements.txt \
&& apt-get -y purge build-essential \
&& apt-get -y autoremove --purge \
&& rm -rf /var/lib/apt/lists/*
# Packages needed only for building a few quick Python
# dependencies
ARG BUILD_PACKAGES="\
build-essential \
git \
default-libmysqlclient-dev \
python3-dev"
# setup docker-specific things
COPY docker/ ./docker/
RUN set -eux \
&& echo "Installing build system packages" \
&& apt-get update \
&& apt-get install --yes --quiet --no-install-recommends ${BUILD_PACKAGES} \
&& python3 -m pip install --no-cache-dir --upgrade wheel \
&& echo "Installing Python requirements" \
&& python3 -m pip install --default-timeout=1000 --no-cache-dir --requirement requirements.txt \
&& echo "Installing NLTK data" \
&& python3 -W ignore::RuntimeWarning -m nltk.downloader -d "/usr/share/nltk_data" snowball_data \
&& python3 -W ignore::RuntimeWarning -m nltk.downloader -d "/usr/share/nltk_data" stopwords \
&& python3 -W ignore::RuntimeWarning -m nltk.downloader -d "/usr/share/nltk_data" punkt \
&& echo "Cleaning up image" \
&& apt-get -y purge ${BUILD_PACKAGES} \
&& apt-get -y autoremove --purge \
&& apt-get clean --yes \
&& rm -rf /var/lib/apt/lists/* \
&& rm -rf /tmp/* \
&& rm -rf /var/tmp/* \
&& rm -rf /var/cache/apt/archives/* \
&& truncate -s 0 /var/log/*log
RUN cd docker \
&& cp imagemagick-policy.xml /etc/ImageMagick-6/policy.xml \
&& mkdir /var/log/supervisord /var/run/supervisord \
&& cp supervisord.conf /etc/supervisord.conf \
&& cp docker-entrypoint.sh /sbin/docker-entrypoint.sh \
&& chmod 755 /sbin/docker-entrypoint.sh \
&& cp docker-prepare.sh /sbin/docker-prepare.sh \
&& chmod 755 /sbin/docker-prepare.sh \
&& chmod +x install_management_commands.sh \
&& ./install_management_commands.sh \
&& cd .. \
&& rm -rf docker/
# copy backend
COPY ./src ./
COPY gunicorn.conf.py ../
# copy frontend
COPY --from=compile-frontend /src/src/documents/static/frontend/ ./documents/static/frontend/
# copy app
COPY --from=compile-frontend /src/src/ ./
# add users, setup scripts
# Mount the compiled frontend to expected location
RUN set -eux \
&& addgroup --gid 1000 paperless \
RUN addgroup --gid 1000 paperless \
&& useradd --uid 1000 --gid paperless --home-dir /usr/src/paperless paperless \
&& chown -R paperless:paperless /usr/src/paperless \
&& gosu paperless python3 manage.py collectstatic --clear --no-input --link \
&& chown -R paperless:paperless ../ \
&& gosu paperless python3 manage.py collectstatic --clear --no-input \
&& gosu paperless python3 manage.py compilemessages
VOLUME ["/usr/src/paperless/data", \
@@ -262,4 +67,4 @@ ENTRYPOINT ["/sbin/docker-entrypoint.sh"]
EXPOSE 8000
CMD ["/usr/local/bin/paperless_cmd.sh"]
CMD ["/usr/local/bin/supervisord", "-c", "/etc/supervisord.conf"]

78
Pipfile
View File

@@ -10,37 +10,35 @@ name = "piwheels"
[packages]
dateparser = "~=1.1"
django = "~=4.1"
django = "~=4.0"
django-cors-headers = "*"
django-celery-results = "*"
django-compression-middleware = "*"
django-guardian = "*"
django-extensions = "*"
django-filter = "~=22.1"
djangorestframework = "~=3.14"
djangorestframework-guardian = "*"
django-ipware = "*"
django-filter = "~=21.1"
django-q = "~=1.3"
djangorestframework = "~=3.13"
filelock = "*"
fuzzywuzzy = {extras = ["speedup"], version = "*"}
gunicorn = "*"
imap-tools = "*"
langdetect = "*"
pathvalidate = "*"
pillow = "~=9.4"
pikepdf = "*"
pillow = "~=9.0"
# Any version update to pikepdf requires a base image update
pikepdf = "~=5.1"
python-gnupg = "*"
python-dotenv = "*"
python-dateutil = "*"
python-magic = "*"
# Any version update to psycopg2 requires a base image update
psycopg2 = "*"
rapidfuzz = "*"
redis = {extras = ["hiredis"], version = "*"}
scikit-learn = "~=1.2"
numpy = "*"
whitenoise = "~=6.3"
watchdog = "~=2.2"
whoosh="~=2.7"
redis = "*"
# Pinned because aarch64 wheels and updates cause warnings when loading the classifier model.
scikit-learn="==1.0.2"
whitenoise = "~=6.0.0"
watchdog = "~=2.1.0"
whoosh="~=2.7.4"
inotifyrecursive = "~=0.3"
ocrmypdf = "~=14.0"
ocrmypdf = "~=13.4"
tqdm = "*"
tika = "*"
# TODO: This will sadly also install daphne+dependencies,
@@ -50,52 +48,22 @@ channels-redis = "*"
uvicorn = {extras = ["standard"], version = "*"}
concurrent-log-handler = "*"
"pdfminer.six" = "*"
pyzbar = "*"
mysqlclient = "*"
celery = {extras = ["redis"], version = "*"}
setproctitle = "*"
nltk = "*"
pdf2image = "*"
flower = "*"
bleach = "*"
zxing-cpp = {version = "*", platform_machine = "== 'x86_64'"}
#
# Packages locked due to issues (try to check if these are fixed in a release every so often)
#
# Pin this until piwheels is building 1.9 (see https://www.piwheels.org/project/scipy/)
scipy = "==1.8.1"
"backports.zoneinfo" = {version = "*", markers = "python_version < '3.9'"}
"importlib-resources" = {version = "*", markers = "python_version < '3.9'"}
zipp = {version = "*", markers = "python_version < '3.9'"}
[dev-packages]
coveralls = "*"
factory-boy = "*"
pycodestyle = "*"
pytest = "*"
pytest-cov = "*"
pytest-django = "*"
pytest-env = "*"
pytest-sugar = "*"
pytest-xdist = "*"
sphinx = "~=4.4.0"
sphinx_rtd_theme = "*"
tox = "*"
black = "*"
pre-commit = "*"
imagehash = "*"
mkdocs-material = "*"
ruff = "*"
[typing-dev]
mypy = "*"
types-Pillow = "*"
django-filter-stubs = "*"
types-python-dateutil = "*"
djangorestframework-stubs = {extras= ["compatible-mypy"], version="*"}
celery-types = "*"
django-stubs = {extras= ["compatible-mypy"], version="*"}
types-dateparser = "*"
types-bleach = "*"
types-humanfriendly = "*"
types-redis = "*"
types-tqdm = "*"
types-Markdown = "*"
types-Pygments = "*"
types-backports = "*"
types-colorama = "*"
types-psycopg2 = "*"
types-setuptools = "*"

4101
Pipfile.lock generated

File diff suppressed because it is too large Load Diff

View File

@@ -1,9 +1,8 @@
[![ci](https://github.com/paperless-ngx/paperless-ngx/workflows/ci/badge.svg)](https://github.com/paperless-ngx/paperless-ngx/actions)
[![Crowdin](https://badges.crowdin.net/paperless-ngx/localized.svg)](https://crowdin.com/project/paperless-ngx)
[![Documentation Status](https://img.shields.io/github/deployments/paperless-ngx/paperless-ngx/github-pages?label=docs)](https://docs.paperless-ngx.com)
[![codecov](https://codecov.io/gh/paperless-ngx/paperless-ngx/branch/main/graph/badge.svg?token=VK6OUPJ3TY)](https://codecov.io/gh/paperless-ngx/paperless-ngx)
[![Chat on Matrix](https://matrix.to/img/matrix-badge.svg)](https://matrix.to/#/%23paperlessngx%3Amatrix.org)
[![demo](https://cronitor.io/badges/ve7ItY/production/W5E_B9jkelG9ZbDiNHUPQEVH3MY.svg)](https://demo.paperless-ngx.com)
[![Documentation Status](https://readthedocs.org/projects/paperless-ngx/badge/?version=latest)](https://paperless-ngx.readthedocs.io/en/latest/?badge=latest)
[![Coverage Status](https://coveralls.io/repos/github/paperless-ngx/paperless-ngx/badge.svg?branch=master)](https://coveralls.io/github/paperless-ngx/paperless-ngx?branch=master)
[![Chat on Matrix](https://matrix.to/img/matrix-badge.svg)](https://matrix.to/#/#paperless:adnidor.de)
<p align="center">
<img src="https://github.com/paperless-ngx/paperless-ngx/raw/main/resources/logo/web/png/Black%20logo%20-%20no%20background.png#gh-light-mode-only" width="50%" />
@@ -33,13 +32,13 @@ A demo is available at [demo.paperless-ngx.com](https://demo.paperless-ngx.com)
# Features
![Dashboard](https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/docs/assets/screenshots/documents-smallcards.png#gh-light-mode-only)
![Dashboard](https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/docs/assets/screenshots/documents-smallcards-dark.png#gh-dark-mode-only)
![Dashboard](https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/docs/_static/screenshots/documents-wchrome.png#gh-light-mode-only)
![Dashboard](https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/docs/_static/screenshots/documents-wchrome-dark.png#gh-dark-mode-only)
- Organize and index your scanned documents with tags, correspondents, types, and more.
- Performs OCR on your documents, adds selectable text to image only documents and adds tags, correspondents and document types to your documents.
- Supports PDF documents, images, plain text files, and Office documents (Word, Excel, Powerpoint, and LibreOffice equivalents).
- Office document support is optional and provided by Apache Tika (see [configuration](https://docs.paperless-ngx.com/configuration/#tika))
- Office document support is optional and provided by Apache Tika (see [configuration](https://paperless-ngx.readthedocs.io/en/latest/configuration.html#tika-settings))
- Paperless stores your documents plain on disk. Filenames and folders are managed by paperless and their format can be configured freely.
- Single page application front end.
- Includes a dashboard that shows basic statistics and has document upload.
@@ -57,7 +56,7 @@ A demo is available at [demo.paperless-ngx.com](https://demo.paperless-ngx.com)
- Paperless-ngx learns from your documents and will be able to automatically assign tags, correspondents and types to documents once you've stored a few documents in paperless.
- Optimized for multi core systems: Paperless-ngx consumes multiple documents in parallel.
- The integrated sanity checker makes sure that your document archive is in good health.
- [More screenshots are available in the documentation](https://docs.paperless-ngx.com/#screenshots).
- [More screenshots are available in the documentation](https://paperless-ngx.readthedocs.io/en/latest/screenshots.html).
# Getting started
@@ -69,19 +68,19 @@ If you'd like to jump right in, you can configure a docker-compose environment w
bash -c "$(curl -L https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/install-paperless-ngx.sh)"
```
Alternatively, you can install the dependencies and setup apache and a database server yourself. The [documentation](https://docs.paperless-ngx.com/setup/#installation) has a step by step guide on how to do it.
Alternatively, you can install the dependencies and setup apache and a database server yourself. The [documentation](https://paperless-ngx.readthedocs.io/en/latest/setup.html#installation) has a step by step guide on how to do it.
Migrating from Paperless-ng is easy, just drop in the new docker image! See the [documentation on migrating](https://docs.paperless-ngx.com/setup/#migrating-to-paperless-ngx) for more details.
Migrating from Paperless-ng is easy, just drop in the new docker image! See the [documentation on migrating](https://paperless-ngx.readthedocs.io/en/latest/setup.html#migrating-from-paperless-ng) for more details.
<!-- omit in toc -->
### Documentation
The documentation for Paperless-ngx is available at [https://docs.paperless-ngx.com](https://docs.paperless-ngx.com/).
The documentation for Paperless-ngx is available on [ReadTheDocs](https://paperless-ngx.readthedocs.io/).
# Contributing
If you feel like contributing to the project, please do! Bug fixes, enhancements, visual fixes etc. are always welcome. If you want to implement something big: Please start a discussion about that! The [documentation](https://docs.paperless-ngx.com/development/) has some basic information on how to get started.
If you feel like contributing to the project, please do! Bug fixes, enhancements, visual fixes etc. are always welcome. If you want to implement something big: Please start a discussion about that! The [documentation](https://paperless-ngx.readthedocs.io/en/latest/extending.html) has some basic information on how to get started.
## Community Support
@@ -101,16 +100,21 @@ For bugs please [open an issue](https://github.com/paperless-ngx/paperless-ngx/i
# Affiliated Projects
Paperless has been around for a while now, and people have built tools that interact with it. If you're one of them, please reach out and we can add your project to the list. Current projects include:
Paperless has been around a while now, and people are starting to build stuff on top of it. If you're one of those people, we can add your project to this list:
- **Mobile**
- [Paperless App](https://github.com/bauerj/paperless_app): An Android/iOS application for Paperless-ngx.
- [Paperless Mobile](https://github.com/astubenbord/paperless-mobile): A modern, feature rich Android app for Paperless-ngx.
- [Paperless Share](https://github.com/qcasey/paperless_share): Share any files from your Android application with Paperless-ngx. Very simple, but works with all mobile scanning apps that allow you to share scanned documents.
- **Desktop**
- [Scan to Paperless](https://github.com/sbrunner/scan-to-paperless): Scan and prepare (crop, deskew, OCR, ...) your documents for use in Paperless-ngx.
- [Paperless App](https://github.com/bauerj/paperless_app): An Android/iOS app for Paperless-ngx. Also works with the original Paperless and Paperless-ngx.
- [Paperless Share](https://github.com/qcasey/paperless_share). Share any files from your Android application with paperless. Very simple, but works with all of the mobile scanning apps out there that allow you to share scanned documents.
- [Scan to Paperless](https://github.com/sbrunner/scan-to-paperless): Scan and prepare (crop, deskew, OCR, ...) your documents for Paperless.
These projects also exist, but their status and compatibility with paperless-ngx is unknown.
- [paperless-cli](https://github.com/stgarf/paperless-cli): A golang command line binary to interact with a Paperless instance.
This project also exists, but needs updates to be compatible with paperless-ngx.
- [Paperless Desktop](https://github.com/thomasbrueggemann/paperless-desktop): A desktop UI for your Paperless installation. Runs on Mac, Linux, and Windows.
Known issues on Mac: (Could not load reminders and documents)
# Important Note
> Document scanners are typically used to scan sensitive documents like your social insurance number, tax records, invoices, etc. **Paperless-ngx should never be run on an untrusted host** because information is stored in clear text without encryption. No guarantees are made regarding security (but we do try!) and you use the app at your own risk.
> **The safest way to run Paperless-ngx is on a local server in your own home with backups in place**.
Document scanners are typically used to scan sensitive documents. Things like your social insurance number, tax records, invoices, etc. Everything is stored in the clear without encryption. This means that Paperless should never be run on an untrusted host. Instead, I recommend that if you do want to use it, run it locally on a server in your own home.

View File

@@ -1,81 +0,0 @@
#!/usr/bin/env bash
# Helper script for building the Docker image locally.
# Parses and provides the nessecary versions of other images to Docker
# before passing in the rest of script args.
# First Argument: The Dockerfile to build
# Other Arguments: Additional arguments to docker build
# Example Usage:
# ./build-docker-image.sh Dockerfile -t paperless-ngx:my-awesome-feature
set -eu
if ! command -v jq &> /dev/null ; then
echo "jq required"
exit 1
elif [ ! -f "$1" ]; then
echo "$1 is not a file, please provide the Dockerfile"
exit 1
fi
# Get the branch name (used for caching)
branch_name=$(git rev-parse --abbrev-ref HEAD)
# Parse eithe Pipfile.lock or the .build-config.json
jbig2enc_version=$(jq -r '.jbig2enc.version' .build-config.json)
qpdf_version=$(jq -r '.qpdf.version' .build-config.json)
psycopg2_version=$(jq -r '.default.psycopg2.version | gsub("=";"")' Pipfile.lock)
pikepdf_version=$(jq -r '.default.pikepdf.version | gsub("=";"")' Pipfile.lock)
pillow_version=$(jq -r '.default.pillow.version | gsub("=";"")' Pipfile.lock)
lxml_version=$(jq -r '.default.lxml.version | gsub("=";"")' Pipfile.lock)
base_filename="$(basename -- "${1}")"
build_args_str=""
cache_from_str=""
case "${base_filename}" in
*.jbig2enc)
build_args_str="--build-arg JBIG2ENC_VERSION=${jbig2enc_version}"
cache_from_str="--cache-from ghcr.io/paperless-ngx/paperless-ngx/builder/cache/jbig2enc:${jbig2enc_version}"
;;
*.psycopg2)
build_args_str="--build-arg PSYCOPG2_VERSION=${psycopg2_version}"
cache_from_str="--cache-from ghcr.io/paperless-ngx/paperless-ngx/builder/cache/psycopg2:${psycopg2_version}"
;;
*.qpdf)
build_args_str="--build-arg QPDF_VERSION=${qpdf_version}"
cache_from_str="--cache-from ghcr.io/paperless-ngx/paperless-ngx/builder/cache/qpdf:${qpdf_version}"
;;
*.pikepdf)
build_args_str="--build-arg QPDF_VERSION=${qpdf_version} --build-arg PIKEPDF_VERSION=${pikepdf_version} --build-arg PILLOW_VERSION=${pillow_version} --build-arg LXML_VERSION=${lxml_version}"
cache_from_str="--cache-from ghcr.io/paperless-ngx/paperless-ngx/builder/cache/pikepdf:${pikepdf_version}"
;;
Dockerfile)
build_args_str="--build-arg QPDF_VERSION=${qpdf_version} --build-arg PIKEPDF_VERSION=${pikepdf_version} --build-arg PSYCOPG2_VERSION=${psycopg2_version} --build-arg JBIG2ENC_VERSION=${jbig2enc_version}"
cache_from_str="--cache-from ghcr.io/paperless-ngx/paperless-ngx/builder/cache/app:${branch_name} --cache-from ghcr.io/paperless-ngx/paperless-ngx/builder/cache/app:dev"
;;
*)
echo "Unable to match ${base_filename}"
exit 1
;;
esac
read -r -a build_args_arr <<< "${build_args_str}"
read -r -a cache_from_arr <<< "${cache_from_str}"
set -eux
docker buildx build --file "${1}" \
--progress=plain \
--output=type=docker \
"${cache_from_arr[@]}" \
"${build_args_arr[@]}" \
"${@:2}" .

View File

@@ -1,48 +0,0 @@
# This Dockerfile compiles the jbig2enc library
# Inputs:
# - JBIG2ENC_VERSION - the Git tag to checkout and build
FROM debian:bullseye-slim as main
LABEL org.opencontainers.image.description="A intermediate image with jbig2enc built"
ARG DEBIAN_FRONTEND=noninteractive
ARG JBIG2ENC_VERSION
ARG BUILD_PACKAGES="\
build-essential \
automake \
libtool \
libleptonica-dev \
zlib1g-dev \
git \
ca-certificates"
WORKDIR /usr/src/jbig2enc
RUN set -eux \
&& echo "Installing build tools" \
&& apt-get update --quiet \
&& apt-get install --yes --quiet --no-install-recommends ${BUILD_PACKAGES} \
&& echo "Building jbig2enc" \
&& git clone --quiet --branch $JBIG2ENC_VERSION https://github.com/agl/jbig2enc . \
&& ./autogen.sh \
&& ./configure \
&& make \
&& echo "Gathering package data" \
&& dpkg-query -f '${Package;-40}${Version}\n' -W > ./pkg-list.txt \
&& echo "Cleaning up image" \
&& apt-get -y purge ${BUILD_PACKAGES} \
&& apt-get -y autoremove --purge \
&& rm -rf /var/lib/apt/lists/* \
&& echo "Moving files around" \
&& mkdir build \
# Unlink a symlink that causes problems
&& unlink ./src/.libs/libjbig2enc.la \
# Move what the link pointed to
&& mv ./src/libjbig2enc.la ./build/ \
# Move the shared library .so files
&& mv ./src/.libs/libjbig2enc* ./build/ \
# And move the cli binary
&& mv ./src/jbig2 ./build/ \
&& mv ./pkg-list.txt ./build/

View File

@@ -1,118 +0,0 @@
# This Dockerfile builds the pikepdf wheel
# Inputs:
# - REPO - Docker repository to pull qpdf from
# - QPDF_VERSION - The image qpdf version to copy .deb files from
# - PIKEPDF_VERSION - Version of pikepdf to build wheel for
# Default to pulling from the main repo registry when manually building
ARG REPO="paperless-ngx/paperless-ngx"
# This does nothing, except provide a name for a copy below
ARG QPDF_VERSION
FROM --platform=$BUILDPLATFORM ghcr.io/${REPO}/builder/qpdf:${QPDF_VERSION} as qpdf-builder
#
# Stage: builder
# Purpose:
# - Build the pikepdf wheel
# - Build any dependent wheels which can't be found
#
FROM python:3.9-slim-bullseye as builder
LABEL org.opencontainers.image.description="A intermediate image with pikepdf wheel built"
# Buildx provided
ARG TARGETARCH
ARG TARGETVARIANT
ARG DEBIAN_FRONTEND=noninteractive
# Workflow provided
ARG QPDF_VERSION
ARG PIKEPDF_VERSION
# These are not used, but will still bust the cache if one changes
# Otherwise, the main image will try to build thing (and fail)
ARG PILLOW_VERSION
ARG LXML_VERSION
ARG BUILD_PACKAGES="\
build-essential \
python3-dev \
python3-pip \
# qpdf requirement - https://github.com/qpdf/qpdf#crypto-providers
libgnutls28-dev \
# lxml requrements - https://lxml.de/installation.html
libxml2-dev \
libxslt1-dev \
# Pillow requirements - https://pillow.readthedocs.io/en/stable/installation.html#external-libraries
# JPEG functionality
libjpeg62-turbo-dev \
# conpressed PNG
zlib1g-dev \
# compressed TIFF
libtiff-dev \
# type related services
libfreetype-dev \
# color management
liblcms2-dev \
# WebP format
libwebp-dev \
# JPEG 2000
libopenjp2-7-dev \
# improved color quantization
libimagequant-dev \
# complex text layout support
libraqm-dev"
WORKDIR /usr/src
COPY --from=qpdf-builder /usr/src/qpdf/${QPDF_VERSION}/${TARGETARCH}${TARGETVARIANT}/*.deb ./
# As this is an base image for a multi-stage final image
# the added size of the install is basically irrelevant
RUN set -eux \
&& echo "Installing build tools" \
&& apt-get update --quiet \
&& apt-get install --yes --quiet --no-install-recommends ${BUILD_PACKAGES} \
&& echo "Installing qpdf" \
&& dpkg --install libqpdf29_*.deb \
&& dpkg --install libqpdf-dev_*.deb \
&& echo "Installing Python tools" \
&& python3 -m pip install --no-cache-dir --upgrade \
pip \
wheel \
# https://pikepdf.readthedocs.io/en/latest/installation.html#requirements
pybind11 \
&& echo "Building pikepdf wheel ${PIKEPDF_VERSION}" \
&& mkdir wheels \
&& python3 -m pip wheel \
# Build the package at the required version
pikepdf==${PIKEPDF_VERSION} \
# Look to piwheels for additional pre-built wheels
--extra-index-url https://www.piwheels.org/simple \
# Output the *.whl into this directory
--wheel-dir wheels \
# Do not use a binary packge for the package being built
--no-binary=pikepdf \
# Do use binary packages for dependencies
--prefer-binary \
# Don't cache build files
--no-cache-dir \
&& ls -ahl wheels \
&& echo "Gathering package data" \
&& dpkg-query -f '${Package;-40}${Version}\n' -W > ./wheels/pkg-list.txt \
&& echo "Cleaning up image" \
&& apt-get -y purge ${BUILD_PACKAGES} \
&& apt-get -y autoremove --purge \
&& rm -rf /var/lib/apt/lists/*
#
# Stage: package
# Purpose: Holds the compiled .whl files in a tiny image to pull
#
FROM alpine:3.17 as package
WORKDIR /usr/src/wheels/
COPY --from=builder /usr/src/wheels/*.whl ./
COPY --from=builder /usr/src/wheels/pkg-list.txt ./

View File

@@ -1,66 +0,0 @@
# This Dockerfile builds the psycopg2 wheel
# Inputs:
# - PSYCOPG2_VERSION - Version to build
#
# Stage: builder
# Purpose:
# - Build the psycopg2 wheel
#
FROM python:3.9-slim-bullseye as builder
LABEL org.opencontainers.image.description="A intermediate image with psycopg2 wheel built"
ARG PSYCOPG2_VERSION
ARG DEBIAN_FRONTEND=noninteractive
ARG BUILD_PACKAGES="\
build-essential \
python3-dev \
python3-pip \
# https://www.psycopg.org/docs/install.html#prerequisites
libpq-dev"
WORKDIR /usr/src
# As this is an base image for a multi-stage final image
# the added size of the install is basically irrelevant
RUN set -eux \
&& echo "Installing build tools" \
&& apt-get update --quiet \
&& apt-get install --yes --quiet --no-install-recommends ${BUILD_PACKAGES} \
&& echo "Installing Python tools" \
&& python3 -m pip install --no-cache-dir --upgrade pip wheel \
&& echo "Building psycopg2 wheel ${PSYCOPG2_VERSION}" \
&& cd /usr/src \
&& mkdir wheels \
&& python3 -m pip wheel \
# Build the package at the required version
psycopg2==${PSYCOPG2_VERSION} \
# Output the *.whl into this directory
--wheel-dir wheels \
# Do not use a binary packge for the package being built
--no-binary=psycopg2 \
# Do use binary packages for dependencies
--prefer-binary \
# Don't cache build files
--no-cache-dir \
&& ls -ahl wheels/ \
&& echo "Gathering package data" \
&& dpkg-query -f '${Package;-40}${Version}\n' -W > ./wheels/pkg-list.txt \
&& echo "Cleaning up image" \
&& apt-get -y purge ${BUILD_PACKAGES} \
&& apt-get -y autoremove --purge \
&& rm -rf /var/lib/apt/lists/*
#
# Stage: package
# Purpose: Holds the compiled .whl files in a tiny image to pull
#
FROM alpine:3.17 as package
WORKDIR /usr/src/wheels/
COPY --from=builder /usr/src/wheels/*.whl ./
COPY --from=builder /usr/src/wheels/pkg-list.txt ./

View File

@@ -1,156 +0,0 @@
#
# Stage: pre-build
# Purpose:
# - Installs common packages
# - Sets common environment variables related to dpkg
# - Aquires the qpdf source from bookwork
# Useful Links:
# - https://qpdf.readthedocs.io/en/stable/installation.html#system-requirements
# - https://wiki.debian.org/Multiarch/HOWTO
# - https://wiki.debian.org/CrossCompiling
#
FROM debian:bullseye-slim as pre-build
ARG QPDF_VERSION
ARG COMMON_BUILD_PACKAGES="\
cmake \
debhelper\
debian-keyring \
devscripts \
dpkg-dev \
equivs \
packaging-dev \
libtool"
ENV DEB_BUILD_OPTIONS="terse nocheck nodoc parallel=2"
WORKDIR /usr/src
RUN set -eux \
&& echo "Installing common packages" \
&& apt-get update --quiet \
&& apt-get install --yes --quiet --no-install-recommends ${COMMON_BUILD_PACKAGES} \
&& echo "Getting qpdf source" \
&& echo "deb-src http://deb.debian.org/debian/ bookworm main" > /etc/apt/sources.list.d/bookworm-src.list \
&& apt-get update --quiet \
&& apt-get source --yes --quiet qpdf=${QPDF_VERSION}-1/bookworm
#
# Stage: amd64-builder
# Purpose: Builds qpdf for x86_64 (native build)
#
FROM pre-build as amd64-builder
ARG AMD64_BUILD_PACKAGES="\
build-essential \
libjpeg62-turbo-dev:amd64 \
libgnutls28-dev:amd64 \
zlib1g-dev:amd64"
WORKDIR /usr/src/qpdf-${QPDF_VERSION}
RUN set -eux \
&& echo "Beginning amd64" \
&& echo "Install amd64 packages" \
&& apt-get update --quiet \
&& apt-get install --yes --quiet --no-install-recommends ${AMD64_BUILD_PACKAGES} \
&& echo "Building amd64" \
&& dpkg-buildpackage --build=binary --unsigned-source --unsigned-changes --post-clean \
&& echo "Removing debug files" \
&& rm -f ../libqpdf29-dbgsym* \
&& rm -f ../qpdf-dbgsym* \
&& echo "Gathering package data" \
&& dpkg-query -f '${Package;-40}${Version}\n' -W > ../pkg-list.txt
#
# Stage: armhf-builder
# Purpose:
# - Sets armhf specific environment
# - Builds qpdf for armhf (cross compile)
#
FROM pre-build as armhf-builder
ARG ARMHF_PACKAGES="\
crossbuild-essential-armhf \
libjpeg62-turbo-dev:armhf \
libgnutls28-dev:armhf \
zlib1g-dev:armhf"
WORKDIR /usr/src/qpdf-${QPDF_VERSION}
ENV CXX="/usr/bin/arm-linux-gnueabihf-g++" \
CC="/usr/bin/arm-linux-gnueabihf-gcc"
RUN set -eux \
&& echo "Beginning armhf" \
&& echo "Install armhf packages" \
&& dpkg --add-architecture armhf \
&& apt-get update --quiet \
&& apt-get install --yes --quiet --no-install-recommends ${ARMHF_PACKAGES} \
&& echo "Building armhf" \
&& dpkg-buildpackage --build=binary --unsigned-source --unsigned-changes --post-clean --host-arch armhf \
&& echo "Removing debug files" \
&& rm -f ../libqpdf29-dbgsym* \
&& rm -f ../qpdf-dbgsym* \
&& echo "Gathering package data" \
&& dpkg-query -f '${Package;-40}${Version}\n' -W > ../pkg-list.txt
#
# Stage: aarch64-builder
# Purpose:
# - Sets aarch64 specific environment
# - Builds qpdf for aarch64 (cross compile)
#
FROM pre-build as aarch64-builder
ARG ARM64_PACKAGES="\
crossbuild-essential-arm64 \
libjpeg62-turbo-dev:arm64 \
libgnutls28-dev:arm64 \
zlib1g-dev:arm64"
ENV CXX="/usr/bin/aarch64-linux-gnu-g++" \
CC="/usr/bin/aarch64-linux-gnu-gcc"
WORKDIR /usr/src/qpdf-${QPDF_VERSION}
RUN set -eux \
&& echo "Beginning arm64" \
&& echo "Install arm64 packages" \
&& dpkg --add-architecture arm64 \
&& apt-get update --quiet \
&& apt-get install --yes --quiet --no-install-recommends ${ARM64_PACKAGES} \
&& echo "Building arm64" \
&& dpkg-buildpackage --build=binary --unsigned-source --unsigned-changes --post-clean --host-arch arm64 \
&& echo "Removing debug files" \
&& rm -f ../libqpdf29-dbgsym* \
&& rm -f ../qpdf-dbgsym* \
&& echo "Gathering package data" \
&& dpkg-query -f '${Package;-40}${Version}\n' -W > ../pkg-list.txt
#
# Stage: package
# Purpose: Holds the compiled .deb files in arch/variant specific folders
#
FROM alpine:3.17 as package
LABEL org.opencontainers.image.description="A image with qpdf installers stored in architecture & version specific folders"
ARG QPDF_VERSION
WORKDIR /usr/src/qpdf/${QPDF_VERSION}/amd64
COPY --from=amd64-builder /usr/src/*.deb ./
COPY --from=amd64-builder /usr/src/pkg-list.txt ./
# Note this is ${TARGETARCH}${TARGETVARIANT} for armv7
WORKDIR /usr/src/qpdf/${QPDF_VERSION}/armv7
COPY --from=armhf-builder /usr/src/*.deb ./
COPY --from=armhf-builder /usr/src/pkg-list.txt ./
WORKDIR /usr/src/qpdf/${QPDF_VERSION}/arm64
COPY --from=aarch64-builder /usr/src/*.deb ./
COPY --from=aarch64-builder /usr/src/pkg-list.txt ./

View File

@@ -1,57 +0,0 @@
# Installer Library
This folder contains the Dockerfiles for building certain installers or libraries, which are then pulled into the main image.
## [jbig2enc](https://github.com/agl/jbig2enc)
### Why
JBIG is an image coding which can achieve better compression of images for PDFs.
### What
The Docker image builds a shared library file and utility, which is copied into the correct location in the final image.
### Updating
1. Ensure the given qpdf version is present in [Debian bookworm](https://packages.debian.org/bookworm/qpdf)
2. Update `.build-config.json` to the given version
3. If the Debian specific version has incremented, update `Dockerfile.qpdf`
See Also:
- [OCRMyPDF Documentation](https://ocrmypdf.readthedocs.io/en/latest/jbig2.html)
## [psycopg2](https://www.psycopg.org/)
### Why
The pre-built wheels of psycopg2 are built on Debian 9, which provides a quite old version of libpq-dev. This causes issue with authentication methods.
### What
The image builds psycopg2 wheels on Debian 10 and places the produced wheels into `/usr/src/wheels/`.
See Also:
- [Issue 266](https://github.com/paperless-ngx/paperless-ngx/issues/266)
## [qpdf](https://qpdf.readthedocs.io/en/stable/index.html)
### Why
qpdf and it's library provide tools to read, manipulate and fix up PDFs. Version 11 is also required by `pikepdf` 6+ and Debian 9 does not provide above version 10.
### What
The Docker image cross compiles .deb installers for each supported architecture of the main image. The installers are placed in `/usr/src/qpdf/${QPDF_VERSION}/${TARGETARCH}${TARGETVARIANT}/`
## [pikepdf](https://pikepdf.readthedocs.io/en/latest/)
### Why
Required by OCRMyPdf, this is a general purpose library for PDF manipulation in Python via the qpdf libraries.
### What
The built wheels are placed into `/usr/src/wheels/`

View File

@@ -1,25 +0,0 @@
# docker-compose file for running paperless testing with actual gotenberg
# and Tika containers for a more end to end test of the Tika related functionality
# Can be used locally or by the CI to start the nessecary containers with the
# correct networking for the tests
version: "3.7"
services:
gotenberg:
image: docker.io/gotenberg/gotenberg:7.8
hostname: gotenberg
container_name: gotenberg
network_mode: host
restart: unless-stopped
# The gotenberg chromium route is used to convert .eml files. We do not
# want to allow external content like tracking pixels or even javascript.
command:
- "gotenberg"
- "--chromium-disable-javascript=true"
- "--chromium-allow-list=file:///tmp/.*"
tika:
image: ghcr.io/paperless-ngx/tika:latest
hostname: tika
container_name: tika
network_mode: host
restart: unless-stopped

View File

@@ -22,10 +22,6 @@
# Docker setup does not use the configuration file.
# A few commonly adjusted settings are provided below.
# This is required if you will be exposing Paperless-ngx on a public domain
# (if doing so please consider security measures such as reverse proxy)
#PAPERLESS_URL=https://paperless.example.com
# Adjust this key if you plan to make paperless available publicly. It should
# be a very long sequence of random characters. You don't need to remember it.
#PAPERLESS_SECRET_KEY=change-me
@@ -36,7 +32,3 @@
# The default language to use for OCR. Set this to the language most of your
# documents are written in.
#PAPERLESS_OCR_LANGUAGE=eng
# Set if accessing paperless via a domain subpath e.g. https://domain.com/PATHPREFIX and using a reverse-proxy like traefik or nginx
#PAPERLESS_FORCE_SCRIPT_NAME=/PATHPREFIX
#PAPERLESS_STATIC_URL=/PATHPREFIX/static/ # trailing slash required

View File

@@ -1,81 +0,0 @@
# docker-compose file for running paperless from the Docker Hub.
# This file contains everything paperless needs to run.
# Paperless supports amd64, arm and arm64 hardware.
#
# All compose files of paperless configure paperless in the following way:
#
# - Paperless is (re)started on system boot, if it was running before shutdown.
# - Docker volumes for storing data are managed by Docker.
# - Folders for importing and exporting files are created in the same directory
# as this file and mounted to the correct folders inside the container.
# - Paperless listens on port 8000.
#
# In addition to that, this docker-compose file adds the following optional
# configurations:
#
# - Instead of SQLite (default), MariaDB is used as the database server.
#
# To install and update paperless with this file, do the following:
#
# - Copy this file as 'docker-compose.yml' and the files 'docker-compose.env'
# and '.env' into a folder.
# - Run 'docker-compose pull'.
# - Run 'docker-compose run --rm webserver createsuperuser' to create a user.
# - Run 'docker-compose up -d'.
#
# For more extensive installation and update instructions, refer to the
# documentation.
version: "3.4"
services:
broker:
image: docker.io/library/redis:7
restart: unless-stopped
volumes:
- redisdata:/data
db:
image: docker.io/library/mariadb:10
restart: unless-stopped
volumes:
- dbdata:/var/lib/mysql
environment:
MARIADB_HOST: paperless
MARIADB_DATABASE: paperless
MARIADB_USER: paperless
MARIADB_PASSWORD: paperless
MARIADB_ROOT_PASSWORD: paperless
webserver:
image: ghcr.io/paperless-ngx/paperless-ngx:latest
restart: unless-stopped
depends_on:
- db
- broker
ports:
- "8000:8000"
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8000"]
interval: 30s
timeout: 10s
retries: 5
volumes:
- data:/usr/src/paperless/data
- media:/usr/src/paperless/media
- ./export:/usr/src/paperless/export
- ./consume:/usr/src/paperless/consume
env_file: docker-compose.env
environment:
PAPERLESS_REDIS: redis://broker:6379
PAPERLESS_DBENGINE: mariadb
PAPERLESS_DBHOST: db
PAPERLESS_DBUSER: paperless # only needed if non-default username
PAPERLESS_DBPASS: paperless # only needed if non-default password
PAPERLESS_DBPORT: 3306
volumes:
data:
media:
dbdata:
redisdata:

View File

@@ -31,13 +31,13 @@
version: "3.4"
services:
broker:
image: docker.io/library/redis:7
image: redis:6.0
restart: unless-stopped
volumes:
- redisdata:/data
db:
image: docker.io/library/postgres:13
image: postgres:13
restart: unless-stopped
volumes:
- pgdata:/var/lib/postgresql/data
@@ -53,9 +53,9 @@ services:
- db
- broker
ports:
- "8010:8000"
- 8010:8000
healthcheck:
test: ["CMD", "curl", "-fs", "-S", "--max-time", "2", "http://localhost:8000"]
test: ["CMD", "curl", "-f", "http://localhost:8000"]
interval: 30s
timeout: 10s
retries: 5

View File

@@ -1,6 +1,7 @@
# docker-compose file for running paperless from the docker container registry.
# This file contains everything paperless needs to run.
# Paperless supports amd64, arm and arm64 hardware.
# Paperless supports amd64, arm and arm64 hardware. The apache/tika image
# does not support arm or arm64, however.
#
# All compose files of paperless configure paperless in the following way:
#
@@ -33,13 +34,13 @@
version: "3.4"
services:
broker:
image: docker.io/library/redis:7
image: redis:6.0
restart: unless-stopped
volumes:
- redisdata:/data
db:
image: docker.io/library/postgres:13
image: postgres:13
restart: unless-stopped
volumes:
- pgdata:/var/lib/postgresql/data
@@ -57,9 +58,9 @@ services:
- gotenberg
- tika
ports:
- "8000:8000"
- 8000:8000
healthcheck:
test: ["CMD", "curl", "-fs", "-S", "--max-time", "2", "http://localhost:8000"]
test: ["CMD", "curl", "-f", "http://localhost:8000"]
interval: 30s
timeout: 10s
retries: 5
@@ -77,18 +78,14 @@ services:
PAPERLESS_TIKA_ENDPOINT: http://tika:9998
gotenberg:
image: docker.io/gotenberg/gotenberg:7.8
image: gotenberg/gotenberg:7
restart: unless-stopped
# The gotenberg chromium route is used to convert .eml files. We do not
# want to allow external content like tracking pixels or even javascript.
command:
- "gotenberg"
- "--chromium-disable-javascript=true"
- "--chromium-allow-list=file:///tmp/.*"
- "--chromium-disable-routes=true"
tika:
image: ghcr.io/paperless-ngx/tika:latest
image: apache/tika
restart: unless-stopped
volumes:

View File

@@ -29,13 +29,13 @@
version: "3.4"
services:
broker:
image: docker.io/library/redis:7
image: redis:6.0
restart: unless-stopped
volumes:
- redisdata:/data
db:
image: docker.io/library/postgres:13
image: postgres:13
restart: unless-stopped
volumes:
- pgdata:/var/lib/postgresql/data
@@ -51,9 +51,9 @@ services:
- db
- broker
ports:
- "8000:8000"
- 8000:8000
healthcheck:
test: ["CMD", "curl", "-fs", "-S", "--max-time", "2", "http://localhost:8000"]
test: ["CMD", "curl", "-f", "http://localhost:8000"]
interval: 30s
timeout: 10s
retries: 5

View File

@@ -1,4 +1,4 @@
# docker-compose file for running paperless from the Docker Hub.
# docker-compose file for running paperless from the docker container registry.
# This file contains everything paperless needs to run.
# Paperless supports amd64, arm and arm64 hardware.
#
@@ -10,10 +10,14 @@
# as this file and mounted to the correct folders inside the container.
# - Paperless listens on port 8000.
#
# SQLite is used as the database. The SQLite file is stored in the data volume.
#
# iwishiwasaneagle/apache-tika-arm docker image is used to enable arm64 arch
# which apache/tika does not currently support.
#
# In addition to that, this docker-compose file adds the following optional
# configurations:
#
# - Instead of SQLite (default), MariaDB is used as the database server.
# - Apache Tika and Gotenberg servers are started with paperless and paperless
# is configured to use these services. These provide support for consuming
# Office documents (Word, Excel, Power Point and their LibreOffice counter-
@@ -33,33 +37,20 @@
version: "3.4"
services:
broker:
image: docker.io/library/redis:7
image: redis:6.0
restart: unless-stopped
volumes:
- redisdata:/data
db:
image: docker.io/library/mariadb:10
restart: unless-stopped
volumes:
- dbdata:/var/lib/mysql
environment:
MARIADB_HOST: paperless
MARIADB_DATABASE: paperless
MARIADB_USER: paperless
MARIADB_PASSWORD: paperless
MARIADB_ROOT_PASSWORD: paperless
webserver:
image: ghcr.io/paperless-ngx/paperless-ngx:latest
restart: unless-stopped
depends_on:
- db
- broker
- gotenberg
- tika
ports:
- "8000:8000"
- 8000:8000
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8000"]
interval: 30s
@@ -73,31 +64,21 @@ services:
env_file: docker-compose.env
environment:
PAPERLESS_REDIS: redis://broker:6379
PAPERLESS_DBENGINE: mariadb
PAPERLESS_DBHOST: db
PAPERLESS_DBUSER: paperless # only needed if non-default username
PAPERLESS_DBPASS: paperless # only needed if non-default password
PAPERLESS_DBPORT: 3306
PAPERLESS_TIKA_ENABLED: 1
PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
PAPERLESS_TIKA_ENDPOINT: http://tika:9998
gotenberg:
image: docker.io/gotenberg/gotenberg:7.8
image: thecodingmachine/gotenberg
restart: unless-stopped
# The gotenberg chromium route is used to convert .eml files. We do not
# want to allow external content like tracking pixels or even javascript.
command:
- "gotenberg"
- "--chromium-disable-javascript=true"
- "--chromium-allow-list=file:///tmp/.*"
environment:
DISABLE_GOOGLE_CHROME: 1
tika:
image: ghcr.io/paperless-ngx/tika:latest
image: iwishiwasaneagle/apache-tika-arm@sha256:a78c25ffe57ecb1a194b2859d42a61af46e9e845191512b8f1a4bf90578ffdfd
restart: unless-stopped
volumes:
data:
media:
dbdata:
redisdata:

View File

@@ -1,6 +1,8 @@
# docker-compose file for running paperless from the docker container registry.
# This file contains everything paperless needs to run.
# Paperless supports amd64, arm and arm64 hardware.
# Paperless supports amd64, arm and arm64 hardware. The apache/tika image
# does not support arm or arm64, however.
#
# All compose files of paperless configure paperless in the following way:
#
# - Paperless is (re)started on system boot, if it was running before shutdown.
@@ -33,7 +35,7 @@
version: "3.4"
services:
broker:
image: docker.io/library/redis:7
image: redis:6.0
restart: unless-stopped
volumes:
- redisdata:/data
@@ -46,9 +48,9 @@ services:
- gotenberg
- tika
ports:
- "8000:8000"
- 8000:8000
healthcheck:
test: ["CMD", "curl", "-fs", "-S", "--max-time", "2", "http://localhost:8000"]
test: ["CMD", "curl", "-f", "http://localhost:8000"]
interval: 30s
timeout: 10s
retries: 5
@@ -65,18 +67,14 @@ services:
PAPERLESS_TIKA_ENDPOINT: http://tika:9998
gotenberg:
image: docker.io/gotenberg/gotenberg:7.8
image: gotenberg/gotenberg:7
restart: unless-stopped
# The gotenberg chromium route is used to convert .eml files. We do not
# want to allow external content like tracking pixels or even javascript.
command:
- "gotenberg"
- "--chromium-disable-javascript=true"
- "--chromium-allow-list=file:///tmp/.*"
- "--chromium-disable-routes=true"
tika:
image: ghcr.io/paperless-ngx/tika:latest
image: apache/tika
restart: unless-stopped
volumes:

View File

@@ -26,7 +26,7 @@
version: "3.4"
services:
broker:
image: docker.io/library/redis:7
image: redis:6.0
restart: unless-stopped
volumes:
- redisdata:/data
@@ -37,9 +37,9 @@ services:
depends_on:
- broker
ports:
- "8000:8000"
- 8000:8000
healthcheck:
test: ["CMD", "curl", "-fs", "-S", "--max-time", "2", "http://localhost:8000"]
test: ["CMD", "curl", "-f", "http://localhost:8000"]
interval: 30s
timeout: 10s
retries: 5

View File

@@ -4,119 +4,44 @@ set -e
# Source: https://github.com/sameersbn/docker-gitlab/
map_uidgid() {
local -r usermap_original_uid=$(id -u paperless)
local -r usermap_original_gid=$(id -g paperless)
local -r usermap_new_uid=${USERMAP_UID:-$usermap_original_uid}
local -r usermap_new_gid=${USERMAP_GID:-${usermap_original_gid:-$usermap_new_uid}}
if [[ ${usermap_new_uid} != "${usermap_original_uid}" || ${usermap_new_gid} != "${usermap_original_gid}" ]]; then
echo "Mapping UID and GID for paperless:paperless to $usermap_new_uid:$usermap_new_gid"
usermod -o -u "${usermap_new_uid}" paperless
groupmod -o -g "${usermap_new_gid}" paperless
fi
}
map_folders() {
# Export these so they can be used in docker-prepare.sh
export DATA_DIR="${PAPERLESS_DATA_DIR:-/usr/src/paperless/data}"
export MEDIA_ROOT_DIR="${PAPERLESS_MEDIA_ROOT:-/usr/src/paperless/media}"
export CONSUME_DIR="${PAPERLESS_CONSUMPTION_DIR:-/usr/src/paperless/consume}"
}
custom_container_init() {
# Mostly borrowed from the LinuxServer.io base image
# https://github.com/linuxserver/docker-baseimage-ubuntu/tree/bionic/root/etc/cont-init.d
local -r custom_script_dir="/custom-cont-init.d"
# Tamper checking.
# Don't run files which are owned by anyone except root
# Don't run files which are writeable by others
if [ -d "${custom_script_dir}" ]; then
if [ -n "$(/usr/bin/find "${custom_script_dir}" -maxdepth 1 ! -user root)" ]; then
echo "**** Potential tampering with custom scripts detected ****"
echo "**** The folder '${custom_script_dir}' must be owned by root ****"
return 0
fi
if [ -n "$(/usr/bin/find "${custom_script_dir}" -maxdepth 1 -perm -o+w)" ]; then
echo "**** The folder '${custom_script_dir}' or some of contents have write permissions for others, which is a security risk. ****"
echo "**** Please review the permissions and their contents to make sure they are owned by root, and can only be modified by root. ****"
return 0
fi
# Make sure custom init directory has files in it
if [ -n "$(/bin/ls -A "${custom_script_dir}" 2>/dev/null)" ]; then
echo "[custom-init] files found in ${custom_script_dir} executing"
# Loop over files in the directory
for SCRIPT in "${custom_script_dir}"/*; do
NAME="$(basename "${SCRIPT}")"
if [ -f "${SCRIPT}" ]; then
echo "[custom-init] ${NAME}: executing..."
/bin/bash "${SCRIPT}"
echo "[custom-init] ${NAME}: exited $?"
elif [ ! -f "${SCRIPT}" ]; then
echo "[custom-init] ${NAME}: is not a file"
fi
done
else
echo "[custom-init] no custom files found exiting..."
fi
USERMAP_ORIG_UID=$(id -u paperless)
USERMAP_ORIG_GID=$(id -g paperless)
USERMAP_NEW_UID=${USERMAP_UID:-$USERMAP_ORIG_UID}
USERMAP_NEW_GID=${USERMAP_GID:-${USERMAP_ORIG_GID:-$USERMAP_NEW_UID}}
if [[ ${USERMAP_NEW_UID} != "${USERMAP_ORIG_UID}" || ${USERMAP_NEW_GID} != "${USERMAP_ORIG_GID}" ]]; then
echo "Mapping UID and GID for paperless:paperless to $USERMAP_NEW_UID:$USERMAP_NEW_GID"
usermod -o -u "${USERMAP_NEW_UID}" paperless
groupmod -o -g "${USERMAP_NEW_GID}" paperless
fi
}
initialize() {
# Setup environment from secrets before anything else
# Check for a version of this var with _FILE appended
# and convert the contents to the env var value
# Source it so export is persistent
# shellcheck disable=SC1091
source /sbin/env-from-file.sh
# Change the user and group IDs if needed
map_uidgid
# Check for overrides of certain folders
map_folders
local -r export_dir="/usr/src/paperless/export"
for dir in \
"${export_dir}" \
"${DATA_DIR}" "${DATA_DIR}/index" \
"${MEDIA_ROOT_DIR}" "${MEDIA_ROOT_DIR}/documents" "${MEDIA_ROOT_DIR}/documents/originals" "${MEDIA_ROOT_DIR}/documents/thumbnails" \
"${CONSUME_DIR}"; do
if [[ ! -d "${dir}" ]]; then
echo "Creating directory ${dir}"
mkdir "${dir}"
for dir in export data data/index media media/documents media/documents/originals media/documents/thumbnails; do
if [[ ! -d "../$dir" ]]; then
echo "Creating directory ../$dir"
mkdir ../$dir
fi
done
local -r tmp_dir="/tmp/paperless"
echo "Creating directory ${tmp_dir}"
mkdir -p "${tmp_dir}"
echo "Creating directory /tmp/paperless"
mkdir -p /tmp/paperless
set +e
echo "Adjusting permissions of paperless files. This may take a while."
chown -R paperless:paperless ${tmp_dir}
for dir in \
"${export_dir}" \
"${DATA_DIR}" \
"${MEDIA_ROOT_DIR}" \
"${CONSUME_DIR}"; do
find "${dir}" -not \( -user paperless -and -group paperless \) -exec chown paperless:paperless {} +
done
chown -R paperless:paperless /tmp/paperless
find .. -not \( -user paperless -and -group paperless \) -exec chown paperless:paperless {} +
set -e
"${gosu_cmd[@]}" /sbin/docker-prepare.sh
# Leave this last thing
custom_container_init
gosu paperless /sbin/docker-prepare.sh
}
install_languages() {
echo "Installing languages..."
read -ra langs <<<"$1"
local langs="$1"
read -ra langs <<<"$langs"
# Check that it is not empty
if [ ${#langs[@]} -eq 0 ]; then
@@ -126,6 +51,10 @@ install_languages() {
for lang in "${langs[@]}"; do
pkg="tesseract-ocr-$lang"
# English is installed by default
#if [[ "$lang" == "eng" ]]; then
# continue
#fi
if dpkg -s "$pkg" &>/dev/null; then
echo "Package $pkg already installed!"
@@ -147,11 +76,6 @@ install_languages() {
echo "Paperless-ngx docker container starting..."
gosu_cmd=(gosu paperless)
if [ "$(id -u)" == "$(id -u paperless)" ]; then
gosu_cmd=()
fi
# Install additional languages if specified
if [[ -n "$PAPERLESS_OCR_LANGUAGES" ]]; then
install_languages "$PAPERLESS_OCR_LANGUAGES"
@@ -161,7 +85,7 @@ initialize
if [[ "$1" != "/"* ]]; then
echo Executing management command "$@"
exec "${gosu_cmd[@]}" python3 manage.py "$@"
exec gosu paperless python3 manage.py "$@"
else
echo Executing "$@"
exec "$@"

View File

@@ -1,44 +1,16 @@
#!/usr/bin/env bash
set -e
wait_for_postgres() {
local attempt_num=1
local -r max_attempts=5
attempt_num=1
max_attempts=5
echo "Waiting for PostgreSQL to start..."
local -r host="${PAPERLESS_DBHOST:-localhost}"
local -r port="${PAPERLESS_DBPORT:-5432}"
host="${PAPERLESS_DBHOST:=localhost}"
port="${PAPERLESS_DBPORT:=5342}"
# Disable warning, host and port can't have spaces
# shellcheck disable=SC2086
while [ ! "$(pg_isready -h ${host} -p ${port})" ]; do
if [ $attempt_num -eq $max_attempts ]; then
echo "Unable to connect to database."
exit 1
else
echo "Attempt $attempt_num failed! Trying again in 5 seconds..."
fi
attempt_num=$(("$attempt_num" + 1))
sleep 5
done
}
wait_for_mariadb() {
echo "Waiting for MariaDB to start..."
local -r host="${PAPERLESS_DBHOST:=localhost}"
local -r port="${PAPERLESS_DBPORT:=3306}"
local attempt_num=1
local -r max_attempts=5
# Disable warning, host and port can't have spaces
# shellcheck disable=SC2086
while ! true > /dev/tcp/$host/$port; do
while [ ! "$(pg_isready -h $host -p $port)" ]; do
if [ $attempt_num -eq $max_attempts ]; then
echo "Unable to connect to database."
@@ -53,14 +25,6 @@ wait_for_mariadb() {
done
}
wait_for_redis() {
# We use a Python script to send the Redis ping
# instead of installing redis-tools just for 1 thing
if ! python3 /sbin/wait-for-redis.py; then
exit 1
fi
}
migrations() {
(
# flock is in place to prevent multiple containers from doing migrations
@@ -68,25 +32,18 @@ migrations() {
# of the current container starts.
flock 200
echo "Apply database migrations..."
python3 manage.py migrate --skip-checks --no-input
) 200>"${DATA_DIR}/migration_lock"
}
django_checks() {
# Explicitly run the Django system checks
echo "Running Django checks"
python3 manage.py check
python3 manage.py migrate
) 200>/usr/src/paperless/data/migration_lock
}
search_index() {
index_version=1
index_version_file=/usr/src/paperless/data/.index_version
local -r index_version=4
local -r index_version_file=${DATA_DIR}/.index_version
if [[ (! -f "${index_version_file}") || $(<"${index_version_file}") != "$index_version" ]]; then
if [[ (! -f "$index_version_file") || $(<$index_version_file) != "$index_version" ]]; then
echo "Search index out of date. Updating..."
python3 manage.py document_index reindex --no-progress-bar
echo ${index_version} | tee "${index_version_file}" >/dev/null
python3 manage.py document_index reindex
echo $index_version | tee $index_version_file >/dev/null
fi
}
@@ -97,18 +54,12 @@ superuser() {
}
do_work() {
if [[ "${PAPERLESS_DBENGINE}" == "mariadb" ]]; then
wait_for_mariadb
elif [[ -n "${PAPERLESS_DBHOST}" ]]; then
if [[ -n "${PAPERLESS_DBHOST}" ]]; then
wait_for_postgres
fi
wait_for_redis
migrations
django_checks
search_index
superuser

View File

@@ -1,42 +0,0 @@
#!/usr/bin/env bash
# Scans the environment variables for those with the suffix _FILE
# When located, checks the file exists, and exports the contents
# of the file as the same name, minus the suffix
# This allows the use of Docker secrets or mounted files
# to fill in any of the settings configurable via environment
# variables
set -eu
for line in $(printenv)
do
# Extract the name of the environment variable
env_name=${line%%=*}
# Check if it starts with "PAPERLESS_" and ends in "_FILE"
if [[ ${env_name} == PAPERLESS_*_FILE ]]; then
# This should have been named different..
if [[ ${env_name} == "PAPERLESS_OCR_SKIP_ARCHIVE_FILE" ]]; then
continue
fi
# Extract the value of the environment
env_value=${line#*=}
# Check the file exists
if [[ -f ${env_value} ]]; then
# Trim off the _FILE suffix
non_file_env_name=${env_name%"_FILE"}
echo "Setting ${non_file_env_name} from file"
# Reads the value from th file
val="$(< "${!env_name}")"
# Sets the normal name to the read file contents
export "${non_file_env_name}"="${val}"
else
echo "File ${env_value} referenced by ${env_name} doesn't exist"
fi
fi
done

View File

@@ -1,12 +0,0 @@
#!/usr/bin/env bash
echo "Checking if we should start flower..."
if [[ -n "${PAPERLESS_ENABLE_FLOWER}" ]]; then
# Small delay to allow celery to be up first
echo "Starting flower in 5s"
sleep 5
celery --app paperless flower --conf=/usr/src/paperless/src/paperless/flowerconfig.py
else
echo "Not starting flower"
fi

View File

@@ -1,19 +1,6 @@
#!/usr/bin/env bash
set -eu
for command in decrypt_documents \
document_archiver \
document_exporter \
document_importer \
mail_fetcher \
document_create_classifier \
document_index \
document_renamer \
document_retagger \
document_thumbnails \
document_sanity_checker \
manage_superuser;
for command in document_archiver document_exporter document_importer mail_fetcher document_create_classifier document_index document_renamer document_retagger document_thumbnails document_sanity_checker manage_superuser;
do
echo "installing $command..."
sed "s/management_command/$command/g" management_script.sh > /usr/local/bin/$command

View File

@@ -3,9 +3,6 @@
set -e
cd /usr/src/paperless/src/
# This ensures environment is setup
# shellcheck disable=SC1091
source /sbin/env-from-file.sh
if [[ $(id -u) == 0 ]] ;
then

View File

@@ -1,15 +0,0 @@
#!/usr/bin/env bash
rootless_args=()
if [ "$(id -u)" == "$(id -u paperless)" ]; then
rootless_args=(
--user
paperless
--logfile
supervisord.log
--pidfile
supervisord.pid
)
fi
exec /usr/local/bin/supervisord -c /etc/supervisord.conf "${rootless_args[@]}"

View File

@@ -10,7 +10,7 @@ user=root
[program:gunicorn]
command=gunicorn -c /usr/src/paperless/gunicorn.conf.py paperless.asgi:application
user=paperless
priority = 1
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
@@ -19,41 +19,16 @@ stderr_logfile_maxbytes=0
[program:consumer]
command=python3 manage.py document_consumer
user=paperless
stopsignal=INT
priority = 20
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0
[program:celery]
command = celery --app paperless worker --loglevel INFO --without-mingle --without-gossip
[program:scheduler]
command=python3 manage.py qcluster
user=paperless
stopasgroup = true
stopwaitsecs = 60
priority = 5
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0
[program:celery-beat]
command = celery --app paperless beat --loglevel INFO
user=paperless
stopasgroup = true
priority = 10
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0
[program:celery-flower]
command = /usr/local/bin/flower-conditional.sh
user = paperless
startsecs = 0
priority = 40
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr

View File

@@ -1,43 +0,0 @@
#!/usr/bin/env python3
"""
Simple script which attempts to ping the Redis broker as set in the environment for
a certain number of times, waiting a little bit in between
"""
import os
import sys
import time
from typing import Final
from redis import Redis
if __name__ == "__main__":
MAX_RETRY_COUNT: Final[int] = 5
RETRY_SLEEP_SECONDS: Final[int] = 5
REDIS_URL: Final[str] = os.getenv("PAPERLESS_REDIS", "redis://localhost:6379")
print("Waiting for Redis...", flush=True)
attempt = 0
with Redis.from_url(url=REDIS_URL) as client:
while attempt < MAX_RETRY_COUNT:
try:
client.ping()
break
except Exception as e:
print(
f"Redis ping #{attempt} failed.\n"
f"Error: {str(e)}.\n"
f"Waiting {RETRY_SLEEP_SECONDS}s",
flush=True,
)
time.sleep(RETRY_SLEEP_SECONDS)
attempt += 1
if attempt >= MAX_RETRY_COUNT:
print("Failed to connect to redis using environment variable PAPERLESS_REDIS.")
sys.exit(os.EX_UNAVAILABLE)
else:
print("Connected to Redis broker.")
sys.exit(os.EX_OK)

17
docs/Dockerfile Normal file
View File

@@ -0,0 +1,17 @@
FROM python:3.5.1
# Install Sphinx and Pygments
RUN pip install Sphinx Pygments
# Setup directories, copy data
RUN mkdir /build
COPY . /build
WORKDIR /build/docs
# Build documentation
RUN make html
# Start webserver
WORKDIR /build/docs/_build/html
EXPOSE 8000/tcp
CMD ["python3", "-m", "http.server"]

177
docs/Makefile Normal file
View File

@@ -0,0 +1,177 @@
# Makefile for Sphinx documentation
#
# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
PAPER =
BUILDDIR = _build
# User-friendly check for sphinx-build
ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1)
$(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/)
endif
# Internal variables.
PAPEROPT_a4 = -D latex_paper_size=a4
PAPEROPT_letter = -D latex_paper_size=letter
ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
# the i18n builder cannot share the environment and doctrees with the others
I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext
help:
@echo "Please use \`make <target>' where <target> is one of"
@echo " html to make standalone HTML files"
@echo " dirhtml to make HTML files named index.html in directories"
@echo " singlehtml to make a single large HTML file"
@echo " pickle to make pickle files"
@echo " json to make JSON files"
@echo " htmlhelp to make HTML files and a HTML help project"
@echo " qthelp to make HTML files and a qthelp project"
@echo " devhelp to make HTML files and a Devhelp project"
@echo " epub to make an epub"
@echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
@echo " latexpdf to make LaTeX files and run them through pdflatex"
@echo " latexpdfja to make LaTeX files and run them through platex/dvipdfmx"
@echo " text to make text files"
@echo " man to make manual pages"
@echo " texinfo to make Texinfo files"
@echo " info to make Texinfo files and run them through makeinfo"
@echo " gettext to make PO message catalogs"
@echo " changes to make an overview of all changed/added/deprecated items"
@echo " xml to make Docutils-native XML files"
@echo " pseudoxml to make pseudoxml-XML files for display purposes"
@echo " linkcheck to check all external links for integrity"
@echo " doctest to run all doctests embedded in the documentation (if enabled)"
clean:
rm -rf $(BUILDDIR)/*
html:
$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
@echo
@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
dirhtml:
$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
@echo
@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."
singlehtml:
$(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
@echo
@echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."
pickle:
$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
@echo
@echo "Build finished; now you can process the pickle files."
json:
$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
@echo
@echo "Build finished; now you can process the JSON files."
htmlhelp:
$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
@echo
@echo "Build finished; now you can run HTML Help Workshop with the" \
".hhp project file in $(BUILDDIR)/htmlhelp."
qthelp:
$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
@echo
@echo "Build finished; now you can run "qcollectiongenerator" with the" \
".qhcp project file in $(BUILDDIR)/qthelp, like this:"
@echo "# qcollectiongenerator $(BUILDDIR)/qthelp/RIPEAtlasToolsMagellan.qhcp"
@echo "To view the help file:"
@echo "# assistant -collectionFile $(BUILDDIR)/qthelp/RIPEAtlasToolsMagellan.qhc"
devhelp:
$(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
@echo
@echo "Build finished."
@echo "To view the help file:"
@echo "# mkdir -p $$HOME/.local/share/devhelp/RIPEAtlasToolsMagellan"
@echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/RIPEAtlasToolsMagellan"
@echo "# devhelp"
epub:
$(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
@echo
@echo "Build finished. The epub file is in $(BUILDDIR)/epub."
latex:
$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
@echo
@echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
@echo "Run \`make' in that directory to run these through (pdf)latex" \
"(use \`make latexpdf' here to do that automatically)."
latexpdf:
$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
@echo "Running LaTeX files through pdflatex..."
$(MAKE) -C $(BUILDDIR)/latex all-pdf
@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
latexpdfja:
$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
@echo "Running LaTeX files through platex and dvipdfmx..."
$(MAKE) -C $(BUILDDIR)/latex all-pdf-ja
@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
text:
$(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
@echo
@echo "Build finished. The text files are in $(BUILDDIR)/text."
man:
$(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
@echo
@echo "Build finished. The manual pages are in $(BUILDDIR)/man."
texinfo:
$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
@echo
@echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo."
@echo "Run \`make' in that directory to run these through makeinfo" \
"(use \`make info' here to do that automatically)."
info:
$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
@echo "Running Texinfo files through makeinfo..."
make -C $(BUILDDIR)/texinfo info
@echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo."
gettext:
$(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
@echo
@echo "Build finished. The message catalogs are in $(BUILDDIR)/locale."
changes:
$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
@echo
@echo "The overview file is in $(BUILDDIR)/changes."
linkcheck:
$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
@echo
@echo "Link check complete; look for any errors in the above output " \
"or in $(BUILDDIR)/linkcheck/output.txt."
doctest:
$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
@echo "Testing of doctests in the sources finished, look at the " \
"results in $(BUILDDIR)/doctest/output.txt."
xml:
$(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml
@echo
@echo "Build finished. The XML files are in $(BUILDDIR)/xml."
pseudoxml:
$(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml
@echo
@echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml."

14
docs/_static/custom.css vendored Normal file
View File

@@ -0,0 +1,14 @@
/* override table width restrictions */
@media screen and (min-width: 767px) {
.wy-table-responsive table td {
/* !important prevents the common CSS stylesheets from
overriding this as on RTD they are loaded after this stylesheet */
white-space: normal !important;
}
.wy-table-responsive {
overflow: visible !important;
}
}

View File

Before

Width:  |  Height:  |  Size: 67 KiB

After

Width:  |  Height:  |  Size: 67 KiB

BIN
docs/_static/screenshot.png vendored Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 445 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 106 KiB

BIN
docs/_static/screenshots/dashboard.png vendored Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 167 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 28 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 306 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 410 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 137 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 680 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 685 KiB

After

Width:  |  Height:  |  Size: 686 KiB

BIN
docs/_static/screenshots/editing.png vendored Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 293 KiB

BIN
docs/_static/screenshots/logs.png vendored Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 260 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 96 KiB

BIN
docs/_static/screenshots/mobile.png vendored Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 158 KiB

BIN
docs/_static/screenshots/new-tag.png vendored Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 32 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 61 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 261 KiB

View File

@@ -1,551 +0,0 @@
# Administration
## Making backups {#backup}
Multiple options exist for making backups of your paperless instance,
depending on how you installed paperless.
Before making backups, make sure that paperless is not running.
Options available to any installation of paperless:
- Use the [document exporter](#exporter). The document exporter exports all your documents,
thumbnails and metadata to a specific folder. You may import your
documents into a fresh instance of paperless again or store your
documents in another DMS with this export.
- The document exporter is also able to update an already existing
export. Therefore, incremental backups with `rsync` are entirely
possible.
!!! caution
You cannot import the export generated with one version of paperless in
a different version of paperless. The export contains an exact image of
the database, and migrations may change the database layout.
Options available to docker installations:
- Backup the docker volumes. These usually reside within
`/var/lib/docker/volumes` on the host and you need to be root in
order to access them.
Paperless uses 4 volumes:
- `paperless_media`: This is where your documents are stored.
- `paperless_data`: This is where auxillary data is stored. This
folder also contains the SQLite database, if you use it.
- `paperless_pgdata`: Exists only if you use PostgreSQL and
contains the database.
- `paperless_dbdata`: Exists only if you use MariaDB and contains
the database.
Options available to bare-metal and non-docker installations:
- Backup the entire paperless folder. This ensures that if your
paperless instance crashes at some point or your disk fails, you can
simply copy the folder back into place and it works.
When using PostgreSQL or MariaDB, you'll also have to backup the
database.
### Restoring {#migrating-restoring}
## Updating Paperless {#updating}
### Docker Route {#docker-updating}
If a new release of paperless-ngx is available, upgrading depends on how
you installed paperless-ngx in the first place. The releases are
available at the [release
page](https://github.com/paperless-ngx/paperless-ngx/releases).
First of all, ensure that paperless is stopped.
```shell-session
$ cd /path/to/paperless
$ docker-compose down
```
After that, [make a backup](#backup).
1. If you pull the image from the docker hub, all you need to do is:
```shell-session
$ docker-compose pull
$ docker-compose up
```
The docker-compose files refer to the `latest` version, which is
always the latest stable release.
2. If you built the image yourself, do the following:
```shell-session
$ git pull
$ docker-compose build
$ docker-compose up
```
Running `docker-compose up` will also apply any new database migrations.
If you see everything working, press CTRL+C once to gracefully stop
paperless. Then you can start paperless-ngx with `-d` to have it run in
the background.
!!! note
In version 0.9.14, the update process was changed. In 0.9.13 and
earlier, the docker-compose files specified exact versions and pull
won't automatically update to newer versions. In order to enable
updates as described above, either get the new `docker-compose.yml`
file from
[here](https://github.com/paperless-ngx/paperless-ngx/tree/main/docker/compose)
or edit the `docker-compose.yml` file, find the line that says
```
image: ghcr.io/paperless-ngx/paperless-ngx:0.9.x
```
and replace the version with `latest`:
```
image: ghcr.io/paperless-ngx/paperless-ngx:latest
```
!!! note
In version 1.7.1 and onwards, the Docker image can now be pinned to a
release series. This is often combined with automatic updaters such as
Watchtower to allow safer unattended upgrading to new bugfix releases
only. It is still recommended to always review release notes before
upgrading. To pin your install to a release series, edit the
`docker-compose.yml` find the line that says
```
image: ghcr.io/paperless-ngx/paperless-ngx:latest
```
and replace the version with the series you want to track, for
example:
```
image: ghcr.io/paperless-ngx/paperless-ngx:1.7
```
### Bare Metal Route {#bare-metal-updating}
After grabbing the new release and unpacking the contents, do the
following:
1. Update dependencies. New paperless version may require additional
dependencies. The dependencies required are listed in the section
about
[bare metal installations](/setup#bare_metal).
2. Update python requirements. Keep in mind to activate your virtual
environment before that, if you use one.
```shell-session
$ pip install -r requirements.txt
```
3. Migrate the database.
```shell-session
$ cd src
$ python3 manage.py migrate # (1)
```
1. Including `sudo -Hu <paperless_user>` may be required
This might not actually do anything. Not every new paperless version
comes with new database migrations.
## Downgrading Paperless {#downgrade-paperless}
Downgrades are possible. However, some updates also contain database
migrations (these change the layout of the database and may move data).
In order to move back from a version that applied database migrations,
you'll have to revert the database migration _before_ downgrading, and
then downgrade paperless.
This table lists the compatible versions for each database migration
number.
| Migration number | Version range |
| ---------------- | --------------- |
| 1011 | 1.0.0 |
| 1012 | 1.1.0 - 1.2.1 |
| 1014 | 1.3.0 - 1.3.1 |
| 1016 | 1.3.2 - current |
Execute the following management command to migrate your database:
```shell-session
$ python3 manage.py migrate documents <migration number>
```
!!! note
Some migrations cannot be undone. The command will issue errors if that
happens.
## Management utilities {#management-commands}
Paperless comes with some management commands that perform various
maintenance tasks on your paperless instance. You can invoke these
commands in the following way:
With docker-compose, while paperless is running:
```shell-session
$ cd /path/to/paperless
$ docker-compose exec webserver <command> <arguments>
```
With docker, while paperless is running:
```shell-session
$ docker exec -it <container-name> <command> <arguments>
```
Bare metal:
```shell-session
$ cd /path/to/paperless/src
$ python3 manage.py <command> <arguments> # (1)
```
1. Including `sudo -Hu <paperless_user>` may be required
All commands have built-in help, which can be accessed by executing them
with the argument `--help`.
### Document exporter {#exporter}
The document exporter exports all your data from paperless into a folder
for backup or migration to another DMS.
If you use the document exporter within a cronjob to backup your data
you might use the `-T` flag behind exec to suppress "The input device
is not a TTY" errors. For example:
`docker-compose exec -T webserver document_exporter ../export`
```
document_exporter target [-c] [-d] [-f] [-na] [-nt] [-p] [-sm] [-z]
optional arguments:
-c, --compare-checksums
-d, --delete
-f, --use-filename-format
-na, --no-archive
-nt, --no-thumbnail
-p, --use-folder-prefix
-sm, --split-manifest
-z --zip
```
`target` is a folder to which the data gets written. This includes
documents, thumbnails and a `manifest.json` file. The manifest contains
all metadata from the database (correspondents, tags, etc).
When you use the provided docker compose script, specify `../export` as
the target. This path inside the container is automatically mounted on
your host on the folder `export`.
If the target directory already exists and contains files, paperless
will assume that the contents of the export directory are a previous
export and will attempt to update the previous export. Paperless will
only export changed and added files. Paperless determines whether a file
has changed by inspecting the file attributes "date/time modified" and
"size". If that does not work out for you, specify `-c` or
`--compare-checksums` and paperless will attempt to compare file
checksums instead. This is slower.
Paperless will not remove any existing files in the export directory. If
you want paperless to also remove files that do not belong to the
current export such as files from deleted documents, specify `-d` or `--delete`.
Be careful when pointing paperless to a directory that already contains
other files.
The filenames generated by this command follow the format
`[date created] [correspondent] [title].[extension]`. If you want
paperless to use `PAPERLESS_FILENAME_FORMAT` for exported filenames
instead, specify `-f` or `--use-filename-format`.
If `-na` or `--no-archive` is provided, no archive files will be exported,
only the original files.
If `-nt` or `--no-thumbnail` is provided, thumbnail files will not be exported.
!!! note
When using the `-na`/`--no-archive` or `-nt`/`--no-thumbnail` options
the exporter will not output these files for backup. After importing,
the [sanity checker](#sanity-checker) will warn about missing thumbnails and archive files
until they are regenerated with `document_thumbnails` or [`document_archiver`](#archiver).
It can make sense to omit these files from backup as their content and checksum
can change (new archiver algorithm) and may then cause additional used space in
a deduplicated backup.
If `-p` or `--use-folder-prefix` is provided, files will be exported
in dedicated folders according to their nature: `archive`, `originals`,
`thumbnails` or `json`
If `-sm` or `--split-manifest` is provided, information about document
will be placed in individual json files, instead of a single JSON file. The main
manifest.json will still contain application wide information (e.g. tags, correspondent,
documenttype, etc)
If `-z` or `--zip` is provided, the export will be a zipfile
in the target directory, named according to the current date.
!!! warning
If exporting with the file name format, there may be errors due to
your operating system's maximum path lengths. Try adjusting the export
target or consider not using the filename format.
### Document importer {#importer}
The document importer takes the export produced by the [Document
exporter](#exporter) and imports it into paperless.
The importer works just like the exporter. You point it at a directory,
and the script does the rest of the work:
```
document_importer source
```
When you use the provided docker compose script, put the export inside
the `export` folder in your paperless source directory. Specify
`../export` as the `source`.
!!! note
Importing from a previous version of Paperless may work, but for best
results it is suggested to match the versions.
### Document retagger {#retagger}
Say you've imported a few hundred documents and now want to introduce a
tag or set up a new correspondent, and apply its matching to all of the
currently-imported docs. This problem is common enough that there are
tools for it.
```
document_retagger [-h] [-c] [-T] [-t] [-i] [--use-first] [-f]
optional arguments:
-c, --correspondent
-T, --tags
-t, --document_type
-s, --storage_path
-i, --inbox-only
--use-first
-f, --overwrite
```
Run this after changing or adding matching rules. It'll loop over all
of the documents in your database and attempt to match documents
according to the new rules.
Specify any combination of `-c`, `-T`, `-t` and `-s` to have the
retagger perform matching of the specified metadata type. If you don't
specify any of these options, the document retagger won't do anything.
Specify `-i` to have the document retagger work on documents tagged with
inbox tags only. This is useful when you don't want to mess with your
already processed documents.
When multiple document types or correspondents match a single document,
the retagger won't assign these to the document. Specify `--use-first`
to override this behavior and just use the first correspondent or type
it finds. This option does not apply to tags, since any amount of tags
can be applied to a document.
Finally, `-f` specifies that you wish to overwrite already assigned
correspondents, types and/or tags. The default behavior is to not assign
correspondents and types to documents that have this data already
assigned. `-f` works differently for tags: By default, only additional
tags get added to documents, no tags will be removed. With `-f`, tags
that don't match a document anymore get removed as well.
### Managing the Automatic matching algorithm
The _Auto_ matching algorithm requires a trained neural network to work.
This network needs to be updated whenever somethings in your data
changes. The docker image takes care of that automatically with the task
scheduler. You can manually renew the classifier by invoking the
following management command:
```
document_create_classifier
```
This command takes no arguments.
### Document thumbnails {#thumbnails}
Use this command to re-create document thumbnails. Optionally include the ` --document {id}` option to generate thumbnails for a specific document only.
```
document_thumbnails
```
### Managing the document search index {#index}
The document search index is responsible for delivering search results
for the website. The document index is automatically updated whenever
documents get added to, changed, or removed from paperless. However, if
the search yields non-existing documents or won't find anything, you
may need to recreate the index manually.
```
document_index {reindex,optimize}
```
Specify `reindex` to have the index created from scratch. This may take
some time.
Specify `optimize` to optimize the index. This updates certain aspects
of the index and usually makes queries faster and also ensures that the
autocompletion works properly. This command is regularly invoked by the
task scheduler.
### Managing filenames {#renamer}
If you use paperless' feature to
[assign custom filenames to your documents](/advanced_usage#file-name-handling), you can use this command to move all your files after
changing the naming scheme.
!!! warning
Since this command moves your documents, it is advised to do a backup
beforehand. The renaming logic is robust and will never overwrite or
delete a file, but you can't ever be careful enough.
```
document_renamer
```
The command takes no arguments and processes all your documents at once.
Learn how to use
[Management Utilities](#management-commands).
### Sanity checker {#sanity-checker}
Paperless has a built-in sanity checker that inspects your document
collection for issues.
The issues detected by the sanity checker are as follows:
- Missing original files.
- Missing archive files.
- Inaccessible original files due to improper permissions.
- Inaccessible archive files due to improper permissions.
- Corrupted original documents by comparing their checksum against
what is stored in the database.
- Corrupted archive documents by comparing their checksum against what
is stored in the database.
- Missing thumbnails.
- Inaccessible thumbnails due to improper permissions.
- Documents without any content (warning).
- Orphaned files in the media directory (warning). These are files
that are not referenced by any document im paperless.
```
document_sanity_checker
```
The command takes no arguments. Depending on the size of your document
archive, this may take some time.
### Fetching e-mail
Paperless automatically fetches your e-mail every 10 minutes by default.
If you want to invoke the email consumer manually, call the following
management command:
```
mail_fetcher
```
The command takes no arguments and processes all your mail accounts and
rules.
!!! tip
To use OAuth access tokens for mail fetching,
select the box to indicate the password is actually
a token when creating or editing a mail account. The
details for creating a token depend on your email
provider.
### Creating archived documents {#archiver}
Paperless stores archived PDF/A documents alongside your original
documents. These archived documents will also contain selectable text
for image-only originals. These documents are derived from the
originals, which are always stored unmodified. If coming from an earlier
version of paperless, your documents won't have archived versions.
This command creates PDF/A documents for your documents.
```
document_archiver --overwrite --document <id>
```
This command will only attempt to create archived documents when no
archived document exists yet, unless `--overwrite` is specified. If
`--document <id>` is specified, the archiver will only process that
document.
!!! note
This command essentially performs OCR on all your documents again,
according to your settings. If you run this with
`PAPERLESS_OCR_MODE=redo`, it will potentially run for a very long time.
You can cancel the command at any time, since this command will skip
already archived versions the next time it is run.
!!! note
Some documents will cause errors and cannot be converted into PDF/A
documents, such as encrypted PDF documents. The archiver will skip over
these documents each time it sees them.
### Managing encryption {#encryption}
Documents can be stored in Paperless using GnuPG encryption.
!!! warning
Encryption is deprecated since [paperless-ng 0.9](/changelog#paperless-ng-090) and doesn't really
provide any additional security, since you have to store the passphrase
in a configuration file on the same system as the encrypted documents
for paperless to work. Furthermore, the entire text content of the
documents is stored plain in the database, even if your documents are
encrypted. Filenames are not encrypted as well.
Also, the web server provides transparent access to your encrypted
documents.
Consider running paperless on an encrypted filesystem instead, which
will then at least provide security against physical hardware theft.
#### Enabling encryption
Enabling encryption is no longer supported.
#### Disabling encryption
Basic usage to disable encryption of your document store:
(Note: If `PAPERLESS_PASSPHRASE` isn't set already, you need to specify
it here)
```
decrypt_documents [--passphrase SECR3TP4SSPHRA$E]
```

499
docs/administration.rst Normal file
View File

@@ -0,0 +1,499 @@
**************
Administration
**************
.. _administration-backup:
Making backups
##############
Multiple options exist for making backups of your paperless instance,
depending on how you installed paperless.
Before making backups, make sure that paperless is not running.
Options available to any installation of paperless:
* Use the :ref:`document exporter <utilities-exporter>`.
The document exporter exports all your documents, thumbnails and
metadata to a specific folder. You may import your documents into a
fresh instance of paperless again or store your documents in another
DMS with this export.
* The document exporter is also able to update an already existing export.
Therefore, incremental backups with ``rsync`` are entirely possible.
.. caution::
You cannot import the export generated with one version of paperless in a
different version of paperless. The export contains an exact image of the
database, and migrations may change the database layout.
Options available to docker installations:
* Backup the docker volumes. These usually reside within
``/var/lib/docker/volumes`` on the host and you need to be root in order
to access them.
Paperless uses 3 volumes:
* ``paperless_media``: This is where your documents are stored.
* ``paperless_data``: This is where auxillary data is stored. This
folder also contains the SQLite database, if you use it.
* ``paperless_pgdata``: Exists only if you use PostgreSQL and contains
the database.
Options available to bare-metal and non-docker installations:
* Backup the entire paperless folder. This ensures that if your paperless instance
crashes at some point or your disk fails, you can simply copy the folder back
into place and it works.
When using PostgreSQL, you'll also have to backup the database.
.. _migrating-restoring:
Restoring
=========
.. _administration-updating:
Updating Paperless
##################
Docker Route
============
If a new release of paperless-ngx is available, upgrading depends on how you
installed paperless-ngx in the first place. The releases are available at the
`release page <https://github.com/paperless-ngx/paperless-ngx/releases>`_.
First of all, ensure that paperless is stopped.
.. code:: shell-session
$ cd /path/to/paperless
$ docker-compose down
After that, :ref:`make a backup <administration-backup>`.
A. If you pull the image from the docker hub, all you need to do is:
.. code:: shell-session
$ docker-compose pull
$ docker-compose up
The docker-compose files refer to the ``latest`` version, which is always the latest
stable release.
B. If you built the image yourself, do the following:
.. code:: shell-session
$ git pull
$ docker-compose build
$ docker-compose up
Running ``docker-compose up`` will also apply any new database migrations.
If you see everything working, press CTRL+C once to gracefully stop paperless.
Then you can start paperless-ngx with ``-d`` to have it run in the background.
.. note::
In version 0.9.14, the update process was changed. In 0.9.13 and earlier, the
docker-compose files specified exact versions and pull won't automatically
update to newer versions. In order to enable updates as described above, either
get the new ``docker-compose.yml`` file from `here <https://github.com/paperless-ngx/paperless-ngx/tree/master/docker/compose>`_
or edit the ``docker-compose.yml`` file, find the line that says
.. code::
image: ghcr.io/paperless-ngx/paperless-ngx:0.9.x
and replace the version with ``latest``:
.. code::
image: ghcr.io/paperless-ngx/paperless-ngx:latest
Bare Metal Route
================
After grabbing the new release and unpacking the contents, do the following:
1. Update dependencies. New paperless version may require additional
dependencies. The dependencies required are listed in the section about
:ref:`bare metal installations <setup-bare_metal>`.
2. Update python requirements. Keep in mind to activate your virtual environment
before that, if you use one.
.. code:: shell-session
$ pip install -r requirements.txt
3. Migrate the database.
.. code:: shell-session
$ cd src
$ python3 manage.py migrate
This might not actually do anything. Not every new paperless version comes with new
database migrations.
Downgrading Paperless
#####################
Downgrades are possible. However, some updates also contain database migrations (these change the layout of the database and may move data).
In order to move back from a version that applied database migrations, you'll have to revert the database migration *before* downgrading,
and then downgrade paperless.
This table lists the compatible versions for each database migration number.
+------------------+-----------------+
| Migration number | Version range |
+------------------+-----------------+
| 1011 | 1.0.0 |
+------------------+-----------------+
| 1012 | 1.1.0 - 1.2.1 |
+------------------+-----------------+
| 1014 | 1.3.0 - 1.3.1 |
+------------------+-----------------+
| 1016 | 1.3.2 - current |
+------------------+-----------------+
Execute the following management command to migrate your database:
.. code:: shell-session
$ python3 manage.py migrate documents <migration number>
.. note::
Some migrations cannot be undone. The command will issue errors if that happens.
.. _utilities-management-commands:
Management utilities
####################
Paperless comes with some management commands that perform various maintenance
tasks on your paperless instance. You can invoke these commands in the following way:
With docker-compose, while paperless is running:
.. code:: shell-session
$ cd /path/to/paperless
$ docker-compose exec webserver <command> <arguments>
With docker, while paperless is running:
.. code:: shell-session
$ docker exec -it <container-name> <command> <arguments>
Bare metal:
.. code:: shell-session
$ cd /path/to/paperless/src
$ python3 manage.py <command> <arguments>
All commands have built-in help, which can be accessed by executing them with
the argument ``--help``.
.. _utilities-exporter:
Document exporter
=================
The document exporter exports all your data from paperless into a folder for
backup or migration to another DMS.
If you use the document exporter within a cronjob to backup your data you might use the ``-T`` flag behind exec to suppress "The input device is not a TTY" errors. For example: ``docker-compose exec -T webserver document_exporter ../export``
.. code::
document_exporter target [-c] [-f] [-d]
optional arguments:
-c, --compare-checksums
-f, --use-filename-format
-d, --delete
``target`` is a folder to which the data gets written. This includes documents,
thumbnails and a ``manifest.json`` file. The manifest contains all metadata from
the database (correspondents, tags, etc).
When you use the provided docker compose script, specify ``../export`` as the
target. This path inside the container is automatically mounted on your host on
the folder ``export``.
If the target directory already exists and contains files, paperless will assume
that the contents of the export directory are a previous export and will attempt
to update the previous export. Paperless will only export changed and added files.
Paperless determines whether a file has changed by inspecting the file attributes
"date/time modified" and "size". If that does not work out for you, specify
``--compare-checksums`` and paperless will attempt to compare file checksums instead.
This is slower.
Paperless will not remove any existing files in the export directory. If you want
paperless to also remove files that do not belong to the current export such as files
from deleted documents, specify ``--delete``. Be careful when pointing paperless to
a directory that already contains other files.
The filenames generated by this command follow the format
``[date created] [correspondent] [title].[extension]``.
If you want paperless to use ``PAPERLESS_FILENAME_FORMAT`` for exported filenames
instead, specify ``--use-filename-format``.
.. _utilities-importer:
Document importer
=================
The document importer takes the export produced by the `Document exporter`_ and
imports it into paperless.
The importer works just like the exporter. You point it at a directory, and
the script does the rest of the work:
.. code::
document_importer source
When you use the provided docker compose script, put the export inside the
``export`` folder in your paperless source directory. Specify ``../export``
as the ``source``.
.. _utilities-retagger:
Document retagger
=================
Say you've imported a few hundred documents and now want to introduce
a tag or set up a new correspondent, and apply its matching to all of
the currently-imported docs. This problem is common enough that
there are tools for it.
.. code::
document_retagger [-h] [-c] [-T] [-t] [-i] [--use-first] [-f]
optional arguments:
-c, --correspondent
-T, --tags
-t, --document_type
-i, --inbox-only
--use-first
-f, --overwrite
Run this after changing or adding matching rules. It'll loop over all
of the documents in your database and attempt to match documents
according to the new rules.
Specify any combination of ``-c``, ``-T`` and ``-t`` to have the
retagger perform matching of the specified metadata type. If you don't
specify any of these options, the document retagger won't do anything.
Specify ``-i`` to have the document retagger work on documents tagged
with inbox tags only. This is useful when you don't want to mess with
your already processed documents.
When multiple document types or correspondents match a single document,
the retagger won't assign these to the document. Specify ``--use-first``
to override this behavior and just use the first correspondent or type
it finds. This option does not apply to tags, since any amount of tags
can be applied to a document.
Finally, ``-f`` specifies that you wish to overwrite already assigned
correspondents, types and/or tags. The default behavior is to not
assign correspondents and types to documents that have this data already
assigned. ``-f`` works differently for tags: By default, only additional tags get
added to documents, no tags will be removed. With ``-f``, tags that don't
match a document anymore get removed as well.
Managing the Automatic matching algorithm
=========================================
The *Auto* matching algorithm requires a trained neural network to work.
This network needs to be updated whenever somethings in your data
changes. The docker image takes care of that automatically with the task
scheduler. You can manually renew the classifier by invoking the following
management command:
.. code::
document_create_classifier
This command takes no arguments.
.. _`administration-index`:
Managing the document search index
==================================
The document search index is responsible for delivering search results for the
website. The document index is automatically updated whenever documents get
added to, changed, or removed from paperless. However, if the search yields
non-existing documents or won't find anything, you may need to recreate the
index manually.
.. code::
document_index {reindex,optimize}
Specify ``reindex`` to have the index created from scratch. This may take some
time.
Specify ``optimize`` to optimize the index. This updates certain aspects of
the index and usually makes queries faster and also ensures that the
autocompletion works properly. This command is regularly invoked by the task
scheduler.
.. _utilities-renamer:
Managing filenames
==================
If you use paperless' feature to
:ref:`assign custom filenames to your documents <advanced-file_name_handling>`,
you can use this command to move all your files after changing
the naming scheme.
.. warning::
Since this command moves you documents around alot, it is advised to to
a backup before. The renaming logic is robust and will never overwrite
or delete a file, but you can't ever be careful enough.
.. code::
document_renamer
The command takes no arguments and processes all your documents at once.
Learn how to use :ref:`Management Utilities<utilities-management-commands>`.
.. _utilities-sanity-checker:
Sanity checker
==============
Paperless has a built-in sanity checker that inspects your document collection for issues.
The issues detected by the sanity checker are as follows:
* Missing original files.
* Missing archive files.
* Inaccessible original files due to improper permissions.
* Inaccessible archive files due to improper permissions.
* Corrupted original documents by comparing their checksum against what is stored in the database.
* Corrupted archive documents by comparing their checksum against what is stored in the database.
* Missing thumbnails.
* Inaccessible thumbnails due to improper permissions.
* Documents without any content (warning).
* Orphaned files in the media directory (warning). These are files that are not referenced by any document im paperless.
.. code::
document_sanity_checker
The command takes no arguments. Depending on the size of your document archive, this may take some time.
Fetching e-mail
===============
Paperless automatically fetches your e-mail every 10 minutes by default. If
you want to invoke the email consumer manually, call the following management
command:
.. code::
mail_fetcher
The command takes no arguments and processes all your mail accounts and rules.
.. _utilities-archiver:
Creating archived documents
===========================
Paperless stores archived PDF/A documents alongside your original documents.
These archived documents will also contain selectable text for image-only
originals.
These documents are derived from the originals, which are always stored
unmodified. If coming from an earlier version of paperless, your documents
won't have archived versions.
This command creates PDF/A documents for your documents.
.. code::
document_archiver --overwrite --document <id>
This command will only attempt to create archived documents when no archived
document exists yet, unless ``--overwrite`` is specified. If ``--document <id>``
is specified, the archiver will only process that document.
.. note::
This command essentially performs OCR on all your documents again,
according to your settings. If you run this with ``PAPERLESS_OCR_MODE=redo``,
it will potentially run for a very long time. You can cancel the command
at any time, since this command will skip already archived versions the next time
it is run.
.. note::
Some documents will cause errors and cannot be converted into PDF/A documents,
such as encrypted PDF documents. The archiver will skip over these documents
each time it sees them.
.. _utilities-encyption:
Managing encryption
===================
Documents can be stored in Paperless using GnuPG encryption.
.. danger::
Encryption is deprecated since paperless-ngx 0.9 and doesn't really provide any
additional security, since you have to store the passphrase in a configuration
file on the same system as the encrypted documents for paperless to work.
Furthermore, the entire text content of the documents is stored plain in the
database, even if your documents are encrypted. Filenames are not encrypted as
well.
Also, the web server provides transparent access to your encrypted documents.
Consider running paperless on an encrypted filesystem instead, which will then
at least provide security against physical hardware theft.
Enabling encryption
-------------------
Enabling encryption is no longer supported.
Disabling encryption
--------------------
Basic usage to disable encryption of your document store:
(Note: If ``PAPERLESS_PASSPHRASE`` isn't set already, you need to specify it here)
.. code::
decrypt_documents [--passphrase SECR3TP4SSPHRA$E]

View File

@@ -1,545 +0,0 @@
# Advanced Topics
Paperless offers a couple features that automate certain tasks and make
your life easier.
## Matching tags, correspondents, document types, and storage paths {#matching}
Paperless will compare the matching algorithms defined by every tag,
correspondent, document type, and storage path in your database to see
if they apply to the text in a document. In other words, if you define a
tag called `Home Utility` that had a `match` property of `bc hydro` and
a `matching_algorithm` of `Exact`, Paperless will automatically tag
your newly-consumed document with your `Home Utility` tag so long as the
text `bc hydro` appears in the body of the document somewhere.
The matching logic is quite powerful. It supports searching the text of
your document with different algorithms, and as such, some
experimentation may be necessary to get things right.
In order to have a tag, correspondent, document type, or storage path
assigned automatically to newly consumed documents, assign a match and
matching algorithm using the web interface. These settings define when
to assign tags, correspondents, document types, and storage paths to
documents.
The following algorithms are available:
- **None:** No matching will be performed.
- **Any:** Looks for any occurrence of any word provided in match in
the PDF. If you define the match as `Bank1 Bank2`, it will match
documents containing either of these terms.
- **All:** Requires that every word provided appears in the PDF,
albeit not in the order provided.
- **Exact:** Matches only if the match appears exactly as provided
(i.e. preserve ordering) in the PDF.
- **Regular expression:** Parses the match as a regular expression and
tries to find a match within the document.
- **Fuzzy match:** I don't know. Look at the source.
- **Auto:** Tries to automatically match new documents. This does not
require you to set a match. See the notes below.
When using the _any_ or _all_ matching algorithms, you can search for
terms that consist of multiple words by enclosing them in double quotes.
For example, defining a match text of `"Bank of America" BofA` using the
_any_ algorithm, will match documents that contain either "Bank of
America" or "BofA", but will not match documents containing "Bank of
South America".
Then just save your tag, correspondent, document type, or storage path
and run another document through the consumer. Once complete, you should
see the newly-created document, automatically tagged with the
appropriate data.
### Automatic matching {#automatic-matching}
Paperless-ngx comes with a new matching algorithm called _Auto_. This
matching algorithm tries to assign tags, correspondents, document types,
and storage paths to your documents based on how you have already
assigned these on existing documents. It uses a neural network under the
hood.
If, for example, all your bank statements of your account 123 at the
Bank of America are tagged with the tag "bofa123" and the matching
algorithm of this tag is set to _Auto_, this neural network will examine
your documents and automatically learn when to assign this tag.
Paperless tries to hide much of the involved complexity with this
approach. However, there are a couple caveats you need to keep in mind
when using this feature:
- Changes to your documents are not immediately reflected by the
matching algorithm. The neural network needs to be _trained_ on your
documents after changes. Paperless periodically (default: once each
hour) checks for changes and does this automatically for you.
- The Auto matching algorithm only takes documents into account which
are NOT placed in your inbox (i.e. have any inbox tags assigned to
them). This ensures that the neural network only learns from
documents which you have correctly tagged before.
- The matching algorithm can only work if there is a correlation
between the tag, correspondent, document type, or storage path and
the document itself. Your bank statements usually contain your bank
account number and the name of the bank, so this works reasonably
well, However, tags such as "TODO" cannot be automatically
assigned.
- The matching algorithm needs a reasonable number of documents to
identify when to assign tags, correspondents, storage paths, and
types. If one out of a thousand documents has the correspondent
"Very obscure web shop I bought something five years ago", it will
probably not assign this correspondent automatically if you buy
something from them again. The more documents, the better.
- Paperless also needs a reasonable amount of negative examples to
decide when not to assign a certain tag, correspondent, document
type, or storage path. This will usually be the case as you start
filling up paperless with documents. Example: If all your documents
are either from "Webshop" and "Bank", paperless will assign one
of these correspondents to ANY new document, if both are set to
automatic matching.
## Hooking into the consumption process {#consume-hooks}
Sometimes you may want to do something arbitrary whenever a document is
consumed. Rather than try to predict what you may want to do, Paperless
lets you execute scripts of your own choosing just before or after a
document is consumed using a couple simple hooks.
Just write a script, put it somewhere that Paperless can read & execute,
and then put the path to that script in `paperless.conf` or
`docker-compose.env` with the variable name of either
`PAPERLESS_PRE_CONSUME_SCRIPT` or `PAPERLESS_POST_CONSUME_SCRIPT`.
!!! info
These scripts are executed in a **blocking** process, which means that
if a script takes a long time to run, it can significantly slow down
your document consumption flow. If you want things to run
asynchronously, you'll have to fork the process in your script and
exit.
### Pre-consumption script {#pre-consume-script}
Executed after the consumer sees a new document in the consumption
folder, but before any processing of the document is performed. This
script can access the following relevant environment variables set:
| Environment Variable | Description |
| ----------------------- | ------------------------------------------------------------ |
| `DOCUMENT_SOURCE_PATH` | Original path of the consumed document |
| `DOCUMENT_WORKING_PATH` | Path to a copy of the original that consumption will work on |
!!! note
Pre-consume scripts which modify the document should only change
the `DOCUMENT_WORKING_PATH` file or a second consume task may
be triggered, leading to failures as two tasks work on the
same document path
A simple but common example for this would be creating a simple script
like this:
`/usr/local/bin/ocr-pdf`
```bash
#!/usr/bin/env bash
pdf2pdfocr.py -i ${DOCUMENT_WORKING_PATH}
```
`/etc/paperless.conf`
```bash
...
PAPERLESS_PRE_CONSUME_SCRIPT="/usr/local/bin/ocr-pdf"
...
```
This will pass the path to the document about to be consumed to
`/usr/local/bin/ocr-pdf`, which will in turn call
[pdf2pdfocr.py](https://github.com/LeoFCardoso/pdf2pdfocr) on your
document, which will then overwrite the file with an OCR'd version of
the file and exit. At which point, the consumption process will begin
with the newly modified file.
The script's stdout and stderr will be logged line by line to the
webserver log, along with the exit code of the script.
### Post-consumption script {#post-consume-script}
Executed after the consumer has successfully processed a document and
has moved it into paperless. It receives the following environment
variables:
| Environment Variable | Description |
| ---------------------------- | --------------------------------------------- |
| `DOCUMENT_ID` | Database primary key of the document |
| `DOCUMENT_FILE_NAME` | Formatted filename, not including paths |
| `DOCUMENT_CREATED` | Date & time when document created |
| `DOCUMENT_MODIFIED` | Date & time when document was last modified |
| `DOCUMENT_ADDED` | Date & time when document was added |
| `DOCUMENT_SOURCE_PATH` | Path to the original document file |
| `DOCUMENT_ARCHIVE_PATH` | Path to the generate archive file (if any) |
| `DOCUMENT_THUMBNAIL_PATH` | Path to the generated thumbnail |
| `DOCUMENT_DOWNLOAD_URL` | URL for document download |
| `DOCUMENT_THUMBNAIL_URL` | URL for the document thumbnail |
| `DOCUMENT_CORRESPONDENT` | Assigned correspondent (if any) |
| `DOCUMENT_TAGS` | Comma separated list of tags applied (if any) |
| `DOCUMENT_ORIGINAL_FILENAME` | Filename of original document |
The script can be in any language, A simple shell script example:
```bash title="post-consumption-example"
--8<-- "./scripts/post-consumption-example.sh"
```
!!! note
The post consumption script cannot cancel the consumption process.
!!! warning
The post consumption script should not modify the document files
directly
The script's stdout and stderr will be logged line by line to the
webserver log, along with the exit code of the script.
### Docker {#docker-consume-hooks}
To hook into the consumption process when using Docker, you
will need to pass the scripts into the container via a host mount
in your `docker-compose.yml`.
Assuming you have
`/home/paperless-ngx/scripts/post-consumption-example.sh` as a
script which you'd like to run.
You can pass that script into the consumer container via a host mount:
```yaml
...
webserver:
...
volumes:
...
- /home/paperless-ngx/scripts:/path/in/container/scripts/ # (1)!
environment: # (3)!
...
PAPERLESS_POST_CONSUME_SCRIPT: /path/in/container/scripts/post-consumption-example.sh # (2)!
...
```
1. The external scripts directory is mounted to a location inside the container.
2. The internal location of the script is used to set the script to run
3. This can also be set in `docker-compose.env`
Troubleshooting:
- Monitor the docker-compose log
`cd ~/paperless-ngx; docker-compose logs -f`
- Check your script's permission e.g. in case of permission error
`sudo chmod 755 post-consumption-example.sh`
- Pipe your scripts's output to a log file e.g.
`echo "${DOCUMENT_ID}" | tee --append /usr/src/paperless/scripts/post-consumption-example.log`
## File name handling {#file-name-handling}
By default, paperless stores your documents in the media directory and
renames them using the identifier which it has assigned to each
document. You will end up getting files like `0000123.pdf` in your media
directory. This isn't necessarily a bad thing, because you normally
don't have to access these files manually. However, if you wish to name
your files differently, you can do that by adjusting the
`PAPERLESS_FILENAME_FORMAT` configuration option. Paperless adds the
correct file extension e.g. `.pdf`, `.jpg` automatically.
This variable allows you to configure the filename (folders are allowed)
using placeholders. For example, configuring this to
```bash
PAPERLESS_FILENAME_FORMAT={created_year}/{correspondent}/{title}
```
will create a directory structure as follows:
```
2019/
My bank/
Statement January.pdf
Statement February.pdf
2020/
My bank/
Statement January.pdf
Letter.pdf
Letter_01.pdf
Shoe store/
My new shoes.pdf
```
!!! warning
Do not manually move your files in the media folder. Paperless remembers
the last filename a document was stored as. If you do rename a file,
paperless will report your files as missing and won't be able to find
them.
Paperless provides the following placeholders within filenames:
- `{asn}`: The archive serial number of the document, or "none".
- `{correspondent}`: The name of the correspondent, or "none".
- `{document_type}`: The name of the document type, or "none".
- `{tag_list}`: A comma separated list of all tags assigned to the
document.
- `{title}`: The title of the document.
- `{created}`: The full date (ISO format) the document was created.
- `{created_year}`: Year created only, formatted as the year with
century.
- `{created_year_short}`: Year created only, formatted as the year
without century, zero padded.
- `{created_month}`: Month created only (number 01-12).
- `{created_month_name}`: Month created name, as per locale
- `{created_month_name_short}`: Month created abbreviated name, as per
locale
- `{created_day}`: Day created only (number 01-31).
- `{added}`: The full date (ISO format) the document was added to
paperless.
- `{added_year}`: Year added only.
- `{added_year_short}`: Year added only, formatted as the year without
century, zero padded.
- `{added_month}`: Month added only (number 01-12).
- `{added_month_name}`: Month added name, as per locale
- `{added_month_name_short}`: Month added abbreviated name, as per
locale
- `{added_day}`: Day added only (number 01-31).
- `{owner_username}`: Username of document owner, if any, or "none"
- `{original_name}`: Document original filename, minus the extension, if any, or "none"
Paperless will try to conserve the information from your database as
much as possible. However, some characters that you can use in document
titles and correspondent names (such as `: \ /` and a couple more) are
not allowed in filenames and will be replaced with dashes.
If paperless detects that two documents share the same filename,
paperless will automatically append `_01`, `_02`, etc to the filename.
This happens if all the placeholders in a filename evaluate to the same
value.
!!! tip
You can affect how empty placeholders are treated by changing the
following setting to `true`.
```
PAPERLESS_FILENAME_FORMAT_REMOVE_NONE=True
```
Doing this results in all empty placeholders resolving to "" instead
of "none" as stated above. Spaces before empty placeholders are
removed as well, empty directories are omitted.
!!! tip
Paperless checks the filename of a document whenever it is saved.
Therefore, you need to update the filenames of your documents and move
them after altering this setting by invoking the
[`document renamer`](/administration#renamer).
!!! warning
Make absolutely sure you get the spelling of the placeholders right, or
else paperless will use the default naming scheme instead.
!!! caution
As of now, you could totally tell paperless to store your files anywhere
outside the media directory by setting
```
PAPERLESS_FILENAME_FORMAT=../../my/custom/location/{title}
```
However, keep in mind that inside docker, if files get stored outside of
the predefined volumes, they will be lost after a restart of paperless.
!!! warning
When file naming handling, in particular when using `{tag_list}`,
you may run into the limits of your operating system's maximum
path lengths. Files will retain the previous path instead and
the issue logged.
## Storage paths
One of the best things in Paperless is that you can not only access the
documents via the web interface, but also via the file system.
When a single storage layout is not sufficient for your use case,
storage paths come to the rescue. Storage paths allow you to configure
more precisely where each document is stored in the file system.
- Each storage path is a `PAPERLESS_FILENAME_FORMAT` and
follows the rules described above
- Each document is assigned a storage path using the matching
algorithms described above, but can be overwritten at any time
For example, you could define the following two storage paths:
1. Normal communications are put into a folder structure sorted by
`year/correspondent`
2. Communications with insurance companies are stored in a flat
structure with longer file names, but containing the full date of
the correspondence.
```
By Year = {created_year}/{correspondent}/{title}
Insurances = Insurances/{correspondent}/{created_year}-{created_month}-{created_day} {title}
```
If you then map these storage paths to the documents, you might get the
following result. For simplicity, `By Year` defines the same
structure as in the previous example above.
```text
2019/ # By Year
My bank/
Statement January.pdf
Statement February.pdf
Insurances/ # Insurances
Healthcare 123/
2022-01-01 Statement January.pdf
2022-02-02 Letter.pdf
2022-02-03 Letter.pdf
Dental 456/
2021-12-01 New Conditions.pdf
```
!!! tip
Defining a storage path is optional. If no storage path is defined for a
document, the global `PAPERLESS_FILENAME_FORMAT` is applied.
## Celery Monitoring {#celery-monitoring}
The monitoring tool
[Flower](https://flower.readthedocs.io/en/latest/index.html) can be used
to view more detailed information about the health of the celery workers
used for asynchronous tasks. This includes details on currently running,
queued and completed tasks, timing and more. Flower can also be used
with Prometheus, as it exports metrics. For details on its capabilities,
refer to the Flower documentation.
To configure Flower further, create a `flowerconfig.py` and
place it into the `src/paperless` directory. For a Docker
installation, you can use volumes to accomplish this:
```yaml
services:
# ...
webserver:
ports:
- 5555:5555 # (2)!
# ...
volumes:
- /path/to/my/flowerconfig.py:/usr/src/paperless/src/paperless/flowerconfig.py:ro # (1)!
```
1. Note the `:ro` tag means the file will be mounted as read only.
2. `flower` runs by default on port 5555, but this can be configured
## Custom Container Initialization
The Docker image includes the ability to run custom user scripts during
startup. This could be utilized for installing additional tools or
Python packages, for example. Scripts are expected to be shell scripts.
To utilize this, mount a folder containing your scripts to the custom
initialization directory, `/custom-cont-init.d` and place
scripts you wish to run inside. For security, the folder must be owned
by `root` and should have permissions of `a=rx`. Additionally, scripts
must only be writable by `root`.
Your scripts will be run directly before the webserver completes
startup. Scripts will be run by the `root` user.
If you would like to switch users, the utility `gosu` is available and
preferred over `sudo`.
This is an advanced functionality with which you could break functionality
or lose data. If you experience issues, please disable any custom scripts
and try again before reporting an issue.
For example, using Docker Compose:
```yaml
services:
# ...
webserver:
# ...
volumes:
- /path/to/my/scripts:/custom-cont-init.d:ro # (1)!
```
1. Note the `:ro` tag means the folder will be mounted as read only. This is for extra security against changes
## MySQL Caveats {#mysql-caveats}
### Case Sensitivity
The database interface does not provide a method to configure a MySQL
database to be case sensitive. This would prevent a user from creating a
tag `Name` and `NAME` as they are considered the same.
Per Django documentation, to enable this requires manual intervention.
To enable case sensetive tables, you can execute the following command
against each table:
`ALTER TABLE <table_name> CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;`
You can also set the default for new tables (this does NOT affect
existing tables) with:
`ALTER DATABASE <db_name> CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;`
!!! warning
Using mariadb version 10.4+ is recommended. Using the `utf8mb3` character set on
an older system may fix issues that can arise while setting up Paperless-ngx but
`utf8mb3` can cause issues with consumption (where `utf8mb4` does not).
## Barcodes {#barcodes}
Paperless is able to utilize barcodes for automatically preforming some tasks.
At this time, the library utilized for detection of bacodes supports the following types:
- AN-13/UPC-A
- UPC-E
- EAN-8
- Code 128
- Code 93
- Code 39
- Codabar
- Interleaved 2 of 5
- QR Code
- SQ Code
You may check for updates on the [zbar library homepage](https://github.com/mchehab/zbar).
For usage in Paperless, the type of barcode does not matter, only the contents of it.
For how to enable barcode usage, see [the configuration](/configuration#barcodes).
The two settings may be enabled independently, but do have interactions as explained
below.
### Document Splitting
When enabled, Paperless will look for a barcode with the configured value and create a new document
starting from the next page. The page with the barcode on it will _not_ be retained. It
is expected to be a page existing only for triggering the split.
### Archive Serial Number Assignment
When enabled, the value of the barcode (as an integer) will be used to set the document's
archive serial number, allowing quick reference back to the original, paper document.
If document splitting via barcode is also enabled, documents will be split when an ASN
barcode is located. However, differing from the splitting, the page with the
barcode _will_ be retained. This allows application of a barcode to any page, including
one which holds data to keep in the document.

292
docs/advanced_usage.rst Normal file
View File

@@ -0,0 +1,292 @@
***************
Advanced topics
***************
Paperless offers a couple features that automate certain tasks and make your life
easier.
.. _advanced-matching:
Matching tags, correspondents and document types
################################################
Paperless will compare the matching algorithms defined by every tag and
correspondent already set in your database to see if they apply to the text in
a document. In other words, if you defined a tag called ``Home Utility``
that had a ``match`` property of ``bc hydro`` and a ``matching_algorithm`` of
``literal``, Paperless will automatically tag your newly-consumed document with
your ``Home Utility`` tag so long as the text ``bc hydro`` appears in the body
of the document somewhere.
The matching logic is quite powerful. It supports searching the text of your
document with different algorithms, and as such, some experimentation may be
necessary to get things right.
In order to have a tag, correspondent, or type assigned automatically to newly
consumed documents, assign a match and matching algorithm using the web
interface. These settings define when to assign correspondents, tags, and types
to documents.
The following algorithms are available:
* **Any:** Looks for any occurrence of any word provided in match in the PDF.
If you define the match as ``Bank1 Bank2``, it will match documents containing
either of these terms.
* **All:** Requires that every word provided appears in the PDF, albeit not in the
order provided.
* **Literal:** Matches only if the match appears exactly as provided (i.e. preserve ordering) in the PDF.
* **Regular expression:** Parses the match as a regular expression and tries to
find a match within the document.
* **Fuzzy match:** I dont know. Look at the source.
* **Auto:** Tries to automatically match new documents. This does not require you
to set a match. See the notes below.
When using the *any* or *all* matching algorithms, you can search for terms
that consist of multiple words by enclosing them in double quotes. For example,
defining a match text of ``"Bank of America" BofA`` using the *any* algorithm,
will match documents that contain either "Bank of America" or "BofA", but will
not match documents containing "Bank of South America".
Then just save your tag/correspondent and run another document through the
consumer. Once complete, you should see the newly-created document,
automatically tagged with the appropriate data.
.. _advanced-automatic_matching:
Automatic matching
==================
Paperless-ngx comes with a new matching algorithm called *Auto*. This matching
algorithm tries to assign tags, correspondents, and document types to your
documents based on how you have already assigned these on existing documents. It
uses a neural network under the hood.
If, for example, all your bank statements of your account 123 at the Bank of
America are tagged with the tag "bofa_123" and the matching algorithm of this
tag is set to *Auto*, this neural network will examine your documents and
automatically learn when to assign this tag.
Paperless tries to hide much of the involved complexity with this approach.
However, there are a couple caveats you need to keep in mind when using this
feature:
* Changes to your documents are not immediately reflected by the matching
algorithm. The neural network needs to be *trained* on your documents after
changes. Paperless periodically (default: once each hour) checks for changes
and does this automatically for you.
* The Auto matching algorithm only takes documents into account which are NOT
placed in your inbox (i.e. have any inbox tags assigned to them). This ensures
that the neural network only learns from documents which you have correctly
tagged before.
* The matching algorithm can only work if there is a correlation between the
tag, correspondent, or document type and the document itself. Your bank
statements usually contain your bank account number and the name of the bank,
so this works reasonably well, However, tags such as "TODO" cannot be
automatically assigned.
* The matching algorithm needs a reasonable number of documents to identify when
to assign tags, correspondents, and types. If one out of a thousand documents
has the correspondent "Very obscure web shop I bought something five years
ago", it will probably not assign this correspondent automatically if you buy
something from them again. The more documents, the better.
* Paperless also needs a reasonable amount of negative examples to decide when
not to assign a certain tag, correspondent or type. This will usually be the
case as you start filling up paperless with documents. Example: If all your
documents are either from "Webshop" and "Bank", paperless will assign one of
these correspondents to ANY new document, if both are set to automatic matching.
Hooking into the consumption process
####################################
Sometimes you may want to do something arbitrary whenever a document is
consumed. Rather than try to predict what you may want to do, Paperless lets
you execute scripts of your own choosing just before or after a document is
consumed using a couple simple hooks.
Just write a script, put it somewhere that Paperless can read & execute, and
then put the path to that script in ``paperless.conf`` or ``docker-compose.env`` with the variable name
of either ``PAPERLESS_PRE_CONSUME_SCRIPT`` or
``PAPERLESS_POST_CONSUME_SCRIPT``.
.. important::
These scripts are executed in a **blocking** process, which means that if
a script takes a long time to run, it can significantly slow down your
document consumption flow. If you want things to run asynchronously,
you'll have to fork the process in your script and exit.
Pre-consumption script
======================
Executed after the consumer sees a new document in the consumption folder, but
before any processing of the document is performed. This script receives exactly
one argument:
* Document file name
A simple but common example for this would be creating a simple script like
this:
``/usr/local/bin/ocr-pdf``
.. code:: bash
#!/usr/bin/env bash
pdf2pdfocr.py -i ${1}
``/etc/paperless.conf``
.. code:: bash
...
PAPERLESS_PRE_CONSUME_SCRIPT="/usr/local/bin/ocr-pdf"
...
This will pass the path to the document about to be consumed to ``/usr/local/bin/ocr-pdf``,
which will in turn call `pdf2pdfocr.py`_ on your document, which will then
overwrite the file with an OCR'd version of the file and exit. At which point,
the consumption process will begin with the newly modified file.
.. _pdf2pdfocr.py: https://github.com/LeoFCardoso/pdf2pdfocr
.. _advanced-post_consume_script:
Post-consumption script
=======================
Executed after the consumer has successfully processed a document and has moved it
into paperless. It receives the following arguments:
* Document id
* Generated file name
* Source path
* Thumbnail path
* Download URL
* Thumbnail URL
* Correspondent
* Tags
The script can be in any language, but for a simple shell script
example, you can take a look at `post-consumption-example.sh`_ in this project.
The post consumption script cannot cancel the consumption process.
Docker
------
Assumed you have ``/home/foo/paperless-ngx/scripts/post-consumption-example.sh``.
You can pass that script into the consumer container via a host mount in your ``docker-compose.yml``.
.. code:: bash
...
consumer:
...
volumes:
...
- /home/paperless-ngx/scripts:/path/in/container/scripts/
...
Example (docker-compose.yml): ``- /home/foo/paperless-ngx/scripts:/usr/src/paperless/scripts``
which in turn requires the variable ``PAPERLESS_POST_CONSUME_SCRIPT`` in ``docker-compose.env`` to point to ``/path/in/container/scripts/post-consumption-example.sh``.
Example (docker-compose.env): ``PAPERLESS_POST_CONSUME_SCRIPT=/usr/src/paperless/scripts/post-consumption-example.sh``
Troubleshooting:
- Monitor the docker-compose log ``cd ~/paperless-ngx; docker-compose logs -f``
- Check your script's permission e.g. in case of permission error ``sudo chmod 755 post-consumption-example.sh``
- Pipe your scripts's output to a log file e.g. ``echo "${DOCUMENT_ID}" | tee --append /usr/src/paperless/scripts/post-consumption-example.log``
.. _post-consumption-example.sh: https://github.com/jonaswinkler/paperless-ngx/blob/master/scripts/post-consumption-example.sh
.. _advanced-file_name_handling:
File name handling
##################
By default, paperless stores your documents in the media directory and renames them
using the identifier which it has assigned to each document. You will end up getting
files like ``0000123.pdf`` in your media directory. This isn't necessarily a bad
thing, because you normally don't have to access these files manually. However, if
you wish to name your files differently, you can do that by adjusting the
``PAPERLESS_FILENAME_FORMAT`` configuration option.
This variable allows you to configure the filename (folders are allowed) using
placeholders. For example, configuring this to
.. code:: bash
PAPERLESS_FILENAME_FORMAT={created_year}/{correspondent}/{title}
will create a directory structure as follows:
.. code::
2019/
My bank/
Statement January.pdf
Statement February.pdf
2020/
My bank/
Statement January.pdf
Letter.pdf
Letter_01.pdf
Shoe store/
My new shoes.pdf
.. danger::
Do not manually move your files in the media folder. Paperless remembers the
last filename a document was stored as. If you do rename a file, paperless will
report your files as missing and won't be able to find them.
Paperless provides the following placeholders withing filenames:
* ``{asn}``: The archive serial number of the document, or "none".
* ``{correspondent}``: The name of the correspondent, or "none".
* ``{document_type}``: The name of the document type, or "none".
* ``{tag_list}``: A comma separated list of all tags assigned to the document.
* ``{title}``: The title of the document.
* ``{created}``: The full date (ISO format) the document was created.
* ``{created_year}``: Year created only.
* ``{created_month}``: Month created only (number 01-12).
* ``{created_day}``: Day created only (number 01-31).
* ``{added}``: The full date (ISO format) the document was added to paperless.
* ``{added_year}``: Year added only.
* ``{added_month}``: Month added only (number 01-12).
* ``{added_day}``: Day added only (number 01-31).
Paperless will try to conserve the information from your database as much as possible.
However, some characters that you can use in document titles and correspondent names (such
as ``: \ /`` and a couple more) are not allowed in filenames and will be replaced with dashes.
If paperless detects that two documents share the same filename, paperless will automatically
append ``_01``, ``_02``, etc to the filename. This happens if all the placeholders in a filename
evaluate to the same value.
.. hint::
Paperless checks the filename of a document whenever it is saved. Therefore,
you need to update the filenames of your documents and move them after altering
this setting by invoking the :ref:`document renamer <utilities-renamer>`.
.. warning::
Make absolutely sure you get the spelling of the placeholders right, or else
paperless will use the default naming scheme instead.
.. caution::
As of now, you could totally tell paperless to store your files anywhere outside
the media directory by setting
.. code::
PAPERLESS_FILENAME_FORMAT=../../my/custom/location/{title}
However, keep in mind that inside docker, if files get stored outside of the
predefined volumes, they will be lost after a restart of paperless.

View File

@@ -1,325 +0,0 @@
# The REST API
Paperless makes use of the [Django REST
Framework](https://django-rest-framework.org/) standard API interface. It
provides a browsable API for most of its endpoints, which you can
inspect at `http://<paperless-host>:<port>/api/`. This also documents
most of the available filters and ordering fields.
The API provides 7 main endpoints:
- `/api/documents/`: Full CRUD support, except POSTing new documents.
See below.
- `/api/correspondents/`: Full CRUD support.
- `/api/document_types/`: Full CRUD support.
- `/api/logs/`: Read-Only.
- `/api/tags/`: Full CRUD support.
- `/api/tasks/`: Read-only.
- `/api/mail_accounts/`: Full CRUD support.
- `/api/mail_rules/`: Full CRUD support.
- `/api/users/`: Full CRUD support.
- `/api/groups/`: Full CRUD support.
All of these endpoints except for the logging endpoint allow you to
fetch (and edit and delete where appropriate) individual objects by
appending their primary key to the path, e.g. `/api/documents/454/`.
The objects served by the document endpoint contain the following
fields:
- `id`: ID of the document. Read-only.
- `title`: Title of the document.
- `content`: Plain text content of the document.
- `tags`: List of IDs of tags assigned to this document, or empty
list.
- `document_type`: Document type of this document, or null.
- `correspondent`: Correspondent of this document or null.
- `created`: The date time at which this document was created.
- `created_date`: The date (YYYY-MM-DD) at which this document was
created. Optional. If also passed with created, this is ignored.
- `modified`: The date at which this document was last edited in
paperless. Read-only.
- `added`: The date at which this document was added to paperless.
Read-only.
- `archive_serial_number`: The identifier of this document in a
physical document archive.
- `original_file_name`: Verbose filename of the original document.
Read-only.
- `archived_file_name`: Verbose filename of the archived document.
Read-only. Null if no archived document is available.
## Downloading documents
In addition to that, the document endpoint offers these additional
actions on individual documents:
- `/api/documents/<pk>/download/`: Download the document.
- `/api/documents/<pk>/preview/`: Display the document inline, without
downloading it.
- `/api/documents/<pk>/thumb/`: Download the PNG thumbnail of a
document.
Paperless generates archived PDF/A documents from consumed files and
stores both the original files as well as the archived files. By
default, the endpoints for previews and downloads serve the archived
file, if it is available. Otherwise, the original file is served. Some
document cannot be archived.
The endpoints correctly serve the response header fields
`Content-Disposition` and `Content-Type` to indicate the filename for
download and the type of content of the document.
In order to download or preview the original document when an archived
document is available, supply the query parameter `original=true`.
!!! tip
Paperless used to provide these functionality at `/fetch/<pk>/preview`,
`/fetch/<pk>/thumb` and `/fetch/<pk>/doc`. Redirects to the new URLs are
in place. However, if you use these old URLs to access documents, you
should update your app or script to use the new URLs.
## Getting document metadata
The api also has an endpoint to retrieve read-only metadata about
specific documents. this information is not served along with the
document objects, since it requires reading files and would therefore
slow down document lists considerably.
Access the metadata of a document with an ID `id` at
`/api/documents/<id>/metadata/`.
The endpoint reports the following data:
- `original_checksum`: MD5 checksum of the original document.
- `original_size`: Size of the original document, in bytes.
- `original_mime_type`: Mime type of the original document.
- `media_filename`: Current filename of the document, under which it
is stored inside the media directory.
- `has_archive_version`: True, if this document is archived, false
otherwise.
- `original_metadata`: A list of metadata associated with the original
document. See below.
- `archive_checksum`: MD5 checksum of the archived document, or null.
- `archive_size`: Size of the archived document in bytes, or null.
- `archive_metadata`: Metadata associated with the archived document,
or null. See below.
File metadata is reported as a list of objects in the following form:
```json
[
{
"namespace": "http://ns.adobe.com/pdf/1.3/",
"prefix": "pdf",
"key": "Producer",
"value": "SparklePDF, Fancy edition"
}
]
```
`namespace` and `prefix` can be null. The actual metadata reported
depends on the file type and the metadata available in that specific
document. Paperless only reports PDF metadata at this point.
## Authorization
The REST api provides three different forms of authentication.
1. Basic authentication
Authorize by providing a HTTP header in the form
```
Authorization: Basic <credentials>
```
where `credentials` is a base64-encoded string of
`<username>:<password>`
2. Session authentication
When you're logged into paperless in your browser, you're
automatically logged into the API as well and don't need to provide
any authorization headers.
3. Token authentication
Paperless also offers an endpoint to acquire authentication tokens.
POST a username and password as a form or json string to
`/api/token/` and paperless will respond with a token, if the login
data is correct. This token can be used to authenticate other
requests with the following HTTP header:
```
Authorization: Token <token>
```
Tokens can be managed and revoked in the paperless admin.
## Searching for documents
Full text searching is available on the `/api/documents/` endpoint. Two
specific query parameters cause the API to return full text search
results:
- `/api/documents/?query=your%20search%20query`: Search for a document
using a full text query. For details on the syntax, see [Basic Usage - Searching](/usage#basic-usage_searching).
- `/api/documents/?more_like=1234`: Search for documents similar to
the document with id 1234.
Pagination works exactly the same as it does for normal requests on this
endpoint.
Certain limitations apply to full text queries:
- Results are always sorted by search score. The results matching the
query best will show up first.
- Only a small subset of filtering parameters are supported.
Furthermore, each returned document has an additional `__search_hit__`
attribute with various information about the search results:
```
{
"count": 31,
"next": "http://localhost:8000/api/documents/?page=2&query=test",
"previous": null,
"results": [
...
{
"id": 123,
"title": "title",
"content": "content",
...
"__search_hit__": {
"score": 0.343,
"highlights": "text <span class="match">Test</span> text",
"rank": 23
}
},
...
]
}
```
- `score` is an indication how well this document matches the query
relative to the other search results.
- `highlights` is an excerpt from the document content and highlights
the search terms with `<span>` tags as shown above.
- `rank` is the index of the search results. The first result will
have rank 0.
### `/api/search/autocomplete/`
Get auto completions for a partial search term.
Query parameters:
- `term`: The incomplete term.
- `limit`: Amount of results. Defaults to 10.
Results returned by the endpoint are ordered by importance of the term
in the document index. The first result is the term that has the highest
[Tf/Idf](https://en.wikipedia.org/wiki/Tf%E2%80%93idf) score in the index.
```json
["term1", "term3", "term6", "term4"]
```
## POSTing documents {#file-uploads}
The API provides a special endpoint for file uploads:
`/api/documents/post_document/`
POST a multipart form to this endpoint, where the form field `document`
contains the document that you want to upload to paperless. The filename
is sanitized and then used to store the document in a temporary
directory, and the consumer will be instructed to consume the document
from there.
The endpoint supports the following optional form fields:
- `title`: Specify a title that the consumer should use for the
document.
- `created`: Specify a DateTime where the document was created (e.g.
"2016-04-19" or "2016-04-19 06:15:00+02:00").
- `correspondent`: Specify the ID of a correspondent that the consumer
should use for the document.
- `document_type`: Similar to correspondent.
- `tags`: Similar to correspondent. Specify this multiple times to
have multiple tags added to the document.
- `archive_serial_number`: An optional archive serial number to set.
The endpoint will immediately return HTTP 200 if the document consumption
process was started successfully, with the UUID of the consumption task
as the data. No additional status information about the consumption process
itself is available immediately, since that happens in a different process.
However, querying the tasks endpoint with the returned UUID e.g.
`/api/tasks/?task_id={uuid}` will provide information on the state of the
consumption including the ID of a created document if consumption succeeded.
## API Versioning
The REST API is versioned since Paperless-ngx 1.3.0.
- Versioning ensures that changes to the API don't break older
clients.
- Clients specify the specific version of the API they wish to use
with every request and Paperless will handle the request using the
specified API version.
- Even if the underlying data model changes, older API versions will
always serve compatible data.
- If no version is specified, Paperless will serve version 1 to ensure
compatibility with older clients that do not request a specific API
version.
API versions are specified by submitting an additional HTTP `Accept`
header with every request:
```
Accept: application/json; version=6
```
If an invalid version is specified, Paperless 1.3.0 will respond with
"406 Not Acceptable" and an error message in the body. Earlier
versions of Paperless will serve API version 1 regardless of whether a
version is specified via the `Accept` header.
If a client wishes to verify whether it is compatible with any given
server, the following procedure should be performed:
1. Perform an _authenticated_ request against any API endpoint. If the
server is on version 1.3.0 or newer, the server will add two custom
headers to the response:
```
X-Api-Version: 2
X-Version: 1.3.0
```
2. Determine whether the client is compatible with this server based on
the presence/absence of these headers and their values if present.
### API Changelog
#### Version 1
Initial API version.
#### Version 2
- Added field `Tag.color`. This read/write string field contains a hex
color such as `#a6cee3`.
- Added read-only field `Tag.text_color`. This field contains the text
color to use for a specific tag, which is either black or white
depending on the brightness of `Tag.color`.
- Removed field `Tag.colour`.

300
docs/api.rst Normal file
View File

@@ -0,0 +1,300 @@
************
The REST API
************
Paperless makes use of the `Django REST Framework`_ standard API interface.
It provides a browsable API for most of its endpoints, which you can inspect
at ``http://<paperless-host>:<port>/api/``. This also documents most of the
available filters and ordering fields.
.. _Django REST Framework: http://django-rest-framework.org/
The API provides 5 main endpoints:
* ``/api/documents/``: Full CRUD support, except POSTing new documents. See below.
* ``/api/correspondents/``: Full CRUD support.
* ``/api/document_types/``: Full CRUD support.
* ``/api/logs/``: Read-Only.
* ``/api/tags/``: Full CRUD support.
All of these endpoints except for the logging endpoint
allow you to fetch, edit and delete individual objects
by appending their primary key to the path, for example ``/api/documents/454/``.
The objects served by the document endpoint contain the following fields:
* ``id``: ID of the document. Read-only.
* ``title``: Title of the document.
* ``content``: Plain text content of the document.
* ``tags``: List of IDs of tags assigned to this document, or empty list.
* ``document_type``: Document type of this document, or null.
* ``correspondent``: Correspondent of this document or null.
* ``created``: The date at which this document was created.
* ``modified``: The date at which this document was last edited in paperless. Read-only.
* ``added``: The date at which this document was added to paperless. Read-only.
* ``archive_serial_number``: The identifier of this document in a physical document archive.
* ``original_file_name``: Verbose filename of the original document. Read-only.
* ``archived_file_name``: Verbose filename of the archived document. Read-only. Null if no archived document is available.
Downloading documents
#####################
In addition to that, the document endpoint offers these additional actions on
individual documents:
* ``/api/documents/<pk>/download/``: Download the document.
* ``/api/documents/<pk>/preview/``: Display the document inline,
without downloading it.
* ``/api/documents/<pk>/thumb/``: Download the PNG thumbnail of a document.
Paperless generates archived PDF/A documents from consumed files and stores both
the original files as well as the archived files. By default, the endpoints
for previews and downloads serve the archived file, if it is available.
Otherwise, the original file is served.
Some document cannot be archived.
The endpoints correctly serve the response header fields ``Content-Disposition``
and ``Content-Type`` to indicate the filename for download and the type of content of
the document.
In order to download or preview the original document when an archived document is available,
supply the query parameter ``original=true``.
.. hint::
Paperless used to provide these functionality at ``/fetch/<pk>/preview``,
``/fetch/<pk>/thumb`` and ``/fetch/<pk>/doc``. Redirects to the new URLs
are in place. However, if you use these old URLs to access documents, you
should update your app or script to use the new URLs.
Getting document metadata
#########################
The api also has an endpoint to retrieve read-only metadata about specific documents. this
information is not served along with the document objects, since it requires reading
files and would therefore slow down document lists considerably.
Access the metadata of a document with an ID ``id`` at ``/api/documents/<id>/metadata/``.
The endpoint reports the following data:
* ``original_checksum``: MD5 checksum of the original document.
* ``original_size``: Size of the original document, in bytes.
* ``original_mime_type``: Mime type of the original document.
* ``media_filename``: Current filename of the document, under which it is stored inside the media directory.
* ``has_archive_version``: True, if this document is archived, false otherwise.
* ``original_metadata``: A list of metadata associated with the original document. See below.
* ``archive_checksum``: MD5 checksum of the archived document, or null.
* ``archive_size``: Size of the archived document in bytes, or null.
* ``archive_metadata``: Metadata associated with the archived document, or null. See below.
File metadata is reported as a list of objects in the following form:
.. code:: json
[
{
"namespace": "http://ns.adobe.com/pdf/1.3/",
"prefix": "pdf",
"key": "Producer",
"value": "SparklePDF, Fancy edition"
},
]
``namespace`` and ``prefix`` can be null. The actual metadata reported depends on the file type and the metadata
available in that specific document. Paperless only reports PDF metadata at this point.
Authorization
#############
The REST api provides three different forms of authentication.
1. Basic authentication
Authorize by providing a HTTP header in the form
.. code::
Authorization: Basic <credentials>
where ``credentials`` is a base64-encoded string of ``<username>:<password>``
2. Session authentication
When you're logged into paperless in your browser, you're automatically
logged into the API as well and don't need to provide any authorization
headers.
3. Token authentication
Paperless also offers an endpoint to acquire authentication tokens.
POST a username and password as a form or json string to ``/api/token/``
and paperless will respond with a token, if the login data is correct.
This token can be used to authenticate other requests with the
following HTTP header:
.. code::
Authorization: Token <token>
Tokens can be managed and revoked in the paperless admin.
Searching for documents
#######################
Full text searching is available on the ``/api/documents/`` endpoint. Two specific
query parameters cause the API to return full text search results:
* ``/api/documents/?query=your%20search%20query``: Search for a document using a full text query.
For details on the syntax, see :ref:`basic-usage_searching`.
* ``/api/documents/?more_like=1234``: Search for documents similar to the document with id 1234.
Pagination works exactly the same as it does for normal requests on this endpoint.
Certain limitations apply to full text queries:
* Results are always sorted by search score. The results matching the query best will show up first.
* Only a small subset of filtering parameters are supported.
Furthermore, each returned document has an additional ``__search_hit__`` attribute with various information
about the search results:
.. code::
{
"count": 31,
"next": "http://localhost:8000/api/documents/?page=2&query=test",
"previous": null,
"results": [
...
{
"id": 123,
"title": "title",
"content": "content",
...
"__search_hit__": {
"score": 0.343,
"highlights": "text <span class=\"match\">Test</span> text",
"rank": 23
}
},
...
]
}
* ``score`` is an indication how well this document matches the query relative to the other search results.
* ``highlights`` is an excerpt from the document content and highlights the search terms with ``<span>`` tags as shown above.
* ``rank`` is the index of the search results. The first result will have rank 0.
``/api/search/autocomplete/``
=============================
Get auto completions for a partial search term.
Query parameters:
* ``term``: The incomplete term.
* ``limit``: Amount of results. Defaults to 10.
Results returned by the endpoint are ordered by importance of the term in the
document index. The first result is the term that has the highest Tf/Idf score
in the index.
.. code:: json
[
"term1",
"term3",
"term6",
"term4"
]
.. _api-file_uploads:
POSTing documents
#################
The API provides a special endpoint for file uploads:
``/api/documents/post_document/``
POST a multipart form to this endpoint, where the form field ``document`` contains
the document that you want to upload to paperless. The filename is sanitized and
then used to store the document in a temporary directory, and the consumer will
be instructed to consume the document from there.
The endpoint supports the following optional form fields:
* ``title``: Specify a title that the consumer should use for the document.
* ``correspondent``: Specify the ID of a correspondent that the consumer should use for the document.
* ``document_type``: Similar to correspondent.
* ``tags``: Similar to correspondent. Specify this multiple times to have multiple tags added
to the document.
The endpoint will immediately return "OK" if the document consumption process
was started successfully. No additional status information about the consumption
process itself is available, since that happens in a different process.
.. _api-versioning:
API Versioning
##############
The REST API is versioned since Paperless-ngx 1.3.0.
* Versioning ensures that changes to the API don't break older clients.
* Clients specify the specific version of the API they wish to use with every request and Paperless will handle the request using the specified API version.
* Even if the underlying data model changes, older API versions will always serve compatible data.
* If no version is specified, Paperless will serve version 1 to ensure compatibility with older clients that do not request a specific API version.
API versions are specified by submitting an additional HTTP ``Accept`` header with every request:
.. code::
Accept: application/json; version=6
If an invalid version is specified, Paperless 1.3.0 will respond with "406 Not Acceptable" and an error message in the body.
Earlier versions of Paperless will serve API version 1 regardless of whether a version is specified via the ``Accept`` header.
If a client wishes to verify whether it is compatible with any given server, the following procedure should be performed:
1. Perform an *authenticated* request against any API endpoint. If the server is on version 1.3.0 or newer, the server will
add two custom headers to the response:
.. code::
X-Api-Version: 2
X-Version: 1.3.0
2. Determine whether the client is compatible with this server based on the presence/absence of these headers and their values if present.
API Changelog
=============
Version 1
---------
Initial API version.
Version 2
---------
* Added field ``Tag.color``. This read/write string field contains a hex color such as ``#a6cee3``.
* Added read-only field ``Tag.text_color``. This field contains the text color to use for a specific tag, which is either black or white depending on the brightness of ``Tag.color``.
* Removed field ``Tag.colour``.

View File

@@ -1,36 +0,0 @@
:root > * {
--md-primary-fg-color: #17541f;
--md-primary-fg-color--dark: #17541f;
--md-primary-fg-color--light: #17541f;
--md-accent-fg-color: #2b8a38;
--md-typeset-a-color: #21652a;
}
[data-md-color-scheme="slate"] {
--md-hue: 222;
}
@media (min-width: 400px) {
.grid-left {
width: 33%;
float: left;
}
.grid-right {
width: 62%;
margin-left: 4%;
float: left;
}
}
.grid-left > p {
margin-bottom: 2rem;
}
.grid-right p {
margin: 0;
}
.index-callout {
margin-right: .5rem;
}

Binary file not shown.

Before

Width:  |  Height:  |  Size: 768 B

View File

@@ -1,12 +0,0 @@
<?xml version="1.0" encoding="utf-8"?>
<!-- Generator: Adobe Illustrator 27.0.1, SVG Export Plug-In . SVG Version: 6.00 Build 0) -->
<svg version="1.1" id="Layer_1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" x="0px" y="0px"
viewBox="0 0 1000 1000" style="enable-background:new 0 0 1000 1000;" xml:space="preserve">
<style type="text/css">
.st0{fill:#FFFFFF;}
</style>
<path class="st0" d="M299,891.7c-4.2-19.8-12.5-59.6-13.6-59.6c-176.7-105.7-155.8-288.7-97.3-393.4
c12.5,131.8,245.8,222.8,109.8,383.9c-1.1,2,6.2,27.2,12.5,50.2c27.2-46,68-101.4,65.8-106.7C208.9,358.2,731.9,326.9,840.6,73.7
c49.1,244.8-25.1,623.5-445.5,719.7c-2,1.1-76.3,131.8-79.5,132.9c0-2-31.4-1.1-27.2-11.5C290.7,908.4,294.8,900.1,299,891.7
L299,891.7z M293.8,793.4c53.3-61.8-9.4-167.4-47.1-201.9C310.5,701.3,306.3,765.1,293.8,793.4L293.8,793.4z"/>
</svg>

Before

Width:  |  Height:  |  Size: 869 B

View File

@@ -1,68 +0,0 @@
<?xml version="1.0" encoding="utf-8"?>
<!-- Generator: Adobe Illustrator 27.0.1, SVG Export Plug-In . SVG Version: 6.00 Build 0) -->
<svg version="1.1" id="Layer_1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" x="0px" y="0px"
viewBox="0 0 2962.2 860.2" style="enable-background:new 0 0 2962.2 860.2;" xml:space="preserve">
<style type="text/css">
.st0{fill:#17541F;stroke:#000000;stroke-miterlimit:10;}
</style>
<path d="M1055.6,639.7v-20.6c-18,20-43.1,30.1-75.4,30.1c-22.4,0-42.8-5.8-61-17.5c-18.3-11.7-32.5-27.8-42.9-48.3
c-10.3-20.5-15.5-43.3-15.5-68.4c0-25.1,5.2-48,15.5-68.5s24.6-36.6,42.9-48.3s38.6-17.5,61-17.5c32.3,0,57.5,10,75.4,30.1v-20.6
h85.3v249.6L1055.6,639.7L1055.6,639.7z M1059.1,514.9c0-17.4-5.2-31.9-15.5-43.8c-10.3-11.8-23.9-17.7-40.6-17.7
c-16.8,0-30.2,5.9-40.4,17.7c-10.2,11.8-15.3,26.4-15.3,43.8c0,17.4,5.1,31.9,15.3,43.8c10.2,11.8,23.6,17.7,40.4,17.7
c16.8,0,30.3-5.9,40.6-17.7C1054,546.9,1059.1,532.3,1059.1,514.9z"/>
<path d="M1417.8,398.2c18.3,11.7,32.5,27.8,42.9,48.3c10.3,20.5,15.5,43.3,15.5,68.5c0,25.1-5.2,48-15.5,68.4
c-10.3,20.5-24.6,36.6-42.9,48.3s-38.6,17.5-61,17.5c-32.3,0-57.5-10-75.4-30.1v165.6h-85.3V390.2h85.3v20.6
c18-20,43.1-30.1,75.4-30.1C1379.2,380.7,1399.5,386.6,1417.8,398.2z M1389.5,514.9c0-17.4-5.1-31.9-15.3-43.8
c-10.2-11.8-23.6-17.7-40.4-17.7s-30.2,5.9-40.4,17.7c-10.2,11.8-15.3,26.4-15.3,43.8c0,17.4,5.1,31.9,15.3,43.8
c10.2,11.8,23.6,17.7,40.4,17.7s30.2-5.9,40.4-17.7S1389.5,532.3,1389.5,514.9z"/>
<path d="M1713.6,555.3l53,49.4c-28.1,29.6-66.7,44.4-115.8,44.4c-28.1,0-53-5.8-74.5-17.5s-38.2-27.7-49.8-48
c-11.7-20.3-17.7-43.2-18-68.7c0-24.8,5.9-47.5,17.7-68c11.8-20.5,28.1-36.7,48.7-48.5s43.5-17.7,68.7-17.7
c24.8,0,47.6,6.1,68.2,18.2s37,29.5,49.1,52.3c12.1,22.7,18.2,49.1,18.2,79l-0.4,11.7h-181.8c3.6,11.4,10.5,20.7,20.9,28.1
c10.3,7.3,21.3,11,33,11c14.4,0,26.3-2.2,35.7-6.5C1695.8,570.1,1704.9,563.7,1713.6,555.3z M1596.9,486.2h92.9
c-2.1-12.3-7.5-22.1-16.2-29.4s-18.7-11-30.1-11s-21.5,3.7-30.3,11S1599,473.9,1596.9,486.2z"/>
<path d="M1908.8,418.4c7.8-10.8,17.2-19,28.3-24.7s22-8.5,32.8-8.5c11.4,0,20,1.6,26,4.9l-10.8,72.7c-8.4-2.1-15.7-3.1-22-3.1
c-17.1,0-30.4,4.3-39.9,12.8c-9.6,8.5-14.4,24.2-14.4,46.9v120.3h-85.3V390.2h85.3V418.4L1908.8,418.4z"/>
<path d="M2113,258.2v381.5h-85.3V258.2H2113z"/>
<path d="M2360.8,555.3l53,49.4c-28.1,29.6-66.7,44.4-115.8,44.4c-28.1,0-53-5.8-74.5-17.5s-38.2-27.7-49.8-48
c-11.7-20.3-17.7-43.2-18-68.7c0-24.8,5.9-47.5,17.7-68s28.1-36.7,48.7-48.5c20.6-11.8,43.5-17.7,68.7-17.7
c24.8,0,47.6,6.1,68.2,18.2c20.6,12.1,37,29.5,49.1,52.3c12.1,22.7,18.2,49.1,18.2,79l-0.4,11.7h-181.8
c3.6,11.4,10.5,20.7,20.9,28.1c10.3,7.3,21.3,11,33,11c14.4,0,26.3-2.2,35.7-6.5C2343.1,570.1,2352.1,563.7,2360.8,555.3z
M2244.1,486.2h92.9c-2.1-12.3-7.5-22.1-16.2-29.4s-18.7-11-30.1-11s-21.5,3.7-30.3,11C2251.7,464.1,2246.2,473.9,2244.1,486.2z"/>
<path d="M2565.9,446.3c-9.9,0-17.1,1.1-21.5,3.4c-4.5,2.2-6.7,5.9-6.7,11s3.4,8.8,10.3,11.2c6.9,2.4,18,4.9,33.2,7.6
c20,3,37,6.7,50.9,11.2s26,12.1,36.1,22.9c10.2,10.8,15.3,25.9,15.3,45.3c0,29.9-10.9,52.4-32.8,67.6
c-21.8,15.1-50.3,22.7-85.3,22.7c-25.7,0-49.5-3.7-71.4-11c-21.8-7.3-37.4-14.7-46.7-22.2l33.7-60.6c10.2,9,23.4,15.8,39.7,20.4
c16.3,4.6,31.3,7,45.1,7c19.7,0,29.6-5.2,29.6-15.7c0-5.4-3.3-9.4-9.9-11.9c-6.6-2.5-17.2-5.2-31.9-7.9c-18.9-3.3-34.9-7.2-48-11.7
c-13.2-4.5-24.6-12.2-34.3-23.1c-9.7-10.9-14.6-26-14.6-45.1c0-27.2,9.7-48.5,29-63.7c19.3-15.3,46-22.9,80.1-22.9
c23.3,0,44.4,3.6,63.3,10.8c18.9,7.2,34,14.5,45.3,22l-32.8,58.8c-10.8-7.5-23.2-13.7-37.3-18.6
C2590.5,448.7,2577.6,446.3,2565.9,446.3z"/>
<path d="M2817.3,446.3c-9.9,0-17.1,1.1-21.5,3.4c-4.5,2.2-6.7,5.9-6.7,11s3.4,8.8,10.3,11.2c6.9,2.4,18,4.9,33.2,7.6
c20,3,37,6.7,50.9,11.2s26,12.1,36.1,22.9c10.2,10.8,15.3,25.9,15.3,45.3c0,29.9-10.9,52.4-32.8,67.6
c-21.8,15.1-50.3,22.7-85.3,22.7c-25.7,0-49.5-3.7-71.4-11c-21.8-7.3-37.4-14.7-46.7-22.2l33.7-60.6c10.2,9,23.4,15.8,39.7,20.4
c16.3,4.6,31.3,7,45.1,7c19.8,0,29.6-5.2,29.6-15.7c0-5.4-3.3-9.4-9.9-11.9c-6.6-2.5-17.2-5.2-31.9-7.9c-18.9-3.3-34.9-7.2-48-11.7
c-13.2-4.5-24.6-12.2-34.3-23.1c-9.7-10.9-14.6-26-14.6-45.1c0-27.2,9.7-48.5,29-63.7c19.3-15.3,46-22.9,80.1-22.9
c23.3,0,44.4,3.6,63.3,10.8c18.9,7.2,34,14.5,45.3,22l-32.8,58.8c-10.8-7.5-23.2-13.7-37.3-18.6
C2841.8,448.7,2828.9,446.3,2817.3,446.3z"/>
<g>
<path d="M2508,724h60.2v17.3H2508V724z"/>
<path d="M2629.2,694.4c4.9-2,10.2-3.1,16-3.1c10.9,0,19.5,3.4,25.9,10.2s9.6,16.7,9.6,29.6v57.3h-19.6v-52.6
c0-9.3-1.7-16.2-5.1-20.7c-3.4-4.5-9.1-6.7-17-6.7c-6.5,0-11.8,2.4-16.1,7.1c-4.3,4.8-6.4,11.5-6.4,20.2v52.6h-19.6v-94.6h19.6v9.5
C2620.2,699.4,2624.4,696.4,2629.2,694.4z"/>
<path d="M2790.3,833.2c-8.6,6.8-19.4,10.2-32.3,10.2c-7.9,0-15.2-1.4-21.9-4.1s-12.1-6.8-16.3-12.2s-6.6-11.9-7.1-19.6h19.6
c0.7,6.1,3.5,10.8,8.4,13.9c4.9,3.2,10.7,4.8,17.4,4.8c7,0,13.1-2,18.2-6c5.1-4,7.7-10.3,7.7-18.9v-24.7c-3.6,3.4-8,6.2-13.3,8.2
c-5.2,2.1-10.7,3.1-16.3,3.1c-8.7,0-16.6-2.1-23.7-6.4c-7.1-4.3-12.6-10-16.7-17.3c-4-7.3-6-15.5-6-24.6s2-17.3,6-24.7
s9.6-13.2,16.7-17.4c7.1-4.3,15-6.4,23.7-6.4c5.7,0,11.1,1,16.3,3.1s9.6,4.8,13.3,8.2v-8.8h19.4v107.8
C2803.2,815.9,2798.9,826.4,2790.3,833.2z M2782.2,755.7c2.6-4.7,3.8-10,3.8-15.9s-1.3-11.2-3.8-16c-2.6-4.8-6.1-8.5-10.5-11.1
c-4.5-2.7-9.5-4-15.1-4c-5.8,0-10.9,1.4-15.4,4.3c-4.5,2.8-7.9,6.6-10.3,11.4c-2.4,4.8-3.6,9.9-3.6,15.5c0,5.4,1.2,10.5,3.6,15.3
c2.4,4.8,5.8,8.6,10.3,11.5s9.6,4.3,15.4,4.3c5.6,0,10.6-1.4,15.1-4.1C2776.1,764.1,2779.6,760.4,2782.2,755.7z"/>
<path d="M2843.5,788.4h-21.6l37.9-48l-36.4-46.6h22.6l25.7,33.3l25.8-33.3h21.6l-36.2,45.9l37.9,48.6h-22.6l-27.4-35L2843.5,788.4z
"/>
</g>
<path d="M835.8,319.2c-11.5-18.9-27.4-33.7-47.6-44.7c-20.2-10.9-43-16.4-68.5-16.4h-90.6c-8.6,39.6-21.3,77.2-38,112.4
c-10,21-21.3,41-33.9,59.9v209.2H647v-135h72.7c25.4,0,48.3-5.5,68.5-16.4s36.1-25.8,47.6-44.7c11.5-18.9,17.3-39.5,17.3-61.9
C853.1,358.9,847.4,338.1,835.8,319.2z M747,416.6c-9.4,9-21.8,13.5-37,13.5l-62.8,0.4v-93.4l62.8-0.4c15.3,0,27.6,4.5,37,13.5
s14.1,20,14.1,33.2C761.1,396.6,756.4,407.7,747,416.6z"/>
<path class="st0" d="M164.7,698.7c-3.5-16.5-10.4-49.6-11.3-49.6c-147.1-88-129.7-240.3-81-327.4C82.8,431.4,277,507.1,163.8,641.2
c-0.9,1.7,5.2,22.6,10.4,41.8c22.6-38.3,56.6-84.4,54.8-88.8C89.7,254.7,525,228.6,615.5,17.9c40.9,203.7-20.9,518.9-370.8,599
c-1.7,0.9-63.5,109.7-66.2,110.6c0-1.7-26.1-0.9-22.6-9.6C157.8,712.6,161.2,705.7,164.7,698.7L164.7,698.7z M160.4,616.9
c44.4-51.4-7.8-139.3-39.2-168C174.3,540.2,170.8,593.3,160.4,616.9L160.4,616.9z"/>
</svg>

Before

Width:  |  Height:  |  Size: 6.3 KiB

View File

@@ -1,69 +0,0 @@
<?xml version="1.0" encoding="utf-8"?>
<!-- Generator: Adobe Illustrator 27.0.1, SVG Export Plug-In . SVG Version: 6.00 Build 0) -->
<svg version="1.1" id="Layer_1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" x="0px" y="0px"
viewBox="0 0 2962.2 860.2" style="enable-background:new 0 0 2962.2 860.2;" xml:space="preserve">
<style type="text/css">
.st0{fill:#FFFFFF;stroke:#000000;stroke-miterlimit:10;}
.st1{fill:#17541F;stroke:#000000;stroke-miterlimit:10;}
</style>
<path class="st0" d="M1055.6,639.7v-20.6c-18,20-43.1,30.1-75.4,30.1c-22.4,0-42.8-5.8-61-17.5c-18.3-11.7-32.5-27.8-42.9-48.3
c-10.3-20.5-15.5-43.3-15.5-68.4c0-25.1,5.2-48,15.5-68.5s24.6-36.6,42.9-48.3s38.6-17.5,61-17.5c32.3,0,57.5,10,75.4,30.1v-20.6
h85.3v249.6L1055.6,639.7L1055.6,639.7z M1059.1,514.9c0-17.4-5.2-31.9-15.5-43.8c-10.3-11.8-23.9-17.7-40.6-17.7
c-16.8,0-30.2,5.9-40.4,17.7c-10.2,11.8-15.3,26.4-15.3,43.8c0,17.4,5.1,31.9,15.3,43.8c10.2,11.8,23.6,17.7,40.4,17.7
c16.8,0,30.3-5.9,40.6-17.7C1054,546.9,1059.1,532.3,1059.1,514.9z"/>
<path class="st0" d="M1417.8,398.2c18.3,11.7,32.5,27.8,42.9,48.3c10.3,20.5,15.5,43.3,15.5,68.5c0,25.1-5.2,48-15.5,68.4
c-10.3,20.5-24.6,36.6-42.9,48.3s-38.6,17.5-61,17.5c-32.3,0-57.5-10-75.4-30.1v165.6h-85.3V390.2h85.3v20.6
c18-20,43.1-30.1,75.4-30.1C1379.2,380.7,1399.5,386.6,1417.8,398.2z M1389.5,514.9c0-17.4-5.1-31.9-15.3-43.8
c-10.2-11.8-23.6-17.7-40.4-17.7s-30.2,5.9-40.4,17.7c-10.2,11.8-15.3,26.4-15.3,43.8c0,17.4,5.1,31.9,15.3,43.8
c10.2,11.8,23.6,17.7,40.4,17.7s30.2-5.9,40.4-17.7S1389.5,532.3,1389.5,514.9z"/>
<path class="st0" d="M1713.6,555.3l53,49.4c-28.1,29.6-66.7,44.4-115.8,44.4c-28.1,0-53-5.8-74.5-17.5s-38.2-27.7-49.8-48
c-11.7-20.3-17.7-43.2-18-68.7c0-24.8,5.9-47.5,17.7-68c11.8-20.5,28.1-36.7,48.7-48.5s43.5-17.7,68.7-17.7
c24.8,0,47.6,6.1,68.2,18.2s37,29.5,49.1,52.3c12.1,22.7,18.2,49.1,18.2,79l-0.4,11.7h-181.8c3.6,11.4,10.5,20.7,20.9,28.1
c10.3,7.3,21.3,11,33,11c14.4,0,26.3-2.2,35.7-6.5C1695.8,570.1,1704.9,563.7,1713.6,555.3z M1596.9,486.2h92.9
c-2.1-12.3-7.5-22.1-16.2-29.4s-18.7-11-30.1-11s-21.5,3.7-30.3,11S1599,473.9,1596.9,486.2z"/>
<path class="st0" d="M1908.8,418.4c7.8-10.8,17.2-19,28.3-24.7s22-8.5,32.8-8.5c11.4,0,20,1.6,26,4.9l-10.8,72.7
c-8.4-2.1-15.7-3.1-22-3.1c-17.1,0-30.4,4.3-39.9,12.8c-9.6,8.5-14.4,24.2-14.4,46.9v120.3h-85.3V390.2h85.3V418.4L1908.8,418.4z"/>
<path class="st0" d="M2113,258.2v381.5h-85.3V258.2H2113z"/>
<path class="st0" d="M2360.8,555.3l53,49.4c-28.1,29.6-66.7,44.4-115.8,44.4c-28.1,0-53-5.8-74.5-17.5s-38.2-27.7-49.8-48
c-11.7-20.3-17.7-43.2-18-68.7c0-24.8,5.9-47.5,17.7-68s28.1-36.7,48.7-48.5c20.6-11.8,43.5-17.7,68.7-17.7
c24.8,0,47.6,6.1,68.2,18.2c20.6,12.1,37,29.5,49.1,52.3c12.1,22.7,18.2,49.1,18.2,79l-0.4,11.7h-181.8
c3.6,11.4,10.5,20.7,20.9,28.1c10.3,7.3,21.3,11,33,11c14.4,0,26.3-2.2,35.7-6.5C2343.1,570.1,2352.1,563.7,2360.8,555.3z
M2244.1,486.2h92.9c-2.1-12.3-7.5-22.1-16.2-29.4s-18.7-11-30.1-11s-21.5,3.7-30.3,11C2251.7,464.1,2246.2,473.9,2244.1,486.2z"/>
<path class="st0" d="M2565.9,446.3c-9.9,0-17.1,1.1-21.5,3.4c-4.5,2.2-6.7,5.9-6.7,11s3.4,8.8,10.3,11.2c6.9,2.4,18,4.9,33.2,7.6
c20,3,37,6.7,50.9,11.2s26,12.1,36.1,22.9c10.2,10.8,15.3,25.9,15.3,45.3c0,29.9-10.9,52.4-32.8,67.6
c-21.8,15.1-50.3,22.7-85.3,22.7c-25.7,0-49.5-3.7-71.4-11c-21.8-7.3-37.4-14.7-46.7-22.2l33.7-60.6c10.2,9,23.4,15.8,39.7,20.4
c16.3,4.6,31.3,7,45.1,7c19.7,0,29.6-5.2,29.6-15.7c0-5.4-3.3-9.4-9.9-11.9c-6.6-2.5-17.2-5.2-31.9-7.9c-18.9-3.3-34.9-7.2-48-11.7
c-13.2-4.5-24.6-12.2-34.3-23.1c-9.7-10.9-14.6-26-14.6-45.1c0-27.2,9.7-48.5,29-63.7c19.3-15.3,46-22.9,80.1-22.9
c23.3,0,44.4,3.6,63.3,10.8c18.9,7.2,34,14.5,45.3,22l-32.8,58.8c-10.8-7.5-23.2-13.7-37.3-18.6
C2590.5,448.7,2577.6,446.3,2565.9,446.3z"/>
<path class="st0" d="M2817.3,446.3c-9.9,0-17.1,1.1-21.5,3.4c-4.5,2.2-6.7,5.9-6.7,11s3.4,8.8,10.3,11.2c6.9,2.4,18,4.9,33.2,7.6
c20,3,37,6.7,50.9,11.2s26,12.1,36.1,22.9c10.2,10.8,15.3,25.9,15.3,45.3c0,29.9-10.9,52.4-32.8,67.6
c-21.8,15.1-50.3,22.7-85.3,22.7c-25.7,0-49.5-3.7-71.4-11c-21.8-7.3-37.4-14.7-46.7-22.2l33.7-60.6c10.2,9,23.4,15.8,39.7,20.4
c16.3,4.6,31.3,7,45.1,7c19.8,0,29.6-5.2,29.6-15.7c0-5.4-3.3-9.4-9.9-11.9c-6.6-2.5-17.2-5.2-31.9-7.9c-18.9-3.3-34.9-7.2-48-11.7
c-13.2-4.5-24.6-12.2-34.3-23.1c-9.7-10.9-14.6-26-14.6-45.1c0-27.2,9.7-48.5,29-63.7c19.3-15.3,46-22.9,80.1-22.9
c23.3,0,44.4,3.6,63.3,10.8c18.9,7.2,34,14.5,45.3,22l-32.8,58.8c-10.8-7.5-23.2-13.7-37.3-18.6
C2841.8,448.7,2828.9,446.3,2817.3,446.3z"/>
<g>
<path class="st0" d="M2508,724h60.2v17.3H2508V724z"/>
<path class="st0" d="M2629.2,694.4c4.9-2,10.2-3.1,16-3.1c10.9,0,19.5,3.4,25.9,10.2s9.6,16.7,9.6,29.6v57.3h-19.6v-52.6
c0-9.3-1.7-16.2-5.1-20.7c-3.4-4.5-9.1-6.7-17-6.7c-6.5,0-11.8,2.4-16.1,7.1c-4.3,4.8-6.4,11.5-6.4,20.2v52.6h-19.6v-94.6h19.6v9.5
C2620.2,699.4,2624.4,696.4,2629.2,694.4z"/>
<path class="st0" d="M2790.3,833.2c-8.6,6.8-19.4,10.2-32.3,10.2c-7.9,0-15.2-1.4-21.9-4.1s-12.1-6.8-16.3-12.2s-6.6-11.9-7.1-19.6
h19.6c0.7,6.1,3.5,10.8,8.4,13.9c4.9,3.2,10.7,4.8,17.4,4.8c7,0,13.1-2,18.2-6c5.1-4,7.7-10.3,7.7-18.9v-24.7
c-3.6,3.4-8,6.2-13.3,8.2c-5.2,2.1-10.7,3.1-16.3,3.1c-8.7,0-16.6-2.1-23.7-6.4c-7.1-4.3-12.6-10-16.7-17.3c-4-7.3-6-15.5-6-24.6
s2-17.3,6-24.7s9.6-13.2,16.7-17.4c7.1-4.3,15-6.4,23.7-6.4c5.7,0,11.1,1,16.3,3.1s9.6,4.8,13.3,8.2v-8.8h19.4v107.8
C2803.2,815.9,2798.9,826.4,2790.3,833.2z M2782.2,755.7c2.6-4.7,3.8-10,3.8-15.9s-1.3-11.2-3.8-16c-2.6-4.8-6.1-8.5-10.5-11.1
c-4.5-2.7-9.5-4-15.1-4c-5.8,0-10.9,1.4-15.4,4.3c-4.5,2.8-7.9,6.6-10.3,11.4c-2.4,4.8-3.6,9.9-3.6,15.5c0,5.4,1.2,10.5,3.6,15.3
c2.4,4.8,5.8,8.6,10.3,11.5s9.6,4.3,15.4,4.3c5.6,0,10.6-1.4,15.1-4.1C2776.1,764.1,2779.6,760.4,2782.2,755.7z"/>
<path class="st0" d="M2843.5,788.4h-21.6l37.9-48l-36.4-46.6h22.6l25.7,33.3l25.8-33.3h21.6l-36.2,45.9l37.9,48.6h-22.6l-27.4-35
L2843.5,788.4z"/>
</g>
<path class="st0" d="M835.8,319.2c-11.5-18.9-27.4-33.7-47.6-44.7c-20.2-10.9-43-16.4-68.5-16.4h-90.6c-8.6,39.6-21.3,77.2-38,112.4
c-10,21-21.3,41-33.9,59.9v209.2H647v-135h72.7c25.4,0,48.3-5.5,68.5-16.4s36.1-25.8,47.6-44.7c11.5-18.9,17.3-39.5,17.3-61.9
C853.1,358.9,847.4,338.1,835.8,319.2z M747,416.6c-9.4,9-21.8,13.5-37,13.5l-62.8,0.4v-93.4l62.8-0.4c15.3,0,27.6,4.5,37,13.5
s14.1,20,14.1,33.2C761.1,396.6,756.4,407.7,747,416.6z"/>
<path class="st1" d="M164.7,698.7c-3.5-16.5-10.4-49.6-11.3-49.6c-147.1-88-129.7-240.3-81-327.4C82.8,431.4,277,507.1,163.8,641.2
c-0.9,1.7,5.2,22.6,10.4,41.8c22.6-38.3,56.6-84.4,54.8-88.8C89.7,254.7,525,228.6,615.5,17.9c40.9,203.7-20.9,518.9-370.8,599
c-1.7,0.9-63.5,109.7-66.2,110.6c0-1.7-26.1-0.9-22.6-9.6C157.8,712.6,161.2,705.7,164.7,698.7L164.7,698.7z M160.4,616.9
c44.4-51.4-7.8-139.3-39.2-168C174.3,540.2,170.8,593.3,160.4,616.9L160.4,616.9z"/>
</svg>

Before

Width:  |  Height:  |  Size: 6.5 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 740 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 383 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 704 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 474 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 616 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 708 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 705 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 480 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 689 KiB

Some files were not shown because too many files have changed in this diff Show More