1 Commit

Author | SHA1 | Message | Date
shamoon | 8ea2f56925 | Enhancement: collapsible sidebar menus | 2025-09-20 10:16:36 -07:00
23 changed files with 125 additions and 542 deletions

@@ -2,11 +2,9 @@
If you feel like contributing to the project, please do! Bug fixes and improvements are always welcome. If you feel like contributing to the project, please do! Bug fixes and improvements are always welcome.
⚠️ Please note: Pull requests that implement a new feature or enhancement _should almost always target an existing feature request_ with evidence of community interest and discussion. This is in order to balance the work of implementing and maintaining new features / enhancements. Pull requests that are opened without meeting this requirement may not be merged.
If you want to implement something big: If you want to implement something big:
- As above, please start with a discussion! Maybe something similar is already in development and we can make it happen together. - Please start a discussion about that in the issues! Maybe something similar is already in development and we can make it happen together.
- When making additions to the project, consider if the majority of users will benefit from your change. If not, you're probably better of forking the project. - When making additions to the project, consider if the majority of users will benefit from your change. If not, you're probably better of forking the project.
- Also consider if your change will get in the way of other users. A good change is a change that enhances the experience of some users who want that change and does not affect users who do not care about the change. - Also consider if your change will get in the way of other users. A good change is a change that enhances the experience of some users who want that change and does not affect users who do not care about the change.
- Please see the [paperless-ngx merge process](#merging-prs) below. - Please see the [paperless-ngx merge process](#merging-prs) below.

@@ -1759,11 +1759,6 @@ started by the container.
: Path to an image file in the /media/logo directory, must include 'logo', e.g. `/logo/Atari_logo.svg` : Path to an image file in the /media/logo directory, must include 'logo', e.g. `/logo/Atari_logo.svg`
!!! note
The logo file will be viewable by anyone with access to the Paperless instance login page,
so consider your choice of logo carefully and removing exif data from images before uploading.
#### [`PAPERLESS_ENABLE_UPDATE_CHECK=<bool>`](#PAPERLESS_ENABLE_UPDATE_CHECK) {#PAPERLESS_ENABLE_UPDATE_CHECK} #### [`PAPERLESS_ENABLE_UPDATE_CHECK=<bool>`](#PAPERLESS_ENABLE_UPDATE_CHECK) {#PAPERLESS_ENABLE_UPDATE_CHECK}
!!! note !!! note
@@ -1805,23 +1800,3 @@ password. All of these options come from their similarly-named [Django settings]
#### [`PAPERLESS_EMAIL_USE_SSL=<bool>`](#PAPERLESS_EMAIL_USE_SSL) {#PAPERLESS_EMAIL_USE_SSL} #### [`PAPERLESS_EMAIL_USE_SSL=<bool>`](#PAPERLESS_EMAIL_USE_SSL) {#PAPERLESS_EMAIL_USE_SSL}
: Defaults to false. : Defaults to false.
## Remote OCR
#### [`PAPERLESS_REMOTE_OCR_ENGINE=<str>`](#PAPERLESS_REMOTE_OCR_ENGINE) {#PAPERLESS_REMOTE_OCR_ENGINE}
: The remote OCR engine to use. Currently only Azure AI is supported as "azureai".
Defaults to None, which disables remote OCR.
#### [`PAPERLESS_REMOTE_OCR_API_KEY=<str>`](#PAPERLESS_REMOTE_OCR_API_KEY) {#PAPERLESS_REMOTE_OCR_API_KEY}
: The API key to use for the remote OCR engine.
Defaults to None.
#### [`PAPERLESS_REMOTE_OCR_ENDPOINT=<str>`](#PAPERLESS_REMOTE_OCR_ENDPOINT) {#PAPERLESS_REMOTE_OCR_ENDPOINT}
: The endpoint to use for the remote OCR engine. This is required for Azure AI.
Defaults to None.

@@ -25,10 +25,9 @@ physical documents into a searchable online archive so you can keep, well, _less
## Features ## Features
- **Organize and index** your scanned documents with tags, correspondents, types, and more. - **Organize and index** your scanned documents with tags, correspondents, types, and more.
- _Your_ data is stored locally on _your_ server and is never transmitted or shared in any way, unless you explicitly choose to do so. - _Your_ data is stored locally on _your_ server and is never transmitted or shared in any way.
- Performs **OCR** on your documents, adding searchable and selectable text, even to documents scanned with only images. - Performs **OCR** on your documents, adding searchable and selectable text, even to documents scanned with only images.
- Utilizes the open-source Tesseract engine to recognize more than 100 languages. - Utilizes the open-source Tesseract engine to recognize more than 100 languages.
- _New!_ Supports remote OCR with Azure AI (opt-in).
- Documents are saved as PDF/A format which is designed for long term storage, alongside the unaltered originals. - Documents are saved as PDF/A format which is designed for long term storage, alongside the unaltered originals.
- Uses machine-learning to automatically add tags, correspondents and document types to your documents. - Uses machine-learning to automatically add tags, correspondents and document types to your documents.
- Supports PDF documents, images, plain text files, Office documents (Word, Excel, PowerPoint, and LibreOffice equivalents)[^1] and more. - Supports PDF documents, images, plain text files, Office documents (Word, Excel, PowerPoint, and LibreOffice equivalents)[^1] and more.

@@ -878,21 +878,6 @@ how regularly you intend to scan documents and use paperless.
performed the task associated with the document, move it to the performed the task associated with the document, move it to the
inbox. inbox.
## Remote OCR
!!! important
This feature is disabled by default and will always remain strictly "opt-in".
Paperless-ngx supports performing OCR on documents using remote services. At the moment, this is limited to
[Microsoft's Azure "Document Intelligence" service](https://azure.microsoft.com/en-us/products/ai-services/ai-document-intelligence).
This is of course a paid service (with a free tier) which requires an Azure account and subscription. Azure AI is not affiliated with
Paperless-ngx in any way. When enabled, Paperless-ngx will automatically send appropriate documents to Azure for OCR processing, bypassing
the local OCR engine. See the [configuration](configuration.md#PAPERLESS_REMOTE_OCR_ENGINE) options for more details.
Additionally, when using a commercial service with this feature, consider both potential costs as well as any associated file size
or page limitations (e.g. with a free tier).
## Architecture ## Architecture
Paperless-ngx consists of the following components: Paperless-ngx consists of the following components:

@@ -15,7 +15,6 @@ classifiers = [
# This will allow testing to not install a webserver, mysql, etc # This will allow testing to not install a webserver, mysql, etc
dependencies = [ dependencies = [
"azure-ai-documentintelligence>=1.0.2",
"babel>=2.17", "babel>=2.17",
"bleach~=6.2.0", "bleach~=6.2.0",
"celery[redis]~=5.5.1", "celery[redis]~=5.5.1",
@@ -234,7 +233,6 @@ testpaths = [
"src/paperless_tesseract/tests/", "src/paperless_tesseract/tests/",
"src/paperless_tika/tests", "src/paperless_tika/tests",
"src/paperless_text/tests/", "src/paperless_text/tests/",
"src/paperless_remote/tests/",
] ]
addopts = [ addopts = [
"--pythonwarnings=all", "--pythonwarnings=all",

@@ -166,10 +166,13 @@
</div> </div>
<div class="nav-group mt-3 mb-1"> <div class="nav-group mt-3 mb-1">
<h6 class="sidebar-heading px-3 text-muted"> <h6 class="sidebar-heading px-3 text-muted d-flex align-items-center">
<span i18n>Manage</span> <span i18n>Manage</span>
<button class="btn btn-link p-2 py-0" (click)="manageCollapse.toggle()">
<i-bs width="0.9em" height="0.9em" [name]="isManageMenuCollapsed ? 'chevron-down' : 'chevron-up'"></i-bs>
</button>
</h6> </h6>
<ul class="nav flex-column mb-2"> <ul class="nav flex-column mb-2" #manageCollapse="ngbCollapse" [(ngbCollapse)]="isManageMenuCollapsed">
<li class="nav-item app-link" <li class="nav-item app-link"
*pngxIfPermissions="{ action: PermissionAction.View, type: PermissionType.Correspondent }"> *pngxIfPermissions="{ action: PermissionAction.View, type: PermissionType.Correspondent }">
<a class="nav-link" routerLink="correspondents" routerLinkActive="active" (click)="closeMenu()" <a class="nav-link" routerLink="correspondents" routerLinkActive="active" (click)="closeMenu()"
@@ -243,117 +246,124 @@
</div> </div>
<div class="nav-group mt-auto mb-1"> <div class="nav-group mt-auto mb-1">
<h6 class="sidebar-heading px-3 pt-4 text-muted"> <h6 class="sidebar-heading px-3 pt-4 text-muted d-flex align-items-center">
<span i18n>Administration</span> <span i18n>Administration</span>
<button class="btn btn-link p-2 py-0" (click)="adminCollapse.toggle()">
<i-bs width="0.9em" height="0.9em" [name]="isAdminMenuCollapsed ? 'chevron-down' : 'chevron-up'"></i-bs>
</button>
</h6> </h6>
<ul class="nav flex-column mb-2"> <div class="mb-2">
<li class="nav-item app-link" *pngxIfPermissions="{ action: PermissionAction.Change, type: PermissionType.UISettings }" <ul class="nav flex-column" #adminCollapse="ngbCollapse" [(ngbCollapse)]="isAdminMenuCollapsed">
tourAnchor="tour.settings"> <li class="nav-item app-link" *pngxIfPermissions="{ action: PermissionAction.Change, type: PermissionType.UISettings }"
<a class="nav-link" routerLink="settings" routerLinkActive="active" (click)="closeMenu()" tourAnchor="tour.settings">
ngbPopover="Settings" i18n-ngbPopover [disablePopover]="!slimSidebarEnabled" placement="end" <a class="nav-link" routerLink="settings" routerLinkActive="active" (click)="closeMenu()"
container="body" triggers="mouseenter:mouseleave" popoverClass="popover-slim"> ngbPopover="Settings" i18n-ngbPopover [disablePopover]="!slimSidebarEnabled" placement="end"
<i-bs class="me-1" name="gear"></i-bs><span>&nbsp;<ng-container i18n>Settings</ng-container></span> container="body" triggers="mouseenter:mouseleave" popoverClass="popover-slim">
</a> <i-bs class="me-1" name="gear"></i-bs><span>&nbsp;<ng-container i18n>Settings</ng-container></span>
</li>
<li class="nav-item app-link" *pngxIfPermissions="{ action: PermissionAction.Change, type: PermissionType.AppConfig }">
<a class="nav-link" routerLink="config" routerLinkActive="active" (click)="closeMenu()"
ngbPopover="Configuration" i18n-ngbPopover [disablePopover]="!slimSidebarEnabled" placement="end"
container="body" triggers="mouseenter:mouseleave" popoverClass="popover-slim">
<i-bs class="me-1" name="sliders2-vertical"></i-bs><span>&nbsp;<ng-container i18n>Configuration</ng-container></span>
</a>
</li>
<li class="nav-item app-link" *pngxIfPermissions="{ action: PermissionAction.View, type: PermissionType.User }">
<a class="nav-link" routerLink="usersgroups" routerLinkActive="active" (click)="closeMenu()"
ngbPopover="Users & Groups" i18n-ngbPopover [disablePopover]="!slimSidebarEnabled" placement="end"
container="body" triggers="mouseenter:mouseleave" popoverClass="popover-slim">
<i-bs class="me-1" name="people"></i-bs><span>&nbsp;<ng-container i18n>Users & Groups</ng-container></span>
</a>
</li>
<li class="nav-item app-link"
*pngxIfPermissions="{ action: PermissionAction.View, type: PermissionType.PaperlessTask }"
tourAnchor="tour.file-tasks">
<a class="nav-link" routerLink="tasks" routerLinkActive="active" (click)="closeMenu()"
ngbPopover="File Tasks" i18n-ngbPopover [disablePopover]="!slimSidebarEnabled" placement="end"
container="body" triggers="mouseenter:mouseleave" popoverClass="popover-slim">
<i-bs class="me-1" name="list-task"></i-bs><span>&nbsp;<ng-container i18n>File Tasks</ng-container>@if (tasksService.failedFileTasks.length > 0) {
<span><span class="badge bg-danger ms-2 d-inline">{{tasksService.failedFileTasks.length}}</span></span>
}</span>
@if (tasksService.failedFileTasks.length > 0 && slimSidebarEnabled) {
<span class="badge bg-danger position-absolute top-0 end-0 d-none d-md-block">{{tasksService.failedFileTasks.length}}</span>
}
</a>
</li>
@if (permissionsService.isAdmin()) {
<li class="nav-item app-link">
<a class="nav-link" routerLink="logs" routerLinkActive="active" (click)="closeMenu()" ngbPopover="Logs"
i18n-ngbPopover [disablePopover]="!slimSidebarEnabled" placement="end" container="body"
triggers="mouseenter:mouseleave" popoverClass="popover-slim">
<i-bs class="me-1" name="text-left"></i-bs><span>&nbsp;<ng-container i18n>Logs</ng-container></span>
</a> </a>
</li> </li>
} <li class="nav-item app-link" *pngxIfPermissions="{ action: PermissionAction.Change, type: PermissionType.AppConfig }">
<li class="nav-item mt-2" tourAnchor="tour.outro"> <a class="nav-link" routerLink="config" routerLinkActive="active" (click)="closeMenu()"
<a class="px-3 py-2 text-muted small d-flex align-items-center flex-wrap text-decoration-none" ngbPopover="Configuration" i18n-ngbPopover [disablePopover]="!slimSidebarEnabled" placement="end"
target="_blank" rel="noopener noreferrer" href="https://docs.paperless-ngx.com" ngbPopover="Documentation" container="body" triggers="mouseenter:mouseleave" popoverClass="popover-slim">
i18n-ngbPopover [disablePopover]="!slimSidebarEnabled" placement="end" container="body" <i-bs class="me-1" name="sliders2-vertical"></i-bs><span>&nbsp;<ng-container i18n>Configuration</ng-container></span>
triggers="mouseenter:mouseleave" popoverClass="popover-slim"> </a>
<i-bs class="d-flex" name="question-circle"></i-bs><span class="ms-1">&nbsp;<ng-container i18n>Documentation</ng-container></span> </li>
</a> <li class="nav-item app-link" *pngxIfPermissions="{ action: PermissionAction.View, type: PermissionType.User }">
</li> <a class="nav-link" routerLink="usersgroups" routerLinkActive="active" (click)="closeMenu()"
<li class="nav-item" [class.visually-hidden]="slimSidebarEnabled"> ngbPopover="Users & Groups" i18n-ngbPopover [disablePopover]="!slimSidebarEnabled" placement="end"
<div class="px-3 py-0 text-muted small d-flex align-items-center flex-wrap"> container="body" triggers="mouseenter:mouseleave" popoverClass="popover-slim">
<div class="me-3"> <i-bs class="me-1" name="people"></i-bs><span>&nbsp;<ng-container i18n>Users & Groups</ng-container></span>
<a class="text-muted text-decoration-none" target="_blank" rel="noopener noreferrer" </a>
href="https://github.com/paperless-ngx/paperless-ngx" ngbPopover="GitHub" i18n-ngbPopover </li>
[disablePopover]="!slimSidebarEnabled" placement="end" container="body" <li class="nav-item app-link"
*pngxIfPermissions="{ action: PermissionAction.View, type: PermissionType.PaperlessTask }"
tourAnchor="tour.file-tasks">
<a class="nav-link" routerLink="tasks" routerLinkActive="active" (click)="closeMenu()"
ngbPopover="File Tasks" i18n-ngbPopover [disablePopover]="!slimSidebarEnabled" placement="end"
container="body" triggers="mouseenter:mouseleave" popoverClass="popover-slim">
<i-bs class="me-1" name="list-task"></i-bs><span>&nbsp;<ng-container i18n>File Tasks</ng-container>@if (tasksService.failedFileTasks.length > 0) {
<span><span class="badge bg-danger ms-2 d-inline">{{tasksService.failedFileTasks.length}}</span></span>
}</span>
@if (tasksService.failedFileTasks.length > 0 && slimSidebarEnabled) {
<span class="badge bg-danger position-absolute top-0 end-0 d-none d-md-block">{{tasksService.failedFileTasks.length}}</span>
}
</a>
</li>
@if (permissionsService.isAdmin()) {
<li class="nav-item app-link">
<a class="nav-link" routerLink="logs" routerLinkActive="active" (click)="closeMenu()" ngbPopover="Logs"
i18n-ngbPopover [disablePopover]="!slimSidebarEnabled" placement="end" container="body"
triggers="mouseenter:mouseleave" popoverClass="popover-slim"> triggers="mouseenter:mouseleave" popoverClass="popover-slim">
{{ versionString }} <i-bs class="me-1" name="text-left"></i-bs><span>&nbsp;<ng-container i18n>Logs</ng-container></span>
</a> </a>
</div> </li>
@if (!settingsService.updateCheckingIsSet || appRemoteVersion) { }
<div class="version-check"> </ul>
<ng-template #updateAvailablePopContent> <ul class="nav flex-column">
<span class="small">Paperless-ngx {{ appRemoteVersion.version }} <ng-container i18n>is <li class="nav-item mt-2" tourAnchor="tour.outro">
available.</ng-container><br /><ng-container i18n>Click to view.</ng-container></span> <a class="px-3 py-2 text-muted small d-flex align-items-center flex-wrap text-decoration-none"
</ng-template> target="_blank" rel="noopener noreferrer" href="https://docs.paperless-ngx.com" ngbPopover="Documentation"
<ng-template #updateCheckingNotEnabledPopContent> i18n-ngbPopover [disablePopover]="!slimSidebarEnabled" placement="end" container="body"
<p class="small mb-2"> triggers="mouseenter:mouseleave" popoverClass="popover-slim">
<ng-container i18n>Paperless-ngx can automatically check for updates</ng-container> <i-bs class="d-flex" name="question-circle"></i-bs><span class="ms-1">&nbsp;<ng-container i18n>Documentation</ng-container></span>
</p> </a>
<div class="btn-group btn-group-xs flex-fill w-100"> </li>
<button class="btn btn-outline-primary" (click)="setUpdateChecking(true)">Enable</button> <li class="nav-item" [class.visually-hidden]="slimSidebarEnabled">
<button class="btn btn-outline-secondary" (click)="setUpdateChecking(false)">Disable</button> <div class="px-3 py-0 text-muted small d-flex align-items-center flex-wrap">
</div> <div class="me-3">
<p class="small mb-0 mt-2"> <a class="text-muted text-decoration-none" target="_blank" rel="noopener noreferrer"
<a class="small text-decoration-none fst-italic" routerLink="/settings" fragment="update-checking" i18n> href="https://github.com/paperless-ngx/paperless-ngx" ngbPopover="GitHub" i18n-ngbPopover
How does this work? [disablePopover]="!slimSidebarEnabled" placement="end" container="body"
</a> triggers="mouseenter:mouseleave" popoverClass="popover-slim">
</p> {{ versionString }}
</ng-template> </a>
@if (settingsService.updateCheckingIsSet) { </div>
@if (appRemoteVersion.update_available) { @if (!settingsService.updateCheckingIsSet || appRemoteVersion) {
<a class="small text-decoration-none" target="_blank" rel="noopener noreferrer" <div class="version-check">
href="https://github.com/paperless-ngx/paperless-ngx/releases" <ng-template #updateAvailablePopContent>
[ngbPopover]="updateAvailablePopContent" popoverClass="shadow" triggers="mouseenter:mouseleave" <span class="small">Paperless-ngx {{ appRemoteVersion.version }} <ng-container i18n>is
available.</ng-container><br /><ng-container i18n>Click to view.</ng-container></span>
</ng-template>
<ng-template #updateCheckingNotEnabledPopContent>
<p class="small mb-2">
<ng-container i18n>Paperless-ngx can automatically check for updates</ng-container>
</p>
<div class="btn-group btn-group-xs flex-fill w-100">
<button class="btn btn-outline-primary" (click)="setUpdateChecking(true)">Enable</button>
<button class="btn btn-outline-secondary" (click)="setUpdateChecking(false)">Disable</button>
</div>
<p class="small mb-0 mt-2">
<a class="small text-decoration-none fst-italic" routerLink="/settings" fragment="update-checking" i18n>
How does this work?
</a>
</p>
</ng-template>
@if (settingsService.updateCheckingIsSet) {
@if (appRemoteVersion.update_available) {
<a class="small text-decoration-none" target="_blank" rel="noopener noreferrer"
href="https://github.com/paperless-ngx/paperless-ngx/releases"
[ngbPopover]="updateAvailablePopContent" popoverClass="shadow" triggers="mouseenter:mouseleave"
container="body">
<i-bs width="1.2em" height="1.2em" name="info-circle"></i-bs>
@if (appRemoteVersion?.update_available) {
&nbsp;<ng-container i18n>Update available</ng-container>
}
</a>
}
} @else {
<a *pngxIfPermissions="{ action: PermissionAction.Change, type: PermissionType.UISettings }" class="small text-decoration-none" routerLink="/settings" fragment="update-checking"
[ngbPopover]="updateCheckingNotEnabledPopContent" popoverClass="shadow" triggers="mouseenter"
container="body"> container="body">
<i-bs width="1.2em" height="1.2em" name="info-circle"></i-bs> <i-bs width="1.2em" height="1.2em" name="info-circle"></i-bs>
@if (appRemoteVersion?.update_available) {
&nbsp;<ng-container i18n>Update available</ng-container>
}
</a> </a>
} }
} @else { </div>
<a *pngxIfPermissions="{ action: PermissionAction.Change, type: PermissionType.UISettings }" class="small text-decoration-none" routerLink="/settings" fragment="update-checking" }
[ngbPopover]="updateCheckingNotEnabledPopContent" popoverClass="shadow" triggers="mouseenter" </div>
container="body"> </li>
<i-bs width="1.2em" height="1.2em" name="info-circle"></i-bs> </ul>
</a> </div>
}
</div>
}
</div>
</li>
</ul>
</div> </div>
</div> </div>
</nav> </nav>

@@ -89,6 +89,8 @@ export class AppFrameComponent
appRemoteVersion: AppRemoteVersion appRemoteVersion: AppRemoteVersion
isMenuCollapsed: boolean = true isMenuCollapsed: boolean = true
isManageMenuCollapsed: boolean = false
isAdminMenuCollapsed: boolean = false
slimSidebarAnimating: boolean = false slimSidebarAnimating: boolean = false

@@ -71,20 +71,4 @@ describe('TagListComponent', () => {
'Do you really want to delete the tag "Tag1"?' 'Do you really want to delete the tag "Tag1"?'
) )
}) })
it('should filter out child tags if name filter is empty, otherwise show all', () => {
const tags = [
{ id: 1, name: 'Tag1', parent: null },
{ id: 2, name: 'Tag2', parent: 1 },
{ id: 3, name: 'Tag3', parent: null },
]
component['_nameFilter'] = null // Simulate empty name filter
const filtered = component.filterData(tags as any)
expect(filtered.length).toBe(2)
expect(filtered.find((t) => t.id === 2)).toBeUndefined()
component['_nameFilter'] = 'Tag2' // Simulate non-empty name filter
const filteredWithName = component.filterData(tags as any)
expect(filteredWithName.length).toBe(3)
})
}) })

@@ -62,8 +62,6 @@ export class TagListComponent extends ManagementListComponent<Tag> {
} }
filterData(data: Tag[]) { filterData(data: Tag[]) {
return this.nameFilter?.length return data.filter((tag) => !tag.parent)
? [...data]
: data.filter((tag) => !tag.parent)
} }
} }

@@ -55,7 +55,9 @@ import {
checkLg, checkLg,
chevronDoubleLeft, chevronDoubleLeft,
chevronDoubleRight, chevronDoubleRight,
chevronDown,
chevronRight, chevronRight,
chevronUp,
clipboard, clipboard,
clipboardCheck, clipboardCheck,
clipboardCheckFill, clipboardCheckFill,
@@ -267,7 +269,9 @@ const icons = {
checkLg, checkLg,
chevronDoubleLeft, chevronDoubleLeft,
chevronDoubleRight, chevronDoubleRight,
chevronDown,
chevronRight, chevronRight,
chevronUp,
clipboard, clipboard,
clipboardCheck, clipboardCheck,
clipboardCheckFill, clipboardCheckFill,

@@ -82,13 +82,6 @@ def _is_ignored(filepath: Path) -> bool:
def _consume(filepath: Path) -> None: def _consume(filepath: Path) -> None:
# Check permissions early
try:
filepath.stat()
except (PermissionError, OSError):
logger.warning(f"Not consuming file {filepath}: Permission denied.")
return
if filepath.is_dir() or _is_ignored(filepath): if filepath.is_dir() or _is_ignored(filepath):
return return
@@ -330,12 +323,7 @@ class Command(BaseCommand):
# Also make sure the file exists still, some scanners might write a # Also make sure the file exists still, some scanners might write a
# temporary file first # temporary file first
try: file_still_exists = filepath.exists() and filepath.is_file()
file_still_exists = filepath.exists() and filepath.is_file()
except (PermissionError, OSError): # pragma: no cover
# If we can't check, let it fail in the _consume function
file_still_exists = True
continue
if waited_long_enough and file_still_exists: if waited_long_enough and file_still_exists:
_consume(filepath) _consume(filepath)
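
The early permission check that this hunk removes from `_consume` can be sketched in isolation as below. This is a minimal sketch rather than the project's code: `is_statable` and the logger name are hypothetical, and the function only mirrors the `filepath.stat()` probe shown on the old side of the diff. Because `PermissionError` is a subclass of `OSError`, catching `OSError` alone covers both exception types listed in the removed code; it also means a file that has disappeared is reported the same way.

```python
import logging
import tempfile
from pathlib import Path

logger = logging.getLogger("consumer_sketch")  # hypothetical logger name


def is_statable(filepath: Path) -> bool:
    """Return True if stat() succeeds, False if the OS refuses or the file is gone."""
    try:
        filepath.stat()
    except OSError:
        # PermissionError and FileNotFoundError are both OSError subclasses.
        logger.warning("Not consuming file %s: stat() failed.", filepath)
        return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        existing = Path(tmp) / "scan.pdf"
        existing.touch()
        print(is_statable(existing))
        print(is_statable(Path(tmp) / "missing.pdf"))
```

A caller would return early when the helper reports `False`, which is what the removed block did before reaching the `is_dir()` and `_is_ignored()` checks.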

@@ -209,26 +209,6 @@ class TestConsumer(DirectoriesMixin, ConsumerThreadMixin, TransactionTestCase):
# assert that we have an error logged with this invalid file. # assert that we have an error logged with this invalid file.
error_logger.assert_called_once() error_logger.assert_called_once()
@mock.patch("documents.management.commands.document_consumer.logger.warning")
def test_permission_error_on_prechecks(self, warning_logger):
filepath = Path(self.dirs.consumption_dir) / "selinux.txt"
filepath.touch()
original_stat = Path.stat
def raising_stat(self, *args, **kwargs):
if self == filepath:
raise PermissionError("Permission denied")
return original_stat(self, *args, **kwargs)
with mock.patch("pathlib.Path.stat", new=raising_stat):
document_consumer._consume(filepath)
warning_logger.assert_called_once()
(args, _) = warning_logger.call_args
self.assertIn("Permission denied", args[0])
self.consume_file_mock.assert_not_called()
@override_settings(CONSUMPTION_DIR="does_not_exist") @override_settings(CONSUMPTION_DIR="does_not_exist")
def test_consumption_directory_invalid(self): def test_consumption_directory_invalid(self):
self.assertRaises(CommandError, call_command, "document_consumer", "--oneshot") self.assertRaises(CommandError, call_command, "document_consumer", "--oneshot")

@@ -322,7 +322,6 @@ INSTALLED_APPS = [
"paperless_tesseract.apps.PaperlessTesseractConfig", "paperless_tesseract.apps.PaperlessTesseractConfig",
"paperless_text.apps.PaperlessTextConfig", "paperless_text.apps.PaperlessTextConfig",
"paperless_mail.apps.PaperlessMailConfig", "paperless_mail.apps.PaperlessMailConfig",
"paperless_remote.apps.PaperlessRemoteParserConfig",
"django.contrib.admin", "django.contrib.admin",
"rest_framework", "rest_framework",
"rest_framework.authtoken", "rest_framework.authtoken",
@@ -923,7 +922,7 @@ CELERY_ACCEPT_CONTENT = ["application/json", "application/x-python-serialize"]
CELERY_BEAT_SCHEDULE = _parse_beat_schedule() CELERY_BEAT_SCHEDULE = _parse_beat_schedule()
# https://docs.celeryq.dev/en/stable/userguide/configuration.html#beat-schedule-filename # https://docs.celeryq.dev/en/stable/userguide/configuration.html#beat-schedule-filename
CELERY_BEAT_SCHEDULE_FILENAME = str(DATA_DIR / "celerybeat-schedule.db") CELERY_BEAT_SCHEDULE_FILENAME = DATA_DIR / "celerybeat-schedule.db"
# Cachalot: Database read cache. # Cachalot: Database read cache.
@@ -1390,10 +1389,3 @@ WEBHOOKS_ALLOW_INTERNAL_REQUESTS = __get_boolean(
"PAPERLESS_WEBHOOKS_ALLOW_INTERNAL_REQUESTS", "PAPERLESS_WEBHOOKS_ALLOW_INTERNAL_REQUESTS",
"true", "true",
) )
###############################################################################
# Remote Parser #
###############################################################################
REMOTE_OCR_ENGINE = os.getenv("PAPERLESS_REMOTE_OCR_ENGINE")
REMOTE_OCR_API_KEY = os.getenv("PAPERLESS_REMOTE_OCR_API_KEY")
REMOTE_OCR_ENDPOINT = os.getenv("PAPERLESS_REMOTE_OCR_ENDPOINT")

@@ -1,4 +0,0 @@
# this is here so that django finds the checks.
from paperless_remote.checks import check_remote_parser_configured
__all__ = ["check_remote_parser_configured"]

@@ -1,14 +0,0 @@
from django.apps import AppConfig
from paperless_remote.signals import remote_consumer_declaration
class PaperlessRemoteParserConfig(AppConfig):
name = "paperless_remote"
def ready(self):
from documents.signals import document_consumer_declaration
document_consumer_declaration.connect(remote_consumer_declaration)
AppConfig.ready(self)

@@ -1,17 +0,0 @@
from django.conf import settings
from django.core.checks import Error
from django.core.checks import register
@register()
def check_remote_parser_configured(app_configs, **kwargs):
if settings.REMOTE_OCR_ENGINE == "azureai" and not (
settings.REMOTE_OCR_ENDPOINT and settings.REMOTE_OCR_API_KEY
):
return [
Error(
"Azure AI remote parser requires endpoint and API key to be configured.",
),
]
return []
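
The rule enforced by the removed `check_remote_parser_configured` can be illustrated without Django. The sketch below assumes nothing beyond the condition visible in the hunk; `remote_parser_errors` is a hypothetical name, and it returns plain strings instead of `django.core.checks.Error` objects so that it runs standalone.

```python
from __future__ import annotations


def remote_parser_errors(
    engine: str | None,
    endpoint: str | None,
    api_key: str | None,
) -> list[str]:
    """Return configuration error messages; an empty list means no problem was found."""
    # Same condition as the removed check: Azure AI needs both endpoint and key.
    if engine == "azureai" and not (endpoint and api_key):
        return [
            "Azure AI remote parser requires endpoint and API key to be configured.",
        ]
    return []


if __name__ == "__main__":
    print(remote_parser_errors(None, None, None))
    print(remote_parser_errors("azureai", None, "somekey"))
```

Note that an unset engine produces no error, which matches the documented default of remote OCR being disabled.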

@@ -1,113 +0,0 @@
from pathlib import Path
from django.conf import settings
from paperless_tesseract.parsers import RasterisedDocumentParser
class RemoteEngineConfig:
def __init__(
self,
engine: str,
api_key: str | None = None,
endpoint: str | None = None,
):
self.engine = engine
self.api_key = api_key
self.endpoint = endpoint
def engine_is_valid(self):
valid = self.engine in ["azureai"] and self.api_key is not None
if self.engine == "azureai":
valid = valid and self.endpoint is not None
return valid
class RemoteDocumentParser(RasterisedDocumentParser):
"""
This parser uses a remote OCR engine to parse documents. Currently, it supports Azure AI Vision
as this is the only service that provides a remote OCR API with text-embedded PDF output.
"""
logging_name = "paperless.parsing.remote"
def get_settings(self) -> RemoteEngineConfig:
"""
Returns the configuration for the remote OCR engine, loaded from Django settings.
"""
return RemoteEngineConfig(
engine=settings.REMOTE_OCR_ENGINE,
api_key=settings.REMOTE_OCR_API_KEY,
endpoint=settings.REMOTE_OCR_ENDPOINT,
)
def supported_mime_types(self):
if self.settings.engine_is_valid():
return {
"application/pdf": ".pdf",
"image/png": ".png",
"image/jpeg": ".jpg",
"image/tiff": ".tiff",
"image/bmp": ".bmp",
"image/gif": ".gif",
"image/webp": ".webp",
}
else:
return {}
def azure_ai_vision_parse(
self,
file: Path,
) -> str | None:
"""
Uses Azure AI Vision to parse the document and return the text content.
It requests a searchable PDF output with embedded text.
The PDF is saved to the archive_path attribute.
Returns the text content extracted from the document.
If the parsing fails, it returns None.
"""
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import AnalyzeDocumentRequest
from azure.ai.documentintelligence.models import AnalyzeOutputOption
from azure.ai.documentintelligence.models import DocumentContentFormat
from azure.core.credentials import AzureKeyCredential
client = DocumentIntelligenceClient(
endpoint=self.settings.endpoint,
credential=AzureKeyCredential(self.settings.api_key),
)
with file.open("rb") as f:
analyze_request = AnalyzeDocumentRequest(bytes_source=f.read())
poller = client.begin_analyze_document(
model_id="prebuilt-read",
body=analyze_request,
output_content_format=DocumentContentFormat.TEXT,
output=[AnalyzeOutputOption.PDF], # request searchable PDF output
content_type="application/json",
)
poller.wait()
result_id = poller.details["operation_id"]
result = poller.result()
# Download the PDF with embedded text
self.archive_path = self.tempdir / "archive.pdf"
with self.archive_path.open("wb") as f:
for chunk in client.get_analyze_result_pdf(
model_id="prebuilt-read",
result_id=result_id,
):
f.write(chunk)
client.close()
return result.content
def parse(self, document_path: Path, mime_type, file_name=None):
if not self.settings.engine_is_valid():
self.log.warning(
"No valid remote parser engine is configured, content will be empty.",
)
self.text = ""
elif self.settings.engine == "azureai":
self.text = self.azure_ai_vision_parse(document_path)

@@ -1,18 +0,0 @@
def get_parser(*args, **kwargs):
from paperless_remote.parsers import RemoteDocumentParser
return RemoteDocumentParser(*args, **kwargs)
def get_supported_mime_types():
from paperless_remote.parsers import RemoteDocumentParser
return RemoteDocumentParser(None).supported_mime_types()
def remote_consumer_declaration(sender, **kwargs):
return {
"parser": get_parser,
"weight": 5,
"mime_types": get_supported_mime_types(),
}
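
The declaration above carries a `weight` next to the parser factory and the MIME types. Assuming that the consumer picks the highest-weight declaration that supports a given MIME type (an assumption about the surrounding project, not something this diff shows), the selection could be sketched as follows. `pick_parser` and the sample declarations are hypothetical and exist only for illustration.

```python
from typing import Callable, Optional


def pick_parser(declarations: list[dict], mime_type: str) -> Optional[Callable]:
    """Return the parser factory with the highest weight that supports mime_type."""
    candidates = [d for d in declarations if mime_type in d["mime_types"]]
    if not candidates:
        return None
    return max(candidates, key=lambda d: d["weight"])["parser"]


if __name__ == "__main__":
    def local_parser():  # hypothetical stand-in for a local OCR parser factory
        return "local"

    def remote_parser():  # hypothetical stand-in for the remote parser factory
        return "remote"

    declarations = [
        {"parser": local_parser, "weight": 0, "mime_types": {"application/pdf": ".pdf"}},
        {"parser": remote_parser, "weight": 5, "mime_types": {"application/pdf": ".pdf"}},
    ]
    print(pick_parser(declarations, "application/pdf")())
```

Under this assumption, the removed parser opts out cleanly when it is not configured: `supported_mime_types()` returns an empty dict, so its declaration never becomes a candidate.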

@@ -1,24 +0,0 @@
from unittest import TestCase
from django.test import override_settings
from paperless_remote import check_remote_parser_configured
class TestChecks(TestCase):
@override_settings(REMOTE_OCR_ENGINE=None)
def test_no_engine(self):
msgs = check_remote_parser_configured(None)
self.assertEqual(len(msgs), 0)
@override_settings(REMOTE_OCR_ENGINE="azureai")
@override_settings(REMOTE_OCR_API_KEY="somekey")
@override_settings(REMOTE_OCR_ENDPOINT=None)
def test_azure_no_endpoint(self):
msgs = check_remote_parser_configured(None)
self.assertEqual(len(msgs), 1)
self.assertTrue(
msgs[0].msg.startswith(
"Azure AI remote parser requires endpoint and API key to be configured.",
),
)

@@ -1,101 +0,0 @@
import uuid
from pathlib import Path
from unittest import mock

from django.test import TestCase
from django.test import override_settings

from documents.tests.utils import DirectoriesMixin
from documents.tests.utils import FileSystemAssertsMixin
from paperless_remote.parsers import RemoteDocumentParser
from paperless_remote.signals import get_parser


class TestParser(DirectoriesMixin, FileSystemAssertsMixin, TestCase):
    SAMPLE_FILES = Path(__file__).resolve().parent / "samples"

    def assertContainsStrings(self, content: str, strings: list[str]):
        # Asserts that all strings appear in content, in the given order.
        indices = []
        for s in strings:
            if s in content:
                indices.append(content.index(s))
            else:
                self.fail(f"'{s}' is not in '{content}'")
        self.assertListEqual(indices, sorted(indices))

    @mock.patch("paperless_tesseract.parsers.run_subprocess")
    @mock.patch("azure.ai.documentintelligence.DocumentIntelligenceClient")
    def test_get_text_with_azure(self, mock_client_cls, mock_subprocess):
        # Arrange mock Azure client
        mock_client = mock.Mock()
        mock_client_cls.return_value = mock_client

        # Simulate poller result and its `.details`
        mock_poller = mock.Mock()
        mock_poller.wait.return_value = None
        mock_poller.details = {"operation_id": "fake-op-id"}
        mock_client.begin_analyze_document.return_value = mock_poller
        mock_poller.result.return_value.content = "This is a test document."

        # Return dummy PDF bytes
        mock_client.get_analyze_result_pdf.return_value = [
            b"%PDF-",
            b"1.7 ",
            b"FAKEPDF",
        ]

        # Simulate pdftotext by writing dummy text to sidecar file
        def fake_run(cmd, *args, **kwargs):
            with Path(cmd[-1]).open("w", encoding="utf-8") as f:
                f.write("This is a test document.")

        mock_subprocess.side_effect = fake_run

        with override_settings(
            REMOTE_OCR_ENGINE="azureai",
            REMOTE_OCR_API_KEY="somekey",
            REMOTE_OCR_ENDPOINT="https://endpoint.cognitiveservices.azure.com",
        ):
            parser = get_parser(uuid.uuid4())
            parser.parse(
                self.SAMPLE_FILES / "simple-digital.pdf",
                "application/pdf",
            )

            self.assertContainsStrings(
                parser.text.strip(),
                ["This is a test document."],
            )

    @override_settings(
        REMOTE_OCR_ENGINE="azureai",
        REMOTE_OCR_API_KEY="key",
        REMOTE_OCR_ENDPOINT="https://endpoint.cognitiveservices.azure.com",
    )
    def test_supported_mime_types_valid_config(self):
        parser = RemoteDocumentParser(uuid.uuid4())
        expected_types = {
            "application/pdf": ".pdf",
            "image/png": ".png",
            "image/jpeg": ".jpg",
            "image/tiff": ".tiff",
            "image/bmp": ".bmp",
            "image/gif": ".gif",
            "image/webp": ".webp",
        }
        self.assertEqual(parser.supported_mime_types(), expected_types)

    def test_supported_mime_types_invalid_config(self):
        parser = get_parser(uuid.uuid4())
        self.assertEqual(parser.supported_mime_types(), {})

    @override_settings(
        REMOTE_OCR_ENGINE=None,
        REMOTE_OCR_API_KEY=None,
        REMOTE_OCR_ENDPOINT=None,
    )
    def test_parse_with_invalid_config(self):
        parser = get_parser(uuid.uuid4())
        parser.parse(self.SAMPLE_FILES / "simple-digital.pdf", "application/pdf")
        self.assertEqual(parser.text, "")

uv.lock (generated, 39 lines changed)

@@ -95,34 +95,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/af/cc/55a32a2c98022d88812b5986d2a92c4ff3ee087e83b712ebc703bba452bf/Automat-24.8.1-py3-none-any.whl", hash = "sha256:bf029a7bc3da1e2c24da2343e7598affaa9f10bf0ab63ff808566ce90551e02a", size = 42585, upload-time = "2024-08-19T17:31:56.729Z" },
]
[[package]]
name = "azure-ai-documentintelligence"
version = "1.0.2"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "azure-core", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "isodate", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "typing-extensions", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
]
sdist = { url = "https://files.pythonhosted.org/packages/44/7b/8115cd713e2caa5e44def85f2b7ebd02a74ae74d7113ba20bdd41fd6dd80/azure_ai_documentintelligence-1.0.2.tar.gz", hash = "sha256:4d75a2513f2839365ebabc0e0e1772f5601b3a8c9a71e75da12440da13b63484", size = 170940 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/d9/75/c9ec040f23082f54ffb1977ff8f364c2d21c79a640a13d1c1809e7fd6b1a/azure_ai_documentintelligence-1.0.2-py3-none-any.whl", hash = "sha256:e1fb446abbdeccc9759d897898a0fe13141ed29f9ad11fc705f951925822ed59", size = 106005 },
]
[[package]]
name = "azure-core"
version = "1.33.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "requests", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "six", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "typing-extensions", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
]
sdist = { url = "https://files.pythonhosted.org/packages/75/aa/7c9db8edd626f1a7d99d09ef7926f6f4fb34d5f9fa00dc394afdfe8e2a80/azure_core-1.33.0.tar.gz", hash = "sha256:f367aa07b5e3005fec2c1e184b882b0b039910733907d001c20fb08ebb8c0eb9", size = 295633 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/07/b7/76b7e144aa53bd206bf1ce34fa75350472c3f69bf30e5c8c18bc9881035d/azure_core-1.33.0-py3-none-any.whl", hash = "sha256:9b5b6d0223a1d38c37500e6971118c1e0f13f54951e6893968b38910bc9cda8f", size = 207071 },
]
[[package]]
name = "babel"
version = "2.17.0"
@@ -1440,15 +1412,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/c7/fc/4e5a141c3f7c7bed550ac1f69e599e92b6be449dd4677ec09f325cad0955/inotifyrecursive-0.3.5-py3-none-any.whl", hash = "sha256:7e5f4a2e1dc2bef0efa3b5f6b339c41fb4599055a2b54909d020e9e932cc8d2f", size = 8009, upload-time = "2020-11-20T12:38:46.981Z" },
]
[[package]]
name = "isodate"
version = "0.7.2"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/54/4d/e940025e2ce31a8ce1202635910747e5a87cc3a6a6bb2d00973375014749/isodate-0.7.2.tar.gz", hash = "sha256:4cd1aa0f43ca76f4a6c6c0292a85f40b35ec2e43e315b59f06e6d32171a953e6", size = 29705 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/15/aa/0aca39a37d3c7eb941ba736ede56d689e7be91cab5d9ca846bde3999eba6/isodate-0.7.2-py3-none-any.whl", hash = "sha256:28009937d8031054830160fce6d409ed342816b543597cece116d966c6d99e15", size = 22320 },
]
[[package]]
name = "jinja2"
version = "3.1.6"
@@ -2069,7 +2032,6 @@ name = "paperless-ngx"
version = "2.18.4"
source = { virtual = "." }
dependencies = [
{ name = "azure-ai-documentintelligence", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "babel", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "bleach", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "celery", extra = ["redis"], marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
@@ -2207,7 +2169,6 @@ typing = [
[package.metadata]
requires-dist = [
{ name = "azure-ai-documentintelligence", specifier = ">=1.0.2" },
{ name = "babel", specifier = ">=2.17" },
{ name = "bleach", specifier = "~=6.2.0" },
{ name = "celery", extras = ["redis"], specifier = "~=5.5.1" },