Compare commits

...

8 Commits

Author SHA1 Message Date
dependabot[bot]
aab0cc79ca Chore(deps): Update django-allauth[mfa,socialaccount] requirement
Updates the requirements on [django-allauth[mfa,socialaccount]](https://github.com/sponsors/pennersr) to permit the latest version.
- [Commits](https://github.com/sponsors/pennersr/commits)

---
updated-dependencies:
- dependency-name: django-allauth[mfa,socialaccount]
  dependency-version: 65.14.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-02-04 00:04:02 +00:00
Trenton H
fb7abf7a6e Chore: Enable mypy checking in CI (#11991) 2026-02-03 16:02:33 -08:00
GitHub Actions
6ad2fc0356 Auto translate strings 2026-02-03 20:11:13 +00:00
Trenton H
2ec8ec96c8 Feature: Enable users to customize date parsing via plugins (#11931) 2026-02-03 20:09:13 +00:00
Trenton H
276dc13e3f Chore: Fixes the TO filter chaining so it doesn't reset the messages list + deterministic UIDs (#11987) 2026-02-03 11:31:19 -08:00
GitHub Actions
d0c02e7a8d Auto translate strings 2026-02-03 17:33:37 +00:00
shamoon
e45fca475a Feature: password removal workflow action (#11665) 2026-02-03 17:10:07 +00:00
shamoon
63c0e2f72b Documentation: clarify workflow placeholders docs 2026-02-03 08:13:10 -08:00
33 changed files with 4353 additions and 715 deletions


@@ -28,3 +28,4 @@
./resources
# Other stuff
**/*.drawio.png
.mypy_baseline


@@ -99,3 +99,47 @@ jobs:
        run: |
          docker compose --file docker/compose/docker-compose.ci-test.yml logs
          docker compose --file docker/compose/docker-compose.ci-test.yml down
  typing:
    name: Check project typing
    runs-on: ubuntu-24.04
    env:
      DEFAULT_PYTHON: "3.12"
    steps:
      - name: Checkout
        uses: actions/checkout@v6.0.1
      - name: Set up Python
        id: setup-python
        uses: actions/setup-python@v6.2.0
        with:
          python-version: "${{ env.DEFAULT_PYTHON }}"
      - name: Install uv
        uses: astral-sh/setup-uv@v7.2.1
        with:
          version: ${{ env.DEFAULT_UV_VERSION }}
          enable-cache: true
          python-version: ${{ steps.setup-python.outputs.python-version }}
      - name: Install Python dependencies
        run: |
          uv sync \
            --python ${{ steps.setup-python.outputs.python-version }} \
            --group testing \
            --group typing \
            --frozen
      - name: List installed Python dependencies
        run: |
          uv pip list
      - name: Cache Mypy
        uses: actions/cache@v5.0.3
        with:
          path: .mypy_cache
          # Keyed by OS, Python version, and dependency hashes
          key: ${{ runner.os }}-mypy-py${{ env.DEFAULT_PYTHON }}-${{ hashFiles('pyproject.toml', 'uv.lock') }}
          restore-keys: |
            ${{ runner.os }}-mypy-py${{ env.DEFAULT_PYTHON }}-
            ${{ runner.os }}-mypy-
      - name: Check typing
        run: |
          uv run mypy \
            --show-error-codes \
            --warn-unused-configs \
            src/ | uv run mypy-baseline filter

.mypy-baseline.txt (new file, 2499 changes; diff suppressed because it is too large)


@@ -481,3 +481,147 @@ To get started:
5. The project is ready for debugging. Start either the fullstack debug or the individual debug
processes. To spin up the project without debugging, run the task **Project Start: Run all Services**
## Developing Date Parser Plugins
Paperless-ngx uses a plugin system for date parsing, allowing you to extend or replace the default date parsing behavior. Plugins are discovered using [Python entry points](https://setuptools.pypa.io/en/latest/userguide/entry_point.html).
### Creating a Date Parser Plugin
To create a custom date parser plugin, you need to:
1. Create a class that inherits from `DateParserPluginBase`
2. Implement the required abstract method
3. Register your plugin via an entry point
#### 1. Implementing the Parser Class
Your parser must extend `documents.plugins.date_parsing.DateParserPluginBase` and implement the `parse` method:
```python
from collections.abc import Iterator
import datetime

from documents.plugins.date_parsing import DateParserPluginBase


class MyDateParserPlugin(DateParserPluginBase):
    """
    Custom date parser implementation.
    """

    def parse(self, filename: str, content: str) -> Iterator[datetime.datetime]:
        """
        Parse dates from the document's filename and content.

        Args:
            filename: The original filename of the document
            content: The extracted text content of the document

        Yields:
            datetime.datetime: Valid datetime objects found in the document
        """
        # Your parsing logic here
        # Use self.config to access configuration settings

        # Example: parse dates from filename first
        if self.config.filename_date_order:
            # Your filename parsing logic
            yield some_datetime

        # Then parse dates from content
        # Your content parsing logic
        yield another_datetime
```
#### 2. Configuration and Helper Methods
Your parser instance is initialized with a `DateParserConfig` object accessible via `self.config`. This provides:
- `languages: list[str]` - List of language codes for date parsing
- `timezone_str: str` - Timezone string for date localization
- `ignore_dates: set[datetime.date]` - Dates that should be filtered out
- `reference_time: datetime.datetime` - Current time for filtering future dates
- `filename_date_order: str | None` - Date order preference for filenames (e.g., "DMY", "MDY")
- `content_date_order: str` - Date order preference for content
The base class provides two helper methods you can use:
```python
def _parse_string(
    self,
    date_string: str,
    date_order: str,
) -> datetime.datetime | None:
    """
    Parse a single date string using dateparser with configured settings.
    """


def _filter_date(
    self,
    date: datetime.datetime | None,
) -> datetime.datetime | None:
    """
    Validate a parsed datetime against configured rules.
    Filters out dates in or before the year 1900, future dates, and ignored dates.
    """
```
#### 3. Resource Management (Optional)
If your plugin needs to acquire or release resources (database connections, API clients, etc.), override the context manager methods. Paperless-ngx will always use plugins as context managers, ensuring resources can be released even in the event of errors.
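A minimal sketch of overriding the context manager methods might look like the following. The base class here is a reduced stand-in so the example runs on its own, and `FakeClient` and `ResourceDateParserPlugin` are hypothetical names, not part of Paperless-ngx; a real plugin would inherit from `documents.plugins.date_parsing.DateParserPluginBase` instead.

```python
import datetime
from collections.abc import Iterator


class DateParserPluginBase:
    """Stand-in for the real base class, reduced to what this sketch needs."""

    def __init__(self, config=None):
        self.config = config

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb) -> None:
        return None


class FakeClient:
    """Hypothetical resource; a real plugin might hold an API client here."""

    def __init__(self) -> None:
        self.open = True

    def close(self) -> None:
        self.open = False


class ResourceDateParserPlugin(DateParserPluginBase):
    def __enter__(self):
        # Acquire the resource when the context is entered
        self.client = FakeClient()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb) -> None:
        # Release the resource, even if parse() raised.
        # Returning None lets any exception propagate.
        self.client.close()

    def parse(self, filename: str, content: str) -> Iterator[datetime.datetime]:
        # Placeholder logic: a real plugin would query self.client here
        yield datetime.datetime(2024, 1, 2, tzinfo=datetime.timezone.utc)


with ResourceDateParserPlugin() as parser:
    # Same calling pattern as the consumer: take the first date or None
    first = next(parser.parse("scan.pdf", ""), None)
```

After the `with` block exits, `parser.client.open` is `False`, so the resource has been released regardless of how parsing ended.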
#### 4. Registering Your Plugin
Register your plugin using a setuptools entry point in your package's `pyproject.toml`:
```toml
[project.entry-points."paperless_ngx.date_parsers"]
my_parser = "my_package.parsers:MyDateParserPlugin"
```
The entry point name (e.g., `"my_parser"`) is used for sorting when multiple plugins are found. Paperless-ngx will use the first plugin alphabetically by name if multiple plugins are discovered.
### Plugin Discovery
Paperless-ngx automatically discovers and loads date parser plugins at runtime. The discovery process:
1. Queries the `paperless_ngx.date_parsers` entry point group
2. Validates that each plugin is a subclass of `DateParserPluginBase`
3. Sorts valid plugins alphabetically by entry point name
4. Uses the first valid plugin, or falls back to the default `RegexDateParserPlugin` if none are found
If multiple plugins are installed, a warning is logged indicating which plugin was selected.
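The selection steps above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: `FakeEntryPoint` stands in for `importlib.metadata.EntryPoint`, the parser classes are empty placeholders, and `select_parser_class` is a hypothetical name. The real discovery code also logs warnings and handles load errors, which this sketch omits.

```python
from dataclasses import dataclass


class DateParserPluginBase:
    """Placeholder for the real abstract base class."""


class RegexDateParserPlugin(DateParserPluginBase):
    """Placeholder for the default parser."""


class AlphaParser(DateParserPluginBase):
    """Hypothetical third-party plugin."""


class ZetaParser(DateParserPluginBase):
    """Hypothetical third-party plugin."""


class NotAParser:
    """Registered in the group but not a valid plugin."""


@dataclass
class FakeEntryPoint:
    """Stand-in exposing the two members the selection relies on."""

    name: str
    target: type

    def load(self) -> type:
        return self.target


def select_parser_class(eps: list[FakeEntryPoint]) -> type[DateParserPluginBase]:
    # Keep only entry points that load a DateParserPluginBase subclass
    valid = []
    for ep in eps:
        cls = ep.load()
        if isinstance(cls, type) and issubclass(cls, DateParserPluginBase):
            valid.append(ep)

    # Fall back to the default parser when nothing valid was found
    if not valid:
        return RegexDateParserPlugin

    # Sort by entry point name and use the first one
    valid.sort(key=lambda ep: ep.name)
    return valid[0].load()


chosen = select_parser_class(
    [
        FakeEntryPoint("zeta", ZetaParser),
        FakeEntryPoint("bad", NotAParser),
        FakeEntryPoint("alpha", AlphaParser),
    ],
)
fallback = select_parser_class([])
```

Here `chosen` is `AlphaParser` because `"alpha"` sorts first among the valid entries, and `fallback` is `RegexDateParserPlugin`.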
### Example: Simple Date Parser
Here's a minimal example that only looks for ISO 8601 dates:
```python
import datetime
import re
from collections.abc import Iterator

from documents.plugins.date_parsing.base import DateParserPluginBase


class ISODateParserPlugin(DateParserPluginBase):
    """
    Parser that only matches ISO 8601 formatted dates (YYYY-MM-DD).
    """

    ISO_REGEX = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")

    def parse(self, filename: str, content: str) -> Iterator[datetime.datetime]:
        # Combine filename and content for searching
        text = f"{filename} {content}"

        for match in self.ISO_REGEX.finditer(text):
            date_string = match.group(1)
            # Use helper method to parse with configured timezone
            date = self._parse_string(date_string, "YMD")
            # Use helper method to validate the date
            filtered_date = self._filter_date(date)
            if filtered_date is not None:
                yield filtered_date
```


@@ -562,8 +562,8 @@ you may want to adjust these settings to prevent abuse.
#### Workflow placeholders
Titles can be assigned by workflows using [Jinja templates](https://jinja.palletsprojects.com/en/3.1.x/templates/).
This allows for complex logic to be used to generate the title, including [logical structures](https://jinja.palletsprojects.com/en/3.1.x/templates/#list-of-control-structures)
Titles and webhook payloads can be generated by workflows using [Jinja templates](https://jinja.palletsprojects.com/en/3.1.x/templates/).
This allows for complex logic to be used, including [logical structures](https://jinja.palletsprojects.com/en/3.1.x/templates/#list-of-control-structures)
and [filters](https://jinja.palletsprojects.com/en/3.1.x/templates/#id11).
The template is provided as a string.
@@ -586,7 +586,7 @@ applied. You can use the following placeholders in the template with any trigger
- `{{added_time}}`: added time in HH:MM format
- `{{original_filename}}`: original file name without extension
- `{{filename}}`: current file name without extension
- `{{doc_title}}`: current document title
- `{{doc_title}}`: current document title (cannot be used in title assignment)
The following placeholders are only available for "added" or "updated" triggers


@@ -27,7 +27,7 @@ dependencies = [
# WARNING: django does not use semver.
# Only patch versions are guaranteed to not introduce breaking changes.
"django~=5.2.10",
"django-allauth[mfa,socialaccount]~=65.13.1",
"django-allauth[mfa,socialaccount]~=65.14.0",
"django-auditlog~=3.4.1",
"django-cachalot~=2.8.0",
"django-celery-results~=2.6.0",
@@ -138,7 +138,9 @@ typing = [
"django-stubs[compatible-mypy]",
"djangorestframework-stubs[compatible-mypy]",
"lxml-stubs",
"microsoft-python-type-stubs @ git+https://github.com/microsoft/python-type-stubs.git",
"mypy",
"mypy-baseline",
"types-bleach",
"types-colorama",
"types-dateparser",
@@ -306,6 +308,7 @@ markers = [
"gotenberg: Tests requiring Gotenberg service",
"tika: Tests requiring Tika service",
"greenmail: Tests requiring Greenmail service",
"date_parsing: Tests which cover date parsing from content or filename",
]
[tool.pytest_env]
@@ -345,3 +348,7 @@ warn_unused_ignores = true
[tool.django-stubs]
django_settings_module = "paperless.settings"
[tool.mypy-baseline]
baseline_path = ".mypy-baseline.txt"
sort_baseline = true


@@ -5359,6 +5359,27 @@
<context context-type="linenumber">429</context>
</context-group>
</trans-unit>
<trans-unit id="32686762098259088" datatype="html">
<source> One password per line. The workflow will try them in order until one succeeds. </source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/common/edit-dialog/workflow-edit-dialog/workflow-edit-dialog.component.html</context>
<context context-type="linenumber">436,438</context>
</context-group>
</trans-unit>
<trans-unit id="3853121441237751087" datatype="html">
<source>Passwords</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/common/edit-dialog/workflow-edit-dialog/workflow-edit-dialog.component.html</context>
<context context-type="linenumber">441</context>
</context-group>
</trans-unit>
<trans-unit id="3653669613103848563" datatype="html">
<source>Passwords are stored in plain text. Use with caution.</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/common/edit-dialog/workflow-edit-dialog/workflow-edit-dialog.component.html</context>
<context context-type="linenumber">445</context>
</context-group>
</trans-unit>
<trans-unit id="4626030417479279989" datatype="html">
<source>Consume Folder</source>
<context-group purpose="location">
@@ -5454,109 +5475,116 @@
<context context-type="linenumber">140</context>
</context-group>
</trans-unit>
<trans-unit id="4824906895380506720" datatype="html">
<source>Password removal</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/common/edit-dialog/workflow-edit-dialog/workflow-edit-dialog.component.ts</context>
<context context-type="linenumber">144</context>
</context-group>
</trans-unit>
<trans-unit id="4522609911791833187" datatype="html">
<source>Has any of these tags</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/common/edit-dialog/workflow-edit-dialog/workflow-edit-dialog.component.ts</context>
<context context-type="linenumber">209</context>
<context context-type="linenumber">213</context>
</context-group>
</trans-unit>
<trans-unit id="4166903555074156852" datatype="html">
<source>Has all of these tags</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/common/edit-dialog/workflow-edit-dialog/workflow-edit-dialog.component.ts</context>
<context context-type="linenumber">216</context>
<context context-type="linenumber">220</context>
</context-group>
</trans-unit>
<trans-unit id="6624363795312783141" datatype="html">
<source>Does not have these tags</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/common/edit-dialog/workflow-edit-dialog/workflow-edit-dialog.component.ts</context>
<context context-type="linenumber">223</context>
<context context-type="linenumber">227</context>
</context-group>
</trans-unit>
<trans-unit id="7168528512669831184" datatype="html">
<source>Has any of these correspondents</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/common/edit-dialog/workflow-edit-dialog/workflow-edit-dialog.component.ts</context>
<context context-type="linenumber">230</context>
<context context-type="linenumber">234</context>
</context-group>
</trans-unit>
<trans-unit id="5281365940563983618" datatype="html">
<source>Has correspondent</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/common/edit-dialog/workflow-edit-dialog/workflow-edit-dialog.component.ts</context>
<context context-type="linenumber">238</context>
<context context-type="linenumber">242</context>
</context-group>
</trans-unit>
<trans-unit id="6884498632428600393" datatype="html">
<source>Does not have correspondents</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/common/edit-dialog/workflow-edit-dialog/workflow-edit-dialog.component.ts</context>
<context context-type="linenumber">246</context>
<context context-type="linenumber">250</context>
</context-group>
</trans-unit>
<trans-unit id="4806713133917046341" datatype="html">
<source>Has document type</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/common/edit-dialog/workflow-edit-dialog/workflow-edit-dialog.component.ts</context>
<context context-type="linenumber">254</context>
<context context-type="linenumber">258</context>
</context-group>
</trans-unit>
<trans-unit id="8801397520369995032" datatype="html">
<source>Has any of these document types</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/common/edit-dialog/workflow-edit-dialog/workflow-edit-dialog.component.ts</context>
<context context-type="linenumber">262</context>
<context context-type="linenumber">266</context>
</context-group>
</trans-unit>
<trans-unit id="1507843981661822403" datatype="html">
<source>Does not have document types</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/common/edit-dialog/workflow-edit-dialog/workflow-edit-dialog.component.ts</context>
<context context-type="linenumber">270</context>
<context context-type="linenumber">274</context>
</context-group>
</trans-unit>
<trans-unit id="4277260190522078330" datatype="html">
<source>Has storage path</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/common/edit-dialog/workflow-edit-dialog/workflow-edit-dialog.component.ts</context>
<context context-type="linenumber">278</context>
<context context-type="linenumber">282</context>
</context-group>
</trans-unit>
<trans-unit id="8858580062214623097" datatype="html">
<source>Has any of these storage paths</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/common/edit-dialog/workflow-edit-dialog/workflow-edit-dialog.component.ts</context>
<context context-type="linenumber">286</context>
<context context-type="linenumber">290</context>
</context-group>
</trans-unit>
<trans-unit id="6070943364927280151" datatype="html">
<source>Does not have storage paths</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/common/edit-dialog/workflow-edit-dialog/workflow-edit-dialog.component.ts</context>
<context context-type="linenumber">294</context>
<context context-type="linenumber">298</context>
</context-group>
</trans-unit>
<trans-unit id="6250799006816371860" datatype="html">
<source>Matches custom field query</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/common/edit-dialog/workflow-edit-dialog/workflow-edit-dialog.component.ts</context>
<context context-type="linenumber">302</context>
<context context-type="linenumber">306</context>
</context-group>
</trans-unit>
<trans-unit id="3138206142174978019" datatype="html">
<source>Create new workflow</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/common/edit-dialog/workflow-edit-dialog/workflow-edit-dialog.component.ts</context>
<context context-type="linenumber">531</context>
<context context-type="linenumber">535</context>
</context-group>
</trans-unit>
<trans-unit id="5996779210524133604" datatype="html">
<source>Edit workflow</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/common/edit-dialog/workflow-edit-dialog/workflow-edit-dialog.component.ts</context>
<context context-type="linenumber">535</context>
<context context-type="linenumber">539</context>
</context-group>
</trans-unit>
<trans-unit id="5457837313196342910" datatype="html">


@@ -430,6 +430,24 @@
</div>
</div>
}
@case (WorkflowActionType.PasswordRemoval) {
<div class="row">
<div class="col">
<p class="small" i18n>
One password per line. The workflow will try them in order until one succeeds.
</p>
<pngx-input-textarea
i18n-title
title="Passwords"
formControlName="passwords"
rows="4"
[error]="error?.actions?.[i]?.passwords"
hint="Passwords are stored in plain text. Use with caution."
i18n-hint
></pngx-input-textarea>
</div>
</div>
}
}
</div>
</ng-template>


@@ -3,6 +3,7 @@ import { provideHttpClient, withInterceptorsFromDi } from '@angular/common/http'
import { provideHttpClientTesting } from '@angular/common/http/testing'
import { ComponentFixture, TestBed } from '@angular/core/testing'
import {
FormArray,
FormControl,
FormGroup,
FormsModule,
@@ -994,4 +995,32 @@ describe('WorkflowEditDialogComponent', () => {
component.removeSelectedCustomField(3, formGroup)
expect(formGroup.get('assign_custom_fields').value).toEqual([])
})
it('should handle parsing of passwords from array to string and back on save', () => {
const passwordAction: WorkflowAction = {
id: 1,
type: WorkflowActionType.PasswordRemoval,
passwords: ['pass1', 'pass2'],
}
component.object = {
name: 'Workflow with Passwords',
id: 1,
order: 1,
enabled: true,
triggers: [],
actions: [passwordAction],
}
component.ngOnInit()
const formActions = component.objectForm.get('actions') as FormArray
expect(formActions.value[0].passwords).toBe('pass1\npass2')
formActions.at(0).get('passwords').setValue('pass1\npass2\npass3')
component.save()
expect(component.objectForm.get('actions').value[0].passwords).toEqual([
'pass1',
'pass2',
'pass3',
])
})
})


@@ -139,6 +139,10 @@ export const WORKFLOW_ACTION_OPTIONS = [
id: WorkflowActionType.Webhook,
name: $localize`Webhook`,
},
{
id: WorkflowActionType.PasswordRemoval,
name: $localize`Password removal`,
},
]
export enum TriggerFilterType {
@@ -1202,11 +1206,25 @@ export class WorkflowEditDialogComponent
headers: new FormControl(action.webhook?.headers),
include_document: new FormControl(!!action.webhook?.include_document),
}),
passwords: new FormControl(
this.formatPasswords(action.passwords ?? [])
),
}),
{ emitEvent }
)
}
private formatPasswords(passwords: string[] = []): string {
return passwords.join('\n')
}
private parsePasswords(value: string = ''): string[] {
return value
.split(/[\n,]+/)
.map((entry) => entry.trim())
.filter((entry) => entry.length > 0)
}
private updateAllTriggerActionFields(emitEvent: boolean = false) {
this.triggerFields.clear({ emitEvent: false })
this.object?.triggers.forEach((trigger) => {
@@ -1331,6 +1349,7 @@ export class WorkflowEditDialogComponent
headers: null,
include_document: false,
},
passwords: [],
}
this.object.actions.push(action)
this.createActionField(action)
@@ -1367,6 +1386,7 @@ export class WorkflowEditDialogComponent
if (action.type !== WorkflowActionType.Email) {
action.email = null
}
action.passwords = this.parsePasswords(action.passwords as any)
})
super.save()
}


@@ -5,6 +5,7 @@ export enum WorkflowActionType {
Removal = 2,
Email = 3,
Webhook = 4,
PasswordRemoval = 5,
}
export interface WorkflowActionEmail extends ObjectWithId {
@@ -97,4 +98,6 @@ export interface WorkflowAction extends ObjectWithId {
email?: WorkflowActionEmail
webhook?: WorkflowActionWebhook
passwords?: string[]
}


@@ -33,12 +33,12 @@ from documents.models import WorkflowTrigger
from documents.parsers import DocumentParser
from documents.parsers import ParseError
from documents.parsers import get_parser_class_for_mime_type
from documents.parsers import parse_date
from documents.permissions import set_permissions_for_object
from documents.plugins.base import AlwaysRunPluginMixin
from documents.plugins.base import ConsumeTaskPlugin
from documents.plugins.base import NoCleanupPluginMixin
from documents.plugins.base import NoSetupPluginMixin
from documents.plugins.date_parsing import get_date_parser
from documents.plugins.helpers import ProgressManager
from documents.plugins.helpers import ProgressStatusOptions
from documents.signals import document_consumption_finished
@@ -432,7 +432,8 @@ class ConsumerPlugin(
ProgressStatusOptions.WORKING,
ConsumerStatusShortMessage.PARSE_DATE,
)
date = parse_date(self.filename, text)
with get_date_parser() as date_parser:
date = next(date_parser.parse(self.filename, text), None)
archive_path = document_parser.get_archive_path()
page_count = document_parser.get_page_count(self.working_copy, mime_type)


@@ -0,0 +1,38 @@
# Generated by Django 5.2.7 on 2025-12-29 03:56

from django.db import migrations
from django.db import models


class Migration(migrations.Migration):
    dependencies = [
        ("documents", "0008_sharelinkbundle"),
    ]

    operations = [
        migrations.AddField(
            model_name="workflowaction",
            name="passwords",
            field=models.JSONField(
                blank=True,
                help_text="Passwords to try when removing PDF protection. Separate with commas or new lines.",
                null=True,
                verbose_name="passwords",
            ),
        ),
        migrations.AlterField(
            model_name="workflowaction",
            name="type",
            field=models.PositiveIntegerField(
                choices=[
                    (1, "Assignment"),
                    (2, "Removal"),
                    (3, "Email"),
                    (4, "Webhook"),
                    (5, "Password removal"),
                ],
                default=1,
                verbose_name="Workflow Action Type",
            ),
        ),
    ]


@@ -1405,6 +1405,10 @@ class WorkflowAction(models.Model):
4,
_("Webhook"),
)
PASSWORD_REMOVAL = (
5,
_("Password removal"),
)
type = models.PositiveIntegerField(
_("Workflow Action Type"),
@@ -1634,6 +1638,15 @@ class WorkflowAction(models.Model):
verbose_name=_("webhook"),
)
passwords = models.JSONField(
_("passwords"),
null=True,
blank=True,
help_text=_(
"Passwords to try when removing PDF protection. Separate with commas or new lines.",
),
)
class Meta:
verbose_name = _("workflow action")
verbose_name_plural = _("workflow actions")


@@ -9,22 +9,17 @@ import subprocess
import tempfile
from functools import lru_cache
from pathlib import Path
from re import Match
from typing import TYPE_CHECKING
from django.conf import settings
from django.utils import timezone
from documents.loggers import LoggingMixin
from documents.signals import document_consumer_declaration
from documents.utils import copy_file_with_basic_stats
from documents.utils import run_subprocess
from paperless.config import OcrConfig
from paperless.utils import ocr_to_dateparser_languages
if TYPE_CHECKING:
import datetime
from collections.abc import Iterator
# This regular expression will try to find dates in the document at
# hand and will match the following formats:
@@ -259,75 +254,6 @@ def make_thumbnail_from_pdf(in_path: Path, temp_dir: Path, logging_group=None) -
return out_path
def parse_date(filename, text) -> datetime.datetime | None:
return next(parse_date_generator(filename, text), None)
def parse_date_generator(filename, text) -> Iterator[datetime.datetime]:
"""
Returns the date of the document.
"""
def __parser(ds: str, date_order: str) -> datetime.datetime:
"""
Call dateparser.parse with a particular date ordering
"""
import dateparser
ocr_config = OcrConfig()
languages = settings.DATE_PARSER_LANGUAGES or ocr_to_dateparser_languages(
ocr_config.language,
)
return dateparser.parse(
ds,
settings={
"DATE_ORDER": date_order,
"PREFER_DAY_OF_MONTH": "first",
"RETURN_AS_TIMEZONE_AWARE": True,
"TIMEZONE": settings.TIME_ZONE,
},
locales=languages,
)
def __filter(date: datetime.datetime) -> datetime.datetime | None:
if (
date is not None
and date.year > 1900
and date <= timezone.now()
and date.date() not in settings.IGNORE_DATES
):
return date
return None
def __process_match(
match: Match[str],
date_order: str,
) -> datetime.datetime | None:
date_string = match.group(0)
try:
date = __parser(date_string, date_order)
except Exception:
# Skip all matches that do not parse to a proper date
date = None
return __filter(date)
def __process_content(content: str, date_order: str) -> Iterator[datetime.datetime]:
for m in re.finditer(DATE_REGEX, content):
date = __process_match(m, date_order)
if date is not None:
yield date
# if filename date parsing is enabled, search there first:
if settings.FILENAME_DATE_ORDER:
yield from __process_content(filename, settings.FILENAME_DATE_ORDER)
# Iterate through all regex matches in text and try to parse the date
yield from __process_content(text, settings.DATE_ORDER)
class ParseError(Exception):
pass


@@ -0,0 +1,101 @@
import logging
from functools import lru_cache
from importlib.metadata import EntryPoint
from importlib.metadata import entry_points
from typing import Final

from django.conf import settings
from django.utils import timezone

from documents.plugins.date_parsing.base import DateParserConfig
from documents.plugins.date_parsing.base import DateParserPluginBase
from documents.plugins.date_parsing.regex_parser import RegexDateParserPlugin
from paperless.config import OcrConfig
from paperless.utils import ocr_to_dateparser_languages

logger = logging.getLogger(__name__)

DATE_PARSER_ENTRY_POINT_GROUP: Final = "paperless_ngx.date_parsers"


@lru_cache(maxsize=1)
def _discover_parser_class() -> type[DateParserPluginBase]:
    """
    Discovers the date parser plugin class to use.

    - If one or more plugins are found, sorts them by name and returns the first.
    - If no plugins are found, returns the default RegexDateParser.
    """

    eps: tuple[EntryPoint, ...]
    try:
        eps = entry_points(group=DATE_PARSER_ENTRY_POINT_GROUP)
    except Exception as e:
        # Log a warning
        logger.warning(f"Could not query entry points for date parsers: {e}")
        eps = ()

    valid_plugins: list[EntryPoint] = []
    for ep in eps:
        try:
            plugin_class = ep.load()
            if plugin_class and issubclass(plugin_class, DateParserPluginBase):
                valid_plugins.append(ep)
            else:
                logger.warning(f"Plugin {ep.name} does not subclass DateParser.")
        except Exception as e:
            logger.error(f"Unable to load date parser plugin {ep.name}: {e}")

    if not valid_plugins:
        return RegexDateParserPlugin

    valid_plugins.sort(key=lambda ep: ep.name)

    if len(valid_plugins) > 1:
        logger.warning(
            f"Multiple date parsers found: "
            f"{[ep.name for ep in valid_plugins]}. "
            f"Using the first one by name: '{valid_plugins[0].name}'.",
        )

    return valid_plugins[0].load()


def get_date_parser() -> DateParserPluginBase:
    """
    Factory function to get an initialized date parser instance.

    This function is responsible for:
    1. Discovering the correct parser class (plugin or default).
    2. Loading configuration from Django settings.
    3. Instantiating the parser with the configuration.
    """
    # 1. Discover the class (this is cached)
    parser_class = _discover_parser_class()

    # 2. Load configuration from settings
    # TODO: Get the language from the settings and/or configuration object, depending
    ocr_config = OcrConfig()
    languages = settings.DATE_PARSER_LANGUAGES or ocr_to_dateparser_languages(
        ocr_config.language,
    )

    config = DateParserConfig(
        languages=languages,
        timezone_str=settings.TIME_ZONE,
        ignore_dates=settings.IGNORE_DATES,
        reference_time=timezone.now(),
        filename_date_order=settings.FILENAME_DATE_ORDER,
        content_date_order=settings.DATE_ORDER,
    )

    # 3. Instantiate the discovered class with the config
    return parser_class(config=config)


__all__ = [
    "DateParserConfig",
    "DateParserPluginBase",
    "RegexDateParserPlugin",
    "get_date_parser",
]


@@ -0,0 +1,124 @@
import datetime
import logging
from abc import ABC
from abc import abstractmethod
from collections.abc import Iterator
from dataclasses import dataclass
from types import TracebackType

try:
    from typing import Self
except ImportError:
    from typing_extensions import Self

import dateparser

logger = logging.getLogger(__name__)


@dataclass(frozen=True, slots=True)
class DateParserConfig:
    """
    Configuration for a DateParser instance.

    This object is created by the factory and passed to the
    parser's constructor, decoupling the parser from settings.
    """

    languages: list[str]
    timezone_str: str
    ignore_dates: set[datetime.date]

    # A "now" timestamp for filtering future dates.
    # Passed in by the factory.
    reference_time: datetime.datetime

    # Settings for the default RegexDateParser
    # Other plugins should use or consider these, but it is not required
    filename_date_order: str | None
    content_date_order: str


class DateParserPluginBase(ABC):
    """
    Abstract base class for date parsing strategies.

    Instances are configured via a DateParserConfig object.
    """

    def __init__(self, config: DateParserConfig):
        """
        Initializes the parser with its configuration.
        """
        self.config = config

    def __enter__(self) -> Self:
        """
        Enter the runtime context related to this object.

        Subclasses can override this to acquire resources (connections, handles).
        """
        return self

    def __exit__(
        self,
        exc_type: type[BaseException] | None,
        exc_val: BaseException | None,
        exc_tb: TracebackType | None,
    ) -> None:
        """
        Exit the runtime context related to this object.

        Subclasses can override this to release resources.
        """
        # Default implementation does nothing.
        # Returning None implies exceptions are propagated.

    def _parse_string(
        self,
        date_string: str,
        date_order: str,
    ) -> datetime.datetime | None:
        """
        Helper method to parse a single date string using dateparser.

        Uses configuration from `self.config`.
        """
        try:
            return dateparser.parse(
                date_string,
                settings={
                    "DATE_ORDER": date_order,
                    "PREFER_DAY_OF_MONTH": "first",
                    "RETURN_AS_TIMEZONE_AWARE": True,
                    "TIMEZONE": self.config.timezone_str,
                },
                locales=self.config.languages,
            )
        except Exception as e:
            logger.error(f"Error while parsing date string '{date_string}': {e}")
            return None

    def _filter_date(
        self,
        date: datetime.datetime | None,
    ) -> datetime.datetime | None:
        """
        Helper method to validate a parsed datetime object.

        Uses configuration from `self.config`.
        """
        if (
            date is not None
            and date.year > 1900
            and date <= self.config.reference_time
            and date.date() not in self.config.ignore_dates
        ):
            return date
        return None

    @abstractmethod
    def parse(self, filename: str, content: str) -> Iterator[datetime.datetime]:
        """
        Parses a document's filename and content, yielding valid datetime objects.
        """


@@ -0,0 +1,65 @@
import datetime
import re
from collections.abc import Iterator
from re import Match
from documents.plugins.date_parsing.base import DateParserPluginBase
class RegexDateParserPlugin(DateParserPluginBase):
"""
The default date parser, using a series of regular expressions.
It is configured entirely by the DateParserConfig object
passed to its constructor.
"""
DATE_REGEX = re.compile(
r"(\b|(?!=([_-])))(\d{1,2})[\.\/-](\d{1,2})[\.\/-](\d{4}|\d{2})(\b|(?=([_-])))|"
r"(\b|(?!=([_-])))(\d{4}|\d{2})[\.\/-](\d{1,2})[\.\/-](\d{1,2})(\b|(?=([_-])))|"
r"(\b|(?!=([_-])))(\d{1,2}[\. ]+[a-zéûäëčžúřěáíóńźçŞğü]{3,9} \d{4}|[a-zéûäëčžúřěáíóńźçŞğü]{3,9} \d{1,2}, \d{4})(\b|(?=([_-])))|"
r"(\b|(?!=([_-])))([^\W\d_]{3,9} \d{1,2}, (\d{4}))(\b|(?=([_-])))|"
r"(\b|(?!=([_-])))([^\W\d_]{3,9} \d{4})(\b|(?=([_-])))|"
r"(\b|(?!=([_-])))(\d{1,2}[^ 0-9]{2}[\. ]+[^ ]{3,9}[ \.\/-]\d{4})(\b|(?=([_-])))|"
r"(\b|(?!=([_-])))(\b\d{1,2}[ \.\/-][a-zéûäëčžúřěáíóńźçŞğü]{3}[ \.\/-]\d{4})(\b|(?=([_-])))",
re.IGNORECASE,
)
def _process_match(
self,
match: Match[str],
date_order: str,
) -> datetime.datetime | None:
"""
Processes a single regex match using the base class helpers.
"""
date_string = match.group(0)
date = self._parse_string(date_string, date_order)
return self._filter_date(date)
def _process_content(
self,
content: str,
date_order: str,
) -> Iterator[datetime.datetime]:
"""
Finds all regex matches in content and yields valid dates.
"""
for m in re.finditer(self.DATE_REGEX, content):
date = self._process_match(m, date_order)
if date is not None:
yield date
def parse(self, filename: str, content: str) -> Iterator[datetime.datetime]:
"""
Implementation of the abstract parse method.
Reads its configuration from `self.config`.
"""
if self.config.filename_date_order:
yield from self._process_content(
filename,
self.config.filename_date_order,
)
yield from self._process_content(content, self.config.content_date_order)
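To illustrate the candidate-extraction step in `_process_content`, here is a small sketch that uses a deliberately simplified pattern. `SIMPLE_DATE_REGEX` and `candidate_strings` are assumptions made for this example; the pattern covers only the two numeric alternatives and is not the plugin's `DATE_REGEX`.

```python
import re

# Simplified, hypothetical pattern: day/month/year or year/month/day,
# separated by ".", "/" or "-". Month names are intentionally not covered.
SIMPLE_DATE_REGEX = re.compile(
    r"\b(\d{1,2})[./-](\d{1,2})[./-](\d{4}|\d{2})\b"
    r"|\b(\d{4})[./-](\d{1,2})[./-](\d{1,2})\b",
)


def candidate_strings(text: str) -> list[str]:
    """Return every substring that looks like a numeric date, in order."""
    return [match.group(0) for match in SIMPLE_DATE_REGEX.finditer(text)]


found = candidate_strings("Invoice 13.02.2018, due 2018-03-01, ref 20180213")
```

Each returned substring would then be handed to a date parser together with the configured date order. The unseparated run `20180213` is not picked up, which is consistent with the removed `test_date_format_3` expectation.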

View File

@@ -2627,6 +2627,7 @@ class WorkflowActionSerializer(serializers.ModelSerializer):
"remove_change_groups",
"email",
"webhook",
"passwords",
]
def validate(self, attrs):
@@ -2683,6 +2684,23 @@ class WorkflowActionSerializer(serializers.ModelSerializer):
"Webhook data is required for webhook actions",
)
if (
"type" in attrs
and attrs["type"] == WorkflowAction.WorkflowActionType.PASSWORD_REMOVAL
):
passwords = attrs.get("passwords")
# ensure passwords is a non-empty list of non-empty strings
if (
passwords is None
or not isinstance(passwords, list)
or len(passwords) == 0
or any(not isinstance(pw, str) for pw in passwords)
or any(len(pw.strip()) == 0 for pw in passwords)
):
raise serializers.ValidationError(
"Passwords are required for password removal actions",
)
return attrs
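The password check above can be sketched as a standalone predicate. `passwords_are_valid` is a hypothetical helper written for illustration and is not part of the serializer; it accepts exactly the inputs that the condition above does not reject.

```python
from typing import Any


def passwords_are_valid(passwords: Any) -> bool:
    """True only for a non-empty list of non-blank strings."""
    return (
        isinstance(passwords, list)
        and len(passwords) > 0
        and all(isinstance(pw, str) and pw.strip() != "" for pw in passwords)
    )


accepted = passwords_are_valid(["password1", "password2"])
rejected = [
    passwords_are_valid(None),  # field missing
    passwords_are_valid(""),  # not a list
    passwords_are_valid([]),  # empty list
    passwords_are_valid(["", "password2"]),  # contains an empty entry
    passwords_are_valid(["   "]),  # whitespace only
    passwords_are_valid([123]),  # not a string
]
```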

View File

@@ -48,6 +48,7 @@ from documents.permissions import get_objects_for_user_owner_aware
from documents.templating.utils import convert_format_str_to_template_format
from documents.workflows.actions import build_workflow_action_context
from documents.workflows.actions import execute_email_action
from documents.workflows.actions import execute_password_removal_action
from documents.workflows.actions import execute_webhook_action
from documents.workflows.mutations import apply_assignment_to_document
from documents.workflows.mutations import apply_assignment_to_overrides
@@ -831,6 +832,8 @@ def run_workflows(
logging_group,
original_file,
)
elif action.type == WorkflowAction.WorkflowActionType.PASSWORD_REMOVAL:
execute_password_removal_action(action, document, logging_group)
if not use_overrides:
# limit title to 128 characters

View File

@@ -0,0 +1,82 @@
import datetime
from collections.abc import Generator
from typing import Any
import pytest
import pytest_django
from documents.plugins.date_parsing import _discover_parser_class
from documents.plugins.date_parsing.base import DateParserConfig
from documents.plugins.date_parsing.regex_parser import RegexDateParserPlugin
@pytest.fixture
def base_config() -> DateParserConfig:
"""Basic configuration for date parser testing."""
return DateParserConfig(
languages=["en"],
timezone_str="UTC",
ignore_dates=set(),
reference_time=datetime.datetime(
2024,
1,
15,
12,
0,
0,
tzinfo=datetime.timezone.utc,
),
filename_date_order="YMD",
content_date_order="DMY",
)
@pytest.fixture
def config_with_ignore_dates() -> DateParserConfig:
"""Configuration with dates to ignore."""
return DateParserConfig(
languages=["en", "de"],
timezone_str="America/New_York",
ignore_dates={datetime.date(2024, 1, 1), datetime.date(2024, 12, 25)},
reference_time=datetime.datetime(
2024,
1,
15,
12,
0,
0,
tzinfo=datetime.timezone.utc,
),
filename_date_order="DMY",
content_date_order="MDY",
)
@pytest.fixture
def regex_parser(base_config: DateParserConfig) -> RegexDateParserPlugin:
"""Instance of RegexDateParserPlugin with base config."""
return RegexDateParserPlugin(base_config)
@pytest.fixture
def clear_lru_cache() -> Generator[None, None, None]:
"""
Ensure the LRU cache for _discover_parser_class is cleared
before and after any test that depends on it.
"""
_discover_parser_class.cache_clear()
yield
_discover_parser_class.cache_clear()
@pytest.fixture
def mock_date_parser_settings(settings: pytest_django.fixtures.SettingsWrapper) -> Any:
"""
Override Django settings for the duration of date parser tests.
"""
settings.DATE_PARSER_LANGUAGES = ["en", "de"]
settings.TIME_ZONE = "UTC"
settings.IGNORE_DATES = [datetime.date(1900, 1, 1)]
settings.FILENAME_DATE_ORDER = "YMD"
settings.DATE_ORDER = "DMY"
return settings

View File

@@ -0,0 +1,229 @@
import datetime
import logging
from collections.abc import Iterator
from importlib.metadata import EntryPoint
import pytest
import pytest_mock
from django.utils import timezone
from documents.plugins.date_parsing import DATE_PARSER_ENTRY_POINT_GROUP
from documents.plugins.date_parsing import _discover_parser_class
from documents.plugins.date_parsing import get_date_parser
from documents.plugins.date_parsing.base import DateParserConfig
from documents.plugins.date_parsing.base import DateParserPluginBase
from documents.plugins.date_parsing.regex_parser import RegexDateParserPlugin
class AlphaParser(DateParserPluginBase):
def parse(self, filename: str, content: str) -> Iterator[datetime.datetime]:
yield timezone.now()
class BetaParser(DateParserPluginBase):
def parse(self, filename: str, content: str) -> Iterator[datetime.datetime]:
yield timezone.now()
@pytest.mark.date_parsing
@pytest.mark.usefixtures("clear_lru_cache")
class TestDiscoverParserClass:
"""Tests for the _discover_parser_class() function."""
def test_returns_default_when_no_plugins_found(
self,
mocker: pytest_mock.MockerFixture,
) -> None:
mocker.patch(
"documents.plugins.date_parsing.entry_points",
return_value=(),
)
result = _discover_parser_class()
assert result is RegexDateParserPlugin
def test_returns_default_when_entrypoint_query_fails(
self,
mocker: pytest_mock.MockerFixture,
caplog: pytest.LogCaptureFixture,
) -> None:
mocker.patch(
"documents.plugins.date_parsing.entry_points",
side_effect=RuntimeError("boom"),
)
result = _discover_parser_class()
assert result is RegexDateParserPlugin
assert "Could not query entry points" in caplog.text
def test_filters_out_invalid_plugins(
self,
mocker: pytest_mock.MockerFixture,
caplog: pytest.LogCaptureFixture,
) -> None:
fake_ep = mocker.MagicMock(spec=EntryPoint)
fake_ep.name = "bad_plugin"
fake_ep.load.return_value = object # not a subclass of DateParserPluginBase
mocker.patch(
"documents.plugins.date_parsing.entry_points",
return_value=(fake_ep,),
)
result = _discover_parser_class()
assert result is RegexDateParserPlugin
assert "does not subclass DateParser" in caplog.text
def test_skips_plugins_that_fail_to_load(
self,
mocker: pytest_mock.MockerFixture,
caplog: pytest.LogCaptureFixture,
) -> None:
fake_ep = mocker.MagicMock(spec=EntryPoint)
fake_ep.name = "failing_plugin"
fake_ep.load.side_effect = ImportError("cannot import")
mocker.patch(
"documents.plugins.date_parsing.entry_points",
return_value=(fake_ep,),
)
result = _discover_parser_class()
assert result is RegexDateParserPlugin
assert "Unable to load date parser plugin failing_plugin" in caplog.text
def test_returns_single_valid_plugin_without_warning(
self,
mocker: pytest_mock.MockerFixture,
caplog: pytest.LogCaptureFixture,
) -> None:
"""If exactly one valid plugin is discovered, it should be returned without logging a warning."""
ep = mocker.MagicMock(spec=EntryPoint)
ep.name = "alpha"
ep.load.return_value = AlphaParser
mock_entry_points = mocker.patch(
"documents.plugins.date_parsing.entry_points",
return_value=(ep,),
)
with caplog.at_level(
logging.WARNING,
logger="documents.plugins.date_parsing",
):
result = _discover_parser_class()
# It should have called entry_points with the correct group
mock_entry_points.assert_called_once_with(group=DATE_PARSER_ENTRY_POINT_GROUP)
# The discovered class should be exactly our AlphaParser
assert result is AlphaParser
# No warnings should have been logged
assert not any(
"Multiple date parsers found" in record.message for record in caplog.records
), "Unexpected warning logged when only one plugin was found"
def test_returns_first_valid_plugin_by_name(
self,
mocker: pytest_mock.MockerFixture,
) -> None:
ep_a = mocker.MagicMock(spec=EntryPoint)
ep_a.name = "alpha"
ep_a.load.return_value = AlphaParser
ep_b = mocker.MagicMock(spec=EntryPoint)
ep_b.name = "beta"
ep_b.load.return_value = BetaParser
mocker.patch(
"documents.plugins.date_parsing.entry_points",
return_value=(ep_b, ep_a),
)
result = _discover_parser_class()
assert result is AlphaParser
def test_logs_warning_if_multiple_plugins_found(
self,
mocker: pytest_mock.MockerFixture,
caplog: pytest.LogCaptureFixture,
) -> None:
ep1 = mocker.MagicMock(spec=EntryPoint)
ep1.name = "a"
ep1.load.return_value = AlphaParser
ep2 = mocker.MagicMock(spec=EntryPoint)
ep2.name = "b"
ep2.load.return_value = BetaParser
mocker.patch(
"documents.plugins.date_parsing.entry_points",
return_value=(ep1, ep2),
)
with caplog.at_level(
logging.WARNING,
logger="documents.plugins.date_parsing",
):
result = _discover_parser_class()
# Should select alphabetically first plugin ("a")
assert result is AlphaParser
# Should log a warning mentioning multiple parsers
assert any(
"Multiple date parsers found" in record.message for record in caplog.records
), "Expected a warning about multiple date parsers"
def test_cache_behavior_only_runs_once(
self,
mocker: pytest_mock.MockerFixture,
) -> None:
mock_entry_points = mocker.patch(
"documents.plugins.date_parsing.entry_points",
return_value=(),
)
# First call populates cache
_discover_parser_class()
# Second call should not re-invoke entry_points
_discover_parser_class()
mock_entry_points.assert_called_once()
@pytest.mark.django_db
@pytest.mark.date_parsing
@pytest.mark.usefixtures("mock_date_parser_settings")
class TestGetDateParser:
"""Tests for the get_date_parser() factory function."""
def test_returns_instance_of_discovered_class(
self,
mocker: pytest_mock.MockerFixture,
) -> None:
mocker.patch(
"documents.plugins.date_parsing._discover_parser_class",
return_value=AlphaParser,
)
parser = get_date_parser()
assert isinstance(parser, AlphaParser)
assert isinstance(parser.config, DateParserConfig)
assert parser.config.languages == ["en", "de"]
assert parser.config.timezone_str == "UTC"
assert parser.config.ignore_dates == [datetime.date(1900, 1, 1)]
assert parser.config.filename_date_order == "YMD"
assert parser.config.content_date_order == "DMY"
# Check reference_time near now
delta = abs((parser.config.reference_time - timezone.now()).total_seconds())
assert delta < 2
def test_uses_default_regex_parser_when_no_plugins(
self,
mocker: pytest_mock.MockerFixture,
) -> None:
mocker.patch(
"documents.plugins.date_parsing._discover_parser_class",
return_value=RegexDateParserPlugin,
)
parser = get_date_parser()
assert isinstance(parser, RegexDateParserPlugin)
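The selection behaviour these tests describe (skip plugins that fail to load or do not subclass the base, fall back to the default, pick the alphabetically first name and warn when several are valid) can be sketched as follows. This is an assumption-laden illustration, not the code of `_discover_parser_class`: `BaseParser`, `DefaultParser`, and `select_parser_class` are hypothetical names, and candidates are passed in as `(name, loader)` pairs instead of being read from entry points.

```python
import logging
from collections.abc import Callable, Iterable

logger = logging.getLogger(__name__)


class BaseParser:
    """Hypothetical stand-in for DateParserPluginBase."""


class DefaultParser(BaseParser):
    """Hypothetical stand-in for RegexDateParserPlugin."""


def select_parser_class(
    candidates: Iterable[tuple[str, Callable[[], object]]],
) -> type[BaseParser]:
    """Pick one parser class from (name, loader) pairs."""
    valid: dict[str, type[BaseParser]] = {}
    for name, load in candidates:
        try:
            loaded = load()
        except Exception as exc:
            # A plugin that cannot be imported is skipped, not fatal.
            logger.error("Unable to load date parser plugin %s: %s", name, exc)
            continue
        if not (isinstance(loaded, type) and issubclass(loaded, BaseParser)):
            logger.warning("Plugin %s does not subclass the base parser", name)
            continue
        valid[name] = loaded

    if not valid:
        return DefaultParser
    if len(valid) > 1:
        logger.warning("Multiple date parsers found, using the first by name")
    # min() over the dict keys gives the alphabetically first plugin name.
    return valid[min(valid)]


class AlphaParser(BaseParser):
    pass


class BetaParser(BaseParser):
    pass


def failing_load() -> object:
    raise ImportError("cannot import")


chosen = select_parser_class(
    [("beta", lambda: BetaParser), ("alpha", lambda: AlphaParser)],
)
fallback = select_parser_class(
    [("failing_plugin", failing_load), ("bad_plugin", lambda: object)],
)
```

Sorting by name makes the choice deterministic regardless of the order in which plugins are discovered, which is what `test_returns_first_valid_plugin_by_name` checks by passing the entries in reverse order.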

View File

@@ -0,0 +1,433 @@
import datetime
import logging
from typing import Any
import pytest
import pytest_mock
from documents.plugins.date_parsing.base import DateParserConfig
from documents.plugins.date_parsing.regex_parser import RegexDateParserPlugin
@pytest.mark.date_parsing
class TestParseString:
"""Tests for DateParserPluginBase._parse_string via RegexDateParserPlugin."""
@pytest.mark.parametrize(
("date_string", "date_order", "expected_year"),
[
pytest.param("15/01/2024", "DMY", 2024, id="dmy_slash"),
pytest.param("01/15/2024", "MDY", 2024, id="mdy_slash"),
pytest.param("2024/01/15", "YMD", 2024, id="ymd_slash"),
pytest.param("January 15, 2024", "DMY", 2024, id="month_name_comma"),
pytest.param("15 Jan 2024", "DMY", 2024, id="day_abbr_month_year"),
pytest.param("15.01.2024", "DMY", 2024, id="dmy_dot"),
pytest.param("2024-01-15", "YMD", 2024, id="ymd_dash"),
],
)
def test_parse_string_valid_formats(
self,
regex_parser: RegexDateParserPlugin,
date_string: str,
date_order: str,
expected_year: int,
) -> None:
"""Should correctly parse various valid date formats."""
result = regex_parser._parse_string(date_string, date_order)
assert result is not None
assert result.year == expected_year
@pytest.mark.parametrize(
"invalid_string",
[
pytest.param("not a date", id="plain_text"),
pytest.param("32/13/2024", id="invalid_day_month"),
pytest.param("", id="empty_string"),
pytest.param("abc123xyz", id="alphanumeric_gibberish"),
pytest.param("99/99/9999", id="out_of_range"),
],
)
def test_parse_string_invalid_input(
self,
regex_parser: RegexDateParserPlugin,
invalid_string: str,
) -> None:
"""Should return None for invalid date strings."""
result = regex_parser._parse_string(invalid_string, "DMY")
assert result is None
def test_parse_string_handles_exceptions(
self,
caplog: pytest.LogCaptureFixture,
mocker: pytest_mock.MockerFixture,
regex_parser: RegexDateParserPlugin,
) -> None:
"""Should handle and log exceptions from dateparser gracefully."""
with caplog.at_level(
logging.ERROR,
logger="documents.plugins.date_parsing.base",
):
# We still need to mock dateparser.parse to force the exception
mocker.patch(
"documents.plugins.date_parsing.base.dateparser.parse",
side_effect=ValueError(
"Parsing error: 01/01/2024",
),
)
# Execute the function under test
result = regex_parser._parse_string("01/01/2024", "DMY")
assert result is None
# Check if an error was logged
assert len(caplog.records) == 1
assert caplog.records[0].levelname == "ERROR"
# Check if the specific error message is present
assert "Error while parsing date string" in caplog.text
# Check that the original exception message is included in the log
assert "Parsing error: 01/01/2024" in caplog.text
@pytest.mark.date_parsing
class TestFilterDate:
"""Tests for DateParserPluginBase._filter_date via RegexDateParserPlugin."""
@pytest.mark.parametrize(
("date", "expected_output"),
[
# Valid Dates
pytest.param(
datetime.datetime(2024, 1, 10, tzinfo=datetime.timezone.utc),
datetime.datetime(2024, 1, 10, tzinfo=datetime.timezone.utc),
id="valid_past_date",
),
pytest.param(
datetime.datetime(2024, 1, 15, 12, 0, 0, tzinfo=datetime.timezone.utc),
datetime.datetime(2024, 1, 15, 12, 0, 0, tzinfo=datetime.timezone.utc),
id="exactly_at_reference",
),
pytest.param(
datetime.datetime(1901, 1, 1, tzinfo=datetime.timezone.utc),
datetime.datetime(1901, 1, 1, tzinfo=datetime.timezone.utc),
id="year_1901_valid",
),
# Date is > reference_time
pytest.param(
datetime.datetime(2024, 1, 16, tzinfo=datetime.timezone.utc),
None,
id="future_date_day_after",
),
# date.date() in ignore_dates
pytest.param(
datetime.datetime(2024, 1, 1, 0, 0, 0, tzinfo=datetime.timezone.utc),
None,
id="ignored_date_midnight_jan1",
),
pytest.param(
datetime.datetime(2024, 1, 1, 10, 30, 0, tzinfo=datetime.timezone.utc),
None,
id="ignored_date_midday_jan1",
),
pytest.param(
datetime.datetime(2024, 12, 25, 15, 0, 0, tzinfo=datetime.timezone.utc),
None,
id="ignored_date_dec25_future",
),
# date.year <= 1900
pytest.param(
datetime.datetime(1899, 12, 31, tzinfo=datetime.timezone.utc),
None,
id="year_1899",
),
pytest.param(
datetime.datetime(1900, 1, 1, tzinfo=datetime.timezone.utc),
None,
id="year_1900_boundary",
),
# date is None
pytest.param(None, None, id="none_input"),
],
)
def test_filter_date_validation_rules(
self,
config_with_ignore_dates: DateParserConfig,
date: datetime.datetime | None,
expected_output: datetime.datetime | None,
) -> None:
"""Should correctly validate dates against various rules."""
parser = RegexDateParserPlugin(config_with_ignore_dates)
result = parser._filter_date(date)
assert result == expected_output
def test_filter_date_respects_ignore_dates(
self,
config_with_ignore_dates: DateParserConfig,
) -> None:
"""Should filter out dates in the ignore_dates set."""
parser = RegexDateParserPlugin(config_with_ignore_dates)
ignored_date = datetime.datetime(
2024,
1,
1,
12,
0,
tzinfo=datetime.timezone.utc,
)
another_ignored = datetime.datetime(
2024,
12,
25,
15,
30,
tzinfo=datetime.timezone.utc,
)
allowed_date = datetime.datetime(
2024,
1,
2,
12,
0,
tzinfo=datetime.timezone.utc,
)
assert parser._filter_date(ignored_date) is None
assert parser._filter_date(another_ignored) is None
assert parser._filter_date(allowed_date) == allowed_date
def test_filter_date_timezone_aware(
self,
regex_parser: RegexDateParserPlugin,
) -> None:
"""Should work with timezone-aware datetimes."""
date_utc = datetime.datetime(2024, 1, 10, 12, 0, tzinfo=datetime.timezone.utc)
result = regex_parser._filter_date(date_utc)
assert result is not None
assert result.tzinfo is not None
@pytest.mark.date_parsing
class TestRegexDateParser:
@pytest.mark.parametrize(
("filename", "content", "expected"),
[
pytest.param(
"report-2023-12-25.txt",
"Event recorded on 25/12/2022.",
[
datetime.datetime(2023, 12, 25, tzinfo=datetime.timezone.utc),
datetime.datetime(2022, 12, 25, tzinfo=datetime.timezone.utc),
],
id="filename-y-m-d_and_content-d-m-y",
),
pytest.param(
"img_2023.01.02.jpg",
"Taken on 01/02/2023",
[
datetime.datetime(2023, 1, 2, tzinfo=datetime.timezone.utc),
datetime.datetime(2023, 2, 1, tzinfo=datetime.timezone.utc),
],
id="ambiguous-dates-respect-orders",
),
pytest.param(
"notes.txt",
"bad date 99/99/9999 and 25/12/2022",
[
datetime.datetime(2022, 12, 25, tzinfo=datetime.timezone.utc),
],
id="parse-exception-skips-bad-and-yields-good",
),
],
)
def test_parse_returns_expected_dates(
self,
base_config: DateParserConfig,
mocker: pytest_mock.MockerFixture,
filename: str,
content: str,
expected: list[datetime.datetime],
) -> None:
"""
High-level tests that exercise RegexDateParser.parse only.
dateparser.parse is mocked so tests are deterministic.
"""
parser = RegexDateParserPlugin(base_config)
# Patch the dateparser.parse
target = "documents.plugins.date_parsing.base.dateparser.parse"
def fake_parse(
date_string: str,
settings: dict[str, Any] | None = None,
locales: None = None,
) -> datetime.datetime | None:
date_order = settings.get("DATE_ORDER") if settings else None
# Filename-style YYYY-MM-DD / YYYY.MM.DD
if (
"2023-12-25" in date_string
or "2023.12.25" in date_string
):
return datetime.datetime(2023, 12, 25, tzinfo=datetime.timezone.utc)
# content DMY 25/12/2022
if "25/12/2022" in date_string or "25-12-2022" in date_string:
return datetime.datetime(2022, 12, 25, tzinfo=datetime.timezone.utc)
# filename YMD 2023.01.02
if "2023.01.02" in date_string or "2023-01-02" in date_string:
return datetime.datetime(2023, 1, 2, tzinfo=datetime.timezone.utc)
# ambiguous 01/02/2023 -> respect DATE_ORDER setting
if "01/02/2023" in date_string:
if date_order == "DMY":
return datetime.datetime(2023, 2, 1, tzinfo=datetime.timezone.utc)
if date_order == "YMD":
return datetime.datetime(2023, 1, 2, tzinfo=datetime.timezone.utc)
# fallback
return datetime.datetime(2023, 2, 1, tzinfo=datetime.timezone.utc)
# simulate parse failure for malformed input
if "99/99/9999" in date_string or "bad date" in date_string:
raise Exception("parse failed for malformed date")
return None
mocker.patch(target, side_effect=fake_parse)
results = list(parser.parse(filename, content))
assert results == expected
for dt in results:
assert dt.tzinfo is not None
def test_parse_filters_future_and_ignored_dates(
self,
mocker: pytest_mock.MockerFixture,
) -> None:
"""
Ensure parser filters out:
- dates after reference_time
- dates whose .date() are in ignore_dates
"""
cfg = DateParserConfig(
languages=["en"],
timezone_str="UTC",
ignore_dates={datetime.date(2023, 12, 10)},
reference_time=datetime.datetime(
2024,
1,
15,
12,
0,
0,
tzinfo=datetime.timezone.utc,
),
filename_date_order="YMD",
content_date_order="DMY",
)
parser = RegexDateParserPlugin(cfg)
target = "documents.plugins.date_parsing.base.dateparser.parse"
def fake_parse(
date_string: str,
settings: dict[str, Any] | None = None,
locales: None = None,
) -> datetime.datetime | None:
if "10/12/2023" in date_string or "10-12-2023" in date_string:
# ignored date
return datetime.datetime(2023, 12, 10, tzinfo=datetime.timezone.utc)
if "01/02/2024" in date_string or "01-02-2024" in date_string:
# future relative to reference_time -> filtered
return datetime.datetime(2024, 2, 1, tzinfo=datetime.timezone.utc)
if "05/01/2023" in date_string or "05-01-2023" in date_string:
# valid
return datetime.datetime(2023, 1, 5, tzinfo=datetime.timezone.utc)
return None
mocker.patch(target, side_effect=fake_parse)
content = "Ignored: 10/12/2023, Future: 01/02/2024, Keep: 05/01/2023"
results = list(parser.parse("whatever.txt", content))
assert results == [datetime.datetime(2023, 1, 5, tzinfo=datetime.timezone.utc)]
def test_parse_handles_no_matches_and_returns_empty_list(
self,
base_config: DateParserConfig,
) -> None:
"""
When there are no matching date-like substrings, parse should yield nothing.
"""
parser = RegexDateParserPlugin(base_config)
results = list(
parser.parse("no-dates.txt", "this has no dates whatsoever"),
)
assert results == []
def test_parse_skips_filename_when_filename_date_order_none(
self,
mocker: pytest_mock.MockerFixture,
) -> None:
"""
When filename_date_order is None, the parser must not attempt to parse the filename.
Only dates found in the content should be passed to dateparser.parse.
"""
cfg = DateParserConfig(
languages=["en"],
timezone_str="UTC",
ignore_dates=set(),
reference_time=datetime.datetime(
2024,
1,
15,
12,
0,
0,
tzinfo=datetime.timezone.utc,
),
filename_date_order=None,
content_date_order="DMY",
)
parser = RegexDateParserPlugin(cfg)
# Patch the module's dateparser.parse so we can inspect calls
target = "documents.plugins.date_parsing.base.dateparser.parse"
def fake_parse(
date_string: str,
settings: dict[str, Any] | None = None,
locales: None = None,
) -> datetime.datetime | None:
# return distinct datetimes so we can tell which source was parsed
if "25/12/2022" in date_string:
return datetime.datetime(2022, 12, 25, tzinfo=datetime.timezone.utc)
if "2023-12-25" in date_string:
return datetime.datetime(2023, 12, 25, tzinfo=datetime.timezone.utc)
return None
mock = mocker.patch(target, side_effect=fake_parse)
filename = "report-2023-12-25.txt"
content = "Event recorded on 25/12/2022."
results = list(parser.parse(filename, content))
# Only the content date should have been parsed -> one call
assert mock.call_count == 1
# First call, first positional arg
called_date_string = mock.call_args_list[0][0][0]
assert "25/12/2022" in called_date_string
# And the parser should have yielded the corresponding datetime
assert results == [
datetime.datetime(2022, 12, 25, tzinfo=datetime.timezone.utc),
]

View File

@@ -1989,11 +1989,11 @@ class TestDocumentApi(DirectoriesMixin, DocumentConsumeDelayMixin, APITestCase):
response = self.client.get(f"/api/documents/{doc.pk}/suggestions/")
self.assertEqual(response.status_code, status.HTTP_200_OK)
@mock.patch("documents.parsers.parse_date_generator")
@mock.patch("documents.views.get_date_parser")
@override_settings(NUMBER_OF_SUGGESTED_DATES=0)
def test_get_suggestions_dates_disabled(
self,
parse_date_generator,
mock_get_date_parser: mock.MagicMock,
):
"""
GIVEN:
@@ -2010,7 +2010,8 @@ class TestDocumentApi(DirectoriesMixin, DocumentConsumeDelayMixin, APITestCase):
)
self.client.get(f"/api/documents/{doc.pk}/suggestions/")
self.assertFalse(parse_date_generator.called)
mock_get_date_parser.assert_not_called()
def test_saved_views(self) -> None:
u1 = User.objects.create_superuser("user1")

View File

@@ -838,3 +838,61 @@ class TestApiWorkflows(DirectoriesMixin, APITestCase):
self.assertEqual(response.status_code, status.HTTP_200_OK)
self.action.refresh_from_db()
self.assertEqual(self.action.assign_title, "Patched Title")
def test_password_action_passwords_field(self):
"""
GIVEN:
- Nothing
WHEN:
- A workflow password removal action is created with passwords set
THEN:
- The passwords field is correctly stored and retrieved
"""
passwords = ["password1", "password2", "password3"]
response = self.client.post(
"/api/workflow_actions/",
json.dumps(
{
"type": WorkflowAction.WorkflowActionType.PASSWORD_REMOVAL,
"passwords": passwords,
},
),
content_type="application/json",
)
self.assertEqual(response.status_code, status.HTTP_201_CREATED)
self.assertEqual(response.data["passwords"], passwords)
def test_password_action_invalid_passwords_field(self):
"""
GIVEN:
- Nothing
WHEN:
- A workflow password removal action is created with invalid passwords field
THEN:
- The required validation error is raised
"""
for payload in [
{"type": WorkflowAction.WorkflowActionType.PASSWORD_REMOVAL},
{
"type": WorkflowAction.WorkflowActionType.PASSWORD_REMOVAL,
"passwords": "",
},
{
"type": WorkflowAction.WorkflowActionType.PASSWORD_REMOVAL,
"passwords": [],
},
{
"type": WorkflowAction.WorkflowActionType.PASSWORD_REMOVAL,
"passwords": ["", "password2"],
},
]:
response = self.client.post(
"/api/workflow_actions/",
json.dumps(payload),
content_type="application/json",
)
self.assertEqual(response.status_code, status.HTTP_400_BAD_REQUEST)
self.assertIn(
"Passwords are required",
str(response.data["non_field_errors"][0]),
)

View File

@@ -1,538 +0,0 @@
import datetime
from zoneinfo import ZoneInfo
import pytest
from pytest_django.fixtures import SettingsWrapper
from documents.parsers import parse_date
from documents.parsers import parse_date_generator
@pytest.mark.django_db()
class TestDate:
def test_date_format_1(self) -> None:
text = "lorem ipsum 130218 lorem ipsum"
assert parse_date("", text) is None
def test_date_format_2(self) -> None:
text = "lorem ipsum 2018 lorem ipsum"
assert parse_date("", text) is None
def test_date_format_3(self) -> None:
text = "lorem ipsum 20180213 lorem ipsum"
assert parse_date("", text) is None
def test_date_format_4(self, settings_timezone: ZoneInfo) -> None:
text = "lorem ipsum 13.02.2018 lorem ipsum"
date = parse_date("", text)
assert date == datetime.datetime(2018, 2, 13, 0, 0, tzinfo=settings_timezone)
def test_date_format_5(self, settings_timezone: ZoneInfo) -> None:
text = "lorem ipsum 130218, 2018, 20180213 and lorem 13.02.2018 lorem ipsum"
date = parse_date("", text)
assert date == datetime.datetime(2018, 2, 13, 0, 0, tzinfo=settings_timezone)
def test_date_format_6(self) -> None:
text = (
"lorem ipsum\n"
"Wohnort\n"
"3100\n"
"IBAN\n"
"AT87 4534\n"
"1234\n"
"1234 5678\n"
"BIC\n"
"lorem ipsum"
)
assert parse_date("", text) is None
def test_date_format_7(
self,
settings: SettingsWrapper,
settings_timezone: ZoneInfo,
) -> None:
settings.DATE_PARSER_LANGUAGES = ["de"]
text = "lorem ipsum\nMärz 2019\nlorem ipsum"
date = parse_date("", text)
assert date == datetime.datetime(2019, 3, 1, 0, 0, tzinfo=settings_timezone)
def test_date_format_8(
self,
settings: SettingsWrapper,
settings_timezone: ZoneInfo,
) -> None:
settings.DATE_PARSER_LANGUAGES = ["de"]
text = (
"lorem ipsum\n"
"Wohnort\n"
"3100\n"
"IBAN\n"
"AT87 4534\n"
"1234\n"
"1234 5678\n"
"BIC\n"
"lorem ipsum\n"
"März 2020"
)
assert parse_date("", text) == datetime.datetime(
2020,
3,
1,
0,
0,
tzinfo=settings_timezone,
)
def test_date_format_9(
self,
settings: SettingsWrapper,
settings_timezone: ZoneInfo,
) -> None:
settings.DATE_PARSER_LANGUAGES = ["de"]
text = "lorem ipsum\n27. Nullmonth 2020\nMärz 2020\nlorem ipsum"
assert parse_date("", text) == datetime.datetime(
2020,
3,
1,
0,
0,
tzinfo=settings_timezone,
)
def test_date_format_10(self, settings_timezone: ZoneInfo) -> None:
text = "Customer Number Currency 22-MAR-2022 Credit Card 1934829304"
assert parse_date("", text) == datetime.datetime(
2022,
3,
22,
0,
0,
tzinfo=settings_timezone,
)
def test_date_format_11(self, settings_timezone: ZoneInfo) -> None:
text = "Customer Number Currency 22 MAR 2022 Credit Card 1934829304"
assert parse_date("", text) == datetime.datetime(
2022,
3,
22,
0,
0,
tzinfo=settings_timezone,
)
def test_date_format_12(self, settings_timezone: ZoneInfo) -> None:
text = "Customer Number Currency 22/MAR/2022 Credit Card 1934829304"
assert parse_date("", text) == datetime.datetime(
2022,
3,
22,
0,
0,
tzinfo=settings_timezone,
)
def test_date_format_13(self, settings_timezone: ZoneInfo) -> None:
text = "Customer Number Currency 22.MAR.2022 Credit Card 1934829304"
assert parse_date("", text) == datetime.datetime(
2022,
3,
22,
0,
0,
tzinfo=settings_timezone,
)
def test_date_format_14(self, settings_timezone: ZoneInfo) -> None:
text = "Customer Number Currency 22.MAR 2022 Credit Card 1934829304"
assert parse_date("", text) == datetime.datetime(
2022,
3,
22,
0,
0,
tzinfo=settings_timezone,
)
def test_date_format_15(self) -> None:
text = "Customer Number Currency 22.MAR.22 Credit Card 1934829304"
assert parse_date("", text) is None
def test_date_format_16(self) -> None:
text = "Customer Number Currency 22.MAR,22 Credit Card 1934829304"
assert parse_date("", text) is None
def test_date_format_17(self) -> None:
text = "Customer Number Currency 22,MAR,2022 Credit Card 1934829304"
assert parse_date("", text) is None
def test_date_format_18(self) -> None:
text = "Customer Number Currency 22 MAR,2022 Credit Card 1934829304"
assert parse_date("", text) is None
def test_date_format_19(self, settings_timezone: ZoneInfo) -> None:
text = "Customer Number Currency 21st MAR 2022 Credit Card 1934829304"
assert parse_date("", text) == datetime.datetime(
2022,
3,
21,
0,
0,
tzinfo=settings_timezone,
)
def test_date_format_20(self, settings_timezone: ZoneInfo) -> None:
text = "Customer Number Currency 22nd March 2022 Credit Card 1934829304"
assert parse_date("", text) == datetime.datetime(
2022,
3,
22,
0,
0,
tzinfo=settings_timezone,
)
def test_date_format_21(self, settings_timezone: ZoneInfo) -> None:
text = "Customer Number Currency 2nd MAR 2022 Credit Card 1934829304"
assert parse_date("", text) == datetime.datetime(
2022,
3,
2,
0,
0,
tzinfo=settings_timezone,
)
def test_date_format_22(self, settings_timezone: ZoneInfo) -> None:
text = "Customer Number Currency 23rd MAR 2022 Credit Card 1934829304"
assert parse_date("", text) == datetime.datetime(
2022,
3,
23,
0,
0,
tzinfo=settings_timezone,
)
def test_date_format_23(self, settings_timezone: ZoneInfo) -> None:
text = "Customer Number Currency 24th MAR 2022 Credit Card 1934829304"
assert parse_date("", text) == datetime.datetime(
2022,
3,
24,
0,
0,
tzinfo=settings_timezone,
)
def test_date_format_24(self, settings_timezone: ZoneInfo) -> None:
text = "Customer Number Currency 21-MAR-2022 Credit Card 1934829304"
assert parse_date("", text) == datetime.datetime(
2022,
3,
21,
0,
0,
tzinfo=settings_timezone,
)
def test_date_format_25(self, settings_timezone: ZoneInfo) -> None:
text = "Customer Number Currency 25TH MAR 2022 Credit Card 1934829304"
assert parse_date("", text) == datetime.datetime(
2022,
3,
25,
0,
0,
tzinfo=settings_timezone,
)
def test_date_format_26(self, settings_timezone: ZoneInfo) -> None:
text = "CHASE 0 September 25, 2019 JPMorgan Chase Bank, NA. P0 Box 182051"
assert parse_date("", text) == datetime.datetime(
2019,
9,
25,
0,
0,
tzinfo=settings_timezone,
)
def test_crazy_date_past(self) -> None:
assert parse_date("", "01-07-0590 00:00:00") is None
def test_crazy_date_future(self) -> None:
assert parse_date("", "01-07-2350 00:00:00") is None
def test_crazy_date_with_spaces(self) -> None:
assert parse_date("", "20 408000l 2475") is None
def test_utf_month_names(
self,
settings: SettingsWrapper,
settings_timezone: ZoneInfo,
) -> None:
settings.DATE_PARSER_LANGUAGES = ["fr", "de", "hr", "cs", "pl", "tr"]
assert parse_date("", "13 décembre 2023") == datetime.datetime(
2023,
12,
13,
0,
0,
tzinfo=settings_timezone,
)
assert parse_date("", "13 août 2022") == datetime.datetime(
2022,
8,
13,
0,
0,
tzinfo=settings_timezone,
)
assert parse_date("", "11 März 2020") == datetime.datetime(
2020,
3,
11,
0,
0,
tzinfo=settings_timezone,
)
assert parse_date("", "17. ožujka 2018.") == datetime.datetime(
2018,
3,
17,
0,
0,
tzinfo=settings_timezone,
)
assert parse_date("", "1. veljače 2016.") == datetime.datetime(
2016,
2,
1,
0,
0,
tzinfo=settings_timezone,
)
assert parse_date("", "15. února 1985") == datetime.datetime(
1985,
2,
15,
0,
0,
tzinfo=settings_timezone,
)
assert parse_date("", "30. září 2011") == datetime.datetime(
2011,
9,
30,
0,
0,
tzinfo=settings_timezone,
)
assert parse_date("", "28. května 1990") == datetime.datetime(
1990,
5,
28,
0,
0,
tzinfo=settings_timezone,
)
assert parse_date("", "1. grudzień 1997") == datetime.datetime(
1997,
12,
1,
0,
0,
tzinfo=settings_timezone,
)
assert parse_date("", "17 Şubat 2024") == datetime.datetime(
2024,
2,
17,
0,
0,
tzinfo=settings_timezone,
)
assert parse_date("", "30 Ağustos 2012") == datetime.datetime(
2012,
8,
30,
0,
0,
tzinfo=settings_timezone,
)
assert parse_date("", "17 Eylül 2000") == datetime.datetime(
2000,
9,
17,
0,
0,
tzinfo=settings_timezone,
)
assert parse_date("", "5. október 1992") == datetime.datetime(
1992,
10,
5,
0,
0,
tzinfo=settings_timezone,
)
def test_multiple_dates(self, settings_timezone: ZoneInfo) -> None:
text = """This text has multiple dates.
For example 02.02.2018, 22 July 2022 and December 2021.
But not 24-12-9999 because it's in the future..."""
dates = list(parse_date_generator("", text))
assert dates == [
datetime.datetime(2018, 2, 2, 0, 0, tzinfo=settings_timezone),
datetime.datetime(
2022,
7,
22,
0,
0,
tzinfo=settings_timezone,
),
datetime.datetime(
2021,
12,
1,
0,
0,
tzinfo=settings_timezone,
),
]
def test_filename_date_parse_valid_ymd(
self,
settings: SettingsWrapper,
settings_timezone: ZoneInfo,
) -> None:
"""
GIVEN:
- Date parsing from the filename is enabled
- Filename date format is with Year Month Day (YMD)
- Filename contains date matching the format
THEN:
- Should parse the date from the filename
"""
settings.FILENAME_DATE_ORDER = "YMD"
assert parse_date(
"/tmp/Scan-2022-04-01.pdf",
"No date in here",
) == datetime.datetime(2022, 4, 1, 0, 0, tzinfo=settings_timezone)
def test_filename_date_parse_valid_dmy(
self,
settings: SettingsWrapper,
settings_timezone: ZoneInfo,
) -> None:
"""
GIVEN:
- Date parsing from the filename is enabled
- Filename date format is with Day Month Year (DMY)
- Filename contains date matching the format
THEN:
- Should parse the date from the filename
"""
settings.FILENAME_DATE_ORDER = "DMY"
assert parse_date(
"/tmp/Scan-10.01.2021.pdf",
"No date in here",
) == datetime.datetime(2021, 1, 10, 0, 0, tzinfo=settings_timezone)
def test_filename_date_parse_invalid(self, settings: SettingsWrapper) -> None:
"""
GIVEN:
- Date parsing from the filename is enabled
- Filename includes no date
- File content includes no date
THEN:
- No date is parsed
"""
settings.FILENAME_DATE_ORDER = "YMD"
assert parse_date("/tmp/20 408000l 2475 - test.pdf", "No date in here") is None
def test_filename_date_ignored_use_content(
self,
settings: SettingsWrapper,
settings_timezone: ZoneInfo,
) -> None:
"""
GIVEN:
- Date parsing from the filename is enabled
- Filename date format is with Year Month Day (YMD)
- Date order is Day Month Year (DMY, the default)
- Filename contains date matching the format
- Filename date is an ignored date
- File content includes a date
THEN:
- Should parse the date from the content, not the filename
"""
settings.FILENAME_DATE_ORDER = "YMD"
settings.IGNORE_DATES = (datetime.date(2022, 4, 1),)
assert parse_date(
"/tmp/Scan-2022-04-01.pdf",
"The matching date is 24.03.2022",
) == datetime.datetime(2022, 3, 24, 0, 0, tzinfo=settings_timezone)
def test_ignored_dates_default_order(
self,
settings: SettingsWrapper,
settings_timezone: ZoneInfo,
) -> None:
"""
GIVEN:
- Ignore dates have been set
- File content includes ignored dates
- File content includes 1 non-ignored date
THEN:
- Should parse the non-ignored date from content
"""
settings.IGNORE_DATES = (datetime.date(2019, 11, 3), datetime.date(2020, 1, 17))
text = "lorem ipsum 110319, 20200117 and lorem 13.02.2018 lorem ipsum"
assert parse_date("", text) == datetime.datetime(
2018,
2,
13,
0,
0,
tzinfo=settings_timezone,
)
def test_ignored_dates_order_ymd(
self,
settings: SettingsWrapper,
settings_timezone: ZoneInfo,
) -> None:
"""
GIVEN:
- Ignore dates have been set
- Date order is Year Month Day (YMD)
- File content includes ignored dates
- File content includes 1 non-ignored date
THEN:
- Should parse the non-ignored date from content
"""
settings.FILENAME_DATE_ORDER = "YMD"
settings.IGNORE_DATES = (datetime.date(2019, 11, 3), datetime.date(2020, 1, 17))
text = "lorem ipsum 190311, 20200117 and lorem 13.02.2018 lorem ipsum"
assert parse_date("", text) == datetime.datetime(
2018,
2,
13,
0,
0,
tzinfo=settings_timezone,
)
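The tests above imply three rejection rules for candidate dates: implausibly old years, dates in the future, and dates listed in `IGNORE_DATES`. The following is a minimal sketch of such a filter, not the project's implementation. The name `filter_dates`, the `min_year` cut-off of 1900, and the fixed `now` value are assumptions made for illustration.

```python
import datetime
from collections.abc import Iterable, Iterator


def filter_dates(
    candidates: Iterable[datetime.datetime],
    ignore_dates: set[datetime.date],
    now: datetime.datetime,
    min_year: int = 1900,  # assumed cut-off, not taken from the diff
) -> Iterator[datetime.datetime]:
    """Yield only plausible, non-ignored dates, preserving input order."""
    for candidate in candidates:
        if candidate.year <= min_year:
            continue  # e.g. the year 0590 is treated as noise
        if candidate > now:
            continue  # dates in the future are rejected
        if candidate.date() in ignore_dates:
            continue  # explicitly ignored by configuration
        yield candidate


utc = datetime.timezone.utc
now = datetime.datetime(2026, 2, 3, tzinfo=utc)  # fixed "now" for the example
found = [
    datetime.datetime(590, 7, 1, tzinfo=utc),
    datetime.datetime(2019, 11, 3, tzinfo=utc),
    datetime.datetime(2018, 2, 13, tzinfo=utc),
    datetime.datetime(2350, 7, 1, tzinfo=utc),
]
kept = list(filter_dates(found, {datetime.date(2019, 11, 3)}, now))
print(kept)
```

Only the 2018 date survives here, which mirrors what `test_ignored_dates_default_order` and the `test_crazy_date_*` cases expect from `parse_date`.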


@@ -2,6 +2,7 @@ import datetime
import json
import shutil
import socket
import tempfile
from datetime import timedelta
from pathlib import Path
from typing import TYPE_CHECKING
@@ -60,6 +61,7 @@ from documents.tests.utils import DirectoriesMixin
from documents.tests.utils import DummyProgressManager
from documents.tests.utils import FileSystemAssertsMixin
from documents.tests.utils import SampleDirMixin
from documents.workflows.actions import execute_password_removal_action
from paperless_mail.models import MailAccount
from paperless_mail.models import MailRule
@@ -3722,6 +3724,196 @@ class TestWorkflows(
mock_post.assert_called_once()
@mock.patch("documents.bulk_edit.remove_password")
def test_password_removal_action_attempts_multiple_passwords(
self,
mock_remove_password,
):
"""
GIVEN:
- Workflow password removal action
- Multiple passwords provided
WHEN:
- Document updated triggering the workflow
THEN:
- Password removal is attempted until one succeeds
"""
doc = Document.objects.create(
title="Protected",
checksum="pw-checksum",
)
trigger = WorkflowTrigger.objects.create(
type=WorkflowTrigger.WorkflowTriggerType.DOCUMENT_UPDATED,
)
action = WorkflowAction.objects.create(
type=WorkflowAction.WorkflowActionType.PASSWORD_REMOVAL,
passwords="wrong, right\n extra ",
)
workflow = Workflow.objects.create(name="Password workflow")
workflow.triggers.add(trigger)
workflow.actions.add(action)
mock_remove_password.side_effect = [
ValueError("wrong password"),
"OK",
]
run_workflows(trigger.type, doc)
assert mock_remove_password.call_count == 2
mock_remove_password.assert_has_calls(
[
mock.call(
[doc.id],
password="wrong",
update_document=True,
user=doc.owner,
),
mock.call(
[doc.id],
password="right",
update_document=True,
user=doc.owner,
),
],
)
@mock.patch("documents.bulk_edit.remove_password")
def test_password_removal_action_fails_without_correct_password(
self,
mock_remove_password,
):
"""
GIVEN:
- Workflow password removal action
- Password list contains only separators and whitespace
WHEN:
- Document updated triggering the workflow
THEN:
- Password removal is not attempted because no usable password remains
"""
doc = Document.objects.create(
title="Protected",
checksum="pw-checksum-2",
)
trigger = WorkflowTrigger.objects.create(
type=WorkflowTrigger.WorkflowTriggerType.DOCUMENT_UPDATED,
)
action = WorkflowAction.objects.create(
type=WorkflowAction.WorkflowActionType.PASSWORD_REMOVAL,
passwords=" \n , ",
)
workflow = Workflow.objects.create(name="Password workflow missing passwords")
workflow.triggers.add(trigger)
workflow.actions.add(action)
run_workflows(trigger.type, doc)
mock_remove_password.assert_not_called()
@mock.patch("documents.bulk_edit.remove_password")
def test_password_removal_action_skips_without_passwords(
self,
mock_remove_password,
):
"""
GIVEN:
- Workflow password removal action with no passwords
WHEN:
- Workflow is run
THEN:
- Password removal is not attempted
"""
doc = Document.objects.create(
title="Protected",
checksum="pw-checksum-2",
)
trigger = WorkflowTrigger.objects.create(
type=WorkflowTrigger.WorkflowTriggerType.DOCUMENT_UPDATED,
)
action = WorkflowAction.objects.create(
type=WorkflowAction.WorkflowActionType.PASSWORD_REMOVAL,
passwords="",
)
workflow = Workflow.objects.create(name="Password workflow missing passwords")
workflow.triggers.add(trigger)
workflow.actions.add(action)
run_workflows(trigger.type, doc)
mock_remove_password.assert_not_called()
@mock.patch("documents.bulk_edit.remove_password")
def test_password_removal_consumable_document_deferred(
self,
mock_remove_password,
):
"""
GIVEN:
- Workflow password removal action
- Simulated consumption trigger (a ConsumableDocument is used)
WHEN:
- Document consumption is finished
THEN:
- Password removal is attempted
"""
action = WorkflowAction.objects.create(
type=WorkflowAction.WorkflowActionType.PASSWORD_REMOVAL,
passwords="first, second",
)
temp_dir = Path(tempfile.mkdtemp())
original_file = temp_dir / "file.pdf"
original_file.write_bytes(b"pdf content")
consumable = ConsumableDocument(
source=DocumentSource.ApiUpload,
original_file=original_file,
)
execute_password_removal_action(action, consumable, logging_group=None)
mock_remove_password.assert_not_called()
mock_remove_password.side_effect = [
ValueError("bad password"),
"OK",
]
doc = Document.objects.create(
checksum="pw-checksum-consumed",
title="Protected",
)
document_consumption_finished.send(
sender=self.__class__,
document=doc,
)
assert mock_remove_password.call_count == 2
mock_remove_password.assert_has_calls(
[
mock.call(
[doc.id],
password="first",
update_document=True,
user=doc.owner,
),
mock.call(
[doc.id],
password="second",
update_document=True,
user=doc.owner,
),
],
)
# ensure handler disconnected after first run
document_consumption_finished.send(
sender=self.__class__,
document=doc,
)
assert mock_remove_password.call_count == 2
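The deferred test above relies on a handler that runs once and then disconnects itself. The sketch below shows that one-shot pattern in isolation. `MiniSignal` is a hypothetical stand-in for `django.dispatch.Signal` so the example runs without Django, and `defer_until_consumed` is an illustrative name, not part of the change.

```python
from collections.abc import Callable


class MiniSignal:
    """Tiny stand-in for django.dispatch.Signal (illustration only)."""

    def __init__(self) -> None:
        self._receivers: list[Callable[..., None]] = []

    def connect(self, receiver: Callable[..., None]) -> None:
        self._receivers.append(receiver)

    def disconnect(self, receiver: Callable[..., None]) -> None:
        if receiver in self._receivers:
            self._receivers.remove(receiver)

    def send(self, sender: object, **kwargs: object) -> None:
        # Iterate over a copy so receivers may disconnect themselves.
        for receiver in list(self._receivers):
            receiver(sender, **kwargs)


consumption_finished = MiniSignal()
calls: list[str] = []


def defer_until_consumed(action_name: str) -> None:
    """Register a handler that fires once, then removes itself."""

    def handler(sender: object, **kwargs: object) -> None:
        document = kwargs.get("document")
        if document is not None:
            calls.append(f"{action_name}:{document}")
        consumption_finished.disconnect(handler)

    consumption_finished.connect(handler)


defer_until_consumed("remove-password")
consumption_finished.send(sender=None, document="doc-1")
consumption_finished.send(sender=None, document="doc-1")  # no handler left
print(calls)
```

In this shape the handler reacts to whichever send arrives first. It does not check that the finished document is the one it was registered for.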
class TestWebhookSend:
def test_send_webhook_data_or_json(


@@ -148,7 +148,6 @@ from documents.models import Workflow
from documents.models import WorkflowAction
from documents.models import WorkflowTrigger
from documents.parsers import get_parser_class_for_mime_type
from documents.parsers import parse_date_generator
from documents.permissions import AcknowledgeTasksPermissions
from documents.permissions import PaperlessAdminPermissions
from documents.permissions import PaperlessNotePermissions
@@ -158,6 +157,7 @@ from documents.permissions import get_document_count_filter_for_user
from documents.permissions import get_objects_for_user_owner_aware
from documents.permissions import has_perms_owner_aware
from documents.permissions import set_permissions_for_object
from documents.plugins.date_parsing import get_date_parser
from documents.schema import generate_object_with_permissions_schema
from documents.serialisers import AcknowledgeTasksViewSerializer
from documents.serialisers import BulkDownloadSerializer
@@ -1023,16 +1023,17 @@ class DocumentViewSet(
dates = []
if settings.NUMBER_OF_SUGGESTED_DATES > 0:
gen = parse_date_generator(doc.filename, doc.content)
dates = sorted(
{
i
for i in itertools.islice(
gen,
settings.NUMBER_OF_SUGGESTED_DATES,
)
},
)
with get_date_parser() as date_parser:
gen = date_parser.parse(doc.filename, doc.content)
dates = sorted(
{
i
for i in itertools.islice(
gen,
settings.NUMBER_OF_SUGGESTED_DATES,
)
},
)
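The hunk above moves date suggestion behind a context-managed parser and consumes its generator inside the `with` block. Here is a minimal sketch of that shape. `ToyDateParser`, `get_toy_date_parser` and `suggest_dates` are hypothetical names, and the real plugin interface behind `get_date_parser()` may differ.

```python
import datetime
import itertools
from collections.abc import Iterator
from contextlib import contextmanager


class ToyDateParser:
    """Hypothetical parser that only understands ISO dates (YYYY-MM-DD)."""

    def parse(self, filename: str, content: str) -> Iterator[datetime.date]:
        for token in content.split():
            try:
                yield datetime.date.fromisoformat(token)
            except ValueError:
                continue  # not a date, skip the token


@contextmanager
def get_toy_date_parser() -> Iterator[ToyDateParser]:
    parser = ToyDateParser()
    try:
        yield parser
    finally:
        pass  # a real parser might release resources here


def suggest_dates(filename: str, content: str, limit: int) -> list[datetime.date]:
    if limit <= 0:
        return []
    with get_toy_date_parser() as parser:
        gen = parser.parse(filename, content)
        # Consume the lazy generator while the parser is still open.
        return sorted(set(itertools.islice(gen, limit)))


text = "paid 2022-07-22 due 2018-02-02 again 2022-07-22 later 2021-12-01"
print(suggest_dates("", text, 3))
```

Because the slice is taken before de-duplication, a repeated date can leave fewer than `limit` suggestions, as it does here. The set comprehension in the view has the same property.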
resp_data = {
"correspondents": [


@@ -1,4 +1,5 @@
import logging
import re
from pathlib import Path
from django.conf import settings
@@ -14,6 +15,7 @@ from documents.models import Document
from documents.models import DocumentType
from documents.models import WorkflowAction
from documents.models import WorkflowTrigger
from documents.signals import document_consumption_finished
from documents.templating.workflows import parse_w_workflow_placeholders
from documents.workflows.webhooks import send_webhook
@@ -265,3 +267,74 @@ def execute_webhook_action(
f"Error occurred sending webhook: {e}",
extra={"group": logging_group},
)
def execute_password_removal_action(
action: WorkflowAction,
document: Document | ConsumableDocument,
logging_group,
) -> None:
"""
Try to remove a password from a document using the configured list.
"""
passwords = action.passwords
if not passwords:
logger.warning(
"Password removal action %s has no passwords configured",
action.pk,
extra={"group": logging_group},
)
return
passwords = [
password.strip()
for password in re.split(r"[,\n]", passwords)
if password.strip()
]
if isinstance(document, ConsumableDocument):
# hook the consumption-finished signal to attempt password removal later
def handler(sender, **kwargs):
consumed_document: Document = kwargs.get("document")
if consumed_document is not None:
execute_password_removal_action(
action,
consumed_document,
logging_group,
)
document_consumption_finished.disconnect(handler)
document_consumption_finished.connect(handler, weak=False)
return
# import here to avoid circular dependency
from documents.bulk_edit import remove_password
for password in passwords:
try:
remove_password(
[document.id],
password=password,
update_document=True,
user=document.owner,
)
logger.info(
"Removed password from document %s using workflow action %s",
document.pk,
action.pk,
extra={"group": logging_group},
)
return
except ValueError as e:
logger.warning(
"Password removal failed for document %s with supplied password: %s",
document.pk,
e,
extra={"group": logging_group},
)
logger.error(
"Password removal failed for document %s after trying all provided passwords",
document.pk,
extra={"group": logging_group},
)
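The password handling in `execute_password_removal_action` has two steps: split the configured string on commas and newlines, then try each candidate until one stops raising `ValueError`. The sketch below reproduces both steps in isolation. `split_passwords`, `try_passwords` and `fake_unlock` are illustrative names, and `fake_unlock` stands in for `remove_password`.

```python
import re
from collections.abc import Callable
from typing import Optional


def split_passwords(raw: str) -> list[str]:
    """Split on commas or newlines, dropping blank entries."""
    return [p.strip() for p in re.split(r"[,\n]", raw) if p.strip()]


def try_passwords(
    passwords: list[str],
    unlock: Callable[[str], None],
) -> Optional[str]:
    """Return the first password accepted by `unlock`, or None."""
    for password in passwords:
        try:
            unlock(password)
        except ValueError:
            continue  # wrong password, try the next one
        return password
    return None


def fake_unlock(password: str) -> None:
    # Hypothetical stand-in for remove_password(): only "right" works.
    if password != "right":
        raise ValueError("wrong password")


print(split_passwords("wrong, right\n extra "))
print(split_passwords(" \n , "))
print(try_passwords(split_passwords("wrong, right\n extra "), fake_unlock))
```

One consequence of splitting on `,` is that a password containing a comma cannot be expressed in the `passwords` field.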


@@ -2,7 +2,7 @@ msgid ""
msgstr ""
"Project-Id-Version: paperless-ngx\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-01-31 17:12+0000\n"
"POT-Creation-Date: 2026-02-03 20:10+0000\n"
"PO-Revision-Date: 2022-02-17 04:17\n"
"Last-Translator: \n"
"Language-Team: English\n"
@@ -89,7 +89,7 @@ msgstr ""
msgid "Automatic"
msgstr ""
#: documents/models.py:66 documents/models.py:444 documents/models.py:1646
#: documents/models.py:66 documents/models.py:444 documents/models.py:1659
#: paperless_mail/models.py:23 paperless_mail/models.py:143
msgid "name"
msgstr ""
@@ -252,7 +252,7 @@ msgid "The position of this document in your physical document archive."
msgstr ""
#: documents/models.py:313 documents/models.py:688 documents/models.py:742
#: documents/models.py:1689
#: documents/models.py:1702
msgid "document"
msgstr ""
@@ -1089,183 +1089,197 @@ msgid "Webhook"
msgstr ""
#: documents/models.py:1410
msgid "Password removal"
msgstr ""
#: documents/models.py:1414
msgid "Workflow Action Type"
msgstr ""
#: documents/models.py:1415 documents/models.py:1648
#: documents/models.py:1419 documents/models.py:1661
#: paperless_mail/models.py:145
msgid "order"
msgstr ""
#: documents/models.py:1418
#: documents/models.py:1422
msgid "assign title"
msgstr ""
#: documents/models.py:1422
#: documents/models.py:1426
msgid "Assign a document title, must be a Jinja2 template, see documentation."
msgstr ""
#: documents/models.py:1430 paperless_mail/models.py:274
#: documents/models.py:1434 paperless_mail/models.py:274
msgid "assign this tag"
msgstr ""
#: documents/models.py:1439 paperless_mail/models.py:282
#: documents/models.py:1443 paperless_mail/models.py:282
msgid "assign this document type"
msgstr ""
#: documents/models.py:1448 paperless_mail/models.py:296
#: documents/models.py:1452 paperless_mail/models.py:296
msgid "assign this correspondent"
msgstr ""
#: documents/models.py:1457
#: documents/models.py:1461
msgid "assign this storage path"
msgstr ""
#: documents/models.py:1466
#: documents/models.py:1470
msgid "assign this owner"
msgstr ""
#: documents/models.py:1473
#: documents/models.py:1477
msgid "grant view permissions to these users"
msgstr ""
#: documents/models.py:1480
#: documents/models.py:1484
msgid "grant view permissions to these groups"
msgstr ""
#: documents/models.py:1487
#: documents/models.py:1491
msgid "grant change permissions to these users"
msgstr ""
#: documents/models.py:1494
#: documents/models.py:1498
msgid "grant change permissions to these groups"
msgstr ""
#: documents/models.py:1501
#: documents/models.py:1505
msgid "assign these custom fields"
msgstr ""
#: documents/models.py:1505
#: documents/models.py:1509
msgid "custom field values"
msgstr ""
#: documents/models.py:1509
#: documents/models.py:1513
msgid "Optional values to assign to the custom fields."
msgstr ""
#: documents/models.py:1518
#: documents/models.py:1522
msgid "remove these tag(s)"
msgstr ""
#: documents/models.py:1523
#: documents/models.py:1527
msgid "remove all tags"
msgstr ""
#: documents/models.py:1530
#: documents/models.py:1534
msgid "remove these document type(s)"
msgstr ""
#: documents/models.py:1535
#: documents/models.py:1539
msgid "remove all document types"
msgstr ""
#: documents/models.py:1542
#: documents/models.py:1546
msgid "remove these correspondent(s)"
msgstr ""
#: documents/models.py:1547
#: documents/models.py:1551
msgid "remove all correspondents"
msgstr ""
#: documents/models.py:1554
#: documents/models.py:1558
msgid "remove these storage path(s)"
msgstr ""
#: documents/models.py:1559
#: documents/models.py:1563
msgid "remove all storage paths"
msgstr ""
#: documents/models.py:1566
#: documents/models.py:1570
msgid "remove these owner(s)"
msgstr ""
#: documents/models.py:1571
#: documents/models.py:1575
msgid "remove all owners"
msgstr ""
#: documents/models.py:1578
#: documents/models.py:1582
msgid "remove view permissions for these users"
msgstr ""
#: documents/models.py:1585
#: documents/models.py:1589
msgid "remove view permissions for these groups"
msgstr ""
#: documents/models.py:1592
#: documents/models.py:1596
msgid "remove change permissions for these users"
msgstr ""
#: documents/models.py:1599
#: documents/models.py:1603
msgid "remove change permissions for these groups"
msgstr ""
#: documents/models.py:1604
#: documents/models.py:1608
msgid "remove all permissions"
msgstr ""
#: documents/models.py:1611
#: documents/models.py:1615
msgid "remove these custom fields"
msgstr ""
#: documents/models.py:1616
#: documents/models.py:1620
msgid "remove all custom fields"
msgstr ""
#: documents/models.py:1625
#: documents/models.py:1629
msgid "email"
msgstr ""
#: documents/models.py:1634
#: documents/models.py:1638
msgid "webhook"
msgstr ""
#: documents/models.py:1638
#: documents/models.py:1642
msgid "passwords"
msgstr ""
#: documents/models.py:1646
msgid ""
"Passwords to try when removing PDF protection. Separate with commas or new "
"lines."
msgstr ""
#: documents/models.py:1651
msgid "workflow action"
msgstr ""
#: documents/models.py:1639
#: documents/models.py:1652
msgid "workflow actions"
msgstr ""
#: documents/models.py:1654
#: documents/models.py:1667
msgid "triggers"
msgstr ""
#: documents/models.py:1661
#: documents/models.py:1674
msgid "actions"
msgstr ""
#: documents/models.py:1664 paperless_mail/models.py:154
#: documents/models.py:1677 paperless_mail/models.py:154
msgid "enabled"
msgstr ""
#: documents/models.py:1675
#: documents/models.py:1688
msgid "workflow"
msgstr ""
#: documents/models.py:1679
#: documents/models.py:1692
msgid "workflow trigger type"
msgstr ""
#: documents/models.py:1693
#: documents/models.py:1706
msgid "date run"
msgstr ""
#: documents/models.py:1699
#: documents/models.py:1712
msgid "workflow run"
msgstr ""
#: documents/models.py:1700
#: documents/models.py:1713
msgid "workflow runs"
msgstr ""
@@ -1309,7 +1323,7 @@ msgstr ""
msgid "Duplicate document identifiers are not allowed."
msgstr ""
#: documents/serialisers.py:2330 documents/views.py:2838
#: documents/serialisers.py:2330 documents/views.py:2839
#, python-format
msgid "Documents not found: %(ids)s"
msgstr ""
@@ -1573,20 +1587,20 @@ msgstr ""
msgid "Unable to parse URI {value}"
msgstr ""
#: documents/views.py:2850
#: documents/views.py:2851
#, python-format
msgid "Insufficient permissions to share document %(id)s."
msgstr ""
#: documents/views.py:2893
#: documents/views.py:2894
msgid "Bundle is already being processed."
msgstr ""
#: documents/views.py:2950
#: documents/views.py:2951
msgid "The share link bundle is still being prepared. Please try again later."
msgstr ""
#: documents/views.py:2960
#: documents/views.py:2961
msgid "The share link bundle is unavailable."
msgstr ""


@@ -1,6 +1,5 @@
import dataclasses
import email.contentmanager
import random
import time
import uuid
from collections import namedtuple
@@ -148,11 +147,7 @@ class BogusMailBox(AbstractContextManager):
if "TO" in criteria:
to_ = criteria[criteria.index("TO") + 1].strip('"')
msg = []
for m in self.messages:
for to_addrs in m.to:
if to_ in to_addrs:
msg.append(m)
msg = filter(lambda m: any(to_ in to_addr for to_addr in m.to), msg)
if "UNFLAGGED" in criteria:
msg = filter(lambda m: not m.flagged, msg)
@@ -204,7 +199,7 @@ def fake_magic_from_buffer(buffer, *, mime=False):
class MessageBuilder:
def __init__(self) -> None:
self._used_uids = set()
self._next_uid = 1
def create_message(
self,
@@ -257,10 +252,8 @@ class MessageBuilder:
# TODO: Unsure how to add a uid to the actual EmailMessage. This hacks it in,
# based on how imap_tools uses regex to extract it.
# This should be a large enough pool
uid = random.randint(1, 10000)
while uid in self._used_uids:
uid = random.randint(1, 10000)
self._used_uids.add(uid)
uid = self._next_uid
self._next_uid += 1
imap_msg._raw_uid_data = f"UID {uid}".encode()
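The two changes in this file can be illustrated together: each search criterion narrows the current result instead of restarting from the full message list, and UIDs come from a counter rather than `random.randint`. The sketch below is a simplified model, not the test helper itself. `Msg`, `UidAllocator` and `search` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Msg:
    uid: int
    to: tuple[str, ...]
    flagged: bool = False


class UidAllocator:
    """Hands out 1, 2, 3, ... so test runs are reproducible."""

    def __init__(self) -> None:
        self._next_uid = 1

    def allocate(self) -> int:
        uid = self._next_uid
        self._next_uid += 1
        return uid


def search(messages: list[Msg], criteria: list[str]) -> list[Msg]:
    msg = iter(messages)
    if "UNFLAGGED" in criteria:
        msg = filter(lambda m: not m.flagged, msg)
    if "TO" in criteria:
        to_ = criteria[criteria.index("TO") + 1].strip('"')
        # Narrow the current result; do not start again from `messages`.
        msg = filter(lambda m: any(to_ in addr for addr in m.to), msg)
    return list(msg)


uids = UidAllocator()
inbox = [
    Msg(uids.allocate(), ("a@example.com",), flagged=True),
    Msg(uids.allocate(), ("a@example.com",)),
    Msg(uids.allocate(), ("b@example.com",)),
]
hits = search(inbox, ["UNFLAGGED", "TO", '"a@example.com"'])
print([m.uid for m in hits])
```

If the `TO` branch rebuilt its list from all messages, the flagged message with UID 1 would reappear in the result even though `UNFLAGGED` had already excluded it.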

uv.lock generated

@@ -1010,15 +1010,15 @@ wheels = [
[[package]]
name = "django-allauth"
version = "65.13.1"
version = "65.14.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "asgiref", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "django", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
]
sdist = { url = "https://files.pythonhosted.org/packages/e0/b7/42a048ba1dedbb6b553f5376a6126b1c753c10c70d1edab8f94c560c8066/django_allauth-65.13.1.tar.gz", hash = "sha256:2af0d07812f8c1a8e3732feaabe6a9db5ecf3fad6b45b6a0f7fd825f656c5a15", size = 1983857, upload-time = "2025-11-20T16:34:40.811Z" }
sdist = { url = "https://files.pythonhosted.org/packages/23/9b/061a6ac65c602eb721b13fbf9c665b20fb900f113a03ec8521b5fcf16b83/django_allauth-65.14.0.tar.gz", hash = "sha256:5529227aba2b1377d900e9274a3f24496c645e65400fbae3cad5789944bc4d0b", size = 1991909, upload-time = "2026-01-17T18:43:12.928Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/d8/98/9d44ae1468abfdb521d651fb67f914165c7812dfdd97be16190c9b1cc246/django_allauth-65.13.1-py3-none-any.whl", hash = "sha256:2887294beedfd108b4b52ebd182e0ed373deaeb927fc5a22f77bbde3174704a6", size = 1787349, upload-time = "2025-11-20T16:34:37.354Z" },
{ url = "https://files.pythonhosted.org/packages/ce/c8/2f959ff8466913d95ba72eb4a29bd7998d28a559786033a97b5bbdda2b81/django_allauth-65.14.0-py3-none-any.whl", hash = "sha256:448f5f7877f95fcbe1657256510fe7822d7871f202521a29e23ef937f3325a97", size = 1793052, upload-time = "2026-01-17T18:43:08.954Z" },
]
[package.optional-dependencies]
@@ -1305,7 +1305,7 @@ name = "exceptiongroup"
version = "1.3.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "typing-extensions", marker = "(python_full_version < '3.13' and platform_machine != 'aarch64' and platform_machine != 'x86_64' and sys_platform == 'linux') or (python_full_version < '3.12' and platform_machine == 'aarch64' and sys_platform == 'linux') or (python_full_version < '3.12' and platform_machine == 'x86_64' and sys_platform == 'linux') or (python_full_version < '3.13' and sys_platform == 'darwin')" },
{ name = "typing-extensions", marker = "(python_full_version < '3.11' and sys_platform == 'darwin') or (python_full_version < '3.11' and sys_platform == 'linux')" },
]
sdist = { url = "https://files.pythonhosted.org/packages/50/79/66800aadf48771f6b62f7eb014e352e5d06856655206165d775e675a02c9/exceptiongroup-1.3.1.tar.gz", hash = "sha256:8b412432c6055b0b7d14c310000ae93352ed6754f70fa8f7c34141f91c4e3219", size = 30371, upload-time = "2025-11-21T23:01:54.787Z" }
wheels = [
@@ -2584,6 +2584,11 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/2c/19/04f9b178c2d8a15b076c8b5140708fa6ffc5601fb6f1e975537072df5b2a/mergedeep-1.3.4-py3-none-any.whl", hash = "sha256:70775750742b25c0d8f36c55aed03d24c3384d17c951b3175d898bd778ef0307", size = 6354, upload-time = "2021-02-05T18:55:29.583Z" },
]
[[package]]
name = "microsoft-python-type-stubs"
version = "0"
source = { git = "https://github.com/microsoft/python-type-stubs.git#692c37c3969d22612b295ddf7e7af5907204a386" }
[[package]]
name = "mkdocs"
version = "1.6.1"
@@ -2875,6 +2880,15 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/8d/f4/4ce9a05ce5ded1de3ec1c1d96cf9f9504a04e54ce0ed55cfa38619a32b8d/mypy-1.19.1-py3-none-any.whl", hash = "sha256:f1235f5ea01b7db5468d53ece6aaddf1ad0b88d9e7462b86ef96fe04995d7247", size = 2471239, upload-time = "2025-12-15T05:03:07.248Z" },
]
[[package]]
name = "mypy-baseline"
version = "0.7.3"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/65/2a/03288dab6d5f24d187ba0c223f6b3035d9a29de3dd31a3e105a0d4f1b5da/mypy_baseline-0.7.3.tar.gz", hash = "sha256:325f0695310eb8f5c0f10fa7af36ee1b3785a9d26b886a61c07b4a8eddb28d29", size = 319108, upload-time = "2025-05-30T08:43:00.629Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/1f/93/7780302b206a8e8e767ce763ef06159725d1323acbe55e46a1cd1ffd109d/mypy_baseline-0.7.3-py3-none-any.whl", hash = "sha256:bd7fa899e687d75af2e3f392a9d6d1790e65dae3d31fe12525cc14f26d866b74", size = 17868, upload-time = "2025-05-30T08:42:58.262Z" },
]
[[package]]
name = "mypy-extensions"
version = "1.1.0"
@@ -3293,7 +3307,9 @@ typing = [
{ name = "django-stubs", extra = ["compatible-mypy"], marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "djangorestframework-stubs", extra = ["compatible-mypy"], marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "lxml-stubs", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "microsoft-python-type-stubs", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "mypy", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "mypy-baseline", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "types-bleach", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "types-colorama", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "types-dateparser", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
@@ -3317,7 +3333,7 @@ requires-dist = [
{ name = "concurrent-log-handler", specifier = "~=0.9.25" },
{ name = "dateparser", specifier = "~=1.2" },
{ name = "django", specifier = "~=5.2.10" },
{ name = "django-allauth", extras = ["mfa", "socialaccount"], specifier = "~=65.13.1" },
{ name = "django-allauth", extras = ["mfa", "socialaccount"], specifier = "~=65.14.0" },
{ name = "django-auditlog", specifier = "~=3.4.1" },
{ name = "django-cachalot", specifier = "~=2.8.0" },
{ name = "django-celery-results", specifier = "~=2.6.0" },
@@ -3433,7 +3449,9 @@ typing = [
{ name = "django-stubs", extras = ["compatible-mypy"] },
{ name = "djangorestframework-stubs", extras = ["compatible-mypy"] },
{ name = "lxml-stubs" },
{ name = "microsoft-python-type-stubs", git = "https://github.com/microsoft/python-type-stubs.git" },
{ name = "mypy" },
{ name = "mypy-baseline" },
{ name = "types-bleach" },
{ name = "types-colorama" },
{ name = "types-dateparser" },