Sebastian Steinbeißer
76d363f22d
Chore: switch from os.path to pathlib.Path (#9060)
2025-03-05 21:06:01 +00:00
Sebastian Steinbeißer
e560fa3be0
Chore: Enable ruff FBT (#8645)
2025-02-07 09:12:03 -08:00
dependabot[bot]
20ec8cb57b
Chore(deps-dev): Bump the development group with 2 updates (#8841)
* Chore(deps-dev): Bump the development group with 2 updates
Bumps the development group with 2 updates: [ruff](https://github.com/astral-sh/ruff) and [mkdocs-material](https://github.com/squidfunk/mkdocs-material).
Updates `ruff` from 0.8.6 to 0.9.2
- [Release notes](https://github.com/astral-sh/ruff/releases)
- [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md)
- [Commits](https://github.com/astral-sh/ruff/compare/0.8.6...0.9.2)
Updates `mkdocs-material` from 9.5.49 to 9.5.50
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.5.49...9.5.50)
---
updated-dependencies:
- dependency-name: ruff
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: development
- dependency-name: mkdocs-material
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: development
...
Signed-off-by: dependabot[bot] <support@github.com>
* Update .pre-commit-config.yaml
* Run new ruff format
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
2025-01-21 19:22:25 +00:00
Trenton H
d1f255a22e
Chore: Bulk backend dependency updates (#8212)
2024-11-11 11:54:51 -08:00
shamoon
a6f4c75a72
Fix: handle page count exception for pw-protected files (#8240)
2024-11-10 03:33:47 -08:00
Trenton H
e6f59472e4
Chore: Drop Python 3.9 support (#7774)
2024-09-26 12:22:24 -07:00
s0llvan
c92c3e224a
Feature: page count (#7750)
---------
Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
2024-09-25 08:22:12 -07:00
dependabot[bot]
ce3d5b0065
Chore(deps-dev): Bump the development group across 1 directory with 2 updates (#6851)
* Chore(deps-dev): Bump the development group across 1 directory with 2 updates
Bumps the development group with 2 updates in the / directory: [ruff](https://github.com/astral-sh/ruff) and [mkdocs-material](https://github.com/squidfunk/mkdocs-material).
Updates `ruff` from 0.4.4 to 0.4.6
- [Release notes](https://github.com/astral-sh/ruff/releases)
- [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md)
- [Commits](https://github.com/astral-sh/ruff/compare/v0.4.4...v0.4.6)
Updates `mkdocs-material` from 9.5.24 to 9.5.25
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.5.24...9.5.25)
---
updated-dependencies:
- dependency-name: ruff
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: development
- dependency-name: mkdocs-material
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: development
...
Signed-off-by: dependabot[bot] <support@github.com>
* Updates hook versions to match
* New codespell fixes
* Remove unneeded i18n
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Trenton H <797416+stumpylog@users.noreply.github.com>
Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
2024-05-29 07:04:01 +00:00
Trenton H
2c43b06910
Chore: Standardize subprocess running and logging (#6275)
2024-04-04 13:11:43 -07:00
Trenton H
b9636a3def
Feature: Allow user to control PIL image pixel limit (#5997)
2024-03-05 00:19:56 +00:00
Trenton H
6779042242
Feature: Allow a user to disable the pixel limit for OCR entirely (#5996)
2024-03-04 22:37:36 +00:00
Trenton H
0b1523f4e5
Fix: Test metadata items for Unicode issues (#5707)
Test each key for unicode issues and reject ones which will fail inside DRF
2024-02-09 20:08:23 +00:00
Trenton H
b60e16fe33
Chore: Backend dependencies update (#5676)
2024-02-08 09:48:24 -08:00
Trenton H
9043f45350
Adds more documentation for OCR_PAGES and prevents using 0 for actual OCR (#5275)
2024-01-06 09:06:41 -08:00
Trenton H
a82e3771ae
Fix: Allows pre-consume scripts to modify the working path again (#5260)
* Allows pre-consume scripts to modify the working path again and generally cleans up some confusion about working copy vs original
2024-01-05 21:01:57 -08:00
Trenton H
061f33fb05
Feature: Allow setting backend configuration settings via the UI (#5126)
* Saving some start on this
* At least partially working for the tesseract parser
* Problems with migration testing need to figure out
* Work around that error
* Fixes max m_pixels
* Moving the settings to main paperless application
* Starting some consumer options
* More fixes and work
* Fixes these last tests
* Fix max_length on OcrSettings.mode field
* Fix all fields on Common & Ocr settings serializers
* Umbrellla config view
* Revert "Umbrellla config view"
This reverts commit fbaf9f4be30f89afeb509099180158a3406416a5.
* Updates to use a single configuration object for all settings
* Squashed commit of the following:
commit 8a0a49dd5766094f60462fbfbe62e9921fbd2373
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date: Tue Dec 19 23:02:47 2023 -0800
    Fix formatting
commit 66b2d90c507b8afd9507813ff555e46198ea33b9
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date: Tue Dec 19 22:36:35 2023 -0800
    Refactor frontend data models
commit 5723bd8dd823ee855625e250df39393e26709d48
Author: Adam Bogdał <adam@bogdal.pl>
Date: Wed Dec 20 01:17:43 2023 +0100
    Fix: speed up admin panel for installs with a large number of documents (#5052)
commit 9b08ce176199bf9011a6634bb88f616846150d2b
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date: Tue Dec 19 15:18:51 2023 -0800
    Update PULL_REQUEST_TEMPLATE.md
commit a6248bec2d793b7690feed95fcaf5eb34a75bfb6
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date: Tue Dec 19 15:02:05 2023 -0800
    Chore: Update Angular to v17 (#4980)
commit b1f6f52486d5ba5c04af99b41315eb6428fd1fa8
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date: Tue Dec 19 13:53:56 2023 -0800
    Fix: Dont allow null custom_fields property via API (#5063)
commit 638d9970fd468d8c02c91d19bd28f8b0796bdcb1
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date: Tue Dec 19 13:43:50 2023 -0800
    Enhancement: symmetric document links (#4907)
commit 5e8de4c1da6eb4eb8f738b20962595c7536b30ec
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date: Tue Dec 19 12:45:04 2023 -0800
    Enhancement: shared icon & shared by me filter (#4859)
commit 088bad90306025d3f6b139cbd0ad264a1cbecfe5
Author: Trenton H <797416+stumpylog@users.noreply.github.com>
Date: Tue Dec 19 12:04:03 2023 -0800
    Bulk updates all the backend libraries (#5061)
* Saving some work on frontend config
* Very basic but dynamically-generated config form
* Saving work on slightly less ugly frontend config
* JSON validation for user_args field
* Fully dynamic config form
* Adds in some additional validators for a nicer error message
* Cleaning up the testing and coverage more
* Reverts unintentional change
* Adds documentation about the settings and the precedence
* Couple more commenting and style fixes
---------
Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
2023-12-29 15:42:56 -08:00
Trenton H
92a920021d
Apply user arguments even in the case of the safe fallback to forcing OCR (#4981)
2023-12-14 11:20:47 -08:00
Trenton H
e3f4e0b775
Adds new setting to control color conversions (#4709)
2023-11-29 12:18:44 -08:00
Trenton H
e1b573adeb
Fix: Add a warning about a low image DPI which may cause OCR to fail (#4708)
2023-11-29 11:28:27 -08:00
Trenton H
facb7226fe
Chore: Backend bulk updates (#4509)
2023-11-13 17:09:56 +00:00
shamoon
e14f4c94c2
Fix: ghostscript rendering error doesnt trigger frontend failure message (#4092)
* Raise ParseError from gs rendering error
* catch all parser errors as generic exception
* Differentiate generic vs parse errors during consumption
2023-08-31 19:49:00 -07:00
Trenton H
7e768bfe23
When PDF/A rendering fails, add a warning the user may want to allow it to continue
2023-08-28 18:10:11 -07:00
Trenton H
70f3f98363
Let ruff autofix some things from the newest version
2023-06-13 20:15:18 -07:00
Trenton H
452c79f9a1
Improves the logging mixin and allows it to be typed better
2023-05-23 17:16:39 -07:00
Trenton H
111960c530
Adds better handling for files with invalid utf8 content
2023-05-13 09:29:18 -07:00
Trenton H
6f163111ce
Upgrades black to v23, upgrades ruff
2023-04-26 09:35:27 -07:00
Trenton H
3bcbd05252
Fixes ruff not running isort against the codebase
2023-04-26 09:35:27 -07:00
Trenton H
ce41ac9158
Configures ruff as the one stop linter and resolves warnings it raised
2023-04-01 17:03:52 -07:00
Brandon Rothweiler
ca412e0184
Add PAPERLESS_OCR_SKIP_ARCHIVE_FILE config setting
2023-02-23 22:42:57 -05:00
Brandon Rothweiler
8a89f5ae27
Revert "Merge pull request #2732 from bdr99/skip_neverarchive"
This reverts commit 77b23d3acb573232e4e307b63a83f8ff557c0e7e, reversing
changes made to 5d8aa278315dcf92bfa1abe9e1fbd4911f8ed258.
2023-02-23 21:26:53 -05:00
Brandon Rothweiler
93a6391f96
Add a setting to disable creating an archive file
2023-02-22 15:27:17 -05:00
Trenton H
bdcba570cb
Adding more test coverage, in particular around Tika and its parser
2023-02-05 11:01:55 -08:00
Trenton H
1e4923835b
Small tweak to use the existing tempdir instead of a new one
2023-01-03 13:05:44 -08:00
Trenton Holmes
7be9ae9c02
Try a new way of extracting text from a given PDF file
2023-01-03 12:43:31 -08:00
Trenton H
59e0c1fe4e
Let convert handle the removal of the alpha channel
2023-01-03 09:56:19 -08:00
Trenton Holmes
26c7fad005
If extracting text from a fallback file (ie forced), allow the text to be used
2023-01-01 09:57:15 -08:00
Trenton H
a2b7687c3b
In the case of an RTL language being extracted via pdfminer.six, fall back to forced OCR, which handles RTL text better
2022-12-29 16:02:02 -08:00
Trenton H
e96d65f945
Allows parsing of WebP format images
2022-11-28 09:35:54 -08:00
Trenton H
b897d6de2e
Don't use the sidecar file when redoing the OCR, it only contains new text
2022-11-22 07:22:41 -08:00
Trenton Holmes
d1aa08850d
Reverts the change around skip_noarchive to align with how it is documented to work
2022-10-20 13:34:41 -07:00
Trenton Holmes
b3b2519bf0
Fixes the creation of an archive file, even if noarchive was specified
2022-08-20 13:47:56 -07:00
Trenton Holmes
b70e21a6d5
When raising an exception during exception handling, chain them together for slightly cleaner logs
2022-08-03 09:00:56 -07:00
Trenton Holmes
fc26fe0ac0
Updates to provide the user provided max pixel size to ocrmypdf
2022-05-22 16:56:08 -07:00
Trenton Holmes
3003bdd507
Runs pyupgrade to Python 3.8+ and adds a hook for it
2022-05-06 09:04:08 -07:00
Henning Häcker
3b4da70c85
extract OCR_MAX_IMAGE_PIXELS into settings.py
2022-03-30 09:23:45 +02:00
Henning Häcker
95199bd325
formatting according to black
2022-03-30 09:23:45 +02:00
Henning Häcker
a8887b211e
implement PAPERLESS_OCR_MAX_IMAGE_PIXELS
2022-03-30 09:23:45 +02:00
Trenton Holmes
1771d18a21
Runs the pre-commit hooks over all the Python files
2022-03-11 11:34:28 -08:00
Trenton Holmes
85b210ebf6
Reduces number of warnings from testing from 165 to 128. In doing so, fixes a few minor things in the decrypt and export commands
2022-03-10 18:12:48 -08:00
kpj
fc695896dd
Format Python code with black
2022-02-27 15:26:41 +01:00