193 Commits

Author SHA1 Message Date
Trenton H
e3bc680bf1 Chore: Drop Python 3.9 support (#7774) 2024-09-26 12:22:24 -07:00
s0llvan
fd2bea0da9 Feature: page count (#7750)
---------

Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
2024-09-25 08:22:12 -07:00
Trenton H
ec0fc0b1a4 Fix: Rework system check so it won't crash if tesseract is not found (#7640) 2024-09-08 12:17:32 -07:00
dependabot[bot]
59edf10aa9 Chore(deps-dev): Bump the development group across 1 directory with 2 updates (#6851)
* Chore(deps-dev): Bump the development group across 1 directory with 2 updates

Bumps the development group with 2 updates in the / directory: [ruff](https://github.com/astral-sh/ruff) and [mkdocs-material](https://github.com/squidfunk/mkdocs-material).


Updates `ruff` from 0.4.4 to 0.4.6
- [Release notes](https://github.com/astral-sh/ruff/releases)
- [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md)
- [Commits](https://github.com/astral-sh/ruff/compare/v0.4.4...v0.4.6)

Updates `mkdocs-material` from 9.5.24 to 9.5.25
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.5.24...9.5.25)

---
updated-dependencies:
- dependency-name: ruff
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: development
- dependency-name: mkdocs-material
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: development
...

Signed-off-by: dependabot[bot] <support@github.com>

* Updates hook versions to match

* New codespell fixes

* Remove unneeded i18n

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Trenton H <797416+stumpylog@users.noreply.github.com>
Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
2024-05-29 07:04:01 +00:00
Trenton H
75bc6b3ba8 Chore: Standardize subprocess running and logging (#6275) 2024-04-04 13:11:43 -07:00
dependabot[bot]
6656adcd6b Chore(deps-dev): Bump the development group with 3 updates (#6079)
* Chore(deps-dev): Bump the development group with 3 updates

Bumps the development group with 3 updates: [ruff](https://github.com/astral-sh/ruff), [pytest](https://github.com/pytest-dev/pytest) and [mkdocs-material](https://github.com/squidfunk/mkdocs-material).


Updates `ruff` from 0.3.0 to 0.3.2
- [Release notes](https://github.com/astral-sh/ruff/releases)
- [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md)
- [Commits](https://github.com/astral-sh/ruff/compare/v0.3.0...v0.3.2)

Updates `pytest` from 8.0.2 to 8.1.1
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/8.0.2...8.1.1)

Updates `mkdocs-material` from 9.5.12 to 9.5.13
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.5.12...9.5.13)

---
updated-dependencies:
- dependency-name: ruff
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: development
- dependency-name: pytest
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: development
- dependency-name: mkdocs-material
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: development
...

Signed-off-by: dependabot[bot] <support@github.com>

* Updates pre-commit hook versions and runs it against all files

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Trenton H <797416+stumpylog@users.noreply.github.com>
2024-03-12 07:56:01 -07:00
Trenton H
122bd9fd5b Feature: Allow user to control PIL image pixel limit (#5997) 2024-03-05 00:19:56 +00:00
Trenton H
23398f5ed1 Feature: Allow a user to disable the pixel limit for OCR entirely (#5996) 2024-03-04 22:37:36 +00:00
Trenton H
633ec9de86 Fix: Test metadata items for Unicode issues (#5707)
Test each key for unicode issues and reject ones which will fail inside DRF
2024-02-09 20:08:23 +00:00
Trenton H
ec0b0d0de4 Chore: Backend dependencies update (#5676) 2024-02-08 09:48:24 -08:00
Trenton H
23d39a1156 Adds more documentation for OCR_PAGES and prevents using 0 for actual OCR (#5275) 2024-01-06 09:06:41 -08:00
Trenton H
eb2caa5118 Fix: Allows pre-consume scripts to modify the working path again (#5260)
* Allows pre-consume scripts to modify the working path again and generally cleans up some confusion about working copy vs original
2024-01-05 21:01:57 -08:00
Trenton H
c8a62715ec Feature: Allow setting backend configuration settings via the UI (#5126)
* Saving some start on this

* At least partially working for the tesseract parser

* Problems with migration testing need to figure out

* Work around that error

* Fixes max m_pixels

* Moving the settings to main paperless application

* Starting some consumer options

* More fixes and work

* Fixes these last tests

* Fix max_length on OcrSettings.mode field

* Fix all fields on Common & Ocr settings serializers

* Umbrellla config view

* Revert "Umbrellla config view"

This reverts commit fbaf9f4be30f89afeb509099180158a3406416a5.

* Updates to use a single configuration object for all settings

* Squashed commit of the following:

commit 8a0a49dd5766094f60462fbfbe62e9921fbd2373
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date:   Tue Dec 19 23:02:47 2023 -0800

    Fix formatting

commit 66b2d90c507b8afd9507813ff555e46198ea33b9
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date:   Tue Dec 19 22:36:35 2023 -0800

    Refactor frontend data models

commit 5723bd8dd823ee855625e250df39393e26709d48
Author: Adam Bogdał <adam@bogdal.pl>
Date:   Wed Dec 20 01:17:43 2023 +0100

    Fix: speed up admin panel for installs with a large number of documents (#5052)

commit 9b08ce176199bf9011a6634bb88f616846150d2b
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date:   Tue Dec 19 15:18:51 2023 -0800

    Update PULL_REQUEST_TEMPLATE.md

commit a6248bec2d793b7690feed95fcaf5eb34a75bfb6
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date:   Tue Dec 19 15:02:05 2023 -0800

    Chore: Update Angular to v17 (#4980)

commit b1f6f52486d5ba5c04af99b41315eb6428fd1fa8
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date:   Tue Dec 19 13:53:56 2023 -0800

    Fix: Don't allow null custom_fields property via API (#5063)

commit 638d9970fd468d8c02c91d19bd28f8b0796bdcb1
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date:   Tue Dec 19 13:43:50 2023 -0800

    Enhancement: symmetric document links (#4907)

commit 5e8de4c1da6eb4eb8f738b20962595c7536b30ec
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date:   Tue Dec 19 12:45:04 2023 -0800

    Enhancement: shared icon & shared by me filter (#4859)

commit 088bad90306025d3f6b139cbd0ad264a1cbecfe5
Author: Trenton H <797416+stumpylog@users.noreply.github.com>
Date:   Tue Dec 19 12:04:03 2023 -0800

    Bulk updates all the backend libraries (#5061)

* Saving some work on frontend config

* Very basic but dynamically-generated config form

* Saving work on slightly less ugly frontend config

* JSON validation for user_args field

* Fully dynamic config form

* Adds in some additional validators for a nicer error message

* Cleaning up the testing and coverage more

* Reverts unintentional change

* Adds documentation about the settings and the precedence

* Couple more commenting and style fixes

---------

Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
2023-12-29 15:42:56 -08:00
Trenton H
46997dfd3d Apply user arguments even in the case of the safe fallback to forcing OCR (#4981) 2023-12-14 11:20:47 -08:00
Trenton H
a620a6727d Adds new setting to control color conversions (#4709) 2023-11-29 12:18:44 -08:00
Trenton H
cadfce88ad Fix: Add a warning about a low image DPI which may cause OCR to fail (#4708) 2023-11-29 11:28:27 -08:00
Trenton H
ab9e561317 Chore: Backend bulk updates (#4509) 2023-11-13 17:09:56 +00:00
Trenton H
dc642152d1 Standardizes the imports across all the files and modules (#4248) 2023-09-23 20:17:01 -07:00
Trenton Holmes
34b80a4d8e Removes support for Python 3.8 and lower from the code base 2023-09-10 11:42:59 -07:00
shamoon
9ed9dbb369 Fix: ghostscript rendering error doesn't trigger frontend failure message (#4092)
* Raise ParseError from gs rendering error

* catch all parser errors as generic exception

* Differentiate generic vs parse errors during consumption
2023-08-31 19:49:00 -07:00
Trenton H
98641e9bf3 When PDF/A rendering fails, add a warning the user may want to allow it to continue 2023-08-28 18:10:11 -07:00
Dennis Brakhane
a1f2ac43f3 Don't consider better OCR as failing
Tesseract 5.3.0 does a better job at OCR, and correctly
reads "a webp" instead of "awebp", this is good, so we
don't want the test to fail.
2023-07-11 16:44:18 +02:00
Trenton H
4504668cb2 Let ruff autofix some things from the newest version 2023-06-13 20:15:18 -07:00
Trenton H
bad8d304cb Improves the logging mixin and allows it to be typed better 2023-05-23 17:16:39 -07:00
Trenton H
6722b6e31c Adds better handling for files with invalid utf8 content 2023-05-13 09:29:18 -07:00
Trenton H
aabcc9a1c4 Upgrades black to v23, upgrades ruff 2023-04-26 09:35:27 -07:00
Trenton H
30655f1b73 Fixes ruff not running isort against the codebase 2023-04-26 09:35:27 -07:00
Trenton H
d2c02b9102 Configures ruff as the one stop linter and resolves warnings it raised 2023-04-01 17:03:52 -07:00
Brandon Rothweiler
7d950d9e87 Add PAPERLESS_OCR_SKIP_ARCHIVE_FILE config setting 2023-02-23 22:42:57 -05:00
Brandon Rothweiler
d49e7d6693 Revert "Merge pull request #2732 from bdr99/skip_neverarchive"
This reverts commit 77b23d3acb573232e4e307b63a83f8ff557c0e7e, reversing
changes made to 5d8aa278315dcf92bfa1abe9e1fbd4911f8ed258.
2023-02-23 21:26:53 -05:00
Brandon Rothweiler
955546d2ef Add a setting to disable creating an archive file 2023-02-22 15:27:17 -05:00
Trenton Holmes
acfa7d633d Creates a mix-in for asserting file system states 2023-02-20 10:25:21 -08:00
Trenton H
09ac404148 Adding more test coverage, in particular around Tika and its parser 2023-02-05 11:01:55 -08:00
shamoon
e1d52f4884 Merge pull request #2302 from paperless-ngx/feature-fix-display-rtl-content 2023-01-10 07:30:52 -08:00
Trenton H
b91217064b Fixes some sample test files showing as modified after running tests 2023-01-05 08:39:48 -08:00
Trenton H
cd42d17ffb Small tweak to use the existing tempdir instead of a new one 2023-01-03 13:05:44 -08:00
Trenton Holmes
a185f94c4b Try a new way of extracting text from a given PDF file 2023-01-03 12:43:31 -08:00
Trenton H
fb20c92c51 Adds testing coverage of multipage TIFF with alpha, without and with alpha/sRGB 2023-01-03 09:56:19 -08:00
Trenton H
911d3cb567 Let convert handle the removal of the alpha channel 2023-01-03 09:56:19 -08:00
Trenton Holmes
22620caf6e If extracting text from a fallback file (i.e. forced), allow the text to be used 2023-01-01 09:57:15 -08:00
Trenton H
79aecebbd2 In the case of an RTL language being extracted via pdfminer.six, fall back to forced OCR, which handles RTL text better 2022-12-29 16:02:02 -08:00
Trenton Holmes
c83d2da67e Fixes language code checks around two part languages 2022-12-04 12:23:12 -08:00
shamoon
7edf178019 Merge pull request #2057 from paperless-ngx/fix/2044-lang-code-diffs
Bugfix: Some tesseract languages aren't detected as installed.
2022-11-28 11:04:44 -08:00
Trenton H
68c62f3857 Allows parsing of WebP format images 2022-11-28 09:35:54 -08:00
Trenton Holmes
90f3266900 Fixes how a language code like chi-sim is treated in the checks 2022-11-27 08:28:22 -08:00
Trenton H
ffd9cd721d Adds a test to cover this edge case 2022-11-22 07:22:41 -08:00
Trenton H
be8fa418bb Don't use the sidecar file when redoing the OCR, it only contains new text 2022-11-22 07:22:41 -08:00
Trenton Holmes
1be8f39aa0 Reverts the change around skip_noarchive to align with how it is documented to work 2022-10-20 13:34:41 -07:00
Trenton Holmes
43d2545321 Fixes the creation of an archive file, even if noarchive was specified 2022-08-20 13:47:56 -07:00
Trenton Holmes
024fd8bc9b When raising an exception during exception handling, chain them together for slightly cleaner logs 2022-08-03 09:00:56 -07:00