Author | Commit | Message | Date
Trenton H | aabcc9a1c4 | Upgrades black to v23, upgrades ruff | 2023-04-26 09:35:27 -07:00
Trenton H | 30655f1b73 | Fixes ruff not running isort against the codebase | 2023-04-26 09:35:27 -07:00
Trenton H | d2c02b9102 | Configures ruff as the one stop linter and resolves warnings it raised | 2023-04-01 17:03:52 -07:00
Brandon Rothweiler | 7d950d9e87 | Add PAPERLESS_OCR_SKIP_ARCHIVE_FILE config setting | 2023-02-23 22:42:57 -05:00
Brandon Rothweiler | d49e7d6693 | Revert "Merge pull request #2732 from bdr99/skip_neverarchive" This reverts commit 77b23d3acb573232e4e307b63a83f8ff557c0e7e, reversing changes made to 5d8aa278315dcf92bfa1abe9e1fbd4911f8ed258. | 2023-02-23 21:26:53 -05:00
Brandon Rothweiler | 955546d2ef | Add a setting to disable creating an archive file | 2023-02-22 15:27:17 -05:00
Trenton Holmes | acfa7d633d | Creates a mix-in for asserting file system states | 2023-02-20 10:25:21 -08:00
Trenton H | 09ac404148 | Adding more test coverage, in particular around Tika and its parser | 2023-02-05 11:01:55 -08:00
shamoon | e1d52f4884 | Merge pull request #2302 from paperless-ngx/feature-fix-display-rtl-content | 2023-01-10 07:30:52 -08:00
Trenton H | b91217064b | Fixes some sample test files showing as modified after running tests | 2023-01-05 08:39:48 -08:00
Trenton H | cd42d17ffb | Small tweak to use the existing tempdir instead of a new one | 2023-01-03 13:05:44 -08:00
Trenton Holmes | a185f94c4b | Try a new way of extracting text from a given PDF file | 2023-01-03 12:43:31 -08:00
Trenton H | fb20c92c51 | Adds testing coverage of multipage TIFF with alpha, without and with alpha/sRGB | 2023-01-03 09:56:19 -08:00
Trenton H | 911d3cb567 | Let convert handle the removal of the alpha channel | 2023-01-03 09:56:19 -08:00
Trenton Holmes | 22620caf6e | If extracting text from a fallback file (ie forced), allow the text to be used | 2023-01-01 09:57:15 -08:00
Trenton H | 79aecebbd2 | In the case of an RTL language being extracted via pdfminer.six, fall back to forced OCR, which handles RTL text better | 2022-12-29 16:02:02 -08:00
Trenton Holmes | c83d2da67e | Fixes language code checks around two part languages | 2022-12-04 12:23:12 -08:00
shamoon | 7edf178019 | Merge pull request #2057 from paperless-ngx/fix/2044-lang-code-diffs; Bugfix: Some tesseract languages aren't detected as installed. | 2022-11-28 11:04:44 -08:00
Trenton H | 68c62f3857 | Allows parsing of WebP format images | 2022-11-28 09:35:54 -08:00
Trenton Holmes | 90f3266900 | Fixes how a language code like chi-sim is treated in the checks | 2022-11-27 08:28:22 -08:00
Trenton H | ffd9cd721d | Adds a test to cover this edge case | 2022-11-22 07:22:41 -08:00
Trenton H | be8fa418bb | Don't use the sidecar file when redoing the OCR, it only contains new text | 2022-11-22 07:22:41 -08:00
Trenton Holmes | 1be8f39aa0 | Reverts the change around skip_noarchive to align with how it is documented to work | 2022-10-20 13:34:41 -07:00
Trenton Holmes | 43d2545321 | Fixes the creation of an archive file, even if noarchive was specified | 2022-08-20 13:47:56 -07:00
Trenton Holmes | 024fd8bc9b | When raising an exception during exception handling, chain them together for slightly cleaner logs | 2022-08-03 09:00:56 -07:00
Trenton Holmes | 8660103563 | Changes the simple-alpha parsing test to use a tempdir so the original isn't modified in Git | 2022-07-02 16:19:22 +02:00
Trenton Holmes | 95bbf47995 | Updates to provide the user provided max pixel size to ocrmypdf | 2022-05-22 16:56:08 -07:00
Trenton Holmes | f62193099c | Runs pyupgrade to Python 3.8+ and adds a hook for it | 2022-05-06 09:04:08 -07:00
Henning Häcker | f4a0d8c040 | extract OCR_MAX_IMAGE_PIXELS into settings.py | 2022-03-30 09:23:45 +02:00
Henning Häcker | 6bc2cb0607 | formatting according to black | 2022-03-30 09:23:45 +02:00
Henning Häcker | 2c1f0cd3ee | implement PAPERLESS_OCR_MAX_IMAGE_PIXELS | 2022-03-30 09:23:45 +02:00
Trenton Holmes | 6635fa5f0d | Runs the pre-commit hooks over all the Python files | 2022-03-11 11:34:28 -08:00
Trenton Holmes | 55486ac151 | Reduces number of warnings from testing from 165 to 128. In doing so, fixes a few minor things in the decrypt and export commands | 2022-03-10 18:12:48 -08:00
kpj | c56cb25b5f | Format Python code with black | 2022-02-27 15:26:41 +01:00
Martin Müller | 5fa3ec6704 | Remove unneded exception handler from has_alpha() | 2022-02-21 22:58:19 +01:00
Martin Müller | a662ce03ea | Modify test for PNG image with alpha | 2022-02-21 22:38:25 +01:00
Martin Müller | b0afdc4841 | Fix code style (line too long) | 2022-02-21 22:34:34 +01:00
Martin Müller | 01310b9742 | Remove alpha layer from PNG files for img2pdf; Fixes issue #1254 | 2022-02-21 22:06:43 +01:00
jonaswinkler | 95abc7d6d7 | fix bug with DPI calculation | 2021-08-18 18:33:33 +02:00
jonaswinkler | 1402f11dc8 | fix logging getting spammed with pdfminer warnings on JPG files | 2021-06-13 12:09:16 +02:00
jonaswinkler | 271f9001dd | Workaround for all PDFminer.six issues. | 2021-05-15 12:15:32 +02:00
jonaswinkler | c9d76322eb | also apply \0 removal to sidecar contents | 2021-03-22 23:08:34 +01:00
jonaswinkler | d85a0f950f | better exception logging | 2021-03-22 23:00:15 +01:00
jonaswinkler | 62f829ae82 | fixes #794 | 2021-03-22 22:46:35 +01:00
jonaswinkler | 3a67462396 | fixes #631 | 2021-03-14 14:42:48 +01:00
jonaswinkler | 81b787635e | update dependencies | 2021-02-28 13:01:26 +01:00
jonaswinkler | 96088716d9 | tests | 2021-02-22 00:17:16 +01:00
jonaswinkler | 8dd2e1098b | fix up the ocrmypdf parameter construction for clean-final and redo | 2021-02-21 23:39:19 +01:00
jonaswinkler | 3f920a84da | use archived file for thumbnail, if available | 2021-02-21 23:30:14 +01:00
jonaswinkler | dce65dc0fa | more parameter checking | 2021-02-21 22:19:24 +01:00