shamoon
e14f4c94c2
Fix: Ghostscript rendering error doesn't trigger frontend failure message (#4092)
* Raise ParseError from gs rendering error
* catch all parser errors as generic exception
* Differentiate generic vs parse errors during consumption
2023-08-31 19:49:00 -07:00
Dennis Brakhane
93009c1eed
Don't consider better OCR as failing
Tesseract 5.3.0 does a better job at OCR and correctly
reads "a webp" instead of "awebp". This is an improvement,
so we don't want the test to fail.
2023-07-11 16:44:18 +02:00
Trenton H
111960c530
Adds better handling for files with invalid utf8 content
2023-05-13 09:29:18 -07:00
Trenton H
6f163111ce
Upgrades black to v23, upgrades ruff
2023-04-26 09:35:27 -07:00
Trenton H
3bcbd05252
Fixes ruff not running isort against the codebase
2023-04-26 09:35:27 -07:00
Trenton H
ce41ac9158
Configures ruff as the one stop linter and resolves warnings it raised
2023-04-01 17:03:52 -07:00
Brandon Rothweiler
ca412e0184
Add PAPERLESS_OCR_SKIP_ARCHIVE_FILE config setting
2023-02-23 22:42:57 -05:00
Brandon Rothweiler
8a89f5ae27
Revert "Merge pull request #2732 from bdr99/skip_neverarchive"
This reverts commit 77b23d3acb573232e4e307b63a83f8ff557c0e7e, reversing
changes made to 5d8aa278315dcf92bfa1abe9e1fbd4911f8ed258.
2023-02-23 21:26:53 -05:00
Brandon Rothweiler
93a6391f96
Add a setting to disable creating an archive file
2023-02-22 15:27:17 -05:00
Trenton Holmes
0df91c31f1
Creates a mix-in for asserting file system states
2023-02-20 10:25:21 -08:00
Trenton H
bdcba570cb
Adding more test coverage, in particular around Tika and its parser
2023-02-05 11:01:55 -08:00
shamoon
985f298c46
Merge pull request #2302 from paperless-ngx/feature-fix-display-rtl-content
2023-01-10 07:30:52 -08:00
Trenton H
d7939ca958
Fixes some sample test files showing as modified after running tests
2023-01-05 08:39:48 -08:00
Trenton Holmes
7be9ae9c02
Try a new way of extracting text from a given PDF file
2023-01-03 12:43:31 -08:00
Trenton H
0fd51e35e1
Adds testing coverage of multipage TIFF with alpha, without and with alpha/sRGB
2023-01-03 09:56:19 -08:00
Trenton H
a2b7687c3b
In the case of an RTL language being extracted via pdfminer.six, fall back to forced OCR, which handles RTL text better
2022-12-29 16:02:02 -08:00
Trenton Holmes
55ef0d4a1b
Fixes language code checks around two part languages
2022-12-04 12:23:12 -08:00
Trenton H
e96d65f945
Allows parsing of WebP format images
2022-11-28 09:35:54 -08:00
Trenton H
f015556562
Adds a test to cover this edge case
2022-11-22 07:22:41 -08:00
Trenton Holmes
d1aa08850d
Reverts the change around skip_noarchive to align with how it is documented to work
2022-10-20 13:34:41 -07:00
Trenton Holmes
b3b2519bf0
Fixes the creation of an archive file, even if noarchive was specified
2022-08-20 13:47:56 -07:00
Trenton Holmes
49a843dcdd
Changes the simple-alpha parsing test to use a tempdir so the original isn't modified in Git
2022-07-02 16:19:22 +02:00
Trenton Holmes
1771d18a21
Runs the pre-commit hooks over all the Python files
2022-03-11 11:34:28 -08:00
kpj
fc695896dd
Format Python code with black
2022-02-27 15:26:41 +01:00
Martin Müller
73a8569d21
Modify test for PNG image with alpha
2022-02-21 22:38:25 +01:00
jonaswinkler
0e596bd1fc
also apply \0 removal to sidecar contents
2021-03-22 23:08:34 +01:00
jonaswinkler
40ce38254b
fixes #631
2021-03-14 14:42:48 +01:00
jonaswinkler
6ab884a95c
update dependencies
2021-02-28 13:01:26 +01:00
jonaswinkler
99a18516b2
tests
2021-02-22 00:17:16 +01:00
jonaswinkler
50c1978d36
tests
2021-02-21 00:18:34 +01:00
jonaswinkler
9cbb1c5726
add some test files
2021-02-21 00:13:08 +01:00
jonaswinkler
56bd966c02
local import of ocrmypdf so that the webserver does not load that
2021-02-15 12:18:10 +01:00
jonaswinkler
89d6e422f5
fix bugs and test cases
2021-01-02 15:37:27 +01:00
jonaswinkler
1b1b57eb6a
more tests
2020-12-19 15:54:13 +01:00
jonaswinkler
a0631413d6
fixes bauerj/paperless_app#23 and most other scanner apps out there.
2020-12-12 18:25:15 +01:00
jonaswinkler
e3ce573fbb
a couple fixes and more supported image files
2020-12-02 17:39:49 +01:00
jonaswinkler
12fa844c7f
testing the new noarchive option.
2020-12-01 14:30:13 +01:00
jonaswinkler
ac1b701000
more tests!
2020-11-29 19:58:48 +01:00
jonaswinkler
06cfc3113a
test case fixes.
2020-11-27 14:06:37 +01:00
Jonas Winkler
e87575240d
more tests of the new parser
2020-11-26 00:08:23 +01:00
Jonas Winkler
f51d2be303
fixed the test cases
2020-11-25 19:51:09 +01:00
Jonas Winkler
56ce267f89
removed obsolete tests.
2020-11-25 14:51:32 +01:00
Jonas Winkler
41650f20f4
mime type handling
2020-11-20 13:31:03 +01:00
Jonas Winkler
1655d85a53
testing the tesseract parser
2020-11-19 20:31:08 +01:00
Jonas Winkler
d2e22e3f27
Changed the way parsers are discovered. This also prepares for upcoming changes regarding content types and file types: parsers should declare what they support, and actual file extensions should not be hardcoded everywhere.
2020-11-16 23:53:12 +01:00
Jonas Winkler
2e04ba1c04
code style fixes
2020-11-12 21:09:45 +01:00
Jonas Winkler
f182709fdd
fixed most of the tests
2020-11-02 19:42:23 +01:00
Jonas Winkler
7d282a4e4e
removed unused code, small fixes
2020-11-02 18:20:04 +01:00
Johannes Wienke
a311cd498c
Handle dateparser ValueErrors
When parsing dates from the document text or filenames, correctly handle
ValueErrors indicating broken dates. Newly added tests ensure that this handling
works properly.
2020-03-08 18:44:15 +01:00
Johannes Wienke
a3aab0cb48
Remove duplicated date parsing test
The exact same tests existed twice in the file.
2020-03-08 18:26:29 +01:00