Trenton H
e160580c8b
Fixes issues with copy2 or copystat and SELinux see #3665
2023-07-22 06:27:49 -07:00
Johannes Plunien
3a58a5f123
Copy default thumbnail if thumbnail generation fails
Fix #3631
2023-06-20 11:28:46 -07:00
Trenton H
bad8d304cb
Improves the logging mixin and allows it to be typed better
2023-05-23 17:16:39 -07:00
Trenton Holmes
2f12206911
Changes the error mode to replace instead of ignore, to better highlight where a problem happened
2023-05-13 09:29:18 -07:00
Trenton H
6722b6e31c
Adds better handling for files with invalid utf8 content
2023-05-13 09:29:18 -07:00
Trenton H
aabcc9a1c4
Upgrades black to v23, upgrades ruff
2023-04-26 09:35:27 -07:00
Trenton H
30655f1b73
Fixes ruff not running isort against the codebase
2023-04-26 09:35:27 -07:00
Trenton H
d2c02b9102
Configures ruff as the one stop linter and resolves warnings it raised
2023-04-01 17:03:52 -07:00
Trenton H
d58747c912
relock with Python 3.8.15
2023-01-06 17:59:39 -08:00
Trenton H
8504b6f7da
Cleans up and improves parser discovery testing, simplifies the determination of supported or not supported extensions and mime types
2023-01-05 08:39:48 -08:00
Trenton H
cdfcbff529
Don't allow an exception when trying to parse a date to cause complete failure
2022-11-17 13:37:37 -08:00
Matthias Eck
05d97d2cf1
fix(parsers|test_api): fix failed tests
2022-08-06 19:19:10 +02:00
Matthias Eck
1195fb9afe
feat(parsers): add generator for date parsing
2022-08-06 13:03:20 +02:00
Trenton Holmes
ef6ebf9888
Entirely removes optipng, updates the ghostscript fallback to also use WebP. Updates the conversion to use a multiprocessing pool
2022-06-11 08:38:49 -07:00
Michael Shamoon
f208f89179
webp thumbnail support with png fallback
2022-06-10 02:28:13 -07:00
shamoon
3ccf143c0b
Merge pull request #721 from paperless-ngx/bug-fix-date-ignore
Fix Ignore Date Parsing
2022-05-10 16:45:58 -07:00
Trenton Holmes
304d5b0d5a
Updates the ignore date parsing to utilize the settings defined date order, instead of guessing a bit
2022-05-08 16:57:35 -07:00
Trenton Holmes
a944ef1ca6
Adds additional testing for both date parsing and consumed document created date
2022-05-08 16:57:35 -07:00
Trenton Holmes
f62193099c
Runs pyupgrade to Python 3.8+ and adds a hook for it
2022-05-06 09:04:08 -07:00
Fantasticle
6982641398
update new regex pattern for second boundary
2022-03-31 09:37:15 +02:00
fantasticle
95fdcab953
Update regex date match patterns
2022-03-30 12:19:30 +02:00
Simon Siebert
5aea4da8b2
Update parsers.py and test_consumer.py
2022-03-14 19:03:09 +01:00
Trenton Holmes
6635fa5f0d
Runs the pre-commit hooks over all the Python files
2022-03-11 11:34:28 -08:00
kpj
c56cb25b5f
Format Python code with black
2022-02-27 15:26:41 +01:00
jonaswinkler
3a67462396
fixes #631
2021-03-14 14:42:48 +01:00
jonaswinkler
f8f49bac75
only import dateparser when required
2021-02-15 11:52:46 +01:00
jonaswinkler
b04d91d68c
fix a bug with thumbnail generation when TIKA was enabled
2021-02-09 22:12:43 +01:00
jonaswinkler
e5a7dc0cc7
rework most of the logging
2021-02-05 01:10:29 +01:00
jonaswinkler
eeff7b3bdb
code style
2021-02-02 23:58:25 +01:00
jonaswinkler
5f7d817d69
localization for websockets
2021-01-28 22:06:02 +01:00
jonaswinkler
c0f185fe7e
bug fixes, test case fixes
2021-01-26 15:19:56 +01:00
jonaswinkler
044aa55d74
Merge branch 'dev' into feature-websockets-status
2021-01-23 22:22:17 +01:00
Jonas Winkler
22f45ac619
Merge pull request #251 from jayme-github/ignore-date
Add option to ignore certain dates in parse_date
2021-01-05 00:19:13 +01:00
jonaswinkler
179b53d373
Merge branch 'dev' into feature-websockets-status
2021-01-04 22:45:56 +01:00
jonaswinkler
e2680b7113
code style
2021-01-02 15:26:09 +01:00
jayme-github
cd15490e91
Add option to ignore certain dates in parse_date
PAPERLESS_IGNORE_DATES specifies a comma-separated list of dates
to ignore during date parsing (from filename and content). This can be
used to specify dates that appear often in documents but are usually
not the document's creation date (like your date of birth).
2021-01-02 15:20:49 +01:00
jonaswinkler
755f950cd2
supply file_name for tika parser
2021-01-01 22:19:43 +01:00
jonaswinkler
f1e9b414f9
remove duplicate code
2021-01-01 21:50:45 +01:00
jonaswinkler
4b7138f477
fixes #218
2020-12-30 15:12:16 +01:00
jonaswinkler
cdd2c873bd
fixes #25
2020-12-15 13:52:35 +01:00
jonaswinkler
0c6c4a62d8
moved metadata extraction to the parsers
2020-12-10 14:57:53 +01:00
jonaswinkler
0bfecaa0fc
Merge branch 'dev' into feature-websockets-status
2020-12-06 22:53:54 +01:00
jonaswinkler
b0507ce92a
fixes #78
2020-12-02 18:00:49 +01:00
jonaswinkler
e4eeb29f54
checking file types against parsers in the consumer.
2020-12-01 15:26:05 +01:00
jonaswinkler
1df64e3129
Merge branch 'dev' into feature-ocrmypdf
2020-11-30 16:48:09 +01:00
jonaswinkler
7658c07b4d
added file type checks to the parsers to prevent temporary files from being consumed. Also: parsers announce file types they wish to use as default for each mime type.
2020-11-30 00:40:04 +01:00
Jonas Winkler
9bfa088eb5
reworked the interface of the parsers.
2020-11-25 19:36:39 +01:00
Jonas Winkler
15935ab61f
reworked PDF parser that uses OCRmyPDF and produces archive files.
2020-11-25 14:50:43 +01:00
Jonas Winkler
17b62b61fa
add support for archive files.
2020-11-25 14:47:17 +01:00
Jonas Winkler
3893a23852
Merge branch 'dev' into celery-tasks
2020-11-22 22:49:37 +01:00