3269 Commits

Author SHA1 Message Date
Trenton Holmes
27d1d790f9 Try waiting a little bit after a parser error during the live testing 2022-11-02 15:55:12 -07:00
Trenton Holmes
59ea37f09c No need for an extra import, the object is smart already 2022-11-01 08:44:30 -07:00
Trenton Holmes
f8c3f12146 Adds more options for the filename formatting 2022-11-01 08:44:30 -07:00
Max Bachmann
4a061c38d7 directly use rapidfuzz 2022-10-31 13:17:10 -07:00
Sblop
91a1d8f5ae Update settings.py
Comment too long.
2022-10-28 14:40:48 -07:00
Sblop
fcc9be619a Update settings.py 2022-10-28 14:40:48 -07:00
Sblop
a214b7a861 Update settings.py
Django gives a system error on MariaDB for VARCHARs longer than 255 chars. This was a limitation in older versions of MySQL.
Meaning: you cannot run Paperless-NGX on older versions where this limitation was present, so Django plays it extremely safe by giving an error.
This fixes the problem.
2022-10-28 14:40:48 -07:00
phail
16257f5288 fix string 2022-10-27 23:53:47 +02:00
phail
4caad88790 replace thumbnail creation with mock 2022-10-27 23:41:29 +02:00
phail
6d92b33d13 Downgrade pdf validation to text only 2022-10-27 23:11:41 +02:00
Paperless-ngx Translation Bot [bot]
dcd909e71a New translations django.po (Dutch)
[ci skip]
2022-10-27 02:51:17 -07:00
phail
739e291b2c improve test coverage a little 2022-10-27 00:27:15 +02:00
phail
adff1b6a96 Merge remote-tracking branch 'paperless/dev' into feature-consume-eml 2022-10-26 20:59:49 +02:00
phail
2ea07f6497 remove erroring parameter 2022-10-25 21:17:40 +02:00
Trenton H
1e1f0347fa More smoothly handle the case of a password protected PDF for barcodes 2022-10-24 13:16:14 -07:00
phail
e4e4d1b0de rename help text 2022-10-24 22:15:33 +02:00
phail
32ee7aa26c Merge remote-tracking branch 'paperless/dev' into feature-consume-eml 2022-10-24 21:12:35 +02:00
Trenton H
6d2851c693 Allows using pdf2image instead of pikepdf if desired 2022-10-24 09:58:34 -07:00
Trenton H
20b7287dc2 Connects up the celery signals to support pending, started and success/failure, without relying on django-celery-results 2022-10-24 09:10:10 -07:00
phail
b151cb7293 update variable names 2022-10-23 21:39:15 +02:00
phail
20a0ba6e57 Merge remote-tracking branch 'paperless/dev' into feature-consume-eml 2022-10-23 20:37:22 +02:00
phail
9d6b725fa1 add tests for mail_to_html and generate_pdf_from_mail 2022-10-23 17:18:10 +02:00
phail
6854896708 test for broken eml, add test_generate_pdf 2022-10-22 02:25:23 +02:00
phail
20e84558d6 add unittest for external images 2022-10-22 00:44:32 +02:00
Paperless-ngx Translation Bot [bot]
11e04a32a8 New translations django.po (Arabic)
[ci skip]
2022-10-20 16:30:32 -07:00
Michael Shamoon
a44dc23979 Update django.po
[ci skip]
2022-10-20 15:33:16 -07:00
Michael Shamoon
54e9e60dd3 rename backend Arabic translation file
[ci skip]
2022-10-20 15:31:28 -07:00
Paperless-ngx Translation Bot [bot]
6884de3c33 New translations django.po (Arabic)
[ci skip]
2022-10-20 15:28:49 -07:00
Trenton Holmes
1be8f39aa0 Reverts the change around skip_noarchive to align with how it is documented to work 2022-10-20 13:34:41 -07:00
phail
f1e0ab314d add unittest for generate_pdf_from_html 2022-10-19 23:19:33 +02:00
phail
3d58129666 add unittest for transform_inline_html 2022-10-18 23:48:07 +02:00
Paperless-ngx Translation Bot [bot]
5f1492f900 New translations django.po (Czech)
[ci skip]
2022-10-18 13:06:11 -07:00
Paperless-ngx Translation Bot [bot]
33a7177867 New translations django.po (Belarusian)
[ci skip]
2022-10-16 06:46:10 -07:00
Paperless-ngx Translation Bot [bot]
5d048bc569 New translations django.po (Belarusian)
[ci skip]
2022-10-16 05:44:52 -07:00
phail
cd8d4ce8ab add unittest for parse 2022-10-15 15:41:43 +02:00
phail
cef1a4f8b9 Add unittest for tika_parse() 2022-10-15 13:13:29 +02:00
phail
76dec120d1 add 2 more tests 2022-10-14 15:43:43 +02:00
phail
dec6e6c0b8 add unittest for get_thumbnail 2022-10-13 01:03:09 +02:00
Trenton Holmes
ddef90d96e Adds specific handling for CCITT Group 4, which pikepdf decodes, but not correctly 2022-10-11 13:51:14 -07:00
Trenton H
c888b3dfd3 In case pikepdf fails to convert an image to a PIL image, fall back to converting pages to PIL images 2022-10-11 13:51:13 -07:00
Trenton H
0c08b16402 Catch the new error raised by redis when it can't find the broker and stub out the call for testing 2022-10-10 14:21:42 -07:00
Trenton H
4994df2e3c Fixes usage of a deprecated logger method 2022-10-10 14:20:19 -07:00
Trenton H
e88d911984 Account for plusses in the OCR language setting 2022-10-10 08:58:23 -07:00
Trenton H
2d71415ede Allows disabling NLTK, adds it as a consideration for low power devices 2022-10-10 08:58:23 -07:00
Trenton Holmes
a78d44ec5f Changes the NLTK language to be based on the Tesseract OCR language, with fallback to the default processing 2022-10-10 08:58:23 -07:00
Trenton H
0bc13c2a72 Allows configuration of the NLTK processing language 2022-10-10 08:58:23 -07:00
Trenton Holmes
70b1988a55 Fixes the download and usage of the downloaded data 2022-10-10 08:58:23 -07:00
Trenton Holmes
3c12f13df2 Missed one mock 2022-10-10 08:58:23 -07:00
Trenton Holmes
d334eec321 Mock out the nltk portions so the data doesn't need to be downloaded 2022-10-10 08:58:23 -07:00
Trenton Holmes
66884ea035 Updates the pre-processing of document content to be much more robust, with tokenization, stemming and stop word removal 2022-10-10 08:58:23 -07:00