paperless-ngx

mirror of https://github.com/paperless-ngx/paperless-ngx.git synced 2026-02-24 00:59:35 -06:00

Author	SHA1	Message	Date
jonaswinkler	bee7a06e41	fix bugs and test cases	2021-01-02 15:37:27 +01:00
jonaswinkler	755f950cd2	supply file_name for tika parser	2021-01-01 22:19:43 +01:00
jonaswinkler	f1e9b414f9	remove duplicate code	2021-01-01 21:50:45 +01:00
jonaswinkler	4b7138f477	fixes #218	2020-12-30 15:12:16 +01:00
jonaswinkler	d329b371ef	removed unused code	2020-12-20 14:00:24 +01:00
jonaswinkler	a3334293af	more tests	2020-12-19 15:54:13 +01:00
jonaswinkler	45d31f9735	fixes bauerj/paperless_app#23 and most of all other scanner apps out there.	2020-12-12 18:25:15 +01:00
jonaswinkler	0c6c4a62d8	moved metadata extraction to the parsers	2020-12-10 14:57:53 +01:00
jonaswinkler	905c090908	fixes for the parser.	2020-12-04 16:44:34 +01:00
jonaswinkler	884eec9b61	disabled thumbnail trimming.	2020-12-04 12:44:02 +01:00
jonaswinkler	e2a375c9aa	catch encrypted pdf documents	2020-12-03 01:02:37 +01:00
jonaswinkler	1d073d2cfd	a couple fixes and more supported image files	2020-12-02 17:39:49 +01:00
jonaswinkler	0fb294d556	testing the new noarchive option.	2020-12-01 14:30:13 +01:00
jonaswinkler	1f90d50833	some more tests.	2020-12-01 14:15:43 +01:00
jonaswinkler	1df64e3129	Merge branch 'dev' into feature-ocrmypdf	2020-11-30 16:48:09 +01:00
jonaswinkler	7658c07b4d	added file type checks to the parsers to prevent temporary files from being consumed. Also: parsers announce file types they wish to use as default for each mime type.	2020-11-30 00:40:04 +01:00
jonaswinkler	20cc7e3dc0	more tests!	2020-11-29 19:58:48 +01:00
jonaswinkler	388f6cfbe6	reorganised settings documentation and added OCR_USER_ARGS	2020-11-29 12:38:32 +01:00
jonaswinkler	a19a336567	fixed checking the installed languages.	2020-11-29 12:31:42 +01:00
jonaswinkler	99e6906b51	test case fixes.	2020-11-27 14:06:37 +01:00
Jonas Winkler	f901def797	more tests of the new parser	2020-11-26 00:08:23 +01:00
Jonas Winkler	c00c63c639	fixed the test cases	2020-11-25 19:51:09 +01:00
Jonas Winkler	e55d1ff9cc	OMP_THREAD_LIMIT	2020-11-25 19:37:59 +01:00
Jonas Winkler	3b655c95d9	added image DPI detection to the tesseract parser.	2020-11-25 19:37:48 +01:00
Jonas Winkler	9bfa088eb5	reworked the interface of the parsers.	2020-11-25 19:36:39 +01:00
Jonas Winkler	b02f29ce9d	Merge branch 'dev' into feature-ocrmypdf	2020-11-25 16:58:20 +01:00
Jonas Winkler	bd8a2eaf1e	codestyle	2020-11-25 16:05:52 +01:00
Jonas Winkler	f5656222e2	removed obsolete tests.	2020-11-25 14:51:32 +01:00
Jonas Winkler	15935ab61f	reworked PDF parser that uses OCRmyPDF and produces archive files.	2020-11-25 14:50:43 +01:00
Jonas Winkler	7a6dcf8520	default language check	2020-11-25 10:52:38 +01:00
Jonas Winkler	ae198f0767	new setting: PAPERLESS_OCR_PAGES	2020-11-22 12:54:08 +01:00
Jonas Winkler	a532200d10	code cleanup	2020-11-21 15:34:00 +01:00
Jonas Winkler	afc3753e58	code cleanup	2020-11-21 14:03:45 +01:00
Jonas Winkler	f976a0b4ba	mime type handling	2020-11-20 13:31:03 +01:00
Jonas Winkler	cbee56ae8c	testing the tesseract parser	2020-11-19 20:31:08 +01:00
Jonas Winkler	680ab3d56b	updated logging, logging for the mail consumer to see what's happening	2020-11-18 13:23:30 +01:00
Jonas Winkler	9a48d6c577	Changed the way parsers are discovered. This also prepares for upcoming changes regarding content types and file types: parsers should declare what they support, and actual file extensions should not be hardcoded everywhere.	2020-11-16 23:53:12 +01:00
Jonas Winkler	bd04c966c5	first version of the new consumer.	2020-11-16 18:26:54 +01:00
Jonas Winkler	eb6805e37e	code style fixes	2020-11-12 21:09:45 +01:00
Jonas Winkler	340f9f141f	fixed most of the tests	2020-11-02 19:42:23 +01:00
Jonas Winkler	d42979842e	made unpaper and convert a little bit nicer to interact with	2020-11-02 19:31:04 +01:00
Jonas Winkler	a89773ad71	removed unused code, small fixes	2020-11-02 18:20:04 +01:00
Jonas Winkler	def3a85858	reworked most of the tesseract parser, better logging	2020-11-02 15:40:44 +01:00
Jonas Winkler	972a6a2333	bugfix	2020-11-02 01:26:42 +01:00
Jonas Winkler	6adc870a20	silenced unpaper, optipng for cleaner output; moved parser settings to settings; removed forgiving ocr (now default) since tesseract is plenty accurate even without defining the correct language.	2020-11-01 23:23:42 +01:00
Jonas Winkler	0f4094f3ca	better thumbnail generation for smaller files	2020-10-26 01:05:23 +01:00
Johannes Wienke	ebcfcea05b	Handle dateparser ValueErrors. When parsing dates from the document text or filenames, correctly handle value errors indicating broken dates. Newly added tests ensure that this handling works properly.	2020-03-08 18:44:15 +01:00
Johannes Wienke	6531a67940	Remove duplicated date parsing test. The exact same tests existed twice in the file.	2020-03-08 18:26:29 +01:00
Stéphane Brunner	3fab354a6e	Strip the thumbnails	2019-03-17 16:37:47 +01:00
jenspfeifle	5c40da1a48	make pycodestyle happy	2019-03-03 20:41:17 +01:00

107 Commits