34 Commits

Author SHA1 Message Date
Evgenii
1dc8477a00 Fix: update ASN regex to support Unicode (#5099) 2023-12-25 16:33:30 -08:00
Trenton H
e8877c2c0e Fix: Document metadata is lost during barcode splitting (#4982)
* Fixes barcode splitting dropping metadata that might be needed for round 2
2023-12-15 09:17:25 -08:00
Sebastian Porombka
62fdc545b9 barcode logic: strip non-numeric characters from detected ASN string (#4379)
* Legacy barcodes exist that still contain characters after the number. The current logic did not truncate them; instead, int() was called on the remaining string, which fails in this case. It is therefore sufficient to process only the numeric characters.

* lint

---------

Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
2023-10-17 03:44:22 +00:00
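
The stripping described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `parse_asn`, the default prefix `"ASN"`, and the choice to drop every non-digit character (rather than cutting at the first one) are all assumptions made for the example.

```python
import re
from typing import Optional


def parse_asn(barcode_text: str, prefix: str = "ASN") -> Optional[int]:
    """Hypothetical helper: extract an integer ASN from a barcode string.

    Returns None if the prefix does not match or no digits remain.
    """
    if not barcode_text.startswith(prefix):
        return None

    # Everything after the prefix may still hold legacy trailing characters.
    remainder = barcode_text[len(prefix):].strip()

    # Drop all non-digit characters so that int() cannot fail on them.
    digits = re.sub(r"[^0-9]", "", remainder)
    if not digits:
        return None

    return int(digits)


print(parse_asn("ASN00123"))     # plain ASN barcode
print(parse_asn("ASN00123XYZ"))  # legacy barcode with trailing characters
print(parse_asn("QR-CODE"))      # unrelated barcode, no ASN prefix
```

Note that removing every non-digit also merges digits separated by other characters (for example `ASN12A34` would yield 1234); whether that is acceptable depends on the barcodes in use.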
Trenton Holmes
34b80a4d8e Removes support for Python 3.8 and lower from the code base 2023-09-10 11:42:59 -07:00
Dennis Brakhane
ef749f9a29 Feature: collate two single-sided multipage scans (#3784)
* Feature: collate two single-sided scans

Some ADFs only support single-sided scans, making scanning
double-sided documents a bit annoying.

This new feature enables Paperless to do most of the work,
by merging two separate scans into a single one, collating
the even and odd numbered pages.

* Documentation: clarify that collation is disabled by default

* Apply suggestions from code review

Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>

* Address code review remarks

* Grammar fixes

---------

Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
2023-07-24 00:29:04 -07:00
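
The collation idea from the commit message above can be illustrated with a short sketch that works on plain lists of pages. The function name `collate` and the `even_reversed` flag are hypothetical. The default assumes the second scan arrives in reverse order (as happens when the stack is simply flipped over), which the commit message itself does not state.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def collate(
    odd_pages: Sequence[T],
    even_pages: Sequence[T],
    even_reversed: bool = True,
) -> List[T]:
    """Hypothetical helper: interleave two single-sided scans.

    odd_pages holds the front sides (pages 1, 3, 5, ...) in order.
    even_pages holds the back sides; if even_reversed is True they are
    assumed to have been scanned last page first.
    """
    if len(odd_pages) != len(even_pages):
        raise ValueError("both scans must contain the same number of pages")

    evens = list(reversed(even_pages)) if even_reversed else list(even_pages)

    merged: List[T] = []
    for front, back in zip(odd_pages, evens):
        merged.extend((front, back))
    return merged


# Fronts scanned as 1, 3, 5; backs scanned from the flipped stack as 6, 4, 2.
print(collate([1, 3, 5], [6, 4, 2]))
```

A real implementation would operate on PDF pages rather than integers, but the interleaving step stays the same.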
Trenton H
e160580c8b Fixes issues with copy2 or copystat and SELinux see #3665 2023-07-22 06:27:49 -07:00
Bastian Machek
324e30bd4b Feature: support barcode upscaling for better detection of small barcodes (#3655) 2023-06-27 10:18:47 -07:00
Trenton H
4504668cb2 Let ruff autofix some things from the newest version 2023-06-13 20:15:18 -07:00
Trenton H
e83be2e540 In cases where a temporary file is created or used, copy the original file stats to it 2023-06-07 09:02:19 -07:00
Trenton H
1396f25419 Updates handling of barcodes to encapsulate logic, moving it out of tasks and into barcodes 2023-05-22 06:52:31 -07:00
Trenton H
d2c02b9102 Configures ruff as the one stop linter and resolves warnings it raised 2023-04-01 17:03:52 -07:00
Trenton H
36a6df0bae Creates a data model for the document consumption, allowing stronger typing of arguments and setting of some information about the file only once 2023-04-01 11:05:34 -07:00
Trenton H
f124228e86 Instead of using PIL directly to convert TIFF to PDF, use the existing library of img2pdf 2023-03-20 13:48:05 -07:00
Marvin Gaube
c66a0ec82e feature: Add support for zxing as barcode scanning lib 2023-03-19 13:48:35 +01:00
Trenton H
ec2b0eb308 Changes out the settings and a decent amount of test code to be pathlib compatible 2023-03-06 09:16:07 -08:00
Trenton Holmes
e36d46f0df When splitting via barcodes, cleanup the split documents better 2023-02-12 08:20:12 -08:00
Fabian Ohler
c08b19c7a9 Feature: split documents on ASN barcode (#2554)
* also split documents when an ASN barcode is found

* linter

* fix test case parameters

* avoid pre-python-3.9 features

* simplify dict-creation in tests

* simplify dict-creation in tests for empty dicts

* Add test cases for the splitting by ASN barcode feature

* deleted supporting files for test case construction
2023-02-01 01:13:30 -08:00
Trenton H
b19ada7a41 Removes pikepdf based scanning, fixes up unit testing (+ commenting) 2023-01-27 12:24:47 -08:00
Trenton H
f61536f74c Tweaks the resizing based on testing 2023-01-24 10:30:53 -08:00
Trenton H
68c9f7a614 Rescales images from PDFs so zbar can better find them 2023-01-24 10:30:53 -08:00
Trenton H
1102a18697 Use dataclasses to group data about barcodes in documents 2023-01-24 09:43:52 -08:00
Peter Kappelt
147293a2cc Proper code formatting 2023-01-24 09:43:52 -08:00
Peter Kappelt
b865890bce Unified separator and ASN barcode parsing
so that barcode parsing won't run twice
2023-01-24 09:43:52 -08:00
Peter Kappelt
099b8b8161 Feature: Parse ASN from barcode
ASN-Barcodes are identified by a configurable prefix
2023-01-24 09:43:52 -08:00
Peter Kappelt
f8f8cc7dd0 split function for reading barcode and separating pages 2023-01-24 09:43:52 -08:00
Trenton H
189d02dfe6 Always use pikepdf, then pdf2image if needed to check for barcodes instead of requiring/allowing configuration 2022-11-09 13:01:39 -08:00
Trenton H
1e1f0347fa More smoothly handle the case of a password protected PDF for barcodes 2022-10-24 13:16:14 -07:00
Trenton H
6d2851c693 Allows using pdf2image instead of pikepdf if desired 2022-10-24 09:58:34 -07:00
Trenton Holmes
ddef90d96e Adds specific handling for CCITT Group 4, which pikepdf decodes, but not correctly 2022-10-11 13:51:14 -07:00
Trenton H
c888b3dfd3 In case pikepdf fails to convert an image to a PIL image, fall back to converting pages to PIL images 2022-10-11 13:51:13 -07:00
Trenton H
13465fcfda Fixes grammar in comment
Co-authored-by: Florian <florian.brandes@posteo.de>
2022-09-16 09:08:16 -07:00
Trenton Holmes
b21f64de8a Updates how barcodes are detected, using pikepdf images, instead of converting each page to an image 2022-09-16 09:08:16 -07:00
Trenton Holmes
33a4a273a3 Fixes the separation of files by barcode, in the case where 2 barcodes appear back to back 2022-09-14 14:00:37 -07:00
Trenton Holmes
af204426af Moves the barcode related functionality out of tasks and into its own location. Splits up the testing based on that 2022-07-02 16:19:22 +02:00