36 Commits

Author SHA1 Message Date
shamoon
a3b4198408 Feature: auto-clean some invalid pdfs (#7651) 2024-09-25 15:57:20 +00:00
Dennis Brakhane
ef749f9a29 Feature: collate two single-sided multipage scans (#3784)
* Feature: collate two single-sided scans

Some ADFs only support single-sided scans, making scanning
double-sided documents a bit annoying.

This new feature enables Paperless to do most of the work,
by merging two separate scans into a single one, collating
the even and odd numbered pages.

* Documentation: clarify that collation is disabled by default

* Apply suggestions from code review

Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>

* Address code review remarks

* Grammar fixes

---------

Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
2023-07-24 00:29:04 -07:00
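The collation described in the commit above can be illustrated with a minimal sketch. It is not the Paperless implementation: the function name `collate` and the `even_reversed` flag are hypothetical, and the sketch assumes that the second scan holds the even pages in reverse order (as happens when the stack is flipped and fed through the ADF again). It also assumes the odd scan may hold at most one page more than the even scan.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def collate(
    odd_pages: Sequence[T],
    even_pages: Sequence[T],
    even_reversed: bool = True,  # assumption: the second scan arrives back to front
) -> List[T]:
    """Interleave two single-sided scans into one double-sided page order."""
    evens = list(reversed(even_pages)) if even_reversed else list(even_pages)

    # Assumption: the odd scan has the same number of pages as the even scan,
    # or exactly one more (a final sheet with no scanned back side).
    if len(odd_pages) - len(evens) not in (0, 1):
        raise ValueError("page counts of the two scans do not match")

    merged: List[T] = []
    for index, page in enumerate(odd_pages):
        merged.append(page)
        if index < len(evens):
            merged.append(evens[index])
    return merged


# Pages are represented by their page numbers here; in practice they would be
# page objects from whatever PDF library is in use.
first_scan = [1, 3, 5]   # front sides, in order
second_scan = [6, 4, 2]  # back sides, in reverse order
print(collate(first_scan, second_scan))
```

Working on plain sequences keeps the ordering logic separate from any PDF handling, so it can be checked without real scans.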
Bastian Machek
324e30bd4b Feature: support barcode upscaling for better detection of small barcodes (#3655) 2023-06-27 10:18:47 -07:00
Trenton H
1396f25419 Updates handling of barcodes to encapsulate logic, moving it out of tasks and into barcodes 2023-05-22 06:52:31 -07:00
Fabian Ohler
c08b19c7a9 Feature: split documents on ASN barcode (#2554)
* also split documents when an ASN barcode is found

* linter

* fix test case parameters

* avoid pre-python-3.9 features

* simplify dict-creation in tests

* simplify dict-creation in tests for empty dicts

* Add test cases for the splitting by ASN barcode feature

* deleted supporting files for test case construction
2023-02-01 01:13:30 -08:00
Trenton H
b19ada7a41 Removes pikepdf based scanning, fixes up unit testing (+ commenting) 2023-01-27 12:24:47 -08:00
shamoon
7dad9f29a1 Merge pull request #2498 from paperless-ngx/fix-2496
Fix: limit asn integer size
2023-01-24 10:37:04 -08:00
Trenton H
68c9f7a614 Rescales images from PDFs so zbar can better find them 2023-01-24 10:30:53 -08:00
Trenton H
83c5e051fd Adjust the barcode to ASN range check and add test case to cover the check 2023-01-24 10:30:32 -08:00
Peter Kappelt
8ed3740c98 Extended tests for ASN barcode parsing 2023-01-24 09:43:52 -08:00
Trenton H
1e1f0347fa More smoothly handle the case of a password protected PDF for barcodes 2022-10-24 13:16:14 -07:00
Trenton Holmes
ddef90d96e Adds specific handling for CCITT Group 4, which pikepdf decodes, but not correctly 2022-10-11 13:51:14 -07:00
Trenton Holmes
33a4a273a3 Fixes the separation of files by barcode, during the case where 2 barcodes appear back to back 2022-09-14 14:00:37 -07:00
Trenton Holmes
ee4e3cebe2 Removes last vestiges of PNG from the tests, code, docs and samples 2022-06-11 14:20:50 -07:00
Trenton Holmes
a944ef1ca6 Adds additional testing for both date parsing and consumed document created date 2022-05-08 16:57:35 -07:00
Florian Brandes
40d1412e5e add TIFF barcode support
Signed-off-by: Florian Brandes <florian.brandes@posteo.de>
2022-04-16 21:59:03 +02:00
Florian Brandes
b58d10a2d7 add more tests
Signed-off-by: Florian Brandes <florian.brandes@posteo.de>
2022-04-07 11:14:29 +02:00
Florian Brandes
537bec2eeb added tests:
- barcode-39
- barcode-128
- qr barcodes
- test for consumption

Signed-off-by: Florian Brandes <florian.brandes@posteo.de>
2022-04-07 11:14:29 +02:00
florian on nixos (Florian Brandes)
b787971421 working split pages
Signed-off-by: florian on nixos (Florian Brandes) <florian.brandes@posteo.de>
2022-04-06 21:16:41 +02:00
florian on nixos (Florian Brandes)
aa46b06d95 add first tests for barcode reader
Signed-off-by: florian on nixos (Florian Brandes) <florian.brandes@posteo.de>
2022-04-06 21:16:41 +02:00
jonaswinkler
d41f540a87 more testing #511 2021-02-09 00:01:11 +01:00
jonaswinkler
e1376cbf40 migration for #511 2021-02-08 20:59:14 +01:00
jonaswinkler
817bb299a9 added a test case that replicates #511 2021-02-07 18:23:54 +01:00
jonaswinkler
f1d15561f6 some bug fixes and tests 2021-01-18 14:16:32 +01:00
jonaswinkler
d60c3c1abd new exporter that updates the export in place, fixes #376 #343 #166 2021-01-18 01:15:39 +01:00
jonaswinkler
99c7ff3123 added invalid PDF document with BOM marker 2020-12-29 21:02:45 +01:00
jonaswinkler
cb959e296a more tests! 2020-11-29 19:22:49 +01:00
jonaswinkler
744b86bb91 fixes an issue with paperless not assigning metadata when FILENAME_FORMAT is specified and resolves an invalid warning about missing files fixes #67 2020-11-29 14:45:43 +01:00
jonaswinkler
f49bf187eb more tests. 2020-11-26 23:56:57 +01:00
Jonas Winkler
d08e6f333a Test cases for the API 2020-11-26 17:57:00 +01:00
Jonas Winkler
d99b4623f8 first implementation of the mail rework 2020-11-15 23:56:22 +01:00
Daniel Quinn
d2c283582b feat: refactor for pluggable consumers
I've broken out the OCR-specific code from the consumers and dumped it
all into its own app, `paperless_tesseract`.  This new app should serve
as a sample of how to create one's own consumer for different file
types.

Documentation for how to do this isn't ready yet, but for the impatient:

* Create a new app
    * containing a `parsers.py` for your parser modelled after
      `paperless_tesseract.parsers.RasterisedDocumentParser`
    * containing a `signals.py` with a handler modelled after
      `paperless_tesseract.signals.ConsumerDeclaration`
    * connect the signal handler to
      `documents.signals.document_consumer_declaration` in
      `your_app.apps`
* Install the app into Paperless by declaring
  `PAPERLESS_INSTALLED_APPS=your_app`.  Additional apps should be
  separated with commas.
* Restart the consumer
2017-03-25 15:10:25 +00:00
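The steps listed in the commit above can be sketched roughly as follows. This is a self-contained illustration, not the actual Paperless code: the `Signal` class is a minimal stand-in for `django.dispatch.Signal` so the example runs without Django, `PlainTextParser` and `pick_parser` are hypothetical, and the dictionary returned by the handler (`parser`, `weight`, `test`) is an assumed shape that the commit message does not confirm.

```python
from typing import Any, Callable, List, Optional, Tuple


class Signal:
    """Minimal stand-in for django.dispatch.Signal (connect/send only)."""

    def __init__(self) -> None:
        self._receivers: List[Callable[..., Any]] = []

    def connect(self, receiver: Callable[..., Any]) -> None:
        self._receivers.append(receiver)

    def send(self, sender: Any, **kwargs: Any) -> List[Tuple[Callable[..., Any], Any]]:
        # Like Django, return a list of (receiver, response) pairs.
        return [(r, r(sender=sender, **kwargs)) for r in self._receivers]


# Stand-in for documents.signals.document_consumer_declaration.
document_consumer_declaration = Signal()


class PlainTextParser:
    """Hypothetical parser; plays the role of a class in your_app/parsers.py."""

    def __init__(self, path: str) -> None:
        self.path = path


class ConsumerDeclaration:
    """Hypothetical handler; plays the role of a class in your_app/signals.py."""

    @classmethod
    def handle(cls, sender: Any, **kwargs: Any) -> dict:
        # Assumed return shape: which parser to use, its priority, and a
        # predicate that decides whether the parser accepts a given file.
        return {"parser": PlainTextParser, "weight": 0, "test": cls.test}

    @staticmethod
    def test(path: str) -> bool:
        return path.lower().endswith(".txt")


# In a real Django app this connection would be made in your_app.apps.
document_consumer_declaration.connect(ConsumerDeclaration.handle)


def pick_parser(path: str) -> Optional[type]:
    """Hypothetical consumer-side lookup: ask every declared parser."""
    best = None
    for _receiver, declaration in document_consumer_declaration.send(sender=None):
        if declaration["test"](path) and (
            best is None or declaration["weight"] > best["weight"]
        ):
            best = declaration
    return best["parser"] if best else None


print(pick_parser("notes.txt"))  # the hypothetical PlainTextParser class
print(pick_parser("scan.pdf"))   # None: no declared parser accepts this file
```

The point of the signal is that the consumer never imports a parser directly; each installed app announces its own parser and the consumer chooses among the responses.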
Daniel Quinn
18495ce9da Fix for #154
* Added a test with a faked pyocr and tesseract
* Added a catch for pyocr's *other* TesseractError
2016-11-27 15:06:45 +00:00
Florian Harr
9ff4b6c6bc UnitTests for inline attachment email 2016-04-14 13:01:03 -04:00
Daniel Quinn
6b0a537bff Added support for a shared secret in email 2016-02-14 03:01:24 +00:00
Daniel Quinn
c4311af263 Cleaned up the tests 2016-02-06 17:41:11 +00:00