From d7160de9f1ebd7aafe765561af059125fe14275e Mon Sep 17 00:00:00 2001 From: Jonas Winkler Date: Mon, 23 Nov 2020 19:34:52 +0100 Subject: [PATCH 01/36] many changes to the documentation, mostly typos --- docs/administration.rst | 12 ++++++------ docs/advanced_usage.rst | 6 ++++-- docs/api.rst | 2 +- docs/changelog.rst | 6 +++--- docs/configuration.rst | 8 ++++---- docs/contributing.rst | 2 +- docs/faq.rst | 6 +++--- docs/scanners.rst | 2 +- docs/screenshots.rst | 2 +- docs/setup.rst | 6 +++--- docs/usage_overview.rst | 24 +++++++++++++++--------- 11 files changed, 42 insertions(+), 34 deletions(-) diff --git a/docs/administration.rst b/docs/administration.rst index a77c559f9..c582e83a0 100644 --- a/docs/administration.rst +++ b/docs/administration.rst @@ -30,7 +30,7 @@ Options available to docker installations: Paperless uses 3 volumes: * ``paperless_media``: This is where your documents are stored. - * ``paperless_data``: This is where auxilliary data is stored. This + * ``paperless_data``: This is where auxillary data is stored. This folder also contains the SQLite database, if you use it. * ``paperless_pgdata``: Exists only if you use PostgreSQL and contains the database. @@ -109,7 +109,7 @@ B. If you built the image yourself, grab the new archive and replace your curre .. hint:: You can usually keep your ``docker-compose.env`` file, since this file will - never include mandantory configuration options. However, it is worth checking + never include mandatory configuration options. However, it is worth checking out the new version of this file, since it might have new recommendations on what to configure. @@ -126,8 +126,8 @@ After grabbing the new release and unpacking the contents, do the following: $ pip install --upgrade pipenv $ cd /path/to/paperless - $ pipenv install $ pipenv clean + $ pipenv install This creates a new virtual environment (or uses your existing environment) and installs all dependencies into it. 
@@ -247,12 +247,12 @@ your already processed documents. When multiple document types or correspondents match a single document, the retagger won't assign these to the document. Specify ``--use-first`` -to override this behaviour and just use the first correspondent or type +to override this behavior and just use the first correspondent or type it finds. This option does not apply to tags, since any amount of tags can be applied to a document. Finally, ``-f`` specifies that you wish to overwrite already assigned -correspondents, types and/or tags. The default behaviour is to not +correspondents, types and/or tags. The default behavior is to not assign correspondents and types to documents that have this data already assigned. ``-f`` works differently for tags: By default, only additional tags get added to documents, no tags will be removed. With ``-f``, tags that don't @@ -341,7 +341,7 @@ Documents can be stored in Paperless using GnuPG encryption. .. danger:: - Encryption is depreceated since paperless-ng 0.9 and doesn't really provide any + Encryption is deprecated since paperless-ng 0.9 and doesn't really provide any additional security, since you have to store the passphrase in a configuration file on the same system as the encrypted documents for paperless to work. Furthermore, the entire text content of the documents is stored plain in the diff --git a/docs/advanced_usage.rst b/docs/advanced_usage.rst index a6f44ce48..653bee1c6 100644 --- a/docs/advanced_usage.rst +++ b/docs/advanced_usage.rst @@ -84,6 +84,8 @@ to the filename. PAPERLESS_FILENAME_PARSE_TRANSFORMS=[{"pattern":"^([a-z]+)_(\\d{8})_(\\d{6})_([0-9]+)\\.", "repl":"\\2\\3Z - \\4 - \\1."}, {"pattern":"^([a-z]+)_([0-9]+)\\.", "repl":" - \\2 - \\1."}] +.. 
_advanced-matching: + Matching tags, correspondents and document types ################################################ @@ -253,7 +255,7 @@ By default, paperless stores your documents in the media directory and renames t using the identifier which it has assigned to each document. You will end up getting files like ``0000123.pdf`` in your media directory. This isn't necessarily a bad thing, because you normally don't have to access these files manually. However, if -you wish to name your files differently, you can do that by adjustng the +you wish to name your files differently, you can do that by adjusting the ``PAPERLESS_FILENAME_FORMAT`` settings variable. This variable allows you to configure the filename (folders are allowed!) using @@ -278,7 +280,7 @@ will create a directory structure as follows: my_new_shoes-0000004.pdf Paperless appends the unique identifier of each document to the filename. This -avoides filename clashes. +avoids filename clashes. .. danger:: diff --git a/docs/api.rst b/docs/api.rst index e661cc3ff..4f41832de 100644 --- a/docs/api.rst +++ b/docs/api.rst @@ -94,7 +94,7 @@ Result object: } * ``id``: the primary key of the found document -* ``highlights``: an object containing parseable highlights for the result. +* ``highlights``: an object containing parsable highlights for the result. See below. * ``score``: The score assigned to the document. A higher score indicates a better match with the query. Search results are sorted descending by score. diff --git a/docs/changelog.rst b/docs/changelog.rst index 9ab853439..9ef3f4326 100644 --- a/docs/changelog.rst +++ b/docs/changelog.rst @@ -52,7 +52,7 @@ paperless-ng 0.9.0 * **Added:** New frontend. Features: * Single page application: It's much more responsive than the django admin pages. - * Dashboard. Shows recently scanned documents, or todos, or other documents + * Dashboard. Shows recently scanned documents, or todo notes, or other documents at wish. Allows uploading of documents. 
Shows basic statistics. * Better document list with multiple display options. * Full text search with result highlighting, auto completion and scoring based @@ -102,7 +102,7 @@ paperless-ng 0.9.0 * **Modified [breaking]:** PostgreSQL: - * If ``PAPERLESS_DBHOST`` is specified in the settings, paperless uses postgresql instead of sqlite. + * If ``PAPERLESS_DBHOST`` is specified in the settings, paperless uses PostgreSQL instead of SQLite. Username, database and password all default to ``paperless`` if not specified. * **Modified [breaking]:** document_retagger management command rework. See @@ -130,7 +130,7 @@ paperless-ng 0.9.0 Certain language specifics such as umlauts may not get picked up properly. * ``PAPERLESS_DEBUG`` defaults to ``false``. * The presence of ``PAPERLESS_DBHOST`` now determines whether to use PostgreSQL or - sqlite. + SQLite. * ``PAPERLESS_OCR_THREADS`` is gone and replaced with ``PAPERLESS_TASK_WORKERS`` and ``PAPERLESS_THREADS_PER_WORKER``. Refer to the config example for details. * ``PAPERLESS_OPTIMIZE_THUMBNAILS`` allows you to disable or enable thumbnail diff --git a/docs/configuration.rst b/docs/configuration.rst index afb0b5f90..e3f0c0e9f 100644 --- a/docs/configuration.rst +++ b/docs/configuration.rst @@ -69,7 +69,7 @@ PAPERLESS_CONSUMPTION_DIR= Defaults to "../consume", relative to the "src" directory. PAPERLESS_DATA_DIR= - This is where paperless stores all its data (search index, sqlite database, + This is where paperless stores all its data (search index, SQLite database, classification model, etc). Defaults to "../data", relative to the "src" directory. @@ -100,7 +100,7 @@ Hosting & Security ################## PAPERLESS_SECRET_KEY= - Paperless uses this to make session tokens. If you exose paperless on the + Paperless uses this to make session tokens. If you expose paperless on the internet, you need to change this, since the default secret is well known. Use any sequence of characters. The more, the better. 
You don't need to @@ -220,7 +220,7 @@ PAPERLESS_CONSUMER_POLLING= specify a polling interval in seconds here, which will then cause paperless to periodically check your consumption directory for changes. - Defaults to 0, which disables polling and uses filesystem notifiactions. + Defaults to 0, which disables polling and uses filesystem notifications. PAPERLESS_CONSUMER_DELETE_DUPLICATES= When the consumer detects a duplicate document, it will not touch the @@ -264,7 +264,7 @@ PAPERLESS_CONVERT_DENSITY= Default is 300. PAPERLESS_OPTIMIZE_THUMBNAILS= - Use optipng to optimize thumbnails. This usually reduces the sice of + Use optipng to optimize thumbnails. This usually reduces the size of thumbnails by about 20%, but uses considerable compute time during consumption. diff --git a/docs/contributing.rst b/docs/contributing.rst index 540081b7e..30eb9779a 100644 --- a/docs/contributing.rst +++ b/docs/contributing.rst @@ -85,7 +85,7 @@ quoted, or triple-quoted string will do: problematic_string = 'This is a "string" with "quotes" in it' In HTML templates, please use double-quotes for tag attributes, and single -quotes for arguments passed to Django tempalte tags: +quotes for arguments passed to Django template tags: .. code:: html diff --git a/docs/faq.rst b/docs/faq.rst index ea05544a6..7b5432326 100644 --- a/docs/faq.rst +++ b/docs/faq.rst @@ -17,7 +17,7 @@ is .. caution:: - Dont mess with this folder. Don't change permissions and don't move + Do not mess with this folder. Don't change permissions and don't move files around manually. This folder is meant to be entirely managed by docker and paperless. @@ -36,9 +36,9 @@ file extensions do not matter. **A:** The short answer is yes. I've tested it on a Raspberry Pi 3 B. The long answer is that certain parts of -Paperless will run very slow, such as the tesseract OCR. On Rasperry Pi, +Paperless will run very slow, such as the tesseract OCR. 
On Raspberry Pi, try to OCR documents before feeding them into paperless so that paperless can -reuse the text. The web interface should be alot snappier, since it runs +reuse the text. The web interface should be a lot snappier, since it runs in your browser and paperless has to do much less work to serve the data. .. note:: diff --git a/docs/scanners.rst b/docs/scanners.rst index 0c78f79e4..d4ad4dfb1 100644 --- a/docs/scanners.rst +++ b/docs/scanners.rst @@ -8,7 +8,7 @@ Scanner recommendations As Paperless operates by watching a folder for new files, doesn't care what scanner you use, but sometimes finding a scanner that will write to an FTP, NFS, or SMB server can be difficult. This page is here to help you find one -that works right for you based on recommentations from other Paperless users. +that works right for you based on recommendations from other Paperless users. +---------+----------------+-----+-----+-----+----------------+ | Brand | Model | Supports | Recommended By | diff --git a/docs/screenshots.rst b/docs/screenshots.rst index cf99641c5..7ba431563 100644 --- a/docs/screenshots.rst +++ b/docs/screenshots.rst @@ -21,7 +21,7 @@ Extensive filtering mechanisms: .. image:: _static/screenshots/documents-filter.png -Side-by-side editing of documents. Optmized for 1080p. +Side-by-side editing of documents. Optimized for 1080p. .. image:: _static/screenshots/editing.png diff --git a/docs/setup.rst b/docs/setup.rst index af2f47f90..d0e7099c7 100644 --- a/docs/setup.rst +++ b/docs/setup.rst @@ -85,7 +85,7 @@ Paperless consists of the following components: needs to do from time to time in order to operate properly. This allows paperless to process multiple documents from your consumption folder in parallel! On - a modern multicore system, consumption with full ocr is blazing fast. + a modern multi core system, consumption with full ocr is blazing fast. 
The task processor comes with a built-in admin interface that you can use to see whenever any of the tasks fail and inspect the errors (i.e., wrong email credentials, errors during consuming a specific @@ -322,7 +322,7 @@ management commands as below. $ cd /path/to/paperless $ docker-compose run --rm webserver /bin/bash - This will lauch the container and initialize the PostgreSQL database. + This will launch the container and initialize the PostgreSQL database. b) Without docker, open a shell in your virtual environment, switch to the ``src`` directory and create the database schema: @@ -372,7 +372,7 @@ configuring some options in paperless can help improve performance immensely: * ``PAPERLESS_TASK_WORKERS`` and ``PAPERLESS_THREADS_PER_WORKER`` are configured to use all cores. The Raspberry Pi models 3 and up have 4 cores, meaning that paperless will use 2 workers and 2 threads per worker. This may result in - slugish response times during consumption, so you might want to lower these + sluggish response times during consumption, so you might want to lower these settings (example: 2 workers and 1 thread to always have some computing power left for other tasks). * Keep ``PAPERLESS_OCR_ALWAYS`` at its default value 'false' and consider OCR'ing diff --git a/docs/usage_overview.rst b/docs/usage_overview.rst index 5f47b56a9..0e50dafc2 100644 --- a/docs/usage_overview.rst +++ b/docs/usage_overview.rst @@ -5,13 +5,13 @@ Usage Overview Paperless is an application that manages your personal documents. With the help of a document scanner (see :ref:`scanners`), paperless transforms your wieldy physical document binders into a searchable archive and -provices many utilities for finding and managing your documents. +provides many utilities for finding and managing your documents. 
Terms and definitions ##################### -Paperless esentially consists of two different parts for managing your +Paperless essentially consists of two different parts for managing your documents: * The *consumer* watches a specified folder and adds all documents in that @@ -30,12 +30,12 @@ Each document has a couple of fields that you can assign to them: tag, however, a single document can also have multiple tags. This is not possible with folders. The reason folders are not implemented in paperless is simply that tags are much more versatile than folders. -* A *document type* is used to demarkate the type of a document such as letter, +* A *document type* is used to demarcate the type of a document such as letter, bank statement, invoice, contract, etc. It is used to identify what a document is about. * The *date added* of a document is the date the document was scanned into paperless. You cannot and should not change this date. -* The *date created* of a document is the date the document was intially issued. +* The *date created* of a document is the date the document was initially issued. This can be the date you bought a product, the date you signed a contract, or the date a letter was sent to you. * The *archive serial number* (short: ASN) of a document is the identifier of @@ -131,7 +131,7 @@ These are as follows: With the correct set of rules, you can completely automate your email documents. Create rules for every correspondent you receive digital documents from and - paperless will read them automatically. The default acion "mark as read" is + paperless will read them automatically. The default action "mark as read" is pretty tame and will not cause any damage or data loss whatsoever. You can also setup a special folder in your mail account for paperless and use @@ -182,7 +182,7 @@ Processing of the physical documents ==================================== Keep a physical inbox. 
Whenever you receive a document that you need to -archive, put it into your inbox. Regulary, do the following for all documents +archive, put it into your inbox. Regularly, do the following for all documents in your inbox: 1. For each document, decide if you need to keep the document in physical @@ -217,18 +217,24 @@ Once you have scanned in a document, proceed in paperless as follows. 1. If the document has an ASN, assign the ASN to the document. 2. Assign a correspondent to the document (i.e., your employer, bank, etc) - This isnt strictly necessary but helps in finding a document when you need + This isn't strictly necessary but helps in finding a document when you need it. 3. Assign a document type (i.e., invoice, bank statement, etc) to the document - This isnt strictly necessary but helps in finding a document when you need + This isn't strictly necessary but helps in finding a document when you need it. 4. Assign a proper title to the document (the name of an item you bought, the subject of the letter, etc) -5. Check that the date of the document is corrent. Paperless tries to read +5. Check that the date of the document is correct. Paperless tries to read the date from the content of the document, but this fails sometimes if the OCR is bad or multiple dates appear on the document. 6. Remove inbox tags from the documents. +.. hint:: + + You can setup manual matching rules for your correspondents and tags and + paperless will assign them automatically. After consuming a couple documents, + you can even ask paperless to *learn* when to assign tags and correspondents + by itself. For details on this feature, see :ref:`advanced-matching`. Task management =============== From 09e419aeee98f0fdb70017f20d058c992ea275e9 Mon Sep 17 00:00:00 2001 From: Jonas Winkler Date: Mon, 23 Nov 2020 21:42:01 +0100 Subject: [PATCH 02/36] Added some notes about how to move back to paperless. 
--- docs/setup.rst | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/docs/setup.rst b/docs/setup.rst index d0e7099c7..88785364b 100644 --- a/docs/setup.rst +++ b/docs/setup.rst @@ -357,6 +357,35 @@ management commands as below. 7. Start paperless. +Moving back to paperless +======================== + +Let's say you migrated to Paperless-ng and used it for a while, but decided that +you don't like it and want to move back (If you do, send me a mail about what +part you didn't like!), you can totally do that with a few simple steps. + +Paperless-ng modified the database schema slightly, however, these changes can +be reverted while keeping your current data, so that your current data will +be compatible with original Paperless. + +Execute this: + +.. code:: shell-session + + $ cd /path/to/paperless + $ docker-compose run --rm webserver migrate documents 0023 + +Or without docker: + +.. code:: shell-session + + $ cd /path/to/paperless/src + $ python3 manage.py migrate documents 0023 + +After that, you need to clear your cookies (Paperless-ng comes with updated +dependencies that do cookie-processing differently) and probably your cache +as well. + .. _setup-less_powerful_devices: From f4013b134323215e167868964746bdf6f0f828b6 Mon Sep 17 00:00:00 2001 From: Jonas Winkler Date: Mon, 23 Nov 2020 22:50:02 +0100 Subject: [PATCH 03/36] added setting PAPERLESS_AUTO_LOGIN_USERNAME --- docs/changelog.rst | 84 ++++++++++++++++++++++----------------- docs/configuration.rst | 40 ++++++++++++------- paperless.conf.example | 1 + src/paperless/auth.py | 10 +++++ src/paperless/settings.py | 9 +++++ 5 files changed, 92 insertions(+), 52 deletions(-) diff --git a/docs/changelog.rst b/docs/changelog.rst index 9ef3f4326..7a1b1c374 100644 --- a/docs/changelog.rst +++ b/docs/changelog.rst @@ -5,6 +5,13 @@ Changelog ********* +next +#### + +* Setting ``PAPERLESS_AUTO_LOGIN_USERNAME`` replaces ``PAPERLESS_DISABLE_LOGIN``. + You have to specify your username.
+ + paperless-ng 0.9.2 ################## @@ -138,8 +145,11 @@ paperless-ng 0.9.0 * Many more small changes here and there. The usual stuff. +Paperless +######### + 2.7.0 -##### +===== * `syntonym`_ submitted a pull request to catch IMAP connection errors `#475`_. * `Stéphane Brunner`_ added ``psycopg2`` to the Pipfile `#489`_. He also fixed @@ -156,7 +166,7 @@ paperless-ng 0.9.0 2.6.1 -##### +===== * We now have a logo, complete with a favicon :-) * Removed some problematic tests. @@ -168,7 +178,7 @@ paperless-ng 0.9.0 2.6.0 -##### +===== * Allow an infinite number of logs to be deleted. Thanks to `Ulli`_ for noting the problem in `#433`_. @@ -189,7 +199,7 @@ paperless-ng 0.9.0 2.5.0 -##### +===== * **New dependency**: Paperless now optimises thumbnail generation with `optipng`_, so you'll need to install that somewhere in your PATH or declare @@ -233,7 +243,7 @@ paperless-ng 0.9.0 2.4.0 -##### +===== * A new set of actions are now available thanks to `jonaswinkler`_'s very first pull request! You can now do nifty things like tag documents in bulk, or set @@ -254,7 +264,7 @@ paperless-ng 0.9.0 2.3.0 -##### +===== * Support for consuming plain text & markdown documents was added by `Joshua Taillon`_! This was a long-requested feature, and it's addition is @@ -272,14 +282,14 @@ paperless-ng 0.9.0 2.2.1 -##### +===== * `Kyle Lucy`_ reported a bug quickly after the release of 2.2.0 where we broke the ``DISABLE_LOGIN`` feature: `#392`_. 2.2.0 -##### +===== * Thanks to `dadosch`_, `Wolfgang Mader`_, and `Tim Brooks`_ this is the first version of Paperless that supports Django 2.0! As a result of their hard @@ -296,7 +306,7 @@ paperless-ng 0.9.0 2.1.0 -##### +===== * `Enno Lohmeier`_ added three simple features that make Paperless a lot more user (and developer) friendly: @@ -315,7 +325,7 @@ paperless-ng 0.9.0 2.0.0 -##### +===== This is a big release as we've changed a core-functionality of Paperless: we no longer encrypt files with GPG by default. 
@@ -347,7 +357,7 @@ Special thanks to `erikarvstedt`_, `matthewmoto`_, and `mcronce`_ who did the bulk of the work on this big change. 1.4.0 -##### +===== * `Quentin Dawans`_ has refactored the document consumer to allow for some command-line options. Notably, you can now direct it to consume from a @@ -382,7 +392,7 @@ bulk of the work on this big change. to some excellent work from `erikarvstedt`_ on `#351`_ 1.3.0 -##### +===== * You can now run Paperless without a login, though you'll still have to create at least one user. This is thanks to a pull-request from `matthewmoto`_: @@ -405,7 +415,7 @@ bulk of the work on this big change. problem and helping me find where to fix it. 1.2.0 -##### +===== * New Docker image, now based on Alpine, thanks to the efforts of `addadi`_ and `Pit`_. This new image is dramatically smaller than the Debian-based @@ -424,7 +434,7 @@ bulk of the work on this big change. in the document text. 1.1.0 -##### +===== * Fix for `#283`_, a redirect bug which broke interactions with paperless-desktop. Thanks to `chris-aeviator`_ for reporting it. @@ -434,7 +444,7 @@ bulk of the work on this big change. `Dan Panzarella`_ 1.0.0 -##### +===== * Upgrade to Django 1.11. **You'll need to run ``pip install -r requirements.txt`` after the usual ``git pull`` to @@ -453,14 +463,14 @@ bulk of the work on this big change. `Lukas Winkler`_'s issue `#278`_ 0.8.0 -##### +===== * Paperless can now run in a subdirectory on a host (``/paperless``), rather than always running in the root (``/``) thanks to `maphy-psd`_'s work on `#255`_. 0.7.0 -##### +===== * **Potentially breaking change**: As per `#235`_, Paperless will no longer automatically delete documents attached to correspondents when those @@ -472,7 +482,7 @@ bulk of the work on this big change. `Kusti Skytén`_ for posting the correct solution in the Github issue. 0.6.0 -##### +===== * Abandon the shared-secret trick we were using for the POST API in favour of BasicAuth or Django session. 
@@ -486,7 +496,7 @@ bulk of the work on this big change. the help with this feature. 0.5.0 -##### +===== * Support for fuzzy matching in the auto-tagger & auto-correspondent systems thanks to `Jake Gysland`_'s patch `#220`_. @@ -504,13 +514,13 @@ bulk of the work on this big change. * Amended the Django Admin configuration to have nice headers (`#230`_) 0.4.1 -##### +===== * Fix for `#206`_ wherein the pluggable parser didn't recognise files with all-caps suffixes like ``.PDF`` 0.4.0 -##### +===== * Introducing reminders. See `#199`_ for more information, but the short explanation is that you can now attach simple notes & times to documents @@ -520,7 +530,7 @@ bulk of the work on this big change. like to make use of this feature in his project. 0.3.6 -##### +===== * Fix for `#200`_ (!!) where the API wasn't configured to allow updating the correspondent or the tags for a document. @@ -534,7 +544,7 @@ bulk of the work on this big change. documentation is on its way. 0.3.5 -##### +===== * A serious facelift for the documents listing page wherein we drop the tabular layout in favour of a tiled interface. @@ -545,7 +555,7 @@ bulk of the work on this big change. consumption. 0.3.4 -##### +===== * Removal of django-suit due to a licensing conflict I bumped into in 0.3.3. Note that you *can* use Django Suit with Paperless, but only in a @@ -558,26 +568,26 @@ bulk of the work on this big change. API thanks to @thomasbrueggemann. See `#179`_. 0.3.3 -##### +===== * Thumbnails in the UI and a Django-suit -based face-lift courtesy of @ekw! * Timezone, items per page, and default language are now all configurable, also thanks to @ekw. 0.3.2 -##### +===== * Fix for `#172`_: defaulting ALLOWED_HOSTS to ``["*"]`` and allowing the user to set her own value via ``PAPERLESS_ALLOWED_HOSTS`` should the need arise. 
0.3.1 -##### +===== * Added a default value for ``CONVERT_BINARY`` 0.3.0 -##### +===== * Updated to using django-filter 1.x * Added some system checks so new users aren't confused by misconfigurations. @@ -590,7 +600,7 @@ bulk of the work on this big change. ``PAPERLESS_SHARED_SECRET`` respectively instead. 0.2.0 -##### +===== * `#150`_: The media root is now a variable you can set in ``paperless.conf``. @@ -618,7 +628,7 @@ bulk of the work on this big change. to `Martin Honermeyer`_ and `Tim White`_ for working with me on this. 0.1.1 -##### +===== * Potentially **Breaking Change**: All references to "sender" in the code have been renamed to "correspondent" to better reflect the nature of the @@ -642,7 +652,7 @@ bulk of the work on this big change. to be imported but made unavailable. 0.1.0 -##### +===== * Docker support! Big thanks to `Wayne Werner`_, `Brian Conn`_, and `Tikitu de Jager`_ for this one, and especially to `Pit`_ @@ -661,14 +671,14 @@ bulk of the work on this big change. * Added tox with pep8 checking 0.0.6 -##### +===== * Added support for parallel OCR (significant work from `Pit`_) * Sped up the language detection (significant work from `Pit`_) * Added simple logging 0.0.5 -##### +===== * Added support for image files as documents (png, jpg, gif, tiff) * Added a crude means of HTTP POST for document imports @@ -677,7 +687,7 @@ bulk of the work on this big change. * Documentation for the above as well as data migration 0.0.4 -##### +===== * Added automated tagging basted on keyword matching * Cleaned up the document listing page @@ -685,19 +695,19 @@ bulk of the work on this big change. * Added ``pytz`` to the list of requirements 0.0.3 -##### +===== * Added basic tagging 0.0.2 -##### +===== * Added language detection * Added datestamps to ``document_exporter``. * Changed ``settings.TESSERACT_LANGUAGE`` to ``settings.OCR_LANGUAGE``. 
0.0.1 -##### +===== * Initial release diff --git a/docs/configuration.rst b/docs/configuration.rst index e3f0c0e9f..c3f01c2ca 100644 --- a/docs/configuration.rst +++ b/docs/configuration.rst @@ -35,22 +35,22 @@ PAPERLESS_DBHOST= PAPERLESS_DBPORT= Adjust port if necessary. - + Default is 5432. PAPERLESS_DBNAME= Database name in PostgreSQL. - + Defaults to "paperless". PAPERLESS_DBUSER= Database user in PostgreSQL. - + Defaults to "paperless". PAPERLESS_DBPASS= Database password for PostgreSQL. - + Defaults to "paperless". @@ -113,7 +113,7 @@ PAPERLESS_ALLOWED_HOSTS really should set this value to the domain name you're using. Failing to do so leaves you open to HTTP host header attacks: https://docs.djangoproject.com/en/3.1/topics/security/#host-header-validation - + Just remember that this is a comma-separated list, so "example.com" is fine, as is "example.com,www.example.com", but NOT " example.com" or "example.com," @@ -132,15 +132,25 @@ PAPERLESS_FORCE_SCRIPT_NAME= .. note:: I don't know if this works in paperless-ng. Probably not. - + Defaults to none, which hosts paperless at "/". PAPERLESS_STATIC_URL= Override the STATIC_URL here. Unless you're hosting Paperless off a subdomain like /paperless/, you probably don't need to change this. - + Defaults to "/static/". +PAPERLESS_AUTO_LOGIN_USERNAME= + Specify a username here so that paperless will automatically perform login + with the selected user. + + .. danger:: + + Do not use this when exposing paperless on the internet. There are no + checks in place that would prevent you from doing this. + + Defaults to none, which disables this feature. Software tweaks ############### @@ -156,11 +166,11 @@ PAPERLESS_THREADS_PER_WORKER= in parallel on a single document. .. caution:: - + Ensure that the product - + PAPERLESS_TASK_WORKERS * PAPERLESS_THREADS_PER_WORKER - + does not exceed your CPU core count or else paperless will be extremely slow. 
If you want paperless to process many documents in parallel, choose a high worker count. If you want paperless to process very large documents faster, @@ -197,10 +207,10 @@ PAPERLESS_OCR_PAGES= PAPERLESS_OCR_LANGUAGE= Customize the default language that tesseract will attempt to use when parsing documents. The default language is used whenever - + * No language could be detected on a document * No tesseract data files are available for the detected language - + It should be a 3-letter language code consistent with ISO 639: https://www.loc.gov/standards/iso639-2/php/code_list.php @@ -234,7 +244,7 @@ PAPERLESS_CONVERT_MEMORY_LIMIT= such cases, try setting this to a reasonably low value, like 32. The default is to use whatever is necessary to do everything without writing to disk, and units are in megabytes. - + For more information on how to use this value, you should search the web for "MAGICK_MEMORY_LIMIT". @@ -245,7 +255,7 @@ PAPERLESS_CONVERT_TMPDIR= /tmp as tmpfs, you should set this to a path that's on a physical disk, like /home/your_user/tmp or something. ImageMagick will use this as scratch space when crunching through very large documents. - + For more information on how to use this value, you should search the web for "MAGICK_TMPDIR". @@ -282,7 +292,7 @@ PAPERLESS_FILENAME_DATE_ORDER= Use this setting to enable checking the document filename for date information. The date order can be set to any option as specified in https://dateparser.readthedocs.io/en/latest/settings.html#date-order. - The filename will be checked first, and if nothing is found, the document + The filename will be checked first, and if nothing is found, the document text will be checked as normal. Defaults to none, which disables this feature. 
diff --git a/paperless.conf.example b/paperless.conf.example index 4749151e7..4962c1567 100644 --- a/paperless.conf.example +++ b/paperless.conf.example @@ -29,6 +29,7 @@ #PAPERLESS_CORS_ALLOWED_HOSTS=localhost:8080,example.com,localhost:8000 #PAPERLESS_FORCE_SCRIPT_NAME= #PAPERLESS_STATIC_URL=/static/ +#PAPERLESS_AUTO_LOGIN_USERNAME= # Software tweaks diff --git a/src/paperless/auth.py b/src/paperless/auth.py index 83279ef36..faf3104bc 100644 --- a/src/paperless/auth.py +++ b/src/paperless/auth.py @@ -1,8 +1,18 @@ from django.conf import settings from django.contrib.auth.models import User +from django.utils.deprecation import MiddlewareMixin from rest_framework import authentication +class AutoLoginMiddleware(MiddlewareMixin): + + def process_request(self, request): + try: + request.user = User.objects.get(username=settings.AUTO_LOGIN_USERNAME) + except User.DoesNotExist: + pass + + class AngularApiAuthenticationOverride(authentication.BaseAuthentication): """ This class is here to provide authentication to the angular dev server during development. This is disabled in production. diff --git a/src/paperless/settings.py b/src/paperless/settings.py index 0d64efa57..1432dc5ec 100644 --- a/src/paperless/settings.py +++ b/src/paperless/settings.py @@ -144,6 +144,15 @@ TEMPLATES = [ # Security # ############################################################################### +AUTO_LOGIN_USERNAME = os.getenv("PAPERLESS_AUTO_LOGIN_USERNAME") + +if AUTO_LOGIN_USERNAME: + _index = MIDDLEWARE.index('django.contrib.auth.middleware.AuthenticationMiddleware') + # This overrides everything the auth middleware is doing but still allows + # regular login in case the provided user does not exist. 
+ MIDDLEWARE.insert(_index+1, 'paperless.auth.AutoLoginMiddleware') + + if DEBUG: X_FRAME_OPTIONS = '' # this should really be 'allow-from uri' but its not supported in any mayor From cd6e7d9563762d5fcdf5eb39a7e8cba58015937d Mon Sep 17 00:00:00 2001 From: Jonas Winkler Date: Mon, 23 Nov 2020 23:39:42 +0100 Subject: [PATCH 04/36] fixed a typo in one of the components --- src-ui/src/app/app.module.ts | 4 ++-- .../document-card-large.component.html | 2 +- .../result-highlight.component.html} | 0 .../result-highlight.component.scss} | 0 .../result-highlight.component.spec.ts} | 12 ++++++------ .../result-highlight.component.ts} | 8 ++++---- 6 files changed, 13 insertions(+), 13 deletions(-) rename src-ui/src/app/components/search/{result-hightlight/result-hightlight.component.html => result-highlight/result-highlight.component.html} (100%) rename src-ui/src/app/components/search/{result-hightlight/result-hightlight.component.scss => result-highlight/result-highlight.component.scss} (100%) rename src-ui/src/app/components/search/{result-hightlight/result-hightlight.component.spec.ts => result-highlight/result-highlight.component.spec.ts} (51%) rename src-ui/src/app/components/search/{result-hightlight/result-hightlight.component.ts => result-highlight/result-highlight.component.ts} (54%) diff --git a/src-ui/src/app/app.module.ts b/src-ui/src/app/app.module.ts index 3ccb1c5f1..7f2e8414e 100644 --- a/src-ui/src/app/app.module.ts +++ b/src-ui/src/app/app.module.ts @@ -23,7 +23,7 @@ import { TagEditDialogComponent } from './components/manage/tag-list/tag-edit-di import { DocumentTypeEditDialogComponent } from './components/manage/document-type-list/document-type-edit-dialog/document-type-edit-dialog.component'; import { TagComponent } from './components/common/tag/tag.component'; import { SearchComponent } from './components/search/search.component'; -import { ResultHightlightComponent } from './components/search/result-hightlight/result-hightlight.component'; +import { 
ResultHighlightComponent } from './components/search/result-highlight/result-highlight.component'; import { PageHeaderComponent } from './components/common/page-header/page-header.component'; import { AppFrameComponent } from './components/app-frame/app-frame.component'; import { ToastsComponent } from './components/common/toasts/toasts.component'; @@ -65,7 +65,7 @@ import { WidgetFrameComponent } from './components/dashboard/widgets/widget-fram DocumentTypeEditDialogComponent, TagComponent, SearchComponent, - ResultHightlightComponent, + ResultHighlightComponent, PageHeaderComponent, AppFrameComponent, ToastsComponent, diff --git a/src-ui/src/app/components/document-list/document-card-large/document-card-large.component.html b/src-ui/src/app/components/document-list/document-card-large/document-card-large.component.html index c61a6b069..305cb37d2 100644 --- a/src-ui/src/app/components/document-list/document-card-large/document-card-large.component.html +++ b/src-ui/src/app/components/document-list/document-card-large/document-card-large.component.html @@ -11,7 +11,7 @@
#{{document.archive_serial_number}}

- + {{getDetailsAsString()}}

diff --git a/src-ui/src/app/components/search/result-hightlight/result-hightlight.component.html b/src-ui/src/app/components/search/result-highlight/result-highlight.component.html similarity index 100% rename from src-ui/src/app/components/search/result-hightlight/result-hightlight.component.html rename to src-ui/src/app/components/search/result-highlight/result-highlight.component.html diff --git a/src-ui/src/app/components/search/result-hightlight/result-hightlight.component.scss b/src-ui/src/app/components/search/result-highlight/result-highlight.component.scss similarity index 100% rename from src-ui/src/app/components/search/result-hightlight/result-hightlight.component.scss rename to src-ui/src/app/components/search/result-highlight/result-highlight.component.scss diff --git a/src-ui/src/app/components/search/result-hightlight/result-hightlight.component.spec.ts b/src-ui/src/app/components/search/result-highlight/result-highlight.component.spec.ts similarity index 51% rename from src-ui/src/app/components/search/result-hightlight/result-hightlight.component.spec.ts rename to src-ui/src/app/components/search/result-highlight/result-highlight.component.spec.ts index e9da5d314..8e00a9d0b 100644 --- a/src-ui/src/app/components/search/result-hightlight/result-hightlight.component.spec.ts +++ b/src-ui/src/app/components/search/result-highlight/result-highlight.component.spec.ts @@ -1,20 +1,20 @@ import { ComponentFixture, TestBed } from '@angular/core/testing'; -import { ResultHightlightComponent } from './result-hightlight.component'; +import { ResultHighlightComponent } from './result-highlight.component'; -describe('ResultHightlightComponent', () => { - let component: ResultHightlightComponent; - let fixture: ComponentFixture<ResultHightlightComponent>; +describe('ResultHighlightComponent', () => { + let component: ResultHighlightComponent; + let fixture: ComponentFixture<ResultHighlightComponent>; beforeEach(async () => { await TestBed.configureTestingModule({ - declarations: [ ResultHightlightComponent ] +
declarations: [ ResultHighlightComponent ] }) .compileComponents(); }); beforeEach(() => { - fixture = TestBed.createComponent(ResultHightlightComponent); + fixture = TestBed.createComponent(ResultHighlightComponent); component = fixture.componentInstance; fixture.detectChanges(); }); diff --git a/src-ui/src/app/components/search/result-hightlight/result-hightlight.component.ts b/src-ui/src/app/components/search/result-highlight/result-highlight.component.ts similarity index 54% rename from src-ui/src/app/components/search/result-hightlight/result-hightlight.component.ts rename to src-ui/src/app/components/search/result-highlight/result-highlight.component.ts index cd37448e0..d9a1a50b1 100644 --- a/src-ui/src/app/components/search/result-hightlight/result-hightlight.component.ts +++ b/src-ui/src/app/components/search/result-highlight/result-highlight.component.ts @@ -2,11 +2,11 @@ import { Component, Input, OnInit } from '@angular/core'; import { SearchHitHighlight } from 'src/app/data/search-result'; @Component({ - selector: 'app-result-hightlight', - templateUrl: './result-hightlight.component.html', - styleUrls: ['./result-hightlight.component.scss'] + selector: 'app-result-highlight', + templateUrl: './result-highlight.component.html', + styleUrls: ['./result-highlight.component.scss'] }) -export class ResultHightlightComponent implements OnInit { +export class ResultHighlightComponent implements OnInit { constructor() { } From dd833643268e718c1c8e86708f7da8bc2513bcee Mon Sep 17 00:00:00 2001 From: Jonas Winkler Date: Wed, 25 Nov 2020 10:52:38 +0100 Subject: [PATCH 05/36] default language check --- src/documents/__init__.py | 3 ++- src/paperless_tesseract/__init__.py | 2 ++ src/paperless_tesseract/checks.py | 24 ++++++++++++++++++++++++ 3 files changed, 28 insertions(+), 1 deletion(-) create mode 100644 src/paperless_tesseract/checks.py diff --git a/src/documents/__init__.py b/src/documents/__init__.py index 864b5f5fe..5c9f358c3 100644 --- 
a/src/documents/__init__.py +++ b/src/documents/__init__.py @@ -1 +1,2 @@ -from .checks import changed_password_check +# this is here so that django finds the checks. +from .checks import * diff --git a/src/paperless_tesseract/__init__.py b/src/paperless_tesseract/__init__.py index e69de29bb..5c9f358c3 100644 --- a/src/paperless_tesseract/__init__.py +++ b/src/paperless_tesseract/__init__.py @@ -0,0 +1,2 @@ +# this is here so that django finds the checks. +from .checks import * diff --git a/src/paperless_tesseract/checks.py b/src/paperless_tesseract/checks.py new file mode 100644 index 000000000..21f229e65 --- /dev/null +++ b/src/paperless_tesseract/checks.py @@ -0,0 +1,24 @@ +import subprocess + +from django.conf import settings +from django.core.checks import Error, register + + +def get_tesseract_langs(): + with subprocess.Popen(['tesseract', '--list-langs'], stdout=subprocess.PIPE) as p: + stdout, stderr = p.communicate() + + return stdout.decode().strip().split("\n")[1:] + + +@register() +def check_default_language_available(app_configs, **kwargs): + langs = get_tesseract_langs() + + if not settings.OCR_LANGUAGE in langs: + return [Error( + f"The default ocr language {settings.OCR_LANGUAGE} is " + f"not installed. Paperless cannot OCR your documents " + f"without it. 
Please fix PAPERLESS_OCR_LANGUAGE.")] + else: + return [] From 6aca09d4856b458225a0df1bb18a70fc24509664 Mon Sep 17 00:00:00 2001 From: Jonas Winkler Date: Wed, 25 Nov 2020 15:06:27 +0100 Subject: [PATCH 06/36] additional note about the automatic matching algorithm --- docs/advanced_usage.rst | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/docs/advanced_usage.rst b/docs/advanced_usage.rst index 653bee1c6..fca3ff4df 100644 --- a/docs/advanced_usage.rst +++ b/docs/advanced_usage.rst @@ -147,7 +147,9 @@ America are tagged with the tag "bofa_123" and the matching algorithm of this tag is set to *Auto*, this neural network will examine your documents and automatically learn when to assign this tag. -There are a couple caveats you need to keep in mind when using this feature: +Paperless tries to hide much of the involved complexity with this approach. +However, there are a couple of caveats you need to keep in mind when using this +feature: * Changes to your documents are not immediately reflected by the matching algorithm. The neural network needs to be *trained* on your documents after @@ -167,6 +169,11 @@ There are a couple caveats you need to keep in mind when using this feature: has the correspondent "Very obscure web shop I bought something five years ago", it will probably not assign this correspondent automatically if you buy something from them again. The more documents, the better. +* Paperless also needs a reasonable number of negative examples to decide when + not to assign a certain tag, correspondent or type. This will usually be the + case as you start filling up paperless with documents. Example: If all your + documents are from either "Webshop" or "Bank", paperless will assign one of + these correspondents to ANY new document, if both are set to automatic matching.
Hooking into the consumption process #################################### From 751c2ac54bfb69612c26acb2cd6ae66053971e7e Mon Sep 17 00:00:00 2001 From: Jonas Winkler Date: Wed, 25 Nov 2020 16:04:58 +0100 Subject: [PATCH 07/36] added a simple sanity checker. --- .../migrations/1004_sanity_check_schedule.py | 26 +++++ src/documents/sanity_checker.py | 94 +++++++++++++++++++ src/documents/tasks.py | 12 ++- 3 files changed, 131 insertions(+), 1 deletion(-) create mode 100644 src/documents/migrations/1004_sanity_check_schedule.py create mode 100644 src/documents/sanity_checker.py diff --git a/src/documents/migrations/1004_sanity_check_schedule.py b/src/documents/migrations/1004_sanity_check_schedule.py new file mode 100644 index 000000000..b6346d479 --- /dev/null +++ b/src/documents/migrations/1004_sanity_check_schedule.py @@ -0,0 +1,26 @@ +# Generated by Django 3.1.3 on 2020-11-25 14:53 + +from django.db import migrations +from django.db.migrations import RunPython +from django_q.models import Schedule +from django_q.tasks import schedule + + +def add_schedules(apps, schema_editor): + schedule('documents.tasks.sanity_check', name="Perform sanity check", schedule_type=Schedule.WEEKLY) + + +def remove_schedules(apps, schema_editor): + Schedule.objects.filter(func='documents.tasks.sanity_check').delete() + + +class Migration(migrations.Migration): + + dependencies = [ + ('documents', '1003_mime_types'), + ('django_q', '0013_task_attempt_count'), + ] + + operations = [ + RunPython(add_schedules, remove_schedules) + ] diff --git a/src/documents/sanity_checker.py b/src/documents/sanity_checker.py new file mode 100644 index 000000000..18bb3781c --- /dev/null +++ b/src/documents/sanity_checker.py @@ -0,0 +1,94 @@ +import hashlib +import os + +from django.conf import settings + +from documents.models import Document + + +class SanityMessage: + message = None + + +class SanityWarning(SanityMessage): + def __init__(self, message): + self.message = message + + def __str__(self): + 
return f"Warning: {self.message}" + + +class SanityError(SanityMessage): + def __init__(self, message): + self.message = message + + def __str__(self): + return f"ERROR: {self.message}" + + +class SanityFailedError(Exception): + + def __init__(self, messages): + self.messages = messages + + def __str__(self): + message_string = "\n".join([str(m) for m in self.messages]) + return ( + f"The following issues were found by the sanity checker:\n" + f"{message_string}\n\n===============\n\n") + + +def check_sanity(): + messages = [] + + present_files = [] + for root, subdirs, files in os.walk(settings.MEDIA_ROOT): + for f in files: + present_files.append(os.path.normpath(os.path.join(root, f))) + + for doc in Document.objects.all(): + # Check thumbnail + if not os.path.isfile(doc.thumbnail_path): + messages.append(SanityError( + f"Thumbnail of document {doc.pk} does not exist.")) + else: + present_files.remove(os.path.normpath(doc.thumbnail_path)) + try: + with doc.thumbnail_file as f: + f.read() + except OSError as e: + messages.append(SanityError( + f"Cannot read thumbnail file of document {doc.pk}: {e}" + )) + + # Check document + if not os.path.isfile(doc.source_path): + messages.append(SanityError( + f"Original of document {doc.pk} does not exist.")) + else: + present_files.remove(os.path.normpath(doc.source_path)) + checksum = None + try: + with doc.source_file as f: + checksum = hashlib.md5(f.read()).hexdigest() + except OSError as e: + messages.append(SanityError( + f"Cannot read original file of document {doc.pk}: {e}")) + + if checksum and not checksum == doc.checksum: + messages.append(SanityError( + f"Checksum mismatch of document {doc.pk}. " + f"Stored: {doc.checksum}, actual: {checksum}." + )) + + if not doc.content: + messages.append(SanityWarning( + f"Document {doc.pk} has no content."
+ )) + + for extra_file in present_files: + messages.append(SanityWarning( + f"Orphaned file in media dir: {extra_file}" + )) + + return messages diff --git a/src/documents/tasks.py b/src/documents/tasks.py index 40ed8f25e..3c9baad08 100644 --- a/src/documents/tasks.py +++ b/src/documents/tasks.py @@ -3,11 +3,12 @@ import logging from django.conf import settings from whoosh.writing import AsyncWriter -from documents import index +from documents import index, sanity_checker from documents.classifier import DocumentClassifier, \ IncompatibleClassifierVersionError from documents.consumer import Consumer, ConsumerError from documents.models import Document +from documents.sanity_checker import SanityFailedError def index_optimize(): @@ -74,3 +75,12 @@ def consume_file(path, else: raise ConsumerError("Unknown error: Returned document was null, but " "no error message was given.") + + +def sanity_check(): + messages = sanity_checker.check_sanity() + + if len(messages) > 0: + raise SanityFailedError(messages) + else: + return "No issues detected." 
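Reviewer note on the patch above: the checksum comparison at the core of `check_sanity()` can be tried in isolation. The following is only a minimal sketch and is not part of the patch. `md5_of_file`, `verify_checksum` and `demo` are hypothetical names chosen for illustration; the only thing taken from the diff is the use of `hashlib.md5` over the full file contents and the order of the checks (existence, readability, checksum).

```python
import hashlib
import os
import tempfile


def md5_of_file(path):
    """Return the hex MD5 digest of the file at path."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


def verify_checksum(path, stored_checksum):
    """Hypothetical helper: return a list of problems, empty if the file is fine."""
    if not os.path.isfile(path):
        return [f"File {path} does not exist."]
    try:
        actual = md5_of_file(path)
    except OSError as e:
        return [f"Cannot read file {path}: {e}"]
    if actual != stored_checksum:
        return [f"Checksum mismatch. Stored: {stored_checksum}, actual: {actual}."]
    return []


def demo():
    """Check a temporary file against a correct and a wrong checksum."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "doc.pdf")
        with open(path, "wb") as f:
            f.write(b"hello")
        good = verify_checksum(path, md5_of_file(path))
        bad = verify_checksum(path, "0" * 32)
        missing = verify_checksum(os.path.join(tmp, "gone.pdf"), "0" * 32)
    return good, bad, missing
```

Collecting problems in a list rather than raising on the first one mirrors the patch, where the task raises `SanityFailedError` only after every document has been examined.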
From d92214d41204c5a545f0926f8b4123019434611d Mon Sep 17 00:00:00 2001 From: Jonas Winkler Date: Wed, 25 Nov 2020 16:05:52 +0100 Subject: [PATCH 08/36] codestyle --- src/paperless/auth.py | 3 ++- src/paperless_tesseract/checks.py | 5 +++-- 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/src/paperless/auth.py b/src/paperless/auth.py index faf3104bc..ece5d0eba 100644 --- a/src/paperless/auth.py +++ b/src/paperless/auth.py @@ -8,7 +8,8 @@ class AutoLoginMiddleware(MiddlewareMixin): def process_request(self, request): try: - request.user = User.objects.get(username=settings.AUTO_LOGIN_USERNAME) + request.user = User.objects.get( + username=settings.AUTO_LOGIN_USERNAME) except User.DoesNotExist: pass diff --git a/src/paperless_tesseract/checks.py b/src/paperless_tesseract/checks.py index 21f229e65..8a06d7b00 100644 --- a/src/paperless_tesseract/checks.py +++ b/src/paperless_tesseract/checks.py @@ -5,7 +5,8 @@ from django.core.checks import Error, register def get_tesseract_langs(): - with subprocess.Popen(['tesseract', '--list-langs'], stdout=subprocess.PIPE) as p: + with subprocess.Popen(['tesseract', '--list-langs'], + stdout=subprocess.PIPE) as p: stdout, stderr = p.communicate() return stdout.decode().strip().split("\n")[1:] @@ -15,7 +16,7 @@ def get_tesseract_langs(): def check_default_language_available(app_configs, **kwargs): langs = get_tesseract_langs() - if not settings.OCR_LANGUAGE in langs: + if settings.OCR_LANGUAGE not in langs: return [Error( f"The default ocr language {settings.OCR_LANGUAGE} is " f"not installed. 
Paperless cannot OCR your documents " From 1987dccf48dbabdae6203aa796ad8886ad6d4420 Mon Sep 17 00:00:00 2001 From: Jonas Winkler Date: Wed, 25 Nov 2020 16:30:53 +0100 Subject: [PATCH 09/36] changelog --- docs/changelog.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/changelog.rst b/docs/changelog.rst index 7a1b1c374..c494cecb9 100644 --- a/docs/changelog.rst +++ b/docs/changelog.rst @@ -10,6 +10,8 @@ next * Setting ``PAPERLESS_AUTO_LOGIN_USERNAME`` replaces ``PAPERLESS_DISABLE_LOGIN``. You have to specify your username. +* Added a simple sanity checker that checks your documents for missing or orphaned files, + files with wrong checksums, inaccessible files, and documents with empty content. paperless-ng 0.9.2 From 97639508cba7f157bb7b472e7b5fec429cfa08d5 Mon Sep 17 00:00:00 2001 From: Jonas Winkler Date: Wed, 25 Nov 2020 17:16:04 +0100 Subject: [PATCH 10/36] Basic support for metadata added. Fixes 404 for actions on invalid document ids. --- src/documents/serialisers.py | 3 --- src/documents/views.py | 24 ++++++++++++++++++------ 2 files changed, 18 insertions(+), 9 deletions(-) diff --git a/src/documents/serialisers.py b/src/documents/serialisers.py index e0ad73a23..c86aa8c83 100644 --- a/src/documents/serialisers.py +++ b/src/documents/serialisers.py @@ -93,14 +93,11 @@ class DocumentSerializer(serializers.ModelSerializer): "document_type_id", "title", "content", - "mime_type", "tags", "tags_id", - "checksum", "created", "modified", "added", - "file_name", "archive_serial_number" ) diff --git a/src/documents/views.py b/src/documents/views.py index 14323e933..287fb114c 100755 --- a/src/documents/views.py +++ b/src/documents/views.py @@ -149,13 +149,25 @@ class DocumentViewSet(RetrieveModelMixin, else: return HttpResponseBadRequest(str(form.errors)) + @action(methods=['get'], detail=True) + def metadata(self, request, pk=None): + try: + doc = Document.objects.get(pk=pk) + return Response({ + "paperless__checksum": doc.checksum, + 
"paperless__mime_type": doc.mime_type, + "paperless__filename": doc.filename, + }) + except Document.DoesNotExist: + raise Http404() + @action(methods=['get'], detail=True) def preview(self, request, pk=None): try: response = self.file_response(pk, "inline") return response - except FileNotFoundError: - raise Http404("Document source file does not exist") + except (FileNotFoundError, Document.DoesNotExist): + raise Http404() @action(methods=['get'], detail=True) @cache_control(public=False, max_age=315360000) @@ -163,15 +175,15 @@ class DocumentViewSet(RetrieveModelMixin, try: return HttpResponse(Document.objects.get(id=pk).thumbnail_file, content_type='image/png') - except FileNotFoundError: - raise Http404("Document thumbnail does not exist") + except (FileNotFoundError, Document.DoesNotExist): + raise Http404() @action(methods=['get'], detail=True) def download(self, request, pk=None): try: return self.file_response(pk, "attachment") - except FileNotFoundError: - raise Http404("Document source file does not exist") + except (FileNotFoundError, Document.DoesNotExist): + raise Http404() class LogViewSet(ReadOnlyModelViewSet): From 3b38ac0f9bbcf54f98efae456ce08c07afa8a4f1 Mon Sep 17 00:00:00 2001 From: Jonas Winkler Date: Wed, 25 Nov 2020 20:22:56 +0100 Subject: [PATCH 11/36] Removed ability to encrypt documents. --- docs/administration.rst | 24 ++-------- ...e_storage_type.py => decrypt_documents.py} | 44 +------------------ 2 files changed, 5 insertions(+), 63 deletions(-) rename src/documents/management/commands/{change_storage_type.py => decrypt_documents.py} (59%) diff --git a/docs/administration.rst b/docs/administration.rst index c582e83a0..610d2c9d3 100644 --- a/docs/administration.rst +++ b/docs/administration.rst @@ -353,39 +353,23 @@ Documents can be stored in Paperless using GnuPG encryption. Consider running paperless on an encrypted filesystem instead, which will then at least provide security against physical hardware theft. -.. 
code:: - - change_storage_type [--passphrase PASSPHRASE] {gpg,unencrypted} {gpg,unencrypted} - - positional arguments: - {gpg,unencrypted} The state you want to change your documents from - {gpg,unencrypted} The state you want to change your documents to - - optional arguments: - --passphrase PASSPHRASE Enabling encryption ------------------- -Basic usage to enable encryption of your document store (**USE A MORE SECURE PASSPHRASE**): - -(Note: If ``PAPERLESS_PASSPHRASE`` isn't set already, you need to specify it here) - -.. code:: - - change_storage_type [--passphrase SECR3TP4SSPHRA$E] unencrypted gpg +Enabling encryption is no longer supported. Disabling encryption -------------------- -Basic usage to enable encryption of your document store: +Basic usage to disable encryption of your document store: -(Note: Again, if ``PAPERLESS_PASSPHRASE`` isn't set already, you need to specify it here) +(Note: If ``PAPERLESS_PASSPHRASE`` isn't set already, you need to specify it here) .. code:: - change_storage_type [--passphrase SECR3TP4SSPHRA$E] gpg unencrypted + decrypt_documents [--passphrase SECR3TP4SSPHRA$E] .. 
_Pipenv: https://pipenv.pypa.io/en/latest/ \ No newline at end of file diff --git a/src/documents/management/commands/change_storage_type.py b/src/documents/management/commands/decrypt_documents.py similarity index 59% rename from src/documents/management/commands/change_storage_type.py rename to src/documents/management/commands/decrypt_documents.py index 344d3388d..e4d607fa8 100644 --- a/src/documents/management/commands/change_storage_type.py +++ b/src/documents/management/commands/decrypt_documents.py @@ -17,16 +17,6 @@ class Command(BaseCommand): def add_arguments(self, parser): - parser.add_argument( - "from", - choices=("gpg", "unencrypted"), - help="The state you want to change your documents from" - ) - parser.add_argument( - "to", - choices=("gpg", "unencrypted"), - help="The state you want to change your documents to" - ) parser.add_argument( "--passphrase", help="If PAPERLESS_PASSPHRASE isn't set already, you need to " @@ -50,11 +40,6 @@ class Command(BaseCommand): except KeyboardInterrupt: return - if options["from"] == options["to"]: - raise CommandError( - 'The "from" and "to" values can\'t be the same.' - ) - passphrase = options["passphrase"] or settings.PASSPHRASE if not passphrase: raise CommandError( @@ -62,10 +47,7 @@ class Command(BaseCommand): "by declaring it in your environment or your config." 
) - if options["from"] == "gpg" and options["to"] == "unencrypted": - self.__gpg_to_unencrypted(passphrase) - elif options["from"] == "unencrypted" and options["to"] == "gpg": - self.__unencrypted_to_gpg(passphrase) + self.__gpg_to_unencrypted(passphrase) @staticmethod def __gpg_to_unencrypted(passphrase): @@ -94,27 +76,3 @@ class Command(BaseCommand): for path in old_paths: os.unlink(path) - - @staticmethod - def __unencrypted_to_gpg(passphrase): - - unencrypted_files = Document.objects.filter( - storage_type=Document.STORAGE_TYPE_UNENCRYPTED) - - for document in unencrypted_files: - - print(coloured("Encrypting {}".format(document), "green")) - - old_paths = [document.source_path, document.thumbnail_path] - with open(document.source_path, "rb") as raw_document: - with open(document.thumbnail_path, "rb") as raw_thumb: - document.storage_type = Document.STORAGE_TYPE_GPG - with open(document.source_path, "wb") as f: - f.write(GnuPG.encrypted(raw_document, passphrase)) - with open(document.thumbnail_path, "wb") as f: - f.write(GnuPG.encrypted(raw_thumb, passphrase)) - - document.save(update_fields=("storage_type",)) - - for path in old_paths: - os.unlink(path) From 2163015d06dc4c50e8dc5aa35fef091d369788a6 Mon Sep 17 00:00:00 2001 From: Jonas Winkler Date: Wed, 25 Nov 2020 20:26:07 +0100 Subject: [PATCH 12/36] changelog --- docs/changelog.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/changelog.rst b/docs/changelog.rst index c494cecb9..e2df92863 100644 --- a/docs/changelog.rst +++ b/docs/changelog.rst @@ -12,7 +12,8 @@ next You have to specify your username. * Added a simple sanity checker that checks your documents for missing or orphaned files, files with wrong checksums, inaccessible files, and documents with empty content. - +* It is no longer possible to encrypt your documents. For the time being, paperless will + continue to operate with already encrypted documents. 
paperless-ng 0.9.2 ################## From ef15de18a97acdffd97c52a9fde3079fd7e0c7e0 Mon Sep 17 00:00:00 2001 From: Jonas Winkler Date: Wed, 25 Nov 2020 21:03:06 +0100 Subject: [PATCH 13/36] Paperless will continue to operate with encrypted files, however, all new files will be stored unencrypted. --- src/documents/consumer.py | 11 ++------- .../management/commands/document_importer.py | 23 +++---------------- 2 files changed, 5 insertions(+), 29 deletions(-) diff --git a/src/documents/consumer.py b/src/documents/consumer.py index 65febc937..8fed01c30 100755 --- a/src/documents/consumer.py +++ b/src/documents/consumer.py @@ -208,10 +208,7 @@ class Consumer(LoggingMixin): created = file_info.created or date or timezone.make_aware( datetime.datetime.fromtimestamp(stats.st_mtime)) - if settings.PASSPHRASE: - storage_type = Document.STORAGE_TYPE_GPG - else: - storage_type = Document.STORAGE_TYPE_UNENCRYPTED + storage_type = Document.STORAGE_TYPE_UNENCRYPTED with open(self.path, "rb") as f: document = Document.objects.create( @@ -260,8 +257,4 @@ class Consumer(LoggingMixin): def _write(self, document, source, target): with open(source, "rb") as read_file: with open(target, "wb") as write_file: - if document.storage_type == Document.STORAGE_TYPE_UNENCRYPTED: - write_file.write(read_file.read()) - return - self.log("debug", "Encrypting") - write_file.write(GnuPG.encrypted(read_file)) + write_file.write(read_file.read()) diff --git a/src/documents/management/commands/document_importer.py b/src/documents/management/commands/document_importer.py index 208a0ef37..5f50f08f6 100644 --- a/src/documents/management/commands/document_importer.py +++ b/src/documents/management/commands/document_importer.py @@ -82,8 +82,6 @@ class Command(Renderable, BaseCommand): def _import_files_from_manifest(self): storage_type = Document.STORAGE_TYPE_UNENCRYPTED - if settings.PASSPHRASE: - storage_type = Document.STORAGE_TYPE_GPG for record in self.manifest: @@ -105,23 +103,8 @@ class 
Command(Renderable, BaseCommand): create_source_path_directory(document.source_path) - if settings.PASSPHRASE: - - with open(document_path, "rb") as unencrypted: - with open(document.source_path, "wb") as encrypted: - print("Encrypting {} and saving it to {}".format( - doc_file, document.source_path)) - encrypted.write(GnuPG.encrypted(unencrypted)) - - with open(thumbnail_path, "rb") as unencrypted: - with open(document.thumbnail_path, "wb") as encrypted: - print("Encrypting {} and saving it to {}".format( - thumb_file, document.thumbnail_path)) - encrypted.write(GnuPG.encrypted(unencrypted)) - - else: - print(f"Moving {document_path} to {document.source_path}") - shutil.copy(document_path, document.source_path) - shutil.copy(thumbnail_path, document.thumbnail_path) + print(f"Moving {document_path} to {document.source_path}") + shutil.copy(document_path, document.source_path) + shutil.copy(thumbnail_path, document.thumbnail_path) document.save() From 2a4fe4dceb35db00a48f8cbc11171e026f269678 Mon Sep 17 00:00:00 2001 From: Jonas Winkler Date: Wed, 25 Nov 2020 21:10:50 +0100 Subject: [PATCH 14/36] fixed the decryption code, but its still untested. 
--- .../management/commands/decrypt_documents.py | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/src/documents/management/commands/decrypt_documents.py b/src/documents/management/commands/decrypt_documents.py index e4d607fa8..f9b4edcdc 100644 --- a/src/documents/management/commands/decrypt_documents.py +++ b/src/documents/management/commands/decrypt_documents.py @@ -61,18 +61,28 @@ class Command(BaseCommand): document).encode('utf-8'), "green")) old_paths = [document.source_path, document.thumbnail_path] + raw_document = GnuPG.decrypted(document.source_file, passphrase) raw_thumb = GnuPG.decrypted(document.thumbnail_file, passphrase) document.storage_type = Document.STORAGE_TYPE_UNENCRYPTED + ext = os.path.splitext(document.filename)[1] + + if not ext == '.gpg': + raise CommandError( + f"Abort: encrypted file {document.source_path} does not " + f"end with .gpg") + + document.filename = os.path.splitext(document.filename)[0] + with open(document.source_path, "wb") as f: f.write(raw_document) with open(document.thumbnail_path, "wb") as f: f.write(raw_thumb) - document.save(update_fields=("storage_type",)) + document.save(update_fields=("storage_type", "filename")) for path in old_paths: os.unlink(path) From 30acfdd3f12c5709189b2c302ed3861497f16ba9 Mon Sep 17 00:00:00 2001 From: Jonas Winkler Date: Thu, 26 Nov 2020 14:18:10 +0100 Subject: [PATCH 15/36] tests for the classifier and fixes for edge cases with minimal data.
--- src/documents/classifier.py | 45 +++++-- src/documents/tests/test_classifier.py | 155 ++++++++++++++++++++++++- 2 files changed, 189 insertions(+), 11 deletions(-) diff --git a/src/documents/classifier.py b/src/documents/classifier.py index 6e0d6f946..b0d7d87bb 100755 --- a/src/documents/classifier.py +++ b/src/documents/classifier.py @@ -6,7 +6,8 @@ import re from sklearn.feature_extraction.text import CountVectorizer from sklearn.neural_network import MLPClassifier -from sklearn.preprocessing import MultiLabelBinarizer +from sklearn.preprocessing import MultiLabelBinarizer, LabelBinarizer +from sklearn.utils.multiclass import type_of_target from documents.models import Document, MatchingModel from paperless import settings @@ -27,7 +28,7 @@ def preprocess_content(content): class DocumentClassifier(object): - FORMAT_VERSION = 5 + FORMAT_VERSION = 6 def __init__(self): # mtime of the model file on disk. used to prevent reloading when @@ -54,6 +55,8 @@ class DocumentClassifier(object): "Cannor load classifier, incompatible versions.") else: if self.classifier_version > 0: + # Don't be confused by this check. It's simply here + # so that we wont log anything on initial reload. logger.info("Classifier updated on disk, " "reloading classifier models") self.data_hash = pickle.load(f) @@ -122,9 +125,14 @@ class DocumentClassifier(object): labels_tags_unique = set([tag for tags in labels_tags for tag in tags]) num_tags = len(labels_tags_unique) + # substract 1 since -1 (null) is also part of the classes. - num_correspondents = len(set(labels_correspondent)) - 1 - num_document_types = len(set(labels_document_type)) - 1 + + # union with {-1} accounts for cases where all documents have + # correspondents and types assigned, so -1 isnt part of labels_x, which + # it usually is. 
+ num_correspondents = len(set(labels_correspondent) | {-1}) - 1 + num_document_types = len(set(labels_document_type) | {-1}) - 1 logging.getLogger(__name__).debug( "{} documents, {} tag(s), {} correspondent(s), " @@ -145,12 +153,23 @@ class DocumentClassifier(object): ) data_vectorized = self.data_vectorizer.fit_transform(data) - self.tags_binarizer = MultiLabelBinarizer() - labels_tags_vectorized = self.tags_binarizer.fit_transform(labels_tags) - # Step 3: train the classifiers if num_tags > 0: logging.getLogger(__name__).debug("Training tags classifier...") + + if num_tags == 1: + # Special case where only one tag has auto: + # Fallback to binary classification. + labels_tags = [label[0] if len(label) == 1 else -1 + for label in labels_tags] + self.tags_binarizer = LabelBinarizer() + labels_tags_vectorized = self.tags_binarizer.fit_transform( + labels_tags).ravel() + else: + self.tags_binarizer = MultiLabelBinarizer() + labels_tags_vectorized = self.tags_binarizer.fit_transform( + labels_tags) + self.tags_classifier = MLPClassifier(tol=0.01) self.tags_classifier.fit(data_vectorized, labels_tags_vectorized) else: @@ -222,6 +241,16 @@ class DocumentClassifier(object): X = self.data_vectorizer.transform([preprocess_content(content)]) y = self.tags_classifier.predict(X) tags_ids = self.tags_binarizer.inverse_transform(y)[0] - return tags_ids + if type_of_target(y).startswith('multilabel'): + # the usual case when there are multiple tags. + return list(tags_ids) + elif type_of_target(y) == 'binary' and tags_ids != -1: + # This is for when we have binary classification with only one + # tag and the result is to assign this tag. + return [tags_ids] + else: + # Usually binary as well with -1 as the result, but we're + # going to catch everything else here as well. 
+                return []
         else:
             return []
diff --git a/src/documents/tests/test_classifier.py b/src/documents/tests/test_classifier.py
index 4ae672ac2..0f421bb32 100644
--- a/src/documents/tests/test_classifier.py
+++ b/src/documents/tests/test_classifier.py
@@ -1,8 +1,10 @@
 import tempfile
+from time import sleep
+from unittest import mock
 
 from django.test import TestCase, override_settings
 
-from documents.classifier import DocumentClassifier
+from documents.classifier import DocumentClassifier, IncompatibleClassifierVersionError
 from documents.models import Correspondent, Document, Tag, DocumentType
 
 
@@ -15,10 +17,12 @@ class TestClassifier(TestCase):
     def generate_test_data(self):
         self.c1 = Correspondent.objects.create(name="c1", matching_algorithm=Correspondent.MATCH_AUTO)
         self.c2 = Correspondent.objects.create(name="c2")
+        self.c3 = Correspondent.objects.create(name="c3", matching_algorithm=Correspondent.MATCH_AUTO)
         self.t1 = Tag.objects.create(name="t1", matching_algorithm=Tag.MATCH_AUTO, pk=12)
         self.t2 = Tag.objects.create(name="t2", matching_algorithm=Tag.MATCH_ANY, pk=34, is_inbox_tag=True)
         self.t3 = Tag.objects.create(name="t3", matching_algorithm=Tag.MATCH_AUTO, pk=45)
         self.dt = DocumentType.objects.create(name="dt", matching_algorithm=DocumentType.MATCH_AUTO)
+        self.dt2 = DocumentType.objects.create(name="dt2", matching_algorithm=DocumentType.MATCH_AUTO)
 
         self.doc1 = Document.objects.create(title="doc1", content="this is a document from c1", correspondent=self.c1, checksum="A", document_type=self.dt)
         self.doc2 = Document.objects.create(title="doc1", content="this is another document, but from c2", correspondent=self.c2, checksum="B")
@@ -59,8 +63,8 @@ class TestClassifier(TestCase):
         self.classifier.train()
         self.assertEqual(self.classifier.predict_correspondent(self.doc1.content), self.c1.pk)
         self.assertEqual(self.classifier.predict_correspondent(self.doc2.content), None)
-        self.assertTupleEqual(self.classifier.predict_tags(self.doc1.content), (self.t1.pk,))
-        self.assertTupleEqual(self.classifier.predict_tags(self.doc2.content), (self.t1.pk, self.t3.pk))
+        self.assertListEqual(self.classifier.predict_tags(self.doc1.content), [self.t1.pk])
+        self.assertListEqual(self.classifier.predict_tags(self.doc2.content), [self.t1.pk, self.t3.pk])
         self.assertEqual(self.classifier.predict_document_type(self.doc1.content), self.dt.pk)
         self.assertEqual(self.classifier.predict_document_type(self.doc2.content), None)
 
@@ -71,6 +75,42 @@ class TestClassifier(TestCase):
         self.assertTrue(self.classifier.train())
         self.assertFalse(self.classifier.train())
 
+    def testVersionIncreased(self):
+
+        self.generate_test_data()
+        self.assertTrue(self.classifier.train())
+        self.assertFalse(self.classifier.train())
+
+        classifier2 = DocumentClassifier()
+
+        current_ver = DocumentClassifier.FORMAT_VERSION
+        with mock.patch("documents.classifier.DocumentClassifier.FORMAT_VERSION", current_ver+1):
+            # assure that we won't load old classifiers.
+            self.assertRaises(IncompatibleClassifierVersionError, self.classifier.reload)
+
+            self.classifier.save_classifier()
+
+            # assure that we can load the classifier after saving it.
+            classifier2.reload()
+
+    def testReload(self):
+
+        self.generate_test_data()
+        self.assertTrue(self.classifier.train())
+        self.classifier.save_classifier()
+
+        classifier2 = DocumentClassifier()
+        classifier2.reload()
+        v1 = classifier2.classifier_version
+
+        # change the classifier after some time.
+        sleep(1)
+        self.classifier.save_classifier()
+
+        classifier2.reload()
+        v2 = classifier2.classifier_version
+        self.assertNotEqual(v1, v2)
+
     @override_settings(DATA_DIR=tempfile.mkdtemp())
     def testSaveClassifier(self):
 
@@ -83,3 +123,112 @@ class TestClassifier(TestCase):
         new_classifier = DocumentClassifier()
         new_classifier.reload()
         self.assertFalse(new_classifier.train())
+
+    def test_one_correspondent_predict(self):
+        c1 = Correspondent.objects.create(name="c1", matching_algorithm=Correspondent.MATCH_AUTO)
+        doc1 = Document.objects.create(title="doc1", content="this is a document from c1", correspondent=c1, checksum="A")
+
+        self.classifier.train()
+        self.assertEqual(self.classifier.predict_correspondent(doc1.content), c1.pk)
+
+    def test_one_correspondent_predict_manydocs(self):
+        c1 = Correspondent.objects.create(name="c1", matching_algorithm=Correspondent.MATCH_AUTO)
+        doc1 = Document.objects.create(title="doc1", content="this is a document from c1", correspondent=c1, checksum="A")
+        doc2 = Document.objects.create(title="doc2", content="this is a document from noone", checksum="B")
+
+        self.classifier.train()
+        self.assertEqual(self.classifier.predict_correspondent(doc1.content), c1.pk)
+        self.assertIsNone(self.classifier.predict_correspondent(doc2.content))
+
+    def test_one_type_predict(self):
+        dt = DocumentType.objects.create(name="dt", matching_algorithm=DocumentType.MATCH_AUTO)
+
+        doc1 = Document.objects.create(title="doc1", content="this is a document from c1",
+                                       checksum="A", document_type=dt)
+
+        self.classifier.train()
+        self.assertEqual(self.classifier.predict_document_type(doc1.content), dt.pk)
+
+    def test_one_type_predict_manydocs(self):
+        dt = DocumentType.objects.create(name="dt", matching_algorithm=DocumentType.MATCH_AUTO)
+
+        doc1 = Document.objects.create(title="doc1", content="this is a document from c1",
+                                       checksum="A", document_type=dt)
+
+        doc2 = Document.objects.create(title="doc1", content="this is a document from c2",
+                                       checksum="B")
+
+        self.classifier.train()
+        self.assertEqual(self.classifier.predict_document_type(doc1.content), dt.pk)
+        self.assertIsNone(self.classifier.predict_document_type(doc2.content))
+
+    def test_one_tag_predict(self):
+        t1 = Tag.objects.create(name="t1", matching_algorithm=Tag.MATCH_AUTO, pk=12)
+
+        doc1 = Document.objects.create(title="doc1", content="this is a document from c1", checksum="A")
+
+        doc1.tags.add(t1)
+        self.classifier.train()
+        self.assertListEqual(self.classifier.predict_tags(doc1.content), [t1.pk])
+
+    def test_one_tag_predict_unassigned(self):
+        t1 = Tag.objects.create(name="t1", matching_algorithm=Tag.MATCH_AUTO, pk=12)
+
+        doc1 = Document.objects.create(title="doc1", content="this is a document from c1", checksum="A")
+
+        self.classifier.train()
+        self.assertListEqual(self.classifier.predict_tags(doc1.content), [])
+
+    def test_two_tags_predict_singledoc(self):
+        t1 = Tag.objects.create(name="t1", matching_algorithm=Tag.MATCH_AUTO, pk=12)
+        t2 = Tag.objects.create(name="t2", matching_algorithm=Tag.MATCH_AUTO, pk=121)
+
+        doc4 = Document.objects.create(title="doc1", content="this is a document from c4", checksum="D")
+
+        doc4.tags.add(t1)
+        doc4.tags.add(t2)
+        self.classifier.train()
+        self.assertListEqual(self.classifier.predict_tags(doc4.content), [t1.pk, t2.pk])
+
+    def test_two_tags_predict(self):
+        t1 = Tag.objects.create(name="t1", matching_algorithm=Tag.MATCH_AUTO, pk=12)
+        t2 = Tag.objects.create(name="t2", matching_algorithm=Tag.MATCH_AUTO, pk=121)
+
+        doc1 = Document.objects.create(title="doc1", content="this is a document from c1", checksum="A")
+        doc2 = Document.objects.create(title="doc1", content="this is a document from c2", checksum="B")
+        doc3 = Document.objects.create(title="doc1", content="this is a document from c3", checksum="C")
+        doc4 = Document.objects.create(title="doc1", content="this is a document from c4", checksum="D")
+
+        doc1.tags.add(t1)
+        doc2.tags.add(t2)
+
+        doc4.tags.add(t1)
+        doc4.tags.add(t2)
+        self.classifier.train()
+        self.assertListEqual(self.classifier.predict_tags(doc1.content), [t1.pk])
+        self.assertListEqual(self.classifier.predict_tags(doc2.content), [t2.pk])
+        self.assertListEqual(self.classifier.predict_tags(doc3.content), [])
+        self.assertListEqual(self.classifier.predict_tags(doc4.content), [t1.pk, t2.pk])
+
+    def test_one_tag_predict_multi(self):
+        t1 = Tag.objects.create(name="t1", matching_algorithm=Tag.MATCH_AUTO, pk=12)
+
+        doc1 = Document.objects.create(title="doc1", content="this is a document from c1", checksum="A")
+        doc2 = Document.objects.create(title="doc2", content="this is a document from c2", checksum="B")
+
+        doc1.tags.add(t1)
+        doc2.tags.add(t1)
+        self.classifier.train()
+        self.assertListEqual(self.classifier.predict_tags(doc1.content), [t1.pk])
+        self.assertListEqual(self.classifier.predict_tags(doc2.content), [t1.pk])
+
+    def test_one_tag_predict_multi_2(self):
+        t1 = Tag.objects.create(name="t1", matching_algorithm=Tag.MATCH_AUTO, pk=12)
+
+        doc1 = Document.objects.create(title="doc1", content="this is a document from c1", checksum="A")
+        doc2 = Document.objects.create(title="doc2", content="this is a document from c2", checksum="B")
+
+        doc1.tags.add(t1)
+        self.classifier.train()
+        self.assertListEqual(self.classifier.predict_tags(doc1.content), [t1.pk])
+        self.assertListEqual(self.classifier.predict_tags(doc2.content), [])

From 43b473dc531d22e5de7ee62f0a98430630b64f31 Mon Sep 17 00:00:00 2001
From: Jonas Winkler
Date: Thu, 26 Nov 2020 15:55:13 +0100
Subject: [PATCH 16/36] Test cases for the API

---
 src/documents/tests/samples/simple.pdf | Bin 0 -> 22926 bytes
 src/documents/tests/samples/simple.zip | Bin 0 -> 17396 bytes
 src/documents/tests/test_api.py        |  39 +++++++++++++++++++++++++
 3 files changed, 39 insertions(+)
 create mode 100644 src/documents/tests/samples/simple.pdf
 create mode 100644 src/documents/tests/samples/simple.zip

diff --git a/src/documents/tests/samples/simple.pdf b/src/documents/tests/samples/simple.pdf
new file mode 100644 index 0000000000000000000000000000000000000000..e450de48269ce43785b8344c63e233a1794abae6 GIT binary patch literal 22926 zcmeFZ1ymeg@;^!l!6mrE;Lb3(yL)iAVQ_bM2@U~*y9EgZNJ4OTf?IHc27(3mH{|=> z-S7V7z4P{+J?EYO***+?`*z*Bb*rj-daCNvG^&!)EFe~HWSZ{c?w0P)-Fe9D05*W5 znGLd_AW#wFVCiNB;DGk10i~_&+#oJMX**Llh$IB;Xbuq;Ms{^`ftcDOdu6l44>H?6H;@?p^vKWPs#P#ZGbM{;Bcf{BGr+ELKNlvYYRNcNgtX+7@aLXT zMQN!S?3XnMOGXzd6?Y;rsx^sOB+DXSS48V9%8C_*Nre0Ge09*fJ;tB)nym>uKSOw1 z2i!o0IGFz_FtqiwM&zfZJvBhw+)rnJ_woEU1@Qha3iwk&AOMJsm!0je>e%A*b|c=( zS#^}IgXn*zCa)u*nWXP(wA20EkL1j%VyEC?M)$S`K=_(QzmCRCLHrFiECjSQtYn`iKg zf}%nOaWK%_&+Ku&A#j>Q@-?@j>#2p9dZv4QKhun z=@em(Dge&env$D{x9Q_-*cI_>U>>Rgrg4#rb67eijW{P8;mu->2nuC92$yD~)|^om zof)g{JNi%po%qS2uXL^$$;LVc720v6ksjPB{pbm!yHQ(d{s&oogF>puBi3^YH8K8~ ztf=^&Z>QNYfr%PP%}Ba_X=avrD9bVAkH*pka_wzWhja;v5}TSXTYZnCH!OGA` z3&Wr_z7-7B5)oa9ALHmvT?5AkgZZZC23wcJ>T-OElbRVKU0r;Baq_`7Pq-kT3Z{JZ znzD>tk*w~s6hTUEMXXn7y`Gwr?fjkxs;nIJ_~l-gs<>$-h<Ro53Nw-;(BpU?f_z^C3`oT3wrR`6@gqyKrECgMXzc67xJHqs zAT-Dx8^>$LdmKT)E37b`Q9HMosc9RLS$SU}H%%K8sPn~!;@wJl8+r3Ni~|WNKE`!R z=<|F3*4*42fu`oqj{85Inim%J8su5@xg8h26nNi<@6U2i3+$78s?@4}*VfUtWWkk6 zeAe{ldtn!>Qb4X=udArd3&K0rj2b-D!=Pmdh8w@li!_F%!*}lAmIHJV5!u@`n11Hu z#F}Fagcv7kuQ4S`TnuB$$LG*7l+ct^QSXK;nPa<~;%{0m9&|Yq?448ky<0xS-kd^R z=`@)^j=-TZt1p0$iI!&iVt%=D96Ou<>au%fn$^mpvHOmuK3obBkAk|UuHVvh2G0bh zVd#_TTdGX6Jv;Kv{=TJ2sA4{=8zx zVGa>A?xEGeV@B7Swbkd+ z`Yz5K(Xo}_Tt4>o8W?%ftQ37A^?FYCHR{9eQ0jBvmGBcLV z7USBIYAT_SguJkPyK>eTf=DHgI?IA7lk=OMias-*WM_{oKsStX;f1tbxPT*rG)H@JdR-qiMbg%YftI!VPiy zZR^}EJtn@&S8k+jFr_tRn+KzvT{naNdgcjyzj@^-Cw4W{vM74GX3aG zA8%4J&>|DQ4h1z-uCB}oY#P?Uy|GYrt+1K%w)kn}x`2wFUsXPfXf!%W(eEb!UUN;={~aG}&ptOzqXF$UaFB(W2)RzJSXYod?!X>MwuK0cJ@kv?_Z)Wq0~* zGOg&X#OHioX*4tz8_S6BMI3fc-aPx9SV>#!LJ6SP0Y&o|8J_vzMoHtuuMdn&y(V1R zK3q=dZ`GNZv1=&=LdVU94vAbHoVU;;EGI@!=NH-SOr^m`dwB(Y2hn07Nsh@#q^8!b zog){pP5B33E|Gl)J?KO2>I$`2g4eMdGHjsV|+;o9-(THn7OA24?NE{GgWdW$|4i-A%#Om9y@vU~Gu zf_#`FM|CtNfv^t=Vv#jFC!namky9zp<6{Wl8^lNw%}gptv8L=)vGr7JU$w5d0xfO@ 
zO`Hb6y3uS@5GCb|O^vME)$Um$SdSk5l-cGS^vgLtmnCt;I?6gFaT1e^Kycs3X~0)8 z#@Ld>x3EadXZ=fh*Sy_b)t4{;X-ds?7e@fOpdJ~0__})@Tj!i~EmyhR zMIaQ*Gq&r}C!;53!hbq4PU6b(^$S5J$HvCwPj~NadHT7-)7`vvRWj>x(94OQT=S)QiT2GGZDghdV$l(WmRmJFIsV5<7Q&=*@b_>z0*3@5vvn##f4+iAtctFB4n-0 zwal!;jo%)3jY*cxR)?YS9BGm&4jLFzMgE%Zds9|GHgwt$G;dYa(PPb(`E&Rb*J(?S z_{*t4;H1me92Saibz9)2`y4aeaEOjeuoRE9t$Nj#&&W$5r|$}8Eg86;nv zY>xf(Dh_F-t`;Xnc;xxNV!5UqHMfq0Mn~fae3Tz`4iS{D8W|NQbS!2j1 zFH<*9e-3L`+3Q8VSR14DPu+Z%TC5kTag`HZQN$w}xA&Ek)xR!ydk{s_4Go>SZMbzn zL_!NQZ`ynqXsi}XRqLZv8&^~H(aUdJUdVX!Wb3r=2iHsE=MP+Ky3f0aysXerYl3sR z5~I>gd=d9wF?6mJ6Nf#spfYIGs^W}}v(3s@?XPuWV*1IXJ)gFtnL^COB6#`zXTs8I zjVGs@;mP!J+c-_!s)%4fjqBG1$)mOSoCb_f^J1>2>yVNogSmjmRHb3NgQCNlWix+| z$sg8^D~-jq)%N)JT%m5PZNtW+B61^Nnib_);Fa7z$&cGqY6z0urs1<5oo6tjMwHBh zLT5Uxy+ebokmIfyM`}Yfy!<3ZTCpOuLq1}?{DPAe)JsR}5mWMY)a^u$BWv&snUh|Z z@w5RenTyjt7*A*MW61mAPy2v&IL4vgK6m^sl*=XlPajm@ruzpjPB@b8&6E8!ZhOcJ zGVt6uW;rN|yGwpNKIh=k;PasC_;_3KzGd&CgJ!hc@&E-d)yGrgbLMkskg>WWifmaok5-jbv%Y8R!_ZR!a*c(d+@u|ReL8tA^_wR4s z(=t^yBG?}G5mo>c1}UuHA&WLhsuxu1Sd>%h>@wQo;x$#s>+9K^UCE#8x!0)JX4ePV z1sCD*+67yq^suZo1ogxA!0I2<=;p7$hP*h#OSg2PPx(e#C$Lc>?`kEZI6BfLi$SJO zjJt)G59v2|fqDU8FJ>>^UfrMHs7BPnfo3i=cG%(VWP5TAq^)XV21 z>6;rtTl(aT+79zB=gbYc&^^nVu<_A&2Xe&RJh8r#PMxAtj2=F z)%fPs)dVAg;B8O)`^^>5hk(N#67s$PyzgN`w1&2-_?TSfoYwM!!g0AwmnPyNNUxEU zGdjQ_KG?fTY-8g)N^Je5hwqTkvrHD?oUyNz02zSybtl5ozu44-iuMSpv>)lG6f4(H zCxLhD77fEc*}vi;X4N!6E_&~1A$gs;Yve||em_#RSFR3h6Yjd>=9CVaFK(>8{5wA! 
zKjaDD@8=nN@71b1-k<$AYee6E82M|*{myK$S7Dr2yYe_m6F!X?Kf@arw%S@Lck%O4NiFh)aHP0`uSyE} zzVqrj;R7Q2We;?xXxoRme}!`>0F&jz_Z z+4Z{~oZjNYI|?`TY}u5vk?2_o@$&Ar9*%ca``_lrUe4KB*;HE_UKP0{o(jKXVAguS zm??JaKRl`d!4R~I&*nvK@8{E#-!86{)6FQCt^CTf1*~W8O+HjJB&3v1?$@@eqAvj# zm4QzUQCjSS%UFWQaZ+DoVm5ZeGN8b*u$csVpm6H0!J?!S61UqK9D1U)Ta4)gZU=`a zU77grJahcUgf4TnJ1nQvW4@gP6LXRM6#^`!0#*5iP7e2R2vXlZDxQ&NV}PPF(@o>dC__(T55|`~t+14O#brDBA9x;pGlC zIt@R7J(&;skAWXW9<%BL2Mzt0YUGt3VXHfjly1aZ{T4%F3{r4IP9N?n$87sn305g* z7{!MgE!4V!RL?bXL?rn!f&C2#-is$IQOC-DS+C0ASC!-G1LXpdJ-XDFi0=5hTEAkP zv}<{5TvjT1XZ;GuxUBxJsX~b71ikv8*)L#9xQV1jh`h9Y+0p zbeVCkVQ>QJ4%^~<31eR8ncCGnt2xE^h0nnDQmmJWO-5eB|}CM)u=DRO#rwrrNJlnQ3GDQG*v`~ES$4E(xmH{pAOhk;6c~hK@@-&o1!BJ87^L2x~HgP5i7gB1bZi0mJ$O00#VQur~ zfSb@JH+r2cP*wTMj95A|U)|XFO)E;A=)!tYHa4t`*Tv&{yordhX1yQmk|wAt5%I)d zBNU>~X;9)}m=@!^yXE2bz)a03_R(GxS~8NNg@;U2MYQs82FveyQQ|F1oD#;+u#KBA z{v3$%W5!b|uu(*W3r*+MV&dlKlDiFqC__R}Vrk6=VpQPNtRU>V_{t^UiL*!jZ!czE zq|C;eS?-P3oeq1I-=8E-Ccu4S=bMN-NSK^es(qNGzG-kr5pu3dAP&XG3D%?F$ly(j zMqvh1=gQC0P+_~*_I#=DUshU>3ya;z>AIE2q7Xlk#yW~gzl26iMH!1Y|86GU+i>iS* zEN(?Bh%T-N$i;cDD8GDF`|8Xd?hKb$mrDJLDU4l=mf|JCiZNuP@~W92WyFODR1NPD zn7gh0EiRuXSd+h(mvd4M@1Cr`(<~?2;(QA*I4(8L*1O#G-@OI zDRO9j@%?0Hyb(7=Rmp^duPm~ZB^O*!*FUID4S!fc!V^pU0dY|%+!^y&JGCszlWIu2 zUh3S%QDHY?`YuOPV<_fF$wP_^dJsDNE=kmsd zBmG7+@)hcp)4s7n$s>+9{)o(NZfRrZEFbt^=*&#zyT8;qZmsoD5o9%3)&*@6r^ zm%bGXWtg1My4uO;e(bc1|6!wEj?}Bpzk&3b{9y3HGWM%}Q=yo}*OM1qdkZ`CPkW*O%DR3o%(%G_(6lTi~BdHmdw zjg=M}x*YEEBctv534OShxFd$-QFhG91mK3co1$-;JgM~o967uFLhrH;@SPz6OY&P` zr_AG;t`E5xE5Na@ltOWsJ?hN4)Rhp@VN;T8q1}S4+=8iguB9Lf|4}aewQu~e6tFKN zoYkbO*60LG(^0#$>IkX*6b5X2J&C`c86sp2at_%`-{AG|Q20|F=4knSURw^mWbvvz zR#uoU3d3Gie&Go)iXj?X$DUUzXOS&AI(Y5*y;;J$Iq9&gL7}sOBHrN$kWJ!3phqwv6>k} zay@rFY5dW+v!nd=>zH+E9`7C8>Z=W)}?i- zd;v3|?}A;cuAVvPhv+fq)dJdjD$9IfM+pYV^=bGypcaT7$$t`|JF@zEJzdHWM^gX z>vwzR5s>Bk=~F$pRddetz{x{!?!=O-|ny^XIrdD z2V@1aoNqLZ<`NW#H<+*9@NUDc$#j{E-$+Iek;fj6izKqU{`ox@mtp@Y{%wRL` 
z`oPTn8N)gHpf`d!JFi=6m&Uukv#8}xTxAj~k4Gs^#v_jt-^JS980*?D>Xp7`MVN8&@n+X(`cc64x)gt&rT ztexB(p_w1(p|Ytx1Sl@{v@t1Kn?YPm-K-rQ)FCd`7RXPjAxl>PC$s^j6c=~&(r4k~ z^wXGK3qV&`N3 zf5=@KCJj4u1E9GpzZb)9nAb_X7=v{ww6&X3a{A>c;~5#g z2n27yt~OOjgO%i(C}J0VHW*W0r^uTxs{5ju(-z)2C0J=o>_}p@9tgf5AX-d-FuZBd z$qLEa`kvp9BKUf<*Y>#2XzTmNeBYN=zRMTS=K<=kG}R=?Zw1{C8;JoFb9EE7c3my8 zZ$&z9e~ND7So+~|*`ynkb$DJN7b;H4zjZf6`fy{+!Zh#lW5#l)h%;uAlVw(OGux|P z!U%MfG}p6;>mrt2Hb6XUKA=CVnn${WvMI!|g)zeQi;Cl1urBS4ZZt1(Y{YUib4+Ne z=&+%GXZio4Gx? zj)sdnJUN4+>7Q*77cx-W+T6{`Ri6jyINVUXaA#VCWXl50t0r7G({3ri} z5Ksf+(6f|uM zlz>VMozfNXTj;-DKb-#%P#M7Sw+t|d>$hzG_bK4_nVK3}hPQ`1dMZ4EZl2IT&jvSH z&xW9nAppi4O8Kb#pdfnW*CEj%h>Y(L8R<;`@cHx~yxTp0p4?+xJld@y>_5cWjyBE; zRojxW!dxwP`*nj&`C+=KzMK7wJ@?8nH1Gxu2f@4xLL~@}ZE&Mtn4iG_LC1j~#vV;p^V6MRt~AjvP#y%j1EWc96Oe>Y zI!&G5bYebw%Co3(=P z^oDtP3WF+hcj>pu2cyA_aW?fh1GoPye-mchbEiF1p~05##4< zqOdrk&rv&`-F5m?kbRNl(3I{-81)mq>n+|<9tpAaLj9Is%_mh}BEiL@G8URg2^Vu*f1zhM!SL3?Q-x zj{sZ;u<9c000INg_~H1(SQD8x(Y?hZDI5pze?`I;0O;Y6Ln-rR>4~DlzZbyKBk{wQ zh|3hX(PIxqxD;%t3cJBFhDGO3se&|+QzdX!aW$ULh@GoGcY9_NqL&{tPV}U zkB`J&H|NC_Mz-wwb`0XhU=32~DqA=Ef?6F^xvuwx%pr()-QtSU52+2+vrBv3=nFYn zkYi`}F`^*yYGnU9(iKP$O(fKEJ?+@m3o`%#*v)h-bH#Co`+)A)wRnu)f^tM<0*4$d zwT4LzM;h|1Gt5NF3GfB81@T!JTS!Ers4Rs!CNd%6POV_jY z*G^(zs9IhMBL+&oq{P7tel6WYfrTma()u;3BpxMxQY5`74#g-y9y9edQID?V^Fqvt z5Gx5c06(TSrvK~x*IBPAdJxTUPGC+DPY6J9UJ>aME#l0SD-alfONSQ~zB+klu0h0z zofyV)o!&H`22};_Ong&FQ*=`VktkZhVg6wOSs_`Gg=+mf?RQesST--t zkH! 
zuxCm4xZ#)YMJeS*r{ci_prnNj3E$2+81xuUz<>?QVaYL zGY|Pmf!4c7H0C;u;5UlzZBmZ+AnmzW*MF3}r8ZbNS4Z^IXR zySAP0VP-k}#VtK5>qMoN=XHI#jGftUe_5u$?gP zE`cthbtco~<48nIkR(=8@PPCt4Kg;(YZpn}LcEDYE9H+g{Fp+o1A1PX;edkErAKJD zu~jgKqdxVV_Go>_H3K>a@rt*oWK#=MwNbXwRAbFWW%YAv_rKPr=qT18O-+-iCikt_Z_-+guRMNQ)`rSGsX7T>uBbS&*m@FWlJdv% z9~?grtRqpAK<4ZjQ6pm8bW;P9<}`J*--7I#w+=kpWOHG@M&wJ*mSLq#OR^or zAF$tKbrJDS6qFB;%%y0jZl|ev)BdJmpc#icT(m%Kp1uVKGa5%KsZea9Ed7-o!zd0= z9)>0xOGd81{M9dpGKSO?A9;?F&`Alx{8-gK1{HeO6saMEA^aiQEg)$Kx{3@mK)z9e zU65R=UN}|Ekzb!*U*=kJT7Xg-Q>aj=P$nR)EvYT7Ei)(U8Fk3GjMjwNgy5#`=IR#X z#^ko|rtj8#M(;x@L?wjK3e!r^O47>M%G!$5iq?wViti8bhx3Q^C-P^#m%CBB(Yukp zQ8?57fv_vH+yA5Nhw*rOiE+_t{-e|j4dzsFNWm{vsds7`G!=L=uWh7+B+v72)Vs@1 zra5iUKPBqPzc1ldTPzov&YEtXmYFV_)}E%>quaySBZr539t_<=#e&4L#d5`RI)gex zI)giNIuj^jDdH(oDIyCJ3sMT=3!(}_3)0r{y6L;&x?Lk)L|~xqqw1sH#u7*)Q=w1+ zi|C6`i;#<0ix`SnikONpz=&XGFfy1MOaW#9vw*R|2w?IJzy`qvxKFo_C$=(nl^~Eq zOwo&$5gQ>HCK)anHkme=A{jRsI~ia7r8cEn($Yh-d{XoP>Hbi`w%dZcZ{bmU}2CL;lqXI5?QVB!F78xrjB zOY;u7<~yZ4Wjy8CCdGq8NeWL2kC953%9IM1O6!g7&F#(U&7%pSNumj*$ty`Od0P^- z#XijS*7Pj`^AvN9&RgBK4|esiS|m)VsiB&Qn$emWnvpXpGx0MyGkIJgTuEHX=Ww=2 zwk)xWj0+m6qjpS$Bt=xRCYPwQf8hwE+Yh-()=G}qnMMb^{R zS=Qy503jm7D_twS!X3gx!h^#75ApXLx3ssy2=Sr8p)nF+5)l${-9g>a-5K4HRIyaK zR5?_6MPWtpMd3v;;2?0y2B;q;o+chhvY7pnE;u!7g^>0oEEd?yeckD+av&VSx`Q~^7T@Ia18{ZjZ8&De58SEKE7;qa@8w40684MbH zGmtV!HP~t7|5EL|yzSF-p1sd9M^~-n&=I#|@qK$8Z$oQs_B;31=bej9^zZRo${VB5 z`(^cp!`9(;Td#kDZ>DdqZ}_dir7;;QMrvfTOqfi(OlW_4e|&!qT_RlsUFr_63P^=i z1x1Btj3xJ0a7wVoTgyA(_~(h4H=$Rfw{;70%R-Aq3wn!D3q{LG3t`KPmaP`FmV=f# zFBva7uN|*6uQ~6)Q;KVPzn^~Me(ZjGe$sxQ{SN(t{Xl*)*T1giulKGOu6(W|uE|cm z{%YO0?YfL1A5>Yl+`<2zwm!GUv)!<9xrwozu%WU8Tm$Ynt{tr%?X-74ra#g=B0MOlK(lsF~2B3!6eM2&ZK&cuv4#7tCOsgqBAn2K13m8KSV&>ilX9W!%HRESF+jz zY6DUO#`N;^n)E8~Glx|@PceNBPus_T;L zBESfij;noZzwA$r71lqb15k(ktn}WmPy)3>Ph-Y zMo5uGwY>BO{!%j6#sLm7GXfY{Dp+kO_W(XRoCch_XX#e8ySft!g>#Ze(aKS6c!t=^ zg`J$paiO&G>iIt=nXRiois~`5glfrF7IIBBSxbDB`N+56yTG%MOCT1jsi>-`rKm8X 
zI3hQq!mP)v^fOE+T&GAUZJwppxmLFpQtN2u``ybK&soe_*O|rH)!EWH@Py#R?L_>9 z@r3ro;e_Hu<~y)Q5cL2S305IiG?ow6Ec9!w6AKCJ8&(<3sJy|O-86#K^tU=G zSCq9>QAy@0o5?9}rxQ6--BQkyW8Vrg>H>6WzA<}i{`$OGyva1kHORtHjMIwKkP?n%tyZa~KZ$7VW9?v#WX)v_{vk0bJo#!;rUa$9 zvRJ>^w^*jQtk_r~@XZ5D=v&j2^|Yk76DgoH=Om<*sN}&nd@Qb7rH0#0BSIOL>UF6AnQQD+vTM0APVbV0F7^Jsm(q(953TGsJHKbjkYNNHM z*;L6{bf#r3VQrjIHB!P{P*rVL@w1BLU3PwUr9p+CSfl>+=U*wm3V${Isz!blE+M5T zrP(XlD@P+rV?-lUBDvKfpEhKl%8)9R>Yi$-S;!b!@uecOqP(KsCf}xT4>ZF*BRxYl zBTz9_;i%bO{){V{C?PW?voy0Lvsv$*Ui1ReC!0?(HupA{GZr&i)1Rj&_7wNT_mKDC z_b&F@_CR~OdzRD0(*iS*GiB5GGwl^e)`?tSxTv`}xtO_@xVkvixR5z*I9aWi#tG)U z%1>TxG|(z1DNt#q2ZU9}y({TWX>MoK|aL3dbZSdK}R z>DB%ht#GZFS+4iKdi8otdW#DL_IUP~_7HopJ>7xm0ri34f&78iLBPT70`TM9$H4tH zw|Tc7w`sQ#Hv@NZcQrR;Hyif_cZWU2-Im$w!um3W*{#{s*_qi-vqDwGC0XUZ(w8j` z4Tq%5gv*fSJRf=?S|L**8X>R{T`QoKy4A1Mrj@((S!<-vsgLoQ$NAum;LYR>|IO!{ zl^f8_x0{lixSQ4+@J;;<(@i_76Z}v3@9=x@c<^NKnD9T~%g~5Wo6w|DEl}U0aidK{ zFh>+dR76-rI7FC6SVu@joJZtDG@zBCrJ)8A6~yYqO2%o#jl?d+cEwJ{Dih{#?6DJp zGC_LW3+#Oa^+Z##GjS}jLa|h_9I;~rc|3<8GOhrJ7Mr_8IL9t)O6x*%CwAD$DQClw-gLxi`NzrB@&(@2!;rw}P{Rl7g**Bh-`z!^p!iZ+TegSXOlc z8pQQ#btbB&-!s2&sWPi_sTz9Es)wedq{E`;-yp5Wp+8kM^ZtDmhJL@!j-IwogYN3b z+>Zga_ucUJ-8NJXMb=>O1{mAXb*XfrwA*69V!`5@#Uh^Nur-r?rR}~6u~n9xzLmb? 
zBE)yjW7gyAMGa|h{Hk*2==|sm>k?~s^?dbU_3B~4a>z1|Z?RCl&{w1Gm4%gl;T7Sr zw!VkDd;8n63#JPJbZP`61aAZ?1pG+0NViC{NZQDN$VhY)^i_-k^jb13kz)6Oi^Ru* zVu=@#qhx#hoIX~E<7E=Q@ftD|GPp9XGQfU2Me@FElDN);ildbzgm(eQSv^5IPvTUGT$n zfBne0r)XfKMJBIbB0(yFJXc3WS%pyLxeET6(-`xZ%NX|<#Telj?pV8#n~}It1@smv zY!qem!f2<(r-jU`_2MRgz(0NC!Nu9ynYUfD<#SMDt zleMXiPaXGwK0qztAYVd^LX1#sODrwK0YM1kh6EL%9!>#%A3*>?25tvV`-L$Q8Cn2t z3uXxVI%*OM2x7-4Fw#AR}D` ze+AQ?(X{yVn3XtZ;79q7?rx%PHg0lm`e#mOc4t1Q>=9JbtbNpdOtci+=8E8vxO6&D zCFr(xq!z38rnb6vqSm|iyw<-KyLPx%!1>I%(0Rc*-g(`b&zZ}4*%{;eN{@Yi@<#Xu z*9Pf#iH-X2mEQ$6j(UMTjx>o_aM;t>Ke5EI*s&S0tErb1bY$lhscDgEE9e$zBB(8B zwP-zPR}>6nm*gpFkZ7!_`KZHbAvBtF?leNOKFMcH8E=^1RHgn*CrQfAGDurWzN7bM zxE{ldpfZy=i~m9vCHy9^cwP1}shp~k0T50j1ub~!Wul=dp)cWsux)t{`7ko3GHE`2 z(TQ-ed}97Y<<99Yd~QhuQeso0`bxpV%tH5``#qgK<$?dg;j7Z3H!^4mJ|uA>KEJG{ zCOV785~|CJrW&VGr`}IJpPHItniQBiDr=W}%n{2I8y^4a^!4>1Z;~?w7_ed8x1(rI zt6geVs%ar=;bS3XVYz)j(vZ`rptcTW0Pg zr@hgB>&bUyw35}5;}_u<+-XdAfcn3pd=;UWGQ)*lsyU8Zax2y!ud z=&^WKBKlFZrsF-J4f=%r$)tEqv(tj2QT3y0&G>t^wqp%b+jZG>zx8doA-M&)*5pGv zY~@^G?~~1(14qDQ0u%KV<;fGLOZUL%5W zgGC)Epn<39QCL-O_vI(#n+0EqLDkZSrOG>5Z(VQ8Yp?5DWYx}B1v;kcn0n4tg#~8$ z-@SVqfnAZ8F}h|JdgP{~j$<7dWj#{G)Kilquy&F7k9m2H~Xl~NVw6+)HC zl?s(L8VTC%HP5)RxrVqrxYoJ6x#GAaxPZ3Ywm)oXY~pPFZ4+j-W-exxX96m^r%&dr zXB($`XL72WHF&fxs~s+x7j?tyxh*Be$V7Y}ey)8>v*g+Tu}`z7wCBG+wQsw3yH~b9 zH?}^Clrtr?6Ttez;78-AE5_J+P5_Y?(KHbWu^CY`(E?E)(F*s^TuXr(uayhGtLwW{ zr5(q-K>vr!7o{k5_{=~!VXD^`9hSFKS?)U)PG zn~Nzc8dKA&XCJMLGUgR+x$Q0-gcl5!m*xbg?mmT&Q!bc|yL_Q}Us6$018Lf_Wh&78 zIFmoq%=P|L)2ABu((`?(?@|j=-7&)W2}V_o^MNNndL8XIxof#MxZ6L+es20qY+r9L z!a8C2rp4TY;aKgW#5K*q(8t@lIj7dJbh%Eb_}2YzS+R^Cq63XSTGnU6kaFtL*a}ULK7dI_{G11uj4C@GrWj?#It< zRO1^TwFrCP52JOVEud`>KjU$6$ZiMuflvAlVfup>f=)%dM89>+1k*hy<{9Lnam-vr zZOrhjzuw;cTB3i2-jl&X3R68`lpO6&r9J7FXjN{kJ`K4*DDBWMrwZsJ5vn zsIIBrQ(a06FNrNs9s!7X`b<{S}vR-n)7RlwRk z_%iD#VI}Yp=3MkRxbt~C!Z!?7!7rz4%e;c4BGm$iLb-e^Y{G&M+Z0{{x2ih@bKPf= z=NOS>tinryJ*Tp_NjpVF38TV9ejksm97BA?ZM$5*@AiC)>Wm7HYL&~CJCLgxJR7_j 
zEEwz^yzg&Ga3>=V$UeDR+E`3jRTfkCQ}n$`iGQX8r$Vxr*_MFRhsig1@Et8 zPcpu9^jk)aG3K6&_@3D9@ZGwe?;MUz<>vVdpZPiCjr{t$_w{CUdxZOI{MYIc(;X{U z3)ceIa92{-WLIP3@zz9t(@V+=sUI;b_+!R*K7$u(2PKQc(`lB*=&gRI$UeP~Lu;vz zmEamZVI_-sY=Yv|Hjho?OStRnC^i)9rixGyRDM3&-x2TtDBfgX9L0 z6;P{sz|OD3f#wCz(4!URPK6K!X$3I_y0;5yv}xRL%b^eY@xyY%Gs8S-_$v}eFII@# zF#Pf!A0}_3E{VzCk>d#S2FO0Bp03YE_TqX;IrUo6c$JuxfFU+pw~QMVzbb}phW5VA ztk0BiO=s?Ae$%^JDF1Z$smykf{F|_i-`QRIb?wQ*7SyeOc6KW=1-pUIq0gGrJd2LbtBmbve)?DF{`hwezl6?wCoZdx z#@ZflI!`2SBgYb!$ngUOZ)#7H*UKhV=JKpXhyzyc8=K_jQ&v7MD$Z9ew0yd@Kxw3@ zgS6o`YPQTYoSWHtW;rgJ2v{{cHM{y+_}=381K%49x5wYx+;Uy-FFKa4R$QM`X1lk`0gJq8cws? z&$;keH27Wi9d?nmK*OAE_>J04bL~9s7|>g%(q;E$%;oKc@xjsK?NN^y#e=s&`il;V z7cenMFclTgUZTSUeSZjqVZ#?${htbHo@y(eYC?E9dH(@I2R&7T{Iya<#of)x-A(Ry z-3GMCLq1_J(Z0$R~wZv9k?C1(yT?69`5hPcqXLTf$%a#9l9AU)^?j}S@`6pv1a|_caiT}de)Y{tqH}|J=KnGW#wfS$6pbY<@=7(0>xCwIz0{?ml zo$v4G!Hy0VjxP454q%9|_}|I?+WRCO@JXD%i1+_OI4;s?Qh~h>G&m} zuaAJ%7HT*;+6imBSVLbhAqf1h{l69Y_fCi;^!*dU>}>38EFcgIh*tx|#m~;h&&kWg z2I6OfuH(N@{x2)}L)lP&g8Cou?+*B+@&7Zn|19MHZ+85TMgC)vf0G9O<68eF#s14| z|6`GVS#|%Wf&a0{ze%zGGTZ-Hn-u#m zv;9BEBL9e=LGg@2l%7w~vHvMz_!pq23KTQv4Gmx+16A!z!4P|B@DC_%=Loj_E9&fT z5yrov&i(+8N!dZ{p|DCQl+4t@+|S$s1FAXNL6l5?1K<8!22d3WB>i*pzc-$=0aVccy}W;(^f#kFCslU0chyH`1Ly<( zCY|A50;B&mBKw5hb4C7dfIVd3-w}Jz$m-uAd(c^d3fAVX`hRHl=}^Ma{Rueumnqo) zE2vTe8h5pGw1lo4I@ccvL1dtc6U0Fb{1oif|7|4zP)An};NWg&_s2grDDQt(-QO*s zXzE}|4{>18RA+$7Y3gDt2MtU@-QuYUG5-@D2%X_C!Dd(e-?3;lL+HBx2m}K_zj3#J zmHRJn$v;DSg3?mb(rj#8JZ#V^CpI>2KIonkx`)CtdDx-G_*)?IPdZMCKWMn2w7==0 zbez!f+}wYDpyRnYpdV;Il!T1~%9mXVx@Uv(`aLC79Z#V_dP!(Fkev+#0`Y>l*x7!^ z1?kw>=otPYu9pkM0%~S<4rC~&KQ{n37Z(Q?zyk1R9}gD?)ZkAWz~TQY+=_7iIzwS=we$U&?!S+P!s`D6h(o=UGn;zf@Ux^bR~{QiN51tfzOz8%?u_+(c;KQ zhSD}T@LJ)eXGF>TK0gtN`{OwrawtjqNr zu=Ty=!&Y~Iu{Jf(7qWchg&`<|%RL{t1SLLk>*XB}8ke?Pd;Gj%M2wgYKkM)?7506b z)Cc-o(|s#&`Rh0lg~C4S#JPxi2#hpVu65C6obhw`Ur2K@=FM_GeSBSizlWDuDCw-V kPGfa=d?`gW`dlS1xI->&7F}1o7(8T1_&&7T@9RN#cXxA{fdBvi literal 0 HcmV?d00001 diff --git a/src/documents/tests/samples/simple.zip 
b/src/documents/tests/samples/simple.zip new file mode 100644 index 0000000000000000000000000000000000000000..e96270508e35b295d4d833c33e3582e84472ab25 GIT binary patch literal 17396 zcmV(#K;*wrO9KQH00;mG01g3qQ2+n{000000FGGz015yg0CQ<=aBO8RaAamxR1E+J z?c=^*8}7beneDz`b$AN^0R-p+000E&0{{T*yLD7tP4YiX2tfu5?l8DB4DJN?KyZg) z7~C1$Jp>OB+$~5TKoWwx6WoFmG!QJf|45!^cc0zweqTB7KYQM@b(p!gyQ`}{)wga> zQ{6?cCN0Yj=HLU+cNTUwb$;s10dN61fsST201*+8GR(meZUy9is8I*WTHC>4E+APu z2plF2gF2eS#KZuua2FWF9^jeD9y-ltwPXAV_6C3iE% zH&sBAm-GfT^kzzuukjFJ;Ee%Sc{I7aXP*ii+2cOJN$?sl+33S1J$%$mgWj+zUFF8r zRTUB|T2nps30N*k$PeAMU%snb8yY%jX>}A6F748MY1PlO8p$I>@>00G?1^Mw6#BXT!@E)c3}ai!BS$_%YPlCw1pAE`jYhw0D@vX`J73_~1~0vG4d zR2+Iq9!`?0`7CjCnpiLa{aUp-i#!$^Zp}BWfjCCVqkSLQYzgwURcuBhc9m!~E%%p4gdY$UuNLLtmAgG{NPT1-{749P9MD^r+c zhzijCY(xEaskDWv@f5*8tzUC?c9w8M<0r@6A5M*P^ZIp)>;`;}bD#6P zZo_t``6h*T$v>8BSH5j&Ygw@1O(r{Oc>OKE15w34Tff`&navZD8;-OpLX7>uMHn_* z$mU53^TPgHx37!-dZ?&eEt;%9va+JhvHOBe;&oP7a_wm*zFqYa)rvTyjv}Uz=$WgCkSwEYfFC9h?#(_KRUO+OM z*)_M{($|K`pb}EkRn_!bLZ%2!5$Xm$Lf+7(o9z&CyFCf6DVf=|;h(n?liTB~?^+tx zEsRM^I-XniwNQUKI@&J*`Q+}fxw+=e^{r|Qd9S^V-Z%|AcOT;*J*{4WZ?dm^Z|~<0 z>oOl?le&I-Y5%qD+o{Tyik!+?4he0Y-Mx+>6Vzg;`0ajuQlMmi`Sg95n%d@^`S$VL zHq0IR&eCGRyA!`+BHCTNuv+QXknH#SjY;b01}EbUf?#4j|^cNSa*q zGjh>8=r&F(cSm1EP(L(@TKu$R$88Q1G3jE^C1k_Wp0y}|33wXP(7x=OXkOysh{rkZ zly;fEFdtSGbR~J-4ct(+6wq^PrSO@bI><`Unwk%GC(^z#eU?qx{=FfZuctCaNsMd$ z2X_&2-q-KY8bsNT?8G?crvNrwn+=wg1+`=nQ7J_+#n$H;W9}8EuemX5Y&T+ZO-$(7WN{WBaL*hfSw&c~BFd zx4ub*;qHZul6%8!1Fj+W-5DHp@=f#1H4_ez>K8t=O(=~uO^Z8=on0rh!D{*ua<6&$ zGI^I9KZEBNKE0oi2tRknKDk~j)(=-nX_yglUj_o}%9od>gx8I#kX~D;xt7_?cv<`~ zcAdk(%B?7!Ml~KDx>i$*Lu!u+Y<9yF+qkj8T=HfUC>U?IU!?PFSLPZg4sX(T3Y#Mk ze3b*$ROu~!#|>qV%eN|dH(_`Dbv&iujnt=H0a;8;b{os^uz7qg9)TSDtY~Rl>wGDY z=N@(A2qmF;|5{P~*w6PYtlg%(I^MiaSZ_4g*zl^Ug+oTo2KGr^9h^6?!Y#*!k7wt( z>>#qxgPokbp1oHR!3mCuXcQ3Z=Jug8#D?6x&*uPd!*M%_ks(D~qoGyI$(McxPCZKp zi(Q4z;`MXB^ru>i4Y{_tQ!k1y{V9;(lTe9UHZ3jXPw zn#~?RWF2Xp6hDqbNi4GK?bPR^9A#q3?UP@i?Y;UrwPV)qrs8ubkRqw-vV+^7%E+$k 
z<-7I(V2B_3r~Z~FkNYPE!p_vn4b2^kP6!np6()|EM5*NPqXTy)W_{u?;D{V&b505X zC4WfvSWa=)&(&hCg8PuitlBZH{34N-qb{>_S!^4x1Tak?len1phomY&L7{(f}f6cWh0NRlP~#|S}}zUScJ@~IU4xnwYM0exQBc(HPJ)iHGNwV z7z2v3bH(=n)KC2V09X zgQ{3Kn3>{m0-jfzyUQ+2>>7fN__N8C-Oq%9en0w;491^F@r98@bhnsf-rq zi8SGI@>mv!^N_*eq!;pE(#StM*&O=nm+dpiT+EM)D?+8qzi<?t; zC)dhHtuci{dP~Zgs*tqkc>;9NB3Of>bb?!PMUsJ99ftIM$xcamlJwiGL7E*R1-Vw0 z>-F4{N}NMw@%G|+X5;tB86yDGr^)NbKL&Lw?DZp_tc)=DC9l6tF4T|EyvPbGE8u?? zv-5?A!>=Z2vmadxj)}$AGT7WFE~Sl=Gig2uGEtAFt@c)_j;X5n;OV=mC~C5Pv~k?? zo%gB9<2&9Jy~iD-o>rI`RYBRj@sTN}KB&7dnA?{M$wD7l&>GfjSMa=wvCYU->#cTT zW&OdbGn=xdl|;tME^_|uN8G}4l?M;p=xB1WWt2W}yxWdrqMrB2IWEHoM zila8=nj)0fDMTCxr)ex0;YG9Hkf}CnuaF=sK+Gk}P<61Tr?0elGj6y}@CV$dpYbyS zyBQhWql&%^!;ciWGx|>eJWP5l$91@gygc?ngu3$|g5M2(+y%BGu?%$gxDj=rpDze~ ze6RK<)h}Rv%#mVemhy*5%Uw>Op~u=G`(8oJP2&CGDL22SpgUvRhwHL;O*?08^uwi= zd+3D8-X5Bt(r44VvjsJ0aB08CYZJanT2DFhEmA0AANTfsaw#KZI1z441zxg5{EEcR z;FNBaftv=#m7%}d9N+Ab6_T zyRBDW&|b1F5X83Ir?iz9aeMeEMok;+E+IDb5py1o{;v0JVdRQ}`CS#|+g*u_J(gRk9MR^hR~td4f; z>-Dc$AnqVbez7;kZrMhhJjve_^{Yvo#wLy)Pm6Q{8rzyZFOAS=n$E_h*$(DMv-L0L zgWayK1|xhU!&Dcg4zF7gHRl{IM%OPc##m59uDe>_tuKM!`|o#=Q}y^0ehU?4FtSx6 z!r?9Du~rlji^*m^H`mkA=jMpYf`tow!mmea1cLS?+|`#o7_ z>0-||?xr(lMrE(@?8@5Hul@buU5?o8ZjOo9PPK;P?aB9(x`b&6<8-_H%{W6v@C*Ac zmhD5^w_{|3mp_fE9J~7J)-AY3T;v%8Pa>2#m5EIFZWh0OF&v8dNV=uL`Ceu08NUp5 zGxB8?CN>tk;l*;*68xki{cf@&;{K*{o*^Bii+ON=6z#UUcAbU3n&NL)P@N8*PP_>r z-sLZXsOwj#6C@F8rcc6pxX5J+o`?00i~Ta$&m;dNyXi%|8b?};tmGJ^ARM-C4~+{Yo4M}(td zRvSyDF23HX$pt}MNz)sH-5n?!1Wjcwo(+b{nB_5UyV(2CuHKM8B{8(cP7^W zW?(ZaWJ6Awqot$qA~{7zO>Jq5b+oDrie<96wRY+wGfUqggv0T)ok=nql`8V^6Dlb3 z&=seJwIhor`XvG-&Ya9!OP39rx6-I1mPk}M)bzo0l6X+s11TJ1BFJ;3w6C9yLv!Dz z$`5XHn~g*ren1t+U**J`r|po%35||BwMLPXIgH|rKYB6Dvuu&>@scxUzbKu&CGC65 z7sM5FhNqVMW9840Vrq=TTW?am_L+IMfbQC&a!i)41LPw~$9kKg+G(PB)sIXWKPX;R z1pffP3mp{qu`zI7s(!cosGs+vU9aoj@ih^$qgX@NhD|9axxSUCfWUV1{%A*~-*s;E z`HY>FO{KNbMS)BFvDjN?Htl!w=@Qp|gTtCIY!SQIOaYA6UO|1u&BAg8y|hyK@-O@w zpjx)>#C>%m5*pdWULC6nrXpxT3G~IPJJgLGddDGr)iKoI*;B8 
zB6=zbS*xAlz6WNj#fSm$x_@xol~w4*Bg=P0^gLUq{fy$ihT3m`kAl1-{{hQ*<^jhKC5cn<65P_LgVE%XV$W^6m`ipo}Ve`Ts0u*)ZwqAkvPcy zYF~*|NSI8zF5@-;UB}HMtkl6qx9&TrEB(&kKH!7+eMW6SzmYF%m7>Z$VrBb*%9UiZ z?|g8SVG=>`@!d`Nh;7d@@iNU6i{wDPg?cB5_OWJ`xU>)ssJ9NqYyM~{^6#?X0SW;C_YRE((i_2!<}cVst=e9i=jAG189)8g&MUuZsgfpt76{E+$3^n*xz9X| zwhgl4`(*T0;1RtCGpF3o{#3+cPr@usQ-T3e1zJk<< zooAYV5}l%Y_Ci|TDvJJFwKR|*Qf3!0KTKqce>Ey?3Djb>i;A6glarOS*2@!5dKrBh z)>FcO82PS$OpS+F5fy9IH}d$%C{Ow|O^h3+8>VI2)}ZWb$!GHPi0z~1z-qadFdu%jdLovp@z5jXOF)3-v)p`yB$(iDR!j+iTqiby9(hoPqV((b zKBm(*6@v`%8Zmd8R*v$sIPyi3{kk7`#uBo@2qdS3L6EOXfJ`fTb+(sZ<$<<4l$@t|kv?NR)A9MVTF!LgXVxbbO~>br5e zt2#GyQRj*{vJgD{Abo1?G=Y>?=xk8BY{hAMT0FPvuFti-iz;&pq0wtu9oLGuG!ir` z6wgEgN0}lM1iq@`Dziju231Jz*!c3l_8t$cdnTmD5`Kd=NhD9!fi{^xf>29F`$9dU zRr-Tp(MQ?Pt0X3&GF7e=qbzcWNRl zTiW4Oe=Ln3V!#<|xgBo#>oeZ`iUL17t?=Y%#x$d#A6@5Sd<5X>y0`X%yGLKk3b!nj z(=>v<_u#}-!F|SOR}j0p3DwVNyMOS3R4)Qs>lVN@)e#4b#wJivc5G}EQsZP z74mu3aP*W?-NqYdA8#?o7HrhM@U>7h&Gd-D)lN?DL%UtjTv~CTdHMCDud;MpY z(O(Q2@+B<396jNsf__wBHiok01Mdojv$Nk3?>w*eFPzcvPS?mOH4L2MFW&C{TA8G7 zmS(<7JH){GoNuLlJaTL-M~E+?zT84nkK0XgXt>oNu7|*qV91Cn(vB^W7+P0zRq$1d zKe^VQJ8QdF^lio-ku%JHL2)DWm~B+c^)5SY2{iJBMl=SeOM@+&t{g@&2qC`|-OkI% z&YNiET?nN18|F1w`6`4+jrcU&SzWebg;5xiiqWB6Lqg}P)Mwl6LH6y$2qmMRXTV{?D9nidoMk{HA;Ck zChT|ADYe&8$2xwC@D&NId%uOIrqH`GN#ePWfoi#uuG)5trX1jO@p%9Ca%wtaQ+3Ey z^Z>KMz$<0_^>v@tIvi z&!=-vGBd@Pb2M|1{0}jgtfofLTu)t(>c7`-Z9V_;WyHETN8kqOe4;l?dvR^LS8%sF z8#+V2c8eXj|7iDd=9ck&^TH}^uCSTtH<6BISC6dIeXOX{N?{#+)kVPoMh{=-{n@N; z?LrMV>?;EmV`6w((~;%)5o35q51Pe zbyExOQ}!s18oiF0Ek}njHz$}8jw_UNK4zellZsD0`MlFsy`;9uR-{8jm+d0(kfkDf z+Ntt|J4nA!v?LMsltF}$w!vW6xCAdlg`@<9cU z;_G6ooAl>4B?c}AcKbzHn-T@OU@L^hT;oYhm%tdp{#=c^w`=eQ$Re+hv7A0CzdaH! zc|>RJ<6B-{quyhp>jv|fEi)3VzNy<2=2NVGFBEew0eH0s;R;y`85RZ2yTOL_9{;Kz zCY!sjLr~@MQ)QpE+D*&r`}Lcbg&V19gdF0FDSXky#PwjD(0sj-m`+FM9Hq!b|27ZN zn*2~t%1b`iJ8-TsF;tAHmf@ZB$HZ5H#hLz`5C7){APkZ^qyL+I4}m}TIQX~(`2Suh z_>YYbO)n=HNY>E-E)8>qx>!5G9bJAGpF`|nAW4Z|#Hws<26KVHtsNaSU@q1cfL}W! 
zmaae^z;8}TNk>luc3vKCAUiiFHxSIl&kqz702_f6;1D}&sDy*19Sq0`07Jhqc*Ce0Kvq~>rr zB~8Jg6n!);%?E|}HPqFn0!t)rg{-mh9zwtXMlX=3B@;VMriM^OxRgxUA|9QZWZuE zjq|WiYprK`)=C+J4-#g&<_TORGE4f%rp@~drqyyNw$Rr_xi_$fcz@DzpNiC^oG=dO z#E%SFuBVTPP896d^=_@+1oSXWeYqCkx}r-bc;2sjaVy6!1RuLX-ZzO2`8OT}2LBsx z0cp6I!G9GrUEE;5Yb7DBu-`5JtJ7G!xWc8ZATGb0K^gMrJ(w2&(y=y&Te%wWKR6EG z!*k{3(=U*G4{G=Yxr`p(1{XId z{EwclzlKiE#nH{_L9E8Z*h3s#oqn}Iy+BeL4{w#bH58^UCkavjKJ@ z4Gd%gybOL7jLPyBm4(R^h@8vx-mBH)$I&hJ*}dH|%I;l^?Qs3HXr(PB2g1c-r*9_| zB81RE`_1e}^r>f-k)aoOFc9HwAO>+zbe*GJQe{lclM55uUBSS+@7%BNn|1@1&+plS za{?+r2v5%fU40A8zF~bn47|i$%fDM4er$1q`SVP8r4fY<$CY#Ck>gFl^-Tn1U7Y*^ zRD0J6#&W+xVS9uC0v`svAGtSO&P{bfxX{8nL%$R540uI;9fu}%)DCfe-Hvnbp~&^} ziF+II_9Nq!$KND4Q6D=$`L9J{;4Tayv!f5vU~ z=v^+RA>v-{W5Y*Tx#;i^%2D*IHeM%+1%$^d48%xMro;=#v@0CMPoOK5o(MhdQirI@ z!M&yckw@Gs%$~>=?c|=1jDx)nN!bGVSJ*vK-n9E1KB@@jJ;d=vHfhs&|EML1YK3tY zaejqp_EA6^$L!9#L3#s~9#Z%#~N6r!=PeEPS zXwM|$Iq3jnKxcW}*TLpMJ$Xv?z$74nJb`)y50G6RWjySJTB;9~D`*Jl+J{>cZU+?Z zdnAM;D8UiWx{l=~8A0vXNAxoSF%QUu1PGzYm1iP-74|I;i3v>zu}D%b56*-)5blz< zrX~hQWC?whJD~>F1SCrls1ayBo{~6KdklYM6Re*bHA?@Mq5;EAa#0O?66I8iKR06( z$b%8{%p(uHEyjZ-8qAQ#Y(?q~5RKT(O&JyWfY2QFCii?4@B!H`xaFBip5rLC72eg8 zy=MV=qQq^S(V<{YEQU6^R74GkY$}OG*jETgD#~Wa6H{tke8LcVQ)XSlHxUF70$t4M zz>lUzx>S6jDyH1Ch@ZmkAwIJ>?%_rd>{;})@Y@vx0b;DM&Q&G>TuMln#Mps}>hDzGy8N9vwQX#-FJu10CBt4^c$`%Mrb+ zXcu4+Oz{xQ)s#b9PaJ;~P$%o5+y(ay!5x-k$NV8y6Z!$IDH0$^W(9{1i6Zz>d#I_5 z6Yw)m6Y8T-&fxI)XY$nMH~>^wD5)e~Ud*TzF^%%mO;pxU_KruZFP$Vl(>6U7Lk*CA zN`sFR^isBq8W%_Isr5??X+j#Jgb1c*_|(HdMV#c9q+NQ_kMnWHf~};a4)ui+e@HC-Ep1Y8KobY=fT(W^$G(k*0!2Br} z21#VJ3XF_}xZ1xOXF#BSG$v^4otMJ#- z{rUK6EKHb5Sc+7zXf8Ruf*)$JOgc%711JQ60NJpD{1-=W%{56`@RCCYqzLk#7bw0+ zeycwzs7YH!FcsSn-0-R)ob(xH-hS?W9%VjdfrVP_Tb;KulejidPSH*YPVr9BPl+%` z<#`BIq$l!7#wx6y^HAxdsz@p2!^U1&X?-Bn$F>jUlQPKL8+&fW|AAiC)oq zQ`}NC$J~WHBBn!*poY99w!lCMN1=U*(=Bgi4pt=Qjpmc3f(BQ2&6<&eD~UnTAO=*t z*vK9_Q+3YospQIHn{>s*8bPJ)nVmck4l@pDrCBA!!OQ{L!?~8)n_7}elsc2@kxDRR zImAE2V1ZH_Cu8QUi~Hsktlh=z@_mP{a|Ms~qw-f%9Xn8!_iuECEAXZ%c4IKx!p#ca 
zAI;Lv!tJp5s0>2c+qKPh-)~j3&$`#dHfR;>oTkD7)3Aa$p>eteDI z7$vpxUH~`t6|~)X^}?A(04id`MXW*(BQJP-`NXhILP6a+tGh=k7ks{qE!Y8g z0b~K#;z2p>BdbJ*gBNiZRTnBZREW}TPhTRj1SPkrJf(YyQ;7Bo$t{?pJ=Ij%!6 zO{o5%PbKlGgW?tYXx4FFp%#VObO=gv=VgtmY%^lu?&J2LX@qWtdUXhQkgl>q5)UI# zalq2J2|<0b>+}FT`j;-!1o?zx@m9}2&UYK9LEN|haA*u+!C8I1hcb=;-> zjlc}#94H{!q?$=5M%zNuLSKnH6PeK^PWLjDK0!g54hxq`A&E{#ovt+TM4o}pcAWaP zaDN28@+&&}cy)!Zg}&p~>Vl=QlkzsSZV8nM$O;985rx*{=;JisAAjfmj$$2wkpMR5 zN{bwl*kqjO8#kw~VfY$oC;N$BII+3!(K@FK+a;=CVx}AiO-h1oKT)6kI){t6SG*2ApICZ#*NJG(opJBL1)K7l@jKBp+P=uJ`N2G=0(8^{|Jwh6W> z-8XtI@9k<4waHnNlS8!PwO(nZX+=yWO~p=SP37FVh6>w3&SwUg$v;G^Ut%^=SZ%An35$l%W4H()X#pMUXb&CYVR>?`S4q1no6 z%-WdRFYlWzuG>Dff9i}iWvu3|J+6tW9;~&kA*-H$-&k{96H&`pV_B193WA9bE_Ez* zi?xXji1my0-o@T>Uo%{bp~Qv+g+xh(N`*_sbOv_5>P+j5ppB-@rp=I~~=yRp> z;-+`kY345b3}dB=LtD(2#kb8>!Zq!c>2G`+pSI4{vA)G_JYO4rm@g~W95(hhTe|(? zeA0cgeZsDV&rK*Xu#+Padt-aE7~>hk8I!jJRKcngs_3fxBkbANA`>E2 zUfN#%hd++YyhuFjy{wx6O><2aO;}CFP1H?CO(acEnl_p+oA#P!JmowYJ-0klJZHQD zj;Sx1e1G^#`f~a1_{#cz^4<3h@&)_KUH-gOyxh5*yYRjYzob0+^0RsEy5l^Gs$X^0 za*OC&%IeGt|7P9V`8xJy+?whZXa%(8xN@*^u+`dmpL$P!k8%N$%U+rrb}t^|=MP zai*cBHKvs-B<=d`+U=C>)a?<$wZTfkyTQVeR@7xr>z=B}zmV7IQ}2`MGhtF>(qdA5 ztGS)2)~MDxYLv&E=aom0M3uA>%}&Oyf)Qnx>3Kmd2eXn}(VOjpi#&iL{NhzO=V=xC~`v(^D_dPZe_= ze9!Um@fp%LCig`MeVi)>0qjJ_yeF%<<1<6H7#EDXS@KD=Q5t4=D_(vgxy_{0P+z z(=E_VnPsnbuGXuDRXdvbeDicBbe3?|b7psTb+&X4I3hlRA4wju95Ea@98n+1eFJrg zzr2qRr0c^a$IZumh3kzw{qVEaj*Eu-6}NKomp3z}-$$OQAG^~9t% zlkq&s@T8N(=rM(uCw;@_Omk=;y2^hC50xHB~>SdB-LpaYL-o+ zl>$l`N*zierD&x(>gDPNO9H1jC!i%oCicG;WOvmrHri|$5>2z@QsP=RKZ7aM{HV#VQLI6( zaj%Jh(VA<(RQt3C_y*7VV*8%=bM*PYSxo3001u)K^!I=7)9bGv4C|vGq)1^+Yy{5G z^V9KD^MlS`?5Bw6XTLChW~^;&u%y0ZBqlAUaF^dQ6EFk_ z2K)px0Hy)i00@8?a0y5Td;v5AJn)MNmY96CX0_;8YFNkdlL=z*De(OWp!nSQs?5{O zpIEV&#MCl13|}m0O0s~Nj+pzItXcJ#>siBCC|(Tcl&RTh?`SoY^AwzDTT5A+q*V+R zvE@}%+LisNAb*>gn^|sH<||QeaQW$H($D;#4L>UZFT$i`lx4KKrMngApV1rB%N0p) zG%2PG*e5e5%Otxc8)@aUM3j9lOD`)etF_6s$=?A_aZSliQBDb$O_VumwU$2OeMK6V zo|ImkUXxEI`8PC$A7i)D4&l6P=i%ZL_ 
z^S@V+6s8rXmFSgz&G%9{&fcupe6{(BP&P_NNm@y7PH z`mFl%bHw(9_Bi%1d#F9*-m^WrJ&`@dJ?%aJz3Vy9hnWunyDRWnco%#UJ_I*(lXO#u z1K>7pb8Zeh%G*uTm-)3NO4A$D$9RC}T6QnXcKl*`}GV`B8I(_p!IhiTi2)mB`ijmC)6vtEDUO)z_<{tC*|iE9h12 z73)Xo(Ni(( z(W23`(cIA^#5w%?U`k$phbEhwc_haUYZ~i(do$=adtBSsP<4lKGnYm41@oo#sQwn_ zEaukIrqaC90_*tiI1`8yLKB04@I+q$DgUEH*4FEJr3-CR^sC zOk;O$cT%@-RL&bKB|ar*B^4!GCC7(HX*dWNjC#Y*&d9#3>t83SU#&Ypj;;&EbH$w&eHL>TUoGYdEeEYx?aOU>=^7|Bu){1^H?p&H%>J?iO8)@metGTtmE;(a86UL%L zF-Gx1p+zBz;EaGrm_;x|_(w!wp<^v$=V4V-;))l#^_|7v=M_pli5RBb5#sT-+8-^E z@`=@yqn0C(bCm=2689SRvh=$5QumVdV#f+Ur(-l`e8LE2Fl3lyn0}ajm_L&-Ygy2-Va4#Xp}8T3VWyF!(YTSKp}wKRXXS<_ z*RL+!^)B@T4WtbXpTp}dKEG-B)UX84axrqh-H>%!b58W&c6W7+f)BcvJ2kHJt@EvY z`gYcv^^Iet@`uB||E%b+=t%Q)^TciCb>_7tc3((;$Y$Pm$nNTab5}v%T9aH(uT-2& z996ci>T^{R)yJwtBTgf1BQ7I+Bh(`#BLpL@#&BavcDmE4$ywRx|btzR#};e-qxl{)+t* zdWLsCaU{I{u&CJA*__$rUq7F8{Ngy0)1I?gwc&U`wVtzS@@;u1U@%}VU@Cw=kQ9iU zOJYj&lsKMHjeHc>iPcodl-HC$o|K97Er}HsF}^3yd%8oj71Nc8wvTPMz#d>Va4%O% zf?9$^VnZS&*a1Zp`-&U`r4~sEc^5?(MGk2TN#}_P8YQMbK@(0e)+$BljZIPylT99wEAQfbM0>^{KSlUiH zpH`4|gI0wWTaH1RP}cZm6Qv-*CAACxD{4hHMaBVU77-T4GNCfo9pg#K$q_3_o`4UE zAKc*2;5KjtxWS3jiQS1e23I((EJqJr4=V%prnxe7C?=H=Tn@gj9;(KzzN)UQ9;^1M zKCSkv#;YEz7Ir>y&Uc=3j&)vj7IfxyUUbI(w$x?co46LX#=A!GO=_+7TlqKPwS#U@ zmm_^VE)w1(-Vax$$bspJWn8HOnfi8oAM%$FlL;k0IQC$XO?BgI|^ z6t2qOCzR5bGXuluWrVr~x>@O|iqX(0-w2XJco4$M=rWo@UQ)ZvRDb!D~6pIB)V8aqpj26JzZKBXN}_1rzlX$rJA; z9#2e6u#O8)9F(*w+-FJTNDPjCar*MIUm(Gm8VX#q?%7f{XV57&E7r1jX5noiV_~^@ zJ5-m}TKiVdOz%q1LC;w)Oix8GL~o?F#R6L6RjXBNUn^Bp`|!VsPYq+uvgN_X`}Mx{ z)OF7-@_y!i>yi45V@E}m}r^4m6-Bc=Z%Npf$>sCQjHI1GYS@ zus@m>u4uJeP}i${P^%h!$JuhI39((3U-ezxR2Wd0Q)o`yXT*D+E#`H!o|Rao>OeF` z6iGya??p&YKtS}7D2AY!c}}p~N%3dW5%YltbIHJSBf-;^$??*~3APfLYDUhPc$R;| z6@27-|3tav3U{Vu((r61nrNa=@!ilVH@D?GC-Cw#pyFpO_Z`2g6 z=#lBO=@sbd=_%;Xm6nok*kR1c>>rshnZB^SW8P!UV!71}XIW#{010anYPsiEl-hm% z0sCqpSY%kS@P48EM&3)$%kt9m@*1Gl{vuBoqJg9DT#=t=mix`CyH3%pNWRFpNDgKO zbB4)n+zu2CG_!nS31itUi!WO!bJcpRHB#Q9g;y?9c3LJ{o>;C_UZokQ(^~b2HT6^kj`uUW9S?A=@jP-Q=WcO57Wuqp)_Iahl 
zIorHmSS_EW)Ci@x&)tueuPK)NyWe-|cT{%#b|-dicdmCzc4tObhtaYoM7R7oz8ik8 zA9lqadB+1J^(37nB_}f@eMLG)+C#d;_aoajkL)`cJ(&vGEg3i22;Y*G@6ttcw>gd zWm`VGGY7Fb!^MRek%^m+VWTv2W}_~j>E9KV6;;6+Hf&k*v_4GbPBrqr``GZY%B}cx zSLU0{oJ?nw7*U*YMg45R(f4jg`*pr*zBRtqPtl(mK9Ski+KY3H8NF^YcV|9SKPz%g zaWL}svTn?(HY#4M5iPv+^nqUQorRr=Q&dpAqHw23pa`QN67CUt!x1uQm@>s7Y!usQ z=zBS^A(t{!nOlWwI9C77i+)3KBYBCHTZd0bAZ>1P@$MO)5_az<5m`S;HcM)n=IsdfSu@3(~J-I8}>XVxl-Ob(jFylw|EJ22-k z*T^36yEtUFf_Wd`!mQ&rk1|zKl_y=23gOx&5Wc;E;XE zV6INYp=hbB-tBgJ;oUolVu?$M#`f)YjgS-+Mr3i!)d(x89;ruC^-?iXv{H@{?VKa; zr{7;Y4LF@1p03uf?#(OL5je~*$JYC8x^M0E*3HPt)8f)@(o)l2(!QfTmla!(Sj1^6 z2sK z6%@n`i;?<%IJ9yM_K~#haQ(L3^)<3RGAy!LAzNWjp{oC+|EfQ)zq|jow<*qzlFC2x z=we}QK5qHBgodw%w}zO8p2k+mzB0S0gb0qW*e??HhNmW#kA;_Nlnf zk=>Tywd?8D{>Vgjj-S|xuQTD$&o4V)u7)><_`bw`sT_iAS-D!c=DCKsQn)6%nwX3> z$NNFfY0hN6M=cSJnA~{xpQ-N^%@0ndSejrp`yK)oFG9* zUMj%=ix!`|?>C#an=>p`M#d#gm42cZ-p-HoS4=o{4^0GGMjIZcZ?A+8OwCrc#I?E& zvyB3c7uT5AWHJ2Nj^oQbP4&l{w+1H$V|5meolDuiJ{x<9b>d6lW)1(XpZk4{a~>fF zOKj~*!Ai195=xA3=293^_}-L0Ec9as6$YmU`BR9Nqz;}ek+op^=G@N$4e-To~1@Y{)-lI9WL98573-{Jd(yUU}k z0~6v_rU2*L*Jm3GDZ(a8RnVF4nRn9#GfFd6?^+F)ZwIcm?rqy~KE_9NDSy{__yTCNOpKC(BZJ~nZG~!|a5l({DFjl}{Wr&%h8(frz=1;*+<`$4&i2ouRVr^~z zTl%jykb^78+Wa@92Z4WZ3pqht;9}e&puZOX+75MeuyAy-hd4lCVv>Jn`)lnlbfABM z_x}R&-|;%Qa{L}U4ydF3FFhbI2lrpV3Ozh_b2li=MNHY+%mrJPm6*ue%J|dLqQ3c; z`|JPW7X195?;rlyVqtsW;UDQa>~@MP+4nbE@5->Uy4Cq|N5tVoJ*5TpYyYRG&G>TI zy{CI}_hea~G z{!4wD@AlvHiGA=X4;f3HvYZ}Mxs;{VJq@y`D)zu4dQXZgi?-k<)9krWEqH(So= zzEtt2tMBjbj)(C=rlF3f1oYj_Ef|)hu{R!iQ}4>I`c`J)w)(SHkN-~ZxF>%y*{IB* zn!)4oLY6{<_8l|qI~YFwnRz6Q^N;dlUKK``|1$7j@e zvU2}f9iiHMUo!Q-j)2aGeATR72NGWO_3c;r>sS6el&$FR#7*o$3@fh3!XV@p8p?m>g)8Li_i1Etz%>GTfq1s|3%NgLqFFG-i+T_y{_T^#0~5} z+>gn9(ATWyKjh$b?b$lEdY*&p7fo4wd;O7v`i9*{*t)#`jUUVWs7!%v+ch%?o0kG zRXyVWt&B&p%0xdcy(jL}y*#_KXV>u`KEU8^UEtxk@j&3l40r2=Yd#6JUJRF5nZ9tv zCof~iqmLInF6{6#^jf@L`GDbigQu@GBG$iJ+OSLC;2869mM&FU{+uWFWwHL#zkj}E z`lICMMA3)+v%e+Yw~T$o_U(R&e3#G9x$_yF6x(x`h3wd=cc5)*)>+-(PwRf&U*f-M 
z&CEYlb#=dg{WCb;s<>_GRjc1w#yf+eoTqCo*D+mUlq>J!_quel;)&mlFAG-m=l)_1 z@Mh Date: Thu, 26 Nov 2020 17:41:50 +0100 Subject: [PATCH 17/36] Apparently there was a very good reason to use inotify. fixes #46 complete with test cases for inotify and polling. --- Pipfile | 1 + Pipfile.lock | 22 +- .../management/commands/document_consumer.py | 124 ++++++++---- .../tests/test_management_consumer.py | 188 ++++++++++++++++++ 4 files changed, 293 insertions(+), 42 deletions(-) create mode 100644 src/documents/tests/test_management_consumer.py diff --git a/Pipfile b/Pipfile index ad60e0905..a6169a2ba 100644 --- a/Pipfile +++ b/Pipfile @@ -35,6 +35,7 @@ scikit-learn="~=0.23.2" whitenoise = "~=5.2.0" watchdog = "*" whoosh="~=2.7.4" +inotify-simple = "*" [dev-packages] coveralls = "*" diff --git a/Pipfile.lock b/Pipfile.lock index 6ecca3c34..b10c414ed 100644 --- a/Pipfile.lock +++ b/Pipfile.lock @@ -1,7 +1,7 @@ { "_meta": { "hash": { - "sha256": "ae2643b9cf0cf5741ae149fb6bc0c480de41329ce48e773eb4b5d760bc5e2244" + "sha256": "e9792119f687757dd388e73827ddd4216910327d5b65a8b950d4b202679c36eb" }, "pipfile-spec": 6, "requires": {}, @@ -129,6 +129,14 @@ "index": "pypi", "version": "==0.32.0" }, + "inotify-simple": { + "hashes": [ + "sha256:8440ffe49c4ae81a8df57c1ae1eb4b6bfa7acb830099bfb3e305b383005cc128", + "sha256:854f9ac752cc1fcff6ca34e9d3d875c9a94c9b7d6eb377f63be2d481a566c6ee" + ], + "index": "pypi", + "version": "==1.3.5" + }, "joblib": { "hashes": [ "sha256:698c311779f347cf6b7e6b8a39bb682277b8ee4aba8cf9507bc0cf4cd4737b72", @@ -663,11 +671,11 @@ }, "faker": { "hashes": [ - "sha256:3f5d379e4b5ce92a8afe3c2ce59d7c43886370dd3bf9495a936b91888debfc81", - "sha256:8c0e8a06acef4b9312902e2ce18becabe62badd3a6632180bd0680c6ee111473" + "sha256:5398268e1d751ffdb3ed36b8a790ed98659200599b368eec38a02eed15bce997", + "sha256:d4183b8f57316de3be27cd6c3b40e9f9343d27c95c96179f027316c58c2c239e" ], "markers": "python_version >= '3.5'", - "version": "==4.17.0" + "version": "==4.17.1" }, "filelock": { "hashes": [ 
@@ -999,11 +1007,11 @@ }, "virtualenv": { "hashes": [ - "sha256:b0011228208944ce71052987437d3843e05690b2f23d1c7da4263fde104c97a2", - "sha256:b8d6110f493af256a40d65e29846c69340a947669eec8ce784fcf3dd3af28380" + "sha256:07cff122e9d343140366055f31be4dcd61fd598c69d11cd33a9d9c8df4546dd7", + "sha256:e0aac7525e880a429764cefd3aaaff54afb5d9f25c82627563603f5d7de5a6e5" ], "markers": "python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3'", - "version": "==20.1.0" + "version": "==20.2.1" } } } diff --git a/src/documents/management/commands/document_consumer.py b/src/documents/management/commands/document_consumer.py index 05711ebd8..4bfd78e8f 100644 --- a/src/documents/management/commands/document_consumer.py +++ b/src/documents/management/commands/document_consumer.py @@ -1,11 +1,11 @@ import logging import os +from time import sleep from django.conf import settings from django.core.management.base import BaseCommand from django_q.tasks import async_task from watchdog.events import FileSystemEventHandler -from watchdog.observers import Observer from watchdog.observers.polling import PollingObserver try: @@ -13,25 +13,54 @@ try: except ImportError: INotify = flags = None +logger = logging.getLogger(__name__) + + +def _consume(file): + try: + if os.path.isfile(file): + async_task("documents.tasks.consume_file", + file, + task_name=os.path.basename(file)[:100]) + else: + logger.debug( + f"Not consuming file {file}: File has moved.") + + except Exception as e: + # Catch all so that the consumer won't crash. + # This is also what the test case is listening for to check for + # errors. 
+ logger.error( + "Error while consuming document: {}".format(e)) + + +def _consume_wait_unmodified(file, num_tries=20, wait_time=1): + mtime = -1 + current_try = 0 + while current_try < num_tries: + try: + new_mtime = os.stat(file).st_mtime + except FileNotFoundError: + logger.debug(f"File {file} moved while waiting for it to remain " + f"unmodified.") + return + if new_mtime == mtime: + _consume(file) + return + mtime = new_mtime + sleep(wait_time) + current_try += 1 + + logger.error(f"Timeout while waiting on file {file} to remain unmodified.") + class Handler(FileSystemEventHandler): - def _consume(self, file): - if os.path.isfile(file): - try: - async_task("documents.tasks.consume_file", - file, - task_name=os.path.basename(file)[:100]) - except Exception as e: - # Catch all so that the consumer won't crash. - logging.getLogger(__name__).error( - "Error while consuming document: {}".format(e)) - def on_created(self, event): - self._consume(event.src_path) + _consume_wait_unmodified(event.src_path) def on_moved(self, event): - self._consume(event.src_path) + _consume_wait_unmodified(event.dest_path) class Command(BaseCommand): @@ -40,12 +69,15 @@ class Command(BaseCommand): consumption directory. """ + # This is here primarily for the tests and is irrelevant in production. + stop_flag = False + def __init__(self, *args, **kwargs): - self.verbosity = 0 self.logger = logging.getLogger(__name__) BaseCommand.__init__(self, *args, **kwargs) + self.observer = None def add_arguments(self, parser): parser.add_argument( @@ -54,38 +86,60 @@ class Command(BaseCommand): nargs="?", help="The consumption directory." ) + parser.add_argument( + "--oneshot", + action="store_true", + help="Run only once." 
+ ) def handle(self, *args, **options): - - self.verbosity = options["verbosity"] directory = options["directory"] logging.getLogger(__name__).info( - "Starting document consumer at {}".format( - directory - ) - ) + f"Starting document consumer at {directory}") - # Consume all files as this is not done initially by the watchdog for entry in os.scandir(directory): if entry.is_file(): async_task("documents.tasks.consume_file", entry.path, task_name=os.path.basename(entry.path)[:100]) - # Start the watchdog. Woof! - if settings.CONSUMER_POLLING > 0: - logging.getLogger(__name__).info( - "Using polling instead of file system notifications.") - observer = PollingObserver(timeout=settings.CONSUMER_POLLING) + if options["oneshot"]: + return + + if settings.CONSUMER_POLLING == 0 and INotify: + self.handle_inotify(directory) else: - observer = Observer() - event_handler = Handler() - observer.schedule(event_handler, directory, recursive=True) - observer.start() + self.handle_polling(directory) + + logger.debug("Consumer exiting.") + + def handle_polling(self, directory): + logging.getLogger(__name__).info( + f"Polling directory for changes: {directory}") + self.observer = PollingObserver(timeout=settings.CONSUMER_POLLING) + self.observer.schedule(Handler(), directory, recursive=False) + self.observer.start() try: - while observer.is_alive(): - observer.join(1) + while self.observer.is_alive(): + self.observer.join(1) + if self.stop_flag: + self.observer.stop() except KeyboardInterrupt: - observer.stop() - observer.join() + self.observer.stop() + self.observer.join() + + def handle_inotify(self, directory): + logging.getLogger(__name__).info( + f"Using inotify to watch directory for changes: {directory}") + + inotify = INotify() + inotify.add_watch(directory, flags.CLOSE_WRITE | flags.MOVED_TO) + try: + while not self.stop_flag: + for event in inotify.read(timeout=1000, read_delay=1000): + file = os.path.join(directory, event.name) + if os.path.isfile(file): + _consume(file) 
+ except KeyboardInterrupt: + pass diff --git a/src/documents/tests/test_management_consumer.py b/src/documents/tests/test_management_consumer.py new file mode 100644 index 000000000..bfb7520ee --- /dev/null +++ b/src/documents/tests/test_management_consumer.py @@ -0,0 +1,188 @@ +import filecmp +import os +import shutil +import tempfile +from threading import Thread +from time import sleep +from unittest import mock + +from django.conf import settings +from django.test import TestCase, override_settings + +from documents.consumer import ConsumerError +from documents.management.commands import document_consumer + + +class ConsumerThread(Thread): + + def __init__(self): + super().__init__() + self.cmd = document_consumer.Command() + + def run(self) -> None: + self.cmd.handle(directory=settings.CONSUMPTION_DIR, oneshot=False) + + def stop(self): + # Consumer checks this every second. + self.cmd.stop_flag = True + + +def chunked(size, source): + for i in range(0, len(source), size): + yield source[i:i+size] + + +class TestConsumer(TestCase): + + sample_file = os.path.join(os.path.dirname(__file__), "samples", "simple.pdf") + + def setUp(self) -> None: + patcher = mock.patch("documents.management.commands.document_consumer.async_task") + self.task_mock = patcher.start() + self.addCleanup(patcher.stop) + + self.consume_dir = tempfile.mkdtemp() + + override_settings(CONSUMPTION_DIR=self.consume_dir).enable() + + def t_start(self): + self.t = ConsumerThread() + self.t.start() + # give the consumer some time to do initial work + sleep(1) + + def tearDown(self) -> None: + if self.t: + self.t.stop() + + def wait_for_task_mock_call(self): + n = 0 + while n < 100: + if self.task_mock.call_count > 0: + # give task_mock some time to finish and raise errors + sleep(1) + return + n += 1 + sleep(0.1) + self.fail("async_task was never called") + + # A bogus async_task that will simply check the file for + # completeness and raise an exception otherwise. 
+ def bogus_task(self, func, filename, **kwargs): + eq = filecmp.cmp(filename, self.sample_file, shallow=False) + if not eq: + print("Consumed an INVALID file.") + raise ConsumerError("Incomplete File READ FAILED") + else: + print("Consumed a perfectly valid file.") + + def slow_write_file(self, target, incomplete=False): + with open(self.sample_file, 'rb') as f: + pdf_bytes = f.read() + + if incomplete: + pdf_bytes = pdf_bytes[:len(pdf_bytes) - 100] + + with open(target, 'wb') as f: + # this will take 2 seconds, since the file is about 20k. + print("Start writing file.") + for b in chunked(1000, pdf_bytes): + f.write(b) + sleep(0.1) + print("file completed.") + + def test_consume_file(self): + self.t_start() + + f = os.path.join(self.consume_dir, "my_file.pdf") + shutil.copy(self.sample_file, f) + + self.wait_for_task_mock_call() + + self.task_mock.assert_called_once() + self.assertEqual(self.task_mock.call_args.args[1], f) + + @override_settings(CONSUMER_POLLING=1) + def test_consume_file_polling(self): + self.test_consume_file() + + def test_consume_existing_file(self): + f = os.path.join(self.consume_dir, "my_file.pdf") + shutil.copy(self.sample_file, f) + + self.t_start() + self.task_mock.assert_called_once() + self.assertEqual(self.task_mock.call_args.args[1], f) + + @override_settings(CONSUMER_POLLING=1) + def test_consume_existing_file_polling(self): + self.test_consume_existing_file() + + @mock.patch("documents.management.commands.document_consumer.logger.error") + def test_slow_write_pdf(self, error_logger): + + self.task_mock.side_effect = self.bogus_task + + self.t_start() + + fname = os.path.join(self.consume_dir, "my_file.pdf") + + self.slow_write_file(fname) + + self.wait_for_task_mock_call() + + error_logger.assert_not_called() + + self.task_mock.assert_called_once() + + self.assertEqual(self.task_mock.call_args.args[1], fname) + + @override_settings(CONSUMER_POLLING=1) + def test_slow_write_pdf_polling(self): + self.test_slow_write_pdf() + + 
@mock.patch("documents.management.commands.document_consumer.logger.error") + def test_slow_write_and_move(self, error_logger): + + self.task_mock.side_effect = self.bogus_task + + self.t_start() + + fname = os.path.join(self.consume_dir, "my_file.~df") + fname2 = os.path.join(self.consume_dir, "my_file.pdf") + + self.slow_write_file(fname) + shutil.move(fname, fname2) + + self.wait_for_task_mock_call() + + self.task_mock.assert_called_once() + self.assertEqual(self.task_mock.call_args.args[1], fname2) + + error_logger.assert_not_called() + + @override_settings(CONSUMER_POLLING=1) + def test_slow_write_and_move_polling(self): + self.test_slow_write_and_move() + + @mock.patch("documents.management.commands.document_consumer.logger.error") + def test_slow_write_incomplete(self, error_logger): + + self.task_mock.side_effect = self.bogus_task + + self.t_start() + + fname = os.path.join(self.consume_dir, "my_file.pdf") + self.slow_write_file(fname, incomplete=True) + + self.wait_for_task_mock_call() + + self.task_mock.assert_called_once() + self.assertEqual(self.task_mock.call_args.args[1], fname) + + # assert that we have an error logged with this invalid file. + error_logger.assert_called_once() + + @override_settings(CONSUMER_POLLING=1) + def test_slow_write_incomplete_polling(self): + self.test_slow_write_incomplete() From d5ec76295455105fc5a9d21a77d6a16af06b9ddc Mon Sep 17 00:00:00 2001 From: jonaswinkler Date: Thu, 26 Nov 2020 18:55:05 +0100 Subject: [PATCH 18/36] couple changes to the consumer. 
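
The startup scan now goes through the same guarded helper as the file watchers, so a failure on one file is logged instead of aborting the scan. The following is a minimal sketch of that idea, not the code from the diff below: `enqueue` is a hypothetical stand-in for the real task call, and `consume_existing` is an illustrative name that does not exist in the code base.

```python
import logging
import os

logger = logging.getLogger(__name__)


def _consume(path, enqueue):
    """Queue `path` if it is a regular file; log errors instead of raising."""
    try:
        if os.path.isfile(path):
            enqueue(path)  # hypothetical stand-in for the async task call
        else:
            logger.debug("Not consuming %s: not a regular file.", path)
    except Exception as e:
        # Catch everything so one bad file cannot stop the scan.
        logger.error("Error while consuming document: %s", e)


def consume_existing(directory, enqueue):
    """Initial scan: route every directory entry through the guarded helper."""
    for entry in os.scandir(directory):
        _consume(entry.path, enqueue)
```

Directories are skipped by the `os.path.isfile` check inside the helper, which is why the explicit `entry.is_file()` test is no longer needed in the loop.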
--- src/documents/management/commands/document_consumer.py | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/src/documents/management/commands/document_consumer.py b/src/documents/management/commands/document_consumer.py index 4bfd78e8f..c25d0cfa9 100644 --- a/src/documents/management/commands/document_consumer.py +++ b/src/documents/management/commands/document_consumer.py @@ -95,14 +95,8 @@ class Command(BaseCommand): def handle(self, *args, **options): directory = options["directory"] - logging.getLogger(__name__).info( - f"Starting document consumer at {directory}") - for entry in os.scandir(directory): - if entry.is_file(): - async_task("documents.tasks.consume_file", - entry.path, - task_name=os.path.basename(entry.path)[:100]) + _consume(entry.path) if options["oneshot"]: return From 4bf0d834a0ad39f2cdfe188545bd65401f128f0f Mon Sep 17 00:00:00 2001 From: jonaswinkler Date: Thu, 26 Nov 2020 22:17:14 +0100 Subject: [PATCH 19/36] improved test cases. Python 3.6 compatibility. 
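
The tests stop using `call_args.args` and `call_args.kwargs`. Those attributes appear to have been added to `unittest.mock` only in Python 3.8, while unpacking `call_args` as a pair also works on 3.6. A small sketch of the pattern, with made-up argument values:

```python
from unittest import mock

m = mock.Mock()
m("documents.tasks.consume_file", "/tmp/my_file.pdf", task_name="my_file.pdf")

# call_args behaves like a 2-tuple of (positional args, keyword args),
# so unpacking works on Python 3.6 as well as on newer versions.
args, kwargs = m.call_args

assert args[1] == "/tmp/my_file.pdf"
assert kwargs["task_name"] == "my_file.pdf"
```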
--- Pipfile | 3 ++ Pipfile.lock | 30 +++++++++++++- src/documents/tests/test_api.py | 30 ++++---------- src/documents/tests/test_classifier.py | 1 - src/documents/tests/test_consumer.py | 26 +++--------- .../tests/test_management_consumer.py | 37 ++++++++++------- src/documents/tests/test_matchables.py | 9 ++++ src/documents/tests/utils.py | 41 +++++++++++++++++++ 8 files changed, 116 insertions(+), 61 deletions(-) create mode 100644 src/documents/tests/utils.py diff --git a/Pipfile b/Pipfile index a6169a2ba..105efd0ad 100644 --- a/Pipfile +++ b/Pipfile @@ -8,6 +8,9 @@ url = "https://www.piwheels.org/simple" verify_ssl = true name = "piwheels" +[requires] +python_version = "3.6" + [packages] dateparser = "~=0.7.6" django = "~=3.1.3" diff --git a/Pipfile.lock b/Pipfile.lock index b10c414ed..918609845 100644 --- a/Pipfile.lock +++ b/Pipfile.lock @@ -1,10 +1,12 @@ { "_meta": { "hash": { - "sha256": "e9792119f687757dd388e73827ddd4216910327d5b65a8b950d4b202679c36eb" + "sha256": "d6432a18280c092c108e998f00bcd377c0c55ef18f26cb0b8eb64f9618b9f383" }, "pipfile-spec": 6, - "requires": {}, + "requires": { + "python_version": "3.6" + }, "sources": [ { "name": "pypi", @@ -701,6 +703,22 @@ "markers": "python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3'", "version": "==1.2.0" }, + "importlib-metadata": { + "hashes": [ + "sha256:030f3b1bdb823ecbe4a9659e14cc861ce5af403fe99863bae173ec5fe00ab132", + "sha256:caeee3603f5dcf567864d1be9b839b0bcfdf1383e3e7be33ce2dead8144ff19c" + ], + "markers": "python_version < '3.8'", + "version": "==2.1.0" + }, + "importlib-resources": { + "hashes": [ + "sha256:7b51f0106c8ec564b1bef3d9c588bc694ce2b92125bbb6278f4f2f5b54ec3592", + "sha256:a3d34a8464ce1d5d7c92b0ea4e921e696d86f2aa212e684451cb1482c8d84ed5" + ], + "markers": "python_version < '3.7'", + "version": "==3.3.0" + }, "iniconfig": { "hashes": [ "sha256:011e24c64b7f47f6ebd835bb12a743f2fbe9a26d4cecaa7f53bc4f35ee9da8b3", @@ -1012,6 +1030,14 @@ ], "markers": "python_version >= '2.7' 
and python_version not in '3.0, 3.1, 3.2, 3.3'", "version": "==20.2.1" + }, + "zipp": { + "hashes": [ + "sha256:102c24ef8f171fd729d46599845e95c7ab894a4cf45f5de11a44cc7444fb1108", + "sha256:ed5eee1974372595f9e416cc7bbeeb12335201d8081ca8a0743c954d4446e5cb" + ], + "markers": "python_version < '3.8'", + "version": "==3.4.0" } } } diff --git a/src/documents/tests/test_api.py b/src/documents/tests/test_api.py index c7e31e280..d9a2aac26 100644 --- a/src/documents/tests/test_api.py +++ b/src/documents/tests/test_api.py @@ -1,41 +1,24 @@ import os -import shutil import tempfile from unittest import mock from django.contrib.auth.models import User -from django.test import override_settings from pathvalidate import ValidationError from rest_framework.test import APITestCase from documents.models import Document, Correspondent, DocumentType, Tag +from documents.tests.utils import setup_directories, remove_dirs class DocumentApiTest(APITestCase): def setUp(self): - self.scratch_dir = tempfile.mkdtemp() - self.media_dir = tempfile.mkdtemp() - self.originals_dir = os.path.join(self.media_dir, "documents", "originals") - self.thumbnail_dir = os.path.join(self.media_dir, "documents", "thumbnails") - - os.makedirs(self.originals_dir, exist_ok=True) - os.makedirs(self.thumbnail_dir, exist_ok=True) - - override_settings( - SCRATCH_DIR=self.scratch_dir, - MEDIA_ROOT=self.media_dir, - ORIGINALS_DIR=self.originals_dir, - THUMBNAIL_DIR=self.thumbnail_dir - ).enable() + self.dirs = setup_directories() + self.addCleanup(remove_dirs, self.dirs) user = User.objects.create_superuser(username="temp_admin") self.client.force_login(user=user) - def tearDown(self): - shutil.rmtree(self.scratch_dir, ignore_errors=True) - shutil.rmtree(self.media_dir, ignore_errors=True) - def testDocuments(self): response = self.client.get("/api/documents/").data @@ -88,7 +71,7 @@ class DocumentApiTest(APITestCase): def test_document_actions(self): - _, filename = tempfile.mkstemp(dir=self.originals_dir) + _, 
filename = tempfile.mkstemp(dir=self.dirs.originals_dir) content = b"This is a test" content_thumbnail = b"thumbnail content" @@ -98,7 +81,7 @@ class DocumentApiTest(APITestCase): doc = Document.objects.create(title="none", filename=os.path.basename(filename), mime_type="application/pdf") - with open(os.path.join(self.thumbnail_dir, "{:07d}.png".format(doc.pk)), "wb") as f: + with open(os.path.join(self.dirs.thumbnail_dir, "{:07d}.png".format(doc.pk)), "wb") as f: f.write(content_thumbnail) response = self.client.get('/api/documents/{}/download/'.format(doc.pk)) @@ -227,7 +210,8 @@ class DocumentApiTest(APITestCase): m.assert_called_once() - self.assertEqual(m.call_args.kwargs['override_filename'], "simple.pdf") + args, kwargs = m.call_args + self.assertEqual(kwargs['override_filename'], "simple.pdf") @mock.patch("documents.forms.async_task") def test_upload_invalid_form(self, m): diff --git a/src/documents/tests/test_classifier.py b/src/documents/tests/test_classifier.py index 0f421bb32..e5e7d8639 100644 --- a/src/documents/tests/test_classifier.py +++ b/src/documents/tests/test_classifier.py @@ -11,7 +11,6 @@ from documents.models import Correspondent, Document, Tag, DocumentType class TestClassifier(TestCase): def setUp(self): - self.classifier = DocumentClassifier() def generate_test_data(self): diff --git a/src/documents/tests/test_consumer.py b/src/documents/tests/test_consumer.py index 6dab98d02..323f5051f 100644 --- a/src/documents/tests/test_consumer.py +++ b/src/documents/tests/test_consumer.py @@ -1,12 +1,12 @@ import os import re -import shutil import tempfile from unittest import mock from unittest.mock import MagicMock from django.test import TestCase, override_settings +from .utils import setup_directories, remove_dirs from ..consumer import Consumer, ConsumerError from ..models import FileInfo, Tag, Correspondent, DocumentType, Document from ..parsers import DocumentParser, ParseError @@ -411,23 +411,14 @@ def fake_magic_from_file(file, mime=False): 
class TestConsumer(TestCase): def make_dummy_parser(self, path, logging_group): - return DummyParser(path, logging_group, self.scratch_dir) + return DummyParser(path, logging_group, self.dirs.scratch_dir) def make_faulty_parser(self, path, logging_group): - return FaultyParser(path, logging_group, self.scratch_dir) + return FaultyParser(path, logging_group, self.dirs.scratch_dir) def setUp(self): - self.scratch_dir = tempfile.mkdtemp() - self.media_dir = tempfile.mkdtemp() - self.consumption_dir = tempfile.mkdtemp() - - override_settings( - SCRATCH_DIR=self.scratch_dir, - MEDIA_ROOT=self.media_dir, - ORIGINALS_DIR=os.path.join(self.media_dir, "documents", "originals"), - THUMBNAIL_DIR=os.path.join(self.media_dir, "documents", "thumbnails"), - CONSUMPTION_DIR=self.consumption_dir - ).enable() + self.dirs = setup_directories() + self.addCleanup(remove_dirs, self.dirs) patcher = mock.patch("documents.parsers.document_consumer_declaration.send") m = patcher.start() @@ -441,13 +432,8 @@ class TestConsumer(TestCase): self.consumer = Consumer() - def tearDown(self): - shutil.rmtree(self.scratch_dir, ignore_errors=True) - shutil.rmtree(self.media_dir, ignore_errors=True) - shutil.rmtree(self.consumption_dir, ignore_errors=True) - def get_test_file(self): - fd, f = tempfile.mkstemp(suffix=".pdf", dir=self.scratch_dir) + fd, f = tempfile.mkstemp(suffix=".pdf", dir=self.dirs.scratch_dir) return f def testNormalOperation(self): diff --git a/src/documents/tests/test_management_consumer.py b/src/documents/tests/test_management_consumer.py index bfb7520ee..33938d450 100644 --- a/src/documents/tests/test_management_consumer.py +++ b/src/documents/tests/test_management_consumer.py @@ -1,7 +1,6 @@ import filecmp import os import shutil -import tempfile from threading import Thread from time import sleep from unittest import mock @@ -11,6 +10,7 @@ from django.test import TestCase, override_settings from documents.consumer import ConsumerError from documents.management.commands import 
document_consumer +from documents.tests.utils import setup_directories, remove_dirs class ConsumerThread(Thread): @@ -41,9 +41,8 @@ class TestConsumer(TestCase): self.task_mock = patcher.start() self.addCleanup(patcher.stop) - self.consume_dir = tempfile.mkdtemp() - - override_settings(CONSUMPTION_DIR=self.consume_dir).enable() + self.dirs = setup_directories() + self.addCleanup(remove_dirs, self.dirs) def t_start(self): self.t = ConsumerThread() @@ -94,25 +93,29 @@ class TestConsumer(TestCase): def test_consume_file(self): self.t_start() - f = os.path.join(self.consume_dir, "my_file.pdf") + f = os.path.join(self.dirs.consumption_dir, "my_file.pdf") shutil.copy(self.sample_file, f) self.wait_for_task_mock_call() self.task_mock.assert_called_once() - self.assertEqual(self.task_mock.call_args.args[1], f) + + args, kwargs = self.task_mock.call_args + self.assertEqual(args[1], f) @override_settings(CONSUMER_POLLING=1) def test_consume_file_polling(self): self.test_consume_file() def test_consume_existing_file(self): - f = os.path.join(self.consume_dir, "my_file.pdf") + f = os.path.join(self.dirs.consumption_dir, "my_file.pdf") shutil.copy(self.sample_file, f) self.t_start() self.task_mock.assert_called_once() - self.assertEqual(self.task_mock.call_args.args[1], f) + + args, kwargs = self.task_mock.call_args + self.assertEqual(args[1], f) @override_settings(CONSUMER_POLLING=1) def test_consume_existing_file_polling(self): @@ -125,7 +128,7 @@ class TestConsumer(TestCase): self.t_start() - fname = os.path.join(self.consume_dir, "my_file.pdf") + fname = os.path.join(self.dirs.consumption_dir, "my_file.pdf") self.slow_write_file(fname) @@ -135,7 +138,8 @@ class TestConsumer(TestCase): self.task_mock.assert_called_once() - self.assertEqual(self.task_mock.call_args.args[1], fname) + args, kwargs = self.task_mock.call_args + self.assertEqual(args[1], fname) @override_settings(CONSUMER_POLLING=1) def test_slow_write_pdf_polling(self): @@ -148,8 +152,8 @@ class 
TestConsumer(TestCase): self.t_start() - fname = os.path.join(self.consume_dir, "my_file.~df") - fname2 = os.path.join(self.consume_dir, "my_file.pdf") + fname = os.path.join(self.dirs.consumption_dir, "my_file.~df") + fname2 = os.path.join(self.dirs.consumption_dir, "my_file.pdf") self.slow_write_file(fname) shutil.move(fname, fname2) @@ -157,7 +161,9 @@ class TestConsumer(TestCase): self.wait_for_task_mock_call() self.task_mock.assert_called_once() - self.assertEqual(self.task_mock.call_args.args[1], fname2) + + args, kwargs = self.task_mock.call_args + self.assertEqual(args[1], fname2) error_logger.assert_not_called() @@ -172,13 +178,14 @@ class TestConsumer(TestCase): self.t_start() - fname = os.path.join(self.consume_dir, "my_file.pdf") + fname = os.path.join(self.dirs.consumption_dir, "my_file.pdf") self.slow_write_file(fname, incomplete=True) self.wait_for_task_mock_call() self.task_mock.assert_called_once() - self.assertEqual(self.task_mock.call_args.args[1], fname) + args, kwargs = self.task_mock.call_args + self.assertEqual(args[1], fname) # assert that we have an error logged with this invalid file. error_logger.assert_called_once() diff --git a/src/documents/tests/test_matchables.py b/src/documents/tests/test_matchables.py index 24e285ae7..4e4a3e7dc 100644 --- a/src/documents/tests/test_matchables.py +++ b/src/documents/tests/test_matchables.py @@ -1,3 +1,5 @@ +import shutil +import tempfile from random import randint from django.contrib.admin.models import LogEntry @@ -215,6 +217,13 @@ class TestDocumentConsumptionFinishedSignal(TestCase): self.doc_contains = Document.objects.create( content="I contain the keyword.", mime_type="application/pdf") + self.index_dir = tempfile.mkdtemp() + # TODO: we should not need the index here. 
+ override_settings(INDEX_DIR=self.index_dir).enable() + + def tearDown(self) -> None: + shutil.rmtree(self.index_dir, ignore_errors=True) + def test_tag_applied_any(self): t1 = Tag.objects.create( name="test", match="keyword", matching_algorithm=Tag.MATCH_ANY) diff --git a/src/documents/tests/utils.py b/src/documents/tests/utils.py new file mode 100644 index 000000000..7b0938ee3 --- /dev/null +++ b/src/documents/tests/utils.py @@ -0,0 +1,41 @@ +import os +import shutil +import tempfile +from collections import namedtuple + +from django.test import override_settings + + +def setup_directories(): + + dirs = namedtuple("Dirs", ()) + + dirs.data_dir = tempfile.mkdtemp() + dirs.scratch_dir = tempfile.mkdtemp() + dirs.media_dir = tempfile.mkdtemp() + dirs.consumption_dir = tempfile.mkdtemp() + dirs.index_dir = os.path.join(dirs.data_dir, "documents", "originals") + dirs.originals_dir = os.path.join(dirs.media_dir, "documents", "originals") + dirs.thumbnail_dir = os.path.join(dirs.media_dir, "documents", "thumbnails") + os.makedirs(dirs.index_dir) + os.makedirs(dirs.originals_dir) + os.makedirs(dirs.thumbnail_dir) + + override_settings( + DATA_DIR=dirs.data_dir, + SCRATCH_DIR=dirs.scratch_dir, + MEDIA_ROOT=dirs.media_dir, + ORIGINALS_DIR=dirs.originals_dir, + THUMBNAIL_DIR=dirs.thumbnail_dir, + CONSUMPTION_DIR=dirs.consumption_dir, + INDEX_DIR=dirs.index_dir + ).enable() + + return dirs + + +def remove_dirs(dirs): + shutil.rmtree(dirs.media_dir, ignore_errors=True) + shutil.rmtree(dirs.data_dir, ignore_errors=True) + shutil.rmtree(dirs.scratch_dir, ignore_errors=True) + shutil.rmtree(dirs.consumption_dir, ignore_errors=True) From b589b7a5dcc8307a743c5db50cf5bd09d8abc2b2 Mon Sep 17 00:00:00 2001 From: jonaswinkler Date: Thu, 26 Nov 2020 22:18:30 +0100 Subject: [PATCH 20/36] The index is now recreated in case loading fails. 
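
The pattern is "try to open, and on any failure log the error and fall through to creation". The sketch below illustrates it with a plain JSON file instead of the real search index, so `open_store` and the JSON format are assumptions made purely for illustration. Like the real function, it is not safe against two processes creating the store at the same time.

```python
import json
import logging
import os

logger = logging.getLogger(__name__)


def open_store(path, recreate=False):
    """Load the JSON store at `path`; create an empty one if that fails."""
    try:
        if os.path.isfile(path) and not recreate:
            with open(path) as f:
                return json.load(f)
    except Exception as e:
        # A corrupt or unreadable store is logged and then replaced.
        logger.error(f"Error while opening the store: {e}, recreating.")

    # Reached when the store is missing, broken, or recreate=True.
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    store = {}
    with open(path, "w") as f:
        json.dump(store, f)
    return store
```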
--- src/documents/index.py | 21 ++++++++++++--------- 1 file changed, 12 insertions(+), 9 deletions(-) diff --git a/src/documents/index.py b/src/documents/index.py index cf312cbcc..a6c3abba8 100644 --- a/src/documents/index.py +++ b/src/documents/index.py @@ -64,15 +64,18 @@ def get_schema(): def open_index(recreate=False): - if exists_in(settings.INDEX_DIR) and not recreate: - return open_dir(settings.INDEX_DIR) - else: - # TODO: this is not thread safe. If 2 instances try to create the index - # at the same time, this fails. This currently prevents parallel - # tests. - if not os.path.isdir(settings.INDEX_DIR): - os.makedirs(settings.INDEX_DIR, exist_ok=True) - return create_in(settings.INDEX_DIR, get_schema()) + # TODO: this is not thread safe. If 2 instances try to create the index + # at the same time, this fails. This currently prevents parallel + # tests. + try: + if exists_in(settings.INDEX_DIR) and not recreate: + return open_dir(settings.INDEX_DIR) + except Exception as e: + logger.error(f"Error while opening the index: {e}, recreating.") + + if not os.path.isdir(settings.INDEX_DIR): + os.makedirs(settings.INDEX_DIR, exist_ok=True) + return create_in(settings.INDEX_DIR, get_schema()) def update_document(writer, doc): From 6454df57bf4937bece5e53035a20f0376f51d75f Mon Sep 17 00:00:00 2001 From: jonaswinkler Date: Thu, 26 Nov 2020 23:09:17 +0100 Subject: [PATCH 21/36] removed some obsolete exporter code. 
---
 .../management/commands/document_exporter.py | 42 +------------------
 1 file changed, 1 insertion(+), 41 deletions(-)

diff --git a/src/documents/management/commands/document_exporter.py b/src/documents/management/commands/document_exporter.py
index f86462119..971481ff8 100644
--- a/src/documents/management/commands/document_exporter.py
+++ b/src/documents/management/commands/document_exporter.py
@@ -22,13 +22,6 @@ class Command(Renderable, BaseCommand):
 
     def add_arguments(self, parser):
         parser.add_argument("target")
-        parser.add_argument(
-            "--legacy",
-            action="store_true",
-            help="Don't try to export all of the document data, just dump the "
-                 "original document files out in a format that makes "
-                 "re-consuming them easy."
-        )
 
     def __init__(self, *args, **kwargs):
         BaseCommand.__init__(self, *args, **kwargs)
@@ -44,10 +37,7 @@ class Command(Renderable, BaseCommand):
         if not os.access(self.target, os.W_OK):
             raise CommandError("That path doesn't appear to be writable")
 
-        if options["legacy"]:
-            self.dump_legacy()
-        else:
-            self.dump()
+        self.dump()
 
     def dump(self):
 
@@ -102,33 +92,3 @@ class Command(Renderable, BaseCommand):
 
         with open(os.path.join(self.target, "manifest.json"), "w") as f:
             json.dump(manifest, f, indent=2)
-
-    def dump_legacy(self):
-
-        for document in Document.objects.all():
-
-            target = os.path.join(
-                self.target, self._get_legacy_file_name(document))
-
-            print("Exporting: {}".format(target))
-
-            with open(target, "wb") as f:
-                f.write(GnuPG.decrypted(document.source_file))
-                t = int(time.mktime(document.created.timetuple()))
-                os.utime(target, times=(t, t))
-
-    @staticmethod
-    def _get_legacy_file_name(doc):
-
-        if not doc.correspondent and not doc.title:
-            return os.path.basename(doc.source_path)
-
-        created = doc.created.strftime("%Y%m%d%H%M%SZ")
-        tags = ",".join([t.slug for t in doc.tags.all()])
-
-        if tags:
-            return "{} - {} - {} - {}{}".format(
-                created, doc.correspondent, doc.title, tags, doc.file_type)
-
-        return "{} - {} - {}{}".format(
-            created, doc.correspondent, doc.title, doc.file_type)

From db0f7649d1784cfbe1c56701c117880170d15206 Mon Sep 17 00:00:00 2001
From: jonaswinkler
Date: Thu, 26 Nov 2020 23:56:57 +0100
Subject: [PATCH 22/36] more tests.

---
 .../management/commands/decrypt_documents.py  |   2 +-
 .../tests/samples/originals/0000001.pdf       | Bin 0 -> 22926 bytes
 .../tests/samples/originals/0000002.pdf.gpg   | Bin 0 -> 18961 bytes
 src/documents/tests/samples/thumb/0000001.png | Bin 0 -> 7913 bytes
 .../tests/samples/thumb/0000002.png.gpg       | Bin 0 -> 7141 bytes
 .../tests/test_management_decrypt.py          |  56 ++++++++++++++++++
 .../tests/test_management_exporter.py         |  53 +++++++++++++++++
 src/setup.cfg                                 |   2 +-
 8 files changed, 111 insertions(+), 2 deletions(-)
 create mode 100644 src/documents/tests/samples/originals/0000001.pdf
 create mode 100644 src/documents/tests/samples/originals/0000002.pdf.gpg
 create mode 100644 src/documents/tests/samples/thumb/0000001.png
 create mode 100644 src/documents/tests/samples/thumb/0000002.png.gpg
 create mode 100644 src/documents/tests/test_management_decrypt.py
 create mode 100644 src/documents/tests/test_management_exporter.py

diff --git a/src/documents/management/commands/decrypt_documents.py b/src/documents/management/commands/decrypt_documents.py
index f9b4edcdc..2287bfa72 100644
--- a/src/documents/management/commands/decrypt_documents.py
+++ b/src/documents/management/commands/decrypt_documents.py
@@ -74,7 +74,7 @@ class Command(BaseCommand):
                     f"Abort: encrypted file {document.source_path} does not "
                     f"end with .gpg")
 
-            document.filename = os.path.splitext(document.source_path)[0]
+            document.filename = os.path.splitext(document.filename)[0]
 
             with open(document.source_path, "wb") as f:
                 f.write(raw_document)
diff --git a/src/documents/tests/samples/originals/0000001.pdf b/src/documents/tests/samples/originals/0000001.pdf
new file mode 100644
index 0000000000000000000000000000000000000000..e450de48269ce43785b8344c63e233a1794abae6
GIT binary patch
literal 22926
zcmeFZ1ymeg@;^!l!6mrE;Lb3(yL)iAVQ_bM2@U~*y9EgZNJ4OTf?IHc27(3mH{|=> z-S7V7z4P{+J?EYO***+?`*z*Bb*rj-daCNvG^&!)EFe~HWSZ{c?w0P)-Fe9D05*W5 znGLd_AW#wFVCiNB;DGk10i~_&+#oJMX**Llh$IB;Xbuq;Ms{^`ftcDOdu6l44>H?6H;@?p^vKWPs#P#ZGbM{;Bcf{BGr+ELKNlvYYRNcNgtX+7@aLXT zMQN!S?3XnMOGXzd6?Y;rsx^sOB+DXSS48V9%8C_*Nre0Ge09*fJ;tB)nym>uKSOw1 z2i!o0IGFz_FtqiwM&zfZJvBhw+)rnJ_woEU1@Qha3iwk&AOMJsm!0je>e%A*b|c=( zS#^}IgXn*zCa)u*nWXP(wA20EkL1j%VyEC?M)$S`K=_(QzmCRCLHrFiECjSQtYn`iKg zf}%nOaWK%_&+Ku&A#j>Q@-?@j>#2p9dZv4QKhun z=@em(Dge&env$D{x9Q_-*cI_>U>>Rgrg4#rb67eijW{P8;mu->2nuC92$yD~)|^om zof)g{JNi%po%qS2uXL^$$;LVc720v6ksjPB{pbm!yHQ(d{s&oogF>puBi3^YH8K8~ ztf=^&Z>QNYfr%PP%}Ba_X=avrD9bVAkH*pka_wzWhja;v5}TSXTYZnCH!OGA` z3&Wr_z7-7B5)oa9ALHmvT?5AkgZZZC23wcJ>T-OElbRVKU0r;Baq_`7Pq-kT3Z{JZ znzD>tk*w~s6hTUEMXXn7y`Gwr?fjkxs;nIJ_~l-gs<>$-h<Ro53Nw-;(BpU?f_z^C3`oT3wrR`6@gqyKrECgMXzc67xJHqs zAT-Dx8^>$LdmKT)E37b`Q9HMosc9RLS$SU}H%%K8sPn~!;@wJl8+r3Ni~|WNKE`!R z=<|F3*4*42fu`oqj{85Inim%J8su5@xg8h26nNi<@6U2i3+$78s?@4}*VfUtWWkk6 zeAe{ldtn!>Qb4X=udArd3&K0rj2b-D!=Pmdh8w@li!_F%!*}lAmIHJV5!u@`n11Hu z#F}Fagcv7kuQ4S`TnuB$$LG*7l+ct^QSXK;nPa<~;%{0m9&|Yq?448ky<0xS-kd^R z=`@)^j=-TZt1p0$iI!&iVt%=D96Ou<>au%fn$^mpvHOmuK3obBkAk|UuHVvh2G0bh zVd#_TTdGX6Jv;Kv{=TJ2sA4{=8zx zVGa>A?xEGeV@B7Swbkd+ z`Yz5K(Xo}_Tt4>o8W?%ftQ37A^?FYCHR{9eQ0jBvmGBcLV z7USBIYAT_SguJkPyK>eTf=DHgI?IA7lk=OMias-*WM_{oKsStX;f1tbxPT*rG)H@JdR-qiMbg%YftI!VPiy zZR^}EJtn@&S8k+jFr_tRn+KzvT{naNdgcjyzj@^-Cw4W{vM74GX3aG zA8%4J&>|DQ4h1z-uCB}oY#P?Uy|GYrt+1K%w)kn}x`2wFUsXPfXf!%W(eEb!UUN;={~aG}&ptOzqXF$UaFB(W2)RzJSXYod?!X>MwuK0cJ@kv?_Z)Wq0~* zGOg&X#OHioX*4tz8_S6BMI3fc-aPx9SV>#!LJ6SP0Y&o|8J_vzMoHtuuMdn&y(V1R zK3q=dZ`GNZv1=&=LdVU94vAbHoVU;;EGI@!=NH-SOr^m`dwB(Y2hn07Nsh@#q^8!b zog){pP5B33E|Gl)J?KO2>I$`2g4eMdGHjsV|+;o9-(THn7OA24?NE{GgWdW$|4i-A%#Om9y@vU~Gu zf_#`FM|CtNfv^t=Vv#jFC!namky9zp<6{Wl8^lNw%}gptv8L=)vGr7JU$w5d0xfO@ zO`Hb6y3uS@5GCb|O^vME)$Um$SdSk5l-cGS^vgLtmnCt;I?6gFaT1e^Kycs3X~0)8 z#@Ld>x3EadXZ=fh*Sy_b)t4{;X-ds?7e@fOpdJ~0__})@Tj!i~EmyhR 
zMIaQ*Gq&r}C!;53!hbq4PU6b(^$S5J$HvCwPj~NadHT7-)7`vvRWj>x(94OQT=S)QiT2GGZDghdV$l(WmRmJFIsV5<7Q&=*@b_>z0*3@5vvn##f4+iAtctFB4n-0 zwal!;jo%)3jY*cxR)?YS9BGm&4jLFzMgE%Zds9|GHgwt$G;dYa(PPb(`E&Rb*J(?S z_{*t4;H1me92Saibz9)2`y4aeaEOjeuoRE9t$Nj#&&W$5r|$}8Eg86;nv zY>xf(Dh_F-t`;Xnc;xxNV!5UqHMfq0Mn~fae3Tz`4iS{D8W|NQbS!2j1 zFH<*9e-3L`+3Q8VSR14DPu+Z%TC5kTag`HZQN$w}xA&Ek)xR!ydk{s_4Go>SZMbzn zL_!NQZ`ynqXsi}XRqLZv8&^~H(aUdJUdVX!Wb3r=2iHsE=MP+Ky3f0aysXerYl3sR z5~I>gd=d9wF?6mJ6Nf#spfYIGs^W}}v(3s@?XPuWV*1IXJ)gFtnL^COB6#`zXTs8I zjVGs@;mP!J+c-_!s)%4fjqBG1$)mOSoCb_f^J1>2>yVNogSmjmRHb3NgQCNlWix+| z$sg8^D~-jq)%N)JT%m5PZNtW+B61^Nnib_);Fa7z$&cGqY6z0urs1<5oo6tjMwHBh zLT5Uxy+ebokmIfyM`}Yfy!<3ZTCpOuLq1}?{DPAe)JsR}5mWMY)a^u$BWv&snUh|Z z@w5RenTyjt7*A*MW61mAPy2v&IL4vgK6m^sl*=XlPajm@ruzpjPB@b8&6E8!ZhOcJ zGVt6uW;rN|yGwpNKIh=k;PasC_;_3KzGd&CgJ!hc@&E-d)yGrgbLMkskg>WWifmaok5-jbv%Y8R!_ZR!a*c(d+@u|ReL8tA^_wR4s z(=t^yBG?}G5mo>c1}UuHA&WLhsuxu1Sd>%h>@wQo;x$#s>+9K^UCE#8x!0)JX4ePV z1sCD*+67yq^suZo1ogxA!0I2<=;p7$hP*h#OSg2PPx(e#C$Lc>?`kEZI6BfLi$SJO zjJt)G59v2|fqDU8FJ>>^UfrMHs7BPnfo3i=cG%(VWP5TAq^)XV21 z>6;rtTl(aT+79zB=gbYc&^^nVu<_A&2Xe&RJh8r#PMxAtj2=F z)%fPs)dVAg;B8O)`^^>5hk(N#67s$PyzgN`w1&2-_?TSfoYwM!!g0AwmnPyNNUxEU zGdjQ_KG?fTY-8g)N^Je5hwqTkvrHD?oUyNz02zSybtl5ozu44-iuMSpv>)lG6f4(H zCxLhD77fEc*}vi;X4N!6E_&~1A$gs;Yve||em_#RSFR3h6Yjd>=9CVaFK(>8{5wA! 
zKjaDD@8=nN@71b1-k<$AYee6E82M|*{myK$S7Dr2yYe_m6F!X?Kf@arw%S@Lck%O4NiFh)aHP0`uSyE} zzVqrj;R7Q2We;?xXxoRme}!`>0F&jz_Z z+4Z{~oZjNYI|?`TY}u5vk?2_o@$&Ar9*%ca``_lrUe4KB*;HE_UKP0{o(jKXVAguS zm??JaKRl`d!4R~I&*nvK@8{E#-!86{)6FQCt^CTf1*~W8O+HjJB&3v1?$@@eqAvj# zm4QzUQCjSS%UFWQaZ+DoVm5ZeGN8b*u$csVpm6H0!J?!S61UqK9D1U)Ta4)gZU=`a zU77grJahcUgf4TnJ1nQvW4@gP6LXRM6#^`!0#*5iP7e2R2vXlZDxQ&NV}PPF(@o>dC__(T55|`~t+14O#brDBA9x;pGlC zIt@R7J(&;skAWXW9<%BL2Mzt0YUGt3VXHfjly1aZ{T4%F3{r4IP9N?n$87sn305g* z7{!MgE!4V!RL?bXL?rn!f&C2#-is$IQOC-DS+C0ASC!-G1LXpdJ-XDFi0=5hTEAkP zv}<{5TvjT1XZ;GuxUBxJsX~b71ikv8*)L#9xQV1jh`h9Y+0p zbeVCkVQ>QJ4%^~<31eR8ncCGnt2xE^h0nnDQmmJWO-5eB|}CM)u=DRO#rwrrNJlnQ3GDQG*v`~ES$4E(xmH{pAOhk;6c~hK@@-&o1!BJ87^L2x~HgP5i7gB1bZi0mJ$O00#VQur~ zfSb@JH+r2cP*wTMj95A|U)|XFO)E;A=)!tYHa4t`*Tv&{yordhX1yQmk|wAt5%I)d zBNU>~X;9)}m=@!^yXE2bz)a03_R(GxS~8NNg@;U2MYQs82FveyQQ|F1oD#;+u#KBA z{v3$%W5!b|uu(*W3r*+MV&dlKlDiFqC__R}Vrk6=VpQPNtRU>V_{t^UiL*!jZ!czE zq|C;eS?-P3oeq1I-=8E-Ccu4S=bMN-NSK^es(qNGzG-kr5pu3dAP&XG3D%?F$ly(j zMqvh1=gQC0P+_~*_I#=DUshU>3ya;z>AIE2q7Xlk#yW~gzl26iMH!1Y|86GU+i>iS* zEN(?Bh%T-N$i;cDD8GDF`|8Xd?hKb$mrDJLDU4l=mf|JCiZNuP@~W92WyFODR1NPD zn7gh0EiRuXSd+h(mvd4M@1Cr`(<~?2;(QA*I4(8L*1O#G-@OI zDRO9j@%?0Hyb(7=Rmp^duPm~ZB^O*!*FUID4S!fc!V^pU0dY|%+!^y&JGCszlWIu2 zUh3S%QDHY?`YuOPV<_fF$wP_^dJsDNE=kmsd zBmG7+@)hcp)4s7n$s>+9{)o(NZfRrZEFbt^=*&#zyT8;qZmsoD5o9%3)&*@6r^ zm%bGXWtg1My4uO;e(bc1|6!wEj?}Bpzk&3b{9y3HGWM%}Q=yo}*OM1qdkZ`CPkW*O%DR3o%(%G_(6lTi~BdHmdw zjg=M}x*YEEBctv534OShxFd$-QFhG91mK3co1$-;JgM~o967uFLhrH;@SPz6OY&P` zr_AG;t`E5xE5Na@ltOWsJ?hN4)Rhp@VN;T8q1}S4+=8iguB9Lf|4}aewQu~e6tFKN zoYkbO*60LG(^0#$>IkX*6b5X2J&C`c86sp2at_%`-{AG|Q20|F=4knSURw^mWbvvz zR#uoU3d3Gie&Go)iXj?X$DUUzXOS&AI(Y5*y;;J$Iq9&gL7}sOBHrN$kWJ!3phqwv6>k} zay@rFY5dW+v!nd=>zH+E9`7C8>Z=W)}?i- zd;v3|?}A;cuAVvPhv+fq)dJdjD$9IfM+pYV^=bGypcaT7$$t`|JF@zEJzdHWM^gX z>vwzR5s>Bk=~F$pRddetz{x{!?!=O-|ny^XIrdD z2V@1aoNqLZ<`NW#H<+*9@NUDc$#j{E-$+Iek;fj6izKqU{`ox@mtp@Y{%wRL` 
z`oPTn8N)gHpf`d!JFi=6m&Uukv#8}xTxAj~k4Gs^#v_jt-^JS980*?D>Xp7`MVN8&@n+X(`cc64x)gt&rT ztexB(p_w1(p|Ytx1Sl@{v@t1Kn?YPm-K-rQ)FCd`7RXPjAxl>PC$s^j6c=~&(r4k~ z^wXGK3qV&`N3 zf5=@KCJj4u1E9GpzZb)9nAb_X7=v{ww6&X3a{A>c;~5#g z2n27yt~OOjgO%i(C}J0VHW*W0r^uTxs{5ju(-z)2C0J=o>_}p@9tgf5AX-d-FuZBd z$qLEa`kvp9BKUf<*Y>#2XzTmNeBYN=zRMTS=K<=kG}R=?Zw1{C8;JoFb9EE7c3my8 zZ$&z9e~ND7So+~|*`ynkb$DJN7b;H4zjZf6`fy{+!Zh#lW5#l)h%;uAlVw(OGux|P z!U%MfG}p6;>mrt2Hb6XUKA=CVnn${WvMI!|g)zeQi;Cl1urBS4ZZt1(Y{YUib4+Ne z=&+%GXZio4Gx? zj)sdnJUN4+>7Q*77cx-W+T6{`Ri6jyINVUXaA#VCWXl50t0r7G({3ri} z5Ksf+(6f|uM zlz>VMozfNXTj;-DKb-#%P#M7Sw+t|d>$hzG_bK4_nVK3}hPQ`1dMZ4EZl2IT&jvSH z&xW9nAppi4O8Kb#pdfnW*CEj%h>Y(L8R<;`@cHx~yxTp0p4?+xJld@y>_5cWjyBE; zRojxW!dxwP`*nj&`C+=KzMK7wJ@?8nH1Gxu2f@4xLL~@}ZE&Mtn4iG_LC1j~#vV;p^V6MRt~AjvP#y%j1EWc96Oe>Y zI!&G5bYebw%Co3(=P z^oDtP3WF+hcj>pu2cyA_aW?fh1GoPye-mchbEiF1p~05##4< zqOdrk&rv&`-F5m?kbRNl(3I{-81)mq>n+|<9tpAaLj9Is%_mh}BEiL@G8URg2^Vu*f1zhM!SL3?Q-x zj{sZ;u<9c000INg_~H1(SQD8x(Y?hZDI5pze?`I;0O;Y6Ln-rR>4~DlzZbyKBk{wQ zh|3hX(PIxqxD;%t3cJBFhDGO3se&|+QzdX!aW$ULh@GoGcY9_NqL&{tPV}U zkB`J&H|NC_Mz-wwb`0XhU=32~DqA=Ef?6F^xvuwx%pr()-QtSU52+2+vrBv3=nFYn zkYi`}F`^*yYGnU9(iKP$O(fKEJ?+@m3o`%#*v)h-bH#Co`+)A)wRnu)f^tM<0*4$d zwT4LzM;h|1Gt5NF3GfB81@T!JTS!Ers4Rs!CNd%6POV_jY z*G^(zs9IhMBL+&oq{P7tel6WYfrTma()u;3BpxMxQY5`74#g-y9y9edQID?V^Fqvt z5Gx5c06(TSrvK~x*IBPAdJxTUPGC+DPY6J9UJ>aME#l0SD-alfONSQ~zB+klu0h0z zofyV)o!&H`22};_Ong&FQ*=`VktkZhVg6wOSs_`Gg=+mf?RQesST--t zkH! 
zuxCm4xZ#)YMJeS*r{ci_prnNj3E$2+81xuUz<>?QVaYL zGY|Pmf!4c7H0C;u;5UlzZBmZ+AnmzW*MF3}r8ZbNS4Z^IXR zySAP0VP-k}#VtK5>qMoN=XHI#jGftUe_5u$?gP zE`cthbtco~<48nIkR(=8@PPCt4Kg;(YZpn}LcEDYE9H+g{Fp+o1A1PX;edkErAKJD zu~jgKqdxVV_Go>_H3K>a@rt*oWK#=MwNbXwRAbFWW%YAv_rKPr=qT18O-+-iCikt_Z_-+guRMNQ)`rSGsX7T>uBbS&*m@FWlJdv% z9~?grtRqpAK<4ZjQ6pm8bW;P9<}`J*--7I#w+=kpWOHG@M&wJ*mSLq#OR^or zAF$tKbrJDS6qFB;%%y0jZl|ev)BdJmpc#icT(m%Kp1uVKGa5%KsZea9Ed7-o!zd0= z9)>0xOGd81{M9dpGKSO?A9;?F&`Alx{8-gK1{HeO6saMEA^aiQEg)$Kx{3@mK)z9e zU65R=UN}|Ekzb!*U*=kJT7Xg-Q>aj=P$nR)EvYT7Ei)(U8Fk3GjMjwNgy5#`=IR#X z#^ko|rtj8#M(;x@L?wjK3e!r^O47>M%G!$5iq?wViti8bhx3Q^C-P^#m%CBB(Yukp zQ8?57fv_vH+yA5Nhw*rOiE+_t{-e|j4dzsFNWm{vsds7`G!=L=uWh7+B+v72)Vs@1 zra5iUKPBqPzc1ldTPzov&YEtXmYFV_)}E%>quaySBZr539t_<=#e&4L#d5`RI)gex zI)giNIuj^jDdH(oDIyCJ3sMT=3!(}_3)0r{y6L;&x?Lk)L|~xqqw1sH#u7*)Q=w1+ zi|C6`i;#<0ix`SnikONpz=&XGFfy1MOaW#9vw*R|2w?IJzy`qvxKFo_C$=(nl^~Eq zOwo&$5gQ>HCK)anHkme=A{jRsI~ia7r8cEn($Yh-d{XoP>Hbi`w%dZcZ{bmU}2CL;lqXI5?QVB!F78xrjB zOY;u7<~yZ4Wjy8CCdGq8NeWL2kC953%9IM1O6!g7&F#(U&7%pSNumj*$ty`Od0P^- z#XijS*7Pj`^AvN9&RgBK4|esiS|m)VsiB&Qn$emWnvpXpGx0MyGkIJgTuEHX=Ww=2 zwk)xWj0+m6qjpS$Bt=xRCYPwQf8hwE+Yh-()=G}qnMMb^{R zS=Qy503jm7D_twS!X3gx!h^#75ApXLx3ssy2=Sr8p)nF+5)l${-9g>a-5K4HRIyaK zR5?_6MPWtpMd3v;;2?0y2B;q;o+chhvY7pnE;u!7g^>0oEEd?yeckD+av&VSx`Q~^7T@Ia18{ZjZ8&De58SEKE7;qa@8w40684MbH zGmtV!HP~t7|5EL|yzSF-p1sd9M^~-n&=I#|@qK$8Z$oQs_B;31=bej9^zZRo${VB5 z`(^cp!`9(;Td#kDZ>DdqZ}_dir7;;QMrvfTOqfi(OlW_4e|&!qT_RlsUFr_63P^=i z1x1Btj3xJ0a7wVoTgyA(_~(h4H=$Rfw{;70%R-Aq3wn!D3q{LG3t`KPmaP`FmV=f# zFBva7uN|*6uQ~6)Q;KVPzn^~Me(ZjGe$sxQ{SN(t{Xl*)*T1giulKGOu6(W|uE|cm z{%YO0?YfL1A5>Yl+`<2zwm!GUv)!<9xrwozu%WU8Tm$Ynt{tr%?X-74ra#g=B0MOlK(lsF~2B3!6eM2&ZK&cuv4#7tCOsgqBAn2K13m8KSV&>ilX9W!%HRESF+jz zY6DUO#`N;^n)E8~Glx|@PceNBPus_T;L zBESfij;noZzwA$r71lqb15k(ktn}WmPy)3>Ph-Y zMo5uGwY>BO{!%j6#sLm7GXfY{Dp+kO_W(XRoCch_XX#e8ySft!g>#Ze(aKS6c!t=^ zg`J$paiO&G>iIt=nXRiois~`5glfrF7IIBBSxbDB`N+56yTG%MOCT1jsi>-`rKm8X 
zI3hQq!mP)v^fOE+T&GAUZJwppxmLFpQtN2u``ybK&soe_*O|rH)!EWH@Py#R?L_>9 z@r3ro;e_Hu<~y)Q5cL2S305IiG?ow6Ec9!w6AKCJ8&(<3sJy|O-86#K^tU=G zSCq9>QAy@0o5?9}rxQ6--BQkyW8Vrg>H>6WzA<}i{`$OGyva1kHORtHjMIwKkP?n%tyZa~KZ$7VW9?v#WX)v_{vk0bJo#!;rUa$9 zvRJ>^w^*jQtk_r~@XZ5D=v&j2^|Yk76DgoH=Om<*sN}&nd@Qb7rH0#0BSIOL>UF6AnQQD+vTM0APVbV0F7^Jsm(q(953TGsJHKbjkYNNHM z*;L6{bf#r3VQrjIHB!P{P*rVL@w1BLU3PwUr9p+CSfl>+=U*wm3V${Isz!blE+M5T zrP(XlD@P+rV?-lUBDvKfpEhKl%8)9R>Yi$-S;!b!@uecOqP(KsCf}xT4>ZF*BRxYl zBTz9_;i%bO{){V{C?PW?voy0Lvsv$*Ui1ReC!0?(HupA{GZr&i)1Rj&_7wNT_mKDC z_b&F@_CR~OdzRD0(*iS*GiB5GGwl^e)`?tSxTv`}xtO_@xVkvixR5z*I9aWi#tG)U z%1>TxG|(z1DNt#q2ZU9}y({TWX>MoK|aL3dbZSdK}R z>DB%ht#GZFS+4iKdi8otdW#DL_IUP~_7HopJ>7xm0ri34f&78iLBPT70`TM9$H4tH zw|Tc7w`sQ#Hv@NZcQrR;Hyif_cZWU2-Im$w!um3W*{#{s*_qi-vqDwGC0XUZ(w8j` z4Tq%5gv*fSJRf=?S|L**8X>R{T`QoKy4A1Mrj@((S!<-vsgLoQ$NAum;LYR>|IO!{ zl^f8_x0{lixSQ4+@J;;<(@i_76Z}v3@9=x@c<^NKnD9T~%g~5Wo6w|DEl}U0aidK{ zFh>+dR76-rI7FC6SVu@joJZtDG@zBCrJ)8A6~yYqO2%o#jl?d+cEwJ{Dih{#?6DJp zGC_LW3+#Oa^+Z##GjS}jLa|h_9I;~rc|3<8GOhrJ7Mr_8IL9t)O6x*%CwAD$DQClw-gLxi`NzrB@&(@2!;rw}P{Rl7g**Bh-`z!^p!iZ+TegSXOlc z8pQQ#btbB&-!s2&sWPi_sTz9Es)wedq{E`;-yp5Wp+8kM^ZtDmhJL@!j-IwogYN3b z+>Zga_ucUJ-8NJXMb=>O1{mAXb*XfrwA*69V!`5@#Uh^Nur-r?rR}~6u~n9xzLmb? 
zBE)yjW7gyAMGa|h{Hk*2==|sm>k?~s^?dbU_3B~4a>z1|Z?RCl&{w1Gm4%gl;T7Sr zw!VkDd;8n63#JPJbZP`61aAZ?1pG+0NViC{NZQDN$VhY)^i_-k^jb13kz)6Oi^Ru* zVu=@#qhx#hoIX~E<7E=Q@ftD|GPp9XGQfU2Me@FElDN);ildbzgm(eQSv^5IPvTUGT$n zfBne0r)XfKMJBIbB0(yFJXc3WS%pyLxeET6(-`xZ%NX|<#Telj?pV8#n~}It1@smv zY!qem!f2<(r-jU`_2MRgz(0NC!Nu9ynYUfD<#SMDt zleMXiPaXGwK0qztAYVd^LX1#sODrwK0YM1kh6EL%9!>#%A3*>?25tvV`-L$Q8Cn2t z3uXxVI%*OM2x7-4Fw#AR}D` ze+AQ?(X{yVn3XtZ;79q7?rx%PHg0lm`e#mOc4t1Q>=9JbtbNpdOtci+=8E8vxO6&D zCFr(xq!z38rnb6vqSm|iyw<-KyLPx%!1>I%(0Rc*-g(`b&zZ}4*%{;eN{@Yi@<#Xu z*9Pf#iH-X2mEQ$6j(UMTjx>o_aM;t>Ke5EI*s&S0tErb1bY$lhscDgEE9e$zBB(8B zwP-zPR}>6nm*gpFkZ7!_`KZHbAvBtF?leNOKFMcH8E=^1RHgn*CrQfAGDurWzN7bM zxE{ldpfZy=i~m9vCHy9^cwP1}shp~k0T50j1ub~!Wul=dp)cWsux)t{`7ko3GHE`2 z(TQ-ed}97Y<<99Yd~QhuQeso0`bxpV%tH5``#qgK<$?dg;j7Z3H!^4mJ|uA>KEJG{ zCOV785~|CJrW&VGr`}IJpPHItniQBiDr=W}%n{2I8y^4a^!4>1Z;~?w7_ed8x1(rI zt6geVs%ar=;bS3XVYz)j(vZ`rptcTW0Pg zr@hgB>&bUyw35}5;}_u<+-XdAfcn3pd=;UWGQ)*lsyU8Zax2y!ud z=&^WKBKlFZrsF-J4f=%r$)tEqv(tj2QT3y0&G>t^wqp%b+jZG>zx8doA-M&)*5pGv zY~@^G?~~1(14qDQ0u%KV<;fGLOZUL%5W zgGC)Epn<39QCL-O_vI(#n+0EqLDkZSrOG>5Z(VQ8Yp?5DWYx}B1v;kcn0n4tg#~8$ z-@SVqfnAZ8F}h|JdgP{~j$<7dWj#{G)Kilquy&F7k9m2H~Xl~NVw6+)HC zl?s(L8VTC%HP5)RxrVqrxYoJ6x#GAaxPZ3Ywm)oXY~pPFZ4+j-W-exxX96m^r%&dr zXB($`XL72WHF&fxs~s+x7j?tyxh*Be$V7Y}ey)8>v*g+Tu}`z7wCBG+wQsw3yH~b9 zH?}^Clrtr?6Ttez;78-AE5_J+P5_Y?(KHbWu^CY`(E?E)(F*s^TuXr(uayhGtLwW{ zr5(q-K>vr!7o{k5_{=~!VXD^`9hSFKS?)U)PG zn~Nzc8dKA&XCJMLGUgR+x$Q0-gcl5!m*xbg?mmT&Q!bc|yL_Q}Us6$018Lf_Wh&78 zIFmoq%=P|L)2ABu((`?(?@|j=-7&)W2}V_o^MNNndL8XIxof#MxZ6L+es20qY+r9L z!a8C2rp4TY;aKgW#5K*q(8t@lIj7dJbh%Eb_}2YzS+R^Cq63XSTGnU6kaFtL*a}ULK7dI_{G11uj4C@GrWj?#It< zRO1^TwFrCP52JOVEud`>KjU$6$ZiMuflvAlVfup>f=)%dM89>+1k*hy<{9Lnam-vr zZOrhjzuw;cTB3i2-jl&X3R68`lpO6&r9J7FXjN{kJ`K4*DDBWMrwZsJ5vn zsIIBrQ(a06FNrNs9s!7X`b<{S}vR-n)7RlwRk z_%iD#VI}Yp=3MkRxbt~C!Z!?7!7rz4%e;c4BGm$iLb-e^Y{G&M+Z0{{x2ih@bKPf= z=NOS>tinryJ*Tp_NjpVF38TV9ejksm97BA?ZM$5*@AiC)>Wm7HYL&~CJCLgxJR7_j 
zEEwz^yzg&Ga3>=V$UeDR+E`3jRTfkCQ}n$`iGQX8r$Vxr*_MFRhsig1@Et8 zPcpu9^jk)aG3K6&_@3D9@ZGwe?;MUz<>vVdpZPiCjr{t$_w{CUdxZOI{MYIc(;X{U z3)ceIa92{-WLIP3@zz9t(@V+=sUI;b_+!R*K7$u(2PKQc(`lB*=&gRI$UeP~Lu;vz zmEamZVI_-sY=Yv|Hjho?OStRnC^i)9rixGyRDM3&-x2TtDBfgX9L0 z6;P{sz|OD3f#wCz(4!URPK6K!X$3I_y0;5yv}xRL%b^eY@xyY%Gs8S-_$v}eFII@# zF#Pf!A0}_3E{VzCk>d#S2FO0Bp03YE_TqX;IrUo6c$JuxfFU+pw~QMVzbb}phW5VA ztk0BiO=s?Ae$%^JDF1Z$smykf{F|_i-`QRIb?wQ*7SyeOc6KW=1-pUIq0gGrJd2LbtBmbve)?DF{`hwezl6?wCoZdx z#@ZflI!`2SBgYb!$ngUOZ)#7H*UKhV=JKpXhyzyc8=K_jQ&v7MD$Z9ew0yd@Kxw3@ zgS6o`YPQTYoSWHtW;rgJ2v{{cHM{y+_}=381K%49x5wYx+;Uy-FFKa4R$QM`X1lk`0gJq8cws? z&$;keH27Wi9d?nmK*OAE_>J04bL~9s7|>g%(q;E$%;oKc@xjsK?NN^y#e=s&`il;V z7cenMFclTgUZTSUeSZjqVZ#?${htbHo@y(eYC?E9dH(@I2R&7T{Iya<#of)x-A(Ry z-3GMCLq1_J(Z0$R~wZv9k?C1(yT?69`5hPcqXLTf$%a#9l9AU)^?j}S@`6pv1a|_caiT}de)Y{tqH}|J=KnGW#wfS$6pbY<@=7(0>xCwIz0{?ml zo$v4G!Hy0VjxP454q%9|_}|I?+WRCO@JXD%i1+_OI4;s?Qh~h>G&m} zuaAJ%7HT*;+6imBSVLbhAqf1h{l69Y_fCi;^!*dU>}>38EFcgIh*tx|#m~;h&&kWg z2I6OfuH(N@{x2)}L)lP&g8Cou?+*B+@&7Zn|19MHZ+85TMgC)vf0G9O<68eF#s14| z|6`GVS#|%Wf&a0{ze%zGGTZ-Hn-u#m zv;9BEBL9e=LGg@2l%7w~vHvMz_!pq23KTQv4Gmx+16A!z!4P|B@DC_%=Loj_E9&fT z5yrov&i(+8N!dZ{p|DCQl+4t@+|S$s1FAXNL6l5?1K<8!22d3WB>i*pzc-$=0aVccy}W;(^f#kFCslU0chyH`1Ly<( zCY|A50;B&mBKw5hb4C7dfIVd3-w}Jz$m-uAd(c^d3fAVX`hRHl=}^Ma{Rueumnqo) zE2vTe8h5pGw1lo4I@ccvL1dtc6U0Fb{1oif|7|4zP)An};NWg&_s2grDDQt(-QO*s zXzE}|4{>18RA+$7Y3gDt2MtU@-QuYUG5-@D2%X_C!Dd(e-?3;lL+HBx2m}K_zj3#J zmHRJn$v;DSg3?mb(rj#8JZ#V^CpI>2KIonkx`)CtdDx-G_*)?IPdZMCKWMn2w7==0 zbez!f+}wYDpyRnYpdV;Il!T1~%9mXVx@Uv(`aLC79Z#V_dP!(Fkev+#0`Y>l*x7!^ z1?kw>=otPYu9pkM0%~S<4rC~&KQ{n37Z(Q?zyk1R9}gD?)ZkAWz~TQY+=_7iIzwS=we$U&?!S+P!s`D6h(o=UGn;zf@Ux^bR~{QiN51tfzOz8%?u_+(c;KQ zhSD}T@LJ)eXGF>TK0gtN`{OwrawtjqNr zu=Ty=!&Y~Iu{Jf(7qWchg&`<|%RL{t1SLLk>*XB}8ke?Pd;Gj%M2wgYKkM)?7506b z)Cc-o(|s#&`Rh0lg~C4S#JPxi2#hpVu65C6obhw`Ur2K@=FM_GeSBSizlWDuDCw-V kPGfa=d?`gW`dlS1xI->&7F}1o7(8T1_&&7T@9RN#cXxA{fdBvi literal 0 HcmV?d00001 diff --git 
a/src/documents/tests/samples/originals/0000002.pdf.gpg b/src/documents/tests/samples/originals/0000002.pdf.gpg new file mode 100644 index 0000000000000000000000000000000000000000..0322a8039f0a1e401142c7fb9d5dc08c60ad105c GIT binary patch literal 18961 zcmV(lK=i+i4Fm@R0@fmhlmb2@qV&@30X}=TA!MckkQ={n3_(37H zMcCZAX00DiNf+|llnul@B&doiE}<1}I*XA%fvL*7(d)xeJrh&(e~6XJiUD=a7i0dw}pE=Z>LkH||J@ z-((I_!zPP%_k}br0++oqJt*eP*d3?H>0DiXCN)ZN&H$z8%hvaW;XjZ5Z z)Xc#V?pBpJo;BHndD$vG_W348k56<_Qf%VawcpJgZ%9HAJ zd%2qJ2eP_9U?b8}*>GwnSY`VwM0Xi@iI;U4sZ&l~dE$*@Yu_A+Isa1rU5bE6L~R9a zz9cmUUn(p%V^!c95`l*D`0bM>Zv>*6Pxly!-CEc8^vgVw<|2!VwmxUHKUQeSD;Pmm zq9q_>+1pMQAp`kec4;=tjnkBjhl*IB)UsId%7w%6QP6?&vC@W7`|WU{3GpQnO`Mni zBmZZN+!~t{PE`{n$8dn{4y_HEKBVwQb3fH6l)7wNlZ_Y&LeyrF(6V)%7HK!V5Dcrsni z7r77Bi(C?PNm`m*c!89Oj^WP&iz?p0sb>|m(wOSv>W@MFC z2c*fiQlrE0s%9y~wCCV(aS410hIp>mJ2{U<0X2IyY3J?8U5&naS|f7z5>HLp1|FY% zrL6NnD~|$VJpzfMS%*a|KkmUG)!v!9C74l^Y&Z|1V#0r9ryU7nPiH zBul5SUMjD<_Ax=JUyGy{Qy060nTY;&=I@tMN`4ZX@{I6vTMPE9h}7YKTB`Ytv(j5T zXROKkc%!Kdz*1n%tz@|38zZM4KCIr^@SK1%~vMCJ(jPQ3PY z6xH`}@Q)xOq7q5Tf;Lgm@s-Zwg!!eDnHE-PE$yXyqW%Zkz7fIM}I+HoQ+QqO%zK>V#CFeilx_5Q-K9H_3f^FLq$#!%YlfFzAh?f)|4evvgOP)qj3iokPThBdkhGOa6&e>( zGGv8kj~kElbn^^YW?0~v6ihM*W2X#oX&l{#KLVWYgT2VUHPljrACUl`rhtwyFx#)u zscq`PtPyCf1W@M3t>Dt1Z@R^zYa!<)PC!#dBtliDAZ}9 z!&C)pN4Mg@t{_Z`fyl@a*b$lJ_Cs~Wn6j%e;yPZg+Wv)cUGXW-o_l*?0Wi_22TEJf z_l?Rs@D8tyf>p>VdAB)aazsVWkuDtn7>@p>iLtTdPniaj($av5C#iwH7rPtG#J|B+ zSY5~+R57zkbMQ580oBOG0!7ed$ve6!TQ^*`Gpo2YZK%Of=IFhbRpUO4#5~4-WgG(S z{o^Rd7aG%es%=c18q7B=e+$J%Q#-YnwsY=6JKPh$*1G4*_rJ{r+-N)k%RCWsB3(Ym zrh5r<%yBo$9*_4{Q3{*w`L8x5&DDS*!@D89`j87);!1QMY2~mD?-{$?5vkJHlQ8~o_rsk>jLV5aPW)&v0}{wf>IsD? 
zabQ*Ai&t(>?cWC*+zAAuCQ*ulIH~%P4&q?o8U>4#|J+Iy(Z!IYWj1&fl|#&q50CAo zk)j&5DJU*M^{PoH2$9OT|APUuc8y{@3bwA_jS!7g_KrEZ&+G$hIRY{U$s$1Xsp7>V z;{v2H!m~9U=%g}^{sckgp16%AnvLr)*++DQ6eOG7j}tX%Z9YA=qMauYPWW9{8yqqZ zt;?eveXx>*R0PXOgffn`HGIPEU=B`2#{UiDlCiBwFuNtz%crg!B%?cHp^*{ zjCS$5IQUo!F@&I4`9%IXH3VVd8IiBoogNjU{JG)~2fG-D9bZVO zKs_i@d9%^~F+@R>!h`(4ff^8PAc-RKhYURT%I~4_2%? zi~9xNt$5Z6^20|xO`B~QamW|t^6FS2^ogIcwD(;Q1LfIUGU?8*-$S9=M^M{eb)2}R?ub5(f;+K}hY zCxCdq0?+K)9=xxAI*xA!rFK#4Qe=gQ0fjAYQ&l>nZEj$Z>pLX0qC zC>M4kq(5&4I6+Dgoz@X`iu9bnVJrW#MzJXn_P>T`=TBI9jruhM>sFju@hl^%KhtZ& zE;|&e;ss)#uZ&z28xdUwxV>`YduD25Mt;Qjo-3%%+qM!+-KwIH}#hVv`&J27cc)S$J{j}@2m~6Zm z5FU>^RRwW>pogsIba&kh5i}C)I2({#gfrSM>88$;u;EXx$a909m@9Gmqo6A&3a{25?sgs(Z3E$KpTY10(`7cNaVX7%EKS-%@T$ zZ6#G@Zq9Ktb^;=hX$0K-aE=X5lCRv9ifF68ij@Rm_`+j4j@9lZcg}(xpqJ!IDzA$` zM~BSE!%2cB>%-y~xvebhHW4?z;kt#J&$!uEg>MdfCexUw)V#QQ5vr4rPY{WurTX3? z&?%&d-C)E1>|YBA1hRjz(p6;AqNqin(&iRT{ERpH-53LwpV~8W+mm1fyw(N?xYyKh z&YG%RL&gVbuf^JmbKn{IY5aW;1sNJ>Q=6^T_8u%X!=9t8+JF_jPeh^84}=bZ-yR`8 z!sY#`tpUnWOzJ|W!$_#{$E?v2ww-iKiG;*s!=AX1_-UqHI=+(s=d%axS1K6`a-$SL zpbq}Afy^}oxFI~H&5?C}9=vzmfO&g36D2%t93q@KKG6FvEYIl_HNmb(MM zg4-?i`t_XUq(;|F=NS~XeWKczEp(+P(*q*?kdty-Q3=jWMM$SHI2V5|j}YB+ii--k z725y8p`UVE!s_c<%j$zDuJWdKkoZ|fq<4&YWJ@P<@U}|(zg%k9s%Pf8CaG#{U`e)$+aup zova8$He5Ns+%y`2`lx%A$GOeWnJ}D%ZkMlzvj87LY4*o?(9U*CI>!3@Mav5o>^Y0ufYY452Jzji#OmHZi@*ZZub zuvIm2xGmXQY<-aNl01E*7&TcCgoKN>{x8mMZ#44!XidALDxR>?AD?yyPd zd=E;@hp0c1QQG`*Fq5Q}ubVno7V}zP$?{_S=pGYxD)m}fF|*bT7;7VrjmndP_0-NW zKXr3%WqOR}-5w&kExog-=ryPch|mygpT)f>Av7Pc!JrlTN#>zh7xk<^C#Dll>|52L z<~}FL$TDwGqj+)5d+?L6&X&Tg!1$yFgPr=0&!Q^SwMMsF2JN5TXXXG77m#a`*`;WL zANWZ_R#`!(;+W6W>+9gr)*D|9cV_&dr6rJpz*Bsj?*Y@?@DM$#szJ-MQw-57{$iLHh5NG$p%`v<9w&2bot{(%&b5 zbECkQmaEJNb>;>95K|QRI~AlV&9Hlyv6@#ZQ3p@VrddE$<&&Y!p|V>`@aW>}yzPvt znwLZVd%OVYf)=OA1&qjbKvE|Hz-*H9KL(&XLJy=Pqx3Zvbk(IKOnG(Z_YlAxC+tLa zp0-FMpeILPEvPL&2qK&}D4C)(2Tu5;>}SH)5R=L1Pq~@3-Fd2o8jY+wXdkTGKpL-0 zdWn|#a18%c1k$~bXS|ef7Afw?*4%CRAEEvn26H3P>vQ$XpMU^Y*eUt&1fz^%`_#;) 
z2u`X`emRforzTHF_kV|(-B;whf#64CM8mVWbD#OJBoZU<{D|U%#50!L=ymcu<#%D~ zG_%8(#bZJYgzEN)o)&jBh{5esc+u-)Xt!_j>B1fJ;X-S~&zL@0Hd!ws2w%A{gGi#)Wt!sd8^lnq zGF+SR@`6*Ah!=lEPCR1LF)3jwvctBsw&uizF6ps6v?X2aVM2(sg8&95sTIy!jXHj5j@5On_isF@GjVa@fJS%>p9uw$DPHh(}C~DDwwE0YDFg~at@+t`G z7pbc>wBl#7)M2UbxU#VlW;sk=H^`M;&Oe~Vy>6@nKK=`(4_O`s4~rEdA0^rEpXHFm z?0|KmS75)L1XXKyXc0P*&O;Ax`Z_Lr4|RV~RJ1BBpOM$pHUzftU!taKHv^aYXlv7@ zCCeF}+k%>-8u=6z34aYPj8;aaQ$Lq2CsClMj>8k}Bn|Snhfw=4E;U>L6Mz9=o|NdT zl!?{%uhoo19k?D^_C1{lyWqZrnkyYJCv?Lo-OVmW!byM=T?Z)gM5fI)3wgyc+U`&i z!H_n9x#vr)FV?VuKeQwGRHwss&7Qt_Mdh0dos;==<$cJf?DAc1N3veBeyle=kt(Sk zYFp{?=enPte?OzC4WliocbVWbQ_Oeq|3!tAY`Q7YTa~7SR^NJijOz%#QNnU4UMG`A zUxu?>z1WNLrVMFW{d_+$&-}Lk2Q`FQOGSK2W|aAz+PgX8ov>cp54wW4W7JszSx0n0 zhgi7hGG8eQ40Dc?^;|`cd{UG{1inY3Nm_9HVLHyelPEK;9_P>}%*U-g4JB}*k=wa2 z?Q%cbW1%i=Gw>UzXU1TUl201jVP`fvAr+wWe}~Iz7IqCR+V)@BgyDQm0YqX!nJEC( zogxq>%8F9~ei@@Mc^5~G6-+;xZ10r6<3hBu9<-n`7QnonV^Mn{$q*xD)!&S|PEzBJkGW zLY5_JPj3@K#C9aZ3Rl6QEjV~gbXK@3Aa=c?*bGgR+V`aaDQ|1ev7JJ)ht{EYN1d&3 zEC>ePZI@?93BK^PB~QasLoKq&b+R4rZHv{H$ozaJu_Wm{>B`-X^vY>d1fH9)~Ktot^bp|=5C)Bs^JC}iydVNNtqnrI<0 zYR!%ZJe1VDf0whXlJUrv@o@-y27O1LV7wb@?ouG6>X_Emr|U1Brt-ci-+D+#21hDv zN#hmN!yh#RU|FR&WW-lW20-8i3oc|^(NuccgC^ z&a*TS88Vsv=sT4JO&yd^#EQw?zA^4KFK}8{b@v)}dQf4Rq#Xwa#qI&H2wt}cy7Eoc zo?~K;=Y*b#nGrE@Pbi5?oxTlgZ3%BY(tgqn9Z_36%2Jn@2Y9pR|d)H zeQZ!(P@{eqOhx8 zT#Pq=J%7moBhD9MG&cqapOl4KQdmcP`t>1l?q>@C1Ch{0574Yoos~F&A`YeqpW^x3sx7|uI zb&TajpGd8woPlzGgcbi8Q8X!6o z<;-y{&oNAcZ9})_8+W+;7fzp?9#=#n>Q~EZZB1Qj9@V6fpNR&}N#ho;d!EL{9?yya z580ixt@sZ~4?wRuwuukDfi+4Dprm^kZZOQKj-pGTQ+)T|*L8lvcjAY=FDR!Wd8a5$ zqI0~M&LB(VyMKxE9rsjTfYJalR$ozRRYQTp;6GbC4ld>*OZ`-Q??WrjfEs8GH|c2O zXaRO$49iMs46$DcdG+Q*9@JQ2%($O8=w+l)w|9#j!6@>hfNon4NaEEW7@|)uNAa#k zbEMmCF}UGtbj6|E$Wy25wr5g`_^8#hr=~x_g4sptC>!f5APJ8P(nAkgV2o-!*QC)$ z10`K%O05i zTg}L+_9C+s;q+_HOE4+LU!ob(F3VqLmIRoSufWOegOU?twcH2M81w-e%=ipOAM`PX z36d9z#nZ#{o%p@fZ)gPc3AoarB*v!iv6XUmbg0yjzSETv;f}o?r{XGeZ3U)tgFb`N zvQFrwnxRYXd)4YDekaci>+3teKMu`OyJ63l@D^K5jdy2*7&-Hz{n$cPy(z638GI>o 
z1XOwVBN8D!2r&;vt|_CWZo4hNF!Ea4$?UYR?X1I)DRwr?raN{Z$w?S+46wHw|NP@nWLFoel>lsnwT#`uo z|9~UrY&B2A10xw3+-XqV!yZUBpRCI#(mXB0);p)$WgjDQs;}O}+FN!>v%tn92C={b zi|P8Bheyaek3A{SI2baeyt5$9aHVURut>H$h`4nEGBgM1P+ll# z={TaI>1F)`Lm=L$kY(_bAD1&|5vJq7GB5y84QrjN!}0dJJX$Yrw+)U;hiZeuBil;Mv3N<81-0rEqR=AgNNUEo46DlDV_ z!!+ywL4TQ@^(`w0>>I6{sv|3OZ70o-+7Ys4txza=JW^pjWM>VK-?K0b?G`tN+|=ntR@0OLeLif! z?FNjd8z0Zv<(KwSW{LSvKx~izLb<4r*;=hL_V^;&@bu9bIdXB6-?LWlH zm+HvRUL2Z1q^L<|d^NeBZufCIocdcxJ~F8gtXA6PIq)dWLQN=v7mX{GrOD#7vWeS3 zHWbQDBuA6B`ptkW3{nwP@qONMWI1(cX1sf!Bq%XMsE(|fZUDC&Nib?d(rKFrEGQjX z+JWsXlXKJ#nhfQ^X80M`OZ&BN;DNv)%%Z2nH%$yj3q zFZOTK70z132O;rKIzJ?iv7qgpWpZaU_kz}n0^}OgH2Y9kwA}iVi)t^%_*gI|ltdgl9pv)d3dU%W0Ty@&EVQpYRNPuFZ zVk71(biqOfC@sIyCOszc9pIoY5Wot+ZCBj{DLZC=5N9HEnolK2Zj07Pf^=aZD$Pw} zxa~RD)uzMYCs|-Y@7PMbNv$s#5K+7+8OBY;V~2HcvTRhk+X*DQ#t9&G`vyI^BO#2rtI}>;7OkT|7>ZcNmmlj3{$>~fc0Z$2$XPqs<;?Yh%8ta=X%QB5)ELtj4cP2$ zCc+vjgl+oPG}iM{2bFFmA?*Uk)1Vl{`vVPmWI_qVTfm$vLpqlmkEKhEIW_BvfAaGw2{PN;u`Lxw4=;z4owM1Q4QxdbaCt$4|Xy=toQLmAF| z2sodB8!N?fpR?tI!8TFDtq!5u=xqGtQG8hW7pM8lRRbR$_rSHH@PxlV{G31B#|f|o zuE#A*b*AoO&wlO=v*iBm>>P45^FEa`M_+@bm#NSwvCqLi9+?wk1u5 z_Ng@)JdNwo-i{BBDV3g`7=A%D9<#0Y2VqiQ_T-!u4?5us33*4%Z=|5457Qnxd+h!) 
z6;=b#(F`#*2U7i%5T=`Vl^{ho(raoVqTQy$Qft+fuOG?$Ox!<^5JxHB=lD{`K`ce) ztU90%$~5wsBekTvo6CZ|{wPAvTt;OMGYSlFwjzIqUz|?bz(tjM=-zKp6n>Of4FXPI zKE55v7=H+Z<{^89T;Kr6SF)z>lt^TgJz(O+yi9V~sZgM(VsbUZzm&T1-X1^qk!|Ti z@7=>rwrJl9E6shg^Qf@=s(oqR`MW#GD5_Zq_}4XmzrLy_70GZ`m3il}8<3ZcDk4Er zAut1+ntDVqj3Cg3^~dETYc zU9x9x?l*Df90$ng_QbyZr5H~V=@KhV?07&~2k zjq(qsg9MJhTYAYPit=T*-8`p+=}Bl6rKfQK0LcXZT{uQ{^+6kO7udI4pDyh_D$C#_ zz(Kp^>K;ITZ3oVL98&XNzp8IHJ`aw_Gk(^nVb7bry}%4cYjcx@e6bay)S zD$GVH>tHBISFknB!%o+5Ymou(cVvm5Q-=aW2mQZjP3NE0KM_+I% z8O*yF)C;@*U9yx85-f=SuMz76Nmk$oIs7cTm_1k5_SeuMNwkA3LS#cvt2m#r z*vLWF+@9%n9N0+ti`x~7gxhUxY*=TTzm>k-H7#`Cf`H>6&v#>`#1(MJ=G0rf?geZFQWMkyvVX7&(S-Wo51-O&2?OI@9SN&o!Z_Ys{ z0)AZgdc=$H)|f1aU@0<`^6tXui(|^?m6zyyXFs&V@XM5qbu{|Hyd6txPqq{38#*^D zC#0___5u7G42Z!M+fJx*uPPiveaKWXN7dnZ>7T&w^>6I_ZyG1ElLiQtJI#Le+UV z#iP>Yb1h$+vTo4OODctJJw@7pC_?AuLF6-#bbcy3+Q85x$gTI!NsifhW4qb%Dbf>a z@D$P?w0bN@8&RIi19r&X0hZJc7!twa@kj#R7^uy;0t2^G2&~A#*O0Yx| zA+o?Fae92lEr#P{n0R~z+G*l>9aQm&K%t>Wa^5^|5r9aFg!$zuM$u6meE4HLamljN z8j+QqE2t+cy~VYPq*9k^Vd~=X2K%F?b+E{f1)S$o3nhlKeP_>{j}z$EQbVYezMC;> zRoDMJp@J)XXBace((-#)kF`-O_Rblw_xuMOaL(%5-r0ahm++)eTxL3`@&-G6ymjRu z=ZeAH9lWCdCd#}mC_+y4EqwbO@hUHHPX+JrYd>Qm=I5GA*{%KxRDXQ_ z;7|HOug3}n*IGMY%y;H2h7Ih=CQP%nTmWe_%|3IP3+CQw^S|0AFOl|-5w@CSR8@5p zH3s116I@XIna!c8@*0}}Wz?rn*+EHaqU`2xUo!pkm3q4Z4FR)9t`6Y<36$7>K;R$N zEa>L*qp(@EVx-!CaX~xwp3>6F(OiZQ!@id4WS?2VUr@+*j#br`M?knSd#Nfgk5XVZ z;(i(xV}V@9i~89%Q`bP{f)rLd7+qi56}NncX9+7Xd`}YXrLVA86_}|l-c0gZI=ezs z@j;d$+j+39`-aiTgyyG8?_nN!gIq2K?oVFyxJU*TVN&37 zY?txVTncOyem8RKH7=4I>sJ+|%V1V|!Z3C8J$W0~%PSA^*5LhdoK!e;{ge%5m=F|( z4szdD?_44u@}jYpeAJoX7iuxp4st9UBBsIBZzDLjqI+1ah9(>a)T8Nzj;%c5V-`gI z?eKTik|RS&n4@!7T6dwTM9gr@zC96nNGUJk8i4h@qZ$6xtmRO1e1UA=qhf|Sq8RYj z1m^@W*GUbeb4$;n$K-s=k$CF6X5#cb)-=m`+D@X`mgG!B0pk@wkzoz^4-O0}lG8u0Ega{Sx!kRna*YB*k7Rt>Na1P__tpBh zJ&yj0eCa#M8Ie*O)e>|EB^PbWXA?8nxh#5-2N8)fg5 zN-H`iVeHdNKr9z8HKdQ+K3KwSq`tp_%(<#Tjs-ut=VG}J+m$X!~7Lp;8 z(R5s$Uq;Dkm)0CL@WA{t4*&U#F^*dNn?|&7WyqVffNt=-c?#h32)l%}V0&}@%hg_D 
zqM2;+@{V_5{LG=vDsr`=?>#R9o4X)aFJi1%CzENw(dj+w0$HIDoWVXL_FOv zQ)Z(i|7aKGfi0(kAHFAI{Kw!P>M}6;<|@U~hv_rT*|M8`om3^&cx#tPxwK|L1`U%@ z411)=Q$nmFzYa;cLrgXP14_9cLz@}2@(h#BT25ab#$pu(ZNq}{Sb@_QN@>S{tRVxp zj~9J^K>-uE-EQo;*aGnC=PC%VHDjWRC|ip&s+nZkJG0NKL}i8q50iU%e)YddMPr|0 zdyPz9Ur>FF`y7Ba)JW;ZK)@ob3CK`Gd%!n*|DmT)HhafriOdvLsWP!G`M_HiJTG4c z#8&AP;>r+>lIpwyOI>fm2*pA9U%v~PQ2_RAo=WWfMBfYW40N^GLYwZgUV0qjMSmTt z!%l2RH0LIgl}Y?eQ3?Zj!KjV(BJ|tJ3#lkAB8F&;S(XhyqW1XEL9!uBVrbc41z#j$ z#&a{N^`K~6nIJM2jU{8{T($k{&9IqVLs}d7&%~668s`Ux#ibC~@jH_J4loMj59Dm@ zkn}Z)K5AfV8RtMWOCbIq@;72oSSCefj;mZPDRZ z4-;inx0qy*8gDF~szSD-#k8H_0_^!bZfp39R(lVyZ zYJj*^*&3y>@*15XV{!V4J^?j)G=&TrjI**NqcFl^nTyStG)u%nnZh#CB$ye)XO{T? zw;+eg%}-Ni1ZcsR$bT0YCnw=HuZz4Yb)pvhEG8vwvLnkT28DzZ%Soj>R+5O^&l!Sd zp6I41I&A~W&wW3OtBMP~{pIG8P|p&6@tGOLIZ7mE*FOUWB7PPcjp~hmz2w2Xgc&Ri z@q9gXHX8I09q|m1slLN0*eWnX`>=%KHj4`DN3wzVgnj>s1B4}Y8;5WSHaxcWrCu&J zCzg4AS?bV|5hmB;2nskE7(TZ?Y&IOH5@+ASe|}k*NE$Cch(Cw((|^?c)lUDa1$hn8 z%~P-DH*^OA-P=ow*>UODU+m}z(D6Nb3?6R?nuFsx5}QfX7_YfW;#D-)=epIveEDm2 zS<-!VSq*v&?w*8=6?WW}UbQvA z?91D-__bhW%nB0@5i%_NkxwBb4i#7QW>SY&aLFSK6;cWP6vZOT5bvzc@>4j;mg!`= zx<1)EWj4lH_9Di)WYeD#;t1yG=L9Yb^LC!50|xHpCSxk;swh3Ayx-=DqcYpjmT6Ev zC%vZYbZtVw&8sc=fkj5A0=S2x7DDBw?*{Wa=xiGs(MPG?$2mJr(%J|BF(WNK!^eNA z45L`#7?Vfr7lNj1;kR4#)Znabd4841iZ+6$sHundkpi`o3&H?7^+h=p zKg;?St_&g%nfa4#9z;Yi3F)Xzd?Ijx2P&=;$oCCm*Q6>z%!`*uX9+HvM zGs)#hg(qMFtA%g+grUexVnwj%ZW0&Q*XQDZ{=uH`sCmMep>94<2UBn9@rrqV#0(2k z@X?{lj#dnh#IA0?ziL_yYK_Rk2fT@M;&sHM6;cvt4vD79(^FoNJpVVy^a=GK9aL{B zym@l0EM}h{FOTZ8oM)P&YZBuU7iK>knJb8)iVH0^RLLJ6($)MqDCYBbj?I>C8b?fp zf@oj*t+g03La8~rMw>=zhhUY2ozSZYELv5YpY4O6Hh8SQ-A5HpoWS23OuemSDx#*g z0SNC=vo+?cF!KBY*zCJ-Ez4kISa`)rp}ay&RJMXcX5&!T2kaguhxLd}d`WYjw^P(z z%T(WT0B%734b38+Aq--+f6el_u9<-Kt7$wLknxQ(R3lrOKb!)t2veJx&vD^RBY%_i zx~xE|zZ&wc)q1n$ZroGNDY&9Q27~v<5lgT}Iw;6PLVJGU6pESx>?{WdAEZU(9wm2` zPe&Anm>GYd47cOkXX*4*;d3Chhg}9#``$pdEgm~!iJ6mhg~hL~Xc7e2%bm;_YCA1~ zSIzyEAtNLD&sA3lj_70F<#7imLe*u>7a#*~(hlaA+x~7mtS+zDJN@`+~n0B`MbWW~`?0omYQP{O;#UKz>XdxIfm8 
zD*^nG(9QYnxTQ5IS_hwMG4nash)Mfp3Gxw9XrY*UguJW)vdmzKC-r(#!Sc_bP)BKy zL>km3by%y*F6{P4_tm7e=oX;W{wbfgjv<$djnS%$bUHatc1>LOtp*~Oh9&TO zTHoy}d4$Nb;6QVK2|?1~)Y#xCe5^zsJX>MJHDsCW;GZ_cFn!MDY#yKChZ~@Wy*iw6 z%lC#bsP%eNaY&Mm){1(vgJ>w2T0^#<2YR+&^?TNz^<2VLU${a*BuCsABH|Ug!2DEM{Tc*u5Gu>v$7F4-Ziu#*DcW@#ee=bX z&ri;DxHt=#yn)LzLv5U2{7LChB6$dy)hlqKQcCCLG^KlVfZX8I`Q015a<+&R@7qGb zBRum8&B+5kbvQ5kBLc8_y(Xaxfa4jbc&lSl!O!+`v6w-UvS?KR`!z6bGFaQYpLF~o zJrwq8T-nD-1b4je*!m)g_3nskQ>PZ(qzh5JB(9|y4K@n@v;T1sJV4CZFx2zxqf^lt z+nB<2*YqlqXZ~9iNz9cg*|BiA0&q0Q^Xc z&(O2+hm=`Bt$;?^pPn$1(2lv-S1-Cr13c{*gDK_iHPe(h>FJ{$d}AoQbsMkwYnQ?+a%|yRBpZ4Oy2r7KfdNj7OHF0NjF__b*xk-OG2xYez=`;%2%L z^3=;3Mqq6eWvotp1;nDE59HttkhkJuSjXuJTfvu>A~73qol!$=E<&pHEh&&9cu7xh z+Ah6DZBwxrM6hhtKV}xz)Z%Gg3DfO0c7-Lq=V&6K=m2=R1z9xZEc~#7Y1AJ*msT%0 z+wmHWtR-w(k|#tS?Z4UYoKf)l=M()x7XZ7u!Ncob{1@}R>_^kx*5Iw+qgrw z2J|Tky;rwoXbb16Xem!QL$u@uc-fNWR;nh9L{P@=b0}5m1PJnXC@AzX)Bg>K=nx7( z$_N?U;4YSrr9lc1Y51zD=x}^YF}#26Lj|jUtqdj2-|dxuIZ`mb+=RRrO^)mlHepgv zv#wT9jQfQ>owNlOBWsh#S_l+(W=AOzkB@~+AWVN;SG6YNOw1RwLvuAiG)cStRsNjUHt_Mz_J3`srV{KuC!1WJH_9jMF-Y#xLb-sfVrOqA$z-6246 z83#Jxc*@uVb9(Yh!M;m;s}wmmC^ zKgwoD4nB-XbU9!sCll`N*8@{7=FKaS*aUl~601X%Yc={jVFpG~&d95M&dt5mIKy`n z2~Zu7_E)*Zf~OVz-l&9F1bI?u$J&knC_JKxiLhrjas-DNYQ!CR=Vg3Wf9IZt(P&wHA8AlQw!TUd8DKtLo`ccatH{NObkLnSZ@=}WaxJ9)CT7z`{~Yn{FyLw zIJjGQ+QyJEs%Y~w|70(KVpUG~#j8N*<2})BR4lmzJtJ`+*k1^fockhHD?Qlyc$JJw zK;p6`zm?Bp>lG<*QVF7UL$Q)?Zqf-w+oFv&-%d1|`3&M`#u89c*kU=|P}Qd2wP-JK zD?)UNIs?P{!AruXGCYn|RYQT0DzNl${s^#)VjU}nKCSEDHEi$lOr-c6g;>3AYW$=|0&ZoQ!7NH1f$ zIq>W1S%}ra*isOPOWpx^^IK$W$9CYf#YF+zdQ$4jFpU~)1C{A=vNeF~1#AY==j8s@ z8r3qp4G_0lfCbWm`4bfRteo@JhwVgZVpVZe2X549PkH(mGn2bUDuXL-lC?D}II0eH`64;S*Bco}2pd6r z%6N^&7Eqg1E3q(eClSwdbI-e0&}EAa>Kw?)zng^Wh0L}==MlPxaMSpWU^0pHOk@ki z{{Ix~GD!lxc(2(VbQL%p)&lCM$sn%YrlR0IfzeIyev~xmRR|0rs^6ljIR+AB$~;1k znT@TceRTXRXiDTkb~4|PfS_ug>FB5bwO=5^m5viMnHyv4vR61osoZ$J@I-fE2hri; zJVmD|WxRWUIG^eWTQFCBSjiFc#Xj~BI@Jin;h;lK7SZi9E+7=!b0~11m zSB-B;Y_s3+#UA^mVL!+U1H~o1%evA}V!J~T=Rg-7o-X!4fWteM@XKkd%(~{jyX*&a 
zB77ZQ5+yLn5phEny}}yoxHD>W2z5!l7MBv=Usci7d* z^s`Ay^-*?^cIdYh7J(g(6Fj5+uX}6^YdvIfB>k>x&A?35$`R|pE4MLIy2W6=u+Irv zE=PRWds(qSm0LUsa5(zf`E$rh$3@vVQ#hz35S+8U+fKfQ(GG1LNpt@46Ql##%Cj9Zd8sD&8tx1>!S`)N#XX`0f(Dy|t&sA`s)0*1xcv^(x+zv+&epQ&R`E=_Z zLxVjZW|U7XifD{^U&i2P+>HSpl-l0c{!7f!i!0K0>xO|(3U0)G<>DmFr1aMSTM(jP znK!3Bx4R3!M}5xn1IQlKXA90jPLMYIC6R`?PMn`pS*xkkvI z?1W;VC+ivxb$X7&yo*N?rRJ~iJ$W=19#&I3Ku!w1DdHiqc#H%^XG->%S;B(U?i2HV8qrXq zz>*t{9L+2~tRo~(+$}9%gAQb?DEYpr_v@dvN|pmdVGFN=nH7;Fg0<;#-OUKh-K@a{ zPY)xi$otN>Iw2B*uT?R^-_mxTMMM3^%c&K!W`clTj2%QTfZ# zlXBGpX7Jh|2Um|JIsSFcUR<%!`$|))x}KtU_^arx6Z9sQE?PveirPcYyBv593xkkA z{N5#%{MZ0PCf~`4xKLX7Wi;J;mL; z@5SBEQB?A2F{gI{GJ))EsV=nKvB5u|ol20_4aru$l^a=}z4&d{lYEt)Ng}(}iGugc zs#_w-sV?lMg)4={bRPrpA!!2%@WdzlYVs8PF#!#M!`s zr3}1nbm5?(%g$6-q!Hq3(+=RBl#k*ywq-}X{f98S>2zDo6&PU4HF>r{w@|#Gae$K`5-!? zG{9cR_!kJ!Hxmc3k2!-_=*dSDX9r;6?OB`#(~#ugfUHY@#O)nfVWKVqRx`T+;nyPH zsmII51!50l6++-^Qb({gN;AI#?0U5WgrKII7=WH9N z7ke3=sJvJCi6CPOQu%@55iu}vPZ~aVMM|E?iJefDi46aGSR@@1yR|HCjYvuk6F4Nz zAgRXAwh?w6A-P8LCk5*^_x6+jBC*UEKU2Ky+9}+xzl$$loQY5mXN*6w|k z1b=i*a|X7@oZ@c_we`S)HPCIB4M(|vjgB8N?m=IkP0Wdyo0&4)nqmB>2yoOjM5!sl zSN~qaGQgnpUalvWmhP}S>bnV0&WF6j<0{!HdMm~ln`DH0TP>0y8X_jtH+Vt@1I|&7 zX8TGY%}+B@iD)QJ6~^BUK2{iKSKQ9#@e`AX@F$hYxK*7F`uCn+lUG!Dtg7T2n&&j` z!ZL|^g)WPaPSRxnDNm&EeL2Aqso_?`Eltztc+P%`*2vlHcC5yUjRq?xSAW1pxu^3q zF!i=FP%`T^pw(Nc`uUK3-xy9W1-EQ%Eg5dkOV~+gRNqjr<5aXu(9;%43bP}ke_UYC zCTVq6uZT(tx)@ZSJop}jNJ=q1Hj5LU_j>$w27-H~eM=co3EQH*j@?uJbYv9eZVd zU+43E{)y*#{BSt8`@Y}z{eEAs>vdh%J3>=knVgKCjDUcETvbIui-6z^G5j5Wjs!lL zq@Mf2ud{A)sygT3$M2j)IDDsbSJZRYcCvE!GIg~iuy%BEu;g_!ceS*1bhB}CUn6Rf zflHC93ioxq({U3>zlYW*r0W_S5x?K0-C$?sVP$=B?z~5SZy2pZ9)|N-lDk@;Zi7uh z;=~e{SySnlr|r`4XY}wVUd$&&d8$7Fb~=Cf%v|F_EnHLP_g?k$JJJl|5)x)}-%<#IA|uI;P7c?SW&9IV- z*E}MgNqTj4bz!80+1}p1kdcf!rS;=0N*HjU3F%eoFivY^WaRAP5?oPnX9b7L)Z{d= zvx|E7j%MX|opx&H!~|nW(=Pq5=OSF(+>a56h{L_LnUmu~m`AyQ0Gaaga_4`WmTCbs z1eBg}x_ifalloF5+QNzA5#inSJgGE|+k4{g!kU>iT zql}D473jWD7& zWn`G9r>9HH%I=pKHLYiy9(DZ@O#WE*b&Tro+??4Owy8Gow7E(?zd|jEk3~cTjWMM{ 
zDEj(J^rIxLaRT&=QLDFYCY$}6Dho6@r$OTKU4ub+qj4?9R^4d0XYa$+OLF^XdTPkLp|gde5uq z@&EGlOk|%@S6Aoc}4FjAAguYKJ@bi*&O-e*CCAuamFC1<#c- z>!z!#`?0aH5la%v&CMMmml=dYEv9|m-}kQAuIfUw-Tj!!@&r{-TYGKNZ;RX$hY17N z{ciNR41?m=V__hx%%trFeV6!tm+VPq%}|xoBvbq?4N>va-rfiHGr2$tEgF+y0XLPM=gTX^5Tw{`9Nz_zNS2h!9%dXDYKCk5Z>lUfBGy zXzg5e7C}L3*^_-XK))t%LzrB!ZqYN-)*!-bvH?=>k*jyJ%6?haJ=M`M0X&wAM@2<# zjI^9Q(JgVC@86$cKD801)dsY(;%{@bP6U%3+}J3UhknjULndrDe34PYJ*cHc7Anqe zVt1m!tIV{W?p^W6h>E-o$;CnrjNe*SIT_|>?$xMXn`R>1Ldw5N-Upr{k} zHFMw*wXLmfcTdlKAD_DftsNc2j*gD&TU+u;e1=nNYq7Pp;`o8OVi@JMwBD*C$->U8 zwVXEhFT+*k1^J(&A|vHm0*01Ba!o65}y_#!sx7xMS7g+a3)-IXg> zY|x*tz^V&4VWm)RrU9qNUMr(zL_mQm8XB*(@sJ*A5htvHg{5W0;!tkXr$=eMb6LkWBgLp%-<{7`QGnzl zYFb(j0|Nt#p^Nmw-6lL^V`HpRQU;&b78W?`J(f90D6Vq?1BVqA6>YySDRDx{c$r65 z^#070w{(G3FbE9|4F^t%1XAe3BB#TyDnPeE%_Y5m&xfgTaw=ukk%qf(S7&PFD;OB0 zYjV5;!ubsAyD^k!d2<8$TDC2KLm`$)`fCs&k-{qqMhlz| za2$Wj(cs@&z_I!C>@54E5>3v~mX?-l;?5ryhH_)=o1er+UF9cZ2^II-FTE!6{N4KQ zZeo*@on3*K5h=xWD)X)w^uH5=@C5Fsy-S~UxHdNX_@f*r>U$0@6O;v>29){kJkL5t zx3%5JcanEWNl`y!4j+bwh0)V*~3HRS^+$%CRc)lgGPpoA+(0C*(WWqzS{LY z=F4Rd$e@vhH8nNCsaK#`uySxHJxmjQ^e|Oeb$bb|mb)c&xKcqfUR+p6LqS2o;b7m_ z+pFO0T&m+Wn61jn&#$4YoUaqvax}|tH8~0-S@8D7!h+2PJZ>Zq&zQAdrA^$*VTx>E z0BX()hY^WOPUgIH`Ep08X}g63KcZa0mtRouu|2jCVSTW<=?XApc{Bra{PykJ&}T)7 zOW8R&J-RP|rmg^!j@T9y_Z@e~v1t`TJJgc=o+g&$AM^FALR@;fLGCO^C>6BEcf2KW zY|4?qAB#n4C=~Db4SF>)Y`~!d!mQHopX*~Cotp*mad9k4u}r+^in225jg1YG@h1gn zGiW5`wnl-+9xhvRk5Kiw>ZnU4EhqKrTAGGCGTPc&niD@{PwfS6=Q$o9?3CB{_xBI( z(b3S{e0hhCx#|LxnM2rr_>jhKq&NXhPEL+qJcU9QJTR2*XDY}h2+|p|L@ZrW|&@$!|pdu<9?HSb4r?leb_NFrEgXKJB%kKS+)YC6T(K?CEqWag zgE;`|wal>o^EE;#S5V~kXuU^4A@1~KuZ3hBo+81Y^l%qfQ0D08*I4Lat{O@uhvClT zza9Sib>S1OQtKZ~@$oNUIS~_Hlm1UFtgM!b8XU$&QRMz0$M$EFj1?Rmi`9>hj{F2Y zSM(6aK|w+E=8w4&K7yRnnxJM=>b^9&r1bh4i<*klWOGrCk;8DG-Ed(X+QZqIDl9C_ zcX@7pzHcdpL9DpbQ?*5T?X@7=#c024moKwP4)cY!X3D?%CNv4_q2$X%O>MgVmClVZ zEAZNIOa;2iuEmk^{Rl zv9e04su6`TRY?u`z_c54X-*Gj6M6MnSyF!uf_1wyM*UOba2GXCSO08;dl%2TqB9oM!Zn4Bb 
zPVwi-w6;;iUl$jb^5{@@#GU@TZgVCbkyl8Tr8d4^E-%j^E&N>Wvt>0_R2Pk^-2)t` z{kT9zw%*6z(v5ZEIX3T0n7U$^~ z{pG|#=z^zxd}1OyBo5^4w*@$WlXRP1W?r@(d`}gThoZF92A12_zG`{%x=<;(*ia~L;WqNWaRImS7`(g^1_SM47h z^uFyu?yb_Hl#qtGO?zbC&N~V_3g7i)od&Zb>|;CgwR691GVJW^BqT70nRYX9aS%VN zESW&paBrIG{3zx(s10(N>Fhai5uu4@SGSY&TuI*kjYgxVSSas#QNG$WlxApmEh#K~ z$9^X!CgvOkg{c0aa?JI(_mbk`3-;%p4>lq9=*J0zpon`mX_YYBG=j` zCML2EhRy8kxW{vxdVST94{7~}Y;KhV?zc0)uGViVNlHjau}pT*He=s1(K2Dd5OL!` zxC8_QB;T>7=zlM5my_oXcC6_3ty{|UvIIfy567u+hAS9`qt4RbcZ7s2PJc>xuGmxu z7edctKUBt4JLtzNGow$E431+y5I7(0yCCX7D|&=)Z@5l>Ax4Z{_earJcR8-J+R;h- z3Y(dkt)rhBtCn1$p#cap0bv6bl5(i0heuq#MSncXbG7m<*qFHz9w7YvZ%xJpVEp}0j{b>IlM%j#XG^ zfgqCbc%Y_MX4XLl0!?L})b@5_jCPT(`L|?2{3E%#xj|zJC%!;C1zk%sU#GjjU&+df z11vzU7bik(sM$b3K!A#vn7GP!L6WXGb1(L@X;8s0D=A;|C11YSU4dTo_)v`Yp_}IS z%A^UX1-pq6c{#a48foc^9K^am>)fnCgk2JI{N4Iq8o9nyP_lG{ybB`T>U*kX&N=2% zopfkr9ndOd^BaNld5jvb1qTO{Z_cS<8mbG7Q*9{7s{(}`Fgm)T^H{Q{PoE;{s*$0U z=J~*Bp5wz0su{8lR+oRR3pa|ank!^#nygQ?si>&50+$E|9C?o&GKku@;>K%=XeLez z*2D`YZ|1;&R!!fEWNb1b-lwJM^KaSO9Lpan38OVG7INd#V$pMdD+dR)5SIoD5qHLN z3NQ}FKqV|$en)zrOSIeDvOfglJ^Fk`V9etzCDjt}yVs?>e?~rD6!J=+ot-VOSY~K1 zR`T-l8dLRrkfSx9Tw5m+dXgz^^GHeQGj-#YW)8Ot|w)RGEm&Lha;&s@0czVJqsA!bRw>`cA2<%pPCR|;qx{meAi z9&Py#P1E_yopplSl=v}U9)eY~VsxxNXnm59kRXxNj#fB)+YV|3F$NOF6L$pq$pu=q zgu+44ZY{y{s``EhQDMxn6l4Jp;@;#4+%A0s%(T>$R4acY`0({{4BZv))yVxyZ^`MOXGB_*8c8Wd_#+Fy?O)z#wY( zEx6ykdsp=b`gzqrW*9@(n>SQrMrDfq$W#l=--x3}F-f8trQt0b7?pWvLV}}3jlj3O zNgA@pG?V^&hNwSirW~QWck9$~A_|_Kp2-5HFLY?-OrXHYJ=+oElP(EjSZ^YLl(LG- zIWRddfnA4{y`#^-GD#qJc>+%US%-Ll*v*-5YinEI+bdZsb=MP8he91&#TFLw@EA86 zDO2&((Q)YpcFW&?_IhF>X*K5#C>t8PDfwz>uB{KJH6)md<=T;-0#CLwM21W9@>>7)#OIK|3ki{r zM(X?d`~0+R(;o!`jltG6ThYo!*z7jt}R4WbjKO zw!o%c6Sdb1kun|X=}{FGM}ts$laOHPoSG4EaK7MaNn$UwN})xH)Y#7?cRpr5$oT?R z=S@n=ZDdn+_E``VkRu(-PrOq&6wfG|38>0994YQ@j9s5zPDB^uFmJI8;B1!Ar+3hW(A9kjfHoWWs0c2N22*86N=Zri z?%lf^w{DS+6dTGbDUob$ZMm%ezT4!IY+k3VpwPXlE@s(Fb6wK&BBWZJJUj}(ir^Rp zUH^8!dZvr#EO-G376_->9#lf336G5Y0^Ski7(S@5u(qC=Y6}7X9*^r3*+!Am37rM0 
z{G||4aU--f{jQty&e9WnuNahdV>wH4T|B5}GC;q+sxBmi2>fj*G)(m9E`w8JFjR)g zY;OW6#J>_=s|GxV^)#T_6`h?0z_yy0n}@u3aprdu@@h&-in#0Fb8_B?fo;CxF_IG}MrZbwM*6&|=zh%yxL5E?d5aY%Bs= zS0wb)k-IN#hd$B0HidB|hJd@ny|6yY;*ydKxE6xYnwpw@$gpyT@YV=yzyh|Zp-o)W z@wZ;CB@`zwH65Mp_Li@&ZzNb9L7Sg9#+Ds0mDKRYe}Fc9$0y6oUqSv9k9S)GAeE-q z*7(58KPQ3YKrEUEB&W_09;h7ut?u)DeM;ppb4mfO+{@Pl>K^DBd84gp0 zzgzhTxVi;QRY9__?OO<R46EGKlI<6Z3YFODRN&=D2&7tw0=L`{%Ws48Nm*HVuh;Abg6;_L1 zeNK*ipu<+so4dKY(_G~@dbgDO>C@6LI@y56p5w*G=4feR5JLFPn;myXNl)SZX1UCJ z5a6`U{KkwJ)YuM@!2AgwtmUi5bHJX|y?gh92E+>}8{<<`BjGWpfxg5&mS`X+3Wc~I z&kJ~w3@1X)gU2@zu94@Zp>`V{7u~U17d9yGGM5=8s7?~FL%<>Ugb`?#ih+R}rBeeT z0&02GPLiis2Jo>km~#Q2BSL+ucr9Hxf4&X!i5;xt$nY>Nv)?)))_aB=pT8n^SK6Ff zj$h)_OC7G-x;oV^E;is1AgiYK=2W*jN!SJX@m^Z5(Ob}S_4oN4tq71tnGj<@F4u*( zdaFEo`bGLRoscD$d94}wf#y_s^r!^y+GHMH1<~=n(B_ zdJ(dLfdROl6!u@DXS!l%K-1OwABsY?oT(rdnbRA zj04&rNO7L;X9T{;g1L3}@=}ISgzEbB*8l~4jxT~uzbzzm2?$=V%#0ZJY(D64P0!7} zgdy-^*LyC-I@_{HE5FovS}vsQ-~N6p)U;=~lKze)iEdF1*lD{$V}HZ>m6g{r{s$i* zq2fT76oWs(Q`p=1_%gc@zC}rXPR=0ULx{&=+bI+t7|&4i=@+06xGxSW&WrweIfq+b zwp({hISAMTn#NDdN5v>-h$`|mJeP;b!5MVIdrjeRylBb`7upAN)F3WD^DZGFGw}3e zkg2XKE}E*5l%1Md8Vfs4ntFORXof+Nv;Su|uSDkmvpvB7-uwTZoy{X=g#M$ne+jJV Ruvbl>s;I6|ENAlKe*j2SP+b53 literal 0 HcmV?d00001 diff --git a/src/documents/tests/samples/thumb/0000002.png.gpg b/src/documents/tests/samples/thumb/0000002.png.gpg new file mode 100644 index 0000000000000000000000000000000000000000..8a61a91265589106b192d25f557987f1a574faef GIT binary patch literal 7141 zcmV-ATm6l_M}ujn^ej{U*A*4iMGlxyZjTNR=|Hbj6KqVKrn}4YI0X=vA>QP z09q`2C3qD7p2dTy?V21;m;I_KA<6{MJ5A$Mhs$lgw&X+W9-gfK@Fj1h-6okwH)^T{lT zXDE`v{@K$FfZcDo#xvr)bC@f`z$$$_J5|Ilv z5G8WwN>}f1AI_i>N`g|;i2;oIl=F;B8ZPmwre%#SEt2`M9%`jJEJdJnmH#?nsl&j< zFZt3aY(3n!7tA*{`c$2Vi0$0!zE|J&ATxb4#mh#EoRqYNT5ErP=Va7}FVc22qMZz{ zw+-k%JZoIYQjeFY=U4jlG08x)v+I$3alVi!Z2q4mQQvkOUc}g{uMLu}VYKVcmrT|w z`Jrut(V8o7Dey#a7GgdDZ`pgoIfX#`hR}6sAo4d0AF|2vNeQ6P)4@@>>K6Bg1S7yK znuecS=~|bp3JD_M*LuUb=MGvlb5@GV3np=D$*8tf%yYkFBM(O``wZTTjAAZ%0}&U^ 
zZvq*=2i-(*oLJh?1pP`CPO===uRjijal>P0Cg9+o0Y~c7$zD>9vy5Rc{o9id)YhB5 z<3*^7ci8(=8PGg|CU`55ja%lp9uIbNc2(+&8Qj?_^Yc^WJtqL( z0s)d3v3V)|s+mr)m0rwj_*3H(SaoIqJ%f)y{nIwa@w_T}N$IR>Xyv&aCwA(Rd? ze`fFLYAA^6eYskVm6mY0X#bqltC=n)_ZTH%esGw!W(iSz)QyHq)3>~b5qPM><;1Dj zu;9hWyALftUg*8j(TmYfV-#~-(co(kg}3-5egNC6PE?1IKir1k05=n^KC()BT;}~Rl*_Ok z$veDjNUwWCezo$Rhug8Xu({A7V@n1htrb%-T=N%UuQR#t^jnS8;c18YTu~z?is$<# zQWYE^N=eK8gY`^XY%f|C@w8xr=5?+-5p6d|I1z4Tt3{XbUEyR~)b_5n?8Ry({* z&pQ4s`jX&msl3iJI%N)t{lZ7&ap*bzUpT5UG9$4!%i~DUZZ@(gM1)1&!{Be|GV(lY zTN?IL|JuP*hNC4#VRKYnPa)FNP-xH`bo&DxqLY`9)6=xyre<>o0&?fn^>Bs&H}D=5 za{b(FYmT~hSeJtap8Te1R@JJIVO<|dE=OjNKGfjy`82|NpjPf})D-YWM^LE`oM~rV zp8n+$c(*l*)V6z=#>D_+3|X|l^^?|+N}Jyw*z-e?C5dGi{uV}wCUbKh6sR%Zgv_fj zgOaq$d`O8Vo`avpzme!(snP}BEKqNf;QlhU+&0;OmC4v(ZbV&gURPMhuHD5)IGa5& zk0qs*a!0s5_===)FY#xQIoc}g(8--1%By7mOZ5K9qBMhMe(I?%v+|?T1_^X6Pz|l0 zqHU4Tw(b6*^P}3o>~b4R$&DYO!ylA8NS;NH1++IZs+bJW9qX~|UW(gNKT0F#Gep~_xm#CF@Q}sWO7BXW%+QP0j-jV_7?uBO_ccT0>Plol(3Ytw>_Mb zYh6IXl=pmysCT2Aqb;bq_rZOzJ5HmsfBvio<|z1(sx;hNwI|a)+`ilFQ6(tIbbWDP z?cQ}KwX1E#X{rUVd@XE}$F26cmQAd#b!Fe^uW5tGQkVWjd*;xrabPSj@_p3ySE&P( zT&@AVxQ4*Ns4kVfEg-=a--`Y=P63v@LLYEfh-a5(OaG7$(cM;2J(ZB zx%St|MjCF}Ykom-3>Ua%GqO9CZbp#MyCx*G`yexy^Y{4oa%N~Oi^g9nwLbK_G!EB` z>AFmtF(yBs);{cYXCw|OsBttDVQu{gSviJ6ulu{f8@=I~wpjEI%t+=h6rM(vU3UYE zg&P;iKJ@6ZvVzGQsCyEOe%5`<);lu9a?m9UO^*PwQar`nj-wm#@rvOHX44KEddtv+ zxrIjdr&h!G_5x@A&w%q41Wdgi{_mSf2Bv}szYqXo4rgXK#YWsYb~1rt&U8F0zUK5N zP-v@3;Yo061ZDvCm~-;?`h4ZtCpXt$FlzP?4tm5X=`WwA{m7?mswvI&gQpAhrnnhN zO;+)(kb55)4Ue@fp8qk^4=pr7aucczu%m7iD* zLS&3k?qB987xpw=vZ4{7ZQYKvICMz^mm@BiLi}S3H zsdVJyZ*e_RP2UaW!N9LX#v&CW$5=(`h`Re%G&V5GJ|`|m5e4KeTm57iI<6rD>5}#j zl6d!gXE(N#t&TKfvFxufv(pT0_^N!C(#h=s`l0a8j=|z>MMqaLvQFVzrz}zJL{09wErFTQA62|#rd|+kXmzFP8InmP`OBAn#RoY)I zdJjnD6Njih4y$RMETU3EF`n+{=lJ%%c!SAO3#nSnp6Yk_idcF$<}&gKdT6@$e^1lN z>_`p90Hu?>zj)ZsCcoP7`fS(C9!eOvY^$P!jBUdIt-7RHUhxmgp6>BBPcZ*D1#Qon z2x3Sb`A8sYZ>w%g9sh}j&N~ZD_C9htsPvo_vNZ+Jsv)l7xxPII>~YjAF>i!D!rMsx 
z)yjdc>Fu%^0WW3suo&QBnb!I7TxU{dy|_ct?^X3LGkyOI0XL8o48}MHos>4;ZV_h)@4=*ZJ6+}wj0ZMn#s{^nv1iImNa z-l90Od&o#WVc{{h0Ej!YHptV}WNzl*gKqjT2^fzpa56IrJAHjh6Sk)JQ7~34yHPK> zbpDd=xs`qW?LeU{FnAfIw;}Ge4l7O{VwfE~4T-kFmd!O)eddNAH(3s8F_ zAB3CH-1q5}X;>KrZy>EcgFm26neI(@zL`Qbx8z9S5%rzC+eD=BOF!%_hNQS*jY)zyh zqpA`dqPCg+-J#kSaUb7Dl3ua{5LcCuY0v{Kha+JrczUD-6x|)0Fu?!Zy}Pa%1u%_E z-1wd!Af$#t!}(C2bp_0A~%(X-8ha z*Xfue(d1?GAl|{9aZk}B*Q@Kjhq!AzhF-KZaEp0UIl(9)mf{Y>pzhSuRPH{#F4&3A zcb~X+iY)CO~fvES3jMBaii7YyalUGaVRcdo%oGKV6!R}(JVL-{Rs;C#O$!#Wk|lP z2&2R>QVb^tGvlU`=|qHM1zqUnC)~Po$vx%RkBvnzp=UG$czO#KT(VwT#yVqt&bJ8Qby+8ac(GEyX4R0e+q-C z56~(|9r4;W=9tRB{YxD(R|})u7{MHfdnlWH5kmvg@z3U4{eFdE0)>e5zzs079g3gL zkkElz5?<@{tz;mo2AZC(uYIpc&BV8Ko*$Kh7~xPY>KM^)I}ZW}xLiAMyZ%;VQ~+_E z2I3=co^Ss}r9B;7;vqq1$fp@c6*oyY9R{U9{q?59S=s9qf=D>ewfNwcWtEB`Vpc`n zIZX83tILUAve_MS>mk(X_MYveKpi^J?Z6|mdu-AGAy)@r8a@01A*+QgR~}>zo>h!2 zdZdI_EtTX?Rx>1p1hH){Wud;1fm;^7Pb5cLGXT8oS&uDXFYPigh|o6H0d`+02v~jI zbk+jO2nwRP!oUV>SL!Qx{Ep*rG-r1Cn@vY;%05;n8Psg^9-|6aFdD2Ai^iwzomiJr z?^Gzn5-*954`J|;H1WUJpk&U0Ji~8MYVq(kg_c??mB^p#N)_2UIw4IYrCoy!S8X%X zDNlpJ?M?h4HIR^AKhubSOflT4w(c(-_H4k;4+%>$zfhwgQs%Q%yghI|nBkNHf~FIy+eSFW9M>FuF&`U*l*IFi)8N2wJj@GYNt22cjU}0Iqz?^a-=)N z;e~icAU`yYndW6(z3XaS*`m4aYo5fGtGHO zXGySYnew;w*wfam=-1ShVAYqVrb=L_kRci85rb}9C4KqI#A~xxtIstvQff)yNWmOP zDMzY!Z(P)UztS47Mt(pMG-_ESC@1hH_V;e(CJbne7~h6o(0e%sqbkO)q?IpH?r#QX zH!oIlxHNimq+5JJw}PIi@YDq8*fD?&-v0WcI>2J6YE`+`98qTD1?Fw3T<+5wL(hB} zG@UJ)toc-P-lKItgp@qy$F|V20HzNE4Mwsq3qHpIIJON^e&GjTLc-vU%0JM2aa{0l zWpiXy4vU8Ck(Nj1Daq(h+P;VS-7g06+Cygi8Lyh4 zAnGy`EsfSkx<81~=XET?cYDZIyYd3*krpMi>&%+pqU} zW}wK#hiIqneqdXM6eULhl0hS;sFSm6GLH5|FK?j3Ecoc{+E`Q39<+yi(x77`SXwwX z`O>_OgyYS%@Hl`Z_mRhkqR$TaL43IWI!RpQUwR}qy9I26tROy2(jBE|xjcr%*w9)SnXBTCnm|< zp#KXy_h<8s*FK62_R7j)u>jeOFMI=MPXHr!C z#1Me7q^;Gb7On0bL~yNEZ$e)>y(TZ2U<~UoDEF-%V;6m?h-?Tp+Yh{ImLJTmpFY?a zh6M4L&rhy;h{$~7kBu*psHKpSQ2sexjl(C*?VrNNL=*H7%{Po8hk*~{Bi_n@Bp2TA zov~fmv?=w*(EkZ)gU3=kUpd_=11f`e8p9+1d-K}jFU}m_3(4|~@I(={w{o!!{=0<> 
z&9n9Gq(Wx74&$d!y?GqB+j`&%IkHc{=#;tvmDYqARzghPXrN7?hYa7}>wPNYlRQ3^ zQ-Q7M?2r?L7NQ_f8>%Bn1)jrJzqzJ;cnU>eoa#wBp0B6sp$L|9{u?q}Aa2zN9%LXp zA~(Zmog}kt%ZVavzSMe_?=?0N1zKS8HJ;S#w#G(0=@tPhguMN_zHS0G8nt(+Oh z2m~I*P&KwrUrP^thMfLG!+WCTWoyLH%!~)W&;kdKVDTYNzvO0}7uC2k>qG{rc?{&1 zVj8s(`=h7mCNZf&VOzWZ@L_-IS7&*|snU)uKDz@iP53S$OE}@lE@0X+M~A~nuyzh2 zP7+9Rq8wih_Lcr-llO&^A=1r#NZAzDajc92p}d?e&{F~N>B1mU#kjF4!wv+jycyvJ z*N6Q`ACQz2pyru2l_J`nlqH@qMGy`Z+0g@^ib=B+fg0C<$wnX!{T`LFCMx$SUx@>E zvGgBPL_G42_TJBY)@`*U4kCL94SySTGk332aZ){uZ%A3m!pWBcbl$StGu6?)`J5U=IC%enSIU^^$&0VcFtmoB-7K z4FIeMGHzBK`3?~J0JO_1^kOi*Y0K}m~PJ70?LhYh{s9r zUAr`R$G`x|KBmv{Pr7(yvJeZRb+MkgM3|X)dn`T1rRrOE;xYYO^YTn0S*9R74A`6K zd{}3XA3PHf+m%xWYU@cKKU**5#+YJYPc_$nr9e3{Ab|2XX}*uZLMjlZp{_)D0znDO zR&#at=AhuW3|ao2!7%wmQ&(8vXp86PrP<46pCrA4H{^>kL0f{mh|$m*v0b>V3MMHf z2+Pk4vl`;WNu9$Vw**p4Kdhd~=Rb@F-_vK(nHkuPPXWr5iNJp}vGT=91YTtnfy~NN zv$lbDybrIC=AM+BK0ipaHH+CPygptQyFrW@#e9_s{F*yo?dnbd=<41g_@H|s8rQRg zpF+kcDu+bwRcX$(z*5pZ19PS;Hf3vyEN8(GBL7V6vpGU+3weTPN_-m1q5_G^KBB|t zV}khpC}H7liL@k}l4&3r_6v2-uwND;ip9XfIyC#VGLD8s@AFDR-{ZfRsOdu>bu0|! 
z05GY9S0ABUtr#qHubdu8{0Khw<3d0^9Pvj60<`++Xs0dGHA8=T9gsfsje_ZnxG0JD z&NgTtG;>n*W|CSFb^d(2n zj*>lUf6YCBzKlwGGA>!3OKP7cTU>awx-ODPNrWSLcnd9kKX6lI)__rxsiV%x`89o* z>uLVhr2!uWt2o(xk@>{ZBmZ8PnX5VdoD%UmzM3cPLYn{m>C+6?|F+Hp&vhR#c2RZjqxc*Ooc%aiecn5c b)0CCg#qy=b28v$ju#MT_mQiT|m*D(i(OLi# literal 0 HcmV?d00001 diff --git a/src/documents/tests/test_management_decrypt.py b/src/documents/tests/test_management_decrypt.py new file mode 100644 index 000000000..326276389 --- /dev/null +++ b/src/documents/tests/test_management_decrypt.py @@ -0,0 +1,56 @@ +import hashlib +import json +import os +import shutil +import tempfile +from unittest import mock + +from django.core.management import call_command +from django.test import TestCase, override_settings + +from documents.management.commands import document_exporter +from documents.models import Document, Tag, DocumentType, Correspondent + + +class TestDecryptDocuments(TestCase): + + @override_settings( + ORIGINALS_DIR=os.path.join(os.path.dirname(__file__), "samples", "originals"), + THUMBNAIL_DIR=os.path.join(os.path.dirname(__file__), "samples", "thumb"), + PASSPHRASE="test" + ) + @mock.patch("documents.management.commands.decrypt_documents.input") + def test_decrypt(self, m): + + media_dir = tempfile.mkdtemp() + originals_dir = os.path.join(media_dir, "documents", "originals") + thumb_dir = os.path.join(media_dir, "documents", "thumbnails") + os.makedirs(originals_dir, exist_ok=True) + os.makedirs(thumb_dir, exist_ok=True) + + override_settings( + ORIGINALS_DIR=originals_dir, + THUMBNAIL_DIR=thumb_dir, + PASSPHRASE="test" + ).enable() + + shutil.copy(os.path.join(os.path.dirname(__file__), "samples", "originals", "0000002.pdf.gpg"), os.path.join(originals_dir, "0000002.pdf.gpg")) + shutil.copy(os.path.join(os.path.dirname(__file__), "samples", "thumb", "0000002.png.gpg"), os.path.join(thumb_dir, "0000002.png.gpg")) + + Document.objects.create(checksum="9c9691e51741c1f4f41a20896af31770", title="wow", filename="0000002.pdf.gpg", id=2, 
mime_type="application/pdf", storage_type=Document.STORAGE_TYPE_GPG) + + call_command('decrypt_documents') + + doc = Document.objects.get(id=2) + + self.assertEqual(doc.storage_type, Document.STORAGE_TYPE_UNENCRYPTED) + self.assertEqual(doc.filename, "0000002.pdf") + self.assertTrue(os.path.isfile(os.path.join(originals_dir, "0000002.pdf"))) + self.assertTrue(os.path.isfile(doc.source_path)) + self.assertTrue(os.path.isfile(os.path.join(thumb_dir, "0000002.png"))) + self.assertTrue(os.path.isfile(doc.thumbnail_path)) + + with doc.source_file as f: + checksum = hashlib.md5(f.read()).hexdigest() + self.assertEqual(checksum, doc.checksum) + diff --git a/src/documents/tests/test_management_exporter.py b/src/documents/tests/test_management_exporter.py new file mode 100644 index 000000000..c8d1490d2 --- /dev/null +++ b/src/documents/tests/test_management_exporter.py @@ -0,0 +1,53 @@ +import hashlib +import json +import os +import tempfile + +from django.core.management import call_command +from django.test import TestCase, override_settings + +from documents.management.commands import document_exporter +from documents.models import Document, Tag, DocumentType, Correspondent + + +class TestExporter(TestCase): + + @override_settings( + ORIGINALS_DIR=os.path.join(os.path.dirname(__file__), "samples", "originals"), + THUMBNAIL_DIR=os.path.join(os.path.dirname(__file__), "samples", "thumb"), + PASSPHRASE="test" + ) + def test_exporter(self): + file = os.path.join(os.path.dirname(__file__), "samples", "originals", "0000001.pdf") + + with open(file, "rb") as f: + checksum = hashlib.md5(f.read()).hexdigest() + + Document.objects.create(checksum=checksum, title="wow", filename="0000001.pdf", id=1, mime_type="application/pdf") + Document.objects.create(checksum="9c9691e51741c1f4f41a20896af31770", title="wow", filename="0000002.pdf.gpg", id=2, mime_type="application/pdf", storage_type=Document.STORAGE_TYPE_GPG) + Tag.objects.create(name="t") + DocumentType.objects.create(name="dt") 
+ Correspondent.objects.create(name="c") + + target = tempfile.mkdtemp() + + call_command('document_exporter', target) + + with open(os.path.join(target, "manifest.json")) as f: + manifest = json.load(f) + + self.assertEqual(len(manifest), 5) + + for element in manifest: + if element['model'] == 'documents.document': + fname = os.path.join(target, element[document_exporter.EXPORTER_FILE_NAME]) + self.assertTrue(os.path.exists(fname)) + self.assertTrue(os.path.exists(os.path.join(target, element[document_exporter.EXPORTER_THUMBNAIL_NAME]))) + + with open(fname, "rb") as f: + checksum = hashlib.md5(f.read()).hexdigest() + self.assertEqual(checksum, element['fields']['checksum']) + + Document.objects.create(checksum="AAAAAAAAAAAAAAAAA", title="wow", filename="0000004.pdf", id=3, mime_type="application/pdf") + + self.assertRaises(FileNotFoundError, call_command, 'document_exporter', target) diff --git a/src/setup.cfg b/src/setup.cfg index 4b0a216f5..b540f9efe 100644 --- a/src/setup.cfg +++ b/src/setup.cfg @@ -3,7 +3,7 @@ exclude = migrations, paperless/settings.py, .tox, */tests/* [tool:pytest] DJANGO_SETTINGS_MODULE=paperless.settings -addopts = --pythonwarnings=all +addopts = --pythonwarnings=all --cov --cov-report=html env = PAPERLESS_SECRET=paperless PAPERLESS_EMAIL_SECRET=paperless From a4bd2d687ed193311bd1b40ee413d11efdfd2378 Mon Sep 17 00:00:00 2001 From: jonaswinkler Date: Fri, 27 Nov 2020 00:05:29 +0100 Subject: [PATCH 23/36] add empty test case. 
--- src/documents/tests/test_document_retagger.py | 7 +++++++ 1 file changed, 7 insertions(+) create mode 100644 src/documents/tests/test_document_retagger.py diff --git a/src/documents/tests/test_document_retagger.py b/src/documents/tests/test_document_retagger.py new file mode 100644 index 000000000..6fe40d7e9 --- /dev/null +++ b/src/documents/tests/test_document_retagger.py @@ -0,0 +1,7 @@ +from django.test import TestCase + + +class TestRetagger(TestCase): + + def test_overwrite(self): + pass From 6b3ec52ed49e8e50b9b809c8724c2c6d8a7f42c4 Mon Sep 17 00:00:00 2001 From: jonaswinkler Date: Fri, 27 Nov 2020 12:03:24 +0100 Subject: [PATCH 24/36] todo note --- src/documents/models.py | 1 + 1 file changed, 1 insertion(+) diff --git a/src/documents/models.py b/src/documents/models.py index 8e0435647..cd4517a3d 100755 --- a/src/documents/models.py +++ b/src/documents/models.py @@ -230,6 +230,7 @@ class Document(models.Model): @property def file_type(self): + # TODO: this is not stable across python versions return mimetypes.guess_extension(str(self.mime_type)) @property From d04b54140cc33d1b048172062c2b8edb4155eed7 Mon Sep 17 00:00:00 2001 From: jonaswinkler Date: Fri, 27 Nov 2020 13:12:13 +0100 Subject: [PATCH 25/36] moved consumption dir check into the correct spot --- src/documents/consumer.py | 13 ------------ .../management/commands/document_consumer.py | 11 +++++++++- src/documents/tests/test_consumer.py | 20 ------------------- .../tests/test_management_consumer.py | 11 ++++++++++ 4 files changed, 21 insertions(+), 34 deletions(-) diff --git a/src/documents/consumer.py b/src/documents/consumer.py index 8fed01c30..a7afca89d 100755 --- a/src/documents/consumer.py +++ b/src/documents/consumer.py @@ -8,7 +8,6 @@ from django.conf import settings from django.db import transaction from django.utils import timezone -from paperless.db import GnuPG from .classifier import DocumentClassifier, IncompatibleClassifierVersionError from .file_handling import generate_filename, 
create_source_path_directory from .loggers import LoggingMixin @@ -40,17 +39,6 @@ class Consumer(LoggingMixin): raise ConsumerError("Cannot consume {}: It is not a file".format( self.path)) - def pre_check_consumption_dir(self): - if not settings.CONSUMPTION_DIR: - raise ConsumerError( - "The CONSUMPTION_DIR settings variable does not appear to be " - "set.") - - if not os.path.isdir(settings.CONSUMPTION_DIR): - raise ConsumerError( - "Consumption directory {} does not exist".format( - settings.CONSUMPTION_DIR)) - def pre_check_duplicate(self): with open(self.path, "rb") as f: checksum = hashlib.md5(f.read()).hexdigest() @@ -92,7 +80,6 @@ class Consumer(LoggingMixin): # Make sure that preconditions for consuming the file are met. self.pre_check_file_exists() - self.pre_check_consumption_dir() self.pre_check_directories() self.pre_check_duplicate() diff --git a/src/documents/management/commands/document_consumer.py b/src/documents/management/commands/document_consumer.py index c25d0cfa9..b738f001b 100644 --- a/src/documents/management/commands/document_consumer.py +++ b/src/documents/management/commands/document_consumer.py @@ -3,7 +3,7 @@ import os from time import sleep from django.conf import settings -from django.core.management.base import BaseCommand +from django.core.management.base import BaseCommand, CommandError from django_q.tasks import async_task from watchdog.events import FileSystemEventHandler from watchdog.observers.polling import PollingObserver @@ -95,6 +95,15 @@ class Command(BaseCommand): def handle(self, *args, **options): directory = options["directory"] + if not directory: + raise CommandError( + "CONSUMPTION_DIR does not appear to be set." 
+ ) + + if not os.path.isdir(directory): + raise CommandError( + f"Consumption directory {directory} does not exist") + for entry in os.scandir(directory): _consume(entry.path) diff --git a/src/documents/tests/test_consumer.py b/src/documents/tests/test_consumer.py index 323f5051f..ed835a467 100644 --- a/src/documents/tests/test_consumer.py +++ b/src/documents/tests/test_consumer.py @@ -502,26 +502,6 @@ class TestConsumer(TestCase): self.fail("Should throw exception") - @override_settings(CONSUMPTION_DIR=None) - def testConsumptionDirUnset(self): - try: - self.consumer.try_consume_file(self.get_test_file()) - except ConsumerError as e: - self.assertEqual(str(e), "The CONSUMPTION_DIR settings variable does not appear to be set.") - return - - self.fail("Should throw exception") - - @override_settings(CONSUMPTION_DIR="asd") - def testNoConsumptionDir(self): - try: - self.consumer.try_consume_file(self.get_test_file()) - except ConsumerError as e: - self.assertEqual(str(e), "Consumption directory asd does not exist") - return - - self.fail("Should throw exception") - def testDuplicates(self): self.consumer.try_consume_file(self.get_test_file()) diff --git a/src/documents/tests/test_management_consumer.py b/src/documents/tests/test_management_consumer.py index 33938d450..25b71f563 100644 --- a/src/documents/tests/test_management_consumer.py +++ b/src/documents/tests/test_management_consumer.py @@ -6,6 +6,7 @@ from time import sleep from unittest import mock from django.conf import settings +from django.core.management import call_command, CommandError from django.test import TestCase, override_settings from documents.consumer import ConsumerError @@ -193,3 +194,13 @@ class TestConsumer(TestCase): @override_settings(CONSUMER_POLLING=1) def test_slow_write_incomplete_polling(self): self.test_slow_write_incomplete() + + @override_settings(CONSUMPTION_DIR="does_not_exist") + def test_consumption_directory_invalid(self): + + self.assertRaises(CommandError, call_command, 
'document_consumer', '--oneshot') + + @override_settings(CONSUMPTION_DIR="") + def test_consumption_directory_unset(self): + + self.assertRaises(CommandError, call_command, 'document_consumer', '--oneshot') From 20c11396324755e96f02c4b60a91dd7820b9741a Mon Sep 17 00:00:00 2001 From: jonaswinkler Date: Fri, 27 Nov 2020 13:12:34 +0100 Subject: [PATCH 26/36] inotify: cleanup descriptor when done --- src/documents/management/commands/document_consumer.py | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/src/documents/management/commands/document_consumer.py b/src/documents/management/commands/document_consumer.py index b738f001b..295da4f5c 100644 --- a/src/documents/management/commands/document_consumer.py +++ b/src/documents/management/commands/document_consumer.py @@ -137,12 +137,13 @@ class Command(BaseCommand): f"Using inotify to watch directory for changes: {directory}") inotify = INotify() - inotify.add_watch(directory, flags.CLOSE_WRITE | flags.MOVED_TO) + descriptor = inotify.add_watch(directory, flags.CLOSE_WRITE | flags.MOVED_TO) try: while not self.stop_flag: for event in inotify.read(timeout=1000, read_delay=1000): file = os.path.join(directory, event.name) - if os.path.isfile(file): - _consume(file) + _consume(file) except KeyboardInterrupt: pass + + inotify.rm_watch(descriptor) From 35b2033949c1a3facbb2a2e499b7a62cbff46f13 Mon Sep 17 00:00:00 2001 From: jonaswinkler Date: Fri, 27 Nov 2020 13:13:11 +0100 Subject: [PATCH 27/36] tests: disable db logger in all tests except logger tests --- src/documents/loggers.py | 5 +++++ src/documents/tests/test_logger.py | 4 +++- src/paperless/settings.py | 2 ++ src/setup.cfg | 3 +-- 4 files changed, 11 insertions(+), 3 deletions(-) diff --git a/src/documents/loggers.py b/src/documents/loggers.py index fd20e1288..76dbe0163 100644 --- a/src/documents/loggers.py +++ b/src/documents/loggers.py @@ -1,9 +1,14 @@ import logging import uuid +from django.conf import settings + class 
PaperlessHandler(logging.Handler): def emit(self, record): + if settings.DISABLE_DBHANDLER: + return + # We have to do the import here or Django will barf when it tries to # load this because the apps aren't loaded at that point from .models import Log diff --git a/src/documents/tests/test_logger.py b/src/documents/tests/test_logger.py index 6e240ffc9..bbc9c2b5d 100644 --- a/src/documents/tests/test_logger.py +++ b/src/documents/tests/test_logger.py @@ -2,7 +2,7 @@ import logging import uuid from unittest import mock -from django.test import TestCase +from django.test import TestCase, override_settings from ..models import Log @@ -14,6 +14,7 @@ class TestPaperlessLog(TestCase): self.logger = logging.getLogger( "documents.management.commands.document_consumer") + @override_settings(DISABLE_DBHANDLER=False) def test_that_it_saves_at_all(self): kw = {"group": uuid.uuid4()} @@ -38,6 +39,7 @@ class TestPaperlessLog(TestCase): self.logger.critical("This is a critical message", extra=kw) self.assertEqual(Log.objects.all().count(), 5) + @override_settings(DISABLE_DBHANDLER=False) def test_groups(self): kw1 = {"group": uuid.uuid4()} diff --git a/src/paperless/settings.py b/src/paperless/settings.py index 1432dc5ec..4847d7bce 100644 --- a/src/paperless/settings.py +++ b/src/paperless/settings.py @@ -250,6 +250,8 @@ USE_TZ = True # Logging # ############################################################################### +DISABLE_DBHANDLER = __get_boolean("PAPERLESS_DISABLE_DBHANDLER") + LOGGING = { "version": 1, "disable_existing_loggers": False, diff --git a/src/setup.cfg b/src/setup.cfg index b540f9efe..f43c9adf6 100644 --- a/src/setup.cfg +++ b/src/setup.cfg @@ -5,8 +5,7 @@ exclude = migrations, paperless/settings.py, .tox, */tests/* DJANGO_SETTINGS_MODULE=paperless.settings addopts = --pythonwarnings=all --cov --cov-report=html env = - PAPERLESS_SECRET=paperless - PAPERLESS_EMAIL_SECRET=paperless + PAPERLESS_DISABLE_DBHANDLER=true [coverage:run] From 
a4277706f2a87c917e3fbfc02dd11b2c500a498b Mon Sep 17 00:00:00 2001 From: jonaswinkler Date: Fri, 27 Nov 2020 13:14:02 +0100 Subject: [PATCH 28/36] tests: wait for the consumer to exit before removing directories. --- src/documents/tests/test_management_consumer.py | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/src/documents/tests/test_management_consumer.py b/src/documents/tests/test_management_consumer.py index 25b71f563..2d45acd01 100644 --- a/src/documents/tests/test_management_consumer.py +++ b/src/documents/tests/test_management_consumer.py @@ -38,6 +38,7 @@ class TestConsumer(TestCase): sample_file = os.path.join(os.path.dirname(__file__), "samples", "simple.pdf") def setUp(self) -> None: + self.t = None patcher = mock.patch("documents.management.commands.document_consumer.async_task") self.task_mock = patcher.start() self.addCleanup(patcher.stop) @@ -53,7 +54,12 @@ class TestConsumer(TestCase): def tearDown(self) -> None: if self.t: + # set the stop flag self.t.stop() + # wait for the consumer to exit. + self.t.join() + + remove_dirs(self.dirs) def wait_for_task_mock_call(self): n = 0 From 60ac1ddbb93b6acda61589509293b4f8009d00b6 Mon Sep 17 00:00:00 2001 From: jonaswinkler Date: Fri, 27 Nov 2020 13:19:58 +0100 Subject: [PATCH 29/36] fix warnings about unclosed files. 
--- src/documents/management/commands/document_consumer.py | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/documents/management/commands/document_consumer.py b/src/documents/management/commands/document_consumer.py index 295da4f5c..7baeccce0 100644 --- a/src/documents/management/commands/document_consumer.py +++ b/src/documents/management/commands/document_consumer.py @@ -137,7 +137,8 @@ class Command(BaseCommand): f"Using inotify to watch directory for changes: {directory}") inotify = INotify() - descriptor = inotify.add_watch(directory, flags.CLOSE_WRITE | flags.MOVED_TO) + descriptor = inotify.add_watch( + directory, flags.CLOSE_WRITE | flags.MOVED_TO) try: while not self.stop_flag: for event in inotify.read(timeout=1000, read_delay=1000): @@ -147,3 +148,4 @@ class Command(BaseCommand): pass inotify.rm_watch(descriptor) + inotify.close() From 42c9186e91f0a60c66b37c2a3c159137b15e13ce Mon Sep 17 00:00:00 2001 From: jonaswinkler Date: Fri, 27 Nov 2020 13:56:07 +0100 Subject: [PATCH 30/36] refrain from creating the index as part of the migrations, messes with the test cases. --- docs/setup.rst | 18 +++++++++------- .../migrations/1000_update_paperless_all.py | 21 ------------------- 2 files changed, 10 insertions(+), 29 deletions(-) diff --git a/docs/setup.rst b/docs/setup.rst index 88785364b..4e2826dd6 100644 --- a/docs/setup.rst +++ b/docs/setup.rst @@ -265,15 +265,17 @@ Migration to paperless-ng is then performed in a few simple steps: ``docker-compose.env`` to your needs. See `docker route`_ for details on which edits are advised. -6. Start paperless-ng. +6. In order to find your existing documents with the new search feature, you need + to invoke a one-time operation that will create the search index: - .. code:: bash + .. code:: shell-session - $ docker-compose up + $ docker-compose run --rm webserver document_index reindex + + This will migrate your database and create the search index. 
After that, + paperless will take care of maintaining the index by itself. - If you see everything working (you should see some migrations getting - applied, for instance), you can gracefully stop paperless-ng with Ctrl-C - and then start paperless-ng as usual with +7. Start paperless-ng. .. code:: bash @@ -281,11 +283,11 @@ Migration to paperless-ng is then performed in a few simple steps: This will run paperless in the background and automatically start it on system boot. -7. Paperless installed a permanent redirect to ``admin/`` in your browser. This +8. Paperless installed a permanent redirect to ``admin/`` in your browser. This redirect is still in place and prevents access to the new UI. Clear browsing cache in order to fix this. -8. Optionally, follow the instructions below to migrate your existing data to PostgreSQL. +9. Optionally, follow the instructions below to migrate your existing data to PostgreSQL. .. _setup-sqlite_to_psql: diff --git a/src/documents/migrations/1000_update_paperless_all.py b/src/documents/migrations/1000_update_paperless_all.py index dc6313dd8..f3fbbb6c1 100644 --- a/src/documents/migrations/1000_update_paperless_all.py +++ b/src/documents/migrations/1000_update_paperless_all.py @@ -5,23 +5,6 @@ from django.db import migrations, models import django.db.models.deletion -def make_index(apps, schema_editor): - Document = apps.get_model("documents", "Document") - documents = Document.objects.all() - print() - try: - print(" --> Creating document index...") - from whoosh.writing import AsyncWriter - from documents import index - ix = index.open_index(recreate=True) - with AsyncWriter(ix) as writer: - for document in documents: - index.update_document(writer, document) - except ImportError: - # index may not be relevant anymore - print(" --> Cannot create document index.") - - def logs_set_default_group(apps, schema_editor): Log = apps.get_model('documents', 'Log') for log in Log.objects.all(): @@ -99,8 +82,4 @@ class 
Migration(migrations.Migration): code=django.db.migrations.operations.special.RunPython.noop, reverse_code=logs_set_default_group ), - migrations.RunPython( - code=make_index, - reverse_code=django.db.migrations.operations.special.RunPython.noop, - ), ] From 938499706ce8a1ff01234cd833dd1c044cc92e1d Mon Sep 17 00:00:00 2001 From: jonaswinkler Date: Fri, 27 Nov 2020 13:59:24 +0100 Subject: [PATCH 31/36] fixed an issue with the search api opening the index on import (that's way too early.) --- src/documents/views.py | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/src/documents/views.py b/src/documents/views.py index 287fb114c..96b413d67 100755 --- a/src/documents/views.py +++ b/src/documents/views.py @@ -202,7 +202,9 @@ class SearchView(APIView): permission_classes = (IsAuthenticated,) - ix = index.open_index() + def __init__(self, *args, **kwargs): + super(SearchView, self).__init__(*args, **kwargs) + self.ix = index.open_index() def add_infos_to_hit(self, r): doc = Document.objects.get(id=r['id']) @@ -241,7 +243,9 @@ class SearchAutoCompleteView(APIView): permission_classes = (IsAuthenticated,) - ix = index.open_index() + def __init__(self, *args, **kwargs): + super(SearchAutoCompleteView, self).__init__(*args, **kwargs) + self.ix = index.open_index() def get(self, request, format=None): if 'term' in request.query_params: From 6834e563a880c003d3daccda11b13a6486650395 Mon Sep 17 00:00:00 2001 From: jonaswinkler Date: Fri, 27 Nov 2020 14:00:41 +0100 Subject: [PATCH 32/36] refactored the test cases to use a mixin for setting up temporary directories. 
--- src/documents/tests/test_api.py | 7 +++---- src/documents/tests/test_consumer.py | 7 +++---- src/documents/tests/test_management_consumer.py | 12 +++++------- src/documents/tests/utils.py | 15 +++++++++++++++ 4 files changed, 26 insertions(+), 15 deletions(-) diff --git a/src/documents/tests/test_api.py b/src/documents/tests/test_api.py index d9a2aac26..37f774891 100644 --- a/src/documents/tests/test_api.py +++ b/src/documents/tests/test_api.py @@ -7,14 +7,13 @@ from pathvalidate import ValidationError from rest_framework.test import APITestCase from documents.models import Document, Correspondent, DocumentType, Tag -from documents.tests.utils import setup_directories, remove_dirs +from documents.tests.utils import DirectoriesMixin -class DocumentApiTest(APITestCase): +class DocumentApiTest(DirectoriesMixin, APITestCase): def setUp(self): - self.dirs = setup_directories() - self.addCleanup(remove_dirs, self.dirs) + super(DocumentApiTest, self).setUp() user = User.objects.create_superuser(username="temp_admin") self.client.force_login(user=user) diff --git a/src/documents/tests/test_consumer.py b/src/documents/tests/test_consumer.py index ed835a467..d1cd0adf1 100644 --- a/src/documents/tests/test_consumer.py +++ b/src/documents/tests/test_consumer.py @@ -6,7 +6,7 @@ from unittest.mock import MagicMock from django.test import TestCase, override_settings -from .utils import setup_directories, remove_dirs +from .utils import DirectoriesMixin from ..consumer import Consumer, ConsumerError from ..models import FileInfo, Tag, Correspondent, DocumentType, Document from ..parsers import DocumentParser, ParseError @@ -408,7 +408,7 @@ def fake_magic_from_file(file, mime=False): @mock.patch("documents.consumer.magic.from_file", fake_magic_from_file) -class TestConsumer(TestCase): +class TestConsumer(DirectoriesMixin, TestCase): def make_dummy_parser(self, path, logging_group): return DummyParser(path, logging_group, self.dirs.scratch_dir) @@ -417,8 +417,7 @@ class 
TestConsumer(TestCase): return FaultyParser(path, logging_group, self.dirs.scratch_dir) def setUp(self): - self.dirs = setup_directories() - self.addCleanup(remove_dirs, self.dirs) + super(TestConsumer, self).setUp() patcher = mock.patch("documents.parsers.document_consumer_declaration.send") m = patcher.start() diff --git a/src/documents/tests/test_management_consumer.py b/src/documents/tests/test_management_consumer.py index 2d45acd01..aed824926 100644 --- a/src/documents/tests/test_management_consumer.py +++ b/src/documents/tests/test_management_consumer.py @@ -7,11 +7,11 @@ from unittest import mock from django.conf import settings from django.core.management import call_command, CommandError -from django.test import TestCase, override_settings +from django.test import override_settings, TestCase from documents.consumer import ConsumerError from documents.management.commands import document_consumer -from documents.tests.utils import setup_directories, remove_dirs +from documents.tests.utils import DirectoriesMixin class ConsumerThread(Thread): @@ -33,19 +33,17 @@ def chunked(size, source): yield source[i:i+size] -class TestConsumer(TestCase): +class TestConsumer(DirectoriesMixin, TestCase): sample_file = os.path.join(os.path.dirname(__file__), "samples", "simple.pdf") def setUp(self) -> None: + super(TestConsumer, self).setUp() self.t = None patcher = mock.patch("documents.management.commands.document_consumer.async_task") self.task_mock = patcher.start() self.addCleanup(patcher.stop) - self.dirs = setup_directories() - self.addCleanup(remove_dirs, self.dirs) - def t_start(self): self.t = ConsumerThread() self.t.start() @@ -59,7 +57,7 @@ class TestConsumer(TestCase): # wait for the consumer to exit. 
self.t.join() - remove_dirs(self.dirs) + super(TestConsumer, self).tearDown() def wait_for_task_mock_call(self): n = 0 diff --git a/src/documents/tests/utils.py b/src/documents/tests/utils.py index 7b0938ee3..83148e9c7 100644 --- a/src/documents/tests/utils.py +++ b/src/documents/tests/utils.py @@ -39,3 +39,18 @@ def remove_dirs(dirs): shutil.rmtree(dirs.data_dir, ignore_errors=True) shutil.rmtree(dirs.scratch_dir, ignore_errors=True) shutil.rmtree(dirs.consumption_dir, ignore_errors=True) + + +class DirectoriesMixin: + + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + self.dirs = None + + def setUp(self) -> None: + self.dirs = setup_directories() + super(DirectoriesMixin, self).setUp() + + def tearDown(self) -> None: + super(DirectoriesMixin, self).tearDown() + remove_dirs(self.dirs) From 6c308116d68904374536f5e47907a3d22c1688d1 Mon Sep 17 00:00:00 2001 From: jonaswinkler Date: Fri, 27 Nov 2020 14:00:52 +0100 Subject: [PATCH 33/36] parallel tests. --- src/documents/index.py | 3 --- src/setup.cfg | 2 +- 2 files changed, 1 insertion(+), 4 deletions(-) diff --git a/src/documents/index.py b/src/documents/index.py index a6c3abba8..ffa3e688f 100644 --- a/src/documents/index.py +++ b/src/documents/index.py @@ -64,9 +64,6 @@ def get_schema(): def open_index(recreate=False): - # TODO: this is not thread safe. If 2 instances try to create the index - # at the same time, this fails. This currently prevents parallel - # tests. 
try: if exists_in(settings.INDEX_DIR) and not recreate: return open_dir(settings.INDEX_DIR) diff --git a/src/setup.cfg b/src/setup.cfg index f43c9adf6..2a1a348bd 100644 --- a/src/setup.cfg +++ b/src/setup.cfg @@ -3,7 +3,7 @@ exclude = migrations, paperless/settings.py, .tox, */tests/* [tool:pytest] DJANGO_SETTINGS_MODULE=paperless.settings -addopts = --pythonwarnings=all --cov --cov-report=html +addopts = --pythonwarnings=all --cov --cov-report=html -n auto env = PAPERLESS_DISABLE_DBHANDLER=true From bc4192e7d1ee3d68ff5bbebc3e74e9065b82116b Mon Sep 17 00:00:00 2001 From: jonaswinkler Date: Fri, 27 Nov 2020 15:00:16 +0100 Subject: [PATCH 34/36] more tests and bugfixes. --- src/documents/classifier.py | 2 +- src/documents/tests/test_api.py | 104 ++++++++++++++++++ src/documents/tests/test_classifier.py | 8 +- src/documents/tests/test_document_retagger.py | 7 -- .../tests/test_management_retagger.py | 58 ++++++++++ src/documents/tests/utils.py | 13 ++- src/documents/views.py | 3 + 7 files changed, 180 insertions(+), 15 deletions(-) delete mode 100644 src/documents/tests/test_document_retagger.py create mode 100644 src/documents/tests/test_management_retagger.py diff --git a/src/documents/classifier.py b/src/documents/classifier.py index b0d7d87bb..60c9abeec 100755 --- a/src/documents/classifier.py +++ b/src/documents/classifier.py @@ -4,13 +4,13 @@ import os import pickle import re +from django.conf import settings from sklearn.feature_extraction.text import CountVectorizer from sklearn.neural_network import MLPClassifier from sklearn.preprocessing import MultiLabelBinarizer, LabelBinarizer from sklearn.utils.multiclass import type_of_target from documents.models import Document, MatchingModel -from paperless import settings class IncompatibleClassifierVersionError(Exception): diff --git a/src/documents/tests/test_api.py b/src/documents/tests/test_api.py index 37f774891..bb0581656 100644 --- a/src/documents/tests/test_api.py +++ b/src/documents/tests/test_api.py @@ 
-6,6 +6,7 @@ from django.contrib.auth.models import User from pathvalidate import ValidationError from rest_framework.test import APITestCase +from documents import index from documents.models import Document, Correspondent, DocumentType, Tag from documents.tests.utils import DirectoriesMixin @@ -162,6 +163,109 @@ class DocumentApiTest(DirectoriesMixin, APITestCase): results = response.data['results'] self.assertEqual(len(results), 3) + def test_search_no_query(self): + response = self.client.get("/api/search/") + results = response.data['results'] + + self.assertEqual(len(results), 0) + + def test_search(self): + d1=Document.objects.create(title="invoice", content="the thing i bought at a shop and paid with bank account", checksum="A", pk=1) + d2=Document.objects.create(title="bank statement 1", content="things i paid for in august", pk=2, checksum="B") + d3=Document.objects.create(title="bank statement 3", content="things i paid for in september", pk=3, checksum="C") + with index.open_index(False).writer() as writer: + # Note to future self: there is a reason we don't use a model signal handler to update the index: some operations edit many documents at once + # (retagger, renamer) and we don't want to open a writer for each of these, but rather perform the entire operation with one writer. + # That's why we can't open the writer in a model on_save handler or something.
+ index.update_document(writer, d1) + index.update_document(writer, d2) + index.update_document(writer, d3) + response = self.client.get("/api/search/?query=bank") + results = response.data['results'] + self.assertEqual(response.data['count'], 3) + self.assertEqual(response.data['page'], 1) + self.assertEqual(response.data['page_count'], 1) + self.assertEqual(len(results), 3) + + response = self.client.get("/api/search/?query=september") + results = response.data['results'] + self.assertEqual(response.data['count'], 1) + self.assertEqual(response.data['page'], 1) + self.assertEqual(response.data['page_count'], 1) + self.assertEqual(len(results), 1) + + response = self.client.get("/api/search/?query=statement") + results = response.data['results'] + self.assertEqual(response.data['count'], 2) + self.assertEqual(response.data['page'], 1) + self.assertEqual(response.data['page_count'], 1) + self.assertEqual(len(results), 2) + + response = self.client.get("/api/search/?query=sfegdfg") + results = response.data['results'] + self.assertEqual(response.data['count'], 0) + self.assertEqual(response.data['page'], 0) + self.assertEqual(response.data['page_count'], 0) + self.assertEqual(len(results), 0) + + def test_search_multi_page(self): + with index.open_index(False).writer() as writer: + for i in range(55): + doc = Document.objects.create(checksum=str(i), pk=i+1, title=f"Document {i+1}", content="content") + index.update_document(writer, doc) + + # This is here so that we test that no document gets returned twice (might happen if the paging is not working) + seen_ids = [] + + for i in range(1, 6): + response = self.client.get(f"/api/search/?query=content&page={i}") + results = response.data['results'] + self.assertEqual(response.data['count'], 55) + self.assertEqual(response.data['page'], i) + self.assertEqual(response.data['page_count'], 6) + self.assertEqual(len(results), 10) + + for result in results: + self.assertNotIn(result['id'], seen_ids) + 
seen_ids.append(result['id']) + + response = self.client.get(f"/api/search/?query=content&page=6") + results = response.data['results'] + self.assertEqual(response.data['count'], 55) + self.assertEqual(response.data['page'], 6) + self.assertEqual(response.data['page_count'], 6) + self.assertEqual(len(results), 5) + + for result in results: + self.assertNotIn(result['id'], seen_ids) + seen_ids.append(result['id']) + + response = self.client.get(f"/api/search/?query=content&page=7") + results = response.data['results'] + self.assertEqual(response.data['count'], 55) + self.assertEqual(response.data['page'], 6) + self.assertEqual(response.data['page_count'], 6) + self.assertEqual(len(results), 5) + + def test_search_invalid_page(self): + with index.open_index(False).writer() as writer: + for i in range(15): + doc = Document.objects.create(checksum=str(i), pk=i+1, title=f"Document {i+1}", content="content") + index.update_document(writer, doc) + + first_page = self.client.get(f"/api/search/?query=content&page=1").data + second_page = self.client.get(f"/api/search/?query=content&page=2").data + should_be_first_page_1 = self.client.get(f"/api/search/?query=content&page=0").data + should_be_first_page_2 = self.client.get(f"/api/search/?query=content&page=dgfd").data + should_be_first_page_3 = self.client.get(f"/api/search/?query=content&page=").data + should_be_first_page_4 = self.client.get(f"/api/search/?query=content&page=-7868").data + + self.assertDictEqual(first_page, should_be_first_page_1) + self.assertDictEqual(first_page, should_be_first_page_2) + self.assertDictEqual(first_page, should_be_first_page_3) + self.assertDictEqual(first_page, should_be_first_page_4) + self.assertNotEqual(len(first_page['results']), len(second_page['results'])) + @mock.patch("documents.index.autocomplete") def test_search_autocomplete(self, m): m.side_effect = lambda ix, term, limit: [term for _ in range(limit)] diff --git a/src/documents/tests/test_classifier.py 
b/src/documents/tests/test_classifier.py index e5e7d8639..9e999794d 100644 --- a/src/documents/tests/test_classifier.py +++ b/src/documents/tests/test_classifier.py @@ -6,11 +6,13 @@ from django.test import TestCase, override_settings from documents.classifier import DocumentClassifier, IncompatibleClassifierVersionError from documents.models import Correspondent, Document, Tag, DocumentType +from documents.tests.utils import DirectoriesMixin -class TestClassifier(TestCase): +class TestClassifier(DirectoriesMixin, TestCase): def setUp(self): + super(TestClassifier, self).setUp() self.classifier = DocumentClassifier() def generate_test_data(self): @@ -80,12 +82,14 @@ class TestClassifier(TestCase): self.assertTrue(self.classifier.train()) self.assertFalse(self.classifier.train()) + self.classifier.save_classifier() + classifier2 = DocumentClassifier() current_ver = DocumentClassifier.FORMAT_VERSION with mock.patch("documents.classifier.DocumentClassifier.FORMAT_VERSION", current_ver+1): # assure that we won't load old classifiers. 
- self.assertRaises(IncompatibleClassifierVersionError, self.classifier.reload) + self.assertRaises(IncompatibleClassifierVersionError, classifier2.reload) self.classifier.save_classifier() diff --git a/src/documents/tests/test_document_retagger.py b/src/documents/tests/test_document_retagger.py deleted file mode 100644 index 6fe40d7e9..000000000 --- a/src/documents/tests/test_document_retagger.py +++ /dev/null @@ -1,7 +0,0 @@ -from django.test import TestCase - - -class TestRetagger(TestCase): - - def test_overwrite(self): - pass diff --git a/src/documents/tests/test_management_retagger.py b/src/documents/tests/test_management_retagger.py new file mode 100644 index 000000000..2346b6527 --- /dev/null +++ b/src/documents/tests/test_management_retagger.py @@ -0,0 +1,58 @@ +from django.core.management import call_command +from django.test import TestCase + +from documents.models import Document, Tag, Correspondent, DocumentType +from documents.tests.utils import DirectoriesMixin + + +class TestRetagger(DirectoriesMixin, TestCase): + + def make_models(self): + self.d1 = Document.objects.create(checksum="A", title="A", content="first document") + self.d2 = Document.objects.create(checksum="B", title="B", content="second document") + self.d3 = Document.objects.create(checksum="C", title="C", content="unrelated document") + + self.tag_first = Tag.objects.create(name="tag1", match="first", matching_algorithm=Tag.MATCH_ANY) + self.tag_second = Tag.objects.create(name="tag2", match="second", matching_algorithm=Tag.MATCH_ANY) + + self.correspondent_first = Correspondent.objects.create( + name="c1", match="first", matching_algorithm=Correspondent.MATCH_ANY) + self.correspondent_second = Correspondent.objects.create( + name="c2", match="second", matching_algorithm=Correspondent.MATCH_ANY) + + self.doctype_first = DocumentType.objects.create( + name="dt1", match="first", matching_algorithm=DocumentType.MATCH_ANY) + self.doctype_second = DocumentType.objects.create( + name="dt2", 
match="second", matching_algorithm=DocumentType.MATCH_ANY) + + def get_updated_docs(self): + return Document.objects.get(title="A"), Document.objects.get(title="B"), Document.objects.get(title="C") + + def setUp(self) -> None: + super(TestRetagger, self).setUp() + self.make_models() + + def test_add_tags(self): + call_command('document_retagger', '--tags') + d_first, d_second, d_unrelated = self.get_updated_docs() + + self.assertEqual(d_first.tags.count(), 1) + self.assertEqual(d_second.tags.count(), 1) + self.assertEqual(d_unrelated.tags.count(), 0) + + self.assertEqual(d_first.tags.first(), self.tag_first) + self.assertEqual(d_second.tags.first(), self.tag_second) + + def test_add_type(self): + call_command('document_retagger', '--document_type') + d_first, d_second, d_unrelated = self.get_updated_docs() + + self.assertEqual(d_first.document_type, self.doctype_first) + self.assertEqual(d_second.document_type, self.doctype_second) + + def test_add_correspondent(self): + call_command('document_retagger', '--correspondent') + d_first, d_second, d_unrelated = self.get_updated_docs() + + self.assertEqual(d_first.correspondent, self.correspondent_first) + self.assertEqual(d_second.correspondent, self.correspondent_second) diff --git a/src/documents/tests/utils.py b/src/documents/tests/utils.py index 83148e9c7..aec99ff34 100644 --- a/src/documents/tests/utils.py +++ b/src/documents/tests/utils.py @@ -14,12 +14,13 @@ def setup_directories(): dirs.scratch_dir = tempfile.mkdtemp() dirs.media_dir = tempfile.mkdtemp() dirs.consumption_dir = tempfile.mkdtemp() - dirs.index_dir = os.path.join(dirs.data_dir, "documents", "originals") + dirs.index_dir = os.path.join(dirs.data_dir, "index") dirs.originals_dir = os.path.join(dirs.media_dir, "documents", "originals") dirs.thumbnail_dir = os.path.join(dirs.media_dir, "documents", "thumbnails") - os.makedirs(dirs.index_dir) - os.makedirs(dirs.originals_dir) - os.makedirs(dirs.thumbnail_dir) + + os.makedirs(dirs.index_dir, 
exist_ok=True) + os.makedirs(dirs.originals_dir, exist_ok=True) + os.makedirs(dirs.thumbnail_dir, exist_ok=True) override_settings( DATA_DIR=dirs.data_dir, @@ -28,7 +29,9 @@ def setup_directories(): ORIGINALS_DIR=dirs.originals_dir, THUMBNAIL_DIR=dirs.thumbnail_dir, CONSUMPTION_DIR=dirs.consumption_dir, - INDEX_DIR=dirs.index_dir + INDEX_DIR=dirs.index_dir, + MODEL_FILE=os.path.join(dirs.data_dir, "classification_model.pickle") + ).enable() return dirs diff --git a/src/documents/views.py b/src/documents/views.py index 96b413d67..84f4a3999 100755 --- a/src/documents/views.py +++ b/src/documents/views.py @@ -224,6 +224,9 @@ class SearchView(APIView): except (ValueError, TypeError): page = 1 + if page < 1: + page = 1 + with index.query_page(self.ix, query, page) as result_page: return Response( {'count': len(result_page), From 481b6c7cec2ddf6b5694c6d1638123a172aac00f Mon Sep 17 00:00:00 2001 From: jonaswinkler Date: Fri, 27 Nov 2020 15:48:53 +0100 Subject: [PATCH 35/36] changelog and versions. 
--- docker/hub/docker-compose.postgres.yml | 2 +- docker/hub/docker-compose.sqlite.yml | 2 +- docs/changelog.rst | 24 ++++++++++++++++-------- src/paperless/version.py | 2 +- 4 files changed, 19 insertions(+), 11 deletions(-) diff --git a/docker/hub/docker-compose.postgres.yml b/docker/hub/docker-compose.postgres.yml index e89513e5c..5d9e2a7ae 100644 --- a/docker/hub/docker-compose.postgres.yml +++ b/docker/hub/docker-compose.postgres.yml @@ -15,7 +15,7 @@ services: POSTGRES_PASSWORD: paperless webserver: - image: jonaswinkler/paperless-ng:0.9.2 + image: jonaswinkler/paperless-ng:0.9.3 restart: always depends_on: - db diff --git a/docker/hub/docker-compose.sqlite.yml b/docker/hub/docker-compose.sqlite.yml index cdd6206d9..95f024061 100644 --- a/docker/hub/docker-compose.sqlite.yml +++ b/docker/hub/docker-compose.sqlite.yml @@ -5,7 +5,7 @@ services: restart: always webserver: - image: jonaswinkler/paperless-ng:0.9.2 + image: jonaswinkler/paperless-ng:0.9.3 restart: always depends_on: - broker diff --git a/docs/changelog.rst b/docs/changelog.rst index e2df92863..c4443504f 100644 --- a/docs/changelog.rst +++ b/docs/changelog.rst @@ -5,15 +5,23 @@ Changelog ********* -next -#### +paperless-ng 0.9.3 +################## -* Setting ``PAPERLESS_AUTO_LOGIN_USERNAME`` replaces ``PAPERLESS_DISABLE_LOGIN``. - You have to specify your username. -* Added a simple sanity checker that checks your documents for missing or orphaned files, - files with wrong checksums, inaccessible files, and documents with empty content. -* It is no longer possible to encrypt your documents. For the time being, paperless will - continue to operate with already encrypted documents. +* Setting ``PAPERLESS_AUTO_LOGIN_USERNAME`` replaces ``PAPERLESS_DISABLE_LOGIN``. + You have to specify your username. +* Added a simple sanity checker that checks your documents for missing or orphaned files, + files with wrong checksums, inaccessible files, and documents with empty content. 
+* It is no longer possible to encrypt your documents. For the time being, paperless will + continue to operate with already encrypted documents. +* Fixes: + + * Paperless now uses inotify again, since the watchdog was causing issues which I was not + aware of. + * Issue with the automatic classifier not working with only one tag. + * A couple issues with the search index being opened too eagerly. + +* Added lots of tests for various parts of the application. paperless-ng 0.9.2 ################## diff --git a/src/paperless/version.py b/src/paperless/version.py index a0e084253..90680d4b0 100644 --- a/src/paperless/version.py +++ b/src/paperless/version.py @@ -1 +1 @@ -__version__ = (0, 9, 2) +__version__ = (0, 9, 3) From a1f5ddede8f1558b031c83ec0f8349cbfc853e5a Mon Sep 17 00:00:00 2001 From: jonaswinkler Date: Fri, 27 Nov 2020 17:36:57 +0100 Subject: [PATCH 36/36] release script checklist --- scripts/make-release.sh | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/scripts/make-release.sh b/scripts/make-release.sh index 06548748b..3c3892aed 100755 --- a/scripts/make-release.sh +++ b/scripts/make-release.sh @@ -1,5 +1,19 @@ #!/bin/bash +# Release checklist +# - wait for travis build. +# adjust src/paperless/version.py +# changelog in the documentation +# adjust versions in docker/hub/* +# If docker-compose was modified: all compose files are the same. + +# Steps: +# run release script "dev", push +# if it works: new tag, merge into master +# on master: make release "latest", push +# on master: make release "version-tag", push +# publish release files + set -e