Compare commits

154 Commits
1.3.0 ... 2.0.0

Author SHA1 Message Date
Daniel Quinn
3b72d38440 Merge pull request #254 from danielquinn/mcronce-disable_encryption
Allow encryption to be disabled
2018-06-17 20:31:48 +01:00
Daniel Quinn
631d316985 Merge the storage_type migrations 2018-06-17 20:23:54 +01:00
Erik Arvstedt
742b01d1f5 Update Consumer class documentation 2018-06-17 20:17:40 +01:00
Daniel Quinn
d37aabfb06 Put imports at the top 2018-06-17 20:14:46 +01:00
Erik Arvstedt
b3624f6375 Improve password check
1. Fail when the db contains encrypted docs and no password is set.
   Previously, this case wasn't detected.

2. Exit with an error instead of showing warnings.
   This ensures that we never store docs with different encryption passwords.
2018-06-17 20:07:32 +01:00
Daniel Quinn
d6d8537b69 Remove emoji from storage-type changer 2018-06-17 17:23:50 +01:00
Daniel Quinn
90cd9f3eb7 Drop lines thanks to @erikarvstedt's eagle-eye 2018-06-17 17:10:45 +01:00
Daniel Quinn
a0240cace3 Update docs for new encryption toggle 2018-06-17 17:08:24 +01:00
Daniel Quinn
988adf963a Update import & export to handle encryption toggle 2018-06-17 17:06:22 +01:00
Daniel Quinn
3d188ec623 Fix migrations 2018-06-17 16:47:38 +01:00
Daniel Quinn
c9f35a7da2 Merge branch 'master' into mcronce-disable_encryption 2018-06-17 16:32:51 +01:00
Daniel Quinn
d5876cc97d Clean up the text a bit 2018-06-15 14:44:19 +01:00
Daniel Quinn
6235edf845 Merge pull request #367 from ahyear/patch-2
update docker-compose.env for mail Consumption
2018-06-15 14:42:33 +01:00
ahyear
9b00a98de3 update docker-compose.env for mail Consumption 2018-06-15 15:31:29 +02:00
Daniel Quinn
07e18e773a Merge branch 'erikarvstedt-fix-checkbox' 2018-06-01 18:39:03 +01:00
Erik Arvstedt
fc560b8c04 Fix unclickable checkbox in documents view
1. Clicks to the document selection checkbox were captured by the onclick
   handler of the document item header. This is now fixed.

2. Reexpose the doc title link to mouse events by putting it on top of
   the header link layer.
2018-06-01 14:07:34 +02:00
Daniel Quinn
c94f4dcc75 Merge branch 'erikarvstedt-document_field_added' 2018-06-01 07:55:39 +01:00
Daniel Quinn
9173bca3c7 Merge branch 'document_field_added' of git://github.com/erikarvstedt/paperless into erikarvstedt-document_field_added 2018-06-01 07:51:44 +01:00
Daniel Quinn
f2cf3a6a0f Merge branch 'master' of github.com:danielquinn/paperless 2018-06-01 07:50:31 +01:00
Daniel Quinn
d6346706db Merge pull request #360 from erikarvstedt/fix-incompatibility
Fix incompatibility with Python versions < 3.6
2018-06-01 07:46:50 +01:00
Erik Arvstedt
48738dab9f Fix incompatibility with Python versions < 3.6
Direct index access to a match was only added in 3.6.

Fixes #359
2018-06-01 00:45:59 +02:00
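
The incompatibility described in this commit can be illustrated with a minimal sketch. The pattern and input string are hypothetical and not taken from the Paperless code base; the only thing relied on is that subscripting a match object (`m[1]`) was added to the `re` module in Python 3.6, while `m.group(1)` works on older releases too.

```python
import re

# Hypothetical pattern and input, for illustration only.
m = re.match(r"(\d{4})-(\d{2})", "2018-06 invoice")

# Portable spelling: Match.group() works on Python 3.4 and 3.5 as well.
year = m.group(1)
month = m.group(2)

# Python >= 3.6 only: m[1] is shorthand for m.group(1).
# On 3.4/3.5 the subscript form raises TypeError, which is the kind of
# failure the commit message refers to.
```

Using `group()` everywhere keeps a single code path for all interpreter versions listed in the CI matrix.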
Erik Arvstedt
11db87fa11 Add field 'added' to documents
This field indicates when the document was added to the database
2018-05-31 10:17:03 +02:00
Daniel Quinn
1f7990d742 Add note about inotify 2018-05-28 13:11:19 +01:00
Daniel Quinn
52b32fddc9 Merge branch 'erikarvstedt-inotify' 2018-05-28 13:08:27 +01:00
Daniel Quinn
81a8cb45d7 It's exist_ok=, not exists_ok= -- my bad. 2018-05-28 13:08:00 +01:00
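
A minimal sketch of the keyword this commit corrects, assuming the call in question is `os.makedirs` (the commit message does not name the function). The directory path is hypothetical.

```python
import os
import tempfile

# Hypothetical directory layout, for illustration only.
base = tempfile.mkdtemp()
target = os.path.join(base, "consume", "inbox")

# Correct keyword: creating an existing directory is not an error.
os.makedirs(target, exist_ok=True)
os.makedirs(target, exist_ok=True)  # second call is a no-op

# The misspelled keyword is rejected before anything is created.
try:
    os.makedirs(target, exists_ok=True)
except TypeError as exc:
    misspelled_error = exc
```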
Daniel Quinn
9c583fe9f3 Merge branch 'inotify' of git://github.com/erikarvstedt/paperless into erikarvstedt-inotify 2018-05-28 13:03:06 +01:00
Daniel Quinn
a1cb67c4ce Don't check changed passphrase if no passphrase set 2018-05-28 12:58:28 +01:00
Daniel Quinn
c37f642cff Remove old Python2.7-style code 2018-05-28 12:58:28 +01:00
Daniel Quinn
9df06fbb12 Document the big changes for 2.0 2018-05-28 12:58:28 +01:00
Daniel Quinn
0abf637c67 Exclude unencrypted documents & thumbnails 2018-05-28 12:58:28 +01:00
Daniel Quinn
27a936f9bf Add script to (de|en)crypt all documents 2018-05-28 12:58:28 +01:00
Daniel Quinn
6e1f2b3f03 Drop STORAGE_TYPE in favour of just using PAPERLESS_PASSPHRASE 2018-05-28 12:58:28 +01:00
Daniel Quinn
5643d89270 Change default storage_type to unencrypted 2018-05-28 12:58:28 +01:00
Daniel Quinn
52b0249d71 Don't run document checks if table doesn't exist yet 2018-05-28 12:58:28 +01:00
Daniel Quinn
2ab2c37f5a Fix migration conflict 2018-05-28 12:58:28 +01:00
Daniel Quinn
f72fa43e86 Add check for changed password
These tests are incomplete, but I have no idea how to write the other
half.
2018-05-28 12:58:28 +01:00
Daniel Quinn
c0ad6cd58a Add "fat finger" check to password status 2018-05-28 12:58:28 +01:00
Daniel Quinn
b79caa64d0 Remove checks we weren't using 2018-05-28 12:58:28 +01:00
Daniel Quinn
e5b7e93eff Only require a passphrase if STORAGE_TYPE is not "unencrypted" 2018-05-28 12:58:28 +01:00
Daniel Quinn
d8740ee5ca Make the consumer aware of the different storage types 2018-05-28 12:58:28 +01:00
Daniel Quinn
cdc07cf153 Move the encrypt/decrypt decision out of db and into the view 2018-05-28 12:58:28 +01:00
Daniel Quinn
da6dc2ad5b Attach storage_type to Documents 2018-05-28 12:58:28 +01:00
Daniel Quinn
885dbf67d5 Set STORAGE_TYPE instead of ENABLE_ENCRYPTION boolean
This allows for future decisions around the types of encryption used (if any).  Ideally, I want to replace GPG one day with something elegant out of the cryptography module.
2018-05-28 12:58:28 +01:00
Daniel Quinn
02b40a54e0 Try to be more pep8 in the settings file 2018-05-28 12:58:28 +01:00
Mike Cronce
3b6a3219f5 src/paperless/db.py: If encryption is disabled, just directly read the file contents 2018-05-28 12:58:28 +01:00
Mike Cronce
8783c2af88 src/manage.py: Added check to see whether or not encryption is enabled before prompting for passphrase if it's empty 2018-05-28 12:58:28 +01:00
Mike Cronce
6cedbb3307 src/paperless/settings.py: Added DISABLE_ENCRYPTION environment variable 2018-05-28 12:58:28 +01:00
Daniel Quinn
4585308e7f Fix redirect for subpaths (hopefully) 2018-05-28 12:56:20 +01:00
Daniel Quinn
4386b09eb1 Code clean up 2018-05-28 12:56:06 +01:00
Erik Arvstedt
f96e7f7895 fixup: mention inotify in 'utilities.rst' 2018-05-22 01:22:41 +02:00
Erik Arvstedt
8218b1aa51 Documentation: Replace 'PDF' with 'document'
There are more supported file formats than just PDF.
2018-05-22 01:22:38 +02:00
Erik Arvstedt
0559204be4 fixup: require usage of PAPERLESS_EMAIL_SECRET 2018-05-21 12:11:56 +02:00
Erik Arvstedt
bccac5017c fixup: remove helper fn 'make_dirs' 2018-05-21 00:45:00 +02:00
Erik Arvstedt
3e8038577d fixup: break up complex if condition 2018-05-21 00:44:58 +02:00
Daniel Quinn
05b7bcd199 Minor dependency updates 2018-05-20 18:07:53 +01:00
Daniel Quinn
3a2a180607 Update for project status 2018-05-20 17:52:46 +01:00
Daniel Quinn
9690a00761 Add notes for #352 and #354 2018-05-20 17:28:10 +01:00
Daniel Quinn
3532745579 Allow the searching of documents by tag #354 2018-05-20 17:28:00 +01:00
Daniel Quinn
24bdc07e14 Merge pull request #352 from Strubbl/fix-unwanted-exit-in-docker-entrypoint.sh
fix bug where docker-entrypoint.sh exits w/o notice
2018-05-20 17:16:01 +01:00
Daniel Quinn
528b572855 Add hack to allow for logentries to show for all users. 2018-05-20 16:29:00 +01:00
Daniel Quinn
91ddfaa065 Include changelog notes for better clickable area. 2018-05-20 16:28:42 +01:00
Daniel Quinn
ac0cda861e Merge pull request #344 from erikarvstedt/increase_link_area
[Help needed] Increase link area in documents listing
2018-05-20 14:58:08 +01:00
Sven Fischer
a752a4a91a fix bug where docker-entrypoint.sh exits w/o notice
This commit fixes a nasty bug, where the docker-entrypoint.sh silently
exits without any error message. The test for a lock file can fail and
due to the `set -e` at the beginning of the file the bash script exits
without starting the paperless application.
It is fixed by moving the check for the existence of the lock file into
the if statement, where the `set -e` does not trigger an exit in case
the statement fails.

Additionally this commit enables the script to trap exit signals and in
that case deletes the lock file.
2018-05-15 19:34:21 +02:00
Erik Arvstedt
7e1d59377a Add inotify support 2018-05-11 14:14:50 +02:00
Erik Arvstedt
7357471b9e Consumer loop: make sleep duration dynamic
Make the sleep duration dynamic to account for the time spent in
loop_step.
This improves responsiveness when repeatedly consuming newly
arriving docs.

Use float epoch seconds (time.time()) as the time type for
MailFetcher.last_checked to allow for natural time arithmetic.
2018-05-11 14:14:50 +02:00
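
The idea of a dynamic sleep duration can be sketched as follows. This is an illustration under stated assumptions, not the consumer's actual code: the function signature, the `interval` and `iterations` parameters, and the injectable `sleep` and `clock` callables are all hypothetical. Only the names `loop` and `loop_step` and the use of `time.time()` come from the commit messages.

```python
import time


def loop(loop_step, interval, iterations, sleep=time.sleep, clock=time.time):
    """Call loop_step roughly once every `interval` seconds.

    The time spent inside loop_step is subtracted from the sleep, so a
    slow step does not delay the next one by a full interval.
    """
    for _ in range(iterations):
        started = clock()
        loop_step()
        elapsed = clock() - started
        # Never pass a negative duration to sleep().
        sleep(max(0.0, interval - elapsed))
```

Working in float epoch seconds keeps the arithmetic to plain subtraction, which matches the reasoning given for `MailFetcher.last_checked`.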
Erik Arvstedt
bd75a65866 Refactor: renamings, extract fn 'loop'
Renamings:
loop -> loop_step
delta -> next_mail_time (this variable names a point in time, not a duration)

Extracting the 'loop' fn is a preparation for later commits where a
second type of loop is added.
2018-05-11 14:14:25 +02:00
Erik Arvstedt
e65e27d11f Consider mtime of ignored files, garbage-collect ignore list
1. Store the mtime of ignored files so that we can reconsider them if
they have changed.

2. Regularly reset the ignore list to files that still exist in the
consumption dir. Previously, the list could grow indefinitely.
2018-05-11 14:05:30 +02:00
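
The two points above can be sketched as a small helper. The class name `IgnoreList` and its methods are hypothetical and do not appear in the source; the sketch only assumes that ignored paths are remembered together with their mtime.

```python
import os


class IgnoreList:
    """Hypothetical helper: remember ignored files together with their mtime."""

    def __init__(self):
        self._ignored = {}  # path -> mtime recorded when the file was ignored

    def add(self, path):
        self._ignored[path] = os.path.getmtime(path)

    def is_ignored(self, path):
        # A file whose mtime has changed since it was ignored is reconsidered.
        try:
            return self._ignored.get(path) == os.path.getmtime(path)
        except OSError:
            return False

    def collect_garbage(self, directory):
        # Keep only entries for files that still exist in the directory,
        # so the list cannot grow without bound.
        existing = {entry.path for entry in os.scandir(directory)}
        self._ignored = {
            path: mtime
            for path, mtime in self._ignored.items()
            if path in existing
        }
```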
Erik Arvstedt
12488c9634 Simplify ignoring docs 2018-05-11 14:05:29 +02:00
Erik Arvstedt
61cd050e24 Ensure docs have been unmodified for some time before consuming
Previously, the second mtime check for new files usually happened right
after the first one, which could have caused consumption of docs that
were still being modified.

We're now waiting for at least FILES_MIN_UNMODIFIED_DURATION (0.5s).

This also cleans up the logic by eliminating the consumer.stats attribute
and the weird double call to consumer.run().

Additionally, this fixes a memory leak in consumer.stats where paths could be
added but never removed if the corresponding files disappeared from
the consumer dir before being considered ready.
2018-05-11 14:05:29 +02:00
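
A minimal sketch of the readiness check described above. The constant name and its value (0.5 s) come from the commit message; the function `is_ready` and its `now` parameter are hypothetical.

```python
import os
import time

FILES_MIN_UNMODIFIED_DURATION = 0.5  # seconds, as stated in the commit message


def is_ready(path, now=None):
    """Return True once `path` has been unmodified for the minimum duration."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path) >= FILES_MIN_UNMODIFIED_DURATION
```

A file that a scanner is still writing to keeps getting a fresh mtime, so it only passes this check after the writes have stopped for at least half a second.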
Erik Arvstedt
f018e8e54f Refactor: extract fn try_consume_file
The main purpose of this change is to make the following commits more
readable.
2018-05-11 14:05:28 +02:00
Erik Arvstedt
a56a3eb86d Use os.scandir instead of os.listdir
It's simpler and better suited for use cases introduced in later commits.
2018-05-11 14:05:25 +02:00
Erik Arvstedt
2fe7df8ca0 Consume documents in order of increasing mtime
This increases overall usability, especially for multi-page scans.
Previously, the consumption order was undefined (see os.listdir())
2018-05-11 14:04:37 +02:00
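
Ordering by modification time can be sketched as below. The function name `files_by_mtime` is hypothetical, and the use of `os.scandir` is an assumption borrowed from the neighbouring commit rather than from this one.

```python
import os


def files_by_mtime(directory):
    """Return the paths of regular files in `directory`, oldest first."""
    entries = [entry for entry in os.scandir(directory) if entry.is_file()]
    entries.sort(key=lambda entry: entry.stat().st_mtime)
    return [entry.path for entry in entries]
```

For a multi-page scan written as one file per page, this yields the pages in the order the scanner produced them, whereas `os.listdir()` returns names in arbitrary order.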
Erik Arvstedt
873c98dddb Refactor: extract fn 'make_dirs' 2018-05-11 14:04:36 +02:00
Erik Arvstedt
ea287e0db2 Fix list out of bounds error in mail message parsing
Check list length before accessing the first two elements of
'dispositions'.
The list may have only a single element ('inline') or may be empty in
malformed emails.
2018-05-11 14:04:36 +02:00
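
The length check can be sketched as follows. The function name and the exact acceptance rule are hypothetical; only the variable name `dispositions` and the need to check its length before indexing come from the commit message.

```python
def is_named_attachment(content_disposition):
    """Hypothetical check on a Content-Disposition header value."""
    if not content_disposition:
        return False
    dispositions = [part.strip() for part in content_disposition.split(";")]
    # Guard first: the list may hold a single element ("inline") or nothing
    # useful at all in malformed mail.
    if len(dispositions) < 2:
        return False
    return (
        dispositions[0].lower() == "attachment"
        and "filename" in dispositions[1].lower()
    )
```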
Erik Arvstedt
4babfa1a5b Set default empty PAPERLESS_EMAIL_SECRET
Previously, if the user didn't set PAPERLESS_EMAIL_SECRET, Paperless
failed with an error in check_body() because self.SECRET was None.
2018-05-11 14:04:31 +02:00
Erik Arvstedt
aa2fc84d7f Mail fetching: Only catch internal errors
Previously, all errors raised during mail fetching were silently caught
and printed without backtrace.

To increase robustness and ease debugging, we now fail with a backtrace
on unexpected errors.
2018-05-11 14:01:09 +02:00
Erik Arvstedt
8d5ae64aff Increase link area in documents listing
Increase the link area to include the whole visual header.

Fixes #335
2018-05-11 13:50:09 +02:00
Daniel Quinn
82f9dde055 Account for KeyError problem in #345 2018-04-28 12:20:43 +01:00
Daniel Quinn
c983e73d0f Account for KeyError problem in #345 2018-04-28 12:19:53 +01:00
Daniel Quinn
20a4a66a57 Clean up test formatting a bit 2018-04-22 16:28:21 +01:00
Daniel Quinn
4ed1fff518 Remove old Python style 2018-04-22 16:28:03 +01:00
Daniel Quinn
7223ea3c3f Don't explode on invalid dates 2018-04-22 16:27:43 +01:00
Daniel Quinn
676c8f9fa7 Patch up thanks.md references 2018-04-22 16:11:58 +01:00
Daniel Quinn
00fd2268c5 Merge pull request #340 from CkuT/issue_334
Fix LogEntry user when PAPERLESS_DISABLE_LOGIN is set to true
2018-04-22 15:51:25 +01:00
CkuT
3aafabba26 Fix LogEntry user when PAPERLESS_DISABLE_LOGIN is set to true 2018-04-17 21:03:18 +02:00
Daniel Quinn
b733b32c1d Update lockfile 2018-04-16 09:53:39 +01:00
Daniel Quinn
4ba9514007 Revert root redirection 2018-04-16 09:53:31 +01:00
Daniel Quinn
4505711e4f Put this file where it's supposed to be 2018-04-15 13:41:08 +01:00
Daniel Quinn
63c394fa31 Document update for subdir support 2018-04-13 20:19:05 +01:00
Daniel Quinn
27c72a7bc6 Remove the hard-coding of the thumbnail URL 2018-04-13 20:18:16 +01:00
Daniel Quinn
72af13e4e4 Allow STATIC_URL and MEDIA_URL to be configurable via env 2018-04-13 20:18:00 +01:00
Daniel Quinn
6c8ef8f044 Use a named URL for the LOGIN_URL value 2018-04-13 20:17:31 +01:00
Daniel Quinn
9d4bebd569 Dependencies update 2018-04-13 19:52:11 +01:00
Daniel Quinn
101b7bb9bf Use a URL name for the redirect instead of a hard-coding 2018-04-13 19:45:14 +01:00
Daniel Quinn
52d6cf085d Fix links and grammar 2018-04-13 19:43:56 +01:00
Daniel Quinn
39ead59e45 Merge pull request #338 from Belonias/master
Greek Translation
2018-04-10 19:40:38 +01:00
Daniel Quinn
015c49030b Ignore .pytest_cache 2018-04-10 19:37:55 +01:00
Daniel Quinn
985b9428fe Add THANKS.md 2018-04-10 19:37:42 +01:00
Daniel Quinn
ea90bd3f84 Merge pull request #333 from erikarvstedt/fix-warnings
Fix runtime warning when adding log entries
2018-04-03 15:38:22 +01:00
Belonias
fccc95254b final(minor changes) 2018-04-01 22:39:40 +03:00
Belonias
e266e114a9 final 2018-04-01 22:34:16 +03:00
Belonias
19faed3634 update 2018-03-31 21:10:45 +03:00
Erik Arvstedt
fcdcf62c2c Fix runtime warning when adding log entries
LogEntry.action_time expects a Django timezone object instead of a builtin datetime.

This fixes a runtime warning of the following kind:
RuntimeWarning: DateTimeField LogEntry.action_time received a naive datetime (2018-03-28 20:53:01.714173) while time zone support is active.
2018-03-30 00:15:52 +02:00
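
The difference between a naive and a timezone-aware datetime can be shown with the standard library alone. This sketch is a stand-in and does not use Django; in a Django project with time zone support active, `django.utils.timezone.now()` returns an aware value. The helper `is_aware` is hypothetical.

```python
from datetime import datetime, timezone

naive = datetime.now()              # no tzinfo: the kind that triggers the warning
aware = datetime.now(timezone.utc)  # carries an explicit UTC offset


def is_aware(value):
    """Return True if `value` carries usable time zone information."""
    return value.tzinfo is not None and value.utcoffset() is not None
```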
Daniel Quinn
68251b8be6 Add notes for #328 2018-03-23 11:20:20 +00:00
Daniel Quinn
8e63388833 Merge pull request #328 from erikarvstedt/master
Use --noreload for permanent server start commands
2018-03-23 19:17:47 +08:00
Erik Arvstedt
1f2079f65a Use --noreload for permanent server start commands
Without it, the server is highly resource-intensive even when
running idle
2018-03-23 11:13:20 +01:00
Daniel Quinn
f61fa06993 Add new consumption logging feature 2018-03-19 12:59:10 +00:00
Daniel Quinn
da1d3820ec Merge pull request #326 from CkuT/new_document_list
New imported documents list
2018-03-19 12:50:18 +00:00
Daniel Quinn
f778d3a6e3 Remove last remnants of PAPERLESS_SHARED_SECRET 2018-03-18 14:08:41 +00:00
Daniel Quinn
96a94c4ee9 Remove superfluous import 2018-03-18 14:08:29 +00:00
Belonias
b126c6b0ff ... 2018-03-13 20:31:44 +02:00
CkuT
1d162dc769 Add test case 2018-03-13 19:27:59 +01:00
CkuT
a1f257369d Use datetime.now() instead of document.created for LogEntry action_time 2018-03-13 19:09:48 +01:00
CkuT
45e18d7094 Add LogEntry after document consumption
See #319
2018-03-11 17:09:43 +01:00
Belonias
d6fe17f4c6 2nd commit 2018-03-10 11:19:49 +02:00
Daniel Quinn
93bed91937 Merge pull request #325 from jakewins/patch-1
Add curl as dependency in docker container
2018-03-07 10:32:29 +01:00
Jacob Hansson
10d22abd8f Add curl as dependency in docker container
The health check in `docker-compose.yml` uses curl, but the `alpine:3.7` image this Dockerfile builds on doesn't include curl, leading to the health check failing:

    {
        "Start": "2018-03-06T20:48:57.293359619-06:00",
        "End": "2018-03-06T20:48:57.388576132-06:00",
        "ExitCode": -1,
        "Output": "OCI runtime exec failed: exec failed: container_linux.go:348: starting container process caused \"exec: \\\"curl\\\": executable file not found in $PATH\": unknown"
    }

This should be reproducible by simply following the docker-based installation instructions; without this change my `docker-compose up` fails because the health check fails. This change made the docker installation instructions work for me.
2018-03-06 20:57:57 -06:00
Belonias
fdb50d0446 first pass 2018-03-06 17:36:06 +02:00
Daniel Quinn
79bdd829ea Update changelog to reflect #322 2018-03-05 15:44:29 +00:00
Daniel Quinn
1c4226d27c Merge pull request #320 from ovv/detected-document-date-log-iso
Log detected document date with isoformat
2018-03-05 12:37:28 +01:00
Daniel Quinn
b437803321 Merge pull request #321 from RidaAyed/patch-1
Update setup.rst
2018-03-05 12:35:45 +01:00
Daniel Quinn
b6a266a4f7 Merge pull request #322 from Arendtsen/master
Added support for paperless.conf in /usr/local/etc
2018-03-05 12:34:40 +01:00
Martin Arendtsen
b7f1561217 Added support for paperless.conf in /usr/local/etc 2018-03-04 21:37:04 +01:00
Rida Ayed
abbd4d772c Update setup.rst
Fixed typo
2018-03-04 16:30:29 +01:00
Ovv
75ac8d2796 Log detected document date with isoformat 2018-03-04 13:10:49 +01:00
Daniel Quinn
41f816a29b Add changelog data for the new consumer options 2018-03-03 18:49:33 +00:00
Daniel Quinn
4a25e9655c Merge branch 'ovv-consumer-cli-args' 2018-03-03 18:43:41 +00:00
Daniel Quinn
d0252e8e44 Run a --oneshot loop twice
This was necessary since the first loop only ever collects file
statistics so that the second run can be sure about "readiness".
2018-03-03 18:43:20 +00:00
Daniel Quinn
73e62600c2 Clean up docstring to be properly rst 2018-03-03 18:43:20 +00:00
Ovv
5c43041610 fix typo 2018-03-03 18:43:20 +00:00
Ovv
f56dafe7d9 Help & documentation 2018-03-03 18:43:20 +00:00
Ovv
7a1754fffd remove consume env var from pytest.ini 2018-03-03 18:43:20 +00:00
Ovv
f8c6c07bb7 use tmp dir 2018-03-03 18:43:20 +00:00
Ovv
8fefafb844 style & test 2018-03-03 18:43:20 +00:00
Ovv
d1a57b5d68 Configuration cli argument for document_consumer 2018-03-03 18:43:20 +00:00
Daniel Quinn
3adccc0bdb Add affiliated projects 2018-03-02 16:27:53 +00:00
Daniel Quinn
6058630360 Create CODE_OF_CONDUCT.md 2018-02-28 01:48:02 +00:00
Daniel Quinn
81d92fb4ad Merge pull request #316 from ovv/tox
Add back tox
2018-02-27 17:51:20 +01:00
Ovv
40cb0190fc Sphinx warning pngpath to imgpath 2018-02-27 13:48:49 +01:00
Ovv
e5ebd84eca travis doc path 2018-02-27 13:37:01 +01:00
Ovv
673b4cf911 remove double python 3.6 2018-02-27 13:31:05 +01:00
Ovv
ea1260c2ce Add documentation testing 2018-02-27 13:30:02 +01:00
Ovv
7040d13f76 Add back tox 2018-02-27 12:27:21 +01:00
Daniel Quinn
ed36070e92 Merge pull request #315 from ovv/readme-links
Add links in readme badge (doc & travis)
2018-02-26 19:48:58 +01:00
Ovv
c88a7646e5 found the gitter link 2018-02-26 19:22:55 +01:00
Ovv
d174390624 Add links in readme badge (doc & travis) 2018-02-26 19:19:57 +01:00
Daniel Quinn
5bb90cb63d Merge pull request #314 from danielquinn/add-coveralls
Add Coveralls
2018-02-26 12:01:55 +01:00
Daniel Quinn
c4bbb71a3b Have pytest generate the coverage files 2018-02-25 16:42:15 +00:00
Daniel Quinn
7d6cae96f3 Change directory in [script] 2018-02-25 16:08:47 +00:00
Daniel Quinn
b1e616055e Add Pipfile 2018-02-25 15:57:32 +00:00
Daniel Quinn
eec8f09d6f Add CI changes to changelog 2018-02-25 15:56:53 +00:00
Daniel Quinn
c68a6d78eb Add coveralls badge to README 2018-02-25 15:54:55 +00:00
Daniel Quinn
acd3cc5062 Start generating requirements.txt from Pipfile 2018-02-25 15:52:32 +00:00
Daniel Quinn
a55a915439 Consolidate CI tools into setup.cfg and drop tox 2018-02-25 15:51:59 +00:00
55 changed files with 1834 additions and 381 deletions

1
.gitattributes vendored Normal file

@@ -0,0 +1 @@
THANKS.md merge=union

5
.gitignore vendored

@@ -42,6 +42,7 @@ htmlcov/
nosetests.xml
coverage.xml
*,cover
+.pytest_cache
# Translations
*.mo
@@ -58,8 +59,8 @@ target/
# Stored PDFs
media/documents/*.gpg
-media/documents/thumbnails/*.gpg
-media/documents/originals/*.gpg
+media/documents/thumbnails/*
+media/documents/originals/*
# Sqlite database
db.sqlite3

View File

@@ -9,16 +9,17 @@ sudo: false
matrix:
include:
- python: 3.4
env: TOXENV=py34
- python: 3.5
env: TOXENV=py35
- python: 3.6
env: TOXENV=py36
- python: 3.6
env: TOXENV=pycodestyle
install:
- pip install --requirement requirements.txt
- pip install tox
- pip install sphinx
script:
- cd src/
- pytest --cov
- pycodestyle
- sphinx-build -b html ../docs ../docs/_build -W
script: tox -c src/tox.ini
after_success:
- coveralls

46
CODE_OF_CONDUCT.md Normal file

@@ -0,0 +1,46 @@
# Contributor Covenant Code of Conduct
## Our Pledge
In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation.
## Our Standards
Examples of behavior that contributes to creating a positive environment include:
* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members
Examples of unacceptable behavior by participants include:
* Unwelcome sexual attention or advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a professional setting
## Our Responsibilities
Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior.
Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.
## Scope
This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at code@danielquinn.org. The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.
Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at [http://contributor-covenant.org/version/1/4][version]
[homepage]: http://contributor-covenant.org
[version]: http://contributor-covenant.org/version/1/4/

View File

@@ -17,7 +17,7 @@ ENV PAPERLESS_EXPORT_DIR=/export \
# Install dependencies
RUN apk --no-cache --update add \
-python3 gnupg libmagic bash shadow \
+python3 gnupg libmagic bash shadow curl \
sudo poppler tesseract-ocr imagemagick ghostscript unpaper && \
apk --no-cache add --virtual .build-dependencies \
python3-dev poppler-dev gcc g++ musl-dev zlib-dev jpeg-dev && \

37
Pipfile Normal file

@@ -0,0 +1,37 @@
[[source]]
url = "https://pypi.python.org/simple"
verify_ssl = true
name = "pypi"
[packages]
django = "<2.0,>=1.11"
pillow = "*"
coveralls = "*"
dateparser = "*"
django-crispy-forms = "*"
django-extensions = "*"
django-filter = "*"
django-flat-responsive = "*"
djangorestframework = "*"
factory-boy = "*"
"flake8" = "*"
filemagic = "*"
fuzzywuzzy = {extras = ["speedup"], version = "==0.15.0"}
gunicorn = "*"
langdetect = "*"
pdftotext = "*"
pyocr = "*"
python-dateutil = "*"
python-dotenv = "*"
python-gnupg = "*"
pytz = "*"
pycodestyle = "*"
pytest = "*"
pytest-cov = "*"
pytest-django = "*"
pytest-sugar = "*"
pytest-env = "*"
pytest-xdist = "*"
[dev-packages]
ipython = "*"

594
Pipfile.lock generated Normal file

@@ -0,0 +1,594 @@
{
"_meta": {
"hash": {
"sha256": "928fbb4c8952128aef7a2ed2707ce510d31d49df96cfc5f08959698edff6e67f"
},
"pipfile-spec": 6,
"requires": {},
"sources": [
{
"name": "pypi",
"url": "https://pypi.python.org/simple",
"verify_ssl": true
}
]
},
"default": {
"apipkg": {
"hashes": [
"sha256:2e38399dbe842891fe85392601aab8f40a8f4cc5a9053c326de35a1cc0297ac6",
"sha256:65d2aa68b28e7d31233bb2ba8eb31cda40e4671f8ac2d6b241e358c9652a74b9"
],
"version": "==1.4"
},
"attrs": {
"hashes": [
"sha256:1c7960ccfd6a005cd9f7ba884e6316b5e430a3f1a6c37c5f87d8b43f83b54ec9",
"sha256:a17a9573a6f475c99b551c0e0a812707ddda1ec9653bed04c13841404ed6f450"
],
"version": "==17.4.0"
},
"certifi": {
"hashes": [
"sha256:14131608ad2fd56836d33a71ee60fa1c82bc9d2c8d98b7bdbc631fe1b3cd1296",
"sha256:edbc3f203427eef571f79a7692bb160a2b0f7ccaa31953e99bd17e307cf63f7d"
],
"version": "==2018.1.18"
},
"chardet": {
"hashes": [
"sha256:84ab92ed1c4d4f16916e05906b6b75a6c0fb5db821cc65e70cbd64a3e2a5eaae",
"sha256:fc323ffcaeaed0e0a02bf4d117757b98aed530d9ed4531e3e15460124c106691"
],
"version": "==3.0.4"
},
"coverage": {
"hashes": [
"sha256:03481e81d558d30d230bc12999e3edffe392d244349a90f4ef9b88425fac74ba",
"sha256:0b136648de27201056c1869a6c0d4e23f464750fd9a9ba9750b8336a244429ed",
"sha256:104ab3934abaf5be871a583541e8829d6c19ce7bde2923b2751e0d3ca44db60a",
"sha256:15b111b6a0f46ee1a485414a52a7ad1d703bdf984e9ed3c288a4414d3871dcbd",
"sha256:198626739a79b09fa0a2f06e083ffd12eb55449b5f8bfdbeed1df4910b2ca640",
"sha256:1c383d2ef13ade2acc636556fd544dba6e14fa30755f26812f54300e401f98f2",
"sha256:28b2191e7283f4f3568962e373b47ef7f0392993bb6660d079c62bd50fe9d162",
"sha256:2eb564bbf7816a9d68dd3369a510be3327f1c618d2357fa6b1216994c2e3d508",
"sha256:337ded681dd2ef9ca04ef5d93cfc87e52e09db2594c296b4a0a3662cb1b41249",
"sha256:3a2184c6d797a125dca8367878d3b9a178b6fdd05fdc2d35d758c3006a1cd694",
"sha256:3c79a6f7b95751cdebcd9037e4d06f8d5a9b60e4ed0cd231342aa8ad7124882a",
"sha256:3d72c20bd105022d29b14a7d628462ebdc61de2f303322c0212a054352f3b287",
"sha256:3eb42bf89a6be7deb64116dd1cc4b08171734d721e7a7e57ad64cc4ef29ed2f1",
"sha256:4635a184d0bbe537aa185a34193898eee409332a8ccb27eea36f262566585000",
"sha256:56e448f051a201c5ebbaa86a5efd0ca90d327204d8b059ab25ad0f35fbfd79f1",
"sha256:5a13ea7911ff5e1796b6d5e4fbbf6952381a611209b736d48e675c2756f3f74e",
"sha256:69bf008a06b76619d3c3f3b1983f5145c75a305a0fea513aca094cae5c40a8f5",
"sha256:6bc583dc18d5979dc0f6cec26a8603129de0304d5ae1f17e57a12834e7235062",
"sha256:701cd6093d63e6b8ad7009d8a92425428bc4d6e7ab8d75efbb665c806c1d79ba",
"sha256:7608a3dd5d73cb06c531b8925e0ef8d3de31fed2544a7de6c63960a1e73ea4bc",
"sha256:76ecd006d1d8f739430ec50cc872889af1f9c1b6b8f48e29941814b09b0fd3cc",
"sha256:7aa36d2b844a3e4a4b356708d79fd2c260281a7390d678a10b91ca595ddc9e99",
"sha256:7d3f553904b0c5c016d1dad058a7554c7ac4c91a789fca496e7d8347ad040653",
"sha256:7e1fe19bd6dce69d9fd159d8e4a80a8f52101380d5d3a4d374b6d3eae0e5de9c",
"sha256:8c3cb8c35ec4d9506979b4cf90ee9918bc2e49f84189d9bf5c36c0c1119c6558",
"sha256:9d6dd10d49e01571bf6e147d3b505141ffc093a06756c60b053a859cb2128b1f",
"sha256:9e112fcbe0148a6fa4f0a02e8d58e94470fc6cb82a5481618fea901699bf34c4",
"sha256:ac4fef68da01116a5c117eba4dd46f2e06847a497de5ed1d64bb99a5fda1ef91",
"sha256:b8815995e050764c8610dbc82641807d196927c3dbed207f0a079833ffcf588d",
"sha256:be6cfcd8053d13f5f5eeb284aa8a814220c3da1b0078fa859011c7fffd86dab9",
"sha256:c1bb572fab8208c400adaf06a8133ac0712179a334c09224fb11393e920abcdd",
"sha256:de4418dadaa1c01d497e539210cb6baa015965526ff5afc078c57ca69160108d",
"sha256:e05cb4d9aad6233d67e0541caa7e511fa4047ed7750ec2510d466e806e0255d6",
"sha256:e4d96c07229f58cb686120f168276e434660e4358cc9cf3b0464210b04913e77",
"sha256:f3f501f345f24383c0000395b26b726e46758b71393267aeae0bd36f8b3ade80",
"sha256:f8a923a85cb099422ad5a2e345fe877bbc89a8a8b23235824a93488150e45f6e"
],
"version": "==4.5.1"
},
"coveralls": {
"hashes": [
"sha256:32569a43c9dbc13fa8199247580a4ab182ef439f51f65bb7f8316d377a1340e8",
"sha256:664794748d2e5673e347ec476159a9d87f43e0d2d44950e98ed0e27b98da8346"
],
"index": "pypi",
"version": "==1.3.0"
},
"dateparser": {
"hashes": [
"sha256:940828183c937bcec530753211b70f673c0a9aab831e43273489b310538dff86",
"sha256:b452ef8b36cd78ae86a50721794bc674aa3994e19b570f7ba92810f4e0a2ae03"
],
"index": "pypi",
"version": "==0.7.0"
},
"django": {
"hashes": [
"sha256:056fe5b9e1f8f7fed9bb392919d64f6b33b3a71cfb0f170a90ee277a6ed32bc2",
"sha256:4d398c7b02761e234bbde490aea13ea94cb539ceeb72805b72303f348682f2eb"
],
"index": "pypi",
"version": "==1.11.12"
},
"django-crispy-forms": {
"hashes": [
"sha256:5952bab971110d0b86c278132dae0aa095beee8f723e625c3d3fa28888f1675f",
"sha256:705ededc554ad8736157c666681165fe22ead2dec0d5446d65fc9dd976a5a876"
],
"index": "pypi",
"version": "==1.7.2"
},
"django-extensions": {
"hashes": [
"sha256:37a543af370ee3b0721ff50442d33c357dd083e6ea06c5b94a199283b6f9e361",
"sha256:bc9f2946c117bb2f49e5e0633eba783787790ae810ea112fe7fd82fa64de2ff1"
],
"index": "pypi",
"version": "==2.0.6"
},
"django-filter": {
"hashes": [
"sha256:ea204242ea83790e1512c9d0d8255002a652a6f4986e93cee664f28955ba0c22",
"sha256:ec0ef1ba23ef95b1620f5d481334413700fb33f45cd76d56a63f4b0b1d76976a"
],
"index": "pypi",
"version": "==1.1.0"
},
"django-flat-responsive": {
"hashes": [
"sha256:451caa2700c541b52fb7ce2d34d3d8dee9e980cf29f5463bc8a8c6256a1a6474"
],
"index": "pypi",
"version": "==2.0"
},
"djangorestframework": {
"hashes": [
"sha256:b6714c3e4b0f8d524f193c91ecf5f5450092c2145439ac2769711f7eba89a9d9",
"sha256:c375e4f95a3a64fccac412e36fb42ba36881e52313ec021ef410b40f67cddca4"
],
"index": "pypi",
"version": "==3.8.2"
},
"docopt": {
"hashes": [
"sha256:49b3a825280bd66b3aa83585ef59c4a8c82f2c8a522dbe754a8bc8d08c85c491"
],
"version": "==0.6.2"
},
"execnet": {
"hashes": [
"sha256:a7a84d5fa07a089186a329528f127c9d73b9de57f1a1131b82bb5320ee651f6a",
"sha256:fc155a6b553c66c838d1a22dba1dc9f5f505c43285a878c6f74a79c024750b83"
],
"version": "==1.5.0"
},
"factory-boy": {
"hashes": [
"sha256:bd5a096d0f102d79b6c78cef1c8c0b650f2e1a3ecba351c735c6d2df8dabd29c",
"sha256:be2abc8092294e4097935a29b4e37f5b9ed3e4205e2e32df215c0315b625995e"
],
"index": "pypi",
"version": "==2.10.0"
},
"faker": {
"hashes": [
"sha256:226d8fa67a8cf8b4007aab721f67639f130e9cfdc53a7095a2290ebb07a65c71",
"sha256:48fed4b4a191e2b42ad20c14115f1c6d36d338b80192075d7573f0f42d7fb321"
],
"version": "==0.8.13"
},
"filemagic": {
"hashes": [
"sha256:e684359ef40820fe406f0ebc5bf8a78f89717bdb7fed688af68082d991d6dbf3"
],
"index": "pypi",
"version": "==1.6"
},
"flake8": {
"hashes": [
"sha256:7253265f7abd8b313e3892944044a365e3f4ac3fcdcfb4298f55ee9ddf188ba0",
"sha256:c7841163e2b576d435799169b78703ad6ac1bbb0f199994fc05f700b2a90ea37"
],
"index": "pypi",
"version": "==3.5.0"
},
"fuzzywuzzy": {
"hashes": [
"sha256:3759bc6859daa0eecef8c82b45404bdac20c23f23136cf4c18b46b426bbc418f",
"sha256:5b36957ccf836e700f4468324fa80ba208990385392e217be077d5cd738ae602"
],
"index": "pypi",
"version": "==0.15.0"
},
"gunicorn": {
"hashes": [
"sha256:75af03c99389535f218cc596c7de74df4763803f7b63eb09d77e92b3956b36c6",
"sha256:eee1169f0ca667be05db3351a0960765620dad53f53434262ff8901b68a1b622"
],
"index": "pypi",
"version": "==19.7.1"
},
"idna": {
"hashes": [
"sha256:2c6a5de3089009e3da7c5dde64a141dbc8551d5b7f6cf4ed7c2568d0cc520a8f",
"sha256:8c7309c718f94b3a625cb648ace320157ad16ff131ae0af362c9f21b80ef6ec4"
],
"version": "==2.6"
},
"langdetect": {
"hashes": [
"sha256:91a170d5f0ade380db809b3ba67f08e95fe6c6c8641f96d67a51ff7e98a9bf30"
],
"index": "pypi",
"version": "==1.0.7"
},
"mccabe": {
"hashes": [
"sha256:ab8a6258860da4b6677da4bd2fe5dc2c659cff31b3ee4f7f5d64e79735b80d42",
"sha256:dd8d182285a0fe56bace7f45b5e7d1a6ebcbf524e8f3bd87eb0f125271b8831f"
],
"version": "==0.6.1"
},
"more-itertools": {
"hashes": [
"sha256:0dd8f72eeab0d2c3bd489025bb2f6a1b8342f9b198f6fc37b52d15cfa4531fea",
"sha256:11a625025954c20145b37ff6309cd54e39ca94f72f6bb9576d1195db6fa2442e",
"sha256:c9ce7eccdcb901a2c75d326ea134e0886abfbea5f93e91cc95de9507c0816c44"
],
"version": "==4.1.0"
},
"pdftotext": {
"hashes": [
"sha256:0b82a9fd255a3f2bf5c861cf9e3174d3c4223e1e441bb060c611dcb4e65c6cb8"
],
"index": "pypi",
"version": "==2.0.2"
},
"pillow": {
"hashes": [
"sha256:00633bc2ec40313f4daf351855e506d296ec3c553f21b66720d0f1225ca84c6f",
"sha256:03514478db61b034fc5d38b9bf060f994e5916776e93f02e59732a8270069c61",
"sha256:040144ba422216aecf7577484865ade90e1a475f867301c48bf9fbd7579efd76",
"sha256:16246261ff22368e5e32ad74d5ef40403ab6895171a7fc6d34f6c17cfc0f1943",
"sha256:1cb38df69362af35c14d4a50123b63c7ff18ec9a6d4d5da629a6f19d05e16ba8",
"sha256:2400e122f7b21d9801798207e424cbe1f716cee7314cd0c8963fdb6fc564b5fb",
"sha256:2ee6364b270b56a49e8b8a51488e847ab130adc1220c171bed6818c0d4742455",
"sha256:3b4560c3891b05022c464b09121bd507c477505a4e19d703e1027a3a7c68d896",
"sha256:41374a6afb3f44794410dab54a0d7175e6209a5a02d407119c81083f1a4c1841",
"sha256:438a3faf5f702c8d0f80b9f9f9b8382cfa048ca6a0d64ef71b86b563b0ee0359",
"sha256:472a124c640bde4d5468f6991c9fa7e30b723d84ac4195a77c6ab6aea30f2b9c",
"sha256:4d32c8e3623a61d6e29ccd024066cd1ba556555abfb4cd714155020e00107e3f",
"sha256:4d8077fd649ac40a5c4165f2c22fa2a4ad18c668e271ecb2f9d849d1017a9313",
"sha256:62ec7ae98357fcd46002c110bb7cad15fce532776f0cbe7ca1d44c49b837d49d",
"sha256:6c7cab6a05351cf61e469937c49dbf3cdf5ffb3eeac71f8d22dc9be3507598d8",
"sha256:6eca36905444c4b91fe61f1b9933a47a30480738a1dd26501ff67d94fc2bc112",
"sha256:74e2ebfd19c16c28ad43b8a28ff73b904ed382ea4875188838541751986e8c9a",
"sha256:7673e7473a13107059377c96c563aa36f73184c29d2926882e0a0210b779a1e7",
"sha256:81762cf5fca9a82b53b7b2d0e6b420e0f3b06167b97678c81d00470daa622d58",
"sha256:8554bbeb4218d9cfb1917c69e6f2d2ad0be9b18a775d2162547edf992e1f5f1f",
"sha256:9b66e968da9c4393f5795285528bc862c7b97b91251f31a08004a3c626d18114",
"sha256:a00edb2dec0035e98ac3ec768086f0b06dfabb4ad308592ede364ef573692f55",
"sha256:b48401752496757e95304a46213c3155bc911ac884bed2e9b275ce1c1df3e293",
"sha256:b6cf18f9e653a8077522bb3aa753a776b117e3e0cc872c25811cfdf1459491c2",
"sha256:bb8adab1877e9213385cbb1adc297ed8337e01872c42a30cfaa66ff8c422779c",
"sha256:c8a4b39ba380b57a31a4b5449a9d257b1302d8bc4799767e645dcee25725efe1",
"sha256:cee9bc75bff455d317b6947081df0824a8f118de2786dc3d74a3503fd631f4ef",
"sha256:d0dc1313dff48af64517cbbd85e046d6b477fbe5e9d69712801f024dcb08c62b",
"sha256:d5bf527ed83617edd1855a5c923eeeaf68bcb9ac0ceb28e3f19b575b3a424984",
"sha256:df5863a21f91de5ecdf7d32a32f406dd9867ebb35d41033b8bd9607a21887599",
"sha256:e39142332541ed2884c257495504858b22c078a5d781059b07aba4c3a80d7551",
"sha256:e52e8f675ba0b2b417fa98579e7286a41a8e23871f17f4793772f5aa884fea79",
"sha256:e6dd55d5d94b9e36929325dd0c9ab85bfde84a5fc35947c334c32af1af668944",
"sha256:e87cc1acbebf263f308a8494272c2d42016aa33c32bf14d209c81e1f65e11868",
"sha256:ea0091cd4100519cedfeea2c659f52291f535ac6725e2368bcf59e874f270efa",
"sha256:eeb247f4f4d962942b3b555530b0c63b77473c7bfe475e51c6b75b7344b49ce3",
"sha256:f0d4433adce6075efd24fc0285135248b0b50f5a58129c7e552030e04fe45c7f",
"sha256:f1f3bd92f8e12dc22884935a73c9f94c4d9bd0d34410c456540713d6b7832b8c",
"sha256:f42a87cbf50e905f49f053c0b1fb86c911c730624022bf44c8857244fc4cdaca",
"sha256:f5f302db65e2e0ae96e26670818157640d3ca83a3054c290eff3631598dcf819",
"sha256:f7634d534662bbb08976db801ba27a112aee23e597eeaf09267b4575341e45bf",
"sha256:fdd374c02e8bb2d6468a85be50ea66e1c4ef9e809974c30d8576728473a6ed03",
"sha256:fe6931db24716a0845bd8c8915bd096b77c2a7043e6fc59ae9ca364fe816f08b"
],
"index": "pypi",
"version": "==5.1.0"
},
"pluggy": {
"hashes": [
"sha256:714306e9b9a7b24ee4c1e3ff6463d7f652cdd30f4693121b31572e2fe1fdaea3",
"sha256:7f8ae7f5bdf75671a718d2daf0a64b7885f74510bcd98b1a0bb420eb9a9d0cff",
"sha256:d345c8fe681115900d6da8d048ba67c25df42973bda370783cd58826442dcd7c",
"sha256:e160a7fcf25762bb60efc7e171d4497ff1d8d2d75a3d0df7a21b76821ecbf5c5"
],
"version": "==0.6.0"
},
"py": {
"hashes": [
"sha256:29c9fab495d7528e80ba1e343b958684f4ace687327e6f789a94bf3d1915f881",
"sha256:983f77f3331356039fdd792e9220b7b8ee1aa6bd2b25f567a963ff1de5a64f6a"
],
"version": "==1.5.3"
},
"pycodestyle": {
"hashes": [
"sha256:1ec08a51c901dfe44921576ed6e4c1f5b7ecbad403f871397feedb5eb8e4fa14",
"sha256:5ff2fbcbab997895ba9ead77e1b38b3ebc2e5c3b8a6194ef918666e4c790a00e",
"sha256:682256a5b318149ca0d2a9185d365d8864a768a28db66a84a2ea946bcc426766",
"sha256:6c4245ade1edfad79c3446fadfc96b0de2759662dc29d07d80a6f27ad1ca6ba9"
],
"index": "pypi",
"version": "==2.3.1"
},
"pyflakes": {
"hashes": [
"sha256:08bd6a50edf8cffa9fa09a463063c425ecaaf10d1eb0335a7e8b1401aef89e6f",
"sha256:8d616a382f243dbf19b54743f280b80198be0bca3a5396f1d2e1fca6223e8805"
],
"version": "==1.6.0"
},
"pyocr": {
"hashes": [
"sha256:9ee8b5f38dd966ca531115fc5fe4715f7fa8961a9f14cd5109c2d938c17a2043"
],
"index": "pypi",
"version": "==0.5.1"
},
"pytest": {
"hashes": [
"sha256:6266f87ab64692112e5477eba395cfedda53b1933ccd29478e671e73b420c19c",
"sha256:fae491d1874f199537fd5872b5e1f0e74a009b979df9d53d1553fd03da1703e1"
],
"index": "pypi",
"version": "==3.5.0"
},
"pytest-cov": {
"hashes": [
"sha256:03aa752cf11db41d281ea1d807d954c4eda35cfa1b21d6971966cc041bbf6e2d",
"sha256:890fe5565400902b0c78b5357004aab1c814115894f4f21370e2433256a3eeec"
],
"index": "pypi",
"version": "==2.5.1"
},
"pytest-django": {
"hashes": [
"sha256:534505e0261cc566279032d9d887f844235342806fd63a6925689670fa1b29d7",
"sha256:7501942093db2250a32a4e36826edfc542347bb9b26c78ed0649cdcfd49e5789"
],
"index": "pypi",
"version": "==3.2.1"
},
"pytest-env": {
"hashes": [
"sha256:7e94956aef7f2764f3c147d216ce066bf6c42948bb9e293169b1b1c880a580c2"
],
"index": "pypi",
"version": "==0.6.2"
},
"pytest-forked": {
"hashes": [
"sha256:e4500cd0509ec4a26535f7d4112a8cc0f17d3a41c29ffd4eab479d2a55b30805",
"sha256:f275cb48a73fc61a6710726348e1da6d68a978f0ec0c54ece5a5fae5977e5a08"
],
"version": "==0.2"
},
"pytest-sugar": {
"hashes": [
"sha256:ab8cc42faf121344a4e9b13f39a51257f26f410e416c52ea11078cdd00d98a2c"
],
"index": "pypi",
"version": "==0.9.1"
},
"pytest-xdist": {
"hashes": [
"sha256:be2662264b035920ba740ed6efb1c816a83c8a22253df7766d129f6a7bfdbd35",
"sha256:e8f5744acc270b3e7d915bdb4d5f471670f049b6fbd163d4cbd52203b075d30f"
],
"index": "pypi",
"version": "==1.22.2"
},
"python-dateutil": {
"hashes": [
"sha256:3220490fb9741e2342e1cf29a503394fdac874bc39568288717ee67047ff29df",
"sha256:9d8074be4c993fbe4947878ce593052f71dac82932a677d49194d8ce9778002e"
],
"index": "pypi",
"version": "==2.7.2"
},
"python-dotenv": {
"hashes": [
"sha256:4965ed170bf51c347a89820e8050655e9c25db3837db6602e906b6d850fad85c",
"sha256:509736185257111613009974e666568a1b031b028b61b500ef1ab4ee780089d5"
],
"index": "pypi",
"version": "==0.8.2"
},
"python-gnupg": {
"hashes": [
"sha256:38f18712b7cfdd0d769bc88a21e90138154b9be2cbffb1e7d28bc37ee73a1c47",
"sha256:5a54a6dd25bf78d3758dd7a1864f4efd122f9ca9402101d90e3ec4483ceafb73"
],
"index": "pypi",
"version": "==0.4.2"
},
"python-levenshtein": {
"hashes": [
"sha256:033a11de5e3d19ea25c9302d11224e1a1898fe5abd23c61c7c360c25195e3eb1"
],
"version": "==0.12.0"
},
"pytz": {
"hashes": [
"sha256:65ae0c8101309c45772196b21b74c46b2e5d11b6275c45d251b150d5da334555",
"sha256:c06425302f2cf668f1bba7a0a03f3c1d34d4ebeef2c72003da308b3947c7f749"
],
"index": "pypi",
"version": "==2018.4"
},
"regex": {
"hashes": [
"sha256:1b428a296531ea1642a7da48562746309c5c06471a97bd0c02dd6a82e9cecee8",
"sha256:27d72bb42dffb32516c28d218bb054ce128afd3e18464f30837166346758af67",
"sha256:32cf4743debee9ea12d3626ee21eae83052763740e04086304e7a74778bf58c9",
"sha256:32f6408dbca35040bc65f9f4ae1444d5546411fde989cb71443a182dd643305e",
"sha256:333687d9a44738c486735955993f83bd22061a416c48f5a5f9e765e90cf1b0c9",
"sha256:35eeccf17af3b017a54d754e160af597036435c58eceae60f1dd1364ae1250c7",
"sha256:361a1fd703a35580a4714ec28d85e29780081a4c399a99bbfb2aee695d72aedb",
"sha256:494bed6396a20d3aa6376bdf2d3fbb1005b8f4339558d8ac7b53256755f80303",
"sha256:5b9c0ddd5b4afa08c9074170a2ea9b34ea296e32aeea522faaaaeeeb2fe0af2e",
"sha256:a50532f61b23d4ab9d216a6214f359dd05c911c1a1ad20986b6738a782926c1a",
"sha256:a9243d7b359b72c681a2c32eaa7ace8d346b7e8ce09d172a683acf6853161d9c",
"sha256:b44624a38d07d3c954c84ad302c29f7930f4bf01443beef5589e9157b14e2a29",
"sha256:be42a601aaaeb7a317f818490a39d153952a97c40c6e9beeb2a1103616405348",
"sha256:eee4d94b1a626490fc8170ffd788883f8c641b576e11ba9b4a29c9f6623371e0",
"sha256:f69d1201a4750f763971ea8364ed95ee888fc128968b39d38883a72a4d005895"
],
"version": "==2018.2.21"
},
"requests": {
"hashes": [
"sha256:6a1b267aa90cac58ac3a765d067950e7dbbf75b1da07e895d1f594193a40a38b",
"sha256:9c443e7324ba5b85070c4a818ade28bfabedf16ea10206da1132edaa6dda237e"
],
"version": "==2.18.4"
},
"six": {
"hashes": [
"sha256:70e8a77beed4562e7f14fe23a786b54f6296e34344c23bc42f07b15018ff98e9",
"sha256:832dc0e10feb1aa2c68dcc57dbb658f1c7e65b9b61af69048abc87a2db00a0eb"
],
"version": "==1.11.0"
},
"termcolor": {
"hashes": [
"sha256:1d6d69ce66211143803fbc56652b41d73b4a400a2891d7bf7a1cdf4c02de613b"
],
"version": "==1.1.0"
},
"text-unidecode": {
"hashes": [
"sha256:5a1375bb2ba7968740508ae38d92e1f889a0832913cb1c447d5e2046061a396d",
"sha256:801e38bd550b943563660a91de8d4b6fa5df60a542be9093f7abf819f86050cc"
],
"version": "==1.2"
},
"tzlocal": {
"hashes": [
"sha256:4ebeb848845ac898da6519b9b31879cf13b6626f7184c496037b818e238f2c4e"
],
"version": "==1.5.1"
},
"urllib3": {
"hashes": [
"sha256:06330f386d6e4b195fbfc736b297f58c5a892e4440e54d294d7004e3a9bbea1b",
"sha256:cc44da8e1145637334317feebd728bd869a35285b93cbb4cca2577da7e62db4f"
],
"version": "==1.22"
}
},
"develop": {
"backcall": {
"hashes": [
"sha256:38ecd85be2c1e78f77fd91700c76e14667dc21e2713b63876c0eb901196e01e4",
"sha256:bbbf4b1e5cd2bdb08f915895b51081c041bac22394fdfcfdfbe9f14b77c08bf2"
],
"version": "==0.1.0"
},
"decorator": {
"hashes": [
"sha256:2c51dff8ef3c447388fe5e4453d24a2bf128d3a4c32af3fabef1f01c6851ab82",
"sha256:c39efa13fbdeb4506c476c9b3babf6a718da943dab7811c206005a4a956c080c"
],
"version": "==4.3.0"
},
"ipython": {
"hashes": [
"sha256:85882f97d75122ff8cdfe129215a408085a26039527110c8d4a2b8a5e45b7639",
"sha256:a6ac981381b3f5f604b37a293369963485200e3639fb0404fa76092383c10c41"
],
"index": "pypi",
"version": "==6.3.1"
},
"ipython-genutils": {
"hashes": [
"sha256:72dd37233799e619666c9f639a9da83c34013a73e8bbc79a7a6348d93c61fab8",
"sha256:eb2e116e75ecef9d4d228fdc66af54269afa26ab4463042e33785b887c628ba8"
],
"version": "==0.2.0"
},
"jedi": {
"hashes": [
"sha256:1972f694c6bc66a2fac8718299e2ab73011d653a6d8059790c3476d2353b99ad",
"sha256:5861f6dc0c16e024cbb0044999f9cf8013b292c05f287df06d3d991a87a4eb89"
],
"version": "==0.12.0"
},
"parso": {
"hashes": [
"sha256:62bd6bf7f04ab5c817704ff513ef175328676471bdef3629d4bdd46626f75551",
"sha256:a75a304d7090d2c67bd298091c14ef9d3d560e3c53de1c239617889f61d1d307"
],
"version": "==0.2.0"
},
"pexpect": {
"hashes": [
"sha256:9783f4644a3ef8528a6f20374eeb434431a650c797ca6d8df0d81e30fffdfa24",
"sha256:9f8eb3277716a01faafaba553d629d3d60a1a624c7cf45daa600d2148c30020c"
],
"markers": "sys_platform != 'win32'",
"version": "==4.5.0"
},
"pickleshare": {
"hashes": [
"sha256:84a9257227dfdd6fe1b4be1319096c20eb85ff1e82c7932f36efccfe1b09737b",
"sha256:c9a2541f25aeabc070f12f452e1f2a8eae2abd51e1cd19e8430402bdf4c1d8b5"
],
"version": "==0.7.4"
},
"prompt-toolkit": {
"hashes": [
"sha256:1df952620eccb399c53ebb359cc7d9a8d3a9538cb34c5a1344bdbeb29fbcc381",
"sha256:3f473ae040ddaa52b52f97f6b4a493cfa9f5920c255a12dc56a7d34397a398a4",
"sha256:858588f1983ca497f1cf4ffde01d978a3ea02b01c8a26a8bbc5cd2e66d816917"
],
"version": "==1.0.15"
},
"ptyprocess": {
"hashes": [
"sha256:e64193f0047ad603b71f202332ab5527c5e52aa7c8b609704fc28c0dc20c4365",
"sha256:e8c43b5eee76b2083a9badde89fd1bbce6c8942d1045146e100b7b5e014f4f1a"
],
"version": "==0.5.2"
},
"pygments": {
"hashes": [
"sha256:78f3f434bcc5d6ee09020f92ba487f95ba50f1e3ef83ae96b9d5ffa1bab25c5d",
"sha256:dbae1046def0efb574852fab9e90209b23f556367b5a320c0bcb871c77c3e8cc"
],
"version": "==2.2.0"
},
"simplegeneric": {
"hashes": [
"sha256:dc972e06094b9af5b855b3df4a646395e43d1c9d0d39ed345b7393560d0b9173"
],
"version": "==0.8.1"
},
"six": {
"hashes": [
"sha256:70e8a77beed4562e7f14fe23a786b54f6296e34344c23bc42f07b15018ff98e9",
"sha256:832dc0e10feb1aa2c68dcc57dbb658f1c7e65b9b61af69048abc87a2db00a0eb"
],
"version": "==1.11.0"
},
"traitlets": {
"hashes": [
"sha256:9c4bd2d267b7153df9152698efb1050a5d84982d3384a37b2c1f7723ba3e7835",
"sha256:c6cb5e6f57c5a9bdaa40fa71ce7b4af30298fbab9ece9815b5d995ab6217c7d9"
],
"version": "==4.3.2"
},
"wcwidth": {
"hashes": [
"sha256:3df37372226d6e63e1b1e1eda15c594bca98a22d33a23832a90998faa96bc65e",
"sha256:f4ebe71925af7b40a864553f761ed559b43544f8f71746c2d756c7fe788ade7c"
],
"version": "==0.1.7"
}
}
}

README-el.md (new file, 80 lines)

@@ -0,0 +1,80 @@
*[English](README.md)*
# Paperless
[![Documentation](https://readthedocs.org/projects/paperless/badge/?version=latest)](https://paperless.readthedocs.org/) [![Chat](https://badges.gitter.im/danielquinn/paperless.svg)](https://gitter.im/danielquinn/paperless) [![Travis](https://travis-ci.org/danielquinn/paperless.svg?branch=master)](https://travis-ci.org/danielquinn/paperless) [![Coverage Status](https://coveralls.io/repos/github/danielquinn/paperless/badge.svg?branch=master)](https://coveralls.io/github/danielquinn/paperless?branch=master) [![Thanks](https://img.shields.io/badge/THANKS-md-ff69b4.svg)](https://github.com/danielquinn/paperless/blob/master/THANKS.md)
Index and archive for all of your scanned documents
I hate paper. Environmental issues aside, it's a tech person's nightmare:
* There's no search feature
* It takes up a lot of space
* Backups mean more paper
In the past few months I've often found myself unable to find the right document. Sometimes I had recycled the document I needed (who keeps water bills for 2 years?) and sometimes I simply lost it... because that's what paper is like. I built this to make my life easier.
## How it works
Paperless does not control your scanner; it only helps you deal with what your scanner produces.
1. Buy a scanner with access to your network. If you need inspiration, see the page of [recommended scanners](https://paperless.readthedocs.io/en/latest/scanners.html).
2. Set it up to "scan to FTP" or something similar. It will then be able to save scanned images to a server without you having to do anything. Of course, if your scanner can't save your images somewhere automatically, you can do that manually. Paperless doesn't care how the files get there.
3. Have the server that runs the Paperless OCR script index them into a local database.
4. Use the web frontend to look through the database and find what you want.
5. Download the PDF you want/need through the web interface and do whatever you like with it. You can even print it and send it as if it were the original. In most cases, no one will notice or care.
Here's what you get:
![The before and after](https://raw.githubusercontent.com/danielquinn/paperless/master/docs/_static/screenshot.png)
## Documentation
It's all available on [ReadTheDocs](https://paperless.readthedocs.org/).
## Requirements
All of this is really quite simple and user-friendly: a collection of valuable tools.
* [ImageMagick](http://imagemagick.org/) converts the images between colour and black-and-white.
* [Tesseract](https://github.com/tesseract-ocr) does the character recognition.
* [Unpaper](https://www.flameeyes.eu/projects/unpaper) despeckles and deskews the scanned image.
* [GNU Privacy Guard](https://gnupg.org/) is used for encryption in the backend.
* [Python 3](https://python.org/) is the language of the project.
* [Pillow](https://pypi.python.org/pypi/pillowfight/) loads the image as a Python object so that it can be used with PyOCR.
* [PyOCR](https://github.com/jflesch/pyocr) is a slick programmatic wrapper around tesseract.
* [Django](https://www.djangoproject.com/) is the framework the project is built with.
* [Python-GNUPG](http://pythonhosted.org/python-gnupg/) decrypts the PDF files on the fly so that you download unencrypted files, leaving the encrypted ones on disk.
## Stability
This project has been around since 2015 and quite a few people use it; nevertheless, it is under constant development (just look at when commits were made in the git history), so don't expect it to be 100% stable. You can back up the sqlite3 database, the media folder and your configuration file to be on the safe side.
## Affiliated Projects
Paperless has been around for a while now, and people have started building things around it. If you're one of those people, we can add your project to this list:
* [Paperless Desktop](https://github.com/thomasbrueggemann/paperless-desktop): A desktop application for your Paperless installation. Runs on Mac, Linux, and Windows.
* [ansible-role-paperless](https://github.com/ovv/ansible-role-paperless): An easy way to run Paperless via Ansible.
## Similar Projects
There's another project called [Mayan EDMS](https://mayan.readthedocs.org/en/latest/) that has a surprising amount of technical overlap with Paperless. Also based on Django and using the consumer model with Tesseract and Unpaper, Mayan EDMS has *many* more features and comes with a slick UI, but it's still on Python 2. It may be that Paperless consumes fewer resources, but to be honest, that is a guess I haven't verified myself. One thing is certain: *Paperless* has a **much** better name.
## Important Note
Document scanners are typically used for sensitive documents: things like your ΑΜΚΑ (social security number), tax records, invoices, and so on. Although Paperless encrypts the original files via the consumption script, the OCR'd text is *not* encrypted and is stored as-is (it has to be searchable, so if anyone knows how to do that with encrypted data, I'm all ears). This means that Paperless should never be run on an untrusted host. For this reason, I recommend that if you want to run it, you run it on a local server in your own home.
## Donations
As with all free software, the power lies not in the finances but in the collective effort. I truly appreciate every pull request and bug report offered by Paperless users, so please keep them coming. If, however, you can't write code, do design, or write documentation, and you want to contribute financially, I won't say no ;-)
The thing is, I'm financially fine, so I'd ask you to donate your money to the [United Nations High Commissioner for Refugees](https://donate.unhcr.org/int-en/general) instead. They do important work and need the money far more than I do.


@@ -1,6 +1,8 @@
*[Greek](README-el.md)*
# Paperless
![Documentation](https://readthedocs.org/projects/paperless/badge/?version=latest) ![Chat](https://badges.gitter.im/danielquinn/paperless.svg) ![Travis](https://travis-ci.org/danielquinn/paperless.svg?branch=master)
[![Documentation](https://readthedocs.org/projects/paperless/badge/?version=latest)](https://paperless.readthedocs.org/) [![Chat](https://badges.gitter.im/danielquinn/paperless.svg)](https://gitter.im/danielquinn/paperless) [![Travis](https://travis-ci.org/danielquinn/paperless.svg?branch=master)](https://travis-ci.org/danielquinn/paperless) [![Coverage Status](https://coveralls.io/repos/github/danielquinn/paperless/badge.svg?branch=master)](https://coveralls.io/github/danielquinn/paperless?branch=master) [![Thanks](https://img.shields.io/badge/THANKS-md-ff69b4.svg)](https://github.com/danielquinn/paperless/blob/master/THANKS.md)
Index and archive all of your scanned paper documents
@@ -48,9 +50,19 @@ This is all really a quite simple, shiny, user-friendly wrapper around some very
* [Python-GNUPG](http://pythonhosted.org/python-gnupg/) decrypts the PDFs on-the-fly to allow you to download unencrypted files, leaving the encrypted ones on-disk.
## Stability
## Project Status
This project has been around since 2015, and there's lots of people using it, however it's still under active development (just look at the git commit history) so don't expect it to be 100% stable. You can backup the sqlite3 database, media directory and your configuration file to be on the safe side.
This project has been around since 2015, and there's lots of people using it. For some reason, it's really popular in Germany -- maybe someone over there can clue me in as to why?
I am no longer doing new development on Paperless as it does exactly what I need it to, and I have since turned my attention to my latest project, [Aletheia](https://github.com/danielquinn/aletheia). However, I'm not abandoning this project. I am happy to field pull requests and answer questions in the issue queue. If you're a developer yourself and want a new feature, float it in the issue queue and/or send me a pull request! I'm happy to add new stuff, but I just don't have the time to do that work myself.
## Affiliated Projects
Paperless has been around a while now, and people are starting to build stuff on top of it. If you're one of those people, we can add your project to this list:
* [Paperless Desktop](https://github.com/thomasbrueggemann/paperless-desktop): A desktop UI for your Paperless installation. Runs on Mac, Linux, and Windows.
* [ansible-role-paperless](https://github.com/ovv/ansible-role-paperless): An easy way to get Paperless running via Ansible.
## Similar Projects

THANKS.md (new file, 19 lines)

@@ -0,0 +1,19 @@
# Thanks for using Paperless!
Working on this project has been exhausting, but rewarding at the same time.
It's just wonderful that so many people are using this thing, and in so many
crazy ways.
This file is here for everyone to post their own stories about how you use this
code. It helps me to understand who's using it and why, and maybe to give
others an idea of how it might be used. It's based on a Twitter exchange
between [John Glanville](https://twitter.com/hexapodium) and
[Julia Evans](https://github.com/jvns) and later better defined [here](https://github.com/paulmolluzzo/thanks-md).
To contribute, simply issue a pull request that appends to this file something
like this:
```
### Your Name
Some friendly message
```


@@ -1,8 +1,9 @@
# Environment variables to set for Paperless
# Commented out variables will be replaced by a default within Paperless.
# Passphrase Paperless uses to encrypt and decrypt your documents
PAPERLESS_PASSPHRASE=CHANGE_ME
# Passphrase Paperless uses to encrypt and decrypt your documents, if you want
# encryption at all.
# PAPERLESS_PASSPHRASE=CHANGE_ME
# The number of threads to use for text recognition
# PAPERLESS_OCR_THREADS=4
@@ -13,3 +14,25 @@ PAPERLESS_PASSPHRASE=CHANGE_ME
# You can change the default user and group id to a custom one
# USERMAP_UID=1000
# USERMAP_GID=1000
###############################################################################
#### Mail Consumption ####
###############################################################################
# These values are required if you want paperless to check a particular email
# box every 10 minutes and attempt to consume documents from there. If you
# don't define a HOST, mail checking will just be disabled.
# Don't put quotes around the values after the = sign, or it will crash your
# Docker container.
# PAPERLESS_CONSUME_MAIL_HOST=
# PAPERLESS_CONSUME_MAIL_PORT=
# PAPERLESS_CONSUME_MAIL_USER=
# PAPERLESS_CONSUME_MAIL_PASS=
# Override the default IMAP inbox here. If it's not set, Paperless defaults to
# INBOX.
# PAPERLESS_CONSUME_MAIL_INBOX=INBOX
# Any email sent to the target account that does not contain this text will be
# ignored. Mail checking won't work without this.
# PAPERLESS_EMAIL_SECRET=
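
The mail settings above can be read as a small decision rule: no host means no mail checking, and an unset inbox falls back to `INBOX`. The following is a minimal Python sketch of that rule, not Paperless's actual settings code. The function name `load_mail_settings`, the returned dictionary keys, and the default port of 143 are assumptions made for illustration; only the environment variable names come from the file above.

```python
import os
from typing import Mapping, Optional


def load_mail_settings(env: Mapping[str, str] = os.environ) -> Optional[dict]:
    """Return mail-consumption settings, or None when mail checking is disabled."""
    host = env.get("PAPERLESS_CONSUME_MAIL_HOST", "").strip()
    if not host:
        # No HOST defined: mail checking is simply switched off.
        return None
    return {
        "host": host,
        # The default port of 143 (plain IMAP) is an assumption for this sketch.
        "port": int(env.get("PAPERLESS_CONSUME_MAIL_PORT") or 143),
        "user": env.get("PAPERLESS_CONSUME_MAIL_USER", ""),
        "password": env.get("PAPERLESS_CONSUME_MAIL_PASS", ""),
        # Fall back to INBOX when the override is unset or empty.
        "inbox": env.get("PAPERLESS_CONSUME_MAIL_INBOX") or "INBOX",
        "secret": env.get("PAPERLESS_EMAIL_SECRET", ""),
    }
```

Passing a plain dictionary instead of `os.environ` makes the rule easy to try out without touching the real environment.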


@@ -24,7 +24,7 @@ services:
# value with nothing.
environment:
- PAPERLESS_OCR_LANGUAGES=
command: ["runserver", "--insecure", "0.0.0.0:8000"]
command: ["runserver", "--insecure", "--noreload", "0.0.0.0:8000"]
consumer:
build: ./


@@ -1,8 +1,72 @@
Changelog
#########
1.3.0 (Unreleased)
==================
2.0.0
=====
This is a big release as we've changed a core piece of Paperless functionality:
we no longer encrypt files with GPG by default.
The reasons for this are many, but it boils down to the fact that the encryption
wasn't really all that useful, as files on-disk were still accessible so long as
you had the key, and the key was most typically stored in the config file. In
other words, your files are only as safe as the ``paperless`` user is. In
addition to that, *the contents of the documents were never encrypted*, so
important numbers etc. were always accessible simply by querying the database.
Still, it was better than nothing, but the consensus from users appears to be
that it was more an annoyance than anything else, so this feature is now turned
off unless you explicitly set a passphrase in your config file.
Migrating from 1.x
------------------
Encryption isn't gone, it's just off for new users. So long as you have
``PAPERLESS_PASSPHRASE`` set in your config or your environment, Paperless
should continue to operate as it always has. If, however, you want to drop
encryption too, you only need to do two things:
1. Run ``./manage.py migrate && ./manage.py change_storage_type gpg unencrypted``.
This will go through your entire database and Decrypt All The Things.
2. Remove ``PAPERLESS_PASSPHRASE`` from your ``paperless.conf`` file, or simply
stop declaring it in your environment.
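
The toggle described in these notes amounts to two rules: new documents are encrypted only when a passphrase is configured, and documents that are already encrypted cannot be served without one. Below is a minimal Python sketch of those rules under stated assumptions. The function names and the error message are hypothetical and are not taken from the Paperless code base; the strings ``gpg`` and ``unencrypted`` are borrowed from the ``change_storage_type`` command shown above.

```python
STORAGE_TYPE_UNENCRYPTED = "unencrypted"
STORAGE_TYPE_GPG = "gpg"


def storage_type_for_new_documents(passphrase):
    """Encrypt new documents only when a passphrase is configured."""
    return STORAGE_TYPE_GPG if passphrase else STORAGE_TYPE_UNENCRYPTED


def check_passphrase(passphrase, existing_storage_types):
    """Exit with an error when encrypted documents exist but no passphrase is set."""
    if STORAGE_TYPE_GPG in existing_storage_types and not passphrase:
        raise SystemExit(
            "Encrypted documents exist but PAPERLESS_PASSPHRASE is not set."
        )
```

In this sketch an empty string and ``None`` are both treated as "no passphrase", which matches the idea of simply leaving the variable undeclared.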
1.4.0
=====
* `Quentin Dawans`_ has refactored the document consumer to allow for some
command-line options. Notably, you can now direct it to consume from a
particular ``--directory``, limit the ``--loop-time``, set the time between
mail server checks with ``--mail-delta`` or just run it as a one-off with
``--one-shot``. See `#305`_ & `#313`_ for more information.
* Refactor the use of travis/tox/pytest/coverage into two files:
``.travis.yml`` and ``setup.cfg``.
* Start generating requirements.txt from a Pipfile. I'll probably switch over
to just using pipenv in the future.
* Allow for an alternative FreeBSD-friendly location for ``paperless.conf``.
Thanks to `Martin Arendtsen`_ who provided this (`#322`_).
* Document consumption events are now logged in the Django admin events log.
Thanks to `CkuT`_ for doing the legwork on this one and to `Quentin Dawans`_
& `David Martin`_ for helping to coordinate & work out how the feature would
be developed.
* `erikarvstedt`_ contributed a pull request (`#328`_) to add ``--noreload``
to the default server start process. This helps reduce the load imposed
by the running webservice.
* Through some discussion on `#253`_ and `#323`_, we've removed a few of the
hardcoded URL values to make it easier for people to host Paperless on a
subdirectory. Thanks to `Quentin Dawans`_ and `Kyle Lucy`_ for helping to
work this out.
* The clickable area for documents on the listing page has been increased to a
more predictable space thanks to a glorious hack from `erikarvstedt`_ in
`#344`_.
* `Strubbl`_ noticed an annoying bug in the bash script wrapping the Docker
entrypoint and fixed it with some very creative Bash skills: `#352`_.
* You can now use the search field to find documents by tag thanks to
`thinkjk`_'s *first ever issue*: `#354`_.
* Inotify is now being used to detect additions to the consume directory thanks
to some excellent work from `erikarvstedt`_ on `#351`_.
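
The consumer options named in the first item of this list could be declared roughly as follows. This is only an illustrative ``argparse`` sketch: the flag names come from the changelog entry, while the types, the defaults, the units and the help texts are assumptions, and the real command is a Django management command whose definition may differ.

```python
import argparse


def build_parser():
    """Build a parser exposing the consumer flags named in the changelog."""
    parser = argparse.ArgumentParser(prog="document_consumer")
    parser.add_argument(
        "--directory",
        default=None,
        help="Directory to consume from (assumed to fall back to the configured one).",
    )
    parser.add_argument(
        "--loop-time",
        type=int,
        default=10,
        help="Pause between directory checks (default and unit are assumptions).",
    )
    parser.add_argument(
        "--mail-delta",
        type=int,
        default=10,
        help="Time between mail server checks (default and unit are assumptions).",
    )
    parser.add_argument(
        "--one-shot",
        action="store_true",
        help="Process what is there once and exit instead of looping.",
    )
    return parser
```

For example, ``build_parser().parse_args(["--directory", "/tmp/scans", "--one-shot"])`` yields a namespace with ``directory`` set, ``one_shot`` true, and the two timing options left at their assumed defaults.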
1.3.0
=====
* You can now run Paperless without a login, though you'll still have to create
at least one user. This is thanks to a pull-request from `matthewmoto`_:
@@ -352,11 +416,16 @@ Changelog
.. _Dan Panzarella: https://github.com/pzl
.. _addadi: https://github.com/addadi
.. _BastianPoe: https://github.com/BastianPoe
.. _matthewmoto: https://github.com/BastianPoe
.. _matthewmoto: https://github.com/matthewmoto
.. _Isaac: https://github.com/isaacsando
.. _Georgi Todorov: https://github.com/TeraHz
.. _Jeffrey Portman: https://github.com/ChromoX
.. _Simon Taddiken: https://github.com/skuzzle
.. _Quentin Dawans: https://github.com/ovv
.. _Martin Arendtsen: https://github.com/Arendtsen
.. _erikarvstedt: https://github.com/erikarvstedt
.. _Kyle Lucy: https://github.com/kmlucy
.. _thinkjk: https://github.com/thinkjk
.. _#20: https://github.com/danielquinn/paperless/issues/20
.. _#44: https://github.com/danielquinn/paperless/issues/44
@@ -413,10 +482,20 @@ Changelog
.. _#300: https://github.com/danielquinn/paperless/pull/300
.. _#301: https://github.com/danielquinn/paperless/issues/301
.. _#303: https://github.com/danielquinn/paperless/issues/303
.. _#305: https://github.com/danielquinn/paperless/issues/305
.. _#306: https://github.com/danielquinn/paperless/issues/306
.. _#308: https://github.com/danielquinn/paperless/issues/308
.. _#311: https://github.com/danielquinn/paperless/pull/311
.. _#312: https://github.com/danielquinn/paperless/pull/312
.. _#313: https://github.com/danielquinn/paperless/pull/313
.. _#322: https://github.com/danielquinn/paperless/pull/322
.. _#328: https://github.com/danielquinn/paperless/pull/328
.. _#253: https://github.com/danielquinn/paperless/issues/253
.. _#323: https://github.com/danielquinn/paperless/issues/323
.. _#344: https://github.com/danielquinn/paperless/pull/344
.. _#351: https://github.com/danielquinn/paperless/pull/351
.. _#352: https://github.com/danielquinn/paperless/pull/352
.. _#354: https://github.com/danielquinn/paperless/issues/354
.. _pipenv: https://docs.pipenv.org/
.. _a new home on Docker Hub: https://hub.docker.com/r/danielquinn/paperless/


@@ -40,7 +40,7 @@ extensions = [
'sphinx.ext.autodoc',
'sphinx.ext.intersphinx',
'sphinx.ext.todo',
'sphinx.ext.pngmath',
'sphinx.ext.imgmath',
'sphinx.ext.viewcode',
]


@@ -17,7 +17,8 @@ The primary method of getting documents into your database is by putting them in
the consumption directory. The ``document_consumer`` script runs in an infinite
loop looking for new additions to this directory and when it finds them, it goes
about the process of parsing them with the OCR, indexing what it finds, and
encrypting the PDF, storing it in the media directory.
encrypting the PDF (if ``PAPERLESS_PASSPHRASE`` is set), storing it in the
media directory.
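
The loop described in this paragraph can be pictured with a simple polling sketch. It is not the actual ``document_consumer`` implementation: the function names, the ``handle`` callback (standing in for the OCR, indexing and optional encryption steps) and the ``loop_time`` and ``one_shot`` parameters are all assumptions made for illustration.

```python
import os
import time


def find_new_files(directory, seen):
    """Return paths in `directory` that are not yet in `seen`, and record them."""
    new_files = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and path not in seen:
            seen.add(path)
            new_files.append(path)
    return new_files


def consume_forever(directory, handle, loop_time=10, one_shot=False):
    """Poll `directory` and pass every newly seen file to `handle`."""
    seen = set()
    while True:
        for path in find_new_files(directory, seen):
            # In Paperless this is where parsing, indexing and storing would happen.
            handle(path)
        if one_shot:
            break
        time.sleep(loop_time)
```

Keeping the directory scan in its own function makes the "what counts as new" rule testable without running the infinite loop.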
Getting stuff into this directory is up to you. If you're running Paperless
on your local computer, you might just want to drag and drop files there, but if


@@ -16,7 +16,7 @@ Backing Up
----------
So you're bored of this whole project, or you want to make a remote backup of
the unencrypted files for whatever reason. This is easy to do, simply use the
your files for whatever reason. This is easy to do, simply use the
:ref:`exporter <utilities-exporter>` to dump your documents and database out
into an arbitrary directory.


@@ -63,17 +63,18 @@ Standard (Bare Metal)
1. Install the requirements as per the :ref:`requirements <requirements>` page.
2. Within the extract of master.zip go to the ``src`` directory.
3. Copy ``paperless.conf.example`` to ``/etc/paperless.conf`` (the virtual
environment also looks there for it) and open it in your favourite editor.
Because this file contains passwords it should only be readable by user root
and paperless! Set the values for:
3. Copy ``../paperless.conf.example`` to ``/etc/paperless.conf`` and open it in
your favourite editor. Because this file contains passwords it should only
be readable by user root and paperless! Set the values for:
* ``PAPERLESS_CONSUMPTION_DIR``: this is where your documents will be
dumped to be consumed by Paperless.
* ``PAPERLESS_PASSPHRASE``: this is the passphrase Paperless uses to
encrypt/decrypt the original document.
* ``PAPERLESS_OCR_THREADS``: this is the number of threads the OCR process
will spawn to process document pages in parallel.
* ``PAPERLESS_PASSPHRASE``: this is only required if you want to use GPG to
encrypt your document files. This is the passphrase Paperless uses to
encrypt/decrypt the original documents. Don't worry about defining this
if you don't want to use encryption (the default).
4. Initialise the SQLite database with ``./manage.py migrate``.
5. Create a user for your Paperless instance with
@@ -139,7 +140,8 @@ Docker Method
``PAPERLESS_PASSPHRASE``
This is the passphrase Paperless uses to encrypt/decrypt the original
document.
document. If you aren't planning on using GPG encryption, you can just
leave this undefined.
``PAPERLESS_OCR_THREADS``
This is the number of threads the OCR process will spawn to process
@@ -265,11 +267,12 @@ Vagrant Method
3. Run ``vagrant ssh`` and once inside your new vagrant box, edit
``/etc/paperless.conf`` and set the values for:
* ``PAPERLESS_CONSUMPTION_DIR``: this is where your documents will be
* ``PAPERLESS_CONSUMPTION_DIR``: This is where your documents will be
dumped to be consumed by Paperless.
* ``PAPERLESS_PASSPHRASE``: this is the passphrase Paperless uses to
encrypt/decrypt the original document.
* ``PAPERLESS_SHARED_SECRET``: this is the "magic word" used when consuming
* ``PAPERLESS_PASSPHRASE``: This is the passphrase Paperless uses to
encrypt/decrypt the original document. It's only required if you want
your original files to be encrypted; otherwise, just leave it unset.
* ``PAPERLESS_EMAIL_SECRET``: this is the "magic word" used when consuming
documents from mail or via the API. If you don't use either, leaving it
blank is just fine.
@@ -352,7 +355,7 @@ after restarting your system:
respawn limit 10 5
script
exec /srv/paperless/src/manage.py runserver 0.0.0.0:80
exec /srv/paperless/src/manage.py runserver --noreload 0.0.0.0:80
end script
Note that you'll need to replace ``/srv/paperless/src/manage.py`` with the

@@ -33,8 +33,11 @@ The webserver is started via the ``manage.py`` script:
By default, the server runs on localhost, port 8000, but you can change this
with a few arguments, run ``manage.py --help`` for more information.
Note that this command runs continuously, so exiting it will mean your webserver
disappears. If you want to run this full-time (which is kind of the point)
Add the option ``--noreload`` to reduce resource usage. Otherwise, the server
continuously polls all source files for changes to auto-reload them.
Note that when exiting this command your webserver will disappear.
If you want to run this full-time (which is kind of the point)
you'll need to have it start in the background -- something you'll need to
figure out for your own system. To get you started though, there are Systemd
service files in the ``scripts`` directory.
@@ -46,17 +49,18 @@ The Consumer
------------
The consumer script runs in an infinite loop, constantly looking at a directory
for PDF files to parse and index. The process is pretty straightforward:
for documents to parse and index. The process is pretty straightforward:
1. Look in ``CONSUMPTION_DIR`` for a PDF. If one is found, go to #2. If not,
wait 10 seconds and try again.
2. Parse the PDF with Tesseract
1. Look in ``CONSUMPTION_DIR`` for a document. If one is found, go to #2.
If not, wait 10 seconds and try again. On Linux, new documents are detected
instantly via inotify, so there's no waiting involved.
2. Parse the document with Tesseract
3. Create a new record in the database with the OCR'd text
4. Attempt to automatically assign document attributes by doing some guesswork.
Read up on the :ref:`guesswork documentation<guesswork>` for more
information about this process.
5. Encrypt the PDF and store it in the ``media`` directory under
``documents/pdf``.
5. Encrypt the document (if you have a passphrase set) and store it in the
``media`` directory under ``documents/originals``.
6. Go to #1.
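
The polling variant of the loop described in the steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Paperless consumer itself: ``poll_directory``, ``handle`` and ``max_rounds`` are hypothetical names, the inotify path is left out, and parsing, storing and encrypting are all hidden behind the ``handle`` callback.

```python
import os
import time


def poll_directory(directory, handle, interval=10, max_rounds=None):
    """Call handle(path) for each regular file in directory, oldest first.

    Waits up to `interval` seconds between rounds. `max_rounds` is a
    hypothetical parameter that exists only so this sketch can terminate;
    a real consumer would loop until interrupted.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        started = time.time()
        entries = [e for e in os.scandir(directory) if e.is_file()]
        for entry in sorted(entries, key=lambda e: e.stat().st_mtime):
            handle(entry.path)
        rounds += 1
        if max_rounds is not None and rounds >= max_rounds:
            break
        # Sleep only for whatever is left of the interval
        time.sleep(max(0, started + interval - time.time()))
```

Sleeping for the remainder of the interval, rather than a fixed amount, keeps the rounds roughly evenly spaced even when handling a batch of files takes a while.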
@@ -71,8 +75,8 @@ The consumer is started via the ``manage.py`` script:
$ /path/to/paperless/src/manage.py document_consumer
This starts the service that will run in a loop, consuming PDF files as they
appear in ``CONSUMPTION_DIR``.
This starts the service that will consume documents as they appear in
``CONSUMPTION_DIR``.
Note that this command runs continuously, so exiting it will mean your webserver
disappears. If you want to run this full-time (which is kind of the point)
@@ -80,6 +84,13 @@ you'll need to have it start in the background -- something you'll need to
figure out for your own system. To get you started though, there are Systemd
service files in the ``scripts`` directory.
Some command line arguments are available to customize the behavior of the
consumer. By default it will use ``/etc/paperless.conf`` values. Display the
help with:
.. code-block:: shell-session
$ /path/to/paperless/src/manage.py document_consumer --help
.. _utilities-exporter:
@@ -87,8 +98,8 @@ The Exporter
------------
Tired of fiddling with Paperless, or just want to do something stupid and are
afraid of accidentally damaging your files? You can export all of your PDFs
into neatly named, dated, and unencrypted.
afraid of accidentally damaging your files? You can export all of your
documents into neatly named, dated, and unencrypted files.
.. _utilities-exporter-howto:
@@ -102,10 +113,10 @@ This too is done via the ``manage.py`` script:
$ /path/to/paperless/src/manage.py document_exporter /path/to/somewhere/
This will dump all of your unencrypted PDFs into ``/path/to/somewhere`` for you
to do with as you please. The files are accompanied with a special file,
``manifest.json`` which can be used to
:ref:`import the files <utilities-importer>` at a later date if you wish.
This will dump all of your unencrypted documents into ``/path/to/somewhere``
for you to do with as you please. The files are accompanied with a special
file, ``manifest.json`` which can be used to :ref:`import the files
<utilities-importer>` at a later date if you wish.
.. _utilities-exporter-howto-docker:

@@ -29,6 +29,15 @@ PAPERLESS_CONSUMPTION_DIR=""
#PAPERLESS_STATICDIR=""
# Override the MEDIA_URL here. Unless you're hosting Paperless off a subdomain
# like /paperless/, you probably don't need to change this.
#PAPERLESS_MEDIA_URL="/media/"
# Override the STATIC_URL here. Unless you're hosting Paperless off a
# subdomain like /paperless/, you probably don't need to change this.
#PAPERLESS_STATIC_URL="/static/"
# These values are required if you want paperless to check a particular email
# box every 10 minutes and attempt to consume documents from there. If you
# don't define a HOST, mail checking will just be disabled.
@@ -50,19 +59,19 @@ PAPERLESS_EMAIL_SECRET=""
#### Security ####
###############################################################################
# You must have a passphrase in order for Paperless to work at all. If you set
# this to "", GNUGPG will "encrypt" your PDF by writing it out as a zero-byte
# file.
#
# The passphrase you use here will be used when storing your documents in
# Paperless, but you can always export them in an unencrypted format by using
# document exporter. See the documentation for more information.
# Paperless can be instructed to attempt to encrypt your PDF files with GPG
# using the PAPERLESS_PASSPHRASE specified below. If however you're not
# concerned about encrypting these files (for example if you have disk
# encryption locally) then you don't need this and can safely leave this value
# un-set.
#
# One final note about the passphrase. Once you've consumed a document with
# one passphrase, DON'T CHANGE IT. Paperless assumes this to be a constant and
# can't properly export documents that were encrypted with an old passphrase if
# you've since changed it to a new one.
PAPERLESS_PASSPHRASE="secret"
#
# The default is to not use encryption at all.
#PAPERLESS_PASSPHRASE="secret"
# The secret key has a default that should be fine so long as you're hosting
@@ -156,6 +165,8 @@ PAPERLESS_PASSPHRASE="secret"
#PAPERLESS_CONVERT_DENSITY=300
# (This setting is ignored on Linux where inotify is used instead of a
# polling loop.)
# The number of seconds that Paperless will wait between checking
# PAPERLESS_CONSUMPTION_DIR. If you tend to write documents to this directory
# rarely, you may want to use a higher value than the default (10).

@@ -1,28 +1,52 @@
Django>=1.11,<2.0
Pillow>=3.1.1
dateparser>=0.6.0
django-crispy-forms>=1.6.1
django-extensions>=1.7.6
django-filter>=1.0
django-flat-responsive>=1.2.0
djangorestframework>=3.5.3
filemagic>=1.6
fuzzywuzzy[speedup]==0.15.0
gunicorn>=19.7.1
langdetect>=1.0.7
pdftotext>=2.0.1
pyocr>=0.4.7
python-dateutil>=2.6.0
python-dotenv>=0.6.2
python-gnupg>=0.3.9
pytz>=2016.10
# For the tests
factory-boy
flake8
pytest==3.3.2 # Newer versions break with pytest-sugar
pytest-django
pytest-sugar
pytest-env
pycodestyle
tox
apipkg==1.4
attrs==18.1.0
certifi==2018.4.16
chardet==3.0.4
coverage==4.5.1
coveralls==1.3.0
dateparser==0.7.0
django-crispy-forms==1.7.2
django-extensions==2.0.7
django-filter==1.1.0
django-flat-responsive==2.0
django==1.11.13
djangorestframework==3.8.2
docopt==0.6.2
execnet==1.5.0
factory-boy==2.11.1
faker==0.8.15
filemagic==1.6
flake8==3.5.0
fuzzywuzzy==0.15.0
gunicorn==19.8.1
idna==2.6
inotify_simple==1.1.7; sys_platform == 'linux'
langdetect==1.0.7
mccabe==0.6.1
more-itertools==4.1.0
pdftotext==2.0.2
pillow==5.1.0
pluggy==0.6.0
py==1.5.3
pycodestyle==2.3.1
pyflakes==1.6.0
pyocr==0.5.1
pytest-cov==2.5.1
pytest-django==3.2.1
pytest-env==0.6.2
pytest-forked==0.2
pytest-sugar==0.9.1
pytest-xdist==1.22.2
pytest==3.5.1
python-dateutil==2.7.3
python-dotenv==0.8.2
python-gnupg==0.4.2
python-levenshtein==0.12.0
pytz==2018.4
regex==2018.2.21
requests==2.18.4
six==1.11.0
termcolor==1.1.0
text-unidecode==1.2
tzlocal==1.5.1
urllib3==1.22

@@ -46,11 +46,10 @@ migrations() {
# A simple lock file in case other containers use this startup
LOCKFILE="/usr/src/paperless/data/db.sqlite3.migration"
set -o noclobber
# check for and create lock file in one command
(> ${LOCKFILE}) &> /dev/null
if [ $? -eq 0 ]
if (set -o noclobber; echo "$$" > "${LOCKFILE}") 2> /dev/null
then
trap 'rm -f "${LOCKFILE}"; exit $?' INT TERM EXIT
sudo -HEu paperless "/usr/src/paperless/src/manage.py" "migrate"
rm ${LOCKFILE}
fi
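
For readers more at home in Python than in shell, the atomic "check for and create the lock file in one step" idea used above can be sketched like this. It is only an analogy under stated assumptions: ``acquire_lock`` and ``release_lock`` are hypothetical helpers, not part of the startup script, and the sketch relies on ``os.O_EXCL`` giving the same fail-if-exists behaviour that ``set -o noclobber`` gives the shell redirect.

```python
import os


def acquire_lock(path):
    """Try to create the lock file atomically; return True on success."""
    try:
        # O_EXCL makes creation fail if the file already exists, so the
        # existence check and the creation happen as a single operation.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as lock:
        lock.write(str(os.getpid()))
    return True


def release_lock(path):
    """Remove the lock file, ignoring the case where it is already gone."""
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass
```

A caller would typically wrap the protected work in ``try``/``finally`` so that ``release_lock`` runs even when the work fails, which is the role the ``trap`` plays in the shell version.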

@@ -4,7 +4,7 @@ Description=Paperless webserver
[Service]
User=paperless
Group=paperless
ExecStart=/home/paperless/project/virtualenv/bin/python /home/paperless/project/src/manage.py runserver 0.0.0.0:8000
ExecStart=/home/paperless/project/virtualenv/bin/python /home/paperless/project/src/manage.py runserver --noreload 0.0.0.0:8000
[Install]
WantedBy=multi-user.target

@@ -0,0 +1 @@
from .checks import changed_password_check

@@ -124,8 +124,10 @@ class DocumentAdmin(CommonAdmin):
"all": ("paperless.css",)
}
search_fields = ("correspondent__name", "title", "content")
list_display = ("title", "created", "thumbnail", "correspondent", "tags_")
search_fields = ("correspondent__name", "title", "content", "tags__name")
readonly_fields = ("added",)
list_display = ("title", "created", "added", "thumbnail", "correspondent",
"tags_")
list_filter = ("tags", "correspondent", FinancialYearFilter,
MonthListFilter)
@@ -139,19 +141,17 @@ class DocumentAdmin(CommonAdmin):
created_.short_description = "Created"
def thumbnail(self, obj):
if settings.FORCE_SCRIPT_NAME:
src_link = "{}/fetch/thumb/{}".format(
settings.FORCE_SCRIPT_NAME, obj.id)
else:
src_link = "/fetch/thumb/{}".format(obj.id)
png_img = self._html_tag(
"img",
src=src_link,
width=180,
alt="Thumbnail of {}".format(obj.file_name),
title=obj.file_name
return self._html_tag(
"a",
self._html_tag(
"img",
src=reverse("fetch", kwargs={"kind": "thumb", "pk": obj.pk}),
width=180,
alt="Thumbnail of {}".format(obj.file_name),
title=obj.file_name
),
href=obj.download_url
)
return self._html_tag("a", png_img, href=obj.download_url)
thumbnail.allow_tags = True
def tags_(self, obj):

@@ -15,13 +15,15 @@ class DocumentsConfig(AppConfig):
set_tags,
run_pre_consume_script,
run_post_consume_script,
cleanup_document_deletion
cleanup_document_deletion,
set_log_entry
)
document_consumption_started.connect(run_pre_consume_script)
document_consumption_finished.connect(set_tags)
document_consumption_finished.connect(set_correspondent)
document_consumption_finished.connect(set_log_entry)
document_consumption_finished.connect(run_post_consume_script)
post_delete.connect(cleanup_document_deletion)

39
src/documents/checks.py Normal file

@@ -0,0 +1,39 @@
import textwrap

from django.conf import settings
from django.core.checks import Error, register
from django.db.utils import OperationalError


@register()
def changed_password_check(app_configs, **kwargs):
    from documents.models import Document
    from paperless.db import GnuPG

    try:
        encrypted_doc = Document.objects.filter(
            storage_type=Document.STORAGE_TYPE_GPG).first()
    except OperationalError:
        return []  # No documents table yet

    if encrypted_doc:
        if not settings.PASSPHRASE:
            return [Error(
                "The database contains encrypted documents but no password "
                "is set."
            )]
        if not GnuPG.decrypted(encrypted_doc.source_file):
            return [Error(textwrap.dedent(
                """
                The current password doesn't match the password of the
                existing documents.

                If you intend to change your password, you must first export
                all of the old documents, start fresh with the new password
                and then re-import them.
                """))]

    return []

@@ -3,8 +3,10 @@ import hashlib
import logging
import os
import re
import time
import uuid
from operator import itemgetter
from django.conf import settings
from django.utils import timezone
from paperless.db import GnuPG
@@ -27,36 +29,40 @@ class Consumer:
Loop over every file found in CONSUMPTION_DIR and:
1. Convert it to a greyscale pnm
2. Use tesseract on the pnm
3. Encrypt and store the document in the MEDIA_ROOT
3. Store the document in the MEDIA_ROOT with optional encryption
4. Store the OCR'd text in the database
5. Delete the document and image(s)
"""
SCRATCH = settings.SCRATCH_DIR
CONSUME = settings.CONSUMPTION_DIR
# Files are considered ready for consumption if they have been unmodified
# for this duration
FILES_MIN_UNMODIFIED_DURATION = 0.5
def __init__(self):
def __init__(self, consume=settings.CONSUMPTION_DIR,
scratch=settings.SCRATCH_DIR):
self.logger = logging.getLogger(__name__)
self.logging_group = None
try:
os.makedirs(self.SCRATCH)
except FileExistsError:
pass
self.stats = {}
self._ignore = []
self.consume = consume
self.scratch = scratch
if not self.CONSUME:
os.makedirs(self.scratch, exist_ok=True)
self.storage_type = Document.STORAGE_TYPE_UNENCRYPTED
if settings.PASSPHRASE:
self.storage_type = Document.STORAGE_TYPE_GPG
if not self.consume:
raise ConsumerError(
"The CONSUMPTION_DIR settings variable does not appear to be "
"set."
)
if not os.path.exists(self.CONSUME):
if not os.path.exists(self.consume):
raise ConsumerError(
"Consumption directory {} does not exist".format(self.CONSUME))
"Consumption directory {} does not exist".format(self.consume))
self.parsers = []
for response in document_consumer_declaration.send(self):
@@ -73,83 +79,99 @@ class Consumer:
"group": self.logging_group
})
def consume(self):
def consume_new_files(self):
"""
Find non-ignored files in consumption dir and consume them if they have
been unmodified for FILES_MIN_UNMODIFIED_DURATION.
"""
ignored_files = []
files = []
for entry in os.scandir(self.consume):
if entry.is_file():
file = (entry.path, entry.stat().st_mtime)
if file in self._ignore:
ignored_files.append(file)
else:
files.append(file)
for doc in os.listdir(self.CONSUME):
if not files:
return
doc = os.path.join(self.CONSUME, doc)
# Set _ignore to only include files that still exist.
# This keeps it from growing indefinitely.
self._ignore[:] = ignored_files
if not os.path.isfile(doc):
continue
files_old_to_new = sorted(files, key=itemgetter(1))
if not re.match(FileInfo.REGEXES["title"], doc):
continue
time.sleep(self.FILES_MIN_UNMODIFIED_DURATION)
if doc in self._ignore:
continue
for file, mtime in files_old_to_new:
if mtime == os.path.getmtime(file):
# File has not been modified and can be consumed
if not self.try_consume_file(file):
self._ignore.append((file, mtime))
if not self._is_ready(doc):
continue
def try_consume_file(self, file):
"Return True if file was consumed"
if self._is_duplicate(doc):
self.log(
"info",
"Skipping {} as it appears to be a duplicate".format(doc)
)
self._ignore.append(doc)
continue
if not re.match(FileInfo.REGEXES["title"], file):
return False
parser_class = self._get_parser_class(doc)
if not parser_class:
self.log(
"error", "No parsers could be found for {}".format(doc))
self._ignore.append(doc)
continue
doc = file
self.logging_group = uuid.uuid4()
if self._is_duplicate(doc):
self.log(
"info",
"Skipping {} as it appears to be a duplicate".format(doc)
)
return False
self.log("info", "Consuming {}".format(doc))
parser_class = self._get_parser_class(doc)
if not parser_class:
self.log(
"error", "No parsers could be found for {}".format(doc))
return False
document_consumption_started.send(
sender=self.__class__,
filename=doc,
logging_group=self.logging_group
self.logging_group = uuid.uuid4()
self.log("info", "Consuming {}".format(doc))
document_consumption_started.send(
sender=self.__class__,
filename=doc,
logging_group=self.logging_group
)
parsed_document = parser_class(doc)
try:
thumbnail = parsed_document.get_thumbnail()
date = parsed_document.get_date()
document = self._store(
parsed_document.get_text(),
doc,
thumbnail,
date
)
except ParseError as e:
self.log("error", "PARSE FAILURE for {}: {}".format(doc, e))
parsed_document.cleanup()
return False
else:
parsed_document.cleanup()
self._cleanup_doc(doc)
self.log(
"info",
"Document {} consumption finished".format(document)
)
parsed_document = parser_class(doc)
try:
thumbnail = parsed_document.get_thumbnail()
date = parsed_document.get_date()
document = self._store(
parsed_document.get_text(),
doc,
thumbnail,
date
)
except ParseError as e:
self._ignore.append(doc)
self.log("error", "PARSE FAILURE for {}: {}".format(doc, e))
parsed_document.cleanup()
continue
else:
parsed_document.cleanup()
self._cleanup_doc(doc)
self.log(
"info",
"Document {} consumption finished".format(document)
)
document_consumption_finished.send(
sender=self.__class__,
document=document,
logging_group=self.logging_group
)
document_consumption_finished.send(
sender=self.__class__,
document=document,
logging_group=self.logging_group
)
return True
def _get_parser_class(self, doc):
"""
@@ -195,7 +217,8 @@ class Consumer:
file_type=file_info.extension,
checksum=hashlib.md5(f.read()).hexdigest(),
created=created,
modified=created
modified=created,
storage_type=self.storage_type
)
relevant_tags = set(list(Tag.match_all(text)) + list(file_info.tags))
@@ -204,42 +227,26 @@ class Consumer:
self.log("debug", "Tagging with {}".format(tag_names))
document.tags.add(*relevant_tags)
# Encrypt and store the actual document
with open(doc, "rb") as unencrypted:
with open(document.source_path, "wb") as encrypted:
self.log("debug", "Encrypting the document")
encrypted.write(GnuPG.encrypted(unencrypted))
# Encrypt and store the thumbnail
with open(thumbnail, "rb") as unencrypted:
with open(document.thumbnail_path, "wb") as encrypted:
self.log("debug", "Encrypting the thumbnail")
encrypted.write(GnuPG.encrypted(unencrypted))
self._write(document, doc, document.source_path)
self._write(document, thumbnail, document.thumbnail_path)
self.log("info", "Completed")
return document
def _write(self, document, source, target):
with open(source, "rb") as read_file:
with open(target, "wb") as write_file:
if document.storage_type == Document.STORAGE_TYPE_UNENCRYPTED:
write_file.write(read_file.read())
return
self.log("debug", "Encrypting")
write_file.write(GnuPG.encrypted(read_file))
def _cleanup_doc(self, doc):
self.log("debug", "Deleting document {}".format(doc))
os.unlink(doc)
def _is_ready(self, doc):
"""
Detect whether `doc` is ready to consume or if it's still being written
to by the uploader.
"""
t = os.stat(doc).st_mtime
if self.stats.get(doc) == t:
del(self.stats[doc])
return True
self.stats[doc] = t
return False
@staticmethod
def _is_duplicate(doc):
with open(doc, "rb") as f:
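
The "unmodified for ``FILES_MIN_UNMODIFIED_DURATION``" test used by ``consume_new_files`` above boils down to snapshotting modification times, waiting briefly, and comparing. A stripped-down sketch of that idea, assuming a hypothetical standalone helper ``unmodified_files`` (the ignore list, the title regex and the actual consumption are deliberately left out):

```python
import os
import time


def unmodified_files(directory, settle_time=0.5):
    """Return paths whose mtime did not change during settle_time seconds.

    Results are ordered oldest first, like the consumer's own ordering.
    """
    before = {
        entry.path: entry.stat().st_mtime
        for entry in os.scandir(directory)
        if entry.is_file()
    }
    time.sleep(settle_time)
    stable = []
    for path, mtime in sorted(before.items(), key=lambda item: item[1]):
        try:
            if os.path.getmtime(path) == mtime:
                stable.append(path)
        except FileNotFoundError:
            # The file was moved or deleted while we waited
            continue
    return stable
```

A file that is still being written will normally show a newer mtime on the second look and is simply picked up on a later pass.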

@@ -92,7 +92,7 @@ class UploadForm(forms.Form):
t = int(mktime(datetime.now().timetuple()))
file_name = os.path.join(
Consumer.CONSUME,
settings.CONSUMPTION_DIR,
"{} - {}.{}".format(correspondent, title, self._file_type)
)

@@ -13,7 +13,6 @@ from dateutil import parser
from django.conf import settings
from .consumer import Consumer
from .models import Correspondent
@@ -21,7 +20,7 @@ class MailFetcherError(Exception):
pass
class InvalidMessageError(Exception):
class InvalidMessageError(MailFetcherError):
pass
@@ -43,10 +42,7 @@ class Message(Loggable):
and n attachments, and that we don't care about the message body.
"""
SECRET = os.getenv(
"PAPERLESS_EMAIL_SECRET",
os.getenv("PAPERLESS_SHARED_SECRET") # TODO: Remove after 2017/09
)
SECRET = os.getenv("PAPERLESS_EMAIL_SECRET")
def __init__(self, data, group=None):
"""
@@ -79,6 +75,9 @@ class Message(Loggable):
continue
dispositions = content_disposition.strip().split(";")
if len(dispositions) < 2:
continue
if not dispositions[0].lower() == "attachment" and \
"filename" not in dispositions[1].lower():
continue
@@ -151,7 +150,7 @@ class Attachment(object):
class MailFetcher(Loggable):
def __init__(self):
def __init__(self, consume=settings.CONSUMPTION_DIR):
Loggable.__init__(self)
@@ -163,8 +162,11 @@ class MailFetcher(Loggable):
self._inbox = os.getenv("PAPERLESS_CONSUME_MAIL_INBOX", "INBOX")
self._enabled = bool(self._host)
if self._enabled and Message.SECRET is None:
raise MailFetcherError("No PAPERLESS_EMAIL_SECRET defined")
self.last_checked = datetime.datetime.now()
self.last_checked = time.time()
self.consume = consume
def pull(self):
"""
@@ -185,12 +187,12 @@ class MailFetcher(Loggable):
self.log("info", 'Storing email: "{}"'.format(message.subject))
t = int(time.mktime(message.time.timetuple()))
file_name = os.path.join(Consumer.CONSUME, message.file_name)
file_name = os.path.join(self.consume, message.file_name)
with open(file_name, "wb") as f:
f.write(message.attachment.data)
os.utime(file_name, times=(t, t))
self.last_checked = datetime.datetime.now()
self.last_checked = time.time()
def _get_messages(self):
@@ -208,7 +210,7 @@ class MailFetcher(Loggable):
self._connection.close()
self._connection.logout()
except Exception as e:
except MailFetcherError as e:
self.log("error", str(e))
return r
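
The ``Content-Disposition`` guard added to ``Message`` earlier in this file protects against headers that have no second ``;``-separated part. Pulled out into a standalone function, the combined logic looks roughly like the sketch below. ``looks_like_attachment`` is a hypothetical name used only for illustration, and the sketch strips whitespace around the disposition type, which the code in the diff does not do.

```python
def looks_like_attachment(content_disposition):
    """Return True if a raw Content-Disposition value suggests a file."""
    if not content_disposition:
        return False
    parts = content_disposition.strip().split(";")
    if len(parts) < 2:
        # e.g. a bare "inline" with no parameters: indexing parts[1]
        # would raise IndexError without this check
        return False
    kind = parts[0].strip().lower()
    return kind == "attachment" or "filename" in parts[1].lower()
```

For example, ``looks_like_attachment('attachment; filename="a.pdf"')`` is true, while a bare ``"inline"`` is rejected before the second element is ever touched.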

@@ -0,0 +1,119 @@
import os

from django.conf import settings
from django.core.management.base import BaseCommand, CommandError
from termcolor import colored as coloured

from documents.models import Document
from paperless.db import GnuPG


class Command(BaseCommand):

    help = (
        "This is how you migrate your stored documents from an encrypted "
        "state to an unencrypted one (or vice-versa)"
    )

    def add_arguments(self, parser):
        parser.add_argument(
            "from",
            choices=("gpg", "unencrypted"),
            help="The state you want to change your documents from"
        )
        parser.add_argument(
            "to",
            choices=("gpg", "unencrypted"),
            help="The state you want to change your documents to"
        )
        parser.add_argument(
            "--passphrase",
            help="If PAPERLESS_PASSPHRASE isn't set already, you need to "
                 "specify it here"
        )

    def handle(self, *args, **options):
        try:
            print(coloured(
                "\n\nWARNING: This script is going to work directly on your "
                "document originals, so\nWARNING: you probably shouldn't run "
                "this unless you've got a recent backup\nWARNING: handy. It "
                "*should* work without a hitch, but be safe and backup your\n"
                "WARNING: stuff first.\n\nHit Ctrl+C to exit now, or Enter to "
                "continue.\n\n",
                "yellow",
                attrs=("bold",)
            ))
            __ = input()
        except KeyboardInterrupt:
            return

        if options["from"] == options["to"]:
            raise CommandError(
                'The "from" and "to" values can\'t be the same.'
            )

        passphrase = options["passphrase"] or settings.PASSPHRASE
        if not passphrase:
            raise CommandError(
                "Passphrase not defined. Please set it with --passphrase or "
                "by declaring it in your environment or your config."
            )

        if options["from"] == "gpg" and options["to"] == "unencrypted":
            self.__gpg_to_unencrypted(passphrase)
        elif options["from"] == "unencrypted" and options["to"] == "gpg":
            self.__unencrypted_to_gpg(passphrase)

    @staticmethod
    def __gpg_to_unencrypted(passphrase):
        encrypted_files = Document.objects.filter(
            storage_type=Document.STORAGE_TYPE_GPG)

        for document in encrypted_files:
            print(coloured("Decrypting {}".format(document), "green"))

            old_paths = [document.source_path, document.thumbnail_path]
            raw_document = GnuPG.decrypted(document.source_file, passphrase)
            raw_thumb = GnuPG.decrypted(document.thumbnail_file, passphrase)

            document.storage_type = Document.STORAGE_TYPE_UNENCRYPTED

            with open(document.source_path, "wb") as f:
                f.write(raw_document)
            with open(document.thumbnail_path, "wb") as f:
                f.write(raw_thumb)

            document.save(update_fields=("storage_type",))

            for path in old_paths:
                os.unlink(path)

    @staticmethod
    def __unencrypted_to_gpg(passphrase):
        unencrypted_files = Document.objects.filter(
            storage_type=Document.STORAGE_TYPE_UNENCRYPTED)

        for document in unencrypted_files:
            print(coloured("Encrypting {}".format(document), "green"))

            old_paths = [document.source_path, document.thumbnail_path]

            with open(document.source_path, "rb") as raw_document:
                with open(document.thumbnail_path, "rb") as raw_thumb:
                    document.storage_type = Document.STORAGE_TYPE_GPG
                    with open(document.source_path, "wb") as f:
                        f.write(GnuPG.encrypted(raw_document, passphrase))
                    with open(document.thumbnail_path, "wb") as f:
                        f.write(GnuPG.encrypted(raw_thumb, passphrase))

            document.save(update_fields=("storage_type",))

            for path in old_paths:
                os.unlink(path)

@@ -1,6 +1,7 @@
import datetime
import logging
import os
import sys
import time
from django.conf import settings
@@ -9,6 +10,11 @@ from django.core.management.base import BaseCommand, CommandError
from ...consumer import Consumer, ConsumerError
from ...mail import MailFetcher, MailFetcherError
try:
from inotify_simple import INotify, flags
except ImportError:
pass
class Command(BaseCommand):
"""
@@ -16,9 +22,6 @@ class Command(BaseCommand):
consumption directory, and fetch any mail available.
"""
LOOP_TIME = settings.CONSUMER_LOOP_TIME
MAIL_DELTA = datetime.timedelta(minutes=10)
ORIGINAL_DOCS = os.path.join(settings.MEDIA_ROOT, "documents", "originals")
THUMB_DOCS = os.path.join(settings.MEDIA_ROOT, "documents", "thumbnails")
@@ -32,44 +35,113 @@ class Command(BaseCommand):
BaseCommand.__init__(self, *args, **kwargs)
def add_arguments(self, parser):
parser.add_argument(
"directory",
default=settings.CONSUMPTION_DIR,
nargs="?",
help="The consumption directory."
)
parser.add_argument(
"--loop-time",
default=settings.CONSUMER_LOOP_TIME,
type=int,
help="Wait time between each loop (in seconds)."
)
parser.add_argument(
"--mail-delta",
default=10,
type=int,
help="Wait time between each mail fetch (in minutes)."
)
parser.add_argument(
"--oneshot",
action="store_true",
help="Run only once."
)
parser.add_argument(
"--no-inotify",
action="store_true",
help="Don't use inotify, even if it's available."
)
def handle(self, *args, **options):
self.verbosity = options["verbosity"]
directory = options["directory"]
loop_time = options["loop_time"]
mail_delta = options["mail_delta"] * 60
use_inotify = (not options["no_inotify"]
and "inotify_simple" in sys.modules)
try:
self.file_consumer = Consumer()
self.mail_fetcher = MailFetcher()
self.file_consumer = Consumer(consume=directory)
self.mail_fetcher = MailFetcher(consume=directory)
except (ConsumerError, MailFetcherError) as e:
raise CommandError(e)
for path in (self.ORIGINAL_DOCS, self.THUMB_DOCS):
try:
os.makedirs(path)
except FileExistsError:
pass
for d in (self.ORIGINAL_DOCS, self.THUMB_DOCS):
os.makedirs(d, exist_ok=True)
logging.getLogger(__name__).info(
"Starting document consumer at {}".format(settings.CONSUMPTION_DIR)
"Starting document consumer at {}{}".format(
directory,
" with inotify" if use_inotify else ""
)
)
try:
while True:
self.loop()
time.sleep(self.LOOP_TIME)
if self.verbosity > 1:
print(".")
except KeyboardInterrupt:
print("Exiting")
if options["oneshot"]:
self.loop_step(mail_delta)
else:
try:
if use_inotify:
self.loop_inotify(mail_delta)
else:
self.loop(loop_time, mail_delta)
except KeyboardInterrupt:
print("Exiting")
def loop(self):
def loop(self, loop_time, mail_delta):
while True:
start_time = time.time()
if self.verbosity > 1:
print(".", int(start_time))
self.loop_step(mail_delta, start_time)
# Sleep until the start of the next loop step
time.sleep(max(0, start_time + loop_time - time.time()))
# Consume whatever files we can
self.file_consumer.consume()
def loop_step(self, mail_delta, time_now=None):
# Occasionally fetch mail and store it to be consumed on the next loop
# We fetch email when we first start up so that it is not necessary to
# wait for 10 minutes after making changes to the config file.
delta = self.mail_fetcher.last_checked + self.MAIL_DELTA
if self.first_iteration or delta < datetime.datetime.now():
next_mail_time = self.mail_fetcher.last_checked + mail_delta
if self.first_iteration or time_now > next_mail_time:
self.first_iteration = False
self.mail_fetcher.pull()
self.file_consumer.consume_new_files()
def loop_inotify(self, mail_delta):
directory = self.file_consumer.consume
inotify = INotify()
inotify.add_watch(directory, flags.CLOSE_WRITE | flags.MOVED_TO)
# Run initial mail fetch and consume all currently existing documents
self.loop_step(mail_delta)
next_mail_time = self.mail_fetcher.last_checked + mail_delta
while True:
# Consume documents until next_mail_time
while True:
delta = next_mail_time - time.time()
if delta > 0:
for event in inotify.read(timeout=delta):
file = os.path.join(directory, event.name)
if os.path.isfile(file):
self.file_consumer.try_consume_file(file)
else:
break
self.mail_fetcher.pull()
next_mail_time = self.mail_fetcher.last_checked + mail_delta

@@ -1,8 +1,8 @@
import json
import os
import time
import shutil
from django.conf import settings
from django.core.management.base import BaseCommand, CommandError
from django.core import serializers
@@ -45,9 +45,6 @@ class Command(Renderable, BaseCommand):
if not os.access(self.target, os.W_OK):
raise CommandError("That path doesn't appear to be writable")
if not settings.PASSPHRASE:
settings.PASSPHRASE = input("Please enter the passphrase: ")
if options["legacy"]:
self.dump_legacy()
else:
@@ -73,13 +70,20 @@ class Command(Renderable, BaseCommand):
print("Exporting: {}".format(file_target))
t = int(time.mktime(document.created.timetuple()))
with open(file_target, "wb") as f:
f.write(GnuPG.decrypted(document.source_file))
os.utime(file_target, times=(t, t))
if document.storage_type == Document.STORAGE_TYPE_GPG:
with open(thumbnail_target, "wb") as f:
f.write(GnuPG.decrypted(document.thumbnail_file))
os.utime(thumbnail_target, times=(t, t))
with open(file_target, "wb") as f:
f.write(GnuPG.decrypted(document.source_file))
os.utime(file_target, times=(t, t))
with open(thumbnail_target, "wb") as f:
f.write(GnuPG.decrypted(document.thumbnail_file))
os.utime(thumbnail_target, times=(t, t))
else:
shutil.copy(document.source_path, file_target)
shutil.copy(document.thumbnail_path, thumbnail_target)
manifest += json.loads(
serializers.serialize("json", Correspondent.objects.all()))

@@ -1,5 +1,6 @@
import json
import os
import shutil
from django.conf import settings
from django.core.management.base import BaseCommand, CommandError
@@ -46,12 +47,6 @@ class Command(Renderable, BaseCommand):
self._check_manifest()
if not settings.PASSPHRASE:
raise CommandError(
"You need to define a passphrase before continuing. Please "
"consult the documentation for setting up Paperless."
)
# Fill up the database with whatever is in the manifest
call_command("loaddata", manifest_path)
@@ -99,14 +94,21 @@ class Command(Renderable, BaseCommand):
document_path = os.path.join(self.source, doc_file)
thumbnail_path = os.path.join(self.source, thumb_file)
with open(document_path, "rb") as unencrypted:
with open(document.source_path, "wb") as encrypted:
print("Encrypting {} and saving it to {}".format(
doc_file, document.source_path))
encrypted.write(GnuPG.encrypted(unencrypted))
if document.storage_type == Document.STORAGE_TYPE_GPG:
with open(thumbnail_path, "rb") as unencrypted:
with open(document.thumbnail_path, "wb") as encrypted:
print("Encrypting {} and saving it to {}".format(
thumb_file, document.thumbnail_path))
encrypted.write(GnuPG.encrypted(unencrypted))
with open(document_path, "rb") as unencrypted:
with open(document.source_path, "wb") as encrypted:
print("Encrypting {} and saving it to {}".format(
doc_file, document.source_path))
encrypted.write(GnuPG.encrypted(unencrypted))
with open(thumbnail_path, "rb") as unencrypted:
with open(document.thumbnail_path, "wb") as encrypted:
print("Encrypting {} and saving it to {}".format(
thumb_file, document.thumbnail_path))
encrypted.write(GnuPG.encrypted(unencrypted))
else:
shutil.copy(document_path, document.source_path)
shutil.copy(thumbnail_path, document.thumbnail_path)


@@ -0,0 +1,25 @@
# -*- coding: utf-8 -*-
# Generated by Django 1.10.5 on 2017-07-15 17:12
from __future__ import unicode_literals
from django.contrib.auth.models import User
from django.db import migrations
def forwards_func(apps, schema_editor):
User.objects.create(username="consumer")
def reverse_func(apps, schema_editor):
User.objects.get(username="consumer").delete()
class Migration(migrations.Migration):
dependencies = [
('documents', '0018_auto_20170715_1712'),
]
operations = [
migrations.RunPython(forwards_func, reverse_func),
]


@@ -0,0 +1,27 @@
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
from django.db import migrations, models
import django.utils.timezone
def set_added_time_to_created_time(apps, schema_editor):
Document = apps.get_model("documents", "Document")
for doc in Document.objects.all():
doc.added = doc.created
doc.save()
class Migration(migrations.Migration):
dependencies = [
('documents', '0019_add_consumer_user'),
]
operations = [
migrations.AddField(
model_name='document',
name='added',
field=models.DateTimeField(db_index=True, default=django.utils.timezone.now, editable=False),
),
migrations.RunPython(set_added_time_to_created_time)
]


@@ -0,0 +1,30 @@
# -*- coding: utf-8 -*-
# Generated by Django 1.11.10 on 2018-02-04 13:07
from __future__ import unicode_literals
from django.db import migrations, models
class Migration(migrations.Migration):
dependencies = [
('documents', '0020_document_added'),
]
operations = [
# Add the field with the default GPG-encrypted value
migrations.AddField(
model_name='document',
name='storage_type',
field=models.CharField(choices=[('unencrypted', 'Unencrypted'), ('gpg', 'Encrypted with GNU Privacy Guard')], default='gpg', editable=False, max_length=11),
),
# Now that the field is added, change the default to unencrypted
migrations.AlterField(
model_name='document',
name='storage_type',
field=models.CharField(choices=[('unencrypted', 'Unencrypted'), ('gpg', 'Encrypted with GNU Privacy Guard')], default='unencrypted', editable=False, max_length=11),
),
]


@@ -1,4 +1,4 @@
class Renderable(object):
class Renderable:
"""
A handy mixin to make it easier/cleaner to print output based on a
verbosity value.


@@ -57,7 +57,7 @@ class MatchingModel(models.Model):
is_insensitive = models.BooleanField(default=True)
class Meta(object):
class Meta:
abstract = True
def __str__(self):
@@ -156,7 +156,7 @@ class Correspondent(MatchingModel):
# better safe than sorry.
SAFE_REGEX = re.compile(r"^[\w\- ,.']+$")
class Meta(object):
class Meta:
ordering = ("name",)
@@ -190,6 +190,13 @@ class Document(models.Model):
TYPE_TIF = "tiff"
TYPES = (TYPE_PDF, TYPE_PNG, TYPE_JPG, TYPE_GIF, TYPE_TIF,)
STORAGE_TYPE_UNENCRYPTED = "unencrypted"
STORAGE_TYPE_GPG = "gpg"
STORAGE_TYPES = (
(STORAGE_TYPE_UNENCRYPTED, "Unencrypted"),
(STORAGE_TYPE_GPG, "Encrypted with GNU Privacy Guard")
)
correspondent = models.ForeignKey(
Correspondent,
blank=True,
@@ -230,7 +237,17 @@ class Document(models.Model):
modified = models.DateTimeField(
auto_now=True, editable=False, db_index=True)
class Meta(object):
storage_type = models.CharField(
max_length=11,
choices=STORAGE_TYPES,
default=STORAGE_TYPE_UNENCRYPTED,
editable=False
)
added = models.DateTimeField(
default=timezone.now, editable=False, db_index=True)
class Meta:
ordering = ("correspondent", "title")
def __str__(self):
@@ -244,11 +261,16 @@ class Document(models.Model):
@property
def source_path(self):
file_name = "{:07}.{}".format(self.pk, self.file_type)
if self.storage_type == self.STORAGE_TYPE_GPG:
file_name += ".gpg"
return os.path.join(
settings.MEDIA_ROOT,
"documents",
"originals",
"{:07}.{}.gpg".format(self.pk, self.file_type)
file_name
)
@property
@@ -265,11 +287,16 @@ class Document(models.Model):
@property
def thumbnail_path(self):
file_name = "{:07}.png".format(self.pk)
if self.storage_type == self.STORAGE_TYPE_GPG:
file_name += ".gpg"
return os.path.join(
settings.MEDIA_ROOT,
"documents",
"thumbnails",
"{:07}.png.gpg".format(self.pk)
file_name
)
@property
@@ -299,7 +326,7 @@ class Log(models.Model):
objects = LogManager()
class Meta(object):
class Meta:
ordering = ("-modified",)
def __str__(self):
@@ -319,7 +346,7 @@ class Log(models.Model):
models.Model.save(self, *args, **kwargs)
class FileInfo(object):
class FileInfo:
# This epic regex *almost* worked for our needs, so I'm keeping it here for
# posterity, in the hopes that we might find a way to make it work one day.
@@ -394,7 +421,10 @@ class FileInfo(object):
@classmethod
def _get_created(cls, created):
return dateutil.parser.parse("{:0<14}Z".format(created[:-1]))
try:
return dateutil.parser.parse("{:0<14}Z".format(created[:-1]))
except ValueError:
return None
@classmethod
def _get_correspondent(cls, name):
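
The `source_path` and `thumbnail_path` changes in this file make the `.gpg` suffix conditional on the document's storage type. A standalone sketch of the naming rule, written as a plain function instead of a model property (the function name and the `media_root` parameter are assumptions made for illustration):

```python
import os

STORAGE_TYPE_UNENCRYPTED = "unencrypted"
STORAGE_TYPE_GPG = "gpg"


def source_path(media_root, pk, file_type, storage_type):
    """Build the on-disk path of an original document."""
    # Zero-pad the primary key to seven digits, as in the diff.
    file_name = "{:07}.{}".format(pk, file_type)
    if storage_type == STORAGE_TYPE_GPG:
        # Only encrypted originals carry the extra suffix.
        file_name += ".gpg"
    return os.path.join(media_root, "documents", "originals", file_name)
```

One consequence of this rule is that changing a document's `storage_type` also changes the path it resolves to, so the file on disk has to be converted and renamed together with the field.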


@@ -3,6 +3,10 @@ import os
from subprocess import Popen
from django.conf import settings
from django.contrib.admin.models import ADDITION, LogEntry
from django.contrib.auth.models import User
from django.contrib.contenttypes.models import ContentType
from django.utils import timezone
from ..models import Correspondent, Document, Tag
@@ -93,3 +97,18 @@ def cleanup_document_deletion(sender, instance, using, **kwargs):
os.unlink(f)
except FileNotFoundError:
pass # The file's already gone, so we're cool with it.
def set_log_entry(sender, document=None, logging_group=None, **kwargs):
ct = ContentType.objects.get(model="document")
user = User.objects.get(username="consumer")
LogEntry.objects.create(
action_flag=ADDITION,
action_time=timezone.now(),
content_type=ct,
object_id=document.id,
user=user,
object_repr=document.__str__(),
)


@@ -29,13 +29,32 @@
.result .header {
padding: 5px;
background-color: #79AEC8;
position: relative;
}
.result .header .checkbox{
.result .header .checkbox {
width: 5%;
float: left;
position: absolute;
z-index: 2;
}
.result .header .info {
margin-left: 10%;
position: relative;
}
.headerLink {
cursor: pointer;
opacity: 0;
z-index: 1;
position: absolute;
top: 0;
left: 0;
width: 100%;
height: 100%;
}
.header > a {
z-index: 2;
margin-left: 10%;
position: relative;
}
.result .header a,
.result a.tag {
@@ -129,23 +148,36 @@
{# 0: Checkbox #}
{# 1: Title #}
{# 2: Date #}
{# 3: Image #}
{# 4: Correspondent #}
{# 5: Tags #}
{# 3: Added #}
{# 4: Image #}
{# 5: Correspondent #}
{# 6: Tags #}
{# 7: Document edit url #}
<div class="box">
<div class="result">
<div class="header">
{% comment %}
The purpose of 'headerLink' is to make the whole header
background clickable.
We use an onclick handler here instead of a native link ('<a>')
to allow selecting (and copying) the overlying doc title text
with the mouse cursor.
If the title link were layered upon another link ('<a>'), title text
selection would not be possible with mouse click + drag. Instead,
the underlying link would be dragged.
{% endcomment %}
<div class="headerLink" onclick="location.href='{{ result.7 }}';"></div>
<div class="checkbox">{{ result.0 }}</div>
<div class="info">
{{ result.4 }}<br />
{{ result.1 }}
{{ result.5 }}
</div>
{{ result.1 }}
<div style="clear: both;"></div>
</div>
<div class="tags">{{ result.5 }}</div>
<div class="tags">{{ result.6 }}</div>
<div class="date">{{ result.2 }}</div>
<div style="clear: both;"></div>
<div class="image">{{ result.3 }}</div>
<div class="image">{{ result.4 }}</div>
</div>
</div>
{% endfor %}


@@ -0,0 +1,39 @@
{% extends "admin/index.html" %}
{% load i18n static %}
{# This whole block is here just to override the `get_admin_log` line so #}
{# that the log entries aren't limited to the current user #}
{% block sidebar %}
<div id="content-related">
<div class="module" id="recent-actions-module">
<h2>{% trans 'Recent actions' %}</h2>
<h3>{% trans 'My actions' %}</h3>
{% load log %}
{% get_admin_log 10 as admin_log %}
{% if not admin_log %}
<p>{% trans 'None available' %}</p>
{% else %}
<ul class="actionlist">
{% for entry in admin_log %}
<li class="{% if entry.is_addition %}addlink{% endif %}{% if entry.is_change %}changelink{% endif %}{% if entry.is_deletion %}deletelink{% endif %}">
{% if entry.is_deletion or not entry.get_admin_url %}
{{ entry.object_repr }}
{% else %}
<a href="{{ entry.get_admin_url }}">{{ entry.object_repr }}</a>
{% endif %}
<br/>
{% if entry.content_type %}
<span class="mini quiet">{% filter capfirst %}{{ entry.content_type }}{% endfilter %}</span>
{% else %}
<span class="mini quiet">{% trans 'Unknown content' %}</span>
{% endif %}
</li>
{% endfor %}
</ul>
{% endif %}
</div>
</div>
{% endblock %}


@@ -1,3 +1,5 @@
import re
from django.contrib.admin.templatetags.admin_list import (
result_headers,
result_hidden_fields,
@@ -6,6 +8,8 @@ from django.contrib.admin.templatetags.admin_list import (
from django.template import Library
EXTRACT_URL = re.compile(r'href="(.*?)"')
register = Library()
@@ -25,4 +29,15 @@ def result_list(cl):
'result_hidden_fields': list(result_hidden_fields(cl)),
'result_headers': headers,
'num_sorted_fields': num_sorted_fields,
'results': list(results(cl))}
'results': map(add_doc_edit_url, results(cl))}
def add_doc_edit_url(result):
"""
Make the document edit URL accessible to the view as a separate item
"""
title = result[1]
match = re.search(EXTRACT_URL, title)
edit_doc_url = match.group(1)
result.append(edit_doc_url)
return result
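
`add_doc_edit_url` pulls the edit URL out of the rendered title cell with the non-greedy pattern `href="(.*?)"`. A small standalone sketch of that extraction is shown below. The helper name `extract_edit_url` and the `None` fallback for cells without a link are additions for illustration; the diff itself calls `match.group(1)` unconditionally.

```python
import re

# Same pattern as in the diff: capture everything up to the first
# closing quote after href=".
EXTRACT_URL = re.compile(r'href="(.*?)"')


def extract_edit_url(title_html):
    """Return the first href value in `title_html`, or None if absent."""
    match = EXTRACT_URL.search(title_html)
    return match.group(1) if match else None
```

The non-greedy `.*?` matters when the cell contains more than one quoted attribute: a greedy `.*` would run to the last quote in the string.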


@@ -0,0 +1,25 @@
import unittest
from django.test import TestCase
from ..checks import changed_password_check
from ..models import Document
from .factories import DocumentFactory
class ChecksTestCase(TestCase):
def test_changed_password_check_empty_db(self):
self.assertEqual(changed_password_check(None), [])
def test_changed_password_check_no_encryption(self):
DocumentFactory.create(storage_type=Document.STORAGE_TYPE_UNENCRYPTED)
self.assertEqual(changed_password_check(None), [])
@unittest.skip("I don't know how to test this")
def test_changed_password_check_gpg_encryption_with_good_password(self):
pass
@unittest.skip("I don't know how to test this")
def test_changed_password_check_fail(self):
pass


@@ -1,5 +1,6 @@
from django.test import TestCase
from unittest import mock
from tempfile import TemporaryDirectory
from ..consumer import Consumer
from ..models import FileInfo
@@ -16,7 +17,6 @@ class TestConsumer(TestCase):
self.DummyParser
)
@mock.patch("documents.consumer.Consumer.CONSUME")
@mock.patch("documents.consumer.os.makedirs")
@mock.patch("documents.consumer.os.path.exists", return_value=True)
@mock.patch("documents.consumer.document_consumer_declaration.send")
@@ -32,18 +32,22 @@ class TestConsumer(TestCase):
(None, lambda _: {"weight": 0, "parser": DummyParser1}),
(None, lambda _: {"weight": 1, "parser": DummyParser2}),
)
with TemporaryDirectory() as tmpdir:
self.assertEqual(
Consumer(consume=tmpdir)._get_parser_class("doc.pdf"),
DummyParser2
)
self.assertEqual(Consumer()._get_parser_class("doc.pdf"), DummyParser2)
@mock.patch("documents.consumer.Consumer.CONSUME")
@mock.patch("documents.consumer.os.makedirs")
@mock.patch("documents.consumer.os.path.exists", return_value=True)
@mock.patch("documents.consumer.document_consumer_declaration.send")
def test__get_parser_class_0_parsers(self, m, *args):
m.return_value = ((None, lambda _: None),)
self.assertIsNone(Consumer()._get_parser_class("doc.pdf"))
with TemporaryDirectory() as tmpdir:
self.assertIsNone(
Consumer(consume=tmpdir)._get_parser_class("doc.pdf")
)
@mock.patch("documents.consumer.Consumer.CONSUME")
@mock.patch("documents.consumer.os.makedirs")
@mock.patch("documents.consumer.os.path.exists", return_value=True)
@mock.patch("documents.consumer.document_consumer_declaration.send")
@@ -51,7 +55,8 @@ class TestConsumer(TestCase):
m.return_value = (
(None, lambda _: {"weight": 0, "parser": self.DummyParser}),
)
return Consumer()
with TemporaryDirectory() as tmpdir:
return Consumer(consume=tmpdir)
class TestAttributes(TestCase):
@@ -271,11 +276,13 @@ class TestFieldPermutations(TestCase):
def test_created_and_correspondent_and_title_and_tags(self):
template = ("/path/to/{created} - "
"{correspondent} - "
"{title} - "
"{tags}"
".{extension}")
template = (
"/path/to/{created} - "
"{correspondent} - "
"{title} - "
"{tags}"
".{extension}"
)
for created in self.valid_dates:
for correspondent in self.valid_correspondents:
@@ -294,10 +301,7 @@ class TestFieldPermutations(TestCase):
def test_created_and_correspondent_and_title(self):
template = ("/path/to/{created} - "
"{correspondent} - "
"{title}"
".{extension}")
template = "/path/to/{created} - {correspondent} - {title}.{extension}"
for created in self.valid_dates:
for correspondent in self.valid_correspondents:
@@ -320,9 +324,7 @@ class TestFieldPermutations(TestCase):
def test_created_and_title(self):
template = ("/path/to/{created} - "
"{title}"
".{extension}")
template = "/path/to/{created} - {title}.{extension}"
for created in self.valid_dates:
for title in self.valid_titles:
@@ -337,10 +339,7 @@ class TestFieldPermutations(TestCase):
def test_created_and_title_and_tags(self):
template = ("/path/to/{created} - "
"{title} - "
"{tags}"
".{extension}")
template = "/path/to/{created} - {title} - {tags}.{extension}"
for created in self.valid_dates:
for title in self.valid_titles:
@@ -354,3 +353,8 @@ class TestFieldPermutations(TestCase):
}
self._test_guessed_attributes(
template.format(**spec), **spec)
def test_invalid_date_format(self):
info = FileInfo.from_path("/path/to/06112017Z - title.pdf")
self.assertEqual(info.title, "title")
self.assertIsNone(info.created)
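
`test_invalid_date_format` exercises the new `try`/`except ValueError` in `FileInfo._get_created`: a malformed date prefix should yield `None` instead of aborting the parse. The sketch below shows the same idea using only the standard library. It swaps `dateutil.parser.parse` for `datetime.strptime` with a fixed `YYYYMMDDHHMMSS` layout, which is an assumption and stricter than the real parser; `parse_created` is a hypothetical name.

```python
from datetime import datetime, timezone


def parse_created(created):
    """Parse a 'YYYYMMDD[HHMMSS]Z' prefix, returning None when invalid."""
    try:
        # Drop the trailing "Z" and right-pad with zeros to 14 digits,
        # mirroring the "{:0<14}" formatting used in the diff.
        padded = "{:0<14}".format(created[:-1])
        parsed = datetime.strptime(padded, "%Y%m%d%H%M%S")
        return parsed.replace(tzinfo=timezone.utc)
    except ValueError:
        # Not a valid date: report "unknown" instead of raising.
        return None
```

With this behaviour a file named like the one in the test still yields a title, and only the creation date is left unset.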


@@ -1,5 +1,7 @@
from random import randint
from django.contrib.admin.models import LogEntry
from django.contrib.auth.models import User
from django.test import TestCase, override_settings
from ..models import Correspondent, Document, Tag
@@ -208,6 +210,7 @@ class TestDocumentConsumptionFinishedSignal(TestCase):
def setUp(self):
TestCase.setUp(self)
User.objects.create_user(username='test_consumer', password='12345')
self.doc_contains = Document.objects.create(
content="I contain the keyword.", file_type="pdf")
@@ -244,3 +247,9 @@ class TestDocumentConsumptionFinishedSignal(TestCase):
document_consumption_finished.send(
sender=self.__class__, document=self.doc_contains)
self.assertEqual(self.doc_contains.correspondent, None)
def test_logentry_created(self):
document_consumption_finished.send(
sender=self.__class__, document=self.doc_contains)
self.assertEqual(LogEntry.objects.count(), 1)


@@ -52,12 +52,12 @@ class FetchView(SessionOrBasicAuthMixin, DetailView):
if self.kwargs["kind"] == "thumb":
return HttpResponse(
GnuPG.decrypted(self.object.thumbnail_file),
self._get_raw_data(self.object.thumbnail_file),
content_type=content_types[Document.TYPE_PNG]
)
response = HttpResponse(
GnuPG.decrypted(self.object.source_file),
self._get_raw_data(self.object.source_file),
content_type=content_types[self.object.file_type]
)
response["Content-Disposition"] = 'attachment; filename="{}"'.format(
@@ -65,6 +65,11 @@ class FetchView(SessionOrBasicAuthMixin, DetailView):
return response
def _get_raw_data(self, file_handle):
if self.object.storage_type == Document.STORAGE_TYPE_UNENCRYPTED:
return file_handle
return GnuPG.decrypted(file_handle)
class PushView(SessionOrBasicAuthMixin, FormView):
"""


@@ -3,16 +3,9 @@ import os
import sys
if __name__ == "__main__":
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "paperless.settings")
from django.conf import settings
from django.core.management import execute_from_command_line
# The runserver and consumer need to have access to the passphrase, so it
# must be entered at start time to keep it safe.
if "runserver" in sys.argv or "document_consumer" in sys.argv:
if not settings.PASSPHRASE:
settings.PASSPHRASE = input(
"settings.PASSPHRASE is unset. Input passphrase: ")
execute_from_command_line(sys.argv)


@@ -2,7 +2,7 @@ import os
import shutil
from django.conf import settings
from django.core.checks import Error, register, Warning
from django.core.checks import Error, Warning, register
@register()
@@ -84,20 +84,3 @@ def binaries_check(app_configs, **kwargs):
check_messages.append(Warning(error.format(binary), hint))
return check_messages
@register()
def config_check(app_configs, **kwargs):
warning = (
"It looks like you have PAPERLESS_SHARED_SECRET defined. Note that "
"in the \npast, this variable was used for both API authentication "
"and as the mail \nkeyword. As the API no no longer uses it, this "
"variable has been renamed to \nPAPERLESS_EMAIL_SECRET, so if you're "
"using the mail feature, you'd best update \nyour variable name.\n\n"
"The old variable will stop working in a few months."
)
if os.getenv("PAPERLESS_SHARED_SECRET"):
return [Warning(warning)]
return []


@@ -3,7 +3,7 @@ import gnupg
from django.conf import settings
class GnuPG(object):
class GnuPG:
"""
A handy singleton to use when handling encrypted files.
"""
@@ -11,15 +11,22 @@ class GnuPG(object):
gpg = gnupg.GPG(gnupghome=settings.GNUPG_HOME)
@classmethod
def decrypted(cls, file_handle):
return cls.gpg.decrypt_file(
file_handle, passphrase=settings.PASSPHRASE).data
def decrypted(cls, file_handle, passphrase=None):
if not passphrase:
passphrase = settings.PASSPHRASE
return cls.gpg.decrypt_file(file_handle, passphrase=passphrase).data
@classmethod
def encrypted(cls, file_handle):
def encrypted(cls, file_handle, passphrase=None):
if not passphrase:
passphrase = settings.PASSPHRASE
return cls.gpg.encrypt_file(
file_handle,
recipients=None,
passphrase=settings.PASSPHRASE,
passphrase=passphrase,
symmetric=True
).data


@@ -18,6 +18,8 @@ from dotenv import load_dotenv
# Tap paperless.conf if it's available
if os.path.exists("/etc/paperless.conf"):
load_dotenv("/etc/paperless.conf")
elif os.path.exists("/usr/local/etc/paperless.conf"):
load_dotenv("/usr/local/etc/paperless.conf")
# Build paths inside the project like this: os.path.join(BASE_DIR, ...)
@@ -39,7 +41,7 @@ SECRET_KEY = os.getenv(
# SECURITY WARNING: don't run with debug turned on in production!
DEBUG = True
LOGIN_URL = '/admin/login'
LOGIN_URL = "admin:login"
ALLOWED_HOSTS = ["*"]
@@ -90,11 +92,11 @@ MIDDLEWARE_CLASSES = [
'django.middleware.clickjacking.XFrameOptionsMiddleware',
]
#If AUTH is disabled, we just use our "bypass" authentication middleware
if bool(os.getenv("PAPERLESS_DISABLE_LOGIN", "false").lower() in ("yes", "y", "1", "t", "true")):
_index = MIDDLEWARE_CLASSES.index('django.contrib.auth.middleware.AuthenticationMiddleware')
MIDDLEWARE_CLASSES[_index] = 'paperless.middleware.Middleware'
MIDDLEWARE_CLASSES.remove('django.contrib.auth.middleware.SessionAuthenticationMiddleware')
# If auth is disabled, we just use our "bypass" authentication middleware
if bool(os.getenv("PAPERLESS_DISABLE_LOGIN", "false").lower() in ("yes", "y", "1", "t", "true")):
_index = MIDDLEWARE_CLASSES.index("django.contrib.auth.middleware.AuthenticationMiddleware")
MIDDLEWARE_CLASSES[_index] = "paperless.middleware.Middleware"
MIDDLEWARE_CLASSES.remove("django.contrib.auth.middleware.SessionAuthenticationMiddleware")
ROOT_URLCONF = 'paperless.urls'
@@ -183,8 +185,8 @@ STATIC_ROOT = os.getenv(
MEDIA_ROOT = os.getenv(
"PAPERLESS_MEDIADIR", os.path.join(BASE_DIR, "..", "media"))
STATIC_URL = '/static/'
MEDIA_URL = "/media/"
STATIC_URL = os.getenv("PAPERLESS_STATIC_URL", "/static/")
MEDIA_URL = os.getenv("PAPERLESS_MEDIA_URL", "/media/")
# Paperless-specific stuff
@@ -219,12 +221,12 @@ OCR_LANGUAGE = os.getenv("PAPERLESS_OCR_LANGUAGE", "eng")
OCR_THREADS = os.getenv("PAPERLESS_OCR_THREADS")
# OCR all documents?
OCR_ALWAYS = bool(os.getenv("PAPERLESS_OCR_ALWAYS", "NO").lower() in ("yes", "y", "1", "t", "true"))
OCR_ALWAYS = bool(os.getenv("PAPERLESS_OCR_ALWAYS", "NO").lower() in ("yes", "y", "1", "t", "true")) # NOQA
# If this is true, any failed attempts to OCR a PDF will result in the PDF
# being indexed anyway, with whatever we could get. If it's False, the file
# will simply be left in the CONSUMPTION_DIR.
FORGIVING_OCR = bool(os.getenv("PAPERLESS_FORGIVING_OCR", "YES").lower() in ("yes", "y", "1", "t", "true"))
FORGIVING_OCR = bool(os.getenv("PAPERLESS_FORGIVING_OCR", "YES").lower() in ("yes", "y", "1", "t", "true")) # NOQA
# GNUPG needs a home directory for some reason
GNUPG_HOME = os.getenv("HOME", "/tmp")
@@ -244,18 +246,24 @@ SCRATCH_DIR = os.getenv("PAPERLESS_SCRATCH_DIR", "/tmp/paperless")
# This is where Paperless will look for PDFs to index
CONSUMPTION_DIR = os.getenv("PAPERLESS_CONSUMPTION_DIR")
# (This setting is ignored on Linux where inotify is used instead of a
# polling loop.)
# The number of seconds that Paperless will wait between checking
# CONSUMPTION_DIR. If you tend to write documents to this directory very
# slowly, you may want to use a higher value than the default.
CONSUMER_LOOP_TIME = int(os.getenv("PAPERLESS_CONSUMER_LOOP_TIME", 10))
# This is used to encrypt the original documents and decrypt them later when
# you want to download them. Set it and change the permissions on this file to
# 0600, or set it to `None` and you'll be prompted for the passphrase at
# runtime. The default looks for an environment variable.
# DON'T FORGET TO SET THIS as leaving it blank may cause some strange things
# with GPG, including an interesting case where it may "encrypt" zero-byte
# files.
# Pre-2.x versions of Paperless stored your documents locally with GPG
# encryption, but that is no longer the default. This behaviour is still
# available, but it must be explicitly enabled by setting
# `PAPERLESS_PASSPHRASE` in your environment or config file. The default is to
# store these files unencrypted.
#
# Translation:
# * If you're a new user, you can safely ignore this setting.
# * If you're upgrading from 1.x, this must be set, OR you can run
# `./manage.py change_storage_type gpg unencrypted` to decrypt your files,
# after which you can unset this value.
PASSPHRASE = os.getenv("PAPERLESS_PASSPHRASE")
# Trigger a script after every successful document consumption?
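
Several settings in this file repeat the same expression to turn an environment variable into a boolean. A possible way to factor that out is sketched here. `env_flag` and its `environ` parameter are hypothetical and not part of the diff; the accepted truthy strings are the ones the settings file already uses.

```python
import os

# Strings treated as "true", copied from the expression in the diff.
TRUTHY = ("yes", "y", "1", "t", "true")


def env_flag(name, default="false", environ=None):
    """Interpret the environment variable `name` as a boolean."""
    # `environ` can be overridden for testing; defaults to the real one.
    environ = os.environ if environ is None else environ
    return environ.get(name, default).lower() in TRUTHY
```

Under that assumption, a line such as the `OCR_ALWAYS` assignment could be written as `env_flag("PAPERLESS_OCR_ALWAYS", "NO")`, which would also bring it under the line-length limit without a `# NOQA` marker.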


@@ -1,6 +1,7 @@
from django.conf import settings
from django.conf.urls import include, static, url
from django.contrib import admin
from django.urls import reverse_lazy
from django.views.decorators.csrf import csrf_exempt
from django.views.generic import RedirectView
from rest_framework.routers import DefaultRouter
@@ -45,7 +46,8 @@ urlpatterns = [
url(r"admin/", admin.site.urls),
# Redirect / to /admin
url(r"^$", RedirectView.as_view(permanent=True, url="/admin/")),
url(r"^$", RedirectView.as_view(
permanent=True, url=reverse_lazy("admin:index"))),
] + static.static(settings.MEDIA_URL, document_root=settings.MEDIA_ROOT)


@@ -242,7 +242,7 @@ class RasterisedDocumentParser(DocumentParser):
break
if date is not None:
self.log("info", "Detected document date " + date.strftime("%x") +
self.log("info", "Detected document date " + date.isoformat() +
" based on string " + datestring)
else:
self.log("info", "Unable to detect date for document")
@@ -285,7 +285,7 @@ def image_to_string(args):
try:
orientation = ocr.detect_orientation(f, lang=lang)
f = f.rotate(orientation["angle"], expand=1)
except (TesseractError, OtherTesseractError):
except (TesseractError, OtherTesseractError, AttributeError):
pass
return ocr.image_to_string(f, lang=lang)
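
The first hunk in this file switches the log message from `strftime("%x")`, whose output depends on the process locale, to `isoformat()`, which does not. A small illustration of the resulting message, assuming a plain `datetime.date` value (the parser may well carry a full `datetime`, in which case `isoformat()` would also include the time) and a hypothetical helper name:

```python
from datetime import date


def describe_detected_date(detected, datestring):
    """Build the log message using a locale-independent date format."""
    return (
        "Detected document date " + detected.isoformat() +
        " based on string " + datestring
    )
```

For `date(2018, 6, 17)` the ISO form is `2018-06-17` regardless of locale settings, which keeps log lines comparable across installations.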


@@ -1,8 +0,0 @@
[pytest]
DJANGO_SETTINGS_MODULE=paperless.settings
addopts = --pythonwarnings=all
env =
PAPERLESS_CONSUME=/tmp
PAPERLESS_PASSPHRASE=THISISNOTASECRET
PAPERLESS_SECRET=paperless
PAPERLESS_EMAIL_SECRET=paperless

18
src/setup.cfg Normal file

@@ -0,0 +1,18 @@
[pycodestyle]
exclude = migrations, paperless/settings.py, .tox
[tool:pytest]
DJANGO_SETTINGS_MODULE=paperless.settings
addopts = --pythonwarnings=all -n auto
env =
PAPERLESS_PASSPHRASE=THISISNOTASECRET
PAPERLESS_SECRET=paperless
PAPERLESS_EMAIL_SECRET=paperless
[coverage:run]
source =
./
omit =
*/tests


@@ -5,7 +5,7 @@
[tox]
skipsdist = True
envlist = py34, py35, py36, pycodestyle
envlist = py34, py35, py36, pycodestyle, doc
[testenv]
commands = pytest
@@ -15,8 +15,8 @@ deps = -r{toxinidir}/../requirements.txt
commands=pycodestyle
deps=pycodestyle
[pycodestyle]
exclude=
.tox,
migrations,
paperless/settings.py
[testenv:doc]
deps =
-r{toxinidir}/../requirements.txt
sphinx
commands=sphinx-build -b html ../docs ../docs/_build -W