Merge pull request #222 from tido-/master

little changes to reflect as much as possible
Daniel Quinn 2017-05-10 15:25:35 -07:00 committed by GitHub
commit 3477b96d87
4 changed files with 57 additions and 46 deletions

@ -6,7 +6,7 @@ Paperless
|Travis|
|Dependencies|
Scan, index, and archive all of your paper documents
Index and archive all of your scanned paper documents
I hate paper. Environmental issues aside, it's a tech person's nightmare:
@ -23,6 +23,8 @@ it... because paper. I wrote this to make my life easier.
How it Works
============
Paperless does not control your scanner; it only helps you deal with what your scanner produces.
1. Buy a document scanner like `this one`_ (used by me) or `this other one`_
recommended by another user.
2. Set it up to "scan to FTP" or something similar. It should be able to push
@ -30,7 +32,7 @@ How it Works
scanner doesn't know how to automatically upload the file somewhere, you can
always do that manually. Paperless doesn't care how the documents get into
its local consumption directory.
3. Have the target server run the Paperless consumption script to OCR the PDF
3. Have the target server run the Paperless consumption script to OCR the file
and index it into a local database.
4. Use the web frontend to sift through the database and find what you want.
5. Download the PDF you need/want via the web interface and do whatever you
@ -48,9 +50,8 @@ Stability
=========
Paperless is still under active development (just look at the git commit
history) so don't expect it to be 100% stable. I'm using it for my own
documents, but I'm crazy like that. If you use this and it breaks something,
you get to keep all the shiny pieces.
history) so don't expect it to be 100% stable. You can back up the SQLite3
database, media directory and your configuration file to be on the safe side.
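A minimal sketch of such a backup, assuming you know where the three pieces live on your system. The function name ``backup_paperless`` and every path in the comments are hypothetical and not part of Paperless itself:

```shell
#!/bin/sh
# Sketch: archive the pieces of a Paperless install worth backing up.
# All paths are assumptions; adjust them to your own setup.
backup_paperless() {
    db_file=$1      # e.g. /path/to/paperless/data/db.sqlite3 (assumed location)
    media_dir=$2    # e.g. /path/to/paperless/media (assumed location)
    conf_file=$3    # e.g. /etc/paperless.conf
    archive=$4      # e.g. /path/to/backups/paperless.tar.gz

    # Refuse to write a partial backup if one of the inputs is missing.
    for path in "$db_file" "$media_dir" "$conf_file"; do
        [ -e "$path" ] || { echo "missing: $path" >&2; return 1; }
    done

    # One compressed archive containing the database, media and config.
    tar -czf "$archive" "$db_file" "$media_dir" "$conf_file"
}

# Usage:
#   backup_paperless /path/to/db.sqlite3 /path/to/media /etc/paperless.conf /path/to/backup.tar.gz
```

Copying a SQLite file while something is writing to it can produce an inconsistent copy, so it is probably safest to stop the consumer and webserver first.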
Requirements
@ -83,22 +84,22 @@ Similar Projects
There's another project out there called `Mayan EDMS`_ that has a surprising
amount of technical overlap with Paperless. Also based on Django and using
a consumer model with Tesseract and unpaper, Mayan EDMS is *much* more
featureful and comes with a slick UI as well. It may be that Paperless is
better suited for low-resource environments (like a Raspberry Pi), but to be
honest, this is just a guess as I haven't tested this myself. One thing's
for certain though, *Paperless* is a **much** better name.
a consumer model with Tesseract and Unpaper, Mayan EDMS is *much* more
featureful and comes with a slick UI as well, but still runs on Python 2. It may be
that Paperless consumes fewer resources, but to be honest, this is just a guess
as I haven't tested this myself. One thing's for certain though, *Paperless*
is a **much** better name.
Important Note
==============
Document scanners are typically used to scan sensitive documents. Things like
your social insurance number, tax records, invoices, etc. While paperless
encrypts the original PDFs via the consumption script, the OCR'd text is *not*
your social insurance number, tax records, invoices, etc. While Paperless
encrypts the original files via the consumption script, the OCR'd text is *not*
encrypted and is therefore stored in the clear (it needs to be searchable, so
if someone has ideas on how to do that on encrypted data, I'm all ears). This
means that paperless should never be run on an untrusted host. Instead, I
means that Paperless should never be run on an untrusted host. Instead, I
recommend that if you do want to use it, run it locally on a server in your own
home.

@ -3,7 +3,11 @@
Paperless
=========
Scan, index, and archive all of your paper documents. Say goodbye to paper.
Paperless is a simple Django application running in two parts:
a :ref:`consumer <utilities-consumer>` (the thing that does the indexing) and
the :ref:`webserver <utilities-webserver>` (the part that lets you search & download
already-indexed documents). If you want to learn more about its functions keep on
reading after the installation section.
.. _index-why-this-exists:
@ -15,10 +19,11 @@ Paper is a nightmare. Environmental issues aside, there's no excuse for it in
the 21st century. It takes up space, collects dust, doesn't support any form of
a search feature, indexing is tedious, it's heavy and prone to damage & loss.
I wrote this to make "going paperless" easier. I wanted to be able to feed
documents right from the post box into the scanner and then shred them so I
never have to worry about finding stuff again. Perhaps you might find it useful
too.
I wrote this to make "going paperless" easier. I do not have to worry about
finding stuff again. I feed documents right from the post box into the scanner and
then shred them. Perhaps you might find it useful too.
Contents

@ -4,7 +4,7 @@ Requirements
============
You need a Linux machine or Unix-like setup (theoretically an Apple machine
should work) that has the following software installed on it:
should work) that has the following software installed:
* `Python3`_ (with development libraries, pip and virtualenv)
* `GNU Privacy Guard`_
@ -21,14 +21,14 @@ should work) that has the following software installed on it:
Notably, you should confirm how you access your Python3 installation. Many
Linux distributions will install Python3 in parallel to Python2, using the names
``python3`` and ``python`` respectively. The same goes for ``pip3`` and
``pip``. Using Python2 will likely break things, so make sure that you're using
the right version.
``pip``. Running Paperless with Python2 will likely break things, so make sure that
you're using the right version.
For the purposes of simplicity, ``python`` and ``pip`` are used everywhere to
refer to their Python 3 versions.
refer to their Python3 versions.
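One way to confirm which major version a given command name refers to is to ask the interpreter itself. This is only a sketch; the helper name ``python_major`` is made up for illustration:

```shell
#!/bin/sh
# Sketch: print the major version of whatever interpreter a name resolves to.
python_major() {
    # $1 is the command name to check, e.g. python or python3.
    "$1" -c 'import sys; print(sys.version_info[0])'
}

# Usage:
#   python_major python     # may report 2 where Python2 is still the default
#   python_major python3
```

If ``python`` reports 2 on your system, substitute ``python3`` and ``pip3`` wherever this documentation says ``python`` and ``pip``.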
In addition to the above, there are a number of Python requirements, all of
which are listed in a file called ``requirements.txt`` in the project root.
which are listed in a file called ``requirements.txt`` in the project root directory.
If you're not working on a virtual environment (like Vagrant or Docker), you
should probably be using a virtualenv, but that's your call. The reasons why
@ -67,7 +67,7 @@ dependencies is easy:
$ pip install --user --requirement /path/to/paperless/requirements.txt
This should download and install all of the requirements into
This will download and install all of the requirements into
``${HOME}/.local``. Remember that your distribution may be using ``pip3`` as
mentioned above.
@ -86,8 +86,8 @@ enter it, and install the requirements using the ``requirements.txt`` file:
$ . /path/to/arbitrary/directory/bin/activate
$ pip install --requirement /path/to/paperless/requirements.txt
Now you're ready to go. Just remember to enter your virtualenv whenever you
want to use Paperless.
Now you're ready to go. Just remember to enter (activate) your virtualenv
whenever you want to use Paperless.
.. _requirements-documentation:
@ -95,7 +95,7 @@ want to use Paperless.
Documentation
-------------
As generation of the documentation is not required for use of Paperless,
As generation of the documentation is not required for the use of Paperless,
dependencies for this process are not included in ``requirements.txt``. If
you'd like to generate your own docs locally, you'll need to:

@ -4,9 +4,8 @@ Setup
=====
Paperless isn't a very complicated app, but there are a few components, so some
basic documentation is in order. If you go follow along in this document and
still have trouble, please open an `issue on GitHub`_ so I can fill in the
gaps.
basic documentation is in order. If you follow along in this document and still
have trouble, please open an `issue on GitHub`_ so I can fill in the gaps.
.. _issue on GitHub: https://github.com/danielquinn/paperless/issues
@ -28,6 +27,7 @@ or just download the tarball and go that route:
.. code:: bash
$ cd /path/to/directory/where/you/want/to/run/paperless
$ wget https://github.com/danielquinn/paperless/archive/master.zip
$ unzip master.zip
$ cd paperless-master
@ -42,8 +42,10 @@ You can go multiple routes with setting up and running Paperless. The `Vagrant
route`_ is quick & easy, but means you're running a VM which comes with memory
consumption etc. We also `support Docker`_, which you can use natively under
Linux and in a VM with `Docker Machine`_ (this guide was written for native
Docker usage under Linux, you might have to adapt it for Docker Machine.)
Alternatively the standard, `bare metal`_ approach is a little more
Docker usage under Linux, you might have to adapt it for Docker Machine.)
There is also the virtualenv route, which is similar to `bare metal`_ except
that you have to activate the virtualenv first.
Last but not least, the standard `bare metal`_ approach is a little more
complicated, but worth it because it makes it easier should you want to
contribute some code back.
@ -59,9 +61,11 @@ Standard (Bare Metal)
.....................
1. Install the requirements as per the :ref:`requirements <requirements>` page.
2. Change to the ``src`` directory in this repo.
3. Copy ``paperless.conf.example`` to ``/etc/paperless.conf`` and open it in
your favourite editor. Set the values for:
2. In the directory extracted from ``master.zip``, change to the ``src`` directory.
3. Copy ``paperless.conf.example`` to ``/etc/paperless.conf`` (a virtual
environment install looks for it there as well) and open it in your favourite
editor. Because this file contains passwords, it should only be readable by
the users root and paperless. Set the values for:
* ``PAPERLESS_CONSUMPTION_DIR``: this is where your documents will be
dumped to be consumed by Paperless.
@ -70,18 +74,18 @@ Standard (Bare Metal)
* ``PAPERLESS_OCR_THREADS``: this is the number of threads the OCR process
will spawn to process document pages in parallel.
4. Initialise the database with ``./manage.py migrate``.
4. Initialise the SQLite database with ``./manage.py migrate``.
5. Create a user for your Paperless instance with
``./manage.py createsuperuser``. Follow the prompts to create your user.
6. Start the webserver with ``./manage.py runserver <IP>:<PORT>``.
If no specific IP or port is given, the default is ``127.0.0.1:8000``.
You should now be able to visit your (empty) `Paperless webserver`_ at
``127.0.0.1:8000`` (or whatever you chose). You can login with the
user/pass you created in #5.
If no specific IP or port is given, the default is ``127.0.0.1:8000``,
also known as http://localhost:8000/.
You should now be able to visit your (empty) `Paperless webserver`_ at that
address (or whatever you chose before). You can login with the user/pass you created in #5.
7. In a separate window, change to the ``src`` directory in this repo again,
but this time, you should start the consumer script with
``./manage.py document_consumer``.
8. Scan something. Put it in the ``CONSUMPTION_DIR``.
8. Scan something or put a file into the ``CONSUMPTION_DIR``.
9. Wait a few minutes.
10. Visit the document list on your webserver, and it should be there, indexed
and downloadable.
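Step 3 asks that ``/etc/paperless.conf`` be readable only by root and paperless. A minimal sketch of one way to do that, assuming a ``paperless`` group exists on your system (the helper name ``restrict_conf`` is hypothetical):

```shell
#!/bin/sh
# Sketch: tighten permissions on the Paperless configuration file.
restrict_conf() {
    conf_file=$1    # e.g. /etc/paperless.conf
    # Owner may read and write, group may read, everyone else gets nothing.
    chmod 640 "$conf_file"
}

# Usage (as root), assuming a "paperless" group exists:
#   chown root:paperless /etc/paperless.conf
#   restrict_conf /etc/paperless.conf
```

With ownership ``root:paperless`` and mode ``640``, root can edit the file, members of the ``paperless`` group can read it, and no other account can open it.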
@ -299,10 +303,11 @@ Standard (Bare Metal, Systemd)
If you're running on a bare metal system that's using Systemd, you can use the
service unit files in the ``scripts`` directory to set this up. You'll need to
create a user called ``paperless`` and setup Paperless to be in a place that
this new user can read and write to. Be sure to edit the service scripts to point
to the proper location of your paperless install, referencing the appropriate Python
binary. For example: ``ExecStart=/path/to/python3 /path/to/paperless/src/manage.py document_consumer``.
create a user called ``paperless`` (without login, if you have not already done so in #5) and
set up Paperless to be in a place that this new user can read and write to. Be sure
to edit the service scripts to point to the proper location of your Paperless install,
referencing the appropriate Python binary. For example:
``ExecStart=/path/to/python3 /path/to/paperless/src/manage.py document_consumer``.
If you don't want to make a new user, you can change the ``Group`` and ``User`` variables
accordingly.
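As a rough sketch, the ``ExecStart`` line can be assembled from the two paths it depends on. The helper name ``exec_start_line`` and the example paths are assumptions for illustration, not something shipped in the ``scripts`` directory:

```shell
#!/bin/sh
# Sketch: build the ExecStart line for the consumer service unit.
exec_start_line() {
    python_bin=$1     # e.g. the output of: command -v python3
    install_dir=$2    # e.g. /path/to/paperless
    printf 'ExecStart=%s %s/src/manage.py document_consumer\n' \
        "$python_bin" "$install_dir"
}

# Usage:
#   exec_start_line "$(command -v python3)" /path/to/paperless
```

If Paperless is installed in a virtualenv, the Python binary to pass is presumably the one inside that virtualenv's ``bin`` directory rather than the system one.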
@ -344,7 +349,7 @@ after restarting your system:
If you are using a network interface other than ``eth0``, you will have to
change ``IFACE=eth0``. For example, if you are connected via WiFi, you will
likely need to replace ``eth0`` above with ``wlan0``. To see all interfaces,
run ``ifconfig``.
run ``ifconfig -a``.
Save the file.