Merge pull request #222 from tido-/master

little changes to reflect as much as possible
Daniel Quinn 2017-05-10 15:25:35 -07:00 committed by GitHub
commit 3477b96d87
4 changed files with 57 additions and 46 deletions

@ -6,7 +6,7 @@ Paperless
|Travis| |Travis|
|Dependencies| |Dependencies|
Scan, index, and archive all of your paper documents Index and archive all of your scanned paper documents
I hate paper. Environmental issues aside, it's a tech person's nightmare: I hate paper. Environmental issues aside, it's a tech person's nightmare:
@ -23,6 +23,8 @@ it... because paper. I wrote this to make my life easier.
How it Works How it Works
============ ============
Paperless does not control your scanner; it only helps you deal with what your scanner produces.
1. Buy a document scanner like `this one`_ (used by me) or `this other one`_ 1. Buy a document scanner like `this one`_ (used by me) or `this other one`_
recommended by another user. recommended by another user.
2. Set it up to "scan to FTP" or something similar. It should be able to push 2. Set it up to "scan to FTP" or something similar. It should be able to push
@ -30,7 +32,7 @@ How it Works
scanner doesn't know how to automatically upload the file somewhere, you can scanner doesn't know how to automatically upload the file somewhere, you can
always do that manually. Paperless doesn't care how the documents get into always do that manually. Paperless doesn't care how the documents get into
its local consumption directory. its local consumption directory.
3. Have the target server run the Paperless consumption script to OCR the PDF 3. Have the target server run the Paperless consumption script to OCR the file
and index it into a local database. and index it into a local database.
4. Use the web frontend to sift through the database and find what you want. 4. Use the web frontend to sift through the database and find what you want.
5. Download the PDF you need/want via the web interface and do whatever you 5. Download the PDF you need/want via the web interface and do whatever you
@ -48,9 +50,8 @@ Stability
========= =========
Paperless is still under active development (just look at the git commit Paperless is still under active development (just look at the git commit
history) so don't expect it to be 100% stable. I'm using it for my own history) so don't expect it to be 100% stable. You can back up the sqlite3
documents, but I'm crazy like that. If you use this and it breaks something, database, media directory and your configuration file to be on the safe side.
you get to keep all the shiny pieces.
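The backup suggested in the new wording could be scripted roughly as follows. This is only a sketch: ``backup_paperless`` is a hypothetical helper, and the database and media locations shown in the comments are assumptions, not paths confirmed by the docs. Only ``/etc/paperless.conf`` is taken from the setup page.

```shell
#!/bin/sh
# Hypothetical helper: archive the three things the Stability note mentions.
# Every path passed in is an assumption; adjust them to your own install.
backup_paperless() {
    db_file="$1"      # the sqlite3 database file (location assumed)
    media_dir="$2"    # the media directory (location assumed)
    conf_file="$3"    # e.g. /etc/paperless.conf
    dest_dir="$4"     # directory that receives the archive

    stamp=$(date +%Y%m%d-%H%M%S)
    mkdir -p "$dest_dir" || return 1

    # One timestamped, gzip-compressed tarball containing all three items.
    tar -czf "$dest_dir/paperless-backup-$stamp.tar.gz" \
        "$db_file" "$media_dir" "$conf_file"
}
```

It is probably safest to stop the consumer and the webserver before running it, so the SQLite file is not being written to while it is copied.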
Requirements Requirements
@ -83,22 +84,22 @@ Similar Projects
There's another project out there called `Mayan EDMS`_ that has a surprising There's another project out there called `Mayan EDMS`_ that has a surprising
amount of technical overlap with Paperless. Also based on Django and using amount of technical overlap with Paperless. Also based on Django and using
a consumer model with Tesseract and unpaper, Mayan EDMS is *much* more a consumer model with Tesseract and Unpaper, Mayan EDMS is *much* more
featureful and comes with a slick UI as well. It may be that Paperless is featureful and comes with a slick UI as well, but still in Python 2. It may be
better suited for low-resource environments (like a Raspberry Pi), but to be that Paperless consumes fewer resources, but to be honest, this is just a guess
honest, this is just a guess as I haven't tested this myself. One thing's as I haven't tested this myself. One thing's for certain though, *Paperless*
for certain though, *Paperless* is a **much** better name. is a **much** better name.
Important Note Important Note
============== ==============
Document scanners are typically used to scan sensitive documents. Things like Document scanners are typically used to scan sensitive documents. Things like
your social insurance number, tax records, invoices, etc. While paperless your social insurance number, tax records, invoices, etc. While Paperless
encrypts the original PDFs via the consumption script, the OCR'd text is *not* encrypts the original files via the consumption script, the OCR'd text is *not*
encrypted and is therefore stored in the clear (it needs to be searchable, so encrypted and is therefore stored in the clear (it needs to be searchable, so
if someone has ideas on how to do that on encrypted data, I'm all ears). This if someone has ideas on how to do that on encrypted data, I'm all ears). This
means that paperless should never be run on an untrusted host. Instead, I means that Paperless should never be run on an untrusted host. Instead, I
recommend that if you do want to use it, run it locally on a server in your own recommend that if you do want to use it, run it locally on a server in your own
home. home.

@ -3,7 +3,11 @@
Paperless Paperless
========= =========
Scan, index, and archive all of your paper documents. Say goodbye to paper. Paperless is a simple Django application running in two parts:
a :ref:`consumer <utilities-consumer>` (the thing that does the indexing) and
the :ref:`webserver <utilities-webserver>` (the part that lets you search & download
already-indexed documents). If you want to learn more about its functions keep on
reading after the installation section.
.. _index-why-this-exists: .. _index-why-this-exists:
@ -15,10 +19,11 @@ Paper is a nightmare. Environmental issues aside, there's no excuse for it in
the 21st century. It takes up space, collects dust, doesn't support any form of the 21st century. It takes up space, collects dust, doesn't support any form of
a search feature, indexing is tedious, it's heavy and prone to damage & loss. a search feature, indexing is tedious, it's heavy and prone to damage & loss.
I wrote this to make "going paperless" easier. I wanted to be able to feed I wrote this to make "going paperless" easier. I do not have to worry about
documents right from the post box into the scanner and then shred them so I finding stuff again. I feed documents right from the post box into the scanner and
never have to worry about finding stuff again. Perhaps you might find it useful then shred them. Perhaps you might find it useful too.
too.
Contents Contents

@ -4,7 +4,7 @@ Requirements
============ ============
You need a Linux machine or Unix-like setup (theoretically an Apple machine You need a Linux machine or Unix-like setup (theoretically an Apple machine
should work) that has the following software installed on it: should work) that has the following software installed:
* `Python3`_ (with development libraries, pip and virtualenv) * `Python3`_ (with development libraries, pip and virtualenv)
* `GNU Privacy Guard`_ * `GNU Privacy Guard`_
@ -21,14 +21,14 @@ should work) that has the following software installed on it:
Notably, you should confirm how you access your Python3 installation. Many Notably, you should confirm how you access your Python3 installation. Many
Linux distributions will install Python3 in parallel to Python2, using the names Linux distributions will install Python3 in parallel to Python2, using the names
``python3`` and ``python`` respectively. The same goes for ``pip3`` and ``python3`` and ``python`` respectively. The same goes for ``pip3`` and
``pip``. Using Python2 will likely break things, so make sure that you're using ``pip``. Running Paperless with Python2 will likely break things, so make sure that
the right version. you're using the right version.
For the purposes of simplicity, ``python`` and ``pip`` are used everywhere to For the purposes of simplicity, ``python`` and ``pip`` are used everywhere to
refer to their Python 3 versions. refer to their Python3 versions.
In addition to the above, there are a number of Python requirements, all of In addition to the above, there are a number of Python requirements, all of
which are listed in a file called ``requirements.txt`` in the project root. which are listed in a file called ``requirements.txt`` in the project root directory.
If you're not working on a virtual environment (like Vagrant or Docker), you If you're not working on a virtual environment (like Vagrant or Docker), you
should probably be using a virtualenv, but that's your call. The reasons why should probably be using a virtualenv, but that's your call. The reasons why
@ -67,7 +67,7 @@ dependencies is easy:
$ pip install --user --requirement /path/to/paperless/requirements.txt $ pip install --user --requirement /path/to/paperless/requirements.txt
This should download and install all of the requirements into This will download and install all of the requirements into
``${HOME}/.local``. Remember that your distribution may be using ``pip3`` as ``${HOME}/.local``. Remember that your distribution may be using ``pip3`` as
mentioned above. mentioned above.
@ -86,8 +86,8 @@ enter it, and install the requirements using the ``requirements.txt`` file:
$ . /path/to/arbitrary/directory/bin/activate $ . /path/to/arbitrary/directory/bin/activate
$ pip install --requirement /path/to/paperless/requirements.txt $ pip install --requirement /path/to/paperless/requirements.txt
Now you're ready to go. Just remember to enter your virtualenv whenever you Now you're ready to go. Just remember to enter (activate) your virtualenv
want to use Paperless. whenever you want to use Paperless.
.. _requirements-documentation: .. _requirements-documentation:
@ -95,7 +95,7 @@ want to use Paperless.
Documentation Documentation
------------- -------------
As generation of the documentation is not required for use of Paperless, As generation of the documentation is not required for the use of Paperless,
dependencies for this process are not included in ``requirements.txt``. If dependencies for this process are not included in ``requirements.txt``. If
you'd like to generate your own docs locally, you'll need to: you'd like to generate your own docs locally, you'll need to:

@ -4,9 +4,8 @@ Setup
===== =====
Paperless isn't a very complicated app, but there are a few components, so some Paperless isn't a very complicated app, but there are a few components, so some
basic documentation is in order. If you go follow along in this document and basic documentation is in order. If you follow along in this document and still
still have trouble, please open an `issue on GitHub`_ so I can fill in the have trouble, please open an `issue on GitHub`_ so I can fill in the gaps.
gaps.
.. _issue on GitHub: https://github.com/danielquinn/paperless/issues .. _issue on GitHub: https://github.com/danielquinn/paperless/issues
@ -28,6 +27,7 @@ or just download the tarball and go that route:
.. code:: bash .. code:: bash
$ cd /path/to/directory  # the directory where you want to run Paperless
$ wget https://github.com/danielquinn/paperless/archive/master.zip $ wget https://github.com/danielquinn/paperless/archive/master.zip
$ unzip master.zip $ unzip master.zip
$ cd paperless-master $ cd paperless-master
@ -43,7 +43,9 @@ route`_ is quick & easy, but means you're running a VM which comes with memory
consumption etc. We also `support Docker`_, which you can use natively under consumption etc. We also `support Docker`_, which you can use natively under
Linux and in a VM with `Docker Machine`_ (this guide was written for native Linux and in a VM with `Docker Machine`_ (this guide was written for native
Docker usage under Linux, you might have to adapt it for Docker Machine.) Docker usage under Linux, you might have to adapt it for Docker Machine.)
Alternatively the standard, `bare metal`_ approach is a little more Don't forget the virtualenv option: it is similar to `bare metal`_, with the exception
that you have to activate the virtualenv first.
Last but not least, the standard `bare metal`_ approach is a little more
complicated, but worth it because it makes it easier should you want to complicated, but worth it because it makes it easier should you want to
contribute some code back. contribute some code back.
@ -59,9 +61,11 @@ Standard (Bare Metal)
..................... .....................
1. Install the requirements as per the :ref:`requirements <requirements>` page. 1. Install the requirements as per the :ref:`requirements <requirements>` page.
2. Change to the ``src`` directory in this repo. 2. Within the extracted master.zip, go to the ``src`` directory.
3. Copy ``paperless.conf.example`` to ``/etc/paperless.conf`` and open it in 3. Copy ``paperless.conf.example`` to ``/etc/paperless.conf`` (the virtual
your favourite editor. Set the values for: environment setup looks there for it too) and open it in your favourite editor.
Because this file contains passwords, it should only be readable by the users root
and paperless! Set the values for:
* ``PAPERLESS_CONSUMPTION_DIR``: this is where your documents will be * ``PAPERLESS_CONSUMPTION_DIR``: this is where your documents will be
dumped to be consumed by Paperless. dumped to be consumed by Paperless.
@ -70,18 +74,18 @@ Standard (Bare Metal)
* ``PAPERLESS_OCR_THREADS``: this is the number of threads the OCR process * ``PAPERLESS_OCR_THREADS``: this is the number of threads the OCR process
will spawn to process document pages in parallel. will spawn to process document pages in parallel.
4. Initialise the database with ``./manage.py migrate``. 4. Initialise the SQLite database with ``./manage.py migrate``.
5. Create a user for your Paperless instance with 5. Create a user for your Paperless instance with
``./manage.py createsuperuser``. Follow the prompts to create your user. ``./manage.py createsuperuser``. Follow the prompts to create your user.
6. Start the webserver with ``./manage.py runserver <IP>:<PORT>``. 6. Start the webserver with ``./manage.py runserver <IP>:<PORT>``.
If no specific IP or port is given, the default is ``127.0.0.1:8000``. If no specific IP or port is given, the default is ``127.0.0.1:8000``,
You should now be able to visit your (empty) `Paperless webserver`_ at also known as http://localhost:8000/.
``127.0.0.1:8000`` (or whatever you chose). You can log in with the You should now be able to visit your (empty) `Paperless webserver`_ or
user/pass you created in #5. whatever you chose before. You can log in with the user/pass you created in #5.
7. In a separate window, change to the ``src`` directory in this repo again, 7. In a separate window, change to the ``src`` directory in this repo again,
but this time, you should start the consumer script with but this time, you should start the consumer script with
``./manage.py document_consumer``. ``./manage.py document_consumer``.
8. Scan something. Put it in the ``CONSUMPTION_DIR``. 8. Scan something or put a file into the ``CONSUMPTION_DIR``.
9. Wait a few minutes 9. Wait a few minutes
10. Visit the document list on your webserver, and it should be there, indexed 10. Visit the document list on your webserver, and it should be there, indexed
and downloadable. and downloadable.
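Step 3 above asks that ``/etc/paperless.conf`` not be world-readable because it holds passwords. One way to check this is sketched below. ``conf_is_private`` is a hypothetical helper, not part of Paperless, and it assumes GNU coreutils ``stat`` (the ``-c`` flag), which BSD and macOS do not provide in the same form.

```shell
#!/bin/sh
# Hypothetical check: succeeds (exit 0) when "others" have no access to the file.
# Assumes GNU coreutils stat; BSD/macOS stat uses different flags.
conf_is_private() {
    # %a prints the octal permission bits, e.g. 640 or 644.
    mode=$(stat -c '%a' "$1") || return 2
    # The last octal digit holds the permissions for "others".
    other=$((mode % 10))
    [ "$other" -eq 0 ]
}
```

A typical use would be ``conf_is_private /etc/paperless.conf || echo "too permissive"``. Something like ``chown root:paperless`` followed by ``chmod 640`` on the file is one common way to get there, assuming a ``paperless`` group exists on your system.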
@ -299,10 +303,11 @@ Standard (Bare Metal, Systemd)
If you're running on a bare metal system that's using Systemd, you can use the If you're running on a bare metal system that's using Systemd, you can use the
service unit files in the ``scripts`` directory to set this up. You'll need to service unit files in the ``scripts`` directory to set this up. You'll need to
create a user called ``paperless`` and setup Paperless to be in a place that create a user called ``paperless`` (without login, if not already done in #5) and
this new user can read and write to. Be sure to edit the service scripts to point setup Paperless to be in a place that this new user can read and write to. Be sure
to the proper location of your paperless install, referencing the appropriate Python to edit the service scripts to point to the proper location of your paperless install,
binary. For example: ``ExecStart=/path/to/python3 /path/to/paperless/src/manage.py document_consumer``. referencing the appropriate Python binary. For example:
``ExecStart=/path/to/python3 /path/to/paperless/src/manage.py document_consumer``.
If you don't want to make a new user, you can change the ``Group`` and ``User`` variables If you don't want to make a new user, you can change the ``Group`` and ``User`` variables
accordingly. accordingly.
@ -344,7 +349,7 @@ after restarting your system:
If you are using a network interface other than ``eth0``, you will have to If you are using a network interface other than ``eth0``, you will have to
change ``IFACE=eth0``. For example, if you are connected via WiFi, you will change ``IFACE=eth0``. For example, if you are connected via WiFi, you will
likely need to replace ``eth0`` above with ``wlan0``. To see all interfaces, likely need to replace ``eth0`` above with ``wlan0``. To see all interfaces,
run ``ifconfig``. run ``ifconfig -a``.
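If ``ifconfig`` is not installed, the interface names can also be read from ``/sys/class/net`` on Linux. A minimal sketch follows; ``list_ifaces`` is a hypothetical helper name and the approach is Linux-specific.

```shell
#!/bin/sh
# Hypothetical helper: print one network interface name per line.
# Linux only: relies on the /sys/class/net directory.
list_ifaces() {
    for path in /sys/class/net/*; do
        # Skip the unexpanded glob if the directory is missing or empty.
        if [ -e "$path" ]; then
            basename "$path"
        fi
    done
    return 0
}

list_ifaces
```

Whichever name matches your active connection is the value to use for ``IFACE``.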
Save the file. Save the file.