mirror of https://github.com/paperless-ngx/paperless-ngx.git (synced 2025-04-02 13:45:10 -05:00)
Merge pull request #222 from tido-/master
little changes to reflect as much as possible
commit 3477b96d87

README.rst (27 changed lines)
@@ -6,7 +6,7 @@ Paperless
 |Travis|
 |Dependencies|
 
-Scan, index, and archive all of your paper documents
+Index and archive all of your scanned paper documents
 
 I hate paper. Environmental issues aside, it's a tech person's nightmare:
 
@@ -23,6 +23,8 @@ it... because paper. I wrote this to make my life easier.
 How it Works
 ============
 
+Paperless does not control your scanner, it only helps you deal with what your scanner produces
+
 1. Buy a document scanner like `this one`_ (used by me) or `this other one`_
 recommended by another user.
 2. Set it up to "scan to FTP" or something similar. It should be able to push
@@ -30,7 +32,7 @@ How it Works
 scanner doesn't know how to automatically upload the file somewhere, you can
 always do that manually. Paperless doesn't care how the documents get into
 its local consumption directory.
-3. Have the target server run the Paperless consumption script to OCR the PDF
+3. Have the target server run the Paperless consumption script to OCR the file
 and index it into a local database.
 4. Use the web frontend to sift through the database and find what you want.
 5. Download the PDF you need/want via the web interface and do whatever you
@@ -48,9 +50,8 @@ Stability
 =========
 
 Paperless is still under active development (just look at the git commit
-history) so don't expect it to be 100% stable. I'm using it for my own
-documents, but I'm crazy like that. If you use this and it breaks something,
-you get to keep all the shiny pieces.
+history) so don't expect it to be 100% stable. You can backup the sqlite3
+database, media directory and your configuration file to be on the safe side.
 
 
 Requirements
@@ -83,22 +84,22 @@ Similar Projects
 
 There's another project out there called `Mayan EDMS`_ that has a surprising
 amount of technical overlap with Paperless. Also based on Django and using
-a consumer model with Tesseract and unpaper, Mayan EDMS is *much* more
-featureful and comes with a slick UI as well. It may be that Paperless is
-better suited for low-resource environments (like a Rasberry Pi), but to be
-honest, this is just a guess as I haven't tested this myself. One thing's
-for certain though, *Paperless* is a **much** better name.
+a consumer model with Tesseract and Unpaper, Mayan EDMS is *much* more
+featureful and comes with a slick UI as well, but still in Python 2. It may be
+that Paperless consumes fewer resources, but to be honest, this is just a guess
+as I haven't tested this myself. One thing's for certain though, *Paperless*
+is a **much** better name.
 
 
 Important Note
 ==============
 
 Document scanners are typically used to scan sensitive documents. Things like
-your social insurance number, tax records, invoices, etc. While paperless
-encrypts the original PDFs via the consumption script, the OCR'd text is *not*
+your social insurance number, tax records, invoices, etc. While Paperless
+encrypts the original files via the consumption script, the OCR'd text is *not*
 encrypted and is therefore stored in the clear (it needs to be searchable, so
 if someone has ideas on how to do that on encrypted data, I'm all ears). This
-means that paperless should never be run on an untrusted host. Instead, I
+means that Paperless should never be run on an untrusted host. Instead, I
 recommend that if you do want to use it, run it locally on a server in your own
 home.
 

@@ -3,7 +3,11 @@
 Paperless
 =========
 
-Scan, index, and archive all of your paper documents. Say goodbye to paper.
+Paperless is a simple Django application running in two parts:
+a :ref:`consumer <utilities-consumer>` (the thing that does the indexing) and
+the :ref:`webserver <utilities-webserver>` (the part that lets you search & download
+already-indexed documents). If you want to learn more about its functions keep on
+reading after the installation section.
 
 
 .. _index-why-this-exists:
@@ -15,10 +19,11 @@ Paper is a nightmare. Environmental issues aside, there's no excuse for it in
 the 21st century. It takes up space, collects dust, doesn't support any form of
 a search feature, indexing is tedious, it's heavy and prone to damage & loss.
 
-I wrote this to make "going paperless" easier. I wanted to be able to feed
-documents right from the post box into the scanner and then shred them so I
-never have to worry about finding stuff again. Perhaps you might find it useful
-too.
+I wrote this to make "going paperless" easier. I do not have to worry about
+finding stuff again. I feed documents right from the post box into the scanner and
+then shred them. Perhaps you might find it useful too.
+
+
 
 
 Contents

@@ -4,7 +4,7 @@ Requirements
 ============
 
 You need a Linux machine or Unix-like setup (theoretically an Apple machine
-should work) that has the following software installed on it:
+should work) that has the following software installed:
 
 * `Python3`_ (with development libraries, pip and virtualenv)
 * `GNU Privacy Guard`_
@@ -21,14 +21,14 @@ should work) that has the following software installed on it:
 Notably, you should confirm how you access your Python3 installation. Many
 Linux distributions will install Python3 in parallel to Python2, using the names
 ``python3`` and ``python`` respectively. The same goes for ``pip3`` and
-``pip``. Using Python2 will likely break things, so make sure that you're using
-the right version.
+``pip``. Running Paperless with Python2 will likely break things, so make sure that
+you're using the right version.
 
 For the purposes of simplicity, ``python`` and ``pip`` is used everywhere to
-refer to their Python 3 versions.
+refer to their Python3 versions.
 
 In addition to the above, there are a number of Python requirements, all of
-which are listed in a file called ``requirements.txt`` in the project root.
+which are listed in a file called ``requirements.txt`` in the project root directory.
 
 If you're not working on a virtual environment (like Vagrant or Docker), you
 should probably be using a virtualenv, but that's your call. The reasons why
@@ -67,7 +67,7 @@ dependencies is easy:
 
 $ pip install --user --requirement /path/to/paperless/requirements.txt
 
-This should download and install all of the requirements into
+This will download and install all of the requirements into
 ``${HOME}/.local``. Remember that your distribution may be using ``pip3`` as
 mentioned above.
 
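The ``python``/``python3`` and ``pip``/``pip3`` ambiguity described in these hunks can be checked before installing anything. The following is a minimal sketch, not part of Paperless: ``pick_python`` is a hypothetical helper name, and the requirements path is the placeholder already used in the docs. It assumes a POSIX shell.

```shell
# Hypothetical helper, not part of Paperless: print the first command on the
# PATH that is a Python 3 interpreter, trying python3 before python.
pick_python() {
    for candidate in python3 python; do
        if command -v "$candidate" >/dev/null 2>&1 &&
            "$candidate" -c 'import sys; sys.exit(0 if sys.version_info[0] == 3 else 1)' 2>/dev/null
        then
            echo "$candidate"
            return 0
        fi
    done
    echo "no Python 3 interpreter found" >&2
    return 1
}

# Calling pip through the chosen interpreter keeps pip and Python on the
# same version, whatever the distribution names the commands.
if PYTHON=$(pick_python); then
    "$PYTHON" --version
    echo "install with: $PYTHON -m pip install --user --requirement /path/to/paperless/requirements.txt"
fi
```

Running pip as ``python -m pip`` sidesteps the question of whether ``pip`` or ``pip3`` belongs to the Python 3 installation.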
@@ -86,8 +86,8 @@ enter it, and install the requirements using the ``requirements.txt`` file:
 $ . /path/to/arbitrary/directory/bin/activate
 $ pip install --requirement /path/to/paperless/requirements.txt
 
-Now you're ready to go. Just remember to enter your virtualenv whenever you
-want to use Paperless.
+Now you're ready to go. Just remember to enter (activate) your virtualenv
+whenever you want to use Paperless.
 
 
 .. _requirements-documentation:
@@ -95,7 +95,7 @@ want to use Paperless.
 Documentation
 -------------
 
-As generation of the documentation is not required for use of Paperless,
+As generation of the documentation is not required for the use of Paperless,
 dependencies for this process are not included in ``requirements.txt``. If
 you'd like to generate your own docs locally, you'll need to:
 

@@ -4,9 +4,8 @@ Setup
 =====
 
 Paperless isn't a very complicated app, but there are a few components, so some
-basic documentation is in order. If you go follow along in this document and
-still have trouble, please open an `issue on GitHub`_ so I can fill in the
-gaps.
+basic documentation is in order. If you follow along in this document and still
+have trouble, please open an `issue on GitHub`_ so I can fill in the gaps.
 
 .. _issue on GitHub: https://github.com/danielquinn/paperless/issues
 
@@ -28,6 +27,7 @@ or just download the tarball and go that route:
 
 .. code:: bash
 
+$ cd to the directory where you want to run Paperless
 $ wget https://github.com/danielquinn/paperless/archive/master.zip
 $ unzip master.zip
 $ cd paperless-master
@@ -42,8 +42,10 @@ You can go multiple routes with setting up and running Paperless. The `Vagrant
 route`_ is quick & easy, but means you're running a VM which comes with memory
 consumption etc. We also `support Docker`_, which you can use natively under
 Linux and in a VM with `Docker Machine`_ (this guide was written for native
-Docker usage under Linux, you might have to adapt it for Docker Machine.)
-Alternatively the standard, `bare metal`_ approach is a little more
+Docker usage under Linux, you might have to adapt it for Docker Machine.)
+Not to forget the virtualenv, this is similar to `bare metal`_ with the exception
+that you have to activate the virtualenv first.
+Last but not least, the standard `bare metal`_ approach is a little more
 complicated, but worth it because it makes it easier should you want to
 contribute some code back.
 
@@ -59,9 +61,11 @@ Standard (Bare Metal)
 .....................
 
 1. Install the requirements as per the :ref:`requirements <requirements>` page.
-2. Change to the ``src`` directory in this repo.
-3. Copy ``paperless.conf.example`` to ``/etc/paperless.conf`` and open it in
-your favourite editor. Set the values for:
+2. Within the extract of master.zip go to the ``src`` directory.
+3. Copy ``paperless.conf.example`` to ``/etc/paperless.conf`` also the virtual
+envrionment look there for it and open it in your favourite editor.
+Because this file contains passwords it should only be readable by user root
+and paperless ! Set the values for:
 
 * ``PAPERLESS_CONSUMPTION_DIR``: this is where your documents will be
 dumped to be consumed by Paperless.
@@ -70,18 +74,18 @@ Standard (Bare Metal)
 * ``PAPERLESS_OCR_THREADS``: this is the number of threads the OCR process
 will spawn to process document pages in parallel.
 
-4. Initialise the database with ``./manage.py migrate``.
+4. Initialise the SQLite database with ``./manage.py migrate``.
 5. Create a user for your Paperless instance with
 ``./manage.py createsuperuser``. Follow the prompts to create your user.
 6. Start the webserver with ``./manage.py runserver <IP>:<PORT>``.
-If no specifc IP or port are given, the default is ``127.0.0.1:8000``.
-You should now be able to visit your (empty) `Paperless webserver`_ at
-``127.0.0.1:8000`` (or whatever you chose). You can login with the
-user/pass you created in #5.
+If no specifc IP or port are given, the default is ``127.0.0.1:8000``
+also known as http://localhost:8000/.
+You should now be able to visit your (empty) at `Paperless webserver`_ or
+whatever you chose before. You can login with the user/pass you created in #5.
 7. In a separate window, change to the ``src`` directory in this repo again,
 but this time, you should start the consumer script with
 ``./manage.py document_consumer``.
-8. Scan something. Put it in the ``CONSUMPTION_DIR``.
+8. Scan something or put a file into the ``CONSUMPTION_DIR``.
 9. Wait a few minutes
 10. Visit the document list on your webserver, and it should be there, indexed
 and downloadable.
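Step 3 in the list above says the configuration file should only be readable by root and paperless, because it contains passwords. One way to get there with ordinary file permissions is sketched below. ``restrict_conf`` is a hypothetical helper, not something Paperless ships, and the ``chown`` line assumes that a ``paperless`` group exists on the machine.

```shell
# Hypothetical helper, not part of Paperless: make a config file readable and
# writable by its owner, readable by its group, and closed to everyone else.
restrict_conf() {
    chmod 640 "$1"
}

# Demonstrate on a scratch file rather than the real /etc/paperless.conf.
scratch_conf=$(mktemp)
restrict_conf "$scratch_conf"
ls -l "$scratch_conf"
rm -f "$scratch_conf"

# On a real install (as root, assuming a "paperless" group exists):
#   chown root:paperless /etc/paperless.conf
#   restrict_conf /etc/paperless.conf
```

With root as owner and paperless as group, mode 640 lets root edit the file and the paperless user read it, while other accounts get no access.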
@@ -299,10 +303,11 @@ Standard (Bare Metal, Systemd)
 
 If you're running on a bare metal system that's using Systemd, you can use the
 service unit files in the ``scripts`` directory to set this up. You'll need to
-create a user called ``paperless`` and setup Paperless to be in a place that
-this new user can read and write to. Be sure to edit the service scripts to point
-to the proper location of your paperless install, referencing the appropriate Python
-binary. For example: ``ExecStart=/path/to/python3 /path/to/paperless/src/manage.py document_consumer``.
+create a user called ``paperless`` (without login (if not already done so #5)) and
+setup Paperless to be in a place that this new user can read and write to. Be sure
+to edit the service scripts to point to the proper location of your paperless install,
+referencing the appropriate Python binary. For example:
+``ExecStart=/path/to/python3 /path/to/paperless/src/manage.py document_consumer``.
 If you don't want to make a new user, you can change the ``Group`` and ``User`` variables
 accordingly.
 
@@ -344,7 +349,7 @@ after restarting your system:
 If you are using a network interface other than ``eth0``, you will have to
 change ``IFACE=eth0``. For example, if you are connected via WiFi, you will
 likely need to replace ``eth0`` above with ``wlan0``. To see all interfaces,
-run ``ifconfig``.
+run ``ifconfig -a``.
 
 Save the file.
 
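The hunk above points to ``ifconfig -a`` for finding the interface name to put in ``IFACE=``. Some Linux systems do not install ``ifconfig``, but the kernel lists the same interface names under ``/sys/class/net``. The sketch below tries that directory first and falls back to the documented command. ``list_interfaces`` is a hypothetical helper name, not part of Paperless.

```shell
# Hypothetical helper, not part of Paperless: list network interface names.
list_interfaces() {
    if [ -d /sys/class/net ]; then
        # On Linux this directory holds one entry per interface
        # (for example lo, eth0 or wlan0).
        ls /sys/class/net
    else
        # Otherwise fall back to the command named in the documentation.
        ifconfig -a
    fi
}

list_interfaces
```

Whichever name matches your active connection is the value to use in place of ``eth0``.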