mirror of https://github.com/paperless-ngx/paperless-ngx.git (synced 2025-11-03 03:16:10 -06:00)

	Merge pull request #222 from tido-/master
little changes to reflect as much as possible

27 README.rst

							@@ -6,7 +6,7 @@ Paperless
 |Travis|
 |Dependencies|
 
-Scan, index, and archive all of your paper documents
+Index and archive all of your scanned paper documents
 
 I hate paper.  Environmental issues aside, it's a tech person's nightmare:
 
@@ -23,6 +23,8 @@ it... because paper.  I wrote this to make my life easier.
 How it Works
 ============
 
+Paperless does not control your scanner, it only helps you deal with what your scanner produces
+
 1. Buy a document scanner like `this one`_ (used by me) or `this other one`_
    recommended by another user.
 2. Set it up to "scan to FTP" or something similar. It should be able to push
@@ -30,7 +32,7 @@ How it Works
    scanner doesn't know how to automatically upload the file somewhere, you can
    always do that manually.  Paperless doesn't care how the documents get into
    its local consumption directory.
-3. Have the target server run the Paperless consumption script to OCR the PDF
+3. Have the target server run the Paperless consumption script to OCR the file
    and index it into a local database.
 4. Use the web frontend to sift through the database and find what you want.
 5. Download the PDF you need/want via the web interface and do whatever you
@@ -48,9 +50,8 @@ Stability
 =========
 
 Paperless is still under active development (just look at the git commit
-history) so don't expect it to be 100% stable.  I'm using it for my own
-documents, but I'm crazy like that.  If you use this and it breaks something,
-you get to keep all the shiny pieces.
+history) so don't expect it to be 100% stable.  You can backup the sqlite3
+database, media directory and your configuration file to be on the safe side.
 
 
 Requirements
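The backup suggested in the new Stability text can be a single tarball of the three items it names. Below is a minimal sketch, assuming a POSIX shell with `tar` available. `backup_paperless` is a hypothetical helper, not something Paperless ships, and all of its paths are placeholders you pass in, because the database and media locations depend on your own configuration.

```shell
# Sketch: bundle the SQLite database, the media directory and the config
# file into one timestamped tarball. Every path is a placeholder argument;
# nothing here is read from Paperless itself.
#   $1  SQLite database file
#   $2  media directory
#   $3  configuration file, e.g. /etc/paperless.conf
#   $4  directory that receives the archive
backup_paperless() (
    db=$1 media=$2 conf=$3 dest=$4
    for path in "$db" "$media" "$conf"; do
        if [ ! -e "$path" ]; then
            echo "backup_paperless: missing $path" >&2
            exit 1
        fi
    done
    mkdir -p "$dest" || exit 1
    archive="$dest/paperless-backup-$(date +%Y%m%d-%H%M%S).tar.gz"
    # tar strips the leading "/" from absolute paths (and may say so on stderr).
    tar -czf "$archive" "$db" "$media" "$conf" || exit 1
    # Print the archive path so callers can log or copy it elsewhere.
    echo "$archive"
)
```

For example: `backup_paperless /path/to/db.sqlite3 /path/to/media /etc/paperless.conf /path/to/backups`. Stopping the consumer and webserver while it runs avoids copying the database in the middle of a write.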
@@ -83,22 +84,22 @@ Similar Projects
 
 There's another project out there called `Mayan EDMS`_ that has a surprising
 amount of technical overlap with Paperless.  Also based on Django and using
-a consumer model with Tesseract and unpaper, Mayan EDMS is *much* more
-featureful and comes with a slick UI as well.  It may be that Paperless is
-better suited for low-resource environments (like a Rasberry Pi), but to be
-honest, this is just a guess as I haven't tested this myself.  One thing's
-for certain though, *Paperless* is a **much** better name.
+a consumer model with Tesseract and Unpaper, Mayan EDMS is *much* more
+featureful and comes with a slick UI as well, but still in Python 2. It may be
+that Paperless consumes fewer resources, but to be honest, this is just a guess
+as I haven't tested this myself.  One thing's for certain though, *Paperless*
+is a **much** better name.
 
 
 Important Note
 ==============
 
 Document scanners are typically used to scan sensitive documents.  Things like
-your social insurance number, tax records, invoices, etc.  While paperless
-encrypts the original PDFs via the consumption script, the OCR'd text is *not*
+your social insurance number, tax records, invoices, etc.  While Paperless
+encrypts the original files via the consumption script, the OCR'd text is *not*
 encrypted and is therefore stored in the clear (it needs to be searchable, so
 if someone has ideas on how to do that on encrypted data, I'm all ears).  This
-means that paperless should never be run on an untrusted host.  Instead, I
+means that Paperless should never be run on an untrusted host.  Instead, I
 recommend that if you do want to use it, run it locally on a server in your own
 home.
 
@@ -3,7 +3,11 @@
 Paperless
 =========
 
-Scan, index, and archive all of your paper documents.  Say goodbye to paper.
+Paperless is a simple Django application running in two parts:
+a :ref:`consumer <utilities-consumer>` (the thing that does the indexing) and
+the :ref:`webserver <utilities-webserver>` (the part that lets you search & download
+already-indexed documents). If you want to learn more about its functions keep on
+reading after the installation section.
 
 
 .. _index-why-this-exists:
@@ -15,10 +19,11 @@ Paper is a nightmare.  Environmental issues aside, there's no excuse for it in
 the 21st century.  It takes up space, collects dust, doesn't support any form of
 a search feature, indexing is tedious, it's heavy and prone to damage & loss.
 
-I wrote this to make "going paperless" easier.  I wanted to be able to feed
-documents right from the post box into the scanner and then shred them so I
-never have to worry about finding stuff again.  Perhaps you might find it useful
-too.
+I wrote this to make "going paperless" easier.  I do not have to worry about
+finding stuff again. I feed documents right from the post box into the scanner and
+then shred them.  Perhaps you might find it useful too.
+
+
 
 
 Contents
@@ -4,7 +4,7 @@ Requirements
 ============
 
 You need a Linux machine or Unix-like setup (theoretically an Apple machine
-should work) that has the following software installed on it:
+should work) that has the following software installed:
 
 * `Python3`_ (with development libraries, pip and virtualenv)
 * `GNU Privacy Guard`_
@@ -21,14 +21,14 @@ should work) that has the following software installed on it:
 Notably, you should confirm how you access your Python3 installation.  Many
 Linux distributions will install Python3 in parallel to Python2, using the names
 ``python3`` and ``python`` respectively.  The same goes for ``pip3`` and
-``pip``.  Using Python2 will likely break things, so make sure that you're using
-the right version.
+``pip``.  Running Paperless with Python2 will likely break things, so make sure that
+you're using the right version.
 
 For the purposes of simplicity, ``python`` and ``pip`` is used everywhere to
-refer to their Python 3 versions.
+refer to their Python3 versions.
 
 In addition to the above, there are a number of Python requirements, all of
-which are listed in a file called ``requirements.txt`` in the project root.
+which are listed in a file called ``requirements.txt`` in the project root directory.
 
 If you're not working on a virtual environment (like Vagrant or Docker), you
 should probably be using a virtualenv, but that's your call.  The reasons why
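To confirm which interpreter `python` actually runs, as the Requirements text asks, a small shell helper can ask the interpreter for its major version. This is a sketch; `is_python3` is a made-up name, not part of Paperless.

```shell
# Sketch: succeed only if the named interpreter exists and is Python 3.
# is_python3 is a hypothetical helper, not something Paperless ships.
is_python3() (
    interpreter=${1:-python}
    command -v "$interpreter" >/dev/null 2>&1 || exit 1
    # This one-liner is valid in both Python 2 and 3, so it can tell them apart.
    "$interpreter" -c 'import sys; sys.exit(0 if sys.version_info[0] == 3 else 1)' 2>/dev/null
)
```

For example, `is_python3 python || echo "use python3 and pip3 instead"`. For pip, `pip --version` also prints the Python version that pip belongs to.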
@@ -67,7 +67,7 @@ dependencies is easy:
 
     $ pip install --user --requirement /path/to/paperless/requirements.txt
 
-This should download and install all of the requirements into
+This will download and install all of the requirements into
 ``${HOME}/.local``.  Remember that your distribution may be using ``pip3`` as
 mentioned above.
 
@@ -86,8 +86,8 @@ enter it, and install the requirements using the ``requirements.txt`` file:
     $ . /path/to/arbitrary/directory/bin/activate
     $ pip install  --requirement /path/to/paperless/requirements.txt
 
-Now you're ready to go.  Just remember to enter your virtualenv whenever you
-want to use Paperless.
+Now you're ready to go.  Just remember to enter (activate) your virtualenv
+whenever you want to use Paperless.
 
 
 .. _requirements-documentation:
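Forgetting to activate the virtualenv is easy, so a small guard can check for it before you start Paperless. The `activate` script of both virtualenv and Python's `venv` module exports `VIRTUAL_ENV`, and this sketch tests for that variable. `require_virtualenv` is a hypothetical name.

```shell
# Sketch: fail with a hint if no virtualenv is active. The activate script
# of virtualenv (and of Python's venv module) exports VIRTUAL_ENV.
# require_virtualenv is a hypothetical helper name.
require_virtualenv() {
    if [ -z "${VIRTUAL_ENV:-}" ]; then
        echo "No virtualenv active; run: . /path/to/arbitrary/directory/bin/activate" >&2
        return 1
    fi
    echo "Using virtualenv at $VIRTUAL_ENV"
}
```

For example: `require_virtualenv && ./manage.py document_consumer`.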
@@ -95,7 +95,7 @@ want to use Paperless.
 Documentation
 -------------
 
-As generation of the documentation is not required for use of Paperless,
+As generation of the documentation is not required for the use of Paperless,
 dependencies for this process are not included in ``requirements.txt``.  If
 you'd like to generate your own docs locally, you'll need to:
 
@@ -4,9 +4,8 @@ Setup
 =====
 
 Paperless isn't a very complicated app, but there are a few components, so some
-basic documentation is in order.  If you go follow along in this document and
-still have trouble, please open an `issue on GitHub`_ so I can fill in the
-gaps.
+basic documentation is in order.  If you follow along in this document and still
+have trouble, please open an `issue on GitHub`_ so I can fill in the gaps.
 
 .. _issue on GitHub: https://github.com/danielquinn/paperless/issues
 
@@ -28,6 +27,7 @@ or just download the tarball and go that route:
 
 .. code:: bash
 
+    $ cd to the directory where you want to run Paperless
     $ wget https://github.com/danielquinn/paperless/archive/master.zip
     $ unzip master.zip
     $ cd paperless-master
@@ -42,8 +42,10 @@ You can go multiple routes with setting up and running Paperless. The `Vagrant
 route`_ is quick & easy, but means you're running a VM which comes with memory
 consumption etc. We also `support Docker`_, which you can use natively under
 Linux and in a VM with `Docker Machine`_ (this guide was written for native
-Docker usage under Linux, you might have to adapt it for Docker Machine.)
-Alternatively the standard, `bare metal`_ approach is a little more
+Docker usage under Linux, you might have to adapt it for Docker Machine.) 
+Not to forget the virtualenv, this is similar to `bare metal`_ with the exception
+that you have to activate the virtualenv first.
+Last but not least, the standard `bare metal`_ approach is a little more
 complicated, but worth it because it makes it easier should you want to
 contribute some code back.
 
@@ -59,9 +61,11 @@ Standard (Bare Metal)
 .....................
 
 1. Install the requirements as per the :ref:`requirements <requirements>` page.
-2. Change to the ``src`` directory in this repo.
-3. Copy ``paperless.conf.example`` to ``/etc/paperless.conf`` and open it in
-   your favourite editor.  Set the values for:
+2. Within the extract of master.zip go to the ``src`` directory.
+3. Copy ``paperless.conf.example`` to ``/etc/paperless.conf`` also the virtual
+   envrionment look there for it and open it in your favourite editor.
+   Because this file contains passwords it should only be readable by user root
+   and paperless !  Set the values for:
 
     * ``PAPERLESS_CONSUMPTION_DIR``: this is where your documents will be
       dumped to be consumed by Paperless.
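One way to meet the new "readable only by root and paperless" requirement for `/etc/paperless.conf` is a root-owned file with group `paperless` and mode `640`. This sketch assumes such a `paperless` group exists. `install_config` is a hypothetical helper, and it has to run as root when the destination is under `/etc`.

```shell
# Sketch: copy the example config into place so that only its owner and one
# group can read it (mode 640). The function name and the choice of a
# "paperless" group are assumptions; run it as root for /etc.
#   $1  source file, e.g. paperless.conf.example
#   $2  destination, e.g. /etc/paperless.conf
#   $3  group allowed to read the file (defaults to "paperless")
install_config() (
    src=$1 dest=$2 group=${3:-paperless}
    # A newly created copy starts without any access for "others".
    umask 027
    cp "$src" "$dest" || exit 1
    chgrp "$group" "$dest" || exit 1
    # Owner read/write, group read-only, nobody else.
    chmod 640 "$dest"
)
```

For example, as root from the `src` directory: `install_config paperless.conf.example /etc/paperless.conf paperless`.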
@@ -70,18 +74,18 @@ Standard (Bare Metal)
     * ``PAPERLESS_OCR_THREADS``: this is the number of threads the OCR process
       will spawn to process document pages in parallel.
 
-4. Initialise the database with ``./manage.py migrate``.
+4. Initialise the SQLite database with ``./manage.py migrate``.
 5. Create a user for your Paperless instance with
    ``./manage.py createsuperuser``. Follow the prompts to create your user.
 6. Start the webserver with ``./manage.py runserver <IP>:<PORT>``.
-   If no specifc IP or port are given, the default is ``127.0.0.1:8000``.
-   You should now be able to visit your (empty) `Paperless webserver`_ at
-   ``127.0.0.1:8000`` (or whatever you chose).  You can login with the
-   user/pass you created in #5.
+   If no specifc IP or port are given, the default is ``127.0.0.1:8000``
+   also known as http://localhost:8000/.
+   You should now be able to visit your (empty) at `Paperless webserver`_ or
+   whatever you chose before.  You can login with the user/pass you created in #5.
 7. In a separate window, change to the ``src`` directory in this repo again,
    but this time, you should start the consumer script with
    ``./manage.py document_consumer``.
-8. Scan something.  Put it in the ``CONSUMPTION_DIR``.
+8. Scan something or put a file into the  ``CONSUMPTION_DIR``.
 9. Wait a few minutes
 10. Visit the document list on your webserver, and it should be there, indexed
     and downloadable.
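Step 8 only requires that a file lands in the consumption directory. This sketch assumes `/etc/paperless.conf` keeps shell-style `PAPERLESS_CONSUMPTION_DIR="..."` lines (check your own copy). It reads the directory back from the config and copies files into it. Both `consumption_dir` and `consume` are hypothetical helper names.

```shell
# Sketch: print the PAPERLESS_CONSUMPTION_DIR value from a config file,
# assuming lines of the form PAPERLESS_CONSUMPTION_DIR="/some/path".
# If the key appears more than once, the last occurrence wins.
consumption_dir() (
    conf=${1:-/etc/paperless.conf}
    sed -n 's/^PAPERLESS_CONSUMPTION_DIR=//p' "$conf" | tail -n 1 | tr -d "\"'"
)

# Copy one or more files into the consumption directory named in the config.
#   $1   config file
#   $2+  files to hand to the consumer
consume() (
    conf=$1
    shift
    dir=$(consumption_dir "$conf")
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
        echo "consumption directory not set or missing: '$dir'" >&2
        exit 1
    fi
    cp -- "$@" "$dir/"
)
```

For example: `consume /etc/paperless.conf ~/scans/invoice.pdf`. After that, wait for the consumer to pick the file up.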
@@ -299,10 +303,11 @@ Standard (Bare Metal, Systemd)
 
 If you're running on a bare metal system that's using Systemd, you can use the
 service unit files in the ``scripts`` directory to set this up.  You'll need to
-create a user called ``paperless`` and setup Paperless to be in a place that
-this new user can read and write to. Be sure to edit the service scripts to point
-to the proper location of your paperless install, referencing the appropriate Python
-binary. For example: ``ExecStart=/path/to/python3 /path/to/paperless/src/manage.py document_consumer``.
+create a user called ``paperless`` (without login (if not already done so #5)) and
+setup Paperless to be in a place that this new user can read and write to. Be sure
+to edit the service  scripts to point to the proper location of your paperless install,
+referencing the appropriate Python binary. For example:
+``ExecStart=/path/to/python3 /path/to/paperless/src/manage.py document_consumer``.
 If you don't want to make a new user, you can change the ``Group`` and ``User`` variables
 accordingly.
 
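The `paperless` user "without login" can be created as a system account whose shell is `nologin`. This sketch uses `useradd` from shadow-utils, where `--system` and `--shell` are real options. The nologin path varies by distribution (`/usr/sbin/nologin` on Debian and Ubuntu, `/sbin/nologin` on some others). `ensure_system_user` is a hypothetical name, and creating the account needs root.

```shell
# Sketch: create a system account without a login shell unless it already
# exists. Needs root to actually create the user; the nologin location
# differs between distributions, so both common paths are tried.
ensure_system_user() (
    name=${1:-paperless}
    if id -u "$name" >/dev/null 2>&1; then
        echo "user $name already exists"
        exit 0
    fi
    nologin=/usr/sbin/nologin
    [ -x "$nologin" ] || nologin=/sbin/nologin
    useradd --system --shell "$nologin" "$name"
)
```

Afterwards, point the `User` and `Group` values in the service files at this account. If you use an existing user instead, change those two values as described above.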
@@ -344,7 +349,7 @@ after restarting your system:
   If you are using a network interface other than ``eth0``, you will have to
   change ``IFACE=eth0``. For example, if you are connected via WiFi, you will
   likely need to replace ``eth0`` above with ``wlan0``. To see all interfaces,
-  run ``ifconfig``.
+  run ``ifconfig -a``.
 
   Save the file.
 
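On Linux, the interface names that `ifconfig -a` shows can also be read from `/sys/class/net`. This helps on systems where `ifconfig` (from net-tools) is not installed. `ip link show` from iproute2 is another option. The sketch below is Linux-only, and `list_interfaces` is a hypothetical name.

```shell
# Sketch (Linux only): print one network interface name per line, e.g. to
# decide what to put in IFACE= instead of eth0.
list_interfaces() {
    for path in /sys/class/net/*; do
        # The glob stays unexpanded if the directory is missing or empty.
        [ -e "$path" ] || continue
        basename "$path"
    done
}
```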