diff --git a/README.md b/README.md
index 66e22c957..137db21cc 100644
--- a/README.md
+++ b/README.md
@@ -12,15 +12,57 @@ of not having the right document around. Sometimes I recycled a document I
 needed (who keeps water bills for two years?) and other times I just lost
 it... because paper. I wrote this to make my life easier.
 
-Here's how it works:
+## How it Works:
 
 1. Buy a document scanner like [this one](http://welcome.brother.com/sg-en/products-services/scanners/ads-1100w.html).
-2. Set it up to "scan to FTP". This means you can use it without being
-   connected to a running computer. It will just scan the document and save it
-   as a PDF on a server in your house.
-3. Setup a cronjob on that server to use *paperless* to OCR the PDF and index
-   it into a local database.
+2. Set it up to "scan to FTP" or something similar. It should be able to push
+   scanned images to a server without you having to do anything.
+3. Have the target server run the *paperless* consumption script to OCR the PDF
+   and index it into a local database.
 4. Use the web frontend to sift through the database and find what you want.
 5. Download the PDF you need/want via the web interface and do whatever you
    like with it. You can even print it and send it as if it's the original.
    In most cases, no one will care or notice.
+
+
+## Requirements
+
+This is all really quite simple: a shiny, user-friendly wrapper around some very
+powerful tools.
+
+* [ImageMagick](http://imagemagick.org/) converts the images between colour and
+  greyscale.
+* [Tesseract](https://github.com/tesseract-ocr) does the character recognition.
+* [Python 3](https://python.org/) is the language of the project.
+    * [Pillow](https://pypi.python.org/pypi/pillowfight/) converts the PDFs to
+      images.
+    * [PyOCR](https://github.com/jflesch/pyocr) is a slick programmatic wrapper
+      around tesseract.
+    * [Django](https://djangoproject.org/) is the framework this project is
+      written against.
+
+
+## Instructions
+
+1. Check out this repo to somewhere convenient and install the requirements
+   listed here into your environment.
+
+2. Configure `settings.py` and make sure that `CONVERT_BINARY`, `SCRATCH_DIR`,
+   and `CONSUMPTION_DIR` are set to values you'd expect:
+
+    * `CONVERT_BINARY`: The path to `convert`, installed as part of ImageMagick.
+    * `SCRATCH_DIR`: A place for files to be created and destroyed. The default
+      is as good a place as any.
+    * `CONSUMPTION_DIR`: The directory where your scanner will deposit files.
+      Note that the consumption script will import files from here **and then
+      delete them**.
+
+3. Run `python manage.py migrate`. This will create your local database.
+
+4. Run `python manage.py consume`. You may want to do this in a background
+   process like a systemd service or rc script because it runs in an infinite
+   loop.
+
+5. Start the webserver with `python manage.py runserver`.
+
+6. Log into your new toy by visiting `http://localhost:8000/`.
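
Step 2 of the new instructions asks the reader to make sure the three settings hold "values you'd expect". A minimal sketch of what such a check could look like is below. The helper name `check_paperless_settings` is hypothetical and is not part of this change or of *paperless* itself; it only assumes that `CONVERT_BINARY` should point at an executable and that both directories should exist and be writable, since scratch files are created in one and consumed files are deleted from the other.

```python
import os
import shutil


def check_paperless_settings(convert_binary, scratch_dir, consumption_dir):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper for illustration only, not part of paperless.
    """
    problems = []

    # CONVERT_BINARY must resolve to an executable, given either as a full
    # path or as a bare command name that can be found on PATH.
    if shutil.which(convert_binary) is None:
        problems.append(
            "CONVERT_BINARY is not an executable: %s" % convert_binary)

    # Both directories must exist and be writable by the current user.
    for name, path in (("SCRATCH_DIR", scratch_dir),
                       ("CONSUMPTION_DIR", consumption_dir)):
        if not os.path.isdir(path):
            problems.append("%s is not a directory: %s" % (name, path))
        elif not os.access(path, os.W_OK):
            problems.append("%s is not writable: %s" % (name, path))

    return problems
```

It could be called with the values from `settings.py` before step 4, for example `check_paperless_settings(settings.CONVERT_BINARY, settings.SCRATCH_DIR, settings.CONSUMPTION_DIR)`, printing each returned problem. Because the consumption script deletes what it imports, confirming that `CONSUMPTION_DIR` points where you intend is worth doing before the first run.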