mirror of
https://github.com/paperless-ngx/paperless-ngx.git
synced 2025-04-02 13:45:10 -05:00
Reformat README to be Docker Hub-friendly
For some reason, Docker Hub doesn't follow the Markdown spec correctly, and inserts `<br />` tags on single newlines, meaning that this file can't use hard wraps.
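The reflow this commit describes can be approximated with a small script. What follows is only a sketch: the commit does not say how the unwrapping was done, and the `unwrap` function and the `BLOCK_START` pattern are hypothetical names introduced here for illustration. It joins hard-wrapped lines so that each paragraph or list item sits on one line, and leaves headings, blank lines and fenced code alone.

```python
import re

# Lines that open a new block and must not be merged into the previous line.
# This is a simplified heuristic (an assumption), not a full Markdown parser.
BLOCK_START = re.compile(r"^\s*(#{1,6}\s|[*+-]\s|\d+\.\s|>|```)")


def unwrap(text: str) -> str:
    """Join hard-wrapped lines so each paragraph or list item is one line."""
    out = []
    in_fence = False
    for line in text.splitlines():
        stripped = line.strip()
        # Fence delimiters toggle code mode and are always kept as-is.
        if stripped.startswith("```"):
            in_fence = not in_fence
            out.append(line)
            continue
        # Code lines and blank lines are copied through untouched.
        if in_fence or not stripped:
            out.append(line)
            continue
        can_join = (
            out
            and out[-1].strip()                          # previous line is not blank
            and not out[-1].strip().startswith("```")    # previous line is not a fence
            and not out[-1].lstrip().startswith("#")     # previous line is not a heading
            and not BLOCK_START.match(line)              # this line does not open a block
        )
        if can_join:
            out[-1] = out[-1].rstrip() + " " + stripped
        else:
            out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    sample = "Paperless is still\nunder development.\n\n## Requirements"
    print(unwrap(sample))
```

A heuristic like this needs a manual review of the result. In the README changed here, one wrapped line of the "Similar Projects" paragraph begins with `2. It may be` (the tail of "Python 2."), which the pattern above would mistake for a numbered list item and leave unjoined.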
parent c073ba5272
commit b5d6a82cc3
78
README.md
@@ -10,31 +10,18 @@ I hate paper. Environmental issues aside, it's a tech person's nightmare:
 * It takes up physical space
 * Backups mean more paper
 
-In the past few months I've been bitten more than a few times by the problem
-of not having the right document around. Sometimes I recycled a document I
-needed (who keeps water bills for two years?) and other times I just lost
-it... because paper. I wrote this to make my life easier.
+In the past few months I've been bitten more than a few times by the problem of not having the right document around. Sometimes I recycled a document I needed (who keeps water bills for two years?) and other times I just lost it... because paper. I wrote this to make my life easier.
 
 
 ## How it Works
 
-Paperless does not control your scanner, it only helps you deal with what your
-scanner produces
+Paperless does not control your scanner, it only helps you deal with what your scanner produces
 
-1. Buy a document scanner that can write to a place on your network. If you
-need some inspiration, have a look at the [scanner recommendations](https://paperless.readthedocs.io/en/latest/scanners.html)
-page.
-2. Set it up to "scan to FTP" or something similar. It should be able to push
-scanned images to a server without you having to do anything. Of course if
-your scanner doesn't know how to automatically upload the file somewhere,
-you can always do that manually. Paperless doesn't care how the documents
-get into its local consumption directory.
-3. Have the target server run the Paperless consumption script to OCR the file
-and index it into a local database.
+1. Buy a document scanner that can write to a place on your network. If you need some inspiration, have a look at the [scanner recommendations](https://paperless.readthedocs.io/en/latest/scanners.html) page.
+2. Set it up to "scan to FTP" or something similar. It should be able to push scanned images to a server without you having to do anything. Of course if your scanner doesn't know how to automatically upload the file somewhere, you can always do that manually. Paperless doesn't care how the documents get into its local consumption directory.
+3. Have the target server run the Paperless consumption script to OCR the file and index it into a local database.
 4. Use the web frontend to sift through the database and find what you want.
-5. Download the PDF you need/want via the web interface and do whatever you
-like with it. You can even print it and send it as if it's the original.
-In most cases, no one will care or notice.
+5. Download the PDF you need/want via the web interface and do whatever you like with it. You can even print it and send it as if it's the original. In most cases, no one will care or notice.
 
 Here's what you get:
 
@@ -43,32 +30,22 @@ Here's what you get:
 
 ## Stability
 
-Paperless is still under active development (just look at the git commit
-history) so don't expect it to be 100% stable. You can backup the sqlite3
-database, media directory and your configuration file to be on the safe side.
+Paperless is still under active development (just look at the git commit history) so don't expect it to be 100% stable. You can backup the sqlite3 database, media directory and your configuration file to be on the safe side.
 
 
 ## Requirements
 
-This is all really a quite simple, shiny, user-friendly wrapper around some
-very powerful tools.
+This is all really a quite simple, shiny, user-friendly wrapper around some very powerful tools.
 
-* [ImageMagick](http://imagemagick.org/) converts the images between colour and
-greyscale.
+* [ImageMagick](http://imagemagick.org/) converts the images between colour and greyscale.
 * [Tesseract](https://github.com/tesseract-ocr) does the character recognition.
-* [Unpaper](https://www.flameeyes.eu/projects/unpaper) despeckles and deskews
-the scanned image.
+* [Unpaper](https://www.flameeyes.eu/projects/unpaper) despeckles and deskews the scanned image.
 * [GNU Privacy Guard](https://gnupg.org/) is used as the encryption backend.
 * [Python 3](https://python.org/) is the language of the project.
-* [Pillow](https://pypi.python.org/pypi/pillowfight/) loads the image data as
-a python object to be used with PyOCR.
-* [PyOCR](https://github.com/jflesch/pyocr) is a slick programmatic wrapper
-around tesseract.
-* [Django](https://www.djangoproject.com/) is the framework this project is
-written against.
-* [Python-GNUPG](http://pythonhosted.org/python-gnupg/) decrypts the PDFs
-on-the-fly to allow you to download unencrypted files, leaving the
-encrypted ones on-disk.
+* [Pillow](https://pypi.python.org/pypi/pillowfight/) loads the image data as a python object to be used with PyOCR.
+* [PyOCR](https://github.com/jflesch/pyocr) is a slick programmatic wrapper around tesseract.
+* [Django](https://www.djangoproject.com/) is the framework this project is written against.
+* [Python-GNUPG](http://pythonhosted.org/python-gnupg/) decrypts the PDFs on-the-fly to allow you to download unencrypted files, leaving the encrypted ones on-disk.
 
 
 ## Documentation
@@ -78,35 +55,16 @@ It's all available on [ReadTheDocs](https://paperless.readthedocs.org/).
 
 ## Similar Projects
 
-There's another project out there called [Mayan EDMS](https://mayan.readthedocs.org/en/latest/)
-that has a surprising amount of technical overlap with Paperless. Also based
-on Django and using a consumer model with Tesseract and Unpaper, Mayan EDMS is
-*much* more featureful and comes with a slick UI as well, but still in Python
-2. It may be that Paperless consumes fewer resources, but to be honest, this is
-just a guess as I haven't tested this myself. One thing's for certain though,
-*Paperless* is a **way** better name.
+There's another project out there called [Mayan EDMS](https://mayan.readthedocs.org/en/latest/) that has a surprising amount of technical overlap with Paperless. Also based on Django and using a consumer model with Tesseract and Unpaper, Mayan EDMS is *much* more featureful and comes with a slick UI as well, but still in Python 2. It may be that Paperless consumes fewer resources, but to be honest, this is just a guess as I haven't tested this myself. One thing's for certain though, *Paperless* is a **way** better name.
 
 
 ## Important Note
 
-Document scanners are typically used to scan sensitive documents. Things like
-your social insurance number, tax records, invoices, etc. While Paperless
-encrypts the original files via the consumption script, the OCR'd text is *not*
-encrypted and is therefore stored in the clear (it needs to be searchable, so
-if someone has ideas on how to do that on encrypted data, I'm all ears). This
-means that Paperless should never be run on an untrusted host. Instead, I
-recommend that if you do want to use it, run it locally on a server in your own
-home.
+Document scanners are typically used to scan sensitive documents. Things like your social insurance number, tax records, invoices, etc. While Paperless encrypts the original files via the consumption script, the OCR'd text is *not* encrypted and is therefore stored in the clear (it needs to be searchable, so if someone has ideas on how to do that on encrypted data, I'm all ears). This means that Paperless should never be run on an untrusted host. Instead, I recommend that if you do want to use it, run it locally on a server in your own home.
 
 
 ## Donations
 
-As with all Free software, the power is less in the finances and more in the
-collective efforts. I really appreciate every pull request and bug report
-offered up by Paperless' users, so please keep that stuff coming. If however,
-you're not one for coding/design/documentation, and would like to contribute
-financially, I won't say no ;-)
+As with all Free software, the power is less in the finances and more in the collective efforts. I really appreciate every pull request and bug report offered up by Paperless' users, so please keep that stuff coming. If however, you're not one for coding/design/documentation, and would like to contribute financially, I won't say no ;-)
 
-The thing is, I'm doing ok for money, so I would instead ask you to donate to
-the [United Nations High Commissioner for Refugees](https://donate.unhcr.org/int-en/general).
-They're doing important work and they need the money a lot more than I do.
+The thing is, I'm doing ok for money, so I would instead ask you to donate to the [United Nations High Commissioner for Refugees](https://donate.unhcr.org/int-en/general). They're doing important work and they need the money a lot more than I do.