Mirror of https://github.com/paperless-ngx/paperless-ngx.git (synced 2025-04-02 13:45:10 -05:00)
Updated the requirements section
parent 94ce12f9ee
commit 5b479c5058
README.md: 14 lines changed
@@ -35,13 +35,21 @@ powerful tools.
 * [Tesseract](https://github.com/tesseract-ocr) does the character recognition
 * [GNU Privacy Guard](https://gnupg.org)
 * [Python 3](https://python.org/) is the language of the project
-* [Pillow](https://pypi.python.org/pypi/pillowfight/) converts the PDFs to
-  images
+* [Pillow](https://pypi.python.org/pypi/pillowfight/) loads the image data
+  as a python object to be used with PyOCR.
 * [PyOCR](https://github.com/jflesch/pyocr) is a slick programmatic wrapper
   around tesseract
 * [Django](https://djangoproject.org/) is the framework this project is
   written against.
-* [Python-GNUPG](http://pythonhosted.org/python-gnupg/)
+* [Python-GNUPG](http://pythonhosted.org/python-gnupg/) decrypts the PDFs
+  on-the-fly to allow you to download unencrypted files, leaving the
+  encrypted ones on-disk.
+
+The keen eye might have noticed that we're converting a PDF to an image to be
+read by Tesseract, and to do this we're using a chain of: scanned PDF >
+Imagemagick > Pillow > PyOCR > Tesseract > text. It's not ideal, but
+apparently, Pillow lacks the ability to read PDFs, and PyOCR requires a Pillow
+object, so we're sort of stuck.


 ## Instructions
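
The chain described in the added paragraph (scanned PDF > ImageMagick > Pillow > PyOCR > Tesseract > text) might look roughly like the sketch below. It is not the project's actual code. The helper names `build_convert_command` and `ocr_pdf`, the 300 DPI density, the `page.png` file name, and the choice to process only the first page are all assumptions made for illustration. It also assumes ImageMagick is available as `convert` (newer installs may expose it as `magick`).

```python
import subprocess
from pathlib import Path


def build_convert_command(pdf_path, png_path, density=300):
    """Build an ImageMagick command that rasterises the first PDF page.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    # "[0]" selects the first page; -density sets the render resolution in DPI.
    return ["convert", "-density", str(density), f"{pdf_path}[0]", str(png_path)]


def ocr_pdf(pdf_path, work_dir, lang="eng"):
    """Run PDF > ImageMagick > Pillow > PyOCR > Tesseract on a single page."""
    # Third-party imports stay local so the command builder works without them.
    from PIL import Image
    import pyocr

    png_path = Path(work_dir) / "page.png"
    # Step 1: ImageMagick turns the PDF page into an image Pillow can read.
    subprocess.run(build_convert_command(pdf_path, png_path), check=True)

    # Step 2: PyOCR looks for an installed OCR backend such as Tesseract.
    tools = pyocr.get_available_tools()
    if not tools:
        raise RuntimeError("No OCR tool found; is Tesseract installed?")

    # Step 3: Pillow loads the image and PyOCR hands it to the backend.
    with Image.open(png_path) as image:
        return tools[0].image_to_string(image, lang=lang)
```

Calling `ocr_pdf("scan.pdf", "/tmp")` would return the recognised text of the first page, provided ImageMagick, Tesseract, Pillow and PyOCR are all installed. The extra ImageMagick step exists only because, as the paragraph notes, Pillow cannot open the PDF directly.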
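
The Python-GNUPG bullet says PDFs are decrypted on-the-fly while the encrypted copies stay on disk. A minimal sketch of that idea, assuming the `python-gnupg` package and a symmetric passphrase, could look like this. The function name `decrypt_document` and the optional `gpg` parameter are hypothetical and not taken from the project.

```python
def decrypt_document(encrypted_path, passphrase, gpg=None):
    """Return decrypted bytes for a GPG-encrypted file.

    The encrypted file is only read, never modified or replaced.
    Hypothetical helper: name and signature are illustrative only.
    """
    if gpg is None:
        import gnupg  # provided by the python-gnupg package

        gpg = gnupg.GPG()

    with open(encrypted_path, "rb") as handle:
        # decrypt_file keeps the plaintext in memory when no output path is given.
        result = gpg.decrypt_file(handle, passphrase=passphrase)

    if not result.ok:
        raise ValueError(f"Decryption failed: {result.status}")
    return result.data
```

The returned bytes can be streamed straight into a download response, so no unencrypted copy needs to be written to disk.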