documentation

jonaswinkler 2020-12-01 23:38:42 +01:00
parent 8cad12b154
commit 19bb29d5cd
6 changed files with 122 additions and 2 deletions

@ -333,6 +333,36 @@ command:
The command takes no arguments and processes all your mail accounts and rules.
.. _utilities-archiver:
Creating archived documents
===========================
Paperless stores archived PDF/A documents alongside your original documents.
These archived documents will also contain selectable text for image-only
originals.
These documents are derived from the originals, which are always stored
unmodified. If coming from an earlier version of paperless, your documents
won't have archived versions.
This command creates PDF/A documents for your documents.
.. code::
document_archiver --overwrite
This command will only attempt to create archived documents when no archived
document exists yet, unless ``--overwrite`` is specified.
.. note::
This command essentially performs OCR on all your documents again,
according to your settings. If you run this with ``PAPERLESS_OCR_MODE=redo``,
it will potentially run for a very long time. You can cancel the command
at any time, since it will skip documents that already have an archived version the next time
it is run.
.. _utilities-encyption:
Managing encryption

@ -5,6 +5,29 @@
Changelog
*********
paperless-ng 0.9.5
##################
* OCR
* Paperless now uses `OCRmyPDF <https://github.com/jbarlow83/OCRmyPDF>`_ to perform OCR on documents.
* OCRmyPDF creates archived PDF/A documents with embedded text that can be selected in the front end.
* Paperless stores archived versions of documents alongside the originals. The originals can be
accessed on the document edit page, if available.
* Many of the configuration options regarding OCR have changed. See :ref:`configuration-ocr` for details.
* Paperless no longer guesses the language of your documents. It always uses the language that you
specified with ``PAPERLESS_OCR_LANGUAGE``. Be sure to set this to the language the majority of your
documents are in.
* The management command :ref:`document_archiver <utilities-archiver>` can be used to create archived versions for already
existing documents.
* Tags from consumption folder.
* Thanks to `jayme-github`_, paperless now consumes files from sub folders in the consumption folder and is able to assign tags
based on the sub folders a document was found in. This can be configured with ``PAPERLESS_CONSUMER_RECURSIVE`` and
``PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS``.
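As a rough illustration of the sub folder behaviour described above, the following Python sketch derives tag names from the sub folders a file was found in. The function name ``tags_from_subdirs`` and the example paths are hypothetical. This is not paperless' actual implementation, only one plausible reading of what ``PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS`` does.

```python
from pathlib import PurePosixPath


def tags_from_subdirs(consumption_dir, document_path):
    """Return the sub folder names between consumption_dir and the file.

    Hypothetical helper for illustration only; not part of paperless.
    """
    relative = PurePosixPath(document_path).relative_to(PurePosixPath(consumption_dir))
    # Every path component except the file name itself is treated as a tag.
    return list(relative.parts[:-1])


print(tags_from_subdirs("/consume", "/consume/invoices/2020/scan.pdf"))
```

A file placed directly in the consumption folder yields an empty list in this sketch, that is, no tags are derived from its location.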
paperless-ng 0.9.4
##################
@ -750,6 +773,7 @@ bulk of the work on this big change.
* Initial release
.. _jayme-github: http://github.com/jayme-github
.. _Brian Conn: https://github.com/TheConnMan
.. _Christopher Luu: https://github.com/nuudles
.. _Florian Jung: https://github.com/the01

@ -152,6 +152,8 @@ PAPERLESS_AUTO_LOGIN_USERNAME=<username>
Defaults to none, which disables this feature.
.. _configuration-ocr:
OCR settings
############
@ -184,6 +186,8 @@ PAPERLESS_OCR_MODE=<mode>
where no text is present. This is the safest and fastest option.
* ``skip_noarchive``: In addition to skip, paperless won't create an
archived version of your documents when it finds any text in them.
This is useful if you don't want to have two almost-identical versions
of your digital documents in the media folder.
* ``redo``: Paperless will OCR all pages of your documents and attempt to
replace any existing text layers with new text. This will be useful for
documents from scanners that already performed OCR with insufficient
@ -197,7 +201,8 @@ PAPERLESS_OCR_MODE=<mode>
however, the resulting document may be significantly larger and text
won't appear as sharp when zoomed in.
The default is ``skip``, which only performs OCR when necessary and always
creates archived documents.
PAPERLESS_OCR_OUTPUT_TYPE=<type>
Specify the type of PDF documents that paperless should produce.
@ -244,7 +249,7 @@ PAPERLESS_OCR_USER_ARG=<json>
OCRmyPDF offers many more options. Use this parameter to specify any
additional arguments you wish to pass to OCRmyPDF. Since Paperless uses
the API of OCRmyPDF, you have to specify these in a format that can be
passed to the API. See `the API reference of OCRmyPDF <https://ocrmypdf.readthedocs.io/en/latest/api.html#reference>`_
for valid parameters. All command line options are supported, but they
use underscores instead of dashes.
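To illustrate the naming rule, here is a minimal Python sketch that turns command line style option names into the underscore form and prints a JSON string of the kind ``PAPERLESS_OCR_USER_ARG`` expects. The helper names are hypothetical, and the two options shown (``--deskew`` and ``--rotate-pages``) are only examples. Check the OCRmyPDF API reference for the parameters that are valid for your version.

```python
import json


def cli_option_to_api_name(option):
    """Turn a command line option such as '--rotate-pages' into 'rotate_pages'."""
    return option.lstrip("-").replace("-", "_")


def build_user_args(cli_options):
    """Build a JSON string from a mapping of command line options to values.

    Hypothetical helper for illustration; paperless itself only needs the
    resulting JSON string.
    """
    return json.dumps({cli_option_to_api_name(k): v for k, v in cli_options.items()})


print(build_user_args({"--deskew": True, "--rotate-pages": True}))
```

The printed string can then be used as the value of ``PAPERLESS_OCR_USER_ARG``.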

@ -3,6 +3,18 @@
Frequently asked questions
**************************
**Q:** *What's the general plan for Paperless-ng?*
**A:** Paperless-ng is already almost feature-complete. This project will remain
as simple as it is right now. It will see improvements to features that are already there.
If you need advanced features such as document versions,
workflows or multi-user with customizable access to individual files, this is
not the tool for you.
Features that *are* planned are some more quality-of-life extensions for searching
(e.g., searching for similar documents or grouping results by correspondent with "more from this"
links), bulk editing and hierarchical tags.
**Q:** *I'm using docker. Where are my documents?*
**A:** Your documents are stored inside the docker volume ``paperless_media``.
@ -21,6 +33,18 @@ is
files around manually. This folder is meant to be entirely managed by docker
and paperless.
**Q:** *Let's say you don't support this project anymore in a year. Can I easily move to other systems?*
**A:** Your documents are stored as plain files inside the media folder. You can always copy those files
out of that folder to use them elsewhere. Here are a couple of notes about that:
* Paperless never modifies your original documents. It keeps checksums of all documents and uses a
scheduled sanity checker to check that they remain the same.
* By default, paperless uses the internal ID of each document as its filename. This might not be very
convenient for export. However, you can adjust the way files are stored in paperless by
:ref:`configuring the filename format <advanced-file_name_handling>`.
* :ref:`The exporter <utilities-exporter>` is another easy way to get your files out of paperless with reasonable file names.
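The checksum idea from the first note can be sketched in a few lines of Python. This is only an illustration of the general technique. The function names, the file name and the choice of SHA-256 are assumptions and do not describe how paperless' sanity checker is actually implemented.

```python
import hashlib
import os
import tempfile


def file_checksum(path, chunk_size=65536):
    """Compute a hex digest of a file, reading it in chunks.

    SHA-256 is an assumption made for this sketch.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unmodified(path, stored_checksum):
    """Return True if the file still matches the checksum recorded earlier."""
    return file_checksum(path) == stored_checksum


# Small demonstration with a temporary file standing in for an original document.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "0000001.pdf")
    with open(path, "wb") as handle:
        handle.write(b"original content")
    stored = file_checksum(path)
    print(is_unmodified(path, stored))  # file has not been touched
    with open(path, "ab") as handle:
        handle.write(b"x")
    print(is_unmodified(path, stored))  # file was changed after the checksum was stored
```

You could run a check like this yourself on files copied out of the media folder to confirm that they match what you originally put in.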
**Q:** *What file types does paperless-ng support?*
**A:** Currently, the following files are supported:
@ -53,3 +77,12 @@ in your browser and paperless has to do much less work to serve the data.
that automatically, I'm all ears. For now, you have to grab the latest release
archive from the project page and build the image yourself. The release comes
with the front end already compiled, so you don't have to do this on the Pi.
**Q:** *How do I run this on my toaster?*
**A:** I honestly don't know! As for all other devices that might be able
to run paperless, you're a bit on your own. If you can't run the docker image,
the documentation has instructions for bare metal installs. I'm running
paperless on an i3 processor from 2015 or so. This is also what I use to test
new releases. Apart from that, I also have a Raspberry Pi, on which I
occasionally build the image to see if it works.

@ -42,6 +42,9 @@ resources in the documentation:
learn about how paperless automates all tagging using machine learning.
* Paperless now comes with a :ref:`proper email consumer <usage-email>`
that's fully tested and production ready.
* Paperless creates searchable PDF/A documents from whatever you put into
the consumption directory. This means that you can select text in
image-only documents coming from your scanner.
* See :ref:`this note <utilities-encyption>` about GnuPG encryption in
paperless-ng.
* Paperless is now integrated with a

@ -60,6 +60,31 @@ Once you've got Paperless setup, you need to start feeding documents into it.
Currently, there are three options: the consumption directory, IMAP (email), and
HTTP POST.
When you add documents to paperless, it performs the following operations on
them:
1. OCR the document, if it has no text. Digital documents usually have text,
and this step will be skipped for those documents.
2. Paperless will create an archivable PDF/A document from your document.
If this document is coming from your scanner, it will have embedded selectable text.
3. Paperless performs automatic matching of tags, correspondents and types on the
document before storing it in the database.
.. hint::
This process can be configured to fit your needs. If you don't want paperless
to create archived versions for digital documents, you can do so by
setting ``PAPERLESS_OCR_MODE=skip_noarchive``. Please read the
:ref:`relevant section in the documentation <configuration-ocr>`.
.. note::
No matter which options you choose, Paperless will always store the original
document that it found in the consumption directory or in the mail and
will never overwrite that document. Archived versions are stored alongside the
originals.
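The interplay between the steps above and ``PAPERLESS_OCR_MODE`` can be summarized in a small Python sketch. It is a simplified model based on the descriptions of the ``skip``, ``skip_noarchive`` and ``redo`` modes, not paperless' actual code. The function name ``plan_consumption`` is hypothetical, and treating "has text" as a per-document flag is an assumption (the real decision is made per page).

```python
def plan_consumption(mode, has_text):
    """Decide what happens to a new document for a given OCR mode.

    Only the three modes named above are modelled in this sketch.
    """
    if mode not in ("skip", "skip_noarchive", "redo"):
        raise ValueError(f"unsupported mode in this sketch: {mode}")
    # 'redo' always runs OCR; the other modes only do so when no text is present.
    run_ocr = mode == "redo" or not has_text
    # 'skip_noarchive' omits the archived version when the document already has text.
    create_archive = not (mode == "skip_noarchive" and has_text)
    # The original is always stored unmodified, whatever the mode.
    return {"run_ocr": run_ocr, "create_archive": create_archive, "keep_original": True}


print(plan_consumption("skip", has_text=True))
print(plan_consumption("skip_noarchive", has_text=True))
print(plan_consumption("redo", has_text=True))
```

In this model, a scanned image without text is treated the same way in all three modes: OCR runs and an archived version is created.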
The consumption directory
=========================