documentation

2025-12-14 01:21:14 -06:00 · 2020-12-01 23:38:42 +01:00
parent 8cad12b154
commit 19bb29d5cd
6 changed files with 122 additions and 2 deletions
--- a/docs/administration.rst
+++ b/docs/administration.rst
@@ -333,6 +333,36 @@ command:

 The command takes no arguments and processes all your mail accounts and rules.

+.. _utilities-archiver:
+
+Creating archived documents
+===========================
+
+Paperless stores archived PDF/A documents alongside your original documents.
+These archived documents will also contain selectable text for image-only
+originals.
+These documents are derived from the originals, which are always stored
+unmodified. If you are upgrading from an earlier version of paperless, your
+existing documents won't have archived versions yet.
+
+This command creates archived PDF/A versions of your documents.
+
+.. code::
+
+    document_archiver --overwrite
+
+This command will only attempt to create archived documents when no archived
+document exists yet, unless ``--overwrite`` is specified.
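+
+As a minimal sketch, omitting the flag should therefore leave existing archived
+versions untouched and only process documents that don't have one yet:
+
+.. code::
+
+    document_archiver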
+
+.. note::
+
+    This command essentially performs OCR on all your documents again,
+    according to your settings. If you run this with ``PAPERLESS_OCR_MODE=redo``,
+    it will potentially run for a very long time. You can cancel the command
+    at any time, since it will skip documents that already have an archived
+    version the next time it is run.
+
+
 .. _utilities-encyption:

 Managing encryption
--- a/docs/changelog.rst
+++ b/docs/changelog.rst
@@ -5,6 +5,29 @@
 Changelog
 *********

+paperless-ng 0.9.5
+##################
+
+* OCR
+
+  * Paperless now uses `OCRmyPDF <https://github.com/jbarlow83/OCRmyPDF>`_ to perform OCR on documents.
+  * OCRmyPDF creates archived PDF/A documents with embedded text that can be selected in the front end.
+  * Paperless stores archived versions of documents alongside the originals. The originals can be
+    accessed on the document edit page, if available.
+  * Many of the configuration options regarding OCR have changed. See :ref:`configuration-ocr` for details.
+  * Paperless no longer guesses the language of your documents. It always uses the language that you
+    specified with ``PAPERLESS_OCR_LANGUAGE``. Be sure to set this to the language the majority of your
+    documents are in.
+  * The management command :ref:`document_archiver <utilities-archiver>` can be used to create archived versions for already
+    existing documents.
+
+* Tags from consumption folder.
+
+  * Thanks to `jayme-github`_, paperless now consumes files from subfolders in the consumption folder and is able to assign tags
+    based on the subfolders a document was found in. This can be configured with ``PAPERLESS_CONSUMER_RECURSIVE`` and
+    ``PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS``.
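+
+    A minimal configuration sketch, assuming both options are boolean switches
+    (check the configuration documentation for the accepted values):
+
+    .. code::
+
+        PAPERLESS_CONSUMER_RECURSIVE=true
+        PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS=true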
+
+
 paperless-ng 0.9.4
 ##################

@@ -750,6 +773,7 @@ bulk of the work on this big change.

 * Initial release

+.. _jayme-github: http://github.com/jayme-github
 .. _Brian Conn: https://github.com/TheConnMan
 .. _Christopher Luu: https://github.com/nuudles
 .. _Florian Jung: https://github.com/the01
--- a/docs/configuration.rst
+++ b/docs/configuration.rst
@@ -152,6 +152,8 @@ PAPERLESS_AUTO_LOGIN_USERNAME=<username>

    Defaults to none, which disables this feature.

+.. _configuration-ocr:
+
 OCR settings
 ############

@@ -184,6 +186,8 @@ PAPERLESS_OCR_MODE=<mode>
        where no text is present. This is the safest and fastest option.
    *   ``skip_noarchive``: In addition to skip, paperless won't create an
        archived version of your documents when it finds any text in them.
+        This is useful if you don't want to have two almost-identical versions
+        of your digital documents in the media folder.
    *   ``redo``: Paperless will OCR all pages of your documents and attempt to
        replace any existing text layers with new text. This will be useful for
        documents from scanners that already performed OCR with insufficient
@@ -197,7 +201,8 @@ PAPERLESS_OCR_MODE=<mode>
        however, the resulting document may be significantly larger and text
        won't appear as sharp when zoomed in.
    
-    The default is ``skip``, which only performs OCR when necessary.
+    The default is ``skip``, which only performs OCR when necessary and always
+    creates archived documents.

 PAPERLESS_OCR_OUTPUT_TYPE=<type>
    Specify the the type of PDF documents that paperless should produce.
@@ -244,7 +249,7 @@ PAPERLESS_OCR_USER_ARG=<json>
    OCRmyPDF offers many more options. Use this parameter to specify any
    additional arguments you wish to pass to OCRmyPDF. Since Paperless uses
    the API of OCRmyPDF, you have to specify these in a format that can be
-    passed to the API. See `https://ocrmypdf.readthedocs.io/en/latest/api.html#reference`_
+    passed to the API. See `the API reference of OCRmyPDF <https://ocrmypdf.readthedocs.io/en/latest/api.html#reference>`_
    for valid parameters. All command line options are supported, but they
    use underscores instead of dashed.
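+
+    As an illustrative sketch, the OCRmyPDF command line flags ``--deskew`` and
+    ``--optimize 1`` would be written as the JSON keys ``deskew`` and
+    ``optimize``. The option names here are assumptions taken from OCRmyPDF's
+    documented flags; verify them against the API reference before use:
+
+    .. code::
+
+        PAPERLESS_OCR_USER_ARG={"deskew": true, "optimize": 1}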

--- a/docs/faq.rst
+++ b/docs/faq.rst
@@ -3,6 +3,18 @@
 Frequently asked questions
 **************************

+**Q:** *What's the general plan for Paperless-ng?*
+
+**A:** Paperless-ng is already almost feature-complete. This project will remain
+as simple as it is right now. It will see improvements to features that are already there.
+If you need advanced features such as document versions,
+workflows or multi-user with customizable access to individual files, this is
+not the tool for you.
+
+Features that *are* planned include quality-of-life improvements to search
+(e.g., searching for similar documents, grouping results by correspondent with "more from this"
+links), bulk editing and hierarchical tags.
+
 **Q:** *I'm using docker. Where are my documents?*

 **A:** Your documents are stored inside the docker volume ``paperless_media``.
@@ -21,6 +33,18 @@ is
    files around manually. This folder is meant to be entirely managed by docker
    and paperless.

+**Q:** *Let's say you don't support this project anymore in a year. Can I easily move to other systems?*
+
+**A:** Your documents are stored as plain files inside the media folder. You can always drag those files
+out of that folder to use them elsewhere. Here are a couple of notes about that:
+
+*   Paperless never modifies your original documents. It keeps checksums of all documents and uses a
+    scheduled sanity checker to check that they remain the same.
+*   By default, paperless uses the internal ID of each document as its filename. This might not be very
+    convenient for export. However, you can adjust the way files are stored in paperless by
+    :ref:`configuring the filename format <advanced-file_name_handling>`.
+*   :ref:`The exporter <utilities-exporter>` is another easy way to get your files out of paperless with reasonable file names.
+
 **Q:** *What file types does paperless-ng support?*

 **A:** Currently, the following files are supported:
@@ -53,3 +77,12 @@ in your browser and paperless has to do much less work to serve the data.
 that automatically, I'm all ears. For now, you have to grab the latest release
 archive from the project page and build the image yourself. The release comes
 with the front end already compiled, so you don't have to do this on the Pi.
+
+**Q:** *How do I run this on my toaster?*
+
+**A:** I honestly don't know! As for all other devices that might be able
+to run paperless, you're a bit on your own. If you can't run the docker image,
+the documentation has instructions for bare metal installs. I'm running
+paperless on an i3 processor from 2015 or so. This is also what I use to test
+new releases. Apart from that, I also have a Raspberry Pi, which I
+occasionally build the image on to see if it works.
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -42,6 +42,9 @@ resources in the documentation:
    learn about how paperless automates all tagging using machine learning.
 *   Paperless now comes with a :ref:`proper email consumer <usage-email>`
    that's fully tested and production ready.
+*   Paperless creates searchable PDF/A documents from whatever you put into
+    the consumption directory. This means that you can select text in
+    image-only documents coming from your scanner.
 *   See :ref:`this note <utilities-encyption>` about GnuPG encryption in
    paperless-ng.
 *   Paperless is now integrated with a
--- a/docs/usage_overview.rst
+++ b/docs/usage_overview.rst
@@ -60,6 +60,31 @@ Once you've got Paperless setup, you need to start feeding documents into it.
 Currently, there are three options: the consumption directory, IMAP (email), and
 HTTP POST.

+When you add documents to paperless, it performs the following operations on
+them:
+
+1.  OCR the document, if it has no text. Digital documents usually have text,
+    and this step will be skipped for those documents.
+2.  Paperless will create an archivable PDF/A document from your document.
+    If this document is coming from your scanner, it will have embedded selectable text.
+3.  Paperless performs automatic matching of tags, correspondents and types on the
+    document before storing it in the database.
+
+.. hint::
+
+    This process can be configured to fit your needs. If you don't want paperless
+    to create archived versions for digital documents, you can do so by
+    setting ``PAPERLESS_OCR_MODE=skip_noarchive``. Please read the
+    :ref:`relevant section in the documentation <configuration-ocr>`.
+
+.. note::
+
+    No matter which options you choose, Paperless will always store the original
+    document that it found in the consumption directory or in the mail and
+    will never overwrite that document. Archived versions are stored alongside the
+    original versions.
+
+

 The consumption directory
 =========================