mirror of
https://github.com/paperless-ngx/paperless-ngx.git
synced 2025-07-28 18:24:38 -05:00
Merge branch 'dev'
This commit is contained in:
@@ -5,85 +5,6 @@ Advanced topics
|
||||
Paperless offers a couple features that automate certain tasks and make your life
|
||||
easier.
|
||||
|
||||
Guesswork
|
||||
#########
|
||||
|
||||
|
||||
Any document you put into the consumption directory will be consumed, but if
|
||||
you name the file right, it'll automatically set some values in the database
|
||||
for you. This is is the logic the consumer follows:
|
||||
|
||||
1. Try to find the correspondent, title, and tags in the file name following
|
||||
the pattern: ``Date - Correspondent - Title - tag,tag,tag.pdf``. Note that
|
||||
the format of the date is **rigidly defined** as ``YYYYMMDDHHMMSSZ`` or
|
||||
``YYYYMMDDZ``. The ``Z`` refers "Zulu time" AKA "UTC".
|
||||
The tags are optional, so the format ``Date - Correspondent - Title.pdf``
|
||||
works as well.
|
||||
2. If that doesn't work, we skip the date and try this pattern:
|
||||
``Correspondent - Title - tag,tag,tag.pdf``.
|
||||
3. If that doesn't work, we try to find the correspondent and title in the file
|
||||
name following the pattern: ``Correspondent - Title.pdf``.
|
||||
4. If that doesn't work, just assume that the name of the file is the title.
|
||||
|
||||
So given the above, the following examples would work as you'd expect:
|
||||
|
||||
* ``20150314000700Z - Some Company Name - Invoice 2016-01-01 - money,invoices.pdf``
|
||||
* ``20150314Z - Some Company Name - Invoice 2016-01-01 - money,invoices.pdf``
|
||||
* ``Some Company Name - Invoice 2016-01-01 - money,invoices.pdf``
|
||||
* ``Another Company - Letter of Reference.jpg``
|
||||
* ``Dad's Recipe for Pancakes.png``
|
||||
|
||||
These however wouldn't work:
|
||||
|
||||
* ``2015-03-14 00:07:00 UTC - Some Company Name, Invoice 2016-01-01, money, invoices.pdf``
|
||||
* ``2015-03-14 - Some Company Name, Invoice 2016-01-01, money, invoices.pdf``
|
||||
* ``Some Company Name, Invoice 2016-01-01, money, invoices.pdf``
|
||||
* ``Another Company- Letter of Reference.jpg``
|
||||
|
||||
Do I have to be so strict about naming?
|
||||
=======================================
|
||||
|
||||
Rather than using the strict document naming rules, one can also set the option
|
||||
``PAPERLESS_FILENAME_DATE_ORDER`` in ``paperless.conf`` to any date order
|
||||
that is accepted by dateparser_. Doing so will cause ``paperless`` to default
|
||||
to any date format that is found in the title, instead of a date pulled from
|
||||
the document's text, without requiring the strict formatting of the document
|
||||
filename as described above.
|
||||
|
||||
.. _dateparser: https://github.com/scrapinghub/dateparser/blob/v0.7.0/docs/usage.rst#settings
|
||||
|
||||
.. _advanced-transforming_filenames:
|
||||
|
||||
Transforming filenames for parsing
|
||||
==================================
|
||||
|
||||
Some devices can't produce filenames that can be parsed by the default
|
||||
parser. By configuring the option ``PAPERLESS_FILENAME_PARSE_TRANSFORMS`` in
|
||||
``paperless.conf`` one can add transformations that are applied to the filename
|
||||
before it's parsed.
|
||||
|
||||
The option contains a list of dictionaries of regular expressions (key:
|
||||
``pattern``) and replacements (key: ``repl``) in JSON format, which are
|
||||
applied in order by passing them to ``re.subn``. Transformation stops
|
||||
after the first match, so at most one transformation is applied. The general
|
||||
syntax is
|
||||
|
||||
.. code:: python
|
||||
|
||||
[{"pattern":"pattern1", "repl":"repl1"}, {"pattern":"pattern2", "repl":"repl2"}, ..., {"pattern":"patternN", "repl":"replN"}]
|
||||
|
||||
The example below is for a Brother ADS-2400N, a scanner that allows
|
||||
different names to different hardware buttons (useful for handling
|
||||
multiple entities in one instance), but insists on adding ``_<count>``
|
||||
to the filename.
|
||||
|
||||
.. code:: python
|
||||
|
||||
# Brother profile configuration, support "Name_Date_Count" (the default
|
||||
# setting) and "Name_Count" (use "Name" as tag and "Count" as title).
|
||||
PAPERLESS_FILENAME_PARSE_TRANSFORMS=[{"pattern":"^([a-z]+)_(\\d{8})_(\\d{6})_([0-9]+)\\.", "repl":"\\2\\3Z - \\4 - \\1."}, {"pattern":"^([a-z]+)_([0-9]+)\\.", "repl":" - \\2 - \\1."}]
|
||||
|
||||
|
||||
.. _advanced-matching:
|
||||
|
||||
Matching tags, correspondents and document types
|
||||
|
15
docs/api.rst
15
docs/api.rst
@@ -221,21 +221,16 @@ Each fragment contains a list of strings, and some of them are marked as a highl
|
||||
|
||||
[
|
||||
[
|
||||
{"text": "This is a sample text with a "},
|
||||
{"text": "highlighted", "term": 0},
|
||||
{"text": " word."}
|
||||
{"text": "This is a sample text with a ", "highlight": false},
|
||||
{"text": "highlighted", "highlight": true},
|
||||
{"text": " word.", "highlight": false}
|
||||
],
|
||||
[
|
||||
{"text": "Another", "term": 1},
|
||||
{"text": " fragment with a highlight."}
|
||||
{"text": "Another", "highlight": true},
|
||||
{"text": " fragment with a highlight.", "highlight": false}
|
||||
]
|
||||
]
|
||||
|
||||
|
||||
|
||||
When ``term`` is present within a string, the word within ``text`` should be highlighted.
|
||||
The term index groups multiple matches together and words with the same index
|
||||
should get identical highlighting.
|
||||
A client may use this example to produce the following output:
|
||||
|
||||
... This is a sample text with a **highlighted** word. ... **Another** fragment with a highlight. ...
|
||||
|
@@ -6,6 +6,40 @@ Changelog
|
||||
*********
|
||||
|
||||
|
||||
paperless-ng 0.9.9
|
||||
##################
|
||||
|
||||
Christmas release!
|
||||
|
||||
* Bulk editing
|
||||
|
||||
* Paperless now supports bulk editing.
|
||||
* The following operations are available: Add and remove correspondents, tags, document types from selected documents, as well as mass-deleting documents.
|
||||
* We've got a more fancy UI in the works that makes these features more accessible, but that's not quite ready yet.
|
||||
|
||||
* Searching
|
||||
|
||||
* Paperless now supports searching for similar documents ("More like this") both from the document detail page as well as from individual search results.
|
||||
* A search score indicates how well a document matches the search query, or how similar a document is to a given reference document.
|
||||
|
||||
* Other additions and changes
|
||||
|
||||
* Clarification in the UI that the fields "Match" and "Is insensitive" are not relevant for the Auto matching algorithm.
|
||||
* New select interface for tags, types and correspondents allows filtering. This also improves tag selection. Thanks again to `Michael Shamoon`_!
|
||||
* Page navigation controls for the document viewer, thanks to `Michael Shamoon`_.
|
||||
* Layout changes to the small cards document list.
|
||||
* The dashboard now displays the username (or full name if specified in the admin) on the dashboard.
|
||||
|
||||
* Fixes
|
||||
|
||||
* An error that caused the document importer to crash was fixed.
|
||||
* An issue with changes not being possible when ``PAPERLESS_COOKIE_PREFIX`` is used was fixed.
|
||||
* The date selection filters now allow manual entry of dates.
|
||||
|
||||
* Feature Removal
|
||||
|
||||
* Most of the guesswork features have been removed. Paperless no longer tries to extract correspondents and tags from file names.
|
||||
|
||||
paperless-ng 0.9.8
|
||||
##################
|
||||
|
||||
|
@@ -400,11 +400,6 @@ PAPERLESS_FILENAME_DATE_ORDER=<format>
|
||||
|
||||
Defaults to none, which disables this feature.
|
||||
|
||||
PAPERLESS_FILENAME_PARSE_TRANSFORMS
|
||||
Transforms filenames before they are processed by paperless. See
|
||||
:ref:`advanced-transforming_filenames` for details.
|
||||
|
||||
Defaults to none, which disables this feature.
|
||||
|
||||
Binaries
|
||||
########
|
||||
|
@@ -120,6 +120,8 @@ The `bare metal route`_ is more complicated to setup but makes it easier
|
||||
should you want to contribute some code back. You need to configure and
|
||||
run the above mentioned components yourself.
|
||||
|
||||
.. _setup-docker_route:
|
||||
|
||||
Docker Route
|
||||
============
|
||||
|
||||
|
@@ -39,7 +39,7 @@ Operation not permitted
|
||||
|
||||
You might see errors such as:
|
||||
|
||||
.. code::
|
||||
.. code:: shell-session
|
||||
|
||||
chown: changing ownership of '../export': Operation not permitted
|
||||
|
||||
@@ -49,3 +49,29 @@ to these folders. This happens when pointing these directories to NFS shares,
|
||||
for example.
|
||||
|
||||
Ensure that `chown` is possible on these directories.
|
||||
|
||||
Classifier error: No training data available
|
||||
############################################
|
||||
|
||||
This indicates that the Auto matching algorithm found no documents to learn from.
|
||||
This may have two reasons:
|
||||
|
||||
* You don't use the Auto matching algorithm: The error can be safely ignored in this case.
|
||||
* You are using the Auto matching algorithm: The classifier explicitly excludes documents
|
||||
with Inbox tags. Verify that there are documents in your archive without inbox tags.
|
||||
The algorithm will only learn from documents not in your inbox.
|
||||
|
||||
Permission denied errors in the consumption directory
|
||||
#####################################################
|
||||
|
||||
You might encounter errors such as:
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
The following error occured while consuming document.pdf: [Errno 13] Permission denied: '/usr/src/paperless/src/../consume/document.pdf'
|
||||
|
||||
This happens when paperless does not have permission to delete files inside the consumption directory.
|
||||
Ensure that ``USERMAP_UID`` and ``USERMAP_GID`` are set to the user id and group id you use on the host operating system, if these are
|
||||
different from ``1000``. See :ref:`setup-docker_route`.
|
||||
|
||||
Also ensure that you are able to read and write to the consumption directory on the host.
|
||||
|
Reference in New Issue
Block a user