mirror of https://github.com/paperless-ngx/paperless-ngx.git
synced 2025-04-02 13:45:10 -05:00

updated documentation

parent c527b274b6
commit ae3e2a7063
@ -5,85 +5,6 @@ Advanced topics

Paperless offers a couple of features that automate certain tasks and make your life
easier.

Guesswork
#########

Any document you put into the consumption directory will be consumed, but if
you name the file right, it'll automatically set some values in the database
for you. This is the logic the consumer follows:

1. Try to find the correspondent, title, and tags in the file name following
   the pattern: ``Date - Correspondent - Title - tag,tag,tag.pdf``. Note that
   the format of the date is **rigidly defined** as ``YYYYMMDDHHMMSSZ`` or
   ``YYYYMMDDZ``. The ``Z`` refers to "Zulu time", also known as UTC.
   The tags are optional, so the format ``Date - Correspondent - Title.pdf``
   works as well.
2. If that doesn't work, we skip the date and try this pattern:
   ``Correspondent - Title - tag,tag,tag.pdf``.
3. If that doesn't work, we try to find the correspondent and title in the file
   name following the pattern: ``Correspondent - Title.pdf``.
4. If that doesn't work, just assume that the name of the file is the title.
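
As a rough illustration of the rigid date format in step 1, the two accepted
layouts can be checked with ``datetime.strptime``. This is a minimal sketch and
not the consumer's actual code; the helper name ``parse_filename_date`` is
hypothetical.

.. code:: python

    from datetime import datetime, timezone


    def parse_filename_date(text):
        """Parse ``YYYYMMDDHHMMSSZ`` or ``YYYYMMDDZ`` into an aware UTC datetime.

        Returns None when the text matches neither layout.
        """
        # Try the longer layout first, then the date-only one.
        for layout in ("%Y%m%d%H%M%SZ", "%Y%m%dZ"):
            try:
                parsed = datetime.strptime(text, layout)
            except ValueError:
                continue
            # The trailing "Z" means UTC, so attach that timezone explicitly.
            return parsed.replace(tzinfo=timezone.utc)
        return None

In this sketch, anything else, such as ``2015-03-14``, yields ``None``.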

So given the above, the following examples would work as you'd expect:

* ``20150314000700Z - Some Company Name - Invoice 2016-01-01 - money,invoices.pdf``
* ``20150314Z - Some Company Name - Invoice 2016-01-01 - money,invoices.pdf``
* ``Some Company Name - Invoice 2016-01-01 - money,invoices.pdf``
* ``Another Company - Letter of Reference.jpg``
* ``Dad's Recipe for Pancakes.png``

These however wouldn't work:

* ``2015-03-14 00:07:00 UTC - Some Company Name, Invoice 2016-01-01, money, invoices.pdf``
* ``2015-03-14 - Some Company Name, Invoice 2016-01-01, money, invoices.pdf``
* ``Some Company Name, Invoice 2016-01-01, money, invoices.pdf``
* ``Another Company- Letter of Reference.jpg``
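
The fallback order described above can be sketched with a list of regular
expressions tried in turn. The patterns and the helper ``guess_attributes``
below are assumptions made for illustration, not the expressions Paperless
actually uses.

.. code:: python

    import re

    # Hypothetical patterns, ordered from most to least specific.
    DATE = r"(?P<date>\d{8}(?:\d{6})?Z)"
    PATTERNS = [
        re.compile(p)
        for p in (
            # Date - Correspondent - Title - tag,tag,tag.ext
            rf"^{DATE} - (?P<correspondent>.+?) - (?P<title>.+?) - (?P<tags>[\w,]+)\.\w+$",
            # Date - Correspondent - Title.ext (tags are optional)
            rf"^{DATE} - (?P<correspondent>.+?) - (?P<title>.+)\.\w+$",
            # Correspondent - Title - tag,tag,tag.ext
            r"^(?P<correspondent>.+?) - (?P<title>.+?) - (?P<tags>[\w,]+)\.\w+$",
            # Correspondent - Title.ext
            r"^(?P<correspondent>.+?) - (?P<title>.+)\.\w+$",
            # Anything else: the file name (without extension) is the title.
            r"^(?P<title>.+)\.\w+$",
        )
    ]


    def guess_attributes(filename):
        """Return the values guessed from a file name using the first matching pattern."""
        for pattern in PATTERNS:
            match = pattern.match(filename)
            if match:
                found = match.groupdict()
                return {
                    "date": found.get("date"),
                    "correspondent": found.get("correspondent"),
                    "title": found["title"],
                    "tags": found["tags"].split(",") if found.get("tags") else [],
                }
        # No extension at all: fall back to the raw name as the title.
        return {"date": None, "correspondent": None, "title": filename, "tags": []}

With these patterns, ``Another Company- Letter of Reference.jpg`` matches none
of the ``Correspondent - Title`` forms, so the sketch falls through to the last
pattern and treats the whole name, minus its extension, as the title.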

Do I have to be so strict about naming?
=======================================

Rather than using the strict document naming rules, one can also set the option
``PAPERLESS_FILENAME_DATE_ORDER`` in ``paperless.conf`` to any date order
that is accepted by dateparser_. Doing so makes ``paperless`` prefer a date
found in the title, whatever its format, over a date pulled from the
document's text, without requiring the strict formatting of the document
filename as described above.

.. _dateparser: https://github.com/scrapinghub/dateparser/blob/v0.7.0/docs/usage.rst#settings

.. _advanced-transforming_filenames:

Transforming filenames for parsing
==================================

Some devices can't produce filenames that can be parsed by the default
parser. By configuring the option ``PAPERLESS_FILENAME_PARSE_TRANSFORMS`` in
``paperless.conf`` one can add transformations that are applied to the filename
before it's parsed.

The option contains a list of dictionaries of regular expressions (key:
``pattern``) and replacements (key: ``repl``) in JSON format, which are
applied in order by passing them to ``re.subn``. Transformation stops
after the first match, so at most one transformation is applied. The general
syntax is

.. code:: python

    [{"pattern":"pattern1", "repl":"repl1"}, {"pattern":"pattern2", "repl":"repl2"}, ..., {"pattern":"patternN", "repl":"replN"}]

The example below is for a Brother ADS-2400N, a scanner that allows
assigning different names to different hardware buttons (useful for handling
multiple entities in one instance), but insists on adding ``_<count>``
to the filename.

.. code:: python

    # Brother profile configuration, supports "Name_Date_Count" (the default
    # setting) and "Name_Count" (use "Name" as tag and "Count" as title).
    PAPERLESS_FILENAME_PARSE_TRANSFORMS=[{"pattern":"^([a-z]+)_(\\d{8})_(\\d{6})_([0-9]+)\\.", "repl":"\\2\\3Z - \\4 - \\1."}, {"pattern":"^([a-z]+)_([0-9]+)\\.", "repl":" - \\2 - \\1."}]
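
To see what the Brother transformations above do to a file name, the following
sketch applies them the way the text describes: each pattern is passed to
``re.subn`` in order, and processing stops at the first one that matches. The
function name ``apply_transforms`` and the sample file names are hypothetical,
and how Paperless reads the option internally is not shown here.

.. code:: python

    import json
    import re


    def apply_transforms(filename, transforms):
        """Apply the first matching {"pattern", "repl"} entry and return the result."""
        for transform in transforms:
            new_name, count = re.subn(transform["pattern"], transform["repl"], filename)
            if count > 0:
                # Stop after the first match: at most one transformation is applied.
                return new_name
        return filename


    # The JSON value from the Brother example, kept verbatim in raw strings.
    RAW = (
        r'[{"pattern":"^([a-z]+)_(\\d{8})_(\\d{6})_([0-9]+)\\.", "repl":"\\2\\3Z - \\4 - \\1."}, '
        r'{"pattern":"^([a-z]+)_([0-9]+)\\.", "repl":" - \\2 - \\1."}]'
    )
    BROTHER = json.loads(RAW)

    # "Name_Date_Count" form (hypothetical file name).
    print(apply_transforms("invoices_20150314_000700_0001.pdf", BROTHER))
    # -> 20150314000700Z - 0001 - invoices.pdf

    # "Name_Count" form (hypothetical file name).
    print(apply_transforms("invoices_0001.pdf", BROTHER))
    # ->  - 0001 - invoices.pdf  (note the leading space)

A file name that matches neither pattern is returned unchanged by this sketch.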


.. _advanced-matching:

Matching tags, correspondents and document types

@ -400,11 +400,6 @@ PAPERLESS_FILENAME_DATE_ORDER=<format>

    Defaults to none, which disables this feature.

PAPERLESS_FILENAME_PARSE_TRANSFORMS
    Transforms filenames before they are processed by paperless. See
    :ref:`advanced-transforming_filenames` for details.

    Defaults to none, which disables this feature.

Binaries
########