diff --git a/docs/advanced_usage.rst b/docs/advanced_usage.rst index 48a86384c..8f6b91b4c 100644 --- a/docs/advanced_usage.rst +++ b/docs/advanced_usage.rst @@ -5,85 +5,6 @@ Advanced topics Paperless offers a couple features that automate certain tasks and make your life easier. -Guesswork -######### - - -Any document you put into the consumption directory will be consumed, but if -you name the file right, it'll automatically set some values in the database -for you. This is is the logic the consumer follows: - -1. Try to find the correspondent, title, and tags in the file name following - the pattern: ``Date - Correspondent - Title - tag,tag,tag.pdf``. Note that - the format of the date is **rigidly defined** as ``YYYYMMDDHHMMSSZ`` or - ``YYYYMMDDZ``. The ``Z`` refers "Zulu time" AKA "UTC". - The tags are optional, so the format ``Date - Correspondent - Title.pdf`` - works as well. -2. If that doesn't work, we skip the date and try this pattern: - ``Correspondent - Title - tag,tag,tag.pdf``. -3. If that doesn't work, we try to find the correspondent and title in the file - name following the pattern: ``Correspondent - Title.pdf``. -4. If that doesn't work, just assume that the name of the file is the title. - -So given the above, the following examples would work as you'd expect: - -* ``20150314000700Z - Some Company Name - Invoice 2016-01-01 - money,invoices.pdf`` -* ``20150314Z - Some Company Name - Invoice 2016-01-01 - money,invoices.pdf`` -* ``Some Company Name - Invoice 2016-01-01 - money,invoices.pdf`` -* ``Another Company - Letter of Reference.jpg`` -* ``Dad's Recipe for Pancakes.png`` - -These however wouldn't work: - -* ``2015-03-14 00:07:00 UTC - Some Company Name, Invoice 2016-01-01, money, invoices.pdf`` -* ``2015-03-14 - Some Company Name, Invoice 2016-01-01, money, invoices.pdf`` -* ``Some Company Name, Invoice 2016-01-01, money, invoices.pdf`` -* ``Another Company- Letter of Reference.jpg`` - -Do I have to be so strict about naming? -======================================= - -Rather than using the strict document naming rules, one can also set the option -``PAPERLESS_FILENAME_DATE_ORDER`` in ``paperless.conf`` to any date order -that is accepted by dateparser_. Doing so will cause ``paperless`` to default -to any date format that is found in the title, instead of a date pulled from -the document's text, without requiring the strict formatting of the document -filename as described above. - -.. _dateparser: https://github.com/scrapinghub/dateparser/blob/v0.7.0/docs/usage.rst#settings - -.. _advanced-transforming_filenames: - -Transforming filenames for parsing -================================== - -Some devices can't produce filenames that can be parsed by the default -parser. By configuring the option ``PAPERLESS_FILENAME_PARSE_TRANSFORMS`` in -``paperless.conf`` one can add transformations that are applied to the filename -before it's parsed. - -The option contains a list of dictionaries of regular expressions (key: -``pattern``) and replacements (key: ``repl``) in JSON format, which are -applied in order by passing them to ``re.subn``. Transformation stops -after the first match, so at most one transformation is applied. The general -syntax is - -.. code:: python - - [{"pattern":"pattern1", "repl":"repl1"}, {"pattern":"pattern2", "repl":"repl2"}, ..., {"pattern":"patternN", "repl":"replN"}] - -The example below is for a Brother ADS-2400N, a scanner that allows -different names to different hardware buttons (useful for handling -multiple entities in one instance), but insists on adding ``_`` -to the filename. - -.. code:: python - - # Brother profile configuration, support "Name_Date_Count" (the default - # setting) and "Name_Count" (use "Name" as tag and "Count" as title). - PAPERLESS_FILENAME_PARSE_TRANSFORMS=[{"pattern":"^([a-z]+)_(\\d{8})_(\\d{6})_([0-9]+)\\.", "repl":"\\2\\3Z - \\4 - \\1."}, {"pattern":"^([a-z]+)_([0-9]+)\\.", "repl":" - \\2 - \\1."}] - - .. _advanced-matching: Matching tags, correspondents and document types diff --git a/docs/configuration.rst b/docs/configuration.rst index d3f47215b..efc1a9db1 100644 --- a/docs/configuration.rst +++ b/docs/configuration.rst @@ -400,11 +400,6 @@ PAPERLESS_FILENAME_DATE_ORDER= Defaults to none, which disables this feature. -PAPERLESS_FILENAME_PARSE_TRANSFORMS - Transforms filenames before they are processed by paperless. See - :ref:`advanced-transforming_filenames` for details. - - Defaults to none, which disables this feature. Binaries ########