Merge pull request #204 from jovandeginste/paperless_tika

WIP: Add the new paperless_tika parser
This commit is contained in:
Jonas Winkler
2021-01-01 20:21:18 +01:00
committed by GitHub
8 changed files with 273 additions and 0 deletions

View File

@@ -277,6 +277,35 @@ PAPERLESS_OCR_USER_ARG=<json>
{"deskew": true, "optimize": 3, "unpaper_args": "--pre-rotate 90"}
.. _configuration-tika:
Tika settings
#############
Paperless can make use of `Tika <https://tika.apache.org/>`_ and
`Gotenberg <https://thecodingmachine.github.io/gotenberg/>`_ for parsing and
converting "Office" documents (such as ".doc", ".xlsx" and ".odt"). If you
wish to use this, you must provide a Tika server and a Gotenberg server,
configure their endpoints, and enable the feature.
If you run paperless on docker, you can add those services to the docker-compose
file (see the examples provided).
PAPERLESS_TIKA_ENABLED=<bool>
Enable (or disable) the Tika parser.
Defaults to false.
PAPERLESS_TIKA_ENDPOINT=<url>
Set the endpoint URL were Paperless can reach your Tika server.
Defaults to "http://localhost:9998".
PAPERLESS_TIKA_GOTENBERG_ENDPOINT=<url>
Set the endpoint URL were Paperless can reach your Gotenberg server.
Defaults to "http://localhost:3000".
Software tweaks
###############