fix(tika): adapt to Gotenberg 7 API

This commit adapts to the latest breaking changes from Gotenberg 7.
It also freezes the usage of the Gotenberg server to v7.x. Doing
this prevents further breaking changes leaking in our code base.

* refs #1250
This commit is contained in:
Uli Fahrer 2021-08-27 08:32:16 +02:00
parent cd43bc1f66
commit 2dcacaee14
6 changed files with 15 additions and 15 deletions

View File

@ -75,10 +75,10 @@ services:
PAPERLESS_TIKA_ENDPOINT: http://tika:9998
gotenberg:
image: thecodingmachine/gotenberg
image: gotenberg/gotenberg:7
restart: unless-stopped
environment:
DISABLE_GOOGLE_CHROME: 1
CHROMIUM_DISABLE_ROUTES: 1
tika:
image: apache/tika

View File

@ -64,10 +64,10 @@ services:
PAPERLESS_TIKA_ENDPOINT: http://tika:9998
gotenberg:
image: thecodingmachine/gotenberg
image: gotenberg/gotenberg:7
restart: unless-stopped
environment:
DISABLE_GOOGLE_CHROME: 1
CHROMIUM_DISABLE_ROUTES: 1
tika:
image: apache/tika

View File

@ -402,7 +402,7 @@ Tika settings
#############
Paperless can make use of `Tika <https://tika.apache.org/>`_ and
`Gotenberg <https://thecodingmachine.github.io/gotenberg/>`_ for parsing and
`Gotenberg <https://gotenberg.dev/>`_ for parsing and
converting "Office" documents (such as ".doc", ".xlsx" and ".odt"). If you
wish to use this, you must provide a Tika server and a Gotenberg server,
configure their endpoints, and enable the feature.
@ -444,10 +444,10 @@ requires are as follows:
# ...
gotenberg:
image: thecodingmachine/gotenberg
image: gotenberg/gotenberg:7
restart: unless-stopped
environment:
DISABLE_GOOGLE_CHROME: 1
CHROMIUM_DISABLE_ROUTES: 1
tika:
image: apache/tika

View File

@ -101,22 +101,22 @@ You may experience these errors when using the optional TIKA integration:
.. code::
requests.exceptions.HTTPError: 504 Server Error: Gateway Timeout for url: http://gotenberg:3000/convert/office
requests.exceptions.HTTPError: 504 Server Error: Gateway Timeout for url: http://gotenberg:3000/forms/libreoffice/convert
Gotenberg is a server that converts Office documents into PDF documents and has a default timeout of 10 seconds.
Gotenberg is a server that converts Office documents into PDF documents and has a default timeout of 30 seconds.
When conversion takes longer, Gotenberg raises this error.
You can increase the timeout by configuring an environment variable for gotenberg (see also `here <https://thecodingmachine.github.io/gotenberg/#environment_variables.default_wait_timeout>`__).
You can increase the timeout by configuring an environment variable for Gotenberg (see also `here <https://gotenberg.dev/docs/modules/api#properties>`__).
If using docker-compose, this is achieved by the following configuration change in the ``docker-compose.yml`` file:
.. code:: yaml
gotenberg:
image: thecodingmachine/gotenberg
image: gotenberg/gotenberg:7
restart: unless-stopped
environment:
DISABLE_GOOGLE_CHROME: 1
DEFAULT_WAIT_TIMEOUT: 30
CHROMIUM_DISABLE_ROUTES: 1
API_PROCESS_TIMEOUT: 60
Permission denied errors in the consumption directory
#####################################################

View File

@ -1,4 +1,4 @@
docker run -p 5432:5432 -e POSTGRES_PASSWORD=password -v paperless_pgdata:/var/lib/postgresql/data -d postgres:13
docker run -d -p 6379:6379 redis:latest
docker run -p 3000:3000 -d thecodingmachine/gotenberg
docker run -p 3000:3000 -d gotenberg/gotenberg:7
docker run -p 9998:9998 -d apache/tika

View File

@ -67,7 +67,7 @@ class TikaDocumentParser(DocumentParser):
def convert_to_pdf(self, document_path, file_name):
pdf_path = os.path.join(self.tempdir, "convert.pdf")
gotenberg_server = settings.PAPERLESS_TIKA_GOTENBERG_ENDPOINT
url = gotenberg_server + "/convert/office"
url = gotenberg_server + "/forms/libreoffice/convert"
self.log("info", f"Converting {document_path} to PDF as {pdf_path}")
files = {"files": (file_name or os.path.basename(document_path),