mirror of
				https://github.com/paperless-ngx/paperless-ngx.git
				synced 2025-10-30 03:56:23 -05:00 
			
		
		
		
	Add the new paperless_tika parser
This parser will use an external Tika and Gotenberg server to parse "Office" documents (.doc, .xls, .odt, etc.).

Signed-off-by: Jo Vandeginste <Jo.Vandeginste@kuleuven.be>
		| @@ -277,6 +277,35 @@ PAPERLESS_OCR_USER_ARG=<json> | ||||
|  | ||||
|         {"deskew": true, "optimize": 3, "unpaper_args": "--pre-rotate 90"}     | ||||
|      | ||||
| .. _configuration-tika: | ||||
|  | ||||
| Tika settings | ||||
| ############# | ||||
|  | ||||
| Paperless can make use of `Tika <https://tika.apache.org/>`_ and  | ||||
| `Gotenberg <https://thecodingmachine.github.io/gotenberg/>`_ for parsing and | ||||
| converting "Office" documents (such as ".doc", ".xlsx" and ".odt"). If you | ||||
| wish to use this, you must provide a Tika server and a Gotenberg server, | ||||
| configure their endpoints, and enable the feature. | ||||
|  | ||||
| If you run paperless on docker, you can add those services to the docker-compose | ||||
| file (see the examples provided). | ||||
|  | ||||
| PAPERLESS_TIKA=<bool> | ||||
|     Enable (or disable) the Tika parser. | ||||
|  | ||||
|     Defaults to false. | ||||
|  | ||||
| TIKA_SERVER_ENDPOINT=<url> | ||||
|     Set the endpoint URL where Paperless can reach your Tika server. | ||||
|  | ||||
|     Defaults to "http://localhost:9998". | ||||
|  | ||||
| GOTENBERG_SERVER_ENDPOINT=<url> | ||||
|     Set the endpoint URL where Paperless can reach your Gotenberg server. | ||||
|  | ||||
|     Defaults to "http://localhost:3000". | ||||
|  | ||||
|      | ||||
| Software tweaks | ||||
| ############### | ||||
|   | ||||
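The three settings documented in this diff could be read from the environment roughly as follows. This is a minimal sketch, not the code added by this commit: the helper names `env_bool` and `load_tika_settings` are hypothetical, and the set of strings treated as "true" is an assumption. Only the variable names and the defaults come from the documentation above.

```python
import os

# Defaults taken from the documentation in this diff.
DEFAULT_TIKA_ENDPOINT = "http://localhost:9998"
DEFAULT_GOTENBERG_ENDPOINT = "http://localhost:3000"


def env_bool(name, default=False, environ=os.environ):
    """Hypothetical helper: read a boolean flag from the environment.

    Assumption: "1", "true", "yes" and "on" (case-insensitive) mean True.
    """
    value = environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in ("1", "true", "yes", "on")


def load_tika_settings(environ=os.environ):
    """Hypothetical helper: collect the Tika-related settings in one dict."""
    return {
        # The feature is off unless PAPERLESS_TIKA is set to a truthy value.
        "enabled": env_bool("PAPERLESS_TIKA", False, environ),
        "tika_endpoint": environ.get(
            "TIKA_SERVER_ENDPOINT", DEFAULT_TIKA_ENDPOINT
        ),
        "gotenberg_endpoint": environ.get(
            "GOTENBERG_SERVER_ENDPOINT", DEFAULT_GOTENBERG_ENDPOINT
        ),
    }
```

Passing a plain dict instead of `os.environ` makes it easy to check the fallback behaviour: with no variables set, the parser stays disabled and both endpoints point at localhost.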