From 58095100a6319c66def26eb3d50092bb45d04a15 Mon Sep 17 00:00:00 2001
From: Trenton H <797416+stumpylog@users.noreply.github.com>
Date: Thu, 31 Oct 2024 07:59:54 -0700
Subject: [PATCH] Updated Backend Ideas List (markdown)

---
 Backend-Ideas-List.md | 39 ++++++++++++++++----------------------
 1 file changed, 16 insertions(+), 23 deletions(-)

diff --git a/Backend-Ideas-List.md b/Backend-Ideas-List.md
index 998ebb0..daa933a 100644
--- a/Backend-Ideas-List.md
+++ b/Backend-Ideas-List.md
@@ -6,34 +6,27 @@
 - Provides no benefit
 - Does still linger in the code base here and there
 
-## Formatting Language
-
-- Combine and standardize the formatting used for titles and filenames
-- Add some basic operations?
-- Ensure dates and locale are set to use proper locale
-
-## Context Managers
-
-Updating the consumer and maybe the parsers to be context managers.
-- allows them to be used in `with` statements, persist some values until the exit
-- Single location to clean up temporary directories.
-- allow a connection to a server to be maintained throughout the life, which would slightly shorten connections to Tika
-
 ## Migration to s6-overlay
 
 - supervisord isn't meant to run as PID 1, S6 is
 - s8 startup can be separated into independent units, with dependencies between them, which could slightly improve startup time
 - Initial work done in https://github.com/paperless-ngx/paperless-ngx/tree/feature-s6-overlay
 
-## Integrate `apprise`
-
-- all in one library for notifications across multiple services, from email to self hosted instances
-- need to standardize what is notified and how it is tagged (ie always include `paperless-ngx`, and maybe a level like `warning`, `error`, etc)
-- Probably the user provides a filepath to the config
-- as much as possible, would likely want to persistent the client through a consumption, to prevent extra work
-
 ## External Services
 
-- External OCR services, using an API, could provide more recent tesseract and ghostscript versions, potentially fixing issues faster than Debian updates (thinking Alpine)
-- Is time consuming, so might need celery there? And a database?
-- fastapi could easily set this up
\ No newline at end of file
+### External OCR
+
+- External OCR services, using an API, could provide more recent tesseract and ghostscript versions, potentially fixing issues faster than Debian updates (thinking Alpine based image)
+- This would be streamed the document, eventually return the content and an optional archive file
+- Is time consuming, so might need celery/huey/task queue there? And a database?
+- fastapi could easily set this up, if there is no need for a database.
+
+## Separate OCR from Archive
+
+- The getting of a image or PDF document content should be separated from the generation of an archive file
+- Just too many interactions between them, leading to odd combinations
+
+## Break apart consumer
+
+- The consumer does so much stuff, break it apart into smaller, more discrete steps
+- Make each step well defined with possible status/states to report over the websocket and/or notifications
\ No newline at end of file