From 58095100a6319c66def26eb3d50092bb45d04a15 Mon Sep 17 00:00:00 2001
From: Trenton H <797416+stumpylog@users.noreply.github.com>
Date: Thu, 31 Oct 2024 07:59:54 -0700
Subject: [PATCH] Updated Backend Ideas List (markdown)

---
 Backend-Ideas-List.md | 39 ++++++++++++++++-----------------------
 1 file changed, 16 insertions(+), 23 deletions(-)

diff --git a/Backend-Ideas-List.md b/Backend-Ideas-List.md
index 998ebb0..daa933a 100644
--- a/Backend-Ideas-List.md
+++ b/Backend-Ideas-List.md
@@ -6,34 +6,27 @@
 - Provides no benefit
 - Does still linger in the code base here and there
 
-## Formatting Language
-
-- Combine and standardize the formatting used for titles and filenames
-- Add some basic operations?
-- Ensure dates and locale are set to use proper locale
-
-## Context Managers
-
-Updating the consumer and maybe the parsers to be context managers.
-- allows them to be used in `with` statements, persist some values until the exit
-- Single location to clean up temporary directories.
-- allow a connection to a server to be maintained throughout the life, which would slightly shorten connections to Tika
-
 ## Migration to s6-overlay
 
 - supervisord isn't meant to run as PID 1, S6 is
 - s8 startup can be separated into independent units, with dependencies between them, which could slightly improve startup time
 - Initial work done in https://github.com/paperless-ngx/paperless-ngx/tree/feature-s6-overlay
 
-## Integrate `apprise`
-
-- all in one library for notifications across multiple services, from email to self hosted instances
-- need to standardize what is notified and how it is tagged (ie always include `paperless-ngx`, and maybe a level like `warning`, `error`, etc)
-- Probably the user provides a filepath to the config
-- as much as possible, would likely want to persistent the client through a consumption, to prevent extra work
-
 ## External Services
 
-- External OCR services, using an API, could provide more recent tesseract and ghostscript versions, potentially fixing issues faster than Debian updates (thinking Alpine)
-- Is time consuming, so might need celery there?  And a database?
-- fastapi could easily set this up
\ No newline at end of file
+### External OCR
+
+- External OCR services, using an API, could provide more recent tesseract and ghostscript versions, potentially fixing issues faster than Debian updates (thinking of an Alpine-based image)
+- The service would be streamed the document and eventually return the content and an optional archive file
+- OCR is time consuming, so it might need a task queue (celery, huey, etc.) there? And a database?
+- fastapi could easily set this up, if there is no need for a database.
+
+## Separate OCR from Archive
+
+- Extracting the content of an image or PDF document should be separated from generating the archive file
+- There are just too many interactions between the two, leading to odd combinations
+
+## Break apart consumer
+
+- The consumer does too much; break it apart into smaller, more discrete steps
+- Make each step well defined, with possible statuses/states to report over the websocket and/or notifications
\ No newline at end of file