Updated Backend Ideas List (markdown)

Trenton H 2024-10-31 07:59:54 -07:00
parent 5a6d1680e4
commit 58095100a6

- Provides no benefit
- Still lingers in the codebase here and there
## Formatting Language
- Combine and standardize the formatting used for titles and filenames
- Add some basic operations?
- Ensure dates are formatted using the proper locale
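A minimal sketch of one shared template syntax for both titles and filenames, assuming Python's `str.format_map` placeholders (`{title}`, `{created:%Y}`) as the syntax. The `render` function and the placeholder names are hypothetical. In filename mode only the substituted string values are sanitized, so `/` in the template can still act as a directory separator. Locale handling is not covered here.

```python
import re
from datetime import date

# Characters that are unsafe in filenames on common filesystems
_UNSAFE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


class _SafeDict(dict):
    """Leave unknown placeholders in place instead of raising KeyError."""

    def __missing__(self, key):
        return "{" + key + "}"


def render(template: str, context: dict, *, for_filename: bool = False) -> str:
    """Render a title or filename from the same hypothetical template syntax."""
    if for_filename:
        # Sanitize only string values; the template's own "/" stays a separator
        context = {
            key: _UNSAFE.sub("_", value) if isinstance(value, str) else value
            for key, value in context.items()
        }
    return template.format_map(_SafeDict(context))


if __name__ == "__main__":
    ctx = {"correspondent": "ACME", "title": "Invoice 2024/10", "created": date(2024, 10, 31)}
    print(render("{correspondent} - {title}", ctx))
    print(render("{correspondent}/{created:%Y}/{title}", ctx, for_filename=True))
```

Using one renderer for both cases keeps the syntax identical, and "basic operations" could later be added as format specs or filters in a single place.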
## Context Managers
Convert the consumer, and maybe the parsers, into context managers.
- Allows them to be used in `with` statements, persisting some values until exit
- Single location to clean up temporary directories
- Allows a connection to a server to be kept open for the object's lifetime, which would slightly reduce connection overhead for Tika
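A rough sketch of a parser as a context manager, assuming a hypothetical `DocumentParser` that owns a scratch directory and an optional long-lived client. `client_factory` is a placeholder for anything with a `close()` method, such as a `requests.Session` used for Tika calls.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Optional


class DocumentParser:
    """Hypothetical parser that owns its temp directory and a server client."""

    def __init__(self, client_factory: Optional[Callable[[], object]] = None):
        self._client_factory = client_factory
        self.tempdir: Optional[Path] = None
        self.client = None

    def __enter__(self) -> "DocumentParser":
        self.tempdir = Path(tempfile.mkdtemp(prefix="paperless-"))
        if self._client_factory is not None:
            # One connection reused for every request until __exit__
            self.client = self._client_factory()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Single cleanup location, run on success and on exceptions alike
        if self.client is not None:
            self.client.close()
        if self.tempdir is not None:
            shutil.rmtree(self.tempdir, ignore_errors=True)
        return False  # never swallow exceptions
```

Usage would look like `with DocumentParser(client_factory=requests.Session) as parser: ...`, with cleanup guaranteed when the block exits.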
## Migration to s6-overlay
- supervisord isn't meant to run as PID 1; s6 is
- s6 startup can be split into independent units with dependencies between them, which could slightly improve startup time
- Initial work done in https://github.com/paperless-ngx/paperless-ngx/tree/feature-s6-overlay
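To illustrate why dependency-aware units can shorten startup, here is a small sketch using Python's `graphlib.TopologicalSorter` to group units into "waves" that could start in parallel. The service names and dependencies are made up for illustration and are not the actual s6-rc layout from the branch above.

```python
from graphlib import TopologicalSorter

# Hypothetical units: each maps to the set of units it depends on
SERVICES = {
    "init-env": set(),
    "init-migrations": {"init-env"},
    "redis-wait": {"init-env"},
    "webserver": {"init-migrations", "redis-wait"},
    "consumer": {"init-migrations", "redis-wait"},
    "celery-worker": {"init-migrations", "redis-wait"},
}


def startup_waves(graph: dict) -> list:
    """Group units whose dependencies are all satisfied into parallel waves."""
    ts = TopologicalSorter(graph)
    ts.prepare()
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # everything here can start at once
        waves.append(ready)
        ts.done(*ready)
    return waves


if __name__ == "__main__":
    for wave in startup_waves(SERVICES):
        print(wave)
```

With supervisord-style sequential startup every unit waits its turn; with explicit dependencies only the chain depth matters.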
## Integrate `apprise`
- All-in-one library for notifications across multiple services, from email to self-hosted instances
- Need to standardize what is notified and how it is tagged (i.e. always include `paperless-ngx`, and maybe a level like `warning`, `error`, etc.)
- The user would probably provide a file path to the config
- As much as possible, would likely want to persist the client through a consumption, to avoid extra work
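A sketch of a thin standardization layer, assuming a hypothetical `ConsumptionNotifier` created once per consumption and reused for every message. It wraps any object with an Apprise-style `notify(body=..., title=..., notify_type=..., tag=...)` method. `load_apprise` shows how an `apprise.Apprise` object can be built from a user-supplied config file via `AppriseConfig`. The level strings are assumed to match `apprise.NotifyType` values, with `error` mapped to Apprise's `failure`. Note that in Apprise a `tag` filters which configured services receive the message.

```python
# Paperless-side level -> Apprise notify type (assumed string values)
LEVELS = {"info": "info", "success": "success", "warning": "warning", "error": "failure"}


class ConsumptionNotifier:
    """Hypothetical wrapper that standardizes titles, levels, and tags."""

    def __init__(self, sender, tag: str = "paperless-ngx"):
        self._sender = sender  # e.g. an apprise.Apprise instance, reused per consumption
        self._tag = tag

    def send(self, level: str, message: str, *, filename: str = "") -> bool:
        title = f"[paperless-ngx] {level.upper()}"
        if filename:
            title += f": {filename}"
        return self._sender.notify(
            body=message,
            title=title,
            notify_type=LEVELS[level],
            tag=self._tag,
        )


def load_apprise(config_path: str):
    """Build an apprise.Apprise object from the user's config file."""
    import apprise  # third-party; imported lazily so the wrapper works without it

    config = apprise.AppriseConfig()
    config.add(config_path)
    client = apprise.Apprise()
    client.add(config)
    return client
```

The consumer would call `load_apprise(...)` once, wrap it, and pass the notifier through each step instead of reconnecting per message.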
## External Services
### External OCR
- External OCR services, accessed through an API, could provide more recent Tesseract and Ghostscript versions, potentially getting fixes out faster than Debian updates (thinking of an Alpine-based image)
- The document would be streamed to the service, which would eventually return the content and an optional archive file
- OCR is time-consuming, so the service might need Celery, Huey, or another task queue. And a database?
- FastAPI could easily set this up, if there is no need for a database.
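Because OCR is slow, the service would likely follow a submit-then-poll pattern. Below is an in-memory sketch of that protocol, assuming a hypothetical `OcrJobService` where a thread pool stands in for Celery/Huey and a dict stands in for a database. The HTTP and streaming layer (e.g. FastAPI endpoints) is left out, and the `ocr` callable is a placeholder for the real Tesseract/Ghostscript work.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class OcrResult:
    content: str
    archive: Optional[bytes] = None  # optional archive file returned by the service


class OcrJobService:
    """Hypothetical job API: submit() returns at once, the OCR runs in the background."""

    def __init__(self, ocr: Callable[[bytes], OcrResult], workers: int = 2):
        self._ocr = ocr
        self._pool = ThreadPoolExecutor(max_workers=workers)  # stand-in for a task queue
        self._jobs: dict = {}  # job id -> Future; a real service might persist this

    def submit(self, document: bytes) -> str:
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(self._ocr, document)
        return job_id

    def status(self, job_id: str) -> str:
        future: Future = self._jobs[job_id]
        if not future.done():
            return "pending"
        return "failed" if future.exception() is not None else "done"

    def result(self, job_id: str, timeout: Optional[float] = None) -> OcrResult:
        # Blocks until the job finishes, then forgets it
        return self._jobs.pop(job_id).result(timeout=timeout)

    def close(self) -> None:
        self._pool.shutdown(wait=True)
```

If jobs never need to outlive the process, this shape fits FastAPI without a database; once they do, the dict becomes a table and the pool a real task queue.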
## Separate OCR from Archive
- Extracting the content of an image or PDF document should be separated from generating an archive file
- There are just too many interactions between the two, leading to odd combinations
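One way to keep the two concerns apart is two independent interfaces composed by the caller. This is a sketch with hypothetical names (`ContentExtractor`, `ArchiveGenerator`, `parse`), using `typing.Protocol`; archive generation becomes an optional, separate step rather than a side effect of OCR.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Protocol


class ContentExtractor(Protocol):
    """Only responsible for getting text out of an image or PDF."""

    def extract_text(self, source: Path) -> str: ...


class ArchiveGenerator(Protocol):
    """Only responsible for producing an archive file, or None."""

    def make_archive(self, source: Path, workdir: Path) -> Optional[Path]: ...


@dataclass
class ParseOutcome:
    content: str
    archive: Optional[Path]


def parse(
    source: Path,
    workdir: Path,
    extractor: ContentExtractor,
    archiver: Optional[ArchiveGenerator] = None,
) -> ParseOutcome:
    content = extractor.extract_text(source)
    # Archive generation is independent: skipping it never changes the content
    archive = archiver.make_archive(source, workdir) if archiver is not None else None
    return ParseOutcome(content=content, archive=archive)
```

Each side can then be configured and tested on its own, which limits the number of combinations that have to be reasoned about together.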
## Break apart consumer
- The consumer does too much; break it apart into smaller, more discrete steps
- Make each step well defined, with possible statuses/states to report over the websocket and/or notifications
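A minimal sketch of the consumer as a sequence of named steps, each reporting a well-defined status. `run_pipeline`, `StepStatus`, and the context dict are hypothetical; `report` is whatever forwards status to the websocket and/or notifications.

```python
from enum import Enum
from typing import Callable, Iterable, Tuple

Step = Callable[[dict], dict]  # each step takes the working context and returns it updated


class StepStatus(str, Enum):
    STARTED = "started"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


def run_pipeline(
    steps: Iterable[Tuple[str, Step]],
    context: dict,
    report: Callable[[str, StepStatus], None],
) -> dict:
    """Run discrete steps in order, reporting a status before and after each one."""
    for name, step in steps:
        report(name, StepStatus.STARTED)
        try:
            context = step(context)
        except Exception:
            report(name, StepStatus.FAILED)
            raise  # stop the consumption; the failing step is already reported
        report(name, StepStatus.SUCCEEDED)
    return context
```

A consumption might then be a list such as pre-checks, parsing, metadata matching, and storing, with each step small enough to test alone.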