mirror of
https://github.com/paperless-ngx/paperless-ngx.git
synced 2025-07-26 18:14:37 -05:00
Updated v3 Ideas List (markdown)
parent
aa26a44a12
commit
a550f7e348
@ -9,7 +9,7 @@
|
|||||||
### Settings Updates
|
### Settings Updates
|
||||||
|
|
||||||
- Remove all but Django settings from the environment
|
- Remove all but Django settings from the environment
|
||||||
- Separate OCR vs other settings
|
- Separate OCR vs other settings (call them site settings?)
|
||||||
- Create multiple levels of OCR settings:
|
- Create multiple levels of OCR settings:
|
||||||
- A default system configuration, controlled by staff/superusers
|
- A default system configuration, controlled by staff/superusers
|
||||||
- A user specific settings set
|
- A user specific settings set
|
||||||
@ -29,18 +29,24 @@
|
|||||||
- An initial task takes the file, waits for it to be unmodified, then determines the next task to start.
|
- An initial task takes the file, waits for it to be unmodified, then determines the next task to start.
|
||||||
- Or alternatively, the initial task builds a pipeline and starts that.
|
- Or alternatively, the initial task builds a pipeline and starts that.
|
||||||
- Handles deciding if the file can be consumed, rather than when a new file is seen (see plugin ideas)
|
- Handles deciding if the file can be consumed, rather than when a new file is seen (see plugin ideas)
|
||||||
|
- Make each step along the way a well-defined status update, sent over websocket, but also configurable to go through something like apprise/ntfy
|
||||||
|
- TODO: If something fails along the chain, the DB shouldn't be updated. Maybe 1 task, multiple steps, wrapped in a transaction?
|
||||||
|
|
||||||
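The "1 task, multiple steps" idea above could look roughly like the following. This is only a sketch: `Package`, `run_pipeline`, `notify` and `save` are hypothetical names, not existing paperless-ngx code. Each step reports a status, and the save callback runs only if every step succeeded, so a failure part way through never touches the database.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Package:
    """Hypothetical container that travels through the steps and accumulates data."""

    path: str
    data: dict = field(default_factory=dict)


# A step mutates the package in place and raises on failure.
Step = Callable[[Package], None]


def run_pipeline(package, steps, notify, save):
    """Run every step in order; call save() only if all of them succeed."""
    for step in steps:
        notify(f"{step.__name__}: started")
        try:
            step(package)
        except Exception as exc:
            # Report the failure and stop; nothing has been persisted yet.
            notify(f"{step.__name__}: failed ({exc})")
            return False
        notify(f"{step.__name__}: done")
    # Single persistence point, reached only when the whole chain succeeded.
    save(package)
    return True
```

In a Django context the `save` callback could do its writes inside `transaction.atomic()`, and `notify` could fan out to both the websocket and a notification service.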
### Actual Plugins
|
### Actual Plugins
|
||||||
|
|
||||||
- Design a system to allow plugins, while splitting apart the current code into plugins
|
- Design a system to allow plugins, while splitting apart the current code into plugins
|
||||||
- I can see the following being plugins:
|
- I can see the following being plugins:
|
||||||
- Parsers (obviously. Includes things like AI/cloud OCR to get the content or even could talk to a remote, but local network API)
|
  - Parsers (obviously; this includes things like AI/cloud OCR to get the content, or could even talk to a remote API)
|
||||||
- Archive generation (example, use Gotenberg to convert a PDF to PDF/A instead of ocrmypdf)
|
- Archive generation (example, use Gotenberg to convert a PDF to PDF/A instead of ocrmypdf)
|
||||||
- Thumbnail generation (maybe you want to handle PDFs differently than JPEGs?)
|
- Thumbnail generation (maybe you want to handle PDFs differently than JPEGs?)
|
||||||
- Date parsing (handling non-latin dates, for example)
|
- Date parsing (handling non-latin dates, for example)
|
||||||
- Machine learning (provides an interface which returns the proposed tags, type, etc)
|
- Machine learning (provides an interface which returns the proposed tags, type, etc)
|
||||||
- Ideally, plugins should be registered when installed, declaring what mime types they support
|
- Ideally, plugins should be registered when installed, declaring what mime types they support, with some sort of conflict resolution
|
||||||
- With the settings updates above, a workflow could also be used to set the parser based on matching certain values
|
- With the settings updates above, a workflow could also be used to set the parser based on matching certain values
|
||||||
|
- Provide "paperless", a core set of functionality, including models
|
||||||
|
- Provide the existing parsers, re-configured to match the new format
|
||||||
|
- Rework the other parts to conform to the plugin API spec
|
||||||
|
|
||||||
|
|
||||||
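As a rough illustration of plugins declaring mime types with some sort of conflict resolution, here is a minimal registry sketch. The `PluginRegistry` class, the `priority` parameter and the plugin names in the usage note are all assumptions made for the example; the ideas list does not specify how conflicts should actually be resolved.

```python
from collections import defaultdict


class PluginRegistry:
    """Hypothetical registry mapping mime types to plugin names."""

    def __init__(self):
        # mime type -> list of (priority, plugin name), in registration order
        self._by_mime = defaultdict(list)

    def register(self, name, mime_types, priority=0):
        """Record that a plugin handles the given mime types."""
        for mime in mime_types:
            self._by_mime[mime].append((priority, name))

    def resolve(self, mime_type):
        """Return the highest-priority plugin for a mime type, or None."""
        candidates = self._by_mime.get(mime_type)
        if not candidates:
            return None
        # max() returns the first maximal item, so ties go to the
        # plugin that was registered earliest.
        return max(candidates, key=lambda c: c[0])[1]
```

For example, if `tesseract` registers for `application/pdf` and `image/png` at the default priority and `remote_ocr` registers for `application/pdf` with `priority=10`, PDFs resolve to `remote_ocr` and PNGs to `tesseract`. A workflow that sets the parser explicitly could bypass `resolve` entirely.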
### Simpler consumer
|
### Simpler consumer
|
||||||
|
|
||||||
@ -85,23 +91,6 @@
|
|||||||
- Getting the content of an image or PDF document should be separated from the generation of an archive file
|
- Getting the content of an image or PDF document should be separated from the generation of an archive file
|
||||||
- Just too many interactions between them, leading to odd combinations
|
- Just too many interactions between them, leading to odd combinations
|
||||||
|
|
||||||
## Break apart consumer
|
|
||||||
|
|
||||||
- The consumer does so much stuff, break it apart into smaller, more discrete steps
|
|
||||||
- Make each step well defined with possible status/states to report over the websocket and/or notifications
|
|
||||||
- Make it a chain of tasks, passing a package through which accumulates data, etc, before being saved
|
|
||||||
|
|
||||||
## Settings Manager
|
|
||||||
|
|
||||||
- Allow multiple levels of settings to be defined
|
|
||||||
- From matching, apply certain settings
|
|
||||||
- From the user (if known), apply their settings
|
|
||||||
- From the system wide settings
|
|
||||||
- From environment variable settings
|
|
||||||
- Then defaults
|
|
||||||
- settings at lower levels have less priority, so a matched setting is never changed
|
|
||||||
- Settings travel through the new consumer with the document
|
|
||||||
|
|
||||||
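The precedence order in the Settings Manager list (matched, then user, then system, then environment, then defaults) maps fairly directly onto the standard library's `ChainMap`, which looks keys up in the first mapping that contains them. The function name and the setting keys in the usage note are hypothetical and only serve to illustrate the lookup order.

```python
from collections import ChainMap


def resolve_settings(matched, user, system, env, defaults):
    """Layer the settings so earlier mappings win over later ones."""
    # ChainMap searches its mappings left to right, so a value set by
    # matching is never overridden by a lower level.
    return ChainMap(matched, user, system, env, defaults)
```

With `matched={"ocr_language": "deu"}` and `user={"ocr_language": "fra", "ocr_mode": "skip"}`, the lookup of `ocr_language` returns `"deu"` and `ocr_mode` returns `"skip"`. Calling `dict(...)` on the result gives a flat snapshot that could travel through the consumer with the document.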
## Django Ninja
|
## Django Ninja
|
||||||
|
|
||||||
- Really like the OpenAPI spec it generates
|
- Really like the OpenAPI spec it generates
|
||||||
@ -114,6 +103,7 @@
|
|||||||
- Could track, with some resolution, when a token was last used. Might be nice to display and allow removing old tokens which haven't been used
|
- Could track, with some resolution, when a token was last used. Might be nice to display and allow removing old tokens which haven't been used
|
||||||
- Could implement expiration too
|
- Could implement expiration too
|
||||||
- Async pagination isn't working quite yet
|
- Async pagination isn't working quite yet
|
||||||
|
- No idea about allauth/oidc integration
|
||||||
|
|
||||||
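Tracking token use "with some resolution" could mean only writing the timestamp when the stored value is older than a chosen interval, which avoids a database write on every request. The sketch below assumes a one hour resolution and a 90 day idle limit; both values and both function names are illustrative, not taken from Django Ninja or paperless-ngx.

```python
from datetime import datetime, timedelta, timezone

# Assumed values, chosen only for illustration.
RESOLUTION = timedelta(hours=1)
MAX_IDLE = timedelta(days=90)


def should_update_last_used(last_used, now, resolution=RESOLUTION):
    """True if the stored timestamp is missing or older than the resolution."""
    return last_used is None or now - last_used >= resolution


def is_stale(last_used, now, max_idle=MAX_IDLE):
    """True if a token has been used before but not within max_idle."""
    return last_used is not None and now - last_used > max_idle
```

The same comparison against an `expires_at` value would cover the expiration idea.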
## Vector Embeddings
|
## Vector Embeddings
|
||||||
|
|
||||||
|