mirror of
https://github.com/paperless-ngx/paperless-ngx.git
synced 2025-04-02 13:45:10 -05:00
documentation.
This commit is contained in:
parent
65816a434c
commit
278f6da16a
@ -119,8 +119,11 @@ Updating paperless without docker
|
||||
|
||||
After grabbing the new release and unpacking the contents, do the following:
|
||||
|
||||
1. Update python requirements. Paperless uses
|
||||
`Pipenv`_ for managing dependencies:
|
||||
1. Update dependencies. New paperless version may require additional
|
||||
dependencies. The dependencies required are listed in the section about
|
||||
:ref:`bare metal installations <setup-bare_metal>`.
|
||||
|
||||
2. Update python requirements. If you use Pipenv, this is done with the following steps.
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
@ -132,14 +135,14 @@ After grabbing the new release and unpacking the contents, do the following:
|
||||
This creates a new virtual environment (or uses your existing environment)
|
||||
and installs all dependencies into it.
|
||||
|
||||
2. Collect static files.
|
||||
3. Collect static files.
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
$ cd src
|
||||
$ pipenv run python3 manage.py collectstatic --clear
|
||||
|
||||
3. Migrate the database.
|
||||
4. Migrate the database.
|
||||
|
||||
.. code:: shell-session
|
||||
|
||||
@ -153,14 +156,14 @@ Management utilities
|
||||
Paperless comes with some management commands that perform various maintenance
|
||||
tasks on your paperless instance. You can invoke these commands either by
|
||||
|
||||
.. code:: bash
|
||||
.. code:: shell-session
|
||||
|
||||
$ cd /path/to/paperless
|
||||
$ docker-compose run --rm webserver <command> <arguments>
|
||||
|
||||
or
|
||||
|
||||
.. code:: bash
|
||||
.. code:: shell-session
|
||||
|
||||
$ cd /path/to/paperless/src
|
||||
$ pipenv run python manage.py <command> <arguments>
|
||||
@ -366,7 +369,7 @@ is specified, the archiver will only process that document.
|
||||
.. note::
|
||||
|
||||
Some documents will cause errors and cannot be converted into PDF/A documents,
|
||||
such as encrypted PDF documents. The archiver will skip over these Documents
|
||||
such as encrypted PDF documents. The archiver will skip over these documents
|
||||
each time it sees them.
|
||||
|
||||
.. _utilities-encyption:
|
||||
|
@ -118,114 +118,80 @@ This will test and assemble everything and also build and tag a docker image.
|
||||
Extending Paperless
|
||||
===================
|
||||
|
||||
.. warning::
|
||||
Paperless does not have any fancy plugin systems and will probably never have. However,
|
||||
some parts of the application have been designed to allow easy integration of additional
|
||||
features without any modification to the base code.
|
||||
|
||||
This section is not updated to paperless-ng yet.
|
||||
Making custom parsers
|
||||
---------------------
|
||||
|
||||
For the most part, Paperless is monolithic, so extending it is often best
|
||||
managed by way of modifying the code directly and issuing a pull request on
|
||||
`GitHub`_. However, over time the project has been evolving to be a little
|
||||
more "pluggable" so that users can write their own stuff that talks to it.
|
||||
Paperless uses parsers to add documents to paperless. A parser is responsible for:
|
||||
|
||||
.. _GitHub: https://github.com/the-paperless-project/paperless
|
||||
* Retrieve the content from the original
|
||||
* Create a thumbnail
|
||||
* Optional: Retrieve a created date from the original
|
||||
* Optional: Create an archived document from the original
|
||||
|
||||
Custom parsers can be added to paperless to support more file types. In order to do that,
|
||||
you need to write the parser itself and announce its existence to paperless.
|
||||
|
||||
.. _extending-parsers:
|
||||
|
||||
Parsers
|
||||
-------
|
||||
|
||||
You can leverage Paperless' consumption model to have it consume files *other*
|
||||
than ones handled by default like ``.pdf``, ``.jpg``, and ``.tiff``. To do so,
|
||||
you simply follow Django's convention of creating a new app, with a few key
|
||||
requirements.
|
||||
|
||||
|
||||
.. _extending-parsers-parserspy:
|
||||
|
||||
parsers.py
|
||||
..........
|
||||
|
||||
In this file, you create a class that extends
|
||||
``documents.parsers.DocumentParser`` and go about implementing the three
|
||||
required methods:
|
||||
|
||||
* ``get_thumbnail()``: Returns the path to a file we can use as a thumbnail for
|
||||
this document.
|
||||
* ``get_text()``: Returns the text from the document and only the text.
|
||||
* ``get_date()``: If possible, this returns the date of the document, otherwise
|
||||
it should return ``None``.
|
||||
|
||||
|
||||
.. _extending-parsers-signalspy:
|
||||
|
||||
signals.py
|
||||
..........
|
||||
|
||||
At consumption time, Paperless emits a ``document_consumer_declaration``
|
||||
signal which your module has to react to in order to let the consumer know
|
||||
whether or not it's capable of handling a particular file. Think of it like
|
||||
this:
|
||||
|
||||
1. Consumer finds a file in the consumption directory.
|
||||
2. It asks all the available parsers: *"Hey, can you handle this file?"*
|
||||
3. Each parser responds with either ``None`` meaning they can't handle the
|
||||
file, or a dictionary in the following format:
|
||||
The parser itself must extend ``documents.parsers.DocumentParser`` and must implement the
|
||||
methods ``parse`` and ``get_thumbnail``. You can provide your own implementation to
|
||||
``get_date`` if you don't want to rely on paperless' default date guessing mechanisms.
|
||||
|
||||
.. code:: python
|
||||
|
||||
{
|
||||
"parser": <the class name>,
|
||||
"weight": <an integer>
|
||||
}
|
||||
class MyCustomParser(DocumentParser):
|
||||
|
||||
The consumer compares the ``weight`` values from all respondents and uses the
|
||||
class with the highest value to consume the document. The default parser,
|
||||
``RasterisedDocumentParser`` has a weight of ``0``.
|
||||
def parse(self, document_path, mime_type):
|
||||
# This method does not return anything. Rather, you should assign
|
||||
# whatever you got from the document to the following fields:
|
||||
|
||||
# The content of the document.
|
||||
self.text = "content"
|
||||
|
||||
# Optional: path to a PDF document that you created from the original.
|
||||
self.archive_path = os.path.join(self.tempdir, "archived.pdf")
|
||||
|
||||
.. _extending-parsers-appspy:
|
||||
# Optional: "created" date of the document.
|
||||
self.date = get_created_from_metadata(document_path)
|
||||
|
||||
apps.py
|
||||
.......
|
||||
def get_thumbnail(self, document_path, mime_type):
|
||||
# This should return the path to a thumbnail you created for this
|
||||
# document.
|
||||
return os.path.join(self.tempdir, "thumb.png")
|
||||
|
||||
This is a standard Django file, but you'll need to add some code to it to
|
||||
connect your parser to the ``document_consumer_declaration`` signal.
|
||||
If you encounter any issues during parsing, raise a ``documents.parsers.ParseError``.
|
||||
|
||||
The ``self.tempdir`` directory is a temporary directory that is guaranteed to be empty
|
||||
and removed after consumption finished. You can use that directory to store any
|
||||
intermediate files and also use it to store the thumbnail / archived document.
|
||||
|
||||
.. _extending-parsers-finally:
|
||||
|
||||
Finally
|
||||
.......
|
||||
|
||||
The last step is to update ``settings.py`` to include your new module.
|
||||
Eventually, this will be dynamic, but at the moment, you have to edit the
|
||||
``INSTALLED_APPS`` section manually. Simply add the path to your AppConfig to
|
||||
the list like this:
|
||||
After that, you need to announce your parser to paperless. You need to connect a
|
||||
handler to the ``document_consumer_declaration`` signal. Have a look in the file
|
||||
``src/paperless_tesseract/apps.py`` on how that's done. The handler is a method
|
||||
that returns information about your parser:
|
||||
|
||||
.. code:: python
|
||||
|
||||
INSTALLED_APPS = [
|
||||
...
|
||||
"my_module.apps.MyModuleConfig",
|
||||
...
|
||||
]
|
||||
def myparser_consumer_declaration(sender, **kwargs):
|
||||
return {
|
||||
"parser": MyCustomParser,
|
||||
"weight": 0,
|
||||
"mime_types": {
|
||||
"application/pdf": ".pdf",
|
||||
"image/jpeg": ".jpg",
|
||||
}
|
||||
}
|
||||
|
||||
Order doesn't matter, but generally it's a good idea to place your module lower
|
||||
in the list so that you don't end up accidentally overriding project defaults
|
||||
somewhere.
|
||||
* ``parser`` is a reference to a class that extends ``DocumentParser``.
|
||||
|
||||
* ``weight`` is used whenever two or more parsers are able to parse a file: The parser with
|
||||
the higher weight wins. This can be used to override the parsers provided by
|
||||
paperless.
|
||||
|
||||
.. _extending-parsers-example:
|
||||
|
||||
An Example
|
||||
..........
|
||||
|
||||
The core Paperless functionality is based on this design, so if you want to see
|
||||
what a parser module should look like, have a look at `parsers.py`_,
|
||||
`signals.py`_, and `apps.py`_ in the `paperless_tesseract`_ module.
|
||||
|
||||
.. _parsers.py: https://github.com/the-paperless-project/paperless/blob/master/src/paperless_tesseract/parsers.py
|
||||
.. _signals.py: https://github.com/the-paperless-project/paperless/blob/master/src/paperless_tesseract/signals.py
|
||||
.. _apps.py: https://github.com/the-paperless-project/paperless/blob/master/src/paperless_tesseract/apps.py
|
||||
.. _paperless_tesseract: https://github.com/the-paperless-project/paperless/blob/master/src/paperless_tesseract/
|
||||
* ``mime_types`` is a dictionary. The keys are the mime types your parser supports and the value
|
||||
is the default file extension that paperless should use when storing files and serving them for
|
||||
download. We could guess that from the file extensions, but some mime types have many extensions
|
||||
associated with them and the python methods responsible for guessing the extension do not always
|
||||
return the same value.
|
||||
|
@ -73,7 +73,7 @@ in your browser and paperless has to do much less work to serve the data.
|
||||
|
||||
**Q:** *How do I install paperless-ng on Raspberry Pi?*
|
||||
|
||||
**A:** There is not docker image for ARM available. If you know how to build
|
||||
**A:** There is no docker image for ARM available. If you know how to build
|
||||
that automatically, I'm all ears. For now, you have to grab the latest release
|
||||
archive from the project page and build the image yourself. The release comes
|
||||
with the front end already compiled, so you don't have to do this on the Pi.
|
||||
|
Loading…
x
Reference in New Issue
Block a user