Mirror of https://github.com/paperless-ngx/paperless-ngx.git (synced 2025-04-02 13:45:10 -05:00)

Commit 278f6da16a (parent 65816a434c): documentation.
@@ -119,8 +119,11 @@ Updating paperless without docker
 
 After grabbing the new release and unpacking the contents, do the following:
 
-1.  Update python requirements. Paperless uses
-    `Pipenv`_ for managing dependencies:
+1.  Update dependencies. New paperless versions may require additional
+    dependencies. The dependencies required are listed in the section about
+    :ref:`bare metal installations <setup-bare_metal>`.
+
+2.  Update Python requirements. If you use Pipenv, this is done with the following steps.
 
     .. code:: shell-session
 
@@ -132,14 +135,14 @@ After grabbing the new release and unpacking the contents, do the following:
     This creates a new virtual environment (or uses your existing environment)
     and installs all dependencies into it.
 
-2.  Collect static files.
+3.  Collect static files.
 
     .. code:: shell-session
 
         $ cd src
         $ pipenv run python3 manage.py collectstatic --clear
 
-3.  Migrate the database.
+4.  Migrate the database.
 
     .. code:: shell-session
 
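The post-upgrade steps in the two hunks above (collecting static files and migrating the database) can be scripted. What follows is a minimal sketch, not something shipped with paperless. `post_upgrade_steps` and `run_steps` are hypothetical names, `/opt/paperless` is only an example path, and the `migrate` invocation is an assumption based on Django's standard command, because the exact line is cut off in this diff. Dependency installation is left out for the same reason: its commands are not shown here.

```python
import subprocess
from pathlib import Path


def post_upgrade_steps(paperless_dir):
    """Return (description, working_dir, argv) for the post-upgrade steps.

    Hypothetical helper; it assumes a Pipenv-managed bare-metal install.
    """
    src = Path(paperless_dir) / "src"
    manage = ["pipenv", "run", "python3", "manage.py"]
    return [
        # --clear removes previously collected files before copying new ones.
        ("Collect static files", src, manage + ["collectstatic", "--clear"]),
        # Assumption: Django's standard "migrate" command; the exact line
        # is cut off in this diff.
        ("Migrate the database", src, manage + ["migrate"]),
    ]


def run_steps(steps, dry_run=True):
    """Print each step; only execute it when dry_run is False."""
    for description, cwd, argv in steps:
        print(f"{description}: (cd {cwd} && {' '.join(argv)})")
        if not dry_run:
            # check=True raises CalledProcessError and stops at the first failure.
            subprocess.run(argv, cwd=cwd, check=True)
```

By default `run_steps(post_upgrade_steps("/opt/paperless"))` only prints the commands; pass `dry_run=False` to execute them. `collectstatic` may ask for confirmation before clearing, and because `subprocess.run` inherits the terminal's standard input, you can answer that prompt interactively.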
@@ -153,14 +156,14 @@ Management utilities
 Paperless comes with some management commands that perform various maintenance
 tasks on your paperless instance. You can invoke these commands either by
 
-.. code:: bash
+.. code:: shell-session
 
     $ cd /path/to/paperless
     $ docker-compose run --rm webserver <command> <arguments>
 
 or
 
-.. code:: bash
+.. code:: shell-session
 
     $ cd /path/to/paperless/src
     $ pipenv run python manage.py <command> <arguments>
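The two invocation styles above differ only in the working directory and the command prefix. A small sketch that builds either form, assuming the directory layout used in the examples (`/path/to/paperless` for docker-compose, its `src` subdirectory for Pipenv). `management_invocation` is a hypothetical helper, and `showmigrations` in the test is simply Django's built-in command standing in for `<command>`.

```python
from pathlib import Path


def management_invocation(paperless_dir, command, *arguments, docker=True):
    """Return (working_dir, argv) for a paperless management command.

    Hypothetical helper mirroring the two documented invocation styles.
    """
    root = Path(paperless_dir)
    if docker:
        # docker-compose install: run the command in a throwaway
        # webserver container (--rm removes it afterwards).
        return root, ["docker-compose", "run", "--rm", "webserver", command, *arguments]
    # Bare-metal install: run Django's manage.py through Pipenv from src/.
    return root / "src", ["pipenv", "run", "python", "manage.py", command, *arguments]
```

To actually run a command, pass the result on as `subprocess.run(argv, cwd=cwd, check=True)`. Keeping `docker` keyword-only avoids mistaking it for one of the command's own arguments.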
@@ -366,7 +369,7 @@ is specified, the archiver will only process that document.
 .. note::
 
     Some documents will cause errors and cannot be converted into PDF/A documents,
-    such as encrypted PDF documents. The archiver will skip over these Documents
+    such as encrypted PDF documents. The archiver will skip over these documents
     each time it sees them.
 
 .. _utilities-encyption:
@@ -118,114 +118,80 @@ This will test and assemble everything and also build and tag a docker image.
 Extending Paperless
 ===================
 
-.. warning::
-
-    This section is not updated to paperless-ng yet.
-
-For the most part, Paperless is monolithic, so extending it is often best
-managed by way of modifying the code directly and issuing a pull request on
-`GitHub`_. However, over time the project has been evolving to be a little
-more "pluggable" so that users can write their own stuff that talks to it.
-
-.. _GitHub: https://github.com/the-paperless-project/paperless
-
-
-.. _extending-parsers:
-
-Parsers
--------
-
-You can leverage Paperless' consumption model to have it consume files *other*
-than ones handled by default like ``.pdf``, ``.jpg``, and ``.tiff``. To do so,
-you simply follow Django's convention of creating a new app, with a few key
-requirements.
-
-
-.. _extending-parsers-parserspy:
-
-parsers.py
-..........
-
-In this file, you create a class that extends
-``documents.parsers.DocumentParser`` and go about implementing the three
-required methods:
-
-* ``get_thumbnail()``: Returns the path to a file we can use as a thumbnail for
-  this document.
-* ``get_text()``: Returns the text from the document and only the text.
-* ``get_date()``: If possible, this returns the date of the document, otherwise
-  it should return ``None``.
-
-
-.. _extending-parsers-signalspy:
-
-signals.py
-..........
-
-At consumption time, Paperless emits a ``document_consumer_declaration``
-signal which your module has to react to in order to let the consumer know
-whether or not it's capable of handling a particular file. Think of it like
-this:
-
-1. Consumer finds a file in the consumption directory.
-2. It asks all the available parsers: *"Hey, can you handle this file?"*
-3. Each parser responds with either ``None`` meaning they can't handle the
-   file, or a dictionary in the following format:
-
-.. code:: python
-
-    {
-        "parser": <the class name>,
-        "weight": <an integer>
-    }
-
-The consumer compares the ``weight`` values from all respondents and uses the
-class with the highest value to consume the document. The default parser,
-``RasterisedDocumentParser`` has a weight of ``0``.
-
-
-.. _extending-parsers-appspy:
-
-apps.py
-.......
-
-This is a standard Django file, but you'll need to add some code to it to
-connect your parser to the ``document_consumer_declaration`` signal.
-
-
-.. _extending-parsers-finally:
-
-Finally
-.......
-
-The last step is to update ``settings.py`` to include your new module.
-Eventually, this will be dynamic, but at the moment, you have to edit the
-``INSTALLED_APPS`` section manually. Simply add the path to your AppConfig to
-the list like this:
-
-.. code:: python
-
-    INSTALLED_APPS = [
-        ...
-        "my_module.apps.MyModuleConfig",
-        ...
-    ]
-
-Order doesn't matter, but generally it's a good idea to place your module lower
-in the list so that you don't end up accidentally overriding project defaults
-somewhere.
-
-
-.. _extending-parsers-example:
-
-An Example
-..........
-
-The core Paperless functionality is based on this design, so if you want to see
-what a parser module should look like, have a look at `parsers.py`_,
-`signals.py`_, and `apps.py`_ in the `paperless_tesseract`_ module.
-
-.. _parsers.py: https://github.com/the-paperless-project/paperless/blob/master/src/paperless_tesseract/parsers.py
-.. _signals.py: https://github.com/the-paperless-project/paperless/blob/master/src/paperless_tesseract/signals.py
-.. _apps.py: https://github.com/the-paperless-project/paperless/blob/master/src/paperless_tesseract/apps.py
-.. _paperless_tesseract: https://github.com/the-paperless-project/paperless/blob/master/src/paperless_tesseract/
+Paperless does not have a fancy plugin system and probably never will. However,
+some parts of the application have been designed to allow easy integration of additional
+features without any modification to the base code.
+
+Making custom parsers
+---------------------
+
+Paperless uses parsers to add documents. A parser is responsible for:
+
+*   Retrieving the content from the original
+*   Creating a thumbnail
+*   Optional: Retrieving a created date from the original
+*   Optional: Creating an archived document from the original
+
+Custom parsers can be added to paperless to support more file types. In order to do that,
+you need to write the parser itself and announce its existence to paperless.
+
+The parser itself must extend ``documents.parsers.DocumentParser`` and must implement the
+methods ``parse`` and ``get_thumbnail``. You can provide your own implementation to
+``get_date`` if you don't want to rely on paperless' default date guessing mechanisms.
+
+.. code:: python
+
+    class MyCustomParser(DocumentParser):
+
+        def parse(self, document_path, mime_type):
+            # This method does not return anything. Rather, you should assign
+            # whatever you got from the document to the following fields:
+
+            # The content of the document.
+            self.text = "content"
+
+            # Optional: path to a PDF document that you created from the original.
+            self.archive_path = os.path.join(self.tempdir, "archived.pdf")
+
+            # Optional: "created" date of the document.
+            self.date = get_created_from_metadata(document_path)
+
+        def get_thumbnail(self, document_path, mime_type):
+            # This should return the path to a thumbnail you created for this
+            # document.
+            return os.path.join(self.tempdir, "thumb.png")
+
+If you encounter any issues during parsing, raise a ``documents.parsers.ParseError``.
+
+The ``self.tempdir`` directory is a temporary directory that is guaranteed to be empty
+and removed after consumption has finished. You can use that directory to store any
+intermediate files and also use it to store the thumbnail / archived document.
+
+After that, you need to announce your parser to paperless. You need to connect a
+handler to the ``document_consumer_declaration`` signal. Have a look at the file
+``src/paperless_tesseract/apps.py`` to see how that's done. The handler is a function
+that returns information about your parser:
+
+.. code:: python
+
+    def myparser_consumer_declaration(sender, **kwargs):
+        return {
+            "parser": MyCustomParser,
+            "weight": 0,
+            "mime_types": {
+                "application/pdf": ".pdf",
+                "image/jpeg": ".jpg",
+            }
+        }
+
+*   ``parser`` is a reference to a class that extends ``DocumentParser``.
+
+*   ``weight`` is used whenever two or more parsers are able to parse a file: the parser with
+    the higher weight wins. This can be used to override the parsers provided by
+    paperless.
+
+*   ``mime_types`` is a dictionary. The keys are the MIME types your parser supports and the value
+    is the default file extension that paperless should use when storing files and serving them for
+    download. Paperless could guess the extension instead, but some MIME types have many extensions
+    associated with them and the Python functions responsible for guessing the extension do not always
+    return the same value.
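To see the ``parse`` / ``get_thumbnail`` contract from this hunk in action outside paperless, here is a self-contained sketch. It assumes locally defined stand-ins for `DocumentParser` and `ParseError`. In paperless you would import the real ones from `documents.parsers`; the stand-in base class only mimics the `text`, `archive_path`, `date` and `tempdir` attributes described above and is not the real class. `PlainTextParser` is a hypothetical parser for plain-text files. Its thumbnail is a 1x1 white PNG built with `zlib` and `struct`, so that a valid image file exists without extra dependencies.

```python
import os
import shutil
import struct
import tempfile
import zlib


# Stand-ins so the sketch runs outside paperless. In a real parser, import
# DocumentParser and ParseError from documents.parsers instead; these only
# mimic the attributes described above and are NOT paperless' classes.
class ParseError(Exception):
    pass


class DocumentParser:
    def __init__(self):
        # Empty scratch directory, like self.tempdir in paperless.
        self.tempdir = tempfile.mkdtemp()
        self.text = None
        self.archive_path = None
        self.date = None

    def cleanup(self):
        # Paperless removes the directory after consumption; mimic that.
        shutil.rmtree(self.tempdir, ignore_errors=True)


def _png_1x1_white():
    """Build a valid 1x1 8-bit grayscale PNG containing one white pixel."""
    def chunk(kind, data):
        crc = zlib.crc32(kind + data) & 0xFFFFFFFF
        return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)

    # width, height, bit depth, color type 0 (grayscale), then
    # compression, filter and interlace methods, all 0.
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\xff")  # filter byte 0, then one white pixel
    return (b"\x89PNG\r\n\x1a\n" + chunk(b"IHDR", ihdr)
            + chunk(b"IDAT", idat) + chunk(b"IEND", b""))


class PlainTextParser(DocumentParser):
    """Hypothetical parser for text/plain documents."""

    def parse(self, document_path, mime_type):
        try:
            with open(document_path, encoding="utf-8") as f:
                self.text = f.read()
        except (OSError, UnicodeDecodeError) as e:
            # Report problems by raising ParseError, as the docs ask.
            raise ParseError(f"Cannot read {document_path}: {e}") from e
        # archive_path and date stay None: both fields are optional.

    def get_thumbnail(self, document_path, mime_type):
        # A real parser would render the document; this writes a placeholder
        # image into tempdir and returns its path.
        thumb = os.path.join(self.tempdir, "thumb.png")
        with open(thumb, "wb") as f:
            f.write(_png_1x1_white())
        return thumb
```

Writing every intermediate file into `tempdir` is what lets cleanup stay trivial: removing that one directory disposes of the thumbnail and anything else the parser created.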
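The ``weight`` and ``mime_types`` rules documented above can be illustrated with a sketch of how declarations might be compared. This is not paperless' consumer code. `choose_parser`, `builtin_declaration` and the two placeholder classes are hypothetical, and only the shape of the returned dictionary comes from the documentation. In this sketch a parser qualifies if it lists the file's MIME type, the highest weight among qualifying parsers wins, and the winner's `mime_types` entry supplies the default extension.

```python
# Hypothetical illustration only: this is NOT paperless' consumer code.
# It applies the documented weight / mime_types rules to handler results.


class BuiltinPdfParser:  # placeholder for a built-in DocumentParser subclass
    pass


class MyCustomParser:  # placeholder for your DocumentParser subclass
    pass


def builtin_declaration(sender, **kwargs):
    return {
        "parser": BuiltinPdfParser,
        "weight": 0,
        "mime_types": {"application/pdf": ".pdf"},
    }


def myparser_consumer_declaration(sender, **kwargs):
    return {
        "parser": MyCustomParser,
        "weight": 10,  # higher than the built-in parser, so it wins for PDFs
        "mime_types": {
            "application/pdf": ".pdf",
            "image/jpeg": ".jpg",
        },
    }


def choose_parser(declarations, mime_type):
    """Return (parser_class, default_extension), or None if nobody qualifies."""
    candidates = [d for d in declarations if d and mime_type in d["mime_types"]]
    if not candidates:
        return None
    # max() keeps the first of several equally weighted declarations.
    best = max(candidates, key=lambda d: d["weight"])
    return best["parser"], best["mime_types"][mime_type]


declarations = [
    handler(sender=None)
    for handler in (builtin_declaration, myparser_consumer_declaration)
]
```

Here the custom parser's weight of 10 overrides the built-in PDF parser, while a MIME type that nobody declares yields `None`. The tie-breaking rule (first declaration wins) is a choice of this sketch. Declaring the extension explicitly also sidesteps guessing: Python's `mimetypes.guess_all_extensions("image/jpeg")` typically returns several candidates.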
@@ -73,7 +73,7 @@ in your browser and paperless has to do much less work to serve the data.
 
 **Q:** *How do I install paperless-ng on Raspberry Pi?*
 
-**A:** There is not docker image for ARM available. If you know how to build
+**A:** There is no docker image for ARM available. If you know how to build
 that automatically, I'm all ears. For now, you have to grab the latest release
 archive from the project page and build the image yourself. The release comes
 with the front end already compiled, so you don't have to do this on the Pi.