mirror of
				https://github.com/paperless-ngx/paperless-ngx.git
				synced 2025-11-03 03:16:10 -06:00 
			
		
		
		
	Add section for extending
This commit is contained in:
		
							
								
								
									
										104
									
								
								docs/extending.rst
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
										104
									
								
								docs/extending.rst
									
									
									
									
									
										Normal file
									
								
							@@ -0,0 +1,104 @@
 | 
				
			|||||||
 | 
					.. _extending:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Extending Paperless
 | 
				
			||||||
 | 
					===================
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					For the most part, Paperless is monolithic, so extending it is often best
 | 
				
			||||||
 | 
					managed by way of modifying the code directly and issuing a pull request on
 | 
				
			||||||
 | 
					`GitHub`_.  However, over time the project has been evolving to be a little
 | 
				
			||||||
 | 
					more "pluggable" so that users can write their own stuff that talks to it.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					.. _GitHub: https://github.com/danielquinn/paperless
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					.. _extending-parsers:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Parsers
 | 
				
			||||||
 | 
					-------
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					You can leverage Paperless' consumption model to have it consume files *other*
 | 
				
			||||||
 | 
					than ones handled by default like ``.pdf``, ``.jpg``, and ``.tiff``.  To do so,
 | 
				
			||||||
 | 
					you simply follow Django's convention of creating a new app, with a few key
 | 
				
			||||||
 | 
					requirements.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					.. _extending-parsers-parserspy:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					parsers.py
 | 
				
			||||||
 | 
					..........
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					In this file, you create a class that extends
 | 
				
			||||||
 | 
					``documents.parsers.DocumentParser`` and go about implementing the three
 | 
				
			||||||
 | 
					required methods:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* ``get_thumbnail()``: Returns the path to a file we can use as a thumbnail for
 | 
				
			||||||
 | 
					  this document.
 | 
				
			||||||
 | 
					* ``get_text()``: Returns the text from the document and only the text.
 | 
				
			||||||
 | 
					* ``get_date()``: If possible, this returns the date of the document, otherwise
 | 
				
			||||||
 | 
					  it should return ``None``.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					.. _extending-parsers-signalspy:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					signals.py
 | 
				
			||||||
 | 
					..........
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					At consumption time, Paperless emits a ``document_consumer_declaration``
 | 
				
			||||||
 | 
					signal which your module has to react to in order to let the consumer know
 | 
				
			||||||
 | 
					whether or not it's capable of handling a particular file.  Think of it like
 | 
				
			||||||
 | 
					this:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					1. Consumer finds a file in the consumption directory.
 | 
				
			||||||
 | 
					2. It asks all the available parsers: *"Hey, can you handle this file?"*
 | 
				
			||||||
 | 
					3. The first parser that says yes gets to handle the file.  The order in which
 | 
				
			||||||
 | 
					   the parsers are asked is handled by sorting ``INSTALLED_APPS`` in
 | 
				
			||||||
 | 
					   ``settings.py``.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					.. _extending-parsers-appspy:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					apps.py
 | 
				
			||||||
 | 
					.......
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					This is a standard Django file, but you'll need to add some code to it to
 | 
				
			||||||
 | 
					register your parser as being able to handle particular files.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					.. _extending-parsers-finally:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Finally
 | 
				
			||||||
 | 
					.......
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					The last step is to update ``settings.py`` to include your new module.
 | 
				
			||||||
 | 
					Eventually, this will be dynamic, but at the moment, you have to edit the
 | 
				
			||||||
 | 
					``INSTALLED_APPS`` section manually.  Simply add the path to your AppConfig to
 | 
				
			||||||
 | 
					the list like this:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					.. code:: python
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    INSTALLED_APPS = [
 | 
				
			||||||
 | 
					        ...
 | 
				
			||||||
 | 
					        "my_module.apps.MyModuleConfig",
 | 
				
			||||||
 | 
					        "paperless_tesseract.apps.PaperlessTesseractConfig",
 | 
				
			||||||
 | 
					        ...
 | 
				
			||||||
 | 
					    ]
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Note that we're placing our module *above* ``PaperlessTesseractConfig``.  This
 | 
				
			||||||
 | 
					is to ensure that if your module wants to handle any files typically handled by
 | 
				
			||||||
 | 
					the default module, yours will win instead.  If there's no conflict between
 | 
				
			||||||
 | 
					what your module does and the default, then order doesn't matter.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					.. _extending-parsers-example:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					An Example
 | 
				
			||||||
 | 
					..........
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					The core Paperless functionality is based on this design, so if you want to see
 | 
				
			||||||
 | 
					what a parser module should look like, have a look at `parsers.py`_,
 | 
				
			||||||
 | 
					`signals.py`_, and `apps.py`_ in the `paperless_tesseract`_ module.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					.. _parsers.py: https://github.com/danielquinn/paperless/blob/master/src/paperless_tesseract/parsers.py
 | 
				
			||||||
 | 
					.. _signals.py: https://github.com/danielquinn/paperless/blob/master/src/paperless_tesseract/signals.py
 | 
				
			||||||
 | 
					.. _apps.py: https://github.com/danielquinn/paperless/blob/master/src/paperless_tesseract/apps.py
 | 
				
			||||||
 | 
					.. _paperless_tesseract: https://github.com/danielquinn/paperless/blob/master/src/paperless_tesseract/
 | 
				
			||||||
@@ -40,6 +40,7 @@ Contents
 | 
				
			|||||||
   utilities
 | 
					   utilities
 | 
				
			||||||
   guesswork
 | 
					   guesswork
 | 
				
			||||||
   migrating
 | 
					   migrating
 | 
				
			||||||
 | 
					   extending
 | 
				
			||||||
   troubleshooting
 | 
					   troubleshooting
 | 
				
			||||||
   scanners
 | 
					   scanners
 | 
				
			||||||
   changelog
 | 
					   changelog
 | 
				
			||||||
 
 | 
				
			|||||||
		Reference in New Issue
	
	Block a user