mirror of
https://github.com/paperless-ngx/paperless-ngx.git
synced 2025-04-02 13:45:10 -05:00
475 lines
16 KiB
Markdown
475 lines
16 KiB
Markdown
# Development
|
|
|
|
This section describes the steps you need to take to start development
|
|
on paperless-ngx.
|
|
|
|
Check out the source from github. The repository is organized in the
|
|
following way:
|
|
|
|
- `main` always represents the latest release and will only see
|
|
changes when a new release is made.
|
|
- `dev` contains the code that will be in the next release.
|
|
- `feature-X` contain bigger changes that will be in some release, but
|
|
not necessarily the next one.
|
|
|
|
When making functional changes to paperless, _always_ make your changes
|
|
on the `dev` branch.
|
|
|
|
Apart from that, the folder structure is as follows:
|
|
|
|
- `docs/` - Documentation.
|
|
- `src-ui/` - Code of the front end.
|
|
- `src/` - Code of the back end.
|
|
- `scripts/` - Various scripts that help with different parts of
|
|
development.
|
|
- `docker/` - Files required to build the docker image.
|
|
|
|
## Contributing to Paperless
|
|
|
|
Maybe you've been using Paperless for a while and want to add a feature
|
|
or two, or maybe you've come across a bug that you have some ideas how
|
|
to solve. The beauty of open source software is that you can see what's
|
|
wrong and help to get it fixed for everyone!
|
|
|
|
Before contributing please review our [code of
|
|
conduct](https://github.com/paperless-ngx/paperless-ngx/blob/main/CODE_OF_CONDUCT.md)
|
|
and other important information in the [contributing
|
|
guidelines](https://github.com/paperless-ngx/paperless-ngx/blob/main/CONTRIBUTING.md).
|
|
|
|
## Code formatting with pre-commit Hooks
|
|
|
|
To ensure a consistent style and formatting across the project source,
|
|
the project utilizes a Git [`pre-commit`](https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks)
|
|
hook to perform some formatting and linting before a commit is allowed.
|
|
That way, everyone uses the same style and some common issues can be caught
|
|
early on. See below for installation instructions.
|
|
|
|
Once installed, hooks will run when you commit. If the formatting isn't
|
|
quite right or a linter catches something, the commit will be rejected.
|
|
You'll need to look at the output and fix the issue. Some hooks, such
|
|
as the Python formatting tool `black`, will format failing
|
|
files, so all you need to do is `git add` those files again
|
|
and retry your commit.
|
|
|
|
## Initial setup and first start
|
|
|
|
After you forked and cloned the code from github you need to perform a
|
|
first-time setup. To do the setup you need to perform the steps from the
|
|
following chapters in a certain order:
|
|
|
|
1. Install prerequisites + pipenv as mentioned in
|
|
[Bare metal route](/setup#bare_metal)
|
|
|
|
2. Copy `paperless.conf.example` to `paperless.conf` and enable debug
|
|
mode.
|
|
|
|
3. Install the Angular CLI interface:
|
|
|
|
```shell-session
|
|
$ npm install -g @angular/cli
|
|
```
|
|
|
|
4. Install pre-commit hooks
|
|
|
|
```shell-session
|
|
pre-commit install
|
|
```
|
|
|
|
5. Create `consume` and `media` folders in the cloned root folder.
|
|
|
|
```shell-session
|
|
mkdir -p consume media
|
|
```
|
|
|
|
6. You can now either ...
|
|
|
|
- install redis or
|
|
|
|
- use the included scripts/start-services.sh to use docker to fire
|
|
up a redis instance (and some other services such as tika,
|
|
gotenberg and a database server) or
|
|
|
|
- spin up a bare redis container
|
|
|
|
```shell-session
|
|
docker run -d -p 6379:6379 --restart unless-stopped redis:latest
|
|
```
|
|
|
|
7. Install the python dependencies by performing in the src/ directory.
|
|
|
|
```shell-session
|
|
pipenv install --dev
|
|
```
|
|
|
|
!!! note
|
|
|
|
Make sure you're using python 3.10.x or lower. Otherwise you might
|
|
get issues with building dependencies. You can use
|
|
[pyenv](https://github.com/pyenv/pyenv) to install a specific
|
|
python version.
|
|
|
|
8. Generate the static UI so you can perform a login to get session
|
|
that is required for frontend development (this needs to be done one
|
|
time only). From src-ui directory:
|
|
|
|
```shell-session
|
|
npm install .
|
|
./node_modules/.bin/ng build --configuration production
|
|
```
|
|
|
|
9. Apply migrations and create a superuser for your dev instance:
|
|
|
|
```shell-session
|
|
python3 manage.py migrate
|
|
python3 manage.py createsuperuser
|
|
```
|
|
|
|
10. Now spin up the dev backend. Depending on which part of paperless
|
|
you're developing for, you need to have some or all of them
|
|
running.
|
|
|
|
```shell-session
|
|
python3 manage.py runserver & python3 manage.py document_consumer & celery --app paperless worker
|
|
```
|
|
|
|
11. Login with the superuser credentials provided in step 8 at
|
|
`http://localhost:8000` to create a session that enables you to use
|
|
the backend.
|
|
|
|
Backend development environment is now ready, to start Frontend
|
|
development go to `/src-ui` and run `ng serve`. From there you can use
|
|
`http://localhost:4200` for a preview.
|
|
|
|
## Back end development
|
|
|
|
The backend is a [Django](https://www.djangoproject.com/) application. PyCharm works well for development,
|
|
but you can use whatever you want.
|
|
|
|
Configure the IDE to use the src/ folder as the base source folder.
|
|
Configure the following launch configurations in your IDE:
|
|
|
|
- `python3 manage.py runserver`
|
|
- `celery --app paperless worker`
|
|
- `python3 manage.py document_consumer`
|
|
|
|
To start them all:
|
|
|
|
```shell-session
|
|
python3 manage.py runserver & python3 manage.py document_consumer & celery --app paperless worker
|
|
```
|
|
|
|
Testing and code style:
|
|
|
|
- Run `pytest` in the `src/` directory to execute all tests. This also
|
|
generates a HTML coverage report. When runnings test, paperless.conf
|
|
is loaded as well. However: the tests rely on the default
|
|
configuration. This is not ideal. But for now, make sure no settings
|
|
except for DEBUG are overridden when testing.
|
|
|
|
- Coding style is enforced by the Git pre-commit hooks. These will
|
|
ensure your code is formatted and do some linting when you do a `git commit`.
|
|
|
|
- You can also run `black` manually to format your code
|
|
|
|
- The `pre-commit` hooks will modify files and interact with each other.
|
|
It may take a couple of `git add`, `git commit` cycle to satisfy them.
|
|
|
|
!!! note
|
|
|
|
The line length rule E501 is generally useful for getting multiple
|
|
source files next to each other on the screen. However, in some
|
|
cases, its just not possible to make some lines fit, especially
|
|
complicated IF cases. Append `# noqa: E501` to disable this check
|
|
for certain lines.
|
|
|
|
## Front end development
|
|
|
|
The front end is built using Angular. In order to get started, you need
|
|
`npm`. Install the Angular CLI interface with
|
|
|
|
```shell-session
|
|
$ npm install -g @angular/cli
|
|
```
|
|
|
|
and make sure that it's on your path. Next, in the src-ui/ directory,
|
|
install the required dependencies of the project.
|
|
|
|
```shell-session
|
|
$ npm install
|
|
```
|
|
|
|
You can launch a development server by running
|
|
|
|
```shell-session
|
|
$ ng serve
|
|
```
|
|
|
|
This will automatically update whenever you save. However, in-place
|
|
compilation might fail on syntax errors, in which case you need to
|
|
restart it.
|
|
|
|
By default, the development server is available on
|
|
`http://localhost:4200/` and is configured to access the API at
|
|
`http://localhost:8000/api/`, which is the default of the backend. If
|
|
you enabled DEBUG on the back end, several security overrides for
|
|
allowed hosts, CORS and X-Frame-Options are in place so that the front
|
|
end behaves exactly as in production. This also relies on you being
|
|
logged into the back end. Without a valid session, The front end will
|
|
simply not work.
|
|
|
|
Testing and code style:
|
|
|
|
- The frontend code (.ts, .html, .scss) use `prettier` for code
|
|
formatting via the Git `pre-commit` hooks which run automatically on
|
|
commit. See
|
|
[above](#code-formatting-with-pre-commit-hooks) for installation. You can also run this via cli with a
|
|
command such as
|
|
|
|
```shell-session
|
|
$ git ls-files -- '*.ts' | xargs pre-commit run prettier --files
|
|
```
|
|
|
|
- Frontend testing uses jest and cypress. There is currently a need
|
|
for significantly more frontend tests. Unit tests and e2e tests,
|
|
respectively, can be run non-interactively with:
|
|
|
|
```shell-session
|
|
$ ng test
|
|
$ npm run e2e:ci
|
|
```
|
|
|
|
Cypress also includes a UI which can be run from within the `src-ui`
|
|
directory with
|
|
|
|
```shell-session
|
|
$ ./node_modules/.bin/cypress open
|
|
```
|
|
|
|
In order to build the front end and serve it as part of django, execute
|
|
|
|
```shell-session
|
|
$ ng build --configuration production
|
|
```
|
|
|
|
This will build the front end and put it in a location from which the
|
|
Django server will serve it as static content. This way, you can verify
|
|
that authentication is working.
|
|
|
|
## Localization
|
|
|
|
Paperless is available in many different languages. Since paperless
|
|
consists both of a django application and an Angular front end, both
|
|
these parts have to be translated separately.
|
|
|
|
### Front end localization
|
|
|
|
- The Angular front end does localization according to the [Angular
|
|
documentation](https://angular.io/guide/i18n).
|
|
- The source language of the project is "en_US".
|
|
- The source strings end up in the file "src-ui/messages.xlf".
|
|
- The translated strings need to be placed in the
|
|
"src-ui/src/locale/" folder.
|
|
- In order to extract added or changed strings from the source files,
|
|
call `ng xi18n --ivy`.
|
|
|
|
Adding new languages requires adding the translated files in the
|
|
"src-ui/src/locale/" folder and adjusting a couple files.
|
|
|
|
1. Adjust "src-ui/angular.json":
|
|
|
|
```json
|
|
"i18n": {
|
|
"sourceLocale": "en-US",
|
|
"locales": {
|
|
"de": "src/locale/messages.de.xlf",
|
|
"nl-NL": "src/locale/messages.nl_NL.xlf",
|
|
"fr": "src/locale/messages.fr.xlf",
|
|
"en-GB": "src/locale/messages.en_GB.xlf",
|
|
"pt-BR": "src/locale/messages.pt_BR.xlf",
|
|
"language-code": "language-file"
|
|
}
|
|
}
|
|
```
|
|
|
|
2. Add the language to the available options in
|
|
"src-ui/src/app/services/settings.service.ts":
|
|
|
|
```typescript
|
|
getLanguageOptions(): LanguageOption[] {
|
|
return [
|
|
{code: "en-us", name: $localize`English (US)`, englishName: "English (US)", dateInputFormat: "mm/dd/yyyy"},
|
|
{code: "en-gb", name: $localize`English (GB)`, englishName: "English (GB)", dateInputFormat: "dd/mm/yyyy"},
|
|
{code: "de", name: $localize`German`, englishName: "German", dateInputFormat: "dd.mm.yyyy"},
|
|
{code: "nl", name: $localize`Dutch`, englishName: "Dutch", dateInputFormat: "dd-mm-yyyy"},
|
|
{code: "fr", name: $localize`French`, englishName: "French", dateInputFormat: "dd/mm/yyyy"},
|
|
{code: "pt-br", name: $localize`Portuguese (Brazil)`, englishName: "Portuguese (Brazil)", dateInputFormat: "dd/mm/yyyy"}
|
|
// Add your new language here
|
|
]
|
|
}
|
|
```
|
|
|
|
`dateInputFormat` is a special string that defines the behavior of
|
|
the date input fields and absolutely needs to contain "dd", "mm"
|
|
and "yyyy".
|
|
|
|
3. Import and register the Angular data for this locale in
|
|
"src-ui/src/app/app.module.ts":
|
|
|
|
```typescript
|
|
import localeDe from '@angular/common/locales/de'
|
|
registerLocaleData(localeDe)
|
|
```
|
|
|
|
### Back end localization
|
|
|
|
A majority of the strings that appear in the back end appear only when
|
|
the admin is used. However, some of these are still shown on the front
|
|
end (such as error messages).
|
|
|
|
- The django application does localization according to the [django
|
|
documentation](https://docs.djangoproject.com/en/3.1/topics/i18n/translation/).
|
|
- The source language of the project is "en_US".
|
|
- Localization files end up in the folder "src/locale/".
|
|
- In order to extract strings from the application, call
|
|
`python3 manage.py makemessages -l en_US`. This is important after
|
|
making changes to translatable strings.
|
|
- The message files need to be compiled for them to show up in the
|
|
application. Call `python3 manage.py compilemessages` to do this.
|
|
The generated files don't get committed into git, since these are
|
|
derived artifacts. The build pipeline takes care of executing this
|
|
command.
|
|
|
|
Adding new languages requires adding the translated files in the
|
|
"src/locale/" folder and adjusting the file
|
|
"src/paperless/settings.py" to include the new language:
|
|
|
|
```python
|
|
LANGUAGES = [
|
|
("en-us", _("English (US)")),
|
|
("en-gb", _("English (GB)")),
|
|
("de", _("German")),
|
|
("nl-nl", _("Dutch")),
|
|
("fr", _("French")),
|
|
("pt-br", _("Portuguese (Brazil)")),
|
|
# Add language here.
|
|
]
|
|
```
|
|
|
|
## Building the documentation
|
|
|
|
The documentation is built using material-mkdocs, see their [documentation](https://squidfunk.github.io/mkdocs-material/reference/).
|
|
If you want to build the documentation locally, this is how you do it:
|
|
|
|
1. Install python dependencies.
|
|
|
|
```shell-session
|
|
$ cd /path/to/paperless
|
|
$ pipenv install --dev
|
|
```
|
|
|
|
2. Build the documentation
|
|
|
|
```shell-session
|
|
$ cd /path/to/paperless
|
|
$ pipenv mkdocs build --config-file mkdocs.yml
|
|
```
|
|
|
|
## Building the Docker image
|
|
|
|
The docker image is primarily built by the GitHub actions workflow, but
|
|
it can be faster when developing to build and tag an image locally.
|
|
|
|
To provide the build arguments automatically, build the image using the
|
|
helper script `build-docker-image.sh`.
|
|
|
|
Building the docker image from source:
|
|
|
|
```shell-session
|
|
./build-docker-image.sh Dockerfile -t <your-tag>
|
|
```
|
|
|
|
## Extending Paperless
|
|
|
|
Paperless does not have any fancy plugin systems and will probably never
|
|
have. However, some parts of the application have been designed to allow
|
|
easy integration of additional features without any modification to the
|
|
base code.
|
|
|
|
### Making custom parsers
|
|
|
|
Paperless uses parsers to add documents to paperless. A parser is
|
|
responsible for:
|
|
|
|
- Retrieve the content from the original
|
|
- Create a thumbnail
|
|
- Optional: Retrieve a created date from the original
|
|
- Optional: Create an archived document from the original
|
|
|
|
Custom parsers can be added to paperless to support more file types. In
|
|
order to do that, you need to write the parser itself and announce its
|
|
existence to paperless.
|
|
|
|
The parser itself must extend `documents.parsers.DocumentParser` and
|
|
must implement the methods `parse` and `get_thumbnail`. You can provide
|
|
your own implementation to `get_date` if you don't want to rely on
|
|
paperless' default date guessing mechanisms.
|
|
|
|
```python
|
|
class MyCustomParser(DocumentParser):
|
|
|
|
def parse(self, document_path, mime_type):
|
|
# This method does not return anything. Rather, you should assign
|
|
# whatever you got from the document to the following fields:
|
|
|
|
# The content of the document.
|
|
self.text = "content"
|
|
|
|
# Optional: path to a PDF document that you created from the original.
|
|
self.archive_path = os.path.join(self.tempdir, "archived.pdf")
|
|
|
|
# Optional: "created" date of the document.
|
|
self.date = get_created_from_metadata(document_path)
|
|
|
|
def get_thumbnail(self, document_path, mime_type):
|
|
# This should return the path to a thumbnail you created for this
|
|
# document.
|
|
return os.path.join(self.tempdir, "thumb.webp")
|
|
```
|
|
|
|
If you encounter any issues during parsing, raise a
|
|
`documents.parsers.ParseError`.
|
|
|
|
The `self.tempdir` directory is a temporary directory that is guaranteed
|
|
to be empty and removed after consumption finished. You can use that
|
|
directory to store any intermediate files and also use it to store the
|
|
thumbnail / archived document.
|
|
|
|
After that, you need to announce your parser to paperless. You need to
|
|
connect a handler to the `document_consumer_declaration` signal. Have a
|
|
look in the file `src/paperless_tesseract/apps.py` on how that's done.
|
|
The handler is a method that returns information about your parser:
|
|
|
|
```python
|
|
def myparser_consumer_declaration(sender, **kwargs):
|
|
return {
|
|
"parser": MyCustomParser,
|
|
"weight": 0,
|
|
"mime_types": {
|
|
"application/pdf": ".pdf",
|
|
"image/jpeg": ".jpg",
|
|
}
|
|
}
|
|
```
|
|
|
|
- `parser` is a reference to a class that extends `DocumentParser`.
|
|
- `weight` is used whenever two or more parsers are able to parse a
|
|
file: The parser with the higher weight wins. This can be used to
|
|
override the parsers provided by paperless.
|
|
- `mime_types` is a dictionary. The keys are the mime types your
|
|
parser supports and the value is the default file extension that
|
|
paperless should use when storing files and serving them for
|
|
download. We could guess that from the file extensions, but some
|
|
mime types have many extensions associated with them and the python
|
|
methods responsible for guessing the extension do not always return
|
|
the same value.
|