Pre Consume Script Examples
tooomm edited this page 2025-01-05 22:35:48 +01:00

This wiki page collects example pre-consume scripts contributed by the community. As always, exercise caution with scripts from the internet and make sure you understand the code before running it.

Removing Blank Pages

Warning

This script modifies the original file!

Note

Original source: https://github.com/paperless-ngx/paperless-ngx/discussions/668#discussioncomment-3936343, with a slight update (warnings are suppressed for Apple PDFs)

#!/bin/bash
#set -x -e -o pipefail
set -e -o pipefail
export LC_ALL=C

#IN="$1"
IN="$DOCUMENT_WORKING_PATH"

# Check for PDF format
TYPE=$(file -b "$IN")

if [ "${TYPE%%,*}" != "PDF document" ]; then
  >&2 echo "Skipping $IN - non PDF [$TYPE]."
  exit 0
fi

# PDF file - proceed

#PAGES=$(pdfinfo "$IN" | grep ^Pages: | tr -dc '0-9')
PAGES=$(pdfinfo "$IN" | awk '/^Pages:/ {print $2}')

>&2 echo "Total pages $PAGES"


# Threshold for HP scanners
# THRESHOLD=1
# Threshold for Lexmark MC2425
THRESHOLD=0.8


non_blank() {
  for i in $(seq 1 "$PAGES") ; do
    # ink_cov reports the C, M, Y and K coverage of the page; sum the four values
    PERCENT=$(gs -o - -dFirstPage="${i}" -dLastPage="${i}" -sDEVICE=ink_cov "${IN}" | grep CMYK | awk 'BEGIN { sum=0; } {sum += $1 + $2 + $3 + $4;} END { printf "%.5f\n", sum }')
    >&2 echo -n "Color-sum in page $i is $PERCENT: "
    if awk "BEGIN { exit !($PERCENT > $THRESHOLD) }"; then
      echo $i
      >&2 echo "Page added to document"
    else
      >&2 echo "Page removed from document"
    fi
  done
}

NON_BLANK=$(non_blank)

if [ -n "$NON_BLANK" ]; then
  NON_BLANK=$(echo $NON_BLANK  | tr ' ' ",")
  qpdf "$IN" --warning-exit-0 --replace-input --pages . $NON_BLANK --
fi
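
The page-selection logic can be tried out without Ghostscript or qpdf installed. The following is a minimal sketch, not part of the contributed script: `coverage_sum`, `is_non_blank` and `page_list` are hypothetical helper names, and the sample line assumes that the `ink_cov` device prints four CMYK coverage values followed by `CMYK OK`, which is what the `grep CMYK` in the script relies on.

```shell
#!/bin/bash
# Illustrative helpers only; the names below are not part of the script above.
export LC_ALL=C

# Sum the four CMYK coverage values from ink_cov-style lines read on stdin.
coverage_sum() {
  awk '/CMYK/ { sum += $1 + $2 + $3 + $4 } END { printf "%.5f\n", sum }'
}

# Succeed (exit 0) when the coverage sum is greater than the threshold.
is_non_blank() {
  awk -v sum="$1" -v threshold="$2" 'BEGIN { exit !(sum > threshold) }'
}

# Join page numbers with commas, the form qpdf expects after "--pages .".
page_list() {
  local IFS=,
  echo "$*"
}

# Assumed sample of one ink_cov output line for a nearly empty page.
SAMPLE=' 0.00000  0.00000  0.00000  0.02230 CMYK OK'
SUM=$(echo "$SAMPLE" | coverage_sum)

if is_non_blank "$SUM" 0.8; then
  echo "keep"
else
  echo "drop"   # this branch runs: 0.02230 is below the 0.8 threshold
fi

page_list 1 3 4   # prints 1,3,4
```

Feeding real `gs ... -sDEVICE=ink_cov` output for a few known blank and non-blank scans into `coverage_sum` is one way to pick a `THRESHOLD` for a given scanner before letting the script modify any files.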

Cleaning with qpdf

  • ⚠️ This script modifies the original file
  • Useful for correcting certain structural issues with PDFs

#!/usr/bin/env bash

qpdf --replace-input "$DOCUMENT_WORKING_PATH"
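
A slightly more defensive variant is sketched below. It is an illustration under stated assumptions rather than the contributed script: the `is_pdf` helper is hypothetical and only checks for the `%PDF-` magic bytes at the very start of the file, and `--warning-exit-0` is taken from the blank-page script above so that qpdf warnings alone do not produce a non-zero exit code.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeed only if the file starts with the PDF magic bytes.
# Assumption: PDFs with leading junk before the header are treated as non-PDF
# and are therefore skipped.
is_pdf() {
  [ "$(head -c 5 "$1" 2>/dev/null)" = "%PDF-" ]
}

# Only act when a working path has been provided and it points at a PDF.
if [ -n "${DOCUMENT_WORKING_PATH:-}" ] && is_pdf "$DOCUMENT_WORKING_PATH"; then
  # --warning-exit-0 makes qpdf exit 0 when it only emitted warnings.
  qpdf --warning-exit-0 --replace-input "$DOCUMENT_WORKING_PATH"
fi
```

Skipping non-PDF input keeps qpdf from failing on images or office documents, which also pass through the pre-consume hook.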