When parsing dates from the document text or filenames, correctly handle
value errors indicating broken dates. Newly added tests ensure that this
handling works properly.
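
A minimal sketch of the kind of handling described above, assuming the value errors are Python `ValueError` exceptions raised when a matched string is not a real calendar date. The pattern and the `parse_date` name are hypothetical and not taken from the project:

```python
import re
from datetime import datetime

# Hypothetical pattern for illustration: dd.mm.yyyy or dd/mm/yyyy.
DATE_PATTERN = re.compile(r"\b(\d{1,2})[./](\d{1,2})[./](\d{4})\b")


def parse_date(text):
    """Return the first valid date found in text, or None if there is none."""
    for match in DATE_PATTERN.finditer(text):
        day, month, year = (int(group) for group in match.groups())
        try:
            return datetime(year, month, day)
        except ValueError:
            # The string looked like a date but is not one (e.g. 31.02.2017),
            # so skip it and try the next candidate instead of crashing.
            continue
    return None
```

With this approach a broken date such as `31.02.2017` is skipped and the next valid candidate in the text is used.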
Some tests had differing outcomes depending on the version of Tesseract
installed on the test system. This led to a bunch of false test
failures, which led to people (including me) just ignoring the Travis
results.
This commit removes those tests, and while it reduces our coverage, at
least the results are predictable.
It turns out that the Lorem ipsum text in the sample files was confusing
the language guesser, causing it to think the file was in Catalan and not
English or German.
Originally the test wrote scratch data inside the repo dir, which meant
manual cleanup. Now it writes to `/tmp/paperless-tests-<random-string>`
and cleans up after itself.
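
One way to get this behaviour, sketched with the standard library only. The class and test names are hypothetical, and the actual test may be organised differently; `tempfile.mkdtemp` creates the directory under the system temp location, which is usually `/tmp` on Linux:

```python
import os
import shutil
import tempfile
import unittest


class ScratchDirTestCase(unittest.TestCase):  # hypothetical name

    def setUp(self):
        # Creates e.g. /tmp/paperless-tests-<random-string>
        self.scratch = tempfile.mkdtemp(prefix="paperless-tests-")

    def tearDown(self):
        # Remove the scratch directory even if the test failed.
        shutil.rmtree(self.scratch, ignore_errors=True)

    def test_writes_stay_in_scratch(self):
        path = os.path.join(self.scratch, "sample.txt")
        with open(path, "w") as handle:
            handle.write("scratch data")
        self.assertTrue(os.path.exists(path))
```

Because `tearDown` runs after every test, nothing is left behind in the repo directory or in the temp location.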