Advanced topics — Paperless-ng 1.5.0 documentation (2024)

Paperless offers a couple of features that automate certain tasks and make your life easier.

Matching tags, correspondents and document types

Paperless will compare the matching algorithms defined by every tag and correspondent already set in your database to see if they apply to the text in a document. In other words, if you defined a tag called Home Utility that had a match property of bc hydro and a matching_algorithm of literal, Paperless will automatically tag your newly consumed document with your Home Utility tag so long as the text bc hydro appears somewhere in the body of the document.

The matching logic is quite powerful and supports searching the text of your document with different algorithms, so some experimentation may be necessary to get things right.

In order to have a tag, correspondent or type assigned automatically to newly consumed documents, assign a match and matching algorithm using the web interface. These settings define when to assign correspondents, tags and types to documents.

The following algorithms are available:

  • Any: Looks for any occurrence of any word provided in match in the PDF. If you define the match as Bank1 Bank2, it will match documents containing either of these terms.

  • All: Requires that every word provided appears in the PDF, though not necessarily in the order provided.

  • Literal: Matches only if the match appears exactly as provided in the PDF.

  • Regular expression: Parses the match as a regular expression and tries to find a match within the document.

  • Fuzzy match: Not documented here; see the source code for details.

  • Auto: Tries to automatically match new documents. This does not require you to set a match. See the notes below.

When using the “any” or “all” matching algorithms, you can search for terms that consist of multiple words by enclosing them in double quotes. For example, defining a match text of "Bank of America" BofA using the “any” algorithm will match documents that contain either “Bank of America” or “BofA”, but will not match documents containing “Bank of South America”.
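The rules above can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not Paperless's actual implementation: the function names are hypothetical, and the whole-word, case-insensitive comparison is an assumption that the text above does not confirm.

```python
import re


def split_terms(match):
    """Split a match string into terms; double-quoted phrases stay together."""
    pairs = re.findall(r'"([^"]+)"|(\S+)', match)
    return [quoted or bare for quoted, bare in pairs]


def _contains(term, text):
    # Assumption: terms are compared as whole words, ignoring case.
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def matches(algorithm, match, text):
    """Return True if `text` satisfies `match` under the named algorithm."""
    if algorithm == "any":
        return any(_contains(term, text) for term in split_terms(match))
    if algorithm == "all":
        return all(_contains(term, text) for term in split_terms(match))
    if algorithm == "literal":
        return _contains(match, text)
    if algorithm == "regex":
        return re.search(match, text, flags=re.IGNORECASE) is not None
    raise ValueError(f"unsupported algorithm: {algorithm}")
```

With the match text "Bank of America" BofA and the “any” algorithm, this sketch accepts a document containing “Bank of America” or “BofA” and rejects one that only contains “Bank of South America”.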

Then save your tag or correspondent and run another document through the consumer. Once complete, you should see the newly created document automatically tagged with the appropriate data.

Automatic matching

Paperless-ng comes with a new matching algorithm called Auto. This matching algorithm tries to assign tags, correspondents and document types to your documents based on how you have assigned these on existing documents. It uses a neural network under the hood.

If, for example, all your bank statements of your account 123 at the Bank of America are tagged with the tag “bofa_123” and the matching algorithm of this tag is set to Auto, this neural network will examine your documents and automatically learn when to assign this tag.

Paperless tries to hide much of the involved complexity with this approach. However, there are a couple of caveats you need to keep in mind when using this feature:

  • Changes to your documents are not immediately reflected by the matching algorithm. The neural network needs to be trained on your documents after changes. Paperless periodically (default: once each hour) checks for changes and does this automatically for you.

  • The Auto matching algorithm only takes into account documents that are NOT in your inbox (i.e., documents that have no inbox tags assigned to them). This ensures that the neural network only learns from documents which you have already tagged correctly.

  • The matching algorithm can only work if there is a correlation between the tag, correspondent or document type and the document itself. Your bank statements usually contain your bank account number and the name of the bank, so this works reasonably well. However, tags such as “TODO” cannot be automatically assigned.

  • The matching algorithm needs a reasonable number of documents to identify when to assign tags, correspondents, and types. If one out of a thousand documents has the correspondent “Very obscure web shop I bought something five years ago”, it will probably not assign this correspondent automatically if you buy something from them again. The more documents, the better.

  • Paperless also needs a reasonable number of negative examples to decide when not to assign a certain tag, correspondent or type. This will usually be the case as you start filling up paperless with documents. Example: If all your documents are from either “Webshop” or “Bank”, paperless will assign one of these correspondents to ANY new document if both are set to automatic matching.
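The last caveat can be illustrated with a toy classifier. The sketch below is a simple word-overlap scorer, not the neural network Paperless uses, and the `train` and `predict` names are hypothetical. It only shows why a model that has seen nothing but “Webshop” and “Bank” documents has no way to answer “neither”.

```python
from collections import Counter


def train(examples):
    """Build per-label word counts from (text, label) pairs."""
    model = {}
    for text, label in examples:
        model.setdefault(label, Counter()).update(text.lower().split())
    return model


def predict(model, text):
    """Return the known label whose training words overlap most with `text`."""
    words = text.lower().split()
    scores = {
        label: sum(counts[word] for word in words)
        for label, counts in model.items()
    }
    return max(scores, key=scores.get)


examples = [
    ("order confirmation for your new shoes", "Webshop"),
    ("account statement balance transfer", "Bank"),
]
model = train(examples)
# An unrelated document is still forced onto one of the two known labels.
forced = predict(model, "dentist appointment reminder")

# A negative example (label None = no correspondent) gives the model a way out.
model = train(examples + [("dentist appointment reminder for your checkup", None)])
relaxed = predict(model, "dentist appointment reminder")
```

Here `forced` is necessarily one of the two known correspondents, while `relaxed` is `None` once a document without a correspondent has been seen.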

Hooking into the consumption process

Sometimes you may want to do something arbitrary whenever a document is consumed. Rather than try to predict what you may want to do, Paperless lets you execute scripts of your own choosing just before or after a document is consumed using a couple of simple hooks.

Just write a script, put it somewhere that Paperless can read and execute, and then put the path to that script in paperless.conf under either PAPERLESS_PRE_CONSUME_SCRIPT or PAPERLESS_POST_CONSUME_SCRIPT.

Important

These scripts are executed in a blocking process, which means that if a script takes a long time to run, it can significantly slow down your document consumption flow. If you want things to run asynchronously, you’ll have to fork the process in your script and exit.
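One way to avoid blocking the consumer, sketched here in Python, is to start the slow work as a detached child process and return immediately. The helper name `run_detached` is hypothetical, and `start_new_session=True` assumes a POSIX system.

```python
import subprocess


def run_detached(command):
    """Start `command` in the background and return its PID without waiting."""
    process = subprocess.Popen(
        command,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # detach from the caller's session (POSIX only)
    )
    return process.pid
```

A hook script would call `run_detached([...])` with its own long-running command, passing along whatever arguments it received, and then exit so that consumption can continue.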

Pre-consumption script

Executed after the consumer sees a new document in the consumption folder, but before any processing of the document is performed. This script receives exactly one argument:

  • Document file name

A simple but common example is a script like this:

/usr/local/bin/ocr-pdf

#!/usr/bin/env bash
pdf2pdfocr.py -i "${1}"

/etc/paperless.conf

...
PAPERLESS_PRE_CONSUME_SCRIPT="/usr/local/bin/ocr-pdf"
...

This will pass the path to the document about to be consumed to /usr/local/bin/ocr-pdf, which will in turn call pdf2pdfocr.py on your document, which will then overwrite the file with an OCR’d version and exit. At that point, the consumption process will begin with the newly modified file.

Post-consumption script

Executed after the consumer has successfully processed a document and has moved it into paperless. It receives the following arguments:

  • Document id

  • Generated file name

  • Source path

  • Thumbnail path

  • Download URL

  • Thumbnail URL

  • Correspondent

  • Tags

The script can be in any language you like, but for a simple shell script example, you can take a look at post-consumption-example.sh in the scripts directory in this project.

The post consumption script cannot cancel the consumption process.
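As an illustration in Python, a post-consumption script could give the arguments names before using them. This sketch assumes the arguments arrive positionally in the order listed above; the class, field and function names are hypothetical, and every value is kept as the raw string it was passed as.

```python
from typing import NamedTuple


class PostConsumeArgs(NamedTuple):
    """The eight post-consumption arguments, in the order listed above."""
    document_id: str
    file_name: str
    source_path: str
    thumbnail_path: str
    download_url: str
    thumbnail_url: str
    correspondent: str
    tags: str


def parse_post_consume_args(argv):
    """Parse a sys.argv-style list (argv[0] is the script path itself)."""
    values = argv[1:]
    expected = len(PostConsumeArgs._fields)
    if len(values) != expected:
        raise ValueError(f"expected {expected} arguments, got {len(values)}")
    return PostConsumeArgs(*values)
```

In a real script you would call `parse_post_consume_args(sys.argv)` and then act on fields such as `document_id` or `source_path`.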

File name handling

By default, paperless stores your documents in the media directory and renames them using the identifier which it has assigned to each document. You will end up getting files like 0000123.pdf in your media directory. This isn’t necessarily a bad thing, because you normally don’t have to access these files manually. However, if you wish to name your files differently, you can do that by adjusting the PAPERLESS_FILENAME_FORMAT configuration option.

This variable allows you to configure the filename (folders are allowed) using placeholders. For example, configuring this to

PAPERLESS_FILENAME_FORMAT={created_year}/{correspondent}/{title}

will create a directory structure as follows:

2019/
  My bank/
    Statement January.pdf
    Statement February.pdf
2020/
  My bank/
    Statement January.pdf
    Letter.pdf
    Letter_01.pdf
  Shoe store/
    My new shoes.pdf

Danger

Do not manually move your files in the media folder. Paperless remembers the last filename a document was stored as. If you do rename a file, paperless will report your files as missing and won’t be able to find them.

Paperless provides the following placeholders within filenames:

  • {asn}: The archive serial number of the document, or “none”.

  • {correspondent}: The name of the correspondent, or “none”.

  • {document_type}: The name of the document type, or “none”.

  • {tag_list}: A comma separated list of all tags assigned to the document.

  • {title}: The title of the document.

  • {created}: The full date and time the document was created.

  • {created_year}: Year created only.

  • {created_month}: Month created only (number 1-12).

  • {created_day}: Day created only (number 1-31).

  • {added}: The full date and time the document was added to paperless.

  • {added_year}: Year added only.

  • {added_month}: Month added only (number 1-12).

  • {added_day}: Day added only (number 1-31).

Paperless will try to conserve the information from your database as much as possible. However, some characters that you can use in document titles and correspondent names (such as : \ / and a couple more) are not allowed in filenames and will be replaced with dashes.

If paperless detects that two documents share the same filename, paperless will automatically append _01, _02, etc. to the filename. This happens if all the placeholders in a filename evaluate to the same value.
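The placeholder expansion, character replacement and numbering described above can be sketched in Python as follows. This is an illustration only, not Paperless's code: the function names are hypothetical, only the three characters named above are replaced (the full set is not listed here), and a misspelled placeholder simply raises `KeyError` instead of falling back to the default scheme.

```python
import re


def sanitize(value):
    """Replace the characters : \\ / with dashes (assumed subset of the real list)."""
    return re.sub(r"[:\\/]", "-", str(value))


def render_filename(fmt, fields, existing, suffix=".pdf"):
    """Expand `fmt` with `fields` and make the result unique within `existing`."""
    # Missing values become "none"; each value is sanitized on its own so that
    # the slashes in the format string keep acting as folder separators.
    values = {
        key: sanitize("none" if value is None else value)
        for key, value in fields.items()
    }
    base = fmt.format(**values)
    candidate = base + suffix
    counter = 0
    while candidate in existing:
        counter += 1
        candidate = f"{base}_{counter:02d}{suffix}"
    existing.add(candidate)
    return candidate
```

Rendering two documents titled “Letter” from “My bank” in 2020 with the format {created_year}/{correspondent}/{title} yields 2020/My bank/Letter.pdf and then 2020/My bank/Letter_01.pdf, matching the directory listing shown earlier.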

Hint

Paperless checks the filename of a document whenever it is saved. Therefore, you need to update the filenames of your documents and move them after altering this setting by invoking the document renamer.

Warning

Make absolutely sure you get the spelling of the placeholders right, or else paperless will use the default naming scheme instead.

Caution

As of now, you can tell paperless to store your files anywhere outside the media directory by setting

PAPERLESS_FILENAME_FORMAT=../../my/custom/location/{title}

However, keep in mind that inside docker, if files get stored outside of the predefined volumes, they will be lost after a restart of paperless.
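If you want to check whether a given format would place files outside the media directory, a small Python helper along these lines may be useful. It is a sketch that is not part of Paperless; the function name and the media path used in the test are hypothetical.

```python
import posixpath


def escapes_media_dir(media_root, relative_name):
    """Return True if `relative_name` resolves to a path outside `media_root`."""
    root = posixpath.normpath(media_root)
    # normpath collapses ".." segments so the final location can be compared.
    target = posixpath.normpath(posixpath.join(root, relative_name))
    return posixpath.commonpath([root, target]) != root
```

The check is purely textual: it does not touch the filesystem and does not follow symbolic links.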
