Beginning with version 1.3, it is possible to ``fine-tune'' the summary information generated by the Essence summarizers. A typical application is to change the Time-to-live attribute based on some knowledge about the objects. For example, an administrator could use the post-summarizing feature to give quickly changing objects a lower TTL and very stable documents a higher TTL.
Objects are selected for post-processing if they meet a specified condition. A condition consists of three parts: an attribute name, an operation, and some string data. For example:
city == 'New York'
In this case we are checking whether the city attribute is equal to the string `New York'. For exact string matching, the string data must be enclosed in single quotes. Regular expressions, enclosed in slashes, are also supported:
city ~ /New York/
Negative operators are also supported:
city != 'New York'
city !~ /New York/
Conditions can be joined with `&&' (logical and) or `||' (logical or) operators:
city == 'New York' && state != 'NY';
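As a rough illustration of how such conditions behave, the following Python sketch evaluates single comparisons and `&&'/`||' combinations against a dictionary of attributes. It is not the Gatherer's own code: the function names, the dictionary standing in for an object, the treatment of a missing attribute as an empty string, the tolerated trailing semicolon, and the precedence (`&&' binding tighter than `||') are all assumptions made for the example.

```python
import re

# One comparison: attribute name, operator, then 'string' or /regex/.
# A trailing semicolon is tolerated (an assumption based on the example above).
COMPARISON = re.compile(
    r"""^\s*(?P<attr>[\w-]+)\s*
        (?P<op>==|!=|!~|~)\s*
        (?:'(?P<str>[^']*)'|/(?P<re>[^/]*)/)\s*;?\s*$""",
    re.VERBOSE,
)


def eval_comparison(text, attrs):
    """Evaluate a single comparison such as: city == 'New York'."""
    m = COMPARISON.match(text)
    if m is None:
        raise ValueError("cannot parse condition: %r" % text)
    value = attrs.get(m.group("attr"), "")  # assumption: missing means ""
    op = m.group("op")
    if op in ("==", "!="):
        if m.group("str") is None:
            raise ValueError("== and != need string data in single quotes")
        matched = value == m.group("str")
    else:
        if m.group("re") is None:
            raise ValueError("~ and !~ need a /regular expression/")
        matched = re.search(m.group("re"), value) is not None
    # The negative operators simply invert the result.
    return matched if op in ("==", "~") else not matched


def eval_condition(text, attrs):
    """Evaluate comparisons joined with && and ||.

    Naive split: assumes && and || never occur inside the string data.
    """
    return any(
        all(eval_comparison(part, attrs) for part in alternative.split("&&"))
        for alternative in text.split("||")
    )
```

With an object such as {"city": "New York", "state": "NJ"}, the joined condition shown above would be true under these assumptions, since the city matches and the state is not `NY'.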
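When all conditions are met for an object, one or more instructions are executed on it. There are four types of instructions which can be specified: setting an attribute to a specific string, filtering one attribute through a program, filtering several attributes through a program, and deleting the object. One example of each, in that order: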
time-to-live = "86400"
keywords | tr A-Z a-z
address,city,state,zip ! cleanup-address.pl
delete()
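A filter program for the `|' instruction can be very small. The sketch below is a Python stand-in for the `tr A-Z a-z' example. It assumes, as that example suggests, that the attribute value arrives on standard input and that whatever the program writes to standard output becomes the new value. The function names are hypothetical.

```python
import string

# Same mapping as `tr A-Z a-z`: only the ASCII capitals are changed.
_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def lowercase_ascii(text):
    """Return text with A-Z mapped to a-z and everything else untouched."""
    return text.translate(_LOWER)


def run_filter(instream, outstream):
    """Read the whole attribute value, transform it, and write it back out."""
    outstream.write(lowercase_ascii(instream.read()))
```

A script used from a rules file would end by calling run_filter(sys.stdin, sys.stdout) after importing sys. Keeping the transformation in its own function makes it easy to try out without running a Gatherer.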
The conditions and instructions are combined together in a ``rules'' file. The format of this file is somewhat similar to a Makefile; conditions begin in the first column and instructions are indented by a tab-stop. Example:
type == 'HTML'
	partial-text | cleanup-html-text.pl
URL ~ /users/
	time-to-live = "86400"
	partial-text ! extract-owner.sh
type == 'SOIFStream'
	delete()
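The layout described above (conditions in the first column, instructions indented) can be read with a few lines of code. The following Python sketch groups each condition with the instructions that follow it. It is only an illustration of the file format, not the parser used by the Gatherer. The function name is hypothetical, and two behaviours are assumptions: blank lines are skipped, and a leading space is accepted as indentation as well as a tab.

```python
def parse_rules(text):
    """Return a list of (condition, [instructions]) pairs from a rules file."""
    rules = []
    for lineno, line in enumerate(text.splitlines(), 1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        if line[0] in "\t ":
            # Indented line: an instruction for the most recent condition.
            if not rules:
                raise ValueError(
                    "line %d: instruction before any condition" % lineno
                )
            rules[-1][1].append(line.strip())
        else:
            # Line starting in the first column: a new condition.
            rules.append((line.strip(), []))
    return rules


SAMPLE = (
    "type == 'HTML'\n"
    "\tpartial-text | cleanup-html-text.pl\n"
    "URL ~ /users/\n"
    "\ttime-to-live = \"86400\"\n"
    "\tpartial-text ! extract-owner.sh\n"
)

for condition, instructions in parse_rules(SAMPLE):
    print(condition, "->", instructions)
```

Each pair keeps the instructions in file order, which matters if one instruction changes an attribute that a later one reads.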
This rules file is specified in the gatherer.cf file with the Post-Summarizing: tag, e.g.:
Post-Summarizing: lib/myrules