Essence supports two mechanisms for defining the type-specific extraction algorithms (called Summarizers) that generate content summaries: a UNIX program that takes as its only command-line argument the filename of the data to summarize, and line-based regular expressions specified in lib/quick-sum.cf. See Appendix C.4 for detailed examples of how to define both types of Summarizers.
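To illustrate the general idea behind line-based regular-expression summarizing, here is a minimal sketch that applies each rule to one line at a time and collects the captured text under an attribute name. The rule table below (the `RULES` list, the "Title" and "Author" attributes, and the mail-header patterns) is hypothetical and does NOT reproduce the actual syntax of lib/quick-sum.cf; consult Appendix C.4 for the real format.

```python
import re
import sys

# Hypothetical rules for illustration only; this is NOT the syntax of
# lib/quick-sum.cf. Each rule maps an attribute name to a regular
# expression that is applied to a single line of input.
RULES = [
    ("Title", re.compile(r"^Subject:\s*(.+)$")),
    ("Author", re.compile(r"^From:\s*(.+)$")),
]


def summarize_lines(lines, rules=RULES):
    """Apply every rule to every line; keep the first capture group of each match."""
    summary = {}
    for line in lines:
        line = line.rstrip("\n")
        for attribute, pattern in rules:
            match = pattern.search(line)
            if match:
                summary.setdefault(attribute, []).append(match.group(1))
    return summary


if __name__ == "__main__" and len(sys.argv) > 1:
    # Like a UNIX Summarizer, take the filename as the only argument.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for attribute, values in summarize_lines(f).items():
            print(f"{attribute}: {'; '.join(values)}")
```

Because each rule sees only one line, this approach suits formats whose interesting fields sit on recognizable single lines (such as mail headers); anything spanning several lines needs a full UNIX Summarizer instead.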
The UNIX Summarizers are named using the convention TypeName.sum (e.g., PostScript.sum). These Summarizers output their content summary as a SOIF attribute-value list (see Appendix for information on how to use the SOIF library to write a Summarizer). You can use the wrapit command to wrap raw output into the SOIF format (i.e., to add byte-count delimiters to the individual attribute-value pairs).
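As a rough sketch of what a UNIX Summarizer does, the program below takes a filename as its only argument and writes attribute-value pairs with byte-count delimiters to standard output. It assumes the SOIF pair layout `Attribute{size}:<TAB>value`, where size counts the bytes of the value; the summarizing logic itself (first non-blank line as "Title", plus a "File-Size" attribute) is purely hypothetical and stands in for whatever a real TypeName.sum would extract. In practice you would use the SOIF library or wrapit rather than hand-formatting pairs.

```python
import sys


def soif_pair(attribute, value):
    """Format one attribute-value pair, assuming the layout Attribute{size}:\tvalue.

    The size is the length of the encoded value in bytes, not characters.
    """
    data = value.encode("utf-8")
    return f"{attribute}{{{len(data)}}}:\t".encode("ascii") + data + b"\n"


def summarize(path):
    """Hypothetical summarizing logic, for illustration only."""
    with open(path, "rb") as f:
        raw = f.read()
    text = raw.decode("utf-8", errors="replace")
    # Use the first non-blank line as a stand-in title.
    title = next((ln.strip() for ln in text.splitlines() if ln.strip()), "")
    out = b""
    if title:
        out += soif_pair("Title", title)
    out += soif_pair("File-Size", str(len(raw)))
    return out


if __name__ == "__main__" and len(sys.argv) > 1:
    # The only command-line argument is the filename of the data to summarize.
    sys.stdout.buffer.write(summarize(sys.argv[1]))
```

Counting bytes rather than characters matters: a value containing non-ASCII text is longer in bytes than in characters, and the byte count is what lets a reader skip over a value without parsing it.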
There is a Summarizer called FullText.sum that you can use to perform full-text indexing of selected file types: simply set up the lib/bycontent.cf and lib/byname.cf configuration files to recognize the desired file types as FullText (i.e., put "FullText" in column 1 next to the matching regular expression).
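The sketch below shows how a name-based rule file with the type name in column 1 and a regular expression beside it can map a filename to a type, and how that type leads to a TypeName.sum Summarizer. The rule text in `BYNAME_RULES`, the whitespace-separated parsing, the `#` comment handling, and the first-match-wins policy are all assumptions for illustration; the real lib/byname.cf may differ in detail, and content-based matching via lib/bycontent.cf is not modeled here.

```python
import re

# Hypothetical byname-style rules: type name in column 1, regex in column 2.
BYNAME_RULES = r"""
# Type      Regular expression
FullText    \.txt$
FullText    ^README
PostScript  \.ps$
"""


def parse_rules(text):
    """Parse 'TypeName regex' lines, skipping blanks and '#' comments (assumed)."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        type_name, pattern = line.split(None, 1)
        rules.append((type_name, re.compile(pattern)))
    return rules


def recognize(filename, rules):
    """Return the type of the first matching rule (first match wins: an assumption)."""
    for type_name, pattern in rules:
        if pattern.search(filename):
            return type_name
    return None


def summarizer_for(type_name):
    """UNIX Summarizers follow the TypeName.sum naming convention."""
    return f"{type_name}.sum" if type_name else None


if __name__ == "__main__":
    rules = parse_rules(BYNAME_RULES)
    for name in ["notes.txt", "README", "paper.ps", "logo.gif"]:
        print(name, "->", summarizer_for(recognize(name, rules)))
```

Mapping several patterns to the same FullText type, as in the first two rules, is how one Summarizer can cover a whole family of file types.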