The second new file format is the ``Abstract'' type: a file that contains only the text of a paper abstract (a format that is common in technical report FTP archives). To recognize files written in this format, we'll use the naming convention that the filenames of ``Abstract'' files end in ``.abs''. We add this type-recognition customization to the lib/byname.cf file as a regular expression:
Abstract ^.*\.abs$
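To check which filenames a pattern like this would select, it can help to try it outside the Gatherer first. The following is only a sketch: the helper name is_abstract and the sample filenames are made up for illustration, and it assumes that grep -E treats the expression the same way the Gatherer's type recognition does, which may not hold for every pattern.

```shell
#!/bin/sh
# Hypothetical helper (not part of Harvest): succeed if the filename
# matches the "Abstract" pattern from lib/byname.cf.
# Assumption: grep -E stands in for the Gatherer's own regex matching.
is_abstract() {
    printf '%s\n' "$1" | grep -Eq '^.*\.abs$'
}

# Illustrative filenames only.
for name in tr-94-01.abs paper.ps notes.abs.gz; do
    if is_abstract "$name"; then
        echo "$name: Abstract"
    else
        echo "$name: no match"
    fi
done
```

Because the pattern is anchored with ``$'', a name such as notes.abs.gz is not recognized as an ``Abstract'' file; only names that end in ``.abs'' are.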
Another way to write a summarizer is to write a program or script that
takes a filename as the first argument on the command line, extracts the
structured information, and then outputs the results as a list of SOIF
attribute-value pairs (see the Appendix for further information on how
to write a program that can produce SOIF).
Summarizer programs are named TypeName.sum, so we call our new
summarizer Abstract.sum. Remember to place the summarizer program in
a directory that is in your path so that the Gatherer can run it. You'll
see below that Abstract.sum is a Bourne shell script that takes the first
50 lines of the file, wraps them as the ``Abstract'' attribute, and
outputs the result as a SOIF attribute-value pair.
#!/bin/sh
#
# Usage: Abstract.sum filename
#
# Take the first 50 lines of the file and wrap them as the Abstract attribute.
head -n 50 "$1" | wrapit "Abstract"
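To make the shape of the output more concrete, here is a rough stand-in for the wrapping step. It is a sketch under stated assumptions, not the wrapit program itself: the function name soif_wrap is hypothetical, and it assumes that a SOIF attribute-value pair is written as the attribute name, the size of the value in bytes in braces, a colon, a tab, and then the value. Whether wrapit counts a trailing newline in the same way is not something this sketch establishes.

```shell
#!/bin/sh
# Hypothetical stand-in for wrapit (for illustration only): read the
# value from standard input and print one attribute-value pair as
#   Name{size}:<TAB>value
# Assumption: size is the byte count of the value exactly as read.
soif_wrap() {
    attr="$1"
    tmp=$(mktemp) || return 1
    cat > "$tmp"                           # buffer the value so it can be measured
    size=$(wc -c < "$tmp" | tr -d ' ')     # some wc versions pad with spaces
    printf '%s{%s}:\t' "$attr" "$size"
    cat "$tmp"
    rm -f "$tmp"
}

# Example: wrap a one-line abstract.
printf 'We describe a tool.\n' | soif_wrap "Abstract"
```

In a summarizer, a function like this would sit at the end of the pipeline in the same position as wrapit, for example: head -n 50 "$1" | soif_wrap "Abstract".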