				 WILMA
		  Web Interface to List Mail Archives

Wilma is a set of Perl programs that combines (Majordomo) mail list
archives, the MHonArc mail folder HTML reformatter, and the Glimpse
search engine into a CGI interface for browsing and searching those
archives.

Wilma requires:

    Perl 5	(tested with 5.004_03)
    CGI.pm	(part of the 5.004 distribution)
    AppCfg.pm	(included)
    MHonArc	(tested with 2.1.0)
    Glimpse	(tested with 4.0B1)*

* Version 3.6 is recommended because version 4 of Glimpse corrupts the
index when doing incremental additions. wilma_reindex now has a -f
option that forces a complete rebuild of the index when incremental
indexing fails, but a full rebuild can be time-consuming once the
archives become large.


To install Wilma:

1)  Copy the wilma* files to a suitable CGI directory and be sure the
    "shebang line" that invokes Perl is correct for your system, e.g.:

	PERL=/usr/local/bin/perl
	for G in wilma*; do
	  sed "1s,^#!.*/perl,#!$PERL," <$G >/webroot/cgi-bin/$G
	done

    (Be sure to change /webroot to whatever your web root directory
    is.) If it's necessary to add a suffix for them to be recognized as
    CGI programs, rename wilma and wilma_glimpse and edit the *SEARCH
    constant in wilma. Remember to give your web server read and
    execute access to all these files.

2)  The AppCfg.pm file should be installed in the Perl local library.
    Use perl -V to see where Perl searches for library modules (the
    @INC list at the end of the output). The preferred location
    is something like /usr/local/lib/perl5/site_perl, but it can be
    installed in a directory not in @INC if you do one or more of the
    following:

    * Install AppCfg.pm in the directory Wilma is run from (since "." is
      one of the @INC paths, this places AppCfg.pm in the search path)
      and always run it from that directory. This works well for CGI
      with everything installed in the same directory but may not work
      for wilma_reindex, which is run from a crontab, unless one of the
      following is done.

    * Change to the directory where AppCfg.pm is installed before
      running any of the Wilma programs.

    * Set (and export depending on the shell) the PERL5LIB environment
      variable to the path where AppCfg.pm is installed.

    * Run wilma_reindex specifying the -I option to perl, e.g.:

        perl -I/webroot/cgi-bin /webroot/cgi-bin/wilma_reindex

      (Substitute the appropriate path following '-I' and for
      wilma_reindex.)

    * Insert the following line in wilma_reindex before the first 'use'
      statement:

        use lib '/webroot/cgi-bin';

      (Substitute the appropriate path.)

3)  Create a subdirectory named .wilma where you installed the Wilma
    files. This directory is where the configuration files for the
    archives are kept.

4)  Insert a crontab entry that runs wilma_reindex to update the HTML
    archives as often as desired. Without arguments, wilma_reindex
    updates all archives it finds configuration files for in the .wilma
    directory; otherwise it updates only the archives named on the
    command line. For example, the following entry updates all archives
    at 30 minutes past midnight every Tuesday through Saturday:

	30 0 * * 2-6 nice /webroot/cgi-bin/wilma_reindex

    (Again, change /webroot as appropriate.)
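
Steps 1 through 3 can be scripted. The following is only a sketch and is
not part of the Wilma distribution: the function name install_wilma and
its three arguments are made up for illustration, and it assumes the
wilma* files and AppCfg.pm sit together in one source directory. It
rewrites the shebang line as in step 1, sets permissions (755 and 644
are assumptions; use whatever your web server needs), keeps AppCfg.pm
next to the programs (the first option in step 2), and creates the
.wilma directory of step 3.

```shell
# Hypothetical helper, not shipped with Wilma.
# usage: install_wilma SRCDIR CGIDIR PERL
install_wilma() {
    src=$1 cgi=$2 perl=$3

    # step 3: directory for the per-archive configuration files
    mkdir -p "$cgi/.wilma" || return 1

    for f in "$src"/wilma*; do
        [ -f "$f" ] || continue
        g=$cgi/$(basename "$f")

        # step 1: point the shebang line at the local Perl
        sed "1s,^#!.*/perl,#!$perl," <"$f" >"$g" || return 1

        # programs get read and execute access, templates read only
        case $(head -n 1 "$g") in
            '#!'*) chmod 755 "$g" ;;
            *)     chmod 644 "$g" ;;
        esac
    done

    # step 2, first option: keep AppCfg.pm next to the programs
    if [ -f "$src/AppCfg.pm" ]; then
        cp "$src/AppCfg.pm" "$cgi/" && chmod 644 "$cgi/AppCfg.pm"
    fi
}
```

It would be called from the unpacked distribution directory as, for
example, "install_wilma . /webroot/cgi-bin /usr/local/bin/perl". Because
AppCfg.pm then lives outside @INC, wilma_reindex still needs one of the
remedies listed in step 2 when it is run from a crontab.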

For each archive:

5)  Create a directory for the HTML version of the archive somewhere
    accessible to your web server. Ownership and permissions must allow
    the web server user read and search access. If you run Wilma as
    a specific user via a wrapper, e.g. cgiwrap, then ownership and
    permissions must allow that user access. The crontab owner of step 4
    must also have write access and will own most of the files, but the
    umask used can be configured.

6)  Create an info.html or info.txt file in the HTML archive directory
    created in step 5. This file is just an HTML or text file,
    respectively, that usually describes the list, but can contain
    whatever you want it to. It's optional, but if present, a link to
    this file is displayed at the end of the search/browse page.

7)  Create a subdirectory named index in the HTML archive directory to
    hold the Glimpse files. Copy the .glimpse_exclude and
    .glimpse_filters files into the index directory, making sure to
    edit .glimpse_filters so that it points to the real path to
    wilma_striphtml.

8)  Copy the wilma_template.cf and wilma_template.rc into the .wilma
    directory, naming them for the list archive they'll configure. For
    example, if your archive is for a mailing list named "foobar",
    then you'd copy wilma_template.cf to .wilma/foobar.cf and
    wilma_template.rc to .wilma/foobar.rc.

9)  Edit the configuration files appropriately. Pay particular
    attention to the list name, title, and paths in the .cf file. Each
    configuration variable is documented in the file. Note that Wilma
    currently expects to extract a 2-digit month (01-12) and a 2- or
    4-digit year from the archive file names. The example regular
    expressions work for Majordomo-style archives, but archives split
    on any other basis will probably require some hacking in the code.

    The .rc file determines the look of the HTML pages, but only the
    TOPFNAME variable must be unique for each list. It must be set to
    the URL path of the wilma CGI program with the list name appended as
    pathinfo, e.g. /cgi-bin/wilma/foobar. See the MHonArc documentation
    for an explanation of the .rc file contents.

10) Run wilma_reindex for the archive, i.e. specify the list name as
    a parameter to wilma_reindex. This must be done as the user the
    crontab entry was made for. This may take a while if the existing
    mail list archives are large, so you might want to postpone this
    step until off-peak hours or just allow the HTML archive to be
    initialized when the crontab entry runs.
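
The mechanical parts of steps 5, 7 and 8 can be sketched as a small
shell function. This is an illustration only, not something shipped
with Wilma: the name setup_archive, its arguments, and the 755
permissions are assumptions, and it expects the .glimpse_* files and
the wilma_template.* files to be in one distribution directory. The
editing in steps 7 and 9 still has to be done by hand afterwards.

```shell
# Hypothetical helper, not shipped with Wilma.
# usage: setup_archive LIST HTMLDIR CGIDIR DISTDIR
setup_archive() {
    list=$1 html=$2 cgi=$3 dist=$4

    # steps 5 and 7: HTML archive directory with its Glimpse index subdirectory
    mkdir -p "$html/index" || return 1
    chmod 755 "$html" "$html/index"

    # step 7: Glimpse control files (edit .glimpse_filters afterwards)
    cp "$dist/.glimpse_exclude" "$dist/.glimpse_filters" "$html/index/" || return 1

    # step 8: per-list copies of the configuration templates
    mkdir -p "$cgi/.wilma" || return 1
    for ext in cf rc; do
        target=$cgi/.wilma/$list.$ext
        # never overwrite a configuration that has already been edited
        [ -e "$target" ] || cp "$dist/wilma_template.$ext" "$target" || return 1
    done
}
```

For the "foobar" list this might be run as "setup_archive foobar
/webroot/archives/foobar /webroot/cgi-bin ." (the archives path is made
up), followed by editing .wilma/foobar.cf and .wilma/foobar.rc and then
running "/webroot/cgi-bin/wilma_reindex foobar" as the crontab user.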


A list archive is specified by appending the list name to the URL for
the wilma CGI program, e.g.:

    http://www.server.com/cgi-bin/wilma/foobar

If no pathinfo (i.e. "/foobar") is specified, wilma generates an index
page of all configured archives, but it's usually best to advertise a
specific URL for a specific list. Depending on your web server it may be
possible to map the full path such that users can just refer to the URL

    http://www.server.com/foobar

to see the archive for list foobar. One way to do this in Apache is with
a ScriptAlias configuration command, for example:

    ScriptAlias  /foobar  /webroot/cgi-bin/wilma/foobar

Another way that doesn't require a configuration line per list is with
the RewriteRule command (the rewrite module is available only in later
versions of Apache and must be configured and enabled in your
installation). For example, the following makes the archive for any
list foobar available as /wilma/foobar:

    RewriteRule ^/wilma(/.*|$) /cgi-bin/wilma$1 [PT]
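
To see which request paths that pattern picks up, the same regular
expression can be tried outside Apache. This sketch uses sed -E as a
stand-in for Apache's regular expression engine, which behaves the same
for this simple pattern but not for every pattern; the function name
rewrite_path is made up for illustration.

```shell
# Hypothetical stand-in for the RewriteRule above; Apache itself is not involved.
rewrite_path() {
    printf '%s\n' "$1" | sed -E 's,^/wilma(/.*|$),/cgi-bin/wilma\1,'
}

rewrite_path /wilma/foobar   # prints /cgi-bin/wilma/foobar
rewrite_path /wilma          # prints /cgi-bin/wilma
rewrite_path /wilmax         # prints /wilmax (no match, left alone)
```

The "(/.*|$)" group is what keeps unrelated paths such as /wilmax from
being rewritten while still passing any pathinfo through to wilma, so
one rule serves every configured list.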



[ end of README $Revision: 1.9 $ ]
