NAME
    nsitemap - functions for generating a site map for a given site URL

SYNOPSIS
        use nsitemap;
        use LWP::UserAgent;

        my $ua = LWP::UserAgent->new;
        my $sitemap = nsitemap->new(
            EMAIL       => 'your@email.address',
            USERAGENT   => $ua,
            ROOT        => 'http://your.ip.address/'
        );

        $sitemap->option( 'VERBOSE' => 1 );
        my $len = $sitemap->option( 'SUMMARY_LENGTH' );
        $sitemap->generate();

        my $root = $sitemap->root();
        for my $url ( $sitemap->urls() )
        {
            if ( $sitemap->is_internal_url( $url ) )
            {
                # do something ...
            }
            my @links   = $sitemap->links( $url );
            my $title   = $sitemap->title( $url );
            my $summary = $sitemap->summary( $url );
            my $depth   = $sitemap->depth( $url );
            my $digest  = $sitemap->MD5digest( $url );
        }
        $sitemap->traverse(
            sub {
                my ( $sitemap, $url, $depth, $flag ) = @_;
                if ( $flag == 0 )
                {
                    # do something at the start of a list of sub-pages ...
                }
                elsif( $flag == 1 )
                {
                    # do something for each page ...
                }
                elsif( $flag == 2 )
                {
                    # do something at the end of a list of sub-pages ...
                }
            }
        );

DESCRIPTION
    The `nsitemap' module creates a site map for a WWW site by traversing
    the site with the WWW::Robot module. The nsitemap object has methods to
    access a list of all the URLs in the site; the links from each URL; page
    titles; page summaries; page fingerprints (MD5digest); and the depth of
    each page, that is, the minimum number of links from the root URL to the
    page.

CONSTRUCTOR
  nsitemap->new [ $option => $value ] ...

        my $sitemap = nsitemap->new(
            EMAIL       => 'your@email.address',
            USERAGENT   => LWP::UserAgent->new,
            ROOT        => 'http://www.my.com/'
        );

    Possible options are:

    EMAIL
        The email address the robot uses to identify itself. This option is
        required.

    ROOT
        Root URL of the site for which the site map is being created. This
        option is required.

    USERAGENT
        User agent used by the robot, typically LWP::UserAgent->new. This
        option is required.

    VERBOSE
        Verbose flag (0 or 1). When set, progress messages are printed
        during traversal. Defaults to 0.

    SUMMARY_LENGTH
        Maximum length of the automatically generated summary. Defaults to
        200.

    DEPTH
        Maximum depth of traversal. Defaults to no limit.
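    The option handling above can be pictured with a minimal sketch. It
    assumes a hypothetical helper, merge_options(), which is not part of
    nsitemap: caller values override the documented defaults, and EMAIL,
    ROOT and USERAGENT must be present. A plain string stands in for the
    LWP::UserAgent object so that the example runs on its own.

```perl
        use strict;
        use warnings;

        # Documented defaults; DEPTH => undef stands for "no limit"
        # (an assumption about how the module represents it).
        my %DEFAULTS = (
            VERBOSE        => 0,
            SUMMARY_LENGTH => 200,
            DEPTH          => undef,
        );
        my @REQUIRED = qw( EMAIL ROOT USERAGENT );

        # Hypothetical helper, not part of nsitemap.
        sub merge_options {
            my %args = @_;
            for my $key (@REQUIRED) {
                die "nsitemap: option $key is required\n"
                    unless defined $args{$key};
            }
            return { %DEFAULTS, %args };    # caller values win
        }

        my $opts = merge_options(
            EMAIL     => 'your@email.address',
            ROOT      => 'http://www.my.com/',
            USERAGENT => 'placeholder',     # stands in for LWP::UserAgent->new
        );
        print "$opts->{SUMMARY_LENGTH}\n";  # prints 200
```

    Dying early on a missing required option reports the mistake at
    construction time rather than partway through a crawl.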

METHODS
  generate( )

    Generates the site map by traversing the site, using the options given
    to the constructor.

        $sitemap->generate();

  option( $option [=> $value ] )

    Interface to get / set options after object construction.

        $sitemap->option( 'VERBOSE' => 1 );
        my $len = $sitemap->option( 'SUMMARY_LENGTH' );
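    A get/set accessor of this shape can be sketched as follows. The class
    SketchSitemap is a hypothetical stand-in, not the real nsitemap; it
    only assumes that options live in a hash and that a call with a value
    sets the option before returning it.

```perl
        use strict;
        use warnings;

        package SketchSitemap;    # hypothetical stand-in for nsitemap

        sub new {
            my ( $class, %opts ) = @_;
            return bless { options => {%opts} }, $class;
        }

        # Return option $name; if a value is also passed, set it first.
        sub option {
            my ( $self, $name, @value ) = @_;
            $self->{options}{$name} = $value[0] if @value;
            return $self->{options}{$name};
        }

        package main;

        my $sitemap = SketchSitemap->new( SUMMARY_LENGTH => 200 );
        $sitemap->option( 'VERBOSE' => 1 );
        my $len = $sitemap->option('SUMMARY_LENGTH');    # 200
```

    Checking the number of arguments (@value) rather than the definedness
    of the value lets a caller reset an option to undef, for example to
    remove a DEPTH limit.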

  root( )

    Returns the root URL for the site.

        my $root = $sitemap->root();

  urls( )

    Returns a list of all the URLs in the site map.

        my @urls = $sitemap->urls();

  is_internal_url( $url )

    Returns 1 (one) if $url is an internal URL, based on the ROOT value.
    Otherwise returns 0 (zero).

        if ( $sitemap->is_internal_url( $url ) )
        {
            # do something ...
        }
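    One plausible reading of "internal, based on the ROOT value" is a
    prefix test: a URL is internal when it starts with the ROOT URL. The
    sketch below assumes exactly that; is_internal_url_sketch() is
    hypothetical, and the real module may normalise URLs before comparing.

```perl
        use strict;
        use warnings;

        # Hypothetical approximation of is_internal_url: 1 if $url begins
        # with $root, otherwise 0.
        sub is_internal_url_sketch {
            my ( $root, $url ) = @_;
            return index( $url, $root ) == 0 ? 1 : 0;
        }

        my $root = 'http://www.my.com/';
        print is_internal_url_sketch( $root, 'http://www.my.com/docs/a.html' ), "\n";  # 1
        print is_internal_url_sketch( $root, 'http://www.other.com/' ), "\n";          # 0
```

    A plain string prefix does not treat 'HTTP://WWW.MY.COM/' or an https
    variant as internal; a stricter version would canonicalise scheme and
    host first.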

  links( $url )

    Returns a list of all the links from a given URL in the site map.

        my @links = $sitemap->links( $url );

  title( $url )

    Returns the title of the page at the URL, taken from its TITLE tag.

        my $title = $sitemap->title( $url );
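    To show what "based on the TITLE tag" means, here is a minimal sketch
    that pulls the first <title> element out of an HTML string with a
    regular expression. title_sketch() is hypothetical; a robust version
    would use an HTML parser rather than a regex.

```perl
        use strict;
        use warnings;

        # Hypothetical: return the text of the first <title> element,
        # with whitespace collapsed, or undef if there is none.
        sub title_sketch {
            my ($html) = @_;
            return unless $html =~ m{<title[^>]*>(.*?)</title>}is;
            my $title = $1;
            $title =~ s/\s+/ /g;     # collapse runs of whitespace
            $title =~ s/^ | $//g;    # trim one leading/trailing space
            return $title;
        }

        my $title = title_sketch(
            "<html><head><TITLE>\n  My Home Page\n</TITLE></head></html>" );
        print "$title\n";    # My Home Page
```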

  MD5digest( $url )

    Returns the hex-encoded MD5 digest (fingerprint) for the URL.

        my $fingerprint = $sitemap->MD5digest( $url );
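    A common use for fingerprints is spotting pages with identical
    content under different URLs. The sketch below assumes the digest is
    the MD5 of the page body and uses Digest::MD5's md5_hex on a
    hypothetical %content hash; it does not call nsitemap itself.

```perl
        use strict;
        use warnings;
        use Digest::MD5 qw( md5_hex );

        # Hypothetical page bodies keyed by URL.
        my %content = (
            'http://www.my.com/a.html' => 'abc',
            'http://www.my.com/b.html' => 'abc',
            'http://www.my.com/c.html' => 'xyz',
        );
        my %digest = map { $_ => md5_hex( $content{$_} ) } keys %content;

        # Group URLs by fingerprint; any group with more than one URL
        # holds duplicate pages.
        my %by_digest;
        push @{ $by_digest{ $digest{$_} } }, $_ for sort keys %digest;
```

    md5_hex works on bytes, so text that has been decoded to Perl
    characters should be encoded (for example with Encode) first.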

  summary( $url )

    Returns a summary of the page at the URL, generated using HTML::Summary.
    If the page has a META tag with NAME="description", the value of its
    CONTENT attribute is returned. Otherwise the page text is summarized.

        my $summary = $sitemap->summary( $url );
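    The decision rule above can be sketched in a few lines. summary_sketch()
    is hypothetical: it looks for a META description (assuming NAME comes
    before CONTENT in the tag) and otherwise falls back to crude tag
    stripping cut to $max characters, standing in for HTML::Summary, which
    does a much better job of choosing sentences.

```perl
        use strict;
        use warnings;

        # Hypothetical: META description if present, else truncated text.
        sub summary_sketch {
            my ( $html, $max ) = @_;
            if ( $html =~ m{<meta\s+name=["']?description["']?\s+content=["']([^"']*)["']}i ) {
                return $1;
            }
            ( my $text = $html ) =~ s/<[^>]*>/ /g;    # crude tag stripping
            $text =~ s/\s+/ /g;
            $text =~ s/^ | $//g;
            return length($text) > $max ? substr( $text, 0, $max ) : $text;
        }

        my $s1 = summary_sketch(
            '<head><META NAME="description" CONTENT="A site about Perl"></head>', 200 );
        my $s2 = summary_sketch( '<p>Hello   <b>world</b></p>', 200 );
```

    Here $max plays the role of the SUMMARY_LENGTH option.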

  depth( $url )

    Returns the minimum number of links to traverse from the root URL of the
    site to this URL. The root URL is at depth zero.

        my $depth = $sitemap->depth( $url );
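    The minimum number of links from the root is a shortest-path distance,
    which a breadth-first search computes. The sketch below runs BFS over
    a hypothetical %links graph; depths_from() is illustrative and not the
    module's own code.

```perl
        use strict;
        use warnings;

        # Hypothetical link graph: URL => URLs it links to.
        my %links = (
            '/'    => [ '/a', '/b' ],
            '/a'   => [ '/a/x', '/b' ],
            '/b'   => ['/a/x'],
            '/a/x' => ['/'],
        );

        # BFS from the root: a URL is first reached along a shortest
        # path, so the level at which it is first seen is its depth.
        sub depths_from {
            my ( $root, $links ) = @_;
            my %depth = ( $root => 0 );
            my @queue = ($root);
            while ( defined( my $url = shift @queue ) ) {
                for my $next ( @{ $links->{$url} || [] } ) {
                    next if exists $depth{$next};    # already has a shorter path
                    $depth{$next} = $depth{$url} + 1;
                    push @queue, $next;
                }
            }
            return \%depth;
        }

        my $depth = depths_from( '/', \%links );    # / => 0, /a => 1, /b => 1, /a/x => 2
```

    The link back from '/a/x' to '/' does not change the root's depth,
    because the root is seen first.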

  traverse( \&callback )

    The traverse method walks the site map, starting at the root URL
    (specified by the ROOT option), and visits each URL in the order in
    which it would appear in a sequential site map of the site. The
    callback receives ( $sitemap, $url, $depth, $flag ) and is called at
    several points during the traversal, as indicated by the $flag
    argument:

    $flag = 0
        Triggered before each set of daughter URLs of a given URL.

    $flag = 1
        Triggered for each URL.

    $flag = 2
        Triggered after each set of daughter URLs of a given URL.
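    A typical callback renders the site map as nested HTML lists. The
    sketch below drives such a callback with a hypothetical
    traverse_sketch() over a hypothetical %children tree. The exact order
    of calls and the URL passed with flags 0 and 2 are assumptions, and
    undef stands in for the $sitemap object.

```perl
        use strict;
        use warnings;

        # Hypothetical site-map tree: URL => its child pages.
        my %children = (
            '/'  => [ '/a', '/b' ],
            '/a' => ['/a/x'],
        );

        # Calls $callback->( $sitemap, $url, $depth, $flag ) with the
        # documented flags: 1 for each page, 0 before and 2 after its
        # list of sub-pages (only when it has sub-pages).
        sub traverse_sketch {
            my ( $callback, $url, $depth ) = @_;
            $callback->( undef, $url, $depth, 1 );
            my @kids = @{ $children{$url} || [] };
            return unless @kids;
            $callback->( undef, $url, $depth, 0 );
            traverse_sketch( $callback, $_, $depth + 1 ) for @kids;
            $callback->( undef, $url, $depth, 2 );
        }

        my @html;
        traverse_sketch(
            sub {
                my ( undef, $url, $depth, $flag ) = @_;
                push @html, $flag == 0 ? '<ul>' : $flag == 1 ? "<li>$url" : '</ul>';
            },
            '/', 0,
        );
        print join( '', @html ), "\n";
```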

SEE ALSO
        LWP::UserAgent
        HTML::Summary
        WWW::Robot

AUTHOR
    Steve Horsburgh <shorsburgh@horsburgh.com>

CREDITS
    This utility was inspired by the 1997 Sitemap.pm utility by Ave Wrigley
    <wrigley@cre.canon.co.uk>

COPYRIGHT
    Copyright (c) 2000, Horsburgh.com. All rights reserved.

    This script is free software; you can redistribute it and/or modify it
    under the terms of the GNU GPL (see the file COPYING).

