NAME
    nsite - tool for generating WWW site maps

SYNOPSIS
        nsite.pl 
            [ -verbose ] 
            [ -help ]
            [ -doc ]
            [ -version ]
            [ -depth <depth> ] 
            [ -proxy <proxy URL> ] 
            [ -[no]envproxy ] 
            [ -agent <agent> ]
            [ -authen ] 
            [ -format <html|text|xml|none> ] 
            [ -summary <number of chars> ] 
            [ -title <page title> ] 
            [ -email <e-mail address> ]
            [ -index ]
            [ -nolinks ]
            [ -stats <filename> ]
            [ -output <filename> ]
            [ -altstart <filename> ]
            -url <root URL> 

DESCRIPTION
    nSite generates site maps for a given WWW site. It walks a site from the
    root URL and generates an HTML, TEXT, or XML link page which illustrates
    the structure of the site.

    The HTML site map consists of the page url, title, unique fingerprint,
    summary, and list of internal and external links. The links are
    'clickable' with the internal links in blue and the external links in
    orange.

    The TEXT site map consists of the page url, title, and unique
    fingerprint.

    The XML site map is a list of XML <LINK>/<URL> structures.

    The structure reflects the depth of each listed page from the root page:
    the first-level bullets are pages accessible directly from the root
    page, the next level lists pages accessible from those pages, and so
    on. nSite assumes a typical breadth-first, top-down site structure, so
    pages may appear in a different order than originally intended.
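
    The level assignment described above can be illustrated with a short
    breadth-first walk. This is only a sketch in Python, not nSite's own
    Perl code: the function name assign_depths, the in-memory links table
    (standing in for fetched pages), and the max_depth parameter (mirroring
    -depth) are all assumptions made for the example.

```python
from collections import deque


def assign_depths(links, root, max_depth=None):
    """Return {page: depth} for pages reachable from root, breadth-first.

    links is a hypothetical dict mapping each page to the pages it links
    to. max_depth plays the role of -depth; None means unlimited.
    """
    depths = {root: 0}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        if max_depth is not None and depths[page] >= max_depth:
            continue  # do not expand pages already at the depth limit
        for target in links.get(page, []):
            if target not in depths:  # the first visit fixes a page's level
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Made-up site: "/b" is linked from both "/" and "/a", but it is listed
# at level 1 because the root reaches it first.
site = {
    "/": ["/a", "/b"],
    "/a": ["/b", "/c"],
    "/b": ["/"],
    "/c": ["/d"],
}
unlimited = assign_depths(site, "/")
limited = assign_depths(site, "/", max_depth=2)
```

    Because a page is placed at the level where it is first reached, a page
    that an author thinks of as belonging deep in one section can show up
    higher in the map if a shallower page also links to it.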

OPTIONS
  -url <root URL>

    Option to specify a root URL to generate a site map for. This option is
    required.

  -depth <depth>

    Option to specify the depth of the site map generated. If not specified,
    nSite will generate a site map of unlimited depth.

  -email <email address>

    Option to specify the email address that the robot reports to the sites
    it fetches pages from.

  -proxy <proxy URL>

    Specify an HTTP proxy to use.

  -[no]envproxy

    If -envproxy is set, the proxy specified by the $http_proxy environment
    variable will be used (this is the default behaviour). Use -noenvproxy
    to suppress this. -proxy takes precedence over -envproxy.
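
    The precedence between -proxy, -envproxy, and -noenvproxy can be
    summarised as a small decision function. This is a Python sketch of the
    rule as stated above, not nSite's actual code; the function name
    resolve_proxy and its parameters are hypothetical.

```python
import os


def resolve_proxy(proxy=None, envproxy=True, environ=None):
    """Pick the proxy URL to use, or None for a direct connection.

    proxy    -- value given with -proxy, if any
    envproxy -- True by default; False models -noenvproxy
    environ  -- mapping to read http_proxy from (defaults to os.environ)
    """
    environ = os.environ if environ is None else environ
    if proxy:
        return proxy  # -proxy takes precedence over the environment
    if envproxy:
        return environ.get("http_proxy") or None
    return None


env = {"http_proxy": "http://proxy.example.com:8080/"}
from_env = resolve_proxy(environ=env)
explicit = resolve_proxy(proxy="http://other.example.com:3128/", environ=env)
suppressed = resolve_proxy(envproxy=False, environ=env)
```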

  -agent <agent>

    Allows the user to specify an agent for the robot to pretend to be (e.g.
    'Mozilla/4.5'). This can be necessary for sites that do browser sniffing
    for serving particular content, etc.
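
    For readers unfamiliar with how a robot identifies itself, the sketch
    below shows the general HTTP mechanism using Python's standard library.
    It assumes that the agent string travels in the User-Agent header and
    that the -email address travels in the From header, which is the usual
    convention for robots; the text above does not say which headers nSite
    itself sends, and build_request is a hypothetical helper.

```python
import urllib.request


def build_request(url, agent=None, email=None):
    """Build (but do not send) a request carrying identification headers."""
    headers = {}
    if agent:
        headers["User-Agent"] = agent  # what the robot claims to be
    if email:
        headers["From"] = email  # contact address (assumed header)
    return urllib.request.Request(url, headers=headers)


req = build_request(
    "http://www.example.com/",
    agent="Mozilla/4.5",
    email="webmaster@example.com",
)
# urllib stores header names capitalised as "User-agent" and "From".
sent_headers = dict(req.header_items())
```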

  -format <formatting option>

    Option for specifying the output format of the site map. Possible values
    are

    html
        Simple HTML bulleted list (default). Consists of the page url,
        title, unique fingerprint, summary, and list of internal and
        external links. The links are 'clickable' with the internal links in
        blue and the external links in orange.

    text
        Plain text with indenting. Consists of the page url, title, and a
        unique fingerprint.

    xml An XML graph of linkage between pages. Consists of a list of XML
        <LINK>/<URL> structures.

    none
        Do not output the site map. Useful when you only want the stats
        file (see -stats).

  -summary <number of chars>

    Automatically extract a summary to display with the title. This will be
    truncated at the specified number of characters (default: 200). To
    disable the summary display, set the number of chars to -1.
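
    The truncation rule can be sketched as follows. This Python fragment
    only illustrates the documented behaviour (cut at the given length, -1
    disables); how nSite extracts the summary text from a page is not
    described here, so the whitespace collapsing and the name make_summary
    are assumptions.

```python
def make_summary(text, max_chars=200):
    """Return text cut to max_chars characters, or None if disabled."""
    if max_chars < 0:
        return None  # -summary -1 turns the summary off
    collapsed = " ".join(text.split())  # assumption: squeeze whitespace
    return collapsed[:max_chars]


short = make_summary("Welcome   to\nour site", max_chars=10)
default_cut = make_summary("x" * 500)
disabled = make_summary("Welcome to our site", max_chars=-1)
```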

  -title <page title>

    Option to specify a page title for the site map.

  -authen

    Option to use LWP::AuthenAgent to get HTML pages. This allows the user
    to type a username / password for pages that are access controlled.

  -index

    Option to display an index (table of contents) for the site map.

  -nolinks

    Option to disable the display of the internal and external links for
    each page in the site map.

  -altstart <filename>

    Option to start the mapping at a specific file instead of the default
    index file.

  -stats <filename>

    Option to output a statistics file with lines containing the following:

     URL<tab>FINGERPRINT<tab>NUMBER_OF_LINKS<tab>DEPTH<tab>TITLE.
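
    A stats file in this layout is easy to post-process. The Python sketch
    below assumes one record per line, that the link count and depth are
    integers, and that the trailing period shown above is punctuation
    rather than part of each line; StatsRecord, parse_stats_line, and the
    sample line (including its fingerprint value) are made up for the
    example.

```python
from typing import NamedTuple


class StatsRecord(NamedTuple):
    url: str
    fingerprint: str
    number_of_links: int
    depth: int
    title: str


def parse_stats_line(line):
    """Split one tab-separated stats line into a StatsRecord."""
    # Split on the first four tabs only, so a title containing a tab
    # still ends up whole in the last field.
    url, fingerprint, n_links, depth, title = line.rstrip("\n").split("\t", 4)
    return StatsRecord(url, fingerprint, int(n_links), int(depth), title)


sample = "http://www.example.com/\tabc123\t12\t0\tHome Page\n"
record = parse_stats_line(sample)
```

    Combined with -format none, this makes it possible to collect per-page
    figures without producing the site map itself.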

  -output <filename>

    Option to output the site map to a file. (Defaults to standard output.)

  -help

    Display a help message to standard output, with a brief description of
    nSite and its command-line switches.

  -doc

    Display the full documentation for nSite, generated from the embedded
    pod format documentation.

  -version

    Print out the current version number for nSite.

  -verbose

    Turn on verbose messages.

ENVIRONMENT
    nSite makes use of the `$http_proxy' environment variable, if it is set.

PREREQUISITES
        HTML::Entities
        Getopt::Long
        LWP::AuthenAgent
        LWP::UserAgent
        Pod::Usage

BUGS
    XML support is very basic. nSite has been tested only on some Linux,
    Windows, and Irix systems.

AUTHOR
    Steve Horsburgh <shorsburgh@horsburgh.com>

CREDITS
    This script is based on the 1997 sitemapper.pl script by Ave Wrigley
    <wrigley@cre.canon.co.uk>

COPYRIGHT
    Copyright (c) 2000, Horsburgh.com. All rights reserved.

    This script is free software; you can redistribute it and/or modify it
    under the GNU GPL (see the file COPYING).

