==============================================================================
README for squid-gw 1.3                                             2000-05-12
==============================================================================

Copyright 1997-2000 by Eberhard Mattes <em-gw@windhager.de>
Donated to the public domain.  No warranty.


Introduction
============

squid-gw is an add-on for the TIS Firewall Toolkit 2.0; it is an HTTP
proxy which forwards all requests to another HTTP proxy like Squid.

The basic design policy of squid-gw is to reject anything which is not
explicitly allowed.  For instance, any HTTP headers squid-gw does not
know about are removed.  In particular, HTML documents are parsed and
completely rewritten into a form which all browsers should understand
and treat the same way.  Thus, it should not be possible to trick the
browser into executing JavaScript by exploiting differences between
the way the firewall and the browser parse HTML.  However, there are
too many places in which web browsers interpret the input as
JavaScript (style sheets!), so it is not guaranteed that JavaScript
can be completely blocked.  (The syntax of the generated HTML is
described in html.c.  Note that you can configure squid-gw to deviate
from that syntax, but doing so is not recommended.)

If you think that the HTML rewriting done by squid-gw violates the
copyright of the authors of HTML documents, don't use squid-gw (or
disable HTML rewriting).

squid-gw doesn't support shttp.

squid-gw does not support persistent HTTP/1.1 connections; this can be
considered to be a security feature.


Installation
============

1. After installing the TIS Firewall Toolkit, unpack squid-gw.tar.gz
   or em-gw.tar.gz into the main directory of the TIS Firewall Toolkit:

        cd /sources/fwtk
        gunzip </dist/em-gw.tar.gz | tar xf -

2. If your `make' tool requires `.include' instead of `include',
   replace `include' with `.include' in squid-gw/Makefile and
   libem/Makefile.  (This can also be done by running `fixmake' of the
   TIS Firewall Toolkit.)

3. Compile the libraries and the program:

        cd libem
        make
        cd ../squid-gw
        make

4. Copy squid-gw to the target directory:

        cp squid-gw /usr/local/etc

   (see the definition of DEST in Makefile.config for the target
   directory) and copy squid-log and squid-top to a directory in
   $PATH:

        cp squid-log squid-top /usr/local/bin

5. Set up Squid or another HTTP proxy.  For security reasons, you
   should _not_ put that HTTP proxy on the gateway host; better put it
   on a host in the DMZ.

6. Configure squid-gw by editing netperm-table, see below.

7. Start squid-gw (don't forget to arrange for squid-gw to be started
   automatically when the system comes up), see below.


Compilation problems
====================

If your system does not have vsnprintf(), add

  #define DONT_HAVE_VSNPRINTF

to firewall.h and retry.  squid-gw requires an ANSI/ISO C compiler and
a POSIX.1 operating system.  If you don't have these, you have to port
squid-gw to your system yourself.


Configuration
=============

squid-gw is -- like the TIS FWTK programs -- configured by rules in
"netperm-table".  For global configuration, it reads all rules using
the "squid-gw" and "*" keywords (however, see <KEY> below).
Additionally, you can define configuration classes and assign them to
individual clients based on IP address or browser.  For configuration
class "foo", squid-gw reads all rules using the "squid-gw-foo" and "*"
keywords (however, see <KEY> below).  Some attributes should be used
only in the global configuration of squid-gw.  Note that
"netperm-table" must be readable by the user ID and group ID
configured with the "userid" and "groupid" attributes; otherwise,
configuration classes won't work.

Configuration classes are applied in the following sequence:

1.  Global configuration
2.  Client-specific class from "hosts" attribute
3.  Destination-specific class from "destinations" attribute
4.  Client-specific class from "browsers" attribute

By default, later classes override values provided by earlier classes,
that is, "browsers"'s classes override "destinations"'s classes which
in turn override "hosts"'s classes which in turn override the global
configuration.

However, for some attributes, you can specify a level by using the
-force option.  Values provided by higher levels always override
values provided by lower levels, no matter in what sequence the values
are provided.  For values of the same level, the sequence matters as
described above.  Each occurrence of the -force option increases the
level of the succeeding values of the line by one.  The default level
is zero.
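
As a sketch of how levels accumulate within a single line (the class
name "demo" is made up for illustration):

  squid-gw-demo:         allow cookies -force java -force javascript

Read according to the rule above, "cookies" stays at level 0, "java"
is at level 1, and "javascript" is at level 2.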

Suppose that you want to enable JavaScript for certain browsers
("User-Agent: SafeBrowser").  For certain other browsers ("User-Agent:
GoodBrowser"), you want to enable JavaScript only when visiting a
certain site ("http://safesite").  For all other browsers, you want to
disable JavaScript.  The following configuration will implement that
policy:

  squid-gw:              block javascript

  squid-gw:              destinations http://safesite  -class safesite
  squid-gw:              destinations *

  squid-gw-safesite:     allow javascript

  squid-gw:              browsers SafeBrowser -class safebrowser
  squid-gw:              browsers GoodBrowser
  squid-gw:              browsers *           -class otherbrowser

  squid-gw-safebrowser:  allow -force javascript
  squid-gw-otherbrowser: block -force javascript

This configuration has been written in a way which does not depend on
the sequence in which the classes are applied (except for the global
configuration).  If there is a site for which you want to block
JavaScript even for SafeBrowser, just insert

  squid-gw:              destinations http://evilsite  -class evilsite

before the "destinations *" line and insert

  squid-gw-evilsite:     block -force -force javascript

anywhere.

As the "browsers" and "destinations" attributes can be used for global
configuration only, not for configuration classes, the "-hosts" option
has been provided for these attributes to be able to implement a
policy like the above but depending on the _client_ address instead of
the _destination_ address.  See `The "-hosts" option' for details.
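
A minimal sketch of such a client-based policy, assuming that a plain
IP address is a valid <EXTENDED-HOST-PATTERN> (as in the "html-filter"
example) and using the made-up class name "trusted":

  squid-gw:              block javascript

  squid-gw:              browsers * -class trusted -hosts 10.0.0.1
  squid-gw:              browsers *

  squid-gw-trusted:      allow -force javascript

For client 10.0.0.1 the first "browsers" line matches and the class
"trusted" is read; for all other clients that line is ignored and
the second "browsers" line matches without assigning a class.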


Global configuration
====================

squid-gw understands the following attributes for global configuration:

  browsers <BROWSER-PATTERN>...
           [ -class <CLASSNAME>... ]
           [ -hosts <EXTENDED-HOST-PATTERN>... ]

     Specify access permissions based on the value of the User-Agent
     HTTP header field (that is, based on the (pretended) type of the
     browser).  If there is no User-Agent HTTP header field, squid-gw
     pretends that the value is "no-user-agent" (for the GET, HEAD,
     and POST request methods) or "no-connect-user-agent" (for the
     CONNECT request method).  The "browsers" attribute comes in two
     flavours: "permit-browsers" for permitting access for matching
     browsers and "deny-browsers" for denying access for matching
     browsers.  "browsers" is equivalent to "permit-browsers".  If
     there are multiple "browsers" lines, all of them will be read,
     stopping at the first matching line.  If the -class option is
     used, all specified configuration classes will also be read for
     attributes, overriding the global configuration.  If at least one
     "-hosts" option is used, the client must match the "-hosts"
     options for the rule line to take effect, see chapter `The
     "-hosts" option'.  If the client does not match, the rule line
     will be ignored and squid-gw will try the next line.  If there is
     no matching line, access will be denied.  The -class option is
     useless for deny-browsers.  <BROWSER-PATTERN> can contain "*" to
     match any sequence of characters.  Letter case is ignored.  Don't
     forget to use quotes if <BROWSER-PATTERN> contains blanks!

     Example:

          squid-gw: permit-browsers Wget* -class wget
          squid-gw: deny-browsers no-user-agent no-connect-user-agent
          squid-gw: permit-browsers "*"

     The first line permits access for Wget and reads configuration
     class "wget".  The second line denies access if the User-Agent
     header field is missing.  The third line permits access for all
     other browsers.

  connect <HOSTNAME-PATTERN>[:<PORT>]... [ -ipaddr <IP-ADDRESS>... ]
      [ -delay <SECONDS> ] [ -message <MESSAGE> ]
      [ -hosts <EXTENDED-HOST-PATTERN>... ]

     Specify access permissions for https (SSL) based on the
     destination, that is, on the destination host.  (See the
     "destinations" attribute for access permissions for plain HTTP
     URLs.)  The "connect" attribute comes in two flavours:
     "permit-connect" for permitting access for matching destinations
     and "deny-connect" for denying access for matching destinations.
     "connect" is equivalent to "permit-connect".  If there are
     multiple "connect" lines, all of them will be read, stopping at
     the first matching line.  If there is no matching line, access
     will be denied.

     For a "connect" line to match, all of the following conditions
     must be met: at least one <HOSTNAME-PATTERN> and its <PORT> match
     the destination address; if "-ipaddr" is used, at least one
     <IP-ADDRESS> matches the destination address; if "-hosts" is
     used, at least one <EXTENDED-HOST-PATTERN> matches the client
     address.

     There must be at least one <HOSTNAME-PATTERN>, which will be
     matched against the hostname (or IP address) specified by the
     client (i.e., the host to which a tunnelled connection is
     requested).  The character "*" in <HOSTNAME-PATTERN> matches any
     number of characters, including zero characters.  If the port
     number is omitted, 443 (https/SSL) will be assumed.  If "*" is
     specified for <PORT> (not recommended), any port number will
     match.

     If "-ipaddr" is used, at least one <IP-ADDRESS> must match an IP
     address of the host requested by the client.  You can use "*" in
     <IP-ADDRESS>. Note that using "-ipaddr" turns <HOSTNAME-PATTERN>
     mostly into a comment (except for the port number).  If you want
     to match by IP address only, just use "*" as <HOSTNAME-PATTERN>:

          squid-gw: permit-connect * -ipaddr 1.2.3.4

     As <IP-ADDRESS> doesn't include a port number, the port number is
     specified by <PORT>:

          squid-gw: permit-connect *:5555 -ipaddr 1.2.3.4

     It is highly recommended to use "-ipaddr" to make it harder for
     attackers to redirect connections.

     The "-delay" option can be used to slow down access to sites
     which are not required for work.

     For "deny-connect", you can define the error message with the
     "-message" option.  If no "-message" option is given, the message
     will be "Forbidden".  The "-delay" option can be used to cause a
     delay before sending the response.

     If at least one "-hosts" option is used, the client must match
     the "-hosts" options for the rule line to take effect, see
     chapter `The "-hosts" option'.  If the client does not match, the
     rule line will be ignored and squid-gw will try the next line.

     IMPORTANT: You should permit CONNECT only to trusted sites, not
     to the Internet at large.  squid-gw doesn't perform any content
     filtering for CONNECT!  You cannot block ActiveX, Java, and
     JavaScript!

     Example:

          squid-gw: connect www.example.com -ipaddr 1.2.3.4 1.2.3.5

     This line grants SSL access (port 443) to www.example.com if (and
     only if) that hostname resolves to IP address 1.2.3.4 or 1.2.3.5.
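
     A further sketch, combining the options described above (the
     message text is made up for illustration):

          squid-gw: permit-connect www.example.com -ipaddr 1.2.3.4
          squid-gw: deny-connect *:* -message "CONNECT not permitted" -delay 5

     The first line permits tunnels to www.example.com on port 443
     if that hostname resolves to 1.2.3.4.  Every other CONNECT
     request would be denied anyway because no line matches; the
     second line only replaces the default message "Forbidden" and
     delays the response by 5 seconds.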

  destinations [METHOD] <URL-PATTERN>...
               [ -class <CLASSNAME>... ] [ -delay <SECONDS> ]
               [ -message <MESSAGE> ] [ -redir <URL> ]
               [ -hosts <EXTENDED-HOST-PATTERN>... ]

     Specify access permissions based on the destination, that is, on
     the requested URL (for the GET, HEAD, and POST request methods;
     see the "connect" attribute for the CONNECT request method.)  The
     "destinations" attribute comes in two flavours:
     "permit-destinations" for permitting access for matching
     destinations and "deny-destinations" for denying access for
     matching destinations.  "destinations" is equivalent to
     "permit-destinations".  If there are multiple "destinations"
     lines, all of them will be read, stopping at the first matching
     line.  If there is no matching line, access will be permitted(!).

     For permit-destinations, you can specify configuration classes
     with the -class option: all configuration classes specified for
     the first matching "permit-destinations" line will be read for
     attributes, overriding the global configuration.  The -delay
     option can be used to slow down access to sites which are not
     required for work.  The -class option is ignored for
     "deny-destinations".

     For deny-destinations, you can define the error message with the
     -message option.  If no -message option is given, the message
     will be "Forbidden".  Alternatively, you can redirect the request
     to another URL by using the -redir option.  This can be used for
     producing the local conditions of use.  The URL should point to
     an internal web server; to avoid recursion, the URL must not
     match a deny-destinations rule which uses the -redir option.  The
     -delay option can be used to cause a delay before sending the
     response.

     <METHOD> is one of "GET", "HEAD", and "POST".  The line is
     considered only if the request method is <METHOD>.  If <METHOD>
     is omitted (that is, the first word is not one of "GET", "HEAD",
     and "POST"), the line applies to all request methods.  Note that
     upper-case letters must be used for <METHOD>.

     If at least one "-hosts" option is used, the client must match
     the "-hosts" options for the rule line to take effect, see
     chapter `The "-hosts" option'.  If the client does not match, the
     rule line will be ignored and squid-gw will try the next line.

     See chapter `URL patterns' for details on <URL-PATTERN>.

     IMPORTANT: You cannot rely on deny-destinations for prohibiting
     access to certain sites -- there are too many ways for working
     around such restrictions!  However, it's good enough for blocking
     ads and other unwanted stuff.

     Example:

          squid-gw: deny-destinations http*://*.*.*.*
          squid-gw: deny-destinations http*:*/chat*?* -message "Don't chat!"
          squid-gw: deny-destinations POST http*:*/chat* -message "Don't chat!"
          squid-gw: deny-destinations *:*sex* -redir "http://int-www/use.html"
          squid-gw: destinations http://www.myowndomain.com
          squid-gw: destinations * -class notsafe -delay 1
          squid-gw-notsafe: block -force java javascript object

     Explanation: The first line denies access if an IP address is
     given as host.  All serious web sites should be accessible by
     name.

     The next two lines attempt to prevent users from chatting: The
     first of them disables all URLs which contain "/chat" in the path
     and which contain a query (that's the part starting with "?").
     The second disables POST requests to URLs which contain "/chat"
     in the path.  As there's no host part, this applies to all web
     servers.  Note that we allow a GET request to, say,
     "http://foo.bar/icons/chat.gif" (there's no query part).

     The fourth line is an attempt at reminding users of local
     conditions of Internet use.

     The last three lines block Java, JavaScript, and ActiveX for all
     sites but www.myowndomain.com.

  groupid <GROUP>

     Run with group ID <GROUP>.  Only the first "groupid" line is
     read.  Note that "netperm-table" must be readable by the user ID
     and group ID configured with the "userid" and "groupid"
     attributes; otherwise, configuration classes won't work.

     Example:

          squid-gw: groupid nogroup

  hosts <HOST-PATTERN>... [ -class <CLASSNAME>... ]

     Specify access permissions for clients based on IP address.  The
     "hosts" attribute comes in two flavours: "permit-hosts" for
     permitting access for matching hosts and "deny-hosts" for denying
     access for matching hosts.  "hosts" is equivalent to
     "permit-hosts".  If there are multiple "hosts" lines, all of them
     will be read, stopping at the first matching line.  If the -class
     option is used, all specified configuration classes will also be
     read for attributes, overriding the global configuration.  The
     -class option is useless for deny-hosts.  See also the "-hosts"
     option of the "browsers" and "destinations" attributes.

     Example:

          squid-gw: permit-hosts 199.99.99.*
          squid-gw: permit-hosts 127.0.0.1 -class local
          squid-gw: deny-hosts 199.99.99.1

  href <URL-PATTERN>...

     Specify what URLs are allowed in HTML attributes such as HREF and
     SRC which take a URL as value.  The "href" attribute comes in
     two flavours: "permit-href" for permitting matching URLs and
     "deny-href" for rejecting matching URLs.  "href" is equivalent to
     "permit-href".  If there are multiple "href" lines, all of them
     will be read, stopping at the first matching line.  If there is
     no matching line for a URL, the URL will be rejected.  As there
     is no default value, configuring "href" is required to enable
     hyperlinks.  The main purpose of "href" is to prevent browsers
     from accessing local services:

          <img src="http://localhost:19">

     Unfortunately, the <URL-PATTERN> syntax is not yet powerful
     enough.  For controlling access to external WWW sites, you should
     use "destinations" instead of "href".

     See chapter `URL patterns' for details on <URL-PATTERN>.

     Example:

          squid-gw: deny-href *:*://localhost*
          squid-gw: deny-href *:*://127.*
          squid-gw: deny-href *:*.mydomain.com*
          squid-gw: permit-href http: https: ftp:// gopher://
          squid-gw: permit-href javascript: news: mailto:

  userid <USER>

     Run with user ID <USER>.  Only the first "userid" line is read.
     Note that "netperm-table" must be readable by the user ID and
     group ID configured with the "userid" and "groupid" attributes;
     otherwise, configuration classes won't work.

     Example:

          squid-gw: userid nouser


Configuration classes
=====================

squid-gw understands the following attributes for global configuration
and for configuration classes:

  allow <WHAT>...
  block <WHAT>...

     By default, squid-gw blocks (filters) anything it considers
     dangerous.  You can change the blocking policy of squid-gw by
     using the "allow" and "block" attributes.  Only the first "allow"
     line is read and only the first "block" line is read.  The
     following keywords are available for <WHAT>:

          cookies       Cookies
          embed         <EMBED> HTML tag
          java          Java applets (<APPLET> HTML tag)
          javascript    JavaScript (and other) scripts
          object        ActiveX objects (<OBJECT> HTML tag)
          style         Style sheets

     Moreover, the -force option can be used for assigning a level to
     the values following that option.

     To enable JavaScript, you also have to enable "javascript:" in
     HREF attributes, see "href".  See also "script".  To enable
     ActiveX, you also have to enable "clsid:" in HREF attributes.

     Example:

          squid-gw-nofilter: allow cookies java javascript object

     Note that there are other ways than cookies to track users.  For
     instance, user-specific information can be encoded in URLs.

     Note that there's no point in blocking Java and ActiveX if you
     allow JavaScript as JavaScript can be used to build HTML pages on
     the fly.

     Note that for recent web browsers, allowing style sheets means
     allowing JavaScript (which in turn means allowing ActiveX)!  A
     future version of squid-gw may attempt to filter JavaScript from
     style sheets.
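
     A sketch of enabling JavaScript and ActiveX together with the
     matching URL schemes, assuming that "clsid:" is written as a
     <URL-PATTERN> in the same way as "javascript:" (the class name
     "trusted" is made up for illustration):

          squid-gw:         permit-href javascript: clsid:
          squid-gw-trusted: allow javascript object

     As "href" is listed among the global attributes, the
     "permit-href" line applies to all clients, not just to those
     assigned to class "trusted".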

  auto-html-limit <LIMIT>

     Microsoft Internet Explorer doesn't strictly pay attention to
     Content-Type: it treats anything which has <HTML>, <HEAD>, or
     <BODY> near the beginning as HTML document.  "Near the beginning"
     means that these tags can be preceded by up to 196 arbitrary
     bytes.  This brain damage enables bad guys to bypass HTML
     filtering just by using any Content-Type different from
     text/html.  Therefore, squid-gw looks at the first <LIMIT> bytes
     of any HTTP body received from the server, no matter what
     Content-Type is received.  If it finds "<HTML", "<HEAD", or
     "<BODY" within the first <LIMIT> bytes, HTML filtering will be
     applied.  Of course, HTML rewriting will break plain text files
     (text/plain) and binary files (e.g., image/gif) which happen to
     contain these bytes near the beginning.  Note that some ZIP files
     contain HTML markup in the ZIP comment -- MSIE will display those
     ZIP files as HTML...  If you are sure that nobody uses MSIE, you
     can prevent squid-gw from scanning for "<HTML" etc. by
     configuring

          squid-gw: auto-html-limit 0

     Of course, you can use configuration classes to make the value of
     auto-html-limit depend on the browser.  The default value is 256;
     the smallest value suitable for MSIE (according to limited
     testing with MSIE 3.0 for Windows 95) seems to be 196+5 = 201.
     You can use values up to 4096.  Of course, there is no guarantee
     that this prevents MSIE from interpreting certain bodies as HTML.
     If you find out more about this MSIE brain damage, please tell
     the author of squid-gw.  The set of strings to search for can be
     changed in the source code only for now (auto_html_tags in
     http.c).
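
     A sketch of making the limit depend on the browser, using the
     made-up class name "wget" for a client which does not render
     HTML:

          squid-gw:      browsers Wget* -class wget
          squid-gw:      browsers *
          squid-gw-wget: auto-html-limit 0

     All other browsers keep the default value of 256.  Keep in mind
     that the User-Agent value is supplied by the client and can be
     faked.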

  client-timeout <TIMEOUT>

     Define the number of seconds squid-gw is idle (without network
     activity) before disconnecting from the client.  The default
     value is 2 hours (PROXY_TIMEOUT in firewall.h).  Only the first
     "client-timeout" line is read.  For the tunnelled connection of
     the CONNECT request method, "client-timeout" is ignored and
     "server-timeout" applies to both the client and the server.

     Example:

          squid-gw: client-timeout 300

  cookies <MODE>

     Define how to parse cookies.  Only the first "cookies" line is
     read.  There are four possible values for <MODE>:

          netscape        relaxed parsing as in Netscape's cookie spec
          netscape-quote  relaxed parsing, quote non-strict values
          nocheck         don't parse cookies at all
          rfc2109         strict parsing according to RFC 2109

     At this time, I have not yet encountered any HTTP server which
     uses RFC 2109 syntax for cookies.  Therefore, the default value
     is "netscape".

     Example:

          squid-gw: cookies nocheck

  drop-meta-content-type [-force]... <ON_OFF>

     Some web browsers have problems with

        <META HTTP-EQUIV=Content-Type CONTENT="text/html; charset=iso-8859-1">

     You can use this attribute to tell squid-gw to drop any META tags
     having HTTP-EQUIV=Content-Type.  Only the first
     "drop-meta-content-type" line is read.  The following keywords
     are available for <ON_OFF>:

          off           Don't drop <META HTTP-EQUIV=Content-Type>
          on            drop <META HTTP-EQUIV=Content-Type>

     The default setting is "off". Moreover, the -force option can be
     used for assigning a level to the value following that option.

     Example:

          squid-gw-netscape: drop-meta-content-type on

  html-attribute-length <LIMIT>

     Limit the length of the attribute values to <LIMIT> octets.  The
     default value is 1024 (cf. RFC 1866).  However, there are HTML
     documents which use longer attribute values.  Note that setting
     <LIMIT> very high wastes lots of memory.

     Example:

          squid-gw: html-attribute-length 32768

  html-attributes <TYPE>:<POLICY>[/log] ...

     Configure how to handle suspect attributes of tags in HTML
     documents (e.g., "<A HREF="javascript:") received from the
     server.  If there are multiple "html-attributes" lines, all of
     them will be read.  <POLICY> applies to attributes classified as
     <TYPE>.  If "/log" is given, squid-gw will log any attribute
     classified as <TYPE>.  The following keywords are available for
     <TYPE>:

          alphanumeric  unknown attribute with a value which consists
                        solely of characters a-z, A-Z, and 0-9 and whose
                        name does not start with "on"
          dangerous     dangerous attribute
          novalue       unknown attribute without value and whose name does
                        not start with "on"
          on            unknown attribute whose name starts with "on"
          unknown       unknown attribute

     The classification of some attributes can be configured (see
     "allow"), the classification of the other attributes can be
     changed in the source code only.  Currently, squid-gw knows about
     HTML 2.0, HTML 3.2, HTML 4.0 (draft of 1997-07-08), some Netscape
     and Microsoft extensions, and some common typos and other errors.
     You might want to log unknown attributes to be able to add them
     to the table of known attributes (and report them to the
     author of squid-gw).  The following keywords are available for
     <POLICY>:

          copy          don't remove the attribute [ANOU]
          drop          completely remove the attribute [ADNOU]
          prefix        prefix the attribute name ("REMOVED-foobar") [ADNOU]

     (The letters in brackets indicate for which <TYPE>s the keyword
     can be used.)  It is not recommended to use "on:copy" or
     "unknown:copy".

     Any <TYPE>s not listed in the attribute are not changed.  The
     default setting is

          alphanumeric:prefix
          dangerous:prefix
          novalue:copy
          on:prefix/log
          unknown:prefix

     Example:

          squid-gw: html-attributes dangerous:prefix/log
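
     A sketch with several <TYPE>:<POLICY> pairs on one line, logging
     unknown attributes as suggested above:

          squid-gw: html-attributes unknown:prefix/log on:drop/log

     The types not listed here ("alphanumeric", "dangerous", and
     "novalue") keep their default settings.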

  html-filter [-force]... <ON_OFF>

     By default, squid-gw filters HTML documents.  However, you can
     configure squid-gw to pass through bodies of Content-Type
     text/html unmodified.  This will disable all HTML filtering such
     as removal of ActiveX, Java, and JavaScript.  Only the first
     "html-filter" line is read.  The following keywords are available
     for <ON_OFF>:

          off           Pass through HTML (this is not recommended)
          on            Filter HTML (default setting)

     Moreover, the -force option can be used for assigning a level to
     the value following that option.

     Example:

          squid-gw:      browsers Wget* -class wget -hosts 10.0.0.1
          squid-gw-wget: html-filter off

     These rules turn off HTML filtering for Wget on client 10.0.0.1
     only.

  html-meta <TYPE>:<POLICY>[/log] ...

     Configure how to handle suspect META tags in HTML documents
     received from the server.  If there are multiple "html-meta"
     lines, all of them will be read.  <POLICY> applies to META tags
     classified as <TYPE>.  If "/log" is given, squid-gw will log any
     META tag classified as <TYPE>.  The following keywords are
     available for <TYPE>:

          unknown       unknown name (e.g., "<meta name=foo content=bar>")

     Note that "unknown" applies only to META tags with a "NAME"
     attribute; it does not apply to META tags with an "HTTP-EQUIV"
     attribute which are always considered dangerous unless known to
     be benign; handling of unknown "HTTP-EQUIV" META tags is
     controlled by "html-tags dangerous" (which see).

     The set of unknown names can be changed in the source code
     (html-meta.tab) only.  Note that there is not much point in
     logging unknown names as web page authors and authoring tools
     invent way too many names of their own.  The following keywords
     are available for <POLICY>:

          comment       replace with comment [U]
          copy          don't remove the tag [U]
          drop          completely remove the tag [U]
          escape        escape the tag ("&lt;foobar&gt;") [U]
          prefix        prefix the tag name ("<REMOVED-foobar>") [U]

     (The letters in brackets indicate for which <TYPE>s the keyword
     can be used.)  It is not recommended to use "unknown:copy".

     Any <TYPE>s not listed in the attribute are not changed.  The
     default setting is

          unknown:prefix

     Example:

          squid-gw: html-meta unknown:drop

  html-references <TYPE>:<POLICY>[/log] ...

     Configure how to handle suspect numeric character references
     ("&#500;") and entity references ("&foobar;") in HTML documents
     received from the server.  If there are multiple
     "html-references" lines, all of them will be read.  <POLICY>
     applies to references classified as <TYPE>.  If "/log" is given,
     squid-gw will log any reference classified as <TYPE>.  The
     following keywords are available for <TYPE>:

          unknown       unknown references (e.g., "&foobar;")

     The classification of references can be changed in the source
     code only.  Currently, squid-gw knows about HTML and ISO-Latin-1
     entity references.  Numeric character references are classified
     by value: 0-255 are known, 256-65535 are unknown, 65536 and
     greater are invalid.  The following keywords are available for
     <POLICY>:

          copy          don't remove the reference [U]
          drop          completely remove the reference [U]
          escape        escape the reference ("&amp;foobar;") [U]

     (The letters in brackets indicate for which <TYPE>s the keyword
     can be used.)  For security, it is not recommended to use
     "unknown:copy".  Note that "unknown:drop" is a bad idea as a lot
     of HTML documents do not properly escape "&", in particular in
     URLs ("/cgi-bin/foobar?a=b&c=d").

     The default setting is "unknown:escape".  Any <TYPE>s not listed
     in the attribute are not changed.

     Example:

          squid-gw: html-references unknown:drop/log
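
     As "unknown:drop" damages URLs containing an unescaped "&", a
     more conservative sketch keeps the default policy and merely
     adds logging:

          squid-gw: html-references unknown:escape/log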

  html-tags <TYPE>:<POLICY>[/log] ...

     Configure how to handle suspect tags ("<FOOBAR>") in HTML
     documents received from the server.  If there are multiple
     "html-tags" lines, all of them will be read.  <POLICY> applies to
     tags classified as <TYPE>.  If "/log" is given, squid-gw will log
     any tag classified as <TYPE>.  The following keywords are
     available for <TYPE>:

          dangerous     dangerous tag (e.g., "<APPLET>" when blocking Java)
          invalid       invalid tag (e.g., "<FOOBAR%>")
          unknown       unknown tag (e.g., "<foobar>")

     The classification of some tags can be configured (see "allow"),
     the classification of the other tags can be changed in the source
     code only.  Currently, squid-gw knows about HTML 2.0, HTML 3.2,
     HTML 4.0 (draft of 1997-07-08) and some Netscape and Microsoft
     extensions.  You should log unknown tags to be able to add them
     to the table of known tags (and report them to the author of
     squid-gw).  The following keywords are available for <POLICY>:

          comment       replace with comment [DIU]
          copy          don't remove the tag [U]
          drop          completely remove the tag [DIU]
          escape        escape the tag ("&lt;foobar&gt;") [DIU]
          prefix        prefix the tag name ("<REMOVED-foobar>") [DU]

     (The letters in brackets indicate for which <TYPE>s the keyword
     can be used.)  It is not recommended to use "unknown:copy".

     Any <TYPE>s not listed in the attribute are not changed.  The
     default setting is

          dangerous:prefix
          invalid:escape
          unknown:prefix

     Example:

          squid-gw: html-tags dangerous:prefix/log invalid:drop/log
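
     The effect of the <POLICY> keywords on a single tag can be
     pictured with the following Python sketch.  It is an illustration
     only, not squid-gw's C code: the function name is made up, only
     opening tags are handled, and the text emitted for "comment" is
     an assumption (this README does not document it).

```python
import html


def apply_tag_policy(tag: str, policy: str) -> str:
    """Return what replaces an opening tag such as "<foobar>" under `policy`."""
    if policy == "copy":
        return tag  # pass the tag through unchanged
    if policy == "drop":
        return ""  # remove the tag completely
    if policy == "escape":
        # "<" becomes "&lt;" and ">" becomes "&gt;", so the browser shows text
        return html.escape(tag, quote=False)
    if policy == "prefix":
        # put "REMOVED-" in front of the tag name
        return "<REMOVED-" + tag[1:]
    if policy == "comment":
        # ASSUMPTION: placeholder text; the real comment format is not documented
        return "<!-- tag removed -->"
    raise ValueError("unknown policy: " + policy)


print(apply_tag_policy("<foobar>", "escape"))  # &lt;foobar&gt;
print(apply_tag_policy("<foobar>", "prefix"))  # <REMOVED-foobar>
```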

  http-fields <TYPE>:<POLICY>[/log] ...

     Configure how to handle suspect and dangerous HTTP header fields.
     If there are multiple "http-fields" lines, all of them will be
     read.  <POLICY> applies to header fields classified as <TYPE>.
     If "/log" is given, squid-gw will log any header field classified
     as <TYPE>.  The following keywords are available for <TYPE>:

          dangerous     dangerous header fields (e.g., "Location: javascript:")
          invalid       invalid header fields (e.g., ":" missing)
          privacy       header fields disturbing privacy (e.g., "From:")
          silent        header fields to be dropped silently
                        (e.g., "Content-Length:" for text/html)
          unknown       unknown header fields (e.g., "foobar:")

     The classification of header fields can be changed in the source
     code only.  You should log unknown header fields to be able to
     add them to the table of known header fields (and to report them
     to the author of squid-gw).  The following keywords are available
     for <POLICY>:

          copy          don't remove the header field [PU]
          drop          completely remove the header field [DIPSU]
          prefix        prefix the name with "REMOVED-" [DPUS]

     (The letters in brackets indicate for which <TYPE>s the keyword
     can be used.)  It is not recommended to use "unknown:copy".

     The default setting is "dangerous:prefix/log invalid:drop/log
     privacy:drop silent:drop unknown:prefix/log".  Any <TYPE>s not
     listed in the attribute are not changed.

     Example:

          squid-gw: http-fields dangerous:drop/log unknown:drop/log
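
     The rule that <TYPE>s not listed keep their current setting can
     be sketched in Python as follows.  The default table is taken
     from the description above; the function names and the tuple
     layout are assumptions made for this illustration, not part of
     squid-gw.

```python
# Defaults as documented: TYPE -> (POLICY, log flag)
DEFAULT_HTTP_FIELDS = {
    "dangerous": ("prefix", True),
    "invalid": ("drop", True),
    "privacy": ("drop", False),
    "silent": ("drop", False),
    "unknown": ("prefix", True),
}


def parse_http_fields(args: str, base=DEFAULT_HTTP_FIELDS) -> dict:
    """Overlay '<TYPE>:<POLICY>[/log] ...' on `base`; unlisted TYPEs are kept."""
    table = dict(base)
    for item in args.split():
        field_type, _, rest = item.partition(":")
        policy, _, flag = rest.partition("/")
        table[field_type] = (policy, flag == "log")
    return table


def apply_field_policy(line: str, policy: str):
    """Return the header line to pass on, or None if it is dropped."""
    if policy == "copy":
        return line
    if policy == "drop":
        return None
    if policy == "prefix":
        return "REMOVED-" + line  # prefix the field name
    raise ValueError("unknown policy: " + policy)


table = parse_http_fields("dangerous:drop/log unknown:drop/log")
print(table["unknown"])   # ('drop', True)
print(table["privacy"])   # ('drop', False)  (unchanged default)
```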

  log <EVENT>...

     Define what events to log.  If there are multiple "log" lines,
     all of them will be read.  The following keywords are available
     for <EVENT>:

          content-type

		log the Content-Type received from the server;
		this is useful for squid-log

          content-type-conflict

		if there are multiple Content-Type fields having
		different values, log the values received from the
		server; only the first value encountered will be
		passed on to the client, no matter how this option is
		configured

          incorrect-tags

		log incorrect tags such as "<a" at EOF which are
		removed unconditionally by squid-gw

          missing-content-type

		log instances of server responses with body but
		without Content-Type header field

          redirected

		log redirections caused by the -redir option of
		"deny-destinations"

          request

		log the HTTP request line received from the client;
		this is required by squid-top

          request-header

		log all lines of the HTTP request header received from
		the client; this can be used for debugging

          response-header

		log all lines of the HTTP response header received
		from the server; this can be used for debugging

          script-macros

		log all occurrences of (removed) script macros (that
		is, attributes containing embedded scripts, such as
		"&{script};").  If JavaScript is not blocked, script
		macros won't be logged

          simple-response

		log the first non-empty line of a simple-response
		(HTTP/0.9) from the HTTP server; this can be used for
		debugging

          tag-attribute-pairs

		log tag/attribute pairs which are not expected to be
		encountered, as the attribute (one which needs special
		attention, such as "ACTION") should occur only with
		certain tags

          unknown-content-type

		log unknown Content-Type values (which squid-gw replaces
		with "application/binary")

          user-agent

		log the value of the User-Agent request header line,
		if present.  This can be used for finding values for
		the "browsers" attribute without having to enable "log
		request-header".

     Example:

          squid-gw: log content-type request
          squid-gw-debug: log request-header

  referer <POLICY> [<URL-PATTERN>]

     Configure how to treat the "Referer" HTTP request header line.
     The following values are available for <POLICY>:

          drop

		Drop all "Referer" request lines unconditionally.
		This is the default value.

          keep-all

		Keep all "Referer" request lines unconditionally.

          keep-same-site

		Keep a "Referer" request line only if its host name
		matches the request's host name.

          keep-match

		Keep only those "Referer" request lines which match
		<URL-PATTERN>.

     Note that <URL-PATTERN> is used only for "keep-match".
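
     The four policies amount to a small decision function.  The
     Python sketch below is an illustration under assumptions: "same
     site" is taken to mean equal host names compared without regard
     to letter case, and URL pattern matching is delegated to a
     caller-supplied function (pattern_matches is a hypothetical
     name).

```python
from urllib.parse import urlsplit


def keep_referer(policy, referer, request_url, pattern_matches=None):
    """Return True if the Referer line should be passed on to the server."""
    if policy == "drop":
        return False  # the documented default
    if policy == "keep-all":
        return True
    if policy == "keep-same-site":
        # urlsplit() lower-cases the host name for us
        referer_host = urlsplit(referer).hostname
        request_host = urlsplit(request_url).hostname
        return referer_host is not None and referer_host == request_host
    if policy == "keep-match":
        # pattern_matches stands in for matching against <URL-PATTERN>
        return bool(pattern_matches(referer))
    raise ValueError("unknown policy: " + policy)


print(keep_referer("keep-same-site",
                   "http://www.example.com/a",
                   "http://WWW.EXAMPLE.COM/b"))  # True
```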

  server <HOST> <PORT>

     Forward to the HTTP proxy on host <HOST> port <PORT>.  <HOST> can
     be an IP address or a hostname.  This attribute is mandatory (at
     least one "server" must be in effect).  Only the first "server"
     line is read.

     Example:

          squid-gw: server 199.99.99.99 3128

  server-timeout <TIMEOUT>

     Define the number of seconds squid-gw is idle (without network
     activity) before disconnecting from the server.  The default
     value is 2 hours (PROXY_TIMEOUT in firewall.h).  Only the first
     "server-timeout" line is read.  For the tunnelled connection of
     the CONNECT request method, "server-timeout" applies to both the
     client and the server.

     Example:

          squid-gw: server-timeout 3600

  script <POLICY>

     Configure what to do with scripts (JavaScript), that is, with
     stuff between <SCRIPT> and </SCRIPT> and with HTML attributes
     (such as ONLOAD) containing scripts.  Only the first "script"
     line is read.  There are two choices for <POLICY>: "html" causes
     scripts to be treated as HTML; tags etc. will be parsed and
     rewritten; attributes containing scripts will be treated like
     ordinary attributes (escaping dangerous characters).  This breaks
     scripts which contain the characters "'", '"', "<", "&", or ">".
     "verbatim" causes scripts to be passed through verbatim.  This
     keeps scripts intact, but opens a security hole for browsers
     which do not understand the <SCRIPT> tag or which for some other
     reason interpret scripts as HTML.  Therefore, you should move the
     "script" configuration to a browser-dependent configuration
     class.  ("-force" is not yet available.)  The default setting is
     "html".  "script" is ignored if JavaScript is disabled (see
     "block").

     Example:

          squid-gw: script html
          squid-gw-netscape: script verbatim

  user-agent <NAME>

     Replace the value of any "User-Agent:" HTTP header field with
     <NAME>.  This can be used for improving privacy by hiding the
     types of browsers and operating systems.  Don't forget to use
     quotes if the value contains blanks!  Only the first "user-agent"
     line is read.

     Example:

          squid-gw: user-agent Mozilla/3.0

     Note that, according to the Wget 1.5.3 documentation, "Netscape
     Communications Corp. has claimed that false transmissions of
     Mozilla as the User-Agent are a copyright infringement, which
     will be prosecuted."  This is probably a trademark issue, not a
     copyright issue.


URL patterns
============

URL patterns are used for the "destinations" and "href" configuration
attributes.  There are three variants for <URL-PATTERN>:

(1) "*" -- this always matches any URL under test.

(2) "*:<PATTERN>" -- compare the entire URL under test without
    interpretation to <PATTERN>.  Letter case is ignored.  "*" in
    <PATTERN> matches any number of characters.

(3) a URL -- compare the URL under test part by part to the pattern
    URL.  The special scheme "http*:" matches "http:" and "https:".

For variant (3), missing parts (e.g., port number, path, query) of the
pattern URL match any corresponding value of the URL under test (i.e.,
missing parts are ignored).  Letter case is ignored (even for the
path!).  "*" can be used for matching any number of characters in the
following parts of the URL: user (ftp), password (ftp), host, path,
query (http), fragment (http), type (ftp).  If the host part looks
like an IP address (i.e., it contains four fields separated by dots,
each field containing either "*" or a number between 0 and 255), the
host part of the pattern matches IP addresses only.  Currently there are
some restrictions in URL matching which may be lifted in future
releases:

- The host name is compared as a string; names and IP addresses are
  not resolved (and thus never match).

- "/./" and "/../" in the path are not treated specially.

- Some schemes such as "ftp:" must be followed by "//"; alternatively,
  you can use "*:ftp:".

- Hexadecimal escapes ("%ff") cannot be used in a pattern URL.
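
Variants (1) and (2) and the "looks like an IP address" test can be
sketched in Python as follows.  This is an illustration of the rules
stated above, not squid-gw's matcher; the function names are made up,
and variant (3) is deliberately left out.

```python
import re


def match_unparsed(url_pattern: str, url: str) -> bool:
    """Match `url` against variant (1) "*" or variant (2) "*:<PATTERN>"."""
    if url_pattern == "*":
        return True  # variant (1): matches any URL
    if url_pattern.startswith("*:"):
        pattern = url_pattern[2:]
        # "*" matches any number of characters; everything else is literal
        regex = ".*".join(re.escape(part) for part in pattern.split("*"))
        return re.fullmatch(regex, url, re.IGNORECASE | re.DOTALL) is not None
    raise NotImplementedError("variant (3) is not covered by this sketch")


def looks_like_ip_pattern(host: str) -> bool:
    """Four dot-separated fields, each "*" or a number between 0 and 255."""
    fields = host.split(".")
    if len(fields) != 4:
        return False
    return all(
        f == "*" or (re.fullmatch(r"[0-9]+", f) is not None and int(f) <= 255)
        for f in fields
    )


print(match_unparsed("*:*.exe", "http://host/setup.EXE"))  # True
print(looks_like_ip_pattern("10.0.*.1"))                   # True
```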


The "-hosts" option
===================

The "-hosts" option is used for the "browsers" and "destinations"
configuration attributes, including the variants with "permit-" or
"deny-" prefix.  If the "-hosts" option is omitted, the rule line will
be applied independently of the client's IP address or domain name, as
long as the rest of the attribute (browser name or destination URL,
respectively) matches the rule.

If at least one "-hosts" option is used in a configuration line, the
rule will be applied only if the client's IP address or domain name
matches the "-hosts" options.  Each "-hosts" option is followed by at
least one <EXTENDED-HOST-PATTERN>.  <EXTENDED-HOST-PATTERN>s come in
two flavours: positive patterns (a <HOST-PATTERN>, i.e., an IP address
pattern or a domain name pattern) and negative patterns (the character
"!" followed by a <HOST-PATTERN>).  "-hosts" options and their
patterns are evaluated from left to right: as soon as a positive
pattern matches, the rule will be accepted (subject to browser or URL,
respectively); as soon as a negative pattern fails to match, the rule
will be ignored.  If a positive pattern fails to match or if a
negative pattern matches, the next pattern or "-hosts" option will be
checked.  When reaching the end of patterns and "-hosts" options
without having hit a matching positive pattern, the rule will be
ignored.  To make sense, a negative pattern must be followed by a
positive pattern (otherwise the rule would never match).  It does not
matter how patterns are grouped to "-hosts" options as long as the
sequence of the patterns isn't changed, i.e., the following lines are
equivalent:

  squid-gw: deny-browsers Wget* -hosts !10.0.0.1 !10.0.0.2 *
  squid-gw: deny-browsers Wget* -hosts !10.0.0.1 -hosts !10.0.0.2 -hosts *

(Currently, there's no point in using multiple "-hosts" options as
there are no other "-hosts"-like options (such as "-browsers") which
could be mixed with "-hosts" options.)

The "-hosts" option is usually used for exempting certain machines
from "deny-" rules and for restricting "permit-" rules to certain
machines.  The following example exempts clients which are used for
testing or central downloading of programs (such as browsers) from a
"deny-" rule:

  squid-gw:	deny-destinations *:*.exe -hosts !10.0.0.17 !10.0.0.18 *

This rule denies access to all URLs ending with ".exe".  It does not
apply to clients 10.0.0.17 and 10.0.0.18.  Note that the negative
patterns are followed by a positive match-all pattern.
Also note that rule lines which come after the above one can still
deny access even for clients 10.0.0.17 and 10.0.0.18, so the above
rule is different from

  squid-gw:	permit-destinations *:*.exe -hosts 10.0.0.17 10.0.0.18
  squid-gw:	deny-destinations *:*.exe
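
The left-to-right evaluation described in this chapter can be sketched
in Python as follows.  The sketch is an illustration, not squid-gw's
code: the function names are made up, and <HOST-PATTERN> matching is
assumed to be a simple "*" wildcard comparison against the client's IP
address and (if known) domain name.

```python
from fnmatch import fnmatchcase


def host_pattern_matches(host_pattern, client_ip, client_name=None):
    # ASSUMPTION: plain "*" wildcards, compared without regard to letter case
    candidates = [client_ip] + ([client_name] if client_name else [])
    return any(fnmatchcase(c.lower(), host_pattern.lower()) for c in candidates)


def rule_applies(extended_patterns, client_ip, client_name=None):
    """`extended_patterns` lists the patterns of all "-hosts" options in order."""
    if not extended_patterns:
        return True  # no "-hosts" option: the client does not matter
    for extended in extended_patterns:
        negative = extended.startswith("!")
        host_pattern = extended[1:] if negative else extended
        hit = host_pattern_matches(host_pattern, client_ip, client_name)
        if negative and hit:
            return False  # the negative pattern fails to match: ignore the rule
        if not negative and hit:
            return True   # a positive pattern matches: accept the rule
    return False          # end reached without a matching positive pattern


exe_rule = ["!10.0.0.17", "!10.0.0.18", "*"]
print(rule_applies(exe_rule, "10.0.0.17"))  # False (client is exempt)
print(rule_applies(exe_rule, "10.0.0.99"))  # True  (rule applies)
```

Because the patterns are passed as one flat list, grouping them into
one or several "-hosts" options makes no difference, as stated above.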


Sample configuration
====================

squid-gw:	permit-hosts 192.168.0.17 -class debug
squid-gw:	permit-hosts 192.168.0.42 -class relax
squid-gw:	permit-hosts 192.168.0.*
squid-gw:	permit-browsers *
squid-gw:	user-agent "Mozilla/3.1"
squid-gw:	server wwwproxy.mydomain.com 3128
squid-gw:	server-timeout 3600
squid-gw:	client-timeout 300
squid-gw:	log content-type request
squid-gw:	href http: https: ftp:// mailto: news: javascript:
squid-gw:	http-fields dangerous:drop/log unknown:drop/log
squid-gw:	html-tags dangerous:comment/log unknown:comment/log
squid-gw:	html-attributes alphanumeric:copy novalue:copy
squid-gw:	html-attributes dangerous:drop unknown:drop on:drop/log
squid-gw:	html-references unknown:escape/log
squid-gw:	html-attribute-length 16384
squid-gw:	destinations http://www.a-safe-site.com -class js
squid-gw:	connect www.a-safe-site.com -ipaddr 1.2.3.4
squid-gw-relax:	allow java
squid-gw-debug:	allow cookies java javascript object
squid-gw-debug:	html-tags dangerous:prefix/log unknown:prefix/log
squid-gw-debug:	html-attributes dangerous:prefix/log unknown:prefix/log
squid-gw-debug:	html-attributes on:copy/log
squid-gw-js:	allow javascript
squid-gw-js:	script verbatim


Running squid-gw
================

squid-gw can be run either from inetd:

        squid-gw [ <KEY> ]

or as daemon:

        squid-gw -daemon <PORT> [ <KEY> ]
        squid-gw -fastdaemon <PORT> [-pf <PIDFILE>] [ <KEY> ]

If <KEY> is specified, squid-gw will read configuration rules using
<KEY> instead of "squid-gw".  This can be used for having different
configurations of squid-gw on different ports.

<PORT> is the port number.  When started as normal daemon (with the
-daemon option), squid-gw will reread the configuration rules for each
connection.

When started as fast daemon (with the -fastdaemon option), squid-gw
will read all global configuration rules and all rules in
configuration classes once, when started, to save time.  Any changes
made to "netperm-table" after starting squid-gw as fast daemon will be
ignored.  However, you can force squid-gw to reread the configuration
by sending it SIGHUP.  Note however, that squid-gw leaks some memory
when it receives SIGHUP.  squid-gw may terminate if the configuration
contains errors while squid-gw rereads the configuration.

If the -pf option is used, the process ID (to which SIGHUP should be
sent for letting squid-gw reread "netperm-table") will be written to
the file <PIDFILE>.  That file is opened before squid-gw has changed
its user ID and group ID to the values configured in "netperm-table".
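
Asking a fast daemon to reread its configuration could look like the
following Python sketch.  It assumes that <PIDFILE> contains the
process ID as decimal text, which this README does not state; the
function names are made up for the example.

```python
import os
import signal


def parse_pid(text: str) -> int:
    """Extract the process ID from the contents of the PID file."""
    # ASSUMPTION: the file holds just the decimal PID, possibly with a newline
    pid = int(text.strip())
    if pid <= 0:
        raise ValueError("implausible process ID: %d" % pid)
    return pid


def reread_configuration(pidfile: str) -> None:
    """Send SIGHUP to the squid-gw fast daemon recorded in `pidfile`."""
    with open(pidfile) as f:
        pid = parse_pid(f.read())
    os.kill(pid, signal.SIGHUP)
```

After sending the signal, check the log for "reconfiguration: start"
followed by "reconfiguration: done" to see whether squid-gw survived
rereading the configuration.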


Statistics
==========

The program squid-log collects squid-gw log entries from a syslog log
file for generating statistics.  If squid-log is called without
command line arguments, the log file is read from standard input.
Otherwise, all files given on the command line are read.  The output
of squid-log consists of one line per successful HTTP transaction,
each line containing four fields separated by HT (tab) characters:

1. host (client): NAME/IP_ADDRESS, e.g., "localhost/127.0.0.1"

2. request line, as received from the client (i.e., METHOD URL
   VERSION), e.g., "GET http://foo/bar HTTP/1.0".  If no request line
   is available (e.g., because "log request" is not enabled for
   squid-gw) or if the request line does not contain three fields
   separated by space, this field contains "- - -"

3. content type, as received from the server, e.g. "text/html".  If no
   content type is available (e.g., because "log content-type" is not
   enabled for squid-gw), this field contains "-"

4. statistics, consisting of 5 numbers separated by one space
   character: number of bytes read from client, number of bytes
   written to client, number of bytes read from server, number of
   bytes written to server, and memory usage in bytes.
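
A line in this format can be taken apart with a few lines of Python.
The sketch below follows the field description above; the class and
function names are made up for the example, and the sample line is
invented.

```python
from typing import NamedTuple


class Transaction(NamedTuple):
    client_name: str
    client_ip: str
    method: str
    url: str
    version: str
    content_type: str
    read_from_client: int
    written_to_client: int
    read_from_server: int
    written_to_server: int
    memory: int


def parse_squid_log_line(line: str) -> Transaction:
    """Split one squid-log output line into its documented fields."""
    host, request, content_type, stats = line.rstrip("\n").split("\t")
    name, _, ip = host.rpartition("/")         # NAME/IP_ADDRESS
    method, url, version = request.split(" ")  # "- - -" if not available
    numbers = [int(n) for n in stats.split(" ")]
    if len(numbers) != 5:
        raise ValueError("expected 5 statistics numbers")
    return Transaction(name, ip, method, url, version, content_type, *numbers)


sample = ("localhost/127.0.0.1\tGET http://foo/bar HTTP/1.0\t"
          "text/html\t100 2000 1900 120 65536")
print(parse_squid_log_line(sample).url)  # http://foo/bar
```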

The program squid-top reports, with the -d option, the top <N> sites
(hostname and port number) visited via squid-gw, the top <N> clients
(-c option), or the top <N> Content-Types (-t option).  squid-top
takes its input from standard input, which is expected to be in the
format produced by squid-log.  squid-top is called like this:

        squid-top [-c|-d|-t] [-b] [-r] -n <N>

where <N> is the number of items to be listed.  The options -c, -d,
and -t select the item to be listed (see above); -d is the default.
With the -b option, squid-top lists the top items by number of bytes,
with the -r option, squid-top lists the top items by number of
requests.  If neither -b nor -r is given, squid-top lists the top
items by number of bytes and by number of requests, i.e., omitting
both -b and -r is equivalent to giving both -b and -r.

Example:

        squid-log /var/adm/messages | squid-top -n 10

Note that squid-top requires "log request" to be configured for
squid-gw.
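
To show why the request line is needed, here is a Python sketch that
ranks sites by number of requests from squid-log output.  It is not
squid-top's implementation; the function name is made up, and lines
whose URL is not an absolute URL with a host part are simply skipped.

```python
from collections import Counter
from urllib.parse import urlsplit


def top_sites_by_requests(lines, n):
    """Return the `n` most requested sites as (host[:port], count) pairs."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            continue  # not a squid-log output line
        parts = fields[1].split(" ")
        if len(parts) != 3 or parts[1] == "-":
            continue  # no request line logged ("- - -")
        site = urlsplit(parts[1]).netloc.lower()
        if site:
            counts[site] += 1
    return counts.most_common(n)


lines = [
    "a/10.0.0.1\tGET http://foo/x HTTP/1.0\ttext/html\t1 2 3 4 5",
    "a/10.0.0.1\tGET http://foo/y HTTP/1.0\ttext/html\t1 2 3 4 5",
    "b/10.0.0.2\tGET http://bar:8080/ HTTP/1.0\ttext/html\t1 2 3 4 5",
    "b/10.0.0.2\t- - -\t-\t1 2 3 4 5",
]
print(top_sites_by_requests(lines, 10))  # [('foo', 2), ('bar:8080', 1)]
```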


Hints
=====

This chapter gives some hints for speeding up squid-gw and for using
squid-gw with internal WWW servers.


Speeding up squid-gw
--------------------

The code of squid-gw makes excessive use of assertions.  If squid-gw
runs properly for a while on your system, you might want to disable
the debugging assertions (that is, those which check things which
should be guaranteed by the code anyway).  To do this, add

  -DNDEBUG

to CFLAGS in libem/Makefile and squid-gw/Makefile.  Then recompile
squid-gw:

  cd libem
  make clean
  make
  cd ../squid-gw
  make clean
  make

and copy squid-gw to the target directory.  This will approximately
halve the CPU usage of squid-gw for HTML documents.  (That's why I
wrote "excessive" above :-)

As squid-gw has a lot of configuration attributes, using -fastdaemon
is highly recommended.


Access to internal WWW servers
------------------------------

As Squid is installed outside your gateway host, it cannot access any
internal WWW servers (if it can, you probably should review your
security policy).

If you have internal WWW servers and do not want to configure "No
proxy for" in each client's web browser, you can use "server" in a
configuration class invoked by "permit-destinations".  If your
internal WWW servers understand absolute URLs in request lines (HTTP
1.1), just do this:

  # Default server (for the Internet at large)
  squid-gw: server squid.my.domain 3128
  # Internal WWW servers
  squid-gw: destinations http://www1 http://www1.my.domain -class www1
  squid-gw: destinations http://www2 http://www2.my.domain -class www2
  # Configuration classes for internal WWW servers
  squid-gw-www1: server www1.my.domain 80
  squid-gw-www2: server www2.my.domain 80

You have to use a separate configuration class for each server and
each port.  This works for HTTP only, not for FTP.

If your internal HTTP servers do not understand absolute URLs in
request lines, or if you need other protocols such as FTP, you have to
install a HTTP proxy which translates HTTP proxy requests to HTTP (or
FTP or whatever) requests.  You can use http-gw or Squid for that
purpose.


Miscellaneous comments
======================

* Here's one reason for not putting Squid on your gateway host: Consider
  what happens if a HTML page contains "<A HREF='http://localhost:19'>"
  -- note that squid-gw does not yet attempt to filter such stuff.

* A lot of (most?) HTML pages are broken and still displayed correctly
  by most browsers.  squid-gw tries to support some broken pages, but
  not all.  For instance, "<A HREF='#100%'>" is not supported ("%" is
  used for escaping in URLs).

* squid-gw currently supports character set ISO-8859-1 only.  In
  particular, multibyte characters (Kanji!) are not supported.

* Gopher probably doesn't work.

* Even with "script verbatim", the HTML attribute

        ONLOAD="foo='&#0001'"

  gets rewritten to

        ONLOAD="foo='&#1;'"

  For now, I consider fixing this to be not worth the trouble.


Debugging squid-gw
==================

To debug squid-gw, run it this way:

        squid-gw -debug [-server] [ <KEY> [ <CLASS> ]]

The -debug option causes squid-gw to use file descriptors 0 and 1 for
client input and client output, respectively; these file descriptors
need not be sockets in this case.

The -server option causes squid-gw to use file descriptors 3 and 4 for
server input and server output, respectively, instead of connecting to
the server defined with the "server" attribute.

Example:

        squid-gw -debug -server squid-gw test 0<c.i 1>c.o 3<s.i 4>s.o
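
The input files for such a run can be prepared with a short Python
script.  This README does not describe the contents of c.i and s.i; the
sketch assumes that c.i holds a proxy-style HTTP request as a browser
would send it and that s.i holds the complete response of the server.
The function names are made up for the example.

```python
import os


def build_client_input(url: str) -> bytes:
    """A minimal proxy-style request: absolute URL in the request line."""
    return ("GET %s HTTP/1.0\r\n\r\n" % url).encode("latin-1")


def build_server_input(body: bytes, content_type: str = "text/html") -> bytes:
    """A minimal HTTP/1.0 response carrying `body`."""
    head = "HTTP/1.0 200 OK\r\nContent-Type: %s\r\n\r\n" % content_type
    return head.encode("latin-1") + body


def write_fixtures(directory: str = ".") -> None:
    """Create c.i and s.i for the -debug -server example."""
    with open(os.path.join(directory, "c.i"), "wb") as f:
        f.write(build_client_input("http://foo/bar"))
    with open(os.path.join(directory, "s.i"), "wb") as f:
        f.write(build_server_input(b"<P>Hello <FOOBAR>world</P>\n"))
```

After calling write_fixtures() and running squid-gw as shown, c.o
holds what the client would have received and s.o what the server
would have received.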


Log entries explained
=====================

The most common squid-gw log entries are explained here.  Upper-case
words usually refer to a variable part of the entry.

Some of the entries include "at N1/N2" where N1 and N2 are numbers
which describe the location of the problem in the HTML document: N1 is
the relative character number in the input (from the server), N2 is
the relative character number in the output (to the client).

Some of the entries include "REQUEST_OR_RESPONSE"; in actual log
entries, this text is replaced by either "request" (for the request
header from the client) or "response" (for the response header from
the server).

Some of the entries are disabled by default and can be enabled with
the "log" attribute.

attempt to write to closed connection

	squid-gw tried to write to a socket which was closed in the
	meantime at the other end (either the client or the server).

Bad request: bad URL

	The syntax of the request line from the client is incorrect;
	squid-gw does not accept the URL as valid.  For instance, the
	URL must not contain an unescaped space character.

Bad request: invalid HTTP version

	The syntax of the request line from the client is incorrect;
	there's no "HTTP/1.0" or "HTTP/1.1" at the end of the line.

Bad request: unknown method

	squid-gw currently supports only the GET, HEAD, POST, and
	CONNECT request methods.  Unsupported request methods include
	PUT.

Bad request: unsupported scheme

	squid-gw supports only the "ftp" and "http" schemes in the
	request URL.

Class CLASS_NAME not configured

	The configuration class CLASS_NAME is referenced but not
	defined.

configuration key too long

	A class name in netperm-table is too long.

Conflicting values for Content-Type: MIMETYPE1 vs. MIMETYPE2

	There are multiple, conflicting Content-Type response header
	fields.  Only the first one will be passed on to the client.

	Configuration: "log content-type-conflict"

Content-Type: MIMETYPE

	The content type of the body of the response from the server
	is MIMETYPE (e.g., "text/html").

	Configuration: "log content-type"

dangerous attribute for <TAG> at N1/N2: ATTR=VALUE

	The attribute named ATTR (having value VALUE) of tag <TAG>
	looks dangerous.  Most commonly, this message is logged for
	invalid URLs for HREF.

	Configuration: "html-attributes dangerous"

dangerous REQUEST_OR_RESPONSE header field: LINE

	The HTTP header line LINE is considered dangerous by squid-gw.

	Configuration: "http-fields dangerous", "allow", and "block"

dangerous tag at N1/N2: <TAG>

	The tag <TAG> is considered dangerous by squid-gw.  Most
	commonly, this message is logged for <APPLET>, <OBJECT>, and
	<SCRIPT> if these tags are disabled.

	Configuration: "html-tags dangerous", "allow", and "block"

deny browser=BROWSER use of gateway

	The value BROWSER of the User-Agent field sent by a client
	matches a "deny-browsers" line.

	Configuration: "browsers"

duplicate REQUEST_OR_RESPONSE header field: LINE

	The HTTP header field in header line LINE should appear only once.
	squid-gw rejected the entire request or response.

exit host=HOSTNAME/IPADDR (CI1/CI2 CO1/CO2) (SI1/SI2 SO1/SO2) MEMORY

	This message is logged on termination of a HTTP connection
	initiated by client HOSTNAME/IPADDR.  The numbers in
	parentheses are numbers of bytes processed during the session;
	the *1 number is the actual number processed, the *2 number is
	the number of bytes read or written.  For instance, if the
	body is longer than claimed by Content-Length, these numbers
	may differ.  CI and CO are the number of bytes received from
	and sent to the client, respectively.  SI and SO are the
	number of bytes received from and sent to the server,
	respectively (these numbers are zero for the CONNECT request
	method).  MEMORY is the memory usage in bytes.
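
	The fields of this entry can be extracted with a regular
	expression.  The Python sketch below assumes that the entry
	is laid out exactly as in the template above (single spaces,
	decimal numbers); the syslog prefix in the sample line is
	invented, and the names are made up for the example.

```python
import re

EXIT_RE = re.compile(
    r"exit host=(?P<name>\S+)/(?P<ip>[^/\s]+) "
    r"\((?P<ci1>\d+)/(?P<ci2>\d+) (?P<co1>\d+)/(?P<co2>\d+)\) "
    r"\((?P<si1>\d+)/(?P<si2>\d+) (?P<so1>\d+)/(?P<so2>\d+)\) "
    r"(?P<memory>\d+)"
)


def parse_exit_entry(line: str):
    """Return the fields of an "exit" entry as a dict, or None if no match."""
    match = EXIT_RE.search(line)  # search(): a syslog prefix may precede
    if match is None:
        return None
    fields = match.groupdict()
    for key in fields:
        if key not in ("name", "ip"):
            fields[key] = int(fields[key])  # byte counts and memory usage
    return fields


sample = ("May 12 10:00:00 gw squid-gw[123]: exit host=localhost/127.0.0.1 "
          "(100/100 2000/2048) (1900/1900 120/120) 65536")
entry = parse_exit_entry(sample)
print(entry["ip"], entry["co1"], entry["co2"])  # 127.0.0.1 2000 2048
```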

Forbidden

	A client tried to access a host which matches a
	"deny-destinations" line.  Note that you can configure an
	alternate message with the -message option of
	"deny-destinations".

	Configuration: "destinations"

incorrect tag at N1/N2: <TAG>

	The sequence of characters TAG looks like a tag but isn't one
	(e.g., "<A") and was removed unconditionally by squid-gw.

	Configuration: "log incorrect-tags"

invalid REQUEST_OR_RESPONSE header field: LINE

	The HTTP header line LINE is considered invalid by squid-gw.
	For instance, this happens for rejected cookies.

	Configuration: "http-fields invalid", "allow", and "block"

invalid tag at N1/N2: TAG

	The sequence of characters TAG looks like a tag but isn't one
	(e.g., "<A").  Most commonly, this message is logged for typos
	and missing escaping ("&lt;" for "<").  Sometimes, it's logged
	for broken HTTP servers which send images with Content-Type
	text/html.

	Configuration: "html-tags invalid"

HTTP-EQUIV used instead of NAME for <META>

	Apparently, the author of the HTML document confused the NAME
	and HTTP-EQUIV attributes of the <META> tag.

<meta> attributes: NAME=NVALUE CONTENT=CVALUE

	This message provides the actual values used in a <META> tag
	about which squid-gw complained in a preceding log entry.
	NAME is either "NAME" or "HTTP-EQUIV", NVALUE is the value of
	that attribute, CVALUE is the value of the "CONTENT"
	attribute.

missing Content-Type

	The response from the server includes a body but no
	Content-Type header field.  squid-gw sets the Content-Type to
	text/html.

	Configuration: "log missing-content-type"

missing colon in REQUEST_OR_RESPONSE header line: LINE

	squid-gw received an invalid HTTP header line, LINE.

NAME used instead of HTTP-EQUIV for <META>

	Apparently, the author of the HTML document confused the NAME
	and HTTP-EQUIV attributes of the <META> tag.  This message is
	benign as <META> tags using the NAME attribute are ignored by
	browsers.  Well, at least they should be ignored.  Anyway,
	squid-gw pretends that HTTP-EQUIV is used.

preloading configuration class CLASS_NAME

	squid-gw starts preloading the configuration class CLASS_NAME
	for -fastdaemon.

privacy disturbing REQUEST_OR_RESPONSE header field: LINE

	The HTTP header line LINE is considered by squid-gw to disturb
	privacy.  For instance, this happens for Referer.

	Configuration: "http-fields privacy", "referer"

reconfiguration: start
reconfiguration: done

	Start and end of reconfiguration due to receipt of SIGHUP by
	squid-gw started with the -fastdaemon option.  A "start" entry
	without "done" entry means that squid-gw terminated during
	reconfiguration.

redirected to REDIR_URL

	The request has been redirected by squid-gw to the URL
	REDIR_URL according to the -redir option of a
	deny-destinations line.

	Configuration: "log redirected", "deny-destinations -redir"

request: REQUEST

	REQUEST is the request line from the client.  If the request
	line is valid, this includes the request method (e.g., "GET"),
	the URL, and "HTTP/VERSION" where VERSION is 1.0 or 1.1.

	Configuration: "log request"

script macro in attribute: VALUE

	An HTML attribute with value VALUE has been removed because it
	contains a script macro (such as "&{script};").

	Configuration: "log script-macros", "allow", and "block"

silently dropped REQUEST_OR_RESPONSE header field: LINE

	The HTTP header line LINE was dropped silently by squid-gw
	because it cannot be handled.  For instance, this happens with
	header lines for persistent HTTP/1.1 connections, which aren't
	supported by squid-gw.

	Configuration: "http-fields silent"

simple-response: LINE

	squid-gw received a response from the server which does not
	look like an HTTP/1.0 (or HTTP/1.1) status line.  LINE is the
	first non-empty line of the response.

	Configuration: "log simple-response"

Trailing NUL character removed

	Apparently, Microsoft IIS (redir.asp) likes to add a NUL
	character at the end of the HREF attribute.  squid-gw removes
	that NUL character and logs this message.

Transfer-Encoding is not implemented

	The Transfer-Encoding HTTP header field is not yet implemented
	in squid-gw.  Sorry.

Treating as HTML: <TAG at N1

	When squid-gw treats a non-text/html body as text/html body
	for Microsoft Internet Explorer, this message will be logged.
	The string "<TAG" was found at byte N1 of the body.

	Configuration: "auto-html-limit"

unexpected tag/attribute pair: TAG/ATTR

	ATTR is the name of a potentially dangerous attribute known by
	squid-gw which is not expected to be used with the tag <TAG>.

	Configuration: "log tag-attribute-pairs"

unknown attribute for <TAG> at N1/N2: ATTR=VALUE

	squid-gw doesn't know about the attribute named ATTR (having
	value VALUE) of tag <TAG>.  Most commonly, this message is
	logged for typos in HTML pages, such as "HEIGTH=100".

	Configuration: "html-attributes unknown" etc.

unknown Content-Type: MIMETYPE

	squid-gw doesn't know about the Content-Type MIMETYPE.  It is
	replaced with "application/binary".

	Configuration: "log unknown-content-type"

unknown reference at N1/N2: NAME

	squid-gw doesn't know about the entity reference or numeric
	character reference NAME.  Most commonly, this message is
	logged for missing escaping of "&", in particular in URLs
	(e.g., "HREF='/cgi-bin/foo?x=1&y=1'").

	Configuration: "html-references unknown"

unknown REQUEST_OR_RESPONSE header field: LINE

	The HTTP field in the HTTP header line LINE is not known by
	squid-gw.

	Configuration: "http-fields unknown"

unknown tag at N1/N2: <TAG>

	squid-gw doesn't know about the tag <TAG>.  Most commonly,
	this message is logged for typos such as "<ADRESS>".

	Configuration: "html-tags unknown"

User-Agent: BROWSER

	squid-gw received a User-Agent request header line having the
	value BROWSER.

	Configuration: "log user-agent"

History
=======

Version 0.1, 1997-08-31
-----------------------

- Initial version

Version 0.2, 1997-09-13
-----------------------

- Multiple "log" configuration lines; new events: missing-content-type,
  redirected, request-header, response-header, simple-response, user-agent

- Add "href" attribute

- Add "-redir" and "-delay" options for "destinations" attribute

- Add "*:<PATTERN>" for unparsed matching of URLs; "http*:" for matching
  "http:" and "https:"

- New heuristic for handling broken HTML comments

- Handling of Content-Transfer-Encoding improved

- Fixed "comment" policy for removed tags

Version 0.3, 1997-10-05
-----------------------

- Ignore "-delay" when preloading

- Don't reject multiple identical Content-Type response header fields

Version 0.4, 1997-11-01
-----------------------

- Some small bugs fixed

- Treat IP address in URL patterns specially

- Add "-delay" option for deny-destinations attribute

- Add "script" attribute

Version 0.5, 1998-01-31
-----------------------

- Cope with HTTP header lines terminated with CR CR LF

- Add "-hosts" option for "browsers" and "destinations" attributes

- Fix the description of the -debug and -server options

Version 0.6, 1998-02-19
-----------------------

- Fix sample configuration and description of "allow javascript":
  "html-attributes on:copy" should not be used; moreover "href"
  can be used in global configuration, only

- End "<SCRIPT>" at first "</", not at "</SCRIPT>".  This may break
  some JavaScript programs (which violate SGML anyway), but it's
  safer

- Remove script macros in attributes (by replacing "&{" with "${") if
  JavaScript is blocked

- Add new HTML 4.0 tags and attributes

- Accept Content-Type: text/css

Version 0.7, 1998-03-16
-----------------------

- README: "log script-macros" wasn't documented

- End "<SCRIPT>" at "</SCRIPT>", not at first "</", like squid-gw up
  to 0.5 did, but replace "</" (not followed by "SCRIPT>") with "<\/".
  This change fixes broken JavaScript code while being as safe as
  squid-gw 0.6's behavior

Version 0.8, 1998-11-12
-----------------------

- Don't remove </APPLET>

- Better cope with double quotes in comments

- Accept white space (not just SP) in <!DOCTYPE ...>

- Check snprintf() and vsnprintf()

- Don't treat a Content-Type conflict as fatal error; just keep the
  first value.  Logging can be enabled with "log content-type-conflict"

- Add "Content-Type: text/html" if the server doesn't send a Content-Type
  field.  This solves the "Document contains no data" problem

- Add note about "user-agent mozilla"

- Add note about Java and ActiveX blocking being defeated by allowing
  JavaScript

Version 0.9, 1999-02-27
-----------------------

- Accept '&' in attribute values (i.e., in broken HTML code which
  doesn't use quotes)

- No longer reject responses with unknown Content-Type -- use
  application/binary; "log unknown-content-type"

Version 1.0, 1999-07-01
-----------------------

- Support CONNECT (https/SSL)

- Parsing of invalid tags such as "<a<b>" and "<a_b>" changed; it is
  recommended to configure "html-tags invalid:drop"

- Parsing of invalid comments and invalid <!DOCTYPE ...> changed

- New "referer" attribute (configuration)

- New options for squid-top: -c, -d, and -t

Version 1.1, 1999-09-16
-----------------------

- Don't remove </OBJECT>

- Add "allow embed" configuration

- Add "allow style" configuration

Version 1.2, 2000-05-01
-----------------------

- SECURITY: A malicious HTTP server could evade auto-html-limit by
  sending the beginning of the body in separate packets

- Cope with CRLF of the status line being sent in a separate packet

- Add -pf option (PID file)

- Cope with missing white space after the status code

- Don't reject Content-Type: (null)

- Allow (and remove) exact duplicates for certain HTTP response header
  fields

Version 1.3, 2000-05-12
-----------------------

- This version fixes a bug of version 1.2 in emi_fill() which caused
  content (HTTP body) corruption

==============================================================================
                                THE END
==============================================================================
