			      What it is
			      ~~~~~~~~~~

squidGuard is a free (GPL), flexible and efficient filter and
redirector program for squid.  It lets you define multiple access
rules with different restrictions for different user groups on a squid
cache.  squidGuard uses squid's standard redirector interface.

(Keep an eye on samples/sample.conf while reading this.)

You may define multiple target classes (dest) like adult, financial
etc.  Each class may be defined by a domainlist, a URL list and/or a
set of regular expressions.

You may define multiple user/client groups (src).  Each client group
may be defined by a list of IP-addresses and/or IP-ranges, a list of
domains, and/or a list of user names.

You may define multiple rewrite (rew) rules to redirect certain URLs
to a given site.

You may define multiple access control lists (acl) where each client
group is granted access to or blocked from a set of target
classes. Each acl may have different rewrite rules. Each acl may have
a different redirect URL for blocked URLs or fallback to a default.

This way some users may access only some predefined sites or URLs,
while others may access any site except those containing adult
material, while some may access the whole Internet.

			     How it works
			     ~~~~~~~~~~~~

Squid sends a line like "URL ip-address/fqdn ident method\n" to
squidGuard for each access.  squidGuard returns "\n" if OK or the line
with a rewritten URL if matched by a blocking or rewriting rule.
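
The exchange above can be sketched in a few lines of Python.  This is
only an illustration of the line format described here, not
squidGuard's own code.  The function names, the sample request and the
"-" placeholders for a missing fqdn or ident are assumptions.

```python
def parse_request(line):
    """Split one redirector request line into its four fields."""
    url, client, ident, method = line.rstrip("\n").split(" ", 3)
    # The client field has the form "ip-address/fqdn".
    ip, _, fqdn = client.partition("/")
    return {"url": url, "ip": ip, "fqdn": fqdn,
            "ident": ident, "method": method}


def format_reply(request_line, new_url=None):
    """Return "\\n" to pass the request, or the line with a new URL."""
    if new_url is None:
        return "\n"
    fields = request_line.rstrip("\n").split(" ", 3)
    fields[0] = new_url  # only the URL field is replaced
    return " ".join(fields) + "\n"


# Hypothetical request line; "-" stands in for unknown fqdn and ident.
request = "http://bad.com/whatever 10.0.0.1/- - GET\n"
print(parse_request(request))
print(repr(format_reply(request)))
print(repr(format_reply(request, "http://block.example/blocked.cgi")))
```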

For each request the client class is determined. Then the matching
acl is checked. If the acl is not "pass any", the URL is checked
against each listed target class.  If the URL is blocked the redirect
URL is returned.  If the URL matches a rewrite rule for the acl the
new URL is returned.

The access control lists are checked in the order they are
defined. This means an acl for a subset of clients should come before
a more general acl for a superset of clients.
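
Why the order matters can be shown with a small first-match lookup.
This is a sketch of the principle only.  The acl names and networks
are made up, and real acls may also match on domains or user names.

```python
import ipaddress

# Hypothetical acls in definition order: (name, client network).
ACLS = [
    ("admin", ipaddress.ip_network("10.0.0.0/28")),  # subset first
    ("staff", ipaddress.ip_network("10.0.0.0/8")),   # superset after
]


def first_matching_acl(client_ip, acls=ACLS):
    """Return the name of the first acl whose network holds the client."""
    addr = ipaddress.ip_address(client_ip)
    for name, net in acls:
        if addr in net:
            return name  # stop at the first match
    return None


print(first_matching_acl("10.0.0.5"))   # inside both networks
print(first_matching_acl("10.1.2.3"))   # inside the superset only
# With the order reversed the subset acl is never reached.
print(first_matching_acl("10.0.0.5", list(reversed(ACLS))))
```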

The match is case insensitive.

squidGuard breaks out as fast as possible to avoid unnecessary
lookups.

The domain prefixes www[0-9]* and web[0-9]* are ignored in domain and
url matching.  The protocol (http/ftp/wais/gopher) is also ignored.
If you have domain blocking on "bad.com", then all of these are
blocked:
	http://bad.com
	http://bad.com/whatever
	https://bad.com/whatever
	ftp://bad.com
	wais://bad.com
	http://www2.bad.com
	http://www.whatever.bad.com
but not:
	http://www.whateverbad.com
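
The domain rule can be approximated like this in Python.  It is a
sketch of the behaviour listed above, not the actual lookup code (which
uses Berkeley DB).  The function name is an assumption.

```python
import re
from urllib.parse import urlsplit

# Leading www[0-9]* and web[0-9]* labels are ignored.
PREFIX = re.compile(r"^(www|web)[0-9]*\.")


def domain_blocked(url, blocked_domains):
    """True if the host or any parent domain is in blocked_domains."""
    host = (urlsplit(url).hostname or "").lower()  # protocol is ignored
    host = PREFIX.sub("", host)
    labels = host.split(".")
    # Test the host itself and every parent domain on label boundaries.
    return any(".".join(labels[i:]) in blocked_domains
               for i in range(len(labels)))


blocked = {"bad.com"}
for url in ("http://bad.com", "wais://bad.com", "http://www2.bad.com",
            "http://www.whatever.bad.com", "http://www.whateverbad.com"):
    print(url, domain_blocked(url, blocked))
```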

If you have URL blocking on "foo.bar.com/~baduser", then all of these
are blocked:
	http://foo.bar.com/~baduser
	http://foo.bar.com/~baduser/whatever
	https://foo.bar.com/~baduser/whatever
	http://www.foo.bar.com/~baduser/whatever
	http://www2.foo.bar.com/~baduser/whatever
but not:
	http://foo.bar.com/~goodsite
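
A similar sketch for URL blocking follows.  It assumes that an entry
matches itself and anything below it on a "/" boundary; the exact
boundary handling in squidGuard is not described here, so treat that
as a guess.  The function names are made up.

```python
import re
from urllib.parse import urlsplit

PREFIX = re.compile(r"^(www|web)[0-9]*\.")


def url_key(url):
    """Reduce a URL to "host/path": no protocol, no www prefix."""
    parts = urlsplit(url)
    host = PREFIX.sub("", (parts.hostname or "").lower())
    return host + parts.path.lower().rstrip("/")


def url_blocked(url, blocked_urls):
    """True if the URL equals a blocked entry or lies below one."""
    key = url_key(url)
    return any(key == entry or key.startswith(entry + "/")
               for entry in blocked_urls)


blocked = {"foo.bar.com/~baduser"}
print(url_blocked("http://www2.foo.bar.com/~baduser/whatever", blocked))
print(url_blocked("http://foo.bar.com/~someoneelse", blocked))
```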

The target class domainlist, urllist and expressionlist are stored in
separate plain text files (relative to dbhome if not an absolute path)
for each class and type. At startup they are parsed and stored in an
in-memory-only Berkeley DB database for fast access with simple code.

An IP-range spec may look like 172.16.2.21-172.16.2.97, 10.2.0.0/23 or
10.100.0.0/255.255.255.192
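
The three notations can be checked with Python's standard ipaddress
module.  This is a sketch for trying out specs by hand, not
squidGuard's own parser, and the function name is an assumption.

```python
import ipaddress


def ip_in_spec(ip, spec):
    """True if ip falls inside a from-to range, a prefix or a netmask."""
    addr = ipaddress.ip_address(ip)
    if "-" in spec:
        # "first-last" range, both ends inclusive.
        first, last = (ipaddress.ip_address(p) for p in spec.split("-", 1))
        return first <= addr <= last
    # "net/prefix" and "net/netmask" are both accepted by ip_network.
    return addr in ipaddress.ip_network(spec, strict=False)


print(ip_in_spec("172.16.2.50", "172.16.2.21-172.16.2.97"))
print(ip_in_spec("10.2.1.200", "10.2.0.0/23"))
print(ip_in_spec("10.100.0.64", "10.100.0.0/255.255.255.192"))
```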

You may break a (long) line by repeating the leading keyword. Repeated
lines of the same type within a class will be joined when the rule
trees are built. So:

src foo {
		ip 1.2.3.4
		ip 2.3.4.5
}

is equivalent to:

src foo {
		ip 1.2.3.4 2.3.4.5
}

			Target class databases
			~~~~~~~~~~~~~~~~~~~~~~

squidGuard now comes with a static sample database of sites that may
contain pornographic material. You may use it as a start, but you
should check it for correctness first.  A hint for making dynamic
adult material block lists is to search for free, regularly updated
adult link pages via a search engine and parse them regularly with a
Perl script to maintain an up-to-date bad list.  To make it more
sophisticated the script should be intelligent enough to sort -u and
reduce the URLs to directory URLs or domains (when there is no subdir
in the URL), and remove all URLs covered by the domainlist.  To make
it even more sophisticated you may write a Perl script that validates
the domains and URLs on a less regular basis and automatically removes
outdated entries. The domain and URL lists may be supplemented by a
regular expression list like the one in sample.expr.  WARNING!
"sample.expr" contains lots of dirty words!!  You have been warned!
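
The "sort -u and remove URLs covered by the domainlist" step might
look like the sketch below.  It is written in Python rather than the
Perl suggested above, purely for illustration.  The function names and
sample entries are assumptions, and the entries are expected to be
"host/path" strings without protocol.

```python
def covered_by_domain(host, domains):
    """True if host or one of its parent domains is in domains."""
    labels = host.split(".")
    return any(".".join(labels[i:]) in domains
               for i in range(len(labels)))


def prune_urls(urls, domains):
    """Sort, de-duplicate and drop entries the domainlist covers."""
    kept = set()
    for entry in urls:
        entry = entry.strip().lower()
        if not entry:
            continue  # skip empty lines
        host = entry.split("/", 1)[0]
        if not covered_by_domain(host, domains):
            kept.add(entry)
    return sorted(kept)


print(prune_urls(["x.y.z.com/foo/bar", "pics.bad.com/a",
                  "X.Y.Z.com/foo/bar", ""], {"bad.com"}))
```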

A domainlist file may look like (Remove leading "www[0-9]*." and
"web[0-9]*." in this file):

----cut-----
porn.com
sex.com
----cut-----

A URL list file may look like (Remove leading
"(http|ftp)://(www|web)[0-9]*." and trailing "/" or "/file.htm*" in
this file):

----cut-----
multihome.foo.com/~bar/pics
x.y.z.com/foo/bar
----cut-----

Use only lower case in all these lists.
(Do "cp list list~;tr A-Z a-z <list~>list;rm list~").
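
The clean-up rules for URL list entries can be collected in one small
helper.  This is a sketch under the assumption that "/file.htm*" means
a final path component whose name contains ".htm"; the function name
is made up.

```python
import re


def normalize_url_entry(raw):
    """Lower-case an entry and strip the parts the lists must omit."""
    entry = raw.strip().lower()
    entry = re.sub(r"^(http|ftp)://", "", entry)      # leading protocol
    entry = re.sub(r"^(www|web)[0-9]*\.", "", entry)  # www/web prefix
    entry = re.sub(r"/[^/]*\.htm[^/]*$", "", entry)   # trailing /file.htm*
    return entry.rstrip("/")                          # trailing "/"


print(normalize_url_entry(
    "HTTP://WWW2.Multihome.Foo.com/~bar/pics/index.html"))
print(normalize_url_entry("http://x.y.z.com/foo/bar/"))
```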

			     Portability
			     ~~~~~~~~~~~

squidGuard has been tested with both squid-1.1.22 and 1.2.beta25 on
Solaris-2.6 with gcc-2.7.2.3 and db-2.4.10.  (It may work with
db-1.85/86 too -- not tested.)

			     Installation
			     ~~~~~~~~~~~~

0) You must have a C compiler (gcc is free and recommended)

1) Install the latest version 2 of Berkeley DB
(http://www.sleepycat.com/db/) if you don't have it already.  (It
might work with version 1 too. I don't think we use any of the new
stuff in v2)

2) Check/edit the first part of Makefile to reflect your env.

3) Run "make"

4) If all is OK run "make install"

5) Make a config file (see samples/sample.conf)

6) Make the target class lists you want. You may want to have
these files "chmod 640" and "chown cache_effective_user" or "chgrp
cache_effective_group".

7) Test it in isolation by putting some sample requests in three
files: test.pass test.block test.rewrite

Run "squidGuard -c your.conf < test.pass". If you did it right this
should output the same number of blank lines as request lines in
test.pass.

Run "squidGuard -c your.conf < test.block". If you did it right this
should output the same number of lines as request lines in test.block
but with all URLs replaced by the redirect URL.

Run "squidGuard -c your.conf < test.rewrite". If you did it right this
should output the same number of lines as request lines in
test.rewrite but all with URLs rewritten according to your rewrite
rules (if any).
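
If you prefer to check the three runs with a script, something like
this Python sketch may help.  It only uses the "squidGuard -c" call
shown above and assumes squidGuard is in your PATH.  The function
names and the sample lines are assumptions.

```python
import subprocess


def classify(requests, replies):
    """Pair each request with its reply: blank reply means passed."""
    results = []
    for req, rep in zip(requests, replies):
        url = req.split(" ", 1)[0]
        if rep.strip() == "":
            results.append((url, "passed", None))
        else:
            # The first field of a non-blank reply is the returned URL.
            results.append((url, "rewritten", rep.split(" ", 1)[0]))
    return results


def run_squidguard(conf, request_file):
    """Feed a request file to squidGuard and classify the replies."""
    with open(request_file) as fh:
        text = fh.read()
    out = subprocess.run(["squidGuard", "-c", conf], input=text,
                         capture_output=True, text=True,
                         check=True).stdout
    return classify(text.splitlines(), out.splitlines())


# Hypothetical request and reply lines, without running squidGuard.
print(classify(["http://a.example/ 1.2.3.4/- - GET",
                "http://b.example/ 1.2.3.4/- - GET"],
               ["", "http://block.example/ 1.2.3.4/- - GET"]))
```

For test.pass every entry should come back as "passed"; for
test.block and test.rewrite every entry should be "rewritten".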

8) Install a suitable CGI on a web server (Use blocked.cgi as a
sample)

9) Modify squid.conf as described in "Squid configuration" and kill
-HUP squid.

10) Test with a proxy-enabled browser.

			 Squid configuration
			 ~~~~~~~~~~~~~~~~~~~

In the squid config file change this section to something like:

----cut-----
#  TAG: redirect_program
#       Specify the location of the executable for the URL redirector.
#       Since they can perform almost any function there isn't one included.
#       See the Release-Notes for information on how to write one.
#       By default, a redirector is not used.
#
redirect_program /local/squid/bin/squidGuard -c /local/squid/etc/squidGuard.conf

#  TAG: redirect_children
#       The number of redirector processes to spawn. If you start
#       too few Squid will have to wait for them to process a backlog of
#       URLs, slowing it down. If you start too many they will use RAM
#       and other system resources.
#
redirect_children 4
----cut-----

To enable RFC931/ident lookups you may leave "ident_lookup off" but
you have to define a user acl in the squid config file à la:

----cut-----
acl ident user REQUIRED
acl admin src 1.2.3.4/255.255.255.255
http_access allow admin ident
----cut-----

We have plans to make our ident daemon for NT freely available very
soon. (Needs some more testing, packaging and maybe a Win9x port.)

			  Bugs and features
			  ~~~~~~~~~~~~~~~~~

We have run squidGuard with 100% stability and virtually no delay for
months on our proxy server, currently with 4251 listed domains,
5609 URLs, 5 regular expressions and 21 acls.

There are still some minor known bugs and features, though:

A list file cannot be empty. This gives unexpected logical results.
If you have an empty list file then be sure to make it a comment or
remove it from the config file.

For some reason rewrite does not fall back to the default (yet). So
rewrite must be declared explicitly in all acls (until fixed).

If something is terribly wrong (like syntax error in squidGuard.conf)
squidGuard will fall back to a "pass all for all" mode. This is a
feature not a bug! squidGuard can't resume normal operation from this
mode without being restarted by a kill -HUP to squid (not squidGuard).

There is a partial (unfinished) implementation of reload config on a
HUP signal directly to squidGuard, but this is not very well tested.
You had better restart squidGuard by kill -HUP to squid and not to
squidGuard.

squidGuard should maintain a reverse DNS map, or use squid's
dnsserver, to catch users who break the rules by replacing the domain
name part of the URL with an IP address after a manual nslookup. This
must be done in a nonblocking manner, which means this type of request
may leak through until a reverse mapping for the address is
available. A different approach might be to put both versions (IP and
domain) in the domain and URL lists. Yet another approach might be to
require the use of domain names for some or all users by blocking all
IP URLs by defining a target class that is a regular expression like
"^\d+\.\d+\.\d+\.\d+".

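The IP-URL expression can be tried out like this in Python.  The
sketch assumes the protocol is stripped before the expression is
applied, which is a guess based on the matching rules in "How it
works".  The function name is made up.

```python
import re

IP_URL = re.compile(r"^\d+\.\d+\.\d+\.\d+")


def is_ip_url(url):
    """True if the host part of the URL starts with a dotted quad."""
    # Assumption: the protocol is removed before matching.
    stripped = re.sub(r"^[a-z]+://", "", url.lower())
    return IP_URL.match(stripped) is not None


print(is_ip_url("http://192.0.2.10/whatever"))
print(is_ip_url("http://bad.com/whatever"))
```

Note that the expression is not anchored at the end, so a host name
that merely begins with four dotted numbers would match as well.
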
The documentation could be better.

		    Bug reports and contributions
		    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you have a contribution, a bug fix or a comment send it to
		     pal.baltzersen@ost.eltele.no

For bug fixes I prefer a "diff -c oldfile newfile" or "diff -cr olddir
newdir".  Please explain what problem your diff fixes.

Success reports are welcome too.

				Links
			        ~~~~~

The latest version of squidGuard:
http://ftp.ost.eltele.no/pub/www/proxy/	(squidGuard-*.tar.gz)
ftp://ftp.ost.eltele.no/pub/www/proxy/	(squidGuard-*.tar.gz)

The latest version of Berkeley DB code (required):
http://www.sleepycat.com/db/

The latest version of Perl (recommended):
http://www.perl.com/

The latest version of a free portable ident daemon for Unix:
ftp://ftp.lysator.liu.se/pub/ident/servers/	(pidentd-*.tar.gz)

The latest version of our free ident daemon for M$ Win* will be put
in (soon):
http://ftp.ost.eltele.no/pub/networking/ident/	(identd-*.zip)
ftp://ftp.ost.eltele.no/pub/networking/ident/	(identd-*.zip)

			       Authors
			       ~~~~~~~

squidGuard was designed by Pål Baltzersen
<pal.baltzersen@ost.eltele.no> and implemented by Lars-Erik Håland
<leh@nimrod.no> at ElTele Øst AS and made freely available by
permission of
			    ElTele Øst AS
