Version 2.10-rc1
----------------

[20040222.1606] jonz: added support for global groups

global groups allow DSPAM to provide "SpamAssassin-type out-of-the-box
filtering" for all new users until they have built their own useful
dictionaries.  to create a global classification group, add something like
this to $USERDIR/group:

groupname:classification:*globaluser

This will automatically add globaluser as a classification peer to all users.
Any user who has fewer than 1000 innocent messages or fewer than 250 spam
messages in their corpus, or whose filter is uncertain about a particular
message, will consult the global dictionary for an answer.

global groups will need to be trained using corpus or other means, or by
using the dspam_merge tool.  the global user (in this case 'globaluser') is
treated just as any other user on the system.
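
The fallback rule above can be sketched as a single predicate.  this is a
minimal sketch, not DSPAM's actual code: the struct, the function name and the
`uncertain` flag (standing in for however the filter decides it is unsure)
are hypothetical; only the 1000/250 thresholds come from this entry.

```c
#include <stdbool.h>

/* Hypothetical per-user totals; field names are illustrative only. */
struct user_totals {
    long innocent_learned;
    long spam_learned;
};

/* Returns true when the global classification peer should be consulted:
 * the user's own corpus is still immature (fewer than 1000 innocent or
 * fewer than 250 spam messages), or the filter is unsure about this
 * particular message. */
bool consult_global_peer(const struct user_totals *t, bool uncertain)
{
    if (t->innocent_learned < 1000 || t->spam_learned < 250)
        return true;
    return uncertain;
}
```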

[20040221.2155] jonz: format changes to dspam_dump

dspam_dump formatting changes + display of token probability

[20040220.1700] jonz: added quick fix for \r stripping in dspam_corpus

added a quick fix to strip \r's in mailboxes when using dspam_corpus

[20040219.0150] jonz: added support for neural networking

see README for more details

[20040218.2300] jonz: added tweaking to dobly for small text samples

added threshold tweaking to Bayesian Dobly for small text samples (< 3.5k)

[20040217.0724] jonz: fixed some miscellaneous compile warnings

fixed some miscellaneous compile warnings.  2 for when trusted user security
is disabled, 1 for dspam_2mysql.c:126

Version 2.10-beta-2
-------------------

[20040214.1632] jonz: added TOE support

added TOE (Train on Error) support using the --enable-toe configure flag.
see the README file for more details.

[20040213.1549] jonz: fixed X-DSPAM header duplication bug

fixed a bug which caused X-DSPAM headers to be cumulatively appended when
a single message addresses multiple local users.

[20040214.1327] jonz: added --enable-client-compression configure flag

added option --enable-client-compression to compress traffic between the
data source and its clients (where available).  presently only available with
the mysql_drv storage driver.  you should enable this if the data source
is on a separate machine from the DSPAM agent(s), as it conserves bandwidth
at the expense of a few CPU cycles.

[20040214.1258] jonz: created speed and space optimized MySQL scripts

created both speed and space optimized mysql_objects.sql scripts.

[20040214.1235] jonz: added new stats to CGI

added FP stats + overall accuracy to CGI

[20040214.1235] jonz: added debug output for noise filtering

added noise level, spammy tokens, and eliminations to debug output

Version 2.10-beta-1
-------------------

[20040212.2208] jonz: added stale data purge / PURGE_ANY

added stale data purge to libdb3 and libdb4 purge tools.  based on PURGE_ANY,
defined in config.h, any stale data is removed after six months.

[20040212.2205] jonz: added DSF_NOISE flag

added DSF_NOISE flag to libdspam interface for activating Bayesian Noise 
Reduction (Bayesian Dobly).  

[20040211.0158] jonz: disabled mysql_drv _ds_delete_signature

disabled _ds_delete_signature in mysql_drv due to errors; added signature
purge to purge.sql script.  no longer necessary to run dspam_clean if using
the mysql storage driver.

[20040211.0155] jonz: mysql_drv get_one update

check to ensure there is at least one token to be loaded; otherwise, do not
perform the query.

Version 2.9.6
-------------
[20040208.1906] jonz: bugfix for Bayesian Dobly

BUGFIX: when Bayesian Dobly is activated on users with < 4000 innocent
messages, the filter forgets to load token stats for the user and marks
all messages as innocent.

Version 2.9.5
-------------
[20040204.0413] jonz: implemented Bayesian Dobly

implemented Bayesian Dobly noise reduction 
(see http://www.nuclearelephant.com/projects/dspam/dobly.html)

[20040202.2216] jonz: added multipart frequency thresholds

body tokens in multipart messages now require a minimum frequency of 2 to be
included in the calculation.

[20040128.2021] jonz: only report source-addresses in mature corpuses

only report source-addresses when the user has >4000 innocent messages in
their corpus.

Version 2.9.4
-------------

[20030128.0334] jonz: added DSPAM SBL dropfile support

added support to source address tracking for dropping SBL files into
/var/spool/sbl (if the directory exists), where a client in directory-watch
mode can read them.

Version 2.9.3 
-------------

[20040122.0700] jonz: hex decoding

a small piece of code to perform hex-decoding on 8bit encodings.  very useful,
although hex encoding is still somewhat rare.

[20040121.0805] jonz: new stats watering-down code for high-spam users

implemented new code for watering down statistics during the learning phase to
compensate for users with a high percentage of spam.  this should only affect
accuracy of normal (average spam) users for the first 1000 messages.
significant watering down takes place up to 1000 spams.  limited watering
down takes place up to 2500 spams if the user has more spam in their corpus
than innocent mail.

[20040121.0805] jonz: priority given to complex tokens

slight code tweak to give priority to more complex tokens (e.g. chained
tokens) to help improve accuracy.

[20030121.0805] jonz: signature should not be stored when using --corpus

signatures are no longer stored when using the --corpus flag

Version 2.9.1
-------------

[20031220.1442] jonz: added notification emails

three different notification emails can be configured to get sent:

- to a user the first time they receive a message through dspam (first run)
- to a user the first time a spam is caught through dspam on their behalf
- to a user when their quarantine box is > 2MB in size

to use notification emails, copy the txt/ directory from the distribution
into USERDIR and configure the emails accordingly.  more information is
available in the README.
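
The third trigger amounts to a file-size check on the user's quarantine.  a
minimal sketch using POSIX stat(), assuming "2MB" means 2 * 1024 * 1024 bytes
and taking the quarantine path as a parameter; the function name is
hypothetical.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Assumed meaning of "2MB" in the notification rule. */
#define QUARANTINE_NOTIFY_BYTES (2LL * 1024LL * 1024LL)

/* Returns true if the quarantine file exists and is larger than the
 * notification threshold; a missing file is treated as empty. */
bool quarantine_needs_notice(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return false;
    return (long long) st.st_size > QUARANTINE_NOTIFY_BYTES;
}
```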

Version 2.8.1
-------------

[20031205.0821] jonz: html preformatting only for html parts

html preformatting to be done only to html parts; html comments in
plain text parts should not be filtered out.

[20031205.0156] jonz: high-byte tokens not ignored

fixed a small bug causing tokens consisting of all high-bytes to be
ignored.
 
[20031205.0122] jonz: tweaked cgi spam ratio

tweaked cgi spam ratio to include misclassifications

[20031130.1016] jonz: dspam_merge to corpusfy totals

dspam_merge now moves all totals to corpusfed, so that a merged user can
easily start with fresh stats.

[20031129.1619] jonz: fixed quarantine agent arg skip bug

fixed a minor bug which caused some arguments to be skipped when using a custom
quarantine agent
 
[20031129.1443] jonz: implemented opt-in/opt-out storage directory

moved all user.dspam and user.nodspam files to USERDIR/opt-in and
USERDIR/opt-out, respectively.  this removes the need to create and set up
a directory for each user.
 
Version 2.8
-----------

[20031126.1633] jonz: stepped down insert query error to debug info

stepped the query error on insert down to debug info, as it is a common
occurrence on busy servers.

[20031124.0523] jonz: corrected buffer overrun in BDB drivers

corrected buffer overrun vulnerability in BDB drivers dealing with copying
tokens into memory.  discovered when working with corrupt dictionaries which
caused segfaults.  the dictionary would have to be manipulated in order to 
exploit, so risk was minimal.

[20031124.0459] jonz: fixed bug in dspam_2mysql

dspam_2mysql failed to place quotes around token value.

[20031123.1351] jonz: fixed libdb4,libdb3 shared group bug

fixed a bug that caused shared groups to fail with the following error:

DB_ENV->open failed: No such file or directory

[20031120.0405] jonz: fixed HTML boundary corruption with signature removal

fixed a bug that caused boundary corruption after an HTML part where a DSPAM
signature from a previous reply was removed by the agent.

[20031120.0405] jonz: do not remove old signatures from signed messages

corrected the dspam agent so that older signatures are no longer parsed out of
signed messages.  removing them caused the message to fail to authenticate.

Version 2.8-rc-1
----------------

[20031115.2042] jonz: fixed minor memory leak on initialization failure

minor memory leak caused in libdspam when dspam_init fails.  does not affect
DSPAM agent, only library.

[20031115.2042] jonz: DSM_CLASSIFY generated truncated signatures

fixed a bug where DSM_CLASSIFY generated truncated signatures 

[20031115.1540] jonz: corrected multipart analysis bug

corrected a bug that caused parts of a multipart message that were not
specifically marked as text with the "Content-Type" header to be excluded from
analysis.

[20031114.1949] jonz: corrected DSM_CLASSIFY in-memory totals bug

corrected a bug that changed in-memory totals when DSM_CLASSIFY was used

[20031113.1938] jonz: corrected DSM_CLASSIFY bug in libdspam

corrected two bugs in libdspam regarding the DSM_CLASSIFY mode:

1. CTX->signature would overwrite the provided signature with a new signature
   resulting in a potential memory leak

2. If no signature was provided, DSM_CLASSIFY would segfault instead of
   creating a new signature

Version 2.8-beta-2
------------------

[20031103.1119] awn: libdspam version changed to the '4:0:0'

libdspam version changed to '4:0:0' because introducing and requiring
dspam_init_driver() at start and dspam_shutdown_driver() at end is a
backward-incompatible change.

[20031031.0402] jonz: fixed web stats for shared groups

shared group webstats fixed

[20031031.0340] jonz: added commandline options

added --stdout commandline option to deliver messages to stdout
added --deliver-spam commandline option to deliver spams to user's mailbox
changed --deliver flag to --deliver-fp, although --deliver still supported
  for backward compatibility.  option still only necessary when configuring
  with --enable-spam-delivery

[20031031.0324] jonz: changed default configure options

enabled the following as defaults in configure:

alternative-bayesian	(alternative Bayesian algorithm)
test-conditional	(test-conditional, iterative based training)

[20031030.1120] jonz: fixed caching bug

fixed caching bug in the mysql_drv and ora_drv drivers causing dspam_stats
to report the first user's stats as the stats for all users

[20031029.0538] jonz: added --classify commandline flag

the --classify commandline flag will classify the input message and output
to stdout "SPAM" or "HAM" depending on the result.  No changes will be made
to the user's tokens or totals.

[20031029.0538] jonz: changed totals mechanism

the following changes have been made to the totals mechanism:

- spam_misses has been changed to spam_misclassified
- false_positives has been changed to innocent_misclassified
- spam_corpusfed and innocent_corpusfed have been added

IMPORTANT UPGRADE NOTE: Please see the README for information on updating your
SQL databases to accept these changes if you are using a SQL-based driver.  If
you are using a BDB-based driver, these changes will automatically be 
implemented.

[20031028.2000] jonz: corrected CLASSIFY bug in mysql_drv and ora_drv

corrected a significant bug in mysql_drv and ora_drv which caused tokens and
totals to be incremented on all CLASSIFY calls.

[20031028.2000] jonz: changed DSF_CLASSIFY (flag) to DSM_CLASSIFY (mode)

the DSF_CLASSIFY flag is now a mode called DSM_CLASSIFY.

Version 2.8-beta-1
------------------

[20031028.0531] jonz: added customizable header for cgi

cgi spam account now has customizable header

[20031028.0448] jonz: classification catches to add as spam

spam catches by a member of a classification group should result in the
message being added as spam, as opposed to innocent.  this has been corrected.

[20031028.0204] jonz: X-DSPAM-User header only considered in managed groups

the X-DSPAM-User header field is only paid attention to when the user is
a member of a managed group (the only time where the original user is
necessary).

the parsing of the X-DSPAM-User header has also been corrected to chomp the
newline character, which on some systems was being included in the
username.
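
A chomp of this kind can be sketched as below.  the helper name is
hypothetical, and stripping a trailing carriage return as well as the newline
is an assumption beyond what this entry states.

```c
#include <string.h>

/* Strip trailing CR/LF characters from a header value in place, so that
 * "alice\n" or "alice\r\n" becomes "alice". */
void chomp(char *s)
{
    size_t len = strlen(s);
    while (len > 0 && (s[len - 1] == '\n' || s[len - 1] == '\r'))
        s[--len] = '\0';
}
```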

[20031028.0116] jonz: corrected a critical error in classification groups

corrected a critical error in classification groups causing DSPAM to crash
(and the message to get delivered by the MTA's failsafe in most cases) when a
user in a classification group resulted in a spam being caught.

[20031027.0137] jonz: added mta whitelists for source address tracking

the file USERDIR/mta.whitelist may now contain a list of internal MTA IP
addresses, which will cause DSPAM to skip to the next 'Received' header when
processing the source address.  each IP should be on its own line.
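
Conceptually, the whitelist lets the source address come from the first
'Received' header whose relay is not an internal MTA.  a minimal sketch,
assuming the relay IPs have already been extracted from the 'Received' headers
in top-down order and the whitelist has already been loaded into an array;
both function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if ip appears in the whitelist of internal MTAs. */
static int is_whitelisted(const char *ip, const char *const *wl, size_t nwl)
{
    for (size_t i = 0; i < nwl; i++)
        if (strcmp(ip, wl[i]) == 0)
            return 1;
    return 0;
}

/* Walk relay IPs from the topmost 'Received' header down, skipping internal
 * MTAs; return the first external address, or NULL if every hop was
 * whitelisted. */
const char *source_address(const char *const *relays, size_t nrelays,
                           const char *const *wl, size_t nwl)
{
    for (size_t i = 0; i < nrelays; i++)
        if (!is_whitelisted(relays[i], wl, nwl))
            return relays[i];
    return NULL;
}
```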

[20031026.1706] jonz: added signal handling to tools

added signal handling to tools, to unlock databases upon SIGINT, SIGPIPE or 
SIGTERM to avoid stale locks.
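
A common pattern for this, sketched with POSIX sigaction(): the handler only
records that a signal arrived, and the tool's main loop checks the flag,
unlocks its database and exits.  the function names are hypothetical, and this
is not necessarily how the DSPAM tools implement it.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler; polled by the tool's main loop. */
static volatile sig_atomic_t stop_requested = 0;

static void on_signal(int signo)
{
    (void) signo;
    stop_requested = 1; /* only async-signal-safe work inside the handler */
}

/* Install the handler for SIGINT, SIGPIPE and SIGTERM.
 * Returns 0 on success, -1 if any sigaction() call fails. */
int install_unlock_handlers(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);

    int sigs[] = { SIGINT, SIGPIPE, SIGTERM };
    for (size_t i = 0; i < sizeof sigs / sizeof sigs[0]; i++)
        if (sigaction(sigs[i], &sa, NULL) != 0)
            return -1;
    return 0;
}

/* Hypothetical accessor used by the main loop. */
int should_stop(void)
{
    return stop_requested;
}
```

a tool would test should_stop() between records and, once it is set, close
and unlock its database before exiting.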

[20031025.1111] jonz: added rolling filter accuracy stats to cgi

rolling filter accuracy stats allows the user to measure their filtering
accuracy over a period of time (usually monthly or quarterly).  stats should
be reset after a good learning period (approximately 4000 spams and nonspams)
to measure accuracy accurately =)

[20031024.0007] jonz: libdb drivers reworked

libdb drivers reworked for better:
- locking (exclusive)
- recovery (simple recovery run on open)
- environment management (individual user environments)

IMPORTANT UPGRADE NOTE:

run the script 'dspam_movefiles [userdir]' in the tools directory to upgrade to
this new directory storage format.  after running, make sure you chown the
newly created directories to the correct owner.  this should be done with the
MTA shut down and no dspam processes running.

you will also need to reinstall/reconfigure the CGI

[20031023.1949] jonz: update to cgi to avoid missed messages

cgi now tracks the size of the quarantine between viewing and deleting all
messages, to avoid deleting messages that came in while reviewing the
quarantine.

[20031023.1727] jonz: compensated for converged boundaries

compensated for a slight break of RFC where two boundaries in a nested 
message appear without a blank space in-between, leading to message corruption.
fortunately, this type of behavior is extremely rare.

[20031023.0900] jonz: fixed classification group bug

fixed a bug that caused classification groups never to fire; datatype
CTX->confidence should be float, not int.

[20031022.2229] jonz: added "-d %u" to default cgi flags

added "-d %u" to default dspam cgi flags to assist new users

[20031022.0930] jonz: fixed bug preventing multiple group subscriptions

fixed a bug that prevented a user from being subscribed to multiple groups

Version 2.7.6.10
----------------

[20031022.0930] jonz: added support for managed shared groups

the group type 'shared' can be appended with ',managed' to convert the shared
group into a managed shared group.  a managed shared group is the same as a
shared group, only the managed version will share the quarantine box as well,
enabling one user (named after the group) to manage the handling of all
quarantine functions (false positive reporting, etc.).

this is generally not what users want, as personal information could potentially
be shared with the administrator of the group, however there are some
circumstances where this would be appropriate.

a regular shared group:

groupname:shared:user1,user2,userN

a managed shared group:

groupname:shared,managed:user1,user2,userN

[20031022.0930] jonz: corrected long-time stdin bug

corrected a long-standing, just-discovered bug that caused stdin to be read in very
small chunks (32 bytes each).  correcting this bug has caused DSPAM to read
in messages much quicker.
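
Reading the whole message with a reasonably large buffer, rather than a few
bytes at a time, looks roughly like this.  a minimal sketch using stdio; the
8192-byte chunk size and the function name are assumptions, not DSPAM's
values.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK 8192 /* assumed chunk size; the old code effectively used 32 */

/* Read an entire stream into a NUL-terminated heap buffer and store the
 * byte count in *len_out.  Returns NULL on allocation failure or read
 * error.  The caller frees the result. */
char *read_all(FILE *in, size_t *len_out)
{
    size_t cap = CHUNK, len = 0;
    char *buf = malloc(cap + 1);
    if (buf == NULL)
        return NULL;

    for (;;) {
        if (cap - len < CHUNK) {               /* room for one more chunk */
            char *tmp = realloc(buf, cap * 2 + 1);
            if (tmp == NULL) { free(buf); return NULL; }
            buf = tmp;
            cap *= 2;
        }
        size_t n = fread(buf + len, 1, CHUNK, in);
        len += n;
        if (n < CHUNK) {                       /* short read: EOF or error */
            if (ferror(in)) { free(buf); return NULL; }
            break;
        }
    }
    buf[len] = '\0';
    *len_out = len;
    return buf;
}
```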

[20031022.0930] jonz: cgi to use X-DSPAM-Signature

when message-id is not present, the cgi will now use the X-DSPAM-Signature
field to uniquely identify each message.

[20031022.0930] jonz: extended header assembly buffer to 4k

header assembly buffer extended to 4k; was truncating some longer fields at 1k.

[20031022.0930] jonz: minor crash bugfix

an obscure bug has been corrected which caused dspam to crash if the word
"boundary" was placed on a line in the message body, and that line began
with a space or tab.

[20031022.0900] jonz: false positives not delivered when spam-delivery enabled

false positives shouldn't be delivered when --enable-spam-delivery is enabled,
since they will be mailed in (or otherwise processed) directly from the user's
inbox.

to force false positives to be delivered, use the --deliver commandline
argument

Version 2.7.6.9
---------------

[20031021.1300] jonz: significant changes to mysql driver

the data type for the 'token' field in the dspam_token_data table has been
changed from BIGINT to VARCHAR.  This is due to a bug in MySQL being unable to
handle some of the large numeric values used for tokens.  

BEFORE UPGRADING, SHUT DOWN YOUR MTA AND ISSUE THE FOLLOWING MYSQL QUERY:

alter table dspam_token_data modify token varchar(32);

[20031021.1206] awn: Convenience symlinks for libdb{3,4}_deadlock

Convenience symlinks dspam_deadlock.libdb4 (in case of libdb4_drv),
dspam_deadlock.libdb3 (in case of libdb3_drv) and dspam_deadlock (in
case of both libdb*_drv) are added and pointed to the appropriate
libdb{3,4}_deadlock binary.

[20031021.1016] awn: configure: mysql and network-related libraries

-lnsl and -lsocket are added to the mysql client library check where
needed (e.g. on Solaris).

[20031021.0000] jonz: changed signature format to include frequency

WARNING: You should delete all your temporary signature information before
upgrading to this version, as the signature format has changed.  You can do
this by deleting all your .sig files or issuing a 
"delete from dspam_signature_data" query if using a SQL-based driver.

RATIONALE: When performing classification queries with signatures, the
frequency is necessary to ensure an identical calculation.

[20031021.0000] jonz: added support for 'CLASSIFICATION' group

A 'CLASSIFICATION' group type has been added.  Classification groups are
groups of users who share the results of classifying messages against their
own personal dictionaries.  This means that for every message that comes in
for any user in the group, dspam classifies that message for every user, and
if any user's dictionary believes the message to be spam, it is marked as spam
for the destination user.

To avoid false positives, external classification is only used when there is
a spam confidence level of 0.30 or higher.  The confidence level is
calculated with Chi-Square.

Members of this type of group should only join after their initial training
period.  Members may also be part of an inoculation group, but users cannot
be a part of both a classification group and a shared group.
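
The decision rule can be sketched as: the message is spam for the recipient
if their own dictionary says so, or if any member's dictionary calls it spam
with a confidence of at least 0.30.  the per-member result struct and function
name are hypothetical, and how each confidence is computed (Chi-Square, per
this entry) is outside the sketch.

```c
#include <stdbool.h>
#include <stddef.h>

#define CLASSIFY_MIN_CONFIDENCE 0.30f

/* Hypothetical result of classifying the message against one member's
 * dictionary. */
struct peer_result {
    bool is_spam;
    float confidence; /* Chi-Square based spam confidence */
};

/* The recipient's own spam verdict always stands; a peer's spam verdict is
 * only trusted at or above the confidence floor. */
bool group_says_spam(bool own_is_spam, const struct peer_result *peers, size_t n)
{
    if (own_is_spam)
        return true;
    for (size_t i = 0; i < n; i++)
        if (peers[i].is_spam && peers[i].confidence >= CLASSIFY_MIN_CONFIDENCE)
            return true;
    return false;
}
```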

[20031021.0000] jonz: changed default probability for single-corpus tokens

changed the probability for tokens that appear only in one corpus:

TYPE			FROM		TO
Appears +10 in Spam	.9901		.9999
Appears <10 in Spam	.9900		.9998
Appears +10 in Innocent	.0099		.0001
Appears <10 in Innocent	.0100		.0002
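
The new values in the table can be expressed as a small lookup.  this sketch
assumes "+10" means ten or more occurrences and "<10" means fewer than ten;
the function name and arguments are hypothetical.

```c
/* Probability assigned to a token seen in only one corpus, using the
 * values from the "TO" column above.  count is the token's occurrence
 * count in that corpus. */
double single_corpus_probability(int seen_in_spam, long count)
{
    if (seen_in_spam)
        return count >= 10 ? 0.9999 : 0.9998;
    return count >= 10 ? 0.0001 : 0.0002;
}
```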

[20031019.2200] jonz: added test-conditional training support

added configure flag --enable-test-conditional, which enables test-conditional
training.  test-conditional training will automatically re-train the user's
dictionary on a spam or false positive until the message condition is met
(i.e. until the user's dictionary no longer misclassifies the message being
retrained).  this training has a maximum of 5 iterations, and will only be
invoked when:

- The user has > 4000 innocent messages in their corpus, and is reporting
  a spam

- The user is reporting a false positive (regardless of the number of
  messages in their corpus)
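
The retraining loop and its invocation conditions can be sketched as below.
the callback types and function names are hypothetical; only the 5-iteration
cap, the 4000-message threshold and the two invocation conditions come from
this entry.

```c
#include <stdbool.h>

#define TC_MAX_ITERATIONS 5

/* Hypothetical callbacks: classify() returns true if the current dictionary
 * calls the message spam; retrain() trains it once toward as_spam. */
typedef bool (*classify_fn)(void *ctx);
typedef void (*retrain_fn)(void *ctx, bool as_spam);

/* Runs for every false positive, but for missed spam only once the user
 * has more than 4000 innocent messages. */
bool tc_should_run(bool reporting_spam, long innocent_total)
{
    return !reporting_spam || innocent_total > 4000;
}

/* Retrain until the dictionary agrees with the user's verdict or the
 * iteration cap is reached.  Returns the number of retrains performed. */
int tc_train(void *ctx, bool as_spam, classify_fn classify, retrain_fn retrain)
{
    int i;
    for (i = 0; i < TC_MAX_ITERATIONS; i++) {
        if (classify(ctx) == as_spam)
            break;
        retrain(ctx, as_spam);
    }
    return i;
}
```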

[20031019.2016] jonz: added support for shared groups in mysql_drv driver

support has been added for shared groups using the mysql_drv driver, but with
one caveat: if you will NOT be enabling "virtual users" support, you will need
to create a user on your system for each group you add.  This is because the
mysql_drv driver maps user ids in the database to users on the system.  this
is not an issue when "virtual users" support is enabled.

Version 2.7.6.8
---------------

[20031019.1722] jonz: added mysql.sock functionality

added functionality for connecting via mysql.sock instead of TCP.  specify
pathname to socket in lieu of hostname to implement.

[20031019.1700] jonz: eliminated false-positive retrain headers

eliminated the additional X-DSPAM headers added when reclassifying a 
false positive.  the headers from the original classification are
preserved.

[20031019.1530] jonz: centralized syslog logging of mysql query errors

centralized/standardized syslog logging of all mysql query errors

[20031019.1530] jonz: corrected bug in virtual users w/mysql

corrected a bug causing some tools to fail when virtual users is enabled while
using the mysql_drv driver.

[20031018.1050] jonz: corrected typo in dspam_corpus.in

fixed close(PIPIE) typo in dspam_corpus.in

Version 2.7.6.7
---------------

[20031017.2230] jonz: enhanced overall inoculation processing

code cleanup of inoculation processing; one central subroutine.  fixed some
minor related bugs.

[20031017.2129] jonz: corrected external inoculation processing

external inoculations (--corpus --inoculate --addspam combination) resulted in
an error causing the user to never be inoculated, however all users in the
inoculation group were.  corrected this bug so that the destination user would
also be inoculated. 

Version 2.7.6.6
---------------

[20031017.1930] jonz: fixed bugs in CGI 'From' line reporting

fixed a bug that caused malformatting in the 'From' line when placing messages
in the spam quarantine

[20031017.1930] jonz: fixed bugs in false positive processing

fixed a bug, which now strips out any quarantine message 'From' line added by
DSPAM prior to processing.

[20031017.1930] jonz: fixed variable definition problems with experimental code

fixed bugs in experimental code; should not affect normal users, but broke
the build anyway.

Version 2.7.6.5
---------------

[20031017.1730] jonz: added --enable-experimental

added --enable-experimental flag which activates experimental code, moved
the following code bases to experimental:

- Versatile Language Message Inoculation Format
  (standard for sending/receiving inoculations across multiple anti-spam
   platforms and systems)

- Counting of unknown tokens in messages

[20031017.1700] jonz: only inoculate users who require inoculation

inoculation now only inoculates users who would otherwise have misclassified
the message being presented
 
[20031017.1600] jonz: changed all /tmp files to USERDIR

all /tmp files now outputted to USERDIR to avoid a race condition.

[20031016.2207] awn: libdb detection is changed again (sigh)

Probing for -ldb-<major> and -ldb<major> is resurrected again (needed
for some versions of Debian with libdb v3.2.9).  The difference from the
previous one is that libtool is used for linking the test program at the
"header vs. library version" check stage.

[20031016.1837] jonz: changed high characters to 'z' instead of ignored

changed all high characters to z's; previously ignored them.  effective way to
improve filter rate on spams using wide characters.  credit for this technique
given to Brian Burton.
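
A minimal sketch of the technique, assuming "high characters" means bytes
with the top bit set (0x80 to 0xFF); the function name is hypothetical.

```c
#include <stddef.h>

/* Replace every byte with the high bit set by 'z', in place, so runs of
 * wide/8-bit characters still produce usable tokens instead of being
 * dropped. */
void fold_high_bytes(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] >= 0x80)
            buf[i] = 'z';
}
```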

[20031016.1400] jonz: added warning about MySQL bug to README

added information about the bug in MySQL versions < 4.0.15.stable to the
MySQL README.

[20031016.1227] jonz: compensated for mysql_drv insert bug

compensated for mysql_drv insert bug; improved the code in both mysql_drv and
ora_drv to handle insert failures with more grace

[20031016.1142] jonz: corrected token insert debug output

corrected debug output for token inserts to display correct query and disk
state.

Version 2.7.6.4
---------------

[20031016.0946] jonz: switched to MyISAM MySQL tables

InnoDB turned out to be much slower than MyISAM, so all MySQL objects have
been changed to be of type "MyISAM".

[20031015.1434] jonz: added exit code mirroring of LDA

added exit code mirroring of LDA; if any calls to LDA fail, dspam will return
the last failed exit code
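
The mirroring rule reduces to "0 if every delivery succeeded, otherwise the
last nonzero code".  a minimal sketch, assuming the exit codes of each LDA
call (e.g. 75, EX_TEMPFAIL in sysexits.h) have been collected in call order;
the function name is hypothetical.

```c
#include <stddef.h>

/* Return 0 if all LDA invocations succeeded, otherwise the exit code of the
 * last one that failed. */
int mirror_lda_exit(const int *codes, size_t n)
{
    int result = 0;
    for (size_t i = 0; i < n; i++)
        if (codes[i] != 0)
            result = codes[i];
    return result;
}
```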

[20031015.1045] jonz: added caching of getpwnam() and getpwuid() information

added caching of getpwnam() and getpwuid() information for non-virtual users
(already caches for virtual users).  this was added to keep some tools from
hammering on LDAP or other local authentication mechanisms.

Version 2.7.6.3
---------------

[20031014.2211] jonz: fixed 100% cpu utilization bug in libdbX_deadlock

fixed a bug in libdbX_deadlock causing 100% cpu utilization on linux
 
[20031014.1935] jonz: fixed auto-recovery in libdb drivers

fixed bugs in auto-recovery mechanism in libdb drivers

[20031014.1545] jonz: added support for accepting inoculation messages

Added support for "Inoculation Message Format", a new standard which
is currently in the form of an Internet-Draft, to allow inoculation
via email and trusted checksums.

[20031014.0824] jonz: added X-DSPAM-Signature

X-DSPAM-Signature is NOT a replacement for having in-line signatures
but is useful for debugging purposes

[20031014.0842] jonz: enhanced boundary recognition

enhanced boundary recognition to catch boundaries with malformatted 
definition lines

[20031013.2217] jonz: fixed bug in dspam_2mysql

fixed typo: renamed the 'false-positives' field to false_positives

[20031013.1949] jonz: better html filtering

implemented better filtering of some useless html tag data to focus more on
content; this resulted in a few more spams being caught

[20031013.1832] jonz: added --inoculate flag

added support for inoculation using --inoculate flag.  this can be used in
conjunction with external inoculation as described in the README file.

Version 2.7.6.2
---------------

[20031013.1443] jonz: fixed algorithm initialization bug

fixed a bug in the initialization of algorithm data, which caused some
miscalculations whenever the first token was very innocent.

[20031013.1413] jonz: changed token sorting algorithm

token sorting now sorts by delta first, then by frequency; this means 
tiebreakers will be based in part on token frequency
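
A qsort() comparator for this ordering, assuming "delta" means the distance
of a token's probability from the neutral 0.5 and that larger deltas and
higher frequencies sort first; the struct is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical token record used during calculation. */
struct scored_token {
    double probability; /* spam probability, 0.0 .. 1.0 */
    long frequency;     /* occurrences in the message */
};

/* Distance of a probability from the neutral value 0.5. */
static double delta(double p)
{
    return p > 0.5 ? p - 0.5 : 0.5 - p;
}

/* qsort() comparator: largest delta first; ties broken by higher
 * frequency. */
int cmp_tokens(const void *a, const void *b)
{
    const struct scored_token *x = a;
    const struct scored_token *y = b;
    double dx = delta(x->probability), dy = delta(y->probability);
    if (dx != dy)
        return dx > dy ? -1 : 1;
    if (x->frequency != y->frequency)
        return x->frequency > y->frequency ? -1 : 1;
    return 0;
}
```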

[20031013.1329] jonz: added deadlock detection tool

for large-volume implementations, added a deadlock detection tool, 
libdb3_deadlock or libdb4_deadlock.  this tool can be run at system start and
will continue to perform deadlock operations in the background.
 
[20031013.1317] jonz: implemented deadlock detection

Implemented calls to libdb's deadlock detection mechanism

[20031013.1250] jonz: modified Chi-Square algorithm for better performance

Chi-Square algorithm changed to use 25 tokens, ignoring mid-range
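
The token selection step can be sketched as follows; the Chi-Square combining
itself is not shown.  the mid-range band of 0.4 to 0.6 and the function name
are assumptions, not DSPAM's actual values.

```c
#include <stddef.h>

#define CHI_MAX_TOKENS 25
#define MID_LOW  0.4 /* assumed bounds of the ignored mid-range */
#define MID_HIGH 0.6

/* Copy at most 25 token probabilities into out, skipping mid-range values.
 * probs is assumed to be sorted most significant first, so the result holds
 * the strongest evidence.  Returns the number of tokens selected. */
size_t select_chi_tokens(const double *probs, size_t n, double *out)
{
    size_t used = 0;
    for (size_t i = 0; i < n && used < CHI_MAX_TOKENS; i++) {
        if (probs[i] > MID_LOW && probs[i] < MID_HIGH)
            continue;
        out[used++] = probs[i];
    }
    return used;
}
```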

[20031012.1831] jonz: changed group file format, added inoculation type

changed group format to:

groupname:grouptype:user1,user2,userN

BE SURE TO UPDATE IN YOUR GROUP FILE

there are now two types of groups: shared and inoculation.  the shared group
is the group everyone is used to, sharing dictionaries and signature dbs.

the inoculation group allows each member of the group to maintain their own
private dictionary and signature database, but members of the group will
automatically train each other's dictionaries with spams they manually forward in
which will help 'inoculate' all other group members from new spams going out.

examples:

development:shared:bob,tom,bill

company:inoculation:jim,ted,robert

a user can be a member of multiple inoculation groups, but cannot be a member
of both a shared group and an inoculation group.
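
Parsing a line of this format means splitting on the first two colons and
then on commas.  a minimal sketch using strchr() and POSIX strtok_r(); the
struct, the member limit and the function name are hypothetical, and the
input line is modified in place.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>

#define MAX_MEMBERS 64 /* arbitrary limit for the sketch; extras are ignored */

struct group_line {
    char *name;
    char *type;                  /* e.g. "shared" or "inoculation" */
    char *members[MAX_MEMBERS];
    size_t nmembers;
};

/* Parse "groupname:grouptype:user1,user2,userN" in place.
 * Returns 0 on success, -1 if a field is missing or empty. */
int parse_group_line(char *line, struct group_line *g)
{
    char *colon1 = strchr(line, ':');
    if (colon1 == NULL) return -1;
    char *colon2 = strchr(colon1 + 1, ':');
    if (colon2 == NULL) return -1;

    *colon1 = '\0';
    *colon2 = '\0';
    g->name = line;
    g->type = colon1 + 1;
    g->nmembers = 0;

    char *save = NULL;
    for (char *u = strtok_r(colon2 + 1, ",\r\n", &save);
         u != NULL && g->nmembers < MAX_MEMBERS;
         u = strtok_r(NULL, ",\r\n", &save))
        g->members[g->nmembers++] = u;

    return (*g->name && *g->type && g->nmembers > 0) ? 0 : -1;
}
```

splitting only on the first two colons keeps a type field such as
"shared,managed" intact.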

[20031012.0009] jonz: fixed freed-memory bug in decode.c

fixed freed-memory bug in decode.c, which caused an occasional crash when
decoding encoded headers.

Version 2.7.6.1
---------------

[20031011.1236] jonz: added support for multiple algorithms

added support for multiple algorithms; i.e. if any of the enabled algorithms
suspect the message is spam, it is spam.  you can use the following flags:

--enable-chi-square
--enable-alternative-bayesian
--disable-traditional-bayesian

traditional bayesian is enabled by default

[20031011.1034] jonz: added Chi-Square specific per-token calculations

when using Chi-Square, added Chi-Square's expanded per-token calculations

[20031011.0923] jonz: fixed alternative bayesian calculations

fixed problem with the wrong definition names being used, which caused
alternative bayesian never to get invoked

[20031011.0923] jonz: fixed a bug in all calculations

a bug in 2.7.6 was fixed which caused spams to be missed if there were
fewer than 15 tokens available for calculation.  this could only occur in the
rarest of circumstances, so it should not have affected much.

Version 2.7.6
-------------

[20031008.2200] jonz: added alternative calculation modes

added --enable-alternative-bayesian flag which invokes Brian Burton's 
alternative Bayesian algorithm 

added --enable-chi-square flag which invokes Chi-Square algorithm

only one or neither (for default bayesian) flags should be used.  debug
information for all three calculations is generated regardless.

[20031008.2029] jonz: fixed bug in libdb drivers

fixed a bug which used memory that had already been freed causing
some occasional unpredictable behavior.
 
[20031008.1431] jonz: added support for multipart/signed messages

added support for multipart/signed messages without altering message body.
signature is appended as a text attachment.

[20031007.1904] jonz: fixed bug in boundary detection

fixed a bug in boundary detection where boundary would fail to be detected if
it wasn't the first parameter in the Content-Type header.  For example:

Content-Type: multipart/signed; protocol="application/x-pkcs7-signature"; 
  boundary="------------ms010307080208090601090900"

would have failed.  this bug fix also improves overall boundary detection. 
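
The fix amounts to searching the whole Content-Type value for the boundary
parameter instead of assuming it comes first.  a minimal sketch handling
quoted and unquoted values; it is simplified (for example, it does not check
that the match starts a parameter name), and the function name is
hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Find the boundary parameter anywhere in a Content-Type header value and
 * copy it into out (size outlen).  Returns 0 on success, -1 if it is not
 * found or does not fit. */
int find_boundary(const char *ctype, char *out, size_t outlen)
{
    const size_t keylen = strlen("boundary");
    for (const char *p = ctype; *p; p++) {
        if (strncasecmp(p, "boundary", keylen) != 0)
            continue;
        const char *q = p + keylen;
        while (*q == ' ' || *q == '\t') q++;
        if (*q != '=') continue;
        q++;
        while (*q == ' ' || *q == '\t') q++;

        size_t len;
        if (*q == '"') {                      /* quoted value */
            const char *end = strchr(++q, '"');
            if (end == NULL) return -1;
            len = (size_t) (end - q);
        } else {                              /* token value */
            len = strcspn(q, "; \t\r\n");
        }
        if (len == 0 || len >= outlen) return -1;
        memcpy(out, q, len);
        out[len] = '\0';
        return 0;
    }
    return -1;
}
```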

[20031007.1724] jonz: added source address reporting

the source address for all messages is now reported via syslog.  this uses
the new dspam_getsource() function added to the API.  depending on whether the
message is spam or innocent, the message will be reported either to MAIL.INFO
or MAIL.DEBUG.  for example:

dspam[30965]: spam detected from X.X.X.X 

dspam[30414]: innocent message from X.X.X.X 

this can be used for creating automatic blacklists.  more to come.
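
The priority selection can be sketched with the standard syslog() interface.
the function name is hypothetical, and the address is assumed to have been
extracted already; dspam_getsource() is not used here because its signature
is not given in this entry.

```c
#include <stdbool.h>
#include <syslog.h>

/* Log the source address at MAIL.INFO for spam and MAIL.DEBUG for innocent
 * mail.  Returns the priority used. */
int report_source(const char *ip, bool is_spam)
{
    int prio = LOG_MAIL | (is_spam ? LOG_INFO : LOG_DEBUG);
    if (is_spam)
        syslog(prio, "spam detected from %s", ip);
    else
        syslog(prio, "innocent message from %s", ip);
    return prio;
}
```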

[20031007.1557] awn: configure script changes

Configure script now detects version of libdb headers and guesses
appropriate library name from this version.  Probed libraries are:

    -ldb-<major>.<minor>
    -ldb<major><minor>

As a consequence, for example, symlinking libdb41.so to libdb-4.so is no
longer required on FreeBSD.

Version 2.7.5
-------------

[20031007.0930] jonz: date field no longer ignored

date field is no longer ignored; time of day can sometimes play an effective
role in identifying spam or preventing false positives.

[20031006.1911] jonz: Oracle storage driver

first release of ora_drv; storage driver for Oracle.  please see README file
for more information.

[20031004.1423] awn: support for program-name transformation.

Configure options `--program-prefix', `--program-suffix' and
`--program-transform-name' are fully supported now except CGI.
(Was: dspam_corpus and dspam_genaliases don't honor transformed name of
dspam binary).

[20031003.1832] jonz: fix for base64-encoded binary messages 

bug fixed which caused corruption in some base64-encoded single-part
messages in which the only component was a binary file.

[20031003.0031] jonz: automatic recovery for libdb drivers

automatic recovery has been implemented for libdb drivers 

[20031003.0031] jonz: DB_ENV implemented for libdb drivers

DB_ENV locking has been implemented for libdb drivers.  This obsoletes 
storage driver dot-lock file locking, which is no longer used.  quarantine 
dot-lockfile locking is still used when writing to the quarantine.

Version 2.7.4
-------------

[20031002.1728] jonz: modified corpus flag to force results

use of corpus flag now forces results to match commandline flags, meaning
innocent messages no longer need to be fed in first.
 
[20031002.0800] jonz: added unique id to dspam_ngstats

for systems without a static public ip address, a unique id can be configured
in dspam_ngstats.c (NGSTATS_UID) comprised of alphanumeric characters, periods,
and underscores.  any invalid characters will cause stats to be ignored.
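A sketch of the character rule above, assuming the id is checked as a plain NUL-terminated string. ngstats_uid_valid() is hypothetical, and treating an empty id as invalid is an assumption.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical check: a unique id may contain only letters, digits,
 * periods and underscores. Returns 1 if valid, 0 otherwise. */
int ngstats_uid_valid(const char *uid)
{
    if (uid == NULL || *uid == '\0')
        return 0;                      /* assumption: empty id is invalid */
    for (; *uid != '\0'; uid++) {
        unsigned char c = (unsigned char)*uid;
        if (!isalnum(c) && c != '.' && c != '_')
            return 0;                  /* any other character rejects the id */
    }
    return 1;
}
```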

[20031002.0800] jonz: removed broken sanity checks

some sanity checks were firing off erroneous messages in 2.7.3; these have
been removed

[20031001.0800] jonz: fixed --enable-large-scale with mysql_drv

modified all drivers to add support for --enable-large-scale with mysql_drv

[20031001.0800] jonz: added dspam_ngstats

added dspam_ngstats, a reporting tool for tracking global dspam
statistics

[20030930.1547] awn: Convenience symlinks for libdb{3,4}_purge

IMHO, `libdb3_purge' and `libdb4_purge' are not very descriptive names.
Therefore, 2 convenience symlinks are added:
  o  dspam_purge.libdb4  (dspam_purge.libdb3 in case of libdb3 driver), and
  o  dspam_purge
both pointing to the appropriate libdb{3,4}_purge.

[20030930.1517] jonz: fixed problem with trailing commas in update command

Version 2.7.3
-------------

[20030929.1450] jonz: fixed problem with groups

groups support has been repaired; apparently a line of code was inadvertently
deleted from the source tree, causing it to fail in 2.7.2.

[20030928.0253] awn: New scheme for conditional compilation of storage drivers

All of the following applies to `configure.ac' and the resulting `configure'
script:

    configure no longer assumes that storage driver sources are named
    `${storage_drv}.c' and `${storage_drv}.h'.

    Instead, list the resulting .lo files in the `${storage_drv_objects}'
    variable.

    Storage driver specific subdirectories should also be listed in the
    `${storage_drv_subdirs}' variable.

This allows any number (including zero) of driver-specific sources and
subdirectories, automatically builds driver-specific tools in those
directories (like `libdb4_purge'), and should work properly in a VPATH
environment.

[20030928.0248] awn: configure.ac bug fix

Fix CPPFLAGS related bugs in the storage drivers sections of
`configure.ac'.

All three storage sections in configure.ac had code like
    CPPFLAGS="$DB_LIBS $CPPFLAGS"
instead of
    CPPFLAGS="$DB_CPPFLAGS $CPPFLAGS"
(with MYSQL_ in place of DB_ for the mysql case).

This was my bug, I know.

[20030927.1600] jonz: added docs for Courier MTA

added documentation for configuring Courier MTA with DSPAM.  contributed by
Michael Greb.

Version 2.7.2
-------------

[20030925.2231] jonz: added --disable-trusted-user-security

added configure flag --disable-trusted-user-security to disable trusted user
security, rather than trying to maintain two different versions of dspam.

[20030925.1103] jonz: added support for RedHat's built-in libdb4.0

added support for RedHat's built-in libdb-4.0.  This should also provide
compatibility with any other libdb-4.0.  An alias will still be necessary:

ln -s /usr/lib/libdb-4.0.so /usr/lib/libdb-4.so

[20030925.1103] jonz: removed -d $u from default LDA configuration

-d $u coming first in the argument list caused some problems; -d %u should now
be used instead in the MTA configuration.
 
[20030925.1103] jonz: patch to compensate for yahoo broken RFC bug

implemented a patch to compensate for a bug in the yahoo client, which violates
the RFC by writing an end boundary prematurely, causing the real boundary to
get corrupted.

[20030925.0855] jonz: changed compile flag --enable-virtual-uids

changed compile flag --enable-virtual-uids to --enable-virtual-users

[20030925.0852] jonz: fixed plain text html signature placement bug

fixed a small bug that caused DSPAM to place the signature in html code samples
in plain text.  

[20030924.0000] jonz: added support for virtual users

added support for virtual users in mysql_drv.  this is necessary when the
users don't actually exist on the system.  use --enable-virtual-users to
enable.  only necessary when using the mysql storage driver.

[20030923.2043] jonz: fix for multiple user bug

restored %u and adjusted docs for multiple local user bug with sendmail

Version 2.7.1
-------------

[20030923.0050] jonz: fixes for libdb tools

several small fixes to issues with compiling libdb tools

[20030923.0045] jonz: bug fix for header decoding

fixed a bug causing some headers to decode incorrectly

[20030923.0030] jonz: bug fix for attachments and signature

added code to specifically NOT append a signature to any segments that have
"Content-Disposition" of type attachment.

[20030922.1900] jonz: added more debug output 

added more debug output (on error) to mysql driver and libdspam

[20030920.0840] jonz: mysql_drv to use -lm -lz 

switched mysql_drv to use -lm -lz in place of -lcrypto.  both apparently have
compress/uncompress functions

Version 2.7
-----------

[20030919.0900] jonz: added dspam_merge tool

Version 2.7.beta.3
------------------

[20030915.0000] jonz: added mysql_drv storage driver

mysql_drv storage driver added for MySQL functionality.  please see README
and tools.mysql_drv for more information.

[20030914.1410] jonz: fixed bug in innocent_hits

fixed bug where some tokens received 2 innocent hits instead of 1 (apparently
an old bug, but it did not dramatically affect effectiveness)

[20030913.0956] jonz: implemented quarantine locking

implemented quarantine locking mechanism independent of driver locking

[20030913.0900] jonz: internalized API locking

all API locking performed internally (driver-specific).  no external locking
calls exist; part of _ds_init_storage and _ds_shutdown_storage.  reason:
not all drivers will require context locking (and hopefully someday neither
will libdb3/libdb4 drivers).

[20030912.0000] jonz: locks to use USERDIR

for driver compatibility, all .lock file locking takes place in USERDIR, even
for large-scale implementations

[20030911.0000] jonz: driver config script management

implemented driver configure script management and tools.[driver] for
driver-specific tools.

Version 2.7.beta.2
------------------

[20030910.0054] jonz: message header decoding

added message header decoding per RFC 2047

[20030909.1830] jonz: implemented standardized return codes

implemented standardized return codes for the major api functions:
EINVAL, EFAILURE, ELOCK, EFILE, EUNKNOWN

[20030909.1730] jonz: ported all tools to new driver API

ported all tools to new driver API.  dspam_purge has been replaced with
a driver-specific purge mechanism (default: libdb4_purge), because not all
drivers will need to purge, and recreating datafiles is a very specific
function.  the purge tool still uses the storage driver API's locking mechanism.

[20030909.0051] jonz: removed dspam_convert

removed dspam_convert tool for 2.5->2.6 upgrades

[20030909.0051] awn: configure script changes

`--enable-gcc-warnings' configure option is added.

[20030908.2000] jonz: implemented storage driver API

implemented storage driver api.  default driver is libdb4_drv

[20030907.1627] awn: dspam_genaliases changes

dspam_genaliases now generates `nospam-USER' aliases (aliases for false
positive reporting) only on explicit request, via the new `--nospam' command
line option.

Version 2.7.beta.1
------------------

[20030907.1140] jonz: user identification and passthru changes

the method of user identification and passthru has been changed:

  - DSPAM no longer recognizes -d to identify the user, but instead --user
    must be used.  --user will never be passed onto the local delivery agent.

  - In order to pass the -d flag through to the local delivery agent, it
    must be specified either separately on the commandline, or at configure
    time. 

  - To support the -d flag at configure time (and when overriding
    untrusted users), the $u variable has been added to dspam.
    any commandline arguments passed through DSPAM matching $u will be
    replaced with the actual destination username (specified with --user
    or automatically forced for untrusted users).
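A minimal sketch of the $u substitution, assuming "matching $u" means an argument that is exactly "$u". substitute_user() is a hypothetical name, not a function from the dspam source.

```c
#include <string.h>

/* Hypothetical sketch: point every argument that is exactly "$u" at the
 * destination username before argv is handed to the LDA. Only pointers
 * are replaced; no strings are copied or modified. */
void substitute_user(const char **argv, int argc, const char *username)
{
    for (int i = 0; i < argc; i++) {
        if (strcmp(argv[i], "$u") == 0)
            argv[i] = username;
    }
}
```

For the argument list "/path/to/lda -d $u" and user alice, the LDA would then receive "-d alice".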

These changes require some modifications to the mailer configuration.  In the
following example for sendmail, you would change the following line in
the Mlocal block:

A=/usr/local/bin/dspam -d $u

to:

A=/usr/local/bin/dspam --user $u -d $u

--user is not passed through to the LDA, but -d is.  Alternatively, you could
remove '-d $u' from sendmail.cf, and configure dspam with:

--with-local-delivery-agent="/path/to/lda -d \$u"

NOTE: be sure to escape the $ in $u ONLY when specifying it on the commandline.
This will prevent $u from being overwritten with the shell's environment
variable 'u'.

Specifying this at configure time is especially useful if you plan on running 
dspam via commandline and do not want to have to specify -d [username] in 
addition to your --user [username] arguments.

[20030907.1440] jonz: removed --deliver-cmd and --quarantine-cmd

removed runtime --deliver-cmd and --quarantine-cmd functions; added configure
time --with-quarantine-agent="/path/to/agent" to override default quarantine
function.

[20030906.0000] jonz: fix for boundary definition identification

fix to detect non-lowercase multipart boundary definitions

[20030906.0000] jonz: partial rewrite of internal sorting routines

partial rewrite of tbt sort routines to drop recursion and potential stack
problems to follow.  problems only experienced when using API with
multithreaded code.  original patch submitted by Stuart Gathman 
<stuart@bmsi.com>

[20030906.0000] jonz: --deliver-cmd and --quarantine-cmd require trusted user

forced --deliver-cmd and --quarantine-cmd to require trusted user
permissions.  dspam also must be compiled with
--enable-insecure-functions for them to be available.

[20030906.0000] jonz: trusted user implementation

implemented trusted user approach with user and passthru overrides for the
untrusted users.  see README for more information

Version 2.6.5.2
---------------

[20030906.0000] jonz: insecure parameter check

insecure parameter check; checks parameters for insecure characters:
| ; < > ` 
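A sketch of such a check using the standard strpbrk(). parameter_is_secure() is a hypothetical name; it rejects only the five characters listed above and is not a general shell-escaping scheme.

```c
#include <string.h>

/* Hypothetical sketch: returns 1 if param contains none of the insecure
 * characters listed above, 0 otherwise. strpbrk() returns a pointer to
 * the first character of param found in the set, or NULL if none is. */
int parameter_is_secure(const char *param)
{
    return strpbrk(param, "|;<>`") == NULL;
}
```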

Version 2.6.5.1
---------------

[20030905.1105] jonz: partitioned insecure functions

partitioned potentially insecure functions so that they are only activated
when the --enable-insecure-functions configure flag is set.  these include:

--deliver-cmd
--quarantine-cmd

special attention needs to be given to the execution permissions of the dspam
agent when enabling these functions, to prevent users from executing
arbitrary commands on the server.  these functions are potentially insecure
and could lead to the execution of arbitrary code if exploited by a malicious
user or CGI.

[20030905.0418] jonz: fixed bug: from header corruption

if the MTA was passing in From headers, they were corrupted by DSPAM's
header parsing.  fixed by parsing From headers separately.

[20030904.1422] jonz: fixed bug with quoted-printable decoding

fixed a small bug that would fail to decode a quoted character immediately
following a line break

[20030904.1127] awn: c89 compatibility

A C89 compatibility patch has been applied.  Patch author: Albert Chin-A-Young
<china@thewrittenword.com>

	* configure.ac, base64.c, decode.c, dspam.c, error.c,
	error.h, libdspam.c, localdb.c, lock.c, signature.c,
	tools/dspam_dump.c: Allow building with a C89 compiler
	which does not have ISO varargs.

[20030904.1046] awn: work around Solaris' make

tools/Makefile.am no longer uses the $< automatic variable, because some
versions of Solaris make don't support it.

[20030904.0700] jonz: segfaulting on _ds_message_destroy

fixed a bug where destroying CTX->message caused a segfault.  fortunately, this
bug would never have been reached by the agent or the api.

[20030904.0700] jonz: nfs locking

modified lock.c to work over nfs mounts, checking the pid only when the
hostname matches.  stale locks are removed after at most 20 minutes.
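A sketch of the NFS-safe stale-lock test described above. It assumes the lock file records the owning hostname and pid and that its mtime is available from stat(); lock_is_stale() and MAX_LOCK_AGE are hypothetical names, not lock.c's actual code. kill(pid, 0) only checks whether the process exists and sends no signal.

```c
#define _POSIX_C_SOURCE 200809L   /* for kill() and pid_t */
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

#define MAX_LOCK_AGE (20 * 60)    /* seconds: the 20-minute ceiling above */

/* Hypothetical sketch: return 1 if an existing lock may be removed. */
int lock_is_stale(const char *lock_host, pid_t lock_pid, time_t lock_mtime,
                  const char *my_host, time_t now)
{
    /* Any lock older than the ceiling is treated as stale. */
    if (difftime(now, lock_mtime) > MAX_LOCK_AGE)
        return 1;

    /* A pid only means something on the host that wrote it, so it is
     * probed only when the hostnames match. ESRCH means no such process;
     * EPERM means it exists but belongs to another user (not stale). */
    if (strcmp(lock_host, my_host) == 0 &&
        kill(lock_pid, 0) == -1 && errno == ESRCH)
        return 1;

    return 0;
}
```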
 
[20030903.1716] awn: dspam_corpus and dspam_genaliases update

dspam_corpus and dspam_genaliases now use the real path to the dspam binary
instead of assuming the default /usr/local/bin/dspam.

dspam_genaliases now writes the aliases table to stdout by default.
Use the new `-o filename' or `--output filename' option to redirect it to
a file.

dspam_genaliases now generates `nospam-USER' aliases in addition to the
`spam-USER' aliases.

[20030903.0145] jonz: fixed memory leak in dspam agent

fixed internal memory leak in dspam agent where CTX->message was not destroyed.
only leaked until dspam agent exited, then memory was reclaimed

[20030903.0145] jonz: updated example.c 

updated example.c to show correct CTX->message destruction

[20030903.0115] jonz: fixed bug in false positive reporting

fixed bug where innocent_hits incremented twice on false positive report

Version 2.6.5
-------------

[20030902.0000] jonz: added --version commandline parameter

added --version commandline parameter to display version; -v is not used as
it could be a passthru parameter to an LDA.

[20030902.0000] awn: dspam_purge changes

minor fixes to dspam_purge tool

[20030901.0000] awn: configure changes

- implemented checks (and use of results) for <sys/time.h> <time.h> 
- checks for math.h and fabs() were added; -lm is used where needed
- aesthetic changes

[20030901.0000] awn: removed compiler warnings

removed "no previous prototype" warnings with some compilers

[20030901.0000] awn: compiler warnings

miscellaneous changes to remove some compilation warnings

Version 2.6.5-rc1.1
-------------------

[20030831.0000] jonz: debug output

removed leftover debug output

Version 2.6.5-rc1
-----------------

[20030829.0000] jonz: fixed broken rfc attachments

compensated for RFC-breaking messages with embedded attachments, where the
original message should have been attached as message/rfc822 but was instead
attached as text/plain.  this caused attachments to be processed and consume
large amounts of time.  decode.c modified to accept a new boundary definition
from any header.

[20030829.0000] jonz: --corpus flag foregoes message delivery/quarantine

use of the --corpus flag will now prevent the messages fed in as corpus from
being delivered/quarantined

[20030829.0000] jonz: added commandline delivery override

commandline flags --deliver-cmd and --quarantine-cmd added to override the
default behavior for delivery (MLOCAL) and quarantine (either MLOCAL or
quarantine depending on configuration).  syntax:

dspam --deliver-cmd "/path/to/cmd -flags" 
dspam --quarantine-cmd "/path/to/cmd -flags"

(be sure not to use = sign).

when overridden values are used, the user id is by default NOT passed through to
the called program.  use --with-passthru to pass ARG_USER %USER through to
the called program.  example:

dspam --deliver-cmd "/bin/cat" --with-passthru

actually calls: /bin/cat -d [username]

dspam --deliver-cmd "/bin/cat"

actually calls: /bin/cat

[20030829.0000] jonz: signature insertion moved inside body tag

dspam signature now inserted (wherever possible) inside HTML body tags so
that it is not dropped under certain conditions.

[20030829.0000] jonz: changed dspam signature

dspam signature changed to a visible signature to work with clients that
reformat only visible data (Eudora).  new signature:

!DSPAM:[SERIAL]!
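A sketch of locating the serial in the new visible signature format. find_signature() is hypothetical, and it does not validate the serial's characters, since the format of [SERIAL] is not specified here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find the first !DSPAM:[SERIAL]! marker in text and
 * copy SERIAL into out. Returns 0 on success, -1 if the first marker is
 * incomplete, empty, or does not fit in outlen bytes. */
int find_signature(const char *text, char *out, size_t outlen)
{
    static const char tag[] = "!DSPAM:";
    const char *start = strstr(text, tag);

    if (start == NULL)
        return -1;
    start += sizeof tag - 1;                 /* skip past "!DSPAM:" */

    const char *end = strchr(start, '!');    /* closing '!' */
    if (end == NULL || end == start)
        return -1;

    size_t len = (size_t)(end - start);
    if (len >= outlen)
        return -1;
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}
```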

Version 2.6.5-beta-2
--------------------

[20030826.1800] jonz: added --enable-delivery-to-stdout option

added --enable-delivery-to-stdout option which causes all delivered messages
to be printed to stdout rather than piped to an LDA.  if you wish to have spams
printed to stdout as well, use the --enable-spam-delivery option in 
conjunction.

[20030825.0031] jonz: signature attachment mode

coded signature-attachments mode, rewriting messages to include a dspam
signature attachment with full data, instead of writing the server-side
attachment.  use --enable-signature-attachments to enable. 

[20030824.2345] jonz: application/dspam-signature media type

added application/dspam-signature media type recognition

Version 2.6.5-beta-1.1
----------------------

[20030823.2010] jonz: fixed bug for empty headers

fixed a bug where segments with empty headers would be dropped in reassembly 
(currently these only seem to appear in mailer-daemon messages)

Version 2.6.5-beta-1
--------------------

[20030823.1804] jonz: groups now share same signature file

groups now share same signature file enabling them to use a single group alias 
for forwarding spams.

[20030823.1339] jonz: added new configure flags

--enable-homedir-dotfiles
When enabled, instead of checking for $USERDIR/$USER[.nodspam|.dspam],
DSPAM will check for a .nodspam|.dspam file in the user's home directory.
 
--enable-opt-in
Causes DSPAM to filter mail only for users with a .dspam dotfile.  The default
is opt-out, which requires a .nodspam file to exist to bypass filtering.

when using --enable-homedir-dotfiles, dspam installs as setuid root.

[20030823.1100] jonz: fixed segfaulting on signature reversal

[only affected alpha-4-internal]
fixed a bug where dspam segfaulted while reversing a signature making it
impossible to train dspam using signatures with alpha-4-internal.

[20030823.1100] jonz: added support for message/rfc822

[only affected alpha-4-internal]
added support for parsing message/rfc822 components; signature was not being
found in forwarded messages using this media type.

[20030822.0929] jonz: added fp alerts to cgi

added customizable false positive alerts to cgi.  the alerts list is compared
to message headers, and all matching messages are highlighted in yellow.
alerts are stored as $USERDIR/$USER.alerts.

[20030822.0929] jonz: fixed decoding header bug

fixed a bug in the header decoding where the original encoding type was
reassembled into the message, instead of the decoded type.  fix only
affected alpha-4 (internal). 

[20030822.0929] jonz: moved signature append to process

moved appending of signature out of delivery_message and into the process
function, using the new message structures instead of parsing.  this also
fixes a problem on memory failure, since the delivery_message function no
longer needs to allocate memory.

[20030822.0016] jonz: adjusted lock timeout

adjusted lock timeout from 10 to 20 seconds.  depending on the load of your
machine, this could be set higher or lower.  the higher the setting, the less
chance of any failover deliveries being made, and the more chance of multiple
processes lined up waiting for a lock on a user's mailbox.

[20030822.0014] jonz: documentation tweaks

a few miscellaneous tweaks

[20030821.2145] jonz: added --enable-spam-delivery

added configure flag --enable-spam-delivery causing all spams to be delivered
instead of quarantined (for use with X-DSPAM header filtering)

[20030821.1935] jonz: rewrite of message post-processing

Message post-processing rewritten; including appending of signature, 
message re-write, etcetera.  

[20030821.1908] jonz: added header information

X-DSPAM-Result: Spam || Innocent
X-DSPAM-Probability: (Actual Probability)

[20030821.1820] jonz: removed CTX->copyback

CTX->copyback is now obsolete.  All base64 decoding is performed on 
CTX->message, which is available from the context, or via calling
_ds_assemble_message() function using the message structure as a parameter.

[20030821.1730] jonz: changes to DSPAM_CTX

+  struct _ds_message *message;          /* Message Components */

for compatibility with existing API, dspam_process still accepts a const char *,
however tools that already perform message actualization (such as the DSPAM
agent) can set CTX->message to the existing struct _ds_message * to avoid
reprocessing the message, and to carry over any encoding changes.

[20030821.1730] jonz: implemented new decode/actualization functions in sig

implemented use of new actualization and decoding functions [decode.c] in
dspam.c's signature scan code. 

[20030821.1729] jonz: finished block decoding functions

/* Public decode function */
char *                  _ds_decode_block(struct _ds_message_block *block);
/* Private decoding functions */
char *                  _ds_decode_base64(const char *body);
char *                  _ds_decode_quoted(const char *body);

[20030820.0015] jonz: finished preliminary message actualization

decode.c: finished preliminary actualization code (code responsible for
actualizing a message into its individual components).  experiments with
plain messages and non-embedded multipart messages succeeded.  next phase of
testing to include embedded multipart messages, including spams that are
designed to frequently break RFC.  once testing/patching is complete,
decoding routines to follow.

[20030819.0000] jonz: signature embedding changes

signatures are now embedded in every text segment of a message to
ensure they are forwarded properly

[20030818.1350] awn: fix for empty messages

(Submitted by Andrew W. Nosenko  <awn@bcs.zp.ua>)

* added check for empty data to prevent segfault

[20030817.1336] awn: configure script changes

(Submitted by Andrew W. Nosenko  <awn@bcs.zp.ua>)

* configure.ac: Work around versioning issues of some versions of
  db-4.  E.g. db_create() may not be a real function but simply a
  forwarding macro to db_create_4001().

* configure.ac: New configure option `--with-db4-libraries' (as
  pair for `--with-db4-includes')

[20030817.1230] jonz: added --disable-bias configure flag

when configure is run with --disable-bias, dspam no longer biases the
statistics in favor of innocent mail.  This may increase the filter's
effectiveness in catching spam, but could also potentially result in less
false positive protection.  some argue that eliminating bias is more
accurate, not less.

[20030815.0300] jonz: added dspam_genaliases script

a small script to create an aliases table from /etc/passwd

[20030814.1928] jonz: added large-scale directory support to tools

ported tools to support the large-scale directory layout (see below).

[20030814.0005] jonz: added large-scale directory support

when configure is run with --enable-large-scale, dspam stores all its user
files in large-scale mode.  for example, user root's files would be stored in
/etc/mail/dspam/r/ro/root.  directories are created automatically as needed. 
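A sketch of how the large-scale path in the example above can be derived with snprintf() precision specifiers. large_scale_path() is hypothetical; it only builds the string (creating the directories would take separate mkdir() calls), and its handling of one-character usernames is an assumption.

```c
#include <stdio.h>

/* Hypothetical sketch: <base>/<first char>/<first two chars>/<user>.
 * For a one-character name, %.2s prints just that character. */
int large_scale_path(const char *base, const char *user, char *out, size_t outlen)
{
    int n;

    if (user[0] == '\0')
        return -1;
    /* %.1s and %.2s print at most 1 and 2 characters of the username. */
    n = snprintf(out, outlen, "%s/%.1s/%.2s/%s", base, user, user, user);
    /* snprintf returns the length it needed; treat truncation as failure. */
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```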

Version 2.6.4.1
---------------
[20030816.2352] jonz: parse fix for boundaries with spaces
added fix for multipart emails with spaces in the boundary definition
(e.g. boundary= "blah").  Discovered in some of the newer 'Urgent Response'
type spams.

Version 2.6.4
-------------

[20030809.1115] jonz: corpus spams marked as misses

spams learned through dspam_corpus are now marked as misses instead of 
caught spam.

[20030808.1945] jonz: changes to header processing

Message-ID is now considered for useful information.  Received header is now
considered, but parsed in a different manner preserving IP addresses and
other useful information.

[20030808.1945] jonz: blank signatures will no longer get written

blank signatures are a result of a failover passthrough for a particular
user.  dspam has been changed to not write a signature if the signature
itself is blank, preventing <!DSPAM:> from appearing in an email.

[20030808.1945] jonz: added .nodspam file functionality

in an attempt to conserve disk space, a username.nodspam file may be
touched in the /etc/mail/dspam directory, which will cause all messages
for that user to be passed through dspam and not processed.  this will
prevent a dictionary or signature file from being built and save disk
space.  users wishing not to use dspam can still simply not use it,
but dropping a .nodspam file will prevent any files from being created. 

[20030805.1630] jonz: fixed multiple header destroy calls

fixed bug where the header nodetree was destroyed a second time in some errors
that cleaned up and returned, causing a segmentation fault.

[20030805.1400] jonz: added quoted-printable decoding

added quoted-printable decoding; decodes hex codes into actual characters.
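A minimal quoted-printable decoder following RFC 2045 section 6.7, shown as an illustration rather than DSPAM's _ds_decode_quoted(): "=XX" becomes the byte 0xXX, a trailing "=" is a soft line break, and malformed escapes are copied through unchanged. It also decodes an escape that immediately follows a soft line break.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Value of one hex digit; assumes isxdigit(c) is true. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower(c) - 'a' + 10;
}

/* Returns a malloc'd decoded copy of in (caller frees), or NULL. */
char *qp_decode(const char *in)
{
    size_t len = strlen(in);
    char *out = malloc(len + 1);        /* decoding never grows the text */
    char *o = out;

    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++) {
        if (in[i] != '=') {
            *o++ = in[i];
        } else if (in[i + 1] == '\n') {
            i += 1;                                 /* soft break "=\n" */
        } else if (in[i + 1] == '\r' && in[i + 2] == '\n') {
            i += 2;                                 /* soft break "=\r\n" */
        } else if (isxdigit((unsigned char)in[i + 1]) &&
                   isxdigit((unsigned char)in[i + 2])) {
            *o++ = (char)(hexval((unsigned char)in[i + 1]) * 16 +
                          hexval((unsigned char)in[i + 2]));
            i += 2;                                 /* "=XX" hex escape */
        } else {
            *o++ = in[i];                           /* malformed: keep '=' */
        }
    }
    *o = '\0';
    return out;
}
```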

[20030805.1230] jonz: documentation correction for dspam_corpus

dspam_corpus uses --addspam flag, not -a anymore

[20030805.1200] jonz: added verbose debugging option

added --enable-verbose-debug for verbose debugging information to be written
to /tmp/dspam.debug

[20030805.1200] jonz: new line unbreaking code

new line unbreaking code to unbreak only quoted-printable lines

Version 2.6.3
-------------

[20030801.0930] jonz: debug after context destruction

fixed a bug in dspam.c that reported debug information for a context
after it had been destroyed.

[20030801.0930] jonz: dspam_clean to create new databases

dspam_clean tool rewritten to create new databases when called in the same
fashion as dspam_purge.  this helps keep the databases in good health and
their file size smaller.
 
[20030801.0900] jonz: fix for PGP signatures

fixed formatting bug causing PGP signatures to be corrupted.  fix required
removing line unbreaking from message which could potentially cause dspam to
lose one or two signatures when messages are being forwarded from Microsoft
Outlook.  does not appear to be a significant issue.

[20030801.0900] jonz: fix for unchecked malloc calls

fixed two unchecked malloc calls
=> struct nt *nt_create(int nodetype)
=> struct nt_node *nt_add(struct nt *nt, void *data)

submitted by Thomas Lussing <lussnig@smcc.net>

[20030731.0852] jonz: added syslog logging 

added syslog logging using mail facility

[20030730.2323] jonz: documentation addition for username case

  added this to the README:

  NOTE: Some authentication mechanisms are case insensitive and will
   authenticate the user regardless of the case they type it in.  DSPAM,
   on the other hand, is case sensitive and the case of the username used
   will need to match the case on the system.  If you suffer from this
   authentication problem, and are certain all of your users' usernames are
   in lowercase, you can add the following line of code to the CGI right
   after the call to &ReadParse...

   $ENV{'REMOTE_USER'} = lc($ENV{'REMOTE_USER'});

[20030730.2311] jonz: fixed bug in dspam_stats

fixed formatting bug in dspam_stats causing problem with usernames > 16 
characters.  submitted by Stuart Gathman <stuart@bmsi.com>

Version 2.6.2.03
----------------

[20030729.2205] jonz: fixed more line parsing bugs

fixed some additional bugs in line parsing which may have caused some emails
to appear blank in Microsoft Outlook

Version 2.6.2.02
----------------

[20030729.0225] jonz: internal cleanup

removed unused variables and added prototypes for some functions lacking them

[20030729.0225] jonz: implemented strsep to fix processing snag

large messages resulted in significant processor consumption due to previous
method of splitting up messages line-by-line.  strsep now implemented to remove
this bottleneck.
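A sketch of line-by-line splitting with strsep(), which is a BSD/glibc function rather than ISO C. count_lines() is hypothetical and only counts lines; strsep() writes a NUL over each delimiter and advances the cursor, so the text is walked once without copying each line.

```c
#define _DEFAULT_SOURCE   /* exposes strsep() and strdup() on glibc */
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: returns the number of '\n'-separated lines in
 * message, or -1 if the working copy could not be allocated. */
int count_lines(const char *message)
{
    char *copy = strdup(message);   /* strsep() modifies its input */
    char *cursor = copy;
    char *line;
    int n = 0;

    if (copy == NULL)
        return -1;
    while ((line = strsep(&cursor, "\n")) != NULL) {
        /* line is now one NUL-terminated line of the message */
        n++;
    }
    free(copy);
    return n;
}
```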

Version 2.6.2.01
----------------

[20030710.1000] jonz: fixed bug in dspam_stats

dspam_stats now reports TS (total spams) as total spams minus spam misses.

[20030710.1000] jonz: fixed bug in false positives

fixed a bug where false positives reported without a signature would fail to
decrease the total number of spams.  this should never occur when using the
dspam agent; the fix only matters for third party software using the dspam
library.

[20030710.1000] jonz: added support for reusable contexts

added support for reusable contexts, enabling a context to be processed 
multiple times.

[20030704.1827] jonz: fixed condition in chomp

fixed a condition in chomp where it could potentially cause a segmentation
fault if called with a NULL pointer or a zero-length string.  this should never
occur anyway, given the calling code.
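A sketch of a guarded chomp(). The exact signature and behaviour of DSPAM's own chomp are assumptions here: this version strips one trailing '\n' and a '\r' directly before it, and returns early on NULL or an empty string instead of indexing position -1.

```c
#include <string.h>

/* Hypothetical sketch: remove one trailing "\n" or "\r\n" in place. */
void chomp(char *s)
{
    size_t len;

    if (s == NULL || (len = strlen(s)) == 0)
        return;                        /* the guard: nothing to inspect */
    if (s[len - 1] == '\n') {
        s[--len] = '\0';
        if (len > 0 && s[len - 1] == '\r')
            s[--len] = '\0';
    }
}
```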

Version 2.6.2
-------------

[20030701.0000] jonz: added DSF_CLASSIFY flag

added DSF_CLASSIFY flag to libdspam.  use of this flag causes libdspam _not_ to
record statistics for a specific operation, but only to evaluate and return
the operation's result.
 
[20030701.0000] jonz: fixed bit assignment bug

fixed a bit assignment bug that cleared all flags when headers were
ignored
submitted by Stuart D. Gathman [stuart@bsmred.dmsi.com]
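One common way a bit-assignment bug clears all flags is writing "flags = FLAG;" where "flags |= FLAG;" was meant. The sketch below shows the set/clear/test idioms with hypothetical EXAMPLE_FLAG_* values; they are not libdspam's real DSF_* constants.

```c
/* Hypothetical values for illustration only; NOT libdspam's DSF_* values. */
#define EXAMPLE_FLAG_CLASSIFY       0x01u
#define EXAMPLE_FLAG_CORPUS         0x02u
#define EXAMPLE_FLAG_IGNORE_HEADERS 0x04u

/* Set a flag while keeping the others. The buggy form would be
 * "flags = flag;", which clears every other bit. */
unsigned int flag_set(unsigned int flags, unsigned int flag)
{
    return flags | flag;
}

/* Clear a flag while keeping the others. */
unsigned int flag_clear(unsigned int flags, unsigned int flag)
{
    return flags & ~flag;
}

/* Non-zero if flag is set. */
int flag_test(unsigned int flags, unsigned int flag)
{
    return (flags & flag) != 0;
}
```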

[20030701.0000] jonz: fixed bugs related to corpus mail

fixed a bug causing corpus mail's headers to be ignored
submitted by Stuart D. Gathman [stuart@bsmred.dmsi.com]

Version 2.6.1.01
----------------

[20030627.1924] jonz: fixed memory free of copyback buffer

copyback buffer is now freed in dspam.c when context is destroyed

Version 2.6.1.00
----------------

[20030622.0000] jonz: added ` as delimiter

[20030620.0000] jonz: added support for group dictionaries

Group dictionaries enable a group of users with similar email behavior to
share the same dictionary while still maintaining a private quarantine box.
Please see README for more information.

[20030620.0000] jonz: added dspam_stats tool

The dspam_stats tool can be used to display the statistics for one or all
users on the system.  Please see README for more information.

Version 2.6.0.69
----------------

[20030618.0000] jonz: line unbreaking correction

correction made to line unbreaking to sanity check for consecutive
equal signs

Version 2.6.0.68
----------------

[20030612.0000] jonz: change to configure tool

changed configure tool to look for db_strerror instead of
db_env_create in the event that libdb was built without
environment functions

Version 2.6.0.67
----------------

[20030609.0021] jonz: bugfix in line unbreaking

fixed a bug in line unbreaking (where clients use an equal sign
followed by a carriage return to break up long lines) causing
some attachments to be unreadable by some mail clients.  lines
are now only unbroken in text segments.

[20030607.1020] jonz: bugfix in attachment boundaries

fixed a small bug that wrote the boundary twice at the end of
an attachment

Version 2.6.0.66
----------------

[20030603.1900] jonz: bugfix in line unbreaking

fixed a bug in line unbreaking (where clients use an equal sign 
followed by a carriage return to break up long lines) causing 
unquoted signatures ending with an equal sign to be malparsed,
causing the email to become slightly jumbled.

[20030603.1800] jonz: DSF_CORPUS flag

added DSF_CORPUS flag for processing messages that are from corpus; 
prevents innocent totals/hits from being subtracted when spam corpuses
are fed in. 

Version 2.6.0.65 
----------------

[20030601.0000] jonz: bugfix for locking

fixed a bug in the locking mechanism for tools that could occasionally
corrupt a dictionary

Version 2.6.0.64
----------------

[20030525.2300] jonz: bugfix for boundaries

fixed a bug causing boundaries ending in == to be parsed incorrectly
fixed a bug in parsing boundaries that used = without quotes

[20030523.2300] jonz: bugfix for attachments

fixed bug causing attachments to be dropped

[20030523.2300] jonz: optimizations for large databases

increased database cache to 4MB and implemented alternative btree
sorting routine to greatly speed up database functions

[20030523.2000] jonz: addition of libtool/shared libs

libtool is now implemented to build a shared libdspam library.

[20030523.1830] jonz: bugfixes

bugfix for multipart messages that caused messages to be truncated
bugfixes to signature management causing some segfaults
bugfixes to crc64 calls; some calls returned a different crc every time

[20030523.0100] jonz: partial rewrite

Rewrote dspam engine into libdspam, enabling developers to link in libdspam
to provide "drop-in" spam filtering for their projects.

Migrated to 64-bit tokens; previous 2.6-Beta databases using 32-bit tokens
will not work with this new version.

Server-side signatures are presently the only signature storage method;
looking into a different method of incorporating the signature in emails.

Implemented tracking of spam misses and false positives; these are reported
in the CGI.

[20030521.2315] jonz: url tokens ignored outside of urls

tokens found inside urls are ignored as individual tokens, and only 
represented as Url*token.

[20030520.0200] jonz: bugfix for base64 decoding

fixed a bug that failed to decode non-multipart base64 messages

[20030519.0000] jonz: ignore all html tags without spaces

ignore all html tags without spaces; frequently used to separate tokens

[20030519.0000] jonz: ignored collapsible html tags 

collapsed (rather than overwrote) html tags, so that tokens some spammers
separate with such tags are joined back together.
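a minimal sketch of collapsing tags that contain no spaces, assuming the message text is a NUL-terminated string. collapse_tags() is a hypothetical name. a tag is removed outright rather than replaced with a space, so the letters around it become one token; tags with spaces (such as <a href=...>) are left for the tokenizer. it also handles inline comments like <!1234>.

```c
#include <stddef.h>
#include <string.h>

/* Remove, in place, every HTML tag that contains no whitespace
 * (e.g. <b>, </font>, <!1234>).  The tag is collapsed, not overwritten
 * with a space, so "V<b>I</b>AGRA" becomes "VIAGRA". */
void collapse_tags(char *s)
{
    char *dst = s;

    while (*s) {
        if (*s == '<') {
            /* length of the tag body up to '>' or the first whitespace */
            size_t len = strcspn(s + 1, "> \t\r\n");
            if (s[1 + len] == '>') {
                s += len + 2;       /* skip '<', the body and '>' */
                continue;
            }
        }
        *dst++ = *s++;
    }
    *dst = '\0';
}
```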

[20030518.1500] jonz: addition of dspam_crc tool

dspam_crc tool converts a string into the numeric crc used for storage in
the dspam dictionary; makes it easier to use dspam_dump and grep for a 
particular token
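a sketch of turning a token string into a 64-bit crc, as a dspam_crc-style tool would. the changelog does not say which CRC-64 variant DSPAM uses; this assumes the reflected ECMA-182 polynomial (the variant used by xz), and crc64_token() is a hypothetical name. the result depends only on the input string, which is the property the 20030523 crc64 bugfix entry is about.

```c
#include <stddef.h>
#include <stdint.h>

/* Reflected ECMA-182 polynomial; DSPAM's actual choice is an assumption. */
#define CRC64_POLY 0xC96C5795D7870F42ULL

/* Bitwise CRC-64 of a NUL-terminated token. */
uint64_t crc64_token(const char *token)
{
    uint64_t crc = ~0ULL;                   /* fixed initial value */

    for (const unsigned char *p = (const unsigned char *)token; *p; p++) {
        crc ^= *p;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1) ? (crc >> 1) ^ CRC64_POLY : crc >> 1;
    }
    return ~crc;
}
```

printing the value with printf("%" PRIu64 "\n", crc64_token("FREE")) (from <inttypes.h>) gives a number that could then be grepped for in dump output.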

[20030517.1930] jonz: bugfix for as_spam signature

fixed a bug causing the signature not to be displayed
on messages marked as spams

[20030517.1300] jonz: bugfixes 

fixed bugs in signature storage (delete .sig files to fix)
fixed bugs in dspam_purge
fixed bugs causing segfault under some circumstances

[20030516.0052] jonz: exim documentation corrections by Jerome Alet

Exim configuration moved to directors, not routers

[20030516.0020] jonz: massive rewrite and optimizations

addition of tbt and lht dynamic data structures
rewrite of debugging functions
rewrite of database functions
conversion to crc32 long integers for token management
addition of dspam_convert to convert old databases
renamed dbdump to dspam_dump, removed dbset/dbdelete

these rewrites/optimizations convert all tokens to numeric (long)
values, making processing and sorting much faster.  tbt implements
a binary tree sorting mechanism, eliminating qsort.  storing tokens
in numeric format also removes the need for the zlib compression
library.
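a minimal sketch of the binary-tree idea behind tbt, assuming tokens are already numeric. the node layout and the tbt_* names are hypothetical, since the real structure is not described here. inserting counts duplicates, and an in-order walk yields the tokens already sorted, so no separate qsort pass is needed. the tree is unbalanced; crc values tend to be spread evenly, which keeps it reasonably shallow in practice.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical node; not DSPAM's actual tbt structure. */
struct tbt_node {
    uint64_t token;
    long count;
    struct tbt_node *left, *right;
};

/* Insert a token (counting duplicates) and return the root. */
struct tbt_node *tbt_insert(struct tbt_node *root, uint64_t token)
{
    if (root == NULL) {
        struct tbt_node *n = calloc(1, sizeof *n);
        if (n) { n->token = token; n->count = 1; }
        return n;
    }
    if (token < root->token)
        root->left = tbt_insert(root->left, token);
    else if (token > root->token)
        root->right = tbt_insert(root->right, token);
    else
        root->count++;
    return root;
}

/* In-order walk: writes tokens to out in ascending order, returns next index. */
size_t tbt_walk(const struct tbt_node *n, uint64_t *out, size_t i)
{
    if (n == NULL)
        return i;
    i = tbt_walk(n->left, out, i);
    out[i++] = n->token;
    return tbt_walk(n->right, out, i);
}

void tbt_free(struct tbt_node *n)
{
    if (n) { tbt_free(n->left); tbt_free(n->right); free(n); }
}
```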

[20030514.1500] jonz: bugfix in content identification

small bugfix in content identification that led some emails to miss a
dspam signature

[20030514.1500] jonz: error message output added to debug

error messages previously only went to stderr.  when the --enable-debug
option is used, errors are now also written to the debug output.

Version 2.5.4 - May 14 2003
---------------------------

[20030514.0240] jonz: added autoconf support contributed by Andrew W. Nosenko

thanks to Andrew W. Nosenko for contributing the files/patches to provide
autoconf support to dspam.  please read the README file for instructions.

[20030514.0200] jonz: changed hash to support ints

hash.c modified to support ints or character pointers.  makes tracking
token frequency much faster.

[20030513.2345] jonz: bug in dspam_clean corrected

corrected a bug in dspam_clean causing it to fail

[20030513.2300] jonz: experimental tokenized rules

playing with a few experimental tokenized rules

[20030513.2300] jonz: freebsd makefile setuid root

modified the freebsd makefile to install as setuid root.  this is due to 
freebsd's mail.local requiring the ability to change its uid.  dspam will
not work correctly on the commandline (for example when reporting false 
positives)

[20030513.0325] jonz: changed probabilities for single-corpus tokens

probabilities of 0.0100 and 0.0101 were previously assigned to tokens
appearing only in the innocent corpus.  this has been changed to
0.0099 and 0.0100 to balance out the 0.9900 and 0.9901 used for tokens
that appear only in the spam corpus.  this very small change corrected
3 false positives that appeared.

[20030513.0250] jonz: added documentation for exim

documentation thanks to David Shirley 

[20030512.1930] jonz: applied changes submitted by Andrew W. Nosenko

(DELIMITERS): Plain `^M' character is replaced by appropriate
	escape sequence `\r' for avoiding gcc-3.2.2 warning "multi-line
	string literals are deprecated"

(MAX_FILENAME_LENGTH, MAX_USERNAME_LENGTH): Use system-defined
	limits when available (for example max. filename length under
	Linux is not 128 as hardcoded, but 4096).

(USERDIR): Define USERDIR only if not defined somewhere else
	(e.g. from command line).  Very convenient for building binary
	package.

Version 2.5.3 - May 12, 2003
----------------------------

[20030512.1430] jonz: bugfix for ignored headers

a bug was fixed that caused all headers to be ignored if a message was stored
as a raw message in the signature database.

[20030512.1400] jonz: embedded boundary recognition

added embedded boundary recognition to recognize emails with embedded boundaries,
such as those sent by Eudora when special formatting is enabled.
 
[20030512.1200] jonz: documentation

added better documentation for the correct permissions of the dspam 
directories and the correct group memberships for the MTA user. 

[20030512.1200] jonz: locking bugfix

fixed bug in locking that caused a loop if a lockfile could not be created 
(due to file permissions).  also increased lock debugging verbosity.

[20030511.2025] jonz: false positives adjustment

false positives reported now hit a token 3 times innocent instead of 2,
for faster re-learning.

[20030511.2010] jonz: header parsing bug

fixed a header parsing bug that did not carry the original header name
across multiple lines, for example the Received header.
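a sketch of carrying a header name across folded lines, assuming the header block has already been split into lines. header_names() and the fixed 64-byte name buffers are hypothetical. a line that starts with a space or tab continues the previous header (RFC 2822 folding), so it inherits that header's name instead of being treated as nameless.

```c
#include <string.h>

/* For each header line, store the name of the header it belongs to.
 * Continuation lines (leading space or tab) reuse the previous name. */
void header_names(const char *lines[], size_t n, char names[][64])
{
    char current[64] = "";

    for (size_t i = 0; i < n; i++) {
        const char *l = lines[i];
        if (l[0] != ' ' && l[0] != '\t') {
            size_t len = strcspn(l, ":");   /* name ends at the colon */
            if (len >= sizeof current)
                len = sizeof current - 1;
            memcpy(current, l, len);
            current[len] = '\0';
        }
        strcpy(names[i], current);
    }
}
```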

[20030511.1945] jonz: dspam_purge complete

dspam_purge completed and expanded to delete old non-qualifying tokens
and defragment/shrink user dictionaries

[20030511.1945] jonz: rewrite of dspam tools

dspam tools rewritten to support new spam_record structure. 

[20030511.1945] jonz: implementation of struct spam_record

new spam_record structure implemented for database storage; it includes the
last hit date for the new purge tool.  the subroutines are backward compatible
and work with old databases.

[20030511.1827] jonz: bugfix for lock sleep

fixed a bug that caused all dspam processes to sleep for 1 second, even
if a lock was successfully acquired on the first try.

[20030511.1719] jonz: addition of probability information to spams

messages marked as spam now include the tokens and probabilities used to
classify the message

[20030511.1600] jonz: body tag filtering

now ignoring body tags.  the only frequently used tags that are being 
considered are font, img, and meta

Version 2.5.2 - May 11, 2003
----------------------------

[20030510.1615] jonz: token word joins with punctuation

token word joins modified to include dollar signs and exclamation points. for
example:

$S A V E$

previously resulted in 3 tokens ($S, AV, E$) but now results in one: $SAVE$

[20030510.1500] jonz: bugfix for multipart boundary

fixed a bug that prevented multipart boundaries from being detected when
defined without using quotes.  this resulted in the dspam signature
(or identifier) never making it into the message.  for example:

Content-Type: multipart/alternative; 
  boundary='~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'

is now detected correctly
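a sketch of extracting the boundary parameter from a Content-Type value, covering the cases in the entries above: quoted values, single-quoted values as in the example, and unquoted values that may end in "==". get_boundary() is a hypothetical name, and accepting single quotes is an assumption based on the example (RFC 2046 itself uses double quotes). an unquoted value ends at ';', whitespace or end of string.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive prefix test; stops safely at the end of s. */
static int starts_with_ci(const char *s, const char *prefix)
{
    for (; *prefix; s++, prefix++)
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
    return 1;
}

/* Copy the boundary into out.  Returns 0 on success, -1 if the parameter
 * is missing, unterminated, empty or too long for out. */
int get_boundary(const char *ctype, char *out, size_t outsz)
{
    const char *p = ctype;
    size_t len;

    for (; *p; p++)
        if (starts_with_ci(p, "boundary="))
            break;
    if (*p == '\0')
        return -1;
    p += 9;                                 /* strlen("boundary=") */

    if (*p == '"' || *p == '\'') {
        char quote = *p++;
        const char *end = strchr(p, quote);
        if (end == NULL)
            return -1;
        len = (size_t)(end - p);
    } else {
        len = strcspn(p, "; \t\r\n");       /* keeps trailing '=' chars */
    }
    if (len == 0 || len >= outsz)
        return -1;
    memcpy(out, p, len);
    out[len] = '\0';
    return 0;
}
```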

[20030510.0035] jonz: additional filtering

added additional filtering to ignore words with control characters,
numbers that are neither prefixed with $ nor suffixed with %, and any tokens
that do not begin with an alphanumeric character, with the exception of $ and #.
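a sketch of these rules as a single predicate, under one interpretation: a "number" is taken to be a token made only of digits, '.' and ','; anything carrying a leading $ or trailing % is therefore not a bare number and is kept. keep_token() is a hypothetical name.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Return true if a token should be kept for analysis. */
bool keep_token(const char *tok)
{
    size_t len = strlen(tok);
    bool numeric = true;        /* only digits, '.' and ',' so far */

    if (len == 0)
        return false;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)tok[i];
        if (iscntrl(c))
            return false;                       /* control character */
        if (!isdigit(c) && c != '.' && c != ',')
            numeric = false;
    }
    if (!isalnum((unsigned char)tok[0]) && tok[0] != '$' && tok[0] != '#')
        return false;                           /* bad first character */
    return !numeric;    /* bare numbers like "12345" are dropped;
                           "$100" and "50%" are not bare numbers */
}
```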

[20030510.0020] jonz: bug fix for lock failures

a bug has been fixed that caused dspam to loop, sending multiple emails
in the event of a lock failure

[20030509.2100] jonz: Makefile for FreeBSD

added makefile for freebsd

[20030509.2015] jonz: procmail fix

added small fix to accommodate some procmail implementations
that require an empty argument after -a

[20030509.0130] jonz: addition of dspam_purge

please see README for more details

[20030509.0130] jonz: tools to output to stderr

dspam tools now output to stderr

[20030509.0130] jonz: removed probability from db storage

removed the 13-character probability from the hash databases; it was
taking up considerable space and wasn't necessary for the calculation.
the change is backwards compatible, so there is no need to delete any db's.

[20030509.0040] jonz: ! is now treated as a delimiter

the ! character has been added to the delimiter list

[20030508.2330] jonz: added .lock locking mechanism 

added a .lock locking mechanism to prevent database corruption and/or
quarantine mailbox corruption.
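a sketch of .lock-style locking with POSIX open(O_CREAT | O_EXCL), which also reflects the two later lock fixes listed above (20030511.1827 and 20030512.1200): it sleeps only between failed attempts, and gives up at once when the lockfile cannot be created for a reason other than "already exists", instead of looping. acquire_lock()/release_lock() and the retry parameters are hypothetical, and stale locks left by crashed processes are not handled.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Returns 0 when the lock is held, -1 otherwise. */
int acquire_lock(const char *lockfile, int attempts, long wait_ms)
{
    struct timespec pause = { wait_ms / 1000, (wait_ms % 1000) * 1000000L };

    for (int i = 0; i < attempts; i++) {
        int fd = open(lockfile, O_CREAT | O_EXCL | O_WRONLY, 0600);
        if (fd >= 0) {
            close(fd);
            return 0;               /* got it: no sleep at all */
        }
        if (errno != EEXIST)
            return -1;              /* e.g. permission denied: do not spin */
        if (i + 1 < attempts)
            nanosleep(&pause, NULL);  /* wait only after a failed try */
    }
    return -1;
}

int release_lock(const char *lockfile)
{
    return unlink(lockfile);
}
```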

[20030508.1915] jonz: filtering of boundaries

multipart boundaries are now filtered

[20030508.1800] jonz: token word joins

if a token is only one character long, and is adjacent to other similar
tokens, each token will be joined to create a single token.  for example

V I A G R A

will be tokenized as "VIAGRA"
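a minimal sketch of the join rule, assuming tokens are separated only by spaces; join_single_chars() is hypothetical, and the later $ and ! handling from 20030510 is not modeled. a run of two or more one-character tokens becomes a single token, while a lone one-character token is left alone.

```c
#include <stddef.h>
#include <string.h>

/* "buy V I A G R A now" -> "buy VIAGRA now".
 * out must hold at least strlen(in) + 1 bytes. */
void join_single_chars(const char *in, char *out)
{
    size_t o = 0;
    int in_run = 0;         /* was the previous token one character long? */

    while (*in) {
        while (*in == ' ')
            in++;
        if (*in == '\0')
            break;
        size_t len = strcspn(in, " ");
        int single = (len == 1);
        /* separate tokens with a space, except inside a run of singles */
        if (o > 0 && !(single && in_run))
            out[o++] = ' ';
        memcpy(out + o, in, len);
        o += len;
        in_run = single;
        in += len;
    }
    out[o] = '\0';
}
```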

[20030508.1800] jonz: header array abolished

the array holding each header line has been replaced with a nodetree
(dynamic data storage)

[20030508.0800] jonz: bugfix for dspam_clean

dspam_clean segfaulted after processing the first user signature file.  this
was due to an invalid database handle being closed.  the correct handle is
now used

Version 2.5.1; May 8 2003:
--------------------------

[20030508.0045] jonz: bugfix for inline comments

inline comments, commonly used to break up telltale spam words such as
S<!1234>E<!1234>X<!1234>

were only partially filtered, leaving gaps between the letters and causing 
DSPAM to miss the whole word.  this has been corrected to eliminate the space
the comments previously used, bringing the words together for calculation.

[20030508.0025] jonz: strdup() overusage

if only one destination user is specified, strdup() is not used to duplicate 
the original header/body pairs to pass to process_user()

[20030507.1130] jonz: bugfix for multiple users

when multiple users are specified in the local mailer parameters, the first
user process, due to a bug in setting ADD_AS_SPAM, determined whether the
message was spam for all other users.  ADD_AS_SPAM is now reset to its original
value prior to each user's calculation.

[20030507.2200] jonz: increased html filtering

<div and <p html tags are now ignored

Version 2.5; May 7 2003:
------------------------

[20030507.0500] jonz: increased html filtering

td, tr, and table tags are now ignored

[20030507.0500] jonz: increased bare corpus safeguards

the following safeguards have been implemented to prevent false positives
in immature corpuses:

- the minimum number of hits for a token to register at anything above .40
  has been raised from 5 to 20 if the user has fewer than 500 innocent
  messages
- if the user has fewer than 1000 messages, the minimum number of hits
  is equal to 5 + (the spam ratio / 2)

[20030507.0500] jonz: commandline multiple user support 

multiple users on the same commandline (e.g. -d user1 user2 user3) are now 
processed individually.  prior to this, only the first user was processed 
(even though the message was delivered to all users).  this results in each
user having their own unique record of the message in their dictionary and 
signature.

[20030507.0500] jonz: libdb1 -> libdb4 migration

libdb 4 has been implemented after running into some problems with db1 
segmentation faults on large record insertions. as a result, to upgrade to 
this and all newer versions, it will be necessary to delete all existing user 
databases on the system. libdb4 can be found at www.sleepycat.com. it should 
be relatively easy to re-code the db functions for db2 or db3, if the 
administrator doesn't want to use db4. 

[20030506.0400] jonz: buffer.c memcpy implementation

modified buffer.c to use memcpy() instead of strcat(), resulting in a 
_significant_ speed increase. the delay caused by strcat() on messages 
with large attachments resulted in message parse times of over 20 seconds. 
using memcpy(), parse time is down to a fraction of a second. 
this fix addresses issues with dspam on low-end machines.
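why this matters: strcat() must walk to the terminating NUL on every call, so building an n-byte message from many small pieces costs roughly O(n^2). the sketch below (a hypothetical struct, not the actual buffer.c) remembers the current length and appends with memcpy(), so each append costs time proportional to the piece being added, with capacity doubled when needed.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer that tracks its own length. */
struct buffer {
    char  *data;
    size_t used;     /* bytes in use, excluding the terminating NUL */
    size_t size;     /* bytes allocated */
};

/* Append n bytes of s.  Returns 0 on success, -1 if out of memory. */
int buffer_append(struct buffer *b, const char *s, size_t n)
{
    if (b->used + n + 1 > b->size) {
        size_t newsize = b->size ? b->size : 64;
        while (b->used + n + 1 > newsize)
            newsize *= 2;                   /* amortized O(1) growth */
        char *p = realloc(b->data, newsize);
        if (p == NULL)
            return -1;
        b->data = p;
        b->size = newsize;
    }
    memcpy(b->data + b->used, s, n);        /* no rescan of existing data */
    b->used += n;
    b->data[b->used] = '\0';
    return 0;
}

void buffer_free(struct buffer *b)
{
    free(b->data);
    b->data = NULL;
    b->used = b->size = 0;
}
```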

[20030506.0400] jonz: server-side storage options

if a token string is longer than the original message, the original message
is stored on the server instead and re-parsed.

[20030506.0400] jonz: zlib compression library

zlib (-lz) is now used to compress server-side signatures. zlib can be found 
at http://www.gzip.org/zlib/.  if you will not be using server-side
signatures, remove the -lz library flag from the makefile.

[20030504.0400] jonz: server-side signatures 

server-side token signatures (SSTS) have been implemented with an optional 
compile flag (set by default). using SSTS will eliminate long, annoying 
DSPAM signatures at the expense of server disk space. the signature appended 
to each email is replaced with a single comment to include a reference token. 
this also enables the complete set of tokens from a message to be recorded 
(although only the top 15 are used in actual calculation).   

compiling without SSTS mode enabled will only record 15 or 60 tokens from a
message, depending on whether more than 5 tokens are recognized.  SSTS mode
will record all tokens.  in either mode, only the most interesting 15 tokens
are used in the calculation.
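a sketch of picking the "most interesting" 15 tokens. the changelog does not define interesting; this assumes the common Graham-style measure, distance of the token's probability from a neutral 0.5. the struct and function names are hypothetical, and qsort is used only for brevity.

```c
#include <stdlib.h>

/* Hypothetical token/probability pair. */
struct scored_token { unsigned long long crc; double prob; };

/* Distance from a neutral 0.5. */
static double interest(double p)
{
    double d = p - 0.5;
    return d < 0 ? -d : d;
}

/* qsort comparator: most interesting first. */
static int by_interest(const void *a, const void *b)
{
    double da = interest(((const struct scored_token *)a)->prob);
    double db = interest(((const struct scored_token *)b)->prob);
    return (da < db) - (da > db);
}

/* Sort in place and return how many tokens to use (at most 15). */
size_t most_interesting(struct scored_token *toks, size_t n)
{
    qsort(toks, n, sizeof *toks, by_interest);
    return n < 15 ? n : 15;
}
```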

[20030504.0400] jonz: chained tokens

chained tokens have been implemented providing several new analysis features. 
for example the text 'FREE FOR ALL' will parse into five tokens: 

FREE
FOR
ALL
FREE FOR
FOR ALL

this parsing is not specific to just words, but any type of valid token. 
please read the white paper at: 

http://www.networkdweebs.com/products/dspam/Chained_Tokens.pdf 

...for more information.
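a minimal sketch of chained tokens, splitting on spaces only and producing each word followed by each pair of adjacent words, in the order shown for 'FREE FOR ALL'. chain_tokens() and the size limits are hypothetical; the white paper above is the authoritative description.

```c
#include <stdio.h>
#include <string.h>

#define MAX_WORDS     64
#define MAX_TOKEN_LEN 128

/* Returns the number of tokens written to out (at most max_out). */
size_t chain_tokens(const char *text, char out[][MAX_TOKEN_LEN], size_t max_out)
{
    char words[MAX_WORDS][MAX_TOKEN_LEN];
    size_t nw = 0, n = 0;

    /* Collect the single tokens. */
    while (nw < MAX_WORDS) {
        while (*text == ' ')
            text++;
        if (*text == '\0')
            break;
        size_t full = strcspn(text, " ");
        size_t len = full < MAX_TOKEN_LEN ? full : MAX_TOKEN_LEN - 1;
        memcpy(words[nw], text, len);
        words[nw][len] = '\0';
        nw++;
        text += full;
    }

    /* Every single token, then every chained pair of neighbours. */
    for (size_t i = 0; i < nw && n < max_out; i++)
        snprintf(out[n++], MAX_TOKEN_LEN, "%s", words[i]);
    for (size_t i = 0; i + 1 < nw && n < max_out; i++)
        snprintf(out[n++], MAX_TOKEN_LEN, "%s %s", words[i], words[i + 1]);
    return n;
}
```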

[20030504.0400] jonz: token precedence

words not appearing in the opposite corpus were previously assigned a 
probability of .99 or .01. now, priority is given to a token that appears 
more than ten times in a single corpus.  

[20030504.0400] jonz: token case

previously, tokens were case insensitive unless they were in all caps. now,
all tokens are case sensitive. 

[20030504.0400] jonz: short html tags

short HTML tags (less than 15 characters) are filtered out. this helps 
prevent false positives that could be caused by a lack of HTML-based email 
in an innocent corpus. it is normally not desirable behavior to assign a 
higher probability of spam to a message simply because it's in HTML, but we 
don't want to filter out all HTML so longer tags will still be tokenized. 

[20030503.0400] jonz: special tokens for urls

URLs are broken down into URL-specific tokens. for example, 
http://www.networkdweebs.com/products/dspam/ will be broken down into: 

Url*www
Url*networkdweebs
Url*com
Url*products
Url*dspam

this should help separate emails with suspicious URLs from emails with the 
same tokens outside of a URL.  
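a sketch reproducing the example above: the scheme is dropped and the rest is split into "Url*" tokens. splitting on '.' and '/' comes from the example; also splitting on '?', '&' and '=' is an assumption, and url_tokens() is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

#define URL_TOKEN_LEN 128

/* Returns the number of Url* tokens written to out. */
size_t url_tokens(const char *url, char out[][URL_TOKEN_LEN], size_t max_out)
{
    const char *p = strstr(url, "://");
    size_t n = 0;

    p = p ? p + 3 : url;                    /* skip "http://" if present */
    while (*p && n < max_out) {
        size_t len = strcspn(p, "./?&=");   /* '?', '&', '=' are assumptions */
        if (len > 0)
            snprintf(out[n++], URL_TOKEN_LEN, "Url*%.*s", (int)len, p);
        p += len;
        if (*p)
            p++;                            /* skip the separator */
    }
    return n;
}
```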

[20030503.0400] jonz: misreported number of messages in quarantine

due to a small bug, the number of messages in a quarantine box could be 
misreported. this has been fixed. 

[20030503.0400] jonz: dspam signature change

the DSPAM signature of previous versions is unfortunately rewritten 
incorrectly by some email clients such as Microsoft Outlook. The signature 
has been modified, and the signature retrieval tool has been coded with more 
of a wildcard approach, to help avoid missing reversal information. 
this only applies to administrators running DSPAM outside of its default 
SSTS mode. 

[20030503.0400] jonz: closing html tags
 
some spams fail to close their /html tag in an attempt to evade some spam 
tools. DSPAM now closes the tag to avoid the dspam signature being ignored.

[20030503.0400] jonz: ignoring of useless header information

the 'Message-ID', 'Received' and 'Date' headers are now ignored; they 
seemed to be filling up more than half the tokens with useless information 

[20030503.0400] jonz: high ascii characters

tokens with high ASCII characters are now ignored 

[20030503.0400] jonz: forwarded message headers

dspam now ignores message headers for messages forwarded by the user as spam
with no identifiable signature.  this prevents irrelevant information from
being recorded, which could otherwise cause replies to be marked as false
positives.
 
[20030503.0400] jonz: minor code cleanup for linux build

made some minor changes to code to build without warnings on linux

[20030503.0400] jonz: required use of long --addspam flag

the shortened flag for --addspam (-a) has been removed for compatibility 
with procmail (procmail uses -a). in order to use this latest build, 
all spam-box aliases (e.g. spam-bob) must be changed to --addspam. 

[20030503.0400] jonz: flag for chained tokens

added -DCHAINED_TOKENS (enabled by default) switch; those who don't have 
the extra disk space for chained tokens can now turn them off by removing
this compile flag.

[20030503.0400] jonz: debug rework

-DDEBUG now results in debug going to /tmp/dspam.debug 

Version 2.4.1; April 29 2003
----------------------------

[20030429.0000] jonz: dspam_signature tool addition

Added dspam_signature tool for decoding dspam signatures via commandline 

Version 2.4; April 27 2003
--------------------------

[20030427.0000] jonz: signature change

changed the signature to a base64-encoded, BEGIN/END delimited signature. 
people seem to feel more comfortable with it, as it resembles the signatures 
used with PGP, Server Certs, and other encrypted signatures...it's also 
less messy. 

[20030427.0000] jonz: false positive recall mechanism

in the unlikely event of a false positive, a mechanism is now available to 
reverse the information from the false positive and email the message to the 
user. this is made possible via a button while viewing a message in the 
user's quarantine box. 

[20030427.0000] jonz: base64 decoding

new code to Base64 Decode any encoded text segments. some SPAMs being sent 
out today are encoded in an attempt to bypass any filtering.  they are
now decoded prior to analysis and delivery.  this only applies to text 
segments (text/plain, text/html, etc.) and should not affect attachments. 
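a minimal base64 decoder of the kind needed for this, assuming the encoded segment body is a NUL-terminated string. b64_decode() is a hypothetical name. line breaks and other characters outside the base64 alphabet are skipped, and decoding stops at the first '=' pad.

```c
#include <stddef.h>

/* Value of a base64 digit, or -1 for anything else. */
static int b64_value(unsigned char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decode into out (needs 3/4 of strlen(in) bytes plus one for the NUL).
 * Returns the number of decoded bytes. */
size_t b64_decode(const char *in, unsigned char *out)
{
    unsigned long acc = 0;      /* pending bits */
    int bits = 0;               /* how many bits are pending */
    size_t n = 0;

    for (; *in && *in != '='; in++) {
        int v = b64_value((unsigned char)*in);
        if (v < 0)
            continue;                       /* skip \r, \n, etc. */
        acc = (acc << 6) | (unsigned long)v;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out[n++] = (unsigned char)(acc >> bits);
            acc &= (1UL << bits) - 1;       /* keep only unused bits */
        }
    }
    out[n] = '\0';
    return n;
}
```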

Version 2.35; April 24 2003
---------------------------

[20030424.0000] jonz: makefile correction

Makefile.linux: -ldb -> -ldb1

[20030424.0000] jonz: prefixed from line

prefixed messages headed to quarantine with a 'From' header to make them
mailbox format compliant.

[20030424.0000] jonz: quarantine box showing no spams

fixed a bug that caused caught spams not to show up in the quarantine box

Version 2.3; April 20 2003
--------------------------

[20030420.0000] jonz: token insertion bug

fixed a bug that occurs when inserting token information on some
multipart emails, which inserts it into the text/plain segment instead of
the text/html segment

Version 2.2; April 17 2003
--------------------------

[20030417.0000] jonz: reversal information

reversal information is now used in spams to reverse the original 15 tokens
(unlearn and relearn as spam).

Version 2.1; April 14 2003
--------------------------

[20030414.0000] jonz: production changes

applied 0.40 value to words with less than 5 hits
changed spam threshold from .8 to .9
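a sketch of how these two values could plug into a classifier. the changelog gives only the 0.40 value and the .9 threshold; the combining formula, P = prod(p) / (prod(p) + prod(1 - p)), is the Graham-style rule and is an assumption here, as are the struct, the function names and the use of a strict ">" comparison.

```c
#include <stdbool.h>
#include <stddef.h>

#define MIN_HITS       5      /* fewer hits than this -> 0.40 */
#define UNKNOWN_PROB   0.40
#define SPAM_THRESHOLD 0.90   /* raised from 0.80 */

/* Hypothetical per-token data. */
struct token_prob { double prob; long hits; };

/* Graham-style combination of the token probabilities. */
double combine(const struct token_prob *t, size_t n)
{
    double prod = 1.0, inv = 1.0;

    for (size_t i = 0; i < n; i++) {
        double p = t[i].hits < MIN_HITS ? UNKNOWN_PROB : t[i].prob;
        prod *= p;
        inv  *= 1.0 - p;
    }
    return prod / (prod + inv);
}

bool is_spam(const struct token_prob *t, size_t n)
{
    return combine(t, n) > SPAM_THRESHOLD;
}
```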

[20030414.0000] jonz: attachments

repaired minor bug in filtering out attachments and html comments

Version 2.0; April 11 2003
--------------------------

Initial release
