INN Python Filtering and Authentication Support

    This is $Revision: 1.3 $, dated $Date: 2002/02/22 20:39:04 $.

    This file documents INN's built-in optional support for Python article
    filtering.  It is patterned after the TCL and Perl hooks previously
    added by Bob Heiney and Christophe Wolfhugel.

    For this filter to work successfully, you will need to have Python 1.5.2
    (the latest at this writing) installed.  You can obtain it from
    <http://www.python.org>.

    The innd Python interface and the original Python filtering
    documentation were written by Greg Andruk (nee Fluffy)
    <gerglery@usa.net>.  The Python authentication and authorization support
    for nnrpd and its original documentation were written by Ilya
    Etingof <ilya@glas.net>, 12/1999.

NOTE TO RED HAT LINUX USERS

    Python will be preinstalled, but it may not include all the headers and
    libraries required for embedding into INN.  You will need to add the
    development package.  Better yet, get the source kit from the above URL
    and build it yourself.  When installing Python on Red Hat, be sure to
    run configure with "--prefix=/usr" so that there are no version
    conflicts with the "factory" installation.  You can also find a
    selection of well-made RPMs at
    <ftp://starship.python.net/pub/crew/andrich/>.

INSTALLATION

    Once you have built and installed Python, you can cause INN to use it by
    adding the "--with-python" switch to your configure command.

    See the ctlinnd(8) manual page to learn how to enable, disable and
    reload Python filters on a running server ("ctlinnd mode", "ctlinnd
    python y|n", "ctlinnd reload filter.python").

    Also, see the example filter_innd.py script in your filters directory
    for a demonstration of how to get all this working.

WRITING AN INND FILTER

    You need to create a filter_innd.py module in INN's filter directory
    (see the pathfilter setting in inn.conf).  A heavily-commented sample is
    provided that you can use as a template for your own filter.  There is
    also an INN.py module there which is not actually used by INN; it is
    there so you can test your module interactively.

    First, define a class containing the methods you want to provide to
    innd.  Methods innd will use if present are:

    __init__(self)
        Not explicitly called by innd, but will run whenever the filter
        module is (re)loaded.  This is a good place to initialize constants
        or pick up where filter_before_reload or filter_close left off.

    filter_before_reload(self)
        This will execute any time a "ctlinnd reload all" or "ctlinnd reload
        filter.python" command is issued.  You can use it to save statistics
        or reports for use after reloading.

    filter_close(self)
        This will run when a "ctlinnd shutdown" command is received.

    filter_art(self, art)
        art is a dictionary containing an article's headers and body.  This
        method is called every time innd receives an article.  The following
        keys can be defined:

            Approved, Control, Date, Distribution, Expires, From, Lines,
            Message-ID, Newsgroups, Path, Reply-To, Sender, Subject,
            Supersedes, Bytes, Also-Control, References, Xref, Keywords,
            X-Trace, NNTP-Posting-Host, Followup-To, Organization,
            Content-Type, Content-Base, Content-Disposition, X-Newsreader,
            X-Mailer, X-Newsposter, X-Cancelled-By, X-Canceled-By, Cancel-Key,
            __LINES__, __BODY__

        All the above values will be buffer objects holding the contents of
        the same-named article headers, except for the special __BODY__ and
        __LINES__ items.  Items not present in the article will contain
        None.

        __BODY__ is a buffer object containing the article's entire body,
        and __LINES__ is an int holding innd's reckoning of the number of
        lines in the article.

        If you want to accept an article, return None or an empty string. 
        To reject, return a non-empty string.  The rejection strings will be
        shown to local clients and your peers, so keep that in mind when
        phrasing your rejection responses.

    filter_messageid(self, msgid)
        msgid is a buffer object containing the ID of an article being
        offered by IHAVE or CHECK.  As with filter_art(), the message will
        be refused if you return a non-empty string.  If you use this
        feature, keep it light because it is called at a rather busy place
        in innd's main loop.  Also, do not rely on this function alone to
        reject by ID; you should repeat the tests in filter_art() to catch
        articles sent with TAKETHIS but no CHECK.

    filter_mode(self, oldmode, newmode, reason)
        When the operator issues a ctlinnd pause, throttle or go command,
        this function can be used to do something sensible in accordance
        with the state change.  Stamp a log file, save your state on
        throttle, etc.  oldmode and newmode will be strings containing one
        of the values in ('running', 'throttled', 'paused', 'unknown') --
        oldmode is the state innd was in before ctlinnd was run, newmode is
        the state innd will be in after the command finishes.  reason is the
        comment string provided on the ctlinnd command line.
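
    The methods described above can be sketched as a small filter class.
    This is only an illustration under stated assumptions: the rejection
    rules, the 5000-line threshold and the spam.example domain are
    hypothetical and not part of INN.  Under innd the header values arrive
    as buffer objects, so the sketch converts them with str() before
    matching.

```python
import re


class Filter:
    """Illustrative innd filter; every rule below is a made-up example."""

    def __init__(self):
        # Hypothetical patterns, compiled once when the module is loaded.
        self.bad_subject = re.compile(r"make money fast", re.IGNORECASE)
        self.bad_msgid = re.compile(r"@spam\.example>$")
        self.mode_changes = []

    def filter_art(self, art):
        subject = art.get("Subject")
        # Headers missing from the article are None, so test before use.
        if subject is not None and self.bad_subject.search(str(subject)):
            return "Rejected: unwanted subject"
        # __LINES__ is an int; 5000 is an arbitrary example threshold.
        if (art.get("__LINES__") or 0) > 5000:
            return "Rejected: article too long"
        return None  # accept

    def filter_messageid(self, msgid):
        # Keep this cheap; it runs in a busy part of innd's main loop.
        if self.bad_msgid.search(str(msgid)):
            return "Refused: unwanted Message-ID"
        return None

    def filter_mode(self, oldmode, newmode, reason):
        # Remember each state change; a real filter might log it instead.
        self.mode_changes.append((oldmode, newmode, reason))
```

    Because the class has no dependency on innd, it can be exercised from
    an interactive interpreter by passing plain dictionaries and strings
    to its methods.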

    To register your methods with innd, you need to create an instance of
    your class, import the built-in INN module, and pass the instance to
    INN.set_filter_hook().  For example:

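        class Filter:
            def filter_art(self, art):
                # Inspect art here.  Return a non-empty string to
                # reject the article, or None to accept it.
                return None

            def filter_messageid(self, msgid):
                # Inspect msgid here.  Return a non-empty string to
                # refuse the offer, or None to accept it.
                return None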

        import INN
        myfilter = Filter()
        INN.set_filter_hook(myfilter)

    When writing and testing your Python filter, don't be afraid to make use
    of try:/except: and the provided INN.syslog() function.  stdout and
    stderr will be disabled, so your filter will die silently otherwise.
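
    One way to follow this advice is to wrap the body of each hook in
    try/except and report failures through INN.syslog().  The sketch below
    makes some assumptions: the fallback INN class is a hypothetical
    stand-in so the code can run outside the server, and check_subject()
    is an invented helper that fails on purpose when the Subject header is
    missing.

```python
try:
    import INN  # built into innd; a stub INN.py may also be on the path
except ImportError:
    class INN:
        """Hypothetical stand-in so this sketch runs outside the server."""
        log = []

        @staticmethod
        def syslog(level, message):
            INN.log.append((level, message))


class Filter:
    def filter_art(self, art):
        try:
            return self.check_subject(art)
        except Exception as exc:
            # stdout and stderr are unavailable, so report via syslog.
            INN.syslog("err", "filter_art failed: %s" % (exc,))
            return None  # accept the article rather than lose it

    def check_subject(self, art):
        # Deliberately fragile: raises AttributeError if Subject is None.
        if "test" in art["Subject"].lower():
            return "Rejected: test article"
        return None
```

    Accepting the article when the filter itself fails is a design choice
    in this sketch; a stricter site might prefer to return a rejection
    string from the except branch instead.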

    Also, remember to try importing your module interactively before loading
    it, to ensure there are no obvious errors.  One typo can ruin your whole
    filter.  A dummy INN.py module is provided to facilitate testing
    outside the server.  To test, change into your filter directory and use
    a command like:

        python -ic 'import INN, filter_innd'

    You can define as many or few of the methods listed above as you want in
    your filter class (it's fine to define more methods for your own use;
    innd won't use them but your filter can).  If you *do* define the above
    methods, GET THE PARAMETER COUNTS RIGHT.  There are checks in innd to
    see if the methods exist and are callable, but if you define one and get
    the parameter counts wrong, INND WILL DIE.  You have been warned.  Be
    careful with your return values, too.  The filter_art() and
    filter_messageid() methods have to return strings, or None.  If you
    return something like an int, innd will not be happy.
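
    To guard against returning the wrong type, a filter can pass every
    result through a small normalizing helper before handing it back to
    innd.  The as_verdict() function below is a hypothetical helper, not
    something INN provides; it is a sketch of one possible convention in
    which any false value means "accept".

```python
def as_verdict(value):
    """Coerce a hook result to None (accept) or a non-empty string (reject)."""
    if not value:
        # None, '', 0 and other false values all mean "accept".
        return None
    if isinstance(value, str):
        return value
    # Anything else (an int, a list, ...) is turned into a string so
    # that only a string or None is ever returned to the caller.
    return str(value)
```

    A filter_art() method would then end with a line such as
    "return as_verdict(result)" instead of returning the raw value.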

WHAT'S THE DEAL WITH THESE BUFFER OBJECTS?

    Buffer objects are cousins of strings, new in Python 1.5.2.  They are
    supported, but at this writing you won't yet find much about them in the
    Python documentation.  Using buffer objects may take some getting used
    to, but we can create buffers much faster and with less memory than
    strings.

    For most of the operations you will perform in filters (like re.search,
    string.find, md5.digest) you can treat buffers just like strings, but
    there are a few important differences you should know about:

        # Make a string and two buffers.
        s = "abc"
        b = buffer("def")
        bs = buffer("abc")

        s == bs          # - This is false because the types differ...
        buffer(s) == bs  # - ...but this is true, the types now agree.
        s == str(bs)     # - This is also true, but buffer() is faster.
        s[:2] == bs[:2]  # - True.  Buffer slices are strings.

        # While most string methods will take either a buffer or string,
        # string.join insists on using only strings.
        string.join([str(b), s], '.')   # returns 'def.abc'

        e = s + b        # This raises a TypeError, but...

        # ...these two both return the string 'abcdef'. The first one
        # is faster -- choose buffer() over str() whenever you can.
        e = buffer(s) + b
        f = s + str(b)

        g = b + '>'      # This is legal, returns the string 'def>'.

FUNCTIONS SUPPLIED BY THE BUILT-IN INND MODULE

    Not only can innd use Python, but your filter can use some of innd's
    features too.  Here is some sample Python code to show what you get:

        import INN

        # Python's native syslog module isn't compiled in by default,
        # so the INN module provides a replacement.  The first parameter
        # tells the Unix syslogger what severity to use; you can
        # abbreviate down to one letter and it's case insensitive.
        # Available levels are (in increasing levels of seriousness)
        # Debug, Info, Notice, Warning, Err, Crit, and Alert. (If you
        # provide any other string, it will be defaulted to Notice.)  The
        # second parameter is the message text.  The syslog entries will
        # go to the same log files innd itself uses, with a 'python:'
        # prefix.
        INN.syslog('warning', 'I will not buy this record.  It is scratched.')
        animals = 'eels'
        vehicle = 'hovercraft'
        INN.syslog('N', 'My %s is full of %s.' % (vehicle, animals))

        # Let's cancel an article!  This only deletes the message on the
        # local server; it doesn't send out a control message or anything
        # scary like that.  Returns 1 if successful, else 0.
        if INN.cancel('<meow$123.456@solvangpastries.edu>'):
            canceled = "yup"
        else:
            canceled = "nope"

        # Check if a given message is in history. This doesn't
        # necessarily mean the article is on your spool; canceled and
        # expired articles hang around in history for a while, and
        # rejected articles will be in there if you have enabled
        # remember_trash in inn.conf. Returns 1 if found, else 0.
        if INN.havehist('<z456$789.abc@isc.org>'):
            comment = "*yawn* I've already seen this article."
        else:
            comment = 'Mmm, fresh news.'

        # Here we are running a local spam filter, so why eat all those
        # cancels?  We can add fake entries to history so they'll get
        # refused.  Returns 1 on success, 0 on failure.
        canceled_id = buffer('<meow$123.456@isc.org>')
        if INN.addhist("<cancel." + canceled_id[1:]):
            thought = "Eat my dust, roadkill!"
        else:
            thought = "Darn, someone beat me to it."

        # We can look at the header or all of an article already on spool,
        # too.  Might be useful for long-memory despamming or
        # authentication things.  Each is returned (if present) as a
        # string object; otherwise you'll end up with an empty string.
        article = INN.article('<foo$bar.baz@bungmunch.edu>')
        artheader = INN.head('<foo$bar.baz@bungmunch.edu>')

        # Finally, do you want to see if a given newsgroup is moderated or
        # whatever?  INN.newsgroup returns the last field of a group's
        # entry in active as a string.
        froupflag = INN.newsgroup('alt.fan.karl-malden.nose')
        if froupflag == '':
            moderated = 'no such newsgroup'
        elif froupflag == 'y':
            moderated = "nope"
        elif froupflag == 'm':
            moderated = "yep"
        else:
            moderated = "something else"
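
    The calls shown above can be combined into small helpers.  The
    following is a sketch under stated assumptions: the fallback INN class
    is a hypothetical stand-in that mimics the documented return values
    (1 or 0 for havehist and addhist, a flag string or '' for newsgroup)
    so the code can run outside innd, and preempt_cancel() and
    is_moderated() are invented names.

```python
try:
    import INN  # built into innd
except ImportError:
    class INN:
        """Hypothetical stand-in mimicking the documented return values."""
        history = set()
        flags = {"local.moderated": "m", "local.test": "y"}

        @staticmethod
        def havehist(msgid):
            return 1 if msgid in INN.history else 0

        @staticmethod
        def addhist(msgid):
            if msgid in INN.history:
                return 0
            INN.history.add(msgid)
            return 1

        @staticmethod
        def newsgroup(name):
            return INN.flags.get(name, "")


def preempt_cancel(msgid):
    """Add the cancel-style ID for msgid to history; return 1 if added."""
    cancel_id = "<cancel." + str(msgid)[1:]
    if INN.havehist(cancel_id):
        return 0  # already recorded, nothing to do
    return INN.addhist(cancel_id)


def is_moderated(group):
    """Return true if the group's flag in active is 'm'."""
    return str(INN.newsgroup(group)) == "m"
```

    Checking havehist() before addhist() avoids treating an ID that is
    already in history as an error.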

PYTHON AUTHENTICATION AND AUTHORIZATION SUPPORT FOR NNRPD

    Python authentication and authorization support in nnrpd, along with
    filtering support in innd, may be compiled in by giving the
    "--with-python" command-line flag to the configure script.  Python
    authentication and authorization may be turned on with the
    nnrppythonauth setting in inn.conf.

    If nnrppythonauth is set to true in inn.conf, nnrpd will load the Python
    module defined in include/paths.h and located in the directory specified
    by pathfilter in inn.conf.  Once that module is loaded, nnrpd will
    authenticate and authorize readers by calling Python methods rather
    than reading readers.conf and using the normal authentication mechanism.

    Every time an authenticated reader asks nnrpd to read or post an
    article, Python authorization hooks are invoked before proceeding with
    the requested operation.  The authorization functionality makes sense
    when the list of newsgroups in your access statements grows too long to
    maintain in readers.conf, or when you need access control rules
    applied immediately without having to restart all the nnrpd processes.
    Also, Python authorization hooks perform access control on a
    per-newsgroup basis, while readers.conf does the same on a per-user
    basis.

    However, treat the authorization functionality as an option that is
    reasonable in only a few cases (like those mentioned above).

WRITING AN NNRPD AUTHENTICATION MODULE

    You need to create a nnrpd_auth.py module in INN's filter directory (see
    the pathfilter setting in inn.conf) where you should define a class
    holding certain methods.

    The following methods are known to nnrpd.  It uses them if present:

    __init__(self)
        Not explicitly called by nnrpd, but will run whenever the auth
        module is loaded.  This is a good place to initialize constants or
        establish a database connection.

    close(self)
        This method is invoked on nnrpd termination.  You can use it to save
        state information or close a database connection.

    authenticate(self, attributes)
        Called when a reader connects or issues an AUTHINFO command.
        Connection attributes are passed in the "attributes" dictionary.
        The following keys are initialized by nnrpd:

        type
            one of "connect", "authinfo", "read" or "post", specifying the
            authentication type

        hostname
            resolved hostname (or IP address if resolution fails) of the
            connected reader

        ipaddress
            IP address of the connected reader

        interface
            IP address of the interface on this machine that the reader is
            connected to

        user
            username the reader passed with the AUTHINFO command, or None
            if not applicable

        pass
            password the reader passed with the AUTHINFO command, or None
            if not applicable

        newsgroup
            name of the newsgroup the reader requests read or post access
            to, or None if not applicable

        All the above values are buffer objects (see the notes above on what
        buffer objects are).

        This method should return a tuple of four elements:

        1) NNTP response code.  Should be a valid NNTP response code (see
           example for details).

        2) Whether reading is allowed.  Should be a boolean value.

        3) Whether posting is allowed.  Should be a boolean value.

        4) Wildmat expression that says what groups to provide access to.

        See the explanation of applicable NNTP return codes in hook-perl in
        the INN documentation.

    authorize(self, attributes)
        Called when a reader requests either read or post permission.  The
        "attributes" dictionary is passed to this method as well (see
        authenticate() above for details).

        This method should return None to grant the requested permission
        to the requested newsgroup, or a non-empty string otherwise.  The
        rejection string will be shown to the reader.
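
    The two methods above might look like the following sketch.  It rests
    on several assumptions: the user table and the local.staff group are
    hypothetical, and the response codes (480, 281, 502) are standard NNTP
    codes chosen for illustration, so check the sample module and
    hook-perl for the codes nnrpd actually expects.  Values arrive as
    buffer objects under nnrpd, hence the str() conversions.

```python
class AUTH:
    """Illustrative nnrpd auth class; all data below is made up."""

    def __init__(self):
        self.passwords = {"alice": "wonderland"}  # hypothetical users
        self.staff_groups = ["local.staff"]       # hypothetical group

    def authenticate(self, attributes):
        if str(attributes["type"]) == "connect":
            # Ask every reader to log in before doing anything else.
            return (480, False, False, "")
        user = attributes.get("user")
        password = attributes.get("pass")
        if (user is not None and password is not None
                and self.passwords.get(str(user)) == str(password)):
            # Accepted: may read and post in every group.
            return (281, True, True, "*")
        return (502, False, False, "")

    def authorize(self, attributes):
        group = attributes.get("newsgroup")
        if (group is not None and str(group) in self.staff_groups
                and str(attributes["type"]) == "post"):
            return "Posting to %s is not permitted" % (str(group),)
        return None  # grant the requested permission
```

    Each tuple follows the order described above: response code, read
    permission, post permission, and a wildmat expression.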

    To register your methods with nnrpd, you need to create an instance of
    your class, import the built-in nnrpd module, and pass the instance to
    nnrpd.set_auth_hook().  For example:

        class AUTH:
            def authenticate(self, attributes):
                ...

            def authorize(self, attributes):
                ...

        import nnrpd
        myauth = AUTH()
        nnrpd.set_auth_hook(myauth)

    There is also an nnrpd.py module in the filter directory that is not
    actually used by nnrpd but provides the same set of functions as the
    built-in nnrpd module.  This stub module may be used when debugging
    your own module.

FUNCTIONS SUPPLIED BY THE BUILT-IN NNRPD MODULE

    As of this writing, the built-in nnrpd module exports the following
    functions:

    set_auth_hook()
        used to pass a reference to the instance of the authentication and
        authorization class to nnrpd

    syslog()
        intended to be a replacement for Python's native syslog module

