#!/bin/sh -			# -*-perl-*-
#######################################################################
#
# Copyright @ 2007 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A.
# All rights reserved.U.S. Government Rights - Commercial software.  Government users are subject to the
# Sun Microsystems, Inc. standard license agreement and applicable provisions of the FAR and its supplements.
# Use is subject to license terms.  Sun,  Sun Microsystems,  the Sun logo and  Sun Ray are trademarks or
# registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries.UNIX is a registered trademark
# in the U.S. and other countries, exclusively licensed through X/Open Company, Ltd.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software distributed
# under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
# CONDITIONS OF ANY KIND, either express or implied. See the License for the specific
# language governing permissions and limitations under the License.
#
#######################################################################
#
########################################################################################################
#
#
#      $Id: cleanRunaways,v 2.3 2007/06/07 18:33:49 waynec Exp $	
#      $RCSfile: cleanRunaways,v $
#      $Author: waynec $
#               Original Author: Tim Auckland
#               Modified  by:    Sean Meighan 
#                                Gabriel Bolano 
#			         Daniel Mullenax
#
#      $Revision: 2.3 $
#      $Date: 2007/06/07 18:33:49 $
#
#      Description: This script profiles the load running on a Solaris machine.
#                   The load is sampled by running a "ps -e -o ..." command and
#                   storing its output in a file called "state".
#                   On the next run the previous sample is renamed to "oldstate" and a
#                   fresh "ps -e -o ..." sample is stored in "state". The script then
#                   computes the CPU load consumed between the two samples:
#
#               ratio = 100.0 * (time_state - time_oldstate) / (etime_state - etime_oldstate)
#                   Arguments can be passed to list only those processes that have consumed
#                   at least a given threshold of load over the interval between the two runs.
#
#			Now that we know how much load a process has consumed, we have defined
#			a table to set the kill thresholds.
#			Here is a sample of what this table looks like:
#       my %app_table    = (
#        'netscape'                      => '2.0',
#        'netscape-bin'                  => '2.0',
#        'mozilla-bin'                   => '2.0',
#        'acroread'                      => '50.0',
#        'soffice1.bin'                  => '50.0',
#
#                   This table is keyed by the process name seen in the "ps -e -o ..."
#                   output and sets the threshold (in percent) that qualifies this
#                   PID to be killed when the kill hour occurs.
#
#                   The kill hour is the hour when qualified processes will be killed.
#                   For each pid that we decide to terminate, this script attempts
#                   a gentle-to-abrupt kill sequence. We first gather some info on
#                   the process being killed so that research can be done into why runaway
#                   processes are occurring.
#
#		Kill sequence
#			pstack $pid > $pid.log
#			pfiles $pid >> $pid.log
#			kill -15 $pid
#			kill -1  $pid
#			kill -9  $pid
#
#
#
#       run string: cleanRunaways [-h] [-q] [-t threshold] [-n num] [-s num] [-kill ] [-m mail | premail ] [-w withoutpostmail]
#                       -h =  help screen
#                       -q = quiet mode, display nothing [default: will display to stdout]
#                       -t = Make any lines that exceed this load (in percentage 0 to 100) 
#                               [default=5]
#                       -n = how many lines to print out. zero indicates to print all lines.
#                       -s = local time (military time) when pids are to be killed,
#                            default is 2 (2am). Requires the "-k" to actually kill pids.
#                       -k = kill command is executed. PIDs that qualify will be killed. Only
#				processes listed in app_table qualify for kill.  default is false
#                       -l = by default, the log directory is /tmp/.cleanRunaways; this can be overridden
#                       -m = mail will be sent to "Q" status pids.
#                            default no mail will be sent
#			-z = no mail will be sent following kill.  Default is send mail post kill.
#
#	example:  cleanRunaways -t 2 -n 0 -s 2 -kill -mail  -l /your/own/dir
#	man:      perldoc cleanRunaways 
#       $Log: cleanRunaways,v $
#       Revision 2.3  2007/06/07 18:33:49  waynec
#       corrected Sun copyright license statement
#
#       Revision 2.2  2007/06/01 14:52:32  andreash
#       new license
#
#       Revision 2.1  2007/05/18 18:31:01  waynec
#       added OSR copyright statement
#
#       Revision 2.0  2006/09/09 04:24:24  waynec
#       opencanary
#
#       Revision 1.1  2005/08/05 21:33:33  cvs
#       retire canary_dashboard, new V4 canary monitor
#
#       Revision 1.2  2004/06/15 21:00:36  gtsmaster
#       Added more help to the "-h" option. Added new status "<" to show
#       processes that are in the app_table and would qualify to be killed
#       but we have not had enough time between the two samples. Current default is we
#       need at least 300 seconds before kills will be allowed.
#
#       Revision 1.1  2004/06/15 18:51:41  gtsmaster
#       Initial revision
#
#
#       Enhancements
#  
#       Date            Initial  Name                Short Desc
#       ============    ======   ==================  =====================================
#       Jan 14, 2005    gbm      Gabriel Bolano      added new apps, check for kill progression wait time
#                                                    for ldapsearch, and check of the "to" email address. Marked [gbm].

#Feb 10 2005
#Updated to add a second, more descriptive, duplicate option to handle mail: premail.
#Changed to send mail after any app is killed.  Added new
#option to suppress this post-kill message: withoutpostmail. Removed soffice. Changed usage messages
#and the contents of the mail sent.
#Daniel Mullenax, ddm.
#
#
########################################################################################################

eval 'p=/usr/bin;[ -d $p ]||p=/usr/dist/pkgs/perl5/5bin.`/bin/arch`
exec $p/perl $0 ${1+"$@"}'

if 0;
use strict;
use Getopt::Long;

my $num          = "";
my $process      = "";              
my $ret          = 1;
my @pids_to_kill = ();
my %user_to_warn = ();
my %user_to_kill = (); # [ddm] New hash to hold info for post kill msg.
my %pid_info_tab  = ();
my $current_hour = (localtime(time))[2];
my $rundate      = localtime(time);
my %history          ;

#
# final... 
#
my $ELAPSE_SEC   = 300;	#	we must see at least these many seconds in our sample time
			#	to qualify a process to be killed.

my $THIRTY_YEARS = time() - (30 * 12 * 30 * 24 * 60 * 60);

my $AUDIT_HDR    = "Start_time < $rundate >\n"
        ." Column 2 explanation: \n"
        ."  * This process is consuming load but is not in the app_table, will never be killed\n"
        ."  Q This process is in the app_table and qualified to be killed but the -kill option was not passed.\n"
        ."  < This process is in the app_table and qualified to be killed but we need at least $ELAPSE_SEC seconds between samples.\n"
        ."  . This process is in the app_table but does not qualify to be killed.\n"
        ."  K This process is in the app_table and was killed\n\n"
        ."  CPU           Seconds    User\n"
        ."  Load    PID  time/etime   ID     VSZ  RSS Command\n"
        ."  ----   ----- ----/---- --------  ---  --- ---------------------------------------\n";


# added by <scm>
my %app_table    = (
        'netscape'                      => '10.0',
        'netscape-bin'                  => '10.0',
        'MozillaFirebird-bin'           => '10.0',
        'mozilla-bin'                   => '10.0',
        'gnome-panel'                   => '10.0',
        'gnome-terminal'                => '10.0',
        'gnome-smproxy'                 => '10.0',
        'gnome-printinfo'               => '50.0',
        'acroread'                      => '50.0',
        'guloginGUI'                    => '50.0',
        '_progres'                      => '50.0',
        'hydrogen'                      => '50.0',
        'gedit'                         => '90.0',
        'dtsession'                     => '90.0',
        'dtcm'                          => '90.0',
        'java_vm'                       => '90.0',
        'firefox-bin'                   => '10.0',
        'gconfd-2'                      => '50.0',
        'nautilus'                      => '90.0'
        );

# [cmc] Thu Aug  5 15:03:22 MEST 2004
#	Xsun removed because killing it will kill user session
#	'Xsun'                          => '90.0'

#
# [gbm] Fri Jan 14 2005
# added firefox-bin, gconfd-2, nautilus per conf call with Uwe Grumes, and Mike Miller
#

# [ddm] Thu Feb 10 2005  Removed  soffice, since default will be to call without mail,
#  without user warning.

my %args        = (
        threshold        => 5.0,                      # percentage
        num              => 00,                       # how many lines to show. "0"=show all
        sched_hour       => 2,	                      # what hour is the kill command executed
        logdir           => '/tmp/.cleanRunaways'     # default log directory
        );

#
# get all options ...
#
GetOptions( \%args
           , 'help'
           , 'quiet'
           , 'threshold=s'
           , 'num=s'
           , 'sched_hour=s'
           , 'kill'
           , 'mail'
           , 'premail'
           , 'withoutpostmail'
           , 'logdir=s'
           , 'debug'
        ) or do { usage(\%app_table, \%args); exit(2); };   # bad option: show usage, then fail

# [ddm]  added duplicate option for mail prior to kill and withoutpostmail to supress post kill message, premail
#
# help requested...print and exit
#
if($args{help})
{
     usage(\%app_table, \%args);
     exit(0);
}

# just in case this is a one off...
mkdir($args{logdir}, 0777) unless -d $args{logdir};
chmod(0777, $args{logdir});

my($readstate, $writestate, $delta);
{
    my($state, $oldstate);

    my $statef             = "$args{logdir}/state";
       $writestate         = $statef;
    my $oldstatef          = "$args{logdir}/oldstate";

    # Get timestamp of each file
    my $statetime          = (stat($statef))[9];


    # if $statetime is defined, and statef is a file
    # rename state to oldstate...
    # else, we've not run before
    if(defined $statetime and -f $statef)
    {
        #<scm> delta seconds is elapse 
        # between last two runs
        $delta            = (time - $statetime);
        $readstate        = $oldstatef;

        rename($statef, $oldstatef);
        chmod(0777, $statef); 
        chmod(0777, $oldstatef);            
    }
    else
    {
         $delta            = 0;
    }
    chmod(0777, $statef);
    chmod(0777, $oldstatef);

}

my $FIRST;
if($readstate && open(R, $readstate))
{
    $FIRST = 0;
    while(<R>)
    {
        chop();
        my($pid, $elapsed, $cpu, $uid, $vsz, $rss, $args)=split(' ', $_, 7);
        $process = get_program($args);
        $history{$pid, $uid, $args}=[$elapsed, $cpu, $process,$vsz,$rss]; 	# <scm>
    }
    close(R);
}
else
{
    $FIRST = 1;
    warn "$0 has not run before.  Will run for first time.\n" unless $args{quiet};
}

open(PS, "ps -e -opid,etime,time,user,vsz,rss,args|")
or die "Error trying to run ps: $!\n";

# assign all values from ps

my(@ps) = <PS>;

#CPU         Delta       User
# Load    PID Secs        ID       VSZ  RSS Command
# ----  ----- ----/----  --------  ---  --- ---------------------------------------
# 35.78 38746   39/109  12001       0    0 /gridware/sge/bin/solaris64/sge_execd
# 22.94 20592   25/109  bcinque     0    0 /usr/dist/local/share/tjordan/top

#
# open an audit file for each run
# 
open(AUDIT, "+> $args{logdir}/audit")
or die "Could not open AUDIT $! \n";
chmod(0777, "$args{logdir}/audit");

if($writestate)
{
    unless(open(W, ">$writestate"))
    {
        warn "Can't write state $writestate: $!\n" unless $args{quiet};
        undef $writestate;
    }
}
for(@ps)
{
    chop();
    my($pid, $etime, $time, $uid, $vsz, $rss, $args)
    = split(' ', $_, 7);

    my($cpu)     = parsetime($time);
    $args      ||= "<defunct>\n";
    my($elapsed) = parsetime($etime);

    if($writestate)
    {
        print W "$pid $elapsed $cpu $uid $vsz $rss $args\n";
    }

    my($prev, $ratio);

    if($prev=$history{$pid, $uid, $args})
    {
        $elapsed    -= $prev->[0];
        $cpu        -= $prev->[1];
        $process     = $prev->[2];
        $vsz        -= $prev->[3];
        $rss        -= $prev->[4];

    }
    else
    {
        next;
    }
    #
    # put in a divide by zero check
    # and for defunct processes...make sure
    # to get rid of the bogus elapse time calc..
    #
    if((time() - $elapsed) < $THIRTY_YEARS)
    {
        $elapsed = 0;
    }
    if($elapsed > 0)
    {
        $ratio       = ($cpu/$elapsed) * 100.0;
    }
    else
    {
        $ratio       = 0.0;
    }

    ###########################################
    #
    #	write out the line of the process that 
    #	looks like it is consuming higher than
    #	expected load.
    #

    ###########################################

    if($args{debug}) {	print "$_\n"};

    $ratio    = sprintf("%6.2f", $ratio);
    $pid      = sprintf("%5d"  , $pid);
    $cpu      = sprintf("%4d"  , $cpu);
    $elapsed  = sprintf("%-4d" , $elapsed);
    $uid      = sprintf("%-8s" , $uid);
    $vsz      = sprintf("%4d"  , $vsz);
    $rss      = sprintf("%4d"  , $rss);

    $process  = get_program($args);


    # during printing to audit, provide markers per line
    # .  : exceeds -t threshold, is in app_table, but is below its app_table limit
    # *  : exceeds -t threshold, is not in app_table (never killed)
    # <  : qualifies for a kill, but fewer than $ELAPSE_SEC seconds between samples
    # Q  : qualifies for a kill, but -kill was not passed or it is not sched_hour
    # K  : qualifies for a kill, -kill was passed, and it is sched_hour

    my $marker = ".";

    if($ratio >= $args{threshold})
    {
        $ret         = 0;
    }
    else
    {
        next;
    }

    #
    # for now, comment defunct handling
    #
    #else
    #{
    #    if(   $args =~ m/<defunct>/g
    #       or ($uid =~ /init/ and $process eq 'java_vm' ))
    #    {
    #        print AUDIT "* $ratio $pid $cpu/$elapsed $uid $vsz $rss $app_table{$process} $args\n";
    #    }
    #    next;
    #}


    if( defined $ratio )
    {
        if($ratio >= $app_table{$process})
        {
            #
            # these are problematic pids
            #
            if( defined $app_table{$process})
            {
  		if($delta < $ELAPSE_SEC)
        	{
                     $marker            = "<";
        	}
                elsif( $args{kill} and $current_hour == $args{sched_hour})   # numeric compare, so "02" matches 2
                {
                     #
                     # this pid exceeds threshold, is in app table
                     # time kill, and is the kill run
                     #
                     push(@pids_to_kill, $pid);
                     $pid_info_tab{$pid} = "$ratio, $pid, $cpu, $elapsed, $uid,$vsz,$rss,$app_table{$process},$args";
		     #[ddm] line below added to set message re kill after the fact
                     $user_to_kill{$pid} = "$uid,$app_table{$process},$process, $args";
                     $marker            = "K";
                }
                else
                {
                     # this pid Qualified for a kill,
                     # but it is not kill time
                     # or is it kill run...
                     $marker             = "Q";
                     $user_to_warn{$pid} = "$uid,$app_table{$process},$process, $args";
                }
            }
            else
            {
                # this pid is problematic, but is not in
                # app_table ...might belong there in future
                $marker            = "*";
            }
        }
    }# if def cpu 

    #
    # for now, comment defunct handling
    #
    #if($args =~ m/<defunct>/g or $uid =~ /init/ )
    #{
    #    $marker = "*";
    #}
 
    print AUDIT "$ratio $marker $pid $cpu/$elapsed $uid $vsz $rss $app_table{$process} $args\n";
}
close(PS);

if($writestate)
{
    close(W);
    chmod(0777, $writestate);
}
# always close the audit file; it is re-read through the sort filter below
close(AUDIT);


my $cmd = "/usr/bin/cat $args{logdir}/audit | sort -r";
if($args{num})
{
    $cmd .= " | head \-$args{num}";
}

open(AUDIT, "$cmd |")
or die "can't run output filter: $!\n";

my @lines = <AUDIT>;

#
# now reopen ...
#
open(AUDIT, "+> $args{logdir}/audit")
or die "Could not open AUDIT $! \n";
chmod(0777, "$args{logdir}/audit");
print AUDIT $AUDIT_HDR;
print AUDIT @lines;
close(AUDIT);

if(! $args{quiet} && ! $FIRST)
{
    print $AUDIT_HDR;
    for(@lines)
    {
        print $_;
    }
}

#
# if we have bad pids ...
#
if(@pids_to_kill)   # "defined @array" is deprecated and a fatal error in newer perls
{
     proc_pids(\@pids_to_kill, \%pid_info_tab, \%args, $rundate, $current_hour);

     #Check for -w withoutpostmail if not set, send message re kill
     unless($args{withoutpostmail})
     {
	  my $key = "";
	  for $key (sort keys %user_to_kill)
          {
		kmail_it($args{logdir}, $key, $user_to_kill{$key});
	  }
     }
	
}


#
# if we have bad pids to warn ...
# and the mail (or premail) flag was sent...
# [ddm] premail is a more descriptive duplicate of mail; either one
# sends the warning. mail_it() skips any uid/pid it has already logged.
#
if($args{mail} or $args{premail})
{
    my $key = "";
    for $key (sort keys %user_to_warn)
    {
         mail_it($args{logdir}, $key, $user_to_warn{$key});
    }
}


#
#  all done ...
#
close(STDOUT);
exit $ret;


# ~~~~~~~~~~~~~~~~~~~~~~~  L O C A L    S U B S  ~~~~~~~~~~~~~~~~~~~~~~

sub usage
{
    my($app_table, $args) = @_;

    print <<DONE;
usage: $0 [-h] [-q] [-t threshold] [-n num] [-s sched_hour] [-k kill] [-m premail] [-l logdir] [-w withoutpostmail]

        -h = help screen
        -q = quiet mode, display nothing [default: will display to stdout]
        -t = Display any lines that exceed this load (in percentage 0 to 100) [default=5]
        -n = how many lines to print out. zero indicates to print all lines.
        -s = local time (military time) when pids are to be killed,
                 default is 2 (2am). Requires the "-k" to actually kill pids.
        -k = kill command is executed. PIDs that qualify will be killed. Only processes 
                 listed in app_table qualify for kill.  default is false
        -m = premail will be sent to "Q" status pids.
                 default no mail will be sent
        -w = withoutpostmail  Turns off mail sent following process termination 
        -l = log directory [default: /tmp/.cleanRunaways]; this can be overridden

     files:  /tmp/.cleanRunaways/audit         ...... list of processes that are >= threshold
             /tmp/.cleanRunaways/pid.log         .... pstack and pfiles output of pid, captured before it is killed
             /tmp/.cleanRunaways/killed.log       ... list of pids that were killed
             /tmp/.cleanRunaways/state           .... most recent "ps -e -o pid,etime,time,user,vsz,rss,args" sample
             /tmp/.cleanRunaways/oldstate         ... previous "ps -e -o pid,etime,time,user,vsz,rss,args" sample
             /tmp/.cleanRunaways/mail.\$uid.\$pid   ... copy of email sent to user about Q status pid

     Examples:  cleanRunaways -h  ..... shows this help screen
                cleanRunaways -t 5 -s 2 -l /tmp/.cleanRunaways -kill ... Show all processes
                     that are consuming at least 5% load and, when run between 2:00am and
                     2:59am, kill processes whose load is at or above the thresholds shown
                     in the app table. This is the "real" command that can be put into a
                     crontab entry.

0,30 * * * 0-6  cleanRunaways -t 5 -s 2 -l /tmp/.cleanRunaways -kill 
 
                     this will run the cleanRunaways script every thirty minutes and will 
                     kill processes at 2am.

                cleanRunaways ......... Takes defaults and lists processes that qualify
                     to be killed, but doesn't kill anything.
DONE

    my $program = "";
    my $ratio   = "";

	print "\n";
    print "   Current defaults \n";
    print "   -----------------\n";
    for ( sort keys %$args)
    {
         print "     $_ = $$args{$_};\n";
    }

	print "\n\tMinimum seconds between samples to enable killing of processes =  $ELAPSE_SEC\n";
    print "\n\n";


format STDOUT_TOP =
      Current app table and max ratio's:
      ----------------------------------

           Program Name          Load %
      =====================      ======
.

format STDOUT =
       @>>>>>>>>>>>>>>>>>>     @>>>>>>
       $program,                $ratio
.


    for ( sort keys %$app_table)
    {
        $program = $_;
        $ratio   = $$app_table{$program};
        write;
    }
}# end usage

# -------------------------------------------

sub parsetime
{
    local($_)=@_;
    my($d, $h, $m, $s);

    if((undef, undef, $d, $h, $m, $s)=

            /^(((\d+)-)?(\d+):)?([-\d]+):([-\d]+)$/)
    {
        return ((($d||0)*24 + ($h||0))*60 + $m)*60 + $s;
    }
    else
    {
        # warn "Parse error '$_'\n" unless $args{quiet};
        return 0;
    }
}

# -------------------------------------------

sub get_program
{
    my($fullname)    = @_;

    # the first whitespace-separated token is the executable, possibly with a path
    my $program_name = (split(" ",$fullname))[0];
    return "?" unless defined $program_name;

    # strip leading directories; use a lexical so the file-level
    # $process is not overwritten as a side effect
    my $name = (split('/',$program_name))[-1];

    return defined $name ? $name : "?";
}

# -------------------------------------------

sub proc_pids
{
   my($pids, $pid_info_tab, $args, $rundate, $current_hour) = @_;

   my %killed = ();
   #
   # if it is kill time, and kill run
   # remove all *.log and email* files
   #
   if( $$args{kill} and $$args{sched_hour} == $current_hour)
   {
       `/usr/bin/rm -f $$args{logdir}/*log > /dev/null`;
       `/usr/bin/rm -f $$args{logdir}/mail* > /dev/null`;
   }
   #
   # the killed log is kept
   # for 24 hours...every
   # midnight, erase content
   #
   my $open_handle = ">>";
   my $killed_log  = "$$args{logdir}/killed.log";

   # localtime() reports hours as 0..23, so midnight is hour 0, never 24
   if($current_hour == 0)
   {
      $open_handle = "+>";
   }

   open(KILLED, "$open_handle $killed_log")
   or die "Could not open KILLED[$killed_log] $! \n";
   chmod(0777, $killed_log);

   my $pstack = "/usr/bin/pstack 2>/dev/null";
   my $pfiles = "/usr/bin/pfiles 2>/dev/null";
   my $pid = "";
   for $pid(@$pids)
   {
       open(PID, "+> $$args{logdir}/$pid.log");
       print PID "<Rundate : $rundate>\n";

       open(FH, "$pstack $pid |");
       my (@out) = <FH>;
       close(FH);
       print PID"<pstack>\n@out\n</pstack>\n";
           
       open(FH, "$pfiles $pid |");
       (@out) = <FH>;
       close(FH);
       print PID"<pfiles>\n@out\n</pfiles>\n";
     
       close(PID);
       #
       # [gbm] Fri Jan 14, 2005
       # there are runaways that are too far gone
       # and ignores all SIG except 9...
       # go through progression anyway and check if successful
       # kill -9 as last resort...
       #
       my $signal = 0;
       for $signal (15, 1, 9)
       {
            kill $signal, $pid;
            sleep(1);

            # signal 0 only tests whether the pid still exists; grepping
            # "ps -ef" for the pid could also match unrelated lines
            if(! kill(0, $pid) and $!{ESRCH})
            {
                #success
                $killed{$pid} = $signal;
                last;
            }
       }

   }

   # report all pids that were killed...
   my $killed_pid = 0;

   for $killed_pid (@$pids)
   {
      my $signal = "Not killed";
      if(exists $killed{$killed_pid})
      {
           $signal = $killed{$killed_pid};
      }
      print KILLED " $killed_pid [ $signal ]\t $$pid_info_tab{$killed_pid}\n";
   }
   close(KILLED);

}

# -------------------------------------------

sub mail_it
{
  my($dir, $pid, $line) = @_;

  my($uid,$tab,$process,$args) = split(/\,/, $line, 4);   # limit 4: args may contain commas

  $pid          = trim($pid);
  $uid          = trim($uid);
  $tab          = trim($tab);
  $process      = trim($process);
  $args         = trim($args);

  my $log       = "$dir/mail\.$uid\.$pid";
  #
  # for each warning candidate,
  # we are going to keep a log
  # using uid.pid as filename
  # if it already exist...
  # it already has been sent
  # send only once...
  #

  return if( -f "$log");

  my $to        = get_email($uid);

  #
  # [gbm] Fri Jan 14, 2005
  # Add check so that we don't get mail error in mail
  #
  return if($to eq "");

  my $host      = `/usr/bin/hostname`;
  chop($host); 
  my $from      = 'nobody';
  my $signature = "===============================================\n"
                 ."This mail was sent to you automatically.\n"
                 ."Please do not reply to this mail!";
  my $body = "


** For your information only (no action needs to be taken) **


Your $process application running on $host looks like
it is abnormally busy.

The sunray server will automatically terminate this process
at 2:00 am (local time) if it still looks abnormally busy at
1:00 am (local time).

The program details are:

  Name: $process
   PID: $pid
  Host: $host

You can:

* in case of mozilla/firebird or other web browser application, it would be extremely
 helpful if you return all your browser windows to \"quiet\" web sites

eg. http://hr.central, http://sunweb.central, http://webhome.central

after browsing flash animated web sites (eg. http://www.forbes.com).
This saves CPU resources on the sunray server and will prevent the server
from terminating your web browser application at 2:00am (local time).

* continue working with $process and let the server possibly
terminate the process automatically tonight.
Note: make sure you save any changes!

* completely exit $process now, and start up a new instance.

Should you have any questions regarding this or have been severely
impacted by the termination of the named process please contact
SunRay-Feedback\@Sun.COM for assistance.

Many thanks for your co-operation.


IT Operations
";

  #
  # Send email to user
  # and log email to logfile ...
  #

  open(FH, "+> $log");
  if(open (MAIL, "|/usr/bin/mail -t $to")) {

    print MAIL "From: $from \n";
    print MAIL "Subject: Process Warning \n\n";
    print MAIL "$body";
    print MAIL "\n\n";
    print MAIL "$signature";
    close (MAIL);

    # now, the log
    print FH "From: $from \n";
    print FH "Subject: Process Warning \n\n";
    print FH "$body";
    print FH "\n\n";
    print FH "$signature";
    close (FH);

  }# end if
} # end mail_it

# -------------------------------------------
# [cmc] removed domain as not required and get_domain broken 
#   my $h      = "sun-ds\.$domain\:389";

sub get_email
{
   my($uid)   = @_;

   my $cmd    = "/usr/bin/ldapsearch";
   my $h      = "sun-ds\:389";
   my $b      = "dc=sun,dc=com";
   my $l      = "15";
   my $filter = "(uid=$uid)";

   my $email  = "";


   #
   # [gbm] Fri Jan 14, 2005
   # Add -l switch as local sun-ds may not respond due to load/busy.
   # ...-l is timelimit.  Wait at most timelimit seconds for a  search  to  complete.
   # Stored return in array as S10 returns array where as S9 returns 1 line
   #

   my @arr    = `$cmd -l $l -h $h -b $b \"$filter\" mail`;
  
   my $ldap   = "";
   for $ldap (@arr)
   {
       # "!" binds tighter than "=~", so negate the whole match instead
       next unless $ldap =~ /mail/;
       $email     = $1 if($ldap =~ /mail\s*[=:\s]\s*(\S+)/);
   }

   return($email);
} # end get_email
#-------------------------------------------

#[ddm] copied and modified mail_it sub for post kill message.
sub kmail_it
{
  my($dir, $pid, $line) = @_;

  my($uid,$tab,$process,$args) = split(/\,/, $line, 4);   # limit 4: args may contain commas

  $pid          = trim($pid);
  $uid          = trim($uid);
  $tab          = trim($tab);
  $process      = trim($process);
  $args         = trim($args);

  my $log       = "$dir/mail\.$uid\.$pid";

  return if( -f "$log");

  my $to        = get_email($uid);

  return if($to eq "");

  my $host      = `/usr/bin/hostname`;
  chop($host);
  my $from      = 'nobody';
  my $signature = "===============================================\n"
                 ."This mail was sent to you automatically.\n"
                 ."Please do not reply to this mail!";
  my $body = "


** For your information only (no action needs to be taken) **


The SunRay Cleanup script detected that a program you were running on
the host $host was consuming an abnormally large amount of CPU 
resources and was terminated.

The program details were:


  Name: $process
   PID: $pid
  Host: $host


Problematic applications are typically, though not always, web browsers
left on sites that make extensive use of flash animation.  To minimize
the chance that the browser will consume abnormal amounts of CPU in
the future, you can:

Return all your browser windows to quiet web sites

eg. http://hr.central, http://sunweb.central, http://webhome.central

after browsing flash animated web sites (eg. http://www.forbes.com).
This saves CPU resources on the sunray server and will help prevent
your web browser application from being terminated for excess CPU consumption
in the future.

Should you have any questions regarding this or have been severely
impacted by the termination of the named process please contact
SunRay-Feedback\@Sun.COM for assistance. 

Many thanks for your co-operation.



IT Operations
";

  open(FH, "+> $log");
  if(open (MAIL, "|/usr/bin/mail -t $to")) {

    print MAIL "From: $from \n";
    print MAIL "Subject: Process Warning \n\n";
    print MAIL "$body";
    print MAIL "\n\n";
    print MAIL "$signature";
    close (MAIL);

    # now, the log
    print FH "From: $from \n";
    print FH "Subject: Process Warning \n\n";
    print FH "$body";
    print FH "\n\n";
    print FH "$signature";
    close (FH);

  }# end if
} # end kmail_it


sub get_domain
{
   my $domain = `/usr/bin/domainname`;
   chop($domain);

   if($domain =~ 'uk')
   {
      $domain = 'uk';
   }
   elsif($domain =~ 'canada')
   {
      $domain = 'canada';
   }
   elsif($domain =~ 'singapore')
   {
      $domain = 'singapore';
   }
   elsif($domain =~ 'aus')
   {
      $domain = 'aus';
   }
   elsif($domain =~ 'west')
   {
      $domain = 'west';
   }
   elsif($domain =~ 'central')
   {
      $domain = 'central';
   }
   elsif($domain =~ 'east')
   {
      $domain = 'east';
   }
   elsif($domain =~ 'sfbay')
   {
      $domain = 'sfbay';
   }
   elsif($domain =~ 'holland')
   {
      $domain = 'holland';
   }
   return ($domain);

} # end get_domain

# -------------------------------------------

sub trim
{
   my ($string)    =  @_;

   for($string)
   {
       s/^\s+//; #remove leading spaces
       s/\s+$//; #remove trailing spaces
   }
   return ($string);

} # end trim

# ~~~~~~~~~~~~~~~~~~~~~~~  M A N    P A G E ~~~~~~~~~~~~~~~~~~~~~~

=head1 NAME

cleanRunaways - list the CPU load ratio of processes and optionally kill runaways


=head1 SYNOPSIS

cleanRunaways [B<-h>] [B<-q>] [B<-t> I<threshold>]
[B<-n> I<num>] [B<-s> I<sched_hour>] [B<-k>] [B<-m>] [B<-w>] [B<-l> I<logdir>]

=head1 DESCRIPTION

B<cleanRunaways> generates a process listing sorted according to the 
percentage of CPU Time over Elapsed Time for all running processes.

Most interactive processes show low ratios (below 10).  High ratios
(above 50) indicate processes that are very likely to be looping.
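
As a rough illustration of that calculation (a sketch with made-up sample
values, not output from a real run), the ratio for one process could be
derived from two samples like this:

    # Hypothetical CPU seconds and elapsed seconds from two samples.
    my ($cpu_old, $etime_old) = (120, 3600);
    my ($cpu_new, $etime_new) = (270, 3900);

    my $d_cpu   = $cpu_new   - $cpu_old;      # 150 CPU seconds consumed
    my $d_etime = $etime_new - $etime_old;    # 300 seconds between samples

    # Guard against a zero or negative interval before dividing.
    my $ratio = $d_etime > 0 ? 100.0 * $d_cpu / $d_etime : 0.0;
    printf("%6.2f\n", $ratio);                # prints " 50.00"

With the default threshold of 5, such a process would be listed; whether it
is ever killed depends on the app table and the kill options.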

=over 4

=item B<-h>

Print the usage() routine.

=item B<-q>

Suppress non-fatal error messages and do not echo content of audit file.

=item B<-t> I<threshold>

Processes whose load percentage is at or above I<threshold> are listed and
regarded as looping.  This also determines the exit status; see
L</"EXIT STATUS">.  Default is 5.

B<cleanRunaways> records CPU usage figures from the previous run in order to
spot problem processes.  On each run the primary state file is renamed to
the secondary state file and a new primary state file is written.  If fewer
than I<$ELAPSE_SEC> seconds (default 300, i.e. 5 minutes) separate the two
samples, qualifying processes are marked C<E<lt>> and are not killed.

=item B<-n> I<num>

Display only the top I<num> lines.  Default is 0, which lists all
processes at or above the threshold.

=item B<-s> I<sched_hour>

During this hour, kill qualifying processes (requires B<-k>).
Default is 2 (2 am local time).

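=item B<-m>

Send email to the B<uid> whose B<pid> status is B<Q>.  B<-premail> is a
synonym.  By default no warning email is sent.

=item B<-w>

Do not send email after a process has been killed (B<-withoutpostmail>).
By default email is sent after a kill.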

=item B<-k>

By default, this run is for reporting purposes only.  With B<-k>, processes
listed in the app table whose load is at or above their per-application
limit are killed during I<sched_hour>.

=item B<-l> I<logdir>

Override the default log directory, F</tmp/.cleanRunaways>, for this run.


=back

=head1 STDOUT

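Output is displayed in nine columns:

 LOAD  MARKER  PID  TIME/ETIME  USER  VSZ  RSS  LIMIT  COMMAND

MARKER is one of C<.>, C<*>, C<E<lt>>, C<Q> or C<K>; LIMIT is the app table
threshold for the command and is empty when the command is not in the table.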

=head1 EXIT STATUS

The following exit values are returned:

0         One or more processes are looping.



>0        No looping processes found, or an error occurred.

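A calling script could interpret this status roughly as sketched below.
This is only an illustration: the helper name C<describe_status> and the
install path are assumptions, not part of B<cleanRunaways>.

    # Hypothetical helper: map a wait status from system() to a message.
    sub describe_status {
        my ($wait_status) = @_;
        return 'failed to start'  if $wait_status == -1;
        return 'died on a signal' if $wait_status & 127;
        return ($wait_status >> 8) == 0 ? 'looping processes found'
                                        : 'no looping processes';
    }

    # The path below is an assumption; adjust it to the local install.
    system('/usr/local/bin/cleanRunaways', '-q', '-t', '10');
    print describe_status($?), "\n";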


=head1 FILES

=over 4

=item F<I<logdir>/state>

Primary state file (most recent sample).

=item F<I<logdir>/oldstate>

Secondary state file (previous sample).

=item F<I<logdir>/audit>

Processes at or above the threshold, from the most recent run.

=item F<I<logdir>/killed.log>

Log for killed PIDs.

=item F<I<logdir>/I<pid>.log>

PID logs (pfiles, pstack) file.

=item F<I<logdir>/mail.$uid.$pid>

Copy of email sent to user.

=back

=cut




