ArsDigita Archives
 
 
   
 

ArsDigita Keepalive

for AOLserver by Ben Adida and Philip Greenspun, part of ArsDigita Free Tools
ArsDigita Keepalive is a system that monitors your web services at regular, short intervals and takes action to resolve any problems it finds. If Keepalive fails to reach a page, it will take one of the following actions, depending on the configuration parameters and on how many consecutive failures it has already seen:
  • nothing except decrement a counter
  • send email to a previously defined group of addresses
  • execute a shell command, presumably one that restarts the stuck service
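
The choice among these three actions can be sketched in Tcl. This is a minimal illustration, not Keepalive's actual source: the proc name keepalive_on_failure and its arguments are hypothetical, and the rule itself (each failure uses up one retry, email goes out once the retries left fall below a threshold, and the shell command runs when none are left) is an assumption inferred from the optional retries and threshold arguments described under Installation.

```tcl
# Hypothetical sketch; NOT taken from the Keepalive distribution.
#   remaining : retries left before this failed check
#   threshold : email is sent once the retries left drop below this value
# Returns a two-element list: the new retries-left count and the action,
# which is one of "none", "email", or "restart".
proc keepalive_on_failure {remaining threshold} {
    # Every failed check uses up one retry.
    incr remaining -1
    if {$remaining <= 0} {
        # Retries exhausted: time to run the shell command.
        return [list 0 restart]
    }
    if {$remaining < $threshold} {
        return [list $remaining email]
    }
    return [list $remaining none]
}

# Walk one monitor through four consecutive failures, starting with
# 4 retries and an email threshold of 2.
set remaining 4
foreach attempt {1 2 3 4} {
    set result [keepalive_on_failure $remaining 2]
    set remaining [lindex $result 0]
    set action [lindex $result 1]
    puts "failure $attempt: $action ($remaining retries left)"
}
```

Under these assumptions the four failures produce none, none, email, and restart, in that order. What happens to the counter after a successful check or after the shell command has run is not modeled here.
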
Keepalive is built using AOLserver (free) and takes advantage of AOLserver's built-in scheduler (like Unix cron but lighter weight) and Tcl API (includes a call to HTTP GET a page from another server). However, unlike most of our AOLserver products, you don't need to install an RDBMS in order to use Keepalive. Web servers generally get stuck because of problems with the RDBMS, so a monitor that depended on an RDBMS would be self-defeating.

Although we generally use Keepalive to monitor AOLserver-based Web services, it will work fine to monitor any HTTP service on a Unix machine.

Installation

  • Download keepalive-1999.tar.gz (last updated December 14, 1998)
  • cd /web
  • gunzip -c keepalive-1999.tar.gz | tar xvf - (creates /web/keepalive; a plain tar xvf will not read a gzipped archive on every Unix)
  • Create an AOLserver whose page root is /web/keepalive and whose private Tcl directory is /web/keepalive/tcl
  • Edit /web/keepalive/tcl/defs.tcl to set the main Keepalive parameters.
  • Edit the keepalive_init procedure in /web/keepalive/tcl/init.tcl to add your monitors, following the sample monitor already there. The arguments to new_monitor are, in order:
    • name
    • URL of test page
    • expected return
    • shell command to execute if failure
    • Tcl list of admin email addresses to notify
    • (optional) number of retries before failure action is executed. This defaults to 5.
    • (optional) threshold of retries below which email is sent. This defaults to the number of retries, meaning that Keepalive will send mail if there is any problem. If you feel that you're getting spammed about problems that work themselves out, set this to some lower number; we find that 4 and 2 are good numbers.
  • You're done! Start your server.
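
As an illustration of the argument order above, a monitor entry inside keepalive_init might look like the fragment below. Every value is a placeholder: the monitor name, the URL, the expected string, the path to the restart script, and the email addresses are all hypothetical. Reading "4 and 2" as 4 retries with an email threshold of 2 is also an assumption, as is the exact way Keepalive compares the fetched page with the expected return; check the sample monitor in init.tcl before copying this.

```tcl
# Hypothetical monitor entry for keepalive_init in tcl/init.tcl.
# Only the ORDER of the arguments comes from the documentation; all of
# the values are made-up placeholders.
# Arguments, in order: name, URL of test page, expected return, shell
# command to execute on failure, Tcl list of admin email addresses,
# number of retries (optional), email threshold (optional).
new_monitor "mysite" \
    "http://www.example.com/keepalive-test.html" \
    "success" \
    "/usr/local/bin/restart-aolserver mysite" \
    [list "webmaster@example.com" "oncall@example.com"] \
    4 \
    2
```

Leaving off the last two arguments would fall back to the documented defaults: 5 retries, with the email threshold equal to the number of retries.
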

Which Shell Command?

You might well ask yourself which shell command will restart a Web server. It depends. In the case of AOLserver, we run the server by inserting a line in /etc/inittab:
nsjw:34:respawn:/home/nsadmin/bin/nsd -i -c /home/nsadmin/nsd.ini
which tells Unix to restart nsd if it should die for any reason. Thus Keepalive just needs to kill the existing nsd process. The problem is that a Web server must run as root if it is to grab port 80, and Keepalive can't kill a root-owned process unless Keepalive itself runs as root (a security risk). The solution at ArsDigita is a setuid Perl script that Keepalive can call, restart-aolserver:
#!/usr/local/bin/perl

## Restarts an AOLserver. Takes as its only argument the name of the server to kill.

## This is a perl script because it needs to run setuid root, 
## and perl has fewer security gotchas than most shells.


$ENV{'PATH'} = '/sbin:/bin';

# uncomment this stuff if you're at an installation where a server 
# takes a long time to restart or keeps important state

# if (scalar(@ARGV) == 0) {
#     die "Don't run this without any arguments!";
# }

$server = shift;

$< = $>; # set realuid to effective uid (root)

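sub getpids {
    ## get the PIDs of all nsd processes started with $server.ini
    my $ps_output = `/usr/bin/ps -ef`;
    my @pids;
    foreach (split(/\n/, $ps_output)) {
        ## \Q...\E quotes the server name so that it, and the dot before
        ## "ini", match literally rather than as regex metacharacters
        next unless /^\s*\S+\s+(\d+).*nsd.*\Q$server\E\.ini/;
        push(@pids, $1);
    }
    @pids;
}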

@pids = &getpids;
print "Killing ", join(" ", @pids), "\n";
kill 'KILL', @pids;

License

This is open-source software, copyright 1998 ArsDigita, LLC and licensed under the GNU General Public License.

Support and Customization

If you want an extended version of Keepalive or support, you can hire the programmer of your choice to install, maintain, and customize Keepalive. ArsDigita offers support as well, but probably not at a price that you'd be happy to pay.
ben@arsdigita.com

Reader's Comments

I think using aolserver to keep another aolserver alive is a bit risky. Even without the RDBMS - if both your aolservers hang for the same reason then what? I would feel much more comfortable using cron and a shell script.

-- David Cotter, September 21, 2001