Installing the Monitors for the ACS system

by Hiro Iwashima and Ryan Lee

Installing all of the standard monitors for your system can be a configuration hassle. A small change can take hours to fix, which is not very helpful to the user. We have therefore tried to simplify the documents and show, step by step, what is necessary for a completely monitored system and a happy sysadmin.

Keepalive

Keepalive original docs

Keepalive regularly checks that your server can be accessed. If it can't, Keepalive performs the actions you have specified.

  1. Grab the tar file from the ArsDigita Download page.
  2. Untar it and move the directory to /web/keepalive.
  3. Change the ownership of the directory to nsadmin:
    chown -R nsadmin.nsadmin /web/keepalive
  4. Make a keepalive directory under /home/aol30/servers by copying an existing server's directory:
    cp -pR [existing server] keepalive
  5. Use a text editor to create /home/aol30/keepalive.ini, or grab it from www.arsdigita.com/install/keepalive.ini. Set the correct address and hostname under [ns/server/keepalive/module/nssock]. If you don't have nsssl, remove the line that refers to it; it is the last line in the [ns/server/keepalive/modules] section. If you are copying from another ini file, make sure you have the following:
    [ns/server/keepalive/modules]
    nslog=nslog.so
    nssock=nssock.so
    nsperm=nsperm.so
    Also make sure that keepalive.ini points to the correct log locations. The file currently available at ArsDigita.com keeps logs in /home/nsadmin/log, whereas our installation puts them in /home/aol30/log, so replace every occurrence of nsadmin (except User=nsadmin) with aol30.
  6. Make sure you have a restart-aolserver script in your /home/aol30/bin directory. If you don't, the script is listed in the Appendix at the bottom of this page. Also make sure that /home/aol30/bin is in the path.
  7. Edit /web/keepalive/tcl/defs.tcl (update the parameters) and /web/keepalive/tcl/init.tcl (the keepalive_init procedure). In init.tcl, add monitors in the same way as the sample, giving the arguments in the same order. So that restart-aolserver can be found, add /home/aol30/bin to your path in one of the startup scripts, for example /etc/profile, then source it again from the prompt:
    . /etc/profile
  8. Copy /web/yourservername/tcl/ad-utilities.tcl.preload into /web/keepalive/tcl/ad-utilities.tcl.preload and /web/yourservername/tcl/00-ad-preload.tcl into /web/keepalive/tcl/00-ad-preload.tcl. This causes a few error messages in your error log, because some of the preload files in your server installation are not found, but they are harmless.
  9. Insert keepalive into /etc/inittab so that it respawns:
    nska:34:respawn:/home/aol30/bin/nsd -ic /home/aol30/keepalive.ini
  10. Now, you can start the process by typing
    /home/aol30/bin/nsd -i -c /home/aol30/keepalive.ini
  11. To make sure it is running, go to /web/yourservername/www/SYSTEM/ and run
    mv dbtest.tcl dbtest.tcl.moved
    This should make the check of your AOLserver fail and send you email. Move the file back when you are done.
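
The path replacement described in step 5 can be done by hand, but a small filter is less error-prone. The following is only a sketch: fix_ini_paths is a hypothetical helper name, and it assumes the only nsadmin that must survive is on the User=nsadmin line.

```shell
# Hypothetical helper (not part of the keepalive distribution).
# Reads an ini file on stdin and writes it to stdout with every
# "nsadmin" replaced by "aol30", except on the User=nsadmin line.
fix_ini_paths() {
    sed '/^[[:space:]]*User=nsadmin/!s/nsadmin/aol30/g'
}

# Example use (writes a new file instead of editing in place):
#   fix_ini_paths < keepalive.ini.orig > /home/aol30/keepalive.ini
```

Review the result before starting the server, since the filter rewrites every other occurrence of nsadmin without checking what it is part of.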

Uptime

Uptime original docs

Uptime will make sure that your web server is up and running by checking it at designated intervals and performing the specified actions on it.

Sign up for Uptime
If the machine on which your service runs is down, the keepalive service on that machine will be down as well. Uptime resides on a separate server and sends alerts when your server cannot be reached. You should use the forms at Uptime to register alerts to the following:

You should break your monitoring page to make sure Uptime sends an alert. Then return the page to normal.

Watchdog

Watchdog original docs

Watchdog checks your error logs at designated intervals and emails any errors it finds to the addresses you specify.

  1. Grab the tarfile from ArsDigita Download.
  2. Untar it into /web/watchdog.
  3. Change the ownership of the directory to nsadmin:
    chown -R nsadmin.nsadmin /web/watchdog
  4. Grab the ini file from www.arsdigita.com/install/watchdog.ini and put it in /home/aol30.
  5. Modify the ini file.
  6. Make the server directory /home/aol30/servers/watchdog (as for keepalive).
  7. Insert watchdog into /etc/inittab:
    nswd:34:respawn:/home/aol30/bin/nsd -ic /home/aol30/watchdog.ini
  8. Go to http://yourserver:1998/ to add your server to the list.
  9. Create some Tcl errors and make sure email is sent. The email goes to the administrator unless another address is specified in /web/yourserver/parameters/yourserver.ini under [ns/server/emp530/monitoring].
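
Before relying on respawn, it can help to confirm that the inittab entry really made it into the file. The sketch below assumes the entry ids used in this document (nska, nswd, nsce); inittab_has_id is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds if the inittab-style file ($1)
# contains an entry whose id field is exactly $2.
inittab_has_id() {
    grep -q "^$2:" "$1"
}

# Example use:
#   inittab_has_id /etc/inittab nswd && echo "watchdog entry present"
# After editing /etc/inittab, "init q" makes init re-read the file.
```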

Cassandrix

Cassandrix original docs

Cassandrix makes sure that you have enough disk space on your hard drive. If space starts to run out, it sends email alerts.

  1. Grab the tarfile from ArsDigita Download.
  2. Untar it somewhere.
  3. Change the ownership of the directory to yourself:
    chown -R yourself.yourself whatever
  4. Target machines:
    1. Copy the files in the Cassandrix SYSTEM directory into the /web/yourservername/www/SYSTEM directory.
  5. Master machines:
    1. Copy the files in the Cassandrix tcl directory to the server's private Tcl library. Currently, there's only cx-defs.tcl.
    2. Copy the Cassandrix directory into /web/yourservername/www/cassandrix.
    3. Make sure ADP pages are enabled. In your nsd.tcl or nsd.ini in /home/aol30, make sure you have this:
      [ns/server/markd/adp]
      Map=/*.adp
    4. Feed cassandrix.sql into Oracle:
      sqlplus orauser/orapassword < cassandrix.sql
    5. Restart your aolserver:
      restart-aolserver
    6. Go to http://yourservername/cassandrix/index.adp and tell it which machines to monitor:
      • Host Name : the name of the host to be monitored. This is only used to label links on the various pages, and doesn't have to be a fully-qualified domain name.
      • base URL : the base URL from which to construct the /SYSTEM/* URLs which gener[...]l through pager gateway.
      • custom email subject : It's best to make this a generic subject since (if supplied) it will be used as the subject for all alerts, including the "everything's OK" alert.
      • custom email body : specialized email body to use on outgoing mail. Like the custom email subject, this is used for all alerts. The filesystems that are full are appended to the body.
      • notification interval : how often to send mail complaining that disks are full. It doesn't make much sense to set this to be less than the monitor interval.
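
To get a feel for what a "disk is full" alert means on a target machine, you can run a similar check by hand. This is a rough sketch of that kind of check, not Cassandrix's own code: full_filesystems is a hypothetical helper, the 90 percent default is an arbitrary assumption, and mount points containing spaces are not handled.

```shell
# Hypothetical helper (not Cassandrix code).
# Reads "df -P" output on stdin and prints "mountpoint percent" for
# every filesystem whose usage is at or above the threshold in $1
# (default 90).
full_filesystems() {
    awk -v limit="${1:-90}" 'NR > 1 {
        pct = $5              # capacity column, e.g. "95%"
        sub(/%/, "", pct)     # strip the percent sign
        if (pct + 0 >= limit + 0) print $6, pct
    }'
}

# Example use:
#   df -P | full_filesystems 90
```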

Cassandracle

Cassandracle original docs

Cassandracle monitors an Oracle installation. For this monitor we want to use a more restricted Oracle driver, namely /home/aol30/ora8cass.so, which was created when you installed the drivers. If it doesn't exist, go to the ArsDigita Oracle driver installation.

  1. Grab the tarfile from ArsDigita Download.
  2. Untar it into /web/ce.
  3. Copy /web/yourservername/tcl/ad-utilities.tcl.preload into /web/ce/tcl/ad-utilities.tcl.preload and /web/yourservername/tcl/00-ad-preload.tcl into /web/ce/tcl/00-ad-preload.tcl. This causes a few error messages in your error log, because some of the preload files in your server installation are not found, but they are harmless.
  4. Change users to oracle, using that user's environment:
    su - orauser
    where orauser is your Oracle user.
  5. Run the following at the prompt:
    svrmgrl
       connect internal
       create user cassandracle identified by *password* default tablespace yourtablespace temporary tablespace temp quota unlimited on yourtablespace;
       grant connect, resource, dba to cassandracle;
       grant select on V_$SQLTEXT to public;
       exit
  6. Run the following at the prompt:
    svrmgrl
       connect internal
       grant select on V_$SQLTEXT to public;
       exit
  7. Run the procedures in /web/ce/doc/helper-procedures.sql:
    sqlplus orauser/orapassword < /web/ce/doc/helper-procedures.sql
  8. Exit the oracle user's shell.
  9. su - nsadmin
  10. Make the ini file:
    cp /home/aol30/yourserver.ini /home/aol30/ce.ini
  11. Edit ce.ini.
  12. Make the server directory /home/aol30/servers/ce (copy yourserver's server directory).
  13. Insert into /etc/inittab:
    nsce:34:respawn:/home/aol30/bin/nsd-oracle -ic /home/aol30/ce.ini
  14. Type init q so that init re-reads /etc/inittab and starts the server, then go to http://yourserver:1999

MTA (Mail Transport Agent) Monitor

MTA original docs

This monitors a group of mail transport agents administered by one or more administrators. It connects to each SMTP port every five minutes, and also tries to send a small test mail every 15 minutes. If either fails, it sends email to the appropriate addresses.

  1. Grab the tarfile from ArsDigita Download.
  2. Make a directory /web/mmon (accessible by nsadmin) and untar the file in that directory. The tarfile creates www, parameters, and tcl directories.
  3. Create the AOLserver install:
  4. Feed the data model into Oracle. You can either run
    sqlplus orauser/orapassword < /web/mmon/www/doc/sql/mmon.sql
    or visit http://yourserver:8888/mmon/data-model.tcl (keep your eyes on the error log to make sure it worked). If you have problems, then you can run http://yourserver:8888/mmon/drop-everything-user-with-care.tcl
  5. Edit bouncer.pl and receiver.pl in /web/mmon/www/mmon/. Fix the server's hostname or IP address, and check whether the Perl executable is in /usr/bin or in /usr/local/bin.
  6. Within your RedHat install, you should have sendmail.
  7. Create a special email account (usually an alias) on every monitored server which calls bouncer.pl. You'll enter this alias when you set up a server to be monitored. The default name is mmon_bouncer.
  8. Create a special email account on the monitoring server. That account should be configured to spawn receiver.pl. For example, if you are using qmail, you can create a UNIX user and put in its home directory a file called .qmail (note the leading dot) with a single line:
    | /path-to/receiver.pl
    With Sendmail you would add a line to /etc/aliases:
    mmon-receiver: |/path-to/receiver.pl
  9. Copy /web/yourservername/tcl/ad-utilities.tcl.preload into /web/mmon/tcl/ad-utilities.tcl.preload and /web/yourservername/tcl/00-ad-preload.tcl into /web/mmon/tcl/00-ad-preload.tcl. This causes a few error messages in your error log, because some of the preload files in your server installation are not found, but they are harmless.
  10. Edit /web/mmon/parameters/mmon.ini. For testing, you may want to set MinNotificationInterval, MinutesBetweenSMTPChecks and BounceTimeout to lower values to make sure that they work.
  11. edit /web/mmon/tcl/mmon-defs.tcl *****WHAT DO WE CHANGE HERE???******
  12. Edit /web/mmon/parameters/mmon.ini and change the email addresses.
  13. Restart AOLserver.
  14. Visit http://yourserver:8888/mmon/server-add.tcl and add the servers you'd like to have monitored.
  15. Watch the server log to see whether the MTA Monitor wakes up at the specified interval. Reload http://yourserver:8888/mmon/controlpanel.tcl frequently to see what's going on.
  16. Simulate a problem with an MTA (e.g. change the SMTP port to a nonstandard value, or change the bouncer email address to your own address) and make sure it gets reported.
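
When a simulated problem is not reported, it is useful to check by hand what the SMTP port answers. A working MTA greets a new connection with a line starting with reply code 220. The sketch below only tests such a greeting; smtp_banner_ok is a hypothetical helper and is not part of the MTA Monitor.

```shell
# Hypothetical helper (not part of the MTA Monitor).
# Reads an SMTP greeting on stdin and succeeds only if the first line
# starts with reply code 220 ("service ready").
smtp_banner_ok() {
    banner=""
    IFS= read -r banner || :    # keep a partial line if there is no newline
    case "$banner" in
        220*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Example use, assuming a netcat whose -w option sets a timeout in
# seconds (mail.example.com is a placeholder):
#   nc -w 5 mail.example.com 25 | smtp_banner_ok && echo "SMTP port answers"
```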




Appendix


Restarting AOL Server

We have a script, /home/aol30/bin/restart-aolserver, which is needed by keepalive and some other things.
  

    #!/usr/local/bin/perl

    ## Restarts an AOLserver. Takes as its only argument the name of the server to kill.

    ## This is a perl script because it needs to run setuid root, 
    ## and perl has fewer security gotchas than most shells.


    $ENV{'PATH'} = '/sbin:/bin';

    # uncomment this stuff if you're at an installation where a server 
    # takes a long time to restart or keeps important state

    # if (scalar(@ARGV) == 0) {
    #     die "Don't run this without any arguments!";
    # }

    $server = shift;

    $< = $>; # set realuid to effective uid (root)

    sub getpids {
        ## get the PIDs of all nsd processes started with $server.ini
        my $ps_output = `/usr/bin/ps -ef`;
        my @pids;
        foreach (split(/\n/, $ps_output)) {
            ## \Q...\E and \. make the server name and the dot match literally
            next unless /^\s*\S+\s+(\d+).*nsd.*\Q$server\E\.ini/;
            push(@pids, $1);
        }
        @pids;
    }

    @pids = &getpids;
    print "Killing ", join(" ", @pids), "\n";
    kill 'KILL', @pids;
   

Make sure that the path to ps is correct in the line that says:

    my $ps_output = `/usr/bin/ps -ef`;

You might want to make it

    `/usr/ps -ef`

or wherever your ps is. If you are unsure, you can find out by typing

    which ps

at the prompt. Also, on some systems it might be better to use the -ewf option rather than -ef, to make sure that ps doesn't truncate the text.
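
To check what the script would kill before running it, you can apply the same match from the shell. This is a sketch only: nsd_pids is a hypothetical helper that mirrors the script's pattern (second column of ps output, for lines containing nsd followed by the server's ini file), and it assumes your ps prints the PID in the second column.

```shell
# Hypothetical helper mirroring the match in restart-aolserver.
# Reads "ps -ef" style output on stdin and prints the PID (second
# column) of every nsd process started with <server>.ini.
nsd_pids() {
    awk -v server="$1" '$0 ~ ("nsd.*" server "\\.ini") { print $2 }'
}

# Example use:
#   ps -ewf | nsd_pids keepalive
```

If this prints nothing for a server you know is running, the ps options or path probably need adjusting as described above.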