Installing Monitors


Installing all of the standard monitors for your system involves a lot of configuration: a small mistake can take hours to fix, and the individual monitors' documentation doesn't always help. We have therefore tried to simplify the instructions and lay out, step by step, everything needed for a completely monitored system (and a happy sysadmin).

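Table of Contents

  1. Keepalive
  2. Uptime
  3. Watchdog
  4. Cassandrix
  5. Cassandracle
  6. MTA (Mail Transport Agent) Monitor
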
Keepalive

Keepalive original docs

Keepalive periodically checks that your server can be reached. If it can't, Keepalive performs the actions you specify, such as sending email or restarting the server.

  1. Grab the tar file from the ArsDigita Download page and save it into /tmp.
  2. Log in as nsadmin and untar it into /web, so that it ends up in /web/keepalive:
  3. $ su - nsadmin
    $ cd /tmp
    $ tar -xzvf keepalive_tar.tgz --directory=/web
  4. Make nsadmin the owner of the directory (this requires root):
  5. $ su -
    ; enter the root password
    # chown -R nsadmin:nsadmin /web/keepalive
  6. Use a text editor to create /home/aol30/keepalive.ini, or start from the sample keepalive.ini. Set the correct address and hostname under [ns/server/keepalive/module/nssock]. Make sure you have at least the following:
  7. [ns/server/keepalive/modules]
    nslog=nslog.so
    nssock=nssock.so
    nsperm=nsperm.so
  8. Make sure you have a restart-aolserver script in /usr/local/bin. If you don't, you can get it from the ACS documentation.
  9. Edit /web/keepalive/tcl/defs.tcl. Make sure that keepalive_email returns a valid email address for error logs to be sent to.
  10. Edit /web/keepalive/tcl/init.tcl and update the keepalive_init procedure. In init.tcl, add monitors in the same way as the sample. The arguments are, in order:
  11. Here is a sample:
    lappend keepalive_monitor_list [new_monitor "service_name" "http://127.0.0.1/SYSTEM/dbtest.tcl" "success" "restart-aolserver service_name"  [list email_1@arsdigita.com email_2@arsdigita.com] [list your_pager@arsdigita.com] 4 2]
    
  12. Copy /web/service_name/tcl/ad-utilities.tcl.preload into /web/keepalive/tcl/ad-utilities.tcl.preload and /web/service_name/tcl/00-ad-preload.tcl into /web/keepalive/tcl/00-ad-preload.tcl. This produces a few error messages in your error log, because keepalive can't find some of the other preload files that exist in your main server installation; these errors are harmless.
  13. $ cp /web/service_name/tcl/ad-utilities.tcl.preload /web/keepalive/tcl/                        
    $ cp /web/service_name/tcl/00-ad-preload.tcl /web/keepalive/tcl/
    
  14. Now, you can start the process by typing
  15. /home/aol30/bin/nsd -c /home/aol30/keepalive.ini
  16. To test the service, visit your website and go to /SYSTEM/dbtest.tcl. You should see the text message "success". If you don't, either you are accessing the wrong URL or your system is not working at all.
  17. Visit your website again, this time at port 1997. You should see the Keepalive status page. Keepalive periodically queries your system's dbtest.tcl page, which returns "success" if and only if there is a working database connection. Whenever Keepalive does not receive the success message, it decrements the counter. When the counter reaches the notify threshold, you get an email; when it reaches 0, the system is restarted and the process begins again.
  18. To verify that this is happening, temporarily break the test page by renaming it:
  19. $ mv /web/service_name/www/SYSTEM/dbtest.tcl /web/service_name/www/SYSTEM/dbtest.tcl.old
    Now periodically reload the Keepalive page and watch the counters decrement. You can also watch the error log file (press Ctrl-C to stop):
    $ tail -f /home/aol30/log/keepalive-error.log
    Make sure that you receive e-mail. In a separate window, keep tabs on your primary service.
    $ tail -f /home/aol30/log/service_name-error.log
    After the counter reaches 0, your primary service should restart. If it doesn't, make sure that restart-aolserver still works and that the monitor definition in /web/keepalive/tcl/init.tcl is correct.

    If the system restarts, restore the dbtest.tcl file.

    $ mv /web/service_name/www/SYSTEM/dbtest.tcl.old /web/service_name/www/SYSTEM/dbtest.tcl
    Make sure that keepalive is now functioning again.
  20. Insert keepalive into /etc/inittab to make sure it starts automatically and restarts if it is ever killed.
  21. $ su -
    # emacs -nw /etc/inittab
    Add this line:
    nska:345:respawn:/home/aol30/bin/nsd -ic /home/aol30/keepalive.ini
    Kill keepalive and re-initialize inittab.
    # restart-aolserver keepalive
    # init q
    Make sure that both keepalive and your primary service are still running. You're done.
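The counter behavior described in step 17 can be modeled in a few lines. The Python sketch below is a hypothetical model, not Keepalive's Tcl code: the names `MonitorCounter`, `start` and `notify_at`, and the rule that a good response resets the counter, are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MonitorCounter:
    """Hypothetical model of Keepalive's per-monitor failure counter."""

    start: int = 4       # assumed initial counter value
    notify_at: int = 2   # assumed notify threshold
    counter: int = field(init=False, default=0)

    def __post_init__(self) -> None:
        self.counter = self.start

    def record(self, page_body: str, expected: str = "success") -> list[str]:
        """Process one check of the test page; return the actions it triggers."""
        if expected in page_body:
            # Assumption: a good response resets the counter.
            self.counter = self.start
            return []
        self.counter -= 1
        actions = []
        if self.counter == self.notify_at:
            actions.append("email")
        if self.counter <= 0:
            actions.append("restart")
            self.counter = self.start  # after a restart the process begins again
        return actions
```

With `start=4` and `notify_at=2`, four failed checks in a row return `[[], ['email'], [], ['restart']]`: an email when the counter drops to 2 and a restart when it reaches 0.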

Uptime

Uptime original docs

Uptime makes sure that your web server is up and running by checking it at designated intervals and alerting you when it fails.

Sign up for Uptime
If the machine on which your service runs goes down, the keepalive service on that machine goes down with it. Uptime runs on a separate server and sends alerts when your server cannot be reached. Use the forms at Uptime to register alerts for the following:

Temporarily break your monitoring page to make sure Uptime sends an alert, then restore the page.
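Uptime is a hosted service, so there is nothing to install. To confirm from a second machine that your monitoring page really fails while it is broken and passes again once restored, a small standard-library Python check like the one below can help. It is a sketch: the expected text "success" assumes you registered the same /SYSTEM/dbtest.tcl page used for Keepalive.

```python
import http.client
import urllib.error
import urllib.request


def page_ok(url: str, expected: str = "success", timeout: float = 10.0) -> bool:
    """Return True if url answers with HTTP 200 and its body contains expected."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            if resp.status != 200:
                return False
            body = resp.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Covers refused connections, DNS failures, timeouts and HTTP error codes.
        return False
    return expected in body
```

For example, `page_ok("http://yourserver/SYSTEM/dbtest.tcl")` (with yourserver as a placeholder) should return True normally and False while the page is broken.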

Watchdog

Watchdog original docs

Watchdog checks your error logs at designated intervals and emails any errors it finds to the specified recipients.
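Watchdog's basic loop, scanning the error log periodically and reporting only what is new, can be sketched as below. This is a hypothetical Python illustration, not Watchdog's own code: it remembers the byte offset reached on the previous scan, starts over if the log has shrunk (for example after rotation), and treats any line containing the assumed marker "Error:" as an error.

```python
import os


def new_error_lines(log_path: str, offset: int, marker: str = "Error:") -> tuple[list[str], int]:
    """Return (error lines written since offset, offset to use on the next scan)."""
    if os.path.getsize(log_path) < offset:
        offset = 0  # the log shrank, e.g. it was rotated: start from the top
    with open(log_path, "rb") as log:
        log.seek(offset)
        data = log.read()
    # Consume only complete lines; a half-written last line waits for the next scan.
    end = data.rfind(b"\n") + 1
    text = data[:end].decode("utf-8", errors="replace")
    errors = [line for line in text.splitlines() if marker in line]
    return errors, offset + end
```

A monitor built this way would mail the returned lines to the maintainer and store the new offset for its next run.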

  1. Grab the tar file from the ArsDigita Download page.
  2. Untar it into /web/watchdog (substitute the name of the file you downloaded for watchdog_tar.tgz):
  3. $ tar -xzvf watchdog_tar.tgz --directory=/web
  4. Change the ownership of the directory to nsadmin (as root):
  5. # chown -R nsadmin:nsadmin /web/watchdog
  6. Grab the ini file from www.arsdigita.com/install/watchdog.ini and put it in /home/aol30.
  7. Modify the ini file for your server.
  8. Edit /web/watchdog/tcl/defs.tcl so that the watchdog_maintainer_email proc returns the site maintainer's e-mail address.
  9. Copy ad-utilities.tcl.preload, 00-ad-preload.tcl, ad-defs.tcl, and ad-aolserver-3.tcl into /web/watchdog/tcl
  10. $ cp /web/service_name/tcl/ad-utilities.tcl.preload /web/watchdog/tcl/
    $ cp /web/service_name/tcl/00-ad-preload.tcl /web/watchdog/tcl/
    $ cp /web/service_name/tcl/ad-defs.tcl /web/watchdog/tcl/
    $ cp /web/service_name/tcl/ad-aolserver-3.tcl /web/watchdog/tcl/
    
  11. Add this line to /etc/inittab (as root) so that watchdog starts automatically, then run init q to load it:
  12. nswd:345:respawn:/home/aol30/bin/nsd -ic /home/aol30/watchdog.ini
  13. Go to http://yourserver:1998/ to add your server to the list.
  14. Create some Tcl errors and make sure email is sent. The email goes to the administrator unless another address is specified in the /web/yourserver/parameters/yourserver.ini file under [ns/server/yourserver/monitoring].

Cassandrix

Cassandrix original docs

Cassandrix makes sure that you have enough disk space on your hard drive. If space starts to run out, it sends email alerts.
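The basic check behind a disk-space monitor, comparing free space against a threshold, can be sketched in Python with `shutil.disk_usage`. The function names and the 10% default threshold are illustrative assumptions, not Cassandrix settings.

```python
import shutil


def is_low(free: int, total: int, min_free_fraction: float = 0.10) -> bool:
    """True if less than min_free_fraction of the filesystem is free (False if total is 0)."""
    return total > 0 and free / total < min_free_fraction


def check_paths(paths: list[str], min_free_fraction: float = 0.10) -> list[str]:
    """Return one alert line for each path whose filesystem is running low."""
    alerts = []
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        if is_low(usage.free, usage.total, min_free_fraction):
            alerts.append(
                f"{path}: only {usage.free // 2**20} MB free of {usage.total // 2**20} MB"
            )
    return alerts
```

An alerting job would call something like `check_paths(["/", "/web"])` on a schedule and email any lines it returns.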

  1. Grab the tar file from the ArsDigita Download page.
  2. Untar it somewhere (/tmp, for example):
  3. $ tar -xzvf cassandrix-1_0_tar.tgz --directory=/tmp
  4. Change the ownership of the directory to yourself:
  5. chown -R yourself:yourself /tmp/cassandrix
  6. Target machines (the machines being monitored) only:
    1. Copy the files in the Cassandrix SYSTEM directory into the /web/service_name/www/SYSTEM directory:
    2. $ cp /tmp/cassandrix/SYSTEM/* /web/service_name/www/SYSTEM/
  7. Master machines (the machines doing the monitoring) only:
    1. Copy the files in the Cassandrix tcl directory to the server's private Tcl library (currently this is only cx-defs.tcl):
    2. $ cp /tmp/cassandrix/tcl/* /web/service_name/tcl/
    3. Copy the Cassandrix directory into /web/service_name/www/cassandrix:
    4. $ mkdir /web/service_name/www/cassandrix
      $ cp /tmp/cassandrix/* /web/service_name/www/cassandrix
    5. Make sure ADP pages are enabled. In /home/aol30/service_name.ini, make sure you have this:
    6. [ns/server/service_name/adp]
      Map=/*.adp
    7. Load cassandrix.sql into Oracle:
    8. sqlplus orauser/orapassword < cassandrix.sql
    9. Restart your AOLserver:
    10. restart-aolserver service_name
    11. Go to http://yourserver/cassandrix/index.adp and tell it which machines to monitor.

Cassandracle

Cassandracle original docs

Cassandracle monitors an Oracle installation. For this monitor, we use a more restricted Oracle driver, /home/aol30/ora8cass.so, which was created when you installed the drivers. If it doesn't exist, see the ArsDigita Oracle driver installation instructions.

  1. Grab the tar file from the ArsDigita Download page.
  2. Untar it into /web/ce:
  3. $ tar -xzvf cassandracle-1_0_1_tar.tgz --directory=/web
    $ mv cassandracle ce
    
  4. Copy /web/service_name/tcl/ad-utilities.tcl.preload into /web/ce/tcl/ad-utilities.tcl.preload and /web/service_name/tcl/00-ad-preload.tcl into /web/ce/tcl/00-ad-preload.tcl:
  5. $ cp /web/service_name/tcl/ad-utilities.tcl.preload /web/ce/tcl/
    $ cp /web/service_name/tcl/00-ad-preload.tcl /web/ce/tcl/
    
  6. Switch to the Oracle user, using that user's login environment:
  7. $ su - orauser
    where orauser is your Oracle user.
  8. Run the following at the prompt:
  9. svrmgrl
       connect internal
       create user cassandracle identified by *password* default tablespace *yourtablespace* temporary tablespace temp quota unlimited on *yourtablespace*;
       grant connect, resource, dba to cassandracle;
       grant select on V_$SQLTEXT to public;
       exit
    Where *password* is any password you choose, and *yourtablespace* is the name of the tablespace used by the web server you're monitoring (without the *'s, of course)
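  10. Load the procedures in /web/ce/doc/helper-procedures.sql:
  11. sqlplus orauser/orapassword < /web/ce/doc/helper-procedures.sql
  12. Leave the Oracle user's shell and switch to nsadmin:
  13. $ su - nsadmin
  14. Make the ini file by copying your service's ini file:
  15. $ cp /home/aol30/service_name.ini /home/aol30/ce.ini
  16. Edit ce.ini for Cassandracle; in particular, it should use the restricted /home/aol30/ora8cass.so driver mentioned at the start of this section.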
  17. Add this line to /etc/inittab (as root):
  18. nsce:345:respawn:/home/aol30/bin/nsd-oracle -ic /home/aol30/ce.ini
  19. Run init q to load it, then go to http://yourserver:1999.
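Keepalive, Watchdog and Cassandracle are all started from /etc/inittab entries of the form id:runlevels:action:process, and a typo there only shows up after init q. The Python sketch below checks an entry before you load it. It is deliberately narrow and hypothetical: it accepts only the numeric runlevels 0-6, expects the respawn action used throughout this guide, and can optionally check that the program file exists.

```python
import os

# Deliberately strict: only the numeric runlevels 0-6 are accepted.
VALID_RUNLEVELS = set("0123456")


def check_inittab_line(line: str, check_program: bool = True) -> list[str]:
    """Return the problems found in one inittab entry (an empty list if none)."""
    # The process field may itself contain colons, so split at most 3 times.
    parts = line.strip().split(":", 3)
    if len(parts) != 4:
        return ["expected 4 colon-separated fields: id:runlevels:action:process"]
    ident, runlevels, action, process = parts
    problems = []
    if not 1 <= len(ident) <= 4:
        problems.append(f"id {ident!r} should be 1-4 characters")
    bad = set(runlevels) - VALID_RUNLEVELS
    if bad:
        problems.append("unexpected runlevel characters: " + "".join(sorted(bad)))
    if action != "respawn":
        problems.append(f"action {action!r} is not 'respawn'")
    words = process.split()
    if not words:
        problems.append("process field is empty")
    elif check_program and not os.path.isfile(words[0]):
        problems.append(f"program {words[0]} not found")
    return problems
```

For example, `check_inittab_line("nsce:345:respawn:/home/aol30/bin/nsd-oracle -ic /home/aol30/ce.ini")` should return an empty list on a machine where that nsd-oracle binary is installed.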

MTA (Mail Transport Agent) Monitor

MTA original docs

The MTA Monitor watches a group of mail transport agents administered by one or more administrators. Every five minutes it connects to each server's SMTP port, and every 15 minutes it also tries to send a short test message. If either check fails, it sends email to the appropriate addresses.
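The five-minute check boils down to opening a TCP connection to each server's SMTP port and reading the greeting, which SMTP (RFC 5321) specifies should start with reply code 220 when the server is ready. Here is a minimal Python sketch of that probe; it is an illustration, not the MTA Monitor's own Tcl and Perl code, and it stops after the greeting instead of sending a test message.

```python
import socket


def smtp_greeting_ok(host: str, port: int = 25, timeout: float = 10.0) -> bool:
    """Connect to host:port and return True if the server greets with reply code 220."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with sock.makefile("rb") as reader:
                greeting = reader.readline(512)
            try:
                sock.sendall(b"QUIT\r\n")  # polite goodbye; a failure here is harmless
            except OSError:
                pass
    except OSError:
        # Refused connection, unknown host, or timeout.
        return False
    return greeting.startswith(b"220")
```

Changing a monitored server's SMTP port to a nonstandard value, as suggested in the last step below, would make this probe return False.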

  1. Grab the tar file from the ArsDigita Download page.
  2. Make a directory /web/mmon that nsadmin can access, and untar the file in that directory (the tar file creates www, parameters, and tcl directories).
  3. Create the AOLserver install:
  4. Load the data model into Oracle. You can either run
  5. sqlplus orauser/orapassword < /web/mmon/www/doc/sql/mmon.sql
    or visit http://yourserver:8888/mmon/data-model.tcl (keep an eye on the error log to make sure it worked). If you have problems, you can then visit http://yourserver:8888/mmon/drop-everything-user-with-care.tcl
  6. Edit bouncer.pl and receiver.pl in /web/mmon/www/mmon/. Set the server's hostname or IP address, and make sure the path to the Perl executable (/usr/bin or /usr/local/bin) is correct.
  7. A Red Hat installation should already include sendmail.
  8. Create a special e-mail account (usually an alias) on every monitored server which calls bouncer.pl. You'll enter this alias when you set up a server to be monitored. The default name is mmon_bouncer.
  9. Create a special e-mail account on the monitoring server, configured to spawn receiver.pl. For example, if you are using qmail, you can create a UNIX user and put a file called .qmail (note the leading dot) in its home directory, containing a single line:
  10. | /path-to/receiver.pl
    With Sendmail you would add a line to /etc/aliases:
    mmon-receiver: |/path-to/receiver.pl
  11. Copy /web/service_name/tcl/ad-utilities.tcl.preload into /web/mmon/tcl/ad-utilities.tcl.preload and /web/service_name/tcl/00-ad-preload.tcl into /web/mmon/tcl/00-ad-preload.tcl. This produces a few error messages in your error log, because mmon can't find some of the other preload files that exist in your main server installation; these errors are harmless.
  12. Edit /web/mmon/parameters/mmon.ini. For testing, you may want to set MinNotificationInterval, MinutesBetweenSMTPChecks and BounceTimeout to lower values to make sure that they work.
  13. Edit /web/mmon/tcl/mmon-defs.tcl.
  14. Edit /web/mmon/parameters/mmon.ini and change the e-mail addresses.
  15. Restart AOLserver
  16. Visit http://yourserver:8888/mmon/server-add.tcl and add the servers you'd like to monitor.
  17. Watch the server log to see whether the MTA Monitor wakes up at the specified interval, and reload http://yourserver:8888/mmon/controlpanel.tcl frequently to see what's going on.
  18. Simulate a problem with an MTA (for example, change the SMTP port to a nonstandard value, or change the bouncer e-mail address to your own address) and make sure it gets reported.
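Step 12 mentions three timing parameters in mmon.ini: MinutesBetweenSMTPChecks, BounceTimeout and MinNotificationInterval. Their exact semantics are defined by the MTA Monitor itself; the Python sketch below only illustrates one plausible reading, assuming all three are in minutes. Under that assumption, an SMTP check is due once MinutesBetweenSMTPChecks have passed, a test message counts as lost if it hasn't come back within BounceTimeout, and repeat notifications are held back until MinNotificationInterval has elapsed. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class MmonTimers:
    """Hypothetical reading of the mmon.ini timing parameters, all assumed to be minutes."""

    minutes_between_smtp_checks: int
    bounce_timeout: int
    min_notification_interval: int

    def smtp_check_due(self, last_check: Optional[datetime], now: datetime) -> bool:
        """An SMTP check is due if none has run yet or enough time has passed."""
        if last_check is None:
            return True
        return now - last_check >= timedelta(minutes=self.minutes_between_smtp_checks)

    def bounce_lost(self, sent_at: datetime, returned: bool, now: datetime) -> bool:
        """A test message is lost if it hasn't come back within the bounce timeout."""
        return not returned and now - sent_at > timedelta(minutes=self.bounce_timeout)

    def may_notify(self, last_notice: Optional[datetime], now: datetime) -> bool:
        """Hold back repeat notifications sent closer together than the minimum interval."""
        if last_notice is None:
            return True
        return now - last_notice >= timedelta(minutes=self.min_notification_interval)
```

Seen this way, lowering the values for testing (say `MmonTimers(1, 2, 1)`) simply makes each of these conditions trigger sooner, so you can watch checks, lost bounces and notifications happen within a few minutes.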