Installing Monitors
Installing all of the standard monitors for your system can be a
configuration hassle. A small mistake can take hours to fix, and overall
the process is not very friendly to the user. We have therefore tried to
simplify the documents and show, step by step, what is needed for a
completely monitored system and a happy sysadmin.
Keepalive
Keepalive original docs
Keepalive regularly checks that your server can be reached. If it
can't, it performs the actions you specify.
-
Grab the tar file from the ArsDigita Download page. Save it into /tmp.
-
Log in as nsadmin and untar it into /web, so that the files end up in /web/keepalive.
$ su - nsadmin
$ cd /tmp
$ tar -xzvf keepalive_tar.tgz --directory=/web
-
Change the ownership of the directory to nsadmin.
$ su - nsadmin
; enter nsadmin password
$ chown -R nsadmin.nsadmin /web/keepalive
-
Use a text editor to create /home/aol30/keepalive.ini, or grab it from keepalive.ini.
Set the correct address and hostname under [ns/server/keepalive/module/nssock].
Make sure you have at least the following:
[ns/server/keepalive/modules]
nslog=nslog.so
nssock=nssock.so
nsperm=nsperm.so
-
Make sure you have a restart-aolserver script in /usr/local/bin.
If you don't have it, you can find it in the ACS documentation.
-
Edit /web/keepalive/tcl/defs.tcl. Make sure that keepalive_email
returns a valid email address for error logs to be sent to.
-
Edit /web/keepalive/tcl/init.tcl and update the keepalive_init
procedure. In init.tcl, add monitors in the same way as the sample.
The arguments are, in order:
-
name
-
URL of test page
-
expected return
-
shell command to execute if failure: (restart-aolserver yourservername)
-
Tcl list of admin email addresses to notify
-
Tcl list of pager email addresses to notify
-
(optional) number of retries before failure action is executed. This defaults
to 5.
-
(optional) threshold of retries below which email is sent. This defaults
to the number of retries, meaning that Keepalive will send mail if there
is any problem. If you feel that you're getting spammed about problems
that work themselves out, set this to some lower number; we find that 4
retries and a threshold of 2 are good numbers.
Here is a sample:
lappend keepalive_monitor_list [new_monitor "service_name" "http://127.0.0.1/SYSTEM/dbtest.tcl" "success" "restart-aolserver service_name" [list email_1@arsdigita.com email_2@arsdigita.com] [list your_pager@arsdigita.com] 4 2]
-
Copy /web/service_name/tcl/ad-utilities.tcl.preload into /web/keepalive/tcl/ad-utilities.tcl.preload
and /web/service_name/tcl/00-ad-preload.tcl into /web/keepalive/tcl/00-ad-preload.tcl.
Keepalive will write a few error messages to your error log because it
can't find some of the other preload files that are in your server
installation, but these messages are harmless.
$ cp /web/service_name/tcl/ad-utilities.tcl.preload /web/keepalive/tcl/
$ cp /web/service_name/tcl/00-ad-preload.tcl /web/keepalive/tcl/
-
Now, you can start the process by typing
/home/aol30/bin/nsd -c /home/aol30/keepalive.ini
-
To test the service, visit your website and go to /SYSTEM/dbtest.tcl.
You should see a text message: success. If not, either you are
accessing the wrong URL or your system is not working at all.
-
Visit your website again, this time at port 1997. You should see a page
that looks like this.
The keepalive system will periodically query your system's dbtest.tcl
page. This page returns success if and only if there is a working database
connection. If keepalive does not receive the success message,
it decrements the counter. When the counter reaches the notify threshold,
you will get an email. When the counter reaches 0, the system will be restarted
and the process will begin again.
-
To ensure this is happening, issue the following command:
$ mv /web/service_name/www/SYSTEM/dbtest.tcl /web/service_name/www/SYSTEM/dbtest.old
Now periodically reload the Keepalive page and watch the counters decrement.
You can also watch the error log file. (Hit CTRL-C to stop).
$ tail -f /home/aol30/log/keepalive-error.log
Make sure that you receive e-mail. In a separate window, keep tabs on your
primary service.
$ tail -f /home/aol30/log/service_name-error.log
After the counter reaches 0, your primary service should restart. If it
doesn't, make sure that restart-aolserver still works and that
the monitor definition in /web/keepalive/tcl/init.tcl is correct.
If the system restarts, restore the dbtest.tcl file.
$ mv /web/service_name/www/SYSTEM/dbtest.old /web/service_name/www/SYSTEM/dbtest.tcl
Make sure that keepalive is now functioning again.
-
Insert keepalive into /etc/inittab to make sure it starts automatically
and restarts if it is ever killed.
$ su -
# emacs -nw /etc/inittab
Add this line:
nska:345:respawn:/home/aol30/bin/nsd -ic /home/aol30/keepalive.ini
Kill keepalive and re-initialize inittab.
# restart-aolserver keepalive
# init q
Make sure that both keepalive and your primary service are still running.
You're done.
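If the retry counter and the notify threshold are still fuzzy, here is a small shell model of the behavior described above. It is only a sketch, not keepalive's own code: the function name simulate_keepalive is made up for this guide, and two details are assumptions (a successful check resets the counter, and mail goes out on every failed check while the counter is at or below the threshold).

```shell
# Model of the counter behaviour described above; NOT keepalive's actual code.
# simulate_keepalive RETRIES THRESHOLD BODY...
#   RETRIES   - starting counter value (7th argument to new_monitor)
#   THRESHOLD - counter value at or below which mail is sent (8th argument)
#   BODY...   - one page body per polling round, as returned by dbtest.tcl
simulate_keepalive() (
    retries=$1
    threshold=$2
    shift 2
    counter=$retries
    for body in "$@"; do
        if [ "$body" = "success" ]; then
            # Assumption: a successful check resets the counter.
            counter=$retries
            echo "ok counter=$counter"
            continue
        fi
        counter=$((counter - 1))
        if [ "$counter" -le 0 ]; then
            # This is where the failure command (restart-aolserver) would run.
            echo "restart"
            counter=$retries
        elif [ "$counter" -le "$threshold" ]; then
            echo "mail counter=$counter"
        else
            echo "fail counter=$counter"
        fi
    done
)
```

With the sample values from init.tcl, simulate_keepalive 4 2 error error error error goes quiet on the first failure, reports mail on the second and third, and reports a restart on the fourth.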
Uptime
Uptime original docs
Uptime will make sure that your web server is up and running by checking
it at designated intervals and performing the specified actions on it.
Sign up for Uptime
If the machine on which your service runs is down, the keepalive service
on your machine will be down as well. Uptime resides on a separate server
and sends alerts when your server cannot be reached. You should use the
forms at Uptime to register alerts for the following:
-
All the people involved with your service
You should break your monitoring page to make sure Uptime sends an alert.
Then return the page to normal.
Watchdog
Watchdog original docs
Watchdog checks your error logs at designated intervals and emails any
errors it finds to the addresses you specify.
-
Grab the tarfile from the ArsDigita Download page.
-
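Untar it into /web/watchdog. The file name below is only an example; use the name of the Watchdog tarball you actually downloaded.
$ tar -xzvf watchdog_tar.tgz --directory=/web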
-
Change the ownership of the directory to nsadmin.
$ chown -R nsadmin.nsadmin /web/watchdog
-
grab the ini file from www.arsdigita.com/install/watchdog.ini
and put it in /home/aol30
-
modify the ini file
-
Change the ownership of the file.
chown nsadmin.nsadmin watchdog.ini
-
Inside the file: change server directory from /home/nsadmin to the correct directory (usually /home/aol30)
-
Set the correct address and hostname under [ns/server/watchdog/module/nssock]
-
Remove the ssl section (same as in keepalive) if you don't have it
-
Edit /web/watchdog/tcl/defs.tcl so that the watchdog_maintainer_email proc returns the correct e-mail of the site maintainer
-
Copy ad-utilities.tcl.preload, 00-ad-preload.tcl, ad-defs.tcl, and ad-aolserver-3.tcl into /web/watchdog/tcl
$ cp /web/service_name/tcl/ad-utilities.tcl.preload /web/watchdog/tcl/
$ cp /web/service_name/tcl/00-ad-preload.tcl /web/watchdog/tcl/
$ cp /web/service_name/tcl/ad-defs.tcl /web/watchdog/tcl/
$ cp /web/service_name/tcl/ad-aolserver-3.tcl /web/watchdog/tcl/
-
Insert watchdog into the /etc/inittab
nswd:345:respawn:/home/aol30/bin/nsd -ic /home/aol30/watchdog.ini
-
Go to http://yourserver:1998/ to add your server to the list.
-
Create some Tcl errors and make sure email is sent. The email goes to the
administrator unless another address is specified in the
/web/yourserver/parameters/yourserver.ini file under [ns/server/yourserver/monitoring]
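When you test Watchdog, it helps to know which lines in the log you expect it to report. The sketch below is not Watchdog's code; it just pulls out lines marked Error: from a log file, optionally starting at a given line so you can look only at what was added since your last check. The function name extract_errors is made up for this guide, and matching on the string Error: is an assumption about how your AOLserver error log marks errors.

```shell
# extract_errors LOGFILE [FROM_LINE]
# Print the lines of LOGFILE that contain "Error:", starting at FROM_LINE
# (default: line 1). Illustration only; Watchdog does its own scanning.
extract_errors() (
    logfile=$1
    from=${2:-1}
    # tail -n +N starts output at line N of the file.
    tail -n "+$from" "$logfile" | grep 'Error:'
)
```

For example, extract_errors /home/aol30/log/service_name-error.log 500 would list only the errors logged from line 500 onward.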
Cassandrix
Cassandrix original docs
Cassandrix makes sure that you have enough disk space on your hard drive.
If space starts to run out, it sends email alerts.
-
Grab the tarfile from the ArsDigita Download page.
-
Untar it somewhere (/tmp, for example).
$ tar -xzvf cassandrix-1_0_tar.tgz --directory=/tmp
-
Change the ownership of the directory to yourself.
$ chown -R yourself.yourself /tmp/cassandrix
-
Target machines only:
-
copy the files in the Cassandrix SYSTEM directory into /web/service_name/www/SYSTEM
directory.
$ cp /tmp/cassandrix/SYSTEM/* /web/service_name/www/SYSTEM/
-
Master Machines only:
-
Copy the files in the Cassandrix tcl directory to the server's private Tcl library.
Currently, there's only cx-defs.tcl.
$ cp /tmp/cassandrix/tcl/* /web/service_name/tcl/
-
copy the Cassandrix directory into /web/service_name/www/cassandrix
$ mkdir /web/service_name/www/cassandrix
$ cp /tmp/cassandrix/* /web/service_name/www/cassandrix
-
Make sure ADP pages are enabled. In your service_name.ini in /home/aol30,
make sure you have this:
[ns/server/service_name/adp]
Map=/*.adp
-
feed cassandrix.sql into Oracle.
sqlplus orauser/orapassword < cassandrix.sql
-
Restart your aolserver:
restart-aolserver service_name
-
Go to http://service_name/cassandrix/index.adp and tell it which machines
to monitor.
-
Host Name : the name of the host to be monitored. This is just used for
putting a name with links on the various pages, and doesn't have to be
a fully-qualified domain name.
-
base URL : the base URL from which to construct the /SYSTEM/* URLs.
-
custom email subject : subject to use on outgoing mail, for instance when
alerts go through a pager gateway. It's best to make this a generic subject,
since (if supplied) it will be used as the subject for all alerts, including
the "everything's OK" alert.
-
custom email body : specialized email body to use on outgoing mail. Like
the custom email subject, this is used for all alerts. The filesystems that
are full are appended to the body.
-
notification interval : how often to send mail complaining that disks are
full. It doesn't make much sense to set this to be less than the monitor
interval.
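To get a feel for what a "disk is full" check looks like before you rely on the alerts, you can run one by hand. The sketch below is not Cassandrix's code (Cassandrix works through the pages you copied into /SYSTEM); it simply reads df -P output and lists filesystems at or above a percentage you choose. The function name full_filesystems and the 90 percent figure are made up for this example, and mount points containing spaces are not handled.

```shell
# full_filesystems LIMIT
# Read `df -P` output on stdin and print "MOUNTPOINT USE%" for every
# filesystem whose usage is at or above LIMIT percent.
full_filesystems() {
    awk -v limit="$1" 'NR > 1 {          # skip the df header line
        use = $5                          # 5th column is capacity, e.g. "95%"
        sub(/%$/, "", use)                # drop the trailing percent sign
        if (use + 0 >= limit + 0) print $6, $5
    }'
}
```

You would typically call it as df -P | full_filesystems 90.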
Cassandracle
Cassandracle original docs
Cassandracle monitors an Oracle installation. For this monitor, we want
to use a more restricted Oracle driver, namely /home/aol30/ora8cass.so,
which was created when you installed the drivers. If it doesn't exist,
see the ArsDigita Oracle driver installation.
-
Grab the tarfile from the ArsDigita Download page.
-
Untar it into /web and rename the resulting directory to /web/ce.
$ tar -xzvf cassandracle-1_0_1_tar.tgz --directory=/web
$ mv /web/cassandracle /web/ce
-
copy /web/service_name/tcl/ad-utilities.tcl.preload into /web/ce/tcl/ad-utilities.tcl.preload
and /web/service_name/tcl/00-ad-preload.tcl into /web/ce/tcl/00-ad-preload.tcl.
$ cp /web/service_name/tcl/ad-utilities.tcl.preload /web/ce/tcl/
$ cp /web/service_name/tcl/00-ad-preload.tcl /web/ce/tcl/
-
Switch to the Oracle user and use that user's environment:
$ su - orauser
where orauser is your Oracle user.
-
Run the following at prompt:
svrmgrl
connect internal
create user cassandracle identified by *password* default tablespace *yourtablespace* temporary tablespace temp quota unlimited on *yourtablespace*;
grant connect, resource, dba to cassandracle;
grant select on V_$SQLTEXT to public;
exit
Where *password* is any password you choose, and *yourtablespace* is the name of the tablespace used by the web server you're monitoring (without the *'s, of course)
-
run the procedures in /web/ce/doc/helper-procedures.sql
sqlplus orauser/orapassword < /web/ce/doc/helper-procedures.sql
-
Leave the Oracle user and switch to nsadmin:
$ su - nsadmin
-
make the ini file.
cp /home/aol30/service_name.ini /home/aol30/ce.ini
-
edit ce.ini
-
change ora8.so to ora8cass.so
-
change User=service_name to User=cassandracle and Password=service_password to Password=cassandracle_password (the password you chose earlier)
-
change all other instances of service_name to ce
-
delete the auxconfigdir line in [ns/parameters]
-
change the Pageroot to /web/ce in [ns/server/ce]
-
add the line Port=1999 in [ns/server/ce/module/nssock]
-
insert into /etc/inittab
nsce:345:respawn:/home/aol30/bin/nsd-oracle -ic /home/aol30/ce.ini
-
Type init q to load it, then go to http://yourserver:1999
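Most of the ce.ini edits listed above are plain search-and-replace, so you can draft them with sed and then finish by hand. This is only a sketch under several assumptions: the function name make_ce_ini is made up, the User= and Password= lines are assumed to start at the beginning of a line with no extra spaces, every Password= line in the file is replaced, and neither argument may contain characters that are special to sed (such as / or &). The Pageroot and Port changes are not attempted, so make those in an editor afterwards.

```shell
# make_ce_ini SERVICE_NAME CASSANDRACLE_PASSWORD
# Read the service's ini file on stdin and write a draft ce.ini on stdout.
# Pageroot and Port are NOT handled here; edit them by hand afterwards.
make_ce_ini() {
    sed -e 's/ora8\.so/ora8cass.so/' \
        -e "s/^User=$1\$/User=cassandracle/" \
        -e '/^[Aa]ux[Cc]onfig[Dd]ir/d' \
        -e "s/$1/ce/g" \
        -e "s/^Password=.*/Password=$2/"
}
```

A typical run would be make_ce_ini service_name yourpassword < /home/aol30/service_name.ini > /home/aol30/ce.ini, followed by a careful read of the result before you add it to inittab.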
MTA (Mail Transport Agent) Monitor
MTA original docs
This monitors a group of mail transport agents administered by one or
more administrators. It connects to each SMTP port every five minutes,
and also tries to send a small test mail every 15 minutes. If either
check fails, it sends email to the appropriate addresses.
-
Grab the tarfile from the ArsDigita Download page.
-
Make a directory /web/mmon (accessible by nsadmin) and untar the file in that directory
(the tarfile creates www, parameters, and tcl directories).
-
Create the AOLserver install:
-
Copy an ini file that uses ora8.so, for example /home/aol30/yourserver.ini,
to /home/aol30/mmon.ini. If you use
cp -p yourserver.ini mmon.ini
it will preserve your permissions as well.
-
change the appropriate directories and logfiles, make sure to remove the
line about ssl if you don't have it.
-
Set any free port (say 8888, if it's available) in [ns/server/mmon/module/nssock].
-
copy one of the directories in /home/aol30/servers/, name it mmon.
-
Add it to the inittab. Make sure you use nsd-oracle
-
Feed the data model into Oracle. You can either run
sqlplus orauser/orapassword < /web/mmon/www/doc/sql/mmon.sql
or visit http://yourserver:8888/mmon/data-model.tcl (keep your eyes on
the error log to make sure it worked). If you have problems, then you can
run http://yourserver:8888/mmon/drop-everything-user-with-care.tcl
-
Edit bouncer.pl and receiver.pl in /web/mmon/www/mmon/. Set the server's hostname
or IP address, and check whether the Perl executable is in /usr/bin
or in /usr/local/bin.
-
Within your RedHat install, you should have sendmail.
-
Create a special E-mail account (usually an alias) on every monitored server
which calls bouncer.pl. You'll enter this alias when you set up a server
to be monitored. The default name is mmon_bouncer.
-
Create a special E-mail account on the monitoring server. That account
should be configured to spawn receiver.pl. For example, if you are using
qmail you can create a UNIX user and put a file called .qmail (note the
leading dot) in that user's home directory, with a single line:
| /path-to/receiver.pl
With Sendmail you would add a line to /etc/aliases:
mmon-receiver: |/path-to/receiver.pl
-
copy /web/service_name/tcl/ad-utilities.tcl.preload into /web/mmon/tcl/ad-utilities.tcl.preload
and /web/service_name/tcl/00-ad-preload.tcl into /web/mmon/tcl/00-ad-preload.tcl.
It will create a few error messages in your error log because it doesn't
find some of the preload files that are in your server installation, but
it doesn't really matter.
-
Edit /web/mmon/parameters/mmon.ini. For testing, you may want to set
MinNotificationInterval, MinutesBetweenSMTPChecks, and BounceTimeout to
lower values to make sure that they work.
-
Edit /web/mmon/tcl/mmon-defs.tcl (what needs to change here is still an open question).
-
edit /web/mmon/parameters/mmon.ini: change the emails
-
Restart AOLserver
-
Visit http://yourserver:8888/mmon/server-add.tcl and add the required
servers you'd like to have monitored
-
Watch the server log to see whether the MTA Monitor wakes up at
the specified interval. Frequently reload http://yourserver:8888/mmon/controlpanel.tcl
to see what's going on.
-
Simulate a problem with an MTA (for example, change the SMTP port to a
nonstandard value, or change the bouncer E-mail address to your own
address) and make sure it gets reported.
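While you are simulating problems, it can help to check by hand what a healthy SMTP port looks like. A mail server that is ready to talk opens the conversation with a line starting with reply code 220. The sketch below only judges such a line; it is not how the MTA Monitor itself decides, and the function name smtp_greeting_ok is made up for this guide. How you fetch the first line from port 25 (telnet, nc, or something else) depends on what is installed on your machine, so that part is left out.

```shell
# smtp_greeting_ok BANNER
# Succeed when BANNER (the first line the mail server sends) starts with
# the SMTP "service ready" reply code 220. "220-" marks a multi-line greeting.
smtp_greeting_ok() {
    case $1 in
        "220 "*|"220-"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, smtp_greeting_ok "220 mail.yourserver.com ESMTP" succeeds, while a greeting that starts with 554 or an empty string fails.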