ArsDigita Archives
 
 
   
 

Monitoring

  • User directory: none
  • Admin directory: /admin/monitoring/
  • Procedures: /tcl/watchdog-defs, /tcl/cassandracle-defs
  • Binaries: /bin/aolserver-errors.pl

The Big Picture

The ArsDigita Community System has an integrated set of monitoring tools.

Parameters

Monitoring parameters are centralized in the monitoring section of the .ini file. Add a new PersontoNotify line for each person who should receive monitoring alerts.
[ns/server/yourservername/acs/monitoring]
; People to email for alerts
PersontoNotify=nerd1@yourservicename.com
;PersontoNotify=nerd2@yourservicename.com
; location of the watchdog perl script
WatchDogParser=/web/yourservicename/bin/aolserver-errors.pl
; watchdog frequency in minutes
WatchDogFrequency=15
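
As an illustration of how this section is laid out, the following Python sketch reads the monitoring section and collects every active PersontoNotify entry. It is not part of the ACS, which reads these values through AOLserver. The function name parse_monitoring_section is hypothetical, and the sketch assumes that a line starting with ";" is a comment, as in the sample above.

```python
def parse_monitoring_section(ini_text,
                             section="ns/server/yourservername/acs/monitoring"):
    """Return (people_to_notify, other_params) for one section of the .ini text.

    Assumptions for this sketch: ';' starts a comment line, section headers
    look like [name], and PersontoNotify may appear more than once.
    """
    people = []
    params = {}
    in_section = False
    for raw in ini_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and commented-out entries
        if line.startswith("[") and line.endswith("]"):
            in_section = (line[1:-1] == section)
            continue
        if not in_section or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key.lower() == "persontonotify":
            people.append(value)  # repeated key: keep every value
        else:
            params[key] = value
    return people, params


SAMPLE = """\
[ns/server/yourservername/acs/monitoring]
; People to email for alerts
PersontoNotify=nerd1@yourservicename.com
;PersontoNotify=nerd2@yourservicename.com
; watchdog frequency in minutes
WatchDogFrequency=15
"""

people, params = parse_monitoring_section(SAMPLE)
print(people)                       # only the uncommented address
print(params["WatchDogFrequency"])  # values are kept as strings
```

Because the second PersontoNotify line is commented out, only nerd1@yourservicename.com is collected. Removing the leading ";" would add the second recipient.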

Current page requests - monitor

The "current page request" section (linked from /admin/monitoring/) will produce a report like the following.

There are a total of 8 requests being served right now (to 8 distinct IP addresses). Note that this number seems to include only the larger requests. Smaller requests, e.g., for files and in-line images, seem to come and go too fast for this program to catch.
conn #  client IP        state    method  url                                         n seconds  bytes
17899   212.252.145.38   running  GET     /photo/pcd3255/chappy-store-31.4.jpg        59         158544
18185   38.27.213.213    running  GET     /wtr/thebook/html                           21         0
18247   171.210.228.91   running  GET     /photo/nikon/nikon-reviews                  15         0
18367   209.86.54.190    running  GET     /bboard/image                               8          34228
18454   199.174.160.135  running  GET     /photo/pcd1669/treptower-big-view-51.4.jpg  1          34376
18464   207.100.29.220   running  ?       ?                                           1          0
18468   216.214.210.53   running  GET     /chat/js-refresh                            0          0
18481   216.34.106.252   running  GET     /monitor                                    0          0

This report will inform you which users are waiting on pages from your server. In the report above, users asking for large images or pages are waiting. This is normal because some users have very slow connections.

If you often see the same .tcl or .adp file in this report, especially with the longest wait times, it is likely that the script is extremely slow or is hogging database handles. You should:

  • Examine and fix the page
  • Use ad_return_if_another_copy_is_running to limit the number of times the page can run concurrently (limit to a few less than your total db pool). This will prevent multiple executions of that page from destroying your whole web service.
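
The idea behind capping concurrent runs of one page can be sketched in Python as follows. This is an analogy only, not the ACS implementation of ad_return_if_another_copy_is_running. The class CopyLimiter, the "busy" return value and the pool size of 8 are all hypothetical.

```python
import threading


class CopyLimiter:
    """Allow at most max_copies simultaneous runs of one page."""

    def __init__(self, max_copies):
        self._slots = threading.BoundedSemaphore(max_copies)

    def try_run(self, page):
        # Non-blocking acquire: when every slot is taken, turn the request
        # away at once instead of letting it queue behind the slow copies.
        if not self._slots.acquire(blocking=False):
            return "busy"
        try:
            return page()
        finally:
            self._slots.release()


DB_POOL_SIZE = 8                               # hypothetical pool size
page_limiter = CopyLimiter(DB_POOL_SIZE - 3)   # a few less than the pool

# Demonstration with a single slot, so the refusal is easy to see.
demo = CopyLimiter(max_copies=1)


def slow_page():
    # While this copy holds the only slot, a second attempt is refused.
    second = demo.try_run(lambda: "second copy ran")
    return ("first copy ran", second)


result = demo.try_run(slow_page)
print(result)
```

Refusing the extra request immediately, rather than making it wait, is what keeps the remaining database handles free for other pages.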

If you see a large number of requests from the same IP address, it is likely that a poorly-designed spider is attacking your web service. To stop it, ban that IP address from your system.
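
Spotting such a client amounts to counting open requests per IP address. The Python sketch below shows one way to do that over (client IP, url) pairs copied from a report. The function name busiest_clients and the threshold are hypothetical, and the 192.0.2.x rows are made-up example data rather than real report output.

```python
from collections import Counter


def busiest_clients(requests, threshold):
    """Return {ip: count} for clients with at least `threshold` open requests."""
    counts = Counter(ip for ip, _url in requests)
    return {ip: n for ip, n in counts.items() if n >= threshold}


# (client IP, url) pairs. The 192.0.2.10 rows imitate a spider and are
# invented for this example; the last two rows come from the report above.
requests = [
    ("192.0.2.10", "/bboard/q-and-a"),
    ("192.0.2.10", "/bboard/image"),
    ("192.0.2.10", "/photo/nikon/nikon-reviews"),
    ("216.214.210.53", "/chat/js-refresh"),
    ("216.34.106.252", "/monitor"),
]

suspects = busiest_clients(requests, threshold=3)
print(suspects)
```

The same counting, keyed on the url instead of the IP address, would highlight a page that shows up repeatedly in the report.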

Cassandracle (Oracle)

Cassandracle is a Web-based monitor for an Oracle installation. The goal is that, at a glance, a novice Oracle DBA ought to be able to identify problems and find pointers to relevant reference materials.

To use Cassandracle in your installation, you will need to give the web service's database user read access to some core Oracle tables.

  1. Log into Oracle via sqlplus
  2. Execute:
    SQL> connect internal
  3. Run the commands in /sql/cassandracle.sql
  4. Execute, substituting the web service's database user for username:
    SQL> grant ad_cassandracle to username;

Configuration

This is a simple section with information about the current machine and connection. The information provided is pretty sparse and should expand in the future.

WatchDog (Error log)

Every WatchDogFrequency minutes, the service's error logs will be scanned. If errors are found, they will be emailed to everyone configured as a PersontoNotify. The administration pages also have a tool to search the error log for errors.

If WatchDogFrequency is 0, the error logs won't be scanned regularly.
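
The general shape of one watchdog pass can be sketched in Python. This is not what aolserver-errors.pl actually does, since its logic is not described here. The sketch assumes that error lines contain the marker "Error:" and that the scanner remembers a byte offset between passes. The names new_errors and watchdog_pass and the sample log lines are hypothetical.

```python
import os
import tempfile


def new_errors(log_path, offset=0):
    """Return (error_lines, new_offset) for log content written after offset."""
    with open(log_path, "rb") as log:
        log.seek(offset)
        chunk = log.read()
    text = chunk.decode("utf-8", errors="replace")
    errors = [line for line in text.splitlines() if "Error:" in line]
    return errors, offset + len(chunk)


def watchdog_pass(log_path, offset, frequency_minutes, notify):
    """Run one scan and return the new offset. A frequency of 0 disables it."""
    if frequency_minutes == 0:
        return offset
    errors, offset = new_errors(log_path, offset)
    if errors:
        notify("\n".join(errors))  # stand-in for emailing each PersontoNotify
    return offset


sent = []  # collects the alert bodies instead of sending mail
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "error.log")
    with open(path, "w") as f:
        # Made-up log lines for the demonstration.
        f.write("[01/Jan/2000:00:00:00][1.1][-main-] Notice: starting\n")
        f.write("[01/Jan/2000:00:00:05][1.4][-conn0-] Error: no such table\n")
    first = watchdog_pass(path, 0, 15, sent.append)
    second = watchdog_pass(path, first, 15, sent.append)  # nothing new
    disabled = watchdog_pass(path, 0, 0, sent.append)     # frequency 0

print(len(sent), second == first, disabled)
```

Remembering the offset means each error is reported once rather than on every pass.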

Registered Filters and Scheduled Procedures

The procs ad_register_filter and ad_schedule_proc are wrappers around the corresponding ns_ calls, which allow us to more carefully track what's happening on the server and when. /admin/monitoring/filters shows which filters are called for which URLs and methods, and /admin/monitoring/scheduled-procs shows which procedures are scheduled to be called in the future.
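
The wrapper pattern described above, recording each registration before handing it to the underlying call, can be illustrated in Python. This is an analogy rather than the ACS Tcl code. The names register_filter, registered_filters, check_login and fake_ns_register are hypothetical, and the underlying server call is replaced by a stand-in.

```python
registered_filters = []  # what a page like /admin/monitoring/filters could list


def register_filter(kind, method, url_pattern, proc, low_level_register):
    """Record the registration, then pass it on to the underlying server call."""
    registered_filters.append(
        {"kind": kind, "method": method, "url": url_pattern,
         "proc": proc.__name__}
    )
    low_level_register(kind, method, url_pattern, proc)


# Stand-in for the server's own registration call (hypothetical).
server_side = []


def fake_ns_register(kind, method, url_pattern, proc):
    server_side.append((kind, method, url_pattern, proc))


def check_login(*args):
    """Placeholder filter body for the example."""
    return "filter_ok"


register_filter("preauth", "GET", "/admin/*", check_login, fake_ns_register)
print(registered_filters)
```

Because every registration passes through one wrapper, the monitoring page only has to read the recorded list. It does not need to inspect the server's internal state.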

Monitoring top output

Every TopFrequency seconds, the program located at TopLocation will be run, and its output parsed into overall and procedure-specific statistics. /admin/monitoring/top shows the historical results of this periodic call, and lets you see, over the web, the current output of top on the machine your service is running on.

If TopFrequency is 0, top won't be run regularly.
See also ad-monitoring-defs.tcl and monitoring.sql.

• Caveat 1: top output varies a great deal from one implementation to another, and this monitor currently only recognizes the syntax of the default Solaris 7 top function.
• Caveat 2: Neither the regular monitoring nor running top from a web page is possible if you are running the ACS in a chrooted environment, since top looks in /proc, a sensitive directory.
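
To show why Caveat 1 matters, the Python sketch below parses the load averages from a top header line and returns nothing when the line is not in the expected form. It is not the ACS parser. The sample header is an assumed approximation of top output, and the name parse_load_averages is hypothetical.

```python
import re

# Matches "load average:" or "load averages:" followed by three numbers.
LOAD_RE = re.compile(r"load averages?:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)")


def parse_load_averages(top_output):
    """Return the three load averages as floats, or None if unrecognized."""
    match = LOAD_RE.search(top_output)
    if match is None:
        return None  # a different top implementation, or unexpected output
    return tuple(float(x) for x in match.groups())


# Assumed header line, written for this example.
sample = "last pid:  1234;  load averages:  0.12,  0.10,  0.08    10:15:00"

loads = parse_load_averages(sample)
unknown = parse_load_averages("CPU usage: 3% user, 1% system")
print(loads, unknown)
```

A parser tied to one header layout fails quietly on another. This is why the monitor is limited to the top syntax it was written for.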

teadams@arsdigita.com
jsalz@mit.edu