ArsDigita Archives
 
 
   
 

ArsDigita Reporte

by Terence Collins and Philip Greenspun
Suppose that you have a bunch of Web services all running on one Unix box. ArsDigita Reporte is a separate reports server that, every night at 2:00 am, will
  • grab the logs from your production servers
  • run analog to produce a daily report for the day before. As a side effect of running analog, ArsDigita Reporte
    • accumulates a cache for weekly and monthly reports
    • does reverse DNS lookups and accumulates a cache for those (so that you can see how many users came from France or .edu domains)
  • if appropriate, run analog over the cache files to produce a weekly or monthly report
The idea is that you give each customer a username/password pair for your reports server, and the reports server demands authentication. Once a user authenticates himself, he is redirected to the appropriate section of the server and sees only his service's statistics.

ArsDigita Reporte is not in any way innovative as far as log analysis goes. We provide no more and no less capability than analog, which is one of the best tools, is free, and comes with source code that you can change. (We did change it, to cope with some weirdness in how analog treats some of our .tcl page loads.) What ArsDigita Reporte saves you from is writing a bunch of Unix cron jobs and setting up servers to deliver the reports to your clients.

ArsDigita Reporte simultaneously accomplishes Year 2000-compliance and scalability by keeping each year's reports in a separate directory named "1998" or "1999" or whatever. This keeps any particular Unix directory from filling up with thousands of files and making poor old Unix dig around too much for the file you need.
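The per-year layout can be sketched in a few lines of Tcl. This is only an illustration: the proc name `report_directory`, the `/web/reports` root, and the choice to nest the year under a per-service directory are assumptions, not Reporte's actual code or layout.

```tcl
# Hypothetical helper: return the directory that holds one service's
# reports for the year containing the given time (seconds since the epoch).
proc report_directory {reports_root service seconds} {
    # %Y yields a four-digit year, so 1999 and 2000 stay distinct and ordered.
    set year [clock format $seconds -format %Y -gmt 1]
    return [file join $reports_root $service $year]
}

# Example call; 915408000 is January 4, 1999 (UTC).
set dir [report_directory /web/reports photo.net 915408000]
```

With a four-digit year in the path, no directory ever holds more than one year of daily, weekly, and monthly reports.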

ArsDigita Reporte should work with any Web server program (i.e., you can be using Apache or Netscape Enterprise or whatever as your user-visible Web server). To run the reporting server, you will need to

Some background for ArsDigita Reporte may be obtained by reading Philip and Alex's Guide to Web Publishing.

This is free software, copyright ArsDigita and distributed under the GNU General Public License.

Known Problems

January 4, 1999: A new version has been released with zero-padded month fields. If you downloaded the old distribution, you can make these fixes yourself:
  • in tcl/daily-report-procs.tcl:
    add the lines

    set todays_month [format %02d $todays_month]
    set month [format %02d $month]

    before

    set report_date "${two_digit_year}${month}$monthday"

  • in tcl/scheduled-procs.tcl:
    add the line

    set month [format %02d $month]

    before

    set report_date "${two_digit_year}${month}$monthday"

  • in tcl/defs.tcl:
    under the procs six_digit_time_string_from_ns_time and eight_digit_time_string_from_ns_time, change these lines from:

    set month [expr 1 + [parsetime_from_seconds mon $time]]

    to:

    set month [format %02d [expr 1 + [parsetime_from_seconds mon $time]]]

    In the proc month_length_in_days, add the line

    set numeric_month [string trimleft $numeric_month 0]

    before the line

    switch $numeric_month {

  • in service-index.tcl:
    change this line from:

    set one_month_ago_time [expr $todays_time - ([month_length_in_days [expr [string range [six_digit_time_string_from_ns_time $todays_time] 2 3] - 1]] * $one_day_in_seconds)]

    to these:

    set last_month_numeric [expr ([string range [six_digit_time_string_from_ns_time $todays_time] 2 3] - 1)]
    if {$last_month_numeric==0} {set last_month_numeric 12}
    set one_month_ago_time [expr $todays_time - ([month_length_in_days $last_month_numeric ] * $one_day_in_seconds)]
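One caveat when doing arithmetic on zero-padded fields: Tcl 8.x `expr` treats a leading zero as an octal prefix, so a month string such as "08" or "09" is rejected as an invalid octal number. The sketch below is not part of the Reporte distribution. It shows one way to sidestep the issue by converting the padded string with `scan` and `%d` before subtracting. The proc name `previous_month_number` is hypothetical.

```tcl
# Hypothetical helper: given a zero-padded month string ("01".."12"),
# return the number of the previous month as a plain integer.
proc previous_month_number {padded_month} {
    # scan with %d always parses decimal, so "08" becomes 8.
    if {[scan $padded_month %d month] != 1} {
        error "not a numeric month: $padded_month"
    }
    set previous [expr {$month - 1}]
    # January wraps around to December.
    if {$previous == 0} { set previous 12 }
    return $previous
}

set before_january   [previous_month_number 01]
set before_september [previous_month_number 09]
# Re-pad the result if it is going back into a date string.
set padded [format %02d $before_september]
```

The result is a plain integer, so it can be re-padded with `format %02d` wherever a two-digit field is expected.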

There aren't any known problems with ArsDigita Reporte per se. However, analog itself has been known to dump core. Sometimes just pulling the command line from the AOLserver error log (where it is written regardless of whether analysis produces any errors) and rerunning it from a shell will get you out of your difficulty. Anyway, Reporte will send you email if it has any problem overnight. And Reporte will try again during the day if your machine was down during the night. So it should be reasonably robust.

My (philg's) main complaint with the whole system is that reverse DNS lookup makes log analysis crawl. Analyzing 24 hours of photo.net logs (500,000 hits) takes between 3 and 5 hours! This is being done in the early morning hours on a 4-CPU HP RISC box with 4 GB of RAM so presumably all the bottleneck is reverse DNS.


    tcollins@arsdigita.com
    and
    philg@mit.edu