
ACS Package Manager (APM)

by Jon Salz, Michael Yoon, and Lars Pind



The Big Picture

In general terms, a package is a unit of software that serves a single well-defined purpose. That purpose may be to provide a service directly to one or more classes of end-user (e.g., discussion forums and file storage for community members, user profiling tools for the site publisher), or it may be to act as a building block for other packages (e.g., an application programming interface (API) for storing and querying access control rules, or an API for scheduling email alerts). Thus, packages fall into one of two categories:
  • Application packages: a "program or group of programs designed for end users" (the Webopedia definition); also known as modules, for historical reasons

  • Library packages: the aforementioned building blocks

The ACS is itself a collection of interdependent library and application packages. Prior to ACS 3.3, all packages were lumped together into one monolithic distribution without explicit boundaries; the only way to ascertain what comprised a given package was to look at the top of the corresponding documentation page, where, by convention, the package developer would specify where to find:

  • the data model
  • the Tcl procedures
  • the user-accessible pages
  • the administration pages

Experience has shown us that this lack of explicit boundaries causes a number of maintainability problems for pre-3.3 installations:

  1. Package interfaces were not guaranteed to be stable in any formal way, so a change in the interface of one package would often break dependent packages (which we would only discover through manual regression testing). In this context, any of the following could constitute an interface change:

    • renaming a file or directory that appears in a URL
    • changing what form variables are expected as input by a page
    • changing a procedural abstraction, e.g., a PL/SQL or Java stored procedure or a Tcl procedure
    • changing a functional abstraction, e.g., a database view or a PL/SQL or Java stored function
    • changing the data model

    This last point is especially important. In most cases, changing the data model should not affect dependent packages. Rather, the package interface should provide a level of abstraction above the data model (as well as the rest of the package implementation). Then, users of the package can take advantage of implementation improvements that don't affect the interface (e.g., faster performance from intelligent denormalization of the data model), without having to worry that code outside the package will now break.

  2. A typical ACS-backed site only uses a few of the modules included in the distribution, yet there was no well-understood way to pick only what you need when installing the ACS, or even to uninstall what you didn't need, post-installation. Unwanted code had to be removed manually.

  3. Releasing a new version of the ACS was complicated, owing again to the monolithic nature of the software. Since we released everything in the ACS together, all threads of ACS development had to converge on a single deadline, after which we would undertake a focused QA effort whose scale increased in direct proportion to the expansion of the ACS codebase.

  4. There was no standard way for developers outside of ArsDigita to extend the ACS with their own packages. Along the same lines, ArsDigita programmers working on client projects had no standard way to keep custom development cleanly separated from ACS code. Consequently, upgrading the ACS once installed was an error-prone and time-consuming process.
The ACS is basically a platform for web-based application software, and any software platform has the potential to develop problems like these. Fortunately, there are many precedents for systematic ways of avoiding them, including:

Borrowing from all of the above, ACS 3.3 introduces its own package management system, the ACS Package Manager (APM), which consists of:

  • a standard format for APM packages (also called "ACS packages"), including:
    • version numbering, independent of any other package and the ACS as a whole
    • specification of the package interface
    • specification of dependencies on other packages (if any)
    • attribution (who wrote it) and ownership (who maintains it)

  • web-based tools for package management, i.e.:
    • obtaining packages from a remote distribution point
    • installing packages, if and only if:
      1. all prerequisite packages are installed
      2. no conflicts will be created by the installation
    • configuring packages (obsoleting the monolithic ACS configuration file) [ACS4]
    • upgrading packages, without clobbering local modifications
    • uninstalling unwanted packages

  • a registry of installed packages, database-backed and integrated with filesystem-based version control

  • web-based tools for package development, i.e.:
    • creating new packages locally
    • releasing new versions of locally-created packages
Consistent use of the APM format and tools will go a long way toward solving the maintainability problems listed above. Moreover, APM is the substrate that will enable us to soon establish a central package repository, where both ArsDigita and third-party developers will be able to publish their packages for other ACS users to download and install.

For a simple illustration of the difference between ACS without APM (pre-3.3) and ACS with APM (3.3 and beyond), consider a hypothetical ACS installation that uses only two of the thirty-odd modules available circa ACS 3.2 (say, bboard and ecommerce):

[Figure: ACS, without APM vs. with APM]

APM itself is part of a package, ACS Core, a library package that is the only mandatory component of an ACS installation.

The Components of an APM Package

An APM package consists of:
  1. A set of interfaces
  2. Implementations of those interfaces
  3. Documentation
  4. A package specification

Package Interfaces

There are three types of interface that an APM package can define:
  • application programming interface (API): A stable, well-documented set of methods for interacting with the package programmatically, either to query it for information or to command it to perform an action.

  • user interface (UI): For each class of end-user (e.g., community member, site administrator), a set of web pages that provides a stable set of features.

  • configuration interface: A stable set of parameters that can be used to control the behavior of the package, whose values can be set non-programmatically, i.e., with a configuration file and/or through a user interface.
By definition, an application package provides a UI but may or may not provide an API. Conversely, a library package provides an API but may or may not provide a UI. A configuration interface is optional for either type of package.

Package Implementation

Implementation varies by type of interface:
  • APIs are implemented as one or more of the following: PL/SQL or Java stored procedures and functions, database views, Tcl library procedures, linkable URLs (e.g., /user-search).
  • UIs are implemented as one or more of the following: HTML pages, Tcl pages, AOLserver Dynamic Pages (ADPs), registered procedures.
  • Virtually all API and UI implementations include a database schema (a.k.a. data model).
  • Currently, the standard way to implement a package's configuration interface is through an auxiliary AOLserver configuration file. A database-backed, generic configuration facility will be introduced in version 4.0 of the ACS Core package.
(Note that we now consider the database schema to be part of the package implementation, not the package interface. In other words, the only code that should execute queries or DML against a package's schema is the package's own implementation code. There are legacy violations of this rule that will be corrected incrementally.)

Package Documentation

A package must contain one or more of the following types of documentation:
  • High-level design documentation, written in lay terms ("The Big Picture"); every package should have this.
  • API documentation for programmers writing code that depends on the package
  • "Help" pages for end-users (with good UI design, we shouldn't need too many of these)
  • Configuration instructions for administrators who have installed the package on their site: what parameters are available; for each parameter, what values are valid;
  • Implementation documentation for the package maintainer ("Under the Hood"), e.g., descriptions of any optimizations like denormalization or caching, periodic processes (i.e., scheduled procedures), external programs or scripts used, etc.

Package Specification: The .info file

The package specification is an XML document that lists:
  • properties of the package such as name, version, owner
  • the interfaces that the package provides
  • the external interfaces upon which the package depends
  • the names and types of all files included in the package
Package specifications are typically not authored manually; rather, APM provides a UI for

Here is a sample excerpt from the specification of the ACS Core package itself:

<?xml version="1.0"?>
<!-- Generated by the ACS Package Manager -->

<package key="acs-core" url="http://software.arsdigita.com/packages/acs-core">
    <version name="3.3.0" url="http://software.arsdigita.com/packages/acs-core-3.3.0.apm">
        <package-name>ACS Core</package-name>
        <owner url="mailto:jsalz@mit.edu">Jon Salz</owner>
        <summary>Routines and data models providing the foundation for ACS-based Web services.</summary>
        <release-date>2000-06-03</release-date>
        <vendor url="http://www.arsdigita.com/">ArsDigita Corporation</vendor>

        <provides url="http://software.arsdigita.com/packages/developer-support/tcl-api" version="0.2d"/>
        <!-- No included packages -->

        <files>
            <file type="tcl_procs" path="00-proc-procs.tcl"/>
            <file type="tcl_procs" path="10-database-procs.tcl"/>
            ...
        </files>
    </version>
</package>
The only attributes of the <package> element itself are key and url. The key attribute is a default short name for the package that appears in the APM site administrator UI; to prevent namespace collisions, the key is not fixed but can be changed within an ACS installation. The url attribute identifies the authoritative distribution point for the package (specifically, a directory from which all versions of the package can be obtained). It also serves as the package's universally unique identifier and therefore cannot be changed.

All other properties of the package are stored as attributes and child elements of the <version> element, since they can vary from version to version. The <version> element also has two attributes: name and url. The name attribute is actually a version number that conforms to the numbering convention defined below. It is called name instead of number, because it can be alphanumeric, not purely numeric. The name attribute also designates the maturity of the package: development, alpha, beta, or release. As with the <package> element, the url attribute identifies the authoritative distribution point for the specified version of the package (specifically, the location of an actual package file that can be downloaded) and serves as the package version's universally unique identifier.
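
As a rough illustration of how these attributes and child elements might be read by a program, here is a minimal Python sketch using the standard library's ElementTree module. It is not APM's own code (APM is not written in Python); the read_spec function and the dictionary keys it returns are assumptions made for this example, and the embedded XML is an abbreviated copy of the sample above.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the sample specification shown above.
SPEC = """<?xml version="1.0"?>
<package key="acs-core" url="http://software.arsdigita.com/packages/acs-core">
    <version name="3.3.0" url="http://software.arsdigita.com/packages/acs-core-3.3.0.apm">
        <package-name>ACS Core</package-name>
        <owner url="mailto:jsalz@mit.edu">Jon Salz</owner>
        <provides url="http://software.arsdigita.com/packages/developer-support/tcl-api" version="0.2d"/>
        <files>
            <file type="tcl_procs" path="00-proc-procs.tcl"/>
            <file type="tcl_procs" path="10-database-procs.tcl"/>
        </files>
    </version>
</package>"""


def read_spec(xml_text):
    """Hypothetical helper: collect package- and version-level properties."""
    package = ET.fromstring(xml_text)   # the root element is <package>
    version = package.find("version")   # everything else hangs off <version>
    return {
        "key": package.get("key"),
        "package_url": package.get("url"),
        "version_name": version.get("name"),
        "version_url": version.get("url"),
        "package_name": version.findtext("package-name"),
        "owners": [owner.text for owner in version.findall("owner")],
        "provides": [(p.get("url"), p.get("version"))
                     for p in version.findall("provides")],
        "files": [(f.get("type"), f.get("path"))
                  for f in version.findall("files/file")],
    }


spec = read_spec(SPEC)
print(spec["key"], spec["version_name"], spec["package_name"])
```

Only the <package> element carries key and url as package-level identity; every other property is read from beneath <version>, mirroring the split described above.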

The version element contains:

  • One <package-name> element, which is a pretty name for the package
  • One or more <owner> elements, each of which identifies a party responsible for maintenance of the package
  • One <summary> element
  • One <description> element (optional)
  • One <release-date> element
  • One <vendor> element (optional), which identifies the organization that maintains the package
  • Zero or more <provides> elements, each of which identifies an interface provided by the package
  • Zero or more <requires> elements, each of which identifies an interface upon which the package depends
  • One <files> element, containing one <file> element for each file included in the package
  • One or more <parameter> elements that specify the package's configuration interface [ACS4]
A <provides> or <requires> element identifies an interface with the combination of its url and version attributes, where url is a universally unique identifier for the interface (API or UI) and version is an identifier that conforms to the same version numbering convention used for packages. The convention for constructing an interface URL is:
http://vendor-host/packages/logical-name/implementation-type
In the above example, the vendor-host is software.arsdigita.com, the logical-name is developer-support, and the implementation-type is tcl-api. Other implementation-type values include plsql-api, sql-views, and java-api. (At this time, the result of visiting an interface URL is undefined; in the future, it will display the documentation for the identified interface.)
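
The URL convention can be taken apart mechanically. The following Python sketch splits an interface URL into the three components named above; split_interface_url is a hypothetical helper written for this illustration, not part of APM.

```python
from urllib.parse import urlsplit


def split_interface_url(url):
    """Hypothetical helper: return (vendor-host, logical-name, implementation-type)."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # The convention is /packages/logical-name/implementation-type.
    if len(segments) != 3 or segments[0] != "packages":
        raise ValueError(f"not an interface URL: {url!r}")
    _, logical_name, implementation_type = segments
    return parts.netloc, logical_name, implementation_type


print(split_interface_url(
    "http://software.arsdigita.com/packages/developer-support/tcl-api"))
```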

Once an interface is published in a <provides> element, future versions of the package must maintain that interface, i.e., no changes can be made to the interface or its implementation that would cause dependent code to break. The interface can be augmented, in which case the version number should be incremented, i.e., a later version of an interface is always a superset of an earlier version. To communicate the fact that an incompatible change has been made to an interface, the package owner will remove the original <provides> element and add a new, different <provides> element, e.g., hypothetically, we might someday replace developer-support/tcl-api with developer-support/tcl-api-2.

Also, a <provides> element can include a deprecated attribute, meaning that the package owner expects to remove the corresponding interface in the future.
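
The superset rule can be stated as a one-line check. The Python sketch below assumes, purely for illustration, that an interface version can be summarized as a set of procedure names; APM does not currently define interfaces this explicitly, and the names proc_a, proc_b, and proc_c are made up.

```python
def is_augmentation(old_names, new_names):
    """True if the new interface version keeps everything the old one offered."""
    return set(old_names) <= set(new_names)


# Hypothetical procedure names, used only for illustration.
v1 = {"proc_a", "proc_b"}
v2 = {"proc_a", "proc_b", "proc_c"}   # augmented: same URL, higher version number
v3 = {"proc_a", "proc_c"}             # proc_b removed: incompatible change

print(is_augmentation(v1, v2))
print(is_augmentation(v1, v3))
```

Under this reading, v2 could be published under the same interface URL with an incremented version, whereas v3 would have to be published under a new interface URL.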

Version Numbering Convention

A version number consists of:
  1. A major version number.
  2. Optionally, up to three minor version numbers.
  3. One of the following:
    • The letter d, indicating a development-only version (i.e., definitely broken)
    • The letter a, indicating an alpha release (i.e., probably broken)
    • The letter b, indicating a beta release (i.e., somewhat broken)
    • No letter at all, indicating a final release (i.e., not broken or, realistically, broken a little)

In addition, the letters d, a, and b may be followed by another integer, indicating a version within the release.

For those who like regular expressions:

version_number := integer ('.' integer){0,3} (('d'|'a'|'b') integer?)?

So the following is a valid progression for version numbers:

0.9d, 0.9d1, 0.9a1, 0.9b1, 0.9b2, 0.9, 1.0, 1.0.1, 1.1b1, 1.1
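
For readers who prefer running code, here is a small Python sketch that validates a version number against the grammar above and produces a sort key that reproduces this progression. The version_key function is hypothetical. Two choices in it are assumptions not stated by the convention: missing minor numbers are treated as zero (so 1.0 and 1.0.0 compare equal), and a letter with no trailing integer sorts before the same letter followed by 1.

```python
import re

# integer ('.' integer){0,3} (('d'|'a'|'b') integer?)?
VERSION_RE = re.compile(r"(\d+(?:\.\d+){0,3})(?:([dab])(\d+)?)?")

# development < alpha < beta < final release (no letter)
MATURITY_RANK = {"d": 0, "a": 1, "b": 2, None: 3}


def version_key(name):
    """Hypothetical helper: map a version number to a sortable tuple."""
    match = VERSION_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a valid version number: {name!r}")
    numbers, letter, serial = match.groups()
    parts = [int(n) for n in numbers.split(".")]
    parts += [0] * (4 - len(parts))   # assumption: pad missing minors with 0
    return (tuple(parts), MATURITY_RANK[letter], int(serial or 0))


PROGRESSION = ["0.9d", "0.9d1", "0.9a1", "0.9b1", "0.9b2",
               "0.9", "1.0", "1.0.1", "1.1b1", "1.1"]

print(sorted(reversed(PROGRESSION), key=version_key) == PROGRESSION)
```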

Distribution Format: The .apm file

In Maximum RPM, Edward Bailey writes:
Normally, package management systems take all the various files containing programs, data, documentation, and configuration information, and place them in one specially formatted file -- a package file.
This description fits APM packages, which are distributed as gzip-compressed tarfiles, with the special extension .apm. The full naming convention for APM package files is:
package-key-package-version-name.apm
For instance, the first production release of the ACS Core package is named acs-core-3.3.0.apm.

Inside the tarfile, there is one directory at the top level, with the same name as the package key, which, in turn, contains:

  • an optional www directory, in which the implementation of the package's UI (if any) resides

  • zero or more Tcl scripts that are loaded when the server starts. Files ending in -procs.tcl define Tcl procedures; files ending in -init.tcl contain code to be run at initialization time (e.g., filter registration).

  • zero or more SQL files (any files in the directory with a .sql extension) that contain the DDL statements to install the package's database schema and/or the package's database-resident API (views, stored procedures, stored functions)

  • zero or more SQL files, each of which upgrades the package's database schema from one version to a later version (not necessarily the next version, if no upgrades were needed for intervening versions) and is named according to the convention:
    upgrade-version-name-next-version-name.sql
    (If any of these files are present, they will be located in an upgrade subdirectory.)

  • a documentation file named package-key.html or package-name.adp, or a doc subdirectory containing multiple documentation files

  • The package specification file, named package-key.info
Aside from the package specification, all items listed above are optional.
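
The naming conventions above are regular enough to classify files mechanically. The Python sketch below does so for paths given relative to the package's top-level directory. The classify function and the category labels it returns are invented for this illustration and are not APM's own file type names; the upgrade file name in the example is likewise hypothetical.

```python
from pathlib import PurePosixPath


def classify(path):
    """Hypothetical helper: guess a file's role from the conventions above."""
    p = PurePosixPath(path)
    top = p.parts[0]
    if top == "www":
        return "ui"
    if top == "doc":
        return "documentation"
    if top == "upgrade" and p.suffix == ".sql":
        return "upgrade script"
    if p.name.endswith("-procs.tcl"):
        return "tcl procedures"
    if p.name.endswith("-init.tcl"):
        return "tcl initialization"
    if p.suffix == ".sql":
        return "data model"
    if p.suffix == ".info":
        return "specification"
    if p.suffix in (".html", ".adp"):
        return "documentation"
    return "other"


for name in ["bboard.info", "bboard.sql", "bboard-init.tcl", "bboard-procs.tcl",
             "www/admin/index.adp", "doc/index.html",
             "upgrade/upgrade-3.3.0-3.3.1.sql"]:
    print(f"{name}: {classify(name)}")
```

The directory checks come first so that, for example, an .adp page under www is treated as part of the UI rather than as a top-level documentation file.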

ACS Directory Structure

APM installs packages in the packages subdirectory of the server root directory, at the same level as the legacy www, tcl, and parameters directories (which, by the way, continue to serve the same purposes as they did in versions of ACS prior to 3.3; we may remove some of this backward-compatibility in ACS 4).

Thus, the directory structure of the hypothetical ACS 3.3 installation that is illustrated in the diagram above would look something like this:

server-root/
  |
  +-- packages/
        |
        +-- acs-core/
        |
        +-- bboard/
        |     |
        |     +-- doc/
        |     |     |
        |     |     +-- index.html
        |     |     |
        |     |     +-- ...
        |     |
        |     +-- www/
        |     |     |
        |     |     +-- admin/
        |     |     |     |
        |     |     |     +-- index.adp
        |     |     |     |
        |     |     |     +-- ...
        |     |     |
        |     |     +-- index.adp
        |     |     |
        |     |     +-- ...
        |     |
        |     +-- bboard.info
        |     |
        |     +-- bboard.sql
        |     |
        |     +-- bboard-init.tcl
        |     |
        |     +-- bboard-procs.tcl
        |     |
        |     +-- ...
        |
        +-- ecommerce/
              |
              +-- ...
Another component of the ACS Core package, the Request Processor, is responsible for making the various package user interfaces integrate into one coherent hierarchy of URLs. The basic algorithm used to translate a URL into a filesystem path is simple: "When an HTTP request for /package-key/filename is received, then return the file server-root/packages/package-key/www/filename." (In reality, the job of the Request Processor is not so simple.)
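
The basic translation rule quoted above can be sketched in a few lines of Python. This covers only the simple rule, not the real Request Processor; url_to_path is a hypothetical function, /web/server-root is a made-up server root, and rejecting ".." segments is a precaution added for the example.

```python
from pathlib import PurePosixPath


def url_to_path(server_root, url_path):
    """Hypothetical helper: /package-key/filename ->
    server-root/packages/package-key/www/filename."""
    segments = [s for s in url_path.split("/") if s]
    if len(segments) < 2:
        raise ValueError(f"expected /package-key/filename, got {url_path!r}")
    if ".." in segments:
        raise ValueError("parent-directory segments are not allowed")
    package_key, *rest = segments
    return str(PurePosixPath(server_root, "packages", package_key, "www", *rest))


print(url_to_path("/web/server-root", "/bboard/index.adp"))
print(url_to_path("/web/server-root", "/bboard/admin/index.adp"))
```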

Changes From ACS 3.2 and Prior Versions

Prior to the introduction of APM in ACS 3.3, the contents of a given package were scattered throughout the site's physical structure:
  • the Tcl library scripts for all packages were located in the server-root/tcl directory
  • the UI pages for all packages were located in the directory structure beneath the page root (server-root/www), which translated directly into the site's URL hierarchy
  • the data model files for all packages were located in the server-root/www/doc/sql directory
In contrast, APM imposes a vertical organization wherein the filesystem does not map directly to the URL hierarchy. The main advantage of the pre-APM filesystem organization was that, given a URL, you always knew where to look for the corresponding file under the page root. In our judgement, the benefit of having the filesystem explicitly preserve the modularity of installed packages outweighs both the loss of this advantage and the extra complexity that is now built into the Request Processor.

Future Improvements

  • Implement aforementioned configuration facility.
  • Adjust design and implementation to work with forthcoming Parties/Subcommunities model.
  • Implement installation chaining, i.e., installing one package causes any required packages that are not installed to be installed, if they can be obtained. (The FreeBSD ports collection does this.)
  • Implement composite packages, i.e., packages that contain other packages. There is already stub support for this. Installation chaining may actually make this superfluous.
  • Compliance with XML Namespaces (http://www.w3.org/TR/REC-xml-names/); may provide a standard way to solve the namespace collision problem that the key attribute of the package element is designed to address.
  • A method for explicit definition of interfaces (i.e., mapping a UI identifier to a set of URLs or an API identifier to a set of procedure/function signatures) and, potentially, automated detection of incompatibility
  • Consider a suffix other than .info for package specifications: perhaps just .xml?
  • Documentation improvements:
    • Write a formal DTD for APM package specifications.
    • User experience documentation: for each class of user, what questions can be asked, what actions performed.
    • API documentation
    • Add examples of how interfaces can be broken
    • Document the integration of CVS and APM (specifically regarding imported packages vs. locally developed packages)
    • Documentation browser for installed packages
  • Consider moving a separate api directory
  • Clarify the rules that map files in packages to URLs; what follows is preserved from an earlier version of this document:
    The distribution file containing a package is rooted at the server root, so (for instance) one might find the file packages/address-book/address-book.html in the package. If for some reason a package needs to contribute a file to the global www directory rather than its package-private one, the package could just contain the file www/foo/bar.tcl; this file would be installed into the site-wide www directory.

    Package distribution files can contain files in other packages' directories; this flexibility will be useful in case a package needs to augment another package by providing extra services. For instance, a package providing attachment support for the address book might contain a packages/address-book/www/view-attachments.tcl file. However, it could not contain a new packages/address-book/www/index.tcl file - we allow a file to belong to only one package. (To provide a "hook" to the attachment package, the address book could use a Package Manager API to determine whether the attachment package is installed, displaying a link to view-attachments.tcl only in that case.)

Under the Hood

At startup, the ACS Core scans all package specifications and synchronizes them with the database. Mismatches (indicating that new packages have been installed) will result in appropriate action (running upgrade scripts or notifying the administrator).
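
A very rough Python sketch of the comparison step follows. It assumes, hypothetically, that both the scanned specifications and the database registry can be reduced to a mapping from package key to version name; the find_mismatches function and the labels "new package" and "version changed" are invented here, and what the ACS Core actually does with each mismatch is not shown.

```python
def find_mismatches(on_disk, registered):
    """Hypothetical helper: both arguments map package key -> version name.

    Returns (key, kind, version_on_disk) for every package whose
    specification on disk differs from what the registry records.
    """
    mismatches = []
    for key in sorted(on_disk):
        disk_version = on_disk[key]
        known_version = registered.get(key)
        if known_version is None:
            mismatches.append((key, "new package", disk_version))
        elif known_version != disk_version:
            mismatches.append((key, "version changed", disk_version))
    return mismatches


# Made-up example data.
on_disk = {"acs-core": "3.3.0", "bboard": "3.3.1", "ecommerce": "3.3.0"}
registered = {"acs-core": "3.3.0", "bboard": "3.3.0"}

for mismatch in find_mismatches(on_disk, registered):
    print(mismatch)
```
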
michael@arsdigita.com