ACS Core 4.0 Overview

by Michael Yoon and Rafael Schloming

Contents

Who Should Read This Document

Developers familiar with prior versions of ACS.

Background

A number of important reusable designs (a.k.a. design patterns) and design conventions have emerged over the course of three major cycles of ACS development:
  1. Type Specialization and Attribute Metadata (user_group_types, user_group_type_fields, and *_info tables)
  2. Object Identification (on_which_table + on_what_id)
  3. Object Context ("module scoping")
  4. Consolidated Access Control (General Permissions)
  5. Attribute Value Storage ("skinny" tables of key-value pairs)
  6. Audit Trail
Each item in this list is discussed in detail below.

1. Type Specialization and Attribute Metadata (user_group_types, user_group_type_fields, and *_info tables)

The pre-4.0 User Groups data model provides a means to define special types of user group with extended attributes (i.e., type-specific attributes not stored as columns of the user_groups table) by:
  1. adding rows into the User Groups metadata model (user_group_types, user_group_type_fields, etc.)
  2. creating helper tables to store attribute values for instances of the specialized group type, which follow the naming convention "<group_type>_info"

The original vision for the Content Repository (see the section "A modest proposal: one content table") also includes the concept of type specialization:

If Oracle were a true object database, we could have tables that inherited from the all_content table but supplemented it with extra columns (e.g., refers_to in the case of bboard). But Oracle isn't so we will probably have to resort to kludges like the _info helper tables that we have for user groups.

Additionally, the metadata in tables like user_group_type_fields is used to generate a user interface for manipulating the specific attributes of different user group types (the /admin/ug/ pages) automatically.

Another of ACS 4's design goals is to enable broader usage of this type of metadata:

We need metadata inside Oracle (so that it is available to all clients, whether AOLserver or other) that, for any column in the database says
In other words, the goal is to generalize the attribute metadata model devised for user group types.

2. Object Identification (on_which_table + on_what_id)

In the pre-4.0 series of ACS releases, the de facto standard for identifying an arbitrary object (a.k.a. database row) is to use a pair of columns, one to store the name of a table and another to store the value of a unique key that identifies the row within the table. By convention, these columns are named on_which_table and on_what_id, respectively. (There are exceptions, like the site_wide_index table, which has columns named table_name and the_key.)

The on_which_table + on_what_id idiom is useful in situations where you want to treat heterogenous types of object uniformly. Examples of this idiom in use include:

3. Object Context ("module scoping")

A variation on the on_which_table/on_what_id theme is the Module Scoping design introduced in ACS 3.0, which uses a triple of columns (scope, user_id, and group_id) to identify the "scope" with which a given object in the database is associated.

"Scope" is a term best explained by example. Consider some hypothetical rows in the address_book table:

...scopeuser_idgroup_id...
...user123 ...
...group 456...
...public  ...
The first row represents an entry in User 123's personal address book, the second row represents an entry in User Group 456's shared address book, and the third row represents an entry in the site's public address book.

In this way, the scoping columns identify the context in which a given object belongs, where each context is either a person or a group of people or the general public (itself a group of people).

4. Consolidated Access Control (General Permissions)

Here is a excerpt from the General Permissions chapter of the ACS 3.4 Core Architecture Guide that explains what we mean by "Consolidated Access Control":
The General Permissions package lets you control who can do what with each row in your database, by providing: There are five types of party to whom permissions can be granted: In essence, General Permissions treats rows in the database as operating systems like Unix and Microsoft Windows NT treat files in the filesystem.

5. Attribute Value Storage ("skinny" tables of key-value pairs)

Several pre-4.0 ACS data models use a combination of to store attribute values for a specific type of object. For example: In addition, there are some instances where we are not using this model but should be, e.g., the users_preferences table, which stores preferences for registered users in columns such as prefer_text_only_p and dont_spam_me_p. The "standard" way for an ACS 3.x-based application to add to the list of user preferences is to add a column to the users_preferences table (exactly the kind of data model change that has historically complicated the process of upgrading to a more recent ACS version).

6. Audit Trail

The Developer's Guide (see the "Naming (columns)" section) makes basic auditing columns a standard for ACS tables:
The following column names are ArsDigita Community System standards.
Additionally, there is a defined set of conventions for maintaining comprehensive audit trails on the rows in a given table, that includes a naming convention for audit tables (the _audit suffix) and guidelines for writing auditing triggers.

Tour of the Data Model

Like most data models, the ACS Core data model has two levels:
  1. the knowledge level (a.k.a. the metadata model or metamodel)
  2. the operational level (a.k.a. the "actual" data model where concrete objects like users are stored)
You can browse the data models themselves from here: (Note that we have subdivided the operational level into the latter two files.)

The operational level depends on the knowledge level, so we discuss the knowledge level first.

Introduction to the Knowledge-Level Model

With version 4.0 of ACS Core, we consolidate several of the above patterns and conventions (as well as some new extensions) into a simple and unified metadata model.
acs_object_types

Each row in acs_object_types represents a specific type of object, just as each row in user_group_types represents a specific type of user group. For each type, we store:

There is no analog for the supertype column in user_group_types, which supports a single level of specialization. That is, you can create special types of user group (e.g., "company"), but you cannot further specialize those specializations (e.g., "Corporation", "Partnership", "LLC").

The User Groups model does have an analog for the table_name and id_column columns: the *_info naming convention and the associated convention that the primary key of all *_info tables will be the group_id column.

acs_attributes

Each row in acs_attributes represents an attribute of a specific object type, e.g., "password" is an attribute of the "user" type. Along with attribute_name and object_type, we store:

In most cases, an attribute maps directly to a column in the table identified by the table_name of the corresponding object type, although, as mentioned above, we sometimes store attribute values as key-value pairs in a "skinny" table. However, when you ask the question "What are the attributes of this type of object?", you don't really care about how the values for each attribute are stored (in a column or as key-value pairs); you expect to receive the complete list of all attributes.

Thus, acs_attributes consolidates metadata about both forms of attribute. It contains a storage column to discriminate between:

  1. attributes with type-specific storage: i.e., its values live in the column identified by column_name in the table identified by its object_type's table_name
  2. attributes with generic storage: its values live as rows in the acs_attribute_values table, a generic "skinny" table for storing key-value pairs (see below)

Since storage is specified for each attribute, instead of for the object type as a whole, you can use a combination of type-specific and generic storage for a given object type, making this model more flexible than the user_group_type_fields design, which dictates that all attribute values for a user group type be stored in a type-specific *_info table.

The acs_attributes model satisfies most of the metadata requirements listed above:

(The only requirement that the ACS Core 4.0 data model does not currently meet is: "what validity tests are to be performed on the user's input". We plan to support this requirement in version 4.1 of ACS Core.)

In summary, ACS Core 4.0's knowledge-level model builds upon the pre-4.0 User Groups model to provide a general means for type specialization and storing attribute metadata. In object-oriented programming terminology, the knowledge-level model allows us to create object types by inheriting from existing object types.

(For more detail on the complete knowledge-level model, see Appendix A below.)

Introduction to the Operational-Level Model

Based on our experience with prior versions of ACS, the operational-level model of ACS Core 4.0 augments familiar entities like users with some new ones. The purpose of introducing these new entities is to simplify the rest of ACS by providing: Here is a walkthrough of the most important tables in the operational model:

parties

In his excellent book Data Model Patterns, David Hay writes:

People and organizations share many attributes and many relationships to other entities. A corporation is, after all, a "legal person." Both people and organizations have "names" and "addresses" as attributes, and both may be parties to contracts. For this reason, while PERSON and ORGANIZATION are things of significance, so too is the super-set of the two, which we shall here call PARTY.
Using different terminology ("user" instead of "person" and "user group" instead of "organization"), the Module Scoping design in ACS 3.0 introduced the "party" abstraction, though not in the form of a parties table. Rather, the triple of scope, user_id, and group_id columns was used to identify a party, i.e., something that is either a person or a group of people. One disadvantage of this design convention is that it increases a data model's complexity by requiring you to:

Here is an actual example of this from ACS 3.3 - the calendar_categories table (part of the Calendar module):

create table calendar_categories (
	category_id	integer primary key,
	-- if scope=public, this is a calendar category the whole system
        -- if scope=group, this is a calendar category for a particular group
	-- if scope=user, this is a calendar category for a user   
 	scope           varchar(20) not null,
	group_id	references user_groups,
	user_id		references users,
	category	varchar(100) not null,
	enabled_p	char(1) default 't' check(enabled_p in ('t','f')),
	constraint calendar_category_scope_check check ((scope='group' and group_id is not null) or
							(scope='user' and user_id is not null) or
							(scope='public')),
	constraint calendar_category_unique_check unique(scope, category, group_id, user_id)
);
With the introduction of parties, this table becomes simpler:
create table calendar_categories (
	category_id	integer primary key,
	party_id	references parties,
	category	varchar(100) not null,
	enabled_p	char(1) default 't' check(enabled_p in ('t','f')),
);

The real advantage of the latter model is only apparent when you compare code samples for the two models:

  1. To ascertain the name of the owning party in the Scoping model, you need to do something like:
    # assuming that category_id is passed in as form data
    
    db_1row one_enabled_calendar_category {
        select category, scope, group_id, user_id
        from calendar_categories
        where category_id = :category_id
        and enabled_p = 't'
    }
    
    switch $scope {
        group {
    	set owner [db_string {
    	    select group_name
    	    from user_groups
    	    where group_id = :group_id
    	}]
        }
    
        user {
    	set owner [db_string {
    	    select first_names || ' ' || last_name
    	    from users
    	    where user_id = :user_id
    	}]
        }
    
        public {
    	set owner "Public"
        }
    }
    
    The 4.0 model allows you simply to write (and save a db hit):
    # assuming that category_id is passed in as form data
    
    db_1row one_enabled_calendar_category {
        select category, party.name(party_id) as owner
        from calendar_categories
        where category_id = :category_id
        and enabled_p = 't'
    }
    
    In the Scoping example, the work of resolving the scoping column values into a presentable name is the responsibility of the developer. In contrast, ACS Core 4.0 does that work for you, by providing the name function of the party PL/SQL package, one of the APIs defined by ACS Core 4.0 (more on those later).
  2. To find all items that belong to user 123 or group 456, the Scoping model requires the developer to write logic similar to what we saw in the previous example, e.g.:
    # assuming that scope (and maybe group_id or user_id) is passed in as
    # form data
    
    set enabled_categories_by_scope_sql {
        select category_id, category
        from calendar_categories
        where enabled_p = 't'
        and scope = :scope
    }
    
    # in reality, you'd want to use the ad_scope_sql Tcl proc to construct
    # this criterion, which basically performs the same function as this
    # switch statement
    
    switch $scope {
        group {
    	append enabled_categories_by_scope_sql {
    	    and group_id = :group_id
    	}
        }
    
        user {
    	append enabled_categories_by_scope_sql {
    	    and user_id = :user_id
    	}
        }
    
        public {
    	# do nothing
        }
    }
    
    db_foreach enabled_categories_by_scope $enabled_categories_by_scope_sql {
        ...
    }
    
    Using the ACS 4.0 model again simplifies the developer's job:
    # assuming that party_id is passed in as form data
    
    db_foreach enabled_categories_by_scope {
        select category_id, category
        from calendar_categories
        where enabled_p = 't'
        and party_id = :party_id
    } {
        ...
    }
    

In both of the above examples, we don't really care whether the party is a user or a group of users or any user (the public). This is often the case in real-world situations, i.e., we often want to treat users and groups as the same basic type of object. This is the fundamental reason why parties is a useful abstraction.

For those situations when we do care about the precise type of a party, ACS Core 4.0 provides the type function of the acs_object PL/SQL package (and its corresponding Tcl wrapper, acs_object_type). Here's a PL/SQL sample to illustrates its usage:

party_type := acs_object.type(party_id);

if party_type = 'user' then
  ...
elsif party_type = 'organization' then
  ...
end if;
and here's a Tcl sample:
switch [acs_object_type $party_id] {
    group {
	...
    }

    user {
	...
    }

    default {
	...
    }
}
To recap, the parties model and its accompanying APIs achieve the same design goals as Scoping, with less work required on the part of the programmer.

Note that we include email and url in the parties model because both persons and organizations are often referenced by an email address (in the case of organizations, the email address is usually a mailing list or alias to the organization's representative) and URL.

persons

The idea of introducing a "person" entity that is the supertype of "user" was suggested by David Jones of MIT Press.

The persons table contains a subset of the columns from the ACS 3.x users table, specifically those attributes that all people (whether registered users or not) have, e.g., first_names, last_name. (It omits columns like password that only a registered user would ever have.)

The value of this extra layer of abstraction is that we can represent and treat people uniformly, whether they are registered users or not. To date, we have represented people who are not registered users in a number of different ways, e.g.:

In both instances, what we really want is to reference a person who may or may not be a registered user. Today, creating an address book entry for a registered user requires that we copy data from the users_contact table into the address_book and then keep it in sync manually, not an ideal situation. While we could handle this by adding a nullable user_id column to the address_book table and program logic to handle this special case, a cleaner solution would be to replace the first_names and last_name columns with a person_id foreign key. Similarly, in the Education module, we could replace the professors_in_charge column with a mapping table between edu_subjects and persons.

The persons table also proves useful in the Content Repository: By representing the author of a content item as a person instead of a user, we can treat user-contributed content (e.g., comments and forum postings) and non-user-contributed content (e.g., news articles written by reporters who aren't users, syllabi written by professors who aren't users) uniformly in many situations, again resulting in less program logic to handle special cases.

The downside of this extra layer of abstraction is that we can no longer query the users table directly for person-level attributes, requiring us to add joins to queries that select both user-level and person-level attributes. For convenience, ACS Core 4.0 does define the cc_users view ("cc" stands for "Community Core" - other suggestions for naming this view are welcome) that combines user-level attributes (columns of the users table) with those of its supertypes; of course, this view doesn't remove the cost of the extra joins that are required. There's more discussion on this subject below, in the Cons section below.

users

The familiar users table remains, although it is now has less columns, because, as mentioned above, it inherits many of its attributes from its supertypes, e.g., it inherits first_names and last_name from person, and email and url from party.

organizations

As alluded to above, the "organization" entity is ACS Core 4.0's analog for "user group." The only real difference between organizations and user groups is that an organization is not restricted to containing only users. Rather, an organization can contain parties, i.e., users or other organizations. In essence, organizations are an extension of user groups to support hierarchies, which occur frequently in real-world situations. The Intranet module is a good example of this; its data model defines "customer" and "project" user group types that naturally form a hierarchy.

acs_objects

Each row in the acs_objects table represents an instance of the pre-defined object type "acs_object". The acs_object type is analogous to Java's java.lang.Object class, which is the ultimate supertype of every Java object type (i.e., class). Similarly, acs_object is intended to be the ultimate supertype of every ACS object type (row in acs_object_types). This "intended to be" qualifier is significant, as it reveals that the Java analogy is not perfect: Unlike Java, ACS does not require that every object type be a subtype of the root type.

As it happens, acs_object is the supertype of ACS Core's pre-defined object types, but it bears repeating that subtyping acs_object is not mandatory. Moreover, you can still use the knowledge-level model to store metadata about your application-specific object types, even if they are not subtypes of acs_object. In this way, ACS Core adheres to the design principle of loose coupling. The developer can decide for herself: "Do I want to use only the metadata system? Do I want to use the metadata system and the acs_object framework? Do I not want to use either?"

Thus the acs_objects table identifies the object type, blah blah blah, basically RECAP to make clear.

Of course, we hope that the benefits of subtyping acs_object are compelling enough that ACS developers and application developers will choose to subtype acs_object most of the time. Here is a cost/benefit analysis to help you make up your own mind:

WE MAY WANT TO MOVE THE FOLLOWING DOWN INTO THE "BENEFITS" SECTION

So what exactly is required to subtype acs_object?

It's basically a two-step process.

  1. Create a table to store instances of your object type (or, more precisely, the values of type-specific attributes for instances of your object type); make its primary key an integer column that is also a foreign key from its supertype's table.

  2. Insert rows into the metadata model:

    • one row into acs_object_types for your object type (setting the value of the supertype column to be either "acs_object" or the name of an existing acs_object subtype)
    • one row into acs_attributes for each attribute unique to this type
    • one row into acs_rel_types for each type of permissible relationship for instances of this type

    The <<NOT YET DEFINED>> API simplifies the process of creating this metadata.

What are the benefits of subtyping acs_object?

  1. The baseline auditing columns mentioned above are included in the acs_objects table:

    • creation_user
    • creation_date
    • creation_ip
    • last_modified
    • modifying_user
    • modifying_ip

    Thus, the aforementioned Audit Trail convention is automatically enforced for acs_object subtypes.

  2. ...

acs_attribute_values

The purpose of this table is to provide Attribute Value Storage for any acs_object_type. The combination of acs_attributes and acs_attribute_values should supersede the proliferation of ad hoc "skinny" tables described above.

acs_permissions

The ACS Core 4.0 analog of general_permissions, the acs_permissions table illustrates the simplifying effects of the 4.0 enhancements vividly. Compare:

create table general_permissions (
	permission_id		integer not null primary key,
	on_what_id		integer not null,
	on_which_table		varchar(30) not null,
        scope           	varchar(20),
	user_id			references users,
	group_id		references user_groups,
	role			varchar(200),
	permission_type		varchar(20) not null,
	check ((scope = 'user' and user_id is not null
                and group_id is null and role is null) or
	       (scope = 'group_role' and user_id is null
                and group_id is not null and role is not null) or
	       (scope = 'group' and user_id is null
                and group_id is not null and role is null) or
	       (scope in ('registered_users', 'all_users')
                and user_id is null
                and group_id is null and role is null)),
	unique (on_what_id, on_which_table,
                scope, user_id, group_id, role, permission_type)
);
with:
create table acs_permissions (
	object_id		not null
				constraint acs_permissions_on_what_id_fk
				references acs_objects (object_id),
	grantee_id		integer not null,
	action			varchar(100) not null,
	constraint acs_permissions_pk
	primary key (object_id, grantee_id, action)
);
Five of the general_permissions columns have collapsed into two, and the convoluted check constraint and uniqueness constraint are no longer necessary.

That said, acs_permissions is semantically equivalent to general_permissions. Accordingly, ACS Core 4.0 preserves the ACS 3.x general_permissions API (though we also introduce the acs_permission API, which new code should use instead). RAFI, PETE: add stuff about general_objects, acs_rels; maybe in appendix a

Benefits

Many of the benefits of the ACS Core 4.0 data model have been discussed above. Here are some others that have not been mentioned yet: Also, here is a recap of the benefits we have already discussed:

Costs

The costs associated with adopting the 4.0 data model fall into two basic categories: read performance and write performance.

Read performance

In the 4.0 data model, there is one row in the acs_objects table for each instance of every acs_object subtype. That is, the acs_objects table will have a row for each user, each organization, each content item, each subsite, each APM package, and so on. Clearly, the acs_objects table will be large, given that a successful site will have millions of users, millions of content items, etc.

So the natural question is: What are the performance and scalability implications of this "one big table" model? To answer this question accurately, we need to consider how acs_objects will be used in practice.

Use of acs_objects in queries
We need to first understand what situations require us to query the acs_objects table at all, because many common scenarios that involve instances of the acs_object type do not need to involve the acs_objects table.

For instance, consider the login process: In ACS 3.4, the login process involves two queries:

  1. a query of the users table to ensure that the specified email address belongs to an existing user (if not, the user registration process begins)
  2. a query of the users table to ensure that the specified password is correct

In ACS 4.0, the login process involves the same number of queries (albeit slightly different ones):

  1. a query of the parties table and the users table to ensure that the specified email address belongs to an existing user (the join is needed because email is stored in parties)
  2. a query of the users table to ensure that the specified password is correct

Note that, even though we've made user a subtype of acs_object, the acs_objects table is not involved in either of these two queries. The only real difference between ACS 3.x and 4.0 in this situation is one extra join to the parties table. Admittedly, since it is the superset of users, parties may well be a very large table itself, but the performance impact of the extra unique index scan that the join requires is negligible, even for tables with millions of rows. (If the impact ever did become significant, you could denormalize the email column into the users table to eliminate the join.)

Note also that we do not need to query the persons table, even though person falls between party and user in the object type hierarchy. This is because the user_id of each row in the users table identifies exactly one row in the persons table, exactly one row in the parties table, and exactly one row in the acs_objects table. To illustrate this point, compare:

select pa.email, u.password
from parties pa, pe.persons, users u
where pa.party_id = pe.person_id
and pe.person_id = u.user_id
and u.user_id = :user_id
with:
select p.email, u.password
from parties p, users u
where p.party_id = u.user_id
and u.user_id = :user_id
These two queries return the same information, but the second query eliminates a superfluous join from the first one.

In general terms, within the acs_objects framework, a row identifies exactly one row in each of its supertype tables. This enables you to join between any two levels of the type hierarchy, skipping intervening levels if your select clause does not include columns from those tables, an interesting facet of the "object-relational" nature of the 4.0 data models.

JON POINTED OUT TO ME THAT WE ACTUALLY DO NEED TO QUERY ACS_OBJECTS A LOT, BECAUSE THAT'S WHERE CONTEXT_ID LIVES, WHICH WE NEED FOR HANDLING SUBSITES

Hence, we only need to include acs_objects in queries that have columns from acs_objects (e.g., creation_date, object_type) in their SELECT and/or WHERE clauses, i.e., queries like "Show me all content items created today" and "What user last modified this object?"

Performance Optimizations
A properly configured Oracle database can perform index-scan queries on tables with tens or even hundred of millions rows very quickly, so, in situations where we're targeting a specific object or set of objects (i.e., we have an object_id or a set of object_ids to put in the WHERE clause), querying against acs_objects should not be a problem. In addition, we can use a number of optimizations that can dramatically improve performance and scalability.

We also plan to work proactively with the Scaling team to perform load testing on the 4.0 data model in order to quantify its performance and scalability characteristics.

Write performance

As alluded to above, one consequence of having the data model formally recognize the levels of abstraction that exist in the problem space (i.e., that a user is a person, which is a party, which is an object) is that the data of one particular object, which previously resided all in one table, now resides in multiple tables: at least two tables for any acs_object subtype, and four in the case of the user type.

This fact impacts not only read performance (i.e., the extra joins required to combine attributes from different levels of abstraction in one query) but also write performance. For example, creating a user used to require one INSERT statement; now, it requires four (performed by the implementation of the user.create PL/SQL function). The same is true for updates: to update every attribute of a given user, you need four UPDATE statements, one for each table in which attribute vaues for the user are stored.

In most cases, the overhead of the extra DML statements required by a data model with more levels of abstraction is not a serious concern, given the fact that transactions like user registration are relatively infrequent. However, when using the 4.0 model, you should take this cost into consideration and decide on a case-by-case basis whether or not the model fits your needs. There are certainly applications where the benefits of the acs_objects framework do not outweigh the costs, e.g., for Clickstreaming, we would definitely not make "click" an acs_object subtype.

Issues

  1. Should we make user a subtype of person instead of party?
  2. How should we handle indirect membership queries? <<RAFI HAS AN ANSWER FOR THIS>>
  3. What can we do to minimize backwards-incompatibility (e.g., create views to stand in for obsolete legacy tables like user_group_map)?
  4. What about multiple inheritence? From Luis, "Was there a conscious decision not support multiple inheritance? ACS 4.0 doesn't have to support multi-inheritance, but if the underlying data model does, then you have room to utilize multi-inheritance when necessary. I wrote a paper on making multi-inheritance super-efficient (on my web site: www.lcrm.com, titled "Efficient Method Dispatch in PCL". I think the only change you need to make to the model is that the supertype table is a separate table, since one "object" can have multiple supertypes. BTW - there's a *ton* of research material on cool and optimized Object-Oriented implementations."

Appendix A: The Complete Knowledge-Level Model

other tables not discussed above like acs_rel_types, etc.

Appendix B: The Complete Operational-Level Model

other tables not discussed above like general_objects, acs_rels, etc.

Appendix C: Referential Integrity

As part of 4.0 development, we are defining APIs that will help to decouple ACS-based applications from the internals (i.e., the data model) of ACS Core, so that those internals can be modified (e.g., to optimize its performance) without affecting dependent code.

One area where we anticipate the need to break abstraction is in the interface between application-specific data models and the Core data models. If we were to be puritanical about abstraction, we would forbid the use of referential integrity constraints between application-specific tables and Core tables, because some types of change in the Core tables could then impact applications built on top of the Core.

However, from an engineering perspective, the benefits of referential integrity (i.e., reliable data quality) outweigh the costs of breaking abstraction (i.e., the potential need to adjust the application data model in response to certain types of change within the Core), so we recommend the use of referential integrity between ACS Core and application-specific data models.


michael@arsdigita.com
rhs@arsdigita.com

Last Modified: overview.html,v 1.1 2001/01/21 01:38:18 bquinn Exp