user_group_types, user_group_type_fields,
and *_info tables)
on_which_table +
on_what_id)
user_group_types, user_group_type_fields,
and *_info tables)user_groups table) by:
user_group_types, user_group_type_fields,
etc.)
<group_type>_info"
The original vision for the Content Repository (see the section "A modest proposal: one content table") also includes the concept of type specialization:
If Oracle were a true object database, we could have tables that inherited from theall_contenttable but supplemented it with extra columns (e.g.,refers_toin the case ofbboard). But Oracle isn't so we will probably have to resort to kludges like the_infohelper tables that we have for user groups.
Additionally, the metadata in tables like
user_group_type_fields is used to generate a user
interface for manipulating the specific attributes of different user
group types (the /admin/ug/ pages) automatically.
Another of ACS 4's design goals is to enable broader usage of this type of metadata:
We need metadata inside Oracle (so that it is available to all clients, whether AOLserver or other) that, for any column in the database saysIn other words, the goal is to generalize the attribute metadata model devised for user group types.
- what its pretty name is
- how it is to be presented to the user for form input (e.g., text box, textarea, radio buttons)
- if a radio button or checkbox presentation, what the legal choices are (the survey-simple.sql data model may be useful here)
- what validity tests are to be performed on the user's input (it would be nice to use an interpreted language like Tcl here so that we are fully general but probably it is best to restrict ourselves to things that can be easily evaluated within Java within Oracle)
on_which_table +
on_what_id)on_which_table and
on_what_id, respectively. (There are
exceptions, like the site_wide_index table, which has
columns named table_name and the_key.)
The on_which_table + on_what_id idiom is
useful in situations where you want to treat heterogenous types of
object uniformly. Examples of this idiom in use include:
site_wide_content_map table
to "[store] all the mappings of content to categories... [where]
content items [are] stored in disparate Oracle tables"
site_wide_index
table to "[tie] together rows from different tables to a single
interMedia index column" so that "for example, bulletin board postings
and classified ads are searched at the same time as static documents"
on_which_table/on_what_id
theme is the
Module Scoping design introduced in ACS 3.0, which uses a triple
of columns (scope,
user_id, and
group_id) to identify the "scope" with
which a given object in the database is associated.
"Scope" is a term best explained by example. Consider some
hypothetical rows in the address_book table:
The first row represents an entry in User 123's personal address book, the second row represents an entry in User Group 456's shared address book, and the third row represents an entry in the site's public address book.
... scopeuser_idgroup_id... ... user123... ... group456... ... public...
In this way, the scoping columns identify the context in which a given object belongs, where each context is either a person or a group of people or the general public (itself a group of people).
The General Permissions package lets you control who can do what with each row in your database, by providing:There are five types of party to whom permissions can be granted:
- an API for defining and querying permission rules - Each rule grants a specific type of permission such as "read" or "write" (i.e., the "what") on one row in the database to a specific party (i.e., the "who")
- a simple means to enforce those rules throughout the site - One line of code at the top of each page identifies the required permission and ensures that each user who attempts to view the page has been granted that permission.
In essence, General Permissions treats rows in the database as operating systems like Unix and Microsoft Windows NT treat files in the filesystem.
- An individual user
- Any user who play a specific role in a specific user group
- A specific user group, i.e., any user who is a member of the group
- All registered users
- All users, registered or not
user_group_type_member_fields and
user_group_member_fields tables define attributes for
members of groups of a specific type and for members of a specific
group, respectively. The user_group_member_field_map
table stores values for both categories of attribute in its
field_value column.
ec_custom_product_fields table defines attributes for
catalog products, and the ec_custom_product_field_values
table stores values for those attributes.
ph_custom_photo_fields
table defines attributes for the photographs owned by a specific user,
and tables named according to the convention
"ph_user_<user_id>_custom_info" are used to store
values for those attributes.
users_preferences
table, which stores preferences for registered users in columns such
as prefer_text_only_p and
dont_spam_me_p. The "standard" way for an ACS 3.x-based
application to add to the list of user preferences is to add a column
to the users_preferences table (exactly the kind of data
model change that has historically complicated the process of
upgrading to a more recent ACS version).
The following column names are ArsDigita Community System standards.Additionally, there is a defined set of conventions for maintaining comprehensive audit trails on the rows in a given table, that includes a naming convention for audit tables (the
- creation_date - date row was created
- creation_user - user_id of the creator
- creation_ip_address - ip address of the creator
- last_modified - date row was last modified
- last_modifying_user - user_id of the user last modifying the row
- modified_ip_address - ip address of the last modifying user
- ...
_audit suffix) and
guidelines for writing auditing triggers.
The operational level depends on the knowledge level, so we discuss the knowledge level first.
acs_object_types
acs_object_types represents a specific
type of object, just as each row in user_group_types
represents a specific type of user group. For each type, we store:
pretty_name and
pretty_plural columns)
supertype (needed to compensate for the
fact that Oracle is not a true object database, as mentioned above)
table_name) and the name of that
table's primary key column (id_column)
There is no analog for the supertype column in
user_group_types, which supports a single level of
specialization. That is, you can create special types of user group
(e.g., "company"), but you cannot further specialize those
specializations (e.g., "Corporation", "Partnership", "LLC").
The User Groups model does have an analog for the
table_name and id_column columns: the
*_info naming convention and the associated convention
that the primary key of all *_info tables will be the
group_id column.
acs_attributes
acs_attributes represents an attribute of
a specific object type, e.g., "password" is an attribute of the "user"
type. Along with attribute_name and
object_type, we store:
attribute_type, a foreign key from the
acs_attribute_types helper table): e.g., boolean, string,
number, date
min_n_values and
max_n_values, respectively): one and
only one in most cases; a min_n_values of zero implies an
optional attribute, while a max_n_values that is greater
than one or null (i.e., no maximum) implies an attribute that permits
multiple values
In most cases, an attribute maps directly to a column in the table
identified by the table_name of the corresponding object
type, although, as mentioned above, we sometimes store attribute
values as key-value pairs in a "skinny" table. However, when you ask
the question "What are the attributes of this type of object?", you
don't really care about how the values for each attribute are stored
(in a column or as key-value pairs); you expect to receive the
complete list of all attributes.
Thus, acs_attributes consolidates metadata about both
forms of attribute. It contains a
storage column to discriminate between:
column_name in the table identified by
its object_type's table_name
acs_attribute_values table, a generic "skinny"
table for storing key-value pairs (see below)
Since storage is specified for each attribute, instead of
for the object type as a whole, you can use a combination of
type-specific and generic storage for a given object type, making this
model more flexible than the user_group_type_fields
design, which dictates that all attribute values for a user group type
be stored in a type-specific *_info table.
The acs_attributes model satisfies most of the metadata
requirements listed above:
pretty_name column of
acs_attributes
acs_enum_values table
(The only requirement that the ACS Core 4.0 data model does not currently meet is: "what validity tests are to be performed on the user's input". We plan to support this requirement in version 4.1 of ACS Core.)
In summary, ACS Core 4.0's knowledge-level model builds upon the pre-4.0 User Groups model to provide a general means for type specialization and storing attribute metadata. In object-oriented programming terminology, the knowledge-level model allows us to create object types by inheriting from existing object types.
(For more detail on the complete knowledge-level model, see Appendix A below.)
users with some new ones. The purpose of introducing
these new entities is to simplify the rest of ACS by providing:
parties
In his excellent book Data Model Patterns, David Hay writes:
People and organizations share many attributes and many relationships to other entities. A corporation is, after all, a "legal person." Both people and organizations have "names" and "addresses" as attributes, and both may be parties to contracts. For this reason, while PERSON and ORGANIZATION are things of significance, so too is the super-set of the two, which we shall here call PARTY.Using different terminology ("user" instead of "person" and "user group" instead of "organization"), the Module Scoping design in ACS 3.0 introduced the "party" abstraction, though not in the form of a
parties table. Rather, the triple of scope,
user_id, and group_id columns was used to
identify a party, i.e., something that is either a person or a group
of people. One disadvantage of this design convention is that it
increases a data model's complexity by requiring you to:
scope value of "group" but
a null group_id)
Here is an actual example of this from ACS 3.3 - the
calendar_categories table (part of the Calendar module):
create table calendar_categories (
category_id integer primary key,
-- if scope=public, this is a calendar category the whole system
-- if scope=group, this is a calendar category for a particular group
-- if scope=user, this is a calendar category for a user
scope varchar(20) not null,
group_id references user_groups,
user_id references users,
category varchar(100) not null,
enabled_p char(1) default 't' check(enabled_p in ('t','f')),
constraint calendar_category_scope_check check ((scope='group' and group_id is not null) or
(scope='user' and user_id is not null) or
(scope='public')),
constraint calendar_category_unique_check unique(scope, category, group_id, user_id)
);
With the introduction of parties, this table becomes simpler:
create table calendar_categories (
category_id integer primary key,
party_id references parties,
category varchar(100) not null,
enabled_p char(1) default 't' check(enabled_p in ('t','f')),
);
The real advantage of the latter model is only apparent when you compare code samples for the two models:
# assuming that category_id is passed in as form data
db_1row one_enabled_calendar_category {
select category, scope, group_id, user_id
from calendar_categories
where category_id = :category_id
and enabled_p = 't'
}
switch $scope {
group {
set owner [db_string {
select group_name
from user_groups
where group_id = :group_id
}]
}
user {
set owner [db_string {
select first_names || ' ' || last_name
from users
where user_id = :user_id
}]
}
public {
set owner "Public"
}
}
The 4.0 model allows you simply to write (and save a db hit):
# assuming that category_id is passed in as form data
db_1row one_enabled_calendar_category {
select category, party.name(party_id) as owner
from calendar_categories
where category_id = :category_id
and enabled_p = 't'
}
In the Scoping example, the work of resolving the scoping
column values into a presentable name is the responsibility of the
developer. In contrast, ACS Core 4.0 does that work for you,
by providing the name function of the party
PL/SQL package, one of the APIs defined by ACS Core 4.0 (more on those
later).
# assuming that scope (and maybe group_id or user_id) is passed in as
# form data
set enabled_categories_by_scope_sql {
select category_id, category
from calendar_categories
where enabled_p = 't'
and scope = :scope
}
# in reality, you'd want to use the ad_scope_sql Tcl proc to construct
# this criterion, which basically performs the same function as this
# switch statement
switch $scope {
group {
append enabled_categories_by_scope_sql {
and group_id = :group_id
}
}
user {
append enabled_categories_by_scope_sql {
and user_id = :user_id
}
}
public {
# do nothing
}
}
db_foreach enabled_categories_by_scope $enabled_categories_by_scope_sql {
...
}
Using the ACS 4.0 model again simplifies the developer's job:
# assuming that party_id is passed in as form data
db_foreach enabled_categories_by_scope {
select category_id, category
from calendar_categories
where enabled_p = 't'
and party_id = :party_id
} {
...
}
In both of the above examples, we don't really care whether the party
is a user or a group of users or any user (the public). This is often
the case in real-world situations, i.e., we often want to treat users
and groups as the same basic type of object. This is the fundamental
reason why parties is a useful abstraction.
For those situations when we do care about the precise type of a
party, ACS Core 4.0 provides the type function of the
acs_object PL/SQL package (and its corresponding Tcl
wrapper, acs_object_type). Here's a PL/SQL sample to
illustrates its usage:
and here's a Tcl sample:party_type := acs_object.type(party_id); if party_type = 'user' then ... elsif party_type = 'organization' then ... end if;
switch [acs_object_type $party_id] {
group {
...
}
user {
...
}
default {
...
}
}
To recap, the parties model and its accompanying APIs
achieve the same design goals as Scoping, with less work required on
the part of the programmer.
Note that we include email and url in the parties model
because both persons and organizations are often referenced by an
email address (in the case of organizations, the email address is
usually a mailing list or alias to the organization's representative)
and URL.
persons
The idea of introducing a "person" entity that is the supertype of "user" was suggested by David Jones of MIT Press.
The persons table contains a subset of the columns from
the ACS 3.x users table, specifically those attributes
that all people (whether registered users or not) have, e.g.,
first_names, last_name. (It omits columns
like password that only a registered user would ever
have.)
The value of this extra layer of abstraction is that we can represent and treat people uniformly, whether they are registered users or not. To date, we have represented people who are not registered users in a number of different ways, e.g.:
address_book table contains
first_names and last_name columns (just like
the users table).
edu_subjects table (part of the Education module)
has a professors_in_charge column that is a
varchar(200) used for storing opaque strings like
"Prof. Smith, Prof. Jones".
In both instances, what we really want is to reference a person who
may or may not be a registered user. Today, creating an address book
entry for a registered user requires that we copy data from the
users_contact table into the address_book
and then keep it in sync manually, not an ideal situation. While we
could handle this by adding a nullable user_id column to
the address_book table and program logic to handle this
special case, a cleaner solution would be to replace the
first_names and last_name columns with a
person_id foreign key. Similarly, in the Education
module, we could replace the professors_in_charge column
with a mapping table between edu_subjects and
persons.
The persons table also proves useful in the Content
Repository: By representing the author of a content item as a person
instead of a user, we can treat user-contributed content (e.g.,
comments and forum postings) and non-user-contributed content (e.g.,
news articles written by reporters who aren't users, syllabi written
by professors who aren't users) uniformly in many situations, again
resulting in less program logic to handle special cases.
The downside of this extra layer of abstraction is that we can no
longer query the users table directly for person-level
attributes, requiring us to add joins to queries that select both
user-level and person-level attributes. For convenience, ACS Core 4.0
does define the cc_users view ("cc" stands
for "Community Core" - other suggestions for naming this view are
welcome) that combines user-level attributes (columns of the
users table) with those of its supertypes; of course,
this view doesn't remove the cost of the extra joins that are
required. There's more discussion on this subject below, in the
Cons section below.
users
The familiar users table remains, although it is now has
less columns, because, as mentioned above, it inherits many of its
attributes from its supertypes, e.g., it inherits
first_names and last_name from
person, and email and url from
party.
organizations
As alluded to above, the "organization" entity is ACS Core 4.0's analog for "user group." The only real difference between organizations and user groups is that an organization is not restricted to containing only users. Rather, an organization can contain parties, i.e., users or other organizations. In essence, organizations are an extension of user groups to support hierarchies, which occur frequently in real-world situations. The Intranet module is a good example of this; its data model defines "customer" and "project" user group types that naturally form a hierarchy.
acs_objects
Each row in the acs_objects table represents an instance
of the pre-defined object type "acs_object". The
acs_object type is analogous to Java's
java.lang.Object class, which is the ultimate supertype
of every Java object type (i.e., class). Similarly,
acs_object is intended to be the ultimate supertype of
every ACS object type (row in acs_object_types). This
"intended to be" qualifier is significant, as it reveals that the Java
analogy is not perfect: Unlike Java, ACS does not
require that every object type be a subtype of the root type.
As it happens, acs_object is the supertype of ACS Core's
pre-defined object types, but it bears repeating that subtyping
acs_object is not mandatory. Moreover,
you can still use the knowledge-level model to store metadata about
your application-specific object types, even if they are not subtypes
of acs_object. In this way, ACS Core adheres to the
design principle of loose coupling. The developer can
decide for herself: "Do I want to use only the metadata system? Do I
want to use the metadata system and the acs_object
framework? Do I not want to use either?"
Thus the acs_objects table identifies the object type,
blah blah blah, basically RECAP to make clear.
Of course, we hope that the benefits of subtyping
acs_object are compelling enough that ACS developers and
application developers will choose to subtype acs_object
most of the time. Here is a cost/benefit analysis to help you make up
your own mind:
WE MAY WANT TO MOVE THE FOLLOWING DOWN INTO THE "BENEFITS" SECTION
So what exactly is required to subtype
acs_object?
It's basically a two-step process.
acs_object_types for your object type
(setting the value of the supertype column to be either
"acs_object" or the name of an existing
acs_object subtype)
acs_attributes for each attribute unique
to this type
acs_rel_types for each type of
permissible relationship for instances of this type
The <<NOT YET DEFINED>> API simplifies the process of creating this metadata.
What are the benefits of subtyping
acs_object?
acs_objects table:
creation_user
creation_date
creation_ip
last_modified
modifying_user
modifying_ip
Thus, the aforementioned Audit Trail
convention is automatically enforced for acs_object
subtypes.
acs_attribute_values
The purpose of this table is to provide Attribute Value Storage for any
acs_object_type. The combination of
acs_attributes and acs_attribute_values
should supersede the proliferation of ad hoc "skinny" tables
described above.
acs_permissions
The ACS Core 4.0 analog of general_permissions, the
acs_permissions table illustrates the simplifying effects
of the 4.0 enhancements vividly.
Compare:
create table general_permissions (
permission_id integer not null primary key,
on_what_id integer not null,
on_which_table varchar(30) not null,
scope varchar(20),
user_id references users,
group_id references user_groups,
role varchar(200),
permission_type varchar(20) not null,
check ((scope = 'user' and user_id is not null
and group_id is null and role is null) or
(scope = 'group_role' and user_id is null
and group_id is not null and role is not null) or
(scope = 'group' and user_id is null
and group_id is not null and role is null) or
(scope in ('registered_users', 'all_users')
and user_id is null
and group_id is null and role is null)),
unique (on_what_id, on_which_table,
scope, user_id, group_id, role, permission_type)
);
with:
Five of thecreate table acs_permissions ( object_id not null constraint acs_permissions_on_what_id_fk references acs_objects (object_id), grantee_id integer not null, action varchar(100) not null, constraint acs_permissions_pk primary key (object_id, grantee_id, action) );
general_permissions columns have collapsed
into two, and the convoluted check constraint and uniqueness
constraint are no longer necessary.
That said, acs_permissions is semantically equivalent to
general_permissions. Accordingly, ACS Core 4.0 preserves
the ACS 3.x general_permissions API (though we also
introduce the acs_permission API, which new code should
use instead).
RAFI, PETE: add stuff about general_objects,
acs_rels; maybe in appendix a
acs_object_types,
acs_attributes, etc. is the key to providing powerful
services like auto-generated admin pages, automatic auditing, etc.
In the 4.0 data model, there is one row in theacs_objectstable for each instance of everyacs_objectsubtype. That is, theacs_objectstable will have a row for each user, each organization, each content item, each subsite, each APM package, and so on. Clearly, theacs_objectstable will be large, given that a successful site will have millions of users, millions of content items, etc.So the natural question is: What are the performance and scalability implications of this "one big table" model? To answer this question accurately, we need to consider how
acs_objectswill be used in practice.Use of
We need to first understand what situations require us to query theacs_objectsin queriesacs_objectstable at all, because many common scenarios that involve instances of theacs_objecttype do not need to involve theacs_objectstable.For instance, consider the login process: In ACS 3.4, the login process involves two queries:
- a query of the
userstable to ensure that the specified email address belongs to an existing user (if not, the user registration process begins)- a query of the
userstable to ensure that the specified password is correctIn ACS 4.0, the login process involves the same number of queries (albeit slightly different ones):
- a query of the
partiestable and theuserstable to ensure that the specified email address belongs to an existing user (the join is needed becauseparties)- a query of the
userstable to ensure that the specified password is correctNote that, even though we've made
usera subtype ofacs_object, theacs_objectstable is not involved in either of these two queries. The only real difference between ACS 3.x and 4.0 in this situation is one extra join to thepartiestable. Admittedly, since it is the superset ofusers,partiesmay well be a very large table itself, but the performance impact of the extra unique index scan that the join requires is negligible, even for tables with millions of rows. (If the impact ever did become significant, you could denormalize theuserstable to eliminate the join.)Note also that we do not need to query the
personstable, even thoughpersonfalls betweenpartyanduserin the object type hierarchy. This is because theuser_idof each row in theuserstable identifies exactly one row in thepersonstable, exactly one row in thepartiestable, and exactly one row in theacs_objectstable. To illustrate this point, compare:with:select pa.email, u.password from parties pa, pe.persons, users u where pa.party_id = pe.person_id and pe.person_id = u.user_id and u.user_id = :user_idThese two queries return the same information, but the second query eliminates a superfluous join from the first one.select p.email, u.password from parties p, users u where p.party_id = u.user_id and u.user_id = :user_idIn general terms, within the
acs_objectsframework, a row identifies exactly one row in each of its supertype tables. This enables you to join between any two levels of the type hierarchy, skipping intervening levels if yourselectclause does not include columns from those tables, an interesting facet of the "object-relational" nature of the 4.0 data models.JON POINTED OUT TO ME THAT WE ACTUALLY DO NEED TO QUERY ACS_OBJECTS A LOT, BECAUSE THAT'S WHERE CONTEXT_ID LIVES, WHICH WE NEED FOR HANDLING SUBSITES
Hence, we only need to include
acs_objectsin queries that have columns fromacs_objects(e.g.,creation_date,object_type) in theirSELECTand/orWHEREclauses, i.e., queries like "Show me all content items created today" and "What user last modified this object?"
Performance Optimizations
A properly configured Oracle database can perform index-scan queries on tables with tens or even hundred of millions rows very quickly, so, in situations where we're targeting a specific object or set of objects (i.e., we have anobject_idor a set ofobject_ids to put in theWHEREclause), querying againstacs_objectsshould not be a problem. In addition, we can use a number of optimizations that can dramatically improve performance and scalability.
- For queries such as "Show me all content items created today" (i.e., where no specific
object_idis specified in theWHEREclause), we will use Oracle's Partitioning features to preserve performance: "Partitioning addresses the key problem of supporting very large tables and indexes by allowing you to decompose them into smaller and more manageable pieces called partitions. Once partitions are defined, SQL statements can access and manipulate the partitions rather than entire tables or indexes." The basic idea is that to process a query like "Show me all content items created today," Oracle would only have to look at the partition (or partitions) in which we keep content items, as opposed to scanning the entireacs_objectstable, needlessly looking at users, groups, etc.- Data can be striped across multiple disks with separate disk controllers at either the operating-system or Oracle level, reducing I/O contention for data.
- With Oracle partitioning and/or striping, Oracle Parallel Query Option (PQO) allows multiple processes to split the work required by a select or DML operation so that work can be conducted in parallel on multiprocessor machines. For instance, a query requiring a full table scan on a table with 10M rows striped across 4 disks on a 4 CPU machine could be split across 4 separate parallel processes, one per CPU/disk pair, so that each process would only need to scan 2.5M rows.
- Caching of expensive queries and careful optimization of common queries can also improve performance tremendously. RAFI TALK ABOUT YOUR PRECOMPUTATION TRIGGER STUFF
We also plan to work proactively with the Scaling team to perform load testing on the 4.0 data model in order to quantify its performance and scalability characteristics.
As alluded to above, one consequence of having the data model formally recognize the levels of abstraction that exist in the problem space (i.e., that a user is a person, which is a party, which is an object) is that the data of one particular object, which previously resided all in one table, now resides in multiple tables: at least two tables for anyacs_objectsubtype, and four in the case of theusertype.This fact impacts not only read performance (i.e., the extra joins required to combine attributes from different levels of abstraction in one query) but also write performance. For example, creating a user used to require one
INSERTstatement; now, it requires four (performed by the implementation of theuser.createPL/SQL function). The same is true for updates: to update every attribute of a given user, you need fourUPDATEstatements, one for each table in which attribute vaues for the user are stored.In most cases, the overhead of the extra DML statements required by a data model with more levels of abstraction is not a serious concern, given the fact that transactions like user registration are relatively infrequent. However, when using the 4.0 model, you should take this cost into consideration and decide on a case-by-case basis whether or not the model fits your needs. There are certainly applications where the benefits of the
acs_objectsframework do not outweigh the costs, e.g., for Clickstreaming, we would definitely not make "click" anacs_objectsubtype.
user a subtype of person
instead of party?
user_group_map)?
acs_rel_types, etc.
general_objects,
acs_rels, etc.
One area where we anticipate the need to break abstraction is in the
interface between application-specific data models and the Core data
models. If we were to be puritanical about abstraction, we would
forbid the use of referential integrity constraints
between application-specific tables and Core tables, because some
types of change in the Core tables could then impact applications
built on top of the Core.
However, from an engineering perspective, the benefits of referential integrity (i.e., reliable data quality) outweigh the costs of breaking abstraction (i.e., the potential need to adjust the application data model in response to certain types of change within the Core), so we recommend the use of referential integrity between ACS Core and application-specific data models.
Last Modified: overview.html,v 1.1 2001/01/21 01:38:18 bquinn Exp