Manual System

by kevin@arsdigita.com

User accessible pages: /manuals/
Manual editing pages: /manuals/admin/
Manual administration pages: /admin/manuals/
Data model: /doc/sql/manuals.sql

The Big Picture

This is a system for managing a set of manuals or books through the database. Users can view a dynamically generated table of contents, view sections and comment on sections. Administrators can add, delete, edit and rearrange sections. Manuals can also include figures or use images to decorate their pages.

Printable versions of the manual are produced using HTMLDOC. Readers can download the complete manual in HTML or PDF; PostScript is an option but almost never what you want to offer for download because of size relative to PDF.

As an option, the system can be configured to use CVS to manage version control for section content. If there is any chance of concurrent edits, CVS should be installed and turned on in this module. Tracking locks in the database instead is ugly, and we don't support it: you just end up with a bad reimplementation of CVS.

Our data model

We use three tables to store all content for a manual: manuals, manual_sections and manual_figures.

manuals holds the name of each manual stored on the system. Additional information we keep includes the owner of the manual and the scope of the document (public or restricted to a group).

create table manuals (
	manual_id		integer primary key,
	-- title of the manual
	title			varchar(500) not null unique,
	-- compact title used to generate file names, e.g. short_name.pdf
	short_name		varchar(100) not null unique,
	-- person responsible for the manual (editor-in-chief)
	owner_id		references users(user_id) not null,
	-- a string containing the author or authors which will
	-- be included on the title page of the printable version
	author			varchar(500),
	-- copyright notice (may be null)
	copyright		varchar(500),
	-- string describing the version and/or release date of the manual
	version			varchar(500),
	-- if scope=public, this manual is viewable by anyone
	-- if scope=group, this manual is restricted to group members
	scope			varchar(20) not null,
	-- if scope=group, this is the owning group_id
	group_id		references user_groups,
	-- is this manual currently active?
	active_p		char(1) default 'f' check (active_p in ('t','f')),
	-- notify the editor-in-chief on all changes to the manual
	notify_p		char(1) default 't' check (notify_p in ('t','f')),
	-- ensure consistent state
	constraint manual_scope_check check ((scope='group' and group_id is not null)
	                                     or (scope='public'))
);
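
To make the scope columns concrete, here is a minimal sketch (in Python for illustration, not the system's own code) of how a page might decide whether a user can see a manual. The function name can_view, the row-as-dict shape and the rule that inactive manuals are hidden are assumptions; only the column names and the public/group meaning come from the table above.

```python
def can_view(manual, user_group_ids):
    """Decide whether a user may view a manual.

    manual: dict with the columns scope, group_id and active_p (hypothetical row shape).
    user_group_ids: set of group ids the user belongs to.
    """
    # Assumption: a manual that is not active is hidden from the user pages.
    if manual["active_p"] != "t":
        return False
    if manual["scope"] == "public":
        return True
    # manual_scope_check guarantees group_id is set when scope is 'group'.
    return manual["scope"] == "group" and manual["group_id"] in user_group_ids


public_manual = {"scope": "public", "group_id": None, "active_p": "t"}
group_manual = {"scope": "group", "group_id": 12, "active_p": "t"}
```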

manual_sections holds information about the sections of the manuals:

create table manual_sections (
	section_id		integer primary key,
	-- which manual this section belongs to
	manual_id		integer references manuals not null,
	-- a string we use for cross-referencing this section
	label			varchar(100),
	-- used to determine where this section fits in the document hierarchy
	sort_key		varchar(50) not null,
	-- title of the section
	section_title		varchar(500) not null,
	-- user who first created the section
	creator_id		references users(user_id) not null,
	-- notify the creator whenever content is edited?
	notify_p		char(1) default 'f' check (notify_p in ('t','f')),
	-- user who last edited content for this section
	last_modified_by	references users(user_id),
	-- is there an html file associated with this section?
	content_p		char(1) default 'f' check (content_p in ('t','f')),
	-- determines whether a section is displayed on the user pages
	active_p		char(1) default 't' check (active_p in ('t','f')),
	-- we may want to shorten the table of contents by not displaying all sections
	display_in_toc_p 	char(1) default 't' check (display_in_toc_p in ('t','f')),
	-- make sure that sort_keys are unique within a given manual
	unique(manual_id,sort_key)
	-- want to add the following but can't figure out the syntax
	-- constraint manual_label_check check ((label is null) or (unique(manual_id,label))
);

The sort key uses a system similar to that in the threaded bboard system: sections sort lexicographically and the depth of a section is determined by the length of its sort key. Unlike the bboard system, we use only digits, since that simplifies the code and 100 seems like a reasonable limit on the number of subsections of a given section. Although the keys look like numbers, the database treats them as strings, so always single-quote sort keys in SQL statements. Similarly, be careful to avoid Tcl's hangups with leading zeros (a value such as 08 can be misread as an invalid octal number).
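
As a rough illustration of the sort key scheme, the sketch below (Python, not the system's Tcl) assumes two digits per level, numbered from 00, which is what a limit of 100 subsections suggests. The names depth and next_child_key are hypothetical.

```python
DIGITS_PER_LEVEL = 2  # assumption: two digits per level, so at most 100 children


def depth(sort_key):
    """Nesting depth: 1 for a top-level section, 2 for a subsection, and so on."""
    return len(sort_key) // DIGITS_PER_LEVEL


def next_child_key(parent, existing_keys):
    """Sort key for a new last child of parent ('' means the top level)."""
    child_len = len(parent) + DIGITS_PER_LEVEL
    siblings = [k for k in existing_keys
                if len(k) == child_len and k.startswith(parent)]
    if not siblings:
        return parent + "0" * DIGITS_PER_LEVEL
    # Python's int() reads "08" as decimal eight; Tcl's expr would choke on
    # the leading zero, which is the hangup the text warns about.
    last = int(max(siblings)[-DIGITS_PER_LEVEL:])
    if last + 1 >= 10 ** DIGITS_PER_LEVEL:
        raise ValueError("section already has the maximum number of subsections")
    # zfill keeps the key a fixed-width string so string sorting stays correct.
    return parent + str(last + 1).zfill(DIGITS_PER_LEVEL)


keys = ["00", "0000", "0001", "01", "0100", "010000"]
```

Because the keys are fixed-width strings, a plain string sort of keys already gives table-of-contents order, with each subsection directly after its parent.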

Manual Administration

High-level administration occurs in /admin/manuals/. Here, administrators can add or delete manuals, change owners, authorize editors or otherwise dramatically alter the properties of a manual.

Editorial tasks are handled in /manuals/admin/. Here the editor of a manual can add, delete or edit sections of a manual and manipulate the figures contained in a manual.

When CVS is enabled, the system uses it to support multiple simultaneous editors: several editors can work on section content at the same time without clobbering each other's changes. CVS has the added bonus of keeping a record of what changes were made and by whom.

Figure numbers are generated automatically based on the order in which figures are referenced within the sections of a manual. This requires global processing of the document and can be a relatively expensive operation (compared to the execution time to construct a typical web page). Figure numbers can get out of sync whenever figures are added, removed, or rearranged. A figure-numbering procedure runs nightly to update figure numbers, but this can also be done on demand from the admin page for a manual.
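
The numbering pass can be sketched roughly as follows (Python for illustration, not the system's own code). It assumes that sections are walked in sort key order and that a figure gets its number at the first FIGURE or FIGREF tag that mentions its label. The function name number_figures and the input shape are hypothetical.

```python
import re

# Matches the FIGURE and FIGREF tags described under Figures and References.
FIGURE_MENTION = re.compile(r'<(?:FIGURE|FIGREF)\s+NAME="([^"]+)"\s*>', re.IGNORECASE)


def number_figures(sections):
    """sections: iterable of (sort_key, html) pairs. Returns {label: figure number}."""
    numbers = {}
    # Sort keys are fixed-width strings, so string order is document order.
    for _sort_key, html in sorted(sections, key=lambda s: s[0]):
        for label in FIGURE_MENTION.findall(html):
            if label not in numbers:
                numbers[label] = len(numbers) + 1
    return numbers


example = [
    ("01", '<FIGREF NAME="b"> <FIGURE NAME="b">'),
    ("00", '<FIGURE NAME="a">'),
    ("0000", 'see <FIGREF NAME="a"> <FIGURE NAME="c">'),
]
```

The whole manual has to be read to number a single figure, which is why this is more expensive than serving one page.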

HTMLDOC

We run a nightly proc to shove all the parts of each manual into one big file then run HTMLDOC on it to generate PostScript and PDF versions of the manual. This is easy.
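
A minimal sketch of that nightly step, in Python rather than the system's own code. The helper names are hypothetical, and the --book, -t and -f options are the ones documented for current HTMLDOC releases; we have not checked them against version 1.7.

```python
def combine_sections(sections):
    """sections: iterable of (sort_key, html_body). Returns one body in TOC order."""
    ordered = sorted(sections, key=lambda s: s[0])
    return "\n".join(body for _sort_key, body in ordered)


def htmldoc_command(short_name, input_path, fmt):
    """Build the argument list for one HTMLDOC run; fmt is 'pdf' or 'ps'."""
    if fmt not in ("pdf", "ps"):
        raise ValueError("unsupported output format: " + fmt)
    # The output file is named after the manual's short_name, e.g. short_name.pdf.
    return ["htmldoc", "--book", "-t", fmt, "-f", short_name + "." + fmt, input_path]
```

The returned list can be handed to subprocess.run once the combined file has been written to input_path.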

The hard part is getting around how braindead HTMLDOC is. First, it requires a strict hierarchy of heading tags. We accomplish this by forbidding authors from inserting any <H#> tags by hand. All heading tags are generated on the fly at the appropriate level based on the table of contents for the manual.
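
Generating the heading from the table of contents might look like the sketch below (Python for illustration). It assumes the level is the sort key length divided by two digits per level, and the check for hand-written headings is one possible way to enforce the rule, not necessarily what the system does. Both function names are hypothetical.

```python
import re
from html import escape

DIGITS_PER_LEVEL = 2  # assumption: two digits of sort key per level
HEADING_TAG = re.compile(r"</?h[1-6]\b[^>]*>", re.IGNORECASE)


def heading_for(sort_key, section_title):
    """Heading tag for a section, at the level implied by its sort key."""
    # HTML only defines H1 through H6, so clamp the level to that range.
    level = max(1, min(len(sort_key) // DIGITS_PER_LEVEL, 6))
    return f"<H{level}>{escape(section_title)}</H{level}>"


def has_handwritten_headings(body):
    """True if an author put an <H#> tag into the section content."""
    return HEADING_TAG.search(body) is not None
```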

It turns out that this is not the only stupidity in HTMLDOC's parser. Things like <p><u>some text</p></u> also confuse it. While most HTML editors are standards-compliant in this respect, it seems that MS Word really likes to produce stuff like this. I don't know of any solution to this other than strongly encouraging authors not to use MS Word to generate their documents. Since the HTML produced by Word is deficient in several other ways, this probably won't be a big problem.
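
One possible pre-check, which is not part of the system, is to scan uploaded content for this kind of mis-nesting before it ever reaches HTMLDOC. The sketch below uses Python's standard html.parser. The lists of void and optionally-closed tags are deliberately incomplete and the class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Tags that never take a closing tag, including this system's reference tags.
VOID = {"br", "hr", "img", "input", "meta", "link", "figure", "figref", "secref"}
# Tags whose closing tag HTML lets authors omit.
OPTIONAL_END = {"p", "li", "dt", "dd", "tr", "td", "th"}


class NestingChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if tag not in self.stack:
            self.problems.append(f"</{tag}> closes nothing that is open")
            return
        # Unwind to the matching open tag, reporting anything left open inside it.
        while self.stack[-1] != tag:
            skipped = self.stack.pop()
            if skipped not in OPTIONAL_END:
                self.problems.append(f"<{skipped}> still open at </{tag}>")
        self.stack.pop()


def find_misnested(html_text):
    checker = NestingChecker()
    checker.feed(html_text)
    checker.close()
    return checker.problems
```

An empty result does not prove HTMLDOC will accept the file; it only rules out the overlapping-tag pattern shown above.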

Yet another fun aspect of HTMLDOC is that it seems to have some problems with images if an absolute path is given, although I'm just guessing here since this doesn't seem to be documented.

Currently we are using HTMLDOC version 1.7. The latest version is 1.8.4. Possibly some of these problems are solved in later releases. However, there doesn't seem to be a version history on their web page, so the only way to find out seems to be to download the new version and install it and see.

Figures and References

To handle figure and section references in an evolving document we have developed a reference system which is an extension of HTML. This system allows authors to refer to images and sections without knowing where the image is stored, where it appears in the text or what the numbering of a particular section happens to be.

Each section and figure has a label stored in the database, which is used to make references. Instead of using IMG tags, authors insert images with the tag

<FIGURE NAME="label">

References to figures in the text use

<FIGREF NAME="label">

A similar construct is used for referring to sections:

<SECREF NAME="label">

When a manual section is served, the above tags are replaced as follows:

<SECREF NAME="label"> becomes
    <A HREF="section-view?manual_id=$manual_id&section_id=$section_id">$section_name</A>

<FIGREF NAME="label"> becomes
    <A HREF="#label">Figure $sort_key</A>
    -or-
    <A TARGET=OTHER HREF="figure-view?manual_id=$manual_id&figure=label">Figure $sort_key</A>

<FIGURE NAME="label"> becomes
    <A NAME="label">
    <IMG SRC="$file_name" ALT="label" HEIGHT=$height WIDTH=$width>
    <P>Figure $sort_key: $caption</P>

with values pulled out of the database as appropriate.
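
The substitution can be sketched with a single regular expression pass (Python for illustration, not the system's own code). The lookups are plain dicts keyed by label, standing in for the database queries. The dict shapes and the function name expand_tags are assumptions, and only the in-page form of FIGREF is shown.

```python
import re

REF_TAG = re.compile(r'<(SECREF|FIGREF|FIGURE)\s+NAME="([^"]+)"\s*>', re.IGNORECASE)


def expand_tags(body, manual_id, sections, figures):
    """Replace SECREF, FIGREF and FIGURE tags in body with plain HTML."""
    def replace(match):
        kind, label = match.group(1).upper(), match.group(2)
        if kind == "SECREF":
            sec = sections[label]
            href = f"section-view?manual_id={manual_id}&section_id={sec['section_id']}"
            return f'<A HREF="{href}">{sec["title"]}</A>'
        fig = figures[label]
        if kind == "FIGREF":
            # In-page anchor form; the figure-view link form is omitted here.
            return f'<A HREF="#{label}">Figure {fig["sort_key"]}</A>'
        return (f'<A NAME="{label}">\n'
                f'<IMG SRC="{fig["file_name"]}" ALT="{label}" '
                f'HEIGHT={fig["height"]} WIDTH={fig["width"]}>\n'
                f'<P>Figure {fig["sort_key"]}: {fig["caption"]}</P>')
    return REF_TAG.sub(replace, body)


sections = {"intro": {"section_id": 7, "title": "Introduction"}}
figures = {"arch": {"sort_key": 2, "file_name": "arch.gif",
                    "height": 100, "width": 200, "caption": "Architecture"}}
```

A label missing from either dict raises KeyError here; in the real system unknown labels are caught when the section is uploaded.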

When the text of a section is uploaded or edited, we parse the file to look for any references which aren't already in the database. References to nonexistent sections are not allowed and the user must go back and change the offending reference. References to unknown figures send the user to a page where they can upload a figure from their hard drive to the server.
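
The upload check might be sketched like this (Python for illustration). The sets of known labels stand in for database lookups, and the function name check_references is hypothetical.

```python
import re

REF_TAG = re.compile(r'<(SECREF|FIGREF|FIGURE)\s+NAME="([^"]+)"\s*>', re.IGNORECASE)


def check_references(body, known_sections, known_figures):
    """Return (bad_section_labels, new_figure_labels), each in order of first use."""
    bad_sections, new_figures = [], []
    for kind, label in REF_TAG.findall(body):
        if kind.upper() == "SECREF":
            # Unknown sections are errors the author has to fix.
            if label not in known_sections and label not in bad_sections:
                bad_sections.append(label)
        elif label not in known_figures and label not in new_figures:
            # Unknown figures lead to the figure upload page instead.
            new_figures.append(label)
    return bad_sections, new_figures


body = ('<SECREF NAME="intro"> <SECREF NAME="gone"> <FIGURE NAME="new"> '
        '<FIGREF NAME="new"> <FIGREF NAME="arch">')
```

If the first list is non-empty the upload is rejected; otherwise each label in the second list prompts for a figure upload.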

Future Improvements


kevin@caltech.edu