ArsDigita Archives
 
 
   
 
spacer

Webmail Design Document

by Erik Bielefeldt (adapted from the original by Jin Choi)

I. Essentials

When applicable, each of the following items should receive its own link:

II. Introduction

Email handlers are among the first user-level programs written for any new operating system, and are one of the few core tools that almost anyone using a computer will use on a regular basis. Most recently, we have seen a blossoming of Web-based email systems such as Hotmail and Yahoo Mail. Why build yet another mail system?

Some of the desirable traits of a mail system are:

  • Centralized storage. Users should see the same email history every time they check email, no matter which computer or email reader they happen to be using.
  • Reliability. Email is important. A disk failure or a negligent sysadmin should not be a cause for losing it. The mail server should always be running, and well-connected to the internet.
  • Availability. Email should be readable wherever you are.
  • Completeness and correctness. An email reader should be able to receive, display, and send attachments. Any message it sends should be standards-conforming. Because many other systems are not, it should be able to handle common deviations from the standard.
The webmail module addresses the first three traits (the last is a work in progress). These requirements argue for the primary message store to remain on a well-administered server. These are the same needs addressed by the designers of IMAP. IMAP solves all these issues except for one: availability; an IMAP client isn't always installed on every computer with a net connection, whereas a Web browser almost always is. But a Web browser is a less-than-ideal interface for reading email when compared to an all-singing, all-dancing mail client. Thus, the ideal mail solution is an IMAP server with a web interface that accesses the same message store as the IMAP client.

III. Historical Considerations

Mail systems with this architecture already exist. Oracle provides software that does what the webmail module does, and probably more. CriticalPath is a company that provides out-sourced email with IMAP and web front ends. These may be better than webmail. CriticalPath certainly has the advantage that it requires no effort on the part of the user other than sending them a check every once in a while. However, when I used CriticalPath, it was unreachable or unuseably slow about half the time (usually due to network problems to their server). I ran out of patience attempting to install Oracle Email Server.

IV. Competitive Analysis

The downside to these other systems is lack of control. It is difficult to modify the look or extend the features of an email server without access to the source code. In the case of CriticalPath, you are stuck with what they provide, and cannot integrate it to provide web-based email as a seamless service of your web site. If you are using the ArsDigita Community System, webmail provides the core of a web-based email system that relies on proven, reliable systems to do all of the hard work and that is simple to extend. If you are not using the ACS, then perhaps studying an implementation of a working system will aid you in building something suitable for your own needs.

V. Design Tradeoffs

VI. Data Model Discussion

The following section will step through the data model, discussing important or interesting aspects.
-- Domains for which we receive email.
create table wm_domains (
        -- short text key
        short_name              varchar(100) not null primary key,
        -- fully qualified domain name
        full_domain_name        varchar(100) not null
);
The wm_domains table contains the domains for which we expect to receive mail. The short_name field stores the text key used to differentiate mail to different domains as discussed above. Qmail must be configured to handle email for these domains manually.
-- Maps email accounts to ACS users.
create table wm_email_user_map (
        email_user_name varchar(100) not null,
        domain          references wm_domains,
        user_id         not null references users,
        primary key (email_user_name, domain, user_id)
);
wm_email_user_map assigns email addresses to ACS users. Why not just use the email column in the users table? This approach lets each ACS user receive email for more than one email address and also permits flexibility on which address is published to other registered users. As a row is inserted into this table, the appropriate .qmail alias files are created for that user.
-- Main mail message table. Stores body of the email, along
-- with a parsed text version with markers for attachments for MIME
-- messages.
create sequence wm_msg_id_sequence;
create table wm_messages (
        msg_id          integer primary key,
        body            clob,
        -- plain text portions of MIME message; empty if 
        -- entire message is of type text/*.
        mime_text       clob,
        message_id      varchar(500), -- RFC822 Message-ID field
        unique_id       integer -- for both POP3 UIDL and IMAP UID
);

create index wm_messages_by_message_id on wm_messages(message_id);
This is the primary message table. It stores the body of the message, a parsed plain-text version with markers for attachments if this is a multipart MIME message, a denormalized Message-ID field for easy reference by Message ID, and yet another ID field for IMAP bookkeeping. The message_id field is not unique, since the same message may have been received multiple times.
-- Stores attachments for MIME messages.
create table wm_attachments (
        msg_id          not null references wm_messages,
        -- File name associated with attachment.
        filename        varchar(600) not null,
        -- MIME type of attachment.
        content_type    varchar(100),
        data            blob,
        format          varchar(10) check (format in ('binary', 'text')), -- for interMedia INSO filter
        primary key (msg_id, filename)
);
This table stores MIME attachments and associated information. The format field enables the use of the interMedia INSO filters to search encoded files, such as Word or PDF documents.
-- Maps mailboxes (folders, in more common terminology) to ACS users.
create sequence wm_mailbox_id_sequence;

create table wm_mailboxes (
        mailbox_id      integer primary key,
        name            varchar(100) not null,
        creation_user   references users(user_id),
        creation_date   date,
        uid_validity    integer, -- Needed for IMAP
        unique(creation_user, name)
);
A "mailbox" is what other systems would term "folders."
-- Maps messages to mailboxes (and thus to users).
create table wm_message_mailbox_map (
        mailbox_id      integer references wm_mailboxes,
        msg_id          integer references wm_messages,
        seen_p          char(1) default 'f' check(seen_p in ('t','f')),
        answered_p      char(1) default 'f' check(answered_p in ('t','f')),
        flagged_p       char(1) default 'f' check(flagged_p in ('t','f')),
        deleted_p       char(1) default 'f' check(deleted_p in ('t','f')),
        draft_p         char(1) default 'f' check(draft_p in ('t','f')),
        recent_p        char(1) default 't' check(recent_p in ('t','f')),
        primary key (msg_id, mailbox_id)
);
Messages "belong" to mailboxes, which "belong" to users. This table exists to map messages to mailboxes and store subsidiary status information for the messages. Why aren't these just fields in the wm_messages table? Originally, there was some notion that identical messages to multiple recipients could be folded together to save on storage. This may happen at a later date, but raises reference counting issues.
-- Parsed recipients for a message; enables search by recipient.
create table wm_recipients (
        msg_id          integer not null references wm_messages,
        header          varchar(100) not null, -- to, cc, etc.
        email           varchar(300) not null,
        name            varchar(200)
);

create index wm_recipients_by_msg_id on wm_recipients(msg_id);
This table contains parsed recipient address information for each message.
-- Headers for a message.
create table wm_headers (
        msg_id          integer not null references wm_messages,
        -- field name as specified in the email
        name            varchar(100) not null,
        -- lowercase version for case insensitive searches
        lower_name      varchar(100) not null,
        value           varchar(4000),
        -- various parsed versions of the value
        time_value      date, -- date/time fields
        -- email and name, for singleton address fields like From
        email_value     varchar(300),
        name_value      varchar(200),
        -- original order of headers
        sort_order      integer not null
);

create index wm_headers_by_msg_id_name on wm_headers (msg_id, lower_name);
Headers are stored separately from the message to aid in searching. Where applicable, header values are parsed and stored in typed fields. The original ordering of the headers is maintained, both so that we can recreate the header block and because order is significant for certain fields.
-- Table for recording messages that we failed to parse for whatever reason.
create table wm_parse_errors (
        filename                varchar(255) primary key not null, -- message queue file
        error_message           varchar(4000),
        first_parse_attempt     date default sysdate not null
);
If an error occurs while attempting to parse a message, we store a record of the error in this log for the administrator to review. Only the first occurrence of an error is logged for any file, to prevent hundreds of identical error messages from clogging the log.
-- Used for storing attachments for outgoing messages.
-- Should be cleaned out periodically.

create sequence wm_outgoing_msg_id_sequence;

create table wm_outgoing_messages (
        outgoing_msg_id         integer not null primary key,
        body                    clob,
        composed_message        clob,
        creation_date           date default sysdate not null,
        creation_user           not null references users
);

create table wm_outgoing_headers (
        outgoing_msg_id         integer not null references wm_outgoing_messages on delete cascade,
        name                    varchar(100) not null,
        value                   varchar(4000),
        sort_order              integer not null
);

create unique index wm_outgoing_headers_idx on wm_outgoing_headers (outgoing_msg_id, name);

create sequence wm_outgoing_parts_sequence;
create table wm_outgoing_message_parts (
        outgoing_msg_id         integer not null references wm_outgoing_messages on delete cascade,
        data                    blob,
        filename                varchar(600) not null,
        content_type            varchar(100), -- mime type of data
        sort_order              integer not null,
        primary key (outgoing_msg_id, sort_order)
);

-- Create a job to clean up orphaned outgoing messages every day.
create or replace procedure wm_cleanup_outgoing_msgs as
begin
  delete from wm_outgoing_messages
    where creation_date < sysdate - 1;
end;
/

declare
  job number;
begin
  dbms_job.submit(job, 'wm_cleanup_outgoing_msgs;',
                  interval => 'sysdate + 1');
end;
/
When composing messages for sending, the unsent message and any attachments are stored in the database. When the message is sent, a MIME message is composed consisting of the text of the message followed by any attachments (there is currently no facility to intersperse attachments with text), and the data in these tables deleted. If a user begins to compose a message, then neglects to send it, the unsent message would remain in the database indefinitely. We could handle this either by presenting a list of unsent messages to the user and presenting the option of completing or deleting the message, or by periodically cleaning out any old unsent messages. Webmail does the latter.

Unsent outgoing attachments could as well be stored in the filesystem, but it is easier to manage them when they are all contained within the database.

-- PL/SQL bindings for Java procedures
create or replace procedure wm_process_queue (queuedir IN VARCHAR)
as language java
name 'com.arsdigita.mail.MessageParser.processQueue(java.lang.String)';
/

create or replace procedure wm_compose_message (outgoing_msg_id IN NUMBER)
as language java
name 'com.arsdigita.mail.MessageComposer.composeMimeMessage(int)';
/

-- Parse the queue every minute. Queue directory is hardcoded.
declare
  job number;
begin
  dbms_job.submit(job, 'wm_process_queue(''/home/nsadmin/qmail/queue/new'');',
                  interval => 'sysdate + 1/24/60');
end;
/
These PL/SQL bindings for Java procedures are the heart of the system. wm_process_queue attempts to parse every file in the given directory as an email message, deliver it to a webmail user, and delete the file. It is run every minute by an Oracle job.

The argument to wm_process_queue here is one of the two hardcoded paths in the webmail module. It must be changed if the webmail user's maildir is created in a different location than that shown.

-- Trigger to delete subsidiary rows when a message is deleted.
create or replace trigger wm_messages_delete_trigger
before delete on wm_messages
for each row
begin
  delete from wm_headers where msg_id = :old.msg_id;
  delete from wm_recipients where msg_id = :old.msg_id;
  delete from wm_message_mailbox_map where msg_id = :old.msg_id;
  delete from wm_attachments where msg_id = :old.msg_id;
end;
/
This trigger makes deleting messages easy; deleting from wm_messages will also delete the appropriate rows from any subsidiary tables.
-- interMedia index on body of message
create index wm_ctx_index on wm_messages (body)
indextype is ctxsys.context parameters ('memory 250M');

-- INSO filtered interMedia index for attachments.
create index wm_att_ctx_index on wm_attachments (data)
indextype is ctxsys.context parameters ('memory 250M filter ctxsys.inso_filter format column format');

-- Trigger to update format column for INSO index.
create or replace trigger wm_att_format_tr before insert on wm_attachments
for each row
declare
  content_type  varchar(100);
begin
  content_type := lower(:new.content_type);
  if content_type like 'text/%' or content_type like 'application/msword%' then
    :new.format := 'text';
  else
    :new.format := 'binary';
  end if;
end;
/

-- Resync the interMedia index every hour.
declare
  job number;
begin
  dbms_job.submit(job, 'ctx_ddl.sync_index(''wm_ctx_index'');',
                  interval => 'sysdate + 1/24');
  dbms_job.submit(job, 'ctx_ddl.sync_index(''wm_att_ctx_index'');',
                  interval => 'sysdate + 1/24');
end;
/
These indices and triggers enable full-text searches over messages. An INSO filtered index is also created to allow full-text searches over any attachments which contain text, including formatted documents.

VII. Legal Transactions

/admin/webmail/

The following legal transactions can occur from the events administration pages located under /admin/webmail/:
domains
--Domains may be created and deleted.

accounts
--Email accounts may be created or deleted.
/webmail/

The following legal transactions can occur from the events administration pages located under /webmail/:
messages
--Messages may be viewed, re-filed, or deleted.

composing messages
--New messages may be composed.
--Attachments may be added.

VIII. API

PL/SQL Procedures:

wm_process_queue (queuedir IN VARCHAR) Processes the mail queue directory and inserts messages into the database (scheduled to run every minute by default)

wm_compose_message (outgoing_msg_id IN NUMBER) Given an outgoing_msg_id, updates the wm_outgoing_messages table and sets the composed_message column to a complete message (including mail headers) which is ready to send.

proc wm_link_author { address } Adds an author filter.

proc wm_link_subject { subject } Adds a subject filter.

proc wm_header_display {msg_id header_display_style user_id} Procedure for displaying headers.

proc wm_quote_message { author msg_text } Quote text with "> " at the start of each line (for replying to messages).

proc wm_check_permissions { msg_id user_id } Checks if user may view message.

proc wm_get_mime_part { conn context } Gets the data from an attachment.

IX. User Interface

The user interface for webmail includes:

An interface for the user:
  • Browsing available messages
  • Reading specific messages
  • Composing outgoing messages
  • Customizing the user interface and functionality of the module
An interface for the administrator:
  • Choosing domains handled by Webmail
  • Adding and deleting users in those domains
  • Viewing a list of recent Webmail errors

X. Configuration/Parameters

Jin Choi has written an excellent document on installation and configuration of the Webmail module. This covers configuring qmail, loading the data-model, the java files, and testing and configuring the system.

XI. Future Improvements/Areas of Likely Change

Future improvements include the ability to block senders, save drafts, save copies of sent messages, a more streamlined user-interface, custom signature files, and an IMAP and POP3 interface.

XII. Authors


Erik Bielefeldt
spacer