ArsDigita Archives
 
 
   
 

Notes on Constructing MIME HTML Bulk Email Messages (May 2000)

by Henry Minsky (hqm@arsdigita.com)

Submitted on: 2000-06-30
Last updated: 2000-08-05

ArsDigita : ArsDigita Systems Journal : One article


Implementor's note: It cannot be stressed enough that applications using this standard should follow MIME's suggestion that you "be conservative in what you generate, and liberal in what you accept." In this particular case it means it would be wise for an implementation to accept messages with any content-transfer-encoding, but restrict generation to the 7-bit format required by this memo. This will allow future compatibility in the event the Internet SMTP framework becomes 8-bit friendly.

"The Robustness Principle" -- Internet RFC 2015 and many others ...

Many organizations would like to use email to provide services and keep their members informed on a periodic basis.

Spurred by the popularity of HTML as a document interchange format and aided by the MIME standards for multimedia-enriched email, some of these same organizations would like to take advantage of the design possibilities of HTML and its hyperlink facility to send more useful and exciting messages to their members.

At first glance, nothing could seem simpler than sending HTML format content to users by email. Just put your HTML content into a message body, set the content type as "text/html," and send it. If you are feeling virtuous, add a parameter of "charset=iso-8859-1" to the content type. The miracle of Internet standards does the rest. The recipient gets a beautiful, glossy web page delivered fresh into her email inbox in the morning.

Unfortunately this is more difficult than it appears.

The Nitty Gritty World of HTML Email

Who Is Reading Your Email, And With What?

It turns out there are no Internet standards for sending and displaying email that contains HTML code. This is why, for example, you will rarely see relative URLs in HTML email hyperlinks that you receive; in the absence of unambiguous standards for embedded HTML, no one is sure how to make them work. The closest thing is the MIME standard which describes how to encapsulate and identify the message content types and encodings.

Most of the popular email reading clients have some capabilities to display HTML. But the level of HTML they can render varies widely, from the ability to run Internet Explorer as the rendering engine (Eudora 4.x) to being able to handle nothing more than perhaps a <P> paragraph break and hyperlinks (AOL).

For example, ArsDigita client Away.com (http://www.away.com) sends a daily email newsletter to around 600,000 subscribers. About 1,000 new users register at the site every day, and more than half of them subscribe to the newsletter on registering. The format of the newsletter is the user's choice of either plain text or a MIME multipart/alternative message, with a special version formatted for AOL users as described below. The publisher made a policy decision to default the email type to HTML for new subscribers, although that can be changed at the user's preference.

As a result of this policy decision, almost all subscribers get the HTML format newsletter. Based on the number of complaints received by customer service over a period of several months, we tried to estimate the fraction of users who have problems reading the HTML mail properly. Even if we assume that only one in a hundred of the users who have difficulty would actually bother to send a complaint, we estimate that more than 99.8% of the users were able to read their HTML newsletters satisfactorily.

While your primary goal should be to form a message that as many users as possible can view correctly, you will find you have to make tradeoffs. For example, you may decide it is not worth trying to get your mail to display correctly on WebTV, if that means restricting the HTML features you use too severely. If there is a large population tied to some platform with severe limitations, you may wish to send out separately formatted email to them. The Away.com site sends out newsletters to AOL users in a restricted subset of HTML which matches the AOL email reader capabilities. Note that this kind of selective formatting is really only feasible when you can easily distinguish the target users, perhaps by matching their email domain names or by convincing them to give you an explicit preference.
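
The selection logic described above can be sketched in a few lines of Java. This is only an illustration: the class name, the enum, and the choice of "aol.com" as the one specially handled domain are assumptions made for the example, not the code used on Away.com.

```java
import java.util.Locale;

// Hypothetical helper: picks a newsletter format from the stored preference
// and the domain of the subscriber's address.
final class NewsletterFormat {

    enum Format { PLAIN_TEXT, MULTIPART_ALTERNATIVE, AOL_RESTRICTED_HTML }

    static Format choose(String email, boolean wantsPlainText) {
        // An explicit user preference always wins.
        if (wantsPlainText) {
            return Format.PLAIN_TEXT;
        }
        int at = email.lastIndexOf('@');
        String domain = at < 0
            ? ""
            : email.substring(at + 1).trim().toLowerCase(Locale.ROOT);
        // Assumption: AOL users are recognized purely by their domain name.
        return domain.equals("aol.com")
            ? Format.AOL_RESTRICTED_HTML
            : Format.MULTIPART_ALTERNATIVE;
    }

    public static void main(String[] args) {
        System.out.println(choose("someone@AOL.com", false));
        System.out.println(choose("someone@example.com", false));
    }
}
```

Matching on the domain only works for platforms that are tied to a recognizable domain; for everything else the explicit preference is the only reliable signal.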

In varying proportions you will find the following classes of email clients, arranged here in a roughly observed order:

  • Microsoft system software: Outlook, Outlook Express, Exchange
  • Windows-based non-Microsoft products: Eudora, Lotus Notes, etc.
  • Webmail servers: Yahoo, Hotmail, AltaVista, Netscape, ACS Webmail (see http://www.arsdigita.com/asj/webmail), etc.
  • Everyone else: Unix emacs (rmail, vm, gnus), pine, elm; Macintosh Eudora or Outlook

The Robustness Principle tells us to be conservative in what we generate. Restrict yourself to the minimal set of HTML directives you find necessary. Each fancy feature that you incorporate (tables, embedded images or the trouble-prone JavaScript) will inevitably cause some email reader to fail to render the message properly. You need to weigh how much you want those fancy web pages versus how many people you are willing to exclude from the content.

The rest of this article will discuss some specific issues with encoding of HTML content as legal MIME messages and suggest some procedures to make this easier and more robust.

Standards for Email Message Encoding, or What Was That RFC Again?

An SMTP email message is composed of three parts:
  • The Header. This is the descriptive information associated with a letter, also known as the "inside address" since its purpose is to allow the final recipient to better understand the letter. The email delivery systems are not supposed to look at the header. The address in the header has no direct connection to how the letter is delivered.

    RFC-822 and its descendants have defined standard fields in the header of the letter to allow programs to better process the messages. Since the header information is only there to provide information to the mail reading program, a message can still be delivered without any header at all, even if it confuses the user by not having "from" and "to" information.

  • The Envelope. The envelope is less visible in an Internet message since it is only used in transmitting the message from one system to another and then discarded. The envelope itself contains important routing data, such as to whom the mail is supposed to be delivered and who originated it. This is the data which is exchanged in the SMTP control commands such as the [RFC-821] "MAIL FROM:" command, as opposed to the message payload, which is the content that is sent following the SMTP "DATA" command.

  • The Body. This is the message itself.

Creating an email message requires some headers and some content. Let's look at the headers first.

The To: Field

One of the most important headers is the recipient's email address. This may seem quite simple, but you really should make sure you are sending to a valid email address.

In ArsDigita Community System installations with hundreds of thousands of users, we have seen what seems like every possible ASCII string entered as an email address. There are mostly correct ones, and then obviously bogus ones like *#$'12828 xxasdfM, and then there are the ones which look like they could work if you massaged them a little like "Mary Smith @ cnn .com."

Although we don't want to have the email system try to 'fix' email addresses that are not standards-compliant, we recommend that the system implementors take some effort to try to pre-screen or verify users' email addresses when they are entered, so as to have a lower incidence of ill-formed email addresses in the database. Stripping whitespace is a simple heuristic to fix many of the user entry errors.
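
A pre-screening step of this kind might look like the following sketch. The class name and the regular expression are assumptions chosen for illustration; the pattern is deliberately conservative and does not try to accept everything that RFC-822 permits.

```java
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the ACS: pre-screens an address at entry time.
final class EmailPrescreen {

    // Conservative shape check: local part, "@", then a dotted domain name.
    private static final Pattern PLAUSIBLE = Pattern.compile(
        "^[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)+$");

    // Removes all whitespace, then returns the address only if it looks plausible.
    static Optional<String> prescreen(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        String stripped = raw.replaceAll("\\s+", "");
        return PLAUSIBLE.matcher(stripped).matches()
            ? Optional.of(stripped)
            : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(prescreen("Mary Smith @ cnn .com"));
        System.out.println(prescreen("*#$'12828 xxasdfM"));
    }
}
```

Because stripping whitespace changes what the user typed, it is probably wise to show the cleaned-up address back to the user for confirmation rather than storing it silently.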

The From: and Reply-To: Fields

[RFC-822] has this to say about the From: field:
     4.4.1.  FROM / RESENT-FROM

        This field contains the identity of the person(s)  who  wished
        this  message to be sent.  The message-creation process should
        default this field  to  be  a  single,  authenticated  machine
        address,  indicating  the  AGENT  (person,  system or process)
        entering the message.  If this is not done, the "Sender" field
        MUST  be  present.  If the "From" field IS defaulted this way,
        the "Sender" field is  optional  and  is  redundant  with  the
        "From"  field.   In  all  cases, addresses in the "From" field
        must be machine-usable (addr-specs) and may not contain  named
        lists (groups).

For email messages that go out to a wide audience, put an address in the From field at which you are prepared to receive a lot of replies. The Reply-To field is supported by most email clients these days, so it is generally safe to include a Reply-To address that is separate from your From address. However, we don't recommend relying on this, because some email clients will inevitably not respect the Reply-To field, and you will have people replying to the From address. Some people may also not hit "reply" at all, but copy the From address manually when sending a reply.

Assume that the From and Reply-To fields will be treated interchangeably by the user's mail reader, and expect replies to one or the other to be equally likely.

Special Note on Envelope Return Paths
Many people are confused about the differences between the From field and the envelope return-path. The issue arises when you want to know what happens when an email message bounces.

The key point to understand is that mail system errors and notifications such as bounce notices are sent back to the sender address in the envelope return path, not to the From field of the message.

The built-in AOLserver mail routine ns_sendmail treats the message From field and the envelope sender as the same address, but that is not what you want for real automated mail-handling production systems. The ACS bulkmail package, for example, creates a special unique sender address for each outgoing message which contains encoded information about to whom the mail was sent and from which module and mailing run. That way the system can automatically and unambiguously parse returned mail and match it with the user email address it was sent to. It can then do useful things like updating the user's email_bouncing_p flag in the database.

This is vital to being able to automatically maintain a clean mailing list with hundreds of thousands or millions of users.
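
To make the idea concrete, here is a minimal sketch of a per-recipient envelope sender. The address layout (bounce-MODULE-RUN-USER@domain), the class name, and the use of a numeric user id are all assumptions made for this example; the actual format used by the ACS bulkmail package is not shown in this article and may differ.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical per-recipient envelope sender address.
final class BounceAddress {
    final String module;
    final long runId;
    final long userId;

    private static final Pattern FORMAT =
        Pattern.compile("^bounce-([a-z]+)-(\\d{1,18})-(\\d{1,18})@(.+)$");

    BounceAddress(String module, long runId, long userId) {
        // Keep the module name to lowercase letters so that decoding is unambiguous.
        if (!module.matches("[a-z]+")) {
            throw new IllegalArgumentException("module must be lowercase letters");
        }
        this.module = module;
        this.runId = runId;
        this.userId = userId;
    }

    // Builds the address to present in the SMTP envelope (MAIL FROM).
    String encode(String bounceDomain) {
        return "bounce-" + module + "-" + runId + "-" + userId + "@" + bounceDomain;
    }

    // Parses the address a bounce was returned to; null if it is not one of ours.
    static BounceAddress decode(String address) {
        Matcher m = FORMAT.matcher(address.trim().toLowerCase(Locale.ROOT));
        if (!m.matches()) {
            return null;
        }
        return new BounceAddress(m.group(1),
                                 Long.parseLong(m.group(2)),
                                 Long.parseLong(m.group(3)));
    }

    public static void main(String[] args) {
        String sender = new BounceAddress("spam", 3677L, 42L).encode("bounces.example.com");
        System.out.println(sender);
        System.out.println(decode(sender).userId);
    }
}
```

With an address like this, the bounce handler never has to parse the body of the returned message, which is often mangled; the envelope recipient alone identifies the user and the mailing run.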

SMTP Compliance

There isn't much you have to worry about in terms of SMTP compliance; that should all be taken care of by your email sending routine. However it is worth noting the following.

The SMTP transport protocol [RFC-821] states, for maximum line length:

            text line

               The maximum total length of a text line including the
               <CRLF> is 1000 characters (but not counting the leading
               dot duplicated for transparency).

          ****************************************************
          *                                                  *
          *  TO THE MAXIMUM EXTENT POSSIBLE, IMPLEMENTATION  *
          *  TECHNIQUES WHICH IMPOSE NO LIMITS ON THE LENGTH *
          *  OF THESE OBJECTS SHOULD BE USED.                *
          *                                                  *
          ****************************************************

So they are saying keep your lines under 1000 characters in length. However they also say that implementors of MTAs who make this as a built-in limit are being stupid. In practice, you can probably send arbitrarily long lines to most email systems. However some may give errors if you do. Some modern firewall-based virus detectors can be triggered by overly long lines. If you must have very long lines, use quoted-printable encoding, or some other form of content encoding.
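
A simple pre-flight check for over-long lines could be sketched as follows. The class and method names are hypothetical, and the check counts characters, which equals the octet count only for single-byte content such as ASCII.

```java
// Hypothetical pre-flight check for the RFC-821 line length limit.
final class LineLengthCheck {

    // 1000 characters per line including the trailing CRLF leaves 998 for text.
    static final int MAX_TEXT_CHARS = 998;

    // Returns the 1-based number of the first over-long line, or -1 if there is none.
    static int firstOverlongLine(String body) {
        String[] lines = body.split("\r?\n", -1);
        for (int i = 0; i < lines.length; i++) {
            if (lines[i].length() > MAX_TEXT_CHARS) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstOverlongLine("a short line\r\nanother short line"));
    }
}
```

A message that fails this check is a candidate for quoted-printable encoding, which breaks long lines without changing the decoded content.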

MIME Headers and Encoding

In order to send a MIME message, the standards say you must use at least the following headers:
  • MIME-Version (currently "MIME-Version: 1.0" is the only one supported)
  • Content-Type

Building a Simple MIME Message

We've decided to send an email message with Content-Type of "text/html." However, there are a couple of choices to be made as to how we structure and encode this message.

The simplest structure for the message would be of the form:

To: Mary_Smith@foo.com
Subject: Great Deals on English Muffins
From: info@bar.com
MIME-Version: 1.0
Content-Type: text/html; charset="us-ascii"

<h1>Great Deals</h1>

There are some <i>great deals</i> on English Muffins 
today.

Note: According to the RFCs:

Content-type: text/plain; charset=us-ascii (comment)
and
Content-type: text/plain; charset="us-ascii"
are completely equivalent.

I have seen examples of this kind of message sent in bulk mailings, such as the American Express example at http://www.arsdigita.com/asj/mime/mime-examples/amex.txt.

There are two disadvantages to the simple structure and encoding methods used above:

  • The use of a "top level" Content-Type header of "text/html" means that the entire message contains only HTML. If the recipient's email client is incapable of rendering HTML, they will not be able to read the message. You can alleviate this concern by using the "multipart" MIME encoding described in the next section.

  • Although many mailers in use today safely handle 8-bit character data, a message labeled with a character set of "us-ascii" may have any characters with their high bit set mangled by an email server which is not "8-bit clean." This can be solved by either making sure you have only 7-bit characters in your content (i.e., character codes less than 128) or by using a Content-Transfer-Encoding like quoted-printable. The finer points of this are also discussed in the next section.
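
The second point can be checked mechanically before a message is sent. The following sketch, with hypothetical class and method names, scans the content and suggests a Content-Transfer-Encoding. It looks only at the high bit; a real check for "7bit" would also have to confirm that the line length limit is respected.

```java
// Hypothetical helper: suggests a Content-Transfer-Encoding for text content.
final class TransferEncodingChooser {

    // True if every character fits in 7 bits (code below 128).
    static boolean isSevenBit(String content) {
        for (int i = 0; i < content.length(); i++) {
            if (content.charAt(i) > 127) {
                return false;
            }
        }
        return true;
    }

    static String choose(String content) {
        return isSevenBit(content) ? "7bit" : "quoted-printable";
    }

    public static void main(String[] args) {
        System.out.println(choose("There are some great deals today."));
        System.out.println(choose("J\u00f8rn"));
    }
}
```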

Multipart MIME Messages

The MIME format allows you to create multipart messages, which can contain multiple content parts with different content types. For example, you can send an email message which contains a copy of your newsletter in both "text/plain" and "text/html" formats.

Sending both plain-text and HTML versions of the message is a good option, because it allows for a graceful degradation of the appearance of the message for users whose email clients do not really support HTML but are MIME-aware.

To send a multipart message, use the Content-Type "multipart." There are a number of subtypes that can be used to modify this, but the two we will consider are "multipart/mixed" or "multipart/alternative."

The multipart/mixed content type is used to assert that the message contains several parts, all of which should be presented to the user. The multipart/alternative content type asserts that the message contains several representations of the same content, and the user's mail client should attempt to show them the "best" one it can. In practice that means that a mail reader presented with a multipart/alternative message containing text/plain and text/html parts will preferably display the text/html part. If it is unable to display the text/html part, it should gracefully degrade to a type it can render.

The following example of a multipart MIME message is taken from [RFC-2049]:


     MIME-Version: 1.0
     From: Nathaniel Borenstein <nsb@nsb.fv.com>
     To: Ned Freed <ned@innosoft.com>
     Date: Fri, 07 Oct 1994 16:15:05 -0700 (PDT)
     Subject: A multipart example
     Content-Type: multipart/mixed;
                   boundary=unique-boundary-1

     This is the preamble area of a multipart message.
     Mail readers that understand multipart format
     should ignore this preamble.

     If you are reading this text, you might want to
     consider changing to a mail reader that understands
     how to properly display multipart messages.

     --unique-boundary-1

       ... Some text appears here ...

     [Note that the blank between the boundary and the start
      of the text in this part means no header fields were
      given and this is text in the US-ASCII character set.
      It could have been done with explicit typing as in the
      next part.]

     --unique-boundary-1
     Content-type: text/plain; charset=US-ASCII

     This could have been part of the previous part, but
     illustrates explicit versus implicit typing of body
     parts.

     --unique-boundary-1
     Content-Type: multipart/parallel; boundary=unique-boundary-2

     --unique-boundary-2
     Content-Type: audio/basic
     Content-Transfer-Encoding: base64

       ... base64-encoded 8000 Hz single-channel
           mu-law-format audio data goes here ...

     --unique-boundary-2
     Content-Type: image/jpeg
     Content-Transfer-Encoding: base64

       ... base64-encoded image data goes here ...

     --unique-boundary-2--

     --unique-boundary-1
     Content-type: text/enriched

     This is <bold><italic>enriched.</italic></bold>
     <smaller>as defined in RFC 1896</smaller>

     Isn't it
     <bigger><bigger>cool?</bigger></bigger>

     --unique-boundary-1
     Content-Type: message/rfc822

     From: (mailbox in US-ASCII)
     To: (address in US-ASCII)
     Subject: (subject in US-ASCII)
     Content-Type: Text/plain; charset=ISO-8859-1
     Content-Transfer-Encoding: Quoted-printable

       ... Additional text in ISO-8859-1 goes here ...

     --unique-boundary-1--

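Assume we are sending a multipart/alternative message. We still get a choice of how to encode the content in each part, and which order to put the parts in the mail message. On the question of order, the MIME standard says that the alternatives should appear in increasing order of preference, so the text/plain part goes first and the text/html part goes last.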
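
As a sketch of the resulting structure, the following Java assembles a bare multipart/alternative message by hand, with the text/plain part first and the text/html part last, since the MIME standard asks for alternatives in increasing order of preference. The class name is hypothetical, the parts are assumed to be 7-bit clean, and the caller is trusted to supply a boundary string that does not occur in either part. For production use a library such as JavaMail is a better choice.

```java
// Hypothetical illustration of the layout of a multipart/alternative message.
final class AlternativeSkeleton {
    private static final String CRLF = "\r\n";

    static String build(String from, String to, String subject,
                        String plain, String html, String boundary) {
        StringBuilder sb = new StringBuilder();
        sb.append("From: ").append(from).append(CRLF);
        sb.append("To: ").append(to).append(CRLF);
        sb.append("Subject: ").append(subject).append(CRLF);
        sb.append("MIME-Version: 1.0").append(CRLF);
        sb.append("Content-Type: multipart/alternative; boundary=\"")
          .append(boundary).append("\"").append(CRLF);
        sb.append(CRLF);
        appendPart(sb, boundary, "text/plain", plain); // simplest form first
        appendPart(sb, boundary, "text/html", html);   // preferred form last
        sb.append("--").append(boundary).append("--").append(CRLF); // closing delimiter
        return sb.toString();
    }

    private static void appendPart(StringBuilder sb, String boundary,
                                   String type, String content) {
        sb.append("--").append(boundary).append(CRLF);
        sb.append("Content-Type: ").append(type)
          .append("; charset=\"us-ascii\"").append(CRLF);
        sb.append("Content-Transfer-Encoding: 7bit").append(CRLF);
        sb.append(CRLF); // blank line separates part headers from part body
        sb.append(content).append(CRLF);
    }

    public static void main(String[] args) {
        System.out.print(build("info@bar.com", "Mary_Smith@foo.com",
            "Great Deals on English Muffins",
            "Great deals on English Muffins today.",
            "<h1>Great Deals</h1>", "unique-boundary-1"));
    }
}
```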

The encoding of the contents of a MIME part is specified by the Content-Transfer-Encoding header [RFC-2045]. The encodings you can rely on working are "7bit," "8bit," "quoted-printable," or "base64."

The quoted-printable encoding is generally considered the best way to encode HTML or other content which is primarily legal ASCII text. Quoted-printable provides some protection against some of the errors that can be introduced by MTAs along the way, such as deletion of whitespace or truncation of characters' high bits. It also leaves "vanilla" ASCII text alone for the most part, so the message is still mostly readable even when encoded, which is a big help for debugging mail transport errors.

Given all of the previous warnings about email client capabilities, you might have some concern that there are mail readers that cannot properly decode the quoted-printable encoding. However, it is generally safe to assume that any mail client that can render HTML correctly can probably decode quoted-printable correctly. In fact, if a mail reader exists that can render HTML but cannot decode quoted-printable, the affected users should probably upgrade immediately.

[RFC-2045] has this to say about line lengths and QP encoding:

    (5)   (Soft Line Breaks) The Quoted-Printable encoding
          REQUIRES that encoded lines be no more than 76
          characters long.  If longer lines are to be encoded
          with the Quoted-Printable encoding, "soft" line breaks
          must be used.  An equal sign as the last character on a
          encoded line indicates such a non-significant ("soft")
          line break in the encoded text.

So the recommendation is to keep QP-encoded lines to no more than 76 characters. This is very good advice, and you would be well advised to take it. At one point, before we started QP encoding the HTML, the publisher was routinely including lines of 1000 characters or more, and some recipients had trouble, usually because of an overly vigilant firewall virus detector. There are apparently a number of security holes in Windows mail readers, and long line lengths in MIME messages can be used as an exploit in some of them.
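
To show how the soft line break rule works in practice, here is a minimal quoted-printable encoder sketch. The class name is hypothetical and the code is meant for illustration only: it handles text whose lines are separated by LF or CRLF, and it does not cover binary data or bare carriage returns. A standard library such as JavaMail should normally do this work.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical minimal quoted-printable encoder for text content.
final class QuotedPrintableSketch {
    private static final int MAX_LINE = 76;
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    static String encode(String text, Charset charset) {
        StringBuilder out = new StringBuilder();
        String[] lines = text.split("\r?\n", -1);
        for (int n = 0; n < lines.length; n++) {
            byte[] bytes = lines[n].getBytes(charset);
            int lineLen = 0;
            for (int i = 0; i < bytes.length; i++) {
                int b = bytes[i] & 0xFF;
                boolean lastInLine = (i == bytes.length - 1);
                String token;
                boolean printable = b >= 33 && b <= 126 && b != '=';
                boolean innerBlank = (b == ' ' || b == '\t') && !lastInLine;
                if (printable || innerBlank) {
                    token = String.valueOf((char) b);           // passes through unchanged
                } else {
                    token = "=" + HEX[b >> 4] + HEX[b & 0x0F];  // =XX hex escape
                }
                // Always leave room for the "=" that marks a soft line break.
                if (lineLen + token.length() > MAX_LINE - 1) {
                    out.append("=\r\n");
                    lineLen = 0;
                }
                out.append(token);
                lineLen += token.length();
            }
            if (n < lines.length - 1) {
                out.append("\r\n"); // a real ("hard") line break in the text
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: J=F8rn =3D caf=E9
        System.out.println(encode("J\u00f8rn = caf\u00e9", StandardCharsets.ISO_8859_1));
    }
}
```

Note how plain ASCII passes through untouched, which is what keeps a quoted-printable message readable when you are debugging transport problems.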

There are numerous finer points about encoding a MIME message in a standards-compliant way. They will not be addressed further here. Rather, the discussion of structuring a compliant message will be deferred to the JavaMail section below.

Embedded Images in HTML

One of the first things that publishers seem to want to do is put images into the HTML they send in their email. This opens up a host of issues and problems.

Currently there is no supported standard for embedding inline images into MIME HTML messages. There is a new proposal [RFC-2557] "MIME Encapsulation of Aggregate Documents, such as HTML (MHTML)," but it is not clear that any major email clients support it yet. Another striking issue with sending images with every message is the excessive bandwidth that will be used; the images will usually contain far more data than the text portion.

What does tend to be supported, though by no means universally, is plain-old IMG tags using live URL links. That is, you can put an absolute URL in the body of your HTML message, such as

<IMG src="http://www.techrepublic.com/images/trlogo94_60.gif">
and many email readers will fetch the image and render it inline when the user views the message.

Note however that this doesn't work if the user is offline! There are users who have programs that dial up, grab their email, then disconnect. So they will not see the pretty pictures in their mail, and may in fact see ugly holes in the formatting and collapsed layout where the images were supposed to go.

If you feel you must use inline images in your HTML mail, remember also that every image will have to be retrieved from your server. One ArsDigita client sent out 750,000 newsletters overnight, each containing twenty or thirty images. The next morning, their server was a lot less responsive than usual!

Other Considerations

While not part of the encoding process, you should consider some other issues with the content of your newsletters.

You should always provide a way for people to unsubscribe themselves from a mailing list. They may have forgotten how they subscribed, or someone may have maliciously subscribed them. It is best to make sure there are multiple ways for the user to stop receiving the mail. The From and Reply-To addresses should support email requests to unsubscribe. The message content should have explicit instructions on how to unsubscribe as well, along with an email address, URL, and maybe even a phone number. There is nothing more frustrating than not being able to stop unwanted email from being sent to you!

The publisher should also provide an easy way for the recipients to set their email type preference, i.e., plain-text or html mail. If you want to be conservative, you can default to sending new users only plain text unless they explicitly specify otherwise. If you default new users to HTML content, make sure you have obvious instructions for them to set their preferences to plain text content.

I would also encourage the publisher to add a link to a copy of the newsletter content on their web site, so people whose email readers are hopelessly inept can still view the content via a web browser.

Letting Someone Else Do The Hard Work

You can write all the code to build compliant MIME messages yourself, or you can try to find code that is already written which helps take care of the composition.

One option, following Jin Choi's Webmail example at http://www.arsdigita.com/asj/webmail/, is to use the JavaMail library in the construction of a standards-compliant email message. The following example Oracle/Java code constructs a multipart MIME message containing plain text and HTML parts. While the initial learning curve is somewhat steep (you need to figure out how to load and call Java inside Oracle), it is very nice to be able to offload the complexities of composing a message onto a standard library. This way, if the MIME standard is enhanced or otherwise changed, you will not have to rewrite much code.

The code below assumes that there is a database table spam_history with a row containing the plain text and HTML versions of the message to be sent. The code uses the JavaMail API to construct a MIME message and then inserts the complete message back into the database. Since this message is designed to go out to millions of users, it is actually constructed as a template, with the To: and Reply-To: headers containing placeholder values. This message template can then be passed to a bulk mailer module that will efficiently send it to a large mailing-list.

The message parts are encoded using quoted-printable encoding, and the entire message is given a multipart/alternative content type. It is at this point that you would also add directives to set the content-type parameters. For example, you might want to specify a charset if you were sending content that was not ASCII or ISO-8859-1 compatible, such as Japanese text.

Sending Email Directly From Java
Note: you could actually send this mail directly from Java, using the JavaMail Transport API. Example code to do this is at http://www.arsdigita.com/asj/mime/java-send. It is not clear that this is something you want to do for high volume mailings using the default JavaMail transport code, however.


// SpamMessageComposer.sqlj
// originally part of the webmail ACS module
// written by Jin Choi 
// hacked by hqm@arsdigita.com to generate SPAM MIME newsletter messages from
// the spam_history table
// 2000-03-27


// This class implements some static methods for composing MIME plaintext and
// mixed text/html messages for the spam system


package com.arsdigita.mail;

import oracle.sql.*;
import oracle.sqlj.runtime.Oracle;
import java.sql.*;

import java.io.*;
import javax.mail.*;
import javax.mail.internet.*;
import java.util.*;
import javax.activation.*;

public class SpamMessageComposer {
    
    protected static Session s = null;

    public static void composeHTMLMimeMessage(int msgId)
        throws MessagingException, IOException, SQLException {
        Vector parts = new Vector(); // vector of data handlers

        CLOB bodyPlainText   = null;
        CLOB bodyHTMLText    = null;
        String msgSubject    = null;  // subject line read from spam_history
#sql { select body_plain, body_html, subject into :bodyPlainText, :bodyHTMLText, :msgSubject from
       spam_history where spam_id = :msgId };

        //Use Jin's winning CLOBDataSource to grab message from database.

        if (bodyPlainText != null && bodyPlainText.length() > 0) {
            ClobDataSource cds = new ClobDataSource(bodyPlainText, "text/plain", null);
            parts.addElement(new DataHandler(cds));
        }
        
        if (bodyHTMLText != null && bodyHTMLText.length() > 0) {
            ClobDataSource cds = new ClobDataSource(bodyHTMLText, "text/html", "newsletter.html");
            parts.addElement(new DataHandler(cds));
        } else {
          System.err.println("SpamMessageComposer.composeHTMLMimeMessage: bodyHTMLText is null!");
        }
        
        // Create new MimeMessage.
        if (s == null) {
            Properties props = new Properties();
            s = Session.getDefaultInstance(props, null);
        }
        
        MimeMessage msg = new MimeMessage(s);

        String from = "newsletter@away.com";
        String sendTo = "%%_TO_ADDR_%%";
        String replyTo = "%%_REPLY_TO_ADDR_%%";
        String subject = msgSubject;

        // Add the headers.
        msg.setFrom(new InternetAddress(from));
        InternetAddress[] address = {new InternetAddress(sendTo)};
        msg.setRecipients(Message.RecipientType.TO, address);
        msg.setSubject(subject);
        msg.addHeader("Reply-To", replyTo);

        // Add the attachments.
        addParts(msg, parts);

        // Synchronize the headers to reflect the contents.
        msg.saveChanges();

        CLOB composedMessage = null;

        // Grab the CLOB we're going to stuff it in and write the composed message to it.
#sql { update spam_history set mime_html = empty_clob() where spam_id = :msgId };
#sql { select mime_html into :composedMessage from spam_history where spam_id = :msgId };

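        // Close the stream when done so that buffered data is flushed into the CLOB.
        OutputStream out = composedMessage.getAsciiOutputStream();
        try {
            msg.writeTo(out);
        } finally {
            out.close();
        }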
    }

    protected static void addParts(MimeMessage msg, Vector parts)
        throws MessagingException, IOException {
        
        if (parts.size() == 0) {
            // This should never happen.
            return;
        }

        if (parts.size() > 1) {
            //Make this a multipart/alternative message
            MimeMultipart msgMultiPart = new MimeMultipart("alternative");
            Enumeration e = parts.elements();
        
            while (e.hasMoreElements()) {
                DataHandler dh = (DataHandler) e.nextElement();
                String filename = dh.getName();
                MimeBodyPart bp = new MimeBodyPart();
                //Use quoted-printable encoding on the parts
                bp.setDataHandler(dh);
                bp.setHeader("Content-Transfer-Encoding", "quoted-printable");
                if (filename != null) {
                    bp.setFileName(dh.getName());
                }
                msgMultiPart.addBodyPart(bp);
            }
            msg.setContent(msgMultiPart);
        } else {
            // There is only one element.
            DataHandler dh = (DataHandler) parts.elementAt(0);
            String filename = dh.getName();
            if (filename != null) {
                msg.setFileName(dh.getName());
            }
            msg.setHeader("Content-Transfer-Encoding", "quoted-printable");
            msg.setDataHandler(dh);
        }
    }
}

The Oracle PL/SQL wrapper for this looks like:


create or replace procedure spam_test_message (spam_id IN NUMBER)
as language java
name 'com.arsdigita.mail.SpamMessageComposer.composeHTMLMimeMessage(int)';
/

call spam_test_message(3677)
/
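
Once the template is stored, the bulk mailer has to substitute the placeholder values for each recipient. A minimal sketch of that step follows. The placeholder strings are the ones used in the composer above; the class name, the method, and the line-break check are assumptions made for this example and are not taken from the ACS bulkmail module.

```java
// Hypothetical per-recipient substitution step for the stored message template.
final class TemplateFill {
    static final String TO_PLACEHOLDER = "%%_TO_ADDR_%%";
    static final String REPLY_TO_PLACEHOLDER = "%%_REPLY_TO_ADDR_%%";

    static String personalize(String template, String toAddr, String replyToAddr) {
        requireSingleLine(toAddr);
        requireSingleLine(replyToAddr);
        return template.replace(TO_PLACEHOLDER, toAddr)
                       .replace(REPLY_TO_PLACEHOLDER, replyToAddr);
    }

    // An address containing CR or LF could inject extra header lines.
    private static void requireSingleLine(String value) {
        if (value.indexOf('\r') >= 0 || value.indexOf('\n') >= 0) {
            throw new IllegalArgumentException("address must not contain line breaks");
        }
    }

    public static void main(String[] args) {
        String template = "To: %%_TO_ADDR_%%\r\nReply-To: %%_REPLY_TO_ADDR_%%\r\n\r\nbody";
        System.out.println(personalize(template, "mary@example.com", "replies@example.com"));
    }
}
```

Because the placeholders live in the headers, which are not quoted-printable encoded, a plain string substitution is enough; placeholders inside an encoded body part would need more care.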

International Character Set Encodings

Given that the Internet is a global community, you may want to send email in character set encodings other than US-ASCII or ISO-8859-1. For message body content, you should generally only need to add a charset parameter to the Content-Type header. However, encoding non-US character set information in the headers can be somewhat more involved.

Consider this header, which encodes the subject field in two different character sets. The MIME spec provides support for "encoded words" for specifying character sets and encodings within strings in header fields:

   From: =?US-ASCII?Q?Keith_Moore?= <moore@cs.utk.edu>
   To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>
   CC: =?ISO-8859-1?Q?Andr=E9?= Pirard <PIRARD@vm1.ulg.ac.be>
   Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=
    =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=

The headers above show examples of encoding strings in US-ASCII, ISO-8859-1, and ISO-8859-2, using the Q (similar to quoted-printable) and B (base64) encodings. For more information see [I18N-MAIL], i18n and Multilingual support in Internet mail, at http://www.terena.nl/multiling/ml-mua/mldoc-review.html.
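
For illustration, the B form of an encoded word can be produced and read back with the standard Java Base64 class. The class and method names below are hypothetical. The sketch handles a single encoded word only and does not split long text into several words, which a real implementation must do because each encoded word is limited to 75 characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for single "B" encoded words of the form =?charset?B?data?=
final class EncodedWordSketch {

    static String encodeB(String text, Charset charset) {
        String payload = Base64.getEncoder().encodeToString(text.getBytes(charset));
        return "=?" + charset.name() + "?B?" + payload + "?=";
    }

    static String decodeB(String word) {
        // Base64 data never contains "?", so splitting on it is safe here.
        String[] f = word.split("\\?");
        if (f.length != 5 || !f[0].equals("=") || !f[4].equals("=")
                || !f[2].equalsIgnoreCase("B")) {
            throw new IllegalArgumentException("not a B encoded word: " + word);
        }
        byte[] raw = Base64.getDecoder().decode(f[3]);
        return new String(raw, Charset.forName(f[1]));
    }

    public static void main(String[] args) {
        // First half of the Subject header shown above.
        System.out.println(decodeB("=?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?="));
        System.out.println(encodeB("Andr\u00e9", StandardCharsets.ISO_8859_1));
    }
}
```

If you compose messages with JavaMail, setting the subject with an explicit charset should take care of this encoding for you.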

Analyzing What Went Wrong

When a user reports that the newsletter "is broken", it is often remarkably difficult to figure out what is going on. Email readers can do so much silent damage to a message when trying to display it that it is often impossible to figure out what the user is finally seeing in the mail reader window. Many users have no idea how their email works, and thus cannot describe to you a reasonable model of what may be happening. They simply see something incomprehensible on their screen. Other times the reports are succinct but at least informative: the mail client refuses to launch a browser when a hyperlink is clicked, which tells you that the links are being displayed, although they may be corrupted in some way. With some of the webmail services it is easy to verify whether they can correctly handle a MIME HTML enclosure, whereas if someone is using Lotus Notes on a Windows 3.1 machine it is pretty hopeless trying to help them. The best thing is to tell them to switch to the plain text version of the newsletter (which you are providing, right?).

Perhaps not surprisingly, the greatest number of problems I have seen have been on Microsoft Outlook and Exchange. This may be due to the fact that the user base for these programs is larger than for other email clients, or it may be due to the non-robust nature of Microsoft software, especially in relation to Internet standards.

To illustrate some of the difficulties of debugging email viewing problems from users, here are some real-life examples of bug reports you can expect to receive. These examples are the entire bug report messages, not just excerpts. You can see how much debugging information the typical user will include in their reports.

Users often do forward a copy of the message back with their report, but it is invariably so chewed up as to be practically unrecognizable. In practically no case have I ever gotten back a copy of the original newsletter message that was viewable in its intended form. The implication is that most email systems that cannot display the message will also transform it in a destructive way if the user tries to forward it.

"I can never read your e-mails - is there some way to make them so I can read them?"


"Why is the writing in the e-mail so small? Please enlarge the articles printing."


"To whom it may concern, Unfortunately I'm unable to open your sites, that you send me daily
Any assistance would be appreciated"


"Your email is coming out as HTML code.

Too bad because I was going to forward this to someone who may go to Scotland this summer."


"For some reason, I'm not receiving this properly (see below)......."


" Hi,
I tried to download the image & your links don't work (any of them). "


"Is this the way this is supposed to look?"


"Please advise....I've recieved your Daily Escape for months and months through my email address at xxx@yyy.net and always recieved beautiful and interesting photographs. HOWEVER, since I've switched to ComuServe I re-registered with you for the Daily Escapes to be sent to my new email address at: xxx@xx.com and am not getting photos with the Daiily Escape. Did I sign up incorrectly or ask for the wrong subscription?
Help please Thanks"


"I receive you e-mails with instructions to click on the underlined blue highlighted words; however when I do, nothing happens. Are you aware of this fact?

If you do not have the ability to transmit the appropriate communication signals, please delete me from any further e-mails. Otherwise, I look forward to your improved communications.

Thank you!"


"Hi. I have not been able to click on anything in the past few messages I received from you. Certain things are underlined in blue or say click here for more details but I can't. Is there anything you can do to help?
Thanks"

Often it is next to impossible to figure out where the difficulty might be arising. When trying to debug the situation, one approach is to ask the user "Are there any other HTML newsletters which you receive correctly?", to which the answer is often "no". In this case the problem is probably their mail reader's inability to format HTML, rather than our MIME encoding of the messages. Sometimes they say yes, but it turns out they are receiving mail that uses a subset of HTML with no images or no tables.

Some mail readers can format simple HTML, but not tables, inline images, or other fancy features. This is an argument in favor of using a simplified subset of HTML when composing your messages.
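One way to hold yourself to a simplified subset is to check each outgoing message body before it is sent. The following is a minimal sketch using Python's standard `html.parser` module; the `RISKY_TAGS` set and the names `RiskyTagFinder` and `find_risky_tags` are assumptions made for illustration, not a tested list of what any particular mail client fails to render.

```python
from html.parser import HTMLParser

# Hypothetical list of tags to flag; adjust it based on your own bug reports.
RISKY_TAGS = {"table", "img", "script", "style", "form",
              "frame", "iframe", "object", "embed"}


class RiskyTagFinder(HTMLParser):
    """Collect (tag, line number) pairs for tags outside the simple subset."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag in RISKY_TAGS:
            self.found.append((tag, self.getpos()[0]))


def find_risky_tags(html: str):
    """Return the flagged tags found in an HTML message body."""
    finder = RiskyTagFinder()
    finder.feed(html)
    finder.close()
    return finder.found


sample = '<p>Hello</p>\n<TABLE><tr><td><img src="x.gif"></td></tr></TABLE>'
for tag, line in find_risky_tags(sample):
    print(f"line {line}: <{tag}> may not render in simpler mail readers")
```

A check like this only reports what is present; deciding which tags are acceptable still depends on what your own readers' clients can display.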

Examples

You can find some real-world examples of HTML format mail that I have received at http://www.arsdigita.com/asj/mime/mime-examples/. It is hard to say which of these formats is the most likely to be readable on the maximum number of mail clients, but it is interesting to note the wide spectrum of MIME encoding features used (e.g., QP vs 7bit, multipart vs single part).
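When comparing messages like these, or when inspecting a copy that a user has forwarded back, it helps to print the MIME structure rather than read the raw source. Here is a minimal sketch using Python's standard `email` package; the function name `summarize_mime` and the sample message are made up for illustration. A part with no Content-Transfer-Encoding header is reported as 7bit, which is the MIME default.

```python
from email import message_from_string


def summarize_mime(part, depth=0, rows=None):
    """Return (depth, content type, transfer encoding) for every MIME part."""
    if rows is None:
        rows = []
    encoding = part.get("Content-Transfer-Encoding", "7bit").lower()
    rows.append((depth, part.get_content_type(), encoding))
    if part.is_multipart():
        for sub in part.get_payload():
            summarize_mime(sub, depth + 1, rows)
    return rows


# A made-up two-part message: plain text plus a quoted-printable HTML version.
SAMPLE = "\n".join([
    "MIME-Version: 1.0",
    'Content-Type: multipart/alternative; boundary="XYZ"',
    "",
    "--XYZ",
    "Content-Type: text/plain; charset=us-ascii",
    "",
    "Plain text version.",
    "--XYZ",
    "Content-Type: text/html; charset=iso-8859-1",
    "Content-Transfer-Encoding: quoted-printable",
    "",
    "<p>HTML version with caf=E9.</p>",
    "--XYZ--",
    "",
])

for depth, ctype, encoding in summarize_mime(message_from_string(SAMPLE)):
    print("  " * depth + f"{ctype} ({encoding})")
```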

Final Notes

The use of HTML in email messages is not yet a universally supported standard. Thus, you cannot hope to make something that uses the latest whiz-bang HTML formatting and is reliably readable on every mail client. So you have to ask, for a given feature set, what is an acceptable percentage of messages "unreadable" to customers to aim for? 1%? 0.1%? It is really a judgement call for the publisher. What the world needs is a clearinghouse of client capabilities so that programmers can know what HTML subset is rendered acceptably on what fractions of users' email clients. Without that knowledge, publishers should be conservative about what they try to send as HTML.

References



   [US-ASCII] Coded Character Set--7-Bit American Standard Code for
   Information Interchange, ANSI X3.4-1986.

   [ISO-2022] International Standard--Information Processing--ISO 7-bit
   and 8-bit coded character sets--Code extension techniques, ISO
   2022:1986.

   [ISO-8859] Information Processing -- 8-bit Single-Byte Coded Graphic
   Character Sets -- Part 1: Latin Alphabet No. 1, ISO 8859-1:1987.  Part
   2: Latin alphabet No.  2, ISO 8859-2, 1987.  Part 3: Latin alphabet
   No. 3, ISO 8859-3, 1988.  Part 4: Latin alphabet No.  4, ISO 8859-4,
   1988.  Part 5: Latin/Cyrillic alphabet, ISO 8859-5, 1988.  Part 6:
   Latin/Arabic alphabet, ISO 8859-6, 1987.  Part 7: Latin/Greek
   alphabet, ISO 8859-7, 1987.  Part 8: Latin/Hebrew alphabet, ISO
   8859-8, 1988.  Part 9: Latin alphabet No. 5, ISO 8859-9, 1990.

   [ISO-646] International Standard--Information Processing--ISO 7-bit
   coded character set for information interchange, ISO 646:1983.

   [X400] Schicker, Pietro, "Message Handling Systems, X.400", Message
   Handling Systems and Distributed Applications, E.  Stefferud, O-j.
   Jacobsen, and P.  Schicker, eds., North-Holland, 1989, pp. 3-41.

   [I18N-MAIL] (http://www.terena.nl/multiling/ml-mua/mldoc-review.html) Yuri Demchenko, TERENA,
   "I18N and Multilingual support in Internet mail, Standards Overview",
   Multilingual Mail Users Agents, TERENA Pilot Project. Homepage: http://park.kiev.ua/multiling/ml-mua/

   [RFC-821] (http://www.faqs.org/rfcs/rfc821.html) Postel, J., "Simple Mail Transfer Protocol", STD 10, RFC
   821, USC/Information Sciences Institute, August 1982.

   [RFC-822] (http://www.faqs.org/rfcs/rfc822.html)
   Crocker, D., "Standard for the Format of ARPA Internet Text
   Messages", STD 11, RFC 822, UDEL, August 1982.

   [RFC-934] (http://www.faqs.org/rfcs/rfc934.html) Rose, M., and E. Stefferud, "Proposed Standard for Message
   Encapsulation", RFC 934, Delaware and NMA, January 1985.

   [RFC-1049] (http://www.faqs.org/rfcs/rfc1049.html) Sirbu, M., "Content-Type Header Field for Internet
   Messages", STD 11, RFC 1049, CMU, March 1988.

   [RFC-1154] (http://www.faqs.org/rfcs/rfc1154.html) Robinson, D. and R. Ullmann, "Encoding Header Field for
   Internet Messages", RFC 1154, Prime Computer, Inc., April 1990.

   [RFC-1341] (http://www.faqs.org/rfcs/rfc1341.html) Borenstein, N., and N.  Freed, "MIME (Multipurpose Internet
   Mail Extensions): Mechanisms for Specifying and Describing the Format
   of Internet Message Bodies", RFC 1341, Bellcore, Innosoft, June 1992.

   [RFC-1342] (http://www.faqs.org/rfcs/rfc1342.html) Moore, K., "Representation of Non-Ascii Text in Internet
   Message Headers", RFC 1342, University of Tennessee, June 1992.

   [RFC-1343] (http://www.faqs.org/rfcs/rfc1343.html)
   Borenstein, N., "A User Agent Configuration Mechanism for
   Multimedia Mail Format Information", RFC 1343, Bellcore, June 1992.

   [RFC-1344] (http://www.faqs.org/rfcs/rfc1344.html) Borenstein, N., "Implications of MIME for Internet
   Mail Gateways", RFC 1344, Bellcore, June 1992.

   [RFC-1345] (http://www.faqs.org/rfcs/rfc1345.html)
   Simonsen, K., "Character Mnemonics & Character Sets", RFC 1345, Rationel Almen Planlaegning, June 1992.

   [RFC-1426] (http://www.faqs.org/rfcs/rfc1426.html) Klensin, J., (WG Chair), Freed, N., (Editor), Rose, M.,
   Stefferud, E., and D. Crocker, "SMTP Service Extension for 8bit-MIME
   transport", RFC 1426, United Nations University, Innosoft, Dover Beach
   Consulting, Inc., Network Management Associates, Inc., The Branch
   Office, February 1993.

   [RFC-1521] (http://www.faqs.org/rfcs/rfc1521.html) Borenstein, N., and N. Freed, "MIME
   (Multipurpose Internet Mail Extensions) Part One: Mechanisms for
   Specifying and Describing the Format of Internet Message Bodies",
   RFC 1521, Bellcore, Innosoft, September 1993.

   [RFC-1522] (http://www.faqs.org/rfcs/rfc1522.html) Moore, K., "Representation of Non-Ascii Text in Internet
   Message Headers," RFC 1522, University of Tennessee, September 1993.



asj-editors@arsdigita.com

Reader's Comments

I was a little surprised to see an article like this come from someone with an @arsdigita.com address, but read it hoping that this would be a fair treatment of the subject, with both pros and cons. Surprise! No such luck. Mr. Minsky seems to have focused exclusively on the technical aspects of sending HTML e-mail to people (presumably folks who have requested it), and completely ignored the other side of the question -- the social consequences of sending HTML e-mail. This was a fine examination of the technical aspects of sending HTML e-mail -- it just didn't look at the whole picture, and indeed, it suggests there isn't much else to look at.

I have two points that I wish had been covered in the article: First, what are the existing social conventions on the network about HTML in e-mail, and second, what do you gain by doing this? Both are critically important.

Certainly, a large number (perhaps even a significant majority) of users have clients capable of rendering HTML e-mail. But for those of us who, by choice or by no fault of our own, use clients that do not render HTML, publishers who choose to encode their e-mail run the risk of sending us junk -- which may eventually get your mail ignored.

Beyond the readability issue, the question is "how many users are you willing to piss off, and how thoroughly will you piss them off?" I've made it a habit to not return to sites (usually sites that want my money) if they send me HTML e-mail. I'm sufficiently reactionary to this stuff that I refuse to read e-mail sent in HTML format -- reactionary to the point that I actively filter for it. The number of spammers using HTML as an encoding method make this a particularly useful strategy for dealing with that problem; while it's probably killing some legitimate mail, I console myself with the knowledge that if they're rude enough to send HTML in the blind, I'm going to return the favor and not read their mail.

I'm probably at the extreme end of the spectrum, but I'm willing to bet that a lot of technically minded individuals -- people for whom using Outlook or Netscape is like pulling teeth -- feel exactly the same way about HTML in non-Web settings, whether it's e-mail or on Usenet. If you know that your target audience can read it, and, more importantly, is willing to read it as HTML, then by all means, send in HTML (as mediated by what I'm about to say). But if you don't know what your target audience is using, don't send in HTML. It's the same thing as not sending e-mail to people who haven't explicitly requested it -- it's rude if you do, and while there's no technical prohibition against doing it, there are numerous social conventions you'll be breaking. With all the discussion about personalization of Web services, it would be trivial to allow users to select whether they want to receive HTML e-mail or not -- assuming, of course, that the default was set to "no."

Aside from HTML itself, adding images to e-mail is a bad idea too. Consider the poor guy with a 28.8 dialup trying to load 30 in-line 25kb JPGs. He's not going to be too happy with you, particularly if he lives in the UK where people are billed for local calls. It's a thin end of the wedge problem, too: How long will it take before publishers start embedding streaming video in e-mail, and what happens to the poor guy on the 28.8 link then? He'll stop coming to your site, he'll filter your mail, and presto -- you've lost a visitor and, presumably, a customer.

This is a bad idea for the same reasons it's a bad idea to send unsolicited e-mail to your users. You may have their e-mail address (a resource), but they may not want to hear from you. Similarly, they may have bandwidth, but they don't want to let you use it to suck back those 30 in-line JPGs. My point so far? Unless you know what your users want and are willing to give you, stick to something very simple that doesn't consume more resources than it absolutely has to.

The other issue that wasn't addressed in the article was the question of what you gain by sending HTML e-mail. Most mail clients that can render HTML can also pick out URLs and turn them into hyperlinks on their own, so the idea of adding clickable links to e-mail becomes irrelevant. If the sole reason you're going to send in HTML is to include images, I would urge you to think really hard before doing it, and again consider what is gained by adding images to e-mail. Do you really need to include product shots? Do you need fancy formatting, backgrounds, fonts, and other chaff that merely "looks nice" but doesn't add anything to the message? What are you trying to communicate that would require the use of HTML (or images), or what do you want to do in e-mail that can't be better done elsewhere? What's the point behind sending images and fancy fonts? Fundamentally, does it communicate a different message from the one you'd be sending with plain text? In e-mail, more than on the Web, the content is what's important, not the presentation style. Minimalist e-mail can be beautiful, and perhaps more effective than maximalist multimedia presentations.

There's no right answer to this, obviously. It's a decision that each publisher is going to have to make independently from every other. But there are issues beyond the ones raised by Mr. Minsky, and publishers who are going to send their users e-mail need to be aware of them, lest they incur the wrath of readers.

-- Mike Sugimoto, June 9, 2000

Henry,

I appreciate the blurb on inline images in HTML e-mail, and your warning of the problems that they bring. As it turns out, as of June 2000, many of the major mail clients (Netscape, MS Outlook) do in fact support RFC 2557; but, many web-based mail readers do not properly handle the "multipart/related" MIME content type or the funny "cid:" URIs in IMG tags that refer to attachments elsewhere in the MIME document (e.g., <img src="cid:part2.MAIL.ID@arsdigita.com">)

The result is, if you attach in-line images, many mail readers might give up and revert to showing the plain text version rather than the HTML version, if you have one.

To echo the previous commenter, think twice before putting in-line images in your HTML e-mail. It may really cause you and your users more trouble than it's worth.

-- Bill Schneider, June 22, 2000
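As an illustration of the structure the comment above describes, here is a minimal sketch in Python's standard `email.mime` classes: an HTML part and an inline image wrapped in "multipart/related", with the image referenced by a "cid:" URI, and the whole thing offered as one alternative next to a plain text part. The function name `build_message`, the Content-ID value, and the placeholder image bytes are assumptions for illustration only.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message(plain: str, html: str, image_bytes: bytes, cid: str):
    """Build multipart/alternative: text/plain, then multipart/related (HTML + image)."""
    related = MIMEMultipart("related")
    related.attach(MIMEText(html, "html", "iso-8859-1"))

    image = MIMEImage(image_bytes, _subtype="gif")
    # The Content-ID is written with angle brackets; the HTML refers to it
    # as cid:<value> without them.
    image.add_header("Content-ID", f"<{cid}>")
    image.add_header("Content-Disposition", "inline")
    related.attach(image)

    outer = MIMEMultipart("alternative")
    outer.attach(MIMEText(plain, "plain", "us-ascii"))  # fallback comes first
    outer.attach(related)
    return outer


CID = "logo@example.invalid"  # hypothetical Content-ID
msg = build_message(
    plain="Today's newsletter is at http://example.invalid/today",
    html=f'<p>Today\'s newsletter</p><img src="cid:{CID}" alt="logo">',
    image_bytes=b"GIF89a",  # placeholder bytes, not a real image
    cid=CID,
)
print([part.get_content_type() for part in msg.walk()])
```

Placing the plain text part first matters here: a reader that gives up on the related part then has a readable fallback, which is the behavior the comment describes.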

I think that Mike Sugimoto raises some excellent questions, and that publishers would do very well to pay heed to his strong feelings on the subject.

My personal feelings are similar to his, in that I believe that email should of course never be sent unsolicited, and if the user does sign up, they should explicitly be able to choose the format, with a default of plain text. And I tried to be explicit about providing as many possible ways for people to figure out how to unsubscribe themselves as possible.

The publishers have different goals, and are often insensitive to the protocols and unwritten courtesies of the Internet communities. I agree that often times HTML mail is just used to get "in your face" rather than to provide additional function or ease of use.

To Bill Schneider and others, thanks for your comments about specific capabilities of mail readers. It is exactly this kind of information I hoped to exchange with other developers. I feel like I have been shooting in the dark somewhat when sending out millions of emails on behalf of publishers, and not knowing how many people were going to be unable to correctly view the messages, or what the common problems would be. I still see a fraction of unresolved problems, mostly with Outlook on Windows not being able to select the hyperlinks. Never reproducible on any of our systems of course.

Adhering to the email and related standards is very important in this area, to prevent total Microsoft-like chaos, but because there are so many options and implementations, and constant pressure to make use of new MIME features, it is very hard to figure out what features to try to use and expect to work. I hope people will share their experiences, and make this a living document.



-- Henry Minsky, June 28, 2000

Nice article! I've just suffered a lot to code the HTML email newsletter with my site news (in Portuguese).

Here are some comments/tips:

  • Verify your user's email address. The best you can do is: 1) DNS check the email domain; 2) Catch usual errors (e.g. in Brazil, typing domains like hotmail.com.br, or new users who type www.newuser@aol.com)
  • Automatically discover whether your user's client supports HTML email. Send a multipart welcome message where the HTML part has an image with the user id encoded in the URL. If the image is loaded, update the user's data to send them the HTML version.
  • Test with Hotmail. Hotmail changes your message code; since they have a lot of users, open an account there to test.
  • Forward and reply to your HTML email with some clients. Some clients (like Netscape Messenger) usually don't include your text, or even crash when forwarding or replying to an HTML message. Keep it simple and test with different clients.
That's it.

-- Paulo Eduardo Neves, August 31, 2000
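The detection tip in the comment above can be sketched roughly as follows, again with Python's standard library. Everything specific here is hypothetical: the `BEACON_URL` endpoint, the `u` query parameter, and the function names. The first function builds the welcome message; the second shows how the web server handling the image request might recover the user id so the subscription can be switched to HTML.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from html import escape
from urllib.parse import parse_qs, urlencode, urlparse

BEACON_URL = "http://news.example.invalid/beacon.gif"  # hypothetical endpoint


def build_welcome(user_id: str) -> MIMEMultipart:
    """Welcome message whose HTML part loads an image tagged with the user id."""
    beacon = f"{BEACON_URL}?{urlencode({'u': user_id})}"
    html = (
        "<p>Welcome to the newsletter!</p>"
        f'<img src="{escape(beacon, quote=True)}" width="1" height="1" alt="">'
    )
    msg = MIMEMultipart("alternative")
    msg.attach(MIMEText("Welcome to the newsletter!", "plain", "us-ascii"))
    msg.attach(MIMEText(html, "html", "us-ascii"))
    return msg


def user_from_beacon_request(url: str):
    """Return the user id carried by a beacon request URL, or None if absent."""
    values = parse_qs(urlparse(url).query).get("u")
    return values[0] if values else None


welcome = build_welcome("42")
print(user_from_beacon_request(f"{BEACON_URL}?u=42"))
```

A request for the image only shows that some client fetched it, so treat it as a hint rather than proof that the HTML displayed correctly. Note also that a later comment here objects to IMG tags used for tracking, so this is best reserved for users who have asked for the HTML version.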

In several years of e-mail admin I have to say that MS e-mail clients are both the most error-prone and the least error-tolerant that I have encountered. This means that they are more likely to have problems with each other's messages than to cause problems for others.

Some examples:

  1. The "begin " bug where any line in a message that starts "begin " is treated as the start of a uuencoded attachment, the rest of the message becoming the attachment title. Affects several versions of OE.
  2. Outlook 97 sometimes forgets to send the final terminating full stop (period) to mark the end of a message (especially if the last line of the message ends in a full stop). Some mail systems will reject the unfinished message - and Outlook will give the sender no notification that this has happened. Others will accept the message - but if the recipient also has Outlook 97 then it will lock up trying to download the message. No other mail client will have this problem.
  3. If the "Content-Type: multipart/mixed; boundary=" line in a message is word-wrapped then all the Outlook and OE versions I have so far seen will not be able to decode the message - even though they themselves will word-wrap this line in the messages they send.
All the *nix mail clients I have used are robust enough to ignore these errors. It's the MS clients that can't cope with the errors caused by MS clients.
[comment quoted from discussion on Slashdot with permission of author (itsbruce@netscape.net.REMOVE)]

-- Henry Minsky, September 23, 2000

Thanks for the very thorough article -- it's exactly what I was looking for. The other readers' comments are helpful too.

One more nice trick I've seen done is to place an HTML comment early on in the HTML mail saying something like "<!-- Whoa! What's all this garbage in my email? If you can read this, then your email program probably doesn't support HTML-formatted messages. Here's how to make it better: ... -->" (instructions for getting plain-text mail follows).

I'm also still leery of using HTML in email, especially given possible abuses with IMG tags for tracking readership, and rogue javascript.

-- Steve Yost, October 4, 2000

I wouldn't mind seeing a similar discussion to this one restricted to rules for formatting plain text messages. In particular, there seem to be issues around:

* width of lines that will be rendered

* fixed width font issues (is there some way to construct tabular data without fixed width fonts that's readable)

* is there any convention that might hint to a mail reader that the content should be fixed width?

* use of tab characters

* mail client issues

* rendering of hyperlinks (OE, for example, seems to NOT render them as clickable if you view the message in fixed width)

* splitting of hyperlinks if they're too wide. How wide should my hyperlink be?

* other mangling that might be done somewhere along the way

* canned hints that might be included for the reader as to how best view the message

etc.

Does anyone have such a list or a pointer to one?

Thanks, Bob Sidebotham

-- Bob Sidebotham, October 20, 2000

Related Links

  • Tcl modules standard library- The new tcllib library includes a very capable MIME encoder/decoder package: "Tcllib is a collection of utility modules for Tcl. These modules provide a wide variety of functionality, from implementations of standard data structures to implementations of common networking protocols. The intent is to collect commonly used function into a single library, which users can rely on to be available and stable. "   (contributed by Henry Minsky)

  • Info from djb- Daniel J Bernstein's collection of information about email including relevant RFCs, info about bounces, headers, and problems in MTAs   (contributed by Archit Shah)

  • HTML E-mail: Tricks of the Trade for Zend.com- An article about generating MIME messages in PHP. Suggests that base64 encoding should be used rather than quoted-printable, because of buggy Microsoft email clients.    (contributed by Henry Minsky)
