Localization

Part of an article on Building a Multilingual Web Service Using the ACS, by John Lowry (lowry@arsdigita.com)



A locale is the set of language and cultural rules used to format dates, numbers, monetary amounts and other information. Where appropriate, we need to localize the information that is displayed on the web site. In addition, we need to be able to accept input in web forms that is in a localized format.

Localizing software is a well-documented problem. The ISO 14652 standard documents the various areas of a computer program that need localization. In fact, the software on top of which ACS runs, Oracle and Unix, already makes APIs available to display various information in a locale-specific way. However, neither of these APIs can be conveniently called from within AOLserver, so we have written an API which provides procedures for reading and writing localized data.

Each procedure provided by our localization API needs to take at least the following two types of input:

  1. The data that needs to be converted to or from a localized format. What data are we going to be using on a web site that requires localization? Here are the areas that we consider most important:

    • Numbers

      Locale differences that affect numbers include the decimal separator and thousands separator.

    • Monetary amounts

      Monetary amounts can have different locale rules from numbers. For example, the number of digits allowable after the decimal point will depend on the currency.

    • Dates

      Dates can be represented with different formatting strings. For example, 7/4/2000 and July 4 2000 are both dates formatted for the en_US locale. We need to be able to localize each formatting string.

    Other types of data can be localized, such as address formats, phone numbers and weights and measures. In the case of measures, it would be necessary to store the measures in a canonical form and convert to and from a localized form, such as meters or feet when doing input or output. It would be possible to extend our localization API to cope with these other types of data.

  2. The locale that will be used for the conversion. We can represent the locale as a three-part string that includes the language, country and dialect in the following format:
    language_country_dialect
    For a web site, however, it is usually sufficient to specify locales without a dialect. We are unlikely to want to distinguish between, for example, Scottish English and English English in displaying data on a site. Therefore, we have specified a localization API that represents a locale using just the language and country.

    The language code is defined by the ISO 639 standard and the country is defined by the ISO 3166 standard. Examples of locales that we use include en_US (United States English) and fr_FR (French as spoken in France).
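The locale format described above can be illustrated with a minimal sketch in plain Tcl. The procedure name split_locale is hypothetical and is not part of the localization API. It assumes two-letter lower-case ISO 639 language codes and two-letter upper-case ISO 3166 country codes, as in en_US and fr_FR, and it tolerates an optional third dialect part.

```tcl
# Hypothetical helper (not part of the ACS API): split a locale string
# such as "fr_FR" into language, country and optional dialect.
proc split_locale {locale} {
    set parts [split $locale "_"]
    if {[llength $parts] < 2 || [llength $parts] > 3} {
        error "malformed locale \"$locale\": expected language_country"
    }
    set language [lindex $parts 0]
    set country  [lindex $parts 1]
    # lindex returns an empty string when there is no third element.
    set dialect  [lindex $parts 2]
    # Assumption: two-letter lower-case language, two-letter upper-case country.
    if {![regexp {^[a-z]{2}$} $language] || ![regexp {^[A-Z]{2}$} $country]} {
        error "malformed locale \"$locale\": expected language_country"
    }
    return [list $language $country $dialect]
}
```

Calling split_locale fr_FR returns a three-element list whose last element, the dialect, is empty.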

The Localization API

The Tcl localization API includes a host of helper procedures but the only ones that need be called are shown below:

Procedures for converting to and from localized versions of numbers:

lc_numeric num fmt locale

This procedure returns a number in a localized format. It takes as input the following parameters:

  • num is a canonical number.
  • fmt is a format string used by the Tcl format command. In most cases, this parameter should be an empty string.
  • locale is the locale abbreviation that will be used to convert the input number. It defaults to en_US.

lc_parse_number num locale integer_only_p

This procedure returns a canonical number, suitable for manipulating in Tcl or inserting into a database. It takes as input the following parameters:

  • num is a localized number. If num is invalid, the procedure throws an error.
  • locale is the locale that the input number is formatted with.
  • integer_only_p restricts valid numbers to integers only if this parameter is a true value. It defaults to a false value.

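To make the two number conversions concrete, here is a minimal sketch in plain Tcl. It is not the ACS implementation. The names sketch_numeric, sketch_parse_number and sketch_locale are hypothetical, the fmt parameter is left out, and digits are always grouped in threes. The en_US separators are the ones used in this article. The fr_FR separators are assumed for illustration, and the real locale data may differ.

```tcl
# Plain Tcl array standing in for the server's shared locale data.
# The fr_FR values are assumptions made for this sketch.
array set sketch_locale {
    en_US,decimal_point "."
    en_US,thousands_sep ","
    fr_FR,decimal_point ","
    fr_FR,thousands_sep " "
}

# Canonical number -> localized string (hypothetical, simplified).
proc sketch_numeric {num {locale en_US}} {
    global sketch_locale
    set dec $sketch_locale($locale,decimal_point)
    set sep $sketch_locale($locale,thousands_sep)
    # Split the canonical number into sign, integer part and fraction.
    if {![regexp {^(-?)([0-9]+)(\.([0-9]+))?$} $num -> sign int_part dummy frac]} {
        error "not a canonical number: $num"
    }
    # Peel off groups of three digits from the right.
    set grouped ""
    while {[string length $int_part] > 3} {
        set grouped "$sep[string range $int_part end-2 end]$grouped"
        set int_part [string range $int_part 0 end-3]
    }
    set result "$sign$int_part$grouped"
    if {$frac ne ""} { append result $dec $frac }
    return $result
}

# Localized string -> canonical number (hypothetical, simplified).
# The check is lenient: it does not verify where the separators sit.
proc sketch_parse_number {num locale {integer_only_p 0}} {
    global sketch_locale
    set dec $sketch_locale($locale,decimal_point)
    set sep $sketch_locale($locale,thousands_sep)
    # Drop thousands separators, then turn the decimal separator into ".".
    set canonical [string map [list $sep "" $dec "."] $num]
    if {$integer_only_p} {
        set pattern {^-?[0-9]+$}
    } else {
        set pattern {^-?[0-9]+(\.[0-9]+)?$}
    }
    if {![regexp $pattern $canonical]} {
        error "invalid number for locale $locale: $num"
    }
    return $canonical
}
```

With these assumed values, sketch_numeric 123456.789 en_US yields 123,456.789, and sketch_parse_number "123 456,789" fr_FR yields 123456.789.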
Procedure for displaying monetary amounts:

lc_monetary_currency -label -style num currency locale

This procedure returns a locale-specific monetary amount. It takes as input the following parameters:

  • -label is an optional switch. If it is set to a true value, the return value displays a currency alongside the amount.
  • -style is an optional switch. If it is set to int, the currency is displayed as the ISO currency code. By default, the appropriate HTML entity for the currency is displayed.
  • num is a canonical number that represents the amount.
  • currency is the currency for this monetary amount.
  • locale is the locale in which to format the monetary amount.

Procedure for displaying dates:

lc_time_fmt datetime fmt locale

This procedure returns a locale-specific date string. It takes as input the following parameters:

  • datetime is a date string in the form YYYY-MM-DD HH24:MI:SS
  • fmt is a formatting string specified by the ISO 14652 standard.
  • locale is the locale in which to display the date.
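
The date conversion can be sketched in the same way. This is a simplified illustration, not the ACS implementation. The names sketch_time_fmt and date_locale are hypothetical, only the %a, %B, %d, %Y, %H and %M directives are handled, and the French names are supplied here for illustration rather than taken from the library. The sketch assumes Tcl 8.5 or later for clock scan -format.

```tcl
# Plain Tcl array standing in for the server's shared locale data.
array set date_locale {
    en_US,abday {Sun Mon Tue Wed Thu Fri Sat}
    en_US,mon   {January February March April May June July August September October November December}
    fr_FR,abday {dim lun mar mer jeu ven sam}
    fr_FR,mon   {janvier février mars avril mai juin juillet août septembre octobre novembre décembre}
}

# Hypothetical, simplified date formatter.
proc sketch_time_fmt {datetime fmt {locale en_US}} {
    global date_locale
    # Parse the canonical timestamp; -gmt keeps the result independent
    # of the server's time zone.
    set t [clock scan $datetime -format "%Y-%m-%d %H:%M:%S" -gmt 1]
    # %w is the day of the week as a number, with 0 meaning Sunday.
    set wday [clock format $t -format %w -gmt 1]
    # scan with %d strips the leading zero from the month number.
    scan [clock format $t -format %m -gmt 1] %d month
    # Names come from the locale data, numbers from the timestamp.
    set map [list \
        %a [lindex $date_locale($locale,abday) $wday] \
        %B [lindex $date_locale($locale,mon) [expr {$month - 1}]] \
        %d [clock format $t -format %d -gmt 1] \
        %Y [clock format $t -format %Y -gmt 1] \
        %H [clock format $t -format %H -gmt 1] \
        %M [clock format $t -format %M -gmt 1]]
    return [string map $map $fmt]
}
```

For example, sketch_time_fmt "2000-07-24 14:22:34" "%a %B %d, %Y %H:%M" en_US returns Mon July 24, 2000 14:22.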

Locale Data

The localization API in Tcl requires a source of localization data: a list of the decimal separators, thousands separators and other conventions for each locale. Much of this data is already provided by the Linux operating system. It is stored in the /usr/share/i18n/locales/ directory and accessed through the localeconv C library function.

We used a C program which calls localeconv to dump all the Linux locale data to a file. From this, we have generated a Tcl library which gets loaded at startup by AOLserver. The locale data gets stored in a variable that is accessible to all Tcl interpreters in the server, so that it can be used by the localization API.

To support a locale that Linux does not provide, you would need to add a section to the library with appropriate values for the new locale. The section for the en_US locale, for example, looks like this:

nsv_set locale en_US,abday {{Sun} {Mon} {Tue} {Wed} {Thu} {Fri} {Sat}}
nsv_set locale en_US,abmon {{Jan} {Feb} {Mar} {Apr} {May} {Jun} {Jul} {Aug} {Sep} {Oct} {Nov} {Dec}}
nsv_set locale en_US,am_str "AM"
nsv_set locale en_US,currency_symbol "$"
nsv_set locale en_US,day {{Sunday} {Monday} {Tuesday} {Wednesday} {Thursday} {Friday} {Saturday}}
nsv_set locale en_US,decimal_point "."
nsv_set locale en_US,d_fmt "%m/%d/%y"
nsv_set locale en_US,d_t_fmt "%a %B %d, %Y %r %Z"
nsv_set locale en_US,frac_digits 2
nsv_set locale en_US,grouping {3 3 0}
nsv_set locale en_US,int_curr_symbol "USD "
nsv_set locale en_US,int_frac_digits 2
nsv_set locale en_US,mon_decimal_point "."
nsv_set locale en_US,mon_grouping {3 3 0}
nsv_set locale en_US,mon {{January} {February} {March} {April} {May} {June} {July} {August} {September} {October} {November} {December}}
nsv_set locale en_US,mon_thousands_sep ","
nsv_set locale en_US,n_cs_precedes 1
nsv_set locale en_US,negative_sign "-"
nsv_set locale en_US,n_sep_by_space 0
nsv_set locale en_US,n_sign_posn 1
nsv_set locale en_US,p_cs_precedes 1
nsv_set locale en_US,pm_str "PM"
nsv_set locale en_US,positive_sign ""
nsv_set locale en_US,p_sep_by_space 0
nsv_set locale en_US,p_sign_posn 1
nsv_set locale en_US,t_fmt_ampm "%I:%M:%S %p"
nsv_set locale en_US,t_fmt "%r"
nsv_set locale en_US,thousands_sep ","
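
The monetary field names in the listing above match the members of the C struct lconv returned by localeconv. As a rough sketch of how such fields can combine into a formatted amount, the following plain Tcl code uses an ordinary array in place of the shared nsv array. The names sketch_monetary and money_locale are hypothetical, the xx_XX locale and its values are invented purely for illustration, and only non-negative amounts are handled.

```tcl
# en_US values are copied from the listing above; xx_XX is a made-up
# locale used only to show a trailing currency symbol.
array set money_locale {
    en_US,currency_symbol   "$"
    en_US,frac_digits       2
    en_US,mon_decimal_point "."
    en_US,mon_thousands_sep ","
    en_US,p_cs_precedes     1
    en_US,p_sep_by_space    0
    xx_XX,currency_symbol   "Kr"
    xx_XX,frac_digits       2
    xx_XX,mon_decimal_point ","
    xx_XX,mon_thousands_sep "."
    xx_XX,p_cs_precedes     0
    xx_XX,p_sep_by_space    1
}

# Hypothetical, simplified formatter for non-negative amounts.
proc sketch_monetary {num {locale en_US}} {
    global money_locale
    if {$num < 0} { error "this sketch handles non-negative amounts only" }
    # Copy each locale field into a local variable of the same name.
    foreach key {currency_symbol frac_digits mon_decimal_point
                 mon_thousands_sep p_cs_precedes p_sep_by_space} {
        set $key $money_locale($locale,$key)
    }
    # Round to the locale's number of fractional digits.
    set fixed [format "%.${frac_digits}f" $num]
    set parts [split $fixed "."]
    set int_part [lindex $parts 0]
    set frac [lindex $parts 1]
    # Group the integer digits in threes (a simplification of mon_grouping).
    set grouped ""
    while {[string length $int_part] > 3} {
        set grouped "$mon_thousands_sep[string range $int_part end-2 end]$grouped"
        set int_part [string range $int_part 0 end-3]
    }
    set amount "$int_part$grouped"
    if {$frac ne ""} { append amount $mon_decimal_point $frac }
    # p_sep_by_space: whether a space separates symbol and amount.
    set gap [expr {$p_sep_by_space ? " " : ""}]
    # p_cs_precedes: whether the symbol comes before the amount.
    if {$p_cs_precedes} {
        return "$currency_symbol$gap$amount"
    }
    return "$amount$gap$currency_symbol"
}
```

With the en_US values, sketch_monetary 123.4 en_US gives $123.40. The invented xx_XX locale places the symbol after the amount, separated by a space.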

Data input

Let's look at how users input data in a web form. Each form is made up of a number of input fields such as the one below:
Price
We need to process the values submitted by the user with the correct procedure from the localization API to convert the localized input into a canonical number that can be manipulated in Tcl or inserted into the Oracle database. In practice, the only type of input that needs to be modified is numbers. We need to use lc_parse_number to get the canonical representation of the user's input in the above form.

The templating module that is supplied as part of the ACS is particularly suitable for use with multilingual web sites because it allows caching of translated forms. A programmer using the templating module specifies the form elements for a web page using an XML-like syntax, rather than coding the HTML by hand. Each form element has a number of properties, but the one that we care about for the purpose of localization is its datatype. We need to provide localized versions of templating system datatypes so that the data entry widgets can correctly handle input from different locales.

Here is a form specification for a text entry box that uses the lc_number datatype to accept localized numbers:

  <element status="optional" width=10>
    <name>price</name>
    <label>Price</label>
    <widget>text</widget>
    <datatype>lc_number</datatype>
    <datamap>
      <table>prices</table>
    </datamap>
  </element>
The datatype of the text widget is set to lc_number, which automatically passes the input through the lc_parse_number procedure and attempts to validate it as a number in the user's locale. Thus a French user could enter 123,45 and the templating system would automatically convert this to 123.45 before it got stored as a number in the database.

Source code examples

We can now look at how a programmer uses the localization API to display localized data on a web page. Here is an example of a Tcl script that does this:
ns_return 200 text/html "
<table cellpadding=3>
<tr><th></th>
    <th>en_US</th>
    <th>fr_FR</th></tr>
<tr><td>Displaying a number</td>
    <td>[lc_numeric 123456.789 {} en_US]</td>
    <td>[lc_numeric 123456.789 {} fr_FR]</td></tr>
<tr><td>Parsing a number</td>
    <td>[lc_parse_number 123,456.789 en_US]</td>
    <td>[lc_parse_number "123 456,789" fr_FR]</td></tr>
<tr><td rowspan=2 valign=top>Displaying a monetary amount</td>
    <td>[lc_monetary_currency -label 1 -style local 123.4 USD en_US]</td>
    <td>[lc_monetary_currency -label 1 -style local 123.4 USD fr_FR]</td></tr>
<tr><td>[lc_monetary_currency -label 1 -style local 1234 FRF en_US]</td>
    <td>[lc_monetary_currency -label 1 -style local 1234 FRF fr_FR]</td></tr>
<tr><td>Displaying a date</td>
    <td>[lc_time_fmt "2000-07-24 14:22:34" "%c" en_US]</td>
    <td>[lc_time_fmt "2000-07-24 14:22:34" "%c" fr_FR]</td></tr>
</table>
"
When this script is run from AOLserver, we get the following information displayed in a web browser:

                               en_US                        fr_FR
Displaying a number            123,456.789                  123456,789
Parsing a number               123456.789                   123456.789
Displaying a monetary amount   $123.40                      123,40 $
                               FFr1,234.00                  1 234,00 FFr
Displaying a date              Mon July 24, 2000 02:22 PM   lun 24 juillet 2000 14:22

Now let's look at how a programmer would use the ACS templating module to display localized data. The programmer specifies data sources for all information that gets displayed in a web page. The programmer needs to ensure that each data source has been localized where appropriate. For example, here is a data source that displays a localized date.
  <datasource>
    <name>sysdate</name>
    <type>eval</type>
    <structure>onevalue</structure>
    <condition>
    lc_time_fmt [ad_dbquery onevalue "select sysdate from dual"] "%c" $user_locale
    </condition>
    <comment>
    The current date.
    </comment>
  </datasource>
The second argument to lc_time_fmt, %c, is the ISO 14652 formatting string which displays an appropriate date and time representation. A user in the fr_FR locale, for example, sees the following displayed when this data source is included in a web page with the locale data that we use.

mar 04 juillet 2000 16:30

All numbers and monetary amounts must also be localized with the appropriate localization API procedures within the data source.

More information

Oracle localization information http://oradoc.photo.net/ora816/server.816/a76966/toc.htm
Linux locale man page http://man.he.net/man7/locale
Localization support in internet mail http://www.terena.nl/multiling/ml-mua/mldoc-review.html
ISO 14652 standard for software localization http://anubis.dkuug.dk/JTC1/SC22/WG20/docs/14652fcd.txt
ISO 3166 standard for country codes http://wmbr.mit.edu/stations/ISOcodes.html
ISO 639 standard for language names http://userpage.chemie.fu-berlin.de/diverse/doc/ISO_639.html

asj-editors@arsdigita.com
