
Mon Sep 11 03:26:57 2000

Notes on patching ACS 3.4.3 and AOLserver 3.0+ad7 to support multiple
character sets.

This document describes how to set the default web server character
set to Japanese Shift JIS, and to support Chinese BIG5 encoding at the
same time.

Character set encoding raises a number of issues for both
single-language and multi-language sites. They are discussed in an
article at
http://www.arsdigita.com/asj/multilingual

These patches do not address all the issues raised there in terms 
of internationalization and localization, but they do allow you to
manually manipulate character set encodings used in input and output
of web page scripts and data.

================================================================


+ Patch to AOLserver to disable automatic charset translation for user
form data:

The following patch causes the [ns_conn form] command to grab form
data with no charset translation. This is needed in order to permit
you to manually perform charset translation of user submitted data in
your application, using a modified version of ns_getform.

1) Unpack the AOLserver source and apply the patch file
acs-aolserver-ad7.patch, which replaces the definition of
Ns_QueryToSet in nsd/conn.c:

 cd /place-you-put-aolserver-sources/nsd
 patch < acs-aolserver-ad7.patch


2) Copy the new encoding file "8bit.enc" to the AOLserver Tcl encoding
directory, wherever it was installed (look under
aolserver/tcl8.3.2/library/encoding):

 cp 8bit.enc /place-you-put-aolserver-sources/tcl8.3.2/library/encoding

3) Recompile AOLserver using "make" and install the new nsd8x binary
into your web server bin directory.

4) Modify your web server's modules/tcl/charsets.tcl and form.tcl
files as follows.


 cd /web/yourserver/modules/tcl
 patch < ad7-aolserver-tcl-i18n.patch


This allows your application to handle form data submitted with GET
and both flavors of POST. The patch adds an optional CHARSET argument
to ns_getform and performs encoding translation in Tcl instead of in
AOLserver.

If you want the data converted from a specific character set, you can
either pass a charset arg explicitly to ns_getform, or set the global
flag using ns_setformencoding.

Note: when ns_getform is called with no arguments, the encoding used
should be the default system encoding set by ns/parameters/URLCharset.
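
The kind of translation that now happens in Tcl can be illustrated with
the standard Tcl "encoding" command. This is only a sketch, not the code
from the patch: the proc name decode_form_value is hypothetical, and it
assumes the raw form value arrives as untranslated bytes.

```tcl
# Hypothetical helper (not part of the patch): decode raw form bytes
# from the named Tcl encoding into Tcl's internal string form.
proc decode_form_value {raw {encoding shiftjis}} {
    return [encoding convertfrom $encoding $raw]
}

# The two bytes 0x93 0xFA are the Shift JIS encoding of U+65E5.
set raw  [binary format H* 93fa]
set text [decode_form_value $raw shiftjis]

# After decoding, the two bytes have become a single character.
puts [string length $text]
```

Note that Tcl names this encoding "shiftjis", while the MIME charset
name is "shift_jis".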

================================================================

Patching the Request Processor

The patch below makes the request processor check the MIME type
of tcl, html and adp files before loading them and, if a charset
is found, perform the encoding translation to that charset.

The patch also sets the url_charset using ns_setformencoding, so that
user-submitted form data (POST or GET) will be translated from
that encoding to Tcl's internal UTF-8. This can be overridden with an
explicit call to ns_setformencoding in your script file before
a call to ns_getform or ad_page_contract.

 cd packages/acs-core
 patch < acs342-acs-core.patch
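
The charset check described above can be sketched in plain Tcl as
follows. This is an illustration, not the code in the patch: the proc
names mime_charset and charset_to_encoding are hypothetical, and the
mapping from charset names to Tcl encoding names covers only the
charsets used in this document.

```tcl
# Hypothetical helper: return the charset named in a MIME type string,
# lowercased, or an empty string when no charset parameter is present.
proc mime_charset {mimetype} {
    if {[regexp -nocase {;\s*charset\s*=\s*"?([^\s";]+)} $mimetype -> charset]} {
        return [string tolower $charset]
    }
    return ""
}

# Assumed mapping from MIME charset names to Tcl encoding names.
# Names not listed here are passed through unchanged.
proc charset_to_encoding {charset} {
    switch -- $charset {
        shift_jis  { return shiftjis }
        iso-8859-1 { return iso8859-1 }
        big5       { return big5 }
        default    { return $charset }
    }
}

# Example: find the Tcl encoding for a .adp_cn style MIME type.
puts [charset_to_encoding [mime_charset "text/html; charset=big5"]]
```

A MIME type with no charset parameter yields an empty string, which a
caller can treat as "no translation requested".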

================================================================



Configuring AOLserver/ACS to a Default Character Set

The config parameters below configure the default charset
conversion for output of documents to be Japanese Shift JIS.

Documents with extension .tcl_cn and .adp_cn will be loaded and
delivered in big5 encoding, and user form data from ns_getform will be
converted to big5 by default in this case (with the above patches to
the request processor).


To set the default charset for your site to be Japanese Shift JIS, use
the following in your main AOLserver .ini or nsd.tcl file.

[ns/parameters]
HackContentType=1
URLCharset=shift_jis
OutputCharset=shift_jis
HttpOpenCharset=shift_jis

[ns/mimetypes] 
Default=text/plain 
NoExtension=text/plain 
.pcd=image/x-photo-cd 
.prc=application/x-pilot 
.doc=application/msword
.xls=application/msexcel
.ppt=application/x-mspowerpoint
.swf=application/x-shockwave-flash
.html=text/html; charset=shift_jis
.html_sj=text/html; charset=shift_jis
.html_cn=text/html; charset=big5
.txt=text/plain; charset=iso-8859-1
.txt_sj=text/plain; charset=shift_jis
.tcl=text/plain; charset=shift_jis
.tcl_cn=text/plain; charset=big5
.adp=text/html; charset=shift_jis
.adp_cn=text/html; charset=big5
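
If your server is configured through nsd.tcl rather than an .ini file,
the same settings might be written with ns_section and ns_param, as in
the abbreviated fragment below. It assumes the parameter names are
unchanged between the two formats, and it lists only the
charset-related MIME types.

```tcl
ns_section "ns/parameters"
ns_param   HackContentType 1
ns_param   URLCharset      shift_jis
ns_param   OutputCharset   shift_jis
ns_param   HttpOpenCharset shift_jis

ns_section "ns/mimetypes"
ns_param   .html    "text/html; charset=shift_jis"
ns_param   .html_cn "text/html; charset=big5"
ns_param   .tcl     "text/plain; charset=shift_jis"
ns_param   .tcl_cn  "text/plain; charset=big5"
ns_param   .adp     "text/html; charset=shift_jis"
ns_param   .adp_cn  "text/html; charset=big5"
```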

================================================================

See the examples/ directory for examples of Tcl and other scripts
which use multibyte charset data.



================================================================

References 


http://www.hut.fi/~jkorpela/HTML3.2/3.1.html

http://www.hut.fi/u/jkorpela/www/windows-chars.html

http://www.hut.fi/u/jkorpela/chars.html

