

It can be difficult to debug character set encoding problems, because
files can be converted to different encodings at so many points in the
AOLserver and ACS request processing pipeline.


As of ACS 3.4.1, here is the flow of control for a document being
served by the web server. Note that the control flow described below
is for the case when my patches have been applied.

Each point at which a character set encoding translation can happen is
indicated.

Also described is the path for conversion of form data (from a GET or
POST query).


0) A GET or POST to the server initiates a request to the
request processor

There are three types of request we handle:

  GET

  POST application/x-www-form-urlencoded

  POST multipart/form-data
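
As a rough illustration of how the three cases can be told apart, here
is a minimal sketch in plain Tcl. The proc name and its return values
are made up for this note and are not part of ACS or AOLserver; it
only looks at the request method and the Content-Type header value.

```tcl
# Hypothetical helper (not an ACS or AOLserver command): classify a
# request by its method and Content-Type header value.
proc classify_request_sketch {method content_type} {
    set method [string toupper $method]
    # Drop any parameters such as "; boundary=..." or "; charset=...".
    set type [string tolower [string trim [lindex [split $content_type ";"] 0]]]
    if {$method eq "GET"} {
        return get
    }
    if {$method eq "POST" && $type eq "application/x-www-form-urlencoded"} {
        return post-urlencoded
    }
    if {$method eq "POST" && $type eq "multipart/form-data"} {
        return post-multipart
    }
    return other
}

puts [classify_request_sketch POST "multipart/form-data; boundary=xyz"]
```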






1) Tcl script file




File on disk: /www/foo.tcl

[.tcl extension has been registered at ACS restart
 by code in abstract-url-init.tcl, using rp_register_extension_handler]

+ Abstract URL processor dispatches to rp_handle_tcl_request

rp_handle_tcl_request calls source_with_encoding with the file name

 - Without my patches, the normal Tcl "source" command would be called,
 and the file would be loaded and run assuming the system default
 encoding (ISO-8859-1).

+ source_with_encoding is called with the disk filename

+ source_with_encoding calls [ns_guesstype $filename] to get the MIME type.
  
The MIME types for file names can be registered in the .ini file
with explicit charset parameters, like this:

[ns/mimetypes] 
.tcl=text/html; charset=shift_jis

Note: ns_guesstype, an AOLserver command, uses only the file suffix after
the final '.' in the filename. So you can't assign a MIME type to a
file extension containing a '.', for example ".jp.tcl".
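
The suffix rule can be illustrated with plain Tcl's "file extension"
command, which likewise returns only the part of the name from the
final '.' onward. This is only an analogy for the behavior described
above, not the implementation of ns_guesstype, and the file names are
made up.

```tcl
# Plain Tcl only: "file extension" returns everything from the last '.'
# in the final path element, so ".jp.tcl" is seen as just ".tcl".
foreach name {/www/foo.tcl /www/foo.jp.tcl /www/index.html} {
    puts "$name -> [file extension $name]"
}
```

Both foo.tcl and foo.jp.tcl come out with the suffix ".tcl", which is
why the two cannot be given different MIME types by suffix alone.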

+ source_with_encoding then calls ns_encodingfortype to find the Tcl
charset encoding given the MIME type. This performs a translation of
the "Internet" character set name to the Tcl encoding name. The Tcl
encodings share some of the same names as the MIME charset names, but
not all of them, hence the mapping is table driven.

For example, the MIME type "text/plain; charset=ISO-8859-1" would map
to "iso8859-1", but "text/plain; charset=shift_jis" would map to
"shiftjis".

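A table-driven lookup of this kind might look like the following
sketch in plain Tcl. The array, the proc name and the handful of
entries are assumptions made for illustration; the real table is
inside AOLserver and ns_encodingfortype may behave differently, for
instance when no charset is given.

```tcl
# Hypothetical lookup table from MIME charset names (lower case) to Tcl
# encoding names. Only a few entries are shown. Note that Tcl spells the
# Latin-1 encoding "iso8859-1".
array set charset_to_tcl {
    iso-8859-1  iso8859-1
    shift_jis   shiftjis
    euc-jp      euc-jp
    utf-8       utf-8
}

# Hypothetical helper: pull the charset parameter out of a MIME type and
# translate it. Returns "" when there is no charset or it is not in the
# table.
proc encoding_for_type_sketch {mimetype} {
    global charset_to_tcl
    if {![regexp -nocase {charset\s*=\s*([^\s;]+)} $mimetype -> charset]} {
        return ""
    }
    set charset [string tolower $charset]
    if {[info exists charset_to_tcl($charset)]} {
        return $charset_to_tcl($charset)
    }
    return ""
}

puts [encoding_for_type_sketch "text/plain; charset=shift_jis"]
puts [encoding_for_type_sketch "text/html; charset=ISO-8859-1"]
```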

+ The file is opened using Tcl "open" and the channel encoding is set
using "fconfigure -encoding" to the encoding computed in the previous
step. The file is then loaded into a Tcl string, and the specified
charset encoding translation is performed. Thus the file ends up
translated into a UTF-8 string, ready to be evaluated.
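
The open / fconfigure / read sequence can be sketched in plain Tcl as
follows. The proc name and the temporary file name are made up for
illustration; this is not the actual body of source_with_encoding,
only the standard Tcl channel calls that the step describes.

```tcl
# Hypothetical helper: read a whole file, decoding it from the given Tcl
# encoding (for example "shiftjis") into a Tcl string.
proc read_file_with_encoding_sketch {filename tcl_encoding} {
    set fd [open $filename r]
    fconfigure $fd -encoding $tcl_encoding
    set data [read $fd]
    close $fd
    return $data
}

# Round trip: write three Japanese characters as Shift_JIS bytes, then
# read them back through the helper.
set tmp "encoding_demo_[pid].txt"
set fd [open $tmp w]
fconfigure $fd -encoding shiftjis
puts -nonewline $fd "\u65e5\u672c\u8a9e"
close $fd

set text [read_file_with_encoding_sketch $tmp shiftjis]
puts [string length $text]   ;# 3 characters (the file holds 6 bytes)
file delete $tmp
```

Reading the same file with the wrong encoding would not raise an
error; it would just produce the wrong characters, which is part of
what makes these problems hard to spot.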

+ A call to ns_startcontent is made using the MIME type. This sets the
encoding of the AOLserver network output connection to the same
encoding as was computed for loading of the .tcl file. This is a
heuristic that I added; it may be better to defer this so that it
happens someplace else.

+ A call to "uplevel" on the file data causes it to be evaluated in
the context of the rp_handle_tcl_request call.
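
The effect of "uplevel" here can be shown with a small sketch in plain
Tcl. Both proc names are stand-ins made up for this note; the second
one plays the role of rp_handle_tcl_request and the first plays the
role of the code that evaluates the loaded file data.

```tcl
# Hypothetical stand-in for the final step: evaluate already-decoded
# script text one level up, in the caller's context.
proc eval_in_caller_sketch {script_text} {
    uplevel 1 $script_text
}

# Hypothetical caller standing in for rp_handle_tcl_request.
proc handle_request_sketch {} {
    set greeting "hello"
    # The evaluated script sees and can modify this proc's local variables.
    eval_in_caller_sketch {set greeting "$greeting, world"}
    return $greeting
}

puts [handle_request_sketch]   ;# prints "hello, world"
```

Because the script runs in the caller's frame rather than in the
helper's, the page script behaves as if it had been sourced directly
by the request handler.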

++ Output to the network connection ++







2) ADP file




3) HTML file




4) Other files