ArsDigita Archives
 
 
   
 

Interfacing a Web Service with PC-attached Data Collection Hardware

by Dave Abercrombie (abe@arsdigita.com)

Submitted on: 2000-01-01
Last updated: 2000-09-01

ArsDigita : ArsDigita Systems Journal : One article



The Problem

Many portable data collection devices, such as hand-held GPS receivers, personal organizers, etc., can exchange data with a desktop computer. They usually do this over a serial connection with proprietary protocols. Vendors of portable data collection hardware usually provide PC-based software to manage the data transfer from the device to the PC, and this software often has a variety of analysis and display features. These data collection applications would benefit from having the data transferred to a web server, which can add capabilities that are impossible or difficult on a PC, such as data sharing, analysis and comparison, publication and archiving.

Transferring data from portable data collection hardware to a web server requires solving a significant problem: portable data collection devices usually cannot connect directly to the Internet. They typically do not have Ethernet connections, TCP/IP stacks, or support for HTTP. When it is possible to add direct Internet connectivity to these portable devices with add-on hardware, it can be expensive. The obvious solution to this problem is to use the desktop personal computer as an intermediary, as shown in Figure 1 below. Most PCs are connected to the Internet already, so it makes sense to transfer data from the portable device to the PC, then from the PC on to the web server.

Figure 1, PC as intermediary

Assuming that the PC is being used as the data transfer intermediary, the next major problem is that existing web browsers are not well suited to this type of data transfer. Browsers are unable to obtain data directly from portable data collection devices. They are largely designed for viewing web pages, not uploading data streams. It can also be difficult to interface the web browser with the PC-based data collection software.

An ArsDigita customer needed this type of data transfer capability for their handheld odor detector, the Cyranose 320 (see http://cyranosciences.com/products/). Cyrano Sciences Inc. had already developed a couple of versions of PC-based software used to control and obtain data from this device. Using the techniques described in this article, ArsDigita extended their software by adding HTTP data transfer abilities.

The Vision

The solution we chose is to integrate HTTP data transfer capabilities directly into the PC-based data collection software. The collection software connects with the device over a serial port or other direct connection, obtains the data, processes it, then uses its Internet connection to send it to the web server.

Programming the device interfaces is beyond the scope of this article, since hardware device interface issues are almost always device-specific. Instead, we'll focus on the challenges involved in adding Internet data transfer code to existing applications.

Because we restricted ourselves to modifying existing data collection programs, we were limited in our choice of programming language and application architecture. We used two Microsoft development environments: Visual Basic 5 Professional (VB) and Visual C++ 6.0 Standard with the Microsoft Foundation Class Library (VC++), which we will discuss in turn.

The Visual Basic Solution

Development Environment Features and Use

Visual Basic includes an "Internet Transfer Control" component that supports both HTTP and FTP protocols. It can operate in either synchronous or asynchronous mode: in synchronous mode, data transfer must be complete before any other code can execute, while in asynchronous mode other code can execute while data is transferred in the background. Using the asynchronous mode is a little more complicated, since you need to write additional code that looks for and acts on certain transfer states.

To use HTTP data transfer capabilities in your program, you first need to add an Internet Transfer Control component to a form. This must be done at design time, not run time. It should be given a descriptive name since the default Inet1, Inet2, etc. names can be confusing. The remaining steps depend on whether you want synchronous or asynchronous transfer.

To use synchronous mode, you must:

  1. Use standard string handling capabilities to create a URL string that includes the protocol (i.e., HTTP), the host, page, and the form variables. An example might be http://scorecard.org/env-releases/state-chemical-detail.tcl?category=cancer&modifier=air&fips_state_code=06.

  2. Call the OpenURL method of the Internet Transfer Control component. This method will return the text of the page, but not the headers. No code will be able to execute during the transfer.

Here is some example code using the OpenURL method:

' we get the hostname from the Form1 web_host property
' hostname is name only, no protocol, e.g., "dev.hostname.com"
' login_page has leading slash and no "?" e.g., "/vb/login.tcl"
upload_host$ = Form1.web_host
login_page$ = "/vb/login.tcl"

' start with protocol and append host
login_url$ = "http://" & upload_host$

' add page and question mark
login_url$ = login_url$ & login_page$ & "?"

' add email and ampersand
login_url$ = login_url$ & "email=" & login_email.Text & "&"

' add password
login_url$ = login_url$ & "password=" & login_password.Text

' get page, parse it later
login_result$ = login_inet.OpenURL(login_url$)
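
The example above appends login_email.Text and login_password.Text to the URL unchanged, so a value containing a space, "&" or "=" would corrupt the query string. The following is a minimal sketch of percent-encoding form values before joining them. It is written in portable C++11 rather than VB or VC++ 6.0, and the function names url_encode and build_query are our own, chosen for illustration only.

```cpp
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Percent-encode one form value: keep letters, digits and "-_.~",
// and escape every other byte as %XX.
std::string url_encode(const std::string& value) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : value) {
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// Join name/value pairs as "name=value&name=value", encoding each value.
// Field names are assumed to be plain identifiers that need no encoding.
std::string build_query(
        const std::vector<std::pair<std::string, std::string> >& fields) {
    std::string out;
    for (const auto& field : fields) {
        if (!out.empty()) out += '&';
        out += field.first + "=" + url_encode(field.second);
    }
    return out;
}
```

The same idea applies to the POST form data built later in this article: encode each value, then join the pairs with ampersands.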

The asynchronous mode requires a few more steps:

  1. Use standard string handling capabilities to create a URL string that includes the protocol (i.e., HTTP), the host, and page. An example might be http://scorecard.org/env-releases/state-chemical-detail.tcl.

  2. Use standard string handling capabilities to create a string that holds the form variables. An example might be category=cancer&modifier=air&fips_state_code=06. Note that no question mark is needed at the end of the page nor at the beginning of the form variables.

  3. Call the Execute method of the Internet Transfer Control component. Unlike OpenURL, the Execute method is called here with three arguments: the host/page, the HTTP method (e.g., POST), and the form variable string. This method will send an HTTP request to the web server. Other code will supposedly be able to execute during the transfer.

  4. The StateChanged event of the Internet Transfer Control component fires as the data transfer progresses. You need to have code in its event procedure that includes a Select Case statement (like a C or Tcl switch) that looks for certain transfer states. For example, you might have a case for the state icResponseCompleted whose contents would execute when the transfer was complete.

Here is some example code using the asynchronous Execute method:

 ' hardcoded host name, later might want to use registry?
 upload_host$ = "dev.hostname.com"
 new_session_page$ = "/vb/upload/open.tcl"
 
 ' create URL for login. will be using post since some fields may be large

 ' start with protocol and append host
 new_session_url$ = "http://" & upload_host$
    
 ' add page
 new_session_url$ = new_session_url$ & new_session_page$

 ' start creating form data for posting ------------------------
 ' read these right out of the text boxes
 strFormData$ = "user_id=" & Form1.web_user_id & "&"
 ' hardcoded streaming_p
 strFormData$ = strFormData$ & "streaming_p=t&"
 strFormData$ = strFormData$ & "title=" & Me.title & "&"
 strFormData$ = strFormData$ & "description=" & Me.description & "&"
 strFormData$ = strFormData$ & "device_info=" & Me.device_info

 ' post
 new_session_inet.Execute new_session_url$, "POST", strFormData$
 
 ' the rest of the action takes place in the event procedure
 ' new_session_inet_StateChanged(), which the control calls for us

And over in the Internet Transfer Control component's StateChanged event procedure we have (for the asynchronous mode):

Private Sub new_session_inet_StateChanged(ByVal State As Integer)
    Dim res_string As String
    Dim data_str As String
    
    Select Case State
    ' ... Other cases not shown.
    Case icResponseCompleted ' 12
    
        ' Get the first (and only) chunk from the web server
        res_string$ = new_session_inet.GetChunk(1024, icString)
        ' no need to loop, all fits in 1024 bytes
        
        ' parse out data with custom function
        data_str$ = parse_data(res_string$)
            
        ' pass parsed data back to main form
        Form1.web_data = data_str$
        
        ' close this window as soon as process is complete
        Unload Me
        
    End Select
End Sub

Design Approach

The application needed to stream data to the web server at a rate of about one serial port reading per second. Primarily for this reason, we decided to use the asynchronous Execute method. Also, we decided to use the HTTP POST method since many of our data strings were very, very long.

The Visual Basic Internet Transfer Control component does not seem to have any ability to deal with cookies; it could neither send them nor store them. It also lacked any ability to read or write headers. Since the ArsDigita Community System depends on cookies for user authentication, we needed to develop another security method. We decided to include user_id as one of the form variables. To prevent simple URL hacking, we also included a form variable that contained a secret code that needed to be correct in order for the data to be acceptable.

In some cases, we needed to pass data from the web server back to the VB application. The web pages were developed to return non-HTML text that could be easily parsed. Standard VB does not have much parsing capability, so we kept the text very simple (e.g., status and error codes). The web server does need to pass back HTTP headers, however (see also the header bug description below).

In one particular application, performance was an important issue. We wanted to be able to stream data at about one post per second. The initial version of the application was developed on a fast, lightly loaded computer with a high-speed Internet connection. Performance was not an issue on such a platform, but testing with slower machines on low-speed connections showed that Visual Basic was dropping data. We developed two independent solutions:

  1. concatenate several device readings into one POST operation, and
  2. have several distinct Internet Transfer Control components which we would rotate in a round-robin fashion.

Through testing, we empirically determined that we could get reliable performance by using three separate components, each posting three concatenated readings at a time.

The web server page that processed the POSTed data separated multiple device readings that had been posted together by splitting on line breaks. So when we needed to concatenate multiple readings, we used "%0a" to separate them (this is the URL-encoded form of a line feed, decimal 10).
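
The combination of batching and round-robin rotation can be sketched as follows. This is an illustration in portable C++11, not the VB code used in the project: the class name BatchRotator and its members are hypothetical, and a "sender" here is only an index that stands in for one Internet Transfer Control component.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Queues readings and, once batch_size readings are waiting, produces one
// POST payload and names the sender slot that should carry it.
// Both constructor arguments are assumed to be greater than zero.
class BatchRotator {
public:
    struct Post {
        std::size_t sender;  // index of the sender to use for this payload
        std::string body;    // readings, each followed by an encoded line feed
    };

    BatchRotator(std::size_t sender_count, std::size_t batch_size)
        : sender_count_(sender_count), batch_size_(batch_size), next_sender_(0) {}

    // Queue one reading. Returns true and fills `out` when a batch is ready.
    bool add(const std::string& reading, Post& out) {
        pending_.push_back(reading);
        if (pending_.size() < batch_size_) return false;
        out.sender = next_sender_;
        out.body.clear();
        for (const std::string& r : pending_) {
            out.body += r;
            out.body += "%0a";  // URL-encoded line feed, split on by the server
        }
        pending_.clear();
        next_sender_ = (next_sender_ + 1) % sender_count_;  // rotate senders
        return true;
    }

private:
    std::size_t sender_count_;
    std::size_t batch_size_;
    std::size_t next_sender_;
    std::vector<std::string> pending_;
};
```

With BatchRotator(3, 3), every third call to add() yields a payload, and successive payloads go to senders 0, 1, 2, 0, and so on, which mirrors the three-by-three configuration we settled on.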

Problems, Bugs and Quirks

Visual Basic can be a little tricky to program. Its Internet Transfer Control component has the feeling of a black box. Its limitations are not well documented, and in some cases we had to resort to packet sniffing to diagnose bugs and find limits.

Service Pack Required

The first main bug that we encountered was that VB5's Internet Transfer Control's HTTP POST method posts garbage rather than data. As it turns out, this was a known bug that can be fixed by a service pack (http://msdn.microsoft.com/vstudio/sp/vs97/). Don't even attempt to use VB5's Internet Transfer Control until you have applied this service pack.

Undocumented Limit to Number of Components on Form

As mentioned above, to help performance we decided to use several Internet Transfer Control components in a round-robin fashion. However, with as many as 10 control objects, data would be dropped silently. With four or more components on a form, the application would freeze under certain repeatable conditions. So far, use of three components has been reliable. We did not experiment to see if one could get more total components by spreading them out over several forms.

Execute POST is not really Asynchronous

According to VB5's "Books Online", the Execute POST method is supposed to be asynchronous:

The OpenURL method results in a synchronous transmission of data. In this context, synchronous means that the transfer operation occurs before any other procedures are executed. Thus the data transfer must be completed before any other code can be executed. On the other hand, the Execute method results in an asynchronous transmission. When the Execute method is invoked, the transfer operation occurs independently of other procedures. Thus, after invoking the Execute method, other code can execute while data is received in the background.

Unfortunately, MS's claim that "...other code can execute while data is received in the background" is not 100% true. We found that if you try to use the same control object again before the remote web server responds, then you get an error (35764, isExecuting).

There is a hint about this in VB's help for the control property StillExecuting. This says:

Returns a value that specifies if the Internet Transfer control is busy. The control will return True when it is engaged in an operation such as retrieving a file from the Internet. The control will not respond to other requests when it is busy.

So, if you want to post frequently, you either need to wait until your control has received data, or you must use multiple controls.

Visual Basic 5 OpenURL method is sensitive about headers

It turns out that VB5's Inet1.OpenURL(strURL$, icString) method (the "easy" way to do HTTP in Visual Basic) is very particular about how it gets data from AOLserver. VB fails if it hits a page that uses ns_return 200 text/html, but works if it hits a page that uses ReturnHeaders and ns_write.

The test page that we tried first used ns_return, and VB generated fatal error messages with it. We made a variation that was identical, except it used ReturnHeaders, and it worked fine!

The difference between the server output is very minor, as shown by the telnet session output below. The "buggy" ns_return version includes Date:, Server:, and Content-Length: headers that are not present in the ReturnHeaders version that works. It's a mystery why this would cause any problem; this bug is not described on the Microsoft web page.

-- works (uses ReturnHeaders and ns_write)

GET /test/login-abe2.tcl?email=foobar@arsdigita.com&password=vbsucks HTTP/1.0

HTTP/1.0 200 OK
MIME-Version: 1.0
Content-Type: text/html
Set-Cookie: last_visit=947803872; path=/; expires=Fri, 01-Jan-2010 01:00:00 GMT

ad: status=approved;  user_id=999;Connection closed by foreign host. 

-- fails with VB Error 13 - Type mismatch (uses ns_return 200 text/html)

GET /test/login.tcl?email=foobar@arsdigita.com&password=vbsucks HTTP/1.0

HTTP/1.0 200 OK 
MIME-Version: 1.0 
Date: Thu, 13 Jan 2000 22:52:22 GMT 
Server: NaviServer/2.0 AOLserver/2.3.3 
Content-Type: text/html 
Content-Length: 49 

ad: status=approved; user_id=999;Connection closed by foreign host.
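
A response body in the style shown in these transcripts ("ad: status=approved; user_id=999;") is simple enough to parse with basic string operations. The sketch below is in portable C++11 and is not the parse_data function from the VB application, whose source is not shown here. The names trim and parse_reply are our own, and the sketch assumes every field is terminated by a semicolon.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Remove leading and trailing whitespace.
std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Turn "ad: status=approved; user_id=999;" into a name -> value map.
// Text after the last semicolon is ignored.
std::map<std::string, std::string> parse_reply(const std::string& body) {
    std::map<std::string, std::string> fields;
    std::size_t pos = 0;
    if (body.compare(0, 3, "ad:") == 0) pos = 3;  // skip the leading marker
    while (pos < body.size()) {
        std::size_t semi = body.find(';', pos);
        if (semi == std::string::npos) break;
        std::string item = body.substr(pos, semi - pos);
        std::size_t eq = item.find('=');
        if (eq != std::string::npos) {
            fields[trim(item.substr(0, eq))] = trim(item.substr(eq + 1));
        }
        pos = semi + 1;
    }
    return fields;
}
```

Keeping the server output this regular is what makes a parser of a few lines sufficient on the client side.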

The Visual C++ Solution

Development Environment Features

Visual C++ 6.0 Standard (VC++) and its Microsoft Foundation Class Library (MFC) provide a very rich set of classes and functions for bringing Internet data transfer into desktop PC applications. The level of control, program expressiveness, documentation, reliability, debugging capabilities, and development environment far surpass what is available in Visual Basic. If you have any choice at all, definitely choose Visual C++ rather than Visual Basic.

VC++/MFC provides a wide variety of tools to work with TCP/IP and protocols such as HTTP and FTP. The remainder of this article is limited to using standard MFC features to implement HTTP data transfer.

Steps in a Typical HTTP Client Application

Here are the steps for doing HTTP from a desktop application based on MFC:

  1. Create a CInternetSession object on the stack.

  2. Use CInternetSession::GetHttpConnection() to allocate a CHttpConnection object on the heap.

  3. Open an HTTP request with CHttpConnection::OpenRequest(). This will allocate a CHttpFile object on the heap.

  4. Send a request along with the POST data using CHttpFile::SendRequest().

  5. Check on the data transfer status with CHttpFile::QueryInfoStatusCode().

  6. When the transfer status is HTTP_STATUS_OK, read the data returned by the web server with CHttpFile::ReadString().

  7. Clean up after yourself and do all of those wonderful C++ memory management tasks.
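
To make steps 3 and 4 more concrete, it can help to see roughly what a form POST looks like on the wire. The sketch below builds such a request as text in portable C++11. It is not MFC code and does not claim to reproduce the exact bytes that MFC or WinInet send; the function name build_post_request is ours, and the header set shown is an assumption about a minimal HTTP/1.0 form POST.

```cpp
#include <string>

// Build the text of a minimal HTTP/1.0 POST carrying URL-encoded form data.
// `page` is the path relative to the server root, e.g. "/test/deviceX/post.tcl".
std::string build_post_request(const std::string& host,
                               const std::string& page,
                               const std::string& form_data) {
    std::string req;
    req += "POST " + page + " HTTP/1.0\r\n";
    req += "Host: " + host + "\r\n";
    req += "Content-Type: application/x-www-form-urlencoded\r\n";
    req += "Content-Length: " + std::to_string(form_data.size()) + "\r\n";
    req += "\r\n";      // an empty line ends the headers
    req += form_data;   // the body is the form variable string
    return req;
}
```

The status line and headers that come back are what step 5 inspects, and the text after the blank line in the response is what step 6 reads.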

Design Approach

We decided to create a custom class named ad_inet that would encapsulate the complexity of the HTTP and memory management steps listed above. The design goal was to be able to post data with a simple call to ad_inet::post_data(). This is an overloaded public member function that takes either two or three arguments. In both cases, the first argument is the name of the web server page to post to (relative to page root), and the second argument is the data to post. The optional third argument is a reference to a string variable used for storing web server output. If this third argument is omitted, then web server output is ignored (but the data is still posted).

For example, user code could be as simple as this:

// create a string for posting 
CString data_to_post;
data_to_post =   "session_id="  + CDataLoggerDoc::m_ad_session_id;
data_to_post += "&session_key=" + CDataLoggerDoc::m_ad_session_key;
data_to_post += "&data="        + web_data_str;
data_to_post += "%0a";  // need that newline for the Tcl parsing, even with only one line

// create a posting object and post data (discarding web server output)
ad_inet inet_post_obj;
inet_post_obj.post_data("/test/deviceX/post.tcl",data_to_post);

The verbosely commented ad_inet class declaration is shown below; the source code to the class definition (http://www.arsdigita.com/asj/pc-data-collection-to-web/pc-data-collection-to-web-source) is also available. Note the suggestions for passing string arguments towards the end.

class ad_inet  {
private: 

        //////////////////////////////////////////////////////////
        // Private data members
        ///////////////////////////////////////////////////////////

        // We will need a null pointer to a CHttpConnection
        // Later, we will get a value for this pointer by passing 
        // a server name and port to CInternetSession::GetHttpConnection
        CHttpConnection* pServer;

        // declare variables for the remote web server name and port
        // (arguments for CInternetSession::GetHttpConnection)
        CString strServerName;
        INTERNET_PORT nPort;

        // We will need a pointer to a CHttpFile
        // Later, we will assign a value for this pointer 
        // by passing verb, page, and flags to CHttpConnection::OpenRequest
        CHttpFile* pFile;

        // This flag is another one of the arguments
        // for CHttpConnection::OpenRequest
        DWORD dwHttpRequestFlags;

        // declare variable to obtain HTTP result status that
        // will be filled by CHttpFile::QueryInfoStatusCode
        DWORD dwRet;

        // declare a variable to hold our string data that
        // we will be POSTing with CHttpFile::SendRequest
        CString post_data_string;

        // we declare a boolean flag that will be used by 
        // ad_inet::prv_post() to determine whether or not the
        // caller wants the string that is returned by the web
        // server
        //
        // a code reviewer pointed out that this could instead 
        // be an argument to ad_inet::prv_post(), rather than this member
        // but it's no big deal
        BOOLEAN want_return_string;


        //////////////////////////////////////////////////////////
        // Private member functions
        //////////////////////////////////////////////////////////

        // this private function will be called by both versions of 
        // the public functions ad_inet::post_data(). This private function
        // contains the ugly nuts and bolts of the posting process. Each
        // version of the public function ad_inet::post_data() is therefore
        // greatly simplified: basically all they do is set a flag that 
        // indicates whether or not they want the return string, and then
        // they call this single, private function.
        void prv_post(LPCTSTR page_from_root
                                , LPCTSTR data_to_post
                                , CString& string_from_web);

public:

        ////////////////////////////////////////////////////////////////////
        //
        // constructor and destructor - nothing fancy (yet)
        ad_inet();
        virtual ~ad_inet();
        //
        ///////////////////////////////////////////////////////////////////

        // MS Help Topic "Strings: CString Argument Passing" sez:
        // http://msdn.microsoft.com/isapi/msdnlib.idc?
        //  theURL=/library/devprods/vs6/visualc/vcmfc/_mfc_cinternetsession.htm
        //
        //
        // CString Argument-Passing Conventions
        // When you define a class interface, you must determine the 
        // argument-passing convention for your member functions. There 
        // are some standard rules for passing and returning CString 
        // objects. If you follow the rules described in Strings as 
        // Function Inputs and Strings as Function Outputs, you will 
        // have efficient, correct code.
        // 
        // Strings as Function Inputs
        // If a string is an input to a function, in most cases it 
        // is best to declare the string function parameter as LPCTSTR. 
        // Convert to a CString object as necessary within the function 
        // using constructors and assignment operators. If the string contents 
        // are to be changed by a function, declare the parameter as a nonconstant 
        // CString reference (CString&).
        //
        // Strings as Function Outputs
        // Normally you can return CString objects from functions because 
        // CString objects follow value semantics like primitive types. 
        // To return a read-only string, use a constant CString reference 
        // (const CString&). The following example ...

        // post data and get string: 
        // page_from_root and data_to_post 
        // are constant strings (LPCTSTR), 
        // while string_from_web will be 
        // modified (CString&)
        void post_data(LPCTSTR page_from_root
                                 , LPCTSTR data_to_post
                                 , CString& string_from_web);

        // post data only (do not care about returned string) 
        // page_from_root and data_to_post 
        // are constant strings (LPCTSTR)
        void post_data(LPCTSTR page_from_root
                                 , LPCTSTR data_to_post);


};
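
The delegation pattern described in the comments above (two public overloads that forward to one private worker) can be sketched without MFC. The example below uses portable C++11 with std::string in place of CString. The class name PostClient and the Transport callback are hypothetical stand-ins for the real HTTP exchange, and the "want the reply" choice is passed as an argument, as the code reviewer suggested, rather than stored in a member flag.

```cpp
#include <functional>
#include <string>
#include <utility>

class PostClient {
public:
    // Stand-in for the real HTTP exchange: receives the page and the data,
    // returns whatever the server sent back.
    typedef std::function<std::string(const std::string&, const std::string&)>
        Transport;

    explicit PostClient(Transport transport) : transport_(std::move(transport)) {}

    // Post data and keep the server output.
    void post_data(const std::string& page, const std::string& data,
                   std::string& reply) {
        prv_post(page, data, &reply);
    }

    // Post data and discard the server output.
    void post_data(const std::string& page, const std::string& data) {
        prv_post(page, data, nullptr);
    }

private:
    // Single worker shared by both overloads. A null `reply` means the
    // caller does not want the server output.
    void prv_post(const std::string& page, const std::string& data,
                  std::string* reply) {
        std::string result = transport_(page, data);
        if (reply != nullptr) *reply = result;
    }

    Transport transport_;
};
```

Injecting the transport also makes the posting logic easy to exercise without a network connection, which was not a goal of the original class.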

Problems, Bugs and Quirks

VC++/MFC was surprisingly easy to work with, especially compared to VB. In general, the development environment behaved as expected, and took little time to learn.

The MFC Internet classes used here throw CInternetException exceptions, so you definitely want to catch these. It seemed like a good idea to code our own exception classes. For example, the Internet connection and web server might be operating perfectly, but you might post some data that the logic of the web server decides is in some way erroneous (e.g., attempted login with an invalid password). For such conditions, we programmed the web server to return plain text status codes and error messages. Ideally, you would be able to write custom exception classes that would interpret such web server messages and let the C++ client program deal with them in accordance with conventional C++ exception handling techniques. However, the Microsoft documentation was either wrong, incomplete, or impenetrable, since we could not get custom C++ exceptions to work as advertised within the scope of our projects.
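
For illustration, the kind of custom exception we had in mind might look like the following in standard C++11. This sketch is independent of MFC, so it does not show how to make such a class coexist with CInternetException inside an MFC project, which is where we ran into trouble. The names ServerStatusError and check_reply are hypothetical; "approved" is the status value seen in the transcripts earlier in this article, and any other value is treated as a failure.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Thrown when the HTTP exchange worked but the server's plain-text reply
// reports an application-level problem (for example, a rejected login).
class ServerStatusError : public std::runtime_error {
public:
    explicit ServerStatusError(const std::string& status)
        : std::runtime_error("server reported status: " + status),
          status_(status) {}
    const std::string& status() const { return status_; }

private:
    std::string status_;
};

// Find "status=<value>;" in the reply and throw unless the value is "approved".
void check_reply(const std::string& body) {
    const std::string key = "status=";
    std::size_t start = body.find(key);
    if (start == std::string::npos) throw ServerStatusError("missing");
    start += key.size();
    std::size_t end = body.find(';', start);
    std::string status = (end == std::string::npos)
                             ? body.substr(start)
                             : body.substr(start, end - start);
    if (status != "approved") throw ServerStatusError(status);
}
```

A caller would wrap the post in a try block and catch ServerStatusError separately from transport-level exceptions, so that a bad password and a dropped connection can be reported differently.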

The VC++/MFC HTTP classes met our needs in this project. However, others have found them lacking in certain respects. In particular, a well-known and highly regarded book on Visual C++ states:

...MFC developers informed us that the CAsyncSocket and CSocket classes were not appropriate for synchronous programming. The Visual C++ online help says you can use CSocket for synchronous programming, but if you look at the source code, you'll see some ugly message-based code left over from Win16.
(David Kruglinski et al., Programming Microsoft Visual C++ Fifth Edition, Microsoft Press, 1998, ISBN 1-57231-857-0)

This book is essential for serious work with VC++/MFC.

Performance of the MFC classes was acceptable in our experience, even with streaming data. However, if the PC application needs near real-time capabilities, you should plan on profiling and optimizing the code. For example, to obtain better performance, one client took ArsDigita code that used MFC CString classes and replaced them with standard C-style string buffers to avoid the overhead of the MFC buffer management.

General Integration Issues

Existing application code

It is usually a challenge to graft HTTP POST capabilities into an existing application. The data that is to be sent to the web server may exist in several locations inside the PC application. For example, in one project, the existing PC application wrote data to a text file, and the application needed to send each line of the file to the web server as soon as it was written. However, the application was originally designed to write many small chunks of data to the file, one at a time, eventually followed by writing a line terminator to the file. Each write to the file was separated from other writes by many lines of application code. Since we needed to be able to send the full line of data to the server in a single atomic operation, we had to develop a way to gather up and concatenate these small chunks of data wherever they were generated, then finally send the line to the web server when it was fully assembled. I suspect that this type of problem would be fairly common, unless the PC application was designed for this from the beginning.
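
One way to gather the fragments is a small buffering object that is called wherever the application writes to the file and that hands over a line only when the terminator arrives. The sketch below shows the idea in portable C++11. It is not the code from the project: the class name LineAssembler and the Sender callback are hypothetical, and the line terminator is assumed to be a single '\n'.

```cpp
#include <functional>
#include <string>
#include <utility>

// Accumulates small writes and calls `sender` once per complete line.
class LineAssembler {
public:
    typedef std::function<void(const std::string&)> Sender;

    explicit LineAssembler(Sender sender) : sender_(std::move(sender)) {}

    // Call this next to each existing file write, with the same text.
    void write(const std::string& chunk) {
        for (char c : chunk) {
            if (c == '\n') {
                sender_(buffer_);  // full line assembled: send it in one call
                buffer_.clear();
            } else {
                buffer_ += c;
            }
        }
    }

private:
    Sender sender_;
    std::string buffer_;
};
```

The sender callback is where the HTTP post would go, so the scattered write sites never need to know about the network.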

Visual Basic Option Explicit

By default, Visual Basic does not require variables to be declared in advance before they are used. Many VB programs are developed in this fashion, because it is thought by some to be easier. However, this default behavior usually leads to bugs that are hard to diagnose, since a typo in a variable name will cause a new variable to be created. Such typos are hard to spot visually. Most experienced VB programmers know that you can make VB require variable declaration (i.e., the Dim statement) by putting Option Explicit at the top of each module. The time saved by avoiding variable name typo bugs more than makes up for the small amount of extra time it takes to plan and declare your variables. Unfortunately, after an application has been written without variable declarations, it is very time consuming to suddenly turn on Option Explicit and mop up after the huge number of errors (undeclared variables) that are found. This problem is common, especially when trying to add HTTP POST capabilities to an existing application.

Conclusion

Both Visual Basic and Visual C++ offer the ability to interface a web service with PC-attached data collection hardware. Of the two programming environments, Visual C++ was easier to work with, had more ability, and was more reliable. The two environments require completely different programming techniques to accomplish the data transfer. It is our hope that the tips, advice, and warnings presented in this article will make your project more efficient.

Many ideas presented themselves during the course of these projects that we hope to be able to explore later. For example, one potential solution to the problem of data being dropped by the Visual Basic application under high load was to spool the data in a filesystem buffer. Another separate process could read this spooled data and send it to the web server as needed. This approach has the potential to increase reliability, but project scope did not allow time to implement a spool. Another idea was to use XML to structure the transferred data: standard XML parsers could then be used to extract the data, rather than developing custom parsers for each situation.
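
As a rough illustration of the spooling idea, which we did not implement, the sketch below appends records to a spool file and later drains them through a send callback. It is portable C++11 with hypothetical names (spool_append, spool_drain). It assumes one record per line and does no file locking, so a real two-process design would need more care. If a send fails, the file is left intact and a later drain starts over, so the server would have to tolerate duplicate records.

```cpp
#include <cstddef>
#include <fstream>
#include <functional>
#include <string>

// Producer side: append one record as one line of the spool file.
bool spool_append(const std::string& path, const std::string& record) {
    std::ofstream out(path, std::ios::app);
    if (!out) return false;
    out << record << '\n';
    return static_cast<bool>(out);
}

// Consumer side: pass each spooled record to `send`. When every record was
// sent, empty the file. Returns the number of records sent in this call.
std::size_t spool_drain(const std::string& path,
                        const std::function<bool(const std::string&)>& send) {
    std::ifstream in(path);
    std::string line;
    std::size_t sent = 0;
    while (std::getline(in, line)) {
        if (!send(line)) return sent;  // keep the spool; retry on a later drain
        ++sent;
    }
    in.close();
    std::ofstream truncate(path, std::ios::trunc);  // all sent: clear the spool
    return sent;
}
```

The data collection code would call spool_append() for each reading, while a separate loop calls spool_drain() whenever the connection is available.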


asj-editors@arsdigita.com

Reader's Comments

I was faced with doing something similar. The reason to "web-enable" the data-acquisition was to allow the researchers to remotely check the status of their experiment. The experiments run for 1-5 days. I wanted the design to be minimally invasive on the existing data-acquisition control program (Visual C++/MFC). My design was as follows: 1. When the remote user asks for status, pull the data from the experimental PC, and present it. There was no need to continuously send the data to the web server (in my case). 2. When the user wants to interact with the machine (i.e., change parameters or shut down), use socket programming to pass commands to the data-acquisition control software. A socket listener was added to the control software.

-- sanjeev mohindra, September 4, 2000

There is a new device from Dallas Semiconductor that seems like it could solve this entire class of problems:

http://www.ibutton.com/TINI/index.html

Has anybody used it? It has various serial and i2c interfaces, a 10-base-t interface, a java vm and a tcp/ip stack. And it costs $50. Of course there are some instruments (you know who you are) that require a pc parallel port and are excluded, unless the parallel interface is so simple or well documented that it could be reimplemented with basic digital i/o lines, which I think TINI has.



-- Bart Addis, September 21, 2000

Related Links

  • Java serial and parallel port support- The javax.comm API provides serial and parallel port support. I have seen drivers for the PC (windows) and Sun hardware. It seems like this would be a good portable way to gateway from a PC data collection device that talks to the serial or parallel port to an HTTP connection, using Java's HTTP protocol capabilities. In the case where the device has some other hardware-specific I/O, such as a PCI card with a C library driver, then using Java is more difficult, although it does support a foreign function interface.    (contributed by Henry Minsky)

  • A summary on serial communication using the TTY protocol- is a great explanation of serial ports and serial communication and programming software which communicates using such channels... Before USB takes over the world, serial and parallel wires will remain an enduring channel for data-capturing devices seeking to communicate with PCs. If you are confused about bauds and bits or even parity and which wires to use or talk to on serial devices (UNIX or Win32 or Mac)--you'll find this document invaluable. This resource is highly rated in the online network programming and electronic engineering community for its succinct explanations. Hope this helps!   (contributed by Li-fan Chen)
