How to process 'online'...

Dave Kinchlea
Joined: 2009-04-22

Processing online can be a difficult nut to crack. When using a web browser you are faced with both power and problems to overcome. The power comes from JavaScript: with its aid, a web browser is a very capable tool that can display, process, and produce content very effectively. The problem is that a web browser works within a stateless environment (HTTP), which makes it difficult to keep state between requests and to present feedback to users.

Some of the difficulties are:

  • Providing feedback to the user about the progress of a long-running, server-based job
  • Keeping the connection open between a web browser and a web server while the server works on a long-running job
  • Maintaining a consistent look & feel across a number of browsers
  • Dealing with browsers that have JavaScript disabled

The first two are related and handled similarly. The fact is that HTTP does not allow the server to contact the client; the client is ALWAYS in control, telling the server what to do. The best you can hope for is to have the server tell the client how to interact with it next, and hope that the client does so. In plain English, the work the server must perform has to be split into chunks, and the client/browser must tell the server to work on the next (or a particular) chunk. You see this a lot in Livelink, where bulk operations are performed a "page" at a time.
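A minimal sketch of the chunking idea, with the server and the client simulated in one Python process. The names `ChunkedJob`, `do_next_chunk` and `client_drive` are hypothetical, and doubling each item merely stands in for real work. In a real deployment the job's position would have to be stored somewhere every web server can see, since each request may land on a different one.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkedJob:
    """Hypothetical server-side job whose work is split into fixed-size chunks."""
    items: list
    chunk_size: int = 100
    position: int = 0                      # index of the next unprocessed item
    results: list = field(default_factory=list)

    def do_next_chunk(self) -> dict:
        """Process one chunk and return a status the client can act on."""
        end = min(self.position + self.chunk_size, len(self.items))
        for item in self.items[self.position:end]:
            self.results.append(item * 2)  # stand-in for the real per-item work
        self.position = end
        return {"done": self.position >= len(self.items),
                "processed": self.position,
                "total": len(self.items)}


def client_drive(job: ChunkedJob) -> int:
    """Simulates the browser: keep asking for the next chunk until the job
    reports it is done. Returns how many requests were needed."""
    requests = 0
    while True:
        status = job.do_next_chunk()
        requests += 1
        if status["done"]:
            return requests
```

For example, a 250-item job with `chunk_size=100` takes three requests; the server never acts unless the client asks.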

The act of providing feedback, that is, making the client ask the server for status (or, in fact, to actually do the next chunk of work), serves to keep the connection open, but how this is accomplished has an effect on the service. In Livelink's case, the web server providing the CGI/ISAPI/SERVLET front end does practically zero work; it is simply a communication conduit between Livelink and a web browser. When long-running jobs occur (over 5 minutes without interaction with a web browser), Livelink has a good chance of seeing the connection between the web server and web browser unceremoniously dropped. What is sometimes worse, it is often not possible for a web browser to stop a long-running Livelink command: hitting "STOP" only drops that connection, it doesn't tell Livelink to do anything.

Contrast this with a PHP-based content management system like Drupal, where the web server does real work on its own and is fully aware of the connection with the web browser. It still cannot "push" information to the client, but it can definitely notice when the client connection drops and stop processing when that occurs. Using JSON (JavaScript Object Notation) requests issued from JavaScript, it is possible to have the browser participate in the processing and provide feedback along the way. Thus in our own Usage Profile Dataset Generation page, the processing progress bar not only provides feedback, it actively directs the work.
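The "progress bar directs the work" pattern can be sketched as a JSON request/reply exchange. This is a hypothetical illustration, not the actual Usage Profile Dataset Generation code: `handle_step` plays the server endpoint, `run_progress_bar` plays the page's JavaScript loop, and `TOTAL_ITEMS` and `CHUNK` are made-up sizes. Because the client carries the offset and asks for each chunk, stopping is simply a matter of not sending the next request.

```python
import json

TOTAL_ITEMS = 1000  # hypothetical size of the whole job
CHUNK = 250         # hypothetical number of items handled per request


def handle_step(request_body: str) -> str:
    """Hypothetical server endpoint: do one chunk, starting where the client says."""
    req = json.loads(request_body)
    start = req["offset"]
    end = min(start + CHUNK, TOTAL_ITEMS)
    # ... the real work for items start..end-1 would happen here ...
    return json.dumps({"offset": end,
                       "percent": round(100 * end / TOTAL_ITEMS),
                       "done": end >= TOTAL_ITEMS})


def run_progress_bar(stop_after=None):
    """Simulates the browser-side loop and returns the percentages it displayed.
    The client holds the offset, so 'STOP' just means no further requests."""
    offset, shown = 0, []
    while True:
        if stop_after is not None and len(shown) >= stop_after:
            break                        # user pressed STOP: no more work is requested
        reply = json.loads(handle_step(json.dumps({"offset": offset})))
        shown.append(reply["percent"])   # a real page would redraw the bar here
        offset = reply["offset"]
        if reply["done"]:
            break
    return shown
```

Called with no arguments it displays 25, 50, 75 and 100 percent; `run_progress_bar(stop_after=2)` stops at 50 percent and the server does no further work.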

There are many advantages to a JSON-driven approach: jobs can be stopped, restarted, and monitored using nothing but a web browser, and the jobs themselves share server resources fairly, since each chunk is just another short request. But it isn't all good, as I recently found out when adding gzip support to my file uploads. My design uses JSON requests to have the server process the next chunk of data, but of course each request could easily be directed to a different web server than the previous one was, so the input file must be closed and reopened for each "chunk". The code keeps track of file offsets, performs an fseek() to the appropriate spot, and continues processing from there.
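A sketch of the reopen-and-seek pattern, written in Python rather than PHP; `process_chunk` and the line-based unit of work are assumptions. The file is opened in binary mode so that `tell()` and `seek()` deal in plain byte offsets that can be handed back to the client and sent again with the next request.

```python
def process_chunk(path: str, offset: int, max_lines: int = 1000):
    """Open the file fresh (the previous request may have run on another
    server), seek to the saved offset, process up to max_lines lines, and
    return (lines_processed, new_offset) for the client to send back next time."""
    count = 0
    with open(path, "rb") as f:  # binary mode: tell()/seek() are plain byte offsets
        f.seek(offset)           # cheap on an uncompressed file
        while count < max_lines:
            line = f.readline()
            if not line:         # end of file
                break
            # ... per-line work goes here ...
            count += 1
        return count, f.tell()
```

A driver calls it repeatedly, feeding each returned offset into the next call, until it reports zero lines processed.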

All works great until gzipped content is used; then the same logic kills performance, because each fseek() requires all the preceding content to be decompressed, so the longer/larger the input, the greater the delay between chunks. This problem would never happen to Livelink, as it uses persistent connections for such work; it never has to think about the data being chunked or about losing a file descriptor.
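To see why gzip hurts, here is the same pattern on a compressed file, again as a hypothetical Python sketch. Python's `gzip` module, like gzip itself, offers no random access: a forward `seek()` on a freshly opened file decompresses and discards everything before the target, and the offsets are positions in the uncompressed data. `bytes_skipped` is a rough, made-up cost model that sums those discarded bytes over a whole job.

```python
import gzip


def process_gz_chunk(path: str, offset: int, max_lines: int = 1000):
    """Same reopen-and-seek pattern on a gzip file. `offset` is a position in
    the uncompressed stream; seek() must decompress everything before it."""
    count = 0
    with gzip.open(path, "rb") as f:
        f.seek(offset)           # cost grows with offset: decompress and discard
        while count < max_lines:
            line = f.readline()
            if not line:
                break
            # ... per-line work goes here ...
            count += 1
        return count, f.tell()   # uncompressed position to send back to the client


def bytes_skipped(total_size: int, chunk_bytes: int) -> int:
    """Rough cost model: uncompressed bytes decompressed only to reach each
    chunk's starting offset, summed over every chunk of the job."""
    return sum(range(0, total_size, chunk_bytes))
```

With this model a 10,000-byte input in 1,000-byte chunks discards 45,000 bytes, while doubling the input to 20,000 bytes discards 190,000: roughly quadratic growth, which fits the ever-longer delays between chunks.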

Which approach is better? Six of one, half a dozen of the other, if you ask me. They behave differently and can create different performance/service problems, but both sets of problems are well understood and can be worked around. For instance, we've put in place an automated "unzipper" batch utility, so we never run into the gzip problem. There are some problems that still elude Livelink, however, and one is stopping a transaction once it has started.
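One way such an "unzipper" could work, sketched in Python under the assumption that it simply decompresses each upload once to a plain file before chunked processing begins (the real utility isn't described in detail here). After this one pass, every per-chunk seek is cheap again. `unzip_once` is a hypothetical name.

```python
import gzip
import shutil
from pathlib import Path


def unzip_once(gz_path: str) -> str:
    """Decompress gz_path to a sibling file (dropping a trailing .gz suffix)
    and return the new path. Streams the data, so memory use stays small."""
    src = Path(gz_path)
    if src.suffix == ".gz":
        dest = src.with_suffix("")            # data.csv.gz -> data.csv
    else:
        dest = src.with_name(src.name + ".out")
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)         # single sequential pass over the input
    return str(dest)
```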