Transaction Anatomy

Incoming

This document describes what happens when an HTTP request is made to Webware. Most of this process is internal to Webware, but knowing it can help you understand what's going on even if you never modify any of that code.

The First Connection

The process begins when a browser asks the web server for a page. The browser tells the server which page it wants, and passes along any cookies that are marked for the server and any form variables (GET or POST).

Because different web servers are supported, and each server offers several ways to interact with Webware, a variety of adapters may handle the request at this point. With the exception of OneShot, they generally package up the request and send it over a socket to the AppServer, which has been started ahead of time and is waiting to respond.
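To make the hand-off concrete, here is a minimal Python sketch of what an adapter has to assemble before it talks to the AppServer. The function name make_request_dict is hypothetical, not part of Webware; the four keys follow the request dictionary described at the end of this document, and exactly what goes into 'input' is an assumption here.

```python
import time


def make_request_dict(environ, raw_input):
    """Hypothetical adapter helper: bundle one CGI request into simple values."""
    return {
        'format': 'CGI',           # the only format value currently allowed
        'time': time.time(),       # seconds since the Unix epoch
        'environ': dict(environ),  # copy of the CGI environment variables
        'input': raw_input,        # raw request data the adapter read (assumed)
    }
```

An adapter would call something like make_request_dict(os.environ, sys.stdin.read()) and then marshal the result before writing it to the socket. Everything in the dictionary is a plain string, number, or dictionary, which is what makes it marshallable.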

AppServer

The AppServer is generally an instance of the subclass AsyncThreadedAppServer; I will refer to it simply as AppServer.

The AppServer listens for requests from an Adapter (note 1). When it receives a request, it puts it in a queue, and the next available thread handles it. A fixed number of threads is started on launch; if that pool is exhausted, the request blocks until another request has finished. (@@ Correct?)
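The queue-plus-fixed-pool arrangement can be sketched with Python's queue and threading modules. This is a generic worker-pool illustration, not AsyncThreadedAppServer's code; serve and handle are hypothetical names. In this sketch, surplus requests wait in the queue rather than being refused, so it does not settle the open question above about how Webware itself behaves.

```python
import queue
import threading


def serve(requests, handle, num_threads=4):
    """Hand each request to a fixed pool of worker threads via a shared queue."""
    requests = list(requests)
    jobs = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = jobs.get()
            if item is None:          # sentinel: no more work for this thread
                return
            index, request = item
            response = handle(request)
            with lock:
                results[index] = response

    # The pool size is fixed up front, as with the AppServer's threads.
    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for item in enumerate(requests):
        jobs.put(item)                # waits in the queue until a thread is free
    for _ in threads:
        jobs.put(None)                # one sentinel per worker, after all requests
    for t in threads:
        t.join()
    return [results[i] for i in range(len(requests))]
```

Because every request is queued before any sentinel, all requests are handled before the workers exit.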

The thread (an instance of the RequestHandler class) waits until it has read all the data (in RequestHandler.handle_read), placing the data in RequestHandler.reqdata. Then RequestHandler.handleRequest is called, and the request that was passed over the socket is unmarshalled (note 2). If the request was not properly packaged, you will get a marshalling error here; that is what happens when you try to connect to the AppServer directly from your browser.
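The unmarshalling step, and why a browser talking directly to the AppServer fails, can be sketched with Python's standard marshal module. This assumes the adapter marshals the request dictionary with marshal.dumps; unmarshal_request and RequestError are hypothetical names, not Webware's.

```python
import marshal


class RequestError(Exception):
    """Raised when the bytes read from the socket are not a marshalled request."""


def unmarshal_request(reqdata):
    """Turn the accumulated socket data back into the request dictionary."""
    try:
        request = marshal.loads(reqdata)
    except (ValueError, EOFError, TypeError) as exc:
        # Raw HTTP from a browser is not valid marshal data.
        raise RequestError('request was not properly packaged: %s' % exc) from exc
    if not isinstance(request, dict):
        raise RequestError('expected a dictionary, got %s' % type(request).__name__)
    return request
```

Raw HTTP text such as b'GET / HTTP/1.0\r\n\r\n' is not valid marshal data, so it ends in RequestError, which mirrors the situation described above.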

RequestHandler handles STATUS and QUIT methods directly. (@@ how would these requests be made?) All other requests are handled by Application: AppServer keeps an instance of Application, and calls Application.dispatchRawRequest with the unmarshalled request (note 3).

OneShot

When you connect through OneShot.cgi, you go through largely the same process, except that nothing persists between requests: the OneShot adapter starts a new AppServer (OneShotAppServer) for each request. This is inefficient, but at times convenient.

Application

Creating a Request

Application.dispatchRawRequest takes the dictionary that was passed over the socket and creates an HTTPRequest from it. HTTPRequest.__init__ parses the dictionary: it extracts fields, cookies, and some internal values that Application uses. (For instance, the fields arrive as a raw, URL-encoded string and are converted to a dictionary-like object.)
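A rough Python equivalent of the field and cookie parsing, using only the standard library (urllib.parse.parse_qs and http.cookies.SimpleCookie). HTTPRequest's own parsing code may well differ; parse_fields and parse_cookies are hypothetical names, and collapsing single-valued fields to plain strings is a choice made for this sketch.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qs


def parse_fields(query_string):
    """Decode a URL-encoded string such as QUERY_STRING into a dict."""
    # parse_qs undoes %XX escapes and '+', and groups repeated names into lists.
    fields = parse_qs(query_string, keep_blank_values=True)
    # Single values become plain strings; repeated names stay as lists.
    return {name: values[0] if len(values) == 1 else values
            for name, values in fields.items()}


def parse_cookies(cookie_header):
    """Decode an HTTP_COOKIE header value into a name -> value dict."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    return {name: morsel.value for name, morsel in cookie.items()}
```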

Creating a Transaction

After the HTTPRequest is created, it is passed to Application.dispatchRequest, which creates a Transaction with Application.createTransactionForRequest. Transaction simply acts as a container for the various pieces of a transaction (request, response, session, servlet, and application) and passes messages to them by calling their methods. Otherwise, Transaction is stateless and has no logic of its own. It is not the parent of these objects; in particular, the session, servlet, and application will typically outlive the transaction.
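The container role can be pictured as a plain Python dataclass. This is an illustration, not Webware's Transaction class: it only shows that a transaction holds references to longer-lived objects without owning them.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Transaction:
    """Holds references to the pieces of one request/response cycle."""
    application: Any                  # shared by every transaction
    request: Any                      # created fresh for this request
    response: Optional[Any] = None    # filled in once the response exists
    session: Optional[Any] = None     # usually outlives this transaction
    servlet: Optional[Any] = None     # may be reused by later transactions
```

Two transactions built around the same application object simply point at it; discarding either transaction leaves the application untouched.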

Creating a Response

A response is created with Application.createResponseInTransaction. An HTTPResponse is created, and again Transaction acts as a container.

Finding the Servlet file

The Application asks for HTTPRequest.serverSidePath, which in turn calls Application.serverSidePathForRequest. This then tries to find the Servlet that corresponds to the URI asked for.

Consider an example URI:

http://www.server.com/cgi-bin/OneShot.cgi/Welcome

Application keeps a cache of URLs and their matching files. If a cached filename matches, it is used. Otherwise:

1. Remove the portion that relates to the adapter (/cgi-bin/OneShot.cgi).
2. Inspect the first portion of the remaining path (/Welcome):
   - Does it match a context? (Contexts are listed in WebKit/Configs/Application.config.)
   - If so, look in that context. If not, treat it as being in the default context (also defined in Application.config). In our example, /Welcome doesn't match a context, so it is treated as being in the default context (Examples).
3. Look in the directory given by that context's entry in Application.config. This directory is relative to the location of Application.py, i.e., the Webware/WebKit directory. In our example, that is Webware/WebKit/Examples.
4. Follow the path until you reach a file. In our simple example, all that's left of the path is /Welcome.
5. The file can have any extension. (@@ I'm not really sure how this works)
6. If ExtraPathInfo is set to 1 in Application.config, anything left over in the path is available to your servlet through request.extraURLPath(). (@@ oh, I can't remember where; there also appears to be some sort of attempt to match this remaining path information to a file)
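The lookup steps above can be sketched in Python as follows. PathResolver is hypothetical, not Webware's code. The "any extension" rule used here (take the first file, in sorted order, whose name without its extension matches) is an assumption, since the text leaves that open, and contexts are given as ready-made directory paths rather than paths relative to Application.py. The sketch always returns leftover path segments; in Webware they are only exposed when ExtraPathInfo is enabled.

```python
import os


class PathResolver:
    """Hypothetical sketch of the URL-to-file lookup described above."""

    def __init__(self, contexts, default_context, adapter_prefix):
        self.contexts = contexts                # context name -> directory path
        self.default_context = default_context  # e.g. 'Examples'
        self.adapter_prefix = adapter_prefix    # e.g. '/cgi-bin/OneShot.cgi'
        self.cache = {}                         # URL path -> (file, extra path)

    def resolve(self, url_path):
        """Return (filename, extra_path), (directory, '') or None if not found."""
        if url_path in self.cache:
            return self.cache[url_path]
        path = url_path
        if path.startswith(self.adapter_prefix):
            path = path[len(self.adapter_prefix):]   # step 1: drop the adapter
        parts = [p for p in path.split('/') if p]
        if parts and parts[0] in self.contexts:      # step 2: explicit context?
            current = self.contexts[parts.pop(0)]
        else:
            current = self.contexts[self.default_context]
        while parts:                                  # step 4: walk the path
            candidate = os.path.join(current, parts.pop(0))
            if os.path.isdir(candidate):
                current = candidate
                continue
            if not os.path.isfile(candidate):
                # Step 5 (assumed rule): accept the same name with any extension.
                stem = os.path.basename(candidate)
                matches = sorted(f for f in os.listdir(current)
                                 if os.path.splitext(f)[0] == stem
                                 and os.path.isfile(os.path.join(current, f)))
                if not matches:
                    return None                       # nothing matched: a 404
                candidate = os.path.join(current, matches[0])
            # Step 6: whatever is left over becomes the extra path info.
            extra = ('/' + '/'.join(parts)) if parts else ''
            self.cache[url_path] = (candidate, extra)
            return candidate, extra
        return current, ''                            # the URL named a directory
```

For the example URI, resolve('/cgi-bin/OneShot.cgi/Welcome') would find Welcome.py in the Examples directory, and anything after /Welcome would come back as extra path info.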

With the filename of the servlet, Application continues.

Dispatching on the Result of serverSidePath

When Application.dispatchRequest gets the resulting serverSidePath, it calls one of several methods:

- If the result is None, the page was not found: Application.handleBadURL, which returns a 404 (Not Found) message.
- If the result is a directory but the URL didn't end in a slash: Application.handleDeficientDirectoryURL, which redirects to the same location with a "/" appended to the URL.
- If the session ID is invalid (it doesn't exist or has timed out, as determined by Application.isSessionIdProblematic): Application.handleInvalidSession, which creates a new session ID, sets the cookie, and passes control to Application.handleGoodURL.
- Otherwise (all is well): Application.handleGoodURL.
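The decision order can be written as a small function that returns the name of the Application method to call. This is a sketch: choose_handler and its session_ok flag are hypothetical stand-ins for the real serverSidePath result and the check made by Application.isSessionIdProblematic.

```python
import os


def choose_handler(server_side_path, url, session_ok):
    """Return the Application method the dispatch rules above would pick."""
    if server_side_path is None:
        return 'handleBadURL'                 # nothing found: 404
    if os.path.isdir(server_side_path) and not url.endswith('/'):
        return 'handleDeficientDirectoryURL'  # redirect to url + '/'
    if not session_ok:
        return 'handleInvalidSession'         # new session, then handleGoodURL
    return 'handleGoodURL'
```

The order matters: a missing page is reported before any session check, so a stale session cookie never hides a 404.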

Creating a Servlet

Application.handleGoodURL calls Application.createServletInTransaction. Like the path lookup, this method first looks for a cached Servlet. If one is found, it compares the cache's timestamp with that of the source file, invalidating the cache entry if the file has changed.

If a cached Servlet wasn't found, or the cache was invalidated, it creates a new cache entry for the Servlet (note 4).

Application.getServlet is what actually creates the Servlet. The cache keeps a queue of available instances of the Servlet, which are reused when possible. (@@ what's up with the factories here? I know what they do, but not how they get called)
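A sketch of the cache behaviour described above: one entry per servlet file, invalidated when the file's modification time moves forward, holding a pool of idle instances. ServletCache, the factory callable, and the (servlet, entry) return value are assumptions for illustration; they are not Webware's API, and the factory merely stands in for whatever servlet factory Webware actually uses.

```python
import os


class ServletCache:
    """Sketch of a servlet cache keyed by file path, with instance reuse."""

    def __init__(self, factory):
        self.factory = factory  # hypothetical: callable(path) -> new servlet
        self.entries = {}       # path -> {'mtime': float, 'instances': [...]}

    def get_servlet(self, path):
        """Return (servlet, entry); hand the entry back to return_instance."""
        mtime = os.path.getmtime(path)
        entry = self.entries.get(path)
        if entry is None or entry['mtime'] < mtime:
            # Missing or stale (the source file changed): start a fresh entry.
            entry = {'mtime': mtime, 'instances': []}
            self.entries[path] = entry
        if entry['instances']:
            return entry['instances'].pop(), entry   # reuse an idle instance
        return self.factory(path), entry

    def return_instance(self, entry, servlet):
        # An instance from a replaced (stale) entry lands in that old list,
        # which the cache no longer consults, so it is never handed out again.
        entry['instances'].append(servlet)
```

Passing the entry back, rather than the path, keeps an instance built from an old version of the file from being reused after the file changes.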

Waking the Transaction

Once the Transaction has a Servlet to work with, Application.handleGoodURL calls Transaction.awake, Transaction.respond, and finally Transaction.sleep. Transaction in turn calls these methods on both the Session and the Servlet.
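The calling sequence can be sketched as a Transaction that simply forwards each call. The session-before-servlet order and the try/finally around respond are choices made for this sketch, not statements about Webware's code.

```python
class Transaction:
    """Sketch: passes awake/respond/sleep on to the session and the servlet."""

    def __init__(self, session, servlet):
        self.session = session
        self.servlet = servlet

    def _participants(self):
        # Session before servlet is an assumption of this sketch.
        return [obj for obj in (self.session, self.servlet) if obj is not None]

    def awake(self):
        for obj in self._participants():
            obj.awake(self)

    def respond(self):
        for obj in self._participants():
            obj.respond(self)

    def sleep(self):
        for obj in self._participants():
            obj.sleep(self)


def run_transaction(trans):
    """The awake/respond/sleep sequence; sleep runs even if respond raises."""
    trans.awake()
    try:
        trans.respond()
    finally:
        trans.sleep()
```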

HTTPServlet.awake does nothing unless you override it in a subclass. Typically you would override it to set up resources and instance variables for the servlet, or to take actions based on the request.

Session.awake updates the session's last access time and its number of accesses.

Responding

HTTPServlet.respond is called with the transaction as its only argument. It calls a method based on the request method ('GET', 'POST', 'PUT', 'DELETE', 'OPTIONS', or 'TRACE'): 'GET' calls HTTPServlet.respondToGet, 'POST' calls respondToPost, and so on, each with the transaction as its argument. Your Servlet subclass must override these methods to provide the desired behavior.
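A minimal sketch of this per-method dispatch, built from the respondToGet/respondToPost naming pattern. HTTPServletSketch and the trans.request_method() accessor are hypothetical; raising NotImplementedError for an unhandled method is an assumption, not documented Webware behavior.

```python
class HTTPServletSketch:
    """Hypothetical stand-in for HTTPServlet's per-method dispatch."""

    def respond(self, trans):
        # trans.request_method() is a hypothetical accessor returning e.g. 'GET'.
        method = trans.request_method().upper()
        name = 'respondTo' + method.capitalize()   # 'GET' -> 'respondToGet'
        handler = getattr(self, name, None)
        if handler is None:
            raise NotImplementedError('%s does not define %s'
                                      % (type(self).__name__, name))
        return handler(trans)
```

A subclass then only defines the respondTo... methods for the request methods it supports.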

Session.respond does nothing.

Page

Page (a subclass of HTTPServlet) has more interesting behavior. It is particularly directed towards generating HTML (HTTPServlet is entirely content-neutral), and consolidates a number of things.

Page.awake initializes a number of variables that will not change for the entire transaction (but may change if the Page is reused for other transactions).

Page.respondToGet and Page.respondToPost both call Page._respond, which looks for a field named '_action_' and dispatches based on that. '_action_' is translated by Page.methodNameForAction, and the result must be among the list returned by Page.actions (cached by Page._actionSet). If no '_action_' field is given, Page.writeHTML is run.
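The '_action_' handling can be sketched as below. PageSketch is hypothetical and takes a plain dict of fields instead of a transaction; raising PermissionError for an action that isn't listed is an assumption, since the text only says the method name must be among those returned by Page.actions.

```python
class PageSketch:
    """Hypothetical sketch of Page._respond's '_action_' dispatch."""

    def __init__(self):
        self.output = []
        self._cached_actions = None

    def write(self, text):
        self.output.append(text)

    def actions(self):
        return []    # subclasses list the method names callable as actions

    def methodNameForAction(self, name):
        return name  # default: the action name is the method name

    def _actionSet(self):
        if self._cached_actions is None:
            self._cached_actions = set(self.actions())   # computed once
        return self._cached_actions

    def _respond(self, fields):
        # 'fields' is a plain dict standing in for the request's form fields.
        action = fields.get('_action_')
        if action is None:
            self.writeHTML()
            return
        method_name = self.methodNameForAction(action)
        if method_name not in self._actionSet():
            raise PermissionError('action not allowed: %r' % action)
        getattr(self, method_name)()

    def writeHTML(self):
        self.write('<html><body>default page</body></html>')
```

The whitelist matters because '_action_' arrives from the browser; without it, a form could name any method on the page.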

Page can also generate the HEAD, TITLE, and other elements of the HTML page. You can override Page.writeBody or Page.writeContent to generate content, and methods like Page.title to customize other parts of the page. It's easiest to look at Page.py to see these.

Writing a page

Page.write calls HTTPResponse.write with its arguments. HTTPResponse.write holds these strings in a list until the transaction is finished. You can also stream the output by calling HTTPResponse.flush, which starts sending output immediately. Once this has happened, you can no longer send headers (such as cookies, redirects, etc.).
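The buffering and commit rule can be modelled with a small class. ResponseSketch and its set_header method are hypothetical names, and the output format is simplified; the sketch only shows that headers can be changed until the first flush and not afterwards.

```python
class ResponseSketch:
    """Buffers output; the first flush sends the headers and commits them."""

    def __init__(self, stream):
        self.stream = stream        # anything with a write() method
        self.headers = [('Content-Type', 'text/html')]
        self.chunks = []
        self.committed = False

    def set_header(self, name, value):
        if self.committed:
            raise RuntimeError('headers already sent')
        self.headers.append((name, value))

    def write(self, text):
        self.chunks.append(text)    # held until flush or end of transaction

    def flush(self):
        if not self.committed:
            for name, value in self.headers:
                self.stream.write('%s: %s\n' % (name, value))
            self.stream.write('\n')  # blank line ends the headers
            self.committed = True
        self.stream.write(''.join(self.chunks))
        self.chunks = []
```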

The Return Path

Application

Once the response has been generated, it has to make its way all the way back to the browser.

After Application.dispatchRequest has called Application.handleGoodURL (which calls awake/respond/sleep), it will call HTTPResponse.deliver, which basically marks the response as committed (i.e., nothing more can be added). Then Application.returnInstance is called, which returns the Servlet instance back to the pool of cached servlets (to be reused for a later request).

Response

RequestHandler.handleRequest calls HTTPResponse.rawResponse, which returns a dictionary with the keys 'headers' and 'contents'. 'headers' is a list of (header, value) pairs. For example:

[('Content-Type', 'text/html'),
 ('Set-Cookie', 'foo=bar')
]

RequestHandler.handleRequest then turns this into a normal CGI-style response: "Header: value" lines at the top, a blank line, and then the contents. It then deletes the transaction.
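The conversion can be sketched as a single function. cgi_response is a hypothetical name, and using '\n' line endings is an assumption of this sketch.

```python
def cgi_response(raw):
    """Turn {'headers': [(name, value), ...], 'contents': str} into CGI text."""
    head = ''.join('%s: %s\n' % (name, value) for name, value in raw['headers'])
    return head + '\n' + raw['contents']   # blank line separates head and body
```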

Adapter

Having waited patiently, RequestHandler finally sends the string constructed from the response to the Adapter over the socket. The adapter deals with it as appropriate; e.g., the CGI adapter prints the result to stdout.

Finished

The user sees the page, and it is good.

Notes

Note 1: The AppServer writes its hostname and port to a file named address.text. The Adapter reads this file to find out where to connect to the AppServer.
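A sketch of the address.text hand-off. The file layout assumed here (a single host:port line) is a guess for illustration; the text only says the file holds the hostname and port. write_address and read_address are hypothetical helpers.

```python
def write_address(path, host, port):
    """AppServer side: record where it is listening (assumed host:port format)."""
    with open(path, 'w') as f:
        f.write('%s:%d\n' % (host, port))


def read_address(path):
    """Adapter side: recover (host, port) from the file."""
    with open(path) as f:
        host, _, port = f.read().strip().rpartition(':')
    return host, int(port)
```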

Note 2: Marshalling takes simple Python values (strings, lists, numbers, etc.) and converts them into a string representation.

Note 3: The request is a dictionary with the keys 'format', 'time', 'input', and 'environ':

'format':: The only currently allowed value for 'format' is 'CGI'.
'time':: A timestamp (seconds from the Unix Epoch).
'environ':: A dictionary that looks like what os.environ would look like were this actually a CGI call -- that is, with keys like REQUEST_METHOD, QUERY_STRING, etc.
'input':: The request that the browser made. This would be something like GET /Examples/View?filename=Welcome.py (@@ POST example too?)

Note 4: The Servlet cache is used both for the file path lookup and for caching the Servlet itself (i.e., there are two caches, keyed by URL/PATH_INFO and by serverSidePath, that point to the same cached data). (@@ maybe some information on how the cache is stored)