W3C


WARNING: This page has not been updated since release 1.0alpha3. Even though the ideas haven't changed that much, some classes have been renamed as part of an API cleanup (check the Release Notes). A new architecture overview is under construction here.


Jigsaw Architecture

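Jigsaw is made of two distinct modules, linked through a set of interfaces: the daemon module, which handles the protocol, and the resource module, which manages the information space.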

As these two modules are linked through a set of Java interface specifications, you can replace each of them independently of the other, provided the replacement implements the interfaces adequately.

The daemon module

As a server administrator, you probably won't have to deal much with this part of Jigsaw, although it might be a good idea to read this section (or at least the part on terminology), just to get a feeling for what is happening behind the scenes. This section goes through a bit of terminology, then steps through the life-time of a connection, and introduces the resource module.

Terminology

The protocol module deals with a number of objects. To get you into this world, we will start by describing the most important ones.

httpd
This is the object whose main method actually runs the server. It has two purposes. The first is to run the accepting thread, i.e. the thread that loops waiting for new incoming connections. The second is to manage the set of other objects that are each responsible for part of the server's behavior. Among them are the logger (responsible for logging requests), the authentication realm manager (responsible for the list of authentication realms attached to the server), the client pool (responsible for handling accepted connections), the root resource of the server, and last but not least, the resource store manager. We will describe all these objects more precisely in the coming sections.
logger
Each time the processing of a request terminates (successfully or not), the server calls back the logger so that it can keep track of all handled requests. The current version of Jigsaw comes with a simple logger, compliant with the Common Log file format (i.e. it emits a one-line record for each processed request).
realm manager
The realm manager keeps track of all the authentication realms defined by the server. Each authentication realm is assigned a symbolic name when it is created, which the web admin uses to refer to it when configuring the server. This name is also used as the HTTP realm name, so it should be unique within the server scope. The sample implementation of this object manages a persistent catalog of realms, which can be edited through HTML forms (see the Jigsaw administration manual).
client pool
The client pool object is responsible for handling new incoming connections. It should make its best effort to guess which protocol the other end wants to speak on the connection, and create an appropriate client object to handle it. The current sample implementation always assumes that new connections speak the HTTP/1.0 protocol (with the addition of persistent connections). The other role of the client pool is to minimize thread creation. Thread creation can be a costly process, so it is worth avoiding it as much as possible. The sample implementation maintains a ready-to-run set of client objects, so that it does not re-create them from scratch for each new connection.
root resource
The root resource is the object that links the protocol module to the resource module. This object should implement the appropriate interface (right now, it should be an instance of ContainerResource, but this is likely to change in the very near future).
resource store manager
As you will see in the next section, Jigsaw serves each file or directory by wrapping it in a Resource instance. As their number might become fairly large, the server keeps track of the ones that haven't been accessed for a while, and unloads them from memory. The resource store manager is responsible for this part of the server's behavior: it keeps track of all the loaded resources, and unloads them when it sees fit.
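
The unloading behavior described for the resource store manager could be sketched as follows. This is only an illustration under stated assumptions, not Jigsaw's implementation: the class name SketchResourceStore, the fixed limit on loaded resources, and the least-recently-accessed eviction policy are all hypothetical; the text above only says that resources not accessed for a while get unloaded.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: keeps at most `limit` resources loaded and unloads
// the least recently accessed one whenever that bound is exceeded.
class SketchResourceStore {
    private final LinkedHashMap<String, Object> loaded;

    SketchResourceStore(final int limit) {
        // accessOrder = true: entries are ordered from least to most recently accessed.
        this.loaded = new LinkedHashMap<String, Object>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Object> eldest) {
                return size() > limit;   // "unload" the eldest entry when over the limit
            }
        };
    }

    // Registers a freshly loaded resource under its name.
    synchronized void hold(String name, Object resource) {
        loaded.put(name, resource);
    }

    // Returns the resource if it is still loaded (marking it as accessed), else null.
    synchronized Object lookup(String name) {
        return loaded.get(name);
    }

    synchronized int loadedCount() {
        return loaded.size();
    }
}
```

A real store would presumably save an unloaded resource before dropping it and reload it on demand; the sketch only shows the bookkeeping.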

Given these definitions, we can now explain how the server handles new incoming connections.

Life-time of a connection

The life time of a connection can be divided into the following steps:

  1. The accepting thread is notified of it.
  2. It gets handled by the client pool object.
  3. A thread starts waiting for incoming requests.
  4. The request is handed to the resource module for actual processing.
  5. The reply generated by the resource module is emitted.
  6. The server logger is called back to log the request.

The first stage in processing a new connection is to hand it to the client pool as quickly as possible (so that the accepting thread can return to the accept system call as fast as possible). The client pool then looks for an idle Client object. If one is found, it is bound to the accepted connection, which makes it run its main loop. If no client is available and the maximum allowed number of connections has been reached, the new connection is rejected (by closing it); otherwise a fresh client is created and bound to the connection.
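
The decision made by the client pool can be sketched as below. This is a minimal illustration, assuming hypothetical names (SketchClient, SketchClientPool, handleConnection, clientIdle) and a plain Object standing in for the accepted connection; it is not the actual Jigsaw client pool, and it leaves out the client threads entirely.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a client object; a real one would own a thread.
class SketchClient {
    Object connection;   // the connection currently bound, or null when idle

    void bind(Object connection) { this.connection = connection; }
    void unbind() { this.connection = null; }
}

class SketchClientPool {
    private final Deque<SketchClient> idle = new ArrayDeque<>();
    private final int maxConnections;
    private int busy = 0;

    SketchClientPool(int maxConnections) { this.maxConnections = maxConnections; }

    // Returns the client bound to the connection, or null when the connection
    // is rejected (the caller would then close it).
    synchronized SketchClient handleConnection(Object connection) {
        SketchClient client = idle.pollFirst();   // reuse an idle client if there is one
        if (client == null) {
            if (busy >= maxConnections) {
                return null;                      // limit reached: reject
            }
            client = new SketchClient();          // otherwise create a fresh client
        }
        client.bind(connection);
        busy++;
        return client;
    }

    // Called by a client once it is done with its connection.
    synchronized void clientIdle(SketchClient client) {
        client.unbind();
        busy--;
        idle.addFirst(client);                    // spare it for future use
    }
}
```

Keeping idle clients around trades a little memory for not paying the thread creation cost on each new connection.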

By the end of stage 2, the client pool has either rejected the connection, found an idle client to handle it, or created a fresh client for it. At stage 3, the client object is bound to the connection and awakened to actually process it. The client thread enters the client main loop.

The client main loop starts by getting any available request. Once such a request has been read from the network, it is handed to the resource module (stage 4). The latter is responsible for generating an appropriate Reply object, which is then emitted (stage 5) by the client thread, back to the browser. Finally (stage 6), the server logger is invoked with the request, its reply and the number of bytes sent back to the browser.
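
As an illustration of what a logger could do with the request, the reply and the byte count, here is a small sketch that formats one Common Log Format record. The class and method names (SketchCommonLog, format) and the parameter list are assumptions made for this example; only the record layout (host, identity, user, timestamp, request line, status, bytes) follows the Common Log Format.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper: builds one Common Log Format line per processed request.
class SketchCommonLog {
    // Timestamp layout used by the Common Log Format, e.g. 10/Oct/2000:13:55:36 -0700
    private static final DateTimeFormatter STAMP =
        DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);

    static String format(String host, String user, OffsetDateTime when,
                         String requestLine, int status, long bytes) {
        String who = (user == null) ? "-" : user;   // "-" marks a missing field
        // The second field (remote identity) is left as "-" in this sketch.
        return host + " - " + who + " [" + STAMP.format(when) + "] \""
            + requestLine + "\" " + status + " " + bytes;
    }
}
```

For an unauthenticated GET of /index.html answered with status 200 and 2326 bytes, this yields a line of the form: 192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326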

At the end of this request processing, the client object checks whether it can keep the connection alive. If so, it loops back to stage 3; otherwise, the client notifies the client pool that it has become idle. The client pool, in turn, decides whether this client object should be spared for future use.

The Resource module

The resource module is the one that manages the information space. In Jigsaw each exported object is mapped to an HTTPResource instance, which is created at configuration time, either manually or by the resource factory. We first describe what resources are, then sketch the way Jigsaw looks up the target resource of a request, and finally present the filter concept.

Resources

Resources are full Java objects, defined by two characteristics: their class and their attributes.

The AttributeRegistry keeps track of all the attributes of each class. Like instance variables, attributes are inherited along the normal subclass relationship. Each resource attribute is described by an instance of the Attribute class (or one of its subclasses). This description is made of

Given this description, Jigsaw is able to make resources persistent, just by dumping their class-name, and all their attribute values. Unpickling (i.e. restoring) a resource is just creating an empty instance of its class, and filling its attributes with their saved values.
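
The persistence scheme described above (dump the class name and the attribute values, then restore by creating an empty instance and filling it) can be sketched as follows. The names SketchResource, SketchFileResource and SketchPickler, the use of a plain map for attributes, and the reserved "class" key are all hypothetical; Jigsaw's own pickling format and Attribute classes are not reproduced here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical resource base class: its state lives in a name -> value map.
class SketchResource {
    final Map<String, Object> attributes = new HashMap<>();
}

class SketchFileResource extends SketchResource { }

class SketchPickler {
    static final String CLASS_KEY = "class";   // reserved key, an assumption of this sketch

    // Dump: the class name plus a copy of every attribute value.
    static Map<String, Object> pickle(SketchResource resource) {
        Map<String, Object> dump = new HashMap<>(resource.attributes);
        dump.put(CLASS_KEY, resource.getClass().getName());
        return dump;
    }

    // Restore: create an empty instance of the class, then fill in its attributes.
    static SketchResource unpickle(Map<String, Object> dump) {
        try {
            Class<?> cls = Class.forName((String) dump.get(CLASS_KEY));
            SketchResource resource =
                (SketchResource) cls.getDeclaredConstructor().newInstance();
            for (Map.Entry<String, Object> entry : dump.entrySet()) {
                if (!CLASS_KEY.equals(entry.getKey())) {
                    resource.attributes.put(entry.getKey(), entry.getValue());
                }
            }
            return resource;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot restore resource", e);
        }
    }
}
```

The approach only works if every resource class has a no-argument constructor, since the restored object starts out empty.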

Instances of the Resource class are the most basic resources. The HTTPResource class is the base class for resources that are accessible through the HTTP protocol. Instances of this class define a number of attributes, along with the methods that implement the HTTP methods (e.g. GET, POST, PUT, which are mapped respectively to the get, post and put methods of the class). These methods are all triggered through the perform method of the class, whose role is to dispatch a request to the appropriate handler.
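
The dispatching role of perform can be sketched as below. This is a simplified illustration: requests and replies are reduced to strings, and the class names SketchHttpResource and SketchPage, the method signatures, and the default replies are assumptions rather than the actual HTTPResource API.

```java
// Hypothetical resource: perform() routes a request to the handler for its HTTP method.
class SketchHttpResource {
    String perform(String method, String url) {
        switch (method) {
            case "GET":  return get(url);
            case "POST": return post(url);
            case "PUT":  return put(url);
            default:     return "501 Not Implemented";
        }
    }

    // Default handlers refuse the method; subclasses override the ones they support.
    String get(String url)  { return "405 Method Not Allowed"; }
    String post(String url) { return "405 Method Not Allowed"; }
    String put(String url)  { return "405 Method Not Allowed"; }
}

// A resource that only supports GET.
class SketchPage extends SketchHttpResource {
    @Override
    String get(String url) { return "200 OK " + url; }
}
```

With this layout a subclass never touches perform itself; it only overrides the handlers for the methods it wants to serve.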

Remember that in the previous section we said that requests are handed to the resource module. The perform method of HTTPResource is the one that gets called by the daemon module once the target resource of the request has been looked up. The next section explains how this lookup is performed.

Looking up resources

In the section about the life-time of a connection, we mentioned that at stage 4 the parsed request is handed to the resource module. The first thing the resource module does when it receives a request is to look up the target resource. This section briefly explains how this is done.

Jigsaw defines a special subclass of the HTTPResource class, the ContainerResource, whose role is to implement the lookup strategy for the sub-space it controls. All servers (i.e. all instances of the httpd class) keep a pointer to their root resource. This root resource must be a container resource: it must implement the lookup method. Given a request, this method must return a suitable target resource for processing it. However, there are no constraints on how the lookup is performed. We will briefly sketch how directory resources implement their lookup method.

The directory resource's lookup method starts by checking that it has an up-to-date list of children. What is meant by up-to-date here might not be what you expect: Jigsaw's caching strategy can make this notion quite complex. Anyway, once the directory resource thinks its list of children is up-to-date, it looks up the first component of the URL in its children set. For example, if the URL is /foo/bar, it starts by looking up foo in itself. This can lead to three cases, depending on the result of this:

This lookup process is just one example of how the lookup operation can be implemented. It has several advantages in the specific case of handling directory resources, but other situations may require other algorithms. One important property of the directory resource's lookup algorithm is that it is able to delegate sub-space naming to the resource that actually handles the sub-space.
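
The delegation property can be illustrated with the sketch below. It is an assumption-laden simplification: the names SketchNode and SketchDirectory, the path representation, and the handling of each outcome (missing child, leaf child, container child) are guesses made for the example and do not reproduce the actual ContainerResource lookup, nor its caching of children.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical leaf resource, identified by its name within its parent.
class SketchNode {
    final String name;
    SketchNode(String name) { this.name = name; }
}

// Hypothetical container: resolves names in the sub-space it controls.
class SketchDirectory extends SketchNode {
    private final Map<String, SketchNode> children = new HashMap<>();

    SketchDirectory(String name) { super(name); }

    SketchDirectory add(SketchNode child) {
        children.put(child.name, child);
        return this;
    }

    // Looks up the first path component here, then delegates the remainder.
    SketchNode lookup(List<String> path) {
        if (path.isEmpty()) {
            return this;
        }
        SketchNode child = children.get(path.get(0));
        List<String> rest = path.subList(1, path.size());
        if (child instanceof SketchDirectory) {
            // Delegate naming of the sub-space to the container that handles it.
            return ((SketchDirectory) child).lookup(rest);
        }
        // A leaf matches only if no component remains; a missing child gives null.
        return rest.isEmpty() ? child : null;
    }

    // Splits "/foo/bar" into ["foo", "bar"].
    static List<String> split(String url) {
        List<String> parts = new ArrayList<>();
        for (String part : url.split("/")) {
            if (!part.isEmpty()) {
                parts.add(part);
            }
        }
        return parts;
    }
}
```

Because each container only interprets the first remaining component, a different kind of container could be dropped anywhere in the tree and resolve the rest of the URL in its own way.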

Resource filters

We have briefly described Jigsaw's resource module. The last thing you need to understand is Jigsaw's concept of resource filters. You might have been surprised that until now, we haven't said a word about authentication. In Jigsaw, authentication is implemented as a special resource filter. Resource filters are a special kind of resource (i.e. they are persistent, and can define any kind of attributes) that are attached to some target resource. Filter instances are called back twice during request processing: once on the way in, through their ingoingFilter method, and once on the way out, through their outgoingFilter method.

For a resource to support filters, its class must be a subclass of the FilteredResource class. Most resource classes provided with the Jigsaw distribution are subclasses of it.

Back to authentication now. As we said above, authentication is handled by a special filter, whose ingoingFilter method tries to authenticate the request. If this succeeds, normal processing of the request continues: it is performed by its target resource, and the corresponding reply is emitted back to the browser. In the case of the authentication filter, all the work is done on the way in (while the target resource is being looked up), so there is no need to have the outgoingFilter method called. A filter's ingoingFilter method can return a special value, DontCallOutgoing, to indicate that it has done all of its job; in such cases, the server won't spend time invoking its empty outgoingFilter method. More return codes are available; see the API documentation for ResourceFilter for the details.
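
The two callbacks and the DontCallOutgoing shortcut can be sketched as follows. Everything here is a simplified assumption: requests and replies are plain strings, the interface and class names are hypothetical, the constants merely mirror the DontCallOutgoing value named above, and the credential check is a placeholder. Refer to the ResourceFilter API documentation for the real signatures and return codes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical filter interface with two callbacks.
interface SketchFilter {
    int CALL_OUTGOING = 0;        // hypothetical return code
    int DONT_CALL_OUTGOING = 1;   // stands in for DontCallOutgoing

    int ingoingFilter(String request);
    String outgoingFilter(String request, String reply);
}

// Does all of its work on the way in, so it opts out of the outgoing call.
class SketchAuthFilter implements SketchFilter {
    int outgoingCalls = 0;

    public int ingoingFilter(String request) {
        if (!request.contains("user=alice")) {          // placeholder credential check
            throw new SecurityException("401 Unauthorized");
        }
        return DONT_CALL_OUTGOING;
    }

    public String outgoingFilter(String request, String reply) {
        outgoingCalls++;
        return reply;
    }
}

// Works on the way out: decorates the reply.
class SketchTagFilter implements SketchFilter {
    public int ingoingFilter(String request) { return CALL_OUTGOING; }
    public String outgoingFilter(String request, String reply) { return reply + " (filtered)"; }
}

// Hypothetical filtered resource: runs ingoing filters, performs, runs outgoing filters.
class SketchFilteredResource {
    private final List<SketchFilter> filters = new ArrayList<>();

    void addFilter(SketchFilter filter) { filters.add(filter); }

    String perform(String request) {
        List<SketchFilter> outgoing = new ArrayList<>();
        for (SketchFilter filter : filters) {
            if (filter.ingoingFilter(request) == SketchFilter.CALL_OUTGOING) {
                outgoing.add(filter);   // only these get called back on the way out
            }
        }
        String reply = "200 OK";        // stands in for the target resource's reply
        for (SketchFilter filter : outgoing) {
            reply = filter.outgoingFilter(request, reply);
        }
        return reply;
    }
}
```

In this sketch a failed authentication aborts processing with an exception, which is one possible design; the text above does not say how Jigsaw reports the failure.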

Further reading

The best way to continue your Jigsaw tour now is to install it and read the following tutorials:


Jigsaw Team
$Id: architecture.html,v 1.13 1998/05/27 09:01:25 yves Exp $