| Selskabet > Jesper Steen Møller > My Thesis | Last change July 5 2000 |
I've now handed in my thesis in Computer Science at DIKU (University of Copenhagen).
The title is Automatic Specialization of Client/server Interfaces (Applying automatic program transformation techniques to minimize latency and bandwidth bottlenecks in distributed programs.)
I'm continuing my work on this exiting topic, so feel free to contact me regard ay ideas or questions.
This thesis studies a new concept in automatic optimization of distributed programs by applying techniques from the field of formal program semantics to reduce latency and bandwidth bottlenecks. The concept is inspired by the work of program specialization, or partial evaluation, where a general program is transformed with respect to a specific data parameter to obtain a faster, specialized program. Similarly, the approach here is to optimize the distributed program by specializing a general network interface with respect to the client code, obtaining a specialized client.
The concept is called interface specialization, and this thesis presents the concept, the transformations, the algorithms, and the semantics for creating specialized client/server interfaces from program analysis.
The concept is evaluated through a prototype implementation showing the feasibility of implementing an interface specializer, and through a benchmark showing the relevance of the concept.
Interface specialization is shown to produce significant speedups when applied to programs communicating via a simple and "clean" object-oriented interface across LANs and WANs, giving programmers the option of focusing on clean interfaces and clients, and then letting a specializer produce interfaces and code that is comparable to that of manual optimization efforts.
The thesis is available in PDF format: thesis.pdf (1.2MB).
It can be viewed by using Adobe® Acrobat® Reader™.
In design and development of distributed software systems in the client/server paradigm, a natural trade-off occurs between efficiency and generality. To achieve the best performance, one should be attentive of each and every network packet going across the wire, both regarding the number of packets, and its contents (are redundant or unnecessary data being transferred?). Several C/S techniques (CORBA, DCOM, Java RMI and RPC) introduces location transparency, where the client/server interaction is modeled on top of normal, blocking method or procedure calls (where some "invisible" generated glue code performs the actual network communication). Without proper attention to the abstraction in use, this technique may lead to an elegant, but inefficient distributed system, with too many network roundtrips and an excessive bandwidth requirement.
This usual elegance/efficiency is not specific to distributed applications. In the field of program transformation, the problem is attacked by semantic analyses of program text: The goal is to be able to develop a program in a readable, straightforward, declarative manner, and then automatically transform the program into an efficient program by various transformation techniques. An application of program transformation is the optimizing compiler.
While attempts have been made to optimize the individual layers of a C/S system (such as marshalling in SUN RPC), the usual techniques deal with the "wrong" problems: The cost of a network call is orders of magnitudes higher than a local function call. Thus, inlining a local call or removing a few simple buffer overrun checks will only give a marginal improvement to the system.
In contrast, I propose a different kind of transformation technique, which tries to minimize interaction over a specific call boundary. I intend to explore (and implement the most promising parts of) the following:
Given a client program C, calling the server S through the interface I. From a static analysis of these calls it is possible to determine the following:
From this kind of information, it is possible to compose a new client program C, which uses a new interface IC. The interface IC is made so that ideally will:
The server program S will similarly be transformed into S, which can implement the new interface by calling the old. If, for some reason, it is not possible to change the server code, a proxy PI,C could be put in place, which would implement IC by calling S in the same fashion as the original client program would. This can be seen as transforming the interaction C --I--> S into either C' --IC--> S', or C' --IC--> PC --I--> S.
In summary, I propose a number of transformations that will specialize a given network interface with respect to a given client program, and then change the client and server programs match the new interface, even if it includes introducing a proxy. This is done under the assumption that there is a severe difference in the costs of calling across these different kinds of boundaries:
The cost range could be from a few microseconds for a call to a function in the cache, up to maybe a quarter of a second for a complete roundtrip across the Internet in normal business hours.
The specialization process is assumed to happen off-line, maybe as a part of the development cycle or final product deployment, as access to the source code is required, and since new code must be compiled and distributed. (It would be possible to perform this kind of transformation on-line on script languages, but it would require implementing a special interpreter and providing a special run-time environment for the server code.)
My goals for my work with this project is the following:
To evaluate the optimizations and the semantics of the language in question, the choice of C/S technology and programming language must be done early in the process.
The C/S technology is expected to be one of DCOM, CORBA, Java RMI or RPC. These technologies are represented by anything from simple scripting languages (like Perl or VBScript), also covering C and Java up to C++. In all cases, I would be constrained by time to only analyze only the main parts of the language. The presence of a publicly available, mature lexer/parser for the language in question would also impact the choice of language and C/S technology.
If you have any comments or suggestions, feel free to contact me.