Dr Wayne Kelly [HREF1], Center for Information Technology Innovation, QUT, w.kelly@qut.edu.au
Assoc. Prof. Paul Roe [HREF2], Center for Information Technology Innovation, QUT, p.roe@qut.edu.au
(This work was supported by Microsoft Research)
The following example shows how to create a simple web service using Microsoft's .NET platform. Any common language runtime (CLR) conformant language can be used to implement a web service; in this example we use the new C# language. Methods that are to be exposed as part of the web service are marked with a WebMethod attribute.
<%@ WebService Language="C#" Class="MathService" %>
using System.Web.Services;

public class MathService {
    [WebMethod]
    public int Add(int num1, int num2) {
        return num1 + num2;
    }
}
From this information, a WSDL specification can be generated automatically (see MathService.wsdl, the specification generated for this example). Given a WSDL specification, a utility program can be used to create a client proxy class for accessing the web service.
public class MathServiceProxy : System.Web.Services.Protocols.SoapHttpClientProtocol {
    public MathServiceProxy() {
        this.Url = "http://www.servername.com/LocalPath/MathService.asmx";
    }

    public int Add(int num1, int num2) {
        object[] results = this.Invoke("Add", new object[] {num1, num2});
        return ((int)(results[0]));
    }

    public System.IAsyncResult BeginAdd(int num1, int num2,
            System.AsyncCallback callback, object asyncState) {
        return this.BeginInvoke("Add", new object[] {num1, num2}, callback, asyncState);
    }

    public int EndAdd(System.IAsyncResult asyncResult) {
        object[] results = this.EndInvoke(asyncResult);
        return ((int)(results[0]));
    }
}
The following example shows how this proxy class can be used to invoke
web methods synchronously or asynchronously.
class ExampleClient {
    static MathServiceProxy mathProxy = new MathServiceProxy();

    static void Main(string[] args) {
        // invoke Add method synchronously
        int result = mathProxy.Add(1, 2);
        Console.WriteLine("1 + 2 = {0}", result);

        // invoke Add method asynchronously
        IAsyncResult resultHandle = mathProxy.BeginAdd(1, 2, new AsyncCallback(MyHandler),
                                                       new int[] {1, 2});
        // processing continues without a result ...

        // later, can block until result becomes available
        resultHandle.AsyncWaitHandle.WaitOne();
        // ...
    }

    // callback function used to process asynchronous results
    static void MyHandler(IAsyncResult ar) {
        int[] savedState = (int[]) ar.AsyncState;
        int result = mathProxy.EndAdd(ar);
        Console.WriteLine("{0} + {1} = {2}", savedState[0], savedState[1], result);
    }
}
Invoking web methods asynchronously is obviously more complex than invoking them synchronously; however, it conforms to the design pattern used for asynchronous operations throughout the .NET platform, and so will soon be familiar to many programmers.
The asynchronous pattern also makes it straightforward to invoke web methods on several servers in parallel, as the following example shows.
static void Main(string[] args) {
    // set up a proxy for each server
    MathServiceProxy[] mathProxy = new MathServiceProxy[args.Length];
    for (int i = 0; i < args.Length; i++) {
        mathProxy[i] = new MathServiceProxy();
        mathProxy[i].Url = args[i];
    }
    // ...
    WaitHandle[] waitHandles = new WaitHandle[args.Length];

    // invoke Add method asynchronously N times on different servers
    for (int i = 0; i < args.Length; i++) {
        IAsyncResult resultHandle = mathProxy[i].BeginAdd(i, i+1, new AsyncCallback(MyHandler),
                                                          new int[] {i, i+1});
        waitHandles[i] = resultHandle.AsyncWaitHandle;
    }

    // wait for all results
    WaitHandle.WaitAll(waitHandles);
    // ...
}
While this architecture could be used to create parallel applications, it is very heavyweight with respect to the amount of installation and configuration required on each volunteer machine. Firstly, each volunteer machine must be set up as a web server. Secondly, a web service that implements the parallel portion of the application in question must be deployed on each volunteer machine. This makes it difficult for additional, individually "owned" machines to donate their cycles as they become idle. Most workstation owners won’t have, or won’t be willing to install, a web server, and won’t be willing to hand over the administrative privileges necessary to allow others to deploy new web services on their machines. Firewalls pose an even greater problem for this architecture: even if a volunteer machine behind a firewall has a web server installed, clients outside the firewall still won’t be able to contact it.
The server is the only machine that exposes actual web services through a web server. Client applications push "jobs" (web method invocations) into the job repository, and volunteer machines pull jobs out when they wish. From the client’s perspective, the web service appears to be implemented on a single server machine. The server transparently delegates the actual execution of web services to whichever volunteer machines are available at the time. Neither the client nor the server needs to know in advance who the potential volunteers might be. Volunteers located anywhere on the Internet can contact the server at any time and thereby participate in parallel computations, even (in most cases) if they are behind a firewall. Clients can also connect to the server from any machine on the Internet. Clients don’t have to log on to the server in order to deploy or launch applications; client applications execute directly on the end user’s desktop and seamlessly tap into the computational resources made available via the server. Note that this architecture allows clients to communicate (indirectly) with volunteers even if both parties are behind (different) firewalls. Such communication is generally not possible for pure peer-to-peer systems.
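To make the brokered model concrete, the following is a minimal sketch of what a generic job-brokering web service could look like. The method names, the Job record and the in-memory storage are illustrative assumptions only; they are not the actual G2 server interface.

// Hypothetical sketch of a generic job-brokering web service: clients push jobs
// in, volunteers pull jobs out and push results back. All names and types here
// are assumptions for illustration, not the real G2 server API.
using System;
using System.Collections;
using System.Web.Services;

public class Job {
    public Guid Id;
    public string ClassName;      // volunteer class that implements the method
    public string MethodName;     // web method to invoke
    public object[] Arguments;    // arguments supplied by the client
    public object[] Results;      // filled in when a volunteer completes the job
    public bool Completed;
}

public class JobBroker {
    // shared job repository (per-request service instances all see the same queue)
    static Queue pending = Queue.Synchronized(new Queue());
    static Hashtable jobs = Hashtable.Synchronized(new Hashtable());

    // called (indirectly, via the client-side proxy) to enqueue a web method invocation
    [WebMethod]
    public Guid SubmitJob(string className, string methodName, object[] args) {
        Job job = new Job();
        job.Id = Guid.NewGuid();
        job.ClassName = className;
        job.MethodName = methodName;
        job.Arguments = args;
        jobs[job.Id] = job;
        pending.Enqueue(job);
        return job.Id;
    }

    // polled by the client to collect a result (null until the job completes)
    [WebMethod]
    public object[] GetResult(Guid jobId) {
        Job job = (Job) jobs[jobId];
        return (job != null && job.Completed) ? job.Results : null;
    }

    // called by a volunteer to pull the next waiting job (null if none are available)
    [WebMethod]
    public Job GetJob() {
        lock (pending.SyncRoot) {
            return pending.Count > 0 ? (Job) pending.Dequeue() : null;
        }
    }

    // called by a volunteer to push results back to the server
    [WebMethod]
    public void ReturnResult(Guid jobId, object[] results) {
        Job job = (Job) jobs[jobId];
        if (job != null) {
            job.Results = results;
            job.Completed = true;
        }
    }
}

The important point is that all traffic is initiated by the client or the volunteer toward the server, which is why firewalls on either side do not prevent participation.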
Our framework is currently implemented using Microsoft’s new .NET platform, however, all communication is performed using the open SOAP protocol, so any of the components (client, server or volunteer) could easily be implemented using alternate technologies (such as Java).
Web services are typically developed by third parties and deployed on a particular machine. Volunteer components, however, are typically not black-box components provided by third parties, and they are not pre-deployed on a particular machine. Volunteer components are simply the parts of an application that need to be executed in parallel. They are typically created by the same programmer as the client component and, as we will see, need to be present at runtime on the same machine as the client.
Visual Studio.NET allows two types of runtime dependencies to be specified for a component: local component references and web service references. Local references are dependencies on .NET components that are expected to exist on the local machine at runtime. Local references are added to a project by specifying a local path to the actual component. The referenced component must exist at compile time, but it is used at that time only for the purpose of determining its strong name (which is compiled into the dependent component). Web references, by comparison, are added by referring to their WSDL specification. A client-side proxy class is automatically generated from the WSDL specification and added to the project whenever a web reference is added.
We have integrated into Visual Studio.NET a third type of reference, a G2 reference. Figure 4 shows a screen shot of a G2 reference being added to a project.

Like local references, G2 references are added by specifying a local path to an actual volunteer component. However, rather than simply extracting the volunteer component’s strong name, we use reflection to determine the signatures of the web methods implemented within it. We use this information to generate our own custom client-side proxy class, analogously to how WSDL specifications are used to create client-side proxy classes for web services. The inclusion of a G2 reference also triggers the automatic creation of a volunteer stub class. This is required because the volunteer component will execute on volunteer machines, and so cannot rely on the functionality Microsoft’s Internet Information Server (IIS) would normally supply to marshal and unmarshal web service messages on the server.
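The reflection step is straightforward. The sketch below shows the kind of code involved in discovering the web methods of a volunteer component; the class and method names (ProxyGeneratorSketch, FindWebMethods) are our own placeholders, and the real generator emits complete proxy and stub classes rather than just listing signatures.

// Illustrative sketch only: enumerate the [WebMethod]-marked methods of a
// volunteer component via reflection, as the starting point for generating a
// client-side proxy. FindWebMethods is a hypothetical helper, not part of G2.
using System;
using System.Reflection;
using System.Web.Services;

public class ProxyGeneratorSketch {
    public static void FindWebMethods(string assemblyPath) {
        Assembly assembly = Assembly.LoadFrom(assemblyPath);
        foreach (Type type in assembly.GetTypes()) {
            foreach (MethodInfo method in type.GetMethods()) {
                // only methods marked with [WebMethod] are exposed via the proxy
                if (method.GetCustomAttributes(typeof(WebMethodAttribute), true).Length > 0) {
                    Console.WriteLine("{0}.{1} : {2} parameter(s)",
                        type.Name, method.Name, method.GetParameters().Length);
                }
            }
        }
    }
}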
At runtime, the volunteer components of the application are placed in the same directory as the client application. This enables the middleware to locate volunteer components and lazily upload them to the server. We refer to this as automatic code upload. It spares the application programmer from having to manually deploy volunteer components to the server each time they are updated, and also makes it easy to switch between servers from one run to the next. From the client’s perspective, the entire application (both client and volunteer components) logically resides on their desktop; everything to do with the server and volunteers is kept transparent. The server, by comparison, contains no pre-deployed application-specific code: it acts like a BYO restaurant, where clients on remote machines provide their own code to be executed on the volunteers. For efficiency reasons, volunteer components are cached on the server, so code upload to the server normally takes place only the first time a new version of a component is used on a particular server (old versions are automatically flushed).
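A minimal sketch of what this lazy, version-keyed upload might look like on the client side follows. The IG2Server interface and its IsCached and UploadComponent methods are assumptions made for illustration; the actual G2 upload protocol is not shown in this paper.

// Illustrative sketch only: lazily upload a volunteer component to the server,
// keyed by its name and version, skipping the upload if the server already has
// that version cached. IG2Server, IsCached and UploadComponent are hypothetical.
using System;
using System.IO;
using System.Reflection;

public interface IG2Server {
    bool IsCached(string componentKey);
    void UploadComponent(string componentKey, byte[] assemblyBytes);
}

public class CodeUploadSketch {
    public static void EnsureUploaded(IG2Server server, string componentPath) {
        // the assembly name and version identify the cached copy on the server
        AssemblyName name = AssemblyName.GetAssemblyName(componentPath);
        string key = name.Name + "/" + name.Version;

        if (!server.IsCached(key)) {
            using (BinaryReader reader = new BinaryReader(File.OpenRead(componentPath))) {
                byte[] bytes = reader.ReadBytes((int) reader.BaseStream.Length);
                server.UploadComponent(key, bytes);
            }
        }
    }
}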

If jobs are available, the server determines the oldest waiting job (other scheduling policies could be used) and creates a web page for it. The HTML page uses an <object> tag to embed a .NET object of the class that implements the method corresponding to the oldest job. The presence of the object tag results in .NET code being automatically downloaded to the volunteer machine and cached for possible future use. Note that this is the last step of the volunteer component’s journey: from client, to server, to volunteer.

JavaScript code executed at load time calls a generic Process method on this object and waits for all jobs associated with that class to be processed. The Process method repeatedly retrieves jobs associated with that class of object from the server (via a generic G2 web service), executes them locally, and returns the results to the server. The Process method creates a new thread to do the majority of its processing, so as to allow the web browser’s main thread of control to continue (otherwise the web browser would "hang"). When jobs for the current class of object are exhausted, a new page is loaded containing an object for the next job class (if any). The following shows the contents of such a page:
<html>
  <head>
    <title>G2 Volunteer #915</title>
    <script language="javascript">
      function Process() {
        vol.Process(915, 14, "http://g2.fit.qut.edu.au/g2/G2Server/");
        setTimeout("CheckFinished()", 1000);
      }
      function CheckFinished() {
        if (vol.Finished()) {
          window.location.search = "?volunteerId=915";
          window.location.reload(true);
        }
        else
          setTimeout("CheckFinished()", 1000);
      }
    </script>
  </head>
  <body onload="Process()">
    <div>RayTracer (version 1.0.741.17882)</div>
    <object id="vol" classid="http:/Root/RayTrace/1.0.741.17882/RayTrace.dll#RayTracer">
    </object>
  </body>
</html>
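For clarity, the volunteer-side Process loop described above might look roughly like the following. This is a simplified sketch: the Job record and G2ServiceProxy are placeholders standing in for the generic G2 web service proxy, and the real Process method also updates the volunteer's visualisation as it works.

// Simplified sketch of the volunteer-side Process loop. The types below are
// placeholders for illustration, not the actual G2 volunteer base class.
using System;
using System.Threading;

public class Job {
    public Guid Id;
    public string MethodName;
    public object[] Arguments;
}

public class G2ServiceProxy {            // hypothetical proxy to the generic G2 web service
    public string Url;
    public Job GetJob() { /* would invoke the server's job-retrieval web method */ return null; }
    public void ReturnResult(Guid jobId, object[] results) { /* would invoke the server */ }
}

public abstract class VolunteerSketch {
    G2ServiceProxy server = new G2ServiceProxy();
    bool finished = false;

    // derived volunteer classes dispatch to the actual web method implementation
    protected abstract object[] Execute(string methodName, object[] args);

    public void Process(int volunteerId, int jobClassId, string serverUrl) {
        server.Url = serverUrl;
        // run the job loop on a worker thread so the browser's main thread stays responsive
        Thread worker = new Thread(new ThreadStart(ProcessLoop));
        worker.Start();
    }

    void ProcessLoop() {
        while (true) {
            Job job = server.GetJob();              // pull the next job for this class of object
            if (job == null) break;                 // no more jobs of this class are waiting
            object[] results = Execute(job.MethodName, job.Arguments);
            server.ReturnResult(job.Id, results);   // push the results back to the server
        }
        finished = true;                            // polled by the page's CheckFinished()
    }

    public bool Finished() {
        return finished;
    }
}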
Volunteer classes inherit from Windows.Forms.Control and so can implement a graphical user interface that provides a visualization of whatever computation is being performed on that volunteer. The following figures show screen shots of two of our applications (TSP and RayTracing) executing on volunteers.
The demonstration system allows people to volunteer their PCs, and also to download and execute a number of sample client applications. This section briefly describes two of our sample applications, RayTracing and TSP. Figures 10 and 11 show screen shots of the client interfaces.
Our implementation of the travelling salesperson problem uses a parallel genetic algorithm [7]. Informally, the algorithm works as follows. An initial randomly generated population is partitioned and migrated to separate “islands”, where the subpopulations evolve in isolation for some number of generations. The populations then return to the mainland, where they are mixed with individuals from other islands and migrated back to the islands to evolve further. This process continues until the person running the client is happy with the result; it is, of course, not possible to determine when the optimal solution has been reached. The algorithm used is actually a hybrid algorithm [12] that uses a heuristic called 2-opt to augment the random mutation process.
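The island-model structure is sketched below. Population, Evolve and Mix are illustrative placeholders rather than our actual TSP code; the real implementation applies the 2-opt heuristic during mutation and issues the Evolve calls asynchronously, one per volunteer, using the generated Begin/End proxy methods.

// Simplified sketch of the island-model genetic algorithm described above.
using System;
using System.Web.Services;

public class Population {
    public int[][] Tours;    // each individual is a tour: a permutation of the cities
}

public class TspIsland {
    // marked as a web method so that, under G2, it can be executed on volunteers
    [WebMethod]
    public Population Evolve(Population island, int generations) {
        for (int g = 0; g < generations; g++) {
            // selection, crossover and (2-opt augmented) mutation would go here
        }
        return island;
    }
}

class TspDriverSketch {
    // one round of the algorithm: evolve each island, then mix them on the mainland
    static void EvolveRound(TspIsland[] islands, Population[] populations, int generations) {
        for (int i = 0; i < populations.Length; i++)
            populations[i] = islands[i].Evolve(populations[i], generations);
        Mix(populations);    // exchange individuals between islands before the next round
    }

    static void Mix(Population[] populations) {
        // shuffle individuals between the island populations (details omitted)
    }
}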
Figure 13 shows the performance of our system for various numbers of volunteers. The experiment involved solving a randomly generated TSP problem consisting of 200 cities. The initial randomly generated population consisted of 5000 individuals divided into 50 groups of 100. The 2-opt heuristic, being relatively expensive, is applied only randomly to populations, with a probability of 0.01.
Clearly, not all parallel applications will perform well on our system. As with all parallel systems, the speedup obtained depends on the ratio of computation to communication. By varying the number of generations that a population evolves for between migrations, we can vary the computation to communication ratio. As the number of generations increases, the rate of processing generations increases, but the rate of genetic crossover between populations, and therefore the efficiency of the algorithm, decreases. The machines used for the experiments were all 864MHz Pentium IIIs running Windows 2000 Professional, connected by 100Mb/s Ethernet. Separate machines were used for the client and the server.
Our RayTracing application works by partitioning each image into a number of rectangular segments and computing each segment in parallel. The computational core of the system is based on code from the Intel peer-to-peer RayTracing demo [HREF7] written by Bryan Wilkerson. The algorithm is embarrassingly parallel, but still presents a challenge in that the size of the image data returned is high compared to the amount of ray tracing computation required. In this case, increasing the size of the tasks doesn't help greatly, as the size of the image data returned increases linearly with the amount of ray tracing computation required. The computation to communication ratio can, however, be affected by the complexity of the image being ray traced. The number of objects, and their relationship to one another, can affect the amount of ray tracing computation required (see Figure 12).
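The partitioning itself is straightforward; a sketch of the segment decomposition is shown below. The Segment type and the 64-pixel tile size are assumptions made for illustration, and the segment size our application actually uses may differ.

// Illustrative sketch only: divide the image into rectangular segments, each of
// which becomes one parallel job.
using System;
using System.Drawing;

public class Segment {
    public Rectangle Bounds;     // region of the image that this job will render
}

public class RayTracePartitionSketch {
    public static Segment[] Partition(int width, int height, int tile) {
        int cols = (width + tile - 1) / tile;
        int rows = (height + tile - 1) / tile;
        Segment[] segments = new Segment[cols * rows];
        int n = 0;
        for (int y = 0; y < height; y += tile)
            for (int x = 0; x < width; x += tile) {
                Segment s = new Segment();
                s.Bounds = new Rectangle(x, y, Math.Min(tile, width - x), Math.Min(tile, height - y));
                segments[n++] = s;
            }
        return segments;
    }
}

// example: a 640x480 image with 64-pixel tiles yields 10 x 8 = 80 segments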
So, as can be seen with both examples, it’s not simply a question of how many volunteers the system will scale to; as always, it all depends on the granularity of the tasks and the ratio of communication to computation. SETI@Home, for example, scales to thousands of volunteers by generating tasks that take 12 or more hours to complete on a typical PC. We are primarily targeting applications that require far fewer volunteers, but much smaller task sizes (in the order of seconds).
Early Internet-based cycle stealing systems such as GIMPS[HREF8] and SETI@HOME[9] were specific to particular applications. Subsequent general-purpose Internet-based systems have mostly been specific to the Java language. They include the Charlotte[2] and subsequent Knitting Factory[8] projects at NYU and Arizona State, the Javelin[5] and CX[3] projects at UCSB, and Parabon[HREF9]. The Charlotte system was the first to use Java applets embedded in HTML pages to distribute code to volunteers. While these systems internally operate very similarly to ours, they require programmers to create jobs using (in some cases, quite) complex APIs. They also generally require volunteer components to be manually deployed on a web server.
Direct connection between clients and volunteers requires the client to be running a web server (Charlotte) or to rely on RMI over TCP/IP (Knitting Factory). This bypasses the need to channel messages through a server, but all client machines must be directly accessible to the volunteer machines, i.e. not behind a firewall.
Grid systems such as Globus[6] and Legion[11] provide an interface through which distributed computing resources such as supercomputers, scientific instruments and archive data storage can be easily shared and used across organisational boundaries. Our project has a tighter focus than these projects and concentrates on cycle stealing and its associated problems, especially the programming and deployment of such applications.
Our current single server implementation of the middleware is obviously a hindrance to scalability. Future work will focus on investigating alternative network topologies to minimise or eliminate this bottleneck while retaining the positive elements of our current system.