Distributed computing via .NET REMOTING
Modified : June 22 2009

Distributed computing (DC), the process of spreading a complex workload across multiple execution units, is facinating stuff. There are plenty of famous examples of DC around today - SETI@home and Folding@home to name but two - that are being used to solve the sorts of complex calculations that would take impractically long times on a single machine.

img

On a DC 'grid', whenever a volunteers PC is sitting idle for a certain amount of time, the DC applications are able to engage and use that CPUs power. These projects tend to have thousands of subscribers, which means they have thousands of computers at their disposal, thousands of gigahertz of number-crunching muscle... show me a programmer that wouldn't love to have that sort of processing power at their disposal!

folding

Folding@Home Grid Statistics - almost 200,000 active CPUs!

Wikipedia hold a list of distributed computing projects, as well as a much wordier explanation of the principles involved than the one i'm giving here. You may also want to check out Distributed.net, a general-purpose distributed computing project.

Anyway.

Being fascinated by DC and looking for a reasonably large project to help learn C#, some time ago I decided to try and build my own distributed computing framework using a very interesting component of the language : Remoting.

Remoting is next-generation inter-process communication. It allows objects to communicate with each other transparently, whether or not the objects exist inside the same application domain, on the same machine or even on the same continent.

To quote MSDN, ".NET remoting provides an abstract approach to interprocess communication that separates the remotable object from a specific client or server application domain and from a specific mechanism of communication".

In other words, its perfect for hacking some distributed computing tools together without having to worry about writing tedious communication layers yourself. The following diagram shows (roughly) what happens if you use Remoting to execute a function outside of the local application domain.

ClassA cAInstance;
cAInstance.myFunc();

remoting

From the perspective of the local application code, calling myFunc() looks just like calling a regular function. However, the Remoting system pipes that request through a proxy object that serializes data associated with calling the function and sends it to the remote class instance. This is then deserialized, and the function is called on the remote client. The path is then reversed to pass results back to the proxy object. Although this is a simplification, it is the foundation of what my distributed processing system is based upon.

I came up with a fairly simple design that would give me a pluggable distribution framework - applications could be written using a specific interface that could then be dropped into the computing grid and have their calculations spread across multiple machines seamlessly.

dpgdesign

In this design, a given application has to handle 3 operations:

  1. - Task Generation - in which the application will generate a number of self-contained work units, Tasks, such as "generate data for image section 10, 10 -> 30, 30 with input values a, b, c"

  2. - Calculation - where the application must accept a Task as generated in phase 1 and produce some discernable result with it, fulfilling the desired outcome of the task. A Result Fragment will be generated, usually one for every task generated.

  3. - Recombination - where all the Result Fragments are assembled into a coherant whole. The recombination logic is entirely down to the applications own knowledge of how the tasks were generated and how the results were calculated.

This design, combined with .NET Remoting, lets me run each of these operations anywhere I like; commonly, the Task Generation will be run on the machine where the distributed processing is being initiated from (the 'Manager')... the Calculation phase is then farmed out to as many machines as possible on the network. The Recombination phase could either be done on the Manager machine or, if it were very expensive, that phase could also be farmed out.

I created a set of libraries and applications, called the Distributed Processing Grid, or DPG for short, to try this stuff out.

dpglayout

The DPG is formed of two major modules - Hive and Machine. The DPG is based upon a Peer-To-Peer style network, and the Hive is a windows service whos sole purpose is to manage the existance of Machine nodes.

Each Machine node can:

  1. - get the status of the whole grid, by asking the Hive for a list of all connected and available nodes
  2. - accept a distributed task request, as long as the node has the appropriate distributed application installed
  3. - initiate a distributed task out to other nodes in the grid

When the Machine application runs, it connects to the Hive server and registers itself as available for tasks and updates. There are views in the UI for displaying the status of the whole grid (complete with the basic specifications and current load of each Machine node)

In writing DPG, I also wrote a test distributable application - one that creates Mandelbrot fractal zoom movies. Each frame of the movie is generated on as many machines as there are available, and then an .AVI movie file is built during the recombination phase. A demo is embedded below:



Although I wrote the DPG a while ago, it should all still compile and work just fine with the latest .NET Framework.You can download both the compiled build as well as the source code below. Built with .NET 1.1 and VS 2003.

[ DPG Applications & Demo ] [ DPG Source Code ]

To run the pre-built application, you will need to modify the DPG-Config.XML to define which machine will be used as the Hive. This XML file should be the same across all DPG installations. Run the "register hive.bat" batch file to install the Hive service before running any clients.