Up the downstair

…or there and back again…see how far it is

How to construct code that will run on a Grid

Code that will execute on the Grid – Best Practices

• Decomposing code to be efficient on a grid

Possibly the most important aspect of preparing code for execution on the grid is granularity. When you think about a piece of code, it is natural to think of it in distinct sections, and this shapes your object design or, at a finer-grained level, your method decomposition. Often a choice is made to execute certain parts of the code asynchronously, typically on a worker thread, which requires synchronization between the threads and of their input and output data. Programming for a grid is similar, except that the positive and negative effects of this paradigm can be exaggerated: fast code runs faster (strictly, it runs more deterministically on a well-configured system), while code that relies on locks over shared resources can appear to run much worse, because it may be holding up, or be held up by, many more waiting processes.

1. Keep code fragments small
2. Minimize blocking and IO
3. If code is naturally sequential, choose whether to deploy it to the grid as a single unit or to break it down and deploy each fragment separately. Breaking it down can achieve a greater degree of parallelism, but communication is expensive: if any task in a naturally sequential set needs a predecessor’s output as its input, that output has to be passed along by message passing, shared memory, or both.
4. How independent is it? Loop iterations can be good candidates. Take, for example, a European option valued by Monte Carlo simulation. You have some inputs and pre-calculate what you can; then, inside a loop, you draw a random number from a normal distribution, perform a calculation and add the result to a running total; when the loop terminates, you average the running total and discount it. This can be summarised as follows:
• Pre-prepare any data before the loop
• Perform the looped calculations
• When they have completed, perform the summary calculation(s)
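To make those three steps concrete, here is a minimal serial sketch in Python, assuming a plain European call under the usual Black-Scholes (lognormal) model. The function names, parameter values and the closed-form sanity check are illustrative additions, not taken from GridServer or any pricing library.

```python
import math
import random


def price_european_call_serial(spot, strike, rate, vol, expiry, paths, seed=42):
    """Monte Carlo price of a European call, written as one serial loop."""
    # 1. Pre-prepare everything that does not depend on the random draw.
    drift = (rate - 0.5 * vol * vol) * expiry
    diffusion = vol * math.sqrt(expiry)
    discount = math.exp(-rate * expiry)
    rng = random.Random(seed)

    # 2. Looped calculation: simulate a terminal price, add its payoff to a total.
    total = 0.0
    for _ in range(paths):
        z = rng.gauss(0.0, 1.0)  # draw from a standard normal distribution
        terminal = spot * math.exp(drift + diffusion * z)
        total += max(terminal - strike, 0.0)

    # 3. Summary: average the running total and discount it to today.
    return discount * total / paths


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot, strike, rate, vol, expiry):
    """Closed-form price, used here only as a sanity check on the simulation."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * expiry) / (vol * math.sqrt(expiry))
    d2 = d1 - vol * math.sqrt(expiry)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * expiry) * norm_cdf(d2)


if __name__ == "__main__":
    mc = price_european_call_serial(100.0, 100.0, 0.05, 0.2, 1.0, paths=200_000)
    cf = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
    print(f"Monte Carlo: {mc:.4f}   closed form: {cf:.4f}")
```

With spot and strike at 100, a 5% rate, 20% volatility and one year to expiry, the closed-form price is about 10.45, and the simulated price should land within a few standard errors of it.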

This gives us a good example of a serial piece of code that could take advantage of the grid with few drawbacks. A simple grid version could therefore be:
• Pre-prepare as before
• Make a set of asynchronous calls to a grid-based service that houses the body of the loop
• Register a call-back for completion of the calls; when they have all returned, perform the summary calculations
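Below is a minimal sketch of that grid version, under loud assumptions: a local `ThreadPoolExecutor` stands in for the grid, the hypothetical `simulate_chunk` stands in for the remote service hosting the loop body, and each call carries a chunk of paths rather than a single iteration. It is not the GridServer Services API; it only shows the shape: pre-prepare, submit asynchronously, register a call-back, and summarise once every call has returned. In CPython, threads will not actually speed up this CPU-bound loop, so read it for structure, not timing.

```python
import math
import random
import threading
from concurrent.futures import ThreadPoolExecutor


def simulate_chunk(spot, strike, drift, diffusion, paths, seed):
    """Hypothetical remote service: the body of the loop for a chunk of paths.

    Returns (sum of payoffs, number of paths) so the client can aggregate.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(paths):
        terminal = spot * math.exp(drift + diffusion * rng.gauss(0.0, 1.0))
        total += max(terminal - strike, 0.0)
    return total, paths


def price_european_call_distributed(spot, strike, rate, vol, expiry,
                                    paths, chunks, base_seed=42, workers=4):
    if paths < 1 or chunks < 1:
        raise ValueError("paths and chunks must both be at least 1")

    # Pre-prepare exactly as in the serial version.
    drift = (rate - 0.5 * vol * vol) * expiry
    diffusion = vol * math.sqrt(expiry)
    discount = math.exp(-rate * expiry)

    finished = []
    lock = threading.Lock()
    all_done = threading.Event()

    def on_complete(future):
        # Call-back registered on every asynchronous call; runs as each returns.
        with lock:
            finished.append(future)
            if len(finished) == chunks:
                all_done.set()

    with ThreadPoolExecutor(max_workers=workers) as grid:  # stand-in for the grid
        for i in range(chunks):
            n = paths // chunks + (1 if i < paths % chunks else 0)
            # Distinct seed per chunk so chunks do not replay one random stream.
            future = grid.submit(simulate_chunk, spot, strike, drift, diffusion,
                                 n, base_seed + i)
            future.add_done_callback(on_complete)
        all_done.wait()

    # Summary: combine the partial sums once every call has returned.
    partials = [f.result() for f in finished]  # re-raises any failed call
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return discount * total / count


if __name__ == "__main__":
    print(price_european_call_distributed(100.0, 100.0, 0.05, 0.2, 1.0,
                                          paths=200_000, chunks=8))
```

Each chunk gets its own seed (`base_seed + i`). With Python’s Mersenne Twister this makes the streams distinct and reproducible but does not formally guarantee that they never overlap; a generator designed for stream splitting, such as NumPy’s `SeedSequence.spawn`, is a stronger choice for production work.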

It sounds easy, but there are a number of considerations. If the grid has a large number of nodes (so that a large proportion of the simulation can run at once), this is attractive; if not, it is much less so, because we are introducing network traffic and latency for every calculation call and response, whether the client handles the calls synchronously or asynchronously. There are also domain considerations: in this example, we want the random number generators to be seeded so that their sequences do not overlap.
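A back-of-the-envelope model helps decide whether distribution pays. The sketch below assumes identical tasks, one task per node at a time, a fixed per-call overhead (serialization plus round trip) that is not overlapped with computation, and no broker scheduling delay; all names and numbers are hypothetical.

```python
import math


def grid_wall_time(tasks, compute_s, overhead_s, nodes):
    """Rough wall-clock time when identical tasks run in waves across nodes.

    overhead_s is the per-call cost the grid adds: serialization at both
    ends plus the network round trip.
    """
    waves = math.ceil(tasks / nodes)
    return waves * (compute_s + overhead_s)


def grid_speedup(tasks, compute_s, overhead_s, nodes):
    """Serial time divided by estimated grid time (> 1 means the grid wins)."""
    return (tasks * compute_s) / grid_wall_time(tasks, compute_s, overhead_s, nodes)


def best_batch_size(iterations, iter_cost_s, overhead_s, nodes, candidates):
    """Pick how many loop iterations to pack into each remote call."""
    def wall_time(batch):
        tasks = math.ceil(iterations / batch)
        return grid_wall_time(tasks, batch * iter_cost_s, overhead_s, nodes)
    return min(candidates, key=wall_time)


if __name__ == "__main__":
    print(grid_speedup(100, 0.05, 0.01, 100))  # coarse tasks, many nodes
    print(grid_speedup(100, 0.001, 0.01, 4))   # fine tasks, few nodes
    print(best_batch_size(1_000_000, 1e-5, 0.01, 100, [1, 100, 10_000, 1_000_000]))
```

Under these assumptions, 100 tasks of 50 ms with 10 ms of overhead each on 100 nodes give a speed-up of roughly 83, whereas 1 ms tasks on 4 nodes run nearly three times slower than serial. `best_batch_size` shows the granularity trade-off: packing 10,000 iterations of 10 µs into each call beats both one iteration per call and a single giant call.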

Calculating network round-trip time
When deciding how to break up a piece of code, it helps to understand the performance constraints of the underlying network. Get an idea of how often your code will be called and how many resources it consumes, compared with how long it will take to execute remotely. You have to account for the cost of serializing your parameters (at both ends) and then add the network round-trip time. Here are a few ideas:

1. Determine the RTT (round-trip time): periodically run a set of pings from an appropriate machine to a grid broker.
2. Run tracert (traceroute on Unix) against the same broker to see the number of hops (in a perfect world this is 0, and you can co-exist with multicast pub/sub tools).
3. It may be useful to review the network throughput and the IP settings on all machines and routers/switches. The bandwidth-delay product (bandwidth × RTT) tells you how much data must be in flight to keep the link busy, and a single TCP connection cannot exceed roughly window size / RTT. Check that the NICs and switches are set to full duplex (important if some of the hardware is older). Depending on how much influence you have over the client’s network department and which protocol you use, you may also wish to review the TCP congestion window (CWND) and receive window (RWIN) sizes.
4. After making these changes, note the change in network saturation when you run a load test (Task Manager gives a simple view, but a dedicated network monitoring tool is a better bet).
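The following Python sketch gives a few quick calculations to go with those checks. It makes two assumptions: timing a TCP connect to a port the broker listens on is an acceptable RTT proxy (ICMP ping needs raw-socket privileges), and `pickle` stands in for whatever serializer your grid actually uses. The bandwidth-delay product says how much data must be in flight to fill the link; dividing the window size by the RTT gives a rough ceiling on a single TCP connection’s throughput.

```python
import pickle
import socket
import statistics
import time


def tcp_connect_rtt(host, port, samples=5, timeout=2.0):
    """Estimate RTT by timing TCP connection set-up (SYN, SYN-ACK).

    Needs no raw-socket privileges, unlike ICMP ping; includes a little
    kernel overhead, so treat the result as a slight over-estimate.
    """
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            pass  # connection established: one round trip has completed
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)  # the median resists one-off spikes


def bandwidth_delay_product(bandwidth_bits_per_s, rtt_s):
    """Bytes that must be in flight to keep the link fully busy."""
    return bandwidth_bits_per_s * rtt_s / 8


def window_limited_throughput(window_bytes, rtt_s):
    """Approximate ceiling on one TCP connection's throughput, in bits/s."""
    return window_bytes * 8 / rtt_s


def serialization_cost(params, repeats=100):
    """Serialized size in bytes and mean seconds to pickle and unpickle params."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    start = time.perf_counter()
    for _ in range(repeats):
        blob = pickle.dumps(params)
        pickle.loads(blob)
    return len(blob), (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    print(bandwidth_delay_product(1e9, 0.002))            # bytes in flight
    print(window_limited_throughput(65535, 0.002) / 1e6)  # Mbit/s ceiling
    size, secs = serialization_cost({"spot": 100.0, "seeds": list(range(1000))})
    print(f"{size} bytes, {secs * 1e6:.1f} us per pickle/unpickle round")
```

For a 1 Gbit/s link with a 2 ms round trip, about 250,000 bytes must be in flight, while a 65,535-byte window (the maximum without TCP window scaling) caps one connection at roughly 262 Mbit/s. To measure the RTT itself, point `tcp_connect_rtt` at the broker’s host and a port it listens on.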

Where do I get (put) my data?
Data is supplied to a service through its parameters, and results come back as a return value. In the simplest case this is all you need; however, you may need to provide extra data to the service, and there are a few ways of achieving this:
1. DataReferences – a simple lookup mechanism provided by GridServer, obtained via a DataReferenceFactory (see p39 of the Developer’s Guide). A lightweight data accessor is passed around and dereferenced where the data is needed, which sounds slow to me.
2. Service State – see the Stateful Services section. In short, when you register the service you can name methods that push and pop values into the engine ‘memory’; this state is preserved through failover and redistribution to other engine instances.
3. Cache – as I mentioned, the built-in cache tends to be ignored (this is received opinion on my part), but there is nothing stopping you using a separate but similarly deployed cache such as Tangosol Coherence or Gigaspaces. The advantage of this approach is that you can segregate data that can be pre-populated (perhaps from a database) and fetch it during the engine’s initialise phase, before the service is invoked, so the data is local to the machine and its cache.
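A minimal sketch of the cache-plus-initialise idea follows. Everything in it is hypothetical: `SharedCache` stands in for Coherence, Gigaspaces or similar, and `DiscountService.initialise` stands in for whatever engine initialisation hook your grid product provides; none of it is GridServer’s actual service lifecycle API or a real cache client. The point is that reference data is fetched once per engine, in bulk, before invocations start.

```python
import math


class SharedCache:
    """Hypothetical stand-in for a distributed cache (Coherence, Gigaspaces, ...).

    Counts bulk fetches so we can see how often the 'network' is touched.
    """

    def __init__(self, data):
        self._data = dict(data)
        self.fetches = 0

    def get_all(self, keys):
        self.fetches += 1  # one bulk round trip, however many keys
        return {key: self._data[key] for key in keys}


class DiscountService:
    """Hypothetical grid service with a separate initialise step."""

    def __init__(self, cache, curve_keys):
        self._cache = cache
        self._curve_keys = list(curve_keys)
        self._local = None  # engine-local copy, filled in by initialise()

    def initialise(self):
        # Runs once when the engine loads the service, before any invocation:
        # pull the pre-populated reference data into engine memory.
        self._local = self._cache.get_all(self._curve_keys)

    def discount_factor(self, tenor, years):
        # Each invocation reads only local memory; no per-call cache trip.
        if self._local is None:
            raise RuntimeError("initialise() must run before invocation")
        return math.exp(-self._local[tenor] * years)


if __name__ == "__main__":
    cache = SharedCache({"1Y": 0.05, "5Y": 0.045})
    service = DiscountService(cache, ["1Y", "5Y"])
    service.initialise()
    factors = [service.discount_factor("5Y", t) for t in range(1, 1001)]
    print(cache.fetches)  # prints 1: one fetch despite 1000 invocations
```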

February 1, 2007 Posted by delliman | Grid

Let’s talk Grids – DataSynapse

At Lab49 we have been called upon to provide know-how on various grid offerings, such as Platform, GridServer and Digipede, as well as ‘data grid’ systems such as Tangosol Coherence and Gigaspaces. I’ll blog about these and similar tools. I thought I’d kick off with a brief introduction to the DataSynapse GridServer APIs and what they do, mainly because there isn’t a lot out there about DataSynapse.

GridServer APIs: what are they for?

DataSynapse already looks like a product that has evolved: it is written in Java and has a set of APIs designed to run either client side or server side, some of which are designed as replacements for others. It is worth reading the Developer’s Guide, but the following is a list of the APIs and the languages that can be used with each:

1. Tasklet – available in Java and C++. It has been superseded by the Services API, but it is a richer API (though available to fewer client languages) because it includes the Job and Propagator API, which is used for message passing (it will look functionally familiar if you come from an MPI or even PVM background).

2. PDriver – short for Parametric Driver; allows scripts to be executed on the grid.

3. Services – as mentioned, this replaces the Tasklet API. Client side, it is available in Java, .NET (currently 1.1 only), C++ and via web services; server side, in Java, .NET (1.1), C++ and COM.

4. Admin – the grid can be set up to snapshot data to its own database at a configurable interval (it comes supplied with HSQLDB but can be configured to work with just about any database that has a JDBC driver; example configurations exist for MySQL, Oracle, DB2 and SQL Server). All of this data can be accessed through the Admin API, which provides information about Services, Engines, Brokers and Drivers.

5. Cache – designed to provide data locality for executing services. Some people plug in other distributed caches because the native cache doesn’t support transactions; more on this later.

6. Discriminators – not an API as such, but worth a separate mention. Discriminators give you demographic control, i.e. control over where your service runs. This is useful in a heterogeneous grid where some services need to call platform-specific code, such as a pricing library that is only available as a Windows DLL. The DLL can be pre-deployed to a Windows machine acting as a grid node, and the service will then be routed to that node (or nodes) to satisfy calls of this type. This is done simply by setting properties on the service.
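Conceptually, a discriminator acts as a predicate over the properties an engine advertises. The sketch below models only that idea; the property names (`os`, `pricing_dll`) and the exact-match rule are assumptions for illustration, not DataSynapse’s discriminator syntax or matching semantics, for which see the product documentation.

```python
from dataclasses import dataclass, field


@dataclass
class Engine:
    """A grid node and the properties it advertises (names are hypothetical)."""
    name: str
    properties: dict = field(default_factory=dict)


def matches(engine, required):
    """True if the engine advertises every required property with the same value."""
    return all(engine.properties.get(key) == value for key, value in required.items())


def eligible_engines(engines, required):
    """Names of the engines a service with these requirements may be routed to."""
    return [engine.name for engine in engines if matches(engine, required)]


if __name__ == "__main__":
    grid = [
        Engine("win-01", {"os": "windows", "pricing_dll": "installed"}),
        Engine("win-02", {"os": "windows"}),
        Engine("lnx-01", {"os": "linux"}),
    ]
    # Only the Windows node with the pre-deployed pricing DLL qualifies.
    print(eligible_engines(grid, {"os": "windows", "pricing_dll": "installed"}))
```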

I’ll talk more about getting some code going on the client and server, as well as how to code for a grid.

January 31, 2007 Posted by delliman | Grid