Up the downstair

…or there and back again…see how far it is

Making Agile SOA work

Over the past few years the acronym SOA has been associated with a number of promises: building large-scale systems in a rigorous way, ease of integration through standards adoption, separation of process concerns through orchestration tools and declarative rules, even 'programming' by analysts, not to mention breathing new life into legacy systems that can now play in a wider world. Big promises all. With all the hype there have been some casualties along the way. Certainly the road to a successful SOA project is filled with challenges, many of which lie around stakeholder buy-in, applying governance both deep and wide through the life-cycle and across the enterprise, sheer complexity, integration testing, and the old chestnut of what exactly a transaction means in an enterprise setting! As technologists we often focus on the technical challenges of projects of this kind: standards, the methods we use to develop the software, the development of service patterns, and the different ways to formalize the software life-cycle (such as the RUP versus Agile debate for systems of this type).

The problem with promises is that they quickly have to show benefit on the balance sheet. This is especially true of large strategic projects, where investment is often nervously given. Successful large projects only become so because they remain accountable from inception onwards, and to ensure this, rigorous software methods have to be employed: SOA means integration of either new or wrapped legacy systems, so the complexity of the entire system is multiplied by the complexities of all its parts, plus the dependencies between those parts. Indeed the dependency issue can force a counter-agile approach. The key design tenet is to supply self-contained, sentient service components: they wrap the desired functionality and supply both technical and business-level monitoring information so that higher-level services can consume them. A greater level of abstraction often enables the main integration points to grow organically rather than through up-front design. That is not to say this wider-reaching design doesn't or shouldn't occur, far from it, but iterative design feeds into it rather than being the recipient of it. From a technical viewpoint this enables development, and thus testing, to begin earlier, and it allows mapping to happen internally through the abstraction levels, which reduces the need for pan-enterprise standardization. Whilst this does not mean a common meta-model is not required, it does mean the model can grow incrementally and/or can be mapped to and from by the service parts. You can see a common thread of agile design feeding wider designs here. The approach of building services from smaller components is often called Service Component Architecture. So now to governance of the design process: this is the absolute key to success. It is unlikely that all sub-parts of the system have the same deadlines, and each is subject to localized pressures of its own.
At least by breaking service contracts into smaller self-managed parts, a more agile, and thus more accountable, gradual delivery can begin. This isn't to say that the more traditional Service Inventory Analysis is given up, far from it, but logical views of a large complex system are all too often too far removed from reality, and analysis often gets diluted away once the technical designers pragmatically begin work on the system. So the only way to succeed is to iteratively take business process models into your raw service components via test-driven development until you have a full service inventory. This allows individual deliverables to continue. We are starting to see design methods such as RUP/OpenUP adopt more agile practices, and agile methods being applied in a wider enterprise setting. It should also be noted that wider enterprise architecture methods such as TOGAF (The Open Group Architecture Framework) are starting to encompass agile techniques. A common mistake is to assume that agile means ad hoc; it does not. It actually enforces a higher degree of rigour, because accountability is much higher. Various software engineering practices have attempted to increase ROI through re-use, de-coupling, component coherence, abstraction and so on. We have talked about design applications and issues of control, but support tools are always going to be required: not just developer tools but also service discovery tools, so that components can be found, understood and (re)used. All too often the wheel is reinvented, sometimes through ignorance.

So to the future: because there are few formal definitions of what SOA actually is, it will always be a chaotic path for some developments. Whilst governance and standards are vital for any interchange and integration project, a stepwise but formal iterative process, mixed with higher-level views (either top-down or bottom-up), will provide a clearer path because of the frequent reset points. A final example of this is managing transactions in a SOA environment. A failure in a synchronous system usually involves a rollback operation undoing what was done during the operation; within a workflow system we may have operations paused for long periods of time and then found to be in error much later, so one needs to design skeletal compensation statements right at the beginning. I recently designed a system that controlled a number of physical devices, some of which might respond with an error condition hours or even days later. A classic rollback is therefore out of the question, so a set of cleanup or compensation actions has to be set up. All of this can often be identified and delivered within these smaller components, because you get local to the problem domain quite quickly over a few iterations.
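To make the compensation idea concrete, here is a minimal sketch in Java. All names here (CompensationRegistry, the action strings) are hypothetical, not from the system described; the point is only that each completed step registers its own cleanup action, and a late error unwinds them in reverse order instead of attempting a classic rollback.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class CompensationRegistry {
    private final Deque<Runnable> compensations = new ArrayDeque<>();

    // Record how to undo a step once that step has completed.
    public synchronized void register(Runnable compensation) {
        compensations.push(compensation);
    }

    // Invoked when a late error arrives: unwind in reverse order.
    public synchronized void compensate() {
        while (!compensations.isEmpty()) {
            compensations.pop().run();
        }
    }

    public static void main(String[] args) {
        CompensationRegistry registry = new CompensationRegistry();
        StringBuilder log = new StringBuilder();
        registry.register(() -> log.append("release-device-A;"));
        registry.register(() -> log.append("cancel-job-B;"));
        // A device reports failure long after both steps completed:
        registry.compensate();
        System.out.println(log); // newest compensation runs first
    }
}
```

In a real workflow engine the registrations would be persisted, since the error may arrive days later, after a restart; the in-memory deque above is only the skeleton of the idea.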

February 12, 2008 Posted by | SOA | Leave a comment

Optimizing Paint Messages via Conflation on Eclipse RCP/SWT

Recently I have been writing a basket trading application from scratch, on my own, for a client investment bank. It was a requirement for the UI to be written in Eclipse RCP and thus SWT. Because of the nature of the application it gets a lot of messages that require UI updates in terms of price and FX rate changes, and on the back of these changes I do a lot of calculations that themselves require UI updates. I have incorporated a number of optimisation techniques, but one thing has made all the difference. It occurred to me that an easy way to conflate messages (and thus paint messages) was to leverage the Display.asyncExec mechanism. SWT developers will be aware that this is the standard way for background threads to update the UI thread, where all the painting occurs. So it would be a good idea if the final queue of paint messages only had an optimal set of messages. In other words, within a given period of time (between paint events, or an iteration of the message dispatch loop in the message pump), only the 'latest' message for a given UI element is present. In simple terms, if a value that will update a price gets updated 10 times by a background thread in between paint events then we are only interested in the 10th, quite apart from the fact that the blur created by the other 9 would be meaningless to the user. Note that I'm not talking about any enforced time events, simply the time it takes the underlying window paint messages to be processed in the message queue.

 

This is how a normal SWT message from a background thread finds it way to the UI thread :-

 

getRealm().asyncExec(new Runnable()
{
    @Override
    public void run()
    {
        // update your UI here.
    }
});

 

Note – getRealm is just a local helper function of mine for running this nicely alongside the Eclipse data binding objects.

 

The Runnable that gets created here isn't a new thread instance, nor is it called by one. It's simply a placeholder for your code to be executed by the underlying system when it is ready to, and of course by the appropriate thread. It occurred to me that a saving could be made by re-using Runnable objects, something like this:-

 

public class PriceRunnable implements Runnable
{
    private final Object syncLock = new Object();
    private boolean updateLatch = true;
    private IQuote quote; // guarded by syncLock

    public void update(IQuote q)
    {
        synchronized(syncLock) { this.quote = q; }
    }

    @Override
    public void run()
    {
        IQuote latest;
        synchronized(syncLock) { latest = this.quote; }
        // Your UI code goes here, using 'latest'
        setUpdateLatch(true);
    }

    public boolean isUpdateLatchOpen()
    {
        synchronized(syncLock) { return updateLatch; }
    }

    public void setUpdateLatch(boolean updateLatch)
    {
        synchronized(syncLock) { this.updateLatch = updateLatch; }
    }
}

 

Fig. 1.

 

So when we call asyncExec we are going to pass instances of this object rather than create a new Runnable every time. That in itself is not much of a saving; we need some way to decide which Runnable instance to pass. I use an observer pattern to subscribe to events (the aforementioned price updates), so something has to be the entry point for processing those events, and it's that which will decide which Runnable object instance to use. Have a look at this and hopefully it will become clearer. Please note that runnablePriceMap is simply a map from a key (in this case an industry code that uniquely identifies an instrument, which has a number of attributes, one of which is the price) to a runnable. So we need to have a list of these keys (probably from static data): simply enumerate those keys and create an empty PriceRunnable per key ('key' in the example shown).

 

 

public class InstrumentPriceSourceListener implements PriceSourceListener
{
    private final Map<String, PriceRunnable> runnablePriceMap;
    private final Realm realm;
    private final Object syncLock = new Object();

    public InstrumentPriceSourceListener(Map<String, PriceRunnable> runnableMap,
                                         Realm realm)
    {
        super();
        this.runnablePriceMap = runnableMap;
        this.realm = realm;
    }

    @Override
    public void processPriceUpdate(String name, String key, final IQuote quote)
    {
        PriceRunnable runner = runnablePriceMap.get(key);
        synchronized(syncLock) { runner.update(quote); }
        if (runner.isUpdateLatchOpen())
        {
            realm.asyncExec(runner);
            runner.setUpdateLatch(false);
        }
    }
}

 

Fig. 2.

 

So we have an interface (not shown) that has a processPriceUpdate member, this listener will have been registered with the appropriate provider and this effectively is the callback. The interesting part is the 

 

if (runner.isUpdateLatchOpen())
{
    realm.asyncExec(runner);
    runner.setUpdateLatch(false);
}

 

…section. Now to explain it. The listener receives a notification via the processPriceUpdate method; inside here we need to update the UI somehow, but we also want it to conflate. So when a message arrives we look up which runnable to use based on some key information we get in the message, then simply call update() on the runnable (Fig 1). All this does, in a thread-safe way, is update the member variables inside the runnable instance. This is where the latch comes in. A latch is like a gateway: if it's open all can pass; if it's shut, no-one can enter. Note that in Fig 1 the latch is set true on creation (open), so the isUpdateLatchOpen call made by the listener in Fig 2 succeeds, asyncExec is called and the latch is set false, closing the door. Now more and more updates come in, and the runnable object gets updated through the update method, effectively maintaining the latest value for us. Then, when the system can manage it, the run() method (Fig 1) is called on the UI thread, the latest value is used in your 'update the UI' code, and the latch is opened again, enabling further calls to asyncExec.

 

In conclusion, we have saved the creation of lots of Runnables, but more importantly we have controlled the number of objects requesting UI interaction and thus paint messages: we effectively set a 'paint me' flag and update the value (potentially) multiple times until the system can catch up with us.
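For readers who want to see the latch behaviour in isolation, here is a stripped-down, SWT-free sketch of the same idea. A plain Executor stands in for Display.asyncExec, the queue is pumped by hand like a paint cycle, and all names (ConflatingUpdater, lastPainted) are illustrative rather than from the real application.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

public class ConflatingUpdater {
    private final AtomicReference<String> latest = new AtomicReference<>();
    private final AtomicBoolean latchOpen = new AtomicBoolean(true);
    private final Executor uiThread;
    private String painted; // last value the "UI" actually rendered

    public ConflatingUpdater(Executor uiThread) { this.uiThread = uiThread; }

    public void update(String value) {
        latest.set(value);                          // always keep the newest value
        if (latchOpen.compareAndSet(true, false)) { // only queue one runnable at a time
            uiThread.execute(() -> {
                painted = latest.get();             // "paint" the latest value
                latchOpen.set(true);                // reopen for the next burst
            });
        }
    }

    public String lastPainted() { return painted; }

    public static void main(String[] args) {
        // A queue stands in for the UI thread's message pump; running its
        // contents simulates a paint cycle.
        java.util.List<Runnable> queue = new java.util.ArrayList<>();
        ConflatingUpdater u = new ConflatingUpdater(queue::add);
        u.update("99.1");
        u.update("99.2");
        u.update("99.3");             // a burst of updates between paint events
        queue.forEach(Runnable::run); // the deferred "paint" happens exactly once
        System.out.println(queue.size() + " paint(s), value " + u.lastPainted());
    }
}
```

Three updates arrive but only one runnable is queued, and when it runs it paints the last value; this is the conflation the article achieves with the PriceRunnable latch.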

 

This code was written using Java 6 update 4 on Eclipse 3.3.1.1


February 7, 2008 Posted by | Eclipse RCP | Leave a comment

SOA method melee

Much of the talk about SOA revolves around undelivered promises or companies reticent to commit. Here I present a way of working that allows a layered approach for those who need to at least assess the risk that may lie ahead. One of the greatest challenges facing SOA adoption is what to do first. A recent report suggested that the majority of companies that thought about SOA adoption simply weren't moving yet because they want to watch the market. This is largely because, with the best will in the world, people don't know what to look for: we hear about bottom-up, top-down, WS-*, registries and so on and so forth. In actual fact SOA itself does not have a formal definition; it's interesting asking different people what they think, and often each assumption has a valid counter. The more complex SOA projects are often strategic, pan-enterprise, and involve issues of geography and time. The only way to begin that SOA process is to get enough information at the top level to convince your CIO to invest, and to do that you need enough understanding of each prospective part, even at a high level.

This brings us to how you go about the process: what method do we use? This is usually when the wars begin! I have found from experience (not just in SOA, but going right back to CASE days) that one always has to pull bits from different methods and make them work for you. A common mistake is to underestimate complexity in any sizeable integration problem, as that is all SOA really is: integration, and maybe control processes over that. Most development methods were meant for a single project, or at least for situations where you have the same level of control over all aspects of the development. The first problem I always encounter is the lack of a common meaning, or even agreement on the meaning, of terms; the truth is you need to begin with a taxonomy at the same time as you start service discovery and defining services, which is often people's starting point. This is totally compliant with Design by Contract.

Often when left to a bottom-up approach this facet is missing and is discovered too late. This stage is your metamodel, and it can be done incrementally. It does not have to be a full-blown MOF-compliant design, but it pays in spades to have a common description of terms. I like to do this step in an agile way, informally identifying legacy systems to wrap or new services using the Zachman Framework iteratively (going down in layers through design iterations), as this is an excellent communication tool when it comes to reporting your initial findings back to the stakeholders. It's important to have four things beginning at this point: as mentioned, your early service discovery; your metamodel description (so each system can agree on types or how to transform them); developing a CIM with the business analysts; and, most importantly, your error conditions. At this stage it's vital to identify your exception conditions, as asynchronous interconnects and transactions do not work like their synchronous relatives. One needs to design skeletal compensation statements right at the beginning. Let me give you an example: I recently designed a system that controlled a number of physical devices, some of which might respond with an error condition hours or even days later. A classic rollback is therefore out of the question, so a set of cleanup or compensation actions has to be set up.
All of this can often be done quite quickly over a few iterations and presented in a container, which then serves the rest of the lifecycle, although you may wish to choose a deeper method later. For example, I would happily use an XP process to define the initial services, which are held in and communicated through Zachman, then within each service move into some MDA techniques such as defining a PIM and feeding this iteratively into your effort to get the meta model off the ground (MOF). So within Zachman you can define what, where, how, who, when and why, iteratively, through layers of abstraction and detail, with a common vocabulary. Out of this come your highest-level service definitions using a common typing system, but that's the easy bit, because two important aspects can be added at this time: your SLA definitions and your compensation procedures for asynchronous error conditions, and then later your KPIs. So from a few simple iterations you have a framework that you can apply level by level, per iteration, to identify services, semantic types, SLAs and error handling. The next iteration might define a security model, for example. It can be seen that I have taken only those parts of various methods that suit the deliverable at hand; one needs to be able to treat an integration project this way, i.e. with agility, because many aspects of your system can and will change in time. Agility is one of SOA's promises, and it will not happen unless one can deliver in an agile way.

 

So where is RUP or Agile method X? You will notice that the above draws on EA and MDA techniques. The downside is there may be a learning curve for some, but most EA/EAI practitioners will cope; some care needs to be exerted, as this doesn't follow any one method's process flow. That is largely because such a flow doesn't exist for SOA, and it doesn't exist because every part or subsystem can be entirely different. Most methods don't fit the distributed nature of a large SOA because, as previously stated, they were designed for single projects; but there is certainly a place for these methods per service definition, and indeed different subsystems may require different methods. It all depends.

 

Depends on what? Governance. This is the ability to control any aspect of the system: its design, its deployment, managing it when it's up and running, upgrades, everything. Large SOA projects are almost never centrally controlled, but this has to be your aim; in short, the project will at worst fail, or at best be compromised, if this is not the case. Governance is the key: establish it at the outset and constantly work on it.

 

This brings us to tool support. This is the area, not surprisingly, that is most lacking; there are many tools one might need, from packet sniffers to workflow authoring tools.

 

Looking forward to this coming year, much talk is about cloud computing and virtualization. Whilst these are interesting subjects, they are more deployment and runtime governance issues and don't really affect (too much) your design-time choices.

 

February 5, 2008 Posted by | SOA, Uncategorized | Leave a comment

How to construct code that will run on a Grid

Code that will execute on the Grid – Best Practices

• Decomposing code to be efficient on a grid

Possibly the most important aspect of preparing code for execution on the grid is granularity. When you think of a piece of code it is normal to think of it in distinct sections, which naturally leads you to factor this into your object design or method decomposition at a finer-grained level. Often a choice is made to execute certain parts of the code asynchronously, often by a worker thread, where synchronization between threads and input and output data is required. This is similar to programming for a grid, except that the positive and negative effects of such a programming paradigm can be exaggerated on a grid; i.e. the fast code runs faster (in fact it simply runs more deterministically on a well-configured system) and code that uses such things as locks on shared resources can appear to run much worse (as it is potentially holding up, or being held up by, more waiting processes).

1. Keep code fragments small
2. Minimize blocking and IO
3. If code is naturally sequential, choose whether you wish to deploy it as a unit to the grid or break it down and deploy each code fragment to the grid. The advantage of the latter is that one achieves a greater degree of parallelism, but the cost of communication is high; if a set of naturally sequential tasks needs any of its predecessors' output as input, this will have to be supplied either by message passing or shared memory (or both).
4. How independent is it? A loop iteration can be a good candidate. Take for example a European option that is to be valued via a Monte Carlo simulation: chances are you have some inputs; you pre-calculate what you can; then, inside a loop, you draw from a random number generator with a normal distribution, perform a calculation and add to a running total; when the loop terminates you take an average of the running total and discount it. This could be summarised as follows:
• Pre-prepare any data prior to loop
• Perform looped calculations
• When it’s completed perform summary calculation(s)

This gives us a good example of a serial piece of code that could take advantage of the grid with few drawbacks. A simple case could therefore be:
• Pre-prepare as before
• Perform a set of asynchronous calls to a grid based service that houses the body of the loop
• Register a call-back for completion of the calls, when they have all returned perform summary calculations

Sounds easy, but there are a number of considerations. If there are a large number of nodes in the grid (such that a large proportion of the simulation can be run at once) then this is attractive; if not then it's much less so, because we are obviously introducing network traffic and latency for each calculation call and response (whether we handle the calls synchronously or asynchronously at the client). There are also domain considerations: in this example we would like the random number generators to be seeded such that their streams do not overlap.
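The decomposition above can be sketched locally in Java, with an ExecutorService standing in for the grid service: the loop body is farmed out in chunks, each chunk gets its own seed so the random streams do not overlap, and the summary (average and discount) runs once all chunks return. All figures and names here are illustrative, not from any real pricing library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class MonteCarloSketch {
    // The "loop body" that would be deployed to a grid node: value a chunk
    // of paths and return a partial sum of payoffs.
    static double chunkSum(long seed, int paths) {
        Random rng = new Random(seed); // distinct seed per chunk
        double sum = 0;
        for (int i = 0; i < paths; i++) {
            // toy lognormal terminal price, spot 100, strike 100
            double payoff = Math.max(0, 100 * Math.exp(0.2 * rng.nextGaussian()) - 100);
            sum += payoff;
        }
        return sum;
    }

    public static void main(String[] args) throws Exception {
        int chunks = 4, pathsPerChunk = 25_000;
        ExecutorService grid = Executors.newFixedThreadPool(chunks); // grid stand-in
        List<Future<Double>> partials = new ArrayList<>();
        for (int c = 0; c < chunks; c++) {
            final long seed = c;
            partials.add(grid.submit(() -> chunkSum(seed, pathsPerChunk)));
        }
        double total = 0;
        for (Future<Double> f : partials) total += f.get(); // "all returned" point
        double price = Math.exp(-0.05) * total / (chunks * pathsPerChunk); // discount
        grid.shutdown();
        System.out.println("discounted average = " + price);
    }
}
```

On a real grid each submit becomes a remote service call and the blocking gets would become a completion callback, but the shape of the decomposition (pre-prepare, parallel loop body, summary) is the same.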

Calculating network round-trip time
In deciding how you will break up a piece of code it is often useful to get an idea of the performance constraints of the underlying network. It's useful to know how often your code will be called, and how much it consumes in resources versus how long it will take to execute remotely. You have to consider the cost of serialization as you ship your parameters (at either end) and add network round-trip time. Here are a few ideas:-

1. Determine the RTT (round-trip time) – do a set of pings periodically from an appropriate machine to a grid broker.
2. Perform a tracert along the same route to view the number of hops (in a perfect world this is 0 and you can co-exist with multicast pub/sub tools).
3. It may be useful to review the network throughput and the IP settings on all machines and routers/switches (the bandwidth-delay product = bandwidth * RTT), and check that the NICs and switches are set to full duplex (important if some of the hardware is older). You may wish to review the CWIN and RWIN sizes depending on how much influence you have over the client's network department and which protocol you use.
4. You should note the change in network saturation when you perform a load test after you have done this (Task Manager provides a simple view of this, but a network monitoring tool is a better bet).
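Step 1 can be automated with a few lines of Java. This is a sketch only: InetAddress.isReachable uses ICMP where permitted and falls back to a TCP echo probe otherwise, so treat the numbers as indicative; the broker hostname would be whatever your environment uses.

```java
import java.net.InetAddress;

public class RttProbe {
    // Sample the round trip to a host a few times and average the successes.
    // Returns -1 if the host was never reached within the timeout.
    static long averageRttMillis(String host, int samples) throws Exception {
        InetAddress addr = InetAddress.getByName(host);
        long total = 0;
        int reached = 0;
        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            if (addr.isReachable(2000)) { // 2s timeout per probe
                total += (System.nanoTime() - start) / 1_000_000;
                reached++;
            }
        }
        return reached == 0 ? -1 : total / reached;
    }

    public static void main(String[] args) throws Exception {
        // "localhost" is a placeholder; point this at your grid broker.
        System.out.println("avg RTT (ms): " + averageRttMillis("localhost", 3));
    }
}
```

Running this periodically and logging the results gives you the RTT baseline against which to judge whether a given code fragment is worth shipping to the grid at all.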

Where do I get (put) my data?
It is obvious that data is supplied through parameters and a return value can be obtained. In the simplest case this is all you need; however, you may need to provide extra data to the service, and there are a couple of methods of achieving this:-
1. DataReferences – this is a simple lookup mechanism provided by GridServer and is obtained via a DataReferenceFactory; please see p39 of the developers guide. A simple data accessor is passed around which can then be dereferenced; sounds slow to me.
2. Service State – please see the Stateful Services section. In short, data can be associated via named methods (when you register the service) that can be used to push and pop values into the engine 'memory'; this is supported through failover and redistribution to other engine instances.
3. You can just use a cache – I mentioned that the built-in cache tends to be ignored (this is received opinion for me), but there's nothing stopping you utilising a separate but similarly deployed cache mechanism such as Tangosol Coherence, Gigaspaces etc. The advantage of this approach is that you can segregate areas that can be pre-populated (maybe from a database) and fetch the data in the engine initialise phase, before the service invocation, so the data is machine/cache local.
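Option 3 can be sketched without any particular cache product: a machine-local map populated once during the engine's initialise phase, so service invocations hit local memory. The names here (EngineLocalCache, loadFromDatabase, the curve key) are illustrative stand-ins, not GridServer or Coherence API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class EngineLocalCache {
    private static final Map<String, double[]> CURVES = new ConcurrentHashMap<>();

    // Called once per engine, before any service invocation.
    static void initialise() {
        CURVES.put("USD-LIBOR", loadFromDatabase("USD-LIBOR"));
    }

    // Stand-in for the static-data fetch (database, file, etc.).
    static double[] loadFromDatabase(String key) {
        return new double[] {0.05, 0.051, 0.052};
    }

    // The service body just reads machine-local data, no network hop.
    static double firstPoint(String key) {
        return CURVES.get(key)[0];
    }

    public static void main(String[] args) {
        initialise();
        System.out.println(firstPoint("USD-LIBOR")); // prints 0.05
    }
}
```

A distributed cache product adds failover and cross-engine coherence on top of this, but the pay-off is the same: the data travels once at initialise time rather than with every call.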

February 1, 2007 Posted by | Grid | 1 Comment

Let’s talk Grids – Datasynapse

At Lab49 we have been called upon to provide know-how on various grid offerings such as Platform, GridServer, Digipede and so on, as well as 'datagrid' systems such as Tangosol Coherence and Gigaspaces. I'll blog about these and other similar tools. I thought I'd kick off with a brief introduction to the DataSynapse GridServer APIs and, briefly, what they do, mainly because there isn't a lot out there about DataSynapse.

GridServer APIs – what are they for?

DataSynapse looks like it's already an evolutionary product. It is written in Java and has a set of APIs that are designed to run either client or server side; some are designed as replacements for others. It is worth reading the developer guide, but following is a list of the APIs and the languages that can be used with each.

1. Tasklet – available in Java and C++. This has been superseded by the Services API; however, it is a richer API (though not available to as many client languages) as it includes the Job and Propagator APIs, which are used for message passing (it will look functionally familiar if you come from an MPI or even PVM background).

2. PDriver – stands for Parametric Driver – allows for scripts to be executed on the grid.

3. Services – as mentioned this is a replacement for the Tasklet API. It is available client side – Java, .Net (1.1 only currently), C++ and web services. Server side – Java, .Net (1.1) , C++ and COM.

4. Admin – the grid can be set up to snapshot data to its own database at a configurable interval (it comes supplied with HSQLDB but can be configured to work with just about any DB that has a JDBC driver; example configs exist for MySQL, Oracle, DB2 and SQLServer). All this data can be accessed through the Admin API, which enables information to be obtained for services, engines, brokers and drivers.

5. Cache – designed to facilitate data locality for executing services; some people plug in other distributed caches as the native one doesn't support transactions – more later.

6. Discriminators – not an API as such, but worth a distinct mention. Discriminators allow demographic control, i.e. control over where your service runs. This can be useful if you have a heterogeneous grid where some services need to call out to platform-specific code, such as a pricing library which is only available as a Windows DLL: the DLL can be pre-deployed to a Windows machine that will act as a grid node, and the service will then locate to this node (or these nodes) to satisfy calls of this type. This is simply done through setting properties in the service.

I'll talk more about getting some code going, client and server, as well as how to code for a grid.

January 31, 2007 Posted by | Grid | 3 Comments

Hello PopPickers!

As I seem to be one of the few that doesn’t blog I thought I’d change that forthwith!

I work as a Senior Consultant for http://www.lab49.com within the financial derivatives world. Our work takes us from UI frameworks such as CAB on .Net as well as Eclipse RCP on Java through to .Net services using EntLib and various JEE systems. I’m even doing some T-SQL tuning work right now.

I’ll blog when I’ve got something interesting to say…  🙂

January 31, 2007 Posted by | Uncategorized | 1 Comment