Getting asynchronous web services to work
July 6, 2005 10:44 AM • 1 comment
Clemens Vasters gave a great presentation at TechEd Europe last week describing how and when to use asynchronous messaging. Synchronous calls, which represent direct method calls, are easy to make and easy to understand -- you get to find out easily if the call succeeded, you don't have to deal with threading issues and, crucially, a set of related operations can follow each other in a single method.
Asynchronous programming, on the other hand, is tough. You have to send a message off somewhere and wait until the server is ready to respond. You don't immediately know whether a call was successful, or even whether it's been attempted. What might be a simple set of operations when you're programming synchronously is now controlled by a number of callbacks, which probably don't operate on the same thread and can't share state through local variables.
So why bother? Scalability, for one thing. Synchronous calls work easily and perform just fine when you have enough resources to deal with the traffic you're experiencing. When usage spikes, things don't work quite so well. If you're lucky, your server can handle it and all you experience is a slowdown. At some point, however, you'll run out of memory, the database will time out or clients will simply give up on you and assume the call has failed.
Once you stop guaranteeing an immediate response, forcing clients to make an asynchronous call, life becomes easier. You can process requests at a steadier pace within the limits of what your hardware can reliably handle, grabbing the next request from a queue whenever you're ready.
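To make that concrete, here is a minimal sketch of the pattern in Python, using only the standard library. It is an in-memory illustration, not a durable queue: the names `pending`, `accept`, `handle` and `worker` are made up for this example, and `handle` stands in for whatever real work the service does.

```python
import queue
import threading

# Hypothetical names: nothing here belongs to a particular framework.
pending = queue.Queue()   # requests waiting to be processed
results = []              # what the worker has finished so far

def handle(request):
    # Stand-in for the real work (database writes, downstream calls, ...).
    return request.upper()

def accept(request):
    # The front end only enqueues; it never waits for the work to finish.
    pending.put(request)

def worker():
    # Pull one request at a time, at whatever pace the hardware allows.
    while True:
        request = pending.get()      # blocks until something is available
        if request is None:          # sentinel value: shut down
            break
        results.append(handle(request))

thread = threading.Thread(target=worker)
thread.start()
for request in ["order-1", "order-2", "order-3"]:
    accept(request)
pending.put(None)   # ask the worker to stop once the backlog is drained
thread.join()
```

A spike in traffic makes the queue longer rather than making the worker fall over; the cost is that callers no longer learn the outcome straight away.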
As one of his slides made very clear, Clemens loves MSMQ. Message Queuing is available on all modern versions of Windows, is fast (enough) and uses a simple programming model. The most important features it introduces, though, are reliability and transactions. Used properly, it lets you submit a message to a queue happy in the knowledge that it will stay there until it's ready to be processed. Similarly, you can read an item from the queue within a transaction: if a problem occurs, you can roll back the transaction and the item will be ready and waiting whenever the problem is resolved.
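The transactional read is the part that is easiest to get wrong, so here is a toy model of the idea in Python. This is not the MSMQ API: `ToyQueue`, `transactional_receive` and `process` are hypothetical names, and the queue lives in memory, so it shows only the roll-back behaviour, not durability.

```python
from collections import deque
from contextlib import contextmanager

class ToyQueue:
    """In-memory stand-in for a transactional queue (illustration only)."""

    def __init__(self):
        self._items = deque()

    def send(self, message):
        self._items.append(message)

    @contextmanager
    def transactional_receive(self):
        # Take the message off the queue for the duration of the block.
        message = self._items.popleft()
        try:
            yield message
        except Exception:
            # "Roll back": put the message back at the front, then re-raise.
            self._items.appendleft(message)
            raise

    def __len__(self):
        return len(self._items)

def process(message, fail):
    # Hypothetical handler; `fail` simulates a temporary outage.
    if fail:
        raise RuntimeError("database unavailable")
    return f"processed {message}"

q = ToyQueue()
q.send("invoice-42")

try:
    with q.transactional_receive() as message:
        process(message, fail=True)
except RuntimeError:
    pass
after_failure = len(q)    # the message is still waiting

with q.transactional_receive() as message:
    outcome = process(message, fail=False)
after_success = len(q)    # the message has been consumed
```

The message only leaves the queue for good once the processing block finishes without raising.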
So asynchronous programming is great, message queuing is a great way to achieve it, and that's what we should all use to build our distributed apps these days, right? Well, unfortunately it's not that easy. While MSMQ works great within an enterprise, it's not an option if you want an interoperable service available over the web.
Part of the solution can be to provide a SOAP endpoint and immediately push incoming requests onto a queue. This seems a pretty good approach, but what happens if the web service call fails? Benjamin Mitchell posted last year (!) about a similar talk given by Clemens at TechEd 2004, where it seems the idea was that any exception thrown by the WS proxy could be taken as an indicator that the call had not succeeded. By using queues at both ends (i.e. on the client and server sides), failed calls could simply be made later.
I didn't see that talk so I might be missing something, but that brief description just prompts more questions. The trouble is that a failure could occur after the request has been added to the server-side queue. Maybe IIS runs out of memory while it's generating the SOAP response. Maybe there's a power failure, maybe a router dies. The client will presume the entire call has failed and retry later, so the server ends up with the same request twice, which could well cause problems.
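A small simulation shows how the duplicate comes about. This is a sketch of the scenario as I understand it, not of anything shown in the talk: `outbox`, `server_queue`, `call_service` and `ResponseLost` are hypothetical names, and the "network failure" is just a flag.

```python
from collections import deque

outbox = deque()         # stand-in for the client-side queue
server_queue = deque()   # stand-in for the server-side queue

class ResponseLost(Exception):
    """The request reached the server, but the reply never came back."""

def call_service(message, lose_response):
    # The server enqueues the request first...
    server_queue.append(message)
    # ...and only then does something go wrong on the way back.
    if lose_response:
        raise ResponseLost(message)
    return "accepted"

def flush_outbox(lose_response):
    # Try each message currently in the outbox exactly once.
    for _ in range(len(outbox)):
        message = outbox.popleft()
        try:
            call_service(message, lose_response)
        except ResponseLost:
            # The client sees an exception, assumes failure, and requeues.
            outbox.append(message)

outbox.append("debit account 17 by 100")
flush_outbox(lose_response=True)    # first attempt: the reply is lost
flush_outbox(lose_response=False)   # the retry goes through cleanly

duplicates = list(server_queue)     # the same request, twice
```

From the client's point of view everything worked out: one failure, one successful retry. The server, however, now holds two copies of the debit.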
WS-ReliableMessaging is supposed to tackle some of these problems but, at least in the .NET world, isn't yet available (it's supposed to arrive with WSE 3.0 later this year). An idempotent web service, one where repeating a request has the same effect as making it once and which is a core part of SOA and REST, might do the trick as well. If duplicate requests can be dealt with at the application layer, we might be able to ignore the unreliability of the web services infrastructure, just as TCP is able to work around the unreliability of the lower layers of the network stack.
Whatever solutions are available, this kind of messaging isn't easy today. Indigo will make transport layer reliability easier, but that's only part of the picture. As always, if you want to build scalable, reliable systems, you're going to have to find ways of dealing with failure and responding in a manner that suits your business.
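One common way to get idempotency at the application layer is to have the client attach a unique ID to each logical request and have the server remember which IDs it has already handled. The sketch below assumes exactly that; `handle_debit`, `processed` and `balance` are hypothetical names, and the state lives in plain dictionaries rather than a database.

```python
import uuid

processed = {}                    # message ID -> result already returned
balance = {"account-17": 500}     # toy application state

def handle_debit(message_id, account, amount):
    # A repeated ID means a retry: return the earlier result, change nothing.
    if message_id in processed:
        return processed[message_id]
    balance[account] -= amount
    result = f"debited {amount} from {account}"
    processed[message_id] = result
    return result

# The client generates the ID once and reuses it for every retry.
message_id = str(uuid.uuid4())
first = handle_debit(message_id, "account-17", 100)
retry = handle_debit(message_id, "account-17", 100)   # duplicate delivery
```

In a real service the record of processed IDs and the change to the balance would need to be committed together, otherwise a crash between the two steps brings the original problem back.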
Comments
Hey Michael! Really interesting post, I think I should really start reading about all this WS-* stuff...
Posted by: Thomas Brunner at July 21, 2005 02:53 PM