Gilbert Pilz's Blog
Gilbert Pilz is a senior principal technologist within BEA's office
of the CTO. Gilbert has over twenty years of software architecture and
development experience across a variety of technologies and platforms
including distributed systems infrastructures (Web Services, CORBA, DCE,
NCS, and Banyan VINES), security systems (SAML, OpenSSL, IBM Tivoli
Access Manager, JCE, etc.), and the UNIX operating system (SysV, BSD).
Replay Reconsidered
Posted by gpilz on April 25, 2008 at 9:35 AM
WS-ReliableMessaging describes a protocol that allows SOAP messages to be delivered reliably between distributed applications in the presence of software component, system, or network failures. One issue that has long bedeviled WS-RM is how to support reliable responses to so-called "anonymous clients". The OASIS WS-RX Technical Committee created the WS-MakeConnection specification to deal with this issue. Another, alternate solution is the use of the "replay model". This article describes the technical defects of this model.
It is assumed that readers of this article are familiar with the basic principles and operation of the WS-ReliableMessaging protocol. If you are less than familiar with WS-RM, this Wikipedia entry is a good place to get started.
Core Dilemma
The core dilemma behind this issue is that "anonymous clients" (I prefer the term "non-addressable clients" because I don't like to conflate the concepts of addressability with those of identity) can only communicate synchronously yet WS-RM, by its nature, potentially renders all communications asynchronous. Uh huh. Let's break that down a bit.
Non-addressable clients are hosted on computers that, for reasons of network topology (e.g. NATs), security (e.g. firewalls), or whatever, cannot accept connections from systems outside their network. Although you can't connect to these machines from the outside, they themselves can create outbound connections. SOAP supports non-addressable clients by leveraging HTTP to take advantage of this fact. Non-addressable SOAP clients create an outbound connection to a server, send the request message over this connection, then read the corresponding response from that same connection (this response channel is sometimes referred to as "the HTTP back-channel"). This is why non-addressable clients operate synchronously. They have to use the connection they created to read the server's response because, by definition, it is impossible for the server to connect to them and send the response (as would happen in an asynchronous exchange). For readers accustomed to thinking in terms of synchronous communication this all seems par for the course, but wait, there's more.
WS-RM is built on the concepts of acknowledgements and retransmissions. One node (client, server, whatever) sends a message to another and waits for an acknowledgement. If it doesn't receive one it assumes the message didn't get through and sends it again. So, regardless of when you think you are going to receive a message and which connection you think you are going to receive that message over, something may go wrong (the connection might break) and WS-RM will retransmit the message at a later time over a different connection. This doesn't present a problem for non-addressable clients on the request side (where they control the creation of new connections) but it is a problem on the response side. Suppose you are a server in the process of sending a reliable response to a non-addressable client and the connection goes down. Obviously you are never going to get an acknowledgement for that response message so, as a WS-RM node, it is your responsibility to resend it. But how are you supposed to do that? You can't connect to the client and re-send the response because the client is not addressable.
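The acknowledge-or-retransmit loop described above can be sketched in a few lines of Python. This is a toy model, not the WS-RM wire protocol: the ReliableSender class, the transmit callback, and the flaky_transport function are all hypothetical names invented for illustration, and real implementations deal with sequences, timers, and acknowledgement ranges that are omitted here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ReliableSender:
    """Toy sender that keeps every message until it is acknowledged."""
    # Hypothetical transport hook: returns True if an acknowledgement came back.
    transmit: Callable[[int, str], bool]
    unacked: Dict[int, str] = field(default_factory=dict)
    next_number: int = 1

    def send(self, payload: str) -> int:
        number = self.next_number
        self.next_number += 1
        self.unacked[number] = payload      # store before the first attempt
        self._attempt(number)
        return number

    def _attempt(self, number: int) -> None:
        if self.transmit(number, self.unacked[number]):
            del self.unacked[number]        # acknowledged, safe to forget

    def retransmit_pending(self) -> None:
        # A retry may travel over a different connection than the original.
        for number in list(self.unacked):
            self._attempt(number)


attempts: List[int] = []


def flaky_transport(number: int, payload: str) -> bool:
    """Assumed failure model: the first attempt is lost, later ones are acked."""
    attempts.append(number)
    return len(attempts) > 1


sender = ReliableSender(transmit=flaky_transport)
sender.send("hello")
pending_after_first_try = dict(sender.unacked)   # message 1 is still held
sender.retransmit_pending()                      # second attempt gets through
```

Note that the sketch works only because the sender is free to call transmit again whenever it likes. A server holding an unacknowledged response for a non-addressable client has no such freedom, which is exactly the problem.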
Replay Redux
As I said earlier, the OASIS WS-RX Technical Committee created the WS-MakeConnection specification as a means of addressing this problem. WS-MakeConnection is a very important piece of technology as I will explain in a later article. Another solution that predates the work of the WS-RX TC is the use of "replays". The best description of the replay model is this whitepaper by WSO2. Although that whitepaper describes the use of replay in the context of WS-RM 1.0, some implementations (most notably Microsoft® Windows Communication Foundation (WCF)) have extended this solution to include WS-RM 1.1. Replay takes advantage of the fact that non-addressable clients can create new outbound connections and uses the retransmission of a (possibly acknowledged) request to solicit the retransmission of the corresponding response. On the surface this seems like a reasonable approach but, as I will show, there are a number of serious technical issues around its implementation and use.
Abstraction Layer Violations
One of the most serious issues with the implementation of the replay model is that it requires the RMS to be aware of the message exchange pattern of the messages it processes. To understand why this is so we need to review the normal processing sequence for an RMS. An RMS receives a message from the higher-level Application Source (AS). The RMS then transmits the request message to the RMD. Since the RMS is responsible for re-transmitting the request message it must store that message (in memory and/or on disk) until it receives an acknowledgment from the RMD. When the acknowledgment is received the RMS can "forget" about the message. Not so when replay is in effect. Because the replay model uses request messages as a prompt for lost response messages, the RMS must store requests until the corresponding response has been received, even after the request itself has been acknowledged. But wait, what if there is no response message? What if the request message is the sole message in a one-way exchange? We obviously can't have the RMS storing these one-way messages forever, so the RMS needs to know whether the message it is processing is part of a request-response exchange or a one-way message.
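To make the difference in retention rules concrete, here is a rough sketch of an RMS message store with and without replay. Everything in it is an assumption made for illustration: SourceStore, StoredRequest, and the expects_response flag are hypothetical names, not part of WS-RM or of any particular stack. The point to notice is that the expects_response flag has to come from somewhere, and that somewhere is knowledge of the exchange pattern.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class StoredRequest:
    payload: str
    expects_response: bool          # exchange-pattern knowledge forced on the RMS
    acked: bool = False
    response_received: bool = False


@dataclass
class SourceStore:
    """Hypothetical RMS message store; replay=False models plain WS-RM."""
    replay: bool
    messages: Dict[int, StoredRequest] = field(default_factory=dict)

    def add(self, number: int, payload: str, expects_response: bool) -> None:
        self.messages[number] = StoredRequest(payload, expects_response)

    def on_ack(self, number: int) -> None:
        msg = self.messages.get(number)
        if msg is not None:
            msg.acked = True
            self._maybe_forget(number)

    def on_response(self, number: int) -> None:
        msg = self.messages.get(number)
        if msg is not None:          # already forgotten in plain WS-RM mode
            msg.response_received = True
            self._maybe_forget(number)

    def _maybe_forget(self, number: int) -> None:
        msg = self.messages[number]
        if not msg.acked:
            return
        # Plain WS-RM: the ack alone is enough. Under replay, a request that
        # expects a response must be kept as a possible prompt for that response.
        if self.replay and msg.expects_response and not msg.response_received:
            return
        del self.messages[number]


plain = SourceStore(replay=False)
plain.add(1, "getQuote", expects_response=True)
plain.on_ack(1)                               # forgotten as soon as it is acked

replaying = SourceStore(replay=True)
replaying.add(1, "getQuote", expects_response=True)
replaying.add(2, "logEvent", expects_response=False)
replaying.on_ack(1)
replaying.on_ack(2)
held_after_acks = sorted(replaying.messages)  # only the request-response message
replaying.on_response(1)                      # now it can finally be dropped
```

If the flag is wrong in one direction the store leaks one-way messages forever; if it is wrong in the other, the request needed to solicit a lost response is gone.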
OK, why is this such a big deal? To understand why this is an issue we need to think about the basic architecture of SOAP and the composability of web service specifications. One of SOAP's big claims is that you can add additional facilities (like reliability) in a way that is transparent to both the application and to any other facilities. Underlying this assertion is the notion that most SOAP stacks will implement some form of the chain of responsibility pattern. This means that the only parts of the SOAP processing pipeline that should be aware of the exchange pattern being used are the initiator and the ultimate receiver. Requiring the handler that implements WS-RM to know the exchange pattern in effect for the messages it handles runs counter to this entire architecture. Does that mean you couldn't hack around this problem in some way? Of course you could! But these kinds of hacks are likely to work only in specific instances (e.g. when the WS-RM processor and the initiator share the same process space, etc.) and will, ultimately, lead to a SOAP stack that is buggy and fragile (or should I say "buggier and more fragile"?).
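For readers who have not looked inside a SOAP stack, the handler pipeline idea can be sketched as follows. This is a simplified illustration in the spirit of the chain of responsibility pattern, not the handler API of any real stack; build_chain, reliability_handler, security_handler, and the dictionary used as a message are all hypothetical.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]
Next = Callable[[Message], Message]
Handler = Callable[[Message, Next], Message]


def _link(handler: Handler, nxt: Next) -> Next:
    """Bind one handler to whatever comes after it in the pipeline."""
    return lambda msg: handler(msg, nxt)


def build_chain(handlers: List[Handler], endpoint: Next) -> Next:
    chain = endpoint
    for handler in reversed(handlers):
        chain = _link(handler, chain)
    return chain


def security_handler(msg: Message, nxt: Next) -> Message:
    return nxt(dict(msg, signed="yes"))


def reliability_handler(msg: Message, nxt: Next) -> Message:
    # Adds its own header and passes the message along. It never inspects
    # whether the exchange is one-way or request-response.
    return nxt(dict(msg, sequence="seq-1"))


def endpoint(msg: Message) -> Message:
    return msg


without_rm = build_chain([security_handler], endpoint)
with_rm = build_chain([security_handler, reliability_handler], endpoint)

plain_result = without_rm({"body": "getQuote"})
reliable_result = with_rm({"body": "getQuote"})
```

Adding or removing reliability_handler changes nothing for the other handler or for the application payload. A replay-aware version would need an extra input telling it which exchange pattern is in effect, and nothing in this pipeline shape supplies one.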
Request-Response Correlation
Another problem with implementing the replay model is the fact that the server-side WS-RM handler must maintain the correlation between the requests and responses it has processed; something it isn't normally required to do. If it doesn't do this it won't know which response to retransmit when it receives a replayed request. This correlation information must exist in the request and response messages using WS-Addressing's wsa:MessageID and wsa:RelatesTo header elements (I've never heard anyone propose any other way of doing it) thus creating a dependency between WS-RM and WS-Addressing where none existed before. Entries in this "correlation table" (speaking abstractly) can only be removed when the server-side WS-RM handler receives an acknowledgment for the response. Obviously you don't want this table to keep growing forever, so you can't create entries for requests that won't have a response. As with the client side, the server-side WS-RM handler must now know the exchange pattern in effect for each request it receives. The abstraction layer violation that exists on the client side exists on the server side as well. On top of this you have the additional, per-message (both request and response) overhead of referencing and updating the correlation information.
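Speaking a little less abstractly, the correlation table might look something like the sketch below. The CorrelationTable class, its method names, the response key, and the sample identifiers are all hypothetical; the only things taken from the discussion above are that entries are keyed by the request's wsa:MessageID and that they can be removed only when the response is acknowledged.

```python
from typing import Dict, Optional, Tuple


class CorrelationTable:
    """Hypothetical server-side map from a request's wsa:MessageID to its response."""

    def __init__(self) -> None:
        # request MessageID -> (response key, stored response body)
        self._by_request: Dict[str, Tuple[str, str]] = {}
        # response key -> request MessageID, so an ack can find the entry
        self._by_response: Dict[str, str] = {}

    def record_response(self, request_id: str, response_key: str, body: str) -> None:
        # Must only be called for request-response exchanges; an entry created
        # for a one-way request would never be removed.
        self._by_request[request_id] = (response_key, body)
        self._by_response[response_key] = request_id

    def on_replayed_request(self, request_id: str) -> Optional[str]:
        """Return the stored response to retransmit, or None if unknown."""
        entry = self._by_request.get(request_id)
        return entry[1] if entry else None

    def on_response_acked(self, response_key: str) -> None:
        request_id = self._by_response.pop(response_key, None)
        if request_id is not None:
            del self._by_request[request_id]

    def __len__(self) -> int:
        return len(self._by_request)


table = CorrelationTable()
table.record_response("urn:uuid:req-1", "resp-seq:1", "<quote>42</quote>")

replayed = table.on_replayed_request("urn:uuid:req-1")   # response is found
unknown = table.on_replayed_request("urn:uuid:req-9")    # never seen
size_before_ack = len(table)

table.on_response_acked("resp-seq:1")
size_after_ack = len(table)
after_ack = table.on_replayed_request("urn:uuid:req-1")  # entry is gone
```

Every request and every response now costs at least one table lookup or update, and the caller of record_response has to know the exchange pattern before it dares to create an entry.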
No Advertisement or Agreement
Web services are rooted in the concept of design by contract. Services indicate that clients may (or are required to) use standards such as WS-Addressing or WS-Security through the use of WS-Policy assertions in their WSDL documents. The replay model has no WS-Policy assertions to indicate its use, nor are there any other mechanisms defined that would allow a client to determine if a service does or doesn't support the use of replay. Considering the problems described above, it shouldn't come as a surprise that most web service stacks do not implement the replay model. So, given that there are stacks that don't support replay and taking into consideration that those that do may do so on an optional basis, it seems that the only way to know whether replay is going to work for you, as a client, is to call or email the administrator of the service and ask. If there are no alarm bells going off in your head at this moment, you haven't spent enough time in IT operations. "Interoperation by alignment of externally invisible configuration settings" has been shown to be operationally unscalable.
This problem exists on the flip side as well. How does a service know whether a client intends to use replays? The whitepaper referred to above defines some rules whereby the server can use a combination of various values in the wsrm:CreateSequence message to infer that replay is in effect. To be clear, though, replay is an extension to WS-RM and it might not be the only extension to use that particular combination of values. Inferring the use of an extension through the values of particular, general-purpose elements is risky and likely to cause interoperability problems. It would have been much better if replay defined an extension to the CreateSequence message and/or a unique SOAP header to signal to the server that the client intended to use replay.
Limited Applicability
If you've been following the conversation so far you've noticed that the replay model is only necessary for reliable request/response exchanges between a non-addressable client and a service. It is not needed for reliable one-way exchanges from a non-addressable client because there is no reliable response to worry about. But what about other kinds of patterns? A common paradigm in distributed computing is "publish and subscribe". Suppose a non-addressable client wants to subscribe to a series of event notifications that need to be delivered reliably. The exchange pattern might be termed "request-response-response-response...". Even if we assume that the subscription request is carried reliably (it might not be), it's obvious that the replay model will not help the publishing service retry lost notification messages. How would the client even know that it hadn't received a notification message? There are also situations in which a client might engage in a non-reliable request/reliable response exchange with a server. Since the request message is not processed by the server's WS-RM layer, the request-to-response mapping necessary for the replay model to work will not exist, and replay will not work. Additionally, since the request message is not filtered by the WS-RM layer, any replayed requests will be dispatched to the application.
Some of the above stuff is pretty advanced and it's hard to imagine how any of it would work with or without reliability (sending a series of notification messages to a non-addressable client?). I wouldn't have brought it up if there weren't a way of addressing the "reliable response to a non-addressable client" issue that also addresses all of these exchange patterns: (you guessed it) WS-MakeConnection.
Summary
This (rather lengthy) article has presented some of the technical issues with implementing and using the replay model. There are other, non-technical issues, including the fact that the replay model has not been approved by any recognized standards organization and actually violates the WS-RM standard, that should give pause to anyone attempting to use this approach to solve the problem of reliably responding to non-addressable clients. As should be apparent, we at BEA think that the WS-MakeConnection protocol not only addresses the reliable request/response scenarios in a way that is far less problematic than replay, it also addresses a number of other scenarios of interest to our customers.