Arch2Arch | BEA.com

Replay Reconsidered

Gilbert Pilz's Blog | April 25, 2008   9:35 AM | Comments (4)


WS-ReliableMessaging describes a protocol that allows SOAP messages to be delivered reliably between distributed applications in the presence of software component, system, or network failures. One issue that has long bedeviled WS-RM is how to support reliable responses to so-called "anonymous clients". The OASIS WS-RX Technical Committee created the WS-MakeConnection specification to deal with this issue. An alternative solution is the so-called "replay model". This article describes the technical defects of that model.

It is assumed that readers of this article are familiar with the basic principles and operation of the WS-ReliableMessaging protocol. If you are less than familiar with WS-RM, this Wikipedia entry is a good place to get started.

Core Dilemma

The core dilemma behind this issue is that "anonymous clients" (I prefer the term "non-addressable clients" because I don't like to conflate the concepts of addressability with those of identity) can only communicate synchronously yet WS-RM, by its nature, potentially renders all communications asynchronous. Uh huh. Let's break that down a bit.

Non-addressable clients are hosted on computers that, for reasons of network topology (e.g. NATs), security (e.g. firewalls), or whatever, cannot accept connections from systems outside their network. Although you can't connect to these machines from the outside, they themselves can create outbound connections. SOAP supports non-addressable clients by leveraging HTTP to take advantage of this fact. Non-addressable SOAP clients create an outbound connection to a server, send the request message over this connection, then read the corresponding response from that same connection (this response channel is sometimes referred to as "the HTTP back-channel"). This is why non-addressable clients operate synchronously. They have to use the connection they created to read the server's response because, by definition, it is impossible for the server to connect to them and send the response (as would happen in an asynchronous exchange). For readers accustomed to thinking in terms of synchronous communication this all seems par for the course, but wait, there's more.

WS-RM is built on the concepts of acknowledgements and retransmissions. One node (client, server, whatever) sends a message to another and waits for an acknowledgement. If it doesn't receive one, it assumes the message didn't get through and sends it again. So, regardless of when you think you are going to receive a message and which connection you think you are going to receive it over, something may go wrong (the connection might break) and WS-RM will retransmit the message at a later time over a different connection. This doesn't present a problem for non-addressable clients on the request side (where they control the creation of new connections), but it is a problem on the response side. Suppose you are a server in the process of sending a reliable response to a non-addressable client and the connection goes down. Obviously you are never going to get an acknowledgment for that response message so, as a WS-RM node, it is your responsibility to resend it. But how are you supposed to do that? You can't connect to the client and resend the response because the client is not addressable.

Replay Redux

As I said earlier, the OASIS WS-RX Technical Committee created the WS-MakeConnection specification as a means of addressing this problem. WS-MakeConnection is a very important piece of technology, as I will explain in a later article. Another solution, one that predates the work of the WS-RX TC, is the use of "replays". The best description of the replay model is this whitepaper by WSO2. Although that whitepaper describes the use of replay in the context of WS-RM 1.0, some implementations (most notably Microsoft® Windows Communication Foundation (WCF)) have extended this solution to WS-RM 1.1. Replay takes advantage of the fact that non-addressable clients can create new outbound connections: the client retransmits a (possibly already acknowledged) request in order to solicit the retransmission of the corresponding response. On the surface this seems like a reasonable approach but, as I will show, there are a number of serious technical issues around its implementation and use.

Abstraction Layer Violations

One of the most serious issues with implementing the replay model is that it requires the RM Source (RMS) to be aware of the message exchange pattern (MEP) of the messages it processes. To understand why, we need to review the normal processing sequence for an RMS. An RMS receives a message from the higher-level Application Source (AS) and transmits it to the RM Destination (RMD). Since the RMS is responsible for retransmitting the message, it must store it (in memory and/or on disk) until it receives an acknowledgment from the RMD. When the acknowledgment is received, the RMS can "forget" about the message. Not so when replay is in effect. Because the replay model uses request messages as a prompt for lost response messages, the RMS must store each request until the corresponding response has been received, even after the request itself has been acknowledged. But wait, what if there is no response message? What if the request is the sole message in a one-way exchange? We obviously can't have the RMS storing these one-way messages forever, so the RMS needs to know whether the message it is processing is part of a request-response exchange or is a one-way message.
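A minimal sketch of the retention rule just described, assuming messages are keyed by their WS-RM message number and that some layer above the RMS reports which request a response answers (both simplifications). All names are hypothetical. The `expects_response` flag is the MEP knowledge in question: with replay off it is never consulted, while with replay on the RMS cannot decide when to discard a request without it.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StoredRequest:
    payload: str
    expects_response: bool  # MEP knowledge the RMS would not otherwise need
    acked: bool = False
    responded: bool = False


class ClientRMS:
    """Hypothetical client-side RMS deciding when a stored request may be discarded."""

    def __init__(self, replay: bool) -> None:
        self.replay = replay
        self.store: Dict[int, StoredRequest] = {}  # keyed by WS-RM message number

    def sent(self, number: int, payload: str, expects_response: bool) -> None:
        self.store[number] = StoredRequest(payload, expects_response)

    def on_ack(self, number: int) -> None:
        if number in self.store:
            self.store[number].acked = True
            self._maybe_discard(number)

    def on_response(self, request_number: int) -> None:
        # Assumes a higher layer has already matched the response to its request.
        if request_number in self.store:
            self.store[request_number].responded = True
            self._maybe_discard(request_number)

    def replay_candidates(self) -> List[int]:
        """Acknowledged requests kept only so they can be resent to pull a lost response."""
        return [n for n, r in sorted(self.store.items())
                if r.acked and r.expects_response and not r.responded]

    def _maybe_discard(self, number: int) -> None:
        r = self.store[number]
        if not r.acked:
            return  # plain WS-RM rule: keep until acknowledged
        if not self.replay or not r.expects_response or r.responded:
            del self.store[number]


rms = ClientRMS(replay=True)
rms.sent(1, "<notify/>", expects_response=False)   # one-way
rms.sent(2, "<getQuote/>", expects_response=True)  # request-response
rms.on_ack(1)
rms.on_ack(2)
print(rms.replay_candidates())  # prints [2]
```

Note that nothing in the message handed to `sent` tells the RMS which value of `expects_response` is correct; it has to be supplied from above the RM layer.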

OK, why is this such a big deal? To understand why this is an issue we need to think about the basic architecture of SOAP and the composability of web service specifications. One of SOAP's big claims is that you can add facilities (like reliability) in a way that is transparent both to the application and to any other facilities. Underlying this assertion is the notion that most SOAP stacks will implement some form of the chain of responsibility pattern. This means that the only parts of the SOAP processing pipeline that should be aware of the exchange pattern being used are the initiator and the ultimate receiver. Requiring the handler that implements WS-RM to know the exchange pattern in effect for the messages it handles runs counter to this entire architecture. Does that mean you couldn't hack around this problem in some way? Of course you could! But these kinds of hacks are likely to work only in specific instances (e.g. when the WS-RM processor and the initiator share the same process space) and will, ultimately, lead to a SOAP stack that is buggy and fragile (or should I say "buggier and more fragile"?).
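To make the layering point concrete, here is a hypothetical handler chain in Python. Envelopes are plain dicts, and the illustrative `wire_copy` function stands in for serialization: only SOAP headers and body survive it. The RM handler can learn the exchange pattern only through an in-process hint the application leaves beside the envelope, which is exactly the kind of hack described above. Once the envelope crosses a process boundary, as it would on its way to a separate RM intermediary, the hint is gone.

```python
from typing import Any, Callable, Dict, List

Envelope = Dict[str, Any]
Handler = Callable[[Envelope], Envelope]


def wire_copy(env: Envelope) -> Envelope:
    """What survives serialization: SOAP headers and body, nothing else."""
    return {"headers": dict(env["headers"]), "body": env["body"]}


def addressing_handler(env: Envelope) -> Envelope:
    env["headers"]["wsa:MessageID"] = "urn:uuid:0001"  # illustrative value
    return env


def rm_handler(env: Envelope) -> Envelope:
    env["headers"]["wsrm:Sequence"] = {"Identifier": "urn:uuid:seq-1", "MessageNumber": 1}
    # Replay needs to know whether a response will follow. The envelope does not
    # say, so this handler peeks at an in-process hint left by the application.
    env["retain_until_response"] = env.get("app_hint_expects_response")
    return env


def run_chain(handlers: List[Handler], env: Envelope) -> Envelope:
    # Chain of responsibility: each handler sees only the envelope it is given.
    for handler in handlers:
        env = handler(env)
    return env


# Same process: the hint is visible to the RM handler.
local = {"headers": {}, "body": "<getQuote/>", "app_hint_expects_response": True}
print(run_chain([addressing_handler, rm_handler], local)["retain_until_response"])  # True

# RM handler in a separate intermediary: only the serialized envelope arrives.
sent = wire_copy(run_chain([addressing_handler],
                           {"headers": {}, "body": "<getQuote/>", "app_hint_expects_response": True}))
print(rm_handler(sent)["retain_until_response"])  # None
```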

Request-Response Correlation

Another problem with implementing the replay model is the fact that the server-side WS-RM handler must maintain the correlation between the requests and responses it has processed; something it isn't normally required to do. If it doesn't do this it won't know which response to retransmit when it receives a replayed request. This correlation information must exist in the request and response messages using WS-Addressing's wsa:MessageID and wsa:RelatesTo header elements (I've never heard anyone propose any other way of doing it) thus creating a dependency between WS-RM and WS-Addressing where none existed before. Entries in this "correlation table" (speaking abstractly) can only be removed when the server-side WS-RM handler receives an acknowledgment for the response. Obviously you don't want this table to keep growing forever, so you can't create entries for requests that won't have a response. As with the client side, the server-side WS-RM handler must now know the exchange pattern in effect for each request it receives. The abstraction layer violation that exists on the client side exists on the server side as well. On top of this you have the additional, per-message (both request and response) overhead of referencing and updating the correlation information.
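The "correlation table" can be sketched as a dictionary keyed by the request's wsa:MessageID. This is a hypothetical illustration with messages reduced to dicts of header values: an entry is created when a request that expects a response arrives, filled in when the response (carrying wsa:RelatesTo) is produced, looked up when a replayed request shows up, and removed only when the response is acknowledged. The `expects_response` argument is where the MEP knowledge leaks in on the server side.

```python
from typing import Dict, Optional


class ReplayCorrelationTable:
    """Hypothetical server-side map: request wsa:MessageID -> response to resend."""

    def __init__(self) -> None:
        # None means the request was seen but its response is still being produced.
        self._entries: Dict[str, Optional[dict]] = {}

    def on_request(self, request: dict, expects_response: bool) -> None:
        # Requires MEP knowledge: a one-way request would leave an entry that
        # no response acknowledgement ever removes.
        if expects_response:
            self._entries.setdefault(request["wsa:MessageID"], None)

    def on_response(self, response: dict) -> None:
        # wsa:RelatesTo carries the MessageID of the request being answered.
        self._entries[response["wsa:RelatesTo"]] = response

    def on_response_ack(self, response: dict) -> None:
        self._entries.pop(response["wsa:RelatesTo"], None)

    def on_replayed_request(self, request: dict) -> Optional[dict]:
        """Return the stored response to resend on this request's back-channel, if any."""
        return self._entries.get(request["wsa:MessageID"])

    def __len__(self) -> int:
        return len(self._entries)
```

Every request and every response touches this table, which is the per-message overhead mentioned above.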

No Advertisement or Agreement

Web services are rooted in the concept of design by contract. Services indicate that clients may (or are required to) use standards such as WS-Addressing or WS-Security through the use of WS-Policy assertions in their WSDL documents. The replay model has no WS-Policy assertions to indicate its use, nor are there any other mechanisms defined that would allow a client to determine whether a service does or doesn't support the use of replay. Considering the problems described above, it shouldn't come as a surprise that most web service stacks do not implement the replay model. So, given that there are stacks that don't support replay, and given that those that do may support it only on an optional basis, it seems that the only way to know whether replay is going to work for you, as a client, is to call or email the administrator of the service and ask. If there are no alarm bells going off in your head at this moment, you haven't spent enough time in IT operations. "Interoperation by alignment of externally invisible configuration settings" has been shown to be operationally unscalable.

This problem exists on the flip side as well. How does a service know whether a client intends to use replays? The whitepaper referred to above defines some rules whereby the server can use a combination of various values in the wsrm:CreateSequence message to infer that replay is in effect. To be clear, though, replay is an extension to WS-RM, and it might not be the only extension to use that particular combination of values. Inferring the use of an extension from the values of particular general-purpose elements is risky and likely to cause interoperability problems. It would have been much better if replay defined an extension to the CreateSequence message and/or a unique SOAP header to signal to the server that the client intended to use replay.
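As an illustration of the explicit signal suggested above, the following sketch builds a simplified WS-RM 1.1 CreateSequence with Python's standard `xml.etree.ElementTree` and adds an opt-in element in its extensibility point. The `urn:example:replay` namespace and the `UseReplay` element are hypothetical: no such extension has been defined by the replay model or by any standards body, and the message has not been validated against the WS-RM schema. The WS-RM 1.1 and WS-Addressing namespace URIs are the published ones.

```python
import xml.etree.ElementTree as ET

WSRM = "http://docs.oasis-open.org/ws-rx/wsrm/200702"
WSA = "http://www.w3.org/2005/08/addressing"
ANON = WSA + "/anonymous"
REPLAY = "urn:example:replay"  # hypothetical namespace; no such extension exists


def create_sequence(offer_id: str, use_replay: bool) -> ET.Element:
    """Simplified CreateSequence from a non-addressable client offering a response sequence."""
    cs = ET.Element(f"{{{WSRM}}}CreateSequence")
    acks_to = ET.SubElement(cs, f"{{{WSRM}}}AcksTo")
    ET.SubElement(acks_to, f"{{{WSA}}}Address").text = ANON
    offer = ET.SubElement(cs, f"{{{WSRM}}}Offer")
    ET.SubElement(offer, f"{{{WSRM}}}Identifier").text = offer_id
    endpoint = ET.SubElement(offer, f"{{{WSRM}}}Endpoint")
    ET.SubElement(endpoint, f"{{{WSA}}}Address").text = ANON
    if use_replay:
        # Explicit opt-in carried as an extension element (hypothetical).
        ET.SubElement(cs, f"{{{REPLAY}}}UseReplay")
    return cs


def client_requests_replay(cs: ET.Element) -> bool:
    # The server tests for one unambiguous element instead of inferring
    # intent from a combination of general-purpose values.
    return cs.find(f"{{{REPLAY}}}UseReplay") is not None


print(client_requests_replay(create_sequence("urn:uuid:offer-1", use_replay=True)))  # True
```

The design point is that the two messages a server might receive differ only by an element that means "replay" and nothing else, so no other extension's use of the same AcksTo or Offer values can be mistaken for it.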

Limited Applicability

If you've been following the conversation so far, you'll have noticed that the replay model is only necessary for reliable request/response exchanges between a non-addressable client and a service. It is not needed for reliable one-way exchanges from a non-addressable client because there is no reliable response to worry about. But what about other kinds of patterns? A common paradigm in distributed computing is "publish and subscribe". Suppose a non-addressable client wants to subscribe to a series of event notifications that need to be delivered reliably. The exchange pattern might be termed "request-response-response-response . . .". Even if we assume that the subscription request is carried reliably (it might not be), it's obvious that the replay model will not help the publishing service retry lost notification messages. How would the client even know that it hadn't received a notification message? There are also situations in which a client might engage in a non-reliable request/reliable response exchange with a server. Since the request message is not processed by the server's WS-RM layer, the request-to-response mapping necessary for the replay model to work will not exist, and replay will not work. Additionally, since the request message is not filtered by the WS-RM layer, any replayed requests will be dispatched to the application.
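The last point can be shown with a hypothetical server-side dispatcher: only messages that carry a wsrm:Sequence header pass through the RM Destination's duplicate filter, so a resent non-reliable request reaches the application a second time. Messages are reduced to dicts and all names are illustrative.

```python
from typing import List, Set, Tuple


class ServerDispatcher:
    """Hypothetical sketch: only messages with a wsrm:Sequence header are
    checked against the RM Destination's record of delivered messages."""

    def __init__(self) -> None:
        self.seen: Set[Tuple[str, int]] = set()  # (sequence id, message number) delivered
        self.delivered: List[str] = []           # what the application saw

    def receive(self, message: dict) -> None:
        seq = message.get("wsrm:Sequence")
        if seq is not None:
            key = (seq["Identifier"], seq["MessageNumber"])
            if key in self.seen:
                return  # duplicate of a reliable request: filtered, not re-dispatched
            self.seen.add(key)
        # A non-reliable request never touched the RM layer, so there is no
        # request-to-response mapping to replay from and no duplicate filter.
        self.delivered.append(message["body"])


d = ServerDispatcher()
reliable = {"wsrm:Sequence": {"Identifier": "urn:uuid:s1", "MessageNumber": 1}, "body": "<order/>"}
unreliable = {"body": "<getQuote/>"}
for msg in (reliable, reliable, unreliable, unreliable):  # each sent, then "replayed"
    d.receive(msg)
print(d.delivered)  # ['<order/>', '<getQuote/>', '<getQuote/>']
```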

Some of the above stuff is pretty advanced, and it's hard to imagine how any of it would work with or without reliability (sending a series of notification messages to a non-addressable client?). I wouldn't have brought it up if there weren't a way of addressing the "reliable response to a non-addressable client" issue that also addresses all of these exchange patterns: (you guessed it) WS-MakeConnection.

Summary

This (rather lengthy) article has presented some of the technical issues with implementing and using the replay model. There are other, non-technical issues, including the fact that the replay model has not been approved by any recognized standards organization and actually violates the WS-RM standard, that should give pause to anyone considering this approach to the problem of reliably responding to non-addressable clients. As is probably apparent, we at BEA think that the WS-MakeConnection protocol not only addresses the reliable request/response scenarios in a way that is far less problematic than replay, but also addresses a number of other scenarios of interest to our customers.


Comments

Comments are listed in date ascending order (oldest first)

  • Without getting into the standards aspects of Replay, I'd like to clarify some aspects of that model that seem to be misunderstood.
    Abstraction Layer Violations
    Replay defines a mechanism by which an RMS can know whether it should expect a response for a given request: briefly, if the transport response corresponding to the transport request on which a reliable request was sent contains a SOAP payload, then that payload is the reliable response corresponding to the reliable request. Thus the RMS does not need to know the application MEP. Similarly, the protocol does not require the RMD to understand the application MEP. Some communication stacks support abstractions that ADs can use to declare the outcome of processing requests (for example, an HTTP stack might send a 202 or a 200 status code depending on the outcome of application processing). If such an abstraction is a first-class notion[1] in a communication stack, then an RMD can easily and naturally leverage it to learn the outcome of application processing and react accordingly, without knowing the application's MEP.
    Request-Response Correlation
    Correlation can be maintained strictly using WS-RM message numbers. For any reliable request that generates a reliable response, the request's message number can be associated to the corresponding response message, thus creating the correlation. If the communication stack supports a "request context" abstraction (as briefly described above) then the RMD can easily determine when this correlation is needed.
    [1] In WCF this abstraction is represented by the RequestContext class.

    Posted by: stefanba on April 29, 2008 at 3:05 PM

  • Stefan,

    Are you saying that the RMS does not need to have a priori knowledge of the MEP for the outgoing messages because it can infer the MEP by the presence/absence of a SOAP payload in the response channel? This seems like circular logic to me. You can't depend on the fact that you are always going to get a full, well-formed response because WS-RM is applicable only in those situations in which you don't. Suppose the connection is broken before the service can write the response. The intended response might have been a "202" with no payload (in the case of a one-way) or it might have been a "200" with an accompanying SOAP payload (in the case of a request-response). The RMS has no way of determining what the response was supposed to be so it can't infer the MEP.

    With regard to communicating the intended MEP between the AD and the RMD via the RequestContext class, this is what I was referring to by "hacks that work only in specific instances". This only works if your AD and your RMD share the same process space. You can't do this if, for example, you implement your RMD as a separate intermediary.

    As far as request-response correlation goes, are you saying that, for the two Sequences (initiated and offered), request message N in the initiated Sequence must always correspond to response message N in the offered Sequence? This would mean that you can only use a given Sequence for request-response traffic or for one-way traffic, but never both. If you intermixed a one-way message on a Sequence intended for request-response messages it would increase the message number on the initiated Sequence without a corresponding increase in the number for the offered Sequence. Perhaps I misunderstood you.

    Posted by: gpilz on April 29, 2008 at 4:38 PM

  • I believe Gil is seeing more requirements in WS-RM than it actually imposes. "WS-RM is built on the concepts of acknowledgements and retransmissions." Sure. But notice that nowhere does WS-RM specify when acknowledgements and retransmissions should be sent, or how this is to be controlled. This is because WS-RM is a PROTOCOL spec, not a COMPONENT spec. WS-RM has therefore been designed to allow a wide array of implementation patterns: from independent RM modules that do exactly what WS-RM specifies and nothing more, to a fully embedded RM function that knows a lot more about the exchanges it is supposed to make reliable. And here is the fundamentally flawed assumption in Gil's incrimination of the request-replay technique: for an "abstraction layer violation" to exist, the layers must first be defined. You get my drift: where are they defined? In the WS-RM "messaging model" (see Figure 1, Section 2)? I claim not. The abstract model described in Figure 1 should not be confused with a module specification: it does not exclude other information flowing between the Application Source and the RMS. It is just that: a model explaining the context in which the *specified RM functions* are supposed to operate (and especially useful here to define the semantics of Delivery Assurances). It is not an API spec.

    So let us take a closer look at the Sender side first. What additional information flows from the Application Source to the RMS is my implementation choice. I could decide to inform my RMS of the type of MEP it is servicing if I want to. There is no interoperability harm in this; the other side does not have to know. And this information may help me decide how and when to resend messages: I can decide to resend a Request even if I have received an acknowledgement for it, e.g. if I haven't received a response to it. In fact, regardless of MEP, I can also decide to resend a Request if my previous sending generated an exception, a common occurrence in SOAP stacks when an HTTP request-response fails to complete. This is my choice and should not be a surprise to the Receiver side, which is precisely supposed to be reliable (duplicates are bad? then this Receiver will surely support AtMostOnce and eliminate duplicates).

    On the [request] Receiver side, it's all the same. Nothing in WS-RM prevents my RM endpoint from identifying a received Request as a resend of a previous Request, and from resending on its back-channel the Response message that was sent over the back-channel of the initial Request (assuming, of course, that no other behavior has been explicitly mandated, e.g. by the MCSupported policy assertion). That's my choice. Does it hurt interoperability? Not in the least. How do I know about these out-of-scope behaviors? The same way I would share or synchronize other out-of-scope parameters, like those that control Ack sending and message resending (still necessary for good RM tuning): out-of-band agreement.

    Posted by: JacquesDurand on May 1, 2008 at 11:55 PM

  • Jacques, of course you can make anything work but given the scope of what's in front of us (meaning the RX specs) you can't make Replay work w/o also inventing new semantics that are not part of any specification (or even any document w.r.t. RM 1.1). As for this one implementation choice (making the RM layer aware of the MEPs), yes you can make that choice because, as you said, RM allows lots of choices. However, it would seem like a very limiting choice given that there are quite a few environments where the RM layer and the app layer are not that closely tied - which means a Replay impl will not be able to interoperate with things such as some SI-Buses. Given that a client should probably not be that aware of how the various services it'll interact with are configured, I would think it would be best to choose an implementation that will interop with as many as possible. Or worse, I would hate to think that people would want more than one way to do the same thing - one for those constrained environments and one for the enterprise ones. Seems we're losing some of the value of Web Services at that point.

    Posted by: DugD on May 8, 2008 at 12:44 PM
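The disagreement over correlating requests and responses by message number (stefanba's suggestion and gpilz's reply above) can be checked in a few lines. The sketch below assumes gpilz's reading: request N on the initiated Sequence pairs with response N on the offered Sequence, and responses are numbered in the order requests are answered. It is a hypothetical illustration, not how WCF or any other stack implements correlation; an implementation could instead keep an explicit request-number-to-response mapping, which stefanba's wording also allows.

```python
from typing import List, Optional, Tuple


def number_messages(expects_response: List[bool]) -> List[Tuple[int, Optional[int]]]:
    """For each request on the initiated Sequence, return (request number,
    number its response receives on the offered Sequence, or None if one-way)."""
    pairs: List[Tuple[int, Optional[int]]] = []
    next_response = 1
    for request_number, wants_reply in enumerate(expects_response, start=1):
        if wants_reply:
            pairs.append((request_number, next_response))
            next_response += 1
        else:
            pairs.append((request_number, None))  # one-way: no offered-Sequence message
    return pairs


def n_to_n_holds(pairs: List[Tuple[int, Optional[int]]]) -> bool:
    """True if every response carries the same number as its request."""
    return all(resp == req for req, resp in pairs if resp is not None)


print(number_messages([True, False, True]))  # [(1, 1), (2, None), (3, 2)]
```

With only request-response traffic the numbers stay aligned; a single one-way message on the initiated Sequence shifts every later pair, so a pure "N matches N" rule only holds on Sequences that never mix the two patterns.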


