Published on dev2dev (http://dev2dev.bea.com/)
http://dev2dev.bea.com/pub/a/2008/03/specjms2007.html
by Samuel Kounev and Kai Sachs
03/26/2008
Message-oriented middleware (MOM) is at the core of a vast number of financial services and telco applications, and is gaining increasing traction in other industries, such as manufacturing, transportation, healthcare and supply chain management. The end-user and analyst communities have shown strong interest in a standardized benchmark suite for evaluating the performance and scalability of MOM.
In this article we describe SPECjms2007, the world's first industry-standard benchmark specialized for MOM. SPECjms2007 is based on a novel application in the supply chain management domain that was specifically designed as a representative workload scenario for evaluating the performance and scalability of MOM products.
In addition to providing a standard workload and metrics for MOM performance, the benchmark provides a flexible performance analysis framework that allows users to customize the workload to their requirements.
The article discusses the business scenario and workload modeled by the benchmark, as well as the benchmark design and architecture. We explain the meaning of the benchmark metrics and discuss how the various features supported by the benchmark can be exploited for in-depth performance analysis of MOM infrastructures.
Message-oriented middleware (MOM) is increasingly adopted as an enabling technology for modern event-driven applications like stock trading, event-based supply chain management, air traffic control and online auctions to name just a few. Moreover, the publish-subscribe paradigm is now used as a building block in major new software architectures and technology domains such as Enterprise Service Bus (ESB), Enterprise Application Integration (EAI), Service-Oriented Architecture (SOA) and Event-Driven Architecture (EDA). Novel messaging applications, however, pose some serious performance and scalability challenges. For example, the next generation of event-driven supply chain management based on RFID technology will be highly reliant on scalable and efficient backend systems to support the processing of acquired real-time data and its integration with enterprise applications and business processes. Large retailers, like Wal-Mart, Metro or Tesco, are expected to have throughput rates of about 60 billion messages per annum. The performance and scalability of the underlying MOM platforms used to process these messages will be of crucial importance for the successful adoption of such applications in the industry.
To guarantee that applications meet their Quality of Service (QoS) requirements, it is essential that the platforms on which they are built are tested using benchmarks to measure and validate their performance and scalability. While several proprietary benchmarks for MOM servers (for example SonicMQ Test Harness, IBM's Performance Harness for JMS) have been developed and used in the industry for performance testing and product comparisons, these benchmarks do not provide a level playing field for performance comparisons. The reason is that most of them use artificial workloads that do not reflect any real-world application scenario. Furthermore, they typically concentrate on stressing individual MOM features in isolation and do not provide a comprehensive and representative workload for evaluating the overall MOM server performance.
To address these concerns, in September 2005 the Standard Performance Evaluation Corporation (SPEC) launched a project with the goal of developing a standard benchmark for evaluating the performance and scalability of MOM products. The new benchmark, called SPECjms2007, was developed by SPEC's OSG-Java Subcommittee with the participation of Technische Universität Darmstadt, IBM, Sun, Oracle, BEA, Sybase and Apache. SPECjms2007 exercises messaging products through the JMS (Java Message Service) standard interface, which is supported by all major MOM vendors.
The aim of the SPECjms2007 benchmark is to provide a standard workload and metrics for measuring and evaluating the performance and scalability of JMS-based MOM platforms. To achieve this, the SPECjms2007 workload must fulfill several important requirements. First of all, it must be based on a representative workload scenario that reflects the way platform services are exercised in real-life systems. The communication style and the types of messages sent and received by the different parties in the benchmark scenario should represent a typical transaction mix. The goal is to allow users to relate the observed behavior to their own applications and environments. Second, the workload should be comprehensive in that it should exercise all platform features typically used in MOM applications, including both point-to-point (P2P) and publish/subscribe (pub/sub) messaging. The features and services stressed should be weighted according to their usage in real-life systems.
The following dimensions have to be considered when defining the workload transaction mix:
The third requirement is that the workload should be focused on measuring the performance and scalability of the MOM server's software and hardware components. It should minimize the impact of other components and services that are typically used in the chosen application scenario. For example, if a database were used to store business data and manage the application state, it could easily become the limiting factor of the benchmark, as experience with other benchmarks (e.g., ECperf) shows. Finally, the SPECjms2007 workload must not have any inherent scalability limitations. The user should be able to scale the workload both by increasing the number of destinations (queues and topics) and by increasing the message traffic pushed through a destination.
Producing and publishing standard results for marketing purposes will be just one usage scenario for SPECjms2007. Many users will be interested in using the benchmark to tune and optimize their platforms or to analyze the performance of certain specific MOM features. Others could use the benchmark for research purposes in academic environments where, for example, one might be interested in evaluating the performance and scalability of novel methods and techniques for building high-performance MOM servers. All these usage scenarios require that the benchmark framework allows the user to precisely configure the workload and transaction mix to be generated. Providing this configurability is a great challenge because it requires that interactions are designed and implemented in such a way that one could run them in different combinations depending on the desired transaction mix.
The workload scenario chosen for SPECjms2007 models the supply chain of a supermarket company. The participants involved are the supermarket company, its stores, its distribution centers and its suppliers. The scenario, depicted in Figure 1, offers an excellent basis for defining interactions that stress different subsets of the functionality offered by MOM servers, e.g., different message types as well as both P2P and pub/sub communication. Moreover, it offers a natural way to scale the workload, e.g., by scaling the number of supermarkets or the number of products sold per supermarket. We now take a closer look at the participants involved in the scenario.

Figure 1: Business Scenario - Supermarket Supply Chain
The corporate headquarters (HQ) of the company are responsible for accounting, managing information about the goods and products offered in the supermarket stores, managing selling prices and monitoring the flow of goods and money in the supply chain.
The distribution centers (DCs) supply the supermarket stores. Every distribution center is responsible for a set of stores in a given area. The distribution centers are in turn supplied by external suppliers. The distribution centers are involved in the following activities: taking orders from supermarkets, ordering goods from suppliers, delivering goods to supermarkets and providing sales statistics to the HQ (e.g., for data mining).
The supermarkets (SMs) sell goods to end customers. The scenario focuses on the management of the inventory of supermarkets, including their warehouses. Some supermarkets are smaller than others and do not have enough room for all products; others may specialize in certain product groups, such as particular types of food. We assume that every supermarket is supplied by exactly one of the distribution centers.
The suppliers (SPs) deliver goods to the distribution centers of the supermarket company. Different suppliers specialize in different sets of products, and they deliver goods on demand, i.e., they send a shipment only after receiving an order from the supermarket company.
SPECjms2007 models seven interactions between the participants in the supermarket supply chain.
Let's look at these interactions in more detail.
Interaction 1 exercises persistent P2P messaging between the SMs and DCs. It is triggered when goods in the warehouse of an SM are depleted and the SM has to order from its DC to refill stock. The steps involved are illustrated in Figure 2.

Figure 2: Interaction 1 - Communication between SM and DC
Interaction 2 exercises persistent P2P and pub/sub (durable) messaging between the DCs and SPs. It is triggered when goods in a DC are depleted and the DC has to order from an SP to refill stock. The steps involved are illustrated in Figure 3.

Figure 3: Interaction 2 - Communication between SP and DC
Interaction 3 exercises persistent, durable pub/sub messaging between the HQ and the SMs. It is triggered when selling prices are changed by the company administration. To communicate this, the company HQ sends messages with pricing information to the SMs.
Interaction 4 exercises persistent P2P messaging inside the SMs. It is triggered when goods leave the warehouse of an SM (to refill a shelf). Goods are registered by RFID readers and the local warehouse application is notified so that inventory can be updated. Note that since incoming goods are part of another interaction (Interaction 1), they are not considered here.
Interaction 5 exercises non-persistent P2P messaging between the SMs and the HQ. It is triggered when an SM sends sales statistics to the HQ. The HQ can use this data as a basis for data mining to study customer behavior and provide useful information to marketing; for example, such information can be used to decide on special offers or product discounts.
Interaction 6 exercises non-persistent, non-durable pub/sub messaging between the HQ and the SMs. It is triggered when new products are announced by the company administration. To communicate this, the HQ sends messages with product information to the SMs selling the respective product types (e.g., food, computers, mp3-players).
Interaction 7 exercises non-persistent, non-durable pub/sub messaging between the HQ and the SMs. It is triggered when the HQ sends credit card hot lists to the SMs (the complete list once every hour and incremental updates as required).
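Interactions 3, 6 and 7 all use pub/sub, but they differ in subscription durability. The following sketch is a simplified in-memory model of that difference, not the JMS API; the `SketchTopic` class and the subscriber names are hypothetical. A durable subscription retains messages published while its subscriber is disconnected and delivers them on reconnect, whereas a non-durable subscriber simply misses them. In JMS, a durable subscriber is created with `Session.createDurableSubscriber`; message persistence (whether the server stores a message so that it survives a server failure) is a separate setting that this model does not capture.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified in-memory topic contrasting durable and non-durable subscriptions.
// Hypothetical model for illustration only; it is not the JMS API.
final class SketchTopic {
    private static final class Subscription {
        final boolean durable;
        boolean connected = true;
        final List<String> received = new ArrayList<>();
        final List<String> retained = new ArrayList<>(); // used only by durable subscriptions

        Subscription(boolean durable) {
            this.durable = durable;
        }
    }

    private final Map<String, Subscription> subscriptions = new LinkedHashMap<>();

    void subscribe(String name, boolean durable) {
        subscriptions.put(name, new Subscription(durable));
    }

    void disconnect(String name) {
        subscriptions.get(name).connected = false;
    }

    void reconnect(String name) {
        Subscription s = subscriptions.get(name);
        s.connected = true;
        s.received.addAll(s.retained); // durable: deliver what was published while offline
        s.retained.clear();
    }

    void publish(String message) {
        for (Subscription s : subscriptions.values()) {
            if (s.connected) {
                s.received.add(message);
            } else if (s.durable) {
                s.retained.add(message);
            }
            // non-durable and disconnected: the message is missed
        }
    }

    List<String> received(String name) {
        return subscriptions.get(name).received;
    }
}
```

A price-update subscriber modeled as durable would therefore see every price change, while a non-durable hot-list subscriber that was offline would only see what is published after it reconnects.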
SPECjms2007 is implemented as a Java application comprising multiple JVMs and threads distributed across a set of client nodes. For every destination (queue or topic), there is a separate Java class called Event Handler (EH) that encapsulates the application logic executed to process messages sent to that destination. Event handlers register as listeners for the queue or topic and receive callbacks from the messaging infrastructure as new messages arrive.
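In JMS terms, an event handler of this kind would implement `javax.jms.MessageListener` and be registered on a consumer with `MessageConsumer.setMessageListener`, after which the provider invokes `onMessage` for each arriving message. The sketch below mimics that callback pattern without a JMS provider so that it runs on its own. `SketchListener`, `SketchBroker`, `OrderEventHandler` and the destination name `DC_OrderQueue` are all hypothetical and do not come from the benchmark's source code; for simplicity the broker allows one listener per destination and delivers synchronously.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for javax.jms.MessageListener: one callback per message.
interface SketchListener {
    void onMessage(String message);
}

// Hypothetical in-memory broker: one registered listener per destination, and
// send() calls it synchronously (a real JMS provider calls back on its own threads).
final class SketchBroker {
    private final Map<String, SketchListener> listeners = new HashMap<>();

    void register(String destination, SketchListener listener) {
        listeners.put(destination, listener);
    }

    void send(String destination, String message) {
        SketchListener listener = listeners.get(destination);
        if (listener == null) {
            throw new IllegalStateException("no listener registered for " + destination);
        }
        listener.onMessage(message); // callback into the event handler
    }
}

// One event handler class per destination: here, a handler for a DC order queue.
final class OrderEventHandler implements SketchListener {
    final List<String> processed = new ArrayList<>();

    @Override
    public void onMessage(String message) {
        // The destination-specific application logic would run here.
        processed.add(message);
    }
}
```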
For maximal performance and scalability, multiple instances of each event handler can run in separate threads, and these instances can be distributed over multiple physical nodes. Event handlers can be grouped according to the physical location (e.g., HQ, SM, DC or SP) they pertain to in the business scenario.
In addition to the event handlers, a set of threads is launched for every physical location to drive the benchmark interactions that are logically started at that location. These are called driver threads. The set of all event handlers and driver threads pertaining to a given physical location is referred to as an agent. For example, each DC agent consists of a set of event handlers for the various destinations inside the DC and a set of driver threads used to drive Interaction 2, which is the only interaction with its logical starting point at DCs.
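The assignment of driver threads to agents can be summarized as a mapping from each physical location to the interactions it starts. The table in the sketch below is a reading of the interaction descriptions in this article (SMs start Interactions 1, 4 and 5; DCs start Interaction 2; the HQ starts Interactions 3, 6 and 7; SPs start none). It is not the benchmark's configuration format, and the uniform `threadsPerInteraction` count is a simplifying assumption, since the number of driver threads can be set per interaction.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Physical locations of the supermarket supply chain scenario.
enum Location { HQ, SM, DC, SP }

final class AgentSketch {
    // Interactions whose logical starting point is each location, as read from the
    // interaction descriptions in this article (hypothetical encoding).
    static final Map<Location, List<Integer>> DRIVEN = new EnumMap<>(Location.class);

    static {
        DRIVEN.put(Location.HQ, List.of(3, 6, 7));
        DRIVEN.put(Location.SM, List.of(1, 4, 5));
        DRIVEN.put(Location.DC, List.of(2));
        DRIVEN.put(Location.SP, List.of()); // suppliers only react to incoming messages
    }

    // Driver threads one agent would start, assuming the same number of
    // threads for every interaction it drives.
    static int driverThreads(Location location, int threadsPerInteraction) {
        return DRIVEN.get(location).size() * threadsPerInteraction;
    }
}
```

With two threads per interaction, for instance, an SM agent would run six driver threads, while an SP agent consists of event handlers only.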
An important goal of SPECjms2007 that we discussed earlier was to provide a flexible framework for performance analysis of MOM servers that allows users to configure and customize the workload according to their requirements. To achieve this goal, the interactions have been implemented in such a way that one could run them in different combinations depending on the desired transaction mix. Configurability is provided along the following dimensions:
SPECjms2007 offers three different ways of structuring the workload: horizontal, vertical and freeform. These are referred to as workload topologies; they correspond to three modes of running the benchmark, each offering a different level of configurability.
The horizontal and vertical topologies represent two strategies for scaling the supermarket supply chain scenario: the first by increasing the number of physical locations and the second by increasing the amount of traffic per physical location. The horizontal topology is meant to exercise the ability of the system to handle an increasing number of destinations. To this end, the workload is scaled by increasing the number of physical locations (SMs, DCs, etc.) while keeping the traffic per location constant.
The vertical topology, on the other hand, is meant to exercise the ability of the system to handle increasing message traffic through a fixed set of destinations. Therefore, a fixed set of physical locations is used and the workload is scaled by increasing the rate at which interactions are run. Finally, the freeform topology allows users to use the seven SPECjms2007 interactions as building blocks to design their own workload scenario, which can be scaled in an arbitrary manner by increasing the number of physical locations and/or the rates at which interactions are run. Table 1 below shows the workload parameters that can be configured in the three topologies:
| Workload Parameter | Configurable in Freeform | Configurable in Horizontal | Configurable in Vertical |
|---|---|---|---|
| # physical locations (HQs, SMs, DCs, SPs) emulated. | YES | NO | NO |
| Rates at which interactions are run. | YES | NO | NO |
| Message size distribution for each message type. | YES | NO | NO |
| # agents for each physical location. | YES | YES | YES |
| Distribution of agents across client nodes. | YES | YES | YES |
| # JVMs run on each client node. | YES | YES | YES |
| Distribution of agents among JVMs. | YES | YES | YES |
| # event handlers for each message type (message consumers). | YES | YES | YES |
| # driver threads for each interaction (message producers). | YES | YES | YES |
| Connection factory used by each event handler or driver thread. | YES | YES | YES |
| # JMS Connections shared amongst event handlers within a single agent. | YES | YES | YES |
| Acknowledgment mode for non-transactional sessions. | YES | NO | NO |
| Connection sharing by multiple sessions. | YES | NO | NO |
| Frequency of verifying message integrity after transmission (CRC check). | YES | NO | NO |
While the horizontal and vertical topologies restrict which of the above parameters can be set, no restrictions apply to the freeform topology. Most importantly, the user can selectively turn off interactions or change the rate at which they are run to shape the workload according to their requirements. In contrast, when running the horizontal or vertical topology, the benchmark behaves as if the interactions were interrelated according to their dependencies in the real-life application scenario.
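Table 1 can be encoded directly, for example to reject a configuration that sets a freeform-only parameter in a horizontal or vertical run. In the sketch below, the `Topology` and `WorkloadParam` types are hypothetical and not part of the benchmark kit; each parameter is either configurable in all three topologies or in the freeform topology only, which is the pattern the table shows.

```java
import java.util.EnumSet;
import java.util.Set;

enum Topology { FREEFORM, HORIZONTAL, VERTICAL }

// Rows of Table 1. The flag is true when the parameter is configurable in all
// three topologies and false when it is configurable in the freeform topology only.
enum WorkloadParam {
    NUM_PHYSICAL_LOCATIONS(false),
    INTERACTION_RATES(false),
    MESSAGE_SIZE_DISTRIBUTION(false),
    AGENTS_PER_LOCATION(true),
    AGENT_DISTRIBUTION_ACROSS_NODES(true),
    JVMS_PER_NODE(true),
    AGENT_DISTRIBUTION_AMONG_JVMS(true),
    EVENT_HANDLERS_PER_MESSAGE_TYPE(true),
    DRIVER_THREADS_PER_INTERACTION(true),
    CONNECTION_FACTORY(true),
    SHARED_CONNECTIONS_PER_AGENT(true),
    ACK_MODE_NON_TRANSACTIONAL(false),
    CONNECTION_SHARING_BY_SESSIONS(false),
    CRC_CHECK_FREQUENCY(false);

    private final boolean allTopologies;

    WorkloadParam(boolean allTopologies) {
        this.allTopologies = allTopologies;
    }

    boolean configurableIn(Topology topology) {
        return topology == Topology.FREEFORM || allTopologies;
    }

    // All parameters a user may set when running the given topology.
    static Set<WorkloadParam> configurable(Topology topology) {
        Set<WorkloadParam> result = EnumSet.noneOf(WorkloadParam.class);
        for (WorkloadParam p : values()) {
            if (p.configurableIn(topology)) {
                result.add(p);
            }
        }
        return result;
    }
}
```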
The paper Workload Characterization of the SPECjms2007 Benchmark published in the Proceedings of the 4th European Performance Engineering Workshop (EPEW-2007) provides a comprehensive workload characterization of SPECjms2007. The benchmark workload is characterized in terms of the number and types of destinations (queues and topics), the interaction mix, the message types, the message sizes and the message delivery modes. The different types of messages and destinations used in the various interactions are detailed in Table 2.
The detailed message throughput analysis presented in the paper serves two main purposes. First, using the information provided, users can assemble a workload configuration (in terms of number of locations and interaction rates) that stresses specific types of messaging under given scaling conditions. This allows users to construct their own custom workload using the SPECjms2007 interactions as building blocks. As a very basic example, a user might be interested in evaluating the performance and scalability of non-persistent pub/sub messaging under an increasing number of subscribers. In this case, a mix of Interactions 6 and 7 can be used with an increasing number of SMs. Second, the characterization of the message traffic on a per-location basis can help users find an optimal deployment topology for the agents representing the different locations, such that the load is evenly distributed among client nodes and there are no client-side bottlenecks. This is especially important for a messaging benchmark, where the server acts as a mediator in interactions and a significant amount of processing is executed on the client side.
| Interaction | Message | Destination | Type | Properties | Description |
|---|---|---|---|---|---|
| 1 | order | Queue (DC) | ObjectMsg | P, T | Order sent from SM to DC. |
|  | orderConf | Queue (SM) | ObjectMsg | P, T | Order confirmation sent from DC to SM. |
|  | shipDep | Queue (DC) | TextMsg | P, T | Shipment registered by RFID readers upon leaving DC. |
|  | statInfoOrderDC | Queue (HQ) | StreamMsg | NP, NT | Sales statistics sent from DC to HQ. |
|  | shipInfo | Queue (SM) | TextMsg | P, T | Shipment from DC registered by RFID readers upon arrival at SM. |
|  | shipConf | Queue (DC) | ObjectMsg | P, T | Shipment confirmation sent from SM to DC. |
| 2 | callForOffers | Topic (HQ) | TextMsg | P, T, D | Call for offers sent from DC to SPs (XML). |
|  | offer | Queue (DC) | TextMsg | P, T | Offer sent from SP to DC (XML). |
|  | pOrder | Queue (SP) | TextMsg | P, T | Order sent from DC to SP (XML). |
|  | pOrderConf | Queue (DC) | TextMsg | P, T | Order confirmation sent from SP to DC (XML). |
|  | invoice | Queue (HQ) | TextMsg | P, T | Order invoice sent from SP to HQ (XML). |
|  | pShipInfo | Queue (DC) | TextMsg | P, T | Shipment from SP registered by RFID readers upon arrival at DC. |
|  | pShipConf | Queue (SP) | TextMsg | P, T | Shipment confirmation sent from DC to SP (XML). |
|  | statInfoShipDC | Queue (HQ) | StreamMsg | NP, NT | Purchase statistics sent from DC to HQ. |
| 3 | priceUpdate | Topic (HQ) | MapMsg | P, T, D | Price update sent from HQ to SMs. |
| 4 | inventoryInfo | Queue (SM) | TextMsg | P, T | Item movement registered by RFID readers in the warehouse of SM. |
| 5 | statInfoSM | Queue (HQ) | ObjectMsg | NP, NT | Sales statistics sent from SM to HQ. |
| 6 | productAnnouncement | Topic (HQ) | StreamMsg | NP, NT, ND | New product announcements sent from HQ to SMs. |
| 7 | creditCardHL | Topic (HQ) | StreamMsg | NP, NT, ND | Credit card hotlist sent from HQ to SMs. |

Properties: P = persistent, NP = non-persistent, T = transactional, NT = non-transactional, D = durable, ND = non-durable.
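To follow the approach described above for assembling a custom workload, Table 2 can be queried for the interactions that produce a given kind of message. The sketch below encodes the table's destination type, persistence and durability columns; the `MsgType` and `WorkloadQuery` names are hypothetical. Filtering for non-persistent topic messages, for example, yields Interactions 6 and 7, matching the pub/sub example given earlier.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

// One row of Table 2, reduced to the columns needed for selecting interactions.
final class MsgType {
    final int interaction;
    final String name;
    final boolean topic;      // Topic (true) or Queue (false)
    final boolean persistent; // P (true) or NP (false)
    final boolean durable;    // D; false for queues and non-durable topics

    MsgType(int interaction, String name, boolean topic, boolean persistent, boolean durable) {
        this.interaction = interaction;
        this.name = name;
        this.topic = topic;
        this.persistent = persistent;
        this.durable = durable;
    }
}

final class WorkloadQuery {
    static final List<MsgType> TABLE2 = List.of(
        new MsgType(1, "order", false, true, false),
        new MsgType(1, "orderConf", false, true, false),
        new MsgType(1, "shipDep", false, true, false),
        new MsgType(1, "statInfoOrderDC", false, false, false),
        new MsgType(1, "shipInfo", false, true, false),
        new MsgType(1, "shipConf", false, true, false),
        new MsgType(2, "callForOffers", true, true, true),
        new MsgType(2, "offer", false, true, false),
        new MsgType(2, "pOrder", false, true, false),
        new MsgType(2, "pOrderConf", false, true, false),
        new MsgType(2, "invoice", false, true, false),
        new MsgType(2, "pShipInfo", false, true, false),
        new MsgType(2, "pShipConf", false, true, false),
        new MsgType(2, "statInfoShipDC", false, false, false),
        new MsgType(3, "priceUpdate", true, true, true),
        new MsgType(4, "inventoryInfo", false, true, false),
        new MsgType(5, "statInfoSM", false, false, false),
        new MsgType(6, "productAnnouncement", true, false, false),
        new MsgType(7, "creditCardHL", true, false, false));

    // Interactions that send at least one message type matching the filter.
    static Set<Integer> interactionsWith(Predicate<MsgType> filter) {
        Set<Integer> result = new TreeSet<>();
        for (MsgType m : TABLE2) {
            if (filter.test(m)) {
                result.add(m.interaction);
            }
        }
        return result;
    }
}
```

For persistent, durable pub/sub the same query returns Interactions 2 and 3, and for non-persistent P2P it returns Interactions 1, 2 and 5.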
As mentioned earlier, the goal of the horizontal topology is to exercise the ability of the system to handle an increasing number of destinations. To achieve this, the workload is scaled by increasing the number of physical locations (SMs, DCs, etc.) while keeping the traffic per location constant. A scaling parameter BASE is introduced that has to be set by the user before running the benchmark. As BASE is increased, the overall message throughput rises until the system is saturated. For a run to be valid (passed), all queues must be stable and the 90th percentiles of delivery times must not exceed 5 seconds.
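The pass criterion can be sketched as follows. Two details are assumptions rather than the benchmark's specified procedure: the 90th percentile is computed per message type using the nearest-rank method, and queue stability is passed in as a precomputed flag, since this article does not say how stability is judged. The `RunValidity` class is hypothetical.

```java
import java.util.Arrays;
import java.util.Map;

final class RunValidity {
    static final double MAX_P90_SECONDS = 5.0;

    // 90th percentile by the nearest-rank method: the sample at 1-based rank
    // ceil(0.9 * n) after sorting (assumed method, not taken from the benchmark).
    static double percentile90(double[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (9 * sorted.length + 9) / 10; // integer form of ceil(0.9 * n)
        return sorted[rank - 1];
    }

    // A run passes only if all queues are stable and, for every message type,
    // the 90th percentile delivery time is at most 5 seconds. Stability is a
    // caller-supplied input here (hypothetical; how it is judged is not shown).
    static boolean isValid(boolean allQueuesStable, Map<String, double[]> deliveryTimesSec) {
        if (!allQueuesStable) {
            return false;
        }
        for (double[] times : deliveryTimesSec.values()) {
            if (percentile90(times) > MAX_P90_SECONDS) {
                return false;
            }
        }
        return true;
    }
}
```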
The reported benchmark metric for a valid horizontal run is called SPECjms2007@Horizontal and is equal to the value of the BASE parameter at which the benchmark was run. The user is expected to perform multiple runs with increasing BASE values until reaching the highest value at which the conditions for a valid run are still satisfied. The result at that value is normally submitted to SPEC as an official result for publication.
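The procedure of raising BASE until a run fails can be written as a simple sweep. In the sketch below, the hypothetical `runIsValid` predicate stands in for a complete benchmark run followed by the validity check, and the sweep stops at the first failing BASE, which assumes that any BASE above a failing one would also fail. The start value and step size are user choices, not values prescribed by SPEC.

```java
import java.util.function.IntPredicate;

final class BaseSearch {
    // Runs at BASE = start, start + step, ... up to maxBase and returns the last
    // BASE whose run was valid, or -1 if the first run already failed.
    static int highestValidBase(int start, int step, int maxBase, IntPredicate runIsValid) {
        int best = -1;
        for (int base = start; base <= maxBase; base += step) {
            if (!runIsValid.test(base)) {
                break; // saturated: larger BASE values are assumed to fail too
            }
            best = base;
        }
        return best;
    }
}
```

Because each run is expensive, a bisection over BASE could find the same point with fewer runs under the same monotonicity assumption; the linear sweep simply mirrors the procedure described in the text.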

Figure 4: # Locations for Horizontal Topology
Figure 4 shows how the number of locations of each type is scaled as the BASE parameter is increased. The rates at which interactions are initiated by participants are fixed so that the traffic per location (and therefore also per destination) remains constant. The relative weights of the interactions are set based on a detailed business model of the supermarket supply chain, which captures the interaction interdependencies. This model has several input parameters (e.g., total number of product types, size of supermarkets, average number of items sold per week) whose values are chosen so that the overall messaging mix approximates a target mix as closely as possible.
The goal is to put equal weight on P2P and pub/sub messaging. Within each group, the target relative weights of persistent vs. non-persistent messaging have been set according to the relative usage of these messaging styles in real-life applications. Figure 5 shows the message mix in the horizontal topology. When the workload is scaled, the proportions of the different types of messages remain constant. The sizes of the messages used in the various interactions have been chosen to reflect typical message sizes in real-life MOM applications. Pub/sub messages are generally much smaller than P2P messages due to the decoupled nature of the delivery mechanism.
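Checking a message mix amounts to computing each category's share of the total. The sketch below (hypothetical `MessageMix` class) does this for counts keyed by category name, sums the shares of a group such as all P2P categories, and shows why a uniform scaling leaves the mix unchanged: multiplying every count by the same factor does not change any share. The category names used with it are placeholders, and any counts fed in would come from the user's own measurements, not from SPECjms2007 results.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MessageMix {
    // Each category's share of all messages, in percent.
    static Map<String, Double> shares(Map<String, Long> counts) {
        long total = 0;
        for (long c : counts.values()) {
            total += c;
        }
        if (total == 0) {
            throw new IllegalArgumentException("no messages counted");
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            result.put(e.getKey(), 100.0 * e.getValue() / total);
        }
        return result;
    }

    // Multiplies every count by the same factor (a uniform scaling of the workload).
    static Map<String, Long> scale(Map<String, Long> counts, long factor) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            result.put(e.getKey(), e.getValue() * factor);
        }
        return result;
    }

    // Sum of the shares whose category name starts with the given prefix.
    static double shareWithPrefix(Map<String, Double> shares, String prefix) {
        double sum = 0;
        for (Map.Entry<String, Double> e : shares.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                sum += e.getValue();
            }
        }
        return sum;
    }
}
```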

Figure 5: Horizontal Topology Message Mix
In the vertical topology, a fixed set of physical locations is used and the workload is scaled by increasing the rate at which interactions are executed. Similar to the horizontal case, a single parameter BASE is used as a scaling factor and the user is expected to scale the workload up to the highest BASE at which the conditions for a valid run are still satisfied. The metric for the vertical topology is called SPECjms2007@Vertical. Again, the relative weights of the interactions are set based on the business model of the supply chain scenario. Unlike the horizontal topology, however, the vertical topology places the emphasis on P2P messaging, which accounts for 80% of the total message traffic.
The aim is to exercise the ability of the system to handle increasing traffic through a destination by processing messages in parallel. This aspect of MOM server performance is more relevant for P2P messaging (queues) than for pub/sub messaging, where the message throughput is inherently limited by the speed at which subscribers can process incoming messages. Figure 6 shows the achieved message mix in the vertical topology. As expected, the message mix remains constant as the workload is scaled.
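Processing messages from a single queue in parallel follows the competing-consumers pattern: several consumers read from the same queue, and each message is delivered to exactly one of them, so adding consumers can raise the throughput of that queue. The sketch below models this with a `java.util.concurrent.LinkedBlockingQueue` standing in for a JMS queue; the `CompetingConsumers` class is hypothetical, and all messages are enqueued up front so that consumers can stop once the queue is empty. A topic behaves differently: every subscriber receives its own copy of each message, so adding subscribers does not spread one stream of messages across them.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class CompetingConsumers {
    // Puts `messages` ids on one queue, drains it with `consumers` threads and
    // returns how often each id was processed (exactly once under P2P semantics).
    static Map<Integer, AtomicInteger> run(int messages, int consumers) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < messages; i++) {
            queue.add(i);
        }
        Map<Integer, AtomicInteger> processed = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        for (int c = 0; c < consumers; c++) {
            pool.execute(() -> {
                Integer msg;
                // poll() atomically removes the head, so each message reaches one consumer
                while ((msg = queue.poll()) != null) {
                    processed.computeIfAbsent(msg, k -> new AtomicInteger()).incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("consumers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for consumers", e);
        }
        return processed;
    }
}
```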

Figure 6: Vertical Topology Message Mix
SPECjms2007 provides a flexible and robust tool that can be used for in-depth performance evaluation of MOM servers. The benchmark allows users to customize the workload to their needs by configuring it to stress selected features of the MOM infrastructure in a way that resembles a given target customer workload. However, in order to take advantage of this, users need to understand the way the workload is decomposed into components and which performance aspects are exercised by these components. In this article, we first introduced the business scenario and workload modeled by SPECjms2007 and then looked at the benchmark design and internal architecture. We presented a characterization of the workload looking at the interaction and message mixes and the way they are scaled. The characterization, on the one hand, aims to help users gain an in-depth understanding of the SPECjms2007 workload, so that they can interpret the benchmark results correctly. On the other hand, it provides the information needed to enable users to tailor the workload to their own requirements.
SPECjms2007 provides a representative workload for measuring the performance and scalability of MOM servers. It can be used for the following purposes:
Samuel Kounev serves as the release manager of SPEC's Java subcommittee and was actively involved in the development and specification of the SPECjAppServer and SPECjms set of industry-standard benchmarks. He is a BEA Technical Director and holds a Ph.D. in computer science from Technische Universitaet Darmstadt and an M.Sc. degree from the University of Sofia.
Kai Sachs was the lead developer of the SPECjms2007 benchmark. He works in the Databases & Distributed Systems Group at Technische Universitaet Darmstadt (Germany).