Data in SOA, Part I: Transforming Data into Information

by Richard Manning

Data and data management are key aspects of nearly every enterprise software solution. SOA is no exception. Effective data modeling and management are an essential part of successful SOA realization. To take your data to the next level you need to transform it into information; to take your information to the next level you need to transform it into knowledge. This article is the first in a series of two articles on “Data in SOA: Transforming Data into Knowledge.” In this article I describe an approach to transforming data into information in SOA as part of an overall SOA transformation plan, with a definition of a SOA Reference Architecture (SOA RA), and the realization of an enterprise SOA. In Part II of this series I describe an approach to transforming information into knowledge for SOA as an extension to an overall SOA transformation plan and a high-value expansion of an enterprise SOA RA.

Why Data?

Data are ubiquitous (data is plural; datum is singular though both plural and singular verbs can be used with "data"). At their core, most IT efforts are focused on collecting, distributing, and managing data, providing data when it's needed, where it's needed, how it's needed, and for whomever (with proper authorization) needs it. Some may recall that long before the term IT ("information technology") was coined, most enterprises called their "computer departments" and activities DP, or “Data Processing.” With all the technology waves past, present, and into the foreseeable future, one constant has remained: data. The same data that were (and still likely are) processed by mainframes have also likely been processed by one or more of client-server, CORBA/DCOM, Java EE, .NET, Web services, SOA, and Web 2.0. Over time, the storage, formats, and transports may have changed, and how the data is processed has changed, but the "data" remain (and are growing).
In essence, all the industry technology waves have one thing in common: they are new or improved ways to process data. Data are fundamental. If you agree with my premise that data are fundamental to enterprise solutions, it follows that data (and data modeling/management) are also a priority consideration for enterprise architects in SOA (and Web 2.0).

What are Data?

Let's start by selecting your favorite dictionary definition for "data," and then augment it. For the purpose of this article, data are the elemental, atomic, or low-level aggregation of pieces of "information" with some structure (form), relations, and state, but no behavior. For example, an Address table with columns for Street Address, City, and so on, is an example of data, as is the definition of an Address in a Customer Table. Data are structure and state without behavior. Data are the raw building blocks from which we may construct information. Data are the prerequisite for Information.

What is Information?

Again, choose your favorite definition, and then augment it. For the purpose of this article, information is the aggregation of data and the fundamental logic that provides additional form, the basic relations, and syntactic and semantic contexts; that is, it is state and core model behavior. For example, information includes the logic that ensures a ZIP code is valid and consistent with the City. Information extends data by providing the ability to map, or relate, data, and define logic for the behavioral models consistent with the domain (syntax and semantics) context. Information is based on and requires data. In other words, information represents entities (subjects, objects) that encapsulate both state (data) and behavior (logic). You may consider information as being analogous to an instance of a model class in object-oriented programming which contains both data members (instance variables) that hold state and methods that provide (model) behavior.
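The distinction between data and information can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `AddressData` and `AddressInfo`, and the tiny `ZIP_TO_CITY` lookup table, are hypothetical and stand in for whatever schema and postal reference source an organization actually uses.

```python
from dataclasses import dataclass

# Hypothetical lookup table standing in for a real postal reference source.
ZIP_TO_CITY = {
    "94105": "San Francisco",
    "10001": "New York",
}


@dataclass
class AddressData:
    """Data: structure and state only, no behavior."""
    street: str
    city: str
    zip_code: str


class AddressInfo:
    """Information: the same state plus core model behavior (domain logic)."""

    def __init__(self, data: AddressData) -> None:
        self.data = data

    def has_valid_zip_format(self) -> bool:
        # Syntactic check: exactly five ASCII digits.
        z = self.data.zip_code
        return len(z) == 5 and z.isascii() and z.isdigit()

    def zip_matches_city(self) -> bool:
        # Semantic check: the ZIP code must belong to the stated city.
        expected = ZIP_TO_CITY.get(self.data.zip_code)
        return expected is not None and expected.lower() == self.data.city.lower()

    def is_consistent(self) -> bool:
        return self.has_valid_zip_format() and self.zip_matches_city()
```

Here `AddressData` corresponds to a row in an Address table, while `AddressInfo` wraps that row with the syntactic and semantic rules of the domain, which is the sense in which information "extends" data.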
The Value of Data in SOA

Organizations have different drivers, starting points, and priorities for defining and refining their SOA Reference Architecture (SOA RA), which may shift during their transformation to SOA. A holistic approach to the planning and design of a SOA RA should include the data services layer. This article uses the term data services layer to include both data and information access services. Without an enterprise data services layer in your SOA RA, subsequent line-of-business (LoB) projects will be forced to develop individual "point," or one-off solutions, that are specific to each application. Few commonalities will be discovered, few opportunities for shared service definition, reuse, and consistency will be discovered, and the definition of a canonical data model will be elusive. There is a good chance that many of the benefits of SOA (and ROI) will take longer to realize, if they are realized at all.

We’ve probably all read statistics that place project resource consumption on data integration tasks at anywhere from 50 to 85 percent of enterprise application software development! This anecdotal "fact" alone should be enough to ensure a data services layer is an integral part of any SOA realization. Combined with the obvious notion that our enterprise software solutions are primarily designed to process data, the value of data in SOA should also be apparent.

Figure 1 is a high-level conceptual view of BEA's SOA Reference Architecture, which illustrates high-level layers. Note the presence of the data services layer as a first-class area, indicating the importance of the data services layer in a SOA RA.
Data, data models, and data management are fundamental to SOA success. In fact, BEA values data services so highly that not only do we offer the AquaLogic Data Services Platform product, but data services are a fundamental part of many BEA Consulting service offerings, which include a Data Services Consulting Service where the focus is on SOA data and information layer planning, design, and development.

A Note About Data Access and Connectivity Services

Data access services refer to information sources often collectively known as Enterprise Information Systems (EIS) as well as databases and file systems. These can be legacy systems, systems of record, packaged commercial applications, customer, partner, and third-party applications and services, and Web services. What they have in common is that they provide data and/or information (which implies behavior in the context of this article) for consumption by other applications. In this sense, these applications when accessed through the data services layer are just another form or source of data. At a higher level of abstraction, data services would look the same to consuming applications, which is one of the primary goals (normalization/consistency) of the data services layer in SOA RA. The fact that the interface exposed for consumption interacts with one or more databases, tables, back-end, legacy, shrink-wrapped, and/or external systems is an implementation detail encapsulated by the data services layer. Connectivity services are about exposing applications and databases as application services in a standards-based manner.

Transforming Data into Information

So, your organization is planning a transformation to SOA. Investigation and planning on all layers and aspects of the SOA RA (see Figure 1) has started, and you have been tasked with the realization of the data services layer. Now what? Consider the following transformation steps:
Figure 2 provides an example of a possible set of internal abstraction layers for an SOA RA data services layer where we will map the requirements and capabilities from our 9 steps:
Figure 2: Data services layer – internal layer abstraction

Based on your requirements and perspective, you may determine the need for a different set of abstraction layers. At the very least, you should separate the physical and logical layers and distribute your rule types accordingly. Let's now look at each of these steps in more detail.

1) Inventory Existing Data and System Access Assets

The first step is finding out what is out there, that is, what are your current data and information system access assets. What data and information assets (referred to as simply "assets" for the remainder of the article), for example databases, information sources, and applications (meaning legacy, system of record) does your organization have? For each asset you will want to know the supporting metadata such as documentation, history, technology/tools/products/platforms, versions, ownership/management, location, security, and access mechanisms. Depending on the number of assets and their metadata, you may want to consider some sort of metadata catalogue or repository as well as a standard template or set of templates that captures the meta-information in a consistent manner and allows for search.

2) Determine Dependency Matrix

Once you have started or created the asset catalogue, the second step is to determine the dependency matrix. The dependency matrix, also part of the asset meta-information, captures information on who uses the asset, when they use it, frequency/how often, what they do with, or to, the asset (for example, CRUD), where they use it (that is, what type of access—batch, online, real time, reporting). It is also important to understand why a consumer uses a particular asset as that will help with task prioritization as well as provide requirements for your emerging data models. Once you have captured the "who, what, where, when, how, and why" for each known consumer of an asset, you can start to analyze and form generalizations across all asset consumers.
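As a rough illustration of Steps 1 and 2, the following Python sketch shows one possible shape for an asset catalogue entry and a dependency record, and how a simple asset-by-consumer matrix might be derived from them. All names here (`Asset`, `Dependency`, `build_matrix`, `shared_assets`) and the choice of fields are assumptions made for the example; a real catalogue would follow your own metadata template.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Asset:
    """One catalogue entry (Step 1); the fields are an illustrative subset."""
    name: str
    owner: str
    platform: str
    location: str
    access_mechanisms: List[str] = field(default_factory=list)


@dataclass
class Dependency:
    """One consumer's use of one asset (Step 2)."""
    consumer: str          # who
    asset: str             # which asset
    operations: Set[str]   # what: any subset of C, R, U, D
    access_type: str       # where: batch, online, real time, reporting
    frequency: str         # when / how often
    reason: str            # why


def build_matrix(deps: List[Dependency]) -> Dict[str, Dict[str, str]]:
    """Return {asset: {consumer: 'CRUD' subset}} from the dependency records."""
    matrix: Dict[str, Dict[str, str]] = defaultdict(dict)
    for d in deps:
        ops = "".join(op for op in "CRUD" if op in d.operations)
        matrix[d.asset][d.consumer] = ops
    return {asset: dict(row) for asset, row in matrix.items()}


def shared_assets(matrix: Dict[str, Dict[str, str]], min_consumers: int = 2) -> List[str]:
    """Assets used by several consumers are candidates for shared services."""
    return sorted(a for a, row in matrix.items() if len(row) >= min_consumers)
```

Listing the assets with more than one consumer is one simple way to begin forming generalizations across consumers; richer analysis would also compare access types, frequency, and the stated reasons for use.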
The goal is to find opportunities for simplification and reuse by transforming existing assets into SOA Building Blocks. These include, but are not limited to, assets in a service-oriented, self-describing, discoverable form that can be readily utilized in an SOA ecosystem using open, common, industry, and/or organization standards. One definition contained within the set of SOA Building Blocks is your definition of a service. What standards and specifications, and their versions, will be used? For example, specific versions of WSDL, SOAP, UDDI, WS-Security, WS-I Basic Profile, WS-Addressing, XML, and XSD may be required, while others may be optional/recommended. Your data and information access assets will likely take a form consistent with your basic SOA Building Block definition of a "service." (Using your favorite search engine, search on the topics of “Service Identification” and “Service Definition,” which cover this area.)

3) Establish Baseline Metrics/SLAs

Each catalogued asset, since it already exists in some form, should have estimated or actual production usage statistics, including transaction volume, patterns, concurrent users, reliability, availability, scalability, and performance (RASP) information.
Usage information is also a great indicator of business and IT value and priority. This baseline information is used to define a set of metrics that will form the basis of Service Level Agreements (SLAs) and allow for goal definition and tracking over time. Metrics, as well as current production information, are invaluable in sizing and capacity planning of both hardware and software to support the data services layer in SOA. Be sure your SLAs are bidirectional, that is, the service provider defines its ... For example, an agreement states that Consumer A may perform a maximum of 100 ... Metrics and SLAs define the expectations and rules of engagement that affect the basis of the value, goal, and sizing of each asset. Track your baseline metrics, SLAs, and reuse to establish a cost and benefits model.

With the preceding set of information captured to some degree, it should be possible to start evaluating each asset in the context of all the other catalogued assets, that is, to assign each asset a priority. A good heuristic is to have at least three and no more than ten priority levels (and ten is already excessive); fewer would be inadequate and more would be unmanageable. Priority assignments are designed to assist in the identification of the most important assets based on utilization and the value of the business functions supported. You should design a set of metrics (including those in Step 3) and definitions that provide for empirical comparison and evaluation of each asset to determine its priority assignment. Assigning asset priorities will help determine possible project starting points, potential business/IT sponsors, and relative business value.

Using all of the preceding information, a "current reality" snapshot for each asset can be established, documented, and tracked as these assets are transformed into SOA building blocks. Across all catalogued assets, the highest-priority assets should be selected for the remaining set of steps.
The actual number selected depends on your risk assessments, priority valuation, business/IT goals, resources, and similar factors.
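One way to make priority assignment empirical is to compute a score per asset from its baseline metrics and map that score onto a fixed number of priority levels. The Python sketch below is a hypothetical example only: the `AssetMetrics` fields, the 1-to-5 `business_value` rating, and the weights in `score` are all assumptions chosen for illustration, not values from this article.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AssetMetrics:
    name: str
    daily_transactions: int
    concurrent_users: int
    business_value: int  # hypothetical 1-5 rating agreed with business sponsors


def score(m: AssetMetrics, max_tx: int, max_users: int) -> float:
    """Weighted score in [0, 1]; the weights are illustrative assumptions."""
    tx = m.daily_transactions / max_tx if max_tx else 0.0
    users = m.concurrent_users / max_users if max_users else 0.0
    value = m.business_value / 5
    return 0.3 * tx + 0.2 * users + 0.5 * value


def assign_priorities(assets: List[AssetMetrics], levels: int = 3) -> Dict[str, int]:
    """Map each asset to a priority level, where 1 is the highest priority."""
    if not 3 <= levels <= 10:
        raise ValueError("use at least three and no more than ten priority levels")
    max_tx = max((a.daily_transactions for a in assets), default=0)
    max_users = max((a.concurrent_users for a in assets), default=0)
    result: Dict[str, int] = {}
    for a in assets:
        s = score(a, max_tx, max_users)
        # Split [0, 1] into equal-width bands; the top band becomes level 1.
        band = min(int(s * levels), levels - 1)
        result[a.name] = levels - band
    return result


def top_priority(priorities: Dict[str, int]) -> List[str]:
    """Names of the assets at priority level 1, in alphabetical order."""
    return sorted(name for name, level in priorities.items() if level == 1)
```

Because the usage metrics are normalized against the largest values in the catalogue, the resulting levels are relative to the other catalogued assets, which matches the idea of evaluating each asset in the context of all the others. How many of the top-ranked assets to carry forward remains a judgment call.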