Pinaki Poddar's Blog
Pinaki Poddar currently works for BEA Systems on J2EE Persistence Services. He developed Java-based persistence solutions for the Versant object database and middleware infrastructure for the finance (Dresdner Bank) and healthcare (Siemens Medical) industries over the last 15 years. He contributed to OpenAdaptor -- the first Open Source Enterprise Application Integration framework. In a past life, he researched Neural Network based Automatic Speech Recognition techniques. His current interests lie in Business Intelligence solutions for very large databases and Complex Event Processing.
Slice: OpenJPA for Distributed Databases: Part II
Posted by pinaki.poddar on January 8, 2008 at 10:28 PM
In my last post, I talked about Slice, a plug-in extension of OpenJPA for distributed database environments. Slice provides key features that help developers build distributed database applications.

As the adjoining schematic suggests, Slice plugs in on the southbound interfaces of OpenJPA, where it interacts with the database. Slice abstracts a set of distributed databases as a single virtual database so that the rest of the OpenJPA kernel can operate exactly as it does against one database. The key features that will help you build a distributed transactional application are:

- Non-intrusive: No change in the application code or persistent domain model. Absolutely.
- Customized Data Distribution Policy: Implement a single method to decide which database slice will persist a new instance.
- Per-slice Configuration: Each database slice can be configured with its own database driver or any other properties.
- Automatic Tracking: Slice remembers the original database slice for any instance that is loaded from the database as a result of a query or find() operation. Slice also traverses the relationships that are annotated as CascadeType.PERSIST. So once the user-defined data distribution policy decides a database slice for a root instance, all the instances related to the root instance are automatically assigned the same database slice.
- Parallel Query: All major database operations (query and flush) are executed in parallel across the database slices.
- Distributed Transaction: If each slice is XA-compliant, then even if the persistence unit is configured for RESOURCE_LOCAL transactions, Slice will employ a two-phase commit protocol.

Figure: Salient Features and Overview of Slice

The detailed instructions and downloads are available at http://people.apache.org/~ppoddar/slice/site/index.html.
Slice: OpenJPA for Distributed Databases
Posted by pinaki.poddar on December 21, 2007 at 4:01 PM
Slice is an OpenJPA plug-in for horizontally-partitioned, distributed databases. As distributed databases are becoming increasingly common in the enterprise IT ecosystem, I considered extending OpenJPA to transact against a set of databases instead of a single one. The result is Slice -- a project that is now available for download from Apache Labs.

The salient features of Slice are
- available as a configurable OpenJPA plug-in in a single, separate jar
- transacts against a set of horizontally-partitioned databases
- requires no change in your OpenJPA application
- queries in parallel across multiple databases and consolidates the results in a single list for your application
- remembers the database from which each instance was loaded, and commits changes to the appropriate databases
- depends on the user application to distribute newly persistent instances among the set of databases.
Effectively, using Slice, your OpenJPA application can have an object-oriented, modifiable view that spans multiple databases.

If all this sounds too good to be true, here comes the fine print: Slice does not support relationships across different databases. In the O-R mapping paradigm, this limitation translates to a collocation constraint, i.e. the closure of any object graph must be collocated in the same database. You have to implement a distribution policy to honor this collocation constraint (I just said no change in code a few lines ago). But honestly, the interface is really simple (see later).

Getting Started with Slice

A few simple steps will make your current OpenJPA application ready for distributed databases.

1. Configure the persistence unit by editing META-INF/persistence.xml

    <property name="openjpa.BrokerFactory" value="slice"/>

2. Specify the distributed database URL by concatenating the URL of each participating database with the | character

    <property name="openjpa.ConnectionURL" value="jdbc:mysql://localhost/slice1|jdbc:mysql://localhost/slice2"/>

3. Specify your distribution policy implementation class name

    <property name="slice.DistributionPolicy" value="com.acme.policy.MyDistributionPolicy"/>

4. Implement the distribution policy. The interface is really simple:

    package org.apache.openjpa.slice;

    public interface DistributionPolicy {
        int distribute(Object pc, String[] urls);
    }

Slice will call this method whenever it needs to persist a new instance. The second argument to the method is the list of database URLs in the same order as in the openjpa.ConnectionURL setting. Your implementation must return a zero-based integer index denoting the database selected for a given instance. Remember the collocation constraint when implementing the logic of this method. For example, if Person A refers to Address B then the distribution policy must select the same database index for both. However, if Person C relates to Address D then C and D can reside in a different database from A and B.

5. That is it. Run your OpenJPA application.

Some code examples are available in the Slice source code base maintained in the Apache Labs Subversion repository for you to take a look at.

Design Notes

Slice uses a distributed template pattern to abstract a set of distributed databases as a virtual database. A distributed template pattern is a type T' which specializes a type T as well as delegates all its operations to a set of concrete instances of T. In its bare-bones simplicity, it looks as follows:

    public class Distributed<T> extends T {
        private Collection<T> _delegates;

        public void doSomething(Object... args) {
            for (T t : _delegates)
                t.doSomething(args);
        }
    }

Read it as a schematic rather than compilable Java: the language does not allow a class to extend its own type parameter.

Slice has applied this distributed template pattern to the internal interfaces OpenJPA uses to interact with a relational database, e.g. StoreManager and StoreQuery, as well as to key JDBC abstractions, e.g. DataSource, Connection and PreparedStatement. As the distributed resources are introduced via the OpenJPA plug-in mechanism, the rest of the OpenJPA kernel works exactly the same way, oblivious of the fact that the database operations are getting multiplexed via the distributed template pattern across all the configured databases instead of a single one.

The other key issue in this design is to tag each instance with the identifier of the database from which it has been loaded. OpenJPA manages your persistent entities via a proxy referred to as OpenJPAStateManager. The database identifier is attached to this proxy instance. This is possible thanks to the foresight of the original designers, who kept a placeholder in the proxy object for associating arbitrary content that remains opaque to the rest of the object management kernel.

Conclusion

The OpenJPA architecture uses well-designed interfaces and sophisticated plug-in mechanics for ease of extensibility. We leveraged this extensibility to develop Slice -- an OpenJPA plug-in for distributed databases, as they are becoming increasingly common for enterprise applications. Happy Holidays!
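To make the collocation constraint from step 4 concrete, here is a minimal sketch of a distribution policy. The DistributionPolicy interface is declared locally, copied from the post, so that the example compiles on its own. The Person and Address classes, the back-reference from Address to Person, and the choice to hash on the person's name are all assumptions made for illustration. They are not part of Slice.

```java
import java.util.Objects;

// Interface as quoted in the post, declared locally so the sketch compiles alone.
interface DistributionPolicy {
    int distribute(Object pc, String[] urls);
}

// Hypothetical domain classes; not part of Slice or OpenJPA.
class Address {
    final String street;
    Person resident; // back-reference used to find the root of the object graph

    Address(String street) {
        this.street = street;
    }
}

class Person {
    final String name;
    final Address address;

    Person(String name, Address address) {
        this.name = name;
        this.address = address;
        address.resident = this;
    }
}

// Picks a slice from the name of the Person at the root of the graph, so a
// Person and the Address it refers to always receive the same index.
class NameHashPolicy implements DistributionPolicy {
    @Override
    public int distribute(Object pc, String[] urls) {
        String key;
        if (pc instanceof Person) {
            key = ((Person) pc).name;
        } else if (pc instanceof Address) {
            Person root = Objects.requireNonNull(((Address) pc).resident,
                    "Address must be attached to a Person before it is persisted");
            key = root.name;
        } else {
            throw new IllegalArgumentException("Unexpected type: " + pc.getClass());
        }
        // floorMod keeps the index non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), urls.length);
    }
}

class NameHashPolicyDemo {
    public static void main(String[] args) {
        String[] urls = {
            "jdbc:mysql://localhost/slice1",
            "jdbc:mysql://localhost/slice2"
        };
        DistributionPolicy policy = new NameHashPolicy();
        Address b = new Address("1 Main Street");
        Person a = new Person("A", b);
        System.out.println("collocated: "
                + (policy.distribute(a, urls) == policy.distribute(b, urls)));
    }
}
```

Deriving the index from a property of the root instance is one simple way to keep a whole object graph in one database. Any scheme works as long as every member of a graph's closure maps to the same index.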
Persistence of Generic Graph
Posted by pinaki.poddar on August 31, 2007 at 7:34 PM
In this blog, I will discuss persistence of a graph: not only because graphs are such a common metaphor in modeling, but also because a graph-oriented domain model is a suitable vehicle to discuss some aspects of mapping a generically typed object model to a concrete, table-oriented relational database.

Graph is a common metaphor in many domain models, because graphs support the intuitive notion that everything is connected to everything else. In its most abstract form, a graph is defined as a set of connected nodes. A connection between two nodes is often termed a link or an edge. An edge can be either directed or undirected. The theory of random graphs was laid out by Paul Erdős and Alfréd Rényi -- two Hungarian mathematicians -- back in the late 1950s. Their analysis considered graphs where each edge connects any two nodes with uniform probability. An excellent book by Albert-László Barabási titled Linked explores how a new graph theory is emerging to analyze real-world networks that are not so random, including the mother of all networks -- the World Wide Web. Grab a copy and you will be amazed. It may even help you get VC funding for the next great Web 2.0 start-up, as it does discuss Social Networks.

The question I ask in this blog about graphs: How to model, persist and operate on a generic Java object graph that can be stored in a relational database?

As you know, Java, since 5.0, added support for generics (what C++ called Templates and was a favorite topic of any interviewer back in early OO days). Simply put, generics let you treat a type as a variable. For example, if I want to define a list of People or Cities or something else, instead of defining separate lists, I use a generic List<T> such that the type of things that the List handles itself becomes a variable (represented as <T>). Similarly for Graphs -- I would like to describe a single generic type Graph<T> that will work well for all node types T without knowing whether T really represents a City, Person or URL.
Mapping Definition for a Generic Persistent Graph

What is the persistent state of such a generic graph? Of course, the content of its nodes and the information about its edges. The challenge of modeling such a generic graph is to assume no details about the exact nature of the nodes (because they can be anything) and still be able to define a persistence mapping solution. Also note that JPA provides support for persistence of similar container data structures such as List or Map -- but as second-class objects, i.e. they can be stored as part of some first-class Entity but not by themselves. Here I am suggesting to promote Graph to a first-class data structure -- one that comes with its own identity and can be stored in its own right. Here is the definition of such a generic graph without the persistence specification

    public class Graph<T> implements Serializable {
        private Map<String, T> nodes;
        private List<Edge<T>> edges;
    }

The first thing to note is that we are leveraging the fact that the core Java collection classes, i.e. Map<K,V> and List<T>, are themselves generic: the key, element and value types they carry are declared as type variables. Also notice that the Graph we defined carries not a List of nodes but a Map of nodes. This map holds the nodes keyed by String. These keys label the nodes. A label is the contextual identity of a node within a graph. The label of a node in a Graph can be different from any intrinsic identity that the node may have. So one can consider, for example, the same Person instance whose intrinsic moniker is 'Pinaki Poddar' and label him as 'BEA Engineer' in one graph and 'Small Black Indian' in another. This labeling scheme also indirectly points to another aspect: a graph does not own its nodes or, in other words, a node can exist independent of a graph. This aspect is important because it allows a node without any knowledge of a graph to become a member of a graph. What are the types of the nodes contained by the graph according to our definition?
We do not know at this time. The type of the node is a variable represented as <T>. Edge is also defined generically as a structure that connects two nodes of variable type T

    public class Edge<T> implements Serializable {
        private T source;
        private T target;
        private Map<String,String> attributes = new HashMap<String,String>();
    }

Edge represents a connection between two nodes. It is clear from the way we named the first two state variables of Edge that we are modeling a directed edge. What is the attributes field in Edge? This field allows us to qualify a connection. For example, consider a connection between two Persons, say "Pinaki" and "Ashok" (who was our manager and left us for a famous Internet Search company). A non-attributed edge can represent the fact that "Pinaki" and "Ashok" as two nodes of a graph are connected. But that is not enough. They can be connected as Employee-Manager and/or as Friend-Friend. One way would be to draw multiple edges between the same pair of nodes; the other way is to use an edge with multiple attributes. In the Unified Modeling Language, such a class that qualifies an association would be called an Association Class.

The other important function of an edge is that it allows a node to be agnostic of its neighboring nodes. In fact, a node has no notion of or dependency on either Graph or Edge. This complete independence of a node was brilliantly used in the stream-based integration model of Unix pipes, where cat and grep and wc (the nodes), with absolutely no knowledge of each other, can still be piped/teed (i.e. connected in a graph) based only on a mutual agreement on a common stream format, to marvelous effect. Based on similar principles, in my past life in Europe, under the guidance of a Guru who taught me the power of simplicity, we built a system integration toolkit that actually ran (and still runs) the middleware of a global investment bank, long before SOA and SCA and ESB became a technological tsunami.
But that is a story for another day.

Making Graph<T> Persistent

Now let us annotate the persistent state of our Graph and Edge. First we show how to annotate the Graph.nodes field declared as a Map.

    @Entity
    @DataStoreId
    public class Graph<T> implements Serializable {
        @PersistentMap(keyType = String.class,
                       elementType = Entity.class,
                       elementCascade = {PERSIST, REFRESH, MERGE})
        private Map<String, T> nodes;

        @OneToMany(cascade = ALL)
        @ElementDependent
        private List<Edge<T>> edges;
    }
We are saying that the Map of nodes is a Persistent Map with String as key and Entity as value. That the keys are of type String could have been left unspecified, as that can be inferred from the declaration of the Map itself. For the values of the map, elementType states that the values, i.e. the nodes, are Entity -- the core annotation type defined by JPA to mark a type as persistence-capable. So effectively, the node types of our Graph can vary but they must be persistence-capable types. If we remove this constraint and let our nodes be of any type, the only option a JPA mapping will have is to serialize the whole map of nodes as a stream of bytes and store it in a database column as a BLOB. That is not very intelligible and surely not amenable to queries based on the nodes' properties. Hence this useful restriction that says our persistent generic graph only models nodes that are themselves persistence-capable.

Finally, the cascade qualifier. Cascading qualifiers are one of the many powerful features of JPA. In pure Java, relations are not attributed. In JPA, we can annotate a relation to designate which persistence operations propagate (or cascade) to the related Entity. Here, when a Graph instance is persisted, refreshed or merged, all its nodes will be persisted, refreshed and merged too. But we did not say that when a graph is deleted the nodes will be deleted. This is in accordance with our modeling goal that nodes exist independent of the graph they are members of.

An Edge, however, has no independent existence, and hence all the persistence operations on a graph, including deletion, cascade to its edges. The field Graph.edges represents a @OneToMany relationship from a Graph and is annotated as such. The field edges is also annotated as @ElementDependent. This implies that each element of this List is completely dependent on its owning Entity, i.e. Graph. This is a stronger dependency than CascadeType.REMOVE, which only says that if we delete a Graph its edges are deleted too.
@ElementDependent, by contrast, says that if we remove an edge from its Graph by delinking a pair of nodes (and do not assign that edge to another Graph), the edge is gone forever -- from memory by the Java Garbage Collector and from the datastore by the JPA runtime when the transaction commits.

While the @OneToMany annotation for the edges field is standard JPA, @PersistentMap for the nodes field and @ElementDependent for the edges field are not; they are a few of the extended set of annotations that OpenJPA supports. If we were to use a pure JPA annotation, such as @OneToMany or @ManyToMany, for the nodes field, that would restrict the key of the nodes to be a persistent field (primary key field or otherwise) of the value type. As the value type is variable in this case (except that the values are persistence-capable Entities), we have no way to specify what the key types will be.

Making Edge<T> Persistent

The persistent state of an Edge is defined by its source and target nodes and its attributes, so the respective fields of Edge are annotated as such

    @Entity
    @DataStoreId
    public class Edge<T> implements Serializable {
        @OneToOne
        @Type(Entity.class)
        private T source;

        @OneToOne
        @Type(Entity.class)
        private T target;

        @PersistentMap
        private Map<String,String> attributes;
    }

Edge holds a pair of @OneToOne relations to its source and target nodes. But again, at this point of definition we do not know the exact type of these relation fields. Through the @Type annotation we are telling the OpenJPA runtime that this field will be assigned an instance of an Entity or persistence-capable type. If we do not supply this information, OpenJPA's only option will be to map these fields as BLOBs. The OpenJPA documentation details the different levels of support OpenJPA provides for a field.

Database Schema for Generic Persistent Graph

We have annotated our persistent Graph and Edge. If we now map these generic Graph and Edge classes to a database, what database schema does it create?
To find out, run the MappingTool provided by OpenJPA

    $ java org.apache.openjpa.jdbc.meta.MappingTool -properties META-INF/persistence.xml -sql stdout graph.Graph graph.Edge

The command-line argument -sql stdout asks the tool to generate the DDL in SQL

    CREATE TABLE Edge (id BIGINT NOT NULL, attributes BLOB, source VARCHAR(255), target VARCHAR(255), PRIMARY KEY (id));
    CREATE TABLE Graph (id BIGINT NOT NULL, nodes BLOB, PRIMARY KEY (id));
    CREATE TABLE Graph_Edge (Graph_id BIGINT, edges_id BIGINT);
    CREATE TABLE OPENJPA_SEQUENCE_TABLE (ID TINYINT NOT NULL, SEQUENCE_VALUE BIGINT, PRIMARY KEY (ID));
    CREATE INDEX I_GRPH_DG_ELEMENT ON Graph_Edge (edges_id);
    CREATE INDEX I_GRPH_DG_GRAPH_ID ON Graph_Edge (Graph_id);

    341 test WARN [main] openjpa.MetaData - OpenJPA cannot map field "graph.Edge.attributes" efficiently. It is of an unsupported type. The field value will be serialized to a BLOB by default.
    381 test WARN [main] openjpa.MetaData - OpenJPA cannot map field "graph.Graph.nodes" efficiently. It is of an unsupported type. The field value will be serialized to a BLOB by default.

Oops. The database schema is storing Graph.nodes as a BLOB and warning about it. That is not what we want. What went wrong?

Why pay for Kodo when OpenJPA is free?

Though we have annotated the Graph.nodes map as @PersistentMap with String as key and Entity as value, OpenJPA does not have a mapping strategy for such Map fields whose keys are not derived from a field of their values. But Kodo has.
So, if we run the same MappingTool command with the Kodo libraries in our classpath, we get

    $ java org.apache.openjpa.jdbc.meta.MappingTool -properties META-INF/persistence.xml -sql stdout graph.Graph graph.Edge

    CREATE TABLE Edge (id BIGINT NOT NULL, source VARCHAR(255), target VARCHAR(255), PRIMARY KEY (id));
    CREATE TABLE Edge_attributes (Edge_id BIGINT, _key_ VARCHAR(255), _value_ VARCHAR(255));
    CREATE TABLE Graph (id BIGINT NOT NULL, PRIMARY KEY (id));
    CREATE TABLE Graph_Edge (Graph_id BIGINT, edges_id BIGINT);
    CREATE TABLE Graph_nodes (Graph_id BIGINT, _key_ VARCHAR(255), _value_ VARCHAR(255));
    CREATE TABLE OPENJPA_SEQUENCE_TABLE (ID TINYINT NOT NULL, SEQUENCE_VALUE BIGINT, PRIMARY KEY (ID));
    CREATE INDEX I_DG_TBTS_EDGE_ID ON Edge_attributes (Edge_id);
    CREATE INDEX I_GRPH_DG_ELEMENT ON Graph_Edge (edges_id);
    CREATE INDEX I_GRPH_DG_GRAPH_ID ON Graph_Edge (Graph_id);
    CREATE INDEX I_GRPHNDS_GRAPH_ID ON Graph_nodes (Graph_id);

This output shows that Kodo has defined a schema for Graph that preserves the map's keys and values, where each value is the foreign key to the row that will hold the details of the unspecified node. The output also shows a) that the key-value pairs of the map are held in a separate cross-table called Graph_nodes, b) that indexes are created for faster access and c) that a sequence table is created, because we are using the @DataStoreId annotation for Graph to allocate unique persistent identities.

Persistent Operation on Graph<T>

OK, we have defined a generic Graph, annotated its fields with mapping specifications and with how persistent operations cascade to its relations, and created a database schema accordingly. Now we are ready to write some code that uses such a generic Graph with different Entity classes. For that purpose we write two very simple Entity classes: Person.java and City.java. They are both trivially simple classes except for one difference: they use different identity schemes.
While Person uses application identity by annotating its single field, name, as the identifier, City uses datastore identity. The JUnit test code shown here is used both to demonstrate how to use a generic persistent Graph and to verify that persistence operations work as expected.

The following code block shows one JUnit test using Graph<City>, where City is an Entity using datastore identity. The test is basic -- it creates a few City instances -- the cities that I love -- San Francisco, New York and Rome -- and adds them to a Graph. Because all roads lead to Rome, we link San Francisco and New York to Rome but not vice versa, and they do not link to each other. As we persist the graph, the persist operation cascades to all its constituent nodes, i.e. the City instances, and to the edges. That is verified by fetching the graph again from the database in a new transaction and asserting that nodes and links are present as expected.

    /**
     * Shows the basic steps to create a Graph, add nodes to it, link them and
     * finally persist the Graph with all its nodes and edges.
     *
     * Verifies that the graph and all its persistent nodes and edges are
     * properly persisted and can later be fetched from a datastore.
     */
    public void testGraphOfEntityUsingDatastoreIdentity() {
        // Create a graph of City
        Graph<City> cities = new Graph<City>();

        // Create a few Cities
        City sfo = new City("San Francisco");
        City rome = new City("Rome");
        City newYork = new City("New York");

        // Add them to the Graph with unique labels.
        // Labels are not necessarily the same as their names
        cities.addNode("San Francisco", sfo);
        cities.addNode("Rome", rome);
        cities.addNode("New York", newYork);

        // Link them uni-directionally in a partially-connected graph
        cities.link(sfo, rome, false);
        cities.link(newYork, rome, false);

        // Get a handle to OpenJPAEntityManager
        OpenJPAEntityManager em = getEM();

        // Begin a transaction. An active transaction is required to
        // persist instances
        em.getTransaction().begin();

        // Persist the graph. Due to cascading relations all the nodes and
        // edges of the graph will be persisted as well.
        em.persist(cities);
        em.getTransaction().commit();

        // Remember the identifier of the Graph for finding it later
        Object gid = em.getObjectId(cities);

        // Clear the cache so that we do bring the Graph from the database
        // next time we access it
        em.clear();

        // Begin a new transaction
        em.getTransaction().begin();

        // Look up the graph by its identifier
        Graph<City> g2 = em.find(Graph.class, gid);

        // Verify that the Graph had been stored with all its nodes and edges
        assertNotNull(g2.getNodes());
        assertNotNull(g2.getNode("San Francisco"));
        assertNotNull(g2.getNode("Rome"));
        assertNotNull(g2.getNode("New York"));
        assertTrue(g2.isLinked("San Francisco", "Rome"));
        assertFalse(g2.isLinked("Rome", "San Francisco"));
        assertTrue(g2.isLinked("New York", "Rome"));
        assertFalse(g2.isLinked("San Francisco", "New York"));
        em.getTransaction().rollback();
    }

The second example follows a similar line. Only this time, the nodes of the Graph are Person instances, which use application identity. Also, unlike the previous case, the graph is fully connected. The verification block confirms that our intent is maintained by the code.

    /**
     * Similar to the previous test except that it uses an Entity that is using
     * Application Identity. Also shows how delinking a pair of nodes results
     * in deletion of the edge from the datastore.
     */
    public void testGraphOfEntityUsingApplicationIdentity() {
        // Create a graph of Person
        Graph<Person> people = new Graph<Person>();

        // Create a few Persons
        Person tom = new Person("Tom Sawyer");
        Person dick = new Person("Dick Dickenson");
        Person harry = new Person("Harry Potter");

        // Add them to the Graph with unique labels.
        // Labels are not necessarily the same as their names
        people.addNode("Tom", tom);
        people.addNode("Dick", dick);
        people.addNode("Harry", harry);

        // Link them bi-directionally in a fully-connected graph
        people.link(tom, dick, true);
        people.link(dick, harry, true);
        people.link(tom, harry, true);

        // Get a handle to OpenJPAEntityManager
        OpenJPAEntityManager em = getEM();

        // Begin a transaction. An active transaction is required to
        // persist instances
        em.getTransaction().begin();

        // Persist the graph. Due to cascading relations all the nodes and
        // edges of the graph will be persisted as well.
        em.persist(people);
        em.getTransaction().commit();

        // Remember the identifier of the Graph for finding it later
        Object gid = em.getObjectId(people);

        // Clear the cache so that we do bring the Graph from the database
        // next time we access it
        em.clear();

        // Begin a new transaction
        em.getTransaction().begin();

        // Look up the graph by its identifier
        Graph<Person> g2 = em.find(Graph.class, gid);

        // Verify that the Graph had been stored with all its nodes and edges
        assertNotNull(g2.getNodes());
        assertNotNull(g2.getNode("Tom"));
        assertNotNull(g2.getNode("Dick"));
        assertNotNull(g2.getNode("Harry"));
        assertTrue(g2.isLinked("Tom", "Dick"));
        assertTrue(g2.isLinked("Dick", "Harry"));
        assertTrue(g2.isLinked("Harry", "Tom"));

        // Delink Tom from Dick but keep the other side of the link
        g2.delink("Tom", "Dick", false);

        // Persist the change. Because Edges are @ElementDependent, delinking
        // implies deleting the Edge record from the database too
        em.getTransaction().commit();

        // Verify that the change has removed one Edge but not the other.
        em.getTransaction().begin();
        Graph<Person> g3 = em.find(Graph.class, gid);
        assertFalse(g3.isLinked("Tom", "Dick"));
        assertTrue(g3.isLinked("Dick", "Tom"));
        em.getTransaction().rollback();
    }

Conclusion

We demonstrated how a generic Graph can be used as a first-class object by persistent mapping solutions such as OpenJPA/Kodo. Starting with a generic definition of a Graph in terms of nodes and edges, we described
- how to define the persistent state of such a graph
- how to annotate the persistent states for mapping to a relational database
- what is the corresponding database schema created by OpenJPA runtime under such configuration
- how Kodo extends OpenJPA's capability (and charges you for it) to define advanced mapping scenarios
- how a JPA application can use this generic Graph to operate persistently on any persistent object, be it a City or Person or something else
In the course of this example, we wanted to bring out some advanced mapping features of OpenJPA and Kodo. I hope you will find some use for the generic Graph metaphor in your persistent domain models, and some relevant pointers in this blog.
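The tests in this post call addNode, link, isLinked and delink on Graph, but the post never lists those methods. The following is a minimal in-memory sketch of what they might look like. It carries no persistence annotations. The method bodies, and the reading of the boolean argument as "apply in both directions", are assumptions inferred from the test comments. This is not the author's actual code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A directed edge between two nodes of variable type T.
class Edge<T> {
    final T source;
    final T target;

    Edge(T source, T target) {
        this.source = source;
        this.target = target;
    }
}

// In-memory sketch only: same field layout as the post's Graph<T>,
// without the JPA/OpenJPA annotations.
class Graph<T> {
    private final Map<String, T> nodes = new HashMap<>();
    private final List<Edge<T>> edges = new ArrayList<>();

    void addNode(String label, T node) {
        nodes.put(label, node);
    }

    T getNode(String label) {
        return nodes.get(label);
    }

    Map<String, T> getNodes() {
        return nodes;
    }

    // Adds source -> target, and also target -> source when bidirectional.
    void link(T source, T target, boolean bidirectional) {
        edges.add(new Edge<>(source, target));
        if (bidirectional) {
            edges.add(new Edge<>(target, source));
        }
    }

    // True if a directed edge runs from the first labeled node to the second.
    boolean isLinked(String sourceLabel, String targetLabel) {
        T s = nodes.get(sourceLabel);
        T t = nodes.get(targetLabel);
        if (s == null || t == null) {
            return false;
        }
        for (Edge<T> e : edges) {
            if (e.source.equals(s) && e.target.equals(t)) {
                return true;
            }
        }
        return false;
    }

    // Removes source -> target, and the reverse edge too when bidirectional.
    void delink(String sourceLabel, String targetLabel, boolean bidirectional) {
        T s = nodes.get(sourceLabel);
        T t = nodes.get(targetLabel);
        if (s == null || t == null) {
            return;
        }
        edges.removeIf(e ->
                (e.source.equals(s) && e.target.equals(t))
                || (bidirectional && e.source.equals(t) && e.target.equals(s)));
    }
}
```

With a persistent version of this class, removing an Edge from the list is the step that @ElementDependent would turn into a row deletion at commit.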
Fluid OpenJPA implements SDO Data Access API
Posted by pinaki.poddar on August 1, 2007 at 9:25 PM
Data Access Service (DAS) is the term the Service Data Objects (SDO) specification uses to denote an interface that will store, retrieve and update SDO DataObjects/Graphs. What does the DAS API look like? There is no agreement yet in the community -- but the Tuscany SDO implementation has defined one. It is what a Brit would call "interesting", with his spine leaning slightly backwards. The good thing about the Tuscany DAS API is that it is concise and extensible. It partially follows a Command Object pattern. In essence, the SDO DAS API is
- DASFactory is a factory for, you guessed it, DAS.
- DAS is a factory for Command.
- Command takes positional parameters and can be executed.
- Commands are created with SQL: DAS.createCommand(String sql).
Now here my eyebrows started twitching a little. Do not get me wrong: I respect SQL -- it is the most powerful language for data ever. But as one should not be speaking Spanish in Bhutan, SQL is not the language of choice for objects and graphs. SQL not only exposes internal data representation to the user application, it can also become a hindrance to Service Oriented Architecture. Werner Vogels, the celebrated CTO of Amazon, spelt it out clearly in an interview conducted by the legendary Jim Gray (emphasis added):

"We went through a period of serious introspection and concluded that a service-oriented architecture would give us the level of isolation that would allow us to build many software components rapidly and independently. For us service orientation means encapsulating the data with the business logic that operates on the data, with the only access through a published service interface. No direct database access is allowed from outside the service, and there's no data sharing among the services."

That is why I am apprehensive when I see a data access API in a post-Amazon, Web 2.0 world that uses SQL as its primary mechanism for data manipulation.

Of course, I am biased. I am biased towards JPA. JPA has abstracted years of collective community experience on a fundamental piece of object-oriented application architecture -- its interaction with a relational database -- to arrive at a neat, concise API plus a powerful query language: JPQL, the Java Persistence Query Language. JPA is built on a notion that supports Werner Vogels' views completely. In JPA, the user application works on the view -- the so-called persistent domain model that represents the business domain -- Security/CounterParty/Account for banking, PurchaseOrder/LineItem/Product for retail.
The view is updatable, and JPA takes care of mapping not only the object-oriented data structures and metaphors (inheritance/typed collection/relation traversal) to relational metaphors (Tables/Foreign Keys/Joins); JPA also translates the operations on the view -- persist(), merge(), remove() -- to their SQL counterparts: INSERT, UPDATE, DELETE.

A common misconception is to trivialize the O-R Mapping problem. Many have, in the past. Many a project to build a generic persistence layer has been added to the heap of the 70% of software projects that fail. When I saw DAS.createCommand(String sql), it was that scary statistic that made my eyebrows twitch.

In my previous post I talked about Fluid -- an Apache Lab that explores how JPA can be the persistence provider for SDO DataObjects/Graphs. When I posted about Fluid to the Tuscany mailing list, I received quite a few responses. The most encouraging for me was one from Frank Budinsky -- the architect of the Eclipse Modeling Framework -- when he said "your project has a lot of potential." Another response that mattered to me was from Sanjeeb Sahoo -- a brilliant engineer from Sun Microsystems, well known for his work on GlassFish/JPA. I also received a query from Luciano Resende, who is instrumental in Tuscany SDO DAS, on "if would be feasible to add a DAS interface layer on top of Fluid?"

So I went back to work -- to retrofit Fluid for DAS. It was pleasantly easy. The DAS API remained intact but I used (or exploited) its flexibility. Instead of issuing SQL, the caller now names a logical operation: the syntax of usage has changed, not the semantics of behavior.
This is what the syntax of using DAS became with the object orientation of Fluid:
Persist a DataObject using Fluid DAS
    DataObject person = createPerson(ssn, "Fluid", "Guy", 17);
    DAS das = getDAS();
    Command insert = das.createCommand("INSERT");
    insert.setParameter(0, person);
    insert.execute();
Delete a DataObject
    DataObject dataObject =...;
    DAS das = getDAS();
    Command delete = das.createCommand("DELETE");
    delete.setParameter(0, dataObject);
    delete.execute();
- The first thing to notice is: there is no SQL.
- Secondly, all object parameters are set via the Command.setParameter() method. The INSERT command, for example, acts on a DataObject which is passed to it as a positional parameter.
- Thirdly, commands are constructed by logical name such as INSERT or DELETE, not by SQL. Fluid currently supports INSERT, FIND, SELECT, DELETE, UPDATE.
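The idea of creating a command from a logical name and handing it the target object as a positional parameter can be sketched in plain Java. This is only an illustration under stated assumptions, not Fluid's actual code: `Store`, `InMemoryStore`, `LogicalCommand` and `LogicalCommands` are hypothetical names, and `Store` merely stands in for the role a JPA EntityManager would play.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical stand-in for the persistence layer (the role an EntityManager would play).
interface Store {
    void persist(Object o);
    void remove(Object o);
}

// Trivial in-memory Store, present only to make the sketch runnable.
final class InMemoryStore implements Store {
    final List<Object> rows = new ArrayList<>();
    public void persist(Object o) { rows.add(o); }
    public void remove(Object o) { rows.remove(o); }
}

// A command built from a logical name; its target is positional parameter 0.
final class LogicalCommand {
    private final Store store;
    private final BiConsumer<Store, Object> action;
    private final Map<Integer, Object> params = new HashMap<>();

    LogicalCommand(Store store, BiConsumer<Store, Object> action) {
        this.store = store;
        this.action = action;
    }

    void setParameter(int index, Object value) {
        params.put(index, value);
    }

    void execute() {
        Object target = params.get(0);
        if (target == null) {
            throw new IllegalStateException("parameter 0 is not set");
        }
        action.accept(store, target);
    }
}

// Maps logical command names to store operations; anything else is rejected.
final class LogicalCommands {
    private static final Map<String, BiConsumer<Store, Object>> ACTIONS = new HashMap<>();
    static {
        ACTIONS.put("INSERT", Store::persist);
        ACTIONS.put("DELETE", Store::remove);
    }

    static LogicalCommand create(Store store, String name) {
        BiConsumer<Store, Object> action = ACTIONS.get(name);
        if (action == null) {
            throw new IllegalArgumentException("Unknown command: " + name);
        }
        return new LogicalCommand(store, action);
    }
}
```

Because the name is looked up in a fixed table, an arbitrary SQL string passed to `create` is rejected rather than executed, which is the point of the logical-name style.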
Query, perhaps the most important aspect of a Data Access Service, is quite similar.
Query SDO Data using Fluid DAS
    DAS das = getDAS();
    Command query = das.createCommand("SELECT o FROM Person o WHERE o.firstName=?1 AND o.age=?2");
    query.setParameter(1, "Fluid");
    query.setParameter(2, 17);
    DataObject list = query.executeQuery();
    List result = (List)list.get(0);
The query is expressed in JPQL. The query is the only command which takes a JPQL String to create itself. The query parameters can be set following JPA parameter binding rules. The other notable aspect of the query is the way it returns the result. Tuscany DAS specifies the result of a query as a single DataObject. But often a query result (as in a database cursor or a JPA query) is a list of instances. To comply with the DAS API, Fluid also returns a DataObject, whose 0-th property is a List of DataObjects. To implement such a List disguised as a DataObject, the dynamic features of SDO came in handy: one can define a data structure (SDO Type) programmatically on the fly and create instances of it.
Last but not least is UPDATE. For this, the SDO DAS API provides a separate method which is object-oriented, not SQL-oriented: DAS.applyChanges(DataObject data). It was really easy to simply delegate the call to JPA EntityManager.merge(DataObject data). This ease of adaptation is what I envisaged -- SDO features are very similar to the programming model JPA supports: graph-oriented data, detached mode of operation, navigational access. JPA and SDO are a natural fit -- Fluid just makes that naturalness argument concrete with code.
The last but important bit: how does one get a handle to Fluid DAS? Tuscany DAS has a DASFactory which will create a DAS via a series of overloaded methods, each taking a resource such as an InputStream, String or Configuration object. JPA rationalizes configuration by consolidating all persistence-related service configuration in one single place: META-INF/persistence.xml.
So getting a handle to the SDO DAS API with Fluid becomes:
Obtain a handle to DAS
    DASFactory dasFactory = new FluidDASFactory("SDO-DAS");
    DAS das = dasFactory.createDAS();
The user has to configure META-INF/persistence.xml to specify the name of the persistence unit and activate Fluid as follows:
Configuring OpenJPA for SDO DAS
    <persistence-unit name="SDO-DAS">
      <properties>
        <property name="openjpa.EntityManagerFactory" value="sdo"/>
        <property name="openjpa.MetaDataFactory" value="sdo(Resources=myModelA.xsd;myModelB.xsd)"/>
      </properties>
    </persistence-unit>
- Name the persistence unit the same as the name used to construct the FluidDASFactory
- Set openjpa.EntityManagerFactory to "sdo" -- this lets Fluid plug in to OpenJPA's plug-in architecture
- Set openjpa.MetaDataFactory to "sdo". This tells OpenJPA to source metadata information from XML Schema Definition (*.xsd) files rather than from a set of POJO classes. The XML Schema Definition files are specified as the Resources parameter.
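To make the shape of the openjpa.MetaDataFactory value concrete, here is a small sketch that splits a value such as sdo(Resources=myModelA.xsd;myModelB.xsd) into its alias and its Resources list. The `PluginValue` class is hypothetical and deliberately naive (it assumes no commas or parentheses inside values); it is not OpenJPA's own configuration parser.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified reader for values of the form alias(Key=value,Key2=value2).
final class PluginValue {
    final String alias;
    final Map<String, String> properties;

    private PluginValue(String alias, Map<String, String> properties) {
        this.alias = alias;
        this.properties = properties;
    }

    static PluginValue parse(String value) {
        String text = value.trim();
        int open = text.indexOf('(');
        if (open < 0) {
            // A bare alias such as "sdo" carries no properties.
            return new PluginValue(text, new LinkedHashMap<String, String>());
        }
        if (!text.endsWith(")")) {
            throw new IllegalArgumentException("Missing ')' in: " + value);
        }
        String alias = text.substring(0, open).trim();
        String body = text.substring(open + 1, text.length() - 1);
        Map<String, String> props = new LinkedHashMap<>();
        for (String pair : body.split(",")) {
            if (pair.trim().isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("Expected key=value but got: " + pair);
            }
            props.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return new PluginValue(alias, props);
    }

    // Splits a semicolon-separated property (such as Resources) into its items.
    List<String> list(String key) {
        String raw = properties.get(key);
        if (raw == null || raw.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(raw.split(";"));
    }
}
```

With the value from the configuration above, `PluginValue.parse(...).alias` is "sdo" and `list("Resources")` holds the two schema file names.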
That's it. One is good to go with persisting, updating and querying SDO DataObjects via the Tuscany DAS API with Fluid. People who have expressed interest in Fluid also mentioned JAXB. That is the next thing I am going to take a look at.
Persisting Service DataObjects with OpenJPA
Posted by pinaki.poddar on July 26, 2007 at 11:56 PM | Permalink
| Comments (2)
In a previous blog, I discussed how SDO relates to JPA. Since then, I have done a bit more exploring on the issue of a persistence service for Service Data Objects (SDO) using the Java Persistence API (JPA). The probing culminated in an Apache Lab. Apache Lab "is a place for innovation where committers of the foundation can experiment with new ideas". I had to select a name for the lab -- Fluid -- which popped up after thirty seconds as images of swirling red liquid and BEA wafted across my mind. So under the aegis of Apache Lab, Fluid got started to explore how OpenJPA can provide a persistence service for Service Data Objects.
After a week, some active discussion with Patrick Linskey and Jarek Wilkiewicz, and a few novice questions posted to the SCA/SDO forum (tuscany-dev at ws dot apache dot org), a first draft implementation of Fluid was ready. Using Fluid, a user application can create a graph of DataObjects with the SDO API and OpenJPA will store the graph to a database. The user can query a database using JPQL -- the powerful query language defined by JPA -- and the result will be returned as a list of DataObjects. The DataObjects can be modified while disconnected from the database and later updated -- all in the same way as it works for POJOs.
After the basic outline implementation, I got initiated into Maven -- it made dependency management a breeze. Of course, I had the Maven scripts for OpenJPA, written by a true expert, Marc Prud'hommeaux, as a reference to guide me. Maven also helped to build a site (literally with a push of a button -- OK, almost) where further details on the SDO-JPA bindings are posted. Then another couple of days were spent writing the documents in APT (Almost Plain Text), which was elegantly simple -- loved it. So the bottom line is: there exists a site now with User Documents, Source Code and Test Cases to show one of the ways OpenJPA can provide a persistence service to SDO.
Persisting Service Data Object using OpenJPA: Relaxing Type
Posted by pinaki.poddar on July 10, 2007 at 2:42 AM | Permalink
| Comments (0)
In this blog, I will describe how the Java Persistence API (JPA) can be enabled to persist and query Service Data Objects.
Service Data Object is liquid data
Service Data Object (SDO) lives in the penumbra between strongly-typed POJOs and streams of non-validating XML. SDO lets the user application work with data structures whose shapes can be created dynamically. A flexible, dynamic data structure such as SDO becomes significant when data has to propagate between environments that do not (or cannot) share the same strict type definitions, e.g. a set of compiled POJO classes. One can compare them as solid (POJO) and liquid (SDO) -- and with that liquid metaphor, it is easy to see why SDO is an important component of SOA. That is why, inside the IT department of every enterprise, this 'dynamic data structure' theme exists in one form or another. The sceptics will say: isn't that just a name-value pair? But SDO goes much further than being a glorified name-value pair. For example, the specification recognizes that data, like people, are enriched by their relationships and explicates a concept of DataGraph. It also specifies how to track changes in a DataGraph via ChangeSummary. See the Resources section for further details.
Persistence of SDO using JPA
OK, SDO is an advanced specification about flexible data. But what is data without persistence? How can we store an SDO DataGraph to, or retrieve it from, a database? The original SDO specification (November 2004) talked about a Data Mediator Service but did not define it. When I looked again recently, the specification had matured to version 2.1 and the persistence service is now termed Data Access Service or DAS, but the details still remain unspecified. In the same period, JPA has matured and been adopted by leading application server vendors as their persistence providers. For BEA it is Kodo, JBoss uses Hibernate, while Toplink is the choice for OracleAS and GlassFish.
OpenJPA -- the open-source avatar of Kodo -- is one of the likely candidates for the Geronimo and WebSphere environments. Naturally, JPA is a significant candidate to provide a persistence service for SDO.
How natural?
A quick search in Google for "JPA", "SDO" and "JPA+SDO" returned 3.97, 3.42 and 0.16 million hits respectively. Visually, those figures on how many discussions cover both SDO and JPA translate to the figure on the left.
The first project I found is Tuscany DAS. I peeked at what is available, but it did not seem to be using JPA, at least for now. The idea of using JPA has been discussed in the related forum, though no result has yet been reported.
Next stop is ALDSP, a mature product from BEA that uses SDO. However, ALDSP is not using a JPA-based persistence service either.
Eclipse EMF is one of the first concrete implementations of SDO. However, EMF did not implement a persistence service either (it is not mandated by the SDO Specification).
Why is JPA a natural choice for SDO persistence?
JPA solves the difficult problem of O-R Mapping despite the warning signs. It also provides a simple, well-designed API for persistence that has proven itself in realizing EJB 3.0. Besides its proven maturity, JPA becomes a natural choice as a persistence service for SDO because JPA has many other features that align well with those of SDO. First and foremost is the disconnected or detached mode of operation -- in this mode, the user application connects to the database and releases the connection as soon as the requested data is fetched. The data is viewed as a related set of records -- a graph of POJOs or a DataGraph, depending on your preferred nomenclature. The data is modified in a remote process without any active connection to the original data in the database. Later, the changes are merged with the persistent data when a database connection is reestablished. This disconnected mode of operation is the key to scalability where, say, 10000 active users can operate simultaneously with a pool of 10 database connections. Such a usage mode saves tons of money too, because the cost of a database installation often grows steeply with the number of connections. JPA supports this disconnected mode of operation: the fetched POJO instances are detached when the persistence context is closed or the transaction is committed, and the detached object graph can be modified remotely and later merged into a different persistence context. SDO, on the other hand, specifies a notion of ChangeSummary that tracks the changes in a detached DataGraph. Hence it will be natural for a JPA implementation to leverage the ChangeSummary content to merge the changes with the database effectively.
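The detach, modify, merge cycle described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions: `TrackedRecord` and `Merger` are hypothetical names, the record is a plain property map rather than a real SDO DataObject, and the "database row" is just another map. It shows only the idea that a change summary lets the merge step write the properties that actually changed.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical loosely typed record that remembers the original value of
// every property modified while change logging is on (a toy change summary).
final class TrackedRecord {
    private final Map<String, Object> values = new HashMap<>();
    private final Map<String, Object> oldValues = new LinkedHashMap<>();
    private boolean logging;

    TrackedRecord(Map<String, Object> initial) {
        values.putAll(initial);
    }

    // Start tracking: from here on, the first overwrite of a property saves its old value.
    void beginLogging() {
        oldValues.clear();
        logging = true;
    }

    Object get(String name) {
        return values.get(name);
    }

    void set(String name, Object value) {
        if (logging && !oldValues.containsKey(name)) {
            oldValues.put(name, values.get(name));
        }
        values.put(name, value);
    }

    // Properties whose current value differs from the value seen when logging began.
    Map<String, Object> changes() {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> old : oldValues.entrySet()) {
            Object now = values.get(old.getKey());
            if (!Objects.equals(now, old.getValue())) {
                result.put(old.getKey(), now);
            }
        }
        return result;
    }
}

final class Merger {
    // Applies only the changed properties to the stored row; returns how many were written.
    static int merge(TrackedRecord detached, Map<String, Object> storedRow) {
        Map<String, Object> changes = detached.changes();
        storedRow.putAll(changes);
        return changes.size();
    }
}
```

A property that is modified and then set back to its original value does not appear in `changes()`, so the merge step skips it.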
The other point where JPA and SDO align naturally is that both work with the notion of fetched data as a graph of connected objects. JPA has not yet fully explored the potential of defining a disconnected graph, but OpenJPA and Kodo support a powerful extension called FetchPlan -- a contribution from the JDO Specification, where this concept is well defined -- for specifying which part of the closure of an object is to be fetched from the database. This control is important because often the complete closure of an instance is the entire database. On the other hand, an SDO DataGraph defines a configured closure of a so-called root DataObject.
What are the key challenges in persisting SDO using JPA?
One critical chasm to bridge is the type system. JPA is designed for strictly typed POJOs. JPA implementations manage object instances whose types are statically defined by a Java class and compiled by a Java compiler. The beauty of JPA (and its precursor JDO) is that these implementations intercept changes to persistent instances (their state and/or relationships) non-intrusively. The objects retain their POJO nature, but the JPA runtime magically knows when the user application is invoking a setter method on a persistent instance to modify its state, or invoking a getter that requires more data to be fetched from the database (often termed Lazy Loading). The actual implementations vary in how they intercept changes: OpenJPA and Kodo, for example, depend on enhancement -- a process that modifies the compiled bytecode of persistent Java classes -- while others may employ different mechanisms such as proxying the original entity. In either case, implementations will require significant changes to move away from one of their fundamental assumptions, i.e. that a compiled Java class exists to represent persistent data. SDO, on the other hand, thrives on loosely-typed objects. SDO can represent a Person with a name and age without having to define a Person.java.
The definition of Person may exist in an XML Schema document or can even be created programmatically. So how does one store a Person DataObject using JPA without writing a Person.java class? More fundamentally, how will JPA adapt to data representations that are not strongly typed? This conversion of type system is just the beginning. There are many other important questions:
- how should the identity of instances be defined?
- how should the mapping to the database schema be defined?
- how should the relationships between SDO types be expressed in Java?
- how should the changes in a detached DataGraph be translated into updates of Java instances?
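The contrast between a compiled Person.java and a Person type created programmatically can be made concrete with a toy example. The sketch below is plain Java and does not use the SDO API: `DynamicType` and `DynamicObject` are hypothetical names chosen for illustration, and the type checking is far simpler than what SDO Types offer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical runtime type: a name plus property declarations, built on the fly
// instead of being compiled from a Java class.
final class DynamicType {
    final String name;
    private final Map<String, Class<?>> properties = new LinkedHashMap<>();

    DynamicType(String name) {
        this.name = name;
    }

    // Declares a property and returns this type so declarations can be chained.
    DynamicType property(String propertyName, Class<?> javaType) {
        properties.put(propertyName, javaType);
        return this;
    }

    Class<?> typeOf(String propertyName) {
        Class<?> type = properties.get(propertyName);
        if (type == null) {
            throw new IllegalArgumentException(name + " has no property " + propertyName);
        }
        return type;
    }

    DynamicObject newInstance() {
        return new DynamicObject(this);
    }
}

// Instance of a DynamicType: values are held by property name and checked at runtime.
final class DynamicObject {
    final DynamicType type;
    private final Map<String, Object> values = new LinkedHashMap<>();

    DynamicObject(DynamicType type) {
        this.type = type;
    }

    void set(String property, Object value) {
        Class<?> expected = type.typeOf(property);
        if (value != null && !expected.isInstance(value)) {
            throw new ClassCastException(property + " expects " + expected.getSimpleName());
        }
        values.put(property, value);
    }

    Object get(String property) {
        type.typeOf(property); // rejects names the type does not declare
        return values.get(property);
    }
}
```

A Person with a name and an age can then be defined and populated without any Person.java, for example `new DynamicType("Person").property("name", String.class).property("age", Integer.class).newInstance()`. What a persistence runtime loses is exactly what the questions above point at: there is no compiled class to enhance, map or identify.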
How to relax JPA for loosely-typed data?
Instead of being inundated by questions whose answers are not immediately obvious, I decided to take small steps -- one at a time. So I tried to solve a couple of simple scenarios or use cases.
Use Case: From SDO Types/DataObjects to Persistent Data via JPA API
This scenario shows how a client will persist SDO objects. The given input is a) an XML Schema that defines a bunch of SDO Types and b) a JPA configuration for an SDO-enabled JPA runtime. The application will invoke the SDO API to define the SDO Types and then will create and populate a set of related DataObjects, i.e. a DataGraph, using these SDO types. This DataGraph will be presented to the JPA API. The SDO-enabled JPA runtime will do whatever is necessary to store the data graph to an RDBMS. Essentially, we begin with SDO Types and objects as input to arrive at persistent data records using JPA.
The other scenario is the opposite. We begin with persistent data. We do a query using the Java Persistence Query Language (JPQL) via the JPA API. Normally, this query results in a list of Java instances -- but the SDO-enabled JPA runtime will instead return SDO DataObjects to the user application.
Use Case #2: From Persistent Data to SDO Types/DataObjects via JPQL Query
How to implement such an SDO-enabled JPA runtime?
One approach would be to overlay SDO on top of JPA. The JPA API will accept SDO DataObjects as arguments and return DataObjects as results of a query, but the overlay will be responsible for converting a DataObject instance to a concrete Java instance or vice versa. I decided to probe a little further along this line to see what can be done.
Test Before Code
In the spirit of Test-Driven Development, the first thing I wrote was a JUnit test case. The test case shows how a client application will be using SDO and JPA together.
    /**
     * Persist a DataObject. The operation should cascade to the closure of the
     * given DataObject.
     */
    public void testPersist() {
        DataObject purchaseOrder = createPurchaseOrder();
        EntityManager em = emf.createEntityManager();
        em.getTransaction().begin();
        em.persist(purchaseOrder);
        em.getTransaction().commit();
    }

    /**
     * Query using JPQL. The query results should be DataObjects. The related
     * DataObjects should also be fetched.
     */
    public void testQuery() {
        EntityManager em = emf.createEntityManager();
        String jpql = "SELECT p FROM PurchaseOrderType p";
        List result = em.createQuery(jpql).getResultList();
        for (Object o : result) {
            assertTrue(o instanceof DataObject);
            DataObject dataObject = (DataObject)o;
            assertEquals("PurchaseOrderType", dataObject.getType().getName());
            String orderDate = dataObject.getString("orderDate");
            assertEquals("1999-10-20", orderDate);
            DataObject shipTo = (DataObject)dataObject.get("shipTo");
            assertNotNull(shipTo);
            assertEquals("Alice Smith", shipTo.get("name"));
        }
    }
The first test case creates an SDO DataGraph via its createPurchaseOrder() method, which returns the root DataObject. This root instance becomes the input to EntityManager.persist(). The persist() operation must store all the DataObjects reachable from this root instance in the database. The second test case queries the database using the Java Persistence Query Language (JPQL). The result of the query is a list of DataObjects. The resultant instances must be fetched together with their related instances: we verify that the related USAddress DataObject shipTo has its name property set to Alice Smith, because that is what was set while creating the DataGraph in the createPurchaseOrder() method. The test case shows that the user's application will invoke the standard JPA EntityManager.persist() method to store instances, but with arguments that are instances of commonj.sdo.DataObject. Similarly, the query will result in a list of DataObject instances rather than instances of a strongly-typed PurchaseOrderType.class.
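The requirement that persist() must reach every DataObject in the closure of the root can be sketched as a graph traversal. This is an illustration only, not how OpenJPA or Fluid cascades: `GraphNode` and `Closure` are hypothetical names, and a node here is a simple property map whose values may be other nodes or collections of nodes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a DataObject: a type name plus named properties.
final class GraphNode {
    final String typeName;
    final Map<String, Object> properties = new LinkedHashMap<>();

    GraphNode(String typeName) {
        this.typeName = typeName;
    }

    GraphNode set(String name, Object value) {
        properties.put(name, value);
        return this;
    }
}

final class Closure {
    // Returns every node reachable from root, each exactly once, root first.
    static List<GraphNode> of(GraphNode root) {
        if (root == null) {
            throw new IllegalArgumentException("root must not be null");
        }
        // Identity-based set: two distinct nodes with equal content are both kept.
        Set<GraphNode> seen = Collections.newSetFromMap(new IdentityHashMap<GraphNode, Boolean>());
        List<GraphNode> ordered = new ArrayList<>();
        Deque<GraphNode> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            GraphNode node = pending.pop();
            if (!seen.add(node)) {
                continue; // already visited; this also breaks cycles
            }
            ordered.add(node);
            for (Object value : node.properties.values()) {
                if (value instanceof GraphNode) {
                    pending.push((GraphNode) value);
                } else if (value instanceof Iterable) {
                    for (Object element : (Iterable<?>) value) {
                        if (element instanceof GraphNode) {
                            pending.push((GraphNode) element);
                        }
                    }
                }
            }
        }
        return ordered;
    }
}
```

A persist operation in this toy model would store each node of `Closure.of(root)` once, even when the graph shares a node between two properties or contains a back reference to the root.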
Creating an SDO-enabled JPA Runtime
After a couple of days of thinking + downloading + Google searching, followed by a couple of days of coding, I could write a few classes, put them in a jar and run my two test cases. The tests passed. In the next part of this blog, I will discuss what was done to enable a JPA runtime for SDO. I took several shortcuts, but the basic steps may still help someone who is considering extending JPA for other interesting purposes. The good news is that it did not require any change in OpenJPA, which I used as the JPA implementation -- it shows what is so great about OpenJPA: it is engineered (as opposed to hacked) for extensibility. I also subscribed to the Tuscany user group -- whose SDO implementation I used for my experimental prototype -- and the user group was active enough to fill my mailbox in no time. But all that has to wait for another day....
Resources
[1] A recent article explores SDO and its usage in SOA in great detail.
[2] The complete SDO specification version 2.1
Good Night and Good Luck
The promise of Persistence Providers: Kodo, OpenJPA and Hibernate
Posted by pinaki.poddar on June 25, 2007 at 7:42 AM | Permalink
| Comments (10)
The promise of pluggable Persistence Providers: Kodo, OpenJPA and Hibernate
The Bug That Started It
A bug landed in my mailbox. The bug reported a problem while switching the JPA persistence provider between Hibernate and Kodo in a WebLogic Server environment. In the course of reproducing the bug, I learned a few bits about installing Hibernate in WebLogic Server 10.0 and also about deploying a JPA-based Web application with a specific persistence provider X, and then redeploying it with another provider Y without bringing the whole house (meaning WebLogic Server) down. This experience nudged me out of my hibernation from blog writing triggered by a cartoon in New Yorker magazine. Anyway, this blog will
- describe a working way to install and configure Hibernate in WebLogic Server 10.0
- answer why you do not need to install Kodo or OpenJPA in WebLogic Server 10.0
- show the steps to switch the JPA provider between Hibernate, OpenJPA and Kodo in a running WebLogic Server, with a very simple Stateless Session Bean based service as an example
- reproduce the bug that started it all, not only because the reporter thought it was important but because I also agreed after my brief experiment
- present how the bug impacts a change in how JPA bootstraps in an application server environment
A very simple mechanism to test provider switching
I used a simple service as the mechanism to verify that the correct provider is being used and that the provider behaves correctly, albeit for rudimentary persistence operations. The simple service interface against which I will test is
JPAService.java
    package service;

    /**
     * A very simple service to verify the persistence provider being used.
     * The service also can persist a simple log message in a database.
     *
     * @author ppoddar
     *
     */
    public interface JPAService {
        /**
         * Returns the name of the active provider.
         */
        public String getProvider();

        /**
         * Logs the given message.
         *
         * @param message an arbitrary message string.
         *
         * @return a Message instance that has been persisted. The service will
         * attach a timestamp to the message.
         */
        public Message log(String message);
    }
The service is non-committal about how it is to be implemented; it does not even state that it is going to use JPA. That is the way a service definition should be -- non-committal about implementation technology (I have omitted the customary IMO here -- assume this whole rambling is my opinion, humble or otherwise).
JPA & Session Bean based Application Primer
I will briefly discuss implementing this service as a Stateless Session Bean that uses JPA. It is pretty basic. If you are already familiar with JPA, you may skip to the next section. In our simple example, this service is realized by a Stateless Session Bean that uses JPA.
JPAServiceBean.java
    package session;

    import javax.ejb.Remote;
    import javax.ejb.Stateless;
    import javax.persistence.EntityManager;
    import javax.persistence.PersistenceContext;

    import | |