Slice: OpenJPA for Distributed Databases
Pinaki Poddar's Blog
December 21, 2007 4:01 PM
Slice is an OpenJPA plug-in for horizontally partitioned, distributed databases. As distributed databases become increasingly common in the enterprise IT ecosystem, I considered extending OpenJPA to transact against a set of databases instead of a single one. The result is Slice, a project that is now available for download from Apache Labs.

The salient features of Slice are:
- available as a configurable OpenJPA plug-in in a single, separate jar
- transacts against a set of horizontally partitioned databases
- requires no change in your OpenJPA application
- queries multiple databases in parallel and consolidates the results into a single list for your application
- remembers the database from which each instance was loaded, and commits changes to the appropriate database
- depends on the user application to distribute newly persistent instances among the set of databases
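To make the parallel-query feature concrete, here is a minimal sketch of fanning one query out to every database and merging the answers into a single list. It uses only java.util.concurrent and is not Slice's actual implementation; the runOnSlice function is a hypothetical stand-in for executing the query over JDBC against one database URL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelQuerySketch {

    // Runs the same query against every database concurrently and concatenates
    // the per-database results, in URL order, into one list.
    // runOnSlice is a hypothetical stand-in for a real per-database JDBC query.
    public static <R> List<R> queryAll(List<String> urls,
                                       Function<String, List<R>> runOnSlice) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, urls.size()));
        try {
            List<Future<List<R>>> futures = new ArrayList<>();
            for (String url : urls) {
                // One task per database; all tasks run at the same time.
                futures.add(pool.submit(() -> runOnSlice.apply(url)));
            }
            List<R> merged = new ArrayList<>();
            for (Future<List<R>> f : futures) {
                merged.addAll(f.get()); // waits for this database's answer
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> urls = List.of("jdbc:mysql://localhost/slice1",
                                    "jdbc:mysql://localhost/slice2");
        // Fake query: each database "returns" two rows tagged with its URL.
        List<String> rows = queryAll(urls, url -> List.of(url + "#row1", url + "#row2"));
        System.out.println(rows.size()); // prints 4
    }
}
```

Results are appended in URL order rather than completion order, because each Future is awaited in the order it was submitted. A query with ORDER BY would additionally need a merge step across the per-database result lists.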
Effectively, using Slice, your OpenJPA application gets an object-oriented, modifiable view that spans multiple databases.

If all this sounds too good to be true, here comes the fine print: Slice does not support relationships across different databases. In the O-R mapping paradigm, this limitation translates into a collocation constraint, i.e. the closure of any object graph must be collocated in the same database. You have to implement a distribution policy that honors this collocation constraint (yes, I did say "no change in your application" a few lines ago). But honestly, the interface is really simple (see step 4).

Getting Started with Slice

A few simple steps will make your current OpenJPA application ready for distributed databases.

1. Configure the persistence unit by editing META-INF/persistence.xml:
    <property name="openjpa.BrokerFactory" value="slice"/>
2. Specify the distributed database URL by concatenating the URLs of all participating databases, separated by the | character:
    <property name="openjpa.ConnectionURL" value="jdbc:mysql://localhost/slice1|jdbc:mysql://localhost/slice2"/>
3. Specify the class name of your distribution policy implementation:
    <property name="slice.DistributionPolicy" value="com.acme.policy.MyDistributionPolicy"/>
4. Implement the distribution policy. The interface is really simple:
    package org.apache.openjpa.slice;

    public interface DistributionPolicy {
        // Returns the zero-based index (into urls) of the database for pc.
        int distribute(Object pc, String[] urls);
    }
Slice calls this method whenever it needs to persist a new instance. The second argument is the list of database URLs, in the same order as in the openjpa.ConnectionURL setting. Your implementation must return a zero-based index denoting the database selected for the given instance. Remember the collocation constraint when implementing this method. For example, if Person A refers to Address B, then the distribution policy must select the same database index for both. However, if Person C relates to Address D, then C and D can reside in a different database from A and B.
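As an illustration of step 4, the following sketch implements a policy that honors the collocation constraint from the Person/Address example: a Person is placed by a hash of its name, and an Address always goes wherever its owning Person goes. To stay self-contained it declares its own copy of the DistributionPolicy interface quoted above instead of importing the Slice jar; the Person and Address classes, their owner back-reference, and the hashing rule are all hypothetical.

```java
import java.util.Objects;

public class DistributionPolicySketch {

    // Local copy of the interface from step 4, declared here only so the
    // sketch compiles without the Slice jar on the classpath.
    interface DistributionPolicy {
        int distribute(Object pc, String[] urls);
    }

    // Hypothetical domain classes: a Person owns an Address.
    static class Address {
        final String city;
        Person owner; // back-reference to the root of the object graph
        Address(String city) { this.city = city; }
    }

    static class Person {
        final String name;
        final Address address;
        Person(String name, Address address) {
            this.name = name;
            this.address = address;
            address.owner = this;
        }
    }

    // Places each Person by a stable hash of its name and places an Address
    // wherever its owner goes, so a Person and its Address are always collocated.
    static class ByPersonNamePolicy implements DistributionPolicy {
        @Override
        public int distribute(Object pc, String[] urls) {
            if (pc instanceof Person) {
                Person p = (Person) pc;
                return Math.floorMod(p.name.hashCode(), urls.length);
            }
            if (pc instanceof Address) {
                Address a = (Address) pc;
                Objects.requireNonNull(a.owner, "Address must have an owner");
                return distribute(a.owner, urls); // follow the owner
            }
            return 0; // any other type goes to the first database
        }
    }

    public static void main(String[] args) {
        String[] urls = {"jdbc:mysql://localhost/slice1", "jdbc:mysql://localhost/slice2"};
        DistributionPolicy policy = new ByPersonNamePolicy();
        Person alice = new Person("Alice", new Address("Boston"));
        System.out.println(policy.distribute(alice, urls) == policy.distribute(alice.address, urls)); // prints true
    }
}
```

Hashing on a stable key, rather than round-robin, means the same Person always maps to the same index as long as the URL list keeps its order and length; adding a database changes urls.length and therefore where new instances land.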
That is it. Run your OpenJPA application. Some code examples are available in the Slice source code base, maintained in the Apache Labs Subversion repository, for you to take a look at.

Design Notes

Slice uses a distributed template pattern to abstract a set of distributed databases as a virtual database. A distributed template is a type T' that specializes a type T and also delegates all of its operations to a set of concrete instances of T. In its bare-bones form (with T written as an interface, because Java cannot extend a type parameter), it looks as follows:

    import java.util.Collection;

    interface T { void doSomething(Object... args); }

    public class DistributedT implements T {
        private final Collection<T> _delegates;
        public DistributedT(Collection<T> delegates) { _delegates = delegates; }
        // Every call is forwarded to each concrete delegate in turn.
        public void doSomething(Object... args) {
            for (T t : _delegates)
                t.doSomething(args);
        }
    }

Slice applies this distributed template pattern to the internal interfaces OpenJPA uses to interact with a relational database, e.g. StoreManager and StoreQuery, as well as to key JDBC abstractions such as DataSource (javax.sql) and Connection and PreparedStatement (java.sql). Because the distributed resources are introduced via the OpenJPA plug-in mechanism, the rest of the OpenJPA kernel works exactly the same way, oblivious to the fact that database operations are being multiplexed across all the configured databases instead of a single one.

The other key issue in this design is tagging each instance with the identifier of the database from which it was loaded. OpenJPA manages your persistent entities via a proxy referred to as OpenJPAStateManager, and the database identifier is attached to this proxy instance. This is possible thanks to the foresight of the original designers, who kept a placeholder in the proxy object for associating arbitrary content that remains opaque to the rest of the object management kernel.

Conclusion

The OpenJPA architecture uses well-designed interfaces and sophisticated plug-in mechanics for ease of extensibility. We leveraged this extensibility to develop Slice, an OpenJPA plug-in for distributed databases, which are becoming increasingly common in enterprise applications.

Happy Holidays!
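The design notes above describe two mechanisms: a distributed template that multiplexes each operation over a set of delegates, and a per-instance tag recording the database an instance was loaded from, so that changes are flushed back to that database. The sketch below combines both in plain Java. Store, InMemoryStore and DistributedStore are hypothetical names, not OpenJPA's StoreManager API, and an IdentityHashMap stands in for the opaque slot Slice uses on OpenJPAStateManager.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class DistributedTemplateSketch {

    // Hypothetical store abstraction (NOT OpenJPA's StoreManager API).
    interface Store {
        List<Object> load(String query);
        void flush(List<Object> dirty);
    }

    // Trivial in-memory store so the sketch runs without a database.
    static class InMemoryStore implements Store {
        final List<Object> rows = new ArrayList<>();
        final List<Object> flushed = new ArrayList<>();
        public List<Object> load(String query) { return new ArrayList<>(rows); }
        public void flush(List<Object> dirty) { flushed.addAll(dirty); }
    }

    // The distributed template: it is itself a Store, but each operation
    // is multiplexed across a list of concrete Store delegates.
    static class DistributedStore implements Store {
        final List<Store> slices;
        // Origin tag per instance, keyed by object identity; plays the role
        // of the opaque placeholder Slice uses on the state manager.
        final Map<Object, Integer> origin = new IdentityHashMap<>();

        DistributedStore(List<Store> slices) { this.slices = slices; }

        public List<Object> load(String query) {
            List<Object> all = new ArrayList<>();
            for (int i = 0; i < slices.size(); i++) {
                for (Object o : slices.get(i).load(query)) {
                    origin.put(o, i); // remember which database it came from
                    all.add(o);
                }
            }
            return all;
        }

        public void flush(List<Object> dirty) {
            // Group dirty instances by origin, then flush each group to its own database.
            // (New instances would instead be routed by a distribution policy, not shown.)
            Map<Integer, List<Object>> bySlice = new HashMap<>();
            for (Object o : dirty) {
                Integer i = origin.get(o);
                if (i == null) throw new IllegalStateException("unknown origin: " + o);
                bySlice.computeIfAbsent(i, k -> new ArrayList<>()).add(o);
            }
            bySlice.forEach((i, group) -> slices.get(i).flush(group));
        }
    }

    public static void main(String[] args) {
        InMemoryStore s1 = new InMemoryStore(), s2 = new InMemoryStore();
        s1.rows.add("alice");
        s2.rows.add("bob");
        DistributedStore db = new DistributedStore(List.of(s1, s2));
        List<Object> loaded = db.load("select all");
        db.flush(loaded);
        System.out.println(s1.flushed + " " + s2.flushed); // prints [alice] [bob]
    }
}
```

Loading is sequential here to keep the routing logic visible; the parallel fan-out shown for queries could be applied to load and flush in the same way.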
Comments
-
Sounds great!!
Any performance issues???
Posted by: vivekmv2000@rediffmail.com on January 8, 2008 at 9:14 PM
-
Performance is expected to improve, especially in how it scales with data size. Please note that Slice executes every major database operation (query/flush) in parallel. I do not have comparative hard numbers yet, but I see no extra computational overhead in Slice, only parallelism wherever possible without introducing synchronization.
Posted by: pinaki.poddar on January 8, 2008 at 9:20 PM