Arch2Arch Tab BEA.com

Good and Bad Persistence Strategies for BPM Engines


Jesper Joergensen's Blog | June 29, 2006   5:58 AM | Comments (6)


I received a question from a customer, along with an answer from our product managers (PMs), that I thought I'd share here in case others are interested.

The customer was using a BPM solution that persisted large process instance variables outside the database, in a separate file store, and then stored a reference to the external file in the database. The problem with this approach is that it is impossible (or at least hard and cumbersome) to ensure transactional integrity for an operation that involves an external file store: the database transaction can be rolled back, but a file that has already been written cannot. The (perceived) advantage of this approach is improved performance, because large variables can take time to stream back and forth between the database and the execution engine. That streaming can be hard to optimize across multiple JDBC drivers, which may be why this particular BPM solution chose to use external files.
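To make the integrity problem concrete, here is a minimal Java sketch. It does not use a real database or any BPM product's API; `FakeDatabase` and `FakeFileStore` are hypothetical in-memory stand-ins. The database stages writes until `commit()`, while the file store writes immediately. When an activity fails after the file has been written, rolling back the database removes the reference but leaves an orphaned file behind.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a transactional database: writes are staged
// in "pending" and only become visible after commit().
class FakeDatabase {
    private final Map<String, String> committed = new HashMap<>();
    private final Map<String, String> pending = new HashMap<>();

    void put(String key, String value) { pending.put(key, value); }
    void commit() { committed.putAll(pending); pending.clear(); }
    void rollback() { pending.clear(); }
    String get(String key) { return committed.get(key); }
}

// Hypothetical stand-in for an external file store: writes take effect
// immediately and do not take part in the database transaction.
class FakeFileStore {
    private final Map<String, byte[]> files = new HashMap<>();

    void write(String path, byte[] data) { files.put(path, data); }
    boolean exists(String path) { return files.containsKey(path); }
}

class ExternalStoreDemo {
    // Persists a large variable as an external file plus a reference in the DB.
    static void persistWithExternalFile(FakeDatabase db, FakeFileStore fs, String instanceId,
                                        byte[] largeValue, boolean failAfterFileWrite) {
        String path = "/store/" + instanceId + "/document.bin";
        try {
            fs.write(path, largeValue);              // side effect outside the transaction
            db.put(instanceId + ".document", path);  // reference staged in the transaction
            if (failAfterFileWrite) {
                throw new IllegalStateException("activity failed");
            }
            db.commit();
        } catch (IllegalStateException e) {
            db.rollback(); // undoes the reference, but cannot undo the file write
        }
    }

    public static void main(String[] args) {
        FakeDatabase db = new FakeDatabase();
        FakeFileStore fs = new FakeFileStore();
        persistWithExternalFile(db, fs, "inst-1", new byte[1024], true);
        System.out.println("DB reference: " + db.get("inst-1.document"));
        System.out.println("Orphaned file exists: " + fs.exists("/store/inst-1/document.bin"));
    }
}
```

Running `main` prints `DB reference: null` followed by `Orphaned file exists: true`. Keeping the variable itself inside the database avoids this, because the data and the reference then commit or roll back together.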

The customer asked how ALBPM handles this, and here's the answer from Eduardo Chiocconi, our product manager:

Each process can have different types of process instance variables, ranging from simple strings to more complex structures, or even binary content such as images or files. ALL variables are persisted in the ALBPM Process Execution Engine database and are part of the transaction that is started for an instance each time an activity task is executed. From the transaction perspective, ALBPM completely preserves the atomicity of the transaction and the state of the instance flowing through the process. The mechanism for storing the content of instance variables is the same as in [that other BPM solution]: Java Serialization. However, ALBPM provides a property for process instance variables so that the developer can mark a process instance variable as "Separated". The intention is to use this kind of process instance variable for large objects that are not modified in every single step of the business process. The ALBPM Process Execution Engine persistence algorithm is intelligent enough that if this variable has not been modified, it will not persist it again. The benefit is that the overall persistence time for an instance is significantly reduced, increasing instance throughput in the business process.
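The "Separated" behavior can be sketched as dirty checking. The following is a hypothetical illustration, not ALBPM's actual code or algorithm: `InstanceVariable`, `DirtyCheckingPersister`, and the idea of comparing serialized bytes against the last written snapshot are all assumptions made for this example. Ordinary variables are written on every step; a separated variable is written only when its serialized form has changed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a process instance variable; "separated" mirrors the
// ALBPM property described above.
class InstanceVariable {
    final String name;
    final boolean separated;
    Serializable value;

    InstanceVariable(String name, boolean separated, Serializable value) {
        this.name = name;
        this.separated = separated;
        this.value = value;
    }
}

class DirtyCheckingPersister {
    // Serialized bytes last written for each separated variable.
    private final Map<String, byte[]> lastWritten = new HashMap<>();

    // Java Serialization, as mentioned in the answer above.
    static byte[] serialize(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Returns the names of the variables that would be written in this step.
    List<String> persist(Map<String, InstanceVariable> variables) {
        List<String> written = new ArrayList<>();
        for (InstanceVariable v : variables.values()) {
            if (v.separated) {
                byte[] current = serialize(v.value);
                if (Arrays.equals(current, lastWritten.get(v.name))) {
                    continue; // unchanged separated variable: not written again
                }
                // Simplification: a real engine would record this only after commit.
                lastWritten.put(v.name, current);
            }
            // A real engine would write v here, inside the instance's transaction.
            written.add(v.name);
        }
        return written;
    }
}

class DirtyCheckDemo {
    public static void main(String[] args) {
        Map<String, InstanceVariable> vars = new LinkedHashMap<>();
        vars.put("status", new InstanceVariable("status", false, "NEW"));
        vars.put("scan", new InstanceVariable("scan", true, new byte[1_000_000]));
        DirtyCheckingPersister persister = new DirtyCheckingPersister();
        System.out.println(persister.persist(vars)); // [status, scan]
        vars.get("status").value = "APPROVED";
        System.out.println(persister.persist(vars)); // [status]
    }
}
```

Comparing serialized bytes still costs one serialization per step, even though it saves the database write. An engine could avoid that by tracking modifications explicitly (for example, a flag set whenever the variable is assigned); the post does not say which approach ALBPM takes.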

This is a typical example of "the devil is in the details". While BPM is a new (or at least rather different) discipline from writing plain old enterprise apps, it still depends on many of the same best practices. In my days as a programmer, we always weighed these alternative designs when dealing with large files. Streaming data in and out of databases through the JDBC driver can be rather tricky (particularly if you support multiple drivers and multiple databases), so I understand why a different approach was chosen for this particular BPM solution.
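For reference, here is a minimal sketch of streaming a large variable into the database inside the same transaction as the rest of an activity step. It uses standard JDBC calls (`PreparedStatement.setBinaryStream(int, InputStream, long)` is the JDBC 4.0 overload), but the table and column names (`process_variable`, `content`, `instance_id`, `name`) and the class `BlobVariableStore` are hypothetical, not ALBPM's schema or API.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class BlobVariableStore {
    // Hypothetical schema; adapt table and column names to your engine.
    static final String UPDATE_SQL =
        "UPDATE process_variable SET content = ? WHERE instance_id = ? AND name = ?";

    // Passes the value to the driver as a stream rather than as one byte[]
    // parameter. The caller owns the transaction, so this write commits or
    // rolls back together with the rest of the step.
    static int writeLarge(Connection conn, String instanceId, String name,
                          InputStream content, long length) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(UPDATE_SQL)) {
            ps.setBinaryStream(1, content, length);
            ps.setString(2, instanceId);
            ps.setString(3, name);
            return ps.executeUpdate();
        }
    }

    // Typical caller: one transaction per executed activity.
    static void runStep(Connection conn, String instanceId, byte[] document) throws SQLException {
        conn.setAutoCommit(false);
        try {
            writeLarge(conn, instanceId, "document",
                       new ByteArrayInputStream(document), document.length);
            // ... update the rest of the instance state with the same connection ...
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}
```

The tricky part the post mentions lives below this code: whether a given driver truly streams the value or buffers all of it in memory, and whether it supports the `long`-length overload at all, varies between drivers and databases.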

In this case, sticking to the database alone gives customers a clear advantage (transactional integrity) and is also a cleaner design, so it's probably worth doing it this way.


Comments


  • Can you explain exactly how ALBPM maintains the atomicity of the transaction? Does it implement its own file store transaction manager to deal with the persistence of large variables in the BPM? Also, when you talk about using Java Serialization to store this content, are you serializing complex/custom objects, and if so, how do you cope with future changes to these types?

    On the whole, my experiences with applications that use serialization to store users' custom objects have been very negative because of the problems associated with versioning.

    Posted by: hoos on June 29, 2006 at 5:18 PM

  • Hi hoos, I am sorry if it wasn't clear from the post, but the whole point was that ALBPM does not store large variables as files. They are stored in the database so that you can easily maintain atomicity of transactions. Java serialization and versioning is another interesting design problem that you bring up. It goes beyond the scope of my post, but you are right that you will have to deal with class/object versioning when you use serialization. However, this does not necessarily translate into issues with process versioning. It depends on the implementation of process instance migration and process version handling. If you want to ask more detailed technical questions, you can go to the product forum which is monitored by our product managers and engineering teams. They can provide much more detail than I can.

    Posted by: jesperfj on June 29, 2006 at 5:39 PM

  • Thanks for the response, Jesper. If you are storing large variables in the database, how do you get around the performance issues that the other BPM solution overcame with its approach? I understand that allowing developers to control what should and should not be persisted to the DB will help ensure that no unnecessary persistence is carried out, but what if you have to persist large variables? Won't ALBPM be prone to the performance problems the other BPM solution avoided?

    Posted by: hoos on June 30, 2006 at 1:51 AM

  • Storing large files in databases does not have to be much slower than storing them in file systems. It depends on the database and driver. Good databases and drivers have sufficient performance today to use this approach, but it's obviously a bit more complex because you have to know your database and driver.

    Posted by: jesperfj on June 30, 2006 at 2:18 AM

  • I am using the Thin driver to connect to an Oracle database, but getting a connection takes 20 minutes. I have a servlet that loads the driver, but it runs for 20 minutes before it gets me the connection. (The same application works fine with Tomcat, where it retrieves the connection within 2-3 minutes.) Can anyone please help me fix this?

    Posted by: vrushali_gore on August 30, 2006 at 1:52 AM

  • I believe the key point here is that if the large file or binary content is not modified at all, the ALBPM Engine will not re-persist this variable when the instance is updated as it moves through the process. The ALBPM persistence layer is intelligent enough to know which variables have been modified and which ones to update. Eduardoc.

    Posted by: eduardochiocconi on October 23, 2007 at 10:41 PM



