Balancing RAC-Backed JDBC Pools
Greg Nyberg's Blog
May 11, 2005 8:40 PM
I spend most of my time developing J2EE systems -- or fixing broken ones -- but occasionally I'll take a gig that is more about administration or troubleshooting than J2EE development. A current client has an interesting problem related to Oracle RAC and its automatic load-balancing and failover behavior, one that cries out for a better facility in WebLogic Server for safely resetting JDBC connection pools.
In a nutshell, the client has a 3-node RAC installation (i.e., three separate Oracle instances sharing a single SAN storage device) that provides good performance and high-availability characteristics... most of the time. Unfortunately, after a RAC node drops and its client connections are re-routed to one of the other nodes (transparently to WebLogic Server, of course), there is no way to force the connections to rebalance across all three nodes once the failed node returns to the RAC cluster.
Remember that a common best practice for connection pools is to set the minimum and maximum capacity of the pool at the same, fairly high, value (somewhere between 50% and 100% of the number of execute threads, IMO), and avoid all forms of shrinking, growing, etc. Just make the connections you need when the server boots and leave them alone for the life of the server.
Do you see the problem?
The pool connections made during bootup are distributed nicely across the three RAC nodes (let's assume 60 connections, or 20 per node). When Node #2 fails, its 20 connections are re-routed to Nodes #1 and #3, giving them 30 each (in an ideal world). Node #2 comes back online... and... nothing. The 60 connections are perfectly happy where they are, and Node #2 never gets any connections.
Whenever this happened, this particular client performed rolling reboots of the WebLogic Server instances to rebalance the connections. Not pretty.
'Greg, is there a better way?'
Gulp. 'Let me look into it,' I reply.
You might be thinking that setting initial and maximum connections in the pool to the same value is the culprit -- and in some sense you are right -- but there are good reasons to avoid constant shrink/grow cycles, and you'd hate to swallow that bitter pill just to help with rebalancing! Besides, setting your initial below your max will only help if the server load goes through large, cyclic swings: it has to need fewer connections than the minimum level for long enough to cause a shrink, and then climb well above the minimum to force the pool to grow again and (possibly) grab a few connections to the now-recovered Node #2. It could take a very long time for this to rebalance the connections, especially since RAC isn't smart enough to assign new connections to nodes based on current counts (doh!).
This client also has sustained (hours-long) high-volume periods with consistent load (hundreds of hits/second) on their eight WebLogic servers, which leaves no opportunity for shrink/grow-based rebalancing should a RAC node drop temporarily.
The WebLogic Server reset command looked interesting, but it turns out to be completely unusable -- it simply drops all existing connections in the pool and re-establishes the "initial" number again without regard for connections currently in use (reserved). Ka-Boom!
The shrink command is better behaved: if the current number of connections exceeds the "initial" value, it tries to close connections to bring the current count down to the initial (minimum) value, while honoring the currently reserved (in-use) connections.
What I came up with was the following:
- Increase the maximum number of connections by a factor of at least 2X (from 60 to 120 in my example).
- Increase the initial/minimum level to the same value (120). WebLogic creates 60 new connections, more-or-less balanced across the three RAC nodes.
- Wait 30 seconds or so for the reserved connections to become distributed across the new, larger pool.
- Drop the initial capacity level back to 60 (or lower, if the load allows).
- Execute the shrink command.
- Assuming the number of currently in-use connections is below the new initial level, set maximum back down to initial (60). Be careful here -- setting the maximum below the current in-use connection count can cause forced closes.
- Repeat the whole sequence three to five times and you should be approaching an even distribution again (any mathematicians in the audience?).
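For the mathematically inclined, the steps above can be modeled roughly in Python. This is a back-of-the-envelope sketch, not measured behavior, and it rests on two simplifying assumptions: each batch of new connections lands evenly across the nodes, and the shrink closes connections in proportion to where they currently sit. The function names are made up for the illustration.

```python
def rebalance_pass(counts, pool_size):
    """One grow-then-shrink pass; returns the expected per-node counts."""
    nodes = len(counts)
    # Grow: the pool doubles; new connections are ASSUMED to land evenly.
    grown = [c + pool_size / nodes for c in counts]
    # Shrink back to pool_size; closes are ASSUMED proportional per node.
    scale = pool_size / sum(grown)
    return [c * scale for c in grown]


def simulate(counts, passes):
    """Return the expected distribution after each pass."""
    pool_size = sum(counts)
    history = []
    for _ in range(passes):
        counts = rebalance_pass(counts, pool_size)
        history.append(counts)
    return history


if __name__ == "__main__":
    # Post-failover state from the example: Node #2 holds no connections.
    for number, counts in enumerate(simulate([30, 0, 30], 5), start=1):
        print(number, counts)
```

Under those assumptions the gap between each node and its fair share of 20 halves on every pass: Node #2 is expected to hold 10, 15, 17.5, 18.75 and then 19.375 connections after passes one through five. In practice the shrink is not perfectly proportional, so treat the numbers as a trend rather than a prediction.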
This set of steps is easy enough to do manually in the console (though tedious and error prone), but is also very easy to script in wlshell, WLST, etc. Email me if you want my first cut at it.
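As a rough idea of the control flow such a script might follow, here is a minimal sketch in Python (WLST scripts are written in Jython, so the shape carries over). It is not the script mentioned above. `FakePoolAdmin` and its method names (`set_max`, `set_initial`, `shrink`, `in_use`) are hypothetical stand-ins, not WebLogic APIs; a real script would replace them with the actual admin calls.

```python
import time


class FakePoolAdmin:
    """Hypothetical in-memory stand-in for the real admin calls.

    None of these method names are WebLogic APIs; swap in the real ones.
    """

    def __init__(self, size, busy):
        self.initial = self.maximum = self.current = size
        self.busy = busy  # connections currently reserved by the application

    def set_max(self, n):
        self.maximum = n

    def set_initial(self, n):
        self.initial = n
        # Raising the initial capacity makes the pool create connections.
        self.current = max(self.current, n)

    def shrink(self):
        # Close idle connections down to initial, never touching reserved ones.
        self.current = max(self.initial, self.busy)

    def in_use(self):
        return self.busy


def rebalance_once(pool, size, factor=2, settle_seconds=30, sleep=time.sleep):
    """Run one grow/shrink pass; return True if the maximum was restored."""
    pool.set_max(size * factor)      # raise the ceiling first
    pool.set_initial(size * factor)  # pool grows, creating new connections
    sleep(settle_seconds)            # let reserved connections spread out
    pool.set_initial(size)           # lower the floor again
    pool.shrink()                    # close idle connections above the floor
    # Only lower the maximum when that cannot force-close busy connections.
    if pool.in_use() <= size:
        pool.set_max(size)
        return True
    return False


if __name__ == "__main__":
    pool = FakePoolAdmin(size=60, busy=35)
    for attempt in range(1, 6):
        # The sleep is stubbed out so the demo finishes immediately.
        restored = rebalance_once(pool, 60, sleep=lambda seconds: None)
        print(attempt, restored, pool.current, pool.maximum)
```

The one design point worth keeping in any real version is the guard at the end: if the in-use count is still above the target, the sketch leaves the maximum raised and reports failure rather than risking forced closes.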
Clearly, the need to jump through all these hoops to do a simple rebalancing of connections cries out for a new WebLogic Server command to re-establish all connections over a period of time, waiting for any reserved connections to be released back to the pool before acting on them. Sort of a kinder, gentler RESET command... How about it, guys?
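To make the wish concrete, here is a toy sketch in Python of what such a gentle reset could do: walk the pool, replace each connection only while it is idle, and come back later for the ones that were reserved. Everything in it (the `Conn` class, the `open_connection` callback, the sweep loop) is hypothetical and exists only for illustration; it is not a WebLogic feature or API, and a real pool would also need locking around the reserved check.

```python
import itertools
import time
from dataclasses import dataclass


@dataclass
class Conn:
    """Toy pooled connection (hypothetical, not a WebLogic class)."""
    node: str
    reserved: bool = False  # True while an application thread holds it


def gentle_reset(pool, open_connection, pause=0.0, max_sweeps=100,
                 sleep=time.sleep):
    """Replace every connection in `pool` in place, skipping reserved ones.

    Reserved connections are retried on later sweeps, so nothing in use is
    ever closed. Returns the number of connections still not replaced.
    """
    pending = set(range(len(pool)))
    for _ in range(max_sweeps):
        if not pending:
            break
        for slot in sorted(pending):
            if pool[slot].reserved:
                continue                    # in use: leave it for a later sweep
            pool[slot] = open_connection()  # placement is up to the load balancer
            pending.discard(slot)
            sleep(pause)                    # spread the reconnects over time
    return len(pending)


if __name__ == "__main__":
    # Pretend new connections rotate across the three nodes.
    nodes = itertools.cycle(["node1", "node2", "node3"])
    pool = [Conn("node1"), Conn("node3", reserved=True), Conn("node3")]
    left = gentle_reset(pool, lambda: Conn(next(nodes)), max_sweeps=1)
    print(left, [c.node for c in pool])
```

The `pause` between replacements is what would make this kinder than RESET: the database sees a trickle of reconnects instead of the whole pool at once.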
Anyone see an easier way to do what I'm trying to do? I'm all ears.
Comments
Comments are listed in date ascending order (oldest first)
-
Won't accessing everything through a load-balancing multipool of RAC connection pools help? See this and this. I've never done this, but presumably when the RAC instance comes up again, its connection pool will be available again to the multipool and it'll simply load balance across it (and all the rest).
Posted by: jonmountjoy on May 11, 2005 at 9:29 PM
-
Hi,
Jon is right on here. MultiPools (or Multi Data Sources, as they are called in WLS 9.0 and later) provide some pretty serious load balancing and failover capabilities, including optimizations around determining when a RAC node is unavailable, routing connection requests around the unavailable RAC node, and then reinstating the RAC node in the load balancing scheme when it becomes available again. The only way that WLS can do that is to "know" where it is connecting. That means connecting directly to RAC nodes instead of using some other connection load balancing or failover mechanism such as Oracle Services or driver-level failover.
Also note that WLS provides XA support with load balancing and failover through MultiPools. XA cannot be supported with any other load balancing or failover mechanism. Period!
WebLogic Server configuration for use with RAC is fully documented. See Using WebLogic Server with Oracle RAC. I highly recommend looking at that doc.
Thanks for listening,
Dave
Posted by: dcabelus on June 4, 2007 at 10:40 AM