Tuning a Black Box
Noora Peura's Blog
April 3, 2008 5:12 AM
A while ago I wrote this little micro benchmark so that I would have something to use for screenshots and illustrations. After playing around with tuning parameters for a while, I found that it was an excellent tool for illustrating how to tune BEA JRockit.
The micro benchmark starts a Java thread that performs a given number of tasks, or "events". Each event consists of a fixed number of object allocations. Each allocated object is approximately 64 KB in size, which puts some pressure on the memory management system. The application measures the execution time of each event with System.nanoTime() and prints the execution times with System.out.println(). At the end of the run the application also prints the total time it took to execute all the events. The execution time of each event represents the latencies in the application, while the total run time represents the overall throughput.
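The benchmark's source isn't part of this post, so the following is only a minimal sketch of a benchmark with the same shape. The number of allocations per event and the size of the retained live set are hypothetical values, chosen so that roughly 125 MB stays live (in line with the 100-150 MB mentioned below). Retained objects are replaced at random, so the replaced ones become garbage for the collector to reclaim.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Minimal sketch of an allocation-heavy latency benchmark. All constants are
// hypothetical; they are not the parameters of the original benchmark.
class AllocationBenchmark {
    static final int OBJECT_SIZE = 64 * 1024;      // ~64 KB per allocated object
    static final int ALLOCATIONS_PER_EVENT = 100;  // hypothetical: allocations per event
    static final int LIVE_SET_SLOTS = 2000;        // hypothetical: 2000 x 64 KB ~= 125 MB retained

    /** Runs the events and returns each event's latency in nanoseconds. */
    static long[] runEvents(int events, int liveSlots, Random random) {
        // Retained objects form the live set; replaced objects become garbage.
        List<byte[]> liveSet = new ArrayList<>(liveSlots);
        long[] latencies = new long[events];
        for (int e = 0; e < events; e++) {
            long start = System.nanoTime();
            for (int a = 0; a < ALLOCATIONS_PER_EVENT; a++) {
                byte[] obj = new byte[OBJECT_SIZE];
                if (liveSet.size() < liveSlots) {
                    liveSet.add(obj);                            // grow the live set to its cap
                } else {
                    liveSet.set(random.nextInt(liveSlots), obj); // evict a random old object
                }
            }
            latencies[e] = System.nanoTime() - start;
        }
        return latencies;
    }

    public static void main(String[] args) throws InterruptedException {
        int events = args.length > 0 ? Integer.parseInt(args[0]) : 1000;
        long[][] result = new long[1][];
        long totalStart = System.nanoTime();
        // Run the events on a separate Java thread, as the described benchmark does.
        Thread worker = new Thread(() -> result[0] = runEvents(events, LIVE_SET_SLOTS, new Random(42)));
        worker.start();
        worker.join();
        long totalMs = (System.nanoTime() - totalStart) / 1_000_000;
        for (long ns : result[0]) {
            System.out.println(ns / 1_000_000);          // per-event latency in ms
        }
        System.out.println("Total: " + totalMs + " ms");
    }
}
```

Each printed line is one event's latency in milliseconds, which is the kind of output that can be pasted into a spreadsheet and plotted.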
Today I will show you how you can do some basic tuning of BEA JRockit just by looking at the output of your application. Of course, I know the inner workings of the JVM quite well, but I'll pretend that the JVM is a black box with a few knobs and handles and no output except the Java application itself. I use the micro benchmark described above as my example application.
For this example I choose to run with a fixed heap size. I've adjusted the application to hold about 100-150 MB of live data. I've heard that the heap size should be at least twice the amount of live data, so I set the heap size to 350 MB.
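As a quick sanity check of that rule of thumb, here is a small Java sketch; the class and method names are made up for illustration. It doubles the upper end of the live data estimate and compares the result with the chosen heap. It also prints what the running JVM reports through Runtime.maxMemory(), which should be close to the configured maximum when the heap is fixed, although the exact figure may come out slightly lower depending on the JVM. The post doesn't show the exact flags used; a fixed heap is usually requested by giving -Xms and -Xmx the same value, for example -Xms350m -Xmx350m.

```java
// Hypothetical helper illustrating the "heap >= 2 x live data" rule of thumb.
class HeapSizing {
    /** Smallest heap (in MB) that satisfies the "at least twice the live data" rule. */
    static long minimumHeapMb(long maxLiveDataMb) {
        return 2 * maxLiveDataMb;
    }

    public static void main(String[] args) {
        long maxLiveMb = 150;    // upper end of the 100-150 MB live data estimate
        long chosenHeapMb = 350; // heap size used in the runs
        long minimum = minimumHeapMb(maxLiveMb);
        System.out.println("Minimum by rule of thumb: " + minimum + " MB");
        System.out.println("Headroom above that: " + (chosenHeapMb - minimum) + " MB");
        // Reports the maximum heap the running JVM will attempt to use.
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("JVM reports a max heap of about " + maxHeapMb + " MB");
    }
}
```

With 150 MB of live data the rule asks for at least 300 MB, so 350 MB leaves about 50 MB of extra headroom.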
In my first run I set only the heap size, so BEA JRockit uses its default garbage collection mode, which is optimized for high application throughput.
The total execution time for this run is 209594 ms. When I plot the latencies in a spreadsheet, I see that most are below 50 ms, but a considerable number are scattered between 50 and almost 120 ms, as seen in Figure 1.
Figure 1

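Instead of (or alongside) eyeballing the spreadsheet plot, the printed latencies can be summarized with a few lines of Java. This is a sketch with hypothetical names: the sample array holds made-up values, not the measurements from this run, and the percentile uses the simple nearest-rank method.

```java
import java.util.Arrays;

// Hypothetical summary of per-event latencies (in ms) against a latency goal.
class LatencySummary {
    /** Number of latencies strictly above the threshold. */
    static long countAbove(long[] latenciesMs, long thresholdMs) {
        return Arrays.stream(latenciesMs).filter(l -> l > thresholdMs).count();
    }

    /** Nearest-rank percentile, for p in (0, 100]. */
    static long percentile(long[] latenciesMs, double p) {
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Made-up sample values, not data from the runs described in this post.
        long[] sample = {12, 35, 48, 51, 22, 118, 9, 44, 67, 30};
        System.out.println("Events above 50 ms: " + countAbove(sample, 50));
        System.out.println("99th percentile: " + percentile(sample, 99) + " ms");
        System.out.println("Max: " + Arrays.stream(sample).max().getAsLong() + " ms");
    }
}
```

For the sample above this reports 3 events above 50 ms, and both the 99th percentile and the maximum are 118 ms. Counting events above the goal gives a single number to compare between runs, which is easier than comparing two scatter plots by eye.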
Now, let's say that my application is a transaction system and that I don't want the transactions to take longer than 50 or 60 ms to execute. The variations in transaction times when running in throughput mode would thus not be acceptable. One reason for variations in the latencies could be that the garbage collector pauses the Java execution now and then. Let's test this theory!
For my second run I use the command line option -Xgcprio:pausetime to select the garbage collection mode optimized for short pauses. The results don't look too good: the latencies are actually higher (see Figure 2), and so is the total execution time (246781 ms). Why is that? I check the documentation for -Xgcprio:pausetime and find that there is also a pause target parameter that you can use to define what you mean by "short pauses". By default the pause target is 500 ms. That is nowhere near good enough if I want the latencies below 60 ms.
Figure 2

The lowest pause target setting for the pausetime mode is 200 ms. That works for larger applications where transactions can take a second or so, but for my small application that isn't enough. I must look for other options. How about the garbage collection mode for short and deterministic pauses? With this I can set much lower pause targets, so I decide to try that. The "deterministic garbage collector" is a part of BEA WebLogic Real Time (http://www.bea.com/framework.jsp?CNT=index.htm&FP=/content/products/weblogic/realtime/).
I do a run with -Xgcprio:deterministic and the default pause target setting, which is 30 ms. Immediately the "scatter" is reduced. Most latencies are below 70 ms, as you can see in Figure 3. The total execution time is 213843 ms, which isn't bad compared to the run in throughput mode.
Figure 3

However, I have not quite reached my goal yet. Examining the graph of the latencies, I notice that most of them are between 5 and 30 ms. Maybe each "event" itself takes up to 30 ms to execute? Within that time we might get a couple of garbage collection pauses that total more than 30 ms, so of course the latencies can climb up to 70 ms.
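The reasoning above is essentially a back-of-the-envelope latency budget. Here is a sketch of it with hypothetical names, assuming an event does about 30 ms of its own work and treating the pause target as an upper bound on each pause. A pause target is a goal rather than a guarantee, so real pauses may be longer or shorter. The sketch compares the 30 ms default with a 10 ms target against a 60 ms latency goal.

```java
// Hypothetical latency budget: event work plus garbage collection pauses.
class PauseBudget {
    /** Worst-case latency if an event is hit by `pauses` pauses that each reach the target. */
    static long worstCaseMs(long eventWorkMs, int pauses, long pauseTargetMs) {
        return eventWorkMs + pauses * pauseTargetMs;
    }

    /** How many full-length pauses fit in the latency goal after the event's own work. */
    static long pausesThatFit(long goalMs, long eventWorkMs, long pauseTargetMs) {
        return Math.max(0L, (goalMs - eventWorkMs) / pauseTargetMs);
    }

    public static void main(String[] args) {
        long work = 30; // assumed event work, as guessed in the text
        long goal = 60; // latency goal for the transactions
        for (long target : new long[] {30, 10}) {
            System.out.println("target " + target + " ms: two pauses give "
                + worstCaseMs(work, 2, target) + " ms worst case; "
                + pausesThatFit(goal, work, target) + " pause(s) fit in a " + goal + " ms goal");
        }
    }
}
```

With a 30 ms target, only one full-length pause fits in the 60 ms goal, and two such pauses push an event to 90 ms. With a 10 ms target, two pauses give 50 ms and three pauses still fit, which is the motivation for lowering the target.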
I decide to do one more run using the deterministic mode. This time I set the pause target to 10 ms, using the command line option -XpauseTarget:10ms. Finally the results look good, as you can see for yourself in Figure 4. The graph from the spreadsheet shows only one stray latency longer than 60 ms, and most are below 50 ms. One long latency of about 9500 I can live with, at least for now.
Figure 4

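For repeatable benchmarking rounds it helps to script the configurations rather than retype them. The sketch below only builds and prints the command lines for the four runs in this post. The -Xgcprio and -XpauseTarget options are the ones used above and are JRockit-specific, so other JVMs will reject them. The -Xms/-Xmx pair for the fixed 350 MB heap and the AllocationBenchmark main class name are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that builds the command line for each tuning run.
class TuningRuns {
    /** Ordered map from run name to its full command line. */
    static Map<String, List<String>> configurations(String heap, String mainClass) {
        Map<String, List<String>> runs = new LinkedHashMap<>();
        runs.put("throughput (default)", command(heap, mainClass));
        runs.put("pausetime", command(heap, mainClass, "-Xgcprio:pausetime"));
        runs.put("deterministic", command(heap, mainClass, "-Xgcprio:deterministic"));
        runs.put("deterministic, 10 ms", command(heap, mainClass,
                "-Xgcprio:deterministic", "-XpauseTarget:10ms"));
        return runs;
    }

    /** Fixed heap (same -Xms and -Xmx) plus any garbage collection flags. */
    static List<String> command(String heap, String mainClass, String... gcFlags) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-Xms" + heap);
        cmd.add("-Xmx" + heap);
        cmd.addAll(Arrays.asList(gcFlags));
        cmd.add(mainClass);
        return cmd;
    }

    public static void main(String[] args) {
        // "AllocationBenchmark" is a hypothetical main class name.
        configurations("350m", "AllocationBenchmark").forEach((name, cmd) ->
                System.out.println(name + ": " + String.join(" ", cmd)));
        // To launch a run on JRockit: new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

Keeping every run identical apart from the garbage collection flags makes it easier to attribute differences in the latency plots to the collector settings.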
What about the total execution time? Actually, it looks very good: 200234 ms. Even if you allow for a variation of a few percent in the total execution time measurements, this shows that the deterministic mode is on par with or better than throughput mode for this particular application. With a few benchmarking rounds and some analysis of the results between each run I've found a garbage collection mode and pause target setting that not only gives me acceptable latencies but also gives me good performance.
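To put "a few percent" into numbers, here is a quick calculation of each run's total execution time relative to the throughput-mode baseline, using the totals reported in this post; the class name is made up. With a single run per configuration, the deterministic run with a 10 ms pause target finishing about 4.5% faster is within the kind of variation allowed for above, which is why "on par with or better than" is the cautious reading.

```java
import java.util.Locale;

// Relative change in total execution time versus the throughput-mode run.
class ThroughputComparison {
    /** Percentage change relative to the baseline; negative means faster. */
    static double percentChange(long baselineMs, long runMs) {
        return 100.0 * (runMs - baselineMs) / baselineMs;
    }

    public static void main(String[] args) {
        long baseline = 209594; // throughput mode, first run
        String[] names = {"pausetime", "deterministic (30 ms)", "deterministic (10 ms)"};
        long[] totals = {246781, 213843, 200234};
        for (int i = 0; i < names.length; i++) {
            System.out.printf(Locale.ROOT, "%s: %+.1f%%%n", names[i], percentChange(baseline, totals[i]));
        }
    }
}
```

This prints +17.7% for pausetime mode, +2.0% for deterministic mode with the default pause target, and -4.5% for deterministic mode with a 10 ms pause target.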
Of course this is a micro benchmark. The results will most likely be different if I run another application, but now I at least know how to think and what to look for.