<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Noora Peura&apos;s Blog</title>
    <link rel="alternate" type="text/html" href="http://dev2dev.bea.com/blog/noora/" />
    <link rel="self" type="application/atom+xml" href="http://dev2dev.bea.com/blog/noora/atom.xml" />
   <id>tag:dev2dev.bea.com,2008:/blog/noora//270</id>
    <updated>2008-04-03T13:12:37Z</updated>
    
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type 3.31</generator>
 
<entry>
    <title>Tuning a Black Box</title>
    <link rel="alternate" type="text/html" href="http://dev2dev.bea.com/blog/noora/archive/2008/04/tuning_a_black.html" />
    <id>http://dev2dev.bea.com/blog/noora/archive/2008/04/tuning_a_black.html</id>
    
    <published>2008-04-03T13:12:37Z</published>
    <updated>2008-04-03T13:12:37Z</updated>
    
    <summary>Today I will show you how you can do some basic tuning of BEA JRockit just by looking at the output of your application. Of course, I know the inner workings of the JVM quite well, but I&apos;ll pretend that the JVM is a black box with a few knobs and handles and no output except the Java application itself.</summary>
    <author>
        <name>noora</name>
        
    </author>
            <category term="Product: BEA JRockit" />
    
    <content type="html" xml:lang="" xml:base="http://dev2dev.bea.com/blog/noora/">
        <![CDATA[<p>A while ago I wrote this little micro benchmark to have something to use for screenshots and illustrations. After playing around with tuning parameters for a while, I found that it was an excellent tool for illustrating how to tune BEA JRockit.</p>

<p>The micro benchmark starts a Java thread that performs a given number of tasks, or "events". Each event consists of a fixed number of object allocations. Each allocated object is approximately 64 KB in size, which puts some pressure on the memory management system. The application measures the execution time of each event with System.nanoTime() and prints the execution times with System.out.println(). At the end of the run, the application also prints the total time it took to execute all the events. The execution time of each event represents the latency of the application, while the total run time represents its overall throughput.</p>
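<p>The benchmark source isn't part of this post, so here is a minimal Java sketch of the structure described above. It is not the original code: the class and method names, the event and allocation counts, and the ring of retained objects used to simulate roughly 100-150 MB of live data are all assumptions made for illustration.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reconstruction of the micro benchmark described above (not the original code).
final class EventBenchmark {
    static final int OBJECT_SIZE = 64 * 1024; // each allocation is about 64 KB

    // Runs the events on the calling thread and returns each event's latency in ms.
    // retainedObjects: how many recent objects to keep reachable as "live data" (assumption).
    static long[] run(int events, int allocationsPerEvent, int retainedObjects) {
        List<byte[]> live = new ArrayList<>();
        long[] latenciesMs = new long[events];
        long allocations = 0;
        for (int e = 0; e < events; e++) {
            long start = System.nanoTime();
            for (int a = 0; a < allocationsPerEvent; a++) {
                byte[] obj = new byte[OBJECT_SIZE];
                if (retainedObjects > 0) {
                    // Replace entries in a fixed-size ring so the live set stays bounded.
                    int slot = (int) (allocations % retainedObjects);
                    if (slot < live.size()) {
                        live.set(slot, obj);
                    } else {
                        live.add(obj);
                    }
                }
                allocations++;
            }
            latenciesMs[e] = (System.nanoTime() - start) / 1_000_000L;
            System.out.println(latenciesMs[e]); // one latency per line, as in the original
        }
        return latenciesMs;
    }

    public static void main(String[] args) throws InterruptedException {
        long[][] result = new long[1][];
        // 2000 retained objects x 64 KB is roughly 125 MB of live data.
        Thread worker = new Thread(() -> result[0] = run(1000, 100, 2000));
        long start = System.nanoTime();
        worker.start();
        worker.join();
        System.out.println("Total: " + (System.nanoTime() - start) / 1_000_000L + " ms");
    }
}
```

<p>The per-event values printed to standard output are the kind of data you would paste into a spreadsheet to get plots like the ones below.</p>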

<p>Today I will show you how you can do some basic tuning of BEA JRockit just by looking at the output of your application. Of course, I know the inner workings of the JVM quite well, but I'll pretend that the JVM is a black box with a few knobs and handles and no output except the Java application itself. I use the micro benchmark described above as my example application. </p>

<p>For this example I choose to run with a fixed heap size. I've adjusted the application to hold about 100-150 MB of live data. I've heard that the heap size should be at least twice the amount of live data, which means at least 300 MB here, so I set the heap size to 350 MB.</p>

<p>In my first run only the heap size is set, so BEA JRockit uses its default garbage collection mode, which is optimized for high application throughput.</p>

<p>The total execution time for this run is 209594 ms. When I plot the latencies in a spreadsheet, I see that most are below 50 ms, but a considerable number are scattered between 50 ms and almost 120 ms, as seen in Figure 1.</p>

<p><b>Figure 1</b><br />
<img alt="thrputgraph.jpg" src="http://dev2dev.bea.com/blog/noora/thrputgraph.jpg" width="554" height="350" /></p>

<p><br />
Now, let's say that my application is a transaction system and that I don't want any transaction to take longer than 50 or 60 ms to execute. The variations in transaction times when running in throughput mode would thus not be acceptable. One reason for the variations in latency could be that the garbage collector pauses the Java execution now and then. Let's test this theory!</p>
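<p>Plotting works well, but the same check can also be done in a few lines of code. This is a small illustrative helper, not part of the original benchmark: given the per-event latencies in milliseconds, it counts the events above a target (60 ms here) and reports a nearest-rank percentile and the maximum. The sample values in main are made up.</p>

```java
import java.util.Arrays;

// Illustrative helper for checking per-event latencies (in ms) against a target.
final class LatencySummary {
    // Number of events slower than the target.
    static int countAbove(long[] latenciesMs, long targetMs) {
        int n = 0;
        for (long l : latenciesMs) {
            if (l > targetMs) {
                n++;
            }
        }
        return n;
    }

    // Nearest-rank percentile, p in (0, 100]; expects a non-empty array.
    static long percentile(long[] latenciesMs, double p) {
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    // The 100th percentile is the maximum.
    static long max(long[] latenciesMs) {
        return percentile(latenciesMs, 100.0);
    }

    public static void main(String[] args) {
        long[] sample = {12, 35, 48, 55, 118, 20}; // made-up values, not results from this post
        System.out.println("Above 60 ms: " + countAbove(sample, 60)); // 1
        System.out.println("Median: " + percentile(sample, 50));      // 35
        System.out.println("Max: " + max(sample));                     // 118
    }
}
```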

<p>For my second run I use the command line option -Xgcprio:pausetime to select the garbage collection mode optimized for short pauses. The results don't look good: the latencies are actually higher (see Figure 2), and so is the total execution time (246781 ms). Why is that? I check the documentation for -Xgcprio:pausetime and find that there is also a pause target parameter that you can use to define what you mean by "short pauses". By default the pause target is 500 ms. That is nowhere near good enough if I want the latencies below 60 ms.</p>

<p><b>Figure 2</b><br />
<img alt="pausetimegraph.jpg" src="http://dev2dev.bea.com/blog/noora/pausetimegraph.jpg" width="551" height="350" /></p>

<p>The lowest pause target setting for the pausetime mode is 200 ms. That works for larger applications where transactions can take a second or so, but not for my small application, so I must look for other options. How about the garbage collection mode for short and deterministic pauses? It accepts much lower pause targets, so I decide to try it. The "deterministic garbage collector" is a part of BEA WebLogic Real Time (http://www.bea.com/framework.jsp?CNT=index.htm&FP=/content/products/weblogic/realtime/).</p>

<p>I do a run with -Xgcprio:deterministic and the default pause target setting, which is 30 ms. The "scatter" is immediately reduced: most latencies are below 70 ms, as you can see in Figure 3. The total execution time is 213843 ms, which isn't bad compared to the run in throughput mode.</p>

<p><b>Figure 3</b><br />
<img alt="det30graph.jpg" src="http://dev2dev.bea.com/blog/noora/det30graph.jpg" width="551" height="350" /></p>

<p>However, I haven't quite reached my goal yet. Examining the graph of the latencies, I notice that most of them are between 5 and 30 ms. Maybe each "event" itself takes up to 30 ms to execute? Within that time we might get a couple of garbage collection pauses that add up to more than 30 ms, so of course the latencies can climb up to 70 ms.</p>

<p>I decide to do one more run using the deterministic mode. This time I set the pause target to 10 ms, using the command line option -XpauseTarget:10ms. Finally the results look good, as you can see for yourself in Figure 4. The graph from the spreadsheet shows only one stray latency longer than 60 ms, and most are below 50 ms. That one long latency of about 9500 I can live with, at least for now.</p>

<p><b>Figure 4</b><br />
<img alt="det10graph.jpg" src="http://dev2dev.bea.com/blog/noora/det10graph.jpg" width="551" height="350" /></p>

<p>What about the total execution time? Actually, it looks very good: 200234 ms. Even if you allow for a variation of a few percent in the total execution time measurements, this shows that the deterministic mode is on par with or better than throughput mode for this particular application. With a few benchmarking rounds and some analysis of the results between each run I've found a garbage collection mode and pause target setting that not only gives me acceptable latencies but also gives me good performance.</p>
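<p>To put the four runs side by side, the snippet below computes each run's total execution time relative to the throughput-mode run (209594 ms), using only the numbers reported in this post. It prints roughly +17.7% for the pausetime mode, +2.0% for the deterministic mode with the default 30 ms target, and -4.5% with the 10 ms target. The class and method names are just for illustration.</p>

```java
// Relative total execution time of each run compared with the throughput-mode run.
// Positive values mean slower than the baseline.
final class RunComparison {
    static double percentChange(long baselineMs, long runMs) {
        return 100.0 * (runMs - baselineMs) / baselineMs;
    }

    public static void main(String[] args) {
        long throughputMs = 209594; // default throughput mode
        String[] names = {"pausetime (default target)", "deterministic, 30 ms", "deterministic, 10 ms"};
        long[] runsMs = {246781, 213843, 200234};
        for (int i = 0; i < runsMs.length; i++) {
            System.out.printf("%-28s %+.1f%%%n", names[i], percentChange(throughputMs, runsMs[i]));
        }
    }
}
```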

<p>Of course this is a micro benchmark. The results will most likely be different for another application, but now I at least know how to approach the problem and what to look for.<br />
</p>]]>
        
    </content>
</entry>
<entry>
    <title>Control Your Youngsters - Nursery Sizing and its Problems</title>
    <link rel="alternate" type="text/html" href="http://dev2dev.bea.com/blog/noora/archive/2008/03/control_your_yo.html" />
    <id>http://dev2dev.bea.com/blog/noora/archive/2008/03/control_your_yo.html</id>
    
    <published>2008-03-04T14:11:38Z</published>
    <updated>2008-03-05T09:14:14Z</updated>
    
    <summary>Swift allocation and garbage collection of short lived objects can improve the application throughput significantly. Thus most modern garbage collectors, including the ones in BEA JRockit, use a separate space for newly allocated objects. The name of this space varies. In BEA JRockit we call it the &quot;nursery&quot; or the &quot;young space&quot;. I will talk a little about the rationales behind the dynamic nursery sizing heuristics in BEA JRockit and the problems that both BEA JRockit and anyone who attempts to tune the nursery size manually may encounter.
</summary>
    <author>
        <name>noora</name>
        
    </author>
            <category term="Product: BEA JRockit" />
    
    <content type="html" xml:lang="" xml:base="http://dev2dev.bea.com/blog/noora/">
        <![CDATA[<p>Many Java applications allocate a lot of short-lived temporary objects. Swift allocation and garbage collection of these short-lived objects can improve the application throughput significantly. Thus most modern garbage collectors, including the ones in BEA JRockit, use a separate space for newly allocated objects. The name of this space varies. In BEA JRockit we call it the "nursery" or the "young space". </p>

<p>An ordinary garbage collection traverses all references in all objects in the heap, while a garbage collection of the nursery (young collection) only traverses references that have changed since the last garbage collection. Any "live" objects found in the nursery are promoted (moved) to the "old space". This frees up the entire nursery for swift new object allocation. Using a nursery will thus improve both garbage collection efficiency and object allocation speed.</p>

<p>The BEA JRockit Diagnostics Guide contains information on how to tune the nursery size, so I will not go into those details here. Instead, I will talk a little about the rationale behind the dynamic nursery sizing heuristics and the problems that both BEA JRockit and anyone who attempts to tune the nursery size manually may encounter.</p>

<p><B>The Basics</B></p>

<p>Figures 1-3 illustrate a basic scenario with a nursery and an old space. Objects are allocated in the nursery, and when the nursery is full the objects that are still alive are promoted to the old space.</p>

<p>Let's say that each allocated object lives for exactly ten seconds, no more and no less. Thus, when the nursery becomes full, the only live objects within the nursery are the ones that were allocated less than ten seconds ago. In a simple young collection (without a keep area) these are the objects that would be promoted. In this idealized case, the size of the nursery wouldn't matter for promotion at all: as long as the young collections are at least ten seconds apart, each one promotes the same amount of objects. A larger nursery would simply mean that each young collection frees a larger amount of memory. The frequency of old collections, on the other hand, would depend directly on the size of the old space: a smaller old space would mean more frequent old collections. Since the nursery is a part of the heap, and the old space is what is left of the heap when the nursery has been reserved, a larger nursery would allow fewer young collections before an old collection must be performed.</p>

<p><B>Figure 1: The Heap</B><br />
<img alt="os_nursery.jpg" src="http://dev2dev.bea.com/blog/noora/os_nursery.jpg" width="400" height="300" /> </p>

<p><B>Figure 2: "Before"</B><br />
<img alt="nursery_w_objects.jpg" src="http://dev2dev.bea.com/blog/noora/nursery_w_objects.jpg" width="400" height="300" /> </p>

<p><B>Figure 3: "After"</B><br />
<img alt="nursery_promoted.jpg" src="http://dev2dev.bea.com/blog/noora/nursery_promoted.jpg" width="400" height="300" /></p>
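<p>The ten-second example can be turned into a small back-of-the-envelope model. This sketch is my own simplification, not how BEA JRockit sizes the nursery: it assumes a constant allocation rate, objects that live exactly ten seconds, no keep area and no long-lived data. With a hypothetical 1000 MB heap and 10 MB/s of allocation, a 200 MB nursery and a 400 MB nursery both promote 100 MB per young collection, but the larger nursery frees 300 MB instead of 100 MB each time and leaves room for only 6 instead of 8 young collections before the old space is full.</p>

```java
// Back-of-the-envelope model of the "every object lives exactly ten seconds"
// example (no keep area, no long-lived data). Sizes in MB, times in seconds.
final class NurseryModel {
    final double heapMb;
    final double nurseryMb;
    final double allocRateMbPerSec;
    final double lifetimeSec;

    NurseryModel(double heapMb, double nurseryMb, double allocRateMbPerSec, double lifetimeSec) {
        this.heapMb = heapMb;
        this.nurseryMb = nurseryMb;
        this.allocRateMbPerSec = allocRateMbPerSec;
        this.lifetimeSec = lifetimeSec;
    }

    // The nursery fills up at the allocation rate, which triggers a young collection.
    double secondsBetweenYoungCollections() {
        return nurseryMb / allocRateMbPerSec;
    }

    // Only objects allocated during the last lifetimeSec seconds are still alive
    // (or everything in the nursery, if collections are closer together than that).
    double promotedPerYoungCollectionMb() {
        return allocRateMbPerSec * Math.min(lifetimeSec, secondsBetweenYoungCollections());
    }

    double freedPerYoungCollectionMb() {
        return nurseryMb - promotedPerYoungCollectionMb();
    }

    // Promoted data piles up in the old space (heap minus nursery) until an old collection.
    long youngCollectionsBeforeOldCollection() {
        return (long) Math.floor((heapMb - nurseryMb) / promotedPerYoungCollectionMb());
    }

    public static void main(String[] args) {
        for (double nursery : new double[] {200, 400}) {
            NurseryModel m = new NurseryModel(1000, nursery, 10, 10);
            System.out.printf("nursery %.0f MB: promoted %.0f MB, freed %.0f MB, %d young collections%n",
                    nursery, m.promotedPerYoungCollectionMb(), m.freedPerYoungCollectionMb(),
                    m.youngCollectionsBeforeOldCollection());
        }
    }
}
```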

<p>The goal here is to get as much memory as possible freed by young collections rather than by old collections. I won't go into the details of why, but surprisingly it turns out that a nursery size of approximately half of the free memory on the heap is nearly optimal in this sense.</p>

<p>Promoting objects just because they haven't had a chance to die yet when the young collection starts feels a bit silly. Thus BEA JRockit uses something called a "keep area". The last bit of the nursery is designated as the "keep area", and objects within the keep area won't be promoted until the second young collection after they were created, provided that they are still alive at that point. In our ideal world where objects always live exactly ten seconds, we would tune the keep area to fit exactly these objects, and nothing would ever be promoted. In this world, the nursery size could be increased almost indefinitely, since the old space only needs room for a few long-lived objects.</p>

<p>Figures 4-6 illustrate a young collection using a keep area. As you can see, no objects are promoted in this example.</p>

<p><B>Figure 4: The Heap</B><br />
<img alt="os_ns_keeparea.jpg" src="http://dev2dev.bea.com/blog/noora/os_ns_keeparea.jpg" width="400" height="300" /> </p>

<p><B>Figure 5: "Before"</B><br />
<img alt="keeparea_w_objects.jpg" src="http://dev2dev.bea.com/blog/noora/keeparea_w_objects.jpg" width="400" height="300" /> </p>

<p><B>Figure 6: "After"</B><br />
<img alt="keeparea_after_yc.jpg" src="http://dev2dev.bea.com/blog/noora/keeparea_after_yc.jpg" width="400" height="300" /></p>
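<p>The effect of the keep area can be explored with a toy simulation. This is a hypothetical model, not BEA JRockit's implementation: one object is allocated per tick, every object lives exactly ten ticks, the nursery holds 30 objects, and the keep area is simply the most recently allocated objects. An object kept once is promoted at the next young collection if it is still alive. With no keep area, 10 objects are promoted per young collection; with a keep area of 5 objects, 5 are; with a keep area that fits ten ticks of allocation, nothing is ever promoted, just as in the ideal world described above.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Toy simulation of young collections with a keep area (hypothetical model).
// One object is allocated per tick and lives exactly lifetimeTicks ticks.
final class KeepAreaSimulation {
    private static final class Obj {
        final long allocTick;
        boolean keptOnce;

        Obj(long allocTick) {
            this.allocTick = allocTick;
        }
    }

    // Returns how many objects were promoted to the old space during the run.
    static long promoted(int nurseryCapacity, int keepAreaObjects, int lifetimeTicks, long ticks) {
        List<Obj> nursery = new ArrayList<>();
        long promoted = 0;
        for (long t = 0; t < ticks; t++) {
            nursery.add(new Obj(t));            // newest object at the end of the nursery
            if (nursery.size() < nurseryCapacity) {
                continue;                       // nursery not full yet
            }
            // Young collection at tick t; the keep area is the last part of the nursery.
            int keepStart = nursery.size() - keepAreaObjects;
            List<Obj> kept = new ArrayList<>();
            for (int i = 0; i < nursery.size(); i++) {
                Obj o = nursery.get(i);
                if (t - o.allocTick >= lifetimeTicks) {
                    continue;                   // dead: its memory is simply reused
                }
                if (i >= keepStart && !o.keptOnce) {
                    o.keptOnce = true;          // first young collection: keep it in the nursery
                    kept.add(o);
                } else {
                    promoted++;                 // still alive: promote to the old space
                }
            }
            nursery = kept;
        }
        return promoted;
    }

    public static void main(String[] args) {
        for (int keep : new int[] {0, 5, 10}) {
            System.out.println("keep area " + keep + ": " + promoted(30, keep, 10, 300) + " promoted");
        }
        // Prints 100, 55 and 0 promoted objects respectively.
    }
}
```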

<p><B>The Quirks</B></p>

<p>Now, BEA JRockit has a little quirk: large objects can be allocated directly in the old space. Thus, if your application allocates a lot of large objects, the old space may become full before even one young collection has been performed. Don't worry, you can fix that too. Just raise the limit for which objects are considered "large", using the option -XXlargeObjectLimit. Everything smaller than the large object limit will always be allocated in the nursery.</p>

<p>Another aspect of nursery sizing is garbage collection pause times. If your application is sensitive to latencies, you may want to decrease the nursery size to shorten the young collection pause times. The pause time isn't linearly dependent on the nursery size, but you have a fair chance of getting shorter pause times by reducing the nursery size.</p>

<p>So, why can't BEA JRockit do this all by itself? Well, it makes a fair attempt on some of it, but unfortunately the real world has a quirk of its own: the lifetimes of the objects vary. Sometimes they vary a lot. You will notice this also when you tune the nursery size by hand. What in theory sounds optimal can turn out to be a bit off, or even quite suboptimal when the inexactness of the real world is thrown in the face of neat models and calculations.</p>

<p>By default, BEA JRockit optimizes garbage collection for maximum throughput. In this mode the nursery size is adjusted automatically at runtime. The nursery can even be disabled if most of the allocated objects are large. The static generational parallel garbage collector (-Xgc:genpar) uses similar heuristics to adjust the nursery size. For both of these garbage collectors you can override the dynamic nursery size by setting a nursery size on the command line (-Xns:&lt;size&gt;). If you're using the garbage collection mode optimized for short pauses (-Xgcprio:pausetime) or the static generational concurrent garbage collector (-Xgc:gencon), you will most likely want to tune the nursery size manually.</p>
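<p>When experimenting with these options it is easy to lose track of which ones a given run was actually started with. A sketch, assuming the JVM provides the standard java.lang.management API (part of Java SE since 5.0): RuntimeMXBean.getInputArguments() returns the options passed to the JVM, and the helper below keeps the memory management options mentioned in this post. The class name and the list of prefixes are my own choices.</p>

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Picks out the memory management options a JVM was started with.
final class GcOptions {
    // Prefixes of the options discussed in this post ("-Xgc" also matches "-Xgcprio").
    static final String[] PREFIXES = {
        "-Xgc", "-Xns", "-XpauseTarget", "-XXsetGc", "-XXlargeObjectLimit",
        "-XXfullCompaction", "-Xms", "-Xmx"
    };

    static List<String> memoryOptions(List<String> jvmArgs) {
        List<String> result = new ArrayList<>();
        for (String arg : jvmArgs) {
            for (String prefix : PREFIXES) {
                if (arg.startsWith(prefix)) {
                    result.add(arg);
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Options of the JVM running this code, as reported by the standard RuntimeMXBean.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        for (String option : memoryOptions(jvmArgs)) {
            System.out.println(option);
        }
    }
}
```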

<p>Read more about nursery sizing and garbage collection in the <a href="http://edocs.bea.com/jrockit/geninfo/diagnos/index.html">BEA JRockit Diagnostics Guide</a>.</p>]]>
        
    </content>
</entry>
<entry>
    <title>Strategies in the Battle Against Garbage Objects</title>
    <link rel="alternate" type="text/html" href="http://dev2dev.bea.com/blog/noora/archive/2007/10/garbage_collect.html" />
    <id>http://dev2dev.bea.com/blog/noora/archive/2007/10/garbage_collect.html</id>
    
    <published>2007-10-23T13:09:43Z</published>
    <updated>2007-10-23T13:09:51Z</updated>
    
    <summary>How can you minimize the impact of garbage collection? We can&apos;t get rid of it for good, but we can put it where it doesn&apos;t matter so much. BEA JRockit offers you several garbage collection modes and strategies to let you decide how to deal with the garbage.</summary>
    <author>
        <name>noora</name>
        
    </author>
            <category term="Product: BEA JRockit" />
    
    <content type="html" xml:lang="" xml:base="http://dev2dev.bea.com/blog/noora/">
        <![CDATA[<P>
To the great relief of application developers, Java uses automatic memory management, where objects are allocated and freed automatically. Those of us who have spent endless hours debugging memory leaks and segmentation faults in non-garbage collected programming languages know the difference and appreciate it greatly. There is of course a downside. The garbage collector steals valuable CPU time from your Java application and even interrupts it now and then for garbage collection. Nobody likes that, but it is the cost of being rid of those segmentation faults for good. 
<P>
So, how do we minimize the impact of garbage collection? We can't get rid of it for good (believe me, many people have tried), but we can put it where it doesn't matter so much. Do garbage collection pauses matter to your application? Are you willing to let the garbage collector run concurrently with the application and trade some overall application throughput for minimal garbage collection pauses? Sip it in small gulps or quaff it all in one go? Or do you want something in between? 
<P>
BEA JRockit offers a selection of garbage collection "modes". There are currently three "dynamic" modes and a "static" mode. The dynamic modes optimize garbage collection for a specific profile: maximum overall throughput, short pauses, or very short and deterministic pauses. These modes select a garbage collection "strategy" based on how the application behaves, and may change strategies if the application's behavior changes. The static mode, on the other hand, lets you select a single garbage collection strategy to be used throughout the run. 
<P>
All applications want throughput, the higher the better. Thus the dynamic mode for maximum throughput is the default garbage collection mode in BEA JRockit. Some applications do, however, have an additional requirement that latencies should be low. For example, a transaction may time out if a garbage collection interrupts it for too long, or a user will go and get some coffee and get lost on the way back if the graphical user interface suddenly freezes for garbage collection. For these applications, the garbage collection in BEA JRockit needs to be tuned a bit. It costs some overall throughput, but it saves the day for applications that would become unusable if garbage collection pauses were too long.
<P>
The dynamic mode optimized for short pauses will do its best to keep the garbage collection pauses below a user defined pause target. The dynamic mode optimized for deterministic and very short pauses (available as a part of BEA WebLogic Real Time only) does this even better and for even lower pause targets. But sometimes you want something more... specific. Maybe you've noted that a generational parallel garbage collection strategy gives you just the right kind of garbage collection behavior, or maybe you are a firm believer in single generational garbage collection with a parallel mark phase and a mostly concurrent sweep phase. In that case you may want to try a static garbage collection mode. 
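<P>
The choice described in the last few paragraphs fits in a few lines of Java. This is only an illustration of the reasoning, with a made-up enum for the application's needs; the options it returns are the ones discussed in this post, and an empty string means "leave the default throughput mode alone".

```java
// Illustration of the selection logic described above. The Requirement enum is
// hypothetical; the returned strings are the options discussed in this post.
final class GcModeChooser {
    enum Requirement { THROUGHPUT, SHORT_PAUSES, DETERMINISTIC_PAUSES }

    // staticStrategy: null to use a dynamic mode, or an -Xgc strategy name such as "genpar".
    static String option(Requirement requirement, String staticStrategy) {
        if (staticStrategy != null) {
            return "-Xgc:" + staticStrategy;          // static mode: one strategy for the whole run
        }
        switch (requirement) {
            case SHORT_PAUSES:
                return "-Xgcprio:pausetime";          // keeps pauses below a pause target
            case DETERMINISTIC_PAUSES:
                return "-Xgcprio:deterministic";      // BEA WebLogic Real Time only
            default:
                return "";                            // throughput mode is the default: no option needed
        }
    }
}
```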
<P>
Four static garbage collection strategies are available with the command line option -Xgc, but hidden within BEA JRockit there are no fewer than twelve different garbage collection strategies! These can be accessed with the command line option -XXsetGc. Four of these are the same as the ones that you can set with -Xgc, and the other eight are the obscure combinations where the mark phase and the sweep phase of the garbage collection use different strategies. 
<P>
Now, if "mark and sweep", "generational", "mostly concurrent" and "parallel" are complete mumbo jumbo to you and you don't even want to know, I suggest that you select one of the dynamic garbage collection modes, or at least one of the four static strategies available with -Xgc. If you on the other hand look forward to running twelve rounds of benchmarking (plus some extra to vary the nursery size, because in this case size does matter) and want to see if you can find that optimal garbage collection strategy for your application, then -XXsetGc is your friend.
<P>
And if you're completely bewildered by now and want to know everything about what I'm talking about, I would recommend the sections called <A HREF="http://edocs.bea.com/jrockit/geninfo/diagnos/garbage_collect.html">Understanding Memory Management</A> and <A HREF="http://edocs.bea.com/jrockit/geninfo/diagnos/memman.html">Tuning the Memory Management System</A> in the <A HREF="http://edocs.bea.com/jrockit/geninfo/diagnos/index.html">BEA JRockit Diagnostics Guide</A>, as well as the documentation on the <A HREF="http://edocs.bea.com/jrockit/jrdocs/refman/optionX.html#wp999520">-Xgc</A>, <A HREF="http://edocs.bea.com/jrockit/jrdocs/refman/optionX.html#wp999522">-Xgcprio</A> and <A HREF="http://edocs.bea.com/jrockit/jrdocs/refman/optionXX.html#wp999595">-XXsetGc</A> command line options.
<P>]]>
        
    </content>
</entry>
<entry>
    <title>The Secrets of Heap Sizing</title>
    <link rel="alternate" type="text/html" href="http://dev2dev.bea.com/blog/noora/archive/2007/10/the_secrets_of.html" />
    <id>http://dev2dev.bea.com/blog/noora/archive/2007/10/the_secrets_of.html</id>
    
    <published>2007-10-03T16:04:45Z</published>
    <updated>2007-10-03T16:35:05Z</updated>
    
    <summary>&quot;How big should I make the Java heap?&quot;

That is a question I get quite often from friends who develop Java applications. The simple answer is: it depends on your application and your machine. Most people aren&apos;t satisfied by that answer, so in this post I&apos;m giving you the long version.
</summary>
    <author>
        <name>noora</name>
        
    </author>
            <category term="Product: BEA JRockit" />
    
    <content type="html" xml:lang="" xml:base="http://dev2dev.bea.com/blog/noora/">
        <![CDATA[<p>"How big should I make the Java heap?"</p>

<p>That is a question I get quite often from friends who develop Java applications and hear that I work with a JVM. It is a simple question, but surprisingly tricky to answer. The simple answer is: it depends on your application and your machine. Most people don't like that answer, so here's the long version instead.</p>

<p>Usually a larger heap is better. Larger heaps mean fewer garbage collections, and fewer garbage collections are good. Unfortunately it is not quite as simple as that. There are some additional limitations to consider when selecting a heap size.</p>

<p>A part of the garbage collection time depends on the heap size. The time difference isn't that large: doubling the heap size won't double the garbage collection times, but a garbage collection of a 32 GB heap will take noticeably longer than a garbage collection of a 32 MB heap. Usually the extra overhead in garbage collection times is smaller than the gains from a lower garbage collection frequency, but if your application is very sensitive to long garbage collection pauses, you may have to keep the heap size down a bit.</p>

<p>To view the garbage collection pause times you can add -Xverbose:gcpause to the command line. This makes BEA JRockit print out pause time data for each garbage collection. Another way is to do a JRA recording, see the BEA JRockit Mission Control documentation at <a href="http://edocs.bea.com/jrockit/tools/index.html">http://edocs.bea.com/jrockit/tools/index.html</a> for more information.</p>
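<p>If you would rather read garbage collection statistics from inside the application, the standard java.lang.management API offers a rough alternative. This sketch assumes the JVM exposes GarbageCollectorMXBean instances (Java SE 5.0 and later). It only reports the number of collections and their accumulated time, so the average it prints is a coarse indicator: it does not show individual pause times the way -Xverbose:gcpause does, and for a mostly concurrent collector the accumulated time need not equal the time the application was paused.</p>

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Reports collection counts and accumulated collection time per collector.
final class GcTimes {
    // Average time per collection; 0 if the JVM reports no (or undefined) values.
    static double averageMs(long count, long totalMs) {
        return (count > 0 && totalMs >= 0) ? (double) totalMs / count : 0.0;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();  // -1 if undefined for this collector
            long totalMs = gc.getCollectionTime(); // approximate, -1 if undefined
            System.out.printf("%s: %d collections, %d ms in total, %.1f ms on average%n",
                    gc.getName(), count, totalMs, averageMs(count, totalMs));
        }
    }
}
```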

<p>The amount of physical memory in the machine is a more serious limit. If the heap size is larger than the amount of free physical memory, a part of the heap will be paged to disk (swapping). Accessing the part that is on disk becomes very slow. A garbage collection will access all parts of the heap, so if the heap is on disk a garbage collection may take a very long time (I once saw a log from a 20 minute garbage collection!).</p>

<p>So, you have 2 GB of memory in your machine and set the heap to 1.9 GB. Why does the JVM still start swapping during garbage collection? First of all, the heap isn't all the memory that the JVM requires. The JVM needs memory for Java methods, thread stacks, JNI code, the JVM itself and a lot of internal data structures. Depending on the application this could mean tens or even hundreds of megabytes. Second, the JVM is usually not the only application running on the machine. At the very least you have an operating system, which also requires memory, and sometimes you have other applications running as well. A more reasonable heap size on a 2 GB machine would thus be somewhere around 1 GB.</p>
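<p>The reasoning above amounts to a simple memory budget. The sketch below subtracts estimates for the operating system, other processes and the JVM's own non-heap memory from the physical memory. The class and method names are made up, and the numbers in main are illustrative guesses for a 2 GB machine, not measurements; you have to supply your own estimates.</p>

```java
// Rough memory budget for choosing a maximum heap size that avoids paging.
// Every input is an estimate you supply; sizes in MB.
final class HeapBudget {
    // Largest heap that should still fit in free physical memory.
    static long maxHeapMb(long physicalMb, long osMb, long otherProcessesMb, long jvmNonHeapMb) {
        return Math.max(0, physicalMb - osMb - otherProcessesMb - jvmNonHeapMb);
    }

    // True if the requested heap would likely push part of the heap out to disk.
    static boolean risksPaging(long heapMb, long physicalMb, long osMb, long otherProcessesMb, long jvmNonHeapMb) {
        return heapMb > maxHeapMb(physicalMb, osMb, otherProcessesMb, jvmNonHeapMb);
    }

    public static void main(String[] args) {
        // Illustrative estimates only, for a machine with 2 GB of RAM.
        System.out.println("Max heap without paging: " + maxHeapMb(2048, 400, 200, 200) + " MB"); // 1248 MB
        System.out.println("1.9 GB heap risks paging: " + risksPaging(1900, 2048, 400, 200, 200)); // true
    }
}
```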

<p>BEA JRockit can actually tell you if it is suffering from page faults during garbage collection. If you turn on verbose memory outputs on "debug" level (add -Xverbose:memdbg to the Java command line), you will get an output at each garbage collection telling you how many page faults there were during the garbage collection. If the number of page faults is high throughout the run you may want to reduce the heap size.</p>

<p>But what do you do if you're running in a development environment with a limited amount of memory available and want to set the heap as small as possible? Well, first of all you must decide how small "as small as possible" really is. </p>

<p>You can easily assess the absolute minimum heap requirements of your application with a test run, if you have a realistic high-load test scenario that you can simulate. Start the JVM with a large fixed heap, for example -Xms:2g -Xmx:2g if there's enough RAM in the machine, and the command line options -Xgc:parallel and -XXfullCompaction. This makes each garbage collection free as much memory as possible, even if it takes some time. Then you can monitor the heap occupancy after each garbage collection using BEA JRockit Mission Control. When you have started your Java application (running on BEA JRockit, of course), start up Mission Control and connect a Console to your application. In the Overview tab you will by default see the Java heap usage in the upper graph. The heap usage should look something like a saw-tooth pattern, rising until it reaches 100% and then dropping to a value below 100%. These "drops" are caused by garbage collection. Check the percentage at the bottom of each drop: this is how much of your Java heap is occupied by live objects. Say, for example, that 25% of a 2 GB heap is occupied after each garbage collection, which means that you have 0.5 GB of live objects. A simple rule-of-thumb minimum heap size is twice this size, 1 GB in this example.</p>
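<p>As an in-process complement to watching the graph in Mission Control, the bottom of each drop can also be read through the standard java.lang.management API. This is a sketch that assumes the JVM's heap memory pools implement MemoryPoolMXBean.getCollectionUsage(), which reports a pool's usage right after its most recent collection; pools that don't support it return null and are skipped. It measures the JVM it runs in, so you would call it from within your application. Summing over pools is only an approximation, since different pools may have been collected at different times. The class name is made up.</p>

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Estimates live data as heap usage right after the most recent collection
// and applies the "twice the live data" rule of thumb.
final class LiveDataEstimate {
    static long minimumHeapBytes(long liveBytes) {
        return 2 * liveBytes;
    }

    // Sums post-collection usage over the heap pools of the JVM running this code.
    static long liveBytesAfterLastGc() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue;                                    // skip non-heap pools
            }
            MemoryUsage afterGc = pool.getCollectionUsage(); // null if not supported
            if (afterGc != null) {
                total += afterGc.getUsed();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long live = liveBytesAfterLastGc();
        System.out.println("Live data after last GC: " + (live >> 20) + " MB");
        System.out.println("Rule-of-thumb minimum heap: " + (minimumHeapBytes(live) >> 20) + " MB");
        // The example from the text: 0.5 GB of live data gives a 1 GB minimum.
        System.out.println(minimumHeapBytes(512L << 20) >> 20); // 1024
    }
}
```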

<p>There you have the limits: twice the size of your live data up to as much as your system can tolerate without paging. Usually larger is better, but if the garbage collection pause times grow too long you may want to trim the heap size a bit. I can't give you a much better answer than that, but hopefully it will at least help you get started.</p>]]>
        
    </content>
</entry>

</feed> 