Published on dev2dev (http://dev2dev.bea.com/)
 http://dev2dev.bea.com/pub/a/2007/12/jrockit-tuning.html

Checklist/Tuning Guide for Optimizing the JRockit JVM

by Steven Pozarycki
12/13/2007

Abstract

The goal of this document is to provide information for tuning the BEA JRockit JVM using a checklist approach. A lot of territory is covered, from esoteric command-line options to iterative performance testing. I gathered most of this data while working with customers. If you have additional tips, please send them to me and I'll try to incorporate them in a later version of this document.

Specific product version information is outlined where applicable; however, general guidelines outlined here apply to most versions of JRockit. New settings and optimizations are added with every JRockit release, so check the release notes and the JRockit Product Center.

Verify the Current JRockit Environment

The first thing to do is to confirm the JRockit version being used by your runtime application server. You can do this by looking at the log file of the application server in question. Alternatively, set up your environment with the appropriate script and then run java -version to confirm the JRockit version that is running.
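A minimal sketch of the command-line check, assuming a POSIX shell and that the java on your PATH is the one the server uses. The helper name is_jrockit is hypothetical; it only looks for the string "JRockit" in the version banner and does not parse the release number.

```shell
# is_jrockit: reads a `java -version` banner on stdin and succeeds
# if the banner mentions JRockit. (Hypothetical helper name.)
is_jrockit() {
    grep -qi 'jrockit'
}

# Usage (java -version writes its banner to stderr, hence the 2>&1):
#   java -version 2>&1 | is_jrockit && echo "running JRockit"
```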

Next, gather the current JVM flags that are being used in development and/or production. For example, they could look like the following at startup:

-server -Xms1024m -Xmx1536m -Xverboselog:gc.log -Xverbose:memory
-Xgcprio:throughput

This will tell you how your current JRockit instance is configured.
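If the startup script is hard to trace, the flags can also be read from the running process. The sketch below assumes a POSIX ps and a command line whose option values contain no spaces; the helper name jvm_flags is hypothetical, and some systems truncate long command lines in ps output.

```shell
# jvm_flags: reads a full java command line on stdin and prints one
# JVM option per line (anything starting with -X or -D, plus -server).
# The classpath, main class, and program arguments are dropped.
jvm_flags() {
    tr -s ' \t' '\n\n' | grep -E '^-(X|D|server$)'
}

# Usage, where <PID> is the process id of the server instance:
#   ps -o args= -p <PID> | jvm_flags
```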

Determine the Goal of the Application

Determine what the goal of the application is. Is it "short response times" or is it "high application performance"? Depending on the goal, different garbage collection algorithms need to be set.

For example, if the goal is high application performance, confirm that the Dynamic Garbage Collector is set with -Xgcprio:throughput. If short response times are your goal, then -Xgcprio:pausetime -Xpausetarget=XXX, with XXX set to the pause time you want to stay under, would be the better choice. Review the tuning section of the JRockit documentation for further details.
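The choice can be captured in a small helper so that test scripts state the goal rather than the raw flags. This is only a sketch: the function name gc_flags is hypothetical, the flag spellings are taken from the text above, and the 200ms value in the usage line is an illustrative assumption, so check the documentation for your release for the accepted units.

```shell
# gc_flags: prints the GC priority flags for a tuning goal.
#   $1 = "throughput" or "pausetime"
#   $2 = pause target, required for "pausetime"
gc_flags() {
    case "$1" in
        throughput)
            echo "-Xgcprio:throughput" ;;
        pausetime)
            [ -n "$2" ] || { echo "pausetime needs a target" >&2; return 1; }
            echo "-Xgcprio:pausetime -Xpausetarget=$2" ;;
        *)
            echo "unknown goal: $1" >&2; return 1 ;;
    esac
}

# Usage (200ms is an example value, not a recommendation):
#   java -server -Xms1024m -Xmx1536m $(gc_flags pausetime 200ms) ...
```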

Gather Troubleshooting Data

If you have performance issues with the JVM, then the best course of action is to first gather some data for analysis. This can be done by someone on your team who is experienced in such matters, or this information can be sent to BEA Support for further analysis.

The first step is to gather a JRockit Recording (JRA) of about 10 minutes of the runtime while the problem is occurring. You can do this by using the jrcmd.sh utility or JRockit Mission Control (JRMC). See the sections below on "JRCMD/JRA during performance tests" and "JRockit Mission Control." And be sure to review the JRockit Mission Control documentation for further details. One valuable item to look at is the Latency Analysis section, which gives you insight into any latency issues (available with a license enabling this in JRockit).

The second step is to gather some verbose logging during the time of the problem. You can accomplish this by supplying the following arguments to the JVM command line when starting the server instance:

-Xverboselog:perTestGC.log
-Xverbose:opt,memory,gcpause,memdbg,compaction,gc,license
-Xverbosetimestamp -Xgcreport

This will gather valuable data for analysis into the perTestGC.log file configured above. This data can be analyzed by someone on your team and/or BEA Support.

One last point: Applications usually do not request a garbage collection themselves (that is, by calling System.gc() within the application code). If you suspect that explicit calls could be an issue, you can disable them by adding the -XXnoSystemGC flag to the Java command line when starting the server instances.
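Before reaching for -XXnoSystemGC, it can help to check whether the application code calls the collector at all. The sketch below is a plain text search over Java sources, assuming you have the source tree available; the helper name find_explicit_gc is hypothetical, and matches inside comments or third-party JARs are not distinguished.

```shell
# find_explicit_gc: lists Java source lines under directory $1 that
# call System.gc() or Runtime.getRuntime().gc().
# /dev/null is passed to grep so file names are always printed.
find_explicit_gc() {
    find "$1" -name '*.java' -type f -exec \
        grep -nE 'System\.gc\(\)|getRuntime\(\)\.gc\(\)' /dev/null {} +
}

# Usage:
#   find_explicit_gc ./src
```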

Let's now look at iterative performance testing approaches for nailing down the problems.

Iterative Performance Testing Scenarios and Approaches

Once the initial data has been gathered and analyzed, you can take an iterative approach to tuning the JVM. The tests outlined here represent a general approach to iterative tuning at the JRockit JVM layer to see which settings may be beneficial for the particular application in question. I assume that you have a way to measure the performance results; these can then be compared against a "baseline," which you should already have.
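To compare each test run against the baseline, a one-line calculation is usually enough. This sketch assumes a single numeric result per run (for example, transactions per second) and that awk is available; the helper name percent_change is hypothetical.

```shell
# percent_change: prints how much a measured value differs from the
# baseline, in percent, rounded to one decimal place.
#   $1 = baseline value, $2 = measured value
percent_change() {
    awk -v b="$1" -v m="$2" 'BEGIN { printf "%.1f\n", (m - b) * 100 / b }'
}

# Usage: a baseline of 1000 and a run that measured 1100
#   percent_change 1000 1100
```

Change one flag per run and record the flag set next to each result, so that any difference can be traced back to a single setting.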

Test 1: Thread local area size and large object size

In this test we look at the thread local area size and the large object size. They matter because if the default settings are not optimal for your application (in most cases the defaults are fine), then locks on the heap will be acquired during allocation, which hurts performance. Having most of the objects within these limits is beneficial for overall performance.

Note: To ensure that the profiles and measurements have been taken during steady-state, allow enough time for the application to warm up. One way to verify this during profiling is to look at the Optimizations tab in a JRA Recording, where the number of optimizations and the time spent optimizing before and after the profiling should be approximately (ideally, exactly) equal.

Test 2: Lock profiling

Now let's look at lock profiling, which gives you an indication of whether there is excessive locking within your application. If there is excessive locking, this will have an effect on overall performance.

-server -Xms1536m -Xmx1536m -Djrockit.lockprofiling

Test 3: Tune the tlaSize and largeObjectLimit

In this test we look at tuning the thread local area size and large object limit depending on the results from the earlier tests.

Test 4: Tune the garbage collection algorithm

The purpose of the following tests is to run with the garbage collection algorithm set to different values and observe which works best for your application. See the JRockit documentation for details on the -XXsetGC flag. For these tests, run JRockit with a tuned nursery size and remove the -Xgcprio:throughput flag. The throughput option automatically switches between two versions of the garbage collector; however, selecting one directly might give some extra performance benefit.

The goal of nursery tuning is to keep the number of promoted objects low, since promotion is the costly part of a nursery collection. You tune this by increasing or decreasing the size of the nursery. The right nursery size depends on how long objects live, since objects that are still live during a young collection (YC) get promoted.

Run jrcmd <PID> version to see which garbage collector strategy is currently active.

Test 5: Tune the garbage collection threads

The purpose of this test is to see if setting the gcthreads flag to a value will help with overall performance.

Test 6: Tune for lock contention

If there is lock contention on fat locks, spinning on them can be disabled with -XXdisableFatSpin, or you can let JRockit disable it adaptively with -Djrockit.useAdaptiveFatSpin=true. To determine whether this applies, look at the lock profiling data in JRA that was collected when -Djrockit.lockprofiling was enabled in Test 2. See the JRockit documentation for more details on locking in JRockit.

Test 7: Tune for Xeon hardware

If you are running on Xeon hardware, adding -XXallocPrefetch and -XXallocRedoPrefetch, along with the TLA and LargeObjectLimit tuning, will help decrease the cost of memory allocation. See the JRockit documentation for more details on the allocPrefetch flag. To get the best possible result, you may want to disable hardware prefetching in the BIOS. How this is done depends on the brand of BIOS, but the parameters typically have names such as "Hardware Prefetcher," "Adjacent Sector Prefetcher," "Adjacent Cache Line Prefetcher," or something similar. Intel also publishes information on this subject.

Test 8: Put the heap into largePages

This will lock the heap into memory so it does not get swapped out by the operating system. See the JRockit documentation on the largePages flag, and also the section "Configuring -XXlargePages on Linux" below for details on configuring the Linux operating system side. In JRockit R27 versions, this option is called -XlargePages. Depending on earlier results, tuning the -XlargePages flag may or may not help. Run a test with this flag and observe the results to see if this helps overall performance.

Test 9: Test with the -XXaggressive flag

This is a collection of configurations that make the JVM perform at high speed and reach a stable state as soon as possible. To achieve this goal, the JVM uses more internal resources at startup; however, it requires less adaptive optimization once the goal is reached. We recommend that you use this option for long-running, memory-intensive applications that work alone. See the JRockit documentation for more details on the aggressive flag. Run a test with the -XXaggressive flag and observe the results to see if this helps overall performance.

Test 10: Testing with the -XX:+UseNewHashFunction flag

This option enables a new, faster hash function for HashMap that was introduced in Java 5.0 Update 8 and is part of BEA JRockit as of R27.1.0. This hash function can improve performance through improved hash spread, but it changes the order in which elements are stored in the HashMap. See the JRockit documentation for more details on the UseNewHashFunction flag. Run a test with the new -XX:+UseNewHashFunction flag and observe the results to see if this helps overall performance.

Test 11: Minimize dark matter

"Dark matter" is wasted heap memory and fragments the heap. Look into minimizing dark matter so overall throughput is not affected when the heap needs to be compacted. Review the following options:

Further testing: Tune the application server layer

Finally, look into tuning the WebLogic Server instance layer by reviewing the tuning recommendations in the WebLogic Server documentation.

Summary

The information supplied in this article is by no means a complete list. However, it will get you started on the road to tuning and understanding the JRockit JVM layer better!

Appendix

The following sections have additional information, which is referenced from the above checklist.

JRCMD/JRA During Performance Tests

A JRA Recording can be done either by using the command line or by using the JRockit Mission Control (JRMC) Eclipse-based tool. JRMC allows you to connect to multiple JRockit JVMs and gather JRA recordings, look at real-time data of the JVM, detect and troubleshoot memory leaks, and also look at latency ("spots" where execution is slow) within the application. See the section below on running JRockit Mission Control.

  1. Download a development license.
  2. Add this "license.bea" to the <JROCKIT_HOME>/jre directory so the full path looks like this: <JROCKIT_HOME>/jre/license.bea
  3. Run JRCMD like the following: jrcmd.sh <PID> jrarecording filename=myrecording.xml time=600 (the file is written to the local directory or to the full path/filename specified) or jrcmd.sh <PID> print_threads.
  4. Use the offline JRA tool to analyze the resulting myrecording.xml.zip file. To start the tool, execute the JRA binary located under <JROCKIT_HOME>/bin.
  5. Review the JRCMD documentation.
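When recordings are taken for several test runs, it helps to build the JRCMD line the same way each time. The sketch below only assembles the command shown in the steps above; the helper name jra_command is hypothetical, and the 600-second default mirrors the 10-minute recording suggested earlier.

```shell
# jra_command: prints the jrcmd.sh invocation for a JRA recording.
#   $1 = JVM process id
#   $2 = output file name
#   $3 = recording time in seconds (defaults to 600)
jra_command() {
    echo "jrcmd.sh $1 jrarecording filename=$2 time=${3:-600}"
}

# Usage: print the command for test run 4, then run it by hand
#   jra_command <PID> test4-recording.xml
```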

JRockit Mission Control

  1. Add the following to the start line of the WebLogic instance using JRockit: java -Xmanagement:autodiscovery=true,ssl=false,authenticate=false,port=7091
  2. Start JRockit Mission Control by running: <jrockit-install-directory>/bin/jrmc.exe(sh)
  3. If necessary, add the license.bea file for JRockit Mission Control to: <JROCKIT_HOME>/jre/license.bea
  4. With Mission Control, JRA recordings, memory leaks, latency investigation, and monitoring can all be done from one place.

Snapshots of the Heap/Threads During Tests

  1. Create a simple text file called ctrlhandler.act and put it in the <JROCKIT_HOME>/jre/bin/jrockit directory.
  2. When a thread dump command is performed (for example, kill -3), JRockit will look at this file and execute the list of commands.
  3. The ctrlhandler.act file should contain the following information:
#ctrlhandler.act file located in the <JROCKIT_HOME>/jre/bin/jrockit directory
set_filename filename=./jrocket_control_breakoutput.txt append=true
timestamp
print_threads
timestamp
version
print_class_summary
print_object_summary increaseonly=true
print_threads
print_threads nativestack=true
print_utf8pool
timestamp
print_memusage
timestamp
heap_diagnostics
timestamp
# The following is optional and is another way to generate a JRA recording
jrarecording filename=./myjra.xml time=600
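The setup can be scripted so that every test machine gets the same handler file. This is a sketch under assumptions: the helper name write_ctrlhandler and the output file name jrockit_snapshot.txt are hypothetical, and only a few of the commands from the listing above are included to keep the dump short.

```shell
# write_ctrlhandler: writes a minimal ctrlhandler.act into directory $1
# (normally <JROCKIT_HOME>/jre/bin/jrockit).
write_ctrlhandler() {
    cat > "$1/ctrlhandler.act" <<'EOF'
set_filename filename=./jrockit_snapshot.txt append=true
timestamp
print_threads
print_memusage
timestamp
EOF
}

# Usage: install the file, then trigger it with a thread dump signal
#   write_ctrlhandler "$JROCKIT_HOME/jre/bin/jrockit"
#   kill -3 <PID>
```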

Configuring -XXlargePages on Linux

Question: Why use largePages?

Answer: The advantage of using largePages is that the heap memory is locked and is not eligible for paging out to swap (IOWait and GC time can be reduced as well). Accessing objects in a heap that stays in physical memory is faster. Therefore, the largePages option is a good choice for achieving performance goals.

  1. If the machine supports large pages, the output of cat /proc/meminfo will have output like:
    HugePages_Total: xxx
    HugePages_Free:  yyy
    Hugepagesize:    zzz KB
    
    If xxx is 0, no large pages are allocated.
  2. If it does not, then the Linux kernel needs to be built with the CONFIG_HUGETLBFS (present under "File systems") and CONFIG_HUGETLB_PAGE (selected automatically when CONFIG_HUGETLBFS is selected) configuration options.
  3. Next, allocate large pages on Linux. Note: Only root is allowed to allocate large pages.
    1. Mount the file system. JRockit uses the hugepages file system, a file system in memory. Mounting the file system is done in these steps. The actual mount and chmod commands have to be run every time the machine is restarted, or you can add them to an /etc/rc.d/rc.local type of file:
      	mkdir -p  /mnt/hugepages
      	mount -t hugetlbfs nodev /mnt/hugepages
      	chmod 777 /mnt/hugepages
      	
    2. Allocate the huge pages. This is done dynamically by specifying how much of the memory should be allocated. At allocation time, the pages are reserved and cannot be used as normal pages. Allocate and deallocate them like this:
      echo 20 > /proc/sys/vm/nr_hugepages

The number 20 is how many pages should be reserved for this. To deallocate them, allocate 0 pages. (NOTE: See the Q&A below to determine the correct number.)

If not all pages requested are reserved, there isn't enough free memory. If there should be, the memory is probably too fragmented and the recommendation is then to restart the machine. Note that large pages cannot be swapped, so everything has to stay in the physical memory.
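Whether the request was fully honored can be checked by reading the counters back from /proc/meminfo. The sketch below assumes the field names shown in step 1; the helper name hugepages_field is hypothetical.

```shell
# hugepages_field: reads /proc/meminfo-style text on stdin and prints
# the numeric value of the field named in $1 (e.g. HugePages_Total).
hugepages_field() {
    awk -v f="$1:" '$1 == f { print $2 }'
}

# Usage after `echo 20 > /proc/sys/vm/nr_hugepages`:
#   reserved=$(hugepages_field HugePages_Total < /proc/meminfo)
#   [ "$reserved" -eq 20 ] || echo "only $reserved of 20 pages reserved"
```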

On RHEL3, that file seems to be called /proc/sys/vm/hugetlb_pool, so the command is the following instead:

echo 500 > /proc/sys/vm/hugetlb_pool

Here the number 500 says how many MB are requested, not the number of pages. If JRockit was unable to remove the temporary huge page file directly, don't forget to do it after the execution. Otherwise, those pages will be unavailable until they are released. This works with RedHat kernel build 2.4.18-e.25.smp but not 2.4.18-e.12.smp.

Question: How do I determine the correct number to send to /proc/sys/vm/nr_hugepages?

Answer: The goal is to put the entire Java heap into largePages, so the number depends on how large the heap is and what the page size is: divide the maximum heap size by the page size. For example, given the following information:

JVM Max Heap = 1536 MB (1572864 KB)
HPAGE_SIZE = 2 MB (2048 KB)

then the value sent to /proc/sys/vm/nr_hugepages would be 768 (1572864 KB / 2048 KB).
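The same arithmetic as a small shell function, rounding up so that a heap that is not an exact multiple of the page size still fits. The helper name hugepages_needed is hypothetical, and the sketch covers the Java heap only. For a 1536 MB heap and 2048 KB pages it yields 768 pages.

```shell
# hugepages_needed: number of large pages needed to hold a heap.
#   $1 = maximum heap size in MB (the -Xmx value)
#   $2 = huge page size in KB (Hugepagesize in /proc/meminfo)
# Uses ceiling division so a partial page counts as a whole page.
hugepages_needed() {
    heap_kb=$(( $1 * 1024 ))
    echo $(( (heap_kb + $2 - 1) / $2 ))
}

# Usage (as root), for a 1536 MB heap and 2048 KB pages:
#   echo "$(hugepages_needed 1536 2048)" > /proc/sys/vm/nr_hugepages
```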

Notes:

Even though it is theoretically possible to dynamically set the number of large pages, it isn't so dynamic in practice. Decreasing the number of large pages is never a problem, but increasing the number of large pages is. The reason is that to create a large page, Linux needs to find a large enough contiguous region of memory. If it can't find one, it can't create a large page.

Right after boot, memory isn't very fragmented, so finding a large enough region isn't a problem. However, the longer the machine runs, the more fragmented the memory becomes. So, if you want to be sure you can allocate enough large pages, you will have to set up the pages right after boot, either in some startup script or manually.

Steven Pozarycki is a principal systems engineer at BEA Systems where he helps architect solutions, troubleshoot and solve complex customer issues with their mission-critical applications on BEA products.

