Published on dev2dev (http://dev2dev.bea.com/)
http://dev2dev.bea.com/pub/a/2007/12/jrockit-tuning.html
by Steven Pozarycki
12/13/2007
The goal of this document is to provide information for tuning the BEA JRockit JVM using a checklist approach. A lot of territory is covered, from esoteric command-line options to iterative performance testing. I gathered most of this material while working with customers. If you have additional tips, please send them to me and I'll try to incorporate them in a later version of this document.
Specific product version information is outlined where applicable; however, general guidelines outlined here apply to most versions of JRockit. New settings and optimizations are added with every JRockit release, so check the release notes and the JRockit Product Center.
The first thing to do is to confirm the JRockit version being used by your runtime application server. You can do this by looking at the log file of the application server in question. You can also set up your environment with the appropriate script and then run java -version to confirm the JRockit version that is running.
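As a minimal sketch of the command-line check, the shell function below reads `java -version` style output on standard input and succeeds only if it mentions JRockit. The function name `is_jrockit` is hypothetical, and matching on the word "JRockit" assumes the version banner names the JVM, which you should confirm against your own output.

```shell
# Hypothetical helper: succeed if the version banner on stdin mentions JRockit.
# Assumption: the JVM's version banner contains the word "JRockit".
is_jrockit() {
    grep -qi 'jrockit'
}

# java -version writes its banner to stderr, so redirect it into the pipe:
#   java -version 2>&1 | is_jrockit && echo "running JRockit"
```

Reading from standard input keeps the check usable on saved server logs as well as on live output.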
Next, gather the current JVM flags that are being used in development and/or production. For example, they could look like the following at startup:
-server -Xms1024m -Xmx1536m -Xverboselog:gc.log -Xverbose:memory -Xgcprio:throughput
This will tell you how your current JRockit instance is configured.
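If the flags have to be pulled out of a long startup line, a small filter can help. The sketch below is only an illustration: the function name `jvm_flags` is hypothetical, and it assumes options are separated by spaces and begin with -X, -D, -server, or -client. The main class name in the example is just a placeholder.

```shell
# Hypothetical helper: print the JVM options found in a command line, one per line.
# Assumption: options are space-separated and start with -X, -D, -server or -client.
jvm_flags() {
    printf '%s\n' "$1" | tr ' ' '\n' | grep -E '^-(X|D|server|client)'
}

# Example using the startup flags shown above (the main class is a placeholder):
jvm_flags "java -server -Xms1024m -Xmx1536m -Xverboselog:gc.log -Xverbose:memory -Xgcprio:throughput example.Main"
```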
Determine what the goal of the application is. Is it "short response times" or is it "high application performance"? Depending on the goal, different garbage collection algorithms need to be set.
For example, if the goal is high application performance, confirm that the dynamic garbage collector is set with -Xgcprio:throughput. If short response times are your goal, then -Xgcprio:pausetime -Xpausetarget=XXX, with XXX replaced by your target pause time, would be the better choice. Review the tuning section of the JRockit documentation for further details.
If you have performance issues with the JVM, then the best course of action is to first gather some data for analysis. This can be done by someone on your team who is experienced in such matters, or this information can be sent to BEA Support for further analysis.
The first step is to gather a JRockit Runtime Analyzer (JRA) recording of about 10 minutes of the runtime while the problem is occurring. You can do this by using the jrcmd.sh utility or JRockit Mission Control (JRMC). See the sections below on "JRCMD/JRA during performance tests" and "JRockit Mission Control," and be sure to review the JRockit Mission Control documentation for further details. One valuable item to look at is the Latency Analysis section, which gives you insight into any latency issues (available with a license enabling this in JRockit).
The second step is to gather some verbose logging during the time of the problem. You can accomplish this by supplying the following arguments to the JVM command line when starting the server instance:
-Xverboselog:perTestGC.log -Xverbose:opt,memory,gcpause,memdbg,compaction,gc,license -Xverbosetimestamp -Xgcreport
This writes valuable data for analysis to the perTestGC.log file configured above. This data can be analyzed by someone on your team and/or BEA Support.
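One way to apply these diagnostic flags without editing the java command itself is to collect them in an environment variable. This is only a sketch: `GC_DIAG_FLAGS` is a hypothetical name, and it assumes your server start script appends `JAVA_OPTIONS` to the java command line, which you should verify for your installation.

```shell
# Diagnostic flags from the text, collected in one variable (hypothetical name).
GC_DIAG_FLAGS="-Xverboselog:perTestGC.log -Xverbose:opt,memory,gcpause,memdbg,compaction,gc,license -Xverbosetimestamp -Xgcreport"

# Assumption: the server start script appends JAVA_OPTIONS to the java command line.
# Any options already present are kept and the diagnostic flags are added after them.
JAVA_OPTIONS="${JAVA_OPTIONS:+$JAVA_OPTIONS }$GC_DIAG_FLAGS"
export JAVA_OPTIONS
echo "$JAVA_OPTIONS"
```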
One last point: Applications usually do not request a garbage collection themselves (that is, by calling System.gc() within the application code). If you suspect that such calls could be an issue, you can disable them by supplying the -XXnoSystemGC flag on the Java command line when starting the server instances.
Let's now look at iterative performance testing approaches for nailing down the problems.
Once the initial data has been gathered and analyzed, you can take an iterative approach to tuning the JVM. The tests outlined here represent a general approach to iterative tuning at the JRockit JVM layer to see which settings may be beneficial for the particular application in question. I assume that you have a way to measure the performance results; these can then be compared against a "baseline," which you should already have.
In this test (Test 1), we look at the thread local area (TLA) size. It's important because, if the default settings for these flags are not optimal for your application (in most cases they are), locks on the heap will be acquired, which affects performance. Having most of the objects within these limits is beneficial for overall performance.
Check whether -XXtlaSize and -XXlargeObjectLimit need to be tuned (keeping in mind that, according to eDocs, for most applications the thread local area size should be at least twice the large object limit). The current values are listed on the first page of the JRA recording, on the upper-right side. Review the documentation on tlaSize and largeObjectLimit. From JRockit R27.3 onward, these will not have to be tuned in most cases.
Now let's look at lock profiling (Test 2), which gives you an indication of whether there is excessive locking within your application. If there is excessive locking, this will have an effect on overall performance.
Run a test with -Djrockit.lockprofiling enabled and analyze the results. Make sure no logging is enabled from the JVM. This flag adds an overhead of around 5 to 10 percent, so collect this data in a separate test run in which performance is ignored and lock analysis is the only analysis done. For example:
-server -Xms1536m -Xmx1536m -Djrockit.lockprofiling
In this test we look at tuning the thread local area size and large object limit (tlaSize and largeObjectLimit), depending on the results from the earlier tests.
Setting -XXtlaSize and -XXlargeObjectLimit to higher values may help. However, this would need to be verified and compared in a long-duration test run. Setting the preferredSize to 16k for R27.2 may help; review the detailed documentation on this issue. To do this, rerun a test with the same Java command-line options as Test 1, adding -XXtlaSize and -XXlargeObjectLimit with the values you want to try. Review the documentation on tlaSize.
Note: From R27.3 onward, tuning these flags is generally not needed to improve performance. In fact, tuning these flags too much could have an adverse effect.
The purpose of the following set of tests is to run with the garbage collection algorithm set to different values and observe the results to see what works best for your application. Review the documentation for details on the -XXsetGC flag. In these tests, JRockit runs with a hand-tuned nursery size and without the -Xgcprio:throughput flag. The throughput option automatically switches between these two versions of the garbage collector (singleparpar and genparpar, used below); however, selecting them directly might give some extra performance benefits. The goal of nursery tuning is to keep the number of promoted objects low, since promotion is the costly part of a nursery collection. You tune this by increasing or decreasing the size of the nursery. The right nursery size basically depends on how long objects live, since objects that are still live during a young collection (YC) get promoted. Run jrcmd <PID> version to see which garbage collector strategy is currently active.
Set the garbage collector with -XXsetGC. This sets the garbage collection option to single-spaced with a parallel mark algorithm and a parallel sweep algorithm; the nursery size is also tuned by hand:
-server -Xms1536m -Xmx1536m -Xns:384m -Xverboselog:perTestGC.log -Xverbose:opt,memory,gcpause,memdbg,compaction,gc,license -Xverbosetimestamp -Xgcreport -XXnoSystemGC -XXsetGC:singleparpar
Run the same test with -XXsetGC:genparpar:
-server -Xms1536m -Xmx1536m -Xns:384m -Xverboselog:perTestGC.log -Xverbose:opt,memory,gcpause,memdbg,compaction,gc,license -Xverbosetimestamp -Xgcreport -XXnoSystemGC -XXsetGC:genparpar
Run a test with -Xgc:gencon -Xns50m (and the logging set to gather metrics).
Run a test with -Xgc:parallel -XXcompactratio:1 (and the logging set to gather metrics).
The purpose of this test is to see if setting the gcthreads flag to a specific value will help with overall performance.
Set the -XXgcThreads flag to the actual number of physical CPUs and rerun the tests. (This should tune itself automatically since, by default, the value is based on the number of cores and hardware threads on the machine.) You can verify this by looking at the verbose output log file you gathered in the "Gather Troubleshooting Data" section. See the documentation for more details on the gcThreads flag.
If there is lock contention on fat locks, spinning on them can be disabled by using -XXdisableFatSpin or by letting JRockit adaptively disable it with -Djrockit.useAdaptiveFatSpin=true. You can determine whether this applies by looking at the relevant tab in JRA from the run in which -Djrockit.lockprofiling was enabled in Test 2. Review the documentation for more details on locking in JRockit.
If you are running on Xeon hardware, adding -XXallocPrefetch and -XXallocRedoPrefetch to decrease the cost of memory allocation, along with the TLA and largeObjectLimit settings, will help. See the documentation for more details on the allocPrefetch flag. To get the best possible result, you may want to disable hardware prefetching in the BIOS. How this is done depends on the brand of BIOS, but the parameters typically have names such as "Hardware Prefetcher," "Adjacent Sector Prefetcher," "Adjacent Cache Line Prefetcher," or something similar. Review Intel's information on this subject.
The largePages flag locks the heap into memory so it does not get swapped out by the operating system. Review the documentation on the largePages flag, and also look under the section "Configuring -XXlargePages on Linux" for more details on configuring the Linux operating system side. In JRockit R27 versions, this option is called -XlargePages. Depending on earlier results, tuning the -XlargePages flag may or may not help. Run a test with this flag and observe the results to see if this helps overall performance.
The -XXaggressive flag is a collection of configurations that make the JVM perform at a high speed and reach a stable state as soon as possible. To achieve this goal, the JVM uses more internal resources at startup; however, it requires less adaptive optimization once the goal is reached. We recommend that you use this option for long-running, memory-intensive applications that work alone. Review the documentation for more details on the aggressive flag. Run a test with the -XXaggressive flag and observe the results to see if this helps overall performance.
The -XX:+UseNewHashFunction flag enables a new, faster hash function for HashMap that was introduced in Java 5.0 Update 8 and is part of BEA JRockit as of R27.1.0. This hash function can improve performance through improved hash spread but changes the order in which elements are stored in the HashMap. Review the documentation for more details on the UseNewHashFunction flag. Run a test with the new -XX:+UseNewHashFunction flag and observe the results to see if this helps overall performance.
"Dark matter" is wasted heap memory and fragments the heap. Look into minimizing dark matter so overall throughput is not affected when the heap needs to be compacted. Review the following options:
Use a generational garbage collector (-Xgc:gencon or -Xgc:genpar). During a young collection (nursery garbage collection), the objects found live in the nursery are moved to the old generation. This has the positive side effect of compacting the objects while they are moved.
Increase compaction (-XXcompactionRatio=nn). Compaction reduces the dark matter by moving objects into compact chunks, removing the dark matter between them.
Lower the minimum block size with -XXminBlockSize:<memSize>. Blocks on the heap that are smaller than the minimum block size count as dark matter. Therefore, by lowering the minimum block size, you end up with less dark matter. Beware, however, that garbage collections will take longer because JRockit must do more fine-grained searches for free heap space. By default, the minimum block size is 2 KB.
Finally, look into tuning the WebLogic Server instance layer by reviewing the tuning recommendations in the WebLogic Server documentation.
The information supplied in this article is by no means a complete list. However, it will get you started on the road to tuning and understanding the JRockit JVM layer better!
The following sections have additional information, which is referenced from the above checklist.
A JRA recording can be made either from the command line or with the Eclipse-based JRockit Mission Control (JRMC) tool. JRMC allows you to connect to multiple JRockit JVMs and gather JRA recordings, look at real-time data of the JVM, detect and troubleshoot memory leaks, and also look at latency ("spots" where execution is slow) within the application. See the section below on running JRockit Mission Control.
To make a recording from the command line:
Place the license file in the <JROCKIT_HOME>/jre directory so the full path looks like this: <JROCKIT_HOME>/jre/license.bea
Run jrcmd.sh <PID> jrarecording filename=myrecording.xml time=600 (the file is written to the local directory or the full path/filename specified) or jrcmd.sh <PID> print_threads.
Open the resulting myrecording.xml.zip file with the JRA tool. The JRA tool is located under <JROCKIT_HOME>/bin and is started by executing the JRA binary in that directory.
To use JRockit Mission Control:
Start the JVM with the -Xmanagement option:
java -Xmanagement:autodiscovery=true,ssl=false,authenticate=false,port=7091
Start JRMC:
<jrockit-install-directory>/bin/jrmc.exe(sh)
A ctrlhandler.act file can also be placed in the <JROCKIT_HOME>/jre/bin/jrockit directory. For example:
# ctrlhandler.act file located in the <JROCKIT_HOME>/jre/bin/jrockit directory
set_filename filename=./jrocket_control_breakoutput.txt append=true
timestamp
print_threads
timestamp
version
print_class_summary
print_object_summary increaseonly=true
print_threads
print_threads nativestack=true
print_utf8pool
timestamp
print_memusage
timestamp
heap_diagnostics
timestamp
# The following is optional and is another way to generate a JRA recording
jrarecording filename=./myjra.xml time=600
Configuring -XXlargePages on Linux
Question: Why use largePages?
Answer: The advantage of using largePages is that the heap memory is locked and is not eligible for paging out to swap (IOWait and GC time can be reduced as well). Access to heap objects that always stay in physical memory is faster. Therefore, the largePages option is a good choice for achieving performance goals.
To see the current large page settings, look at the following lines in /proc/meminfo:
HugePages_Total: xxx
HugePages_Free: yyy
Hugepagesize: zzz KB
If xxx is 0, no large pages are allocated.
mkdir -p /mnt/hugepages
mount -t hugetlbfs nodev /mnt/hugepages
chmod 777 /mnt/hugepages
echo 20 > /proc/sys/vm/nr_hugepages
The number 20 is how many pages should be reserved for this. To deallocate them, allocate 0 pages. (NOTE: See the Q&A below to determine the correct number.)
If not all pages requested are reserved, there isn't enough free memory. If there should be, the memory is probably too fragmented and the recommendation is then to restart the machine. Note that large pages cannot be swapped, so everything has to stay in the physical memory.
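To check whether all requested pages were actually reserved, the reserved count can be compared with the requested one. The following is a minimal sketch, not a definitive tool: the function name `hugepages_reserved` is hypothetical, and the optional second argument (a meminfo-style file) exists only so the check can be tried on a saved copy instead of the live /proc/meminfo.

```shell
# Hypothetical helper: compare reserved huge pages against the requested count.
# $1 = pages requested, $2 = meminfo-style file (defaults to /proc/meminfo).
hugepages_reserved() {
    requested=$1
    file=${2:-/proc/meminfo}
    # HugePages_Total is the number of huge pages currently reserved.
    total=$(awk '/^HugePages_Total:/ { print $2 }' "$file")
    if [ "${total:-0}" -ge "$requested" ]; then
        echo "ok: $total pages reserved"
    else
        echo "short: ${total:-0} of $requested pages reserved"
        return 1
    fi
}

# Usage on a Linux host after writing to /proc/sys/vm/nr_hugepages:
#   hugepages_reserved 20
```

A non-zero return status makes the function easy to use in a startup script that should stop when the reservation falls short.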
On RHEL3, that file seems to be called /proc/sys/vm/hugetlb_pool, so the command is the following instead:
echo 500 > /proc/sys/vm/hugetlb_pool
Here the number 500 says how many MB are requested, not the number of pages. If JRockit was unable to remove the temporary huge page file directly, don't forget to do it after the execution. Otherwise, those pages will be unavailable until they are released. This works with RedHat kernel build 2.4.18-e.25.smp but not 2.4.18-e.12.smp.
Question: How do I determine the correct number to send to /proc/sys/vm/nr_hugepages?
Answer: The goal is to put the entire Java heap into large pages, so the number depends on how large the heap is and what the page size is. For example, given the following information:
JVM Max Heap = 1536 MB (1572864 KB)
HPAGE_SIZE = 2 MB (2048 KB)
the value sent to /proc/sys/vm/nr_hugepages would be approximately 768 (1536 MB / 2 MB).
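The division can be sketched as a small shell function. The name `hugepages_needed` is hypothetical, and rounding up is an assumption made here so that a heap that is not an exact multiple of the page size still fits; it covers the Java heap only.

```shell
# Hypothetical helper: number of huge pages needed to hold a Java heap.
# $1 = maximum heap size in MB, $2 = huge page size in MB.
hugepages_needed() {
    # Integer division rounded up, so a partial page counts as a full one.
    echo $(( ($1 + $2 - 1) / $2 ))
}

hugepages_needed 1536 2   # prints 768
```

The result is the value to write to /proc/sys/vm/nr_hugepages for a 1536 MB heap with 2 MB pages.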
Notes:
Even though it is theoretically possible to dynamically set the number of large pages, it isn't so dynamic in practice. Decreasing the number of large pages is never a problem, but increasing the number of large pages is. The reason is that to create a large page, Linux needs to find a large enough contiguous region of memory. If it can't find one, it can't create a large page.
Right after boot, memory isn't very fragmented, so finding a large enough region isn't a problem. However, the longer the machine runs, the more fragmented the memory becomes. So, if you want to be sure you can allocate enough large pages, you will have to set up the pages right after boot, either in some startup script or manually.
Steven Pozarycki is a principal systems engineer at BEA Systems where he helps architect solutions, troubleshoot and solve complex customer issues with their mission-critical applications on BEA products.