proftool
is a utility for capturing and examining profiles of Java
programs. It's intended to provide machine level details about the execution to
aid with JIT performance analysis. It currently consists of three primary components:
- a JVMTI agent for capturing all the assembly generated by the JVM.
- a parser for Linux perf output
- a parser for HotSpot LogCompilation information
By combining these components into a single command line, the perf profile information can be attributed to the JIT code. Profile collection currently only supports Linux perf though once the data is captured then the profile can be viewed anywhere.
This suite must be built first to create the JVMTI agent library and the suite is normally dynamically imported to make the commands available through mx.
There are currently 2 ways to capture profiles. The direct way is to use the profrecord
command
which takes a full Java command line as an argument and launches it with extra arguments
to collect the data.
$ mx profrecord -E fop /home/graal/oraclejdk1.8.0_291-jvmci-21.1-b03/bin/java -jar dacapo.jar fop -n 56
The required argument -E <name>
specifies the directory that will contain the output the files. It's an error if it
already exists but the -O
will overwrite the directory if it already exists.
mx profrecord
should work with any Java command line you can supply.
The second way is to use the proftool
profiler with the mx benchmark
command. The proftool
profiler
conflicts with the rss
tracker which is enabled by default, so it currently must be explicitly turned off. For example, running:
$ mx benchmark dacapo:fop --tracker none -- --profiler proftool
produces a uniquely named directory like proftool_fop_2021-07-11_181530
which contains a profile of the full
benchmark run. Note that the --tracker
and --profiler
arguments are on different sides of the --
separator.
After capturing the profile, the perf binary output needs to be converted to text
using the profpackage
command because the current benchmark profiler infrastructure can't do that automatically.
$ mx profpackage proftool_fop_2021-07-11_181530
Created /home/graal/ws/graal/compiler/proftool_fop_2021-07-11_181530.zip
This command additionally packs everything into a zip to make it easy to move profiles around,
though that step can be skipped with the -n
option. Both the zip and directory form can be used directly
by all the proftool
commands so use whichever form is most convenient. Packaging as a zip it intended to
simplify capturing the profile on a Linux machine then moving it to another machine for analysis.
Not all benchmark suites actually support the --profiler
option even though it's broadly advertised in the
help output. At the current time, only the dacapo
, scala-dacapo
, renaissance
and renaissance-legacy
suites fully support the --profiler
option. The JMH
benchmarks only correctly support it when the
the JMH option -f 0
is used to suppress forking by the harness. Note that this changes the way the benchmark
is run and might not produce the same results. With JMH
you can always fall back to the JMH
perfasm
profiler which is sufficient for most uses.
For other benchmarks, you can always use mx profrecord
directly by capturing the command
line used to launch the benchmark with mx -v benchmark ...
and then passing that to mx profrecord
. For example,
for the fop benchmark above:
$ mx -v benchmark dacapo:fop --tracker none
displays a very long command line in the middle of all the output that includes something like the follow:
/home/tkrodrig/ws/unroll/graal/compiler/mxbuild/linux-amd64/graaljdks/jdk1.8-cmp/bin/java -server ... \
-jar /home/graal/.mx/cache/DACAPO_e39957904b7e79caf4fa54f30e8e4ee74d4e9e37/dacapo.jar fop -n 56
For compactness, the sample output above doesn't include all required arguments.
Copy and paste that line to use it with mx profrecord
:
$ mx profrecord -E fop /home/tkrodrig/ws/unroll/graal/compiler/mxbuild/linux-amd64/graaljdks/jdk1.8-cmp/bin/java -server ... \
-jar /home/graal/.mx/cache/DACAPO_e39957904b7e79caf4fa54f30e8e4ee74d4e9e37/dacapo.jar fop -n 56
The primary command for examining profiles is profhot
. This gives a summary
of the top C functions and the top JIT methods. The JIT methods are additionally disassembled and
annotated with the profile information. Truncated sample output looks like this:
$ mx profhot proftool_fop_2021-07-11_181530.zip
Hot C functions:
Percent Name
5.39% SpinPause
2.07% ParMarkBitMap::mark_obj
1.98% ParallelCompactData::add_obj
0.95% GreyObjectsWalker_walkGreyObjects_e59a80a81022647b793f2f995d251cf18cbcbab4
0.79% ControlFlowGraph_identifyBlocks_91c6bd2376323c10cda627f1ed5130f90d88a82f
0.69% EconomicMapImpl_compareKeys_139d60c35d64f232b5b9fac87391585f45801e09
0.67% ArrayList_add_43f6399c3b46d50c69e7ebaee5e44ba3b6a870f3
0.58% ControlFlowGraph_identifyBlock_7f27028f31036b771721050156eba6ce33899dfa
0.55% SchedulePhase$Instance_scheduleEarliestIterative_f11f1d84d698be305ec40ea50fa995f7185d393f
0.54% EconomicMapImpl_find_70589f86ac2abbc40566645a1af83af4aa5c89f0
Hot generated code:
Percent Compile Id Name
1.97% 2557 sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa(int, long, int, boolean)
1.86% 3101 org.apache.fop.render.ps.PSRenderer.renderText(org.apache.fop.area.inline.AbstractTextArea, java.lang.String, int[])
1.27% Interpreter
1.12% 3353 org.apache.fop.fo.FObj.addChildNode(org.apache.fop.fo.FONode)
0.75% 3112 java.text.DigitList.set(boolean, double, int, boolean)
0.63% 1357 org.apache.fop.fo.properties.PropertyMaker.get(int, org.apache.fop.fo.PropertyList, boolean, boolean)
0.61% 3145 org.apache.fop.layoutmgr.inline.TextLayoutManager.getNextKnuthElements(org.apache.fop.layoutmgr.LayoutContext, int)
0.51% 3110 java.text.DecimalFormat.subformat(java.lang.StringBuffer, java.text.Format$FieldDelegate, boolean, boolean, int, int, int, int)
0.34% 2787 org.apache.fop.layoutmgr.inline.TextLayoutManager.addAreas(org.apache.fop.layoutmgr.PositionIterator, org.apache.fop.layoutmgr.LayoutContext)
0.32% 2738 org.apache.fop.layoutmgr.inline.TextLayoutManager.createTextArea(org.apache.fop.traits.MinOptMax, int, org.apache.fop.layoutmgr.LayoutContext, int, int, int, boolean)
sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa(int, long, int, boolean)
0x7f7a69251e80-0x7f7a69252d38 (samples=613, period=2591391597)
Hot region 1
0x7f7a69252082: sub ecx, edi ; sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa:252
0x7f7a69252084: mov edx, ecx ; sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa:287
0x7f7a69252086: add edx, r13d
0x7f7a69252089: add edx, dword ptr [rax + rsi*4 + 0x10] ; sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa:307
0.16% 0x7f7a6925208d: cmp edx, 0x40 ; sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa:330
0x7f7a69252090: jge 0x7f7a69252a62
0x7f7a69252096: sub r10d, r14d ; sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa:372
0x7f7a69252099: sub r10d, edi ; sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa:259
0x7f7a6925209c: mov edi, r10d ; sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa:294
0x7f7a6925209f: add edi, dword ptr [rax + rbx*4 + 0x10]
0x7f7a692520a3: inc edi
0x7f7a692520a5: cmp edi, 0x40
0x7f7a692520a8: jge 0x7f7a69252c6f
...
Every piece of generated code is disassembled and broken into hot regions where ticks are clustered. This gives a general overview of the hot parts of the execution.
Proftool provides limited extra support for Truffle programs at the moment. The only extra information printed is the Truffle provided name of the generated nmethod when printing the nmethod names. This results in summary output like this for the octane splay benchmark:
Hot generated code:
Percent Name
3.04% 10260: "MeasureDefault#2" OptimizedCallTarget.profiledPERoot(Object[])
1.79% 5934: "GeneratePayloadTree#2" OptimizedCallTarget.profiledPERoot(Object[])
0.74% 10118: "SplayTree.remove#2" OptimizedCallTarget.profiledPERoot(Object[])
0.65% Interpreter
0.18% 7976: "SplayRun#2" OptimizedCallTarget.profiledPERoot(Object[])
0.16% 11695: FastDtoa.digitGen(DiyFp, DiyFp, DiyFp, DtoaBuffer, int)
0.13% 8914: "SplayTree.splay_#2" OptimizedCallTarget.profiledPERoot(Object[])
0.07% 1345: OptimizedCallTarget.callBoundary(Object[])
0.07% jshort_disjoint_arraycopy
0.07% 7877: "SplayTree.splay_#1" OptimizedCallTarget.profiledPERoot(Object[])
OptimizedCallTarget is the Java method that is used by the partial evaluator when compiling Truffle methods and the name in quotes should correspond to the compiled JavaScript function. More support is planned for future versions of prftool.
Other options to profhot give more control over how this is printed. It's possible to strip package names from the output
using the '-s' option which can make the output more compact. The debug information associated with a pc can be
quite deep which will overwhelm the actual assembly output. The '-c' option can be used to control the number of frames printed,
so passing 0
will hide the frame information completely and 1
will show just deepest inline frame.