Skip to content

Latest commit

 

History

History
165 lines (135 loc) · 9.46 KB

README-proftool.md

File metadata and controls

165 lines (135 loc) · 9.46 KB

README

proftool is a utility for capturing and examining profiles of Java programs. It's intended to provide machine level details about the execution to aid with JIT performance analysis. It currently consists of three primary components:

  • a JVMTI agent for capturing all the assembly generated by the JVM.
  • a parser for Linux perf output
  • a parser for HotSpot LogCompilation information

By combining these components into a single command line, the perf profile information can be attributed to the JIT code. Profile collection currently only supports Linux perf though once the data is captured then the profile can be viewed anywhere.

This suite must be built first to create the JVMTI agent library and the suite is normally dynamically imported to make the commands available through mx.

Recording a profile

There are currently 2 ways to capture profiles. The direct way is to use the profrecord command which takes a full Java command line as an argument and launches it with extra arguments to collect the data.

$ mx profrecord -E fop /home/graal/oraclejdk1.8.0_291-jvmci-21.1-b03/bin/java -jar dacapo.jar fop -n 56

The required argument -E <name> specifies the directory that will contain the output the files. It's an error if it already exists but the -O will overwrite the directory if it already exists. mx profrecord should work with any Java command line you can supply.

The second way is to use the proftool profiler with the mx benchmark command. The proftool profiler conflicts with the rss tracker which is enabled by default, so it currently must be explicitly turned off. For example, running:

$ mx benchmark dacapo:fop --tracker none -- --profiler proftool

produces a uniquely named directory like proftool_fop_2021-07-11_181530 which contains a profile of the full benchmark run. Note that the --tracker and --profiler arguments are on different sides of the -- separator.

After capturing the profile, the perf binary output needs to be converted to text using the profpackage command because the current benchmark profiler infrastructure can't do that automatically.

$ mx profpackage proftool_fop_2021-07-11_181530
Created /home/graal/ws/graal/compiler/proftool_fop_2021-07-11_181530.zip

This command additionally packs everything into a zip to make it easy to move profiles around, though that step can be skipped with the -n option. Both the zip and directory form can be used directly by all the proftool commands so use whichever form is most convenient. Packaging as a zip it intended to simplify capturing the profile on a Linux machine then moving it to another machine for analysis.

Not all benchmark suites actually support the --profiler option even though it's broadly advertised in the help output. At the current time, only the dacapo, scala-dacapo, renaissance and renaissance-legacy suites fully support the --profiler option. The JMH benchmarks only correctly support it when the the JMH option -f 0 is used to suppress forking by the harness. Note that this changes the way the benchmark is run and might not produce the same results. With JMH you can always fall back to the JMH perfasm profiler which is sufficient for most uses.

For other benchmarks, you can always use mx profrecord directly by capturing the command line used to launch the benchmark with mx -v benchmark ... and then passing that to mx profrecord. For example, for the fop benchmark above:

$ mx -v benchmark dacapo:fop --tracker none 

displays a very long command line in the middle of all the output that includes something like the follow:

/home/tkrodrig/ws/unroll/graal/compiler/mxbuild/linux-amd64/graaljdks/jdk1.8-cmp/bin/java -server ... \
-jar /home/graal/.mx/cache/DACAPO_e39957904b7e79caf4fa54f30e8e4ee74d4e9e37/dacapo.jar fop -n 56

For compactness, the sample output above doesn't include all required arguments. Copy and paste that line to use it with mx profrecord:

$ mx profrecord -E fop /home/tkrodrig/ws/unroll/graal/compiler/mxbuild/linux-amd64/graaljdks/jdk1.8-cmp/bin/java -server ... \
-jar /home/graal/.mx/cache/DACAPO_e39957904b7e79caf4fa54f30e8e4ee74d4e9e37/dacapo.jar fop -n 56

Examining a profile.

The primary command for examining profiles is profhot. This gives a summary of the top C functions and the top JIT methods. The JIT methods are additionally disassembled and annotated with the profile information. Truncated sample output looks like this:

$ mx profhot proftool_fop_2021-07-11_181530.zip
Hot C functions:
  Percent   Name
    5.39%   SpinPause
    2.07%   ParMarkBitMap::mark_obj
    1.98%   ParallelCompactData::add_obj
    0.95%   GreyObjectsWalker_walkGreyObjects_e59a80a81022647b793f2f995d251cf18cbcbab4
    0.79%   ControlFlowGraph_identifyBlocks_91c6bd2376323c10cda627f1ed5130f90d88a82f
    0.69%   EconomicMapImpl_compareKeys_139d60c35d64f232b5b9fac87391585f45801e09
    0.67%   ArrayList_add_43f6399c3b46d50c69e7ebaee5e44ba3b6a870f3
    0.58%   ControlFlowGraph_identifyBlock_7f27028f31036b771721050156eba6ce33899dfa
    0.55%   SchedulePhase$Instance_scheduleEarliestIterative_f11f1d84d698be305ec40ea50fa995f7185d393f
    0.54%   EconomicMapImpl_find_70589f86ac2abbc40566645a1af83af4aa5c89f0

Hot generated code:
  Percent   Compile Id  Name
    1.97%         2557  sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa(int, long, int, boolean)
    1.86%         3101  org.apache.fop.render.ps.PSRenderer.renderText(org.apache.fop.area.inline.AbstractTextArea, java.lang.String, int[])
    1.27%               Interpreter
    1.12%         3353  org.apache.fop.fo.FObj.addChildNode(org.apache.fop.fo.FONode)
    0.75%         3112  java.text.DigitList.set(boolean, double, int, boolean)
    0.63%         1357  org.apache.fop.fo.properties.PropertyMaker.get(int, org.apache.fop.fo.PropertyList, boolean, boolean)
    0.61%         3145  org.apache.fop.layoutmgr.inline.TextLayoutManager.getNextKnuthElements(org.apache.fop.layoutmgr.LayoutContext, int)
    0.51%         3110  java.text.DecimalFormat.subformat(java.lang.StringBuffer, java.text.Format$FieldDelegate, boolean, boolean, int, int, int, int)
    0.34%         2787  org.apache.fop.layoutmgr.inline.TextLayoutManager.addAreas(org.apache.fop.layoutmgr.PositionIterator, org.apache.fop.layoutmgr.LayoutContext)
    0.32%         2738  org.apache.fop.layoutmgr.inline.TextLayoutManager.createTextArea(org.apache.fop.traits.MinOptMax, int, org.apache.fop.layoutmgr.LayoutContext, int, int, int, boolean)

sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa(int, long, int, boolean)
0x7f7a69251e80-0x7f7a69252d38 (samples=613, period=2591391597)
Hot region 1
       0x7f7a69252082:  sub     ecx, edi        ; sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa:252
       0x7f7a69252084:  mov     edx, ecx        ; sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa:287
       0x7f7a69252086:  add     edx, r13d
       0x7f7a69252089:  add     edx, dword ptr [rax + rsi*4 + 0x10]     ; sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa:307
 0.16% 0x7f7a6925208d:  cmp     edx, 0x40       ; sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa:330
       0x7f7a69252090:  jge     0x7f7a69252a62
       0x7f7a69252096:  sub     r10d, r14d      ; sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa:372
       0x7f7a69252099:  sub     r10d, edi       ; sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa:259
       0x7f7a6925209c:  mov     edi, r10d       ; sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa:294
       0x7f7a6925209f:  add     edi, dword ptr [rax + rbx*4 + 0x10]
       0x7f7a692520a3:  inc     edi
       0x7f7a692520a5:  cmp     edi, 0x40
       0x7f7a692520a8:  jge     0x7f7a69252c6f
 ...

Every piece of generated code is disassembled and broken into hot regions where ticks are clustered. This gives a general overview of the hot parts of the execution.

Proftool provides limited extra support for Truffle programs at the moment. The only extra information printed is the Truffle provided name of the generated nmethod when printing the nmethod names. This results in summary output like this for the octane splay benchmark:

Hot generated code:
  Percent   Name
    3.04%   10260: "MeasureDefault#2" OptimizedCallTarget.profiledPERoot(Object[])
    1.79%   5934: "GeneratePayloadTree#2" OptimizedCallTarget.profiledPERoot(Object[])
    0.74%   10118: "SplayTree.remove#2" OptimizedCallTarget.profiledPERoot(Object[])
    0.65%   Interpreter
    0.18%   7976: "SplayRun#2" OptimizedCallTarget.profiledPERoot(Object[])
    0.16%   11695: FastDtoa.digitGen(DiyFp, DiyFp, DiyFp, DtoaBuffer, int)
    0.13%   8914: "SplayTree.splay_#2" OptimizedCallTarget.profiledPERoot(Object[])
    0.07%   1345: OptimizedCallTarget.callBoundary(Object[])
    0.07%   jshort_disjoint_arraycopy
    0.07%   7877: "SplayTree.splay_#1" OptimizedCallTarget.profiledPERoot(Object[])

OptimizedCallTarget is the Java method that is used by the partial evaluator when compiling Truffle methods and the name in quotes should correspond to the compiled JavaScript function. More support is planned for future versions of prftool.

Other options to profhot give more control over how this is printed. It's possible to strip package names from the output using the '-s' option which can make the output more compact. The debug information associated with a pc can be quite deep which will overwhelm the actual assembly output. The '-c' option can be used to control the number of frames printed, so passing 0 will hide the frame information completely and 1 will show just deepest inline frame.