1.5 JMH的Profiler
JMH提供了一些非常有用的Profiler可以帮助我们更加深入地了解基准测试,甚至还能帮助开发者分析所编写的代码,JMH目前提供了如表1-2所示的一些Profiler以供使用。
表1-2 JMH所提供的Profiler
关于Profiler,这里并没有逐一进行介绍,只是介绍Stack、GC、CL、COMP这几个Profiler,其他的Profiler读者可以自行尝试。
1.5.1 StackProfiler
StackProfiler不仅可以输出线程堆栈的信息,还能统计程序在执行的过程中线程状态的数据,比如RUNNING状态、WAIT状态所占用的百分比等,下面对1.4.2节中所列举的例子稍加改造。
程序代码:JMHExample21.java
package com.wangwenjun.concurrent.jmh; import org.openjdk.jmh.annotations.*; import org.openjdk.jmh.profile.StackProfiler; import org.openjdk.jmh.runner.Runner; import org.openjdk.jmh.runner.RunnerException; import org.openjdk.jmh.runner.options.Options; import org.openjdk.jmh.runner.options.OptionsBuilder; import org.openjdk.jmh.runner.options.TimeValue; import java.util.concurrent.ArrayBlockingQueue; import java.util.concurrent.BlockingQueue; import java.util.concurrent.TimeUnit; @BenchmarkMode(Mode.AverageTime) @Fork(1) @Warmup(iterations = 5) @Measurement(iterations = 5) @OutputTimeUnit(TimeUnit.MICROSECONDS) @State(Scope.Group) public class JMHExample21 { private BlockingQueue<Integer> queue; private final static int VALUE = Integer.MAX_VALUE; @Setup public void init() { this.queue = new ArrayBlockingQueue<>(10); } @GroupThreads(5) @Group("blockingQueue") @Benchmark public void put() throws InterruptedException { this.queue.put(VALUE); } @GroupThreads(5) @Group("blockingQueue") @Benchmark public int take() throws InterruptedException { return this.queue.take(); } public static void main(String[] args) throws RunnerException { final Options opts = new OptionsBuilder() .include(JMHExample21.class.getSimpleName()) .timeout(TimeValue.seconds(10)) .addProfiler(StackProfiler.class)// 增加StackProfiler .build(); new Runner(opts).run(); } }
我们在Options中增加了StackProfiler用于分析线程的堆栈情况,还可以输出线程状态的分布情况,下面是JMHExample20.java的JMH运行输出结果。
程序输出:JMHExample21.java
Secondary result "com.wangwenjun.concurrent.jmh.JMHExample21.blockingQueue:·stack": Stack profiler: ....[Thread state distributions]...................................................... 89.3% WAITING 10.7% RUNNABLE ....[Thread state: WAITING]........................................................... 89.3% 100.0% sun.misc.Unsafe.park ....[Thread state: RUNNABLE].......................................................... 9.1% 84.8% java.net.SocketInputStream.socketRead0 1.2% 11.1% sun.misc.Unsafe.unpark 0.3% 2.9% sun.misc.Unsafe.park 0.0% 0.3% java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await 0.0% 0.2% com.wangwenjun.concurrent.jmh.JMHExample21.take 0.0% 0.2% com.wangwenjun.concurrent.jmh.JMHExample21.put 0.0% 0.2% java.util.concurrent.ArrayBlockingQueue.put 0.0% 0.1% java.util.concurrent.ArrayBlockingQueue.take 0.0% 0.1% java.util.concurrent.locks.AbstractQueuedSynchronizer.findNodeFromTail 0.0% 0.1% java.util.concurrent.locks.AbstractQueuedSynchronizer.isOnSyncQueue # Run complete. Total time: 00:00:47 Benchmark Mode Cnt Score Error Units JMHExample21.blockingQueue avgt 5 184387.755 ± 1070466.289 us/op JMHExample21.blockingQueue:put avgt 5 45540.153 ± 298845.887 us/op JMHExample21.blockingQueue:take avgt 5 323235.357 ± 2244498.836 us/op JMHExample21.blockingQueue:·stack avgt NaN ---
通过上面的输出结果可以看到,线程状态的分布情况为WAITING:89.3%,RUNNABLE:10.7%,考虑到我们使用的是BlockingQueue,因此这种分布应该还算合理。
1.5.2 GcProfiler
GcProfiler可用于分析出在测试方法中垃圾回收器在JVM每个内存空间上所花费的时间,本节将使用自定义的类加载器进行类的加载。
程序代码:AlexClassLoader.java
package com.wangwenjun.concurrent.jmh; import java.net.URL; import java.net.URLClassLoader; public class AlexClassLoader extends URLClassLoader { private final byte[] bytes; public AlexClassLoader(byte[] bytes) { super(new URL[0], ClassLoader.getSystemClassLoader()); this.bytes = bytes; } @Override protected Class<?> findClass(String name) throws ClassNotFoundException { return defineClass(name, bytes, 0, bytes.length); } }
接下来的程序将使用这个类加载器进行相关类的加载,比如下面的Alex类,非常简单。
程序代码:Alex.java
public class Alex { private String name = "Alex Wang"; private int age = 33; private byte[] data = new byte[1024 * 10]; }
将Alex.java源文件编译成Alex.class字节码文件之后,我们将使用前面定义的类加载器对该字节码进行加载,然后用JMH对该类的加载进行基准测试,并且增加GcProfiler查看Gc的情况。
程序代码:JMHExample22.java
package com.wangwenjun.concurrent.jmh; import org.openjdk.jmh.annotations.*; import org.openjdk.jmh.profile.GCProfiler; import org.openjdk.jmh.profile.StackProfiler; import org.openjdk.jmh.runner.Runner; import org.openjdk.jmh.runner.RunnerException; import org.openjdk.jmh.runner.options.Options; import org.openjdk.jmh.runner.options.OptionsBuilder; import org.openjdk.jmh.runner.options.TimeValue; import java.io.IOException; import java.nio.file.Files; import java.nio.file.Paths; import java.util.concurrent.ArrayBlockingQueue; import java.util.concurrent.BlockingQueue; import java.util.concurrent.TimeUnit; @BenchmarkMode(Mode.AverageTime) @Fork(1) @Warmup(iterations = 5) @Measurement(iterations = 5) @OutputTimeUnit(TimeUnit.MICROSECONDS) @State(Scope.Thread) public class JMHExample22 { private byte[] alexBytes; private AlexClassLoader classLoader; @Setup public void init() throws IOException { this.alexBytes = Files.readAllBytes( Paths.get("C:\\Users\\wangwenjun\\IdeaProjects\\java-concurrency-book2\\target\\classes\\Alex.class") ); this.classLoader = new AlexClassLoader(alexBytes); } @Benchmark public Object testLoadClass() throws ClassNotFoundException, IllegalAccessException, InstantiationException { Class<?> alexClass = Class.forName("Alex", true, classLoader); return alexClass.newInstance(); } public static void main(String[] args) throws RunnerException { final Options opts = new OptionsBuilder() .include(JMHExample22.class.getSimpleName()) // add GcProfiler输出基准方法执行过程中的GC信息 .addProfiler(GCProfiler.class) // 将最大堆内存设置为128MB,会有多次的GC发生 .jvmArgsAppend("-Xmx128M") .build(); new Runner(opts).run(); } }
运行上面的基准测试方法,我们除了得到testLoadClass()方法的基准数据之外,还会得到GC相关的信息。
程序输出:JMHExample22.java
Iteration 1: 6.500 us/op ·gc.alloc.rate: 1006.907 MB/sec ·gc.alloc.rate.norm: 10280.003 B/op ·gc.churn.PS_Eden_Space: 1009.727 MB/sec ·gc.churn.PS_Eden_Space.norm: 10308.789 B/op ·gc.churn.PS_Survivor_Space: 0.164 MB/sec ·gc.churn.PS_Survivor_Space.norm: 1.678 B/op ·gc.count: 37.000 counts ·gc.time: 209.000 ms Iteration 2: 6.677 us/op ·gc.alloc.rate: 978.181 MB/sec ·gc.alloc.rate.norm: 10280.003 B/op ·gc.churn.PS_Eden_Space: 1004.544 MB/sec ·gc.churn.PS_Eden_Space.norm: 10557.057 B/op ·gc.churn.PS_Survivor_Space: 0.143 MB/sec ·gc.churn.PS_Survivor_Space.norm: 1.504 B/op ·gc.count: 37.000 counts ·gc.time: 321.000 ms Iteration 3: 5.484 us/op ·gc.alloc.rate: 1191.701 MB/sec ·gc.alloc.rate.norm: 10280.002 B/op ·gc.churn.PS_Eden_Space: 1184.606 MB/sec ·gc.churn.PS_Eden_Space.norm: 10218.794 B/op ·gc.churn.PS_Survivor_Space: 0.207 MB/sec ·gc.churn.PS_Survivor_Space.norm: 1.790 B/op ·gc.count: 43.000 counts ·gc.time: 217.000 ms Iteration 4: 4.833 us/op ·gc.alloc.rate: 1349.548 MB/sec ·gc.alloc.rate.norm: 10280.002 B/op ·gc.churn.PS_Eden_Space: 1362.911 MB/sec ·gc.churn.PS_Eden_Space.norm: 10381.792 B/op ·gc.churn.PS_Survivor_Space: 0.287 MB/sec ·gc.churn.PS_Survivor_Space.norm: 2.189 B/op ·gc.count: 50.000 counts ·gc.time: 159.000 ms Iteration 5: 5.763 us/op ·gc.alloc.rate: 1134.138 MB/sec ·gc.alloc.rate.norm: 10280.002 B/op ·gc.churn.PS_Eden_Space: 1143.449 MB/sec ·gc.churn.PS_Eden_Space.norm: 10364.396 B/op ·gc.churn.PS_Survivor_Space: 0.308 MB/sec ·gc.churn.PS_Survivor_Space.norm: 2.787 B/op ·gc.count: 42.000 counts ·gc.time: 233.000 ms Result "com.wangwenjun.concurrent.jmh.JMHExample22.testLoadClass": 5.851 ±(99.9%) 2.909 us/op [Average] (min, avg, max) = (4.833, 5.851, 6.677), stdev = 0.755 CI (99.9%): [2.943, 8.760] (assumes normal distribution) Secondary result "com.wangwenjun.concurrent.jmh.JMHExample22.testLoadClass:·gc.alloc.rate": 1132.095 ±(99.9%) 578.252 MB/sec [Average] (min, avg, max) = (978.181, 1132.095, 1349.548), stdev = 150.170 CI (99.9%): [553.843, 1710.347] (assumes normal distribution) Secondary result "com.wangwenjun.concurrent.jmh.JMHExample22.testLoadClass:·gc.alloc.rate.norm": 10280.003 ±(99.9%) 0.001 B/op [Average] (min, avg, max) = (10280.002, 10280.003, 10280.003), stdev = 0.001 CI (99.9%): [10280.001, 10280.004] (assumes normal distribution) Secondary result "com.wangwenjun.concurrent.jmh.JMHExample22.testLoadClass:·gc.churn.PS_Eden_Space": 1141.047 ±(99.9%) 567.904 MB/sec [Average] (min, avg, max) = (1004.544, 1141.047, 1362.911), stdev = 147.483 CI (99.9%): [573.143, 1708.951] (assumes normal distribution) Secondary result "com.wangwenjun.concurrent.jmh.JMHExample22.testLoadClass:·gc.churn.PS_Eden_Space.norm": 10366.166 ±(99.9%) 478.230 B/op [Average] (min, avg, max) = (10218.794, 10366.166, 10557.057), stdev = 124.195 CI (99.9%): [9887.936, 10844.395] (assumes normal distribution) Secondary result "com.wangwenjun.concurrent.jmh.JMHExample22.testLoadClass:·gc.churn.PS_Survivor_Space": 0.222 ±(99.9%) 0.281 MB/sec [Average] (min, avg, max) = (0.143, 0.222, 0.308), stdev = 0.073 CI (99.9%): [≈ 0, 0.503] (assumes normal distribution) Secondary result "com.wangwenjun.concurrent.jmh.JMHExample22.testLoadClass:·gc.churn.PS_Survivor_Space.norm": 1.990 ±(99.9%) 1.972 B/op [Average] (min, avg, max) = (1.504, 1.990, 2.787), stdev = 0.512 CI (99.9%): [0.018, 3.961] (assumes normal distribution) Secondary result "com.wangwenjun.concurrent.jmh.JMHExample22.testLoadClass:·gc.count": 209.000 ±(99.9%) 0.001 counts [Sum] (min, avg, max) = (37.000, 41.800, 50.000), stdev = 5.357 CI (99.9%): [209.000, 209.000] (assumes normal distribution) Secondary result "com.wangwenjun.concurrent.jmh.JMHExample22.testLoadClass:·gc.time": 1139.000 ±(99.9%) 0.001 ms [Sum] (min, avg, max) = (159.000, 227.800, 321.000), stdev = 58.985 CI (99.9%): [1139.000, 1139.000] (assumes normal distribution) # Run complete. Total time: 00:00:20 Benchmark Mode Cnt Score Error Units JMHExample22.testLoadClass avgt 5 5.851 ± 2.909 us/op JMHExample22.testLoadClass:·gc.alloc.rate avgt 5 1132.095 ± 578.252 MB/sec JMHExample22.testLoadClass:·gc.alloc.rate.norm avgt 5 10280.003 ± 0.001 B/op JMHExample22.testLoadClass:·gc.churn.PS_Eden_Space avgt 5 1141.047 ± 567.904 MB/sec JMHExample22.testLoadClass:·gc.churn.PS_Eden_Space.norm avgt 5 10366.166 ± 478.230 B/op JMHExample22.testLoadClass:·gc.churn.PS_Survivor_Space avgt 5 0.222 ± 0.281 MB/sec JMHExample22.testLoadClass:·gc.churn.PS_Survivor_Space.norm avgt 5 1.990 ± 1.972 B/op JMHExample22.testLoadClass:·gc.count avgt 5 209.000 counts JMHExample22.testLoadClass:·gc.time avgt 5 1139.000 ms
根据GcProfiler的输出信息可以看到,在这个基准方法执行的过程之中,gc总共出现过209次,这209次总共耗时1139毫秒,在此期间也发生了多次的堆内存的申请,比如,每秒钟大约会有1132.095MB的数据被创建,若换算成对testLoadClass方法的每次调用,那么我们会发现大约有10280.003 Byte的内存使用。
1.5.3 ClassLoaderProfiler
ClassLoaderProfiler可以帮助我们看到在基准方法的执行过程中有多少类被加载和卸载,但是考虑到在一个类加载器中同一个类只会被加载一次的情况,因此我们需要将Warmup设置为0,以避免在热身阶段就已经加载了基准测试方法所需的所有类。
程序代码:JMHExample23.java
package com.wangwenjun.concurrent.jmh; import org.openjdk.jmh.annotations.*; import org.openjdk.jmh.profile.ClassloaderProfiler; import org.openjdk.jmh.profile.GCProfiler; import org.openjdk.jmh.runner.Runner; import org.openjdk.jmh.runner.RunnerException; import org.openjdk.jmh.runner.options.Options; import org.openjdk.jmh.runner.options.OptionsBuilder; import java.io.IOException; import java.nio.file.Files; import java.nio.file.Paths; import java.util.concurrent.TimeUnit; @BenchmarkMode(Mode.AverageTime) @Fork(1) // 将热身批次设置为0 @Warmup(iterations = 0) @Measurement(iterations = 5) @OutputTimeUnit(TimeUnit.MICROSECONDS) @State(Scope.Thread) public class JMHExample23 { private byte[] alexBytes; private AlexClassLoader classLoader; @Setup public void init() throws IOException { this.alexBytes = Files.readAllBytes( Paths.get("C:\\Users\\wangwenjun\\IdeaProjects\\java-concurrency-book2\\target\\classes\\Alex.class") ); this.classLoader = new AlexClassLoader(alexBytes); } @Benchmark public Object testLoadClass() throws ClassNotFoundException, IllegalAccessException, InstantiationException { Class<?> alexClass = Class.forName("Alex", true, classLoader); return alexClass.newInstance(); } public static void main(String[] args) throws RunnerException { final Options opts = new OptionsBuilder() .include(JMHExample23.class.getSimpleName()) // 增加CL Profiler,输出类的加载、卸载信息 .addProfiler(ClassloaderProfiler.class) .build(); new Runner(opts).run(); } }
运行上面的基准测试方法,我们将会看到在第一个批次的度量时加载了大量的类,在余下的几次度量中将不会再进行类的加载了,这也符合JVM类加载器的基本逻辑。
程序输出:JMHExample23.java
Iteration 1: 10.966 us/op ·class.load: 107.141 classes/sec ·class.load.norm: 0.001 classes/op ·class.unload: ≈ 0 classes/sec ·class.unload.norm: ≈ 0 classes/op Iteration 2: 6.536 us/op ·class.load: ≈ 0 classes/sec ·class.load.norm: ≈ 0 classes/op ·class.unload: ≈ 0 classes/sec ·class.unload.norm: ≈ 0 classes/op Iteration 3: 5.551 us/op ·class.load: ≈ 0 classes/sec ·class.load.norm: ≈ 0 classes/op ·class.unload: ≈ 0 classes/sec ·class.unload.norm: ≈ 0 classes/op Iteration 4: 2.955 us/op ·class.load: ≈ 0 classes/sec ·class.load.norm: ≈ 0 classes/op ·class.unload: ≈ 0 classes/sec ·class.unload.norm: ≈ 0 classes/op Iteration 5: 2.971 us/op ·class.load: ≈ 0 classes/sec ·class.load.norm: ≈ 0 classes/op ·class.unload: ≈ 0 classes/sec ·class.unload.norm: ≈ 0 classes/op Result "com.wangwenjun.concurrent.jmh.JMHExample23.testLoadClass": 5.796 ±(99.9%) 12.682 us/op [Average] (min, avg, max) = (2.955, 5.796, 10.966), stdev = 3.294 CI (99.9%): [≈ 0, 18.478] (assumes normal distribution) Secondary result "com.wangwenjun.concurrent.jmh.JMHExample23.testLoadClass:·class.load": 21.428 ±(99.9%) 184.504 classes/sec [Average] (min, avg, max) = (≈ 0, 21.428, 107.141), stdev = 47.915 CI (99.9%): [≈ 0, 205.932] (assumes normal distribution) Secondary result "com.wangwenjun.concurrent.jmh.JMHExample23.testLoadClass:·class.load.norm": ≈ 10<sup>-4</sup> classes/op Secondary result "com.wangwenjun.concurrent.jmh.JMHExample23.testLoadClass:·class.unload": ≈ 0 classes/sec Secondary result "com.wangwenjun.concurrent.jmh.JMHExample23.testLoadClass:·class.unload.norm": ≈ 0 classes/op # Run complete. Total time: 00:00:07 Benchmark Mode Cnt Score Error Units JMHExample23.testLoadClass avgt 5 5.796 ± 12.682 us/op JMHExample23.testLoadClass:·class.load avgt 5 21.428 ± 184.504 classes/sec JMHExample23.testLoadClass:·class.load.norm avgt 5 ≈ 10<sup>-4</sup> classes/op JMHExample23.testLoadClass:·class.unload avgt 5 ≈ 0 classes/sec JMHExample23.testLoadClass:·class.unload.norm avgt 5 ≈ 0 classes/op
我们可以看到,在testLoadClass方法的执行过程中,每秒大约会有21个类的加载。
1.5.4 CompilerProfiler
CompilerProfiler将会告诉你在代码的执行过程中JIT编译器所花费的优化时间,我们可以打开verbose模式观察更详细的输出。
程序代码:JMHExample24.java
package com.wangwenjun.concurrent.jmh; import org.openjdk.jmh.annotations.*; import org.openjdk.jmh.profile.ClassloaderProfiler; import org.openjdk.jmh.profile.CompilerProfiler; import org.openjdk.jmh.runner.Runner; import org.openjdk.jmh.runner.RunnerException; import org.openjdk.jmh.runner.options.Options; import org.openjdk.jmh.runner.options.OptionsBuilder; import org.openjdk.jmh.runner.options.VerboseMode; import java.io.IOException; import java.nio.file.Files; import java.nio.file.Paths; import java.util.concurrent.TimeUnit; @BenchmarkMode(Mode.AverageTime) @Fork(1) @Warmup(iterations = 5) @Measurement(iterations = 5) @OutputTimeUnit(TimeUnit.MICROSECONDS) @State(Scope.Thread) public class JMHExample24 { private byte[] alexBytes; private AlexClassLoader classLoader; @Setup public void init() throws IOException { this.alexBytes = Files.readAllBytes( Paths.get("C:\\Users\\wangwenjun\\IdeaProjects\\java-concurrency-book2\\target\\classes\\Alex.class") ); this.classLoader = new AlexClassLoader(alexBytes); } @Benchmark public Object testLoadClass() throws ClassNotFoundException, IllegalAccessException, InstantiationException { Class<?> alexClass = Class.forName("Alex", true, classLoader); return alexClass.newInstance(); } public static void main(String[] args) throws RunnerException { final Options opts = new OptionsBuilder() .include(JMHExample24.class.getSimpleName()) .addProfiler(CompilerProfiler.class) .verbosity(VerboseMode.EXTRA) .build(); new Runner(opts).run(); } }
运行上面的基准测试,我们将得出JIT在整个过程中的优化耗时,输出结果具体如下。
程序输出:JMHExample24.java
Iteration 1: 4.174 us/op ·compiler.time.profiled: ≈ 0 ms ·compiler.time.total: 1212.000 ms Iteration 2: 3.968 us/op ·compiler.time.profiled: 20.000 ms ·compiler.time.total: 1233.000 ms Iteration 3: 3.338 us/op ·compiler.time.profiled: ≈ 0 ms ·compiler.time.total: 1235.000 ms Iteration 4: 2.793 us/op ·compiler.time.profiled: ≈ 0 ms ·compiler.time.total: 1236.000 ms Iteration 5: 2.584 us/op ·compiler.time.profiled: ≈ 0 ms ·compiler.time.total: 1237.000 ms Result "com.wangwenjun.concurrent.jmh.JMHExample24.testLoadClass": 3.372 ±(99.9%) 2.692 us/op [Average] (min, avg, max) = (2.584, 3.372, 4.174), stdev = 0.699 CI (99.9%): [0.679, 6.064] (assumes normal distribution) Secondary result "com.wangwenjun.concurrent.jmh.JMHExample24.testLoadClass:·compiler.time.profiled": 20.000 ±(99.9%) 0.001 ms [Sum] (min, avg, max) = (≈ 0, 4.000, 20.000), stdev = 8.944 CI (99.9%): [20.000, 20.000] (assumes normal distribution) Secondary result "com.wangwenjun.concurrent.jmh.JMHExample24.testLoadClass:·compiler.time.total": 1237.000 ±(99.9%) 0.001 ms [Maximum] (min, avg, max) = (1212.000, 1230.600, 1237.000), stdev = 10.502 CI (99.9%): [1237.000, 1237.000] (assumes normal distribution) # Run complete. Total time: 00:00:12 Benchmark Mode Cnt Score Error Units JMHExample24.testLoadClass avgt 5 3.372 ± 2.692 us/op JMHExample24.testLoadClass:·compiler.time.profiled avgt 5 20.000 ms JMHExample24.testLoadClass:·compiler.time.total avgt 5 1237.000 ms
我们可以看到,在整个方法的执行过程中,profiled的优化耗时为20毫秒,total的优化耗时为1237毫秒。