Poison

Ergonomics

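The business team reported that one group of applications occasionally timed out. The timeout is 2 s, and the problem appeared roughly once every day or two. Investigation showed that each timeout coincided with a Full GC whose recorded cause was Ergonomics. No garbage collector had been set explicitly for these applications, and a check showed that they were running the Parallel Collector. That collector targets throughput and does not prioritize maximum pause time. According to Oracle's documentation and our tests on production applications, setting -XX:MaxGCPauseMillis=<nnn> with this collector is only a hint that the collector tries to honor; it does not guarantee that the maximum pause stays below the configured value.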
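
To check whether slow requests line up with Ergonomics-triggered collections, GC events can also be observed from inside the process. The following is a minimal sketch, not the method used in the investigation above. It assumes a HotSpot JVM that exposes com.sun.management.GarbageCollectionNotificationInfo; the class name ErgonomicsGcWatcher, its helper methods, and the 2000 ms threshold (taken from the 2 s timeout) are illustrative.

```java
import com.sun.management.GarbageCollectionNotificationInfo;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import javax.management.Notification;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;
import javax.management.openmbean.CompositeData;

// Hypothetical helper: logs every collection whose cause is "Ergonomics".
final class ErgonomicsGcWatcher {
    /** Assumed threshold in milliseconds, mirroring the 2 s timeout in the text. */
    static final long TIMEOUT_MILLIS = 2000;

    /** True when the GC cause string reported by the JVM is "Ergonomics". */
    static boolean isErgonomics(String gcCause) {
        return "Ergonomics".equals(gcCause);
    }

    /** Formats one log line for a finished collection. */
    static String describe(String gcName, String gcAction, String gcCause, long durationMillis) {
        String flag = durationMillis >= TIMEOUT_MILLIS ? " [exceeds timeout]" : "";
        return gcName + " (" + gcAction + ") cause=" + gcCause
                + " duration=" + durationMillis + "ms" + flag;
    }

    /** Registers a listener on every collector bean that can emit notifications. */
    static void install() {
        NotificationListener listener = (Notification n, Object handback) -> {
            if (!GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                    .equals(n.getType())) {
                return;
            }
            GarbageCollectionNotificationInfo info =
                    GarbageCollectionNotificationInfo.from((CompositeData) n.getUserData());
            if (isErgonomics(info.getGcCause())) {
                // For stop-the-world collectors the duration approximates the pause.
                System.out.println(describe(info.getGcName(), info.getGcAction(),
                        info.getGcCause(), info.getGcInfo().getDuration()));
            }
        };
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (gc instanceof NotificationEmitter) {
                ((NotificationEmitter) gc).addNotificationListener(listener, null, null);
            }
        }
    }

    public static void main(String[] args) {
        install();
        // Allocate short-lived garbage so that some collections occur.
        for (int i = 0; i < 100_000; i++) {
            byte[] junk = new byte[64 * 1024];
            if (junk.length == 0) throw new AssertionError();
        }
    }
}
```

In a real service, install() would be called once at startup and the log line sent to the application's logger, so that timeout timestamps can be compared with the logged collections.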

Oracle's documentation describes Ergonomics as follows:

Ergonomics is the process by which the Java Virtual Machine (JVM) and garbage collection tuning, such as behavior-based tuning, improve application performance. The JVM provides platform-dependent default selections for the garbage collector, heap size, and runtime compiler. These selections match the needs of different types of applications while requiring less command-line tuning. In addition, behavior-based tuning dynamically tunes the sizes of the heap to meet a specified behavior of the application.

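In short, Ergonomics is the process by which the JVM and garbage collection tuning, such as behavior-based tuning, improve application performance. In addition, behavior-based tuning dynamically adjusts the heap size to meet a behavior specified for the application. The documentation also states: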

The heap will grow or shrink to a size that will support the chosen throughput goal. A change in the application’s behavior can cause the heap to grow or shrink. For example, if the application starts allocating at a higher rate, the heap will grow to maintain the same throughput.
It is typical that the size of the heap will oscillate as the garbage collector tries to satisfy competing goals. This is true even if the application has reached a steady state.

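To meet the throughput goal, the heap keeps growing and shrinking, even after the application has reached a steady state, and this constant resizing can trigger Full GCs. To avoid the resizing, some tuning guides recommend setting the maximum and minimum heap sizes to the same value. For example, Tuning Java Virtual Machines (JVMs) says: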

In production environments, set the minimum heap size and the maximum heap size to the same value to prevent wasting VM resources used to constantly grow and shrink the heap. This also applies to the New generation heap sizes (Sun) or Nursery size (Jrockit).
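
Whether a running JVM actually has equal minimum and maximum heap sizes can be checked from inside the process. This is a rough sketch that assumes a HotSpot JVM: it reads the InitialHeapSize and MaxHeapSize flags (the values behind -Xms and -Xmx) through HotSpotDiagnosticMXBean. The class name HeapBoundsCheck and its methods are illustrative.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: reports whether the heap is pinned to a single size.
final class HeapBoundsCheck {
    /** True when the initial and maximum heap sizes are the same positive value. */
    static boolean isFixedHeap(long initialBytes, long maxBytes) {
        return initialBytes > 0 && initialBytes == maxBytes;
    }

    /** Reads a numeric HotSpot flag such as "MaxHeapSize" (HotSpot-specific). */
    static long readFlag(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return Long.parseLong(bean.getVMOption(name).getValue());
    }

    public static void main(String[] args) {
        long initial = readFlag("InitialHeapSize");
        long max = readFlag("MaxHeapSize");
        System.out.println("InitialHeapSize=" + initial + " MaxHeapSize=" + max
                + " fixed=" + isFixedHeap(initial, max));
    }
}
```

Starting the JVM with identical -Xms and -Xmx values (for example -Xms2g -Xmx2g, sizes chosen only for illustration) should make the check report fixed=true.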

In fact, the Available Collectors section of Oracle's official documentation already gives clear guidance on how to choose a collector:

Selecting a Collector
Unless your application has rather strict pause time requirements, first run your application and allow the VM to select a collector. If necessary, adjust the heap size to improve performance. If the performance still does not meet your goals, then use the following guidelines as a starting point for selecting a collector.

  • If the application has a small data set (up to approximately 100 MB), then select the serial collector with the option -XX:+UseSerialGC.
  • If the application will be run on a single processor and there are no pause time requirements, then let the VM select the collector, or select the serial collector with the option -XX:+UseSerialGC.
  • If (a) peak application performance is the first priority and (b) there are no pause time requirements or pauses of 1 second or longer are acceptable, then let the VM select the collector, or select the parallel collector with -XX:+UseParallelGC.
  • If response time is more important than overall throughput and garbage collection pauses must be kept shorter than approximately 1 second, then select the concurrent collector with -XX:+UseConcMarkSweepGC or -XX:+UseG1GC.

Following the last guideline, I switched this group of applications to the G1 collector, which resolved the timeouts.
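
The selection guidelines quoted above can be read as a small decision procedure. The sketch below is a simplified illustration only, not an official algorithm: the class CollectorAdvisor, its parameters, and the sentinel NO_PAUSE_REQUIREMENT are hypothetical. It also folds "peak performance is the first priority" into the pause check, and it picks G1 where the guide offers either CMS or G1.

```java
// Hypothetical encoding of the four quoted guidelines.
final class CollectorAdvisor {
    /** Sentinel meaning "the application has no pause time requirement". */
    static final long NO_PAUSE_REQUIREMENT = Long.MAX_VALUE;

    /**
     * Suggests a collector flag as a starting point.
     *
     * @param dataSetMb      approximate data set size in MB
     * @param processors     number of processors available to the JVM
     * @param maxPauseMillis longest acceptable pause, or NO_PAUSE_REQUIREMENT
     */
    static String suggestFlag(long dataSetMb, int processors, long maxPauseMillis) {
        // Guideline 1: small data set (up to approximately 100 MB).
        if (dataSetMb <= 100) {
            return "-XX:+UseSerialGC";
        }
        // Guideline 2: single processor and no pause time requirement.
        if (processors == 1 && maxPauseMillis == NO_PAUSE_REQUIREMENT) {
            return "-XX:+UseSerialGC";
        }
        // Guideline 3: no pause requirement, or pauses of 1 second or longer are fine.
        if (maxPauseMillis >= 1000) {
            return "-XX:+UseParallelGC";
        }
        // Guideline 4: pauses must stay below roughly 1 second.
        // The guide also lists -XX:+UseConcMarkSweepGC; CMS was deprecated in
        // JDK 9 and removed in JDK 14, so this sketch returns G1.
        return "-XX:+UseG1GC";
    }

    public static void main(String[] args) {
        // Illustrative inputs: 4 GB data set, 8 processors, 500 ms pause budget.
        System.out.println(suggestFlag(4096, 8, 500)); // prints -XX:+UseG1GC
    }
}
```

The result is only a starting point; as the guide says, the first step is still to run the application with the VM's default choice and measure.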

Appendix: how to find out which garbage collector a running application uses:

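  1. Run jps -lvm to list all JVM processes.
  2. Run jmap -heap <pid> to print the heap summary of the given process. The summary includes the GC algorithm in use, and the printed algorithm can be mapped to the actual collector by looking it up in HeapSummary.printGCAlgorithm.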
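
As an alternative to the two commands above, the collector can be identified from inside the application through the standard GarbageCollectorMXBean names. This is a sketch under assumptions: the bean names matched below (such as "PS Scavenge" or "G1 Young Generation") are the ones commonly reported by HotSpot builds and may differ on other JVMs, and the class name CollectorDetector is illustrative.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: maps GC bean names to a collector family.
final class CollectorDetector {
    /** Classifies the collector from the names of its GC beans (HotSpot naming assumed). */
    static String classify(List<String> beanNames) {
        if (beanNames.stream().anyMatch(n -> n.startsWith("G1 "))) {
            return "G1 (-XX:+UseG1GC)";
        }
        if (beanNames.stream().anyMatch(n -> n.startsWith("PS "))) {
            return "Parallel (-XX:+UseParallelGC)";
        }
        if (beanNames.contains("ConcurrentMarkSweep")) {
            return "CMS (-XX:+UseConcMarkSweepGC)";
        }
        if (beanNames.contains("Copy") || beanNames.contains("MarkSweepCompact")) {
            return "Serial (-XX:+UseSerialGC)";
        }
        return "unknown: " + beanNames;
    }

    /** Collects the names of all GC beans registered in the current JVM. */
    static List<String> currentBeanNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> names = currentBeanNames();
        System.out.println(names + " -> " + classify(names));
    }
}
```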

Reference

Java Platform, Standard Edition HotSpot Virtual Machine Garbage Collection Tuning Guide
The jmap Utility
The jps Utility
GCeasy
Which GC algorithm is being used by a specific JVM instance?
JDK-8067243 GC reason “Ergonomics” confusing - Java Bug System