Poison

Continuous Full GC

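When the JVM performs a Full GC, all application threads are paused, so every monitoring approach that runs as a Java agent inside that JVM stops responding. With JMX Exporter, for example, the scrape target drops offline. Prometheus sends its request to the Java agent while the target JVM is in the middle of a Full GC, so no data can be collected. In practice the impact is limited, because health checks and similar components detect the instance as unavailable and remove it automatically.

If GC still has to be monitored in real time in this situation, the monitor must not run in the same JVM instance. eero, for example, uses a Python process that watches gc.log and reports from it.

If real-time Full GC monitoring is not required, micrometer's JvmGcMetrics can be used to monitor GC. It updates GC data through notifications, using the NotificationEmitter mechanism in JMX. When a Full GC happens, the stop-the-world (STW) pause may prevent the data from being collected right away, but a later scrape still picks it up. The source is in JvmGcMetrics.java at v1.8.2: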

// Inner class of micrometer's JvmGcMetrics (v1.8.2). tags, longLivedPoolNames, isGenerationalGc,
// allocationPoolName, allocationPoolSizeAfter, allocatedBytes, promotedBytes, liveDataSize,
// maxDataSize, isConcurrentPhase(...) and GcGenerationAge all belong to the enclosing class.
class GcMetricsNotificationListener implements NotificationListener {
    private final MeterRegistry registry;

    GcMetricsNotificationListener(MeterRegistry registry) {
        this.registry = registry;
    }

    @Override
    public void handleNotification(Notification notification, Object ref) {
        // The GC bean attaches a GarbageCollectionNotificationInfo encoded as CompositeData.
        CompositeData cd = (CompositeData) notification.getUserData();
        GarbageCollectionNotificationInfo notificationInfo = GarbageCollectionNotificationInfo.from(cd);

        String gcCause = notificationInfo.getGcCause();
        String gcAction = notificationInfo.getGcAction();
        GcInfo gcInfo = notificationInfo.getGcInfo();
        long duration = gcInfo.getDuration();
        // Concurrent phases and stop-the-world pauses are recorded as separate timers.
        if (isConcurrentPhase(gcCause, notificationInfo.getGcName())) {
            Timer.builder("jvm.gc.concurrent.phase.time")
                    .tags(tags)
                    .tags("action", gcAction, "cause", gcCause)
                    .description("Time spent in concurrent phase")
                    .register(registry)
                    .record(duration, TimeUnit.MILLISECONDS);
        } else {
            Timer.builder("jvm.gc.pause")
                    .tags(tags)
                    .tags("action", gcAction, "cause", gcCause)
                    .description("Time spent in GC pause")
                    .register(registry)
                    .record(duration, TimeUnit.MILLISECONDS);
        }

        final Map<String, MemoryUsage> before = gcInfo.getMemoryUsageBeforeGc();
        final Map<String, MemoryUsage> after = gcInfo.getMemoryUsageAfterGc();

        countPoolSizeDelta(before, after);

        // Promoted bytes: growth of the long-lived (old) pools across this GC.
        final long longLivedBefore = longLivedPoolNames.stream().mapToLong(pool -> before.get(pool).getUsed()).sum();
        final long longLivedAfter = longLivedPoolNames.stream().mapToLong(pool -> after.get(pool).getUsed()).sum();
        if (isGenerationalGc) {
            final long delta = longLivedAfter - longLivedBefore;
            if (delta > 0L) {
                promotedBytes.increment(delta);
            }
        }

        // Some GC implementations such as G1 can reduce the old gen size as part of a minor GC. To track the
        // live data size we record the value if we see a reduction in the long-lived heap size or
        // after a major/non-generational GC.
        if (longLivedAfter < longLivedBefore || shouldUpdateDataSizeMetrics(notificationInfo.getGcName())) {
            liveDataSize.set(longLivedAfter);
            maxDataSize.set(longLivedPoolNames.stream().mapToLong(pool -> after.get(pool).getMax()).sum());
        }
    }

    // Allocated bytes since the previous GC = allocation-pool usage before this GC
    // minus the allocation-pool usage left after the previous GC.
    private void countPoolSizeDelta(Map<String, MemoryUsage> before, Map<String, MemoryUsage> after) {
        if (allocationPoolName == null) {
            return;
        }
        final long beforeBytes = before.get(allocationPoolName).getUsed();
        final long afterBytes = after.get(allocationPoolName).getUsed();
        final long delta = beforeBytes - allocationPoolSizeAfter.get();
        allocationPoolSizeAfter.set(afterBytes);
        if (delta > 0L) {
            allocatedBytes.increment(delta);
        }
    }

    private boolean shouldUpdateDataSizeMetrics(String gcName) {
        return nonGenerationalGcShouldUpdateDataSize(gcName) || isMajorGenerationalGc(gcName);
    }

    private boolean isMajorGenerationalGc(String gcName) {
        return GcGenerationAge.fromGcName(gcName) == GcGenerationAge.OLD;
    }

    private boolean nonGenerationalGcShouldUpdateDataSize(String gcName) {
        return !isGenerationalGc
                // Skip Shenandoah and ZGC gc notifications with the name Pauses due to missing memory pool size info
                && !gcName.endsWith("Pauses");
    }
}
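The byte accounting in countPoolSizeDelta and in the promotedBytes branch is easy to misread, so here is a minimal stand-alone sketch of the same arithmetic. It assumes a generational collector with one allocation (young) pool and one long-lived (old) pool. The class GcBytesAccounting, its methods and the pool names "eden"/"old" are hypothetical; only the formulas mirror the listener above.

```java
import java.util.Map;

/**
 * Hypothetical stand-alone restatement of the byte accounting in GcMetricsNotificationListener.
 * Assumes a generational GC with a single allocation pool and a single long-lived pool.
 */
final class GcBytesAccounting {
    private long allocationPoolSizeAfter = 0L; // allocation-pool usage left by the previous GC
    private long allocatedBytes = 0L;
    private long promotedBytes = 0L;

    /** Called once per GC event with pool usages (in bytes) before and after that GC. */
    void onGc(Map<String, Long> before, Map<String, Long> after,
              String allocationPool, String longLivedPool) {
        // Allocated since previous GC = usage just before this GC - usage left after the previous GC.
        long allocDelta = before.get(allocationPool) - allocationPoolSizeAfter;
        allocationPoolSizeAfter = after.get(allocationPool);
        if (allocDelta > 0L) {
            allocatedBytes += allocDelta;
        }
        // Promoted = growth of the long-lived pool across this GC.
        long promoted = after.get(longLivedPool) - before.get(longLivedPool);
        if (promoted > 0L) {
            promotedBytes += promoted;
        }
    }

    long allocatedBytes() { return allocatedBytes; }

    long promotedBytes() { return promotedBytes; }
}
```

Feeding it two GCs where eden holds 50 MB and then 60 MB just before collection (and 0 after each), while the old pool grows from 0 to 5 MB and then from 5 to 8 MB, gives 110 MB allocated and 8 MB promoted.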

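JvmGcMetrics gets its GC data from the GarbageCollectorImpl class. GarbageCollectorImpl extends MemoryManagerImpl, which in turn extends NotificationEmitterSupport so that it can send GC event notifications. The relevant JDK sources are linked at the end of this post.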
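To see this notification mechanism on its own, outside micrometer, the sketch below casts each platform GarbageCollectorMXBean to NotificationEmitter and registers a listener filtered on GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION, the same pattern JvmGcMetrics relies on. It assumes a HotSpot-based JDK, where com.sun.management.GarbageCollectionNotificationInfo is available; GcNotificationSubscriber and subscribe are hypothetical names.

```java
import com.sun.management.GarbageCollectionNotificationInfo;
import com.sun.management.GcInfo;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.function.Consumer;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;
import javax.management.openmbean.CompositeData;

/** Minimal sketch: subscribe to the GC notifications emitted by the platform GC beans. */
final class GcNotificationSubscriber {

    /** Attaches a listener to every GC bean that is a NotificationEmitter; returns how many were attached. */
    static int subscribe(Consumer<String> sink) {
        NotificationListener listener = (notification, handback) -> {
            CompositeData cd = (CompositeData) notification.getUserData();
            GarbageCollectionNotificationInfo info = GarbageCollectionNotificationInfo.from(cd);
            GcInfo gcInfo = info.getGcInfo();
            // e.g. collector name, action ("end of major GC" / "end of minor GC"), cause, duration
            sink.accept(info.getGcName() + " | " + info.getGcAction() + " | "
                    + info.getGcCause() + " | " + gcInfo.getDuration() + " ms");
        };
        int attached = 0;
        for (GarbageCollectorMXBean gcBean : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (gcBean instanceof NotificationEmitter) {
                // Only let GC notifications through to the listener.
                ((NotificationEmitter) gcBean).addNotificationListener(
                        listener,
                        n -> GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION.equals(n.getType()),
                        null);
                attached++;
            }
        }
        return attached;
    }
}
```

Calling GcNotificationSubscriber.subscribe(System.out::println) and then System.gc() (unless explicit GC is disabled) should print one line per collection. Notifications are delivered asynchronously by a JVM-internal thread after the collection ends, which is also why an in-process listener like this, or JvmGcMetrics, can only report a Full GC once the pause is over.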

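Similarly, micrometer's JvmHeapPressureMetrics can be used to monitor pressure on the JVM heap. It tracks the share of the long-lived heap still in use after GC and the share of time spent in GC, and together these two ratios indicate how much pressure the heap is under. The source is at micrometer/JvmHeapPressureMetrics.java at main · micrometer-metrics/micrometer · GitHub. The idea is similar to TeamCity Memory Monitor | TeamCity On-Premises.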
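A rough approximation of those two signals can be built from the standard MXBeans alone. This is a simplification, not micrometer's implementation: it polls instead of listening for notifications, and it treats heap pools whose names contain "Old Gen" or "Tenured Gen" as long-lived. That naming rule is an assumption that may not match every collector, in which case usageAfterGc() returns NaN. HeapPressureSketch and its methods are hypothetical names.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

/** Hypothetical sketch of two heap-pressure signals in the spirit of JvmHeapPressureMetrics. */
final class HeapPressureSketch {
    private long lastGcTimeMs = totalGcTimeMs();
    private long lastUptimeMs = ManagementFactory.getRuntimeMXBean().getUptime();

    /** Fraction of long-lived pool capacity still used after the last GC, or NaN if unknown. */
    static double usageAfterGc() {
        long used = 0L;
        long max = 0L;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            // Assumption: long-lived pools are recognised by name.
            if (pool.getType() != MemoryType.HEAP
                    || !(name.contains("Old Gen") || name.contains("Tenured Gen"))) {
                continue;
            }
            MemoryUsage afterGc = pool.getCollectionUsage(); // null if the pool does not support it
            if (afterGc == null || afterGc.getMax() <= 0) {
                continue; // max is -1 when undefined
            }
            used += afterGc.getUsed();
            max += afterGc.getMax();
        }
        return max > 0 ? (double) used / max : Double.NaN;
    }

    /** Share of wall-clock time spent in GC since the previous call (first call: since construction). */
    double gcOverheadSinceLastCall() {
        long gcTime = totalGcTimeMs();
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        long elapsed = uptime - lastUptimeMs;
        double overhead = elapsed > 0 ? (double) (gcTime - lastGcTimeMs) / elapsed : 0.0;
        lastGcTimeMs = gcTime;
        lastUptimeMs = uptime;
        // Collection times of several collectors may overlap, so clamp the approximate ratio to [0, 1].
        return Math.max(0.0, Math.min(1.0, overhead));
    }

    private static long totalGcTimeMs() {
        long total = 0L;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if undefined for this collector
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }
}
```

Calling gcOverheadSinceLastCall() periodically, for example once per scrape, gives the GC-time share for each interval. That is closer to micrometer's lookback-window approach than an average since JVM startup.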

Reference

Garbagedog: How eero does continuous monitoring of Java garbage collection
garbagedog
jdk/GarbageCollectorImpl.java at jdk8-b120 · openjdk/jdk · GitHub
jdk/MemoryManagerImpl.java at jdk8-b120 · openjdk/jdk · GitHub
jdk/NotificationEmitterSupport.java at jdk8-b120 · openjdk/jdk · GitHub