Poison

GC Timestamp

今天查了个 GC 相关的问题,问题并不复杂,只是在比对 GC 日志中的时间与业务超时调用的时间时发现有一点偏差,原来是因为 GC 日志中的时间戳是 GC 开始前的时间戳,而我之前一直以为是 GC 完成后的时间。比如如下 GC 日志:

1
2
2022-12-16T11:10:04.939+0800: 87938160.092: [GC (Allocation Failure) 2022-12-16T11:10:04.939+0800: 87938160.092: [ParNew (promotion failed): 1850515K->1850515K(1850624K), 0.6136336 secs]2022-12-16T11:10:05.552+0800: 87938160.706: [CMS2022-12-16T11:10:06.272+0800: 87938161.426: [CMS-concurrent-mark: 1.702/2.321 secs] [Times: user=0.00 sys=0.00, real=2.32 secs] 
(concurrent mode failure): 8844971K->9005083K(9697280K), 7.7882370 secs] 10643921K->9005083K(11547904K), [Metaspace: 99728K->99728K(1140736K)], 8.4022234 secs] [Times: user=0.00 sys=0.00, real=8.40 secs]

起初我以为 2022-12-16T11:10:04.939 是本次 GC 结束的时间,而实际上这个时间戳表示的是 GC 开始的时间,可以参考源码 jdk8u/genCollectedHeap.cpp at jdk8u362-b06:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
void GenCollectedHeap::do_collection(bool  full,
bool clear_all_soft_refs,
size_t size,
bool is_tlab,
int max_level) {
// ...

{
FlagSetting fl(_is_gc_active, true);

bool complete = full && (max_level == (n_gens()-1));
const char* gc_cause_prefix = complete ? "Full GC" : "GC";
TraceCPUTime tcpu(PrintGCDetails, true, gclog_or_tty);
// The PrintGCDetails logging starts before we have incremented the GC id. We will do that later
// so we can assume here that the next GC id is what we want.
GCTraceTime t(GCCauseString(gc_cause_prefix, gc_cause()), PrintGCDetails, false, NULL, GCId::peek());

// collection ...
}

// ...
}

其中第 16 行触发了时间戳的打印,这部分的源码位于 jdk8u/gcTraceTime.cpp at jdk8u362-b06:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
GCTraceTime::GCTraceTime(const char* title, bool doit, bool print_cr, GCTimer* timer, GCId gc_id) :
_title(title), _doit(doit), _print_cr(print_cr), _timer(timer), _start_counter() {
if (_doit || _timer != NULL) {
_start_counter.stamp();
}

if (_timer != NULL) {
assert(SafepointSynchronize::is_at_safepoint(), "Tracing currently only supported at safepoints");
assert(Thread::current()->is_VM_thread(), "Tracing currently only supported from the VM thread");

_timer->register_gc_phase_start(title, _start_counter);
}

if (_doit) {
gclog_or_tty->date_stamp(PrintGCDateStamps);
gclog_or_tty->stamp(PrintGCTimeStamps);
if (PrintGCID) {
gclog_or_tty->print("#%u: ", gc_id.id());
}
gclog_or_tty->print("[%s", title);
gclog_or_tty->flush();
}
}

即在实际的 GC 事件之前进行的打印,从 HBase 自身 JvmPauseMonitor 打印的日志也可以应证这一点:

1
2
3
2022-12-16 11:10:13,347 INFO  [JvmPauseMonitor] util.JvmPauseMonitor(178): Detected pause in JVM or host machine (eg GC): pause of approximately 8143ms
GC pool 'ParNew' had collection(s): count=1 time=613ms
GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=7788ms
Reference

Are the timestamps of verbose JVM garbage collection logs the beginning or end of the collection? - Stack Overflow
hbase/JvmPauseMonitor.java at master · apache/hbase · GitHub
HBase应用架构 - 图书 - 豆瓣
Garbage Collection in Java (3) - Concurrent Mark Sweep