Poison

OmitStackTraceInFastThrow

今天有一台服务器频繁打印 java.lang.NullPointerException,但是不包含栈帧信息,起初我以为是应用层代码使用 log4j2 打印时直接调用了 log.error(e) 进行打印,因为该方法最终会调用异常对象的 toString() 方法然后打印出异常信息,对于 java.lang.NullPointerException 实例,最终调用的 java.lang.Throwable#toString 获取到需要打印的字符串:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
/**
* Returns a short description of this throwable.
* The result is the concatenation of:
* <ul>
* <li> the {@linkplain Class#getName() name} of the class of this object
* <li> ": " (a colon and a space)
* <li> the result of invoking this object's {@link #getLocalizedMessage}
* method
* </ul>
* If {@code getLocalizedMessage} returns {@code null}, then just
* the class name is returned.
*
* @return a string representation of this throwable.
*/
public String toString() {
String s = getClass().getName();
String message = getLocalizedMessage();
return (message != null) ? (s + ": " + message) : s;
}

即会打印出不含栈帧的异常信息,如:java.lang.NullPointerException。一般来说,我们尽量避免调用 log.error(e) 而是调用 log.error("some message", e) 以使打印出的字符串含有栈帧。

但是在今天这个场景下,经过我的检查,确认 java.lang.NullPointerException 不是由 log.error(e) 打印出,那是哪里打印出的呢?且因为没有栈帧信息,导致难以定位异常根源,只能看到 java.lang.NullPointerException 这样简短的信息,经过一阵排查,最后确认是 JVM 优化导致,即对于部分异常类型,如果抛出达到一定的次数,JVM 将会使用预构建的异常实例代替实际需要抛出的异常实例进行抛出,以提升性能,但是预构建的异常实例是不含栈帧信息的,这就导致打印出来的异常信息不含栈帧了,增大了问题排查的难度。关于该项优化,在 Release Notes 中有以下说明:

The compiler in the server VM now provides correct stack backtraces for all “cold” built-in exceptions. For performance purposes, when such an exception is thrown a few times, the method may be recompiled. After recompilation, the compiler may choose a faster tactic using preallocated exceptions that do not provide a stack trace. To disable completely the use of preallocated exceptions, use this new flag: -XX:-OmitStackTraceInFastThrow.

JVM 中关于这部分优化的源码位于 graphKit.cpp at jdk8-b120

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
// If this throw happens frequently, an uncommon trap might cause
// a performance pothole. If there is a local exception handler,
// and if this particular bytecode appears to be deoptimizing often,
// let us handle the throw inline, with a preconstructed instance.
// Note: If the deopt count has blown up, the uncommon trap
// runtime is going to flush this nmethod, not matter what.
if (treat_throw_as_hot
&& (!StackTraceInThrowable || OmitStackTraceInFastThrow)) {
// If the throw is local, we use a pre-existing instance and
// punt on the backtrace. This would lead to a missing backtrace
// (a repeat of 4292742) if the backtrace object is ever asked
// for its backtrace.
// Fixing this remaining case of 4292742 requires some flavor of
// escape analysis. Leave that for the future.
ciInstance* ex_obj = NULL;
switch (reason) {
case Deoptimization::Reason_null_check:
ex_obj = env()->NullPointerException_instance();
break;
case Deoptimization::Reason_div0_check:
ex_obj = env()->ArithmeticException_instance();
break;
case Deoptimization::Reason_range_check:
ex_obj = env()->ArrayIndexOutOfBoundsException_instance();
break;
case Deoptimization::Reason_class_check:
if (java_bc() == Bytecodes::_aastore) {
ex_obj = env()->ArrayStoreException_instance();
} else {
ex_obj = env()->ClassCastException_instance();
}
break;
}
}

可以编写简短的代码来复现该项优化,比如如下代码:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
import java.util.concurrent.ThreadLocalRandom;

public class Test {

public static void main(String[] args) {
while (true) {
try {
Object object = null;
if (ThreadLocalRandom.current().nextInt(2) == 0) {
object = new Object();
}
object.hashCode();
} catch (Exception e) {
e.printStackTrace();
}
}
}

}

我们使用命令行编译并执行以上代码可以观察到部分输出如下:

1
2
3
4
5
6
7
java.lang.NullPointerException
at Test.main(Test.java:12)
java.lang.NullPointerException
at Test.main(Test.java:12)
java.lang.NullPointerException
java.lang.NullPointerException
java.lang.NullPointerException

即打印一定次数后打印出的数据就不含栈帧信息了,我们微调上面的代码,改为打印异常对象的 System.identityHashCode(e)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
import java.util.concurrent.ThreadLocalRandom;

public class Test {

public static void main(String[] args) {
while (true) {
try {
Object object = null;
if (ThreadLocalRandom.current().nextInt(2) == 0) {
object = new Object();
}
object.hashCode();
} catch (Exception e) {
System.err.println(System.identityHashCode(e));
}
}
}

}

观察到部分输出如下:

1
2
3
4
5
6
7
8
1597422095
101832807
1593540482
2102549317
2102549317
2102549317
2102549317
2102549317

即验证了上述的代码,执行到一定次数后,JVM 会将异常替换为预构建的异常,导致打印出的异常信息中不含栈帧,确认该问题原因后,我在 JVM 配置选项中加上了 -XX:-OmitStackTraceInFastThrow 以避免 JVM 执行此项优化。

Reference

What Is JVM Option “-XX:-OmitStackTraceInFastThrow” ? | YoshiTech Blog
java - How is the count of method executions that triggers the omitted exception stack frames calculated in JDK 8? - Stack Overflow
jdk/advancedThresholdPolicy.hpp at jdk8-b120 · openjdk/jdk · GitHub
java - How does the JVM decided to JIT-compile a method (categorize a method as “hot”)? - Stack Overflow
Tiered Compilation in JVM | Baeldung