Poison

NoClassDefFoundError

Most developers with some experience have run into NoClassDefFoundError at some point, for example the following exception:

java.lang.NoClassDefFoundError: org/apache/commons/lang/exception/NestableRuntimeException
at java.lang.ClassLoader.defineClass1(Native Method) ~[?:1.8.0_211]
at java.lang.ClassLoader.defineClass(ClassLoader.java:763) ~[?:1.8.0_211]
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) ~[?:1.8.0_211]
at java.net.URLClassLoader.defineClass(URLClassLoader.java:468) ~[?:1.8.0_211]
at java.net.URLClassLoader.access$100(URLClassLoader.java:74) ~[?:1.8.0_211]
at java.net.URLClassLoader$1.run(URLClassLoader.java:369) ~[?:1.8.0_211]
at java.net.URLClassLoader$1.run(URLClassLoader.java:363) ~[?:1.8.0_211]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_211]
at java.net.URLClassLoader.findClass(URLClassLoader.java:362) ~[?:1.8.0_211]
at org.apache.flink.util.ChildFirstClassLoader.loadClassWithoutExceptionHandling(ChildFirstClassLoader.java:71) ~[flink-dist_2.11-1.12.1.jar:1.12.1]
at org.apache.flink.util.FlinkUserCodeClassLoader.loadClass(FlinkUserCodeClassLoader.java:48) ~[flink-dist_2.11-1.12.1.jar:1.12.1]
at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:1.8.0_211]
at com.aliyun.openservices.log.flink.util.LogClientProxy.<init>(LogClientProxy.java:26) ~[?:?]
at com.aliyun.openservices.log.flink.FlinkLogConsumer.createClient(FlinkLogConsumer.java:65) ~[?:?]
at com.aliyun.openservices.log.flink.FlinkLogConsumer.run(FlinkLogConsumer.java:71) ~[?:?]
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:110) ~[flink-dist_2.11-1.12.1.jar:1.12.1]
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:66) ~[flink-dist_2.11-1.12.1.jar:1.12.1]
at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:241) ~[flink-dist_2.11-1.12.1.jar:1.12.1]
Caused by: java.lang.ClassNotFoundException: org.apache.commons.lang.exception.NestableRuntimeException
at java.net.URLClassLoader.findClass(URLClassLoader.java:382) ~[?:1.8.0_211]
at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[?:1.8.0_211]
at org.apache.flink.util.FlinkUserCodeClassLoader.loadClassWithoutExceptionHandling(FlinkUserCodeClassLoader.java:64) ~[flink-dist_2.11-1.12.1.jar:1.12.1]
at org.apache.flink.util.ChildFirstClassLoader.loadClassWithoutExceptionHandling(ChildFirstClassLoader.java:74) ~[flink-dist_2.11-1.12.1.jar:1.12.1]
at org.apache.flink.util.FlinkUserCodeClassLoader.loadClass(FlinkUserCodeClassLoader.java:48) ~[flink-dist_2.11-1.12.1.jar:1.12.1]
at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:1.8.0_211]
... 18 more
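
As a point of comparison for the trace above, here is a minimal sketch of the explicit-loading case. When code asks for a class by name, a missing class is reported as the checked ClassNotFoundException. When the JVM itself needs the class while loading or linking another class, as happened under LogClientProxy.<init> above, the same failure surfaces as a NoClassDefFoundError whose cause is the ClassNotFoundException. The class name org.example.DoesNotExist is hypothetical and assumed to be absent from the classpath.

```java
public class MissingClassDemo {

    // Tries to load a class by name and reports what happened.
    public static String loadByName(String className) {
        try {
            Class.forName(className);
            return "loaded";
        } catch (ClassNotFoundException e) {
            // Explicit loading reports a missing class as a checked exception.
            return e.getClass().getName();
        }
    }

    public static void main(String[] args) {
        // Hypothetical class name, assumed not to exist on the classpath.
        System.out.println(loadByName("org.example.DoesNotExist"));
        // A class that is always present, for contrast.
        System.out.println(loadByName("java.lang.String"));
    }
}
```

Because ClassNotFoundException is an Exception, a catch (Exception e) handler sees it. The NoClassDefFoundError form is an Error and slips past such a handler.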

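ClassNotFoundException is the most common cause of NoClassDefFoundError. It typically shows up when the version of a dependency available at runtime differs from the version the application was built against. The problem is very common, but few people stop to ask why a ClassNotFoundException gets converted into a NoClassDefFoundError. While troubleshooting a production issue, I looked up the Detailed Initialization Procedure for classes in the JLS. The core part is excerpted here: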

1. Synchronize on the initialization lock, LC, for C. This involves waiting until the current thread can acquire LC.

2. If the Class object for C indicates that initialization is in progress for C by some other thread, then release LC and block the current thread until informed that the in-progress initialization has completed, at which time repeat this step.

3. If the Class object for C indicates that initialization is in progress for C by the current thread, then this must be a recursive request for initialization. Release LC and complete normally.

4. If the Class object for C indicates that C has already been initialized, then no further action is required. Release LC and complete normally.

5. If the Class object for C is in an erroneous state, then initialization is not possible. Release LC and throw a NoClassDefFoundError.

6. Otherwise, record the fact that initialization of the Class object for C is in progress by the current thread, and release LC.

Then, initialize the final class variables and fields of interfaces whose values are compile-time constant expressions (§8.3.2.1, §9.3.1, §13.4.9, §15.28).

7. Next, if C is a class rather than an interface, and its superclass SC has not yet been initialized, then recursively perform this entire procedure for SC. If necessary, verify and prepare SC first. If the initialization of SC completes abruptly because of a thrown exception, then acquire LC, label the Class object for C as erroneous, notify all waiting threads, release LC, and complete abruptly, throwing the same exception that resulted from initializing SC.

8. Next, determine whether assertions are enabled (§14.10) for C by querying its defining class loader.

9. Next, execute either the class variable initializers and static initializers of the class, or the field initializers of the interface, in textual order, as though they were a single block.

10. If the execution of the initializers completes normally, then acquire LC, label the Class object for C as fully initialized, notify all waiting threads, release LC, and complete this procedure normally.

11. Otherwise, the initializers must have completed abruptly by throwing some exception E. If the class of E is not Error or one of its subclasses, then create a new instance of the class ExceptionInInitializerError, with E as the argument, and use this object in place of E in the following step. But if a new instance of ExceptionInInitializerError cannot be created because an OutOfMemoryError occurs, then instead use an OutOfMemoryError object in place of E in the following step.

12. Acquire LC, label the Class object for C as erroneous, notify all waiting threads, release LC, and complete this procedure abruptly with reason E or its replacement as determined in the previous step.
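
The interplay of steps 5, 11 and 12 can be sketched with two small classes. This is an illustrative sketch, not code from the incident: FailsWithException and FailsWithError are hypothetical names, and the nested-class layout is only there to keep the example in one file.

```java
public class InitStepsDemo {

    // Hypothetical class whose static initializer throws a RuntimeException.
    static class FailsWithException {
        static {
            if (true) {
                throw new IllegalStateException("exception in static initializer");
            }
        }

        static void touch() {
        }
    }

    // Hypothetical class whose static initializer throws an Error subclass.
    static class FailsWithError {
        static {
            if (true) {
                throw new AssertionError("error in static initializer");
            }
        }

        static void touch() {
        }
    }

    // Runs the action and returns the class name of whatever it throws.
    private static String thrownBy(Runnable action) {
        try {
            action.run();
            return "nothing";
        } catch (Throwable t) {
            return t.getClass().getName();
        }
    }

    public static String touchExceptionCase() {
        return thrownBy(() -> FailsWithException.touch());
    }

    public static String touchErrorCase() {
        return thrownBy(() -> FailsWithError.touch());
    }

    public static void main(String[] args) {
        System.out.println(touchExceptionCase()); // first access: steps 11 and 12
        System.out.println(touchExceptionCase()); // later access: step 5
        System.out.println(touchErrorCase());     // first access: an Error is not wrapped
        System.out.println(touchErrorCase());     // later access: step 5
    }
}
```

Following the procedure, the first access to FailsWithException should report java.lang.ExceptionInInitializerError and the first access to FailsWithError should report java.lang.AssertionError unchanged. Every later access to either class should report java.lang.NoClassDefFoundError, because the Class object has been labeled erroneous.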

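Why did I end up reading the detailed class initialization procedure? That goes back to another question: should we catch Throwable? According to the Java documentation, Throwable is the superclass of all errors and exceptions. An Error generally represents an unrecoverable problem, while an Exception generally represents a condition the application can recover from. Articles such as Is It a Bad Practice to Catch Throwable? advise against catching and handling Throwable, because it includes unrecoverable Errors such as OutOfMemoryError. So the exception handling code I usually wrote looked like this: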

try {
    // Business logic
} catch (Exception e) {
    // Log ...
}

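That held until a business team reported that a piece of business logic, implemented as a message queue consumer, had not taken effect. I searched the logs for the time window they reported and found no exception logs at all, yet the consumption status of the message was recorded as failed. After repeated investigation, the cause turned out to be a NoClassDefFoundError. The exception logging in that code path was implemented with catch (Exception e), so the NoClassDefFoundError was never caught and therefore never logged.

But why was there a NoClassDefFoundError in the first place? The class in question was written by the business team itself, so a dependency version mismatch producing a ClassNotFoundException, and in turn a NoClassDefFoundError, was ruled out. Further digging showed that an exception had been thrown inside a static block. To find out why an exception in a static block leads to NoClassDefFoundError, I turned to the JLS initialization procedure quoted above: a runtime exception in a static block takes the path 11 -> 12 -> 5. Sample code: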

package me.tianshuang;

public class LogisticsUtil {

    static {
        if (true) {
            throw new RuntimeException();
        }
    }

    public static void print() {
        System.out.println("This is print method");
    }

}

package me.tianshuang;

public class Test {

    public static void main(String[] args) {
        LogisticsUtil.print();
    }

}

Here, Test.main() throws ExceptionInInitializerError, which corresponds to step 11 above:

Exception in thread "main" java.lang.ExceptionInInitializerError
at me.tianshuang.Test.main(Test.java:12)
Caused by: java.lang.RuntimeException
at me.tianshuang.LogisticsUtil.<clinit>(LogisticsUtil.java:7)
... 1 more
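
ExceptionInInitializerError is an Error, not an Exception, which is why a handler written as catch (Exception e) never sees it. The following sketch illustrates this under stated assumptions: BrokenUtil and the two consume methods are hypothetical stand-ins for the business class and the message consumer, not the actual production code.

```java
public class CatchScopeDemo {

    // Hypothetical helper whose static initializer always fails.
    static class BrokenUtil {
        static {
            if (true) {
                throw new RuntimeException("static init failed");
            }
        }

        static void handle(String message) {
            System.out.println("handled " + message);
        }
    }

    // Mirrors the consumer in the story: only Exception is caught and logged.
    public static String consumeCatchingException(String message) {
        try {
            BrokenUtil.handle(message);
            return "ok";
        } catch (Exception e) {
            return "logged: " + e.getClass().getName();
        }
    }

    // Same call, with the catch clause widened to Throwable for logging.
    public static String consumeCatchingThrowable(String message) {
        try {
            BrokenUtil.handle(message);
            return "ok";
        } catch (Throwable t) {
            return "logged: " + t.getClass().getName();
        }
    }

    public static void main(String[] args) {
        try {
            System.out.println(consumeCatchingException("m1"));
        } catch (Error e) {
            // Reached only because this demo adds an outer handler; the
            // consumer's own catch (Exception e) never saw the Error.
            System.out.println("escaped catch (Exception): " + e.getClass().getName());
        }
        System.out.println(consumeCatchingThrowable("m2"));
    }
}
```

In the first call the Error passes straight through the consumer's handler, so nothing is logged there. The second call does log, but by then the class is already in the erroneous state.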

If we swallow that Error without handling it and then call LogisticsUtil.print again, step 5 is triggered:

package me.tianshuang;

public class Test {

    public static void main(String[] args) {
        try {
            LogisticsUtil.print();
        } catch (Error error) {
            // ignore
        }

        LogisticsUtil.print();
    }

}

The stack trace is as follows:

Exception in thread "main" java.lang.NoClassDefFoundError: Could not initialize class me.tianshuang.LogisticsUtil
at me.tianshuang.Test.main(Test.java:12)
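
Note that this trace has no Caused by section. A small sketch, assuming a hypothetical FailingUtil class, shows where the original cause can and cannot be found:

```java
public class LostCauseDemo {

    // Hypothetical class whose static initializer fails with a descriptive message.
    static class FailingUtil {
        static {
            if (true) {
                throw new RuntimeException("config missing");
            }
        }

        static void print() {
        }
    }

    // Calls into FailingUtil and returns whatever was thrown, or null.
    public static Throwable access() {
        try {
            FailingUtil.print();
            return null;
        } catch (Throwable t) {
            return t;
        }
    }

    public static void main(String[] args) {
        Throwable first = access();
        Throwable second = access();
        // The first failure wraps the original exception as its cause.
        System.out.println(first.getClass().getName() + ", cause: " + first.getCause());
        // Later failures only say that the class could not be initialized.
        System.out.println(second.getClass().getName() + ", message: " + second.getMessage());
    }
}
```

On a Java 8 runtime like the one in the traces above, I would expect only the first error to carry the original RuntimeException, so if that first occurrence goes unlogged, later NoClassDefFoundError entries give little hint of the real reason. Newer JDKs may attach more detail to the later error.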

Which brings us back to where we started: should we catch Throwable?
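
One possible middle ground, offered as a sketch and not as a settled answer, is to catch Error at the consumer boundary only to log it and then rethrow it, so that nothing disappears silently while JVM-level failures are still not swallowed. The Handler interface, the consume method and the in-memory LOG list are all hypothetical stand-ins for a real consumer and logger.

```java
import java.util.ArrayList;
import java.util.List;

public class ConsumerBoundaryDemo {

    // Stand-in for a real logger; collects lines so the demo is self-contained.
    public static final List<String> LOG = new ArrayList<>();

    // Hypothetical message handler interface.
    public interface Handler {
        void handle(String message) throws Exception;
    }

    // Logs every failure, swallows Exceptions, rethrows Errors.
    public static boolean consume(String message, Handler handler) {
        try {
            handler.handle(message);
            return true;
        } catch (Exception e) {
            LOG.add("message " + message + " failed: " + e);
            return false;
        } catch (Error e) {
            LOG.add("message " + message + " failed with Error: " + e);
            throw e;
        }
    }

    public static void main(String[] args) {
        consume("m1", m -> {
            throw new IllegalArgumentException("bad payload");
        });
        try {
            consume("m2", m -> {
                throw new NoClassDefFoundError("Could not initialize class Demo");
            });
        } catch (Error e) {
            // In a real consumer the framework above would receive this Error.
        }
        LOG.forEach(System.out::println);
    }
}
```

Whether rethrowing is the right policy depends on the framework above the consumer. The point of the sketch is only that logging and swallowing are separate decisions.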

References:
Why catch Exceptions in Java, when you can catch Throwables?
Catching java.lang.OutOfMemoryError?
When Does Java Throw the ExceptionInInitializerError?
Why NoClassDefFoundError caused by static field initialization failure?