Relocating Classes

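I first ran into class relocation while writing a Spark application. The application needed a newer version of Guava, while Spark bundles an older one. Under the JVM's default class-loading order (parent-first delegation), the Guava bundled with Spark takes precedence over the version the application depends on, so the application failed with a method-not-found error (typically surfacing as NoSuchMethodError). I worked around it with Relocating Classes: renaming the affected packages so that a single application no longer has to load two versions of a class under the same name.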
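When diagnosing this kind of conflict, it can help to check where a class was actually loaded from. The snippet below is a minimal sketch of that check, not something taken from Spark; the WhichJar class and its locationOf helper are hypothetical names introduced here for illustration. In a real job you would pass in a class from the conflicting library (for Guava, for example, com.google.common.base.Preconditions).

```java
import java.security.CodeSource;

// Hypothetical helper (not part of Spark or Guava): reports where a class came from.
class WhichJar {

  // Returns the jar or directory a class was loaded from, or a marker string
  // when the class has no code source (e.g. JDK classes from the bootstrap loader).
  static String locationOf(Class<?> cls) {
    CodeSource src = cls.getProtectionDomain().getCodeSource();
    return src == null ? "bootstrap/unknown" : String.valueOf(src.getLocation());
  }

  public static void main(String[] args) {
    // A JDK class has no code source.
    System.out.println(locationOf(String.class));
    // An application class reports the jar or directory it was loaded from.
    System.out.println(locationOf(WhichJar.class));
  }
}
```

If the location printed for the library class points into the Spark distribution rather than the application jar, the bundled version won the lookup.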

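Recently I hit a similar problem while writing a Flink application. The official Flink documentation offers several solutions, including the maven-shade-plugin class relocation mentioned above as well as Flink's inverted (child-first) class-loading mechanism; see the document listed under Reference at the end of this post.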

2021-12-28

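While reading the Spark source today, I found options that make Spark load user dependencies first. They can be set for both the Driver and the Executor; the relevant configuration properties are: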

Property Name	Default	Meaning	Since Version
spark.driver.userClassPathFirst	false	(Experimental) Whether to give user-added jars precedence over Spark’s own jars when loading classes in the driver. This feature can be used to mitigate conflicts between Spark’s dependencies and user dependencies. It is currently an experimental feature. This is used in cluster mode only.	1.3.0
spark.executor.userClassPathFirst	false	(Experimental) Same functionality as spark.driver.userClassPathFirst, but applied to executor instances.	1.3.0

The Driver-side class loader is created in DriverWrapper.scala at v3.2.0:

val currentLoader = Thread.currentThread.getContextClassLoader
val userJarUrl = new File(userJar).toURI().toURL()
val loader =
  if (sys.props.getOrElse(config.DRIVER_USER_CLASS_PATH_FIRST.key, "false").toBoolean) {
    new ChildFirstURLClassLoader(Array(userJarUrl), currentLoader)
  } else {
    new MutableURLClassLoader(Array(userJarUrl), currentLoader)
  }

The Executor-side class loader is created in Executor.scala at v3.2.0:

/**
 * Create a ClassLoader for use in tasks, adding any JARs specified by the user or any classes
 * created by the interpreter to the search path
 */
private def createClassLoader(): MutableURLClassLoader = {
  // Bootstrap the list of jars with the user class path.
  val now = System.currentTimeMillis()
  userClassPath.foreach { url =>
    currentJars(url.getPath().split("/").last) = now
  }

  val currentLoader = Utils.getContextOrSparkClassLoader

  // For each of the jars in the jarSet, add them to the class loader.
  // We assume each of the files has already been fetched.
  val urls = userClassPath.toArray ++ currentJars.keySet.map { uri =>
    new File(uri.split("/").last).toURI.toURL
  }
  if (userClassPathFirst) {
    new ChildFirstURLClassLoader(urls, currentLoader)
  } else {
    new MutableURLClassLoader(urls, currentLoader)
  }
}

Both work the same way: they create the appropriate class loader based on the configuration. The key class, ChildFirstURLClassLoader, is defined in ChildFirstURLClassLoader.java at v3.2.0:

/**
 * A mutable class loader that gives preference to its own URLs over the parent class loader
 * when loading classes and resources.
 */
public class ChildFirstURLClassLoader extends MutableURLClassLoader {

  static {
    ClassLoader.registerAsParallelCapable();
  }

  private ParentClassLoader parent;

  public ChildFirstURLClassLoader(URL[] urls, ClassLoader parent) {
    super(urls, null);
    this.parent = new ParentClassLoader(parent);
  }

  @Override
  public Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
    try {
      return super.loadClass(name, resolve);
    } catch (ClassNotFoundException cnf) {
      return parent.loadClass(name, resolve);
    }
  }

  @Override
  public Enumeration<URL> getResources(String name) throws IOException {
    ArrayList<URL> urls = Collections.list(super.getResources(name));
    urls.addAll(Collections.list(parent.getResources(name)));
    return Collections.enumeration(urls);
  }

  @Override
  public URL getResource(String name) {
    URL url = super.getResource(name);
    if (url != null) {
      return url;
    } else {
      return parent.getResource(name);
    }
  }
}

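As the source shows, the constructor calls the superclass constructor as super(urls, null), so the parent seen by the superclass is null. As a result, when loadClass calls super.loadClass, that call consults only the bootstrap class loader and the loader's own urls; only when it fails with ClassNotFoundException does loadClass fall back to the wrapped parent.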
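To see the same pattern without Spark's MutableURLClassLoader and ParentClassLoader helpers, here is a minimal standalone sketch built directly on java.net.URLClassLoader. It is an illustration under my own assumptions rather than Spark's implementation: the names ChildFirstDemo, ChildFirstLoader, and fallback are hypothetical, and the parallel-capable registration and resource lookups are left out.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical demo class; a simplified sketch of the child-first pattern.
class ChildFirstDemo {

  static class ChildFirstLoader extends URLClassLoader {
    // The "real" parent is kept aside and consulted only as a fallback.
    private final ClassLoader fallback;

    ChildFirstLoader(URL[] urls, ClassLoader fallback) {
      // Passing null as the parent means super.loadClass only looks at the
      // bootstrap class loader and this loader's own URLs.
      super(urls, null);
      this.fallback = fallback;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
      try {
        // First try: bootstrap classes plus our own URLs.
        return super.loadClass(name, resolve);
      } catch (ClassNotFoundException e) {
        // Second try: delegate to the fallback loader through its public API.
        return fallback.loadClass(name);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    ClassLoader app = ChildFirstDemo.class.getClassLoader();
    // No URLs of its own, so every non-JDK lookup falls through to the fallback.
    try (ChildFirstLoader loader = new ChildFirstLoader(new URL[0], app)) {
      Class<?> viaLoader = loader.loadClass(ChildFirstDemo.class.getName());
      System.out.println(viaLoader == ChildFirstDemo.class);
      System.out.println(loader.loadClass("java.lang.String") == String.class);
    }
  }
}
```

The fallback is called through the public loadClass(String) because the two-argument overload is protected on ClassLoader and cannot be invoked on another loader instance from here; that restriction appears to be why Spark wraps the parent in its own ParentClassLoader. If the URL array pointed at a jar containing a class that the fallback can also see, the copy from the jar would be the one returned.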

Reference

Debugging Classloading | Apache Flink