Poison



G1 GC

Posted on 2021-04-23

Oracle's official website covers G1 GC in Garbage-First Garbage Collector and Garbage-First Garbage Collector Tuning. The key points are as follows:

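The G1 collector is a server-style garbage collector, targeted for multiprocessor machines with large memories. It meets garbage collection (GC) pause-time goals with high probability while achieving high throughput. Whole-heap operations, such as global marking, are performed concurrently with the application threads. This prevents interruptions proportional to the heap or live-data size.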
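To check which collector a JVM is actually running, you can use the standard `java.lang.management` API, as in the minimal sketch below. It prints the collector names the JVM reports over JMX. The class name `GcInfo` is hypothetical. The `"G1 "` name prefix (for example "G1 Young Generation") is a HotSpot implementation detail, so treat `usingG1()` as a heuristic that may differ across JDK versions.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcInfo {
    /** Names of the garbage collectors the running JVM reports via JMX. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    /** Heuristic: HotSpot names its G1 beans with a "G1 " prefix, e.g. "G1 Young Generation". */
    static boolean usingG1() {
        for (String name : collectorNames()) {
            if (name.startsWith("G1 ")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(collectorNames());
        System.out.println("G1 active: " + usingG1());
    }
}
```

Run it with, for example, `java -XX:+UseG1GC -XX:MaxGCPauseMillis=200 GcInfo`. `-XX:MaxGCPauseMillis` sets the pause-time goal mentioned above; 200 ms is G1's default. G1 has been the default collector since JDK 9, so on JDK 8 it has to be enabled explicitly.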

Read more »

HashMap

Posted on 2021-04-18

The following analysis is based on HashMap in JDK 8. First, let's look at its constructors:

/**
 * Constructs an empty <tt>HashMap</tt> with the specified initial
 * capacity and load factor.
 *
 * @param  initialCapacity the initial capacity
 * @param  loadFactor      the load factor
 * @throws IllegalArgumentException if the initial capacity is negative
 *         or the load factor is nonpositive
 */
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

/**
 * Constructs an empty <tt>HashMap</tt> with the specified initial
 * capacity and the default load factor (0.75).
 *
 * @param  initialCapacity the initial capacity.
 * @throws IllegalArgumentException if the initial capacity is negative.
 */
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

/**
 * Constructs an empty <tt>HashMap</tt> with the default initial capacity
 * (16) and the default load factor (0.75).
 */
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

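As shown, the most common form, new HashMap(), only sets loadFactor to the default load factor of 0.75. The underlying array Node<K,V>[] table is not initialized at this point; resize() allocates it lazily on the first put.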

Why was 0.75 chosen as the default load factor? The HashMap JavaDoc addresses this explicitly:

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.

If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table. Note that using many keys with the same hashCode() is a sure way to slow down performance of any hash table. To ameliorate impact, when keys are Comparable, this class may use comparison order among keys to help break ties.
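The JavaDoc's sizing rule says that no rehash ever occurs if the initial capacity is greater than the expected number of entries divided by the load factor. That rule translates into a small helper, sketched below. `PresizedMap`, `initialCapacityFor` and `newMapFor` are hypothetical names, and the sketch assumes the default load factor of 0.75.

```java
import java.util.HashMap;
import java.util.Map;

public class PresizedMap {
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /**
     * Hypothetical helper: floor(expectedSize / loadFactor) + 1, i.e. a capacity strictly
     * greater than expectedSize / loadFactor, as the JavaDoc rule requires.
     * Java's float-to-int narrowing saturates at Integer.MAX_VALUE, so huge inputs cannot overflow.
     */
    static int initialCapacityFor(int expectedSize) {
        if (expectedSize < 0) {
            throw new IllegalArgumentException("expectedSize < 0: " + expectedSize);
        }
        return (int) (expectedSize / DEFAULT_LOAD_FACTOR + 1.0f);
    }

    /** Creates a HashMap that should hold expectedSize entries without resizing. */
    static <K, V> Map<K, V> newMapFor(int expectedSize) {
        return new HashMap<>(initialCapacityFor(expectedSize), DEFAULT_LOAD_FACTOR);
    }

    public static void main(String[] args) {
        Map<String, Integer> map = newMapFor(100); // requested capacity 134
        for (int i = 0; i < 100; i++) {
            map.put("key" + i, i);
        }
        System.out.println(map.size());
    }
}
```

The `+ 1` errs on the side of a slightly larger table, matching the JavaDoc's "greater than" wording. The constructor then rounds the requested capacity up to a power of two.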

Now look at the last line of the HashMap(int initialCapacity, float loadFactor) constructor, this.threshold = tableSizeFor(initialCapacity). The tableSizeFor method is implemented as follows:

/**
 * Returns a power of two size for the given target capacity.
 */
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
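To see what tableSizeFor returns, the standalone copy below can be run on its own. It defines `MAXIMUM_CAPACITY` as `1 << 30`, the JDK 8 value, and the wrapper class name is hypothetical. The shifts copy the highest set bit of `cap - 1` into every lower bit, which yields `2^k - 1`; adding 1 then gives the smallest power of two that is at least `cap`. In JDK 8 the constructor temporarily stores this value in `threshold`, and the first `resize()` uses it as the table capacity.

```java
public class TableSizeForDemo {
    // Same value as HashMap.MAXIMUM_CAPACITY in JDK 8.
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /** Copy of JDK 8 HashMap.tableSizeFor: smallest power of two >= cap, clamped to [1, 2^30]. */
    static int tableSizeFor(int cap) {
        int n = cap - 1;  // subtract 1 so that exact powers of two map to themselves
        n |= n >>> 1;     // copy the highest set bit into the bit below it...
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;    // ...now every bit below the highest set bit is 1: n == 2^k - 1
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        int[] caps = {0, 1, 10, 16, 17, 1000};
        for (int cap : caps) {
            // prints 0 -> 1, 1 -> 1, 10 -> 16, 16 -> 16, 17 -> 32, 1000 -> 1024
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
    }
}
```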
Read more »

On the Number of Partitions in Spark

Posted on 2021-04-07

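Coalesce Hints for SQL Queries: this feature controls the number of output files. Our data warehouse synchronization used to take a long time. Profiling showed that most of that time went into data exchange with OSS, and small files were the main cause. After the shuffle, each table's sync job produced 200 files by default, because spark.sql.shuffle.partitions defaults to 200. We changed the jobs to compute a suitable partition count from each table's record count and embed it in the SQL with the hint above. This cut the total warehouse sync time by nearly 50%.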
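The post does not give the exact sizing formula, so the sketch below only illustrates the idea under stated assumptions. The rows-per-file target `ROWS_PER_FILE`, the helper names and the table names are all hypothetical. The code derives a partition count from a table's record count and embeds it in the SQL through Spark SQL's `/*+ COALESCE(n) */` hint. It only builds the SQL string and does not depend on Spark.

```java
public class CoalesceHintSketch {
    // Hypothetical target: roughly how many rows should land in each output file.
    static final long ROWS_PER_FILE = 1_000_000L;

    /** Hypothetical sizing rule: ceil(rowCount / ROWS_PER_FILE), at least 1. */
    static int partitionsFor(long rowCount) {
        long n = rowCount / ROWS_PER_FILE + (rowCount % ROWS_PER_FILE == 0 ? 0 : 1);
        return (int) Math.min(Integer.MAX_VALUE, Math.max(1L, n));
    }

    /** Builds a sync statement with the partition count embedded as a COALESCE hint. */
    static String syncSql(String targetTable, String sourceTable, long rowCount) {
        return String.format("INSERT OVERWRITE TABLE %s SELECT /*+ COALESCE(%d) */ * FROM %s",
                targetTable, partitionsFor(rowCount), sourceTable);
    }

    public static void main(String[] args) {
        // Hypothetical tables: 2.5 million rows -> 3 output files instead of 200.
        System.out.println(syncSql("dw.orders", "ods.orders", 2_500_000L));
    }
}
```

The resulting string would be passed to `spark.sql(...)`. COALESCE merges existing partitions without a full shuffle. The REPARTITION(n) hint is the alternative when the data should be rebalanced evenly across files.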

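We found another problem at the same time: the EMR-OSS connector calls System.gc() explicitly. This issue made the jobs spend a lot of time on unnecessary Full GCs, so we later removed the call to speed up the warehouse sync.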

Reference

Spark Partitioning & Partition Understanding
Spark SQL Shuffle Partitions


NoClassDefFoundError

Posted on 2021-03-22

Any developer with a bit of experience has probably run into NoClassDefFoundError. Here is a typical example:

java.lang.NoClassDefFoundError: org/apache/commons/lang/exception/NestableRuntimeException
	at java.lang.ClassLoader.defineClass1(Native Method) ~[?:1.8.0_211]
	at java.lang.ClassLoader.defineClass(ClassLoader.java:763) ~[?:1.8.0_211]
	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) ~[?:1.8.0_211]
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:468) ~[?:1.8.0_211]
	at java.net.URLClassLoader.access$100(URLClassLoader.java:74) ~[?:1.8.0_211]
	at java.net.URLClassLoader$1.run(URLClassLoader.java:369) ~[?:1.8.0_211]
	at java.net.URLClassLoader$1.run(URLClassLoader.java:363) ~[?:1.8.0_211]
	at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_211]
	at java.net.URLClassLoader.findClass(URLClassLoader.java:362) ~[?:1.8.0_211]
	at org.apache.flink.util.ChildFirstClassLoader.loadClassWithoutExceptionHandling(ChildFirstClassLoader.java:71) ~[flink-dist_2.11-1.12.1.jar:1.12.1]
	at org.apache.flink.util.FlinkUserCodeClassLoader.loadClass(FlinkUserCodeClassLoader.java:48) ~[flink-dist_2.11-1.12.1.jar:1.12.1]
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:1.8.0_211]
	at com.aliyun.openservices.log.flink.util.LogClientProxy.<init>(LogClientProxy.java:26) ~[?:?]
	at com.aliyun.openservices.log.flink.FlinkLogConsumer.createClient(FlinkLogConsumer.java:65) ~[?:?]
	at com.aliyun.openservices.log.flink.FlinkLogConsumer.run(FlinkLogConsumer.java:71) ~[?:?]
	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:110) ~[flink-dist_2.11-1.12.1.jar:1.12.1]
	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:66) ~[flink-dist_2.11-1.12.1.jar:1.12.1]
	at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:241) ~[flink-dist_2.11-1.12.1.jar:1.12.1]
Caused by: java.lang.ClassNotFoundException: org.apache.commons.lang.exception.NestableRuntimeException
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382) ~[?:1.8.0_211]
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[?:1.8.0_211]
	at org.apache.flink.util.FlinkUserCodeClassLoader.loadClassWithoutExceptionHandling(FlinkUserCodeClassLoader.java:64) ~[flink-dist_2.11-1.12.1.jar:1.12.1]
	at org.apache.flink.util.ChildFirstClassLoader.loadClassWithoutExceptionHandling(ChildFirstClassLoader.java:74) ~[flink-dist_2.11-1.12.1.jar:1.12.1]
	at org.apache.flink.util.FlinkUserCodeClassLoader.loadClass(FlinkUserCodeClassLoader.java:48) ~[flink-dist_2.11-1.12.1.jar:1.12.1]
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:1.8.0_211]
	... 18 more
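A missing dependency is not the only trigger for this error. The minimal, self-contained sketch below uses hypothetical class names `InitFailureDemo` and `Broken` to produce a NoClassDefFoundError for a class whose .class file is present. On the first access, the static initializer throws, so the JVM reports ExceptionInInitializerError and marks the class as erroneous. Every later access then fails with NoClassDefFoundError; on HotSpot the message typically reads "Could not initialize class ...". This is one of the paths covered by the JLS initialization procedure.

```java
public class InitFailureDemo {
    // Hypothetical class whose static initializer always fails.
    static class Broken {
        static final int VALUE = compute(); // not a compile-time constant, so access triggers initialization

        private static int compute() {
            throw new IllegalStateException("failure inside <clinit>");
        }
    }

    /** Touches Broken.VALUE and returns the simple name of the error thrown, if any. */
    static String touch() {
        try {
            return "ok:" + Broken.VALUE;
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(touch()); // ExceptionInInitializerError: initialization ran and failed
        System.out.println(touch()); // NoClassDefFoundError: the class is now in an erroneous state
    }
}
```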

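ClassNotFoundException is the most common cause of NoClassDefFoundError. It usually appears when the version of a dependency actually needed at runtime differs from the one the application ships with. The problem itself is very common, but few people notice why a ClassNotFoundException gets turned into a NoClassDefFoundError. While troubleshooting a production issue, I looked up the detailed class initialization procedure in the JLS, Detailed Initialization Procedure. The most essential part is excerpted here: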

Read more »

Uber JAR

Posted on 2021-03-21

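In Spark and Flink applications, business code is often packaged as an Uber jar and submitted to the cluster. While integrating a Hive UDF, I built an Uber jar with the Apache Maven Shade Plugin, but loading a class then failed with a class-not-found error. I pulled the Uber jar to my machine and unpacked it, and the class was there, yet the error log only said the class could not be found. After repeated investigation, the cause turned out to be the jar's signature files. In my scenario no signature-related error was reported, which made the problem hard to track down. Excluding the signature files with a filter in the Shade plugin configuration avoids the problem: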

<filters>
    <filter>
        <!-- Do not copy the signatures in the META-INF folder.
        Otherwise, this might cause SecurityExceptions when using the JAR. -->
        <artifact>*:*</artifact>
        <excludes>
            <exclude>META-INF/*.SF</exclude>
            <exclude>META-INF/*.DSA</exclude>
            <exclude>META-INF/*.RSA</exclude>
        </excludes>
    </filter>
</filters>
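Signature files copied from a signed dependency no longer match the merged contents of an Uber jar, which is why they cause trouble. To confirm that the filter took effect, you can list any remaining signature files in the built jar with `java.util.jar.JarFile`, as in the hedged sketch below. The class name `JarSignatureCheck` and the jar path in the usage comment are hypothetical. The check matches names case-sensitively and looks only at files directly under META-INF/, mirroring the three patterns in the filter above.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class JarSignatureCheck {
    private static final String META_INF = "META-INF/";

    /** Lists files directly under META-INF/ that match the *.SF, *.DSA or *.RSA patterns. */
    static List<String> signatureEntries(Path jarPath) throws IOException {
        List<String> found = new ArrayList<>();
        // verify = false: only entry names are inspected, so signature verification is skipped.
        try (JarFile jar = new JarFile(jarPath.toFile(), false)) {
            for (JarEntry entry : Collections.list(jar.entries())) {
                String name = entry.getName();
                boolean directlyUnderMetaInf = name.startsWith(META_INF)
                        && name.indexOf('/', META_INF.length()) < 0;
                if (directlyUnderMetaInf
                        && (name.endsWith(".SF") || name.endsWith(".DSA") || name.endsWith(".RSA"))) {
                    found.add(name);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Usage (hypothetical path): java JarSignatureCheck target/my-udf-shaded.jar
        List<String> entries = signatureEntries(Paths.get(args[0]));
        if (entries.isEmpty()) {
            System.out.println("No signature files found.");
        } else {
            entries.forEach(System.out::println);
        }
    }
}
```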
Reference

JAR File Specification, Signed JAR File
“Invalid signature file” when attempting to run a .jar
Appendix: Template for building a Jar with Dependencies | Apache Flink


128 posts
119 tags
GitHub LeetCode
© 2025 Poison, Sichuan ICP filing No. 16000644
Powered by Hexo
Theme - NexT.Mist