Poison

Heavy I/O

前段时间有线上实例被负载均衡判定不可用,随即被摘除,同时我们收到该实例的 JVM GC 耗时较长的告警及负载较高的告警。从操作系统级别的监控可以看到磁盘读取速度几乎达到该规格的上限,近 120MB/s,IOPS 为 2200 左右,使用 ssh 登录该实例需要等待很久,且进入实例后几乎无法输入命令,2 核的机器负载达到了 20 左右。持续了七八分钟会恢复正常,恢复正常后我首先查看了 GC 日志,确认是否存在大量 FullGC,但是从日志中并未发现长时间的 Full GC,普通的 GC 的 real 耗时都很久,部分 GC 日志如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
2021-12-29T13:35:04.683+0800: 401217.313: [GC pause (G1 Evacuation Pause) (young), 13.0059766 secs]
[Parallel Time: 9387.6 ms, GC Workers: 2]
[GC Worker Start (ms): Min: 401218614.4, Avg: 401218614.4, Max: 401218614.4, Diff: 0.0]
[Ext Root Scanning (ms): Min: 7750.6, Avg: 7775.3, Max: 7799.9, Diff: 49.2, Sum: 15550.5]
[Update RS (ms): Min: 185.8, Avg: 190.6, Max: 195.4, Diff: 9.6, Sum: 381.2]
[Processed Buffers: Min: 314, Avg: 319.5, Max: 325, Diff: 11, Sum: 639]
[Scan RS (ms): Min: 201.7, Avg: 201.7, Max: 201.8, Diff: 0.1, Sum: 403.4]
[Code Root Scanning (ms): Min: 1.5, Avg: 1.6, Max: 1.7, Diff: 0.2, Sum: 3.3]
[Object Copy (ms): Min: 656.1, Avg: 671.0, Max: 686.0, Diff: 30.0, Sum: 1342.1]
[Termination (ms): Min: 2.1, Avg: 3.3, Max: 4.5, Diff: 2.4, Sum: 6.5]
[Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 2]
[GC Worker Other (ms): Min: 272.9, Avg: 276.7, Max: 280.6, Diff: 7.7, Sum: 553.5]
[GC Worker Total (ms): Min: 9120.2, Avg: 9120.3, Max: 9120.3, Diff: 0.0, Sum: 18240.5]
[GC Worker End (ms): Min: 401227734.7, Avg: 401227734.7, Max: 401227734.7, Diff: 0.0]
[Code Root Fixup: 97.6 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 11.7 ms]
[Other: 3509.0 ms]
[Choose CSet: 2.8 ms]
[Ref Proc: 591.3 ms]
[Ref Enq: 9.6 ms]
[Redirty Cards: 62.3 ms]
[Humongous Register: 98.0 ms]
[Humongous Reclaim: 194.3 ms]
[Free CSet: 722.3 ms]
[Eden: 1223.0M(1223.0M)->0.0B(89.0M) Survivors: 5120.0K->13.0M Heap: 1930.1M(2048.0M)->705.2M(2048.0M)]
[Times: user=0.08 sys=0.53, real=14.11 secs]
2021-12-29T13:36:12.108+0800: 401284.528: [GC pause (G1 Evacuation Pause) (young), 0.6077202 secs]
[Parallel Time: 64.2 ms, GC Workers: 2]
[GC Worker Start (ms): Min: 401284649.3, Avg: 401284649.3, Max: 401284649.3, Diff: 0.0]
[Ext Root Scanning (ms): Min: 33.6, Avg: 34.0, Max: 34.3, Diff: 0.7, Sum: 67.9]
[Update RS (ms): Min: 8.9, Avg: 9.8, Max: 10.6, Diff: 1.7, Sum: 19.6]
[Processed Buffers: Min: 5, Avg: 80.5, Max: 156, Diff: 151, Sum: 161]
[Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.1]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Object Copy (ms): Min: 7.9, Avg: 8.2, Max: 8.6, Diff: 0.7, Sum: 16.5]
[Termination (ms): Min: 0.0, Avg: 2.9, Max: 5.7, Diff: 5.7, Sum: 5.7]
[Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 2]
[GC Worker Other (ms): Min: 1.7, Avg: 1.7, Max: 1.8, Diff: 0.1, Sum: 3.5]
[GC Worker Total (ms): Min: 53.9, Avg: 56.6, Max: 59.4, Diff: 5.5, Sum: 113.3]
[GC Worker End (ms): Min: 401284703.2, Avg: 401284705.9, Max: 401284708.7, Diff: 5.5]
[Code Root Fixup: 0.1 ms]
[Code Root Purge: 2.0 ms]
[Clear CT: 85.5 ms]
[Other: 456.0 ms]
[Choose CSet: 1.7 ms]
[Ref Proc: 15.3 ms]
[Ref Enq: 93.0 ms]
[Redirty Cards: 6.1 ms]
[Humongous Register: 1.7 ms]
[Humongous Reclaim: 0.1 ms]
[Free CSet: 9.3 ms]
[Eden: 89.0M(89.0M)->0.0B(98.0M) Survivors: 13.0M->4096.0K Heap: 794.2M(2048.0M)->703.3M(2048.0M)]
[Times: user=0.05 sys=0.01, real=0.88 secs]
2021-12-29T13:36:15.097+0800: 401287.722: [GC pause (G1 Evacuation Pause) (young), 0.6526700 secs]
[Parallel Time: 268.2 ms, GC Workers: 2]
[GC Worker Start (ms): Min: 401287765.2, Avg: 401287765.2, Max: 401287765.2, Diff: 0.0]
[Ext Root Scanning (ms): Min: 33.4, Avg: 33.9, Max: 34.3, Diff: 0.9, Sum: 67.8]
[Update RS (ms): Min: 21.9, Avg: 21.9, Max: 22.0, Diff: 0.1, Sum: 43.9]
[Processed Buffers: Min: 29, Avg: 46.0, Max: 63, Diff: 34, Sum: 92]
[Scan RS (ms): Min: 5.5, Avg: 5.5, Max: 5.5, Diff: 0.0, Sum: 11.0]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Object Copy (ms): Min: 185.4, Avg: 185.9, Max: 186.4, Diff: 1.1, Sum: 371.8]
[Termination (ms): Min: 2.1, Avg: 2.1, Max: 2.1, Diff: 0.0, Sum: 4.1]
[Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 2]
[GC Worker Other (ms): Min: 12.6, Avg: 12.6, Max: 12.7, Diff: 0.1, Sum: 25.3]
[GC Worker Total (ms): Min: 261.9, Avg: 261.9, Max: 262.0, Diff: 0.1, Sum: 523.9]
[GC Worker End (ms): Min: 401288027.1, Avg: 401288027.1, Max: 401288027.2, Diff: 0.1]
[Code Root Fixup: 12.4 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 1.8 ms]
[Other: 370.4 ms]
[Choose CSet: 1.6 ms]
[Ref Proc: 13.4 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.1 ms]
[Humongous Register: 1.7 ms]
[Humongous Reclaim: 0.1 ms]
[Free CSet: 3.4 ms]
[Eden: 98.0M(98.0M)->0.0B(98.0M) Survivors: 4096.0K->4096.0K Heap: 801.3M(2048.0M)->703.3M(2048.0M)]
[Times: user=0.04 sys=0.02, real=0.68 secs]
2021-12-29T13:36:23.426+0800: 401295.771: [GC pause (G1 Evacuation Pause) (young), 1.8345340 secs]
[Parallel Time: 1125.3 ms, GC Workers: 2]
[GC Worker Start (ms): Min: 401296531.7, Avg: 401296531.8, Max: 401296531.8, Diff: 0.0]
[Ext Root Scanning (ms): Min: 424.7, Avg: 424.9, Max: 425.1, Diff: 0.4, Sum: 849.8]
[Update RS (ms): Min: 141.1, Avg: 141.8, Max: 142.5, Diff: 1.5, Sum: 283.6]
[Processed Buffers: Min: 42, Avg: 70.5, Max: 99, Diff: 57, Sum: 141]
[Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.1]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Object Copy (ms): Min: 224.0, Avg: 228.3, Max: 232.7, Diff: 8.7, Sum: 456.7]
[Termination (ms): Min: 0.0, Avg: 1.0, Max: 1.9, Diff: 1.9, Sum: 1.9]
[Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 2]
[GC Worker Other (ms): Min: 214.2, Avg: 216.7, Max: 219.2, Diff: 5.0, Sum: 433.5]
[GC Worker Total (ms): Min: 1012.8, Avg: 1012.8, Max: 1012.8, Diff: 0.0, Sum: 2025.6]
[GC Worker End (ms): Min: 401297544.5, Avg: 401297544.6, Max: 401297544.6, Diff: 0.0]
[Code Root Fixup: 21.3 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 1.8 ms]
[Other: 686.1 ms]
[Choose CSet: 2.0 ms]
[Ref Proc: 8.4 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.1 ms]
[Humongous Register: 103.0 ms]
[Humongous Reclaim: 0.1 ms]
[Free CSet: 3.7 ms]
[Eden: 98.0M(98.0M)->0.0B(97.0M) Survivors: 4096.0K->5120.0K Heap: 801.3M(2048.0M)->704.3M(2048.0M)]
[Times: user=0.05 sys=0.02, real=1.86 secs]
2021-12-29T13:36:28.326+0800: 401300.670: [GC pause (G1 Evacuation Pause) (young), 0.9295091 secs]
[Parallel Time: 296.3 ms, GC Workers: 2]
[GC Worker Start (ms): Min: 401300951.5, Avg: 401300951.5, Max: 401300951.5, Diff: 0.1]
[Ext Root Scanning (ms): Min: 49.0, Avg: 50.1, Max: 51.3, Diff: 2.3, Sum: 100.3]
[Update RS (ms): Min: 15.4, Avg: 16.2, Max: 17.1, Diff: 1.7, Sum: 32.5]
[Processed Buffers: Min: 24, Avg: 45.0, Max: 66, Diff: 42, Sum: 90]
[Scan RS (ms): Min: 7.0, Avg: 7.0, Max: 7.0, Diff: 0.1, Sum: 14.0]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Object Copy (ms): Min: 112.0, Avg: 112.9, Max: 113.9, Diff: 1.9, Sum: 225.9]
[Termination (ms): Min: 84.8, Avg: 85.4, Max: 86.0, Diff: 1.3, Sum: 170.8]
[Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 2]
[GC Worker Other (ms): Min: 15.3, Avg: 15.4, Max: 15.4, Diff: 0.1, Sum: 30.7]
[GC Worker Total (ms): Min: 287.0, Avg: 287.1, Max: 287.1, Diff: 0.1, Sum: 574.1]
[GC Worker End (ms): Min: 401301238.6, Avg: 401301238.6, Max: 401301238.6, Diff: 0.0]
[Code Root Fixup: 93.0 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 1.7 ms]
[Other: 538.5 ms]
[Choose CSet: 86.8 ms]
[Ref Proc: 225.6 ms]
[Ref Enq: 1.5 ms]
[Redirty Cards: 0.1 ms]
[Humongous Register: 4.9 ms]
[Humongous Reclaim: 0.1 ms]
[Free CSet: 3.4 ms]
[Eden: 97.0M(97.0M)->0.0B(98.0M) Survivors: 5120.0K->4096.0K Heap: 801.3M(2048.0M)->703.8M(2048.0M)]
[Times: user=0.04 sys=0.03, real=1.00 secs]
2021-12-29T13:36:41.431+0800: 401313.773: [GC pause (G1 Evacuation Pause) (young), 0.0794512 secs]
[Parallel Time: 36.1 ms, GC Workers: 2]
[GC Worker Start (ms): Min: 401313801.9, Avg: 401313802.0, Max: 401313802.0, Diff: 0.0]
[Ext Root Scanning (ms): Min: 23.0, Avg: 23.1, Max: 23.2, Diff: 0.2, Sum: 46.2]
[Update RS (ms): Min: 6.7, Avg: 6.7, Max: 6.7, Diff: 0.0, Sum: 13.5]
[Processed Buffers: Min: 53, Avg: 69.0, Max: 85, Diff: 32, Sum: 138]
[Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.1]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Object Copy (ms): Min: 3.0, Avg: 3.1, Max: 3.2, Diff: 0.3, Sum: 6.2]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 2]
[GC Worker Other (ms): Min: 1.5, Avg: 1.5, Max: 1.5, Diff: 0.0, Sum: 3.0]
[GC Worker Total (ms): Min: 34.4, Avg: 34.5, Max: 34.5, Diff: 0.0, Sum: 68.9]
[GC Worker End (ms): Min: 401313836.4, Avg: 401313836.4, Max: 401313836.4, Diff: 0.0]
[Code Root Fixup: 0.1 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.3 ms]
[Other: 43.0 ms]
[Choose CSet: 2.0 ms]
[Ref Proc: 1.9 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.1 ms]
[Humongous Register: 1.9 ms]
[Humongous Reclaim: 0.1 ms]
[Free CSet: 0.4 ms]
[Eden: 98.0M(98.0M)->0.0B(98.0M) Survivors: 4096.0K->4096.0K Heap: 804.4M(2048.0M)->704.9M(2048.0M)]
[Times: user=0.04 sys=0.00, real=0.09 secs]
2021-12-29T13:36:53.651+0800: 401325.995: [GC pause (G1 Evacuation Pause) (young), 0.3620535 secs]
[Parallel Time: 67.7 ms, GC Workers: 2]
[GC Worker Start (ms): Min: 401326272.9, Avg: 401326272.9, Max: 401326272.9, Diff: 0.0]
[Ext Root Scanning (ms): Min: 44.6, Avg: 44.9, Max: 45.2, Diff: 0.6, Sum: 89.9]
[Update RS (ms): Min: 6.9, Avg: 6.9, Max: 7.0, Diff: 0.0, Sum: 13.9]
[Processed Buffers: Min: 46, Avg: 57.0, Max: 68, Diff: 22, Sum: 114]
[Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.1]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Object Copy (ms): Min: 5.6, Avg: 5.9, Max: 6.2, Diff: 0.6, Sum: 11.8]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 2]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[GC Worker Total (ms): Min: 57.8, Avg: 57.9, Max: 57.9, Diff: 0.0, Sum: 115.7]
[GC Worker End (ms): Min: 401326330.7, Avg: 401326330.7, Max: 401326330.7, Diff: 0.0]
[Code Root Fixup: 0.1 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.1 ms]
[Other: 294.2 ms]
[Choose CSet: 44.7 ms]
[Ref Proc: 4.0 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.1 ms]
[Humongous Register: 16.1 ms]
[Humongous Reclaim: 0.1 ms]
[Free CSet: 0.4 ms]
[Eden: 98.0M(98.0M)->0.0B(98.0M) Survivors: 4096.0K->4096.0K Heap: 802.9M(2048.0M)->704.9M(2048.0M)]
[Times: user=0.02 sys=0.02, real=0.51 secs]
2021-12-29T13:37:08.268+0800: 401340.610: [GC pause (G1 Evacuation Pause) (young), 1.7884931 secs]
[Parallel Time: 1203.5 ms, GC Workers: 2]
[GC Worker Start (ms): Min: 401341076.6, Avg: 401341076.6, Max: 401341076.6, Diff: 0.0]
[Ext Root Scanning (ms): Min: 525.2, Avg: 526.2, Max: 527.3, Diff: 2.1, Sum: 1052.4]
[Update RS (ms): Min: 83.4, Avg: 83.4, Max: 83.4, Diff: 0.0, Sum: 166.9]
[Processed Buffers: Min: 53, Avg: 60.5, Max: 68, Diff: 15, Sum: 121]
[Scan RS (ms): Min: 0.1, Avg: 0.1, Max: 0.1, Diff: 0.0, Sum: 0.1]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Object Copy (ms): Min: 295.5, Avg: 296.4, Max: 297.2, Diff: 1.7, Sum: 592.7]
[Termination (ms): Min: 85.9, Avg: 86.9, Max: 87.9, Diff: 2.0, Sum: 173.8]
[Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 2]
[GC Worker Other (ms): Min: 19.8, Avg: 20.6, Max: 21.3, Diff: 1.6, Sum: 41.1]
[GC Worker Total (ms): Min: 1013.4, Avg: 1013.5, Max: 1013.6, Diff: 0.1, Sum: 2027.0]
[GC Worker End (ms): Min: 401342090.1, Avg: 401342090.1, Max: 401342090.2, Diff: 0.2]
[Code Root Fixup: 3.6 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 1.7 ms]
[Other: 579.7 ms]
[Choose CSet: 2.3 ms]
[Ref Proc: 11.7 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.1 ms]
[Humongous Register: 18.3 ms]
[Humongous Reclaim: 0.1 ms]
[Free CSet: 19.8 ms]
[Eden: 98.0M(98.0M)->0.0B(98.0M) Survivors: 4096.0K->4096.0K Heap: 802.9M(2048.0M)->704.9M(2048.0M)]
[Times: user=0.06 sys=0.02, real=1.83 secs]
2021-12-29T13:37:15.196+0800: 401347.599: [GC pause (G1 Evacuation Pause) (young), 4.3836274 secs]
[Parallel Time: 2800.8 ms, GC Workers: 2]
[GC Worker Start (ms): Min: 401349098.1, Avg: 401349098.1, Max: 401349098.1, Diff: 0.0]
[Ext Root Scanning (ms): Min: 1649.9, Avg: 1818.4, Max: 1987.0, Diff: 337.1, Sum: 3636.9]
[Update RS (ms): Min: 21.2, Avg: 21.2, Max: 21.2, Diff: 0.0, Sum: 42.3]
[Processed Buffers: Min: 45, Avg: 54.5, Max: 64, Diff: 19, Sum: 109]
[Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Object Copy (ms): Min: 493.8, Avg: 659.8, Max: 825.8, Diff: 332.0, Sum: 1319.6]
[Termination (ms): Min: 2.1, Avg: 43.0, Max: 84.0, Diff: 81.9, Sum: 86.1]
[Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 2]
[GC Worker Other (ms): Min: 124.1, Avg: 162.5, Max: 200.9, Diff: 76.8, Sum: 325.0]
[GC Worker Total (ms): Min: 2705.0, Avg: 2705.0, Max: 2705.0, Diff: 0.0, Sum: 5409.9]
[GC Worker End (ms): Min: 401351803.1, Avg: 401351803.1, Max: 401351803.1, Diff: 0.0]
[Code Root Fixup: 4.4 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 7.2 ms]
[Other: 1571.3 ms]
[Choose CSet: 4.7 ms]
[Ref Proc: 18.5 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.1 ms]
[Humongous Register: 394.9 ms]
[Humongous Reclaim: 0.1 ms]
[Free CSet: 5.2 ms]
[Eden: 98.0M(98.0M)->0.0B(96.0M) Survivors: 4096.0K->6144.0K Heap: 802.9M(2048.0M)->706.9M(2048.0M)]
[Times: user=0.02 sys=0.11, real=4.88 secs]
2021-12-29T13:37:32.030+0800: 401364.523: [GC pause (G1 Evacuation Pause) (young), 0.3302520 secs]
[Parallel Time: 44.1 ms, GC Workers: 2]
[GC Worker Start (ms): Min: 401364645.7, Avg: 401364645.7, Max: 401364645.7, Diff: 0.0]
[Ext Root Scanning (ms): Min: 28.8, Avg: 29.0, Max: 29.3, Diff: 0.5, Sum: 58.0]
[Update RS (ms): Min: 5.8, Avg: 5.8, Max: 5.8, Diff: 0.0, Sum: 11.6]
[Processed Buffers: Min: 48, Avg: 60.5, Max: 73, Diff: 25, Sum: 121]
[Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum: 0.1]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Object Copy (ms): Min: 4.1, Avg: 4.3, Max: 4.6, Diff: 0.5, Sum: 8.6]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 2]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[GC Worker Total (ms): Min: 39.2, Avg: 39.2, Max: 39.2, Diff: 0.0, Sum: 78.5]
[GC Worker End (ms): Min: 401364684.9, Avg: 401364684.9, Max: 401364684.9, Diff: 0.0]
[Code Root Fixup: 0.1 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.1 ms]
[Other: 286.0 ms]
[Choose CSet: 1.6 ms]
[Ref Proc: 3.6 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.1 ms]
[Humongous Register: 1.7 ms]
[Humongous Reclaim: 0.1 ms]
[Free CSet: 0.4 ms]
[Eden: 96.0M(96.0M)->0.0B(96.0M) Survivors: 6144.0K->6144.0K Heap: 802.9M(2048.0M)->706.9M(2048.0M)]
[Times: user=0.03 sys=0.01, real=0.35 secs]
2021-12-29T13:37:36.495+0800: 401368.929: [GC pause (G1 Evacuation Pause) (young), 0.1865603 secs]
[Parallel Time: 39.9 ms, GC Workers: 2]
[GC Worker Start (ms): Min: 401368963.5, Avg: 401368963.5, Max: 401368963.5, Diff: 0.0]
[Ext Root Scanning (ms): Min: 25.3, Avg: 25.6, Max: 25.9, Diff: 0.6, Sum: 51.2]
[Update RS (ms): Min: 6.0, Avg: 6.0, Max: 6.0, Diff: 0.0, Sum: 12.0]
[Processed Buffers: Min: 44, Avg: 53.0, Max: 62, Diff: 18, Sum: 106]
[Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.0, Sum: 0.1]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Object Copy (ms): Min: 4.8, Avg: 5.1, Max: 5.4, Diff: 0.6, Sum: 10.2]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 2]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[GC Worker Total (ms): Min: 36.7, Avg: 36.8, Max: 36.8, Diff: 0.0, Sum: 73.5]
[GC Worker End (ms): Min: 401369000.2, Avg: 401369000.2, Max: 401369000.2, Diff: 0.0]
[Code Root Fixup: 0.1 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.1 ms]
[Other: 146.5 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 4.2 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.1 ms]
[Humongous Register: 1.7 ms]
[Humongous Reclaim: 0.1 ms]
[Free CSet: 0.4 ms]
[Eden: 96.0M(96.0M)->0.0B(96.0M) Survivors: 6144.0K->6144.0K Heap: 802.9M(2048.0M)->706.9M(2048.0M)]
[Times: user=0.03 sys=0.01, real=0.20 secs]

可以看出 real 指标明显不正常,根据 REAL TIME IS GREATER THAN USER AND SYS TIME 一文中的分析易知,该指标偏高主要有两种原因,一种是操作系统当前存在大量 IO 活动导致,一种是缺乏 CPU 周期导致。从操作系统级别的监控数据来看,CPU 利用率不及 50%,而磁盘读取及 IOPS 达到上限,故可初步确定是 IO 活动导致。

该实例未启用 SWAP 交换,在进程列表中也没看见 kswapd0,故排除了 SWAP 空间交换的可能。该实例被负载均衡摘除后还出现了两到三次 IO 被打满的情况,最后根据监控数据来看,是根据配置的容器失败策略进行了重启,释放了内存空间,此后再未出现 IO 高的问题。我在对应的实例上找到了 Linux OOM killer 的相关日志:

节点一:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
[Wed Dec 29 16:44:21 2021] java invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
[Wed Dec 29 16:44:21 2021] java cpuset=45b15e47921b08d315333151686dfeb6aa8ea00026034f81657d73ed790d9e28 mems_allowed=0
[Wed Dec 29 16:44:21 2021] CPU: 0 PID: 708 Comm: java Not tainted 4.15.0-139-generic #143-Ubuntu
[Wed Dec 29 16:44:21 2021] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8c24b4c 04/01/2014
[Wed Dec 29 16:44:21 2021] Call Trace:
[Wed Dec 29 16:44:21 2021] dump_stack+0x6d/0x8b
[Wed Dec 29 16:44:21 2021] dump_header+0x71/0x282
[Wed Dec 29 16:44:21 2021] oom_kill_process+0x21f/0x450
[Wed Dec 29 16:44:21 2021] out_of_memory+0x116/0x4e0
[Wed Dec 29 16:44:21 2021] __alloc_pages_slowpath+0xa53/0xe00
[Wed Dec 29 16:44:21 2021] __alloc_pages_nodemask+0x29a/0x2c0
[Wed Dec 29 16:44:21 2021] alloc_pages_current+0x6a/0xe0
[Wed Dec 29 16:44:21 2021] __page_cache_alloc+0x81/0xa0
[Wed Dec 29 16:44:21 2021] filemap_fault+0x414/0x730
[Wed Dec 29 16:44:21 2021] ? filemap_map_pages+0x181/0x390
[Wed Dec 29 16:44:21 2021] ext4_filemap_fault+0x31/0x50
[Wed Dec 29 16:44:21 2021] __do_fault+0x5b/0x111
[Wed Dec 29 16:44:21 2021] __handle_mm_fault+0xdf1/0x1290
[Wed Dec 29 16:44:21 2021] ? __switch_to_asm+0x41/0x70
[Wed Dec 29 16:44:21 2021] handle_mm_fault+0xcc/0x240
[Wed Dec 29 16:44:21 2021] __do_page_fault+0x281/0x4b0
[Wed Dec 29 16:44:21 2021] ? __schedule+0x256/0x890
[Wed Dec 29 16:44:21 2021] do_page_fault+0x2e/0xe0
[Wed Dec 29 16:44:21 2021] ? async_page_fault+0x2f/0x50
[Wed Dec 29 16:44:21 2021] do_async_page_fault+0x51/0x80
[Wed Dec 29 16:44:21 2021] async_page_fault+0x45/0x50
[Wed Dec 29 16:44:21 2021] RIP: 0033:0x7f9062c84650
[Wed Dec 29 16:44:21 2021] RSP: 002b:00007f90317fd8a8 EFLAGS: 00010206
[Wed Dec 29 16:44:21 2021] RAX: 00007f9021d73650 RBX: 00007f9051803e50 RCX: 00007f90643dd320
[Wed Dec 29 16:44:21 2021] RDX: 0000000000000000 RSI: 0000000000000600 RDI: 00007f905c110eb0
[Wed Dec 29 16:44:21 2021] RBP: 00007f90317fe4c0 R08: 0000000000000000 R09: 00007f9002ea6990
[Wed Dec 29 16:44:21 2021] R10: 00007f9002ea6980 R11: 000000000000000b R12: 00007f90317fd920
[Wed Dec 29 16:44:21 2021] R13: 00007f90317fd8c0 R14: 00007f9038385000 R15: 00007f9021d73600
[Wed Dec 29 16:44:21 2021] Mem-Info:
[Wed Dec 29 16:44:21 2021] active_anon:941952 inactive_anon:1374 isolated_anon:0
active_file:225 inactive_file:305 isolated_file:35
unevictable:0 dirty:0 writeback:0 unstable:0
slab_reclaimable:13517 slab_unreclaimable:11977
mapped:467 shmem:1507 pagetables:3998 bounce:0
free:21594 free_pcp:512 free_cma:0
[Wed Dec 29 16:44:21 2021] Node 0 active_anon:3767808kB inactive_anon:5496kB active_file:900kB inactive_file:1220kB unevictable:0kB isolated(anon):0kB isolated(file):140kB mapped:1868kB dirty:0kB writeback:0kB shmem:6028kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[Wed Dec 29 16:44:21 2021] Node 0 DMA free:15800kB min:268kB low:332kB high:396kB active_anon:108kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[Wed Dec 29 16:44:21 2021] lowmem_reserve[]: 0 2953 3884 3884 3884
[Wed Dec 29 16:44:21 2021] Node 0 DMA32 free:54820kB min:51180kB low:63972kB high:76764kB active_anon:2927316kB inactive_anon:408kB active_file:588kB inactive_file:1332kB unevictable:0kB writepending:0kB present:3129216kB managed:3063648kB mlocked:0kB kernel_stack:6364kB pagetables:10112kB bounce:0kB free_pcp:1288kB local_pcp:640kB free_cma:0kB
[Wed Dec 29 16:44:21 2021] lowmem_reserve[]: 0 0 930 930 930
[Wed Dec 29 16:44:21 2021] Node 0 Normal free:15756kB min:16132kB low:20164kB high:24196kB active_anon:840384kB inactive_anon:5088kB active_file:0kB inactive_file:276kB unevictable:0kB writepending:0kB present:1048576kB managed:959600kB mlocked:0kB kernel_stack:3524kB pagetables:5880kB bounce:0kB free_pcp:760kB local_pcp:216kB free_cma:0kB
[Wed Dec 29 16:44:21 2021] lowmem_reserve[]: 0 0 0 0 0
[Wed Dec 29 16:44:21 2021] Node 0 DMA: 2*4kB (UM) 2*8kB (M) 2*16kB (M) 2*32kB (UM) 3*64kB (UM) 1*128kB (U) 2*256kB (UM) 1*512kB (M) 2*1024kB (UM) 0*2048kB 3*4096kB (ME) = 15800kB
[Wed Dec 29 16:44:21 2021] Node 0 DMA32: 474*4kB (UMEH) 754*8kB (UMEH) 666*16kB (UMEH) 552*32kB (UMEH) 206*64kB (UMEH) 32*128kB (UMEH) 3*256kB (UEH) 0*512kB 1*1024kB (H) 0*2048kB 0*4096kB = 55320kB
[Wed Dec 29 16:44:21 2021] Node 0 Normal: 782*4kB (ME) 314*8kB (UMEH) 250*16kB (UME) 178*32kB (UMEH) 6*64kB (EH) 2*128kB (EH) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15976kB
[Wed Dec 29 16:44:21 2021] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[Wed Dec 29 16:44:21 2021] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Wed Dec 29 16:44:21 2021] 2071 total pagecache pages
[Wed Dec 29 16:44:21 2021] 0 pages in swap cache
[Wed Dec 29 16:44:21 2021] Swap cache stats: add 0, delete 0, find 0/0
[Wed Dec 29 16:44:21 2021] Free swap = 0kB
[Wed Dec 29 16:44:21 2021] Total swap = 0kB
[Wed Dec 29 16:44:21 2021] 1048446 pages RAM
[Wed Dec 29 16:44:21 2021] 0 pages HighMem/MovableOnly
[Wed Dec 29 16:44:21 2021] 38657 pages reserved
[Wed Dec 29 16:44:21 2021] 0 pages cma reserved
[Wed Dec 29 16:44:21 2021] 0 pages hwpoisoned
[Wed Dec 29 16:44:21 2021] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[Wed Dec 29 16:44:21 2021] [ 226] 0 226 27376 1792 258048 0 0 systemd-journal
[Wed Dec 29 16:44:21 2021] [ 267] 0 267 11657 600 122880 0 -1000 systemd-udevd
[Wed Dec 29 16:44:21 2021] [ 483] 100 483 20023 185 184320 0 0 systemd-network
[Wed Dec 29 16:44:21 2021] [ 504] 101 504 17700 203 180224 0 0 systemd-resolve
[Wed Dec 29 16:44:21 2021] [ 625] 0 625 7085 52 102400 0 0 atd
[Wed Dec 29 16:44:21 2021] [ 635] 0 635 17676 206 176128 0 0 systemd-logind
[Wed Dec 29 16:44:21 2021] [ 638] 0 638 71882 207 196608 0 0 accounts-daemon
[Wed Dec 29 16:44:21 2021] [ 641] 103 641 12514 168 139264 0 -900 dbus-daemon
[Wed Dec 29 16:44:21 2021] [ 683] 0 683 7832 72 98304 0 0 cron
[Wed Dec 29 16:44:21 2021] [ 684] 0 684 42599 1945 233472 0 0 networkd-dispat
[Wed Dec 29 16:44:21 2021] [ 687] 102 687 65761 430 163840 0 0 rsyslogd
[Wed Dec 29 16:44:21 2021] [ 688] 0 688 18077 189 188416 0 -1000 sshd
[Wed Dec 29 16:44:21 2021] [ 689] 0 689 244701 5208 303104 0 -999 containerd
[Wed Dec 29 16:44:21 2021] [ 706] 0 706 3991 34 77824 0 0 agetty
[Wed Dec 29 16:44:21 2021] [ 707] 0 707 4047 32 73728 0 0 agetty
[Wed Dec 29 16:44:21 2021] [ 715] 108 715 26431 117 118784 0 0 chronyd
[Wed Dec 29 16:44:21 2021] [ 897] 0 897 320386 11297 548864 0 -500 dockerd
[Wed Dec 29 16:44:21 2021] [26025] 0 26025 14160 382 122880 0 0 ilogtail
[Wed Dec 29 16:44:21 2021] [26026] 0 26026 270155 6716 479232 0 0 ilogtail
[Wed Dec 29 16:44:21 2021] [26034] 105 26034 7037 46 102400 0 0 uuidd
[Wed Dec 29 16:44:21 2021] [29000] 0 29000 7996 65 86016 0 -1000 auditd
[Wed Dec 29 16:44:21 2021] [ 634] 0 634 15884 5298 147456 0 0 AliYunDunUpdate
[Wed Dec 29 16:44:21 2021] [18822] 0 18822 201531 1198 61440 0 0 aliyun-service
[Wed Dec 29 16:44:21 2021] [19009] 0 19009 39701 5639 339968 0 0 AliYunDun
[Wed Dec 29 16:44:21 2021] [ 612] 0 612 28344 421 90112 0 -998 containerd-shim
[Wed Dec 29 16:44:21 2021] [ 652] 0 652 1394208 856073 8089600 0 0 java
[Wed Dec 29 16:44:21 2021] [12870] 0 12870 19174 271 196608 0 0 systemd
[Wed Dec 29 16:44:21 2021] [12871] 0 12871 27405 629 245760 0 0 (sd-pam)
[Wed Dec 29 16:44:21 2021] [14579] 0 14579 26998 253 253952 0 0 sshd
[Wed Dec 29 16:44:21 2021] [14615] 0 14615 6060 816 102400 0 0 bash
[Wed Dec 29 16:44:21 2021] [14660] 0 14660 211408 6348 311296 0 0 docker
[Wed Dec 29 16:44:21 2021] [14678] 0 14678 5066 118 90112 0 0 bash
[Wed Dec 29 16:44:21 2021] [14953] 0 14953 26998 258 262144 0 0 sshd
[Wed Dec 29 16:44:21 2021] [14989] 0 14989 5696 428 86016 0 0 bash
[Wed Dec 29 16:44:21 2021] [15299] 0 15299 26998 258 258048 0 0 sshd
[Wed Dec 29 16:44:21 2021] [15337] 0 15337 6066 809 98304 0 0 bash
[Wed Dec 29 16:44:21 2021] [15488] 0 15488 211408 6414 315392 0 0 docker
[Wed Dec 29 16:44:21 2021] [15514] 0 15514 5098 121 81920 0 0 bash
[Wed Dec 29 16:44:21 2021] [15818] 0 15818 26998 258 253952 0 0 sshd
[Wed Dec 29 16:44:21 2021] [15923] 0 15923 5664 395 98304 0 0 bash
[Wed Dec 29 16:44:21 2021] [17804] 0 17804 33858 5313 315392 0 0 apt
[Wed Dec 29 16:44:21 2021] [17882] 0 17882 5033 67 86016 0 0 ttop.sh
[Wed Dec 29 16:44:21 2021] [17898] 0 17898 868500 16181 462848 0 0 java
[Wed Dec 29 16:44:21 2021] [17935] 0 17935 6939 428 106496 0 0 htop
[Wed Dec 29 16:44:21 2021] [17982] 0 17982 713 7 32768 0 0 assist_daemon
[Wed Dec 29 16:44:21 2021] [17983] 0 17983 11657 597 110592 0 0 systemd-udevd
[Wed Dec 29 16:44:21 2021] [17984] 0 17984 1159 23 53248 0 0 sh
[Wed Dec 29 16:44:21 2021] [17985] 0 17985 15453 105 151552 0 0 systemd-tmpfile
[Wed Dec 29 16:44:21 2021] [17986] 0 17986 11657 597 110592 0 0 systemd-udevd
[Wed Dec 29 16:44:21 2021] [17987] 0 17987 11657 597 110592 0 0 systemd-udevd
[Wed Dec 29 16:44:21 2021] [17988] 0 17988 15481 131 151552 0 0 systemctl
[Wed Dec 29 16:44:21 2021] [17990] 0 17990 11657 597 110592 0 0 systemd-udevd
[Wed Dec 29 16:44:21 2021] [17991] 0 17991 11657 597 110592 0 0 systemd-udevd
[Wed Dec 29 16:44:21 2021] [17992] 0 17992 12031 90 135168 0 0 cron
[Wed Dec 29 16:44:21 2021] [17993] 0 17993 11657 597 110592 0 0 systemd-udevd
[Wed Dec 29 16:44:21 2021] [17994] 0 17994 11657 595 110592 0 0 systemd-udevd
[Wed Dec 29 16:44:21 2021] Out of memory: Kill process 652 (java) score 849 or sacrifice child
[Wed Dec 29 16:44:21 2021] Killed process 652 (java) total-vm:5576832kB, anon-rss:3424292kB, file-rss:0kB, shmem-rss:0kB
[Wed Dec 29 16:44:22 2021] oom_reaper: reaped process 652 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

节点二:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
[Wed Dec 29 14:10:12 2021] dockerd invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=-500
[Wed Dec 29 14:10:12 2021] dockerd cpuset=/ mems_allowed=0
[Wed Dec 29 14:10:12 2021] CPU: 1 PID: 27199 Comm: dockerd Not tainted 4.15.0-139-generic #143-Ubuntu
[Wed Dec 29 14:10:12 2021] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8c24b4c 04/01/2014
[Wed Dec 29 14:10:12 2021] Call Trace:
[Wed Dec 29 14:10:12 2021] dump_stack+0x6d/0x8b
[Wed Dec 29 14:10:12 2021] dump_header+0x71/0x282
[Wed Dec 29 14:10:12 2021] oom_kill_process+0x21f/0x450
[Wed Dec 29 14:10:12 2021] out_of_memory+0x116/0x4e0
[Wed Dec 29 14:10:12 2021] __alloc_pages_slowpath+0xa53/0xe00
[Wed Dec 29 14:10:12 2021] __alloc_pages_nodemask+0x29a/0x2c0
[Wed Dec 29 14:10:12 2021] alloc_pages_current+0x6a/0xe0
[Wed Dec 29 14:10:12 2021] __page_cache_alloc+0x81/0xa0
[Wed Dec 29 14:10:12 2021] filemap_fault+0x414/0x730
[Wed Dec 29 14:10:12 2021] ? filemap_map_pages+0x181/0x390
[Wed Dec 29 14:10:12 2021] ext4_filemap_fault+0x31/0x50
[Wed Dec 29 14:10:12 2021] __do_fault+0x5b/0x111
[Wed Dec 29 14:10:12 2021] __handle_mm_fault+0xdf1/0x1290
[Wed Dec 29 14:10:12 2021] ? futex_wake+0x8f/0x180
[Wed Dec 29 14:10:12 2021] handle_mm_fault+0xcc/0x240
[Wed Dec 29 14:10:12 2021] __do_page_fault+0x281/0x4b0
[Wed Dec 29 14:10:12 2021] do_page_fault+0x2e/0xe0
[Wed Dec 29 14:10:12 2021] ? async_page_fault+0x2f/0x50
[Wed Dec 29 14:10:12 2021] do_async_page_fault+0x51/0x80
[Wed Dec 29 14:10:12 2021] async_page_fault+0x45/0x50
[Wed Dec 29 14:10:12 2021] RIP: 0033:0x55572f159a00
[Wed Dec 29 14:10:12 2021] RSP: 002b:00007fde8effccf8 EFLAGS: 00010206
[Wed Dec 29 14:10:12 2021] RAX: 000000c000a30c98 RBX: 000000c000c84600 RCX: fffffffffffffff8
[Wed Dec 29 14:10:12 2021] RDX: 0000000000000000 RSI: 000000c000d6a300 RDI: 0000000000000001
[Wed Dec 29 14:10:12 2021] RBP: 000000c000a30d18 R08: 0000000000000001 R09: 0000000000000001
[Wed Dec 29 14:10:12 2021] R10: 000000c0009102a0 R11: 0000000000000000 R12: ffffffffffffffff
[Wed Dec 29 14:10:12 2021] R13: 0000000000000016 R14: 0000000000000015 R15: 0000000000000100
[Wed Dec 29 14:10:12 2021] Mem-Info:
[Wed Dec 29 14:10:12 2021] active_anon:938915 inactive_anon:1379 isolated_anon:0
active_file:1319 inactive_file:1895 isolated_file:0
unevictable:0 dirty:0 writeback:22 unstable:0
slab_reclaimable:13464 slab_unreclaimable:12092
mapped:3538 shmem:1510 pagetables:3096 bounce:0
free:21280 free_pcp:15 free_cma:0
[Wed Dec 29 14:10:12 2021] Node 0 active_anon:3755660kB inactive_anon:5516kB active_file:5276kB inactive_file:7580kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:14152kB dirty:0kB writeback:88kB shmem:6040kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[Wed Dec 29 14:10:12 2021] Node 0 DMA free:15772kB min:268kB low:332kB high:396kB active_anon:92kB inactive_anon:0kB active_file:0kB inactive_file:4kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB kernel_stack:0kB pagetables:4kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[Wed Dec 29 14:10:12 2021] lowmem_reserve[]: 0 2926 3884 3884 3884
[Wed Dec 29 14:10:12 2021] Node 0 DMA32 free:53996kB min:50704kB low:63380kB high:76056kB active_anon:2888028kB inactive_anon:324kB active_file:4668kB inactive_file:7580kB unevictable:0kB writepending:0kB present:3129216kB managed:3041700kB mlocked:0kB kernel_stack:4992kB pagetables:7516kB bounce:0kB free_pcp:60kB local_pcp:12kB free_cma:0kB
[Wed Dec 29 14:10:12 2021] lowmem_reserve[]: 0 0 958 958 958
[Wed Dec 29 14:10:12 2021] Node 0 Normal free:15352kB min:16608kB low:20760kB high:24912kB active_anon:867540kB inactive_anon:5192kB active_file:12kB inactive_file:8kB unevictable:0kB writepending:0kB present:1048576kB managed:981540kB mlocked:0kB kernel_stack:2768kB pagetables:4864kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[Wed Dec 29 14:10:12 2021] lowmem_reserve[]: 0 0 0 0 0
[Wed Dec 29 14:10:12 2021] Node 0 DMA: 1*4kB (M) 1*8kB (M) 1*16kB (M) 2*32kB (UM) 3*64kB (UM) 1*128kB (U) 2*256kB (UM) 1*512kB (M) 2*1024kB (UM) 0*2048kB 3*4096kB (ME) = 15772kB
[Wed Dec 29 14:10:12 2021] Node 0 DMA32: 1106*4kB (UMEH) 1379*8kB (UMEH) 628*16kB (UMEH) 557*32kB (UMEH) 147*64kB (UMEH) 13*128kB (UM) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54400kB
[Wed Dec 29 14:10:12 2021] Node 0 Normal: 1334*4kB (UMH) 620*8kB (UMH) 316*16kB (UMH) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15352kB
[Wed Dec 29 14:10:12 2021] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[Wed Dec 29 14:10:12 2021] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Wed Dec 29 14:10:12 2021] 4763 total pagecache pages
[Wed Dec 29 14:10:12 2021] 0 pages in swap cache
[Wed Dec 29 14:10:12 2021] Swap cache stats: add 0, delete 0, find 0/0
[Wed Dec 29 14:10:12 2021] Free swap = 0kB
[Wed Dec 29 14:10:12 2021] Total swap = 0kB
[Wed Dec 29 14:10:12 2021] 1048446 pages RAM
[Wed Dec 29 14:10:12 2021] 0 pages HighMem/MovableOnly
[Wed Dec 29 14:10:12 2021] 38659 pages reserved
[Wed Dec 29 14:10:12 2021] 0 pages cma reserved
[Wed Dec 29 14:10:12 2021] 0 pages hwpoisoned
[Wed Dec 29 14:10:12 2021] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[Wed Dec 29 14:10:12 2021] [ 226] 0 226 31650 1956 270336 0 0 systemd-journal
[Wed Dec 29 14:10:12 2021] [ 273] 0 273 11820 752 118784 0 -1000 systemd-udevd
[Wed Dec 29 14:10:12 2021] [ 490] 100 490 20023 186 184320 0 0 systemd-network
[Wed Dec 29 14:10:12 2021] [ 509] 101 509 17699 197 184320 0 0 systemd-resolve
[Wed Dec 29 14:10:12 2021] [ 620] 0 620 42599 1946 237568 0 0 networkd-dispat
[Wed Dec 29 14:10:12 2021] [ 623] 0 623 17677 208 176128 0 0 systemd-logind
[Wed Dec 29 14:10:12 2021] [ 625] 0 625 71882 248 192512 0 0 accounts-daemon
[Wed Dec 29 14:10:12 2021] [ 629] 0 629 7085 51 94208 0 0 atd
[Wed Dec 29 14:10:12 2021] [ 631] 0 631 7832 72 114688 0 0 cron
[Wed Dec 29 14:10:12 2021] [ 638] 102 638 65761 474 163840 0 0 rsyslogd
[Wed Dec 29 14:10:12 2021] [ 640] 103 640 12512 168 143360 0 -900 dbus-daemon
[Wed Dec 29 14:10:12 2021] [ 685] 0 685 244701 5291 323584 0 -999 containerd
[Wed Dec 29 14:10:12 2021] [ 690] 0 690 18077 189 184320 0 -1000 sshd
[Wed Dec 29 14:10:12 2021] [ 702] 108 702 26431 120 118784 0 0 chronyd
[Wed Dec 29 14:10:12 2021] [ 743] 0 743 3991 33 77824 0 0 agetty
[Wed Dec 29 14:10:12 2021] [ 744] 0 744 4047 34 77824 0 0 agetty
[Wed Dec 29 14:10:12 2021] [ 885] 0 885 320386 10920 548864 0 -500 dockerd
[Wed Dec 29 14:10:12 2021] [ 1469] 0 1469 14156 378 122880 0 0 ilogtail
[Wed Dec 29 14:10:12 2021] [ 1471] 0 1471 61793 1079 241664 0 0 ilogtail
[Wed Dec 29 14:10:12 2021] [ 1478] 105 1478 7037 45 102400 0 0 uuidd
[Wed Dec 29 14:10:12 2021] [ 3574] 0 3574 7996 66 94208 0 -1000 auditd
[Wed Dec 29 14:10:12 2021] [ 6881] 0 6881 514 36 32768 0 0 monitor
[Wed Dec 29 14:10:12 2021] [ 3152] 0 3152 116912 68663 622592 0 0 agent
[Wed Dec 29 14:10:12 2021] [22659] 0 22659 201531 1157 61440 0 0 aliyun-service
[Wed Dec 29 14:10:12 2021] [23149] 0 23149 18957 7100 159744 0 0 AliYunDunUpdate
[Wed Dec 29 14:10:12 2021] [ 2788] 0 2788 39771 5713 339968 0 0 AliYunDun
[Wed Dec 29 14:10:12 2021] [ 2155] 0 2155 28344 375 90112 0 -998 containerd-shim
[Wed Dec 29 14:10:12 2021] [ 2183] 0 2183 1369890 829567 7798784 0 0 java
[Wed Dec 29 14:10:12 2021] [17883] 0 17883 713 7 32768 0 0 assist_daemon
[Wed Dec 29 14:10:12 2021] [17884] 0 17884 1159 23 57344 0 0 sh
[Wed Dec 29 14:10:12 2021] [17886] 0 17886 15481 121 151552 0 0 systemctl
[Wed Dec 29 14:10:12 2021] [17892] 0 17892 16476 86 167936 0 0 sshd
[Wed Dec 29 14:10:12 2021] Out of memory: Kill process 2183 (java) score 823 or sacrifice child
[Wed Dec 29 14:10:12 2021] Killed process 2183 (java) total-vm:5479560kB, anon-rss:3318268kB, file-rss:0kB, shmem-rss:0kB
[Wed Dec 29 14:10:12 2021] oom_reaper: reaped process 2183 (java), now anon-rss:0kB, file-rss:4kB, shmem-rss:0kB
[Wed Dec 29 16:32:02 2021] AliYunDunUpdate invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
[Wed Dec 29 16:32:02 2021] AliYunDunUpdate cpuset=/ mems_allowed=0
[Wed Dec 29 16:32:02 2021] CPU: 1 PID: 23153 Comm: AliYunDunUpdate Not tainted 4.15.0-139-generic #143-Ubuntu
[Wed Dec 29 16:32:02 2021] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8c24b4c 04/01/2014
[Wed Dec 29 16:32:02 2021] Call Trace:
[Wed Dec 29 16:32:02 2021] dump_stack+0x6d/0x8b
[Wed Dec 29 16:32:02 2021] dump_header+0x71/0x282
[Wed Dec 29 16:32:02 2021] oom_kill_process+0x21f/0x450
[Wed Dec 29 16:32:02 2021] out_of_memory+0x116/0x4e0
[Wed Dec 29 16:32:02 2021] __alloc_pages_slowpath+0xa53/0xe00
[Wed Dec 29 16:32:02 2021] __alloc_pages_nodemask+0x29a/0x2c0
[Wed Dec 29 16:32:02 2021] alloc_pages_current+0x6a/0xe0
[Wed Dec 29 16:32:02 2021] __page_cache_alloc+0x81/0xa0
[Wed Dec 29 16:32:02 2021] filemap_fault+0x414/0x730
[Wed Dec 29 16:32:02 2021] ? filemap_map_pages+0x22d/0x390
[Wed Dec 29 16:32:02 2021] ext4_filemap_fault+0x31/0x50
[Wed Dec 29 16:32:02 2021] __do_fault+0x5b/0x111
[Wed Dec 29 16:32:02 2021] __handle_mm_fault+0xdf1/0x1290
[Wed Dec 29 16:32:02 2021] handle_mm_fault+0xcc/0x240
[Wed Dec 29 16:32:02 2021] __do_page_fault+0x281/0x4b0
[Wed Dec 29 16:32:02 2021] do_page_fault+0x2e/0xe0
[Wed Dec 29 16:32:02 2021] ? async_page_fault+0x2f/0x50
[Wed Dec 29 16:32:02 2021] do_async_page_fault+0x51/0x80
[Wed Dec 29 16:32:02 2021] async_page_fault+0x45/0x50
[Wed Dec 29 16:32:02 2021] RIP: 0033:0x45755c
[Wed Dec 29 16:32:02 2021] RSP: 002b:00007fd0796e9d08 EFLAGS: 00010202
[Wed Dec 29 16:32:02 2021] RAX: 0000000000000000 RBX: 000000000229ad80 RCX: 0000000000000000
[Wed Dec 29 16:32:02 2021] RDX: 0000000000000002 RSI: 000000000229add0 RDI: 00007fd0796e9d30
[Wed Dec 29 16:32:02 2021] RBP: 00007fd0796e9db0 R08: 0000000002275268 R09: 0000000000000000
[Wed Dec 29 16:32:02 2021] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000229ad80
[Wed Dec 29 16:32:02 2021] R13: 00007fd0796e9d30 R14: 00007fd07a61bd98 R15: 00007fd07a61bd80
[Wed Dec 29 16:32:02 2021] Mem-Info:
[Wed Dec 29 16:32:02 2021] active_anon:942179 inactive_anon:1369 isolated_anon:0
active_file:253 inactive_file:906 isolated_file:1
unevictable:0 dirty:2 writeback:22 unstable:0
slab_reclaimable:13154 slab_unreclaimable:12051
mapped:1721 shmem:1502 pagetables:3451 bounce:0
free:19986 free_pcp:0 free_cma:0
[Wed Dec 29 16:32:02 2021] Node 0 active_anon:3768716kB inactive_anon:5476kB active_file:1012kB inactive_file:3624kB unevictable:0kB isolated(anon):0kB isolated(file):4kB mapped:6884kB dirty:8kB writeback:88kB shmem:6008kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[Wed Dec 29 16:32:02 2021] Node 0 DMA free:15792kB min:268kB low:332kB high:396kB active_anon:116kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[Wed Dec 29 16:32:02 2021] lowmem_reserve[]: 0 2926 3884 3884 3884
[Wed Dec 29 16:32:02 2021] Node 0 DMA32 free:54240kB min:50704kB low:63380kB high:76056kB active_anon:2894188kB inactive_anon:276kB active_file:1384kB inactive_file:3724kB unevictable:0kB writepending:8kB present:3129216kB managed:3041700kB mlocked:0kB kernel_stack:6256kB pagetables:8864kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[Wed Dec 29 16:32:02 2021] lowmem_reserve[]: 0 0 958 958 958
[Wed Dec 29 16:32:02 2021] Node 0 Normal free:9912kB min:16608kB low:20760kB high:24912kB active_anon:874412kB inactive_anon:5200kB active_file:12kB inactive_file:8kB unevictable:0kB writepending:0kB present:1048576kB managed:981540kB mlocked:0kB kernel_stack:2832kB pagetables:4940kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[Wed Dec 29 16:32:02 2021] lowmem_reserve[]: 0 0 0 0 0
[Wed Dec 29 16:32:02 2021] Node 0 DMA: 2*4kB (UM) 1*8kB (M) 0*16kB 1*32kB (U) 2*64kB (U) 2*128kB (UM) 2*256kB (UM) 1*512kB (M) 2*1024kB (UM) 0*2048kB 3*4096kB (ME) = 15792kB
[Wed Dec 29 16:32:02 2021] Node 0 DMA32: 734*4kB (UME) 772*8kB (UME) 496*16kB (UME) 405*32kB (UME) 149*64kB (UME) 35*128kB (UME) 39*256kB (UME) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54008kB
[Wed Dec 29 16:32:02 2021] Node 0 Normal: 874*4kB (UM) 384*8kB (UM) 153*16kB (UM) 24*32kB (M) 2*64kB (M) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 9912kB
[Wed Dec 29 16:32:02 2021] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[Wed Dec 29 16:32:02 2021] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Wed Dec 29 16:32:02 2021] 2665 total pagecache pages
[Wed Dec 29 16:32:02 2021] 0 pages in swap cache
[Wed Dec 29 16:32:02 2021] Swap cache stats: add 0, delete 0, find 0/0
[Wed Dec 29 16:32:02 2021] Free swap = 0kB
[Wed Dec 29 16:32:02 2021] Total swap = 0kB
[Wed Dec 29 16:32:02 2021] 1048446 pages RAM
[Wed Dec 29 16:32:02 2021] 0 pages HighMem/MovableOnly
[Wed Dec 29 16:32:02 2021] 38659 pages reserved
[Wed Dec 29 16:32:02 2021] 0 pages cma reserved
[Wed Dec 29 16:32:02 2021] 0 pages hwpoisoned
[Wed Dec 29 16:32:02 2021] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[Wed Dec 29 16:32:02 2021] [ 226] 0 226 31650 1969 270336 0 0 systemd-journal
[Wed Dec 29 16:32:02 2021] [ 273] 0 273 11820 752 118784 0 -1000 systemd-udevd
[Wed Dec 29 16:32:02 2021] [ 490] 100 490 20023 186 184320 0 0 systemd-network
[Wed Dec 29 16:32:02 2021] [ 509] 101 509 17699 197 184320 0 0 systemd-resolve
[Wed Dec 29 16:32:02 2021] [ 620] 0 620 42599 1946 237568 0 0 networkd-dispat
[Wed Dec 29 16:32:02 2021] [ 623] 0 623 17677 208 176128 0 0 systemd-logind
[Wed Dec 29 16:32:02 2021] [ 625] 0 625 71882 208 192512 0 0 accounts-daemon
[Wed Dec 29 16:32:02 2021] [ 629] 0 629 7085 51 94208 0 0 atd
[Wed Dec 29 16:32:02 2021] [ 631] 0 631 7832 72 114688 0 0 cron
[Wed Dec 29 16:32:02 2021] [ 638] 102 638 65761 475 163840 0 0 rsyslogd
[Wed Dec 29 16:32:02 2021] [ 640] 103 640 12512 168 143360 0 -900 dbus-daemon
[Wed Dec 29 16:32:02 2021] [ 685] 0 685 244701 5305 323584 0 -999 containerd
[Wed Dec 29 16:32:02 2021] [ 690] 0 690 18077 190 184320 0 -1000 sshd
[Wed Dec 29 16:32:02 2021] [ 702] 108 702 26431 120 118784 0 0 chronyd
[Wed Dec 29 16:32:02 2021] [ 743] 0 743 3991 33 77824 0 0 agetty
[Wed Dec 29 16:32:02 2021] [ 744] 0 744 4047 34 77824 0 0 agetty
[Wed Dec 29 16:32:02 2021] [ 885] 0 885 320386 10948 548864 0 -500 dockerd
[Wed Dec 29 16:32:02 2021] [ 1469] 0 1469 14156 378 122880 0 0 ilogtail
[Wed Dec 29 16:32:02 2021] [ 1471] 0 1471 61793 1101 241664 0 0 ilogtail
[Wed Dec 29 16:32:02 2021] [ 1478] 105 1478 7037 45 102400 0 0 uuidd
[Wed Dec 29 16:32:02 2021] [ 3574] 0 3574 7996 66 94208 0 -1000 auditd
[Wed Dec 29 16:32:02 2021] [ 6881] 0 6881 514 36 32768 0 0 monitor
[Wed Dec 29 16:32:02 2021] [ 3152] 0 3152 116912 69016 622592 0 0 agent
[Wed Dec 29 16:32:02 2021] [22659] 0 22659 201531 1142 61440 0 0 aliyun-service
[Wed Dec 29 16:32:02 2021] [23149] 0 23149 18957 7100 159744 0 0 AliYunDunUpdate
[Wed Dec 29 16:32:02 2021] [ 2788] 0 2788 39771 5713 339968 0 0 AliYunDun
[Wed Dec 29 16:32:02 2021] [18047] 0 18047 28344 390 86016 0 -998 containerd-shim
[Wed Dec 29 16:32:02 2021] [18086] 0 18086 1386080 824350 7839744 0 0 java
[Wed Dec 29 16:32:02 2021] [19513] 0 19513 26998 256 249856 0 0 sshd
[Wed Dec 29 16:32:02 2021] [19515] 0 19515 19174 259 192512 0 0 systemd
[Wed Dec 29 16:32:02 2021] [19516] 0 19516 27402 626 241664 0 0 (sd-pam)
[Wed Dec 29 16:32:02 2021] [19560] 0 19560 5664 395 94208 0 0 bash
[Wed Dec 29 16:32:02 2021] [21234] 0 21234 211408 6318 311296 0 0 docker
[Wed Dec 29 16:32:02 2021] [21758] 0 21758 11820 753 106496 0 0 systemd-udevd
[Wed Dec 29 16:32:02 2021] [21759] 0 21759 11820 753 106496 0 0 systemd-udevd
[Wed Dec 29 16:32:02 2021] [21762] 0 21762 11820 753 106496 0 0 systemd-udevd
[Wed Dec 29 16:32:02 2021] [21763] 0 21763 11820 753 106496 0 0 systemd-udevd
[Wed Dec 29 16:32:02 2021] [21765] 0 21765 713 5 32768 0 0 assist_daemon
[Wed Dec 29 16:32:02 2021] [21766] 0 21766 1159 22 53248 0 0 sh
[Wed Dec 29 16:32:02 2021] [21767] 0 21767 11820 753 106496 0 0 systemd-udevd
[Wed Dec 29 16:32:02 2021] [21768] 0 21768 11820 753 106496 0 0 systemd-udevd
[Wed Dec 29 16:32:02 2021] [21769] 0 21769 11820 753 106496 0 0 systemd-udevd
[Wed Dec 29 16:32:02 2021] [21770] 0 21770 14006 62 139264 0 0 systemctl
[Wed Dec 29 16:32:02 2021] Out of memory: Kill process 18086 (java) score 818 or sacrifice child
[Wed Dec 29 16:32:02 2021] Killed process 18086 (java) total-vm:5544320kB, anon-rss:3297400kB, file-rss:0kB, shmem-rss:0kB
[Wed Dec 29 16:32:02 2021] oom_reaper: reaped process 18086 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

根据日志的时间,可以明确自 Java 进程重启后,应用恢复了正常,即释放了内存空间。在重启之前,内存在耗尽边缘时,猜测页面大量换入换出导致磁盘读取被打满,最后 Linux OOM killer 将 Java 进程结束,而此时 JVM 中并未发生 OOM,是因为无法物理内存不足导致 Java 进行被操作系统结束,所以我们配置的 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/shared/heapdump.hprof 是不会自动生成 JVM 的堆转储的,也可以参考以下这两个链接:

关于问题的原因初步估计还是应用存在内存泄漏导致,或者应用里存在大量对堆外内存的使用?等待下一次问题问题发生时拉取一次堆转储具体看看。我用 stress-ng 模拟内存压力大的场景可以复现该问题,表现一模一样,即磁盘读取和 IOPS 达到上限,命令行几乎无法操作,最后触发 Linux OOM killer 将 Java 进程结束,随后恢复正常。只是在之前的场景中是什么导致了内存压力大还需要问题复现后排查。

Reference

unix - What do ‘real’, ‘user’ and ‘sys’ mean in the output of time(1)? - Stack Overflow
Eliminating Large JVM GC Pauses Caused by Background IO Traffic | LinkedIn Engineering
UNDERSTANDING G1 GC LOG FORMAT – GC easy – Universal Java GC Log Analyser
memory - High IO rate when the system out of RAM - Unix & Linux Stack Exchange
Troubleshooting High I/O Wait in Linux | Medium
testing - How to fill 90% of the free memory? - Unix & Linux Stack Exchange
Kernel/Reference/stress-ng - Ubuntu Wiki