Poison

Kubernetes #66607

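After deploying some of our applications to a K8s cluster, we ran into a problem. Suppose application A and application B both run in the same K8s cluster, and application B sits behind a load balancer that exposes its HTTP interface to the public internet over HTTPS, with the load balancer decrypting the HTTPS traffic at layer 7. In this setup, when application A calls the load balancer's HTTPS endpoint, K8s routes the load balancer's IP directly to the nodes running application B. The traffic never passes through the load balancer, so HTTPS traffic is sent to the port of a plain HTTP service and the SSL handshake fails.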
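The routing behaviour described above can be illustrated with a small model. This is only a sketch under stated assumptions: the class name, method name, hop labels, and the placeholder address `203.0.113.10` are all hypothetical, and the code imitates the observed behaviour rather than reproducing anything from kube-proxy itself.

```java
import java.util.List;
import java.util.Set;

public class LbShortCircuitModel {
    // Hypothetical model: returns the hops a request from application A takes
    // when it targets dstIp, given the load balancer IPs that the node
    // handles locally.
    public static List<String> path(String dstIp, Set<String> lbIpsBoundOnNode) {
        if (lbIpsBoundOnNode.contains(dstIp)) {
            // A node-local rule matches the load balancer IP, so the request
            // goes straight to application B and is still TLS-encrypted when
            // it reaches the HTTP port.
            return List.of("app-a", "node-local rule", "app-b:http");
        }
        // Otherwise the request reaches the load balancer, which terminates
        // TLS and forwards plain HTTP to application B.
        return List.of("app-a", "load balancer (TLS termination)", "app-b:http");
    }

    public static void main(String[] args) {
        String lbIp = "203.0.113.10"; // documentation-range placeholder address
        System.out.println(path(lbIp, Set.of(lbIp))); // short-circuited path
        System.out.println(path(lbIp, Set.<String>of())); // path through the load balancer
    }
}
```

In the first call the TLS termination hop is missing, which is exactly the situation that produces the handshake errors shown in this post.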

Part of the exception stack trace on the Java side:

Caused by: javax.net.ssl.SSLException: Unsupported or unrecognized SSL message
at sun.security.ssl.SSLSocketInputRecord.handleUnknownRecord(SSLSocketInputRecord.java:448)
at sun.security.ssl.SSLSocketInputRecord.decode(SSLSocketInputRecord.java:174)
at sun.security.ssl.SSLTransport.decode(SSLTransport.java:110)
at sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1290)
at sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1199)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:401)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:373)

The error output from the curl command:

* TCP_NODELAY set
* Connected to lb_domain (39.103.236.188) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* error:1408F10B:SSL routines:ssl3_get_record:wrong version number
* stopped the pause stream!
* Closing connection 0
curl: (35) error:1408F10B:SSL routines:ssl3_get_record:wrong version number
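Both the Java exception and the curl error indicate that the client received bytes that are not a valid TLS record where it expected the server's handshake reply. A minimal sketch of that check follows, assuming the backend answers the ClientHello with a plain HTTP/1.x response (an assumption; the exact reply depends on the HTTP server). The class and method names are hypothetical; the only facts relied on are that a TLS record begins with a content type byte in the range 20 to 23 followed by a version whose major byte is 3, and that an HTTP/1.x response begins with the ASCII bytes `HTTP/`.

```java
import java.nio.charset.StandardCharsets;

public class FirstBytesCheck {
    // A TLS record header starts with a 1-byte content type (20..23)
    // followed by a 2-byte protocol version whose major byte is 0x03.
    public static boolean looksLikeTlsRecord(byte[] b) {
        if (b.length < 3) return false;
        int type = b[0] & 0xFF;
        int major = b[1] & 0xFF;
        return type >= 20 && type <= 23 && major == 3;
    }

    // A plain HTTP/1.x response starts with the ASCII bytes "HTTP/".
    public static boolean looksLikePlainHttp(byte[] b) {
        String head = new String(b, 0, Math.min(b.length, 5), StandardCharsets.US_ASCII);
        return head.equals("HTTP/");
    }

    public static String classify(byte[] b) {
        if (looksLikeTlsRecord(b)) return "tls";
        if (looksLikePlainHttp(b)) return "plain-http";
        return "unknown";
    }

    public static void main(String[] args) {
        // Illustrative inputs, not captured traffic.
        byte[] handshakeRecord = {0x16, 0x03, 0x03, 0x00, 0x7a};
        byte[] httpReply = "HTTP/1.1 400 Bad Request\r\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(classify(handshakeRecord)); // prints "tls"
        System.out.println(classify(httpReply));       // prints "plain-http"
    }
}
```

Running a check like this against the first bytes read from the socket is one quick way to confirm that the peer is speaking plain HTTP rather than TLS.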

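The route the traffic takes can be inspected with the traceroute command. For discussion of the problem and possible solutions, see the links below. Because the issue was still unresolved upstream at the time of writing, we ended up solving it by introducing an Ingress to handle traffic routing.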

Reference

DigitalOcean Kubernetes and SSL wrong version number error for the requests from inside a pod | by ismail yenigül | FAUN Publication
Why kube-proxy add external-lb’s address to node local iptables rule? · Issue #66607 · kubernetes/kubernetes · GitHub
enhancements/keps/sig-network/1860-kube-proxy-IP-node-binding at master · kubernetes/enhancements · GitHub
Add IP mode field to loadbalancer status ingress by Sh4d1 · Pull Request #97681 · kubernetes/kubernetes · GitHub