Renewing k8s certificates

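One day an application deployed on k8s started failing. Its logs showed that it could not communicate with the API server, for the following reason: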

certificate verify failed: certificate has expired

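Investigation showed that the certificates kubeadm issues for k8s are valid for one year. The official recommendation is to upgrade the cluster regularly, because the certificates are renewed as part of a version upgrade. If the cluster is not upgraded, the certificates have to be renewed manually.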

Certificate renewal steps

NOTE

  1. The following steps must be run on every Control Plane server.
  2. This example applies only to k8s clusters deployed with kubeadm.

Run the following command to check when the certificates expire:

bash
kubeadm certs check-expiration

The command returns the following:

[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[check-expiration] Error reading configuration from the Cluster. Falling back to default configuration

CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Sep 18, 2024 03:37 UTC   <invalid>       ca                      no      
apiserver                  Sep 18, 2024 03:37 UTC   <invalid>       ca                      no      
apiserver-etcd-client      Sep 18, 2024 03:37 UTC   <invalid>       etcd-ca                 no      
apiserver-kubelet-client   Sep 18, 2024 03:37 UTC   <invalid>       ca                      no      
controller-manager.conf    Sep 18, 2024 03:37 UTC   <invalid>       ca                      no      
etcd-healthcheck-client    Sep 18, 2024 03:37 UTC   <invalid>       etcd-ca                 no      
etcd-peer                  Sep 18, 2024 03:37 UTC   <invalid>       etcd-ca                 no      
etcd-server                Sep 18, 2024 03:37 UTC   <invalid>       etcd-ca                 no      
front-proxy-client         Sep 18, 2024 03:37 UTC   <invalid>       front-proxy-ca          no      
scheduler.conf             Sep 18, 2024 03:37 UTC   <invalid>       ca                      no

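Every certificate shows <invalid> in the RESIDUAL TIME column, meaning it has already expired. Run the following command to renew them: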

bash
kubeadm certs renew all

The command returns:

[renew] Reading configuration from the cluster...
[renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[renew] Error reading configuration from the Cluster. Falling back to default configuration

certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed
certificate for serving the Kubernetes API renewed
certificate the apiserver uses to access etcd renewed
certificate for the API server to connect to kubelet renewed
certificate embedded in the kubeconfig file for the controller manager to use renewed
certificate for liveness probes to healthcheck etcd renewed
certificate for etcd nodes to communicate with each other renewed
certificate for serving etcd renewed
certificate for the front proxy client renewed
certificate embedded in the kubeconfig file for the scheduler manager to use renewed

Done renewing certificates. You must restart the kube-apiserver, kube-controller-manager, kube-scheduler and etcd, so that they can use the new certificates.

Checking the expiry dates again gives the following result:

CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Sep 18, 2025 07:32 UTC   364d            ca                      no      
apiserver                  Sep 18, 2025 07:32 UTC   364d            ca                      no      
apiserver-etcd-client      Sep 18, 2025 07:32 UTC   364d            etcd-ca                 no      
apiserver-kubelet-client   Sep 18, 2025 07:32 UTC   364d            ca                      no      
controller-manager.conf    Sep 18, 2025 07:32 UTC   364d            ca                      no      
etcd-healthcheck-client    Sep 18, 2025 07:32 UTC   364d            etcd-ca                 no      
etcd-peer                  Sep 18, 2025 07:32 UTC   364d            etcd-ca                 no      
etcd-server                Sep 18, 2025 07:32 UTC   364d            etcd-ca                 no      
front-proxy-client         Sep 18, 2025 07:32 UTC   364d            front-proxy-ca          no      
scheduler.conf             Sep 18, 2025 07:32 UTC   364d            ca                      no

The certificates have been renewed.
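As a quick cross-check, the table printed by `kubeadm certs check-expiration` can be filtered for rows that are still marked `<invalid>`. This is only a minimal sketch: the function name `list_invalid_certs` is hypothetical, and it assumes the column layout shown above, which may differ between kubeadm versions.

```shell
# Hypothetical helper: read check-expiration output on stdin and print the
# name (first column) of every certificate row that contains "<invalid>".
list_invalid_certs() {
  awk '$1 != "CERTIFICATE" && /<invalid>/ { print $1 }'
}

# Usage on a control plane node (not run here):
#   kubeadm certs check-expiration | list_invalid_certs
```

An empty result suggests that no certificate in the table is still expired.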

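Finally, restart kube-apiserver, kube-controller-manager, kube-scheduler and etcd. On a kubeadm cluster they run as static Pods, so moving their manifests out of the manifests directory and back again makes the kubelet recreate them: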

bash
cd /etc/kubernetes/manifests/
mv *.yaml ../
# Wait for the kubelet to stop the static Pods (its file check interval defaults to 20 seconds)
cd ..
mv *.yaml manifests/
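The same restart can be sketched as a function that parks the manifests in a dedicated holding directory instead of the parent directory, so that only the files it moved out are moved back. The function name `restart_static_pods`, the `.parked` directory suffix and the 30-second default wait are all assumptions made for illustration, not kubeadm conventions.

```shell
# Hypothetical helper: restart static Pods by parking their manifests
# in a holding directory and then moving them back.
#   $1 = manifests directory (default: /etc/kubernetes/manifests)
#   $2 = seconds to wait while the kubelet stops the Pods (default: 30, an assumption)
restart_static_pods() {
  manifests_dir="${1:-/etc/kubernetes/manifests}"
  wait_seconds="${2:-30}"
  # Holding directory created next to the manifests directory (hypothetical name).
  parked_dir="${manifests_dir%/}.parked"

  mkdir -p "$parked_dir" || return 1
  # Move the manifests out; the kubelet then stops the corresponding Pods.
  mv "$manifests_dir"/*.yaml "$parked_dir"/ || return 1
  sleep "$wait_seconds"
  # Move them back; the kubelet recreates the Pods with the new certificates.
  mv "$parked_dir"/*.yaml "$manifests_dir"/ || return 1
  rmdir "$parked_dir"
}

# Usage on a control plane node (not run here):
#   restart_static_pods
```

Using a separate holding directory avoids sweeping any unrelated `.yaml` file from `/etc/kubernetes` into the manifests directory.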

etcd error

After the certificates were renewed, etcd stopped working properly and reported the following error:

Error creating pod: etcdserver: mvcc: database space exceeded

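This error means the etcd database has exceeded its space quota (2 GB by default), so the space has to be reclaimed manually by compacting and defragmenting the database.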

First, download etcdctl: https://github.com/etcd-io/etcd/releases/download/v3.4.34/etcd-v3.4.34-linux-amd64.tar.gz

After extracting the archive, run the following commands (the line printed after each command is its output):

bash
ETCDCTL_API=3 etcdctl \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/apiserver-etcd-client.crt \
  --key /etc/kubernetes/pki/apiserver-etcd-client.key \
  endpoint status   
127.0.0.1:2379, 715784118fc0c46d, 3.5.9, 2.1 GB, false, false, 62, 129354155, 129354155, memberID:8167141660220769389 alarm:NOSPACE

ETCDCTL_API=3 etcdctl \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/apiserver-etcd-client.crt \
  --key /etc/kubernetes/pki/apiserver-etcd-client.key \
  endpoint status  --write-out="json" | grep revision
[{"Endpoint":"127.0.0.1:2379","Status":{"header":{"cluster_id":16530211416065483436,"member_id":8167141660220769389,"revision":109753907,"raft_term":62},"version":"3.5.9","dbSize":2147610624,"leader":2066636333178759182,"raftIndex":129354091,"raftTerm":62,"raftAppliedIndex":129354091,"errors":["memberID:8167141660220769389 alarm:NOSPACE "],"dbSizeInUse":2772992}}]

ETCDCTL_API=3 etcdctl \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/apiserver-etcd-client.crt \
  --key /etc/kubernetes/pki/apiserver-etcd-client.key \
  compact 109753907
compacted revision 109753907

ETCDCTL_API=3 etcdctl \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/apiserver-etcd-client.crt \
  --key /etc/kubernetes/pki/apiserver-etcd-client.key \
  defrag
Finished defragmenting etcd member[127.0.0.1:2379]

ETCDCTL_API=3 etcdctl \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/apiserver-etcd-client.crt \
  --key /etc/kubernetes/pki/apiserver-etcd-client.key \
  endpoint status   
127.0.0.1:2379, 715784118fc0c46d, 3.5.9, 2.0 MB, false, false, 62, 129354611, 129354611, memberID:8167141660220769389 alarm:NOSPACE

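As the output shows, the database size was 2.1 GB before compaction and 2.0 MB afterwards. The last status line still reports alarm:NOSPACE, because etcd keeps the alarm raised until it is cleared with `etcdctl alarm disarm`.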
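The sequence above (read the current revision, compact, defragment) plus clearing the alarm can be sketched as a few shell functions. The names `ectl`, `extract_revision` and `reclaim_etcd_space` are hypothetical; the certificate paths are the ones used above, and the revision is pulled out with `grep` and `cut` on the assumption that `jq` is not installed.

```shell
# Hypothetical wrapper: etcdctl with the same TLS flags as the commands above.
ectl() {
  ETCDCTL_API=3 etcdctl \
    --cacert /etc/kubernetes/pki/etcd/ca.crt \
    --cert /etc/kubernetes/pki/apiserver-etcd-client.crt \
    --key /etc/kubernetes/pki/apiserver-etcd-client.key \
    "$@"
}

# Read endpoint-status JSON on stdin and print the first "revision" number.
extract_revision() {
  grep -o '"revision":[0-9][0-9]*' | head -n 1 | cut -d: -f2
}

# Hypothetical helper: compact up to the current revision, defragment,
# then clear the NOSPACE alarm.
reclaim_etcd_space() {
  rev=$(ectl endpoint status --write-out=json | extract_revision)
  [ -n "$rev" ] || { echo "could not read the current revision" >&2; return 1; }
  ectl compact "$rev" || return 1
  ectl defrag || return 1
  ectl alarm disarm
}

# Usage on a control plane node (not run here):
#   reclaim_etcd_space
```

Like the commands above, this sketch talks only to the default local endpoint. Defragmentation applies per member, so on a cluster with several etcd members it would need to be repeated for each one.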

For a detailed description of reclaiming etcd space, see the Maintenance page of the official documentation.