Kubernetes: Creating a Cluster with kubeadm

Environment

CentOS-7-x86_64-DVD-1810

Docker 19.03.9

Kubernetes version: v1.20.5

Before You Begin

  • One or more Linux machines capable of installing deb or rpm packages
  • At least 2 GB of RAM per machine
  • At least 2 CPU cores on the machine that will act as the control-plane node
  • Full network connectivity between all machines in the cluster

Objectives

  • Install a single control-plane Kubernetes cluster
  • Install a Pod network on the cluster so that Pods can communicate with each other

Installation Guide

Install Docker

The detailed installation steps are omitted here.

Note: when installing Docker, choose a version supported by Kubernetes (see below). If the installed Docker version is too new, the following warning appears:

[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.5. Latest validated version: 19.03

Specify the version when installing Docker:

sudo yum install docker-ce-19.03.9 docker-ce-cli-19.03.9 containerd.io
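
If the docker-ce yum repository is not configured yet, one possible setup is sketched below (the Aliyun docker-ce mirror URL is an assumption matching the mirrors used elsewhere in this article); after this, the install command above works as shown, and Docker can be enabled at boot:

# yum install -y yum-utils
# yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo   # assumed mirror URL
# systemctl enable --now docker   # run after the install command above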

If Docker is not installed at all, kubeadm init reports the following errors:

cannot automatically set CgroupDriver when starting the Kubelet: cannot execute 'docker info -f {{.CgroupDriver}}': executable file not found in $PATH

[preflight] WARNING: Couldn't create the interface used for talking to the container runtime: docker is required for container runtime: exec: "docker": executable file not found in $PATH

Install kubeadm

If kubeadm is not installed yet, install it first. If it is already installed, update it to the latest version with apt-get update && apt-get upgrade or yum update.

Note: while kubeadm is being upgraded, kubelet restarts every few seconds; this is expected.
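
For reference, a minimal sketch of installing kubeadm, kubelet, and kubectl on CentOS 7 is shown below. The Aliyun Kubernetes yum repository and the pinned 1.20.5 versions are assumptions chosen to match this article's environment; gpgcheck is disabled only to keep the sketch short.

# cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=0
EOF
# yum install -y kubelet-1.20.5 kubeadm-1.20.5 kubectl-1.20.5
# systemctl enable --now kubelet   # kubelet crash-loops until kubeadm init/join runs, as noted above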

Other Prerequisites

Disable the firewall

# systemctl stop firewalld && systemctl disable firewalld

Run the command above to stop and disable the firewall; otherwise kubeadm init reports the following warning:

[WARNING Firewalld]: firewalld is active, please ensure ports [6443 10250] are open or your cluster may not function correctly
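
If disabling firewalld entirely is not acceptable in your environment, an alternative (not used in this article) is to open just the ports named in the warning; additional ports may be needed depending on your add-ons:

# firewall-cmd --permanent --add-port=6443/tcp    # Kubernetes API server
# firewall-cmd --permanent --add-port=10250/tcp   # kubelet API
# firewall-cmd --reload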

Modify /etc/docker/daemon.json

Edit /etc/docker/daemon.json and add the following content:

{
  "exec-opts": ["native.cgroupdriver=systemd"]
}

Then restart Docker with systemctl restart docker.

If this step is skipped, kubeadm init reports:

[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
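
After restarting Docker, the active cgroup driver can be checked directly; this is the same query kubeadm runs, and it should now print systemd:

# docker info -f '{{.CgroupDriver}}'
systemd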

Install the socat and conntrack dependencies

# yum install socat conntrack-tools

If these dependencies are not installed, kubeadm init reports:

[WARNING FileExisting-socat]: socat not found in system path
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR FileExisting-conntrack]: conntrack not found in system path

Set net.ipv4.ip_forward to 1

Set net.ipv4.ip_forward to 1 as follows:

# sysctl -w net.ipv4.ip_forward=1
net.ipv4.ip_forward = 1  

Note: when net.ipv4.ip_forward is 0, packet forwarding is disabled; when it is 1, forwarding is allowed. If the value is not 1, kubeadm init reports:

[ERROR FileContent--proc-sys-net-ipv4-ip_forward]: /proc/sys/net/ipv4/ip_forward contents are not set to 1

The setting above only takes effect temporarily. To keep it from being lost after a reboot, also do the following:

# echo "net.ipv4.ip_forward=1" >> /etc/sysctl.conf

Note: some articles recommend the following approach for a persistent setting, but in my testing it did not work:

# echo "sysctl -w net.ipv4.ip_forward=1" >> /etc/rc.local 
# chmod +x /etc/rc.d/rc.local

Set net.bridge.bridge-nf-call-iptables to 1

Use the same approach as for net.ipv4.ip_forward; a consolidated sketch is shown below.
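
A consolidated sketch of both kernel settings, written to a dedicated drop-in file so they survive reboots (the file name k8s.conf is arbitrary):

# modprobe br_netfilter                  # the bridge-nf key only exists once this module is loaded
# cat <<EOF > /etc/sysctl.d/k8s.conf
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
# sysctl --system                        # apply all sysctl configuration files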

Note: all of the steps above must be performed on every node in the cluster.

Initialize the Control-Plane Node

The machine that runs the control-plane components is called the control-plane node; these components include etcd (the cluster database) and the API Server (which the kubectl command-line tool talks to).

1. (Recommended) If you plan to upgrade this single control-plane kubeadm cluster to high availability later, pass the --control-plane-endpoint option to kubeadm init to set a shared endpoint for all control-plane nodes. The endpoint can be a DNS name or the IP address of a load balancer.

2. Choose a network add-on and check whether it requires any options to be passed to kubeadm init. For example, flannel requires the --pod-network-cidr option.

3. (Optional) Since v1.14, kubeadm automatically detects the container runtime. If you need to use a different runtime, or if more than one runtime is installed, pass the --cri-socket option to kubeadm init.

4. (Optional) Unless told otherwise, kubeadm uses the network interface associated with the default gateway to set the advertise address of this control-plane node's API server. To use a different interface, pass --apiserver-advertise-address=<ip-address> to kubeadm init. To deploy an IPv6 Kubernetes cluster, pass an IPv6 address via --apiserver-advertise-address, for example --apiserver-advertise-address=fd00::101.

5. (Optional) Run kubeadm config images pull before kubeadm init to verify connectivity to the gcr.io container image registry (see the sketch below for pulling from a mirror instead).
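
Because gcr.io is often unreachable from mainland China, the images can instead be pre-pulled from the Aliyun mirror; a sketch matching the init command used below:

# kubeadm config images pull --image-repository=registry.aliyuncs.com/google_containers --kubernetes-version=v1.20.5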

Run kubeadm init with the options shown below to initialize the control-plane node. The command first runs a series of preflight checks to make sure the machine is ready to run Kubernetes; if the checks find errors it exits, otherwise it continues, downloading and installing the cluster control-plane components. This may take several minutes.

# kubeadm init --image-repository=registry.aliyuncs.com/google_containers --kubernetes-version stable  --pod-network-cidr=10.244.0.0/16
[init] Using Kubernetes version: v1.20.5
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local localhost.localdomain] and IPs [10.96.0.1 10.118.80.93]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost localhost.localdomain] and IPs [10.118.80.93 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost localhost.localdomain] and IPs [10.118.80.93 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[apiclient] All control plane components are healthy after 89.062309 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.20" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node localhost.localdomain as control-plane by adding the labels "node-role.kubernetes.io/master=''" and "node-role.kubernetes.io/control-plane='' (deprecated)"
[mark-control-plane] Marking the node localhost.localdomain as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: 1sh85v.surdstc5dbrmp1s2
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 10.118.80.93:6443 --token ap4vvq.8xxcc0uea7dxbjlo \     
    --discovery-token-ca-cert-hash sha256:c4493c04d789463ecd25c97453611a9dfacb36f4d14d5067464832b9e9c5039a

As shown above, the message "Your Kubernetes control-plane has initialized successfully!" together with the follow-up instructions tells us that the control-plane node was initialized successfully.

Notes:

1. If you do not use the --image-repository option to point at the Aliyun mirror, you may get errors similar to:

failed to pull image "k8s.gcr.io/kube-apiserver:v1.20.5": output: Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
, error: exit status 1

2. Because flannel is used as the network add-on, --pod-network-cidr must be specified; otherwise the coredns-xxxxxxxxxx-xxxxx Pods never start and stay in ContainerCreating. Describing such a Pod shows errors like:

networkPlugin cni failed to set up pod "coredns-7f89b7bc75-9vrrl_kube-system" network: open /run/flannel/subnet.env: no such file or directory

3. The --pod-network-cidr value, i.e. the Pod network, must not be the same as the host network; otherwise, after flannel is installed, duplicate routes are created and tools such as XShell can no longer SSH into the host. For example:

In this environment the host network is 10.118.80.0/24 on interface ens33, so a value like the following must be avoided:

--pod-network-cidr=10.118.80.0/24

4. Also note that the --pod-network-cidr value must match the net-conf.json Network key in kube-flannel.yml (in this example the key is 10.244.0.0/16, as shown below, so --pod-network-cidr is set to 10.244.0.0/16 when running kubeadm init):

# cat kube-flannel.yml|grep -E "^\s*\"Network"
      "Network": "10.244.0.0/16",

On my first attempt I set --pod-network-cidr=10.1.15.0/24 without changing the Network key in kube-flannel.yml, and nodes newly added to the cluster could not obtain a pod CIDR automatically:

# kubectl get pods --all-namespaces
NAMESPACE              NAME                                            READY   STATUS             RESTARTS   AGE
kube-system   kube-flannel-ds-psts8                           0/1     CrashLoopBackOff   62         15h
... (output truncated)
# kubectl -n kube-system logs kube-flannel-ds-psts8
... (output truncated)
E0325 01:03:08.190986       1 main.go:292] Error registering network: failed to acquire lease: node "k8snode1" pod cidr not assigned
W0325 01:03:08.192875       1 reflector.go:424] github.com/coreos/flannel/subnet/kube/kube.go:300: watch of *v1.Node ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding
I0325 01:03:08.193782       1 main.go:371] Stopping shutdownHandler...

I later tried changing the net-conf.json Network key in kube-flannel.yml to 10.1.15.0/24, but the error stayed the same (download kube-flannel.yml first, edit the configuration, then install the network add-on).
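
For reference, a sketch of that download-edit-apply sequence; 10.1.15.0/24 is only the CIDR from the example above, so substitute whatever value was passed to --pod-network-cidr:

# wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
# sed -i 's#10.244.0.0/16#10.1.15.0/24#' kube-flannel.yml   # make the Network key match --pod-network-cidr
# kubectl apply -f kube-flannel.yml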

For the node "xxxxxx" pod cidr not assigned problem above, a temporary workaround is also mentioned online (not verified by me): manually assign a podCIDR to the node with the following command:

kubectl patch node <NODE_NAME> -p '{"spec":{"podCIDR":"<SUBNET>"}}'

5. As the output suggests, run the following commands so that a non-root user can also use kubectl:

# mkdir -p $HOME/.kube
# sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
# sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, as the root user you can run:

export KUBECONFIG=/etc/kubernetes/admin.conf

Save the kubeadm join command from the kubeadm init output; it is needed later to add nodes to the cluster.

The token is used for mutual authentication between the control-plane node and joining nodes. Keep it safe: anyone who has it can add authenticated nodes to the cluster. Tokens can be listed, created, and deleted with the kubeadm token command; see the kubeadm reference guide for details.

Install a Pod Network Add-on

**You must deploy a Container Network Interface (CNI) based Pod network add-on so that Pods can communicate with each other. Cluster DNS (CoreDNS) does not start until a Pod network is installed.**

  • Note that the Pod network must not overlap with any host network; if it does, problems will occur. If the network add-on's preferred Pod network conflicts with some of your host networks, choose a suitable CIDR block instead, pass it to kubeadm init via --pod-network-cidr, and replace the network in the add-on's YAML accordingly.
  • By default, kubeadm sets up the cluster to enforce RBAC (role-based access control). Make sure your Pod network add-on, and any manifests you deploy with it, support RBAC.
  • If you want the cluster to use IPv6, either dual-stack or single-stack IPv6 networking, make sure the add-on supports IPv6 (IPv6 support was added in CNI v0.6.0). Several projects use CNI to provide Kubernetes networking, and some of them also support Network Policy; a list of add-ons that implement the Kubernetes networking model is available at:

https://kubernetes.io/docs/concepts/cluster-administration/networking/#how-to-implement-the-kubernetes-networking-model

A Pod network add-on can be installed on the control-plane node, or on any machine that has the kubeconfig credentials, by running the command below. The add-on is installed as a DaemonSet and writes its configuration files to /etc/cni/net.d:

kubectl apply -f <add-on.yaml>

Installing the flannel network add-on

Deploy flannel manually (Kubernetes v1.17+):

# kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created

Reference: https://github.com/flannel-io/flannel#flannel

Only one Pod network can be installed per cluster. After the Pod network has been installed, run kubectl get pods --all-namespaces and check whether the coredns-xxxxxxxxxx-xxx Pods are Running to confirm that the network is working.
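
A quick check, assuming the default k8s-app=kube-dns label carried by the CoreDNS Pods:

# kubectl get pods -n kube-system -l k8s-app=kube-dns   # the coredns Pods should be Running
# kubectl get nodes                                     # nodes turn Ready once the network is up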

Check the flannel subnet configuration:

# cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true

After the flannel add-on is installed, two virtual network interfaces are automatically added on the host: cni0 and flannel.1

# ifconfig -a
cni0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.244.0.1  netmask 255.255.255.0  broadcast 10.244.0.255
        inet6 fe80::705d:43ff:fed6:80c9  prefixlen 64  scopeid 0x20<link>
        ether 72:5d:43:d6:80:c9  txqueuelen 1000  (Ethernet)
        RX packets 312325  bytes 37811297 (36.0 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 356346  bytes 206539626 (196.9 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
        inet6 fe80::42:e1ff:fec3:8b6a  prefixlen 64  scopeid 0x20<link>
        ether 02:42:e1:c3:8b:6a  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3  bytes 266 (266.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.118.80.93  netmask 255.255.255.0  broadcast 10.118.80.255
        inet6 fe80::6ff9:dbee:6b27:1315  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:d3:3b:ef  txqueuelen 1000  (Ethernet)
        RX packets 2092903  bytes 1103282695 (1.0 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 969483  bytes 253273828 (241.5 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.244.0.0  netmask 255.255.255.255  broadcast 10.244.0.0
        inet6 fe80::a49a:2ff:fe38:3e4b  prefixlen 64  scopeid 0x20<link>
        ether a6:9a:02:38:3e:4b  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 8 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 30393748  bytes 5921348235 (5.5 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 30393748  bytes 5921348235 (5.5 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Re-initialize the Control-Plane Node

During my testing an option had been configured incorrectly, which was only discovered after the network add-on was installed, so kubeadm init had to be run again. The actual steps were:

# kubeadm reset
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
[reset] Removing info for node "localhost.localdomain" from the ConfigMap "kubeadm-config" in the "kube-system" Namespace
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/var/lib/etcd /var/lib/kubelet /var/lib/dockershim /var/run/kubernetes /var/lib/cni]

The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
# rm -rf /etc/cni/net.d
# rm -f $HOME/.kube/config
# 

After running the commands above, repeat the "Initialize the control-plane node" steps and reinstall the network add-on.

Problems Encountered

After re-running kubeadm init, kubectl get pods --all-namespaces showed the coredns-xxxxxxxxxx-xxxxxx Pods stuck in ContainerCreating:

# kubectl get pods --all-namespaces
NAMESPACE     NAME                                            READY   STATUS              RESTARTS   AGE
kube-system   coredns-7f89b7bc75-pxvdx                        0/1     ContainerCreating   0          8m33s
kube-system   coredns-7f89b7bc75-v4p57                        0/1     ContainerCreating   0          8m33s
kube-system   etcd-localhost.localdomain                      1/1     Running             0          8m49s
... (output truncated)

Running kubectl describe pod coredns-7f89b7bc75-pxvdx -n kube-system to inspect the Pod revealed the following error:

Warning  FailedCreatePodSandBox  98s (x4 over 103s)    kubelet            (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "04434c63cdf067e698a8a927ba18e5013d2a1a21afa642b3cddedd4ff4592178" network for pod "coredns-7f89b7bc75-pxvdx": networkPlugin cni failed to set up pod "coredns-7f89b7bc75-pxvdx_kube-system" network: failed to set bridge addr: "cni0" already has an IP address different from 10.1.15.1/24

Checking the network interfaces, as shown below, reveals that cni0 still has the IP address assigned during the previous network add-on installation, so this time the add-on fails to set an IP on it.

# ifconfig -a
cni0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 10.118.80.1  netmask 255.255.255.0  broadcast 10.118.80.255
        inet6 fe80::482d:65ff:fea6:32fd  prefixlen 64  scopeid 0x20<link>
        ether 4a:2d:65:a6:32:fd  txqueuelen 1000  (Ethernet)
        RX packets 267800  bytes 16035849 (15.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 116238  bytes 10285959 (9.8 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

... (output truncated)
flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.1.15.0  netmask 255.255.255.255  broadcast 10.1.15.0
        inet6 fe80::a49a:2ff:fe38:3e4b  prefixlen 64  scopeid 0x20<link>
        ether a6:9a:02:38:3e:4b  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 8 overruns 0  carrier 0  collisions 0
... (output truncated)

The fix is to delete the misconfigured cni0 interface; it is recreated automatically and the problem goes away:

$ sudo ifconfig cni0 down    
$ sudo ip link delete cni0

Control-Plane Node Isolation (Optional)

By default, for security reasons, the cluster does not schedule Pods on the control-plane node. If you want Pods to be scheduled on the control-plane node, for example for a single-machine Kubernetes cluster used for development, run the following command:

kubectl taint nodes --all node-role.kubernetes.io/master- # remove the taint from every node that carries a node-role.kubernetes.io/master label

In practice:

# kubectl get nodes
NAME                    STATUS   ROLES                  AGE   VERSION
localhost.localdomain   Ready    control-plane,master   63m   v1.20.5
# kubectl taint nodes --all node-role.kubernetes.io/master-
node/localhost.localdomain untainted

Add Nodes to the Cluster

Change the new node's hostname

# hostname
localhost.localdomain
# hostname k8sNode1

Changing the hostname with the command above is only temporary; to make it survive a reboot, edit /etc/hostname and replace the default localhost.localdomain with the target name (k8sNode1 in this example). If the hostname cannot be resolved, later steps report warnings like:

[WARNING Hostname]: hostname "k8sNode1" could not be reached
	[WARNING Hostname]: hostname "k8sNode1": lookup k8sNode1 on 223.5.5.5:53: read udp 10.118.80.94:33293->223.5.5.5:53: i/o timeout

Edit /etc/hosts and add a mapping from the node's hostname to its IP address (10.118.80.94 in this example):

# vi /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.118.80.94   k8sNode1
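
Alternatively, on CentOS 7 the hostname can be set persistently in a single step with hostnamectl, which writes /etc/hostname and applies the change immediately:

# hostnamectl set-hostname k8sNode1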

SSH into the target node, switch to the root user (if logged in as a non-root user), and run the kubeadm join command printed by kubeadm init on the control-plane machine, in the form:

kubeadm join --token <token> <control-plane-host>:<control-plane-port> --discovery-token-ca-cert-hash sha256:<hash>

Existing, unexpired tokens can be listed on the control-plane machine with:

# kubeadm token list

If there is no valid token left, generate a new one on the control-plane machine with:

# kubeadm token create
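
If the original join command has been lost, a new token and the full join command can be printed in one step:

# kubeadm token create --print-join-command   # prints a ready-to-run 'kubeadm join ...' line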

In practice:

# kubeadm join 10.118.80.93:6443 --token ap4vvq.8xxcc0uea7dxbjlo     --discovery-token-ca-cert-hash sha256:c4493c04d789463ecd25c97453611a9dfacb36f4d14d5067464832b9e9c5039a
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

On the control-plane (master) machine, check whether the node has been added:

# kubectl get nodes
NAME                    STATUS     ROLES                  AGE     VERSION
k8snode1                NotReady   <none>                 74s     v1.20.5
localhost.localdomain   Ready      control-plane,master   7h24m   v1.20.5

As shown above, a node named k8snode1 has been added.

Problems Encountered

Problem 1: kubeadm join fails with the following error:

# kubeadm join 10.118.80.93:6443 --token ap4vvq.8xxcc0uea7dxbjlo     --discovery-token-ca-cert-hash sha256:c4493c04d789463ecd25c97453611a9dfacb36f4d14d5067464832b9e9c5039a
[preflight] Running pre-flight checks
error execution phase preflight: couldn't validate the identity of the API Server: could not find a JWS signature in the cluster-info ConfigMap for token ID "ap4vvq"
To see the stack trace of this error execute with --v=5 or higher

Solution:

The token has expired; run kubeadm token create to generate a new one.

Problem 2: kubeadm join fails with the following error:

# kubeadm join 10.118.80.93:6443 --token pa0gxw.4vx2wud1e7e0rzbx  --discovery-token-ca-cert-hash sha256:c4493c04d789463ecd25c97453611a9dfacb36f4d14d5067464832b9e9c5039a
[preflight] Running pre-flight checks
error execution phase preflight: couldn't validate the identity of the API Server: cluster CA found in cluster-info ConfigMap is invalid: none of the public keys "sha256:8e2f94e2f4f1b66c45d941c0a7f72e328c242346360751b5c1cf88f437ab854f" are pinned
To see the stack trace of this error execute with --v=5 or higher

Solution:

The discovery-token-ca-cert-hash is no longer valid; run the following command to obtain the current value:

# openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
8e2f94e2f4f1b66c45d941c0a7f72e328c242346360751b5c1cf88f437ab854f

Use the hash from the output:

--discovery-token-ca-cert-hash sha256:8e2f94e2f4f1b66c45d941c0a7f72e328c242346360751b5c1cf88f437ab854f

Problem 3: cni config uninitialized

The Kubernetes dashboard showed the newly joined node in KubeletNotReady state with the following message:

[container runtime status check may not have completed yet, PLEG is not healthy: pleg has yet to be successful, runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized, CSINode is not yet initialized, missing node capacity for resources: ephemeral-storage]

Solution: reinstall the CNI plugins (virtual machines were used for this test, and the snapshot in use probably did not include the CNI plugins), then clean up the node and join it to the cluster again:

# CNI_VERSION="v0.8.2"
# mkdir -p /opt/cni/bin
# curl -L "https://github.com/containernetworking/plugins/releases/download/${CNI_VERSION}/cni-plugins-linux-amd64-${CNI_VERSION}.tgz" | sudo tar -C /opt/cni/bin -xz

Cleanup

If you used disposable servers for testing, you can simply power them off; no further cleanup is needed. You can use kubectl config delete-cluster to delete the local references to the cluster (not tested by me).

However, if you want to tear the cluster down more cleanly, you should first drain each node, make sure it is empty, and then remove it.

Remove a Node

On the control-plane node

First run the following command on the control-plane node to drain the node that is about to be removed:

kubectl drain <node name> --delete-emptydir-data --force --ignore-daemonsets

In practice:

# kubectl get nodes
NAME                    STATUS   ROLES                  AGE   VERSION
k8snode1                Ready    <none>                 82m   v1.20.5
localhost.localdomain   Ready    control-plane,master   24h   v1.20.5
# kubectl drain k8snode1 --delete-emptydir-data --force --ignore-daemonsets
node/k8snode1 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/kube-flannel-ds-4xqcc, kube-system/kube-proxy-c7qzs
evicting pod default/nginx-deployment-64859b8dcc-v5tcl
evicting pod default/nginx-deployment-64859b8dcc-qjrld
evicting pod default/nginx-deployment-64859b8dcc-rcvc8
pod/nginx-deployment-64859b8dcc-rcvc8 evicted
pod/nginx-deployment-64859b8dcc-qjrld evicted
pod/nginx-deployment-64859b8dcc-v5tcl evicted
node/k8snode1 evicted
# kubectl get nodes
NAME                    STATUS   ROLES                  AGE   VERSION
localhost.localdomain   Ready    control-plane,master   24h   v1.20.5

On the target node

Log in to the target node and run:

# kubeadm reset

The command above does not reset or clean up iptables rules or IPVS tables. To reset iptables, you also need to run:

iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X

To reset IPVS, run the following command:

ipvsadm -C

Note: do not reset the network unless you have a specific need to.

Delete the node's configuration files:

# rm -rf /etc/cni/net.d
# rm -f $HOME/.kube/config

On the control-plane node

Delete the node by running kubectl delete node <node name>:

### delete the Pods that were not removed automatically
# kubectl delete pod kube-flannel-ds-4xqcc -n kube-system --force
# kubectl delete pod kube-proxy-c7qzs -n kube-system --force
# kubectl delete node k8snode1
node "k8snode1" deleted

After removal, the node can be added back at any time by running kubeadm join with the appropriate arguments.

Clean Up the Control Plane

On the control-plane node, you can use the kubeadm reset command. See the kubeadm reset command reference for details.

Source: https://www.cnblogs.com/shouke/p/15318151.html