Deploying a Redis Cluster on k8s


Deploy a Redis cluster with multiple masters, each backed by its own slave.

Preparation


We use a StatefulSet to deploy this stateful service.

About StatefulSet


StatefulSet is a variant of Deployment for stateful services: it manages Pods with fixed names and a fixed start/stop order, and those Pods usually need shared persistent storage.
A Deployment is fronted by a regular Service.

A StatefulSet is fronted by a headless Service. The difference from a regular Service is that a headless Service has no Cluster IP: resolving its name returns the endpoint list of all Pods behind that headless Service.

On top of the headless Service, the StatefulSet also creates a DNS domain name for each of its Pods, in the format:

$(podname).$(headless service name)
FQDN: $(podname).$(headless service name).$(namespace).svc.cluster.local
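
For example, with the names used later in this article (StatefulSet redis-app, headless Service redis-service, namespace default), the first Pod is addressable as:

redis-app-0.redis-service
FQDN: redis-app-0.redis-service.default.svc.cluster.local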


In other words, for a stateful service it is best to address nodes by a fixed network identity such as a DNS name. This also needs support from the application itself (ZooKeeper, for example, lets you write host domain names into its configuration file).
A StatefulSet builds on a headless Service (a Service without a Cluster IP) to give each Pod a stable network identity (hostname and DNS records) that stays the same even after the Pod is rescheduled. Combined with PV/PVC, a StatefulSet also provides stable persistent storage: a rescheduled Pod can still reach its original data.
The architecture used below deploys Redis with a StatefulSet: every node, whether master or slave, is one replica of the StatefulSet, data is persisted through PVs, and the cluster is exposed as a Service that accepts client requests.

Deployment process


Steps to create a StatefulSet-based Redis cluster:

1. Create NFS storage
2. Create PVs
3. Create a ConfigMap
4. Create the headless Service
5. Create the Redis StatefulSet (the PVCs come from its volumeClaimTemplates)
6. Initialize the Redis cluster
7. Create a Service for client access

1. Create NFS storage


The NFS storage provides a stable backend for Redis, so that a Redis Pod still finds its original data after a restart or migration. We first set up NFS, then use PVs to mount remote NFS paths for Redis.

Install NFS

yum -y install nfs-utils   # main package, provides the NFS filesystem
yum -y install rpcbind     # provides the RPC protocol

Then create the /etc/exports file to define the paths to be shared:

cat > /etc/exports << EOF
/ssd/nfs/k8s/redis/pv1 192.168.10.0/24(rw,sync,no_root_squash)
/ssd/nfs/k8s/redis/pv2 192.168.10.0/24(rw,sync,no_root_squash)
/ssd/nfs/k8s/redis/pv3 192.168.10.0/24(rw,sync,no_root_squash)
/ssd/nfs/k8s/redis/pv4 192.168.10.0/24(rw,sync,no_root_squash)
/ssd/nfs/k8s/redis/pv5 192.168.10.0/24(rw,sync,no_root_squash)
/ssd/nfs/k8s/redis/pv6 192.168.10.0/24(rw,sync,no_root_squash)
 
EOF

Create the corresponding directories:

mkdir -p /ssd/nfs/k8s/redis/pv{1..6}

Next, start the rpcbind and NFS services:

systemctl restart rpcbind
systemctl restart nfs
systemctl enable nfs

Verify the exports:

[root@itrainning-149 ~]# exportfs -v
/ssd/nfs/logdmtm
		192.168.10.75(sync,wdelay,hide,no_subtree_check,sec=sys,rw,secure,root_squash,all_squash)
/ssd/nfs/logdmtm
		192.168.10.7(sync,wdelay,hide,no_subtree_check,sec=sys,rw,secure,root_squash,all_squash)
/ssd/nfs/k8s/redis/pv1
		192.168.10.0/24(sync,wdelay,hide,no_subtree_check,sec=sys,rw,secure,no_root_squash,no_all_squash)
/ssd/nfs/k8s/redis/pv2
		192.168.10.0/24(sync,wdelay,hide,no_subtree_check,sec=sys,rw,secure,no_root_squash,no_all_squash)
/ssd/nfs/k8s/redis/pv3
		192.168.10.0/24(sync,wdelay,hide,no_subtree_check,sec=sys,rw,secure,no_root_squash,no_all_squash)
/ssd/nfs/k8s/redis/pv4
		192.168.10.0/24(sync,wdelay,hide,no_subtree_check,sec=sys,rw,secure,no_root_squash,no_all_squash)
/ssd/nfs/k8s/redis/pv5
		192.168.10.0/24(sync,wdelay,hide,no_subtree_check,sec=sys,rw,secure,no_root_squash,no_all_squash)
/ssd/nfs/k8s/redis/pv6
		192.168.10.0/24(sync,wdelay,hide,no_subtree_check,sec=sys,rw,secure,no_root_squash,no_all_squash)
/ssd/nfs/logmetlife
		<world>(sync,wdelay,hide,no_subtree_check,sec=sys,rw,secure,root_squash,all_squash)

On the client machines

yum -y install nfs-utils

Check the shares exported by the storage server:

[root@work75 ~]# showmount -e 192.168.0.149
Export list for 192.168.0.149:
/ssd/nfs/logmetlife    *
/ssd/nfs/k8s/redis/pv6 192.168.10.0/24
/ssd/nfs/k8s/redis/pv5 192.168.10.0/24
/ssd/nfs/k8s/redis/pv4 192.168.10.0/24
/ssd/nfs/k8s/redis/pv3 192.168.10.0/24
/ssd/nfs/k8s/redis/pv2 192.168.10.0/24
/ssd/nfs/k8s/redis/pv1 192.168.10.0/24
/ssd/nfs/logdmtm       192.168.10.7,192.168.10.75
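
Optionally, sanity-check that a client can actually mount one of the shares before wiring it into Kubernetes (the mount point /mnt here is just for the test):

mount -t nfs 192.168.0.149:/ssd/nfs/k8s/redis/pv1 /mnt   # mount the first share
df -h /mnt                                               # confirm it is mounted
umount /mnt                                              # clean up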


2. Create PVs

Every Redis Pod needs its own PV to store its data, so we create a pv.yaml file containing 6 PVs:

cat > pv.yaml << EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv1
spec:
  capacity:
    storage: 200M
  accessModes:
    - ReadWriteMany
  nfs:
    server: 192.168.0.149
    path: "/ssd/nfs/k8s/redis/pv1"
 
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv2
spec:
  capacity:
    storage: 200M
  accessModes:
    - ReadWriteMany
  nfs:
    server: 192.168.0.149
    path: "/ssd/nfs/k8s/redis/pv2"
 
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv3
spec:
  capacity:
    storage: 200M
  accessModes:
    - ReadWriteMany
  nfs:
    server: 192.168.0.149
    path: "/ssd/nfs/k8s/redis/pv3"
 
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv4
spec:
  capacity:
    storage: 200M
  accessModes:
    - ReadWriteMany
  nfs:
    server: 192.168.0.149
    path: "/ssd/nfs/k8s/redis/pv4"
 
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv5
spec:
  capacity:
    storage: 200M
  accessModes:
    - ReadWriteMany
  nfs:
    server: 192.168.0.149
    path: "/ssd/nfs/k8s/redis/pv5"
 
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv6
spec:
  capacity:
    storage: 200M
  accessModes:
    - ReadWriteMany
  nfs:
    server: 192.168.0.149
    path: "/ssd/nfs/k8s/redis/pv6"
EOF
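
Apply the manifest, then check that all six PVs show up as Available before moving on:

kubectl apply -f pv.yaml
kubectl get pv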

3. Create a ConfigMap

Here we simply convert the Redis configuration file into a ConfigMap, which is a more convenient way to read configuration. The redis.conf file is as follows:

cat > redis.conf << EOF
appendonly yes
cluster-enabled yes
cluster-config-file /var/lib/redis/nodes.conf
cluster-node-timeout 5000
dir /var/lib/redis
port 6379
EOF

Create a ConfigMap named redis-conf:

kubectl create configmap redis-conf --from-file=redis.conf

Inspect the ConfigMap:

kubectl describe cm redis-conf

Name:         redis-conf
Namespace:    default
Labels:       <none>
Annotations:  <none>

Data
====
redis.conf:
----
appendonly yes
cluster-enabled yes
cluster-config-file /var/lib/redis/nodes.conf
cluster-node-timeout 5000
dir /var/lib/redis
port 6379

Events:  <none>

As shown above, every setting from redis.conf is stored in the redis-conf ConfigMap.


4. Create the headless Service


The headless Service is the basis of the StatefulSet's stable network identity, so it has to be created first. Prepare headless-service.yaml as follows:
 

[root@master redis]# cat headless-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: redis-service
  labels:
    app: redis
spec:
  ports:
  - name: redis-port
    port: 6379
  clusterIP: None
  selector:
    app: redis

Create it:

kubectl create -f headless-service.yaml
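
Verify it; a headless Service has no Cluster IP, which you can confirm with:

kubectl get svc redis-service

The CLUSTER-IP column should show None.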

5. Create the Redis StatefulSet

With the headless Service in place, we can use a StatefulSet to create the Redis cluster nodes, which is the core of this article. First create a redis.yaml file:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis-app
spec:
  serviceName: "redis-service"
  replicas: 6
  template:
    metadata:
      labels:
        app: redis
    spec:
      terminationGracePeriodSeconds: 20
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - redis
              topologyKey: kubernetes.io/hostname
      containers:
      - name: redis
        image: redis
        command:
          - "redis-server"
        args:
          - "/etc/redis/redis.conf"
          - "--protected-mode"
          - "no"
        resources:
          requests:
            cpu: "100m"
            memory: "100Mi"
        ports:
            - name: redis
              containerPort: 6379
              protocol: "TCP"
            - name: cluster
              containerPort: 16379
              protocol: "TCP"
        volumeMounts:
          - name: "redis-conf"
            mountPath: "/etc/redis"
          - name: "redis-data"
            mountPath: "/var/lib/redis"
      volumes:
      - name: "redis-conf"
        configMap:
          name: "redis-conf"
          items:
            - key: "redis.conf"
              path: "redis.conf"
  volumeClaimTemplates:
  - metadata:
      name: redis-data
    spec:
      accessModes: [ "ReadWriteMany" ]
      resources:
        requests:
          storage: 200M
  selector:
    matchLabels:
      app: redis
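
Apply the manifest and watch the Pods being created one by one:

kubectl apply -f redis.yaml
kubectl get pods -l app=redis -w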

As shown above, six Redis nodes (Pods) are created in total: three will serve as masters, and the other three as their slaves. The Redis configuration is mounted via a volume from the redis-conf ConfigMap created earlier to /etc/redis/redis.conf inside the container. The data directory is declared with volumeClaimTemplates (that is, PVCs), which bind to the PVs we created before.

A key concept here is Affinity; see the official documentation for the details. podAntiAffinity expresses anti-affinity: it determines which Pods a given Pod may not share a topology domain with. It can be used to spread the Pods of one service across different hosts or topology domains, improving the stability of the service itself.
preferredDuringSchedulingIgnoredDuringExecution means the scheduler tries to satisfy the affinity or anti-affinity rule during scheduling, but if the rule cannot be satisfied, the Pod may still be scheduled onto a non-matching host. After that, while the Pod is running, the rule is never re-checked.

Here, matchExpressions says that a Redis Pod should preferably not be scheduled onto a Node that already runs a Pod labeled app: redis; in other words, a Node that already hosts Redis should, if possible, not receive another Redis Pod. But since we only have three Nodes and six replicas, under preferredDuringSchedulingIgnoredDuringExecution some of these peas simply have to share a pod.

Also, following StatefulSet conventions, the hostnames of the six Redis Pods are assigned in sequence as $(statefulset name)-$(ordinal), i.e. redis-app-0 through redis-app-5.
The Pods are created strictly in the order {0..N-1}: note that redis-app-1 does not start until redis-app-0 has reached the Running state.
At the same time, each Pod gets a DNS name inside the cluster, in the format $(podname).$(service name).$(namespace).svc.cluster.local, which gives:
 

redis-app-0.redis-service.default.svc.cluster.local

redis-app-1.redis-service.default.svc.cluster.local

... and so on for the remaining Pods ...
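
To check one of these records yourself, resolve it from any Pod that has DNS utilities installed, for example the ubuntu helper container created in a later step:

nslookup redis-app-0.redis-service.default.svc.cluster.local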

In this environment, redis-app-0 got the IP 172.17.24.3. If a Redis Pod is migrated or restarted (you can test this by manually deleting one), its IP will change, but the Pod's DNS name, SRV records and A record will not.

You can also confirm that the PVs we created earlier have all been bound successfully.
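
A quick check; PVC names follow the pattern $(volumeClaimTemplate name)-$(pod name), so expect redis-data-redis-app-0 through redis-data-redis-app-5, each Bound to one of the PVs:

kubectl get pvc
kubectl get pv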

6. Initialize the Redis cluster


With the six Redis Pods created, we still need to initialize the cluster, using the common redis-trib tool.

Create a Ubuntu container
The Redis cluster can only be initialized after all of its nodes are up, and baking the initialization logic into the StatefulSet itself would be complex and inefficient. Credit to the original project author for the idea used here: create one extra container on K8S, dedicated to managing and controlling certain services inside the K8S cluster.
So we start a Ubuntu container, in which we can install redis-trib and then initialize the Redis cluster:

kubectl run -it ubuntu --image=ubuntu --restart=Never /bin/bash

We switch to Aliyun's Ubuntu mirrors:

root@ubuntu:/# cat > /etc/apt/sources.list << EOF
deb http://mirrors.aliyun.com/ubuntu/ bionic main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-security main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-security main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-updates main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-updates main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-proposed main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-proposed main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-backports main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-backports main restricted universe multiverse
EOF

Once that succeeds, the original project asks us to install a basic software environment:

apt-get update
apt-get install -y vim wget python2.7 python-pip redis-tools dnsutils

Initialize the cluster
First, install redis-trib:

pip install redis-trib==0.5.1

Then create a cluster consisting of the master nodes only:

redis-trib.py create \
  `dig +short redis-app-0.redis-service.default.svc.cluster.local`:6379 \
  `dig +short redis-app-1.redis-service.default.svc.cluster.local`:6379 \
  `dig +short redis-app-2.redis-service.default.svc.cluster.local`:6379

Next, attach a slave to each master:

redis-trib.py replicate \
  --master-addr `dig +short redis-app-0.redis-service.default.svc.cluster.local`:6379 \
  --slave-addr `dig +short redis-app-3.redis-service.default.svc.cluster.local`:6379

redis-trib.py replicate \
  --master-addr `dig +short redis-app-1.redis-service.default.svc.cluster.local`:6379 \
  --slave-addr `dig +short redis-app-4.redis-service.default.svc.cluster.local`:6379

redis-trib.py replicate \
  --master-addr `dig +short redis-app-2.redis-service.default.svc.cluster.local`:6379 \
  --slave-addr `dig +short redis-app-5.redis-service.default.svc.cluster.local`:6379

At this point our Redis cluster is truly up. Connect to any Redis Pod to verify:

[root@master redis]# kubectl exec -it redis-app-2 -- /bin/bash
root@redis-app-2:/data# /usr/local/bin/redis-cli -c
127.0.0.1:6379> cluster nodes
5d3e77f6131c6f272576530b23d1cd7592942eec 172.17.24.3:6379@16379 master - 0 1559628533000 1 connected 0-5461
a4b529c40a920da314c6c93d17dc603625d6412c 172.17.63.10:6379@16379 master - 0 1559628531670 6 connected 10923-16383
368971dc8916611a86577a8726e4f1f3a69c5eb7 172.17.24.9:6379@16379 slave 0025e6140f85cb243c60c214467b7e77bf819ae3 0 1559628533672 4 connected
0025e6140f85cb243c60c214467b7e77bf819ae3 172.17.63.8:6379@16379 master - 0 1559628533000 2 connected 5462-10922
6d5ee94b78b279e7d3c77a55437695662e8c039e 172.17.24.8:6379@16379 myself,slave a4b529c40a920da314c6c93d17dc603625d6412c 0 1559628532000 5 connected
2eb3e06ce914e0e285d6284c4df32573e318bc01 172.17.63.9:6379@16379 slave 5d3e77f6131c6f272576530b23d1cd7592942eec 0 1559628533000 3 connected
127.0.0.1:6379> cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:6
cluster_my_epoch:6
cluster_stats_messages_ping_sent:14910
cluster_stats_messages_pong_sent:15139
cluster_stats_messages_sent:30049
cluster_stats_messages_ping_received:15139
cluster_stats_messages_pong_received:14910
cluster_stats_messages_received:30049
127.0.0.1:6379>

You can also look at the data Redis has written to the NFS share:

[root@ftp pv3]# ll /usr/local/k8s/redis/pv3
total 12
-rw-r--r-- 1 root root  92 Jun  4 11:36 appendonly.aof
-rw-r--r-- 1 root root 175 Jun  4 11:36 dump.rdb
-rw-r--r-- 1 root root 794 Jun  4 11:49 nodes.conf

7. Create a Service for client access

The headless Service created earlier underpins the StatefulSet, but it has no Cluster IP and therefore cannot be used by clients. So we create one more Service, dedicated to providing access and load balancing for the Redis cluster:

cat redis-access-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: redis-access-service
  labels:
    app: redis
spec:
  ports:
  - name: redis-port
    protocol: "TCP"
    port: 6379
    targetPort: 6379
  selector:
    app: redis

As shown above, this Service is named redis-access-service, exposes port 6379 inside the K8S cluster, and load-balances across the Pods carrying the label app: redis.

Create it, then check:

kubectl get svc redis-access-service -o wide

NAME                   TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE       SELECTOR
redis-access-service   ClusterIP   10.0.0.64    <none>        6379/TCP   2h        app=redis

As shown, any application inside the K8S cluster can now reach the Redis cluster via 10.0.0.64:6379. Of course, for testing convenience we could also add a NodePort Service mapped onto the physical machines; a rough sketch follows, though we won't go into detail.
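
A minimal NodePort sketch, assuming 30379 is free within your cluster's NodePort range (the Service name redis-nodeport-service is just illustrative):

apiVersion: v1
kind: Service
metadata:
  name: redis-nodeport-service
spec:
  type: NodePort
  ports:
  - port: 6379          # Service port inside the cluster
    targetPort: 6379    # container port
    nodePort: 30379     # reachable as <node-ip>:30379 from outside
  selector:
    app: redis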


Testing master-slave failover


With the Redis cluster running on K8S, what we care about most is whether its native high-availability mechanism still works. We can pick any master Pod to test the failover mechanism, for example redis-app-0:
 

kubectl get pods redis-app-0 -o wide

NAME          READY     STATUS    RESTARTS   AGE       IP            NODE            NOMINATED NODE
redis-app-0   1/1       Running   0          3h        172.17.24.3   192.168.0.144   <none>

Exec into redis-app-0 and check its role:

kubectl exec -it redis-app-0 -- /bin/bash
root@redis-app-0:/data# /usr/local/bin/redis-cli -c
127.0.0.1:6379> role
1) "master"
2) (integer) 13370
3) 1) 1) "172.17.63.9"
      2) "6379"
      3) "13370"
127.0.0.1:6379>

As shown, redis-app-0 is a master, and its slave is 172.17.63.9 (redis-app-3).

Next, manually delete redis-app-0:

kubectl delete pod redis-app-0
pod "redis-app-0" deleted

[root@master redis]#  kubectl get pod redis-app-0 -o wide
NAME          READY     STATUS    RESTARTS   AGE       IP            NODE            NOMINATED NODE
redis-app-0   1/1       Running   0          4m        172.17.24.3   192.168.0.144   <none>

Exec into the recreated redis-app-0 again:

kubectl exec -it redis-app-0 -- /bin/bash
root@redis-app-0:/data# /usr/local/bin/redis-cli -c
127.0.0.1:6379> role
1) "slave"
2) "172.17.63.9"
3) (integer) 6379
4) "connected"
5) (integer) 13958

As shown, redis-app-0 has become a slave, replicating from its former slave 172.17.63.9 (redis-app-3).

 

A remaining question

By now you may wonder: why can the Redis Pods fail over correctly even though nothing here uses the stable network identifiers? That comes down to Redis's own mechanism. Every node in a Redis cluster has its own NodeId (stored in the auto-generated nodes.conf), and that NodeId does not change when the IP changes; it is effectively another fixed identity. So even if a Redis Pod is restarted, it reloads its saved NodeId and keeps its cluster identity. We can look at redis-app-1's nodes.conf on NFS:
 

[root@k8s-node2 ~]# cat /usr/local/k8s/redis/pv1/nodes.conf
96689f2018089173e528d3a71c4ef10af68ee462 192.168.169.209:6379@16379 slave d884c4971de9748f99b10d14678d864187a9e5d3 0 1526460952651 4 connected
237d46046d9b75a6822f02523ab894928e2300e6 192.168.169.200:6379@16379 slave c15f378a604ee5b200f06cc23e9371cbc04f4559 0 1526460952651 1 connected
c15f378a604ee5b200f06cc23e9371cbc04f4559 192.168.169.197:6379@16379 master - 0 1526460952651 1 connected 10923-16383
d884c4971de9748f99b10d14678d864187a9e5d3 192.168.169.205:6379@16379 master - 0 1526460952651 4 connected 5462-10922
c3b4ae23c80ffe31b7b34ef29dd6f8d73beaf85f 192.168.169.198:6379@16379 myself,slave c8a8f70b4c29333de6039c47b2f3453ed11fb5c2 0 1526460952565 3 connected
c8a8f70b4c29333de6039c47b2f3453ed11fb5c2 192.168.169.201:6379@16379 master - 0 1526460952651 6 connected 0-5461
vars currentEpoch 6 lastVoteEpoch 4

As shown, the first column is the NodeId, which never changes; the second column is the IP and port, which may change.

Two scenarios where the NodeId matters:

When a slave Pod drops offline and reconnects with a new IP, the master sees that its NodeId is unchanged and still treats it as the same slave as before.

When a master Pod goes down, the cluster elects a new master from among its slaves. When the old master comes back online, the cluster recognizes its NodeId and turns it into a slave of the new master.
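
You can watch this identity persistence directly: the CLUSTER MYID command prints a node's NodeId, so record it, delete the Pod, and compare once the Pod is recreated (a quick sketch, using redis-app-0 as the target):

kubectl exec redis-app-0 -- redis-cli cluster myid   # note the NodeId
kubectl delete pod redis-app-0
# wait for the Pod to be Running again, then:
kubectl exec redis-app-0 -- redis-cli cluster myid   # prints the same NodeId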
——————————————————————————————————————————————————

Original article: https://blog.csdn.net/liangkaiping0525/article/details/125636431
