Cloudpods + Rook + Ceph: Building a Cloud-Native Hyper-Converged Private Cloud with Ease
Background
- Cloudpods: our open-source multi-cloud management platform. It runs on top of Kubernetes and includes a complete private cloud implementation.
- Rook: a distributed storage orchestration system that provides storage solutions on Kubernetes. It does not provide storage itself; instead, it acts as an adaptation layer between Kubernetes and storage systems, simplifying deployment and maintenance. Its Ceph support is rated Stable and production-ready.
- Ceph: an open-source distributed storage system whose main features include RBD block storage and the CephFS distributed file system.
Cloudpods services run as containers on top of a Kubernetes cluster, so after deploying Cloudpods following the Deployment Documentation / Multi-Node Installation guide, the environment already contains a complete Kubernetes cluster.
However, the built-in private cloud VMs of Cloudpods use local storage by default. This article describes how to use Rook to deploy a Ceph cluster on the compute nodes of the Cloudpods Kubernetes cluster, and then expose the Rook-managed Ceph cluster to the Cloudpods private cloud VMs.
The Cloudpods built-in private cloud provides virtualization, the Rook-managed Ceph cluster provides distributed storage, and all of these services run containerized on Kubernetes. The nodes that run Cloudpods VMs are called compute nodes, and each compute node is also a Kubernetes Node. As long as a compute node has dedicated raw disks, Rook can deploy Ceph onto it. Combining these technologies makes it easy to build a cloud-native hyper-converged private cloud.
Environment Preparation
- Cloudpods: v3.6 or later, multi-node deployment
  - 3 compute nodes, each with dedicated raw disks for Ceph (these nodes also serve as storage nodes)
- Kubernetes: v1.15.9 (the default cluster deployed by Cloudpods)
- Rook: v1.7
- Operating system: CentOS 7
- Kernel: 3.10.0-1062.4.3.el7.yn20191203.x86_64
  - If you plan to use CephFS, a kernel of 4.17 or later is recommended; you can upgrade to the 5.4 kernel we provide officially.
The Ceph-related prerequisites and limitations are documented by Rook at: http://rook.io/docs/rook/v1.7/pre-reqs.html .
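Before starting, a quick sanity check on each candidate storage node can save trouble later. A minimal sketch (plain shell, nothing Rook-specific):
```
# Kernel version (CephFS wants >= 4.17; RBD works with the stock CentOS 7 kernel)
uname -r

# Overview of block devices; disks handed to Ceph must carry no filesystem or partition signatures
lsblk -f
```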
Deploying Ceph with Rook
This section walks through deploying a Ceph cluster on the existing Cloudpods Kubernetes cluster with Rook. It assumes you have already deployed a multi-node Cloudpods cluster following the Deployment Documentation / Multi-Node Installation guide.
Node Information
Assume the 3 existing nodes are node-{0,1,2}. The disk layout on each node is shown below; sd{b,c,d} are unpartitioned raw disks reserved for Ceph:
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 931.5G 0 disk
├─sda1 8:1 0 1M 0 part
├─sda2 8:2 0 512M 0 part /boot
└─sda3 8:3 0 931G 0 part /
sdb 8:16 0 931.5G 0 disk
sdc 8:32 0 3.7T 0 disk
sdd 8:48 0 3.7T 0 disk
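If any of the disks reserved for Ceph previously held data, Rook's OSD preparation will skip them until the old partition tables and filesystem signatures are wiped. A minimal, destructive cleanup sketch (run it only against the disks you intend to give to Ceph):
```
# WARNING: irreversibly erases the listed disks
for disk in /dev/sdb /dev/sdc /dev/sdd; do
    sgdisk --zap-all "$disk"   # clear GPT/MBR partition tables
    wipefs --all "$disk"       # clear filesystem and LVM signatures
done
```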
Running kubectl get nodes shows the nodes that are already part of the Kubernetes cluster:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
cloudbox Ready master 34d v1.15.9-beta.0
node-0 Ready <none> 12d v1.15.9-beta.0
node-1 Ready <none> 11d v1.15.9-beta.0
node-2 Ready <none> 11d v1.15.9-beta.0
Next, label the corresponding nodes with role=storage-node:
```
# Label the nodes
$ kubectl label node node-0 role=storage-node
$ kubectl label node node-1 role=storage-node
$ kubectl label node node-2 role=storage-node

# List the nodes together with the role label
$ kubectl get nodes -L role
NAME       STATUS   ROLES    AGE   VERSION          ROLE
cloudbox   Ready    master   34d   v1.15.9-beta.0
node-0     Ready    <none>   12d   v1.15.9-beta.0   storage-node
node-1     Ready    <none>   11d   v1.15.9-beta.0   storage-node
node-2     Ready    <none>   11d   v1.15.9-beta.0   storage-node
```
In addition, running the climc host-list command (climc is the cloud platform's command-line tool) shows that these 3 nodes have joined the platform as Cloudpods compute nodes:
$ climc host-list
+--------------------------------------+----------+-------------------+----------------+--------------+-----------------------------+---------+---------+-------------+----------+-----------+------------+------------+--------------+------------+
| ID | Name | Access_mac | Access_ip | Ipmi_Ip | Manager_URI | Status | enabled | host_status | mem_size | cpu_count | node_count | sn | storage_type | host_type |
+--------------------------------------+----------+-------------------+----------------+--------------+-----------------------------+---------+---------+-------------+----------+-----------+------------+------------+--------------+------------+
| 0d8023ad-ebf9-4a3c-8294-fd170f4ce5c6 | node-0 | 38:ea:a7:8d:94:78 | 172.16.254.127 | 172.16.254.2 | http://172.16.254.127:8885 | running | true | online | 128695 | 32 | 2 | 6CU3505M2G | rotate | hypervisor |
| c02470b3-9666-46f7-852e-9bda8074a72e | node-1 | ec:f4:bb:d7:c4:e0 | 172.16.254.124 | 172.16.254.5 | http://172.16.254.124:8885 | running | true | online | 96432 | 48 | 2 | 62CNF52 | rotate | hypervisor |
| 5811c2d9-2b45-47e4-8c08-a5d479d03009 | node-2 | d4:ae:52:7e:90:9c | 172.16.254.126 | 172.16.254.3 | http://172.16.254.126:8885 | running | true | online | 128723 | 24 | 2 | 8Q1PB3X | rotate | hypervisor |
+--------------------------------------+----------+-------------------+----------------+--------------+-----------------------------+---------+---------+-------------+----------+-----------+------------+------------+--------------+------------+
Deploying the Rook Components
Download the Rook source code:
```
# Clone the rook source
$ git clone --single-branch --branch release-1.7 http://github.com/rook/rook.git
$ cd rook
```
Deploy the rook operator:
```
# Apply the rook CRDs (manifests for Kubernetes < 1.16)
$ cd cluster/examples/kubernetes/ceph/pre-k8s-1.16/
$ kubectl apply -f crds.yaml

# Apply the rook operator manifests
$ cd ..
$ kubectl apply -f common.yaml -f operator.yaml

# Check the operator status and wait for the rook-ceph-operator pod to become Running
$ kubectl -n rook-ceph get pods
NAME                                  READY   STATUS    RESTARTS   AGE
rook-ceph-operator-68964f4b87-pc87m   1/1     Running   0          7m38s
```
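If the operator pod does not reach Running, its logs are usually the fastest way to see what is wrong (for example missing CRDs or RBAC problems); a small troubleshooting sketch:
```
# Tail the rook operator logs
$ kubectl -n rook-ceph logs deploy/rook-ceph-operator --tail=50 -f
```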
Creating the Ceph Cluster
First, modify the contents of the cluster.yaml provided by rook according to your own environment:
$ cp cluster.yaml cluster-env.yaml
Below is the diff between cluster.yaml and the modified cluster-env.yaml:
$ diff -u cluster.yaml cluster-env.yaml
Adjust the configuration to your own node environment, using the diff as a reference. The points to note are:
- spec.image: change to registry.cn-beijing.aliyuncs.com/yunionio/ceph:v14.2.22. A v14 image is required here, corresponding to the ceph nautilus release; newer versions may be incompatible with cloudpods.
- spec.network.provider: change to host, so that the ceph containers use hostNetwork and can be reached by services outside the Kubernetes cluster.
- spec.placement: adjust the Kubernetes scheduling policy so that ceph pods are scheduled onto nodes labeled role=storage-node.
- spec.storage: the storage configuration
  - useAllNodes: we pin ceph to the nodes labeled role=storage-node, so this value must be set to false.
  - nodes: configure the storage paths of each node individually; these can be disks or directories.
```
--- cluster.yaml 2021-10-09 10:49:53.731596210 +0800
+++ cluster-env.yaml 2021-10-09 17:50:01.859112585 +0800
@@ -21,7 +21,7 @@
# versions running within the cluster. See tags available at http://hub.docker.com/r/ceph/ceph/tags/.
# If you want to be more precise, you can always use a timestamp tag such quay.io/ceph/ceph:v16.2.6-20210918
# This tag might not contain a new Ceph version, just security fixes from the underlying operating system, which will reduce vulnerabilities
- image: quay.io/ceph/ceph:v16.2.6
+ image: registry.cn-beijing.aliyuncs.com/yunionio/ceph:v14.2.22
# Whether to allow unsupported versions of Ceph. Currently `nautilus`, `octopus`, and `pacific` are supported.
# Future versions such as `pacific` would require this to be set to `true`.
# Do not set to true in production.
@@ -81,7 +81,7 @@
rulesNamespace: rook-ceph
network:
# enable host networking
- #provider: host
+ provider: host
# enable the Multus network provider
#provider: multus
#selectors:
@@ -135,22 +135,22 @@
# To control where various services will be scheduled by kubernetes, use the placement configuration sections below.
# The example under 'all' would have all services scheduled on kubernetes nodes labeled with 'role=storage-node' and
# tolerate taints with a key of 'storage-node'.
-# placement:
-# all:
-# nodeAffinity:
-# requiredDuringSchedulingIgnoredDuringExecution:
-# nodeSelectorTerms:
-# - matchExpressions:
-# - key: role
-# operator: In
-# values:
-# - storage-node
-# podAffinity:
-# podAntiAffinity:
-# topologySpreadConstraints:
-# tolerations:
-# - key: storage-node
-# operator: Exists
+ placement:
+ all:
+ nodeAffinity:
+ requiredDuringSchedulingIgnoredDuringExecution:
+ nodeSelectorTerms:
+ - matchExpressions:
+ - key: role
+ operator: In
+ values:
+ - storage-node
+ podAffinity:
+ podAntiAffinity:
+ topologySpreadConstraints:
+ tolerations:
+ - key: storage-node
+ operator: Exists
# The above placement information can also be specified for mon, osd, and mgr components
# mon:
# Monitor deployments may contain an anti-affinity rule for avoiding monitor
@@ -207,8 +207,8 @@
# osd: rook-ceph-osd-priority-class
# mgr: rook-ceph-mgr-priority-class
storage: # cluster level storage configuration and selection
- useAllNodes: true
- useAllDevices: true
+ useAllNodes: false
+ useAllDevices: false
#deviceFilter:
config:
# crushRoot: "custom-root" # specify a non-default root label for the CRUSH map
@@ -219,17 +219,22 @@
# encryptedDevice: "true" # the default value for this option is "false"
# Individual nodes and their config can be specified as well, but 'useAllNodes' above must be set to false. Then, only the named
# nodes below will be used as storage resources. Each node's 'name' field should match their 'kubernetes.io/hostname' label.
- # nodes:
- # - name: "172.17.4.201"
- # devices: # specific devices to use for storage can be specified for each node
- # - name: "sdb"
- # - name: "nvme01" # multiple osds can be created on high performance devices
- # config:
- # osdsPerDevice: "5"
- # - name: "/dev/disk/by-id/ata-ST4000DM004-XXXX" # devices can be specified using full udev paths
- # config: # configuration can be specified at the node level which overrides the cluster level config
- # - name: "172.17.4.301"
- # deviceFilter: "^sd."
+ nodes:
+ - name: "node-0"
+ devices: # specific devices to use for storage can be specified for each node
+ - name: "sdb"
+ - name: "sdc"
+ - name: "sdd"
+ - name: "node-1"
+ devices:
+ - name: "sdb"
+ - name: "sdc"
+ - name: "sdd"
+ - name: "node-2"
+ devices:
+ - name: "sdb"
+ - name: "sdc"
+ - name: "sdd"
# when onlyApplyOSDPlacement is false, will merge both placement.All() and placement.osd
onlyApplyOSDPlacement: false
# The section for configuring management of daemon disruptions during upgrade or fencing.
```
After editing cluster-env.yaml, create the ceph cluster with the following commands:
```
$ kubectl apply -f cluster-env.yaml
cephcluster.ceph.rook.io/rook-ceph created

# Check the health of the pods in the rook-ceph namespace
$ kubectl -n rook-ceph get pods
NAME                                                 READY   STATUS      RESTARTS   AGE
rook-ceph-crashcollector-dl380p-55f6cc56c9-b8ghc     1/1     Running     0          3m3s
rook-ceph-crashcollector-r710-7d8659858-mrqgq        1/1     Running     0          2m20s
rook-ceph-crashcollector-r720xd-1-5b686487c5-hvzdb   1/1     Running     0          3m10s
rook-ceph-csi-detect-version-ffdsf                   0/1     Completed   0          26m
rook-ceph-mgr-a-759465b6c7-cslkp                     1/1     Running     0          3m13s
rook-ceph-mon-a-657c4c6769-ljtr9                     1/1     Running     0          18m
rook-ceph-mon-b-7db98b99d4-99pft                     1/1     Running     0          18m
rook-ceph-mon-c-7f84fc475d-5v599                     1/1     Running     0          10m
rook-ceph-operator-68964f4b87-pc87m                  1/1     Running     0          68m
rook-ceph-osd-0-7cc5cb94cb-dxznm                     1/1     Running     0          2m32s
rook-ceph-osd-1-f4d47ddf9-7vgh7                      1/1     Running     0          2m35s
rook-ceph-osd-2-5d7667b8d8-d5tnp                     1/1     Running     0          2m20s
rook-ceph-osd-3-c9c56cd77-8sbzj                      1/1     Running     0          2m32s
rook-ceph-osd-4-88565589c-rnpmg                      1/1     Running     0          2m35s
rook-ceph-osd-5-7d7c554b6c-pvsfx                     1/1     Running     0          2m35s
rook-ceph-osd-6-6c7596c844-jg9qt                     1/1     Running     0          2m20s
rook-ceph-osd-7-55f9987ddf-pjthz                     1/1     Running     0          2m32s
rook-ceph-osd-8-6949b69dd6-685wp                     1/1     Running     0          2m20s
rook-ceph-osd-prepare-dl380p-c6nc8                   0/1     Completed   0          3m3s
rook-ceph-osd-prepare-r710-zkmjz                     0/1     Completed   0          3m3s
rook-ceph-osd-prepare-r720xd-1-fswnf                 0/1     Completed   0          3m2s

# Check the health of the ceph cluster
$ kubectl -n rook-ceph get cephcluster
NAME        DATADIRHOSTPATH   MONCOUNT   AGE   PHASE   MESSAGE                        HEALTH
rook-ceph   /var/lib/rook     3          29m   Ready   Cluster created successfully   HEALTH_OK
```
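OSD provisioning can take several minutes, so instead of re-running get manually you can watch the CephCluster resource until its PHASE becomes Ready; a small convenience sketch:
```
# Press Ctrl-C once PHASE shows Ready and HEALTH shows HEALTH_OK
$ kubectl -n rook-ceph get cephcluster rook-ceph -w
```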
After the ceph cluster is deployed, we need to deploy the toolbox.yaml pod to obtain the cluster connection information:
```
$ kubectl apply -f toolbox.yaml
deployment.apps/rook-ceph-tools created

$ kubectl -n rook-ceph get pods | grep tools
rook-ceph-tools-885579f55-qpnhh   1/1   Running   0   3m44s

# Enter the toolbox pod
$ kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- bash

# The mon_host is: 172.16.254.126:6789,172.16.254.124:6789,172.16.254.127:6789
[root@rook-ceph-tools-885579f55-qpnhh /]$ cat /etc/ceph/ceph.conf
[global]
mon_host = 172.16.254.126:6789,172.16.254.124:6789,172.16.254.127:6789

[client.admin]
keyring = /etc/ceph/keyring

# The keyring is: AQBHTWFhFQzrORAALLIngo/OOTDdnUf4vNPRoA==
[root@rook-ceph-tools-885579f55-qpnhh /]$ cat /etc/ceph/keyring
[client.admin]
key = AQBHTWFhFQzrORAALLIngo/OOTDdnUf4vNPRoA==

# Check the cluster health
[root@rook-ceph-tools-885579f55-qpnhh /]$ ceph status
  cluster:
    id:     233cf123-7a1a-4a7b-b6db-1cee79ec752b
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 38m)
    mgr: a(active, since 33m)
    osd: 9 osds: 9 up (since 33m), 9 in (since 34m); 30 remapped pgs

  data:
    pools:   1 pools, 256 pgs
    objects: 0 objects, 0 B
    usage:   57 MiB used, 20 TiB / 20 TiB avail
    pgs:     226 active+clean
             30 active+clean+remapped

# Check the osd status; the devices on each node have all been added
[root@rook-ceph-tools-885579f55-qpnhh /]$ ceph osd status
ID  HOST    USED   AVAIL  WR OPS  WR DATA  RD OPS  RD DATA  STATE
 0  node-0  6172k  3725G      0        0       0        0   exists,up
 1  node-1  7836k   279G      0        0       0        0   exists,up
 2  node-2  5596k   931G      0        0       0        0   exists,up
 3  node-0  6044k  3725G      0        0       0        0   exists,up
 4  node-1  5980k   279G      0        0       0        0   exists,up
 5  node-1  5980k   279G      0        0       0        0   exists,up
 6  node-2  6236k  3726G      0        0       0        0   exists,up
 7  node-0  7772k  3725G      0        0       0        0   exists,up
 8  node-2  7836k  3726G      0        0       0        0   exists,up

# Create a pool named cloudpods-test for the VM test later
[root@rook-ceph-tools-885579f55-qpnhh /]$ ceph osd pool create cloudpods-test 64 64
pool 'cloudpods-test' created

# Initialize the pool for RBD
[root@rook-ceph-tools-885579f55-qpnhh /]$ rbd pool init cloudpods-test

[root@rook-ceph-tools-885579f55-qpnhh /]$ ceph osd lspools
1 cloudpods-test
```
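The steps above reuse the client.admin keyring for simplicity. Optionally, you can create a dedicated cephx user scoped to the cloudpods-test pool and hand that key to Cloudpods instead; a minimal sketch run inside the toolbox pod (the name client.cloudpods is just an example):
```
# Create a client restricted to RBD operations on the cloudpods-test pool
ceph auth get-or-create client.cloudpods \
    mon 'profile rbd' \
    osd 'profile rbd pool=cloudpods-test'

# Print the key so it can be pasted into the Cloudpods storage form
ceph auth get-key client.cloudpods
```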
Using the Rook-Deployed Ceph from Cloudpods VMs
With the previous steps, a Ceph cluster has been deployed inside the Kubernetes cluster using Rook. Next, import the Ceph cluster's connection information into the Cloudpods private cloud platform so that VMs can use it.
From the previous steps, the ceph connection information is:
- mon_host: 172.16.254.126:6789,172.16.254.124:6789,172.16.254.127:6789
- keyring: AQBHTWFhFQzrORAALLIngo/OOTDdnUf4vNPRoA==
- rbd pool: cloudpods-test
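Before importing these values, it can be worth confirming that a compute node can reach the cluster with exactly this information. A minimal sketch, assuming the node has the ceph-common package (which provides the rbd CLI) installed; the /tmp file paths are only examples:
```
# Write a throwaway client config and keyring using the values gathered above
cat > /tmp/rook-ceph.conf <<'EOF'
[global]
mon_host = 172.16.254.126:6789,172.16.254.124:6789,172.16.254.127:6789
EOF

cat > /tmp/rook-ceph.keyring <<'EOF'
[client.admin]
key = AQBHTWFhFQzrORAALLIngo/OOTDdnUf4vNPRoA==
EOF

# Listing the pool succeeds (even if empty) when connectivity and credentials are correct
rbd -c /tmp/rook-ceph.conf -k /tmp/rook-ceph.keyring --id admin ls cloudpods-test
```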
Then log in to the cloudpods web console to create the ceph rbd storage and associate it with hosts, as follows:
- Create a block storage named rook-ceph and fill in the information above:
- The newly created rook-ceph block storage is offline by default and needs to be associated with the platform's private cloud hosts. The hosts probe the connectivity of the ceph cluster and fetch the capacity information of the corresponding pool:
- Once the rook-ceph block storage is associated with hosts, its status changes to "online" and the 20 TiB capacity is reported:
- Create a VM that uses the rook-ceph storage. The key step is on the VM creation page: when adding disks, choose Ceph RBD as the storage type:
- Wait for the VM to be created, then log in to it via VNC or SSH:
Inside the VM you can see /dev/sda (the system disk) and /dev/sdb (the data disk) attached; both are backed by ceph RBD block devices. Since the ceph cluster sits on spinning disks, a simple dd test measured about 99 MB/s, which is in line with expectations.
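For reference, the number above came from a plain sequential dd write; a minimal sketch of that kind of test, run inside the VM and assuming the data disk is mounted at /mnt/data (adjust the path to your setup):
```
# Sequential write with direct I/O so the page cache does not inflate the result
dd if=/dev/zero of=/mnt/data/dd-test.img bs=1M count=1024 oflag=direct
```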
- Using the platform's climc command to look up the host of the VM shows that ceph-test-vm runs on the node-1 compute node, which is also a storage node of the Ceph cluster, realizing the hyper-converged architecture:
$ climc server-list --details --search ceph-test-vm
+--------------------------------------+--------------+--------+---------------+--------+---------+------------+-----------+------------+---------+
| ID | Name | Host | IPs | Disk | Status | vcpu_count | vmem_size | Hypervisor | os_type |
+--------------------------------------+--------------+--------+---------------+--------+---------+------------+-----------+------------+---------+
| ffd8ec7c-1e2d-4427-89e0-81b6ce184185 | ceph-test-vm | node-1 | 172.16.254.252| 235520 | running | 2 | 2048 | kvm | Linux |
+--------------------------------------+--------------+--------+---------------+--------+---------+------------+-----------+------------+---------+
Other Operations
- To delete the Ceph cluster deployed by Rook, refer to: Cleaning up a Cluster
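For orientation, the cleanup described in that guide boils down to deleting the CephCluster resource, removing Rook's data directory on every storage node, and wiping the disks; a rough, destructive sketch (verify against the official guide for your Rook version before running it):
```
# Delete the CephCluster custom resource and wait for its pods to terminate
kubectl -n rook-ceph delete cephcluster rook-ceph

# On every storage node: remove rook's local state and wipe the Ceph disks
rm -rf /var/lib/rook
for disk in /dev/sdb /dev/sdc /dev/sdd; do
    sgdisk --zap-all "$disk"
    wipefs --all "$disk"
done
```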