Chaos Engineering: Deleting Pods on K8S with ChaosToolkit
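What is ChaosToolkit?
Today we'll try out ChaosToolkit, an open-source chaos engineering tool. Its goal is to provide a free, open, community-driven toolkit and API.
Official source code: https://github.com/chaostoolkit/chaostoolkit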
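To understand this tool, you need to know the key points laid out in the Principles of Chaos Engineering.
Keep the first of those points in mind: build a steady-state hypothesis.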
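The role of a steady-state hypothesis can be sketched in a few lines of plain Python. This is a toy model rather than ChaosToolkit's implementation: `run_experiment`, `simulate`, the in-memory `cluster` dict and the status strings are all assumptions made for illustration.

```python
from typing import Callable


def run_experiment(steady_state: Callable[[], bool], method: Callable[[], None]) -> str:
    """Check the hypothesis, inject the fault, then check the hypothesis again."""
    if not steady_state():
        # The system was not "normal" to begin with, so nothing is injected.
        return "failed"
    method()
    if not steady_state():
        # The system did not return to its steady state: a weakness was found.
        return "deviated"
    return "completed"


def simulate(self_healing: bool) -> str:
    """Run the toy experiment against a fake cluster (hypothetical, in-memory only)."""
    cluster = {"desired": 3, "running": 3}

    def all_pods_running() -> bool:
        return cluster["running"] == cluster["desired"]

    def delete_one_pod() -> None:
        cluster["running"] -= 1
        if self_healing:
            # Stand-in for a controller recreating the deleted pod.
            cluster["running"] = cluster["desired"]

    return run_experiment(all_pods_running, delete_one_pod)


if __name__ == "__main__":
    print(simulate(self_healing=True))
    print(simulate(self_healing=False))
```

The same probe runs before and after the fault, which is why the hypothesis has to be defined before any action is chosen.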
Before running the tool, let's look at its architecture.
Put simply, ChaosToolkit operates on your system under test through drivers.
Its features include the following:
Preparing the experiment
Let's install the tool and try it out.
Environment:
- CentOS7.8
- k8s 1.19.5
- Sample application
Install python3
```bash
sudo yum install python3 python3-venv
```
Install pipenv
```bash
pip3 install pipenv
```
Install chaos-toolkit with its k8s extension and reporting module
```bash
pip3 install -U chaostoolkit
pip3 install -U chaostoolkit-kubernetes
pip3 install -U chaostoolkit-reporting
```
If you need to operate on other platforms, you can install the corresponding extensions as well.
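A quick way to confirm that the three packages are visible to the current interpreter is to look up their import names. This is a minimal sketch: `chaosk8s` appears in the discover log later in this article, while `chaostoolkit` and `chaosreport` are assumed import names, and `installed_extensions` is a hypothetical helper.

```python
from importlib.util import find_spec

# pip package name -> import name (the last two import names are assumptions)
MODULES = {
    "chaostoolkit-kubernetes": "chaosk8s",
    "chaostoolkit": "chaostoolkit",
    "chaostoolkit-reporting": "chaosreport",
}


def installed_extensions() -> dict:
    """Map each package to True if its module can be found by this interpreter."""
    return {package: find_spec(module) is not None for package, module in MODULES.items()}


if __name__ == "__main__":
    for package, present in installed_extensions().items():
        print(f"{package}: {'installed' if present else 'missing'}")
```

Run it with the same `python3` that will run `chaos`, since a virtual environment has its own set of installed packages.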
Create a virtual environment
```bash
python3 -m venv .bundler
source .bundler/bin/activate
```
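To avoid affecting other environments, we work inside a Python virtual environment. Activate it before running the pip3 install commands above so that the packages land inside it.
Note: the installation above was done on the k8s master node. If you are installing somewhere else, configure the appropriate k8s context; for details see https://chaostoolkit.org/drivers/kubernetes/.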
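Running the experiment
chaos discover: explore the available activities
First run the discover command. chaostoolkit generates a discovery.json file based on the contents of ~/.kube/config; the file lists every operation that can be performed against k8s. A successful run looks like this: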
```bash
(.bundler) [chaostoolkit_scenarios]# chaos discover chaostoolkit-kubernetes
[2021-06-23 12:18:07 INFO] Attempting to download and install package 'chaostoolkit-kubernetes'
[2021-06-23 12:18:08 INFO] Package downloaded and installed in current environment
[2021-06-23 12:18:09 INFO] Discovering capabilities from chaostoolkit-kubernetes
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.actions
[2021-06-23 12:18:09 INFO] Searching for probes in chaosk8s.probes
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.deployment.actions
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.deployment.probes
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.node.actions
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.node.probes
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.pod.actions
[2021-06-23 12:18:09 INFO] Searching for probes in chaosk8s.pod.probes
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.replicaset.actions
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.service.actions
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.service.probes
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.statefulset.actions
[2021-06-23 12:18:09 INFO] Searching for probes in chaosk8s.statefulset.probes
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.crd.actions
[2021-06-23 12:18:09 INFO] Searching for probes in chaosk8s.crd.probes
[2021-06-23 12:18:09 INFO] Discovery outcome saved in ./discovery.json
(.bundler) [chaostoolkit_scenarios]#
```
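To get an overview of what discover found, the generated file can be summarised with a short script. This is a sketch under an assumption about the file layout: it expects a top-level `activities` list whose entries carry `type`, `name` and `mod` keys, so adjust the keys if your discovery.json differs. `group_activities` and `load_discovery` are hypothetical helper names.

```python
import json
from collections import defaultdict
from pathlib import Path


def group_activities(discovery: dict) -> dict:
    """Group discovered activities by type ("action" or "probe")."""
    grouped = defaultdict(list)
    for activity in discovery.get("activities", []):
        kind = activity.get("type", "unknown")
        grouped[kind].append(f'{activity.get("mod", "?")}.{activity.get("name", "?")}')
    return {kind: sorted(names) for kind, names in grouped.items()}


def load_discovery(path: str = "discovery.json") -> dict:
    """Read the file written by `chaos discover`."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# A hand-written sample in the assumed layout, so the sketch runs without a cluster.
sample = {
    "activities": [
        {"type": "action", "name": "delete_pods", "mod": "chaosk8s.pod.actions"},
        {"type": "probe", "name": "count_pods", "mod": "chaosk8s.pod.probes"},
        {"type": "probe", "name": "all_microservices_healthy", "mod": "chaosk8s.probes"},
    ]
}
summary = group_activities(sample)
```

Against a real file, `group_activities(load_discovery())` gives the same kind of summary.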
chaos init: generate an experiment
Run the init command and follow the prompts to create a chaos experiment.
```bash
(.bundler) [chaostoolkit_scenarios]# chaos init
You are about to create an experiment.
This wizard will walk you through each step so that you can build
the best experiment for your needs.
An experiment is made up of three elements:
- a steady-state hypothesis [OPTIONAL]
- an experimental method
- a set of rollback activities [OPTIONAL]
Only the method is required. Also your experiment will not run unless you define at least one activity (probe or action) within it
Experiment's title: E2 # the name of the experiment
A steady state hypothesis defines what 'normality' looks like in your system
The steady state hypothesis is a collection of conditions that are used, at the beginning of an experiment, to decide if the system is in a recognised 'normal' state. The steady state conditions are then used again when your experiment is complete to detect where your system may have deviated in an interesting, weakness-detecting way
Initially you may not know what your steady state hypothesis is and so instead you might create an experiment without one
This is why the stead state hypothesis is optional.
Do you want to define a steady state hypothesis now? [y/N]: y # define the steady-state hypothesis; this is a key chaos engineering concept, yet most other chaos tools have no such step
Hypothesis's title: H2
You may now define probes that will determine the steady-state of your system.
Add an activity
1) all_microservices_healthy
2) deployment_is_fully_available
3) deployment_is_not_fully_available
4) microservice_available_and_healthy
5) microservice_is_not_available
6) read_microservices_logs
7) service_endpoint_is_initialized
8) count_pods
9) pod_is_not_available
10) pods_in_conditions
11) pods_in_phase
12) pods_not_in_phase
13) read_pod_logs
14) statefulset_fully_available
15) statefulset_not_fully_available
16) get_cluster_custom_object
17) get_custom_object
18) list_cluster_custom_objects
19) list_custom_objects
Activity (0 to escape): 1 # pick the check for the steady-state hypothesis; put simply, this defines the expected result
!!!DEPRECATED!!!
Do you want to use this probe? [y/N]: y # confirm the probe selected above
A steady-state probe requires a tolerance value, within which
your system is in a reognised normal
state.
What is the tolerance for this probe?: normal
You now need to fill the arguments for this activity. Default values will be shown between brackets. You may simply press return to use it or not set any value.
Argument's value for 'ns' [default]: chaosnamespace # the k8s namespace to operate on
Do you want to select another activity? [y/N]: y # whether to select another activity
Add an activity
1) all_microservices_healthy
2) deployment_is_fully_available
3) deployment_is_not_fully_available
4) microservice_available_and_healthy
5) microservice_is_not_available
6) read_microservices_logs
7) service_endpoint_is_initialized
8) count_pods
9) pod_is_not_available
10) pods_in_conditions
11) pods_in_phase
12) pods_not_in_phase
13) read_pod_logs
14) statefulset_fully_available
15) statefulset_not_fully_available
16) get_cluster_custom_object
17) get_custom_object
18) list_cluster_custom_objects
19) list_custom_objects
Activity (0 to escape): 1 # pick the specific activity
!!!DEPRECATED!!!
Do you want to use this probe? [y/N]: y # confirm the activity selected above
You now need to fill the arguments for this activity. Default values will be shown between brackets. You may simply press return to use it or not set any value.
Argument's value for 'ns' [default]:
Do you want to select another activity? [y/N]: N # whether to add another activity; I stop here
An experiment's method contains actions and probes. Actions vary real-world events in your system to determine if your steady-state hypothesis is maintained when those events occur.
An experimental method can also contain probes to gather additional information about your system as your method is executed.
Do you want to define an experimental method? [y/N]: y # define the experiment's method
Add an activity
1) kill_microservice
2) remove_service_endpoint
3) scale_microservice
4) start_microservice
5) all_microservices_healthy
6) deployment_is_fully_available
7) deployment_is_not_fully_available
8) microservice_available_and_healthy
9) microservice_is_not_available
10) read_microservices_logs
11) service_endpoint_is_initialized
12) create_deployment
13) delete_deployment
14) scale_deployment
15) deployment_available_and_healthy
16) deployment_fully_available
17) deployment_not_fully_available
18) cordon_node
19) create_node
20) delete_nodes
21) drain_nodes
22) uncordon_node
23) get_nodes
24) delete_pods
25) exec_in_pods
26) terminate_pods
27) count_pods
28) pod_is_not_available
29) pods_in_conditions
30) pods_in_phase
31) pods_not_in_phase
32) read_pod_logs
33) delete_replica_set
34) create_service_endpoint
35) delete_service
36) service_is_initialized
37) create_statefulset
38) remove_statefulset
39) scale_statefulset
40) statefulset_fully_available
41) statefulset_not_fully_available
42) create_cluster_custom_object
43) create_custom_object
44) delete_cluster_custom_object
45) delete_custom_object
46) patch_cluster_custom_object
47) patch_custom_object
48) replace_cluster_custom_object
49) replace_custom_object
50) get_cluster_custom_object
51) get_custom_object
52) list_cluster_custom_objects
53) list_custom_objects
Activity (0 to escape): 24 # I pick option 24: delete pods
!!!DEPRECATED!!!
Do you want to use this action? [y/N]: y # confirm the selection
You now need to fill the arguments for this activity. Default values will be shown between brackets. You may simply press return to use it or not set any value.
Argument's value for 'name': DeleteRedisPOD # the 'name' argument; it fills the default label selector shown below
Argument's value for 'ns' [default]: chaosnamespace # the k8s namespace to operate on
Argument's value for 'label_selector' [name in ({name})]: app=redis # the label used to find the target pods
Do you want to select another activity? [y/N]: N # whether to add another activity; I stop here
An experiment may optionally define a set of remedial actions that are used to rollback the system to a given state.
Do you want to add some rollbacks now? [y/N]: N # no rollback: I am deleting the Redis pods and k8s recreates them automatically
Experiment created and saved in './experiment.json' # the experiment file has been generated
(.bundler) [chaostoolkit_scenarios]#
```
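The wizard's answers end up as a JSON document. The sketch below builds a comparable structure in Python so the pieces are easy to see: one steady-state probe, one action in the method, and no rollbacks. It is an approximation and not a copy of the generated file: `build_experiment` is a hypothetical helper, the `description` text is made up, and the `tolerance` of `true` is an assumption (the session above entered `normal`).

```python
import json


def build_experiment(namespace: str, label_selector: str) -> dict:
    """Assemble an experiment similar to the one answered in the wizard."""
    return {
        "version": "1.0.0",
        "title": "E2",
        "description": "Delete the Redis pods and check that the namespace recovers",  # made-up text
        "steady-state-hypothesis": {
            "title": "H2",
            "probes": [
                {
                    "type": "probe",
                    "name": "all_microservices_healthy",
                    "tolerance": True,  # assumption: the probe reports True when healthy
                    "provider": {
                        "type": "python",
                        "module": "chaosk8s.probes",
                        "func": "all_microservices_healthy",
                        "arguments": {"ns": namespace},
                    },
                }
            ],
        },
        "method": [
            {
                "type": "action",
                "name": "delete_pods",
                "provider": {
                    "type": "python",
                    "module": "chaosk8s.pod.actions",
                    "func": "delete_pods",
                    "arguments": {
                        "name": "DeleteRedisPOD",
                        "ns": namespace,
                        "label_selector": label_selector,
                    },
                },
            }
        ],
        "rollbacks": [],
    }


experiment = build_experiment("chaosnamespace", "app=redis")

if __name__ == "__main__":
    print(json.dumps(experiment, indent=2))
```

Both functions are marked as deprecated in the wizard output, so compare against your own experiment.json before reusing any of these names.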
chaos run: execute the experiment
```bash
(.bundler) [chaostoolkit_scenarios]# chaos run experiment.json
[2021-06-28 23:03:23 INFO] Validating the experiment's syntax
[2021-06-28 23:03:24 INFO] Experiment looks valid
[2021-06-28 23:03:24 INFO] Running experiment: E2
[2021-06-28 23:03:24 INFO] Steady-state strategy: default
[2021-06-28 23:03:24 INFO] Rollbacks strategy: default
[2021-06-28 23:03:24 INFO] Steady state hypothesis: H2
[2021-06-28 23:03:24 INFO] Probe: all_microservices_healthy
[2021-06-28 23:03:24 WARNING] all_microservices_healthy function is DEPRECATED and will be removed in the next releases, please use all_pods_healthy instead
[2021-06-28 23:03:24 INFO] Steady state hypothesis is met!
[2021-06-28 23:03:24 INFO] Playing your experiment's method now...
[2021-06-28 23:03:24 INFO] Action: delete_pods
[2021-06-28 23:03:24 INFO] Steady state hypothesis: H2
[2021-06-28 23:03:24 INFO] Probe: all_microservices_healthy
[2021-06-28 23:03:24 WARNING] all_microservices_healthy function is DEPRECATED and will be removed in the next releases, please use all_pods_healthy instead
[2021-06-28 23:03:24 INFO] Steady state hypothesis is met!
[2021-06-28 23:03:24 INFO] Let's rollback...
[2021-06-28 23:03:24 INFO] No declared rollbacks, let's move on.
[2021-06-28 23:03:24 INFO] Experiment ended with status: completed
(.bundler) [chaostoolkit_scenarios]#
```
Checking the results
Before the experiment:
```bash
[~]# kubectl get pods -n chaosnamespace -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
...........................
redis-master-b96c9795b-nqzmr 1/1 Running 0 3d9h 10.100.220.84 s6
```
After the experiment:
```bash
[~]# kubectl get pods -n chaosnamespace -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
...............................
redis-master-b96c9795b-92rc6 0/1 ContainerCreating 0 3s
```
Once the pods are fully up:
```bash
[~]# kubectl get pods -n chaosnamespace -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
.......................
redis-master-b96c9795b-92rc6 1/1 Running 0 5m43s 10.100.220.89 s6
redis-slave-6b8d456947-5m2xt 1/1 Running 0 5m42s 10.100.220.90 s6
redis-slave-6b8d456947-fj4xc 1/1 Running 0 5m43s 10.100.53.211 s7
[~]#
```
The results show that the experiment ran successfully: the Redis pods were killed and then brought back up by k8s.
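In the run log every step carries the same timestamp, while the kubectl output taken 3 seconds after the deletion still shows the new pod in ContainerCreating. If you want to check recovery yourself rather than by eye, the text output of `kubectl get pods` can be parsed. This is a rough sketch: `unready_pods` is a hypothetical helper, and it assumes the default column order (NAME, READY, STATUS) shown above.

```python
def unready_pods(kubectl_output: str) -> list:
    """Return the names of pods that are not Running with all containers ready."""
    unready = []
    for line in kubectl_output.strip().splitlines():
        fields = line.split()
        # Skip the header, prompts, and filler lines with too few columns.
        if len(fields) < 3 or fields[0] == "NAME" or "/" not in fields[1]:
            continue
        name, ready, status = fields[0], fields[1], fields[2]
        ready_count, _, total = ready.partition("/")
        if status != "Running" or ready_count != total:
            unready.append(name)
    return unready


# Lines taken from the outputs above.
right_after = """NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
redis-master-b96c9795b-92rc6 0/1 ContainerCreating 0 3s
"""
fully_started = """NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
redis-master-b96c9795b-92rc6 1/1 Running 0 5m43s 10.100.220.89 s6
redis-slave-6b8d456947-5m2xt 1/1 Running 0 5m42s 10.100.220.90 s6
redis-slave-6b8d456947-fj4xc 1/1 Running 0 5m43s 10.100.53.211 s7
"""

if __name__ == "__main__":
    print(unready_pods(right_after))
    print(unready_pods(fully_started))
```

Calling the function in a loop with a short sleep between `kubectl` invocations would give a simple wait-until-recovered check.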
Summary
That's the one experiment for today. You can follow the same steps to generate other experiments.