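I. Overview

Before Hadoop 2.0.0, a cluster had only one NameNode, which made it a single point of failure. If the NameNode machine went down, the whole cluster was unusable until the NameNode was restarted. Planned maintenance also required taking the entire cluster offline, so 24/7 availability was out of reach. Hadoop 2.0 and later added a NameNode high-availability (HA) mechanism. This article covers deploying Hadoop HA on k8s.

For a non-HA k8s environment, see my article: [Cloud Native] Hadoop on k8s Environment Deployment
For an HA environment without k8s, see my article: Big Data Hadoop: Hadoop 3.3.4 HA (High Availability) Principles and Implementation (QJM)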

Figures (images not included): HDFS, YARN

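II. Deployment

This setup is adapted from the non-HA manifests. If you are not familiar with them, read the article linked above first.

1) Add the JournalNode manifests

1. StatefulSet controller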

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: {{ include "hadoop.fullname" . }}-hdfs-jn
  annotations:
    checksum/config: {{ include (print $.Template.BasePath "/hadoop-configmap.yaml") . | sha256sum }}
  labels:
    app.kubernetes.io/name: {{ include "hadoop.name" . }}
    helm.sh/chart: {{ include "hadoop.chart" . }}
    app.kubernetes.io/instance: {{ .Release.Name }}
    app.kubernetes.io/component: hdfs-jn
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: {{ include "hadoop.name" . }}
      app.kubernetes.io/instance: {{ .Release.Name }}
      app.kubernetes.io/component: hdfs-jn
  serviceName: {{ include "hadoop.fullname" . }}-hdfs-jn
  replicas: {{ .Values.hdfs.jounralNode.replicas }}
  template:
    metadata:
      labels:
        app.kubernetes.io/name: {{ include "hadoop.name" . }}
        app.kubernetes.io/instance: {{ .Release.Name }}
        app.kubernetes.io/component: hdfs-jn
    spec:
      affinity:
        podAntiAffinity:
        {{- if eq .Values.antiAffinity "hard" }}
          requiredDuringSchedulingIgnoredDuringExecution:
          - topologyKey: "kubernetes.io/hostname"
            labelSelector:
              matchLabels:
                app.kubernetes.io/name: {{ include "hadoop.name" . }}
                app.kubernetes.io/instance: {{ .Release.Name }}
                app.kubernetes.io/component: hdfs-jn
        {{- else if eq .Values.antiAffinity "soft" }}
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 5
            podAffinityTerm:
              topologyKey: "kubernetes.io/hostname"
              labelSelector:
                matchLabels:
                  app.kubernetes.io/name: {{ include "hadoop.name" . }}
                  app.kubernetes.io/instance: {{ .Release.Name }}
                  app.kubernetes.io/component: hdfs-jn
        {{- end }}
      terminationGracePeriodSeconds: 0
      containers:
      - name: hdfs-jn
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
        imagePullPolicy: {{ .Values.image.pullPolicy | quote }}
        command:
        - "/bin/bash"
        - "/opt/apache/tmp/hadoop-config/bootstrap.sh"
        - "-d"
        resources:
{{ toYaml .Values.hdfs.jounralNode.resources | indent 10 }}
        readinessProbe:
          tcpSocket:
            port: 8485
          initialDelaySeconds: 10
          timeoutSeconds: 2
        livenessProbe:
          tcpSocket:
            port: 8485
          initialDelaySeconds: 10
          timeoutSeconds: 2
        volumeMounts:
        - name: hadoop-config
          mountPath: /opt/apache/tmp/hadoop-config
        {{- range .Values.persistence.journalNode.volumes }}
        - name: {{ .name }}
          mountPath: {{ .mountPath }}
        {{- end }}
        securityContext:
          runAsUser: {{ .Values.securityContext.runAsUser }}
          privileged: {{ .Values.securityContext.privileged }}
      volumes:
      - name: hadoop-config
        configMap:
          name: {{ include "hadoop.fullname" . }}
{{- if .Values.persistence.journalNode.enabled }}
  volumeClaimTemplates:
  {{- range .Values.persistence.journalNode.volumes }}
  - metadata:
      name: {{ .name }}
      labels:
        app.kubernetes.io/name: {{ include "hadoop.name" $ }}
        helm.sh/chart: {{ include "hadoop.chart" $ }}
        app.kubernetes.io/instance: {{ $.Release.Name }}
        app.kubernetes.io/component: hdfs-jn
    spec:
      accessModes:
      - {{ $.Values.persistence.journalNode.accessMode | quote }}
      resources:
        requests:
          storage: {{ $.Values.persistence.journalNode.size | quote }}
    {{- if $.Values.persistence.journalNode.storageClass }}
    {{- if (eq "-" $.Values.persistence.journalNode.storageClass) }}
      storageClassName: ""
    {{- else }}
      storageClassName: "{{ $.Values.persistence.journalNode.storageClass }}"
    {{- end }}
    {{- end }}
  {{- else }}
      - name: dfs
        emptyDir: {}
  {{- end }}
{{- end }}
```

2. Service

```yaml
# A headless service to create DNS records
apiVersion: v1
kind: Service
metadata:
  name: {{ include "hadoop.fullname" . }}-hdfs-jn
  labels:
    app.kubernetes.io/name: {{ include "hadoop.name" . }}
    helm.sh/chart: {{ include "hadoop.chart" . }}
    app.kubernetes.io/instance: {{ .Release.Name }}
    app.kubernetes.io/component: hdfs-jn
spec:
  ports:
  - name: jn
    port: {{ .Values.service.journalNode.ports.jn }}
    protocol: TCP
    {{- if and (eq .Values.service.journalNode.type "NodePort") .Values.service.journalNode.nodePorts.jn }}
    nodePort: {{ .Values.service.journalNode.nodePorts.jn }}
    {{- end }}
  type: {{ .Values.service.journalNode.type }}
  selector:
    app.kubernetes.io/name: {{ include "hadoop.name" . }}
    app.kubernetes.io/instance: {{ .Release.Name }}
    app.kubernetes.io/component: hdfs-jn
```
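The JournalNode Service exists so that each JournalNode pod gets a stable DNS name that the NameNodes can reference in `dfs.namenode.shared.edits.dir`. The following is a minimal sketch of how such a `qjournal://` URI could be assembled. It assumes the Service really is headless (as its comment says), that the StatefulSet and Service are both named `hadoop-ha-hadoop-hdfs-jn` (inferred from the pod name `hadoop-ha-hadoop-hdfs-nn-0` used later in this article), the default `cluster.local` cluster domain, and a hypothetical nameservice ID `mycluster`. The values actually used by the chart live in hadoop-configmap.yaml, which is not shown here.

```bash
# Build a qjournal:// URI from StatefulSet pod DNS names.
# Arguments: <statefulset/service name> <namespace> <replicas> <port> <nameservice ID>
# The body runs in a subshell "( ... )" so its variables do not leak.
build_qjournal_uri() (
  sts="$1"; ns="$2"; replicas="$3"; port="$4"; nameservice="$5"
  hosts=""
  i=0
  while [ "$i" -lt "$replicas" ]; do
    # Pod DNS name pattern: <pod>.<service>.<namespace>.svc.cluster.local
    hosts="${hosts}${sts}-${i}.${sts}.${ns}.svc.cluster.local:${port};"
    i=$((i + 1))
  done
  # Drop the trailing ';' and append the nameservice ID.
  printf 'qjournal://%s/%s\n' "${hosts%;}" "$nameservice"
)

# "mycluster" is a hypothetical nameservice ID, not taken from the article.
build_qjournal_uri hadoop-ha-hadoop-hdfs-jn hadoop-ha 3 8485 mycluster
```

The replica count (3) and port (8485) match the values.yaml settings in this article; everything else should be checked against the real chart.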

2) Modify the configuration

1. Modify values.yaml

```yaml
image:
  repository: myharbor.com/bigdata/hadoop
  tag: 3.3.2
  pullPolicy: IfNotPresent

# The version of the hadoop libraries being used in the image.
hadoopVersion: 3.3.2
logLevel: INFO

# Select antiAffinity as either hard or soft, default is soft
antiAffinity: "soft"

hdfs:
  nameNode:
    replicas: 2
    pdbMinAvailable: 1

    resources:
      requests:
        memory: "256Mi"
        cpu: "10m"
      limits:
        memory: "2048Mi"
        cpu: "1000m"

  dataNode:
    # Will be used as dfs.datanode.hostname
    # You still need to set up services + ingress for every DN
    # Datanodes will expect to
    externalHostname: example.com
    externalDataPortRangeStart: 9866
    externalHTTPPortRangeStart: 9864

    replicas: 3

    pdbMinAvailable: 1

    resources:
      requests:
        memory: "256Mi"
        cpu: "10m"
      limits:
        memory: "2048Mi"
        cpu: "1000m"

  webhdfs:
    enabled: true

  # Note: the key is spelled "jounralNode" because the StatefulSet template
  # reads .Values.hdfs.jounralNode; keep the two in sync if you rename it.
  jounralNode:
    replicas: 3
    pdbMinAvailable: 1

    resources:
      requests:
        memory: "256Mi"
        cpu: "10m"
      limits:
        memory: "2048Mi"
        cpu: "1000m"

yarn:
  resourceManager:
    pdbMinAvailable: 1
    replicas: 2

    resources:
      requests:
        memory: "256Mi"
        cpu: "10m"
      limits:
        memory: "2048Mi"
        cpu: "2000m"

  nodeManager:
    pdbMinAvailable: 1

    # The number of YARN NodeManager instances.
    replicas: 1

    # Create statefulsets in parallel (K8S 1.7+)
    parallelCreate: false

    # CPU and memory resources allocated to each node manager pod.
    # This should be tuned to fit your workload.
    resources:
      requests:
        memory: "256Mi"
        cpu: "500m"
      limits:
        memory: "2048Mi"
        cpu: "1000m"

persistence:
  nameNode:
    enabled: true
    storageClass: "hadoop-ha-nn-local-storage"
    accessMode: ReadWriteOnce
    size: 1Gi
    local:
    - name: hadoop-ha-nn-0
      host: "local-168-182-110"
      path: "/opt/bigdata/servers/hadoop-ha/nn/data/data1"
    - name: hadoop-ha-nn-1
      host: "local-168-182-111"
      path: "/opt/bigdata/servers/hadoop-ha/nn/data/data1"

  dataNode:
    enabled: true
    enabledStorageClass: false
    storageClass: "hadoop-ha-dn-local-storage"
    accessMode: ReadWriteOnce
    size: 1Gi
    local:
    - name: hadoop-ha-dn-0
      host: "local-168-182-110"
      path: "/opt/bigdata/servers/hadoop-ha/dn/data/data1"
    - name: hadoop-ha-dn-1
      host: "local-168-182-110"
      path: "/opt/bigdata/servers/hadoop-ha/dn/data/data2"
    - name: hadoop-ha-dn-2
      host: "local-168-182-110"
      path: "/opt/bigdata/servers/hadoop-ha/dn/data/data3"
    - name: hadoop-ha-dn-3
      host: "local-168-182-111"
      path: "/opt/bigdata/servers/hadoop-ha/dn/data/data1"
    - name: hadoop-ha-dn-4
      host: "local-168-182-111"
      path: "/opt/bigdata/servers/hadoop-ha/dn/data/data2"
    - name: hadoop-ha-dn-5
      host: "local-168-182-111"
      path: "/opt/bigdata/servers/hadoop-ha/dn/data/data3"
    - name: hadoop-ha-dn-6
      host: "local-168-182-112"
      path: "/opt/bigdata/servers/hadoop-ha/dn/data/data1"
    - name: hadoop-ha-dn-7
      host: "local-168-182-112"
      path: "/opt/bigdata/servers/hadoop-ha/dn/data/data2"
    - name: hadoop-ha-dn-8
      host: "local-168-182-112"
      path: "/opt/bigdata/servers/hadoop-ha/dn/data/data3"
    volumes:
    - name: dfs1
      mountPath: /opt/apache/hdfs/datanode1
      hostPath: /opt/bigdata/servers/hadoop-ha/dn/data/data1
    - name: dfs2
      mountPath: /opt/apache/hdfs/datanode2
      hostPath: /opt/bigdata/servers/hadoop-ha/dn/data/data2
    - name: dfs3
      mountPath: /opt/apache/hdfs/datanode3
      hostPath: /opt/bigdata/servers/hadoop-ha/dn/data/data3

  journalNode:
    enabled: true
    storageClass: "hadoop-ha-jn-local-storage"
    accessMode: ReadWriteOnce
    size: 1Gi
    local:
    - name: hadoop-ha-jn-0
      host: "local-168-182-110"
      path: "/opt/bigdata/servers/hadoop-ha/jn/data/data1"
    - name: hadoop-ha-jn-1
      host: "local-168-182-111"
      path: "/opt/bigdata/servers/hadoop-ha/jn/data/data1"
    - name: hadoop-ha-jn-2
      host: "local-168-182-112"
      path: "/opt/bigdata/servers/hadoop-ha/jn/data/data1"
    volumes:
    - name: jn
      mountPath: /opt/apache/hdfs/journalnode

service:
  nameNode:
    type: NodePort
    ports:
      dfs: 9000
      webhdfs: 9870
    nodePorts:
      dfs: 30900
      webhdfs: 30870
  dataNode:
    type: NodePort
    ports:
      webhdfs: 9864
    nodePorts:
      webhdfs: 30864
  resourceManager:
    type: NodePort
    ports:
      web: 8088
    nodePorts:
      web: 30088
  journalNode:
    type: ClusterIP
    ports:
      jn: 8485
    nodePorts:
      jn: ""

securityContext:
  runAsUser: 9999
  privileged: true
```
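The values above run three JournalNodes. With QJM, an edit is only committed once a majority of JournalNodes has acknowledged it, so the replica count determines how many JournalNodes may be down at once. A small sketch of that arithmetic (the function names are illustrative, not part of the chart):

```bash
# Majority needed among N JournalNodes.
jn_majority() { echo $(( $1 / 2 + 1 )); }

# Number of JournalNodes that may fail while a majority is still reachable.
jn_tolerated_failures() { echo $(( ($1 - 1) / 2 )); }

replicas=3   # hdfs.jounralNode.replicas in values.yaml
echo "majority=$(jn_majority "$replicas") tolerated_failures=$(jn_tolerated_failures "$replicas")"
# prints: majority=2 tolerated_failures=1
```

If the chart turns `pdbMinAvailable` into a PodDisruptionBudget for the JournalNodes (that template is not shown in this article), a value of 1 would allow voluntary evictions down to a single JournalNode, which is below the majority of 2. Raising it to 2 may be worth considering.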

2. Modify hadoop/templates/hadoop-configmap.yaml

There are too many changes to paste here; the git download link is given at the end of the article.
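For orientation only, here is a rough sketch of the kind of HDFS HA properties such a ConfigMap would typically need in hdfs-site.xml. It is not the author's configuration. The property names are standard Hadoop HA settings, while the nameservice ID `mycluster`, the NameNode IDs `nn1`/`nn2`, and the service name `hadoop-ha-hadoop-hdfs-nn` are assumptions made for illustration. The RPC port 9000 comes from `service.nameNode.ports.dfs` in values.yaml.

```bash
# Print one Hadoop <property> element.
emit_property() {
  printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

ns="mycluster"                        # hypothetical nameservice ID
svc="hadoop-ha-hadoop-hdfs-nn"        # assumed NameNode StatefulSet/Service name
domain="hadoop-ha.svc.cluster.local"  # namespace taken from the install command

emit_property "dfs.nameservices" "${ns}"
emit_property "dfs.ha.namenodes.${ns}" "nn1,nn2"
emit_property "dfs.namenode.rpc-address.${ns}.nn1" "${svc}-0.${svc}.${domain}:9000"
emit_property "dfs.namenode.rpc-address.${ns}.nn2" "${svc}-1.${svc}.${domain}:9000"
emit_property "dfs.client.failover.proxy.provider.${ns}" \
  "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
```

Compare the output with the hadoop-configmap.yaml in the repository before relying on any of these values.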

3) Install

```bash
# Create the storage directories
mkdir -p /opt/bigdata/servers/hadoop-ha/{nn,dn,jn}/data/data{1..3}
chmod -R 777 /opt/bigdata/servers/hadoop-ha/{nn,dn,jn}

helm install hadoop-ha ./hadoop -n hadoop-ha --create-namespace
```

Check the deployment:

```bash
kubectl get pods,svc -n hadoop-ha -owide
```

HDFS WEB-nn1: http://192.168.182.110:31870/dfshealth.html#tab-overview

HDFS WEB-nn2: http://192.168.182.110:31871/dfshealth.html#tab-overview

YARN WEB-rm1: http://192.168.182.110:31088/cluster/cluster

YARN WEB-rm2: http://192.168.182.110:31089/cluster/cluster

4) Test and verify

```bash
kubectl exec -it hadoop-ha-hadoop-hdfs-nn-0 -n hadoop-ha -- bash
```
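Once inside the NameNode pod, one way to confirm that HA is healthy is to check that exactly one NameNode reports `active`. The helper below is a small sketch for that check. It assumes that `hdfs haadmin -getAllServiceState` prints one line per NameNode ending with its state (`active` or `standby`); verify the output format on your Hadoop version.

```bash
# Count input lines whose last field is "active".
# Reads from stdin, so it can sit at the end of a pipe.
count_active() {
  awk '$NF == "active" { n++ } END { print n + 0 }'
}

# Usage inside the NameNode pod (needs a running cluster):
#   hdfs haadmin -getAllServiceState | count_active
# A healthy HA pair should print 1.
```

The same helper should also work for the ResourceManagers with `yarn rmadmin -getAllServiceState`, if that subcommand is available in your Hadoop release.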

5) Uninstall

```bash
helm uninstall hadoop-ha -n hadoop-ha

# Force-delete any pods that are still left in the namespace
kubectl delete pod -n hadoop-ha $(kubectl get pod -n hadoop-ha | awk 'NR>1{print $1}') --force
kubectl patch ns hadoop-ha -p '{"metadata":{"finalizers":null}}'
kubectl delete ns hadoop-ha --force

rm -fr /opt/bigdata/servers/hadoop-ha/{nn,dn,jn}/data/data{1..3}/*
```

Git download link: http://gitee.com/hadoop-bigdata/hadoop-ha-on-k8s

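That wraps up deploying Hadoop HA on k8s. The description here is fairly brief, so feel free to leave me a comment if you have questions. Some parts may still be incomplete; I will keep improving them, add other services on top of this setup, and continue sharing articles on big data and cloud native. Please be patient.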

[Cloud Native] Hadoop HA on k8s Environment Deployment