
[Mastering Kubernetes for DevOps] Cluster Maintenance, Security, and Troubleshooting - Node OS Upgrade Procedure

nineDeveloper 2021. 1. 20.

Node OS Upgrade Procedure


Node maintenance

  • Needed when a node has to be rebooted, e.g. for a kernel upgrade, a libc upgrade, or hardware repair
  • Taking a node down abruptly can cause a variety of problems
  • If a node goes down and does not come back within 5 minutes (the default eviction timeout), its pods are then recreated on other nodes
  • To control the upgrade process more safely, apply the following procedure to one node at a time
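Put together, the per-node loop described above can be sketched as a small script. The node names and the upgrade step are placeholders for your environment, not values from this cluster:

```shell
#!/bin/sh
# Rolling OS maintenance, one node at a time (hypothetical node names).
NODES="work1 work2"

for node in $NODES; do
  # 1. Evict workloads and mark the node unschedulable
  kubectl drain "$node" --ignore-daemonsets --delete-local-data --force

  # 2. Perform the OS upgrade / reboot out-of-band, e.g.:
  # ssh "$node" 'sudo apt-get upgrade -y && sudo reboot'

  # 3. Wait until the node reports Ready again
  kubectl wait --for=condition=Ready "node/$node" --timeout=10m

  # 4. Allow scheduling again before moving on to the next node
  kubectl uncordon "$node"
done
```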


Check the current nodes

$ kubectl get node
NAME                                       STATUS   ROLES    AGE     VERSION
gke-cluster-1-default-pool-5f31eca2-1lhl   Ready    <none>   4m46s   v1.16.15-gke.6000
gke-cluster-1-default-pool-5f31eca2-7b8n   Ready    <none>   4m47s   v1.16.15-gke.6000
gke-cluster-1-default-pool-5f31eca2-lnt3   Ready    <none>   4m45s   v1.16.15-gke.6000

Drain the node

  • Evicts every pod scheduled on the node
    • kubectl drain <node>
  • If the drain fails, use the options it recommends
    • --delete-local-data: delete local (emptyDir) data (renamed --delete-emptydir-data in newer kubectl releases)
    • --ignore-daemonsets: ignore DaemonSet-managed pods and proceed
$ kubectl drain gke-standard-cluster-1-default-pool-bdf36b75-dlzj
node/gke-standard-cluster-1-default-pool-bdf36b75-dlzj cordoned
error: unable to drain node "gke-standard-cluster-1-default-pool-bdf36b75-dlzj", aborting command...

There are pending nodes to be drained: 
  gke-standard-cluster-1-default-pool-bdf36b75-dlzj
error: pods with local storage (use --delete-local-data to override): two-containers; DaemonSet-managed pods (use --ignore-daemonsets to ignore): fluentd-gcp-v3.2.0-vwfzv, prometheus-to-sd-r7jcw

Retry with all the recommended options

  • kubectl drain gke-standard-cluster-1-default-pool-bdf36b75-dlzj --delete-local-data --ignore-daemonsets --force
$ kubectl drain gke-standard-cluster-1-default-pool-bdf36b75-dlzj --delete-local-data --ignore-daemonsets --force
node/gke-standard-cluster-1-default-pool-bdf36b75-dlzj already cordoned
WARNING: Deleting pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet: two-containers; Ignoring DaemonSet-managed pods: fluentd-gcp-v3.2.0-vwfzv, prometheus-to-sd-r7jcw; Deleting pods with local storage: two-containers
pod/command-demo evicted
pod/my-scheduler-79ccf6b89-gtbjd evicted
pod/heapster-v1.6.0-beta.1-6cdb9cd95b-vdk24 evicted
pod/metrics-server-v0.3.1-57c75779f-6w2nr evicted
pod/two-containers evicted
pod/php-apache-84cc7f889b-6wzbx evicted
pod/kube-dns-autoscaler-bb58c6784-mw7cs evicted
pod/fluentd-gcp-scaler-59b7b75cd7-wq6b6 evicted
node/gke-standard-cluster-1-default-pool-bdf36b75-dlzj evicted

Check the result

$ kubectl get node
NAME                                       STATUS                      ROLES    AGE     VERSION
gke-cluster-1-default-pool-5f31eca2-1lhl   Ready,SchedulingDisabled    <none>   4m46s   v1.16.15-gke.6000
gke-cluster-1-default-pool-5f31eca2-7b8n   Ready                       <none>   4m47s   v1.16.15-gke.6000
gke-cluster-1-default-pool-5f31eca2-lnt3   Ready                       <none>   4m45s   v1.16.15-gke.6000

$ kubectl get pod -o wide | grep dlzj # check for pods still running on dlzj
(no output)

Recover the node after the OS upgrade (uncordon: remove the cordon)

$ kubectl uncordon gke-standard-cluster-1-default-pool-bdf36b75-dlzj
node/gke-standard-cluster-1-default-pool-bdf36b75-dlzj uncordoned

$ kubectl get node
NAME                                       STATUS   ROLES    AGE     VERSION
gke-cluster-1-default-pool-5f31eca2-1lhl   Ready    <none>   4m46s   v1.16.15-gke.6000
gke-cluster-1-default-pool-5f31eca2-7b8n   Ready    <none>   4m47s   v1.16.15-gke.6000
gke-cluster-1-default-pool-5f31eca2-lnt3   Ready    <none>   4m45s   v1.16.15-gke.6000

Difference between cordon and drain

  • drain tries to reschedule every pod on the node elsewhere, and also sets up a cordon so that no new pods are scheduled onto the node
  • cordon, in contrast, only sets up the cordon: the pods the node already has keep running, and only new scheduling onto the node is refused
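The distinction can be seen side by side (node name reused from the hands-on example below):

```shell
# cordon: mark the node unschedulable; existing pods keep running
kubectl cordon work1
# kubectl get node now shows work1 as Ready,SchedulingDisabled

# drain: cordon the node AND evict its pods
kubectl drain work1 --ignore-daemonsets

# in either case, uncordon lifts the restriction
kubectl uncordon work1
```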

Food for thought

  • When uncordon lifts a node's cordon, do the pods return to their original scheduling layout?

Hands-on

Create and verify the http-go deployment

$ kubectl create deploy http-go --image=gasbugs/http-go
deployment.apps/http-go created

$ kubectl get deployment
NAME      READY   UP-TO-DATE   AVAILABLE   AGE
http-go   1/1     1            1           48s

Change the http-go configuration

$ kubectl edit deploy http-go
deployment.apps/http-go edited

Change replicas from 1 to 10

...
spec:
  ...
  replicas: 10
  ...
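The same change can also be made non-interactively with `kubectl scale`, equivalent to the edit above:

```shell
# Scale the http-go Deployment from 1 to 10 replicas in one command
kubectl scale deployment http-go --replicas=10
```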

The pods are spread roughly evenly across the two worker nodes

$ kubectl get pod -o wide
NAME                      READY   STATUS    RESTARTS   AGE     IP          NODE
http-go-568f649bb-4jtvj   1/1     Running   0          90s     10.46.0.4   work2
http-go-568f649bb-4zgdr   1/1     Running   0          90s     10.32.0.7   work1
http-go-568f649bb-fxrg6   1/1     Running   0          90s     10.46.0.3   work2
http-go-568f649bb-gpxkj   1/1     Running   0          90s     10.32.0.5   work1
http-go-568f649bb-gw9mv   1/1     Running   0          8m57s   10.46.0.1   work2
http-go-568f649bb-gwlj2   1/1     Running   0          90s     10.46.0.5   work2
http-go-568f649bb-hgs82   1/1     Running   0          90s     10.46.0.2   work2
http-go-568f649bb-rk269   1/1     Running   0          90s     10.32.0.6   work1
http-go-568f649bb-s9s97   1/1     Running   0          90s     10.46.0.6   work2
http-go-568f649bb-vknrm   1/1     Running   0          90s     10.32.0.4   work1

Draining work1 evicts the pods that were assigned to the work1 node

$ kubectl drain work1 --ignore-daemonsets
node/work1 already cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/kube-proxy-6ld5t, kube-system/weave-net-mxqkk
evicting pod kube-system/coredns-74ff55c5b-btpq5
evicting pod default/http-go-568f649bb-gpxkj
evicting pod default/http-go-568f649bb-4zgdr
evicting pod default/http-go-568f649bb-vknrm
evicting pod default/http-go-568f649bb-rk269
evicting pod kube-system/coredns-74ff55c5b-bsm6g
pod/http-go-568f649bb-4zgdr evicted
pod/http-go-568f649bb-vknrm evicted
pod/http-go-568f649bb-rk269 evicted
pod/http-go-568f649bb-gpxkj evicted
pod/coredns-74ff55c5b-btpq5 evicted
pod/coredns-74ff55c5b-bsm6g evicted
node/work1 evicted

Pods that do not need to be evicted (here, the DaemonSet-managed ones) trigger a warning and remain on the node

$ kubectl get pod --all-namespaces -o wide | grep work1
NAMESPACE     NAME                             READY   STATUS    RESTARTS   AGE     IP           NODE
kube-system   kube-proxy-6ld5t                 1/1     Running   19         23d     10.0.2.6     work1
kube-system   weave-net-mxqkk                  2/2     Running   53         23d     10.0.2.6     work1
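As an alternative to piping through grep, the pods bound to a specific node can be listed directly with a field selector:

```shell
# List every pod (all namespaces) scheduled on work1
kubectl get pods --all-namespaces -o wide \
  --field-selector spec.nodeName=work1
```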

Checking the nodes shows that work1 now carries the SchedulingDisabled status.
From now on, newly created pods will not be placed on the work1 node.

$ kubectl get node
NAME     STATUS                     ROLES                  AGE   VERSION
master   Ready                      control-plane,master   23d   v1.20.1
work1    Ready,SchedulingDisabled   <none>                 23d   v1.20.1
work2    Ready                      <none>                 23d   v1.20.1

Even when http-go is scaled up to 30 replicas and the cluster comes under pressure, no pods are placed on work1

$ kubectl scale deployment http-go --replicas=30
deployment.apps/http-go scaled

$ kubectl get pod -o wide
NAME                      READY   STATUS    RESTARTS   AGE     IP           NODE
http-go-568f649bb-4c94p   1/1     Running   0          62s     10.46.0.12   work2
http-go-568f649bb-4jtvj   1/1     Running   0          65m     10.46.0.4    work2
http-go-568f649bb-5w5gm   1/1     Running   0          62s     10.46.0.16   work2
http-go-568f649bb-64v9w   1/1     Running   0          62s     10.46.0.31   work2
http-go-568f649bb-8m422   1/1     Running   0          8m37s   10.46.0.10   work2
http-go-568f649bb-8rb26   1/1     Running   0          62s     10.46.0.14   work2
http-go-568f649bb-8wpvs   1/1     Running   0          62s     10.46.0.26   work2
http-go-568f649bb-cm95p   1/1     Running   0          62s     10.46.0.22   work2
http-go-568f649bb-dw64x   1/1     Running   0          62s     10.46.0.23   work2
http-go-568f649bb-f87zj   1/1     Running   0          62s     10.46.0.27   work2
http-go-568f649bb-fxrg6   1/1     Running   0          65m     10.46.0.3    work2
http-go-568f649bb-gw9mv   1/1     Running   0          73m     10.46.0.1    work2
http-go-568f649bb-gwlj2   1/1     Running   0          65m     10.46.0.5    work2
http-go-568f649bb-hgs82   1/1     Running   0          65m     10.46.0.2    work2
http-go-568f649bb-j5t8b   1/1     Running   0          62s     10.46.0.25   work2
http-go-568f649bb-jkgbd   1/1     Running   0          62s     10.46.0.24   work2
http-go-568f649bb-jp42p   1/1     Running   0          62s     10.46.0.17   work2
http-go-568f649bb-kmjjb   1/1     Running   0          8m38s   10.46.0.9    work2
http-go-568f649bb-kzq6c   1/1     Running   0          62s     10.46.0.13   work2
http-go-568f649bb-lbnhj   1/1     Running   0          62s     10.46.0.21   work2
http-go-568f649bb-llzxp   1/1     Running   0          8m38s   10.46.0.8    work2
http-go-568f649bb-lvfjj   1/1     Running   0          62s     10.46.0.29   work2
http-go-568f649bb-mf25d   1/1     Running   0          62s     10.46.0.19   work2
http-go-568f649bb-mxg6t   1/1     Running   0          62s     10.46.0.18   work2
http-go-568f649bb-rnv5h   1/1     Running   0          62s     10.46.0.30   work2
http-go-568f649bb-s9s97   1/1     Running   0          65m     10.46.0.6    work2
http-go-568f649bb-tjfpn   1/1     Running   0          62s     10.46.0.15   work2
http-go-568f649bb-trxqh   1/1     Running   0          62s     10.46.0.28   work2
http-go-568f649bb-whwrx   1/1     Running   0          8m38s   10.46.0.11   work2
http-go-568f649bb-zqg6x   1/1     Running   0          62s     10.46.0.20   work2

Run uncordon on work1 so that scheduling is handled normally again

$ kubectl uncordon work1
node/work1 uncordoned

Running uncordon does not move the pods already placed on work2 over to work1

$ kubectl get pod -o wide
NAME                      READY   STATUS    RESTARTS   AGE     IP           NODE
http-go-568f649bb-4c94p   1/1     Running   0          62s     10.46.0.12   work2
http-go-568f649bb-4jtvj   1/1     Running   0          65m     10.46.0.4    work2
http-go-568f649bb-5w5gm   1/1     Running   0          62s     10.46.0.16   work2
http-go-568f649bb-64v9w   1/1     Running   0          62s     10.46.0.31   work2
http-go-568f649bb-8m422   1/1     Running   0          8m37s   10.46.0.10   work2
http-go-568f649bb-8rb26   1/1     Running   0          62s     10.46.0.14   work2
http-go-568f649bb-8wpvs   1/1     Running   0          62s     10.46.0.26   work2
http-go-568f649bb-cm95p   1/1     Running   0          62s     10.46.0.22   work2
http-go-568f649bb-dw64x   1/1     Running   0          62s     10.46.0.23   work2
http-go-568f649bb-f87zj   1/1     Running   0          62s     10.46.0.27   work2
http-go-568f649bb-fxrg6   1/1     Running   0          65m     10.46.0.3    work2
http-go-568f649bb-gw9mv   1/1     Running   0          73m     10.46.0.1    work2
http-go-568f649bb-gwlj2   1/1     Running   0          65m     10.46.0.5    work2
http-go-568f649bb-hgs82   1/1     Running   0          65m     10.46.0.2    work2
http-go-568f649bb-j5t8b   1/1     Running   0          62s     10.46.0.25   work2
http-go-568f649bb-jkgbd   1/1     Running   0          62s     10.46.0.24   work2
http-go-568f649bb-jp42p   1/1     Running   0          62s     10.46.0.17   work2
http-go-568f649bb-kmjjb   1/1     Running   0          8m38s   10.46.0.9    work2
http-go-568f649bb-kzq6c   1/1     Running   0          62s     10.46.0.13   work2
http-go-568f649bb-lbnhj   1/1     Running   0          62s     10.46.0.21   work2
http-go-568f649bb-llzxp   1/1     Running   0          8m38s   10.46.0.8    work2
http-go-568f649bb-lvfjj   1/1     Running   0          62s     10.46.0.29   work2
http-go-568f649bb-mf25d   1/1     Running   0          62s     10.46.0.19   work2
http-go-568f649bb-mxg6t   1/1     Running   0          62s     10.46.0.18   work2
http-go-568f649bb-rnv5h   1/1     Running   0          62s     10.46.0.30   work2
http-go-568f649bb-s9s97   1/1     Running   0          65m     10.46.0.6    work2
http-go-568f649bb-tjfpn   1/1     Running   0          62s     10.46.0.15   work2
http-go-568f649bb-trxqh   1/1     Running   0          62s     10.46.0.28   work2
http-go-568f649bb-whwrx   1/1     Running   0          8m38s   10.46.0.11   work2
http-go-568f649bb-zqg6x   1/1     Running   0          62s     10.46.0.20   work2

Checking the nodes confirms that the SchedulingDisabled status has been cleared

$ kubectl get node
NAME     STATUS   ROLES                  AGE   VERSION
master   Ready    control-plane,master   23d   v1.20.1
work1    Ready    <none>                 23d   v1.20.1
work2    Ready    <none>                 23d   v1.20.1
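If you do want the replicas spread back across both nodes, one common approach is to trigger a rolling restart of the Deployment: the replacement pods go through the scheduler again, which will now consider work1 (`kubectl rollout restart` is available in v1.15+):

```shell
# Recreate the pods; the scheduler can now place them on work1 again
kubectl rollout restart deployment http-go
kubectl get pod -o wide   # some replicas should land on work1
```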
