Node OS Upgrade Procedure
Node maintenance
- A node sometimes has to be rebooted, e.g. for a kernel upgrade, a libc upgrade, or hardware repair
- Taking a node down abruptly can cause a variety of problems
- If a node goes down and does not come back within 5 minutes, its pods are recreated on other nodes (this window corresponds to the controller manager's default pod eviction timeout)
- To control the upgrade process more carefully, run the following procedure one node at a time, as in the sketch below
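A minimal sketch of the per-node sequence (work1 is a placeholder node name; each step is walked through in detail below):
$ kubectl drain work1 --ignore-daemonsets --delete-local-data   # evict pods and cordon the node
(perform the kernel/libc upgrade or hardware repair, then reboot work1)
$ kubectl uncordon work1                                        # allow scheduling on work1 again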
Check the current nodes
$ kubectl get node
NAME STATUS ROLES AGE VERSION
gke-cluster-1-default-pool-5f31eca2-1lhl Ready <none> 4m46s v1.16.15-gke.6000
gke-cluster-1-default-pool-5f31eca2-7b8n Ready <none> 4m47s v1.16.15-gke.6000
gke-cluster-1-default-pool-5f31eca2-lnt3 Ready <none> 4m45s v1.16.15-gke.6000
Draining a node
- Evicts every pod scheduled on the node
kubectl drain <node>
- If the drain runs into problems, use the options it recommends:
--delete-local-data
: delete the pods' local (emptyDir) data
--ignore-daemonsets
: skip DaemonSet-managed pods and proceed with the drain
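Note: newer kubectl releases (v1.20+) deprecate --delete-local-data in favor of --delete-emptydir-data, so on a current cluster the same drain would be written as:
$ kubectl drain <node> --ignore-daemonsets --delete-emptydir-data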
$ kubectl drain gke-standard-cluster-1-default-pool-bdf36b75-dlzj
node/gke-standard-cluster-1-default-pool-bdf36b75-dlzj cordoned
error: unable to drain node "gke-standard-cluster-1-default-pool-bdf36b75-dlzj", aborting command...
There are pending nodes to be drained:
gke-standard-cluster-1-default-pool-bdf36b75-dlzj
error: pods with local storage (use --delete-local-data to override): two-containers; DaemonSet-managed pods (use --ignore-daemonsets to ignore): fluentd-gcp-v3.2.0-vwfzv, prometheus-to-sd-r7jcw
Retry with all the recommended options; --force is also needed here because some of the pods (e.g. two-containers) are not managed by a controller
$ kubectl drain gke-standard-cluster-1-default-pool-bdf36b75-dlzj --delete-local-data --ignore-daemonsets --force
node/gke-standard-cluster-1-default-pool-bdf36b75-dlzj already cordoned
WARNING: Deleting pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet: two-containers; Ignoring DaemonSet-managed pods: fluentd-gcp-v3.2.0-vwfzv, prometheus-to-sd-r7jcw; Deleting pods with local storage: two-containers
pod/command-demo evicted
pod/my-scheduler-79ccf6b89-gtbjd evicted
pod/heapster-v1.6.0-beta.1-6cdb9cd95b-vdk24 evicted
pod/metrics-server-v0.3.1-57c75779f-6w2nr evicted
pod/two-containers evicted
pod/php-apache-84cc7f889b-6wzbx evicted
pod/kube-dns-autoscaler-bb58c6784-mw7cs evicted
pod/fluentd-gcp-scaler-59b7b75cd7-wq6b6 evicted
node/gke-standard-cluster-1-default-pool-bdf36b75-dlzj evicted
Check the result
$ kubectl get node
NAME STATUS ROLES AGE VERSION
gke-cluster-1-default-pool-5f31eca2-1lhl Ready,SchedulingDisabled <none> 4m46s v1.16.15-gke.6000
gke-cluster-1-default-pool-5f31eca2-7b8n Ready <none> 4m47s v1.16.15-gke.6000
gke-cluster-1-default-pool-5f31eca2-lnt3 Ready <none> 4m45s v1.16.15-gke.6000
$ kubectl get pod -o wide | grep dlzj # check for pods still running on dlzj
(no output)
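An alternative to grep (my addition, not in the lecture transcript) is a field selector on the pod's assigned node:
$ kubectl get pod --all-namespaces -o wide --field-selector spec.nodeName=gke-standard-cluster-1-default-pool-bdf36b75-dlzj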
Restore the node after the OS upgrade (uncordon: lift the cordon)
$ kubectl uncordon gke-standard-cluster-1-default-pool-bdf36b75-dlzj
node/gke-standard-cluster-1-default-pool-bdf36b75-dlzj uncordoned
$ kubectl get node
NAME STATUS ROLES AGE VERSION
gke-cluster-1-default-pool-5f31eca2-1lhl Ready <none> 4m46s v1.16.15-gke.6000
gke-cluster-1-default-pool-5f31eca2-7b8n Ready <none> 4m47s v1.16.15-gke.6000
gke-cluster-1-default-pool-5f31eca2-lnt3 Ready <none> 4m45s v1.16.15-gke.6000
The difference between cordon and drain
- drain tries to reschedule every pod on the node elsewhere and puts up a cordon so that no new pods are scheduled onto the node
- cordon, in contrast, only puts up the cordon: the pods the node already has keep running, and only new scheduling onto the node is refused
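A minimal cordon-only sketch (work1 as the placeholder node again; the status output mirrors the transcripts later in this post):
$ kubectl cordon work1
node/work1 cordoned
$ kubectl get node work1
NAME    STATUS                     ROLES    AGE   VERSION
work1   Ready,SchedulingDisabled   <none>   23d   v1.20.1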
Think about it
- When uncordon lifts a node's cordon, do the pods return to their original scheduling?
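(The hands-on below answers this: no. uncordon only lifts the scheduling ban; pods that were evicted to other nodes stay there until something else reschedules them.)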
Hands-on
Create and verify the http-go deployment
$ kubectl create deploy http-go --image=gasbugs/http-go
deployment.apps/http-go created
$ kubectl get deployment
NAME READY UP-TO-DATE AVAILABLE AGE
http-go 1/1 1 1 48s
Change the http-go configuration
$ kubectl edit deploy http-go
deployment.apps/http-go edited
Change replicas from 1 to 10
...
spec:
...
replicas: 10
...
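The same change can also be made non-interactively; the lecture itself uses this form later:
$ kubectl scale deployment http-go --replicas=10
deployment.apps/http-go scaled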
The pods are spread across the two worker nodes, roughly five per node
$ kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE
http-go-568f649bb-4jtvj 1/1 Running 0 90s 10.46.0.4 work2
http-go-568f649bb-4zgdr 1/1 Running 0 90s 10.32.0.7 work1
http-go-568f649bb-fxrg6 1/1 Running 0 90s 10.46.0.3 work2
http-go-568f649bb-gpxkj 1/1 Running 0 90s 10.32.0.5 work1
http-go-568f649bb-gw9mv 1/1 Running 0 8m57s 10.46.0.1 work2
http-go-568f649bb-gwlj2 1/1 Running 0 90s 10.46.0.5 work2
http-go-568f649bb-hgs82 1/1 Running 0 90s 10.46.0.2 work2
http-go-568f649bb-rk269 1/1 Running 0 90s 10.32.0.6 work1
http-go-568f649bb-s9s97 1/1 Running 0 90s 10.46.0.6 work2
http-go-568f649bb-vknrm 1/1 Running 0 90s 10.32.0.4 work1
Draining work1 evicts the pods that were assigned to work1
$ kubectl drain work1 --ignore-daemonsets
node/work1 already cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/kube-proxy-6ld5t, kube-system/weave-net-mxqkk
evicting pod kube-system/coredns-74ff55c5b-btpq5
evicting pod default/http-go-568f649bb-gpxkj
evicting pod default/http-go-568f649bb-4zgdr
evicting pod default/http-go-568f649bb-vknrm
evicting pod default/http-go-568f649bb-rk269
evicting pod kube-system/coredns-74ff55c5b-bsm6g
pod/http-go-568f649bb-4zgdr evicted
pod/http-go-568f649bb-vknrm evicted
pod/http-go-568f649bb-rk269 evicted
pod/http-go-568f649bb-gpxkj evicted
pod/coredns-74ff55c5b-btpq5 evicted
pod/coredns-74ff55c5b-bsm6g evicted
node/work1 evicted
For the pods that do not need to be evicted (the DaemonSet-managed ones), only a warning was printed and they remain on the node
$ kubectl get pod --all-namespaces -o wide | grep work1
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
kube-system kube-proxy-6ld5t 1/1 Running 19 23d 10.0.2.6 work1
kube-system weave-net-mxqkk 2/2 Running 53 23d 10.0.2.6 work1
Checking the nodes shows that work1 now carries the SchedulingDisabled status
From now on, newly created pods will not be placed on work1 (see the check after the listing below)
$ kubectl get node
NAME STATUS ROLES AGE VERSION
master Ready control-plane,master 23d v1.20.1
work1 Ready,SchedulingDisabled <none> 23d v1.20.1
work2 Ready <none> 23d v1.20.1
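Not covered in the lecture, but the cordon is simply recorded as spec.unschedulable on the node object (drain/cordon set it, uncordon clears it), which can be checked directly:
$ kubectl get node work1 -o jsonpath='{.spec.unschedulable}'
true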
Even after scaling http-go up to 30 replicas, putting the cluster under load, no pods are placed on work1
$ kubectl scale deployment http-go --replicas=30
deployment.apps/http-go scaled
$ kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE
http-go-568f649bb-4c94p 1/1 Running 0 62s 10.46.0.12 work2
http-go-568f649bb-4jtvj 1/1 Running 0 65m 10.46.0.4 work2
http-go-568f649bb-5w5gm 1/1 Running 0 62s 10.46.0.16 work2
http-go-568f649bb-64v9w 1/1 Running 0 62s 10.46.0.31 work2
http-go-568f649bb-8m422 1/1 Running 0 8m37s 10.46.0.10 work2
http-go-568f649bb-8rb26 1/1 Running 0 62s 10.46.0.14 work2
http-go-568f649bb-8wpvs 1/1 Running 0 62s 10.46.0.26 work2
http-go-568f649bb-cm95p 1/1 Running 0 62s 10.46.0.22 work2
http-go-568f649bb-dw64x 1/1 Running 0 62s 10.46.0.23 work2
http-go-568f649bb-f87zj 1/1 Running 0 62s 10.46.0.27 work2
http-go-568f649bb-fxrg6 1/1 Running 0 65m 10.46.0.3 work2
http-go-568f649bb-gw9mv 1/1 Running 0 73m 10.46.0.1 work2
http-go-568f649bb-gwlj2 1/1 Running 0 65m 10.46.0.5 work2
http-go-568f649bb-hgs82 1/1 Running 0 65m 10.46.0.2 work2
http-go-568f649bb-j5t8b 1/1 Running 0 62s 10.46.0.25 work2
http-go-568f649bb-jkgbd 1/1 Running 0 62s 10.46.0.24 work2
http-go-568f649bb-jp42p 1/1 Running 0 62s 10.46.0.17 work2
http-go-568f649bb-kmjjb 1/1 Running 0 8m38s 10.46.0.9 work2
http-go-568f649bb-kzq6c 1/1 Running 0 62s 10.46.0.13 work2
http-go-568f649bb-lbnhj 1/1 Running 0 62s 10.46.0.21 work2
http-go-568f649bb-llzxp 1/1 Running 0 8m38s 10.46.0.8 work2
http-go-568f649bb-lvfjj 1/1 Running 0 62s 10.46.0.29 work2
http-go-568f649bb-mf25d 1/1 Running 0 62s 10.46.0.19 work2
http-go-568f649bb-mxg6t 1/1 Running 0 62s 10.46.0.18 work2
http-go-568f649bb-rnv5h 1/1 Running 0 62s 10.46.0.30 work2
http-go-568f649bb-s9s97 1/1 Running 0 65m 10.46.0.6 work2
http-go-568f649bb-tjfpn 1/1 Running 0 62s 10.46.0.15 work2
http-go-568f649bb-trxqh 1/1 Running 0 62s 10.46.0.28 work2
http-go-568f649bb-whwrx 1/1 Running 0 8m38s 10.46.0.11 work2
http-go-568f649bb-zqg6x 1/1 Running 0 62s 10.46.0.20 work2
Run uncordon on work1 so that pods can be scheduled onto it normally again
$ kubectl uncordon work1
node/work1 uncordoned
Running uncordon does not move the pods already placed on work2 back over to work1
$ kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE
http-go-568f649bb-4c94p 1/1 Running 0 62s 10.46.0.12 work2
http-go-568f649bb-4jtvj 1/1 Running 0 65m 10.46.0.4 work2
http-go-568f649bb-5w5gm 1/1 Running 0 62s 10.46.0.16 work2
http-go-568f649bb-64v9w 1/1 Running 0 62s 10.46.0.31 work2
http-go-568f649bb-8m422 1/1 Running 0 8m37s 10.46.0.10 work2
http-go-568f649bb-8rb26 1/1 Running 0 62s 10.46.0.14 work2
http-go-568f649bb-8wpvs 1/1 Running 0 62s 10.46.0.26 work2
http-go-568f649bb-cm95p 1/1 Running 0 62s 10.46.0.22 work2
http-go-568f649bb-dw64x 1/1 Running 0 62s 10.46.0.23 work2
http-go-568f649bb-f87zj 1/1 Running 0 62s 10.46.0.27 work2
http-go-568f649bb-fxrg6 1/1 Running 0 65m 10.46.0.3 work2
http-go-568f649bb-gw9mv 1/1 Running 0 73m 10.46.0.1 work2
http-go-568f649bb-gwlj2 1/1 Running 0 65m 10.46.0.5 work2
http-go-568f649bb-hgs82 1/1 Running 0 65m 10.46.0.2 work2
http-go-568f649bb-j5t8b 1/1 Running 0 62s 10.46.0.25 work2
http-go-568f649bb-jkgbd 1/1 Running 0 62s 10.46.0.24 work2
http-go-568f649bb-jp42p 1/1 Running 0 62s 10.46.0.17 work2
http-go-568f649bb-kmjjb 1/1 Running 0 8m38s 10.46.0.9 work2
http-go-568f649bb-kzq6c 1/1 Running 0 62s 10.46.0.13 work2
http-go-568f649bb-lbnhj 1/1 Running 0 62s 10.46.0.21 work2
http-go-568f649bb-llzxp 1/1 Running 0 8m38s 10.46.0.8 work2
http-go-568f649bb-lvfjj 1/1 Running 0 62s 10.46.0.29 work2
http-go-568f649bb-mf25d 1/1 Running 0 62s 10.46.0.19 work2
http-go-568f649bb-mxg6t 1/1 Running 0 62s 10.46.0.18 work2
http-go-568f649bb-rnv5h 1/1 Running 0 62s 10.46.0.30 work2
http-go-568f649bb-s9s97 1/1 Running 0 65m 10.46.0.6 work2
http-go-568f649bb-tjfpn 1/1 Running 0 62s 10.46.0.15 work2
http-go-568f649bb-trxqh 1/1 Running 0 62s 10.46.0.28 work2
http-go-568f649bb-whwrx 1/1 Running 0 8m38s 10.46.0.11 work2
http-go-568f649bb-zqg6x 1/1 Running 0 62s 10.46.0.20 work2
Checking the nodes shows that the SchedulingDisabled status has been lifted
$ kubectl get node
NAME STATUS ROLES AGE VERSION
master Ready control-plane,master 23d v1.20.1
work1 Ready <none> 23d v1.20.1
work2 Ready <none> 23d v1.20.1
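If you do want the load spread back across work1, one option (not shown in the lecture) is to restart the deployment's rollout: the replacement pods go through the scheduler, which can now use work1 again:
$ kubectl rollout restart deployment http-go
deployment.apps/http-go restarted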