| 1 | +# EKS - Cluster Autoscaler |
| 2 | + |
| 3 | +## Step-01: Introduction |
| 4 | +- The Kubernetes Cluster Autoscaler automatically adjusts the number of nodes in your cluster when pods fail to launch due to lack of resources or when nodes in the cluster are underutilized and their pods can be rescheduled onto other nodes in the cluster. |
| 5 | + |
| 6 | +## Step-02: Verify that our NodeGroup has --asg-access |
| 7 | +- The `--asg-access` flag must have been passed when the cluster or nodegroup was created. |
| 8 | +- Check the command we used to create our cluster node group to confirm this. |
| 9 | + |
| 10 | +### What happens when we use the --asg-access flag? |
| 11 | +- It enables an IAM policy for cluster-autoscaler on the nodegroup role. |
| 12 | +- Let's review our nodegroup IAM role to confirm this. |
| 13 | +- Go to Services -> IAM -> Roles -> eksctl-eksdemo1-nodegroup-XXXXXX |
| 14 | +- Click on **Permissions** tab |
| 15 | +- You should see an inline policy named `eksctl-eksdemo1-nodegroup-eksdemo1-ng-private1-PolicyAutoScaling` in the list of policies associated with this role. |
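The console check above can also be scripted. The following is a minimal sketch, assuming the AWS CLI is installed and configured; `has_autoscaling_policy` is a hypothetical helper name, and the role name is the same placeholder used in the step above.

```shell
# Hypothetical helper: reads inline policy names (one per line) on stdin and
# succeeds if any of them ends in "PolicyAutoScaling".
has_autoscaling_policy() {
  grep -q 'PolicyAutoScaling$'
}

# Usage (assumes a configured AWS CLI; replace the role name with the one
# shown in the IAM console):
# aws iam list-role-policies --role-name eksctl-eksdemo1-nodegroup-XXXXXX \
#   --query 'PolicyNames[]' --output text | tr '\t' '\n' | has_autoscaling_policy \
#   && echo "autoscaling policy found"
```

The helper only inspects policy names, so it confirms that the policy is attached, not what it allows.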
| 16 | + |
| 17 | +## Step-03: Deploy Cluster Autoscaler |
| 18 | +``` |
| 19 | +# Deploy the Cluster Autoscaler to your cluster |
| 20 | +kubectl apply -f https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml |
| 21 | + |
| 22 | +# Add the cluster-autoscaler.kubernetes.io/safe-to-evict annotation to the deployment |
| 23 | +kubectl -n kube-system annotate deployment.apps/cluster-autoscaler cluster-autoscaler.kubernetes.io/safe-to-evict="false" |
| 24 | +``` |
| 25 | +## Step-04: Edit Cluster Autoscaler Deployment to add Cluster name and two more parameters |
| 26 | +``` |
| 27 | +kubectl -n kube-system edit deployment.apps/cluster-autoscaler |
| 28 | +``` |
| 29 | +- **Add cluster name** |
| 30 | +```yml |
| 31 | +# Before Change |
| 32 | + - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/<YOUR CLUSTER NAME> |
| 33 | + |
| 34 | +# After Change |
| 35 | + - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/eksdemo1 |
| 36 | +``` |
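As an alternative to editing the deployment interactively, the placeholder can be substituted in the manifest before it is applied. This is a minimal sketch: `fill_cluster_name` is a hypothetical helper, and it assumes the cluster name contains no `|`, `&`, or backslash characters.

```shell
# Hypothetical helper: replaces the "<YOUR CLUSTER NAME>" placeholder in a
# manifest read from stdin with the cluster name given as the first argument.
fill_cluster_name() {
  sed "s|<YOUR CLUSTER NAME>|$1|g"
}

# Usage (downloads the manifest from Step-03, fills in the name, applies it):
# curl -sL https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml \
#   | fill_cluster_name eksdemo1 | kubectl apply -f -
```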
| 37 | + |
| 38 | +- **Add two more parameters** |
| 39 | +```yml |
| 40 | + - --balance-similar-node-groups |
| 41 | + - --skip-nodes-with-system-pods=false |
| 42 | +``` |
| 43 | +- **Sample for reference** |
| 44 | +```yml |
| 45 | + spec: |
| 46 | + containers: |
| 47 | + - command: |
| 48 | + - ./cluster-autoscaler |
| 49 | + - --v=4 |
| 50 | + - --stderrthreshold=info |
| 51 | + - --cloud-provider=aws |
| 52 | + - --skip-nodes-with-local-storage=false |
| 53 | + - --expander=least-waste |
| 54 | + - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/eksdemo1 |
| 55 | + - --balance-similar-node-groups |
| 56 | + - --skip-nodes-with-system-pods=false |
| 57 | +``` |
| 58 | + |
| 59 | +## Step-05: Set the Cluster Autoscaler image to match our EKS cluster version |
| 60 | +- Open https://github.com/kubernetes/autoscaler/releases |
| 61 | +- Find the latest release whose minor version matches our cluster version (example: 1.16.n) and note its full version number. |
| 62 | +- Our cluster version is 1.16, and the matching Cluster Autoscaler release listed on that page is 1.16.5. |
| 63 | +``` |
| 64 | +# Template |
| 65 | +# Update Cluster Autoscaler Image Version |
| 66 | +kubectl -n kube-system set image deployment.apps/cluster-autoscaler cluster-autoscaler=us.gcr.io/k8s-artifacts-prod/autoscaling/cluster-autoscaler:v1.XY.Z |
| 67 | + |
| 68 | + |
| 69 | +# Example: our cluster runs 1.16, so we set the image to v1.16.5 |
| 70 | +kubectl -n kube-system set image deployment.apps/cluster-autoscaler cluster-autoscaler=us.gcr.io/k8s-artifacts-prod/autoscaling/cluster-autoscaler:v1.16.5 |
| 71 | +``` |
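Picking the matching release can be sketched as a small filter. This is an illustration under stated assumptions: `pick_ca_version` is a hypothetical helper, and it assumes release tags follow the form `cluster-autoscaler-X.Y.Z` and are supplied one per line on stdin (for example, copied from the releases page into a file).

```shell
# Hypothetical helper: given a cluster minor version such as "1.16" and a list
# of release tags on stdin, prints the highest matching X.Y.Z version.
pick_ca_version() {
  # Escape the dot so "1.16" matches only a literal "1.16".
  minor_re=$(printf '%s' "$1" | sed 's/\./\\./g')
  grep -E "^cluster-autoscaler-${minor_re}\\.[0-9]+\$" |
    sed 's/^cluster-autoscaler-//' |
    sort -t . -k 3,3n |
    tail -n 1
}

# Usage (tags.txt is a hypothetical file holding one tag name per line):
# version=$(pick_ca_version 1.16 < tags.txt)
# kubectl -n kube-system set image deployment.apps/cluster-autoscaler \
#   cluster-autoscaler=us.gcr.io/k8s-artifacts-prod/autoscaling/cluster-autoscaler:v"$version"
```

Sorting on the third field numerically keeps a patch release such as `.10` above `.5`, which a plain text sort would get wrong.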
| 72 | + |
| 73 | +## Step-06: Verify that the image version was updated |
| 74 | +``` |
| 75 | +kubectl -n kube-system get deployment.apps/cluster-autoscaler -o yaml |
| 76 | +``` |
| 77 | +- **Sample partial output** |
| 78 | +```yml |
| 79 | + spec: |
| 80 | + containers: |
| 81 | + - command: |
| 82 | + - ./cluster-autoscaler |
| 83 | + - --v=4 |
| 84 | + - --stderrthreshold=info |
| 85 | + - --cloud-provider=aws |
| 86 | + - --skip-nodes-with-local-storage=false |
| 87 | + - --expander=least-waste |
| 88 | + - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/eksdemo1 |
| 89 | + - --balance-similar-node-groups |
| 90 | + - --skip-nodes-with-system-pods=false |
| 91 | + image: us.gcr.io/k8s-artifacts-prod/autoscaling/cluster-autoscaler:v1.16.5 |
| 92 | +``` |
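Instead of scanning the full YAML, the image can be read directly with a JSONPath query and compared against the expected tag. A minimal sketch, where `image_tag` is a hypothetical helper and the query assumes the autoscaler is the first container in the pod template:

```shell
# Hypothetical helper: prints the tag portion (text after the last ":") of an
# image reference passed as the first argument.
image_tag() {
  printf '%s\n' "${1##*:}"
}

# Usage (assumes cluster-autoscaler is the first container in the deployment):
# image=$(kubectl -n kube-system get deployment.apps/cluster-autoscaler \
#   -o jsonpath='{.spec.template.spec.containers[0].image}')
# [ "$(image_tag "$image")" = "v1.16.5" ] && echo "image updated"
```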
| 93 | + |
| 94 | +## Step-07: View Cluster Autoscaler logs to verify that it is monitoring your cluster load. |
| 95 | +``` |
| 96 | +kubectl -n kube-system logs -f deployment.apps/cluster-autoscaler |
| 97 | +``` |
| 98 | +- Sample log reference |
| 99 | +```log |
| 100 | +I0607 09:14:37.793323 1 pre_filtering_processor.go:66] Skipping ip-192-168-60-30.ec2.internal - node group min size reached |
| 101 | +I0607 09:14:37.793332 1 pre_filtering_processor.go:66] Skipping ip-192-168-27-213.ec2.internal - node group min size reached |
| 102 | +I0607 09:14:37.793408 1 static_autoscaler.go:440] Scale down status: unneededOnly=true lastScaleUpTime=2020-06-07 09:12:27.367461648 +0000 UTC m=+37.138078060 lastScaleDownDeleteTime=2020-06-07 09:12:27.367461724 +0000 UTC m=+37.138078135 lastScaleDownFailTime=2020-06-07 09:12:27.367461801 +0000 UTC m=+37.138078213 scaleDownForbidden=false isDeleteInProgress=false scaleDownInCooldown=true |
| 103 | +I0607 09:14:47.803891 1 static_autoscaler.go:192] Starting main loop |
| 104 | +I0607 09:14:47.804234 1 utils.go:590] No pod using affinity / antiaffinity found in cluster, disabling affinity predicate for this loop |
| 105 | +I0607 09:14:47.804251 1 filter_out_schedulable.go:65] Filtering out schedulables |
| 106 | +I0607 09:14:47.804319 1 filter_out_schedulable.go:130] 0 other pods marked as unschedulable can be scheduled. |
| 107 | +I0607 09:14:47.804343 1 filter_out_schedulable.go:130] 0 other pods marked as unschedulable can be scheduled. |
| 108 | +I0607 09:14:47.804351 1 filter_out_schedulable.go:90] No schedulable pods |
| 109 | +I0607 09:14:47.804366 1 static_autoscaler.go:334] No unschedulable pods |
| 110 | +I0607 09:14:47.804376 1 static_autoscaler.go:381] Calculating unneeded nodes |
| 111 | +I0607 09:14:47.804392 1 pre_filtering_processor.go:66] Skipping ip-192-168-60-30.ec2.internal - node group min size reached |
| 112 | +I0607 09:14:47.804401 1 pre_filtering_processor.go:66] Skipping ip-192-168-27-213.ec2.internal - node group min size reached |
| 113 | +I0607 09:14:47.804460 1 static_autoscaler.go:440] Scale down status: unneededOnly=true lastScaleUpTime=2020-06-07 09:12:27.367461648 +0000 UTC m=+37.138078060 lastScaleDownDeleteTime=2020-06-07 09:12:27.367461724 +0000 UTC m=+37.138078135 lastScaleDownFailTime=2020-06-07 09:12:27.367461801 +0000 UTC m=+37.138078213 scaleDownForbidden=false isDeleteInProgress=false scaleDownInCooldown=true |
| 114 | + |
| 115 | +``` |
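The `Scale down status` lines in the sample above carry a `scaleDownInCooldown` field, which is a quick indicator of whether scale-down is currently paused. A minimal sketch for pulling out the most recent value, where `last_cooldown_state` is a hypothetical helper and the field format is assumed to match the sample log:

```shell
# Hypothetical helper: reads autoscaler log lines on stdin and prints the value
# of scaleDownInCooldown from the last line that contains it.
last_cooldown_state() {
  sed -n 's/.*scaleDownInCooldown=\([a-z]*\).*/\1/p' | tail -n 1
}

# Usage:
# kubectl -n kube-system logs --tail=200 deployment.apps/cluster-autoscaler \
#   | last_cooldown_state
```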
| 116 | + |
| 117 | +## Step-08: Deploy a simple application |
| 118 | +``` |
| 119 | +# Deploy Application |
| 120 | +kubectl apply -f kube-manifests/ |
| 121 | +``` |
| 122 | + |
| 123 | +## Step-09: Cluster Scale UP: Scale our application to 30 pods |
| 124 | +- Within 2 to 3 minutes, new nodes will be added one after another and the pending pods will be scheduled on them. |
| 125 | +- The node count will not exceed 4, the maximum we set during nodegroup creation. |
| 126 | +``` |
| 127 | +# Terminal - 1: Keep monitoring cluster autoscaler logs |
| 128 | +kubectl -n kube-system logs -f deployment.apps/cluster-autoscaler |
| 129 | + |
| 130 | +# Terminal - 2: Scale UP the demo application to 30 pods |
| 131 | +kubectl get pods |
| 132 | +kubectl get nodes |
| 133 | +kubectl scale --replicas=30 deploy ca-demo-deployment |
| 134 | +kubectl get pods |
| 135 | + |
| 136 | +# Terminal - 2: Verify nodes |
| 137 | +kubectl get nodes -o wide |
| 138 | +``` |
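Rather than re-running `kubectl get nodes` by hand, the number of Ready nodes can be counted and polled. A minimal sketch, where `count_ready_nodes` is a hypothetical helper that expects the default column layout of `kubectl get nodes --no-headers` (name first, status second):

```shell
# Hypothetical helper: reads `kubectl get nodes --no-headers` output on stdin
# and prints how many nodes have a STATUS of exactly "Ready".
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Usage: wait until the nodegroup reaches its maximum of 4 nodes.
# while [ "$(kubectl get nodes --no-headers | count_ready_nodes)" -lt 4 ]; do
#   sleep 15
# done
```

Nodes with a combined status such as `Ready,SchedulingDisabled` are not counted, so a node being drained does not count toward the total.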
| 139 | +## Step-10: Cluster Scale DOWN: Scale our application to 1 pod |
| 140 | +- After the cooldown period, which might take 5 to 20 minutes, the cluster scales back down to 2 nodes, the minimum we configured during nodegroup creation. |
| 141 | +``` |
| 142 | +# Terminal - 1: Keep monitoring cluster autoscaler logs |
| 143 | +kubectl -n kube-system logs -f deployment.apps/cluster-autoscaler |
| 144 | + |
| 145 | +# Terminal - 2: Scale down the demo application to 1 pod |
| 146 | +kubectl scale --replicas=1 deploy ca-demo-deployment |
| 147 | + |
| 148 | +# Terminal - 2: Verify nodes |
| 149 | +kubectl get nodes -o wide |
| 150 | +``` |
| 151 | + |
| 152 | +## Step-11: Clean-Up |
| 153 | +- We will leave the Cluster Autoscaler in place and remove only the application. |
| 154 | +``` |
| 155 | +kubectl delete -f kube-manifests/ |
| 156 | +``` |