Commit c3fbf54

Kalyan Reddy Daida authored and committed
Welcome to Stack Simplify
1 parent 71dbfb9 commit c3fbf54

1 file changed (+91, -4 lines)

18-EKS-Monitoring-using-CloudWatch-Container-Insights/README.md

## Step-07: CloudWatch Logs Insights
- View Container logs
- View Container Performance Logs
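
The same log groups can also be inspected from a terminal. The sketch below assumes AWS CLI v2 (which provides `aws logs tail`) and the `/aws/containerinsights/<cluster>/<type>` naming used throughout this section; `CLUSTER_NAME`, `ci_log_group`, `list_ci_log_groups` and `tail_ci_logs` are hypothetical names chosen for illustration.

```shell
# Hypothetical helpers for browsing Container Insights logs from the CLI.
# Assumes AWS CLI v2 is installed and configured for the cluster's region.
CLUSTER_NAME="${CLUSTER_NAME:-eksdemo1}"

# Build the log group name for a log type (application, performance, ...)
ci_log_group() {
  printf '/aws/containerinsights/%s/%s\n' "$CLUSTER_NAME" "$1"
}

# List every Container Insights log group that exists for the cluster
list_ci_log_groups() {
  aws logs describe-log-groups \
    --log-group-name-prefix "/aws/containerinsights/${CLUSTER_NAME}/" \
    --query 'logGroups[].logGroupName' --output text
}

# Print the last N minutes of one log type (defaults: application, 10 minutes)
tail_ci_logs() {
  aws logs tail "$(ci_log_group "${1:-application}")" --since "${2:-10}m"
}
```

For example, `tail_ci_logs application 10` shows the last 10 minutes of container logs and `tail_ci_logs performance 5` the last 5 minutes of performance log events.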

## Step-08: Container Insights - Logs Insights in depth
- Log Groups
- Logs Insights
- Create Dashboard
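
Logs Insights queries can also be run outside the console. A minimal sketch, assuming AWS CLI v2 and a `date` that supports `+%s`; `run_insights_query` is a hypothetical helper that starts a query over the last hour (by default), polls until it finishes, and prints the results.

```shell
# Hypothetical helper: run a CloudWatch Logs Insights query and print results.
# Arguments: log group, query string, optional window in seconds (default 3600).
run_insights_query() {
  local log_group="$1" query="$2" window="${3:-3600}"
  local end start query_id status
  end="$(date +%s)"
  start=$((end - window))

  # start-query returns immediately with a query ID
  query_id="$(aws logs start-query \
    --log-group-name "$log_group" \
    --start-time "$start" --end-time "$end" \
    --query-string "$query" \
    --query 'queryId' --output text)"

  # Poll until the query is no longer Scheduled or Running
  while :; do
    status="$(aws logs get-query-results --query-id "$query_id" \
      --query 'status' --output text)"
    case "$status" in
      Scheduled|Running) sleep 1 ;;
      *) break ;;
    esac
  done

  # Print the final result rows
  aws logs get-query-results --query-id "$query_id" --output table
}
```

For example: `run_insights_query /aws/containerinsights/eksdemo1/performance 'STATS avg(node_cpu_utilization) by NodeName'`. Here `--query` is the AWS CLI's global JMESPath output filter, while `--query-string` carries the Logs Insights query itself.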

### Create Graph for Avg Node CPU Utilization
- Dashboard Name: EKS-Performance
- Widget Type: Bar
- Log Group: /aws/containerinsights/eksdemo1/performance
```
STATS avg(node_cpu_utilization) as avg_node_cpu_utilization by NodeName
| SORT avg_node_cpu_utilization DESC
```
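
The console steps above can also be expressed as a dashboard body and pushed with `aws cloudwatch put-dashboard`. This is a sketch under assumptions: the region is `us-east-1`, and the `log` widget properties (`query`, `region`, `view`, `title`) follow our reading of the CloudWatch dashboard body structure, so verify them against that reference before relying on it. `dashboard_body` and `create_dashboard` are hypothetical helper names.

```shell
# Hypothetical sketch: the EKS-Performance dashboard with one Logs Insights
# bar widget. Region and log group are assumptions; adjust for your cluster.
dashboard_body() {
  cat <<'EOF'
{
  "widgets": [
    {
      "type": "log",
      "x": 0, "y": 0, "width": 12, "height": 6,
      "properties": {
        "title": "Avg Node CPU Utilization",
        "region": "us-east-1",
        "view": "bar",
        "query": "SOURCE '/aws/containerinsights/eksdemo1/performance' | STATS avg(node_cpu_utilization) as avg_node_cpu_utilization by NodeName | SORT avg_node_cpu_utilization DESC"
      }
    }
  ]
}
EOF
}

# Create or overwrite the dashboard with the body above
create_dashboard() {
  aws cloudwatch put-dashboard \
    --dashboard-name EKS-Performance \
    --dashboard-body "$(dashboard_body)"
}
```

`put-dashboard` replaces the whole dashboard, so a complete script would list all the widgets from this step in a single `widgets` array.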

### Container Restarts
- Dashboard Name: EKS-Performance
- Widget Type: Table
- Log Group: /aws/containerinsights/eksdemo1/performance
```
STATS avg(number_of_container_restarts) as avg_number_of_container_restarts by PodName
| SORT avg_number_of_container_restarts DESC
```
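
To cross-check the Container Restarts widget against what Kubernetes itself reports, restart counts can be read with `kubectl`. A minimal sketch; `pod_restarts` is a hypothetical helper, and note that the widget shows an average over the query window while this prints current totals.

```shell
# Hypothetical helper: list pods with at least one container restart,
# highest total first. Sums restartCount across each pod's containers.
pod_restarts() {
  kubectl get pods --all-namespaces --no-headers \
    -o custom-columns='NS:.metadata.namespace,POD:.metadata.name,RESTARTS:.status.containerStatuses[*].restartCount' |
  awk '{
    n = split($3, r, ",")   # multi-container pods print e.g. "0,3"
    total = 0
    for (i = 1; i <= n; i++) total += r[i]
    if (total > 0) print total, $1 "/" $2
  }' | sort -rn
}
```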

### Cluster Node Failures
- Dashboard Name: EKS-Performance
- Widget Type: Table
- Log Group: /aws/containerinsights/eksdemo1/performance
```
stats avg(cluster_failed_node_count) as CountOfNodeFailures
| filter Type="Cluster"
| sort @timestamp desc
```
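
Node health behind the Cluster Node Failures widget can also be checked directly. A sketch with a hypothetical `not_ready_nodes` helper; it treats any STATUS that does not start with `Ready` (for example `NotReady` or `Unknown`) as a failure, and cordoned nodes (`Ready,SchedulingDisabled`) as healthy.

```shell
# Hypothetical helper: print name and status of every node that is not Ready.
not_ready_nodes() {
  kubectl get nodes --no-headers | awk '$2 !~ /^Ready/ { print $1, $2 }'
}
```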

### CPU Usage By Container
- Dashboard Name: EKS-Performance
- Widget Type: Bar
- Log Group: /aws/containerinsights/eksdemo1/performance
```
stats pct(container_cpu_usage_total, 50) as CPUPercMedian by kubernetes.container_name
| filter Type="Container"
```
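
For a point-in-time comparison with the CPU Usage By Container widget (which reports the median, `pct(..., 50)`, over the query window), current per-container CPU can be listed with `kubectl top`. This assumes the Kubernetes Metrics Server is installed in the cluster; `container_cpu` is a hypothetical helper.

```shell
# Hypothetical helper: current CPU per container, highest first.
# Requires metrics-server. Columns: NAMESPACE POD NAME CPU(cores) MEMORY(bytes)
container_cpu() {
  kubectl top pod --all-namespaces --containers |
    tail -n +2 |          # drop the header row
    sort -k4,4 -rn        # numeric sort on CPU, e.g. "120m" sorts as 120
}
```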

### Pods Requested vs Pods Running
- Dashboard Name: EKS-Performance
- Widget Type: Bar
- Log Group: /aws/containerinsights/eksdemo1/performance
```
fields @timestamp, @message
| sort @timestamp desc
| filter Type="Pod"
| stats min(pod_number_of_containers) as requested, min(pod_number_of_running_containers) as running, ceil(avg(pod_number_of_containers-pod_number_of_running_containers)) as pods_missing by kubernetes.pod_name
| sort pods_missing desc
```
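
The READY column of `kubectl get pods` (for example `1/2`) gives a view similar to the `pods_missing` column above. A sketch with a hypothetical `pods_missing_containers` helper; READY counts containers that are ready, which is stricter than merely running, so results can differ slightly from the query.

```shell
# Hypothetical helper: pods where fewer containers are ready than declared.
# kubectl columns: NAMESPACE NAME READY STATUS RESTARTS AGE
pods_missing_containers() {
  kubectl get pods --all-namespaces --no-headers |
    awk '{ split($3, c, "/"); if (c[1] < c[2]) print $1 "/" $2, $3 }'
}
```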

### Application log errors by container name
- Dashboard Name: EKS-Performance
- Widget Type: Bar
- Log Group: /aws/containerinsights/eksdemo1/application
```
stats count() as countoferrors by kubernetes.container_name
| filter stream="stderr"
| sort countoferrors desc
```
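
To drill into the errors behind a bar in this widget, the matching events can be pulled with `aws logs filter-log-events` and a JSON filter pattern. A minimal sketch, assuming each log event is JSON with the `stream` and `kubernetes.container_name` fields that the query above filters on; `container_stderr` is a hypothetical helper.

```shell
# Hypothetical helper: recent stderr lines for one container.
# Arguments: container name, optional look-back in minutes (default 15).
container_stderr() {
  local container="$1" minutes="${2:-15}"
  # filter-log-events expects milliseconds since the epoch
  local start_ms=$(( ($(date +%s) - minutes * 60) * 1000 ))
  aws logs filter-log-events \
    --log-group-name "/aws/containerinsights/${CLUSTER_NAME:-eksdemo1}/application" \
    --start-time "$start_ms" \
    --filter-pattern "{ (\$.stream = \"stderr\") && (\$.kubernetes.container_name = \"${container}\") }" \
    --query 'events[].message' --output text
}
```

For example, `container_stderr nginx 30`, where `nginx` stands in for one of your container names.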

- **Reference**: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Container-Insights-view-metrics.html

## Step-09: Container Insights - CloudWatch Alarms
### Create Alarms - Node CPU Usage
- **Specify metric and conditions**
  - **Select Metric:** Container Insights -> ClusterName -> node_cpu_utilization
  - **Metric Name:** eksdemo1_node_cpu_utilization
  - **Threshold Value:** 4
  - **Important Note:** The alarm sends a notification email whenever node CPU utilization goes above 4%. In practice the threshold would be closer to 80% or 90%; we use 4% here only so that the load simulation triggers the alarm.
- **Configure Actions**
  - **Create New Topic:** eks-alerts
  - **Email:** your email address
  - Click on **Create Topic**
  - **Important Note:** Confirm the email subscription using the link that SNS sends to your email address.
- **Add name and description**
  - **Name:** EKS-Nodes-CPU-Alert
  - **Description:** EKS Nodes CPU alert notification
  - Click **Next**
- **Preview**
  - Preview and Create Alarm
- **Add Alarm to our custom Dashboard**
- Generate Load & Verify Alarm
```
# Generate Load
kubectl run apache-bench -i --tty --rm --restart=Never --image=httpd -- ab -n 500000 -c 1000 http://sample-nginx-service.default.svc.cluster.local/
```
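
The console steps in this section map onto a few AWS CLI calls. The sketch below is one possible equivalent, under assumptions: the metric is `node_cpu_utilization` in the `ContainerInsights` namespace with a `ClusterName` dimension (as chosen under Select Metric), and the 5-minute period, `Average` statistic and single evaluation period are our choices, not values from the steps above. All function names are hypothetical.

```shell
# Hypothetical CLI equivalent of the alarm console steps.
CLUSTER_NAME="${CLUSTER_NAME:-eksdemo1}"

create_topic() {
  # Prints the topic ARN (returns the existing ARN if eks-alerts already exists)
  aws sns create-topic --name eks-alerts --query 'TopicArn' --output text
}

subscribe_email() {
  # $1 = topic ARN, $2 = email address; confirm via the email SNS sends
  aws sns subscribe --topic-arn "$1" --protocol email --notification-endpoint "$2"
}

create_cpu_alarm() {
  # $1 = topic ARN; 4% threshold only for load-simulation testing
  aws cloudwatch put-metric-alarm \
    --alarm-name EKS-Nodes-CPU-Alert \
    --alarm-description "EKS Nodes CPU alert notification" \
    --namespace ContainerInsights \
    --metric-name node_cpu_utilization \
    --dimensions "Name=ClusterName,Value=${CLUSTER_NAME}" \
    --statistic Average --period 300 --evaluation-periods 1 \
    --threshold 4 --comparison-operator GreaterThanThreshold \
    --alarm-actions "$1"
}

alarm_state() {
  # Prints OK, ALARM or INSUFFICIENT_DATA
  aws cloudwatch describe-alarms --alarm-names EKS-Nodes-CPU-Alert \
    --query 'MetricAlarms[0].StateValue' --output text
}
```

Usage: `topic_arn=$(create_topic)`, then `subscribe_email "$topic_arn" <your-email-address>`, confirm the subscription, and run `create_cpu_alarm "$topic_arn"`. While the load test runs, `alarm_state` should move from `OK` to `ALARM` once a period's average exceeds 4%.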

## Step-10: Clean-Up Container Insights
```
# Template
curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/quickstart/cwagent-fluentd-quickstart.yaml | sed "s/{{cluster_name}}/cluster-name/;s/{{region_name}}/cluster-region/" | kubectl delete -f -

# Replace cluster-name and cluster-region with your values, for example eksdemo1 in us-east-1
curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/quickstart/cwagent-fluentd-quickstart.yaml | sed "s/{{cluster_name}}/eksdemo1/;s/{{region_name}}/us-east-1/" | kubectl delete -f -
```
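
A sketch of the clean-up with the cluster name and region held in variables, plus removal of the CloudWatch resources created in Steps 08 and 09, which `kubectl delete` does not touch. Assumptions: the names match those used in this section, and the quickstart manifest creates the `amazon-cloudwatch` namespace; all function and variable names are hypothetical. `delete_log_groups` permanently deletes the collected logs, so run it only if you no longer need them.

```shell
# Hypothetical clean-up helpers; adjust names to match your setup.
CLUSTER_NAME="${CLUSTER_NAME:-eksdemo1}"
REGION="${REGION:-us-east-1}"
MANIFEST_URL="https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/quickstart/cwagent-fluentd-quickstart.yaml"

# Fill in the template placeholders (same sed expression as above)
render_manifest() {
  sed "s/{{cluster_name}}/${CLUSTER_NAME}/;s/{{region_name}}/${REGION}/"
}

delete_container_insights() {
  curl -s "$MANIFEST_URL" | render_manifest | kubectl delete -f -
  # Expected to report NotFound once the namespace has been removed
  kubectl get namespace amazon-cloudwatch
}

delete_alarm_resources() {
  # $1 = SNS topic ARN created in Step-09
  aws cloudwatch delete-alarms --alarm-names EKS-Nodes-CPU-Alert
  aws cloudwatch delete-dashboards --dashboard-names EKS-Performance
  aws sns delete-topic --topic-arn "$1"
}

delete_log_groups() {
  # Log groups outlive the agent; delete every group under the cluster prefix
  local lg
  for lg in $(aws logs describe-log-groups \
      --log-group-name-prefix "/aws/containerinsights/${CLUSTER_NAME}/" \
      --query 'logGroups[].logGroupName' --output text); do
    aws logs delete-log-group --log-group-name "$lg"
  done
}
```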

## Step-11: Clean-Up Application
```
# Delete Apps
kubectl delete -f kube-manifests/
```
