Commit 03e1b95: K8s annotation discovery blogpost (#5967)

Authored by ChrsMark
Signed-off-by: ChrsMark <[email protected]>
Co-authored-by: Dmitrii Anoshin <[email protected]>
Co-authored-by: Patrice Chalin <[email protected]>
Co-authored-by: opentelemetrybot <[email protected]>
Co-authored-by: Tiffany Hrabusa <[email protected]>
Co-authored-by: Severin Neumann <[email protected]>

1 parent c5735ef

2 files changed: +210 -0 lines changed
@@ -0,0 +1,186 @@
---
title: Kubernetes annotation-based discovery for the OpenTelemetry Collector
linkTitle: K8s annotation-based discovery
date: 2025-01-27
author: >
  [Dmitrii Anoshin](https://github.com/dmitryax) (Cisco/Splunk), [Christos
  Markou](https://github.com/ChrsMark) (Elastic)
sig: Collector
issue: opentelemetry-collector-contrib#34427
cSpell:ignore: Dmitrii Anoshin Markou
---

In the world of containers and [Kubernetes](https://kubernetes.io/), observability is crucial. Users need to know the status of their workloads at any given time. In other words, they need observability into moving targets.

This is where the [OpenTelemetry Collector](/docs/collector/) and its [receiver creator](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/v0.117.0/receiver/receivercreator) component come in handy. Users can set up fairly complex monitoring scenarios with a self-service approach, following the principle of least privilege at the cluster level.

The self-service approach is great, but how much self-service can it actually be? In this blog post, we explore a newly added Collector feature that makes dynamic workload discovery even easier, providing a seamless experience for both administrators and users.

## Automatic discovery for containers and pods

Applications running in containers and pods become moving targets for the monitoring system. With automatic discovery, monitoring agents like the Collector can track changes at the container and pod levels and dynamically adjust the monitoring configuration.

Today, the Collector, and specifically the receiver creator, can provide such an experience. Using the receiver creator, observability users can define configuration "templates" that rely on environment conditions. For example, as an observability engineer, you can configure your Collectors to enable the [NGINX receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/v0.117.0/receiver/nginxreceiver) when an NGINX pod is deployed on the cluster. The following configuration achieves this:

```yaml
receivers:
  receiver_creator:
    watch_observers: [k8s_observer]
    receivers:
      nginx:
        rule: type == "port" && port == 80 && pod.name matches "(?i)nginx"
        config:
          endpoint: 'http://`endpoint`/nginx_status'
          collection_interval: '15s'
```

The previous configuration enables the NGINX receiver when a pod discovered through the Kubernetes API exposes port `80` (the well-known NGINX port) and has a name matching the `nginx` keyword.

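As an illustration, a pod like the following hypothetical manifest would satisfy that rule; the pod name and image are assumptions for this sketch, not taken from the configuration above:

```yaml
# Hypothetical pod that the rule above would match:
# its name contains "nginx" (case-insensitive) and it exposes port 80.
apiVersion: v1
kind: Pod
metadata:
  name: my-nginx
spec:
  containers:
    - name: nginx
      image: nginx:1.27
      ports:
        - containerPort: 80
```
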
This is great, and as an SRE or platform engineer managing an observability solution, you can rely on it to meet your users' needs for monitoring NGINX workloads. However, what happens if another team wants to monitor a different type of workload, such as Apache servers? They would need to inform your team, and you would need to update the configuration with a new conditional configuration block, take it through a pull request and review process, and finally deploy it. This deployment would require the Collector instances to restart for the new configuration to take effect. While this process might not be a big deal for some teams, there is definitely room for improvement.

So, what if, as a Collector user, you could simply enable automatic discovery and then let your cluster users tell the Collector how their workloads should be monitored by annotating their pods properly? That sounds awesome, and it's not actually something new. OpenTelemetry already supports auto-instrumentation through the [Kubernetes operator](/docs/kubernetes/operator/automatic/), allowing users to instrument their applications automatically just by annotating their pods. In addition, other monitoring agents in the observability industry already support this feature, and users are familiar with it.

All this motivation led the OpenTelemetry community ([GitHub issue](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/17418)) to create a similar feature for the Collector. We are happy to share that autodiscovery based on Kubernetes annotations is now supported in the Collector ([GitHub issue](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/34427))!

## A solution

The solution is built on top of the existing functionality provided by the [Kubernetes observer](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/v0.117.0/extension/observer/k8sobserver) and [receiver creator](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/v0.117.0/receiver/receivercreator).

The K8s observer notifies the receiver creator about objects appearing in the K8s cluster and provides all the information about them. In addition to the K8s object metadata, the observer supplies information about the discovered endpoints that the Collector can connect to. This means that each discovered endpoint can potentially be used by a particular scraping receiver to fetch metrics data.

Each scraping receiver has a default configuration with only one required field: `endpoint`. Given that the endpoint information is provided by the Kubernetes observer, the only information users need to provide explicitly is which receiver or scraper should be used to scrape data from a discovered endpoint. That information can be configured on the Collector, but as mentioned before, this is inconvenient. A much more convenient place to define which receiver can be used to scrape telemetry from a particular pod is the pod itself, and a pod's annotations are the natural place to put that kind of detail. Given that the receiver creator has access to the annotations, it can instantiate the proper receiver with the receiver's default configuration and the discovered endpoint.

The following annotation instructs the receiver creator that this particular pod runs NGINX, and the [NGINX receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/v0.117.0/receiver/nginxreceiver) can be used to scrape metrics from it:

```yaml
io.opentelemetry.discovery.metrics/scraper: nginx
```

Apart from that, discovery needs to be explicitly enabled on the pod with the following annotation:

```yaml
io.opentelemetry.discovery.metrics/enabled: 'true'
```

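Putting the two annotations together, a pod opting in to metrics discovery might look like the following sketch; the pod name and image are hypothetical, only the annotation keys and values come from the feature itself:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-nginx # hypothetical name
  annotations:
    # Opt this pod in to metrics discovery and name the scraper to use.
    io.opentelemetry.discovery.metrics/enabled: 'true'
    io.opentelemetry.discovery.metrics/scraper: nginx
spec:
  containers:
    - name: nginx
      image: nginx:1.27
      ports:
        - containerPort: 80
```
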
In some scenarios, the receiver's default configuration is not suitable for connecting to a particular pod. In that case, it's possible to define custom configuration as part of another annotation:

```yaml
io.opentelemetry.discovery.metrics/config: |
  endpoint: "http://`endpoint`/nginx_status"
  collection_interval: '20s'
  initial_delay: '20s'
  read_buffer_size: '10'
```

It's important to mention that the configuration defined in the annotations cannot point the receiver creator to another pod. The Collector will reject such configurations.

In addition to metrics scraping, annotation-based discovery also supports log collection with the filelog receiver. The following annotation can be used to enable log collection on a particular pod:

```yaml
io.opentelemetry.discovery.logs/enabled: 'true'
```

Similar to metrics, an optional configuration can be provided in the following form:

```yaml
io.opentelemetry.discovery.logs/config: |
  max_log_size: "2MiB"
  operators:
    - type: container
      id: container-parser
    - type: regex_parser
      regex: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<sev>[A-Z]*) (?P<msg>.*)$'
```

If the set of filelog receiver operators needs to be changed, the full list, including the default container parser, has to be redefined, because list config fields are entirely replaced when merged into the default configuration struct.

The discovery functionality has to be explicitly enabled in the receiver creator by adding the following configuration field:

```yaml
receivers:
  receiver_creator:
    watch_observers: [k8s_observer]
    discovery:
      enabled: true
```

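For context, a minimal end-to-end Collector configuration wiring the observer, the receiver creator, and a pipeline together might look like the following sketch; the `k8s_observer` extension settings and the `debug` exporter are illustrative choices for this example, not prescribed by the feature:

```yaml
extensions:
  # The Kubernetes observer watches the API server for pods to discover.
  k8s_observer:
    auth_type: serviceAccount
    observe_pods: true

receivers:
  receiver_creator:
    watch_observers: [k8s_observer]
    # Enable annotation-based discovery.
    discovery:
      enabled: true

exporters:
  debug: {}

service:
  extensions: [k8s_observer]
  pipelines:
    metrics:
      receivers: [receiver_creator]
      exporters: [debug]
```
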
## Give it a try

If you are an OpenTelemetry Collector user on Kubernetes and you find this new feature interesting, see the [Receiver Creator configuration] section to learn more.

Give it a try and let us know what you think via the `#otel-collector` channel of the [CNCF Slack workspace](https://slack.cncf.io/).

[Receiver Creator configuration]:
  https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/v0.117.0/receiver/receivercreator/README.md#generate-receiver-configurations-from-provided-hints

static/refcache.json (+24)

```diff
@@ -6835,6 +6835,10 @@
     "StatusCode": 206,
     "LastSeen": "2025-01-17T15:52:04.586821-05:00"
   },
+  "https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/v0.117.0/receiver/receivercreator/README.md#generate-receiver-configurations-from-provided-hints": {
+    "StatusCode": 206,
+    "LastSeen": "2025-01-23T00:42:06.594567598Z"
+  },
   "https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/v0.95.0/processor/spanmetricsprocessor/README.md": {
     "StatusCode": 206,
     "LastSeen": "2025-01-17T15:48:51.581459-05:00"
@@ -6843,6 +6847,10 @@
     "StatusCode": 200,
     "LastSeen": "2024-10-24T15:10:27.834953+02:00"
   },
+  "https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/17418": {
+    "StatusCode": 206,
+    "LastSeen": "2025-01-23T00:42:05.404739498Z"
+  },
   "https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/23611": {
     "StatusCode": 200,
     "LastSeen": "2024-03-19T10:16:57.223070258Z"
@@ -6863,6 +6871,10 @@
     "StatusCode": 200,
     "LastSeen": "2024-05-15T19:23:48.560170178+03:00"
   },
+  "https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/34427": {
+    "StatusCode": 206,
+    "LastSeen": "2025-01-23T00:42:06.061499598Z"
+  },
   "https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/16594": {
     "StatusCode": 200,
     "LastSeen": "2024-01-30T16:05:10.646669-05:00"
@@ -7775,6 +7787,18 @@
     "StatusCode": 206,
     "LastSeen": "2025-01-16T14:34:25.853743-05:00"
   },
+  "https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/v0.117.0/extension/observer/k8sobserver": {
+    "StatusCode": 206,
+    "LastSeen": "2025-01-23T00:42:06.341804585Z"
+  },
+  "https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/v0.117.0/receiver/nginxreceiver": {
+    "StatusCode": 206,
+    "LastSeen": "2025-01-23T00:42:04.586714166Z"
+  },
+  "https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/v0.117.0/receiver/receivercreator": {
+    "StatusCode": 206,
+    "LastSeen": "2025-01-23T00:42:04.28940445Z"
+  },
   "https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/v0.61.0/exporter/prometheusexporter": {
     "StatusCode": 206,
     "LastSeen": "2025-01-16T14:34:03.854505-05:00"
```