Commit 854f6a1

ChrsMark and dmitryax committed
Add k8s annotation discovery blogpost
Co-authored-by: Dmitrii Anoshin <[email protected]>
Signed-off-by: ChrsMark <[email protected]>
1 parent 8faf59e commit 854f6a1

File tree

1 file changed: +186 −0

  • content/en/blog/2025/otel-collector-k8s-discovery
@@ -0,0 +1,186 @@
---
title: Kubernetes annotation-based discovery for the OpenTelemetry Collector
linkTitle: Kubernetes annotation discovery
date: 2025-01-23
author: >
  [Dmitrii Anoshin](https://github.com/dmitryax) (Cisco/Splunk), [Christos
  Markou](https://github.com/ChrsMark) (Elastic)
sig: Collector
issue: opentelemetry-collector-contrib#34427
cSpell:ignore: Dmitrii Anoshin Markou
---

In the world of containers and [Kubernetes](https://kubernetes.io/),
observability is crucial. Users need to know the status of their workloads at
any given time. In other words, they need observability into moving objects.

This is where the [OpenTelemetry Collector](/docs/collector) and its
[receiver creator](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/v0.117.0/receiver/receivercreator)
component come in handy. Users can set up fairly complex monitoring scenarios
with a self-service approach, following the principle of least privilege at the
cluster level.

The self-service approach is great, but how much self-service can it actually
be? In this blog post, we will explore a newly added feature of the Collector
that makes dynamic workload discovery even easier, providing a seamless
experience for both administrators and users.

## Automatic discovery for containers and pods

Applications running on containers and pods become moving targets for the
monitoring system. With automatic discovery, monitoring agents like the
Collector can track changes at the container and pod levels and dynamically
adjust the monitoring configuration.

Today, the Collector—and specifically the receiver creator—can provide such an
experience. Using the receiver creator, observability users can define
configuration "templates" that rely on environment conditions. For example, as
an observability engineer, I can configure my Collector to enable the
[NGINX receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/v0.117.0/receiver/nginxreceiver)
when an NGINX pod is deployed on the cluster. The following configuration can
achieve this:

```yaml
receivers:
  receiver_creator:
    watch_observers: [k8s_observer]
    receivers:
      nginx:
        rule: type == "port" && port == 80 && pod.name matches "(?i)nginx"
        config:
          endpoint: 'http://`endpoint`/nginx_status'
          collection_interval: '15s'
```

The above configuration takes effect when a pod discovered via the Kubernetes
API exposes port `80` (the well-known NGINX port) and its name matches the
`nginx` keyword.
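
For illustration, a pod like the following (the name and image are
hypothetical) would satisfy that rule, since it exposes port `80` and its name
contains `nginx`:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-frontend # matches the "(?i)nginx" regex
spec:
  containers:
    - name: nginx
      image: nginx:1.27
      ports:
        - containerPort: 80 # satisfies the port == 80 condition
```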

This is great, and as an SRE or platform engineer managing an observability
solution, you can rely on this to meet your users' needs for monitoring NGINX
workloads. However, what happens if another team wants to monitor a different
type of workload, such as Apache servers? They would need to inform your team,
and you would need to update the configuration with a new conditional
configuration block, take it through a pull request and review process, and
finally deploy it. This deployment would require the Collector instances to
restart for the new configuration to take effect. While this process might not
be a big deal for some teams, there is definitely room for improvement.

So, what if, as a Collector user, you could simply enable automatic discovery
and then let your cluster users tell the Collector how their workloads should
be monitored by annotating their pods properly? That sounds awesome, and it's
not actually something new. OpenTelemetry already supports
auto-instrumentation through the Operator
([documentation](https://opentelemetry.io/docs/kubernetes/operator/automatic/)),
allowing users to instrument their applications automatically just by
annotating their pods. In addition, this is a feature that other monitoring
agents in the observability industry already support, and users are familiar
with it.

All this motivation led the OpenTelemetry community
([GitHub issue](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/17418))
to create a similar feature for the Collector. We are happy to share that
autodiscovery based on Kubernetes annotations is now supported in the Collector
([GitHub issue](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/34427))!

## The solution

The solution is built on top of the existing functionality provided by the
[Kubernetes observer](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/v0.117.0/extension/observer/k8sobserver)
and the
[receiver creator](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/v0.117.0/receiver/receivercreator).

The K8s observer notifies the receiver creator about objects appearing in the
K8s cluster and provides all the information about them. In addition to the
K8s object metadata, the observer supplies information about the discovered
endpoints that the Collector can connect to. This means that each discovered
endpoint can potentially be used by a particular scraping receiver to fetch
metrics data.
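
For context, the Kubernetes observer runs as a Collector extension that is
configured separately from the receiver creator. A minimal sketch, with
illustrative values, could look like this:

```yaml
extensions:
  k8s_observer:
    # Authenticate to the Kubernetes API with the Collector's service account.
    auth_type: serviceAccount
    # Limit discovery to the node the Collector runs on (handy for DaemonSets).
    node: ${env:K8S_NODE_NAME}
    observe_pods: true
```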

Each scraping receiver has a default configuration with only one required
field: `endpoint`. Given that the endpoint information is provided by the
Kubernetes observer, the only information that the user needs to provide
explicitly is which receiver/scraper should be used to scrape data from a
discovered endpoint. That information can be configured on the Collector, but
as mentioned before, this is inconvenient. A much more convenient place to
define which receiver can be used to scrape telemetry from a particular pod is
the pod itself. A pod's annotations are the natural place to put that kind of
detail. Given that the receiver creator has access to the annotations, it can
instantiate the proper receiver with the receiver's default configuration and
the discovered endpoint.

The following annotation instructs the receiver creator that this particular
pod runs NGINX, and the
[NGINX receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/v0.117.0/receiver/nginxreceiver)
can be used to scrape metrics from it:

```yaml
io.opentelemetry.discovery.metrics/scraper: nginx
```

Apart from that, discovery needs to be explicitly enabled on the pod with the
following annotation:

```yaml
io.opentelemetry.discovery.metrics/enabled: 'true'
```

In some scenarios, the default receiver's configuration is not suitable for
connecting to a particular pod. In that case, it's possible to define custom
configuration as part of another annotation:

```yaml
io.opentelemetry.discovery.metrics/config: |
  endpoint: "http://`endpoint`/nginx_status"
  collection_interval: '20s'
  initial_delay: '20s'
  read_buffer_size: '10'
```

It's important to mention that the configuration defined in the annotations
cannot point the receiver creator to another pod. The Collector will reject
such configurations.
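
Putting the metrics-related annotations together, an annotated pod could look
roughly like the following sketch (the pod name and image are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-nginx
  annotations:
    # Opt this pod in to annotation-based metrics discovery.
    io.opentelemetry.discovery.metrics/enabled: 'true'
    # Tell the receiver creator which scraper to instantiate.
    io.opentelemetry.discovery.metrics/scraper: nginx
    # Optionally override the scraper's default configuration.
    io.opentelemetry.discovery.metrics/config: |
      endpoint: "http://`endpoint`/nginx_status"
      collection_interval: '20s'
spec:
  containers:
    - name: nginx
      image: nginx:1.27
      ports:
        - containerPort: 80
```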

In addition to metrics scraping, annotation-based discovery also supports log
collection with the filelog receiver. The following annotation can be used to
enable log collection on a particular pod:

```yaml
io.opentelemetry.discovery.logs/enabled: 'true'
```

Similar to metrics, an optional configuration can be provided in the following
form:

```yaml
io.opentelemetry.discovery.logs/config: |
  max_log_size: "2MiB"
  operators:
    - type: container
      id: container-parser
    - type: regex_parser
      regex: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<sev>[A-Z]*) (?P<msg>.*)$'
```

If the set of filelog receiver operators needs to be changed, the full list,
including the default container parser, has to be redefined, because list
config fields are entirely replaced when merged into the default configuration
struct. That is why the example above repeats the `container` parser before
adding the custom `regex_parser`.
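
As with metrics, the log annotations live on the pod itself. A brief sketch of
an annotated pod (the name and image are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  annotations:
    # Opt this pod in to annotation-based log collection.
    io.opentelemetry.discovery.logs/enabled: 'true'
    io.opentelemetry.discovery.logs/config: |
      max_log_size: "2MiB"
spec:
  containers:
    - name: app
      image: registry.example.com/my-app:1.0
```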

Finally, the discovery functionality has to be explicitly enabled in the
receiver creator by adding the following configuration field:

```yaml
receivers:
  receiver_creator:
    watch_observers: [k8s_observer]
    discovery:
      enabled: true
```
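
Tying it all together, a minimal end-to-end Collector configuration could look
like the following sketch (the `debug` exporter stands in for whatever exporter
you actually use):

```yaml
extensions:
  k8s_observer:
    auth_type: serviceAccount

receivers:
  receiver_creator:
    watch_observers: [k8s_observer]
    discovery:
      enabled: true

exporters:
  debug:

service:
  extensions: [k8s_observer]
  pipelines:
    metrics:
      receivers: [receiver_creator]
      exporters: [debug]
    logs:
      receivers: [receiver_creator]
      exporters: [debug]
```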

## Wrapping up

If you are an OpenTelemetry Collector user on Kubernetes and you find this new
feature interesting, go ahead and visit the official
[documentation](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/v0.117.0/receiver/receivercreator/README.md#generate-receiver-configurations-from-provided-hints)
to learn more! And if you give it a try, let us know what you think. Don't
hesitate to reach out to us in the official CNCF
[Slack workspace](https://slack.cncf.io/), specifically the `#otel-collector`
channel.
