---
title: Kubernetes annotation-based discovery for the OpenTelemetry Collector
linkTitle: Kubernetes annotation discovery
date: 2025-01-23
author: >
  [Dmitrii Anoshin](https://github.com/dmitryax) (Cisco/Splunk), [Christos
  Markou](https://github.com/ChrsMark) (Elastic)
sig: Collector
issue: opentelemetry-collector-contrib#34427
cSpell:ignore: Dmitrii Anoshin Markou
---

In the world of containers and [Kubernetes](https://kubernetes.io/),
observability is crucial. Users need to know the status of their workloads at
any given time. In other words, they need observability into moving objects.

This is where the [OpenTelemetry Collector](/docs/collector) and its
[receiver creator](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/v0.117.0/receiver/receivercreator)
component come in handy. Users can set up fairly complex monitoring scenarios
with a self-service approach, following the principle of least privilege at the
cluster level.

The self-service approach is great, but how much self-service can it actually
be? In this blog post, we will explore a newly added feature of the Collector
that makes dynamic workload discovery even easier, providing a seamless
experience for both administrators and users.

## Automatic discovery for containers and pods

Applications running in containers and pods become moving targets for the
monitoring system. With automatic discovery, monitoring agents like the
Collector can track changes at the container and pod levels and dynamically
adjust the monitoring configuration.

Today, the Collector, and specifically the receiver creator, can provide such an
experience. Using the receiver creator, observability users can define
configuration "templates" that rely on environment conditions. For example, as
an observability engineer, I can configure my Collector to enable the
[NGINX receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/v0.117.0/receiver/nginxreceiver)
when an NGINX pod is deployed on the cluster. The following configuration
achieves this:

```yaml
receivers:
  receiver_creator:
    watch_observers: [k8s_observer]
    receivers:
      nginx:
        rule: type == "port" && port == 80 && pod.name matches "(?i)nginx"
        config:
          endpoint: 'http://`endpoint`/nginx_status'
          collection_interval: '15s'
```

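Note that `k8s_observer` is a separate extension that has to be configured and
enabled in the `service` section alongside the receiver creator. A minimal
sketch of that wiring (the observer options shown are illustrative, not
exhaustive):

```yaml
extensions:
  # Watches the Kubernetes API for pods and reports discovered endpoints.
  k8s_observer:
    auth_type: serviceAccount
    observe_pods: true

service:
  extensions: [k8s_observer]
```
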
The above configuration is enabled when a pod discovered via the Kubernetes API
exposes port `80` (the well-known port for NGINX) and its name matches the
`nginx` keyword.

This is great, and as an SRE or platform engineer managing an observability
solution, you can rely on this to meet your users' needs for monitoring NGINX
workloads. However, what happens if another team wants to monitor a different
type of workload, such as Apache servers? They would need to inform your team,
and you would need to update the configuration with a new conditional
configuration block, take it through a pull request and review process, and
finally deploy it. This deployment would require the Collector instances to
restart for the new configuration to take effect. While this process might not
be a big deal for some teams, there is definitely room for improvement.

So, what if, as a Collector user, you could simply enable automatic discovery
and then let your cluster users tell the Collector how their workloads should be
monitored by annotating their pods properly? That sounds awesome, and it's not
actually something new. OpenTelemetry already supports
[auto-instrumentation through the Operator](/docs/kubernetes/operator/automatic/),
allowing users to instrument their applications automatically just by annotating
their pods. In addition, this is a feature that other monitoring agents in the
observability industry already support, and users are familiar with it.

All this motivation led the OpenTelemetry community
([GitHub issue](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/17418))
to create a similar feature for the Collector. We are happy to share that
autodiscovery based on Kubernetes annotations is now supported in the Collector
([GitHub issue](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/34427))!

## The solution

The solution is built on top of the existing functionality provided by the
[Kubernetes observer](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/v0.117.0/extension/observer/k8sobserver)
and the
[receiver creator](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/v0.117.0/receiver/receivercreator).

The Kubernetes observer notifies the receiver creator about objects appearing in
the cluster and provides all the information about them. In addition to the
Kubernetes object metadata, the observer supplies information about the
discovered endpoints that the Collector can connect to. This means that each
discovered endpoint can potentially be used by a particular scraping receiver to
fetch metrics data.

Each scraping receiver has a default configuration with only one required field:
`endpoint`. Given that the endpoint information is provided by the Kubernetes
observer, the only information the user needs to provide explicitly is which
receiver or scraper should be used to scrape data from a discovered endpoint.
That information can be configured in the Collector itself, but as mentioned
before, this is inconvenient. A much more convenient place to define which
receiver should be used to scrape telemetry from a particular pod is the pod
itself, and a pod's annotations are the natural place for that kind of detail.
Given that the receiver creator has access to the annotations, it can
instantiate the proper receiver with the receiver's default configuration and
the discovered endpoint.

The following annotation instructs the receiver creator that this particular pod
runs NGINX, and the
[NGINX receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/v0.117.0/receiver/nginxreceiver)
can be used to scrape metrics from it:

```yaml
io.opentelemetry.discovery.metrics/scraper: nginx
```

Apart from that, discovery needs to be explicitly enabled on the pod with the
following annotation:

```yaml
io.opentelemetry.discovery.metrics/enabled: 'true'
```

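Putting the two annotations together, a pod that should be scraped by the NGINX
receiver could look like the following sketch (the pod name and image are
hypothetical; only the annotations matter for discovery):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-nginx # hypothetical name
  annotations:
    # Opt this pod in to metrics discovery.
    io.opentelemetry.discovery.metrics/enabled: 'true'
    # Tell the receiver creator which scraper to instantiate.
    io.opentelemetry.discovery.metrics/scraper: nginx
spec:
  containers:
    - name: nginx
      image: nginx:1.25
      ports:
        - containerPort: 80
```
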
In some scenarios, the receiver's default configuration is not suitable for
connecting to a particular pod. In that case, it's possible to define a custom
configuration as part of another annotation:

```yaml
io.opentelemetry.discovery.metrics/config: |
  endpoint: "http://`endpoint`/nginx_status"
  collection_interval: '20s'
  initial_delay: '20s'
  read_buffer_size: '10'
```

It's important to mention that the configuration defined in the annotations
cannot point the receiver creator to another pod. The Collector will reject such
configurations.

In addition to metrics scraping, annotation-based discovery also supports log
collection with the
[filelog receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/v0.117.0/receiver/filelogreceiver).
The following annotation can be used to enable log collection on a particular
pod:

```yaml
io.opentelemetry.discovery.logs/enabled: 'true'
```

Similar to metrics, an optional configuration can be provided in the following
form:

```yaml
io.opentelemetry.discovery.logs/config: |
  max_log_size: "2MiB"
  operators:
    - type: container
      id: container-parser
    - type: regex_parser
      regex: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<sev>[A-Z]*) (?P<msg>.*)$'
```

If the set of filelog receiver operators needs to be changed, the full list,
including the default container parser, has to be redefined, because list config
fields are entirely replaced when merged into the default configuration struct.

Finally, the discovery functionality has to be explicitly enabled in the
receiver creator by adding the following configuration field:

```yaml
receivers:
  receiver_creator:
    watch_observers: [k8s_observer]
    discovery:
      enabled: true
```

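For completeness, here is a sketch of how the pieces could fit together in a
single Collector configuration. The pipeline layout and the use of the `debug`
exporter are illustrative choices, not requirements:

```yaml
extensions:
  # Discovers pods and their endpoints via the Kubernetes API.
  k8s_observer:
    auth_type: serviceAccount
    observe_pods: true

receivers:
  receiver_creator:
    watch_observers: [k8s_observer]
    # Enable annotation-based discovery.
    discovery:
      enabled: true

exporters:
  debug:

service:
  extensions: [k8s_observer]
  pipelines:
    metrics:
      receivers: [receiver_creator]
      exporters: [debug]
    logs:
      receivers: [receiver_creator]
      exporters: [debug]
```
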
## Wrapping up

If you are an OpenTelemetry Collector user on Kubernetes and you find this new
feature interesting, go ahead and visit the official
[documentation](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/v0.117.0/receiver/receivercreator/README.md#generate-receiver-configurations-from-provided-hints)
to learn more! And if you give it a try, let us know what you think. Don't
hesitate to reach out to us in the official CNCF
[Slack workspace](https://slack.cncf.io/), specifically in the `#otel-collector`
channel.