|
| 1 | +--- |
| 2 | +title: OpenTelemetry Is Expanding Into CI/CD Observability |
| 3 | +linkTitle: OpenTelemetry Is Expanding Into CI/CD Observability |
| 4 | +date: 2025-02-24 |
| 5 | +author: >- |
| 6 | + [Dotan Horovits](https://github.com/horovits/) (CNCF Ambassador), [Adriel |
| 7 | + Perkins](https://github.com/adrielp) (Liatrio) |
| 8 | +canonical_url: https://www.cncf.io/blog/2024/11/04/opentelemetry-is-expanding-into-ci-cd-observability/ |
| 9 | +issue: 5546 |
| 10 | +sig: CI/CD Observability |
| 11 | +# prettier-ignore |
| 12 | +cSpell:ignore: andrzej bäck bäckmark chacin cicd frittoli grassi helmuth horovits jemmic joao kamphaus keptn kowalski liatrio liudmila molkova robb ruech safyan sarahan shkuro skyscanner slsa stencel suereth tekton voss |
| 13 | +--- |
| 14 | + |
| 15 | +We’ve been talking about the need for a common “language” for reporting and |
| 16 | +observing CI/CD pipelines for years, and finally, we see the first “words” of |
| 17 | +this language entering the “dictionary” of observability—the |
| 18 | +[OpenTelemetry open specification](/docs/specs/otel/). With the recent release |
| 19 | +of OpenTelemetry’s [Semantic Conventions](/docs/specs/semconv/), v1.27.0, you |
| 20 | +can find |
| 21 | +[designated attributes for reporting CI/CD pipelines](/docs/specs/semconv/attributes-registry/cicd/). |
| 22 | + |
| 23 | +This is the result of the hard work of the |
| 24 | +[CI/CD Observability Special Interest Group (SIG) within OpenTelemetry](https://github.com/open-telemetry/community/blob/main/projects/ci-cd.md). |
| 25 | +As we accomplish this core milestone for the first phase, we thought it’d be a |
| 26 | +good time to share it with the world. |
| 27 | + |
| 28 | +## Engineers need observability into their CI/CD pipelines |
| 29 | + |
| 30 | +[CI/CD observability](https://medium.com/@horovits/fcc6c10c4987) is essential |
| 31 | +for ensuring that software is released to production efficiently and reliably. |
| 32 | +Well-functioning CI/CD pipelines directly impact business outcomes by shortening |
| 33 | +[Lead Time for Changes DORA metric](https://horovits.medium.com/improving-devops-performance-with-dora-metrics-918b9604f8e2) |
| 34 | +and enabling fast identification and resolution of broken or flaky processes. By |
| 35 | +integrating observability into CI/CD workflows, teams can monitor the health and |
| 36 | +performance of their pipelines in real time, gaining insights into bottlenecks |
| 37 | +and areas that require improvement. |
| 38 | + |
| 39 | +Leveraging the same well-established tools used for monitoring production |
| 40 | +environments, organizations can extend their observability capabilities to |
| 41 | +include the release cycle, fostering a holistic approach to software delivery. |
| 42 | +Whether open source or proprietary tools, there’s no need to reinvent the wheel |
| 43 | +when choosing the observability toolchain for CI/CD pipelines. |
| 44 | + |
| 45 | +## The need for standardization |
| 46 | + |
| 47 | +However, the diverse landscape of CI/CD tools creates challenges in achieving |
| 48 | +consistent end-to-end observability. With each tool having its own means, |
| 49 | +format, and semantic conventions for reporting the pipeline execution status, |
| 50 | +fragmentation within the toolchain can hinder seamless monitoring. Migrating |
| 51 | +between tools becomes painful, as it requires reimplementing existing |
| 52 | +dashboards, reports, and alerts. |
| 53 | + |
| 54 | +Things become even more challenging when you need to monitor multiple tools |
| 55 | +involved in the release pipeline in a uniform manner. This is where |
| 56 | +[open standards and specifications become critical](https://horovits.medium.com/the-rise-of-open-standards-in-observability-highlights-from-kubecon-13694e732c97). |
| 57 | +They create a common uniform language, one which is tool- and vendor-agnostic, |
| 58 | +enabling cohesive observability across different tools and allowing teams to |
| 59 | +maintain a clear and comprehensive view of their CI/CD pipeline performance. |
| 60 | + |
| 61 | +The need for standardization is relevant for creating the semantic conventions |
| 62 | +mentioned above, the language for reporting what goes on in the pipeline. |
| 63 | +Standardization is also needed for the means by which this reporting is |
| 64 | +propagated through the system, such as when spawning processes during |
| 65 | +pipeline execution. This led us to promote standardization for using environment |
| 66 | +variables for context and baggage propagation between processes, another |
| 67 | +important milestone that was recently approved and merged. |
| 68 | + |
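As a rough illustration of the idea above, the following minimal sketch passes a W3C Trace Context `traceparent` value from a parent pipeline step to a spawned child process through an environment variable. The variable name `TRACEPARENT` and all helper names are assumptions made for this example; the merged OTEP is the place to check for the normative names and rules.

```python
import os
import re
import secrets
import subprocess
import sys

# W3C Trace Context "traceparent" layout: version-traceid-spanid-flags.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent() -> str:
    """Build a traceparent value for a freshly started, sampled pipeline span."""
    trace_id = secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)    # 16 hex characters
    return f"00-{trace_id}-{span_id}-01"


def parse_traceparent(value):
    """Return (trace_id, parent_span_id), or None if the value is malformed."""
    match = TRACEPARENT_RE.match(value or "")
    return (match.group(1), match.group(2)) if match else None


def child_env(traceparent: str) -> dict:
    """Copy the current environment and add the trace context for a child."""
    env = dict(os.environ)
    env["TRACEPARENT"] = traceparent  # variable name is an assumption
    return env


def run_step(traceparent: str) -> str:
    """Spawn a child step and return the trace ID it read from its environment."""
    # The child is a tiny inline script standing in for a real build step.
    child_code = "import os; print(os.environ['TRACEPARENT'].split('-')[1])"
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        env=child_env(traceparent),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    parent = new_traceparent()
    print("parent trace ID:", parse_traceparent(parent)[0])
    print("child saw trace ID:", run_step(parent))
```

Because the child reads the same trace ID the parent generated, spans created in both processes could be stitched into a single trace without either process knowing about the other's tooling.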
| 69 | +## OpenTelemetry: the natural home for CI/CD observability specification |
| 70 | + |
| 71 | +This realization drove us to look for the right way to approach creating a |
| 72 | +specification. OpenTelemetry has emerged as the standard for telemetry generation |
| 73 | +and collection. The OpenTelemetry specification is tasked with exactly this |
| 74 | +problem: creating a common, uniform, and vendor-agnostic specification for |
| 75 | +telemetry. And its support from the Cloud Native Computing Foundation (CNCF) |
| 76 | +ensures it remains open and vendor-neutral. As long-standing advocates of |
| 77 | +OpenTelemetry, we found it only natural to extend OpenTelemetry to cover this |
| 78 | +important DevOps use case. |
| 79 | + |
| 80 | +We started with an |
| 81 | +[OpenTelemetry extension proposal (OTEP #223)](https://github.com/open-telemetry/oteps/pull/223) |
| 82 | +a couple of years ago, proposing our idea to extend OpenTelemetry to cover the |
| 83 | +CI/CD observability use case. In parallel, we started a Slack channel on the |
| 84 | +CNCF Slack to gather fellow enthusiasts behind the idea and start brainstorming |
| 85 | +what that should look like. The Slack channel grew, and we quickly discovered |
| 86 | +that the problem is common across many organizations. |
| 87 | + |
| 88 | +With the feedback from the Technical Oversight Committee and others within the |
| 89 | +CNCF, we sought a mandate to start a dedicated Working |
| 90 | +Group for the topic under OpenTelemetry’s Semantic Conventions SIG (SIG SemConv |
| 91 | +for short). With their blessing, we |
| 92 | +[launched the formal CI/CD Observability SIG](https://github.com/open-telemetry/community/blob/main/projects/ci-cd.md) |
| 93 | +to formalize our previous Slack group discussions and goals. |
| 94 | + |
| 95 | +## OpenTelemetry’s CI/CD Observability SIG |
| 96 | + |
| 97 | +Since November of 2023, the SIG has been actively working to develop the |
| 98 | +standard for semantics around CI/CD observability in collaboration with experts |
| 99 | +from multiple companies and open source projects. At its inception, we decided |
| 100 | +to focus on a few key areas for 2024: |
| 101 | + |
| 102 | +- An initial set of common attributes across CI/CD systems. |
| 103 | +- Develop prototype(s) to include both holistic and signal-specific attributes. |
| 104 | +- Carry forward the proposal to add environment variables as context propagators |
| 105 | + to the OpenTelemetry specification (OTEP #258). |
| 106 | +- A strategy for bridging OpenTelemetry conventions with |
| 107 | + [CDEvents](https://cdevents.dev/docs/) and |
| 108 | + [Eiffel](https://eiffel-community.github.io/). |
| 109 | + |
| 110 | +At first, our SIG met during the larger Semantic Conventions Working Group |
| 111 | +meetings every Monday. This provided a good opportunity for us to get our |
| 112 | +bearings as we researched and discussed how we would accomplish the goals on our |
| 113 | +roadmap. This also enabled us to get to know many members of the larger |
| 114 | +OpenTelemetry community, solicit feedback on our designs, and get direction on |
| 115 | +how to proceed. The OpenTelemetry Semantic Convention Working Group has been |
| 116 | +extraordinarily supportive of the CI/CD initiative. |
| 117 | + |
| 118 | +Upon completion and release of its initial milestone (see below), our SIG was |
| 119 | +granted its own |
| 120 | +[dedicated meeting slot](https://github.com/open-telemetry/community/pull/2293) |
| 121 | +on the |
| 122 | +[OpenTelemetry calendar](https://github.com/open-telemetry/community#calendar), |
| 123 | +every Thursday at 0600 PT. The group gets together here to discuss current and |
| 124 | +future work prior to bringing it to the larger Semantic Conventions meetings on |
| 125 | +Monday. We greatly look forward to the continued support and participation of |
| 126 | +the community as we continue to drive forward this critical area of |
| 127 | +standardization. |
| 128 | + |
| 129 | +## CI/CD is part of the latest OpenTelemetry Semantic Conventions |
| 130 | + |
| 131 | +Over the course of months of iteration and feedback, the |
| 132 | +[first set of Semantic Conventions was merged](https://github.com/open-telemetry/semantic-conventions/pull/1075) |
| 133 | +for the v1.27.0 release. This change brought forth the first set of |
| 134 | +foundational semantics for CI/CD under the `cicd`, `artifact`, `vcs`, `test`, |
| 135 | +and `deployment` namespaces. This was a significant milestone for the CI/CD |
| 136 | +Observability SIG and the industry as a whole. It creates the foundation on which |
| 137 | +all of our group’s other goals can begin to take form and reach implementation. |
| 138 | + |
| 139 | +But what does that actually mean? What value does it provide? Let’s consider |
| 140 | +real-world examples for two of the namespaces. |
| 141 | + |
| 142 | +### Tracking release revisions from Version Control Systems (VCS) |
| 143 | + |
| 144 | +[Version Control System (VCS) attributes](/docs/specs/semconv/attributes-registry/vcs/) |
| 145 | +cover multiple areas common in a VCS like refs and changes (pull/merge |
| 146 | +requests). The `vcs.repository.ref.revision` attribute is a key piece of |
| 147 | +metadata. As Version Control Systems like GitHub and GitLab emit events, they |
| 148 | +can now have this semantically compliant attribute. That means when integrating |
| 149 | +code, releasing it, and deploying it to environments, systems can include this |
| 150 | +attribute and trace the code revision across system boundaries more easily. In the event a |
| 151 | +deployment fails, you can quickly look at the revision of code and track it back |
| 152 | +to the buggy release. This attribute is actually a key piece of metadata for |
| 153 | +[DORA metrics](https://dora.dev/guides/dora-metrics-four-keys/) too, as you |
| 154 | +calculate Change lead time and Failed deployment recovery time. |
| 155 | + |
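To make the revision-tracking idea concrete, here is a minimal sketch that joins events from different systems on the `vcs.repository.ref.revision` attribute and derives a change lead time. The event shape (`name`, `time`, `attributes`), the sample data, and the function name are hypothetical. Treating the earliest event as the commit and the latest as the deployment is a simplification, not how any particular backend computes the metric.

```python
from datetime import datetime, timezone

REVISION_KEY = "vcs.repository.ref.revision"

# Hypothetical events emitted by a VCS, a CI system, and a deployment tool.
EVENTS = [
    {
        "name": "commit.pushed",
        "time": datetime(2024, 11, 4, 9, 0, tzinfo=timezone.utc),
        "attributes": {REVISION_KEY: "9d59409acf479dfa0df1aa568182e43e43df8bbe28d60fcf2bc52e30068802cc"},
    },
    {
        "name": "pipeline.finished",
        "time": datetime(2024, 11, 4, 9, 40, tzinfo=timezone.utc),
        "attributes": {REVISION_KEY: "9d59409acf479dfa0df1aa568182e43e43df8bbe28d60fcf2bc52e30068802cc"},
    },
    {
        "name": "deployment.finished",
        "time": datetime(2024, 11, 4, 11, 30, tzinfo=timezone.utc),
        "attributes": {REVISION_KEY: "9d59409acf479dfa0df1aa568182e43e43df8bbe28d60fcf2bc52e30068802cc"},
    },
    {
        "name": "deployment.finished",
        "time": datetime(2024, 11, 5, 8, 0, tzinfo=timezone.utc),
        "attributes": {REVISION_KEY: "some-other-revision"},
    },
]


def change_lead_time(events, revision):
    """Time between the first and last event that carry the given revision.

    Returns None when fewer than two matching events exist.
    """
    stamps = [
        event["time"]
        for event in events
        if event["attributes"].get(REVISION_KEY) == revision
    ]
    if len(stamps) < 2:
        return None
    return max(stamps) - min(stamps)


if __name__ == "__main__":
    revision = EVENTS[0]["attributes"][REVISION_KEY]
    print("change lead time:", change_lead_time(EVENTS, revision))
```

The point of the shared attribute name is that the join key is the same no matter which tool emitted the event, so this kind of correlation needs no per-tool mapping.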
| 156 | +### Artifacts for supply chain security, aligned with the SLSA specification |
| 157 | + |
| 158 | +The |
| 159 | +[artifact attribute namespace](/docs/specs/semconv/attributes-registry/artifact/) |
| 160 | +has multiple attributes in its first implementation. One key set of attributes |
| 161 | +within this namespace covers [attestations](https://slsa.dev/attestation-model) |
| 162 | +that closely align with the [SLSA](https://slsa.dev/spec/v1.0/about) model. This |
| 163 | +is really the first time a direct connection is being made between observability |
| 164 | +and software supply chain security. Consider the following |
| 165 | +[supply chain threat model](https://slsa.dev/spec/v1.0/threats) defined by SLSA: |
| 166 | +{{< figure class="figure" src="SLSA-supply-chain-model.png" attr="SLSA Community Specification License 1.0" attrlink=`https://github.com/slsa-framework/slsa?tab=License-1-ov-file` >}} |
| 167 | + |
| 168 | +These new attributes for artifacts and attestations help observe the sequence of |
| 169 | +events modeled in the above diagram in real time. Really, the conventions that |
| 170 | +exist today and those that will be added in the future enable interoperability |
| 171 | +between core software delivery capabilities like security and platform |
| 172 | +engineering using observability semantics. |
| 173 | + |
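As a small sketch of how artifact attributes could support this kind of check, the example below hashes a build output, records it under attribute names from the `artifact` namespace, and compares the hash with a digest taken from an attestation. The helper names, the sample artifact, and the idea of passing the attested digest as a plain string are assumptions for illustration. Real attestation verification also involves signature checks that are out of scope here.

```python
import hashlib


def artifact_attributes(filename: str, content: bytes, version: str) -> dict:
    """Describe a build output using attribute names from the artifact namespace."""
    return {
        "artifact.filename": filename,
        "artifact.version": version,
        "artifact.hash": hashlib.sha256(content).hexdigest(),
    }


def matches_attested_digest(attributes: dict, attested_digest: str) -> bool:
    """Check the recorded artifact hash against a digest from an attestation.

    `attested_digest` is assumed to be a hex-encoded SHA-256 value that the
    caller has already extracted from a verified attestation.
    """
    return attributes["artifact.hash"] == attested_digest.lower()


if __name__ == "__main__":
    built = b"example build output"  # stand-in for real artifact bytes
    attrs = artifact_attributes("app-1.2.3.tar.gz", built, "1.2.3")
    attested = hashlib.sha256(built).hexdigest()
    print(attrs)
    print("matches attestation:", matches_attested_digest(attrs, attested))
```

Emitting these attributes on build and deployment telemetry would let an observability backend flag a deployment whose artifact hash does not match what was attested at build time.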
| 174 | +## What’s next for the CI/CD Observability Working Group |
| 175 | + |
| 176 | +As already mentioned, the first major milestone we reached was the merge of the |
| 177 | +OTEP for extending the semantic conventions with the new attributes, which is |
| 178 | +now part of the latest OpenTelemetry Semantic Conventions release. |
| 179 | + |
| 180 | +The second important milestone is |
| 181 | +[OTEP #258](https://github.com/open-telemetry/oteps/pull/258) for Environment |
| 182 | +Variable Context Propagation, which was just approved and merged. This OTEP sets |
| 183 | +the foundation for writing the specification. |
| 184 | + |
| 185 | +Since we’ve made progress on our initial milestones, we’ve updated the |
| 186 | +[CI/CD Observability SIG milestones for the remainder of 2024](https://github.com/open-telemetry/community/blob/main/projects/ci-cd.md). |
| 187 | +Our goal is to finish out as many of the defined milestones as possible by the |
| 188 | +end of the year. Notably, we’re focused on: |
| 189 | + |
| 190 | +- Adding |
| 191 | + [metric conventions for version control systems](https://github.com/open-telemetry/semantic-conventions/pull/1383). |
| 192 | +- Building tracing prototypes in CI/CD systems (for example, ArgoCD, GitHub, |
| 193 | + GitLab, Jenkins). |
| 194 | +- Getting [OTEP #258](https://github.com/open-telemetry/oteps/pull/258) ready |
| 195 | + for implementation and addition to the specification. |
| 196 | +- Adding attributes to the registry covering more domains like: |
| 197 | + - [Software outage incidents](https://github.com/open-telemetry/semantic-conventions/issues/1185) |
| 198 | + - [System attributes around CI/CD runners](https://github.com/open-telemetry/semantic-conventions/issues/1184) |
| 199 | +- Beginning work on trace and event (log) signal specifics to build the bridge |
| 200 | + for interoperability between other specifications. |
| 201 | +- Adopting the changes from the |
| 202 | + [Entity and Resource OTEP](https://github.com/open-telemetry/oteps/pull/264). |
| 203 | +- [Enabling vendor-specific extension(s)](https://github.com/open-telemetry/semantic-conventions/issues/1193). |
| 204 | +- Open source community outreach strategy for semantic adoption. |
| 205 | + |
| 206 | +All that has been mentioned thus far is just the beginning! We have lots of work |
| 207 | +defined on our |
| 208 | +[CICD Project Board](https://github.com/orgs/open-telemetry/projects/79), and we |
| 209 | +have work in progress! We’ll continue to iterate on the above milestones that |
| 210 | +we’ve set out for the remainder of 2024. Here are a couple of things to look out for: |
| 211 | + |
| 212 | +- Version Control System metrics—leading indicators for DORA |
| 213 | +- Traces from GitHub Actions and Audit Logs |
| 214 | + - Special thanks to the following people who are making this component |
| 215 | + possible: |
| 216 | + - Tyler Helmuth – Honeycomb |
| 217 | + - Andrzej Stencel – Elastic |
| 218 | + - Curtis Robert – Splunk |
| 219 | + - Justin Voss |
| 220 | + - Kristof Kowalski – ANZ Bank |
| 221 | + - Mike Sarahan – Nvidia |
| 222 | +- A corresponding version of the GitHub Receiver Component but implemented in |
| 223 | + GitLab |
| 224 | + |
| 225 | +And much more! |
| 226 | + |
| 227 | +## It takes a village to extend OpenTelemetry |
| 228 | + |
| 229 | +Whoa, that’s a lot to do! Most certainly this SIG will continue beyond 2024 and |
| 230 | +through 2025. Standards are hard, but essential. And we have some amazing folks |
| 231 | +that are part of the SIG and contributing to these standards! Who, you may ask? |
| 232 | + |
| 233 | +First, we’d like to acknowledge key members of OpenTelemetry leadership |
| 234 | +committees who have heavily enabled the work we’ve done thus far, and will |
| 235 | +continue to do. |
| 236 | + |
| 237 | +From the OpenTelemetry Technical Committee we have two core sponsors, Carlos |
| 238 | +Alberto from Lightstep and Josh Suereth from Google. Both Carlos and Josh have |
| 239 | +been so supportive of the CI/CD work, really guiding us through the process and |
| 240 | +details we need to be successful. |
| 241 | + |
| 242 | +From the OpenTelemetry Governance Committee we’ve had Trask Stalnaker from |
| 243 | +Microsoft act as an exceptional ally, and Daniel Blanco from Skyscanner who now |
| 244 | +acts as our current Liaison. Both Trask and Daniel have been instrumental in |
| 245 | +supporting the SIG and enabling us to have our own meeting in the OpenTelemetry |
| 246 | +community. |
| 247 | + |
| 248 | +In addition to those folks, we’ve had significant feedback, support, and |
| 249 | +contributions from the following key folks: |
| 250 | + |
| 251 | +- Yuri Shkuro – Creator of Jaeger, Co-Founder of OpenTelemetry |
| 252 | +- Andrea Frittoli – Tekton CD Maintainer, CDEvents Co-creator, IBM |
| 253 | +- Emil Bäckmark – CDEvents and Eiffel Maintainer, Ericsson |
| 254 | +- Magnus Bäck – Eiffel, Axis Communications |
| 255 | +- Liudmila Molkova – Microsoft |
| 256 | +- Christopher Kamphaus – Jemmic, Jenkins |
| 257 | +- Giordano Ricci – Grafana Labs |
| 258 | +- Giovanni Liva – Dynatrace, Keptn |
| 259 | +- Ivan Calvo – Elastic, Jenkins |
| 260 | +- Armin Ruech – Dynatrace |
| 261 | +- Michael Safyan – Google |
| 262 | +- Robb Kidd – Honeycomb |
| 263 | +- Pablo Chacin – Grafana Labs |
| 264 | +- Alexandra Konrad – Elastic |
| 265 | +- Alexander Wert – Elastic |
| 266 | +- Joao Grassi – Dynatrace |
| 267 | +- DJ Gregor – Discover |
| 268 | + |
| 269 | +That was a lot of names to name! We greatly appreciate everyone who has |
| 270 | +supported this initiative and helped bring it to fruition! It takes significant |
| 271 | +thinking ability and time to build industry-wide standards. Hard problems are |
| 272 | +hard, but these folks have risen to the challenge to make the world of |
| 273 | +observability and CI/CD systems a better, more interoperable place! |
| 274 | + |
| 275 | +## Join the Working Group discourse and make an impact |
| 276 | + |
| 277 | +Want to learn more? Want to get involved in shaping CI/CD Observability? |
| 278 | + |
| 279 | +We invite developers and practitioners to participate in the discussions, |
| 280 | +contribute ideas, and help shape the future of CI/CD observability and the |
| 281 | +OpenTelemetry semantic conventions. Discussion takes place in the |
| 282 | +[CNCF Slack](https://slack.cncf.io/) workspace under the `#cicd-o11y` channel, |
| 283 | +and you can chime in on any of the GitHub issues mentioned throughout this |
| 284 | +article and join the CI/CD SIG |
| 285 | +[weekly calls](https://calendar.google.com/calendar?cid=Z29vZ2xlLmNvbV9iNzllM2U5MGo3YmJzYTJuMnA1YW41bGY2MEBncm91cC5jYWxlbmRhci5nb29nbGUuY29t) |
| 286 | +every Thursday at 0600 PT. |
| 287 | + |
| 288 | +_A version of this article also [appears on the CNCF blog][]._ |
| 289 | + |
| 290 | +[appears on the CNCF blog]: <{{% param canonical_url %}}> |