---
title: AI Agent Observability - Evolving Standards and Best Practices
author: >-
  [Guangya Liu](https://github.com/gyliu513) (IBM), [Sujay
  Solomon](https://github.com/solsu01) (Google)
linkTitle: AI Agent Observability
issue: https://github.com/open-telemetry/opentelemetry.io/issues/6389
sig: SIG GenAI Observability
date: 2025-03-06
cSpell:ignore: genai Guangya PydanticAI Sujay
---

## 2025: Year of AI agents

AI agents are becoming the next big leap in artificial intelligence in 2025.
From autonomous workflows to intelligent decision making, AI agents will power
numerous applications across industries. However, with this evolution comes the
critical need for AI agent observability, especially when scaling these agents
to meet enterprise needs. Without proper monitoring, tracing, and logging
mechanisms, diagnosing issues, improving efficiency, and ensuring reliability in
AI agent-driven applications will be challenging.

### What is an AI agent

An AI agent is an application that uses a combination of LLM capabilities, tools
to connect to the external world, and high-level reasoning to achieve a desired
end goal or state. Alternatively, agents can be treated as systems in which LLMs
dynamically direct their own processes and tool usage, maintaining control over
how they accomplish tasks.


<small>_Image credit_:
[Google AI Agent Whitepaper](https://www.kaggle.com/whitepaper-agents).</small>
For more information about AI agents, see:

- [Google: What is an AI agent?](https://cloud.google.com/discover/what-are-ai-agents)
- [IBM: What are AI agents?](https://www.ibm.com/think/topics/ai-agents)
- [Microsoft: AI agents — what they are, and how they’ll change the way we work](https://news.microsoft.com/source/features/ai/ai-agents-what-they-are-and-how-theyll-change-the-way-we-work/)
- [AWS: What are AI Agents?](https://aws.amazon.com/what-is/ai-agents/)
- [Anthropic: Building effective agents](https://www.anthropic.com/research/building-effective-agents)

### Observability and beyond

Typically, telemetry from applications is used to monitor and troubleshoot them.
In the case of an AI agent, given its non-deterministic nature, telemetry is
also used as a feedback loop: fed into evaluation tools, it helps you
continuously learn from and improve the quality of the agent.

Given that observability and evaluation tools for GenAI come from various
vendors, it is important to establish standards for the shape of the telemetry
generated by agent apps, to avoid lock-in caused by vendor- or
framework-specific formats.

## Current state of AI agent observability

As AI agent ecosystems continue to mature, the need for standardized and robust
observability has become more apparent. While some frameworks offer built-in
instrumentation, others rely on integration with observability tools. This
fragmented landscape underscores the importance of the
[GenAI observability project](https://github.com/open-telemetry/community/blob/main/projects/gen-ai.md)
and OpenTelemetry’s emerging semantic conventions, which aim to unify how
telemetry data is collected and reported.

### Understanding AI agent applications vs. AI agent frameworks

It is crucial to distinguish between **AI agent applications** and **AI agent
frameworks**:

- **AI agent applications** refer to individual AI-driven entities that perform
  specific tasks autonomously.
- **AI agent frameworks** provide the necessary infrastructure to develop,
  manage, and deploy AI agents, often in a more streamlined way than building an
  agent from scratch. Examples include the following:
  [IBM Bee AI](https://github.com/i-am-bee),
  [IBM wxFlow](https://github.com/IBM/wxflows/),
  [CrewAI](https://www.crewai.com/),
  [AutoGen](https://microsoft.github.io/autogen/dev/),
  [Semantic Kernel](https://github.com/microsoft/semantic-kernel),
  [LangGraph](https://www.langchain.com/langgraph),
  [PydanticAI](https://ai.pydantic.dev/), and more.


### Establishing a standardized semantic convention

Today, the
[GenAI observability project](https://github.com/open-telemetry/community/blob/main/projects/gen-ai.md)
within OpenTelemetry is actively working on defining semantic conventions to
standardize AI agent observability. This effort is primarily driven by:

- **Agent application semantic convention** – A draft AI agent application
  semantic convention has already been established as part of the discussions
  in the
  [OpenTelemetry semantic conventions repository](https://github.com/open-telemetry/semantic-conventions/issues/1732).
  The initial AI agent semantic convention is based on
  [Google's AI agent white paper](https://www.kaggle.com/whitepaper-agents),
  providing a foundational framework for defining observability standards.
  Moving forward, we will continue to refine and enhance this initial convention
  to make it more robust and comprehensive.
- **Agent framework semantic convention** – The focus has now shifted towards
  defining a common semantic convention for all AI agent frameworks. This effort
  is being discussed in
  [this OpenTelemetry issue](https://github.com/open-telemetry/semantic-conventions/issues/1530)
  and aims to establish a standardized approach for frameworks such as IBM Bee
  Stack, IBM wxFlow, CrewAI, AutoGen, LangGraph, and others. Additionally,
  different AI agent frameworks will be able to define their own vendor-specific
  semantic conventions while adhering to the common standard.

By establishing these conventions, we ensure that AI agent frameworks can report
standardized metrics, traces, and logs, making it easier to integrate
observability solutions and compare performance across different frameworks.

Note: Experimental conventions already exist in OpenTelemetry for models at
[GenAI semantic convention](/docs/specs/semconv/gen-ai/).

### Instrumentation approaches

In order to make a system observable, it must be instrumented: that is, code
from the system’s components must
[emit traces, metrics, and logs](/docs/concepts/instrumentation/).

Different AI agent frameworks have varying approaches to implementing
observability, mainly categorized into two options:

#### Option 1: Baked-in instrumentation

The first option is to implement built-in instrumentation that emits telemetry
using OpenTelemetry semantic conventions. This means observability is a native
feature, allowing users to seamlessly track agent performance, task execution,
and resource utilization. Some AI agent frameworks, such as CrewAI, follow this
pattern.

As a developer of an agent framework, here are some pros and cons of baked-in
instrumentation:

- Pros
  - You take on the maintenance overhead of keeping the telemetry
    instrumentation up to date, so your users don't have to.
  - Simplifies adoption for users unfamiliar with OpenTelemetry configuration.
  - You can keep new features secret while still providing instrumentation for
    them on the day of release.
- Cons
  - Adds bloat to the framework for users who do not need observability
    features.
  - Risk of version lock-in if the framework’s OpenTelemetry dependencies lag
    behind upstream updates.
  - Less flexibility for advanced users who prefer custom instrumentation.
  - You may not get feedback or review from OTel contributors familiar with
    current semantic conventions.
  - Your instrumentation may lag behind best practices and conventions (not
    just the versions of the OTel library dependencies).
- Some best practices to follow if you consider this approach:
  - Provide a configuration setting that lets users easily enable or disable
    telemetry collection from your framework's built-in instrumentation.
  - Plan ahead for users who want to use other external instrumentation
    packages, and avoid collisions.
  - Consider listing your agent framework in the
    [OpenTelemetry registry](/ecosystem/registry/) if you choose this path.
- As a developer of an agent application, you may want to choose an agent
  framework with baked-in instrumentation if you want:
  - Minimal dependencies on external packages in your agent app code.
  - Out-of-the-box observability without manual setup.
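
The enable/disable setting recommended above can be as simple as an environment-variable check. Here is a minimal sketch; the variable name `MYFRAMEWORK_TELEMETRY_ENABLED` is hypothetical, and a real framework should pick one namespaced to itself:

```python
import os

# Hypothetical variable name; namespace it to your own framework.
TELEMETRY_ENV_VAR = "MYFRAMEWORK_TELEMETRY_ENABLED"

def telemetry_enabled() -> bool:
    """Built-in instrumentation is on by default, with an explicit opt-out."""
    value = os.environ.get(TELEMETRY_ENV_VAR, "true").strip().lower()
    return value not in ("false", "0", "off", "no")

# A user opting out of the framework's built-in telemetry:
os.environ[TELEMETRY_ENV_VAR] = "false"
print(telemetry_enabled())  # prints False
```

Whether telemetry should default to on or off is a policy choice; whichever you pick, document it clearly so users are not surprised by what your framework emits.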

#### Option 2: Instrumentation via OpenTelemetry

The second option is to publish OpenTelemetry instrumentation libraries in
separate repositories. These instrumentation libraries can be imported into
agents and configured to emit telemetry per OpenTelemetry semantic conventions.

For publishing instrumentation with OpenTelemetry, there are two options:

- Option 1: External instrumentation in your own repository and package, like
  [Traceloop OpenTelemetry Instrumentation](https://github.com/traceloop/openllmetry/tree/main/packages)
  and
  [Langtrace OpenTelemetry Instrumentation](https://github.com/Scale3-Labs/langtrace-python-sdk/tree/main/src/langtrace_python_sdk/instrumentation).
- Option 2: External instrumentation in an OpenTelemetry-owned repository, like
  [instrumentation-genai](https://github.com/open-telemetry/opentelemetry-python-contrib/tree/main/instrumentation-genai).

Both options work well, but the long-term goal is to host the code in
OpenTelemetry-owned repositories, as Traceloop is doing now by proposing to
[donate its instrumentation code](https://github.com/open-telemetry/community/issues/2571)
to OpenTelemetry.
| 185 | + |
| 186 | +As a developer of an agent framework, here are some pros and cons of |
| 187 | +instrumentation with OpenTelemetry: |
| 188 | + |
| 189 | +- Pros |
| 190 | + - Decouples observability from the core framework, reducing bloat. |
| 191 | + - Leverages OpenTelemetry’s community-driven maintenance for instrumentation |
| 192 | + updates. |
| 193 | + - Allows users to mix and match contrib libraries for their specific needs |
| 194 | + (e.g., cloud providers, LLM vendors). |
| 195 | + - More likely to leverage best practices around semantic conventions and |
| 196 | + zero-code instrumentation |
| 197 | +- Cons |
| 198 | + - Risk of fragmentation if users rely on incompatible or outdated contrib |
| 199 | + packages for both install time and runtime. |
| 200 | + - Development velocity slows down when there are too many PRs in the |
| 201 | + OpenTelemetry review queue. |
| 202 | +- Best practices for this approach: |
| 203 | + - Ensure compatibility with popular OpenTelemetry contrib libraries (e.g., LLM |
| 204 | + vendors, vector DBs). |
| 205 | + - Provide clear documentation on recommended contrib packages and |
| 206 | + configuration examples. |
| 207 | + - Avoid reinventing the wheel; align with existing OpenTelemetry standards. |
| 208 | +- As a developer of an agent application, you may want to choose an agent |
| 209 | + framework with baked-in instrumentation if… |
| 210 | + - You need fine-grained control over telemetry sources and destinations. |
| 211 | + - Your use case requires integrating observability with niche or custom tools. |
| 212 | + |
| 213 | +**NOTE:** Regardless of the approach taken, it is essential that all AI agent |
| 214 | +frameworks adopt the AI agent framework semantic convention to ensure |
| 215 | +interoperability and consistency in observability data. |
| 216 | + |
| 217 | +## Future of AI agent observability |
| 218 | + |
| 219 | +Looking ahead, AI agent observability will continue to evolve with: |
| 220 | + |
| 221 | +- **More robust semantic conventions** to cover edge cases and emerging AI agent |
| 222 | + frameworks. |
| 223 | +- **A unified AI agent framework semantic convention** to ensure |
| 224 | + interoperability across different frameworks while allowing flexibility for |
| 225 | + vendor-specific extensions. |
| 226 | +- **Continuous improvements to the AI agent semantic convention** to refine the |
| 227 | + initial standard and address new challenges as AI agents evolve. |
| 228 | +- **Improved tooling** for monitoring, debugging, and optimizing AI agents. |
| 229 | +- **Tighter integration with AI model observability** to provide end-to-end |
| 230 | + visibility into AI powered applications. |
| 231 | + |

## Role of OpenTelemetry's GenAI SIG

The
[GenAI Special Interest Group (SIG) in OpenTelemetry](https://github.com/open-telemetry/community/blob/main/projects/gen-ai.md)
is actively defining [GenAI semantic conventions](/docs/specs/semconv/gen-ai/)
that cover key areas such as:

- LLM and model semantic conventions
- Vector DB semantic conventions
- AI agent semantic conventions (a critical component within the broader GenAI
  semantic conventions)

In addition to conventions, the SIG has also expanded its scope to provide
instrumentation coverage for agents and models in Python and other languages. As
AI agents become increasingly sophisticated, observability will play a
fundamental role in ensuring their reliability, efficiency, and trustworthiness.
Establishing a standardized approach to AI agent observability requires
collaboration, and we invite contributions from the broader AI community.

We look forward to partnering with different AI agent framework communities to
establish best practices and refine these standards together. Your insights and
contributions will help shape the future of AI observability, fostering a more
transparent and effective AI ecosystem.

Don’t miss this opportunity to help shape the future of industry standards for
GenAI observability! Join us in the `#otel-genai-instrumentation-wg` channel on
the [CNCF Slack](https://slack.cncf.io), or attend a
[GenAI SIG meeting](https://github.com/open-telemetry/community/blob/main/projects/gen-ai.md#meeting-times).