
[Donation Proposal]: Instrumentation automatic configuration #2580

Open
atoulme opened this issue Feb 19, 2025 · 16 comments
Assignees
Labels
area/donation Donation Proposal triage:accepted This issue has been accepted and will be worked. triage:tc-inbox

Comments

@atoulme
Contributor

atoulme commented Feb 19, 2025

Description

Splunk has built a mechanism that intercepts process invocation on hosts and adds environment variables to set up OpenTelemetry auto-instrumentation for the language used by the program, such as Java, Node.js, .NET, or Python. Similar to how the Operator webhook functions, this allows setting up auto-instrumentation automatically on any Linux host.

We support two separate mechanisms to inject instrumentation SDKs:

  • We offer integration via systemd environment variable configuration
  • We offer integration via an /etc/ld.so.preload hook that scans process invocations, intercepts them, and adds environment variables to them.

Benefits to the OpenTelemetry community

This mechanism can be distributed as a deb or rpm package and help increase the adoption of OpenTelemetry, as it would speed up the deployment of OpenTelemetry instrumentations on Linux hosts.

Reasons for donation

This component has reached stability, and Splunk is looking to make it an upstream open source component to share maintenance costs and improve it with more community input.

Repository

https://github.com/signalfx/splunk-otel-collector/tree/main/instrumentation

Existing usage

This component is used in production. You can see the current code under https://github.com/signalfx/splunk-otel-collector/tree/main/instrumentation

The current package contains Splunk-specific versions of the instrumentation SDKs. It also handles additional optional settings specific to these SDKs, such as a profiling feature that predates the OpenTelemetry project's adoption of the profiling signal.

We expect to discontinue these features and remove the related code as we offer this code upstream.

Maintenance

The existing team will continue to support and manage the code moving forward, and is actively recruiting additional triagers/approvers/maintainers from the community so this project is not tied to Splunk. This proposal also acts as a signal to gauge interest and recruit such committers.

Licenses

Apache 2.0 License

Trademarks

No trademarks related to this work are registered.

Other notes

No response

@mx-psi mx-psi added the area/donation Donation Proposal label Feb 24, 2025
@austinlparker austinlparker self-assigned this Mar 12, 2025
@austinlparker austinlparker added the triage:accepted This issue has been accepted and will be worked. label Mar 12, 2025
@austinlparker
Member

Thank you for this donation proposal! I'll be looking at this from the GC side.

@austinlparker
Member

Thanks for the donation proposal. After review, I have a few initial questions:

  • What is the desired extensibility story for this component? While I understand that this is, more or less, stable work, it does not support the entire ecosystem of automatic instrumentation supported by, say, the Operator (e.g., nginx/httpd, golang, et al.)
  • Would this component also be able to handle deploying the system-level profiler?
  • What would the expected level of effort be to support the declarative configuration format?
  • What is the intended installation story for this component? Is the goal here to provide this as an 'installer' for non-k8s deployments of the Collector?

cc: @open-telemetry/collector-maintainers @open-telemetry/operator-maintainers @open-telemetry/docs-maintainers @open-telemetry/profiling-maintainers for awareness and feedback on this donation as well.

@atoulme
Contributor Author

atoulme commented Mar 12, 2025

  • What is the desired extensibility story for this component? While I understand that this is, more or less, stable work, it does not support the entire ecosystem of automatic instrumentation supported by, say, the Operator (e.g., nginx/httpd, golang, et al.)

The method can be extended to other languages as long as they follow the specification for SDK environment variables, allowing them to be configured through environment variables (https://opentelemetry.io/docs/specs/otel/configuration/sdk-environment-variables/).
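To make that concrete, the spec-defined variables a host-wide injector would export look like this (service name, endpoint, and attribute values are placeholders):

```
OTEL_SERVICE_NAME=checkout-service
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
OTEL_TRACES_EXPORTER=otlp
OTEL_RESOURCE_ATTRIBUTES=deployment.environment=production
```

Any SDK that implements the environment-variable specification picks these up regardless of language, which is what makes the approach language-agnostic.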

  • Would this component also be able to handle deploying the system-level profiler?

I don't know what that is, sorry. Any profiling offered by instrumentation SDKs is supported via the SDK injection.

  • What would the expected level of effort be to support the declarative configuration format?

I don't understand what declarative configuration format means here. We offer a simple .conf file for setting up a few values.
Do you want something else?

  • What is the intended installation story for this component? Is the goal here to provide this as an 'installer' for non-k8s deployments of the Collector?

An installer, and the whole package offered as a Linux package such as rpm/deb/tar.gz that can be installed on a host.

@jpkrohling
Member

We discussed this yesterday during the TC/GC call and I understand the motivations for having this outside of the Operator and Collector. However, after looking at the code, I'm not so sure this should really be a new SIG, and as such, it could be a PR against a specific repository. The Collector would be the natural instinct, but perhaps this could be under the Config SIG?

In any case, there's a similar proposal as part of the Operator (open-telemetry/opentelemetry-operator#2375, linked by @atoulme), and I'd love to have that sorted out before this is accepted, given the overlap between them.

@mmanciop

mmanciop commented Mar 13, 2025

An LD_PRELOAD injector written in a way that is self-contained (i.e., no dependency on any other shared library, and especially not libc) is going to work effectively everywhere: hosts, containers, even serverless, as long as the runtimes themselves are dynamically linked against something that uses a dynamically linked getenv function to look up the relevant env vars.

I could see this being part of the Operator (hence my original PR), because of the existing mechanism of delivering instrumentations via mutating webhooks, as well as part of Linux packages (deb, rpm) that include the instrumentations and provide a native way to instrument one or more languages out of the box (I did a PoC at Canonical a few years back involving the OTel Java agent and it was promising, but we did not go ahead with it; Splunk seems to be doing the same).

Incidentally, I am giving a talk at Operator Day about LD_PRELOAD and OTel, and that is also a good venue to discuss IRL.

@mmanciop

Also, I should mention I have a PoC going on slow burn for a Zig-based injector that provides a way to use a higher-level language than C, with access to a standard library that does not depend on libc (hence avoiding cross-libc issues, i.e., when your injector links GNU libc, the binary links musl, and it all explodes Michael Bay-style).

It is pretty promising: it can do I/O (debug logging, which is missing in the current C-based injector in my PR against the Operator, but also checking that instrumentation files exist), and by reading the ELF metadata of the binary being launched, I am almost at the point where the injector can pick the right build for .NET or Python, which need different OTel SDKs and configs depending on whether the injected process links GNU libc or musl. This is not necessary for OS-level packages (deb, rpm), because dynamically linked processes virtually always use the Linux distro's libc. In containers, however, it is entirely necessary: even in the same pod, you can have some containers based on GNU libc and others on Alpine.

@austinlparker
Member

We discussed this yesterday during the TC/GC call and I understand the motivations for having this outside of the Operator and Collector. However, after looking at the code, I'm not so sure this should really be a new SIG, and as such, it could be a PR against a specific repository. The Collector would be the natural instinct, but perhaps this could be under the Config SIG?

In any case, there's a similar proposal as part of the Operator (open-telemetry/opentelemetry-operator#2375, linked by @atoulme), and I'd love to have that sorted out before this is accepted, given the overlap between them.

A new SIG might not be appropriate, but I'm not sure if the operator is the right place? Isn't part of the point here to be able to deploy into non-k8s environments?

If there's appetite on the part of the operator maintainers to expand the scope of what they're doing into non-k8s deployment environments then that would be an option. cc @open-telemetry/operator-maintainers

@atoulme
Contributor Author

atoulme commented Mar 13, 2025

This is not Operator-specific, as it exists outside Kubernetes. It is also not specific to the Collector: you can instrument applications without the Collector and send to any OTLP endpoint.

Why are you reluctant to form a separate SIG for this? Looks like you have two maintainers already between me and @mmanciop :)

@austinlparker
Member

This is not Operator-specific, as it exists outside Kubernetes. It is also not specific to the Collector: you can instrument applications without the Collector and send to any OTLP endpoint.

Why are you reluctant to form a separate SIG for this? Looks like you have two maintainers already between me and @mmanciop :)

It's less that we don't expect there to be maintainers for a new SIG; it's that we're trying to balance the number of SIGs that the GC/TC are sponsoring.

@austinlparker
Member

To be a bit more explicit (and to answer some of @atoulme's questions above), here are some things I'm considering around this proposal.

First, for better or for worse, many users expect OpenTelemetry to be an effective product, independent of it being an effective framework. The existence of the Collector itself is proof alone of this, as are our instrumentation agents. However, productizing elements of the project is non-trivial, and can have less desirable outcomes long-term. For example, the better our 'agent'/zero-code instrumentation story is, the less pressure there may be on maintainers of libraries and frameworks to natively integrate OpenTelemetry. That isn't a huge concern here, as there are reasons to use SDK injection independent of instrumentation library injection, but it's something we have to keep in mind.

To that end, though, if users expect OpenTelemetry to provide a product-like experience, we must also consider how that product experience is built, maintained, and positioned. We need to evaluate how users will receive support for that experience. If we provide too many ways to do 'the same thing', it can lead to confusion from not only end-users, but also other maintainers. Thus, it makes sense to consider where semantic overlap exists in terms of goals today, and evaluate if this donation would make more sense as an independent component or as part of an existing SIG.

To your other questions...

  • Declarative config is a standard file-based configuration format for OpenTelemetry SDKs. I believe this tool would need not only to handle passing in configuration files, but could ideally be configured using this format itself.
  • The system profiler is an eBPF-based, system-level profiler that implements the OpenTelemetry Profiling signal. Ideally, this instrumentation configuration wrapper could be extended to offer installation support for the profiler as it matures.

With all that in mind, it feels like there's a slightly bigger story here than just 'part of the operator', but it also feels like there's interest in adapting some of the work here (or alternatives) to enhance the operator? It kinda feels like there's some sort of 'installer' that wraps up a lot of this stuff that we could build...

@atoulme
Contributor Author

atoulme commented Mar 14, 2025

Thank you for your detailed answer.

To be a bit more explicit (and to answer some of @atoulme's questions above), here are some things I'm considering around this proposal.

First, for better or for worse, many users expect OpenTelemetry to be an effective product, independent of it being an effective framework. The existence of the Collector itself is proof alone of this, as are our instrumentation agents. However, productizing elements of the project is non-trivial, and can have less desirable outcomes long-term. For example, the better our 'agent'/zero-code instrumentation story is, the less pressure there may be on maintainers of libraries and frameworks to natively integrate OpenTelemetry. That isn't a huge concern here, as there are reasons to use SDK injection independent of instrumentation library injection, but it's something we have to keep in mind.

I trust the TC and GC will steer us towards a bright future where OpenTelemetry is everywhere.

To that end, though, if users expect OpenTelemetry to provide a product-like experience, we must also consider how that product experience is built, maintained, and positioned. We need to evaluate how users will receive support for that experience. If we provide too many ways to do 'the same thing', it can lead to confusion from not only end-users, but also other maintainers. Thus, it makes sense to consider where semantic overlap exists in terms of goals today, and evaluate if this donation would make more sense as an independent component or as part of an existing SIG.

To your other questions...

  • Declarative config is a standard file-based configuration format for OpenTelemetry SDKs. I believe this tool would need not only to handle passing in configuration files, but could ideally be configured using this format itself.

Oh, OK, you're talking about https://opentelemetry.io/docs/specs/otel/configuration/sdk-environment-variables/#declarative-configuration. That can work for our use case, as we'd instruct the SDKs to read the file using the OTEL_EXPERIMENTAL_CONFIG_FILE environment variable.
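For reference, a minimal declarative configuration file that the injector could point the SDKs at might look like the following (the file_format version and all values are illustrative, and the schema is still experimental):

```yaml
# e.g. exported as OTEL_EXPERIMENTAL_CONFIG_FILE=/etc/otel/sdk-config.yaml
file_format: "0.3"
resource:
  attributes:
    - name: service.name
      value: checkout-service
tracer_provider:
  processors:
    - batch:
        exporter:
          otlp:
            endpoint: http://localhost:4317
```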

  • The system profiler is an eBPF-based, system-level profiler that implements the OpenTelemetry Profiling signal. Ideally, this instrumentation configuration wrapper could be extended to offer installation support for the profiler as it matures.

We could learn from the experience of delivering this package. It is likely you'd want it to install with its own deb/rpm package. We can make that happen.

With all that in mind, it feels like there's a slightly bigger story here than just 'part of the operator', but it also feels like there's interest in adapting some of the work here (or alternatives) to enhance the operator? It kinda feels like there's some sort of 'installer' that wraps up a lot of this stuff that we could build...

It feels like we're toying with the same idea that created https://github.com/open-telemetry/opentelemetry-collector-releases/, where a dedicated GitHub project exists to package the Collector. Do we have such a project for instrumentations? Have instrumentations figured out how to package as rpm/deb/Docker images?

@austinlparker
Member

It feels like we're toying with the same idea that created https://github.com/open-telemetry/opentelemetry-collector-releases/, where a dedicated GitHub project exists to package the Collector. Do we have such a project for instrumentations? Have instrumentations figured out how to package as rpm/deb/Docker images?

I'm not sure, but yes, I can see there being value in some sort of generic 'OpenTelemetry Installer' repository.

@atoulme
Contributor Author

atoulme commented Mar 16, 2025

If we create such a repository, we can grant approver and maintainer roles on it to the approvers and maintainers of all SIGs. This is similar to how we provisioned https://github.com/open-telemetry/opentelemetry-collector-releases/.

@svrnm
Member

svrnm commented Mar 17, 2025

Having it in a shared repo co-owned by multiple SIGs is probably a good starting point. We do this for other parts of our project as well (docs, weaver, ...); it works until it doesn't, and then we can grow a proper SIG around it.

I'd like to suggest a name for this project instead of "Instrumentation automatic configuration", which I personally find misleading given the ongoing declarative configuration project: OpenTelemetry Instrumentor, which pairs nicely with OpenTelemetry Collector. WDYT?

Would it be possible to add a "standalone mode", which could supersede a project like otelify.sh?

As a former sales engineer, I am excited about this, since unifying the packaging/installation/rollout process for non-code-based instrumentations/automatic instrumentations would be a huge win for our project: it would drastically accelerate the way people can get started with OTel, especially when they are in a non-k8s environment, or have reasons for not yet wanting/being able to go with the Operator.

@mmanciop

I'd like to suggest a name for this project instead of "Instrumentation automatic configuration", which I personally find misleading given the ongoing declarative configuration project: OpenTelemetry Instrumentor, which pairs nicely with OpenTelemetry Collector. WDYT?

I think of it more as "OpenTelemetry Injector"; "Instrumentor" feels more like the build-time approach.

Would it be possible to add a "standalone mode", which could supersede a project like otelify.sh?

Definitely possible; it's relatively simple packaging, although the automated wiring of the LD_PRELOAD env var on Linux hosts tends to be bespoke to individual distros, IIRC.

As a former sales engineer, I am excited about this, since unifying the packaging/installation/rollout process for non-code-based instrumentations/automatic instrumentations would be a huge win for our project: it would drastically accelerate the way people can get started with OTel, especially when they are in a non-k8s environment, or have reasons for not yet wanting/being able to go with the Operator.

The Dash0 operator espouses this philosophy, and it has seen eager adoption in our user base well beyond what we were aiming for: we often see very advanced engineering organizations going the fully automated route, although I had expected many of them to prefer a more controlled approach.

@danielgblanco
Contributor

danielgblanco commented Mar 18, 2025

I think it would be a good idea to live in a separate repo, co-owned by multiple SIGs as mentioned above. However, related to the following comment

If we provide too many ways to do 'the same thing', it can lead to confusion from not only end-users, but also other maintainers.

I very much agree with this. One of the bits of feedback we keep getting is that there are too many ways of "configuring OTel" (which in itself can be interpreted in many ways depending on who you speak to, i.e. is it the Collector? The SDK? The instrumentations?). Adding a new way, like this one, can certainly make things easier for a particular group of users, but it can also add more confusion into the mix for others.

My hope is that the work of the Configuration SIG, after stabilising declarative config, could extend into providing a cohesive approach from an end-user perspective. This does not necessarily mean owning all components that implement it, but rather defining the boundaries between them, so that we can say, as an example: "OTel recommends declarative config. If you deploy in K8s, we recommend using the OTel Operator with the injected config living in a defined ConfigMap; if you run on bare metal/VMs, we recommend the OTel Instrumentor with config in /etc/otel; for other use cases..."
