Additional Thread Metrics #13483
Could you clarify whether the metrics
Yeah, that is what I meant, but I now see that there's a count metric emitted per state. In that case, just CPU and user time seem to be a gap.
Hi @akats7! What attributes would you propose on
Hey @trask, I'd have to dig a bit into the internals of the runtime metric modules, but one approach could be to just support this for JMX
@trask, could we add CPU time to the runtime metrics through the thread MBean obtained via ManagementFactory, relying on MBean operations to get the time values? I understand there may be cardinality concerns, since the thread name/pool name would have to be an attribute, so it could be disabled by default.
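A minimal sketch of the "disabled by default" idea, assuming a hypothetical boolean opt-in flag (not an existing OpenTelemetry setting). It only uses the standard java.lang.management.ThreadMXBean: per-thread CPU time is reported only when the user opted in, the JVM supports the measurement, and measurement is enabled.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Sketch: decide whether per-thread CPU time can be collected at all.
// "optInEnabled" stands in for a hypothetical configuration flag; it is not an
// existing OpenTelemetry option.
class ThreadCpuTimeGate {

  static boolean canCollect(boolean optInEnabled) {
    if (!optInEnabled) {
      // High-cardinality metric: stay off unless explicitly enabled.
      return false;
    }
    ThreadMXBean threads = ManagementFactory.getThreadMXBean();
    if (!threads.isThreadCpuTimeSupported()) {
      // This JVM cannot measure CPU time for arbitrary threads.
      return false;
    }
    if (!threads.isThreadCpuTimeEnabled()) {
      // The default is implementation dependent; turn measurement on if it is off.
      threads.setThreadCpuTimeEnabled(true);
    }
    return threads.isThreadCpuTimeEnabled();
  }

  public static void main(String[] args) {
    System.out.println("opt-in off: " + canCollect(false));
    System.out.println("opt-in on:  " + canCollect(true));
  }
}
```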
@SylvainJuge @robsunday I'm hesitant to have people add new JMX metrics in the middle of your convergence effort, so I'd like to defer to you here.
Thanks @trask! I do want to point out that these are rather important metrics. We've had a lot of internal requests for this from users who are migrating from vendor products that supported this out of the box.
To expand a bit on the "convergence effort" context: we are currently trying to add JVM metrics in a YAML descriptor with #13392. This YAML will NOT be used directly by instrumentation, but it will in the future be used by jmx-scraper, a CLI program that replaces JMX Gatherer while using the same JMX implementation as instrumentation (and thus inheriting its YAML support). What we are currently focusing on for JVM metrics in YAML is the ability to capture them in a way that is compliant with semantic conventions, which is already done by the …
I think we can add new metrics even if the current work is still in progress. I would suggest doing that in a few steps:
As a temporary workaround, if you are able to capture those with YAML configuration, you should be able to provide a YAML file for them. However, this is not a great out-of-the-box experience, and it could easily break if the metric definition changes when it is added to semconv.
Hey @SylvainJuge, thanks for the context. Part of the issue is that I believe jmx-scraper is only able to read attributes, not execute operations, which would be required for these metrics. Regarding experimentation, I've already done this with the JMX Gatherer, since it lets you interact directly with the MBeans when using a custom script. However, since the gatherer instruments also only allow the use of attributes, I had to rely on transformation closures to overwrite other MBeans, which is not ideal. Also, part of this ask is, if possible, to move away from the remote approach. I might be missing something, but is there a reason it's preferable to go through a JMX server rather than reading the MBeans directly, given that the javaagent runs in the same JVM?
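To illustrate the attribute-versus-operation distinction, here is a sketch that reads one thread's CPU time through an MBeanServerConnection. getThreadCpuTime is an operation on the "java.lang:type=Threading" MBean, so a JMX client has to invoke it; ManagementFactory.newPlatformMXBeanProxy generates those invoke calls. The local platform MBeanServer stands in for a remote connection here, and the class name is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import javax.management.MBeanServerConnection;

// Sketch: per-thread CPU time through a JMX connection. getThreadCpuTime is an
// MBean *operation*, not an attribute, so a client has to invoke it; the MXBean
// proxy below performs that invoke(...) call.
class JmxThreadCpuReader {

  static long cpuTimeNanos(MBeanServerConnection connection, long threadId) {
    try {
      ThreadMXBean threads =
          ManagementFactory.newPlatformMXBeanProxy(
              connection, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class);
      // -1 if the thread is no longer alive or CPU time measurement is disabled.
      return threads.getThreadCpuTime(threadId);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // In-process, the platform MBeanServer is itself an MBeanServerConnection. A remote
    // client would get one from JMXConnectorFactory.connect(url).getMBeanServerConnection().
    MBeanServerConnection local = ManagementFactory.getPlatformMBeanServer();
    long id = Thread.currentThread().getId();
    System.out.println("via JMX proxy: " + cpuTimeNanos(local, id) + " ns");
    // Inside the same JVM, an agent could skip JMX entirely:
    System.out.println("direct call:   " + ManagementFactory.getThreadMXBean().getThreadCpuTime(id) + " ns");
  }
}
```

Inside the same JVM, the direct ManagementFactory.getThreadMXBean() call avoids both the JMX server and the proxy, which is the in-process path asked about above.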
Ideally, we should not force users to deploy an instrumentation agent to capture runtime metrics if those could be obtained externally with JMX scraper or gatherer. However, we already have some metrics that can't be captured without instrumentation and explicit code because they can only be captured from within the JVM, either because they require advanced JMX features or because they rely on JFR events. So this is something we can already do, but it puts more constraints on users; for example, the JVM metrics are not exactly the same on Java 17 and Java 8, which could lead to user confusion or unmet expectations. If I understand correctly, those metrics would fall more into the "runtime-telemetry only" category and would be very unlikely to be supported through YAML because they need some post-processing. Is that correct? Also, could you elaborate a bit on their definitions/attributes and on which MBean attributes they would be captured from?
Yep, that's exactly right. For example, to get cpu_time we'd likely need to read the AllThreadIds attribute and then call getThreadCpuTime, plus getThreadInfo for attributes such as the thread name. I understand that this utility should still exist for users who want these metrics but don't need the other functionality of the agent. But the situation we find ourselves in is that the majority of our teams that already leverage the agent for instrumentation also need these metrics, so it would be ideal not to have to configure a JMX server and an additional scraping process when the agent is already in place.
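The steps just described (AllThreadIds, then getThreadCpuTime and getThreadInfo per id) can be sketched in-process with the standard ThreadMXBean. ThreadCpuSnapshot and ThreadSample are hypothetical names; the sketch also reads getThreadUserTime, since user time is the other metric requested.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Sketch: read all thread ids, then for each id call getThreadCpuTime,
// getThreadUserTime and getThreadInfo (for the name).
// ThreadSample is a hypothetical holder, not an existing type.
class ThreadCpuSnapshot {

  static final class ThreadSample {
    final String name;
    final long cpuNanos;
    final long userNanos;

    ThreadSample(String name, long cpuNanos, long userNanos) {
      this.name = name;
      this.cpuNanos = cpuNanos;
      this.userNanos = userNanos;
    }

    @Override
    public String toString() {
      return name + " cpu=" + cpuNanos + "ns user=" + userNanos + "ns";
    }
  }

  static List<ThreadSample> collect() {
    ThreadMXBean threads = ManagementFactory.getThreadMXBean();
    List<ThreadSample> samples = new ArrayList<>();
    if (!threads.isThreadCpuTimeSupported() || !threads.isThreadCpuTimeEnabled()) {
      return samples; // nothing to report on this JVM
    }
    for (long id : threads.getAllThreadIds()) {
      // getThreadInfo(long) skips the stack trace; it returns null for a dead thread.
      ThreadInfo info = threads.getThreadInfo(id);
      long cpu = threads.getThreadCpuTime(id);   // -1 if the thread has terminated
      long user = threads.getThreadUserTime(id); // -1 if the thread has terminated
      if (info == null || cpu < 0 || user < 0) {
        continue; // thread ended while we were iterating
      }
      samples.add(new ThreadSample(info.getThreadName(), cpu, user));
    }
    return samples;
  }

  public static void main(String[] args) {
    collect().forEach(System.out::println);
  }
}
```

On HotSpot, the com.sun.management.ThreadMXBean extension also has bulk getThreadCpuTime(long[]) and getThreadUserTime(long[]) methods, which could replace the per-id calls; that extension is JVM-specific.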
I agree with you, @akats7. This is probably a use case that we could either document (or provide a dedicated config option for) where only runtime metrics (or JMX metrics) need to be captured and sent over OTLP, without any instrumentation or tracing involved. For JMX metrics that are defined in YAML, this could help provide details on JVM runtime metrics while still allowing metrics defined in YAML to be captured; for example, if you run a Kafka broker or cluster, it would be relevant to capture both by adding the agent to the JVM.
So just to clarify: is there a path forward to add these as experimental out-of-the-box JVM metrics? I'd be happy to contribute this.
If those new metrics are only captured through code, their implementation is part of …
In order to add or change things in semconv, we need at least an experimental implementation to validate that what is being added to semconv is correct and technically achievable. That creates a kind of chicken-and-egg problem, and you have to work on both sides at the same time. I would suggest the following:
@SylvainJuge, that sounds like a plan to me, thanks!
Here's a little prototype of how this might end up looking in the …
Some things to note:
This looks nice. Should we make the attributes optional and only recommended?
They could be recommended, or even opt-in, which would allow capture from JMX. But for what it's worth, this data is probably most useful with both of these attributes. Without the attributes, I would imagine it's not terribly far off from …
From a configuration standpoint, maybe the registration accepts an optional …
Good configuration options seem like the answer to the controversy over how we group thread names into templates.
I haven't found a way to access a thread group name for a given thread id, but if it's available, this would be a good thing to look into.
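ThreadMXBean indeed has no thread-group lookup. One possible in-process workaround, sketched below under the assumption that walking live Thread objects is acceptable, uses Thread.getAllStackTraces() and Thread.getThreadGroup(). That call captures every stack and (in recent JDKs) only covers platform threads, so its cost would need measuring. ThreadGroupLookup is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: map thread id -> thread group name from inside the JVM. ThreadMXBean has
// no such lookup, so this walks the live Thread objects instead.
class ThreadGroupLookup {

  static Map<Long, String> groupNamesById() {
    Map<Long, String> result = new HashMap<>();
    // getAllStackTraces() also captures every thread's stack, which is costly,
    // and in recent JDKs it only covers platform threads.
    for (Thread thread : Thread.getAllStackTraces().keySet()) {
      ThreadGroup group = thread.getThreadGroup(); // null once a thread has terminated
      if (group != null) {
        result.put(thread.getId(), group.getName());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    long id = Thread.currentThread().getId();
    System.out.println("thread " + id + " -> group " + groupNamesById().get(id));
  }
}
```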
I think any out-of-the-box naming options would need to be pretty extensive to be valuable. I'm not exactly sure what is deemed acceptable cardinality, but for threads tied to common frameworks/tools, maybe we can get away with just using thread names with the numerical values escaped, and setting anything that does not match to custom.
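A minimal sketch of "escape the numbers, otherwise custom": digit runs are replaced with "#", and the result is kept only if it appears on an allow-list. The allow-list here is hypothetical and only uses thread names produced by JDK defaults (Executors.defaultThreadFactory and the common ForkJoinPool); a real list would cover common frameworks.

```java
import java.util.List;
import java.util.regex.Pattern;

// Sketch: "escape" the numeric parts of a thread name so that, e.g.,
// "pool-3-thread-17" and "pool-1-thread-2" share one attribute value, and map any
// name that is not a known template to "custom".
class ThreadNameTemplates {

  private static final Pattern DIGITS = Pattern.compile("\\d+");

  // Hypothetical allow-list; JDK default thread names are used as examples.
  private static final List<String> KNOWN_TEMPLATES =
      List.of("main", "pool-#-thread-#", "ForkJoinPool.commonPool-worker-#");

  static String template(String threadName) {
    String escaped = DIGITS.matcher(threadName).replaceAll("#");
    return KNOWN_TEMPLATES.contains(escaped) ? escaped : "custom";
  }

  public static void main(String[] args) {
    for (String name : List.of("pool-3-thread-17", "ForkJoinPool.commonPool-worker-5", "my-worker-9")) {
      System.out.println(name + " -> " + template(name));
    }
  }
}
```

This prints pool-#-thread-#, ForkJoinPool.commonPool-worker-#, and custom, so the attribute's cardinality is bounded by the allow-list size plus one.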
I am not sure if it's a good idea, but for grouping threads, I wonder if we could instrument all …
By default, we could provide the configuration through a Map<String,String> where the key is the ThreadFactory FQN and the value is the OTel attribute value to use. This would allow adding a dedicated value for each thread factory FQN. Also, finding implementations of …
In addition to that, for some known set of FQNs of …
Ideally, this would be something extensible where each instrumentation module could register its own known FQNs or naming strategies, to avoid having to maintain one potentially humongous list of known patterns that no one will ever attempt to remove things from. Also, this would be more "labelling threads" than grouping them by pattern, which would then require some specification and description of the individual values. One advantage of the patterns is that it's quite easy to know which threads they refer to.
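The Map<String,String> idea (ThreadFactory FQN to attribute value) can be sketched without bytecode instrumentation by wrapping a factory explicitly. The agent would presumably hook factories itself, but this wrapper is only an assumption used to show the lookup. FactoryThreadLabels and the "jdk-default-pool" label are hypothetical; java.util.concurrent.Executors$DefaultThreadFactory is the JDK class behind Executors.defaultThreadFactory().

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

// Sketch of the Map<String,String> idea: key = ThreadFactory class name (FQN),
// value = attribute value for threads it creates. A wrapping factory stands in for
// bytecode instrumentation here; all names are hypothetical.
class FactoryThreadLabels {

  // A real version would also need to drop entries for terminated threads,
  // since thread ids may be reused after a thread ends.
  private static final Map<Long, String> LABEL_BY_THREAD_ID = new ConcurrentHashMap<>();

  static ThreadFactory labeling(ThreadFactory delegate, Map<String, String> labelByFactoryClass) {
    String label = labelByFactoryClass.getOrDefault(delegate.getClass().getName(), "custom");
    return runnable -> {
      Thread thread = delegate.newThread(runnable);
      if (thread != null) { // a ThreadFactory may refuse to create a thread
        LABEL_BY_THREAD_ID.put(thread.getId(), label);
      }
      return thread;
    };
  }

  static String labelOf(long threadId) {
    return LABEL_BY_THREAD_ID.getOrDefault(threadId, "custom");
  }

  public static void main(String[] args) {
    ThreadFactory factory =
        labeling(
            Executors.defaultThreadFactory(),
            Map.of("java.util.concurrent.Executors$DefaultThreadFactory", "jdk-default-pool"));
    Thread thread = factory.newThread(() -> {});
    System.out.println(thread.getName() + " -> " + labelOf(thread.getId()));
  }
}
```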
I bet that would be a pretty good heuristic.
I think that could work, but the downside is that it's only available in the context of the Java agent. Other JVM runtime metrics are available as standalone library instrumentation. This could be something where the standalone library instrumentation does something more simplistic, and the Java agent improves upon the strategy using techniques only available with bytecode instrumentation.
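One way to structure that split, sketched with hypothetical names: the library ships a simple name-based strategy, and the agent puts a richer strategy (for example, labels gathered by instrumenting thread factories) in front of it. The first strategy that answers wins, and "custom" is the fallback.

```java
import java.util.List;
import java.util.Optional;

// Sketch: the standalone library ships a simple name-based strategy, and the agent
// could place a richer strategy in front of it. All type names here are hypothetical.
class LayeredThreadGrouping {

  interface Strategy {
    Optional<String> group(long threadId, String threadName);
  }

  private final List<Strategy> strategies; // tried in order; first answer wins

  LayeredThreadGrouping(List<Strategy> strategies) {
    this.strategies = List.copyOf(strategies);
  }

  String group(long threadId, String threadName) {
    for (Strategy strategy : strategies) {
      Optional<String> result = strategy.group(threadId, threadName);
      if (result.isPresent()) {
        return result.get();
      }
    }
    return "custom";
  }

  public static void main(String[] args) {
    // Library-only: a trivial name-based rule.
    Strategy byName =
        (id, name) -> name.startsWith("pool-") ? Optional.of("jdk-pool") : Optional.empty();
    // Agent: a hypothetical lookup that knows labels for some thread ids.
    Strategy agentLabels =
        (id, name) -> id == 42L ? Optional.of("kafka-consumer") : Optional.empty();

    LayeredThreadGrouping library = new LayeredThreadGrouping(List.of(byName));
    LayeredThreadGrouping agent = new LayeredThreadGrouping(List.of(agentLabels, byName));
    System.out.println(library.group(42L, "pool-1-thread-1")); // jdk-pool
    System.out.println(agent.group(42L, "pool-1-thread-1"));   // kafka-consumer
  }
}
```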
Is your feature request related to a problem? Please describe.
The current scope of thread metrics appears to be limited to thread count. There are other thread-based metrics that are rather critical, such as thread CPU time and metrics based on thread state.
Describe the solution you'd like
Add additional thread metrics for:
jvm.thread.cpu_time
jvm.thread.user_time
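For reference, a sketch of the values behind the two requested metrics for a single thread, using the standard ThreadMXBean: getThreadCpuTime is total CPU time (user plus system mode, where the JVM distinguishes them) and getThreadUserTime is the user-mode part, so their difference approximates system time. CpuVsUserTime is a hypothetical name, and the clamp to zero is a defensive assumption because the two readings are taken at slightly different moments.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Sketch: for one thread, the values behind jvm.thread.cpu_time and
// jvm.thread.user_time. getThreadCpuTime = user + system mode;
// getThreadUserTime = user mode only.
class CpuVsUserTime {

  // Returns {cpuNanos, userNanos, systemNanos}, or null if unavailable.
  static long[] breakdown(long threadId) {
    ThreadMXBean threads = ManagementFactory.getThreadMXBean();
    if (!threads.isThreadCpuTimeSupported() || !threads.isThreadCpuTimeEnabled()) {
      return null;
    }
    long user = threads.getThreadUserTime(threadId); // read first...
    long cpu = threads.getThreadCpuTime(threadId);   // ...so cpu is the later reading
    if (cpu < 0 || user < 0) {
      return null; // thread is not alive
    }
    // Clamp: the two values are separate readings and may use clocks of different granularity.
    long system = Math.max(0, cpu - user);
    return new long[] {cpu, user, system};
  }

  public static void main(String[] args) {
    long[] b = breakdown(Thread.currentThread().getId());
    if (b == null) {
      System.out.println("per-thread CPU time not available");
    } else {
      System.out.println("cpu=" + b[0] + "ns user=" + b[1] + "ns system~" + b[2] + "ns");
    }
  }
}
```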
Describe alternatives you've considered
Using the JMX Gatherer
Additional context
No response