feat(otlp): Infer span description for db spans #4541

Open
wants to merge 9 commits into
base: master
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -21,6 +21,7 @@
- Adopt new `AsyncPool` for the `EnvelopeProcessorService` and `StoreService`. ([#4520](https://github.com/getsentry/relay/pull/4520))
- Update mapping of OTLP spans to Sentry spans in the experimental OTL traces endpoint. ([#4505](https://github.com/getsentry/relay/pull/4505))
- Expose metrics for the `AsyncPool`. ([#4538](https://github.com/getsentry/relay/pull/4538))
- Infer span `description` for spans with `category` set to `db`. ([#4541](https://github.com/getsentry/relay/pull/4541))

## 25.2.0

9 changes: 9 additions & 0 deletions relay-event-schema/src/protocol/span.rs
@@ -400,6 +400,13 @@ pub struct SpanData {
#[metastructure(field = "db.operation")]
pub db_operation: Annotated<Value>,

/// The database query being executed.
///
/// E.g. SELECT * FROM wuser_table where username = ?; SET mykey ?
/// See [OpenTelemetry docs for a list of well-known identifiers](https://opentelemetry.io/docs/specs/semconv/database/database-spans/#common-attributes).
#[metastructure(field = "db.query.text", legacy_alias = "db.statement")]
pub db_query_text: Annotated<String>,
Member

Does the SQL scrubber need to run over this field? Also, if the field gets copied into the description, is it being scrubbed?

We should have an integration test for this.

Member

@mjq mjq Mar 6, 2025

The span description that we receive isn't scrubbed or otherwise modified. So writing the unmodified db.query.text value into description is expected.

(The exception is PII scrubbing, although description isn't a default field so that relies on user configuration. That happens later in span processing so we should be covered there.)

A scrubbed copy of description is made in extract_tags and stored in sentry_tags' description field. This scrubbed version is used by Sentry's Insights modules. That scrubbed description extraction does not run for the inferred descriptions in this PR, because the scrubbing happens in extract_tags, which runs before the description is inferred.

That is fine, though: supporting Insights is a later milestone of the span first/OTLP project, and Insights isn't expected to work with OTLP yet. If it were simple enough to generate the scrubbed descriptions I'd consider it, but the scrubbing also currently depends on op, which has to be replaced too before it's usable with OTLP. We'll be changing how that works when we get to that milestone.

I agree about an integration test though, we'll add one.
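
To illustrate the split described above (the raw description stays untouched while a scrubbed copy is stored in the tags), here is a minimal sketch. Everything in it is an assumption for illustration: the `Span` and `SentryTags` structs are pared-down stand-ins for Relay's types, and `toy_scrub` is a toy replacement for the real SQL scrubber, not Relay's implementation.

```rust
/// Hypothetical, simplified stand-in for Relay's `sentry_tags`.
#[derive(Debug, Default)]
struct SentryTags {
    /// Scrubbed copy of the description (what Insights would read).
    description: Option<String>,
}

/// Hypothetical, simplified stand-in for Relay's `Span`.
#[derive(Debug, Default)]
struct Span {
    /// Raw description, as received or inferred.
    description: Option<String>,
    sentry_tags: SentryTags,
}

/// Toy scrubber (assumption): replaces whitespace-separated integer
/// literals with `?`. The real scrubber is far more involved.
fn toy_scrub(query: &str) -> String {
    query
        .split_whitespace()
        .map(|tok| if tok.parse::<i64>().is_ok() { "?" } else { tok })
        .collect::<Vec<_>>()
        .join(" ")
}

/// Writes a scrubbed copy into the tags and leaves the raw
/// description unmodified.
fn extract_scrubbed_description(span: &mut Span) {
    span.sentry_tags.description = span.description.as_deref().map(toy_scrub);
}
```

If the description is only filled in after this step has run, the scrubbed copy stays empty, which matches the behaviour described for inferred descriptions.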


/// An identifier for the database management system (DBMS) product being used.
///
/// See [OpenTelemetry docs for a list of well-known identifiers](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/database.md#notes-and-well-known-identifiers-for-dbsystem).
@@ -1034,6 +1041,7 @@ mod tests {
let data = r#"{
"foo": 2,
"bar": "3",
"db.query.text": "SELECT * FROM table",
"db.system": "mysql",
"code.filepath": "task.py",
"code.lineno": 123,
@@ -1078,6 +1086,7 @@ mod tests {
"ns",
),
db_operation: ~,
db_query_text: "SELECT * FROM table",
db_system: String(
"mysql",
),
1 change: 1 addition & 0 deletions relay-event-schema/src/protocol/span/convert.rs
@@ -170,6 +170,7 @@ mod tests {
code_function: ~,
code_namespace: ~,
db_operation: ~,
db_query_text: ~,
db_system: ~,
db_collection_name: ~,
environment: "prod",
@@ -123,6 +123,7 @@ expression: "(&event.value().unwrap().spans, metrics.project_metrics)"
code_function: ~,
code_namespace: ~,
db_operation: ~,
db_query_text: ~,
db_system: ~,
db_collection_name: ~,
environment: ~,
@@ -645,6 +646,7 @@ expression: "(&event.value().unwrap().spans, metrics.project_metrics)"
code_function: ~,
code_namespace: ~,
db_operation: ~,
db_query_text: ~,
db_system: ~,
db_collection_name: ~,
environment: ~,
@@ -802,6 +804,7 @@ expression: "(&event.value().unwrap().spans, metrics.project_metrics)"
code_function: ~,
code_namespace: ~,
db_operation: ~,
db_query_text: ~,
db_system: ~,
db_collection_name: ~,
environment: ~,
@@ -1041,6 +1044,7 @@ expression: "(&event.value().unwrap().spans, metrics.project_metrics)"
code_function: ~,
code_namespace: ~,
db_operation: ~,
db_query_text: ~,
db_system: ~,
db_collection_name: ~,
environment: ~,
@@ -1198,6 +1202,7 @@ expression: "(&event.value().unwrap().spans, metrics.project_metrics)"
code_function: ~,
code_namespace: ~,
db_operation: ~,
db_query_text: ~,
db_system: ~,
db_collection_name: ~,
environment: ~,
36 changes: 36 additions & 0 deletions relay-server/src/services/processor/span/processing.rs
@@ -644,6 +644,10 @@ fn normalize(
);
span.sentry_tags = Annotated::new(tags);

if span.description.value().is_empty() {
span.description = infer_span_description(span).into();
}

normalize_performance_score(span, performance_score);
if let Some(model_costs_config) = ai_model_costs {
extract_ai_measurements(span, model_costs_config);
@@ -803,6 +807,14 @@ fn validate(span: &mut Annotated<Span>) -> Result<(), ValidationError> {
Ok(())
}

fn infer_span_description(span: &Span) -> Option<String> {
let category = span.sentry_tags.value()?.category.value()?;
Member

Should this not use the original place where this tag is extracted from?

Member

This tag is typically inferred from either the span op or a combination of its attributes, rather than being passed in directly. This inference happens in tag_extraction::extract_tags, with the result being written into sentry_tags and retrieved here.

This code to set the span description could be moved into extract_tags, but it would mean making its span parameter a mutable reference and mutating span in there (setting its description), which feels a bit out of scope of what I'd expect extract_tags to do.

An alternative might be to make category a field on Span, set it earlier in the process, and use that everywhere we want to query category instead of digging into sentry_tags. (It would then be the responsibility of Snuba to write it into the attributes dictionary I suppose).

Member

It seems weird to refer back to a tag to make the decision, but if that is currently the best way to do it, I am fine with it, especially since we have strongly typed tags now.

Jan and I discussed some changes to improve the tag extraction and span enrichment we do, given that we will need something similar/mirrored in Sentry for a while and are starting to align for span streaming. Hopefully we can improve this case with that change as well.
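
The ordering dependency discussed in this thread can be sketched as follows. This is a simplified model under stated assumptions, not Relay's code: the structs use plain `Option` instead of `Annotated`, `category_hint` is a hypothetical field standing in for whatever op or attributes the real tag extraction inspects, and `extract_tags` and `normalize` here are toy versions that only share names with the real functions.

```rust
/// Hypothetical, pared-down span data. `category_hint` is an assumption
/// standing in for the inputs real tag extraction uses.
#[derive(Default)]
struct SpanData {
    db_query_text: Option<String>,
    category_hint: Option<String>,
}

#[derive(Default)]
struct SentryTags {
    category: Option<String>,
}

#[derive(Default)]
struct Span {
    description: Option<String>,
    data: Option<SpanData>,
    sentry_tags: Option<SentryTags>,
}

/// Toy tag extraction: derives the category without mutating the span.
fn extract_tags(span: &Span) -> SentryTags {
    let category = span.data.as_ref().and_then(|d| d.category_hint.clone());
    SentryTags { category }
}

/// Reads the category back out of the tags, so it only yields a value
/// once the tags have been written.
fn infer_span_description(span: &Span) -> Option<String> {
    let category = span.sentry_tags.as_ref()?.category.as_ref()?;
    match category.as_str() {
        "db" => span.data.as_ref()?.db_query_text.clone(),
        _ => None,
    }
}

/// Toy normalization: tags first, then description inference.
fn normalize(span: &mut Span) {
    span.sentry_tags = Some(extract_tags(span));
    if span.description.as_deref().map_or(true, str::is_empty) {
        span.description = infer_span_description(span);
    }
}
```

Calling `infer_span_description` before the tags are set returns `None` in this model, which is the implicit ordering the thread is concerned about.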

match category.as_str() {
"db" => span.data.value()?.db_query_text.value()?.to_owned().into(),
Member

We already have a promote_span_data_fields function which seems to be a perfect fit for this logic.

Member

promote_span_data_fields seems to be moving properties out of data and into top level span attributes:

https://github.com/getsentry/relay/blob/master/relay-server/src/services/processor/span/processing.rs#L689-L712

Does it make sense to move this kind of logic, which is conditional on other span fields and also doesn't move data, into the same spot?

The other challenge is that we need to know the category, which isn't calculated yet at the time promote_span_data_fields is run. (See other comments).

Member

> Does it make sense to move this kind of logic, which is conditional on other span fields and also doesn't move data, into the same spot?

I think so.


Would be great if you can add this as a comment (if you end up staying with what we currently have):

> The other challenge is that we need to know the category, which isn't calculated yet at the time

It should be obvious (because it references the tags), but a lot of this extraction logic depends on implicit ordering, so it doesn't hurt to call it out and explain why it runs late.

Member

I commented the block where this happens in normalization, as well as in the function doc. I couldn't find a way to make the dependency more explicit in code under the current structure (passing category into infer_span_description, extracting the category inference, etc).

You mentioned discussing updates to tag extraction + enrichment with Jan - my understanding is that sentry_tags as a concept is likely to disappear as well, as EAP already collapses them into standard attributes. Once Relay does the same, I think this whole method should be able to be ordered more naturally so the dependencies are clear. 🤞
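
For reference, one of the options mentioned above (passing category into infer_span_description) could look roughly like this. It is only a sketch with hypothetical, simplified types (plain `Option` instead of `Annotated`, a one-field `SpanData`), and it says nothing about whether it fits the current structure of normalization.

```rust
/// Hypothetical, simplified span data with only the field used here.
struct SpanData {
    db_query_text: Option<String>,
}

/// Variant that takes the category as an explicit argument, so the
/// caller has to compute it before calling.
fn infer_span_description(category: Option<&str>, data: Option<&SpanData>) -> Option<String> {
    match category? {
        "db" => data?.db_query_text.clone(),
        _ => None,
    }
}
```

The trade-off is that the caller still has to obtain the category from somewhere, which in the current code means reading it back from sentry_tags anyway.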

_ => None,
}
}

#[cfg(test)]
mod tests {
use std::collections::BTreeMap;
@@ -1431,4 +1443,28 @@ mod tests {
&EventId("480ffcc911174ade9106b40ffbd822f5".parse().unwrap())
);
}

    #[test]
    fn infers_db_span_description() {
        let mut span = Annotated::from_json(
            r#"{
                "start_timestamp": 0,
                "timestamp": 1,
                "trace_id": "922dda2462ea4ac2b6a4b339bee90863",
                "span_id": "922dda2462ea4ac2",
                "data": {
                    "db.query.text": "SELECT * FROM users WHERE id = 1",
                    "sentry.category": "db"
                }
            }"#,
        )
        .unwrap();

        normalize(&mut span, normalize_config()).unwrap();

        assert_eq!(
            get_value!(span.description!).as_str(),
            "SELECT * FROM users WHERE id = 1"
        );
    }
}
4 changes: 1 addition & 3 deletions relay-spans/src/span.rs
@@ -123,9 +123,6 @@ pub fn otel_to_sentry_span(otel_span: OtelSpan) -> EventSpan {
}
key if key.starts_with("db") => {
op = op.or(Some("db".to_string()));
if key == "db.statement" {
description = description.or_else(|| otel_value_to_string(value));
}
}
"http.method" | "http.request.method" => {
let http_op = match kind {
@@ -642,6 +639,7 @@ mod tests {
code_function: ~,
code_namespace: ~,
db_operation: ~,
db_query_text: ~,
db_system: ~,
db_collection_name: ~,
environment: "prod",