GH-45788: [C++][Acero] Fix data race in aggregate node #45789
```diff
@@ -240,8 +240,8 @@ Status ScalarAggregateNode::InputReceived(ExecNode* input, ExecBatch batch) {
       // We add segment to the current segment group aggregation
       auto exec_batch = full_batch.Slice(segment.offset, segment.length);
       RETURN_NOT_OK(DoConsume(ExecSpan(exec_batch), thread_index));
-      RETURN_NOT_OK(
-          ExtractSegmenterValues(&segmenter_values_, exec_batch, segment_field_ids_));
+      RETURN_NOT_OK(ExtractSegmenterValues(&GetLocalState()->segmenter_values, exec_batch,
+                                           segment_field_ids_));

       // If the segment closes the current segment group, we can output segment group
       // aggregation.
```

Review discussion on this change:

Comment: True.

[1] arrow/cpp/src/arrow/acero/scalar_aggregate_node.cc, lines 178 to 180 at fc0862a

Comment: Ah, so the data race occurred in the non-segmented case? It's weird that we have to change the segmenting state to thread-local to fix that :)

Reply: Yes. The race happens for the trivial segmenter, which essentially does nothing but concurrently clears the `segmenter_values_`. It is weird and I may even withdraw this fix. Please wait for my answer to the other comment (it's long and I'm still writing). Thank you.
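To make the race concrete, here is a minimal standalone sketch of the before/after shapes described above. All names (`NodeBefore`, `NodeAfter`, `InputReceived`, `local_states_`) are invented for illustration; this is not Arrow's actual code.

```cpp
// Sketch of the race and the thread-local fix (hypothetical names, not Arrow code).
#include <cstddef>
#include <thread>
#include <vector>

struct NodeBefore {
  std::vector<int> segmenter_values_;  // shared across worker threads

  void InputReceived() {
    // Trivial segmenter: nothing to extract, but the member is still cleared.
    // Two threads calling this concurrently is an unsynchronized write to a
    // shared std::vector, i.e. a data race (ThreadSanitizer would flag it).
    segmenter_values_.clear();
  }
};

struct NodeAfter {
  struct ThreadLocalState {
    std::vector<int> segmenter_values;  // one copy per worker thread
  };

  explicit NodeAfter(std::size_t num_threads) : local_states_(num_threads) {}

  void InputReceived(std::size_t thread_index) {
    // Each thread touches only its own slot, so there is no shared write.
    local_states_[thread_index].segmenter_values.clear();
  }

  std::vector<ThreadLocalState> local_states_;
};

int main() {
  // Only the fixed shape is exercised here; running NodeBefore::InputReceived
  // from two threads would be the undefined behavior discussed above.
  NodeAfter node(/*num_threads=*/2);
  std::thread t0([&] { node.InputReceived(0); });
  std::thread t1([&] { node.InputReceived(1); });
  t0.join();
  t1.join();
  return 0;
}
```

The diff follows the `NodeAfter` shape: `GetLocalState()` resolves to the calling thread's state, so the unsynchronized `clear()` on a shared vector disappears even in the trivial-segmenter case.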
```diff
@@ -292,7 +292,7 @@ Status ScalarAggregateNode::OutputResult(bool is_last) {
   batch.values.resize(kernels_.size() + segment_field_ids_.size());

   // First, insert segment keys
-  PlaceFields(batch, /*base=*/0, segmenter_values_);
+  PlaceFields(batch, /*base=*/0, GetLocalState()->segmenter_values);

   // Followed by aggregate values
   std::size_t base = segment_field_ids_.size();
```
Review discussion on this change:
Comment: So, the `Finalize` step only considers the segmenter values for `state[0]`? I'm not sure I understand why.
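For readers following this question, the sketch below illustrates the general per-thread-state pattern the comment is probing, with hypothetical names (`AggNode`, `partial_sum`, `states_`); Arrow's real finalize logic may differ. Kernel states from all threads get merged, while the segment keys for the output batch are taken from a single state, on the assumption that every thread observed the same current segment.

```cpp
// Hedged sketch of the merge-at-finalize pattern (not Arrow's implementation).
#include <cstddef>
#include <vector>

struct ThreadLocalState {
  long partial_sum = 0;               // stand-in for per-thread kernel state
  std::vector<int> segmenter_values;  // per-thread copy after the fix
};

class AggNode {
 public:
  explicit AggNode(std::size_t num_threads) : states_(num_threads) {}

  ThreadLocalState& GetLocalState(std::size_t thread_index) {
    return states_[thread_index];
  }

  long Finalize() {
    // Kernel states from all threads are merged into one result...
    long total = 0;
    for (const auto& s : states_) total += s.partial_sum;
    // ...but the segment keys are read from a single state (here states_[0]),
    // assuming all threads saw the same current segment. Whether that
    // assumption always holds is exactly what the comment is questioning.
    const auto& keys = states_[0].segmenter_values;
    (void)keys;  // would be placed into the output batch alongside `total`
    return total;
  }

 private:
  std::vector<ThreadLocalState> states_;
};

int main() {
  AggNode node(/*num_threads=*/2);
  node.GetLocalState(0).partial_sum = 3;
  node.GetLocalState(1).partial_sum = 4;
  return node.Finalize() == 7 ? 0 : 1;  // merged result from both threads
}
```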