Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-45788: [C++][Acero] Fix data race in aggregate node #45789

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

zanmato1984
Copy link
Contributor

@zanmato1984 zanmato1984 commented Mar 14, 2025

Rationale for this change

Data race described in #45788 .

What changes are included in this PR?

Put the racing member segmenter_values in thread local state.

Are these changes tested?

Yes. UT added.

Are there any user-facing changes?

None.

@zanmato1984 zanmato1984 requested a review from westonpace as a code owner March 14, 2025 12:02
Copy link

⚠️ GitHub issue #45788 has been automatically assigned in GitHub to PR creator.

@zanmato1984
Copy link
Contributor Author

Hi @pitrou , would you like to take a look? Thanks.

@kou
Copy link
Member

kou commented Mar 15, 2025

@github-actions crossbow submit emscripten

Copy link

Revision: 9d8ddbb

Submitted crossbow builds: ursacomputing/crossbow @ actions-73916ce3ac

Task Status
test-conda-python-emscripten GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions

Copy link
Member

@mapleFU mapleFU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also makes Segmenter::GetSegments a const fn? (Or I can do it in a separate patch )

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Mar 15, 2025
@zanmato1984
Copy link
Contributor Author

Can we also makes Segmenter::GetSegments a const fn? (Or I can do it in a separate patch )

I guess not. Both SimpleKeySegmenter::GetSegments and AnyKeysSegmenter::GetSegments store intermediate state (that is used for getting the segments from the subsequent batch):

SaveKeyData(key_data);

save_group_id_ = group_ids[batch.length - 1];

@@ -312,7 +312,7 @@ Result<ExecBatch> GroupByNode::Finalize() {
segment_key_field_ids_.size());

// Segment keys come first
PlaceFields(out_data, 0, segmenter_values_);
PlaceFields(out_data, 0, state->segmenter_values);
Copy link
Member

@pitrou pitrou Mar 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the Finalize step only considers the segmenter values for state[0]? I'm not sure I understand why.

RETURN_NOT_OK(
ExtractSegmenterValues(&segmenter_values_, exec_batch, segment_field_ids_));
RETURN_NOT_OK(ExtractSegmenterValues(&GetLocalState()->segmenter_values, exec_batch,
segment_field_ids_));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OutputResult seems non-thread safe, how can InputReceived be called from several threads at once? Am I missing something?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants