[Internal] ClientRetryPolicy: Adds Code to Apply Partition Level Override When a Requested Cancellation Token Expires #5063

Open. kundadebdatta wants to merge 2 commits into base: master

Conversation

@kundadebdatta (Member) commented Mar 11, 2025

Pull Request Template

Description

Background:

During one of the backend drills, we found that when the following quorum-loss conditions are met and the user provides a cancellation token, the SDK honors the token but does not apply the partition-level failover for the offending partition:

  • Quorum loss is injected on the replica set (3 out of 4 replicas are down).
  • The primary replica is one of the replicas that are down.
  • A cancellation token with a 5-second timeout is provided.

Observation:

  • The SDK doesn't apply the partition-level override, and subsequent write requests fail on the current faulty region/partition.

Fix:

This PR fixes the above behavior by applying the partition-level override when a cancellation token expires.
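
To illustrate the intended behavior, here is a minimal, self-contained sketch. It is not the SDK's implementation: `PartitionOverrideTable`, `CancellationAwareWrite`, and the region names used with them are hypothetical stand-ins. The sketch assumes the override simply points the partition at the next region in a preferred list, and that the cancellation still surfaces to the caller after the override is recorded.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the SDK's per-partition override bookkeeping.
public sealed class PartitionOverrideTable
{
    private readonly string[] regions;
    private readonly Dictionary<string, string> overrides = new Dictionary<string, string>();

    public PartitionOverrideTable(string[] regions)
    {
        this.regions = regions;
    }

    // Region currently used for this partition: the override if present, else the first region.
    public string Resolve(string partitionKeyRangeId)
    {
        return this.overrides.TryGetValue(partitionKeyRangeId, out string region)
            ? region
            : this.regions[0];
    }

    // Points the partition at the next region in the list; returns false if none is left.
    public bool TryMoveToNextRegion(string partitionKeyRangeId)
    {
        int next = Array.IndexOf(this.regions, this.Resolve(partitionKeyRangeId)) + 1;
        if (next >= this.regions.Length)
        {
            return false;
        }

        this.overrides[partitionKeyRangeId] = this.regions[next];
        return true;
    }
}

public static class CancellationAwareWrite
{
    // Sends one write; if the caller's token expires, records the override and rethrows.
    public static async Task SendAsync(
        Func<string, CancellationToken, Task> sendToRegion,
        PartitionOverrideTable table,
        string partitionKeyRangeId,
        bool partitionLevelFailoverEnabled,
        CancellationToken cancellationToken)
    {
        try
        {
            await sendToRegion(table.Resolve(partitionKeyRangeId), cancellationToken);
        }
        catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
        {
            // Honor the token (no retry), but make the next request for this
            // partition go to a different region.
            if (partitionLevelFailoverEnabled)
            {
                table.TryMoveToNextRegion(partitionKeyRangeId);
            }

            throw;
        }
    }
}
```

In this sketch, a write that hangs until the token expires still throws `OperationCanceledException`, but the next call to `Resolve` for the same partition returns the second region in the list.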

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)

Closing issues

To automatically close an issue: closes #5060

@github-actions github-actions bot left a comment

All good!

@kundadebdatta kundadebdatta changed the title Code changes to apply partition level override on ct expiry. [PPAF] - Adds Code to Apply Partition Level Override When a Requested Cancellation Token Expires Mar 11, 2025
@kundadebdatta kundadebdatta self-assigned this Mar 11, 2025
@kundadebdatta kundadebdatta added auto-merge Enables automation to merge PRs PerPartitionAutomaticFailover labels Mar 11, 2025
@kundadebdatta kundadebdatta changed the title [PPAF] - Adds Code to Apply Partition Level Override When a Requested Cancellation Token Expires [Internal] ClientRetryPolicy: Adds Code to Apply Partition Level Override When a Requested Cancellation Token Expires Mar 11, 2025
this.failoverRetryCount,
this.locationEndpoint?.ToString() ?? string.Empty);

if (this.isPertitionLevelFailoverEnabled)
Member

This is wrong. We did say that the mark-down should only ever happen after a successful operation against the new write region, correct? Otherwise, random timeouts could result in marking down the region?
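
One possible reading of this concern, sketched with hypothetical names (`ConfirmedOverrideTracker` is not an SDK type): the override is committed only once a write against the candidate region has actually succeeded, so a timeout on its own never changes routing.

```csharp
using System.Collections.Generic;

// Hypothetical tracker: an override exists only after a confirmed success.
public sealed class ConfirmedOverrideTracker
{
    private readonly Dictionary<string, string> confirmed = new Dictionary<string, string>();

    // Called after a write to candidateRegion finishes; failures and timeouts are ignored.
    public void RecordAttempt(string partitionKeyRangeId, string candidateRegion, bool succeeded)
    {
        if (succeeded)
        {
            this.confirmed[partitionKeyRangeId] = candidateRegion;
        }
    }

    // True only if some write to another region has already succeeded for this partition.
    public bool TryGetOverride(string partitionKeyRangeId, out string region)
    {
        return this.confirmed.TryGetValue(partitionKeyRangeId, out region);
    }
}
```

Under this assumption, a cancelled or timed-out request leaves the table unchanged, and only a successful write against the new region marks the old one down for that partition.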

Member

Thinking out loud on this part: as an alternative, what about hedging across two regions when PPAF is enabled for writes?
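
A rough sketch of what such hedging could look like, using only `Task.WhenAny` from the base class library. `HedgedWrite` and its parameters are hypothetical and not part of the SDK, and the sketch assumes the write is safe to issue against both regions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class HedgedWrite
{
    // Returns the region whose write completed successfully first.
    public static async Task<string> SendAsync(
        Func<string, CancellationToken, Task> sendToRegion,
        string primaryRegion,
        string secondaryRegion,
        TimeSpan hedgeDelay,
        CancellationToken cancellationToken)
    {
        using var linked = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
        try
        {
            var pending = new List<(Task Task, string Region)>
            {
                (sendToRegion(primaryRegion, linked.Token), primaryRegion),
            };

            // Give the primary region a head start before hedging.
            Task headStart = Task.Delay(hedgeDelay, linked.Token);
            Task first = await Task.WhenAny(pending[0].Task, headStart);
            if (first == pending[0].Task && first.IsCompletedSuccessfully)
            {
                return primaryRegion;
            }

            // Primary is slow or has failed: start the second region as well.
            pending.Add((sendToRegion(secondaryRegion, linked.Token), secondaryRegion));

            Task lastFailure = null;
            while (pending.Count > 0)
            {
                Task finished = await Task.WhenAny(pending.Select(p => p.Task));
                var entry = pending.First(p => p.Task == finished);
                if (finished.IsCompletedSuccessfully)
                {
                    return entry.Region;
                }

                pending.Remove(entry);
                lastFailure = finished;
            }

            // Both attempts failed or were cancelled: surface the last outcome.
            await lastFailure;
            throw new InvalidOperationException("Unreachable: the awaited task always throws here.");
        }
        finally
        {
            // Stop the head-start timer and any attempt that is still running.
            linked.Cancel();
        }
    }
}
```

The trade-off in this sketch is extra load on the second region in exchange for not depending on a mark-down decision made from a single timeout.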

this.documentServiceRequest);
}

return ShouldRetryResult.NoRetry();
Member

Flow through

Labels
auto-merge Enables automation to merge PRs PerPartitionAutomaticFailover
Projects
None yet
4 participants