🐛 fix: efs & elb upgrade e2e tests #5418
Conversation
/test pull-cluster-api-provider-aws-e2e
(force-pushed from e1d17ec to 82f81a7)
/test pull-cluster-api-provider-aws-e2e
1 similar comment
/test pull-cluster-api-provider-aws-e2e
(force-pushed from 82f81a7 to f876247)
/test pull-cluster-api-provider-aws-e2e
4 similar comments
/test pull-cluster-api-provider-aws-e2e
/test pull-cluster-api-provider-aws-e2e
/test pull-cluster-api-provider-aws-e2e
/test pull-cluster-api-provider-aws-e2e
/test pull-cluster-api-provider-aws-e2e
(force-pushed from 8a4b67c to 56be642)
/test pull-cluster-api-provider-aws-e2e
@richardcase The release-2.7 periodics are failing on the same EFS issue, so we should backport at least the YAML changes in this PR. Do you think we should also backport the changes to the manager setup?
@nrb - we'll probably have to do it all. This issue started as just the EFS problem and then expanded to the classic ELB issue. I can cherry-pick the EFS changes first, and then if we see the other issue we can cherry-pick that commit as well. Will create the PR for release-2.7 soon.
Yeah, we can just do a cherry-pick then |
/test pull-cluster-api-provider-aws-e2e
The EFS e2e test was breaking for 2 reasons:
1. Running out of disk space on the control plane nodes. They only had 8GB, so this has been increased to 16GB.
2. The workload being deployed to test EFS was using CentOS, which has been discontinued for a long time now, so it has been changed to use Ubuntu.
Also small updates to logging for the ELB test.
Signed-off-by: Richard Case <[email protected]>
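As a rough illustration of the workload change described above, a test deployment pinning an Ubuntu image instead of CentOS might look like the sketch below. All names (`efs-test-app`, `efs-claim`, the mount path) are hypothetical placeholders, not the actual manifests in the CAPA e2e suite:

```yaml
# Hypothetical sketch only; the real e2e manifest names and labels may differ.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: efs-test-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: efs-test-app
  template:
    metadata:
      labels:
        app: efs-test-app
    spec:
      containers:
      - name: app
        # previously a centos image (discontinued); swapped for a maintained base image
        image: ubuntu:22.04
        command: ["sleep", "infinity"]
        volumeMounts:
        - name: efs-volume
          mountPath: /data
      volumes:
      - name: efs-volume
        persistentVolumeClaim:
          claimName: efs-claim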
AWSCluster was not reconciling when starting after an upgrade. It had old logic to compare versions and do nothing if they matched. We want to reconcile even if there are no changes to the AWSCluster, as the ELB logic has changed, and there may be other changes like this in the future. Change the SetupWithManager logic to be more like the standard we see with other infrastructure providers.
Signed-off-by: Richard Case <[email protected]>
(force-pushed from bbda053 to eb3ff6e)
/test pull-cluster-api-provider-aws-e2e
@@ -128,6 +128,7 @@ var (
 // +kubebuilder:rbac:groups=authorization.k8s.io,resources=subjectaccessreviews,verbs=create

 func main() {
+	setupLog.Info("starting cluster-api-provider-aws", "version", version.Get().String())
Nice addition, thanks!
WithEventFilter(
	predicate.Funcs{
		// Avoid reconciling if the event triggering the reconciliation is related to incremental status updates
		// for AWSCluster resources only
		UpdateFunc: func(e event.UpdateEvent) bool {
			if e.ObjectOld.GetObjectKind().GroupVersionKind().Kind != "AWSCluster" {
				return true
			}

			oldCluster := e.ObjectOld.(*infrav1.AWSCluster).DeepCopy()
			newCluster := e.ObjectNew.(*infrav1.AWSCluster).DeepCopy()

			oldCluster.Status = infrav1.AWSClusterStatus{}
			newCluster.Status = infrav1.AWSClusterStatus{}

			oldCluster.ObjectMeta.ResourceVersion = ""
			newCluster.ObjectMeta.ResourceVersion = ""

			return !cmp.Equal(oldCluster, newCluster)
		},
	},
).
Do we reckon that removing this will have any unrelated side effects?
For example, downstream we externally manage the AWSCluster; will this create many reconciliations for the CAPA infracluster controller?
I don't think there will be too many side effects. It will potentially result in more Reconcile runs, but:
- Our original code is at odds with how other infrastructure providers work
- Keeping this would cause issues with the Paused condition being added and would still cause that initial delay in reconciliation
- If we kept this, our change for ELB wouldn't be called until there was a "Spec" change.
Ok thanks, I think this is fine to remove for now.
If we notice any issues we can open a follow-up thread to discuss countermeasures.
Thanks!
Yeah, we could always create a custom predicate
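One possible shape for such a custom predicate (an assumption on my part, not something proposed in this PR) is to filter on `metadata.generation`, which the API server bumps only on spec changes; controller-runtime ships this idea as `predicate.GenerationChangedPredicate`. A stdlib-only sketch, with `Meta` as an illustrative stand-in for real object metadata:

```go
package main

import "fmt"

// Meta is a minimal stand-in for Kubernetes object metadata; the real
// controller-runtime predicate operates on client.Object.
type Meta struct{ Generation int64 }

// generationChanged triggers reconciliation only when the generation differs,
// filtering status-only updates without deep-comparing whole objects.
func generationChanged(oldObj, newObj Meta) bool {
	return oldObj.Generation != newObj.Generation
}

func main() {
	fmt.Println(generationChanged(Meta{Generation: 1}, Meta{Generation: 1})) // false: status-only update
	fmt.Println(generationChanged(Meta{Generation: 1}, Meta{Generation: 2})) // true: spec change
}
```

Note that such a predicate only filters update events; the create events delivered on controller startup resync would pass through untouched.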
/lgtm
/assign @nrb
/approve
Thanks Richard!
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: nrb
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/cherrypick release-2.7
@richardcase: #5418 failed to apply on top of branch "release-2.7":
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
What type of PR is this?
/kind failing-test
What this PR does / why we need it:
The EFS e2e test was breaking for 2 reasons:
1. The control plane nodes were running out of disk space; they only had 8GB, which has been increased to 16GB.
2. The test workload was using CentOS, which has been discontinued, and now uses Ubuntu.
The classic ELB upgrade test was failing because on upgrade of CAPA, the CAPA controller manager was restarted. From the logs I could see that the AWSCluster was not being reconciled, and that's because the "old" and "new" versions were the same based on the predicate logic in SetupWithManager. Changed SetupWithManager to be more consistent with other CAPI infra providers. It was also failing because of the wait introduced by the new
paused.EnsurePausedCondition
functionality, but this is fixed in #5425.

Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):
Fixes #
Special notes for your reviewer:
Checklist:
Release note: