🐛 fix: efs & elb upgrade e2e tests #5418

richardcase · 2025-03-19T14:06:42Z

What type of PR is this?

/kind failing-test

What this PR does / why we need it:

The EFS e2e test was breaking for 2 reasons:

Running out if disk space on the control plane nodes. It only had 8Gb so this has been increased to 16gb
The workload being deployed to test EFS was using centos with has been discontinued for a long time now. So changed to use Ubuntu

The classic ELB upgrade test was failing because on upgrade of CAPA, the CAPA controller manager was restarted. From the logs i could see that the AWSCluster was no being reconciled and thats because the "old" and "new" versions were the same basic ion the predicate logic in SetupWithManager. Changed SetupWithManager to be more consistent with other CAPI infra providers. Also it was failing because of the wait introduced by the new paused.EnsurePausedCondition functionality but this is fixed in #5425

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Checklist:

Release note:

Fix the EFS & classic elb e2e tests.

richardcase · 2025-03-19T14:07:25Z

/test pull-cluster-api-provider-aws-e2e

test/e2e/suites/unmanaged/helpers_test.go

damdo · 2025-03-19T14:33:43Z

/test pull-cluster-api-provider-aws-e2e

theobarberbany · 2025-03-19T17:01:48Z

/test pull-cluster-api-provider-aws-e2e

richardcase · 2025-03-19T17:29:12Z

/test pull-cluster-api-provider-aws-e2e

richardcase · 2025-03-19T17:32:00Z

/test pull-cluster-api-provider-aws-e2e

richardcase · 2025-03-19T17:37:18Z

/test pull-cluster-api-provider-aws-e2e

richardcase · 2025-03-19T18:12:05Z

/test pull-cluster-api-provider-aws-e2e

damdo · 2025-03-19T20:32:50Z

/test pull-cluster-api-provider-aws-e2e

richardcase · 2025-03-20T12:51:28Z

/test pull-cluster-api-provider-aws-e2e

test/e2e/shared/aws_helpers.go

richardcase · 2025-03-20T14:53:55Z

/test pull-cluster-api-provider-aws-e2e

nrb · 2025-03-20T15:53:04Z

@richardcase The release-2.7 periodics are failing on the same efs issue, so we should backport at least the YAML changes in this PR. Do you think we should also backport the changes to the manager setup?

richardcase · 2025-03-20T15:55:22Z

@richardcase The release-2.7 periodics are failing on the same efs issue, so we should backport at least the YAML changes in this PR. Do you think we should also backport the changes to the manager setup?

@nrb - we'll probably have to do it all. This issue started as just the EFS problem and then expanded to the classic elb issue. I can just cherry pick the EFS changes first and then if we see the other issue we can then cherry pick that commit. Will create the PR for release-2.7 soon.

nrb · 2025-03-20T16:17:19Z

we'll probably have to do it all

Yeah, we can just do a cherry-pick then

richardcase · 2025-03-20T17:27:05Z

/test pull-cluster-api-provider-aws-e2e

The EFS e2e test was breaking for 2 reasons: 1. Running out if disk space on the control plane nodes. It only had 8Gb so this has been increased to 16gb 2.The workload being deployed to test EFS was using centos with has been discontinued for a long time now. So changed to use Ubuntu Also small updates to logging for the ELB test. Signed-off-by: Richard Case <[email protected]>

AWSCluster was not reconciling when starting after an upgrade. It had old logic to compare versions and not do anything. We want to reconcile even if there are no changes to the AWSCluster as the ELB logic has changed. Also, there may be other changes like this in future. Change the SetupWithManager logic to be more like the standard we see with other infrastructure providers. Signed-off-by: Richard Case <[email protected]>

richardcase · 2025-03-21T12:27:11Z

/test pull-cluster-api-provider-aws-e2e

damdo · 2025-03-21T13:42:36Z

main.go

@@ -128,6 +128,7 @@ var (
 // +kubebuilder:rbac:groups=authorization.k8s.io,resources=subjectaccessreviews,verbs=create

 func main() {
+	setupLog.Info("starting cluster-api-provider-aws", "version", version.Get().String())


Nice addition, thanks!

damdo · 2025-03-21T13:46:19Z

controllers/awscluster_controller.go

-		WithEventFilter(
-			predicate.Funcs{
-				// Avoid reconciling if the event triggering the reconciliation is related to incremental status updates
-				// for AWSCluster resources only
-				UpdateFunc: func(e event.UpdateEvent) bool {
-					if e.ObjectOld.GetObjectKind().GroupVersionKind().Kind != "AWSCluster" {
-						return true
-					}
-
-					oldCluster := e.ObjectOld.(*infrav1.AWSCluster).DeepCopy()
-					newCluster := e.ObjectNew.(*infrav1.AWSCluster).DeepCopy()
-
-					oldCluster.Status = infrav1.AWSClusterStatus{}
-					newCluster.Status = infrav1.AWSClusterStatus{}
-
-					oldCluster.ObjectMeta.ResourceVersion = ""
-					newCluster.ObjectMeta.ResourceVersion = ""
-
-					return !cmp.Equal(oldCluster, newCluster)
-				},
-			},
-		).


Do we reckon that removing this will have any unrelated side effects?
For example, downstream we do externally manage the AWSCluster, will this create many reconciliations for the CAPA infracluster controller then?

I don't think there will be too may side affets. It will result in more Reconcile runs potentially, but:

Our original code is at odds with how other infrastructure providers work

Keeping this would cause issues with the Paused condition being added and would still cause that initial delay in reconciliation

If we kept this, our change for ELB wouldn't be called until there was a "Spec" change.

Ok thanks, I think this is fine to remove for now.
If we notice any issues we can open a follow-up thread to discuss countermeasures.
Thanks!

Yeah, we could always create a custom predicate

damdo

/lgtm

/assign @nrb

nrb · 2025-03-21T14:35:55Z

/approve

Thanks Richard!

k8s-ci-robot · 2025-03-21T14:36:03Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: nrb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [nrb]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

richardcase · 2025-03-21T14:54:49Z

/cherrypick release-2.7

k8s-infra-cherrypick-robot · 2025-03-21T14:55:32Z

@richardcase: #5418 failed to apply on top of branch "release-2.7":

Applying: fix: efs e2e test breaking
Using index info to reconstruct a base tree...
M	test/e2e/data/e2e_conf.yaml
M	test/e2e/shared/aws_helpers.go
M	test/e2e/suites/unmanaged/helpers_test.go
Falling back to patching base and 3-way merge...
Auto-merging test/e2e/suites/unmanaged/helpers_test.go
Auto-merging test/e2e/shared/aws_helpers.go
CONFLICT (content): Merge conflict in test/e2e/shared/aws_helpers.go
Auto-merging test/e2e/data/e2e_conf.yaml
CONFLICT (content): Merge conflict in test/e2e/data/e2e_conf.yaml
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Patch failed at 0001 fix: efs e2e test breaking

In response to this:

/cherrypick release-2.7

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. labels Mar 19, 2025

k8s-ci-robot requested review from AndiDog and fiunchinho March 19, 2025 14:06

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-priority size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Mar 19, 2025

damdo reviewed Mar 19, 2025

View reviewed changes

test/e2e/suites/unmanaged/helpers_test.go Outdated Show resolved Hide resolved

richardcase force-pushed the fix_efs_e2e_test branch from e1d17ec to 82f81a7 Compare March 19, 2025 14:33

richardcase force-pushed the fix_efs_e2e_test branch from 82f81a7 to f876247 Compare March 19, 2025 17:28

k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Mar 20, 2025

richardcase changed the title ~~🐛 fix: efs e2e test breaking~~ 🐛 fix: efs & elb upgrade e2e tests Mar 20, 2025

nrb reviewed Mar 20, 2025

View reviewed changes

test/e2e/shared/aws_helpers.go Outdated Show resolved Hide resolved

richardcase force-pushed the fix_efs_e2e_test branch from 8a4b67c to 56be642 Compare March 20, 2025 14:53

nrb mentioned this pull request Mar 20, 2025

🌱 Retry boskos account checkouts #5422

Merged

2 tasks

richardcase changed the title ~~🐛 fix: efs & elb upgrade e2e tests~~ WIP: 🐛 fix: efs & elb upgrade e2e tests Mar 20, 2025

k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 20, 2025

richardcase added 2 commits March 21, 2025 12:19

richardcase force-pushed the fix_efs_e2e_test branch from bbda053 to eb3ff6e Compare March 21, 2025 12:24

richardcase changed the title ~~WIP: 🐛 fix: efs & elb upgrade e2e tests~~ 🐛 fix: efs & elb upgrade e2e tests Mar 21, 2025

k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 21, 2025

damdo reviewed Mar 21, 2025

View reviewed changes

k8s-ci-robot assigned nrb and damdo Mar 21, 2025

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 21, 2025

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 21, 2025

k8s-ci-robot merged commit 6c93889 into kubernetes-sigs:main Mar 21, 2025
24 checks passed

nrb mentioned this pull request Mar 21, 2025

[release-2.7] 🐛 fix: efs e2e test breaking #5430

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

🐛 fix: efs & elb upgrade e2e tests #5418

🐛 fix: efs & elb upgrade e2e tests #5418

richardcase commented Mar 19, 2025 •

edited

Loading

richardcase commented Mar 19, 2025

damdo commented Mar 19, 2025

theobarberbany commented Mar 19, 2025

richardcase commented Mar 19, 2025

richardcase commented Mar 19, 2025

richardcase commented Mar 19, 2025

richardcase commented Mar 19, 2025

damdo commented Mar 19, 2025

richardcase commented Mar 20, 2025

richardcase commented Mar 20, 2025

nrb commented Mar 20, 2025

richardcase commented Mar 20, 2025

nrb commented Mar 20, 2025

richardcase commented Mar 20, 2025

richardcase commented Mar 21, 2025

damdo Mar 21, 2025

damdo Mar 21, 2025

richardcase Mar 21, 2025 •

edited

Loading

damdo Mar 21, 2025

richardcase Mar 21, 2025

damdo left a comment

nrb commented Mar 21, 2025

k8s-ci-robot commented Mar 21, 2025

richardcase commented Mar 21, 2025

k8s-infra-cherrypick-robot commented Mar 21, 2025

🐛 fix: efs & elb upgrade e2e tests #5418

🐛 fix: efs & elb upgrade e2e tests #5418

Conversation

richardcase commented Mar 19, 2025 • edited Loading

richardcase commented Mar 19, 2025

damdo commented Mar 19, 2025

theobarberbany commented Mar 19, 2025

richardcase commented Mar 19, 2025

richardcase commented Mar 19, 2025

richardcase commented Mar 19, 2025

richardcase commented Mar 19, 2025

damdo commented Mar 19, 2025

richardcase commented Mar 20, 2025

richardcase commented Mar 20, 2025

nrb commented Mar 20, 2025

richardcase commented Mar 20, 2025

nrb commented Mar 20, 2025

richardcase commented Mar 20, 2025

richardcase commented Mar 21, 2025

damdo Mar 21, 2025

Choose a reason for hiding this comment

damdo Mar 21, 2025

Choose a reason for hiding this comment

richardcase Mar 21, 2025 • edited Loading

Choose a reason for hiding this comment

damdo Mar 21, 2025

Choose a reason for hiding this comment

richardcase Mar 21, 2025

Choose a reason for hiding this comment

damdo left a comment

Choose a reason for hiding this comment

nrb commented Mar 21, 2025

k8s-ci-robot commented Mar 21, 2025

richardcase commented Mar 21, 2025

k8s-infra-cherrypick-robot commented Mar 21, 2025

richardcase commented Mar 19, 2025 •

edited

Loading

richardcase Mar 21, 2025 •

edited

Loading