VPA e2e tests are broken due to tests writing to deprecated GCR repository #7910

Open
raywainman opened this issue Mar 21, 2025 · 7 comments · May be fixed by #7913
Labels
area/infra/gcp Issues or PRs related to Kubernetes GCP infrastructure kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra.

@raywainman
Contributor

raywainman commented Mar 21, 2025

See canonical issue in Autoscaler repo: kubernetes/autoscaler#7946.

Today, the VPA e2e tests compile and build the VPA component and write the image to the GCR repo in the test project.

This week, GCR writes started getting blocked as part of the GCR deprecation:

failed to mount blob k8s-infra-e2e-boskos-154/vpa-recommender-amd64@sha256:d858cbc252ade14879807ff8dbc3043a26bbdb92087da98cda831ee040b172b3 to gcr.io/k8s-infra-e2e-boskos-154/vpa-recommender:latest: Head "https://gcr.io/v2/k8s-infra-e2e-boskos-154/vpa-recommender/blobs/sha256:d858cbc252ade14879807ff8dbc3043a26bbdb92087da98cda831ee040b172b3": unknown: Container Registry is deprecated and shutting down, please use the auto migration tool to migrate to Artifact Registry (gcloud artifacts docker upgrade migrate --projects='k8s-infra-e2e-boskos-154'). For more details see: https://cloud.google.com/artifact-registry/docs/transition/auto-migrate-gcr-ar

The logic that writes to GCR is captured here in the test script:

https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/hack/deploy-for-e2e.sh#L62-L74

Discussed this a bit with @BenTheElder and @upodroid in #k8s-infra Slack chat.

I believe there are a few options:

  1. Migrate all the Boskos test projects to use Artifact Registry using the standard GCR->AR process. Note that this would have to be done for ~O(100) projects.

  2. Have VPA start writing to a NEW Artifact Registry repository. Note that this requires the test runner to have permissions to create the repository and write to it (which is a bit different from GCR).

Both of the solutions above also need some proper garbage collection to ensure images don't leak and drive up storage costs. All of these images are temporary and can be cleaned up quickly after use.
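
For illustration, here is a minimal sketch of how the image reference would change under either option. Artifact Registry Docker references take the form LOCATION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE, so the push target needs a location and a repository name that GCR did not. The helper name `ar_image_ref`, the location `us-central1`, and the repository name `e2e-images` are assumptions for the example, not values from the test script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build an Artifact Registry image reference.
# The location and repository name are assumptions; GCR needed neither.
ar_image_ref() {
  local location="$1" project="$2" repo="$3" image="$4" tag="${5:-latest}"
  echo "${location}-docker.pkg.dev/${project}/${repo}/${image}:${tag}"
}

# GCR reference from the error above:
#   gcr.io/k8s-infra-e2e-boskos-154/vpa-recommender:latest
# Equivalent Artifact Registry reference under the assumed names:
ar_image_ref us-central1 k8s-infra-e2e-boskos-154 e2e-images vpa-recommender
```

Any script that builds the push target from the project ID alone would need these two extra inputs.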


I don't have access to the Boskos projects but in debugging this via our test scripts, I discovered that there are other components ALSO writing images to GCR within the projects (not sure if we are aware of this yet):

$ gcloud container images list
NAME
gcr.io/k8s-infra-e2e-boskos-024/cloud-controller-manager
gcr.io/k8s-infra-e2e-boskos-024/gcp-filestore-csi-driver
gcr.io/k8s-infra-e2e-boskos-024/gcp-persistent-disk-csi-driver
gcr.io/k8s-infra-e2e-boskos-024/glbc
gcr.io/k8s-infra-e2e-boskos-024/local-volume-provisioner
gcr.io/k8s-infra-e2e-boskos-024/vpa-admission-controller
gcr.io/k8s-infra-e2e-boskos-024/vpa-admission-controller-amd64
gcr.io/k8s-infra-e2e-boskos-024/vpa-recommender
gcr.io/k8s-infra-e2e-boskos-024/vpa-recommender-amd64
gcr.io/k8s-infra-e2e-boskos-024/vpa-updater
gcr.io/k8s-infra-e2e-boskos-024/vpa-updater-amd64
Only listing images in gcr.io/k8s-infra-e2e-boskos-024. Use --repository to list images in other repositories.
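
For anyone with access, the same check could be repeated across the project pool. The sketch below is hedged: the helper names `boskos_projects` and `list_gcr_images` are made up for the example, the project range is an assumption, and the script only prints the commands unless DRY_RUN is set to false.

```shell
#!/usr/bin/env bash
# Print commands by default; set DRY_RUN=false to actually call gcloud.
DRY_RUN="${DRY_RUN:-true}"

# Hypothetical helper: emit zero-padded Boskos project names for a range.
boskos_projects() {
  local first="$1" last="$2" i
  for i in $(seq "${first}" "${last}"); do
    printf 'k8s-infra-e2e-boskos-%03d\n' "${i}"
  done
}

# Hypothetical helper: list images still held in a project's gcr.io registry.
list_gcr_images() {
  local project="$1"
  if [ "${DRY_RUN}" = "true" ]; then
    echo "+ gcloud container images list --repository=gcr.io/${project}"
  else
    gcloud container images list --repository="gcr.io/${project}" --format='value(name)'
  fi
}

# Assumed range, chosen only for the example.
for project in $(boskos_projects 24 26); do
  list_gcr_images "${project}"
done
```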
@raywainman raywainman added the kind/bug Categorizes issue or PR as related to a bug. label Mar 21, 2025
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Mar 21, 2025
@raywainman
Contributor Author

/sig k8s-infra

@k8s-ci-robot k8s-ci-robot added sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 21, 2025
@BenTheElder
Member

/transfer k8s.io

  1. Migrate all the Boskos test projects to use Artifact Registry using the standard GCR->AR process. Note that this would have to be done for ~O(100) projects.

I don't think we should take this approach, because it will be disjoint if we need to expand the pool.

  2. Have VPA start writing to a NEW Artifact Registry repository. Note that this requires the test runner to have permissions to create the repository and write to it (which is a bit different from GCR).

I don't think we should let the tests spin up arbitrary registries.

I think we should do:
3. Have GCE e2e tests that need to write to a registry write to Artifact Registry repositories that are pre-created in all the GCP staging projects, with a garbage collection policy enforced at creation.

Then we don't need to over-permission the CI accounts, we don't need to coordinate cleanup in boskos, and the setup is uniform for any new projects that don't exist yet should we need to expand any of the project pools.

https://github.com/kubernetes/k8s.io/blob/main/infra/gcp/bash/prow/ensure-e2e-projects.sh is probably the place to do this
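
As a rough sketch of what option 3 could look like, assuming a Docker-format repository with a single delete-after-age cleanup policy: the repository name `e2e-images`, the location `us-central1`, the one-day retention, and the function names are all assumptions for illustration, not what ensure-e2e-projects.sh does. The script only prints the gcloud commands unless DRY_RUN is set to false.

```shell
#!/usr/bin/env bash
# Assumed values, not taken from the thread.
REPO="${REPO:-e2e-images}"
LOCATION="${LOCATION:-us-central1}"
DRY_RUN="${DRY_RUN:-true}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN}" = "true" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Write a cleanup policy that deletes any image older than one day
# (the retention period is an assumption).
write_cleanup_policy() {
  cat > "$1" <<'EOF'
[
  {
    "name": "delete-old-e2e-images",
    "action": {"type": "Delete"},
    "condition": {"tagState": "any", "olderThan": "1d"}
  }
]
EOF
}

# Create the repository and attach the policy. A real script would first
# check whether the repository exists, since create fails if it does.
ensure_e2e_registry() {
  local project="$1" policy_file="$2"
  run gcloud artifacts repositories create "${REPO}" \
    --project="${project}" --location="${LOCATION}" \
    --repository-format=docker
  run gcloud artifacts repositories set-cleanup-policies "${REPO}" \
    --project="${project}" --location="${LOCATION}" \
    --policy="${policy_file}"
}

policy_file="$(mktemp)"
write_cleanup_policy "${policy_file}"
ensure_e2e_registry k8s-infra-e2e-boskos-000 "${policy_file}"
```

Because the policy is attached when the repository is created, the CI accounts would only need write access to the repository, not permission to create or clean it up.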

@k8s-ci-robot k8s-ci-robot transferred this issue from kubernetes/test-infra Mar 21, 2025
@ameukam
Member

ameukam commented Mar 21, 2025

cc @upodroid

@ameukam
Member

ameukam commented Mar 21, 2025

/priority important-soon
/milestone v1.33
/area infra/gcp

Part of: #1343

@k8s-ci-robot k8s-ci-robot added this to the v1.33 milestone Mar 21, 2025
@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. area/infra/gcp Issues or PRs related to Kubernetes GCP infrastructure labels Mar 21, 2025
@ameukam ameukam moved this to Backlog in SIG K8S Infra Mar 21, 2025
@raywainman
Contributor Author

Pre-created repositories with baked-in garbage collection policies SGTM.

@upodroid upodroid linked a pull request Mar 22, 2025 that will close this issue
@upodroid
Member

This has been fixed for the k8s-infra-e2e-boskos-{000-160} GCP projects.

@raywainman
Contributor Author

Tests are passing again, thank you!

https://testgrid.k8s.io/sig-autoscaling-vpa#autoscaling-vpa-full
