Platform-aware autoscale from zero #11961

Open
aleskandro opened this issue Mar 12, 2025 · 5 comments
Labels
area/provider/core Issues or PRs related to the core provider kind/feature Categorizes issue or PR as related to a new feature. needs-priority Indicates an issue lacks a `priority/foo` label and requires one. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@aleskandro

What would you like to be added (User Story)?

As a cluster administrator, I want the cluster autoscaler, when using Cluster API, to take the CPU architecture of each candidate node group into account when scaling out from zero, so that it expands only node groups that satisfy the architecture requested in the pods' node selector or node affinity requirements.

As a cluster administrator, I want the cluster autoscaler, when using Cluster API, to take the operating system of each candidate node group into account when scaling out from zero, so that it expands only node groups that satisfy the operating system requested in the pods' node selector or node affinity requirements.

Detailed Description

Today, Cluster API providers can set the labels capacity annotation to a comma-separated list of key-value pairs representing the labels expected on a node created from a MachineSet/MachineDeployment.
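
As a rough illustration of the comma-separated format described above, the annotation value could be parsed along these lines. This is a minimal sketch, not the autoscaler's actual code: the helper name `parse_labels_annotation` is hypothetical, and the annotation key `capacity.cluster-autoscaler.kubernetes.io/labels` is my understanding of the key used by the cluster autoscaler's Cluster API provider, so treat it as an assumption.

```python
from typing import Dict

# Assumed annotation key (believed to be the one read by the cluster
# autoscaler's Cluster API provider; verify against the provider docs).
LABELS_ANNOTATION = "capacity.cluster-autoscaler.kubernetes.io/labels"


def parse_labels_annotation(value: str) -> Dict[str, str]:
    """Hypothetical helper: turn 'k1=v1,k2=v2' into a label map."""
    labels: Dict[str, str] = {}
    for pair in value.split(","):
        pair = pair.strip()
        if not pair:
            # Tolerate empty entries such as a trailing comma.
            continue
        key, sep, val = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"malformed label entry: {pair!r}")
        labels[key.strip()] = val.strip()
    return labels


# Example annotations as they might appear on a MachineSet/MachineDeployment.
annotations = {
    LABELS_ANNOTATION: "kubernetes.io/arch=arm64,kubernetes.io/os=linux",
}
expected_labels = parse_labels_annotation(annotations[LABELS_ANNOTATION])
```

With the example value, `expected_labels` holds one entry for `kubernetes.io/arch` and one for `kubernetes.io/os`, which is the information the autoscaler would need for a node group that currently has no nodes.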

In previous work, Cluster API defined a contract that allows infrastructure providers to set the capacity of a node using the status field of the InfrastructureMachineTemplate. However, the capacity field in the status of a node/InfrastructureMachineTemplate is a ResourceList, i.e. a map (map[string]quantity) that should reflect the resources pods express in their requests and limits. This is not sufficient for infrastructure providers to convey other information, such as the operating system and CPU architecture, in that status field.

Therefore, unless users have set the labels capacity annotation with the expected label (e.g., kubernetes.io/arch=arm64), the cluster autoscaler cannot filter out nodes whose CPU architecture differs from the one that may be set in the pod's nodeAffinity or nodeSelector.
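
To make the filtering problem concrete, here is a simplified model of matching a pod's nodeSelector against the labels expected for each node group. It is a sketch under stated assumptions, not the cluster autoscaler's scheduling logic: the function names and node group names are hypothetical, only exact-match nodeSelector terms are handled (no nodeAffinity operators), and a missing label is simply treated as a non-match, which may differ from what the real autoscaler does.

```python
from typing import Dict, List


def selector_matches(node_selector: Dict[str, str],
                     node_labels: Dict[str, str]) -> bool:
    """True if every nodeSelector entry equals the node's expected label."""
    # Assumption of this sketch: a label that is absent counts as a mismatch.
    return all(node_labels.get(k) == v for k, v in node_selector.items())


def candidate_node_groups(node_selector: Dict[str, str],
                          node_groups: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the names of node groups whose expected labels satisfy the selector."""
    return [name for name, labels in node_groups.items()
            if selector_matches(node_selector, labels)]


# Hypothetical node groups, each scaled to zero, with the labels known
# for them (for example from the labels capacity annotation).
node_groups = {
    "md-amd64": {"kubernetes.io/arch": "amd64", "kubernetes.io/os": "linux"},
    "md-arm64": {"kubernetes.io/arch": "arm64", "kubernetes.io/os": "linux"},
    "md-unlabeled": {},  # no annotation set, so nothing is known
}

pod_selector = {"kubernetes.io/arch": "arm64"}
candidates = candidate_node_groups(pod_selector, node_groups)
```

The `md-unlabeled` entry stands for the case described in this issue: without the annotation there is no architecture information to compare against, so any decision about that group is a guess. Having the infrastructure provider report the architecture and operating system would remove the need for users to set the annotation by hand.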

Anything else you would like to add?

No response

Label(s) to be applied

/kind feature
One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.

/area provider/core

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. area/provider/core Issues or PRs related to the core provider needs-priority Indicates an issue lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 12, 2025
@sbueringer
Member

cc @elmiko

@elmiko
Contributor

elmiko commented Mar 13, 2025

i'm happy to accept the triage for this, i think this is a feature we will need to ensure that architecture-specific workloads can be scheduled properly by the cluster autoscaler. i do think there is a little discussion to be had about the API changes that might be needed, but i don't think they are overwhelming.

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 13, 2025
@chrischdi
Member

Just for my understanding, @elmiko: does this then also need work in cluster-autoscaler, or is this already done on that side?

@aleskandro
Author

Yes, we'd need some small changes on the cluster autoscaler side, in line with both the previous Cluster API work and the arch-aware autoscale-from-zero work.

@elmiko
Contributor

elmiko commented Mar 18, 2025

what Alessandro said =)
