-
Notifications
You must be signed in to change notification settings - Fork 302
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Explicitly specify the integrations that are allowed as the type. #4676
base: main
Are you sure you want to change the base?
Explicitly specify the integrations that are allowed as the type. #4676
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: kannon92 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
✅ Deploy Preview for kubernetes-sigs-kueue canceled.
|
/hold I think there is one deepcopy fix I gotta do and I want to make sure this passes in CI. |
@@ -245,7 +245,7 @@ func (m *integrationManager) isOwnerIntegrationEnabled(owner *metav1.OwnerRefere | |||
// RegisterIntegration registers a new framework, returns an error when | |||
// attempting to register multiple frameworks with the same name or if a | |||
// mandatory callback is missing. | |||
func RegisterIntegration(name string, cb IntegrationCallbacks) error { | |||
func RegisterIntegration(name configapi.KueueIntegrations, cb IntegrationCallbacks) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I correct in thinking that every integration framework has to call this?
Could we add a check on the names here to check that the names are Qualified Names so that we can assert in the future that new integrations follow the correct pattern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can.
This is not validation code though. I think putting your suggestion here would make more sense.
https://github.com/kubernetes-sigs/kueue/blob/main/pkg/config/validation.go#L171
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not users I want to find this error, its the developer adding a new integration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think adding some sanity check for developers is fine.
Additionally, in the validation.go
code we could verify that the values supplied by the user are among the registered integrations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a bit opague in the validation logic but we do check if the integrations are supported.
const ( | ||
BatchJob KueueIntegrations = "batch/job" | ||
RayJob KueueIntegrations = "ray.io/rayjob" | ||
RayCluster KueueIntegrations = "ray.io/raycluster" | ||
JobSet KueueIntegrations = "jobset.x-k8s.io/jobset" | ||
MPIJob KueueIntegrations = "kubeflow.org/mpijob" | ||
PaddleJob KueueIntegrations = "kubeflow.org/paddlejob" | ||
PyTorchJob KueueIntegrations = "kubeflow.org/pytorchjob" | ||
TFJob KueueIntegrations = "kubeflow.org/tfjob" | ||
XGBoostJob KueueIntegrations = "kubeflow.org/xgboostjob" | ||
AppWrappers KueueIntegrations = "workload.codeflare.dev/appwrapper" | ||
Pod KueueIntegrations = "pod" | ||
Deployment KueueIntegrations = "deployment" | ||
Statefulset KueueIntegrations = "statefulset" | ||
LeaderWorkerSet KueueIntegrations = "leaderworkerset.x-k8s.io/leaderworkerset" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the objective for these? As #4659 (comment), third-party operator does not want to use Kueue as a library, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use the configuration as a library.
So we want to be able to know on each release of Kueue what are the allowed list of integrations?
So how can we upgrade from v0.9 -> v0.10 safely and know what kind of integrations are new/or dropped.
So I think we need to import the kueue configuration library so we know what kind of frameworks are actually allowed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, do you want to use Kueue as a library, right? Because you import this into your project, a whole of dependencies is added to your project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I see some benefit from strong typing like type IntegrationReference string
or type BuiltInIntegrationReference string
.
However, putting them all into one place has pros and cons. Pros is that you can see all integrations in one place (though we maintain them in comments too). Cons is that this is monolitic design. I don't have a strong view, but something feels off.
Still, I like the idea of strong typing as we do for other API types - ClusterQueueReference, TopologyReference, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that comment, it said that we want to import the kueue api. That is a standard process for building operators.
Kueue imports the k8s api so they can create k8s objects. I imagine users will import the Kueue API if they need to create CQ/RF.
They don't need to import anything other than the APIs which is what our operator does.
I have an operator that will maange the installation of Kueue and I want to create the necessary kueue configurations. We know that typically cluster admins will want to selectively enable/disable certain Kueue integrations. So we want to make it easy via documentation what are the allowed values of the integrations.
I am fine to relax restrictions for external frameworks but built in frameworks need to have some structure.
Right now the list of kueue configurations is a list of strings. We can best effort guess what Kueue will support for these integrations but
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but I'm not sure putting all integration names in the monolitic style is the best practice actually.
For example, in the core k8s, we don't have a file with all supported labels & annotations - they are scattered arcoss api packages.
Similarly, I think there is no single file listing all GVK resources supported by k8s, so we have corev1.SchemeGroupVersion
, or batch.SchemeGroupVersion
.
This approach is more modular.
Again, it is not a strong view, because for example in the core k8s you have a single file with all feature gates. So, I'm on the fence here.
I'm convinced about strong-typing the integration names.
For users, to know supported versions, there are alternatives, for example Kueue could log them all at startup, or we could expose a helper function which exposes the list of registered ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For users, to know supported versions, there are alternatives, for example Kueue could log them all at startup, or we could expose a helper function which exposes the list of registered ones.
Most users would probably not have access to this namespace. And most users would probably not be aware to check the kueue controller logs to know this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you help by defining who the "user" is here?
I assume it is a developer of the external controller. So, I think for such a user it does not make much difference between checking Kueue logs or Kueue code for this purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My issue is that it isn’t really obvious in the code what are the supported integrations.
I rely on the comment in the configuration list as the canonical list. that seems error prone and can easily be not updated.
also we aren’t exactly sure what requirements there are on frameworks. Are they a qualified name? Is the domaain requirements for any specific reason?
what happens if someone species Pod instead of pod? It’s really a list of enums that must be those specific values.
a user is a cluster admin installing Kueue with an operator. But someone configur
bddad7c
to
171875f
Compare
171875f
to
96064b4
Compare
@@ -387,6 +387,25 @@ type Integrations struct { | |||
LabelKeysToCopy []string `json:"labelKeysToCopy,omitempty"` | |||
} | |||
|
|||
type IntegrationReference string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please extract this to a separate PR, as we agree on strong typing, but we are yet not sure about putting all references together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can’t really decouple this.
I can give a PR with just this type but it won’t build because we use the configiguretion list all over the repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we just use in these places the new type instead of string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be more precise, just replace
defaultJobFrameworkName = "batch/job"
with
defaultJobFrameworkName kueue.IntegrationReference = "batch/job"
and so on in other places, not sure why this would not compile
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is what this PR does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reference to the framework must match the enum?
so if I use “batch/job” I would need to switch to use the strongly typed constant.
That was why I ended up changing all that I needed. I first added the configuration values and then I see that list is passed as a string slice in a few places.
we loop over those values and then I need to switch to use the reference.
The cycle continues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the "minimal" change. #4697
In any place like jobframework that takes a string or is passed in our config list, I did a transformation to make it fit into a string array.
This allows me to avoid changing it all places but it does require me to convert it in a few places so we don't propogate this change everywhere.
/hold In the meanwhile I'm happy to merge part of the PR which is to introduce strong typing by |
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
Introduce an enum for integrations and propagated that usage throughout the codebase.
This can reduce bugs and allow for people to know what kind of strings are expected for built in frameworks.
Which issue(s) this PR fixes:
Fixes #4659
Special notes for your reviewer:
Does this PR introduce a user-facing change?