Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allowed values of integrations #4659

Open
kannon92 opened this issue Mar 17, 2025 · 24 comments · May be fixed by #4676
Open

Allowed values of integrations #4659

kannon92 opened this issue Mar 17, 2025 · 24 comments · May be fixed by #4676
Labels
kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt.

Comments

@kannon92
Copy link
Contributor

What would you like to be cleaned:

Trying to find the allowed values for Integrations is actually quite difficult.

The comments seem to have most of the fields but I think it is missing a few integrations.

Why is this needed:

I can't find a list of supported integrations in the code and the website documentation / config documentation is incomplete on integrations.

For example, is MXJob an allowed integration even though it is not specified in the comment on the configuration object?

@kannon92 kannon92 added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label Mar 17, 2025
@mimowo
Copy link
Contributor

mimowo commented Mar 17, 2025

Actually, MXJob is probably dropped for 0.11 as kubeflow does not support it any longer: #1429

@mimowo
Copy link
Contributor

mimowo commented Mar 17, 2025

Found it: #4077

AFAIK all other integrations should be covered in comments.

@kannon92
Copy link
Contributor Author

Would you be up for making this integration validation more explicit in the configuration?

a list of allowed enums?

I would like to know on each release of Kueue what values are supported and how to validate that.

But it seems the comments are best effort as I can imagine we may have an issue where someone forgets to update that comment but adds an integration.

@kannon92
Copy link
Contributor Author

ah, I saw some docs around MXJob so I was confused.

@mimowo
Copy link
Contributor

mimowo commented Mar 17, 2025

Would you be up for making this integration validation more explicit in the configuration?

I'm not sure how this would work TBH, maybe if you provide more details.

I think what you are looking for mostly is single place which defines all the supported frameworks as their names are currently scattered: https://github.com/search?q=repo%3Akubernetes-sigs%2Fkueue%20FrameworkName&type=code

So, maybe just a single file Which lists all of them like a list of enums indeed.

Alternatively, we could log the list of supported frameworks on startup of Kueue, and mark (say with word enabled) those which are enabled.

@kannon92
Copy link
Contributor Author

kannon92 commented Mar 17, 2025

https://github.com/openshift/api/blob/32c2fac8206e78350d19c0fa975a345fdab3de83/operator/v1alpha1/types_kueue.go#L79

So I'm building an operator to configuration the installation of Kueue based on requested frameworks.

I am mainly going on good faith that those comments were the allowed frameworks. But I know they can change over time so I'm not entirely sure where to look to see if a new framework was added.

@kannon92
Copy link
Contributor Author

I tried to look at the validation code but it was opaque to find what the allowed values are.

So in my operator we wanted to make it a list of enums.

I could propose a similar idea to our configuration doc that at least makes it more explicit that these integrations are really a list of enums that must match a certain form.

@JoelSpeed
Copy link

Looking at the list of integrations so far, it appears that the validation that would suit is a qualified name, which is a label optionally prefixed by a subdomain and separated with a forward slash

You could use the validation linked above when validating the registration to make sure all of the frameworks follow the same pattern going forward, and then anyone building an abstraction on Kueue can be at least semi confident in their ability to validate the names without having to maintain a list of valid frameworks, the user gets at least some early feedback

@mimowo
Copy link
Contributor

mimowo commented Mar 18, 2025

The validation for the format sounds reasonable. It can provide a quick feedback loop when configuring kueue.

However, there are exceptions for the K8s built in types.

OTaoH now I think kueue knows all the framework names so could validate based on the registered integrations, as this list is non extensible by users.

@tenzen-y
Copy link
Member

The validation for the format sounds reasonable. It can provide a quick feedback loop when configuring kueue.

However, there are exceptions for the K8s built in types.

OTaoH now I think kueue knows all the framework names so could validate based on the registered integrations, as this list is non extensible by users.

+1
We can add such validations to

func (m *integrationManager) setupControllers(ctx context.Context, mgr ctrl.Manager, log logr.Logger, opts ...Option) error {

@JoelSpeed
Copy link

However, there are exceptions for the K8s built in types.

Based on

// - "batch/job"
// - "kubeflow.org/mpijob"
// - "ray.io/rayjob"
// - "ray.io/raycluster"
// - "jobset.x-k8s.io/jobset"
// - "kubeflow.org/paddlejob"
// - "kubeflow.org/pytorchjob"
// - "kubeflow.org/tfjob"
// - "kubeflow.org/xgboostjob"
// - "workload.codeflare.dev/appwrapper"
// - "pod"
// - "deployment" (requires enabling pod integration)
// - "statefulset" (requires enabling pod integration)
// - "leaderworkerset.x-k8s.io/leaderworkerset" (requires enabling pod integration)
, I see no exceptions. The built-ins do not have the optional prefix, but they would still validate correctly as a qualified name

@mimowo
Copy link
Contributor

mimowo commented Mar 18, 2025

The built-ins do not have the optional prefix, but they would still validate correctly as a qualified name

Yes, I meant exceptions that don't have the prefix, but you already said "optional". I somehow missed that. In any case, I think Kueue could validate the list more precisely as it knows all the registered integrations.

@JoelSpeed
Copy link

In any case, I think Kueue could validate the list more precisely as it knows all the registered integrations.

I agree. I still think it would be useful to set a standard for the names. Other projects building APIs on top of Kueue, exposing some of this configuration to their users, will want to be able to validate as early as possible that the configuration is valid, to the best of their ability.

Validating an enum is easy for Kueue, as the code is integrated here. Maintaining an enum for a third party API is harder. I would expect third party APIs to validate a pattern "this looks like it could be correct" rather than necessarily validating "this matches my pre-defined list", else there's toil and an API change every time Kueue supports a new integration, and the third party maintainer may not realise that their enum list has become stale

@mimowo
Copy link
Contributor

mimowo commented Mar 18, 2025

By third party you mean an extended Kueue fork or something else? It would be good to be on the same page when designing this.

Actually, the last idea is not enum, but a check if every integration on the list is present among the registered integrations with the framework.RegisterIntegration call, example. So, this would not require any extra maintaincence on the fork owners, but still I don't want to make any guarantees on this mechanism, and developers who fork Kueue to extend it are "on their own".

@JoelSpeed
Copy link

Specifically here I'm talking about someone building an operator to run Kueue. Operators are a common pattern across the Kube ecosystem, and often they build APIs that allow configuration of the operand. I'm not talking about forking Kueue.

From the end user experience perspective.

  • User installs Kueue Operator
  • User creates KueueConfiguration.mykueueoperator.example.com custom resource
  • In the spec of the KueueConfiguration, the user configures a list of Frameworks they would like enabled
  • The operator converts their desired configuration into the right input for Kueue and creates the deployment/webhooks/etc

It's this API which I'm talking about in my previous comment

@tenzen-y
Copy link
Member

Kueue does not know which integrations are enabled before starting. So, it's challenging to prepare a fixed enum list for all available integrations. Hence, I would propose exposing the util function to expose the registered integrations.

@mimowo
Copy link
Contributor

mimowo commented Mar 18, 2025

This would work, but then one needs to make sure they compile the operator against exactly the same version as used at runtime.

@tenzen-y
Copy link
Member

This would work, but then one needs to make sure they compile the operator against exactly the same version as used at runtime.

I assumed that the operator uses Kueue as a Go library.

@mimowo
Copy link
Contributor

mimowo commented Mar 18, 2025

Right, but it still imposes extra work on the admin to make sure the Kueue library version is the same as used at runtime. If they diverge there might be failures when the library claims something is not supported, but it is supported at runtime in fact.

@JoelSpeed
Copy link

I would suggest avoiding the operator requiring Kueue as a library apart from for particular API types (which should have minimal imports), as importing large parts of the codebase can lock dependencies and make it difficult for the operator developer. eg they want to move their deps up to a new version of controller-runtime, but Kueue hasn't cut a release with that version yet, so they can't

@mimowo
Copy link
Contributor

mimowo commented Mar 18, 2025

Ok, but still I think Kueue, when running, could validate the integrations list exactly, based on the registered ones, as suggested in #4659 (comment).

Now, when it comes to the external controller, options I see:

  1. as suggested by Yuki to use Kueue as a library, but it has drawbacks
  2. add the format validation in the 3rd-party controller itself

I would be welcome the PR to Kueue for strict validation, but would probably be leaning to (2.) for the 3rd party controller.

@tenzen-y
Copy link
Member

I would suggest avoiding the operator requiring Kueue as a library apart from for particular API types (which should have minimal imports), as importing large parts of the codebase can lock dependencies and make it difficult for the operator developer. eg they want to move their deps up to a new version of controller-runtime, but Kueue hasn't cut a release with that version yet, so they can't

if we double manage the Enum list in this repository, the operator need to use Kueue as a library since they need to obtain the Enum list from Kueue codes. Or do you indicate the HTTP endpoint else to obtain the integrations list?

@JoelSpeed
Copy link

I agree that 2 is the approach for a third party controller. I think the important part is that Kueue should enforce the restriction on newly registered integrations, to ensure that if a third party controller adds the qualified name based validation (for immediate feedback), that future integrations continue to meet the same format

The main goal here is not to provide perfect feedback, that comes with a lot of toil, but to weed out obviously incorrect issues, eg Pod instead of pod

@kannon92
Copy link
Contributor Author

So I opened up #4676 as an option.

It ended up being kinda of a large change to enforce the use of these integrations as they are used in a lot of places.

I think my PR makes it clear but I'm open if we want to reduce the scope to only be a quick validation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants