-
Notifications
You must be signed in to change notification settings - Fork 535
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DNM: OCPNODE-3023: Kueue operator api review #2222
base: master
Are you sure you want to change the base?
Conversation
@kannon92: This pull request references OCPNODE-3023 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set. In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Hello @kannon92! Some important instructions when contributing to openshift/api: |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: kannon92 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@kannon92: This pull request references OCPNODE-3023 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
243b816
to
c2d2859
Compare
operator/v1alpha1/types_kueue.go
Outdated
// - "statefulset" (requires enabling pod integration) | ||
// - "leaderworkerset.x-k8s.io/leaderworkerset" (requires enabling pod integration) | ||
// +kubebuilder:validation:MaxItems=14 | ||
// +kubebuilder:validation:items:MaxLength=64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect this is too small, a valid qualified name has a DNS name as the prefix which can be up to 253 chars alone, I believe the next part after the slash can be up to 63 chars.
So 253 + 1 + 63 = 317 in the worst case IIUC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is valid qualified name.
The options are given above and the greatest value is leaderworkerset.x-k8s.io/leaderworkerset. That is only 43 characters so went with a nice power of 2 variable for some wiggle room.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if upstream kueue adds support for an integration that uses a fully qualified name longer than 64 characters? Are there any conventions that kueue defines for framework names?
If we only ever want to support a defined list of frameworks (that we will update as needed), should the allowed values be defined by an enum?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should push for a standard on these names upstream if they are arbitrary, they look like qualified names so perhaps we can push for that validation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try!
We could do a patch to make the integration take an allowed list of enums. But a quick glance in Kueue this is not a minor fix.
I was going to work on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The validation is really a list of allowed frameworks so I don't think we need more validation here.
It should only be an allowed list which we already have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only got about a quarter through on this pass. Going to circle back around tomorrow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I'd like to see is more descriptive GoDoc for users. I linked a good guideline for GoDocs in one of my comments. I'd recommend using that guideline and going one-by-one through all your fields and updating the GoDoc with those guidelines in mind.
operator/v1alpha1/types_kueue.go
Outdated
// - "statefulset" (requires enabling pod integration) | ||
// - "leaderworkerset.x-k8s.io/leaderworkerset" (requires enabling pod integration) | ||
// +kubebuilder:validation:MaxItems=14 | ||
// +kubebuilder:validation:items:MaxLength=64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if upstream kueue adds support for an integration that uses a fully qualified name longer than 64 characters? Are there any conventions that kueue defines for framework names?
If we only ever want to support a defined list of frameworks (that we will update as needed), should the allowed values be defined by an enum?
operator/v1alpha1/types_kueue.go
Outdated
// This is required and must have at least one element. | ||
// The frameworks are jobs that Kueue will manage. | ||
// +required | ||
Frameworks []KueueIntegrations `json:"frameworks"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why frameworks? I guess that's an upstream term? Do they use that in their product docs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is not really a lot of docs around that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The developer docs do mention JobFrameworks.
Hello, We decided to call a sync to discuss some options as we need to make progress on some kind of API. Meeting Doc: https://docs.google.com/document/d/16kJXo3W8lCWgYOPpS5Jd8MTT2nrX68XAZKwH9QFO7sg/edit?usp=sharing Generally, we know that the MVP for this is going to be integrations/external frameworks. The configuration options are quite complicated and we were worried that we would never really come to a decision trying to add every API field we want in the first round of this. For our tech preview, we are going to only allow We still need to figure out a path forward with OLM on blocking upgrades and until we have a clear idea of that, it was causing a lot of friction to include all those configurations without be able to block upgrades. I pushed up a commit that at least calls out the basic API we want for TP. |
operator/v1alpha1/types_kueue.go
Outdated
// kueue configuration must not be changed once the object exists | ||
// to change the configuration, one can delete the object and create a new object. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this sentence need to be removed?
operator/v1alpha1/types_kueue.go
Outdated
// config is the desired configuration | ||
// for the kueue operator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it the kueue operator, or just kueue? The operator spec is for the operator right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is both. The operator creates the configuration for kueue and deploys kueue.
operator/v1alpha1/types_kueue.go
Outdated
// The frameworks are jobs that Kueue will manage. | ||
// +kubebuilder:validation:MaxItems=14 | ||
// +kubebuilder:validation:MinItems=1 | ||
// kubebuilder:validation:UniqueItems=true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't actually work, use CEL instead
// kubebuilder:validation:UniqueItems=true | |
// kubebuilder:validation:XValidation:rule="self.all(x, self.exists_one(y, x == y))",message="each item in frameworks must be unique" |
// frameworks are a list of names to be enabled. | ||
// This is required and must have at least one element. | ||
// The frameworks are jobs that Kueue will manage. | ||
// +kubebuilder:validation:MaxItems=14 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// +kubebuilder:validation:MaxItems=14 | |
// Entries in this list must be unique. | |
// +kubebuilder:validation:MaxItems=14 |
// +required | ||
Frameworks []KueueIntegrations `json:"frameworks"` | ||
// externalFrameworks are a list of GroupVersionResources | ||
// that are managed for Kueue by external controllers; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// that are managed for Kueue by external controllers; | |
// that are managed for Kueue by external controllers. |
operator/v1alpha1/types_kueue.go
Outdated
// externalFrameworks are a list of GroupVersionResources | ||
// that are managed for Kueue by external controllers; | ||
// These are optional and should only be used if you have an external controller | ||
// that integrations with kueue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're mixing the casing of kueue
quite a lot through the godoc, I think it should be capitalised as a noun
// that integrations with kueue. | |
// that integrates with Kueue. |
operator/v1alpha1/types_kueue.go
Outdated
// match or otherwise the workload creation would fail. The labels are copied only | ||
// during the workload creation and are not updated even if the labels of the | ||
// underlying job are changed. | ||
// +kubebuilder:validation:items:MaxLength=317 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// +kubebuilder:validation:items:MaxLength=317 | |
// Each entry in the list must be a valid qualified name consisting of a lower-case alphanumeric string, | |
// and hyphens of at most 63 characters in length. The name must start and end with an alphanumeric character. | |
// The name may be optionally prefixed with a subdomain consisting of lower-case alphanumeric characters, | |
// hyphens and periods, of at most 253 characters in length. | |
// Each period separated segment within the subdomain must start and end with an alphanumeric character. | |
// The optional prefix and the name are separate by a forward slash (/). | |
// +kubebuilder:validation:items:MaxLength=317 |
// underlying job are changed. | ||
// +kubebuilder:validation:items:MaxLength=317 | ||
// +kubebuilder:validation:MaxItems=64 | ||
// +kubebuilder:validation:XValidation:rule="self.size() == 0 || !format.qualifiedName().validate(self).hasValue()",message="a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// +kubebuilder:validation:XValidation:rule="self.size() == 0 || !format.qualifiedName().validate(self).hasValue()",message="a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character." | |
// +kubebuilder:validation:items:XValidation:rule="!format.qualifiedName().validate(self).hasValue()",message="a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character." |
131d2f7
to
45893f3
Compare
operator/v1alpha1/types_kueue.go
Outdated
// integrations are the workloads Kueue will manage | ||
// Kueue has integrations in the codebase and it also allows | ||
// for external frameworks | ||
// Kueue are an important part to specify for the API as Kueue will | ||
// only manage the workloads that are specfied in this list. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you just read through this and check this is what you want to say, I think it may be missing some punctuation?
Line 48 especially isn't reading right to me
operator/v1alpha1/types_kueue.go
Outdated
// group of externalFramework | ||
// must be a valid qualified name consisting of a lower-case alphanumeric string, | ||
// and hyphens of at most 63 characters in length. The name must start and end with an alphanumeric character. | ||
// The name may be optionally prefixed with a subdomain consisting of lower-case alphanumeric characters, | ||
// hyphens and periods, of at most 253 characters in length. | ||
// Each period separated segment within the subdomain must start and end with an alphanumeric character. | ||
// The optional prefix and the name are separate by a forward slash (/). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure to punctuate this documentation as if it were product documentation
// group of externalFramework | |
// must be a valid qualified name consisting of a lower-case alphanumeric string, | |
// and hyphens of at most 63 characters in length. The name must start and end with an alphanumeric character. | |
// The name may be optionally prefixed with a subdomain consisting of lower-case alphanumeric characters, | |
// hyphens and periods, of at most 253 characters in length. | |
// Each period separated segment within the subdomain must start and end with an alphanumeric character. | |
// The optional prefix and the name are separate by a forward slash (/). | |
// group is the API group of the externalFramework. | |
// Must be a valid qualified name consisting of a lower-case alphanumeric string, | |
// and hyphens of at most 63 characters in length. The name must start and end with an alphanumeric character. | |
// The name may be optionally prefixed with a subdomain consisting of lower-case alphanumeric characters, | |
// hyphens and periods, of at most 253 characters in length. | |
// Each period separated segment within the subdomain must start and end with an alphanumeric character. | |
// The optional prefix and the name are separate by a forward slash (/). |
operator/v1alpha1/types_kueue.go
Outdated
// resource of external framework | ||
// must be a valid qualified name consisting of a lower-case alphanumeric string, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// resource of external framework | |
// must be a valid qualified name consisting of a lower-case alphanumeric string, | |
// resource is the Resource type of the external framework. | |
// Resource types are lowercase and plural (e.g. pods, deployments). | |
// Must be a valid qualified name consisting of a lower-case alphanumeric string, |
operator/v1alpha1/types_kueue.go
Outdated
// +kubebuilder:validation:MinLength=1 | ||
// +required | ||
Resource string `json:"resource"` | ||
// version is the version of the api |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// version is the version of the api | |
// version is the version of the api (e.g. v1alpha1, v1beta1, v1). |
operator/v1alpha1/types_kueue.go
Outdated
// integrations are the workloads Kueue will manage | ||
// Kueue has integrations in the codebase and it also allows | ||
// for external frameworks | ||
// Kueue are an important part to specify for the API as Kueue will | ||
// only manage the workloads that are specfied in this list. | ||
// This is a required field. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels more developer focused than user focused. What do you think of something like:
integrations is a required field that configures the Kueue's workload integrations.
Kueue has both standard integrations, known as job frameworks, and external integrations known as external frameworks.
Kueue will only manage workloads that correspond to the specified integrations.
operator/v1alpha1/types_kueue.go
Outdated
LeaderWorkerSet KueueIntegrations = "LeaderWorkerSet" | ||
) | ||
|
||
// This is the GVK for an external framework. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// This is the GVK for an external framework. | |
// This is the GVR for an external framework. |
operator/v1alpha1/types_kueue.go
Outdated
} | ||
|
||
// +kubebuilder:validation:Enum=BatchJob;RayJob;RayCluster;JobSet;MPIJob;PaddleJob;PytorchJob;TFJob;XGBoostJob;AppWrappers;Pod;Deployment;StatefulSet;LeaderWorkerSet | ||
type KueueIntegrations string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an alias to a string
, which is a singular type
type KueueIntegrations string | |
type KueueIntegration string |
operator/v1alpha1/types_kueue.go
Outdated
type KueueIntegrations string | ||
|
||
const ( | ||
BatchJob KueueIntegrations = "BatchJob" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is generally recommend that the name of the constant follows the pattern {typeAlias}{constantValue}
to prevent potential conflict of other constant names in the future. It also makes it easier for engineers that need to reference this API to more easily "guess" the constants associated with an enum constrained field.
BatchJob KueueIntegrations = "BatchJob" | |
KueueIntegrationsBatchJob KueueIntegrations = "BatchJob" |
operator/v1alpha1/types_kueue.go
Outdated
// The optional prefix and the name are separate by a forward slash (/). | ||
// +kubebuilder:validation:MaxLength=253 | ||
// +kubebuilder:validation:MinLength=1 | ||
// +kubebuilder:validation:XValidation:rule="self.size() == 0 || !format.qualifiedName().validate(self).hasValue()",message="a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we probably want the dns1123Subdomain validation instead of the qualifiedName validation:
// +kubebuilder:validation:XValidation:rule="self.size() == 0 || !format.qualifiedName().validate(self).hasValue()",message="a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character." | |
// +kubebuilder:validation:XValidation:rule="self.size() == 0 || !format.dns1123Subdomain().validate(self).hasValue()",message="a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character." |
operator/v1alpha1/types_kueue.go
Outdated
// hyphens and periods, of at most 253 characters in length. | ||
// Each period separated segment within the subdomain must start and end with an alphanumeric character. | ||
// +kubebuilder:validation:MaxLength=256 | ||
// +kubebuilder:validation:XValidation:rule="self.size() == 0 || !format.qualifiedName().validate(self).hasValue()",message="a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not certain, but I don't think that resources can be full RFC1123 subdomain names. My understanding is that they would end up being limited to a DNS label.
@JoelSpeed would probably know more clearly than I.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's correct yes, resource is a DNS label which is a slightly different format validation
// +kubebuilder:validation:MaxLength=256 | ||
// +kubebuilder:validation:MinLength=1 | ||
// +required | ||
Version string `json:"version"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do know that the version string should be a valid DNS label as outlined in the upstream Kube API conventions: https://github.com/kubernetes/community/blob/35444da79dff9a448e7ecf24b277e5f71373840a/contributors/devel/sig-architecture/api-conventions.md?plain=1#L116-L118
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets add that validation then
45893f3
to
0aba7b7
Compare
operator/v1alpha1/types_kueue.go
Outdated
// Controller runtime requires this in this format | ||
// for api discoverability. | ||
type ExternalFramework struct { | ||
// group of externalFramework |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd drop this line and start on the next one
operator/v1alpha1/types_kueue.go
Outdated
// Each period separated segment within the subdomain must start and end with an alphanumeric character. | ||
// +kubebuilder:validation:MaxLength=256 | ||
// +kubebuilder:validation:MinLength=1 | ||
// +kubebuilder:validation:XValidation:rule="self.size() == 0 || !format.dns1123Subdomain().validate(self).hasValue()",message="a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per a comment @everettraven pointed out, this should be a label
// +kubebuilder:validation:XValidation:rule="self.size() == 0 || !format.dns1123Subdomain().validate(self).hasValue()",message="a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character." | |
// +kubebuilder:validation:XValidation:rule="self.size() == 0 || !format.dns1123Label().validate(self).hasValue()",message="a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character." |
operator/v1alpha1/types_kueue.go
Outdated
// version is the version of the api (e.g. v1alpha1, v1beta1, v1). | ||
// +kubebuilder:validation:MaxLength=256 | ||
// +kubebuilder:validation:MinLength=1 | ||
// +kubebuilder:validation:XValidation:rule="self.size() == 0 || !format.dns1123Subdomain().validate(self).hasValue()",message="a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// +kubebuilder:validation:XValidation:rule="self.size() == 0 || !format.dns1123Subdomain().validate(self).hasValue()",message="a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character." | |
// +kubebuilder:validation:XValidation:rule="self.size() == 0 || !format.dns1035Label().validate(self).hasValue()",message="a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character." |
Version string `json:"version"` | ||
} | ||
|
||
type Integrations struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing struct godoc
// hyphens and periods, of at most 253 characters in length. | ||
// Each period separated segment within the subdomain must start and end with an alphanumeric character. | ||
// The optional prefix and the name are separate by a forward slash (/). | ||
// +kubebuilder:validation:MaxLength=317 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is an empty key valid? I suspect we want minLength 1 on this?
@kannon92: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
We are bringing Kueue into OCP in 4.19. This PR fullfills requirements for an api review.