Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OTA-209: Add sync logic to the CVO configuration controller #1170

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

DavidHurta
Copy link
Contributor

@DavidHurta DavidHurta commented Mar 17, 2025

Add synchronization logic to the CVO configuration controller to change the CVO's log level depending on the resource's spec and update the observedGeneration field of the resource.

These constants can be simply used when logging across the project instead
of specifying the level by a number. The descriptions and the values will
also guide developers when deciding what log level to choose.

For example:

```go
  import "github.com/openshift/cluster-version-operator/pkg/internal"

  klog.V(internal.Debug).Infof("")
```

The descriptions of the values are extracted from the used API type [1].

[1]: https://github.com/openshift/api/blob/744790f2cff777b1bb29bdccce10e4e84cff0a69/operator/v1/types.go#L94-L109
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 17, 2025
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Mar 17, 2025

@DavidHurta: This pull request references OTA-209 which is a valid jira issue.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 17, 2025
Copy link
Contributor

openshift-ci bot commented Mar 17, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: DavidHurta

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 17, 2025
@DavidHurta
Copy link
Contributor Author

/retest-required

@DavidHurta
Copy link
Contributor Author

DavidHurta commented Mar 18, 2025

Manually testing on a launch 4.19,https://github.com/openshift/cluster-version-operator/pull/1170 aws cluster.

Default feature set:

Checking for occurrences of relevant strings. All as expected.

$ grep -iE "(clusterversionoperator|configuration.go)" cvo-pr-1163-cvo-logs-default.log
I0318 12:22:19.711458       1 start.go:20] ClusterVersionOperator v1.0.0-1372-g4dfca0ba-dirty
I0318 12:28:31.402831       1 payload.go:208] excluding Filename: "0000_00_cluster-version-operator_01_clusterversionoperators-CustomNoUpgrade.crd.yaml" Group: "apiextensions.k8s.io" Kind: "CustomResourceDefinition" Name: "clusterversionoperators.operator.openshift.io": "Default" is required, and release.openshift.io/feature-set=CustomNoUpgrade
I0318 12:28:31.403617       1 payload.go:208] excluding Filename: "0000_00_cluster-version-operator_01_clusterversionoperators-DevPreviewNoUpgrade.crd.yaml" Group: "apiextensions.k8s.io" Kind: "CustomResourceDefinition" Name: "clusterversionoperators.operator.openshift.io": "Default" is required, and release.openshift.io/feature-set=DevPreviewNoUpgrade
I0318 12:28:31.454498       1 payload.go:208] excluding Filename: "0000_00_cluster-version-operator_02_configuration-DevPreviewNoUpgrade.yaml" Group: "operator.openshift.io" Kind: "ClusterVersionOperator" Name: "cluster": "Default" is required, and release.openshift.io/feature-set=DevPreviewNoUpgrade
I0318 12:28:34.011213       1 cvo.go:430] Starting ClusterVersionOperator with minimum reconcile period 3m40.904623427s
I0318 12:28:34.011313       1 cvo.go:480] The ClusterVersionOperatorConfiguration feature gate is disabled or HyperShift is detected; the configuration sync routine will not run.
I0318 12:28:34.024929       1 payload.go:208] excluding Filename: "0000_00_cluster-version-operator_01_clusterversionoperators-CustomNoUpgrade.crd.yaml" Group: "apiextensions.k8s.io" Kind: "CustomResourceDefinition" Name: "clusterversionoperators.operator.openshift.io": "Default" is required, and release.openshift.io/feature-set=CustomNoUpgrade
I0318 12:28:34.025586       1 payload.go:208] excluding Filename: "0000_00_cluster-version-operator_01_clusterversionoperators-DevPreviewNoUpgrade.crd.yaml" Group: "apiextensions.k8s.io" Kind: "CustomResourceDefinition" Name: "clusterversionoperators.operator.openshift.io": "Default" is required, and release.openshift.io/feature-set=DevPreviewNoUpgrade
I0318 12:28:34.048950       1 payload.go:208] excluding Filename: "0000_00_cluster-version-operator_02_configuration-DevPreviewNoUpgrade.yaml" Group: "operator.openshift.io" Kind: "ClusterVersionOperator" Name: "cluster": "Default" is required, and release.openshift.io/feature-set=DevPreviewNoUpgrade

Checking for warnings. The majority is caused by the release.openshift.io/delete: "true" annotation on the objects.

$ grep -iE "^W.*$" cvo-pr-1163-cvo-logs-default.log | grep -oiE "].*" | sort -u
] CatalogSource "openshift-operator-lifecycle-manager/olm-operators" not found. It either has already been removed or it has never been installed on this cluster.
] clusterrolebinding "default-account-openshift-machine-config-operator" not found. It either has already been removed or it has never been installed on this cluster.
] configmap "openshift-config-managed/grafana-dashboard-api-performance" not found. It either has already been removed or it has never been installed on this cluster.
] configmap "openshift-config-managed/grafana-dashboard-etcd" not found. It either has already been removed or it has never been installed on this cluster.
] configmap "openshift-machine-api/cluster-autoscaler-operator-ca" not found. It either has already been removed or it has never been installed on this cluster.
] configmap "openshift-operator-lifecycle-manager/olm-operators" not found. It either has already been removed or it has never been installed on this cluster.
] ConsoleQuickStart "ocs-install-tour" not found. It either has already been removed or it has never been installed on this cluster.
] deployment "openshift-cloud-controller-manager-operator/cluster-cloud-controller-manager" not found. It either has already been removed or it has never been installed on this cluster.
] imagestream "openshift/hello-openshift" not found. It either has already been removed or it has never been installed on this cluster.
] PrometheusRule "openshift-authentication-operator/authentication-operator" not found. It either has already been removed or it has never been installed on this cluster.
] PrometheusRule "openshift-kube-apiserver/kube-apiserver-recording-rules" not found. It either has already been removed or it has never been installed on this cluster.
] PrometheusRule "openshift-machine-api/cluster-autoscaler-operator-rules" not found. It either has already been removed or it has never been installed on this cluster.
] service "openshift-cloud-credential-operator/controller-manager-service" not found. It either has already been removed or it has never been installed on this cluster.
] spec.template.spec.nodeSelector[node-role.kubernetes.io/master]: use "node-role.kubernetes.io/control-plane" instead
] Subscription "openshift-operator-lifecycle-manager/packageserver" not found. It either has already been removed or it has never been installed on this cluster.
] unknown field "spec.signatureStores"

Checking for errors. No relevant errors to the PR.

$ grep -iE "^E.*$" cvo-pr-1163-cvo-logs-default.log | grep -oiE "].*" | sort -u
] Failed to update lock optimistically: Put "https://api-int.ci-ln-8cyy7db-76ef8.aws-2.ci.openshift.org:6443/apis/coordination.k8s.io/v1/namespaces/openshift-cluster-version/leases/version": read tcp 10.0.112.186:42920->10.0.57.213:6443: read: connection reset by peer, falling back to slow path
] Failed to update lock optimistically: Put "https://api-int.ci-ln-8cyy7db-76ef8.aws-2.ci.openshift.org:6443/apis/coordination.k8s.io/v1/namespaces/openshift-cluster-version/leases/version": read tcp 10.0.112.186:57530->10.0.94.221:6443: read: connection reset by peer, falling back to slow path
] "Server rejected event (will not retry!)" err="Timeout: request did not complete within requested timeout - context deadline exceeded" event="&Event{ObjectMeta:{version.182de573bf9ec49c  openshift-cluster-version    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] [] []},InvolvedObject:ObjectReference{Kind:ClusterVersion,Namespace:openshift-cluster-version,Name:version,UID:,APIVersion:config.openshift.io/v1,ResourceVersion:,FieldPath:,},Reason:PayloadLoaded,Message:Payload loaded version=\"4.19.0-0.test-2025-03-18-120022-ci-ln-8cyy7db-latest\" image=\"registry.build06.ci.openshift.org/ci-ln-8cyy7db/release@sha256:cf4e48e36645121062b78cd6bc67b155506c0458eeb4496b2eb07c5163326f07\" architecture=\"amd64\",Source:EventSource{Component:openshift-cluster-version,Host:,},FirstTimestamp:2025-03-18 12:28:36.161103004 +0000 UTC m=+376.521511343,LastTimestamp:2025-03-18 12:28:36.161103004 +0000 UTC m=+376.521511343,Count:1,Type:Normal,EventTime:0001-01-01 00:00:00 +0000 UTC,Series:nil,Action:,Related:nil,ReportingController:openshift-cluster-version,ReportingInstance:,}"
] "Unhandled Error" err=<
] "Unhandled Error" err="error running apply for clusteroperator \"authentication\" (290 of 903): Cluster operator authentication is not available" logger="UnhandledError"
] "Unhandled Error" err="error running apply for clusteroperator \"console\" (593 of 903): Cluster operator console is not available" logger="UnhandledError"
] "Unhandled Error" err="error running apply for clusteroperator \"image-registry\" (358 of 903): Cluster operator image-registry is not available" logger="UnhandledError"
] "Unhandled Error" err="error running apply for clusteroperator \"ingress\" (378 of 903): Cluster operator ingress is not available" logger="UnhandledError"
] "Unhandled Error" err="error running apply for clusteroperator \"monitoring\" (424 of 903): Cluster operator monitoring is not available" logger="UnhandledError"
] "Unhandled Error" err="error running apply for clusteroperator \"olm\" (723 of 903): Cluster operator olm is not available" logger="UnhandledError"
] "Unhandled Error" err="error running apply for clusteroperator \"openshift-apiserver\" (463 of 903): Cluster operator openshift-apiserver is not available" logger="UnhandledError"
] "Unhandled Error" err="error running apply for clusteroperator \"openshift-controller-manager\" (476 of 903): Cluster operator openshift-controller-manager is not available" logger="UnhandledError"
] "Unhandled Error" err="error running apply for clusteroperator \"openshift-samples\" (496 of 903): Cluster operator openshift-samples is not available" logger="UnhandledError"
] "Unhandled Error" err="error running apply for clusteroperator \"operator-lifecycle-manager-packageserver\" (689 of 903): Cluster operator operator-lifecycle-manager-packageserver is not available" logger="UnhandledError"
] "Unhandled Error" err="error running apply for role \"openshift-console-operator/prometheus-k8s\" (820 of 903): namespaces \"openshift-console-operator\" not found" logger="UnhandledError"
] "Unhandled Error" err="error running apply for role \"openshift-console/prometheus-k8s\" (824 of 903): namespaces \"openshift-console\" not found" logger="UnhandledError"
] "Unhandled Error" err="unable to synchronize image (waiting 1m50.452311712s): Cluster operators authentication, console, image-registry, monitoring, olm, openshift-apiserver, openshift-controller-manager, openshift-samples are not available" logger="UnhandledError"
] "Unhandled Error" err="unable to synchronize image (waiting 3m40.904623427s): Cluster operator authentication is not available" logger="UnhandledError"
] "Unhandled Error" err="unable to synchronize image (waiting 3m40.904623427s): Cluster operators authentication, console are not available" logger="UnhandledError"
] "Unhandled Error" err="unable to synchronize image (waiting 3m40.904623427s): Cluster operators authentication, console, image-registry, monitoring, olm are not available" logger="UnhandledError"
] "Unhandled Error" err="unable to synchronize image (waiting 3m40.904623427s): Cluster operators authentication, console, image-registry, monitoring, olm, openshift-apiserver, openshift-controller-manager, openshift-samples are not available" logger="UnhandledError"
] "Unhandled Error" err="unable to synchronize image (waiting 3m40.904623427s): Cluster operators authentication, console, monitoring are not available" logger="UnhandledError"
] "Unhandled Error" err="unable to synchronize image (waiting 3m40.904623427s): Cluster operators authentication, console, monitoring, olm are not available" logger="UnhandledError"
] "Unhandled Error" err="unable to synchronize image (waiting 55.226155856s): Cluster operators authentication, console, image-registry, monitoring, olm, openshift-apiserver, openshift-controller-manager, openshift-samples, operator-lifecycle-manager-packageserver are not available" logger="UnhandledError"

No relevant resources to be found as expected:

$ oc get clusterversionoperator
error: the server doesn't have a resource type "clusterversionoperator"

DevPreviewNoUpgrade feature set:

Set the feature set to DevPreviewNoUpgrade:

$ oc edit featuregates.config.openshift.io cluster
featuregate.config.openshift.io/cluster edited
$ oc get clusterversionoperator
NAME      AGE
cluster   114s

Checking for occurrences of relevant strings (the desired log level was changed manually multiple times as can be seen in the logs). All as expected.

$ grep -iE "(clusterversionoperator|configuration.go)" cvo-pr-1163-cvo-logs-devpreview-resource-multiple-times-changed.log 
I0318 14:15:46.394685       1 start.go:20] ClusterVersionOperator v1.0.0-1372-g4dfca0ba-dirty
I0318 14:15:47.090853       1 payload.go:208] excluding Filename: "0000_00_cluster-version-operator_01_clusterversionoperators-CustomNoUpgrade.crd.yaml" Group: "apiextensions.k8s.io" Kind: "CustomResourceDefinition" Name: "clusterversionoperators.operator.openshift.io": "DevPreviewNoUpgrade" is required, and release.openshift.io/feature-set=CustomNoUpgrade
I0318 14:16:01.251470       1 cvo.go:430] Starting ClusterVersionOperator with minimum reconcile period 3m8.684452034s
I0318 14:16:01.265099       1 payload.go:208] excluding Filename: "0000_00_cluster-version-operator_01_clusterversionoperators-CustomNoUpgrade.crd.yaml" Group: "apiextensions.k8s.io" Kind: "CustomResourceDefinition" Name: "clusterversionoperators.operator.openshift.io": "DevPreviewNoUpgrade" is required, and release.openshift.io/feature-set=CustomNoUpgrade
I0318 14:16:02.273062       1 reflector.go:376] Caches populated for *v1alpha1.ClusterVersionOperator from github.com/openshift/client-go/operator/informers/externalversions/factory.go:125
I0318 14:16:02.351886       1 configuration.go:113] Started syncing CVO configuration "ClusterVersionOperator/cluster"
I0318 14:16:02.351992       1 configuration.go:115] Finished syncing CVO configuration (115.062µs)
I0318 14:18:02.273459       1 configuration.go:113] Started syncing CVO configuration "ClusterVersionOperator/cluster"
I0318 14:18:02.273486       1 configuration.go:115] Finished syncing CVO configuration (35.471µs)
I0318 14:20:02.274428       1 configuration.go:113] Started syncing CVO configuration "ClusterVersionOperator/cluster"
I0318 14:20:02.274450       1 configuration.go:115] Finished syncing CVO configuration (27.7µs)
I0318 14:22:02.275528       1 configuration.go:113] Started syncing CVO configuration "ClusterVersionOperator/cluster"
I0318 14:22:02.275554       1 configuration.go:115] Finished syncing CVO configuration (37.661µs)
I0318 14:24:02.275830       1 configuration.go:113] Started syncing CVO configuration "ClusterVersionOperator/cluster"
I0318 14:24:02.275854       1 configuration.go:115] Finished syncing CVO configuration (31.41µs)
I0318 14:26:02.276525       1 configuration.go:113] Started syncing CVO configuration "ClusterVersionOperator/cluster"
I0318 14:26:02.276548       1 configuration.go:115] Finished syncing CVO configuration (31.3µs)
I0318 14:28:02.276839       1 configuration.go:113] Started syncing CVO configuration "ClusterVersionOperator/cluster"
I0318 14:28:02.276871       1 configuration.go:115] Finished syncing CVO configuration (45.411µs)
I0318 14:30:02.276952       1 configuration.go:113] Started syncing CVO configuration "ClusterVersionOperator/cluster"
I0318 14:30:02.276977       1 configuration.go:115] Finished syncing CVO configuration (33.021µs)
I0318 14:32:02.277311       1 configuration.go:113] Started syncing CVO configuration "ClusterVersionOperator/cluster"
I0318 14:32:02.277328       1 configuration.go:115] Finished syncing CVO configuration (24.341µs)
I0318 14:34:02.278035       1 configuration.go:113] Started syncing CVO configuration "ClusterVersionOperator/cluster"
I0318 14:34:02.278065       1 configuration.go:115] Finished syncing CVO configuration (32.841µs)
I0318 14:36:02.279049       1 configuration.go:113] Started syncing CVO configuration "ClusterVersionOperator/cluster"
I0318 14:36:02.279072       1 configuration.go:115] Finished syncing CVO configuration (48.531µs)
I0318 14:38:02.279506       1 configuration.go:113] Started syncing CVO configuration "ClusterVersionOperator/cluster"
I0318 14:38:02.279529       1 configuration.go:115] Finished syncing CVO configuration (34.051µs)
I0318 14:40:02.279661       1 configuration.go:113] Started syncing CVO configuration "ClusterVersionOperator/cluster"
I0318 14:40:02.279685       1 configuration.go:115] Finished syncing CVO configuration (30.781µs)
I0318 14:42:02.280490       1 configuration.go:113] Started syncing CVO configuration "ClusterVersionOperator/cluster"
I0318 14:42:02.280514       1 configuration.go:115] Finished syncing CVO configuration (37.281µs)
I0318 14:44:02.280560       1 configuration.go:113] Started syncing CVO configuration "ClusterVersionOperator/cluster"
I0318 14:44:02.280600       1 configuration.go:115] Finished syncing CVO configuration (43.811µs)
I0318 14:46:02.280964       1 configuration.go:113] Started syncing CVO configuration "ClusterVersionOperator/cluster"
I0318 14:46:02.280987       1 configuration.go:115] Finished syncing CVO configuration (31.5µs)
I0318 14:48:02.281900       1 configuration.go:113] Started syncing CVO configuration "ClusterVersionOperator/cluster"
I0318 14:48:02.281932       1 configuration.go:115] Finished syncing CVO configuration (53.961µs)
I0318 14:50:02.282523       1 configuration.go:113] Started syncing CVO configuration "ClusterVersionOperator/cluster"
I0318 14:50:02.282552       1 configuration.go:115] Finished syncing CVO configuration (41.821µs)
I0318 14:52:02.283056       1 configuration.go:113] Started syncing CVO configuration "ClusterVersionOperator/cluster"
I0318 14:52:02.283079       1 configuration.go:115] Finished syncing CVO configuration (30.4µs)
I0318 14:53:17.257209       1 configuration.go:113] Started syncing CVO configuration "ClusterVersionOperator/cluster"
I0318 14:53:17.263992       1 configuration.go:151] Successfully updated the log level from 'Normal' to 'Debug'
I0318 14:53:17.264011       1 configuration.go:154] The CVO logging level is set to the 'Normal' log level or above
I0318 14:53:17.264017       1 configuration.go:155] The CVO logging level is set to the 'Debug' log level or above
I0318 14:53:17.264022       1 configuration.go:115] Finished syncing CVO configuration (6.819902ms)
I0318 14:53:17.264032       1 configuration.go:113] Started syncing CVO configuration "ClusterVersionOperator/cluster"
I0318 14:53:17.264039       1 configuration.go:146] No need to update the current CVO log level 'Debug'; it is already set to the desired value
I0318 14:53:17.264045       1 configuration.go:115] Finished syncing CVO configuration (12.31µs)
I0318 14:54:02.283848       1 configuration.go:58] ClusterVersionOperator resource was modified or resync period has passed; queuing a configuration sync
I0318 14:54:02.283861       1 configuration.go:113] Started syncing CVO configuration "ClusterVersionOperator/cluster"
I0318 14:54:02.283872       1 configuration.go:146] No need to update the current CVO log level 'Debug'; it is already set to the desired value
I0318 14:54:02.283882       1 configuration.go:115] Finished syncing CVO configuration (20.361µs)
I0318 14:54:48.747962       1 configuration.go:58] ClusterVersionOperator resource was modified or resync period has passed; queuing a configuration sync
I0318 14:54:48.747984       1 configuration.go:113] Started syncing CVO configuration "ClusterVersionOperator/cluster"
I0318 14:54:48.754130       1 configuration.go:58] ClusterVersionOperator resource was modified or resync period has passed; queuing a configuration sync
I0318 14:54:48.754284       1 configuration.go:151] Successfully updated the log level from 'Debug' to 'Trace'
I0318 14:54:48.754297       1 configuration.go:154] The CVO logging level is set to the 'Normal' log level or above
I0318 14:54:48.754303       1 configuration.go:155] The CVO logging level is set to the 'Debug' log level or above
I0318 14:54:48.754309       1 configuration.go:156] The CVO logging level is set to the 'Trace' log level or above
I0318 14:54:48.754314       1 configuration.go:115] Finished syncing CVO configuration (6.330193ms)
I0318 14:54:48.754327       1 configuration.go:113] Started syncing CVO configuration "ClusterVersionOperator/cluster"
I0318 14:54:48.754361       1 configuration.go:146] No need to update the current CVO log level 'Trace'; it is already set to the desired value
I0318 14:54:48.754367       1 configuration.go:115] Finished syncing CVO configuration (40.841µs)
I0318 14:55:24.774617       1 sync_worker.go:1041] Running sync for customresourcedefinition "clusterversionoperators.operator.openshift.io" (5 of 928)
I0318 14:55:24.869570       1 request.go:661] Waited for 94.341486ms due to client-side throttling, not priority and fairness, request: GET:https://api-int.ci-ln-8cyy7db-76ef8.aws-2.ci.openshift.org:6443/apis/apiextensions.k8s.io/v1/customresourcedefinitions/clusterversionoperators.operator.openshift.io
I0318 14:55:24.872697       1 round_trippers.go:560] GET https://api-int.ci-ln-8cyy7db-76ef8.aws-2.ci.openshift.org:6443/apis/apiextensions.k8s.io/v1/customresourcedefinitions/clusterversionoperators.operator.openshift.io 200 OK in 2 milliseconds
I0318 14:55:24.872987       1 sync_worker.go:1056] Done syncing for customresourcedefinition "clusterversionoperators.operator.openshift.io" (5 of 928)
I0318 14:55:24.973982       1 sync_worker.go:1041] Running sync for clusterversionoperator "cluster" (7 of 928)
I0318 14:55:25.069311       1 request.go:661] Waited for 95.167552ms due to client-side throttling, not priority and fairness, request: GET:https://api-int.ci-ln-8cyy7db-76ef8.aws-2.ci.openshift.org:6443/apis/operator.openshift.io/v1alpha1/clusterversionoperators/cluster
I0318 14:55:25.073916       1 sync_worker.go:1056] Done syncing for clusterversionoperator "cluster" (7 of 928)
I0318 14:55:50.182537       1 configuration.go:58] ClusterVersionOperator resource was modified or resync period has passed; queuing a configuration sync
I0318 14:55:50.182558       1 configuration.go:113] Started syncing CVO configuration "ClusterVersionOperator/cluster"
I0318 14:55:50.189460       1 configuration.go:58] ClusterVersionOperator resource was modified or resync period has passed; queuing a configuration sync
I0318 14:55:50.189478       1 configuration.go:154] The CVO logging level is set to the 'Normal' log level or above
I0318 14:55:50.189486       1 configuration.go:115] Finished syncing CVO configuration (6.928964ms)
I0318 14:55:50.189499       1 configuration.go:113] Started syncing CVO configuration "ClusterVersionOperator/cluster"
I0318 14:55:50.189506       1 configuration.go:115] Finished syncing CVO configuration (7.89µs)
I0318 14:56:02.284066       1 configuration.go:113] Started syncing CVO configuration "ClusterVersionOperator/cluster"
I0318 14:56:02.284088       1 configuration.go:115] Finished syncing CVO configuration (42.761µs)

Checking for warnings. The majority is caused by the release.openshift.io/delete: "true" annotation on the objects.

$ grep -iE "^W.*$" cvo-pr-1163-cvo-logs-devpreview-resource-multiple-times-changed.log | grep -oiE "].*" | sort -u
] CatalogSource "openshift-operator-lifecycle-manager/olm-operators" not found. It either has already been removed or it has never been installed on this cluster.
] clusterrolebinding "default-account-openshift-machine-config-operator" not found. It either has already been removed or it has never been installed on this cluster.
] configmap "openshift-config-managed/grafana-dashboard-api-performance" not found. It either has already been removed or it has never been installed on this cluster.
] configmap "openshift-config-managed/grafana-dashboard-etcd" not found. It either has already been removed or it has never been installed on this cluster.
] configmap "openshift-machine-api/cluster-autoscaler-operator-ca" not found. It either has already been removed or it has never been installed on this cluster.
] configmap "openshift-operator-lifecycle-manager/olm-operators" not found. It either has already been removed or it has never been installed on this cluster.
] ConsoleQuickStart "ocs-install-tour" not found. It either has already been removed or it has never been installed on this cluster.
] deployment "openshift-cloud-controller-manager-operator/cluster-cloud-controller-manager" not found. It either has already been removed or it has never been installed on this cluster.
] imagestream "openshift/hello-openshift" not found. It either has already been removed or it has never been installed on this cluster.
] PrometheusRule "openshift-authentication-operator/authentication-operator" not found. It either has already been removed or it has never been installed on this cluster.
] PrometheusRule "openshift-kube-apiserver/kube-apiserver-recording-rules" not found. It either has already been removed or it has never been installed on this cluster.
] PrometheusRule "openshift-machine-api/cluster-autoscaler-operator-rules" not found. It either has already been removed or it has never been installed on this cluster.
] service "openshift-cloud-credential-operator/controller-manager-service" not found. It either has already been removed or it has never been installed on this cluster.
] Subscription "openshift-operator-lifecycle-manager/packageserver" not found. It either has already been removed or it has never been installed on this cluster.

Checking for errors. All as expected.

$ grep -iE "^E.*$" cvo-pr-1163-cvo-logs-devpreview-resource-multiple-times-changed.log | grep -oiE "].*" | sort -u

Summary:

All is as expected. The warnings are caused by the release.openshift.io/delete: "true" annotation on the objects. Thus, they are expected. They introduce a little bit a noise in the logs, but I am fine with them for the moment. Might be debatable whether they should be warnings or not.

@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Mar 18, 2025

@DavidHurta: This pull request references OTA-209 which is a valid jira issue.

In response to this:

Add synchronization logic to the CVO configuration controller to change the CVO's log level depending on the resource's spec and update the observedGeneration field of the resource.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@DavidHurta DavidHurta changed the title WIP:OTA-209: Add sync logic to the CVO configuration controller OTA-209: Add sync logic to the CVO configuration controller Mar 18, 2025
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 18, 2025
Copy link
Contributor

openshift-ci bot commented Mar 18, 2025

@DavidHurta: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/okd-scos-e2e-aws-ovn 6dcab15 link false /test okd-scos-e2e-aws-ovn
ci/prow/e2e-agnostic-ovn 6dcab15 link true /test e2e-agnostic-ovn

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@DavidHurta
Copy link
Contributor Author

@petr-muller
Copy link
Member

/cc

@openshift-ci openshift-ci bot requested a review from petr-muller March 19, 2025 13:03
@hongkailiu
Copy link
Member

/cc

@openshift-ci openshift-ci bot requested a review from hongkailiu March 19, 2025 14:22
@@ -83,6 +94,13 @@ func (config *ClusterVersionOperatorConfiguration) Start(ctx context.Context) er
}
}

if currentLogLevel, notFound := loglevel.GetLogLevel(); notFound {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we do it when the config instance was initialized?

func NewClusterVersionOperatorConfiguration(client operatorclientset.Interface, factory operatorexternalversions.SharedInformerFactory) *ClusterVersionOperatorConfiguration {
return &ClusterVersionOperatorConfiguration{
queueKey: fmt.Sprintf("ClusterVersionOperator/%s", ClusterVersionOperatorConfigurationName),
queue: workqueue.NewTypedRateLimitingQueueWithConfig[any](
workqueue.DefaultTypedControllerRateLimiter[any](),
workqueue.TypedRateLimitingQueueConfig[any]{Name: "configuration"}),
client: client.OperatorV1alpha1().ClusterVersionOperators(),
factory: factory,
}
}

desiredConfig.Status.ObservedGeneration = desiredConfig.Generation
_, err := config.client.UpdateStatus(ctx, desiredConfig, metav1.UpdateOptions{})
if err != nil {
return fmt.Errorf("Failed to update the ClusterVersionOperator resource, err=%w", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return fmt.Errorf("Failed to update the ClusterVersionOperator resource, err=%w", err)
return fmt.Errorf("failed to update the ClusterVersionOperator resource: %w", err)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, idiomatic

@@ -43,12 +51,15 @@ func (config *ClusterVersionOperatorConfiguration) clusterVersionOperatorEventHa
return cache.ResourceEventHandlerFuncs{
AddFunc: func(_ interface{}) {
config.queue.Add(config.queueKey)
klog.V(i.Debug).Infof("ClusterVersionOperator resource was added; queuing a configuration sync")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this sounds more natural to me.

Suggested change
klog.V(i.Debug).Infof("ClusterVersionOperator resource was added; queuing a configuration sync")
klog.V(i.Debug).Infof("ClusterVersionOperator resource was added; queuing a sync")

klog.V(i.Debug).Infof("No need to update the current CVO log level '%s'; it is already set to the desired value", currentLogLevel)
} else {
if err := loglevel.SetLogLevel(config.desiredLogLevel); err != nil {
return fmt.Errorf("Failed to set the log level to '%s', err=%w", config.desiredLogLevel, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return fmt.Errorf("Failed to set the log level to '%s', err=%w", config.desiredLogLevel, err)
return fmt.Errorf("failed to set the log level to %q: %w", config.desiredLogLevel, err)

}
klog.V(i.Debug).Infof("Successfully updated the log level from '%s' to '%s'", currentLogLevel, config.desiredLogLevel)

// E2E testing will be checking for existence or absence of these logs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it temporary?
I would not want our users to see those in logging otherwise.
I am thinking to output those line with a guarding from an environment variable that is set where the test runs such as

if os.Getenv("CI")=="true" {
    ...
}

WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not want our users to see those in logging otherwise.

I actually think we should have the log - if we expose log level configuration in user-facing resource (and expect user to interact with it), then IMO it makes sense for CVO to acknowledge it in its log? Its not even a debug log, it is straight "I saw user to do something and now I behave differently" - that sounds useful to me?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that Petr may have the impression that the comment is regarding the line 151. And not the following lines, which are a bit clunky:

		klog.V(i.Normal).Infof("The CVO logging level is set to the 'Normal' log level or above")
		klog.V(i.Debug).Infof("The CVO logging level is set to the 'Debug' log level or above")
		klog.V(i.Trace).Infof("The CVO logging level is set to the 'Trace' log level or above")
		klog.V(i.TraceAll).Infof("The CVO logging level is set to the 'TraceAll' log level or above")

I think that the usage of the CI environment variable here is a great idea. I forgot about it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah okay, in that case yes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment is about the logs for e2e test as David clarified. :)
I am trying to avoid showing logs that are only for our own testing (if possible at all).

Comment on lines +259 to +268
switch {
case apierrors.IsNotFound(err) && tt.expectedConfig == nil:
case apierrors.IsNotFound(err) && tt.expectedConfig != nil:
t.Errorf("expected config to be '%v', got NotFound", *tt.expectedConfig)
case config != nil && tt.expectedConfig == nil:
t.Errorf("expected config to be NotFound, got '%v'", *config)
case config != nil && tt.expectedConfig != nil:
if diff := cmp.Diff(*tt.expectedConfig, *config, cmpopts.IgnoreFields(operatorv1alpha1.ClusterVersionOperator{}, "ObjectMeta")); diff != "" {
t.Errorf("unexpected config (-want, +got) = %v", diff)
}
Copy link
Member

@hongkailiu hongkailiu Mar 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just choose any part that you like:

Suggested change
switch {
case apierrors.IsNotFound(err) && tt.expectedConfig == nil:
case apierrors.IsNotFound(err) && tt.expectedConfig != nil:
t.Errorf("expected config to be '%v', got NotFound", *tt.expectedConfig)
case config != nil && tt.expectedConfig == nil:
t.Errorf("expected config to be NotFound, got '%v'", *config)
case config != nil && tt.expectedConfig != nil:
if diff := cmp.Diff(*tt.expectedConfig, *config, cmpopts.IgnoreFields(operatorv1alpha1.ClusterVersionOperator{}, "ObjectMeta")); diff != "" {
t.Errorf("unexpected config (-want, +got) = %v", diff)
}
switch {
case apierrors.IsNotFound(err) && tt.expectedConfig != nil:
t.Errorf("expected config to be '%v', got NotFound", *tt.expectedConfig)
case err == nil:
if diff := cmp.Diff(tt.expectedConfig, config, cmpopts.IgnoreFields(metav1.ObjectMeta{}, "ManagedFields")); diff != "" {
t.Errorf("unexpected config (-want, +got) = %v", diff)
}
}

if err != nil {
t.Errorf("unexpected error %v", err)
}
if diff := cmp.Diff(tt.expectedConfig, *config, cmpopts.IgnoreFields(operatorv1alpha1.ClusterVersionOperator{}, "ObjectMeta")); diff != "" {
Copy link
Member

@hongkailiu hongkailiu Mar 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ignore the necessary only?

Suggested change
if diff := cmp.Diff(tt.expectedConfig, *config, cmpopts.IgnoreFields(operatorv1alpha1.ClusterVersionOperator{}, "ObjectMeta")); diff != "" {
if diff := cmp.Diff(tt.expectedConfig, *config, cmpopts.IgnoreFields(metav1.ObjectMeta{}, "ManagedFields")); diff != "" {

}

// Shutdown created resources
cancelFunc()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: why not defer immediately after creation? it's idiomatic and will survive eg someone adding an early return or something

desiredConfig.Status.ObservedGeneration = desiredConfig.Generation
_, err := config.client.UpdateStatus(ctx, desiredConfig, metav1.UpdateOptions{})
if err != nil {
return fmt.Errorf("Failed to update the ClusterVersionOperator resource, err=%w", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, idiomatic

klog.Infof("ClusterVersionOperator configuration has been synced")
func (config *ClusterVersionOperatorConfiguration) sync(ctx context.Context, desiredConfig *operatorv1alpha1.ClusterVersionOperator) error {
if desiredConfig.Status.ObservedGeneration != desiredConfig.Generation {
desiredConfig.Status.ObservedGeneration = desiredConfig.Generation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to be careful about modifying objects in the informer cache, which you must never modify:

// Get retrieves the ClusterVersionOperator from the index for a given name.
// Objects returned here must be treated as read-only.
Get(name string) (*operatorv1alpha1.ClusterVersionOperator, error)

Basically whatever you get from the informer cache (= anything from the lister), you must not modify. Typically you make a deepcopy, and modify that. It is good practice to document whether pointers point to modifiable data or not in the function signatures (so sync would tell "you must not modify desiredConfig, it poitns to informer cache).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants