Openstack patch cluster error #151

Closed
nilpntr opened this issue Nov 13, 2024 · 6 comments · Fixed by #150
Comments

@nilpntr
Contributor

nilpntr commented Nov 13, 2024

When deploying a Kamaji cluster on OpenStack, I would get this error (in the capi-kamaji-controller-manager) after the TenantControlPlane had been created, initialised, and become ready:

2024-11-13T10:34:25Z	ERROR	Reconciler error	{"controller": "kamajicontrolplane", "controllerGroup": "controlplane.cluster.x-k8s.io", "controllerKind": "KamajiControlPlane", "KamajiControlPlane": {"name":"test-kamaji-cluster-control-plane","namespace":"test-kamaji-cluster"}, "namespace": "test-kamaji-cluster", "name": "test-kamaji-cluster-control-plane", "reconcileID": "42e03cd9-c0b9-445e-ac8f-0d451ddf8e63", "error": "cannot perform PATCH update for the OpenStackCluster resource: failed to patch OpenStackCluster test-kamaji-cluster/test-kamaji-cluster: OpenStackCluster.infrastructure.cluster.x-k8s.io \"test-kamaji-cluster\" is invalid: [spec.identityRef: Required value, <nil>: Invalid value: \"null\": some validation rules were not checked because the object was invalid; correct the existing errors to complete validation]", "errorVerbose": "OpenStackCluster.infrastructure.cluster.x-k8s.io \"test-kamaji-cluster\" is invalid: [spec.identityRef: Required value, <nil>: Invalid value: \"null\": some validation rules were not checked because the object was invalid; correct the existing errors to complete validation]\nfailed to patch OpenStackCluster test-kamaji-cluster/test-kamaji-cluster\nsigs.k8s.io/cluster-api/util/patch.(*Helper).Patch\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/util/patch/patch.go:157\ngithub.com/clastix/cluster-api-control-plane-provider-kamaji/controllers.(*KamajiControlPlaneReconciler).patchOpenStackCluster\n\t/workspace/controllers/kamajicontrolplane_controller_cluster_patch.go:234\ngithub.com/clastix/cluster-api-control-plane-provider-kamaji/controllers.(*KamajiControlPlaneReconciler).patchCluster\n\t/workspace/controllers/kamajicontrolplane_controller_cluster_patch.go:101\ngithub.com/clastix/cluster-api-control-plane-provider-kamaji/controllers.(*KamajiControlPlaneReconciler).Reconcile.func6\n\t/workspace/controllers/kamajicontrolplane_controller.go:185\ngithub.com/clastix/cluster-api-control-plane-provider-kamaji/controllers.TrackConditionType\n\t/workspace/controllers/conditions.go:26\ngithub.com/clastix/cluster-api-control-plane-provider-kamaji/controllers.(*KamajiControlPlaneReconciler).Reconcile\n\t/workspace/controllers/kamajicontrolplane_controller.go:184\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:116\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:303\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:224\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695\ncannot perform PATCH update for the OpenStackCluster resource\ngithub.com/clastix/cluster-api-control-plane-provider-kamaji/controllers.(*KamajiControlPlaneReconciler).patchOpenStackCluster\n\t/workspace/controllers/kamajicontrolplane_controller_cluster_patch.go:235\ngithub.com/clastix/cluster-api-control-plane-provider-kamaji/controllers.(*KamajiControlPlaneReconciler).patchCluster\n\t/workspace/controllers/kamajicontrolplane_controller_cluster_patch.go:101\ngithub.com/clastix/cluster-api-control-plane-provider-kamaji/controllers.(*KamajiControlPlaneReconciler).Reconcile.func6\n\t/workspace/controllers/kamajicontrolplane_controller.go:185\ngithub.com/clastix/cluster-api-control-plane-provider-kamaji/controllers.TrackConditionType\n\t/workspace/controllers/conditions.go:26\ngithub.com/clastix/cluster-api-control-plane-provider-kamaji/controllers.(*KamajiControlPlaneReconciler).Reconcile\n\t/workspace/controllers/kamajicontrolplane_controller.go:184\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:116\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:303\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:224\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695"}

After digging into the code, I discovered that the patchOpenStackCluster function overwrites the entire spec of an OpenStackCluster. To test my solution, I created a small test script that loads a Cluster object from JSON and runs both the old implementation and the fix:

package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"
	"os"
	"path/filepath"

	"github.com/pkg/errors"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/homedir"
	capiv1beta1 "sigs.k8s.io/cluster-api/api/v1beta1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

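// main loads a CAPI Cluster object from cluster.json, builds a client from the
// local kubeconfig, and runs both implementations; each one writes its
// would-be patch body to a local JSON file for comparison.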
func main() {
	kubeConfig := filepath.Join(homedir.HomeDir(), ".kube", "config")
	cfg, err := clientcmd.BuildConfigFromFlags("", kubeConfig)
	if err != nil {
		fmt.Printf("Error loading kubeconfig: %v\n", err)
		return
	}

	k8sClient, err := client.New(cfg, client.Options{})
	if err != nil {
		fmt.Printf("Error creating client: %v\n", err)
		return
	}

	raw, err := os.ReadFile("cluster.json")
	if err != nil {
		log.Fatal(err)
	}

	var cluster capiv1beta1.Cluster
	if err = json.Unmarshal(raw, &cluster); err != nil {
		log.Fatal(err)
	}

	ctx := context.Background()

	if err = patchOpenStackCluster(ctx, k8sClient, cluster, "123.456.78.91", 6443); err != nil {
		log.Fatal(err)
	}

	if err = patchOpenStackClusterFix(ctx, k8sClient, cluster, "123.456.78.91", 6443); err != nil {
		log.Fatal(err)
	}
}

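// patchOpenStackCluster mirrors the old implementation: SetNestedMap replaces
// the map at "spec" wholesale, so every other spec field (e.g. identityRef)
// is dropped from the resulting patch body.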
func patchOpenStackCluster(ctx context.Context, k8sClient client.Client, cluster capiv1beta1.Cluster, endpoint string, port int64) error {
	osc := unstructured.Unstructured{}

	osc.SetGroupVersionKind(cluster.Spec.InfrastructureRef.GroupVersionKind())
	osc.SetName(cluster.Spec.InfrastructureRef.Name)
	osc.SetNamespace(cluster.Spec.InfrastructureRef.Namespace)

	if err := k8sClient.Get(ctx, types.NamespacedName{Name: osc.GetName(), Namespace: osc.GetNamespace()}, &osc); err != nil {
		return errors.Wrap(err, fmt.Sprintf("cannot retrieve the %s resource", osc.GetKind()))
	}

	if err := unstructured.SetNestedMap(osc.Object, map[string]interface{}{
		"apiServerFixedIP": endpoint,
		"apiServerPort":    port,
	}, "spec"); err != nil {
		return errors.Wrap(err, fmt.Sprintf("unable to set unstructured %s spec patch", osc.GetKind()))
	}

	out, err := osc.MarshalJSON()
	if err != nil {
		return errors.Wrap(err, fmt.Sprintf("unable to marshal unstructured %s spec patch", osc.GetKind()))
	}

	if err = os.WriteFile("cluster-patch.json", out, 0600); err != nil {
		return errors.Wrap(err, "unable to write cluster-patch.json")
	}

	return nil
}

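// patchOpenStackClusterFix mirrors the proposed fix: SetNestedField sets only
// the two target leaf fields, leaving the rest of the spec intact.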
func patchOpenStackClusterFix(ctx context.Context, k8sClient client.Client, cluster capiv1beta1.Cluster, endpoint string, port int64) error {
	osc := unstructured.Unstructured{}

	osc.SetGroupVersionKind(cluster.Spec.InfrastructureRef.GroupVersionKind())
	osc.SetName(cluster.Spec.InfrastructureRef.Name)
	osc.SetNamespace(cluster.Spec.InfrastructureRef.Namespace)

	if err := k8sClient.Get(ctx, types.NamespacedName{Name: osc.GetName(), Namespace: osc.GetNamespace()}, &osc); err != nil {
		return errors.Wrap(err, fmt.Sprintf("cannot retrieve the %s resource", osc.GetKind()))
	}

	if err := unstructured.SetNestedField(osc.Object, endpoint, "spec", "apiServerFixedIP"); err != nil {
		return errors.Wrap(err, fmt.Sprintf("unable to set unstructured %s spec apiServerFixedIP", osc.GetKind()))
	}

	if err := unstructured.SetNestedField(osc.Object, port, "spec", "apiServerPort"); err != nil {
		return errors.Wrap(err, fmt.Sprintf("unable to set unstructured %s spec apiServerPort", osc.GetKind()))
	}

	out, err := osc.MarshalJSON()
	if err != nil {
		return errors.Wrap(err, fmt.Sprintf("unable to marshal unstructured %s spec patch", osc.GetKind()))
	}

	if err = os.WriteFile("cluster-patch-fix.json", out, 0600); err != nil {
		return errors.Wrap(err, "unable to write cluster-patch-fix.json")
	}

	return nil
}

Comparing the two outputs, I saw that the current implementation overwrites the spec with an object containing only those two fields, which is exactly why the API server complains about the now-missing spec.identityRef. The fix simply modifies the two fields in place, leaving the rest of the spec intact.
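
To see the difference without a live cluster, here is a minimal standalone sketch of the two unstructured helpers (the identityRef value is just a made-up placeholder): SetNestedMap replaces the whole map at the given path, while SetNestedField only sets the named leaf field.

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

func main() {
	// A stand-in for an OpenStackCluster whose spec carries more than the
	// two endpoint fields; "cloud-config" is a made-up placeholder.
	newObj := func() map[string]interface{} {
		return map[string]interface{}{
			"spec": map[string]interface{}{
				"identityRef": map[string]interface{}{"name": "cloud-config"},
			},
		}
	}

	// Old behaviour: the map at "spec" is replaced wholesale, dropping identityRef.
	old := newObj()
	_ = unstructured.SetNestedMap(old, map[string]interface{}{
		"apiServerFixedIP": "1.2.3.4",
		"apiServerPort":    int64(6443),
	}, "spec")
	fmt.Println(old) // map[spec:map[apiServerFixedIP:1.2.3.4 apiServerPort:6443]]

	// Fixed behaviour: only the named leaves are set, so identityRef survives.
	fixed := newObj()
	_ = unstructured.SetNestedField(fixed, "1.2.3.4", "spec", "apiServerFixedIP")
	_ = unstructured.SetNestedField(fixed, int64(6443), "spec", "apiServerPort")
	fmt.Println(fixed) // identityRef is still present next to the two new fields
}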

@nilpntr
Contributor Author

nilpntr commented Nov 13, 2024

Created a fix here: #150

@prometherion
Member

Sorry for this; the bug was introduced in #48.

@prometherion prometherion added the bug Something isn't working label Nov 13, 2024
@prometherion prometherion linked a pull request Nov 15, 2024 that will close this issue
@prometherion
Member

Closed via #150

@nilpntr
Contributor Author

nilpntr commented Nov 18, 2024

@prometherion do you have a date in mind for a release including this patch? (I'm currently using my own Docker image to work around this issue.)

@prometherion prometherion added this to the v0.13.0 milestone Nov 18, 2024
@prometherion
Member

We're still waiting on a potential feature request we could land in v0.13.0, so I don't have a release date yet; in the worst-case scenario it will be the end of the month.

@prometherion
Member

@nilpntr we recently received the PR merge notification for CAPZ support, which means we will release by the end of the week. Thanks for the patience, Sam! 🙏🏻
