Load Balancer Service fails to delete after apply/delete the workload repeatedly #1416

Open
laozc opened this issue Feb 17, 2025 · 1 comment
Labels: kind/bug (Categorizes issue or PR as related to a bug.)

laozc commented Feb 17, 2025

What happened?

When the workload is applied and deleted repeatedly, the LoadBalancer Service sometimes fails to be deleted.

What did you expect to happen?

CPI should handle this case better, so that the Service is deleted reliably.

How can we reproduce it (as minimally and precisely as possible)?

Workload YAML (windows-workload-svc.yaml):

---
apiVersion: v1
kind: Service
metadata:
  name: win-webserver
  labels:
    app: win-webserver
spec:
  ports:
    # the port that this service should serve on
    - port: 80
      targetPort: 80
  selector:
    app: win-webserver
  type: LoadBalancer
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: win-webserver
  name: win-webserver
spec:
  replicas: 1
  selector:
    matchLabels:
      app: win-webserver
  template:
    metadata:
      labels:
        app: win-webserver
      name: win-webserver
    spec:
      tolerations:
        - key: os
          operator: "Exists"
          effect: "NoSchedule"
      containers:
        - name: windowswebserver
          image: mcr-docker-remote.artifactory.eng.vmware.com/windows/servercore:ltsc2022
          command:
            - powershell.exe
            - -command
            - "<#code used from https://gist.github.com/19WAS85/5424431#> ; $$listener = New-Object System.Net.HttpListener ; $$listener.Prefixes.Add('http://*:80/') ; $$listener.Start() ; $$callerCounts = @{} ; Write-Host('Listening at http://*:80/') ; while ($$listener.IsListening) { ;$$context = $$listener.GetContext() ;$$requestUrl = $$context.Request.Url ;$$clientIP = $$context.Request.RemoteEndPoint.Address ;$$response = $$context.Response ;Write-Host '' ;Write-Host('> {0}' -f $$requestUrl) ;  ;$$count = 1 ;$$k=$$callerCounts.Get_Item($$clientIP) ;if ($$k -ne $$null) { $$count += $$k } ;$$callerCounts.Set_Item($$clientIP, $$count) ;$$ip=(Get-NetAdapter | Get-NetIpAddress); $$header='<html><body><H1>Windows Container Web Server</H1>' ;$$callerCountsString='' ;$$callerCounts.Keys | % { $$callerCountsString+='<p>IP {0} callerCount {1} ' -f $$ip[1].IPAddress,$$callerCounts.Item($$_) } ;$$footer='</body></html>' ;$$content='{0}{1}{2}' -f $$header,$$callerCountsString,$$footer ;Write-Output $$content ;$$buffer = [System.Text.Encoding]::UTF8.GetBytes($$content) ;$$response.ContentLength64 = $$buffer.Length ;$$response.OutputStream.Write($$buffer, 0, $$buffer.Length) ;$$response.Close() ;$$responseStatus = $$response.StatusCode ;Write-Host('< {0}' -f $$responseStatus)  } ; "
      nodeSelector:
        kubernetes.io/os: windows

Test script (attach-test.sh):

#!/bin/bash

# Check for necessary arguments
if [ "$#" -lt 2 ]; then
    echo "Usage: $0 <yaml-file> <namespace> [iterations]"
    exit 1
fi

YAML_FILE="$1"
NAMESPACE="$2"
ITERATIONS="${3:-1}"  # Default to 1 iteration if not provided

# Function to check if all deployments are ready
wait_for_deployments() {
    echo "Waiting for all deployments in namespace '$NAMESPACE' to be ready..."

    # Get list of deployments in the namespace
    deployments=$(kubectl get deployments -n "$NAMESPACE" -o jsonpath='{.items[*].metadata.name}')

    for deployment in $deployments; do
        echo "Waiting for deployment '$deployment' to be ready..."
        kubectl rollout status deployment "$deployment" -n "$NAMESPACE"
        if [ $? -ne 0 ]; then
            echo "Deployment '$deployment' failed to become ready!"
            exit 1
        fi
    done

    echo "All deployments are ready!"
}

# Run the apply/wait/delete cycle ITERATIONS times
for ((i=1; i<=ITERATIONS; i++)); do
    echo "Iteration $i / $ITERATIONS"

    # Apply the YAML to the cluster
    echo "Applying resources from $YAML_FILE..."
    kubectl apply -f "$YAML_FILE" -n "$NAMESPACE"
    if [ $? -ne 0 ]; then
        echo "Failed to apply YAML to the cluster. Exiting..."
        exit 1
    fi

    # Wait for all deployments in the namespace to be ready
    wait_for_deployments

    # Delete resources defined in the YAML file
    echo "Deleting resources defined in $YAML_FILE..."
    kubectl delete -f "$YAML_FILE" -n "$NAMESPACE" --timeout 2m
    if [ $? -ne 0 ]; then
        echo "Failed to delete resources from the cluster. Exiting..."
        exit 1
    fi

    echo "Iteration $i completed!"
done

echo "All iterations completed successfully."

./attach-test.sh ./windows-workload-svc.yaml default 10

Anything else we need to know (please consider providing level 4 or above logs of CPI)?

E0215 09:59:22.978484       1 controller.go:301] "Unhandled Error" err="error processing service default/win-webserver (retrying with exponential backoff): failed to check if load balancer exists before cleanup: VirtualMachineService not found"
I0215 09:59:22.979002       1 event.go:389] "Event occurred" object="default/win-webserver" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to check if load balancer exists before cleanup: VirtualMachineService not found"
E0215 09:59:54.481770       1 loadbalancer.go:72] failed to get load balancer for default/win-webserver: VirtualMachineService not found
E0215 09:59:54.486790       1 controller.go:301] "Unhandled Error" err="error processing service default/win-webserver (retrying with exponential backoff): failed to check if load balancer exists before cleanup: VirtualMachineService not found"

Kubernetes version

$ kubectl version
Server Version: v1.31.4+vmware.1-fips

Others

I believe the issue is caused by the following logic:

if vmService == nil {
	klog.Errorf("failed to get load balancer for %s: VirtualMachineService not found", namespacedName(service))
	return nil, false, errors.Errorf("VirtualMachineService not found")
}
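
One possible direction, as a minimal sketch under stated assumptions: when the VirtualMachineService is missing, report it as "does not exist" (exists=false with a nil error) instead of returning an error. The log message "failed to check if load balancer exists before cleanup" suggests the caller checks existence before deleting, so a nil error here would let it treat the load balancer as already gone. The types, the `vmServiceGetter` lookup, and the `cleanup` caller below are hypothetical stand-ins for illustration, not the actual CPI or controller code.

```go
package main

import (
	"errors"
	"fmt"
)

// VirtualMachineService and LoadBalancerStatus are simplified stand-ins for
// the real CPI and Kubernetes types (assumptions for this sketch).
type VirtualMachineService struct{ IngressIP string }
type LoadBalancerStatus struct{ IngressIP string }

// vmServiceGetter is a hypothetical lookup. It returns (nil, nil) when the
// VirtualMachineService is absent, matching the nil check in the snippet above.
type vmServiceGetter func(name string) (*VirtualMachineService, error)

// getLoadBalancer reports a missing VirtualMachineService as
// "does not exist" (exists=false, err=nil) rather than as an error.
func getLoadBalancer(get vmServiceGetter, name string) (*LoadBalancerStatus, bool, error) {
	vmService, err := get(name)
	if err != nil {
		// Genuine lookup failures are still surfaced so the caller can retry.
		return nil, false, fmt.Errorf("failed to get load balancer for %s: %w", name, err)
	}
	if vmService == nil {
		// Nothing to clean up: the backing object is already gone.
		return nil, false, nil
	}
	return &LoadBalancerStatus{IngressIP: vmService.IngressIP}, true, nil
}

// cleanup is a simplified model of a caller that checks existence before
// deleting. deleteFn is invoked only when the load balancer exists.
func cleanup(get vmServiceGetter, name string, deleteFn func(string) error) error {
	_, exists, err := getLoadBalancer(get, name)
	if err != nil {
		return fmt.Errorf("failed to check if load balancer exists before cleanup: %w", err)
	}
	if !exists {
		return nil
	}
	return deleteFn(name)
}

func main() {
	missing := func(string) (*VirtualMachineService, error) { return nil, nil }
	broken := func(string) (*VirtualMachineService, error) { return nil, errors.New("API unavailable") }
	noDelete := func(string) error { return errors.New("delete should not be called") }

	// A missing VirtualMachineService no longer blocks cleanup.
	fmt.Println(cleanup(missing, "default/win-webserver", noDelete))
	// A real lookup failure is still reported and can be retried.
	fmt.Println(cleanup(broken, "default/win-webserver", noDelete))
}
```

In this model, the first call completes without error and never invokes the delete function, while the second still returns a wrapped error. Whether the real code path can safely distinguish "not found" from a transient lookup failure would need to be confirmed against the CPI source.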

laozc added the kind/bug label Feb 17, 2025
zhanggbj self-assigned this Mar 13, 2025
zhanggbj (Collaborator) commented

Thanks for reporting the issue!
I've assigned it to myself and will take a look.
