Fail to install velero helm chart 3.2.0. Getting error - job failed: BackoffLimitExceeded #550

Closed
basaveswar-kureti opened this issue Feb 20, 2024 · 7 comments
Labels: bug, upgrade, velero

Comments

@basaveswar-kureti

What steps did you take and what happened:
This has been working until Feb 19th, 2024.

kubectl create ns velero-320
helm upgrade --debug --install velero vmware-tanzu/velero --version 3.2.0 --namespace velero-320

What did you expect to happen:
We pinned to chart version 3.2.0 to avoid breakages, and it had been working until yesterday.

The output of the following commands will help us better understand what's going on:

helm upgrade --debug --install velero vmware-tanzu/velero --version 3.2.0 --namespace velero-320
history.go:56: [debug] getting history for release velero
Release "velero" does not exist. Installing it now.
install.go:200: [debug] Original chart version: "3.2.0"
install.go:217: [debug] CHART PATH: /Users/kureti/Library/Caches/helm/repository/velero-3.2.0.tgz

client.go:134: [debug] creating 1 resource(s)
install.go:160: [debug] CRD backuprepositories.velero.io is already present. Skipping.
client.go:134: [debug] creating 1 resource(s)
install.go:160: [debug] CRD backups.velero.io is already present. Skipping.
client.go:134: [debug] creating 1 resource(s)
install.go:160: [debug] CRD backupstoragelocations.velero.io is already present. Skipping.
client.go:134: [debug] creating 1 resource(s)
install.go:160: [debug] CRD deletebackuprequests.velero.io is already present. Skipping.
client.go:134: [debug] creating 1 resource(s)
install.go:160: [debug] CRD downloadrequests.velero.io is already present. Skipping.
client.go:134: [debug] creating 1 resource(s)
install.go:160: [debug] CRD podvolumebackups.velero.io is already present. Skipping.
client.go:134: [debug] creating 1 resource(s)
install.go:160: [debug] CRD podvolumerestores.velero.io is already present. Skipping.
client.go:134: [debug] creating 1 resource(s)
install.go:160: [debug] CRD restores.velero.io is already present. Skipping.
client.go:134: [debug] creating 1 resource(s)
install.go:160: [debug] CRD schedules.velero.io is already present. Skipping.
client.go:134: [debug] creating 1 resource(s)
install.go:160: [debug] CRD serverstatusrequests.velero.io is already present. Skipping.
client.go:134: [debug] creating 1 resource(s)
install.go:160: [debug] CRD volumesnapshotlocations.velero.io is already present. Skipping.
client.go:478: [debug] Starting delete for "velero-upgrade-crds" ClusterRole
client.go:482: [debug] Ignoring delete failure for "velero-upgrade-crds" rbac.authorization.k8s.io/v1, Kind=ClusterRole: clusterroles.rbac.authorization.k8s.io "velero-upgrade-crds" not found
client.go:134: [debug] creating 1 resource(s)
client.go:478: [debug] Starting delete for "velero-server-upgrade-crds" ServiceAccount
client.go:482: [debug] Ignoring delete failure for "velero-server-upgrade-crds" /v1, Kind=ServiceAccount: serviceaccounts "velero-server-upgrade-crds" not found
client.go:134: [debug] creating 1 resource(s)
client.go:478: [debug] Starting delete for "velero-upgrade-crds" ClusterRoleBinding
client.go:482: [debug] Ignoring delete failure for "velero-upgrade-crds" rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding: clusterrolebindings.rbac.authorization.k8s.io "velero-upgrade-crds" not found
client.go:134: [debug] creating 1 resource(s)
client.go:478: [debug] Starting delete for "velero-upgrade-crds" Job
client.go:482: [debug] Ignoring delete failure for "velero-upgrade-crds" batch/v1, Kind=Job: jobs.batch "velero-upgrade-crds" not found
client.go:134: [debug] creating 1 resource(s)
client.go:706: [debug] Watching for changes to Job velero-upgrade-crds with timeout of 5m0s
client.go:734: [debug] Add/Modify event for velero-upgrade-crds: ADDED
client.go:773: [debug] velero-upgrade-crds: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:734: [debug] Add/Modify event for velero-upgrade-crds: MODIFIED
client.go:773: [debug] velero-upgrade-crds: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
client.go:734: [debug] Add/Modify event for velero-upgrade-crds: MODIFIED
Error: failed pre-install: 1 error occurred:
	* job failed: BackoffLimitExceeded


helm.go:84: [debug] failed pre-install: 1 error occurred:
	* job failed: BackoffLimitExceeded


Environment:

  • helm version (use helm version): version.BuildInfo{Version:"v3.12.1", GitCommit:"f32a527a060157990e2aa86bf45010dfb3cc8b8d", GitTreeState:"clean", GoVersion:"go1.20.4"}
  • helm chart version and app version (use helm list -n <YOUR NAMESPACE>):
  • Kubernetes version (use kubectl version):
  • Kubernetes installer & version:
  • Cloud provider or hardware configuration: GCP
  • OS (e.g. from /etc/os-release): Mac or Linux
@jenting
Collaborator

jenting commented Feb 20, 2024

This has been working until Feb 19th, 2024.

We did not touch the release recently. Perhaps the problem is that the kubectl image isn't available.

@basaveswar-kureti
Author

@jenting Thanks for your response. Were you able to reproduce this issue using the commands mentioned above?

Perhaps the problem is that the kubectl image isn't available.

Could you elaborate on this? Is there a workaround to specify the kubectl image through configuration?

@basaveswar-kureti
Author

This issue appears to be specific to velero image versions v1.11.0 and v1.12.0.
The sh binary copied to /tmp from the kubectl image seems to be incompatible with the libraries in those velero images:
https://github.com/vmware-tanzu/helm-charts/blob/velero-3.2.0/charts/velero/templates/upgrade-crds/upgrade-crds.yaml#L74

It fails with the following error:

/tmp/sh: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by /tmp/sh)
/tmp/sh: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /tmp/sh)
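For anyone else hitting this: those errors come from the failed upgrade job's pod logs. A minimal way to pull them, assuming the namespace from the repro steps above:

kubectl logs -n velero-320 job/velero-upgrade-crds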

@mpkuth

mpkuth commented Feb 22, 2024

We're also running into this issue and it started at about the same time. We believe that it is due to changes in the bitnami/kubectl image, but are still working to confirm that. Here's what we've found so far.

We have one cluster that is working, where we can see:

  - names:
    - docker.io/bitnami/kubectl@sha256:0a3599824bccdd94ce461052e9fe9bea92ef9e84f7fb8ab75c98aeccd766cb03
    - docker.io/bitnami/kubectl:1.26
    sizeBytes: 80374891

And many others that are failing consistently with:

    - docker.io/bitnami/kubectl@sha256:40a1d870fa3289fdffc4b7e86bc9434bd4ab02686bccbf4b800827b3744e04d1
    - docker.io/bitnami/kubectl:1.26
    sizeBytes: 87080325

Overriding the kubectl image digest in the failing clusters to match the digest in the working cluster caused them to start working.
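For reference, those digests come from the node status. A quick sketch to list which kubectl image digests a cluster's nodes have pulled, assuming jq is installed:

kubectl get nodes -o json | jq -r '.items[].status.images[].names[]' | grep bitnami/kubectl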


We found bitnami/containers#53360 with a comment from yesterday:

Debian 12 containers are already being released, for instance,
...
The whole catalog should be released during the next hours/days.

https://packages.debian.org/source/bullseye/glibc
vs
https://packages.debian.org/source/bookworm/glibc

The timing does seem to line up, as the last 1.26-debian-11 image was pushed five days ago (last Friday) and we started seeing this problem in builds on Monday.

That issue points to https://docs.bitnami.com/tutorials/understand-rolling-tags-containers/. So the default tag used by the velero chart is a rolling tag, not an immutable one.

image: "{{ .Values.kubectl.image.repository }}:{{ template "chart.KubernetesVersion" . }}"

Which would explain what we're seeing in our clusters since we don't explicitly set that chart value.
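Explicitly setting that chart value pins an immutable tag and opts out of the rolling one. A minimal sketch of the override in values.yaml (the tag here is just the Debian 11 build discussed in this thread):

kubectl:
  image:
    tag: 1.26.14-debian-11-r6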


We're using velero chart version 5.1.0 with default values for image and kubectl. Clusters in both AWS and Azure failing the same way.

@mpkuth

mpkuth commented Feb 22, 2024

We've confirmed that as the cause of the problem.

kubectl = { image = { tag = "1.26.14-debian-11-r6" } } # works (released on Feb 16)
kubectl = { image = { tag = "1.26.14-debian-12-r0" } } # fails (released on Feb 16)

The error happens because the upgrade CRDs job copies sh and kubectl from that image:

- cp `which sh` /tmp && cp `which kubectl` /tmp

And then tries to use them from the velero image:

- /velero install --crds-only --dry-run -o yaml | /tmp/kubectl apply -f -

But the velero v1.12.0 image is based on Debian 11, which isn't compatible with the sh binary from Debian 12:

https://github.com/vmware-tanzu/velero/blob/v1.12.0/Dockerfile#L71

That base image changed in velero v1.12.1, which we think may be compatible:
https://github.com/vmware-tanzu/velero/blob/v1.12.1/Dockerfile#L73
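To check the glibc gap directly, comparing the libc version shipped in each image should be enough. A sketch, assuming docker is available locally and ldd is present in these Debian-based images:

docker run --rm --entrypoint /bin/sh bitnami/kubectl:1.26.14-debian-11-r6 -c 'ldd --version | head -n1'
docker run --rm --entrypoint /bin/sh bitnami/kubectl:1.26.14-debian-12-r0 -c 'ldd --version | head -n1'

Debian 11 (bullseye) ships glibc 2.31 and Debian 12 (bookworm) ships 2.36, which lines up with the GLIBC_2.33/GLIBC_2.34 symbols in the error above.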

For now we've just hardcoded the kubectl image tag to 1.26.14-debian-11-r6 (the last compatible image for our kubernetes server version) as the least risky fix, and we're planning to look into upgrading velero in the near future.
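If it helps anyone, the same pin can be passed on the helm command line instead of a values file. A sketch (chart version and namespace are examples, not prescriptions):

helm upgrade --install velero vmware-tanzu/velero --version 5.1.0 --namespace velero --set kubectl.image.tag=1.26.14-debian-11-r6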

jenting added the bug and velero labels Feb 29, 2024
jdaln added a commit to jdaln/radar-helm-charts that referenced this issue Apr 8, 2024
Fixes BackoffLimitExceeded due to libc.so.6: version `GLIBC_2.33' not found, as mentioned upstream at vmware-tanzu/helm-charts#550
@jenting
Collaborator

jenting commented May 16, 2024

Related issue #559, closing it.

jenting closed this as completed May 16, 2024
@virasana

virasana commented Mar 14, 2025

Getting this issue today, 14 Mar 2025, with velero 1.15.2 and helm chart version 8.5.0.
Trying to install on AKS.
It does not appear to be installing the azure init container? (See the JSON output below.)

$>kubectl logs velero-upgrade-crds-6jllw
Defaulted container "velero" out of: velero, kubectl (init)
exec /tmp/sh: no such file or directory
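The init container named kubectl is what should have copied sh and kubectl into /tmp, so its logs are worth checking too. A sketch against the pod above:

kubectl logs velero-upgrade-crds-6jllw -c kubectl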

Using: velero-plugin-for-microsoft-azure:v1.11.0

kubectl get po velero-upgrade-crds-hxgkl -ojson | jq -r '.spec'

{
  "containers": [
    {
      "args": [
        "-c",
        "/velero install --crds-only --dry-run -o yaml | /tmp/kubectl apply -f -"
      ],
      "command": [
        "/tmp/sh"
      ],
      "image": "velero/velero:v1.15.2",
      "imagePullPolicy": "IfNotPresent",
      "name": "velero",
      "resources": {},
      "terminationMessagePath": "/dev/termination-log",
      "terminationMessagePolicy": "File",
      "volumeMounts": [
        {
          "mountPath": "/tmp",
          "name": "crds"
        },
        {
          "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount",
          "name": "kube-api-access-mmxph",
          "readOnly": true
        }
      ]
    }
  ],
  "dnsPolicy": "ClusterFirst",
  "enableServiceLinks": true,
  "initContainers": [
    {
      "args": [
        "-c",
        "cp `which sh` /tmp && cp `which kubectl` /tmp"
      ],
      "command": [
        "/bin/sh"
      ],
      "image": "docker.io/bitnami/kubectl:1.30",
      "imagePullPolicy": "IfNotPresent",
      "name": "kubectl",
      "resources": {},
      "terminationMessagePath": "/dev/termination-log",
      "terminationMessagePolicy": "File",
      "volumeMounts": [
        {
          "mountPath": "/tmp",
          "name": "crds"
        },
        {
          "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount",
          "name": "kube-api-access-mmxph",
          "readOnly": true
        }
      ]
    }
  ],
  "nodeName": "aks-default-17336273-vmss000002",
  "preemptionPolicy": "PreemptLowerPriority",
  "priority": 0,
  "restartPolicy": "OnFailure",
  "schedulerName": "default-scheduler",
  "securityContext": {},
  "serviceAccount": "velero-identity-sa-upgrade-crds",
  "serviceAccountName": "velero-identity-sa-upgrade-crds",
  "terminationGracePeriodSeconds": 30,
  "tolerations": [
    {
      "effect": "NoExecute",
      "key": "node.kubernetes.io/not-ready",
      "operator": "Exists",
      "tolerationSeconds": 300
    },
    {
      "effect": "NoExecute",
      "key": "node.kubernetes.io/unreachable",
      "operator": "Exists",
      "tolerationSeconds": 300
    }
  ],
  "volumes": [
    {
      "emptyDir": {},
      "name": "crds"
    },
    {
      "name": "kube-api-access-mmxph",
      "projected": {
        "defaultMode": 420,
        "sources": [
          {
            "serviceAccountToken": {
              "expirationSeconds": 3607,
              "path": "token"
            }
          },
          {
            "configMap": {
              "items": [
                {
                  "key": "ca.crt",
                  "path": "ca.crt"
                }
              ],
              "name": "kube-root-ca.crt"
            }
          },
          {
            "downwardAPI": {
              "items": [
                {
                  "fieldRef": {
                    "apiVersion": "v1",
                    "fieldPath": "metadata.namespace"
                  },
                  "path": "namespace"
                }
              ]
            }
          }
        ]
      }
    }
  ]
}
