Fail to install velero helm chart 3.2.0. Getting error - job failed: BackoffLimitExceeded #550

Closed
basaveswar-kureti opened this issue Feb 20, 2024 · 7 comments
Labels: bug, upgrade, velero

Comments

@basaveswar-kureti

What steps did you take and what happened:
This has been working until Feb 19th, 2024.

kubectl create ns velero-320
helm upgrade --debug --install velero vmware-tanzu/velero --version 3.2.0 --namespace velero-320

What did you expect to happen:
We pinned to chart version 3.2.0 to avoid breakages, and it had been working until yesterday.

The output of the following commands will help us better understand what's going on:

helm upgrade --debug --install velero vmware-tanzu/velero --version 3.2.0 --namespace velero-320
history.go:56: [debug] getting history for release velero
Release "velero" does not exist. Installing it now.
install.go:200: [debug] Original chart version: "3.2.0"
install.go:217: [debug] CHART PATH: /Users/kureti/Library/Caches/helm/repository/velero-3.2.0.tgz

client.go:134: [debug] creating 1 resource(s)
install.go:160: [debug] CRD backuprepositories.velero.io is already present. Skipping.
client.go:134: [debug] creating 1 resource(s)
install.go:160: [debug] CRD backups.velero.io is already present. Skipping.
client.go:134: [debug] creating 1 resource(s)
install.go:160: [debug] CRD backupstoragelocations.velero.io is already present. Skipping.
client.go:134: [debug] creating 1 resource(s)
install.go:160: [debug] CRD deletebackuprequests.velero.io is already present. Skipping.
client.go:134: [debug] creating 1 resource(s)
install.go:160: [debug] CRD downloadrequests.velero.io is already present. Skipping.
client.go:134: [debug] creating 1 resource(s)
install.go:160: [debug] CRD podvolumebackups.velero.io is already present. Skipping.
client.go:134: [debug] creating 1 resource(s)
install.go:160: [debug] CRD podvolumerestores.velero.io is already present. Skipping.
client.go:134: [debug] creating 1 resource(s)
install.go:160: [debug] CRD restores.velero.io is already present. Skipping.
client.go:134: [debug] creating 1 resource(s)
install.go:160: [debug] CRD schedules.velero.io is already present. Skipping.
client.go:134: [debug] creating 1 resource(s)
install.go:160: [debug] CRD serverstatusrequests.velero.io is already present. Skipping.
client.go:134: [debug] creating 1 resource(s)
install.go:160: [debug] CRD volumesnapshotlocations.velero.io is already present. Skipping.
client.go:478: [debug] Starting delete for "velero-upgrade-crds" ClusterRole
client.go:482: [debug] Ignoring delete failure for "velero-upgrade-crds" rbac.authorization.k8s.io/v1, Kind=ClusterRole: clusterroles.rbac.authorization.k8s.io "velero-upgrade-crds" not found
client.go:134: [debug] creating 1 resource(s)
client.go:478: [debug] Starting delete for "velero-server-upgrade-crds" ServiceAccount
client.go:482: [debug] Ignoring delete failure for "velero-server-upgrade-crds" /v1, Kind=ServiceAccount: serviceaccounts "velero-server-upgrade-crds" not found
client.go:134: [debug] creating 1 resource(s)
client.go:478: [debug] Starting delete for "velero-upgrade-crds" ClusterRoleBinding
client.go:482: [debug] Ignoring delete failure for "velero-upgrade-crds" rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding: clusterrolebindings.rbac.authorization.k8s.io "velero-upgrade-crds" not found
client.go:134: [debug] creating 1 resource(s)
client.go:478: [debug] Starting delete for "velero-upgrade-crds" Job
client.go:482: [debug] Ignoring delete failure for "velero-upgrade-crds" batch/v1, Kind=Job: jobs.batch "velero-upgrade-crds" not found
client.go:134: [debug] creating 1 resource(s)
client.go:706: [debug] Watching for changes to Job velero-upgrade-crds with timeout of 5m0s
client.go:734: [debug] Add/Modify event for velero-upgrade-crds: ADDED
client.go:773: [debug] velero-upgrade-crds: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:734: [debug] Add/Modify event for velero-upgrade-crds: MODIFIED
client.go:773: [debug] velero-upgrade-crds: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
client.go:734: [debug] Add/Modify event for velero-upgrade-crds: MODIFIED
Error: failed pre-install: 1 error occurred:
	* job failed: BackoffLimitExceeded


helm.go:84: [debug] failed pre-install: 1 error occurred:
	* job failed: BackoffLimitExceeded


Environment:

  • helm version (use helm version): version.BuildInfo{Version:"v3.12.1", GitCommit:"f32a527a060157990e2aa86bf45010dfb3cc8b8d", GitTreeState:"clean", GoVersion:"go1.20.4"}
  • helm chart version and app version (use helm list -n <YOUR NAMESPACE>):
  • Kubernetes version (use kubectl version):
  • Kubernetes installer & version:
  • Cloud provider or hardware configuration: GCP
  • OS (e.g. from /etc/os-release): Mac or Linux
@jenting
Collaborator

jenting commented Feb 20, 2024

This has been working until Feb 19th, 2024.

We did not touch the release recently. Perhaps the problem is that the kubectl image isn't available.

@basaveswar-kureti
Author

@jenting Thanks for your response. Were you able to reproduce this issue using the commands mentioned above?

Perhaps the problem is that the kubectl image isn't available.

Could you elaborate on this? Is there a workaround to specify the kubectl image through configuration?

@basaveswar-kureti
Author

This issue appears to be specific to velero image versions v1.11.0 and v1.12.0.
The sh binary copied to /tmp from the kubectl image seems to be incompatible with the libraries in those velero images:
https://github.com/vmware-tanzu/helm-charts/blob/velero-3.2.0/charts/velero/templates/upgrade-crds/upgrade-crds.yaml#L74

It fails with the following error:

/tmp/sh: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by /tmp/sh)
/tmp/sh: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /tmp/sh)
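For anyone else hitting this: those errors come from the failed upgrade job's pod logs. A minimal way to pull them, assuming the namespace from the repro steps above:

kubectl logs -n velero-320 job/velero-upgrade-crds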

@mpkuth

mpkuth commented Feb 22, 2024

We're also running into this issue and it started at about the same time. We believe that it is due to changes in the bitnami/kubectl image, but are still working to confirm that. Here's what we've found so far.

We have one cluster that is working, where we can see:

  - names:
    - docker.io/bitnami/kubectl@sha256:0a3599824bccdd94ce461052e9fe9bea92ef9e84f7fb8ab75c98aeccd766cb03
    - docker.io/bitnami/kubectl:1.26
    sizeBytes: 80374891

And many others that are failing consistently with:

    - docker.io/bitnami/kubectl@sha256:40a1d870fa3289fdffc4b7e86bc9434bd4ab02686bccbf4b800827b3744e04d1
    - docker.io/bitnami/kubectl:1.26
    sizeBytes: 87080325

Overriding the kubectl image digest in the failing clusters to match the digest in the working cluster caused them to start working.
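For reference, those digests come from the node status. A quick sketch to list which kubectl image digests a cluster's nodes have pulled, assuming jq is installed:

kubectl get nodes -o json | jq -r '.items[].status.images[].names[]' | grep bitnami/kubectl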


We found bitnami/containers#53360 with a comment from yesterday:

Debian 12 containers are already being released, for instance,
...
The whole catalog should be released during the next hours/days.

https://packages.debian.org/source/bullseye/glibc
vs
https://packages.debian.org/source/bookworm/glibc

The timing does seem to line up, as the last 1.26-debian-11 image was pushed five days ago (last Friday) and we started seeing this problem in builds on Monday.

That issue points to https://docs.bitnami.com/tutorials/understand-rolling-tags-containers/. So the default tag used by the velero chart is a rolling tag, not an immutable one.

image: "{{ .Values.kubectl.image.repository }}:{{ template "chart.KubernetesVersion" . }}"

Which would explain what we're seeing in our clusters since we don't explicitly set that chart value.
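Explicitly setting that chart value pins an immutable tag and opts out of the rolling one. A minimal sketch of the override in values.yaml (the tag here is just the Debian 11 build discussed in this thread):

kubectl:
  image:
    tag: 1.26.14-debian-11-r6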


We're using velero chart version 5.1.0 with default values for image and kubectl. Clusters in both AWS and Azure failing the same way.

@mpkuth

mpkuth commented Feb 22, 2024

We've confirmed that as the cause of the problem.

kubectl = { image = { tag = "1.26.14-debian-11-r6" } } # works (released on Feb 16)
kubectl = { image = { tag = "1.26.14-debian-12-r0" } } # fails (released on Feb 16)

The error happens because the upgrade CRDs job copies sh and kubectl from that image:

- cp `which sh` /tmp && cp `which kubectl` /tmp

And then tries to use them from the velero image:

- /velero install --crds-only --dry-run -o yaml | /tmp/kubectl apply -f -

But the velero v1.12.0 image is based on Debian 11, which isn't compatible with the sh binary from Debian 12:

https://github.com/vmware-tanzu/velero/blob/v1.12.0/Dockerfile#L71

That base image changed in velero v1.12.1, which we think may be compatible:
https://github.com/vmware-tanzu/velero/blob/v1.12.1/Dockerfile#L73
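To check the glibc gap directly, comparing the libc version shipped in each image should be enough. A sketch, assuming docker is available locally and ldd is present in these Debian-based images:

docker run --rm --entrypoint /bin/sh bitnami/kubectl:1.26.14-debian-11-r6 -c 'ldd --version | head -n1'
docker run --rm --entrypoint /bin/sh bitnami/kubectl:1.26.14-debian-12-r0 -c 'ldd --version | head -n1'

Debian 11 (bullseye) ships glibc 2.31 and Debian 12 (bookworm) ships 2.36, which lines up with the GLIBC_2.33/GLIBC_2.34 symbols in the error above.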

For now we've just hardcoded the kubectl image tag to 1.26.14-debian-11-r6 (the last compatible image for our kubernetes server version) as the least risky fix, and we're planning to look into upgrading velero in the near future.
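If it helps anyone, the same pin can be passed on the helm command line instead of a values file. A sketch (chart version and namespace are examples, not prescriptions):

helm upgrade --install velero vmware-tanzu/velero --version 5.1.0 --namespace velero --set kubectl.image.tag=1.26.14-debian-11-r6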

jenting added the bug and velero labels Feb 29, 2024
jdaln added a commit to jdaln/radar-helm-charts that referenced this issue Apr 8, 2024
Fixes BackoffLimitExceeded due to libc.so.6: version `GLIBC_2.33' not found, as mentioned upstream at vmware-tanzu/helm-charts#550
@jenting
Collaborator

jenting commented May 16, 2024

Related issue #559, closing it.

jenting closed this as completed May 16, 2024
@virasana

virasana commented Mar 14, 2025

Getting this issue today, 14 Mar 2025, with velero 1.15.2 and helm chart version 8.5.0.
Trying to install on AKS.
It does not appear to be installing the azure init container? (See the JSON output below.)

$>kubectl logs velero-upgrade-crds-6jllw
Defaulted container "velero" out of: velero, kubectl (init)
exec /tmp/sh: no such file or directory
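The init container named kubectl is what should have copied sh and kubectl into /tmp, so its logs are worth checking too. A sketch against the pod above:

kubectl logs velero-upgrade-crds-6jllw -c kubectl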

Using: velero-plugin-for-microsoft-azure:v1.11.0

kubectl get po velero-upgrade-crds-hxgkl -ojson | jq -r '.spec'

{
  "containers": [
    {
      "args": [
        "-c",
        "/velero install --crds-only --dry-run -o yaml | /tmp/kubectl apply -f -"
      ],
      "command": [
        "/tmp/sh"
      ],
      "image": "velero/velero:v1.15.2",
      "imagePullPolicy": "IfNotPresent",
      "name": "velero",
      "resources": {},
      "terminationMessagePath": "/dev/termination-log",
      "terminationMessagePolicy": "File",
      "volumeMounts": [
        {
          "mountPath": "/tmp",
          "name": "crds"
        },
        {
          "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount",
          "name": "kube-api-access-mmxph",
          "readOnly": true
        }
      ]
    }
  ],
  "dnsPolicy": "ClusterFirst",
  "enableServiceLinks": true,
  "initContainers": [
    {
      "args": [
        "-c",
        "cp `which sh` /tmp && cp `which kubectl` /tmp"
      ],
      "command": [
        "/bin/sh"
      ],
      "image": "docker.io/bitnami/kubectl:1.30",
      "imagePullPolicy": "IfNotPresent",
      "name": "kubectl",
      "resources": {},
      "terminationMessagePath": "/dev/termination-log",
      "terminationMessagePolicy": "File",
      "volumeMounts": [
        {
          "mountPath": "/tmp",
          "name": "crds"
        },
        {
          "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount",
          "name": "kube-api-access-mmxph",
          "readOnly": true
        }
      ]
    }
  ],
  "nodeName": "aks-default-17336273-vmss000002",
  "preemptionPolicy": "PreemptLowerPriority",
  "priority": 0,
  "restartPolicy": "OnFailure",
  "schedulerName": "default-scheduler",
  "securityContext": {},
  "serviceAccount": "velero-identity-sa-upgrade-crds",
  "serviceAccountName": "velero-identity-sa-upgrade-crds",
  "terminationGracePeriodSeconds": 30,
  "tolerations": [
    {
      "effect": "NoExecute",
      "key": "node.kubernetes.io/not-ready",
      "operator": "Exists",
      "tolerationSeconds": 300
    },
    {
      "effect": "NoExecute",
      "key": "node.kubernetes.io/unreachable",
      "operator": "Exists",
      "tolerationSeconds": 300
    }
  ],
  "volumes": [
    {
      "emptyDir": {},
      "name": "crds"
    },
    {
      "name": "kube-api-access-mmxph",
      "projected": {
        "defaultMode": 420,
        "sources": [
          {
            "serviceAccountToken": {
              "expirationSeconds": 3607,
              "path": "token"
            }
          },
          {
            "configMap": {
              "items": [
                {
                  "key": "ca.crt",
                  "path": "ca.crt"
                }
              ],
              "name": "kube-root-ca.crt"
            }
          },
          {
            "downwardAPI": {
              "items": [
                {
                  "fieldRef": {
                    "apiVersion": "v1",
                    "fieldPath": "metadata.namespace"
                  },
                  "path": "namespace"
                }
              ]
            }
          }
        ]
      }
    }
  ]
}
