
[Velero add-on] Velero deployment fails: Back-off restarting failed container velero in pod velero-upgrade-crds-* #1889

carlosrodlop opened this issue Feb 21, 2024 · 4 comments

carlosrodlop commented Feb 21, 2024

Description

The Velero add-on stopped working today out of the blue. The same version of the EKS Blueprints add-ons module worked yesterday. Pinning the Velero chart to a prior version does not solve the issue. The velero-upgrade-crds job never finishes.

  • ✋ I have searched the open/closed issues and my issue is not listed.

Versions

  • Module version [Required]:

  • Terraform version: Terraform v1.7.3

  • Provider version(s):
+ provider registry.terraform.io/hashicorp/aws v5.37.0
+ provider registry.terraform.io/hashicorp/cloudinit v2.3.3
+ provider registry.terraform.io/hashicorp/helm v2.12.1
+ provider registry.terraform.io/hashicorp/kubernetes v2.26.0
+ provider registry.terraform.io/hashicorp/time v0.10.0
+ provider registry.terraform.io/hashicorp/tls v4.0.5

Reproduction Code [Required]

Steps to reproduce the behavior: simply run the stateful blueprint at https://github.com/aws-ia/terraform-aws-eks-blueprints/tree/main/patterns/stateful

  • I'm not using workspaces
  • It is a fresh clone

Expected behavior

The blueprint finishes the terraform apply phase successfully.

Actual behavior

The blueprint DOES NOT finish the terraform apply phase.

Setting enable_velero = false makes the blueprint finish correctly

The following error is shown in the console output:

╷
│ Warning: Helm release "velero" was created but has a failed status. Use the `helm` command to investigate the error, correct it, then run Terraform again.
│ 
│   with module.eks_blueprints_addons.module.velero.helm_release.this[0],
│   on .terraform/modules/eks_blueprints_addons.velero/main.tf line 9, in resource "helm_release" "this":
│    9: resource "helm_release" "this" {
│ 
╵
╷
│ Error: failed pre-install: 1 error occurred:
│       * job failed: BackoffLimitExceeded
│ 
│ 
│ 
│   with module.eks_blueprints_addons.module.velero.helm_release.this[0],
│   on .terraform/modules/eks_blueprints_addons.velero/main.tf line 9, in resource "helm_release" "this":
│    9: resource "helm_release" "this" {
│ 
╵
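
To dig further, the Helm warning points at inspecting the release and the failed hook job directly. A minimal debugging sketch (the pod name is generated per run, and the job's pods may already be deleted once the backoff limit is reached):

# Describe the failing pre-install hook job
kubectl -n velero describe job velero-upgrade-crds

# Logs from the failing "velero" container in the hook pod, if it still exists
kubectl -n velero logs -l job-name=velero-upgrade-crds -c velero --tail=50

# Helm's own view of the failed release
helm -n velero status velero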

Terminal Output Screenshot(s)

[terminal output screenshot]

Additional context

Terraform logs show

2024-02-21T09:10:52.765+0100 [ERROR] provider.terraform-provider-helm_v2.12.1_x5: Response contains error diagnostic: diagnostic_severity=ERROR diagnostic_summary="failed pre-install: 1 error occurred:
	* job failed: BackoffLimitExceeded

" tf_provider_addr=provider tf_resource_type=helm_release tf_proto_version=5.4 tf_req_id=b67a59b5-950f-4f79-359d-e12420dc9335 @caller=github.com/hashicorp/[email protected]/tfprotov5/internal/diag/diagnostics.go:58 @module=sdk.proto diagnostic_detail= tf_rpc=ApplyResourceChange timestamp=2024-02-21T09:10:52.765+0100
2024-02-21T09:10:52.813+0100 [ERROR] vertex "module.eks_blueprints_addons.module.velero.helm_release.this[0]" error: failed pre-install: 1 error occurred:
	* job failed: BackoffLimitExceeded

Kubectl events:

LAST SEEN   TYPE      REASON                 OBJECT                          MESSAGE
50m         Normal    SuccessfulCreate       job/velero-upgrade-crds         Created pod: velero-upgrade-crds-5g6xq
50m         Normal    Pulled                 pod/velero-upgrade-crds-5g6xq   Container image "docker.io/bitnami/kubectl:1.27" already present on machine
50m         Normal    Created                pod/velero-upgrade-crds-5g6xq   Created container kubectl
50m         Normal    Started                pod/velero-upgrade-crds-5g6xq   Started container kubectl
50m         Normal    Scheduled              pod/velero-upgrade-crds-5g6xq   Successfully assigned velero/velero-upgrade-crds-5g6xq to ip-10-0-48-42.ec2.internal
49m         Normal    Started                pod/velero-upgrade-crds-5g6xq   Started container velero
49m         Normal    Created                pod/velero-upgrade-crds-5g6xq   Created container velero
49m         Normal    Pulled                 pod/velero-upgrade-crds-5g6xq   Container image "velero/velero:v1.11.0" already present on machine
49m         Warning   BackOff                pod/velero-upgrade-crds-5g6xq   Back-off restarting failed container velero in pod velero-upgrade-crds-5g6xq_velero(5ddae231-373d-42dd-a2ff-daea57c2ed5f)
49m         Warning   BackoffLimitExceeded   job/velero-upgrade-crds         Job has reached the specified backoff limit
49m         Normal    SuccessfulDelete       job/velero-upgrade-crds         Deleted pod: velero-upgrade-crds-5g6xq
21m         Normal    Scheduled              pod/velero-upgrade-crds-mlft5   Successfully assigned velero/velero-upgrade-crds-mlft5 to ip-10-0-49-177.ec2.internal
21m         Normal    SuccessfulCreate       job/velero-upgrade-crds         Created pod: velero-upgrade-crds-mlft5
21m         Normal    Pulling                pod/velero-upgrade-crds-mlft5   Pulling image "docker.io/bitnami/kubectl:1.27"
21m         Normal    Started                pod/velero-upgrade-crds-mlft5   Started container kubectl
21m         Normal    Created                pod/velero-upgrade-crds-mlft5   Created container kubectl
21m         Normal    Pulled                 pod/velero-upgrade-crds-mlft5   Successfully pulled image "docker.io/bitnami/kubectl:1.27" in 3.611612517s (3.61162356s including waiting)
21m         Normal    Pulling                pod/velero-upgrade-crds-mlft5   Pulling image "velero/velero:v1.11.0"
20m         Normal    Created                pod/velero-upgrade-crds-mlft5   Created container velero
20m         Normal    Started                pod/velero-upgrade-crds-mlft5   Started container velero
21m         Normal    Pulled                 pod/velero-upgrade-crds-mlft5   Successfully pulled image "velero/velero:v1.11.0" in 2.043041549s (2.043052288s including waiting)
20m         Normal    Pulled                 pod/velero-upgrade-crds-mlft5   Container image "velero/velero:v1.11.0" already present on machine
20m         Warning   BackOff                pod/velero-upgrade-crds-mlft5   Back-off restarting failed container velero in pod velero-upgrade-crds-mlft5_velero(27000cbe-13f8-48a3-a8a2-0d429e41e532)
20m         Normal    SuccessfulDelete       job/velero-upgrade-crds         Deleted pod: velero-upgrade-crds-mlft5
20m         Warning   BackoffLimitExceeded   job/velero-upgrade-crds         Job has reached the specified backoff limit

The Velero chart version has been pinned to 3.2.0 in this add-on since v1.5.0. I tried setting the chart to 3.1.6, the previous version tested for this add-on, but the issue remains.

enable_velero = true

velero = {
  s3_backup_location = local.velero_s3_location
  chart_version      = "3.1.6"
}
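
One possible stopgap, assuming the chart's upgradeCRDs toggle (it defaults to true and controls the velero-upgrade-crds hook job), would be to skip the failing hook entirely. CRDs then have to be kept up to date out of band, so this trades the crash for manual CRD management:

velero = {
  s3_backup_location = local.velero_s3_location
  values = [yamlencode({
    # Skip the pre-install/pre-upgrade CRD job entirely (a stopgap, not a fix)
    upgradeCRDs = false
  })]
}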
@carlosrodlop carlosrodlop changed the title [Velero] Velero deployment fails [Velero add-on] Velero deployment fails Feb 21, 2024
@carlosrodlop carlosrodlop changed the title [Velero add-on] Velero deployment fails [Velero add-on] Velero deployment fails: velero-upgrade-crds job does not finish Feb 21, 2024
@carlosrodlop carlosrodlop changed the title [Velero add-on] Velero deployment fails: velero-upgrade-crds job does not finish [Velero add-on] Velero deployment fails: Back-off restarting failed container velero in pod velero-upgrade-crds-* Feb 21, 2024
wellsiau-aws commented:

Relates to this upstream issue.


carlosrodlop commented Feb 23, 2024

Passing the following Helm values file, velero-values.yml, to the Velero add-on solves the reported issue.

velero-values.yml:

#https://artifacthub.io/packages/helm/vmware-tanzu/velero
#https://github.com/vmware-tanzu/helm-charts/blob/main/charts/velero/values.yaml

kubectl:
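  # Pin the kubectl image used by the velero-upgrade-crds hook job to a fixed tag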
  image:
    tag: 1.26.14-debian-11-r6

main.tf:

...

module "eks_blueprints_addons" {
  # ...

  enable_velero = true

  velero = {
    values             = [file("velero-values.yml")]
    s3_backup_location = local.velero_s3_location
  }

  # ...
}
...
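
If a separate values file is not desirable, the same pin can be expressed inline with Terraform's yamlencode() function; a minimal sketch, equivalent to the file-based values above:

velero = {
  s3_backup_location = local.velero_s3_location
  values = [yamlencode({
    # Pin the kubectl image tag for the upgrade-crds hook job
    kubectl = {
      image = {
        tag = "1.26.14-debian-11-r6"
      }
    }
  })]
}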

github-actions bot commented Mar 26, 2024

This issue has been automatically marked as stale because it has been open 30 days
with no activity. Remove stale label or comment or this issue will be closed in 10 days

@github-actions github-actions bot added the stale label Mar 26, 2024

github-actions bot commented Apr 5, 2024

Issue closed due to inactivity.

@github-actions github-actions bot closed this as not planned Apr 5, 2024