
[Velero add-on] Velero deployment fails: Back-off restarting failed container velero in pod velero-upgrade-crds-* #1889

carlosrodlop opened this issue Feb 21, 2024 · 4 comments

carlosrodlop commented Feb 21, 2024

Description

The Velero add-on stopped working today out of the blue. The same version of the EKS Blueprints add-ons module worked yesterday. Pinning the Velero chart to a prior version does not solve the issue. The velero-upgrade-crds job never finishes.

  • ✋ I have searched the open/closed issues and my issue is not listed.

Versions

  • Module version [Required]:

  • Terraform version: Terraform v1.7.3

  • Provider version(s):
+ provider registry.terraform.io/hashicorp/aws v5.37.0
+ provider registry.terraform.io/hashicorp/cloudinit v2.3.3
+ provider registry.terraform.io/hashicorp/helm v2.12.1
+ provider registry.terraform.io/hashicorp/kubernetes v2.26.0
+ provider registry.terraform.io/hashicorp/time v0.10.0
+ provider registry.terraform.io/hashicorp/tls v4.0.5

Reproduction Code [Required]

Steps to reproduce the behavior: simply run the stateful blueprint at https://github.com/aws-ia/terraform-aws-eks-blueprints/tree/main/patterns/stateful

  • I'm not using workspaces
  • It is a fresh clone

Expected behavior

The blueprint finishes the terraform apply phase successfully.

Actual behavior

The blueprint DOES NOT finish the terraform apply phase.

Setting enable_velero = false makes the blueprint finish correctly

The following error is shown in the console output:

╷
│ Warning: Helm release "velero" was created but has a failed status. Use the `helm` command to investigate the error, correct it, then run Terraform again.
│ 
│   with module.eks_blueprints_addons.module.velero.helm_release.this[0],
│   on .terraform/modules/eks_blueprints_addons.velero/main.tf line 9, in resource "helm_release" "this":
│    9: resource "helm_release" "this" {
│ 
╵
╷
│ Error: failed pre-install: 1 error occurred:
│       * job failed: BackoffLimitExceeded
│ 
│ 
│ 
│   with module.eks_blueprints_addons.module.velero.helm_release.this[0],
│   on .terraform/modules/eks_blueprints_addons.velero/main.tf line 9, in resource "helm_release" "this":
│    9: resource "helm_release" "this" {
│ 
╵
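
To dig further, the Helm warning points at inspecting the release and the failed hook job directly. A minimal debugging sketch (the pod name is generated per run, and the job's pods may already be deleted once the backoff limit is reached):

# Describe the failing pre-install hook job
kubectl -n velero describe job velero-upgrade-crds

# Logs from the failing "velero" container in the hook pod, if it still exists
kubectl -n velero logs -l job-name=velero-upgrade-crds -c velero --tail=50

# Helm's own view of the failed release
helm -n velero status velero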

Terminal Output Screenshot(s)

[terminal output screenshot]

Additional context

Terraform logs show

2024-02-21T09:10:52.765+0100 [ERROR] provider.terraform-provider-helm_v2.12.1_x5: Response contains error diagnostic: diagnostic_severity=ERROR diagnostic_summary="failed pre-install: 1 error occurred:
	* job failed: BackoffLimitExceeded

" tf_provider_addr=provider tf_resource_type=helm_release tf_proto_version=5.4 tf_req_id=b67a59b5-950f-4f79-359d-e12420dc9335 @caller=github.com/hashicorp/[email protected]/tfprotov5/internal/diag/diagnostics.go:58 @module=sdk.proto diagnostic_detail= tf_rpc=ApplyResourceChange timestamp=2024-02-21T09:10:52.765+0100
2024-02-21T09:10:52.813+0100 [ERROR] vertex "module.eks_blueprints_addons.module.velero.helm_release.this[0]" error: failed pre-install: 1 error occurred:
	* job failed: BackoffLimitExceeded

Kubectl events:

LAST SEEN   TYPE      REASON                 OBJECT                          MESSAGE
50m         Normal    SuccessfulCreate       job/velero-upgrade-crds         Created pod: velero-upgrade-crds-5g6xq
50m         Normal    Pulled                 pod/velero-upgrade-crds-5g6xq   Container image "docker.io/bitnami/kubectl:1.27" already present on machine
50m         Normal    Created                pod/velero-upgrade-crds-5g6xq   Created container kubectl
50m         Normal    Started                pod/velero-upgrade-crds-5g6xq   Started container kubectl
50m         Normal    Scheduled              pod/velero-upgrade-crds-5g6xq   Successfully assigned velero/velero-upgrade-crds-5g6xq to ip-10-0-48-42.ec2.internal
49m         Normal    Started                pod/velero-upgrade-crds-5g6xq   Started container velero
49m         Normal    Created                pod/velero-upgrade-crds-5g6xq   Created container velero
49m         Normal    Pulled                 pod/velero-upgrade-crds-5g6xq   Container image "velero/velero:v1.11.0" already present on machine
49m         Warning   BackOff                pod/velero-upgrade-crds-5g6xq   Back-off restarting failed container velero in pod velero-upgrade-crds-5g6xq_velero(5ddae231-373d-42dd-a2ff-daea57c2ed5f)
49m         Warning   BackoffLimitExceeded   job/velero-upgrade-crds         Job has reached the specified backoff limit
49m         Normal    SuccessfulDelete       job/velero-upgrade-crds         Deleted pod: velero-upgrade-crds-5g6xq
21m         Normal    Scheduled              pod/velero-upgrade-crds-mlft5   Successfully assigned velero/velero-upgrade-crds-mlft5 to ip-10-0-49-177.ec2.internal
21m         Normal    SuccessfulCreate       job/velero-upgrade-crds         Created pod: velero-upgrade-crds-mlft5
21m         Normal    Pulling                pod/velero-upgrade-crds-mlft5   Pulling image "docker.io/bitnami/kubectl:1.27"
21m         Normal    Started                pod/velero-upgrade-crds-mlft5   Started container kubectl
21m         Normal    Created                pod/velero-upgrade-crds-mlft5   Created container kubectl
21m         Normal    Pulled                 pod/velero-upgrade-crds-mlft5   Successfully pulled image "docker.io/bitnami/kubectl:1.27" in 3.611612517s (3.61162356s including waiting)
21m         Normal    Pulling                pod/velero-upgrade-crds-mlft5   Pulling image "velero/velero:v1.11.0"
20m         Normal    Created                pod/velero-upgrade-crds-mlft5   Created container velero
20m         Normal    Started                pod/velero-upgrade-crds-mlft5   Started container velero
21m         Normal    Pulled                 pod/velero-upgrade-crds-mlft5   Successfully pulled image "velero/velero:v1.11.0" in 2.043041549s (2.043052288s including waiting)
20m         Normal    Pulled                 pod/velero-upgrade-crds-mlft5   Container image "velero/velero:v1.11.0" already present on machine
20m         Warning   BackOff                pod/velero-upgrade-crds-mlft5   Back-off restarting failed container velero in pod velero-upgrade-crds-mlft5_velero(27000cbe-13f8-48a3-a8a2-0d429e41e532)
20m         Normal    SuccessfulDelete       job/velero-upgrade-crds         Deleted pod: velero-upgrade-crds-mlft5
20m         Warning   BackoffLimitExceeded   job/velero-upgrade-crds         Job has reached the specified backoff limit

The Velero chart version has been pinned to 3.2.0 in this add-on since v1.5.0. I tried setting the chart to 3.1.6, the previous version tested for this add-on, but the issue remains.

enable_velero = true

velero = {
  s3_backup_location = local.velero_s3_location
  chart_version      = "3.1.6"
}
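
One possible stopgap, assuming the chart's upgradeCRDs toggle (it defaults to true and controls the velero-upgrade-crds hook job), would be to skip the failing hook entirely. CRDs then have to be kept up to date out of band, so this trades the crash for manual CRD management:

velero = {
  s3_backup_location = local.velero_s3_location
  values = [yamlencode({
    # Skip the pre-install/pre-upgrade CRD job entirely (a stopgap, not a fix)
    upgradeCRDs = false
  })]
}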
@carlosrodlop carlosrodlop changed the title [Velero] Velero deployment fails [Velero add-on] Velero deployment fails Feb 21, 2024
@carlosrodlop carlosrodlop changed the title [Velero add-on] Velero deployment fails [Velero add-on] Velero deployment fails: velero-upgrade-crds job does not finish Feb 21, 2024
@carlosrodlop carlosrodlop changed the title [Velero add-on] Velero deployment fails: velero-upgrade-crds job does not finish [Velero add-on] Velero deployment fails: Back-off restarting failed container velero in pod velero-upgrade-crds-* Feb 21, 2024
wellsiau-aws commented:

Relates to this upstream issue.


carlosrodlop commented Feb 23, 2024

Passing the following Helm values file, velero-values.yml, to the Velero add-on solves the reported issue.

velero-values.yml:

#https://artifacthub.io/packages/helm/vmware-tanzu/velero
#https://github.com/vmware-tanzu/helm-charts/blob/main/charts/velero/values.yaml

kubectl:
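  # Pin the kubectl image used by the velero-upgrade-crds hook job to a fixed tag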
  image:
    tag: 1.26.14-debian-11-r6

main.tf:

...

module "eks_blueprints_addons" {
  # ...

  enable_velero = true

  velero = {
    values             = [file("velero-values.yml")]
    s3_backup_location = local.velero_s3_location
  }

  # ...
}
...
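
If a separate values file is not desirable, the same pin can be expressed inline with Terraform's yamlencode() function; a minimal sketch, equivalent to the file-based values above:

velero = {
  s3_backup_location = local.velero_s3_location
  values = [yamlencode({
    # Pin the kubectl image tag for the upgrade-crds hook job
    kubectl = {
      image = {
        tag = "1.26.14-debian-11-r6"
      }
    }
  })]
}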

github-actions bot commented Mar 26, 2024

This issue has been automatically marked as stale because it has been open 30 days
with no activity. Remove stale label or comment or this issue will be closed in 10 days

@github-actions github-actions bot added the stale label Mar 26, 2024

github-actions bot commented Apr 5, 2024

Issue closed due to inactivity.

@github-actions github-actions bot closed this as not planned Apr 5, 2024