Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ck8s ops velero command and clean up new velero finalizers #2141

Merged

Conversation

anders-elastisys
Copy link
Contributor

@anders-elastisys anders-elastisys commented May 16, 2024

Warning

This is a public repository, ensure not to disclose:

  • personal data beyond what is necessary for interacting with this pull request, nor
  • business confidential information, such as customer names.

What kind of PR is this?

Required: Mark one of the following that is applicable:

  • kind/feature
  • kind/improvement
  • kind/deprecation
  • kind/documentation
  • kind/clean-up
  • kind/bug
  • kind/other

Optional: Mark one or more of the following that are applicable:

Important

Breaking changes should be marked kind/admin-change or kind/dev-change depending on type
Critical security fixes should be marked with kind/security

  • kind/admin-change
  • kind/dev-change
  • kind/security
  • kind/adr

Platform Administrator notice

It is now possible to run velero commands with ck8s ops velero <wc|sc>

What does this PR do / why do we need this PR?

Running clean script got stuck when trying to delete Velero CRDs after having upgraded to the new Velero chart. This was caused by newly added finalizers to Velero resources (see breaking changes in this Velero release). This PR adds a step for running velero uninstall before deleting the velero namespace or any of its CRDs, as it is the recommended way according to their docs. To achieve this in the clean scripts, a new ck8s ops command was added for velero to be able to run the velero CLI against the correct cluster context.

Information to reviewers

Checklist

  • Proper commit message prefix on all commits
  • Change checks:
    • The change is transparent
    • The change is disruptive
    • The change requires no migration steps
    • The change requires migration steps
    • The change upgrades CRDs
    • The change updates the config and the schema
  • Metrics checks:
    • The metrics are still exposed and present in Grafana after the change
    • The metrics names didn't change (Grafana dashboards and Prometheus alerts are not affected)
    • The metrics names did change (Grafana dashboards and Prometheus alerts were fixed)
  • Logs checks:
    • The logs do not show any errors after the change
  • Pod Security Policy checks:
    • Any changed pod is covered by Pod Security Admission
    • Any changed pod is covered by Gatekeeper Pod Security Policies
    • The change does not cause any pods to be blocked by Pod Security Admission or Policies
  • Network Policy checks:
    • Any changed pod is covered by Network Policies
    • The change does not cause any dropped packets in the NetworkPolicy Dashboard
  • Audit checks:
    • The change does not cause any unnecessary Kubernetes audit events
    • The change requires changes to Kubernetes audit policy
  • Falco checks:
    • The change does not cause any alerts to be generated by Falco
  • Bug checks:
    • The bug fix is covered by regression tests

@anders-elastisys anders-elastisys force-pushed the anders-elastisys/fix-clean-scripts-for-velero branch 3 times, most recently from afb615b to 34d66ba Compare May 16, 2024 14:53
@anders-elastisys anders-elastisys marked this pull request as ready for review May 20, 2024 07:32
@anders-elastisys anders-elastisys changed the title bin: fix clean scripts after new velero finalizers Fix clean scripts after new Velero finalizers May 20, 2024
Copy link
Contributor

@simonklb simonklb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like it's working around something that was added with a purpose. Is there something preventing us from fulfilling the finalizers instead of just patching them?

@robinAwallace
Copy link
Contributor

This feels like it's working around something that was added with a purpose. Is there something preventing us from fulfilling the finalizers instead of just patching them?

I had the same thought. Should we try to use helmfile destroy more? Will that satisfy finalizers? Also I saw that Velero recommends doing velero uninstall to completely removing velero from a cluster.

Prior to v1.12, deleting the Velero namespace would easily remove all the resources within it.
However, with the introduction of finalizers attached to the Velero CR including restore, dataupload, and datadownload in this version, directly deleting Velero namespace may get stuck indefinitely because the pods responsible for handling the finalizers might be deleted before the resources attached to the finalizers.
To avoid this issue, please use the command velero uninstall to delete all the Velero resources or ensure that you handle the finalizer appropriately before deleting the Velero namespace.

@anders-elastisys
Copy link
Contributor Author

This feels like it's working around something that was added with a purpose. Is there something preventing us from fulfilling the finalizers instead of just patching them?

@simonklb ye, my reasoning would be that there should not matter in this case as we just want to be able to delete everything without interruption, but I will look into using velero uninstall instead as it is the recommended way.

I had the same thought. Should we try to use helmfile destroy more? Will that satisfy finalizers?

@robinAwallace Helmfile destroy would not delete all Velero resources that have finalizers in this case since some are created after deployment such as restores.

Also I saw that Velero recommends doing velero uninstall to completely removing velero from a cluster.

I saw that too and was considering using it, but wasn't sure about adding it as another dependency to apps, but it might be good to do, as then we can also ensure the Velero CLI is installed with the a version matching the cluster as well. I can look into adding that and uninstall Velero that way, as that is the recommended way.

@simonklb
Copy link
Contributor

This feels like it's working around something that was added with a purpose. Is there something preventing us from fulfilling the finalizers instead of just patching them?

@simonklb ye, my reasoning would be that there should not matter in this case as we just want to be able to delete everything without interruption

👍 The only thing I'm worried about is that if we bypass the finalizers and delete we might end up leaving things behind. I assume the finalizers are there to make sure that everything gets cleaned up and terminated gracefully.

On the other hand, it's not such a big deal if backups are left in the object storage since we delete it eventually as well.

but I will look into using velero uninstall instead as it is the recommended way.

Yea, if it's not a huge PITA it might be worth doing it "the right way". 😄

@anders-elastisys anders-elastisys force-pushed the anders-elastisys/fix-clean-scripts-for-velero branch from bcdd411 to eea192f Compare May 22, 2024 14:08
@anders-elastisys
Copy link
Contributor Author

Added so Velero is installed with install-requirements, also added it as an ops command, let me know what you think of this, I think it can be a nice feature to run Velero for the correct cluster context, it makes it easier to run velero uninstall in this case from the clean scripts as well.

Copy link
Contributor

@simonklb simonklb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice! :)

Would love to see the fix to the challenges cleanup in a separate commit, but that is completely optional.

@anders-elastisys anders-elastisys changed the title Fix clean scripts after new Velero finalizers Add ck8s ops velero command and clean up new velero finalizers May 23, 2024
@anders-elastisys anders-elastisys force-pushed the anders-elastisys/fix-clean-scripts-for-velero branch from 4841fe7 to c984b7c Compare May 23, 2024 06:19
@anders-elastisys anders-elastisys merged commit 7602a09 into main May 23, 2024
10 checks passed
@anders-elastisys anders-elastisys deleted the anders-elastisys/fix-clean-scripts-for-velero branch May 23, 2024 11:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants