[AVM CI Environment Issue]: Error when checking deployment status #2865

Open
1 task done
cecheta opened this issue Jul 29, 2024 · 5 comments · May be fixed by #4205
Labels
Needs: Core Team 🧞 This item needs the AVM Core Team to review it Type: AVM 🅰️ ✌️ Ⓜ️ This is an AVM related issue Type: Bug 🐛 Something isn't working Type: CI 🚀 This issue is related to the AVM CI

Comments

@cecheta
Member

cecheta commented Jul 29, 2024

Check for previous/existing GitHub issues

  • I have checked for previous/existing GitHub issues

Issue Type?

Bug

Description

There is an intermittent error that can occur when checking the deployment status during CI:

  VERBOSE: 12:13:38 - Checking deployment status in 8 seconds
  VERBOSE: Resource deployment Failed.. (1/3) Retrying in 5 Seconds.. 
  
  VERBOSE: An error occurred while sending the request.
  
  VERBOSE: Deploying with deployment name [a-p-ap-b-max-t2-20240729T1207070374Z]
  VERBOSE: Setting context to subscription [***]
  VERBOSE: Using Bicep v0.29.47
  ...
...

When this occurs, a new deployment is started even though the first deployment is actually still ongoing. The second deployment is then likely to fail because there are essentially two deployments running at the same time.

Perhaps a retry could be added when checking the deployment status?
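
A minimal sketch of what such a retry might look like, written in Python purely for illustration (the actual CI scripts are not shown in this issue). `TransientRequestError` and the `get_status` callable are hypothetical stand-ins for the "An error occurred while sending the request." failure and for whatever call fetches the deployment status. The point is that only the status check is retried, so no second deployment is started.

```python
import time
from typing import Callable


class TransientRequestError(Exception):
    """Hypothetical stand-in for 'An error occurred while sending the request.'"""


def check_status_with_retry(
    get_status: Callable[[], str],
    max_attempts: int = 3,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry only the status check; the deployment itself is never restarted."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return get_status()
        except TransientRequestError:
            # Give up only once every attempt has failed.
            if attempt == max_attempts:
                raise
            sleep(delay_seconds)
    raise AssertionError("unreachable")
```

With this shape, a single dropped request costs one short wait instead of a competing redeployment.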

@cecheta cecheta added Needs: Triage 🔍 Maintainers need to triage still Type: AVM 🅰️ ✌️ Ⓜ️ This is an AVM related issue Needs: Core Team 🧞 This item needs the AVM Core Team to review it Type: CI 🚀 This issue is related to the AVM CI labels Jul 29, 2024
@github-project-automation github-project-automation bot moved this to Needs: Triage in AVM - Issue Triage Jul 29, 2024

Important

The "Needs: Triage 🔍" label must be removed once the triage process is complete!

Tip

For additional guidance on how to triage this issue/PR, see the BRM Issue Triage documentation.

@microsoft-github-policy-service microsoft-github-policy-service bot added the Type: Bug 🐛 Something isn't working label Jul 29, 2024

Warning

Tagging the AVM Core Team (@Azure/avm-core-team-technical-bicep) due to a module owner or contributor having not responded to this issue within 3 business days. The AVM Core Team will attempt to contact the module owners/contributors directly.

Tip

  • To prevent further actions from taking effect, the "Status: Response Overdue 🚩" label must be removed once this issue has been responded to.
  • To avoid this rule being (re)triggered, the "Needs: Triage 🔍" label must be removed as part of the triage process (when the issue is first responded to)!

@microsoft-github-policy-service microsoft-github-policy-service bot added the Status: Response Overdue 🚩 When an issue/PR has not been responded to for X amount of days label Aug 1, 2024
@AlexanderSehr
Contributor

Hey @cecheta,
good catch. I think I've seen this happening in a recent APIM deployment. This should be addressed but will be challenging. For one, we need to reproduce the issue while debugging. Then, we must hope that ARM actually returns a proper error that we can interpret because, more often than not, information is written to the log but not actually returned by the cmdlet.
If it turns out it does not return anything useful, we may need to resort to more drastic means and add logic that picks up after the deployment cmdlet and always pings the deployment itself with some waiting logic (effectively polling the deployment data every x seconds until it's done).
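
The "ping the deployment until it's done" idea could be sketched roughly as follows, again in Python for illustration only. The `get_state` callable is a hypothetical placeholder for a call that returns the deployment's provisioning state, and the set of terminal states, the interval and the timeout are assumptions rather than values taken from the AVM CI.

```python
import time
from typing import Callable

# Assumed terminal provisioning states; anything else is treated as "still running".
TERMINAL_STATES = {"Succeeded", "Failed", "Canceled"}


def wait_for_deployment(
    get_state: Callable[[], str],
    poll_seconds: float = 10.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll the deployment state until it is terminal or the timeout elapses."""
    deadline = clock() + timeout_seconds
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            raise TimeoutError(f"deployment still in state {state!r} after timeout")
        sleep(poll_seconds)
```

Because the loop only reads the state, it can run after the deployment cmdlet returns or errors without triggering another deployment.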

Would you happen to have noticed a service where this occurs somewhat consistently?

@AlexanderSehr AlexanderSehr removed Needs: Triage 🔍 Maintainers need to triage still Status: Response Overdue 🚩 When an issue/PR has not been responded to for X amount of days labels Aug 1, 2024
@cecheta
Member Author

cecheta commented Aug 12, 2024

Unfortunately I haven't observed this behaviour consistently for any service

@AlexanderSehr
Contributor

Quick update on this: @mbrat2005 was keen enough to recently implement logic to tackle this issue, which is currently waiting in this draft PR: #4205. The idea is essentially to not rely on the replies of the invoking function, but instead to target the API endpoint directly.

I have a few bigger topics left to tackle, but then plan to deep dive into that logic and get it merged into our deployment logic.

@AlexanderSehr AlexanderSehr linked a pull request Feb 11, 2025 that will close this issue
11 tasks
Projects
Status: Needs: Triage