Adding Azure monitor metrics to TSC repo for node availability #345

Merged
shaheislamdfe merged 2 commits into main from 889-monitor-node-pool-cpu-memory-pressure2 on Jan 10, 2025

Conversation

Contributor

@shaheislamdfe shaheislamdfe commented Jan 9, 2025

Context

We need a way to alert when our node pool is close to max capacity so that we can pre-emptively scale if necessary.

Changes proposed in this pull request

Adds an Azure Monitor metric alert with an action group for each environment, and sets the alert threshold dynamically based on the max count.
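
One way to picture a threshold that follows the max count is as a fixed percentage of each environment's maximum node count. The sketch below is an illustration only: the PR does not state the formula it uses, and the function name `node_alert_threshold`, the 80% default, and the environment names and counts are all hypothetical.

```python
def node_alert_threshold(max_node_count: int, percent: int = 80) -> int:
    """Return the node count at which a capacity alert would fire.

    Assumption: the threshold is a whole-number percentage of the node
    pool's max count, rounded down, and never lower than one node.
    """
    if max_node_count < 1:
        raise ValueError("max_node_count must be at least 1")
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    # Integer arithmetic avoids floating-point rounding surprises.
    return max(1, max_node_count * percent // 100)


# Hypothetical per-environment max counts, for illustration only.
max_counts = {"development": 5, "production": 20}
thresholds = {env: node_alert_threshold(count) for env, count in max_counts.items()}
```

Deriving the threshold from the max count means an environment whose node pool limit changes gets a matching alert level without a separate hard-coded number to keep in sync.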

Guidance to review

This has been checked in the Azure Monitor UI and is working correctly.

Checklist

  • I have performed a self-review of my code, including formatting and typos
  • I have cleaned the commit history
  • I have added the Devops label
  • I have attached the pull request to the trello card

…check for each environment with separate action group and monitor per environment
Contributor

@neillturner neillturner left a comment

You should create the action group in Terraform, not in the Makefile. See my PR #343.
We probably need to coordinate these two PRs so that we share the same action group and decide whether alerts go to email, Slack, or both. Also, I put my Azure alert in the existing monitor tf file.

@RMcVelia
Collaborator

You should create the action group in Terraform, not in the Makefile. See my PR #343. We probably need to coordinate these two PRs so that we share the same action group and decide whether alerts go to email, Slack, or both. Also, I put my Azure alert in the existing monitor tf file.

Our existing standard is to create all the action groups via the Makefile. We use email because Slack webhooks for Azure alerting require extra infrastructure (a Logic App or similar).

Contributor

@neillturner neillturner left a comment

As Rick says, we create the action group in the Makefile, so this is approved.

@shaheislamdfe shaheislamdfe merged commit 52bb85b into main Jan 10, 2025
3 checks passed
@shaheislamdfe shaheislamdfe deleted the 889-monitor-node-pool-cpu-memory-pressure2 branch January 10, 2025 12:52

3 participants