-
Notifications
You must be signed in to change notification settings - Fork 435
feat: TCP health check integration + ECS health checks on staging #5377
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎ 3 Skipped Deployments
|
Docker builds report
|
Uffizzi Preview |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #5377 +/- ##
=======================================
Coverage 97.61% 97.61%
=======================================
Files 1237 1237
Lines 42975 42975
=======================================
Hits 41952 41952
Misses 1023 1023 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
api/scripts/healthcheck.py
Outdated
logger.error(f"Health check failed with status {status_code}") | ||
|
||
sys.exit(0 if 200 >= status_code < 300 else 1) | ||
args = parser.parse_args() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these TCP health checks only used for service discovery, and only in SaaS environments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So far, yes — but we'll use them in Helm as well, see https://github.com/Flagsmith/self-hosted/blob/d1d19e77b5f22313024d49faa6e02728652fb721/flagsmith-values.yaml#L15-L22 and https://flagsmith.slack.com/archives/C07G7UAKPNH/p1745340333442389
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re https://flagsmith.slack.com/archives/C07G7UAKPNH/p1745340333442389
Does AWS alb support exec health checks? Think it only supports http?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The health checks added here are for Cloud Map only, and are expected to be run by ECS, which does not support HTTP.
"healthCheck": { | ||
"command": [ | ||
"CMD-SHELL", | ||
"python scripts/healthcheck.py" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest adding an entrypoint command to run-docker.sh
such as healthcheck
, to abstract ECS/k8s/etc away from needing to know about Python or the path to the healthcheck.py
file. That way we can change the healthcheck implementation easily without needing to modify how we're calling it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is done in 711218a .
As discussed offline, we are now using flagsmith healthcheck
commands.
Co-authored-by: Rodrigo López Dato <rodrigo.lopezdato@flagsmith.com>
"healthCheck": { | ||
"command": [ | ||
"CMD-SHELL", | ||
"flagsmith healthcheck tcp" | ||
], | ||
"interval": 10, | ||
"timeout": 2, | ||
"retries": 5, | ||
"startPeriod": 5 | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@khvn26 Were you able to test this health check in ECS before this reaches production? Sorry I'm late to the party.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, see screenshot evidence in the PR description + Grafana dashboard wouldn't work without these health checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM if the ECS health check works.
Thanks for submitting a PR! Please check the boxes below:
docs/
if required so people know about the feature!Changes
This picks the
flagsmith healthcheck
command implemented in Flagsmith/flagsmith-common#60 and adds it to ECS task definitions on staging. This is required for service discovery (and metric collection) to work.How did you test this code?
Tasks now show up as healthy in ECS console: