Skip to content

Adding release test capability #352

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Apr 17, 2025
Merged

Adding release test capability #352

merged 15 commits into from
Apr 17, 2025

Conversation

geomin12
Copy link
Contributor

@geomin12 geomin12 commented Apr 3, 2025

This file gets triggered for any nightly release workflow finishing, but it's very niche to mi300 since that is the only machine we can run tests on.

Changes:

  • When a release assets workflow finishes, this file will:
    • download the tar file
    • untar it
    • run sanity checks, hipblaslt and rocblas tests
    • whether pass or fail for tests, it will report results to the release notes

Tested results and run with ubuntu-latest (obviously causes failure) in this release notes page and here is the workflow run

This will enable testing for nightly releases and report results

Next todos:

  • I know i know, the testing here is meh. not parallelized in any way so it'll be slow. next PR will look into improving that
    • lot of duplication with the test_linux_packages.yml... opportunity to combine 🤑!
  • Adding support for more machines so we don't have to hard code gfx94X

Closes #368

geomin12 added 2 commits April 3, 2025 16:28
* Adding initial test release file

* Format fix

* Format fix

* It appears ordering matters

* It appears ordering matters

* ok if statement not a fan there

* ok if statement not a fan there

* Adding matrix generation

* Fixing permissions

* adding python

* Correcting json

* simplifying matrix

* Fixing tar

* Adding mkdir

* Adding release notes

* testing with ubuntu

* Testing this

* Missing quotes

* Trying this

* Trying this

* Writing to markdown

* Okay this should work

* Adding cancel step

* markdown cleanup

* Adding cleanup and docs
@geomin12 geomin12 requested a review from marbre April 7, 2025 22:34
Comment on lines +87 to +100
- name: Run hipBLASLt tests
if: '!cancelled()'
id: hipblaslt-tests
run: |
source ${VENV_DIR}/bin/activate
${THEROCK_BIN_DIR}/hipblaslt-test --gtest_filter=*pre_checkin*

- name: Run rocBLAS tests
if: '!cancelled()'
id: rocblas-tests
run: |
ROCBLAS_TENSILE_LIBPATH="${THEROCK_BIN_DIR}/lib/rocblas/library/"
source ${VENV_DIR}/bin/activate
${THEROCK_BIN_DIR}/rocblas-test --yaml ${THEROCK_BIN_DIR}/rocblas_smoke.yaml
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we reuse here what you've recently added to the CI?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'll need to do some rework on both sides to consolidate and share tests, since it seems duplicating

and for these tests, it'll involve some upload/download assets and hopefully that won't be too costly in time

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'll actually scope this PR to just adding the capability to test, then in a future PR, i'll work on consolidating and combining the test efforts

@geomin12 geomin12 requested a review from marbre April 8, 2025 14:38
@geomin12 geomin12 added gfx94X-linux bug Something isn't working and removed bug Something isn't working labels Apr 8, 2025
Copy link
Member

@marbre marbre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Next round of comments :)

@geomin12 geomin12 requested a review from marbre April 9, 2025 21:15
type: string

jobs:
generate_release_matrix:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure the job need to generate the release matrix to test for itself. This could be an input form the prior job. With this it's also more easy to trigger exactly one run on a specific system and for a specific GPU.

@geomin12 geomin12 requested a review from marbre April 17, 2025 00:14
Copy link
Member

@marbre marbre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks overall pretty good! The one thing I am little concerned about is if appending to a release like https://github.com/ROCm/TheRock/releases/tag/nightly-release won't be a little noisy, in particular as soon as further components will be tested. I think we need to improve this somehow ...

@geomin12
Copy link
Contributor Author

Looks overall pretty good! The one thing I am little concerned about is if appending to a release like https://github.com/ROCm/TheRock/releases/tag/nightly-release won't be a little noisy, in particular as soon as further components will be tested. I think we need to improve this somehow ...

yes good point, i also had concerns since that data will be appending essentially forever

My current thought is, we can append the results in each "job github output" like this iree example, then in the release notes, we can just add SUCCESS/FAIL notes on the most recent run.

That way, we can keep track of historic results but not make the nightly-release tag too noisy

@geomin12 geomin12 requested a review from marbre April 17, 2025 17:11
@marbre
Copy link
Member

marbre commented Apr 17, 2025

Looks overall pretty good! The one thing I am little concerned about is if appending to a release like https://github.com/ROCm/TheRock/releases/tag/nightly-release won't be a little noisy, in particular as soon as further components will be tested. I think we need to improve this somehow ...

yes good point, i also had concerns since that data will be appending essentially forever

My current thought is, we can append the results in each "job github output" like this iree example, then in the release notes, we can just add SUCCESS/FAIL notes on the most recent run.

That way, we can keep track of historic results but not make the nightly-release tag too noisy

Oh yeah, I like that 👍

Copy link
Member

@marbre marbre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should work, it's at least fine to land and otherwise fix forward :)

@geomin12 geomin12 merged commit 59c324a into ROCm:main Apr 17, 2025
3 of 5 checks passed
@geomin12 geomin12 deleted the release-test branch April 17, 2025 17:52
@github-project-automation github-project-automation bot moved this from TODO to Done in TheRock Triage Apr 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Test nightly release tarballs
2 participants