ROCm/hipBLASLt@7f76af3 failing with OSError: Failed to locate rocm-smi
#359
Comments
Running a build with the above instructions, I couldn't reproduce the issue.

    $ cmake --build build --target hipBLASLt
    ...
    [hipBLASLt] [37/37] Creating library symlink library/libhipblaslt.so.0 library/libhipblaslt.so
    [hipBLASLt completed in 6583 seconds]
    [114/114] Merging sub-project dist directory for hipBLASLt

But I notice that rocm-smi is in my path:

    $ whereis rocm-smi
    rocm-smi: /usr/bin/rocm-smi /opt/rocm-6.4.0/bin/rocm-smi

Not that I'm an advocate for the code, but if we look at how this key is set in hipBLASLt:

    globalParameters["ROCmPath"] = "/opt/rocm"
    if "ROCM_PATH" in os.environ:
        globalParameters["ROCmPath"] = os.environ.get("ROCM_PATH")
    ...
    globalParameters["ROCmBinPath"] = os.path.join(globalParameters["ROCmPath"], "bin")
    globalParameters["ROCmSMIPath"] = locateExe(globalParameters["ROCmBinPath"], "rocm-smi")

Then in

    for path in os.environ["PATH"].split(os.pathsep):
        exePath = os.path.join(path, exeName)

It looks like we're hitting an untested edge case where rocm-smi isn't found in any of ROCM_PATH, PATH, or /opt/rocm. I can clean this code up to search for rocm-smi slightly more idiomatically, but in the meantime, I'm curious what your environment looks like?

Details:

ROCm stack: 6.4.43481-46320a638

Commit:

    $ git show
    commit 08dfb3c2c1b291e5cebaa3b624375e054720a794 (HEAD -> users/marbre/bump-20250704-blas, origin/users/marbre/bump-20250704-blas)
    Author: Marius Brehler <marius.brehler@amd.com>
    Date: Mon Apr 7 22:05:05 2025 +0000

        Bump BLAS submodules 20250407 |
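For reference, the lookup order discussed above (ROCM_PATH or the /opt/rocm default first, then PATH) could be sketched with the standard library's `shutil.which`. This is only an illustration, not the code in Tensile: `find_rocm_smi` and its `default_root` parameter are hypothetical names, and returning `None` instead of raising is an assumption about what a cleanup might look like.

```python
import os
import shutil
from typing import Optional


def find_rocm_smi(default_root: str = "/opt/rocm") -> Optional[str]:
    """Hypothetical helper: return the path to rocm-smi, or None if absent."""
    exe = "rocm-smi"
    # 1. Honor ROCM_PATH when it is set, otherwise use the default root.
    rocm_root = os.environ.get("ROCM_PATH", default_root)
    found = shutil.which(exe, path=os.path.join(rocm_root, "bin"))
    if found:
        return found
    # 2. Fall back to the directories listed in PATH.
    return shutil.which(exe)
```

A caller could then decide whether a missing rocm-smi is fatal for the step it is running, rather than failing during parameter setup.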
Thanks - the reporter claims that their WSL system is missing it. I haven't seen this myself. |
I think this is only partly related. It is an issue in a WSL system missing it but we're hitting the same whenever we try to build in an environment without a pre-installed ROCm, which shouldn't be a hard requirement. Hence |
@marbre it seems like we have two issues here:
If we found a way to build without using rocm-smi, would that solve the issue? I don't think rocm-smi is required to build hipBLASLt, but because Tensile and TensileCreateLibrary are lumped together, we unconditionally check for the existence of rocm-smi. Also, the way that Tensile currently accommodates ROCm installations outside of "conventional" locations is through the
I would like to move in a direction where we forward along toolchain information detected by CMake and remove all related logic in Tensile, but it will take some time to get there. |
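One way to stop the check from being unconditional is to resolve the tool only when something actually asks for it. The sketch below is an assumption about how that could look, not existing Tensile code: `ToolPaths` and its `rocm_smi` property are hypothetical names, and the error message is taken from this issue's title.

```python
import shutil
from functools import cached_property


class ToolPaths:
    """Hypothetical container that resolves tools on first use."""

    @cached_property
    def rocm_smi(self) -> str:
        # The lookup runs only when the attribute is first read, so code
        # paths that never touch rocm-smi do not need it to be installed.
        path = shutil.which("rocm-smi")
        if path is None:
            raise OSError("Failed to locate rocm-smi")
        return path
```

With this shape, constructing `ToolPaths()` succeeds in an environment without ROCm, and the `OSError` is raised only by code that reads `rocm_smi`.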
With the patch tracked in #380, this is actually resolved. It would be nice to get this into hipBLASLt instead of carrying the patch as part of TheRock.
For Tensile, we have at minimum https://github.com/ROCm/TheRock/blob/main/patches/amd-mainline/hipBLASLt/0002-Do-not-hard-code-hipBLASLt-to-find-tools-in-opt-rocm.patch (tracked in #262). Issues tracking all hipBLASLt related patches are here: https://github.com/ROCm/TheRock/issues?q=is%3Aissue%20state%3Aopen%20hipBLASLt%20label%3Apatch |
Issue
When trying to bump the hipBLASLt submodule to ROCm/hipBLASLt@7f76af3, building for gfx94X-dcgpu fails with
The line raising the OSError, https://github.com/ROCm/hipBLASLt/blob/e9fa8851fbbb1441b67ef0f9c42bdcae8318a7f7/tensilelite/Tensile/Common/Utilities.py#L104, was introduced as part of commit ROCm/hipBLASLt@422087b.
Changes to reproduce are on branch bump-20250704-blas.
Steps to Reproduce
Hints
build/math-libs/BLAS/hipBLASLt/build
and trigger a build there.