add a CUDA device code sanity check #4692
base: 5.0.x
Conversation
…_capabilities when CUDA is used
from easybuild.tools.systemtools import get_shared_lib_ext, pick_system_specific_value, use_group
from easybuild.tools.systemtools import check_linked_shared_libs, det_parallelism, get_cuda_device_code_architectures
from easybuild.tools.systemtools import get_linked_libs_raw, get_shared_lib_ext, pick_system_specific_value, use_group
from easybuild.tools.toolchain.toolchain import TOOLCHAIN_CAPABILITY_CUDA
'easybuild.tools.toolchain.toolchain.TOOLCHAIN_CAPABILITY_CUDA' imported but unused
It's great that you looked into this; we've also been discussing it in EESSI: https://gitlab.com/eessi/support/-/issues/92
@ocaisa thanks for the link, I'll take a look. Currently, the main things I still plan to add to this PR:
I think it's a good idea to check for device code and PTX (with lack of PTX for the highest compute capability being a warning). The availability of PTX will allow you to run the application on future architectures.
easybuild/tools/systemtools.py
Outdated
""" | ||
|
||
# cudaobjdump uses the sm_XY format | ||
device_code_regex = re.compile('(?<=arch = sm_)([0-9])([0-9]+a{0,1})') |
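As a quick illustration of what this pattern extracts (the sample strings below are hand-written, not real cuobjdump output):

```python
import re

# same pattern as above: group 1 is the major CC digit, group 2 the remainder,
# including the optional architecture-specific 'a' suffix (e.g. sm_90a)
device_code_regex = re.compile('(?<=arch = sm_)([0-9])([0-9]+a{0,1})')

print(device_code_regex.search('arch = sm_80').groups())   # ('8', '0')
print(device_code_regex.search('arch = sm_90a').groups())  # ('9', '0a')
```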
It would be good to also capture whether the code can be JIT-compiled (so it can at least run on a future arch). In a script I had, I did this with:

    # Regex to find multiple PTX and ELF sections
    ptx_matches = re.findall(r'Fatbin ptx code:\n=+\narch = sm_(\d+)', result.stdout)
    elf_matches = re.findall(r'Fatbin elf code:\n=+\narch = sm_(\d+)', result.stdout)

    # Debug: show if matches were found for PTX and ELF sections
    if debug:
        print(f"PTX Matches: {ptx_matches}")
        print(f"ELF Matches: {elf_matches}")

    # Return all PTX and ELF matches; remove duplicates using set and convert to sorted lists
    return {
        "ptx": sorted(set(ptx_matches)),  # list of unique PTX capabilities
        "elf": sorted(set(elf_matches)),  # list of unique ELF capabilities
    }
In fact, `re.compile('(?<=arch = sm_)([0-9])([0-9]+a{0,1})')` is not specific enough, because it will treat the `Fatbin ptx code` and `Fatbin elf code` sections the same: it'll just extract any `arch =` string it can find.
To have a concrete example of something that has both, one can check e.g. libcusparse:
[casparl@tcn1 ~]$ cuobjdump /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/software/CUDA/12.1.1/lib64/libcusparse.so | grep -A 5 ptx | tail -n 12
================
arch = sm_80
code version = [8,1]
host = linux
compile_size = 64bit
--
Fatbin ptx code:
================
arch = sm_90
code version = [8,1]
host = linux
compile_size = 64bit
[casparl@tcn1 ~]$ cuobjdump /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/software/CUDA/12.1.1/lib64/libcusparse.so | grep -A 5 elf | tail -n 12
================
arch = sm_80
code version = [1,7]
host = linux
compile_size = 64bit
--
Fatbin elf code:
================
arch = sm_90
code version = [1,7]
host = linux
compile_size = 64bit
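Putting the section-aware regexes from the comment above into a self-contained sketch (the helper name and the abridged sample output are mine, not from the PR):

```python
import re

def get_device_code_archs(cuobjdump_output):
    """Extract compute capabilities per section type from cuobjdump output.

    Distinguishes real device code (Fatbin elf) from JIT-compilable PTX
    (Fatbin ptx), which the single 'arch = sm_XY' regex cannot do.
    """
    ptx = re.findall(r'Fatbin ptx code:\n=+\narch = sm_(\d+)', cuobjdump_output)
    elf = re.findall(r'Fatbin elf code:\n=+\narch = sm_(\d+)', cuobjdump_output)
    # deduplicate and sort so the result is stable regardless of section order
    return {'ptx': sorted(set(ptx)), 'elf': sorted(set(elf))}

# abridged stand-in for real cuobjdump output, like the libcusparse dump above
sample = '\n'.join([
    'Fatbin elf code:',
    '================',
    'arch = sm_80',
    'Fatbin ptx code:',
    '================',
    'arch = sm_90',
])
print(get_device_code_archs(sample))  # {'ptx': ['90'], 'elf': ['80']}
```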
@@ -3900,6 +3955,14 @@ def xs2str(xs):
else:
    self.log.debug("Skipping RPATH sanity check")

if get_software_root('CUDA'):
@boegel We have an EESSI-specific complication here. We drop CUDA to a build-time dependency so that we don't depend on the CUDA module at runtime. This means that we won't execute this code path, so we need to trigger the module load here.
I think you're right, but just to double-check: are the build dependencies unloaded at sanity check time?
Could we fix this through an EasyBuild hook in EESSI that loads the CUDA that was a build dependency also in the sanity_check_step (and unloads it after)? Should also work for EESSI-extend, and no changes on the framework side needed...
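A rough sketch of what such a hook could look like (assuming EasyBuild's step-hook convention where `pre_sanitycheck_hook` receives the EasyBlock instance; the dependency-dict keys and the `modules_tool.load()` call are my assumptions about the API, not verified against the framework):

```python
def pre_sanitycheck_hook(self, *args, **kwargs):
    """EESSI-style hook: load the CUDA build dependency so the CUDA sanity check can run."""
    # assumed layout: builddependencies() yields dicts with 'name' and 'short_mod_name'
    cuda_deps = [dep for dep in self.cfg.builddependencies() if dep['name'] == 'CUDA']
    if cuda_deps:
        self.modules_tool.load([cuda_deps[0]['short_mod_name']])
```

A matching `post_sanitycheck_hook` could unload the module again, keeping the runtime environment free of the CUDA dependency.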
Wait... actually, I don't think you're right. I just tried this with EESSI-extend, and it did run the CUDA sanity check...? I'm not sure why; I would have expected the problem you mentioned. So... why didn't it appear?
I wonder what happens in the --module-only case, when the sanity check step is being run without building first? Perhaps this really is expected behaviour?
self.log.info("Using default subdirectories for binaries/libraries to verify CUDA device code: %s",
              cuda_dirs)
else:
    self.log.info("Using default subdirectories for binaries/libraries to verify CUDA device code: %s",
This info message seems wrong: these are not the default subdirectories, this is a custom-defined bin_lib_subdirs
FYI: I checked with @jfgrimm on chat; he probably has little time to work on this in the near future. Since this is a very valuable feature for EESSI that we'd like to have before we start building a large amount of GPU software, I'll try to work on it myself. Note that @jfgrimm was OK with me pushing to his branch, so I'll do that rather than create my own PR - at least we can have the full discussion in one place, namely here.
I tested this as follows:
This resulted in
And many more. That's great, it means this PR is actually doing what it should. Indeed, checking manually:
So, yeah...
Note that there are many executables in CUDA-Samples that were built for the correct CC. E.g.:
Collecting some todos:
Ignore list seems to work. Adding
to the EasyConfig for CUDA-Samples results in
and note that this binary does not get listed in the failure message. So that's the intended behavior: the warning is still printed, but it doesn't result in an error.
Just to test: putting all of these files in the ignore list, the installation of CUDA-Samples now passes.
This provides a nice starting point for further tests: I can easily remove one from the exclude list and check that I get the expected result.
So... the whole thing with checking PTX codes makes me rethink what EasyBuild should do when a certain
But what does that mean? What do we expect the nvcc compiler to do here? Say we were to compile a simple hello world, and I would do
i.e. would it only build device code for 80/90, and not include PTX? And build both through the lowest common virtual architecture? Or should it do
i.e. also include the PTX code for the
i.e. the stage one compilation is executed once for each CUDA compute capability, so that the generated
so that it actually includes not only the device codes for CC80 and CC90, but also the PTX code for CC90 (for forwards compatibility)?

Honestly, from a performance perspective, I think it would be best if EasyBuild would indeed use the generalized arguments. I.e. my proposal would be that if EasyBuild is configured with

Note that it may not always be possible to convince all build systems to actually do this - e.g. some codes might really only compile for a single CUDA compute capability, or the build system doesn't make this distinction between real and virtual architectures to build for. Eventually, the most robust and generic way to get this done might just be to implement

I'm creating a CUDA hello world EasyConfig that we can use to serve as an easy example of 1) how we think

I'm not sure what the best way forward is. If I include everything in this PR, it may be a bit heavy - although honestly, at the framework level it's just about defining the options; the real implementation would have to be done in EasyBlocks and EasyConfigs that use this information... My plan is to include the options in this PR, and make an accompanying PR for my CUDA hello world that uses these options in the way described above. The rest is then up to anyone updating or creating new EasyBlocks/EasyConfigs that somehow use information on the CUDA compute capability.
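To make the real-vs-virtual distinction concrete, here is a sketch of how such nvcc flags could be generated mechanically from a configured list of compute capabilities (the helper name is made up; the `-gencode arch=compute_XY,code=sm_XY` / `code=compute_XY` syntax is standard nvcc usage, where `code=sm_XY` emits real device code and `code=compute_XY` embeds PTX):

```python
def nvcc_gencode_flags(compute_capabilities, ptx_for_highest=True):
    """Build nvcc -gencode flags: real device code (code=sm_XY) for every
    requested compute capability, plus PTX (code=compute_XY) for the highest
    one, so the binary can still be JIT-compiled on future architectures."""
    # '8.0' -> '80'; sort numerically so the highest CC ends up last
    ccs = sorted((cc.replace('.', '') for cc in compute_capabilities), key=int)
    flags = ['-gencode=arch=compute_%s,code=sm_%s' % (cc, cc) for cc in ccs]
    if ptx_for_highest and ccs:
        flags.append('-gencode=arch=compute_%s,code=compute_%s' % (ccs[-1], ccs[-1]))
    return ' '.join(flags)

print(nvcc_gencode_flags(['8.0', '9.0']))
```

For `['8.0', '9.0']` this yields device code for both architectures plus PTX for CC90 only, which is the "generalized arguments" behavior proposed above.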
Ok, change of plans. After thinking it over, this would be a massive scope creep that would delay the sanity check part that we primarily care about in this PR. Instead, in this PR, I'll focus on just that: a sanity check for the CUDA device codes. We can assume that everyone using EasyBuild expects this to be the meaning of the

I will retain the code that prints a warning for the PTX code not matching the highest architecture. Or maybe demote it to an info message. In any case, it's convenient for future reference if EasyBuild extracts this information.

I will not implement a strict option for the PTX code sanity check in this PR. It does not make sense to sanity-check behavior that we haven't clearly defined, i.e. there is no clear definition of what PTX code is expected to be included when someone sets
Everything not sanity-check related is now described in this issue, which can be used to create one or more follow-up PRs.
…nity check on surplus CUDA archs if this option is set. Otherwise, print warning
Tested by adding
To
my build succeeds whereas with
It fails with:
as intended. Only thing left to do for this PR is tests. Not my strong suit, to be honest, but let's see. I guess the tricky thing here is that a true test requires a real CUDA binary, and I'm not sure that's even feasible... To build one, I'd need a CUDA module in the test environment - I'm not sure we have that. I could try to find a CUDA binary that we could just install (maybe just include a hello-world type of CUDA binary) and test with that... Maybe that's the most feasible option. But I have no clue if we can reasonably include binaries in the repo under the test directory. I have an 800KB hello world binary; that shouldn't be too crazy, I guess.
What you can do is create a mock cuobjdump
Damn, you're good. It took me 25 more minutes of looking at other examples to figure out that even if I could ingest a binary, I'd lack the
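A minimal sketch of that mock approach (the paths and canned output are illustrative, not the PR's actual test code): write a tiny shell script named `cuobjdump` that prints fixed output, make it executable, and let the test pick it up instead of the real tool.

```python
import os
import stat
import subprocess
import tempfile

# canned cuobjdump output claiming the inspected file contains sm_80 device code
cuobjdump_txt_sm80 = '\n'.join([
    'Fatbin elf code:',
    '================',
    'arch = sm_80',
])

tmpdir = tempfile.mkdtemp()
mock_cuobjdump = os.path.join(tmpdir, 'cuobjdump')
with open(mock_cuobjdump, 'w') as fh:
    # a tiny shell script that ignores its arguments and prints the canned output
    fh.write('#!/bin/sh\ncat <<EOF\n%s\nEOF\n' % cuobjdump_txt_sm80)
os.chmod(mock_cuobjdump, os.stat(mock_cuobjdump).st_mode | stat.S_IXUSR)

# in a real test, tmpdir would be prepended to $PATH so the framework finds the mock
out = subprocess.run([mock_cuobjdump, '/some/binary'], capture_output=True, text=True).stdout
print('arch = sm_80' in out)  # True
```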
test_report_regexs=[regex])
blank line contains whitespace
regex += "device code architectures match those in cuda_compute_capabilities"
self.test_toy_build(extra_args=args, test_report=test_report_fp, raise_error=True,
                    test_report_regexs=[regex])
blank line contains whitespace
# Test single CUDA compute capability with --cuda-compute-capabilities=8.0 |
too many blank lines (6)
write_file(cuobjdump_file, cuobjdump_txt_sm80, append=True)
adjust_permissions(cuobjdump_file, stat.S_IXUSR, add=True)  # Make sure our mock cuobjdump is executable
args = ['--cuda-compute-capabilities=8.0']
test_report_fp = os.path.join(self.test_buildpath, 'full_test_report.md')
local variable 'test_report_fp' is assigned to but never used
])

# Section for cuobjdump printing output for sm_90 PTX code
cuobjdump_txt_sm90_ptx = '\n'.join([
local variable 'cuobjdump_txt_sm90_ptx' is assigned to but never used
])

# Section for cuobjdump printing output for sm_80 PTX code
cuobjdump_txt_sm80_ptx = '\n'.join([
local variable 'cuobjdump_txt_sm80_ptx' is assigned to but never used
])

# Section for cuobjdump printing output for sm_90 architecture
cuobjdump_txt_sm90 = '\n'.join([
local variable 'cuobjdump_txt_sm90' is assigned to but never used
At the moment, we do no checking that the CUDA compute capabilities that EasyBuild is configured to use are actually present in the resulting binaries/libraries.
WIP PR to introduce an extra sanity check when CUDA is present, to check for mismatches between `cuda_compute_capabilities` and what `cuobjdump` reports.