Complex struct as input to a kernel #574

Open

foglienimatteo opened this issue Feb 25, 2025 · 1 comment

@foglienimatteo

Hello everyone!

We are trying to use KernelAbstractions.jl to parallelise some nested integrals in a Julia code (GaPSE.jl, for those who are interested).

However, we encountered an issue we are not able to solve: we need to pass a complicated struct into the kernel to do the computations, but we get stuck at the error Argument XXX to your kernel function is of type YYY, which is not isbits. That exception is caused by a Vector{Float64} field, which is not an isbits type (even if I don't get why, because we don't use pointers and it only holds primitive, concrete Float64 values).
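
(For reference, the check can be reproduced on the host. A Vector is a mutable, heap-allocated object referenced through a pointer, so neither it nor any struct containing it is isbits:)

julia> isbitstype(Float64)
true

julia> isbitstype(Vector{Float64})   # mutable, heap-allocated, pointer-backed
false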

We created a minimal working example with oneAPI of what we are doing; I put it at the bottom.
The same behaviour can be reproduced with Metal after the replacements oneAPI.oneAPIBackend() -> Metal.MetalBackend() and Float64 -> Float32 (because Metal does not support doubles).

The part which bugs us the most: it IS possible to pass a Vector{Float64}, but only if:

  1. it is first transformed into the GPU array type (e.g. oneArray([1 2 3]));
  2. that vector is then given DIRECTLY to the kernel as an argument (see the sketch below).

My question is: are we doing something wrong/missing something, or is it really not possible to pass structs containing Vectors to the kernel?
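
A minimal sketch of that working case, for reference (hypothetical kernel name add_first!, assuming oneAPI):

using KernelAbstractions
using oneAPI

@kernel function add_first!(A, b)
    I = @index(Global)
    A[I] += b[1]
end

backend = oneAPI.oneAPIBackend()
A = KernelAbstractions.ones(backend, Float64, 16)
b = oneArray([1.0, 2.0, 3.0])                      # device vector...
add_first!(backend, 16)(A, b, ndrange=length(A))   # ...passed directly: works
KernelAbstractions.synchronize(backend)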


Some notes:

  • the input struct can be read-only, we don't want to modify it;
  • we cannot use StaticArrays, because we don't know the sizes at compile time, only at runtime;
  • we tried to replace Vector{Float64} with oneArray{Float64}, with no change (see the note after this list);
  • we tried to add @Const to the struct, with no change;
  • it would be very complicated and impractical for us to unpack the whole struct (have a look at GaPSE.Cosmology if you need an idea of its size); we thought about creating another struct from this one, removing the obviously non-isbits fields (e.g. Strings), but we absolutely need to bring the vectors along (GaPSE.MySpline is just another struct containing Vector{Float64}).
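
For what it's worth, swapping the field to oneArray{Float64} cannot help on its own: the host-side oneArray handle is itself not isbits. Only the top-level kernel arguments are converted to their isbits device-side counterparts at launch (via Adapt.jl, the same mechanism the answer below extends to custom structs), which is why case 2 above works:

julia> using oneAPI

julia> isbitstype(oneArray{Float64,1})   # host-side handle: a mutable struct
false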

Thank you in advance for the help; KernelAbstractions.jl is a great idea, and we would really like to make it work.




Here is the MWE:

using KernelAbstractions
using oneAPI


struct MyStruct
    a::Float64
    b::Vector{Float64}
end

MS = MyStruct(1.0, [2.0, 3.0])


@kernel function my_kernel(A, MS)
    I = @index(Global)
    A[I] = 2 * A[I] + MS.b[1]
end

#backend = CPU()                       # This works
backend = oneAPI.oneAPIBackend()     # This doesn't

A = KernelAbstractions.ones(backend, Float64, 1024, 1024)
ev = my_kernel(backend, 64)(A, MS, ndrange=size(A))
KernelAbstractions.synchronize(backend)

println( "Test: ", all(A .== 4.0) ? "Passed" : "Failed" )

which gives:

   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.10.7 (2024-11-26)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> include("./KernelAbstraction_struct.jl")     # with "backend = CPU()"
Test: Passed

julia> include("./KernelAbstraction_struct.jl")     # with "backend = oneAPI.oneAPIBackend()"

ERROR: LoadError: GPU compilation of MethodInstance for gpu_my_kernel(::KernelAbstractions.CompilerMetadata{…}, ::oneDeviceMatrix{…}, ::MyStruct) failed
KernelError: passing and using non-bitstype argument

Argument 4 to your kernel function is of type MyStruct, which is not isbits:
  .b is of type Vector{Float64} which is not isbits.


Stacktrace:
  [1] check_invocation(job::GPUCompiler.CompilerJob)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/nWT2N/src/validation.jl:92
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/nWT2N/src/driver.jl:128 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/Lw5SP/src/TimerOutput.jl:253 [inlined]
  [4] codegen(output::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/nWT2N/src/driver.jl:126
  [5] codegen
    @ ~/.julia/packages/GPUCompiler/nWT2N/src/driver.jl:115 [inlined]
  [6] compile(target::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/nWT2N/src/driver.jl:111
  [7] compile
    @ ~/.julia/packages/GPUCompiler/nWT2N/src/driver.jl:103 [inlined]
  [8] #58
    @ ~/.julia/packages/oneAPI/1GTs3/src/compiler/compilation.jl:81 [inlined]
  [9] JuliaContext(f::oneAPI.var"#58#59"{GPUCompiler.CompilerJob{GPUCompiler.SPIRVCompilerTarget, oneAPI.oneAPICompilerParams}}; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/nWT2N/src/driver.jl:52
 [10] JuliaContext(f::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/nWT2N/src/driver.jl:42
 [11] compile(job::GPUCompiler.CompilerJob)
    @ oneAPI ~/.julia/packages/oneAPI/1GTs3/src/compiler/compilation.jl:80
 [12] actual_compilation(cache::Dict{…}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{…}, compiler::typeof(oneAPI.compile), linker::typeof(oneAPI.link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/nWT2N/src/execution.jl:128
 [13] cached_compilation(cache::Dict{…}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{…}, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/nWT2N/src/execution.jl:103
 [14] macro expansion
    @ ~/.julia/packages/oneAPI/1GTs3/src/compiler/execution.jl:203 [inlined]
 [15] macro expansion
    @ ./lock.jl:267 [inlined]
 [16] zefunction(f::typeof(gpu_my_kernel), tt::Type{Tuple{KernelAbstractions.CompilerMetadata{…}, oneDeviceMatrix{…}, MyStruct}}; kwargs::@Kwargs{})
    @ oneAPI ~/.julia/packages/oneAPI/1GTs3/src/compiler/execution.jl:198
 [17] zefunction(f::typeof(gpu_my_kernel), tt::Type{Tuple{KernelAbstractions.CompilerMetadata{…}, oneDeviceMatrix{…}, MyStruct}})
    @ oneAPI ~/.julia/packages/oneAPI/1GTs3/src/compiler/execution.jl:195
 [18] macro expansion
    @ ~/.julia/packages/oneAPI/1GTs3/src/compiler/execution.jl:66 [inlined]
 [19] (::KernelAbstractions.Kernel{…})(::oneArray{…}, ::Vararg{…}; ndrange::Tuple{…}, workgroupsize::Nothing)
    @ oneAPI.oneAPIKernels ~/.julia/packages/oneAPI/1GTs3/src/oneAPIKernels.jl:89
@vchuravy
Member

You will need to add a rule with https://github.com/JuliaGPU/Adapt.jl to define how to translate your struct across the CPU-GPU boundary. The @adapt_structure macro may be sufficient.
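
A minimal sketch of what such a rule could look like for the MyStruct from the MWE, assuming the struct is redefined with a parametric field type so it can hold either a host vector or a device array:

using Adapt
using oneAPI

# Parametric field, so the same struct can wrap a Vector{Float64} on the
# host or the device-side array type inside the kernel.
struct MyStruct{V<:AbstractVector{Float64}}
    a::Float64
    b::V
end

# Defines Adapt.adapt_structure(to, x::MyStruct), which rebuilds the struct
# with every field passed through adapt(to, field).
Adapt.@adapt_structure MyStruct

MS = MyStruct(1.0, oneArray([2.0, 3.0]))   # b is a device array handle
# At kernel launch the struct is adapted once more: b becomes the isbits
# device-side array, so the whole struct is isbits and can be passed.

With this in place, the MWE's call my_kernel(backend, 64)(A, MS, ndrange=size(A)) should compile, since the launch-time adapt pass can now reach the inner array.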
