Calling Flux.onehotbatch on GPU array moves data to host? #24

Closed
rkube opened this issue Nov 9, 2022 · 2 comments
Labels
bug Something isn't working

rkube commented Nov 9, 2022

Package Version

Flux v0.13.6

Julia Version

1.8.2

OS / Environment

Linux, PPC64LE

Describe the bug

Calling Flux.onehotbatch on a GPU array yields a OneHotMatrix backed by a CPU Vector. I expected it to return a GPU-backed array.
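
A quick way to see this is to inspect the storage behind the returned OneHotMatrix. The following is a minimal check, assuming Flux v0.13.6 with a functional CUDA setup; the `.indices` field name is taken from the OneHotArrays type shown in the error below:

using Flux

labels = rand(1:3, 10) |> gpu        # CuArray of integer labels
Y = Flux.onehotbatch(labels, 1:3)    # 3×10 OneHotMatrix
typeof(Y.indices)                    # Vector{UInt32}, i.e. host memory, not a CuArray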

Steps to Reproduce

using Flux                               # Flux v0.13.6; `softmax` and `gpu` are provided by Flux

Y_pred = softmax(randn(3, 10)) |> gpu    # predictions on the GPU
Y_true = rand(1:3, 10) |> gpu            # integer labels on the GPU
Y_true = Flux.onehotbatch(Y_true, 1:3)   # result is backed by a CPU Vector{UInt32}

Flux.crossentropy(Y_pred, Y_true)        # fails during GPU kernel compilation (see below)


julia> Flux.crossentropy(Y_pred, Y_true)
ERROR: GPU compilation of kernel #broadcast_kernel#17(CUDA.CuKernelContext, CuDeviceMatrix{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(Flux.Losses.xlogy), Tuple{Base.Broadcast.Extruded{OneHotArrays.OneHotMatrix{UInt32, 3, Vector{UInt32}}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(+), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Float32}}}}, Int64) failed
KernelError: passing and using non-bitstype argument

Argument 4 to your kernel function is of type Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(Flux.Losses.xlogy), Tuple{Base.Broadcast.Extruded{OneHotArrays.OneHotMatrix{UInt32, 3, Vector{UInt32}}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(+), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Float32}}}}, which is not isbits:
  .args is of type Tuple{Base.Broadcast.Extruded{OneHotArrays.OneHotMatrix{UInt32, 3, Vector{UInt32}}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(+), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Float32}}} which is not isbits.
    .1 is of type Base.Broadcast.Extruded{OneHotArrays.OneHotMatrix{UInt32, 3, Vector{UInt32}}, Tuple{Bool, Bool}, Tuple{Int64, Int64}} which is not isbits.
      .x is of type OneHotArrays.OneHotMatrix{UInt32, 3, Vector{UInt32}} which is not isbits.
        .indices is of type Vector{UInt32} which is not isbits.


Stacktrace:
  [1] check_invocation(job::GPUCompiler.CompilerJob)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/07qaN/src/validation.jl:88
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/07qaN/src/driver.jl:417 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/4yHI4/src/TimerOutput.jl:253 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/07qaN/src/driver.jl:416 [inlined]
  [5] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/07qaN/src/utils.jl:68
  [6] cufunction_compile(job::GPUCompiler.CompilerJob, ctx::LLVM.Context)
    @ CUDA ~/.julia/packages/CUDA/DfvRa/src/compiler/execution.jl:354
  [7] #224
    @ ~/.julia/packages/CUDA/DfvRa/src/compiler/execution.jl:347 [inlined]
  [8] JuliaContext(f::CUDA.var"#224#225"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#17", Tuple{CUDA.CuKernelContext, CuDeviceMatrix{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(Flux.Losses.xlogy), Tuple{Base.Broadcast.Extruded{OneHotArrays.OneHotMatrix{UInt32, 3, Vector{UInt32}}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(+), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Float32}}}}, Int64}}}})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/07qaN/src/driver.jl:76
  [9] cufunction_compile(job::GPUCompiler.CompilerJob)
    @ CUDA ~/.julia/packages/CUDA/DfvRa/src/compiler/execution.jl:346
 [10] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/07qaN/src/cache.jl:90
 [11] cufunction(f::GPUArrays.var"#broadcast_kernel#17", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceMatrix{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(Flux.Losses.xlogy), Tuple{Base.Broadcast.Extruded{OneHotArrays.OneHotMatrix{UInt32, 3, Vector{UInt32}}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(+), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Float32}}}}, Int64}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA ~/.julia/packages/CUDA/DfvRa/src/compiler/execution.jl:299
 [12] cufunction(f::GPUArrays.var"#broadcast_kernel#17", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceMatrix{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(Flux.Losses.xlogy), Tuple{Base.Broadcast.Extruded{OneHotArrays.OneHotMatrix{UInt32, 3, Vector{UInt32}}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(+), Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Float32}}}}, Int64}})
    @ CUDA ~/.julia/packages/CUDA/DfvRa/src/compiler/execution.jl:292
 [13] macro expansion
    @ ~/.julia/packages/CUDA/DfvRa/src/compiler/execution.jl:102 [inlined]
 [14] #launch_heuristic#248
    @ ~/.julia/packages/CUDA/DfvRa/src/gpuarrays.jl:17 [inlined]
 [15] _copyto!
    @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:63 [inlined]
 [16] copyto!
    @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:46 [inlined]
 [17] copy
    @ ~/.julia/packages/GPUArrays/fqD8z/src/host/broadcast.jl:37 [inlined]
 [18] materialize
    @ ./broadcast.jl:860 [inlined]
 [19] crossentropy(ŷ::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, y::OneHotArrays.OneHotMatrix{UInt32, 3, Vector{UInt32}}; dims::Int64, agg::typeof(Statistics.mean), ϵ::Float32)
    @ Flux.Losses ~/.julia/packages/Flux/4k0Ls/src/losses/functions.jl:227
 [20] crossentropy(ŷ::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, y::OneHotArrays.OneHotMatrix{UInt32, 3, Vector{UInt32}})
    @ Flux.Losses ~/.julia/packages/Flux/4k0Ls/src/losses/functions.jl:225
 [21] top-level scope
    @ REPL[338]:1
 [22] top-level scope
    @ ~/.julia/packages/CUDA/DfvRa/src/initialization.jl:52

Expected Results

julia> Y_pred = softmax(randn(3, 10)) #|> gpu
3×10 Matrix{Float64}:
 0.137382  0.288873  0.334149  0.0208633  0.813241  0.445647  0.0137088  0.230442  0.215155   0.520551
 0.132654  0.180597  0.248596  0.830636   0.148913  0.337832  0.838018   0.49959   0.751978   0.429119
 0.729964  0.53053   0.417255  0.1485     0.037846  0.216521  0.148273   0.269968  0.0328665  0.0503293

julia> Y_true = rand(1:3, 10) #|> gpu
10-element Vector{Int64}:
 1
 1
 3
 3
 2
 2
 1
 2
 1
 3

julia> Y_true = Flux.onehotbatch(Y_true, 1:3)
3×10 OneHotMatrix(::Vector{UInt32}) with eltype Bool:
 1  1  ⋅  ⋅  ⋅  ⋅  1  ⋅  1  ⋅
 ⋅  ⋅  ⋅  ⋅  1  1  ⋅  1  ⋅  ⋅
 ⋅  ⋅  1  1  ⋅  ⋅  ⋅  ⋅  ⋅  1

julia> Flux.crossentropy(Y_pred, Y_true)
1.8506834547376756

Observed Results

See error above

Relevant log output

No response
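
A possible workaround, until onehotbatch preserves the device of its input: move the one-hot targets to the GPU explicitly before computing the loss, which should avoid the kernel error since `gpu` adapts the underlying index vector to a CuArray. A sketch under that assumption:

using Flux

Y_pred = softmax(randn(3, 10)) |> gpu
Y_true = Flux.onehotbatch(rand(1:3, 10), 1:3) |> gpu   # one-hot matrix moved to the GPU
Flux.crossentropy(Y_pred, Y_true)                      # expected to run without the kernel error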

rkube added the bug label Nov 9, 2022
ToucheSir transferred this issue from FluxML/Flux.jl Nov 9, 2022
mcabbott (Member) commented Nov 9, 2022

This is #16, I think.

ToucheSir (Member) commented:

Yup, I moved it here to close as a dupe of #16.

ToucheSir closed this as not planned Nov 9, 2022