Better concatenation in out-of-place Jacobians+Hessians #563

Closed
@gdalle

Goal: speed things up for StaticArrays without improving StaticArrays

Ideas:

  1. Replace reduce + map with mapreduce
  2. Use the trick from https://discourse.julialang.org/t/type-instability-of-mapreduce-vs-map-reduce/121136 to initialize and thus ensure type stability
  3. Replace stack(t) with hcat(t...) because t will always be a short NTuple
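A minimal sketch of the three ideas on plain arrays, using only Base functions. The helper names (`concat_map_reduce`, `concat_mapreduce`, `concat_splat`) are hypothetical and not taken from the package, and seeding `init` with the first mapped element is only one plausible reading of the linked trick, not necessarily the exact one from that thread. Whether the result is actually inferred as type-stable depends on `f` and the element types, so it would need to be checked with `@inferred` or `@code_warntype` on the real inputs.

```julia
# Hypothetical helper names, for illustration only.

# Baseline: map first, then reduce with hcat.
concat_map_reduce(f::F, t::Tuple) where {F} = reduce(hcat, map(f, t))

# Ideas 1 + 2: a single mapreduce, seeded with the first mapped element
# so that the reduction starts from a concrete value (assumed reading of the trick).
function concat_mapreduce(f::F, t::Tuple) where {F}
    init = f(first(t))
    return mapreduce(f, hcat, Base.tail(t); init=init)
end

# Idea 3: splat a short tuple directly into hcat.
concat_splat(f::F, t::Tuple) where {F} = hcat(map(f, t)...)

# Small check on a tuple of four length-3 vectors.
t = ntuple(i -> fill(float(i), 3), 4)
A = concat_map_reduce(identity, t)
B = concat_mapreduce(identity, t)
C = concat_splat(identity, t)
```

All three variants should produce the same 3×4 matrix here; the difference the issue cares about is how well each one performs and infers on `SArray` inputs.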

Related:

Benchmarks:

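In the tables below (minimum times in nanoseconds), stack is better on Array but worse on SArray. On Vector inputs stack is about 3.5x faster than hcat; on Matrix inputs the two are on par without a function, and stack is about 4x faster with vec. On SVector inputs stack is roughly 30x slower, on SMatrix inputs roughly 8 to 9x slower, and in both cases it allocates while hcat does not. The solution is to fix stack for SArray, at least in simple cases.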

using BenchmarkTools, DataFrames, StaticArrays

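# "bad" = stack, "good" = splatted hcat (the names reflect the SArray results)
badstack(t) = stack(t);
goodstack(t) = hcat(t...);

# same comparison, with a function applied to each element first
badstack(f::F, t) where {F} = stack(f, t);
goodstack(f::F, t) where {F} = hcat(map(f, t)...);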

tv = ntuple(i -> rand(1000), 10);
tm = ntuple(i -> rand(100, 100), 10);
tsv = ntuple(i -> @SVector(ones(4)), 10);
tsm = ntuple(i -> @SMatrix(ones(4, 4)), 10);

data_nofunction = DataFrame()
data_function = DataFrame()

for t in [tv, tm, tsv, tsm]
    @info "Benchmarking $(typeof(t))"
    # without function
    bad = @benchmark badstack($t)
    good = @benchmark goodstack($t)
    push!(
        data_nofunction,
        (;
            input_type=typeof(t),
            bad_time=minimum(bad.times),
            good_time=minimum(good.times),
            bad_alloc=minimum(bad.allocs),
            good_alloc=minimum(good.allocs),
        ),
    )
    # with function
    bad = @benchmark badstack(vec, $t)
    good = @benchmark goodstack(vec, $t)
    push!(
        data_function,
        (;
            input_type=typeof(t),
            bad_time=minimum(bad.times),
            good_time=minimum(good.times),
            bad_alloc=minimum(bad.allocs),
            good_alloc=minimum(good.allocs),
        ),
    )
end

julia> data_nofunction
4×5 DataFrame
 Row │ input_type                         bad_time   good_time  bad_alloc  good_alloc 
     │ DataType                           Float64    Float64    Int64      Int64      
─────┼────────────────────────────────────────────────────────────────────────────────
   1 │ NTuple{10, Vector{Float64}}         4682.83   16404.0            2           2
   2 │ NTuple{10, Matrix{Float64}}        43849.0    44823.0            2           2
   3 │ NTuple{10, SVector{4, Float64}}      166.013      5.478          1           0
   4 │ NTuple{10, SMatrix{4, 4, Float64…    258.992     27.8            1           0

julia> data_function
4×5 DataFrame
 Row │ input_type                         bad_time   good_time    bad_alloc  good_alloc 
     │ DataType                           Float64    Float64      Int64      Int64      
─────┼──────────────────────────────────────────────────────────────────────────────────
   1 │ NTuple{10, Vector{Float64}}         5293.17    18105.0             2           2
   2 │ NTuple{10, Matrix{Float64}}        45230.0    184490.0            22          22
   3 │ NTuple{10, SVector{4, Float64}}      146.347       5.321           1           0
   4 │ NTuple{10, SMatrix{4, 4, Float64…    226.041      27.4312          1           0
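
When reading these rows, it may help to check when the two variants actually compute the same thing. The sketch below uses small plain arrays (not the benchmark inputs) and assumes Julia 1.9 or later, where `stack` is available in Base. For tuples of vectors, `stack(t)` and `hcat(t...)` agree. For tuples of matrices they do not: `stack` adds a new trailing dimension while `hcat` widens the second one, so the matrix row of the no-function table compares two different outputs. With `vec` applied first, both variants agree again.

```julia
# Requires Julia >= 1.9 for Base.stack.

# Tuple of vectors: both variants build the same matrix.
tv_small = ntuple(i -> collect(1.0:3.0) .* i, 4)
same_on_vectors = stack(tv_small) == hcat(tv_small...)

# Tuple of matrices: the shapes differ without a function.
tm_small = ntuple(i -> ones(2, 2), 3)
size_stack = size(stack(tm_small))      # new trailing dimension
size_hcat = size(hcat(tm_small...))     # columns are appended

# With vec applied first, both variants agree again.
same_with_vec = stack(vec, tm_small) == hcat(map(vec, tm_small)...)
```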

Labels: core (related to the core utilities of the package)