Closed
Goal: speed things up for StaticArrays without improving StaticArrays
Ideas:
- Replace `reduce` + `map` with `mapreduce`
- Use the trick from https://discourse.julialang.org/t/type-instability-of-mapreduce-vs-map-reduce/121136 to initialize and thus ensure type stability
- Replace `stack(t)` with `hcat(t...)` because `t` will always be a short `NTuple`
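A minimal sketch of the two replacements, using only Base (it needs Julia 1.9 or later for `stack`). The helper names `sum_two_pass`, `sum_one_pass` and `sum_seeded` are hypothetical and only serve the illustration. The `init` keyword is a standard `mapreduce` feature; whether it matches the exact trick in the linked Discourse thread is an assumption.

```julia
# Hypothetical helpers for illustration; none of these names come from the issue.

# Two passes: `map` builds an intermediate tuple, then `reduce` folds it.
sum_two_pass(f, t) = reduce(+, map(f, t))

# One pass: `mapreduce` applies `f` and folds in the same traversal.
sum_one_pass(f, t) = mapreduce(f, +, t)

# Assumption: seeding the reduction with `init` fixes the accumulator's
# starting value, which is one way to help the result type stay concrete.
sum_seeded(f, t) = mapreduce(f, +, t; init=0.0)

# A short tuple of equal-length vectors, like the `t` described above.
cols = (ones(3), 2 .* ones(3), 3 .* ones(3))

# Splatting the tuple into `hcat` builds the same 3×3 matrix as `stack`.
@assert hcat(cols...) == stack(cols)

# All three reductions agree on the total.
@assert sum_two_pass(sum, cols) == sum_one_pass(sum, cols) == sum_seeded(sum, cols)
```

Splatting is only reasonable here because the tuple is short and its length is known to the compiler; for long or dynamically sized collections, `stack` or `reduce(hcat, t)` is usually the safer choice.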
Related:
- Add direct Enzyme support SciML/NonlinearSolve.jl#476
- Minor `hessian` fix #561 where my first attempt failed
Benchmarks:
`stack` is better on `Array` but worse on `SArray`. The solution is to fix `stack` for `SArray`, at least in simple cases.
```julia
using BenchmarkTools, DataFrames, StaticArrays

# "bad" = current approach (`stack`), "good" = proposed approach (`hcat` with splatting)
badstack(t) = stack(t);
goodstack(t) = hcat(t...);
badstack(f::F, t) where {F} = stack(f, t);
goodstack(f::F, t) where {F} = hcat(map(f, t)...);

# Inputs: tuples of 10 vectors / matrices, as `Array` and as `SArray`
tv = ntuple(i -> rand(1000), 10);
tm = ntuple(i -> rand(100, 100), 10);
tsv = ntuple(i -> @SVector(ones(4)), 10);
tsm = ntuple(i -> @SMatrix(ones(4, 4)), 10);

data_nofunction = DataFrame()
data_function = DataFrame()

# Times reported by BenchmarkTools are in nanoseconds
for t in [tv, tm, tsv, tsm]
    @info "Benchmarking $(typeof(t))"
    # without function
    bad = @benchmark badstack($t)
    good = @benchmark goodstack($t)
    push!(
        data_nofunction,
        (;
            input_type=typeof(t),
            bad_time=minimum(bad.times),
            good_time=minimum(good.times),
            bad_alloc=minimum(bad.allocs),
            good_alloc=minimum(good.allocs),
        ),
    )
    # with function
    bad = @benchmark badstack(vec, $t)
    good = @benchmark goodstack(vec, $t)
    push!(
        data_function,
        (;
            input_type=typeof(t),
            bad_time=minimum(bad.times),
            good_time=minimum(good.times),
            bad_alloc=minimum(bad.allocs),
            good_alloc=minimum(good.allocs),
        ),
    )
end
```
```
julia> data_nofunction
4×5 DataFrame
 Row │ input_type                         bad_time   good_time  bad_alloc  good_alloc
     │ DataType                           Float64    Float64    Int64      Int64
─────┼────────────────────────────────────────────────────────────────────────────────
   1 │ NTuple{10, Vector{Float64}}         4682.83   16404.0            2           2
   2 │ NTuple{10, Matrix{Float64}}        43849.0    44823.0            2           2
   3 │ NTuple{10, SVector{4, Float64}}      166.013      5.478          1           0
   4 │ NTuple{10, SMatrix{4, 4, Float64…    258.992     27.8            1           0

julia> data_function
4×5 DataFrame
 Row │ input_type                         bad_time   good_time    bad_alloc  good_alloc
     │ DataType                           Float64    Float64      Int64      Int64
─────┼──────────────────────────────────────────────────────────────────────────────────
   1 │ NTuple{10, Vector{Float64}}         5293.17    18105.0             2           2
   2 │ NTuple{10, Matrix{Float64}}        45230.0    184490.0            22          22
   3 │ NTuple{10, SVector{4, Float64}}      146.347       5.321           1           0
   4 │ NTuple{10, SMatrix{4, 4, Float64…    226.041      27.4312          1           0
```