Chained hash pipelining in array hashing #58252

Open · wants to merge 8 commits into master
76 changes: 0 additions & 76 deletions base/abstractarray.jl
@@ -3559,81 +3559,6 @@ pushfirst!(A, a, b, c...) = pushfirst!(pushfirst!(A, c...), a, b)
# sizehint! does nothing by default
sizehint!(a::AbstractVector, _) = a

## hashing AbstractArray ##

const hash_abstractarray_seed = UInt === UInt64 ? 0x7e2d6fb6448beb77 : 0xd4514ce5
function hash(A::AbstractArray, h::UInt)
h ⊻= hash_abstractarray_seed
# Axes are themselves AbstractArrays, so hashing them directly would stack overflow
# Instead hash the tuple of firsts and lasts along each dimension
h = hash(map(first, axes(A)), h)
h = hash(map(last, axes(A)), h)

# For short arrays, it's not worth doing anything complicated
if length(A) < 8192
for x in A
h = hash(x, h)
end
return h
end

# Goal: Hash approximately log(N) entries with a higher density of hashed elements
# weighted towards the end and special consideration for repeated values. Colliding
# hashes will often subsequently be compared by equality -- and equality between arrays
# works elementwise forwards and is short-circuiting. This means that a collision
# between arrays that differ by elements at the beginning is cheaper than one where the
# difference is towards the end. Furthermore, choosing `log(N)` arbitrary entries from a
# sparse array will likely only choose the same element repeatedly (zero in this case).

# To achieve this, we work backwards, starting by hashing the last element of the
# array. After hashing each element, we skip `fibskip` elements, where `fibskip`
# is pulled from the Fibonacci sequence -- Fibonacci was chosen as a simple
# ~O(log(N)) algorithm that ensures we don't hit a common divisor of a dimension
# and only end up hashing one slice of the array (as might happen with powers of
# two). Finally, we find the next distinct value from the one we just hashed.

# This is a little tricky since skipping an integer number of values inherently works
# with linear indices, but `findprev` uses `keys`. Hoist out the conversion "maps":
ks = keys(A)
key_to_linear = LinearIndices(ks) # Index into this map to compute the linear index
linear_to_key = vec(ks) # And vice-versa

# Start at the last index
keyidx = last(ks)
linidx = key_to_linear[keyidx]
fibskip = prevfibskip = oneunit(linidx)
first_linear = first(LinearIndices(linear_to_key))
n = 0
while true
n += 1
# Hash the element
elt = A[keyidx]
h = hash(keyidx=>elt, h)

# Skip backwards a Fibonacci number of indices -- this is a linear index operation
linidx = key_to_linear[keyidx]
linidx < fibskip + first_linear && break
linidx -= fibskip
keyidx = linear_to_key[linidx]

# Only increase the Fibonacci skip once every N iterations. This was chosen
# to be big enough that all elements of small arrays get hashed while
# obscenely large arrays are still tractable. With a choice of N=4096, an
# entirely-distinct 8000-element array will have ~75% of its elements hashed,
# with every other element hashed in the first half of the array. At the same
# time, hashing a `typemax(Int64)`-length Float64 range takes about a second.
if rem(n, 4096) == 0
fibskip, prevfibskip = fibskip + prevfibskip, fibskip
end

# Find a key index with a value distinct from `elt` -- might be `keyidx` itself
keyidx = findprev(!isequal(elt), A, keyidx)
keyidx === nothing && break
end

return h
end

# The semantics of `collect` are weird. Better to write our own
function rest(a::AbstractArray{T}, state...) where {T}
v = Vector{T}(undef, 0)
@@ -3642,7 +3567,6 @@ function rest(a::AbstractArray{T}, state...) where {T}
return foldl(push!, Iterators.rest(a, state...), init=v)
end


## keepat! ##

# NOTE: since these use `@inbounds`, they are actually only intended for Vector and BitVector
2 changes: 1 addition & 1 deletion base/hashing.jl
@@ -45,7 +45,7 @@ end
hash_mix(a::UInt64, b::UInt64) = ⊻(mul_parts(a, b)...)

# faster-but-weaker than hash_mix intended for small keys
hash_mix_linear(x::UInt64, h::UInt) = 3h - x
hash_mix_linear(x::Union{UInt64, UInt32}, h::UInt) = 3h - x
function hash_finalizer(x::UInt64)
x ⊻= (x >> 32)
x *= 0x63652a4cd374b267
102 changes: 102 additions & 0 deletions base/multidimensional.jl
@@ -1999,3 +1999,105 @@ end

getindex(b::Ref, ::CartesianIndex{0}) = getindex(b)
setindex!(b::Ref, x, ::CartesianIndex{0}) = setindex!(b, x)

## hashing AbstractArray ## can't be put in abstractarray.jl due to bootstrapping problems with the use of @nexprs

function _hash_fib(A::AbstractArray, h::UInt)
# Goal: Hash approximately log(N) entries with a higher density of hashed elements
# weighted towards the end and special consideration for repeated values. Colliding
# hashes will often subsequently be compared by equality -- and equality between arrays
# works elementwise forwards and is short-circuiting. This means that a collision
# between arrays that differ by elements at the beginning is cheaper than one where the
# difference is towards the end. Furthermore, choosing `log(N)` arbitrary entries from a
# sparse array will likely only choose the same element repeatedly (zero in this case).

# To achieve this, we work backwards, starting by hashing the last element of the
# array. After hashing each element, we skip `fibskip` elements, where `fibskip`
# is pulled from the Fibonacci sequence -- Fibonacci was chosen as a simple
# ~O(log(N)) algorithm that ensures we don't hit a common divisor of a dimension
# and only end up hashing one slice of the array (as might happen with powers of
# two). Finally, we find the next distinct value from the one we just hashed.

# This is a little tricky since skipping an integer number of values inherently works
# with linear indices, but `findprev` uses `keys`. Hoist out the conversion "maps":
ks = keys(A)
key_to_linear = LinearIndices(ks) # Index into this map to compute the linear index
linear_to_key = vec(ks) # And vice-versa

# Start at the last index
keyidx = last(ks)
linidx = key_to_linear[keyidx]
fibskip = prevfibskip = oneunit(linidx)
first_linear = first(LinearIndices(linear_to_key))
@nexprs 8 i -> p_i = h

n = 0
while true
n += 1
# Hash the element
elt = A[keyidx]

stream_idx = mod1(n, 8)
@nexprs 8 i -> stream_idx == i && (p_i = hash(keyidx => elt, p_i))

# Skip backwards a Fibonacci number of indices -- this is a linear index operation
linidx = key_to_linear[keyidx]
linidx < fibskip + first_linear && break
linidx -= fibskip
keyidx = linear_to_key[linidx]

# Only increase the Fibonacci skip once every N iterations. This was chosen
# to be big enough that all elements of small arrays get hashed while
# obscenely large arrays are still tractable. With a choice of N=4096, an
# entirely-distinct 8000-element array will have ~75% of its elements hashed,
# with every other element hashed in the first half of the array. At the same
# time, hashing a `typemax(Int64)`-length Float64 range takes about a second.
if rem(n, 4096) == 0
fibskip, prevfibskip = fibskip + prevfibskip, fibskip
end

# Find a key index with a value distinct from `elt` -- might be `keyidx` itself
keyidx = findprev(!isequal(elt), A, keyidx)
keyidx === nothing && break
end

@nexprs 8 i -> h = hash_mix_linear(p_i, h)
return hash_uint(h)
end
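
To make the power-of-two hazard described in the comments above concrete, here is a quick illustration (a sketch, not part of the PR): a fixed stride of 256 through a 256×256 matrix only ever visits one row, while a stride that grows through the Fibonacci sequence spreads across rows. For brevity the skip here grows every step rather than every 4096.

len = 256^2
length(unique(mod1.(len:-256:1, 256)))  # == 1: a fixed power-of-two stride stays in a single row

let idx = len, fibskip = 1, prev = 1, rows = Set{Int}()
    while idx >= 1
        push!(rows, mod1(idx, 256))     # row index of the visited element
        idx -= fibskip
        fibskip, prev = fibskip + prev, fibskip
    end
    length(rows)                        # many distinct rows visited (vs. 1 above)
end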

const hash_abstractarray_seed = UInt === UInt64 ? 0x7e2d6fb6448beb77 : 0xd4514ce5
function hash(A::AbstractArray, h::UInt)
h ⊻= hash_abstractarray_seed
# Axes are themselves AbstractArrays, so hashing them directly would stack overflow
# Instead hash the tuple of firsts and lasts along each dimension
h = hash(map(first, axes(A)), h)
h = hash(map(last, axes(A)), h)

len = length(A)

if len < 8
# for the shortest arrays we chain directly
for elt in A
h = hash(elt, h)
end
return h
elseif len < 65536
Member commented:
So if I'm understanding this right, this hashes every single element in arrays up to 65536 elements, and then jumps to fib hashing, where approximately 4096*(log(65536/4096)+1) ≈ 15000 elements (at most) get hashed, I think. What does the perf discontinuity there look like?
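
One way to check that estimate (a sketch; fib_hash_count is a hypothetical helper that replays just the skip schedule of _hash_fib, assuming a fully distinct dense array so the findprev search never moves the index):

function fib_hash_count(len::Int)
    fibskip = prevfibskip = 1
    idx = len
    n = 0
    while true
        n += 1                          # the element at idx would be hashed here
        idx < fibskip + 1 && return n
        idx -= fibskip
        if rem(n, 4096) == 0            # grow the skip every 4096 elements, as in the PR
            fibskip, prevfibskip = fibskip + prevfibskip, fibskip
        end
    end
end

fib_hash_count(65536)                   # elements the fib path hashes right at the cutoff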

Member Author replied:
I tried to choose the threshold as a power of 2 so that there's as little discontinuity as possible, but of course that will depend on the specifics. In my case, that meant matching the performance of Vector{Int} on an M1. But for eltypes that are much slower to hash than Int (say, more than a few tens of ns), it's certainly possible a lower threshold would be better. The problem is I don't know how to determine those specifics in advance, so I just picked what I assumed to be the most common use case. Maybe a fun target for PGO!
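
One way to probe the cutoff empirically (a sketch, assuming BenchmarkTools is available) is to time hashing just below and at the threshold:

using BenchmarkTools
for n in (65535, 65536)                 # last unrolled size vs. first fib-path size
    A = rand(Int, n)
    @btime hash($A)
end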

mbauman (Member) commented on May 21, 2025:
The cutoffs in the original fib algorithm here aren't highly tuned themselves; I pulled them out of thin air similarly. So this needn't be super highly tuned; the core goal is "good enough". I was just curious, as the plots above show this discontinuity on master (at 8192 elements) but cut off before the new 65536 discontinuity hits.

One option that might be quite cheap would be to hash chunks of 8 consecutive elements at a time, even within the fib algo. That is, do the @nexprs 8 i -> p_i = hash(A[n + i - 1], p_i) thing at every iteration, and then increase n and the size of the fib skips by 8x. Then we could also hash the key just once per 8 elements. I suppose, though, that for most sparse matrices this would end up hashing one nonzero and seven zeros... and then effectively 8x fewer nonzeros would end up getting included in the hash (because skips would be 8x bigger).

But maybe a chunk size of 2 or 4 would be a good compromise. In fact, 2 would probably end up including about the same number of nonzeros in the hash for most sparse matrices (since we're already bouncing between finding zeros and nonzeros).
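
For a dense Vector, the chunk-of-2 idea might look roughly like this (a sketch, not the PR's code: it drops the keyidx => elt pairing and the findprev distinct-value search, and inlines the PR's hash_mix_linear):

hash_mix_linear(x::UInt, h::UInt) = 3h - x  # the PR's linear mix, inlined

function hash_fib_chunk2(A::Vector, h::UInt)
    p_1 = p_2 = h                       # two accumulator streams
    fibskip = prevfibskip = 2           # skips scaled 2x to keep total work comparable
    idx = length(A)
    n = 0
    while idx >= 2
        n += 1
        p_1 = hash(A[idx], p_1)         # hash a pair of consecutive elements,
        p_2 = hash(A[idx - 1], p_2)     # one into each stream
        idx -= fibskip
        if rem(n, 4096) == 0
            fibskip, prevfibskip = fibskip + prevfibskip, fibskip
        end
    end
    h = hash_mix_linear(p_1, h)         # fold the streams back together
    h = hash_mix_linear(p_2, h)
    return Base.hash_uint(h)            # Base's integer finalizer
end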

# separate accumulator streams, unrolled
@nexprs 8 i -> p_i = h
n = 1
limit = len - 7
while n <= limit
@nexprs 8 i -> p_i = hash(A[n + i - 1], p_i)
mbauman (Member) commented on May 21, 2025:
This is using linear indexing for all elements; I suspect it'll incur a pretty big perf hit for sparse and perhaps some other arrays. Sparse performance in the fib algorithm comes from both Cartesian indexing and a sparse specialization on the findprev(!isequal(elt), A, keyidx) search above.

Member Author replied:
Good point. I'll take a look at some sparse benchmarks.
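
A possible starting point for those benchmarks (a sketch, assuming SparseArrays and BenchmarkTools are available):

using SparseArrays, BenchmarkTools
S = sprand(10_000, 10_000, 1e-4)        # ~10^4 stored entries
@btime hash($S)                         # fib path; exercises the sparse findprev specialization
@btime hash($(Matrix(S)))               # dense equivalent for comparison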

n += 8
end
while n <= len
p_1 = hash(A[n], p_1)
n += 1
end
# fold all streams back together
@nexprs 8 i -> h = hash_mix_linear(p_i, h)
return hash_uint(h)
else
return _hash_fib(A, h)
end
end
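
A quick consistency check across the three regimes (a sketch): arrays that are isequal must hash alike no matter which code path their length selects.

for n in (4, 1000, 10^6)                # direct chain, unrolled streams, fib path
    @assert hash(1:n) == hash(collect(1:n))             # ranges go through the same generic method
    @assert hash(collect(1:n)) == hash(collect(1.0:n))  # isequal elements hash alike across eltypes
end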