Chained hash pipelining in array hashing #58252
base: master
Conversation
show performance benchmarks and then lgtm.
#57509 (comment) shows this graphically. Note that this cannot merge before #57509, which is also waiting on #58053
CI failure unrelated
should this method be used for big
how does the Tuple perf look? if it only helps for 100 or more, I wouldn't bother. if it's useful in the 10-100 range, I think we should |
eh, I don't think it helps. I guess tuples should mostly be hashed at compile time anyway
another question popped up: should this be extracted to a different function (like
seems reasonable |
h = hash(elt, h)
end
return h
elseif len < 65536
So if I'm understanding this right, this hashes every single element in arrays of up to 65536 elements, and then jumps to fib hashing, where approximately 4096*(log(65536/4096)+1) ≈ 15000 elements (at most) get hashed. I think. What does the discontinuity in perf there look like?
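For reference, that estimate works out as follows, taking `log` as the natural log (which is what matches the ≈ 15000 figure):

```julia
4096 * (log(65536 / 4096) + 1)   # ≈ 15452, i.e. the "roughly 15000 elements" bound above
```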
I tried to choose the threshold to be a power of 2 such that there's no (or as little as possible) discontinuity, but of course that will depend on the specifics. In my case, that meant matching the performance of `Vector{Int}` on an M1. But for arrays whose eltype is much slower to hash than `Int` (e.g. more than, let's say, a few 10s of ns), it's certainly possible a lower threshold would be better. The problem is that I don't know how to determine those specifics in advance, so I just picked what I assumed to be the most common use case. Maybe a fun target for PGO!
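A minimal sketch of how the discontinuity around a candidate threshold could be measured, assuming BenchmarkTools.jl is available; the `scan_around_threshold` helper and the length grid are made up for illustration:

```julia
using BenchmarkTools

# time `hash` on Vector{Int} at lengths straddling a candidate threshold,
# reporting per-element cost so a jump at the cutoff stands out
function scan_around_threshold(threshold::Int)
    for len in (threshold ÷ 2, threshold - 1, threshold, threshold + 1, 2 * threshold)
        v = rand(Int, len)
        t = @belapsed hash($v)
        println(rpad(len, 8), round(t / len * 1e9; digits = 2), " ns/element")
    end
end

scan_around_threshold(65536)
```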
The cutoffs in the original fib algorithm here aren't highly tuned themselves; I pulled them out of thin air similarly. So this needn't be super highly tuned; the core goal is good enough. I was just curious, since the plots above show this discontinuity on master (at 8192 elements) but cut off before the new 65536 discontinuity hits.

One option that might be quite cheap would be to hash chunks of 8 consecutive elements at a time, even within the fib algo. That is, do the `@nexprs 8 i -> p_i = hash(A[n + i - 1], p_i)` thing at every iteration, and then increase `n` and the size of the fib skips by 8x. Then we could also just hash the key once per 8 elements, too. I suppose, though, that for most sparse matrices this would end up hashing one nonzero and seven zeros... and then effectively 8x fewer nonzeros would end up getting included in the hash (because the skips would be 8x bigger).

But maybe a chunk size of 2 or 4 would be a good compromise. In fact, 2 would probably end up including about the same number of nonzeros in the hash for most sparse matrices (since we're already bouncing between finding zeros and nonzeros).
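Not Base's actual code, but a rough self-contained sketch of the chunk-of-2 variant: each visited position feeds two consecutive elements into two independent accumulators, the fibonacci-growing skip is scaled by the chunk size, and the per-key hashing the real algorithm does is omitted for brevity:

```julia
using Base.Cartesian: @nexprs

# illustration only: fib-style skipping with a chunk size of 2,
# keeping two hash chains in flight and folding them together at the end
function fib_chunk2_hash(A::AbstractVector, h::UInt)
    p_1, p_2 = h, ~h                     # two de-correlated accumulators (seeding is arbitrary)
    fibskip, prevfibskip = 1, 1
    n = firstindex(A)
    while n + 1 <= lastindex(A)
        @nexprs 2 i -> p_i = hash(A[n + i - 1], p_i)   # two independent dependency chains
        n += 2 * fibskip                               # chunk size scales the skip
        fibskip, prevfibskip = fibskip + prevfibskip, fibskip
    end
    n == lastindex(A) && (p_1 = hash(A[n], p_1))       # skip happened to land on the last element
    return hash(p_2, p_1)                              # fold the two chains together
end
```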
n = 1
limit = len - 7
while n <= limit
@nexprs 8 i -> p_i = hash(A[n + i - 1], p_i)
This is using linear indexing for all elements; I suspect it'll incur a pretty big perf hit for sparse and perhaps some other arrays. Sparse performance in the fib algorithm comes from both Cartesian indexing and a sparse specialization on the `findprev(!isequal(elt), A, keyidx)` search above.
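A rough way to see the cost being described here, assuming SparseArrays and BenchmarkTools are available (`hash_linear` below is just a stand-in for an unconditional element-by-element loop, not the PR's code):

```julia
using SparseArrays, BenchmarkTools

# hash every element via linear getindex, the way an unconditional unrolled loop would
function hash_linear(A)
    h = zero(UInt)
    for i in 1:length(A)
        h = hash(A[i], h)   # for SparseMatrixCSC each lookup searches within a column
    end
    return h
end

S = sprand(1_000, 1_000, 1e-3)   # ~1_000 stored entries out of 10^6 positions
@btime hash_linear($S)
@btime hash($S)   # Base's fib algorithm can skip across the long runs of zeros
```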
good point. I'll take a look at some sparse benchmarks
The proposed switch in #57509 from `3h - hash_finalizer(x)` to `hash_finalizer(3h - x)` should increase the hash quality of chained hashes, as the expanded expression goes from something like `sum((-3)^k * hash(x) for k in ...)` to a non-simplifiable composition. This does have the unfortunate impact of making long chains of hashes a bit slower, as there is more data dependency and the CPU cannot work on the next element's hash before combining the previous one (I think --- I'm not particularly an expert on this low-level stuff). As far as I know, this only really impacts `AbstractArray` hashing.
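Roughly (ignoring the modular `UInt` arithmetic and writing $f$ for `hash_finalizer`), the old combining step collapses into a weighted sum of per-element hashes, while the new one nests $f$ at every step and does not simplify:

$$
h_{k+1} = 3h_k - f(\operatorname{hash}(x_k)) \quad\Longrightarrow\quad h_n = 3^n h_0 - \sum_{k=0}^{n-1} 3^{\,n-1-k} f(\operatorname{hash}(x_k))
$$

$$
h_{k+1} = f\bigl(3h_k - \operatorname{hash}(x_k)\bigr) \quad\Longrightarrow\quad h_n = f\bigl(3 f(3 f(\cdots) - \operatorname{hash}(x_{n-2})) - \operatorname{hash}(x_{n-1})\bigr)
$$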
So, I've implemented a proposal that does some unrolling / pipelining manually to recover `AbstractArray` hashing performance. In fact, it's quite a lot faster now for most lengths. I tuned the thresholds (8 accumulators, certain length breakpoints) by hand on my own machine.
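A minimal sketch of the pipelining idea (not the PR's exact implementation; the seeding, fold, and tail handling here are made up for illustration): run 8 independent hash chains over consecutive elements so the CPU can overlap their latencies, then fold the accumulators into a single value at the end. `@nexprs` is from `Base.Cartesian`.

```julia
using Base.Cartesian: @nexprs

# illustration only: 8 independent accumulators hide the serial dependency
# of the single-chain loop `h = hash(A[n], h)`
function pipelined_hash(A::AbstractVector, h::UInt)
    @nexprs 8 i -> p_i = hash(i, h)          # seed 8 de-correlated chains (arbitrary seeding)
    n = firstindex(A)
    limit = n + length(A) - 8
    while n <= limit
        @nexprs 8 i -> p_i = hash(A[n + i - 1], p_i)   # 8 chains advance independently
        n += 8
    end
    # fold the accumulators back into one running hash
    h = hash(p_1, hash(p_2, hash(p_3, hash(p_4, hash(p_5, hash(p_6, hash(p_7, p_8)))))))
    while n <= lastindex(A)                  # remaining tail (< 8 elements), done serially
        h = hash(A[n], h)
        n += 1
    end
    return h
end
```

In the actual PR, the plain serial loop is kept for short arrays and the unrolled form only kicks in past certain length breakpoints, which is where the hand-tuned thresholds mentioned above come in.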