Chained hash pipelining in array hashing #58252
base: master
So if I'm understanding this right, this hashes every single element in arrays up to 65536, and then jumps to fib hashing, where approximately 4096*(log(65536/4096)+1) ≈ 15000 elements (at most) get hashed. I think. What does the discontinuity in perf there look like?
I tried to choose the threshold as a power of 2 so that there's no discontinuity (or as little as possible), but of course that will depend on the specifics. In my case, that meant matching the performance on `Vector{Int}` on an M1. But for arrays whose eltype is much slower to hash than `Int` (e.g., more than, say, a few tens of ns), it's certainly possible it would be better to choose a lower threshold. The problem is I don't know how to determine those specifics in advance, so I just picked what I assumed to be the most common use case. Maybe a fun target for PGO!
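The two regimes being discussed can be sketched roughly as follows. This is a simplified illustration, not the PR's actual implementation: the 65536 threshold is the one mentioned above, while the 4096 linear prefix and the golden-ratio skip growth are stand-ins for the real fib schedule.

```julia
const DENSE_THRESHOLD = 65536  # threshold from the discussion above
const LINEAR_PREFIX = 4096     # assumption: elements hashed linearly before skipping

# Simplified sketch of the two hashing regimes. NOT Base's actual code:
# the skip schedule below is an illustrative stand-in for fib skipping.
function sketch_array_hash(A::AbstractVector, h::UInt)
    n = length(A)
    if n <= DENSE_THRESHOLD
        for x in A                    # small arrays: hash every element
            h = hash(x, h)
        end
        return h
    end
    for i in 1:LINEAR_PREFIX          # large arrays: hash a linear prefix...
        h = hash(A[i], h)
    end
    i, skip = LINEAR_PREFIX + 1, 1.0
    while i <= n                      # ...then geometrically growing skips,
        h = hash(A[i], h)             # visiting roughly O(prefix * log(n/prefix))
        skip *= 1.618                 # elements in total (golden-ratio growth)
        i += max(1, floor(Int, skip))
    end
    return h
end
```

The perf discontinuity in question sits at the `n == DENSE_THRESHOLD` boundary, where the cost per call drops from hashing all `n` elements to hashing only the prefix plus the skipped samples.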
The cutoffs in the original fib algorithm here aren't highly tuned themselves; I pulled them out of thin air similarly. So this needn't be super highly tuned; the core goal is good enough. I was just curious, since the plots above show this discontinuity on master (at 8192 elements) but cut off before the new 65536 discontinuity hits.
One option that might be quite cheap would be to hash chunks of 8 consecutive elements at a time, even within the fib algo. That is, do the `@nexprs 8 i -> p_i = hash(A[n + i - 1], p_i)` thing at every iteration, and then increase `n` and the size of the fib skips by 8x. Then we could also just hash the `key` once per 8 elements, too. I suppose, though, that for most sparse matrices this would end up hashing one nonzero and seven zeros... and then effectively 8x fewer nonzeros would end up getting included in the hash (because skips would be 8x bigger). But maybe a chunk size of 2 or 4 would be a good compromise. In fact, 2 would probably end up including about the same number of nonzeros in the hash for most sparse matrices (since we're already bouncing between finding zeros and nonzeros).
This is using linear indexing for all elements; I suspect it'll incur a pretty big perf hit for sparse and perhaps some other arrays. Sparse performance in the fib algorithm comes from both Cartesian indexing and a sparse specialization on the `findprev(!isequal(elt), A, keyidx)` search above.
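For reference, the `findprev` pattern mentioned here is what makes long runs of equal elements cheap: one backwards search skips a whole run at once. A small standalone illustration (the array is made up):

```julia
# findprev(!isequal(x), A, i) returns the last index <= i whose value
# differs from x, so a single call jumps over an entire run of equal
# elements -- e.g. the zeros dominating a sparse matrix.
A = [0, 0, 3, 0, 0, 0, 7, 0]
keyidx = 6
elt = A[keyidx]                              # value at the probe index (0)
prev = findprev(!isequal(elt), A, keyidx)    # == 3: indices 4:6 are one run
```

Sparse arrays can specialize this search further using their stored structure, rather than scanning element by element, which is the specialization the comment above refers to.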
Good point. I'll take a look at some sparse benchmarks.