-
-
Notifications
You must be signed in to change notification settings - Fork 5.6k
use rapidhash #57509
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use rapidhash #57509
Conversation
yes
yes I'm curious to see how the benchmarks look. It would be very cool if this was faster. |
xref #52440 |
|
should be in a more reviewable state now. I applied review comments so I (think) the note that I took some liberty with the @nanosoldier |
why didn't that trigger a benchmark run? |
I believe you need special permissions to run Nanosoldier. Maybe ask someone on Slack to do it |
@nanosoldier |
Question: julia/base/runtime_internals.jl Lines 830 to 833 in 4a6ada6
Lines 292 to 488 in 4a6ada6
Does |
Your job failed. |
Ah, this branch doesn't build:
|
weird, it builds fine "on my machine" even after a is it possible the build success is architecture dependent?
I don't think so, besides the fact that some types use |
@adienes the reason your nanosoldier invocation didn't trigger is that your membership in the JuliaLang org is currently set to private. |
I didn't know JuliaLang is a military organization 😆 |
eb3c200
to
948a56e
Compare
I tried to update the branch with a rebase to see if that does something |
thank you!
is this related to the PR? I had assumed it was not, but it's happened a few pushes consecutively now |
Can someone with admin rights merge this. The whitespace check passed the commit before merge with master but didn't run on the last commit for some reason. @DilumAluthge perhaps? |
Should be good to go now. |
Oh ok. I expected that to rerun all CI |
wait, was the ci checkmark lying to me? |
No I meant I didn't want to rerun all CI. Seems like it just reran the whitespace and labels checks which is good. |
If you close and reopen a PR, it will only rerun the GitHub actions jobs, such as the Whitespace job. It won't rerun the buildkite jobs. |
closes JuliaLang#57235 todos: * are the test changes acceptable? * default seed going from `0` --> `0xbdd89aa982704029` * lots and lots of benchmarking * I don't really understand the effects stuff (for the `String` hash) * how configurable should the secret be? to address JuliaLang#37166 should the seed/secret be an env var? * proper attribution to creators of `rapidhash` and also to reference implementations * I changed some `h + seed` to `h ⊻ seed` only because I don't really get why `+` was used in the first place * should instances of `hash(x) - 3h` instead be `hash(x, h)` ? --------- Co-authored-by: Jeff Bezanson <jeff.bezanson@gmail.com> Co-authored-by: Oscar Smith <oscardssmith@gmail.com> Co-authored-by: Mosè Giordano <765740+giordano@users.noreply.github.com> Co-authored-by: Ian Butterworth <i.r.butterworth@gmail.com>
Might be time to update again: Nicoshev/rapidhash#16 The rapidhash author apparently has a new better algorithm |
lol, I saw that 😭 lucky me I did try it. it is indeed faster on long strings, but it seems quite a lot slower than v1 on short strings. what's interesting to me about that, is the short path logic itself is basically identical --- the only thing changed is the code size in the other branch. that is:
vs
the first is faster even inside only the but anyway given that Julia uses this as a general purpose hash and not just for big files, I think so far it looks like V1 strikes a better balance. anyway it should be eas(ier) to update in the future than this PR was, as it basically requires changing only |
I wonder if outlining the |
I noticed that there may be a licensing problem here, the rapidhash code is not MIT licensed, and I didn't see any changes to the Julia license files (I think it just needs the correct attribution added to THIRDPARTY.md) |
We can't copy the code. We can use the algorithm though, but we do have to be careful not to copy the code. |
I am no lawyer, nor open source license expert, but FWIW I would describe this as firmly "using the algorithm" but not "copying the code" I actually made more use of the Rust port (https://github.com/hoxxep/rapidhash/blob/master/src/rapid_file.rs) than I did the original C++ source, as I found it easier to read. And the Rust port is licensed MIT and Apache-2.0 (and ofc references the C++ source as BSD 2-Clause) I am happy to submit the appropriate PR to THIRDPARTY.md shortly |
I don't think there's any third party license to add if you didn't copy the code. Using an MIT-licensed implementation as a reference is fine and doesn't require adding a license. |
Ok, if it is actually taken from the Rust port (and that port was licensed correctly), it might be ok, but from what I know, translating a BSD 2-clause still needs the license attribution:
|
This is all speculation of course, IANAL and IANYL. In any case, a translation is a derivative work in copyright terms (not just in the context of software). However I guess what happened here is not translation in that sense, rather presumably the situation is that the PR author:
When it's two different people doing that it's known as https://en.wikipedia.org/wiki/Clean-room_design |
This is not a place to play lawyer. Feel free to get together, hire an IP lawyer and get an official statement if you feel strongly that something is wrong here. I'll lock this to stop further speculation. |
closes #57235
todos:
0
-->0xbdd89aa982704029
String
hash)rapidhash
and also to reference implementationsh + seed
toh ⊻ seed
only because I don't really get why+
was used in the first placehash(x) - 3h
instead behash(x, h)
?