<bit>: Could has_single_bit() be faster? #5359
Comments
I'm not sure. For the current implementation I'd expect something like the following (assuming the x64 architecture, as it is the primary optimization target):
And for
Which of these is better depends heavily on the context. I'd expect the current branchy function to win for predictable input, and the
In light of this uncertainty, I would avoid making any changes and introducing unnecessary complexity.
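For reference, the branchy versus branch-free trade-off discussed here can be illustrated with a well-known bit trick that needs neither a short-circuit test nor popcnt. This is only a sketch: the function name single_bit_branch_free is hypothetical, and it is not necessarily the code that was posted in this comment or what any particular compiler emits.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the STL): branch-free single-bit test.
//  - x == 0:         x - 1 wraps to all ones, so x ^ (x - 1) equals x - 1
//                    and the strict comparison is false.
//  - x == 2^k:       x ^ (x - 1) sets bit k and every lower bit, which is
//                    strictly greater than x - 1.
//  - several bits:   the XOR clears every bit above the lowest set bit,
//                    so the result is smaller than x - 1.
constexpr bool single_bit_branch_free(std::uint32_t x) noexcept {
    return (x ^ (x - 1)) > (x - 1);
}

static_assert(!single_bit_branch_free(0u), "zero has no bits set");
static_assert(single_bit_branch_free(1u), "lowest bit only");
static_assert(single_bit_branch_free(0x80000000u), "highest bit only");
static_assert(!single_bit_branch_free(12u), "two bits set");
```

Whether this beats a predictable branch or a hardware popcnt depends on the input distribution and the target CPU, so it would need to be measured rather than assumed.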
Oh, it also seems to be CPU vendor specific. AMDs have cheaper
According to GCC codegen, popcnt is the winner if available, yes. If not, the best answer is
Ideally, our compilers would do the right thing here, so we may need to report optimization bug(s) to C1XX and/or Clang.
Yes, it seems GCC and Clang use popcount for CPUs starting from Nehalem: https://godbolt.org/z/s4MnvcfW7
This one seems to be better for MSVC and others when targeting a generic CPU. It looks like GCC and Clang use this one, as the codegen matches
Seems like it should convert to popcount(x) == 1 if it's available on the target arch, but looking at the code, it doesn't seem like the compiler would do it: STL/stl/inc/bit, line 87 in f2a2933
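To make the proposal concrete, here is a minimal sketch of the two formulations being compared: the conventional mask test and popcount(x) == 1. All helper names are hypothetical and this is not the STL's actual source. std::popcount is used only when the translation unit is compiled as C++20; otherwise a portable loop stands in, as an assumption made so the sketch builds under older language modes.

```cpp
#include <cstdint>

#if __cplusplus >= 202002L || (defined(_MSVC_LANG) && _MSVC_LANG >= 202002L)
#include <bit>
#define SKETCH_HAVE_STD_POPCOUNT 1
#endif

// Hypothetical helper: conventional mask test. The x != 0 check is the
// part a compiler may lower to a branch.
constexpr bool single_bit_by_mask(std::uint32_t x) noexcept {
    return x != 0 && (x & (x - 1)) == 0;
}

// Hypothetical helper: number of set bits in x.
constexpr int count_set_bits(std::uint32_t x) noexcept {
#ifdef SKETCH_HAVE_STD_POPCOUNT
    return std::popcount(x);  // may compile to a single popcnt instruction
#else
    int n = 0;
    for (; x != 0; x &= x - 1) {  // clear the lowest set bit each iteration
        ++n;
    }
    return n;
#endif
}

// Hypothetical helper: the popcount-based formulation from the issue.
constexpr bool single_bit_by_popcount(std::uint32_t x) noexcept {
    return count_set_bits(x) == 1;
}

// Returns true when both formulations agree on every 16-bit value and
// both accept each single high bit (bits 16 through 31).
inline bool formulations_agree() noexcept {
    for (std::uint32_t x = 0; x <= 0xFFFFu; ++x) {
        if (single_bit_by_mask(x) != single_bit_by_popcount(x)) {
            return false;
        }
    }
    for (int shift = 16; shift < 32; ++shift) {
        const std::uint32_t x = std::uint32_t{1} << shift;
        if (!single_bit_by_mask(x) || !single_bit_by_popcount(x)) {
            return false;
        }
    }
    return true;
}
```

The two forms are semantically equivalent, so the open question in this thread is purely about code generation: whether the compiler is allowed to assume popcnt exists on the target, and whether it then picks it on its own.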