loongarch64: add vector routines #171
$ rebar diff -e -e memchr/memmem loong64-base.csv loong64-lsx.csv -t 1.1
benchmark  engine  loong64-base.csv  loong64-lsx.csv
---------  ------  ----------------  ---------------
memmem/byterank/binary  rust/memchr/memmem/oneshot  970.1 MB/s (1.81x)  1755.4 MB/s (1.00x)
memmem/byterank/binary  rust/memchr/memmem/prebuilt  969.8 MB/s (1.83x)  1771.1 MB/s (1.00x)
memmem/byterank/binary  rust/memchr/memmem/binary  9.4 GB/s (1.16x)  10.9 GB/s (1.00x)
memmem/code/rust-library-never-fn-strength  rust/memchr/memmem/oneshot  4.4 GB/s (2.80x)  12.2 GB/s (1.00x)
memmem/code/rust-library-never-fn-strength  rust/memchr/memmem/prebuilt  4.4 GB/s (2.79x)  12.1 GB/s (1.00x)
memmem/code/rust-library-never-fn-strength-paren  rust/memchr/memmem/oneshot  4.3 GB/s (2.81x)  12.1 GB/s (1.00x)
memmem/code/rust-library-never-fn-strength-paren  rust/memchr/memmem/prebuilt  4.3 GB/s (2.81x)  12.1 GB/s (1.00x)
memmem/code/rust-library-never-fn-quux  rust/memchr/memmem/oneshot  6.8 GB/s (1.83x)  12.4 GB/s (1.00x)
memmem/code/rust-library-never-fn-quux  rust/memchr/memmem/prebuilt  6.8 GB/s (1.83x)  12.4 GB/s (1.00x)
memmem/code/rust-library-rare-fn-from-str  rust/memchr/memmem/oneshot  4.0 GB/s (3.03x)  12.1 GB/s (1.00x)
memmem/code/rust-library-rare-fn-from-str  rust/memchr/memmem/prebuilt  4.0 GB/s (3.04x)  12.1 GB/s (1.00x)
memmem/code/rust-library-common-fn-is-empty  rust/memchr/memmem/oneshot  4.6 GB/s (2.59x)  12.0 GB/s (1.00x)
memmem/code/rust-library-common-fn-is-empty  rust/memchr/memmem/prebuilt  4.6 GB/s (2.60x)  12.1 GB/s (1.00x)
memmem/code/rust-library-common-fn  rust/memchr/memmem/oneshot  1966.2 MB/s (3.68x)  7.1 GB/s (1.00x)
memmem/code/rust-library-common-fn  rust/memchr/memmem/prebuilt  2.1 GB/s (5.14x)  10.6 GB/s (1.00x)
memmem/code/rust-library-common-let  rust/memchr/memmem/oneshot  1227.9 MB/s (3.81x)  4.6 GB/s (1.00x)
memmem/code/rust-library-common-let  rust/memchr/memmem/prebuilt  1343.4 MB/s (5.36x)  7.0 GB/s (1.00x)
memmem/pathological/md5-huge-no-hash  rust/memchr/memmem/oneshot  724.7 MB/s (12.77x)  9.0 GB/s (1.00x)
memmem/pathological/md5-huge-no-hash  rust/memchr/memmem/prebuilt  724.6 MB/s (12.87x)  9.1 GB/s (1.00x)
memmem/pathological/md5-huge-last-hash  rust/memchr/memmem/oneshot  734.3 MB/s (12.54x)  9.0 GB/s (1.00x)
memmem/pathological/md5-huge-last-hash  rust/memchr/memmem/prebuilt  735.7 MB/s (12.63x)  9.1 GB/s (1.00x)
memmem/pathological/rare-repeated-huge-tricky  rust/memchr/memmem/oneshot  333.5 MB/s (37.98x)  12.4 GB/s (1.00x)
memmem/pathological/rare-repeated-huge-tricky  rust/memchr/memmem/prebuilt  355.9 MB/s (35.65x)  12.4 GB/s (1.00x)
memmem/pathological/rare-repeated-huge-match  rust/memchr/memmem/oneshot  97.3 MB/s (1.82x)  177.3 MB/s (1.00x)
memmem/pathological/rare-repeated-huge-match  rust/memchr/memmem/prebuilt  581.2 MB/s (1.29x)  747.1 MB/s (1.00x)
memmem/pathological/rare-repeated-small-tricky  rust/memchr/memmem/oneshot  291.0 MB/s (25.23x)  7.2 GB/s (1.00x)
memmem/pathological/rare-repeated-small-tricky  rust/memchr/memmem/prebuilt  336.1 MB/s (31.56x)  10.4 GB/s (1.00x)
memmem/pathological/rare-repeated-small-match  rust/memchr/memmem/oneshot  102.3 MB/s (1.87x)  191.7 MB/s (1.00x)
memmem/pathological/rare-repeated-small-match  rust/memchr/memmem/prebuilt  474.9 MB/s (1.84x)  875.8 MB/s (1.00x)
memmem/pathological/defeat-simple-vector-alphabet  rust/memchr/memmem/oneshot  1119.5 MB/s (1.00x)  882.8 MB/s (1.27x)
memmem/pathological/defeat-simple-vector-alphabet  rust/memchr/memmem/prebuilt  1119.5 MB/s (1.00x)  882.9 MB/s (1.27x)
memmem/sliceslice/seemingly-random  rust/memchr/memmem/prebuilt  787.6 KB/s (3.14x)  2.4 MB/s (1.00x)
memmem/sliceslice/i386  rust/memchr/memmem/prebuilt  3.7 MB/s (3.47x)  13.0 MB/s (1.00x)
memmem/subtitles/common/huge-en-that  rust/memchr/memmem/oneshot  858.8 MB/s (6.63x)  5.6 GB/s (1.00x)
memmem/subtitles/common/huge-en-that  rust/memchr/memmem/prebuilt  895.9 MB/s (8.48x)  7.4 GB/s (1.00x)
memmem/subtitles/common/huge-en-you  rust/memchr/memmem/oneshot  1028.3 MB/s (2.15x)  2.2 GB/s (1.00x)
memmem/subtitles/common/huge-en-you  rust/memchr/memmem/prebuilt  1335.8 MB/s (3.98x)  5.2 GB/s (1.00x)
memmem/subtitles/common/huge-en-one-space  rust/memchr/memmem/oneshot  298.4 MB/s (1.58x)  471.7 MB/s (1.00x)
memmem/subtitles/common/huge-en-one-space  rust/memchr/memmem/prebuilt  379.8 MB/s (1.57x)  596.9 MB/s (1.00x)
memmem/subtitles/common/huge-ru-that  rust/memchr/memmem/oneshot  857.8 MB/s (6.77x)  5.7 GB/s (1.00x)
memmem/subtitles/common/huge-ru-that  rust/memchr/memmem/prebuilt  917.4 MB/s (9.70x)  8.7 GB/s (1.00x)
memmem/subtitles/common/huge-ru-not  rust/memchr/memmem/oneshot  846.0 MB/s (3.62x)  3.0 GB/s (1.00x)
memmem/subtitles/common/huge-ru-not  rust/memchr/memmem/prebuilt  979.9 MB/s (6.17x)  5.9 GB/s (1.00x)
memmem/subtitles/common/huge-ru-one-space  rust/memchr/memmem/oneshot  464.3 MB/s (1.46x)  675.8 MB/s (1.00x)
memmem/subtitles/common/huge-ru-one-space  rust/memchr/memmem/prebuilt  573.5 MB/s (1.43x)  818.5 MB/s (1.00x)
memmem/subtitles/common/huge-zh-that  rust/memchr/memmem/oneshot  4.0 GB/s (1.60x)  6.5 GB/s (1.00x)
memmem/subtitles/common/huge-zh-that  rust/memchr/memmem/prebuilt  5.1 GB/s (1.75x)  8.9 GB/s (1.00x)
memmem/subtitles/common/huge-zh-do-not  rust/memchr/memmem/oneshot  2.1 GB/s (1.67x)  3.5 GB/s (1.00x)
memmem/subtitles/common/huge-zh-do-not  rust/memchr/memmem/prebuilt  2.8 GB/s (2.34x)  6.5 GB/s (1.00x)
memmem/subtitles/common/huge-zh-one-space  rust/memchr/memmem/oneshot  1281.4 MB/s (1.13x)  1453.3 MB/s (1.00x)
memmem/subtitles/never/huge-en-john-watson  rust/memchr/memmem/oneshot  10.0 GB/s (1.23x)  12.4 GB/s (1.00x)
memmem/subtitles/never/huge-en-john-watson  rust/memchr/memmem/prebuilt  10.1 GB/s (1.23x)  12.4 GB/s (1.00x)
memmem/subtitles/never/huge-en-all-common-bytes  rust/memchr/memmem/oneshot  1152.0 MB/s (8.68x)  9.8 GB/s (1.00x)
memmem/subtitles/never/huge-en-all-common-bytes  rust/memchr/memmem/prebuilt  1153.4 MB/s (8.68x)  9.8 GB/s (1.00x)
memmem/subtitles/never/huge-en-some-rare-bytes  rust/memchr/memmem/oneshot  10.2 GB/s (1.21x)  12.4 GB/s (1.00x)
memmem/subtitles/never/huge-en-some-rare-bytes  rust/memchr/memmem/prebuilt  10.2 GB/s (1.21x)  12.4 GB/s (1.00x)
memmem/subtitles/never/huge-en-two-space  rust/memchr/memmem/oneshot  728.5 MB/s (22.29x)  15.9 GB/s (1.00x)
memmem/subtitles/never/huge-en-two-space  rust/memchr/memmem/prebuilt  728.5 MB/s (22.35x)  15.9 GB/s (1.00x)
memmem/subtitles/never/teeny-en-john-watson  rust/memchr/memmem/prebuilt  890.1 MB/s (1.50x)  1335.1 MB/s (1.00x)
memmem/subtitles/never/teeny-en-all-common-bytes  rust/memchr/memmem/prebuilt  890.1 MB/s (1.50x)  1335.1 MB/s (1.00x)
memmem/subtitles/never/teeny-en-some-rare-bytes  rust/memchr/memmem/prebuilt  890.1 MB/s (1.50x)  1335.1 MB/s (1.00x)
memmem/subtitles/never/teeny-en-two-space  rust/memchr/memmem/prebuilt  667.6 MB/s (2.00x)  1335.1 MB/s (1.00x)
memmem/subtitles/never/huge-ru-john-watson  rust/memchr/memmem/oneshot  10.0 GB/s (1.24x)  12.4 GB/s (1.00x)
memmem/subtitles/never/huge-ru-john-watson  rust/memchr/memmem/prebuilt  10.0 GB/s (1.24x)  12.4 GB/s (1.00x)
memmem/subtitles/never/teeny-ru-john-watson  rust/memchr/memmem/prebuilt  1335.1 MB/s (1.50x)  2002.7 MB/s (1.00x)
memmem/subtitles/never/huge-zh-john-watson  rust/memchr/memmem/oneshot  4.9 GB/s (2.48x)  12.2 GB/s (1.00x)
memmem/subtitles/never/huge-zh-john-watson  rust/memchr/memmem/prebuilt  4.9 GB/s (2.48x)  12.2 GB/s (1.00x)
memmem/subtitles/never/teeny-zh-john-watson  rust/memchr/memmem/prebuilt  985.5 MB/s (1.50x)  1478.2 MB/s (1.00x)
memmem/subtitles/rare/huge-en-sherlock-holmes  rust/memchr/memmem/oneshot  7.6 GB/s (1.64x)  12.4 GB/s (1.00x)
memmem/subtitles/rare/huge-en-sherlock-holmes  rust/memchr/memmem/prebuilt  7.6 GB/s (1.64x)  12.4 GB/s (1.00x)
memmem/subtitles/rare/huge-en-sherlock  rust/memchr/memmem/oneshot  4.0 GB/s (3.08x)  12.3 GB/s (1.00x)
memmem/subtitles/rare/huge-en-sherlock  rust/memchr/memmem/prebuilt  4.0 GB/s (3.09x)  12.3 GB/s (1.00x)
memmem/subtitles/rare/huge-en-medium-needle  rust/memchr/memmem/oneshot  3.7 GB/s (3.22x)  12.0 GB/s (1.00x)
memmem/subtitles/rare/huge-en-medium-needle  rust/memchr/memmem/prebuilt  3.7 GB/s (3.23x)  12.1 GB/s (1.00x)
memmem/subtitles/rare/huge-en-long-needle  rust/memchr/memmem/oneshot  4.3 GB/s (3.54x)  15.1 GB/s (1.00x)
memmem/subtitles/rare/huge-en-long-needle  rust/memchr/memmem/prebuilt  4.3 GB/s (3.60x)  15.5 GB/s (1.00x)
memmem/subtitles/rare/huge-en-huge-needle  rust/memchr/memmem/oneshot  7.2 GB/s (1.99x)  14.4 GB/s (1.00x)
memmem/subtitles/rare/huge-en-huge-needle  rust/memchr/memmem/prebuilt  7.5 GB/s (2.07x)  15.6 GB/s (1.00x)
memmem/subtitles/rare/teeny-en-sherlock-holmes  rust/memchr/memmem/prebuilt  534.1 MB/s (1.67x)  890.1 MB/s (1.00x)
memmem/subtitles/rare/teeny-en-sherlock  rust/memchr/memmem/prebuilt  534.1 MB/s (1.67x)  890.1 MB/s (1.00x)
memmem/subtitles/rare/huge-ru-sherlock-holmes  rust/memchr/memmem/oneshot  10.7 GB/s (1.15x)  12.4 GB/s (1.00x)
memmem/subtitles/rare/huge-ru-sherlock-holmes  rust/memchr/memmem/prebuilt  10.9 GB/s (1.14x)  12.4 GB/s (1.00x)
memmem/subtitles/rare/huge-ru-sherlock  rust/memchr/memmem/oneshot  10.7 GB/s (1.15x)  12.4 GB/s (1.00x)
memmem/subtitles/rare/huge-ru-sherlock  rust/memchr/memmem/prebuilt  10.8 GB/s (1.14x)  12.4 GB/s (1.00x)
memmem/subtitles/rare/teeny-ru-sherlock-holmes  rust/memchr/memmem/prebuilt  801.1 MB/s (1.67x)  1335.1 MB/s (1.00x)
memmem/subtitles/rare/teeny-ru-sherlock  rust/memchr/memmem/prebuilt  801.1 MB/s (1.25x)  1001.4 MB/s (1.00x)
memmem/subtitles/rare/huge-zh-sherlock-holmes  rust/memchr/memmem/oneshot  2.6 GB/s (4.67x)  12.1 GB/s (1.00x)
memmem/subtitles/rare/huge-zh-sherlock-holmes  rust/memchr/memmem/prebuilt  2.6 GB/s (4.67x)  12.2 GB/s (1.00x)
memmem/subtitles/rare/huge-zh-sherlock  rust/memchr/memmem/oneshot  3.8 GB/s (3.19x)  12.2 GB/s (1.00x)
memmem/subtitles/rare/huge-zh-sherlock  rust/memchr/memmem/prebuilt  3.8 GB/s (3.19x)  12.3 GB/s (1.00x)
memmem/subtitles/rare/teeny-zh-sherlock-holmes  rust/memchr/memmem/prebuilt  591.3 MB/s (1.25x)  739.1 MB/s (1.00x)
memmem/subtitles/rare/teeny-zh-sherlock  rust/memchr/memmem/prebuilt  591.3 MB/s (1.25x)  739.1 MB/s (1.00x)
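A note on reading the parenthesized ratios: in each row the faster column is marked 1.00x, and the other column's ratio appears to be the best throughput divided by its own. The sketch below reproduces that arithmetic for two rows of the table. The function names `slowdown_ratio` and `row_ratios` are hypothetical helpers written for this illustration, not part of rebar or memchr, and rebar itself presumably computes ratios from unrounded measurements, so rows whose throughputs are printed in rounded GB/s will not reproduce exactly.

```rust
/// How many times slower `throughput` is than `best`, the fastest
/// measurement in the same row. The fastest entry gets exactly 1.0.
/// (Hypothetical helper, assumed to mirror how the ratios are derived.)
fn slowdown_ratio(throughput: f64, best: f64) -> f64 {
    best / throughput
}

/// Ratios for a two-column row, with both throughputs in the same unit.
fn row_ratios(base: f64, lsx: f64) -> (f64, f64) {
    let best = base.max(lsx);
    (slowdown_ratio(base, best), slowdown_ratio(lsx, best))
}

fn main() {
    // memmem/byterank/binary, oneshot: 970.1 MB/s (base) vs 1755.4 MB/s (LSX).
    let (base, lsx) = row_ratios(970.1, 1755.4);
    println!("{:.2}x {:.2}x", base, lsx); // prints "1.81x 1.00x"

    // memmem/pathological/defeat-simple-vector-alphabet, oneshot:
    // 1119.5 MB/s (base) vs 882.8 MB/s (LSX). Here the base build is faster.
    let (base, lsx) = row_ratios(1119.5, 882.8);
    println!("{:.2}x {:.2}x", base, lsx); // prints "1.00x 1.27x"
}
```

Read this way, the `-t 1.1` threshold in the command would keep only rows where the two builds differ by at least a factor of 1.1, which is consistent with every row above showing a ratio of 1.13x or more.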
Thanks! This looks like a fair bit of work. But I really wish folks would file issues before throwing up huge PRs like this.

Generally speaking, I don't let nightly code into my crates without a really compelling motivation. I've tried it before, and it just basically always results in headaches in one way or another. And this is especially fraught because I don't have access to any LoongArch hardware.

Which brings me to my next major problem with this PR: this doesn't test any of these changes in CI. Can we run LoongArch tests somehow? If not, that seems like a major problem that needs to be addressed. I see that it is listed as a supported target for Cross, so hopefully that's easy to address.

A more minor problem is that because I don't have any LoongArch hardware, I can't actually run the benchmarks. I don't think that can be meaningfully solved by Cross. I don't think this is a blocking issue ultimately.
Thank you for your comments. I'll add CI tests for the LoongArch targets in a new PR. This PR will be updated after the stabilization of the dependent SIMD target features. As for performance benchmarking, I propose that I track it using the physical machines in my CI (due to network instability, I'm currently unable to access GitHub).
CI tests should come with the new target support, please. My suggestion is that we close this PR for now and we can revisit this once the intrinsics are stabilized.
Okay.
This patch adds LSX vector routines for LoongArch64, significantly improving performance. See the commit message for details.