loongarch64: add vector routines #171

Closed
heiher wants to merge 1 commit

@heiher heiher commented Dec 31, 2024

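This patch adds LSX vector routines for LoongArch64. In the rebar comparison below (threshold 1.1x), most memmem benchmarks speed up by between 1.13x and 37.98x; the two defeat-simple-vector-alphabet benchmarks regress by 1.27x. See the commit message for details.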

 $ rebar diff -e -e memchr/memmem loong64-base.csv loong64-lsx.csv -t 1.1
 benchmark                                          engine                       loong64-base.csv     loong64-lsx.csv
 ---------                                          ------                       -------------------  ------------------
 memmem/byterank/binary                             rust/memchr/memmem/oneshot   970.1 MB/s (1.81x)   1755.4 MB/s (1.00x)
 memmem/byterank/binary                             rust/memchr/memmem/prebuilt  969.8 MB/s (1.83x)   1771.1 MB/s (1.00x)
 memmem/byterank/binary                             rust/memchr/memmem/binary    9.4 GB/s (1.16x)     10.9 GB/s (1.00x)
 memmem/code/rust-library-never-fn-strength         rust/memchr/memmem/oneshot   4.4 GB/s (2.80x)     12.2 GB/s (1.00x)
 memmem/code/rust-library-never-fn-strength         rust/memchr/memmem/prebuilt  4.4 GB/s (2.79x)     12.1 GB/s (1.00x)
 memmem/code/rust-library-never-fn-strength-paren   rust/memchr/memmem/oneshot   4.3 GB/s (2.81x)     12.1 GB/s (1.00x)
 memmem/code/rust-library-never-fn-strength-paren   rust/memchr/memmem/prebuilt  4.3 GB/s (2.81x)     12.1 GB/s (1.00x)
 memmem/code/rust-library-never-fn-quux             rust/memchr/memmem/oneshot   6.8 GB/s (1.83x)     12.4 GB/s (1.00x)
 memmem/code/rust-library-never-fn-quux             rust/memchr/memmem/prebuilt  6.8 GB/s (1.83x)     12.4 GB/s (1.00x)
 memmem/code/rust-library-rare-fn-from-str          rust/memchr/memmem/oneshot   4.0 GB/s (3.03x)     12.1 GB/s (1.00x)
 memmem/code/rust-library-rare-fn-from-str          rust/memchr/memmem/prebuilt  4.0 GB/s (3.04x)     12.1 GB/s (1.00x)
 memmem/code/rust-library-common-fn-is-empty        rust/memchr/memmem/oneshot   4.6 GB/s (2.59x)     12.0 GB/s (1.00x)
 memmem/code/rust-library-common-fn-is-empty        rust/memchr/memmem/prebuilt  4.6 GB/s (2.60x)     12.1 GB/s (1.00x)
 memmem/code/rust-library-common-fn                 rust/memchr/memmem/oneshot   1966.2 MB/s (3.68x)  7.1 GB/s (1.00x)
 memmem/code/rust-library-common-fn                 rust/memchr/memmem/prebuilt  2.1 GB/s (5.14x)     10.6 GB/s (1.00x)
 memmem/code/rust-library-common-let                rust/memchr/memmem/oneshot   1227.9 MB/s (3.81x)  4.6 GB/s (1.00x)
 memmem/code/rust-library-common-let                rust/memchr/memmem/prebuilt  1343.4 MB/s (5.36x)  7.0 GB/s (1.00x)
 memmem/pathological/md5-huge-no-hash               rust/memchr/memmem/oneshot   724.7 MB/s (12.77x)  9.0 GB/s (1.00x)
 memmem/pathological/md5-huge-no-hash               rust/memchr/memmem/prebuilt  724.6 MB/s (12.87x)  9.1 GB/s (1.00x)
 memmem/pathological/md5-huge-last-hash             rust/memchr/memmem/oneshot   734.3 MB/s (12.54x)  9.0 GB/s (1.00x)
 memmem/pathological/md5-huge-last-hash             rust/memchr/memmem/prebuilt  735.7 MB/s (12.63x)  9.1 GB/s (1.00x)
 memmem/pathological/rare-repeated-huge-tricky      rust/memchr/memmem/oneshot   333.5 MB/s (37.98x)  12.4 GB/s (1.00x)
 memmem/pathological/rare-repeated-huge-tricky      rust/memchr/memmem/prebuilt  355.9 MB/s (35.65x)  12.4 GB/s (1.00x)
 memmem/pathological/rare-repeated-huge-match       rust/memchr/memmem/oneshot   97.3 MB/s (1.82x)    177.3 MB/s (1.00x)
 memmem/pathological/rare-repeated-huge-match       rust/memchr/memmem/prebuilt  581.2 MB/s (1.29x)   747.1 MB/s (1.00x)
 memmem/pathological/rare-repeated-small-tricky     rust/memchr/memmem/oneshot   291.0 MB/s (25.23x)  7.2 GB/s (1.00x)
 memmem/pathological/rare-repeated-small-tricky     rust/memchr/memmem/prebuilt  336.1 MB/s (31.56x)  10.4 GB/s (1.00x)
 memmem/pathological/rare-repeated-small-match      rust/memchr/memmem/oneshot   102.3 MB/s (1.87x)   191.7 MB/s (1.00x)
 memmem/pathological/rare-repeated-small-match      rust/memchr/memmem/prebuilt  474.9 MB/s (1.84x)   875.8 MB/s (1.00x)
 memmem/pathological/defeat-simple-vector-alphabet  rust/memchr/memmem/oneshot   1119.5 MB/s (1.00x)  882.8 MB/s (1.27x)
 memmem/pathological/defeat-simple-vector-alphabet  rust/memchr/memmem/prebuilt  1119.5 MB/s (1.00x)  882.9 MB/s (1.27x)
 memmem/sliceslice/seemingly-random                 rust/memchr/memmem/prebuilt  787.6 KB/s (3.14x)   2.4 MB/s (1.00x)
 memmem/sliceslice/i386                             rust/memchr/memmem/prebuilt  3.7 MB/s (3.47x)     13.0 MB/s (1.00x)
 memmem/subtitles/common/huge-en-that               rust/memchr/memmem/oneshot   858.8 MB/s (6.63x)   5.6 GB/s (1.00x)
 memmem/subtitles/common/huge-en-that               rust/memchr/memmem/prebuilt  895.9 MB/s (8.48x)   7.4 GB/s (1.00x)
 memmem/subtitles/common/huge-en-you                rust/memchr/memmem/oneshot   1028.3 MB/s (2.15x)  2.2 GB/s (1.00x)
 memmem/subtitles/common/huge-en-you                rust/memchr/memmem/prebuilt  1335.8 MB/s (3.98x)  5.2 GB/s (1.00x)
 memmem/subtitles/common/huge-en-one-space          rust/memchr/memmem/oneshot   298.4 MB/s (1.58x)   471.7 MB/s (1.00x)
 memmem/subtitles/common/huge-en-one-space          rust/memchr/memmem/prebuilt  379.8 MB/s (1.57x)   596.9 MB/s (1.00x)
 memmem/subtitles/common/huge-ru-that               rust/memchr/memmem/oneshot   857.8 MB/s (6.77x)   5.7 GB/s (1.00x)
 memmem/subtitles/common/huge-ru-that               rust/memchr/memmem/prebuilt  917.4 MB/s (9.70x)   8.7 GB/s (1.00x)
 memmem/subtitles/common/huge-ru-not                rust/memchr/memmem/oneshot   846.0 MB/s (3.62x)   3.0 GB/s (1.00x)
 memmem/subtitles/common/huge-ru-not                rust/memchr/memmem/prebuilt  979.9 MB/s (6.17x)   5.9 GB/s (1.00x)
 memmem/subtitles/common/huge-ru-one-space          rust/memchr/memmem/oneshot   464.3 MB/s (1.46x)   675.8 MB/s (1.00x)
 memmem/subtitles/common/huge-ru-one-space          rust/memchr/memmem/prebuilt  573.5 MB/s (1.43x)   818.5 MB/s (1.00x)
 memmem/subtitles/common/huge-zh-that               rust/memchr/memmem/oneshot   4.0 GB/s (1.60x)     6.5 GB/s (1.00x)
 memmem/subtitles/common/huge-zh-that               rust/memchr/memmem/prebuilt  5.1 GB/s (1.75x)     8.9 GB/s (1.00x)
 memmem/subtitles/common/huge-zh-do-not             rust/memchr/memmem/oneshot   2.1 GB/s (1.67x)     3.5 GB/s (1.00x)
 memmem/subtitles/common/huge-zh-do-not             rust/memchr/memmem/prebuilt  2.8 GB/s (2.34x)     6.5 GB/s (1.00x)
 memmem/subtitles/common/huge-zh-one-space          rust/memchr/memmem/oneshot   1281.4 MB/s (1.13x)  1453.3 MB/s (1.00x)
 memmem/subtitles/never/huge-en-john-watson         rust/memchr/memmem/oneshot   10.0 GB/s (1.23x)    12.4 GB/s (1.00x)
 memmem/subtitles/never/huge-en-john-watson         rust/memchr/memmem/prebuilt  10.1 GB/s (1.23x)    12.4 GB/s (1.00x)
 memmem/subtitles/never/huge-en-all-common-bytes    rust/memchr/memmem/oneshot   1152.0 MB/s (8.68x)  9.8 GB/s (1.00x)
 memmem/subtitles/never/huge-en-all-common-bytes    rust/memchr/memmem/prebuilt  1153.4 MB/s (8.68x)  9.8 GB/s (1.00x)
 memmem/subtitles/never/huge-en-some-rare-bytes     rust/memchr/memmem/oneshot   10.2 GB/s (1.21x)    12.4 GB/s (1.00x)
 memmem/subtitles/never/huge-en-some-rare-bytes     rust/memchr/memmem/prebuilt  10.2 GB/s (1.21x)    12.4 GB/s (1.00x)
 memmem/subtitles/never/huge-en-two-space           rust/memchr/memmem/oneshot   728.5 MB/s (22.29x)  15.9 GB/s (1.00x)
 memmem/subtitles/never/huge-en-two-space           rust/memchr/memmem/prebuilt  728.5 MB/s (22.35x)  15.9 GB/s (1.00x)
 memmem/subtitles/never/teeny-en-john-watson        rust/memchr/memmem/prebuilt  890.1 MB/s (1.50x)   1335.1 MB/s (1.00x)
 memmem/subtitles/never/teeny-en-all-common-bytes   rust/memchr/memmem/prebuilt  890.1 MB/s (1.50x)   1335.1 MB/s (1.00x)
 memmem/subtitles/never/teeny-en-some-rare-bytes    rust/memchr/memmem/prebuilt  890.1 MB/s (1.50x)   1335.1 MB/s (1.00x)
 memmem/subtitles/never/teeny-en-two-space          rust/memchr/memmem/prebuilt  667.6 MB/s (2.00x)   1335.1 MB/s (1.00x)
 memmem/subtitles/never/huge-ru-john-watson         rust/memchr/memmem/oneshot   10.0 GB/s (1.24x)    12.4 GB/s (1.00x)
 memmem/subtitles/never/huge-ru-john-watson         rust/memchr/memmem/prebuilt  10.0 GB/s (1.24x)    12.4 GB/s (1.00x)
 memmem/subtitles/never/teeny-ru-john-watson        rust/memchr/memmem/prebuilt  1335.1 MB/s (1.50x)  2002.7 MB/s (1.00x)
 memmem/subtitles/never/huge-zh-john-watson         rust/memchr/memmem/oneshot   4.9 GB/s (2.48x)     12.2 GB/s (1.00x)
 memmem/subtitles/never/huge-zh-john-watson         rust/memchr/memmem/prebuilt  4.9 GB/s (2.48x)     12.2 GB/s (1.00x)
 memmem/subtitles/never/teeny-zh-john-watson        rust/memchr/memmem/prebuilt  985.5 MB/s (1.50x)   1478.2 MB/s (1.00x)
 memmem/subtitles/rare/huge-en-sherlock-holmes      rust/memchr/memmem/oneshot   7.6 GB/s (1.64x)     12.4 GB/s (1.00x)
 memmem/subtitles/rare/huge-en-sherlock-holmes      rust/memchr/memmem/prebuilt  7.6 GB/s (1.64x)     12.4 GB/s (1.00x)
 memmem/subtitles/rare/huge-en-sherlock             rust/memchr/memmem/oneshot   4.0 GB/s (3.08x)     12.3 GB/s (1.00x)
 memmem/subtitles/rare/huge-en-sherlock             rust/memchr/memmem/prebuilt  4.0 GB/s (3.09x)     12.3 GB/s (1.00x)
 memmem/subtitles/rare/huge-en-medium-needle        rust/memchr/memmem/oneshot   3.7 GB/s (3.22x)     12.0 GB/s (1.00x)
 memmem/subtitles/rare/huge-en-medium-needle        rust/memchr/memmem/prebuilt  3.7 GB/s (3.23x)     12.1 GB/s (1.00x)
 memmem/subtitles/rare/huge-en-long-needle          rust/memchr/memmem/oneshot   4.3 GB/s (3.54x)     15.1 GB/s (1.00x)
 memmem/subtitles/rare/huge-en-long-needle          rust/memchr/memmem/prebuilt  4.3 GB/s (3.60x)     15.5 GB/s (1.00x)
 memmem/subtitles/rare/huge-en-huge-needle          rust/memchr/memmem/oneshot   7.2 GB/s (1.99x)     14.4 GB/s (1.00x)
 memmem/subtitles/rare/huge-en-huge-needle          rust/memchr/memmem/prebuilt  7.5 GB/s (2.07x)     15.6 GB/s (1.00x)
 memmem/subtitles/rare/teeny-en-sherlock-holmes     rust/memchr/memmem/prebuilt  534.1 MB/s (1.67x)   890.1 MB/s (1.00x)
 memmem/subtitles/rare/teeny-en-sherlock            rust/memchr/memmem/prebuilt  534.1 MB/s (1.67x)   890.1 MB/s (1.00x)
 memmem/subtitles/rare/huge-ru-sherlock-holmes      rust/memchr/memmem/oneshot   10.7 GB/s (1.15x)    12.4 GB/s (1.00x)
 memmem/subtitles/rare/huge-ru-sherlock-holmes      rust/memchr/memmem/prebuilt  10.9 GB/s (1.14x)    12.4 GB/s (1.00x)
 memmem/subtitles/rare/huge-ru-sherlock             rust/memchr/memmem/oneshot   10.7 GB/s (1.15x)    12.4 GB/s (1.00x)
 memmem/subtitles/rare/huge-ru-sherlock             rust/memchr/memmem/prebuilt  10.8 GB/s (1.14x)    12.4 GB/s (1.00x)
 memmem/subtitles/rare/teeny-ru-sherlock-holmes     rust/memchr/memmem/prebuilt  801.1 MB/s (1.67x)   1335.1 MB/s (1.00x)
 memmem/subtitles/rare/teeny-ru-sherlock            rust/memchr/memmem/prebuilt  801.1 MB/s (1.25x)   1001.4 MB/s (1.00x)
 memmem/subtitles/rare/huge-zh-sherlock-holmes      rust/memchr/memmem/oneshot   2.6 GB/s (4.67x)     12.1 GB/s (1.00x)
 memmem/subtitles/rare/huge-zh-sherlock-holmes      rust/memchr/memmem/prebuilt  2.6 GB/s (4.67x)     12.2 GB/s (1.00x)
 memmem/subtitles/rare/huge-zh-sherlock             rust/memchr/memmem/oneshot   3.8 GB/s (3.19x)     12.2 GB/s (1.00x)
 memmem/subtitles/rare/huge-zh-sherlock             rust/memchr/memmem/prebuilt  3.8 GB/s (3.19x)     12.3 GB/s (1.00x)
 memmem/subtitles/rare/teeny-zh-sherlock-holmes     rust/memchr/memmem/prebuilt  591.3 MB/s (1.25x)   739.1 MB/s (1.00x)
 memmem/subtitles/rare/teeny-zh-sherlock            rust/memchr/memmem/prebuilt  591.3 MB/s (1.25x)   739.1 MB/s (1.00x)
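
For readers unfamiliar with the table layout: the parenthesized factor on each row is how many times slower that column is than the faster of the two, so the faster column always shows 1.00x. The sketch below recomputes that factor from the displayed throughput cells. It is only an illustration, not part of rebar or of this patch: the helper names `parse_mb_per_s` and `slowdown` are hypothetical, and the 1 GB = 1024 MB conversion is an assumption (it is consistent with the rounded values shown above, but rebar itself computes ratios from raw timings, so recomputed values can differ slightly).

```rust
/// Parse a throughput cell such as "970.1 MB/s" or "12.4 GB/s" into MB/s.
/// Assumption: binary units (1 GB = 1024 MB, 1 MB = 1024 KB).
/// Returns None for anything that does not look like "<number> <unit>".
fn parse_mb_per_s(cell: &str) -> Option<f64> {
    let mut parts = cell.split_whitespace();
    let value: f64 = parts.next()?.parse().ok()?;
    let scale = match parts.next()? {
        "KB/s" => 1.0 / 1024.0,
        "MB/s" => 1.0,
        "GB/s" => 1024.0,
        _ => return None,
    };
    Some(value * scale)
}

/// How many times slower `this` is than `best`, both given as table cells.
/// The inputs are rounded display values, so the result is approximate.
fn slowdown(this: &str, best: &str) -> Option<f64> {
    Some(parse_mb_per_s(best)? / parse_mb_per_s(this)?)
}
```

For example, `slowdown("970.1 MB/s", "1755.4 MB/s")` gives roughly 1.81, matching the first row, and `slowdown("882.8 MB/s", "1119.5 MB/s")` gives roughly 1.27, matching the defeat-simple-vector-alphabet regression.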
@BurntSushi
Owner

Thanks! This looks like a fair bit of work.

But I really wish folks would file issues before throwing up huge PRs like this. Generally speaking, I don't let nightly code into my crates without a really compelling motivation. I've tried it before, and it just basically always results in headaches in one way or another. And this is especially fraught because I don't have access to any LoongArch hardware. Which brings me to my next major problem with this PR: this doesn't test any of these changes in CI. Can we run LoongArch tests somehow? If not, that seems like a major problem that needs to be addressed. I see that it is listed as a supported target for Cross, so hopefully that's easy to address.

A more minor problem is that because I don't have any LoongArch hardware, I can't actually run the benchmarks. I don't think that can be meaningfully solved by Cross. I don't think this is a blocking issue ultimately.

@heiher
Author

heiher commented Jan 3, 2025

Thank you for your comments. I'll add CI tests for the LoongArch targets in a new PR. This PR will be updated after the stabilization of the dependent SIMD target features. As for performance benchmarking, I propose that I track it using the physical machines in my CI (due to network instability, I'm currently unable to access GitHub).
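
As a rough illustration of how an architecture-specific path can be kept from affecting other builds, the sketch below selects an implementation at compile time with `cfg` and falls back to a scalar routine everywhere else. This is not the code from this PR: `find_byte`, `find_byte_impl`, and `memchr_fallback` are hypothetical names, and the LSX branch is only a placeholder that defers to the scalar routine where a real vector routine would go.

```rust
/// Portable scalar search; available on every target.
fn memchr_fallback(needle: u8, haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == needle)
}

/// Compiled only when targeting LoongArch64 with LSX enabled at compile time.
/// Placeholder: a real vector routine would live here; this sketch just
/// defers to the scalar routine so that the example stays self-contained.
#[cfg(all(target_arch = "loongarch64", target_feature = "lsx"))]
fn find_byte_impl(needle: u8, haystack: &[u8]) -> Option<usize> {
    memchr_fallback(needle, haystack)
}

/// Compiled on every other target or configuration.
#[cfg(not(all(target_arch = "loongarch64", target_feature = "lsx")))]
fn find_byte_impl(needle: u8, haystack: &[u8]) -> Option<usize> {
    memchr_fallback(needle, haystack)
}

/// Hypothetical public entry point; callers never see which path was chosen.
pub fn find_byte(needle: u8, haystack: &[u8]) -> Option<usize> {
    find_byte_impl(needle, haystack)
}
```

Because exactly one `find_byte_impl` is compiled for any given target, the same test suite exercises whichever path is active, which is why running the tests on a LoongArch target in CI matters for covering the vector path.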

@heiher heiher marked this pull request as draft January 3, 2025 06:26
@BurntSushi
Owner

BurntSushi commented Jan 3, 2025

CI tests should come with the new target support, please.

My suggestion is that we close this PR for now and we can revisit this once the intrinsics are stabilized.

@heiher
Author

heiher commented Jan 3, 2025

> CI tests should come with the new target support, please.
>
> My suggestion is that we close this PR for now and we can revisit this once the intrinsics are stabilized.

Okay.

@heiher heiher closed this Jan 3, 2025