loongarch64: add vector routines #171

Closed
1,031 changes: 1,031 additions & 0 deletions src/arch/loongarch64/lsx/memchr.rs

Large diffs are not rendered by default.

6 changes: 6 additions & 0 deletions src/arch/loongarch64/lsx/mod.rs
@@ -0,0 +1,6 @@
/*!
Algorithms for the `loongarch64` target using 128-bit vectors via LSX.
*/

pub mod memchr;
pub mod packedpair;
236 changes: 236 additions & 0 deletions src/arch/loongarch64/lsx/packedpair.rs
@@ -0,0 +1,236 @@
/*!
A 128-bit vector implementation of the "packed pair" SIMD algorithm.

The "packed pair" algorithm is based on the [generic SIMD] algorithm. The main
difference is that it (by default) uses a background distribution of byte
frequencies to heuristically select the pair of bytes to search for.

[generic SIMD]: http://0x80.pl/articles/simd-strfind.html#first-and-last
*/

use core::arch::loongarch64::v16u8;

use crate::arch::{all::packedpair::Pair, generic::packedpair};

/// A "packed pair" finder that uses 128-bit vector operations.
///
/// This finder picks two bytes that it believes have high predictive power
/// for indicating an overall match of a needle. Depending on whether
/// `Finder::find` or `Finder::find_prefilter` is used, it reports offsets
/// where the needle matches or could match. In the prefilter case, candidates
/// are reported whenever the [`Pair`] of bytes given matches.
#[derive(Clone, Copy, Debug)]
pub struct Finder(packedpair::Finder<v16u8>);

impl Finder {
/// Create a new pair searcher. The searcher returned can either report
/// exact matches of `needle` or act as a prefilter and report candidate
/// positions of `needle`.
///
/// If lsx is unavailable in the current environment or if a [`Pair`]
/// could not be constructed from the needle given, then `None` is
/// returned.
#[inline]
pub fn new(needle: &[u8]) -> Option<Finder> {
Finder::with_pair(needle, Pair::new(needle)?)
}

/// Create a new "packed pair" finder using the pair of bytes given.
///
/// This constructor permits callers to control precisely which pair of
/// bytes is used as a predicate.
///
/// If lsx is unavailable in the current environment, then `None` is
/// returned.
#[inline]
pub fn with_pair(needle: &[u8], pair: Pair) -> Option<Finder> {
if Finder::is_available() {
// SAFETY: we check that lsx is available above. We are also
// guaranteed to have needle.len() > 1 because we have a valid
// Pair.
unsafe { Some(Finder::with_pair_impl(needle, pair)) }
} else {
None
}
}

/// Create a new `Finder` specific to lsx vectors and routines.
///
/// # Safety
///
/// Same as the safety for `packedpair::Finder::new`, and callers must also
/// ensure that lsx is available.
#[target_feature(enable = "lsx")]
#[inline]
unsafe fn with_pair_impl(needle: &[u8], pair: Pair) -> Finder {
let finder = packedpair::Finder::<v16u8>::new(needle, pair);
Finder(finder)
}

/// Returns true when this implementation is available in the current
/// environment.
///
/// When this is true, it is guaranteed that [`Finder::with_pair`] will
/// return a `Some` value. Similarly, when it is false, it is guaranteed
/// that `Finder::with_pair` will return a `None` value. Notice that this
/// does not guarantee that [`Finder::new`] will return a `Finder`. Namely,
/// even when `Finder::is_available` is true, it is not guaranteed that a
/// valid [`Pair`] can be found from the needle given.
///
/// Note also that for the lifetime of a single program, if this returns
/// true then it will always return true.
#[inline]
pub fn is_available() -> bool {
#[cfg(target_feature = "lsx")]
{
true
}
#[cfg(not(target_feature = "lsx"))]
{
false
}
}

/// Execute a search using lsx vectors and routines.
///
/// # Panics
///
/// When `haystack.len()` is less than [`Finder::min_haystack_len`].
#[inline]
pub fn find(&self, haystack: &[u8], needle: &[u8]) -> Option<usize> {
// SAFETY: Building a `Finder` means it's safe to call 'lsx' routines.
unsafe { self.find_impl(haystack, needle) }
}

/// Execute a prefilter search using lsx vectors and routines.
///
/// # Panics
///
/// When `haystack.len()` is less than [`Finder::min_haystack_len`].
#[inline]
pub fn find_prefilter(&self, haystack: &[u8]) -> Option<usize> {
// SAFETY: Building a `Finder` means it's safe to call 'lsx' routines.
unsafe { self.find_prefilter_impl(haystack) }
}

/// Execute a search using lsx vectors and routines.
///
/// # Panics
///
/// When `haystack.len()` is less than [`Finder::min_haystack_len`].
///
/// # Safety
///
/// (The target feature safety obligation is automatically fulfilled by
/// virtue of being a method on `Finder`, which can only be constructed
/// when it is safe to call `lsx` routines.)
#[target_feature(enable = "lsx")]
#[inline]
unsafe fn find_impl(
&self,
haystack: &[u8],
needle: &[u8],
) -> Option<usize> {
self.0.find(haystack, needle)
}

/// Execute a prefilter search using lsx vectors and routines.
///
/// # Panics
///
/// When `haystack.len()` is less than [`Finder::min_haystack_len`].
///
/// # Safety
///
/// (The target feature safety obligation is automatically fulfilled by
/// virtue of being a method on `Finder`, which can only be constructed
/// when it is safe to call `lsx` routines.)
#[target_feature(enable = "lsx")]
#[inline]
unsafe fn find_prefilter_impl(&self, haystack: &[u8]) -> Option<usize> {
self.0.find_prefilter(haystack)
}

/// Returns the pair of offsets (into the needle) used to check as a
/// predicate before confirming whether a needle exists at a particular
/// position.
#[inline]
pub fn pair(&self) -> &Pair {
self.0.pair()
}

/// Returns the minimum haystack length that this `Finder` can search.
///
/// Using a haystack with length smaller than this in a search will result
/// in a panic. The reason for this restriction is that this finder is
/// meant to be a low-level component that is part of a larger substring
/// strategy. In that sense, it avoids trying to handle all cases and
/// instead only handles the cases that it can handle very well.
#[inline]
pub fn min_haystack_len(&self) -> usize {
self.0.min_haystack_len()
}
}

#[cfg(test)]
mod tests {
use super::*;

fn find(haystack: &[u8], needle: &[u8]) -> Option<Option<usize>> {
let f = Finder::new(needle)?;
if haystack.len() < f.min_haystack_len() {
return None;
}
Some(f.find(haystack, needle))
}

define_substring_forward_quickcheck!(find);

#[test]
fn forward_substring() {
crate::tests::substring::Runner::new().fwd(find).run()
}

#[test]
fn forward_packedpair() {
fn find(
haystack: &[u8],
needle: &[u8],
index1: u8,
index2: u8,
) -> Option<Option<usize>> {
let pair = Pair::with_indices(needle, index1, index2)?;
let f = Finder::with_pair(needle, pair)?;
if haystack.len() < f.min_haystack_len() {
return None;
}
Some(f.find(haystack, needle))
}
crate::tests::packedpair::Runner::new().fwd(find).run()
}

#[test]
fn forward_packedpair_prefilter() {
fn find(
haystack: &[u8],
needle: &[u8],
index1: u8,
index2: u8,
) -> Option<Option<usize>> {
let pair = Pair::with_indices(needle, index1, index2)?;
let f = Finder::with_pair(needle, pair)?;
if haystack.len() < f.min_haystack_len() {
return None;
}
Some(f.find_prefilter(haystack))
}
crate::tests::packedpair::Runner::new().fwd(find).run()
}
}
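For readers following the diff, a scalar sketch of the idea behind `Finder::find_prefilter` and `Finder::find` may help. It is not the crate's implementation: `PairSketch`, `first_and_last`, and both free functions are hypothetical names, the pair is fixed to the first and last needle bytes (the "generic SIMD" variant linked in the module docs) rather than chosen by byte frequency, and candidates are checked one position at a time instead of 16 at a time with 128-bit vectors.

```rust
/// Hypothetical scalar model of a "packed pair"; not a type from the crate.
#[derive(Clone, Copy, Debug)]
pub struct PairSketch {
    index1: usize,
    index2: usize,
}

impl PairSketch {
    /// Assumption: always use the first and last needle offsets. The real
    /// `Pair::new` picks offsets with a byte-frequency heuristic instead.
    pub fn first_and_last(needle: &[u8]) -> Option<PairSketch> {
        if needle.len() < 2 {
            return None;
        }
        Some(PairSketch { index1: 0, index2: needle.len() - 1 })
    }
}

/// Reports the first offset where both pair bytes line up. This is only a
/// candidate: the bytes between the pair are not checked.
/// `pair` must have been built from `needle`.
pub fn find_prefilter(
    pair: PairSketch,
    haystack: &[u8],
    needle: &[u8],
) -> Option<usize> {
    let (b1, b2) = (needle[pair.index1], needle[pair.index2]);
    // Last offset at which the whole needle still fits in the haystack.
    let last = haystack.len().checked_sub(needle.len())?;
    (0..=last).find(|&i| {
        haystack[i + pair.index1] == b1 && haystack[i + pair.index2] == b2
    })
}

/// Reports the first true match by confirming each prefilter candidate.
pub fn find(
    pair: PairSketch,
    haystack: &[u8],
    needle: &[u8],
) -> Option<usize> {
    let mut at = 0;
    while at <= haystack.len() {
        let cand = at + find_prefilter(pair, &haystack[at..], needle)?;
        if &haystack[cand..cand + needle.len()] == needle {
            return Some(cand);
        }
        // False candidate: resume one byte past it.
        at = cand + 1;
    }
    None
}
```

With the needle `b"abc"` and the haystack `b"axc abc"`, the prefilter reports offset 0 (a false candidate, since only `a` and `c` line up), while `find` rejects it and returns `Some(4)`.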
137 changes: 137 additions & 0 deletions src/arch/loongarch64/memchr.rs
@@ -0,0 +1,137 @@
/*!
Wrapper routines for `memchr` and friends.

These routines choose the best implementation at compile time. (This is
different from `x86_64` because it is expected that `lsx` is almost always
available for `loongarch64` targets.)
*/

macro_rules! defraw {
($ty:ident, $find:ident, $start:ident, $end:ident, $($needles:ident),+) => {{
#[cfg(target_feature = "lsx")]
{
use crate::arch::loongarch64::lsx::memchr::$ty;

debug!("chose lsx for {}", stringify!($ty));
debug_assert!($ty::is_available());
// SAFETY: We know that lsx memchr is always available whenever
// code is compiled for `loongarch64` with the `lsx` target feature
// enabled.
$ty::new_unchecked($($needles),+).$find($start, $end)
}
#[cfg(not(target_feature = "lsx"))]
{
use crate::arch::all::memchr::$ty;

debug!(
"no lsx feature available, using fallback for {}",
stringify!($ty),
);
$ty::new($($needles),+).$find($start, $end)
}
}}
}

/// memchr, but using raw pointers to represent the haystack.
///
/// # Safety
///
/// Pointers must be valid. See `One::find_raw`.
#[inline(always)]
pub(crate) unsafe fn memchr_raw(
n1: u8,
start: *const u8,
end: *const u8,
) -> Option<*const u8> {
defraw!(One, find_raw, start, end, n1)
}

/// memrchr, but using raw pointers to represent the haystack.
///
/// # Safety
///
/// Pointers must be valid. See `One::rfind_raw`.
#[inline(always)]
pub(crate) unsafe fn memrchr_raw(
n1: u8,
start: *const u8,
end: *const u8,
) -> Option<*const u8> {
defraw!(One, rfind_raw, start, end, n1)
}

/// memchr2, but using raw pointers to represent the haystack.
///
/// # Safety
///
/// Pointers must be valid. See `Two::find_raw`.
#[inline(always)]
pub(crate) unsafe fn memchr2_raw(
n1: u8,
n2: u8,
start: *const u8,
end: *const u8,
) -> Option<*const u8> {
defraw!(Two, find_raw, start, end, n1, n2)
}

/// memrchr2, but using raw pointers to represent the haystack.
///
/// # Safety
///
/// Pointers must be valid. See `Two::rfind_raw`.
#[inline(always)]
pub(crate) unsafe fn memrchr2_raw(
n1: u8,
n2: u8,
start: *const u8,
end: *const u8,
) -> Option<*const u8> {
defraw!(Two, rfind_raw, start, end, n1, n2)
}

/// memchr3, but using raw pointers to represent the haystack.
///
/// # Safety
///
/// Pointers must be valid. See `Three::find_raw`.
#[inline(always)]
pub(crate) unsafe fn memchr3_raw(
n1: u8,
n2: u8,
n3: u8,
start: *const u8,
end: *const u8,
) -> Option<*const u8> {
defraw!(Three, find_raw, start, end, n1, n2, n3)
}

/// memrchr3, but using raw pointers to represent the haystack.
///
/// # Safety
///
/// Pointers must be valid. See `Three::rfind_raw`.
#[inline(always)]
pub(crate) unsafe fn memrchr3_raw(
n1: u8,
n2: u8,
n3: u8,
start: *const u8,
end: *const u8,
) -> Option<*const u8> {
defraw!(Three, rfind_raw, start, end, n1, n2, n3)
}

/// Count all matching bytes, but using raw pointers to represent the haystack.
///
/// # Safety
///
/// Pointers must be valid. See `One::count_raw`.
#[inline(always)]
pub(crate) unsafe fn count_raw(
n1: u8,
start: *const u8,
end: *const u8,
) -> usize {
defraw!(One, count_raw, start, end, n1)
}
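The compile-time selection done by `defraw!` and the raw `start`/`end` pointer convention of the `*_raw` wrappers can be sketched in isolation. This is a minimal illustration under stated assumptions, not the crate's code: `find_raw_fallback`, `memchr_raw_sketch`, `memchr_sketch`, and `chosen_backend` are hypothetical names, and the `lsx` branch reuses the scalar loop where a real build would call the vector routine.

```rust
/// Scalar search over the raw range `[start, end)`. Hypothetical helper.
///
/// # Safety
///
/// `start` and `end` must come from the same allocation, `start <= end`,
/// and every byte in `[start, end)` must be readable.
unsafe fn find_raw_fallback(
    n1: u8,
    start: *const u8,
    end: *const u8,
) -> Option<*const u8> {
    let mut cur = start;
    while cur < end {
        // SAFETY: `cur` lies in `[start, end)`, which the caller guarantees
        // is readable.
        if unsafe { *cur } == n1 {
            return Some(cur);
        }
        // SAFETY: `cur < end`, so `cur + 1` is at most one past the end.
        cur = unsafe { cur.add(1) };
    }
    None
}

/// Names the branch selected when this code was compiled.
pub fn chosen_backend() -> &'static str {
    #[cfg(target_feature = "lsx")]
    {
        "lsx"
    }
    #[cfg(not(target_feature = "lsx"))]
    {
        "fallback"
    }
}

/// Same shape as the `*_raw` wrappers: exactly one `cfg` block survives
/// compilation and becomes the function body.
///
/// # Safety
///
/// Same requirements as `find_raw_fallback`.
unsafe fn memchr_raw_sketch(
    n1: u8,
    start: *const u8,
    end: *const u8,
) -> Option<*const u8> {
    #[cfg(target_feature = "lsx")]
    {
        // Assumption: a real build would call the lsx routine here. The
        // sketch reuses the scalar loop so that it stays self-contained.
        unsafe { find_raw_fallback(n1, start, end) }
    }
    #[cfg(not(target_feature = "lsx"))]
    {
        unsafe { find_raw_fallback(n1, start, end) }
    }
}

/// Safe wrapper that turns a slice into a pointer range and the resulting
/// pointer back into an offset.
pub fn memchr_sketch(n1: u8, haystack: &[u8]) -> Option<usize> {
    let range = haystack.as_ptr_range();
    // SAFETY: both pointers come from the same live slice, so the range is
    // valid and readable.
    let found = unsafe { memchr_raw_sketch(n1, range.start, range.end) }?;
    Some(found as usize - range.start as usize)
}
```

Because the choice is made with `cfg` rather than a runtime check, a binary built without `-C target-feature=+lsx` takes the fallback branch even on hardware that supports LSX.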
7 changes: 7 additions & 0 deletions src/arch/loongarch64/mod.rs
@@ -0,0 +1,7 @@
/*!
Vector algorithms for the `loongarch64` target.
*/

pub mod lsx;

pub(crate) mod memchr;
2 changes: 2 additions & 0 deletions src/arch/mod.rs
@@ -10,6 +10,8 @@ pub(crate) mod generic;

#[cfg(target_arch = "aarch64")]
pub mod aarch64;
#[cfg(target_arch = "loongarch64")]
pub mod loongarch64;
#[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
pub mod wasm32;
#[cfg(target_arch = "x86_64")]