Pulley runs slower for float computation #10545

Open
lazytiger opened this issue Apr 8, 2025 · 2 comments
Labels
pulley Issues related to the Pulley interpreter

Comments

@lazytiger

use glam::{Mat3A, Vec2};

#[unsafe(no_mangle)]
extern "system" fn test() -> f32 {
    let mut a = Vec2::new(0.0, 0.0);
    for i in 0..1000000 {
        let p = Mat3A::from_angle(i as f32);
        a = p.transform_point2(Vec2::from_angle(i as f32));
    }
    a.x
}

The above code runs 30% slower with Pulley than with wasmi.

@alexcrichton added the pulley label Apr 8, 2025

github-actions bot commented Apr 8, 2025

Subscribe to Label Action

cc @fitzgen

This issue or pull request has been labeled: "pulley"

Thus the following users have been cc'd because of the following labels:

  • fitzgen: pulley

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.


@alexcrichton
Member

Thanks for this! This is expected right now in the sense that non-integer related operations basically haven't been optimized at all. There's a fair amount of low-hanging fruit here:

  • There are no compare-and-branch instructions for floats, only integers.
  • There are no immediate-related optimizations for floats, such as add-reg-and-immediate.
  • Pulley's opcode design right now is 1-byte "base" opcodes and 3-byte "extended" opcodes, and all float ops are 3-byte extended ops, meaning they take 2 turns of the interpreter loop to process.
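
To make the first two points concrete, here is a minimal sketch of why fused instructions help an interpreter. It is a toy register machine, not Pulley: the `Inst` enum, its variant names (`FAddImm`, `BrIfFLt`), and the `run` helper are all hypothetical and exist only for this illustration. The same count-up loop is written once with separate compare, branch, and add-register instructions, and once with a fused float compare-and-branch plus an add-immediate, and `run` reports how many interpreter dispatches each version needs.

```rust
// Toy register machine for illustration only. None of these names or
// instruction shapes come from Pulley; they are assumptions for this sketch.
#[derive(Clone, Copy, Debug)]
pub enum Inst {
    FConst { dst: usize, val: f32 },
    FAdd { dst: usize, a: usize, b: usize },
    // Unfused pair: write 1.0/0.0 into a register, then branch on it.
    FLt { dst: usize, a: usize, b: usize },
    BrIf { cond: usize, target: usize },
    // Fused forms: one dispatch instead of two.
    FAddImm { dst: usize, a: usize, imm: f32 },
    BrIfFLt { a: usize, b: usize, target: usize },
    Halt,
}

/// Runs `program` and returns (value of register 0, number of dispatches).
pub fn run(program: &[Inst]) -> (f32, usize) {
    let mut regs = [0.0f32; 4];
    let mut pc = 0;
    let mut dispatches = 0;
    loop {
        dispatches += 1; // one turn of the interpreter loop
        match program[pc] {
            Inst::FConst { dst, val } => { regs[dst] = val; pc += 1; }
            Inst::FAdd { dst, a, b } => { regs[dst] = regs[a] + regs[b]; pc += 1; }
            Inst::FLt { dst, a, b } => {
                regs[dst] = if regs[a] < regs[b] { 1.0 } else { 0.0 };
                pc += 1;
            }
            Inst::BrIf { cond, target } => {
                pc = if regs[cond] != 0.0 { target } else { pc + 1 };
            }
            Inst::FAddImm { dst, a, imm } => { regs[dst] = regs[a] + imm; pc += 1; }
            Inst::BrIfFLt { a, b, target } => {
                pc = if regs[a] < regs[b] { target } else { pc + 1 };
            }
            Inst::Halt => return (regs[0], dispatches),
        }
    }
}

/// do { r0 += 1.0 } while (r0 < n), using separate compare and branch.
pub fn count_up_unfused(n: f32) -> Vec<Inst> {
    vec![
        Inst::FConst { dst: 0, val: 0.0 },
        Inst::FConst { dst: 1, val: n },
        Inst::FConst { dst: 2, val: 1.0 },
        Inst::FAdd { dst: 0, a: 0, b: 2 },
        Inst::FLt { dst: 3, a: 0, b: 1 },
        Inst::BrIf { cond: 3, target: 3 },
        Inst::Halt,
    ]
}

/// The same loop using add-immediate and fused compare-and-branch.
pub fn count_up_fused(n: f32) -> Vec<Inst> {
    vec![
        Inst::FConst { dst: 0, val: 0.0 },
        Inst::FConst { dst: 1, val: n },
        Inst::FAddImm { dst: 0, a: 0, imm: 1.0 },
        Inst::BrIfFLt { a: 0, b: 1, target: 2 },
        Inst::Halt,
    ]
}
```

Calling `run(&count_up_unfused(4.0))` and `run(&count_up_fused(4.0))` yields the same register value, but the fused version spends two dispatches per iteration instead of three. How much that would matter for real Pulley bytecode is something only a measurement can tell.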

The first and second are mostly a matter of adding more instructions and adding Cranelift lowerings, in a similar manner to the integer lowerings. The third is probably going to require Pulley to switch to a 2-byte opcode namespace instead of a simple/extended split. That is a larger refactoring which should also be measured to see the impact on integer ops.
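
As a rough illustration of the dispatch-count difference between the two opcode layouts, the sketch below models a base/extended split next to a flat 2-byte namespace. Everything in it is an assumption made for the example: the opcode values, the escape byte, and the function names are invented, operands are left out entirely, and it does not reflect Pulley's actual decoder.

```rust
// Toy model only: opcode values and layouts below are invented for
// illustration and are NOT Pulley's real encoding. Operands are omitted.

const OP_IADD: u8 = 0x01; // hypothetical 1-byte "base" integer op
const OP_EXTENDED: u8 = 0xFF; // hypothetical escape byte for extended ops
const EXT_FADD: u16 = 0x0001; // hypothetical 2-byte extended float opcode

/// Counts dispatch steps under a base/extended split: a base op costs one
/// dispatch, an extended op costs two (escape byte, then extended opcode).
pub fn dispatches_split(code: &[u8]) -> usize {
    let mut pc = 0;
    let mut dispatches = 0;
    while pc < code.len() {
        dispatches += 1; // first turn: match on the 1-byte opcode
        match code[pc] {
            OP_IADD => pc += 1,
            OP_EXTENDED => {
                let ext = u16::from_le_bytes([code[pc + 1], code[pc + 2]]);
                dispatches += 1; // second turn: match on the extended opcode
                match ext {
                    EXT_FADD => pc += 3,
                    other => panic!("unknown extended opcode {:#x}", other),
                }
            }
            other => panic!("unknown base opcode {:#x}", other),
        }
    }
    dispatches
}

const FLAT_IADD: u16 = 0x0001; // hypothetical flat 2-byte opcodes
const FLAT_FADD: u16 = 0x0002;

/// Counts dispatch steps under a flat 2-byte namespace: every op, integer
/// or float, costs exactly one dispatch but always occupies two bytes.
pub fn dispatches_flat(code: &[u8]) -> usize {
    let mut dispatches = 0;
    for chunk in code.chunks_exact(2) {
        dispatches += 1;
        match u16::from_le_bytes([chunk[0], chunk[1]]) {
            FLAT_IADD | FLAT_FADD => {}
            other => panic!("unknown flat opcode {:#x}", other),
        }
    }
    dispatches
}
```

In this model one integer add followed by two float adds takes five dispatches in the split layout and three in the flat one. The trade-off is also visible: the integer op grows from one byte to two, which is why the effect on integer-heavy code would need to be measured.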

I'll note that I won't personally have time to work on this in the near future, but I wanted to at least write these down.

2 participants