[pytorch upstream] pointwise is slow compared to cuda #4001


Open

jianyizh opened this issue Apr 24, 2025 · 0 comments


@jianyizh
Contributor

Describe the issue

In inference of the torchbench model pyhpc_equation_of_state, the whole model is fused into one large pointwise kernel, and it is slow compared to A100. The model is faster than A100 in eager mode. FP64 may be one reason, but I manually changed it to FP32 and performance is still low.
pointwise_test.zip
| dtype | 1550 | A100 |
|-------|------|------|
| fp64 | 0.77 ms | 0.43 ms |
| fp32 | 0.30 ms | 0.08 ms |
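
For context, here is a minimal sketch of how a fused pointwise kernel can be timed with torch.compile (inductor). This is not the torchbench harness behind the numbers above; the device selection, the `pointwise_chain` function, the tensor size, and the iteration count are illustrative assumptions.

```python
# Minimal timing sketch, not the torchbench harness. The "xpu" vs "cuda"
# selection, pointwise_chain, tensor size, and iteration count are all
# illustrative assumptions.
import time
import torch

device = "xpu" if hasattr(torch, "xpu") and torch.xpu.is_available() else "cuda"
dtype = torch.float32  # switch to torch.float64 to compare fp64 vs fp32


def pointwise_chain(x, y, z):
    # A chain of elementwise ops that inductor fuses into a single pointwise kernel.
    return torch.sqrt(x * x + y * y) * torch.exp(-z) + torch.tanh(x + y + z)


def sync():
    # Device-side synchronization; a stricter benchmark would use device events.
    (torch.xpu if device == "xpu" else torch.cuda).synchronize()


compiled = torch.compile(pointwise_chain)  # default inductor backend

x, y, z = (torch.randn(4_000_000, device=device, dtype=dtype) for _ in range(3))
compiled(x, y, z)  # warm-up: triggers compilation

sync()
t0 = time.perf_counter()
for _ in range(100):
    compiled(x, y, z)
sync()
print(f"{(time.perf_counter() - t0) / 100 * 1e3:.3f} ms per call")
```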

Environment details

triton: 3.3.0+git0bcc8265
pytorch: 3ed5f1fb77669c8ac5d02e7acc0218e31b71c0b6
