Describe the issue
In inference for the torchbench model pyhpc_equation_of_state, the whole model is fused into one large pointwise kernel, and that kernel is slow compared to A100. In eager mode this model is faster than on A100. FP64 may be a reason, but I manually changed it to fp32 and performance is still low.
pointwise_test.zip
| dtype | 1550 | A100 |
|---|---|---|
| fp64 | 0.77 ms | 0.43 ms |
| fp32 | 0.30 ms | 0.08 ms |
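
For reference, the gap implied by these numbers can be worked out with a small script. This is only arithmetic on the timings in the table above; the dictionary layout and the helper names `slowdown_vs_a100` and `fp32_speedup` are made up for illustration and are not part of the attached reproducer.

```python
# Kernel times in milliseconds, copied from the table above.
times_ms = {
    "fp64": {"1550": 0.77, "A100": 0.43},
    "fp32": {"1550": 0.30, "A100": 0.08},
}


def slowdown_vs_a100(dtype):
    """Ratio of 1550 time to A100 time for one dtype (hypothetical helper)."""
    row = times_ms[dtype]
    return row["1550"] / row["A100"]


def fp32_speedup(device):
    """How much faster fp32 is than fp64 on one device (hypothetical helper)."""
    return times_ms["fp64"][device] / times_ms["fp32"][device]


for dtype in times_ms:
    print(f"{dtype}: 1550 takes {slowdown_vs_a100(dtype):.2f}x the A100 time")

for device in ("1550", "A100"):
    print(f"{device}: fp32 is {fp32_speedup(device):.2f}x faster than fp64")
```

The 1550 takes roughly 1.8x the A100 time in fp64 and 3.75x in fp32, so the gap widens after switching to fp32, which suggests FP64 throughput alone does not explain the difference.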
Environment details
triton: 3.3.0+git0bcc8265
pytorch: 3ed5f1fb77669c8ac5d02e7acc0218e31b71c0b6