Tesla P40 performance is still very low. #40
This is just #185: the P40 is incredibly slow in FP16.
Closing this as stale. Better P40 performance is somewhere on the list of priorities, but there's too much going on right now.
@turboderp, could you summarise the known (and unknown) parts of this issue, so that others can consider taking it on?
It's really quite simple: exllama's kernels do all calculations on half floats, and Pascal GPUs other than GP100 (P100) are very slow in FP16 because only a tiny fraction of the device's shaders can do FP16 (1/64th of the FP32 rate). To work around this, you would have to upcast to 32-bit, do the calculation, and then downcast back to 16-bit for storage, in every kernel, when compiling for Pascal. Better-suited GPUs would be any NVIDIA GPU from Turing (2018) onward, or any AMD GPU from GCN3/gfx803 (2015) onward, as these devices natively support FP16 at full rate or better (dual issue).
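To illustrate the workaround described above, here is a minimal, hypothetical CUDA sketch of the pattern: operands stored in half precision are upcast to `float`, accumulated at full FP32 rate, and downcast only when writing back. The kernel and parameter names are illustrative and are not exllama's actual code; the block reduction is omitted for brevity.

```cuda
#include <cuda_fp16.h>

// Hypothetical dot-product kernel showing the Pascal-friendly pattern:
// keep storage in half, but do all arithmetic in fp32.
__global__ void dot_fp16_via_fp32(const half* __restrict__ a,
                                  const half* __restrict__ b,
                                  half* __restrict__ out,
                                  int n)
{
    float acc = 0.0f;  // fp32 accumulator: full-rate on all Pascal parts
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        // __half2float upcasts each operand; the FMA runs in fp32
        acc += __half2float(a[i]) * __half2float(b[i]);

    // (block-wide reduction of acc omitted for brevity)
    if (threadIdx.x == 0)
        out[blockIdx.x] = __float2half(acc);  // downcast only for storage
}
```

The storage stays FP16 (so memory bandwidth and capacity are unchanged), but no `__half` arithmetic instruction is ever issued, which is what the 1/64-rate Pascal parts choke on.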
Tesla P40 performance is still very low, drawing only 80 W under load.
Is there any item on the exllama to-do list for "Look into improving P40 performance"?
env:
- kernel: 6.1.53-x64v3-xanmod1
- system: Linux Mint 21.2 Victoria
- cuda: cuda_11.8.r11.8
- nvidia-drivers: NVIDIA-SMI 535.104.05, Driver Version: 535.104.05, CUDA Version: 12.2

nvidia-driver-520 caused my 3090 to slow down to 36 t/s; it now runs at 39 t/s.
Test: a 4096-token context takes too long on the P40, so I used 1024 instead.