Upcasting all the calculations. #185

Open
Ph0rk0z opened this issue Nov 27, 2023 · 3 comments

Comments

@Ph0rk0z

Ph0rk0z commented Nov 27, 2023

Having used automatic1111 and sdnext with P40s extensively, I have found that only the calculations have to be upcast to get most of the performance back.

In how many places would such a change have to be made in order to do this? And do any of the ops use tensor cores? Is this a tall order or an easy enough change?
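What I mean is roughly this pattern (a minimal sketch, not exllamav2's actual kernels; names are illustrative): keep weights and activations in half precision in memory, but do the arithmetic itself in float32, which is what the P40 is fast at.

```cuda
// Sketch of "upcast only the calculation": half storage, float32 math.
// Kernel and parameter names are illustrative, not from exllamav2.
#include <cuda_fp16.h>

__global__ void axpy_upcast(const half* __restrict__ x,
                            const half* __restrict__ y,
                            half* __restrict__ out,
                            float alpha, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
    {
        // Upcast per element: half -> float for the math, float -> half on store.
        float xf = __half2float(x[i]);
        float yf = __half2float(y[i]);
        out[i] = __float2half(alpha * xf + yf);
    }
}
```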

@turboderp
Member

Tall order for sure. Once the hardware I've ordered starts coming in, perhaps I could add a P40 and do some profiling and maybe provide some alternative kernels with upcasting. But it would be an extensive change and a lot more code to maintain, so I don't know if it's really feasible with my time budget.
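To make that concrete, one possible shape for such "alternative kernels" is to template the compute type and dispatch per architecture, which is also why it means a lot more code to maintain. A rough sketch under those assumptions (not exllamav2's actual code):

```cuda
// Illustrative sketch only: one kernel with both a native-half path and an
// upcast path, selected by the compute type.
#include <cuda_fp16.h>

template <typename compute_t>
__global__ void dot_kernel(const half* __restrict__ a,
                           const half* __restrict__ b,
                           float* __restrict__ result, int n)
{
    compute_t acc = compute_t(0.0f);
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
    {
        // compute_t = float upcasts every multiply-add (fast on the P40);
        // compute_t = half keeps native half math.
        acc += compute_t(a[i]) * compute_t(b[i]);
    }
    // Each thread adds its partial sum; *result must be zeroed beforehand.
    atomicAdd(result, float(acc));
}

// Host-side dispatch, e.g. picking the upcast variant on Pascal (sm_6x):
//   if (prop.major == 6) dot_kernel<float><<<blocks, 256>>>(a, b, result, n);
//   else                 dot_kernel<half> <<<blocks, 256>>>(a, b, result, n);
```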

@Ph0rk0z
Author

Ph0rk0z commented Dec 26, 2023

I was hoping it was as simple as with AutoGPTQ, where it could be done in only a few places. At least for that part, and not the use of tensor cores. The P100 works surprisingly well and also lacks them. It kills flash attention, but it is definitely bearable.

@krzysiekpodk

Out of curiosity, would it be possible to switch more easily from float16 to bfloat16?
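What I have in mind is roughly this kind of type swap at the kernel level (a minimal sketch, not exllamav2 code), though as far as I understand bfloat16 math is only hardware-accelerated on Ampere (sm_80) and newer, so it may not help Pascal cards like the P40/P100 by itself.

```cuda
// Sketch of the float16 -> bfloat16 swap at the kernel level (illustrative).
// On pre-Ampere GPUs the bf16 intrinsics are emulated through float anyway.
#include <cuda_bf16.h>

__global__ void scale_bf16(const __nv_bfloat16* __restrict__ x,
                           __nv_bfloat16* __restrict__ out,
                           float scale, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
    {
        // Same upcast-compute-downcast pattern as with half, just bf16 storage.
        out[i] = __float2bfloat16(scale * __bfloat162float(x[i]));
    }
}
```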
