
[BUG] Speculative decoding regresses performance on 7900 XTX under ROCm #685

Open · 3 tasks done
Mushoz opened this issue Nov 25, 2024 · 1 comment
Labels: bug (Something isn't working)
Mushoz commented Nov 25, 2024

OS

Linux

GPU Library

AMD ROCm

Python version

3.12

PyTorch version

Pulled from https://download.pytorch.org/whl/rocm6.2 yesterday

Model

Qwen2.5-Coder-32B + Qwen2.5-Coder-1.5B as draft model

Describe the bug

When loading the Qwen2.5-Coder-32B model through Exui, I am getting around 20 tokens/s with a 4.25 bpw quant (unrelated, but this also lags the 25+ tokens/s I am seeing with llama.cpp). However, when I additionally load the 1.5B version of the model as a draft model, performance drops below 16 tokens/s instead of speeding up. I do get a speedup with llama.cpp's speculative decoding (a little over 2x).
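
For intuition on why a draft model can slow things down rather than speed them up, here is a rough throughput model (my own back-of-envelope sketch, not exllamav2's or llama.cpp's actual scheduler; all numbers are illustrative):

```python
# Rough throughput model for speculative decoding (illustrative only).
# target_tps: plain decode speed of the big model (tokens/s)
# draft_cost: cost of one draft forward relative to one target forward
# k:          draft tokens proposed per verification step
# accept:     per-token probability the target accepts a draft token
def spec_tps(target_tps, draft_cost, k, accept):
    # Time per step: k draft forwards plus one target forward to verify.
    step_time = (1.0 + k * draft_cost) / target_tps
    # Tokens emitted per step: a geometric run of accepted drafts plus
    # the one token the target pass always yields.
    expected_tokens = sum(accept ** i for i in range(k + 1))
    return expected_tokens / step_time

print(spec_tps(20.0, 0.10, 4, 0.8))  # cheap drafts, high acceptance: ~48 t/s
print(spec_tps(20.0, 0.30, 4, 0.4))  # costly drafts, low acceptance: ~15 t/s,
                                     # a net regression like the one reported
```

So if the draft forward pass is disproportionately expensive on this backend, speculative decoding can regress even with a reasonable acceptance rate.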

Reproduction steps

  1. Load the 32B model through Exui
  2. Ask for a story in a chat
  3. See around 20 tokens/second
  4. Unload the 32B model
  5. Load the 32B model + 1.5B draft model
  6. Ask for another story in a new chat (an Exui-free script equivalent is sketched below)
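
For reference, the same path should be exercisable outside Exui with a script along these lines, adapted from exllamav2's speculative-decoding example (the paths are placeholders, and the exact draft_model / draft_cache / num_draft_tokens parameters are my reading of that example, so treat this as a sketch):

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

# Draft model (placeholder path)
draft_config = ExLlamaV2Config("/models/Qwen2.5-Coder-1.5B-exl2")
draft_model = ExLlamaV2(draft_config)
draft_cache = ExLlamaV2Cache(draft_model, lazy=True)
draft_model.load_autosplit(draft_cache)

# Target model (placeholder path)
config = ExLlamaV2Config("/models/Qwen2.5-Coder-32B-exl2-4.25bpw")
model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)
tokenizer = ExLlamaV2Tokenizer(config)

# Generator with speculative decoding enabled
generator = ExLlamaV2DynamicGenerator(
    model=model,
    cache=cache,
    tokenizer=tokenizer,
    draft_model=draft_model,
    draft_cache=draft_cache,
    num_draft_tokens=4,
)

output = generator.generate(prompt="Write me a story.", max_new_tokens=512)
print(output)
```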

Expected behavior

A speed boost is obtained.

Actual outcome: The performance regresses.

Logs

No response

Additional context

No response

Acknowledgements

  • I have looked for similar issues before submitting this one.
  • I understand that the developers have lives and my issue will be answered when possible.
  • I understand the developers of this program are human, and I will ask my questions politely.
Mushoz added the bug (Something isn't working) label on Nov 25, 2024
Originalimoc commented

Set a lower max seq len; I recommend 16k.
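
In script terms (continuing the sketch above; max_seq_len is the config field, and capping both models is my assumption):

```python
# Cap the context of both models before loading (16k as suggested)
config.max_seq_len = 16384
draft_config.max_seq_len = 16384
```

A smaller context shrinks the KV caches both models must keep resident, which can matter on a 24 GB card running a 32B model plus a draft.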
