
Dynamic support for gemmlowp #1162


Open
mason20240920 opened this issue Mar 10, 2025 · 6 comments

@mason20240920

Excited to see the update for dynamic GEMM. Is there any plan to update gemmlowp as well?

@morgolock

Hi @mason20240920

Could you please share more details of your use case? Are you trying to run a model that requires dynamic shape support in gemmlowp? Which one?

@mason20240920

> Hi @mason20240920
>
> Could you please share more details of your use case? Are you trying to run a model that requires dynamic shape support in gemmlowp? Which one?

We need to build an LLM application on top of ACL, and in GPT-style models the sequence length is dynamic. That's why we need dynamic operators.
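
For context, a minimal sketch of why static shapes are costly here, assuming ACL's current static-shape API (the dimensions, scale and offset values below are made up for illustration): with today's API, every new sequence length means re-configuring and re-allocating the whole quantized GEMM function.

```cpp
#include "arm_compute/runtime/NEON/functions/NEGEMMLowpMatrixMultiplyCore.h"
#include "arm_compute/runtime/Tensor.h"

using namespace arm_compute;

// With static shapes, each distinct seq_len means rebuilding the function.
void run_for_seq_len(size_t seq_len, size_t hidden, size_t out_features)
{
    Tensor act, wei, dst;
    // ACL shapes are (width, height): activations are seq_len x hidden.
    act.allocator()->init(TensorInfo(TensorShape(hidden, seq_len), 1,
                                     DataType::QASYMM8_SIGNED,
                                     QuantizationInfo(0.05f, 0)));
    wei.allocator()->init(TensorInfo(TensorShape(out_features, hidden), 1,
                                     DataType::QASYMM8_SIGNED,
                                     QuantizationInfo(0.02f, 0)));
    dst.allocator()->init(TensorInfo(TensorShape(out_features, seq_len), 1,
                                     DataType::S32));

    NEGEMMLowpMatrixMultiplyCore gemmlowp;
    gemmlowp.configure(&act, &wei, nullptr, &dst); // repeated per seq_len today

    act.allocator()->allocate();
    wei.allocator()->allocate();
    dst.allocator()->allocate();
    // ... fill act and wei with quantized data, then:
    gemmlowp.run();
}
```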

@gunes-arm

Hi @mason20240920 , a few use cases around gemmlowp have been mentioned to us, so while it is not strictly planned for delivery, I can say it is in the discovery phase. I have two questions:

  • Which data type combinations are you interested in? e.g. Int8, UInt8, etc.?
  • Are there any other operators that need to work with dynamic shapes?

@mason20240920

> Hi @mason20240920 , a few use cases around gemmlowp have been mentioned to us, so while it is not strictly planned for delivery, I can say it is in the discovery phase. I have two questions:
>
> • Which data type combinations are you interested in? e.g. Int8, UInt8, etc.?
> • Are there any other operators that need to work with dynamic shapes?

Thank you for your prompt response.

  • We are currently using per-channel int8 quantization for the weights and int8 for the activations, similar to the A8W8 approach used in the Qwen and Llama models (see the reference sketch after this list).
  • Yes, we also need Softmax, Split, and other operators to support dynamic shapes. Those kernels are relatively easy to modify ourselves by adjusting the window sizes, but GEMM and MatMul require the assembly kernels to cooperate, which is much harder to change.
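
To make the A8W8 semantics concrete, here is a plain-C++ reference sketch of the computation described above (this is not ACL code; all names and the row-major layout are our own choices for illustration): int8 activations with a per-tensor scale, int8 weights with one scale per output channel, an optional int32 bias, and requantization back to int8.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Reference semantics of a per-channel A8W8 matmul: (M x K) * (K x N).
std::vector<int8_t> a8w8_matmul(const std::vector<int8_t>& act,      // M*K, per-tensor scale
                                const std::vector<int8_t>& weights,  // K*N, per-channel scales
                                const std::vector<int32_t>& bias,    // N entries, or empty
                                const std::vector<float>& w_scales,  // N, one per output channel
                                float act_scale, float out_scale,
                                size_t M, size_t K, size_t N)
{
    std::vector<int8_t> out(M * N);
    for (size_t m = 0; m < M; ++m)
        for (size_t n = 0; n < N; ++n)
        {
            // Accumulate in int32, starting from the optional int32 bias.
            int32_t acc = bias.empty() ? 0 : bias[n];
            for (size_t k = 0; k < K; ++k)
                acc += int32_t(act[m * K + k]) * int32_t(weights[k * N + n]);
            // Requantize with one multiplier per output channel.
            float requant = (act_scale * w_scales[n] / out_scale) * float(acc);
            out[m * N + n] = int8_t(std::clamp(std::lround(requant), -128L, 127L));
        }
    return out;
}
```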

@gunes-arm

Thanks for the details @mason20240920,

  • Which hardware (or hardware features) are you targeting?
  • Is the output Int8 as well?
  • Is there a bias, and is it Int32?

@mason20240920

> Thanks for the details @mason20240920,
>
> • Which hardware (or hardware features) are you targeting?
> • Is the output Int8 as well?
> • Is there a bias, and is it Int32?

  1. Due to memory bandwidth limitations, CPUs currently remain the optimal choice for on-device inference of large models.
  2. Yes, the output is Int8 as well.
  3. The bias can be Int16, but Int32 is acceptable.
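
For what it's worth, a simplified sketch of the integer-only output stage these answers imply (ACL exposes the real thing through GEMMLowpOutputStageInfo; the version below omits the saturating doubling high-multiply the production kernels use): the float factor act_scale * w_scale[n] / out_scale is pre-converted to a Q31 fixed-point multiplier plus a right shift, and the int32 bias is added to the accumulator before requantizing to int8.

```cpp
#include <cstdint>

// Simplified per-channel requantization: acc is the raw int32 GEMM
// accumulator, multiplier is the Q31 encoding of
// act_scale * w_scale[n] / out_scale, shift is a per-channel right shift.
int8_t requantize(int32_t acc, int32_t bias, int32_t multiplier, int shift)
{
    int64_t v = int64_t(acc + bias) * multiplier;          // widen, Q31 multiply
    int32_t hi = int32_t((v + (int64_t(1) << 30)) >> 31);  // rounded high part
    if (shift > 0)
        hi = (hi + (1 << (shift - 1))) >> shift;           // round-half-up shift
    if (hi > 127)  return 127;                             // saturate to int8
    if (hi < -128) return -128;
    return int8_t(hi);
}
```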
