
Dynamic support for gemmlowp #1162


Open
mason20240920 opened this issue Mar 10, 2025 · 6 comments

@mason20240920

Excited to see the update for dynamic GEMM. Is there any plan to update gemmlowp as well?

@morgolock

Hi @mason20240920

Could you please share more details of your use case? Are you trying to run a model that requires dynamic shape support in gemmlowp? Which one?

@mason20240920

> Hi @mason20240920
>
> Could you please share more details of your use case? Are you trying to run a model that requires dynamic shape support in gemmlowp? Which one?

We need to build an LLM application on top of ACL, and in GPT-style models the sequence length is dynamic. That's why we need dynamic operators.
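
For context, a minimal sketch of why static shapes are costly here, assuming ACL's current static-shape API (the dimensions, scale and offset values below are made up for illustration): with today's API, every new sequence length means re-configuring and re-allocating the whole quantized GEMM function.

```cpp
#include "arm_compute/runtime/NEON/functions/NEGEMMLowpMatrixMultiplyCore.h"
#include "arm_compute/runtime/Tensor.h"

using namespace arm_compute;

// With static shapes, each distinct seq_len means rebuilding the function.
void run_for_seq_len(size_t seq_len, size_t hidden, size_t out_features)
{
    Tensor act, wei, dst;
    // ACL shapes are (width, height): activations are seq_len x hidden.
    act.allocator()->init(TensorInfo(TensorShape(hidden, seq_len), 1,
                                     DataType::QASYMM8_SIGNED,
                                     QuantizationInfo(0.05f, 0)));
    wei.allocator()->init(TensorInfo(TensorShape(out_features, hidden), 1,
                                     DataType::QASYMM8_SIGNED,
                                     QuantizationInfo(0.02f, 0)));
    dst.allocator()->init(TensorInfo(TensorShape(out_features, seq_len), 1,
                                     DataType::S32));

    NEGEMMLowpMatrixMultiplyCore gemmlowp;
    gemmlowp.configure(&act, &wei, nullptr, &dst); // repeated per seq_len today

    act.allocator()->allocate();
    wei.allocator()->allocate();
    dst.allocator()->allocate();
    // ... fill act and wei with quantized data, then:
    gemmlowp.run();
}
```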

@gunes-arm

Hi @mason20240920 , a few use cases around gemmlowp have been mentioned to us, so while it is not strictly planned for delivery, I can say it is in the discovery phase. I have two questions:

  • Which data type combinations are you interested in? e.g. Int8, UInt8, etc.?
  • Are there any other operators that need to work with dynamic shapes?

@mason20240920

> Hi @mason20240920 , a few use cases around gemmlowp have been mentioned to us, so while it is not strictly planned for delivery, I can say it is in the discovery phase. I have two questions:
>
> • Which data type combinations are you interested in? e.g. Int8, UInt8, etc.?
> • Are there any other operators that need to work with dynamic shapes?

Thank you for your prompt response.

  • We are currently using per-channel int8 quantization for the weights and int8 for the activations, similar to the A8W8 approach used in the Qwen and Llama models (see the reference sketch after this list).
  • Yes, we also need Softmax, Split, and other operators to support dynamic shapes. Those kernels are relatively easy to modify ourselves by adjusting the window sizes, but GEMM and MatMul require the assembly kernels to cooperate, which is much harder to change.
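
To make the A8W8 semantics concrete, here is a plain-C++ reference sketch of the computation described above (this is not ACL code; all names and the row-major layout are our own choices for illustration): int8 activations with a per-tensor scale, int8 weights with one scale per output channel, an optional int32 bias, and requantization back to int8.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Reference semantics of a per-channel A8W8 matmul: (M x K) * (K x N).
std::vector<int8_t> a8w8_matmul(const std::vector<int8_t>& act,      // M*K, per-tensor scale
                                const std::vector<int8_t>& weights,  // K*N, per-channel scales
                                const std::vector<int32_t>& bias,    // N entries, or empty
                                const std::vector<float>& w_scales,  // N, one per output channel
                                float act_scale, float out_scale,
                                size_t M, size_t K, size_t N)
{
    std::vector<int8_t> out(M * N);
    for (size_t m = 0; m < M; ++m)
        for (size_t n = 0; n < N; ++n)
        {
            // Accumulate in int32, starting from the optional int32 bias.
            int32_t acc = bias.empty() ? 0 : bias[n];
            for (size_t k = 0; k < K; ++k)
                acc += int32_t(act[m * K + k]) * int32_t(weights[k * N + n]);
            // Requantize with one multiplier per output channel.
            float requant = (act_scale * w_scales[n] / out_scale) * float(acc);
            out[m * N + n] = int8_t(std::clamp(std::lround(requant), -128L, 127L));
        }
    return out;
}
```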

@gunes-arm

Thanks for the details @mason20240920,

  • Which hardware (or hardware features) are you targeting?
  • Is the output Int8 as well?
  • Is there a bias, and is it Int32?

@mason20240920

> Thanks for the details @mason20240920,
>
> • Which hardware (or hardware features) are you targeting?
> • Is the output Int8 as well?
> • Is there a bias, and is it Int32?

  1. Due to memory bandwidth limitations, CPUs currently remain the optimal choice for on-device inference of large models.
  2. Yes, the output is Int8 as well.
  3. The bias can be Int16, but Int32 is acceptable.
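
For what it's worth, a simplified sketch of the integer-only output stage these answers imply (ACL exposes the real thing through GEMMLowpOutputStageInfo; the version below omits the saturating doubling high-multiply the production kernels use): the float factor act_scale * w_scale[n] / out_scale is pre-converted to a Q31 fixed-point multiplier plus a right shift, and the int32 bias is added to the accumulator before requantizing to int8.

```cpp
#include <cstdint>

// Simplified per-channel requantization: acc is the raw int32 GEMM
// accumulator, multiplier is the Q31 encoding of
// act_scale * w_scale[n] / out_scale, shift is a per-channel right shift.
int8_t requantize(int32_t acc, int32_t bias, int32_t multiplier, int shift)
{
    int64_t v = int64_t(acc + bias) * multiplier;          // widen, Q31 multiply
    int32_t hi = int32_t((v + (int64_t(1) << 30)) >> 31);  // rounded high part
    if (shift > 0)
        hi = (hi + (1 << (shift - 1))) >> shift;           // round-half-up shift
    if (hi > 127)  return 127;                             // saturate to int8
    if (hi < -128) return -128;
    return int8_t(hi);
}
```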
