There are two problems with the current Qwen implementation.
First, `tie_word_embeddings` is an optional feature in the Qwen model, but the current implementation enables it unconditionally as the default.
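To illustrate what the flag should control (this is a minimal NumPy sketch of the weight-tying idea, not the actual KerasHub code; `TinyLM` and its attributes are hypothetical names): when tied, the output projection reuses the input embedding matrix transposed, and when untied, a separate `lm_head` weight is created.

```python
import numpy as np

class TinyLM:
    """Hypothetical sketch of what `tie_word_embeddings` controls."""

    def __init__(self, vocab_size, hidden_dim, tie_word_embeddings=True, seed=0):
        rng = np.random.default_rng(seed)
        # Input embedding table: (vocab_size, hidden_dim).
        self.embedding = rng.normal(size=(vocab_size, hidden_dim))
        self.tie_word_embeddings = tie_word_embeddings
        if not tie_word_embeddings:
            # Untied: an independent output projection is learned.
            self.lm_head = rng.normal(size=(vocab_size, hidden_dim))

    def logits(self, hidden):
        # hidden: (batch, hidden_dim) -> logits: (batch, vocab_size).
        # Tied models reuse the embedding matrix as the output weight.
        w = self.embedding if self.tie_word_embeddings else self.lm_head
        return hidden @ w.T

tied = TinyLM(vocab_size=8, hidden_dim=4, tie_word_embeddings=True)
untied = TinyLM(vocab_size=8, hidden_dim=4, tie_word_embeddings=False)
h = np.ones((1, 4))
print(tied.logits(h).shape)  # (1, 8)
```

The point of the issue is that this boolean should be read from the model config rather than assumed true for every checkpoint.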
Second, regarding the Qwen3 implementation: Qwen3 differs from Qwen2 in only one respect, the QK norm. When we add Qwen3, could we simply add a config option to the Qwen2 model?
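The QK-norm difference can be sketched as follows (a minimal NumPy illustration, assuming Qwen3's QK norm is an RMSNorm applied per head to the query and key projections before attention; the function and variable names here are made up for the example, not KerasHub API):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm over the last axis: scale so the root-mean-square is ~1.
    var = np.mean(x * x, axis=-1, keepdims=True)
    return x / np.sqrt(var + eps) * weight

rng = np.random.default_rng(0)
head_dim = 8
# q: (num_heads, seq_len, head_dim); k would be treated identically.
q = rng.normal(size=(2, 4, head_dim))
gamma = np.ones(head_dim)  # learnable RMSNorm scale, initialized to ones

q_qwen2 = q                   # Qwen2-style: queries used as-is
q_qwen3 = rms_norm(q, gamma)  # Qwen3-style: QK norm applied first
```

Since the rest of the block is identical, a single boolean such as a hypothetical `use_qk_norm` flag on the Qwen2 backbone could switch between the two behaviors instead of duplicating the whole model.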
I can help fix both problems. Do you think they should be handled by the keras-team, or should I submit a new PR?