Hi @SparkJiao, I'm working on fine-tuning DeepSeek Coder models (1B and 6.7B) with the model pipeline; as far as I know, they are based on the Llama architecture, and this repo has been a great help. As a beginner, though, I don't quite understand the `TiedLayerSpec` provided by the DeepSpeed library, and I saw you provide two `get_model()` functions.
I recommend using `get_layers_from_config`. There was a bug when using `get_model` that I failed to fix.
`TiedLayerSpec` is used for initializing tied weights, since tied layers share gradient updates. In the Transformer architecture, the weights of the input embedding and `lm_head` are sometimes tied. I'm not sure whether DeepSeek's models use this setting; you may check their config or paper.
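To illustrate what "tied" means here, the sketch below mimics the idea in plain Python (the `Embedding` and `LMHead` classes are made up for illustration; this is not DeepSpeed's actual implementation). Both layers hold a reference to the same weight object, so an update applied through one is immediately visible through the other:

```python
class Embedding:
    """Toy input-embedding layer holding a weight matrix (nested lists)."""
    def __init__(self, weight):
        self.weight = weight  # when tied, this is a shared object


class LMHead:
    """Toy output projection that reuses the embedding's weight when tied."""
    def __init__(self, weight):
        self.weight = weight


# Tie the two layers: both reference the SAME weight object.
shared = [[0.1, 0.2], [0.3, 0.4]]
embed = Embedding(shared)
head = LMHead(shared)

# Simulate one in-place gradient step on the embedding's weight
# (learning rate 0.01, gradient assumed to be 1.0 everywhere).
for row in embed.weight:
    for j in range(len(row)):
        row[j] -= 0.01

# The update made through the embedding is visible through lm_head,
# because the weight was never copied -- it is one tensor, not two.
assert head.weight is embed.weight
assert abs(head.weight[0][0] - 0.09) < 1e-9
```

In practice you can usually tell whether a Hugging Face model ties these weights by looking at `tie_word_embeddings` in its `config.json`; if it is true, the pipeline needs `TiedLayerSpec` so that the embedding and `lm_head` stages keep sharing one weight across pipeline stages.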
I just want to know which one I should use. Looking forward to your reply!