I found that if I use the --deepspeed ds_config.json option, then print(trainer.model.state_dict()['model.layers.30.mlp.gate_proj.weight']) prints tensor([], device='cuda:0', dtype=torch.float16).
Also, the README.md mentions that FSDP full_shard mode is used, but FSDP and DeepSpeed should not be used at the same time.
I followed the steps in the README, but I get an empty state dict. Here is the code and the output:
code:
output:
tensor([[ 1.5984e-03, -1.6602e-02, -1.6460e-03, ..., -1.6632e-02,
-1.9989e-02, 1.1383e-02],
[ 9.5062e-03, 3.3356e-02, 5.6343e-03, ..., -3.6743e-02,
-3.2074e-02, 2.6810e-02],
[ 1.1917e-02, -2.1515e-02, -2.6352e-02, ..., 2.7328e-02,
-4.0550e-03, 1.5320e-02],
...,
[-2.8503e-02, 1.5316e-03, -1.8753e-02, ..., 2.9846e-02,
-1.9440e-02, 2.6703e-02],
[ 5.6505e-05, -4.5898e-02, 2.0660e-02, ..., -6.5689e-03,
-3.2043e-02, 1.8005e-02],
[-7.1106e-03, -7.1487e-03, -4.5624e-03, ..., 1.3138e-02,
-4.3060e-02, -1.5869e-02]])
training
tensor([], device='cuda:0', dtype=torch.float16)
trained
tensor([], device='cuda:0', dtype=torch.float16)
saved
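
To check whether only this one weight is empty or the whole state dict is affected, a small helper like the one below could be used. This is a minimal sketch, not part of the repository: the name summarize_state_dict is hypothetical, and it only assumes that each value exposes numel(), as torch.Tensor does.

```python
def summarize_state_dict(state_dict):
    """Split state-dict keys into empty and populated entries.

    Hypothetical diagnostic helper. It works with any mapping whose
    values expose ``numel()``, such as ``torch.Tensor``.
    """
    empty, populated = [], []
    for name, tensor in state_dict.items():
        # A zero-element tensor prints as tensor([]), as in the output above.
        if tensor.numel() == 0:
            empty.append(name)
        else:
            populated.append(name)
    return empty, populated


# Example call, assuming a Trainer instance named `trainer` as in the issue:
# empty, populated = summarize_state_dict(trainer.model.state_dict())
# print(f"{len(empty)} empty, {len(populated)} populated")
```

If most or all entries are empty, one possible explanation is that ds_config.json enables ZeRO stage 3, which partitions parameters across ranks and leaves zero-size placeholders on each one. That is an assumption, since the config is not shown here. In that case DeepSpeed's deepspeed.zero.GatheredParameters context manager can temporarily gather the full weights for inspection.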