DeepSpeed version fails during training with RuntimeError: The size of tensor a (780) must match the size of tensor b (781) #40

tize-72 · 2025-02-18T18:41:41Z

The non-unsloth version also fails during training with a dimension mismatch error. Has anyone else run into this?

hellobiek · 2025-02-18T22:40:13Z

Same here, but I am using the unsloth version.

Mrkkew · 2025-02-19T06:30:39Z

I ran into the following error:
[rank3]: File "/home/jovyan/work/tanzichang/miniconda/envs/vl_vllm/lib/python3.10/site-packages/transformers/trainer.py", line 2171, in train
[rank3]: return inner_training_loop(
[rank3]: File "/home/jovyan/work/tanzichang/miniconda/envs/vl_vllm/lib/python3.10/site-packages/transformers/trainer.py", line 2531, in _inner_training_loop
[rank3]: tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
[rank3]: File "/home/jovyan/work/tanzichang/miniconda/envs/vl_vllm/lib/python3.10/site-packages/transformers/trainer.py", line 3675, in training_step
[rank3]: loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
[rank3]: File "/home/jovyan/work/tanzichang/miniconda/envs/vl_vllm/lib/python3.10/site-packages/trl/trainer/grpo_trainer.py", line 495, in compute_loss
[rank3]: rewards_per_func[:, i] = torch.tensor(output_reward_func, dtype=torch.float32, device=device)
[rank3]: RuntimeError: The expanded size of the tensor (10) must match the existing size (11) at non-singleton dimension 0. Target sizes: [10]. Tensor sizes: [11]
What is causing this?
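
The traceback above fails while copying `output_reward_func` (11 values) into a column sized for 10 completions. One possible cause, which the thread does not confirm, is a reward function that returns a different number of rewards than the completions it was given. A minimal sketch of a guard that surfaces this earlier with a clearer message; `check_reward_count` and `length_reward` are hypothetical names for illustration, not part of TRL or this repository:

```python
import functools
from typing import Callable, List


def check_reward_count(reward_func: Callable[..., List[float]]) -> Callable[..., List[float]]:
    """Hypothetical helper: fail early if a reward function returns
    a different number of rewards than completions it received."""

    @functools.wraps(reward_func)  # keep the original __name__ for logging
    def wrapper(completions, **kwargs):
        rewards = list(reward_func(completions=completions, **kwargs))
        if len(rewards) != len(completions):
            raise ValueError(
                f"{reward_func.__name__} returned {len(rewards)} rewards "
                f"for {len(completions)} completions"
            )
        return rewards

    return wrapper


@check_reward_count
def length_reward(completions, **kwargs):
    # Toy reward for illustration only: shorter completions score higher.
    return [1.0 / (1 + len(c)) for c in completions]


print(length_reward(completions=["a", "bb", "ccc"]))
```

If the guard raises, the mismatch originates in the reward function itself; if it does not, the cause lies elsewhere (for example in how completions are batched).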

SwayDy · 2025-02-19T12:16:34Z

I ran into this too: RuntimeError: The size of tensor a (1024) must match the size of tensor b (1025) at non-singleton dimension 1

SwayDy · 2025-02-19T13:27:14Z

I ran into this too: RuntimeError: The size of tensor a (1024) must match the size of tensor b (1025) at non-singleton dimension 1

In train_Datawhale-R1_unsloth.py, line 201:
max_seq_length=training_args.max_completion_length, # set the maximum sequence length
Change it to:
max_seq_length=training_args.max_prompt_length + training_args.max_completion_length, # set the maximum sequence length
After that, training runs.
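
A plausible reading of why this change helps (the thread does not confirm the mechanism): the model sees the prompt and the completion as one sequence, so a window sized only for the completion is too short once the prompt is prepended. A small sketch of the arithmetic, with illustrative values that are assumptions and not taken from the repository's config:

```python
def total_sequence_budget(max_prompt_length: int, max_completion_length: int) -> int:
    """Window needed to hold a full prompt plus a full completion."""
    return max_prompt_length + max_completion_length


def fits_in_window(prompt_tokens: int, completion_tokens: int, max_seq_length: int) -> bool:
    """True if prompt and completion together fit in the model window."""
    return prompt_tokens + completion_tokens <= max_seq_length


# Illustrative values only (assumed, not read from the actual training config).
max_prompt_length = 256
max_completion_length = 1024

old_window = max_completion_length  # original line 201
new_window = total_sequence_budget(max_prompt_length, max_completion_length)  # patched line 201

print(fits_in_window(max_prompt_length, max_completion_length, old_window))  # False
print(fits_in_window(max_prompt_length, max_completion_length, new_window))  # True
```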

xiaoan17 · 2025-02-20T06:03:37Z

Quoting SwayDy:
I ran into this too: RuntimeError: The size of tensor a (1024) must match the size of tensor b (1025) at non-singleton dimension 1
In train_Datawhale-R1_unsloth.py, line 201, change max_seq_length=training_args.max_completion_length to max_seq_length=training_args.max_prompt_length + training_args.max_completion_length, and training runs.

Thank you very much. After making this change, it runs normally.

anine09 · 2025-02-25T14:37:45Z

Quoting SwayDy:
I ran into this too: RuntimeError: The size of tensor a (1024) must match the size of tensor b (1025) at non-singleton dimension 1
In train_Datawhale-R1_unsloth.py, line 201, change max_seq_length=training_args.max_completion_length to max_seq_length=training_args.max_prompt_length + training_args.max_completion_length, and training runs.

I don't understand why this error occurs in the first place. @tize-72, does the DeepSpeed version run for you after making this change?
