
I adjusted the training configuration, and it shows I'm training on the CPU #47

Open
wzk2239115 opened this issue Feb 27, 2025 · 9 comments

@wzk2239115

export CUDA_VISIBLE_DEVICES=6,7

accelerate launch \
  --num_processes 2 \
  --config_file deepspeed_zero3.yaml \
  train_Datawhale-R1.py \
  --config Datawhale-R1.yaml

These are the cards I configured; I want to use physical cards 6 and 7.

Then I set the following in Datawhale-R1.yaml:

# GRPO algorithm parameters
beta: 0.001 # KL penalty coefficient, tuned; see the discussion below
max_prompt_length: 256 # maximum input prompt length; barely changes in this experiment
max_completion_length: 4096 # output length, including the reasoning chain; 4K is reasonable
num_generations: 8
use_vllm: true # enable vLLM to speed up inference
vllm_device: cuda:0 # reserve one card for vLLM inference; see the discussion below
vllm_gpu_memory_utilization: 0.9

So effectively I'm using physical card 6 for vLLM inference, i.e. cuda:0 (as sketched below).
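For context, a minimal sketch of that device renumbering, assuming CUDA_VISIBLE_DEVICES=6,7 is exported before the process starts (standard torch.cuda calls, not from the original post):

```python
import torch

# With CUDA_VISIBLE_DEVICES=6,7 set before launch, PyTorch renumbers the
# visible cards: physical GPU 6 -> cuda:0, physical GPU 7 -> cuda:1.
# vllm_device: cuda:0 therefore refers to physical card 6.
print(torch.cuda.device_count())      # expected: 2
print(torch.cuda.get_device_name(0))  # physical card 6
print(torch.cuda.get_device_name(1))  # physical card 7
```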

But I get this warning:
[2025-02-27 15:14:28,807] [INFO] [config.py:734:init] Config mesh_device None world_size = 2
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with model.to('cuda').

@anine09
Contributor

anine09 commented Feb 27, 2025

Hi @wzk2239115, please make sure your PyTorch can actually detect the CUDA devices. Check the output of torch.cuda.is_available().
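A quick check along these lines (a minimal sketch; run it in the same environment and with the same CUDA_VISIBLE_DEVICES as the training launch):

```python
import torch

# Both values should reflect the cards exposed via CUDA_VISIBLE_DEVICES.
print(torch.cuda.is_available())  # True if PyTorch can reach the CUDA driver
print(torch.cuda.device_count())  # 2 for CUDA_VISIBLE_DEVICES=6,7
```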

@wzk2239115
Author

It returns True. I also called torch.cuda.device_count(), which returns 2.

@anine09
Contributor

anine09 commented Feb 27, 2025

And when your code is actually running, is it running on the cards?

@wzk2239115
Author

It shows:

[2025-02-27 15:28:53,614] [INFO] [config.py:734:init] Config mesh_device None world_size = 2
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with model.to('cuda').

[image: GPU usage screenshot]

Cards 6 and 7 are being used by this job. Something feels off, but I don't know why.

@anine09
Contributor

anine09 commented Feb 27, 2025

Oh, I see. As we mention in the article, this code needs to keep one card free as the vLLM inference card, so if you are only opening two cards, you need to set --num_processes to 1.
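Concretely, only the process count changes relative to the launch command above (a sketch; everything else stays as before, with one card left free for the vLLM inference worker per the article's setup):

```bash
export CUDA_VISIBLE_DEVICES=6,7

accelerate launch \
  --num_processes 1 \
  --config_file deepspeed_zero3.yaml \
  train_Datawhale-R1.py \
  --config Datawhale-R1.yaml
```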

@wzk2239115
Author

I changed --num_processes to 1, but then it exits immediately with this error:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/roots/clouditera/grpo/RL_sec/Datawhale-R1/train_Datawhale-R1.py", line 412, in <module>
[rank0]:     main()
[rank0]:   File "/home/roots/clouditera/grpo/RL_sec/Datawhale-R1/train_Datawhale-R1.py", line 409, in main
[rank0]:     grpo_function(model_args, dataset_args, training_args, callbacks=callbacks)
[rank0]:   File "/home/roots/clouditera/grpo/RL_sec/Datawhale-R1/train_Datawhale-R1.py", line 338, in grpo_function
[rank0]:     trainer = GRPOTrainer(
[rank0]:               ^^^^^^^^^^^^
[rank0]:   File "/root/miniconda3/envs/unlockDeepseek/lib/python3.12/site-packages/trl/trainer/grpo_trainer.py", line 346, in __init__
[rank0]:     raise ValueError(
[rank0]: ValueError: The global train batch size (1 x 1) must be evenly divisible by the number of generations per prompt (8). Given the current train batch size, the valid values for the number of generations are: [].

So I set it back to 2 to get it running.
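For reference, the arithmetic behind that ValueError (a sketch of the constraint, with values taken from this setup):

```python
# TRL's GRPOTrainer requires the global train batch size to be evenly
# divisible by num_generations, so each prompt's samples fit into batches.
num_processes = 1                # from --num_processes
per_device_train_batch_size = 1  # from Datawhale-R1.yaml
num_generations = 8              # from Datawhale-R1.yaml

global_batch = num_processes * per_device_train_batch_size  # (1 x 1) = 1
print(global_batch % num_generations == 0)  # False -> raises the ValueError
```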

@anine09
Contributor

anine09 commented Feb 27, 2025

I need to take a look at your training config file. It seems you modified the batch size.

@wzk2239115
Author

Sure, it's the Datawhale-R1.yaml file, right?
```yaml
# Model parameters
model_name_or_path: /home/roots/grpo/Qwen2.5-3B-Instruct
model_revision: main
torch_dtype: bfloat16
attn_implementation: flash_attention_2
bf16: true
tf32: true
output_dir: /home/roots/grpo/RL_sec/Datawhale-R1/output

# Dataset parameters
dataset_id_or_path: /home/roots/grpo/RL_sec/cseval

# Swanlab experiment-tracking parameters
swanlab: true # whether to enable Swanlab
workspace: kzw99999
project: sec-R1-by_wzk
experiment_name: qwen2.5-3B-lr:5e-7_beta:0.001

# Training parameters
max_steps: 450 # maximum number of training steps
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
learning_rate: 5.0e-7 # learning rate, tuned; see the discussion below
lr_scheduler_type: cosine # learning-rate decay schedule
warmup_ratio: 0.03 # learning-rate warmup ratio (over the total steps); works well!
seed: 2025 # random seed, for reproducibility

# GRPO algorithm parameters
beta: 0.001 # KL penalty coefficient, tuned; see the discussion below
max_prompt_length: 256 # maximum input prompt length; barely changes in this experiment
max_completion_length: 4096 # output length, including the reasoning chain; 4K is reasonable
num_generations: 8
use_vllm: true # enable vLLM to speed up inference
vllm_device: cuda:0 # reserve one card for vLLM inference; see the discussion below
vllm_gpu_memory_utilization: 0.9

# Logging arguments
logging_strategy: steps
logging_steps: 1
save_strategy: "steps"
save_steps: 50 # save a checkpoint every this many steps
```

@anine09
Contributor

anine09 commented Feb 27, 2025

Looking at the error message, you need to make sure that per_device_train_batch_size % num_generations == 0. Try adjusting those two settings, and keep --num_processes at 1.
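For example (a sketch, not the only valid combination; lowering num_generations to a divisor of the batch size is the other lever):

```yaml
# With --num_processes 1, the global train batch size equals
# per_device_train_batch_size, so make it a multiple of num_generations.
per_device_train_batch_size: 8
num_generations: 8
```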
