Hi, I used the latest version of the repo to run experiments on the 0.5B, 1.5B, and 3B models, on a single RTX 4090 with LoRA fine-tuning. My reproduced results are far off: format-following is very strong, but accuracy is very low, and the generation length keeps getting shorter.
Here are my training setup and parameter settings:
```yaml
# Model
model_name_or_path: /root/shared-nvme/model/Qwen2.5-3B/
model_revision: main
torch_dtype: bfloat16
lora_r: 64                      # LoRA rank
lora_alpha: 32
attn_implementation: flash_attention_2
bf16: true
tf32: true
output_dir: ./output/Datawhale-R1-3B

# Data
dataset_id_or_path: ./data/

# SwanLab logging
swanlab: true                   # whether to enable SwanLab
workspace: alexyu010120
project: r1-repoduction
experiment_name: qwen2.5-3B-lr:1e-5_beta:1e-3-r32a64

# Training
max_steps: 750                  # maximum number of training steps
per_device_train_batch_size: 1
gradient_accumulation_steps: 4
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
learning_rate: 1.0e-5           # learning rate, adjusted to 1e-5
lr_scheduler_type: cosine       # learning-rate decay schedule
warmup_ratio: 0.03              # warmup ratio (over total steps), works well!
seed: 2025                      # random seed for reproducibility

# GRPO
beta: 0.001                     # KL penalty coefficient, tuned (see discussion below)
optim: adamw_8bit               # optimizer, 8-bit for speed
max_prompt_length: 256          # max input prompt length; barely varies in this experiment
max_completion_length: 1024     # max completion length, including the reasoning chain
num_generations: 4
use_vllm: true                  # enable vLLM to speed up generation
vllm_gpu_memory_utilization: 0.4

# Logging and checkpointing
logging_strategy: steps
logging_steps: 1
save_strategy: "steps"
save_steps: 50                  # save a checkpoint every N steps
```
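As a quick sanity check on the training budget these numbers imply, here is a minimal plain-Python sketch (values copied from the config above). Note the assumption: I'm treating `per_device_train_batch_size` as counting sampled completions, and how the trainer splits the batch across `num_generations` completions per prompt depends on the TRL version, so the prompt count is an estimate, not a guaranteed behavior.

```python
# Values copied from the YAML config above (single-GPU run).
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
num_generations = 4   # GRPO completions sampled per prompt
max_steps = 750

# Sequences contributing to each optimizer update.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps

# Total completions over the whole run, and (assumption: each batch
# element is one sampled completion) distinct prompts per update.
total_completions = effective_batch * max_steps
prompts_per_step = effective_batch / num_generations

print(f"effective batch size per update: {effective_batch}")
print(f"total completions over training: {total_completions}")
print(f"~distinct prompts per update:    {prompts_per_step:.0f}")
```

With roughly one prompt's worth of completions per update, the gradient signal is quite noisy, which could plausibly contribute to the unstable accuracy and shrinking completion lengths you're seeing.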
Perhaps you could try the parameters from @Kedreamix: #44 (comment)