Which base model does the unsloth method use? My 7B training collapsed #44

dream-tentacle · 2025-02-22T05:26:20Z

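Attempt 1

My model is qwen2.5-7b-instruct, trained with the unsloth method and lora_r set to 64. Training collapses:

Attempt 2

I tried lora_r: 128, per_device_train_batch_size: 8, gradient_accumulation_steps: 16. The curve became noticeably smoother, but training still collapses in the later stage:

Attempt 2 model output

When training collapses, the console output looks roughly like this: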

Question:
<|im_start|>system
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>
<|im_start|>user
使用给定的数字 [32, 39, 96]，创建一个等于 36 的方程。你可以使用基本算术运算（+、-、*、/）一次或多次，但每个数字只能使用一次。在 <think> </think> 标签中展示你的思考过程，并在 <answer> </answer> 标签中返回最终方程，例如 <answer> (1 + 2) / 3 </answer>。在 <think> 标签中逐步思考。<|im_end|>
<|im_start|>assistant
让我们逐步解决这个问题。
<think> 
Completion:
<think>39-6==93-27=63-0=12
6+92=12 =36
.需要试试更换拿些是加数6+392101-224
3*3+2-91即
（3-9+2）*3-2+6
62*3+1+2-93
2+39*2+164
3+9*2+2*169
6+3=9/2-146
39+5*2-225
92+213=114
1-96<30
直接加须超过100
加减结果 26
减倍=44
加除等于=6
32-39=-7
3+29-3-96=36
可以=32-3(29+8739=96
-33+29+2-69--3-26+261-6-673-3
92-2+3-49-16
32-32*2-1-21
-32+3-2-96+34192-2+39-96+3-2
32-32-39-2 -63-1-211-6-2-2-11-11-11111811-2
22-32++4812-32 12-1-2-2-2+1-1
32-2-1-6211-4812-88-818-8-78-115
32+2-39=7-29+1-111-87-87。
2-32+39+1-136-112-961111111111
必为负结果
(92*4-2+3-1-2-1-1+3-33
36-39-1-8133-2-19-2
32-39-3-13-1-11-9+1-111-9-1
92-2+3-2-136-111-119-7-1-11
2+1-32+3913-2-111-111111
92-2+3-2-136-1111111
96-1-32-3+2-13-31-1111-1111-36
96-1-2-31-111-111-3-111-11-1111
96-1-2-31-11-1111-111-1111-111-11-1111-1-1111111-1-1111-1111-1-1111-11111-111
1-11-1111-1-11
32-3*39+96=36-36
32+3*（39-96）32+3-9-36-3
(39-36)36-12-12
32-39+3*96=6
32+12-396-12
32-3*（39-96）32-3-9-3
(39-36)36-12-12

39-36=3
32-39+96=36
</think>
<answer>32-39+96</answer> 
Result:
89 
Target:
36 
Numbers:
[32, 39, 96]
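
The log above reports `Result: 89` against `Target: 36` for the answer `32-39+96`. The repository's actual reward code is not shown in this thread, so the following is only a minimal sketch of how such a check could work. The function name `check_countdown` and its rules (each number used exactly once, only `+ - * /` and parentheses) are assumptions based on the prompt text.

```python
import re


def check_countdown(completion, numbers, target):
    """Return (value, is_correct) for the last <answer>...</answer> block.

    Hypothetical checker for illustration; not the repository's reward function.
    """
    matches = re.findall(r"<answer>(.*?)</answer>", completion, flags=re.DOTALL)
    if not matches:
        return None, False
    expr = matches[-1].strip()

    # Allow only digits, whitespace, parentheses and the four basic operators.
    if not re.fullmatch(r"[\d\s+\-*/()]+", expr):
        return None, False
    # Reject power and floor division, which the character filter alone would let through.
    if "**" in expr or "//" in expr:
        return None, False

    # Each given number must appear exactly once.
    used = sorted(int(n) for n in re.findall(r"\d+", expr))
    if used != sorted(numbers):
        return None, False

    try:
        # Evaluate with no builtins; the expression was already filtered above.
        value = eval(expr, {"__builtins__": {}}, {})
    except (SyntaxError, ZeroDivisionError, TypeError):
        return None, False
    return value, abs(value - target) < 1e-6


# The answer from the log evaluates to 89, not 36.
print(check_countdown("<answer>32-39+96</answer>", [32, 39, 96], 36))
# A valid solution exists for these numbers: 39 - 96 / 32 = 36.
print(check_countdown("<answer>39 - 96 / 32</answer>", [32, 39, 96], 36))
```

Under these assumptions the logged answer is well formed and uses each number once, but evaluates to the wrong value, so it would earn a format reward but no correctness reward.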


anine09 · 2025-02-25T14:39:05Z

@Kedreamix Could you share your training parameters with this bro?

dream-tentacle · 2025-02-26T02:52:59Z

lora_r: 128 # LoRA rank
lora_alpha: 32
per_device_train_batch_size: 8
gradient_accumulation_steps: 16
optim: adamw_torch # optimizer, 8-bit acceleration
max_completion_length: 4096 # output length, including the reasoning chain of thought
vllm_gpu_memory_utilization: 0.5

These are the parameters for attempt 2; all other parameters are the same as in the repository.
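
Two derived quantities follow from the attempt-2 parameters and may help when comparing runs: the number of samples per optimizer step and the LoRA scaling factor. The sketch below assumes the conventional LoRA scaling `lora_alpha / lora_r` (not the rank-stabilized variant) and a single GPU; the thread states neither, so `num_devices` is a placeholder.

```python
# Values quoted from the attempt-2 configuration in this thread.
lora_r = 128
lora_alpha = 32
per_device_train_batch_size = 8
gradient_accumulation_steps = 16

# ASSUMPTION: the GPU count is not stated in the thread; 1 is a placeholder.
num_devices = 1

# Samples contributing to one optimizer step.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)

# ASSUMPTION: conventional LoRA scaling, where the adapter update is
# multiplied by lora_alpha / lora_r.
lora_scale = lora_alpha / lora_r

print(effective_batch_size)  # 128
print(lora_scale)            # 0.25
```

How those 128 samples split into prompts and sampled completions depends on the trainer and is not covered here.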

anine09 · 2025-02-26T10:12:29Z

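@dream-tentacle We've actually seen many people in the community whose 7B training collapsed, but we don't know what the cause is. Maybe try a different model size?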

Kedreamix · 2025-02-26T12:16:41Z

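@dream-tentacle Hi, the base model I used is Qwen-3B-Instruct; the detailed parameters are all shown in my SwanLab logs. Regarding your problem, I have two points. First, it may be a hyperparameter issue: a good set of parameters can sometimes make the model converge quickly. However, I have only explored the 3B model, so I can't give good advice for the 7B model for now. Second, you could consider lowering the LoRA rank, which might help it converge faster.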

dream-tentacle · 2025-02-26T12:44:50Z

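Thanks a lot. Could you explain why you think lowering the LoRA rank would speed up convergence? My understanding is that the LoRA rank mainly affects the number of trainable parameters, and training more parameters does not seem to imply slower convergence.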
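
To make the point about parameter count concrete: for one adapted weight matrix of shape `d_out x d_in`, LoRA trains two factors of shapes `d_out x r` and `r x d_in`, so the trainable parameter count grows linearly with the rank. The sketch below uses a hypothetical 4096 x 4096 layer purely for illustration; these are not the actual dimensions of any model in this thread, and the calculation says nothing about convergence speed.

```python
def lora_trainable_params(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    The adapter consists of A with shape (r, d_in) and B with shape (d_out, r).
    """
    return r * (d_in + d_out)


# ASSUMPTION: illustrative layer size, not taken from the thread.
d_in, d_out = 4096, 4096
full = d_in * d_out

for r in (64, 128):
    n = lora_trainable_params(d_in, d_out, r)
    print(f"r={r}: {n} trainable params ({n / full:.2%} of the full matrix)")
```

Doubling the rank from 64 to 128 doubles the adapter's parameter count for every adapted layer.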

dream-tentacle · 2025-02-28T07:34:26Z

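With 3B, training also collapsed in the later stage. The format reward dropped somewhat, accuracy is extremely low, and the output looks like this: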

dream-tentacle changed the title ~~What model size and lora_r does LoRA need to succeed?~~ Which base model does the unsloth method use? Feb 24, 2025

dream-tentacle changed the title ~~Which base model does the unsloth method use?~~ Which base model does the unsloth method use? My 7B training collapsed Feb 24, 2025

anine09 mentioned this issue Mar 3, 2025

Failed to reproduce the results #48

Open
