
AssertionError: Check batch related parameters. train_batch_size is not equal to micro_batch_per_gpu * gradient_acc_step * world_size 256 != 8 * 1 * 8 #14

Open
huangd1999 opened this issue Jun 17, 2023 · 1 comment


@huangd1999

Hi, when I run:

torchrun --nproc_per_node=8 train.py \
    --model_name_or_path decapoda-research/llama-7b-hf \
    --data_path ./data/code_alpaca_20k.json \
    --fp16 True \
    --output_dir ./output \
    --num_train_epochs 3 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 8 \
    --gradient_accumulation_steps 4 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 500 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --deepspeed ds_config.json \
    --tf32 False

It raises the following error:

Traceback (most recent call last):
  File "/mnt/workspace/huangdong.s/codealpaca/train.py", line 222, in <module>
    train()
  File "/mnt/workspace/huangdong.s/codealpaca/train.py", line 188, in train
    model = transformers.AutoModelForCausalLM.from_pretrained(
  File "/mnt/workspace/anaconda3/envs/codebias/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 484, in from_pretrained
    return model_class.from_pretrained(
  File "/mnt/workspace/anaconda3/envs/codebias/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2670, in from_pretrained
    init_contexts = [deepspeed.zero.Init(config_dict_or_path=deepspeed_config())] + init_contexts
  File "/mnt/workspace/anaconda3/envs/codebias/lib/python3.9/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 724, in __init__
    _ds_config = deepspeed.runtime.config.DeepSpeedConfig(config_dict_or_path,
  File "/mnt/workspace/anaconda3/envs/codebias/lib/python3.9/site-packages/deepspeed/runtime/config.py", line 769, in __init__
    self._configure_train_batch_size()
  File "/mnt/workspace/anaconda3/envs/codebias/lib/python3.9/site-packages/deepspeed/runtime/config.py", line 942, in _configure_train_batch_size
    self._batch_assertion()
  File "/mnt/workspace/anaconda3/envs/codebias/lib/python3.9/site-packages/deepspeed/runtime/config.py", line 890, in _batch_assertion
    assert train_batch == micro_batch * grad_acc * self.world_size, (
AssertionError: Check batch related parameters. train_batch_size is not equal to micro_batch_per_gpu * gradient_acc_step * world_size 256 != 8 * 1 * 8

deepspeed = 0.9.3
accelerate = 0.20.2
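
For reference, the assertion being hit enforces:

    train_batch_size == micro_batch_per_gpu * gradient_accumulation_steps * world_size

With the flags above the HF Trainer computes 8 (per-device batch) * 4 (grad accumulation) * 8 (GPUs) = 256, while the numbers in the error suggest ds_config.json resolves gradient_accumulation_steps to 1, so DeepSpeed expects 8 * 1 * 8 = 64 and the check fails with 256 != 8 * 1 * 8.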

@Denilah

Denilah commented Jun 18, 2023

You can add "gradient_accumulation_steps": "auto" in ds_config.json; maybe that will fix it.
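
For example, a minimal ds_config.json sketch along those lines (the repo's actual config isn't shown in this thread, so the surrounding fields are illustrative; ZeRO stage 3 is assumed because the traceback goes through deepspeed.zero.Init). The "auto" values are filled in by the HF Trainer from the command-line flags at launch:

{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "fp16": {
    "enabled": "auto"
  },
  "zero_optimization": {
    "stage": 3
  }
}

With all three batch-related fields set to "auto", the Trainer propagates --per_device_train_batch_size 8 and --gradient_accumulation_steps 4, so both sides agree on 8 * 4 * 8 = 256.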
