REFACTOR TO THE MAX (huggingface#7)
lewtun authored Jan 24, 2025
1 parent 4cb6c95 commit d8aa42d
Showing 19 changed files with 121 additions and 1,285 deletions.
2 changes: 1 addition & 1 deletion Makefile
@@ -3,7 +3,7 @@
 # make sure to test the local checkout in scripts and not the pre-installed one (don't use quotes!)
 export PYTHONPATH = src
 
-check_dirs := src scripts
+check_dirs := src
 
 style:
 	black --line-length 119 --target-version py310 $(check_dirs) setup.py
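The narrowed `check_dirs` means formatting now covers only `src` (plus `setup.py`). A minimal sketch of how the `style` target's command expands after this change — the variable names here are illustrative, not from the Makefile:

```shell
# Sketch (not from the commit): how the `style` target's command expands
# after this change, with check_dirs narrowed from "src scripts" to "src".
CHECK_DIRS="src"
STYLE_CMD="black --line-length 119 --target-version py310 $CHECK_DIRS setup.py"
echo "$STYLE_CMD"
```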
4 changes: 4 additions & 0 deletions README.md
@@ -55,6 +55,10 @@ If it isn't installed, run:
 sudo apt-get install git-lfs
 ```
 
+## Training models
+
+
+
 ## Evaluating models
 
 For small models, use `--data_parallel=$NUM_GPUS`; for large models, shard with `--tensor_parallel=$NUM_GPUS`.
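The data-parallel vs. tensor-parallel choice in the README line above can be sketched as a small shell helper. The size threshold and variable names are illustrative assumptions, not from the repo:

```shell
# Pick the evaluation parallelism flag by model size (illustrative threshold).
NUM_GPUS=8
MODEL_SIZE_B=70   # model size in billions of parameters (example value)

if [ "$MODEL_SIZE_B" -le 13 ]; then
  # Small model: fits on one GPU, so replicate it and split the data.
  PARALLEL_ARGS="--data_parallel=$NUM_GPUS"
else
  # Large model: shard the weights across the GPUs instead.
  PARALLEL_ARGS="--tensor_parallel=$NUM_GPUS"
fi
echo "$PARALLEL_ARGS"
```

Data parallelism copies the whole model to each GPU and splits the evaluation batch; tensor parallelism splits the weights themselves, which is what makes models too large for one GPU feasible.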
File renamed without changes.
20 changes: 2 additions & 18 deletions recipes/launch.slurm → launch.slurm
@@ -24,29 +24,13 @@ echo "PYTHON ENV: $(which python)"
 MODEL=Qwen2.5-1.5B-Instruct
 TASK=sft
 PRECISION=v00.00
-ACCELERATOR=deepspeed_zero3
+ACCELERATOR=zero3
 
 # Training setup
 NUM_NODES=$SLURM_NNODES
 GPUS_PER_NODE=8
 WORLD_SIZE=$(($NUM_NODES*$GPUS_PER_NODE))
-# Due to conflicts between Accelerate's DeepSpeed configs and Transformers' TrainingArguments, we need to parse the gradient accumulation steps from the config file to ensure they match
 CONFIG_FILE=recipes/$MODEL/$TASK/config_$PRECISION.yaml
 
 echo "CONFIG_FILE: $CONFIG_FILE"
 GRAD_ACC_STEPS=$(grep 'gradient_accumulation_steps' $CONFIG_FILE | awk '{print $2}')
-
-
-# Loop through the arguments and find the one with "--gradient_accumulation_steps"
-for arg in "${ARGS[@]}"; do
-  if [[ "$arg" == "--gradient_accumulation_steps="* ]]; then
-    # Extract the value after the equals sign
-    GRAD_ACC_STEPS="${arg#*=}"
-    break # Exit the loop once we find the desired argument
-  fi
-done
-
-echo "Gradient accumulation steps: $GRAD_ACC_STEPS"
 # so processes know who to talk to
 MASTER_ADDR=$(scontrol show hostnames $SLURM_JOB_NODELIST | head -n 1)
 MASTER_PORT=6000
@@ -56,7 +40,7 @@ export CMD=" \
 "
 
 export LAUNCHER="HF_HUB_ENABLE_HF_TRANSFER=1 ACCELERATE_LOG_LEVEL=info TRANSFORMERS_VERBOSITY=info accelerate launch \
-    --config_file recipes/accelerate_configs/$ACCELERATOR.yaml \
+    --config_file accelerate_configs/$ACCELERATOR.yaml \
     --gradient_accumulation_steps $GRAD_ACC_STEPS \
     --num_machines $NUM_NODES \
     --num_processes $WORLD_SIZE \
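The bash loop deleted from `launch.slurm` let a `--gradient_accumulation_steps=N` CLI flag override the value grepped out of the YAML config. That override logic can be exercised standalone; here is a sketch with hypothetical config and argument values:

```shell
# Recreate the deleted behaviour: the YAML value is the default, and a
# --gradient_accumulation_steps=N CLI flag (if present) overrides it.
CONFIG_FILE=$(mktemp)
printf 'gradient_accumulation_steps: 4\n' > "$CONFIG_FILE"

# Naive grep/awk parse of the YAML key, as in the original script.
GRAD_ACC_STEPS=$(grep 'gradient_accumulation_steps' "$CONFIG_FILE" | awk '{print $2}')

# Hypothetical launch arguments; the flag wins over the config value.
ARGS=(--learning_rate=2e-5 --gradient_accumulation_steps=8)
for arg in "${ARGS[@]}"; do
  if [[ "$arg" == "--gradient_accumulation_steps="* ]]; then
    GRAD_ACC_STEPS="${arg#*=}"   # strip everything up to and including "="
    break
  fi
done

rm -f "$CONFIG_FILE"
echo "$GRAD_ACC_STEPS"
```

The duplication existed because Accelerate's DeepSpeed config and Transformers' `TrainingArguments` each carry their own gradient-accumulation setting; parsing one source of truth kept them in sync, at the cost of the fragile grep shown here.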
46 changes: 0 additions & 46 deletions recipes/Qwen2.5-1.5B-Instruct/sft/config_v00.00.yaml

This file was deleted.

26 changes: 0 additions & 26 deletions recipes/accelerate_configs/fsdp.yaml

This file was deleted.

25 changes: 0 additions & 25 deletions recipes/accelerate_configs/fsdp_qlora.yaml

This file was deleted.

16 changes: 0 additions & 16 deletions recipes/accelerate_configs/multi_gpu.yaml

This file was deleted.

1 change: 0 additions & 1 deletion scripts/training/README.md

This file was deleted.
