0.13.2

@b4rtaz released this 01 Apr 14:36

This version fixes Vulkan support on Nvidia GPUs. The release was verified on Google Colab with an Nvidia T4 GPU.


How to run Distributed Llama on Google Colab?

# install the Vulkan SDK and the Nvidia Vulkan driver
!wget -qO- https://packages.lunarg.com/lunarg-signing-key-pub.asc | sudo tee /etc/apt/trusted.gpg.d/lunarg.asc
!sudo wget -qO /etc/apt/sources.list.d/lunarg-vulkan-1.4.309-jammy.list https://packages.lunarg.com/vulkan/1.4.309/lunarg-vulkan-1.4.309-jammy.list
!sudo apt update
!sudo apt install -y vulkan-sdk libnvidia-gl-525

# check that the GPU and the Vulkan loader see the device
!nvidia-smi
!vulkaninfo | grep "GPU id"

# build with Vulkan support (rm -f so the clean step doesn't fail on a fresh clone)
!git clone https://github.com/b4rtaz/distributed-llama.git
!cd distributed-llama && rm -f *.o src/nn/vulkan/*.spv && DLLAMA_VULKAN=1 make dllama

# download the model and tokenizer
!cd distributed-llama && python3 launch.py llama3_1_8b_instruct_q40

# run a single-prompt inference on the GPU
!cd distributed-llama && ./dllama inference --prompt "Tensor parallelism is all you need" --steps 128 \
   --model models/llama3_1_8b_instruct_q40/dllama_model_llama3_1_8b_instruct_q40.m \
   --tokenizer models/llama3_1_8b_instruct_q40/dllama_tokenizer_llama3_1_8b_instruct_q40.t \
   --buffer-float-type q80 --max-seq-len 4096 --nthreads 1 --gpu-index 0
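
For an interactive session instead of a single prompt, the same binary also has a chat mode. The snippet below is a minimal sketch that assumes `dllama chat` accepts the same model, tokenizer, and GPU flags as the inference command above; chat mode reads your messages from standard input, so it is easier to use from a terminal than from a Colab cell.

# chat (optional): interactive session, assuming `dllama chat` takes the same flags as above
!cd distributed-llama && ./dllama chat \
   --model models/llama3_1_8b_instruct_q40/dllama_model_llama3_1_8b_instruct_q40.m \
   --tokenizer models/llama3_1_8b_instruct_q40/dllama_tokenizer_llama3_1_8b_instruct_q40.t \
   --buffer-float-type q80 --max-seq-len 4096 --nthreads 1 --gpu-index 0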