0.13.2

@b4rtaz released this 01 Apr 14:36

This version fixes Vulkan support on Nvidia GPUs. The release was verified on Google Colab with an Nvidia T4 GPU.


How to run Distributed Llama on Google Colab?

# install the Vulkan SDK and the Nvidia Vulkan driver
!wget -qO- https://packages.lunarg.com/lunarg-signing-key-pub.asc | sudo tee /etc/apt/trusted.gpg.d/lunarg.asc
!sudo wget -qO /etc/apt/sources.list.d/lunarg-vulkan-1.4.309-jammy.list https://packages.lunarg.com/vulkan/1.4.309/lunarg-vulkan-1.4.309-jammy.list
!sudo apt update
!sudo apt install -y vulkan-sdk libnvidia-gl-525

# check that the GPU and the Vulkan loader see the device
!nvidia-smi
!vulkaninfo | grep "GPU id"

# build with Vulkan support (rm -f so the clean step doesn't fail on a fresh clone)
!git clone https://github.com/b4rtaz/distributed-llama.git
!cd distributed-llama && rm -f *.o src/nn/vulkan/*.spv && DLLAMA_VULKAN=1 make dllama

# download the model and tokenizer
!cd distributed-llama && python3 launch.py llama3_1_8b_instruct_q40

# run a single-prompt inference on the GPU
!cd distributed-llama && ./dllama inference --prompt "Tensor parallelism is all you need" --steps 128 \
   --model models/llama3_1_8b_instruct_q40/dllama_model_llama3_1_8b_instruct_q40.m \
   --tokenizer models/llama3_1_8b_instruct_q40/dllama_tokenizer_llama3_1_8b_instruct_q40.t \
   --buffer-float-type q80 --max-seq-len 4096 --nthreads 1 --gpu-index 0
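
For an interactive session instead of a single prompt, the same binary also has a chat mode. The snippet below is a minimal sketch that assumes `dllama chat` accepts the same model, tokenizer, and GPU flags as the inference command above; chat mode reads your messages from standard input, so it is easier to use from a terminal than from a Colab cell.

# chat (optional): interactive session, assuming `dllama chat` takes the same flags as above
!cd distributed-llama && ./dllama chat \
   --model models/llama3_1_8b_instruct_q40/dllama_model_llama3_1_8b_instruct_q40.m \
   --tokenizer models/llama3_1_8b_instruct_q40/dllama_tokenizer_llama3_1_8b_instruct_q40.t \
   --buffer-float-type q80 --max-seq-len 4096 --nthreads 1 --gpu-index 0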