This version fixes the Vulkan memory type selection, significantly improving inference speed on NVIDIA GPUs.
With this change, it's now possible to run Distributed Llama on two GPUs in the same machine; see this test.
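
For context, the usual way a Vulkan application picks a memory type is to query `vkGetPhysicalDeviceMemoryProperties` and choose an index that is allowed by the buffer's `memoryTypeBits` and, where available, prefers `DEVICE_LOCAL` memory so buffers live in VRAM rather than host-visible memory. The sketch below illustrates that general pattern only; `findMemoryType` and its parameters are hypothetical and not the exact code changed in this release.

```cpp
#include <vulkan/vulkan.h>
#include <stdexcept>
#include <cstdint>

// Illustrative helper (not the project's actual code): picks a memory type
// index compatible with the buffer and prefers DEVICE_LOCAL memory when
// available, falling back to any type that meets the hard requirements.
static uint32_t findMemoryType(
    VkPhysicalDevice physicalDevice,
    uint32_t memoryTypeBits,            // from VkMemoryRequirements::memoryTypeBits
    VkMemoryPropertyFlags preferred,    // e.g. VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
    VkMemoryPropertyFlags required)     // e.g. 0, or HOST_VISIBLE | HOST_COHERENT
{
    VkPhysicalDeviceMemoryProperties props;
    vkGetPhysicalDeviceMemoryProperties(physicalDevice, &props);

    // First pass: a type that has both the required and the preferred flags.
    for (uint32_t i = 0; i < props.memoryTypeCount; i++) {
        const bool allowed = (memoryTypeBits & (1u << i)) != 0;
        const VkMemoryPropertyFlags flags = props.memoryTypes[i].propertyFlags;
        if (allowed && (flags & (required | preferred)) == (required | preferred))
            return i;
    }
    // Second pass: settle for a type that only meets the hard requirements.
    for (uint32_t i = 0; i < props.memoryTypeCount; i++) {
        const bool allowed = (memoryTypeBits & (1u << i)) != 0;
        const VkMemoryPropertyFlags flags = props.memoryTypes[i].propertyFlags;
        if (allowed && (flags & required) == required)
            return i;
    }
    throw std::runtime_error("No suitable Vulkan memory type found");
}
```

Preferring a device-local type for weight and activation buffers is what typically makes the difference on discrete NVIDIA GPUs, since host-visible fallbacks force reads over PCIe.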