This version fixes the Vulkan memory type selection, significantly improving inference speed on NVIDIA GPUs.
With this change, it's now possible to run Distributed Llama on two GPUs in the same machine; see this test.
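
For context, the usual way a Vulkan application picks a memory type is to query `vkGetPhysicalDeviceMemoryProperties` and choose an index that is allowed by the buffer's `memoryTypeBits` and, where available, prefers `DEVICE_LOCAL` memory so buffers live in VRAM rather than host-visible memory. The sketch below illustrates that general pattern only; `findMemoryType` and its parameters are hypothetical and not the exact code changed in this release.

```cpp
#include <vulkan/vulkan.h>
#include <stdexcept>
#include <cstdint>

// Illustrative helper (not the project's actual code): picks a memory type
// index compatible with the buffer and prefers DEVICE_LOCAL memory when
// available, falling back to any type that meets the hard requirements.
static uint32_t findMemoryType(
    VkPhysicalDevice physicalDevice,
    uint32_t memoryTypeBits,            // from VkMemoryRequirements::memoryTypeBits
    VkMemoryPropertyFlags preferred,    // e.g. VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
    VkMemoryPropertyFlags required)     // e.g. 0, or HOST_VISIBLE | HOST_COHERENT
{
    VkPhysicalDeviceMemoryProperties props;
    vkGetPhysicalDeviceMemoryProperties(physicalDevice, &props);

    // First pass: a type that has both the required and the preferred flags.
    for (uint32_t i = 0; i < props.memoryTypeCount; i++) {
        const bool allowed = (memoryTypeBits & (1u << i)) != 0;
        const VkMemoryPropertyFlags flags = props.memoryTypes[i].propertyFlags;
        if (allowed && (flags & (required | preferred)) == (required | preferred))
            return i;
    }
    // Second pass: settle for a type that only meets the hard requirements.
    for (uint32_t i = 0; i < props.memoryTypeCount; i++) {
        const bool allowed = (memoryTypeBits & (1u << i)) != 0;
        const VkMemoryPropertyFlags flags = props.memoryTypes[i].propertyFlags;
        if (allowed && (flags & required) == required)
            return i;
    }
    throw std::runtime_error("No suitable Vulkan memory type found");
}
```

Preferring a device-local type for weight and activation buffers is what typically makes the difference on discrete NVIDIA GPUs, since host-visible fallbacks force reads over PCIe.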