
Releases: b4rtaz/distributed-llama

0.1.1

23 Jan 23:07
f2137af

This version introduces partial optimization for x86_64 AVX2 CPUs. It is now possible to run inference with Q40 weights and a Q80 buffer using partial AVX2 acceleration.

0.1.0

23 Jan 22:50

Initial release! 🚢