kobold.cpp-elephantastic_experimental_v1.43.b1216
Kobold CPP v1.43 with CUDA/cuBLAS MMQ fixed (buffers are now allocated properly from the start) and an unrestricted context size.
CodeLlama 34B in Q4_K_S can run with a 16384-token context on an RTX 3090/4090 used as a second graphics card.
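
As a rough sanity check of the 16384-context claim, the sketch below estimates the VRAM needed on a 24 GB card. It is back-of-the-envelope arithmetic, not a measurement from this build. The architecture numbers (48 layers, 8 KV heads, head dimension 128), the f16 KV cache, and the roughly 19 GB weight file size are assumptions about the model and quantization; check them against your own GGUF/GGML file. Compute scratch buffers are not included.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem=2):
    """Size of the K and V caches for a full context window.

    Each layer stores one K and one V tensor of
    n_ctx * n_kv_heads * head_dim elements.
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


GIB = 1024 ** 3

# Assumed CodeLlama 34B shape (grouped-query attention), f16 cache.
kv = kv_cache_bytes(n_layers=48, n_kv_heads=8, head_dim=128, n_ctx=16384)

# Assumed approximate size of a 34B Q4_K_S weight file, in bytes.
weights = 19e9

# Nominal capacity of a 24 GB RTX 3090/4090.
vram = 24 * GIB

total = kv + weights
print(f"KV cache: {kv / GIB:.2f} GiB")
print(f"Weights + KV cache: {total / GIB:.2f} GiB of {vram / GIB:.0f} GiB")
print("Fits (before scratch buffers):", total < vram)
```

Under these assumptions the headroom is a few GiB, which is consistent with using the card as a second GPU: a card that also drives the desktop would give up part of that margin.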