Releases: Nexesenex/croco.cpp

Croco.Cpp_FrankenFork_v1.93000_b5548_RMv1.11.3

31 May 12:30

NXS_Llama.cpp_v0.08_b5548

01 Jun 00:27
Merge branch 'nb_5548' into NXS_Llama.cpp

NXS_Llama.cpp_v0.06_b5506

30 May 12:05

Adds FTYPES for the IK Llama quants, including IQ3_KS.
Also adds a custom-quants function; see the quantize.cpp help for usage.
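For reference, a minimal sketch of producing one of these quants with the stock llama-quantize front-end (the IQ3_KS type name comes from this release's notes; the model filenames are placeholders, and the custom-quants syntax is fork-specific, so check the quantize.cpp help output for it):

  # Quantize an FP16 GGUF to IQ3_KS using 8 threads.
  ./llama-quantize model-f16.gguf model-iq3_ks.gguf IQ3_KS 8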

NXS_Llama.cpp_v0.05_b5506

29 May 15:09

Can quantize: Q6_0, IQ2_K, IQ3_K, IQ4_K, IQ5_K, IQ6_K, IQ5_KS, IQ4_KS, IQ3_KS, IQ2_KS, and IQ4_KSS.
These quants can be read by Croco.cpp (latest version).

v1.92120_b5506_RM1.111m

29 May 15:21

NXS_Llama.cpp Alpha 0.01 - b5474

28 May 03:09

First alpha version of NXS_Llama.cpp: a mainline llama.cpp, koboldified, then ikawrified, based on my Croco mash-up.
Based on llama.cpp b5474 and on some commits of IK_Llama.cpp.
Supports Q6_0, IQ3_K, IQ4_K, IQ5_K, and IQ6_K for quantization.
No CUDA, nothing fancy. Just for high-quality quantizations to use with Croco.cpp.

Credits:

  • The authors and contributors of Llama.cpp, IK_Llama.cpp (and notably Ikawrakow), and Kobold.cpp (and notably Concedo).

NXS_Llama.cpp Alpha 0.04 - b5525

28 May 19:29

CUDA works partially.
The PPL test works with Gemma 3 and Llama.
Inference works only with a head size of 256 and an FP16 KV cache, so only Gemma 3 for now.

NXS_Llama.cpp Alpha 0.02 - b5517

28 May 13:03
NXS_v0.02_b5517

Alpha 0.02 - b5517 - Merge branch 'master' into NXS_Llama.cpp

Croco.Cpp_FrankenFork_v1.92105_b5474_RM1.111m

27 May 14:08

WIP.

Adds the new SWA (sliding-window attention) cache implementation.
Works on Gemma 3 for me.

Compiled with CUDA 12.9 for Pascal, Turing, and Ampere+.

Croco.Cpp_FrankenFork_v1.92060_b5427_RM1.102

21 May 02:20

WIP.
Brings back the 2nd gen of Ikawrakow's IQ_K quants (IQ4_KS, IQ2_KS), plus the new IQ5_KS.
Also adds his CUDA MMQ kernels for those three, and for the 1st-gen IQ_K quants as well.
Yes, my commit list is a damn mess, and the GPU auto-layer needs to be fixed, among other things.
But aside from that, it works for me: +100% prompt processing at BLAS batch size 128 for the IQ_K quants, compared to cuBLAS mode.
Compiled on an Ampere machine; it might work on Pascal and Turing as well, and of course on more recent GPUs.
CUDA release, as usual. Don't expect anything else to work.
Esobold's pdfplumber is not included; I can't compile with it on Windows.

Note: the CUDA runtime is v12.9, not 12.0 as my messy changelog says.
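As a usage sketch for the speed comparison above (assuming Croco.cpp keeps koboldcpp's command-line flags, which is plausible for a koboldcpp fork; the binary and model names are placeholders):

  # MMQ kernels on, all layers offloaded, BLAS batch size 128.
  ./croco_cuda --model model-iq4_ks.gguf --usecublas mmq --gpulayers 99 --blasbatchsize 128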

Full Changelog: v1.91015_b5326_RM1.100...v1.92060_b5427_RM1.102