Releases: Nexesenex/croco.cpp
Croco.Cpp_FrankenFork_v1.82003_b4455
Full Changelog: v1.82002_b4450...v1.82003_b4455
Croco.Cpp_FrankenFork_v1.82002_b4450
Full Changelog: v1.82001_b4435...v1.82002_b4450
Croco.Cpp_FrankenFork_v1.82001_b4435
Last version to support the second-gen IQ_K quants (IQ2_KS, IQ4_KS, IQ4_KSS) for now.
Full Changelog: v1.81102_b4407...v1.82001_b4435
Custom IK_LLAMA.CPP build from 25/12/2024 included.
Croco.Cpp_FrankenFork_v1.81102_b4407
Beyond Concedo's last experimental build (07/01/2025), more BF16 CUDA integration attempted on top of IK's work (Jart's BF16 PR for DMMV, and JG's BF16 MMV commit).
Full Changelog: v1.81001_b4407...v1.81102_b4407
Croco.Cpp_FrankenFork_v1.81100_b4407
Post-Llama.cpp file split.
For testing.
Full Changelog: v1.81001_b4407...v1.81100_b4407
Croco.Cpp_FrankenFork_v1.81001_b4407
Pre-Llama.cpp file refactor/split.
Concedo's WebSearch integrated up to its "fixed defective websearch" commit (so it's kinda a 1.81001b version).
The post-refactor/split version is already compiled and working with my daily model and settings, but it needs testing and some log reinsertion; it will follow next week.
Croco.Cpp_FrankenFork_v1.80301_b3485
Western XMAS 2024 - Cuda test release 2
Features:
- NEW: DMMV kernel restored (maybe not well, but I needed it).
- NEW: IK's Trellis quants (IQ2_KT, IQ3_KT, IQ4_KT) supported for inference in DMMV mode.
- For NVidia GPUs only (it may work on HipBLAS, I don't know).
- The usual Croco perks, and probably some bugs, as usual as well.
- KCPP 1.80.3, the amazing Kobo.
- LCPP b3485, the amazingly refactored Llama.
- IK_Llama Q6_0 quant (inference and KV cache), IQ4_NL KV cache, BF16 inference and KV cache.
- IK_Llama's amazing quants IQ2_K, IQ3_K, IQ4_K, IQ5_K, IQ6_K (first gen).
- New: IQ4_KS, IQ4_KSS, IQ2_KS (second gen) inference working.
- Partly new: a dozen or so amazing IK PRs related to CPU performance, GGML ops, and CUDA, to beef up performance.
- Emphasis FSM for chat formatting (" and *), preferably for those not using antislop.
- Image generation working as far as I know (GGUF and FP8 at least).
- New: Nemotron 51b support added.
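The Emphasis FSM above is, at its core, a small state machine that toggles state on the " and * delimiters so the formatter knows whether it is currently inside speech or emphasis. A minimal sketch of the idea follows; this is my own illustration, not Yoshqu's actual implementation, and the function name and output shape are invented:

```python
# Toy "emphasis FSM": two boolean states toggled by the delimiters the
# release notes mention (" and *). Illustration only; not the real
# Croco.Cpp/Yoshqu code.

def emphasis_states(text):
    """Tag each character with (char, inside_quote, inside_star)."""
    in_quote = in_star = False
    tagged = []
    for ch in text:
        if ch == '"':
            in_quote = not in_quote   # opening/closing quote toggles state
        elif ch == '*':
            in_star = not in_star     # opening/closing asterisk toggles state
        tagged.append((ch, in_quote, in_star))
    return tagged

# 'h' and 'i' are tagged inside the quote; 'waves' inside the asterisks
states = emphasis_states('"hi" *waves*')
```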
Known bugs:
- KV quants q6_0 not working on Gemma 2.
- Multi-GPU autolayer still messed up; I've been focused only on IK's stuff over the last few weeks.
Credits: Llama.cpp mainline team and contributors, Concedo and the KoboldCpp contributors, Ikawrakow's IK_llama.cpp, and Yoshqu for the Emphasis FSM.
Full Changelog: v1.80300_b3485...v1.80301_b3485
Croco.Cpp_FrankenFork_v1.80300_b3485
XMAS Cuda test release 1 (and maybe the one and only)
- For NVidia GPUs only (it may work on HipBLAS, I don't know).
- The usual Croco perks, and probably some bugs, as usual as well.
- KCPP 1.80.3, the amazing Kobo.
- LCPP b3485, the amazingly refactored Llama.
- IK_Llama Q6_0 quant (inference and KV cache), IQ4_NL KV cache, BF16 inference and KV cache.
- IK_Llama's amazing quants IQ2_K, IQ3_K, IQ4_K, IQ5_K, IQ6_K (first gen).
- New: IQ4_KS, IQ4_KSS, IQ2_KS (second gen) inference working.
- Partly new: a dozen or so amazing IK PRs related to CPU performance, GGML ops, and CUDA, to beef up performance.
- Emphasis FSM for chat formatting (" and *), preferably for those not using antislop.
- Image generation working as far as I know (GGUF and FP8 at least).
- New: Nemotron 51b support added.
Known bugs:
- KV quants q6_0 not working on Gemma 2.
- Multi-GPU autolayer still messed up; I've been focused only on IK's stuff over the last few weeks.
For Ampere and Ada only for now.
Credits: Llama.cpp mainline team and contributors, Concedo and the KoboldCpp contributors, Ikawrakow's IK_llama.cpp, and Yoshqu for the Emphasis FSM.
Full Changelog: v1.80002_b4229...v1.80300_b3485
Croco.Cpp_FrankenFork_v1.80002_b4229
Ikawrakow's new IQ_K quants available for inference on CUDA:
- IQ2_K, IQ3_K, IQ4_K, IQ5_K and IQ6_K.
Almost no models, if any, are quantized with these and shared on HF.
But it's one step ahead.
The newer IK quants are a bit harder for me to implement: I can't reuse Llama.CPP's .c files and need to integrate IK's work (in C++) directly, so it'll take a bit longer; I'm basically learning as I go.
It works when run from Python; I'm compiling an .exe for Pascal, Turing, and beyond right now.
Edit: I can't produce a working .exe right now. I'll see what's up later.
What you can try if you don't know better:
Download the source, put the DLL in the repository folder, install the requirements with Install requirements.bat, then launch with Croco.Cpp_python_launch.bat.
Non-CUDA users: use the previous version. No IQ_K quants there yet, though.
I've attached a compiled version of IK_LLAMA_CPP, with some edits of mine. Credits go to Ikawrakow.
Croco.Cpp_FrankenFork_v1.80001_b4229
The usual, plus :
- Q6_0 quants supported, including the MMQ mode in CUDA (thanks Ikawrakow).
- KV cache (flash attention) mode K q6_0 / V q5_0 warmly recommended: very close to q8_0/q5_1 in quality, and vastly superior to the previous best compromise, q5_1/q5_0 (thanks Ikawrakow).
- Image generation works again (CUDA and Vulkan tested); it was broken on previous Croco versions.
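To put the K q6_0 / V q5_0 recommendation in numbers, here's a back-of-the-envelope sizing sketch. The bits-per-weight values follow from the GGML block layouts (q8_0: 34 bytes per 32-weight block, q6_0: 26, q5_1: 24, q5_0: 22); the model shape (32 layers, 8 KV heads, head dim 128, 8K context) is a hypothetical example, not any specific model.

```python
# Rough KV-cache sizing for the quant combos discussed above.
# Bits-per-weight derived from GGML block sizes (bytes per 32-weight block).
BPW = {"q8_0": 34 * 8 / 32,   # 8.5 bpw
       "q6_0": 26 * 8 / 32,   # 6.5 bpw (IK's q6_0)
       "q5_1": 24 * 8 / 32,   # 6.0 bpw
       "q5_0": 22 * 8 / 32}   # 5.5 bpw

def kv_cache_mib(k_type, v_type, n_layers=32, n_kv_heads=8,
                 head_dim=128, context=8192):
    """KV-cache size in MiB for one sequence at full context."""
    bits_per_token = n_layers * n_kv_heads * head_dim * (BPW[k_type] + BPW[v_type])
    return context * bits_per_token / 8 / 2**20

print(kv_cache_mib("q8_0", "q5_1"))  # 464.0 MiB, previous best compromise
print(kv_cache_mib("q6_0", "q5_0"))  # 384.0 MiB, combo recommended here
```

Under these assumptions the recommended combo trims the cache by roughly a sixth versus q8_0/q5_1 while, per the note above, staying close in quality.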
Full Changelog: v1.78003_b4067...v1.80001_b4229
Most credits go to Concedo, for KoboldCPP, and to the LlamaCPP team.