
Releases: Nexesenex/croco.cpp

Croco.Cpp_FrankenFork_v1.82003_b4455

09 Jan 19:26

Croco.Cpp_FrankenFork_v1.82002_b4450

09 Jan 10:52

Croco.Cpp_FrankenFork_v1.82001_b4435

08 Jan 16:41

Last version to support the second-gen IQ_K quants (IQ2_KS, IQ4_KS, IQ4_KSS) for now.

Full Changelog: v1.81102_b4407...v1.82001_b4435

A custom IK_LLAMA.CPP build from 25/12/2024 is included.

Croco.Cpp_FrankenFork_v1.81102_b4407

08 Jan 04:41

On top of Concedo's latest experimental (07/01/2025), more BF16 CUDA integration is attempted beyond IK's work: Jart's BF16 PR for DMMV, and JG's BF16 MMV commit.
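
As a side note on what BF16 support entails: bfloat16 is simply the top 16 bits of an IEEE-754 float32, so BF16 kernel paths mostly come down to widening loads to float32. A minimal illustration (my own sketch, not code from this repo or the PRs mentioned above):

```python
import numpy as np

# bfloat16 <-> float32, illustrating why BF16 paths are cheap to add:
# a bf16 value is just the high 16 bits of the equivalent float32.
def bf16_to_f32(bits16: np.ndarray) -> np.ndarray:
    """Widen raw bfloat16 bit patterns (uint16) to float32 values."""
    return (bits16.astype(np.uint32) << 16).view(np.float32)

def f32_to_bf16(x: np.ndarray) -> np.ndarray:
    """Truncate float32 to bfloat16 bit patterns (round-toward-zero for brevity)."""
    return (x.astype(np.float32).view(np.uint32) >> 16).astype(np.uint16)

x = np.array([1.0, -2.5, 3.14159], dtype=np.float32)
print(bf16_to_f32(f32_to_bf16(x)))  # [1.0, -2.5, 3.140625] -- small truncation error
```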

Full Changelog: v1.81001_b4407...v1.81102_b4407

Croco.Cpp_FrankenFork_v1.81100_b4407

07 Jan 12:20

Post-Llama.cpp file split.
For testing.

Full Changelog: v1.81001_b4407...v1.81100_b4407

Croco.Cpp_FrankenFork_v1.81001_b4407

04 Jan 13:21

Pre-Llama.cpp file refactor/split.

Concedo's WebSearch integrated up to his "fixed defective websearch" commit (so it's effectively a 1.81001b version).

The post-refactor/split build is already compiled and working on my daily model and settings, but it still needs testing and some log reinsertion; it will follow next week.

Croco.Cpp_FrankenFork_v1.80301_b3485

24 Dec 11:44

Western XMAS 2024 - Cuda test release 2

Features:

  • NEW: DMMV kernel restored (maybe not well, but I needed it).

  • NEW: Inference for IK's Trellis quants (IQ2_KT, IQ3_KT, IQ4_KT) supported in DMMV mode.

  • For NVidia GPUs only (it might work on HIPBLAS, I don't know).

  • The usual Croco perks, and, as usual, probably some bugs as well.

  • KCPP 1.80.3, the amazing Kobo.

  • LCPP b3485, the amazingly refactored Llama.

  • IK_Llama Q6_0 quant (inference and KV cache), IQ4_NL KV cache, BF16 inference and KV cache.

  • IK_Llama's amazing first-gen quants IQ2_K, IQ3_K, IQ4_K, IQ5_K, IQ6_K.

  • NEW: Second-gen IQ4_KS, IQ4_KSS, IQ2_KS inference working.

  • Partly new: a dozen or so amazing IK PRs related to CPU performance, GGML ops, and CUDA, to beef up performance.

  • Emphasis FSM for chat formatting (" and *), preferably for those not using antislop (see the sketch after this list).

  • Image generation working as far as I know (GGUF and FP8 at least).

  • NEW: Nemotron 51B support added.
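
The Emphasis FSM itself is Yoshqu's work; the snippet below is only my minimal sketch of the general idea (a per-delimiter open/closed toggle that a sampler can consult to nudge spans closed), not the actual implementation:

```python
# Hypothetical sketch of an "emphasis FSM": track the open/closed state of
# paired delimiters (" and *) across generated text. A sampler could consult
# this state to, e.g., bias toward closing a quote before ending a paragraph.
class EmphasisFSM:
    def __init__(self, delimiters=('"', '*')):
        self.open = {d: False for d in delimiters}

    def feed(self, text: str) -> None:
        for ch in text:
            if ch in self.open:
                self.open[ch] = not self.open[ch]  # toggle on each occurrence

    def unclosed(self) -> list:
        return [d for d, is_open in self.open.items() if is_open]

fsm = EmphasisFSM()
fsm.feed('She said, "wait')
print(fsm.unclosed())  # ['"'] -- the quotation is still open
```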

Known bugs:

  • KV Quants q6_0 not working on Gemma 2.
  • M-GPU autolayer still messed up; I've been focused only on IK's stuff these last weeks.

Credits: Llama.cpp mainline team and contributors, Concedo and Koboldcpp contributors, Ikawrakow's IK_llama.cpp, and Yoshqu for the Emphasis FSM.

Full Changelog: v1.80300_b3485...v1.80301_b3485

Croco.Cpp_FrankenFork_v1.80300_b3485

24 Dec 05:30

XMAS Cuda test release 1 (and maybe the one and only)

  • For NVidia GPUs only (it might work on HIPBLAS, I don't know).
  • The usual Croco perks, and, as usual, probably some bugs as well.
  • KCPP 1.80.3, the amazing Kobo.
  • LCPP b3485, the amazingly refactored Llama.
  • IK_Llama Q6_0 quant (inference and KV cache), IQ4_NL KV cache, BF16 inference and KV cache.
  • IK_Llama's amazing first-gen quants IQ2_K, IQ3_K, IQ4_K, IQ5_K, IQ6_K.
  • NEW: Second-gen IQ4_KS, IQ4_KSS, IQ2_KS inference working.
  • Partly new: a dozen or so amazing IK PRs related to CPU performance, GGML ops, and CUDA, to beef up performance.
  • Emphasis FSM for chat formatting (" and *), preferably for those not using antislop.
  • Image generation working as far as I know (GGUF and FP8 at least).
  • NEW: Nemotron 51B support added.

Known bugs:

  • KV Quants q6_0 not working on Gemma 2.
  • M-GPU autolayer still messed up; I've been focused only on IK's stuff these last weeks.

For Ampere and Ada only for now.

Credits: Llama.cpp mainline team and contributors, Concedo and Koboldcpp contributors, Ikawrakow's IK_llama.cpp, and Yoshqu for the Emphasis FSM.

Full Changelog: v1.80002_b4229...v1.80300_b3485

Croco.Cpp_FrankenFork_v1.80002_b4229

08 Dec 19:36

Ikawrakow's new IQ_K quants are available for inference on CUDA.

  • IQ2_K, IQ3_K, IQ4_K, IQ5_K and IQ6_K.
    Almost no models, if any, are quantized with these and shared on HF, but it's one step ahead.
    IK's newer quants are a bit harder for me to implement (I can't use Llama.CPP's .c files and need to fully integrate IK's work in C++), so it'll take a bit longer; I learn as I do it, basically.

It works when run from Python; I'm compiling an .exe for Pascal, Turing, and beyond right now.

Edit: I can't make a working .exe right now. I'll see what's up later.

What you can try if you don't know better: download the source, put the DLL in the repository, install the requirements with Install requirements.bat, then launch with Croco.Cpp_python_launch.bat (see the sketch below).
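
For readers who prefer to do it by hand, here is a rough Python equivalent of those two .bat files. It assumes the fork keeps upstream KoboldCPP's koboldcpp.py entry point and requirements.txt, and that you run it from the repository root; treat it as a sketch, not the supported path:

```python
# Rough manual equivalent of "Install requirements.bat" and
# "Croco.Cpp_python_launch.bat" (entry-point file names assumed from upstream KoboldCPP).
import subprocess, sys

# Install the Python dependencies listed by the repository.
subprocess.run([sys.executable, "-m", "pip", "install", "-r", "requirements.txt"], check=True)

# Launch the KoboldCPP/Croco entry script; pass your usual flags after it.
subprocess.run([sys.executable, "koboldcpp.py"], check=True)
```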

Non-CUDA users: use the previous version. No IQ_K quants there yet, though.

I've attached a compiled version of IK_LLAMA_CPP, with some edits of mine. Credits go to Ikawrakow.

Croco.Cpp_FrankenFork_v1.80001_b4229

04 Dec 18:14

The usual, plus:

  • Q6_0 quants supported, including MMQ mode in CUDA (thanks, Ikawrakow).
  • KV cache (flash attention) mode K q6_0 / V q5_0 warmly recommended: very close to q8_0/q5_1 in quality, and vastly superior to the previous best compromise, q5_1/q5_0 (thanks, Ikawrakow). See the size comparison after this list.
  • Image generation works again (CUDA and Vulkan tested); it was broken in previous Croco versions.
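
To put numbers on that compromise, here is a back-of-the-envelope comparison of KV-cache bits per element for the pairs mentioned above. The bytes-per-block figures follow the usual 32-element GGML block layouts as I understand them (stated as assumptions, not measurements):

```python
# Assumed bytes per 32-element GGML block for the KV-cache formats above.
BYTES_PER_BLOCK = {
    "q8_0": 34,  # 2-byte scale + 32 x 8-bit quants
    "q6_0": 26,  # 2-byte scale + 32 x 6-bit quants (ik_llama.cpp format)
    "q5_1": 24,  # 2-byte scale + 2-byte min + 4 bytes high bits + 16 bytes nibbles
    "q5_0": 22,  # 2-byte scale + 4 bytes high bits + 16 bytes nibbles
}

def bits_per_element(fmt: str) -> float:
    return BYTES_PER_BLOCK[fmt] * 8 / 32

for k, v in (("q8_0", "q5_1"), ("q6_0", "q5_0"), ("q5_1", "q5_0")):
    print(f"K {k} / V {v}: {bits_per_element(k) + bits_per_element(v):.1f} bits per element pair")
# q8_0/q5_1 -> 14.5 bits; q6_0/q5_0 -> 12.0 bits; q5_1/q5_0 -> 11.5 bits
```

Under these assumptions, q6_0/q5_0 costs only half a bit per element more than the old q5_1/q5_0 pair while closing most of the quality gap to q8_0/q5_1.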

Full Changelog: v1.78003_b4067...v1.80001_b4229

Most credits go to Concedo, for KoboldCPP, and to the LlamaCPP team.