
Releases: Nexesenex/croco.cpp

Croco.Cpp_FrankenFork_v1.82003_b4455

09 Jan 19:26

Croco.Cpp_FrankenFork_v1.82002_b4450

09 Jan 10:52

Croco.Cpp_FrankenFork_v1.82001_b4435

08 Jan 16:41

Last version to support the second-gen IQ_K quants (IQ2_KS, IQ4_KS, IQ4_KSS) for now.

Full Changelog: v1.81102_b4407...v1.82001_b4435

A custom IK_LLAMA.CPP build from 25/12/2024 is included.

Croco.Cpp_FrankenFork_v1.81102_b4407

08 Jan 04:41

On top of Concedo's latest experimental (07/01/2025), more BF16 CUDA integration is attempted beyond IK's work: Jart's BF16 PR for DMMV, and JG's BF16 MMV commit.
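
As a side note on what BF16 support entails: bfloat16 is simply the top 16 bits of an IEEE-754 float32, so BF16 kernel paths mostly come down to widening loads to float32. A minimal illustration (my own sketch, not code from this repo or the PRs mentioned above):

```python
import numpy as np

# bfloat16 <-> float32, illustrating why BF16 paths are cheap to add:
# a bf16 value is just the high 16 bits of the equivalent float32.
def bf16_to_f32(bits16: np.ndarray) -> np.ndarray:
    """Widen raw bfloat16 bit patterns (uint16) to float32 values."""
    return (bits16.astype(np.uint32) << 16).view(np.float32)

def f32_to_bf16(x: np.ndarray) -> np.ndarray:
    """Truncate float32 to bfloat16 bit patterns (round-toward-zero for brevity)."""
    return (x.astype(np.float32).view(np.uint32) >> 16).astype(np.uint16)

x = np.array([1.0, -2.5, 3.14159], dtype=np.float32)
print(bf16_to_f32(f32_to_bf16(x)))  # [1.0, -2.5, 3.140625] -- small truncation error
```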

Full Changelog: v1.81001_b4407...v1.81102_b4407

Croco.Cpp_FrankenFork_v1.81100_b4407

07 Jan 12:20

Post-Llama.cpp file split.
For testing.

Full Changelog: v1.81001_b4407...v1.81100_b4407

Croco.Cpp_FrankenFork_v1.81001_b4407

04 Jan 13:21

Pre-Llama.cpp file refactor/split.

Concedo's WebSearch integrated up to his "fixed defective websearch" commit (so it's effectively a 1.81001b version).

The post-refactor/split build is already compiled and working on my daily model and settings, but it still needs testing and some log reinsertion; it will follow next week.

Croco.Cpp_FrankenFork_v1.80301_b3485

24 Dec 11:44

Western XMAS 2024 - Cuda test release 2

Features:

  • NEW: DMMV kernel restored (maybe not well, but I needed it).

  • NEW: Inference for IK's Trellis quants (IQ2_KT, IQ3_KT, IQ4_KT) supported in DMMV mode.

  • For NVidia GPUs only (it might work on HIPBLAS, I don't know).

  • The usual Croco perks, and, as usual, probably some bugs as well.

  • KCPP 1.80.3, the amazing Kobo.

  • LCPP b3485, the amazingly refactored Llama.

  • IK_Llama Q6_0 quant (inference and KV cache), IQ4_NL KV cache, BF16 inference and KV cache.

  • IK_Llama's amazing first-gen quants IQ2_K, IQ3_K, IQ4_K, IQ5_K, IQ6_K.

  • NEW: Second-gen IQ4_KS, IQ4_KSS, IQ2_KS inference working.

  • Partly new: a dozen or so amazing IK PRs related to CPU performance, GGML ops, and CUDA, to beef up performance.

  • Emphasis FSM for chat formatting (" and *), preferably for those not using antislop (see the sketch after this list).

  • Image generation working as far as I know (GGUF and FP8 at least).

  • NEW: Nemotron 51B support added.
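
The Emphasis FSM itself is Yoshqu's work; the snippet below is only my minimal sketch of the general idea (a per-delimiter open/closed toggle that a sampler can consult to nudge spans closed), not the actual implementation:

```python
# Hypothetical sketch of an "emphasis FSM": track the open/closed state of
# paired delimiters (" and *) across generated text. A sampler could consult
# this state to, e.g., bias toward closing a quote before ending a paragraph.
class EmphasisFSM:
    def __init__(self, delimiters=('"', '*')):
        self.open = {d: False for d in delimiters}

    def feed(self, text: str) -> None:
        for ch in text:
            if ch in self.open:
                self.open[ch] = not self.open[ch]  # toggle on each occurrence

    def unclosed(self) -> list:
        return [d for d, is_open in self.open.items() if is_open]

fsm = EmphasisFSM()
fsm.feed('She said, "wait')
print(fsm.unclosed())  # ['"'] -- the quotation is still open
```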

Known bugs:

  • KV Quants q6_0 not working on Gemma 2.
  • M-GPU autolayer still messed up; I've been focused only on IK's stuff these last weeks.

Credits: Llama.cpp mainline team and contributors, Concedo and Koboldcpp contributors, Ikawrakow's IK_llama.cpp, and Yoshqu for the Emphasis FSM.

Full Changelog: v1.80300_b3485...v1.80301_b3485

Croco.Cpp_FrankenFork_v1.80300_b3485

24 Dec 05:30

XMAS Cuda test release 1 (and maybe the one and only)

  • For NVidia GPUs only (it might work on HIPBLAS, I don't know).
  • The usual Croco perks, and, as usual, probably some bugs as well.
  • KCPP 1.80.3, the amazing Kobo.
  • LCPP b3485, the amazingly refactored Llama.
  • IK_Llama Q6_0 quant (inference and KV cache), IQ4_NL KV cache, BF16 inference and KV cache.
  • IK_Llama's amazing first-gen quants IQ2_K, IQ3_K, IQ4_K, IQ5_K, IQ6_K.
  • NEW: Second-gen IQ4_KS, IQ4_KSS, IQ2_KS inference working.
  • Partly new: a dozen or so amazing IK PRs related to CPU performance, GGML ops, and CUDA, to beef up performance.
  • Emphasis FSM for chat formatting (" and *), preferably for those not using antislop.
  • Image generation working as far as I know (GGUF and FP8 at least).
  • NEW: Nemotron 51B support added.

Known bugs:

  • KV Quants q6_0 not working on Gemma 2.
  • M-GPU autolayer still messed up; I've been focused only on IK's stuff these last weeks.

For Ampere and Ada only for now.

Credits: Llama.cpp mainline team and contributors, Concedo and Koboldcpp contributors, Ikawrakow's IK_llama.cpp, and Yoshqu for the Emphasis FSM.

Full Changelog: v1.80002_b4229...v1.80300_b3485

Croco.Cpp_FrankenFork_v1.80002_b4229

08 Dec 19:36

Ikawrakow's new IQ_K quants are available for inference on CUDA.

  • IQ2_K, IQ3_K, IQ4_K, IQ5_K and IQ6_K.
    Almost no models, if any, are quantized with these and shared on HF, but it's one step ahead.
    IK's newer quants are a bit harder for me to implement (I can't use Llama.CPP's .c files and need to fully integrate IK's work in C++), so it'll take a bit longer; I learn as I do it, basically.

It works when run from Python; I'm compiling an .exe for Pascal, Turing, and beyond right now.

Edit: I can't make a working .exe right now. I'll see what's up later.

What you can try if you don't know better: download the source, put the DLL in the repository, install the requirements with Install requirements.bat, then launch with Croco.Cpp_python_launch.bat (see the sketch below).
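
For readers who prefer to do it by hand, here is a rough Python equivalent of those two .bat files. It assumes the fork keeps upstream KoboldCPP's koboldcpp.py entry point and requirements.txt, and that you run it from the repository root; treat it as a sketch, not the supported path:

```python
# Rough manual equivalent of "Install requirements.bat" and
# "Croco.Cpp_python_launch.bat" (entry-point file names assumed from upstream KoboldCPP).
import subprocess, sys

# Install the Python dependencies listed by the repository.
subprocess.run([sys.executable, "-m", "pip", "install", "-r", "requirements.txt"], check=True)

# Launch the KoboldCPP/Croco entry script; pass your usual flags after it.
subprocess.run([sys.executable, "koboldcpp.py"], check=True)
```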

Non-CUDA users: use the previous version. No IQ_K quants there yet, though.

I've attached a compiled version of IK_LLAMA_CPP, with some edits of mine. Credits go to Ikawrakow.

Croco.Cpp_FrankenFork_v1.80001_b4229

04 Dec 18:14

The usual, plus:

  • Q6_0 quants supported, including MMQ mode in CUDA (thanks, Ikawrakow).
  • KV cache (flash attention) mode K q6_0 / V q5_0 warmly recommended: very close to q8_0/q5_1 in quality, and vastly superior to the previous best compromise, q5_1/q5_0 (thanks, Ikawrakow). See the size comparison after this list.
  • Image generation works again (CUDA and Vulkan tested); it was broken in previous Croco versions.
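
To put numbers on that compromise, here is a back-of-the-envelope comparison of KV-cache bits per element for the pairs mentioned above. The bytes-per-block figures follow the usual 32-element GGML block layouts as I understand them (stated as assumptions, not measurements):

```python
# Assumed bytes per 32-element GGML block for the KV-cache formats above.
BYTES_PER_BLOCK = {
    "q8_0": 34,  # 2-byte scale + 32 x 8-bit quants
    "q6_0": 26,  # 2-byte scale + 32 x 6-bit quants (ik_llama.cpp format)
    "q5_1": 24,  # 2-byte scale + 2-byte min + 4 bytes high bits + 16 bytes nibbles
    "q5_0": 22,  # 2-byte scale + 4 bytes high bits + 16 bytes nibbles
}

def bits_per_element(fmt: str) -> float:
    return BYTES_PER_BLOCK[fmt] * 8 / 32

for k, v in (("q8_0", "q5_1"), ("q6_0", "q5_0"), ("q5_1", "q5_0")):
    print(f"K {k} / V {v}: {bits_per_element(k) + bits_per_element(v):.1f} bits per element pair")
# q8_0/q5_1 -> 14.5 bits; q6_0/q5_0 -> 12.0 bits; q5_1/q5_0 -> 11.5 bits
```

Under these assumptions, q6_0/q5_0 costs only half a bit per element more than the old q5_1/q5_0 pair while closing most of the quality gap to q8_0/q5_1.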

Full Changelog: v1.78003_b4067...v1.80001_b4229

Most credits go to Concedo, for KoboldCPP, and to the LlamaCPP team.