Releases: Nexesenex/croco.cpp

Croco.Cpp_FrankenFork_v1.86010_4885

14 Mar 20:40

Test version post llama_context refactor, following Concedo's merge in KCPP.

Context shift with Gemma 3 (tested with KV 16) seems to be working on my side too.

Two GGML/CUDA "IK" ops, fused_unary and fused_rms_norm, are lost because I'm unable to refactor their LCPP segment. Expect (maybe) a performance loss of a couple of percent.

Full Changelog: v1.86004_b4878...v1.86010_4885

Croco.Cpp_FrankenFork_v1.86004_b4878

13 Mar 15:27

Hello Gemma 3!

Note: Context shift is not working yet with Gemma 3.

Credits go as usual to our upstream and upper-stream benefactors!

Full Changelog: v1.86001_b4854...v1.86004_b4878

Croco.Cpp_FrankenFork_v1.86001_b4854

07 Mar 20:43

Croco.Cpp_FrankenFork_v1.84200_b4726

16 Feb 17:56

Croco.Cpp_FrankenFork_v1.84000_b4722

16 Feb 01:45

Croco.Cpp_FrankenFork_v1.83110_b4717

15 Feb 00:02

Croco.Cpp_FrankenFork_v1.83100_b4675

09 Feb 16:01

With a nooby fix for inconsistencies in the draft model's KV cache quantization (KVQ) settings.

Full Changelog: v1.83020_b4467...v1.83100_b4675

Croco.Cpp_FrankenFork_v1.83020_b4667

08 Feb 08:52

Test release for Ampere and beyond, for now.

New mod on Croco:

  • KV cache quantization customizable independently for the draft model used for speculative decoding.

-> My favored choice for the draft model is KV q5_0/iq4_nl (5 BPW), but even iq4_nl/iq4_nl (4.5 BPW) is viable, notably if you use an iq4_xs (or smaller) quant for your draft model.

-> As for the main model, I suggest KV q8_0/q5_0 (7 BPW) if you're after quality, q6_0/q5_0 (6 BPW) if you want a compromise, and q6_0/iq4_nl (5.5 BPW) if you want to save memory without much loss. In that case, q5_0/iq4_nl is the minimum that is not too lossy (less than +1% ppl compared to KV f16).

-> KV q4_0 (4.5 BPW, +5% ppl) is left for legacy purposes and in case of bugs, but if KV iq4_nl works for you, go for it: it's the same size and MUCH better (+2.5% ppl).
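
The BPW figures quoted for each K/V pair can be reproduced with a little arithmetic. The sketch below is only an illustration, not part of Croco.Cpp: the per-type bit widths in the table are assumptions chosen to match the numbers above (each quant type counted with its per-block scale), the helper names are hypothetical, and the K and V caches are assumed to hold the same number of elements.

```python
# Assumed bits per weight (BPW) for each cache type, including the
# per-block scale. These values are consistent with the pair figures
# quoted in the release notes, but are not read from the Croco.Cpp code.
BPW = {
    "f16": 16.0,
    "q8_0": 8.5,
    "q6_0": 6.5,
    "q5_0": 5.5,
    "q4_0": 4.5,
    "iq4_nl": 4.5,
}


def kv_bpw(k_type: str, v_type: str) -> float:
    """Average BPW of a K/V pair, assuming K and V hold equally many elements."""
    return (BPW[k_type] + BPW[v_type]) / 2


def savings_vs_f16(k_type: str, v_type: str) -> float:
    """Fraction of KV cache memory saved compared to an f16/f16 cache."""
    return 1 - kv_bpw(k_type, v_type) / BPW["f16"]


if __name__ == "__main__":
    pairs = [
        ("q8_0", "q5_0"),
        ("q6_0", "q5_0"),
        ("q6_0", "iq4_nl"),
        ("q5_0", "iq4_nl"),
        ("iq4_nl", "iq4_nl"),
    ]
    for k, v in pairs:
        print(f"K {k} / V {v}: {kv_bpw(k, v)} BPW, "
              f"{savings_vs_f16(k, v):.1%} smaller than f16")
```

Under these assumptions, a q8_0/q5_0 cache averages 7 BPW, which is a bit under half the size of an f16 cache for the same context length.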

Also, the BLAS batch size (BBS) for the draft model is reduced: a draft model is logically smaller than the main model, so both token generation (TG) and prompt processing (PP) are much faster, and thus the BBS can be shrunk.

Also, all the supported FA KV quants are back; the previous Croco versions were shrunk a bit by being compiled without some of the usual FA quants.

Full Changelog: v1.83007_b4608...v1.83020_b4667

v1.83007_b4608

01 Feb 20:04

Croco.Cpp_FrankenFork_v1.83005_b4569

29 Jan 20:13