Update sampling API for llama.cpp #1742

abetlen · 2024-09-15T20:10:33Z

Updates llama-cpp-python to use the new llama.cpp sampler API.
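
For readers unfamiliar with the idea of a sampler chain, here is a minimal pure-Python sketch of the concept. It is not the llama.cpp or llama-cpp-python API: `SamplerChain`, `top_k`, and `temperature` are hypothetical names invented for illustration. It assumes a chain is an ordered list of stages that each transform the candidate logits, followed by a final seeded random pick, so the RNG seed is owned by the sampler rather than by the context.

```python
import math
import random
from typing import Callable, Dict, List

Logits = Dict[int, float]            # token id -> logit (illustrative type)
Stage = Callable[[Logits], Logits]   # one step of the chain


def top_k(k: int) -> Stage:
    """Keep only the k highest-scoring tokens."""
    def apply(logits: Logits) -> Logits:
        ranked = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)
        return dict(ranked[:k])
    return apply


def temperature(t: float) -> Stage:
    """Divide every logit by t (t < 1 sharpens, t > 1 flattens)."""
    def apply(logits: Logits) -> Logits:
        return {tok: value / t for tok, value in logits.items()}
    return apply


class SamplerChain:
    """Hypothetical chain: run each stage in order, then draw one token."""

    def __init__(self, seed: int) -> None:
        self.stages: List[Stage] = []
        self.rng = random.Random(seed)  # the seed belongs to the sampler

    def add(self, stage: Stage) -> "SamplerChain":
        self.stages.append(stage)
        return self

    def sample(self, logits: Logits) -> int:
        for stage in self.stages:
            logits = stage(logits)
        tokens = list(logits)
        peak = max(logits.values())
        # Softmax weights, shifted by the peak for numerical stability.
        weights = [math.exp(logits[tok] - peak) for tok in tokens]
        return self.rng.choices(tokens, weights=weights, k=1)[0]


logits = {0: 2.0, 1: 1.0, 2: -1.0, 3: 0.5}
chain = SamplerChain(seed=42).add(top_k(2)).add(temperature(0.8))
token = chain.sample(logits)  # always one of the two top-scoring tokens
```

Two chains built with the same seed and the same stages produce the same token sequence for the same inputs, which is the reproducibility a per-sampler seed is meant to give.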

Known Issues:

The grammar parser seems to fail silently when the GBNF grammar contains a syntax error.

e-c-d · 2024-09-16T03:10:38Z

You probably know this already, but the llama_context_params.seed field should be removed everywhere:

diff --git a/llama_cpp/llama_cpp.py b/llama_cpp/llama_cpp.py
index efec065..fa53135 100644
--- a/llama_cpp/llama_cpp.py
+++ b/llama_cpp/llama_cpp.py
@@ -850,11 +850,10 @@ class llama_model_params(ctypes.Structure):
 # };
 class llama_context_params(ctypes.Structure):
     """Parameters for llama_context
 
     Attributes:
-        seed (int): RNG seed, -1 for random
         n_ctx (int): text context, 0 = from model
         n_batch (int): logical maximum batch size that can be submitted to llama_decode
         n_ubatch (int): physical maximum batch size
         n_seq_max (int): max number of sequences (i.e. distinct states for recurrent models)
         n_threads (int): number of threads to use for generation
@@ -881,11 +880,10 @@ class llama_context_params(ctypes.Structure):
         abort_callback (ggml_abort_callback): abort callback if it returns true, execution of llama_decode() will be aborted
         abort_callback_data (ctypes.ctypes.c_void_p): data for abort_callback
     """
 
     if TYPE_CHECKING:
-        seed: int
         n_ctx: int
         n_batch: int
         n_ubatch: int
         n_seq_max: int
         n_threads: int
@@ -911,11 +909,10 @@ class llama_context_params(ctypes.Structure):
         flash_attn: bool
         abort_callback: Callable[[ctypes.c_void_p], bool]
         abort_callback_data: ctypes.c_void_p
 
     _fields_ = [
-        ("seed", ctypes.c_uint32),
         ("n_ctx", ctypes.c_uint32),
         ("n_batch", ctypes.c_uint32),
         ("n_ubatch", ctypes.c_uint32),
         ("n_seq_max", ctypes.c_uint32),
         ("n_threads", ctypes.c_int32),
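
A small standalone sketch of why a stale leading field matters in a `ctypes` binding: `_fields_` must mirror the C struct field for field, so an extra `seed` entry shifts every later field by four bytes. The two classes below are hypothetical, cut-down stand-ins for the structure before and after the change (only the first fields are shown), and the "native" side is simulated with a byte buffer rather than a real library call.

```python
import ctypes


class ParamsWithSeed(ctypes.Structure):
    """Stale binding: still declares the removed field first."""
    _fields_ = [
        ("seed", ctypes.c_uint32),
        ("n_ctx", ctypes.c_uint32),
        ("n_batch", ctypes.c_uint32),
    ]


class ParamsWithoutSeed(ctypes.Structure):
    """Updated binding: matches the assumed new native layout."""
    _fields_ = [
        ("n_ctx", ctypes.c_uint32),
        ("n_batch", ctypes.c_uint32),
    ]


# Simulate the native side writing values in the new layout.
native = ParamsWithoutSeed(n_ctx=2048, n_batch=512)
raw = bytes(native) + b"\x00" * 4  # pad so the larger stale struct fits

# Reading the same bytes through the stale layout misattributes every field.
stale = ParamsWithSeed.from_buffer_copy(raw)

print(ParamsWithSeed.n_ctx.offset, ParamsWithoutSeed.n_ctx.offset)  # 4 0
print(stale.seed, stale.n_ctx, stale.n_batch)  # 2048 512 0
```

Nothing raises here, which is what makes a mismatch like this easy to miss: the values are simply read from the wrong offsets.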

abetlen · 2024-09-17T17:34:48Z

@e-c-d Yes, thank you, I forgot to push this!


abetlen added 3 commits September 15, 2024 16:07

Initial samplng api update

7893c8f

Fix logger

f9db9b7

Update tests

0e65028

abetlen added 2 commits September 17, 2024 13:31

Update

172fb0a

Remove seed

a757e2b

abetlen added 11 commits September 18, 2024 11:24

Add sampling chain

ba1f1b2

Remove unnused test

aa91ec0

Use Qwen2 0.5B for ci tests

d4fde4c

Fix typo

4cfb46d

Fix typo

de267c7

Update cache version

56c1f45

Use real model for tests

2897fc2

Add huggingface-hub as a test dependency

0cff3d0

Merge branch 'main' into update-sampling-api

903781a

Remove RUST_LOG=trace

2102eb9

Merge branch 'main' into update-sampling-api

2e4c826

abetlen marked this pull request as ready for review September 18, 2024 23:55

abetlen added 2 commits September 18, 2024 19:59

Add actual logit processor test

334ca35

Merge branch 'update-sampling-api' of github.com:abetlen/llama-cpp-python into update-sampling-api

7455b9f

abetlen merged commit f8fcb3e into main Sep 19, 2024
1 check passed

mite51 mentioned this pull request Oct 20, 2024

low level examples broken after [feat: Update sampling API for llama.cpp (#1742)] #1803

Open
