
Commit 2138561

fix(server): Propagate flash_attn to model load. (#1424)
1 parent: 2117122

1 file changed: 1 addition, 0 deletions

llama_cpp/server/model.py

@@ -242,6 +242,7 @@ def load_llama_from_model_settings(settings: ModelSettings) -> llama_cpp.Llama:
         logits_all=settings.logits_all,
         embedding=settings.embedding,
         offload_kqv=settings.offload_kqv,
+        flash_attn=settings.flash_attn,
         # Sampling Params
         last_n_tokens_size=settings.last_n_tokens_size,
         # LoRA Params
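
For context, a minimal sketch of what the fix enables, not code from the commit: before this change, load_llama_from_model_settings built the llama_cpp.Llama instance without forwarding flash_attn, so a server configured with flash_attn=True silently ran without flash attention. The sketch below assumes ModelSettings is importable from llama_cpp.server.settings; the model path and field values are hypothetical.

# Sketch: enabling flash attention via the server's model settings.
# ModelSettings and load_llama_from_model_settings are the names shown
# in the diff above; the path and values here are made up.
from llama_cpp.server.settings import ModelSettings
from llama_cpp.server.model import load_llama_from_model_settings

settings = ModelSettings(
    model="models/example.Q4_K_M.gguf",  # hypothetical model path
    n_gpu_layers=-1,                     # offload all layers to the GPU
    flash_attn=True,                     # honored only after this commit
)
llama = load_llama_from_model_settings(settings)

With the one-line fix applied, the flash_attn field now reaches the Llama constructor along with the other model parameters (logits_all, embedding, offload_kqv, and so on) instead of being dropped.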
