From 44047db124b356c5edf3a11eef7bca6891049230 Mon Sep 17 00:00:00 2001
From: Sihan Wang
Date: Mon, 8 Jan 2024 14:02:42 -0800
Subject: [PATCH] Update models/README.md

Co-authored-by: shrekris-anyscale <92341594+shrekris-anyscale@users.noreply.github.com>
Signed-off-by: Sihan Wang
---
 models/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/models/README.md b/models/README.md
index 699b010b..56a3cc6e 100644
--- a/models/README.md
+++ b/models/README.md
@@ -54,7 +54,7 @@ RayLLM supports continuous batching, meaning incoming requests are processed as
 * `logger_level` is to configure log level for TensorRT-LLM engine. ("VERBOSE", "INFO", "WARNING", "ERROR")
 * `max_num_sequences` is the maximum number of requests/sequences the backend can maintain state
 * `max_tokens_in_paged_kv_cache` sets the maximum number of tokens in the paged kv cache.
-* `kv_cache_free_gpu_mem_fraction` is to configure K-V Cache free gpu memory fraction.
+* `kv_cache_free_gpu_mem_fraction` sets the K-V Cache free gpu memory fraction.
 
 #### Embedding Engine Config
 * `model_id` is the ID that refers to the model in the RayLLM or OpenAI API.