This repository has been archived by the owner on May 28, 2024. It is now read-only.

Commit

Update models/README.md
Co-authored-by: shrekris-anyscale <92341594+shrekris-anyscale@users.noreply.github.com>
Signed-off-by: Sihan Wang <sihanwang41@gmail.com>
sihanwang41 and shrekris-anyscale authored Jan 8, 2024
1 parent: b1f058e. Commit: 44047db.
Showing 1 changed file with 1 addition and 1 deletion.
models/README.md: 2 changes (1 addition & 1 deletion)
@@ -54,7 +54,7 @@ RayLLM supports continuous batching, meaning incoming requests are processed as
 * `logger_level` is to configure log level for TensorRT-LLM engine. ("VERBOSE", "INFO", "WARNING", "ERROR")
 * `max_num_sequences` is the maximum number of requests/sequences the backend can maintain state
 * `max_tokens_in_paged_kv_cache` sets the maximum number of tokens in the paged kv cache.
-* `kv_cache_free_gpu_mem_fraction` is to configure K-V Cache free gpu memory fraction.
+* `kv_cache_free_gpu_mem_fraction` sets the K-V Cache free gpu memory fraction.
 
 #### Embedding Engine Config
 * `model_id` is the ID that refers to the model in the RayLLM or OpenAI API.
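
For readers landing on this commit without the full README: the bullets touched by this hunk document TensorRT-LLM engine options in a RayLLM model config. Below is a minimal YAML sketch of how those keys might fit together; the `engine_config` nesting and all example values are assumptions for illustration, and only the parameter names themselves come from the diffed README section.

```yaml
# Hypothetical sketch of a RayLLM engine config. The nesting and every
# value here are illustrative assumptions, not the project's actual schema;
# only the parameter names are taken from the diffed README section.
engine_config:
  model_id: meta-llama/Llama-2-7b-chat-hf    # assumed example model ID
  # TensorRT-LLM engine options described in models/README.md:
  logger_level: INFO                   # one of VERBOSE, INFO, WARNING, ERROR
  max_num_sequences: 8                 # max requests/sequences the backend keeps state for (assumed value)
  max_tokens_in_paged_kv_cache: 16384  # cap on tokens held in the paged KV cache (assumed value)
  kv_cache_free_gpu_mem_fraction: 0.9  # fraction of free GPU memory given to the KV cache (assumed value)
```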

