From 44047db124b356c5edf3a11eef7bca6891049230 Mon Sep 17 00:00:00 2001
From: Sihan Wang
Date: Mon, 8 Jan 2024 14:02:42 -0800
Subject: [PATCH] Update models/README.md

Co-authored-by: shrekris-anyscale <92341594+shrekris-anyscale@users.noreply.github.com>
Signed-off-by: Sihan Wang
---
 models/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/models/README.md b/models/README.md
index 699b010b..56a3cc6e 100644
--- a/models/README.md
+++ b/models/README.md
@@ -54,7 +54,7 @@ RayLLM supports continuous batching, meaning incoming requests are processed as
 * `logger_level` is to configure log level for TensorRT-LLM engine. ("VERBOSE", "INFO", "WARNING", "ERROR")
 * `max_num_sequences` is the maximum number of requests/sequences the backend can maintain state
 * `max_tokens_in_paged_kv_cache` sets the maximum number of tokens in the paged kv cache.
-* `kv_cache_free_gpu_mem_fraction` is to configure K-V Cache free gpu memory fraction.
+* `kv_cache_free_gpu_mem_fraction` sets the K-V Cache free gpu memory fraction.
 
 #### Embedding Engine Config
 * `model_id` is the ID that refers to the model in the RayLLM or OpenAI API.