This repository has been archived by the owner on May 28, 2024. It is now read-only.

Commit

Update models/README.md
Co-authored-by: shrekris-anyscale <92341594+shrekris-anyscale@users.noreply.github.com>
Signed-off-by: Sihan Wang <sihanwang41@gmail.com>
sihanwang41 and shrekris-anyscale authored Jan 8, 2024
1 parent: b1f058e. Commit: 44047db.
Showing 1 changed file with 1 addition and 1 deletion.
models/README.md: 2 changes (1 addition & 1 deletion)
@@ -54,7 +54,7 @@ RayLLM supports continuous batching, meaning incoming requests are processed as
 * `logger_level` is to configure log level for TensorRT-LLM engine. ("VERBOSE", "INFO", "WARNING", "ERROR")
 * `max_num_sequences` is the maximum number of requests/sequences the backend can maintain state
 * `max_tokens_in_paged_kv_cache` sets the maximum number of tokens in the paged kv cache.
-* `kv_cache_free_gpu_mem_fraction` is to configure K-V Cache free gpu memory fraction.
+* `kv_cache_free_gpu_mem_fraction` sets the K-V Cache free gpu memory fraction.
 
 #### Embedding Engine Config
 * `model_id` is the ID that refers to the model in the RayLLM or OpenAI API.
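
For readers landing on this commit without the full README: the bullets touched by this hunk document TensorRT-LLM engine options in a RayLLM model config. Below is a minimal YAML sketch of how those keys might fit together; the `engine_config` nesting and all example values are assumptions for illustration, and only the parameter names themselves come from the diffed README section.

```yaml
# Hypothetical sketch of a RayLLM engine config. The nesting and every
# value here are illustrative assumptions, not the project's actual schema;
# only the parameter names are taken from the diffed README section.
engine_config:
  model_id: meta-llama/Llama-2-7b-chat-hf    # assumed example model ID
  # TensorRT-LLM engine options described in models/README.md:
  logger_level: INFO                   # one of VERBOSE, INFO, WARNING, ERROR
  max_num_sequences: 8                 # max requests/sequences the backend keeps state for (assumed value)
  max_tokens_in_paged_kv_cache: 16384  # cap on tokens held in the paged KV cache (assumed value)
  kv_cache_free_gpu_mem_fraction: 0.9  # fraction of free GPU memory given to the KV cache (assumed value)
```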

