This repository has been archived by the owner on May 28, 2024. It is now read-only.

Commit b996bbf
Update models/README.md
Co-authored-by: shrekris-anyscale <92341594+shrekris-anyscale@users.noreply.github.com>
Signed-off-by: Sihan Wang <sihanwang41@gmail.com>
sihanwang41 and shrekris-anyscale authored Jan 8, 2024
1 parent 9e458e1 commit b996bbf
Showing 1 changed file with 1 addition and 1 deletion.
models/README.md: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ Ray Actors during deployments (using `ray_actor_options`). We recommend using th

Engine is the abstraction for interacting with a model. It is responsible for scheduling and running the model inside a Ray Actor worker group.

-The `engine_config` section specifies the model ID (`model_id`), how to initialize it and what parameters to use when generating tokens with an LLM.
+The `engine_config` section specifies the model ID (`model_id`), how to initialize it, and what parameters to use when generating tokens with an LLM.

RayLLM supports continuous batching, meaning incoming requests are processed as soon as they arrive, and can be added to batches that are already being processed. This means that the model is not slowed down by certain sentences taking longer to generate than others. RayLLM also supports quantization, meaning compressed models can be deployed with cheaper hardware requirements. For more details on using quantized models in RayLLM, see the [quantization guide](continuous_batching/quantization/README.md).

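The changed line above describes the `engine_config` section of a RayLLM model YAML. For context, here is a minimal sketch of what such a section can look like; the layout follows RayLLM's model configs, but the specific model and values are illustrative assumptions, not part of this commit:

```yaml
# Illustrative sketch of an engine_config section (assumed values).
engine_config:
  model_id: meta-llama/Llama-2-7b-chat-hf   # the model ID (assumed example)
  type: VLLMEngine                          # engine that schedules and runs the model
  engine_kwargs:                            # how the engine is initialized
    trust_remote_code: true
    max_num_batched_tokens: 4096
  max_total_tokens: 4096
  generation:                               # parameters used when generating tokens
    prompt_format:
      system: "{instruction}\n"
      user: "{instruction}"
      assistant: "{instruction}"
      trailing_assistant: ""
    stopping_sequences: []
```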
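The last paragraph of the diff also notes quantization support. Assuming the vLLM engine, a quantized deployment can be sketched by pointing `model_id` at pre-quantized weights and passing the quantization method through `engine_kwargs`; the checkpoint name and flag below are assumptions for illustration, and the linked quantization guide remains the authoritative reference:

```yaml
# Illustrative sketch of a quantized-model engine_config (assumed values).
engine_config:
  model_id: TheBloke/Llama-2-7B-Chat-AWQ    # assumed AWQ-quantized checkpoint
  type: VLLMEngine
  engine_kwargs:
    quantization: awq                       # forwarded to vLLM; lowers GPU memory needs
```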
