
Commit 5645799

omrishiv authored and sumitd2 committed
[Doc] neuron documentation update (vllm-project#8671)
Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>
Signed-off-by: Sumit Dubey <sumit.dubey2@ibm.com>
1 parent d51c1d4 commit 5645799

File tree

2 files changed (+3, -3 lines)


docs/source/getting_started/neuron-installation.rst

Lines changed: 2 additions & 2 deletions
@@ -3,8 +3,8 @@
 Installation with Neuron
 ========================
 
-vLLM 0.3.3 onwards supports model inferencing and serving on AWS Trainium/Inferentia with Neuron SDK.
-At the moment Paged Attention is not supported in Neuron SDK, but naive continuous batching is supported in transformers-neuronx.
+vLLM 0.3.3 onwards supports model inferencing and serving on AWS Trainium/Inferentia with Neuron SDK with continuous batching.
+Paged Attention and Chunked Prefill are currently in development and will be available soon.
 Data types currently supported in Neuron SDK are FP16 and BF16.
 
 Requirements
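
For reference, a minimal sketch of the usage the updated page describes: offline inference through vLLM's Python API, where the engine applies continuous batching across prompts. The model ID and sampling settings below are placeholder assumptions, not part of this commit, and Neuron-specific build and device configuration is omitted.

from vllm import LLM, SamplingParams

# Placeholder model ID; substitute any model supported by the Neuron backend.
llm = LLM(model="openlm-research/open_llama_3b")

# FP16 and BF16 are the dtypes the updated doc lists as supported on Neuron.
params = SamplingParams(temperature=0.8, max_tokens=64)

# generate() accepts a batch of prompts; the engine schedules them with
# continuous batching rather than fixed static batches.
outputs = llm.generate(["Hello from Trainium!", "What is vLLM?"], params)
for out in outputs:
    print(out.outputs[0].text)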

docs/source/index.rst

Lines changed: 1 addition & 1 deletion
@@ -43,7 +43,7 @@ vLLM is flexible and easy to use with:
 * Tensor parallelism and pipeline parallelism support for distributed inference
 * Streaming outputs
 * OpenAI-compatible API server
-* Support NVIDIA GPUs, AMD CPUs and GPUs, Intel CPUs and GPUs, PowerPC CPUs, TPU, and AWS Neuron.
+* Support NVIDIA GPUs, AMD CPUs and GPUs, Intel CPUs and GPUs, PowerPC CPUs, TPU, and AWS Trainium and Inferentia Accelerators.
 * Prefix caching support
 * Multi-lora support
