Commit 0b6e5a2

hmellor authored and Yuqi Zhang committed
Improve examples rendering in docs and GitHub (vllm-project#18203)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Yuqi Zhang <yuqizhang@google.com>
1 parent 63f2597 commit 0b6e5a2

File tree

7 files changed: +27 -10 lines changed

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+# Disaggregated Prefill V1
+
+This example contains scripts that demonstrate disaggregated prefill in the offline setting of vLLM.
+
+## Files
+
+- `run.sh` - A helper script that will run `prefill_example.py` and `decode_example.py` sequentially.
+- `prefill_example.py` - A script which performs prefill only, saving the KV state to the `local_storage` directory and the prompts to `output.txt`.
+- `decode_example.py` - A script which performs decode only, loading the KV state from the `local_storage` directory and the prompts from `output.txt`.
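
As the README above describes, the two scripts are meant to run in order; the following is a minimal sketch of the flow that `run.sh` wraps (the exact invocations and any arguments are not shown in this diff, so treat the command lines as assumptions):

```console
# Hypothetical manual equivalent of run.sh: prefill first, then decode.
python prefill_example.py   # writes KV state to ./local_storage and prompts to output.txt
python decode_example.py    # reads ./local_storage and output.txt, then generates completions
```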

examples/offline_inference/openai/openai_batch.md renamed to examples/offline_inference/openai_batch/README.md

Lines changed: 9 additions & 9 deletions
@@ -8,7 +8,7 @@ This is a guide to performing batch inference using the OpenAI batch file format

The OpenAI batch file format consists of a series of json objects on new lines.

-[See here for an example file.](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/openai/openai_example_batch.jsonl)
+[See here for an example file.](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/openai_batch/openai_example_batch.jsonl)

Each line represents a separate request. See the [OpenAI package reference](https://platform.openai.com/docs/api-reference/batch/requestInput) for more details.

@@ -30,13 +30,13 @@ We currently support `/v1/chat/completions`, `/v1/embeddings`, and `/v1/score` e
To follow along with this example, you can download the example batch, or create your own batch file in your working directory.

```console
-wget https://raw.githubusercontent.com/vllm-project/vllm/main/examples/offline_inference/openai/openai_example_batch.jsonl
+wget https://raw.githubusercontent.com/vllm-project/vllm/main/examples/offline_inference/openai_batch/openai_example_batch.jsonl
```

Once you've created your batch file it should look like this

```console
-$ cat offline_inference/openai/openai_example_batch.jsonl
+$ cat offline_inference/openai_batch/openai_example_batch.jsonl
{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Hello world!"}],"max_completion_tokens": 1000}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "system", "content": "You are an unhelpful assistant."},{"role": "user", "content": "Hello world!"}],"max_completion_tokens": 1000}}
```
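
A batch file with the same shape can also be written by hand instead of downloaded; here is a minimal sketch (not part of this diff) assuming the same `offline_inference/openai_batch/` working directory and a single request:

```console
# Hypothetical hand-written one-request batch file, same format as the example above.
cat <<'EOF' > offline_inference/openai_batch/openai_example_batch.jsonl
{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "user", "content": "Hello world!"}], "max_completion_tokens": 1000}}
EOF
```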
@@ -48,7 +48,7 @@ The batch running tool is designed to be used from the command line.
You can run the batch with the following command, which will write its results to a file called `results.jsonl`

```console
-python -m vllm.entrypoints.openai.run_batch -i offline_inference/openai/openai_example_batch.jsonl -o results.jsonl --model meta-llama/Meta-Llama-3-8B-Instruct
+python -m vllm.entrypoints.openai.run_batch -i offline_inference/openai_batch/openai_example_batch.jsonl -o results.jsonl --model meta-llama/Meta-Llama-3-8B-Instruct
```

### Step 3: Check your results
@@ -65,10 +65,10 @@ $ cat results.jsonl

The batch runner supports remote input and output urls that are accessible via http/https.

-For example, to run against our example input file located at `https://raw.githubusercontent.com/vllm-project/vllm/main/examples/offline_inference/openai/openai_example_batch.jsonl`, you can run
+For example, to run against our example input file located at `https://raw.githubusercontent.com/vllm-project/vllm/main/examples/offline_inference/openai_batch/openai_example_batch.jsonl`, you can run

```console
-python -m vllm.entrypoints.openai.run_batch -i https://raw.githubusercontent.com/vllm-project/vllm/main/examples/offline_inference/openai/openai_example_batch.jsonl -o results.jsonl --model meta-llama/Meta-Llama-3-8B-Instruct
+python -m vllm.entrypoints.openai.run_batch -i https://raw.githubusercontent.com/vllm-project/vllm/main/examples/offline_inference/openai_batch/openai_example_batch.jsonl -o results.jsonl --model meta-llama/Meta-Llama-3-8B-Instruct
```

## Example 3: Integrating with AWS S3
@@ -89,21 +89,21 @@ To integrate with cloud blob storage, we recommend using presigned urls.
To follow along with this example, you can download the example batch, or create your own batch file in your working directory.

```console
-wget https://raw.githubusercontent.com/vllm-project/vllm/main/examples/offline_inference/openai/openai_example_batch.jsonl
+wget https://raw.githubusercontent.com/vllm-project/vllm/main/examples/offline_inference/openai_batch/openai_example_batch.jsonl
```

Once you've created your batch file it should look like this

```console
-$ cat offline_inference/openai/openai_example_batch.jsonl
+$ cat offline_inference/openai_batch/openai_example_batch.jsonl
{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Hello world!"}],"max_completion_tokens": 1000}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "system", "content": "You are an unhelpful assistant."},{"role": "user", "content": "Hello world!"}],"max_completion_tokens": 1000}}
```

Now upload your batch file to your S3 bucket.

```console
-aws s3 cp offline_inference/openai/openai_example_batch.jsonl s3://MY_BUCKET/MY_INPUT_FILE.jsonl
+aws s3 cp offline_inference/openai_batch/openai_example_batch.jsonl s3://MY_BUCKET/MY_INPUT_FILE.jsonl
```

### Step 2: Generate your presigned urls
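
The diff stops at the presigned-URL step; one way to presign a download URL for the input file is the AWS CLI, as in the sketch below (illustrative only, not part of this change; the output file generally needs a presigned PUT URL, which `aws s3 presign` does not produce, so an SDK such as boto3 is typically used for that side):

```console
# Hypothetical example: presign a GET URL for the uploaded input file (valid for 1 hour).
aws s3 presign s3://MY_BUCKET/MY_INPUT_FILE.jsonl --expires-in 3600
```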
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+# Disaggregated Serving
+
+This example contains scripts that demonstrate the disaggregated serving features of vLLM.
+
+## Files
+
+- `disagg_proxy_demo.py` - Demonstrates XpYd (X prefill instances, Y decode instances).
+- `kv_events.sh` - Demonstrates KV cache event publishing.

examples/online_serving/disagg_examples/disagg_proxy_demo.py renamed to examples/online_serving/disaggregated_serving/disagg_proxy_demo.py

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
example usage of XpYd disaggregated prefilling.
We can launch multiple vllm instances (2 for prefill and 2 for decode), and
launch this proxy demo through:
-python3 examples/online_serving/disagg_examples/disagg_proxy_demo.py \
+python3 examples/online_serving/disaggregated_serving/disagg_proxy_demo.py \
--model $model_name \
--prefill localhost:8100 localhost:8101 \
--decode localhost:8200 localhost:8201 \
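
The docstring assumes four vLLM servers are already listening on those ports; a minimal sketch of how they might be started is shown below (real disaggregated-prefill deployments also need connector-specific KV-transfer flags, which are omitted here as they depend on the setup):

```console
# Hypothetical launch of 2 prefill + 2 decode instances on the ports used above.
vllm serve $model_name --port 8100 &   # prefill instance 1
vllm serve $model_name --port 8101 &   # prefill instance 2
vllm serve $model_name --port 8200 &   # decode instance 1
vllm serve $model_name --port 8201 &   # decode instance 2
```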
