Docs: add tutorials on speculative decoding main page and EAGLE sub page #131

ziqif-nv · 2025-02-15T00:01:05Z

Sending this out for a quick review within the team.

I will keep polishing it.

ganeshku1 · 2025-02-15T00:50:58Z

Feature_Guide/Speculative_Decoding/EAGLE/README.md

+  --input-file /data/converted_humaneval.jsonl \
+  --tokenizer /hf-models/vicuna-7b-v1.3/ \
+  --concurrency 1 \
+  --measurement-interval 4000 \


I would recommend using the request-rate option rather than measurement-interval, since long tests often require a longer stabilization window.
We are recommending that GenAI-Perf customers start using request-rate as the default option.

Updated. Thanks!

ganeshku1 · 2025-02-15T00:53:29Z

Feature_Guide/Speculative_Decoding/EAGLE/README.md

+
+2. Get Gen-AI Perf Tool
+
+Gen-AI Perf is available in the SDK container as shown in the [Send an Inference Request](#send-an-inference-request) section. The only difference is that you need to mount the converted dataset to the container:


This link looks broken to me. Can you please confirm?

I verified that the link works for me. It may be a PR rendering issue on GitHub.

ganeshku1 · 2025-02-15T00:54:58Z

Feature_Guide/Speculative_Decoding/EAGLE/README.md

+2. Get Gen-AI Perf Tool
+
+Gen-AI Perf is available in the SDK container as shown in the [Send an Inference Request](#send-an-inference-request) section. The only difference is that you need to mount the converted dataset to the container:
+


I am not sure whether you need to add an installation section for GenAI-Perf here.
I created a sample PR here for your reference: https://github.com/sgl-project/sglang/pull/3552/files

Updated the PR using the README you pointed to. Thanks!

statiraju · 2025-02-15T04:41:52Z

Feature_Guide/Speculative_Decoding/README.md

+If the first assumption holds true, the latency of speculative decoding will no worse than the standard approach. If the second holds, output token generation advances by statistically more than one token per forward pass.
+The combination of both these allows speculative decoding to result in reduced latency.
+
+## Performance Improvements


I would like to see which specific tasks speculative decoding is efficient for, what kinds of models should be used, and examples of draft and target models, if any. Recommendations along these lines would be helpful.
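
To make the second assumption in the quoted text concrete, here is a rough sketch of how the expected number of tokens per target-model forward pass grows with the draft acceptance rate. It assumes every draft token is accepted independently with the same probability, which is a simplification and not something this PR states; the function name and parameters are hypothetical and chosen only for illustration.

```python
def expected_tokens_per_pass(acceptance_rate: float, draft_length: int) -> float:
    """Expected tokens emitted per target-model forward pass.

    Simplifying assumption (not from the PR): each of the `draft_length`
    draft tokens is accepted independently with probability
    `acceptance_rate`, and verification stops at the first rejection.
    """
    if not 0.0 <= acceptance_rate <= 1.0:
        raise ValueError("acceptance_rate must be in [0, 1]")
    if draft_length < 0:
        raise ValueError("draft_length must be non-negative")
    # The target model always contributes one token of its own; the i-th
    # draft token survives only if all i tokens up to it were accepted,
    # which gives the geometric sum 1 + a + a^2 + ... + a^k.
    return sum(acceptance_rate ** i for i in range(draft_length + 1))


if __name__ == "__main__":
    # Illustrative values only; these are not measurements.
    for rate in (0.0, 0.5, 0.8):
        tokens = expected_tokens_per_pass(rate, draft_length=4)
        print(f"acceptance_rate={rate}: {tokens:.2f} tokens per pass")
```

Under this model, an acceptance rate of zero falls back to one token per pass, matching the "no worse than the standard approach" case if drafting is cheap. Any positive acceptance rate pushes the average above one token per pass.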

ziqif-nv added 2 commits February 14, 2025 16:00

add tutorials on speculative decoding main page and EAGLE sub page

c722c19

minor change

d481e41

ziqif-nv requested review from harryskim, krishung5, oandreeva-nv and statiraju February 15, 2025 00:07

minor

036a84d

ziqif-nv changed the title ~~add tutorials on speculative decoding main page and EAGLE sub page~~ Docs: add tutorials on speculative decoding main page and EAGLE sub page Feb 15, 2025

ziqif-nv marked this pull request as draft February 15, 2025 00:14

ziqif-nv requested a review from ganeshku1 February 15, 2025 00:24

ganeshku1 reviewed Feb 15, 2025


statiraju reviewed Feb 15, 2025


address comments

d2283fb


ziqif-nv commented Feb 15, 2025

ganeshku1 Feb 15, 2025

ziqif-nv Feb 20, 2025

ganeshku1 Feb 15, 2025

ziqif-nv Feb 20, 2025

ganeshku1 Feb 15, 2025

ziqif-nv Feb 20, 2025

statiraju Feb 15, 2025

