Your current environment
The output of python collect_env.py
🐛 Describe the bug
I am trying to run DSR1 (DeepSeek-R1) with DEP16 on 2 nodes with 8xH100 each.
On node 1 (10.52.51.17):
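Presumably the matching vllm serve command without --headless and --data_parallel_start_rank, along these lines (a sketch; the exact head-node command may differ):
vllm serve deepseek-ai/DeepSeek-R1 --served_model_name deepseek-ai/DeepSeek-R1 --data_parallel_size 16 --data_parallel_size_local 8 --data_parallel_address 10.52.51.17 --data_parallel_rpc_port 13345 --max-model-len 10240 --enable-expert-parallel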
On node 2:
vllm serve deepseek-ai/DeepSeek-R1 --served_model_name deepseek-ai/DeepSeek-R1 --data_parallel_size 16 --data_parallel_size_local 8 --data_parallel_address 10.52.51.17 --data_parallel_rpc_port 13345 --max-model-len 10240 --enable-expert-parallel --data_parallel_start_rank 8 --headless &
The model loads weights, runs torch.compile, and allocates the KV cache, then hangs at 100% GPU utilization until an NCCL timeout error.
Full logs attached.
node1.txt
node2.txt
Same happens without EP. TP16 with Ray is working fine.
Before submitting a new issue...
Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
The text was updated successfully, but these errors were encountered:
VLLM_ALL2ALL_BACKEND="deepep_high_throughput" -- This will use the DeepEP high-throughput kernels. Note that this will force the model to run in eager mode. Or,
VLLM_ALL2ALL_BACKEND="deepep_low_latency" -- This will use the DeepEP low-latency kernels. This is CUDA graph compatible.
You can find the list of supported backends here -
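For example, the backend can be selected by setting the environment variable when launching the server on each node (a sketch reusing the flags from the commands above; adjust to your deployment):
VLLM_ALL2ALL_BACKEND="deepep_low_latency" vllm serve deepseek-ai/DeepSeek-R1 --data_parallel_size 16 --data_parallel_size_local 8 --data_parallel_address 10.52.51.17 --data_parallel_rpc_port 13345 --enable-expert-parallel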