The root cause of this issue is an API compatibility mismatch: vLLM uses PyTorch 2.6 APIs, while torch_npu only supports PyTorch 2.5. To work around the incompatibility, I suggest applying the following patch to the vLLM codebase:
diff --git a/vllm/distributed/utils.py b/vllm/distributed/utils.py
index e4d4008cd..5c7794d78 100644
--- a/vllm/distributed/utils.py
+++ b/vllm/distributed/utils.py
@@ -318,10 +318,12 @@ def stateless_init_torch_distributed_process_group(
     # different systems (e.g. RPC) in case the store is multi-tenant.
     prefix_store = PrefixStore(init_method, store)
 
+    pg_options = ProcessGroup.Options(backend=backend, timeout=timeout)
     pg: ProcessGroup = ProcessGroup(
         prefix_store,
         group_rank,
         group_size,
+        pg_options,
     )
 
     if backend == "gloo":
@@ -346,7 +348,7 @@ def stateless_init_torch_distributed_process_group(
     else:
         raise RuntimeError(f"Unsupported torch distributed backend: {backend}")
 
-    pg._set_default_backend(backend_type)
+    # pg._set_default_backend(backend_type)
     backend_class._set_sequence_number_for_group()
 
     pg._register_backend(device, backend_type, backend_class)
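For context, a minimal sketch of the API difference the patch works around: in PyTorch 2.5 (the version torch_npu supports) the ProcessGroup constructor takes an Options object carrying the backend and timeout, whereas in PyTorch 2.6 the options argument is gone and the default backend is set afterwards via pg._set_default_backend(backend_type). Instead of hard-patching, one could gate on the installed torch version; the helper name build_stateless_process_group below is hypothetical, and the parameter names simply mirror those in stateless_init_torch_distributed_process_group.

# Hedged sketch, not part of the vLLM codebase: select the ProcessGroup
# construction path based on the installed torch version.
from packaging import version

import torch
from torch.distributed import ProcessGroup


def build_stateless_process_group(prefix_store, group_rank, group_size,
                                  backend, timeout):
    if version.parse(torch.__version__).release >= (2, 6):
        # torch 2.6 API: no options argument; the caller sets the default
        # backend later with pg._set_default_backend(backend_type).
        return ProcessGroup(prefix_store, group_rank, group_size)
    # torch 2.5 API (what torch_npu currently supports): backend and timeout
    # are passed through ProcessGroup.Options at construction time.
    pg_options = ProcessGroup.Options(backend=backend, timeout=timeout)
    return ProcessGroup(prefix_store, group_rank, group_size, pg_options)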