Commit 595e1d7

[Security] Document StatelessProcessGroup security concerns
A recent PR, #15988, improved StatelessProcessGroup to ensure the torch.distributed TCPStore uses the specified IP address instead of binding to all interfaces. Upon closer inspection, this is quite important: the way vllm uses this TCPStore includes pickled data, so malicious access to the TCPStore would allow remote code execution on a vllm host.

Update some places throughout the code base to reflect the importance of specifying a secured IP address for use with this interface.

Finally, fix a couple of places in tests to explicitly use localhost instead of the IP we find that's (probably) the one used for the host's default route. Otherwise, a host running these tests is briefly vulnerable on the IP address chosen.

Signed-off-by: Russell Bryant <rbryant@redhat.com>
1 parent 294fc1e commit 595e1d7
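
The remote code execution risk described in the commit message follows from how pickle works: unpickling calls whatever callable the serialized bytes name. The sketch below is an illustration only, not vllm code. The `Payload` class is a hypothetical stand-in for attacker-controlled data, and it uses the harmless builtin `int` where a real attacker would name something like `os.system`.

```python
import pickle


class Payload:
    """Hypothetical stand-in for data written by an untrusted party."""

    def __reduce__(self):
        # pickle records "call int('42')" instead of the object's state.
        # A real attacker would substitute a dangerous callable here.
        return (int, ("42",))


blob = pickle.dumps(Payload())

# The reader only sees bytes. Loading them runs the embedded call and
# returns its result, not a Payload instance.
result = pickle.loads(blob)
print(type(result).__name__, result)
```

Because the reader cannot tell a benign blob from a hostile one before loading it, the practical defense is the one the commit documents: keep the store reachable only from trusted hosts.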

4 files changed: +20 -4 lines changed

examples/offline_inference/rlhf.py

Lines changed: 6 additions & 0 deletions
@@ -12,6 +12,12 @@
 inference instance. In practice, there could be multiple training instances
 and multiple inference instances. For the full implementation, please refer
 to the OpenRLHF framework.
+
+It is important to set `VLLM_HOST_IP` to an address on a secure network when
+using this example. Unsecured communications between components will be used
+over this IP address and should NOT be exposed to untrusted networks. For more
+information, see:
+https://docs.vllm.ai/en/latest/deployment/security.html
 """
 
 import os
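
One way to act on the `VLLM_HOST_IP` advice above is to check the address before exporting it. The following is a minimal sketch using only the standard library; `check_host_ip` is a hypothetical helper, not part of vllm, and the rule it applies (loopback or private range only) is an assumption. A private address is not automatically a secure one, so this catches obvious mistakes rather than proving the network is isolated.

```python
import ipaddress
import os


def check_host_ip(value: str) -> str:
    """Hypothetical helper: reject addresses that are clearly unsuitable."""
    addr = ipaddress.ip_address(value)
    # 0.0.0.0 / :: mean "all interfaces", which is what we want to avoid.
    if addr.is_unspecified:
        raise ValueError(f"{value} binds to all interfaces")
    # Assumption: only loopback or private-range addresses are acceptable.
    if not (addr.is_loopback or addr.is_private):
        raise ValueError(f"{value} is a publicly routable address")
    return value


# Set the variable before vllm is imported so it is picked up at startup.
os.environ["VLLM_HOST_IP"] = check_host_ip("127.0.0.1")
```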

examples/offline_inference/rlhf_utils.py

Lines changed: 6 additions & 0 deletions
@@ -27,8 +27,14 @@ class WorkerExtension:
     By defining an extension class, the code can work no matter what is
     the underlying worker class. This way, the code can be compatible
     with both vLLM V0 and V1.
+
     NOTE: we define this class in a separate module, and the main module
     should pass the full qualified name as `worker_extension_cls` argument.
+
+    The `master_address` parameter should be an address on a secure network that
+    is ideally completely isolated. Services used on this network are insecure
+    and will make the system vulnerable to remote code execution if exposed to
+    malicious parties.
     """
 
     def init_weight_update_group(

tests/distributed/test_same_node.py

Lines changed: 4 additions & 4 deletions
@@ -7,20 +7,20 @@
 
 from vllm.distributed.parallel_state import in_the_same_node_as
 from vllm.distributed.utils import StatelessProcessGroup
-from vllm.utils import get_ip, get_open_port
+from vllm.utils import get_open_port
 
 if __name__ == "__main__":
     dist.init_process_group(backend="gloo")
 
     rank = dist.get_rank()
     if rank == 0:
         port = get_open_port()
-        ip = get_ip()
+        ip = "127.0.0.1"
         dist.broadcast_object_list([ip, port], src=0)
     else:
-        recv = [None, None]
+        recv = [None, None]  # type: ignore
         dist.broadcast_object_list(recv, src=0)
-        ip, port = recv
+        ip, port = recv  # type: ignore
 
     stateless_pg = StatelessProcessGroup.create(ip, port, rank,
                                                 dist.get_world_size())
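
The switch to `"127.0.0.1"` in this test matters because the bind address decides which interfaces a listener accepts connections on. The sketch below uses plain standard-library sockets, not the TCPStore itself, to show the difference; `bound_address` is a hypothetical helper introduced for illustration.

```python
import socket


def bound_address(host: str) -> tuple:
    """Bind a listening TCP socket to `host` on an OS-chosen free port
    and report the (address, port) it ended up on."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 asks the OS for any free port
        sock.listen(1)
        return sock.getsockname()


# Reachable only from processes on the same machine.
loopback_host, loopback_port = bound_address("127.0.0.1")

# Reachable on every interface the host has, including external ones.
wildcard_host, _ = bound_address("0.0.0.0")

print(loopback_host, wildcard_host)
```

A listener bound to the host's default-route address sits between these two cases: it is not on every interface, but it is on one that other machines can usually reach, which is why the test no longer uses it.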

vllm/distributed/utils.py

Lines changed: 4 additions & 0 deletions
@@ -388,6 +388,10 @@ def create(
         used for exchanging metadata. With this function, process A and process B
         can call `StatelessProcessGroup.create` to form a group, and then process A, B,
         C, and D can call `StatelessProcessGroup.create` to form another group.
+
+        The `host` parameter should be an address on a secure network that is ideally
+        completely isolated. Services used on this network are insecure and will make
+        the system vulnerable to remote code execution if exposed to malicious parties.
         """ # noqa
         launch_server = rank == 0
         if launch_server:
