Checklist
1. I have searched related issues but cannot get the expected help.
2. The bug has not been fixed in the latest version.
3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
5. Please use English, otherwise it will be closed.
Describe the bug
On an NVIDIA H100 MIG instance running under the NVIDIA GPU Operator in Kubernetes, nvidia-smi does not report available memory; the NVML or DCGM APIs have to be used instead. In that case get_nvgpu_memory_capacity() crashes with the following error:
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/sgl-workspace/sglang/python/sglang/launch_server.py", line 11, in <module>
    server_args = prepare_server_args(sys.argv[1:])
  File "/sgl-workspace/sglang/python/sglang/srt/server_args.py", line 918, in prepare_server_args
    server_args = ServerArgs.from_cli_args(raw_args)
  File "/sgl-workspace/sglang/python/sglang/srt/server_args.py", line 870, in from_cli_args
    return cls(**{attr: getattr(args, attr) for attr in attrs})
  File "<string>", line 92, in __init__
  File "/sgl-workspace/sglang/python/sglang/srt/server_args.py", line 175, in __post_init__
    gpu_mem = get_nvgpu_memory_capacity()
  File "/sgl-workspace/sglang/python/sglang/srt/utils.py", line 955, in get_nvgpu_memory_capacity
    raise ValueError("No GPU memory values found.")
ValueError: No GPU memory values found.
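For reference, a minimal sketch of the NVML route suggested above, using the nvidia-ml-py (pynvml) bindings. This is not the existing sglang implementation, only an illustration; MIG slices may additionally require the MIG-specific NVML handles.

#!/usr/bin/env python3
# Minimal sketch of an NVML-based memory query, assuming the nvidia-ml-py
# (pynvml) package is installed. Not the sglang implementation.
import pynvml

def nvml_gpu_memory_capacity_mb() -> float:
    pynvml.nvmlInit()
    try:
        totals_mb = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            # nvmlDeviceGetMemoryInfo reports bytes; for MIG, per-instance
            # queries may need the MIG device handles instead of the parent GPU.
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            totals_mb.append(mem.total / (1024 * 1024))
        if not totals_mb:
            raise ValueError("No GPU memory values found.")
        return min(totals_mb)
    finally:
        pynvml.nvmlShutdown()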
Reproduction
python3 -m sglang.launch_server --model-path liuhaotian/llava-v1.5-13b --served-model-name llava --tokenizer-path llava-hf/llava-1.5-13b-hf --chat-template vicuna_v1.1 --trust-remote-code --port 8000
Run this on an H100 MIG instance, or mock nvidia-smi so that it cannot report available memory.
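On a non-MIG machine, one hypothetical way to mock this (assuming get_nvgpu_memory_capacity() shells out to nvidia-smi for its memory query, as the traceback suggests) is to shadow nvidia-smi with a stub that reproduces the restricted output:

#!/usr/bin/env python3
# Hypothetical nvidia-smi stub: save as an executable file named "nvidia-smi"
# in a directory that precedes the real binary on PATH. Any memory query then
# returns the same output seen in the restricted GPU Operator environment.
print("[Insufficient Permissions]")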
Environment
Python: 3.10.16 (main, Dec 4 2024, 08:53:37) [GCC 9.4.0]
CUDA available: True
GPU 0: NVIDIA H100 80GB HBM3 MIG 3g.40gb
GPU 0 Compute Capability: 9.0
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.4, V12.4.131
CUDA Driver Version: 550.90.07
PyTorch: 2.5.1+cu124
flashinfer: 0.1.6+cu124torch2.4
triton: 3.1.0
transformers: 4.48.0
torchao: 0.7.0
numpy: 1.26.4
aiohttp: 3.11.11
fastapi: 0.115.6
hf_transfer: 0.1.9
huggingface_hub: 0.27.1
interegular: 0.3.3
modelscope: 1.22.1
orjson: 3.10.14
packaging: 24.2
psutil: 6.1.1
pydantic: 2.10.5
multipart: 0.0.20
zmq: 26.2.0
uvicorn: 0.34.0
uvloop: 0.21.0
vllm: 0.6.4.post1
openai: 1.59.7
anthropic: 0.43.0
decord: 0.6.0
NVIDIA Topology:
GPU0 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X 48-95,144-191 1 N/A
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
Hypervisor vendor: KVM
ulimit soft: 1048576
The root cause is this issue: NVIDIA/nvidia-container-toolkit#842
It requires the container toolkit to run with elevated privileges, which isn't feasible on multi-host services where multiple customer workloads might be on the same node.
Specifically, the output of that nvidia-smi command in such an environment is:
[Insufficient Permissions]
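Since CUDA itself initializes correctly in this environment ("CUDA available: True" above), one possible workaround sketch, assuming only the CUDA-enabled torch build that sglang already depends on, is to fall back to torch.cuda.mem_get_info() instead of shelling out to nvidia-smi. This is an illustration of the suggested direction, not a patch:

#!/usr/bin/env python3
# Hypothetical fallback that avoids nvidia-smi entirely; assumes a CUDA-enabled
# torch build. Illustration only, not the sglang implementation.
import torch

def gpu_memory_capacity_mb(device: int = 0) -> float:
    # mem_get_info returns (free_bytes, total_bytes) for the visible device,
    # which under MIG should be the MIG slice itself.
    _free_bytes, total_bytes = torch.cuda.mem_get_info(device)
    return total_bytes / (1024 * 1024)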