[Bug]: LLMEngine cannot be pickled error vllm 0.6.1.post2 #8778
Comments
It seems this is a CUDA OOM issue that is being thrown as a pickling error.
This is probably the same thing that #8744 should help with.
Same issue with "mistralai/Mistral-7B-Instruct-v0.1".
Same here, even though I reduced --max-num-seqs to only 1. Are there any further solutions? (Issue with Llama 3.1 Instruct)
Same issue with Llama 3.1 405B.
It looks like this is not a pickle issue; the root cause should be:
That might be caused by a low-level invalid memory access or accumulated memory fragmentation. I suggest reducing --gpu-memory-utilization to see if that mitigates the issue.
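A minimal sketch of that suggestion using vLLM's offline `LLM` API, assuming the configuration reported below; the lowered value of 0.90 is illustrative, not a tested recommendation:

```python
from vllm import LLM, SamplingParams

# Leave more GPU headroom than the reported 0.99 so the KV-cache allocation
# is less likely to hit fragmentation or invalid-memory-access failures.
llm = LLM(
    model="Qwen/Qwen2.5-72B-Instruct",
    gpu_memory_utilization=0.90,  # illustrative value, lowered from 0.99
    max_num_seqs=64,
    max_model_len=8192,
)

outputs = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```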
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!
Your current environment
The output of `python collect_env.py`
Model Input Dumps
model="Qwen/Qwen2.5-72B-Instruct"
guided_decoding_backend="outlines"
vllm_command_flags={
"--gpu-memory-utilization": 0.99,
"--max-num-seqs": 64,
}
max_model_len=8192,
library_overrides={
"vllm": "vllm==0.6.1.post2",
}
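For context, a hypothetical reconstruction of these settings through the offline `LLM` constructor; the parameters map to vLLM engine args of the same names, and passing `guided_decoding_backend` as a keyword here is an assumption, not taken from the original report:

```python
from vllm import LLM

# Hypothetical reproduction of the reported configuration.
llm = LLM(
    model="Qwen/Qwen2.5-72B-Instruct",
    guided_decoding_backend="outlines",
    gpu_memory_utilization=0.99,
    max_num_seqs=64,
    max_model_len=8192,
)
```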
🐛 Describe the bug