
[Bug]: vllm-ascend v0.7.3 + mindie_turbo 2.0 rc1 producing garbled results in multi-tp inference #898

Open
tingyiz97 opened this issue May 19, 2025 · 1 comment
Labels
bug Something isn't working

Comments


tingyiz97 commented May 19, 2025

Environment information

vllm-ascend version: v0.7.3
mindie_turbo version: 2.0 rc1
npu: Ascend 910B3

Command

MODEL=/mnt/models/QwQ-32B
export ACL_OP_INIT_MODE=1
export ASCEND_RT_VISIBLE_DEVICES="4,5,6,7"

VLLM_USE_V1=0 python -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 8000 --model $MODEL \
    -tp 4 \
    --gpu_memory_utilization 0.8 \
    --max_model_len 32768 \
    --served-model-name test \
    --max-num-seqs 16

There are two noticeable issues:

  1. The server only starts if the environment variable is set: export ASCEND_LAUNCH_BLOCKING=1
  2. Inference results are garbled when tp > 1; tested models: Meta-Llama-3-8B, QwQ-32B, Qwen2.5-1.5B-Instruct (see the request sketch below).
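
For reference, the garbled output can also be observed by querying the OpenAI-compatible endpoint started above. This is a minimal sketch using the requests library; it assumes the server is reachable on localhost:8000 with --served-model-name test, and the prompt is only an example:

import requests

# Minimal sketch: send a completion request to the server launched above.
# Assumes localhost:8000 and served model name "test"; the prompt is an example.
resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "test",
        "prompt": "The capital of France is",
        "max_tokens": 64,
        "temperature": 0.0,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["text"])  # garbled when -tp > 1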
tingyiz97 added the bug label May 19, 2025

umeiko commented May 19, 2025

Environment:

(py310_torch251) root@ai-10-246-91-186:/vllm-workspace/mzy# pip list | grep vllm
vllm                              0.7.3+empty
vllm_ascend                       0.7.3         /vllm-workspace/vllm-ascend
(py310_torch251) root@ai-10-246-91-186:/vllm-workspace/mzy# pip list | grep mindie
mindie_turbo                      2.0rc1
(py310_torch251) root@ai-10-246-91-186:/vllm-workspace/mzy# pip list | grep torch
torch                             2.5.1
torch-npu                         2.5.1
torchvision                       0.20.1

Reproduction:

import os
os.environ["VLLM_USE_V1"]="0"
os.environ["ASCEND_LAUNCH_BLOCKING"]="1"
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

# Create a sampling params object.
sampling_params = SamplingParams(max_tokens=1000, temperature=0.0)
# Create an LLM.
llm = LLM(model="/mnt/models/Qwen2.5-1.5B-Instruct", tensor_parallel_size=2, gpu_memory_utilization=0.8)

# Generate texts from the prompts.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

returns:

INFO 05-19 09:14:55 __init__.py:30] Available plugins for group vllm.platform_plugins:
INFO 05-19 09:14:55 __init__.py:32] name=ascend, value=vllm_ascend:register
INFO 05-19 09:14:55 __init__.py:34] all available plugins for group vllm.platform_plugins will be loaded.
INFO 05-19 09:14:55 __init__.py:36] set environment variable VLLM_PLUGINS to control which plugins to load.
INFO 05-19 09:14:55 __init__.py:44] plugin ascend loaded.
INFO 05-19 09:14:55 __init__.py:198] Platform plugin ascend is activated
WARNING:root:Warning: Failed to register custom ops, all custom ops will be disabled
INFO 05-19 09:14:55 __init__.py:30] Available plugins for group vllm.general_plugins:
INFO 05-19 09:14:55 __init__.py:32] name=ascend_enhanced_model, value=vllm_ascend:register_model
INFO 05-19 09:14:55 __init__.py:34] all available plugins for group vllm.general_plugins will be loaded.
INFO 05-19 09:14:55 __init__.py:36] set environment variable VLLM_PLUGINS to control which plugins to load.
INFO 05-19 09:14:55 __init__.py:44] plugin ascend_enhanced_model loaded.
WARNING 05-19 09:14:55 registry.py:351] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration.
WARNING 05-19 09:14:55 registry.py:351] Model architecture Qwen2_5_VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl:AscendQwen2_5_VLForConditionalGeneration.
WARNING 05-19 09:14:55 registry.py:351] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM.
WARNING 05-19 09:14:55 registry.py:351] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM.
WARNING 05-19 09:14:55 registry.py:351] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP.
INFO 05-19 09:14:55 importing.py:16] Triton not installed or not compatible; certain GPU-related functions will not be available.
INFO 05-19 09:15:05 config.py:549] This model supports multiple tasks: {'score', 'classify', 'reward', 'generate', 'embed'}. Defaulting to 'generate'.
INFO 05-19 09:15:05 config.py:1382] Defaulting to use mp for distributed inference
INFO 05-19 09:15:05 llm_engine.py:234] Initializing a V0 LLM engine (v0.7.3) with config: model='/mnt/models/Qwen2.5-1.5B-Instruct', speculative_config=None, tokenizer='/mnt/models/Qwen2.5-1.5B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=2, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto,  device_config=npu, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=/mnt/models/Qwen2.5-1.5B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=False, 
WARNING 05-19 09:15:05 multiproc_worker_utils.py:300] Reducing Torch parallelism from 256 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
(VllmWorkerProcess pid=3179922) INFO 05-19 09:15:05 multiproc_worker_utils.py:229] Worker ready; awaiting tasks
(VllmWorkerProcess pid=3179922) ERROR 05-19 09:15:06 camem.py:69] Failed to import vllm_ascend_C:No module named 'vllm_ascend.vllm_ascend_C'
ERROR 05-19 09:15:06 camem.py:69] Failed to import vllm_ascend_C:No module named 'vllm_ascend.vllm_ascend_C'
(VllmWorkerProcess pid=3179922) /root/miniconda3/envs/py310_torch251/lib/python3.10/site-packages/torch_npu/contrib/transfer_to_npu.py:292: ImportWarning: 
/root/miniconda3/envs/py310_torch251/lib/python3.10/site-packages/torch_npu/contrib/transfer_to_npu.py:292: ImportWarning: 
    *************************************************************************************************************
    The torch.Tensor.cuda and torch.nn.Module.cuda are replaced with torch.Tensor.npu and torch.nn.Module.npu now..
    The torch.cuda.DoubleTensor is replaced with torch.npu.FloatTensor cause the double type is not supported now..
    The backend in torch.distributed.init_process_group set to hccl now..
    The torch.cuda.* and torch.cuda.amp.* are replaced with torch.npu.* and torch.npu.amp.* now..
    The device parameters have been replaced with npu in the function below:
    torch.logspace, torch.randint, torch.hann_window, torch.rand, torch.full_like, torch.ones_like, torch.rand_like, torch.randperm, torch.arange, torch.frombuffer, torch.normal, torch._empty_per_channel_affine_quantized, torch.empty_strided, torch.empty_like, torch.scalar_tensor, torch.tril_indices, torch.bartlett_window, torch.ones, torch.sparse_coo_tensor, torch.randn, torch.kaiser_window, torch.tensor, torch.triu_indices, torch.as_tensor, torch.zeros, torch.randint_like, torch.full, torch.eye, torch._sparse_csr_tensor_unsafe, torch.empty, torch._sparse_coo_tensor_unsafe, torch.blackman_window, torch.zeros_like, torch.range, torch.sparse_csr_tensor, torch.randn_like, torch.from_file, torch._cudnn_init_dropout_state, torch._empty_affine_quantized, torch.linspace, torch.hamming_window, torch.empty_quantized, torch._pin_memory, torch.autocast, torch.load, torch.Generator, torch.set_default_device, torch.Tensor.new_empty, torch.Tensor.new_empty_strided, torch.Tensor.new_full, torch.Tensor.new_ones, torch.Tensor.new_tensor, torch.Tensor.new_zeros, torch.Tensor.to, torch.Tensor.pin_memory, torch.nn.Module.to, torch.nn.Module.to_empty
    *************************************************************************************************************
    
  warnings.warn(msg, ImportWarning)
(VllmWorkerProcess pid=3179922)     *************************************************************************************************************
(VllmWorkerProcess pid=3179922)     The torch.Tensor.cuda and torch.nn.Module.cuda are replaced with torch.Tensor.npu and torch.nn.Module.npu now..
(VllmWorkerProcess pid=3179922)     The torch.cuda.DoubleTensor is replaced with torch.npu.FloatTensor cause the double type is not supported now..
(VllmWorkerProcess pid=3179922)     The backend in torch.distributed.init_process_group set to hccl now..
(VllmWorkerProcess pid=3179922)     The torch.cuda.* and torch.cuda.amp.* are replaced with torch.npu.* and torch.npu.amp.* now..
(VllmWorkerProcess pid=3179922)     The device parameters have been replaced with npu in the function below:
(VllmWorkerProcess pid=3179922)     torch.logspace, torch.randint, torch.hann_window, torch.rand, torch.full_like, torch.ones_like, torch.rand_like, torch.randperm, torch.arange, torch.frombuffer, torch.normal, torch._empty_per_channel_affine_quantized, torch.empty_strided, torch.empty_like, torch.scalar_tensor, torch.tril_indices, torch.bartlett_window, torch.ones, torch.sparse_coo_tensor, torch.randn, torch.kaiser_window, torch.tensor, torch.triu_indices, torch.as_tensor, torch.zeros, torch.randint_like, torch.full, torch.eye, torch._sparse_csr_tensor_unsafe, torch.empty, torch._sparse_coo_tensor_unsafe, torch.blackman_window, torch.zeros_like, torch.range, torch.sparse_csr_tensor, torch.randn_like, torch.from_file, torch._cudnn_init_dropout_state, torch._empty_affine_quantized, torch.linspace, torch.hamming_window, torch.empty_quantized, torch._pin_memory, torch.autocast, torch.load, torch.Generator, torch.set_default_device, torch.Tensor.new_empty, torch.Tensor.new_empty_strided, torch.Tensor.new_full, torch.Tensor.new_ones, torch.Tensor.new_tensor, torch.Tensor.new_zeros, torch.Tensor.to, torch.Tensor.pin_memory, torch.nn.Module.to, torch.nn.Module.to_empty
(VllmWorkerProcess pid=3179922)     *************************************************************************************************************
(VllmWorkerProcess pid=3179922)     
(VllmWorkerProcess pid=3179922)   warnings.warn(msg, ImportWarning)
(VllmWorkerProcess pid=3179922) /root/miniconda3/envs/py310_torch251/lib/python3.10/site-packages/torch_npu/contrib/transfer_to_npu.py:247: RuntimeWarning: torch.jit.script and torch.jit.script_method will be disabled by transfer_to_npu, which currently does not support them, if you need to enable them, please do not use transfer_to_npu.
/root/miniconda3/envs/py310_torch251/lib/python3.10/site-packages/torch_npu/contrib/transfer_to_npu.py:247: RuntimeWarning: torch.jit.script and torch.jit.script_method will be disabled by transfer_to_npu, which currently does not support them, if you need to enable them, please do not use transfer_to_npu.
  warnings.warn(msg, RuntimeWarning)
(VllmWorkerProcess pid=3179922)   warnings.warn(msg, RuntimeWarning)
WARNING 05-19 09:15:07 utils.py:2262] Methods add_prompt_adapter,cache_config,compilation_config,current_platform,list_prompt_adapters,load_config,pin_prompt_adapter,remove_prompt_adapter not implemented in <vllm_ascend.worker.worker.NPUWorker object at 0xfffd4a638520>
(VllmWorkerProcess pid=3179922) WARNING 05-19 09:15:07 utils.py:2262] Methods add_prompt_adapter,cache_config,compilation_config,current_platform,list_prompt_adapters,load_config,pin_prompt_adapter,remove_prompt_adapter not implemented in <vllm_ascend.worker.worker.NPUWorker object at 0xfffd4a648340>
WARNING 05-19 09:15:07 _custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
(VllmWorkerProcess pid=3179922) WARNING 05-19 09:15:07 _custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
INFO 05-19 09:15:07 utils.py:33] MindIE Turbo is installed. vLLM inference will be accelerated with MindIE Turbo.
(VllmWorkerProcess pid=3179922) INFO 05-19 09:15:07 utils.py:33] MindIE Turbo is installed. vLLM inference will be accelerated with MindIE Turbo.
[rank0]:[W519 09:15:15.708102120 ProcessGroupGloo.cpp:715] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator())
[rank1]:[W519 09:15:15.748419910 ProcessGroupGloo.cpp:715] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator())
INFO 05-19 09:15:15 shm_broadcast.py:258] vLLM message queue communication handle: Handle(connect_ip='127.0.0.1', local_reader_ranks=[1], buffer_handle=(1, 4194304, 6, 'psm_ca948aec'), local_subscribe_port=40521, remote_subscribe_port=None)
INFO 05-19 09:15:15 model_runner.py:902] Starting to load model /mnt/models/Qwen2.5-1.5B-Instruct...
(VllmWorkerProcess pid=3179922) INFO 05-19 09:15:15 model_runner.py:902] Starting to load model /mnt/models/Qwen2.5-1.5B-Instruct...
Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  1.48it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  1.48it/s]

(VllmWorkerProcess pid=3179922) INFO 05-19 09:15:17 model_runner.py:907] Loading model weights took 1.4465 GB
INFO 05-19 09:15:17 model_runner.py:907] Loading model weights took 1.4465 GB
mki_log delete old file:/root/atb/log/atb_10344_20250514114638.log
mki_log delete old file:/root/atb/log/atb_10284_20250514114638.log
mki_log delete old file:/root/atb/log/atb_10283_20250514114638.log
INFO 05-19 09:15:20 executor_base.py:111] # npu blocks: 25753, # CPU blocks: 2340
INFO 05-19 09:15:20 executor_base.py:116] Maximum concurrency for 32768 tokens per request: 100.60x
INFO 05-19 09:15:22 llm_engine.py:436] init engine (profile, create kv cache, warmup model) took 5.29 seconds
Processed prompts: 100%|██████████████████████████████████████████████████████████████| 4/4 [00:22<00:00,  5.52s/it, est. speed input: 1.00 toks/s, output: 181.00 toks/s]
Prompt: 'Hello, my name is', Generated text: "of!ofofofofofofofofofofof.\n\n.\n\n....,,,. magic magic magic magicice magicople invent two,,ichichich of?ich?ich????____? here???o you you you you be bees and and and and andooooo and the may may youf you ofiiiiiobbofbbiiiiiiiiiiiiiiiiii..bbbbbeeee.eechchchchuuuchch</chch</chiiiiichchchchchchchch.........chch.chchchch..chch...............................................-ch-ch-ch-ch-ch</</</-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch-ch...-ch.-ch..-ch..-ch.\n\n.\n\n-ch.. the........ - - - - - - - - - - - - - - - - - -  -              M for some for for I I I I I I I I I Ichchchchchchchchchchchchchchchchchchchchchchchchchchchchchchchchchchchchchchchchchchchchchchchchchchchchchchchchchchchchchchch.chch\n\n\n\n\n.\n,.\n,...\n\n\n\n\n\n\n, - - - - - - - - - -?????,,,,. - - - - - - -...... - - - - - -  -                  we we we we we  we we we  we we we)  ))\n))). be be be..... for - for. for - for. for. for for some for. for. for some some... for some some   ..  be be be be be be for   for     for                for we      for we      for we for       for we\n\n for we for we\n we\n\n\n we\n\n\n we we we we we we we we we we we we we we we we we we we we we we we we we we we we we we We. wewe. 0,\n\n \n\n,\n\n,\n\n,\n\n,,,,,, may,,,,,e media,...,x, a.....,,,,,,,,,,,ici. media.u.ici.ici white small........,,,,,,,,,.insert,,? that Chapter. isn doesn but Secret?... dark,.....,,,, drinks.,,?\n\nink,,..ap and close,,,I...,,,,, c area I that c (,, c e low in specifically,,,. I#d.,,,\nz.'s change,. #, areas,-free. Chapter, someulin.m expans I..)\n\n.,,,,,,,,,,,,,,,,, for inh.,,... for.#ube in Chapter,,cup and..\n\n,, you,,#,, #, # # # #). # # # do that# low,#,,,,,,,,,,,,,,,,,,,,,,).).. more--. and and-,). that that........ that. that in that,,,,,,,,"
Prompt: 'The president of the United States is', Generated text: 'ofof                                                                                                                                                   .\n.\n.\n.\n.\n            \n.\n.\n.\n.\n                           \n\n.\n.\n H H H HHHHH H H H H H H H (  H H HO G H H H H H H H HHHHHHHhfhfhfhfhfhfhfhfhf[h,,H[h),,,, ( ( have,, or,,,,).),))))))),)))M of H, or:) ofH,))) and)) and) and and,,,H, haveH and have theH or here,.\n\n, of,,,hhofHof,.\n\n ofhof bore h of figured.h.\n\n,,,.\nH.\n\n.\n;h of.\n\n.\n\n of that..SuspendLayout,, hang and(bin,c’ and and Could,,),.i;;;,,, that profil or a,; of now,, Prof c,, that somewhere, the that hang self, bas, and,,,,,,.DoesNotExist,,,,,,,,,,,. oritC,,,,’ of the that, in, the,,, “,, in,,,:,,, in, of.Lockace,,,,, in, of, in, in, of,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,)),,),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,licity,,,,,,,,,,×,iaifest×,ADING.cs,,,,,,,,,,,,,,,,,, and...Struct,. and,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,). we, we We was we,x,,,,,,,,,, was,,, was after, was after,,,,,, was was, told told,, bore, was told., NC wasucked told waited,,elfare’s 中.ucked不 bore, Iulin hang hang my, middle’s mile was hang hang_until (ume thought paint record,中 my somewhereical, NC ourifest bore bore bore hangifest hang hang hang hang_until, online in my And New,lTemplateait in my, my my It It It Becausewomen X � where whom my�everin tamp hangTemplate�\n\n service Century_untilical.,NCa that no No No,eten in wifeX in, that,.Tr’s I I Manpifest. my Matthew,aitete my of(New,NC MaTemplate(Newinandifest tampifestifestifest my meal,时 inChi Theical myical myifestifestifestifestifestifestifestifestifestifestifestifestifestifestifestifestifestifestifestifestifestifestifestifestifestifestifestifestifest中\n\n\nifestifestifestifestin | my my myifestifestifest来.. turkeya myifestifestifestifestifestifestfitifestifest中\n\n\n\n\n\nonly:chtenifest hang,h somewhereifestifest hang,i an boreifestifestmarkedence中only beforeh my a whoifestung Itityh it only, manusifest hangethe.getMceifestifestmarkedicken only of a a the the� master ailo a a a,,,, a a a a the the the only one’s a fore中'
Prompt: 'The capital of France is', Generated text: 'of  ofofofofofofofofofofofofofofof . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .),),),.),..... occ, strugg.erness,,erness,,, and the the,,,,th bioc): Lab’s’s in,, of son of the,  Los, # a ,.,,, 1, any/ possible, I,, and complete, the  N put,,,,, and a different reaction to at an after a  we, all as;; completed); ,5th we things to,ac) help#, 2; and I idea, 1., average report,2, ,2,  we 5, couldn help)\n\n I ) I2,, here and then 22, the one because by anyax you could 2. and (;’,n I 2 and you we and we after,,’ of with.\n\n I  and, add and‘ of a my most2, " the # of most . use and“ .111 and my a that  could of2 I  and and the the my!, my and a its so. anyone. 1) ’s, and stay 13 0, and ( ( \n\n2 and all possible, I… and " that of ( andh1, the $...  of my dog,,,2, , the 25, , ;, 252, and and  (252, and2 and was 20 - of and ( ( ( (22,, 5, .5 and learn of the 25, 35 and, 5, the end was and and1,, 25 - gave and2,2 and#; was25,% here\n\n. explained, - was the 1, and;1 Newest.25 – and and was was##########:25,)),5,,5,,5 ( % %  - -. ( andand2,2,2,2 and2 and2 (2 (2 (2 (2 ( ( ( ( of (22#22,2222222mm22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222;, and,,,,2.  /,one,!!...\n  and.\n ee of and and22222222222 and2222 “2222 can222fe222fe22222,22,, 2222,222,222,222,2B222,22,22,22,22,22,22,22,22,22,22,22,22,22,22,e22,e22,e22,e22,22,e22,fi22fea E,2,22,,22,e22, (e1, and and22,ee22,eee2,,22,ee22,2)et222222 and222 and222 and22,e2)2)22,fe on22,,22,22223 and,22,22233333333333333333333333533,55,25,25,25,2) and,,55,2e,2eandE #3,32,22,22,22,222,22,22,22,22,222,e222,'
Prompt: 'The future of AI is', Generated text: ' -       -                                                                                                                           \n\n\n\n,,    \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\xa0\xa0\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,'
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
Exception ignored in: <function LLMEngine.__del__ at 0xfffdf0f49990>
Traceback (most recent call last):
  File "/root/miniconda3/envs/py310_torch251/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 508, in __del__
  File "/root/miniconda3/envs/py310_torch251/lib/python3.10/site-packages/vllm/executor/mp_distributed_executor.py", line 134, in shutdown
  File "/root/miniconda3/envs/py310_torch251/lib/python3.10/site-packages/vllm/executor/multiproc_worker_utils.py", line 141, in close
AttributeError: 'NoneType' object has no attribute 'info'
/root/miniconda3/envs/py310_torch251/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 30 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
/root/miniconda3/envs/py310_torch251/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
Segmentation fault (core dumped)

tensor_parallel_size=2 produces this garbled output; tensor_parallel_size=1 behaves normally (see the sketch below).
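
For comparison, a minimal single-card sketch of the reproduction above (same model path and sampling settings; with tensor_parallel_size=1 the generated text is coherent):

import os
os.environ["VLLM_USE_V1"] = "0"
os.environ["ASCEND_LAUNCH_BLOCKING"] = "1"
from vllm import LLM, SamplingParams

# Same reproduction as above, but on a single card (tensor_parallel_size=1);
# the generated text is coherent in this configuration.
llm = LLM(model="/mnt/models/Qwen2.5-1.5B-Instruct",
          tensor_parallel_size=1,
          gpu_memory_utilization=0.8)
outputs = llm.generate(["The capital of France is"],
                       SamplingParams(max_tokens=64, temperature=0.0))
print(outputs[0].outputs[0].text)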
