[Bug]: v0.7.3 model outputs a chain of numbers #892

Open
xyyyzzz opened this issue May 19, 2025 · 0 comments
Labels
bug Something isn't working

Comments


xyyyzzz commented May 19, 2025

Your current environment

The output of `python collect_env.py`
PyTorch version: 2.5.1
Is debug build: False

OS: EulerOS 2.0 (SP10) (aarch64)
GCC version: (GCC) 7.3.0
Clang version: Could not collect
CMake version: version 4.0.2
Libc version: glibc-2.28

Python version: 3.9.20 (main, Oct  3 2024, 07:31:44)  [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-4.19.90-vhulk2211.3.0.h1960.eulerosv2r10.aarch64-aarch64-with-glibc2.28

CPU:
Architecture:                       aarch64
CPU op-mode(s):                     64-bit
Byte Order:                         Little Endian
CPU(s):                             192
On-line CPU(s) list:                0-191
Thread(s) per core:                 1
Core(s) per socket:                 48
Socket(s):                          4
NUMA node(s):                       8
Vendor ID:                          HiSilicon
Model:                              0
Model name:                         Kunpeng-920
Stepping:                           0x1
BogoMIPS:                           200.00
L1d cache:                          12 MiB
L1i cache:                          12 MiB
L2 cache:                           96 MiB
L3 cache:                           192 MiB
NUMA node0 CPU(s):                  0-23
NUMA node1 CPU(s):                  24-47
NUMA node2 CPU(s):                  48-71
NUMA node3 CPU(s):                  72-95
NUMA node4 CPU(s):                  96-119
NUMA node5 CPU(s):                  120-143
NUMA node6 CPU(s):                  144-167
NUMA node7 CPU(s):                  168-191
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit:        Not affected
Vulnerability L1tf:                 Not affected
Vulnerability Mds:                  Not affected
Vulnerability Meltdown:             Not affected
Vulnerability Mmio stale data:      Not affected
Vulnerability Retbleed:             Not affected
Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:           Mitigation; __user pointer sanitization
Vulnerability Spectre v2:           Not affected
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Not affected
Flags:                              fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm ssbs

Versions of relevant libraries:
[pip3] mypy==1.11.1
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.25.0
[pip3] pyzmq==26.2.0
[pip3] torch==2.5.1
[pip3] torch-npu==2.5.1
[pip3] torch-tb-profiler-ascend==0.4.0.11
[pip3] torchvision==0.20.1
[pip3] transformers==4.50.2
[pip3] transformers-stream-generator==0.0.4
[conda] numpy                     1.25.0                   pypi_0    pypi
[conda] pyzmq                     26.2.0                   pypi_0    pypi
[conda] torch                     2.5.1                    pypi_0    pypi
[conda] torch-npu                 2.5.1                    pypi_0    pypi
[conda] torch-tb-profiler-ascend  0.4.0.11                 pypi_0    pypi
[conda] torchvision               0.20.1                   pypi_0    pypi
[conda] transformers              4.50.2                   pypi_0    pypi
[conda] transformers-stream-generator 0.0.4                    pypi_0    pypi
vLLM Version: 0.7.4.dev0+ged6e907.d20250516 (git sha: ed6e907, date: 20250516)
vLLM Ascend Version: 0.7.4.dev0+g779eebb.d20250516 (git sha: 779eebb, date: 20250516)


CANN:
package_name=Ascend-cann-toolkit
version=8.0.T60
innerversion=V100R001C20B029
compatible_version=[V100R001C15],[V100R001C17],[V100R001C18],[V100R001C19],[V100R001C20]
arch=aarch64
os=linux
path=/usr/local/Ascend/ascend-toolkit/8.0.T60/aarch64-linux

🐛 Describe the bug

When the prompt is "The future of AI is" and the model is DeepSeek-R1-Distill-Qwen-7B, the generated text is a chain of numbers instead of prose. This error started occurring with vllm-ascend v0.7.3.

[Screenshot: generated output is a chain of numbers]

Offline inference example:

```python
from vllm import LLM, SamplingParams

prompts = [
    "The future of AI is ",
]

# Greedy decoding (temperature=0.0) so the output is deterministic.
sampling_params = SamplingParams(max_tokens=100, temperature=0.0)

# Create an LLM with chunked prefill enabled.
llm = LLM(
    model="/disknpu/llm/model/DeepSeek-R1-Distill-Qwen-7B",
    enable_chunked_prefill=True,
    max_num_batched_tokens=2048,
    gpu_memory_utilization=0.5,
    block_size=16,
)

# Generate text from the prompts and print each prompt with its completion.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(f"Prompt: {output.prompt!r}")
    print(f"Generated text: {output.outputs[0].text!r}")
```
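When bisecting versions or toggling options (e.g. chunked prefill on/off), it helps to detect the failure automatically rather than eyeballing the output. Below is a small heuristic helper of my own (the function name and threshold are not part of vllm or vllm-ascend) that flags the degenerate "chain of numbers" pattern:

```python
def looks_like_number_chain(text: str, threshold: float = 0.8) -> bool:
    """Heuristic: True if the non-whitespace characters of `text` are
    mostly digits and numeric punctuation, i.e. '1 2 3 4 ...' rather
    than prose. `threshold` is an arbitrary cutoff chosen here."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return False
    numeric = sum(c.isdigit() or c in ".,-" for c in chars)
    return numeric / len(chars) >= threshold

# The buggy output pattern vs. normal prose:
print(looks_like_number_chain("1 2 3 4 5 6 7 8 9 10 11 12"))   # True
print(looks_like_number_chain("The future of AI is multimodal."))  # False
```

A check like `assert not looks_like_number_chain(output.outputs[0].text)` after `llm.generate(...)` turns each bisection step into a pass/fail result.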
xyyyzzz added the bug label May 19, 2025