**Describe the bug**
When running the https://github.com/NVIDIA/NeMo/blob/main/examples/asr/asr_cache_aware_streaming/speech_to_text_cache_aware_streaming_infer.py script on a dataset, I obtain different results when varying the batch size (the WER can differ by up to ±1%).
**Steps/Code to reproduce bug**
**Expected behavior**
The results should not depend on the batch size. Instead, when running with different batch_sizes, the results can differ quite a lot (for the AN4 dataset the differences are pretty small, but on some proprietary files that I have tested, the WER can differ by up to ±1%).
I think the root of this bug could be in this method:
NeMo/nemo/collections/asr/parts/utils/streaming_utils.py
Line 1550 in 087914a

When padding audios with zeros to the max length in the buffer, the pre-encoded cache could introduce new words, since we add an additional chunk of zeros plus the previous context (that's my guess).
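To illustrate my guess (a minimal sketch, not the actual NeMo batching code; the waveform lengths are made up and the 17919-sample chunk is just taken from my example below): once the shorter utterances in a batch are zero-padded to the longest one, every item goes through the same number of streaming steps, so the short items keep receiving chunks of pure silence together with their cached context.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Hypothetical batch of 16 kHz waveforms with very different lengths.
sample_rate = 16000
audios = [torch.randn(int(sample_rate * sec)) for sec in (3.0, 12.0)]

# Zero-pad everything to the longest utterance, like the streaming buffer does.
batch = pad_sequence(audios, batch_first=True)  # shape: [2, 192000]

chunk_samples = 17919  # ~1.12 s chunk at 16 kHz, as in the example below

for i, audio in enumerate(audios):
    real_chunks = -(-audio.numel() // chunk_samples)     # ceil division
    padded_chunks = -(-batch.shape[1] // chunk_samples)  # same for every item in the batch
    print(f"utterance {i}: {real_chunks} real chunks, "
          f"{padded_chunks - real_chunks} extra all-zero chunks fed with cached context")
```

For the short utterance this prints 8 extra all-zero chunks; with a different batch composition it gets a different amount of trailing silence, which would explain why the decoded text changes with the batch size.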
I also think there might be a bug here:
NeMo/nemo/collections/asr/parts/utils/streaming_utils.py
Line 1544 in 087914a

The preprocessed signal is computed over the WHOLE audio, which can introduce big differences compared to preprocessing each chunk separately and then aggregating. (E.g. preprocessing a [5625600]-sample audio gives a [1, 80, 35161] processed signal, while splitting the same audio into chunks of 17919 samples (16000 * 1.12 s chunk) and aggregating the processed chunks gives a processed signal of shape [1, 80, 35162].) However, this might be a separate bug in the Cache Aware Simulator; I will maybe open another issue to reproduce it, because I also observed a difference between running the cache-aware simulator script and streaming incoming chunks to the server.
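The frame-count mismatch above can be reproduced with back-of-the-envelope arithmetic. The 10 ms hop (160 samples at 16 kHz) and the `samples // hop + 1` frame formula are my assumptions about the preprocessor, but they reproduce the shapes I observed:

```python
HOP = 160            # 10 ms hop at 16 kHz (assumed preprocessor default)
TOTAL = 5_625_600    # samples in the audio from the example above
CHUNK = 17_919       # samples per streaming chunk

def n_frames(n_samples: int) -> int:
    # Assumed framing: one frame per full hop plus one edge frame.
    return n_samples // HOP + 1

# Preprocess the whole audio at once.
whole = n_frames(TOTAL)  # -> 35161

# Preprocess chunk by chunk and concatenate the feature frames.
full_chunks, last = divmod(TOTAL, CHUNK)
chunked = full_chunks * n_frames(CHUNK) + (n_frames(last) if last else 0)  # -> 35162

print(whole, chunked)  # 35161 35162
```

Every chunk contributes its own edge frame while the per-chunk floor rounding drops samples at every boundary, so chunk-wise preprocessing does not add up to the offline frame count, which is consistent with the off-by-one shape I see.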
Some of my results on the AN4 data splits (the differences here are not that big, but on other files I observed even bigger differences):
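For reference, this is roughly how I score the two runs against each other (a minimal sketch: `refs.txt` and the `hyps_batch*.txt` files are hypothetical names for the reference transcripts and the hypotheses written out by each run, and the WER here is a plain word-level edit distance, not NeMo's scoring code):

```python
def word_errors(ref: str, hyp: str) -> int:
    """Word-level edit distance (substitutions + insertions + deletions)."""
    r, h = ref.split(), hyp.split()
    d = list(range(len(h) + 1))
    for i, rw in enumerate(r, 1):
        prev, d[0] = d[0], i
        for j, hw in enumerate(h, 1):
            cur = min(d[j] + 1, d[j - 1] + 1, prev + (rw != hw))
            prev, d[j] = d[j], cur
    return d[-1]

def corpus_wer(refs, hyps):
    errors = sum(word_errors(r, h) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    return errors / words

# Hypothetical output files: same references, hypotheses from two batch sizes.
refs = [line.strip() for line in open("refs.txt")]
for name in ("hyps_batch1.txt", "hyps_batch32.txt"):
    hyps = [line.strip() for line in open(name)]
    print(name, corpus_wer(refs, hyps))
```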
**Environment overview**
**Additional context**
GPU: Quadro P5000