Cache Aware Streaming script yields different results for different batch_sizes #12840

Open
gabitza-tech opened this issue Apr 1, 2025 · 0 comments
Labels
bug Something isn't working

Contributor

gabitza-tech commented Apr 1, 2025

Describe the bug

When I run the https://github.com/NVIDIA/NeMo/blob/main/examples/asr/asr_cache_aware_streaming/speech_to_text_cache_aware_streaming_infer.py script on a dataset, I get different results depending on the batch size (the WER can change by up to ±1%).

Steps/Code to reproduce bug

python3 asr_cache_aware_streaming/speech_to_text_cache_aware_streaming_infer.py --asr_model stt_en_fastconformer_hybrid_large_streaming_multi.nemo --device cuda:0 --set_decoder ctc --manifest_file datasets/an4/test_manifest.json --output_path an_test_ctc_bs1.json --batch_size 1
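To compare batch sizes, the same command can be generated for each value. The sketch below only builds and prints the command lines. It reuses the flags and file names from the command above; the batch sizes 32 and 64 come from the results table further down, and the `build_command` helper and the `rnnt` decoder value are illustrative assumptions.

```python
import shlex

SCRIPT = "asr_cache_aware_streaming/speech_to_text_cache_aware_streaming_infer.py"


def build_command(batch_size, decoder="ctc"):
    """Return the inference command as an argument list (hypothetical helper)."""
    return [
        "python3", SCRIPT,
        "--asr_model", "stt_en_fastconformer_hybrid_large_streaming_multi.nemo",
        "--device", "cuda:0",
        "--set_decoder", decoder,
        "--manifest_file", "datasets/an4/test_manifest.json",
        # One output file per (decoder, batch size) so runs do not overwrite each other.
        "--output_path", f"an_test_{decoder}_bs{batch_size}.json",
        "--batch_size", str(batch_size),
    ]


# Print one command per batch size; paste them into a shell to run them.
for bs in (1, 32, 64):
    print(shlex.join(build_command(bs)))
```

Only `--batch_size` and `--output_path` change between runs, so any WER difference between the output files comes from batching.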

Expected behavior

The results should be the same regardless of batch size. Instead, they can differ quite a lot between batch sizes: on the AN4 dataset the differences are fairly small, but on some proprietary files that I have tested the WER differs by up to ±1%.

I think the root of this bug could be from this method:

def append_processed_signal(self, processed_signal, stream_id=-1):
When audio is padded with zeros to the max length of the buffer, the pre-encode cache could introduce new words, because an additional chunk of zeros is processed together with the previous context (that's my guess).
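The padding hypothesis can be illustrated with a toy count of how many chunks of a padded stream contain only zeros. This is a simplified model, not NeMo's buffer code: the `padded_chunk_report` helper, the frame lengths, and the chunk size are all made up for illustration.

```python
import math


def padded_chunk_report(frame_lengths, chunk_frames):
    """Count, per stream, the frames and whole chunks that are pure zero padding
    when every stream in the batch is padded to the longest one (toy model)."""
    max_len = max(frame_lengths)
    total_chunks = math.ceil(max_len / chunk_frames)
    report = []
    for n in frame_lengths:
        real_chunks = math.ceil(n / chunk_frames)
        report.append(
            {
                "frames": n,
                "pad_frames": max_len - n,
                "pad_only_chunks": total_chunks - real_chunks,
            }
        )
    return report


# Batch of two streams: the short one is followed by two chunks of zeros.
print(padded_chunk_report([112, 300], chunk_frames=112))
# With batch size 1 no padding is needed at all.
print(padded_chunk_report([112], chunk_frames=112))
```

In this model a short stream in a large batch is followed by extra all-zero chunks that it never sees with batch size 1, which is where extra tokens could appear if the guess is right.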

I also think there might be a bug here:

processed_signal, processed_signal_length = self.preprocess_audio(audio)
The preprocessed signal is computed over the WHOLE audio, which can introduce big differences compared to preprocessing each chunk separately and then aggregating. For example, preprocessing an audio of shape [5625600] gives a processed signal of shape [1,80,35161], while splitting the same audio into chunks of 17919 samples (about 1.12 s at 16000 Hz) and aggregating the processed chunks gives a processed signal of shape [1,80,35162].

However, this might be a separate bug in the Cache Aware Simulator. I may try to reproduce it in another issue, because I also observed a difference between running the cache aware simulator script and sending incoming chunks to the server.
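The one-frame difference in that example can be reproduced with plain arithmetic. The sketch assumes a centered STFT with a hop of 160 samples (10 ms at 16 kHz), for which a signal of n samples yields n // hop + 1 frames. Both the hop size and the frame formula are assumptions here and are not taken from the NeMo preprocessor code.

```python
def num_frames(num_samples, hop=160):
    """Frames produced by a centered STFT (assumed formula: n // hop + 1)."""
    return num_samples // hop + 1


def chunked_frames(num_samples, chunk, hop=160):
    """Total frames when each chunk is preprocessed separately and concatenated."""
    full_chunks, remainder = divmod(num_samples, chunk)
    total = full_chunks * num_frames(chunk, hop)
    if remainder:
        total += num_frames(remainder, hop)
    return total


whole = num_frames(5_625_600)
chunked = chunked_frames(5_625_600, 17_919)
print(whole, chunked)  # 35161 35162
```

Under these assumptions every chunk adds its own "+ 1" frame while losing the samples that do not fill a hop, so the per-chunk total does not have to match the whole-audio total.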

Some of my results on the AN4 data splits (the differences here are not that big, but on other files I observed bigger differences):

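| Dataset | Decoder | Batch size | WER (%) |
|---|---|---|---|
| an4_train | CTC | 1 | 8.12 |
| an4_train | CTC | 32 | 8.01 |
| an4_train | CTC | 64 | 8.01 |
| an4_train | RNNT | 1 | 8.25 |
| an4_train | RNNT | 32 | 8.38 |
| an4_train | RNNT | 64 | 8.38 |
| an4_test | CTC | 1 | 6.6 |
| an4_test | CTC | 32 | 6.21 |
| an4_test | CTC | 64 | 6.21 |
| an4_test | RNNT | 1 | 6.86 |
| an4_test | RNNT | 32 | 6.73 |
| an4_test | RNNT | 64 | 6.73 |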
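The spread across batch sizes can be read off the results above with a few lines of Python. The WER values are copied from the results; the `wer_spread` helper is only an illustration.

```python
# WER (%) per (dataset, decoder), keyed by batch size, copied from the results above.
WER = {
    ("an4_train", "CTC"): {1: 8.12, 32: 8.01, 64: 8.01},
    ("an4_train", "RNNT"): {1: 8.25, 32: 8.38, 64: 8.38},
    ("an4_test", "CTC"): {1: 6.6, 32: 6.21, 64: 6.21},
    ("an4_test", "RNNT"): {1: 6.86, 32: 6.73, 64: 6.73},
}


def wer_spread(by_batch_size):
    """Absolute WER gap between the best and worst batch size, in WER points."""
    values = by_batch_size.values()
    return round(max(values) - min(values), 2)


for (dataset, decoder), by_bs in WER.items():
    print(f"{dataset} {decoder}: spread = {wer_spread(by_bs)}")
```

In these numbers batch sizes 32 and 64 always agree with each other, and only batch size 1 differs, with the largest gap (0.39) on an4_test with CTC.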

Environment overview

Package                                       Version          Editable project location
--------------------------------------------- ---------------- ----------------------------------------------------------
absl-py                                       2.0.0
accelerate                                    0.28.0
addict                                        2.4.0
aiofiles                                      23.2.1
aiohttp                                       3.9.3
aiohttp-retry                                 2.9.1
aiosignal                                     1.3.1
alabaster                                     0.7.16
alembic                                       1.13.1
aliyun-python-sdk-core                        2.15.1
aliyun-python-sdk-kms                         2.16.3
altair                                        5.3.0
aniso8601                                     9.0.1
annotated-types                               0.6.0
ansi2html                                     1.8.0
antlr4-python3-runtime                        4.9.3
anyio                                         3.7.1
appdirs                                       1.4.4
apturl                                        0.5.2
asciitree                                     0.3.3
asteroid-filterbanks                          0.4.0
asttokens                                     2.4.1
astunparse                                    1.6.3
async-timeout                                 4.0.3
attrdict                                      2.0.1
attrs                                         23.2.0
audioread                                     3.0.1
azure-cognitiveservices-speech                1.32.1
azure-cognitiveservices-vision-computervision 0.9.0
azure-common                                  1.1.28
azure-core                                    1.30.1
Babel                                         2.14.0
beautifulsoup4                                4.12.3
black                                         19.10b0
blinker                                       1.4
boto3                                         1.34.82
botocore                                      1.34.82
braceexpand                                   0.1.7
Brlapi                                        0.8.3
build                                         1.2.1
cachetools                                    5.3.1
cdifflib                                      1.2.6
certifi                                       2023.7.22
cffi                                          1.16.0
chardet                                       4.0.0
charset-normalizer                            3.2.0
chrome-gnome-shell                            0.0.0
click                                         8.0.2
clip                                          0.2.0
cloudpickle                                   3.1.0
cmake                                         3.27.9
cockpit                                       331
colorama                                      0.4.6
coloredlogs                                   15.0.1
colorlog                                      6.8.2
comm                                          0.2.2
command-not-found                             0.3
configobj                                     5.0.6
contourpy                                     1.1.1
crcmod                                        1.7
cryptography                                  3.4.8
cupshelpers                                   1.0
cycler                                        0.12.1
Cython                                        3.0.10
cytoolz                                       0.12.3
dash                                          2.13.0
dash-bootstrap-components                     1.6.0
dash-core-components                          2.0.0
dash-html-components                          2.0.0
dash-table                                    5.0.0
datasets                                      2.18.0
dbus-python                                   1.2.18
decorator                                     5.1.1
defer                                         1.0.6
diff-match-patch                              20230430
diffusers                                     0.27.2
dill                                          0.3.8
Distance                                      0.1.3
distlib                                       0.3.4
distro                                        1.7.0
distro-info                                   1.1+ubuntu0.2
dnspython                                     2.6.1
docker-pycreds                                0.4.0
docopt                                        0.6.2
docutils                                      0.20.1
dover-lap                                     1.3.1
editdistance                                  0.8.1
einops                                        0.7.0
einops-exts                                   0.0.4
email_validator                               2.1.1
et-xmlfile                                    1.1.0
exceptiongroup                                1.2.0
executing                                     2.0.1
faiss-cpu                                     1.8.0
fastapi                                       0.111.0
fastapi-cli                                   0.0.3
fasteners                                     0.19
fasttext                                      0.9.2
ffmpy                                         0.3.2
fiddle                                        0.3.0
filelock                                      3.9.0
flashlight                                    0.1.1
flashlight-text                               0.0.7
Flask                                         2.2.5
Flask-RESTful                                 0.3.10
flask-sock                                    0.7.0
flatbuffers                                   24.3.25
fonttools                                     4.43.1
frozenlist                                    1.4.1
fsspec                                        2024.2.0
ftfy                                          6.2.0
future                                        1.0.0
g2p-en                                        2.1.0
gast                                          0.5.4
gdown                                         5.1.0
gitdb                                         4.0.11
GitPython                                     3.1.43
google-api-core                               2.12.0
google-auth                                   2.23.0
google-auth-oauthlib                          1.0.0
google-cloud-bigquery                         3.12.0
google-cloud-core                             2.3.3
google-cloud-speech                           2.21.0
google-cloud-storage                          2.11.0
google-crc32c                                 1.5.0
google-pasta                                  0.2.0
google-resumable-media                        2.6.0
googleapis-common-protos                      1.60.0
gradio                                        3.38.0
gradio_client                                 0.16.2
graphviz                                      0.20.3
greenlet                                      3.0.3
grpcio                                        1.58.0
grpcio-status                                 1.59.0
h11                                           0.14.0
h5py                                          3.10.0
httpcore                                      1.0.2
httplib2                                      0.20.2
httptools                                     0.6.1
httpx                                         0.25.1
huggingface-hub                               0.30.1
humanfriendly                                 10.0
hydra-core                                    1.3.2
HyperPyYAML                                   1.2.2
idna                                          3.4
ijson                                         3.2.3
imageio                                       2.34.0
imagesize                                     1.4.1
importlib_metadata                            7.1.0
imutils                                       0.5.4
inaSpeechSegmenter                            0.7.8
inflect                                       7.2.0
iniconfig                                     2.0.0
intervaltree                                  3.1.0
iotop                                         0.6
ipython                                       8.23.0
ipywidgets                                    8.1.2
isodate                                       0.6.1
isort                                         5.13.2
itsdangerous                                  2.1.2
jedi                                          0.19.1
jeepney                                       0.7.1
jieba                                         0.42.1
Jinja2                                        3.1.3
jiter                                         0.8.2
jiwer                                         2.5.2
jmespath                                      0.10.0
joblib                                        1.3.2
jsonschema                                    4.21.1
jsonschema-specifications                     2023.12.1
julius                                        0.2.7
jupyterlab_widgets                            3.0.10
kaldi-python-io                               1.2.2
kaldiio                                       2.18.0
kenlm                                         0.2.0
keras                                         2.15.0
Keras-Applications                            1.0.8
keyring                                       23.5.0
kiwisolver                                    1.4.5
kornia                                        0.7.2
kornia_rs                                     0.1.3
language-selector                             0.1
latexcodec                                    3.0.0
launchpadlib                                  1.10.16
lazr.restfulclient                            0.14.4
lazr.uri                                      1.0.6
lazy_loader                                   0.4
Levenshtein                                   0.22.0
lhotse                                        1.28.0
libclang                                      16.0.6
libcst                                        1.5.1
librosa                                       0.10.2.post1
libvirt-python                                8.0.0
lightning                                     2.2.0.post0
lightning-utilities                           0.11.2
lilcom                                        1.7
linkify-it-py                                 2.0.3
llvmlite                                      0.44.0
loguru                                        0.7.2
louis                                         3.20.0
lxml                                          5.2.1
macaroonbakery                                1.3.1
Mako                                          1.3.2
Markdown                                      3.4.4
markdown-it-py                                2.2.0
markdown2                                     2.4.13
MarkupSafe                                    2.1.3
marshmallow                                   3.21.1
matplotlib                                    3.8.3
matplotlib-inline                             0.1.6
mdit-py-plugins                               0.3.3
mdurl                                         0.1.2
megatron_core                                 0.5.0
meson                                         0.61.2
MIDIUtil                                      1.2.1
ml-dtypes                                     0.2.0
modelscope                                    1.14.0
more-itertools                                10.2.0
mpmath                                        1.3.0
msgpack                                       1.0.8
msrest                                        0.7.1
mtcnn                                         0.1.1
multidict                                     6.0.5
multiprocess                                  0.70.16
mutagen                                       1.47.0
nemo_text_processing                          0.3.0rc0
nemo-toolkit                                  2.2.1            /mnt/zfs-mirror-02/homes/gpirlogeanu/git_repos/nemo_v2.2.1
nerfacc                                       0.5.3
nest-asyncio                                  1.5.8
netifaces                                     0.11.0
networkx                                      3.2.1
nltk                                          3.8.1
numba                                         0.61.0
numcodecs                                     0.12.1
numpy                                         1.26.4
nvidia-cublas-cu11                            11.10.3.66
nvidia-cublas-cu12                            12.4.2.65
nvidia-cuda-cupti-cu12                        12.4.99
nvidia-cuda-nvcc-cu12                         12.2.140
nvidia-cuda-nvrtc-cu11                        11.7.99
nvidia-cuda-nvrtc-cu12                        12.4.99
nvidia-cuda-runtime-cu11                      11.7.99
nvidia-cuda-runtime-cu12                      12.4.99
nvidia-cudnn-cu11                             8.5.0.96
nvidia-cudnn-cu12                             9.1.0.70
nvidia-cufft-cu12                             11.2.0.44
nvidia-curand-cu12                            10.3.5.119
nvidia-cusolver-cu12                          11.6.0.99
nvidia-cusparse-cu12                          12.3.0.142
nvidia-nccl-cu12                              2.20.5
nvidia-nvjitlink-cu12                         12.4.99
nvidia-nvtx-cu12                              12.4.99
oauthlib                                      3.2.2
olefile                                       0.46
omegaconf                                     2.3.0
onnx                                          1.16.0
onnxruntime                                   1.17.3
open-clip-torch                               2.24.0
openai                                        1.59.8
OpenCC                                        1.1.6
opencv-python                                 4.9.0.80
openpyxl                                      3.1.2
opt-einsum                                    3.3.0
optuna                                        3.5.0
orjson                                        3.10.3
oss2                                          2.18.5
packaging                                     24.0
pandas                                        2.2.1
pangu                                         4.0.6.1
parameterized                                 0.9.0
parso                                         0.8.4
pathspec                                      0.12.1
pexpect                                       4.9.0
pika                                          1.3.2
pillow                                        10.2.0
pip                                           25.0.1
plac                                          1.4.3
platformdirs                                  4.2.0
plotly                                        5.17.0
pluggy                                        1.4.0
pooch                                         1.8.1
portalocker                                   2.8.2
primePy                                       1.3
progress                                      1.6
prompt-toolkit                                3.0.43
proto-plus                                    1.22.3
protobuf                                      3.20.3
psutil                                        5.9.8
ptyprocess                                    0.7.0
pure-eval                                     0.2.2
pyAesCrypt                                    6.1.1
pyannote.audio                                3.1.1
pyannote.core                                 5.0.0
pyannote.database                             5.1.0
pyannote.metrics                              3.2.1
pyannote.pipeline                             3.0.1
pyarrow                                       15.0.2
pyarrow-hotfix                                0.6
pyasn1                                        0.5.0
pyasn1-modules                                0.3.0
PyAudio                                       0.2.11
pybind11                                      2.12.0
pybtex                                        0.24.0
pybtex-docutils                               1.0.3
pycairo                                       1.20.1
pycparser                                     2.22
pycryptodome                                  3.20.0
pycups                                        2.0.1
pydantic                                      2.5.1
pydantic_core                                 2.14.3
pydeck                                        0.8.1b0
pydub                                         0.25.1
Pygments                                      2.17.2
PyGObject                                     3.42.1
PyJWT                                         2.3.0
pyloudnorm                                    0.1.1
pymacaroons                                   0.13.0
PyMCubes                                      0.1.4
PyNaCl                                        1.5.0
pynini                                        2.1.5
pyo                                           1.0.5
pyparsing                                     3.1.2
pypinyin                                      0.51.0
pypinyin-dict                                 0.8.0
pyproject_hooks                               1.0.0
pyRFC3339                                     1.1
Pyro4                                         4.82
PySocks                                       1.7.1
pytest                                        8.1.1
pytest-runner                                 6.0.1
pytextgrid                                    0.1.4
python-apt                                    2.4.0+ubuntu4
python-dateutil                               2.8.2
python-debian                                 0.1.43+ubuntu1.1
python-dotenv                                 1.0.1
python-multipart                              0.0.9
pytorch-lightning                             2.0.7
pytorch-metric-learning                       2.4.1
pytz                                          2024.1
pyxdg                                         0.27
PyYAML                                        6.0.1
rapidfuzz                                     2.13.7
referencing                                   0.34.0
regex                                         2023.12.25
reportlab                                     3.6.8
requests                                      2.28.2
requests-oauthlib                             1.3.1
resampy                                       0.4.3
retrying                                      1.3.4
rich                                          13.7.1
rouge-score                                   0.1.2
rpds-py                                       0.18.0
rsa                                           4.9
ruamel.yaml                                   0.18.6
ruamel.yaml.clib                              0.2.8
s3transfer                                    0.10.1
sacremoses                                    0.1.1
safetensors                                   0.4.2
sanic                                         0.7.0
scikit-image                                  0.23.1
scikit-learn                                  1.0.2
scipy                                         1.12.0
screen-resolution-extra                       0.0.0
seaborn                                       0.13.2
SecretStorage                                 3.3.1
semantic-version                              2.10.0
semver                                        3.0.2
sentence-transformers                         2.6.1
sentencepiece                                 0.2.0
sentry-sdk                                    1.45.0
serpent                                       1.41
setproctitle                                  1.3.3
setuptools                                    76.0.0
shellingham                                   1.5.4
simple-websocket                              1.1.0
simplejson                                    3.19.2
six                                           1.16.0
smmap                                         5.0.1
sniffio                                       1.3.0
snowballstemmer                               2.2.0
sortedcontainers                              2.4.0
sounddevice                                   0.4.6
soundfile                                     0.12.1
soupsieve                                     2.5
sox                                           1.5.0
soxr                                          0.3.7
speechbrain                                   0.5.16
SpeechRecognition                             3.10.3
Sphinx                                        7.2.6
sphinxcontrib-applehelp                       1.0.8
sphinxcontrib-bibtex                          2.6.2
sphinxcontrib-devhelp                         1.0.6
sphinxcontrib-htmlhelp                        2.0.5
sphinxcontrib-jsmath                          1.0.1
sphinxcontrib-qthelp                          1.0.7
sphinxcontrib-serializinghtml                 1.1.10
spy-der                                       0.4.1
SQLAlchemy                                    2.0.27
srt                                           3.5.3
ssh-import-id                                 5.11
stack-data                                    0.6.3
starlette                                     0.37.2
streamlit                                     1.32.2
sympy                                         1.12
systemd-python                                234
tabulate                                      0.9.0
taming-transformers                           0.0.1
tenacity                                      8.2.3
tensorboard                                   2.15.1
tensorboard-data-server                       0.7.1
tensorboardX                                  2.6.2.2
tensorflow                                    2.15.0.post1
tensorflow-estimator                          2.15.0
tensorflow-io-gcs-filesystem                  0.35.0
tensorstore                                   0.1.45
termcolor                                     2.4.0
terminator                                    2.1.1
text-unidecode                                1.3
textdistance                                  4.6.1
texterrors                                    0.4.4
threadpoolctl                                 3.2.0
tifffile                                      2024.2.12
tiktoken                                      0.5.1
timm                                          0.9.16
tokenizers                                    0.20.3
toml                                          0.10.2
tomli                                         2.0.1
toolz                                         0.12.1
torch                                         2.4.0+cu124
torch-audiomentations                         0.11.1
torch-pitch-shift                             1.2.4
torchaudio                                    2.4.0+cu124
torchdiffeq                                   0.2.3
torchmetrics                                  1.3.2
torchsde                                      0.2.6
torchvision                                   0.19.0+cu124
tornado                                       6.4
tqdm                                          4.66.2
traitlets                                     5.14.2
trampoline                                    0.1.2
transformers                                  4.46.3
trimesh                                       4.3.0
triton                                        3.0.0
twilio                                        9.4.4
typed-ast                                     1.5.5
typeguard                                     4.2.1
typer                                         0.12.3
typing_extensions                             4.11.0
tzdata                                        2024.1
ubuntu-drivers-common                         0.0.0
ubuntu-pro-client                             8001
uc-micro-py                                   1.0.3
ufw                                           0.36.1
ujson                                         5.9.0
unattended-upgrades                           0.1
urllib3                                       1.26.16
uvicorn                                       0.29.0
uvloop                                        0.19.0
virtualenv                                    20.13.0+ds
vosk                                          0.3.45
wadllib                                       1.3.6
wandb                                         0.16.6
watchdog                                      4.0.0
watchfiles                                    0.21.0
wcwidth                                       0.2.13
webdataset                                    0.1.62
webrtcvad                                     2.0.10
websockets                                    10.3
Werkzeug                                      2.2.3
wget                                          3.2
wheel                                         0.37.1
widgetsnbextension                            4.0.10
wrapt                                         1.14.1
wsproto                                       1.2.0
xdg                                           5
xkit                                          0.0.0
xxhash                                        3.4.1
yapf                                          0.40.2
yarl                                          1.9.4
youtokentome                                  1.0.6
zarr                                          2.17.2
zipp                                          3.18.1

Additional context

GPU: Quadro P5000

@gabitza-tech gabitza-tech added the bug Something isn't working label Apr 1, 2025