**Describe the bug**
When running the https://github.com/NVIDIA/NeMo/blob/main/examples/asr/asr_cache_aware_streaming/speech_to_text_cache_aware_streaming_infer.py script on a dataset, I obtain different results when varying the batch size (the WER can differ by up to ±1%).
**Steps/Code to reproduce bug**
**Expected behavior**
The results should not depend on the batch size. Instead, when running with different batch_sizes, the results can differ quite a lot (for the AN4 dataset the differences are pretty small, but on some proprietary files that I have tested, the WER can differ by up to ±1%).
I think the root of this bug could be in this method:
NeMo/nemo/collections/asr/parts/utils/streaming_utils.py
Line 1550 in 087914a

When padding audios with zeros to the max length in the buffer, the pre-encoded cache could introduce new words, since we add an additional chunk of zeros plus the previous context (that's my guess).
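To illustrate my guess (a minimal sketch, not the actual NeMo batching code; the waveform lengths are made up and the 17919-sample chunk is just taken from my example below): once the shorter utterances in a batch are zero-padded to the longest one, every item goes through the same number of streaming steps, so the short items keep receiving chunks of pure silence together with their cached context.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Hypothetical batch of 16 kHz waveforms with very different lengths.
sample_rate = 16000
audios = [torch.randn(int(sample_rate * sec)) for sec in (3.0, 12.0)]

# Zero-pad everything to the longest utterance, like the streaming buffer does.
batch = pad_sequence(audios, batch_first=True)  # shape: [2, 192000]

chunk_samples = 17919  # ~1.12 s chunk at 16 kHz, as in the example below

for i, audio in enumerate(audios):
    real_chunks = -(-audio.numel() // chunk_samples)     # ceil division
    padded_chunks = -(-batch.shape[1] // chunk_samples)  # same for every item in the batch
    print(f"utterance {i}: {real_chunks} real chunks, "
          f"{padded_chunks - real_chunks} extra all-zero chunks fed with cached context")
```

For the short utterance this prints 8 extra all-zero chunks; with a different batch composition it gets a different amount of trailing silence, which would explain why the decoded text changes with the batch size.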
I also think there might be a bug here:
NeMo/nemo/collections/asr/parts/utils/streaming_utils.py
Line 1544 in 087914a

The preprocessed signal is computed over the WHOLE audio, which can introduce big differences compared to preprocessing each chunk separately and then aggregating. (E.g. preprocessing a [5625600]-sample audio gives a [1, 80, 35161] processed signal, while splitting the same audio into chunks of 17919 samples (16000 * 1.12 s chunk) and aggregating the processed chunks gives a processed signal of shape [1, 80, 35162].) However, this might be a separate bug in the Cache Aware Simulator; I will maybe open another issue to reproduce it, because I also observed a difference between running the cache-aware simulator script and streaming incoming chunks to the server.
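The frame-count mismatch above can be reproduced with back-of-the-envelope arithmetic. The 10 ms hop (160 samples at 16 kHz) and the `samples // hop + 1` frame formula are my assumptions about the preprocessor, but they reproduce the shapes I observed:

```python
HOP = 160            # 10 ms hop at 16 kHz (assumed preprocessor default)
TOTAL = 5_625_600    # samples in the audio from the example above
CHUNK = 17_919       # samples per streaming chunk

def n_frames(n_samples: int) -> int:
    # Assumed framing: one frame per full hop plus one edge frame.
    return n_samples // HOP + 1

# Preprocess the whole audio at once.
whole = n_frames(TOTAL)  # -> 35161

# Preprocess chunk by chunk and concatenate the feature frames.
full_chunks, last = divmod(TOTAL, CHUNK)
chunked = full_chunks * n_frames(CHUNK) + (n_frames(last) if last else 0)  # -> 35162

print(whole, chunked)  # 35161 35162
```

Every chunk contributes its own edge frame while the per-chunk floor rounding drops samples at every boundary, so chunk-wise preprocessing does not add up to the offline frame count, which is consistent with the off-by-one shape I see.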
Some of my results on the AN4 data splits (the differences here are not that big, but on other files I observed even bigger differences):
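For reference, this is roughly how I score the two runs against each other (a minimal sketch: `refs.txt` and the `hyps_batch*.txt` files are hypothetical names for the reference transcripts and the hypotheses written out by each run, and the WER here is a plain word-level edit distance, not NeMo's scoring code):

```python
def word_errors(ref: str, hyp: str) -> int:
    """Word-level edit distance (substitutions + insertions + deletions)."""
    r, h = ref.split(), hyp.split()
    d = list(range(len(h) + 1))
    for i, rw in enumerate(r, 1):
        prev, d[0] = d[0], i
        for j, hw in enumerate(h, 1):
            cur = min(d[j] + 1, d[j - 1] + 1, prev + (rw != hw))
            prev, d[j] = d[j], cur
    return d[-1]

def corpus_wer(refs, hyps):
    errors = sum(word_errors(r, h) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    return errors / words

# Hypothetical output files: same references, hypotheses from two batch sizes.
refs = [line.strip() for line in open("refs.txt")]
for name in ("hyps_batch1.txt", "hyps_batch32.txt"):
    hyps = [line.strip() for line in open(name)]
    print(name, corpus_wer(refs, hyps))
```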
**Environment overview**
**Additional context**
GPU: Quadro P5000