
RuntimeError: cuDNN error: CUDNN_STATUS_SUBLIBRARY_VERSION_MISMATCH on WSL2 Ubuntu24.04 #282

Open
cospotato opened this issue Dec 7, 2024 · 10 comments

Comments

@cospotato

Hi, I am new to deep learning. The project works on Windows with CUDA 12.5 and cuDNN 9.3.0. Then I tried to run it on WSL2 (Ubuntu 24.04) with the configuration below and got the error RuntimeError: cuDNN error: CUDNN_STATUS_SUBLIBRARY_VERSION_MISMATCH. What am I missing?

OS: WSL2 Ubuntu 24.04
Kernel: Linux cospotato 5.15.167.4-microsoft-standard-WSL2 #1 SMP Tue Nov 5 00:21:55 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
PyTorch version: 2.5.1
CUDA version: 12.6
cuDNN version: 9.3.0

Traceback:

Traceback (most recent call last):
  File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/diarize.py", line 199, in <module>
    msdd_model = NeuralDiarizer(cfg=create_config(temp_path)).to(args.device)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/nemo/collections/asr/models/msdd_models.py", line 994, in __init__
    self._init_msdd_model(cfg)
  File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/nemo/collections/asr/models/msdd_models.py", line 1096, in _init_msdd_model
    self.msdd_model = EncDecDiarLabelModel.from_pretrained(model_name=model_path, map_location=cfg.device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/nemo/core/classes/common.py", line 754, in from_pretrained
    instance = class_.restore_from(
               ^^^^^^^^^^^^^^^^^^^^
  File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/nemo/core/classes/modelPT.py", line 464, in restore_from
    instance = cls._save_restore_connector.restore_from(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/nemo/core/connectors/save_restore_connector.py", line 255, in restore_from
    loaded_params = self.load_config_and_state_dict(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/nemo/core/connectors/save_restore_connector.py", line 179, in load_config_and_state_dict
    instance = instance.to(map_location)
               ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/lightning_fabric/utilities/device_dtype_mixin.py", line 55, in to
    return super().to(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1340, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 900, in _apply
    module._apply(fn)
  File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 900, in _apply
    module._apply(fn)
  File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/torch/nn/modules/rnn.py", line 288, in _apply
    self._init_flat_weights()
  File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/torch/nn/modules/rnn.py", line 215, in _init_flat_weights
    self.flatten_parameters()
  File "/home/cospotato/repo/github.com/MahmoudAshraf97/whisper-diarization/.venv/lib/python3.12/site-packages/torch/nn/modules/rnn.py", line 269, in flatten_parameters
    torch._cudnn_rnn_flatten_weight(
RuntimeError: cuDNN error: CUDNN_STATUS_SUBLIBRARY_VERSION_MISMATCH
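
One way to narrow this down (a minimal diagnostic sketch, not part of the original report; it assumes PyTorch was installed from the standard pip wheels) is to print the CUDA and cuDNN versions that torch itself reports and compare them against the system-wide installation:

    import torch

    # Versions the installed torch build was compiled against / loads at import time.
    print("torch:", torch.__version__)
    print("CUDA runtime torch was built with:", torch.version.cuda)
    print("cuDNN version torch reports:", torch.backends.cudnn.version())
    print("CUDA available:", torch.cuda.is_available())

If the cuDNN version reported here differs from the system cuDNN (9.3.0 above), two different cuDNN copies are being mixed at runtime.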
@cospotato changed the title from "RuntimeError: cuDNN error: CUDNN_STATUS_SUBLIBRARY_VERSION_MISMATCH in WSL2 Ubuntu24.04" to "RuntimeError: cuDNN error: CUDNN_STATUS_SUBLIBRARY_VERSION_MISMATCH on WSL2 Ubuntu24.04" on Dec 7, 2024
@cospotato
Author

cospotato commented Dec 7, 2024

Additional: if I run the NeMo MSDD diarization model section alone, it works. Maybe NeMo conflicts with Whisper?

@DrJPK

DrJPK commented Dec 12, 2024

@cospotato did you manage to work this out? I am having exactly the same issue, RuntimeError: cuDNN error: CUDNN_STATUS_SUBLIBRARY_VERSION_MISMATCH, on a bare-metal RHEL 9 server.

System
OS: Red Hat Enterprise Linux release 9.4 (Plow)
Kernel: 5.14.0-427.35.1.el9_4.x86_64
GPU: Nvidia A30 24GB
CUDA: 12.4.r12.4/compiler.34097967_0
cuDNN: 9.6.0.74
Python: Python 3.12.1 running in venv
torch: 2.5.1

Traceback:

Traceback (most recent call last):
  File "/srv/whisperAI/whisper-diarization/diarize.py", line 202, in <module>
    msdd_model = NeuralDiarizer(cfg=create_config(temp_path)).to(args.device)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/nemo/collections/asr/models/msdd_models.py", line 994, in __init__
    self._init_msdd_model(cfg)
  File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/nemo/collections/asr/models/msdd_models.py", line 1096, in _init_msdd_model
    self.msdd_model = EncDecDiarLabelModel.from_pretrained(model_name=model_path, map_location=cfg.device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/nemo/core/classes/common.py", line 754, in from_pretrained
    instance = class_.restore_from(
               ^^^^^^^^^^^^^^^^^^^^
  File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/nemo/core/classes/modelPT.py", line 464, in restore_from
    instance = cls._save_restore_connector.restore_from(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/nemo/core/connectors/save_restore_connector.py", line 255, in restore_from
    loaded_params = self.load_config_and_state_dict(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/nemo/core/connectors/save_restore_connector.py", line 179, in load_config_and_state_dict
    instance = instance.to(map_location)
               ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/lightning_fabric/utilities/device_dtype_mixin.py", line 55, in to
    return super().to(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1340, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 900, in _apply
    module._apply(fn)
  File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 900, in _apply
    module._apply(fn)
  File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/torch/nn/modules/rnn.py", line 288, in _apply
    self._init_flat_weights()
  File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/torch/nn/modules/rnn.py", line 215, in _init_flat_weights
    self.flatten_parameters()
  File "/srv/whisperAI/whisper-diarization/.venv/lib64/python3.12/site-packages/torch/nn/modules/rnn.py", line 269, in flatten_parameters
    torch._cudnn_rnn_flatten_weight(
RuntimeError: cuDNN error: CUDNN_STATUS_SUBLIBRARY_VERSION_MISMATCH

Clearly this is a CUDA issue, but I cannot work out what is going on. I assume it is a PyTorch thing.
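
One rough check, sketched here under the assumption that torch came from pip wheels rather than a system-wide install: list the NVIDIA runtime wheels present in the virtual environment and compare their cuDNN version against the system cuDNN 9.6.0.74 noted above.

    from importlib.metadata import distributions

    # List the NVIDIA runtime wheels installed alongside torch. A
    # nvidia-cudnn-cu12 wheel whose version differs from the system cuDNN is
    # a common trigger for CUDNN_STATUS_SUBLIBRARY_VERSION_MISMATCH.
    for dist in distributions():
        name = (dist.metadata["Name"] or "").lower()
        if name.startswith("nvidia-"):
            print(f"{name}=={dist.version}")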

@DrJPK

DrJPK commented Dec 12, 2024

OK, quick update: diarize.py -a audio.MP3 is still causing the issue above. HOWEVER, diarize_parallel.py -a audio.MP3 runs and transcribes the audio to text and SRT with a good level of activity, BUT the speaker identification does not work. I don't know if that helps or confuses things, but I thought I would share it.

@DrJPK

DrJPK commented Dec 12, 2024

EDIT: I think the post below is actually just a set of warnings and is unrelated to diarize.py not running on Linux.

@cospotato just out of interest, did you get a warning directly before this error when calling diarize.py, about tarfile.py:2252 no longer allowing absolute paths?

[NeMo W 2024-12-12 15:17:10 nemo_logging:393] /usr/lib64/python3.12/tarfile.py:2252: RuntimeWarning: The default behavior of tarfile extraction has been changed to disallow common exploits (including CVE-2007-4559). By default, absolute/parent paths are disallowed and some mode bits are cleared. See https://access.redhat.com/articles/7004769 for more details.
      warnings.warn(

[NeMo W 2024-12-12 15:17:11 nemo_logging:393] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
    Train config :
    manifest_filepath: null
    emb_dir: null
    sample_rate: 16000
    num_spks: 2
    soft_label_thres: 0.5
    labels: null
    batch_size: 15
    emb_batch_size: 0
    shuffle: true

[NeMo W 2024-12-12 15:17:11 nemo_logging:393] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s).
    Validation config :
    manifest_filepath: null
    emb_dir: null
    sample_rate: 16000
    num_spks: 2
    soft_label_thres: 0.5
    labels: null
    batch_size: 15
    emb_batch_size: 0
    shuffle: false

[NeMo W 2024-12-12 15:17:11 nemo_logging:393] Please call the ModelPT.setup_test_data() or ModelPT.setup_multiple_test_data() method and provide a valid configuration file to setup the test data loader(s).
    Test config :
    manifest_filepath: null
    emb_dir: null
    sample_rate: 16000
    num_spks: 2
    soft_label_thres: 0.5
    labels: null
    batch_size: 15
    emb_batch_size: 0
    shuffle: false
    seq_eval_mode: false

@sadathknorket

Same issue here, is this resolved? @DrJPK @cospotato

@DrJPK

DrJPK commented Dec 12, 2024

@sadathknorket not resolved, but for some reason that I can't quite explain, the diarize_parallel.py script runs without this error for me. Unfortunately, that parallel script seems to label everything as speaker 0, so it is not working perfectly, but it is transcribing and completing. I'm thinking something upstream in NeMo has changed and is causing this issue.

@juntatalor

juntatalor commented Jan 20, 2025

Hi there. I faced the same issue (not WSL, but standalone Ubuntu 24.04). Inside a conda environment:

pip install -U nvidia-cuda-runtime-cu12 nvidia-cudnn-cu12

pip throws a dependency error for PyTorch:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. torch 2.5.1 requires nvidia-cuda-runtime-cu12==12.4.127; platform_system == "Linux" and platform_machine == "x86_64", but you have nvidia-cuda-runtime-cu12 12.6.77 which is incompatible. torch 2.5.1 requires nvidia-cudnn-cu12==9.1.0.70; platform_system == "Linux" and platform_machine == "x86_64", but you have nvidia-cudnn-cu12 9.6.0.74 which is incompatible.

... but the packages install successfully, and diarize.py no longer throws the CUDNN_STATUS_SUBLIBRARY_VERSION_MISMATCH exception.
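
After reinstalling the wheels, a quick way to confirm the mismatch is actually gone is to exercise the same cuDNN RNN path that the traceback fails in; this is a minimal sketch, not part of the project:

    import torch

    # The original error is raised from torch._cudnn_rnn_flatten_weight, which
    # cuDNN-backed RNNs call when they are moved to the GPU. If this runs
    # cleanly, the cuDNN sub-library mismatch is resolved.
    rnn = torch.nn.LSTM(input_size=16, hidden_size=16).cuda()  # triggers flatten_parameters()
    out, _ = rnn(torch.randn(8, 2, 16, device="cuda"))
    print("cuDNN RNN OK, cuDNN version:", torch.backends.cudnn.version())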

@eq-gdekhayser

I can verify I had the SAME issue, applied the "fix" above (reinstalling the NVIDIA wheels), got the SAME pip resolver error but the same successful install, and the cuDNN mismatch was resolved. Very weird, but all's well that ends well.

@Investroj

Is there no solution for this yet?

@MahmoudAshraf97
Owner

This is not an issue that will be solved in this project; you just need to configure all your CUDA libraries correctly, which can be hard.
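
For anyone attempting that configuration check, one hedged sketch is to compare the cuDNN build torch loads with whatever libcudnn the system's dynamic linker would pick up (cudnnGetVersion is part of the public cuDNN API; the rest is plain ctypes plumbing and assumes a Linux system with ldconfig available):

    import ctypes
    import ctypes.util
    import torch

    # cuDNN that the installed torch build reports:
    print("torch reports cuDNN:", torch.backends.cudnn.version())

    # cuDNN the system's dynamic linker would find (possibly a different copy):
    path = ctypes.util.find_library("cudnn")
    if path:
        libcudnn = ctypes.CDLL(path)
        libcudnn.cudnnGetVersion.restype = ctypes.c_size_t
        print("system libcudnn:", path, "->", libcudnn.cudnnGetVersion())
    else:
        print("no system-wide libcudnn found on the default search path")

If the two numbers disagree, installing the nvidia-cudnn-cu12 wheel version that torch declares in its requirements, or removing the stray system copy from the library path, is one way to resolve the mismatch.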
