
Titanet-Large: Compute EER in every epoch #12881


Closed
ukemamaster opened this issue Apr 4, 2025 · 17 comments

@ukemamaster

Hi @nithinraok, to compute the test EER after every epoch, I have to set is_audio_pair: true in the titanet-large.yaml file. What should the corresponding test data manifest file look like?

@ukemamaster
Author

@nithinraok On the other hand, if I set is_audio_pair to false and pass a test-split manifest file (created while generating the train manifest from the file list by passing the --split argument to scripts/speaker_tasks/filelist_to_manifest.py), I get a CUDA Out of Memory error, even with a batch size of 1. I have 24 GB GPUs.
I really need to know what the test manifest file should look like, for both settings of is_audio_pair.

@nithinraok
Collaborator

@stevehuang52 do you have a sample manifest showing what the test manifest should look like when is_audio_pair is set to true?

@stevehuang52
Collaborator

stevehuang52 commented Apr 4, 2025

Example of a line in the manifest file when is_audio_pair=True:
{
"audio_filepath": ["/path/to/audio_wav_0.wav", "/path/to/audio_wav_1.wav"],
"duration": null, # not used but need the field, will load the whole audio
"offset": 0.0, # not used but need the field, will load the whole audio
"label": "0" # label for the pair, 0 for not the same speaker, 1 for same speaker
}
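
Purely as an illustration (not from the thread), a minimal Python sketch that writes such a pair-trial manifest as JSON lines; the trial list, paths, and output filename are hypothetical:

import json

# Hypothetical trials: (path_a, path_b, label), where "1" = same speaker, "0" = different
trials = [
    ("/path/to/audio_wav_0.wav", "/path/to/audio_wav_1.wav", "0"),
    ("/path/to/audio_wav_2.wav", "/path/to/audio_wav_1.wav", "1"),
]

with open("pair_trials_manifest.json", "w") as fout:
    for path_a, path_b, label in trials:
        entry = {
            "audio_filepath": [path_a, path_b],
            "duration": None,  # not used, but the field must be present (serialized as null)
            "offset": 0.0,     # not used, but the field must be present
            "label": label,
        }
        fout.write(json.dumps(entry) + "\n")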

@stevehuang52
Collaborator

When is_audio_pair=False, the manifest should look like a normal speaker recognition manifest:
{
"audio_filepath": "/path/to/audio_wav_0.wav",
"duration": 10.0,
"offset": 0.0,
"label": "speaker_id_000",
}
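
The thread already points to scripts/speaker_tasks/filelist_to_manifest.py for generating these. Purely to illustrate the format, a stand-alone sketch using only the Python standard library (paths and speaker labels are placeholders), computing the duration field from the wav header:

import json
import wave

# Hypothetical (wav_path, speaker_label) pairs
utterances = [
    ("/path/to/audio_wav_0.wav", "speaker_id_000"),
    ("/path/to/audio_wav_1.wav", "speaker_id_001"),
]

with open("speaker_manifest.json", "w") as fout:
    for wav_path, speaker in utterances:
        with wave.open(wav_path, "rb") as wav:
            duration = wav.getnframes() / wav.getframerate()
        fout.write(json.dumps({
            "audio_filepath": wav_path,
            "duration": duration,
            "offset": 0.0,
            "label": speaker,
        }) + "\n")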

@ukemamaster
Author

Thanks @stevehuang52, I will try it.
One more question: can I pass multiple test manifest files to compute EER for multiple sets of trials?

@stevehuang52
Collaborator

Yes, you can pass them as a list of manifests: model.validation_ds.manifest_filepath=[manifest_1.json,manifest_2.json]

@ukemamaster
Author

Great. And is it possible to pass them from the titanet-large.yaml file?

@stevehuang52
Collaborator

Yes, you can specify them directly in the yaml file.
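
For reference, a sketch of how that could look inside titanet-large.yaml; the field path is inferred from the override model.validation_ds.manifest_filepath above, and the file names are placeholders:

model:
  validation_ds:
    manifest_filepath:
      - /path/to/manifest_1.json
      - /path/to/manifest_2.json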

@ukemamaster
Author

ukemamaster commented Apr 7, 2025

Hi @stevehuang52
I created the manifest file for validation pairs. It looks like this:

{"audio_filepath": ["path/to/00000.wav", "path/to/00001.wav"], "duration": 0.0, "offset": 0.0, "label": "0"}
{"audio_filepath": ["path/to/00002.wav", "path/to/00001.wav"], "duration": 0.0, "offset": 0.0, "label": "1"}

It seems OK now, but I get this error:

Error executing job with overrides: []
Traceback (most recent call last):
  File "/NeMo/examples/speaker_tasks/recognition/my_speaker_reco.py", line 69, in main
    trainer.fit(speaker_model)
  File "nemo/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 532, in fit
    call._call_and_handle_interrupt(
  File "/nemo/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 43, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/nemo/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 571, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/nemo/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 980, in _run
    results = self._run_stage()
  File "/nemo/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1021, in _run_stage
    self._run_sanity_check()
  File "/nemo/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1050, in _run_sanity_check
    val_loop.run()
  File "/nemo/lib/python3.10/site-packages/pytorch_lightning/loops/utilities.py", line 181, in _decorator
    return loop_run(self, *args, **kwargs)
  File "/nemo/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 115, in run
    self._evaluation_step(batch, batch_idx, dataloader_idx)
  File "//nemo/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 376, in _evaluation_step
    output = call._call_strategy_hook(trainer, hook_name, *step_kwargs.values())
  File "/nemo/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 293, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/nemo/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 393, in validation_step
    return self.model.validation_step(*args, **kwargs)
  File "/NeMo/nemo/collections/asr/models/label_models.py", line 565, in validation_step
    return self.evaluation_step(batch, batch_idx, dataloader_idx, 'val')
  File "/NeMo/nemo/collections/asr/models/label_models.py", line 418, in evaluation_step
    return self.pair_evaluation_step(batch, batch_idx, dataloader_idx, tag)
  File "/NeMo/nemo/collections/asr/models/label_models.py", line 467, in pair_evaluation_step
    self._macro_accuracy.update(preds=logits, target=labels)
  File "/nemo/lib/python3.10/site-packages/torchmetrics/metric.py", line 550, in wrapped_func
    update(*args, **kwargs)
  File "/nemo/lib/python3.10/site-packages/torchmetrics/classification/stat_scores.py", line 339, in update
    _multiclass_stat_scores_tensor_validation(
  File "/nemo/lib/python3.10/site-packages/torchmetrics/functional/classification/stat_scores.py", line 283, in _multiclass_stat_scores_tensor_validation
    raise ValueError(
ValueError: If `preds` have one dimension more than `target`, `preds.shape[1]` should be equal to number of classes.

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

It seems like a shape incompatibility issue.
The shapes of logits and labels are torch.Size([64, 2]) and torch.Size([64]); here logits.shape[1]=2, while n_classes=54988 for the training set. I think the problem arises from here. The labels for the validation set have only 2 classes.

Do you think the manifest file is correct? Are the label shapes correct? Do I need to convert them to one-hot vectors?
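
For reference, a small stand-alone snippet (assuming only torch and torchmetrics are installed) that reproduces the same ValueError: the metric was constructed with the training-set class count, but the pair-evaluation logits only have two columns:

import torch
from torchmetrics import Accuracy

# Metric built with the training-set number of classes (54988 in this run)
macro_acc = Accuracy(task='multiclass', num_classes=54988, top_k=1, average='macro')

logits = torch.randn(64, 2)          # pair-eval logits: [1 - cos_sim, cos_sim]
labels = torch.randint(0, 2, (64,))  # binary same/different-speaker labels

# Raises ValueError: preds have one dimension more than target,
# so preds.shape[1] (= 2) must equal num_classes (= 54988)
macro_acc.update(preds=logits, target=labels)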

@ukemamaster
Author

ukemamaster commented Apr 7, 2025

On the other hand, if I set is_audio_pair to false and pass a normal manifest file for the test set as you said, I get a CUDA Out of Memory error, even with a batch size of 1. I have 24 GB GPUs.

@nithinraok
Collaborator

CUDA OOM also depends on the length of each of your audio samples. Keep them <= 3 sec.

@stevehuang52
Collaborator

Hi @ukemamaster, regarding the macro-accuracy error, there's a bug in the model code, which will be fixed by this PR: #12908.

Regarding the OOM error, that's probably due to the lengths of the audio files; could you please share the statistics of your audio lengths? It would also help to know where in the code the OOM occurred, since the classification layer can take a lot of GPU memory if you have a huge number of speakers and long audio. As @nithinraok suggests, we normally use audio shorter than 3 s during training with is_audio_pair=false.
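
Not part of the original exchange, but one quick way to gather those duration statistics from a (non-pair) manifest, assuming its duration fields are populated; the manifest filename is a placeholder:

import json
import statistics

durations = []
with open("train_manifest.json") as f:
    for line in f:
        durations.append(float(json.loads(line)["duration"]))

print(f"count={len(durations)}  min={min(durations):.2f}s  max={max(durations):.2f}s  "
      f"mean={statistics.mean(durations):.2f}s  median={statistics.median(durations):.2f}s")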

@ukemamaster
Author

ukemamaster commented Apr 8, 2025

Hi @stevehuang52, after incorporating the PR, I still get the same error. Printing self._macro_accuracy.num_classes inside the pair_evaluation_step() method still gives 54988.

If I re-initialize self._macro_accuracy inside the pair_evaluation_step() method, like:

self._macro_accuracy = Accuracy(num_classes=2, top_k=1, average='macro', task='multiclass').to(labels.get_device())

the training then proceeds correctly.

BUT is it safe to do that? The method looks like this:

    def pair_evaluation_step(self, batch, batch_idx, dataloader_idx: int = 0, tag: str = 'val'):
        audio_signal_1, audio_signal_len_1, audio_signal_2, audio_signal_len_2, labels, _ = batch
        _, audio_emb1 = self.forward(input_signal=audio_signal_1, input_signal_length=audio_signal_len_1)
        _, audio_emb2 = self.forward(input_signal=audio_signal_2, input_signal_length=audio_signal_len_2)

        # convert binary labels to -1, 1
        loss_labels = (labels.float() - 0.5) * 2
        cosine_sim = torch.cosine_similarity(audio_emb1, audio_emb2)
        loss_value = torch.nn.functional.mse_loss(cosine_sim, loss_labels)

        logits = torch.stack([1 - cosine_sim, cosine_sim], dim=-1)
        acc_top_k = self._accuracy(logits=logits, labels=labels)

        ################# re-initialize self._macro_accuracy
        self._macro_accuracy = Accuracy(num_classes=2, top_k=1, average='macro', task='multiclass').to(labels.get_device())

        correct_counts, total_counts = self._accuracy.correct_counts_k, self._accuracy.total_counts_k
        self._macro_accuracy.update(preds=logits, target=labels)
        stats = self._macro_accuracy._final_state()

        output = {
            f'{tag}_loss': loss_value,
            f'{tag}_correct_counts': correct_counts,
            f'{tag}_total_counts': total_counts,
            f'{tag}_acc_micro_top_k': acc_top_k,
            f'{tag}_acc_macro_stats': stats,
            f"{tag}_scores": cosine_sim,
            f"{tag}_labels": labels,
        }

        if tag == 'val':
            if isinstance(self.trainer.val_dataloaders, (list, tuple)) and len(self.trainer.val_dataloaders) > 1:
                self.validation_step_outputs[dataloader_idx].append(output)
            else:
                self.validation_step_outputs.append(output)
        else:
            if isinstance(self.trainer.test_dataloaders, (list, tuple)) and len(self.trainer.test_dataloaders) > 1:
                self.test_step_outputs[dataloader_idx].append(output)
            else:
                self.test_step_outputs.append(output)

        return output

@stevehuang52
Collaborator

Hi @ukemamaster, re-initializing the metric in the validation step is probably fine in this particular case. Meanwhile, I just updated the PR to use a separate metric, self._pair_macro_accuracy, for pair evaluation; could you please try it out?
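
A rough sketch of what such a dedicated two-class metric could look like (an assumption for illustration, not the actual PR code); it mirrors the re-initialization above but would be created once in the model's __init__ and reused in pair_evaluation_step:

from torchmetrics import Accuracy

# In __init__ (hypothetical): a fixed two-class metric for pair evaluation
self._pair_macro_accuracy = Accuracy(num_classes=2, top_k=1, average='macro', task='multiclass')

# In pair_evaluation_step, replacing the per-step re-initialization:
self._pair_macro_accuracy.update(preds=logits, target=labels)
stats = self._pair_macro_accuracy._final_state()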

@ukemamaster
Author

ukemamaster commented Apr 9, 2025

@stevehuang52 With the updated PR the training goes fine.

One question regarding the EER:

In the logs,

Epoch 0, global step 20816: 'val_loss' reached 1.23330 (best 1.23330), 
Epoch 1, global step 41632: 'val_loss' reached 1.31718 (best 1.23330), 
Epoch 2, global step 62448: 'val_loss' reached 1.14597 (best 1.14597), 
Epoch 3, global step 83264: 'val_loss' reached 0.91044 (best 0.91044), 
Epoch 4, global step 104080: 'val_loss' reached 0.81943 (best 0.81943), 
Epoch 5, global step 124896: 'val_loss' reached 0.77752 (best 0.77752), 
Epoch 6, global step 145712: 'val_loss' reached 0.75393 (best 0.75393), 

Does the val_loss refer to the EER computed from the trials? If not, where is the EER logged?

@stevehuang52
Collaborator

Does the val_loss refer to the EER computed from the trials? If not, where is the EER logged?

@ukemamaster In the paired eval case, the val_loss is the MSE loss between the predicted cosine similarity and the ground-truth labels converted to -1 and 1.

To save checkpoints based on the EER value, you need to set exp_manager.checkpoint_callback_params.monitor='val_eer'. You can refer to this example for how to configure checkpoint_callback_params.

If you only need to monitor the EER but still save checkpoints based on val_loss, you don't need the above change, and the EER values will be logged in WandB as well.
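
For reference, a hedged sketch of the corresponding exp_manager section in the yaml config; the field names follow the override above, and mode: "min" is an assumption (EER should be minimized):

exp_manager:
  checkpoint_callback_params:
    monitor: "val_eer"
    mode: "min"      # assumption: lower EER is better
    save_top_k: 3    # hypothetical value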

@ukemamaster
Author

Thanks @stevehuang52.
Now I can save model checkpoints based on val_eer.

@ashors1 ashors1 closed this as completed May 2, 2025