I have already performed the VAD step using a different model. If required, I can create a pyannote.core.Annotation object from my VAD data. I believe this data more or less represents the output of the segmentation stage. Is my understanding correct?
Now the question is: how can I feed this object into the embedding and clustering steps to perform diarization? I am currently using the develop branch.
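For concreteness, a minimal sketch of building such an Annotation from external VAD timestamps (the file URI and segment times below are placeholders):

```python
# Minimal sketch: wrapping external VAD output as a pyannote.core.Annotation.
from pyannote.core import Annotation, Segment

# Hypothetical VAD output: (start, end) times in seconds.
vad_segments = [(0.5, 3.2), (4.1, 7.8), (9.0, 12.4)]

vad = Annotation(uri="my_audio_file")
for start, end in vad_segments:
    vad[Segment(start, end)] = "SPEECH"

print(vad)  # one "SPEECH" region per detected speech segment
```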
Replies: 1 comment
There is no easy way to feed your VAD into the existing diarization pipeline, as the latter does not explicitly rely on a VAD step. You would have to design your own speaker diarization pipeline: use PretrainedSpeakerEmbedding to extract one embedding per speech region, then apply any clustering algorithm (from scikit-learn, for instance).
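A rough sketch of that approach, assuming pyannote.audio 2.x (the embedding checkpoint, audio path, speech regions, and number of speakers are all placeholders to adapt to your setup):

```python
# Sketch: extract one speaker embedding per VAD speech region, then cluster.
import numpy as np
import torch
from pyannote.audio import Audio
from pyannote.audio.pipelines.speaker_verification import PretrainedSpeakerEmbedding
from pyannote.core import Segment
from sklearn.cluster import AgglomerativeClustering

# Any supported embedding checkpoint works; this one is just an example.
embedding_model = PretrainedSpeakerEmbedding(
    "speechbrain/spkrec-ecapa-voxceleb", device=torch.device("cpu")
)
audio = Audio(sample_rate=16000, mono=True)

# Speech regions from your own VAD, as (start, end) in seconds (placeholders).
speech_regions = [(0.5, 3.2), (4.1, 7.8), (9.0, 12.4)]

embeddings = []
for start, end in speech_regions:
    # Crop the waveform to the speech region and embed it (batch dimension added).
    waveform, sample_rate = audio.crop("my_audio_file.wav", Segment(start, end))
    embeddings.append(embedding_model(waveform[None]))
embeddings = np.vstack(embeddings)

# Cluster the per-region embeddings; here with a known number of speakers.
labels = AgglomerativeClustering(n_clusters=2).fit_predict(embeddings)
for (start, end), label in zip(speech_regions, labels):
    print(f"{start:.1f}s-{end:.1f}s -> SPEAKER_{label}")
```

Note that very short regions tend to yield unreliable embeddings, so merging or filtering them beforehand may help.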