Mixing in microphone data as information source #1877
ChrisSpraaklab asked this question in Q&A (unanswered, 0 replies).
I am trying to improve diarization performance by using not only the audio through the standard pipeline, but also data about the location of the audio source relative to the microphone. For this I have recordings made with a Shure MXA910 microphone array, which delivers six distinct directional channels (lobes), each optimised for a speaker within a specified region, as well as an optimised combined channel.
I run pyannote normally on the combined channel and use its local segmentations and embeddings. Alongside the local embeddings, I compute 'energy vectors' that capture how much energy each lobe contains in the corresponding local segments, giving a matrix of shape (num_chunks, local_num_speakers, dimension) = (n, 3, 6), mirroring the local embedding matrix of shape (n, 3, 256).
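Roughly like this (a simplified sketch; `lobe_energy` and the (num_lobes, num_samples) layout are my own naming, and in practice the segment would also need to be masked by the local speaker activations):

```python
import numpy as np

def lobe_energy(lobes: np.ndarray, sr: int, start: float, end: float) -> np.ndarray:
    """Mean energy per lobe over one local segment.

    lobes: (num_lobes, num_samples) array holding the 6 MXA910 lobe channels.
    Returns a length-6 vector, L2-normalised so that only the energy
    distribution across lobes matters, not the absolute level.
    """
    segment = lobes[:, int(start * sr):int(end * sr)]
    energy = (segment ** 2).mean(axis=1)  # (num_lobes,)
    norm = np.linalg.norm(energy)
    return energy / norm if norm > 0 else energy
```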
What I am stuck on is the following: I want to merge these two information streams before global clustering by computing the distance matrix of each of the two matrices and combining them with a weighted sum. However, I am not sure how to access the distance matrix of the pyannote local embeddings; I would then feed the combined matrix into AgglomerativeClustering with `metric='precomputed'`. Does anyone know more about getting the distance matrix used internally in pyannote, and about inputting your own into the clustering step?
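Concretely, the combination I have in mind would look something like this (a sketch only: `alpha`, the helper name, and the use of scikit-learn's AgglomerativeClustering in place of pyannote's internal clustering are all my own assumptions):

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import AgglomerativeClustering

def combined_clustering(embeddings, energies, alpha=0.5, n_clusters=4):
    """Cluster on a weighted sum of two cosine-distance matrices.

    embeddings: (n, 3, 256) local speaker embeddings from pyannote.
    energies:   (n, 3, 6) per-lobe energy vectors for the same segments.
    alpha:      weight of the embedding distances (1 - alpha for energies).
    """
    # Flatten to one row per (chunk, local speaker) pair.
    emb = embeddings.reshape(-1, embeddings.shape[-1])  # (n*3, 256)
    eng = energies.reshape(-1, energies.shape[-1])      # (n*3, 6)

    # Drop rows whose embedding is NaN (e.g. inactive local speakers).
    valid = ~np.isnan(emb).any(axis=1)
    emb, eng = emb[valid], eng[valid]

    d_emb = cdist(emb, emb, metric="cosine")
    d_eng = cdist(eng, eng, metric="cosine")
    dist = alpha * d_emb + (1 - alpha) * d_eng

    # metric='precomputed' requires a linkage other than 'ward'
    # (and sklearn < 1.2 uses the 'affinity' keyword instead of 'metric').
    clusterer = AgglomerativeClustering(
        n_clusters=n_clusters, metric="precomputed", linkage="average"
    )
    return clusterer.fit_predict(dist), valid
```

Since both matrices are cosine distances in [0, 2], the weighted sum stays on a comparable scale without extra normalisation.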