Mixing in microphone data as information source #1877
ChrisSpraaklab asked this question in Q&A (unanswered, 0 replies).
I am trying to improve diarization performance by using not only the audio through the standard pipeline, but also data about the location of the audio source relative to the microphone. For this I have recordings made with a Shure MXA910 microphone array, which delivers six distinct directional channels (lobes), each optimised for a speaker within a specified region, as well as an optimised combined channel.
I run pyannote normally on the combined channel and use its local segmentations and embeddings. Alongside the local embeddings, I compute 'energy vectors' that capture how much energy each lobe contains in the corresponding local segments, giving a matrix of shape (num_chunks, local_num_speakers, dimension) = (n, 3, 6), mirroring the local embedding matrix of shape (n, 3, 256).
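Roughly like this (a simplified sketch; `lobe_energy` and the (num_lobes, num_samples) layout are my own naming, and in practice the segment would also need to be masked by the local speaker activations):

```python
import numpy as np

def lobe_energy(lobes: np.ndarray, sr: int, start: float, end: float) -> np.ndarray:
    """Mean energy per lobe over one local segment.

    lobes: (num_lobes, num_samples) array holding the 6 MXA910 lobe channels.
    Returns a length-6 vector, L2-normalised so that only the energy
    distribution across lobes matters, not the absolute level.
    """
    segment = lobes[:, int(start * sr):int(end * sr)]
    energy = (segment ** 2).mean(axis=1)  # (num_lobes,)
    norm = np.linalg.norm(energy)
    return energy / norm if norm > 0 else energy
```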
What I am stuck on is the following: I want to merge these two information streams before global clustering by computing the distance matrix of each of the two matrices and combining them with a weighted sum. However, I am not sure how to access the distance matrix of the pyannote local embeddings; I would then feed the combined matrix into AgglomerativeClustering with `metric='precomputed'`. Does anyone know more about getting the distance matrix used internally in pyannote, and about inputting your own into the clustering step?
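Concretely, the combination I have in mind would look something like this (a sketch only: `alpha`, the helper name, and the use of scikit-learn's AgglomerativeClustering in place of pyannote's internal clustering are all my own assumptions):

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import AgglomerativeClustering

def combined_clustering(embeddings, energies, alpha=0.5, n_clusters=4):
    """Cluster on a weighted sum of two cosine-distance matrices.

    embeddings: (n, 3, 256) local speaker embeddings from pyannote.
    energies:   (n, 3, 6) per-lobe energy vectors for the same segments.
    alpha:      weight of the embedding distances (1 - alpha for energies).
    """
    # Flatten to one row per (chunk, local speaker) pair.
    emb = embeddings.reshape(-1, embeddings.shape[-1])  # (n*3, 256)
    eng = energies.reshape(-1, energies.shape[-1])      # (n*3, 6)

    # Drop rows whose embedding is NaN (e.g. inactive local speakers).
    valid = ~np.isnan(emb).any(axis=1)
    emb, eng = emb[valid], eng[valid]

    d_emb = cdist(emb, emb, metric="cosine")
    d_eng = cdist(eng, eng, metric="cosine")
    dist = alpha * d_emb + (1 - alpha) * d_eng

    # metric='precomputed' requires a linkage other than 'ward'
    # (and sklearn < 1.2 uses the 'affinity' keyword instead of 'metric').
    clusterer = AgglomerativeClustering(
        n_clusters=n_clusters, metric="precomputed", linkage="average"
    )
    return clusterer.fit_predict(dist), valid
```

Since both matrices are cosine distances in [0, 2], the weighted sum stays on a comparable scale without extra normalisation.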