Commit fd45708

Merge pull request #977 from hanasay/main
Convert audio to mono while extract speech token
2 parents 95e99e0 + 296ed4f commit fd45708

1 file changed

+3
-0
lines changed

Diff for: tools/extract_speech_token.py

@@ -27,6 +27,9 @@ def single_job(utt):
     audio, sample_rate = torchaudio.load(utt2wav[utt], backend='soundfile')
     if sample_rate != 16000:
         audio = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=16000)(audio)
+    # Convert audio to mono
+    if audio.shape[0] > 1:
+        audio = audio.mean(dim=0, keepdim=True)
     if audio.shape[1] / 16000 > 30:
         logging.warning('do not support extract speech token for audio longer than 30s')
         speech_token = []
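
The added lines average all channels into one while keeping the leading channel axis, so the tensor stays in (1, num_samples) shape. The following is a minimal sketch of that arithmetic using only the Python standard library, with nested lists standing in for the tensor. The helper name `to_mono` and the sample values are hypothetical and not part of this commit.

```python
from statistics import fmean


def to_mono(channels):
    """Average a [num_channels][num_samples] nested list into one channel.

    Illustrates what audio.mean(dim=0, keepdim=True) computes in the diff:
    the result keeps a leading channel axis of length 1.
    """
    if len(channels) <= 1:
        # Already mono: same guard as `if audio.shape[0] > 1` in the diff.
        return channels
    # zip(*channels) yields one tuple per sample index, holding that
    # sample's value in every channel; fmean averages across channels.
    return [[fmean(frame) for frame in zip(*channels)]]


# Hypothetical two-channel input with three samples per channel.
stereo = [[0.5, -0.5, 1.0], [0.0, 0.5, 1.0]]
mono = to_mono(stereo)
print(len(mono), len(mono[0]))  # channel count, sample count
print(mono)
```

Note that averaging can attenuate content that is out of phase between channels (the second sample above averages to zero), which is a general property of a mean downmix.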
