
Fixes MPS device errors from Tensor.type() when using generate_text_semantic and generate_coarse #27

Merged: 5 commits merged into serp-ai:main on May 14, 2023

Conversation

@fiq commented May 6, 2023

Addresses MPS-specific errors in generate_text_semantic and generate_coarse when calling Tensor.type() for logit handling on MPS devices. See the underlying PyTorch issue: pytorch/pytorch#78929.

Note that I've also submitted a similar PR directly to bark. I'm not clear on your syncing policy, so you may want to wait and see how that fares, although this issue applies to both bark and bark-with-voice-clone. It has fixed it for me on an M2 Pro.

Fix tested on: M2 Pro

Expected:

  • Seamless voice generation

Actual:

```
❯ SUNO_ENABLE_MPS=True python ./test-case.py
  0%|          | 0/100 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/Users/innovation/Code/bark/./test-case.py", line 57, in <module>
    audio_array = say_semantic(training_text, voice_name)
  File "/Users/innovation/Code/bark/./test-case.py", line 30, in say_semantic
    x_semantic = generate_text_semantic(
  File "/Users/innovation/Code/bark/bark/generation.py", line 479, in generate_text_semantic
    relevant_logits = relevant_logits.to(logits_device).type(logits_dtype)
ValueError: invalid type: 'torch.mps.FloatTensor'
```
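For context, the failure comes from PyTorch's legacy Tensor.type() API, which resolves dtype strings such as 'torch.FloatTensor' through a per-backend type registry; no 'torch.mps.FloatTensor' entry exists, hence the ValueError (see pytorch/pytorch#78929). A minimal sketch of the device-agnostic alternative, runnable on plain CPU, might look like:

```python
import torch

logits = torch.randn(4)

# Tensor.type() with a string goes through the legacy tensor-type
# registry ('torch.FloatTensor', 'torch.cuda.FloatTensor', ...);
# there is no MPS entry, which is what triggers the ValueError above.
# Passing a torch.dtype to .to() sidesteps that registry entirely and
# behaves the same on CPU, CUDA, and MPS.
converted = logits.to(dtype=torch.float32)
print(converted.dtype)  # torch.float32
```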

To recreate:

To recreate prior to this PR, on an MPS device (tested on M2 Pro), use this test script and run as above with SUNO_ENABLE_MPS set:

```python
from bark.generation import (
    SAMPLE_RATE,
    codec_decode,
    generate_coarse,
    generate_fine,
    generate_text_semantic,
    preload_models,
)
from scipy.io.wavfile import write as write_wav
import sounddevice as sd
import pprint

pp = pprint.PrettyPrinter()


def say_semantic(text_prompt, voice_name):
    # Load the full-size models on the GPU (MPS when SUNO_ENABLE_MPS is set).
    preload_models(
        text_use_gpu=True,
        text_use_small=False,
        coarse_use_gpu=True,
        coarse_use_small=False,
        fine_use_gpu=True,
        fine_use_small=False,
        codec_use_gpu=True,
        force_reload=True,
    )

    # Text -> semantic tokens (this is where the MPS ValueError is raised).
    x_semantic = generate_text_semantic(
        text_prompt,
        history_prompt=voice_name,
        temp=0.7,
        top_k=50,
        top_p=0.95,
    )

    # Semantic tokens -> coarse audio tokens (also affected by this fix).
    x_coarse_gen = generate_coarse(
        x_semantic,
        history_prompt=voice_name,
        temp=0.7,
        top_k=50,
        top_p=0.95,
    )

    # Coarse -> fine audio tokens, then decode to a waveform.
    x_fine_gen = generate_fine(
        x_coarse_gen,
        history_prompt=voice_name,
        temp=0.5,
    )
    return codec_decode(x_fine_gen)


voice_name = "en_speaker_0"
training_text = "Hello, there!"
audio_array = say_semantic(training_text, voice_name)

pp.pprint(audio_array)
write_wav("output.wav", SAMPLE_RATE, audio_array)
sd.play(audio_array, SAMPLE_RATE)
# allow async sd.play to complete
sd.wait()
```

Btw, I'm LOVING both bark and bark-with-voice-clone (even though it sounds nothing like my tunings yet 😂). Thanks for forking and unlocking the voice cloning!


@fiq (Author) commented May 7, 2023

I have just pushed up a simplification which shouldn't break support on other device types and reduces my previous MPS fix to two lines. Tested with device types of CPU and MPS.
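For reference, the core of the change is of this shape (a sketch based on the traceback above; the exact lines in the merged commits may differ): combining the device move and the dtype cast into a single .to() call, which accepts a torch.dtype rather than a legacy type string.

```diff
- relevant_logits = relevant_logits.to(logits_device).type(logits_dtype)
+ relevant_logits = relevant_logits.to(device=logits_device, dtype=logits_dtype)
```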

@devinschumacher (Member) commented:

thank you @fiq ! i tagged @francislabountyjr here and in our Discord group to take a look at your PR.

@fiq (Author) commented May 13, 2023

Thanks @devinschumacher.

@francislabountyjr I've pushed up a further simplification/cleanup. Tested against MPS and CPU devices.

@francislabountyjr (Member) left a comment:


Looks good! The PR makes the necessary changes to run on MPS devices while not affecting the functionality of CUDA devices.

@francislabountyjr francislabountyjr merged commit 43cbc43 into serp-ai:main May 14, 2023
maximus-sallam pushed a commit to maximus-sallam/bark-with-voice-clone that referenced this pull request Jun 5, 2023
Add key/value caching for autoregressive generation