Add support for streaming in Orpheus #74

Open
studiostephe opened this issue Apr 2, 2025 · 2 comments
studiostephe commented Apr 2, 2025

I see in the changelog for 0.0.3 that we should be able to "Play audio segments as they are generated #26", but I'm having trouble getting that to work.

I might be doing something silly! Here are my results. It still saves a WAV file first and only starts playing afterwards:

% python -m mlx_audio.tts.generate --model mlx-community/orpheus-3b-0.1-ft-4bit --text "Hello world" --play               
Fetching 6 files: 100%|████████████████████████| 6/6 [00:00<00:00, 62601.55it/s]

Model: mlx-community/orpheus-3b-0.1-ft-4bit
Text: Hello world
Voice: None
Speed: 1.0x
Language: a
  0%|                                                  | 0/1200 [00:00<?, ?it/s]mx.metal.set_wired_limt is deprecated and will be removed in a future version. Use mx.set_wired_limit instead.
mx.metal.get_peak_memory is deprecated and will be removed in a future version. Use mx.get_peak_memory instead.
mx.metal.clear_cache is deprecated and will be removed in a future version. Use mx.clear_cache instead.
 10%|███▋                                   | 114/1200 [00:00<00:07, 143.25it/s]
==========
Duration:              00:00:01.365
Samples/sec:           0.7
Prompt:                1 tokens, 0.7 tokens-per-sec
Audio:                 1 samples, 0.7 samples-per-sec
Real-time factor:      1.12x
Processing time:       1.22s
Peak memory usage:     1.92GB
✅ Audio successfully generated and saving as: audio_000.wav
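
As a side note on reading the summary above: the reported real-time factor is consistent with audio duration divided by processing time. That definition is an assumption inferred from the numbers in this log, not something the tool's documentation is quoted on here. A minimal sketch of the arithmetic, with hypothetical helper names:

```python
def parse_duration(stamp: str) -> float:
    """Convert an HH:MM:SS.mmm stamp to seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def real_time_factor(audio_seconds: float, processing_seconds: float) -> float:
    # Assumed definition: audio duration divided by processing time,
    # so values above 1.0 mean generation is faster than playback.
    return audio_seconds / processing_seconds


# Values copied from the log above.
rtf = real_time_factor(parse_duration("00:00:01.365"), 1.22)
print(f"{rtf:.2f}x")
```

Under that assumption, a factor above 1.0 means audio is produced faster than it plays back, which is the precondition for streaming to be useful at all.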

Blaizzy (Owner) commented Apr 11, 2025

Hey,

No, you are not doing anything silly. It's actually a missing feature.

At the moment, Orpheus generates all the tokens first and only then do we play the audio. Will be fixed :)

For now, it can only stream text that you split (paragraph N).
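
To illustrate what segment-level streaming means here, below is a minimal sketch, not mlx-audio's actual code. It assumes the text is split on newlines (the separator is a guess; the comment above only says the text must be split), and `synthesize` is a hypothetical stand-in for whatever turns one segment into audio samples. Each segment is still generated in full before it is yielded, so the first sound can only arrive after the first whole segment is done.

```python
from typing import Callable, Iterator, List


def split_segments(text: str, sep: str = "\n") -> List[str]:
    """Split text on a separator, dropping empty pieces."""
    return [part.strip() for part in text.split(sep) if part.strip()]


def stream_segments(
    text: str,
    synthesize: Callable[[str], List[float]],
    sep: str = "\n",
) -> Iterator[List[float]]:
    """Yield the audio for each segment as soon as that segment is finished."""
    for segment in split_segments(text, sep):
        # The whole segment is synthesized before anything is handed out.
        yield synthesize(segment)


def fake_synthesize(segment: str) -> List[float]:
    # Hypothetical stand-in for a TTS model: one silent sample per character.
    return [0.0] * len(segment)


for audio in stream_segments("Hello world\n\nSecond paragraph", fake_synthesize):
    print(len(audio), "samples ready to play")
```

With a single short prompt such as "Hello world" there is only one segment, which matches the behaviour reported in this issue: nothing plays until generation has finished.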

Blaizzy closed this as completed Apr 11, 2025
Blaizzy reopened this Apr 11, 2025
Blaizzy changed the title from "streaming" to "Add support for streaming in Orpheus" Apr 11, 2025

studiostephe (Author) commented

Thanks for the reply Blaizzy, and thanks for all of your work!

It would be really exciting to have streaming! I think a lot of us are working on speech-to-speech pipelines, myself included, and streaming output from the TTS is the last gap to close.

I have an M1 Ultra that can generate faster than real time with both Orpheus and CSM, and I love the results. If I could just play the first audio bytes out sooner, I'd be so happy!

There is another project that has implemented a streaming solution for CSM, but it is CUDA-based:
https://github.com/davidbrowne17/csm-streaming

I attempted to port it to MLX myself, but their implementation appears to use some operations that MLX does not support, and that is unfortunately over my head at this time.
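
For anyone prototyping the "play the first audio bytes sooner" idea, here is a rough sketch of the playback side only, under stated assumptions. It presumes the model can already hand out audio in small chunks (which is exactly the part Orpheus is missing today), and both `fake_chunks` and the `play` callback are hypothetical placeholders. Generation runs in a background thread and feeds a bounded queue, so playback can begin as soon as the first chunk arrives.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator, List

_DONE = object()  # sentinel marking the end of the stream


def play_while_generating(
    chunks: Iterable[List[float]],
    play: Callable[[List[float]], None],
    max_buffered: int = 8,
) -> int:
    """Generate in a background thread and play chunks as they arrive.

    Returns the number of chunks played.
    """
    buffer: queue.Queue = queue.Queue(maxsize=max_buffered)

    def producer() -> None:
        try:
            for chunk in chunks:
                buffer.put(chunk)  # blocks when the player falls behind
        finally:
            buffer.put(_DONE)  # always signal the end, even on error

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()

    played = 0
    while True:
        item = buffer.get()
        if item is _DONE:
            break
        play(item)
        played += 1
    worker.join()
    return played


def fake_chunks(n: int = 3, size: int = 4) -> Iterator[List[float]]:
    # Hypothetical stand-in for a model that yields audio incrementally.
    for i in range(n):
        yield [float(i)] * size


received: List[List[float]] = []
count = play_while_generating(fake_chunks(), received.append)
print(count, "chunks played")
```

In a real pipeline `play` would write each chunk to an audio output device instead of appending to a list. The bounded queue keeps memory use flat when generation runs faster than real time.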
