Add support for streaming in Orpheus #74

studiostephe · 2025-04-02T00:52:48Z

I see in the changelog for 0.0.3 that we should be able to "Play audio segments as they are generated #26", but I'm having trouble getting that to work.

I might be doing something silly! In my results below, it still saves a WAV file and only then starts playing:

% python -m mlx_audio.tts.generate --model mlx-community/orpheus-3b-0.1-ft-4bit --text "Hello world" --play               
Fetching 6 files: 100%|████████████████████████| 6/6 [00:00<00:00, 62601.55it/s]

Model: mlx-community/orpheus-3b-0.1-ft-4bit
Text: Hello world
Voice: None
Speed: 1.0x
Language: a
  0%|                                                  | 0/1200 [00:00<?, ?it/s]mx.metal.set_wired_limt is deprecated and will be removed in a future version. Use mx.set_wired_limit instead.
mx.metal.get_peak_memory is deprecated and will be removed in a future version. Use mx.get_peak_memory instead.
mx.metal.clear_cache is deprecated and will be removed in a future version. Use mx.clear_cache instead.
 10%|███▋                                   | 114/1200 [00:00<00:07, 143.25it/s]
==========
Duration:              00:00:01.365
Samples/sec:           0.7
Prompt:                1 tokens, 0.7 tokens-per-sec
Audio:                 1 samples, 0.7 samples-per-sec
Real-time factor:      1.12x
Processing time:       1.22s
Peak memory usage:     1.92GB
✅ Audio successfully generated and saving as: audio_000.wav


Blaizzy · 2025-04-11T22:01:50Z

Hey,

No, you are not. It's a missing feature actually.

At the moment, Orpheus generates all the tokens first and only then do we play the audio. Will be fixed :)

It can only stream text that you split (paragraph N).
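For readers following along, here is a minimal sketch of what segment-level streaming could look like: the text is split into paragraphs, each paragraph is synthesized on a worker thread, and earlier paragraphs play while later ones are still being generated. This is not mlx_audio's implementation. `synthesize` and `play` are hypothetical callables that you would supply (for example, a wrapper around the model and an audio output), and splitting on newlines is an assumption about how the text is segmented.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List


def split_paragraphs(text: str) -> List[str]:
    """Split text on newlines and drop empty segments."""
    return [p.strip() for p in text.split("\n") if p.strip()]


def stream_segments(
    segments: Iterable[str],
    synthesize: Callable[[str], Any],  # hypothetical: text -> audio
    play: Callable[[Any], None],       # hypothetical: blocks until audio is played
    max_buffered: int = 2,
) -> int:
    """Play each segment as soon as it is ready; return the number played."""
    buffer: "queue.Queue[Any]" = queue.Queue(maxsize=max_buffered)
    done = object()  # sentinel marking the end of generation

    def producer() -> None:
        try:
            for segment in segments:
                # Blocks when the buffer is full, so generation
                # never runs far ahead of playback.
                buffer.put(synthesize(segment))
        finally:
            buffer.put(done)

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()

    played = 0
    while True:
        audio = buffer.get()
        if audio is done:
            break
        play(audio)  # the producer keeps generating while this runs
        played += 1
    worker.join()
    return played
```

With this shape, the wait before the first sound is the generation time of the first paragraph only, not of the whole text. It does not help with a single long paragraph, which is the gap this issue is about. Note that an exception raised inside `synthesize` ends the stream early in this sketch rather than being reported to the caller.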

studiostephe · 2025-04-18T17:38:56Z

Thanks for the reply Blaizzy, and thanks for all of your work!

It would be really exciting to have streaming! I think a lot of us are working on speech-to-speech pipelines, myself included, and streaming output from the TTS is the last gap to close.

I have an M1 Ultra that can generate faster than real time with both Orpheus and CSM, and I love the results. If I could just play the first audio bytes out sooner, I'd be so happy!
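To put a rough number on "sooner", the sketch below estimates the wait before the first sound with and without chunked playback. It assumes the real-time factor means seconds of audio produced per second of processing (consistent with the log above, where 1.365 s of audio took about 1.22 s), that generation speed is constant, and that model loading and prompt processing are ignored. The function name and the 0.25 s chunk size are illustrative, not taken from mlx_audio.

```python
from typing import Optional


def time_to_first_audio(
    total_audio_s: float,
    real_time_factor: float,
    first_chunk_s: Optional[float] = None,
) -> float:
    """Estimate seconds of waiting before playback can start.

    Without chunking, the whole utterance must be generated first.
    With chunking, only the first chunk has to be ready.
    """
    if real_time_factor <= 0:
        raise ValueError("real_time_factor must be positive")
    if first_chunk_s is None:
        needed = total_audio_s
    else:
        needed = min(first_chunk_s, total_audio_s)
    return needed / real_time_factor


# Figures from the log above: 1.365 s of audio at a 1.12x real-time factor.
whole = time_to_first_audio(1.365, 1.12)
# Illustrative 0.25 s first chunk (an assumption, not a measured value).
chunked = time_to_first_audio(1.365, 1.12, first_chunk_s=0.25)
print(f"whole utterance: {whole:.2f}s, first chunk: {chunked:.2f}s")
```

Under these assumptions, a real-time factor of at least 1.0 also means later chunks are ready before playback reaches them, so the only wait the listener notices is the one for the first chunk.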

There is another project that has implemented a streaming solution for CSM, but it is CUDA based:
https://github.com/davidbrowne17/csm-streaming

I attempted to bring it over to MLX myself, but their implementation appears to use some operations that MLX does not support, and that is unfortunately over my head at this time.

Blaizzy closed this as completed Apr 11, 2025

Blaizzy reopened this Apr 11, 2025

Blaizzy changed the title ~~streaming~~ Add support for streaming in Orpheus Apr 11, 2025
