11 Apr 22:07

Blaizzy

8669012

v0.0.4 Latest

What's Changed

Add CSM (Conversational Speech Model) section to README.md by @ivanfioravanti in #56
Use MLX-based SNAC vocoder for Orpheus by @lucasnewman in #62
Add Descript neural audio codec by @lucasnewman in #57
Vectorize overlap-add operation in istft by @lucasnewman in #66
Align Orpheus sampling parameters with the reference implementation by @lucasnewman in #68
Improve Kokoro generation performance by @lucasnewman in #73
Add Speech to Speech Tab to the UI by @freddyaboulton in #79
Add support for fp16 variant of Sesame by @lucasnewman in #78
Add (partially-working) voice matching support for Orpheus by @lucasnewman in #75
Bible Audiobook example by @andrepadez in #43
removed push notification to ntfy by @andrepadez in #81
Bump version by @Blaizzy in #82

New Contributors

@freddyaboulton made their first contribution in #79

Full Changelog: v0.0.3...v0.0.4

Contributors

andrepadez, ivanfioravanti, and 3 other contributors

21 Mar 23:02

Blaizzy

v0.0.3

bec84ab

What's Changed

Add verbose logging and model selection support by @ivanfioravanti in #22
Pulsating effect by @ivanfioravanti in #23
Compile the decoder for Kokoro by @lucasnewman in #24
Play audio segments as they are generated by @lucasnewman in #26
Evaluate the computation graph before returning results by @ivanfioravanti in #35
Add Mimi neural audio codec by @lucasnewman in #34
Add model for Sesame TTS by @lucasnewman in #36
Sphere speed up during audio generation by @ivanfioravanti in #40
Added more Voices by @andrepadez in #37
Feature: External API for Audiobook Generation by @sergenes in #19
Add EnCodec neural audio codec by @lucasnewman in #46
Add Suno bark by @Blaizzy in #45
Update README.md to fix lang_code error by @zboyles in #49
fix model config by @Blaizzy in #50
Add Vocos neural audio codec by @lucasnewman in #48
Fix Kokoro audio generation by @lucasnewman in #52
Add orpheus by @Blaizzy in #47
Resample and Transcribe by @chigkim in #51
Fix vocos config loading by @Blaizzy in #53

New Contributors

@andrepadez made their first contribution in #37
@sergenes made their first contribution in #19
@zboyles made their first contribution in #49
@chigkim made their first contribution in #51

Full Changelog: v0.0.2...v0.0.3

Contributors

andrepadez, ivanfioravanti, and 5 other contributors

07 Mar 22:45

Blaizzy

v0.0.2

f24355d

What's Changed

fix workflows and readme by @Blaizzy in #5
Add soundfile to requirements and Quick Start in README by @ivanfioravanti in #8
Remove librosa dependency by @lucasnewman in #11
Add support for command-line playback with the --play argument by @lucasnewman in #10
Allow receiving text input from stdin or an entry prompt by @lucasnewman in #12
Add web server and improve audio player by @ivanfioravanti in #14
Use phonemizer-fork to avoid espeak errors by @rampadc in #17

New Contributors

@Blaizzy made their first contribution in #5
@ivanfioravanti made their first contribution in #8
@lucasnewman made their first contribution in #11
@rampadc made their first contribution in #17

Full Changelog: v0.0.1...v0.0.2

Contributors

ivanfioravanti, rampadc, and 2 other contributors

28 Feb 16:41

Blaizzy

v0.0.1

301f016

Release Notes - February 28, 2025

Overview

This release introduces a fully functional MLX-Audio package with text-to-speech capabilities, complete with testing infrastructure and CI/CD integration via GitHub Actions.

New Features

Text-to-Speech Generation: Added complete generation pipeline with audio output functionality
Audio Joining: New functionality to join multiple audio segments
Model Quantization: Added support for model quantization to improve performance
GitHub Actions: Implemented CI/CD workflows for automated testing and deployment

Improvements

Kokoro MLX porting: Completed refactoring of the entire model to the MLX framework, including:
- Text encoder with BERT implementation
- Decoder with improved audio quality
- Duration, indices, and alignment target prediction
- Custom bidirectional LSTM, weight normalization for CNNs, AdaLayerNorm, and Generator layers
SafeTensors Support: Added working implementation for SafeTensors format
Pipeline Structure: Restructured the generation pipeline for better maintainability

Bug Fixes

Fixed model loading mechanism
Resolved issues with text encoder LayerNorm operation
Fixed generator functionality
Addressed issues in LSTM and AdaLayerNorm implementations
Refactored and fixed ConvWeight component

Full Changelog: https://github.com/Blaizzy/mlx-audio/commits/v0.0.1

Releases: Blaizzy/mlx-audio
