Releases: Blaizzy/mlx-audio
v0.0.4
What's Changed
- Add CSM (Conversational Speech Model) section to README.md by @ivanfioravanti in #56
- Use MLX-based SNAC vocoder for Orpheus by @lucasnewman in #62
- Add Descript neural audio codec by @lucasnewman in #57
- Vectorize overlap-add operation in istft by @lucasnewman in #66
- Align Orpheus sampling parameters with the reference implementation by @lucasnewman in #68
- Improve Kokoro generation performance by @lucasnewman in #73
- Add Speech to Speech Tab to the UI by @freddyaboulton in #79
- Add support for fp16 variant of Sesame by @lucasnewman in #78
- Add (partially-working) voice matching support for Orpheus by @lucasnewman in #75
- Bible Audiobook example by @andrepadez in #43
- Remove push notification to ntfy by @andrepadez in #81
- Bump version by @Blaizzy in #82
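
For readers unfamiliar with the "vectorize overlap-add" entry above, here is a minimal NumPy sketch of the general idea. It is not the code from #66: the function names, the `(num_frames, frame_len)` layout, and the use of NumPy rather than MLX are all assumptions made for illustration. The loop version adds each frame into the output one at a time; the vectorized version builds an index matrix and performs one scatter-add.

```python
import numpy as np


def overlap_add_loop(frames, hop):
    """Reference version: add each frame into the output one at a time."""
    num_frames, frame_len = frames.shape
    out = np.zeros((num_frames - 1) * hop + frame_len, dtype=frames.dtype)
    for t in range(num_frames):
        out[t * hop : t * hop + frame_len] += frames[t]
    return out


def overlap_add_vectorized(frames, hop):
    """Same result using a single scatter-add instead of a Python loop."""
    num_frames, frame_len = frames.shape
    out = np.zeros((num_frames - 1) * hop + frame_len, dtype=frames.dtype)
    # Row t of idx holds the output positions covered by frame t.
    idx = np.arange(num_frames)[:, None] * hop + np.arange(frame_len)[None, :]
    # np.add.at accumulates correctly where indices repeat (overlapping frames).
    np.add.at(out, idx.ravel(), frames.ravel())
    return out


if __name__ == "__main__":
    frames = np.arange(12, dtype=np.float64).reshape(3, 4)
    print(overlap_add_vectorized(frames, hop=2))
```

Both functions should agree for any frame matrix and hop size; the vectorized form simply moves the accumulation out of the Python interpreter loop.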
New Contributors
- @freddyaboulton made their first contribution in #79
Full Changelog: v0.0.3...v0.0.4
v0.0.3
What's Changed
- Add verbose logging and model selection support by @ivanfioravanti in #22
- Pulsating effect by @ivanfioravanti in #23
- Compile the decoder for Kokoro by @lucasnewman in #24
- Play audio segments as they are generated by @lucasnewman in #26
- Evaluate the computation graph before returning results by @ivanfioravanti in #35
- Add Mimi neural audio codec by @lucasnewman in #34
- Add model for Sesame TTS by @lucasnewman in #36
- Sphere speed up during audio generation by @ivanfioravanti in #40
- Added more Voices by @andrepadez in #37
- Feature: External API for Audiobook Generation by @sergenes in #19
- Add EnCodec neural audio codec by @lucasnewman in #46
- Add Suno Bark by @Blaizzy in #45
- Update README.md to fix lang_code error by @zboyles in #49
- fix model config by @Blaizzy in #50
- Add Vocos neural audio codec by @lucasnewman in #48
- Fix Kokoro audio generation by @lucasnewman in #52
- Add Orpheus by @Blaizzy in #47
- Resample and Transcribe by @chigkim in #51
- Fix vocos config loading by @Blaizzy in #53
New Contributors
- @andrepadez made their first contribution in #37
- @sergenes made their first contribution in #19
- @zboyles made their first contribution in #49
- @chigkim made their first contribution in #51
Full Changelog: v0.0.2...v0.0.3
v0.0.2
What's Changed
- fix workflows and readme by @Blaizzy in #5
- Add soundfile to requirements and Quick Start in README by @ivanfioravanti in #8
- Remove librosa dependency by @lucasnewman in #11
- Add support for command-line playback with the --play argument by @lucasnewman in #10
- Allow receiving text input from stdin or an entry prompt by @lucasnewman in #12
- Add web server and improve audio player by @ivanfioravanti in #14
- Use phonemizer-fork to avoid espeak errors by @rampadc in #17
New Contributors
- @Blaizzy made their first contribution in #5
- @ivanfioravanti made their first contribution in #8
- @lucasnewman made their first contribution in #11
- @rampadc made their first contribution in #17
Full Changelog: v0.0.1...v0.0.2
v0.0.1
Release Notes - February 28, 2025
Overview
This release introduces a fully functional MLX-Audio package with text-to-speech capabilities, complete with testing infrastructure and CI/CD integration via GitHub Actions.
New Features
- Text-to-Speech Generation: Added complete generation pipeline with audio output functionality
- Audio Joining: New functionality to join multiple audio segments
- Model Quantization: Added support for model quantization to improve performance
- GitHub Actions: Implemented CI/CD workflows for automated testing and deployment
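
As a rough illustration of what joining audio segments can look like, the sketch below concatenates 1-D sample arrays with an optional stretch of silence between them. It does not use the package's API: `join_segments`, its parameters, and the sample rate used in the example are hypothetical names and values chosen for this sketch.

```python
import numpy as np


def join_segments(segments, sample_rate, gap_seconds=0.0):
    """Concatenate 1-D audio arrays, optionally inserting silence between them."""
    if not segments:
        return np.zeros(0, dtype=np.float32)
    # Silence inserted between consecutive segments (empty when gap_seconds is 0).
    gap = np.zeros(int(round(gap_seconds * sample_rate)), dtype=np.float32)
    pieces = []
    for i, seg in enumerate(segments):
        if i > 0 and gap.size:
            pieces.append(gap)
        pieces.append(np.asarray(seg, dtype=np.float32))
    return np.concatenate(pieces)


if __name__ == "__main__":
    a = np.ones(100, dtype=np.float32)
    b = np.ones(50, dtype=np.float32)
    joined = join_segments([a, b], sample_rate=1000, gap_seconds=0.01)
    print(joined.shape)
```

All segments are assumed to share one sample rate; mixing rates would require resampling before joining.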
Improvements
- Kokoro MLX porting: Completed refactoring of the entire model to MLX framework:
  - Text encoder with BERT implementation
  - Decoder with improved audio quality
  - Duration, indices, and alignment target prediction
  - Custom Bidirectional LSTM, Weight norm for CNNs, AdaLayerNorm and Generator layers
- SafeTensors Support: Added working implementation for SafeTensors format
- Pipeline Structure: Restructured the generation pipeline for better maintainability
Bug Fixes
- Fixed model loading mechanism
- Resolved issues with text encoder LayerNorm operation
- Fixed generator functionality
- Addressed issues in LSTM and AdaLayerNorm implementations
- Refactored and fixed ConvWeight component
Full Changelog: https://github.com/Blaizzy/mlx-audio/commits/v0.0.1