Release Notes - February 28, 2025

Overview

This release introduces a fully functional MLX-Audio package with text-to-speech capabilities, complete with testing infrastructure and CI/CD integration via GitHub Actions.

New Features

Text-to-Speech Generation: Added complete generation pipeline with audio output functionality
Audio Joining: New functionality to join multiple audio segments
Model Quantization: Added support for model quantization to reduce memory use and improve inference performance
GitHub Actions: Implemented CI/CD workflows for automated testing and deployment
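
As a rough illustration of what joining audio segments involves, here is a minimal sketch in plain Python. It is not the package's implementation: the function name `join_segments`, the `gap_seconds` parameter, and the 24000 Hz default sample rate are all assumptions made for this example, and segments are modeled as simple lists of float samples.

```python
def join_segments(segments, sample_rate=24000, gap_seconds=0.0):
    """Concatenate audio segments, optionally inserting silence between them.

    Hypothetical helper: each segment is a list of float samples that all
    share the same sample rate.
    """
    # Number of zero-valued samples that make up the silence gap.
    gap = [0.0] * int(round(sample_rate * gap_seconds))
    joined = []
    for i, segment in enumerate(segments):
        if i > 0:
            joined.extend(gap)  # silence goes between segments only
        joined.extend(segment)
    return joined


first = [0.1, 0.2]
second = [0.3]
out = join_segments([first, second], sample_rate=4, gap_seconds=0.5)
print(out)
```

Inserting silence only between segments keeps the joined clip from starting or ending with padding.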

Improvements

Kokoro MLX port: Completed the port of the entire model to the MLX framework:
- Text encoder with BERT implementation
- Decoder with improved audio quality
- Duration, indices, and alignment target prediction
- Custom bidirectional LSTM, weight normalization for CNNs, AdaLayerNorm, and Generator layers
SafeTensors Support: Added working implementation for SafeTensors format
Pipeline Structure: Restructured the generation pipeline for better maintainability
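
For readers unfamiliar with weight normalization, the sketch below shows the standard reparameterization w = g * v / ||v||, applied per output row of a 2-D weight. It is written in plain Python for clarity and is not taken from the ported code; the function name `weight_norm` and the list-of-lists layout are assumptions made for this example.

```python
import math


def weight_norm(v, g):
    """Return w where each row is w[i] = g[i] * v[i] / ||v[i]||.

    Hypothetical helper: v is a list of rows (one per output channel)
    and g holds one gain per row.
    """
    w = []
    for row, gain in zip(v, g):
        norm = math.sqrt(sum(x * x for x in row))  # L2 norm of the direction vector
        w.append([gain * x / norm for x in row])
    return w


v = [[3.0, 4.0], [0.0, 2.0]]
g = [10.0, 1.0]
w = weight_norm(v, g)
print(w)
```

After normalization, each row of `w` has an L2 norm equal to its gain, so magnitude and direction are learned separately.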

Bug Fixes

Fixed model loading mechanism
Resolved issues with the text encoder's LayerNorm operation
Fixed generator functionality
Addressed issues in LSTM and AdaLayerNorm implementations
Refactored and fixed ConvWeight component

Full Changelog: https://github.com/Blaizzy/mlx-audio/commits/v0.0.1
