---
license: mit
tags:
- mlx
---

# F5 TTS — MLX

[F5 TTS](https://arxiv.org/abs/2410.06885) for the [MLX](https://github.com/ml-explore/mlx) framework.

This model is reshaped for MLX from the original weights and is designed for use with [f5-tts-mlx](https://github.com/lucasnewman/f5-tts-mlx).

F5 TTS is a non-autoregressive, zero-shot text-to-speech system that uses a flow-matching mel spectrogram generator with a diffusion transformer (DiT).

You can listen to a [sample here](https://s3.amazonaws.com/lucasnewman.datasets/f5tts/sample.wav) that was generated in ~11 seconds on an M3 Max MacBook Pro.

See [F5-TTS](https://huggingface.co/SWivid/F5-TTS) for the original checkpoint.

## Installation

```bash
pip install f5-tts-mlx
```

## Usage

```bash
python -m f5_tts_mlx.generate --text "The quick brown fox jumped over the lazy dog."
```

If you want to use your own reference audio sample, make sure it's a mono, 24 kHz WAV file of around 5-10 seconds:

```bash
python -m f5_tts_mlx.generate \
--text "The quick brown fox jumped over the lazy dog." \
--ref-audio /path/to/audio.wav \
--ref-text "This is the caption for the reference audio."
```

You can convert an audio file to the correct format with ffmpeg like this:

```bash
ffmpeg -i /path/to/audio.wav -ac 1 -ar 24000 -sample_fmt s16 -t 10 /path/to/output_audio.wav
```

See the [f5_tts_mlx module](https://github.com/lucasnewman/f5-tts-mlx/tree/main/f5_tts_mlx) for more options to customize generation.

---

You can also run generation from Python like this:

```python
from f5_tts_mlx.generate import generate

# Further keyword arguments are elided here; see the repository for the available options.
audio = generate(text="Hello world.")
```
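
The reference audio requirements above (mono, 24 kHz, roughly 5-10 seconds) can be checked before running generation. The following is a minimal sketch that uses only Python's standard `wave` module. The helper name `check_reference_audio` and its parameters are hypothetical and not part of f5-tts-mlx, and the duration bounds simply mirror the "around 5-10 seconds" guidance rather than a hard limit enforced by the library.

```python
import wave


def check_reference_audio(path, expected_rate=24000, min_seconds=5.0, max_seconds=10.0):
    """Return a list of problems found in a WAV file (empty if none).

    Hypothetical helper, not part of f5-tts-mlx. The defaults mirror the
    README guidance: mono, 24 kHz, around 5-10 seconds.
    """
    problems = []

    # Read the header fields needed for the checks.
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        frames = wav.getnframes()

    duration = frames / rate if rate else 0.0

    if channels != 1:
        problems.append(f"expected mono audio, found {channels} channels")
    if rate != expected_rate:
        problems.append(f"expected {expected_rate} Hz, found {rate} Hz")
    if not (min_seconds <= duration <= max_seconds):
        problems.append(
            f"expected {min_seconds}-{max_seconds} s of audio, found {duration:.2f} s"
        )
    return problems
```

A possible way to use it, with the path as a placeholder:

```python
for problem in check_reference_audio("/path/to/audio.wav"):
    print("reference audio:", problem)
```

If the list is non-empty, the ffmpeg command shown earlier converts the file to the expected format.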