
	---
	license: mit
	tags:
	- mlx
	---

	# F5 TTS — MLX

	[F5 TTS](https://arxiv.org/abs/2410.06885) for the [MLX](https://github.com/ml-explore/mlx) framework.

	This model is reshaped for MLX from the original weights and is designed for use with [f5-tts-mlx](https://github.com/lucasnewman/f5-tts-mlx).

	F5 TTS is a non-autoregressive, zero-shot text-to-speech system using a flow-matching mel spectrogram generator with a diffusion transformer (DiT).

	You can listen to a [sample here](https://s3.amazonaws.com/lucasnewman.datasets/f5tts/sample.wav) that was generated in ~11 seconds on an M3 Max MacBook Pro.

	See [F5-TTS](https://huggingface.co/SWivid/F5-TTS) for the original checkpoint.

	## Installation

	```bash
	pip install f5-tts-mlx
	```

	## Usage

	```bash
	python -m f5_tts_mlx.generate --text "The quick brown fox jumped over the lazy dog."
	```

	If you want to use your own reference audio sample, make sure it's a mono, 24 kHz WAV file of around 5-10 seconds:

	```bash
	python -m f5_tts_mlx.generate \
	--text "The quick brown fox jumped over the lazy dog." \
	--ref-audio /path/to/audio.wav \
	--ref-text "This is the caption for the reference audio."
	```

	You can convert an audio file to the correct format with ffmpeg like this:

	```bash
	ffmpeg -i /path/to/audio.wav -ac 1 -ar 24000 -sample_fmt s16 -t 10 /path/to/output_audio.wav
	```
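To confirm that a file meets the reference audio requirements (mono, 24 kHz, roughly 5-10 seconds) before passing it to `--ref-audio`, a small check with Python's standard `wave` module can help. This is a minimal sketch and not part of f5-tts-mlx: the function name `check_reference_audio` is hypothetical, and treating 5 and 10 seconds as hard limits is an assumption, since the guidance above only says "around".

```python
import wave


def check_reference_audio(path, expected_rate=24000, min_seconds=5.0, max_seconds=10.0):
    """Return a list of problems found; an empty list means the file looks suitable.

    Hypothetical helper, not part of f5-tts-mlx. The duration bounds are an
    assumption based on the "around 5-10 seconds" guidance.
    """
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        duration = wav.getnframes() / rate  # length in seconds

    problems = []
    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if rate != expected_rate:
        problems.append(f"expected {expected_rate} Hz, found {rate} Hz")
    if not (min_seconds <= duration <= max_seconds):
        problems.append(
            f"expected {min_seconds}-{max_seconds} s, found {duration:.1f} s"
        )
    return problems
```

Calling `check_reference_audio("/path/to/output_audio.wav")` returns an empty list when all three checks pass, so the result can be used directly in an `if` statement.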

	See [here](https://github.com/lucasnewman/f5-tts-mlx/tree/main/f5_tts_mlx) for more options to customize generation.

	---

	You can also load a pretrained model and generate speech from Python like this:

	```python
	from f5_tts_mlx.generate import generate

	# Further keyword arguments are omitted here.
	audio = generate(text="Hello world.")
	```
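If you would rather drive the command-line interface from a script, the invocation shown in the Usage section can be assembled as an argument list and run with `subprocess`. This is a sketch under stated assumptions: the helper names `build_generate_command` and `run_generate` are hypothetical, and only the `--text`, `--ref-audio`, and `--ref-text` flags that appear in this README are used.

```python
import subprocess
import sys


def build_generate_command(text, ref_audio=None, ref_text=None):
    """Build the argument list for `python -m f5_tts_mlx.generate`.

    Hypothetical helper; it only uses the flags shown in this README.
    """
    cmd = [sys.executable, "-m", "f5_tts_mlx.generate", "--text", text]
    if ref_audio is not None:
        cmd += ["--ref-audio", str(ref_audio)]
    if ref_text is not None:
        cmd += ["--ref-text", ref_text]
    return cmd


def run_generate(text, ref_audio=None, ref_text=None):
    """Run the CLI and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_generate_command(text, ref_audio, ref_text), check=True)
```

Passing the arguments as a list avoids shell quoting problems when the text contains quotes or spaces. `run_generate` requires `f5-tts-mlx` to be installed in the same Python environment.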