Show examples using "torchaudio.save" instead of "scipy.io.wavfile.write" in README
#16 opened by abellion
README.md CHANGED

````diff
@@ -185,13 +185,13 @@ model = SeamlessM4Tv2Model.from_pretrained("facebook/seamless-m4t-v2-large")
 
 # from text
 text_inputs = processor(text = "Hello, my dog is cute", src_lang="eng", return_tensors="pt")
-audio_array_from_text = model.generate(**text_inputs, tgt_lang="rus")[0].cpu().numpy().squeeze()
+audio_tensor_from_text = model.generate(**text_inputs, tgt_lang="rus")[0]
 
 # from audio
 audio, orig_freq = torchaudio.load("https://www2.cs.uic.edu/~i101/SoundFiles/preamble10.wav")
 audio = torchaudio.functional.resample(audio, orig_freq=orig_freq, new_freq=16_000) # must be a 16 kHz waveform array
 audio_inputs = processor(audios=audio, return_tensors="pt")
-audio_array_from_audio = model.generate(**audio_inputs, tgt_lang="rus")[0].cpu().numpy().squeeze()
+audio_tensor_from_audio = model.generate(**audio_inputs, tgt_lang="rus")[0]
 ```
 
 3. Listen to the audio samples either in an ipynb notebook:
@@ -200,18 +200,18 @@ audio_array_from_audio = model.generate(**audio_inputs, tgt_lang="rus")[0].cpu()
 from IPython.display import Audio
 
 sample_rate = model.sampling_rate
-Audio(audio_array_from_text, rate=sample_rate)
-# Audio(audio_array_from_audio, rate=sample_rate)
+Audio(audio_tensor_from_text.cpu().numpy().squeeze(), rate=sample_rate)
+# Audio(audio_tensor_from_audio.cpu().numpy().squeeze(), rate=sample_rate)
 ```
 
-Or save them as a `.wav` file
+Or save them as a `.wav` file:
 
 ```py
-import scipy
+import torchaudio
 
 sample_rate = model.sampling_rate
-scipy.io.wavfile.write("out_from_text.wav", rate=sample_rate, data=audio_array_from_text)
-# scipy.io.wavfile.write("out_from_audio.wav", rate=sample_rate, data=audio_array_from_audio)
+torchaudio.save(uri="out_from_text.wav", src=audio_tensor_from_text, sample_rate=sample_rate, channels_first=True)
+# torchaudio.save(uri="out_from_audio.wav", src=audio_tensor_from_audio, sample_rate=sample_rate, channels_first=True)
 ```
 For more details on using the SeamlessM4T model for inference using the 🤗 Transformers library, refer to the
 **[SeamlessM4T v2 docs](https://huggingface.co/docs/transformers/main/en/model_doc/seamless_m4t_v2)** or to this **hands-on [Google Colab](https://colab.research.google.com/github/ylacombe/scripts_and_notebooks/blob/main/v2_seamless_m4t_hugging_face.ipynb).**
````
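For reference, the changed lines assemble into roughly the following end-to-end script. This is a minimal sketch, not part of the diff: the `AutoProcessor`/`SeamlessM4Tv2Model` loading lines are assumed from the surrounding README, and it assumes a torchaudio release recent enough for `save()` to accept the `uri=` keyword. Note that `torchaudio.save` writes the generated tensor directly, so the `.cpu().numpy().squeeze()` conversion is only still needed where `IPython.display.Audio` expects a NumPy array.

```py
import torchaudio
from transformers import AutoProcessor, SeamlessM4Tv2Model

# Loading lines assumed from the surrounding README (not shown in this diff).
processor = AutoProcessor.from_pretrained("facebook/seamless-m4t-v2-large")
model = SeamlessM4Tv2Model.from_pretrained("facebook/seamless-m4t-v2-large")

# Speech from text: the first element of generate()'s output is the waveform tensor.
text_inputs = processor(text="Hello, my dog is cute", src_lang="eng", return_tensors="pt")
audio_tensor_from_text = model.generate(**text_inputs, tgt_lang="rus")[0]

# Speech from speech: the source audio must be resampled to the 16 kHz the model expects.
audio, orig_freq = torchaudio.load("https://www2.cs.uic.edu/~i101/SoundFiles/preamble10.wav")
audio = torchaudio.functional.resample(audio, orig_freq=orig_freq, new_freq=16_000)
audio_inputs = processor(audios=audio, return_tensors="pt")
audio_tensor_from_audio = model.generate(**audio_inputs, tgt_lang="rus")[0]

# Write both generated waveforms to .wav files, passing the tensors straight to torchaudio.
sample_rate = model.sampling_rate  # as in the README snippet; some transformers versions expose this as model.config.sampling_rate
torchaudio.save(uri="out_from_text.wav", src=audio_tensor_from_text, sample_rate=sample_rate, channels_first=True)
torchaudio.save(uri="out_from_audio.wav", src=audio_tensor_from_audio, sample_rate=sample_rate, channels_first=True)
```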