Show examples using "torchaudio.save" instead of "scipy.io.wavfile.write" in README

#16
opened by abellion
Files changed (1)
  1. README.md +8 -8
README.md CHANGED
@@ -185,13 +185,13 @@ model = SeamlessM4Tv2Model.from_pretrained("facebook/seamless-m4t-v2-large")

  # from text
  text_inputs = processor(text = "Hello, my dog is cute", src_lang="eng", return_tensors="pt")
- audio_array_from_text = model.generate(**text_inputs, tgt_lang="rus")[0].cpu().numpy().squeeze()
+ audio_tensor_from_text = model.generate(**text_inputs, tgt_lang="rus")[0]

  # from audio
  audio, orig_freq = torchaudio.load("https://www2.cs.uic.edu/~i101/SoundFiles/preamble10.wav")
  audio = torchaudio.functional.resample(audio, orig_freq=orig_freq, new_freq=16_000) # must be a 16 kHz waveform array
  audio_inputs = processor(audios=audio, return_tensors="pt")
- audio_array_from_audio = model.generate(**audio_inputs, tgt_lang="rus")[0].cpu().numpy().squeeze()
+ audio_tensor_from_audio = model.generate(**audio_inputs, tgt_lang="rus")[0]
  ```

  3. Listen to the audio samples either in an ipynb notebook:
@@ -200,18 +200,18 @@ audio_array_from_audio = model.generate(**audio_inputs, tgt_lang="rus")[0].cpu()
  from IPython.display import Audio

  sample_rate = model.sampling_rate
- Audio(audio_array_from_text, rate=sample_rate)
- # Audio(audio_array_from_audio, rate=sample_rate)
+ Audio(audio_tensor_from_text.cpu().numpy().squeeze(), rate=sample_rate)
+ # Audio(audio_tensor_from_audio.cpu().numpy().squeeze(), rate=sample_rate)
  ```

- Or save them as a `.wav` file using a third-party library, e.g. `scipy`:
+ Or save them as a `.wav` file:

  ```py
- import scipy
+ import torchaudio

  sample_rate = model.sampling_rate
- scipy.io.wavfile.write("out_from_text.wav", rate=sample_rate, data=audio_array_from_text)
- # scipy.io.wavfile.write("out_from_audio.wav", rate=sample_rate, data=audio_array_from_audio)
+ torchaudio.save(uri="out_from_text.wav", src=audio_tensor_from_text, sample_rate=sample_rate, channels_first=True)
+ # torchaudio.save(uri="out_from_audio.wav", src=audio_tensor_from_audio, sample_rate=sample_rate, channels_first=True)
  ```
  For more details on using the SeamlessM4T model for inference using the 🤗 Transformers library, refer to the
  **[SeamlessM4T v2 docs](https://huggingface.co/docs/transformers/main/en/model_doc/seamless_m4t_v2)** or to this **hands-on [Google Colab](https://colab.research.google.com/github/ylacombe/scripts_and_notebooks/blob/main/v2_seamless_m4t_hugging_face.ipynb).**
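
For context, a minimal end-to-end sketch of the text-to-speech path this change proposes. It assumes the `AutoProcessor`/`SeamlessM4Tv2Model` setup from the earlier, unchanged part of the README (not shown in this diff) and reads the output sampling rate from the model config rather than `model.sampling_rate` as in the README text:

```py
# Sketch of the proposed torchaudio.save workflow (model/processor setup assumed
# from the unchanged part of the README, which this diff does not show).
import torchaudio
from transformers import AutoProcessor, SeamlessM4Tv2Model

processor = AutoProcessor.from_pretrained("facebook/seamless-m4t-v2-large")
model = SeamlessM4Tv2Model.from_pretrained("facebook/seamless-m4t-v2-large")

# Generate Russian speech from English text. generate() returns the waveform as
# its first output, a tensor of shape (batch_size, num_samples).
text_inputs = processor(text="Hello, my dog is cute", src_lang="eng", return_tensors="pt")
audio_tensor_from_text = model.generate(**text_inputs, tgt_lang="rus")[0]

# torchaudio.save expects a (channels, time) CPU tensor; with a batch of one,
# the (1, num_samples) output can be written directly as a mono 16 kHz file.
sample_rate = model.config.sampling_rate
torchaudio.save(
    uri="out_from_text.wav",
    src=audio_tensor_from_text.cpu(),
    sample_rate=sample_rate,
    channels_first=True,
)
```

Compared with the previous `scipy.io.wavfile.write` snippet, this keeps the waveform as a torch tensor end to end, dropping the extra `.cpu().numpy().squeeze()` conversion and the `scipy` dependency, since `torchaudio` is already required for loading and resampling the input audio.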