update README
README.md
CHANGED
@@ -11,7 +11,7 @@ Audio can be represented as images by transforming to a [mel spectrogram](https:
A DDPM model is trained on a set of mel spectrograms that have been generated from a directory of audio files. It is then used to synthesize similar mel spectrograms, which are then converted back into audio. See the `test-model.ipynb` notebook for an example.

## Generate Mel spectrogram dataset from directory of audio files
-
+#### Training can be run with Mel spectrograms of resolution 64x64 on a single commercial grade GPU (e.g. RTX 2080 Ti). The `hop_length` should be set to 1024 for better results.

```bash
python src/audio_to_images.py \
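The hunk above covers the step that turns a directory of audio files into mel spectrogram images. As a rough illustration of that transform (not the repository's `audio_to_images.py`), the sketch below converts one audio file into a 64x64 greyscale spectrogram image with librosa and PIL; the sample rate, dB normalisation and file names are assumptions.

```python
# Illustrative sketch only: convert a single audio file into a 64x64 greyscale
# mel-spectrogram image. Parameters (sample rate, dB range, file names) are
# assumptions, not necessarily the values used by the repository's script.
import numpy as np
import librosa
from PIL import Image

y, sr = librosa.load("example.wav", sr=22050)                # hypothetical input file
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64,  # 64 mel bins -> 64-pixel height
                                     hop_length=1024)        # hop_length 1024 as suggested above
log_mel = librosa.power_to_db(mel, ref=np.max)               # power spectrogram -> dB scale

# Normalise the dB values to 0-255 and save a 64-frame crop as a greyscale image.
scaled = (log_mel - log_mel.min()) / (log_mel.max() - log_mel.min() + 1e-8)
Image.fromarray((255 * scaled[:, :64]).astype(np.uint8)).save("mel_64x64.png")
```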
@@ -21,7 +21,7 @@ python src/audio_to_images.py \
--output_dir data-test
```

-
+#### Generate dataset of 256x256 Mel spectrograms and push to hub (you will need to be authenticated with `huggingface-cli login`).

```bash
python src/audio_to_images.py \
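The 256x256 variant in this hunk pushes the generated dataset to the Hugging Face Hub. Assuming it lands under the id passed to `--push_to_hub` (presumably `teticio/audio-diffusion-256`; the backslash in the README reads like a typo), it can be pulled back down with the `datasets` library. The split and column layout below are guesses to verify interactively.

```python
# Sketch of consuming the pushed dataset; the repository id and the presence of
# a "train" split are assumptions to check against the Hub page.
from datasets import load_dataset

ds = load_dataset("teticio/audio-diffusion-256", split="train")
print(ds.features)   # confirm how the mel spectrogram images are stored
print(ds[0])         # inspect the first example
```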
@@ -31,7 +31,7 @@ python src/audio_to_images.py \
--push_to_hub teticio\audio-diffusion-256
```
## Train model
-
+#### Run training on local machine.

```bash
accelerate launch --config_file accelerate_local.yaml \
@@ -48,7 +48,7 @@ accelerate launch --config_file accelerate_local.yaml \
--mixed_precision no
```

-
+#### Run training on local machine with `batch_size` of 1 and `gradient_accumulation_steps` 16 to compensate, so that 256x256 resolution model fits on commercial grade GPU.

```bash
accelerate launch --config_file accelerate_local.yaml \
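The note added in this hunk relies on gradient accumulation: with `batch_size` 1 and `gradient_accumulation_steps` 16, sixteen single-image backward passes are accumulated before each optimizer step, giving an effective batch size of 16 while only one 256x256 sample occupies GPU memory at a time. The toy sketch below shows the general pattern with the accelerate API; it is not the repository's training script.

```python
# Toy illustration of gradient accumulation with accelerate: 16 micro-batches of
# size 1 are accumulated per optimizer step (effective batch size 16). The model
# and data are stand-ins, not the repository's UNet or spectrogram dataset.
import torch
from accelerate import Accelerator

model = torch.nn.Linear(8, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
batches = [(torch.randn(1, 8), torch.randn(1, 1)) for _ in range(64)]  # micro-batches of size 1

accelerator = Accelerator(gradient_accumulation_steps=16)
model, optimizer = accelerator.prepare(model, optimizer)

for x, y in batches:
    with accelerator.accumulate(model):   # accelerate defers gradient sync/step until 16 batches accumulate
        loss = torch.nn.functional.mse_loss(model(x), y)
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
```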
@@ -65,7 +65,7 @@ accelerate launch --config_file accelerate_local.yaml \
--mixed_precision no
```

-
+#### Run training on SageMaker.

```bash
accelerate launch --config_file accelerate_sagemaker.yaml \
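For the generation side described at the top of the diff (synthesize a mel spectrogram with the trained DDPM, then convert it back into audio), the repository's `test-model.ipynb` is the authoritative reference. The sketch below only gestures at the idea using the generic `diffusers` DDPM pipeline and a Griffin-Lim inversion from librosa; the hub id, the dB pixel mapping, and the sample rate and hop length used for inversion are all assumptions that may not match the actual checkpoint.

```python
# Rough sketch of "generate a mel spectrogram, then convert it back to audio".
# Not the repository's test-model.ipynb: the hub id, the 80 dB pixel mapping and
# the sr/hop_length used for inversion are assumptions.
import numpy as np
import librosa
import soundfile as sf
from diffusers import DDPMPipeline

pipe = DDPMPipeline.from_pretrained("teticio/audio-diffusion-256")  # assumed to load as a plain DDPM
image = pipe(batch_size=1).images[0]                                # generated greyscale spectrogram

# Map pixels back to dB (assuming an 80 dB range), undo the dB scaling, invert with Griffin-Lim.
log_mel = np.array(image.convert("L")).astype(np.float32) / 255.0 * 80.0 - 80.0
mel = librosa.db_to_power(log_mel)
audio = librosa.feature.inverse.mel_to_audio(mel, sr=22050, hop_length=1024)
sf.write("generated.wav", audio, 22050)
```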