Compose & Embellish: Piano Performance Generation Pipeline
Trained model weights and training datasets for the paper:
- Shih-Lun Wu and Yi-Hsuan Yang
"Compose & Embellish: Well-Structured Piano Performance Generation via A Two-Stage Approach."
Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2023
Note: Materials here should be used in conjunction with our model implementation GitHub repo.
Model characteristics
Stage 1: "Compose" model
Generates melody and chord progression from scratch.
- Model backbone: 12-layer Transformer w/ relative positional encoding
- Num trainable params: 41.3M
- Token vocabulary: Revamped MIDI-derived events (REMI) w/ slight modifications
- Pretraining dataset: subset of Lakh MIDI full (LMD-full), 14934 songs
  - melody extraction (and data filtering) done by matching lyrics to tracks: https://github.com/gulnazaki/lyrics-melody/blob/main/pre-processing/create_dataset.py
  - structural segmentation done with A* search: https://github.com/Dsqvival/hierarchical-structure-analysis
- Finetuning dataset: subset of AILabs.tw Pop1K7 (Pop1K7), 1591 songs
  - melody extraction done with the skyline algorithm: https://github.com/wazenmai/MIDI-BERT/blob/CP/melody_extraction/skyline/analyzer.py
  - structural segmentation done in the same way as for the pretraining dataset
- Training sequence length: 2400
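Both models use a REMI (Revamped MIDI-derived events) vocabulary, which flattens a piece into one token stream of Bar, Position, and per-note events. A minimal sketch of plain REMI-style encoding follows. It assumes 4/4 time, a hypothetical resolution of 480 ticks per beat, a 16-step grid per bar, and notes given as hypothetical (onset, duration, pitch, velocity) tuples. It omits the Tempo and Chord events and the velocity binning of full REMI, and it does not reproduce the "slight modifications" used for these models; token names and order here are illustrative only.

```python
from typing import List, Tuple

# Hypothetical note format: (onset_tick, duration_ticks, pitch, velocity)
Note = Tuple[int, int, int, int]

TICKS_PER_BEAT = 480     # assumed MIDI resolution
BEATS_PER_BAR = 4        # assumes 4/4 time throughout
POSITIONS_PER_BAR = 16   # 16th-note grid
TICKS_PER_BAR = TICKS_PER_BEAT * BEATS_PER_BAR       # 1920
TICKS_PER_POS = TICKS_PER_BAR // POSITIONS_PER_BAR   # 120


def remi_tokens(notes: List[Note]) -> List[str]:
    """Encode notes as a flat REMI-style token list (Bar / Position / note events)."""
    tokens: List[str] = []
    current_bar, last_pos = -1, None
    for onset, dur, pitch, vel in sorted(notes):
        bar = onset // TICKS_PER_BAR
        # One Bar token per bar boundary crossed, so empty bars are kept.
        while current_bar < bar:
            tokens.append("Bar")
            current_bar += 1
            last_pos = None
        pos = (onset % TICKS_PER_BAR) // TICKS_PER_POS  # onset snapped down to the grid
        if pos != last_pos:  # notes sharing a grid position share one Position token
            tokens.append(f"Position_{pos}")
            last_pos = pos
        dur_steps = max(1, round(dur / TICKS_PER_POS))  # duration in grid steps
        tokens += [f"Velocity_{vel}", f"Pitch_{pitch}", f"Duration_{dur_steps}"]
    return tokens
```

Because time is expressed relative to bar lines rather than as absolute time shifts, the model can see the metrical grid directly, which is the main motivation for REMI-style vocabularies.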
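For the finetuning data, melody is extracted with the skyline algorithm, whose core idea is to keep the highest-pitched note at each onset and trim notes so the result is monophonic. A minimal sketch of that simple variant, assuming notes are given as hypothetical (onset, offset, pitch) tuples in ticks; the linked analyzer.py may differ in details such as how sustained notes or near-simultaneous onsets are handled.

```python
from typing import List, Tuple

# Hypothetical note format: (onset_tick, offset_tick, pitch)
Note = Tuple[int, int, int]


def skyline(notes: List[Note]) -> List[Note]:
    """Reduce polyphonic notes to a monophonic top line (simple skyline variant)."""
    # 1) At each onset time, keep only the highest-pitched note.
    top_at_onset = {}
    for onset, offset, pitch in notes:
        if onset not in top_at_onset or pitch > top_at_onset[onset][2]:
            top_at_onset[onset] = (onset, offset, pitch)
    line = [top_at_onset[t] for t in sorted(top_at_onset)]

    # 2) Cut each kept note off at the next kept onset so notes never overlap.
    melody = []
    for i, (onset, offset, pitch) in enumerate(line):
        if i + 1 < len(line):
            offset = min(offset, line[i + 1][0])
        melody.append((onset, offset, pitch))
    return melody
```

In this variant, a lower note that starts while a higher note is still sounding is kept and shortens the higher one; other skyline implementations may drop such notes instead.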
Stage 2: "Embellish" model
Generates accompaniment, timing and dynamics conditioned on Stage 1 outputs.
embellish_model_gpt2_pop1k7_loss0.398.bin
- Model backbone: 12-layer GPT-2 Transformer (implementation)
- Num trainable params: 38.2M
embellish_model_pop1k7_loss0.399.bin (requires the fast-transformers package, which is outdated as of Jul. 2024)
- Model backbone: 12-layer Performer (paper, implementation)
- Num trainable params: 38.2M
- Token vocabulary: Revamped MIDI-derived events (REMI) w/ slight modifications
- Training dataset: AILabs.tw Pop1K7 (Pop1K7), 1747 songs
- Training sequence length: 3072
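Both stages are trained on fixed-length sequences (2400 tokens for Stage 1, 3072 for Stage 2). A minimal sketch of how a long token sequence could be cut into training windows of that length, assuming non-overlapping chunks with next-token targets and a hypothetical pad id of 0; the actual data loader in the implementation repo may instead sample random offsets or align chunks to bar boundaries.

```python
from typing import List, Tuple


def make_windows(tokens: List[int], seq_len: int, pad_id: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Cut a token sequence into (input, target) pairs for next-token prediction."""
    if len(tokens) < 2:
        return []  # nothing to predict
    windows = []
    # Consecutive windows share one boundary token so every transition
    # tokens[i] -> tokens[i + 1] is covered exactly once.
    for start in range(0, len(tokens) - 1, seq_len):
        chunk = tokens[start:start + seq_len + 1]
        chunk = chunk + [pad_id] * (seq_len + 1 - len(chunk))  # right-pad the last window
        windows.append((chunk[:-1], chunk[1:]))  # both exactly seq_len long
    return windows
```

With the lengths listed above, this would be called as `make_windows(tokens, 2400)` for Stage 1 and `make_windows(tokens, 3072)` for Stage 2; padded target positions would normally be masked out of the loss.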
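The second Embellish checkpoint uses a Performer backbone, which replaces softmax attention with a positive random-feature approximation so that attention cost grows linearly rather than quadratically with sequence length. Below is a generic NumPy sketch of causal Performer-style attention, checked against exact causal softmax attention. It is not the fast-transformers implementation used by the checkpoint: it assumes a single head, i.i.d. Gaussian projections (the Performer paper's FAVOR+ prefers orthogonal ones), and no feature redrawing; `num_features` and `seed` are illustrative parameters.

```python
import numpy as np


def causal_softmax_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Exact causal (autoregressive) scaled dot-product attention, for reference."""
    L, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    mask = np.tril(np.ones((L, L), dtype=bool))  # position i attends to j <= i
    scores = np.where(mask, scores, -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def positive_random_features(x: np.ndarray, omega: np.ndarray) -> np.ndarray:
    """phi(x) = exp(omega^T x - |x|^2 / 2) / sqrt(m), so E[phi(q) . phi(k)] = exp(q . k)."""
    m = omega.shape[1]
    sq_norm = 0.5 * np.sum(x ** 2, axis=-1, keepdims=True)
    return np.exp(x @ omega - sq_norm) / np.sqrt(m)


def causal_performer_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray,
                               num_features: int = 4096, seed: int = 0) -> np.ndarray:
    """Linear-time causal attention using positive random features."""
    L, d = q.shape
    rng = np.random.default_rng(seed)
    omega = rng.standard_normal((d, num_features))  # i.i.d. Gaussian projections
    scale = d ** -0.25  # (scale*q) . (scale*k) == q . k / sqrt(d), matching softmax scaling
    q_feat = positive_random_features(q * scale, omega)  # (L, m)
    k_feat = positive_random_features(k * scale, omega)  # (L, m)
    # Prefix sums over j <= i replace the L x L causal mask.
    kv_prefix = np.cumsum(k_feat[:, :, None] * v[:, None, :], axis=0)  # (L, m, d_v)
    k_prefix = np.cumsum(k_feat, axis=0)  # (L, m)
    numerator = np.einsum("lm,lmd->ld", q_feat, kv_prefix)
    denominator = np.einsum("lm,lm->l", q_feat, k_prefix)
    return numerator / denominator[:, None]
```

The prefix sums are what make the causal variant possible without an L x L matrix: position i only uses key/value statistics accumulated up to i, which suits autoregressive generation of long sequences.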
BibTeX
If you find the materials useful, please consider citing our work:
@inproceedings{wu2023compembellish,
title={{Compose \& Embellish}: Well-Structured Piano Performance Generation via A Two-Stage Approach},
author={Wu, Shih-Lun and Yang, Yi-Hsuan},
booktitle={Proc. Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP)},
year={2023},
url={https://arxiv.org/pdf/2209.08212.pdf}
}