Compose & Embellish: Piano Performance Generation Pipeline

Trained model weights and training datasets for the paper:

Shih-Lun Wu and Yi-Hsuan Yang
"Compose & Embellish: Well-Structured Piano Performance Generation via A Two-Stage Approach."
Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2023

Note: Materials here should be used in conjunction with our model implementation GitHub repo.

Model characteristics

Stage 1: "Compose" model

Generates melody and chord progression from scratch.

Model backbone: 12-layer Transformer w/ relative positional encoding
Num trainable params: 41.3M
Token vocabulary: Revamped MIDI-derived events (REMI) w/ slight modifications
Pretraining dataset: subset of Lakh MIDI full (LMD-full), 14934 songs
- melody extraction (and data filtering) done by matching lyrics to tracks: https://github.com/gulnazaki/lyrics-melody/blob/main/pre-processing/create_dataset.py
- structural segmentation done with A* search: https://github.com/Dsqvival/hierarchical-structure-analysis
Finetuning dataset: subset of AILabs.tw Pop1K7 (Pop1K7), 1591 songs
- melody extraction done with skyline algorithm: https://github.com/wazenmai/MIDI-BERT/blob/CP/melody_extraction/skyline/analyzer.py
- structural segmentation done in the same way as for the pretraining dataset
Training sequence length: 2400
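
For readers unfamiliar with the skyline algorithm mentioned above, the following is a minimal sketch of the general idea: keep the highest-pitched note at each onset, then trim overlaps so the result is monophonic. It is a simplified illustration, not the code in the linked MIDI-BERT analyzer. The `Note` type, its tick-based fields, and the `skyline` function name are assumptions made for this example.

```python
from typing import Dict, List, NamedTuple


class Note(NamedTuple):
    onset: int   # start time in ticks (assumed unit)
    offset: int  # end time in ticks (assumed unit)
    pitch: int   # MIDI pitch number, 0-127


def skyline(notes: List[Note]) -> List[Note]:
    """Simplified skyline: highest pitch per onset, overlaps trimmed."""
    # Step 1: among notes sharing an onset, keep only the highest pitch.
    top_by_onset: Dict[int, Note] = {}
    for note in notes:
        best = top_by_onset.get(note.onset)
        if best is None or note.pitch > best.pitch:
            top_by_onset[note.onset] = note

    # Step 2: walk the onsets in time order and cut the previous note short
    # whenever it would still be sounding at the next onset.
    melody: List[Note] = []
    for onset in sorted(top_by_onset):
        note = top_by_onset[onset]
        if melody and melody[-1].offset > note.onset:
            melody[-1] = melody[-1]._replace(offset=note.onset)
        melody.append(note)
    return melody
```

For example, given a C-major chord at tick 0 followed by a single note at tick 2, `skyline` returns the top note of the chord (trimmed to end at tick 2) and then the single note. Real implementations often add further heuristics, such as discarding low notes that start while a higher note is sounding.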

Stage 2: "Embellish" model

Generates accompaniment, timing and dynamics conditioned on Stage 1 outputs.

embellish_model_gpt2_pop1k7_loss0.398.bin
- Model backbone: 12-layer GPT-2 Transformer (implementation)
- Num trainable params: 38.2M
embellish_model_pop1k7_loss0.399.bin (requires fast-transformers package, which is outdated as of Jul. 2024)
- Model backbone: 12-layer Performer (paper, implementation)
- Num trainable params: 38.2M
Token vocabulary: Revamped MIDI-derived events (REMI) w/ slight modifications
Training dataset: AILabs.tw Pop1K7 (Pop1K7), 1747 songs
Training sequence length: 3072
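
Both stages use a REMI-style token vocabulary. As a rough illustration of what such an event sequence looks like, the sketch below converts a list of notes into generic REMI-like tokens (bar markers, in-bar positions, then pitch, velocity, and duration per note). The token names, the 16-positions-per-bar grid, the 4/4 meter, and the 480 ticks-per-beat resolution are all assumptions for this example; the vocabulary actually used by the released models includes modifications that are not reproduced here.

```python
from typing import List, Optional, Tuple

# Assumed resolution and grid; not taken from the released models.
TICKS_PER_BEAT = 480
BEATS_PER_BAR = 4
POSITIONS_PER_BAR = 16
TICKS_PER_POSITION = TICKS_PER_BEAT * BEATS_PER_BAR // POSITIONS_PER_BAR


def notes_to_remi(notes: List[Tuple[int, int, int, int]]) -> List[str]:
    """Encode (onset_tick, duration_tick, pitch, velocity) notes as tokens."""
    tokens: List[str] = []
    current_bar = -1
    current_pos: Optional[int] = None
    for onset, duration, pitch, velocity in sorted(notes):
        # Quantize the onset to the position grid, then split into bar/position.
        step = round(onset / TICKS_PER_POSITION)
        bar, pos = divmod(step, POSITIONS_PER_BAR)
        # Emit one "Bar" token per bar boundary crossed (including empty bars).
        while current_bar < bar:
            tokens.append("Bar")
            current_bar += 1
            current_pos = None
        # Emit a position token only when the in-bar position changes.
        if pos != current_pos:
            tokens.append(f"Position_{pos}")
            current_pos = pos
        # Durations are expressed in grid steps, with a minimum of one step.
        dur_steps = max(1, round(duration / TICKS_PER_POSITION))
        tokens += [f"Pitch_{pitch}", f"Velocity_{velocity}", f"Duration_{dur_steps}"]
    return tokens
```

Under these assumptions, two simultaneous quarter notes at the start of a piece share a single `Bar` and `Position_0` token, which is why REMI-style sequences stay compact for chords. The training sequence lengths listed above count tokens of this kind, not notes or bars.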

BibTeX

If you find the materials useful, please consider citing our work:

@inproceedings{wu2023compembellish,
  title={{Compose \& Embellish}: Well-Structured Piano Performance Generation via A Two-Stage Approach},
  author={Wu, Shih-Lun and Yang, Yi-Hsuan},
  booktitle={Proc. Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2023},
  url={https://arxiv.org/pdf/2209.08212.pdf}
}