- Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models
  Paper • 2402.13064 • Published • 46
- DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
  Paper • 2402.10379 • Published • 29
- Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach
  Paper • 2405.15613 • Published • 13
- Are You Sure? Rank Them Again: Repeated Ranking For Better Preference Datasets
  Paper • 2405.18952 • Published • 10
Ville Komulainen
Villekom
AI & ML interests
NLP, text generation, semantic analysis