RegMix: Data Mixture as Regression for Language Model Pre-training • Paper 2407.01492 • Published Jul 1
Bootstrapping Language Models with DPO Implicit Rewards • Paper 2406.09760 • Published Jun 14