jiazhengli/Pythia-2.8B-HH-RLHF-Iterative-SamPO
Text Generation
Resources for the EMNLP 2024 paper "Eliminating Biased Length Reliance of Direct Preference Optimization via Down-Sampled KL Divergence".