Training language models to follow instructions with human feedback (arXiv:2203.02155, published Mar 4, 2022)
Direct Preference-based Policy Optimization without Reward Modeling (arXiv:2301.12842, published Jan 30, 2023)
Woodpecker: Hallucination Correction for Multimodal Large Language Models (arXiv:2310.16045, published Oct 24, 2023)
DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models (arXiv:2305.16381, published May 25, 2023)
Secrets of RLHF in Large Language Models Part I: PPO (arXiv:2307.04964, published Jul 11, 2023)