- Contrastive Preference Learning: Learning from Human Feedback without RL
  Paper • 2310.13639 • Published • 24
- RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
  Paper • 2309.00267 • Published • 47
- Diffusion Model Alignment Using Direct Preference Optimization
  Paper • 2311.12908 • Published • 47
- RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
  Paper • 2312.00849 • Published • 8
Collections including paper arxiv:2310.13639

- Tuna: Instruction Tuning using Feedback from Large Language Models
  Paper • 2310.13385 • Published • 10
- Contrastive Preference Learning: Learning from Human Feedback without RL
  Paper • 2310.13639 • Published • 24
- Teaching Language Models to Self-Improve through Interactive Demonstrations
  Paper • 2310.13522 • Published • 11
- Zephyr: Direct Distillation of LM Alignment
  Paper • 2310.16944 • Published • 121

- BitNet: Scaling 1-bit Transformers for Large Language Models
  Paper • 2310.11453 • Published • 96
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
  Paper • 2310.11511 • Published • 74
- In-Context Learning Creates Task Vectors
  Paper • 2310.15916 • Published • 41
- Matryoshka Diffusion Models
  Paper • 2310.15111 • Published • 40

- Efficient RLHF: Reducing the Memory Usage of PPO
  Paper • 2309.00754 • Published • 13
- Statistical Rejection Sampling Improves Preference Optimization
  Paper • 2309.06657 • Published • 13
- Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation?
  Paper • 2309.07462 • Published • 3
- Stabilizing RLHF through Advantage Model and Selective Rehearsal
  Paper • 2309.10202 • Published • 9