Collections

PERL: Parameter Efficient Reinforcement Learning from Human Feedback

Paper • 2403.10704 • Published Mar 15 • 57
WARM: On the Benefits of Weight Averaged Reward Models

Paper • 2401.12187 • Published Jan 22 • 17
RewardBench: Evaluating Reward Models for Language Modeling

Paper • 2403.13787 • Published Mar 20 • 21
DreamReward: Text-to-3D Generation with Human Preference

Paper • 2403.14613 • Published Mar 21 • 35

Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models

Paper • 2310.04406 • Published Oct 6, 2023 • 8
Chain-of-Thought Reasoning Without Prompting

Paper • 2402.10200 • Published Feb 15 • 99
ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization

Paper • 2402.09320 • Published Feb 14 • 6
Self-Discover: Large Language Models Self-Compose Reasoning Structures

Paper • 2402.03620 • Published Feb 6 • 109

Qualitatively characterizing neural network optimization problems

Averaging Weights Leads to Wider Optima and Better Generalization

Merging Models with Fisher-Weighted Averaging

Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time

WARM: On the Benefits of Weight Averaged Reward Models

Self-Rewarding Language Models

Secrets of RLHF in Large Language Models Part II: Reward Modeling

SliceGPT: Compress Large Language Models by Deleting Rows and Columns

FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design

WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models

Sketch2NeRF: Multi-view Sketch-guided Text-to-3D Generation

Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

Simple linear attention language models balance the recall-throughput tradeoff

ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition

Linear Transformers are Versatile In-Context Learners

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models

NEFTune: Noisy Embeddings Improve Instruction Finetuning

SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling

Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon

Pearl: A Production-ready Reinforcement Learning Agent

bigscience/bloom

WARM: On the Benefits of Weight Averaged Reward Models

LLM Performance Leaderboard

The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs

Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2

Large Language Models as Optimizers

From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting

Qualitatively characterizing neural network optimization problems

Convergent Learning: Do different neural networks learn the same representations?

Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models

Model Fusion via Optimal Transport