Taming Overconfidence in LLMs: Reward Calibration in RLHF • Paper • arXiv 2410.09724