---
language:
- en
license: llama3
tags:
- text-classification
datasets:
- openbmb/UltraFeedback
- nvidia/HelpSteer
- Anthropic/hh-rlhf
- PKU-Alignment/PKU-SafeRLHF
- NCSOFT/offsetbias
base_model:
- sfairXC/FsfairX-LLaMA3-RM-v0.1
- meta-llama/Meta-Llama-3-8B-Instruct
---

# Model Card for Llama-3-OffsetBias-RM-8B

**Llama-3-OffsetBias-RM-8B** is a *reward model* trained on the OffsetBias dataset. It is trained to be more robust to various evaluation *biases* commonly found in evaluation models. The model is introduced in the paper **OffsetBias: Leveraging Debiased Data for Tuning Evaluators**.

## Model Details

### Model Description

**Llama-3-OffsetBias-RM-8B** uses [sfairXC/FsfairX-LLaMA3-RM-v0.1](https://huggingface.co/sfairXC/FsfairX-LLaMA3-RM-v0.1) as its base model, which is built with Meta Llama 3. An intermediate reward model is trained from [Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) using a subset of the data used in training the *FsfairX-LLaMA3-RM* model, combined with the *NCSOFT/offsetbias* dataset. The intermediate model is then merged with the *FsfairX-LLaMA3-RM* model to create **Llama-3-OffsetBias-RM-8B** (an illustrative sketch of one possible merging approach appears at the end of this card).

- **Developed by:** NC Research
- **Language(s) (NLP):** English
- **License:** META LLAMA 3 COMMUNITY LICENSE AGREEMENT
- **Finetuned from model:** [sfairXC/FsfairX-LLaMA3-RM-v0.1](https://huggingface.co/sfairXC/FsfairX-LLaMA3-RM-v0.1)

### Model Sources

- 💻 **Repository:** [https://github.com/ncsoft/offsetbias](https://github.com/ncsoft/offsetbias)
- 📜 **Paper:** [OffsetBias: Leveraging Debiased Data for Tuning Evaluators](https://arxiv.org/abs/2407.06551)
- 🤗 **Dataset:** [https://huggingface.co/datasets/NCSOFT/offsetbias](https://huggingface.co/datasets/NCSOFT/offsetbias)

## Uses

### Direct Use

```python
from transformers import AutoTokenizer, pipeline
import torch

model_name = "NCSOFT/Llama-3-OffsetBias-RM-8B"
rm_tokenizer = AutoTokenizer.from_pretrained(model_name)

# The "sentiment-analysis" pipeline is an alias for text-classification;
# the reward model is a sequence-classification model with a single score head.
# Note: `device="auto"` is not a valid torch device string; `device_map="auto"`
# (which requires the accelerate package) is used instead.
rm_pipe = pipeline(
    "sentiment-analysis",
    model=model_name,
    device_map="auto",
    tokenizer=rm_tokenizer,
    model_kwargs={"torch_dtype": torch.bfloat16},
)

pipe_kwargs = {
    "return_all_scores": True,
    "function_to_apply": "none",  # return the raw scalar reward, no softmax/sigmoid
    "batch_size": 1,
}

chat = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]

# Apply the chat template, then strip the BOS token the template inserts,
# since the pipeline's tokenizer will add it again.
test_texts = [
    rm_tokenizer.apply_chat_template(
        chat, tokenize=False, add_generation_prompt=False
    ).replace(rm_tokenizer.bos_token, "")
]
pipe_outputs = rm_pipe(test_texts, **pipe_kwargs)
rewards = [output[0]["score"] for output in pipe_outputs]
```

An illustrative pairwise-comparison sketch building on this snippet appears at the end of this card.

## Evaluation

### RewardBench Result

| Metric    | Score |
|-----------|-------|
| Chat      | 97.21 |
| Chat Hard | 80.70 |
| Safety    | 89.01 |
| Reasoning | 90.60 |

### EvalBiasBench Result

| Metric               | Score |
|----------------------|-------|
| Length               | 82.4  |
| Concreteness         | 92.9  |
| Empty Reference      | 46.2  |
| Content Continuation | 100.0 |
| Nested Instruction   | 83.3  |
| Familiar Knowledge   | 58.3  |

## Citation

```bibtex
@misc{park2024offsetbias,
      title={OffsetBias: Leveraging Debiased Data for Tuning Evaluators},
      author={Junsoo Park and Seungyeon Jwa and Meiying Ren and Daeyoung Kim and Sanghyuk Choi},
      year={2024},
      eprint={2407.06551},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
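## Pairwise Comparison (Illustrative Sketch)

Because the model outputs a scalar reward, a common use is ranking two candidate responses to the same prompt. The sketch below builds on the Direct Use snippet above (it reuses `rm_pipe`, `rm_tokenizer`, and `pipe_kwargs`); the `score_chat` helper and the example prompts are illustrative, not part of the released code.

```python
# Illustrative helper: score a single chat transcript with the reward model.
def score_chat(chat):
    text = rm_tokenizer.apply_chat_template(
        chat, tokenize=False, add_generation_prompt=False
    ).replace(rm_tokenizer.bos_token, "")
    return rm_pipe([text], **pipe_kwargs)[0][0]["score"]

prompt = {"role": "user", "content": "Explain what a reward model is in one sentence."}
chat_a = [prompt, {"role": "assistant", "content": "A reward model assigns a scalar score to a response reflecting how well it satisfies human preferences."}]
chat_b = [prompt, {"role": "assistant", "content": "It is a thing."}]

# Only the relative ordering of scores is meaningful: higher reward = preferred.
print("A preferred" if score_chat(chat_a) > score_chat(chat_b) else "B preferred")
```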
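## Weight Merging (Illustrative Sketch)

The Model Description states that the intermediate reward model was merged with *FsfairX-LLaMA3-RM*, but does not specify the merging recipe. This is **not** the authors' method; as an illustration only, the sketch below shows a plain linear average of parameters, one common merging approach. The intermediate-model path and the mixing weight `alpha` are assumptions.

```python
import torch
from transformers import AutoModelForSequenceClassification

# Load the two reward models to merge (both are single-head
# sequence-classification models). The second path is hypothetical.
model_a = AutoModelForSequenceClassification.from_pretrained(
    "sfairXC/FsfairX-LLaMA3-RM-v0.1", torch_dtype=torch.bfloat16
)
model_b = AutoModelForSequenceClassification.from_pretrained(
    "path/to/intermediate-reward-model", torch_dtype=torch.bfloat16
)

alpha = 0.5  # assumed mixing weight; the actual value is not documented here
state_a = model_a.state_dict()
state_b = model_b.state_dict()

# Element-wise linear interpolation of every parameter tensor.
merged_state = {k: alpha * state_a[k] + (1 - alpha) * state_b[k] for k in state_a}

model_a.load_state_dict(merged_state)
model_a.save_pretrained("merged-reward-model")
```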