---
language:
- en
license: llama3
tags:
- text-classification
datasets:
- openbmb/UltraFeedback
- nvidia/HelpSteer
- Anthropic/hh-rlhf
- PKU-Alignment/PKU-SafeRLHF
- NCSOFT/offsetbias
base_model:
- sfairXC/FsfairX-LLaMA3-RM-v0.1
- meta-llama/Meta-Llama-3-8B-Instruct
---

# Model Card for Llama-3-OffsetBias-RM-8B

**Llama-3-OffsetBias-RM-8B** is a *reward model* trained on the OffsetBias dataset. It is trained to be more robust to various evaluation *biases* commonly found in evaluation models. The model is introduced in the paper **OffsetBias: Leveraging Debiased Data for Tuning Evaluators**.

## Model Details

### Model Description

**Llama-3-OffsetBias-RM-8B** uses [sfairXC/FsfairX-LLaMA3-RM-v0.1](https://huggingface.co/sfairXC/FsfairX-LLaMA3-RM-v0.1) as its base model, which is built with Meta Llama 3. An intermediate reward model is trained from [Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) using a subset of the data used in training the *FsfairX-LLaMA3-RM* model, combined with the *NCSOFT/offsetbias* dataset. The intermediate model is then merged with the *FsfairX-LLaMA3-RM* model to create **Llama-3-OffsetBias-RM-8B** (an illustrative sketch of one possible merging approach appears at the end of this card).

- **Developed by:** NC Research
- **Language(s) (NLP):** English
- **License:** META LLAMA 3 COMMUNITY LICENSE AGREEMENT
- **Finetuned from model:** [sfairXC/FsfairX-LLaMA3-RM-v0.1](https://huggingface.co/sfairXC/FsfairX-LLaMA3-RM-v0.1)

### Model Sources

- 💻 **Repository:** [https://github.com/ncsoft/offsetbias](https://github.com/ncsoft/offsetbias)
- 📜 **Paper:** [OffsetBias: Leveraging Debiased Data for Tuning Evaluators](https://arxiv.org/abs/2407.06551)
- 🤗 **Dataset:** [https://huggingface.co/datasets/NCSOFT/offsetbias](https://huggingface.co/datasets/NCSOFT/offsetbias)

## Uses

### Direct Use

```python
from transformers import AutoTokenizer, pipeline
import torch

model_name = "NCSOFT/Llama-3-OffsetBias-RM-8B"
rm_tokenizer = AutoTokenizer.from_pretrained(model_name)

# The "sentiment-analysis" pipeline is an alias for text-classification;
# the reward model is a sequence-classification model with a single score head.
# Note: `device="auto"` is not a valid torch device string; `device_map="auto"`
# (which requires the accelerate package) is used instead.
rm_pipe = pipeline(
    "sentiment-analysis",
    model=model_name,
    device_map="auto",
    tokenizer=rm_tokenizer,
    model_kwargs={"torch_dtype": torch.bfloat16},
)

pipe_kwargs = {
    "return_all_scores": True,
    "function_to_apply": "none",  # return the raw scalar reward, no softmax/sigmoid
    "batch_size": 1,
}

chat = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]

# Apply the chat template, then strip the BOS token the template inserts,
# since the pipeline's tokenizer will add it again.
test_texts = [
    rm_tokenizer.apply_chat_template(
        chat, tokenize=False, add_generation_prompt=False
    ).replace(rm_tokenizer.bos_token, "")
]
pipe_outputs = rm_pipe(test_texts, **pipe_kwargs)
rewards = [output[0]["score"] for output in pipe_outputs]
```

An illustrative pairwise-comparison sketch building on this snippet appears at the end of this card.

## Evaluation

### RewardBench Result

| Metric    | Score |
|-----------|-------|
| Chat      | 97.21 |
| Chat Hard | 80.70 |
| Safety    | 89.01 |
| Reasoning | 90.60 |

### EvalBiasBench Result

| Metric               | Score |
|----------------------|-------|
| Length               | 82.4  |
| Concreteness         | 92.9  |
| Empty Reference      | 46.2  |
| Content Continuation | 100.0 |
| Nested Instruction   | 83.3  |
| Familiar Knowledge   | 58.3  |

## Citation

```bibtex
@misc{park2024offsetbias,
      title={OffsetBias: Leveraging Debiased Data for Tuning Evaluators},
      author={Junsoo Park and Seungyeon Jwa and Meiying Ren and Daeyoung Kim and Sanghyuk Choi},
      year={2024},
      eprint={2407.06551},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
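## Pairwise Comparison (Illustrative Sketch)

Because the model outputs a scalar reward, a common use is ranking two candidate responses to the same prompt. The sketch below builds on the Direct Use snippet above (it reuses `rm_pipe`, `rm_tokenizer`, and `pipe_kwargs`); the `score_chat` helper and the example prompts are illustrative, not part of the released code.

```python
# Illustrative helper: score a single chat transcript with the reward model.
def score_chat(chat):
    text = rm_tokenizer.apply_chat_template(
        chat, tokenize=False, add_generation_prompt=False
    ).replace(rm_tokenizer.bos_token, "")
    return rm_pipe([text], **pipe_kwargs)[0][0]["score"]

prompt = {"role": "user", "content": "Explain what a reward model is in one sentence."}
chat_a = [prompt, {"role": "assistant", "content": "A reward model assigns a scalar score to a response reflecting how well it satisfies human preferences."}]
chat_b = [prompt, {"role": "assistant", "content": "It is a thing."}]

# Only the relative ordering of scores is meaningful: higher reward = preferred.
print("A preferred" if score_chat(chat_a) > score_chat(chat_b) else "B preferred")
```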
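## Weight Merging (Illustrative Sketch)

The Model Description states that the intermediate reward model was merged with *FsfairX-LLaMA3-RM*, but does not specify the merging recipe. This is **not** the authors' method; as an illustration only, the sketch below shows a plain linear average of parameters, one common merging approach. The intermediate-model path and the mixing weight `alpha` are assumptions.

```python
import torch
from transformers import AutoModelForSequenceClassification

# Load the two reward models to merge (both are single-head
# sequence-classification models). The second path is hypothetical.
model_a = AutoModelForSequenceClassification.from_pretrained(
    "sfairXC/FsfairX-LLaMA3-RM-v0.1", torch_dtype=torch.bfloat16
)
model_b = AutoModelForSequenceClassification.from_pretrained(
    "path/to/intermediate-reward-model", torch_dtype=torch.bfloat16
)

alpha = 0.5  # assumed mixing weight; the actual value is not documented here
state_a = model_a.state_dict()
state_b = model_b.state_dict()

# Element-wise linear interpolation of every parameter tensor.
merged_state = {k: alpha * state_a[k] + (1 - alpha) * state_b[k] for k in state_a}

model_a.load_state_dict(merged_state)
model_a.save_pretrained("merged-reward-model")
```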