---
license: apache-2.0
language:
- en
---
This pre-trained model was fine-tuned with Reinforcement Learning on the DIALOCONAN dataset, using facebook/roberta-hate-speech-dynabench-r4-target as the reward model.
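
As a rough illustration of how a hate-speech classifier can act as a reward model, the sketch below turns a pair of classifier logits into a scalar reward by taking the softmax probability of the "not hate" class. The function name `nothate_reward`, the assumption that the "not hate" class sits at index 0, and the example logits are all hypothetical; the reward formulation actually used for this model may differ (see 🥞RewardLM).

```python
import math


def nothate_reward(logits, nothate_index=0):
    """Return the softmax probability of the 'not hate' class as a reward.

    `nothate_index` is an assumption about the classifier's label order;
    check the reward model's configuration before relying on it.
    """
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    return exps[nothate_index] / sum(exps)


# Made-up logits: a response the classifier considers clearly non-hateful
# should receive a higher reward than one it considers hateful.
benign = nothate_reward([2.0, -1.0])
hateful = nothate_reward([-1.0, 2.0])
print(benign > hateful)
```

In an RL loop, a reward of this kind would be computed for each generated response and passed to the policy optimizer; higher values correspond to responses the classifier judges less hateful.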
Toxicity results were measured on the allenai/real-toxicity-prompts dataset using custom prompts (see 🥞RewardLM for details).
Toxicity Level | RedPajama-INCITE-Chat-3B |
---|---|
Pre-Trained | 0.217 |
Fine-Tuned | 0.129 |
RL | 0.160 |
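
To put the scores above in perspective, the short sketch below computes the relative reduction in toxicity of each variant against the pre-trained baseline. The helper `relative_reduction` is a hypothetical name introduced here for illustration; the numbers are taken directly from the table.

```python
# Toxicity scores reported in the table for RedPajama-INCITE-Chat-3B.
scores = {"Pre-Trained": 0.217, "Fine-Tuned": 0.129, "RL": 0.160}


def relative_reduction(score, baseline):
    """Fraction by which `score` is lower than `baseline`."""
    return (baseline - score) / baseline


baseline = scores["Pre-Trained"]
for name, score in scores.items():
    if name != "Pre-Trained":
        print(f"{name}: {relative_reduction(score, baseline):.1%}")
# Fine-Tuned: 40.6%
# RL: 26.3%
```

By this measure, both variants lower toxicity relative to the pre-trained model, with the fine-tuned variant showing the larger reduction on this benchmark.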