---
language: en
tags:
- text classification
- hate speech
- offensive language
- hatecheck
datasets:
- unhcr-hatespeech
metrics:
- f1
- hatecheck
---
# UNHCR Hate Speech Detection Model

Frederik Gaasdal Jensen • Henry Stoll • Sippo Rossi • Raghava Rao Mukkamala
This is a transformer model that detects hate speech and offensive language in English text. Its primary use case is detecting hate speech targeted at refugees. The model is based on roberta-uncased and was fine-tuned on 12 abusive-language datasets.
The model was developed in collaboration between UNHCR (the UN Refugee Agency) and Copenhagen Business School.
## Performance

- F1-score on the test set (10% of the overall dataset): 81%
- HateCheck score: 90.3%
## Labels
```python
{
    0: "Normal",
    1: "Offensive",
    2: "Hate speech",
}
```
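A minimal sketch of mapping the model's class indices to these labels. The helper name and the example logits are illustrative assumptions, not part of the published model; for real inference you would pass text through the fine-tuned checkpoint and decode its highest-scoring class like this:

```python
# Label mapping taken from the table above.
ID2LABEL = {0: "Normal", 1: "Offensive", 2: "Hate speech"}

def decode(logits):
    """Return the label for the highest-scoring class index."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return ID2LABEL[best]

# Hypothetical logits where index 1 scores highest.
print(decode([0.1, 2.3, 0.4]))  # -> Offensive
```

The same mapping can be registered as `id2label` in the model's config so that downstream tooling reports label names instead of raw indices.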