license: mit
base_model: xlm-roberta-large
tags:
  - generated_from_trainer
metrics:
  - accuracy
  - f1
model-index:
  - name: XLM_RoBERTa-Multilingual-Clickbait-Detection
    results: []
datasets:
  - christinacdl/clickbait_detection_dataset
language:
  - en
  - el
  - it
  - es
  - ro
  - de
  - fr
  - pl
pipeline_tag: text-classification

XLM_RoBERTa-Multilingual-Clickbait-Detection

This model is a fine-tuned version of xlm-roberta-large on the christinacdl/clickbait_detection_dataset. It achieves the following results on the evaluation set:

Loss: 0.2192
Micro F1: 0.9759
Macro F1: 0.9758
Accuracy: 0.9759
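
The Micro F1 and Accuracy values above are identical, which is expected when every example receives exactly one label: each misclassification counts as one false positive and one false negative, so micro-averaged F1 reduces to accuracy. The sketch below illustrates this in plain Python. The function name `f1_scores`, the toy labels, and the 1 = clickbait encoding are invented for illustration and are not taken from the model's evaluation code.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1, accuracy) for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            # One wrong prediction is a false positive for the predicted
            # class and a false negative for the true class.
            fp[p] += 1
            fn[t] += 1

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    accuracy = sum(tp.values()) / len(y_true)
    return micro, macro, accuracy


# Toy labels, invented for illustration (assumed: 1 = clickbait, 0 = not).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
micro, macro, acc = f1_scores(y_true, y_pred)
```

Macro F1 weights every class equally, so it falls below micro F1 when the model does worse on a minority class. The near-identical Micro and Macro values reported above suggest similar performance across classes.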

Test set Macro-F1 scores (%)

Multilingual test set: 97.28
en test set: 97.83
el test set: 97.32
it test set: 97.54
es test set: 97.67
ro test set: 97.40
de test set: 97.40
fr test set: 96.90
pl test set: 96.18

Intended uses & limitations

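This model is intended for use in an EU project, where it detects clickbait in English, Greek, Italian, Spanish, Romanian, German, French and Polish text.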
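
A minimal usage sketch, assuming the model is loaded through the `pipeline` API of the Transformers library with the repository ID of this model card. The helper names `load_classifier` and `classify` are hypothetical. The card does not document the label names, so the code returns whatever labels the checkpoint's configuration defines.

```python
MODEL_ID = "christinacdl/XLM_RoBERTa-Multilingual-Clickbait-Detection"


def load_classifier(model_id: str = MODEL_ID):
    """Build a text-classification pipeline for the given checkpoint."""
    # Imported lazily so the module can be imported without transformers.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def classify(headlines, classifier):
    """Return (headline, label, score) triples for a batch of headlines."""
    headlines = list(headlines)
    results = classifier(headlines)
    # The pipeline returns one {"label": ..., "score": ...} dict per input.
    return [(h, r["label"], r["score"]) for h, r in zip(headlines, results)]
```

Calling `classify(["You won't believe what happened next"], load_classifier())` downloads the weights on first use. Since the base model is xlm-roberta-large, the download is large and a GPU is advisable for batch inference.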

Training and evaluation data

The "clickbait_detection_dataset" was translated from English into Greek, Italian, Spanish, Romanian, French and German using Opus-MT models, and from English into Polish using the M2M NMT model.
The NMT models were run through the "EasyNMT" library.
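
The translation step described above might look roughly like the following sketch. `EasyNMT(...)` and its `translate` method are part of the EasyNMT library, and `opus-mt` and `m2m_100_418M` are model names it accepts. The card does not say which M2M checkpoint was used, so `m2m_100_418M` is an assumption, and the helper names `easynmt_model_name` and `translate_texts` are hypothetical.

```python
# Target languages per translation system, as stated in the card.
OPUS_MT_LANGS = {"el", "it", "es", "ro", "fr", "de"}
M2M_LANGS = {"pl"}


def easynmt_model_name(target_lang: str) -> str:
    """Pick the EasyNMT model name for a target language code."""
    if target_lang in OPUS_MT_LANGS:
        return "opus-mt"
    if target_lang in M2M_LANGS:
        # Assumption: the card only says "M2M NMT"; the 418M checkpoint
        # is chosen here purely for illustration.
        return "m2m_100_418M"
    raise ValueError(f"no translation model configured for {target_lang!r}")


def translate_texts(texts, target_lang: str):
    """Translate English texts into target_lang with EasyNMT."""
    # Imported lazily so the mapping above is usable without easynmt.
    from easynmt import EasyNMT

    model = EasyNMT(easynmt_model_name(target_lang))
    return model.translate(texts, source_lang="en", target_lang=target_lang)
```

Because the non-English portions of the dataset are machine translations, the per-language test scores measure performance on translated text rather than on natively written headlines.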

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 4
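
For readers who want to reproduce a similar run, the hyperparameters above map onto `TrainingArguments` from the Transformers library roughly as sketched below. The `output_dir` value and the helper names are hypothetical, and a single training device is assumed, which is consistent with 16 × 2 = 32 for the total train batch size. This is not the author's original training script.

```python
# Hyperparameters from the list above, keyed by TrainingArguments names.
HYPERPARAMS = {
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "seed": 42,
    "gradient_accumulation_steps": 2,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 4,
}


def effective_batch_size(params, num_devices: int = 1) -> int:
    """Examples consumed per optimizer step."""
    return (
        params["per_device_train_batch_size"]
        * params["gradient_accumulation_steps"]
        * num_devices
    )


def build_training_arguments(output_dir: str = "clickbait-xlmr"):
    """Create TrainingArguments; output_dir is a placeholder path."""
    # Imported lazily so the dict above is usable without transformers.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMS)


total_train_batch_size = effective_batch_size(HYPERPARAMS)
```

With gradient accumulation, the optimizer steps once every two batches of 16, which gives the reported total of 32 without the memory cost of a single batch of 32.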

Framework versions

Transformers 4.36.1
PyTorch 2.1.0+cu121
Datasets 2.13.1
Tokenizers 0.15.0