---
license: mit
base_model: roberta-base
tags:
- generated_from_trainer
metrics:
- f1
- accuracy
model-index:
- name: roberta_echr_truncated_facts_all_labels
  results: []
library_name: transformers
---
# roberta_echr_truncated_facts_all_labels
This model is a fine-tuned version of roberta-base on a dataset of European Court of Human Rights (ECHR) case facts. It achieves the following results on the evaluation set:
- Loss: 0.0674
- F1: 0.7452
- Roc Auc: 0.8460
- Accuracy: 0.5883
## RoBERTa Model for Multi-Label Human Rights Classification
### Overview
The model pairs RoBERTa with a multi-label classification head and is designed to flag potential human rights violations from the facts a user provides. It was trained on over 13,000 cases from the European Court of Human Rights (ECtHR), and its labels cover the substantive articles of the European Convention on Human Rights (ECHR).
### Training Data
- Dataset Size: 13,000+ cases.
- Data Quality:
  - Manual Review: An extensive manual review was conducted to ensure the quality of the training data. Cases without a facts section, or with insufficiently detailed facts, were removed.
  - Label Coverage: Every label corresponding to a substantive human rights article of the Convention was included, so the model can identify a wide range of human rights violations.
### Model Architecture
- Base Model: RoBERTa (`roberta-base`), a transformer-based language model.
- Classifier Head: A multi-label classification head, so that several human rights articles can be predicted for a single case.
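The key property of a multi-label head is that each Convention article is scored independently: the head produces one logit per label, and a sigmoid turns each logit into a standalone probability, so the probabilities need not sum to 1 (unlike softmax in single-label classification). A minimal sketch in plain Python, with made-up logits and hypothetical article names:

```python
import math

def sigmoid(x: float) -> float:
    """Map a raw logit to an independent per-label probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical logits for three Convention articles (illustrative values only,
# not real model output).
logits = {"Article 3": 2.0, "Article 6": -1.0, "Article 8": 0.5}

# Each label is scored on its own; the probabilities do not sum to 1.
probs = {label: sigmoid(z) for label, z in logits.items()}
```

This independence is what lets the model assign multiple articles to one case, which a softmax over labels could not do.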
### Training Process
- Data Preprocessing:
  - Text Cleaning: Removal of irrelevant text and formatting so the model focuses on the essential facts.
  - Label Encoding: Each case was labeled with all applicable human rights articles.
- Model Training:
  - Training Set: The model was trained on a diverse set of 13,000+ cases, helping it generalize to new, unseen data.
  - Validation Set: A separate validation set was used to monitor the model's performance and prevent overfitting.
  - Hyperparameter Tuning: Hyperparameter tuning was performed to optimize the model's performance.
### Model Capabilities
- Input: User-provided facts in natural language.
- Output: A list of potential human rights violations based on the provided facts, with each violation linked to the relevant article(s) of the ECHR.
- Accuracy: An F1 of 0.7452 and ROC AUC of 0.8460 on the held-out evaluation set (see the training results below).
- Comprehensive Coverage: Ability to identify a wide range of human rights violations, covering all substantive articles of the Convention.
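To turn the per-label probabilities into the output list of potential violations, predictions are typically the labels whose probability clears a fixed threshold; 0.5 is a common default, though the threshold used for this model is not stated in the card. A hedged sketch, with illustrative probabilities:

```python
def predict_violations(probs: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Return the articles whose probability clears the threshold.

    The 0.5 default is a common convention, not a value confirmed by this card.
    """
    return sorted(label for label, p in probs.items() if p >= threshold)

# Illustrative probabilities, not real model output.
example = {"Article 3": 0.88, "Article 6": 0.27, "Article 8": 0.62}
print(predict_violations(example))  # ['Article 3', 'Article 8']
```

Raising the threshold trades recall for precision; tuning it per label on the validation set is a common refinement.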
### Use Cases
- Legal Assistance: Helping individuals and legal professionals identify potential human rights violations in their cases.
- Educational Tool: Assisting students and researchers in understanding the application of human rights articles in real-world scenarios.
- Automated Compliance: Supporting organizations in ensuring compliance with human rights standards by identifying potential violations.
### Conclusion
The RoBERTa model, fine-tuned with a multi-label classification head, is a practical tool for flagging potential human rights violations from case facts. Extensive training on a manually reviewed, high-quality dataset gives it solid accuracy and coverage of all substantive articles of the European Convention on Human Rights, making it a useful resource for legal professionals, educators, and organizations committed to human rights.
## Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 5
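With a linear scheduler and no warmup (none is listed above), the learning rate decays from 2e-05 at step 0 to 0 at the final optimizer step, 8,825 here per the results table. A sketch of the schedule, assuming zero warmup steps:

```python
def linear_lr(step: int, base_lr: float = 2e-5, total_steps: int = 8825) -> float:
    """Linearly decay the learning rate to zero over training (assumes no warmup)."""
    return base_lr * max(0.0, 1.0 - step / total_steps)

# Learning rate after epoch 1 (1,765 steps per epoch, per the results table):
lr_epoch1 = linear_lr(1765)  # 80% of training remains, so 1.6e-05
```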
## Training results
| Training Loss | Epoch | Step | Validation Loss | F1     | Roc Auc | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:------:|:-------:|:--------:|
| 0.0835        | 1.0   | 1765 | 0.0780          | 0.6933 | 0.7942  | 0.5214   |
| 0.0674        | 2.0   | 3530 | 0.0699          | 0.7375 | 0.8363  | 0.5577   |
| 0.0584        | 3.0   | 5295 | 0.0674          | 0.7452 | 0.8460  | 0.5883   |
| 0.0474        | 4.0   | 7060 | 0.0690          | 0.7372 | 0.8448  | 0.5787   |
| 0.0400        | 5.0   | 8825 | 0.0695          | 0.7429 | 0.8475  | 0.5870   |
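The headline metrics at the top of the card (Loss 0.0674, F1 0.7452) correspond to epoch 3, the checkpoint with the lowest validation loss, which suggests the best checkpoint rather than the final one was kept. A quick check over the table's rows:

```python
# (epoch, validation_loss, f1) rows copied from the results table above.
rows = [
    (1, 0.0780, 0.6933),
    (2, 0.0699, 0.7375),
    (3, 0.0674, 0.7452),
    (4, 0.0690, 0.7372),
    (5, 0.0695, 0.7429),
]

# Pick the epoch with the lowest validation loss.
best_epoch, best_loss, best_f1 = min(rows, key=lambda r: r[1])
print(best_epoch, best_loss, best_f1)  # 3 0.0674 0.7452
```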
## Framework versions
- Transformers 4.35.2
- Pytorch 2.1.1+cu121
- Datasets 2.14.5
- Tokenizers 0.15.1