vibhorag101/roberta-base-suicide-prediction-phr-v2

@@ -10,15 +10,37 @@ metrics:
 - f1
 model-index:
 - name: roberta-base-suicide-prediction-phr-v2
-  results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 # vibhorag101/roberta-base-suicide-prediction-phr-v2
-This model is a fine-tuned version of [roberta-base](https://huggingface.co/roberta-base) on an unknown dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.0553
 - Accuracy: 0.9869
@@ -27,21 +49,24 @@ It achieves the following results on the evaluation set:
 - F1: 0.9875
 ## Model description
-More information needed
-## Intended uses & limitations
-More information needed
 ## Training and evaluation data
-More information needed
 ## Training procedure
 ### Training hyperparameters
 The following hyperparameters were used during training:
 - learning_rate: 2e-05
 - train_batch_size: 16
@@ -51,6 +76,12 @@ The following hyperparameters were used during training:
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_ratio: 0.06
 - num_epochs: 3
 ### Training results

 - f1
 model-index:
 - name: roberta-base-suicide-prediction-phr-v2
+  results:
+  - task:
+      type: text-classification
+      name: Suicidal Tendency Prediction in text
+    dataset:
+      type: vibhorag101/phr_suicide_prediction_dataset_clean_light
+      name: Suicide Prediction Dataset
+      split: val
+    metrics:
+      - type: accuracy
+        value: 0.9869
+      - type: f1
+        value: 0.9875
+      - type: recall
+        value: 0.9846
+      - type: precision
+        value: 0.9904
+datasets:
+- vibhorag101/phr_suicide_prediction_dataset_clean_light
+language:
+- en
+library_name: transformers
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 # vibhorag101/roberta-base-suicide-prediction-phr-v2
+This model is a fine-tuned version of [roberta-base](https://huggingface.co/roberta-base) on the [Suicide Prediction Dataset](https://huggingface.co/datasets/vibhorag101/phr_suicide_prediction_dataset_clean_light), which is sourced from Reddit.
 It achieves the following results on the evaluation set:
 - Loss: 0.0553
 - Accuracy: 0.9869
 - F1: 0.9875
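As a consistency check, the reported F1 equals the harmonic mean of the precision (0.9904) and recall (0.9846) recorded in the `model-index` metadata:

$$
F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.9904 \times 0.9846}{0.9904 + 0.9846} = \frac{1.9503}{1.9750} \approx 0.9875
$$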
 ## Model description
+This model is a fine-tuned version of roberta-base that detects suicidal tendencies in a given text.
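The following is a minimal inference sketch. It assumes the checkpoint loads through the `transformers` `pipeline("text-classification", ...)` API as a standard sequence-classification model. The card does not document the label strings, so the code passes them through unchanged. `load_classifier` and `predict` are hypothetical helper names, and `transformers` is imported lazily so the helpers can be exercised without it.

```python
from typing import Callable, Dict, List

MODEL_ID = "vibhorag101/roberta-base-suicide-prediction-phr-v2"


def load_classifier():
    """Build a text-classification pipeline for this checkpoint.

    Imported lazily; calling this downloads the model from the Hugging Face Hub.
    """
    from transformers import pipeline

    return pipeline("text-classification", model=MODEL_ID)


def predict(classifier: Callable, texts: List[str]) -> List[Dict[str, object]]:
    """Return one {"text", "label", "score"} record per input text.

    truncation=True clips each input to the tokenizer's maximum length
    (512 tokens for roberta-base) so long posts do not raise an error.
    """
    outputs = classifier(texts, truncation=True)
    return [
        {"text": text, "label": out["label"], "score": float(out["score"])}
        for text, out in zip(texts, outputs)
    ]


# Example (needs the transformers library and network access):
# clf = load_classifier()
# for row in predict(clf, ["I had a great day with my friends."]):
#     print(row["label"], round(row["score"], 3))
```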
 ## Training and evaluation data
+- The dataset is sourced from Reddit and is available on [Kaggle](https://www.kaggle.com/datasets/nikhileswarkomati/suicide-watch).
+- The dataset contains text with binary labels for suicide or non-suicide.
+- The dataset was cleaned only minimally, because BERT-style models rely on contextual information and heavier cleaning can adversely affect their performance.
+  - Removed numbers.
+  - Removed URLs, emojis, and accented characters.
+  - Removed extra whitespace, collapsing consecutive spaces into a single space.
+  - Removed characters repeated more than 3 times in a row.
+  - Removed rows with more than 512 BERT tokens, since they exceed BERT's maximum input length.
+- The cleaned dataset is available on the [Hugging Face Hub](https://huggingface.co/datasets/vibhorag101/phr_suicide_prediction_dataset_clean_light).
+- The evaluation set had ~33k samples, while the training set had ~153k samples, i.e., a 70:15:15 (train:test:val) split.
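The cleaning steps, the length filter, and the 70:15:15 split can be sketched in plain Python. This is an approximation and not the author's preprocessing script. Several details are assumptions:

- Accents are folded to their base ASCII letters (é becomes e).
- Emojis are dropped together with every other non-ASCII character.
- Runs of more than 3 identical characters are shortened to 3.
- The token counter is supplied by the caller.

The names `clean_text`, `filter_long`, and `split_70_15_15` are hypothetical, and so is the default seed.

```python
import random
import re
import unicodedata
from typing import Callable, List, Sequence, Tuple

URL_RE = re.compile(r"https?://\S+|www\.\S+")
DIGIT_RE = re.compile(r"\d+")
REPEAT_RE = re.compile(r"(.)\1{3,}")  # a character followed by 3+ copies of itself
SPACE_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Light cleaning; assumption: long character runs are cut to 3, not deleted."""
    text = URL_RE.sub(" ", text)  # URLs first, so their digits are not left behind
    # NFKD splits "é" into "e" + combining accent; the ASCII encode drops the
    # accent and also any emoji or other non-ASCII character.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    text = DIGIT_RE.sub(" ", text)
    text = REPEAT_RE.sub(r"\1\1\1", text)  # "soooooo" -> "sooo"
    return SPACE_RE.sub(" ", text).strip()  # collapse whitespace runs to one space


def filter_long(
    texts: Sequence[str], count_tokens: Callable[[str], int], max_tokens: int = 512
) -> List[str]:
    """Drop texts whose token count exceeds max_tokens."""
    return [t for t in texts if count_tokens(t) <= max_tokens]


def split_70_15_15(rows: Sequence, seed: int = 42) -> Tuple[list, list, list]:
    """Shuffle deterministically, then cut into 70% train, 15% test, 15% val."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = len(rows) * 70 // 100  # integer math avoids float rounding surprises
    n_test = len(rows) * 15 // 100
    train = rows[:n_train]
    test = rows[n_train:n_train + n_test]
    val = rows[n_train + n_test:]  # remainder goes to validation
    return train, test, val
```

To count BERT tokens as the card describes, `count_tokens` could wrap a Hugging Face tokenizer. For example, use `tok = AutoTokenizer.from_pretrained("roberta-base")` with `lambda t: len(tok(t)["input_ids"])`; this count includes the special start and end tokens. With the card's figures, 153k training rows at 70% imply roughly 219k rows in total. Each 15% slice then holds about 33k rows, which matches the evaluation-set size above.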
 ## Training procedure
+- The model was trained on an NVIDIA RTX A5000 GPU.
 ### Training hyperparameters
 The following hyperparameters were used during training:
 - learning_rate: 2e-05
 - train_batch_size: 16
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_ratio: 0.06
 - num_epochs: 3
+- eval_steps: 500
+- save_steps: 500
+- Early Stopping:
+  - early_stopping_patience: 5
+  - early_stopping_threshold: 0.001
+  - monitored metric: F1 score
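The hyperparameters above map roughly onto `transformers.TrainingArguments` and `EarlyStoppingCallback` as shown below. This is a sketch, not the author's training script. It rests on three assumptions:

- It targets a recent `transformers` release that accepts `eval_strategy` (older releases call it `evaluation_strategy`).
- It treats label 1 as the positive (suicide) class for precision, recall, and F1.
- It leaves any hyperparameter not shown in this diff at the library default.

`HYPERPARAMS`, `compute_metrics`, and `build_training_setup` are illustrative names.

```python
from typing import Dict, Sequence, Tuple

# Values taken from the hyperparameter list in this card.
HYPERPARAMS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,
    "lr_scheduler_type": "linear",
    "warmup_ratio": 0.06,
    "num_train_epochs": 3,
    "eval_steps": 500,
    "save_steps": 500,
}


def compute_metrics(eval_pred: Tuple[Sequence, Sequence]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (assumes 1 = suicide)."""
    logits, labels = eval_pred
    # Predicted class = index of the largest logit in each row.
    preds = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    labels = [int(y) for y in labels]
    pairs = list(zip(preds, labels))
    tp = sum(p == 1 and y == 1 for p, y in pairs)
    fp = sum(p == 1 and y == 0 for p, y in pairs)
    fn = sum(p == 0 and y == 1 for p, y in pairs)
    correct = sum(p == y for p, y in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(labels), "precision": precision, "recall": recall, "f1": f1}


def build_training_setup(output_dir: str):
    """Return (TrainingArguments, [EarlyStoppingCallback]) matching the card."""
    from transformers import EarlyStoppingCallback, TrainingArguments

    args = TrainingArguments(
        output_dir=output_dir,
        eval_strategy="steps",
        save_strategy="steps",
        load_best_model_at_end=True,  # required by EarlyStoppingCallback
        metric_for_best_model="f1",  # Trainer prefixes this to "eval_f1"
        greater_is_better=True,
        **HYPERPARAMS,
    )
    stopper = EarlyStoppingCallback(early_stopping_patience=5, early_stopping_threshold=0.001)
    return args, [stopper]
```

These pieces would be passed to `Trainer(..., args=args, compute_metrics=compute_metrics, callbacks=callbacks)`. With `eval_steps=500` and a patience of 5, training stops if the validation F1 fails to improve by more than 0.001 over five consecutive evaluations, that is, 2,500 training steps.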
 ### Training results