vibhorag101/roberta-base-suicide-prediction-phr

Tags: Text Classification, Generated from Trainer, Inference Endpoints


vibhorag101 committed on Nov 25, 2023

Commit 6910531 (1 parent: 37c5e21): Update README.md

Files changed (1)

README.md +3 -3


@@ -49,7 +49,8 @@ This model is a finetune of roberta-base to detect suicidal tendencies in a give
 ## Training and evaluation data
 - The dataset is sourced from Reddit and is available on [Kaggle](https://www.kaggle.com/datasets/nikhileswarkomati/suicide-watch).
-- The dataset was cleaned and following steps were applied
+- The dataset contains text with binary labels for suicide or non-suicide.
+- The dataset was cleaned, and following steps were applied
   - Converted to lowercase
   - Removed numbers and special characters.
   - Removed URLs, Emojis and accented characters.
@@ -57,8 +58,7 @@ This model is a finetune of roberta-base to detect suicidal tendencies in a give
   - Remove any extra white spaces and any extra spaces after a single space.
   - Removed any consecutive characters repeated more than 3 times.
   - Tokenised the text, then lemmatized it and then removed the stopwords (excluding not).
-- The cleaned dataset can be found [here](https://huggingface.co/datasets/vibhorag101/suicide_prediction_dataset_phr)
-- The dataset contains text with binary labels for suicide or non-suicide.
+- The cleaned dataset can be found [here](https://huggingface.co/datasets/vibhorag101/suicide_prediction_dataset_phr)
 - The evaluation set had ~23000 samples, while the training set had ~186k samples, i.e. a 80:10:10 (train:test:val) split.
 ## Training procedure
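The cleaning steps listed in the README can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the author's preprocessing code. The README does not name the libraries it used, so `STOPWORDS` is a small hypothetical list, lemmatization is an identity placeholder (a real lemmatizer such as NLTK's `WordNetLemmatizer().lemmatize` could be passed in), and "removing" runs of more than three repeated characters is assumed to mean collapsing each run to a single character. URLs are stripped before punctuation so they are not split into words; the README does not state the order actually used.

```python
import re
import unicodedata
from typing import Callable, Iterable

# Hypothetical, deliberately tiny stopword list for illustration only;
# a real pipeline would use a full English stopword list.
STOPWORDS = frozenset({"a", "an", "the", "i", "am", "is", "are", "was",
                       "to", "and", "of", "my", "it", "not"})
KEEP = frozenset({"not"})  # the README keeps "not" even though it is a stopword

URL_RE = re.compile(r"https?://\S+|www\.\S+")
REPEAT_RE = re.compile(r"(.)\1{3,}")  # a character followed by 3+ copies of itself


def clean_text(
    text: str,
    lemmatize: Callable[[str], str] = lambda w: w,  # assumed placeholder
    stopwords: Iterable[str] = STOPWORDS,
) -> str:
    stop = set(stopwords) - KEEP
    # 1. Lowercase.
    text = text.lower()
    # 2. Strip URLs first so punctuation removal cannot split them into words.
    text = URL_RE.sub(" ", text)
    # 3. NFKD splits accented letters into base letter + combining mark;
    #    encoding to ASCII drops the marks, emojis and other non-ASCII symbols.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # 4. Replace digits and special characters, keeping letters and whitespace.
    text = re.sub(r"[^a-z\s]", " ", text)
    # 5. Collapse any run of whitespace into a single space.
    text = re.sub(r"\s+", " ", text).strip()
    # 6. Collapse runs of 4+ identical characters (assumed interpretation).
    text = REPEAT_RE.sub(r"\1", text)
    # 7. Whitespace tokenization, lemmatization, then stopword removal.
    tokens = [lemmatize(tok) for tok in text.split()]
    return " ".join(tok for tok in tokens if tok not in stop)


print(clean_text("I am SOOOOO tired!!! Visit https://example.com, it is not 2 late. Café 😊"))
```

Passing the lemmatizer in as a function keeps the sketch dependency-free while making it easy to plug in a real one; subtracting `KEEP` from the stopword set mirrors the README's note that "not" is excluded from stopword removal.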