vibhorag101 committed on
Commit
6910531
1 Parent(s): 37c5e21

Update README.md

  1. README.md +3 -3
README.md CHANGED
@@ -49,7 +49,8 @@ This model is a finetune of roberta-base to detect suicidal tendencies in a give
 
  ## Training and evaluation data
  - The dataset is sourced from Reddit and is available on [Kaggle](https://www.kaggle.com/datasets/nikhileswarkomati/suicide-watch).
- - The dataset was cleaned and following steps were applied
+ - The dataset contains text with binary labels for suicide or non-suicide.
+ - The dataset was cleaned, and following steps were applied
  - Converted to lowercase
  - Removed numbers and special characters.
  - Removed URLs, Emojis and accented characters.
@@ -57,8 +58,7 @@ This model is a finetune of roberta-base to detect suicidal tendencies in a give
  - Remove any extra white spaces and any extra spaces after a single space.
  - Removed any consecutive characters repeated more than 3 times.
  - Tokenised the text, then lemmatized it and then removed the stopwords (excluding not).
- - The cleaned dataset can be found [here](https://huggingface.co/datasets/vibhorag101/suicide_prediction_dataset_phr)
- - The dataset contains text with binary labels for suicide or non-suicide.
+ - The cleaned dataset can be found [here](https://huggingface.co/datasets/vibhorag101/suicide_prediction_dataset_phr)
  - The evaluation set had ~23000 samples, while the training set had ~186k samples, i.e. a 80:10:10 (train:test:val) split.
 
  ## Training procedure
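
The cleaning steps listed in the README can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the author's actual preprocessing code. The README does not name the libraries used, so the function name `clean_text`, the tiny `STOPWORDS` set, the identity `lemmatize` default, and the exact order of the steps are all assumptions. Two further assumptions: "removed accented characters" is read as stripping the accents and dropping any remaining non-ASCII characters (which also removes emojis), and "repeated more than 3 times" is read as collapsing a run of four or more identical letters to one letter.

```python
import re
import unicodedata

# Illustrative stopword list only; a real pipeline would use a full list
# from an NLP library. "not" is included to show that it is kept on purpose.
STOPWORDS = {"a", "an", "and", "i", "is", "m", "not", "of", "so", "the", "to"}


def clean_text(text, lemmatize=lambda token: token):
    """Return a list of cleaned tokens (hypothetical helper, see assumptions above)."""
    # Convert to lowercase.
    text = text.lower()
    # Remove URLs before stripping punctuation, so they are dropped whole.
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)
    # Strip accents (e.g. "é" becomes "e"); dropping the remaining
    # non-ASCII characters also removes emojis.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Remove numbers and special characters.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Collapse a letter repeated more than 3 times in a row to one letter.
    text = re.sub(r"([a-z])\1{3,}", r"\1", text)
    # Tokenise on whitespace, which also discards extra spaces.
    tokens = text.split()
    # Lemmatize each token (identity by default; plug in a real lemmatizer).
    tokens = [lemmatize(token) for token in tokens]
    # Remove stopwords, but keep "not" because it carries negation.
    stop = STOPWORDS - {"not"}
    return [token for token in tokens if token not in stop]


if __name__ == "__main__":
    print(clean_text("I'm NOT okay!!! Visit https://example.com sooooo café 123"))
```

A real lemmatizer can be passed through the `lemmatize` parameter as any callable that maps one token to its lemma. Keeping "not" matters for this task because a negation can change the meaning of a sentence.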