StackOverflow-RoBERTa-base for Sentiment Analysis on Software Engineering Texts
This is a RoBERTa-base model for sentiment analysis on software engineering texts. It was further fine-tuned from cardiffnlp/twitter-roberta-base-sentiment on the StackOverflow4423 dataset. You can access the demo here.
Example of Pipeline

from transformers import pipeline

MODEL = 'Cloudy1225/stackoverflow-roberta-base-sentiment'
sentiment_task = pipeline(task="sentiment-analysis", model=MODEL)
sentiment_task(["Excellent, happy to help!",
                "This can probably be done using JavaScript.",
                "Yes, but it's tricky, since datetime parsing in SQL is a pain in the neck."])

[{'label': 'positive', 'score': 0.9997847676277161},
 {'label': 'neutral', 'score': 0.999783456325531},
 {'label': 'negative', 'score': 0.9996368885040283}]
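The pipeline returns one result dict per input, in input order, so predictions can be paired back with their source texts. A minimal sketch, reusing the outputs shown above as hard-coded values rather than re-running the model:

```python
# Pipeline results (copied from the output above) paired with their inputs.
texts = [
    "Excellent, happy to help!",
    "This can probably be done using JavaScript.",
    "Yes, but it's tricky, since datetime parsing in SQL is a pain in the neck.",
]
results = [
    {'label': 'positive', 'score': 0.9997847676277161},
    {'label': 'neutral', 'score': 0.999783456325531},
    {'label': 'negative', 'score': 0.9996368885040283},
]

# zip() preserves order, so each text gets its own prediction.
labeled = {text: res['label'] for text, res in zip(texts, results)}
for text, label in labeled.items():
    print(f"{label}: {text}")
```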
Example of Classification

from scipy.special import softmax
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def preprocess(text):
    """Preprocess text (username and link placeholders)."""
    new_text = []
    for t in text.split(' '):
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return ' '.join(new_text).strip()

MODEL = 'Cloudy1225/stackoverflow-roberta-base-sentiment'
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

text = "Excellent, happy to help!"
text = preprocess(text)
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
scores = output[0][0].detach().numpy()
scores = softmax(scores)

print("negative", scores[0])
print("neutral", scores[1])
print("positive", scores[2])

negative 0.00015578205
neutral 5.9470447e-05
positive 0.99978495
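The prints above rely on a fixed index-to-label order (0 = negative, 1 = neutral, 2 = positive). A minimal sketch of ranking all three classes by score, using hypothetical placeholder logits in place of a real model output:

```python
import numpy as np
from scipy.special import softmax

# Label order matching the indices used in the classification example above.
LABELS = ['negative', 'neutral', 'positive']

# Hypothetical logits standing in for model output; not real scores
# produced by this model for any particular sentence.
logits = np.array([-2.1, -1.5, 3.8])
scores = softmax(logits)  # normalize logits into probabilities summing to 1

# Print classes from most to least likely.
for i in scores.argsort()[::-1]:
    print(LABELS[i], scores[i])
```

The same loop works on the `scores` array from the classification example, since `softmax` already returns probabilities in label-index order.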
Acknowledgments
This project was developed as part of the Software Engineering and Computing III course at the Software Institute, Nanjing University, in Spring 2023. For more insights into sentiment analysis on software engineering texts, you can refer to the following paper:
@inproceedings{sun2022incorporating,
  title={Incorporating Pre-trained Transformer Models into TextCNN for Sentiment Analysis on Software Engineering Texts},
  author={Sun, Kexin and Shi, Xiaobo and Gao, Hui and Kuang, Hongyu and Ma, Xiaoxing and Rong, Guoping and Shao, Dong and Zhao, Zheng and Zhang, He},
  booktitle={Proceedings of the 13th Asia-Pacific Symposium on Internetware},
  pages={127--136},
  year={2022}
}