
# StackOverflow-RoBERTa-base for Sentiment Analysis on Software Engineering Texts

This is a RoBERTa-base model for sentiment analysis on software engineering texts. It was fine-tuned from cardiffnlp/twitter-roberta-base-sentiment on the StackOverflow4423 dataset. A demo is available on Hugging Face Spaces.

## Example of Pipeline

```python
from transformers import pipeline

MODEL = 'Cloudy1225/stackoverflow-roberta-base-sentiment'
sentiment_task = pipeline(task="sentiment-analysis", model=MODEL)
sentiment_task(["Excellent, happy to help!",
                "This can probably be done using JavaScript.",
                "Yes, but it's tricky, since datetime parsing in SQL is a pain in the neck."])
```

```
[{'label': 'positive', 'score': 0.9997847676277161},
 {'label': 'neutral', 'score': 0.999783456325531},
 {'label': 'negative', 'score': 0.9996368885040283}]
```

## Example of Classification

```python
from scipy.special import softmax
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def preprocess(text):
    """Preprocess text (username and link placeholders), matching the Twitter base model."""
    new_text = []
    for t in text.split(' '):
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return ' '.join(new_text).strip()

MODEL = 'Cloudy1225/stackoverflow-roberta-base-sentiment'
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

text = "Excellent, happy to help!"
text = preprocess(text)
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
# The model returns raw logits; apply softmax to turn them into class probabilities.
scores = output[0][0].detach().numpy()
scores = softmax(scores)
print("negative", scores[0])
print("neutral", scores[1])
print("positive", scores[2])
```

```
negative 0.00015578205
neutral 5.9470447e-05
positive 0.99978495
```
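The prints above hardcode the label order (index 0 = negative, 1 = neutral, 2 = positive). A safer pattern reads the index-to-label mapping from `model.config.id2label` and sorts the scores. A minimal sketch, reusing the example scores; `rank_labels` is a hypothetical helper, and the `id2label` dict here is an assumed stand-in for the mapping in the model's config:

```python
import numpy as np

def rank_labels(scores, id2label):
    """Pair each probability with its label name, sorted from most to least likely."""
    ranked = np.argsort(scores)[::-1]
    return [(id2label[i], float(scores[i])) for i in ranked]

# Assumed mapping; in practice use model.config.id2label.
id2label = {0: "negative", 1: "neutral", 2: "positive"}
scores = np.array([0.00015578, 0.00005947, 0.99978495])
for label, score in rank_labels(scores, id2label):
    print(f"{label} {score:.5f}")
```

This avoids silent mislabeling if the model's label order ever differs from what the example assumes.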

## Acknowledgments

This project was developed as part of the Software Engineering and Computing III course at the Software Institute, Nanjing University, in Spring 2023. For more insights into sentiment analysis on software engineering texts, see the following paper:

```bibtex
@inproceedings{sun2022incorporating,
  title={Incorporating Pre-trained Transformer Models into TextCNN for Sentiment Analysis on Software Engineering Texts},
  author={Sun, Kexin and Shi, Xiaobo and Gao, Hui and Kuang, Hongyu and Ma, Xiaoxing and Rong, Guoping and Shao, Dong and Zhao, Zheng and Zhang, He},
  booktitle={Proceedings of the 13th Asia-Pacific Symposium on Internetware},
  pages={127--136},
  year={2022}
}
```