Report for cardiffnlp/twitter-roberta-base-sentiment-latest

#155
by ZeroCommand - opened
Giskard org

Hi Team,

This is a report from Giskard Bot Scan 🐢.

We have identified 8 potential vulnerabilities in your model based on an automated scan.

This automated analysis evaluated the model on the sst2 dataset (subset “default”, split “validation”).

When feature “text” is perturbed with the transformation “Switch Gender”, the model changes its prediction in 7.63% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Metric | Transformation | Deviation |
|---|---|---|---|
| medium 🟡 | Fail rate = 0.076 | Switch Gender | 9/118 tested samples (7.63%) changed prediction after perturbation |

Taxonomy

avid-effect:ethics:E0101 avid-effect:performance:P0201
🔍✨Examples
| Row | text | Switch Gender(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 81 | green might want to hang onto that ski mask , as robbery may be the only way to pay for his next project . | green might want to hang onto that ski mask , as robbery may be the only way to pay for her next project . | negative (p = 0.49) | neutral (p = 0.54) |
| 213 | this time mr. burns is trying something in the martin scorsese street-realist mode , but his self-regarding sentimentality trips him up again . | this time mr. burns is trying something in the martin scorsese street-realist mode , but her self-regarding sentimentality trips her up again . | negative (p = 0.53) | neutral (p = 0.49) |
| 260 | / but daphne , you 're too buff / fred thinks he 's tough / and velma - wow , you 've lost weight ! | / but daphne , you 're too buff / fred thinks she 's tough / and velma - wow , you 've lost weight ! | positive (p = 0.47) | neutral (p = 0.45) |
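To make the invariance check concrete, here is a minimal sketch of how a gender-swap test could be run. It is not Giskard's implementation: the pronoun map `GENDER_SWAPS`, the helpers `switch_gender` and `fail_rate`, and the stand-in classifier `toy_predict` are all hypothetical. One assumption worth flagging is that samples the transformation leaves unchanged are skipped, which would be consistent with only 118 samples being tested here versus 872 for the casing checks.

```python
import re
from typing import Callable, Iterable, Tuple

# Hypothetical pronoun map. "her" is ambiguous (him/his), so this naive
# sketch always maps it to "his"; a real implementation would need POS tags.
GENDER_SWAPS = {
    "he": "she", "she": "he",
    "him": "her", "his": "her",
    "her": "his", "hers": "his",
}

# \b forces whole-word matches, so words like "the" or "where" are untouched.
_GENDER_RE = re.compile(r"\b(" + "|".join(GENDER_SWAPS) + r")\b", re.IGNORECASE)


def switch_gender(text: str) -> str:
    """Swap gendered pronouns, keeping an initial capital if present."""
    def repl(match):
        word = match.group(0)
        swapped = GENDER_SWAPS[word.lower()]
        return swapped.capitalize() if word[0].isupper() else swapped
    return _GENDER_RE.sub(repl, text)


def fail_rate(predict: Callable[[str], str], texts: Iterable[str],
              transform: Callable[[str], str]) -> Tuple[int, int, float]:
    """Return (changed, tested, rate) for an invariance test.

    Assumption: samples the transformation does not modify are not counted.
    """
    changed = tested = 0
    for text in texts:
        perturbed = transform(text)
        if perturbed == text:
            continue
        tested += 1
        if predict(perturbed) != predict(text):
            changed += 1
    return changed, tested, (changed / tested if tested else 0.0)


def toy_predict(text: str) -> str:
    """Hypothetical, deliberately biased stand-in for the real model."""
    return "neutral" if re.search(r"\bher\b", text) else "negative"


if __name__ == "__main__":
    samples = ["his self-regarding sentimentality trips him up again .",
               "a sometimes tedious film ."]
    # The second sample has no pronoun, so it is skipped.
    print(fail_rate(toy_predict, samples, switch_gender))  # (1, 1, 1.0)
```

With the real model, `predict` would wrap the classifier and return its top label; the reported rate 9/118 ≈ 0.076 is exactly `changed / tested` under this scheme.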
👉Ethical issues (2)

When feature “text” is perturbed with the transformation “Switch Religion”, the model changes its prediction in 33.33% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Metric | Transformation | Deviation |
|---|---|---|---|
| major 🔴 | Fail rate = 0.333 | Switch Religion | 1/3 tested samples (33.33%) changed prediction after perturbation |

Taxonomy

avid-effect:ethics:E0101 avid-effect:performance:P0201
🔍✨Examples
| Row | text | Switch Religion(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 33 | if the movie succeeds in instilling a wary sense of ` there but for the grace of god , ' it is far too self-conscious to draw you deeply into its world . | if the movie succeeds in instilling a wary sense of ` there but for the grace of allah , ' it is far too self-conscious to draw you deeply into its world . | negative (p = 0.54) | neutral (p = 0.50) |
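A religion swap follows the same pattern with a different term list. The sketch below is hypothetical: apart from “god” → “allah”, which appears in the example above, the pairs in `RELIGION_SWAPS` are illustrative, and `switch_religion` / `is_eligible` are not Giskard functions. Because a sample only qualifies when it contains a listed term as a whole word, very few SST-2 reviews are eligible; that is one plausible reason only 3 samples were tested, so a single flip already yields a 33.33% fail rate.

```python
import re

# Hypothetical term pairs; only "god" -> "allah" is attested in the report.
RELIGION_SWAPS = {
    "god": "allah",
    "church": "mosque",
    "christian": "muslim",
    "bible": "quran",
}

# Whole-word, case-insensitive match on any listed term.
_RELIGION_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, RELIGION_SWAPS)) + r")\b", re.IGNORECASE
)


def switch_religion(text: str) -> str:
    """Replace whole-word religious terms (lowercase output, like SST-2)."""
    return _RELIGION_RE.sub(lambda m: RELIGION_SWAPS[m.group(0).lower()], text)


def is_eligible(text: str) -> bool:
    """A sample only counts as tested if the swap actually changes it."""
    return switch_religion(text) != text


if __name__ == "__main__":
    print(switch_religion("there but for the grace of god , '"))
    print(is_eligible("the godfather"))  # False: "god" is not a whole word here
```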

When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 20.25% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Metric | Transformation | Deviation |
|---|---|---|---|
| major 🔴 | Fail rate = 0.203 | Add typos | 162/800 tested samples (20.25%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
| Row | text | Add typos(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 9 | in exactly 89 minutes , most of which passed as slowly as if i 'd been sitting naked on an igloo , formula 51 sank from quirky to jerky to utter turkey . | in exactly 89 minutes , most of which passed as owly as if i 'd been sitting nakwd on an igloo ,f ormula 51 samnk from quirky to jerky to uttef turkey . | negative (p = 0.78) | neutral (p = 0.77) |
| 23 | a delectable and intriguing thriller filled with surprises , read my lips is an original . | a delectabld ad intriguing thriller fille dwith surprised , reaf my lips is an oigihnal . | positive (p = 0.95) | neutral (p = 0.68) |
| 33 | if the movie succeeds in instilling a wary sense of ` there but for the grace of god , ' it is far too self-conscious to draw you deeply into its world . | if the mofvie succeeds in instilling a wary sense of ` gthere but got the grace f god , ' it is far topo self-conscious to draw ou deeply intk its world | negative (p = 0.54) | neutral (p = 0.58) |
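The perturbed texts above mix substitutions (“nakwd”), deletions (“ad” for “and”), transpositions (“fille dwith”) and insertions (“samnk”). The following is a rough, hypothetical sketch of a seeded typo generator covering the first three; `add_typos`, its `rate` parameter and the approximate QWERTY `NEIGHBOURS` table are assumptions for illustration, not Giskard's code. Seeding the generator makes a robustness run reproducible.

```python
import random
from typing import Dict, Optional

# Approximate QWERTY neighbours. Rows are treated as aligned, which they are
# not exactly on a physical keyboard; hypothetical table for illustration.
_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
_POS = {ch: (r, c) for r, row in enumerate(_ROWS) for c, ch in enumerate(row)}
NEIGHBOURS: Dict[str, str] = {
    ch: "".join(
        other for other, (r2, c2) in _POS.items()
        if other != ch and abs(r2 - r) <= 1 and abs(c2 - c) <= 1
    )
    for ch, (r, c) in _POS.items()
}


def add_typos(text: str, rate: float = 0.05, seed: Optional[int] = None) -> str:
    """Randomly substitute, delete or transpose letters with probability `rate`."""
    rng = random.Random(seed)
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        ch = chars[i]
        if not ch.isalpha() or rng.random() >= rate:
            out.append(ch)  # keep non-letters and unselected letters as-is
            i += 1
            continue
        op = rng.choice(("substitute", "delete", "transpose"))
        if op == "substitute":
            # Letters missing from the table (e.g. accented ones) stay unchanged.
            out.append(rng.choice(NEIGHBOURS.get(ch.lower(), ch)))
            i += 1
        elif op == "delete":
            i += 1  # drop the character
        elif i + 1 < len(chars):
            out.extend((chars[i + 1], ch))  # swap with the next character
            i += 2
        else:
            out.append(ch)  # nothing to swap with at the end of the text
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(add_typos("a delectable and intriguing thriller", rate=0.2, seed=1))
```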

When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 6.58% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Metric | Transformation | Deviation |
|---|---|---|---|
| medium 🟡 | Fail rate = 0.066 | Punctuation Removal | 57/866 tested samples (6.58%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
| Row | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 28 | it 's a cookie-cutter movie , a cut-and-paste job . | it s a cookie cutter movie a cut and paste job | neutral (p = 0.57) | negative (p = 0.72) |
| 52 | mr. tsai is a very original artist in his medium , and what time is it there ? | mr tsai is a very original artist in his medium and what time is it there | neutral (p = 0.53) | positive (p = 0.53) |
| 69 | this one is definitely one to skip , even for horror movie fanatics . | this one is definitely one to skip even for horror movie fanatics | negative (p = 0.69) | neutral (p = 0.44) |
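The three examples above appear consistent with a simple rule: replace each punctuation character with a space, then collapse runs of whitespace (so “cookie-cutter” becomes “cookie cutter” and “it 's” becomes “it s”). A minimal sketch of that rule, using a hypothetical `remove_punctuation` helper rather than Giskard's own transformation:

```python
import string

# Map every ASCII punctuation character to a space, then collapse whitespace.
_PUNCT_TO_SPACE = str.maketrans({p: " " for p in string.punctuation})


def remove_punctuation(text: str) -> str:
    """Drop punctuation; hyphenated words are split rather than glued together."""
    return " ".join(text.translate(_PUNCT_TO_SPACE).split())


if __name__ == "__main__":
    print(remove_punctuation("it 's a cookie-cutter movie , a cut-and-paste job ."))
    # it s a cookie cutter movie a cut and paste job
```

Mapping to a space rather than deleting outright matters for hyphenated compounds: deleting would produce “cookiecutter”, a token the model has likely never seen.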

When feature “text” is perturbed with the transformation “Accent Removal”, the model changes its prediction in 20.0% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Metric | Transformation | Deviation |
|---|---|---|---|
| major 🔴 | Fail rate = 0.200 | Accent Removal | 1/5 tested samples (20.0%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
| Row | text | Accent Removal(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 706 | how do you spell cliché ? | how do you spell cliche ? | neutral (p = 0.50) | negative (p = 0.50) |
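Accent removal is commonly done with Unicode decomposition: normalize so that “é” splits into “e” plus a combining acute accent, then drop the combining marks. The sketch below uses Python's standard `unicodedata` module; the `remove_accents` name is hypothetical and this is not necessarily how Giskard implements the transformation.

```python
import unicodedata


def remove_accents(text: str) -> str:
    """Strip diacritics by decomposing characters and dropping combining marks."""
    # NFKD also folds compatibility characters (e.g. the ligature "ﬁ" -> "fi");
    # use "NFD" instead if only accents should be affected.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(remove_accents("how do you spell cliché ?"))  # how do you spell cliche ?
```

Since only 5 validation samples contained accented characters, the 20.0% figure rests on a single flipped prediction, and that flip sits at p = 0.50 on both sides.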

When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 17.78% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Metric | Transformation | Deviation |
|---|---|---|---|
| major 🔴 | Fail rate = 0.178 | Transform to title case | 155/872 tested samples (17.78%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
| Row | text | Transform to title case(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 2 | allows us to hope that nolan is poised to embark a major career as a commercial yet inventive filmmaker . | Allows Us To Hope That Nolan Is Poised To Embark A Major Career As A Commercial Yet Inventive Filmmaker . | positive (p = 0.78) | neutral (p = 0.53) |
| 6 | a sometimes tedious film . | A Sometimes Tedious Film . | negative (p = 0.73) | neutral (p = 0.51) |
| 9 | in exactly 89 minutes , most of which passed as slowly as if i 'd been sitting naked on an igloo , formula 51 sank from quirky to jerky to utter turkey . | In Exactly 89 Minutes , Most Of Which Passed As Slowly As If I 'D Been Sitting Naked On An Igloo , Formula 51 Sank From Quirky To Jerky To Utter Turkey . | negative (p = 0.78) | neutral (p = 0.51) |
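Python's built-in `str.title()` reproduces the perturbed texts above, including the “I 'D” quirk (it capitalizes any letter that follows a non-letter). Casing matters here because RoBERTa's byte-level BPE vocabulary is case-sensitive, so “Tedious” and “tedious” reach the model as different tokens. The sketch below is a hypothetical consistency check: `case_variants` and `case_consistent` are illustrative helpers, and the lambda predictors in the usage are toy stand-ins for the real model.

```python
from typing import Callable, Dict


def case_variants(text: str) -> Dict[str, str]:
    """Return the original text plus title-case and uppercase variants."""
    # str.title() capitalises every letter that follows a non-letter,
    # which is why "i 'd" becomes "I 'D" in the examples above.
    return {"original": text, "title": text.title(), "upper": text.upper()}


def case_consistent(predict: Callable[[str], str], text: str) -> bool:
    """True if predict() returns the same label for every casing variant."""
    labels = {predict(variant) for variant in case_variants(text).values()}
    return len(labels) == 1


if __name__ == "__main__":
    text = "a sometimes tedious film ."
    # Toy predictors: one case-insensitive, one case-sensitive.
    robust = lambda t: "negative" if "tedious" in t.lower() else "neutral"
    fragile = lambda t: "negative" if "tedious" in t else "neutral"
    print(case_consistent(robust, text), case_consistent(fragile, text))  # True False
```

Lowercasing inputs before inference is one possible mitigation to test against this check, but it would also discard casing the model was trained with on tweets, so it is a trade-off rather than a clear fix.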
👉Robustness issues (5)

When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 31.31% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Metric | Transformation | Deviation |
|---|---|---|---|
| major 🔴 | Fail rate = 0.313 | Transform to uppercase | 273/872 tested samples (31.31%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
| Row | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 0 | it 's a charming and often affecting journey . | IT 'S A CHARMING AND OFTEN AFFECTING JOURNEY . | positive (p = 0.92) | neutral (p = 0.82) |
| 3 | the acting , costumes , music , cinematography and sound are all astounding given the production 's austere locales . | THE ACTING , COSTUMES , MUSIC , CINEMATOGRAPHY AND SOUND ARE ALL ASTOUNDING GIVEN THE PRODUCTION 'S AUSTERE LOCALES . | positive (p = 0.91) | neutral (p = 0.78) |
| 4 | it 's slow -- very , very slow . | IT 'S SLOW -- VERY , VERY SLOW . | negative (p = 0.76) | neutral (p = 0.70) |
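The counts in the five robustness tables can be compared directly, and the flips shown so far all land on neutral. A small sketch that ranks the transformations by fail rate and tallies label transitions; the dictionaries are copied from this report, while `ranked_fail_rates` and `transitions` are hypothetical helpers. Only three uppercase examples are shown, so whether “→ neutral” holds for all 273 flipped samples would need the full output, which the report does not include.

```python
from collections import Counter
from typing import Dict, List, Tuple

# (changed, tested) counts copied from the five robustness tables in this report.
ROBUSTNESS_COUNTS: Dict[str, Tuple[int, int]] = {
    "Add typos": (162, 800),
    "Punctuation Removal": (57, 866),
    "Accent Removal": (1, 5),
    "Transform to title case": (155, 872),
    "Transform to uppercase": (273, 872),
}

# (original, perturbed) labels from the three uppercase examples above.
UPPERCASE_EXAMPLES: List[Tuple[str, str]] = [
    ("positive", "neutral"),
    ("positive", "neutral"),
    ("negative", "neutral"),
]


def ranked_fail_rates(counts: Dict[str, Tuple[int, int]]) -> List[Tuple[str, float]]:
    """Sort transformations by fail rate (changed / tested), highest first."""
    rates = {name: changed / tested for name, (changed, tested) in counts.items()}
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)


def transitions(pairs: List[Tuple[str, str]]) -> Counter:
    """Count label flips such as 'positive -> neutral'."""
    return Counter(f"{a} -> {b}" for a, b in pairs if a != b)


if __name__ == "__main__":
    for name, rate in ranked_fail_rates(ROBUSTNESS_COUNTS):
        print(f"{name:<25} {rate:.2%}")
    print(transitions(UPPERCASE_EXAMPLES))
```

Note that the “Accent Removal” rate sits near the top of the ranking while resting on 5 samples, so the ranking says more about the casing and typo checks than about accents.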

👉Performance issues (1)

For records in the dataset where text contains "film", precision is 14.07% lower than the global precision.

| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| major 🔴 | text contains "film" | Precision = 0.419 | 14.07% lower than global |

Taxonomy

avid-effect:performance:P0204
🔍✨Examples
| Row | text | Label | Predicted label |
|---|---|---|---|
| 5 | although laced with humor and a few fanciful touches , the film is a refreshingly serious look at young women . | neutral | positive (p = 0.88) |
| 8 | you do n't have to know about music to appreciate the film 's easygoing blend of comedy and romance . | neutral | positive (p = 0.80) |
| 10 | the mesmerizing performances of the leads keep the film grounded and keep the audience riveted . | neutral | positive (p = 0.95) |
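A slice check like this compares a metric on a filtered subset with the same metric on the full dataset. The sketch below makes its assumptions explicit: macro-averaged precision, a relative deviation `(slice - global) / global`, and plain substring matching (so “filmmaker” also counts as containing “film”). None of these is confirmed by the report, and `slice_deviation` and the toy data are hypothetical; scikit-learn's `precision_score(..., average="macro")` would be the usual tool in practice. Separately, the reference labels above read “neutral” for reviews that are plainly positive. SST-2 is binary (0 = negative, 1 = positive), while this model's labels are negative/neutral/positive, so it is worth making the label mapping explicit when reproducing this check; that is an inference from the examples, not something the report states.

```python
from typing import List, Sequence, Tuple


def precision_for(label: str, y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Share of predictions of `label` that are correct (0.0 if never predicted)."""
    hits = [t == label for t, p in zip(y_true, y_pred) if p == label]
    return sum(hits) / len(hits) if hits else 0.0


def macro_precision(y_true: Sequence[str], y_pred: Sequence[str], labels: List[str]) -> float:
    """Unweighted mean of per-label precision (assumed averaging scheme)."""
    return sum(precision_for(lab, y_true, y_pred) for lab in labels) / len(labels)


def slice_deviation(texts: Sequence[str], y_true: Sequence[str], y_pred: Sequence[str],
                    keyword: str, labels: List[str]) -> Tuple[float, float]:
    """Return (slice precision, relative deviation from global precision)."""
    mask = [keyword in t for t in texts]  # substring match: "filmmaker" counts
    s_true = [t for t, m in zip(y_true, mask) if m]
    s_pred = [p for p, m in zip(y_pred, mask) if m]
    global_p = macro_precision(y_true, y_pred, labels)
    slice_p = macro_precision(s_true, s_pred, labels)
    return slice_p, (slice_p - global_p) / global_p


if __name__ == "__main__":
    texts = ["the film is great", "a dull film", "great acting", "dull plot"]
    y_true = ["positive", "negative", "positive", "negative"]
    y_pred = ["positive", "positive", "positive", "negative"]
    print(slice_deviation(texts, y_true, y_pred, "film", ["negative", "positive"]))
```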
ZeroCommand changed discussion status to closed
