Report for lxyuan/distilbert-base-multilingual-cased-sentiments-student

Hi Team,

This is a report from Giskard Bot Scan 🐢.

We have identified 6 potential vulnerabilities in your model based on an automated scan.

This automated analysis evaluated the model on the dataset tyqiangz/multilingual-sentiments (subset all, split train).

You can find the full version of the scan report here.

👉Robustness issues (5)

When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 25.2% of the cases. We expected the predictions not to be affected by this transformation.

Level	Metric	Transformation	Deviation
major 🔴	Fail rate = 0.252	Transform to uppercase	252/1000 tested samples (25.2%) changed prediction after perturbation

Taxonomy

avid-effect:performance:P0201

🔍✨Examples

	text	Transform to uppercase(text)	Original prediction	Prediction after perturbation
262380	sudirman said sekarang jadi calon gubernur dari gerindra . kacau grup munafik semua .	SUDIRMAN SAID SEKARANG JADI CALON GUBERNUR DARI GERINDRA . KACAU GRUP MUNAFIK SEMUA .	negative (p = 0.77)	positive (p = 0.56)
137125	@user muy triste La prensa es lo que es. Un medio oportunista, en muchas ocaciones, que por vender, lo que sea	@USER MUY TRISTE LA PRENSA ES LO QUE ES. UN MEDIO OPORTUNISTA, EN MUCHAS OCACIONES, QUE POR VENDER, LO QUE SEA	negative (p = 0.98)	positive (p = 0.54)
6322	😱 Food poisoning	😱 FOOD POISONING	negative (p = 0.58)	positive (p = 0.54)
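The fail rate above is the share of tested samples whose predicted label changes after the transformation. A minimal sketch of that computation follows. It assumes a hypothetical `predict` callable that maps a list of texts to a list of labels, and it assumes that samples left unchanged by the transformation are skipped. The toy classifier is only a stand-in for illustration, not the scanned model, and this is not necessarily how the scan itself is implemented.

```python
from typing import Callable, List, Sequence


def perturbation_fail_rate(
    predict: Callable[[Sequence[str]], Sequence[str]],
    texts: Sequence[str],
    transform: Callable[[str], str],
) -> float:
    """Fraction of perturbed samples whose predicted label changes."""
    # Keep only samples that the transformation actually modifies (assumption).
    pairs = [(t, transform(t)) for t in texts]
    pairs = [(a, b) for a, b in pairs if a != b]
    if not pairs:
        return 0.0
    before = predict([a for a, _ in pairs])
    after = predict([b for _, b in pairs])
    changed = sum(x != y for x, y in zip(before, after))
    return changed / len(pairs)


def toy_predict(texts: Sequence[str]) -> List[str]:
    # Hypothetical, deliberately case-sensitive stand-in for a real model.
    return ["negative" if "bad" in t else "positive" for t in texts]


samples = ["bad food", "great food", "so bad", "12345"]
rate = perturbation_fail_rate(toy_predict, samples, str.upper)
print(f"fail rate = {rate:.3f}")
```

To reproduce a row of the report, `toy_predict` would be replaced by a wrapper around the real model and `samples` by the tested subset of the dataset.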

When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 13.7% of the cases. We expected the predictions not to be affected by this transformation.

Level	Metric	Transformation	Deviation
major 🔴	Fail rate = 0.137	Transform to title case	137/1000 tested samples (13.7%) changed prediction after perturbation

Taxonomy

avid-effect:performance:P0201

🔍✨Examples

	text	Transform to title case(text)	Original prediction	Prediction after perturbation
131947	"Because seriously, who the **** is Chris Evans? Virtually world famous in all of England. Real stars like Sabine may save that show but..."	"Because Seriously, Who The **** Is Chris Evans? Virtually World Famous In All Of England. Real Stars Like Sabine May Save That Show But..."	negative (p = 0.48)	positive (p = 0.58)
257746	tempat lumayan romantis , kalau bawa pasangan . karena kebetulan pas saya ke sana , malam hari tapi gerimis , jadi pemandangan ke bawah , kurang oke . sayang banget , padahal sudah siap-siap bawa kamera . seragam pelayan sepertinya harus diganti deh , saya pikir security , ternyata pelayan : ronde nya enak . kita pesan satu makanan juga tapi sepertinya lupa diorder ke dapur .	Tempat Lumayan Romantis , Kalau Bawa Pasangan . Karena Kebetulan Pas Saya Ke Sana , Malam Hari Tapi Gerimis , Jadi Pemandangan Ke Bawah , Kurang Oke . Sayang Banget , Padahal Sudah Siap-Siap Bawa Kamera . Seragam Pelayan Sepertinya Harus Diganti Deh , Saya Pikir Security , Ternyata Pelayan : Ronde Nya Enak . Kita Pesan Satu Makanan Juga Tapi Sepertinya Lupa Diorder Ke Dapur .	negative (p = 0.59)	positive (p = 0.79)
123005	◆外見・構造見た目は写真通りで、玩具のようで高級感は無いです。 USBメモリーがレールに沿って中身ごと動くタイプなので、USB差込部がやたらカチャカチャとグラ付いていても性能上の問題はないのだと思いますが、やはり不安は感じてしまいます。◆性能文句なしに速いです。データ復旧・バックアップ・移行用として優秀だと思います。◆総評完全に実用品です。アクセサリーとしてや、持っているという満足感はあまり感じませんが、気取らず普段使いするのに最適だと思います。	◆外見・構造見た目は写真通りで、玩具のようで高級感は無いです。 Usbメモリーがレールに沿って中身ごと動くタイプなので、Usb差込部がやたらカチャカチャとグラ付いていても性能上の問題はないのだと思いますが、やはり不安は感じてしまいます。◆性能文句なしに速いです。データ復旧・バックアップ・移行用として優秀だと思います。◆総評完全に実用品です。アクセサリーとしてや、持っているという満足感はあまり感じませんが、気取らず普段使いするのに最適だと思います。	negative (p = 0.37)	neutral (p = 0.37)
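In the last example the only visible change is "USB" becoming "Usb", because the Japanese characters have no letter case. This is consistent with Python's built-in `str.title()`, although the report does not confirm that the scan uses it. The sketch below works under that assumption, and `title_case` is a hypothetical helper name.

```python
def title_case(text: str) -> str:
    # str.title() upper-cases the first cased letter of each run of letters
    # and lower-cases the remaining cased letters; uncased characters
    # (kana, kanji, digits, punctuation) are left untouched.
    return text.title()


print(title_case("USBメモリー"))
print(title_case("siap-siap bawa kamera"))
```

For text that is mostly in a script without letter case, the perturbation therefore touches only a handful of characters.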

When feature “text” is perturbed with the transformation “Accent Removal”, the model changes its prediction in 8.3% of the cases. We expected the predictions not to be affected by this transformation.

Level	Metric	Transformation	Deviation
medium 🟡	Fail rate = 0.083	Accent Removal	83/1000 tested samples (8.3%) changed prediction after perturbation

Taxonomy

avid-effect:performance:P0201

🔍✨Examples

	text	Accent Removal(text)	Original prediction	Prediction after perturbation
89830	短い髪ならこれ一個でもいけます！長めの方でも他のワックスと混ぜるとなかなかいい感じに束とツヤが出ます。基本的には単体で使うよりは混ぜて使うものだと思います。	短い髪ならこれ一個てもいけます！長めの方ても他のワックスと混せるとなかなかいい感しに束とツヤか出ます。基本的には単体て使うよりは混せて使うものたと思います。	positive (p = 0.37)	negative (p = 0.40)
22409	ここのはバッタもんです。写真左は正規品、右がこの会社（inside.siro）の販売しているものです。わざとらしく「内海産業超給水マイクロファイバー傘カバーグリーン」などと書いたシールが貼ってありますが、包装が正規品とは明らかに違ってちゃちです。商品自体で使われている生地もファスナーも、いかにも安物という感じでした。なお、他の会社が販売しているブルーやオレンジは、正規品でした。	ここのはハッタもんてす。写真左は正規品、右かこの会社（inside.siro）の販売しているものてす。わさとらしく「内海産業超給水マイクロファイハー傘カハークリーン」なとと書いたシールか貼ってありますか、包装か正規品とは明らかに違ってちゃちてす。商品自体て使われている生地もファスナーも、いかにも安物という感してした。なお、他の会社か販売しているフルーやオレンシは、正規品てした。	positive (p = 0.39)	neutral (p = 0.37)
108664	ビデオカメラモニター用に購入しました。あくまでもモニター用なので、音質等には拘らず、軽量･ホールド感･音のクリアーさ･低価格位の基準で、加えてレビュー等も参考にして決めました。今の所申し分無く目的に叶っています。コードも2m有り充分カメラ迄の距離が保て満足です。	ヒテオカメラモニター用に購入しました。あくまてもモニター用なのて、音質等には拘らす、軽量･ホールト感･音のクリアーさ･低価格位の基準て、加えてレヒュー等も参考にして決めました。今の所申し分無く目的に叶っています。コートも2m有り充分カメラ迄の距離か保て満足てす。	positive (p = 0.35)	negative (p = 0.36)
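In the Japanese examples above, the transformation strips the voicing marks from kana (で becomes て, バ becomes ハ). One common way to implement accent removal produces exactly this effect: decompose the text with Unicode NFD and drop the combining marks. The sketch below assumes that approach. The report does not say how the scan implements the transformation, and `remove_accents` is a hypothetical name.

```python
import unicodedata


def remove_accents(text: str) -> str:
    """Strip combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" (nonspacing mark) covers Latin diacritics as well as
    # the kana voicing marks U+3099 and U+309A.
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", stripped)


print(remove_accents("ocasión"))
print(remove_accents("ビデオカメラ"))
```

Unlike Latin diacritics, the kana voicing marks distinguish different consonants, so this perturbation can turn a Japanese word into a different or nonexistent one. That may be worth keeping in mind when reviewing these examples.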

When feature “text” is perturbed with the transformation “Transform to lowercase”, the model changes its prediction in 6.2% of the cases. We expected the predictions not to be affected by this transformation.

Level	Metric	Transformation	Deviation
medium 🟡	Fail rate = 0.062	Transform to lowercase	62/1000 tested samples (6.2%) changed prediction after perturbation

Taxonomy

avid-effect:performance:P0201

🔍✨Examples

	text	Transform to lowercase(text)	Original prediction	Prediction after perturbation
59688	電波時計となってるけど、Bluetoothで携帯とリンクして時刻を合わせてるだけではないのかな⁉ それに、G-shockアプリですが携帯の電池の消耗が激しすぎます。使うだけなら、軽くて良い時計かなと思いますが、なら電池式でも良かったような感じです	電波時計となってるけど、bluetoothで携帯とリンクして時刻を合わせてるだけではないのかな⁉ それに、g-shockアプリですが携帯の電池の消耗が激しすぎます。使うだけなら、軽くて良い時計かなと思いますが、なら電池式でも良かったような感じです	positive (p = 0.47)	negative (p = 0.42)
127209	RT @user : Schrott Ankauf Lahib ( http ) macht das Unmögliche möglich ;)	rt @user : schrott ankauf lahib ( http ) macht das unmögliche möglich ;)	positive (p = 0.45)	negative (p = 0.44)
230796	美国版的果然大，我183cm，88kg，穿这个xl偏肥。穿L应该合适。	美国版的果然大，我183cm，88kg，穿这个xl偏肥。穿l应该合适。	neutral (p = 0.38)	negative (p = 0.38)

When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 5.1% of the cases. We expected the predictions not to be affected by this transformation.

Level	Metric	Transformation	Deviation
medium 🟡	Fail rate = 0.051	Punctuation Removal	51/1000 tested samples (5.1%) changed prediction after perturbation

Taxonomy

avid-effect:performance:P0201

🔍✨Examples

	text	Punctuation Removal(text)	Original prediction	Prediction after perturbation
5012	Kaha kaisy? M karna chati ho aapsy bohot sara baat	Kaha kaisy M karna chati ho aapsy bohot sara baat	negative (p = 0.37)	positive (p = 0.41)
132683	#MasterChefBR Alguém gosta de basquete? SDV	#MasterChefBR Alguém gosta de basquete SDV	negative (p = 0.39)	positive (p = 0.57)
21167	“音が異常にうるさいのですが、新しいものを送ってもらえますか?”	音が異常にうるさいのですが、新しいものを送ってもらえますか	negative (p = 0.37)	positive (p = 0.47)

👉Overconfidence issues (1)

For records in the dataset where text_length(text) < 48.500, we found a significantly higher number of overconfident wrong predictions (21791 samples, corresponding to 45.54% of the wrong predictions in the data slice).

Level	Data slice	Metric	Deviation
medium 🟡	`text_length(text)` < 48.500	Overconfidence rate = 0.455	+12.08% vs. global

Taxonomy

avid-effect:performance:P0204

🔍✨Examples

	text	text_length(text)	label	Predicted `label`
146459	非常好的书，构思巧妙引人入胜，周围的朋友都愿意推荐	25	negative	positive (p = 1.00)
				neutral (p = 0.00)
166738	这本书很好，是我想象总得那种书，读音光碟也很好，刚好适合我学习	31	negative	positive (p = 1.00)
				neutral (p = 0.00)
177090	这书非常好，跟扎实，很全面，是准备公考的好帮手。	24	negative	positive (p = 1.00)
				neutral (p = 0.00)
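The report does not spell out how an "overconfident" wrong prediction is defined. One plausible reading is sketched below: among the wrong predictions, count those where the probability of the predicted label exceeds the probability of the true label by more than a threshold. The function name, the threshold value of 0.5, and the toy probabilities are all assumptions for illustration.

```python
from typing import Sequence


def overconfidence_rate(
    probs: Sequence[Sequence[float]],
    true_idx: Sequence[int],
    threshold: float = 0.5,  # assumed cutoff, not taken from the report
) -> float:
    """Share of wrong predictions whose confidence gap exceeds `threshold`."""
    wrong = 0
    overconfident = 0
    for p, y in zip(probs, true_idx):
        pred = max(range(len(p)), key=p.__getitem__)  # argmax
        if pred == y:
            continue  # only wrong predictions are considered
        wrong += 1
        if p[pred] - p[y] > threshold:
            overconfident += 1
    return overconfident / wrong if wrong else 0.0


# Columns: negative, neutral, positive (toy values).
probs = [
    [0.00, 0.00, 1.00],  # true: negative -> wrong, gap 1.00
    [0.40, 0.35, 0.25],  # true: neutral  -> wrong, gap 0.05
    [0.10, 0.20, 0.70],  # true: positive -> correct
]
rate = overconfidence_rate(probs, true_idx=[0, 1, 2])
print(f"overconfidence rate = {rate:.2f}")
```

Under this reading, the three examples above (predicted positive with p = 1.00 against a negative label) would each count as overconfident.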

Check out the Giskard Space and Giskard Documentation to learn more about how to test your model.

Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.