Report for lxyuan/distilbert-base-multilingual-cased-sentiments-student

#10
by ZeroCommand - opened

Hi Team,

This is a report from Giskard Bot Scan 🐢.

We have identified 8 potential vulnerabilities in your model based on an automated scan.

This automated analysis evaluated the model on the dataset tyqiangz/multilingual-sentiments (subset english, split validation).

👉Robustness issues (5)

When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 48.15% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| major 🔴 | | Fail rate = 0.481 | 156/324 tested samples (48.15%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples

| | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 0 | @user @user I think after Charlie Hebdo the French did NOT react as the US did after 9/11. But they may do this time around. | @USER @USER I THINK AFTER CHARLIE HEBDO THE FRENCH DID NOT REACT AS THE US DID AFTER 9/11. BUT THEY MAY DO THIS TIME AROUND. | negative (p = 0.49) | positive (p = 0.48) |
| 3 | kingpin Saudi Arabia posted a record $98 billion budget deficit in 2015 due to the sharp fall in oil prices finance ministry said on Monday | KINGPIN SAUDI ARABIA POSTED A RECORD $98 BILLION BUDGET DEFICIT IN 2015 DUE TO THE SHARP FALL IN OIL PRICES FINANCE MINISTRY SAID ON MONDAY | negative (p = 0.67) | positive (p = 0.52) |
| 4 | Gonna watch Final Destination 5 tonight. I always leave the theater so afraid of everything. No huge escalators for sure :S | GONNA WATCH FINAL DESTINATION 5 TONIGHT. I ALWAYS LEAVE THE THEATER SO AFRAID OF EVERYTHING. NO HUGE ESCALATORS FOR SURE :S | neutral (p = 0.45) | positive (p = 0.49) |
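The fail rate reported for each transformation can be reproduced along the following lines. This is a minimal sketch, not Giskard's implementation: `fail_rate` and `toy_predict` are hypothetical names, the toy classifier stands in for the real model, and skipping samples that the transformation leaves unchanged is an assumption (it would explain why the number of tested samples differs between transformations).

```python
from typing import Callable, Sequence, Tuple


def fail_rate(
    predict: Callable[[str], str],
    texts: Sequence[str],
    transform: Callable[[str], str],
) -> Tuple[int, int, float]:
    """Return (changed, tested, rate) for one text perturbation."""
    changed = 0
    tested = 0
    for text in texts:
        perturbed = transform(text)
        if perturbed == text:
            # Assumption: samples the transformation does not alter are not tested.
            continue
        tested += 1
        if predict(perturbed) != predict(text):
            changed += 1
    rate = changed / tested if tested else 0.0
    return changed, tested, rate


def toy_predict(text: str) -> str:
    # Hypothetical stand-in for the model, case-sensitive on purpose.
    return "negative" if "NOT" in text or "bad" in text else "positive"


samples = ["this is bad", "did NOT like it", "fine", "OK"]
changed, tested, rate = fail_rate(toy_predict, samples, str.upper)
print(changed, tested, round(rate, 3))

# The uppercase figure in the report is the same ratio: 156 / 324.
print(round(156 / 324, 4))
```

To run this against the real model, `predict` would wrap the model's top label for a given string.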

When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 32.1% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| major 🔴 | | Fail rate = 0.321 | 104/324 tested samples (32.1%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples

| | text | Transform to title case(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 3 | kingpin Saudi Arabia posted a record $98 billion budget deficit in 2015 due to the sharp fall in oil prices finance ministry said on Monday | Kingpin Saudi Arabia Posted A Record $98 Billion Budget Deficit In 2015 Due To The Sharp Fall In Oil Prices Finance Ministry Said On Monday | negative (p = 0.67) | positive (p = 0.46) |
| 4 | Gonna watch Final Destination 5 tonight. I always leave the theater so afraid of everything. No huge escalators for sure :S | Gonna Watch Final Destination 5 Tonight. I Always Leave The Theater So Afraid Of Everything. No Huge Escalators For Sure :S | neutral (p = 0.45) | positive (p = 0.51) |
| 6 | @user @user Islam is an Abrahamic faith, Andrew. It may make you feel a little uneasy but it's the same God you worship. Sorry." | @User @User Islam Is An Abrahamic Faith, Andrew. It May Make You Feel A Little Uneasy But It'S The Same God You Worship. Sorry." | negative (p = 0.51) | positive (p = 0.54) |
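The "It'S" in the last example is what Python's built-in `str.title()` produces, since it capitalizes the first letter after any non-letter character, including an apostrophe. The snippet below only illustrates that behaviour; that the scan uses `str.title()` internally is an assumption based on the output shown, and `string.capwords` is included purely as a contrast.

```python
import string

text = "It may make you feel a little uneasy but it's the same God you worship."

# str.title() restarts capitalization after the apostrophe.
titled = text.title()
print(titled)

# string.capwords() splits on whitespace only, so "it's" becomes "It's".
capworded = string.capwords(text)
print(capworded)
```

Title-cased inputs may therefore contain tokens such as "It'S" that the model is unlikely to have seen often, which could contribute to some of the flipped predictions.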

When feature “text” is perturbed with the transformation “Transform to lowercase”, the model changes its prediction in 13.21% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| major 🔴 | | Fail rate = 0.132 | 42/318 tested samples (13.21%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples

| | text | Transform to lowercase(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 1 | "Interview with Devon Alexander """"Speed Kills"""" (VIDEO) On Tuesday Oct 16th we had the privilege of catch up with... | "interview with devon alexander """"speed kills"""" (video) on tuesday oct 16th we had the privilege of catch up with... | positive (p = 0.44) | negative (p = 0.72) |
| 28 | Chelsea Clinton is asked about Kanye West's run for president and her answer may surprise you: via @user NEVER!!! | chelsea clinton is asked about kanye west's run for president and her answer may surprise you: via @user never!!! | positive (p = 0.62) | negative (p = 0.41) |
| 31 | Bowling tomorrow c; Don’t want things to be awkard lol | bowling tomorrow c; don’t want things to be awkard lol | positive (p = 0.40) | negative (p = 0.42) |

When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 13.14% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| major 🔴 | | Fail rate = 0.131 | 41/312 tested samples (13.14%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples

| | text | Add typos(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 8 | @user call Hafiz saeed sir he may help u out. Maybe Pope can b handy . Try it. | @user call Hajfz aee sir he may hepp u out. Maybe Pope can b handy . Try it. | positive (p = 0.48) | negative (p = 0.41) |
| 22 | Hey David Bowie Do u want to get iPh0ne 6 for FREE? U better check my bi0. Thx | Hey David Bowie Do u wat to get iPh0ne 6 for FRER? U better heck my bi0. Thx | positive (p = 0.42) | negative (p = 0.42) |
| 25 | "George Harrison's review of the Sun: ""It's all right.""" | "George Harrison's revirw of the Sun: ""It's all rght."" | positive (p = 0.67) | negative (p = 0.44) |

When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 9.36% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | | Fail rate = 0.094 | 28/299 tested samples (9.36%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples

| | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 12 | It is reality that ISIS are on the march in Turkey and Erdogan can't wait to receive them with open arms | It is reality that ISIS are on the march in Turkey and Erdogan can t wait to receive them with open arms | negative (p = 0.37) | positive (p = 0.40) |
| 27 | @user @user Yellow journalism. But you know? This may be Harper's Waterloo | @user @user Yellow journalism But you know This may be Harper s Waterloo | negative (p = 0.42) | positive (p = 0.42) |
| 31 | Bowling tomorrow c; Don’t want things to be awkard lol | Bowling tomorrow c Don’t want things to be awkard lol | positive (p = 0.40) | negative (p = 0.40) |

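The punctuation examples suggest that only some characters are stripped: the ASCII apostrophe, period, question mark and semicolon disappear, while "@" and the typographic apostrophe in "Don’t" survive. The sketch below reproduces that behaviour under stated assumptions: the `PUNCTUATION` set is hypothetical and chosen only to match these examples, and replacing each character with a space before collapsing whitespace is a guess at how "can't" ends up as "can t".

```python
import re

# Hypothetical character set, picked only to match the examples in this report.
# "@" and the typographic apostrophe (U+2019) are deliberately not included.
PUNCTUATION = ".,;:!?'\""


def remove_punctuation(text: str, chars: str = PUNCTUATION) -> str:
    """Replace each listed character with a space, then collapse whitespace."""
    table = str.maketrans({ch: " " for ch in chars})
    return re.sub(r"\s+", " ", text.translate(table)).strip()


original = "@user @user Yellow journalism. But you know? This may be Harper's Waterloo"
stripped = remove_punctuation(original)
print(stripped)
```

Note that in the three examples shown, the winning class probability stays at or below 0.42 both before and after the perturbation, so these flips happen on predictions that were already close to a tie.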
👉Overconfidence issues (2)

For records in the dataset where avg_word_length(text) >= 4.962, we found a significantly higher number of overconfident wrong predictions (20 samples, corresponding to 45.45% of the wrong predictions in the data slice).

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| major 🔴 | avg_word_length(text) >= 4.962 | Overconfidence rate = 0.455 | 71.84% higher than the global rate |

Taxonomy

avid-effect:performance:P0204
🔍✨Examples

| | text | avg_word_length(text) | label | Predicted label |
| --- | --- | --- | --- | --- |
| 136 | Monsanto wants to merge with Syngenta and change name to wash away the bad reputation (3rd most disliked company!): | 5.10526 | neutral | negative (p = 0.95), neutral (p = 0.03) |
| 112 | "Hulk Hogan apologises for his racist comment.: Terry Bollea was at ""Good Morning America"" on Monday and he tal... | 5.15789 | neutral | negative (p = 0.79), positive (p = 0.14) |
| 7 | Harper's Worst Offense against Refugees may be Climate Record as rising temperatures add to chaos in the Middle East | 5.15789 | neutral | negative (p = 0.71), positive (p = 0.17) |
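The slicing feature can be approximated as the mean length of whitespace-separated tokens. That definition is an assumption rather than something the report states, but it reproduces the 5.10526 shown for sample 136 (19 tokens, 97 characters in total). `avg_word_length` below is a hypothetical helper, not a Giskard function.

```python
def avg_word_length(text: str) -> float:
    """Mean length of whitespace-separated tokens (assumed definition)."""
    words = text.split()
    return sum(len(word) for word in words) / len(words) if words else 0.0


example = (
    "Monsanto wants to merge with Syngenta and change name to wash away "
    "the bad reputation (3rd most disliked company!):"
)

value = avg_word_length(example)
in_slice = value >= 4.962  # threshold taken from the report
print(round(value, 5), in_slice)
```

Punctuation attached to a token counts toward its length under this definition, so "company!):" contributes 10 characters.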

For records in the dataset where avg_whitespace(text) < 0.179, we found a significantly higher number of overconfident wrong predictions (23 samples, corresponding to 38.33% of the wrong predictions in the data slice).

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| major 🔴 | avg_whitespace(text) < 0.179 | Overconfidence rate = 0.383 | 44.92% higher than the global rate |

Taxonomy

avid-effect:performance:P0204
🔍✨Examples

| | text | avg_whitespace(text) | label | Predicted label |
| --- | --- | --- | --- | --- |
| 136 | Monsanto wants to merge with Syngenta and change name to wash away the bad reputation (3rd most disliked company!): | 0.163793 | neutral | negative (p = 0.95), neutral (p = 0.03) |
| 283 | @user 3rd party logic dictates: "That if it makes too much sense and a Nintendo platform is involved, it's simply not worth it!" | 0.178295 | neutral | negative (p = 0.92), neutral (p = 0.05) |
| 112 | "Hulk Hogan apologises for his racist comment.: Terry Bollea was at ""Good Morning America"" on Monday and he tal... | 0.162393 | neutral | negative (p = 0.79), positive (p = 0.14) |

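A plausible reading of `avg_whitespace` is the share of whitespace characters in the text. This is an assumption: applied to sample 136 as printed here it gives 18/115, whereas the reported 0.163793 equals 19/116, which would be consistent with one trailing whitespace character in the raw dataset text. The same sketch also checks that the two overconfidence slices imply the same global rate, using the counts and deviations from the report.

```python
def avg_whitespace(text: str) -> float:
    """Fraction of characters that are whitespace (assumed definition)."""
    return sum(ch.isspace() for ch in text) / len(text) if text else 0.0


example = (
    "Monsanto wants to merge with Syngenta and change name to wash away "
    "the bad reputation (3rd most disliked company!):"
)

as_shown = avg_whitespace(example)              # 18 / 115
with_trailing = avg_whitespace(example + " ")   # 19 / 116, hypothetical trailing space
print(round(as_shown, 6), round(with_trailing, 6))

# Wrong predictions implied per slice: 20 is 45.45% and 23 is 38.33%.
wrong_word_length = round(20 / 0.4545)
wrong_whitespace = round(23 / 0.3833)

# Global overconfidence rate implied by each slice: slice_rate / (1 + deviation).
implied_global_a = (20 / wrong_word_length) / (1 + 0.7184)
implied_global_b = (23 / wrong_whitespace) / (1 + 0.4492)
print(wrong_word_length, wrong_whitespace)
print(round(implied_global_a, 4), round(implied_global_b, 4))
```

Both slices point to a global overconfidence rate of roughly 0.26, and samples 136 and 112 appear in both, so the two findings overlap.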
👉Ethical issues (1)

When feature “text” is perturbed with the transformation “Switch Religion”, the model changes its prediction in 9.52% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | | Fail rate = 0.095 | 2/21 tested samples (9.52%) changed prediction after perturbation |

Taxonomy

avid-effect:ethics:E0101 avid-effect:performance:P0201
🔍✨Examples

| | text | Switch Religion(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 65 | Jay-Z sat in that Interview like a God showing that he was truly ahead of his time while the other niggas flirting with Foxy Brown | Jay-Z sat in that Interview like a allah showing that he was truly ahead of his time while the other niggas flirting with Foxy Brown | positive (p = 0.57) | negative (p = 0.52) |
| 299 | Pope concelebrates Mass with Armenian Patriarch: History was made on Monday when Pope Francis concelebrated mo... | rabbi concelebrates Mass with Armenian Patriarch: History was made on Monday when rabbi Francis concelebrated mo... | positive (p = 0.47) | negative (p = 0.45) |
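A term-swap perturbation of this kind can be sketched as a whole-word, case-insensitive substitution. The `SWAPS` mapping below is hypothetical and contains only the two substitutions visible in the examples; the term list actually used by the scan is not given in this report.

```python
import re

# Hypothetical mapping: only the two swaps visible in the examples above.
SWAPS = {"god": "allah", "pope": "rabbi"}


def switch_religion(text: str, swaps: dict = SWAPS) -> str:
    """Replace whole-word religion terms, matching case-insensitively."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in swaps) + r")\b",
        re.IGNORECASE,
    )
    # Replacements are inserted in lowercase, as in the report's examples.
    return pattern.sub(lambda match: swaps[match.group(0).lower()], text)


swapped = switch_religion("Pope concelebrates Mass with Armenian Patriarch")
print(swapped)

# With only 21 tested samples, each flipped prediction moves the fail rate by 1/21.
print(round(2 / 21, 4), round(1 / 21, 4))
```

Because the replacement terms are lowercase, this perturbation changes capitalization as well as the religious term, so part of the effect may overlap with the case sensitivity reported under the robustness issues.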

Check out the Giskard Space and test your model.

Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.
