Hi Team,
This is a report from Giskard Bot Scan 🐢.
We have identified 8 potential vulnerabilities in your model based on an automated scan.
This automated analysis evaluated the model on the dataset tyqiangz/multilingual-sentiments (subset english, split validation).
👉Robustness issues (5)
When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 48.15% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| major 🔴 | — | Fail rate = 0.481 | 156/324 tested samples (48.15%) changed prediction after perturbation |

Taxonomy
avid-effect:performance:P0201
🔍✨Examples

|  | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 0 | @user @user I think after Charlie Hebdo the French did NOT react as the US did after 9/11. But they may do this time around. | @USER @USER I THINK AFTER CHARLIE HEBDO THE FRENCH DID NOT REACT AS THE US DID AFTER 9/11. BUT THEY MAY DO THIS TIME AROUND. | negative (p = 0.49) | positive (p = 0.48) |
| 3 | kingpin Saudi Arabia posted a record $98 billion budget deficit in 2015 due to the sharp fall in oil prices finance ministry said on Monday | KINGPIN SAUDI ARABIA POSTED A RECORD $98 BILLION BUDGET DEFICIT IN 2015 DUE TO THE SHARP FALL IN OIL PRICES FINANCE MINISTRY SAID ON MONDAY | negative (p = 0.67) | positive (p = 0.52) |
| 4 | Gonna watch Final Destination 5 tonight. I always leave the theater so afraid of everything. No huge escalators for sure :S | GONNA WATCH FINAL DESTINATION 5 TONIGHT. I ALWAYS LEAVE THE THEATER SO AFRAID OF EVERYTHING. NO HUGE ESCALATORS FOR SURE :S | neutral (p = 0.45) | positive (p = 0.49) |

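
The fail rate in a check like this is the share of tested samples whose predicted label changes once the text is perturbed. The following is a minimal sketch of that computation, not Giskard's implementation: `fail_rate` is an illustrative helper, `toy_predict` is a hypothetical case-sensitive classifier standing in for the scanned model, and skipping samples that the transformation leaves unchanged is an assumption.

```python
from typing import Callable, Sequence


def fail_rate(
    predict: Callable[[str], str],
    texts: Sequence[str],
    perturb: Callable[[str], str],
) -> float:
    """Share of tested texts whose predicted label changes after perturbation."""
    changed = 0
    tested = 0
    for text in texts:
        perturbed = perturb(text)
        if perturbed == text:
            # Assumption: samples the transformation does not alter are not tested.
            continue
        tested += 1
        if predict(text) != predict(perturbed):
            changed += 1
    return changed / tested if tested else 0.0


def toy_predict(text: str) -> str:
    """Hypothetical classifier that only recognises the lowercase cue 'afraid'."""
    return "negative" if "afraid" in text else "positive"


samples = [
    "I always leave the theater so afraid of everything.",
    "Gonna watch a movie tonight.",
]
rate = fail_rate(toy_predict, samples, str.upper)
print(f"Fail rate = {rate:.3f}")  # Fail rate = 0.500
```

To run the same check against a real model, pass a function that returns its predicted label as `predict`.
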
When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 32.1% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| major 🔴 | — | Fail rate = 0.321 | 104/324 tested samples (32.1%) changed prediction after perturbation |

Taxonomy
avid-effect:performance:P0201
🔍✨Examples

|  | text | Transform to title case(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 3 | kingpin Saudi Arabia posted a record $98 billion budget deficit in 2015 due to the sharp fall in oil prices finance ministry said on Monday | Kingpin Saudi Arabia Posted A Record $98 Billion Budget Deficit In 2015 Due To The Sharp Fall In Oil Prices Finance Ministry Said On Monday | negative (p = 0.67) | positive (p = 0.46) |
| 4 | Gonna watch Final Destination 5 tonight. I always leave the theater so afraid of everything. No huge escalators for sure :S | Gonna Watch Final Destination 5 Tonight. I Always Leave The Theater So Afraid Of Everything. No Huge Escalators For Sure :S | neutral (p = 0.45) | positive (p = 0.51) |
| 6 | @user @user Islam is an Abrahamic faith, Andrew. It may make you feel a little uneasy but it's the same God you worship. Sorry." | @User @User Islam Is An Abrahamic Faith, Andrew. It May Make You Feel A Little Uneasy But It'S The Same God You Worship. Sorry." | negative (p = 0.51) | positive (p = 0.54) |

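
The `It'S` in the title-cased example matches how Python's built-in `str.title()` handles apostrophes, since it capitalises the first letter after any non-letter character. The short sketch below applies the three built-in case methods to part of that example. Whether the scanner uses these exact methods internally is an assumption.

```python
# Part of example 6 from the title-case finding above.
original = "It may make you feel a little uneasy but it's the same God you worship."

# Built-in string methods that correspond to the three case transformations.
upper = original.upper()
title = original.title()
lower = original.lower()

print(upper)
print(title)  # It May Make You Feel A Little Uneasy But It'S The Same God You Worship.
print(lower)
```
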
When feature “text” is perturbed with the transformation “Transform to lowercase”, the model changes its prediction in 13.21% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| major 🔴 | — | Fail rate = 0.132 | 42/318 tested samples (13.21%) changed prediction after perturbation |

Taxonomy
avid-effect:performance:P0201
🔍✨Examples

|  | text | Transform to lowercase(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 1 | "Interview with Devon Alexander """"Speed Kills"""" (VIDEO) On Tuesday Oct 16th we had the privilege of catch up with... | "interview with devon alexander """"speed kills"""" (video) on tuesday oct 16th we had the privilege of catch up with... | positive (p = 0.44) | negative (p = 0.72) |
| 28 | Chelsea Clinton is asked about Kanye West's run for president and her answer may surprise you: via @user NEVER!!! | chelsea clinton is asked about kanye west's run for president and her answer may surprise you: via @user never!!! | positive (p = 0.62) | negative (p = 0.41) |
| 31 | Bowling tomorrow c; Don\u2019t want things to be awkard lol | bowling tomorrow c; don\u2019t want things to be awkard lol | positive (p = 0.40) | negative (p = 0.42) |

When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 13.14% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| major 🔴 | — | Fail rate = 0.131 | 41/312 tested samples (13.14%) changed prediction after perturbation |

Taxonomy
avid-effect:performance:P0201
🔍✨Examples

|  | text | Add typos(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 8 | @user call Hafiz saeed sir he may help u out. Maybe Pope can b handy . Try it. | @user call Hajfz aee sir he may hepp u out. Maybe Pope can b handy . Try it. | positive (p = 0.48) | negative (p = 0.41) |
| 22 | Hey David Bowie Do u want to get iPh0ne 6 for FREE? U better check my bi0. Thx | Hey David Bowie Do u wat to get iPh0ne 6 for FRER? U better heck my bi0. Thx | positive (p = 0.42) | negative (p = 0.42) |
| 25 | "George Harrison's review of the Sun: ""It's all right.""" | "George Harrison's revirw of the Sun: ""It's all rght."" | positive (p = 0.67) | negative (p = 0.44) |

When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 9.36% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| medium 🟡 | — | Fail rate = 0.094 | 28/299 tested samples (9.36%) changed prediction after perturbation |

Taxonomy
avid-effect:performance:P0201
🔍✨Examples

|  | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 12 | It is reality that ISIS are on the march in Turkey and Erdogan can't wait to receive them with open arms | It is reality that ISIS are on the march in Turkey and Erdogan can t wait to receive them with open arms | negative (p = 0.37) | positive (p = 0.40) |
| 27 | @user @user Yellow journalism. But you know? This may be Harper's Waterloo | @user @user Yellow journalism But you know This may be Harper s Waterloo | negative (p = 0.42) | positive (p = 0.42) |
| 31 | Bowling tomorrow c; Don\u2019t want things to be awkard lol | Bowling tomorrow c Don\u2019t want things to be awkard lol | positive (p = 0.40) | negative (p = 0.40) |

👉Overconfidence issues (2)
For records in the dataset where avg_word_length(text) >= 4.962, we found a significantly higher number of overconfident wrong predictions (20 samples, corresponding to 45.45% of the wrong predictions in the data slice).

| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| major 🔴 | avg_word_length(text) >= 4.962 | Overconfidence rate = 0.455 | 71.84% higher than global |

Taxonomy
avid-effect:performance:P0204
🔍✨Examples

|  | text | avg_word_length(text) | label | Predicted label |
|---|---|---|---|---|
| 136 | Monsanto wants to merge with Syngenta and change name to wash away the bad reputation (3rd most disliked company!): | 5.10526 | neutral | negative (p = 0.95) |
|  |  |  |  | neutral (p = 0.03) |
| 112 | "Hulk Hogan apologises for his racist comment.: Terry Bollea was at ""Good Morning America"" on Monday and he tal... | 5.15789 | neutral | negative (p = 0.79) |
|  |  |  |  | positive (p = 0.14) |
| 7 | Harper's Worst Offense against Refugees may be Climate Record as rising temperatures add to chaos in the Middle East | 5.15789 | neutral | negative (p = 0.71) |
|  |  |  |  | positive (p = 0.17) |

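
The two slicing features used in the overconfidence findings, `avg_word_length(text)` and `avg_whitespace(text)`, can be approximated with a few lines of Python. The sketch below is an assumption about how they are defined (mean length of whitespace-separated tokens, and the share of whitespace characters in the string), not the scanner's actual code. It reproduces the values reported for record 136 (5.10526 and 0.163793), but the second value only matches if the raw text carries one extra whitespace character, assumed here to be a trailing space that the rendered report does not show.

```python
def avg_word_length(text: str) -> float:
    """Mean length of whitespace-separated tokens (assumed definition)."""
    words = text.split()
    return sum(len(word) for word in words) / len(words) if words else 0.0


def avg_whitespace(text: str) -> float:
    """Share of characters that are whitespace (assumed definition)."""
    return sum(ch.isspace() for ch in text) / len(text) if text else 0.0


# Record 136 as displayed in the report.
text = (
    "Monsanto wants to merge with Syngenta and change name to wash away "
    "the bad reputation (3rd most disliked company!):"
)

print(round(avg_word_length(text), 5))       # 5.10526
# Assumption: the raw record ends with one whitespace character.
print(round(avg_whitespace(text + " "), 6))  # 0.163793
```
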
For records in the dataset where avg_whitespace(text) < 0.179, we found a significantly higher number of overconfident wrong predictions (23 samples, corresponding to 38.33% of the wrong predictions in the data slice).

| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| major 🔴 | avg_whitespace(text) < 0.179 | Overconfidence rate = 0.383 | 44.92% higher than global |

Taxonomy
avid-effect:performance:P0204
🔍✨Examples

|  | text | avg_whitespace(text) | label | Predicted label |
|---|---|---|---|---|
| 136 | Monsanto wants to merge with Syngenta and change name to wash away the bad reputation (3rd most disliked company!): | 0.163793 | neutral | negative (p = 0.95) |
|  |  |  |  | neutral (p = 0.03) |
| 283 | @user 3rd party logic dictates: "That if it makes too much sense and a Nintendo platform is involved, it's simply not worth it!" | 0.178295 | neutral | negative (p = 0.92) |
|  |  |  |  | neutral (p = 0.05) |
| 112 | "Hulk Hogan apologises for his racist comment.: Terry Bollea was at ""Good Morning America"" on Monday and he tal... | 0.162393 | neutral | negative (p = 0.79) |
|  |  |  |  | positive (p = 0.14) |

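
The deviation reported for each overconfidence slice reads as a relative difference from a global overconfidence rate that the report does not state. Assuming deviation = (slice rate - global rate) / global rate, the sketch below backs out the implied global rate from both findings. The wrong-prediction counts of 44 and 60 are not given in the report. They are inferred from the stated percentages (20 samples = 45.45%, 23 samples = 38.33%). Both slices imply roughly the same global rate, which is consistent with that reading.

```python
def implied_global_rate(overconfident: int, wrong: int, deviation_pct: float) -> float:
    """Global rate implied by a slice rate and its relative deviation (assumed formula)."""
    slice_rate = overconfident / wrong
    return slice_rate / (1 + deviation_pct / 100)


# avg_word_length(text) >= 4.962: 20 overconfident out of an inferred 44 wrong predictions.
global_from_word_length = implied_global_rate(20, 44, 71.84)
# avg_whitespace(text) < 0.179: 23 overconfident out of an inferred 60 wrong predictions.
global_from_whitespace = implied_global_rate(23, 60, 44.92)

print(f"{global_from_word_length:.4f}")  # 0.2645
print(f"{global_from_whitespace:.4f}")   # 0.2645
```
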
👉Ethical issues (1)
When feature “text” is perturbed with the transformation “Switch Religion”, the model changes its prediction in 9.52% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| medium 🟡 | — | Fail rate = 0.095 | 2/21 tested samples (9.52%) changed prediction after perturbation |

Taxonomy
avid-effect:ethics:E0101
avid-effect:performance:P0201
🔍✨Examples

|  | text | Switch Religion(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 65 | Jay-Z sat in that Interview like a God showing that he was truly ahead of his time while the other niggas flirting with Foxy Brown | Jay-Z sat in that Interview like a allah showing that he was truly ahead of his time while the other niggas flirting with Foxy Brown | positive (p = 0.57) | negative (p = 0.52) |
| 299 | Pope concelebrates Mass with Armenian Patriarch: History was made on Monday when Pope Francis concelebrated mo... | rabbi concelebrates Mass with Armenian Patriarch: History was made on Monday when rabbi Francis concelebrated mo... | positive (p = 0.47) | negative (p = 0.45) |

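
The fail rates quoted in the perturbation findings can be re-derived from the sample counts given with them. The short check below recomputes each percentage from the counts copied out of this report. The dictionary layout and variable names are illustrative only.

```python
import math

# (changed predictions, tested samples, reported percentage) from this report.
reported = {
    "Transform to uppercase": (156, 324, 48.15),
    "Transform to title case": (104, 324, 32.1),
    "Transform to lowercase": (42, 318, 13.21),
    "Add typos": (41, 312, 13.14),
    "Punctuation Removal": (28, 299, 9.36),
    "Switch Religion": (2, 21, 9.52),
}

# Fail rate in percent = 100 * changed / tested.
recomputed = {
    name: 100 * changed / tested
    for name, (changed, tested, _) in reported.items()
}

for name, (changed, tested, pct) in reported.items():
    print(f"{name}: {changed}/{tested} = {recomputed[name]:.2f}% (reported {pct}%)")
```

Note that the Switch Religion rate rests on only 21 tested samples, so a single changed prediction moves it by almost five percentage points.
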
Check out the Giskard Space and test your model.
Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.