Report for cardiffnlp/twitter-roberta-base-sentiment-latest
#90 opened by giskard-bot
Hi Team,
This is a report from Giskard Bot Scan 🐢.
We have identified 8 potential vulnerabilities in your model based on an automated scan.
This automated analysis evaluated the model on the dataset tweet_eval (subset sentiment, split validation).
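For reference, a scan of this kind can be approximated locally with the Giskard Python library. The snippet below is a minimal sketch, assuming the `giskard.Model` / `giskard.Dataset` / `giskard.scan` API and the standard `transformers` pipeline; the wrapper function `predict_proba` and the label mapping are illustrative, not the original scan configuration.

```python
import pandas as pd
import giskard
from datasets import load_dataset
from transformers import pipeline

# Same data as the scan: tweet_eval, subset "sentiment", split "validation"
ds = load_dataset("tweet_eval", "sentiment", split="validation")
labels = ["negative", "neutral", "positive"]  # tweet_eval label ids 0/1/2
df = pd.DataFrame({"text": ds["text"],
                   "label": [labels[i] for i in ds["label"]]})

clf = pipeline("text-classification",
               model="cardiffnlp/twitter-roberta-base-sentiment-latest",
               top_k=None)  # return scores for all three classes

def predict_proba(batch: pd.DataFrame):
    # Re-order each prediction's per-class scores to match `labels`
    outputs = clf(batch["text"].tolist())
    return [[next(s["score"] for s in out if s["label"] == lab) for lab in labels]
            for out in outputs]

model = giskard.Model(model=predict_proba, model_type="classification",
                      classification_labels=labels, feature_names=["text"])
dataset = giskard.Dataset(df, target="label", column_types={"text": "text"})

report = giskard.scan(model, dataset)  # runs the automated detectors
report.to_html("scan_report.html")
```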
👉Robustness issues (5)
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | major 🔴 | — | Fail rate = 0.151 | Add typos | 151/1000 tested samples (15.1%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 15.1% of the cases. We expected the predictions not to be affected by this transformation. A quick spot-check sketch is shown after the examples below.

| | text | Add typos(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 1635 | "on Black Friday i always thought Kendrick said ""Coney Island!!"" but he says ""Can you Handle It"" lmfaooo #whyamistupid" | "on Nlack Friday o aways thought Kenddick said ""Coney Island!!"" bjut he says ""Can you Handle It"" lmfaooo #whyamistupid" | neutral (p = 0.46) | negative (p = 0.54) |
| 1254 | Hillary's campaign now reset for the 4th time. Adding humor and heart to a person that has #neither #sadtrombone | Hillarys campaign now reset for the 4th time. Adding humor and heart to a persoj that has #neither sadtrombone | negative (p = 0.62) | neutral (p = 0.41) |
| 129 | Those who criticised the way Tony Blair took the UK to war may reflect that the present PM expresses similar... | Those who criticised the way Tony Blair took the UK to war may reflect that the present PM expresses sumilar... | neutral (p = 0.51) | negative (p = 0.53) |
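This failure mode can be reproduced outside Giskard with a simple invariance check: perturb a tweet, re-run the model, and flag any label flip. The sketch below uses a crude adjacent-character swap as a stand-in for Giskard's actual "Add typos" transformation (the exact perturbation is an assumption); the same pattern applies to the case-change and punctuation-removal findings further down.

```python
import random
from transformers import pipeline

clf = pipeline("text-classification",
               model="cardiffnlp/twitter-roberta-base-sentiment-latest")

def add_typos(text: str, n: int = 3, seed: int = 0) -> str:
    """Swap two adjacent characters at n random positions (illustrative only)."""
    rng = random.Random(seed)
    chars = list(text)
    for _ in range(n):
        i = rng.randrange(len(chars) - 1)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

samples = [
    "Hillary's campaign now reset for the 4th time. Adding humor and heart "
    "to a person that has #neither #sadtrombone",
]
for text in samples:
    before = clf(text)[0]["label"]
    after = clf(add_typos(text))[0]["label"]
    flag = "  (prediction flipped)" if before != after else ""
    print(f"{before} -> {after}{flag}")
```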
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | major 🔴 | — | Fail rate = 0.147 | Transform to uppercase | 147/1000 tested samples (14.7%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 14.7% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 1666 | "If it ain't broke don't fix it, why move kris Bryant up to 3rd when he's hitting as good as he has all season at 5" | "IF IT AIN'T BROKE DON'T FIX IT, WHY MOVE KRIS BRYANT UP TO 3RD WHEN HE'S HITTING AS GOOD AS HE HAS ALL SEASON AT 5" | neutral (p = 0.65) | negative (p = 0.77) |
| 680 | @user can you please make Big Brother available at its normal time next Thursday (online or on another channel)? Thank you. | @USER CAN YOU PLEASE MAKE BIG BROTHER AVAILABLE AT ITS NORMAL TIME NEXT THURSDAY (ONLINE OR ON ANOTHER CHANNEL)? THANK YOU. | neutral (p = 0.55) | positive (p = 0.80) |
| 1092 | @user @user @user Their release should have been demanded before Kerry ever sat down at the table. | @USER @USER @USER THEIR RELEASE SHOULD HAVE BEEN DEMANDED BEFORE KERRY EVER SAT DOWN AT THE TABLE. | negative (p = 0.61) | neutral (p = 0.56) |
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | medium 🟡 | — | Fail rate = 0.092 | Transform to title case | 92/1000 tested samples (9.2%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 9.2% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Transform to title case(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 1242 | the most important thing madonna has ever said is " don't go for 2nd best " | The Most Important Thing Madonna Has Ever Said Is " Don'T Go For 2Nd Best " | neutral (p = 0.49) | positive (p = 0.53) |
| 1636 | @user They're actually going venue shopping tomorrow! They're checking out Grand Bend and surrounding areas (ie. St. Mary's)! | @User They'Re Actually Going Venue Shopping Tomorrow! They'Re Checking Out Grand Bend And Surrounding Areas (Ie. St. Mary'S)! | positive (p = 0.63) | neutral (p = 0.75) |
| 904 | "James: Big Brother, if she (Meg) leaves tomorrow, I'm not going to have anyone to aggravate. #BB17 | "James: Big Brother, If She (Meg) Leaves Tomorrow, I'M Not Going To Have Anyone To Aggravate. #Bb17 | negative (p = 0.51) | neutral (p = 0.56) |
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | medium 🟡 | — | Fail rate = 0.082 | Punctuation Removal | 82/1000 tested samples (8.2%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 8.2% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 1489 | Curtis Painter...we have a chance again! Can't believe Kerry Collins didn't throw us a pick-six tonight | Curtis Painter we have a chance again Can t believe Kerry Collins didn t throw us a pick six tonight | positive (p = 0.69) | neutral (p = 0.53) |
| 1339 | "i got lots of tweets asking for shoutouts to Niall, if i think about it i will give shoutouts to Niall when i get back from work TOMORROW!!" | i got lots of tweets asking for shoutouts to Niall if i think about it i will give shoutouts to Niall when i get back from work TOMORROW | positive (p = 0.69) | neutral (p = 0.54) |
| 1952 | @user @user Yellow journalism. But you know? This may be Harper's Waterloo | @user @user Yellow journalism But you know This may be Harper s Waterloo | negative (p = 0.56) | neutral (p = 0.67) |
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | medium 🟡 | — | Fail rate = 0.052 | Transform to lowercase | 52/1000 tested samples (5.2%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Transform to lowercase”, the model changes its prediction in 5.2% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Transform to lowercase(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 77 | @user seriously! itunes puts like an entire minute as a preview so 20 seconds is nothing. AND I KNOW! it needs to be monday ASAP! | @user seriously! itunes puts like an entire minute as a preview so 20 seconds is nothing. and i know! it needs to be monday asap! | negative (p = 0.46) | neutral (p = 0.48) |
| 756 | NIKE EMPLOYEE'S: If anyone want to work tomorrow at 5am call!!!!!!!!!!!!!!!!!! | nike employee's: if anyone want to work tomorrow at 5am call!!!!!!!!!!!!!!!!!! | positive (p = 0.56) | neutral (p = 0.60) |
| 950 | The Craft Awards are happening next week on October 4th at the Gladstone Hotel! Invite all your friends and get... | the craft awards are happening next week on october 4th at the gladstone hotel! invite all your friends and get... | neutral (p = 0.51) | positive (p = 0.64) |
👉Performance issues (1)
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Performance | medium 🟡 | text contains "like" | Precision = 0.726 | — | 5.94% lower than global |
🔍✨Examples
For records in the dataset where `text` contains "like", the Precision is 5.94% lower than the global Precision. A quick way to recompute this slice metric is sketched after the examples below.

| | text | label | Predicted label |
|---|---|---|---|
| 17 | Why do y'all want Nicki to be pregnant so bad like maybe around the 7th album but she's literally still in her prime. | neutral | negative (p = 0.60) |
| 30 | Nicki did that for white media Idgaf . Nicki may act like she don't give af but she cares what the media thinks | positive | neutral (p = 0.50) |
| 77 | @user seriously! itunes puts like an entire minute as a preview so 20 seconds is nothing. AND I KNOW! it needs to be monday ASAP! | neutral | negative (p = 0.46) |
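This finding can be sanity-checked by recomputing precision on the slice versus the full split. The helper below is a rough sketch: it assumes macro-averaged precision (the report does not state which averaging Giskard uses) and that `texts`, `y_true`, and `y_pred` hold the validation tweets, gold labels, and model predictions.

```python
import numpy as np
from sklearn.metrics import precision_score

def slice_precision(texts, y_true, y_pred, keyword="like"):
    """Compare precision on tweets containing `keyword` with global precision."""
    mask = np.array([keyword in t.lower() for t in texts])
    global_p = precision_score(y_true, y_pred, average="macro")
    slice_p = precision_score(np.asarray(y_true)[mask], np.asarray(y_pred)[mask],
                              average="macro")
    return global_p, slice_p

# Example usage (texts / y_true / y_pred come from the validation split):
# global_p, slice_p = slice_precision(texts, y_true, y_pred)
# print(f"global={global_p:.3f}  slice={slice_p:.3f}  delta={slice_p - global_p:+.2%}")
```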
👉Ethical issues (2)
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Ethical | medium 🟡 | — | Fail rate = 0.071 | Switch Religion | 6/85 tested samples (7.06%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Switch Religion”, the model changes its prediction in 7.06% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Switch Religion(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 178 | Pope's faster annulment plan may not mean as much in the US. @user | imam's faster annulment plan may not mean as much in the US. @user | neutral (p = 0.52) | negative (p = 0.51) |
| 298 | @user I love Israel. Love the Jews. So I may make a terrible Nazi. :( @user @user @user | @user I love Israel. Love the muslims. So I may make a terrible Nazi. :( @user @user @user | negative (p = 0.52) | positive (p = 0.41) |
| 533 | yo don't ever say that! god forbid! may it not happen! Zayn is cool...don't even try to compare them...i love zaynnn | yo don't ever say that! allah forbid! may it not happen! Zayn is cool...don't even try to compare them...i love zaynnn | neutral (p = 0.35) | positive (p = 0.51) |
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Ethical | medium 🟡 | — | Fail rate = 0.050 | Switch Gender | 21/418 tested samples (5.02%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Switch Gender”, the model changes its prediction in 5.02% of the cases. We expected the predictions not to be affected by this transformation. A counterfactual spot-check sketch is shown after the examples below.

| | text | Switch Gender(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 40 | Look #Steelers fans I know you may be upset about Suisham missing that kick. Just know that I heard a guy named Billy Cundiff is available. | Look #Steelers fans I know you may be upset about Suisham missing that kick. Just know that I heard a gal named Billy Cundiff is available. | neutral (p = 0.50) | negative (p = 0.48) |
| 139 | I should probs just kiss him cause we are gonna hang out tomorrow #MTVStars Lady Gaga | I should probs just kiss her cause we are gonna hang out tomorrow #MTVStars lord Gaga | positive (p = 0.54) | neutral (p = 0.49) |
| 343 | Big Brother starting next Friday? At the end of this morning @user slipped up & said 'don't cause you'll get me sacked before Friday night | Big sister starting next Friday? At the end of this morning @user slipped up & said 'don't cause you'll get me sacked before Friday night | negative (p = 0.55) | neutral (p = 0.56) |
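As with the robustness findings, these ethical findings can be spot-checked with a counterfactual substitution: swap gendered (or religious) terms and verify the predicted sentiment is unchanged. The term list below is illustrative and much smaller than whatever Giskard's "Switch Gender" transformation actually uses.

```python
import re
from transformers import pipeline

clf = pipeline("text-classification",
               model="cardiffnlp/twitter-roberta-base-sentiment-latest")

# Illustrative one-way swaps; not Giskard's actual term list
SWAPS = {"he": "she", "she": "he", "him": "her", "her": "him", "his": "her",
         "guy": "gal", "gal": "guy", "man": "woman", "woman": "man",
         "brother": "sister", "sister": "brother"}

def switch_gender(text: str) -> str:
    def repl(match):
        word = match.group(0)
        swapped = SWAPS[word.lower()]
        return swapped.capitalize() if word[0].isupper() else swapped
    pattern = r"\b(" + "|".join(SWAPS) + r")\b"
    return re.sub(pattern, repl, text, flags=re.IGNORECASE)

text = "I should probs just kiss him cause we are gonna hang out tomorrow"
original, perturbed = clf(text)[0], clf(switch_gender(text))[0]
print(original["label"], "->", perturbed["label"])
```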
Check out the Giskard Space and test your model.
Disclaimer: automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess their impact accordingly.