👉Robustness issues (1)
When feature “Name” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 6.67% of the cases. We expected the predictions not to be affected by this transformation.
| Level | Metric | Transformation | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | Fail rate = 0.067 | Transform to title case | 1/15 tested samples (6.67%) changed prediction after perturbation |
Taxonomy
avid-effect:performance:P0201
🔍✨Examples
| | Name | Transform to title case(Name) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 505 | Penasco y Castellana, Mr. Victor de Satode | Penasco Y Castellana, Mr. Victor De Satode | yes (p = 0.50) | no (p = 0.51) |
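The fail rate reported above is the share of tested samples whose predicted label changes once the perturbation is applied. The following is a minimal sketch of that calculation, not the scanner's own implementation: `toy_predict` is a hypothetical stand-in for the real classifier, and the three sample names are taken from the examples in this report.

```python
def title_case(text: str) -> str:
    """Perturbation: capitalise the first letter of every word."""
    return text.title()


def fail_rate(predict, samples, perturb) -> float:
    """Fraction of samples whose predicted label changes after perturbation."""
    changed = sum(1 for s in samples if predict(s) != predict(perturb(s)))
    return changed / len(samples)


def toy_predict(name: str) -> str:
    # Hypothetical, deliberately case-sensitive model used only for illustration.
    return "no" if " De " in name else "yes"


samples = [
    "Penasco y Castellana, Mr. Victor de Satode",  # row 505 in the report
    "Chip, Mr. Chang",
    "Stranden, Mr. Juho",
]

toy_rate = fail_rate(toy_predict, samples, title_case)

# The report's own figure: 1 changed prediction out of 15 tested samples.
reported_rate = 1 / 15
print(f"toy fail rate = {toy_rate:.3f}, reported fail rate = {reported_rate:.3f}")
```

Only the first name changes under title casing ("y" and "de" become "Y" and "De"), which is why a model that is sensitive to letter case can flip a borderline prediction such as the one in row 505.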
👉Overconfidence issues (6)
For records in the dataset where Name contains "mr", we found a significantly higher number of overconfident wrong predictions (31 samples, corresponding to 62.00% of the wrong predictions in the data slice).
| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| major 🔴 | Name contains "mr" | Overconfidence rate = 0.620 | +59.19% vs. global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
| | Name | Survived | Predicted Survived |
| --- | --- | --- | --- |
| 838 | Chip, Mr. Chang | yes | no (p = 0.96) / yes (p = 0.04) |
| 744 | Stranden, Mr. Juho | yes | no (p = 0.96) / yes (p = 0.04) |
| 429 | Pickard, Mr. Berk (Berk Trembisky) | yes | no (p = 0.96) / yes (p = 0.04) |
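The overconfidence rate above is the share of wrong predictions in a slice that the model made with high confidence. The sketch below illustrates one plausible way to compute it. It assumes a wrong prediction counts as overconfident when the probability of the predicted label exceeds the probability of the true label by more than a threshold. The `threshold` value of 0.5 is a hypothetical choice, not the scanner's setting, and the probability of the true label for row 483 is inferred as the complement of the reported probability (binary target).

```python
def overconfidence_rate(records, threshold=0.5):
    """Share of wrong predictions whose confidence gap exceeds `threshold`.

    The gap is p(predicted label) - p(true label). The default threshold
    is an assumption made for this illustration.
    """
    wrong = [r for r in records if r["predicted"] != r["actual"]]
    if not wrong:
        return 0.0
    overconfident = [r for r in wrong if r["p_predicted"] - r["p_actual"] > threshold]
    return len(overconfident) / len(wrong)


# Wrong predictions taken from the examples in this report.
records = [
    {"id": 838, "actual": "yes", "predicted": "no", "p_predicted": 0.96, "p_actual": 0.04},
    {"id": 744, "actual": "yes", "predicted": "no", "p_predicted": 0.96, "p_actual": 0.04},
    {"id": 429, "actual": "yes", "predicted": "no", "p_predicted": 0.96, "p_actual": 0.04},
    # Row 483: p_actual inferred as 1 - 0.64, assuming a binary classifier.
    {"id": 483, "actual": "yes", "predicted": "no", "p_predicted": 0.64, "p_actual": 0.36},
]

toy_rate = overconfidence_rate(records)

# Consistency check on the reported figure: 31 overconfident samples at a
# rate of 62.00% implies 50 wrong predictions in the slice.
implied_wrong = round(31 / 0.62)
print(f"toy rate = {toy_rate:.2f}, implied wrong predictions in slice = {implied_wrong}")
```

Under these assumptions, rows 838, 744 and 429 have a gap of 0.92 and count as overconfident, while row 483 has a gap of 0.28 and does not.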
For records in the dataset where text_length(Name) < 28.500, we found a significantly higher number of overconfident wrong predictions (33 samples, corresponding to 55.93% of the wrong predictions in the data slice).
| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| major 🔴 | text_length(Name) < 28.500 | Overconfidence rate = 0.559 | +43.61% vs. global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
| | Name | text_length(Name) | Survived | Predicted Survived |
| --- | --- | --- | --- | --- |
| 838 | Chip, Mr. Chang | 15 | yes | no (p = 0.96) / yes (p = 0.04) |
| 744 | Stranden, Mr. Juho | 18 | yes | no (p = 0.96) / yes (p = 0.04) |
| 643 | Foo, Mr. Choong | 15 | yes | no (p = 0.95) / yes (p = 0.05) |
For records in the dataset where Fare < 14.850, we found a significantly higher number of overconfident wrong predictions (22 samples, corresponding to 53.66% of the wrong predictions in the data slice).
| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| major 🔴 | Fare < 14.850 | Overconfidence rate = 0.537 | +37.77% vs. global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
| | Fare | Survived | Predicted Survived |
| --- | --- | --- | --- |
| 744 | 7.925 | yes | no (p = 0.96) / yes (p = 0.04) |
| 429 | 8.05 | yes | no (p = 0.96) / yes (p = 0.04) |
| 338 | 8.05 | yes | no (p = 0.95) / yes (p = 0.05) |
For records in the dataset where Sex == "male", we found a significantly higher number of overconfident wrong predictions (32 samples, corresponding to 53.33% of the wrong predictions in the data slice).
| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| major 🔴 | Sex == "male" | Overconfidence rate = 0.533 | +36.94% vs. global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
| | Sex | Survived | Predicted Survived |
| --- | --- | --- | --- |
| 838 | male | yes | no (p = 0.96) / yes (p = 0.04) |
| 744 | male | yes | no (p = 0.96) / yes (p = 0.04) |
| 429 | male | yes | no (p = 0.96) / yes (p = 0.04) |
For records in the dataset where Parch == 0, we found a significantly higher number of overconfident wrong predictions (31 samples, corresponding to 47.69% of the wrong predictions in the data slice).
| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| major 🔴 | Parch == 0 | Overconfidence rate = 0.477 | +22.45% vs. global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
| | Parch | Survived | Predicted Survived |
| --- | --- | --- | --- |
| 838 | 0 | yes | no (p = 0.96) / yes (p = 0.04) |
| 744 | 0 | yes | no (p = 0.96) / yes (p = 0.04) |
| 429 | 0 | yes | no (p = 0.96) / yes (p = 0.04) |
For records in the dataset where SibSp == 0, we found a significantly higher number of overconfident wrong predictions (27 samples, corresponding to 44.26% of the wrong predictions in the data slice).
| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | SibSp == 0 | Overconfidence rate = 0.443 | +13.65% vs. global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
| | SibSp | Survived | Predicted Survived |
| --- | --- | --- | --- |
| 838 | 0 | yes | no (p = 0.96) / yes (p = 0.04) |
| 744 | 0 | yes | no (p = 0.96) / yes (p = 0.04) |
| 429 | 0 | yes | no (p = 0.96) / yes (p = 0.04) |
👉Spurious Correlation issues (3)
Data slice Sex == "female" appears to be highly associated with the prediction Survived = yes (92.67% of predictions in the data slice).
| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| minor 🟡 | Sex == "female" | Nominal association (Theil's U) = 0.697 | Prediction Survived = yes for 92.67% of samples in the slice |
Taxonomy
avid-effect:performance:P0103
🔍✨Examples
| | Sex | Survived | Predicted Survived |
| --- | --- | --- | --- |
| 123 | female | yes | yes (p = 0.72) |
| 412 | female | yes | yes (p = 0.91) |
| 849 | female | yes | yes (p = 0.92) |
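Theil's U (the uncertainty coefficient) measures how much knowing one categorical variable reduces the uncertainty about another: U(Y|X) = (H(Y) - H(Y|X)) / H(Y), ranging from 0 (no association) to 1 (Y fully determined by X). The sketch below computes it from scratch on toy data. Which variable the scanner conditions on, and how it encodes the slice, are not stated in this report, so treating the feature as X and the prediction as Y is an assumption.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (natural log) of a sequence of categorical values."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def conditional_entropy(y, x):
    """H(Y|X): entropy of y within each group of x, weighted by group size."""
    n = len(y)
    total = 0.0
    for x_value, count in Counter(x).items():
        y_in_group = [yi for yi, xi in zip(y, x) if xi == x_value]
        total += (count / n) * entropy(y_in_group)
    return total


def theils_u(y, x):
    """Uncertainty coefficient U(Y|X); returns 1.0 when y is constant."""
    h_y = entropy(y)
    if h_y == 0:
        return 1.0
    return (h_y - conditional_entropy(y, x)) / h_y


# Toy data (hypothetical): the prediction is fully determined by sex ...
sex = ["male", "male", "female", "female"]
fully_associated = theils_u(["no", "no", "yes", "yes"], sex)

# ... versus a prediction that is independent of sex.
independent = theils_u(["no", "yes", "no", "yes"], sex)

print(f"fully associated: {fully_associated:.3f}, independent: {independent:.3f}")
```

On this scale, the reported value of 0.697 indicates that the feature removes most, but not all, of the uncertainty about the predicted label.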
Data slice Sex == "male" appears to be highly associated with the prediction Survived = no (96.28% of predictions in the data slice).
| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| minor 🟡 | Sex == "male" | Nominal association (Theil's U) = 0.697 | Prediction Survived = no for 96.28% of samples in the slice |
Taxonomy
avid-effect:performance:P0103
🔍✨Examples
| | Sex | Survived | Predicted Survived |
| --- | --- | --- | --- |
| 714 | male | no | no (p = 0.94) |
| 81 | male | yes | no (p = 0.95) |
| 555 | male | no | no (p = 0.80) |
Data slice Name contains "mr" appears to be highly associated with the prediction Survived = no (98.48% of predictions in the data slice).
| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| minor 🟡 | Name contains "mr" | Nominal association (Theil's U) = 0.609 | Prediction Survived = no for 98.48% of samples in the slice |
Taxonomy
avid-effect:performance:P0103
🔍✨Examples
| | Name | Survived | Predicted Survived |
| --- | --- | --- | --- |
| 714 | Greenberg, Mr. Samuel | no | no (p = 0.94) |
| 81 | Sheerlinck, Mr. Jan Baptist | yes | no (p = 0.95) |
| 555 | Wright, Mr. George | no | no (p = 0.80) |
👉Performance issues (10)
For records in the dataset where Name contains "mr", the Recall is 96.85% lower than the global Recall.
| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| major 🔴 | Name contains "mr" | Recall = 0.021 | -96.85% vs. global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
| | Name | Survived | Predicted Survived |
| --- | --- | --- | --- |
| 81 | Sheerlinck, Mr. Jan Baptist | yes | no (p = 0.95) |
| 543 | Beane, Mr. Edward | yes | no (p = 0.87) |
| 390 | Carter, Mr. William Ernest | yes | no (p = 0.77) |
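The deviations in this section read as relative differences between the slice metric and the global metric, that is (slice - global) / global. The report does not list the global values, so the sketch below back-computes them from the reported slice value and deviation. The results are estimates limited by the rounding of the published figures, and the helper names are illustrative only.

```python
def relative_deviation(slice_metric: float, global_metric: float) -> float:
    """Relative difference of a slice metric with respect to the global metric."""
    return (slice_metric - global_metric) / global_metric


def implied_global(slice_metric: float, deviation: float) -> float:
    """Invert relative_deviation: global = slice / (1 + deviation)."""
    return slice_metric / (1.0 + deviation)


# (slice Recall, reported deviation) pairs taken from this report.
recall_findings = {
    'Sex == "male"': (0.111, -0.8319),
    "Parch == 0": (0.600, -0.0920),
    'Embarked == "S"': (0.611, -0.0752),
}

implied = {
    name: implied_global(value, deviation)
    for name, (value, deviation) in recall_findings.items()
}

for name, estimate in implied.items():
    print(f"{name}: implied global Recall of about {estimate:.3f}")
```

The three Recall findings agree on a global Recall of roughly 0.66, which supports reading the "vs. global" column as a relative rather than an absolute difference.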
For records in the dataset where Sex == "male", the Recall is 83.19% lower than the global Recall.
| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| major 🔴 | Sex == "male" | Recall = 0.111 | -83.19% vs. global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
| | Sex | Survived | Predicted Survived |
| --- | --- | --- | --- |
| 81 | male | yes | no (p = 0.95) |
| 125 | male | yes | no (p = 0.76) |
| 543 | male | yes | no (p = 0.87) |
For records in the dataset where Pclass == 3, the Precision is 36.89% lower than the global Precision.
| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| major 🔴 | Pclass == 3 | Precision = 0.475 | -36.89% vs. global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
| | Pclass | Survived | Predicted Survived |
| --- | --- | --- | --- |
| 81 | 3 | yes | no (p = 0.95) |
| 125 | 3 | yes | no (p = 0.76) |
| 483 | 3 | yes | no (p = 0.64) |
For records in the dataset where Name contains "master", the Accuracy is 10.00% lower than the global Accuracy.
| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | Name contains "master" | Accuracy = 0.708 | -10.00% vs. global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
| | Name | Survived | Predicted Survived |
| --- | --- | --- | --- |
| 125 | Nicola-Yarred, Master. Elias | yes | no (p = 0.76) |
| 348 | Coutts, Master. William Loch "William" | yes | no (p = 0.61) |
| 869 | Johnson, Master. Harold Theodor | yes | no (p = 0.56) |
For records in the dataset where Parch == 0, the Recall is 9.20% lower than the global Recall.
| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | Parch == 0 | Recall = 0.600 | -9.20% vs. global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
| | Parch | Survived | Predicted Survived |
| --- | --- | --- | --- |
| 81 | 0 | yes | no (p = 0.95) |
| 125 | 0 | yes | no (p = 0.76) |
| 543 | 0 | yes | no (p = 0.87) |
For records in the dataset where Parch == 2, the Precision is 8.10% lower than the global Precision.
| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | Parch == 2 | Precision = 0.692 | -8.10% vs. global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
| | Parch | Survived | Predicted Survived |
| --- | --- | --- | --- |
| 436 | 2 | no | yes (p = 0.58) |
| 390 | 2 | yes | no (p = 0.77) |
| 593 | 2 | no | yes (p = 0.60) |
For records in the dataset where Embarked == "S", the Recall is 7.52% lower than the global Recall.
| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | Embarked == "S" | Recall = 0.611 | -7.52% vs. global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
| | Embarked | Survived | Predicted Survived |
| --- | --- | --- | --- |
| 81 | S | yes | no (p = 0.95) |
| 543 | S | yes | no (p = 0.87) |
| 483 | S | yes | no (p = 0.64) |
For records in the dataset where Pclass == 1, the Accuracy is 6.82% lower than the global Accuracy.
| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | Pclass == 1 | Accuracy = 0.733 | -6.82% vs. global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
| | Pclass | Survived | Predicted Survived |
| --- | --- | --- | --- |
| 390 | 1 | yes | no (p = 0.77) |
| 740 | 1 | yes | no (p = 0.63) |
| 701 | 1 | yes | no (p = 0.70) |
For records in the dataset where Name contains "miss", the Accuracy is 6.31% lower than the global Accuracy.
| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | Name contains "miss" | Accuracy = 0.737 | -6.31% vs. global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
| | Name | Survived | Predicted Survived |
| --- | --- | --- | --- |
| 882 | Dahlberg, Miss. Gerda Ulrika | no | yes (p = 0.55) |
| 436 | Ford, Miss. Doolina Margaret "Daisy" | no | yes (p = 0.58) |
| 205 | Strom, Miss. Telma Matilda | no | yes (p = 0.83) |
For records in the dataset where Embarked == "Q", the Precision is 5.18% lower than the global Precision.
| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | Embarked == "Q" | Precision = 0.714 | -5.18% vs. global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
| | Embarked | Survived | Predicted Survived |
| --- | --- | --- | --- |
| 593 | Q | no | yes (p = 0.60) |
| 657 | Q | no | yes (p = 0.75) |
| 885 | Q | no | yes (p = 0.62) |