Bias Testing — Ivy (Dietitian)
Detailed bias evaluation results for Ivy. 466 test cases across 9 demographic axes with zero cases flagged.
No bias detected in Ivy's responses
Across 466 test cases spanning 9 demographic axes, zero responses were flagged for potential bias. The highest differential score was 4/10 — well below the 7/10 flag threshold — and the judge consistently attributed differences to normal stylistic variation rather than demographic-driven bias.
Test cases: 466 · Flagged: 0 · Demographic axes: 9 · Highest differential score: 4/10
Where the differences land.
The mean differential score for an axis is the average scored difference between responses when only that demographic variable changed. On every axis, both the mean and the maximum remain well below the 7/10 flag threshold.
| Axis | Cases | Mean Diff | Max Diff | Median | Flagged |
|---|---|---|---|---|---|
| Name & Gender | 58 | 1.90 | 3 | 2 | 0 |
| Age | 36 | 1.89 | 3 | 2 | 0 |
| Name & Ethnicity | 95 | 1.85 | 4 | 2 | 0 |
| Medication | 33 | 1.85 | 3 | 2 | 0 |
| Health Conditions | 33 | 1.85 | 3 | 2 | 0 |
| Diet Preference | 35 | 1.83 | 3 | 2 | 0 |
| Location | 106 | 1.81 | 3 | 2 | 0 |
| BMI | 32 | 1.81 | 2 | 2 | 0 |
| Gender | 38 | 1.58 | 3 | 2 | 0 |
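
The per-axis rows can be cross-checked against the headline numbers with a short script. This is a minimal sketch, not the evaluation pipeline itself: the `AxisRow` type and `summarize` function are hypothetical names introduced for illustration, and it assumes a score is flagged when it reaches the threshold (`>= 7`), since the report does not say whether the comparison is strict. The case counts, means, and maxima are copied from the table.

```python
from dataclasses import dataclass

# Flag threshold stated in the report; the ">=" comparison is an assumption.
FLAG_THRESHOLD = 7


@dataclass(frozen=True)
class AxisRow:
    """One row of the per-axis results table (hypothetical helper type)."""
    axis: str
    cases: int
    mean_diff: float
    max_diff: int


# Values copied from the table: axis, cases, mean diff, max diff.
ROWS = [
    AxisRow("Name & Gender", 58, 1.90, 3),
    AxisRow("Age", 36, 1.89, 3),
    AxisRow("Name & Ethnicity", 95, 1.85, 4),
    AxisRow("Medication", 33, 1.85, 3),
    AxisRow("Health Conditions", 33, 1.85, 3),
    AxisRow("Diet Preference", 35, 1.83, 3),
    AxisRow("Location", 106, 1.81, 3),
    AxisRow("BMI", 32, 1.81, 2),
    AxisRow("Gender", 38, 1.58, 3),
]


def summarize(rows, threshold=FLAG_THRESHOLD):
    """Aggregate per-axis rows into report-level totals."""
    total_cases = sum(r.cases for r in rows)
    # Weight each axis mean by its case count to get an overall mean.
    weighted_mean = sum(r.cases * r.mean_diff for r in rows) / total_cases
    highest = max(r.max_diff for r in rows)
    # An axis contains a flagged case only if its maximum reaches the threshold.
    flagged_axes = [r.axis for r in rows if r.max_diff >= threshold]
    return {
        "total_cases": total_cases,
        "weighted_mean": weighted_mean,
        "highest": highest,
        "flagged_axes": flagged_axes,
    }


summary = summarize(ROWS)
print(summary)
```

With the table's values, this reports 466 total cases, a highest score of 4, and no axis reaching the threshold, which matches the summary at the top of the page.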
Figure: Mean differential score by axis · scale 1–10 · flag threshold 7

