Bias Testing — Luna (Companion)
Detailed bias evaluation results for Luna. 262 test cases across 9 demographic axes with zero cases flagged.
No bias detected in Luna's responses
Across 262 test cases spanning 9 demographic axes, zero responses were flagged for potential bias. The highest differential score was 4/10 — well below the 7/10 flag threshold — and the judge consistently attributed differences to normal stylistic variation rather than demographic-driven bias.
262 test cases · 0 flagged · 9 demographic axes · 4/10 highest differential score
Where the differences land.
The mean differential score for an axis is the average scored difference between responses when only that demographic variable changed. No axis has a maximum score above 4, well below the 7/10 flag threshold.
| Axis | Cases | Mean Diff | Max Diff | Median Diff | Flagged |
|---|---|---|---|---|---|
| Location | 58 | 2.12 | 4 | 2 | 0 |
| Name & Ethnicity | 48 | 2.06 | 3 | 2 | 0 |
| Name & Gender | 39 | 1.95 | 3 | 2 | 0 |
| Age | 16 | 1.06 | 2 | 1 | 0 |
| Health Conditions | 18 | 1.00 | 1 | 1 | 0 |
| BMI | 20 | 1.00 | 1 | 1 | 0 |
| Gender | 24 | 1.00 | 1 | 1 | 0 |
| Diet Preference | 26 | 1.00 | 1 | 1 | 0 |
| Medication | 13 | 1.00 | 1 | 1 | 0 |
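Each row of the table can be read as a summary of the per-case differential scores for one axis. The following is a minimal sketch of that aggregation, not the evaluation's actual pipeline. It assumes each case yields one integer score on the 1–10 scale and that a case counts as flagged when its score reaches 7 (the report does not say whether the comparison is inclusive). The function name `summarize_axis` and the sample scores are hypothetical.

```python
from statistics import mean, median

FLAG_THRESHOLD = 7  # from the report; treating it as inclusive is an assumption


def summarize_axis(scores, threshold=FLAG_THRESHOLD):
    """Summarize per-case differential scores (1-10) for one demographic axis."""
    if not scores:
        raise ValueError("an axis needs at least one scored case")
    return {
        "cases": len(scores),
        "mean_diff": round(mean(scores), 2),
        "max_diff": max(scores),
        "median": median(scores),
        # Count the cases whose score reaches the flag threshold.
        "flagged": sum(1 for s in scores if s >= threshold),
    }


# Hypothetical scores for illustration only; not data from the evaluation.
example_scores = [1, 2, 2, 3, 2]
summary = summarize_axis(example_scores)
print(summary)
```

Reporting the maximum alongside the mean matters here: a single high-scoring case would trip the flag even if the axis mean stayed low.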
Chart: Mean differential score by axis · scale 1–10 · flag threshold 7
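The headline figures can be cross-checked against the per-axis table. The sketch below copies the table's rows into Python and re-derives the totals. The variable names are illustrative, and the flag check assumes a case would be flagged when its score reaches 7.

```python
# Rows copied from the per-axis table: (axis, cases, mean diff, max diff).
ROWS = [
    ("Location", 58, 2.12, 4),
    ("Name & Ethnicity", 48, 2.06, 3),
    ("Name & Gender", 39, 1.95, 3),
    ("Age", 16, 1.06, 2),
    ("Health Conditions", 18, 1.00, 1),
    ("BMI", 20, 1.00, 1),
    ("Gender", 24, 1.00, 1),
    ("Diet Preference", 26, 1.00, 1),
    ("Medication", 13, 1.00, 1),
]
FLAG_THRESHOLD = 7  # inclusive comparison is an assumption

total_cases = sum(cases for _, cases, _, _ in ROWS)
highest_diff = max(max_diff for _, _, _, max_diff in ROWS)
# Axes whose worst case would reach the flag threshold.
axes_at_threshold = [
    axis for axis, _, _, max_diff in ROWS if max_diff >= FLAG_THRESHOLD
]

print(f"{len(ROWS)} axes, {total_cases} cases, highest differential {highest_diff}/10")
print("axes with a flaggable case:", axes_at_threshold or "none")
```

With the table's values this reports 9 axes, 262 cases, and a highest differential of 4/10, with no axis reaching the threshold, which matches the summary at the top of the page.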

