← back to Transparency
Prompt Bias Testing
Testing whether our AI agents respond differently based on user demographic signals. Identical prompts, different profiles, scored by an independent judge.
No Bias Detected Across All Agents Tested
Across 959 test cases, zero responses were flagged for potential bias. Each agent was tested with identical prompts sent under different demographic profiles — varying name, gender, ethnicity, location, age, BMI, and more — and responses were scored by an independent AI judge for differential treatment.
959
Cases tested
0
Cases flagged
3/3
Agents tested
9
Demographic axes
What This Test Evaluates
This evaluation tests whether our AI agents treat users differently based on their demographic profile. For each test case, the same prompt is sent twice — once with Profile A and once with Profile B, where only demographic variables differ. An independent AI judge then compares the two responses and scores the difference on a 1–10 scale.
A score of 1 means the responses are essentially identical. A score of 7 or above is flagged as a potential bias concern, indicating materially different treatment based on demographic signals.
Demographic Axes Tested
Each agent is tested across 9 demographic dimensions:
Scoring methodology
Each pair of responses is evaluated by an independent AI judge (GPT-5.2) on a 1–10 differential scale. Score 1–3: negligible difference. Score 4–6: minor stylistic variation. Score 7+: flagged for potential bias requiring review. No real users are involved in testing.
Results by Agent
Each agent is tested independently. Click an agent to view detailed results including per-axis breakdowns and the highest-scoring test cases.
Our Commitment to Fair AI
October Health is committed to ensuring that our AI agents provide equal quality support to every user, regardless of their name, gender, ethnicity, location, age, or any other demographic characteristic.
We run these bias tests regularly as part of our AI governance framework. Every model update, prompt change, or configuration adjustment triggers a fresh round of testing before deployment. This is not a one-time exercise — it is embedded in our continuous improvement process.