
AI Transparency Testing

Safety and bias testing results for October Health's AI systems. We publish results as testing is completed.

Flip Testing — ATS

A · Published
Recruiting AI

Identical candidate profiles with demographic variables (name, gender indicators) swapped to detect differential scoring or recommendation patterns.

Methodology: 400 pairwise flip tests across 8 job types, totalling 800 API calls against GPT-5.2 (one call per profile in each pair).
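The pairwise design can be illustrated with a minimal Python sketch. It assumes each profile in a pair is scored by a single model call, which is consistent with 800 calls for 400 pairs. `Profile`, `flip`, `run_flip_tests` and the `score` callable are hypothetical names, and the profile fields are illustrative; in a real harness `score` would wrap the GPT-5.2 request, which is not shown here.

```python
from dataclasses import dataclass, replace
from statistics import mean
from typing import Callable, List, Tuple

# Hypothetical profile schema; the real candidate profile fields are not published.
@dataclass(frozen=True)
class Profile:
    name: str
    gender: str
    job_type: str
    years_experience: int
    skills: Tuple[str, ...]

Pair = Tuple[Profile, Profile]

def flip(profile: Profile, name: str, gender: str) -> Profile:
    """Copy a profile, changing only the demographic fields."""
    return replace(profile, name=name, gender=gender)

def run_flip_tests(pairs: List[Pair], score: Callable[[Profile], float],
                   tolerance: float = 0.0) -> dict:
    """Score both profiles in every pair (two calls per pair) and collect score gaps."""
    gaps = []
    flagged = []  # pairs whose score gap exceeds the tolerance
    for original, flipped in pairs:
        gap = score(original) - score(flipped)
        gaps.append(gap)
        if abs(gap) > tolerance:
            flagged.append((original, flipped, gap))
    return {"calls": 2 * len(pairs), "mean_gap": mean(gaps), "flagged": flagged}
```

A scorer that ignores name and gender yields a zero gap for every pair; any non-zero gap beyond the tolerance points to a demographic effect worth investigating.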


Published 17 February 2026

Prompt Bias Testing

A · Published
Luna, Ash, Ivy

Structured adversarial prompts designed to surface differential responses based on user demographic signals embedded in conversation context.

Methodology: PAIR (Prompt Automatic Iterative Refinement), with 500+ prompt variants per agent.
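PAIR pits an attacker model against the target: the attacker proposes a prompt, the target responds, a judge scores the response, and the attacker refines its next prompt using that history. The Python sketch below shows one such refinement stream. The `attacker`, `target` and `judge` callables are hypothetical stand-ins for model calls, and treating the top judge score as "a differential response was surfaced" is an assumption about how PAIR is adapted here for bias testing.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str, int]]  # (prompt, response, judge score) per iteration

def pair_search(goal: str,
                attacker: Callable[[str, History], str],
                target: Callable[[str], str],
                judge: Callable[[str, str, str], int],
                n_iterations: int = 5,
                success_score: int = 10) -> History:
    """Run one PAIR-style stream, stopping early once the judge reports success."""
    history: History = []
    for _ in range(n_iterations):
        prompt = attacker(goal, history)      # attacker refines using all prior attempts
        response = target(prompt)             # agent under test answers the prompt
        score = judge(goal, prompt, response) # PAIR-style judges rate on a 1 to 10 scale
        history.append((prompt, response, score))
        if score >= success_score:
            break
    return history
```

Running several independent streams, each for a handful of iterations, is one way a test harness could accumulate hundreds of prompt variants per agent.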


Published 18 February 2026

Penetration Testing

Passed
Luna, Ash, Ivy

Independent third-party penetration testing covering prompt injection, jailbreak attempts, data exfiltration, and attempts to extract system instructions across all AI agents.

Methodology: Annual third-party penetration test conducted by an accredited security provider. Summary findings and remediation status are available via our Trust Center.


Risk of Harm Detection

100% · Published
Luna

Evaluation of the agents' ability to identify expressions of risk of harm to self, including passive ideation, active ideation, and crisis signals, and respond with appropriate safety messaging and escalation.

Methodology: An automated script runs 100 vignettes of graded severity. Each response is scored against a predefined rubric for detection (did the agent recognise the signal?) and response quality (appropriate signposting, escalation, and tone). Pass rates are reported per severity tier.
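Reporting pass rates per severity tier amounts to grouping rubric outcomes by tier. The Python sketch below assumes a vignette passes only when both the detection and the response-quality criteria are met; that pass rule, the `VignetteResult` type and the tier labels are illustrative assumptions rather than October Health's published scoring code.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable

@dataclass
class VignetteResult:
    tier: str          # severity tier label (illustrative, e.g. "passive ideation")
    detected: bool     # did the agent recognise the risk signal?
    response_ok: bool  # did signposting, escalation and tone meet the rubric?

def pass_rates_by_tier(results: Iterable[VignetteResult]) -> Dict[str, float]:
    """Return the share of vignettes per tier that met both rubric criteria."""
    totals: Dict[str, int] = defaultdict(int)
    passes: Dict[str, int] = defaultdict(int)
    for r in results:
        totals[r.tier] += 1
        if r.detected and r.response_ok:  # assumed rule: both criteria required
            passes[r.tier] += 1
    return {tier: passes[tier] / totals[tier] for tier in totals}
```

Keeping tiers separate means a strong overall score cannot hide weaker performance on the rarer, higher-severity vignettes.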


Published 18 February 2026

MH Concern Recognition

99% · Published
Luna, Ash

Assessment of whether agents correctly identify expressions that may indicate delusional ideation or psychotic features, and respond with appropriate safety messaging rather than reinforcing or elaborating on them.

Methodology: 100 vignettes across 5 tiers (Paranoid Ideation, Grandiose Delusions, Hallucinations, Thought Disorder, Subtle/Early Signs). Responses are scored with a psychosis-specific rubric (Recognition, Non-Reinforcement, Empathy, Professional Referral, No-Harm) using a 4-strategy judge fallback.
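The page does not define the four fallback strategies. One common pattern in LLM-as-judge pipelines is to try progressively looser ways of reading the judge's output and stop at the first that yields a complete score set. The Python sketch below assumes that interpretation: the four strategies (strict JSON, JSON embedded in prose, `Criterion: score` lines, then flagging for human review) and all function names are hypothetical, while the criterion names come from the rubric above.

```python
import json
import re
from typing import Callable, Dict, List, Optional

CRITERIA = ("Recognition", "Non-Reinforcement", "Empathy", "Professional Referral", "No-Harm")
Scores = Dict[str, int]

def strict_json(raw: str) -> Optional[Scores]:
    data = json.loads(raw)  # raises ValueError on malformed JSON
    return {c: int(data[c]) for c in CRITERIA}

def embedded_json(raw: str) -> Optional[Scores]:
    match = re.search(r"\{.*\}", raw, re.DOTALL)  # first "{" to last "}"
    return strict_json(match.group(0)) if match else None

def key_value_lines(raw: str) -> Optional[Scores]:
    scores = {}
    for c in CRITERIA:
        m = re.search(rf"{re.escape(c)}\s*[:=]\s*(\d+)", raw)
        if not m:
            return None  # incomplete score set: let the next strategy try
        scores[c] = int(m.group(1))
    return scores

def flag_for_review(raw: str) -> Optional[Scores]:
    return None  # last resort: no automatic score, route to a human reviewer

STRATEGIES: List[Callable[[str], Optional[Scores]]] = [
    strict_json, embedded_json, key_value_lines, flag_for_review,
]

def parse_judge_output(raw: str) -> Optional[Scores]:
    """Apply each strategy in order; return the first complete score set, else None."""
    for strategy in STRATEGIES:
        try:
            result = strategy(raw)
        except (ValueError, KeyError, TypeError):
            continue
        if result is not None:
            return result
    return None
```

Returning `None` rather than a default score keeps unparseable judgements visible instead of silently counting them as passes or failures.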


Published 18 February 2026

Safeguarding Concern Recognition

100% · Published
Luna, Ash

Testing whether agents identify contextual safeguarding risk signals (e.g. disclosure of alcohol use while responsible for the care of children) and provide appropriate guidance without overstepping their scope.

Methodology: 80 vignettes across 4 domains (Substance Misuse & Dependent Care, Child Welfare & Neglect, Domestic Abuse & Coercive Control, Vulnerable Person Harm & Exploitation). Responses are scored with a safeguarding rubric (Recognition, Sensitivity, Appropriate Guidance, Scope Awareness, No-Harm) using a 4-strategy judge fallback.


Published 18 February 2026

Methodology Statement

All testing follows a pre-registered methodology designed by October Health's AI Governance team in consultation with external clinical and fairness experts. Test designs, including vignette sets and scoring rubrics, are available on request from dpo@october.health.

We test across protected characteristics including gender, age, race/ethnicity, disability, and pregnancy/maternity. Our fairness threshold is a maximum 5% disparity in outcomes across demographic groups.
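One way to check the 5% threshold is to compute each group's outcome rate and compare the highest with the lowest. The Python sketch below assumes "5% disparity" means an absolute gap of 5 percentage points; a relative reading (for example, the lowest rate being at least 95% of the highest) would need a different comparison. The function names and the 0/1 outcome encoding are hypothetical.

```python
from typing import Dict, List, Tuple

def outcome_rates(outcomes: Dict[str, List[int]]) -> Dict[str, float]:
    """Map each group to its positive-outcome rate (outcomes encoded as 0/1)."""
    return {group: sum(vals) / len(vals) for group, vals in outcomes.items()}

def max_disparity(rates: Dict[str, float]) -> float:
    """Absolute gap between the best- and worst-served groups."""
    return max(rates.values()) - min(rates.values())

def within_threshold(outcomes: Dict[str, List[int]],
                     threshold: float = 0.05) -> Tuple[bool, Dict[str, float]]:
    """Assumed rule: pass if no two groups differ by more than `threshold`."""
    rates = outcome_rates(outcomes)
    return max_disparity(rates) <= threshold, rates
```

Because the comparison uses floating-point rates, a gap that lands exactly on 0.05 may be reported as just over the threshold; a production check would likely compare integer counts or apply a small tolerance.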
