AI Transparency Testing
Safety and bias testing results for October Health's AI systems. We publish results as testing is completed.
Flip Testing — ATS
Identical candidate profiles with demographic variables (name, gender indicators) swapped to detect differential scoring or recommendation patterns.
Methodology: 400 pairwise flip tests across 8 job types, totalling 800 API calls (two per pair) against GPT-5.2.
Published 17 February 2026
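To illustrate the flip-test idea, here is a minimal Python sketch. It is not October Health's test harness: the `Candidate` fields, `make_pair`, `run_flip_tests`, and the stand-in scorer `blind_score` are all hypothetical names chosen for the example, and a real run would call the model under test instead of a local function.

```python
from dataclasses import dataclass, replace
from statistics import mean
from typing import Callable


@dataclass(frozen=True)
class Candidate:
    name: str       # demographic signal that gets swapped
    pronouns: str   # demographic signal that gets swapped
    cv_text: str    # held constant across both profiles in a pair


def make_pair(base: Candidate, name: str, pronouns: str) -> tuple[Candidate, Candidate]:
    """Return the base profile and a copy that differs only in demographic fields."""
    return base, replace(base, name=name, pronouns=pronouns)


def run_flip_tests(
    pairs: list[tuple[Candidate, Candidate]],
    score: Callable[[Candidate], float],
) -> dict:
    """Score both profiles in each pair and summarise the score gaps."""
    gaps = [score(a) - score(b) for a, b in pairs]  # two scorer calls per pair
    return {
        "pairs": len(gaps),
        "scorer_calls": 2 * len(gaps),
        "mean_gap": mean(gaps),
        "max_abs_gap": max(abs(g) for g in gaps),
    }


def blind_score(candidate: Candidate) -> float:
    """Hypothetical stand-in scorer that only looks at the CV text."""
    return float(len(candidate.cv_text.split()))


base = Candidate("James Smith", "he/him", "Five years of payroll experience")
pairs = [make_pair(base, "Amina Yusuf", "she/her")]
summary = run_flip_tests(pairs, blind_score)
```

Because `blind_score` ignores the swapped fields, every gap is zero here. A scorer that reacted to the name or pronouns would produce a non-zero gap, which is the signal a flip test looks for.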
Prompt Bias Testing
Structured adversarial prompts designed to surface differential responses based on user demographic signals embedded in conversation context.
Methodology: PAIR (Prompt Automatic Iterative Refinement). 500+ prompt variants per agent.
Published 18 February 2026
Penetration Testing
Independent third-party penetration testing covering prompt injection, jailbreak attempts, data exfiltration, and attempts to extract system instructions across all AI agents.
Methodology: Annual third-party penetration test conducted by an accredited security provider. Summary findings and remediation status available via our Trust Center.
Risk of Harm Detection
Evaluation of the agents' ability to identify expressions of risk of harm to self, including passive ideation, active ideation, and crisis signals, and respond with appropriate safety messaging and escalation.
Methodology: Automated script testing of 100 vignettes graded by severity. Each response is scored against a predefined rubric for detection (did the agent recognise the signal?) and response quality (appropriate signposting, escalation, and tone). Pass rates are reported per severity tier.
Published 18 February 2026
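As a rough illustration of how per-tier pass rates could be computed from rubric results, consider the Python sketch below. The record layout, the tier labels, and the rule that a vignette passes only when both detection and response quality are satisfied are assumptions made for the example; the published methodology does not specify them.

```python
from collections import defaultdict


def pass_rates_by_tier(results: list[dict]) -> dict[str, float]:
    """Return the fraction of passing vignettes for each severity tier.

    Assumption: a vignette passes only if the signal was detected AND
    the response met the quality rubric.
    """
    totals: dict[str, int] = defaultdict(int)
    passes: dict[str, int] = defaultdict(int)
    for record in results:
        tier = record["tier"]
        totals[tier] += 1
        if record["detected"] and record["response_ok"]:
            passes[tier] += 1
    return {tier: passes[tier] / totals[tier] for tier in totals}


# Hypothetical rubric outcomes, not real test results.
results = [
    {"tier": "passive", "detected": True, "response_ok": True},
    {"tier": "passive", "detected": False, "response_ok": False},
    {"tier": "active", "detected": True, "response_ok": True},
    {"tier": "crisis", "detected": True, "response_ok": False},
]
rates = pass_rates_by_tier(results)
```

Reporting per tier rather than as one overall figure keeps a weak result in a small, high-severity tier from being hidden by strong results elsewhere.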
Mental Health Concern Recognition
Assessment of whether agents correctly identify expressions that may indicate delusional ideation or psychotic features, and respond with appropriate safety messaging rather than reinforcing or elaborating.
Methodology: 100 vignettes across 5 tiers (Paranoid Ideation, Grandiose Delusions, Hallucinations, Thought Disorder, Subtle/Early Signs). Psychosis-specific rubric (Recognition, Non-Reinforcement, Empathy, Professional Referral, No-Harm) with 4-strategy judge fallback.
Published 18 February 2026
Safeguarding Concern Recognition
Testing whether agents identify contextual safeguarding risk signals (e.g. disclosure of alcohol use while responsible for the care of children) and provide appropriate guidance without overstepping scope.
Methodology: 80 vignettes across 4 domains (Substance Misuse & Dependent Care, Child Welfare & Neglect, Domestic Abuse & Coercive Control, Vulnerable Person Harm & Exploitation). Safeguarding rubric (Recognition, Sensitivity, Appropriate Guidance, Scope Awareness, No-Harm) with 4-strategy judge fallback.
Published 18 February 2026
Methodology Statement
All testing follows pre-registered methodology designed by October Health's AI Governance team in consultation with external clinical and fairness experts. Test designs, including vignette sets and scoring rubrics, are available on request from dpo@october.health.
We test across protected characteristics including gender, age, race/ethnicity, disability, and pregnancy/maternity. Our fairness threshold is a maximum 5% disparity in outcomes across demographic groups.
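The sketch below shows one way a disparity threshold of this kind could be checked in Python. It assumes that "disparity" means the absolute gap, in percentage points, between the highest and lowest group outcome rates; the statement above does not define the measure, so this interpretation, the function names, and the sample rates are illustrative only.

```python
def max_disparity(rates: dict[str, float]) -> float:
    """Gap between the highest and lowest group outcome rates (0.0 to 1.0 scale)."""
    return max(rates.values()) - min(rates.values())


def within_threshold(rates: dict[str, float], threshold: float = 0.05) -> bool:
    """True if the largest gap between groups does not exceed the threshold.

    Assumption: the threshold is an absolute difference in rates,
    so 0.05 corresponds to five percentage points.
    """
    return max_disparity(rates) <= threshold


# Hypothetical outcome rates per group, not real test results.
balanced = {"group_a": 0.62, "group_b": 0.60, "group_c": 0.58}
skewed = {"group_a": 0.70, "group_b": 0.60}

balanced_ok = within_threshold(balanced)
skewed_ok = within_threshold(skewed)
```

A relative measure (for example, the ratio of the lowest rate to the highest) would give different verdicts on the same data, so the choice of definition matters when interpreting a stated threshold.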