AI-powered voice analysis for faster mental health diagnosis

My role

Designer
UX Researcher

Team

Dobko and Nirmal, Developers
Wu, Product Manager
Rajabi, Company Advisor

Timeline

16 weeks
Transcarent

Area

UX Research

Project Overview

Problem

Getting a mental health diagnosis can take weeks or even years. Clinicians must rely on subjective judgment, and patients often struggle to articulate their symptoms, which frequently delays treatment. Transcarent partnered with Cornell Tech to explore a tech-driven solution that fits into the clinical workflow.

Solution

My team and I designed an AI-driven tool that analyzes voice patterns like tone, rhythm, and emotion during therapy sessions. It provides objective insights to support doctors’ decisions and creates session records to help track progress over time.

Impact

As the lead on user research, I collaborated with a cross-functional team and learned to integrate a speech behavior API. Our user-centered design earned us a spot as one of 9 finalist teams (out of 88) to present to Cornell Tech faculty, industry leaders, and VCs.

The problem

An accurate mental health diagnosis can take weeks or even years to determine.

This is partly because diagnoses vary from person to person: one patient may have a single condition, while another has several. Because mental health symptoms often affect self-care, life skills, and relationships, it's essential for patients to receive a diagnosis, which is the first step toward treatment.

Why does this problem matter?

1 in 5

Adults in the U.S. experience mental illness each year.

NAMI, 2024

50%

Of individuals experience their first mental illness symptoms by age 14.

NIH, 2024

75%

Of individuals experience their first mental illness symptoms by age 24.

NIH, 2024

The design challenge

How might we improve the mental health diagnosis process with an AI-driven digital tool?

Research & Interviews

How is tech currently integrated into healthcare processes?

Research Method 01: Secondary Research

We researched the stakeholder ecosystem, current patient experience, and technological advancements to get a clear picture of the complex systems within the healthcare industry. We identified a trend towards integrating tech into diagnostics.

system mapping

How do patients and doctors use and view tech in healthcare?

Research Method 02: User Interviews

I led the creation of our interview guides for sessions with 5 doctors, 5 patients, and 2 medical students. For doctors, we aimed to understand how tech is used in their practices and how they feel about AI. For patients, we wanted to learn about their diagnosis and treatment experiences.

“Providers are interested in generative AI, but the research is very new. AI is already used in diagnostics and pathology, classification tasks.”

Medical Student 
at Weill Cornell Medicine

What did we learn from our research?

Based on our system mapping and interviews, we identified 3 main insights into the factors behind delayed diagnosis and misdiagnosis in mental health.

key insights

Lack of standardized diagnosis tools

Variability in diagnostic criteria and the limitations of subjective screening tools can lead to delayed diagnosis or misdiagnosis.

Symptoms can change over time

A diagnosis may need to be revised as symptoms evolve to ensure appropriate treatment.

Struggle to articulate emotions

Patients may face difficulties articulating or expressing their emotions, which slows or prevents timely diagnosis.

development

Integrating speech pathology into mental health diagnosis.

Speech pathologists treat speech and language problems, like stuttering, by analyzing aspects of a person's tone, pitch, cadence, pause patterns, articulation, and how they convey emotions. We wondered if the same process could be applied to mental health diagnosis since speech patterns are often affected by psychological conditions.

Mental Health Symptoms Show Up In The Voice

Mental health symptoms often show up in the voice before they're consciously recognized. For example, depression may present as slowed speech or flat intonation, while anxiety might manifest as rushed or jittery speech.

Objective Data for Timely Diagnosis

Subtle voice changes can serve as early indicators, allowing for timely intervention before symptoms worsen.
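
To make these voice features concrete, here is a rough, illustrative sketch of how pitch variability, pause ratio, and vocal energy could be estimated from a session recording. It assumes Python and the librosa audio library, which are stand-ins of my own and not the project's actual stack.

```python
# Illustrative sketch (not our production pipeline): estimating a few of the
# voice features described above from a recording, using librosa.
import librosa
import numpy as np

def voice_features(path):
    y, sr = librosa.load(path, sr=None, mono=True)

    # Fundamental frequency (pitch) contour; flat intonation shows up as low variance.
    f0, voiced_flag, voiced_probs = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
    )
    f0_voiced = f0[~np.isnan(f0)]

    # Pause behavior: share of the clip that falls below the energy threshold.
    speech_intervals = librosa.effects.split(y, top_db=30)
    speech_duration = sum(end - start for start, end in speech_intervals) / sr
    total_duration = len(y) / sr

    return {
        "pitch_mean_hz": float(np.mean(f0_voiced)) if f0_voiced.size else None,
        "pitch_variability_hz": float(np.std(f0_voiced)) if f0_voiced.size else None,
        "pause_ratio": 1.0 - speech_duration / total_duration,
        "rms_energy": float(np.mean(librosa.feature.rms(y=y))),
    }

# Example: features = voice_features("session_day1.wav")  # placeholder filename
```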

Testing

Validating our product's technical feasibility and core desirability through risky assumption testing.

I collaborated with engineers and a product manager to identify and address our most uncertain, highest-impact assumptions early in the process. This allowed us to prioritize critical features and reduce the risk of building something that doesn't resonate with users or meet technical requirements, increasing the product's chances of success.

Focus Area

  • AI for Speech and Tone Analysis

  • Tech Usability

Assumptions

  • Mental Health Affects Speech Patterns

  • People Speak Normally When Recorded

Assumption 01:
Mental Health Affects Speech Patterns

Ten people were recruited for a week-long experiment in which they filled out a daily mood tracker and sent audio recordings about their day. Our questionnaire was modeled on clinical guidelines for mental health diagnosis. We interpreted the recordings using an existing speech behavior analysis API by Humane AI.
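
For illustration only, this is roughly how each daily recording could be submitted for analysis. The endpoint, authentication, and response fields below are placeholders invented for the sketch; they do not document the vendor's actual API.

```python
# Hypothetical sketch of submitting a daily recording for analysis.
# The endpoint, parameters, and response fields are placeholders.
import requests

API_URL = "https://api.example-speech-vendor.com/v1/analyze"  # placeholder URL
API_KEY = "YOUR_API_KEY"  # placeholder credential

def analyze_recording(audio_path: str) -> dict:
    with open(audio_path, "rb") as f:
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"audio": f},
        )
    response.raise_for_status()
    # Assumed response shape: {"sentiment": "...", "valence": 0.42, "arousal": 0.70}
    return response.json()

# Example: result = analyze_recording("participant03_day2.wav")  # placeholder path
```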

Clinical guideline for mental healthcare diagnostics

The questionnaire we sent to participants

We found a significant, useful correlation between participants' mood tracker entries and the API's analysis of their recordings.

Participant's Mood Tracker Answers

Voice Sentiment Analysis API

We identified consistency in the AI’s insights throughout the experiment.

Day 1

“Confident and focused”

Day 2

“Tired, bit stressed, and focused”

Day 3

“Felt confident”

key takeaway

We found a useful correlation between participants' self-reported mental health states and the patterns in their audio recordings.
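
As a sketch of how that correlation could be quantified, one option is a rank correlation between daily mood ratings and a valence-style score from the voice analysis. The numbers below are placeholders for illustration, not our study data.

```python
# Illustrative only: one way to quantify agreement between self-reported mood
# and the API's output. The values below are placeholders, not study data.
from scipy.stats import spearmanr

# Daily self-reported mood, e.g. 1 (very low) to 5 (very good) -- placeholder values.
mood_tracker = [4, 3, 4, 2, 3, 4, 5]

# Valence score returned by the voice analysis for the same days -- placeholder values.
api_valence = [0.61, 0.42, 0.58, 0.30, 0.47, 0.66, 0.74]

rho, p_value = spearmanr(mood_tracker, api_valence)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```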

Assumption 02:
People Speak Normally When Recorded

We compared interviewees' answers and mannerisms while they initially spoke unrecorded and after we began recording the conversation. Participants were randomly selected, were unaware of the purpose of the research, and did not know ahead of time that recording would begin halfway through.

80% had no significant voice change

We inferred the changes in speech patterns were minimal and wouldn't drastically affect the accuracy of the analysis.

75% took longer pauses as the conversation continued.

We took these pauses as an indicator that participants felt more comfortable the more they spoke with us.
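
Our comparison was observational, but the pause finding could also be checked computationally. Here is a minimal sketch, assuming librosa (my assumption, not part of our actual test setup), that measures whether silent gaps lengthen over the recorded half of a conversation.

```python
# Illustrative only: do silent gaps lengthen over the course of a recording?
# Assumes librosa; the filename below is a placeholder.
import librosa
import numpy as np

def pause_lengths(path, top_db=30):
    y, sr = librosa.load(path, sr=None, mono=True)
    speech = librosa.effects.split(y, top_db=top_db)  # non-silent intervals, in samples
    # Gaps between consecutive speech intervals are the pauses.
    return np.array([(speech[i + 1][0] - speech[i][1]) / sr
                     for i in range(len(speech) - 1)])

gaps = pause_lengths("interview_recorded_half.wav")  # placeholder path
early, late = np.array_split(gaps, 2)
print(f"mean pause: {early.mean():.2f}s early vs {late.mean():.2f}s late")
```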

key takeaway

Even when a conversation is being recorded, people still speak normally.

The solution

An AI-guided voice and speech analysis tool that assists mental health practitioners in diagnosis during sessions.

user journey

Reflection

What did I learn from this project?

I thoroughly loved collaborating with my team as each of our unique perspectives was valued throughout the development process. We were able to thoughtfully consider the user's needs, the tech needed to make our solution feasible, and the roadmap for our product launch. This genuine involvement and curiosity led to a meaningful solution, and we were selected as 1 of 9 teams from a cohort of 80 groups to pitch to Cornell Tech faculty and partners.

A big thank you to my amazing teammates and advisor!

Thanks for stopping by!
Let's build together.

© Kirsten Geiger 2025
