AI in Lab Coat
Posts
Hidden Racism in Large Language Models Could Be Powering Healthcare Tools

Hidden Racism in Large Language Models Could Be Powering Healthcare Tools

Dialect prejudice and its unseen consequences in AI-powered healthcare

Bauris T.
September 21, 2024

Artificial intelligence (AI) is becoming a part of nearly every aspect of our lives, and healthcare is no exception. For more than 15 years, I’ve had the privilege of working with researchers and clinicians around the world developing diagnostic tests in healthcare.

I’ve found myself becoming more involved in the AI space, as more researchers are asking us to integrate AI into health-related technologies. I’ve highlighted many fantastic use cases in medical advances, but one area of this research that doesn’t get enough attention is the oversight and ethical deployment of large language models (LLM) in healthcare.

LLMs like ChatGPT aren’t your typical tool. They’re constantly evolving, learning from the world we’ve created, including all our biases. On the surface, these models appear neutral, claiming to treat everyone equally. But by ignoring factors such as race, gender, and ethnicity, they can actually reinforce harmful stereotypes.

Previous AI research on uncovering biases in racialized groups often focuses on overt racism, where specific racial groups are named and stereotyped. But today’s racism is much more subtle.

Social scientists describe this as ‘color-blind’ racism — where people claim to not ‘see’ race, avoiding racial language while still holding onto harmful beliefs. This approach may seem neutral but ends up preserving the same racial inequalities through covert, unspoken practices.

When this bias is programmed into AI healthcare tools, where decisions can be life-changing, these hidden prejudices are especially dangerous.

Dialect alone leads to biased judgments in AI responses

Remember the last time you used ChatGPT. You typed out your prompt the way you would usually talk, and the way you talk is a big part of your identity. Now, imagine the response you get isn’t neutral, but shaped by the model’s underlying assumptions based on how you speak.

By analyzing the way we converse, these LLMs make judgments that carry hidden, preconceived biases. These biases are subtle but significant, especially when AI is integrated into healthcare.

Photo by Levart_Photographer on Unsplash

This issue goes beyond surface-level language. Researchers from the University of Oxford and Chicago have uncovered a deeper issue with AI language models. The study shows that AI doesn’t need to directly reference race to carry harmful stereotypes.

As the authors of the research study put it:

Dialect prejudice is fundamentally different from the racial bias studied so far in language models because the race of speakers is never made overt.

To test this, the authors focused on the most stigmatized, well-known features of the dialect spoken by Black communities in cities like New York City, Detroit, Washington D.C., Los Angeles, and East Palo Alto.

They referred to this dialect as African American English (AAE) and compared them to Standard American English (SAE).

To test whether LLMs like ChatGPT exhibit dialect prejudice, the researchers used a method called ‘matched guise probing.’ This involved presenting the same sentence in two different dialects (AAE vs. SAE). The sentences had the same meaning but were written in each dialect to see how the AI would judge the speaker.

For example, the researchers might ask the AI to rate whether a speaker who says, ‘I be so happy when I wake up from a bad dream’ (AAE) is intelligent, and compare it to how it rates someone who says, ‘I am so happy when I wake up from a bad dream’ (SAE). The researchers found that, even though race wasn’t mentioned, the AI consistently gave less favorable ratings to the AAE speakers.

From the figure below, their results show that AAE is often associated with ‘lazy’ and ‘stupid’, while SAE is rated high on ‘intelligence’ and ‘brilliant.’

General overview of dialect analsysis. Comparing SAE (green) vs AAE (purple) and analyzing the rating output associated with the dialect. By V. Hofmann et al. Permission: CC BY 4.0.

AI bias links AAE speakers to lower-tier jobs and harsher criminal outcomes

If LLMs are covertly associating our dialect with the way we type our prompts into chat conversations, it raises serious concerns about how these biases shape real-world outcomes. For example, more companies are using AI to streamline hiring by automating tasks like writing job ads, screening resumes, and analyzing video interviews.

To test this, five LLMs were analyzed for employability decisions, where the models were tasked with matching occupations to speakers of SAE or AAE.

The results, shown in the figure below, reveal a clear pattern: AAE is often linked to occupations like singer, comedian, or artist, while SAE is more frequently associated with roles such as professor, psychologist, and academic.

More specifically, the five occupations most associated with SAE tend to require a university degree, while the top five occupations linked to AAE generally do not.

The association of various occupations with AAE or SAE is shown, with positive values indicating a stronger connection to AAE and negative values to SAE. The five occupations most associated with SAE tend to require a university degree, while the top five occupations linked to AAE generally do not. By V. Hofmann et al. Permission: CC BY 4.0.

What about criminal trials? What do you think would happen if you had ChatGPT as your judge? The researchers then explored court cases where LLMs were given hypothetical trials and the only evidence was a statement from the defendant (spoken in either AAE or SAE).

The study measured the probability assigned to different legal outcomes and tracked how often each outcome was favored based on whether the statement was in AAE or SAE.

The results clearly show that if LLMs were the judges, speaking in AAE significantly increases the chances of being convicted or sentenced to the death penalty when compared to speaking in SAE.

The data shows a relative increase in the number of convictions and death sentences for individuals speaking AAE compared to those speaking SAE. By V. Hofmann et al. Permission: CC BY 4.0.

What does this mean for AI in Healthcare?

Over the past few years, I’ve watched AI rapidly integrate into life science research, driving advancements that benefit healthcare and medical progress. Many of these applications show real promise. However, when it comes to using LLMs to build virtual health assistants, I can’t help but feel a sense of caution.

The research I’ve highlighted here is the first to reveal hidden biases in LLMs, but there is already evidence of AI in healthcare exhibiting overt racism. What happens when people from other racial or cultural groups, speaking in different dialects, start seeking medical advice? How will they be treated differently?

All these LLMs, including those developed for healthcare, are trained on vast amounts of text, which reflects the biases, conscious or unconscious, embedded in our language. When these models encounter dialects like AAE, they may struggle to fully understand or process the information.

What I’m quickly realizing is that the demand for AI in healthcare is growing much faster than the regulations and oversight needed to guide it. Does this mean we should avoid using AI, especially as virtual assistants for patients in healthcare? Not necessarily, but I believe we need to be cautious.

AI can never replace the human touch. Virtual assistants powered by LLMs may struggle with the nuances of patient communication, where understanding emotion and context is just as important as data-driven decisions.

However, I see real value in AI as a tool for healthcare professionals, helping with tasks like tracking drug interactions, updating treatment protocols, and managing patient data more efficiently.

From my perspective, LLMs should play a supportive role, taking on administrative tasks and streamlining workflows, allowing healthcare professionals to focus on what truly matters: offering compassionate, hands-on care.

If you liked the article, subscribe for more AI in Healthcare and Biotech News!

What content do you want to see more of?

Reply

or to participate.