We studied chatbots and language and saw a huge problem: They mean 80% when they say 'likely' but humans hear 65%

When a human says an event is “probable” or “likely,” people generally have a shared, if fuzzy, understanding of what that means. But when an AI chatbot like ChatGPT uses the same word, it’s not assessing the odds the way we do, my colleagues and I found.

We recently published a study in the journal npj Complexity that suggests that, while large language model AIs excel at conversation, they often fail to align with humans when communicating uncertainty. The research focused on words of estimative probability, which include terms like “maybe,” “probably” and “almost certain.”

By comparing how AI models and humans map these words to numerical percentages, we uncovered significant gaps between humans and large language models. While the models do tend to agree with humans on extremes like “impossible,” they diverge sharply on hedge words like “maybe.” For example, a model might use the word “likely” to represent an 80% probability, while a human reader assumes it means closer to 65%.
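To make that comparison concrete, here is a minimal Python sketch of how a word-by-word gap could be computed. It is an illustration only, not the code used in the study: the function name `interpretation_gaps` and the dictionary layout are assumptions, and the only numbers in it are the 80% and 65% figures for “likely” quoted above.

```python
def interpretation_gaps(model_pct, human_pct):
    """Return the model-minus-human gap, in percentage points, for each shared word.

    Both arguments map a word of estimative probability to the percentage
    it is taken to mean. Words present in only one mapping are skipped.
    """
    shared = model_pct.keys() & human_pct.keys()
    return {word: model_pct[word] - human_pct[word] for word in sorted(shared)}


# Figures from the article's example: "likely" as 80% (model) vs. 65% (human).
model_pct = {"likely": 80}
human_pct = {"likely": 65}

gaps = interpretation_gaps(model_pct, human_pct)
print(gaps)  # {'likely': 15}
```

A positive gap means the model attaches a higher probability to the word than a human reader would assume.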

This could be because humans interpret words such as “likely” and “probable” largely through contextual cues and personal experience. In contrast, large language models may be averaging over conflicting usages of those words in their training data, which leads them to diverge from human interpretations.

Our study also found that large language models are sensitive to gendered language and the specific language used for prompting. When a prompt changed from “he” to “she,” the AI’s probability estimates often became more rigid, reflecting biases embedded in its training data. When a prompt changed from English to Chinese, the AI’s probability estimates often shifted, possibly due to differences between English and Chinese in how people express and understand uncertainty.

AI chatbots don’t interpret ‘probably’ and ‘maybe’ the same way you do. Mayank Kejriwal

Why it matters

Far from being a linguistic quirk, this misalignment is a fundamental challenge for AI safety and human-AI interaction. As large language models are increasingly used in high-stakes fields like health care, government policy and scientific reporting, the way they communicate risk becomes a matter of public trust.

If an AI assistant helping a doctor, for instance, describes a side effect as “unlikely,” but the probability the model attaches to “unlikely” is much higher than the one the doctor assumes, the resulting decision could be flawed.

What other research is being done

Scientists have studied how humans quantify uncertainty since the 1960s, a field pioneered by CIA analysts to improve intelligence reporting. More recently, there has been an explosion in large language model literature seeking to look under the hood of neural networks to better understand their “behaviors” and linguistic patterns.

Our study adds a layer of complexity by treating the interaction between humans and artificial intelligence as something like a biological system in which meaning can degrade. It moves beyond simply measuring whether an AI is “smart” and instead asks whether it is aligned.

Other researchers are currently exploring whether so-called chain-of-thought prompting – asking the AI to show its work – can fix these errors. However, our study found that even advanced reasoning doesn’t always bridge the gap between statistical data and verbal labels.

What’s next

A goal for future AI development is to create models that don’t just predict the next likely word but actually understand the weight of the uncertainty they are conveying. Researchers are calling for more robust consistency metrics to ensure that if a model sees a 10% chance in the data, it chooses the same word every time.
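One simple way to picture such a consistency metric is sketched below in Python. This is an assumption about what a metric of this kind could look like, not a metric defined in the study: the function name `word_consistency` is hypothetical, and the list of responses is made-up illustrative data, not results.

```python
from collections import Counter


def word_consistency(chosen_words):
    """Fraction of trials in which the most frequent word was chosen.

    1.0 means the same word was used every time; lower values mean the
    wording varied across trials that described the same probability.
    """
    if not chosen_words:
        raise ValueError("need at least one trial")
    counts = Counter(chosen_words)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(chosen_words)


# Hypothetical words chosen across four prompts that all describe a 10% chance.
trials = ["unlikely", "unlikely", "improbable", "unlikely"]
print(word_consistency(trials))  # 0.75
```

A score of 1.0 would correspond to the goal described above: the same word for the same 10% chance, every time.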

As we move toward a world where AI summarizes scientific papers and manages people’s schedules, making sure that “probably” means “probably” is a vital step in making these systems reliable partners rather than just sophisticated parrots.

The Research Brief is a short take on interesting academic work.

Mayank Kejriwal, Research Assistant Professor of Industrial & Systems Engineering, University of Southern California

This article is republished from The Conversation under a Creative Commons license. Read the original article.
