Saturday, May 17, 2025
World Tribune

OpenAI unveils HealthBench to evaluate LLMs’ safety in healthcare

May 15, 2025
in Health

OpenAI has announced the launch of HealthBench, a benchmark to evaluate AI models in healthcare using real-world applicability and physician judgment. 

“The 5,000 conversations in HealthBench simulate interactions between AI models and individual users or clinicians. The task for a model is to provide the best possible response to the user’s last message,” the company said in a statement. 

OpenAI built the benchmark with 262 physicians in 60 countries, who are proficient in 49 languages and have training in 26 medical specialties. 

HealthBench includes 5,000 health conversations, each with a physician-created rubric for evaluating model responses. Together, the rubrics contain 48,562 unique criteria.

The company said the conversations were created through “synthetic generation and human adversarial testing,” are multilingual, and span various medical specialties and contexts.  

“Every model response is graded against a set of physician-written rubric criteria specific to that conversation,” the company said. 

“Each criterion outlines what an ideal response should include or avoid (e.g., a specific fact to include or unnecessarily technical jargon to avoid). Each criterion has a corresponding point value, weighted to match the physician’s judgment of that criterion’s importance.” 

Model responses are graded by GPT-4.1, which determines whether each rubric criterion is met. The overall score is calculated from the criteria the response meets and reported relative to the maximum possible score.
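The scoring arithmetic described above can be sketched in a few lines of Python. This is an illustrative sketch, not OpenAI's implementation: the `Criterion` class, the `rubric_score` function, the sample criteria and their point values are all hypothetical, and the use of negative points for behavior to avoid and the clipping of the result to the range 0 to 1 are assumptions not stated in the article. In practice the met/not-met flags would come from the grader model; here they are hard-coded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One rubric criterion (hypothetical structure for illustration)."""
    description: str
    points: int  # assumption: positive = should include, negative = should avoid


def rubric_score(criteria, met):
    """Return points earned divided by the maximum possible points.

    `met` holds one boolean per criterion saying whether the grader judged
    the criterion to be met. Clipping to [0, 1] is an assumption.
    """
    if len(criteria) != len(met):
        raise ValueError("need exactly one met/not-met flag per criterion")
    # The maximum possible score counts only criteria that can add points.
    max_points = sum(c.points for c in criteria if c.points > 0)
    if max_points == 0:
        raise ValueError("rubric has no positively weighted criteria")
    earned = sum(c.points for c, is_met in zip(criteria, met) if is_met)
    return min(1.0, max(0.0, earned / max_points))


# Made-up rubric for a single conversation.
example_criteria = [
    Criterion("Advises the user to seek emergency care", 10),
    Criterion("Includes a specific relevant fact", 5),
    Criterion("Uses unnecessarily technical jargon", -3),
]
# Hard-coded grader verdicts: first and third criteria met, second not met.
example_met = [True, False, True]

if __name__ == "__main__":
    print(f"score = {rubric_score(example_criteria, example_met):.3f}")
```

In this made-up case the response earns 10 points, loses 3 for jargon, and is scored against a maximum of 15, which shows how the weighting lets a physician's judgment of importance shape the final number.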

HealthBench is split into seven themes: expertise-tailored communication, response depth, emergency referrals, health data tasks, global health, responding under uncertainty and context seeking.

“Evaluations like HealthBench are part of our ongoing efforts to understand model behavior in high-impact settings and help ensure progress is directed toward real-world benefit,” the company said. 

“Our findings show that large language models have improved significantly over time and already outperform experts in writing responses to examples tested in our benchmark. Yet even the most advanced systems still have substantial room for improvement, particularly in seeking necessary context for underspecified queries and worst-case reliability. We look forward to sharing results for future models.”

The tools are publicly available on GitHub. 

THE LARGER TREND

OpenAI’s CEO, Sam Altman, took part in President Donald Trump’s press conference earlier this year announcing the launch of Project Stargate. The $500 billion project would focus on developing the physical and virtual infrastructure to power AI development, including AI to improve health outcomes.

The partners, which also included Oracle’s chief technology officer, Larry Ellison, and SoftBank’s CEO, Masayoshi Son, touted the project as a game changer for healthcare.

Altman said during the press conference that he is thrilled to be part of Stargate and anticipates that diseases will be cured at an unprecedented rate. 

Ellison added that a cancer vaccine is one of the “most exciting” things the group is working on, using the tools that Altman and Son are providing.

Earlier this month, the Financial Times reported that Project Stargate was considering international expansion, with the UK as its top choice and Germany and France also seen as attractive candidates.

However, this week, Bloomberg reported that the project is facing delays due to the tariffs imposed by Trump and economic uncertainty. 

Amid economic uncertainty and growing market volatility, banks and institutional investors are wary of investing in Stargate, especially because U.S. tariffs, particularly on chips, server racks and cooling systems, make data center build-out costs hard to predict.

Additionally, SoftBank, which pledged to invest an immediate $100 billion in the project with the goal of the total reaching $500 billion within the next four years, has yet to develop a financing template or start discussions with potential backers, according to Bloomberg.

