OpenAI GPT-4 Outperforms Doctors in Diagnosing Eye Problems, Study Finds

University of Cambridge research shows GPT-4 holds its ground against doctors in assessing eye issues

Ben Wodecki, Jr. Editor

April 25, 2024

2 Min Read
[Image: A man having an eye scan. Credit: Getty Images]

OpenAI’s GPT-4 model beat doctors at assessing eye problems in a University of Cambridge study.

The large language model was tested alongside doctors of varying experience levels, with each asked to diagnose 87 scenarios depicting patients with eye problems.

GPT-4 achieved “significantly better” scores than unspecialized junior doctors and earned similar scores to trainee and expert eye doctors.

Only the doctors the researchers described as “top-performing” scored higher than the OpenAI model.

The researchers behind the test said models like GPT-4 won’t replace clinicians but could improve workflows by triaging patients and providing them with advice and information.

“We could realistically deploy AI in triaging patients with eye issues to decide which cases are emergencies that need to be seen by a specialist immediately, which can be seen by a GP, and which don’t need treatment,” said Dr Arun Thirunavukarasu, lead author of the study.

The tests pitted human doctors against GPT-4 on questions about eye problems like decreased vision, itchy eyes and extreme sensitivity to light. The questions used were taken from a textbook used by trainee eye doctors.

Other large language models were also tested in the experiment, including OpenAI’s GPT-3.5, Google’s PaLM 2 and Meta’s LLaMA, but it was GPT-4 that generated the most accurate responses.


“The models could follow clear algorithms already in use, and we’ve found that GPT-4 is as good as expert clinicians at processing eye symptoms and signs to answer more complicated questions,” Thirunavukarasu said. “With further development, large language models could also advise GPs who are struggling to get prompt advice from eye doctors.”

The researchers noted that since concluding their study, more powerful models have emerged that “may be even closer to the level of expert eye doctors.”

GPT-4 was recently surpassed by GPT-4 Turbo as OpenAI’s most powerful large language model; however, both models are only available to premium ChatGPT users and enterprise customers.

GPT-3.5 powers the free version of ChatGPT but holds enough medical knowledge in its training data to pass medical exams. Researchers published results last May showing that the base version of ChatGPT achieved a passing score on the three standardized tests that make up the U.S. Medical Licensing Exam.

The researchers’ idea of a large language model-powered tool providing patient advice came to fruition earlier this month, when the World Health Organization unveiled Sophie, an AI-powered avatar that offers advice on smoking, exercise and mental health.


About the Author(s)

Ben Wodecki

Jr. Editor

Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.

