What if AI stopped making things up? 

As chatbots powered by artificial intelligence become more ingrained in our everyday lives, people are increasingly using them to help diagnose their medical concerns: should I be worried about this rash? What if this insect bite gets infected? Is this pain the symptom of a larger problem? When dealing with someone’s health, the answers need to be as accurate as possible.

Last year, Binghamton University researchers tested OpenAI’s ChatGPT, and it showed high accuracy in identifying disease terms, drug names, and genetic information. However, the AI bot also generated a high number of false ‘hallucinations.’ A follow-up study may have found a way to eliminate that confidently delivered but fake information.

Ahmed Abdeen Hamed, a research fellow for the Thomas J Watson College of Engineering and Applied Science’s School of Systems Science and Industrial Engineering, collaborated with George J Klir Professor of Systems Science, Luis M Rocha, to develop an innovative verification method, and the journal STAR Protocols recently published their conclusions.

The new protocol harnesses the growing number of open-source AI options, each of which has a different way to arrive at an answer to an inquiry. Hamed and Rocha chose seven of these large language models and forced them to use retrieval-augmented generation (RAG), which required them to reference an authoritative database of medical terminology before giving a response.

Over 10,000 experiments, the seven chatbots all received the same plain-language symptoms, and each of them came up with what it thought were the medical terms for them, complete with an official identification number. Then the bots put the answers up for a ‘vote.’

The result: 76.85 per cent of the answers were supported by at least four LLMs, and the remaining 23.15 per cent were supported by at least two. No unmatched terms, and no hallucinations. 
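The voting step can be illustrated with a minimal sketch. This is not the authors' published code: the function names, the model names, the labels, and the `TERM:0001`-style identifiers are all hypothetical placeholders. Only the two thresholds (at least four supporting models, at least two supporting models) come from the figures reported above.

```python
from collections import Counter


def tally_votes(answers):
    """Count how many models proposed each term identifier.

    `answers` maps a (hypothetical) model name to the identifier it
    returned, or None if the model gave no usable answer.
    """
    return Counter(a for a in answers.values() if a is not None)


def classify(answers, strong=4, weak=2):
    """Return (identifier, support, label) for the best-supported identifier.

    The thresholds mirror the article's "at least four" and "at least two";
    the label names themselves are assumptions made for this sketch.
    """
    votes = tally_votes(answers)
    if not votes:
        return None, 0, "unmatched"
    term, support = votes.most_common(1)[0]
    if support >= strong:
        label = "strong"
    elif support >= weak:
        label = "weak"
    else:
        label = "unmatched"
    return term, support, label


# Made-up answers from seven models for one plain-language symptom.
example = {
    "model_a": "TERM:0001",
    "model_b": "TERM:0001",
    "model_c": "TERM:0001",
    "model_d": "TERM:0002",
    "model_e": "TERM:0001",
    "model_f": None,
    "model_g": "TERM:0001",
}

print(classify(example))
```

In this sketch, an identifier proposed by only one model would be labelled "unmatched", which is the category the study reports as empty.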

“The new workflow is incredible”, Hamed said, “because it can verify anything from a biomedical point of view: biological knowledge with disease and genetics; translational knowledge from diseases to treatments and clinical trials; and also from a healthcare point of view with symptoms and treatments.”

A big advantage of this new protocol is that it can be reproduced in a near-infinite number of permutations to reinforce its accuracy. “There can be 100 large language models that are open source, and every time we can perform an experiment with seven LLMs selected at random from that list,” Hamed said. “When we perform the experiment many, many times, we increase the confidence in the voting.”
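The repeated-sampling idea Hamed describes could be sketched as follows. This is an illustration under stated assumptions, not the published protocol: `repeated_vote`, `fake_ask`, the pool of model names, and the term identifiers are hypothetical, and the stand-in `fake_ask` function replaces a real call to a language model.

```python
import random
from collections import Counter


def repeated_vote(pool, ask, question, panel_size=7, rounds=100, seed=0):
    """Run many votes, each with a randomly drawn panel of models.

    `ask(model, question)` is a caller-supplied function (an assumption of
    this sketch) that returns the term identifier a model proposes.
    Returns the fraction of rounds won by each identifier.
    """
    rng = random.Random(seed)
    wins = Counter()
    for _ in range(rounds):
        panel = rng.sample(pool, panel_size)  # draw models without replacement
        votes = Counter(ask(model, question) for model in panel)
        winner, _count = votes.most_common(1)[0]
        wins[winner] += 1
    return {term: n / rounds for term, n in wins.items()}


# A pool of 100 hypothetical open-source models.
pool = [f"model_{i:03d}" for i in range(100)]


def fake_ask(model, question):
    """Stand-in for a real model call: three models in the pool disagree."""
    dissenters = {"model_000", "model_001", "model_002"}
    return "TERM:0002" if model in dissenters else "TERM:0001"


shares = repeated_vote(pool, fake_ask, "itchy red rash on the forearm")
print(shares)
```

Because at most three of the seven models on any panel can dissent in this toy setup, the majority identifier wins every round; with real models the share of rounds won would serve as a rough confidence signal.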

Rocha said the protocol is an important step toward increasing confidence in large, multiscale network models of disease, which is a key topic for his Complex Adaptive Systems and Computational Intelligence Lab at Binghamton. 

The lab’s research includes the development of digital twins for precision medicine. These dynamic, virtual replicas of physical processes are continuously updated using AI and real-time data to create precise, predictive simulations of human reactions so that healthcare providers can optimise outcomes before real-world testing.

“For instance, the protocol can extract and provide multi-agent verification of evidence for an adverse drug reaction for a given medication that is available in clinical trials, the scientific literature, pharmacological databases, and even social media discourse,” Rocha said. “And it can assist in the extraction of evidence at multiple scales, from multiomics to epidemiological and behavioural data sources, which we have already started to pilot by building multi-layer models of ER+ breast cancer.”

Although the study centred on biomedical applications, the Binghamton team’s discovery could be used to curb or eliminate other kinds of LLM hallucinations, such as fabricated legal citations, fake academic citations, or blatant historical errors. “This protocol is a big step toward the democratisation of knowledge verification,” Hamed concluded.

DOI: 10.1016/j.xpro.2026.104533
