What if AI stopped making things up?

As chatbots powered by artificial intelligence become more ingrained in our everyday lives, people are increasingly using them to help diagnose their medical concerns: should I be worried about this rash? What if this insect bite gets infected? Is this pain the symptom of a larger problem? When dealing with someone’s health, the answers need to be as accurate as possible.

Last year, Binghamton University researchers tested OpenAI’s ChatGPT, and it showed high accuracy in identifying disease terms, drug names, and genetic information. However, the AI bot also generated a high number of ‘hallucinations’: confidently delivered but fake information. A follow-up study may have found a way to eliminate them.

Ahmed Abdeen Hamed, a research fellow at the Thomas J Watson College of Engineering and Applied Science’s School of Systems Science and Industrial Engineering, collaborated with Luis M Rocha, the George J Klir Professor of Systems Science, to develop a new verification method. The journal STAR Protocols recently published their conclusions.

The new protocol harnesses the growing number of open-source AI options, each of which has a different way to arrive at an answer to an inquiry. Hamed and Rocha chose seven of these large language models and forced them to use retrieval-augmented generation (RAG), which required them to reference an authoritative database of medical terminology before giving a response.

Across 10,000 experiments, the seven chatbots all received the same plain-language symptoms, and each of them came up with what it thought were the medical terms for them, complete with an official identification number. Then the bots put the answers up for a ‘vote.’

The result: 76.85 per cent of the answers were supported by at least four LLMs, and the remaining 23.15 per cent were supported by at least two. No unmatched terms, and no hallucinations.
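
The voting step can be illustrated with a minimal Python sketch. It is not the researchers' code: the function name `tally_votes`, the model names, and the placeholder identifiers are all hypothetical, and the thresholds of four and two are assumptions taken only from the support levels reported above.

```python
from collections import Counter


def tally_votes(answers, majority=4, minimum=2):
    """Count how many models proposed each term identifier.

    `answers` maps a model name to the identifier it returned, or None
    if it gave no answer. The thresholds are illustrative assumptions.
    """
    counts = Counter(a for a in answers.values() if a is not None)
    if not counts:
        return None
    # Take the identifier proposed by the largest number of models.
    term, support = counts.most_common(1)[0]
    if support >= majority:
        level = "majority"
    elif support >= minimum:
        level = "partial"
    else:
        level = "unmatched"
    return {"term": term, "support": support, "level": level}


# Hypothetical answers from seven models for one symptom description.
answers = {
    "model_a": "TERM:0001",
    "model_b": "TERM:0001",
    "model_c": "TERM:0001",
    "model_d": "TERM:0001",
    "model_e": "TERM:0001",
    "model_f": "TERM:0002",
    "model_g": None,
}
result = tally_votes(answers)
print(result)
```

In this sketch, an identifier proposed by only one model would be labelled "unmatched", which is the category the study reports as empty.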

“The new workflow is incredible”, Hamed said, “because it can verify anything from a biomedical point of view: biological knowledge with disease and genetics; translational knowledge from diseases to treatments and clinical trials; and also from a healthcare point of view with symptoms and treatments.”

A big advantage of this new protocol is that it can be reproduced in a near-infinite number of permutations to reinforce its accuracy. “There can be 100 large language models that are open source, and every time we can perform an experiment with seven LLMs selected at random from that list,” Hamed said. “When we perform the experiment many, many times, we increase the confidence in the voting.”
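
The repeated random selection Hamed describes could look roughly like the following Python sketch. Everything in it is an assumption made for illustration: the pool of 100 model names, the stand-in `ask` function (a real setup would query each model), and the placeholder identifiers.

```python
import random
from collections import Counter


def repeated_panel_votes(pool, ask, symptom, panel_size=7, rounds=100, seed=0):
    """Draw a random panel each round and record which term wins its vote."""
    rng = random.Random(seed)
    winners = Counter()
    for _ in range(rounds):
        # Select panel_size distinct models at random from the pool.
        panel = rng.sample(pool, panel_size)
        votes = Counter(ask(model, symptom) for model in panel)
        term, _support = votes.most_common(1)[0]
        winners[term] += 1
    return winners


# Hypothetical pool of 100 open-source models.
pool = [f"model-{i}" for i in range(100)]


def ask(model, symptom):
    """Stand-in for a real model call: three models dissent, the rest agree."""
    if model in {"model-0", "model-1", "model-2"}:
        return "TERM:9999"
    return "TERM:0001"


winners = repeated_panel_votes(pool, ask, "itchy red rash")
print(winners)
```

Because only three models in this toy pool dissent, every panel of seven contains at least four agreeing models, so the same term wins every round. The more rounds that agree, the stronger the case that the answer does not depend on which models happened to be chosen.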

Rocha said the protocol is an important step toward increasing confidence in large, multiscale network models of disease, which is a key topic for his Complex Adaptive Systems and Computational Intelligence Lab at Binghamton.

The lab’s research includes the development of digital twins for precision medicine. These dynamic, virtual replicas of physical processes are continuously updated using AI and real-time data to create precise, predictive simulations of human reactions so that healthcare providers can optimise outcomes before real-world testing.

“For instance, the protocol can extract and provide multi-agent verification of evidence for an adverse drug reaction for a given medication that is available in clinical trials, the scientific literature, pharmacological databases, and even social media discourse,” Rocha said. “And it can assist in the extraction of evidence at multiple scales, from multiomics to epidemiological and behavioural data sources, which we have already started to pilot by building multi-layer models of ER+ breast cancer.”

Although the study centred on biomedical applications, the Binghamton team’s discovery could be used to curb or eliminate other kinds of LLM hallucinations, such as fabricated legal citations, fake academic citations, or blatant historical errors. “This protocol is a big step toward the democratisation of knowledge verification,” Hamed concluded.

DOI: 10.1016/j.xpro.2026.104533
