Even premium AI tools distort the news and fabricate links – these are the worst

AI tools and news just don’t seem to mix — even at the premium tier. 

New research from Columbia’s Tow Center for Digital Journalism found that several AI chatbots often misidentify news articles, present incorrect information without any qualification, and fabricate links to news articles that don’t exist. The findings build on initial research Tow published in November, which showed ChatGPT Search misrepresenting content from publishers with little to no awareness it might be wrong. 

The trend isn’t new. Last month, the BBC found that the ChatGPT, Gemini, Copilot, and Perplexity chatbots struggled to summarize news stories accurately, instead delivering “significant inaccuracies” and “distortions.” 

Moreover, the Tow report found new evidence that many AI chatbots can access content from sites that block their crawlers. Here’s what to know and which models proved the least reliable.
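A note on what “block” means here: publishers typically disallow AI crawlers through a robots.txt file, which is an honor-system protocol rather than an enforcement mechanism. The minimal Python sketch below (standard library only; the domain and rules are placeholders) shows how a compliant crawler would honor such a rule, and why a non-compliant one can simply ignore it.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt rule set like those many news sites publish to block
# AI crawlers (GPTBot is OpenAI's crawler; the site is a placeholder).
robots_txt = """\
User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A compliant crawler calls can_fetch() before requesting a page.
print(parser.can_fetch("GPTBot", "https://example-news.com/story"))   # False
print(parser.can_fetch("OtherBot", "https://example-news.com/story")) # True
```

Because robots.txt is advisory, a crawler that skips this check can still retrieve the page, which is one way the behavior Tow observed is possible.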

Tow researchers randomly chose 10 articles each from 20 publishers. They queried eight chatbots with article excerpts, asking the AI to return the headline, publisher, date, and URL of the corresponding article. 

“We deliberately chose excerpts that, if pasted into a traditional Google search, returned the original source within the first three results,” the researchers note.
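In rough pseudocode, the protocol looks like the sketch below. The prompt wording and the Chatbot wrapper are hypothetical stand-ins for whatever interfaces the researchers actually used; only the structure of the task (excerpts in, four attribution fields out, 20 publishers x 10 articles x 8 chatbots = 1,600 queries) comes from the report.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for a chatbot interface; the real study used
# each tool's own search/chat product, not a uniform API.
@dataclass
class Chatbot:
    name: str
    query: Callable[[str], str]

# Assumed prompt wording, for illustration only.
PROMPT = (
    "Identify the news article this excerpt comes from. "
    "Return its headline, publisher, publication date, and URL.\n\n"
    "Excerpt: {excerpt}"
)

def run_study(excerpts_by_publisher: dict[str, list[str]],
              chatbots: list[Chatbot]) -> list[tuple]:
    """20 publishers x 10 excerpts x 8 chatbots = 1,600 queries."""
    responses = []
    for publisher, excerpts in excerpts_by_publisher.items():
        for excerpt in excerpts:
            for bot in chatbots:
                answer = bot.query(PROMPT.format(excerpt=excerpt))
                responses.append((publisher, excerpt, bot.name, answer))
    return responses

# Tiny usage example with a dummy bot that always gives the same answer.
dummy = Chatbot("dummy", lambda prompt: "headline / publisher / date / URL")
print(run_study({"Example Publisher": ["Some excerpt..."]}, [dummy])[0])
```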

[Image: Columbia Journalism Review]

After running the 1,600 queries, researchers ranked each chatbot’s responses by how accurately they retrieved the article, publisher, and URL. Collectively, the chatbots answered more than 60% of the queries incorrectly, though results varied widely by tool: Perplexity got 37% of its queries wrong, while Grok 3 answered 94% incorrectly.
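To make that grading concrete, here is a deliberately simplified scorer: it compares the three attribution fields against ground truth and buckets the result. The real study used a finer-grained rubric (the researchers also tracked when a chatbot declined to answer), so treat this purely as an illustration.

```python
# Simplified illustration of grading one response against ground truth.
def grade(response: dict, truth: dict) -> str:
    fields = ("headline", "publisher", "url")
    matches = sum(response.get(f) == truth[f] for f in fields)
    if matches == len(fields):
        return "correct"
    return "partially correct" if matches > 0 else "incorrect"

# Example: right publisher, but a fabricated URL and wrong headline.
truth = {"headline": "Example Headline", "publisher": "Example News",
         "url": "https://example-news.com/story"}
response = {"headline": "Other Headline", "publisher": "Example News",
            "url": "https://example-news.com/made-up-path"}
print(grade(response, truth))  # partially correct
```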

[Image: Columbia Journalism Review]

Why does this matter? If chatbots are worse than Google at correctly retrieving news, they can’t necessarily be relied upon to interpret and cite that news — which makes the content of their responses, even when linked, much more dubious. 

Researchers note the chatbots returned wrong answers with “alarming confidence,” rarely qualifying their results or admitting to knowledge gaps. ChatGPT “never declined to provide an answer,” despite 134 of its 200 responses (67%) being incorrect. Of all eight tools, Copilot declined to answer more queries than it responded to. 

Source: https://www.zdnet.com/