A recent investigation by The New York Times has found that although Google’s AI Overviews are accurate most of the time, the sheer volume of global search traffic means the feature likely produces millions of errors every day. The analysis, which used the SimpleQA evaluation benchmark, found that roughly one in ten AI-generated summaries contained false information. Because Google processes nearly five trillion queries a year, an error rate of that size could expose users to more than 57 million inaccurate responses every hour if every query returned an AI Overview.
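The hourly figure comes from simple arithmetic. The sketch below reproduces it under stated assumptions: every query triggers an AI Overview, which makes the result an upper bound; “nearly five trillion” is rounded up to exactly 5 trillion; the error rate is a flat 10%; and the year has 365 days. The constant and function names are illustrative only.

```python
# Back-of-envelope estimate of hourly AI Overview errors.
# Assumptions (simplifications, not from the study): every query returns an
# AI Overview, the error rate is a flat 10%, and a year has 365 days.

QUERIES_PER_YEAR = 5e12      # "nearly five trillion" queries, rounded up
ERROR_RATE = 0.10            # roughly one in ten summaries contains false information
HOURS_PER_YEAR = 365 * 24    # 8,760 hours in a non-leap year


def errors_per_hour(queries_per_year: float, error_rate: float) -> float:
    """Return the expected number of erroneous answers per hour."""
    queries_per_hour = queries_per_year / HOURS_PER_YEAR  # about 571 million
    return queries_per_hour * error_rate


estimate = errors_per_hour(QUERIES_PER_YEAR, ERROR_RATE)
print(f"{estimate / 1e6:.1f} million erroneous answers per hour")  # prints "57.1 million ..."
```

Substituting a 9% error rate, matching the 91% accuracy reported after the Gemini 3 update, gives roughly 51 million per hour. That is still tens of millions, so the conclusion does not depend much on which accuracy figure is used.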
Google has disputed the findings. “This study has serious holes,” said spokesperson Ned Adriance, and the company argued that the analysis relied on a flawed benchmark containing inaccuracies of its own. Even so, the system has improved over time: accuracy rose from 85% with Gemini 2.5 to 91% after the Gemini 3 update.
The report follows earlier public criticism, including an incident after an Air India crash in which AI Overviews misidentified the aircraft model. In response to such “hallucinations,” Google has said that it rigorously updates its systems, maintaining that “the accuracy rate for AI Overviews is on par with other features like Featured Snippets.” Still, Google’s shift from curating information to publishing it continues to bring intense scrutiny to the search giant’s factual reliability.

