While heralded by some as revolutionary new technology, AI-driven tools and services have faced more than their fair share of teething problems during their rollouts.
Generative AI in particular has drawn scorn from users because of how frequently it serves up false information through “hallucinations”.
Generative AI?
Generative artificial intelligence is a type of AI that learns patterns from a large ‘training set’ of data and then produces images, text or other outputs that follow similar patterns.
Large tech companies are investing billions of dollars in developing generative AI systems such as ChatGPT, Stable Diffusion and Sora.
Seeing Things
Large language models (LLMs), which can generate text, are prone to a phenomenon, dubbed “hallucination”, in which they include inaccuracies in the text they output.
Hallucinations are said to occur when the training data is biased or incomplete, and a 2023 analysis found factual errors in 46% of LLM responses.
Glue Pizza
A recent, widely publicised example of a chatbot hallucination came when Google’s AI-powered search overview recommended using glue on a pizza base.
The recommendation, which may have come from a joke posted on Reddit, was in response to a question asking how to prevent the toppings from sliding off pizza slices.
Google Working on a Fix
Google has publicly stated that it is still working on improvements to its AI overviews.
The head of Google Search, Liz Reid, said the company is making improvements designed to reduce the number of incorrect answers, such as better mechanisms for detecting factual errors.
Sophisticated Autocorrect?
Google’s AI overview uses a customized version of Gemini, an LLM that was developed in-house by Google.
At the stage of generating text, LLMs predict the next most likely word, or token, in the sequence, similar to how a phone keyboard’s autocorrect may suggest the next word while typing.
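As a minimal sketch of that “autocorrect” idea, the toy Python snippet below simply picks whichever continuation has been seen most often after a given phrase. The phrase counts are invented for illustration; a real LLM learns probabilities over tens of thousands of tokens from its training data rather than using a tiny lookup table.

    # Made-up phrase counts standing in for patterns learned from training data.
    continuation_counts = {
        ("peanut", "butter", "and"): {"jelly": 950, "jam": 40, "chaos": 10},
        ("the", "cat", "sat"): {"on": 900, "under": 70, "beside": 30},
    }

    def predict_next_word(context):
        """Return the statistically most common next word for a known context."""
        counts = continuation_counts.get(tuple(context), {})
        return max(counts, key=counts.get) if counts else None

    print(predict_next_word(["peanut", "butter", "and"]))  # prints: jelly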
No Core Truth
The LLM method of picking the most likely next word in the sequence leaves it with a fatal weakness.
There is no deep truth that the language models are calling on, simply a cold, statistical calculation of language frequency.
Partial Workaround
This weakness can be mitigated by directing the LLM to check specific sources for information before generating a response to a query.
Although Google has never confirmed this directly, this method is likely what the modified Gemini model uses to attempt to maintain the accuracy of its responses.
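Google has not published the details, but the general technique, often called retrieval-augmented generation, can be sketched roughly as below. The search_index and llm_generate functions are hypothetical stand-ins for a real search backend and language model; the point is the two-step shape of retrieving sources first and then generating an answer from them.

    def answer_with_sources(question, search_index, llm_generate, top_k=3):
        """Fetch relevant passages first, then generate an answer from them.

        search_index and llm_generate are hypothetical stand-ins for a real
        search backend and language model.
        """
        passages = search_index(question)[:top_k]   # step 1: retrieval
        context = "\n".join(passages)
        prompt = (
            "Answer the question using only the sources below.\n"
            "Sources:\n" + context + "\n\n"
            "Question: " + question + "\nAnswer:"
        )
        return llm_generate(prompt)                 # step 2: generation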
Two Elements Coming Together
In generating a response, an LLM must combine two distinct tasks: retrieving relevant information and generating the text of the response itself.
One of the key difficulties with creating an accurate final output is getting both of these right and putting them together.
Generation Part Does Not Question Retrieval Part
In the case of Gemini’s ‘glue pizza’ mistake, the text generation worked well, but the information it was based on was a joke, and it had no way of separating what was relevant (to a search) from what was true.
A University of Washington professor who specializes in online search told MIT Technology Review: “Just because it’s relevant doesn’t mean it’s right, and the generation part of the process doesn’t question that.”
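To see how something can be relevant without being right, consider a crude relevance score based on word overlap; the two “documents” below are invented for illustration. A joke that echoes the searcher’s own words can outrank a sensible answer, and nothing in the scoring step asks whether the advice is true.

    def relevance(query, document):
        """Crude relevance: how many of the query's words the document repeats."""
        query_words = set(query.lower().split())
        return len(query_words & set(document.lower().split()))

    query = "how to stop cheese sliding off pizza"
    documents = [
        "joke post: add glue to the sauce to stop cheese sliding off pizza",
        "let the pizza rest so the cheese sets before slicing",
    ]
    # The joke wins on relevance because it echoes the query word for word;
    # nothing in this step checks whether the advice is actually true or safe.
    print(max(documents, key=lambda d: relevance(query, d)))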