While heralded by some as revolutionary new technology, AI-driven tools and services have faced more than their fair share of teething problems during their rollouts.
Generative AI in particular has drawn scorn from users because of how frequently it serves up false information through “hallucinations”.
Generative AI?
Generative artificial intelligence is a type of AI that learns patterns from a large ‘training set’ of data and then produces images, text or other outputs that follow similar patterns.
Large tech companies are investing billions of dollars in developing generative AI systems such as ChatGPT, Stable Diffusion and Sora.
Seeing Things
Large language models (LLMs), which can generate text, are prone to a phenomenon, dubbed “hallucination”, in which they include inaccuracies in the text they output.
Hallucinations are said to occur when the training data is biased or incomplete, and a 2023 analysis found factual errors in 46% of LLM responses.
Glue Pizza
A recent, widely publicised example of a chatbot hallucination came when Google’s AI-powered search overview recommended using glue on a pizza base.
The recommendation, which may have come from a joke posted on Reddit, was in response to a question asking how to prevent the toppings from sliding off pizza slices.
Google Working on a Fix
Google has publicly stated that it is still working on improvements to its AI overviews.
The head of Google Search, Liz Reid, said the company is making improvements designed to reduce the number of incorrect answers, such as better mechanisms for detecting factual errors.
Sophisticated Autocorrect?
Google’s AI overview uses a customized version of Gemini, an LLM that was developed in-house by Google.
At the stage of generating text, LLMs predict the next most likely word, or token, in the sequence, similar to how a phone keyboard’s autocorrect may suggest the next word while typing.
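As a minimal sketch of that “autocorrect” idea, the toy Python snippet below simply picks whichever continuation has been seen most often after a given phrase. The phrase counts are invented for illustration; a real LLM learns probabilities over tens of thousands of tokens from its training data rather than using a tiny lookup table.

    # Made-up phrase counts standing in for patterns learned from training data.
    continuation_counts = {
        ("peanut", "butter", "and"): {"jelly": 950, "jam": 40, "chaos": 10},
        ("the", "cat", "sat"): {"on": 900, "under": 70, "beside": 30},
    }

    def predict_next_word(context):
        """Return the statistically most common next word for a known context."""
        counts = continuation_counts.get(tuple(context), {})
        return max(counts, key=counts.get) if counts else None

    print(predict_next_word(["peanut", "butter", "and"]))  # prints: jelly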
No Core Truth
The LLM method of picking the most likely next word in the sequence leaves it with a fatal weakness.
There is no deep truth that the language models are calling on, simply a cold, statistical calculation of language frequency.
Partial Workaround
This weakness can be mitigated by directing the LLM to check specific sources for information before generating a response to a query.
Although Google has never confirmed this directly, this method is likely what the modified Gemini model uses to attempt to maintain the accuracy of its responses.
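Google has not published the details, but the general technique, often called retrieval-augmented generation, can be sketched roughly as below. The search_index and llm_generate functions are hypothetical stand-ins for a real search backend and language model; the point is the two-step shape of retrieving sources first and then generating an answer from them.

    def answer_with_sources(question, search_index, llm_generate, top_k=3):
        """Fetch relevant passages first, then generate an answer from them.

        search_index and llm_generate are hypothetical stand-ins for a real
        search backend and language model.
        """
        passages = search_index(question)[:top_k]   # step 1: retrieval
        context = "\n".join(passages)
        prompt = (
            "Answer the question using only the sources below.\n"
            "Sources:\n" + context + "\n\n"
            "Question: " + question + "\nAnswer:"
        )
        return llm_generate(prompt)                 # step 2: generation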
Two Elements Coming Together
In generating a response, an LLM must combine two distinct tasks: retrieving relevant information and generating the text of the response itself.
One of the key difficulties with creating an accurate final output is getting both of these right and putting them together.
Generation Part Does Not Question Retrieval Part
In the case of Gemini’s ‘glue pizza’ mistake, the text generation worked well, but the information it was based on was a joke, and it had no way of separating what was relevant (to a search) from what was true.
A University of Washington professor who specializes in online search told MIT Technology Review: “Just because it’s relevant doesn’t mean it’s right, and the generation part of the process doesn’t question that.”
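To see how something can be relevant without being right, consider a crude relevance score based on word overlap; the two “documents” below are invented for illustration. A joke that echoes the searcher’s own words can outrank a sensible answer, and nothing in the scoring step asks whether the advice is true.

    def relevance(query, document):
        """Crude relevance: how many of the query's words the document repeats."""
        query_words = set(query.lower().split())
        return len(query_words & set(document.lower().split()))

    query = "how to stop cheese sliding off pizza"
    documents = [
        "joke post: add glue to the sauce to stop cheese sliding off pizza",
        "let the pizza rest so the cheese sets before slicing",
    ]
    # The joke wins on relevance because it echoes the query word for word;
    # nothing in this step checks whether the advice is actually true or safe.
    print(max(documents, key=lambda d: relevance(query, d)))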