Been a while.
This is my AI/LLM ChatGPT WOW AGI IS KNOCKING ON OUR FRONT DOOR post. ((((Maybe it will get me clicks???))))
I felt like writing down my description of what a hallucination is in an LLM, as the way I’ve started thinking about it is different to what I hear most places, and I think it conveys something interesting about programming, engineering and problem solving in general.
So what do people usually define as a hallucination? Here’s the definition given by ChatGPT in response to the prompt “Please explain what it means for a language model to hallucinate.”
When we say that a language model “hallucinates”, we are referring to a situation where the model generates content that is not grounded in reality or is not consistent with the input it has received. Hallucination in the context of language models, like GPT-3, typically means generating text that is imaginative, fictional, or entirely made up rather than being based on the information it has been trained on.
Language models like GPT-3 are trained on diverse datasets containing a wide range of text from the internet. While they are excellent at generating coherent and contextually relevant text, they may occasionally produce outputs that seem plausible but are not accurate or factual. This can happen because the model lacks real-world knowledge and doesn’t have an understanding of facts in the way humans do. Instead, it relies on statistical patterns learned during training.
Hallucination can also occur when a model is prompted with ambiguous or incomplete input, leading it to fill in the gaps with speculative or incorrect information. Additionally, if the model encounters rare or out-of-distribution queries, it might generate responses that sound reasonable but are not based on actual knowledge.
It’s important to note that hallucination is unintentional and is a limitation of current language models. Researchers are actively working on improving the accuracy and reliability of these models to reduce instances of hallucination and enhance their real-world applicability.
The other relevant piece of information is, what is a language model such as ChatGPT doing? And more specifically, what was it trained to do?
Language models of any kind, not necessarily transformer-based ones, have traditionally been trained to predict the next token given a sequence of previous tokens. This token may be a single character, it may be 3 characters, it may be a match of the regular expression [a-zA-Z]+, it may be a 256x256 grid of pixels. In the case of the GPTs, the token is a small chunk of characters (a subword produced by byte-pair encoding), though I’ll keep loosely saying “character” below. This means that when the model is generating text, it is sampling from a probability distribution it has learned over the next most likely character given the previous ones.
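To make that concrete, here’s a minimal sketch (my own illustration, not any actual GPT code) of what that sampling loop looks like, assuming a `model` that maps a sequence of token ids to a probability distribution over the next one:

```python
import torch

def generate(model, tokens, num_new_tokens):
    # `model` is assumed to map a 1-D tensor of token ids to a probability
    # distribution over the next token id.
    for _ in range(num_new_tokens):
        probs = model(tokens)                                 # shape: (vocab_size,)
        next_token = torch.multinomial(probs, num_samples=1)  # sample one id from the distribution
        tokens = torch.cat([tokens, next_token])              # append it and go again
    return tokens
```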
Unsurprisingly, it was trained to predict these next characters accurately by feeding strings into a big math function that spits out a prediction for the next character, checking that prediction against what actually came next in the full sequence (which is part of the dataset it was trained on), and then nudging the values of the function’s parameters so that the accuracy increases; in particular, the average accuracy over randomly shuffled chunks of the data, measured via cross entropy.
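As a rough sketch of that training loop (again my own illustration, assuming a PyTorch-style `model` that outputs logits over the vocabulary):

```python
import torch
import torch.nn.functional as F

def training_step(model, optimizer, batch):
    # batch: (B, T+1) token ids taken from a shuffled chunk of the dataset
    inputs, targets = batch[:, :-1], batch[:, 1:]   # predict token t+1 from tokens <= t
    logits = model(inputs)                          # (B, T, vocab_size)
    # Cross entropy between the predicted distribution and the token that
    # actually came next, averaged over the batch: the quantity being optimized.
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()        # gradients of the loss w.r.t. the parameters
    optimizer.step()       # nudge the parameters so the loss decreases
    return loss.item()
```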
Just like all other programs, the computer program that calculates the result of this big math function is deterministic, i.e. same answer for the same input every time. No magic is going on, just very simple operations on very large quantities of numbers (in fact the set of distinct operations involved can be effectively reduced to just five).
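You can convince yourself of the determinism with any small stand-in for that big math function: same parameters, same input, same output, every time.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(16, 16)   # toy stand-in for the "big math function"
x = torch.randn(1, 16)

out1 = model(x)
out2 = model(x)
assert torch.equal(out1, out2)    # same input, same parameters -> identical output
```

The apparent randomness of a chat model's replies comes from the sampling step applied to the output distribution afterwards, not from the function itself.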
Right, summary done.
This idea may seem ridiculous to you, that predicting the next most likely character in a sequence could possibly do what ChatGPT can. ChatGPT answers questions about the real world right? It can understand what I’m saying right? Sure it can, but what about those hallucinations? Maybe floating point error? Maybe a neutrino slammed into one of the transistors in the GPU in one of Sam Altman’s server racks and flipped a bit triggering a butterfly effect that makes it tell you that there’s a finite number of primes or something.
Okay okay, sorry, those are all straw men. Usually what’s pointed to is the fact that the transformer-based networks used to make LLMs are absolutely enormous: 140GB for a 70B LLaMa. And that with this enormous size we have created an inscrutable computer program whose inner workings are a mystery to us. While the matrices that make up a LLaMa 70B are definitely inscrutable, we definitely know they are matrices, we definitely know what operations are done to them in the PyTorch program that runs the model, and, more importantly, we definitely know how they were trained and what goal was being optimized for.
As was explained in the summary, they are doing next character prediction given a context of some huge number of previous characters. Why?
- You have to do very little cleanup, meaning you have a huge dataset to work with that you basically only need to download from the web.
- Cross entropy between two probability distributions is a very simple loss function, meaning the goal of the network is a very simple one: easy to test performance on, easy to optimize for (a minimal example is sketched below).
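For reference, the cross entropy between a target distribution p and a predicted distribution q is just -Σ p(x) log q(x), and when the target is “the one token that actually came next” it collapses to the negative log probability the model assigned to that token. A toy check:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0, 0.1, 0.3]])   # model's raw scores over a 5-token vocab
target = torch.tensor([0])                             # the token that actually came next

loss = F.cross_entropy(logits, target)                 # built-in cross entropy
manual = -F.log_softmax(logits, dim=-1)[0, 0]          # -log q(correct token), by hand
assert torch.allclose(loss, manual)
```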
What we actually want these language models to do is learn language. Learn to think and express their thoughts through language. Since no one knows how to jump straight up to that level of abstraction, they try to find a simpler goal that hopefully approximates the more abstract one. And it turns out that with 70B float16 parameters and a 32K-token context window, cross entropy loss on a distribution for predicting the next character does surprisingly well.
As ChatGPT kindly told us:
While [language models] are excellent at generating coherent and contextually relevant text, they may occasionally produce outputs that seem plausible but are not accurate or factual. This can happen because the model lacks real-world knowledge and doesn’t have an understanding of facts in the way humans do. Instead, it relies on statistical patterns learned during training.
So the hallucination happens because it doesn’t actually know what it’s talking about; it doesn’t actually know anything apart from the probability distribution it encodes. I think there’s a deeper level to this.
You can separate bugs into two classes: actual errors in the algorithm that produce crashes, infinite loops, or other loosely defined bad behaviour; and mismatches between the design of the program and the intended use or expectations of what the program is capable of. The second class can be thought of almost as bugs in the programmer, not in the program. So basically a flowery way of saying “you done goofed”.
I have a practical example. Around April I was writing the type checker for a compiler (and still am, slowly XD). The language has a compile time execution feature, which allows you to, among other things, have a function that returns a type at compile time that you can then bind to a variable name. So I’m there, testing my type checker, and I get something along the lines of:
error: attempt to assign thing of type 'int' to variable of type 'Type'
line 28, col 0 >>> i: make_int_type() = 0;
In the language semantics, this is supposed to run make_int_type() and bind the type int to i. The problem is I haven’t written the compile time execution engine yet. I’m an idiot.
After thinking it through I realized that the function couldn’t return the proper thing because nothing had been written for running functions during compilation. I’d forgotten about that because I was focusing on how I was interpreting the language code as I read it, not remembering that there is a gap between what I can understand and what my compiler at the time (and currently) could understand.
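To illustrate the gap (in a hypothetical, heavily simplified Python sketch with made-up names, nothing like the real checker), the situation was effectively this:

```python
# Hypothetical sketch only; the real type checker and its names are different.
def infer_type(expr):
    # A call like `make_int_type()` *should* be evaluated at compile time to yield `int`,
    # but with no compile-time execution engine, all the checker can say is that a call
    # to a type-returning function has type `Type`.
    if expr.endswith("()"):
        return "Type"
    return "int"          # e.g. the literal 0

def check_binding(annotation_expr, value_expr):
    annotated_type = infer_type(annotation_expr)   # -> 'Type', not the intended 'int'
    value_type = infer_type(value_expr)            # -> 'int'
    if annotated_type != value_type:
        raise TypeError(f"attempt to assign thing of type '{value_type}' "
                        f"to variable of type '{annotated_type}'")

check_binding("make_int_type()", "0")   # raises the error from above
```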
This story is just an example of me being stupid, but there’s a parallel. When we read the natural human language the LLM generates, we often forget that it is being output by something that only knows a sophisticated statistical relationship between characters, something that doesn’t know any facts about the real world and was never specifically designed to understand them.
The hallucination is in reality a mismatch between our expectation of the capabilities of a transformer trained to minimize cross entropy loss and its actual capabilities (and probably also of the quality of the training process, the data, etc.). Furthermore, there actually is a hallucination happening, but it isn’t the model hallucinating fake stuff: it’s the user hallucinating their experience of the model thinking.
My guess is that the performance-versus-size curve of LLMs that predict the next character using the designs currently in use, even in GPT-4, is logarithmic. Naive scaling of the number of parameters of language models eventually hits diminishing returns. In order to more effectively reach the goal of having a machine that can reason its way to a satisfying explanation, you probably need to incorporate the sort of reinforcement learning techniques used in MuZero, maybe even using transformers to predict p and v. If Q* is a real thing, maybe it’s something like that. Add another network that learns to page data in and out of the language model’s context. Retrieval (file/database) systems. All these parts assembled into an agent that has learned a model of the features of its environment.
Increasing the model size is like the handcoded planners in autonomous driving systems: a simple trick that gets you a decent amount very quickly, but the gains aren’t sustained long term. You need more actual new ideas. What’s the right loss function? To what structures and data do you apply it? What is the right lower-level goal to set for your model that approximates the high-level goal you wish it to achieve? Just like spam programming doesn’t work long-term, neither does inflating a blob of weights.