If you've spent any real time with ChatGPT, Claude, or Gemini, you've probably encountered this: the AI confidently tells you something that's wrong. Not vague, not uncertain — confidently, fluently, authoritatively wrong. A made-up citation. A plausible-sounding but nonexistent law. A historical "fact" that never happened. A software function that doesn't exist but is described with impeccable syntax.

This is called a hallucination, and it's one of the most important things to understand about how these systems actually work. It's not a bug that will get patched. It's a consequence of what language models fundamentally are.

Why It Happens: The Model Is Always Predicting

A language model doesn't store facts the way a database does. It doesn't have a list of true things it retrieves on demand. What it has is a massively compressed statistical representation of patterns in text — learned from reading vast amounts of human writing. When you ask it a question, it's not looking up an answer. It's predicting the most plausible continuation of your prompt, given everything it learned during training.

Think about what that means. The model is constantly answering the question: "Given what has been written, what text is most likely to follow this?" For most everyday language tasks, that works brilliantly. The most likely continuation of "Please explain photosynthesis" is, in fact, a good explanation of photosynthesis — because lots of accurate explanations exist in the training data.

But for specific factual claims — exact dates, precise statistics, specific citations, details of obscure events — "most likely continuation" and "true continuation" start to diverge. The model learns the shape of a plausible answer and fills in the specifics with high-probability tokens, which may or may not correspond to reality.
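The mechanics can be sketched in a few lines. The logits below are invented for illustration (a real model scores tens of thousands of vocabulary tokens, not four), but the pipeline — raw scores, softmax into probabilities, pick a continuation — is the core of next-token prediction:

```python
import math

# Hypothetical raw scores (logits) a model might assign to candidate
# next tokens after the prompt "The capital of Australia is".
# These numbers are made up for illustration.
logits = {"Canberra": 4.1, "Sydney": 3.8, "Melbourne": 1.2, "a": 0.3}

# Softmax turns raw scores into a probability distribution that sums to 1.
total = sum(math.exp(v) for v in logits.values())
probs = {tok: math.exp(v) / total for tok, v in logits.items()}

# Greedy decoding simply emits the highest-probability token.
best = max(probs, key=probs.get)
print(best)  # Canberra
```

Note that nothing in this loop consults a store of facts: the token comes out on top because of its score, and the score comes from statistical patterns in training text.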

The Probability Problem, Illustrated

The visualization below shows what's happening at the token level. The model assigns a probability to every possible next word. Most of the time, the high-probability answer is correct. But the second-highest probability token can be very close to the first — and it might be wrong. The model doesn't "know" which one is true; it just knows which seems more likely based on patterns.

[Figure: token probability distribution when completing "The capital of Australia is ___." Canberra is most likely, but Sydney is a close second and sometimes gets selected. The model has no internal alarm that fires when it's wrong.]

The Australia example is instructive because many people don't know that the capital is Canberra, not Sydney. The model learned from human writing, which includes plenty of text incorrectly claiming Sydney is the capital, or simply discussing Sydney as the most prominent Australian city. The uncertainty in the model reflects real uncertainty (and real errors) in its training data.
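A small simulation makes the near-tie concrete. The probabilities are invented, but the effect is real: when a model samples from the distribution (as it effectively does at nonzero temperature) rather than always taking the top token, the wrong answer comes out a meaningful fraction of the time.

```python
import random

random.seed(0)  # fixed seed so the illustration is reproducible

# Invented next-token probabilities reflecting a near-tie.
tokens = ["Canberra", "Sydney", "Melbourne"]
probs = [0.52, 0.41, 0.07]

# Draw the next token 10,000 times, the way sampling-based decoding
# draws it once per response.
draws = random.choices(tokens, weights=probs, k=10_000)
wrong = sum(tok != "Canberra" for tok in draws)
print(f"wrong answer in {wrong / 100:.1f}% of samples")
```

Every one of those wrong samples would be delivered with the same fluent confidence as the right ones.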

Why the Model Sounds So Confident

This is the part that makes hallucinations genuinely dangerous: the model doesn't have an internal sense of "I'm not sure about this." It has no epistemic state that corresponds to human uncertainty. The same fluency and confidence that produces a correct answer also produces a wrong one. The model that says "The capital of Australia is Canberra" and the model that says "The capital of Australia is Sydney" are both doing exactly the same thing — producing the next token they believe is most likely. One happens to be right.

This is categorically different from a human saying something wrong. When a person states something confidently and incorrectly, they've usually made an active judgment that they know the answer. Language models aren't making that judgment. They're generating plausible text. Confidence is a property of the text, not of the model's internal state.

Modern models have been trained to express uncertainty more often — to say "I think" or "I'm not certain, but" — and that training helps. But it's a surface behavior, not a fundamental fix. The model can still hallucinate while saying "I believe."

Specific Patterns to Watch For

Not all hallucinations are equally likely. The terrain is reasonably predictable:

  • Citations and sources — This is the most reliable hallucination category. Ask a language model to cite sources, and it will often produce plausible-sounding but nonexistent papers, books, or articles. The author names might be real. The journal might be real. The paper doesn't exist. Always verify citations independently.
  • Recent events — Models have training cutoffs. Anything after the cutoff is unknown territory, and the model will try to fill that gap with pattern-matching rather than admitting ignorance. Be especially skeptical of claims about recent news, recent research, or current statistics.
  • Specific numbers — Statistics, percentages, counts, prices, dates. The model learns that answers to questions about statistics involve numbers, so it will provide a number. That number may be close to reality, or it may not. Don't trust exact figures without verification.
  • Code for obscure libraries — Language models are excellent at coding in well-known languages and frameworks. For obscure libraries, niche APIs, or anything less represented in the training data, they'll write syntactically plausible code that calls functions that don't exist.
  • Details about specific people — Biographical information about anyone not extremely famous is unreliable. The model has a sense of the "shape" of a biography and will fill in details it doesn't know.
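For the code category in particular, one cheap defense is to check a suggested call against the real library before running anything. A minimal sketch using only the standard library — the `json` lookups at the bottom are stand-ins for whatever module and function an AI suggested:

```python
import importlib


def call_exists(module_name: str, func_name: str) -> bool:
    """Return True only if module_name really exposes a callable func_name."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return callable(getattr(module, func_name, None))


# A real function a model might suggest...
print(call_exists("json", "dumps"))      # True
# ...and a plausible-sounding one that does not exist.
print(call_exists("json", "serialize"))  # False
```

This catches invented function names; it won't catch invented parameters or misdescribed behavior, so the documentation remains the final word.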

How to Reduce Hallucinations in Practice

Practical techniques that help:
  • Ask it to say when it doesn't know. Explicitly instruct the model: "If you're not certain, say so. I'd rather have 'I don't know' than a guess." Modern models respond to this reasonably well.
  • Give it the facts, ask it to reason. Instead of "What were the Q3 results for Company X?" paste in the actual report and ask the model to analyze it. When you provide the source material, the model is far less likely to invent data — its job shifts from recalling facts to interpreting what you've given it.
  • Ask for confidence with citations. "Provide a confidence level for each claim, and cite the specific source you're drawing from." This encourages the model to surface uncertainty and flag unsupported claims, though the stated confidence levels are themselves generated text, not calibrated measurements.
  • Use retrieval-augmented tools for facts. Perplexity, ChatGPT with web search, and similar tools pull actual sources before generating a response. For factual lookups, these are significantly more reliable than a raw model.
  • Verify anything that matters. Treat AI outputs on factual questions as a starting point, not a final answer. For anything where being wrong has real consequences — legal, medical, financial, technical — always check against authoritative sources.
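The "give it the facts" technique amounts to building a prompt that carries the source material and restricts the model to it. Here is one way to sketch that; the exact wording is a reasonable choice, not a canonical formula:

```python
def grounded_prompt(question: str, source_text: str) -> str:
    """Wrap a question with the source material it must be answered from."""
    return (
        "Answer the question using ONLY the source material below. "
        'If the source does not contain the answer, say "I don\'t know" '
        "rather than guessing.\n\n"
        f"--- SOURCE ---\n{source_text}\n--- END SOURCE ---\n\n"
        f"Question: {question}"
    )


report = "Q3 revenue was $4.2M, up 8% year over year."
prompt = grounded_prompt("What was Q3 revenue growth?", report)
print(prompt)
```

The two moves that matter are embedding the source verbatim and giving the model explicit permission to say "I don't know" — together they remove most of the pressure to fill gaps with plausible invention.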

Will This Get Fixed?

Partially, but not completely. The hallucination rate of frontier models has dropped significantly over the past few years; GPT-4o and Claude 3.7 hallucinate considerably less than their predecessors. Retrieval-augmented generation (RAG), where the model is given access to real-time information rather than relying solely on its training data, has substantially reduced factual errors for many use cases.

But the fundamental architecture — predict the next token — doesn't have a natural mechanism for distinguishing "I know this" from "this sounds like what I should say here." Solving hallucinations completely would require either a fundamentally different architecture or a deep integration of external knowledge lookup into every factual claim the model makes. Significant progress is likely. Complete elimination is not.

In the meantime, the most reliable mental model for using AI well is this: trust it for reasoning, writing, and analysis. Verify it for specific facts. The combination of AI's strengths with your verification habit is more capable than either alone.