ChatGPT. Claude. Gemini. Llama. You've seen the names everywhere. They're all described as "large language models" — but what does that actually mean? What's a language model, what makes it large, and why does it seem to understand and write like a person? Let's answer those questions in plain English.

Start Here: What Is a Model?

In machine learning, a "model" is a program that has learned to do something by looking at examples, rather than being explicitly programmed with rules. You don't tell it: "if the sentence contains 'unfortunately,' it's probably bad news." Instead, you show it millions of examples of bad-news sentences, and it figures out the patterns itself.

A language model specifically has learned patterns in text. Not grammar rules, not a dictionary, but deep statistical patterns about how words and ideas relate to each other across billions of documents. What word tends to come after "the cat sat on the"? What concepts tend to appear together? What makes a sentence feel complete?
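You can see the statistical flavor of this in miniature. The sketch below counts which word follows which in a tiny made-up corpus — a toy "bigram" model, nothing like a real LLM's learned patterns, but the same spirit: the answer to "what tends to come after 'the'?" falls out of counting, not rules.

```python
from collections import Counter, defaultdict

# A toy "language model": count which word follows which in a tiny corpus.
# Real LLMs learn far richer patterns, but the spirit is the same.
corpus = (
    "the cat sat on the mat . "
    "the dog sat on the rug . "
    "the cat chased the dog ."
).split()

follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

# What tends to come after "the"?
print(follows["the"].most_common(3))
```

Nobody wrote a rule saying "cat" often follows "the" — the pattern emerged from the data, which is the whole point.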

And large refers to scale — both the amount of data used for training (hundreds of billions of words) and the number of parameters in the model (billions to hundreds of billions of adjustable internal values, roughly analogous to the "settings" the model has learned).
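To get a feel for what "billions of parameters" means physically, here's a back-of-envelope calculation. It assumes 2 bytes per parameter (16-bit precision, a common choice) and a GPT-3-sized model; the exact numbers vary by model and precision.

```python
# Back-of-envelope: memory needed just to store a model's parameters.
# Assumes 2 bytes per parameter (16-bit precision, a common choice).
params = 175e9           # a GPT-3-sized model
bytes_per_param = 2
gigabytes = params * bytes_per_param / 1e9
print(f"{gigabytes:.0f} GB just to hold the weights")  # 350 GB
```

That's hundreds of gigabytes before the model has processed a single word — far more than fits on a typical consumer graphics card, which is why frontier models run in data centers.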

The Core Task: Predicting the Next Word

At its heart, a large language model does one thing: given a sequence of text, predict what comes next. That's it. It assigns probabilities to every possible next token (a token is roughly a word or part of a word), and picks from the likely ones.

Each word arrives as a separate prediction. The model commits to one token at a time, then immediately predicts the next.
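That loop — predict a distribution, sample one token, commit, repeat — can be sketched directly. The `next_token_probs` function below is a hard-coded stand-in for the real model, which would compute these probabilities from its billions of parameters.

```python
import random

# Sketch of generation: one token at a time, sampled from a probability
# distribution. `next_token_probs` is a stand-in for a real model.
def next_token_probs(tokens):
    # A real LLM computes this from its parameters; here we hard-code
    # one toy distribution for illustration.
    table = {
        ("the", "cat", "sat", "on", "the"): {"mat": 0.7, "rug": 0.2, "sofa": 0.1},
    }
    return table.get(tuple(tokens[-5:]), {"<end>": 1.0})

def generate(tokens, max_new=5):
    for _ in range(max_new):
        probs = next_token_probs(tokens)
        choices, weights = zip(*probs.items())
        token = random.choices(choices, weights=weights)[0]  # sample one token
        if token == "<end>":
            break
        tokens.append(token)  # commit to it, then predict the next
    return tokens

print(generate("the cat sat on the".split()))
```

Notice that the model never plans a whole sentence: each token is chosen, committed to, and only then does the next prediction happen.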

This sounds almost embarrassingly simple. But here's what's remarkable: to predict text well, you have to understand an enormous amount about the world.

To predict that "The capital of France is ___" ends with "Paris," you have to know geography. To predict the next line of a poem, you have to understand rhythm and rhyme. To predict what comes after "She was upset because ___," you need a model of human emotion and causality. A system that's learned to predict text very well has, in the process, absorbed a lot of implicit knowledge about how the world works.

This is why large language models seem so capable: they didn't learn facts explicitly, but they learned the patterns of how facts are expressed — and that turns out to encode a surprising amount of real-world understanding.

How Does the Model Actually Learn?

Training a large language model happens in phases. The first and most computationally expensive phase is pre-training: the model reads an enormous amount of text — web pages, books, code, Wikipedia, academic papers — and repeatedly tries to predict the next token. Every time it gets it wrong, the error is used to adjust its internal parameters slightly, nudging the model toward better predictions. This process happens billions of times across billions of examples.
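The "predict, measure the error, nudge the parameters" loop can be shown in miniature. The toy below has exactly one parameter instead of billions, and a single fixed target instead of endless text, but the update rule is the same shape: move the parameter a small step in the direction that reduces the error.

```python
# The pre-training loop in miniature: predict, measure the error, nudge
# the parameter slightly toward a better prediction. A real model does
# this with billions of parameters; here there is exactly one.
weight = 0.0          # the model's single adjustable parameter
target = 0.8          # the value we want it to predict
learning_rate = 0.1   # how big each nudge is

for step in range(200):
    prediction = weight              # toy "model": just output the weight
    error = prediction - target      # how wrong were we?
    weight -= learning_rate * error  # nudge toward a better prediction

print(round(weight, 3))  # converges close to 0.8
```

Each individual nudge is tiny; it's the billions of repetitions across billions of examples that add up to a model that predicts well.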

After pre-training, the model is very good at predicting text but isn't particularly useful as an assistant yet — it might complete a question by generating more questions rather than answering. A second phase, called fine-tuning (often using a technique called Reinforcement Learning from Human Feedback, or RLHF), shapes the model to be helpful and to respond to instructions rather than just continue text. This is the step that turns a raw language model into something like ChatGPT or Claude.

The "Large" Part Really Matters

It's worth dwelling on the scale, because it's genuinely hard to internalize. GPT-3, which felt like a watershed moment when it launched in 2020, had 175 billion parameters and was trained on hundreds of billions of words. Modern frontier models are larger still — GPT-4, Claude 3, and Gemini Ultra are estimated to have hundreds of billions to over a trillion parameters, trained on trillions of words.

What scale buys you isn't just more knowledge — it's qualitatively different capabilities. Researchers have observed that some abilities only emerge once models pass a size threshold. Below that threshold, a model can't reliably do multi-step arithmetic. Above it, the ability just appears. This "emergent behavior" from scale was one of the surprising findings of the last few years and is still not fully understood theoretically.

What LLMs Are Good At

Large language models excel at tasks that involve understanding or generating natural language:

  • Writing and editing — drafting, summarizing, rewriting in different tones
  • Question answering — explaining concepts, answering factual questions (with caveats, below)
  • Translation — across virtually every major language pair
  • Code generation — writing, explaining, and debugging code in most programming languages
  • Reasoning through problems — breaking down complex questions, working through arguments step by step
  • Classification and extraction — labeling text, pulling structured information from unstructured documents

What LLMs Are Bad At

The limitations are just as important to understand as the capabilities.

They sometimes make things up. This is the hallucination problem. Because the model is predicting plausible text, it can generate text that sounds factual but isn't. It doesn't "know" it's wrong — it has no internal ground truth to check against, just pattern completion. For critical information, always verify.

They have a knowledge cutoff. Pre-training happens once (or periodically). A model's knowledge ends at its training cutoff date. It doesn't know what happened last week unless it's been given tools to search the web or its training data has been recently updated.

They don't do precise computation natively. A language model predicting digits of arithmetic can get it right for simple cases (it's seen lots of arithmetic in training data) but becomes unreliable for complex calculations. When you see an AI system getting math right reliably, it's typically because it's been given a code execution tool to run the computation rather than predicting the answer directly.
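The "give it a calculator" pattern looks something like this sketch: instead of predicting the digits of an answer, the system emits an arithmetic expression and a tool actually computes it. The tiny safe evaluator below is purely illustrative — it isn't any particular product's tool.

```python
import ast
import operator

# Sketch of the "calculator tool" pattern: rather than predicting the
# digits of an answer, the system emits an expression and executes it.
# This tiny safe evaluator is illustrative, not any product's real tool.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def calc(expr):
    def ev(node):
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Constant):
            return node.value
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)

# A model might *guess* at these digits; the tool *computes* them.
print(calc("123456789 * 987654321"))
```

The division of labor is the point: the model is good at deciding *what* to compute, and ordinary code is good at computing it exactly.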

They have no persistent memory by default. Each conversation starts fresh. The model has no memory of previous conversations unless that history is explicitly included in the context window of the current session.
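In practice, "memory" in a chat application is just the caller resending the conversation so far with every request. The sketch below shows the shape of that; `ask_model` is a placeholder stub standing in for any real chat API.

```python
# "Memory" is just the caller resending history. Each request includes
# the whole conversation so far; the model itself stores nothing between
# calls. `ask_model` is a placeholder stub for any real chat API.
def ask_model(messages):
    return f"(model reply to {len(messages)} messages)"  # stub

history = []

def chat(user_text):
    history.append({"role": "user", "content": user_text})
    reply = ask_model(history)  # the full history goes along every time
    history.append({"role": "assistant", "content": reply})
    return reply

chat("What is a language model?")
chat("Say that more simply.")  # "remembered" only because we resent it
print(len(history))  # 4: two turns, each a user + assistant message
```

This is also why long conversations eventually hit a limit: the resent history has to fit inside the model's context window.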

Why This Matters for How You Use Them

Understanding what an LLM actually is changes how you interact with one productively. It's not a search engine, and treating it like one will disappoint you — it doesn't look things up, it generates plausible responses. It's not a person, so it doesn't actually "understand" in the way you do — but it has learned patterns that look like understanding across an astonishing range of topics.

The most useful mental model is something like: a very well-read collaborator who can help you think, write, and work — but who sometimes confidently says things that are wrong, and whose claims you should verify rather than blindly trust when the facts matter.

That's not a limitation that makes LLMs useless. It's just an accurate description of what they are — and knowing it makes you a better user of them.