A visual walkthrough
How Large Language Models Work
From your typed sentence to a streamed reply — one layer at a time.
The pipeline
A prompt flows left to right through the stages below: tokenization, embeddings, attention, then next-token prediction.
1 · Tokenization
LLMs don't see words; they see tokens, chunks of text that average roughly 3–4 characters each in English.
Each token is then turned into a vector of thousands of numbers.
2 · Embeddings
Each token is mapped to a point in a high-dimensional space, where tokens with similar meanings cluster together. (2D projection below.)
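"Cluster together" can be measured. The sketch below looks up vectors in a tiny embedding table and compares them with cosine similarity. The 3-D vectors, the `EMBEDDINGS` table, and the helper names are all made up for illustration; real embeddings are learned during training and have thousands of dimensions.

```python
import math

# Hypothetical 3-D embeddings chosen by hand so that "cat" and "dog" sit close together.
EMBEDDINGS = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated directions."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word, table=EMBEDDINGS):
    """Return the other word in the table whose vector points most nearly the same way."""
    others = (w for w in table if w != word)
    return max(others, key=lambda w: cosine(table[word], table[w]))


if __name__ == "__main__":
    print(nearest("cat"))
    print(round(cosine(EMBEDDINGS["cat"], EMBEDDINGS["dog"]), 3))
    print(round(cosine(EMBEDDINGS["cat"], EMBEDDINGS["car"]), 3))
```

With these toy vectors, "cat" lands much closer to "dog" than to "car", which is the kind of structure a 2D projection makes visible.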
3 · Attention
For each token, the model asks "which earlier tokens matter to me right now?" and weights them accordingly.
This is the core of the Transformer, the "T" in GPT: stacks of attention and feed-forward layers, roughly 100 deep in the largest models.
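The "which earlier tokens matter" question can be sketched as causal scaled dot-product attention. This is a simplified single-head version under a loud assumption: it uses the input vectors directly as queries, keys, and values, whereas a real Transformer first passes them through learned projection matrices and runs many heads in parallel. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(vectors):
    """For each position, blend the vectors at that position and earlier ones.

    Simplification: queries, keys, and values are the raw input vectors.
    Returns (outputs, weights); weights[i][j] is how much position i attends to j.
    """
    d = len(vectors[0])
    outputs, weights = [], []
    for i, query in enumerate(vectors):
        visible = vectors[: i + 1]  # causal mask: no peeking at later tokens
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in visible
        ]
        w = softmax(scores)
        blended = [sum(wj * v[c] for wj, v in zip(w, visible)) for c in range(d)]
        weights.append(w)
        outputs.append(blended)
    return outputs, weights


if __name__ == "__main__":
    toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    _, attn = causal_attention(toy)
    for row in attn:
        print([round(x, 2) for x in row])
```

Each printed row is one token's attention pattern: the first token can only attend to itself, and every later row spreads its weight over the tokens seen so far.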
4 · Predict the next token
At its core the model does one thing: given the tokens so far, output a probability distribution over every token in its vocabulary. Then sample one token from it, append it, and repeat.
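The predict, sample, repeat loop fits in a few lines. In this sketch the "model" is a stand-in: `TOY_LOGITS` is a hypothetical set of raw scores for four candidate tokens, and `generate` accepts any function that returns such scores for the tokens so far. The temperature parameter is a common sampling knob and is included here as an assumption, not something the walkthrough specifies.

```python
import math
import random

# Hypothetical raw scores (logits) a model might assign to four candidate next tokens.
TOY_LOGITS = {"mat": 3.0, "floor": 2.0, "sofa": 1.5, "moon": -1.0}


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_probs(logits_by_token, temperature=1.0):
    """Return a {token: probability} mapping that sums to 1."""
    tokens = list(logits_by_token)
    probs = softmax([logits_by_token[t] for t in tokens], temperature)
    return dict(zip(tokens, probs))


def sample_next(logits_by_token, rng, temperature=1.0):
    """Draw one token at random, weighted by its probability."""
    probs = next_token_probs(logits_by_token, temperature)
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]


def generate(model, prompt_tokens, steps, rng):
    """Repeat: score the tokens so far, sample one next token, append it."""
    tokens = list(prompt_tokens)
    for _ in range(steps):
        tokens.append(sample_next(model(tokens), rng))
    return tokens


if __name__ == "__main__":
    for token, p in next_token_probs(TOY_LOGITS).items():
        print(f"{token:>6}: {p:.2f}")
    print(sample_next(TOY_LOGITS, random.Random(0)))
```

Because the next token is sampled rather than always taken from the top, the same prompt can produce different continuations. The toy scores stand in for what a model might produce for a prompt like this one: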
"The cat sat on the ___"
5 · How it learned all this
- Pretraining: show it trillions of tokens of internet text. For each position, ask "predict the next token." Nudge billions of parameters (dials) to be a little less wrong. Repeat for months on thousands of GPUs.
- Fine-tuning: show it examples of good instructions and answers so it stops just rambling and starts helping.
- RLHF (reinforcement learning from human feedback): humans rank pairs of responses; the model learns to prefer the winners. This is where much of the "helpful, harmless, honest" behavior comes from.
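Two of the ideas above reduce to small formulas, sketched here under stated assumptions. "A little less wrong" is shown as one gradient step on the cross-entropy loss, with the simplification that the dials being nudged are the logits themselves rather than billions of network weights. "Prefer the winners" is shown as a pairwise loss on two scores, a form often used to train reward models; treat it as one plausible formulation, not a description of any particular system. All function names are hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Pretraining loss at one position: -log(probability of the true next token)."""
    return -math.log(softmax(logits)[target])


def sgd_step(logits, target, lr=0.5):
    """One gradient-descent nudge.

    For softmax + cross-entropy, d(loss)/d(logit_i) = p_i - 1 for the target
    and p_i for every other token.
    """
    probs = softmax(logits)
    return [
        x - lr * (p - (1.0 if i == target else 0.0))
        for i, (x, p) in enumerate(zip(logits, probs))
    ]


def preference_loss(score_winner, score_loser):
    """Pairwise loss: small when the human-preferred response scores higher.

    Equals -log(sigmoid(score_winner - score_loser)).
    """
    return math.log(1.0 + math.exp(-(score_winner - score_loser)))


if __name__ == "__main__":
    logits, target = [0.0, 0.0, 0.0], 1
    print(round(cross_entropy(logits, target), 4))
    logits = sgd_step(logits, target)
    print(round(cross_entropy(logits, target), 4))
    print(round(preference_loss(2.0, 0.0), 4), round(preference_loss(0.0, 2.0), 4))
```

After a single step the loss on the true token drops slightly; training is this nudge repeated across an enormous number of positions.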