A visual walkthrough

How Large Language Models Work

From your typed sentence to a streamed reply — one layer at a time.

The pipeline

A prompt passes through a sequence of stages, covered in the sections below.


1 · Tokenization

LLMs don't see words; they see tokens, chunks of text that average roughly 3–4 characters each in English.

Each token is then turned into a vector of thousands of numbers.
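To make the splitting concrete, here is a minimal sketch of a tokenizer. It is an assumption-heavy toy: the vocabulary `VOCAB` is made up, and the greedy longest-match rule stands in for the learned subword schemes (such as byte-pair encoding) that real models use.

```python
# Hypothetical subword vocabulary; a real model's vocabulary has tens of thousands of entries.
VOCAB = ["The", " cat", " sat", " on", " the", " mat", "un", "believ", "able", " "]

def tokenize(text, vocab=VOCAB):
    """Split text greedily, always taking the longest vocabulary piece that fits."""
    pieces = sorted(vocab, key=len, reverse=True)
    tokens = []
    i = 0
    while i < len(text):
        for piece in pieces:
            if text.startswith(piece, i):
                tokens.append(piece)
                i += len(piece)
                break
        else:
            # No vocabulary piece matches here: fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens
```

With this toy vocabulary, `tokenize("unbelievable")` splits one word into three tokens, which is why token counts and word counts differ.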

2 · Embeddings

Each token is mapped to a point in high-dimensional space. Similar meanings cluster together. (2D projection below.)
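A rough sketch of what "similar meanings cluster together" means in numbers. The three-dimensional vectors in `EMBEDDINGS` are invented for illustration (real embeddings are learned and far larger), and cosine similarity is one common way to compare them, not necessarily the only one a model relies on.

```python
import math

# Hypothetical 3-dimensional embeddings, hand-picked so that "cat" and "dog" sit close together.
EMBEDDINGS = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(word, table=EMBEDDINGS):
    """Return the other word whose vector is most similar to `word`'s vector."""
    return max(
        (other for other in table if other != word),
        key=lambda other: cosine_similarity(table[word], table[other]),
    )
```

Here `nearest("cat")` picks "dog" over "car", because the first two vectors point in almost the same direction.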

3 · Attention

For each token, the model asks "which earlier tokens matter to me right now?" and weights them accordingly.

Attention is the core of the Transformer, the "T" in GPT: a stack of attention and feed-forward layers, roughly 100 layers deep in large models.
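The "which earlier tokens matter" question can be sketched as single-head scaled dot-product attention with a causal restriction. This is a simplified illustration: a real Transformer first passes each token vector through learned projection matrices to get queries, keys, and values, and runs many heads in parallel. The function name `causal_attention` is hypothetical.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention(queries, keys, values):
    """For each position i, mix the values of positions 0..i, weighted by query-key match."""
    d = len(queries[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        # Score only the current and earlier positions; later tokens stay hidden.
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        mixed = [
            sum(w * values[j][k] for j, w in enumerate(weights))
            for k in range(len(values[0]))
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights
```

Calling it with the same list for all three arguments, for example `causal_attention(x, x, x)` with `x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]`, returns one row of weights per token; each row sums to 1 and covers only that token and the ones before it.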

4 · Predict the next token

At its core the model does one thing: given the tokens so far, it outputs a probability distribution over every possible next token. It then samples one token, appends it to the sequence, and repeats.

"The cat sat on the ___"
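The predict, sample, repeat loop can be sketched as follows. The scores in `LOGITS` are invented for the example prompt, the `temperature` knob is a common sampling option assumed here rather than taken from the text, and `next_logits` is a hypothetical stand-in for the whole model.

```python
import math
import random

# Hypothetical raw scores (logits) for the token after "The cat sat on the".
LOGITS = {" mat": 3.0, " sofa": 2.0, " floor": 1.5, " moon": -1.0}

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next(logits_by_token, temperature=1.0, rng=random):
    """Sample one token in proportion to its probability."""
    tokens = list(logits_by_token)
    probs = softmax([logits_by_token[t] for t in tokens], temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]

def generate(next_logits, tokens, steps, rng=random):
    """Predict, sample, append, repeat. `next_logits` stands in for the model."""
    tokens = list(tokens)
    for _ in range(steps):
        tokens.append(sample_next(next_logits(tokens), rng=rng))
    return tokens
```

With these made-up scores " mat" is the most likely continuation, but " sofa" and " floor" are still sampled some of the time, which is why the same prompt can produce different replies.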

5 · How it learned all this

  1. Pretraining: show it trillions of tokens of text, mostly from the internet. For each position, ask "predict the next token." Nudge billions of parameters (dials) to be a little less wrong. Repeat for months on thousands of GPUs.
  2. Fine-tuning: show it examples of good instructions and answers so it stops just continuing the text and starts helping.
  3. RLHF (reinforcement learning from human feedback): humans rank pairs of responses; the model learns to prefer the winners. Much of the "helpful, harmless, honest" behavior is shaped here.
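Two of the ideas in the list above can be sketched in a few lines: "nudge the dials to be a little less wrong" and "learn to prefer the winners". This is a toy under stated assumptions: the "parameters" here are just the raw scores themselves rather than billions of weights, and `preference_loss` is the pairwise (Bradley-Terry style) loss commonly used to train a reward model from rankings, not a full RLHF procedure.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_loss(logits, target):
    """Cross-entropy: minus the log of the probability given to the token that actually came next."""
    return -math.log(softmax(logits)[target])

def nudge(logits, target, learning_rate=0.5):
    """One gradient-descent step. The gradient for score i is prob_i - 1 if i is the target, else prob_i."""
    probs = softmax(logits)
    return [
        x - learning_rate * (p - (1.0 if i == target else 0.0))
        for i, (x, p) in enumerate(zip(logits, probs))
    ]

def preference_loss(score_winner, score_loser):
    """Pairwise loss: small when the human-preferred response scores higher than the other one."""
    margin = score_winner - score_loser
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

Starting from `logits = [0.0, 0.0, 0.0]`, a single `nudge(logits, 1)` raises the score of the correct token and lowers the others, so `next_token_loss` drops; pretraining repeats that step an enormous number of times across real parameters.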
