How it learns — How LLMs Work

The learning loop

Learning is just guess, check, nudge — a trillion times.

We have billions of random dials (the network) and endless fill-in-the-blank questions (the data). "Learning" is what connects them: one simple loop, repeated at a scale that's hard to imagine.

Step 1 — Guess

Predict the next word

Show the model a snippet of real text with the next word hidden. It outputs a probability for every possible next token — its guess.
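As a rough sketch of what "a probability for every possible next token" looks like, the snippet below turns raw scores into probabilities with a softmax. The four-word vocabulary and the scores are made up for illustration; a real model scores tens of thousands of tokens.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that are positive and sum to 1."""
    peak = max(scores)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical vocabulary and scores for "The sky is ___" (illustration only).
vocab = ["blue", "green", "falling", "the"]
scores = [1.0, 0.5, 0.2, -1.0]

guess = softmax(scores)
for word, p in zip(vocab, guess):
    print(f"{word:>8}: {p:.3f}")
```

The highest score gets the highest probability, but every word keeps some share: the guess is a spread of bets, not a single answer.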

Step 2 — Check

Measure how wrong (the loss)

Compare the guess to the real next word. A single number, the loss, captures how wrong it was. Confident and right = low loss; confident and wrong = high loss.
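The lesson does not name a specific loss. The usual choice for next-word prediction is cross-entropy, which is simply the negative log of the probability the model gave to the right answer. Assuming that choice, here is a small sketch with made-up guesses:

```python
import math

def loss(probs, correct_index):
    """Cross-entropy for one guess: -log(probability given to the right word)."""
    return -math.log(probs[correct_index])

# Hypothetical guesses over ["blue", "green", "falling"]; the right answer is index 0.
confident_right = [0.90, 0.05, 0.05]
unsure = [0.34, 0.33, 0.33]
confident_wrong = [0.02, 0.95, 0.03]

for name, probs in [("confident and right", confident_right),
                    ("unsure", unsure),
                    ("confident and wrong", confident_wrong)]:
    print(f"{name:>20}: loss = {loss(probs, 0):.2f}")
```

The loss is near zero when almost all the probability sits on the right word, and it grows quickly as that probability shrinks.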

Step 3 — Nudge

Adjust every dial a little

Work out, for each of the billions of dials, which direction would have made this guess slightly better — then nudge every one a tiny step that way.
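One way to see what "which direction" means is to wiggle each dial slightly and watch whether the loss goes up or down. The sketch below does exactly that for three toy dials, then steps each one the other way. The dial values, `step_size`, and `wiggle` are arbitrary assumptions, and real training does not wiggle dials one at a time; this is only meant to show the idea.

```python
import math

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def loss_for(dials, correct_index):
    """How wrong the guess is: -log(probability of the correct word)."""
    return -math.log(softmax(dials)[correct_index])

def downhill_nudge(dials, correct_index, step_size=0.5, wiggle=1e-6):
    """Estimate each dial's slope by wiggling it, then move every dial against its slope."""
    base = loss_for(dials, correct_index)
    slopes = []
    for i in range(len(dials)):
        wiggled = list(dials)
        wiggled[i] += wiggle
        slopes.append((loss_for(wiggled, correct_index) - base) / wiggle)
    return [d - step_size * s for d, s in zip(dials, slopes)]

dials = [0.2, -0.1, 0.4]  # toy stand-in: one dial per candidate word
before = loss_for(dials, correct_index=0)
dials = downhill_nudge(dials, correct_index=0)
after = loss_for(dials, correct_index=0)
print(f"loss before: {before:.4f}, after: {after:.4f}")
```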

Step 4 — Repeat

Do it again. And again.

Move to the next snippet and repeat — across trillions of tokens. Each nudge is tiny; the sheer number of them is what does the work.
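To see the whole loop in miniature, here is a toy model that only looks at the previous word when guessing the next one. The nine-word corpus, the step size, and the number of passes are all invented for this sketch; the slope used in the nudge, (probability minus target), is the standard result for a softmax guess scored with cross-entropy.

```python
import math

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Tiny made-up corpus; real training text runs to trillions of tokens.
tokens = "the sky is blue and the sea is blue".split()
vocab = sorted(set(tokens))
index = {word: i for i, word in enumerate(vocab)}

# Toy model: one row of dials per previous word, one dial per possible next word.
dials = [[0.0] * len(vocab) for _ in vocab]

def train_pass(step_size=0.1):
    """Guess, check, nudge once for every adjacent word pair; return the average loss."""
    pairs = list(zip(tokens, tokens[1:]))
    total_loss = 0.0
    for prev, nxt in pairs:
        row = dials[index[prev]]
        probs = softmax(row)                        # guess
        total_loss += -math.log(probs[index[nxt]])  # check
        for i, p in enumerate(probs):               # nudge
            target = 1.0 if i == index[nxt] else 0.0
            row[i] -= step_size * (p - target)
    return total_loss / len(pairs)

history = [train_pass() for _ in range(200)]
print(f"average loss: first pass {history[0]:.3f}, last pass {history[-1]:.3f}")
```

No single nudge changes much, but after enough passes the row of dials for "is" clearly favors "blue".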

Run a training step

Watch one example being learned

The model sees "The sky is ___" and must predict the next word. The right answer is "blue". Press Train step to nudge its dials, and watch the guess improve while the loss falls.

[Interactive demo: a chart of the model's guesses after "The sky is", a loss readout (how wrong the guess is), and a ball that rolls downhill as the model learns over 8 training steps.]
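For readers without the interactive version, the sketch below replays the same kind of run in code: eight training steps on "The sky is ___" with "blue" as the right answer. The candidate words, the starting dials, and the step size are assumptions made for illustration and are not taken from the demo itself.

```python
import math

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["blue", "green", "falling", "the"]  # hypothetical candidate words
dials = [0.0, 0.0, 0.0, 0.0]                 # start with no preference
correct = vocab.index("blue")
step_size = 1.0                              # assumed; chosen to make progress visible

losses = []
for step in range(1, 9):                     # 8 training steps
    probs = softmax(dials)                   # guess
    loss = -math.log(probs[correct])         # check
    losses.append(loss)
    for i, p in enumerate(probs):            # nudge
        target = 1.0 if i == correct else 0.0
        dials[i] -= step_size * (p - target)
    print(f"step {step}: P(blue) = {probs[correct]:.2f}, loss = {loss:.2f}")
```

Each printed line shows the guess before that step's nudge, so the first line is the untrained model and every later line should show a higher P(blue) and a lower loss.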

"Nudge every dial downhill." Picture a landscape where height = how wrong the model is, like the little valley in the demo above. With one dial, learning is a ball rolling to the bottom. Now picture doing that for a billion dials at once, each nudged in its own downhill direction: that's gradient descent. The trick that works out the downhill direction for all billion in a single backward pass, like an accountant settling every account at once, is called backpropagation. Without it, each dial's direction would have to be tested separately, one pass per dial, so this one idea is what makes training at this scale practical.
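To make the "single pass" claim concrete, here is a sketch of backpropagation on a network with just two dials, checked against the slow approach of wiggling each dial separately. The network shape (a tanh followed by a multiply), the squared-error loss, and all the numbers are assumptions picked to keep the arithmetic short; they are not how a real language model is built.

```python
import math

def forward(w1, w2, x, target):
    """Run the toy network and return its intermediate values and loss."""
    h = math.tanh(w1 * x)       # first dial, then a squashing function
    y = w2 * h                  # second dial
    loss = (y - target) ** 2    # squared error, used here only for simplicity
    return h, y, loss

def backward(w1, w2, x, target):
    """One backward pass: apply the chain rule from the loss back to each dial."""
    h, y, _ = forward(w1, w2, x, target)
    d_y = 2.0 * (y - target)            # how the loss changes with the output
    d_w2 = d_y * h                      # slope for the second dial
    d_h = d_y * w2                      # pass the blame back one layer
    d_w1 = d_h * (1.0 - h * h) * x      # slope for the first dial
    return d_w1, d_w2

def wiggle_slope(w1, w2, x, target, which, eps=1e-6):
    """Slow check: wiggle one dial up and down and measure the change in loss."""
    if which == "w1":
        up = forward(w1 + eps, w2, x, target)[2]
        down = forward(w1 - eps, w2, x, target)[2]
    else:
        up = forward(w1, w2 + eps, x, target)[2]
        down = forward(w1, w2 - eps, x, target)[2]
    return (up - down) / (2 * eps)

w1, w2, x, target = 0.5, -0.3, 1.5, 1.0
g1, g2 = backward(w1, w2, x, target)
print(f"backprop: {g1:.6f}, {g2:.6f}")
print(f"wiggling: {wiggle_slope(w1, w2, x, target, 'w1'):.6f}, "
      f"{wiggle_slope(w1, w2, x, target, 'w2'):.6f}")
```

Both methods agree on the slopes, but wiggling needs extra runs of the network for every dial, while the backward pass gets all of them from one forward run and one trip back.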

Each nudge is microscopic, and they add up over trillions of tokens. The real surprise isn't the effort; it's what all that blind guessing quietly builds. The model is never told a single rule of grammar and never handed a list of facts, yet it ends up with both, plus a knack for simple reasoning. That payoff is the next lesson.