Inside, it's a giant stack of tiny dials.
We've said the model "decides" and "pays attention." Time to look inside. There's no rulebook and no logic written by a human. There's just a big mathematical machine called a neural network.
A neural network is built from layers. The numbers for your tokens (from the last two lessons) flow in at one end, pass through layer after layer of simple arithmetic, and a prediction comes out the other end. Each layer takes the numbers from the layer before, mixes them, and passes the result on — gradually turning "raw word meanings" into "what should come next."
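To make "layer after layer of simple arithmetic" concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the layer sizes, the input numbers, and the helper names `make_layer`, `apply_layer`, and `forward` are made up, and real model layers are more elaborate than this. The shape of the idea is the same, though: each layer multiplies, adds, and hands its result to the next.

```python
import random

def make_layer(n_in, n_out, rng):
    """Create one layer: a grid of n_out x n_in weights (the 'dials')."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]

def apply_layer(weights, inputs):
    """Mix the inputs: each output is a weighted sum, with negatives set to zero."""
    outputs = []
    for row in weights:
        total = sum(w * x for w, x in zip(row, inputs))
        outputs.append(max(0.0, total))
    return outputs

def forward(layers, inputs):
    """Pass the numbers through every layer in order."""
    values = inputs
    for weights in layers:
        values = apply_layer(weights, values)
    return values

rng = random.Random(0)
layer_sizes = [4, 5, 5, 3]  # hypothetical: 4 numbers in, two middle layers, 3 out
layers = [make_layer(a, b, rng) for a, b in zip(layer_sizes, layer_sizes[1:])]

token_numbers = [0.2, -0.5, 0.9, 0.1]  # stand-in for the numbers from your tokens
prediction = forward(layers, token_numbers)
print(prediction)  # three numbers out the far end
```

Nothing in this code knows anything about language. Whatever the network "knows" would have to live in the weights, which here are still random.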
A neural network, simplified
Numbers flow in on the left (your tokens), pass through layer after layer, and a prediction comes out on the right. Every line is one weight: one of the billions of dials.
Real models are far bigger: large ones stack dozens to over a hundred layers, each far wider than this. Our picture has a handful of connections; a real one has billions.
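A little arithmetic shows how the count reaches billions. The sketch below assumes the simplest possible wiring, where every number in one layer connects to every number in the next, so the weights between two layers are just their widths multiplied together. The helper `count_weights` and the sizes are hypothetical, chosen for round numbers rather than taken from any real model.

```python
def count_weights(layer_sizes):
    """Count connections when each layer is fully wired to the next one."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

# A toy network like the picture: 4 -> 5 -> 5 -> 3
toy = [4, 5, 5, 3]
print(count_weights(toy))  # 4*5 + 5*5 + 5*3 = 60

# A made-up large network: 80 layers of connections, each 10,000 numbers wide
big = [10_000] * 81
print(count_weights(big))  # 80 * 10,000 * 10,000 = 8,000,000,000
```

Width matters more than it first appears: doubling how wide each layer is roughly quadruples the number of dials between layers.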
Two things to hold onto
Nobody sets them by hand
All those billions of weights start as random numbers, so a fresh model produces pure gibberish. Training is the process of slowly turning every dial to a useful value. That's what the next several lessons are about.
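Here is a one-dial sketch of that idea, under heavy simplification. The input, the target, and the step size are all made-up values, and nudging the dial in proportion to the error is just one simple way to turn it, not a description of how any particular model is trained. The point is only the pattern: start random, check the guess, turn the dial a little, repeat.

```python
import random

rng = random.Random(42)
dial = rng.uniform(-1.0, 1.0)  # a fresh weight: just a random number

x, target = 1.0, 2.0           # made-up example: for input 1.0 we want output 2.0
print("before training:", dial * x)

step_size = 0.1                # how far to turn the dial each time (assumed value)
for _ in range(50):
    guess = dial * x
    error = guess - target
    # Turn the dial a little in the direction that shrinks the error.
    dial -= step_size * 2 * error * x

print("after training:", dial * x)
```

Each pass shrinks the remaining error by a fixed fraction, so the guess creeps toward the target instead of jumping there. A real model does something in this spirit for billions of dials at once.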
More dials, richer patterns
Think of the dials as storing rules of thumb the model discovered — "after a name, a verb often follows," "weather sentences tend to mention rain." No single dial holds a rule; millions working together do. More dials means room for more, finer rules.
So the whole model is just billions of dials that turn input numbers into a next-word guess. We've followed your words all the way in — next, let's watch the guess itself come out: the ranked list of likely words, and the dice it rolls to pick one.