Inside the model — How LLMs Work

Inside the network

Inside, it's a giant stack of tiny dials.

We've said the model "decides" and "pays attention." Time to look inside. There's no rulebook and no logic written by a human. There's just a big mathematical machine called a neural network.

A neural network is built from layers. The numbers for your tokens (from the last two lessons) flow in at one end, pass through layer after layer of simple arithmetic, and a prediction comes out the other end. Each layer takes the numbers from the layer before, mixes them, and passes the result on — gradually turning "raw word meanings" into "what should come next."
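To make "layer after layer of simple arithmetic" concrete, here is a minimal sketch of a toy network in plain Python. Everything in it is an assumption made for illustration: the layer sizes, the input numbers, the random weights, and the choice of a plain weighted sum followed by clipping at zero. Real language models use more elaborate layers, but the flow is the same: numbers in, mix, pass on.

```python
import random

def make_layer(n_in, n_out, rng):
    """One layer: a grid of n_out x n_in weights, set to random values here."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]

def apply_layer(weights, inputs):
    """Mix the incoming numbers: each output is a weighted sum of all inputs,
    clipped at zero so that stacking layers is more than one big sum."""
    outputs = []
    for row in weights:
        total = sum(w * x for w, x in zip(row, inputs))
        outputs.append(max(0.0, total))
    return outputs

def forward(layers, inputs):
    """Pass the signal through every layer in order."""
    signal = inputs
    for weights in layers:
        signal = apply_layer(weights, signal)
    return signal

rng = random.Random(0)                  # fixed seed so runs are repeatable
layer_sizes = [4, 6, 6, 3]              # hypothetical: 4 numbers in, 3 scores out
layers = [make_layer(a, b, rng) for a, b in zip(layer_sizes, layer_sizes[1:])]

token_numbers = [0.2, -0.5, 0.9, 0.1]   # made-up stand-in for token numbers
scores = forward(layers, token_numbers)
print(scores)                           # one score per output unit
```

With random weights the scores mean nothing yet; the point is only that each layer does the same kind of arithmetic on whatever the previous layer produced.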

Watch the signal flow

A neural network, simplified

Numbers flow in on the left (your tokens), pass through layer after layer, and a prediction comes out the right. Every line is one weight — one of the billions of dials.

Tokens in | Hidden layers | Prediction out

Real models are far bigger: around 50–120 layers, each far wider than this. Our picture has a handful of connections; a real one has billions.

So what are "parameters"? Every connection in that web has a number attached, called a weight (or parameter). "70 billion parameters" simply means 70 billion of these dials. They are the entire model — copy the dial settings and you've copied the model. They are not facts in a database; they're the settings that, working together, produce good next-word guesses.
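A rough way to see where parameter counts come from is to count the connections between layers. The sketch below assumes a simple fully connected stack, where every unit in one layer links to every unit in the next, and it ignores the other kinds of parameters real models also have. The layer sizes are hypothetical, chosen only to show how fast the count grows with width.

```python
def count_weights(layer_sizes):
    """Count connections in a fully connected stack: each pair of
    neighbouring layers contributes (units in) x (units out) weights."""
    return sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

toy = [4, 6, 6, 3]            # hypothetical toy network
print(count_weights(toy))     # 4*6 + 6*6 + 6*3 = 78

wide = [8192] * 4             # hypothetical: four layers, each 8192 units wide
print(count_weights(wide))    # 3 * 8192 * 8192 = 201326592
```

Three gaps between wide layers already give about 200 million weights, so dozens of wide layers reaching into the billions is not surprising.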

Two things to hold onto

The dials are learned

Nobody sets them by hand

All those billions of weights start as random numbers, so a fresh model produces pure gibberish. Training is the process of slowly turning every dial to a useful value. That's what the next several lessons are about.
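As a small preview of what "turning a dial" can look like, here is a sketch with a single weight. It assumes a made-up task (the output should be three times the input) and uses plain gradient descent, one common way of nudging a weight to shrink the error. The examples, step size, and number of steps are all illustrative choices, not how any particular model is trained.

```python
import random

rng = random.Random(0)
weight = rng.uniform(-1.0, 1.0)        # a fresh dial starts at a random value

# Made-up training examples: the right answer is always 3 * input.
examples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
step_size = 0.01                       # how far to turn the dial each time

def mean_squared_error(w):
    """Average squared gap between the guess (w * x) and the target y."""
    return sum((w * x - y) ** 2 for x, y in examples) / len(examples)

error_before = mean_squared_error(weight)

for _ in range(200):
    # Slope of the error with respect to the weight: which way is downhill?
    gradient = sum(2 * (weight * x - y) * x for x, y in examples) / len(examples)
    weight -= step_size * gradient     # nudge the dial a little downhill

error_after = mean_squared_error(weight)
print(weight, error_before, error_after)
```

After the loop the weight sits very close to 3 and the error is close to zero. A real model does the same kind of nudging on billions of dials at once.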

Bigger = more capable

More dials, richer patterns

Think of the dials as storing rules of thumb the model discovered, such as "after a name, a verb often follows" or "weather sentences tend to mention rain." No single dial holds a rule; millions working together do. More dials means room for more, finer rules, which is why larger models tend to be more capable.

So the whole model is just billions of dials that turn input numbers into a next-word guess. We've followed your words all the way in — next, let's watch the guess itself come out: the ranked list of likely words, and the dice it rolls to pick one.