Scaling up — How LLMs Work

Why it takes thousands of computers and months.

Each nudge is microscopic, and training makes trillions of them while reading through trillions of tokens. Doing that in any reasonable time takes staggering computing power, and one discovery shaped the whole field: to get a better model, mostly just make everything bigger.

Hardware

Thousands of GPUs

Training runs on specialised chips (GPUs) built for massive parallel math; frontier models use tens of thousands of them at once.

Time

Weeks to months

A single training run for a large model takes many weeks of those chips running non-stop, day and night.

Cost

Tens of millions+

Between the hardware, electricity, and data work, the biggest runs cost tens to hundreds of millions of dollars, which is why only a few labs build them.
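
To see where "weeks" and "tens of millions" come from, here is a back-of-the-envelope estimate in Python. It leans on a widely used rule of thumb: training costs roughly 6 floating-point operations per parameter per training token. The model size, token count, GPU count, per-chip speed, utilization, and hourly price are all hypothetical inputs chosen for illustration, not figures for any real model or lab.

```python
# Back-of-the-envelope training estimate. Every input below is a
# hypothetical assumption for illustration, not data for a real model.

SECONDS_PER_DAY = 86_400


def training_flops(params: float, tokens: float) -> float:
    """Rule of thumb: about 6 floating-point operations per parameter
    per training token (forward pass plus backward pass)."""
    return 6 * params * tokens


def training_days(params: float, tokens: float, n_gpus: int,
                  peak_flops_per_gpu: float, utilization: float) -> float:
    """Wall-clock days if every GPU sustains `utilization` of its peak speed."""
    sustained_flops_per_second = n_gpus * peak_flops_per_gpu * utilization
    return training_flops(params, tokens) / sustained_flops_per_second / SECONDS_PER_DAY


def training_cost(days: float, n_gpus: int, dollars_per_gpu_hour: float) -> float:
    """Compute cost only: GPU-hours times an hourly price."""
    return days * 24 * n_gpus * dollars_per_gpu_hour


# Hypothetical run: 400B parameters, 15T tokens, 16,000 GPUs at 1e15 FLOP/s
# peak each, 40% average utilization, $2 per GPU-hour.
PARAMS, TOKENS, GPUS = 400e9, 15e12, 16_000
days = training_days(PARAMS, TOKENS, GPUS, peak_flops_per_gpu=1e15, utilization=0.4)
cost = training_cost(days, GPUS, dollars_per_gpu_hour=2.0)
print(f"{training_flops(PARAMS, TOKENS):.1e} FLOPs, ~{days:.0f} days, ~${cost / 1e6:.0f}M")
```

With these made-up inputs the run comes out at roughly 65 days of non-stop computing and about $50 million for the chips alone. Real budgets also cover failed experiments, data work, and staff, which this sketch ignores.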

The scaling law

Around 2020, researchers noticed something almost suspiciously reliable: add more parameters, more data, and more compute (computing power), and the model gets predictably better. Its error at predicting the next word falls along a smooth curve. Progress stopped being only about clever new ideas and became, in large part, about scaling up.
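
A scaling law is a formula for that smooth curve. The Python sketch below uses one common form, in which predicted loss (lower is better) is an irreducible floor E plus two terms that shrink as parameters N and training tokens D grow; this form appears, for example, in DeepMind's 2022 "Chinchilla" study. The constants and the tokens-per-parameter ratio are placeholder assumptions for illustration, not a fit to any real model.

```python
# Illustrative scaling-law curve: loss(N, D) = E + A / N**ALPHA + B / D**BETA,
# with N = parameters and D = training tokens. The constants are placeholder
# assumptions for this sketch, not a fit to any real model.
E, A, B = 1.7, 400.0, 400.0
ALPHA, BETA = 0.34, 0.28


def predicted_loss(params: float, tokens: float) -> float:
    """Predicted next-word loss (lower is better); always stays above E."""
    return E + A / params**ALPHA + B / tokens**BETA


# Grow the model 10x at a time, with 20 training tokens per parameter
# (a ratio picked for this sketch).
SCALES = [(n, 20 * n) for n in (1e7, 1e8, 1e9, 1e10, 1e11)]
losses = [predicted_loss(n, d) for n, d in SCALES]

for (n, d), loss in zip(SCALES, losses):
    print(f"{n:.0e} params, {d:.0e} tokens -> predicted loss {loss:.2f}")
```

Each 10x step still lowers the loss, but by less than the step before, and the curve flattens toward the floor E. That shape is why gains from scale are predictable yet get steadily more expensive.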

Emergent abilities. The eerie part of scaling: some skills don't improve smoothly. They're basically absent in small models, then switch on once the model is big enough. Picking up a task from a couple of examples, doing multi-step arithmetic, and basic reasoning all tend to appear suddenly past a certain size. Capabilities nobody explicitly trained for fall out of "just predict the next word, but bigger." (Researchers do debate how sharp these jumps really are, since some of the suddenness comes from how we test the skills, but the broad pattern of new abilities arriving with scale is real.)
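
The measurement side of that debate can be shown with a toy calculation. Suppose, purely as an illustrative assumption, that a problem needs 10 independent steps, each solved with probability p, and that p improves smoothly with scale. If the test only gives credit for a fully correct answer, the score is p to the power 10, which sits near zero and then climbs steeply. This Python sketch illustrates that argument only; it doesn't claim real abilities work this way.

```python
# Toy model of why a skill can look "sudden" on an all-or-nothing test.
# Assumptions (for illustration only): the problem has STEPS independent steps,
# and the model gets each step right with probability p.
STEPS = 10


def exact_match_rate(per_step_accuracy: float, steps: int = STEPS) -> float:
    """Chance of getting every step right, so the whole answer is correct."""
    return per_step_accuracy ** steps


# Per-step accuracy improving smoothly, as if the model were being scaled up.
for p in (0.50, 0.60, 0.70, 0.80, 0.90, 0.95, 0.99):
    print(f"per-step accuracy {p:.2f} -> whole-problem score {exact_match_rate(p):.3f}")
```

As per-step accuracy rises gradually from 0.5 to 0.99, the all-or-nothing score moves from about 0.001 to about 0.9, and most of that jump happens once per-step accuracy passes 0.8.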

Bigger isn't the whole story anymore

Pure size has limits: there's only so much high-quality text, and cost and energy are real constraints. So recent progress also comes from cleaner data, more efficient designs, and a newer trick: letting a model think for longer at answer time (the "reasoning" models). But the scaling era is why today's models are so much more capable than those of just a few years ago.

After all this, you have a giant, powerful model. But it has one big problem: it's still just an autocomplete. Ask it a question and it might continue with more questions. Next, how it's turned into a helpful assistant.