Every token is turned into a list of numbers.
Computers only do maths. So each token is converted into a long list of numbers (hundreds, sometimes thousands) that captures its meaning. The magic trick: words with similar meanings end up with similar numbers — so they sit close together on a "map of meaning."
What those numbers actually are
It sounds abstract ("a list of numbers for a word"), so here is the intuition. Imagine scoring every word on a few traits: how royal is it? Is it a living thing? How big? How feminine? Give each a score out of 10 and you get a little list, for example king = [9, 9, 6, 1]. That list is the word's vector (also called its embedding, the word the glossary uses). A real model uses hundreds or even thousands of traits instead of four, and it works them out by itself rather than being handed tidy labels. But the core idea is the same: a word becomes a row of scores.
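To make the score-list idea concrete, here is a minimal Python sketch. Only the king scores come from the example above. The short trait names and the scores for "queen" and "pizza" are made up for illustration, and a real model would learn its numbers rather than have them typed in by hand.

```python
# The four traits from the example, in a fixed order.
TRAITS = ["royal", "living", "big", "feminine"]

# Each word maps to one score (0-10) per trait, in the same order as TRAITS.
# Only "king" comes from the text; the other rows are illustrative guesses.
vectors = {
    "king":  [9, 9, 6, 1],
    "queen": [9, 9, 5, 9],
    "pizza": [0, 0, 2, 1],
}

def describe(word):
    """Pair each trait name with the word's score for that trait."""
    return dict(zip(TRAITS, vectors[word]))

print(describe("king"))
# {'royal': 9, 'living': 9, 'big': 6, 'feminine': 1}
```

The order of the scores matters: position 0 always means "royal", position 1 always means "living", and so on. That shared order is what lets two words be compared score by score.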
Once words are score-lists, "similar meaning" gets a concrete test: do their scores line up? Two words are alike when they score high and low on the same traits. That is all similarity is — no heavy maths, just how well two lists match. Pick two words and see:
Similar words, similar numbers
Each word is scored on four everyday traits — that row of scores is its vector. Choose a word for each colour; their bars sit together so you can see where they line up, and the overall match becomes one similarity score.
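One common way to turn "how well two lists match" into a single number is cosine similarity. The sketch below assumes that measure; the text does not say which formula the similarity score here uses, so treat this as one reasonable option. The scores for "queen" and "pizza" are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Return a score near 1.0 when two score-lists point the same way.

    Assumes both lists have the same length and neither is all zeros
    (an all-zero list would cause a division by zero).
    """
    dot = sum(x * y for x, y in zip(a, b))      # reward traits where both score high
    norm_a = math.sqrt(sum(x * x for x in a))   # overall size of list a
    norm_b = math.sqrt(sum(y * y for y in b))   # overall size of list b
    return dot / (norm_a * norm_b)

king = [9, 9, 6, 1]    # from the text
queen = [9, 9, 5, 9]   # hypothetical scores
pizza = [0, 0, 2, 1]   # hypothetical scores

print(cosine_similarity(king, queen))  # the higher of the two scores
print(cosine_similarity(king, pizza))  # the lower of the two scores
```

With these made-up scores, king matches queen more closely than it matches pizza, because king and queen both score high on "royal" and "living" while pizza scores zero on both.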
Now stretch that from four traits to hundreds. You can't draw it any more, but you can still measure how closely any two words match. Squash it down to a flat sketch and the payoff shows up — similar words really do land near each other:
The map of meaning
Each word's numbers place it somewhere on this map. Click any word to see its nearest neighbours light up. Notice the model was never told these groups — it worked them out from how words are used.
Tip: click "king", then "pizza", then "robot" — each word's spot is really just a list of numbers.
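Finding a word's nearest neighbours can be sketched as "measure the distance to every other word, then keep the closest few". The example below is an assumption-heavy illustration, not how the map itself is built: it uses straight-line (Euclidean) distance on the four-trait scores, and every row except king is invented for the example.

```python
import math

# Hypothetical four-trait scores; only "king" comes from the text.
vectors = {
    "king":   [9, 9, 6, 1],
    "queen":  [9, 9, 5, 9],
    "prince": [8, 9, 5, 1],
    "pizza":  [0, 0, 2, 1],
    "burger": [0, 0, 1, 1],
    "robot":  [0, 1, 6, 2],
}

def nearest_neighbours(word, k=2):
    """Return the k other words whose score-lists are closest to `word`."""
    target = vectors[word]
    others = [w for w in vectors if w != word]
    # math.dist gives the straight-line distance between two points.
    return sorted(others, key=lambda w: math.dist(target, vectors[w]))[:k]

print(nearest_neighbours("king"))      # ['prince', 'queen']
print(nearest_neighbours("pizza", 1))  # ['burger']
```

Nothing in the code mentions "royalty" or "food" as groups. The groupings fall out of the numbers alone, which mirrors the point that the model was never told these groups.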
So now every word is a point in a vast space of meaning. But a word's meaning shifts depending on the words around it: the "bank" of a river vs. a "bank" that holds money. How does the model handle that? That's the next breakthrough.