Turning words into numbers

Every token is turned into a list of numbers.

Computers only do maths. So each token (a word, or a piece of a word) is converted into a long list of numbers (hundreds, sometimes thousands) that captures its meaning. The magic trick: words with similar meanings end up with similar numbers, so they sit close together on a "map of meaning."

What those numbers actually are

It sounds abstract ("a list of numbers for a word"), so here is the intuition. Imagine scoring every word on a few traits: How royal is it? Is it a living thing? How big? How feminine? Give each a score out of 10 and you get a little list, for example king = [9, 9, 6, 1]. That list is the word's vector (also called its embedding, the word the glossary uses). A real model uses hundreds of traits instead of four, and it works them out by itself rather than being handed tidy labels, so most of its traits have no neat name like "royal". But the core idea is the same: a word becomes a row of scores.
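To make the "row of scores" idea concrete, here is a minimal Python sketch. The king row is the one given above; the pizza row and the names TRAITS, word_vectors and describe are made up purely for illustration.

```python
# The four everyday traits from the example, in a fixed order.
TRAITS = ["royal", "living thing", "big", "feminine"]

# Each word's vector is just a list of scores out of 10, one per trait.
# "king" is the example from the text; "pizza" is an invented row.
word_vectors = {
    "king": [9, 9, 6, 1],
    "pizza": [0, 0, 3, 2],
}

def describe(word):
    """Pair each score with its trait name so the list is readable."""
    return dict(zip(TRAITS, word_vectors[word]))

print(describe("king"))
```

The model itself only ever sees the bare list. The trait names are a convenience for human readers, which is why a real model can get by without them.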

Once words are score-lists, "similar meaning" gets a concrete test: do their scores line up? Two words are alike when they score high and low on the same traits. That is all similarity is — no heavy maths, just how well two lists match. Pick two words and see:
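One common way to turn "how well two lists match" into a single number is cosine similarity. The text does not say which measure is used here, so treat this as one reasonable choice and not the definitive one. In the sketch below only the king scores come from the text; the queen and pizza scores are invented for illustration.

```python
import math

# Trait order: royal, living thing, big, feminine (scores out of 10).
# Only "king" comes from the text; the other rows are made up.
scores = {
    "king": [9, 9, 6, 1],
    "queen": [9, 9, 5, 9],
    "pizza": [0, 0, 3, 2],
}

def cosine_similarity(a, b):
    """Return a value near 1.0 when two score-lists point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    length_a = math.sqrt(sum(x * x for x in a))
    length_b = math.sqrt(sum(y * y for y in b))
    return dot / (length_a * length_b)

king_queen = cosine_similarity(scores["king"], scores["queen"])
king_pizza = cosine_similarity(scores["king"], scores["pizza"])
print(f"king vs queen: {king_queen:.2f}")
print(f"king vs pizza: {king_pizza:.2f}")
```

With these made-up scores, king and queen come out at about 0.87 and king and pizza at about 0.39, matching the intuition that the first pair lines up on more traits.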

Compare two meanings

Similar words, similar numbers

Each word is scored on four everyday traits — that row of scores is its vector. Choose a word for each colour; their bars sit together so you can see where they line up, and the overall match becomes one similarity score.

Now stretch that from four traits to hundreds. You can't draw it any more, but you can still measure how closely any two words match. Squash those hundreds of numbers down to a flat, two-dimensional sketch and the payoff shows up. Similar words really do land near each other:

The map of meaning

Each word's numbers place it somewhere on this map. Click any word to see its nearest neighbours light up. Notice the model was never told these groups — it worked them out from how words are used.

Tip: click "king", then "pizza", then "robot" — each word's spot is really just a list of numbers.
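Finding a word's nearest neighbours can be sketched in a few lines: measure the distance from the chosen word to every other word and keep the closest ones. This sketch assumes plain straight-line (Euclidean) distance and a tiny invented vocabulary. Apart from king, all the scores below are hypothetical, and a real model would compare hundreds of numbers per word.

```python
import math

# Hypothetical scores on four traits: royal, living thing, big, feminine.
# Only "king" is from the text; every other row is invented.
vectors = {
    "king": [9, 9, 6, 1],
    "queen": [9, 9, 5, 9],
    "prince": [8, 9, 5, 1],
    "pizza": [0, 0, 3, 2],
    "burger": [0, 0, 3, 1],
    "robot": [0, 1, 6, 1],
}

def nearest_neighbours(word, k=2):
    """Return the k words whose vectors lie closest to the given word."""
    target = vectors[word]
    distances = [
        (math.dist(target, vec), name)
        for name, vec in vectors.items()
        if name != word
    ]
    return [name for _, name in sorted(distances)[:k]]

print(nearest_neighbours("king"))
print(nearest_neighbours("pizza", k=1))
```

Nothing in the code mentions royalty or food. The groups fall out of the numbers alone.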

The famous party trick: because meaning is now maths, you can do arithmetic with it. Take the numbers for king, subtract man, add woman — adding and subtracting the lists slot by slot, like lining up columns of numbers — and you land right next to queen. It doesn't land perfectly for every word, but it shows the model captured ideas like royalty and gender as directions you can move along — without anyone ever programming a dictionary.
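The arithmetic can be tried on toy numbers. In this sketch the king scores come from the earlier example, while every other row is invented and deliberately chosen so the result lands near queen without hitting it exactly. The helper name analogy is also just illustrative. Skipping the three input words when searching for the closest match is an assumption made here, though it is a common convention.

```python
import math

# Trait order: royal, living thing, big, feminine.
# Only "king" is from the text; the rest are made-up illustrative scores.
vectors = {
    "king": [9, 9, 6, 1],
    "man": [1, 9, 6, 2],
    "woman": [1, 9, 5, 9],
    "queen": [9, 9, 5, 9],
    "prince": [8, 9, 5, 1],
    "pizza": [0, 0, 3, 2],
}

def analogy(a, b, c):
    """Compute a - b + c slot by slot, then find the closest other word."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    # Leave out the three input words so the answer is a new word.
    candidates = {w: v for w, v in vectors.items() if w not in (a, b, c)}
    best = min(candidates, key=lambda w: math.dist(target, candidates[w]))
    return target, best

target, best = analogy("king", "man", "woman")
print(target, best)
```

Here the result is [9, 9, 5, 8], one point away from the invented queen row, so queen is the closest remaining word.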

So now every word is a point in a vast space of meaning. But a word's meaning shifts depending on the words around it — "bank" by a river vs. a bank with money. How does the model handle that? That's the next breakthrough.