Going further — How LLMs Work

Optional · going further

Two ideas behind the tools you actually use.

You've got the whole core now: a next-word engine, trained, tuned, and run on your words. This page is a bonus — skip it freely. But two extensions turn that plain engine into most of the AI products around you, and the lovely part is that both reuse exactly what you've already learned. The model never changes; you just feed it differently.

1 · It isn't only text anymore

Modern models take in images, sound, even video; that's what "multimodal" means. It sounds like a whole new machine, but the trick is small. Anything can be turned into the one thing the model already eats: a sequence of vectors (lesson 3). A photo gets cut into patches, a sound into slices, and each piece becomes a vector, just as each token of text does. After that very first step, the model handles every piece the same way, whatever kind of input it started as.
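To make the "cut into pieces" step concrete, here is a minimal sketch in plain Python. The function names (image_to_patches, sound_to_slices), the tiny 4x4 "photo" and the 8-sample "sound" are all made up for illustration. Real models typically also pass each piece through a learned layer before use; that step is left out here.

```python
from typing import List

Vector = List[float]

def image_to_patches(image: List[List[float]], patch: int) -> List[Vector]:
    """Cut a 2-D grid of pixel values into patch x patch squares,
    flattening each square into one vector (hypothetical helper)."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    vectors = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Read the square row by row into a flat list of numbers.
            vectors.append([
                image[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            ])
    return vectors

def sound_to_slices(samples: List[float], size: int) -> List[Vector]:
    """Cut a 1-D list of audio samples into equal slices,
    dropping any leftover tail (hypothetical helper)."""
    usable = len(samples) - len(samples) % size
    return [samples[i:i + size] for i in range(0, usable, size)]

# Invented inputs: a 4x4 grid of brightness values and 8 audio samples.
photo = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
sound = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]

photo_vectors = image_to_patches(photo, patch=2)  # 4 patches, 4 numbers each
sound_vectors = sound_to_slices(sound, size=4)    # 2 slices, 4 numbers each

for vec in photo_vectors + sound_vectors:
    print(vec)
```

Both lists end up holding the same kind of thing: short lists of numbers. That shared shape is the point of the sketch, not the particular patch or slice sizes, which are arbitrary here.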

Text, an image, a sound: all become the same thing

Whatever the input, it gets chopped into pieces and turned into vectors, the same "list of numbers" from lesson 3. From there, the model treats them all alike.

Input (for example, a sentence) → cut into pieces → each piece becomes a vector (the same currency, whatever the input) → the same next-word engine → predicts what comes next

One alphabet. To the model, a sentence, a photo and a voice note all arrive written in the same alphabet: numbers. That's why a single system can describe a picture, read a chart, or answer a spoken question — it's the next-word engine from page one, just fed a different kind of token.

2 · Giving it your own knowledge

The model is frozen (lesson 13) and has never read your company handbook, your emails, or today's news. So how do tools "chat with your documents"? Not by retraining, but by retrieval, usually called RAG (retrieval-augmented generation). When you ask something, the system first searches your files, grabs the most relevant passage, and pastes it into the prompt alongside your question. The model then answers from text it can actually see in front of it, and can point you to the source.

Answering from your documents, not its memory

The frozen model never saw your files. Retrieval fixes that, in three small steps.

Your question: "How long do I have to return an item?"
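The search-then-paste flow can be sketched in a few lines of Python. This is a toy under stated assumptions: the handbook passages are invented, the helper names (words, retrieve, build_prompt) are hypothetical, and simple word overlap stands in for the search step, where real systems usually compare vectors instead. The final call to the model is left out.

```python
import re
from typing import List, Set

def words(text: str) -> Set[str]:
    """Lowercase a text and return the set of words it contains."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str, passages: List[str]) -> str:
    """Return the passage that shares the most words with the question.
    Word overlap is a stand-in for a real search step."""
    question_words = words(question)
    return max(passages, key=lambda p: len(question_words & words(p)))

def build_prompt(question: str, passage: str) -> str:
    """Paste the retrieved passage into the prompt, ahead of the question."""
    return (
        "Answer using only the source below.\n\n"
        f"Source: {passage}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Invented handbook passages, for illustration only.
handbook = [
    "Shipping: orders leave our warehouse within two business days.",
    "Returns: you may return an item within 30 days of delivery for a refund.",
    "Warranty: electronics are covered for one year from purchase.",
]

question = "How long do I have to return an item?"
passage = retrieve(question, handbook)   # step 1: search the files
prompt = build_prompt(question, passage) # step 2: paste into the prompt
print(prompt)                            # step 3 would send this to the model
```

The model that receives this prompt is the same frozen one as before. The only thing that changed is the text placed in front of it, which is also why the answer can name the passage it came from.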

Open book, not memory. Asking a bare model is a closed-book exam — it answers from memory and sometimes misremembers (that's a hallucination, lesson 14). RAG hands it the open book first. Same model, but now it's reading the answer instead of trying to recall it — which is why it can cite a real source.

Notice the thread running through both: the model itself never changes. You either feed it a new kind of input, or feed it better text to work from. The engine underneath is the same next-word guesser you met on the very first page. The glossary is next, for any term you'd like to revisit.