Two ideas behind the tools you actually use.
You've got the whole core now: a next-word engine, trained, tuned, and run on your words. This page is a bonus — skip it freely. But two extensions turn that plain engine into most of the AI products around you, and the lovely part is that both reuse exactly what you've already learned. The model never changes; you just feed it differently.
1 · It isn't only text anymore
Modern models take in images, sound, even video; that's what "multimodal" means. It sounds like a whole new machine, but the trick is small. Almost any input can be turned into the one thing the model already eats: a sequence of vectors (lesson 3). A photo gets cut into patches, a sound into short slices, and each piece becomes a vector, just as each token of text does. After that first step, the core of the model handles every piece the same way, whatever kind of input it came from.
Text, an image, a sound — all become the same thing
Each input gets chopped into pieces and turned into vectors, the same "list of numbers" from lesson 3. From there, the model treats them all alike.
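If you'd like to see the chopping in code, here is a minimal sketch. Everything in it is a toy assumption: the vector size `DIM`, the helper names, and the random matrices that stand in for what a real model would learn during training. It only shows the shape of the idea: three kinds of input, one kind of output.

```python
import random

DIM = 4  # hypothetical vector size; real models use hundreds or thousands


def make_projection(in_size, out_size, seed):
    """Stand-in for a learned projection: a fixed random matrix (an assumption)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(in_size)] for _ in range(out_size)]


def project(matrix, values):
    """Multiply the matrix by a flat list of numbers, giving one vector."""
    return [sum(w * v for w, v in zip(row, values)) for row in matrix]


def image_to_vectors(image, patch):
    """Cut a grayscale image (a list of rows) into patch x patch squares,
    flatten each square, and project it to a DIM-sized vector."""
    proj = make_projection(patch * patch, DIM, seed=0)
    vectors = []
    for top in range(0, len(image), patch):
        for left in range(0, len(image[0]), patch):
            flat = [image[top + r][left + c]
                    for r in range(patch) for c in range(patch)]
            vectors.append(project(proj, flat))
    return vectors


def audio_to_vectors(samples, slice_len):
    """Cut a list of audio samples into fixed-length slices, one vector each."""
    proj = make_projection(slice_len, DIM, seed=1)
    return [project(proj, samples[i:i + slice_len])
            for i in range(0, len(samples) - slice_len + 1, slice_len)]


def text_to_vectors(tokens):
    """Look up one vector per token in a toy embedding table."""
    rng = random.Random(2)
    table = {}
    for tok in tokens:
        if tok not in table:
            table[tok] = [rng.uniform(-1, 1) for _ in range(DIM)]
    return [table[tok] for tok in tokens]


image = [[(r * 4 + c) / 15 for c in range(4)] for r in range(4)]  # 4x4 toy image
samples = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]             # 8 toy samples
tokens = ["the", "cat", "sat"]

image_vecs = image_to_vectors(image, patch=2)        # four 2x2 patches
audio_vecs = audio_to_vectors(samples, slice_len=4)  # two slices of 4 samples
text_vecs = text_to_vectors(tokens)                  # three tokens

# One sequence, one kind of item: every entry is a list of DIM numbers.
sequence = text_vecs + image_vecs + audio_vecs
print(len(sequence), "vectors, each of length", len(sequence[0]))
```

Once everything sits in `sequence`, nothing about an entry says "I was a pixel patch" or "I was a word". That is the whole trick; real systems add details this sketch leaves out, such as position information.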
2 · Giving it your own knowledge
The model is frozen (lesson 13) and never read your company handbook, your emails, or today's news. So how do tools "chat with your documents"? Not by retraining but by retrieval, usually called RAG (retrieval-augmented generation). When you ask something, the system first searches your files, grabs the most relevant passages, and pastes them into the prompt. The model then answers from text it can actually see in front of it, and can point you to the source.
Answering from your documents, not its memory
The frozen model never saw your files. Retrieval fixes that in three small steps: search your files, pick the most relevant text, and paste it into the prompt.
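Here is a rough sketch of those three steps, under loud assumptions: the file names and passages are made up, the helpers `retrieve` and `build_prompt` are hypothetical, and the search simply counts shared words. Real tools usually compare vectors instead, but the flow is the same. The sketch stops at the finished prompt, since that is the only thing the frozen model ever sees.

```python
import re


def words(text):
    """Lowercase a text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, passages, top_k=1):
    """Steps 1 and 2: score every passage by words shared with the question
    (a toy stand-in for real search) and keep the best top_k."""
    q = words(question)
    ranked = sorted(passages,
                    key=lambda p: len(q & words(p["text"])),
                    reverse=True)
    return ranked[:top_k]


def build_prompt(question, hits):
    """Step 3: paste the retrieved text, with its source, into the prompt."""
    context = "\n".join(f"[{h['source']}] {h['text']}" for h in hits)
    return (
        "Answer using only the passages below and cite the source in brackets.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Made-up documents for illustration only.
passages = [
    {"source": "handbook.txt",
     "text": "Employees get 25 days of paid vacation per year."},
    {"source": "it-policy.txt",
     "text": "Laptops are replaced every three years."},
    {"source": "travel.txt",
     "text": "Book flights through the travel portal."},
]

question = "How many vacation days do employees get?"
hits = retrieve(question, passages)
prompt = build_prompt(question, hits)
print(prompt)
```

Because each passage carries its `source` into the prompt, the model has something concrete to cite. Notice that no training happened anywhere: the only thing that changed is the text handed to the model.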
Notice the thread running through both: the model itself never changes. You either feed it a new kind of input, or feed it better text to work from. The engine underneath is the same next-word guesser you met on the very first page. The glossary is next, for any term you'd like to revisit.