How AI Works - RAG, LLMs, Temperature & MCTS Visualized

Live inference

Watch a thought travel through a neural network

A prompt goes in. Neurons fire across the transformer's layers, attention links light up — and ASTERIZER's model returns an answer. Running on a loop, just like the real thing.

Plain English: an AI answer isn't looked up in a table — it flows through millions of tiny connected units that each nudge the result, until a final word comes out.

RAG — giving AI a memory of facts

Retrieval-Augmented Generation (RAG) lets an AI look things up instead of guessing. A question comes in; the AI turns your words — and its stored facts — into vectors (lists of numbers), finds the closest match with cosine similarity, then answers using that fact. Watch it run:

Question → Embed (Vectors) → Cosine Match → Answer

Your Question

Who is the Prime Minister?

Knowledge Base

The President is Obama

The Prime Minister is Modi

Generated Answer

LLM

The Prime Minister is Modi

Plain English: the AI doesn't "know" the answer. It looks it up by matching meaning, then phrases it back to you.
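
The lookup step in the demo can be sketched in a few lines of Python. This is an illustration under loud assumptions, not ASTERIZER's implementation: `embed` is a stand-in that only counts words (a real system would call a learned embedding model), and `retrieve` is a hypothetical helper name.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (assumption, not a real model)."""
    return Counter(text.lower().replace("?", "").split())

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(question: str, knowledge_base: list[str]) -> str:
    """Return the stored fact whose vector is closest to the question's."""
    q = embed(question)
    return max(knowledge_base, key=lambda fact: cosine_similarity(q, embed(fact)))

knowledge_base = ["The President is Obama", "The Prime Minister is Modi"]
question = "Who is the Prime Minister?"

best_fact = retrieve(question, knowledge_base)
print(best_fact)  # the fact a real system would hand to the LLM as context
```

In a full RAG pipeline the retrieved fact would be placed in the LLM's prompt so the model can phrase the answer; here the sketch stops at retrieval.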

Words become numbers

Embeddings are how computers read language: by turning it into numbers. Each piece of text, whether a single token or a whole passage, becomes a vector, and texts with similar meanings land on nearby vectors.

Plain English: "king" and "queen" land near each other; "banana" lands far away. That closeness is what powers search and RAG.
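
A small sketch of that closeness, assuming hand-made 3-number vectors. The values in `toy_embeddings` are invented for illustration; real embeddings are learned by a model and usually have hundreds or thousands of dimensions.

```python
import math

# Invented vectors for illustration only (assumption, not real model output).
toy_embeddings = {
    "king":   [0.90, 0.80, 0.10],
    "queen":  [0.85, 0.90, 0.10],
    "banana": [0.10, 0.05, 0.95],
}

def cosine_similarity(u, v):
    """1.0 means same direction; values near 0 mean unrelated directions."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

king_queen = cosine_similarity(toy_embeddings["king"], toy_embeddings["queen"])
king_banana = cosine_similarity(toy_embeddings["king"], toy_embeddings["banana"])

print(f"king vs queen:  {king_queen:.2f}")
print(f"king vs banana: {king_banana:.2f}")
```

With these made-up numbers, "king" and "queen" score close to 1 while "king" and "banana" score much lower, which is the pattern a search or RAG system relies on.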

An LLM predicts one word at a time

Next-token prediction is the whole trick behind a Large Language Model (LLM): it reads what's written so far, scores every possible next token (a word or part of one), picks one, adds it to the text, then repeats. Token by token, it builds a sentence.

The cat sat on the

Plain English: the AI is basically a very, very good autocomplete, picking the next word from a ranked list of guesses.
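
The predict-append-repeat loop can be sketched like this. Everything here is an assumption for illustration: `NEXT_WORD_PROBS` is a tiny hand-written table keyed by the last two words, whereas a real LLM computes probabilities over tens of thousands of tokens from the whole context.

```python
# Hypothetical next-word probabilities, keyed by the previous two words.
# "<s>" marks the start of the text.
NEXT_WORD_PROBS = {
    ("<s>", "the"): {"cat": 0.6, "dog": 0.4},
    ("the", "cat"): {"sat": 0.7, "ran": 0.3},
    ("cat", "sat"): {"on": 0.9, "down": 0.1},
    ("sat", "on"): {"the": 0.8, "a": 0.2},
    ("on", "the"): {"mat": 0.6, "sofa": 0.3, "roof": 0.1},
}

def generate(prompt, max_new_words=5):
    """Repeatedly pick the most likely next word and append it."""
    words = prompt.lower().split()
    for _ in range(max_new_words):
        context = tuple((["<s>"] + words)[-2:])  # last two words seen so far
        candidates = NEXT_WORD_PROBS.get(context)
        if candidates is None:  # the toy table has no continuation: stop
            break
        next_word = max(candidates, key=candidates.get)  # top-ranked guess
        words.append(next_word)
    return " ".join(words)

print(generate("The cat sat on the"))
```

The loop never plans the whole sentence in advance; each word is chosen only from what has been written so far.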

Temperature — the creativity dial

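Temperature controls how risky the AI's word choices are. It is a single number that the model's raw scores are divided by before they become probabilities: values below 1 sharpen the odds toward the top choice, and values above 1 flatten them. Drag the slider and watch the distribution change in real time.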

Temperature 0.70 Balanced

Plain English: low temperature = the safe, obvious word nearly every time. High temperature = the AI takes more creative gambles.
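
A minimal sketch of the dial, assuming the common formulation in which raw scores (logits) are divided by the temperature before a softmax turns them into probabilities. The words and scores below are made up, and `softmax_with_temperature` is a hypothetical helper name.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

words = ["mat", "sofa", "roof", "moon"]
logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical raw scores

for t in (0.2, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    summary = ", ".join(f"{w}={p:.2f}" for w, p in zip(words, probs))
    print(f"temperature {t}: {summary}")
```

At the lowest setting nearly all of the probability sits on the top word; as the temperature rises, the less likely words get a real chance of being picked.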

Sampling — how it picks from the options

Sampling strategies decide which candidate words are even allowed in the running before the AI chooses one. Greedy, Top-k, and Top-p each draw that shortlist differently.

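Plain English: greedy always grabs #1 (safe but repetitive). Top-k keeps the k most likely words; top-p keeps the smallest set of words whose probabilities add up to at least p. Either way, the AI picks from a shortlist, so answers stay fresh but sensible.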
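
The three strategies can be compared side by side in a short sketch. The probabilities are invented, and the function names are hypothetical; production libraries apply the same ideas to full token distributions.

```python
import random

def renormalize(probs):
    """Rescale the kept probabilities so they sum to 1 again."""
    total = sum(probs.values())
    return {word: p / total for word, p in probs.items()}

def greedy(probs):
    """Always take the single most likely word."""
    return max(probs, key=probs.get)

def top_k_shortlist(probs, k):
    """Keep only the k most likely words."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return renormalize(dict(ranked[:k]))

def top_p_shortlist(probs, p):
    """Keep the smallest set of top words whose probabilities reach p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = {}, 0.0
    for word, prob in ranked:
        kept[word] = prob
        cumulative += prob
        if cumulative >= p:
            break
    return renormalize(kept)

def sample(probs, rng):
    """Draw one word at random, weighted by its probability."""
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]

probs = {"mat": 0.50, "sofa": 0.25, "roof": 0.15, "moon": 0.10}  # made-up odds
rng = random.Random(0)  # fixed seed so the run is repeatable

print("greedy:", greedy(probs))
print("top-k (k=2) shortlist:", sorted(top_k_shortlist(probs, 2)))
print("top-p (p=0.8) shortlist:", sorted(top_p_shortlist(probs, 0.8)))
print("sampled from top-k:", sample(top_k_shortlist(probs, 2), rng))
```

Top-k always keeps a fixed number of words, while top-p keeps more words when the model is unsure and fewer when one word dominates.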

MCTS — how AI plans ahead

Monte Carlo Tree Search (MCTS) is how many game-playing AIs plan their moves, most famously Go engines such as AlphaGo. They imagine many possible futures, test them, and reinforce the moves that tend to win, in four steps on a loop:

1. Select → 2. Expand → 3. Simulate → 4. Back-propagate

Plain English: the AI daydreams thousands of "what if I play here?" scenarios, then trusts the paths that worked out best.
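
The four steps can be sketched on a toy game. The game is an assumption chosen for brevity: two players take turns removing 1 to 3 stones from a pile, and whoever takes the last stone wins. `Node`, `ucb1`, and `mcts` are hypothetical names, and real engines add many refinements (such as neural networks that guide the search).

```python
import math
import random

class Node:
    def __init__(self, stones, player_to_move, parent=None, move=None):
        self.stones = stones                  # stones left in the pile
        self.player_to_move = player_to_move  # 0 or 1
        self.parent = parent
        self.move = move                      # move that led to this node
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= stones]
        self.visits = 0
        self.wins = 0.0  # wins for the player who made `move`

def ucb1(child, parent_visits, c=1.4):
    """Balance moves that won often against moves rarely tried."""
    exploit = child.wins / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore

def mcts(root_stones, iterations=3000, seed=0):
    rng = random.Random(seed)
    root = Node(root_stones, player_to_move=0)
    for _ in range(iterations):
        node = root
        # 1. Select: walk down fully expanded nodes using UCB1.
        while not node.untried and node.children:
            parent = node
            node = max(parent.children, key=lambda ch: ucb1(ch, parent.visits))
        # 2. Expand: add one untried move as a new child.
        if node.untried:
            move = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.stones - move, 1 - node.player_to_move, node, move)
            node.children.append(child)
            node = child
        # 3. Simulate: play random moves until the pile is empty.
        stones, player = node.stones, node.player_to_move
        while stones > 0:
            stones -= rng.randint(1, min(3, stones))
            player = 1 - player
        winner = 1 - player  # whoever moved last took the final stone
        # 4. Back-propagate: update statistics up to the root.
        while node is not None:
            node.visits += 1
            if node.parent is not None and winner == node.parent.player_to_move:
                node.wins += 1
            node = node.parent
    # Recommend the most-visited move from the root.
    return max(root.children, key=lambda ch: ch.visits).move

best_move = mcts(root_stones=5)
print("suggested move with 5 stones:", best_move)
```

In this game, leaving the opponent a multiple of 4 stones is the known winning strategy, so with 5 stones the search should settle on taking 1. The search is not told that rule; it finds the move only from the win counts of its simulated games.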

AI jargon, decoded

The words you keep hearing — one plain-English line each.

Token: A chunk of text, roughly a word or part of one. AI reads and writes in tokens.

Parameters: The billions of tiny dials a model tunes in training to store what it “knows.”

Context window: How much text the model can hold in mind at once, its short-term memory.

Fine-tuning: Extra training that specializes a general model for a specific job or style.

Hallucination: When a model confidently states something false. RAG and grounding reduce it.

Inference: Running a trained model to get an answer. It is what happens every time you ask.

Transformer: The neural-network design behind modern LLMs. It weighs how words relate using attention.

Attention: How a model decides which earlier words matter most when predicting the next one.

Temperature: A dial for creativity. Low is safe and predictable, high is surprising.

Prompt: The instruction or question you give the model. The better the prompt, the better the answer.

System prompt: Hidden instructions that set the model's role, tone, and rules before you start chatting.

RAG: Retrieval-Augmented Generation. The model looks up real facts before answering, so it guesses less.

Embedding: Turning text into a list of numbers (a vector) so similar meanings sit close together.

Agent: An AI that can take actions (use tools, browse, or call APIs), not just chat.

Chain-of-thought: Letting a model reason step by step, which often makes its answers more reliable.

Want this intelligence inside your product?

We design and ship RAG systems, LLM apps, and AI features that are fast, accurate, and production-ready.
