Retrieval-Augmented Generation (RAG)
Retrieval-augmented generation (RAG) is an AI architecture that improves the accuracy of language model responses by retrieving relevant information from external knowledge sources before generating an answer. Instead of relying solely on what the model memorized during training, RAG fetches up-to-date, domain-specific data and includes it in the model's context.
How RAG works
RAG operates in two phases. First, the retrieval step: when a user asks a question, the system searches a knowledge base (documents, code, databases) for relevant information using semantic search with embeddings. Second, the generation step: the retrieved information is added to the model's prompt as context, and the LLM generates a response grounded in that specific data. This means the model can reference documentation it was never trained on and provide answers that are current and specific to your domain.
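The two phases above can be sketched end-to-end in a few lines. This is a toy illustration, not a real implementation: the bag-of-words `embed` function stands in for a learned embedding model, and `embed`, `cosine`, `retrieve`, and `buildPrompt` are all hypothetical helpers rather than any library's API.

```javascript
// Toy word-count "embedding" — real systems use learned dense vectors.
function embed(text) {
  const vec = {};
  for (const word of text.toLowerCase().match(/\w+/g) ?? []) {
    vec[word] = (vec[word] ?? 0) + 1;
  }
  return vec;
}

// Cosine similarity between two sparse vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (const k of new Set([...Object.keys(a), ...Object.keys(b)])) {
    const x = a[k] ?? 0, y = b[k] ?? 0;
    dot += x * y; na += x * x; nb += y * y;
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// Phase 1: retrieval — rank knowledge-base entries by similarity to the query.
function retrieve(query, knowledgeBase, topK = 2) {
  const q = embed(query);
  return knowledgeBase
    .map(doc => ({ doc, score: cosine(q, embed(doc)) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map(r => r.doc);
}

// Phase 2: generation — retrieved text is placed in the prompt as context.
function buildPrompt(query, knowledgeBase) {
  const context = retrieve(query, knowledgeBase).join("\n");
  return `Context:\n${context}\n\nQuestion: ${query}`;
}

const kb = [
  "RAG retrieves documents before generating an answer.",
  "The capital of France is Paris.",
  "Embeddings map text to vectors for semantic search.",
];
const prompt = buildPrompt("How does RAG retrieve documents?", kb);
```

In a real system, the prompt built in phase 2 would then be sent to an LLM; the key point is that the model only ever sees the top-ranked retrieved text, not the whole knowledge base.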
RAG in AI coding tools
AI coding tools use RAG-like patterns extensively. When Cursor or Cline indexes your codebase, it creates embeddings of your files and retrieves relevant code when you ask a question. This is why these tools can answer questions about your specific project rather than just giving generic coding advice. Claude Code takes a different approach—it reads files directly from your filesystem on demand rather than pre-indexing, but the principle is the same: ground AI responses in your actual code.
```javascript
// Simplified RAG pipeline for a coding assistant

// 1. Index: convert code files to embeddings
const chunks = splitCodeIntoChunks(codebase);
const embeddings = await model.embed(chunks);
await vectorDB.upsert(embeddings);

// 2. Retrieve: find relevant code for the query
const queryEmbedding = await model.embed(userQuestion);
const relevantCode = await vectorDB.search(queryEmbedding, { topK: 10 });

// 3. Generate: answer with retrieved context
const response = await llm.generate({
  prompt: `Given this code context:\n${relevantCode}\n\nAnswer: ${userQuestion}`
});
```

RAG quality depends on retrieval quality. If the system retrieves irrelevant code, the LLM generates irrelevant answers. Good chunking strategies and embedding models matter as much as the generation model.
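To illustrate why chunking matters, here is one possible sketch of the `splitCodeIntoChunks` helper used in the pipeline above. The boundary heuristic and size cap are assumptions for illustration; production indexers often use AST-aware splitters instead of regex matching.

```javascript
// Toy chunker: start a new chunk at each top-level declaration so that
// each chunk stays semantically whole, which tends to retrieve better
// than fixed-size line windows.
function splitCodeIntoChunks(source, maxLines = 40) {
  const chunks = [];
  let current = [];
  for (const line of source.split("\n")) {
    const isBoundary = /^(function|class|const|export)\b/.test(line);
    // Close the current chunk at a declaration boundary or the size cap.
    if ((isBoundary && current.length > 0) || current.length >= maxLines) {
      chunks.push(current.join("\n"));
      current = [];
    }
    current.push(line);
  }
  if (current.length > 0) chunks.push(current.join("\n"));
  return chunks;
}

const src = [
  "function add(a, b) {",
  "  return a + b;",
  "}",
  "function sub(a, b) {",
  "  return a - b;",
  "}",
].join("\n");
const chunks = splitCodeIntoChunks(src);
// chunks contains one entry per function
```

Splitting mid-function, by contrast, produces chunks whose embeddings describe half an idea, so even a strong embedding model will rank them poorly against a query.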
What is the difference between RAG and fine-tuning?
Do all AI coding tools use RAG?
What are the limitations of RAG?