Embeddings
Embeddings are dense numerical vectors (arrays of floating-point numbers) that represent text, code, or other data in a high-dimensional space where semantically similar items are positioned close together. They enable AI systems to measure similarity between pieces of code, search codebases by meaning rather than keywords, and power retrieval-augmented generation (RAG) systems.
How embeddings work
An embedding model converts text or code into a fixed-size vector—typically 768 to 3,072 dimensions. The key property is that semantically similar inputs produce vectors that are close together in the embedding space (measured by cosine similarity or Euclidean distance). "Sort an array in ascending order" and "arrange a list from smallest to largest" would have very similar embeddings, even though they use different words. This semantic understanding is what makes embedding-based search so much better than keyword matching for code.
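The similarity measure mentioned above can be made concrete with a few lines of code. This is a sketch of cosine similarity over plain number arrays; the example vectors are illustrative, not real model outputs:

```typescript
// Cosine similarity: dot product divided by the product of the
// vector magnitudes. 1 means identical direction, 0 means orthogonal.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0, 1], [1, 0, 1])); // 1 (identical)
console.log(cosineSimilarity([1, 0], [0, 1]));       // 0 (orthogonal)
```

Real embedding vectors have hundreds or thousands of dimensions, but the computation is the same.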
Embeddings in AI coding tools
Many AI coding tools use embeddings to index your codebase. Each function, class, or file chunk is converted to an embedding vector and stored in a vector database. When you ask a question, your query is also embedded, and the system finds the most semantically similar code chunks. This is how tools like Cursor can find relevant code even when your question uses different terminology than the source code. Claude Code takes a different approach—reading files directly—but embeddings remain fundamental to many code intelligence features.
// How code embeddings work (conceptual — assume an embed() helper
// that calls an embedding model and returns a number[] vector)

// Convert code to embeddings
const functionA = `function calculateTax(income) {
  return income * 0.3;
}`;
const functionB = `function computeLevy(salary) {
  return salary * 0.3;
}`;
const functionC = `function reverseString(s) {
  return s.split("").reverse().join("");
}`;

const embA = await embed(functionA); // [0.23, -0.11, 0.87, ...]
const embB = await embed(functionB); // [0.24, -0.10, 0.85, ...] ← similar to A
const embC = await embed(functionC); // [0.71, 0.33, -0.42, ...] ← different

// cosineSimilarity(embA, embB) ≈ 0.97 (very similar)
// cosineSimilarity(embA, embC) ≈ 0.21 (not similar)
You do not need to understand the math behind embeddings to use AI coding tools effectively. But knowing that they exist helps you understand why semantic code search works and why some tools need to "index" your codebase before they can search it.
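The indexing flow described earlier (embed chunks, embed the query, return the nearest matches) can be sketched as a brute-force nearest-neighbor search. The toy 3-dimensional vectors and file paths below are made up for illustration; a real tool would use a vector database and a real embedding model rather than an in-memory array:

```typescript
// Minimal semantic-search sketch: rank stored code chunks by cosine
// similarity to a query vector and return the top k.
type Chunk = { path: string; vector: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function topK(query: number[], index: Chunk[], k: number): Chunk[] {
  return [...index]
    .sort((x, y) => cosine(query, y.vector) - cosine(query, x.vector))
    .slice(0, k);
}

// Toy "embeddings": tax.ts points one way, string.ts another
const index: Chunk[] = [
  { path: "tax.ts",    vector: [0.9, 0.1, 0.0] },
  { path: "string.ts", vector: [0.0, 0.2, 0.9] },
];

// A query vector close to tax.ts retrieves tax.ts first
console.log(topK([0.8, 0.2, 0.1], index, 1)[0].path); // "tax.ts"
```

Production systems replace the linear scan with approximate nearest-neighbor indexes so lookups stay fast over millions of chunks, but the ranking principle is the same.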