Definition

Vector Database

A vector database is a specialized database designed to store, index, and search high-dimensional embedding vectors efficiently. Unlike traditional databases, which match exact values or keywords, a vector database finds the stored vectors most similar to a query vector. This enables semantic search, recommendation systems, and the retrieval component of RAG (retrieval-augmented generation) architectures.

How vector databases work

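Vector databases store embedding vectors alongside metadata (like file paths, function names, or code snippets). When you search, your query is converted to an embedding vector with the same model used at indexing time, and the database uses approximate nearest neighbor (ANN) algorithms to quickly find the most similar stored vectors, typically measured by cosine similarity or Euclidean distance. These algorithms, such as HNSW (a graph-based index) or IVF (a cluster-based inverted file index), trade a small amount of recall for large speed gains over comparing the query against every stored vector, which is what makes it practical to search millions of vectors in milliseconds.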
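
To make the idea of "most similar" concrete, here is a minimal sketch of the exact, brute-force search that ANN indexes approximate. It is not how HNSW or IVF work internally; it only shows the scoring they try to reproduce faster. The `StoredVector` shape, the function names, and the 3-dimensional toy embeddings are assumptions made up for illustration (real embeddings usually have hundreds or thousands of dimensions).

```typescript
// Hypothetical record shape: an embedding plus the metadata stored with it.
interface StoredVector {
  id: string;
  embedding: number[];
  metadata: { file: string };
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Ranges from -1 to 1.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as similar to nothing rather than dividing by zero.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exact search: score every stored vector, sort by score, keep the top k.
// This is O(n) per query, which is the cost ANN indexes are built to avoid.
function nearestNeighbors(query: number[], store: StoredVector[], k: number) {
  return store
    .map((item) => ({
      id: item.id,
      metadata: item.metadata,
      score: cosineSimilarity(query, item.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy embeddings, invented for this example.
const store: StoredVector[] = [
  { id: "auth-1", embedding: [0.9, 0.1, 0.0], metadata: { file: "src/auth.ts" } },
  { id: "auth-2", embedding: [0.8, 0.2, 0.1], metadata: { file: "src/auth.ts" } },
  { id: "payment-1", embedding: [0.0, 0.1, 0.9], metadata: { file: "src/payment.ts" } },
];

const top = nearestNeighbors([1, 0, 0], store, 2);
console.log(top.map((hit) => hit.id));
```

With these toy values the two auth vectors score highest, because they point in nearly the same direction as the query. An ANN index aims to return the same neighbors most of the time while inspecting only a small fraction of the stored vectors.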

Vector databases in AI coding tools

AI coding tools that index your codebase typically use a vector database under the hood. When a tool such as Cursor indexes your project, it creates embedding vectors for code chunks and stores them in a vector database, which may be local or hosted depending on the tool. When you ask a question, the tool converts your query to an embedding, searches the vector database for relevant code, and passes those code chunks to the LLM as context. This is the "retrieval" step in RAG that makes AI tools aware of your specific codebase.
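
The last step, passing retrieved chunks to the LLM as context, can be sketched roughly as follows. This is an illustrative assumption, not the prompt format of any particular tool: the `RetrievedChunk` shape, the `buildPrompt` name, and the character budget are all hypothetical.

```typescript
// Hypothetical shape of a chunk returned by the vector search.
interface RetrievedChunk {
  file: string;
  line: number;
  code: string;
  score: number; // similarity to the query, higher is closer
}

// Keep the highest-scoring chunks that fit within a character budget,
// then place them ahead of the user's question as context.
function buildPrompt(
  question: string,
  chunks: RetrievedChunk[],
  maxContextChars: number
): string {
  const ranked = [...chunks].sort((a, b) => b.score - a.score);
  const sections: string[] = [];
  let used = 0;
  for (const chunk of ranked) {
    const section = `// ${chunk.file}:${chunk.line}\n${chunk.code}`;
    // Stop at the first chunk that would exceed the budget.
    if (used + section.length > maxContextChars) break;
    sections.push(section);
    used += section.length;
  }
  return `Context:\n${sections.join("\n\n")}\n\nQuestion: ${question}`;
}

const prompt = buildPrompt(
  "how does user authentication work?",
  [
    { file: "src/payment.ts", line: 8, code: "function processPayment(amount, card) { ... }", score: 0.12 },
    { file: "src/auth.ts", line: 15, code: "function validateToken(jwt) { ... }", score: 0.91 },
  ],
  200
);
console.log(prompt);
```

A budget is used here because the LLM's context window is finite, so a retrieval step has to decide which chunks are worth the space. Real tools usually count tokens rather than characters.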

Popular vector databases in 2026

  • Pinecone: fully managed, serverless vector database optimized for production workloads
  • Weaviate: open-source with built-in vectorization and hybrid search
  • Chroma: lightweight, developer-friendly, popular for prototyping and small projects
  • Qdrant: high-performance, open-source, with advanced filtering capabilities
  • pgvector: PostgreSQL extension for teams that want vectors in their existing database
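
```typescript
// Using a vector database for code search (Chroma example).
// Assumes a Chroma server is reachable at the client's default address
// and that an embedding function is available to the client.
import { ChromaClient } from "chromadb";

const client = new ChromaClient();
// getOrCreateCollection avoids an error if "codebase" already exists
const collection = await client.getOrCreateCollection({ name: "codebase" });

// Index code chunks: each document is embedded and stored with its metadata
await collection.add({
  ids: ["auth-1", "auth-2", "payment-1"],
  documents: [
    "function validateToken(jwt) { ... }",
    "function refreshSession(userId) { ... }",
    "function processPayment(amount, card) { ... }"
  ],
  metadatas: [
    { file: "src/auth.ts", line: 15 },
    { file: "src/auth.ts", line: 42 },
    { file: "src/payment.ts", line: 8 }
  ]
});

// Semantic search: the query text is embedded and compared to stored chunks.
// nResults is an upper bound; only three chunks are indexed here.
const results = await collection.query({
  queryTexts: ["how does user authentication work?"],
  nResults: 5
});

// Results are grouped per query text, so index 0 holds the matches
console.log(results.ids[0], results.metadatas[0]);
```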

If you are building AI-powered features on top of your codebase, start with Chroma for prototyping, or with pgvector if you already use PostgreSQL. Consider migrating to Pinecone or Qdrant when you need production-grade performance and scaling.

Do I need a vector database to use AI coding tools?
No. Tools like Claude Code read files directly from your filesystem without a vector database. Other tools, such as Cursor, use a vector database internally for codebase indexing, but you do not need to set one up; the tool handles it automatically.
How is a vector database different from a regular database?
Regular databases excel at exact matches (find user where id = 123) and range queries. Vector databases excel at similarity queries (find the 10 code chunks most similar to this question). They use different indexing algorithms optimized for high-dimensional distance calculations.
Can I use PostgreSQL instead of a dedicated vector database?
Yes, with the pgvector extension. It is a good choice for smaller datasets (under 1 million vectors) and teams that want to keep their stack simple. For larger datasets or latency-critical applications, dedicated vector databases offer better performance.
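
As a rough sketch of what a pgvector similarity query looks like from application code, the helper below builds a parameterized query. The `code_chunks` table, its columns, and the function names are hypothetical; the `<=>` cosine distance operator and the `'[x,y,z]'` text format for vectors come from pgvector itself. The returned object is shaped like the `{ text, values }` query config that the node-postgres client accepts, which is an assumption about your driver.

```typescript
// Assumed one-time setup, run in PostgreSQL (table and column names are hypothetical):
//   CREATE EXTENSION IF NOT EXISTS vector;
//   CREATE TABLE code_chunks (
//     id bigserial PRIMARY KEY,
//     file text,
//     content text,
//     embedding vector(3)  -- use your embedding model's dimension here
//   );

// pgvector accepts vectors as text literals such as '[0.1,0.2,0.3]'.
function toVectorLiteral(embedding: number[]): string {
  if (embedding.length === 0 || embedding.some((x) => !Number.isFinite(x))) {
    throw new Error("Embedding must be a non-empty array of finite numbers");
  }
  return `[${embedding.join(",")}]`;
}

// Build a parameterized nearest-neighbor query.
// `<=>` is cosine distance, so ascending order puts the most similar rows first.
function similarityQuery(
  embedding: number[],
  limit: number
): { text: string; values: (string | number)[] } {
  return {
    text:
      "SELECT id, file, content FROM code_chunks " +
      "ORDER BY embedding <=> $1::vector LIMIT $2",
    values: [toVectorLiteral(embedding), limit],
  };
}

const query = similarityQuery([0.1, 0.2, 0.3], 5);
console.log(query.text, query.values);
```

Passing the vector as a bound parameter rather than splicing it into the SQL string keeps the query safe to reuse, and the same table can still be filtered with ordinary `WHERE` clauses on its other columns.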

Related terms

  • Context Window
  • Large Language Model (LLM)
  • Retrieval-Augmented Generation (RAG)
  • Embeddings

Related comparisons

  • Claude Code vs Cursor
  • Claude Code vs Cline
