ChromaDB Discussion Points

What is ChromaDB?

ChromaDB is an open-source, lightweight, and local-first vector database used to store embeddings and perform similarity search. It powers RAG systems by enabling fast retrieval of semantically similar text or data.

What are embeddings in ChromaDB?

Embeddings are high-dimensional numerical vectors representing text, images, audio, or structured data. They allow semantic operations like similarity search.

What is a collection in ChromaDB?

A collection is like a table containing:

  • embeddings
  • documents
  • metadata
  • unique IDs

You group related items into collections.

What distance metrics does ChromaDB support?

  • Squared Euclidean (L2), the default
  • Cosine
  • Inner (dot) product

The metric is chosen per collection when the collection is created.
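To make the three metrics concrete, here is a minimal pure-Python sketch. It assumes the distance definitions commonly documented for ChromaDB's HNSW index (squared L2, 1 minus the dot product, 1 minus cosine similarity); the function names are illustrative and not part of the ChromaDB API.

```python
import math

def squared_l2(a, b):
    """Squared Euclidean distance: sum of squared coordinate differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def inner_product_distance(a, b):
    """1 minus the dot product; smaller means more similar."""
    return 1.0 - sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 minus cosine similarity; ignores vector length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

query = [1.0, 0.0]
same_direction = [2.0, 0.0]  # same direction as the query, twice the length
orthogonal = [0.0, 1.0]      # unrelated direction

print(squared_l2(query, same_direction))              # 1.0
print(cosine_distance(query, same_direction))         # 0.0
print(cosine_distance(query, orthogonal))             # 1.0
print(inner_product_distance(query, same_direction))  # -1.0
```

Cosine distance treats the longer vector as identical to the query, while squared L2 penalizes the length difference. That is why cosine is a common choice for text embeddings that are not normalized.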

How do you create a collection in ChromaDB?

import chromadb
client = chromadb.Client()
collection = client.create_collection("my_collection")

What is an embedding function?

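An embedding function converts text or other data into vectors before they are stored or queried. If you do not specify one, ChromaDB falls back to a default Sentence Transformers model (all-MiniLM-L6-v2). It also supports embedding models from: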

  • HuggingFace
  • OpenAI
  • SentenceTransformers

Why is ChromaDB used in RAG (Retrieval-Augmented Generation)?

Because it:

  • Is fast at similarity lookups
  • Is easy to deploy
  • Supports metadata filtering
  • Works well for small to mid-scale RAG apps

How do you insert items into a ChromaDB collection?

collection.add(
    ids=["1"],
    documents=["Apple is a fruit"],
    metadatas=[{"type": "fruit"}]
)

How do you perform a similarity search in ChromaDB?

collection.query(query_texts=["What is Apple?"], n_results=3)

What is metadata filtering?

It allows narrowing down search results based on metadata fields.

collection.query(
    query_texts=["apple"],
    where={"type": "fruit"}
)

What is the difference between Client() and PersistentClient()?

Client()                 PersistentClient()
In-memory                Disk-based
Data lost on restart     Data saved to a local directory (SQLite since 0.4; DuckDB in earlier releases)
Good for testing         Good when data must survive restarts

What happens if you insert an existing ID?

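add() does not overwrite an existing record. Depending on the ChromaDB version, the duplicate ID is either silently ignored or rejected with an error. To replace a record, use update() or upsert(), or delete the old record first:

collection.delete(ids=["1"])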

What storage engine powers ChromaDB?

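Current ChromaDB releases (0.4 and later) store documents and metadata in SQLite and keep the vectors in an HNSW index. Earlier releases used DuckDB, a high-performance columnar database, for local storage.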

Does ChromaDB support hybrid search (keyword + vector search)?

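Not as a built-in feature of the local client. ChromaDB can restrict results by document text with a where_document filter (for example $contains), but ranking by combined keyword and vector relevance must be implemented manually or with another tool, such as Elasticsearch alongside ChromaDB.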
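
One common way to implement the manual combination is reciprocal rank fusion (RRF), which merges ranked lists from the two systems. This technique is an assumption brought in for illustration and is not part of ChromaDB. The ID lists below are hypothetical results from a vector query and a keyword engine.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked ID lists; each list contributes 1 / (k + rank) per ID."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results: IDs from a vector query and from a keyword engine.
vector_hits = ["a", "b", "c"]
keyword_hits = ["c", "a", "d"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)  # ['a', 'c', 'b', 'd']
```

Documents that rank well in both lists rise to the top. Because only ranks are used, the two systems' raw scores do not need to be on the same scale.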