What is ChromaDB?
ChromaDB is an open-source, lightweight, and local-first vector database used to store embeddings and perform similarity search. It powers RAG systems by enabling fast retrieval of semantically similar text or data.
What are embeddings in ChromaDB?
Embeddings are high-dimensional numerical vectors representing text, images, audio, or structured data. They allow semantic operations like similarity search.
What is a collection in ChromaDB?
A collection is like a table containing:
- embeddings
- documents
- metadata
- unique IDs
You group related items into collections.
What distance metrics does ChromaDB support?
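- Squared Euclidean (L2) distance (`l2`, the default)
- Cosine distance (`cosine`)
- Inner (dot) product (`ip`)
The metric is set when the collection is created, for example through the `hnsw:space` setting in the collection metadata in many releases.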
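The three metrics can be written out directly. The sketch below is plain Python, not ChromaDB code, and it assumes the distance conventions of the hnswlib index that Chroma builds on: squared L2, one minus the inner product, and one minus the cosine similarity. In all three cases a smaller value means "more similar". The function and variable names are illustrative.

```python
import math

def squared_l2(a, b):
    # Sum of squared coordinate differences (no square root taken).
    return sum((x - y) ** 2 for x, y in zip(a, b))

def inner_product_distance(a, b):
    # One minus the dot product; most meaningful for normalized vectors.
    return 1.0 - sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    # One minus the cosine of the angle between the vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

query = [1.0, 0.0]
same_direction = [2.0, 0.0]   # same direction, different length
orthogonal = [0.0, 1.0]       # unrelated direction

for name, vec in [("same_direction", same_direction), ("orthogonal", orthogonal)]:
    print(name,
          squared_l2(query, vec),
          inner_product_distance(query, vec),
          cosine_distance(query, vec))
```

Cosine distance ignores vector length, so `same_direction` scores 0 against `query` even though its squared L2 distance is non-zero. That difference is the main reason to pick one metric over another.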
How do you create a collection in ChromaDB?
import chromadb
client = chromadb.Client()
collection = client.create_collection("my_collection")
What is an embedding function?
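An embedding function converts text or other data into vectors before they are stored or queried. If none is specified, ChromaDB falls back to a default Sentence Transformers model (all-MiniLM-L6-v2). It also supports embedding models from: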
- HuggingFace
- OpenAI
- SentenceTransformers
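To make the idea concrete, here is a toy embedding function in plain Python. It is a sketch under stated assumptions: the class name `HashingEmbeddingFunction` is hypothetical, the hashed bag-of-words scheme is a stand-in for a real model, and the callable shape (a list of texts passed as `input`, a list of vectors returned) is modeled on how custom embedding functions are commonly written for Chroma. Check the library documentation for the exact base class before relying on it.

```python
import hashlib
import math

class HashingEmbeddingFunction:
    """Toy embedder: hashed bag-of-words, L2-normalized. Not a real model."""

    def __init__(self, dim=16):
        self.dim = dim

    def __call__(self, input):
        # One vector per input text, in the same order.
        return [self._embed(text) for text in input]

    def _embed(self, text):
        vec = [0.0] * self.dim
        for token in text.lower().split():
            # A stable hash, so the same token always lands in the same bucket.
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            bucket = int.from_bytes(digest[:4], "big") % self.dim
            vec[bucket] += 1.0
        norm = math.sqrt(sum(v * v for v in vec))
        # Normalize to unit length; leave an all-zero vector untouched.
        return [v / norm for v in vec] if norm else vec

embed = HashingEmbeddingFunction(dim=16)
vectors = embed(["Apple is a fruit", "apple is a FRUIT"])
print(len(vectors), len(vectors[0]))
```

Unlike a real embedding model, this toy only captures word overlap, not meaning. It is useful mainly for seeing the input and output shapes without downloading a model.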
Why is ChromaDB used in RAG (Retrieval-Augmented Generation)?
Because it:
- Is fast
- Is easy to deploy
- Supports metadata filtering
- Works well for small to mid-scale RAG apps
How do you insert items into a ChromaDB collection?
collection.add(
    ids=["1"],
    documents=["Apple is a fruit"],
    metadatas=[{"type": "fruit"}],
)
How do you perform a similarity search in ChromaDB?
collection.query(query_texts=["What is Apple?"], n_results=3)
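Conceptually, a query embeds the query text, measures its distance to the stored vectors, and returns the closest `n_results` items. The sketch below does this by brute force in plain Python with hand-made two-dimensional vectors. ChromaDB itself uses an approximate HNSW index rather than scanning every vector, and the helper name `query` and the sample data are illustrative. The nested-list result shape (one inner list per query) mirrors how Chroma groups results.

```python
def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def query(stored, query_embeddings, n_results=3):
    """stored maps id -> vector; returns one ranked list per query vector."""
    ids, distances = [], []
    for q in query_embeddings:
        # Rank every stored vector by distance to this query (brute force).
        ranked = sorted((squared_l2(q, vec), item_id)
                        for item_id, vec in stored.items())
        top = ranked[:n_results]
        ids.append([item_id for _, item_id in top])
        distances.append([dist for dist, _ in top])
    return {"ids": ids, "distances": distances}

# Hand-made vectors standing in for real embeddings.
stored = {
    "apple": [0.9, 0.1],
    "banana": [0.8, 0.3],
    "car": [0.1, 0.9],
}

results = query(stored, query_embeddings=[[1.0, 0.0]], n_results=2)
print(results["ids"][0])        # closest items for the first query
print(results["distances"][0])  # their distances, smallest first
```

Because results are grouped per query, the matches for a single query sit at index 0 of each field.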
What is metadata filtering?
It allows narrowing down search results based on metadata fields.
collection.query(
    query_texts=["apple"],
    where={"type": "fruit"},
)
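A `where` clause can be read as a predicate applied to each item's metadata. The sketch below evaluates a small subset of Chroma-style filters in plain Python: shorthand equality, comparison operators such as `$gte`, and `$and` / `$or`. It illustrates the semantics only and is not ChromaDB's implementation; `matches`, `filter_ids`, and the sample records are hypothetical.

```python
import operator

# Comparison operators supported by this toy matcher.
_COMPARATORS = {
    "$eq": operator.eq, "$ne": operator.ne,
    "$gt": operator.gt, "$gte": operator.ge,
    "$lt": operator.lt, "$lte": operator.le,
}

def matches(metadata, where):
    """Return True if the metadata dict satisfies the where clause."""
    for key, condition in where.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in condition):
                return False
        elif isinstance(condition, dict):
            # Operator form: {"field": {"$gte": 2023}}
            if key not in metadata:
                return False
            for op, expected in condition.items():
                if not _COMPARATORS[op](metadata[key], expected):
                    return False
        elif metadata.get(key) != condition:
            # Shorthand equality: {"field": value}
            return False
    return True

records = [
    {"id": "1", "metadata": {"type": "fruit", "year": 2021}},
    {"id": "2", "metadata": {"type": "vehicle", "year": 2023}},
    {"id": "3", "metadata": {"type": "fruit", "year": 2024}},
]

def filter_ids(where):
    return [r["id"] for r in records if matches(r["metadata"], where)]

fruit_ids = filter_ids({"type": "fruit"})
recent_fruit_ids = filter_ids(
    {"$and": [{"type": "fruit"}, {"year": {"$gte": 2023}}]}
)
print(fruit_ids, recent_fruit_ids)
```

In a real query the filter and the similarity ranking work together: only items whose metadata passes the predicate are eligible to appear in the results.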
What is the difference between Client() and PersistentClient()?
| Client | PersistentClient |
|---|---|
| In-memory | Disk-based |
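| Data lost on restart | Data saved to a local directory (SQLite-backed since release 0.4; older releases used DuckDB with Parquet) |
| Good for testing | Good for local development and apps that need data to survive restarts |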
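What happens if you insert an existing ID?
In recent releases, calling add() with an ID that already exists is ignored and the stored record is kept; older releases raised a duplicate-ID error instead. To overwrite the record, use collection.upsert() or collection.update(), or delete it first and add it again:
collection.delete(ids=["1"])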
What storage engine powers ChromaDB?
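Current ChromaDB releases (0.4 and later) store documents and metadata in SQLite and keep the vectors in an HNSW index (hnswlib). Earlier releases used DuckDB, a high-performance columnar database, with Parquet files for persistence.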
Does ChromaDB support hybrid search (keyword + vector search)?
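Not out of the box in the core local library. A `where_document` filter (for example `{"$contains": "apple"}`) can restrict vector results to documents containing a keyword, but combining keyword relevance scores with vector similarity must be implemented manually or with another tool, such as Elasticsearch alongside ChromaDB.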