ChromaDB RAG

Your content gets written knowing what you already know.

A semantic retrieval layer over everything you've brought into the platform: sources, extracted learnings (patterns), and the memory log. It injects the relevant context into the generation prompt automatically, so the user never has to remember, copy, or reference anything manually.

How it helps content creation

Context enters the prompt before the LLM starts writing.

When the user clicks Generate on a piece of content, assemble_context(brief, collection_ids, client_id) runs before the LLM call and orchestrates three parallel lookups.

Sources collection

Chunks of imported material: articles, PDFs, LinkedIn exports, Substack posts, YouTube transcripts, repo READMEs. The lookup returns the five chunks most similar to the brief.

Patterns collection

Learnings already extracted for the client/project in question — writing_style, product_knowledge, design_system.

Global memory

Log of interactions and learnings saved over time. Past decisions, preferences, wins and misses — all queryable.

Assembled context block

The three results are merged into a single structured block and injected into the prompt before the writing instruction, so the LLM starts writing with that context already in place.
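The flow described here can be sketched in a few lines. This is a minimal illustration, not the platform's actual code: only the name assemble_context and its three parameters come from the description above. The extra retrievers argument, the section labels, the build_prompt helper, and the use of a thread pool for the parallel lookups are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

# Hypothetical retriever signature: (brief, collection_ids, client_id) -> text snippets.
Retriever = Callable[[str, Sequence[str], str], list[str]]


def assemble_context(
    brief: str,
    collection_ids: Sequence[str],
    client_id: str,
    retrievers: dict[str, Retriever],  # assumption: lookups are passed in by label
) -> str:
    """Run every lookup concurrently and merge the hits into one labelled block."""
    with ThreadPoolExecutor(max_workers=max(1, len(retrievers))) as pool:
        futures = {
            label: pool.submit(fn, brief, collection_ids, client_id)
            for label, fn in retrievers.items()
        }
        results = {label: future.result() for label, future in futures.items()}

    sections = []
    for label, hits in results.items():
        if hits:  # skip empty lookups so the prompt stays compact
            lines = "\n".join(f"- {hit}" for hit in hits)
            sections.append(f"## {label}\n{lines}")
    return "\n\n".join(sections)


def build_prompt(context_block: str, brief: str) -> str:
    """Place the retrieved context before the writing instruction."""
    return f"{context_block}\n\n---\nWrite the content for this brief:\n{brief}"
```

In a real deployment each retriever would query its own collection (sources, patterns, memory); here they can be any callables, which also makes the merge step easy to test with stubs.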

The result

The LLM writes in your real voice (via writing_style patterns), with your product vocabulary (via product_knowledge patterns), and with facts you've already gathered (via sources), without you having to re-explain them on every brief.

Indexing

What enters the vector store, and when.

What	When it enters the vector store
Sources (URL, file, paste, platform scrape)	On import — auto chunking + embedding
Extracted patterns	On learning save
Memory events	On relevant interaction
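To make "auto chunking" concrete, here is a rough sketch of what the import step could look like before embedding. The chunk size, the overlap, the character-based splitting, and the id scheme are illustrative assumptions; the platform's real chunker is not described here and may well split on tokens or sentences instead.

```python
import hashlib


def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that overlap slightly."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):  # the last window reached the end of the text
            break
    return chunks


def chunk_records(source_id: str, text: str, **chunk_options) -> list[dict]:
    """Pair each chunk with a deterministic id and basic metadata."""
    records = []
    for position, chunk in enumerate(chunk_text(text, **chunk_options)):
        raw = f"{source_id}:{position}:{chunk}".encode()
        digest = hashlib.sha256(raw).hexdigest()[:16]
        records.append({
            "id": f"{source_id}-{digest}",  # same input always yields the same id
            "document": chunk,
            "metadata": {"source_id": source_id, "position": position},
        })
    return records
```

Deterministic ids mean that importing the same source twice produces the same records, so an upsert into the vector store would not create duplicates.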

Real stack

The infra behind the RAG.

Vector store: Persistent ChromaDB, cosine metric
Embedding provider: OpenAI (default)
Embedding model: text-embedding-3-small — configurable in Settings → Embeddings; swap the model without a redeploy.
Namespacing: One Chroma collection per logical collection + per-client scoping, preventing cross-contamination between clients.
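As a hedged sketch of how this stack could be wired up with the chromadb Python package: the naming scheme (client slug, double underscore, logical name) and both helper functions are hypothetical, and the storage path is whatever you choose. The Chroma calls themselves, PersistentClient and get_or_create_collection with the "hnsw:space" metadata key, are the ones documented for selecting cosine distance in the Chroma versions we are aware of.

```python
import re


def collection_name(client_id: str, logical_name: str) -> str:
    """Build a per-client collection name, e.g. 'acme-corp__sources'."""
    def slug(value: str) -> str:
        cleaned = re.sub(r"[^a-z0-9]+", "-", value.lower()).strip("-")
        if not cleaned:
            raise ValueError(f"cannot derive a name from {value!r}")
        return cleaned

    return f"{slug(client_id)}__{slug(logical_name)}"


def open_collection(path: str, client_id: str, logical_name: str):
    """Open (or create) one client's collection in a persistent store."""
    import chromadb  # lazy import: only needed when a store is actually opened

    client = chromadb.PersistentClient(path=path)
    return client.get_or_create_collection(
        name=collection_name(client_id, logical_name),
        metadata={"hnsw:space": "cosine"},  # cosine metric, as listed above
    )
```

Because the client id is part of every collection name, a query for one client cannot touch another client's vectors. An OpenAI embedding function would be supplied through the embedding_function argument of get_or_create_collection; it is left out here so the sketch needs no API key.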

Secondary uses

Beyond content generation, the same index supports three more uses.

Memory semantic search — "What did we decide about client X's tone in April?" queries the same vector store.
Chat tool search_memory — The AI Chat agent queries RAG when it needs to pull historical context to answer.
Pattern merging — Semantic similarity helps detect redundant patterns before merging them.
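For pattern merging, the idea of flagging redundant patterns by semantic similarity can be illustrated as below. The 0.9 threshold, the function names, and the pairwise comparison are assumptions chosen for the example, and the vectors are toy values standing in for real embeddings.

```python
import math
from itertools import combinations


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def redundant_pairs(embeddings: dict[str, list[float]], threshold: float = 0.9):
    """Return (id_a, id_b, similarity) for every pair at or above the threshold."""
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(embeddings.items(), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score >= threshold:
            pairs.append((id_a, id_b, score))
    # Most similar pairs first, so the likeliest duplicates are reviewed first.
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)
```

Pairs returned by redundant_pairs are candidates for a merge, not automatic merges; a reviewer or a follow-up LLM step would still decide whether two patterns really say the same thing.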

Highlights

Three things that set this RAG apart from a plain 'pgvector and done' setup.

Per-client isolation

Each client has its own namespace. One client's voice never leaks into another's content — a hard requirement for agencies.

Hot-swap the embedding model

Swap the embedding model in Settings with no redeploy. Reindexing runs on demand rather than immediately, which keeps embedding costs under control.
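Vectors produced by different embedding models are not comparable, so on-demand reindexing implies knowing which collections still hold vectors from the previous model. One way to track that is sketched below; the CollectionState record and the stale_collections helper are hypothetical and only illustrate the bookkeeping, not how the platform stores it.

```python
from dataclasses import dataclass


@dataclass
class CollectionState:
    name: str
    indexed_with: str  # embedding model that produced the stored vectors


def stale_collections(states: list[CollectionState], active_model: str) -> list[str]:
    """Names of collections whose vectors came from a model other than the active one."""
    return [state.name for state in states if state.indexed_with != active_model]


# Example: after switching Settings to a different model, list what needs a reindex.
states = [
    CollectionState("acme__sources", "text-embedding-3-small"),
    CollectionState("acme__patterns", "text-embedding-3-large"),
]
needs_reindex = stale_collections(states, active_model="text-embedding-3-large")
```

Until a stale collection is reindexed, queries against it would need to keep using the model it was indexed with, or skip it.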

One index, three surfaces

The same index powers content generation, memory search, and chat tool calling. One place to curate, three to harvest.

Want to see this running on your own pipeline?

We'll show you in a quick demo, using data you already work with.
