Your content is written knowing what you already know.
A semantic retrieval layer sits over everything you've brought into the platform: sources, extracted learnings (patterns), and the memory log. It injects the relevant context into the generation prompt automatically, so the user doesn't have to remember, copy, or reference anything manually.
Context enters the prompt before the LLM starts writing.
When the user clicks Generate on a piece of content, assemble_context(brief, collection_ids, client_id) runs before the LLM call and orchestrates three parallel lookups.
- Sources collection
- Patterns collection
- Global memory

The results are then assembled into a single context block.
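The flow above can be sketched roughly as follows. Only the name and signature of assemble_context come from this page; the three lookup functions, the section titles, and build_prompt are hypothetical stand-ins, and a thread pool is just one plausible way to run the lookups in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins: the real lookups would query the vector store.
def lookup_sources(brief, collection_ids, client_id):
    return [f"source hit for '{brief}' in {cid}" for cid in collection_ids]

def lookup_patterns(brief, collection_ids, client_id):
    return [f"pattern hit for '{brief}'"]

def lookup_memory(brief, client_id):
    return [f"memory hit for client {client_id}"]

def assemble_context(brief, collection_ids, client_id):
    """Run the three lookups in parallel and join the hits into one block."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        sources = pool.submit(lookup_sources, brief, collection_ids, client_id)
        patterns = pool.submit(lookup_patterns, brief, collection_ids, client_id)
        memory = pool.submit(lookup_memory, brief, client_id)
        sections = {
            "Sources": sources.result(),
            "Patterns": patterns.result(),
            "Memory": memory.result(),
        }
    lines = []
    for title, hits in sections.items():
        if hits:  # skip empty sections so the prompt stays compact
            lines.append(f"## {title}")
            lines.extend(f"- {hit}" for hit in hits)
    return "\n".join(lines)

def build_prompt(brief, collection_ids, client_id):
    """Place the context block ahead of the brief, before any LLM call."""
    context = assemble_context(brief, collection_ids, client_id)
    return f"{context}\n\n# Brief\n{brief}"
```

Calling build_prompt("launch post", ["c1"], "acme") would return the context block followed by the brief, which is the string that would then be sent to the model.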
patterns), your product vocabulary (via product_knowledge), and facts you've already gathered (via sources), without you having to re-explain on every brief.

What enters the vector store, and when.
| What | When it enters the vector store |
|---|---|
| Sources (URL, file, paste, platform scrape) | On import — auto chunking + embedding |
| Extracted patterns | On learning save |
| Memory events | On relevant interaction |
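As an illustration of the "auto chunking" step on import, here is a minimal sketch. The page does not say how the platform actually chunks, so the character-window strategy, the sizes, and the names chunk_text and build_records are all assumptions.

```python
def chunk_text(text, max_chars=800, overlap=100):
    """Split text into overlapping character windows (sizes are assumptions)."""
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    if not text:
        return []
    chunks = []
    start = 0
    step = max_chars - overlap
    while True:
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks

def build_records(source_id, client_id, text):
    """Shape chunks as ids/documents/metadatas, ready for a vector store."""
    chunks = chunk_text(text)
    return {
        "ids": [f"{source_id}:{i}" for i in range(len(chunks))],
        "documents": chunks,
        "metadatas": [
            {"source_id": source_id, "client_id": client_id, "chunk": i}
            for i in range(len(chunks))
        ],
    }
```

The returned dictionary uses the keyword names that Chroma's collection.add accepts, so a record could be passed along as collection.add(**records) and embedded by the collection's embedding function.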
The infra behind the RAG.
- Vector store: persistent ChromaDB, cosine metric
- Embedding provider: OpenAI (default)
- Embedding model: text-embedding-3-small, configurable in Settings → Embeddings; swap the model without a redeploy.
- Namespacing: one Chroma collection per logical collection, plus per-client scoping, preventing cross-contamination between clients.
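A setup along these lines could be wired up with the chromadb Python client as sketched below. The naming scheme in collection_name, the function open_collection, and its parameters are assumptions made for illustration; the page only states that collections are per logical collection and scoped per client.

```python
import re

def collection_name(client_id, logical_collection):
    """Build a per-client collection name (the scheme itself is an assumption)."""
    raw = f"client-{client_id}-{logical_collection}".lower()
    # Keep only characters that are safe in a Chroma collection name.
    name = re.sub(r"[^a-z0-9_-]+", "-", raw).strip("-_")
    # Truncate conservatively; very long ids could collide after truncation.
    return name[:63].rstrip("-_")

def open_collection(path, client_id, logical_collection, api_key,
                    model_name="text-embedding-3-small"):
    """Open (or create) a persistent, cosine-metric collection for one client."""
    # Imported lazily so collection_name works without chromadb installed.
    import chromadb
    from chromadb.utils import embedding_functions

    client = chromadb.PersistentClient(path=path)
    embed = embedding_functions.OpenAIEmbeddingFunction(
        api_key=api_key, model_name=model_name
    )
    return client.get_or_create_collection(
        name=collection_name(client_id, logical_collection),
        embedding_function=embed,
        metadata={"hnsw:space": "cosine"},  # cosine distance for the index
    )
```

Because the client id is part of the collection name, a query issued for one client never touches another client's vectors. A retrieval call would then look like collection.query(query_texts=[brief], n_results=5).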
The same index powers three surfaces.
- Memory semantic search — "What did we decide about client X's tone in April?" queries the same vector store.
- Chat tool search_memory — The AI Chat agent queries RAG when it needs to pull historical context to answer.
- Pattern merging — Semantic similarity helps detect redundant patterns before merging them.
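To make the pattern-merging idea concrete, here is a minimal sketch that flags pattern pairs whose embeddings are nearly parallel. The 0.9 threshold, the function names, and the toy two-dimensional vectors are assumptions; real embeddings would come from the configured embedding model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def find_redundant_pairs(patterns, threshold=0.9):
    """Return (id_a, id_b, score) for pattern pairs at or above the threshold.

    `patterns` maps a pattern id to its embedding vector.
    """
    ids = sorted(patterns)
    pairs = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            score = cosine_similarity(patterns[a], patterns[b])
            if score >= threshold:
                pairs.append((a, b, round(score, 3)))
    return pairs

# Toy vectors: p1 and p2 point almost the same way, p3 is orthogonal to p1.
candidates = find_redundant_pairs(
    {"p1": [1.0, 0.0], "p2": [0.99, 0.05], "p3": [0.0, 1.0]}
)
```

Here candidates contains only the p1/p2 pair, which could then be shown to a human or passed to a merge step. The pairwise loop is quadratic, so for large pattern sets a nearest-neighbour query against the vector store would be the more practical choice.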
Three things that set this RAG apart from a plain 'pgvector and done'.
Per-client isolation
Each client has its own namespace. One client's voice never leaks into another's content — a hard requirement for agencies.
Hot-swap the embedding model
Swap the embedding model in Settings, with no redeploy. Reindexing happens on demand rather than immediately, which keeps costs under control.
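One way to support on-demand reindexing is to record which model produced each collection's vectors and flag mismatches, since vectors from different embedding models are not comparable. The sketch below assumes that bookkeeping; CollectionState, stale_collections, and reindex are hypothetical names, and embed is any callable that turns a document into a vector.

```python
from dataclasses import dataclass

@dataclass
class CollectionState:
    name: str
    indexed_with: str  # embedding model that produced the stored vectors

def stale_collections(states, active_model):
    """Names of collections embedded with a model other than the active one."""
    return [s.name for s in states if s.indexed_with != active_model]

def reindex(state, active_model, documents, embed):
    """Re-embed documents on demand and record the model that was used."""
    vectors = [embed(doc) for doc in documents]
    state.indexed_with = active_model
    return vectors
```

With this shape, switching the model in Settings only changes active_model. Nothing is re-embedded until reindex is called for a collection that stale_collections reports, so the embedding cost is paid when the user chooses.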
One index, three surfaces
The same index powers content generation, memory search, and chat tool calling. One place to curate, three to harvest.
Want to see this running on your own pipeline?
We'll show you in a quick demo, using data you already work with.