Neuromancer
Multimodal Assets

One brief. One plan. Text, image, video and audio delivered together.

An orchestrator that turns a brief into a multimodal production plan and runs each piece on the right provider, with live progress streamed in real time.

What it is

An orchestrator, not a pipeline you assemble by hand.

The user doesn't assemble a pipeline — they pick the platform, mark the desired modalities, and the agent decides what to produce in each slot. When they hit generate, the platform returns a complete production plan before spending a single credit.

How it works

Two-step flow: Plan → Execute.

Two SSE routes: the first returns a plan for the user to approve; the second runs each asset on the right provider and emits granular progress events.

1. Plan

POST /api/v1/multimodal/plan (SSE)
The agent reads the brief, the platform and the marked modalities, then returns a production plan as JSON: the text body + an assets list describing each piece (type, prompt, suggested provider, order). The user sees the plan before executing.
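
A minimal sketch of what such a plan could look like once parsed on the client. The JSON key names (text, assets, type, prompt, suggested_provider, order) and the provider identifiers are assumptions chosen to mirror the description above; the real payload may name them differently.

```python
import json
from dataclasses import dataclass, field


@dataclass
class PlannedAsset:
    type: str                # e.g. "image", "video" or "audio"
    prompt: str
    suggested_provider: str  # hypothetical identifier, not a confirmed value
    order: int


@dataclass
class ProductionPlan:
    text: str                                   # the text body
    assets: list = field(default_factory=list)  # list of PlannedAsset


def parse_plan(raw: str) -> ProductionPlan:
    """Parse a plan JSON string and return its assets sorted by order."""
    data = json.loads(raw)
    assets = [PlannedAsset(**a) for a in data.get("assets", [])]
    assets.sort(key=lambda a: a.order)
    return ProductionPlan(text=data["text"], assets=assets)


# Illustrative payload only; every value here is made up.
EXAMPLE_PLAN = """
{
  "text": "Launch email body goes here.",
  "assets": [
    {"type": "image", "prompt": "Hero banner, product on white",
     "suggested_provider": "dall-e-3", "order": 2},
    {"type": "image", "prompt": "Thumbnail, close-up",
     "suggested_provider": "gpt-image", "order": 1}
  ]
}
"""

plan = parse_plan(EXAMPLE_PLAN)
for asset in plan.assets:
    print(asset.order, asset.type, asset.suggested_provider)
```

Sorting by order at parse time means the approval screen and the later execute call can walk the same list without re-sorting.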

2. Execute

POST /api/v1/multimodal/execute-plan (SSE)
The backend dispatches each asset to the right provider and emits granular progress events. No fire-and-forget: the user sees which asset is generating, which has finished and which has failed, as it happens.

SSE events
  • asset_start: asset N started generating
  • asset_complete: done, returns asset_id
  • asset_failed: failed, returns reason
  • done: whole plan finished
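
The four events above can be folded into a per-asset status map on the client. The sketch below assumes the stream uses standard SSE framing (an event: line, a data: line carrying JSON, then a blank line) and that each payload carries an index field identifying the asset. Both are assumptions; only the event names, asset_id and reason come from the description above.

```python
import json


def parse_sse(lines):
    """Yield (event_name, payload) pairs from an iterable of SSE text lines."""
    event, data = "message", []
    for line in lines:
        line = line.rstrip("\n")
        if line == "":                       # a blank line terminates one event
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())


def track_progress(events):
    """Fold events into {asset index: status} plus a finished flag."""
    status, finished = {}, False
    for name, payload in events:
        if name == "asset_start":
            status[payload["index"]] = "generating"
        elif name == "asset_complete":
            status[payload["index"]] = "completed"
        elif name == "asset_failed":
            status[payload["index"]] = "failed"
        elif name == "done":
            finished = True
    return status, finished


# Hand-written sample stream; "index" is a hypothetical payload field.
SAMPLE_STREAM = [
    "event: asset_start", 'data: {"index": 1}', "",
    "event: asset_complete", 'data: {"index": 1, "asset_id": "a-1"}', "",
    "event: asset_start", 'data: {"index": 2}', "",
    "event: asset_failed", 'data: {"index": 2, "reason": "provider timeout"}', "",
    "event: done", "data: {}", "",
]

status, finished = track_progress(parse_sse(SAMPLE_STREAM))
print(status, finished)
```

In a real client the lines would come from the streaming HTTP response of the execute-plan route rather than from a list.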
In the UI

Platform ↔ modality: coupling that kills invalid configurations.

In Create Content the user marks modalities as chips — text always on, image, video and audio optional. The filter only shows modalities valid for the selected platform.

Platform        Allowed modalities
Copy            text
Creative Brief  text
Email           text + image (no video)
Video           text + image + video + audio
Design          text + image

The UI also reads the enabled_providers list from the backend and disables modalities without a configured provider, with a tooltip "Configure a <modality> provider in Settings first". No more click-and-break.
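
The coupling described in this section can be sketched as a small lookup plus a provider check. The platform-to-modality mapping is taken from the table above; the shape of enabled_providers (a dict from modality to a list of provider names) and the chip dictionary are assumptions made for illustration, as is treating text as enabled regardless of providers because it is "always on".

```python
# Mapping taken from the platform table in this section.
ALLOWED_MODALITIES = {
    "Copy": ["text"],
    "Creative Brief": ["text"],
    "Email": ["text", "image"],
    "Video": ["text", "image", "video", "audio"],
    "Design": ["text", "image"],
}

TOOLTIP = "Configure a {modality} provider in Settings first"


def modality_chips(platform, enabled_providers):
    """Return one chip per modality valid for the platform.

    enabled_providers is assumed to map modality -> list of provider names.
    A chip is disabled when its modality has no configured provider.
    """
    chips = []
    for modality in ALLOWED_MODALITIES[platform]:
        # Assumption: text is always on, so it never needs the provider check.
        enabled = modality == "text" or bool(enabled_providers.get(modality))
        chips.append({
            "modality": modality,
            "enabled": enabled,
            "tooltip": None if enabled else TOOLTIP.format(modality=modality),
        })
    return chips


chips = modality_chips(
    "Video",
    {"image": ["DALL-E 3"], "video": ["Gemini Veo"], "audio": []},
)
for chip in chips:
    print(chip)
```

Modalities that are invalid for the platform never appear in the result, while valid ones without a provider appear disabled with the tooltip, which matches the two separate behaviours described above.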

Providers

The real providers already wired into the code.

Modality  Providers
Image     DALL-E 3, DALL-E 2, GPT Image, Gemini Imagen 3, Gemini Imagen 3 Fast
Video     Gemini Veo
Audio     Audio/speech providers plugged in via enabled_providers (Whisper for voice transcription in chat)
Text      Claude (Code / Anthropic API), OpenAI, configurable fallback
Persistence

Every asset is a record. Every record can be regenerated without redoing the rest.

Each asset becomes a ContentAsset record with content_id, asset_type, provider, file_url/base64, prompt, position and status (pending / generating / completed / failed).

The final content reads assets ordered by position. If an asset fails or the result isn't good, the user regenerates that specific asset via a dedicated route — no need to redo the entire content.
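
A rough model of the record and of single-asset regeneration, under stated assumptions: the field names and status values follow the list above, but the Python types, the in-memory list standing in for storage, and the helper names ordered_assets and mark_for_regeneration are hypothetical. The dedicated regeneration route itself is not shown because its path is not given here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AssetStatus(str, Enum):
    PENDING = "pending"
    GENERATING = "generating"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class ContentAsset:
    content_id: str
    asset_type: str
    provider: str
    prompt: str
    position: int
    status: AssetStatus = AssetStatus.PENDING
    file_url: Optional[str] = None   # either a URL...
    base64: Optional[str] = None     # ...or inline data


def ordered_assets(assets, content_id):
    """Return the assets of one content item, ordered by position."""
    return sorted(
        (a for a in assets if a.content_id == content_id),
        key=lambda a: a.position,
    )


def mark_for_regeneration(assets, content_id, position):
    """Reset a single asset to pending and leave every other record untouched."""
    for asset in assets:
        if asset.content_id == content_id and asset.position == position:
            asset.status = AssetStatus.PENDING
            asset.file_url = None
            asset.base64 = None
            return asset
    raise KeyError(f"no asset at position {position} for content {content_id}")


store = [
    ContentAsset("c1", "video", "Gemini Veo", "10s teaser", 2, AssetStatus.FAILED),
    ContentAsset("c1", "image", "DALL-E 3", "hero banner", 1,
                 AssetStatus.COMPLETED, file_url="https://example.com/hero.png"),
]

retry = mark_for_regeneration(store, "c1", 2)
print([(a.position, a.status.value) for a in ordered_assets(store, "c1")])
```

Because each asset carries its own status and position, resetting one record does not disturb the completed image next to it.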

Highlights

Three things that make a difference day-to-day.

01. Plan → Approve → Execute

The user sees what will be produced before spending credits. Human approval lives inside the flow, not outside it.

02. Real-time progress via SSE

Not fire-and-forget. Each asset emits asset_start, then asset_complete or asset_failed, and the plan closes with done, so the UI and automations follow the plan live.

03. Platform ↔ modality coupling

Invalid combinations are killed before submit. The user can't ask for video on a Copy brief.
