Stream agent responses token-by-token via Server-Sent Events, inspect live reasoning steps, and send multimodal image inputs to vision-capable agents.
Agent Manager streams responses using Server-Sent Events (SSE), so your application can render the agent’s answer incrementally as tokens arrive rather than waiting for the full response. The same stream also exposes the agent’s internal reasoning and tool activity, giving you full visibility into how the agent is working.
When you call the stream endpoint, the server holds the connection open and pushes a sequence of AgentStreamEvent objects as newline-delimited SSE messages. Each event has a discriminator field (event) that tells you what kind of data it carries. To open a stream, send a POST request with the Accept header set to text/event-stream:
    POST /api/agents/{agentId}/runs/stream
    Content-Type: application/json
    Accept: text/event-stream
The request body is identical to that of a synchronous run; only the endpoint path and the Accept header differ. The event field of each message takes one of the following values:
START
The stream has been initialized. This event carries no user-visible data.
REASONING_DELTA
A fragment of the agent’s inner reasoning — what it is thinking before calling a tool.
CONTENT_DELTA
A fragment of the final answer text. Concatenate these to build the full response.
TOOL_START
The agent is about to call a tool. data contains the tool name and input as JSON.
TOOL_END
A tool call has completed. data contains the tool output as JSON.
STOP
The stream is complete. No more events will follow.
ERROR
An error occurred. data contains an error message.
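The event types above can be folded into a single view model that a UI re-renders after each event. The following is a minimal sketch, not part of the Agent Manager API: the RunView shape and the applyEvent and parseToolData helpers are hypothetical names introduced for illustration, and tool payloads are kept as unknown because their exact JSON fields are not specified here.

```typescript
// Event shape as used in this guide's streaming example.
type AgentEventType =
  | 'START' | 'REASONING_DELTA' | 'CONTENT_DELTA'
  | 'TOOL_START' | 'TOOL_END' | 'STOP' | 'ERROR';

interface AgentStreamEvent {
  event: AgentEventType;
  data: string;
  timestamp: number;
}

// Hypothetical client-side state (an assumption, not a documented type).
interface RunView {
  reasoning: string; // concatenated REASONING_DELTA fragments
  content: string;   // concatenated CONTENT_DELTA fragments
  toolEvents: { phase: 'start' | 'end'; payload: unknown }[];
  status: 'idle' | 'streaming' | 'done' | 'error';
  error?: string;
}

const emptyRunView: RunView = {
  reasoning: '',
  content: '',
  toolEvents: [],
  status: 'idle',
};

// Tool data is described as JSON; keep the raw string if it does not parse.
function parseToolData(data: string): unknown {
  try {
    return JSON.parse(data);
  } catch {
    return data;
  }
}

// Pure reducer: returns a new view and never mutates the previous one.
function applyEvent(view: RunView, evt: AgentStreamEvent): RunView {
  switch (evt.event) {
    case 'START':
      return { ...view, status: 'streaming' };
    case 'REASONING_DELTA':
      return { ...view, reasoning: view.reasoning + evt.data };
    case 'CONTENT_DELTA':
      return { ...view, content: view.content + evt.data };
    case 'TOOL_START':
      return {
        ...view,
        toolEvents: [...view.toolEvents, { phase: 'start', payload: parseToolData(evt.data) }],
      };
    case 'TOOL_END':
      return {
        ...view,
        toolEvents: [...view.toolEvents, { phase: 'end', payload: parseToolData(evt.data) }],
      };
    case 'STOP':
      return { ...view, status: 'done' };
    case 'ERROR':
      return { ...view, status: 'error', error: evt.data };
  }
}
```

Because applyEvent is a pure function, a whole run can be replayed with events.reduce(applyEvent, emptyRunView), which is convenient for unit tests and for state containers that expect reducers.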
REASONING_DELTA events expose the agent’s “inner thoughts” — the chain-of-thought reasoning it produces before deciding which tool to call. You can render these separately (for example, in a collapsible “Thinking…” section) to show users how the agent reached its conclusion.
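The example below uses fetch with a ReadableStream reader to consume the stream and separate reasoning from content. The native EventSource API only issues GET requests, so it cannot send the POST body this endpoint expects unless you use a polyfill that supports POST with a body: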
    interface AgentStreamEvent {
      event:
        | 'START'
        | 'REASONING_DELTA'
        | 'CONTENT_DELTA'
        | 'TOOL_START'
        | 'TOOL_END'
        | 'STOP'
        | 'ERROR';
      data: string;
      timestamp: number;
    }

    async function streamAgentResponse(
      agentId: string,
      message: string,
      sessionId: string,
      onReasoning: (chunk: string) => void,
      onContent: (chunk: string) => void,
      onDone: () => void
    ): Promise<void> {
      const response = await fetch(`/api/agents/${agentId}/runs/stream`, {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          Accept: 'text/event-stream',
        },
        body: JSON.stringify({ message, sessionId }),
      });

      // Fail early on HTTP errors instead of trying to parse an error page as SSE.
      if (!response.ok) {
        throw new Error(`Stream request failed with status ${response.status}`);
      }
      if (!response.body) throw new Error('No response body');

      const reader = response.body.getReader();
      const decoder = new TextDecoder();
      let buffer = '';

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        // A network chunk may end mid-line, so keep the trailing partial line
        // in the buffer until the rest of it arrives.
        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split('\n');
        buffer = lines.pop() ?? '';

        for (const line of lines) {
          if (!line.startsWith('data:')) continue;
          const payload = line.slice('data:'.length).trim();
          if (!payload) continue;

          const evt: AgentStreamEvent = JSON.parse(payload);
          switch (evt.event) {
            case 'REASONING_DELTA':
              onReasoning(evt.data);
              break;
            case 'CONTENT_DELTA':
              onContent(evt.data);
              break;
            case 'STOP':
              // No more events follow STOP, so stop reading.
              onDone();
              return;
            case 'ERROR':
              throw new Error(evt.data);
          }
        }
      }
    }
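The line-splitting step in the example above is the part most likely to break when a network chunk ends in the middle of a message, so it can help to isolate it in a pure function that is easy to unit test. This is a sketch under the same assumption the example makes, namely that each data: line carries one complete JSON-encoded AgentStreamEvent. The parseSseBuffer name and the ParseResult type are hypothetical.

```typescript
interface AgentStreamEvent {
  event:
    | 'START' | 'REASONING_DELTA' | 'CONTENT_DELTA'
    | 'TOOL_START' | 'TOOL_END' | 'STOP' | 'ERROR';
  data: string;
  timestamp: number;
}

// Hypothetical result type: parsed events plus the unconsumed partial line.
interface ParseResult {
  events: AgentStreamEvent[];
  rest: string;
}

// Parses every complete line in `buffer` and returns the trailing partial
// line in `rest`, to be prepended to the next decoded chunk.
function parseSseBuffer(buffer: string): ParseResult {
  const lines = buffer.split('\n');
  const rest = lines.pop() ?? '';
  const events: AgentStreamEvent[] = [];

  for (const rawLine of lines) {
    const line = rawLine.replace(/\r$/, ''); // tolerate CRLF line endings
    if (!line.startsWith('data:')) continue; // skip blank separators and other fields
    const payload = line.slice('data:'.length).trim();
    if (!payload) continue;
    // Assumption: one JSON event per data line, as in the example above.
    events.push(JSON.parse(payload) as AgentStreamEvent);
  }

  return { events, rest };
}
```

Inside the read loop, the caller would pass buffer plus the newly decoded text to parseSseBuffer, dispatch the returned events, and carry rest forward as the new buffer.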
The built-in UI at http://localhost:5173 already implements this pattern. It renders reasoning steps in a collapsible panel, streams final answer tokens with Markdown formatting, and shows human-in-the-loop (HITL) approval controls when a run reaches PAUSED status.
For agents that support vision, you can include image attachments in the request body alongside your message. Images must be base64-encoded or referenced by a publicly accessible URL.
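As an illustration of the two ways to supply an image, the sketch below builds a request body with one base64-encoded image and one URL reference. The attachment schema is not specified in this section, so the media field name, the ImageAttachment shape, and the helper functions are all assumptions made for the example. Check the API reference for the actual field names before relying on them.

```typescript
// Hypothetical attachment shape (assumed; not the documented schema).
type ImageAttachment =
  | { kind: 'base64'; mimeType: string; data: string }
  | { kind: 'url'; url: string };

// Hypothetical request body: message and sessionId match the streaming
// example in this guide; `media` is an assumed field name.
interface VisionRunRequest {
  message: string;
  sessionId: string;
  media: ImageAttachment[];
}

// Encodes raw bytes as base64 using the standard btoa global.
function toBase64(bytes: Uint8Array): string {
  let binary = '';
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

function imageFromBytes(bytes: Uint8Array, mimeType: string): ImageAttachment {
  return { kind: 'base64', mimeType, data: toBase64(bytes) };
}

// Only http(s) URLs are accepted, since the image must be publicly reachable.
function imageFromUrl(url: string): ImageAttachment {
  const parsed = new URL(url); // throws on malformed input
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    throw new Error(`Unsupported image URL scheme: ${parsed.protocol}`);
  }
  return { kind: 'url', url: parsed.toString() };
}

function buildVisionRequest(
  message: string,
  sessionId: string,
  images: ImageAttachment[]
): VisionRunRequest {
  return { message, sessionId, media: images };
}
```

The resulting object would be serialized with JSON.stringify and sent as the body of the same run or stream request used for text-only messages.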
Not all agents are configured with vision-capable models. Sending media to a text-only agent will result in an error. Check the agent’s configuration or use procurator_assistant to ask which agents support multimodal input.