The streaming endpoint runs an agent and returns the response as a continuous Server-Sent Events (SSE) stream. Instead of waiting for the full response, your client receives typed events — reasoning traces, content tokens, and tool call notifications — as they are produced. This is the recommended integration for any user-facing chat interface.
All requests require a valid bearer token in the Authorization header.
Start a streaming run
POST /api/agents/{agentId}/runs/stream
Accepts the same RunRequest body as the synchronous endpoint and returns a text/event-stream response. Each event is a JSON-encoded AgentStreamEvent object.
Path parameters
agentId: The unique identifier of the agent to run. Retrieve valid IDs from GET /api/agents.
Request body
message: The user’s input or query.
sessionId: A UUID identifying an existing conversation session. Enables multi-turn conversations. Omit to start a fresh session.
Associates the run with a specific user for memory scoping and audit logs.
Tenant identifier for multi-tenant deployments.
When true, the agent appends suggested follow-up questions to the final STOP event payload.
Array of multimodal inputs. Each object has a type (MIME type string) and data (base64 or URL).
Optional model overrides (model, temperature, maxTokens).
Response: AgentStreamEvent schema
The response is a text/event-stream. Each line prefixed with data: contains a JSON-encoded AgentStreamEvent with the following fields:
event: The event type; one of the EventType values listed below.
data: The payload for this event. Its meaning depends on the event type: a text delta for CONTENT_DELTA and REASONING_DELTA, a JSON string for START and tool events, or an error message for ERROR.
timestamp: Unix epoch milliseconds at the time the event was emitted by the server.
EventType values
| Event | Description |
|---|---|
| START | Stream initialized. The data field contains the runId and sessionId as a JSON string. |
| REASONING_DELTA | A fragment of the agent’s inner thought process. Aggregate these to show a “thinking…” indicator in your UI. |
| CONTENT_DELTA | A token or short text chunk of the final answer. Concatenate all deltas to build the complete response. |
| TOOL_START | The agent is about to call a tool. The data field contains the tool name and its arguments as JSON. |
| TOOL_END | A tool call has completed. The data field contains the tool result. |
| STOP | The stream is complete. All content has been delivered. No further events follow. |
| ERROR | An error occurred during execution. The data field contains the error message. The stream closes after this event. |
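The event schema and the table above can be modeled as TypeScript types. The following is a minimal sketch, assuming the field names `event`, `data`, and `timestamp` used in this page's examples; `EVENT_TYPES` and `parseSseLine` are hypothetical helper names, not part of the API.

```typescript
// Event names come from the EventType table; the identifiers are illustrative.
const EVENT_TYPES = [
  "START", "REASONING_DELTA", "CONTENT_DELTA",
  "TOOL_START", "TOOL_END", "STOP", "ERROR",
] as const;

type EventType = (typeof EVENT_TYPES)[number];

interface AgentStreamEvent {
  event: EventType;
  data: string;      // plain text or a JSON string, depending on the event type
  timestamp: number; // Unix epoch milliseconds
}

// Parses one line of the stream. Returns null for lines that are not
// "data:" lines, are not valid JSON, or do not match the expected shape.
function parseSseLine(line: string): AgentStreamEvent | null {
  if (!line.startsWith("data:")) return null;
  let parsed: unknown;
  try {
    parsed = JSON.parse(line.slice("data:".length).trim());
  } catch {
    return null; // malformed JSON payload
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const candidate = parsed as Record<string, unknown>;
  if (
    typeof candidate.event !== "string" ||
    (EVENT_TYPES as readonly string[]).indexOf(candidate.event) === -1 ||
    typeof candidate.data !== "string" ||
    typeof candidate.timestamp !== "number"
  ) {
    return null;
  }
  return {
    event: candidate.event as EventType,
    data: candidate.data,
    timestamp: candidate.timestamp,
  };
}
```

Returning null for unrecognized lines, instead of throwing, lets a client skip comments or keep-alive lines without interrupting the stream; whether the server sends such lines is not stated here.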
Listen for REASONING_DELTA events to show a “thinking…” spinner or expandable reasoning trace before the first CONTENT_DELTA arrives. This significantly improves perceived responsiveness for complex queries.
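One way to drive such an indicator is to fold incoming events into a small view state. In this sketch, `StreamViewState`, `initialState`, and `reduceEvent` are hypothetical names, and the rule that "thinking" ends at the first CONTENT_DELTA (or at STOP or ERROR) is an assumption based on the tip above.

```typescript
interface StreamEventLike {
  event: string;
  data: string;
}

interface StreamViewState {
  thinking: boolean;    // true while only reasoning has been received
  reasoning: string;    // concatenated REASONING_DELTA fragments
  content: string;      // concatenated CONTENT_DELTA fragments
  done: boolean;        // true after STOP or ERROR
  error: string | null; // error message, if any
}

const initialState: StreamViewState = {
  thinking: false,
  reasoning: "",
  content: "",
  done: false,
  error: null,
};

// Pure reducer: returns the next view state for one event.
function reduceEvent(state: StreamViewState, ev: StreamEventLike): StreamViewState {
  switch (ev.event) {
    case "REASONING_DELTA":
      return { ...state, thinking: true, reasoning: state.reasoning + ev.data };
    case "CONTENT_DELTA":
      return { ...state, thinking: false, content: state.content + ev.data };
    case "STOP":
      return { ...state, thinking: false, done: true };
    case "ERROR":
      return { ...state, thinking: false, done: true, error: ev.data };
    default:
      return state; // START and tool events leave this view state unchanged
  }
}
```

A UI can render the spinner while `thinking` is true and switch to the answer view as soon as `content` is non-empty.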
Raw SSE stream example
data: {"event":"START","data":"{\"runId\":\"run_abc123\",\"sessionId\":\"sess_xyz\"}","timestamp":1746518400000}
data: {"event":"REASONING_DELTA","data":"The user wants the NVDA stock price. I should call get_stock_price.","timestamp":1746518400120}
data: {"event":"TOOL_START","data":"{\"name\":\"get_stock_price\",\"arguments\":{\"ticker\":\"NVDA\"}}","timestamp":1746518400250}
data: {"event":"TOOL_END","data":"{\"name\":\"get_stock_price\",\"result\":\"875.40\"}","timestamp":1746518401100}
data: {"event":"CONTENT_DELTA","data":"The current stock price of NVIDIA (NVDA) is ","timestamp":1746518401200}
data: {"event":"CONTENT_DELTA","data":"**$875.40**","timestamp":1746518401280}
data: {"event":"CONTENT_DELTA","data":", up 2.3% today.","timestamp":1746518401350}
data: {"event":"STOP","data":"","timestamp":1746518401400}
TypeScript example
async function streamAgent(agentId: string, message: string) {
  const response = await fetch(
    `http://localhost:8080/api/agents/${agentId}/runs/stream`,
    {
      method: "POST",
      headers: {
        "Authorization": "Bearer {token}",
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ message, sessionId: crypto.randomUUID() }),
    }
  );
  if (!response.ok) throw new Error(`Request failed with status ${response.status}`);
  if (!response.body) throw new Error("No response body");
  const reader = response.body
    .pipeThrough(new TextDecoderStream())
    .getReader();
  let fullContent = "";
  let isThinking = false;
  let buffer = ""; // holds a partial line until the rest of it arrives
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // Chunks can end mid-line, so only parse lines that are complete.
    buffer += value;
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? "";
    for (const line of lines) {
      if (!line.startsWith("data:")) continue;
      const event = JSON.parse(line.slice(5).trim());
      switch (event.event) {
        case "START":
          console.log("Stream started:", JSON.parse(event.data));
          break;
        case "REASONING_DELTA":
          if (!isThinking) {
            console.log("[thinking...]");
            isThinking = true;
          }
          process.stdout.write(event.data); // render reasoning trace
          break;
        case "CONTENT_DELTA":
          if (isThinking) {
            console.log("\n[response]");
            isThinking = false;
          }
          fullContent += event.data;
          process.stdout.write(event.data); // stream token to UI
          break;
        case "TOOL_START": {
          const tool = JSON.parse(event.data); // data is itself a JSON string
          console.log(`\n[calling tool: ${tool.name}]`);
          break;
        }
        case "TOOL_END": {
          const result = JSON.parse(event.data);
          console.log(`[tool ${result.name} complete]`);
          break;
        }
        case "STOP":
          console.log("\n[stream complete]");
          console.log("Full response:", fullContent);
          break;
        case "ERROR":
          console.error("Stream error:", event.data);
          break;
      }
    }
  }
}
streamAgent("finance_agent", "What is the current NVDA stock price?");
Check run status
GET /api/agents/{agentId}/runs/{runId}/status
Fetches the current status and metadata of a specific run. This endpoint is useful after a streaming run completes to retrieve the full AgentRun entity, including timestamps and final output.
agentId: The agent that owns the run.
runId: The run identifier, returned in the START event’s data payload.
curl --request GET \
  --url "http://localhost:8080/api/agents/finance_agent/runs/run_abc123/status" \
  --header "Authorization: Bearer {token}"
Returns an AgentRun entity with status (RUNNING, COMPLETED, FAILED, PAUSED, or CANCELLED) and associated timestamps.
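A client that needs the final AgentRun might poll this endpoint until the run settles. The sketch below assumes that COMPLETED, FAILED, and CANCELLED are terminal and that a PAUSED run may still resume; that reading is an assumption, as are the names `isTerminal` and `waitForRun` and the injected `getStatus` and `sleep` callbacks.

```typescript
type RunStatus = "RUNNING" | "COMPLETED" | "FAILED" | "PAUSED" | "CANCELLED";

// Assumption: PAUSED runs can still resume, so only these three are final.
function isTerminal(status: RunStatus): boolean {
  return status === "COMPLETED" || status === "FAILED" || status === "CANCELLED";
}

// Polls until the run reaches a terminal status or maxAttempts is used up.
// getStatus is expected to call GET /api/agents/{agentId}/runs/{runId}/status
// and return the status field of the response.
async function waitForRun(
  getStatus: () => Promise<RunStatus>,
  sleep: (ms: number) => Promise<void>,
  intervalMs: number = 3000,
  maxAttempts: number = 20,
): Promise<RunStatus> {
  let status = await getStatus();
  for (let attempt = 1; attempt < maxAttempts && !isTerminal(status); attempt++) {
    await sleep(intervalMs);
    status = await getStatus();
  }
  return status; // may still be non-terminal if the attempts ran out
}
```

Passing `getStatus` and `sleep` in as parameters keeps the loop independent of any particular HTTP client or timer, which also makes it easy to exercise with stubs.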
Batch status check
GET /api/agents/{agentId}/runs/status?runIds=id1,id2,id3
Fetch the status of up to 100 runs in a single request. Run IDs that do not exist are simply absent from the response — no 404 is returned for missing IDs.
agentId: The agent that owns the runs.
runIds: A comma-separated list of run IDs to check. Maximum 100 IDs per request.
curl --request GET \
  --url "http://localhost:8080/api/agents/finance_agent/runs/status?runIds=run_abc123,run_def456,run_ghi789" \
  --header "Authorization: Bearer {token}"
Example response
[
  {
    "id": "run_abc123",
    "agentId": "finance_agent",
    "sessionId": "sess_xyz",
    "status": "COMPLETED",
    "createdAt": "2026-05-06T10:00:00Z",
    "completedAt": "2026-05-06T10:00:03Z"
  },
  {
    "id": "run_def456",
    "agentId": "finance_agent",
    "sessionId": "sess_xyz",
    "status": "RUNNING",
    "createdAt": "2026-05-06T10:01:00Z",
    "completedAt": null
  }
]
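Because unknown IDs are silently omitted and each request accepts at most 100 IDs, a client typically has to split long ID lists and compare the response against what it asked for. The following sketch assumes the response shape shown above; `chunkRunIds`, `buildBatchStatusUrl`, and `findMissingRunIds` are hypothetical helper names.

```typescript
const MAX_IDS_PER_REQUEST = 100; // limit stated for the batch endpoint

// Splits a list of run IDs into groups that each fit in one request.
function chunkRunIds(runIds: string[], size: number = MAX_IDS_PER_REQUEST): string[][] {
  const chunks: string[][] = [];
  for (let i = 0; i < runIds.length; i += size) {
    chunks.push(runIds.slice(i, i + size));
  }
  return chunks;
}

// Builds the batch status URL for one chunk of IDs.
function buildBatchStatusUrl(baseUrl: string, agentId: string, runIds: string[]): string {
  const ids = runIds.map((id) => encodeURIComponent(id)).join(",");
  return `${baseUrl}/api/agents/${encodeURIComponent(agentId)}/runs/status?runIds=${ids}`;
}

// Returns the requested IDs that are absent from the response array.
function findMissingRunIds(requested: string[], returned: { id: string }[]): string[] {
  const seen = new Set(returned.map((run) => run.id));
  return requested.filter((id) => !seen.has(id));
}
```

In the curl example above, `run_ghi789` is requested but does not appear in the example response, so `findMissingRunIds` would report it as missing.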
The batch status endpoint significantly reduces polling overhead. For 10 concurrent runs polled every 3 seconds, one batched call replaces 10 individual requests — dropping from ~200 requests/minute to ~20.
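The arithmetic behind those figures can be checked with a small helper (`requestsPerMinute` is a hypothetical name used only for this illustration). Polling every 3 seconds means 60 / 3 = 20 polls per minute, so 10 runs polled individually cost 10 × 20 = 200 requests per minute, while a single batched call per poll costs 20.

```typescript
// Requests per minute for a given number of runs, poll interval,
// and number of run IDs that fit in one request.
function requestsPerMinute(runCount: number, intervalSeconds: number, batchSize: number): number {
  const pollsPerMinute = 60 / intervalSeconds;
  const requestsPerPoll = Math.ceil(runCount / batchSize);
  return pollsPerMinute * requestsPerPoll;
}

const individualRate = requestsPerMinute(10, 3, 1); // one run per request: 200
const batchedRate = requestsPerMinute(10, 3, 100);  // all runs in one request: 20
```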