
# Real-Time Streaming Chat with AI Agents

> Stream agent responses token-by-token via Server-Sent Events, inspect live reasoning steps, and send multimodal image inputs to vision-capable agents.

Agent Manager streams responses using Server-Sent Events (SSE), so your application can render the agent's answer incrementally as tokens arrive rather than waiting for the full response. The same stream also exposes the agent's internal reasoning and tool activity, giving you full visibility into how the agent is working.

## How streaming works

When you call the stream endpoint, the server holds the connection open and pushes a sequence of `AgentStreamEvent` objects as newline-delimited SSE messages. Each event has a discriminator field (`event`) that tells you what kind of data it carries. Open the stream with the following request:

```bash theme={null}
POST /api/agents/{agentId}/runs/stream
Content-Type: application/json
Accept: text/event-stream
```

The request body is identical to that of a synchronous run. The only differences are the endpoint path and the `Accept` header.
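
To make the message framing concrete, here is a minimal sketch of splitting buffered stream text into message payloads. It assumes the server follows the standard SSE wire format, in which messages are separated by a blank line and payload lines start with `data:`. The helper `extractSsePayloads` and the `SsePayloads` type are hypothetical names introduced for illustration, not part of Agent Manager.

```typescript
interface SsePayloads {
  payloads: string[]; // `data:` payloads of every complete message
  rest: string;       // trailing partial message to keep buffering
}

// Hypothetical helper: assumes standard SSE framing (blank line between messages).
function extractSsePayloads(buffer: string): SsePayloads {
  // Accept CRLF line endings as well as LF. A lone CR is not handled here.
  const normalized = buffer.replace(/\r\n/g, '\n');
  const blocks = normalized.split('\n\n');
  // The last block has not been terminated by a blank line yet.
  const rest = blocks.pop() ?? '';

  const payloads: string[] = [];
  for (const block of blocks) {
    const dataLines: string[] = [];
    for (const line of block.split('\n')) {
      if (!line.startsWith('data:')) continue; // skip comments and other fields
      const value = line.slice('data:'.length);
      // SSE strips a single leading space after the colon.
      dataLines.push(value.startsWith(' ') ? value.slice(1) : value);
    }
    // Multiple data lines in one message are joined with newlines.
    if (dataLines.length > 0) payloads.push(dataLines.join('\n'));
  }
  return { payloads, rest };
}
```

Carry `rest` over as the start of the buffer for the next chunk, so a message split across two network reads is parsed only once it is complete.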

## Event structure

Every SSE message contains a single `AgentStreamEvent`:

<ResponseField name="event" type="string" required>
  The event type. See the table below for all possible values.
</ResponseField>

<ResponseField name="data" type="string">
  The payload for this event. For delta events this is a text fragment; for tool events it is a JSON string.
</ResponseField>

<ResponseField name="timestamp" type="number">
  Unix timestamp (milliseconds) when the event was emitted by the server.
</ResponseField>
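
Because each event arrives as JSON text, it can help to validate the shape before dispatching on it. The sketch below is one possible runtime check, assuming only the three fields described above and the event names listed in the table below. `parseStreamEvent` and `CheckedStreamEvent` are hypothetical names, and treating `data` and `timestamp` as optional is an assumption based on only `event` being marked required.

```typescript
const KNOWN_EVENTS = [
  'START', 'REASONING_DELTA', 'CONTENT_DELTA',
  'TOOL_START', 'TOOL_END', 'STOP', 'ERROR',
] as const;

type KnownEvent = (typeof KNOWN_EVENTS)[number];

interface CheckedStreamEvent {
  event: KnownEvent;
  data?: string;      // assumed optional: only `event` is marked required
  timestamp?: number; // Unix time in milliseconds
}

const knownEventSet = new Set<string>(KNOWN_EVENTS);

// Hypothetical helper: returns null instead of throwing on malformed input.
function parseStreamEvent(json: string): CheckedStreamEvent | null {
  let raw: unknown;
  try {
    raw = JSON.parse(json);
  } catch {
    return null; // not valid JSON
  }
  if (typeof raw !== 'object' || raw === null) return null;

  const obj = raw as Record<string, unknown>;
  if (typeof obj.event !== 'string' || !knownEventSet.has(obj.event)) return null;
  if (obj.data !== undefined && typeof obj.data !== 'string') return null;
  if (obj.timestamp !== undefined && typeof obj.timestamp !== 'number') return null;

  return {
    event: obj.event as KnownEvent,
    data: obj.data as string | undefined,
    timestamp: obj.timestamp as number | undefined,
  };
}
```

Returning `null` rather than throwing lets the caller decide whether to skip a malformed message or abort the stream.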

### Event types

| Event             | Description                                                                            |
| ----------------- | -------------------------------------------------------------------------------------- |
| `START`           | The stream has been initialized. No user-visible data.                                 |
| `REASONING_DELTA` | A fragment of the agent's inner reasoning — what it is thinking before calling a tool. |
| `CONTENT_DELTA`   | A fragment of the final answer text. Concatenate these to build the full response.     |
| `TOOL_START`      | The agent is about to call a tool. `data` contains the tool name and input as JSON.    |
| `TOOL_END`        | A tool call has completed. `data` contains the tool output as JSON.                    |
| `STOP`            | The stream is complete. No more events will follow.                                    |
| `ERROR`           | An error occurred. `data` contains an error message.                                   |

<Info>
  `REASONING_DELTA` events expose the agent's "inner thoughts" — the chain-of-thought reasoning it produces before deciding which tool to call. You can render these separately (for example, in a collapsible "Thinking..." section) to show users how the agent reached its conclusion.
</Info>
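
One way to keep reasoning, answer text, and tool activity separate is to fold each event into a small view state. The following is a sketch rather than the built-in UI's implementation: `RunView`, `emptyRunView`, and `applyEvent` are hypothetical names, and because the exact JSON shape of tool payloads is not specified here, tool data is stored as parsed-but-untyped values.

```typescript
type FoldableEventType =
  | 'START' | 'REASONING_DELTA' | 'CONTENT_DELTA'
  | 'TOOL_START' | 'TOOL_END' | 'STOP' | 'ERROR';

interface FoldableEvent {
  event: FoldableEventType;
  data?: string;
}

interface RunView {
  reasoning: string; // concatenated REASONING_DELTA fragments
  answer: string;    // concatenated CONTENT_DELTA fragments
  toolEvents: { phase: 'start' | 'end'; payload: unknown }[];
  status: 'streaming' | 'done' | 'error';
  error?: string;
}

const emptyRunView: RunView = {
  reasoning: '', answer: '', toolEvents: [], status: 'streaming',
};

// Tool payloads are documented as JSON strings; fall back to raw text if parsing fails.
function parseToolPayload(text: string | undefined): unknown {
  if (text === undefined) return undefined;
  try {
    return JSON.parse(text);
  } catch {
    return text;
  }
}

// Pure reducer: returns a new view and never mutates the previous one.
function applyEvent(view: RunView, evt: FoldableEvent): RunView {
  switch (evt.event) {
    case 'REASONING_DELTA':
      return { ...view, reasoning: view.reasoning + (evt.data ?? '') };
    case 'CONTENT_DELTA':
      return { ...view, answer: view.answer + (evt.data ?? '') };
    case 'TOOL_START':
      return { ...view, toolEvents: [...view.toolEvents, { phase: 'start', payload: parseToolPayload(evt.data) }] };
    case 'TOOL_END':
      return { ...view, toolEvents: [...view.toolEvents, { phase: 'end', payload: parseToolPayload(evt.data) }] };
    case 'STOP':
      return { ...view, status: 'done' };
    case 'ERROR':
      return { ...view, status: 'error', error: evt.data ?? 'Unknown error' };
    default:
      return view; // START carries no user-visible data
  }
}
```

A pure reducer like this plugs directly into state containers such as React's `useReducer`, and a recorded event list can be replayed with `events.reduce(applyEvent, emptyRunView)`.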

## Consuming the stream in TypeScript

The example below uses `fetch` and reads the response body as a stream, because the browser's native `EventSource` API only issues `GET` requests and cannot send a request body. It parses each SSE `data:` line and separates reasoning from content:

```typescript theme={null}
interface AgentStreamEvent {
  event: 'START' | 'REASONING_DELTA' | 'CONTENT_DELTA' | 'TOOL_START' | 'TOOL_END' | 'STOP' | 'ERROR';
  data: string;
  timestamp: number;
}

async function streamAgentResponse(
  agentId: string,
  message: string,
  sessionId: string,
  onReasoning: (chunk: string) => void,
  onContent: (chunk: string) => void,
  onDone: () => void
): Promise<void> {
  const response = await fetch(`/api/agents/${agentId}/runs/stream`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Accept: 'text/event-stream',
    },
    body: JSON.stringify({ message, sessionId }),
  });

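  if (!response.ok) {
    throw new Error(`Stream request failed with status ${response.status}`);
  }
  if (!response.body) throw new Error('No response body');

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = ''; // holds any partial line left over from the previous chunk

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // Decode incrementally so multi-byte characters split across chunks survive.
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? ''; // keep the trailing incomplete line for the next read

    for (const line of lines) {
      // Only `data:` lines carry event payloads; skip blank lines and other SSE fields.
      if (!line.startsWith('data:')) continue;
      const payload = line.slice('data:'.length).trim();
      if (!payload) continue;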

      const evt: AgentStreamEvent = JSON.parse(payload);

      switch (evt.event) {
        case 'REASONING_DELTA':
          onReasoning(evt.data);
          break;
        case 'CONTENT_DELTA':
          onContent(evt.data);
          break;
        case 'STOP':
          onDone();
          break;
        case 'ERROR':
          throw new Error(evt.data);
      }
    }
  }
}
```

<Tip>
  The built-in UI at `http://localhost:5173` already implements this pattern. It renders reasoning steps in a collapsible panel, streams final answer tokens with Markdown formatting, and shows human-in-the-loop (HITL) approval controls when a run reaches `PAUSED` status.
</Tip>

## Multimodal input

For agents that support vision, you can include image attachments in the request body alongside your message. Images must be base64-encoded or referenced by a publicly accessible URL.

### Request body with media

```json theme={null}
{
  "message": "What does this chart show?",
  "sessionId": "session-abc-123",
  "media": [
    {
      "type": "image/png",
      "data": "iVBORw0KGgoAAAANSUhEUgAA..."
    }
  ]
}
```

<ParamField body="media[].type" type="string" required>
  The MIME type of the attachment, such as `image/png` or `image/jpeg`.
</ParamField>

<ParamField body="media[].data" type="string" required>
  Base64-encoded binary content of the file, or an HTTP/HTTPS URL pointing to the image.
</ParamField>
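
As an illustration of the two accepted forms of `media[].data`, the sketch below builds a media item either from raw bytes or from a URL. `MediaItem`, `mediaFromBytes`, and `mediaFromUrl` are hypothetical helper names, not part of the Agent Manager API. The code relies on the global `btoa` and `URL`, which are available in browsers and recent Node.js versions.

```typescript
interface MediaItem {
  type: string; // MIME type, such as image/png
  data: string; // base64 content or an HTTP/HTTPS URL
}

// Hypothetical helper: base64-encode raw image bytes.
function mediaFromBytes(bytes: Uint8Array, mimeType: string): MediaItem {
  // btoa expects a "binary string" with one character per byte.
  let binary = '';
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return { type: mimeType, data: btoa(binary) };
}

// Hypothetical helper: reference an image by URL instead of inlining it.
function mediaFromUrl(url: string, mimeType: string): MediaItem {
  const parsed = new URL(url); // throws on malformed URLs
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    throw new Error(`Unsupported protocol for media URL: ${parsed.protocol}`);
  }
  return { type: mimeType, data: url };
}
```

In a browser, the bytes could come from `new Uint8Array(await file.arrayBuffer())` for a `File` selected by the user. Passing a URL avoids the size overhead of base64, but the image must be reachable by the server.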

You can include multiple media items in a single request:

```json theme={null}
{
  "message": "Compare these two screenshots",
  "media": [
    { "type": "image/png", "data": "<base64_image_1>" },
    { "type": "image/png", "data": "<base64_image_2>" }
  ]
}
```

<Warning>
  Not all agents are configured with vision-capable models. Sending media to a text-only agent will result in an error. Check the agent's configuration or use `procurator_assistant` to ask which agents support multimodal input.
</Warning>

## Full streaming example

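The following end-to-end example calls `streamAgentResponse` from the earlier section, collects reasoning and content separately, and prints them to the console. It writes to `process.stdout`, so it targets Node.js. Because `fetch` in Node.js requires an absolute URL, prefix the request path inside `streamAgentResponse` with your server's base URL before running it there: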

```typescript theme={null}
let reasoning = '';
let answer = '';

await streamAgentResponse(
  'finance_agent',
  'Summarize the last 5 days of TSLA trading activity.',
  crypto.randomUUID(),
  (chunk) => {
    reasoning += chunk;
    process.stdout.write(`[thinking] ${chunk}`);
  },
  (chunk) => {
    answer += chunk;
    process.stdout.write(chunk);
  },
  () => {
    console.log('\n\n--- Stream complete ---');
    console.log('Full reasoning:', reasoning);
    console.log('Full answer:', answer);
  }
);
```

## Built-in chat UI

The Agent Manager UI (`http://localhost:5173`) provides a production-ready chat interface with:

<CardGroup cols={2}>
  <Card title="Live streaming" icon="bolt">
    Tokens render incrementally as they arrive. Reasoning steps appear in a collapsible "Thinking..." panel above the final answer.
  </Card>

  <Card title="Markdown rendering" icon="file-lines">
    Responses are rendered with full Markdown support: tables, code blocks with syntax highlighting, and lists.
  </Card>

  <Card title="Multimodal input" icon="image">
    Drag and drop image files directly into the chat input. The UI handles base64 encoding automatically.
  </Card>

  <Card title="HITL controls" icon="hand">
    When a run is paused for approval, the UI displays a visual indicator with "Approve" and "Reject" buttons.
  </Card>
</CardGroup>
