Streaming Output
Tokens are delivered to the client incrementally as they are generated, rather than waiting for the complete response. This reduces perceived latency (the user sees output immediately) and enables progressive rendering, early termination, and real-time status updates during agent execution.
Structure
The stream carries multiple event types: raw tokens (for progressive text rendering), tool call notifications (agent is using a tool), step completions (agent finished a phase), and the final result.
How It Works
Token streaming:
- Start generation — agent begins producing output
- Stream tokens — each token is sent to the client as it's generated
- Render progressively — client displays text as it arrives (the "typing" effect)
- Complete — final token signals end of generation
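The four steps above can be sketched with a plain Python generator standing in for the model; `generate_tokens` and `render_progressively` are hypothetical names, and the hard-coded token list is a stand-in for real model output:

```python
from typing import Iterator

def generate_tokens(prompt: str) -> Iterator[str]:
    # Stand-in for a model: yields one token at a time as it is "generated".
    for token in ["Streaming", " keeps", " perceived", " latency", " low."]:
        yield token

def render_progressively(prompt: str) -> str:
    # Client side: display each token as soon as it arrives (the "typing" effect).
    buffer = []
    for token in generate_tokens(prompt):
        print(token, end="", flush=True)  # progressive render
        buffer.append(token)
    print()  # generator exhaustion signals end of generation
    return "".join(buffer)
```

In a real client the generator would be replaced by an HTTP or SSE response iterator, but the consume-and-append loop is the same.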
Agent event streaming:
- Subscribe — client subscribes to the agent's event stream
- Receive events — tool calls, observations, step completions arrive as structured events
- Update UI — client shows agent progress (thinking, searching, writing...)
- Final result — last event contains the complete response
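A minimal sketch of the event-stream side, assuming events are dicts with a `type` field (the event names and `agent_events`/`consume` helpers are illustrative, not any particular framework's API):

```python
from typing import Iterator

def agent_events() -> Iterator[dict]:
    # Stand-in agent run emitting structured events in order.
    yield {"type": "tool_call", "tool": "search", "args": {"q": "streaming"}}
    yield {"type": "observation", "content": "3 results found"}
    yield {"type": "step_complete", "step": 1}
    yield {"type": "final", "content": "Here is what I found..."}

def consume(events: Iterator[dict]) -> str:
    # Client side: update the UI per event type; the last event carries the result.
    final = ""
    for event in events:
        if event["type"] == "tool_call":
            print(f"[agent] using tool: {event['tool']}")
        elif event["type"] == "step_complete":
            print(f"[agent] finished step {event['step']}")
        elif event["type"] == "final":
            final = event["content"]
    return final
```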
Transport:
- Server-Sent Events (SSE) — the de facto standard for LLM API streaming; unidirectional, runs over plain HTTP
- WebSockets — bidirectional, suited to interactive agents that accept input mid-stream
- HTTP chunked transfer — simplest, least overhead, but no built-in event framing or reconnection
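To make the SSE option concrete, here is a minimal parser for the SSE wire format, where each event is a run of `field: value` lines terminated by a blank line. This sketch handles only the `event:` and `data:` fields; a production parser would also handle `id:`, `retry:`, comment lines, and reconnection:

```python
from typing import Iterator, Tuple

def parse_sse(raw: str) -> Iterator[Tuple[str, str]]:
    # Yields (event_type, data) pairs from an SSE-formatted text stream.
    # Per the SSE format, a blank line marks the end of one event,
    # and multiple data: lines within an event are joined with newlines.
    event, data = "message", []
    for line in raw.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "":
            if data:
                yield event, "\n".join(data)
            event, data = "message", []

# Example wire payload, e.g. a token event followed by a done marker.
stream = "data: hello\n\nevent: done\ndata: [DONE]\n\n"
events = list(parse_sse(stream))
```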
Key Characteristics
- Low perceived latency — user sees output in milliseconds, not seconds
- Early termination — client can cancel if the output is going off-track
- Progress visibility — users see what the agent is doing in real time
- Complexity — streaming adds client-side buffering and event handling logic
- Structured output challenge — streaming partial JSON is tricky (needs buffering until valid)
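The structured-output point can be shown directly: accumulate chunks in a buffer and re-attempt a parse after each one, succeeding only once the JSON document is complete. The chunk boundaries here are invented for illustration:

```python
import json
from typing import Optional

def try_parse(buffer: str) -> Optional[dict]:
    # Attempt to parse the accumulated stream as JSON.
    # Returns the object once the buffer is complete, None while partial.
    try:
        return json.loads(buffer)
    except json.JSONDecodeError:
        return None

# Simulated token stream carrying one JSON object split mid-key and mid-string.
chunks = ['{"name": "a', 'gent", "steps"', ': 3}']
buffer, result = "", None
for chunk in chunks:
    buffer += chunk
    result = try_parse(buffer)  # stays None until the closing brace arrives
```

Smarter approaches exist (incremental parsers that emit partial objects), but buffer-until-valid is the simplest correct behavior.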
When to Use
- User-facing applications where perceived speed matters
- Long-running agent tasks where users need progress updates
- Interactive sessions where the user may want to interrupt or redirect
- Chat interfaces — streaming is expected UX for AI conversations
- Agent dashboards that show tool calls and reasoning in real time