Season 6 · 18 Episodes · 1h 3m · 2026

OpenAI Agents SDK

v0.13 — 2026 Edition. A comprehensive guide to building production-ready multi-agent systems with the OpenAI Agents SDK for Python. Learn core primitives, orchestration patterns, tools, handoffs, guardrails, state management, MCP, and realtime voice integration.

AI/ML Frameworks Multi-Agent Systems
OpenAI Agents SDK
1
Beyond Swarm: The Core Primitives
Discover the foundational concepts of the OpenAI Agents SDK. This episode covers why the SDK exists, how it improves upon Swarm, and the core design principles prioritizing minimal abstractions.
3m 25s
2
Defining the Agent and the Run Loop
Learn how to configure the foundational Agent object. We explore instructions, model settings, and how to force structured data outputs seamlessly.
3m 08s
3
Equipping Agents with Python Function Tools
Empower your agents to take action by converting standard Python functions into executable tools. Understand automatic schema generation and type inference.
3m 16s
4
Scaling Tool Surfaces with Hosted Tool Search
Learn how to manage massive tool libraries without exhausting your token budget. We cover deferred loading, namespaces, and hosted tool execution.
3m 22s
5
Decentralized Delegation: The Handoff Pattern
Master the art of multi-agent orchestration using the Handoff pattern. Discover how to create triage agents that seamlessly delegate full control to specialized sub-agents.
3m 27s
6
Centralized Orchestration: Agents as Tools
Keep conversation control in one place using the Agents as Tools pattern. We discuss how a manager agent can synthesize answers from multiple specialist sub-agents.
3m 31s
7
Shaping Context with Handoff Inputs and Filters
Optimize multi-agent token usage by modifying conversation histories between handoffs. Learn how to inject metadata and apply transcript filters.
3m 47s
8
Controlling State: to_input_list and Server IDs
A deep dive into manual conversation management. Understand the lowest-level methods for preserving context across turns and utilizing server-side response IDs.
3m 45s
9
Automating Memory with Built-in Sessions
Simplify your chat loops with the SDK's built-in memory system. We explore SQLiteSession, OpenAIConversationsSession, and automated persistence.
3m 42s
10
Protecting Workflows: Input and Output Guardrails
Secure your AI pipelines by catching malicious inputs before they reach expensive models. We cover agent-level guardrails and parallel vs blocking execution.
3m 32s
11
Validating Actions: Tool-Level Guardrails
Prevent critical data leaks at the function level. Learn how to wrap specific tools with precise input and output guardrails.
3m 43s
12
Pausing Execution: Human-in-the-Loop and RunState
Implement safeguards for irreversible actions by enforcing human-in-the-loop approvals. We explore the RunState serialization pipeline for pausing and resuming workloads.
3m 18s
13
Injecting Local Dependencies with RunContextWrapper
Master dependency injection in your agent flows. Learn how to securely pass local states and database connections to tools without leaking them to the LLM.
3m 45s
14
The USB-C for AI: Intro to MCP
An introduction to the Model Context Protocol (MCP). Discover how this standard acts as a universal connector to easily hook AI agents into SaaS platforms.
3m 15s
15
Connecting Local MCP Servers via Stdio and HTTP
Dive deeper into MCP by running standard local servers. Learn to sandbox filesystem access and internal tools securely with MCPServerStdio.
3m 32s
16
Visualizing Workflows with Built-in Tracing
Eliminate print statement debugging using built-in SDK observability. Discover how automatic spans and traces link entire complex workflows.
3m 18s
17
Low-Latency Voice with Realtime Agents
Break the standard request-response paradigm. See how Realtime Agents maintain live WebSocket connections to handle interruptions and multimodal reasoning.
3m 48s
18
Building Reactive UIs with Streaming Events
Go beyond streaming text tokens. Utilize semantic streaming events to build ultra-responsive frontend interfaces that react to agent actions in real time.
3m 33s

Episodes

1

Beyond Swarm: The Core Primitives

3m 25s

Discover the foundational concepts of the OpenAI Agents SDK. This episode covers why the SDK exists, how it improves upon Swarm, and the core design principles prioritizing minimal abstractions.

Hi, this is Alex from DEV STORIES DOT EU. OpenAI Agents SDK, episode 1 of 18. Most AI agent frameworks force you to learn a dozen new abstractions, custom syntax, and heavy object hierarchies. You spend more time fighting the framework than writing your actual application logic. Today, we look at Beyond Swarm: The Core Primitives. When developers hear about a new agent framework, they often expect a massive ecosystem with a steep learning curve. The OpenAI Agents SDK is exactly the opposite. It is an evolution of the experimental Swarm library. Swarm proved that you could build complex interactions with very simple concepts. The Agents SDK takes that philosophy and hardens it for production use. It is deliberately lightweight and entirely Python-first. The architecture rests on two fundamental design principles: exposing very few core primitives, and keeping the execution highly customizable. You do not need a complex graph of nodes or a proprietary declarative language to build an agent here. The SDK gives you a tiny set of building blocks. The foundational piece is the Agent object. You define an Agent by providing it with a name and a set of instructions. If you want to build a simple history tutor, you instantiate an Agent, name it HistoryTutor, and pass it a text string instructing it to teach historical events clearly and accurately. That is your entire agent configuration. There is no hidden state and no complicated initialization. But an agent by itself is just a static data structure. It does nothing until it is executed. This is where it gets interesting. Execution is handled entirely by a separate component called the Runner. The Runner manages the complete interaction loop between your local code and the remote OpenAI API. In a typical application, you would have to write a custom while-loop to check if the model wants to call a tool, parse the response, execute the tool, and send the result back. The Runner abstracts all of that away. To start the process, you pass your HistoryTutor agent and the user prompt into the run method. The Runner takes over from there. It sends the prompt to the model. If the model decides it needs to look up a specific historical date, it will request a tool call. The Runner pauses, executes the local Python function you provided for that tool, captures the return value, and sends it right back to the model. It repeats this cycle automatically. It only returns control to your application when the model determines the task is complete and generates a final text response. This strict separation between the static Agent definition and the active Runner execution is what makes the SDK so customizable. Because tools are just standard Python functions with regular type hints, and the agent is just a plain object, you maintain total control over your application flow. You can easily inject custom logging, metrics, or error handling around the Runner without overriding deep framework classes. You are just writing Python. The true value of this SDK is not in what it adds, but in what it removes — it gets out of your way and lets you orchestrate language models using plain, readable code. If you want to help support the show, you can search for DevStoriesEU on Patreon. That is your lot for this one. Catch you next time!
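To make that concrete, here is a minimal sketch of the two primitives, assuming the published openai-agents package; the tutor prompt and question are illustrative:

```python
import asyncio

from agents import Agent, Runner

# The Agent is a static configuration object: a name plus instructions.
history_tutor = Agent(
    name="HistoryTutor",
    instructions="You teach historical events clearly and accurately.",
)

async def main() -> None:
    # The Runner owns the loop: it calls the model, executes any requested
    # tools, and only returns once the model produces a final response.
    result = await Runner.run(history_tutor, "Why did the Roman Republic fall?")
    print(result.final_output)

asyncio.run(main())
```
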
2

Defining the Agent and the Run Loop

3m 08s

Learn how to configure the foundational Agent object. We explore instructions, model settings, and how to force structured data outputs seamlessly.

Hi, this is Alex from DEV STORIES DOT EU. OpenAI Agents SDK, episode 2 of 18. What if you could force your language model to always reply in a perfectly parsed data structure without writing a single line of complex regex? Today, we cover Defining the Agent and the Run Loop, which is exactly how you achieve that strict control. A common point of confusion when starting with this SDK is thinking that an agent is an active, running process. It is not. In this architecture, the Agent object is strictly a configuration container. It wraps a specific language model in a predefined context. It does not execute itself, and it holds no state on its own. You are simply building a blueprint. To define this blueprint, you instantiate an Agent. You start with the instructions parameter. This is your core system prompt, where you define the persona, the boundaries, and the specific rules the model must follow. Next, you provide the model settings. This dictates which underlying model to use and configures standard inference details. At this stage, your agent is fully defined but completely dormant in memory. This is where it gets interesting. You can physically constrain the shape of the agent response using the output type parameter. Imagine you are building a tool to extract calendar events from messy email threads. Instead of writing instructions begging the model to format dates correctly, you define a concrete data structure in your code. You define a Calendar Event class with strict fields for a title, a start time, and a location. You pass this class into the output type parameter of your Agent. When configured this way, the API enforces the schema. The agent cannot return a conversational text reply. It will only ever return a validated Calendar Event object that your application code can ingest immediately. You now have a strict, well-configured agent blueprint. To make it do actual work, you need the Run Loop. Because the agent is just a static definition, execution is handled entirely by a separate Runner component. The runner is the engine. You pass your agent definition and the user input into the runner, and it takes over the execution. When you trigger the runner, it enters an execution loop. It bundles up your agent instructions, the strict output schema, and the user prompt, then sends them to the model. The run loop is responsible for managing all the back-and-forth orchestration. If the model decides it needs to call an external tool to fetch missing data, the runner intercepts that request, runs the local tool code, and feeds the result back to the model. It handles all of these intermediate steps automatically. The loop only terminates when the model resolves the prompt and produces the final output matching your exact calendar output type. Keeping the static agent configuration entirely separate from the active run loop is what allows you to safely reuse that exact same calendar extraction agent across thousands of simultaneous executions without any data bleeding over. Thanks for listening, happy coding everyone!
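A short sketch of the pattern described above, assuming a Pydantic model passed to output_type; the field names are illustrative:

```python
import asyncio

from pydantic import BaseModel

from agents import Agent, Runner

# The structure the agent must return; no free-form text replies.
class CalendarEvent(BaseModel):
    title: str
    start_time: str
    location: str

extractor = Agent(
    name="CalendarExtractor",
    instructions="Extract the calendar event described in the email thread.",
    output_type=CalendarEvent,  # the API enforces this schema on the reply
)

async def main() -> None:
    result = await Runner.run(
        extractor,
        "Team sync next Tuesday at 10:00 in the Berlin office. See you there!",
    )
    event = result.final_output  # already a validated CalendarEvent instance
    print(event.title, event.start_time, event.location)

asyncio.run(main())
```
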
3

Equipping Agents with Python Function Tools

3m 16s

Empower your agents to take action by converting standard Python functions into executable tools. Understand automatic schema generation and type inference.

Hi, this is Alex from DEV STORIES DOT EU. OpenAI Agents SDK, episode 3 of 18. Writing JSON schemas for language model tools by hand is tedious and fragile. One missing bracket or mismatched type, and the model fails to understand how to interact with your system. Instead of manually writing these definitions, you can just write standard Python and let the framework do the translation. This is exactly what we cover today: Equipping Agents with Python Function Tools. A common misconception is that exposing local logic to an agent requires maintaining two sources of truth. People assume they need their actual Python code, plus a separate, complex JSON configuration file that describes that code to the language model. With the Agents SDK, you completely bypass the manual JSON writing. You simply write a standard Python function and place the function tool decorator directly above it. When you apply that decorator, the SDK goes to work under the hood using Python's built-in inspect module and Pydantic. It scans your function signature. It reads the parameter names, extracts the type hints, and pulls out the function's docstring. From those elements, it automatically generates a strict JSON schema and attaches it to the agent. Let us look at a concrete scenario. You want to give your agent a function called fetch weather. This function needs precise geographical data to work. Instead of letting the model guess what string format to use, you define a specific structure. You create a custom type, perhaps a Typed Dictionary called Location, containing distinct string fields for city and country. You then use this Location type as the strict type hint for the input parameter on your fetch weather function. Here is the key insight. You must add a clear docstring to this function. You might write a simple sentence stating that this tool retrieves the current weather conditions for a specific city and country. The framework extracts this text and uses it as the tool's core description in the prompt. Your docstring is no longer just a helpful note for your fellow developers. It is the literal instruction manual the agent evaluates to decide if it should trigger the tool. If a user asks whether they need a jacket in Tokyo, the agent reviews the available tools. It reads your docstring, realizes the fetch weather function provides the answer, and structures a request. Because you type-hinted the input with your Location dictionary, Pydantic guarantees the agent's output exactly matches the required fields before your Python logic even executes. If the model tries to pass a single text string instead of the dictionary, the framework catches the error and forces the agent to try again with the correct structure. The SDK executes the function locally, captures the return value, and feeds the result directly back into the agent's reasoning loop. Your standard Python type hints and docstrings are no longer passive documentation; they form the active, binding API contract your agent relies on to interact with the real world. Thanks for spending a few minutes with me. Until next time, take it easy.
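A minimal sketch of the decorator flow, closely following the fetch weather example from the narration; the weather data itself is a stub:

```python
import asyncio
from typing import TypedDict

from agents import Agent, Runner, function_tool

class Location(TypedDict):
    city: str
    country: str

@function_tool
def fetch_weather(location: Location) -> str:
    """Retrieve the current weather conditions for a specific city and country."""
    # A real implementation would call a weather API; this is a stub.
    return f"18 degrees and cloudy in {location['city']}, {location['country']}"

agent = Agent(
    name="WeatherAssistant",
    instructions="Answer weather questions using the available tools.",
    tools=[fetch_weather],
)

async def main() -> None:
    result = await Runner.run(agent, "Do I need a jacket in Tokyo today?")
    print(result.final_output)

asyncio.run(main())
```
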
4

Scaling Tool Surfaces with Hosted Tool Search

3m 22s

Learn how to manage massive tool libraries without exhausting your token budget. We cover deferred loading, namespaces, and hosted tool execution.

Hi, this is Alex from DEV STORIES DOT EU. OpenAI Agents SDK, episode 4 of 18. Passing a hundred tools to a language model destroys performance and burns your token budget before a single action is taken. You cannot stuff every enterprise API schema into the initial context window and expect good reasoning. The solution to this is Scaling Tool Surfaces with Hosted Tool Search. Before we look at scaling, we need to define hosted tools. Hosted tools execute natively on OpenAI infrastructure rather than running on your local machine. Built-in examples include the Web Search Tool and the File Search Tool. You do not write the execution logic, manage the web crawler, or build the file chunking mechanism for these. You attach them to your agent, and the OpenAI backend handles the actual work. But the concept of hosted tools extends to how the model discovers your own custom tools when you have too many of them. Consider a customer relationship management agent. You might have fifty distinct tools for checking order status, pulling billing history, updating shipping addresses, and retrieving support logs. If you pass all fifty schemas upfront, you overwhelm the model and waste input tokens. Many developers think they need to solve this by building a client-side retrieval step. They assume they must intercept the user prompt, search a local vector database for relevant tool schemas, and dynamically inject them into the prompt before calling the language model. You do not need to do that. Hosted tool search happens natively on OpenAI servers using the Responses API. The model itself is capable of searching the available tool surface without your client code acting as the middleman. You achieve this using two parameters: tool namespace and defer loading. When you register your CRM tools, you group related functions by assigning them to a shared namespace. For instance, you could place all your customer profile tools into a namespace called customer account. Then, you set the defer loading parameter to true for those tools. This is the part that matters. When defer loading is active, the agent does not send the individual tool schemas to the language model at the start of the conversation. Instead, it sends a single, lightweight schema that represents the customer account namespace itself. The model is made aware that this namespace exists and knows how to query it if needed. When the user asks to look up a specific customer ID, the model realizes it needs more information. It executes a search natively against the customer account namespace. OpenAI servers find the relevant billing or support tool, load only that specific tool schema into the model context, and then the model executes the tool call. This completely decouples the size of your tool library from your upfront token cost. You could attach hundreds of tools to a single agent, and the initial prompt remains tiny. The model only incurs the token price for the specific tool schemas it actively decides to pull in at runtime. By deferring the load, you are trading a massive static context burden for a dynamic, precise retrieval mechanism. That is your lot for this one. Catch you next time!
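A rough sketch of the registration shape described here. The tool_namespace and defer_loading keyword names are taken straight from the narration and are assumptions, not verified parameters of the current function_tool signature, so check the SDK reference before relying on them:

```python
from agents import Agent, function_tool

# Hypothetical keyword arguments: the names follow the episode's wording only.
@function_tool(tool_namespace="customer_account", defer_loading=True)
def get_billing_history(customer_id: str) -> str:
    """Return the billing history for a customer account."""
    return f"Billing history for {customer_id}: ..."

@function_tool(tool_namespace="customer_account", defer_loading=True)
def get_support_log(customer_id: str) -> str:
    """Return recent support interactions for a customer account."""
    return f"Support log for {customer_id}: ..."

# Only a lightweight schema for the customer_account namespace would be sent
# upfront; individual tool schemas get loaded server-side on demand.
crm_agent = Agent(
    name="CRMAssistant",
    instructions="Help support staff look up customer account information.",
    tools=[get_billing_history, get_support_log],
)
```
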
5

Decentralized Delegation: The Handoff Pattern

3m 27s

Master the art of multi-agent orchestration using the Handoff pattern. Discover how to create triage agents that seamlessly delegate full control to specialized sub-agents.

Hi, this is Alex from DEV STORIES DOT EU. OpenAI Agents SDK, episode 5 of 18. Sometimes the best way for a manager to handle a complex task is to completely step out of the way. If your main routing agent tries to mediate every interaction between a user and your backend systems, your prompts get bloated and your execution gets unreliable. This is exactly what Decentralized Delegation, specifically the Handoff Pattern, is designed to fix. A handoff is a mechanism where one agent transfers full control of the conversation to another agent. A common mistake is confusing a handoff with standard tool calling. They are fundamentally different. When an agent calls a normal function, it pauses, waits for the data to return, and then formulates a response for the user. When an agent triggers a handoff, it yields the entire conversation turn. Control passes completely to the new agent. The original agent steps out of the way entirely. This matters because it keeps your architecture decentralized. If a central triage agent has to process the output of every specialist action, its system prompt must be enormous. It needs instructions on how to frame refund policies, technical troubleshooting, and account deletion. Worse, the triage agent will inevitably try to narrate the specialist's work back to the user. This wastes tokens, adds latency, and introduces a high risk of hallucination. Handoffs prevent this by letting the specialist talk directly to the user. Consider a customer support system. You deploy a general triage agent to greet users and categorize requests. A customer writes in asking to process a refund. You also have a dedicated refund agent, which is equipped with specific billing tools and strict instructions on the company return policy. To connect them using the SDK, you write a standard function called transfer to refund. But instead of returning a string or JSON data, this function returns your refund agent object. You then hand this transfer function to your triage agent, listing it just like any other tool. When the customer asks for a refund, the triage agent decides to call the transfer function. Here is the key insight. The underlying SDK runner loop executes the function and sees that an Agent object was returned instead of standard data. The runner instantly swaps the active agent in its memory. It takes the existing conversation history and feeds it directly into the newly activated refund agent. The refund agent takes over the active turn, processes the user request, triggers its own billing tools, and replies directly to the user. You can also pass data during this transition. If the triage agent already asked the user for their order number, it can pass that order number as an argument into the transfer function. The function can then inject that order number into the new agent's context variables before returning it. The refund agent wakes up already knowing exactly which transaction to look up. By using handoffs, you keep each agent small, focused, and predictable, letting the conversation flow naturally from one narrow expert to the next. That is all for this one. Thanks for listening, and keep building!
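The narration describes the transfer as a function that returns an agent; in the shipped SDK the same effect comes from listing sub-agents in the handoffs parameter. A minimal sketch under that assumption:

```python
import asyncio

from agents import Agent, Runner

refund_agent = Agent(
    name="RefundAgent",
    instructions=(
        "You process refunds. Follow the company return policy strictly "
        "and reply to the customer directly."
    ),
)

# Listing refund_agent under handoffs gives the triage agent a transfer tool;
# when it fires, the runner swaps the active agent and passes the history on.
triage_agent = Agent(
    name="TriageAgent",
    instructions="Greet the customer, categorize the request, and hand off to a specialist.",
    handoffs=[refund_agent],
)

async def main() -> None:
    result = await Runner.run(triage_agent, "I want a refund for order 4512.")
    # The reply comes from whichever agent ended the turn, here the RefundAgent.
    print(result.final_output)

asyncio.run(main())
```
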
6

Centralized Orchestration: Agents as Tools

3m 31s

Keep conversation control in one place using the Agents as Tools pattern. We discuss how a manager agent can synthesize answers from multiple specialist sub-agents.

Hi, this is Alex from DEV STORIES DOT EU. OpenAI Agents SDK, episode 6 of 18. If your AI assistant needs to consult three different departments before answering a user, you rarely want the user talking to those departments directly. You want a single, consistent voice handling the conversation, fetching information silently behind the scenes. That is exactly what Centralized Orchestration using Agents as Tools allows you to do. You might confuse this with handoffs, where an agent permanently passes the user over to another agent. With handoffs, the new agent takes over the conversation completely. With centralized orchestration, control is never passed. The main agent, usually acting as a manager, retains absolute control over the conversation. The manager is the only voice the user ever hears. You achieve this by taking a fully configured agent and turning it into a callable function. Every agent object in the SDK has a method named as_tool. When you call this method, it wraps the entire agent, including its specific instructions and its own tools, into a standard tool format. You then provide this wrapped agent to your manager agent, exactly as you would provide a standard Python function. Let us look at a practical scenario. You are building a customer support portal. You create a booking specialist agent. Its only job is to query internal systems, cross-reference dates, and return availability. This agent is highly technical. Its instructions are optimized for database accuracy, not polite conversation. You do not want the user interacting with this specialist. So, you call the as_tool method on the booking specialist. Next, you create your manager agent. You give the manager strict instructions to maintain a polite, corporate tone and handle the user relationship. You then add the wrapped booking specialist to the manager's list of tools. When a user asks the manager to check availability for next Tuesday, the manager processes the request. It recognizes that it lacks the actual data, but it knows it has a tool that can find it. The manager invokes the booking tool. Here is the key insight. When that tool is invoked, the booking specialist agent wakes up, executes its own isolated internal steps, and produces an answer. But it does not send that answer to the user. It returns a raw, factual result directly back to the manager. The manager receives this data, synthesizes it, wraps it in a polite corporate greeting, and finally responds to the user. This hub-and-spoke pattern solves a major problem with complex applications: context bloat. The manager agent does not need to know the database schema or the rules for checking dates. It keeps its system prompt clean, focusing entirely on routing requests and formatting responses. Meanwhile, the specialist agent does not need to care about conversation history or brand voice. It just does its narrow job and returns a result. When deciding between a handoff and a tool, ask yourself who owns the final response. If the specialized agent needs to enter a prolonged back-and-forth dialogue with the user, you want a handoff. But if the specialized agent is just a sophisticated data processor providing an answer for the main assistant to use, wrap it as a tool and let the manager own the relationship. Thanks for spending a few minutes with me. Until next time, take it easy.
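A compact sketch of the hub-and-spoke wiring, assuming the as_tool method with tool_name and tool_description arguments; the instructions are illustrative:

```python
import asyncio

from agents import Agent, Runner

booking_specialist = Agent(
    name="BookingSpecialist",
    instructions="Check availability for requested dates and return raw factual results.",
)

manager = Agent(
    name="SupportManager",
    instructions=(
        "You own the conversation. Keep a polite, corporate tone and use your "
        "tools to fetch facts before replying."
    ),
    tools=[
        # The wrapped specialist answers the manager, never the user.
        booking_specialist.as_tool(
            tool_name="check_availability",
            tool_description="Check booking availability for a given date.",
        )
    ],
)

async def main() -> None:
    result = await Runner.run(manager, "Is there availability next Tuesday?")
    print(result.final_output)  # the manager's synthesized reply

asyncio.run(main())
```
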
7

Shaping Context with Handoff Inputs and Filters

3m 47s

Optimize multi-agent token usage by modifying conversation histories between handoffs. Learn how to inject metadata and apply transcript filters.

Hi, this is Alex from DEV STORIES DOT EU. OpenAI Agents SDK, episode 7 of 18. When transferring a customer to a human specialist, you do not force the specialist to read a massive raw transcript of every automated system check that just occurred. You give them a clear reason for the transfer and a clean summary of the problem. Yet, when developers connect AI agents, they often dump the entire raw chat log into the next agent's context window. Today, we fix that by Shaping Context with Handoff Inputs and Filters. When one agent hands over control to another, it needs a way to communicate why the transfer is happening. You do this using the input type parameter on your handoff routine. You define a schema, typically a Pydantic model, specifying exactly what information the receiving agent requires. When the current agent decides to execute the handoff, the underlying language model generates a payload that matches this schema. Let us clear up a common confusion right away. It is easy to mistake this input type for persistent application state, like a user profile ID or a backend database connection that lives throughout the session. It is not. The input type is strictly for transient, model-generated metadata created exactly at the moment of the handoff. For example, if a triage agent hands a user over to a billing specialist, the input type might require a field called escalation reason. The triage agent generates a short string explaining the specific billing error, and the billing agent receives that structured data immediately upon waking up. That handles the explicit handover message. Now, we must manage the conversation history. By default, the entire message history travels with the handoff. Every user prompt, every assistant reply, and every background tool call gets passed along. This burns through tokens quickly and fills the context window with irrelevant noise. You control this using an input filter. An input filter is a standard Python function that intercepts the conversation history right before the receiving agent reads it. It takes the full list of previous messages as an argument, processes them, and returns a new, modified list of messages. Consider a scenario where your initial agent spent ten turns calling various search tools and database APIs trying to resolve an issue before finally giving up and routing the user to a general FAQ agent. The FAQ agent only needs the user's actual questions. It absolutely does not need the raw JSON outputs of ten failed tool calls. To solve this, you write an input filter function. Inside it, you iterate through the list of incoming messages. You check the role of each message. If the message is a tool execution or a raw tool result, you drop it. If it is a direct user message or a final assistant reply, you append it to your new list. You then return this clean list and attach your filter function to the handoff definition. The FAQ agent now receives a streamlined history containing only the human-readable back-and-forth. Here is the key insight. Handoff inputs add structured intelligence to the transition, while input filters ruthlessly cut the noise. Together, they shape exactly what the receiving agent knows. You save tokens, reduce latency, and prevent the new agent from hallucinating based on the previous agent's discarded reasoning. Controlling context at the handoff boundary is the single most effective way to keep a multi-agent system fast and accurate. 
If you want to help keep the show going, you can search for DevStoriesEU on Patreon — your support means a lot. That is all for this one. Thanks for listening, and keep building!
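A sketch of both mechanisms, assuming the handoff() helper, an input_type Pydantic model, and the bundled remove_all_tools filter from agents.extensions.handoff_filters; the escalation schema is illustrative:

```python
from pydantic import BaseModel

from agents import Agent, RunContextWrapper, handoff
from agents.extensions import handoff_filters

class EscalationData(BaseModel):
    escalation_reason: str  # generated by the triage model at handoff time

async def on_billing_handoff(ctx: RunContextWrapper[None], data: EscalationData) -> None:
    # Transient, model-generated metadata: log it or route on it here.
    print(f"Escalated to billing: {data.escalation_reason}")

billing_agent = Agent(
    name="BillingSpecialist",
    instructions="Resolve billing issues using the structured escalation reason.",
)

triage_agent = Agent(
    name="TriageAgent",
    instructions="Route customers to the right specialist.",
    handoffs=[
        handoff(
            agent=billing_agent,
            input_type=EscalationData,      # structured handoff payload
            on_handoff=on_billing_handoff,
            # Drop tool calls and tool results so the specialist only sees
            # the human-readable back-and-forth.
            input_filter=handoff_filters.remove_all_tools,
        )
    ],
)
```
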
8

Controlling State: to_input_list and Server IDs

3m 45s

A deep dive into manual conversation management. Understand the lowest-level methods for preserving context across turns and utilizing server-side response IDs.

Hi, this is Alex from DEV STORIES DOT EU. OpenAI Agents SDK, episode 8 of 18. An AI agent with amnesia is useless. But storing its memory incorrectly can silently duplicate conversation history and double your API costs. This happens when you accidentally blend different methods of tracking chat history. Today we are looking at Controlling State: to_input_list and Server IDs. By default, a basic agent run is completely stateless. You send a string, you get a string. If you ask a follow-up question, the agent has no context for what you just discussed. You must provide the history yourself. While the SDK offers high-level session wrappers, sometimes you need the lowest-level, most transparent way to maintain chat history across multiple turns without any magically hidden state. There are two explicit ways to handle this. The first method keeps the source of truth on your machine using a method called to input list. When an agent finishes a run, it returns a result object. This object contains the final response, but it also contains the hidden steps the agent took to get there. If the agent called a database tool, read the output, and then formulated an answer, all of those intermediate steps are part of the conversation state. Calling to input list on the result object packages up that entire sequence. It returns a flat array containing the original user prompt, the agent replies, the specific tool calls, and the tool outputs. It formats all of this exactly how the API expects to receive it. If you are building a command-line chat loop, the logic goes like this. You define a variable to hold the conversation array. Initially, it just holds the first user prompt. You pass this array into the agent. When the agent finishes, you take the result and call to input list, which gives you the full updated history of that turn. When the user types their second question, you manually append their new message to the end of that list, and pass the whole thing back to the agent. You are in total control of the payload. Now, the second piece of this. Shuttling a massive array of previous messages and JSON tool outputs back and forth over the network on every single turn uses bandwidth. If you want to avoid that, you can use server-side IDs. Every response you get from the API includes a unique identifier. Instead of passing an array of past messages into your next agent run, you pass a parameter called previous response id. Here is the key insight. When you provide a previous response ID, your client only sends the brand new user message. You do not send the history array. The OpenAI server looks up that ID, retrieves the existing context thread on its end, attaches your new message to it, and generates the next reply. This brings us to a critical trap. You might be tempted to mix these approaches. You might append the user input to the client-side list, pass that full list to the agent, and also pass the previous response ID just to be safe. Do not do this. You must pick exactly one strategy per conversation. If you provide both the full history array and a previous response ID, the server will concatenate them. Your agent will read the entire conversation twice, get confused by the duplicate tool calls, and you will pay for those tokens twice. The choice between these two methods comes down to visibility versus efficiency. Use client-side lists when you need to audit, filter, or modify the conversation history between turns. 
Use server-side IDs when you trust the raw thread and want to minimize your network payload. That is all for this one. Thanks for listening, and keep building!
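A sketch of both strategies side by side. The last_response_id attribute used for the server-side variant is an assumption about where the response ID is exposed, so verify it against your SDK version:

```python
import asyncio

from agents import Agent, Runner

agent = Agent(name="Assistant", instructions="Answer concisely.")

async def client_side_history() -> None:
    # Strategy 1: the source of truth lives in your process.
    history = [{"role": "user", "content": "Who designed the Python language?"}]
    result = await Runner.run(agent, history)
    print(result.final_output)

    # to_input_list() returns the whole turn (prompt, tool calls, replies)
    # in exactly the shape the API expects as input.
    history = result.to_input_list()
    history.append({"role": "user", "content": "When was it first released?"})
    result = await Runner.run(agent, history)
    print(result.final_output)

async def server_side_ids() -> None:
    # Strategy 2: send only the new message plus the previous response ID.
    first = await Runner.run(agent, "Who designed the Python language?")
    second = await Runner.run(
        agent,
        "When was it first released?",
        previous_response_id=first.last_response_id,  # assumed attribute name
    )
    print(second.final_output)

asyncio.run(client_side_history())  # pick exactly one strategy per conversation
```
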
9

Automating Memory with Built-in Sessions

3m 42s

Simplify your chat loops with the SDK's built-in memory system. We explore SQLiteSession, OpenAIConversationsSession, and automated persistence.

Hi, this is Alex from DEV STORIES DOT EU. OpenAI Agents SDK, episode 9 of 18. You are building a persistent chat bot. You write code to fetch previous messages from a database, pass them to your agent, extract the new response, and write the updated list back to the database. It is tedious boilerplate you have written a dozen times. Today we are looking at Automating Memory with Built-in Sessions, which replaces all that database read and write logic with a single object. Let us clear up a common misconception right away. When developers start using sessions in this SDK, they often think they still need to manually fetch and pass the message history to the agent runner alongside the session. They do not. Providing a session object to the runner completely replaces manual history management. You hand over the keys. The runner automatically retrieves past messages right before the conversation turn begins, and it automatically appends the new messages the moment the turn ends. Consider a persistent Slack bot that needs to remember user preferences across different days. You want to store this state on disk without setting up a heavy external database. The SDK provides a built-in tool for this called SQLite Session. Because interacting with a file system requires input output operations, this tool is fully asynchronous. To use it, you first instantiate a SQLite Session, providing a file path for your local database. Then, you connect to it using an async context manager. Think of this as opening a safe connection that guarantees the database file is properly locked and unlocked. Inside that connection block, you call your agent runner. Instead of passing an array of past messages to the runner, you simply pass the session object and a session ID. The session ID is just a unique string. For your Slack bot, this ID could be the user ID or the channel ID. The runner takes that ID, searches the SQLite file, loads the existing history, processes the user prompt, and then safely persists the new state back to the file. All of this happens behind the scenes. Here is the key insight. Unbounded conversation history will eventually break your context window limits. You do not want a minor chat from three months ago inflating your token usage or crashing your API call today. To control this, the SDK provides Session Settings. When you call the runner, you can include a Session Settings object alongside the session itself. This settings object accepts a parameter for maximum past messages. If you set it to ten, the runner automatically truncates the loaded history. It keeps only the ten most recent messages in the active context sent to the model, but your full historical log remains safely untouched in the SQLite database. SQLite Session is ideal for local persistence or single server applications. If your Slack bot grows and you need to scale across multiple servers, the SDK handles that seamlessly. You keep your runner code exactly the same, but swap the local SQLite session for a distributed option like a Redis Session or a Dapr Session. The primary takeaway is that sessions enforce a strict separation of concerns. By configuring a session object and handing it to the runner, you eliminate fragile database boilerplate and guarantee your agent memory is always perfectly synchronized with its execution state. That is it for today. Thanks for listening — go build something cool.
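A minimal sketch of the session flow, assuming SQLiteSession(session_id, db_path) and the runner's session parameter; the session-settings truncation mentioned above is omitted here:

```python
import asyncio

from agents import Agent, Runner, SQLiteSession

agent = Agent(name="SlackBot", instructions="Remember user preferences across chats.")

async def handle_message(user_id: str, text: str) -> str:
    # One session per Slack user; history persists in a local SQLite file.
    session = SQLiteSession(user_id, "slack_memory.db")

    # No manual history plumbing: the runner loads past messages before the
    # turn starts and appends the new ones once it ends.
    result = await Runner.run(agent, text, session=session)
    return result.final_output

print(asyncio.run(handle_message("U12345", "My favourite deploy day is Thursday.")))
```
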
10

Protecting Workflows: Input and Output Guardrails

3m 32s

Secure your AI pipelines by catching malicious inputs before they reach expensive models. We cover agent-level guardrails and parallel vs blocking execution.

Hi, this is Alex from DEV STORIES DOT EU. OpenAI Agents SDK, episode 10 of 18. Your most powerful reasoning model is also your most expensive. If a user tries to trick it into doing their math homework or violating safety policies, you do not want to find out after it just spent two minutes and a thousand tokens thinking about it. Protecting your workflows with Input and Output Guardrails is how you prevent this. Agent-level guardrails act as bouncers for your primary models. They are separate functions, often powered by smaller, faster, and cheaper models, that validate data entering or leaving your agent. By intercepting requests, they keep your expensive models focused on real work and ensure your application remains safe and on-topic. You apply them in two places: at the input, and at the output. Input guardrails evaluate the user's prompt before your primary agent gets to work. Consider a scenario where you have a heavy, slow model handling complex financial analysis. You can set up an input guardrail using a fast, lightweight model to review every incoming message. When a user asks a question, this fast model intercepts it. It checks if the user is trying to get the agent to do a school assignment, or perhaps attempting a prompt injection attack. If the input is flagged, the guardrail rejects it immediately and returns a standard refusal message. Your heavy reasoning model never even wakes up. You save time, and you save money. Output guardrails handle the other end of the transaction. They provide a final check before the user sees the response. The main agent has completed its task, but you need to ensure no sensitive data is leaking, or that the tone aligns with your company guidelines. The output guardrail reviews the generated text. If it detects a hallucination or a policy violation, it blocks the message from reaching the user. Here is the key insight. How these guardrails impact your application depends entirely on their execution mode. You can run them in blocking mode, or parallel mode. Blocking mode is a strict sequence. An input guardrail must finish its evaluation and return a passing grade before the main agent is allowed to start. An output guardrail must finish checking the final response before the user receives a single word. This is the safest approach and guarantees you will not waste money on bad requests, but it adds latency to the interaction. Parallel mode trades strict cost control for speed. A common misconception is that parallel execution somehow pauses the main agent while the guardrail runs alongside it. It does not. In parallel mode, the input guardrail and the main agent start at the exact same time. The agent is actively generating text and consuming tokens while the guardrail is still evaluating the input prompt. If the guardrail decides to fail the request, it will cancel the main agent mid-flight. The user is still protected from seeing the output, but you still pay for the tokens the primary agent consumed before it was cut off. You configure this by defining a simple function that returns a pass or fail decision, attaching it to your agent, and declaring its mode. Always match your execution mode to your priorities: use blocking guardrails to protect your wallet from expensive models, and parallel guardrails to protect your user experience from latency. That is all for this one. Thanks for listening, and keep building!
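A sketch of an input guardrail in blocking style, assuming the input_guardrail decorator, GuardrailFunctionOutput, and the tripwire exception; the homework check and agent names are illustrative:

```python
import asyncio

from pydantic import BaseModel

from agents import (
    Agent,
    GuardrailFunctionOutput,
    InputGuardrailTripwireTriggered,
    RunContextWrapper,
    Runner,
    input_guardrail,
)

class HomeworkCheck(BaseModel):
    is_homework: bool
    reasoning: str

# A small, cheap model does the screening so the expensive agent never wakes up.
checker = Agent(
    name="HomeworkChecker",
    instructions="Decide whether the user is asking you to do school homework.",
    output_type=HomeworkCheck,
)

@input_guardrail
async def homework_guardrail(
    ctx: RunContextWrapper[None], agent: Agent, user_input
) -> GuardrailFunctionOutput:
    verdict = await Runner.run(checker, user_input)
    return GuardrailFunctionOutput(
        output_info=verdict.final_output,
        tripwire_triggered=verdict.final_output.is_homework,
    )

analyst = Agent(
    name="FinancialAnalyst",
    instructions="Produce detailed financial analysis.",
    input_guardrails=[homework_guardrail],
)

async def main() -> None:
    try:
        result = await Runner.run(analyst, "Solve my calculus homework, please.")
        print(result.final_output)
    except InputGuardrailTripwireTriggered:
        print("Request refused before the expensive model ever started.")

asyncio.run(main())
```
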
11

Validating Actions: Tool-Level Guardrails

3m 43s

Prevent critical data leaks at the function level. Learn how to wrap specific tools with precise input and output guardrails.

Hi, this is Alex from DEV STORIES DOT EU. OpenAI Agents SDK, episode 11 of 18. Even the smartest AI can accidentally leak a database secret if it pulls raw data straight into its context. You might assume your top-level safety checks catch everything, but those only run at the very start or the very end of a conversation. If you want to intercept sensitive data mid-workflow, you need Validating Actions, specifically Tool-Level Guardrails. A common trap is assuming that agent-level guardrails protect your underlying systems. They do not. Agent guardrails process the user's initial prompt or the final response sent back to the user. They are blind to the internal back-and-forth when an agent calls a database or a third-party API. If a rogue prompt tricks your agent into calling an internal tool, the agent-level checks will not stop it. To protect the tools themselves, the OpenAI Agents SDK provides tool guardrails. These sit directly on the function, acting as a mandatory checkpoint right before or right after the tool executes. There are two types you need to know. The first is the tool input guardrail. You apply this decorator to validate arguments before the actual tool runs. Say an agent tries to call a tool that deletes a user account. The input guardrail intercepts the arguments the agent generated. It checks if the provided user ID matches a valid format, or if the current session has the correct authorization level. If the input fails this check, the guardrail stops the tool from running entirely. Instead of executing the deletion, it returns an error message directly to the agent. The agent then reads that error and can try again with corrected inputs, without ever touching the actual database. Now, the second piece of this is the tool output guardrail. This operates after the tool successfully executes, but before the result is handed back to the agent. This is where you filter, redact, or validate the payload. Take a database lookup tool as an example. The agent asks for a developer profile, and the tool fetches the raw record. However, that record happens to contain an active API key starting with the letters s-k-dash. If that raw key goes back to the agent, it enters the language model's context window. That is a massive security risk. To fix this, you add a tool output guardrail to that specific lookup function. The guardrail takes the raw database result, scans the text for that s-k-dash pattern, and replaces the actual key with a placeholder string like redacted. Only after this scrubbing process is the cleaned data handed back to the agent. The agent still gets the profile information it needs to answer the user, but the secret never leaves the tool's isolated execution boundary. This is the part that matters. In complex multi-agent workflows, different specialized agents might invoke tools in unpredictable sequences. You cannot rely on prompting the agent to behave securely. You also cannot hope the final output filter catches a leaked key right before it goes to the user, because by then, the key has already been exposed to the language model. You must lock down the tool itself. By attaching the guardrail directly to the function, the security logic travels with the tool. It does not matter which agent calls it or when it gets called in the workflow. The protection is absolute. Tool guardrails treat your functions as zero-trust boundaries, ensuring that no matter how autonomous your agent gets, it can never pass bad data in, or pull sensitive data out. That is all for this one.
Thanks for listening, and keep building!
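As a neutral illustration of the redaction idea, hand-rolled inside the tool rather than using the SDK's tool guardrail decorators, here is a sketch with a hypothetical record format:

```python
import re

from agents import function_tool

SECRET_PATTERN = re.compile(r"sk-[A-Za-z0-9]+")

def scrub_secrets(text: str) -> str:
    # Replace anything that looks like an API key before it can reach the model.
    return SECRET_PATTERN.sub("[REDACTED]", text)

@function_tool
def lookup_developer_profile(developer_id: str) -> str:
    """Fetch a developer profile record by ID."""
    # Hypothetical raw record containing a live key, for illustration only.
    raw_record = f"id={developer_id}; plan=pro; api_key=sk-live123456789"
    # The scrubbed payload is all the agent (and its context window) ever sees.
    return scrub_secrets(raw_record)
```
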
12

Pausing Execution: Human-in-the-Loop and RunState

3m 18s

Implement safeguards for irreversible actions by enforcing human-in-the-loop approvals. We explore the RunState serialization pipeline for pausing and resuming workloads.

Hi, this is Alex from DEV STORIES DOT EU. OpenAI Agents SDK, episode 12 of 18. You would not let a new intern drop a production database without asking for permission first. Your AI agent should not be able to either. When an agent has access to highly destructive or sensitive tools, you need a way to stop it, check its work, and give a manual green light. This is solved by Pausing Execution: Human-in-the-Loop and RunState. If you need a human to approve an action, your first instinct might be to pause the Python script and wait for keyboard input. Do not do this. Keeping a process alive while waiting for an email response or a dashboard click wastes server resources. It also breaks completely if the server restarts or if you deploy new code. The OpenAI Agents SDK handles this cleanly by allowing the Python process to exit entirely and resume later on a completely different server. It starts at the tool definition. When you write a function for your agent, like a tool called delete production database, you apply a flag setting needs approval to true. When the agent processes a prompt and decides it must call this specific tool, the engine halts immediately. The tool does not run. Instead, the runner yields control back to your application code. Here is the key insight. When the execution halts, the runner gives you a RunState object. This object holds the entire context of the run up to that exact millisecond. It knows the conversation history, the agent's internal thought process, and the specific tool call it wants to execute next. You take this RunState object and serialize it into a standard JSON string. You save that JSON payload to your database, or push it to a queue, or write it to disk. Once that state is safely stored, your Python script terminates. Your application is now idle. Hours or even days can pass. Eventually, an engineering manager logs into a web dashboard, sees the pending database deletion, and clicks approve. That click triggers a brand new web request. Your backend wakes up and reads the saved JSON payload from the database. It deserializes that string back into a valid RunState object. You then start a new runner execution. You pass in the same agent instance, the restored RunState, and the human's decision. If you pass an approval, the runner executes the database deletion tool and the agent continues answering the user. If the manager clicked reject, you pass a rejection instead. The runner does not execute the tool. It feeds the rejection back to the agent as a tool error, forcing the agent to adapt its plan or tell the user the action was denied. By serializing the RunState, you turn a synchronous script into an asynchronous workflow, letting humans and agents collaborate safely across any time gap without leaving a single server process hanging. If you want to help keep the show going, you can support us by searching for DevStoriesEU on Patreon. Thanks for hanging out. Hope you picked up something new.
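A heavily hedged sketch of the pause-and-resume flow. The needs_approval flag, the interruptions attribute, and the RunState to_json, from_json, approve, and reject names follow the episode's wording and may differ from the shipped Python API, so treat every one of them as an assumption:

```python
from agents import Agent, Runner, function_tool
from agents import RunState  # assumed import; verify in your SDK version

pending_runs: dict[str, str] = {}  # stand-in for a real database table

@function_tool(needs_approval=True)  # assumed flag: halt before executing
def delete_production_database(database_name: str) -> str:
    """Permanently delete a production database."""
    return f"{database_name} deleted."

ops_agent = Agent(
    name="OpsAgent",
    instructions="Perform infrastructure tasks when explicitly asked.",
    tools=[delete_production_database],
)

async def start(prompt: str, ticket_id: str) -> None:
    result = await Runner.run(ops_agent, prompt)
    if result.interruptions:  # assumed: pending tool approvals
        pending_runs[ticket_id] = result.state.to_json()  # assumed serializer
        # The process can now exit; nothing stays resident in memory.

async def resume(ticket_id: str, approved: bool) -> None:
    state = RunState.from_json(ops_agent, pending_runs[ticket_id])  # assumed
    interruption = state.interruptions[0]
    if approved:
        state.approve(interruption)
    else:
        state.reject(interruption)
    result = await Runner.run(ops_agent, state)  # resumes where it paused
    print(result.final_output)
```
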
13

Injecting Local Dependencies with RunContextWrapper

3m 45s

Master dependency injection in your agent flows. Learn how to securely pass local states and database connections to tools without leaking them to the LLM.

Hi, this is Alex from DEV STORIES DOT EU. OpenAI Agents SDK, episode 13 of 18. How do you tell a function which user is currently chatting, without accidentally teaching the LLM the user's private database ID? If you drop it in the system prompt, you are exposing internal data and wasting tokens. To pass secure, local state to your tools, you use Injecting Local Dependencies with RunContextWrapper. A common mistake when building agents is treating the language model as the middleman for everything. Developers often try to embed database connection strings, internal API keys, or private user identifiers right into the system instructions. They do this hoping the model will dutifully pass those sensitive credentials back into the tool calls as parameters. This is a significant security risk. It also consumes unnecessary context window tokens and increases the chance of the model hallucinating incorrect parameters. The correct approach is to bypass the language model entirely for your local execution state. The Run Context Wrapper provides a secure transport layer to inject dependencies straight into your tools and lifecycle hooks. The context attribute attached to this wrapper is purely local Python state. The model never sees it, never reads it, and never needs to reason about it. Consider a concrete scenario. An authenticated user is chatting with your customer support agent and asks to view their recent billing records. Your system needs to fetch these records, which means your billing tool requires the user's internal database ID to run the database query safely. First, you define your local dependency in your application code. You might create a structured data class called User Info that holds the internal user ID. Next, you write your tool function for fetching the billing history. In the signature of this tool function, you define a parameter specifically typed to accept the context object. The SDK understands this type hint and knows not to expose this parameter to the language model. Inside the function body, you access the context parameter, pull out your User Info dependency, and use that private user ID to securely query your backend. Here is the key insight. When you prepare to execute the agent run, you do not just pass the raw user message. You create an instance of the Run Context Wrapper. You attach your populated User Info data class directly to the wrapper. Then, you hand this wrapper into the execution runner. The logic flow handles the rest automatically. When the user asks for their billing history, the language model decides to trigger your billing tool. The model only provides the arguments it knows about from the public conversation, perhaps filtering by a specific month or an invoice number. It has absolutely no idea who the user is on your backend. Before the function actually runs, the SDK intercepts the call. It inspects the tool signature, notices the tool requires the local context, and automatically injects the state you attached to the Run Context Wrapper. The tool executes using the secure user ID, retrieves the records, and returns the data to the model to formulate a response. This exact same dependency injection mechanism works for execution hooks, allowing you to pass active database connection pools or tracing IDs into your event logging seamlessly. By separating the external reasoning of the model from the internal execution of your code, your infrastructure remains secure and your prompts stay entirely focused on behavior. 
Keep your execution dependencies completely local, and force your architecture to rely entirely on the secure state injected at runtime. Thanks for listening, have a great day everyone!
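A sketch of the injection pattern, assuming the documented flow where the local object is passed through the runner's context parameter and tools receive it via RunContextWrapper; the UserInfo fields are illustrative:

```python
import asyncio
from dataclasses import dataclass

from agents import Agent, RunContextWrapper, Runner, function_tool

@dataclass
class UserInfo:
    internal_user_id: str  # never shown to the model

@function_tool
def fetch_billing_history(ctx: RunContextWrapper[UserInfo], month: str) -> str:
    """Fetch the authenticated user's billing records for a given month."""
    # The secure ID is injected locally; the model only supplied 'month'.
    user_id = ctx.context.internal_user_id
    return f"Billing records for user {user_id} in {month}: ..."

support_agent = Agent[UserInfo](
    name="SupportAgent",
    instructions="Help the customer with billing questions.",
    tools=[fetch_billing_history],
)

async def main() -> None:
    user = UserInfo(internal_user_id="db-8842")
    # The local object rides alongside the run, outside the prompt.
    result = await Runner.run(support_agent, "Show my billing for March.", context=user)
    print(result.final_output)

asyncio.run(main())
```
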
14

The USB-C for AI: Intro to MCP

3m 15s

An introduction to the Model Context Protocol (MCP). Discover how this standard acts as a universal connector to easily hook AI agents into SaaS platforms.

Hi, this is Alex from DEV STORIES DOT EU. OpenAI Agents SDK, episode 14 of 18. Stop writing custom API wrappers every time your agent needs to talk to a new service. If you are tired of manually mapping REST endpoints to JSON schemas just so your language model understands what a service does, there is finally a universal standard. We are looking at the Model Context Protocol, or MCP, and how to use it. Think of MCP as the USB-C port for AI. Before USB-C, every device needed a specific, proprietary charging cable. In the AI world, every external system requires custom glue code. If you want an agent to read a database or create a support ticket, you write a REST client. Then you write an intricate schema describing that client to the model. Finally, you write the logic to catch the model response and trigger the actual HTTP request. MCP replaces that manual labor. It is an open standard that dictates exactly how tools describe themselves, their inputs, and their outputs directly to an AI model. People often confuse MCP with standard REST APIs. A REST API sends data between machines, but it expects you to know exactly how to structure the request beforehand. MCP standardizes the discovery layer. An MCP server tells the agent exactly what tools it holds and what parameters are required, speaking the exact format the model needs. In the OpenAI Agents SDK, you consume these external MCP servers using a class called Hosted MCP Tool. Here is the key insight. You do not redefine the tool schema in your Python code at all. Instead, you initialize the Hosted MCP Tool by giving it the URL of a remote MCP server. This connection operates over Server-Sent Events, which allows the server to push updates back to your agent securely. When you attach this Hosted MCP Tool to your agent setup, a handshake happens. The agent reaches out to the remote server, asks what tools are available, pulls down the fully formed descriptions, and registers them automatically. Let us anchor this to a concrete scenario. You want to give your agent the ability to schedule meetings. Without MCP, you would study the Google Calendar API. You would write a Python function to authenticate and create events. Then you would write out the tool schema so the agent knows what an event title or a timestamp looks like. With MCP, you deploy a pre-built Google Calendar MCP server remotely. In your application code, you create a new Hosted MCP Tool instance and point it at that remote URL. You pass that single instance to your agent. The remote server instantly tells the agent it has a tool called schedule meeting. When the language model decides to schedule that meeting, the SDK proxies the call. It securely routes the request over the network to the remote server, executes the action, and returns the result. You wrote zero lines of calendar integration logic. The real power of MCP is decoupling your core agent logic from external integrations, letting you swap or upgrade backend services by just changing a URL. Thanks for listening, happy coding everyone!
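A minimal sketch assuming HostedMCPTool with a tool_config dictionary; the server URL and label are hypothetical:

```python
import asyncio

from agents import Agent, HostedMCPTool, Runner

calendar_agent = Agent(
    name="SchedulingAssistant",
    instructions="Schedule meetings for the user with the available tools.",
    tools=[
        # The remote MCP server advertises its own tools (e.g. schedule_meeting);
        # no calendar schemas are defined anywhere in this codebase.
        HostedMCPTool(
            tool_config={
                "type": "mcp",
                "server_label": "calendar",
                "server_url": "https://example.com/mcp",  # hypothetical URL
                "require_approval": "never",
            }
        )
    ],
)

async def main() -> None:
    result = await Runner.run(calendar_agent, "Book a 30 minute sync with Dana on Friday.")
    print(result.final_output)

asyncio.run(main())
```
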
15

Connecting Local MCP Servers via Stdio and HTTP

3m 32s

Dive deeper into MCP by running standard local servers. Learn to sandbox filesystem access and internal tools securely with MCPServerStdio.

Hi, this is Alex from DEV STORIES DOT EU. OpenAI Agents SDK, episode 15 of 18. Giving a language model direct access to your local filesystem sounds like an absolute security nightmare. But if you strictly sandbox that access through a standardized protocol, it becomes a safe and powerful way to process local data. Today we are looking at connecting local MCP servers via stdio and HTTP. The Model Context Protocol defines how your agents talk to external tools and data sources. When you want to run these servers inside your own infrastructure, the OpenAI Agents SDK provides two local transport methods. The first is standard input and output, using the class called MCPServerStdio. People sometimes think this requires a complex local networking setup. It does not. When you use the stdio transport, the SDK simply spawns the MCP server as a local child process. The agent sends requests by writing to the standard input of that process and reads responses from its standard output. Consider a scenario where you want your agent to read files, but only from one specific local directory. You instantiate MCPServerStdio and pass it the command to run. For example, you might pass the node package manager command npx, followed by the arguments to launch the official MCP filesystem server, and the absolute path to your target directory. Because this server runs as a subprocess, you must manage its lifecycle. If your Python script finishes or crashes, you do not want an orphaned node process lingering in the background. The SDK enforces clean lifecycle management by requiring an asynchronous context manager. You define an async with block to initialize the stdio server. When execution enters the block, the SDK spins up the child process. When execution exits the block, it cleanly tears the process down. Inside that block, you connect the running server to your agent. You create an MCP Client, pass it your stdio server instance, and then provide that client to your agent. The agent now has scoped, temporary access to your local directory. Now, what if you want to expose a private internal API, or connect to a service that is already running? You do not want to spawn a new subprocess for that. This brings us to the second transport option, which is MCPServerStreamableHttp. Instead of passing an executable command, you provide this class with the URL of your existing service. This is perfect for securely connecting your agent to an internal microservice running on localhost. The agent communicates by streaming data over HTTP. Here is the key insight. The agent itself has no idea which transport you chose. The connection code inside your application looks exactly the same. You still use a context manager, you still create an MCP client, and you still hand it to the agent. The SDK abstracts the transport layer completely. The single most useful takeaway here is that you can develop a custom tool as a local stdio script for testing, and later deploy it as a standalone HTTP service in production, without changing a single line of your agent logic. That is all for this one. Thanks for listening, and keep building!
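A sketch of the stdio transport, assuming MCPServerStdio from agents.mcp and the npx filesystem server; the directory path is hypothetical. Swapping in MCPServerStreamableHttp with a URL parameter would leave the agent code unchanged:

```python
import asyncio

from agents import Agent, Runner
from agents.mcp import MCPServerStdio

async def main() -> None:
    # The SDK spawns the filesystem MCP server as a child process and tears
    # it down again when the async-with block exits.
    async with MCPServerStdio(
        params={
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-filesystem", "/data/reports"],
        }
    ) as fs_server:
        agent = Agent(
            name="FileAssistant",
            instructions="Answer questions using files in the sandboxed directory.",
            mcp_servers=[fs_server],
        )
        result = await Runner.run(agent, "Summarize the latest report.")
        print(result.final_output)

asyncio.run(main())
```
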
16

Visualizing Workflows with Built-in Tracing

3m 18s

Eliminate print statement debugging using built-in SDK observability. Discover how automatic spans and traces link entire complex workflows.

Hi, this is Alex from DEV STORIES DOT EU. OpenAI Agents SDK, episode 16 of 18. Debugging a multi-agent system with print statements is a nightmare. You watch logs scroll by, trying to manually match a tool execution on line fifty with a model generation on line two hundred. Thankfully, you never have to do that again. This episode is about Visualizing Workflows with Built-in Tracing.

When developers start building complex agent workflows, their first instinct is often to write custom logging wrappers. They write boilerplate code to track exactly when an agent starts, what arguments a tool receives, and how long the model takes to reply. Do not build this. The Agents SDK handles it for you, completely out of the box. Every time you run an agent, request a generation, or trigger a tool call, the SDK automatically wraps that action in a span. A span is simply a timed, structured record of a single operation. These spans capture the inputs, the outputs, and the duration of the task. They are automatically sent to the OpenAI dashboard. This means you get a full visual timeline of your workflow execution without writing a single line of telemetry code.

Here is the key insight. While the SDK handles the granular details, you control the high-level grouping. Say you have a multi-turn scenario where an agent generates a joke, a second agent evaluates it, and the first agent refines it based on the feedback. By default, the SDK will track each of these model generations and tool calls as distinct spans. But to a human looking at a dashboard, this entire back-and-forth is just one logical unit of work.

You can group these actions using a custom trace block. You open a context block using the trace function and give it a descriptive name, like joke generation loop. Inside that block, you execute your multi-turn agent logic. The SDK respects this hierarchy. It will nest all the automatically generated spans for the individual runs, evaluations, and refinements under your custom parent trace. When you open the OpenAI dashboard, you see the top-level joke generation loop first. You can then expand it to investigate the exact sequence of model calls and tool executions that happened inside.

That covers visibility, but what about privacy? There are times when you absolutely must not send telemetry. If your agent handles sensitive medical records, financial data, or requires strict Zero Data Retention compliance, sending input and output logs to a dashboard is a security violation. For these situations, the SDK lets you disable tracing for that stretch of execution, or for your whole application. When an agent runs with tracing disabled, the SDK completely stops generating and sending spans. The execution runs locally, the result is returned to your application, but no record of the prompt, the tool calls, or the response will appear on the OpenAI dashboard. Outside that disabled scope, normal automatic tracing resumes for the rest of your application.

Tracing in the SDK means you stop writing boilerplate logging, get immediate visual debugging, and still retain total control over when your data stays entirely local. That is all for this one. Thanks for listening, and keep building!
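To make the grouping concrete, here is a minimal sketch of the joke loop. The two agents are hypothetical, and the RunConfig(tracing_disabled=True) line shows one way to keep a sensitive run off the dashboard.

import asyncio

from agents import Agent, RunConfig, Runner, trace

joker = Agent(name="Joker", instructions="Tell a short joke about the given topic.")
critic = Agent(name="Critic", instructions="Give one sentence of feedback on the joke.")

async def main():
    # Everything inside this block is nested under one parent trace in the dashboard.
    with trace("joke generation loop"):
        joke = await Runner.run(joker, "databases")
        feedback = await Runner.run(critic, joke.final_output)
        refined = await Runner.run(
            joker, f"Improve this joke using the feedback: {feedback.final_output}"
        )
        print(refined.final_output)

    # For sensitive data, tracing can be switched off for a specific run.
    private = await Runner.run(
        joker,
        "a joke about this confidential topic",
        run_config=RunConfig(tracing_disabled=True),
    )
    print(private.final_output)

asyncio.run(main())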
17

Low-Latency Voice with Realtime Agents

3m 48s

Break the standard request-response paradigm. See how Realtime Agents maintain live WebSocket connections to handle interruptions and multimodal reasoning.

Hi, this is Alex from DEV STORIES DOT EU. OpenAI Agents SDK, episode 17 of 18. The hardest part of building voice AI is not generating the speech. It is knowing when to instantly shut up because the human caller interrupted mid-sentence. If you rely on the traditional request-and-response cycle, network latency will always expose the bot. That is exactly the problem solved by Low-Latency Voice with Realtime Agents.

Many developers confuse this architecture with a standard voice pipeline. In a traditional setup, you capture audio, run it through a Speech-to-Text model, send that text to a language model, and finally push the text output through a Text-to-Speech synthesizer. That pipeline introduces latency at every step. Realtime Agents discard that pipeline entirely. They use a single, multimodal model that natively reasons over audio. Audio goes directly in, and audio streams directly out. No intermediate text translation is required for the system to understand the user.

To achieve this, you have to break the standard HTTP request-response paradigm. Instead of sending a payload and waiting for a complete reply, the system keeps a persistent connection open. In the SDK, you manage this using two primary components. The first is the RealtimeAgent. This object holds your system instructions and any functions or tools the model needs access to. It defines the logic, capabilities, and personality of your assistant. The second component is the RealtimeRunner. The runner is the execution engine. It manages the asynchronous event loop and handles the persistent network stream, typically over a WebSocket or WebRTC connection.

Consider a telephony customer service bot handling an incoming phone call. You bridge your voice provider, perhaps integrating via SIP or WebSockets, routing the continuous audio to your Python application. You create your RealtimeAgent, equipping it with a tool to fetch user accounts. Then, you pass that agent and your network connection client into the RealtimeRunner. When you call the run method on the runner, it takes over. It keeps the connection alive, constantly listening to the audio stream, while simultaneously handling any function calls the agent needs to make. It pushes and pulls events in both directions concurrently.

The user calls in and asks for their account balance. The agent triggers the account-fetch tool, gets the data, and immediately begins streaming its spoken response back to the caller. Halfway through the sentence, the user suddenly speaks over the bot, stating they actually want to report a lost card. Here is the key insight. Because the WebSocket connection is persistently open and the model natively ingests the incoming audio stream in real time, the server detects the human voice instantly. It fires an event that halts its own audio output and registers the truncation. You do not have to write custom logic to calculate exactly which audio chunk was playing when the user spoke. The streaming architecture handles the interruption seamlessly. The model absorbs the new audio context from the interruption and immediately begins streaming the updated response about the lost card.

You get natural conversational dynamics entirely because the network layer and the model layer are built for continuous streaming. The real power of the Realtime API in the Agents SDK is treating voice as a first-class continuous stream, rather than a batch of translated text in disguise. That is all for this one. Thanks for listening, and keep building!
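As a minimal sketch of that shape, loosely following the realtime quickstart: the account lookup tool is hypothetical, and a real deployment would bridge actual caller audio into and out of the session rather than just printing event types.

import asyncio

from agents import function_tool
from agents.realtime import RealtimeAgent, RealtimeRunner

@function_tool
def get_account_balance(customer_id: str) -> str:
    """Hypothetical lookup used to illustrate tool calls during a live call."""
    return f"Customer {customer_id} has a balance of 42.50 EUR."

support_agent = RealtimeAgent(
    name="Phone support",
    instructions="Help callers with account questions. Keep answers short and spoken.",
    tools=[get_account_balance],
)

async def main():
    runner = RealtimeRunner(starting_agent=support_agent)
    session = await runner.run()
    async with session:
        # A real integration would feed caller audio into the session and play the
        # returned audio back; here we simply watch the event stream go by.
        async for event in session:
            print(event.type)

asyncio.run(main())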
18

Building Reactive UIs with Streaming Events

3m 33s

Go beyond streaming text tokens. Utilize semantic streaming events to build ultra-responsive frontend interfaces that react to agent actions in real time.

Hi, this is Alex from DEV STORIES DOT EU. OpenAI Agents SDK, episode 18 of 18. Your users do not just want to read text as it types out on the screen. They want to see exactly what the AI is doing behind the scenes. If you are trying to build a loading spinner by parsing raw token chunks to guess when a search tool is running, you are doing it the hard way. Building Reactive UIs with Streaming Events solves this exact problem.

Developers often treat AI outputs like a basic typewriter. They listen to a raw text stream and write brittle parsing logic to figure out if the agent is about to call a function. That approach is fragile. It breaks if the model slightly changes its phrasing, and it leaves your frontend lagging behind the actual execution state.

The OpenAI Agents SDK provides a structural alternative. Instead of waiting for strings, you trigger your agent using a method called run_streamed on your runner. This method does not yield plain text. Instead, it yields an asynchronous sequence of semantic events, including RunItemStreamEvent objects. Think of a RunItemStreamEvent as a precise notification about the agent's internal lifecycle. As the agent processes a request, it emits distinct, predictable signals. It tells you exactly when a new message is added to the thread. It tells you when control transfers from a triage agent to a specialized agent. Crucially for frontend development, it tells you the exact millisecond a tool invocation begins and ends.

Let us apply this to a concrete scenario. You want your frontend to display a spinner that says Searching database while the agent looks up a customer record. You call run_streamed and loop over the results. Inside that asynchronous loop, you inspect each event as it arrives. When an event comes down the wire, you check its properties to see what kind of update it represents. When an event indicates a tool call has started, you can read the actual name of the tool directly from the event payload. You do not have to parse any natural language. If the tool name matches your database search function, you immediately push a state update to your frontend to render the spinner. When a subsequent event signals that the tool execution is complete, you dispatch another update to hide the spinner.

These stream events also carry standard message deltas. If the agent is generating a long textual response, the stream emits chunk events that you append to the user interface. The architecture separates the raw conversational output from the semantic actions. You route the text chunks to the chat window, and you route the tool and agent lifecycle events to your UI state manager. This separation of concerns allows you to build interfaces that feel instantly responsive and deeply connected to the agent logic. You are reacting to the system's actual execution path, not guessing its intentions based on words.

Here is the key insight. Stop treating your agent output as just a conversation stream. Treat it as an event-driven state machine where every internal action is an opportunity to keep your user visually informed. Since this is the final episode of the series, I encourage you to read through the official documentation and try orchestrating these streams yourself. If you have ideas for what we should cover in a future series, visit devstories dot eu and let us know. That is your lot for this one. Catch you next time!
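To close with the spinner scenario in code, here is a minimal sketch assuming a hypothetical search_database tool. Text deltas are routed to the chat window, while tool lifecycle events drive the spinner state.

import asyncio

from openai.types.responses import ResponseTextDeltaEvent

from agents import Agent, Runner, function_tool

@function_tool
def search_database(query: str) -> str:
    """Hypothetical lookup used to illustrate tool-call events."""
    return f"3 records match '{query}'."

support_agent = Agent(
    name="Support",
    instructions="Use the database tool to look up customer records.",
    tools=[search_database],
)

async def main():
    result = Runner.run_streamed(support_agent, input="Find the account for Dana.")
    async for event in result.stream_events():
        if event.type == "raw_response_event" and isinstance(event.data, ResponseTextDeltaEvent):
            # Plain text chunks: append them to the chat window.
            print(event.data.delta, end="", flush=True)
        elif event.type == "run_item_stream_event":
            if event.item.type == "tool_call_item":
                print("\n[UI] show spinner: tool call started")
            elif event.item.type == "tool_call_output_item":
                print("\n[UI] hide spinner: tool call finished")

asyncio.run(main())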