Season 50 · 16 Episodes · 57 min · 2026

Deep Agents

v0.5 — 2026 Edition. A comprehensive audio course on Deep Agents, the open-source Python library for building and orchestrating LLM agents. Learn the agent harness pattern, context management, and production-ready deployments.

LLM Orchestration · Multi-Agent Systems
1
The Agent Harness Pattern
This episode covers the core identity of the Deep Agents library and what an 'agent harness' actually is. Listeners will learn why Deep Agents exists, how it sits on top of LangChain and LangGraph, and how it compares to tightly integrated solutions like the Claude Agent SDK or Codex.
3m 21s
2
The Core Loop
This episode covers the basics of launching an agent using the create_deep_agent function. Listeners will learn how to configure a model string, pass basic tools, and let the agent autonomously plan and execute a request.
3m 09s
3
The Pluggable Filesystem
This episode covers how Deep Agents interact with files through pluggable backends. Listeners will learn the difference between StateBackend, FilesystemBackend, and LocalShellBackend, and how to safely grant an agent local access.
3m 06s
4
Dynamic System Prompts
This episode covers how Deep Agents assemble context engineering dynamically. Listeners will learn how system prompts, tool schemas, and runtime context combine to give the agent exactly the instructions it needs.
3m 56s
5
Context Compression & Offloading
This episode covers how Deep Agents survive long-running tasks without hitting token limits. Listeners will learn about automatic tool offloading to the virtual filesystem and dynamic conversation summarization.
3m 27s
6
Context Isolation with Synchronous Subagents
This episode covers how to prevent context bloat using task delegation. Listeners will learn how to configure the subagents parameter and use the built-in task tool to spawn ephemeral, specialized agents.
4m 08s
7
Human-in-the-Loop Interventions
This episode covers how to pause agent execution for sensitive operations. Listeners will learn how to configure the interrupt_on parameter to require approval, rejection, or edits before a tool runs.
3m 45s
8
Extending the Harness with Middleware
This episode covers how Deep Agents handles capabilities under the hood via middleware. Listeners will learn how to intercept tool calls and extend graph state safely without mutating instances.
3m 26s
9
Project Conventions via Memory Files
This episode covers how to give an agent persistent understanding of your codebase. Listeners will learn how AGENTS.md files serve as always-loaded memory for coding style and architectural patterns.
3m 34s
10
Progressive Disclosure with Skills
This episode covers how to extend an agent's expertise without blowing up the context window. Listeners will learn how to write SKILL.md files and how the agent uses progressive disclosure to match tasks to skills.
3m 37s
11
Long-term Memory Stores
This episode covers how to persist files and knowledge across multiple threads. Listeners will learn how to configure a CompositeBackend to route specific directories to a persistent LangGraph Store.
3m 54s
12
Executing Code in Sandboxes
This episode covers how to safely run agent-generated code using remote sandboxes. Listeners will learn how to configure the Sandbox-as-tool pattern with providers like Modal, Daytona, and Runloop.
3m 34s
13
Subgraph Streaming UX
This episode covers how to build transparent interfaces for multi-agent workflows using LangGraph streaming. Listeners will learn about the v2 stream format and how to track progress across subagent namespaces.
3m 28s
14
The CLI and External MCP Tools
This episode introduces the Deep Agents CLI and how to extend it with the Model Context Protocol (MCP). Listeners will learn how to configure .mcp.json files to seamlessly connect their agent to external databases and APIs.
4m 00s
15
Editor Integrations via ACP
This episode covers the Agent Client Protocol (ACP) and how to bring custom Deep Agents into IDEs. Listeners will learn how to run an AgentServerACP over stdio to interface with code editors like Zed.
2m 58s
16
Background Workers with Async Subagents
This episode covers launching non-blocking background tasks for long-running workflows. Listeners will learn how AsyncSubAgent configurations deploy independently on LangSmith and interact via the start, check, update, and cancel tools.
3m 54s

Episodes

1

The Agent Harness Pattern

3m 21s

This episode covers the core identity of the Deep Agents library and what an 'agent harness' actually is. Listeners will learn why Deep Agents exists, how it sits on top of LangChain and LangGraph, and how it compares to tightly integrated solutions like the Claude Agent SDK or Codex.

Hi, this is Alex from DEV STORIES DOT EU. Deep Agents, episode 1 of 16. Building a coding agent from scratch is fun for about an hour. Then your model hits the context limit, forgets the original task entirely, and overwrites a critical file with garbage. To stop your assistant from destroying its own workspace, you need what is known as the agent harness pattern. People often encounter this pattern through a library called Deep Agents. First, a quick clarification. Deep Agents is not LangChain, and it is not LangGraph. It is a standalone Python library that sits on top of those tools, packaging them into a ready-to-use coding assistant. If you try to build a coding assistant yourself, you usually start with a basic script. You take a user prompt, send it to a large language model, and print the code it returns. That is just a simple chat loop. It works fine for answering a single question. But soon, you want the model to actually implement a feature across multiple files on your machine. This is where the illusion breaks. You realize you have to manually write tools to read files, search directories, and apply code diffs safely. You need a system to track context over dozens of steps without maxing out the token limit. You have to build a persistent task list so the model actually remembers what it just did and what it needs to do next. You end up spending all your time writing file system boilerplate and state management instead of focusing on the agent behavior. This is exactly what the agent harness pattern solves. A harness is the infrastructure layer wrapping around the raw language model. It turns a fragile, stateless chat loop into a durable, long-running agent. In Deep Agents, this harness provides the memory management, the file system operations, and the step-by-step planning logic straight out of the box. You give it a high-level goal and point it at a local directory. 
The harness takes over the repetitive work of giving the model a safe environment to plan, edit, and verify code. Here is the key insight. The primary advantage of using an independent harness like Deep Agents is that it is completely model-agnostic. Tightly integrated solutions like the Claude Agent SDK or older ecosystems like OpenAI Codex are heavily optimized, but they lock you into a single provider. If a cheaper or smarter model is released tomorrow, migrating your tightly coupled agent is a massive headache. Because Deep Agents abstracts the environment away from the reasoning engine, the language model becomes a totally replaceable component. The harness manages the task lists, reads the file system, and handles the error recovery. That logic remains identical whether you use Anthropic, OpenAI, or a local open-weight model. The harness is the chassis of the car, and the language model is just the engine. The true value of this pattern is reliability. An agent is only as capable as the environment it operates in, and without a harness to anchor it to a real file system and a concrete plan, even the smartest model is just a text generator spinning its wheels. If you want to help keep the show going, you can search for DevStoriesEU on Patreon; any support there goes a long way and is always appreciated. That is all for this one. Thanks for listening, and keep building!
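The harness idea from this episode can be sketched in a few lines of plain Python. This is a toy illustration of the pattern, not the Deep Agents API: `run_harness` and `stub_model` are hypothetical names, standing in for the planning, execution, and state-keeping machinery the library actually provides. The point is that the harness owns the plan and the workspace while the model stays a swappable callable.

```python
# Toy sketch of the agent harness pattern (illustrative only, not the
# Deep Agents API): the harness drives the loop and persists results,
# while the "model" is just a replaceable function.

def run_harness(model, goal, workspace):
    """Drive a model through plan -> execute steps, recording each result."""
    plan = model(f"Plan steps for: {goal}")      # model proposes a task list
    completed = []
    for step in plan:
        result = model(f"Execute: {step}")       # model acts one step at a time
        workspace[step] = result                 # harness persists the outcome
        completed.append(step)
    return completed

# A stub "model" that plans two steps and echoes executions.
def stub_model(prompt):
    if prompt.startswith("Plan"):
        return ["read files", "write summary"]
    return f"done: {prompt}"

workspace = {}
steps = run_harness(stub_model, "summarize the repo", workspace)
# steps -> ["read files", "write summary"]
```

Swapping `stub_model` for a call to any real provider changes nothing in the loop, which is exactly the model-agnostic argument the episode makes.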
2

The Core Loop

3m 09s

This episode covers the basics of launching an agent using the create_deep_agent function. Listeners will learn how to configure a model string, pass basic tools, and let the agent autonomously plan and execute a request.

Hi, this is Alex from DEV STORIES DOT EU. Deep Agents, episode 2 of 16. You do not need a massive architecture diagram or hundreds of lines of boilerplate to get a language model to autonomously plan its work. You just need a single function call to trigger the Core Loop. The entry point for this behavior is a function called create deep agent. This function is where you define the brain of your application. Instead of importing separate client libraries and writing custom wrappers, you tell the function which model to use via a simple colon-separated string. You pass the provider, a colon, and the exact model name. You might use anthropic colon claude-sonnet-4-6, or openai colon gpt-5.4. The framework reads this string and handles the specific initialization under the hood. An agent needs a way to interact with the world, which you provide through tools. A tool is just a standard Python function that performs a specific action. Let us say you are building an internet research agent. You would write a function that takes a query, calls a search API, and returns text. You pass this search function directly into your create deep agent call as part of a list. Your agent is now ready. To start the process, you call the run method on your new agent and pass it a prompt, like asking it to research and synthesize recent news about quantum computing. Now the core loop takes over. Here is the key insight. Developers often assume they must write complex system prompts to force the model to break down tasks, or build custom parsers to track its progress. You do not have to do any of that. When you call create deep agent, the harness automatically injects a built-in tool called write todos. You never write or manage this tool yourself. It is part of the core engine. Before the agent ever touches your search tool, the core loop forces the model to use the write todos tool. The agent evaluates your prompt and generates a structured list of steps. 
Only after this plan is finalized does the execution phase begin. The loop iterates over the plan step by step. The agent looks at its first task, recognizes it needs information, and calls your internet search tool. It reads the raw text returned by the search engine and updates its internal memory. The loop then checks if the primary goal is satisfied. If the search results were incomplete, the agent moves to the next item on its todo list, perhaps formulating a new search query to find the missing details. This cycle of selecting a tool, observing the result, and checking the plan continues autonomously. When the loop confirms that all necessary tasks are complete, it stops calling tools. The agent analyzes the collected data, generates a final synthesized response, and returns it to you. The defining feature of the core loop is that it turns a static text generator into an active problem solver by structurally forcing it to plan before it acts. That is all for this one. Thanks for listening, and keep building!
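The model string format Alex spells out, the provider, a colon, and the exact model name, can be captured in a two-line parser. This is an illustrative sketch of the convention, not the library's internal code; `parse_model_string` is a hypothetical name.

```python
# Sketch of the "provider:model" convention described in the episode,
# e.g. "anthropic:claude-sonnet-4-6". Hypothetical helper, for illustration.

def parse_model_string(spec: str) -> tuple[str, str]:
    provider, sep, name = spec.partition(":")
    if not sep or not provider or not name:
        raise ValueError(f"expected 'provider:model', got {spec!r}")
    return provider, name

print(parse_model_string("anthropic:claude-sonnet-4-6"))
# ('anthropic', 'claude-sonnet-4-6')
```

The framework reads exactly this kind of string when you call create deep agent, so a typo in the provider half fails fast rather than at request time.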
3

The Pluggable Filesystem

3m 06s

This episode covers how Deep Agents interact with files through pluggable backends. Listeners will learn the difference between StateBackend, FilesystemBackend, and LocalShellBackend, and how to safely grant an agent local access.

Hi, this is Alex from DEV STORIES DOT EU. Deep Agents, episode 3 of 16. Giving an autonomous agent raw access to your computer is an excellent way to accidentally delete your home directory. If an agent decides to reorganize your files or test a script, you need to ensure it is confined to a safe area. This is why we use The Pluggable Filesystem. When you give a Deep Agent tools to read or edit a file, those tools do not wire directly to your operating system. Instead, they map to a backend. A backend is a storage environment that dictates where and how files actually live. The default environment is the StateBackend. This is an ephemeral ghost drive held entirely in memory. Suppose you ask an agent to write a quick draft of an email or manipulate some text data. The agent creates the file, reads it back, and edits it, all within the StateBackend. When your script finishes running, that memory is cleared, and the files vanish. It is completely isolated and perfectly safe. But what happens when you want the agent to generate a real Python project on your local disk? Memory is no longer enough. You upgrade the agent's permissions by attaching a FilesystemBackend. This wires the agent's file tools to your actual hard drive. To keep this secure, you initialize the backend with a parameter called virtual mode set to true. Here is the key insight. Virtual mode creates a strict path boundary. You define a base directory, and the agent operates inside it. If the agent tries to read a sensitive system file outside that folder, the backend blocks the request. It traps the agent in a designated workspace. However, there is a dangerous misconception here. Many developers assume that enabling virtual mode creates a completely secure sandbox. It does not. The filesystem backend only controls file operations. If you also want your agent to run terminal commands, you have to attach a third type of backend called the LocalShellBackend. 
The LocalShellBackend gives the agent a new tool called execute. This allows the agent to run shell commands on your host machine. If you attach a LocalShellBackend, the virtual mode of your filesystem backend will not protect you from destructive terminal commands. The agent could execute a system-wide deletion script, and the shell would run it, completely bypassing your file path restrictions. File access and shell execution are distinct capabilities mapped to distinct backends. If your agent only needs to write code, stick to the FilesystemBackend. Only attach the LocalShellBackend if the agent absolutely must compile or run that code locally. The most secure autonomous agent is not the one with the smartest prompts, but the one deployed with an ephemeral state backend that physically cannot touch your host operating system. That is it for today. Thanks for listening — go build something cool.
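The strict path boundary that virtual mode enforces can be sketched with `pathlib`. This is an illustrative check, not the FilesystemBackend source: resolve the requested path against the base directory and refuse anything that escapes it.

```python
# Sketch of the virtual-mode path boundary described in the episode:
# every file operation must resolve to a path inside the workspace.
from pathlib import Path

def inside_workspace(base_dir: str, requested: str) -> bool:
    base = Path(base_dir).resolve()
    target = (base / requested).resolve()        # collapses any ../ escapes
    return target == base or base in target.parents

print(inside_workspace("/tmp/agent-ws", "src/main.py"))       # True
print(inside_workspace("/tmp/agent-ws", "../../etc/passwd"))  # False
```

Note that, as the episode warns, a check like this only guards file tools; a shell backend's execute tool bypasses it entirely.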
4

Dynamic System Prompts

3m 56s

This episode covers how Deep Agents assemble context engineering dynamically. Listeners will learn how system prompts, tool schemas, and runtime context combine to give the agent exactly the instructions it needs.

Hi, this is Alex from DEV STORIES DOT EU. Deep Agents, episode 4 of 16. The most common mistake in agent design is writing a massive, four-thousand-word system prompt detailing exactly how every single tool works and hardcoding specific session data. You end up with a brittle prompt, exhausted context windows, and a nightmare when handling multiple users. Dynamic System Prompts resolve this by assembling the exact instructions the language model needs on the fly. Think of context engineering in Deep Agents as a dynamic assembly line. Instead of writing one static block of text, the final prompt sent to the language model is constructed from three distinct layers. These layers are the static system prompt, the tool prompts, and the runtime context. They merge at the exact moment of execution to form the complete input context. The first layer is the static system prompt. This is the piece you write manually. It defines the agent persona, its core rules, and its ultimate goal. It is deliberately simple. You might tell the agent it is a database query assistant, but you do not tell it how to connect to the database, what formatting the tool expects, or what the user session ID is. You keep this layer strictly focused on high-level business logic. The second layer consists of the tool prompts, and this is where the framework steps in. Deep Agents automatically injects the necessary usage instructions for the specific tools you attach to the agent. If you give the agent a tool to read a file, the framework dynamically appends the exact schema for that tool. Along with the schemas, it injects a built-in planning prompt. This planning prompt instructs the model on how to sequence the available tools effectively to solve a problem. You never manually write out the instructions for how the model should format a tool call or plan its execution steps. The framework handles those mechanics automatically. 
When you add or remove a tool from the agent, the underlying tool prompts update instantly without you touching the core system prompt. Now, we need to clear up a common source of confusion between the static system prompt and runtime context. The static prompt is defined when you initialize the agent. Runtime context, however, is injected exactly when you invoke the agent to do work. Consider a scenario where you are building a multi-tenant application. When a specific user asks a question, the agent needs that user's unique database connection ID to fetch their data. If you put that identifier in the static system prompt, you would have to recreate the entire agent from scratch for every single user request. Instead, you use runtime context. Here is the key insight. You pass a context object via a feature called ToolRuntime right at the invocation step. You hand the framework a dictionary containing the specific database connection ID for that session. The framework takes this runtime context, merges it with your static prompt and the auto-generated tool instructions, and wires it directly into the tools that require it. The tool executes using the correct, dynamically provided connection ID. Your static system prompt stays completely clean of temporary identifiers, API keys, or session tokens. At the millisecond of execution, the dynamic assembly line finishes its job. Deep Agents concatenates your core instructions, the dynamically generated tool schemas, the planning prompts, and the specific invocation data into one cohesive input context. The model receives a complete, perfectly formatted set of instructions tailored to that specific user and that specific task. The most scalable agents know nothing about their environment until the exact moment they are asked to act. That is your lot for this one. Catch you next time!
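The three-layer assembly line described above can be mocked up in a few lines. This is a toy, not the framework's internals: `assemble_prompt` and its argument shapes are hypothetical, but the layering matches the episode, a static persona prompt, auto-generated tool schemas, and per-invocation runtime context merged at the last moment.

```python
# Toy sketch of dynamic prompt assembly: static prompt + tool prompts +
# runtime context merged at invocation time. Hypothetical helper names.

def assemble_prompt(static_prompt, tools, runtime_context):
    tool_section = "\n".join(
        f"Tool {t['name']}: {t['schema']}" for t in tools
    )
    context_section = "\n".join(f"{k} = {v}" for k, v in runtime_context.items())
    return "\n\n".join([static_prompt, tool_section, context_section])

prompt = assemble_prompt(
    "You are a database query assistant.",
    [{"name": "read_file", "schema": "read_file(path: str) -> str"}],
    {"db_connection_id": "tenant-42"},   # injected per invocation, never hardcoded
)
```

Because the tenant identifier arrives only at invocation time, the static persona prompt stays reusable across every user, which is the multi-tenant point the episode makes.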
5

Context Compression & Offloading

3m 27s

This episode covers how Deep Agents survive long-running tasks without hitting token limits. Listeners will learn about automatic tool offloading to the virtual filesystem and dynamic conversation summarization.

Hi, this is Alex from DEV STORIES DOT EU. Deep Agents, episode 5 of 16. Most AI agents crash or fail silently the moment they read a system log or a data dump that exceeds their maximum prompt size. Deep Agents prevents this by treating context limits like an operating system handles virtual memory. This is called Context Compression and Offloading. Large language models have a hard ceiling on how much text they can process at one time. If an agent tries to hold too much data in its active prompt, the model throws an error and stops. Deep Agents manages this token overflow by constantly monitoring the size of the conversation data. It works exactly like a computer paging RAM to a hard drive when memory gets full. The active working space is kept lean, while the bulk data is moved safely out of the way. This mechanism operates in two distinct phases. The first phase handles immediate, massive data spikes from external tools. The framework watches the token count of every single tool input and tool result. The hard threshold for this check is twenty thousand tokens. If an operation exceeds this limit, the system intercepts the payload before it ever reaches the language model. Consider an agent executing a query that pulls a massive thirty-thousand-token database dump. A standard agent attempts to insert that entire payload straight into the conversation history, which immediately triggers a token overflow. Deep Agents takes a different route. It intercepts that massive response, offloads the full text into a new file on the backend filesystem, and replaces the payload in the conversation with a simple ten-line preview. The agent reads the preview and receives a file path pointing to the complete data. The agent knows exactly what it found, but the active prompt remains completely uncluttered. That handles sudden data spikes. The second phase handles the slow accumulation of a standard conversation. 
As an agent runs a long, multi-step process, the continuous back-and-forth history slowly eats up available tokens. Deep Agents monitors the total context window usage against the model limits. When the active prompt hits eighty-five percent of the total available context window, a background summarization process triggers automatically. The system takes the oldest block of messages in the current history and uses a language model to generate a dense, factual summary of those events. It then replaces that older block of messages in the active prompt with the newly generated summary, instantly dropping the token count back down to a safe working level. Here is the key insight. Developers often assume that summarizing the conversation permanently destroys the original history. That is not true. Before any summarization takes place, the original messages are written to the filesystem as a permanent, canonical record. The raw data is not lost. If the agent later determines it needs a highly specific detail from an earlier step that did not make it into the summary, it can use its search tools to query that filesystem and retrieve the exact original text. By pairing immediate filesystem offloads for massive tool results with dynamic summarization for long-running histories, an agent can operate indefinitely without suffocating on its own context. Thanks for listening, happy coding everyone!
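The offload rule from the first phase, payloads above twenty thousand tokens get written out and replaced by a ten-line preview plus a path, can be sketched like this. The thresholds come from the episode; the four-characters-per-token estimate and the in-memory store are assumptions for the sake of a runnable toy.

```python
# Toy sketch of tool-result offloading: results above the token threshold
# are moved to storage and replaced by a short preview plus a file path.
# The chars//4 token estimate is an assumption, not library behavior.

def maybe_offload(result: str, store: dict, path: str,
                  token_limit: int = 20_000, preview_lines: int = 10) -> str:
    est_tokens = len(result) // 4
    if est_tokens <= token_limit:
        return result                      # small enough: keep it inline
    store[path] = result                   # offload the full payload
    preview = "\n".join(result.splitlines()[:preview_lines])
    return f"{preview}\n[full output offloaded to {path}]"

store = {}
big_dump = "row\n" * 50_000                # a massive query result
msg = maybe_offload(big_dump, store, "dumps/query1.txt")
# msg ends with "[full output offloaded to dumps/query1.txt]"
```

The agent's active prompt sees only the preview and the path, while the complete payload stays retrievable, mirroring the paging analogy in the episode.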
6

Context Isolation with Synchronous Subagents

4m 08s

This episode covers how to prevent context bloat using task delegation. Listeners will learn how to configure the subagents parameter and use the built-in task tool to spawn ephemeral, specialized agents.

Hi, this is Alex from DEV STORIES DOT EU. Deep Agents, episode 6 of 16. If your main AI agent is reading ten web pages to answer a single question, it is eventually going to forget what the original prompt was. All that intermediate noise drowns out the instructions. The solution to this is Context Isolation with Synchronous Subagents. Think of this as a clean architecture pattern for your language models. When a single agent has access to dozens of tools and runs through long iterative loops, its context window fills up rapidly with raw data, error messages, and tool outputs. This context bloat degrades the model's ability to reason. Instead of one massive agent trying to do everything, you need a supervisor. The main agent acts as a manager. It delegates the messy, token-heavy work to specialists and only receives a clean, formatted final report. People often mistake subagents for an ongoing multi-agent chat room, where different models sit around and debate ideas back and forth. That is not the case here. Synchronous subagents are strictly ephemeral. They are spawned to execute a specific job, they run completely autonomously in their own isolated memory space until the job is done, and they return a single final result to the supervisor. Once they hand over that result, they disappear. Consider a specific scenario. Your main agent is tasked with writing a market briefing, and it needs data on recent economic trends. Instead of the supervisor directly invoking a web search tool five times and polluting its own context window with raw website text, it delegates the problem. It triggers a researcher subagent. This ephemeral researcher goes off into its own isolated loop. It makes the five Google searches, reads the messy results, synthesizes the findings, and writes a single summary paragraph. It passes only that paragraph back to the supervisor. The supervisor gets exactly what it needs, and its context remains pristine. 
To configure this, you define your specialists using simple dictionaries. You pass these to the subagents parameter when building your main agent. Each dictionary is a subagent specification requiring four pieces of information. First, you provide a name, like researcher or calculator. Second, you provide a description. This is the part that matters. The supervisor reads this description to decide which specialist to hire for a given problem. Third, you provide the tools, giving this specific subagent isolated access to things like a web scraper or a database client. Finally, you provide a system prompt, which governs how the subagent behaves. Once you hand this list of configurations to the main agent, it automatically gains access to a built-in tool called task. The supervisor does not need to know how to instantiate the subagents. When it encounters a problem matching a specialist's description, it simply calls the task tool. It passes two arguments: the name of the subagent to use, and a plain text instruction of what needs to be done. The supervisor execution pauses. The subagent spins up, works through its tools, and eventually finishes. To the supervisor, the task tool simply returns the final text answer as if it were a standard function call. Even if you do not explicitly define any dictionaries, the framework gives you a fallback. There is a general-purpose default subagent built in. The main agent can use this default specialist to offload complex reasoning steps. It acts as a blank slate, giving the model a fresh context window to work through a dense logic puzzle without muddying the supervisor memory. Here is the key insight. By enforcing strict boundaries between isolated tasks, you stop intermediate scratchpad work from degrading the reasoning quality of your primary application flow. That is all for this one. Thanks for listening, and keep building!
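The four-field specification dictionaries described above, plus the supervisor's description-based "hiring", can be sketched as follows. The field names follow the episode; the keyword-overlap matcher and the `pick_subagent` helper are illustrative stand-ins for what the built-in task tool does.

```python
# Sketch of subagent specifications and description-based delegation.
# Field names follow the episode; the matcher itself is a toy.

subagents = [
    {
        "name": "researcher",
        "description": "searches the web and summarizes findings",
        "tools": ["web_search"],
        "system_prompt": "You research topics and return one tight summary.",
    },
]

def pick_subagent(problem: str, specs: list[dict]) -> str:
    # Supervisor matches the problem against each specialist's description.
    words = set(problem.lower().split())
    for spec in specs:
        if words & set(spec["description"].lower().split()):
            return spec["name"]
    return "general-purpose"   # the built-in fallback the episode mentions

print(pick_subagent("summarizes recent economic trends", subagents))  # 'researcher'
```

In the real harness the supervisor simply calls the task tool with a subagent name and a plain-text instruction; the point here is that the description field is what drives the routing.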
7

Human-in-the-Loop Interventions

3m 45s

This episode covers how to pause agent execution for sensitive operations. Listeners will learn how to configure the interrupt_on parameter to require approval, rejection, or edits before a tool runs.

Hi, this is Alex from DEV STORIES DOT EU. Deep Agents, episode 7 of 16. Letting an autonomous system blindly run SQL queries is an excellent way to accidentally drop a production database. If you want your agent to operate in the real world, you need a mechanism to intercept it before it does something destructive. That is where Human-in-the-Loop Interventions come in. The core idea is simple. You pause the agent right before it executes a sensitive operation, ask a human for direction, and then resume execution based on that feedback. Before we look at the mechanics, there is a common pitfall to avoid. LangGraph requires memory to pause and resume state. If you try to configure a human-in-the-loop intervention without setting up a checkpointer, the intervention will fail. The system needs a persistence layer, like a memory saver, to freeze the execution graph and store the current variables while it waits for a human to respond. Always attach a checkpointer first. Not all tools need human review. Checking the weather or reading a log file is generally safe. Deleting files or altering database records requires strict control. You handle this risk categorization using the interrupt on configuration parameter. When you initialize your agent or define your tool node, you pass a list of tool names to this parameter. If the agent attempts to call a tool that is not on the list, it runs immediately. If it tries to call a tool that is on the list, the execution pauses. Let us walk through a concrete scenario. You have an agent with a database tool called execute. You add execute to your interrupt on list. The agent decides it needs to clear out some old data and attempts to run a database drop command. The system intercepts the call and pauses the execution graph. When you run your agent, you monitor the result object returned by the framework. Specifically, you check for a property called interrupts. 
If the result interrupts list contains data, the agent has hit a safety gate and is waiting for your input. At this point, the human operator evaluates the pending action. You have three allowed decisions you can pass back to the agent. These are approve, reject, or edit. If you approve, the agent executes the tool with the original arguments. If you reject, the tool call fails gracefully, and the agent receives an error message prompting it to try a different approach. This is where it gets interesting. The edit decision allows you to modify the agent's intended action before it happens. In our database scenario, the agent is attempting a dangerous drop command. You can intercept that request, rewrite the tool arguments to run a safe select query instead, and send that modified payload back to the system. To unpause the agent, you invoke it again, but this time you pass a command object using the resume parameter. Inside that resume parameter, you provide your decision string along with any modified arguments. The checkpointer retrieves the frozen state, injects your human decision directly into the graph as if the agent had originally planned it that way, and execution continues. The most important thing to remember is that human-in-the-loop interventions give you more than just an emergency stop button. The ability to edit tool arguments mid-flight allows you to safely guide an agent through complex workflows without starting over. If you want to help support the show, search for DevStoriesEU on Patreon. That is all for this one. Thanks for listening, and keep building!
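The approve, reject, and edit decisions can be modeled as a small gate function. The decision names and the interrupt_on parameter come from the episode; the control flow here is a self-contained toy, not the checkpointer-backed resume mechanism itself.

```python
# Toy sketch of the human-in-the-loop gate: tools listed in interrupt_on
# pause for a decision of "approve", "reject", or "edit".

def gate_tool_call(tool_name, args, interrupt_on, decide):
    """Pause before sensitive tools and apply the human's decision."""
    if tool_name not in interrupt_on:
        return ("run", args)                       # safe tool: run immediately
    decision, new_args = decide(tool_name, args)   # frozen state -> human
    if decision == "approve":
        return ("run", args)
    if decision == "edit":
        return ("run", new_args)                   # modified payload runs instead
    return ("rejected", None)                      # agent sees a graceful error

# A human policy that rewrites a dangerous DROP into a safe SELECT.
def human(tool, args):
    if "DROP" in args["query"]:
        return "edit", {"query": "SELECT 1"}
    return "approve", None

print(gate_tool_call("execute", {"query": "DROP TABLE users"},
                     interrupt_on=["execute"], decide=human))
# ('run', {'query': 'SELECT 1'})
```

In the real framework the pause happens via the checkpointer and you resume with a command object, but the three-way decision surface is the same.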
8

Extending the Harness with Middleware

3m 26s

This episode covers how Deep Agents handles capabilities under the hood via middleware. Listeners will learn how to intercept tool calls and extend graph state safely without mutating instances.

Hi, this is Alex from DEV STORIES DOT EU. Deep Agents, episode 8 of 16. You might look at an agent reading a local file and assume that capability is hardcoded deep inside the core execution loop. It is not. The magic behind an agent's file system is entirely powered by composable middleware that you can extend yourself. Today we cover extending the harness with middleware. Deep Agents is not a black box. The harness that drives the execution pipeline is highly extensible. When you equip an agent with a filesystem or a todo list, you are just attaching pre-packaged middleware to that harness. Middleware sits exactly at the boundary between the agent deciding to invoke a tool and the actual execution of that tool. It is an interceptor. It gives you absolute control over what goes in, what comes out, and what gets recorded during the exchange. To write your own custom middleware, you create an interceptor function and apply a specific decorator called wrap tool call. This decorator registers your function with the harness. When the agent triggers a tool, the harness pauses the default flow and hands control to your wrapped function. Inside this interceptor, you receive the raw input arguments generated by the agent, a reference to the tool being called, and the current graph state. You execute the original tool manually from within your function, capture its output, and then return that output back to the harness. Consider a custom middleware built to monitor an external API. You want to intercept every API tool call, log the exact arguments the agent used, and track usage metrics to avoid rate limits. Your wrap tool call function catches the request before it hits the network. It extracts the payload, writes it to your application logs, and then executes the actual API request. Once the request finishes, the middleware receives the response data. Now it needs to record that a call was made. Pay attention to this bit. 
When tracking usage metrics, developers often try to use standard class instance variables. They simply write something like self dot api call count plus equals one. This is a fatal mistake. Deep Agents routinely execute tools and subagents in parallel. If multiple tools resolve at the same moment and try to mutate the same instance property directly, you will cause race conditions. Your counters will overwrite each other, updates will be dropped, and your metrics will be entirely wrong. To manage data safely, you must update the graph state instead. The graph state is explicitly designed to handle parallel execution. Rather than modifying a local variable, your middleware reads the current metric from the graph state, computes the new value, and yields an updated state object alongside the tool result. The execution harness takes over from there. It processes all incoming state updates from parallel tool executions and merges them cleanly without collisions. By routing all side effects and metric tracking through the graph state, your middleware remains completely thread-safe. Understanding this pattern unlocks the entire framework. You stop treating the agent as a closed system and start viewing it as a transparent pipeline where every action can be safely intercepted and measured. That is your lot for this one. Catch you next time!
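The thread-safe pattern described here — intercept the call, execute the tool yourself, and return a state update for the harness to merge — can be pictured with a plain-Python sketch. The name wrap tool call comes from the episode, but the harness, state shape, and merge rule below are simplified stand-ins, not the library's actual API.

```python
# Plain-Python sketch of the interceptor pattern: log the arguments, run
# the tool, and return a state *update* instead of mutating a shared
# instance variable. The harness merges updates, so parallel calls never
# collide the way `self.api_call_count += 1` would.
def wrap_tool_call(tool):
    def interceptor(args, state):
        print(f"calling {tool.__name__} with {args}")  # write to app logs
        result = tool(**args)                          # execute the original tool
        update = {"api_call_count": 1}                 # a delta, not a mutation
        return result, update
    return interceptor

def merge(state, updates):
    """Stand-in for the harness: fold parallel deltas into one state."""
    merged = dict(state)
    for u in updates:
        merged["api_call_count"] = merged.get("api_call_count", 0) + u["api_call_count"]
    return merged

def fetch_weather(city):
    return f"sunny in {city}"

intercepted = wrap_tool_call(fetch_weather)
state = {"api_call_count": 0}
results, updates = zip(*[intercepted({"city": c}, state) for c in ("Oslo", "Rome")])
state = merge(state, updates)
print(state["api_call_count"])  # 2 -- both parallel calls counted
```

Because each interceptor only emits a delta, it does not matter in which order the two calls resolve; the merge step is the single writer.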
9

Project Conventions via Memory Files

3m 34s

This episode covers how to give an agent persistent understanding of your codebase. Listeners will learn how AGENTS.md files serve as always-loaded memory for coding style and architectural patterns.

Hi, this is Alex from DEV STORIES DOT EU. Deep Agents, episode 9 of 16. Having an AI coding assistant is great, right up until you have to remind it for the fiftieth time that your project requires strict typing and snake case variables. The solution is Project Conventions via Memory Files. Developer ergonomics demand that your tools adapt to your codebase, not the other way around. You should not have to paste a style guide into every single prompt. Deep Agents handles this by giving the agent persistent memory through a specific file named AGENTS dot md. People sometimes confuse memory with skills. Memory provides your universal baseline. It is always loaded at startup, making it perfect for global conventions and architectural rules. Skills are functional tools the agent loads dynamically only when a specific task requires them. Today, we are strictly looking at memory. When you initialize an agent and pass the memory parameter, the framework automatically injects the contents of the AGENTS dot md file directly into the agent context before it does any work. This operation relies on the memory-first protocol, which dictates three distinct phases for the agent: Research, Response, and Learning. In the Research phase, the agent reads the memory file to understand the environment. In the Response phase, it generates the code or answers the prompt. Then comes the Learning phase. Here is the key insight. You do not have to manually write the AGENTS dot md file. The agent updates this file autonomously based on your feedback. Take a concrete scenario. You ask the agent to generate a new API endpoint. It writes functional code, but it formats the variables in CamelCase. You reject the pull request and reply that this project strictly uses snake case. The agent corrects the code, but it does not stop there. It enters the Learning phase, opens the local AGENTS dot md file, and appends a new rule requiring snake case for all future variables. 
The next time you ask for an endpoint, the agent reads that memory file during its Research phase and natively writes snake case from the beginning. This persistent memory system operates across two different scopes. First is the global scope. This file lives in your user home directory, inside a dot deepagents folder. You use this for your personal developer preferences. If you always prefer asynchronous Python over synchronous code across all your projects, the agent learns it here. Second is the project scope. This file lives directly inside your local repository folder. This is where repository-specific rules go. When an agent runs, it loads both files. It applies your global preferences first, then layers the project-specific conventions on top. Because the project-scoped memory file lives in your repository, you commit it to version control. When a new developer joins the team and runs the agent, their local instance instantly inherits all the stylistic decisions the agent has already learned. Every corrected mistake permanently improves the agent's understanding of your repository, moving it from a generic code generator to a highly contextualized maintainer. That is your lot for this one. Catch you next time!
10

Progressive Disclosure with Skills

3m 37s

This episode covers how to extend an agent's expertise without blowing up the context window. Listeners will learn how to write SKILL.md files and how the agent uses progressive disclosure to match tasks to skills.

Hi, this is Alex from DEV STORIES DOT EU. Deep Agents, episode 10 of 16. If you dump fifty different API documentation guides into your system prompt, your agent will not get smarter. It will just get distracted and fail at basic tasks. You need a way to give an agent deep expertise without blowing up the context window. Progressive Disclosure with Skills solves exactly this problem. A skill in Deep Agents acts like a modular brain upgrade. It allows you to package complex logic, like AWS deployment rules or custom data science workflows, into an isolated directory. To build one, you simply create a new folder, name it after your skill, and place a file named exactly skill dot md inside it. A common misconception is that the framework just appends the entire contents of this file to the system prompt. It does not. Doing that would put us right back at the overloaded context window problem, degrading performance and increasing costs. Instead, the framework relies on a pattern called progressive disclosure, which happens in three steps: Match, Read, and Execute. At the very top of your skill dot md file, you write a frontmatter block. This is formatted as basic YAML and contains just a name and a short description. Here is the key insight. The framework only loads that brief description into the initial system prompt. The agent reads the description and decides if it matches the current user request. This is the Match phase. If the agent decides the skill is relevant, it does not guess what to do next. It actively uses an internal tool to fetch the rest of the skill dot md file. This is the Read phase. Now, the agent has the full, detailed instructions temporarily loaded into its working memory for this specific turn. Finally, it moves to the Execute phase, where it follows those detailed instructions to complete the task. When the conversation moves on to a different topic, the heavy payload is dropped, and the agent goes back to its baseline state. 
Consider a scenario where you want to build a skill called langgraph-docs. Your skill dot md frontmatter has a description stating this skill provides instructions and internal URLs for searching the LangGraph documentation. The body of the file contains those actual internal URLs and the specific search methodology. When a user asks what LangGraph is, the agent checks its active skills. It sees the short description, realizes it needs more information to answer the question, and requests the full document. It reads the URLs, performs the search, and formulates an accurate answer. Without progressive disclosure, you would have to paste all those URLs into the main prompt for every single conversation. When building a library of skills, you might run into conflicts where two skills try to define instructions for similar tasks. Deep Agents resolves this using source precedence. The rule is simple: the last one wins. Whichever skill is loaded last will override conflicting instructions from earlier ones. This allows you to stack broad, generic skills first, and then layer on highly specific overrides later in your configuration. The execution of your entire agentic workflow hinges on how well you write that frontmatter. Keep your descriptions precise and your skill bodies heavily detailed, because the agent will only read the full file if you sell it on the description first. That is all for this one. Thanks for listening, and keep building!
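The Match phase boils down to splitting the frontmatter from the body and surfacing only the description. The file layout (YAML frontmatter with name and description) follows the episode; the toy parser below is illustrative, not the framework's loader.

```python
def split_skill(text: str):
    """Return (frontmatter_dict, body) from a SKILL.md string."""
    _, frontmatter, body = text.split("---", 2)
    meta = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, body.strip()

skill = """---
name: langgraph-docs
description: Instructions and internal URLs for searching the LangGraph documentation.
---
Full search methodology and URLs live here, loaded only on demand.
"""

meta, body = split_skill(skill)
print(meta["description"])  # only this line reaches the initial system prompt
```

Only `meta["description"]` is loaded up front; the `body` is what the agent fetches in the Read phase once the description matches a request.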
11

Long-term Memory Stores

3m 54s

This episode covers how to persist files and knowledge across multiple threads. Listeners will learn how to configure a CompositeBackend to route specific directories to a persistent LangGraph Store.

Hi, this is Alex from DEV STORIES DOT EU. Deep Agents, episode 11 of 16. A smart assistant should not ask you what database you use every single time you start a new conversation. Yet, out of the box, most agents suffer from total amnesia the moment a thread ends. To solve this, you need Long-term Memory Stores. When you run an agent, it generates files. Listeners often assume that if an agent writes a file, it lives forever. It does not. By default, everything is ephemeral. Deep Agents use a StateBackend, which stores files within the state of the current conversation thread. When that thread concludes, those files vanish. If you want true personalization where the agent builds a lifelong database of user preferences and project knowledge, you need a different approach. You need the StoreBackend. This connects directly to a persistent LangGraph Store, keeping data safe across multiple threads. But you do not want to persist every single temporary file your agent creates. You need a mechanism to separate ephemeral scratchpad work from durable memories. This is where the CompositeBackend comes in. Think of it as a traffic router for your agent file system. You configure the CompositeBackend with a default fallback, usually the ephemeral StateBackend. Then, you explicitly register a route for your long-term storage. You tell the router that any file path starting with the directory prefix slash memories slash must be handled by the StoreBackend. Pay attention to this bit. This persistence is not automatic. You must specifically route the slash memories slash path to the Store. If you skip this routing step, the agent will just write memory files to the ephemeral state, and they will be deleted when the thread ends. Once you introduce a persistent database, you face a new problem. You cannot have one user reading the stored preferences of another user. To prevent this, the StoreBackend uses namespace factories. 
A namespace factory is simply a function that injects an isolation layer based on the current context. Instead of saving a file globally, the factory generates a prefix array, like the word users followed by the unique user ID. Every time the agent interacts with the StoreBackend, the database automatically scopes the operation to that exact user namespace. Let us look at how this flows in practice. You start conversation one. You mention you are building a React application. The agent notes this and writes a summary to a file called project notes dot txt, specifically placing it inside the slash memories slash directory. The CompositeBackend sees that directory prefix, intercepts the write command, and routes it to the durable StoreBackend under your isolated user namespace. Tomorrow, you start conversation two. This is a completely new thread with zero immediate context. You ask a question about state management frameworks. Before answering, the agent checks its slash memories slash directory. The CompositeBackend routes the read request to the persistent store. The agent reads your project notes, sees the React details, and provides a highly relevant answer tailored to your specific stack. The context carries over perfectly, even though the threads are separate. Cross-thread persistence requires deliberate design by routing specific directories to a persistent store and protecting them with user-specific namespaces. Thanks for spending a few minutes with me. Until next time, take it easy.
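The routing rule is easy to sketch. CompositeBackend, StateBackend, and StoreBackend are the class names from the episode, but the toy router below only illustrates the prefix dispatch and the namespace factory idea, not the library internals.

```python
class CompositeRouter:
    """Toy stand-in for CompositeBackend: dispatch file paths by prefix."""
    def __init__(self, default, routes):
        self.default = default  # ephemeral fallback, like StateBackend
        self.routes = routes    # prefix -> durable store, like StoreBackend

    def backend_for(self, path: str):
        for prefix, backend in self.routes.items():
            if path.startswith(prefix):
                return backend
        return self.default

ephemeral, durable = {}, {}
fs = CompositeRouter(default=ephemeral, routes={"/memories/": durable})

fs.backend_for("/memories/project_notes.txt")["/memories/project_notes.txt"] = "Building a React app"
fs.backend_for("/scratch/draft.txt")["/scratch/draft.txt"] = "temporary notes"

print(sorted(durable))    # only the /memories/ path survives the thread
print(sorted(ephemeral))  # scratch work stays ephemeral

# A namespace factory scopes every store operation to the current user.
namespace_for = lambda ctx: ("users", ctx["user_id"])
print(namespace_for({"user_id": "u-123"}))
```

Skip the `routes` entry and everything — memories included — falls through to the ephemeral default, which is exactly the failure mode the episode warns about.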
12

Executing Code in Sandboxes

3m 34s

This episode covers how to safely run agent-generated code using remote sandboxes. Listeners will learn how to configure the Sandbox-as-tool pattern with providers like Modal, Daytona, and Runloop.

Hi, this is Alex from DEV STORIES DOT EU. Deep Agents, episode 12 of 16. Letting an AI write code is impressive. Letting that AI blindly run the code it just wrote directly on your laptop is a terrible idea. To fix this, you need a safe place for the agent to verify its logic without risking your host machine. That is exactly what we are discussing today, Executing Code in Sandboxes. If an agent cannot execute code, it cannot verify if its code actually works. It relies purely on its training data to guess if a script is correct. Sandboxes give the agent a safe playground to iterate, fail, and fix errors based on real execution output. Before looking at the mechanics, we need to clarify a common misconception. You might assume the agent itself must be deployed and run inside the sandbox. That is not the case. Deep Agents rely on the Sandbox as tool pattern. The agent logic, memory, and prompts live securely on your server. The agent simply pushes shell commands and code over an API to a remote, isolated environment. This interaction happens through the execute tool. When the agent decides it needs to run a script, it calls the execute tool and passes the generated code block. This tool forwards the payload to a sandbox backend. Providers like Modal, Daytona, and Runloop supply these environments. They provision a secure container, run the code, and return the standard output or error logs back to the agent. The sandbox creates a strict isolation boundary between the execution space and your system. Consider an agent assigned to create a small Python package and run a test suite using pytest. The agent sits on your server. It uses the execute tool to reach out to a remote ephemeral server. First, it sends a command to install required libraries. Then, it writes the Python files into the sandbox directory. Finally, it sends a command to run pytest. The sandbox executes these steps and returns the terminal output. 
If a test fails, the agent reads the error from the API response, updates the code, and calls the execute tool again. The agent can build, test, and wipe the slate clean, all while your host machine remains completely untouched. When you configure a sandbox backend, you must define its lifecycle. There are two main approaches. The first is thread-scoped. Here, the sandbox is tied to a specific conversation thread. When the conversation starts, a fresh sandbox spins up. When the user session ends, the sandbox is destroyed. This ensures a clean slate for every interaction and is ideal for single-task operations where data does not need to persist. The second approach is assistant-scoped. In this model, the sandbox is tied to the agent itself, regardless of the active conversation. Every thread interacting with that agent shares the exact same sandbox state. If one thread installs a specific version of a library or downloads a large dataset, those files remain available for the next thread. This is the right choice when your agent acts as a persistent worker requiring a stable, ongoing workspace. Here is the key insight. The execution environment dictates the agent's capabilities. By controlling the sandbox lifecycle and enforcing strict API boundaries, you give the agent the freedom to make mistakes without compromising your infrastructure. The true power of an agent is not just generating code, but iterating on failure, and remote sandboxes provide the only secure way to let that iteration happen automatically. That is all for this one. Thanks for listening, and keep building!
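The Sandbox-as-tool pattern reduces to a narrow interface: push a command, get logs back. The sketch below fakes the remote container so it runs anywhere; `FakeSandbox` and its `run` method stand in for a provider SDK (Modal, Daytona, Runloop) and are not real API names.

```python
class FakeSandbox:
    """Stand-in for a provider-managed remote container. It records each
    command and returns captured output, mimicking the real API boundary."""
    def __init__(self):
        self.history = []

    def run(self, command: str) -> dict:
        self.history.append(command)
        return {"stdout": f"$ {command}\nok", "exit_code": 0}

def execute(sandbox, command: str) -> str:
    """The execute tool: forward a command to the sandbox, return the logs."""
    result = sandbox.run(command)
    return result["stdout"]

box = FakeSandbox()
execute(box, "pip install pytest")
output = execute(box, "pytest -q")
print(box.history)  # every step crossed the API boundary, none ran locally
```

The agent never touches your host; it only ever sees the strings that come back through `execute`, which is what lets it iterate on failures safely.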
13

Subgraph Streaming UX

3m 28s

This episode covers how to build transparent interfaces for multi-agent workflows using LangGraph streaming. Listeners will learn about the v2 stream format and how to track progress across subagent namespaces.

Hi, this is Alex from DEV STORIES DOT EU. Deep Agents, episode 13 of 16. You fire off a complex query, your supervisor agent delegates to three parallel subagents, and your user sits watching a generic loading spinner for two straight minutes. That is terrible user experience, and to fix it, you need to expose exactly what is happening under the hood. The answer is Subgraph Streaming UX. Multi-agent workflows are inherently asynchronous and slow. When a main agent delegates tasks, it creates a black box. If you just wait for the final combined result, the user assumes the application froze. You have to stream the subagent activity directly to the frontend. A common mistake developers make here is assuming that when subagents stream tokens, those tokens are just blindly appended to the main agent output stream. If that were true, parallel outputs would interleave into a garbled mess of text. Instead, the framework handles this by isolating every single event into its own namespace. To get this working, you use the version two streaming format and pass a flag setting subgraphs to true when you call your stream. When you do this, the stream no longer yields raw chunks of text. It yields dictionaries. Every dictionary contains three specific keys: type, ns, and data. Type tells you what kind of event you are looking at. Data holds the actual payload or text token. And ns stands for namespace. This is the part that matters. The namespace is a tuple defining the exact path of the node that generated the event. If a chunk comes from the main agent, the namespace tuple is empty. If it comes from a subagent, the namespace contains the path. It looks like a tuple containing the word tools followed by a unique subagent ID. Say you are building a frontend dashboard. A user asks the system to plan a vacation. The supervisor agent spawns three separate subagents: one for flights, one for hotels, and one for activities. In your frontend code, you loop over the incoming stream. 
By checking the namespace key on every dictionary chunk, you know exactly which subagent generated that specific token. You can then route that data to separate UI components. Instead of a single stuck spinner, your user sees three individual loading bars or text windows filling up with real-time outputs side by side. Sometimes you need to stream information that is not just a language model token. Maybe you want to push a custom status update, like notifying the UI that an external API is being called. You can do this from inside a node using the get stream writer function. You retrieve the writer, pass it your custom dictionary, and it injects that data directly into the overall stream under the current subagent namespace. The frontend receives it exactly like any other event and updates the correct specific loading bar. If you want to help keep the show going, you can support us by searching for DevStoriesEU on Patreon. Multi-agent transparency is fundamentally a routing problem; mapping namespace tuples to independent UI components is what turns a frustrating black box into a premium user experience. That is all for this one. Thanks for listening, and keep building!
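The frontend routing loop can be sketched directly. The chunk shape (type, ns, data keys, with an empty tuple for the main agent) follows the episode; the sample payloads are made up, and a real app would iterate the live v2 stream with subgraphs enabled instead of a fixed list.

```python
# Sample chunks in the dictionary shape described above.
chunks = [
    {"type": "token", "ns": (), "data": "Planning your trip. "},
    {"type": "token", "ns": ("tools", "flights-1"), "data": "Searching flights."},
    {"type": "token", "ns": ("tools", "hotels-2"), "data": "Comparing hotels."},
    {"type": "token", "ns": ("tools", "flights-1"), "data": " Found 3 options."},
]

panels = {}  # one UI component per namespace instead of one stuck spinner
for chunk in chunks:
    key = chunk["ns"] or ("supervisor",)  # empty tuple means the main agent
    panels.setdefault(key, []).append(chunk["data"])

print(len(panels))  # supervisor plus two subagents
print("".join(panels[("tools", "flights-1")]))
```

Because every token carries its namespace, interleaved parallel output demultiplexes cleanly into separate panels rather than garbling into one stream.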
14

The CLI and External MCP Tools

4m 00s

This episode introduces the Deep Agents CLI and how to extend it with the Model Context Protocol (MCP). Listeners will learn how to configure .mcp.json files to seamlessly connect their agent to external databases and APIs.

Hi, this is Alex from DEV STORIES DOT EU. Deep Agents, episode 14 of 16. What if you could connect your local terminal agent to your company's proprietary database without writing a single line of Python code? That exact capability is why we are covering the Deep Agents CLI and External MCP Tools. The Deep Agents SDK provides a built-in command line interface out of the box. You start the process, and you get a terminal prompt where you can chat with your agent. You can manage the agent's core brain directly from this prompt. If you need a more capable reasoning engine, you just type slash model followed by the new model identifier. The CLI hot-swaps the provider instantly, and you continue your session. But a terminal agent that only chats is not very useful. It needs access to your actual work. Normally, giving an agent access to a GitHub repository or a local database requires writing custom Python wrapper code to handle the API calls. The CLI avoids this entirely by using the Model Context Protocol, widely known as MCP. There is often confusion about what MCP actually is. It is not a Python library that you import into your agent code. An MCP server is an entirely separate process or a remote URL. It acts as a standardized wrapper around an external system. When the CLI starts, it talks to this separate process, asking what tools it provides, and dynamically loads them. You configure this connection using a simple text file named dot mcp dot json. This file is the core of zero-code integrations. When you launch the Deep Agents CLI, it performs auto-discovery. It looks in your current directory for this json configuration. If it finds one, it connects to the servers listed inside, registers their tools, and hands them to your agent. This configuration file supports two transport mechanisms. The first is standard input and output, or stdio. This is for local servers. 
The CLI spawns the server as a background process on your machine and communicates with it through standard terminal streams. The second transport type relies on HTTP and Server-Sent Events. You use this when your MCP server is hosted remotely, perhaps sitting inside a private corporate network. Because stdio servers run local commands, security is a priority. The CLI maintains a local trust store for project-level servers. The very first time the CLI detects a new local server command in your dot mcp dot json file, it halts and asks for explicit permission. Once you approve it, the hash of that command is saved to your trust store, and future runs connect automatically. Let us look at a specific scenario. You want your agent to read files from your local disk. You create a dot mcp dot json file. Inside, you define a server block and name it local-filesystem. You set the transport type to stdio. For the command line, you tell it to run the Node package runner, npx, pointing it to a pre-built community server called server-filesystem, along with the directory path you want to expose. You save the file and launch the Deep Agents CLI. Here is the key insight. The CLI automatically spawns that npx command in the background. The filesystem server boots up and broadcasts that it has tools for reading files and listing directories. The CLI catches that broadcast and equips your agent. You can immediately type into your terminal, asking the agent to read your system log file. The agent understands it has the tool, calls the background process, retrieves the file contents, and answers your question. You just gave an AI system secure access to your local disk using a few lines of configuration text. By decoupling tool logic from your Python application, MCP turns your terminal from a basic chat interface into an extensible control center. That is your lot for this one. Catch you next time!
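That scenario maps to a short configuration file. A sketch, assuming the common mcpServers layout for dot mcp dot json; exact key names can vary between CLI versions, and the exposed directory path is a placeholder:

```json
{
  "mcpServers": {
    "local-filesystem": {
      "transport": "stdio",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/home/me/projects"]
    }
  }
}
```

On launch, the CLI auto-discovers this file, asks for trust approval the first time it sees the command, then spawns the npx process and registers whatever tools the server advertises.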
15

Editor Integrations via ACP

2m 58s

This episode covers the Agent Client Protocol (ACP) and how to bring custom Deep Agents into IDEs. Listeners will learn how to run an AgentServerACP over stdio to interface with code editors like Zed.

Hi, this is Alex from DEV STORIES DOT EU. Deep Agents, episode 15 of 16. Why settle for a generic coding assistant when you can inject your own heavily customized LangGraph agent directly into your code editor? The protocol that pulls your bespoke agent out of the terminal and into your daily workflow is Editor Integrations via ACP. Before we go further, we need to clear up an acronym collision. You have likely heard of MCP, or Model Context Protocol. MCP connects your agent to external tools, allowing it to search a database or read a file system. ACP, the Agent Client Protocol, is completely different. ACP connects your agent to external IDEs, like Zed or VSCode. One connects the agent to external data, the other connects the agent to your workspace. ACP is a standardized communication layer. It defines exactly how an Integrated Development Environment sends text selections, file structures, and user prompts to an AI agent. It also dictates how that agent sends back code insertions, file diffs, and chat responses. By implementing this protocol, your custom deep agent behaves exactly like an off-the-shelf assistant inside your editor. The difference is that it possesses all the specific memory structures, custom prompt templates, and specialized internal tools you built into it. To bridge this gap, you install the deepagents-acp package. The integration requires a very short Python script to act as the entry point. In this script, you instantiate your pre-configured deep agent. Then, instead of starting a web server or a command-line loop, you wrap your agent in an ACP server interface. You accomplish this by calling a dedicated run function and passing in an AgentServerACP object, which itself wraps your underlying agent logic. Here is the key insight. When you configure this server, you set it to use standard input and output, commonly known as stdio mode. 
This is crucial because it means there are no network ports to manage, no firewall rules to bypass, and no web sockets to debug. The agent process simply listens for incoming text strings on standard input and prints its structured responses to standard output. Next, you must connect your editor to this script. Consider the Zed editor as a concrete scenario. Inside Zed, you open your user settings JSON file. You add a configuration block instructing the editor to use a custom agent. Instead of pointing Zed to a remote REST API endpoint, you point it directly to your local Python executable, providing the path to your new ACP script as the primary argument. When you start a session, the editor spawns your Python script as a hidden background process. Every time you ask for an inline edit or highlight a function for refactoring, the editor sends a structured ACP message through standard input directly to your running script. Your custom agent processes the request, traverses its internal node graph, queries its memory, and prints the ACP-formatted response back to standard output. The editor captures that output and applies the code changes natively in your application window. The single most valuable takeaway here is control. By routing your custom agent through ACP over stdio, you stop relying on a generic assistant and start collaborating with an entity that inherently understands your specific codebase conventions. That is all for this one. Thanks for listening, and keep building!
16

Background Workers with Async Subagents

3m 54s

This episode covers launching non-blocking background tasks for long-running workflows. Listeners will learn how AsyncSubAgent configurations deploy independently on LangSmith and interact via the start, check, update, and cancel tools.

Hi, this is Alex from DEV STORIES DOT EU. Deep Agents, episode 16 of 16. You ask your agent for a massive codebase refactor, and the chat interface immediately freezes. You are stuck staring at a spinning wheel for twenty minutes, unable to ask questions or change your mind. To fix this, you need Background Workers with Async Subagents. Standard subagents are synchronous. They block the main thread. The supervisor hands off a task and simply waits until it finishes. Async subagents change this dynamic entirely by running as non-blocking background workers. When the supervisor assigns work, it does not wait for the result. It immediately receives a job ID and returns control to the user. The user interface remains responsive while the heavy lifting happens elsewhere. You configure these background workers using the Async SubAgent specification. When you deploy your application to LangSmith Deployments, these async subagents are hosted independently. The main supervisor communicates with them over standard transport protocols. If you deploy the worker on the same exact server as the supervisor, they communicate using ASGI, which is highly efficient for local routing. If the worker lives on a separate remote server, the supervisor connects to it over HTTP. To orchestrate these remote workers, the supervisor is equipped with a specific suite of async tools. Think back to that codebase refactor request. The user asks for the rewrite. The supervisor decides this will take a long time and calls the start async task tool. This tool kicks off the coder subagent and hands the supervisor a job ID. The supervisor replies to the user, confirming the background job is running. Because the main thread is free, the user can continue the conversation. A few minutes later, the user might ask how the refactor is going. The supervisor calls the check async task tool, passing the job ID, and retrieves the current execution status from the worker. Here is the key insight. 
The user is not locked out of the process while the worker runs. If the user suddenly realizes they need the refactor to enforce strict type hinting, they can just tell the supervisor. The supervisor then calls the update async task tool. This pushes new mid-flight steering instructions directly to the running subagent without restarting the job from scratch. If the user decides the refactor was a bad idea entirely, the supervisor uses the cancel async task tool to cleanly terminate the background process. A common point of confusion is how the supervisor remembers these jobs during a long session. Over time, chat histories get compacted to save context tokens. If the job ID only existed as text in the conversation history, the supervisor would eventually forget it and lose track of the worker. To prevent this, task metadata is never stored just in the chat log. It is saved in a dedicated state channel called async tasks. This state channel operates independently from the message list, guaranteeing that active job IDs survive any conversation compaction. Moving from blocking calls to background workers shifts your architecture from a simple chat interface to a parallel orchestration engine capable of handling massive workloads. Since this wraps up our current series, I encourage you to read the official documentation, try deploying an async worker yourself, or visit devstories dot eu to suggest topics for our next run. Thanks for spending a few minutes with me. Until next time, take it easy.
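The compaction-survival argument can be sketched directly. The dedicated channel name, async tasks, follows the episode; the state shape and the compaction rule below are simplified stand-ins.

```python
# Job metadata lives in its own state channel, not in the chat log, so
# summarizing old messages can never drop an active job ID.
state = {
    "messages": [f"turn {i}" for i in range(40)],
    "async_tasks": {"job-42": {"status": "running", "subagent": "coder"}},
}

def compact(state, keep_last=5):
    """Summarize old chat turns; the async_tasks channel is never touched."""
    recent = state["messages"][-keep_last:]
    return {**state, "messages": ["<summary of earlier turns>"] + recent}

state = compact(state)
print(len(state["messages"]))                    # 6: summary + last five turns
print(state["async_tasks"]["job-42"]["status"])  # still tracked after compaction
```

Had "job-42" existed only as text inside the messages list, the summary step could have erased it; keeping it in a separate channel is what lets check, update, and cancel keep working hours into a session.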