Season 48 · 20 Episodes · 1h 11m · 2026

LangChain v1.0 Orchestration Framework

v1.2 — 2026 Edition. An in-depth audio course on the LangChain v1.0 orchestration framework. Build reliable, multi-agent AI applications with standardized tool calling, structured outputs, human-in-the-loop validation, and advanced context engineering.

LLM Orchestration · AI/ML Frameworks · Multi-Agent Systems
1
The Orchestration Era
We explore why LangChain evolved from a wrapper to a full orchestration framework. You will learn the history of LLM application development and how the release of v1.0 standardizes model interactions.
3m 29s
2
The Unified Agent Abstraction
We dive into the core create_agent function that unifies LangChain's previous abstractions. You will learn how the ReAct loop operates under the hood to manage model reasoning and tool execution.
3m 17s
3
Standardizing the Mess
We examine how LangChain standardizes model interactions across different providers. You will learn how to initialize chat models and seamlessly switch between OpenAI, Anthropic, and Google.
3m 27s
4
The Universal Language of LLMs
We break down the fundamental unit of context in LangChain: Messages. You will learn how to structure System, Human, AI, and Tool messages to build robust conversation histories.
3m 18s
5
Empowering Agents with Tools
We explore how to give your models actions using the @tool decorator. You will learn how type hints and docstrings are automatically converted into precise JSON schemas for the model.
3m 41s
6
Injecting Tool Context
We dive into passing runtime information directly to your tools without exposing it to the LLM. You will learn how to use the ToolRuntime parameter for secure, dependency-injected configurations.
3m 30s
7
Thread-Level Persistence
We tackle short-term memory and how to maintain conversation history. You will learn how to attach checkpointers to your agent to allow conversations to be paused, resumed, and remembered.
3m 48s
8
Compressing Context with Middleware
We explore how to prevent long conversations from crashing your model. You will learn how to use SummarizationMiddleware to automatically compress old messages and save tokens.
3m 36s
9
Guaranteed Data Formats
We discuss how to force language models to return strict, predictable data structures. You will learn the difference between ProviderStrategy and ToolStrategy for generating Pydantic models.
3m 24s
10
Intercepting the Agent Loop
We introduce the middleware paradigm, giving you surgical control over your agent's execution. You will learn how to use wrap-style and node-style hooks to intercept model calls.
3m 00s
11
Dynamic Context Engineering
We dive into context engineering by dynamically generating system prompts. You will learn how to use middleware to alter instructions based on the current user's role and environment.
3m 50s
12
Safe AI with Deterministic Guardrails
We secure our agents against data leaks using built-in middleware. You will learn how to apply PIIMiddleware to automatically redact sensitive information before it reaches the model.
3m 45s
13
Pausing for Human Approval
We explore high-stakes tool execution by adding a human to the loop. You will learn how to halt an agent's execution to approve, edit, or reject sensitive actions.
3m 29s
14
Real-Time Agent Feedback
We dive into streaming to drastically improve user experience. You will learn how to interpret stream modes to display live LLM tokens alongside custom tool execution updates.
3m 19s
15
Cross-Session Persistence
We explore long-term memory to build agents that truly know their users. You will learn how to use LangGraph stores to save JSON documents across entirely different conversations.
3m 13s
16
The Multi-Agent Paradigm
We explain why single agents fail and introduce the Subagents architecture. You will learn how a main supervisor agent coordinates subagents as isolated context windows to prevent token bloat.
3m 52s
17
State-Driven Agents
We explore how agents can dynamically alter their behavior. You will learn the Handoffs pattern for transferring control, and the Skills pattern for loading specialized prompts on-demand.
3m 42s
18
Custom Workflows and Routers
We step outside the standard agent loop. You will learn how to use LangGraph to build custom routing architectures, mixing deterministic logic with non-deterministic agentic reasoning.
3m 41s
19
Agent-to-Agent Communication
We explore the LangSmith A2A endpoint. You will learn how distributed agents deployed on entirely different servers can converse natively using Google's A2A RPC protocol.
4m 12s
20
The Future is MCP
We look forward with the Model Context Protocol, standardizing how agents access external tools. You will learn how to connect remote MCP servers to your agent using standard transports.
4m 18s

Episodes

1

The Orchestration Era

3m 29s

We explore why LangChain evolved from a wrapper to a full orchestration framework. You will learn the history of LLM application development and how the release of v1.0 standardizes model interactions.

Hi, this is Alex from DEV STORIES DOT EU. LangChain v1.0 Orchestration Framework, episode 1 of 20. It is incredibly easy to build a prototype of an AI agent, but it is notoriously difficult to build one reliable enough to put into production. The gap between a cool weekend demo and an enterprise-grade application is massive. The solution to bridging that gap is what we call The Orchestration Era. First, let us clear up a very common misunderstanding. People often confuse LangChain with a model provider. LangChain does not train, host, or serve large language models. It is the orchestration layer sitting directly above those models. Think of it as the control center that manages how your application interacts with whichever model you choose to use. To understand why an orchestration layer is necessary, we have to look at how the technology has evolved. Back in 2022, interacting with a language model was straightforward. You sent a string of text, and you received a string of text in return. Simple prompt chains were enough to get the job done. You did not need a heavy framework to manage a basic text-in, text-out workflow. You could easily write a script where the output of prompt A was simply pasted into prompt B. But as we move through 2025 and look toward 2026, the landscape looks entirely different. Models no longer just process plain text. They ingest and generate complex multimodal blocks. A single interaction might involve routing a user prompt, triggering an external tool call, processing an image, and returning a structured block of data. A model might return a request for a database lookup right alongside a text summary. If your application code has to manually inspect every single response to figure out if it is a string, a tool invocation, or an image file, your codebase will quickly become a fragile, unmaintainable mess. Here is the key insight. Passing the exact right context to a model at the exact right time is much harder than simply picking the most powerful model on the market. You can swap in the smartest reasoning engine available, but if it receives messy, unstandardized data, the entire interaction will fail. This is why LangChain evolved from a simple chaining library into a standardized orchestration layer for its 1.0 release. It provides a standard message format that works consistently across all major model providers. Instead of writing custom parsing logic for every different API, you use a uniform structure. By standardizing the message format, LangChain ensures that a tool call from one provider looks exactly the same in your code as a tool call from a competing provider. You define a sequence where a user message goes in, the orchestration layer normalizes it, sends it to the model, catches the multimodal response, and translates that response back into a standard format your application can actually use. You write your application logic once, and the orchestration layer handles the translation. The model itself is no longer the entire application, it is merely the reasoning engine, and the orchestration layer is what determines whether that engine actually solves your business problem. If you want to help us keep making these episodes, you can support the show by searching for DevStoriesEU on Patreon. That is all for this one. Thanks for listening, and keep building!
2

The Unified Agent Abstraction

3m 17s

We dive into the core create_agent function that unifies LangChain's previous abstractions. You will learn how the ReAct loop operates under the hood to manage model reasoning and tool execution.

Hi, this is Alex from DEV STORIES DOT EU. LangChain v1.0 Orchestration Framework, episode 2 of 20. What if you could swap out your entire complex reasoning loop for a single, ten-line function call? You spend hours writing custom parsers and while-loops just to get a language model to trigger a function reliably. The Unified Agent Abstraction is the architecture that takes this off your hands. To understand why this matters, we have to clear up a common confusion regarding older versions of the framework. In version zero point x, developers had to navigate a maze of specific agent classes. You had conversational agents, zero-shot agents, and complex custom chains. If you are upgrading your code, you can forget all of that. The unified agent abstraction replaces all old chains and legacy agents entirely. It provides one clear, standardized entry point through a function simply called create agent. This function exists to manage the ReAct loop. ReAct stands for Reason and Act. When a user asks a complex question, a language model cannot just execute code. It must reason about the problem, decide to take an action, output text requesting that action, and then wait for an observation before it can reason again. Handling this cycle manually is tedious. You have to parse the text generated by the model, map it to a local function, execute that function, format the output, feed it back into the model, and evaluate if the task is finished. The create agent function abstracts away this entire orchestration process. Let us walk through a simple implementation using a weather-checking tool. First, you define your specific tool, perhaps a function called get weather that takes a city name. Next, you initialize your chosen language model. Then comes the abstraction. You call the create agent function and pass it three arguments. You pass in your language model, a list containing your get weather tool, and a system prompt that dictates the rules for the agent. This single call returns an executable agent object. Finally, you invoke this object with a user query, asking for the current weather in London. Here is the key insight into the execution flow. When you invoke the agent, you are starting the ReAct loop. The agent sends the query and the tool descriptions to the model. The model decides it needs real-time data and outputs a standardized tool call. The agent abstraction intercepts this call automatically. It executes your get weather tool with the argument London, takes the resulting temperature data, and passes it back to the model as a new observation. The model evaluates this data, realizes it has the answer to the user query, and generates a final response. You did not write a single loop, and you did not write an output parser. You only provided the tools, the model, and the instructions. The power of the unified agent abstraction lies in establishing a strict architectural boundary. It completely isolates the mechanics of the reasoning cycle, allowing you to dedicate all your engineering effort to refining the quality of your tools and the clarity of your prompts. That is all for this one. Thanks for listening, and keep building!
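To make the weather walk-through concrete, here is a minimal sketch of the create_agent call, assuming the v1.0 entry point in langchain.agents; the get_weather tool body and its return value are illustrative placeholders.

```python
from langchain.agents import create_agent
from langchain.chat_models import init_chat_model
from langchain_core.tools import tool


@tool
def get_weather(city: str) -> str:
    """Return the current weather for a given city."""
    # Placeholder data; a real tool would call a weather API here.
    return f"It is 18 degrees Celsius and cloudy in {city}."


model = init_chat_model("openai:gpt-5")  # any chat model works here

agent = create_agent(
    model,
    tools=[get_weather],
    system_prompt="You are a concise weather assistant.",
)

# Invoking the agent starts the ReAct loop: reason, call get_weather, observe, answer.
result = agent.invoke(
    {"messages": [{"role": "user", "content": "What is the current weather in London?"}]}
)
print(result["messages"][-1].content)
```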
3

Standardizing the Mess

3m 27s

We examine how LangChain standardizes model interactions across different providers. You will learn how to initialize chat models and seamlessly switch between OpenAI, Anthropic, and Google.

Hi, this is Alex from DEV STORIES DOT EU. LangChain v1.0 Orchestration Framework, episode 3 of 20. Different model providers have wildly different APIs, which means trying out a new model usually forces you to rewrite half your application. You end up stuck with one vendor just because the technical cost of switching is too high. Standardizing the mess is exactly what this episode is all about. Before getting into the mechanics of how LangChain fixes this, it helps to clear up a common mix-up. If you worked with early language models, you are probably familiar with legacy LLM classes that accept a single, raw text string as input. Modern chat models are different. They strictly require a structured sequence of messages, where each message has a specific role attached, such as system, human, or AI. Everything we discuss now applies to these modern chat models, not the legacy text completion ones. When you build an application, hardcoding a specific provider creates immediate vendor lock-in. To break this dependency, LangChain introduces a single factory function called init chat model. Think of it as a universal translator for initializing artificial intelligence clients. Instead of importing a unique class for OpenAI, another for Anthropic, and yet another for Google, you rely exclusively on this one function. Here is the key insight. The init chat model function uses a simple string syntax to know what to build. You provide a string formatted as the provider name, followed by a colon, followed by the specific model version. Consider an agent currently running on OpenAI's GPT-5. You initialize your model simply by passing the string openai colon gpt-5. Now, suppose Anthropic releases a new model that handles code generation better, and you want to test it. Because you used the init chat model function, you do not touch your agent logic. You do not import a new client package. You just change that one initialization string to anthropic colon claude-sonnet-4-6. LangChain dynamically resolves the string and loads the correct underlying integration class. That handles the model itself, but providers also disagree on configuration settings. One API might expect a parameter called max sampled tokens, while another expects max length. The init chat model function solves this by accepting standard parameters. When you initialize your model, you can pass arguments like temperature or max tokens directly to the function. If you set your temperature to zero point two and max tokens to one thousand, LangChain maps those generic terms to the exact payload keys required by whatever provider you specified in your string. You configure your model once using the standard terminology, and the framework translates it for the destination API. By separating your application code from vendor specific implementation details, you gain architectural agility, allowing you to evaluate competing models in seconds rather than days. That is all for this one. Thanks for listening, and keep building!
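A short sketch of the factory function described above; the model identifier strings are the ones spoken in the episode, so substitute whichever models your accounts can access.

```python
from langchain.chat_models import init_chat_model

# Provider and model live in one "provider:model" string.
model = init_chat_model("openai:gpt-5", temperature=0.2, max_tokens=1000)

# Switching vendors is a one-line change; the surrounding application code stays identical.
# model = init_chat_model("anthropic:claude-sonnet-4-6", temperature=0.2, max_tokens=1000)

response = model.invoke("Summarize what an orchestration layer does in one sentence.")
print(response.content)
```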
4

The Universal Language of LLMs

3m 18s

We break down the fundamental unit of context in LangChain: Messages. You will learn how to structure System, Human, AI, and Tool messages to build robust conversation histories.

Hi, this is Alex from DEV STORIES DOT EU. LangChain v1.0 Orchestration Framework, episode 4 of 20. If you are still passing plain text strings to your language models, you are severely limiting your applications. Plain strings drop context. They cannot safely hold image references, tool execution results, or distinct interaction roles. To solve this, LangChain uses standardized message objects, acting as the universal language of LLMs. A message in LangChain is a data structure containing a specific role, the payload, and optional metadata. Different API providers handle conversation roles and multimodal inputs differently in their raw APIs. LangChain abstracts this friction away into four distinct classes. The first is the SystemMessage. This object establishes the background rules, persona, or strict constraints for the interaction. Next is the HumanMessage, representing the user's prompt or uploaded files. Then we have the AIMessage, which captures the model's generated response. Finally, there is the ToolMessage. This object specifically carries the result of an external function execution back to the model, cleanly separating raw user input from system-generated factual data. You orchestrate conversations by passing an array of these message objects to the model. You can manually construct this array to shape the output. First, you create a list and insert a SystemMessage assigning the model the role of a strictly formatted poetry expert. Second, you append a HumanMessage asking for a haiku about database migrations. Here is the key insight. You do not have to wait for the model to generate the next response. You can manually instantiate an AIMessage and inject it into the array right after the human prompt, containing a specific opening line. When you pass this entire array to the model, it interprets your injected AIMessage as its own past behavior. It will seamlessly continue the haiku from your exact starting point, adhering to the structure you forced upon it. When you eventually receive an AIMessage back from the model, you need to extract what it actually produced. Developers routinely confuse the raw content attribute with the parsed payload. Every message object has a content property. However, this raw content attribute directly reflects whatever the specific LLM provider returned. Depending on the model provider and the prompt, this might be a simple text string, or it might be a highly nested list of dictionaries containing mixed text chunks, image URLs, and tool identifiers. Parsing this manually makes your code incredibly fragile. Instead of reading the raw content, you should use the content blocks property. The content blocks property is LangChain's strictly typed, standardized representation of the message payload. When you read from content blocks, LangChain translates the provider-specific response into a uniform list of block objects. You can iterate through this list safely. First you check if a block is a text block, and if so, you extract the text payload. Next you check if it is a tool call block, and you extract the arguments. Building your parsing logic around the standardized content blocks property is the only way to ensure your application remains entirely decoupled from the shifting response formats of individual model providers. That is all for this one. Thanks for listening, and keep building!
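Here is a minimal sketch of the haiku example, including the injected AIMessage and a loop over content_blocks; it assumes the v1.0 standardized content blocks where text blocks expose a "text" field.

```python
from langchain.chat_models import init_chat_model
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage

model = init_chat_model("openai:gpt-5")

messages = [
    SystemMessage("You are a strictly formatted poetry expert."),
    HumanMessage("Write a haiku about database migrations."),
    # Injected AI turn: the model reads this as its own past output and continues from it.
    AIMessage("Schemas shift at dawn,"),
]

response = model.invoke(messages)

# Parse the standardized content_blocks rather than the provider-specific raw content.
for block in response.content_blocks:
    if block["type"] == "text":
        print(block["text"])
```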
5

Empowering Agents with Tools

3m 41s

We explore how to give your models actions using the @tool decorator. You will learn how type hints and docstrings are automatically converted into precise JSON schemas for the model.

Hi, this is Alex from DEV STORIES DOT EU. LangChain v1.0 Orchestration Framework, episode 5 of 20. An AI agent without access to the outside world is just a text generator. To turn it into a system that actually gets work done, you have to give it hands. Today we are talking about Empowering Agents with Tools, specifically using the at-tool decorator. A tool in LangChain is a bridge between the reasoning engine of a language model and your external systems. But there is a common misconception here. Many developers assume the language model somehow analyzes the logic inside their Python function to figure out how to use it. It does not. The language model never sees your Python code. It only sees a schema describing the function. To create a tool, you write a standard Python function and place the at-tool decorator directly above it. This decorator does something crucial. It inspects your function signature, extracts the name of the function, reads the type hints for every parameter, and parses the docstring. It takes all of this metadata and bundles it into a structured format that the language model can actually read. Because the model only reads this generated schema, precision in your code is critical. Let us look at a scenario. Suppose you write a function called search database. If you give it a weak docstring that just says, searches the database, and you leave out type hints entirely, the language model is flying blind. It does not know what kind of database it is, and it does not know what arguments to provide. It might try passing a full conversational sentence as the search query, causing your Python function to crash when it executes. Here is the key insight. When you use the at-tool decorator, you must write your docstrings for the language model, not for the human developer. Instead of a vague description, you write a clear instruction, like, searches the customer database by email address to retrieve past billing history. Next, you apply strict type hints to your input parameters. You specify that the email argument must be a string. You might even add a description to the argument itself. Every piece of type information you add gives the language model a tighter boundary on what it is allowed to generate. When you pass this newly decorated tool to an agent, the agent reads the detailed schema before it does anything else. When a user asks about a customer refund, the agent scans its available tools and recognizes that the search database tool is the exact right fit based on your precise docstring. Because of your strict type hint, it knows exactly how to format the argument. It extracts the email string from the user prompt, halts its text generation, and outputs a tool call. LangChain intercepts that tool call. It takes the email string the model generated, passes it into your underlying Python function, and executes the code. Your function talks to the database, grabs the billing history, and hands the raw data right back to LangChain. LangChain then feeds that data back to the agent as an observation. The agent resumes its thinking process, but now it has the real data you just provided. The language model's ability to act is completely bound by the quality of your function signature. Treat your docstrings and type hints as strict engineering constraints, because to the language model, they are the only instructions that exist. That is all for this one. Thanks for listening, and keep building!
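A small sketch of the decorated search tool; the docstring and type hint are exactly what get turned into the schema the model reads, and the lookup itself is a stub.

```python
from langchain_core.tools import tool


@tool
def search_database(email: str) -> str:
    """Search the customer database by email address to retrieve past billing history."""
    # Placeholder lookup; a real implementation would query your billing system.
    return f"Billing history for {email}: 3 invoices, all paid."


# The model never sees the Python body, only the metadata extracted by the decorator.
print(search_database.name)         # "search_database"
print(search_database.description)  # the docstring above
print(search_database.args)         # argument schema derived from the str type hint
```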
6

Injecting Tool Context

3m 30s

We dive into passing runtime information directly to your tools without exposing it to the LLM. You will learn how to use the ToolRuntime parameter for secure, dependency-injected configurations.

Hi, this is Alex from DEV STORIES DOT EU. LangChain v1.0 Orchestration Framework, episode 6 of 20. If you rely on a language model to properly pass an authentication token or a user identity to a database tool, you are setting yourself up for a massive security breach. The model does not know who the current user is, and it certainly should not be trusted to guess. You need a way to pass sensitive backend data directly to your tools, bypassing the model entirely. This is exactly what Injecting Tool Context resolves. When you build a tool, some arguments are meant for the model to generate, like a search string or a date range. Other arguments are strictly for your backend infrastructure. This is where the ToolRuntime object comes in. It allows you to inject static configuration into a tool exactly when it executes. You might think you can just pass a configuration dictionary as a normal tool argument and tell the prompt to ignore it. Do not do that. When you define a parameter named config or runtime in your tool function, LangChain treats these as reserved keywords. It intentionally hides them from the tool schema that gets sent to the language model. The model never sees them. It only sees the arguments it is responsible for providing. This means the model cannot hallucinate a fake configuration or attempt to override your security boundaries. Consider a concrete scenario. You are building a get account info tool for a banking application. The tool requires a user ID to fetch the correct database records. If the model provides this ID, a clever prompt injection could trick the model into requesting data for a completely different customer. Instead, you design your tool function to accept two arguments. The first is the account type, which the model will provide. The second is an argument named runtime. Inside your main application code, before the tool is invoked, you populate this runtime object. Specifically, you use the runtime dot context property. You place the actual, authenticated user ID of the person making the request directly into this context. You might also place an active database connection or a regional endpoint address in there. The model evaluates the user request and decides to call the get account info tool. It looks at the schema, sees that it needs to provide an account type, and outputs the word savings. It outputs nothing else. LangChain intercepts this execution call. It takes the account type from the model, seamlessly combines it with the runtime context from your backend, and executes the Python function. Inside the function, you extract the user ID from the context and run your database query securely. This mechanism gives you secure dependency injection. You can inject anything your tool needs to function that the model has no business knowing about. API keys for third-party billing services, file system paths, or tenant identifiers in a multi-tenant architecture all belong in the runtime context. Here is the key insight. Hiding configuration arguments behind the runtime context ensures your model acts purely as a logic router, while your application code retains absolute, uncompromised control over data access and execution security. Thanks for spending a few minutes with me. Until next time, take it easy.
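As a rough sketch of the banking example, assuming the ToolRuntime injection and context wiring described in the v1.0 docs; the import paths, the context_schema parameter, and the AppContext dataclass are assumptions to verify against your installed version.

```python
from dataclasses import dataclass

from langchain.agents import create_agent
from langchain.tools import ToolRuntime, tool  # assumption: v1.0 exposes ToolRuntime here


@dataclass
class AppContext:
    user_id: str  # set by your authenticated backend, never by the model


@tool
def get_account_info(account_type: str, runtime: ToolRuntime) -> str:
    """Fetch account information of the given type for the current user."""
    # The runtime argument is hidden from the model's schema and injected at execution time.
    user_id = runtime.context.user_id
    return f"{account_type} account summary for user {user_id}."  # placeholder


agent = create_agent(
    "openai:gpt-5",
    tools=[get_account_info],
    context_schema=AppContext,  # assumption: context schema parameter as in the v1.0 docs
)

# The model only supplies account_type; the user identity rides along in the backend context.
agent.invoke(
    {"messages": [{"role": "user", "content": "Show my savings account."}]},
    context=AppContext(user_id="user-42"),
)
```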
7

Thread-Level Persistence

3m 48s

We tackle short-term memory and how to maintain conversation history. You will learn how to attach checkpointers to your agent to allow conversations to be paused, resumed, and remembered.

Hi, this is Alex from DEV STORIES DOT EU. LangChain v1.0 Orchestration Framework, episode 7 of 20. Without memory, every interaction with your AI agent feels like a scene from 50 First Dates. The model starts completely blank, forgetting everything you discussed seconds ago. The solution is Thread-Level Persistence. Large language models are entirely stateless. They process text and return text, retaining absolutely nothing between requests. To hold a conversation, an agent needs a way to store and retrieve past interactions. In the context of a single active session, we handle this through short-term memory, which is managed as part of the agent's state and persisted using checkpointers. People often think memory in AI is some internal model capability where the neural network magically remembers you. It is not. Memory is just message history. Thread-level persistence simply means taking the ongoing list of messages, saving it to a database after every step, and feeding it back into the prompt before the model sees the next input. The checkpointer manages this automatically so you do not have to write the storage and retrieval logic manually. In LangGraph, state is saved per thread. A thread represents one continuous session. To enable this, you need a checkpointer. For development and testing, you can use the In Memory Saver. When you compile your agent graph, you pass this saver object in as the checkpointer argument. This integration tells the agent that at the end of every node execution, it must snapshot its current state and hand it over to the saver. The state includes whatever you defined in your graph, which is typically a running list of messages. Let us look at a concrete scenario. You compile your agent with an In Memory Saver. Now, you want to invoke the agent. Instead of just passing a user message, you also pass a configuration object containing a specific thread ID. Let us use the string value conversation one. You send the message, my name is Bob. The agent receives this, generates a polite greeting, and finishes. Behind the scenes, the checkpointer saves the updated state, which now contains the message stating your name is Bob, indexed under conversation one. Later, you invoke the agent a second time. You send a new message asking, what is my name. Crucially, you pass the exact same configuration object with the thread ID conversation one. Here is the key insight. Before the agent routes your new question to the model, the checkpointer intercepts the process. It looks up conversation one in the In Memory Saver. It retrieves the state snapshot from your previous turn, loads the saved message history, and appends your new question to the end. The language model receives the full historical context, sees the previous message where you introduced yourself, and successfully replies that your name is Bob. If you were to change the thread ID to conversation two and ask for your name, the checkpointer would look for that new ID, find no existing state, and initialize a fresh, empty message list. The agent would have no idea who you are. The thread ID is the solitary key that binds a sequence of isolated, stateless model calls into a coherent short-term memory session. The checkpointer abstracts away the repetitive work of managing message arrays and querying databases, guaranteeing that your agent can resume its work exactly where it left off, as long as you provide the right thread ID. Appreciate you listening — catch you next time.
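A minimal sketch of the thread-level memory flow, using LangGraph's in-memory checkpointer; the thread identifier string is arbitrary.

```python
from langchain.agents import create_agent
from langgraph.checkpoint.memory import InMemorySaver

agent = create_agent(
    "openai:gpt-5",
    tools=[],
    checkpointer=InMemorySaver(),  # development-only, in-process persistence
)

config = {"configurable": {"thread_id": "conversation-1"}}

# Turn 1: the checkpointer snapshots the message history under "conversation-1".
agent.invoke({"messages": [{"role": "user", "content": "My name is Bob."}]}, config)

# Turn 2: same thread_id, so the saved history is loaded before the model runs.
result = agent.invoke({"messages": [{"role": "user", "content": "What is my name?"}]}, config)
print(result["messages"][-1].content)  # the agent can now answer "Bob"
```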
8

Compressing Context with Middleware

3m 36s

We explore how to prevent long conversations from crashing your model. You will learn how to use SummarizationMiddleware to automatically compress old messages and save tokens.

Hi, this is Alex from DEV STORIES DOT EU. LangChain v1.0 Orchestration Framework, episode 8 of 20. The longer a conversation goes, the more distracted a large language model gets by stale information. You drop accuracy while paying higher token costs for every single turn. Compressing Context with Middleware fixes this exact problem. As chat history grows, you eventually hit the context window limit of your model. Even before hitting that hard limit, feeding thousands of tokens of old conversation degrades performance. The solution is the SummarizationMiddleware in LangChain. Instead of just chopping off old messages, it compresses them into a single summary block. This retains the semantic meaning of the conversation without the massive token overhead. There is a common misconception about how this runs. People often assume the primary agent has to perform the summarization itself. It does not, and it really should not. You want your primary agent running on your smartest, most capable model to handle complex logic. Summarization is a much simpler task. You assign a smaller, cheaper model to the SummarizationMiddleware exclusively for this job. Configuring the middleware requires defining two main parameters. The first is the trigger. The trigger tells the middleware when to step in. You might set the trigger to activate whenever the total token count of the conversation hits 4000 tokens. The second parameter is the keep condition. This tells the middleware how much recent context to leave completely alone. You could set the keep value to 20, meaning the 20 most recent messages remain untouched. Here is the logic flow in practice. Your user is chatting with the main agent. The conversation grows. On the next turn, the total message history crosses the 4000 token threshold. Before the primary agent even sees the new user input, the SummarizationMiddleware intercepts the request. It scans the history and identifies everything older than the 20 most recent messages. It takes that older chunk of the conversation and hands it to your designated smaller model. Let us say you configured the middleware to use gpt-4.1-mini. That smaller model reads the old messages and generates a tight paragraph summarizing what was discussed. The middleware then rewrites the history array. It replaces all of those old, individual messages with a single system message containing the new summary. If there was already an older summary from a previous compression cycle, the middleware includes that in the prompt so the new summary updates the running narrative. The final package sent to your primary agent is highly optimized. It contains the new summary message, followed by the 20 uncompressed recent messages, followed by the latest user input. Here is the key insight. The primary agent never realizes the history was compressed behind the scenes. It just receives a clean, highly relevant context window. You preserve the long-term semantic meaning of the chat, keep the model focused, and drastically lower your token costs on every subsequent turn. If you find these episodes helpful and want to support the show, you can search for DevStoriesEU on Patreon. That is all for this one. Thanks for listening, and keep building!
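As a sketch of that configuration, assuming the SummarizationMiddleware shipped with the v1.0 middleware module; the threshold and keep-count argument names follow the trigger and keep settings described in the episode and should be checked against your installed version.

```python
from langchain.agents import create_agent
from langchain.agents.middleware import SummarizationMiddleware  # assumption: v1.0 module path

agent = create_agent(
    "openai:gpt-5",                          # primary reasoning model
    tools=[],
    middleware=[
        SummarizationMiddleware(
            model="openai:gpt-4.1-mini",     # cheaper model used only for summarization
            max_tokens_before_summary=4000,  # trigger: compress once history crosses 4000 tokens
            messages_to_keep=20,             # keep: leave the 20 most recent messages untouched
        ),
    ],
)
```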
9

Guaranteed Data Formats

3m 24s

We discuss how to force language models to return strict, predictable data structures. You will learn the difference between ProviderStrategy and ToolStrategy for generating Pydantic models.

Hi, this is Alex from DEV STORIES DOT EU. LangChain v1.0 Orchestration Framework, episode 9 of 20. Building software that relies on parsing natural language responses with regular expressions is a ticking time bomb. You ask a language model for a simple data object, and it gives you perfect data, except it adds a polite greeting at the beginning and wraps the whole thing in markdown formatting. Your parser crashes instantly. Guaranteed Data Formats are how you fix this permanently. Structured output forces the language model to return information exactly the way your application expects it. It turns unpredictable text generation into reliable software objects. Consider a system that processes inbound customer support messages. A user sends a messy, unstructured paragraph complaining about a login issue, but burying their name, email address, and phone number somewhere in the text. You need those three pieces of data to trigger a database lookup. Instead of writing a complex prompt begging the model to format its response correctly, you define a standard Pydantic model. You create a class called ContactInfo and define name, email, and phone as required fields. Then, you simply pass this Pydantic schema to the response format parameter of your language model configuration. You do not need to provide examples or write custom validation scripts. This is the part that matters. When you supply that Pydantic schema, LangChain automatically determines the most reliable way to enforce it. It does this by quietly selecting between two different execution paths. First, it checks if your chosen language model has an official structured output feature built into its API. If it does, LangChain auto-selects the Provider Strategy. This strategy pushes your schema directly to the provider, leveraging their native, server-side constraints to guarantee the output format. But hardware changes and models get swapped. If you decide to use a different model that lacks native structured output, LangChain detects this capability gap. It automatically falls back to the Tool Strategy. Under the hood, it translates your ContactInfo schema into a function signature. It tells the model about a fake tool that requires exactly a name, an email, and a phone number to run. The model attempts to call this tool, and in doing so, generates the exact structured arguments you need. Your application code never has to change to accommodate the swap. When the operation completes, developers often look in the wrong place for their data. You might assume the output is returned as raw text that you still have to parse. It is not raw text. LangChain intercepts the payload and instantiates the Pydantic object for you. It places this fully validated Python object directly into your application state. You will find it captured in the structured response key of your state dictionary. You simply reference that key, and you immediately have your ContactInfo object, with type-safe fields ready to pass to the rest of your application. By shifting the burden of schema validation away from custom parsing logic and into the framework layer, your language model integrations become as predictable as a standard API call. Appreciate you listening — catch you next time.
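A minimal sketch of the ContactInfo example; response_format and the structured_response key follow the v1.0 agent API as described above, and the sample user message is invented for illustration.

```python
from pydantic import BaseModel

from langchain.agents import create_agent


class ContactInfo(BaseModel):
    name: str
    email: str
    phone: str


agent = create_agent(
    "openai:gpt-5",
    tools=[],
    response_format=ContactInfo,  # LangChain picks ProviderStrategy or ToolStrategy for you
)

result = agent.invoke({
    "messages": [{
        "role": "user",
        "content": "I'm Jane Doe, jane@example.com, 555-0100, and I still cannot log in.",
    }]
})

contact = result["structured_response"]  # a validated ContactInfo instance, not raw text
print(contact.email)
```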
10

Intercepting the Agent Loop

3m 00s

We introduce the middleware paradigm, giving you surgical control over your agent's execution. You will learn how to use wrap-style and node-style hooks to intercept model calls.

Hi, this is Alex from DEV STORIES DOT EU. LangChain v1.0 Orchestration Framework, episode 10 of 20. If your agent fails silently in production, it is often because you are not watching what happens between the steps of its reasoning loop. The model crashes, the loop breaks, and you are left staring at an incomplete run. Intercepting the agent loop with custom middleware is how you regain control. When an agent runs a ReAct cycle, it constantly hands control back and forth to the language model. Middleware provides hooks to execute your logic exactly when you need it during that exchange. There are two primary types of hooks you will use: node-style hooks and wrap-style hooks. A common mistake is treating them interchangeably. Node-style hooks run sequentially. Wrap-style hooks actually enclose the execution and can catch exceptions. Node-style hooks use decorators named before model and after model. When you attach a before model hook to a function, the framework runs your logic completely, and only then triggers the language model API. When the model responds, an after model hook runs. These hooks are excellent for logging the exact prompt sent to the API, injecting context, or stripping bad characters from the final text output. But because they run strictly in sequence, they offer no protection against failures. If the language model API times out, your after model hook never executes. The error bubbles up and crashes the entire agent loop. This is the part that matters. If you need to handle instability, you use a wrap-style hook. The decorator for this is wrap model call. A wrap hook sits entirely around the model execution. Your function runs, performs some setup, and then explicitly yields control to the model. Because your custom code wraps the actual network call, you can place that execution inside standard error handling structures. Consider building a wrap model call middleware to handle API rate limits with an exponential backoff retry loop. You write a function decorated with wrap model call. Inside this function, you create a retry loop. You place the command that hands control to the model inside a try block. If the model succeeds, you catch the response, return it, and the loop finishes. If the model throws an error, your catch block intercepts it. Instead of failing the agent, your catch block triggers a pause. You calculate a short delay, wait, and then let the loop attempt the call again, doubling the delay each time. The agent orchestrator never sees the failures. The middleware catches the exceptions, manages the retry logic in isolation, and smoothly hands a successful response back to the main ReAct loop when it finally succeeds. Node-style hooks prepare the inputs and format the outputs, but wrap-style hooks protect the execution. That is your lot for this one. Catch you next time!
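Here is a rough sketch of the retry middleware, assuming the wrap_model_call decorator exposed by the v1.0 middleware module; the broad exception handling is deliberate and only for illustration.

```python
import time

from langchain.agents import create_agent
from langchain.agents.middleware import wrap_model_call  # assumption: v1.0 decorator location


@wrap_model_call
def retry_with_backoff(request, handler):
    """Wrap the model call in an exponential backoff retry loop."""
    delay = 1.0
    for attempt in range(5):
        try:
            return handler(request)  # hand control to the actual model call
        except Exception:            # e.g. a rate limit error from the provider
            if attempt == 4:
                raise                # give up after the final attempt
            time.sleep(delay)
            delay *= 2               # double the wait before retrying


agent = create_agent("openai:gpt-5", tools=[], middleware=[retry_with_backoff])
```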
11

Dynamic Context Engineering

3m 50s

We dive into context engineering by dynamically generating system prompts. You will learn how to use middleware to alter instructions based on the current user's role and environment.

Hi, this is Alex from DEV STORIES DOT EU. LangChain v1.0 Orchestration Framework, episode 11 of 20. The number one reason your agent fails is not because the underlying model is dumb. It is because you gave it the wrong context for the job. If your system treats a superuser and a guest exactly the same, your application is blind to reality. To fix this, we use Dynamic Context Engineering. First, let us clear up a common misconception about how prompts are built. Dynamic context is not the base system prompt you write when you initially define your agent in code. That initial system prompt is completely static. Dynamic Context Engineering is the process of modifying that prompt on the fly, just milliseconds before the model is actually called. Context engineering is about giving the language model the exact rules it needs for a specific user at a specific moment, and nothing more. If you try to shove every possible rule into one massive static prompt—telling the model how to act if the user is an admin, and how to act if they are a viewer, and how to act if it is Tuesday—you waste tokens and confuse the model. Instead, you want to dynamically inject only the rules that matter right now. In LangChain, this is handled using a specific decorator called dynamic underscore prompt. You place this decorator above a Python function that you define. When your application receives a query and triggers the chain, LangChain pauses. It looks for any function wrapped in this decorator and executes it before talking to the model. Inside your decorated function, you need a way to know what is currently happening. This is where you read from request dot runtime dot context. This context object is essentially a dictionary. It holds all the live metadata passed into the chain when you invoked it. You can put anything you want in there from your application backend, such as user IDs, session states, feature flags, or access levels. Let us look at a concrete scenario. You write a function called context aware prompt and wrap it with the dynamic prompt decorator. Inside this function, you read the user role from the runtime context. You check the role. If the user is an admin, your function appends a specific block of text to the system prompt, telling the language model it has full permission to return destructive commands. If the user is a viewer, your function appends a different text block, giving strict instructions that the model must only return read-only summaries and must never suggest configuration changes. Now, you pull a second piece of data from the runtime context, which is the environment state. You check if the environment is set to production. If it is, your function appends a severe safety warning to the very end of the system prompt, demanding that the model double-check its output for safety. If the environment is just staging, you skip appending that warning entirely. Here is the key insight. Your function takes the static base prompt, glues on the admin or viewer rules, tacks on the production warning if needed, and returns the final string. LangChain takes this fully assembled string and sends it to the language model. The language model never knows the prompt was pieced together. It just sees a highly specific, perfectly tailored set of instructions. By doing this, you keep your system prompts lean, accurate, and completely relevant to the immediate request. 
You stop hoping the model will guess which rules apply, and start enforcing exactly the rules required for the current state of your application. I would like to take a moment to thank you for listening — it helps us a lot. Have a great one!
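A sketch of the role-aware prompt function, assuming the dynamic_prompt decorator and the request.runtime.context access pattern described above; the dict-style context keys are illustrative.

```python
from langchain.agents import create_agent
from langchain.agents.middleware import dynamic_prompt  # assumption: v1.0 decorator location

BASE_PROMPT = "You are an infrastructure assistant."


@dynamic_prompt
def context_aware_prompt(request) -> str:
    """Assemble the system prompt from runtime context just before the model call."""
    ctx = request.runtime.context  # live metadata supplied when the agent is invoked
    prompt = BASE_PROMPT
    if ctx.get("user_role") == "admin":
        prompt += " You have full permission to return destructive commands."
    else:
        prompt += " Return read-only summaries only; never suggest configuration changes."
    if ctx.get("environment") == "production":
        prompt += " This is PRODUCTION: double-check every suggestion for safety."
    return prompt


agent = create_agent("openai:gpt-5", tools=[], middleware=[context_aware_prompt])

agent.invoke(
    {"messages": [{"role": "user", "content": "How do I clear the cache?"}]},
    context={"user_role": "viewer", "environment": "production"},
)
```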
12

Safe AI with Deterministic Guardrails

3m 45s

We secure our agents against data leaks using built-in middleware. You will learn how to apply PIIMiddleware to automatically redact sensitive information before it reaches the model.

Hi, this is Alex from DEV STORIES DOT EU. LangChain v1.0 Orchestration Framework, episode 12 of 20. A single chat log containing an unredacted credit card number can instantly compromise your entire application's compliance. You cannot rely on a language model to politely ignore sensitive data, and asking the model to censor itself is slow and unpredictable. That is where Safe AI with Deterministic Guardrails comes in. Deterministic guardrails are hardcoded, rule-based checks. They rely on predictable patterns and logic, like regular expressions or fixed algorithms, instead of asking another language model to evaluate the text. Because they bypass the network call to an AI, they execute in milliseconds and cost essentially nothing. If you are building for production, this deterministic layer is mandatory for security. In the framework, you implement this using the PII Middleware. A frequent mistake developers make is trying to filter sensitive information after the fact, scanning the output of the model. But to protect user privacy, the PII Middleware is designed to intercept the raw message right as the user hits send. It processes the text before the model call is ever initiated. You explicitly configure this behavior by setting the apply to input parameter to true. Let us walk through a customer service agent scenario. A stressed user sends a message stating their account is locked, including their personal email address, and then they paste their entire sixteen-digit credit card number to verify their purchase. If your code passes that raw string to a third-party AI provider, you have violated basic data compliance. You need a strategy to neutralize the text, and the middleware gives you three built-in actions: block, redact, and mask. If you use the block strategy, the middleware acts as a hard wall. The moment it detects the credit card format, it throws a strict error and terminates the chain entirely. The request is rejected outright. If you choose the redact strategy, the middleware surgically removes the specific data and drops in a clean placeholder. The personal email address is completely wiped from the string and replaced with the word email in brackets. The language model still reads a coherent sentence and understands an email was provided, but the actual data is gone. The third strategy is mask. Masking retains a safe portion of the original data. The middleware replaces the first twelve digits of the credit card with asterisks, leaving just the last four numbers exposed. This is highly effective when your backend system needs to verify an account without exposing the full financial record. Implementing this requires configuring the middleware before your chain runs. You instantiate the PII Middleware and provide it with a list of target entities. In this case, you specify email and credit card. You then assign your chosen strategies to those entities, perhaps choosing redact for the email and mask for the card. Finally, you attach this middleware component to your main chain, ensuring you set the parameter to apply to the input. The moment the user submits their message, the deterministic rules scrub the text, and the AI only receives a sanitized prompt. Here is the key insight. The most secure way to handle sensitive personal information in any generative AI architecture is to guarantee the language model never sees it in the first place. That is all for this one. Thanks for listening, and keep building!
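A compact sketch of the guardrail configuration, assuming the built-in PIIMiddleware from the v1.0 middleware module; the entity names and strategy strings mirror the episode and should be verified against your installed version.

```python
from langchain.agents import create_agent
from langchain.agents.middleware import PIIMiddleware  # assumption: v1.0 module path

agent = create_agent(
    "openai:gpt-5",
    tools=[],
    middleware=[
        # Redact email addresses entirely; mask credit cards down to the last four digits.
        PIIMiddleware("email", strategy="redact", apply_to_input=True),
        PIIMiddleware("credit_card", strategy="mask", apply_to_input=True),
    ],
)
```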
13

Pausing for Human Approval

3m 29s

We explore high-stakes tool execution by adding a human to the loop. You will learn how to halt an agent's execution to approve, edit, or reject sensitive actions.

Hi, this is Alex from DEV STORIES DOT EU. LangChain v1.0 Orchestration Framework, episode 13 of 20. An autonomous agent is incredibly powerful, right up until the moment it autonomously emails a draft of your financials to the wrong client. Some actions are simply too risky to execute without a pair of human eyes. This is exactly why we use Pausing for Human Approval. Agents execute tools automatically based on user prompts. That behavior is ideal for reading data, but it is dangerous for destructive or irreversible actions. We need a way to pause execution, ask a human if an action is safe, and then resume or abort. Before discussing the mechanics, we have to clear up a common failure point. Engineers sometimes configure interrupts and then wonder why the agent just crashes or restarts. You must have a checkpointer enabled. You cannot pause an agent if it cannot remember where it left off. The agent's entire memory and current progress must be saved to the persistence layer while it waits for a human to respond. No checkpointer means no pause. With persistence active, you can handle tool execution safely using the Human In The Loop Middleware. Consider a setup where your agent has two tools: a search tool and a delete database tool. You want the agent to search freely, but you absolutely do not want it dropping tables without permission. When you configure this middleware, you set an argument called interrupt on. You pass it the specific names of the tools that require oversight. In our scenario, you configure interrupt on to watch only for the delete database tool. The search tool is ignored by the middleware and executes immediately whenever the agent calls it. However, when the agent decides it needs to use the delete database tool, the middleware catches the request. It pauses the graph, saves the current state to your checkpointer, and halts execution entirely. The graph is now suspended in the persistence layer, waiting for human intervention. The human operator reviews the pending tool call and has three ways to respond to the middleware. The first decision type is approve. The human looks at the parameters the agent generated, agrees they are correct, and sends an approval command. The graph wakes up and executes the deletion exactly as the agent originally planned. The second decision type is reject. The operator sees that the agent is trying to delete the wrong target and sends a rejection. The tool does not execute. Instead, the agent receives an observation indicating that the action was blocked by a human. The agent then processes this feedback and can either try a different approach or ask the user for clarification. Here is the key insight. The third option is edit. Sometimes the agent is mostly correct but makes a minor error, like targeting the production environment instead of the staging environment. Rather than rejecting the action entirely and forcing the agent to reason through the problem again, the operator can modify the tool input parameters directly. The human changes the target environment to staging, and submits the corrected call. The agent resumes and executes the action using the modified parameters, moving forward smoothly. By using this middleware, you protect your system from dangerous mistakes. Pausing for human approval does not just prevent catastrophes, it transforms your agent from an unpredictable entity into a supervised collaborator that can safely handle high stakes operations. That is all for this one. Thanks for listening, and keep building!
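The gated-tool setup might look like this sketch, assuming the HumanInTheLoopMiddleware and interrupt_on argument described above; both tools are stubs.

```python
from langchain.agents import create_agent
from langchain.agents.middleware import HumanInTheLoopMiddleware  # assumption: v1.0 module path
from langchain_core.tools import tool
from langgraph.checkpoint.memory import InMemorySaver


@tool
def search(query: str) -> str:
    """Search the internal knowledge base."""
    return f"Results for {query}"  # placeholder


@tool
def delete_database(name: str) -> str:
    """Delete the named database. Irreversible."""
    return f"Deleted {name}"  # placeholder


agent = create_agent(
    "openai:gpt-5",
    tools=[search, delete_database],
    middleware=[
        # Only the destructive tool is gated; search executes immediately.
        HumanInTheLoopMiddleware(interrupt_on={"delete_database": True}),
    ],
    checkpointer=InMemorySaver(),  # required: no checkpointer, no pause
)

# When the agent requests delete_database, execution halts and the state is saved.
# Resuming later means re-invoking the same thread with the operator's
# approve / edit / reject decision (see LangGraph's Command(resume=...) API).
```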
14

Real-Time Agent Feedback

3m 19s

We dive into streaming to drastically improve user experience. You will learn how to interpret stream modes to display live LLM tokens alongside custom tool execution updates.

Hi, this is Alex from DEV STORIES DOT EU. LangChain v1.0 Orchestration Framework, episode 14 of 20. Users do not mind waiting ten seconds for a complex answer, as long as you show them what the brain is doing during those ten seconds. A blank screen feels like a broken application. The solution to perceived latency is Real-Time Agent Feedback. To fix the blank screen, LangChain exposes a parameter called stream mode when you execute your agent or graph. It controls exactly what kind of data the agent sends back over the connection while it is running. The first mode you need to know is the messages mode. This handles the classic typing effect. It streams the raw tokens from the language model as they are generated. If the model is writing a paragraph, your application receives the text chunks one by one, allowing your user interface to update smoothly instead of waiting for the entire block of text. People often confuse streaming final answer tokens with streaming intermediate reasoning. They are entirely different. If your agent decides to call a search tool, token generation stops. The language model is waiting for the tool to finish. If that tool takes five seconds to run, your user interface freezes for five seconds. The messages mode alone does not tell the user what the agent is actually doing in the background. It only shows what the language model is saying. To solve the silent tool problem, you use the custom mode. Custom mode allows your tools and internal nodes to emit their own real-time status updates directly into the stream. To implement this, you use a LangChain utility called get stream writer. You call this function inside your tool code. It gives you a writer object, which you can use to emit custom events back to the client at any point during the tool's execution. Think of a slow weather-checking tool. Your agent receives a prompt asking for the forecast and decides to call the weather tool. Inside the Python function for that tool, you grab the stream writer. As the tool starts querying a slow remote API, you use the writer to emit a custom event with a status like Acquiring data. Your frontend receives this custom event immediately and displays a loading spinner with that text. The user knows the agent is working. Once the remote API returns the data, the tool finishes and the language model resumes control. It takes the raw weather data, formulates a human-friendly response, and the messages stream kicks back in, typing out the final forecast on the screen. Here is the key insight. You do not have to choose just one mode. You can pass a list containing both messages and custom to the stream mode parameter. LangChain will automatically interleave the language model tokens and your custom tool logs into a single continuous feed. Your frontend just checks the event type as it arrives. If it is a custom event, you update the status indicator. If it is a message event, you append the token to the chat bubble. Perceived latency drops to zero because the system is always talking to the user. If you want to help keep these episodes coming, you can support the show by searching for DevStoriesEU on Patreon. Thanks for listening, happy coding everyone!
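A sketch of the two stream modes working together; get_stream_writer comes from LangGraph, and the weather lookup is a stub standing in for the slow remote API.

```python
from langchain.agents import create_agent
from langchain_core.tools import tool
from langgraph.config import get_stream_writer


@tool
def get_weather(city: str) -> str:
    """Fetch the forecast for a city from a slow remote API."""
    writer = get_stream_writer()
    writer({"status": f"Acquiring data for {city}..."})  # custom event shown in the UI
    forecast = "18 degrees and cloudy"                    # placeholder for the slow call
    writer({"status": "Forecast received"})
    return forecast


agent = create_agent("openai:gpt-5", tools=[get_weather])

# Interleave model tokens ("messages") with tool status updates ("custom").
for mode, chunk in agent.stream(
    {"messages": [{"role": "user", "content": "What is the forecast for Oslo?"}]},
    stream_mode=["messages", "custom"],
):
    if mode == "custom":
        print("STATUS:", chunk)
    else:
        print("TOKEN:", chunk)
```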
15

Cross-Session Persistence

3m 13s

We explore long-term memory to build agents that truly know their users. You will learn how to use LangGraph stores to save JSON documents across entirely different conversations.

Hi, this is Alex from DEV STORIES DOT EU. LangChain v1.0 Orchestration Framework, episode 15 of 20. To build a truly personalized assistant, it needs to remember that you prefer brief answers, even if you told it that three weeks ago in a completely different chat. If you rely solely on standard conversation memory, that preference vanishes the moment you start a new thread. The mechanism that prevents this amnesia is Cross-Session Persistence using the Store paradigm. Many developers confuse the checkpointer with the store. Here is the difference. A checkpointer manages short-term state. It remembers a single conversation thread. When the user creates a new chat, the checkpointer starts a blank slate. The Store crosses those thread boundaries. It allows your agents to persist and retrieve information globally, across all interactions with a specific user. At its core, long-term memory in LangChain is just a hierarchical key-value store. It persists JSON documents. The hierarchy relies on namespaces. A namespace is a sequence of strings that acts exactly like a folder path on your computer. If you want to store profile data, you might use a namespace containing the string "users", followed by the user identifier. Inside that namespace, you store items. Every item requires a unique string key and a dictionary representing the JSON value. This is where it gets interesting. Tools interact with this store directly through the runtime context. You never pass the store through your graph state manually. Consider a custom tool named save user info. Its job is to capture a spoken language preference. During setup, you initialize your application with a store backing, such as an in-memory store for local testing. Inside your tool logic, you access the store instance directly from the injected runtime configuration. You extract the user identifier from the current context. Then, you call the put method on the store. You provide the namespace tuple containing the word "users" and the user ID. You define a key, like "language preference", and finally pass the JSON dictionary containing the value, perhaps "Spanish". The store persists this document. Weeks later, the user starts an entirely new conversation. The thread state is empty. But because the agent has access to a retrieval tool, it can call the get method on the runtime store using that exact same namespace and key. It pulls the JSON document, reads the preference, and immediately responds in Spanish. Separating short-term conversational context from long-term factual memory keeps your application lightweight. The checkpointer does not bloat with years of user history, and the state remains clean. The store only loads the specific JSON documents the agent explicitly decides to fetch. Treating context and persistence as two entirely distinct systems is the only way to scale an agent reliably. The checkpointer holds the present, while the store holds the past. That is it for today. Thanks for listening — go build something cool.
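The namespace, key, and value mechanics look like this sketch using LangGraph's in-memory store; in a deployed agent the same store is attached at construction time and reached from inside tools through the injected runtime, as described above.

```python
from langgraph.store.memory import InMemoryStore

store = InMemoryStore()  # swap for a database-backed store in production

user_id = "user-42"
namespace = ("users", user_id)  # hierarchical namespace, like a folder path

# Persist a JSON document under a key inside the namespace.
store.put(namespace, "language_preference", {"language": "Spanish"})

# Weeks later, in an entirely different conversation thread, the document is still there.
item = store.get(namespace, "language_preference")
print(item.value["language"])  # "Spanish"
```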
16

The Multi-Agent Paradigm

We explain why single agents fail and introduce the Subagents architecture. You will learn how a main supervisor agent coordinates subagents as isolated context windows to prevent token bloat.

3m 52s

Hi, this is Alex from DEV STORIES DOT EU. LangChain v1.0 Orchestration Framework, episode 16 of 20. When your single AI agent starts to fail because of its own massive list of tools and competing instructions, throwing a larger context window at it will not fix the problem. It is time to stop building a monolithic script and start hiring a team. That is exactly where The Multi-Agent Paradigm comes in. We build multi-agent systems because single agents hit a cognitive wall. Give an agent thirty tools, five pages of system prompts, and a long conversation history, and it will lose focus. It will call the wrong tool or forget constraints. The multi-agent approach breaks this down. It enables distributed development where different teams manage different agents. It allows parallel execution. Most importantly, it enforces strict context isolation. Today we focus on a specific architecture called the Subagents pattern. This involves a main supervisor agent that delegates tasks to specialized subagents. People often confuse a supervisor with a simple router. A router is just a static function that looks at a query and sends it down one fixed path. A supervisor is an active, thinking agent. It maintains the conversation state, decides which subagents to invoke over multiple turns, and synthesizes their responses. Here is the key insight. Subagents provide perfect context isolation. When the supervisor asks a subagent to do something, the subagent boots up with a completely clean context window. It only has the specific instructions and tools it needs for its exact job. The subagent might make mistakes, call tools three times, and fill up its own scratchpad while figuring out the answer. The supervisor never sees that mess. It only receives the final cleaned-up result. This protects your main agent from context bloat and prevents hallucination. To connect the supervisor to the subagents, you wrap the subagents as tools. There are two ways to do this in LangChain. The first method is tool-per-agent. You give the supervisor a specific tool for every subagent. If you have five subagents, the supervisor has five tools. The second method is a single-dispatch tool. Here, the supervisor gets exactly one tool called something like delegate task. This tool requires two inputs: the name of the target agent and the task description. Consider a single-dispatch scenario. You have a main agent, a research agent, and a writer agent. A user asks for a complex market report. The main agent decides it needs data first. It calls the single dispatch tool, passing the research agent as the target and the market query as the payload. The research agent spins up in its own isolated context, searches the web, parses documents, and returns a summary paragraph. The main agent receives this text. Next, the main agent calls the dispatch tool again, this time targeting the writer agent, passing the research summary and formatting instructions. The writer agent drafts the final report and returns it to the main agent, who delivers it to the user. You can execute these subtasks differently depending on your needs. You can run subagents synchronously, where the supervisor waits for the research agent to finish before taking any other action. If you have independent tasks, like researching three different competitors, you can run the subagents asynchronously. The supervisor dispatches all three tasks at once, they execute in parallel, and the supervisor waits for all of them to return before moving on. 
Grouping tasks into subagents is not just about organizing your code, it is about strictly controlling what the language model is forced to hold in memory at any given moment. That is all for this one. Thanks for listening, and keep building!
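A sketch of the tool-per-agent wiring described in this episode, assuming the v1.0 create_agent constructor. The system_prompt parameter, the model strings, and the stub search tool are assumptions made for illustration.

```python
from langchain.agents import create_agent
from langchain_core.tools import tool


@tool
def web_search(query: str) -> str:
    """Search the web (stub standing in for a real search integration)."""
    return f"Search results for: {query}"


# The specialist boots with its own clean context window and a narrow toolset.
research_agent = create_agent(
    model="openai:gpt-4o-mini",
    tools=[web_search],
    system_prompt="You are a focused market-research specialist.",
)


@tool
def research(task: str) -> str:
    """Delegate a research task to the research subagent."""
    result = research_agent.invoke({"messages": [{"role": "user", "content": task}]})
    # The supervisor only sees the final answer, never the subagent's scratchpad.
    return result["messages"][-1].content


supervisor = create_agent(
    model="openai:gpt-4o",
    tools=[research],
    system_prompt="You coordinate specialist agents to answer the user.",
)

supervisor.invoke(
    {"messages": [{"role": "user", "content": "Draft a short market report on e-bikes."}]}
)
```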
17

State-Driven Agents

We explore how agents can dynamically alter their behavior. You will learn the Handoffs pattern for transferring control, and the Skills pattern for loading specialized prompts on-demand.

3m 42s

Hi, this is Alex from DEV STORIES DOT EU. LangChain v1.0 Orchestration Framework, episode 17 of 20. You do not need to load your agent's brain with every possible scenario upfront. Sticking fifty pages of instructions into a single system prompt just makes your model confused, slow, and expensive. You just need to teach it how to ask for the right manual when the time comes. This is the core mechanism behind State-Driven Agents. State-driven agents operate on a simple principle. Agent behavior changes dynamically based on the current state of the application. We handle this using two main patterns, which are Skills and Handoffs. Both patterns rely on tools to update state variables, which in turn dictate what happens next in the workflow. Let us look at the Skills pattern first. This pattern is about progressive disclosure of knowledge. Instead of giving an agent all its instructions at the start, you give it a tool. When the agent decides it needs more information to solve a problem, it calls this tool. The tool executes, but it does more than just return a string to the model. It updates a specific state variable in your application. Your orchestration layer monitors this state. When it detects the change, it dynamically injects a new set of instructions or capabilities into the agent's system prompt for the very next turn. Take a standard customer support agent. Initially, its only job is to figure out what the customer wants. A user asks about a broken product. The agent calls a tool to collect a warranty ID. The execution of this tool updates a state variable to indicate a warranty claim is active. The application reads this new state and dynamically loads a specialized refund skill into the prompt. This skill might include the specific rules for processing returns and access to a secure inventory database. The agent's capabilities evolved mid-conversation, driven entirely by a state update. Now, what if the required task is too complex for the initial agent to handle, even with new skills? That is where the Handoff pattern comes in. Handoffs also use tools to update state, but instead of loading new instructions into the current agent, the state change transfers control to an entirely different agent. Back to our scenario. The support agent collects the warranty ID, but instead of processing the refund itself, it calls a handoff tool. This tool updates a routing variable in the state, changing the active agent from the triage bot to a specialist agent designed solely for high-value returns. The orchestration layer sees this state change and directs the next step of the workflow to the specialist. This transition point is where things often break down. When handing off between agents, the new agent needs the context of the conversation. Many developers try to clean up the history by just passing the raw user messages to the new agent. Do not do this. When handing off between agents, you must include the AI message containing the actual tool call that initiated the handoff, and the resulting Tool message that acknowledges the handoff occurred. If you drop the tool call and the tool message from the message array, the conversation history breaks. The new model loses the logical chain of events. It will not know how it got there, and it will likely repeat questions the user already answered. Always pass the unbroken message history. Here is the key insight. 
The state is not just a passive memory store, but the control plane that dictates exactly what your system is capable of doing at any given millisecond. That is all for this one. Thanks for listening, and keep building!
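A sketch of the unbroken-history rule from this episode. The handoff tool name, warranty ID, and call id are hypothetical; the point is that the AI message carrying the tool call and its matching Tool message travel together to the next agent.

```python
from langchain_core.messages import AIMessage, HumanMessage, ToolMessage

# History handed to the specialist must keep the handoff tool-call pair intact.
history = [
    HumanMessage("My blender broke. The warranty ID is WX-1138."),
    AIMessage(
        content="",
        tool_calls=[{
            "name": "transfer_to_returns_specialist",   # hypothetical handoff tool
            "args": {"warranty_id": "WX-1138"},
            "id": "call_001",
        }],
    ),
    ToolMessage(
        content="Handoff accepted: returns specialist is now active.",
        tool_call_id="call_001",                        # must match the id on the AI message
    ),
]

# specialist_agent.invoke({"messages": history})
# Dropping the AIMessage/ToolMessage pair breaks the logical chain, and the
# specialist will re-ask questions the user already answered.
```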
18

Custom Workflows and Routers

We step outside the standard agent loop. You will learn how to use LangGraph to build custom routing architectures, mixing deterministic logic with non-deterministic agentic reasoning.

3m 41s

Hi, this is Alex from DEV STORIES DOT EU. LangChain v1.0 Orchestration Framework, episode 18 of 20. Sometimes you do not want an AI agent to freely decide what to do next. You just want it to execute a strict, deterministic flowchart. Custom Workflows and Routers give you exactly that level of control. When you rely solely on a standard agent loop, you are trusting the language model to figure out every step on its own. It might search a database, realize it needs more data, search again, and eventually answer. This is powerful, but it is unpredictable and often slow. Custom workflows in LangGraph let you break out of that loop. You get to draw the map. You can seamlessly mix deterministic logic, like executing exact data retrieval scripts, with non-deterministic agent reasoning. You put the language model inside a strict sequence of events. Before we build one, we need to clear up a common mix-up between a router and a supervisor. A supervisor actively orchestrates a multi-turn conversation. It watches the agents talk, decides who speaks next, and manages the dialogue over time. A router is not that. A router is just a classification step. It looks at the input, decides which path the workflow should take, routes the data, and its job is done. It can be stateless or stateful, but it is not a conversational manager. Let us look at a concrete scenario. You are building a multi-source knowledge base tool. A user asks a question, and the answer might be buried in GitHub pull requests, Slack threads, or both. You do not want a single agent blindly guessing where to look. You want a structured workflow. First, you create a routing node. You pass the user query to a language model and ask it to output a simple list of destinations. If the query is about a recent bug fix, the model might output the words GitHub and Slack. This is the part that matters. You do not have to pick just one path. You can run multiple agents at the exact same time using the Send API. In LangGraph, instead of returning a single next step from your conditional logic, your routing function returns a list of Send commands. Each command pairs a destination node with the specific data it needs. The graph sees multiple Send commands and automatically executes all those target nodes in parallel. This is called fanning out. While fanning out, the workflow hits your agent nodes. In a custom workflow, invoking an agent is straightforward. An agent is simply a runnable process executed inside a standard node function. The Slack node receives the query, runs a dedicated Slack agent to search channels, extracts the context, and returns it to the overall graph state. The GitHub node does the same thing simultaneously for code repositories. Isolating these agents inside specific nodes ensures they only do the job they were built for. Finally, all those parallel branches must converge. You fan in. You create a synthesizer node that waits for the parallel agents to finish. It reads the overall graph state, takes the context gathered from Slack and the context gathered from GitHub, hands them both to a final language model, and generates a single, clean answer for the user. The real power of custom workflows is wrapping the unpredictable nature of large language models inside the predictable reliability of standard software routing. That is your lot for this one. Catch you next time!
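A compact sketch of the fan-out and fan-in routing described here, assuming LangGraph's Send API. The node names and the hard-coded destination list stand in for a real LLM-backed routing step.

```python
import operator
from typing import Annotated, TypedDict

from langgraph.graph import StateGraph, START, END
from langgraph.types import Send


class State(TypedDict):
    query: str
    contexts: Annotated[list[str], operator.add]   # parallel branches append here (fan-in)
    answer: str


def route(state: State):
    # A real router would ask an LLM which sources apply; here we hard-code both.
    return [Send(dest, {"query": state["query"]}) for dest in ("github", "slack")]


def github_node(payload: dict):
    return {"contexts": [f"GitHub context for: {payload['query']}"]}


def slack_node(payload: dict):
    return {"contexts": [f"Slack context for: {payload['query']}"]}


def synthesize(state: State):
    # In practice a final LLM call would merge the gathered context into one answer.
    return {"answer": " | ".join(state["contexts"])}


builder = StateGraph(State)
builder.add_node("github", github_node)
builder.add_node("slack", slack_node)
builder.add_node("synthesize", synthesize)
builder.add_conditional_edges(START, route, ["github", "slack"])   # fan out
builder.add_edge("github", "synthesize")                           # fan in
builder.add_edge("slack", "synthesize")
builder.add_edge("synthesize", END)

graph = builder.compile()
print(graph.invoke({"query": "recent bug fix", "contexts": []}))
```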
19

Agent-to-Agent Communication

We explore the LangSmith A2A endpoint. You will learn how distributed agents deployed on entirely different servers can converse natively using Google's A2A RPC protocol.

4m 12s

Hi, this is Alex from DEV STORIES DOT EU. LangChain v1.0 Orchestration Framework, episode 19 of 20. What happens when an agent built in Python needs to natively converse with an agent built by a completely different team, running on a completely different server? If you rely on hardcoded internal function calls, your system breaks the moment it crosses a network boundary. The solution is Agent-to-Agent Communication. Agent-to-Agent, or A2A, is a communication protocol that enables truly distributed multi-agent systems. It allows agents hosted on entirely different servers to maintain a continuous conversation without needing to share the same underlying codebase or local memory space. Instead of wrapping everything in a single massive application, you route requests over the network. The communication strictly relies on a defined endpoint format: forward slash a2a forward slash followed by the assistant ID. Every agent participating in this distributed network exposes this exact endpoint path. When one agent needs help from another, it sends an HTTP POST request there. The payload sent to this endpoint is structured as a standard JSON-RPC message. To keep the conversation coherent across multiple network hops and different servers, the protocol uses two distinct identifiers in its payload. Developers sometimes confuse these two, so we will define their boundaries. The first is the Context ID. The Context ID is responsible for overall thread continuity. It represents the entire overarching conversation history from the first prompt to the final output. The second is the Task ID. The Task ID identifies the specific request or step within that single turn. Context ID spans the whole session. Task ID changes every time one agent asks the other to perform a new action. Consider a practical scenario where Agent A is running on a server listening at port 2024, and Agent B is running on a different server at port 2025. Agent A realizes it needs Agent B to handle a specific subtask, perhaps checking external inventory. Agent A prepares a JSON-RPC message. Inside this message, it includes the existing Context ID so Agent B knows which ongoing conversation this belongs to. Agent A also generates a brand new Task ID for this specific inventory request. Agent A sends this payload to the A2A endpoint on port 2025, inserting Agent B's specific assistant ID right into the URL path. Agent B receives the request. It reads the Context ID to recall any necessary background state, processes the task requested in the JSON-RPC parameters, and computes the result. Agent B then constructs a JSON-RPC response. This response explicitly includes the exact Task ID that Agent A originally provided. Agent B sends this response back to Agent A on port 2024. Agent A receives the result, matches the Task ID to its pending request, and continues its own internal execution. Here is the key insight. Because the protocol enforces standard JSON-RPC and isolates state tracking into specific Context and Task identifiers, neither agent needs to know how the other operates internally. They do not maintain a constant, open socket connection. They simply take turns passing structured messages back and forth across standard HTTP boundaries. One server asks a question, the other answers, and the overarching task progresses. When you separate the long-term conversation thread from the individual short-term task execution, you can scale multi-agent networks across different servers and frameworks indefinitely. 
If you find these episodes helpful and want to help support the show, you can search for DevStoriesEU on Patreon. That is all for this one. Thanks for listening, and keep building!
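A rough sketch of the request Agent A might send, using httpx. The method name, field casing, and message shape are assumptions about the A2A JSON-RPC payload, so check the protocol specification before relying on them; the assistant id, port, and SKU are placeholders.

```python
import uuid

import httpx

ASSISTANT_ID = "inventory-agent"                        # Agent B's assistant id (hypothetical)
A2A_URL = f"http://localhost:2025/a2a/{ASSISTANT_ID}"   # the /a2a/<assistant_id> endpoint

payload = {
    "jsonrpc": "2.0",
    "id": str(uuid.uuid4()),
    "method": "message/send",                           # assumed A2A method name
    "params": {
        "contextId": "ctx-42",                          # spans the whole conversation
        "taskId": str(uuid.uuid4()),                    # fresh id for this specific request
        "message": {
            "role": "agent",
            "parts": [{"text": "Check external inventory for SKU 8841."}],
        },
    },
}

response = httpx.post(A2A_URL, json=payload, timeout=30.0)
result = response.json()
# Agent B echoes back the same taskId, so Agent A can match the reply
# to its pending request and resume its own execution.
```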
20

The Future is MCP

We look forward with the Model Context Protocol, standardizing how agents access external tools. You will learn how to connect remote MCP servers to your agent using standard transports.

4m 18s

Hi, this is Alex from DEV STORIES DOT EU. LangChain v1.0 Orchestration Framework, episode 20 of 20. Every time you want your agent to talk to a new database or API, you end up writing a custom wrapper. Your codebase fills up with brittle integrations that break whenever an external API changes. This limits how fast you can scale your applications. The solution to this integration bottleneck is the Model Context Protocol, or MCP. Think of MCP as USB-C for AI agents. It standardizes how tools and context are exposed to large language models. Before this protocol, if you wanted your agent to query a database and check a weather service, you had to write specific Python functions for both, define their input schemas manually, and bind them to your model. With MCP, the external service itself provides a standardized interface. Your agent simply plugs into it and instantly understands what tools are available, what arguments they require, and how to execute them. A common misunderstanding is that using a remote MCP server means your agent logic is moving over the network. Here is the key insight. Your agent remains completely local. The remote server does not run your agent or control its reasoning. It merely exposes a list of standardized JSON schemas representing the tools it supports. Your local agent reads those schemas, decides which tool to use based on the user prompt, and sends an execution request back to the server. The execution happens there, and the raw result is returned to your local agent. In LangChain, you manage these connections using the MultiServerMCPClient. This component acts as a central hub. It allows a single agent to connect to multiple different MCP servers simultaneously, gathering tools from all of them. The client handles the underlying communication using different transport layers. The two primary transports you will configure are standard input and output, referred to as stdio, and HTTP. Let us walk through a concrete scenario. You are building an agent that needs to perform complex calculations using a local Python script, while also fetching live weather data from a remote service. Instead of writing custom tool wrappers for these tasks, you configure the MultiServerMCPClient to handle both. First, you define your local math server using the stdio transport. You configure the client with the command to run, such as your system Python executable, and the path to your math script. When the client initializes, it spins up this script as a local background process. The LangChain client and the script pass messages back and forth directly through standard input and standard output streams. Next, you define the weather server using the HTTP transport. For this, you just provide the endpoint URL of the remote weather service. This setup typically relies on Server-Sent Events to maintain a persistent connection, allowing the agent to request actions and stream responses over the web. Once both transports are defined, you initialize the MultiServerMCPClient. The client immediately reaches out to the local math process via stdio and the remote weather URL via HTTP. It asks both servers to hand over their tool definitions. It collects the schemas, merges them into one continuous list, and provides them to your LangChain agent. From the perspective of the agent, it just sees a unified list of available tools. It is completely unaware that one tool executes in a local binary process and the other triggers an HTTP request to a server across the world. 
The shift toward standardized protocols means you can spend your time building better agent logic instead of maintaining endless API wrappers. Since this is the final episode of the series, I highly encourage you to read the official LangChain documentation and try setting up a local MCP server hands-on. If you have suggestions for topics you want to see in our next series, visit devstories dot eu and drop us a message. The true power of an agent lies not in what it knows, but in what it can seamlessly connect to. That is all for this one. Thanks for listening, and keep building!
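A sketch of the two-transport configuration walked through above, assuming the langchain-mcp-adapters package alongside the v1.0 create_agent constructor. The script path, endpoint URL, model string, and transport keywords are illustrative and may differ from your installed versions.

```python
import asyncio

from langchain.agents import create_agent
from langchain_mcp_adapters.client import MultiServerMCPClient


async def main():
    client = MultiServerMCPClient(
        {
            "math": {
                "transport": "stdio",                  # spawn a local background process
                "command": "python",
                "args": ["/path/to/math_server.py"],   # illustrative script path
            },
            "weather": {
                "transport": "streamable_http",        # remote server reached over HTTP
                "url": "http://localhost:8000/mcp",    # illustrative endpoint
            },
        }
    )

    # Collect tool schemas from every configured server into one flat list.
    tools = await client.get_tools()

    # The agent sees a unified toolset; it never knows which transport backs each tool.
    agent = create_agent(model="openai:gpt-4o-mini", tools=tools)
    result = await agent.ainvoke(
        {"messages": [{"role": "user", "content": "What is (3 + 5) * 12, and the weather in Oslo?"}]}
    )
    print(result["messages"][-1].content)


asyncio.run(main())
```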