Season 52 · 14 Episodes · 55 min · 2026

Prompt Flow: The Complete Guide

v1.13 — 2026 Edition. A comprehensive guide to Prompt Flow v1.13, a suite of development tools designed to streamline the end-to-end development cycle of LLM-based AI applications. Learn how to design, test, trace, evaluate, and deploy your AI apps.

LLM Orchestration · Prompt Engineering · AI/ML Frameworks
1
The Philosophy of Prompt Flow
This episode covers the core design principles behind Prompt Flow and why it prioritizes prompt visibility. Listeners will learn the difference between hiding prompts inside frameworks and exposing them for continuous experimentation and tuning.
3m 45s
2
Flows and the DAG Architecture
This episode covers the high-level mental model of treating LLM applications as Directed Acyclic Graphs (DAGs). Listeners will learn the difference between Flex flows and DAG flows, and how Standard, Chat, and Evaluation flows serve different purposes.
4m 00s
3
The Building Blocks: Tools
This episode covers Tools, the fundamental executable units in Prompt Flow. Listeners will learn how to leverage the three core built-in tools: LLM, Python, and Prompt.
4m 40s
4
Managing Secrets with Connections
This episode covers how Connections securely manage credentials for external services across local and cloud environments. Listeners will learn why hardcoding API keys is dangerous and how Prompt Flow isolates secrets.
4m 49s
5
The Prompty Specification
This episode covers the anatomy of a .prompty file, including its YAML front matter and Jinja template. Listeners will learn how to standardize prompt management into a single, version-controllable markdown asset.
5m 04s
6
Dynamic Prompty Execution
This episode covers how to execute Prompty files dynamically in Python. Listeners will learn how to override model configurations at runtime and test Prompty files via the CLI.
3m 16s
7
Flex Flows: Function-Based Development
This episode covers how to encapsulate LLM application logic using pure Python functions. Listeners will learn how to leverage the @trace decorator for minimal-friction entry points into Flex flows.
3m 41s
8
Flex Flows: Class-Based Development
This episode covers managing state and lifecycle using Python classes in Flex Flows. Listeners will learn how to build complex conversational agents that maintain connections and history.
3m 53s
9
DAG Flows: Building from YAML
This episode covers defining logic explicitly using flow.dag.yaml files. Listeners will learn how to connect functions and tools via input/output dependencies and utilize visual editors.
3m 52s
10
Tracing LLM Interactions
This episode covers tracking and debugging LLM calls using the promptflow-tracing package. Listeners will learn how to implement OpenTelemetry specification tracing to get deep visibility into execution latency and inputs.
3m 29s
11
Advanced Tracing: LangChain and AutoGen
This episode covers how Prompt Flow tracing integrates with third-party orchestration libraries. Listeners will learn how to gain execution visibility into LangChain and AutoGen scripts without a massive rewrite.
3m 25s
12
Scaling Up: Batch Runs with Data
This episode covers running flows against large datasets using JSONL files. Listeners will learn how to map inputs to data columns and execute batch processes to validate their prompts against edge cases.
4m 16s
13
The Evaluation Paradigm
This episode covers using evaluation flows to compute metrics on the outputs of a batch run. Listeners will learn how to transition from traditional unit testing to statistical grading of stochastic LLM responses.
3m 39s
14
Taking Flows to Production
This final episode covers the myriad deployment options available for a completed flow. Listeners will learn how a flow serves as a production-ready artifact that can be deployed to Docker, Kubernetes, or App Services.
3m 51s

Episodes

1

The Philosophy of Prompt Flow

3m 45s

This episode covers the core design principles behind Prompt Flow and why it prioritizes prompt visibility. Listeners will learn the difference between hiding prompts inside frameworks and exposing them for continuous experimentation and tuning.

Hi, this is Alex from DEV STORIES DOT EU. Prompt Flow: The Complete Guide, episode 1 of 14. Most AI libraries try to abstract complexity away by burying your prompts deep inside wrapper functions. But when you move to production, those prompts are exactly the things you need to control. This is the core philosophy of Prompt Flow. Let us clear up a common misconception first. Prompt Flow is not a framework like LangChain. LangChain is a development framework that provides pre-built chains and agents, which often encapsulate the underlying prompts. Prompt Flow is a suite of tools designed for experimentation and evaluation. It exists because the traditional software engineering principle of encapsulation actually becomes a liability when you are building applications with large language models. In standard programming, you hide complex logic behind a clean function interface. You do not need to know how the function works internally, you just care about what it returns. But prompts are highly volatile. They are not static logic. If you change your model from one version to another, a prompt that performed perfectly yesterday might fail today. If you are using an opaque third-party library to summarize documents, and the underlying prompt is locked inside that library, you cannot fix a poor summary. You are entirely dependent on the library maintainers. Prompt Flow flips this design pattern by exposing the prompts. It treats them as first-class developer assets. You need to view, tune, and version them continuously. Instead of a black box, you get a transparent toolchain where you control the exact text and variables going into the language model. This is the part that matters. Because prompts are volatile, building AI applications requires a fundamentally new way of working. In standard software, you write unit tests. You assert that a specific input produces a specific output. Language models are probabilistic, meaning they do not provide deterministic answers. You cannot write a simple assertion to check if a generated email is polite or if a summary is accurate. Instead, you must adopt an evaluation-centric workflow. You have to run your prompt over hundreds of diverse examples and measure metrics like relevance or formatting accuracy. Prompt Flow is built directly around this workflow. It integrates prompt tuning with mass evaluation. When you change a single word in your prompt, the tooling helps you see statistically if that change improved your success rate across your dataset or degraded it. The final pillar of this philosophy is optimizing for visibility. AI applications are rarely just one API call. They are complex execution graphs. You might take a user question, query a vector database, format the retrieved data, inject it into a prompt, and then call the model. When the final answer is wrong, you need to know exactly where the chain broke. Prompt Flow makes this execution graph visible. You can inspect every node to see the precise inputs and outputs at that exact step in the process, making debugging straightforward. The single most important takeaway is that prompts are living, volatile variables that demand constant observation, not static code you can write once and hide away. If you find these episodes helpful and want to support the show, you can search for DevStoriesEU on Patreon. That is all for this one. Thanks for listening, and keep building!
2

Flows and the DAG Architecture

4m 00s

This episode covers the high-level mental model of treating LLM applications as Directed Acyclic Graphs (DAGs). Listeners will learn the difference between Flex flows and DAG flows, and how Standard, Chat, and Evaluation flows serve different purposes.

Hi, this is Alex from DEV STORIES DOT EU. Prompt Flow: The Complete Guide, episode 2 of 14. Before writing a single line of code, you need to stop thinking of LLM applications as monolithic scripts. If you try to write them linearly, tracking dependencies and debugging state across multiple external services quickly becomes unmanageable. The structural fix for this is seeing your application as a graph of independent function calls, a concept known in Prompt Flow as Flows and the DAG architecture. A flow is simply an executable workflow. At its core, an LLM application is an orchestrated sequence of external calls glued together by logic. You might call a search engine, query a database, run a Python script to format the retrieved data, and finally send a prompt to an LLM. Prompt Flow models this sequence as a Directed Acyclic Graph, or DAG. In this graph, every discrete step of your application is a node, and the connections between those nodes represent the flow of data. It is directed because data moves forward from one function to the next, and it is acyclic because the data path does not loop back on itself. Consider a simple application that answers questions based on internal company data. The user asks a question, which serves as your initial input. That input flows into the first node, a Python function that executes a database query. The database returns a block of text. That text, along with the original user question, flows into the next node, which makes the actual call to the LLM. The LLM generates an answer, which becomes the final output of the entire graph. By structuring the app this way, each function is strictly isolated. You know exactly what went into the database call and exactly what came out before the LLM was even triggered. When building these workflows, developers often encounter two terms and wonder which one is superior: Flex flow and DAG flow. This is where it gets interesting. They both achieve the exact same result. They just offer different developer experiences. Flex flow is a code-first approach. You encapsulate your logic inside a standard Python function or class, designate it as the entry point, and write raw code. Prompt Flow simply runs it. DAG flow, on the other hand, defines the routing using a YAML file. By explicitly listing functions as nodes and linking their inputs and outputs in YAML, you allow the platform to render a visual representation of your application. DAG flows are highly UI-friendly, making it easy to inspect the architecture at a glance. If you choose the DAG flow approach, you will work with three specific types of flows. The first is the Standard flow. This is your general-purpose pipeline where you connect tools, Python code, and models to build typical applications. The second is the Chat flow. This builds directly on the Standard flow but is specifically tailored for conversational applications. It adds native support for managing chat history and configures the necessary chat inputs and outputs automatically. The third type is the Evaluation flow. You do not use this flow to serve end users. Instead, you run an Evaluation flow against the outputs of your Standard or Chat flows. It acts as a testing mechanism to calculate metrics like factual accuracy or relevance based on the data your main flow produced. Whether you define your logic in pure Python or wire it up visually with YAML, your LLM application is ultimately just a pipeline routing text between external systems. Master the graph, and you control the application. 
That is all for this one. Thanks for listening, and keep building!
3

The Building Blocks: Tools

4m 40s

This episode covers Tools, the fundamental executable units in Prompt Flow. Listeners will learn how to leverage the three core built-in tools: LLM, Python, and Prompt.

Hi, this is Alex from DEV STORIES DOT EU. Prompt Flow: The Complete Guide, episode 3 of 14. You have designed the perfect workflow for your AI application, but a blueprint alone does not process data. You need components that actually execute the work, like fetching URLs, formatting strings, and calling APIs. If flows are the blueprint of your application, tools are the bricks. Today, we are looking at Tools. In Prompt Flow, tools are the fundamental, executable building blocks of a flow. Every node in your graph is a tool. When your flow runs, it is simply passing data from one tool to the next. While you can extend the platform, there are three built-in tools you will use in almost every project: the Python tool, the Prompt tool, and the LLM tool. Let us walk through a practical scenario to see how they fit together. You want an application that fetches a webpage, formats the raw text, and generates a summary. First, you need to get the webpage content. You use the Python tool. This tool lets you write custom Python scripts and acts as your bridge to the outside world. You write a short script that takes a URL as an input, makes an HTTP request, and returns the raw text of the page. The Python tool handles the execution and passes that raw text down the line as an output. Next, you need to prepare the instructions for the language model. You use the Prompt tool. This tool takes text inputs, like the raw text from your Python tool and a system prompt defining the AI persona, and formats them into a single, clean string. This is the part that matters. The Prompt tool does not call an AI model. It strictly prepares and formats text. Separating this step makes your flow much easier to read and test, especially when dealing with complex, multi-part prompts. Finally, you send that prepared string to the model using the LLM tool. This tool handles the actual connection to a Large Language Model endpoint. You pass it the formatted string from your Prompt tool, configure the model parameters like temperature, and it returns the generated summary. The LLM tool does the heavy lifting of formatting the API payload and interacting with the provider. While those three cover most core use cases, they are not your only options. Prompt Flow supports partner tools provided by third parties. A common example is the Vector DB Lookup tool, which searches vector databases for similar text based on embeddings. You can also install custom packages or build your own tools for highly specific integrations. Regardless of their origin, they all operate on the exact same principle: they take inputs, execute a specific function, and return an output. The most important thing to remember is the strict separation of concerns. Do not write Python code to format prompts, and do not use the LLM tool to concatenate strings. Using the right built-in tool for its exact purpose keeps your graphs clean and debuggable. Thanks for hanging out. Hope you picked up something new.
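For reference, here is a minimal sketch of the fetch step Alex describes, written as a Python tool. The decorator import and the use of the requests package are assumptions about your environment, not something spelled out in the episode.

```python
# Hypothetical Python tool for the "fetch the webpage" step; assumes the
# `requests` package is installed. In older promptflow versions the decorator
# is imported as `from promptflow import tool`.
import requests
from promptflow.core import tool


@tool
def fetch_text(url: str) -> str:
    """Fetch a webpage and return its raw text for the downstream Prompt node."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text
```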
4

Managing Secrets with Connections

4m 49s

This episode covers how Connections securely manage credentials for external services across local and cloud environments. Listeners will learn why hardcoding API keys is dangerous and how Prompt Flow isolates secrets.

Hi, this is Alex from DEV STORIES DOT EU. Prompt Flow: The Complete Guide, episode 4 of 14. Nothing ruins a production deployment faster than an API key accidentally committed to version control. Even if you avoid that disaster, juggling credentials across local development and cloud environments often leads to tangled configuration files and security risks. Managing secrets with connections solves this by completely isolating your sensitive data from your flow logic. A connection in Prompt Flow is a dedicated resource that stores the endpoints and credentials required to interact with external services. If your flow needs to call an external language model, search the web using an external service, or query a remote database, it needs authorization. Instead of writing API keys directly into your Python scripts or configuration files, you create a connection. Your flow then references that connection by its name. By using connections, you decouple your secret data from your execution logic. Your code only knows that it requires a connection named, for example, main_language_model. It does not know the actual API key. Let us look at how this works when moving a project from your laptop to the cloud. When you develop locally, your connections are stored on your local disk. To maintain security, Prompt Flow encrypts the secret values using a local encryption key. You can build and test your flow using this local setup without leaving plaintext keys exposed in your working directory. When you are ready to deploy this flow to Azure AI, the environment changes, but your code does not. In Azure AI, connections are securely backed by Azure Key Vault. The secrets are stored and managed within the Key Vault infrastructure, protected by strict access policies. This is the part that matters. Because your flow only references the connection by its name, transitioning from a local environment to the cloud requires zero changes to your flow logic. You simply ensure a connection with the identical name exists in your Azure workspace. When the flow runs in the cloud, it asks for main_language_model. The system seamlessly intercepts that request and provides the Key Vault-backed credentials instead of the local encrypted ones. Your code remains clean and environment-agnostic. Prompt Flow categorizes these connections into two main types. The first is strongly typed connections. These are built-in templates for widely used services like Azure OpenAI. They provide predefined fields for the endpoint URL, the API key, and the API type. The system knows exactly how to handle these fields, which keeps standard tool configuration straightforward. The second type is the custom connection. When you need to integrate an internal company API or a third-party service that lacks a built-in template, you use a custom connection. This acts as a flexible dictionary where you define your own key-value pairs. You can explicitly mark specific keys as secrets. Once marked, those custom secrets receive the exact same local encryption and Key Vault protection as the built-in connections. The defining value of connections is that they act as a strict abstraction layer for authentication, ensuring your flow remains completely portable and your secrets remain secure across any execution environment. That is all for this one. Thanks for listening, and keep building!
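A rough sketch of creating the two kinds of connections mentioned above with the local promptflow SDK. The entity and method names follow the promptflow-devkit API as I understand it, and the endpoint and key values are placeholders.

```python
# Sketch only: create one strongly typed and one custom connection locally.
# Class and method names are from the promptflow SDK; values are placeholders.
from promptflow.client import PFClient
from promptflow.entities import AzureOpenAIConnection, CustomConnection

pf = PFClient()

# Strongly typed connection for an Azure OpenAI endpoint.
aoai = AzureOpenAIConnection(
    name="main_language_model",
    api_key="<your-api-key>",  # encrypted on disk locally, Key Vault-backed in Azure AI
    api_base="https://<resource>.openai.azure.com/",
)
pf.connections.create_or_update(aoai)

# Custom connection for an internal API; keys under `secrets` are encrypted,
# keys under `configs` are stored in plain text.
internal = CustomConnection(
    name="internal_reporting_api",
    secrets={"api_key": "<internal-key>"},
    configs={"endpoint": "https://reports.internal.example.com"},
)
pf.connections.create_or_update(internal)
```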
5

The Prompty Specification

5m 04s

This episode covers the anatomy of a .prompty file, including its YAML front matter and Jinja template. Listeners will learn how to standardize prompt management into a single, version-controllable markdown asset.

Hi, this is Alex from DEV STORIES DOT EU. Prompt Flow: The Complete Guide, episode 5 of 14. Stop burying your language model prompts inside massive Python strings. When you hardcode prompts in your application logic, tracking changes, running standalone tests, and collaborating with prompt engineers becomes an absolute nightmare. The solution is the Prompty Specification. A Prompty is a standard format for managing prompts. It moves your prompt out of your code and into a single, version-controllable markdown file with a dot prompty extension. The file is split into two distinct sections. At the top, you have YAML front matter. At the bottom, you have a Jinja-formatted prompt template. Three dashes separate the two. The YAML block acts as the control center. You can define basic metadata like the name, description, and author. More importantly, it holds the model configuration. You specify the API type, such as chat or completion. You define the configuration details, like pointing to an Azure OpenAI deployment of GPT-3.5. You also lock in the model parameters right here. If a specific prompt requires a temperature of zero point seven and a max token limit of one thousand, you declare that in the YAML. This binds the execution settings directly to the prompt text, ensuring the prompt behaves consistently no matter where it is used. The YAML section also defines inputs and sample data. If your prompt expects a dynamic variable, you list it here and provide sample values. This makes the file entirely self-contained. Anyone opening it knows exactly what data it expects without having to reverse-engineer your application code. Below the YAML and the three dashes lies the actual prompt template. This section uses Jinja2 syntax to dynamically inject the inputs you defined above. Because modern language models use chat interfaces, the template supports role designations. You define roles using a simple text format, separating system instructions from user inputs. Consider a minimal chat scenario where you want a prompt that greets a user by their first name. At the top of your dot prompty file, you write the YAML front matter. You define your model section, setting the API type to chat and pointing the configuration to a GPT-3.5 deployment. Next, you add an inputs section declaring a variable called first name. You also add a sample block where first name is set to Jane. You type three dashes to end the YAML front matter. Now you build the template. You type the word system followed by a colon, then give the model its baseline instructions, like telling it to be a helpful assistant. Below that, you type the word user followed by a colon. Finally, you write the greeting, enclosing the first name variable in double curly braces so the Jinja engine knows where to inject the text. You now have a complete, reusable asset. Treating prompts as self-contained files rather than loose strings in your code is the first step toward rigorous prompt engineering, because it forces a clear contract between the application providing the data and the language model generating the response. Thanks for spending a few minutes with me. Until next time, take it easy.
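Putting the episode's walkthrough together, the resulting .prompty file looks roughly like this. The deployment and connection names are placeholders; the field names follow the Prompty format Prompt Flow documents.

```
---
name: Greeting Prompt
description: Greets a user by their first name.
model:
  api: chat
  configuration:
    type: azure_openai
    connection: main_language_model   # placeholder connection name
    azure_deployment: gpt-35-turbo    # placeholder deployment name
  parameters:
    temperature: 0.7
    max_tokens: 1000
inputs:
  first_name:
    type: string
sample:
  first_name: Jane
---
system:
You are a helpful assistant.

user:
Say hello to {{first_name}}.
```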
6

Dynamic Prompty Execution

3m 16s

This episode covers how to execute Prompty files dynamically in Python. Listeners will learn how to override model configurations at runtime and test Prompty files via the CLI.

Hi, this is Alex from DEV STORIES DOT EU. Prompt Flow: The Complete Guide, episode 6 of 14. You build a carefully tuned prompt template for production, but when you want to test it against a cheaper model or adjust the temperature for a specific edge case, you find yourself manually editing the source file. A static template creates friction when your environment or logic needs to shift on the fly. That friction is exactly what Dynamic Prompty Execution is designed to eliminate. A Prompty asset typically defines model settings like the deployment name, API connection, and parameters within its header block. However, hardcoding these values restricts how you use the file across different environments. Dynamic execution allows you to treat the Prompty file as a flexible base layer, overriding its configurations directly in Python or via the command line at runtime. To run a Prompty file in Python, you use the load prompty function from the Prompt Flow core library. You pass the file path to this function, and it returns a callable object in memory. To execute it, you just call that object, passing your prompt variables as standard keyword arguments. The library handles the compilation and the API call, returning the final text output. This is the exact point where dynamic execution proves its worth. You can intercept the execution to override the model settings without touching the underlying file. Suppose you have a Prompty configured for a standard Azure OpenAI deployment, but for a specific batch job, you need to point it to a different Azure endpoint and bump up the temperature to get more varied responses. Instead of duplicating the file, you define a dictionary in your Python code containing your new settings. You add your alternate endpoint and your new temperature value to this dictionary. Then, when you call your loaded Prompty object, you pass this dictionary to the model keyword argument, alongside your standard prompt inputs. The Prompt Flow runtime merges your dictionary with the base file configuration. Your dynamic overrides take precedence, the prompt executes with the new settings, and the original file remains completely unchanged. This allows you to swap API keys, change the max tokens, or redirect the model target programmatically based on your application state. Sometimes, you do not want to write a Python script just to see if a prompt yields a good result. For rapid validation, you can test a Prompty file directly from your terminal. You use the pf flow test command, providing the path to your file using the source flag. You can append the inputs flag to pass your variables straight into the command as key-value pairs. The command line interface executes the Prompty and prints the model response directly to standard output. This gives you an immediate feedback loop during development without writing wrapper code. The real value of a Prompty asset is not in locking down configurations, but in isolating the prompt text from the execution environment. By injecting model overrides dynamically at runtime, a single file can seamlessly serve your local testing sandbox, your automated pipelines, and your production endpoints. That is all for this one. Thanks for listening, and keep building!
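Here is a sketch of the override pattern, assuming the greeting.prompty file from the previous episode's example and the promptflow-core API. One caveat: in the versions I have seen, the model override dictionary is supplied when loading the Prompty rather than when calling it, so that is how the sketch passes it.

```python
# Sketch: run a Prompty as-is, then run it again with runtime overrides.
# File name, endpoint, and deployment are placeholders.
from promptflow.core import AzureOpenAIModelConfiguration, Prompty

# Base execution: every setting comes from the .prompty front matter.
greet = Prompty.load(source="greeting.prompty")
print(greet(first_name="Jane"))

# Runtime override: point at a different endpoint and raise the temperature
# for more varied responses, without editing the file itself.
alt_config = AzureOpenAIModelConfiguration(
    azure_endpoint="https://<alternate-resource>.openai.azure.com/",
    azure_deployment="gpt-35-turbo",
    api_key="<alternate-key>",
)
override = {"configuration": alt_config, "parameters": {"temperature": 1.0}}
varied = Prompty.load(source="greeting.prompty", model=override)
print(varied(first_name="Jane"))
```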
7

Flex Flows: Function-Based Development

3m 41s

This episode covers how to encapsulate LLM application logic using pure Python functions. Listeners will learn how to leverage the @trace decorator for minimal-friction entry points into Flex flows.

Hi, this is Alex from DEV STORIES DOT EU. Prompt Flow: The Complete Guide, episode 7 of 14. You might think that using a specialized framework for large language models means learning a complex visual interface or maintaining massive configuration files. But if you already know how to write a Python script, you already know enough to build a fully trackable application. Function-Based Flex Flows resolve this tension entirely. In standard Prompt Flow development, you build a Directed Acyclic Graph. That structure is highly effective for strict, multi-step pipelines, but sometimes developers want to write pure Python code without adapting to a visual node system. Flex flows allow you to do exactly that. You encapsulate your LLM application logic inside standard Python functions, and the platform handles the tracking and orchestration in the background. Consider the function-based approach. You start by writing a normal Python function. Give it a descriptive name, like chat, and define its inputs, such as taking a question as a string parameter. Inside this function, you write your logic just like you normally would. You might load a Prompty file to get your system message, initialize your language model client, pass the question to the model, and then return the text response as a string. At this stage, you simply have a standard Python script. It runs locally, it is easy to test, and it requires no special framework knowledge. To turn this pure Python script into a trackable Prompt Flow component, you import the trace decorator from the promptflow tracing package. You place this decorator directly above your chat function. When you run your code, this decorator tells the system to monitor the execution silently. It automatically records the inputs passed to the function, the text output returned, the exact execution time, and any internal errors. If you apply the trace decorator to other helper functions within your script, the system builds a complete call tree. You get the full observability of a visual flow, including the ability to view the execution trace in the local user interface, without changing how you structure your logic. Now, the tooling needs a way to know that this specific function is the entry point of your application. You provide this by creating a single, very short configuration file called flow.flex.yaml in the same directory as your code. This file does not define a complex routing graph. It only needs one critical piece of information, which is the entry mapping. You write the word entry followed by the name of your Python module, a colon, and the name of your function. If your file is named app.py and your function is chat, your entry value is simply app colon chat. When you test or run this flow using the Prompt Flow command line tool or the VS Code extension, the system reads that yaml file. It looks up the chat function in your app module, injects the provided inputs, runs your pure Python code, and collects the traces generated by the decorator. The real power of function-based flex flows is that they remove the friction between prototyping a script and deploying a production application; your pure Python logic remains entirely under your control, while a single decorator and a two-line configuration file unlock enterprise-grade observability. That is all for this one. Thanks for listening, and keep building!
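A minimal function-based flex flow along the lines described, with illustrative file and function names and a placeholder standing in for the real model call.

```python
# app.py — hypothetical function-based flex flow entry point.
from promptflow.tracing import trace


@trace
def chat(question: str) -> str:
    """Entry point: inputs, outputs, latency, and errors are recorded by the tracer."""
    # In a real flow you would load a Prompty and call your model client here.
    return f"You asked: {question}"  # placeholder for the model response
```

The companion flow.flex.yaml then needs only the entry mapping, `entry: app:chat`.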
8

Flex Flows: Class-Based Development

3m 53s

This episode covers managing state and lifecycle using Python classes in Flex Flows. Listeners will learn how to build complex conversational agents that maintain connections and history.

Hi, this is Alex from DEV STORIES DOT EU. Prompt Flow: The Complete Guide, episode 8 of 14. You deploy a new language model app, but every time a user sends a message, it takes precious seconds just to establish the database connection and load the client credentials. When your application needs to maintain a persistent connection or remember conversation history, a standalone script fails because it starts completely from scratch on every run. The answer to this problem is Flex Flows: Class-Based Development. When you build applications that scale, state management becomes a primary concern. If your flow relies on an external resource, like an Azure OpenAI client, initializing that client requires reading secrets, verifying endpoints, and allocating memory. If you put that logic inside a basic execution sequence, you pay that heavy startup cost every single time a request comes in. Using a Python class as the entry point for your Flex Flow allows you to fundamentally split your initialization logic away from your execution logic. A class-based flow relies on two standard Python methods to manage this lifecycle. The first is the constructor, or the init method. This is your setup phase. Prompt Flow runs this method exactly once when the flow is first loaded into memory. This is where you do all the heavy lifting. The second method is the call method. This is your execution phase, and it runs every time the flow is triggered by a user request. Picture a chat flow class. You define your init method to accept an Azure OpenAI connection object and a system prompt string. Inside the init method, you create your Azure OpenAI client and store it as a property on the class instance itself. The client is now ready and waiting. Next, you define your call method. This method accepts a new user question and a list of past messages representing the chat history. Because the client is already fully initialized, the call method immediately formats the prompt, sends the chat history to the language model, and returns the response. The execution is fast because the expensive client setup was completely bypassed. To make this work, Prompt Flow needs to know how to instantiate your class. You configure this by creating a YAML file for your flow. The entry point in this configuration file uses a specific format, naming the Python module followed by a colon and the name of your class. This is the part that matters. Your YAML configuration does not just point to the class, it defines the parameters required by your init method. If your class constructor requires an Azure connection and a model name, you declare those inputs in the YAML file under a dedicated init section. When the Prompt Flow engine starts, it reads the YAML, injects those configured parameters into your class constructor, and then keeps that instantiated object alive in memory to handle incoming calls. Testing a class-based flow locally is incredibly straightforward because it relies on standard Python behavior. You do not need a complex test harness. You just write a standard Python script, import your class, create an instance by passing in mock connections or local credentials to the constructor, and then invoke the object directly by passing a test message to it. You can debug the setup logic and the execution logic independently. 
The core takeaway is that class-based flows give you the architectural structure to separate the heavy lifting of initialization from the repetitive work of execution, keeping your state persistent and your responses fast. If you find these episodes helpful and want to support the show, you can search for DevStoriesEU on Patreon. That is all for this one. Thanks for listening, and keep building!
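A sketch of the class-based pattern described above. It assumes an Azure OpenAI connection object whose api_base and api_key fields are readable inside the flow, and that chat history arrives as a list of role/content dictionaries; the file, class, and deployment names are illustrative.

```python
# chat_flow.py — hypothetical class-based flex flow.
from openai import AzureOpenAI
from promptflow.connections import AzureOpenAIConnection
from promptflow.tracing import trace


class ChatFlow:
    def __init__(self, connection: AzureOpenAIConnection, system_prompt: str):
        # Setup phase: runs once when the flow is loaded. Build the expensive
        # client here and keep it on the instance.
        self.system_prompt = system_prompt
        self.client = AzureOpenAI(
            azure_endpoint=connection.api_base,  # assumes these fields are exposed
            api_key=connection.api_key,
            api_version="2024-02-01",
        )

    @trace
    def __call__(self, question: str, chat_history: list) -> str:
        # Execution phase: runs on every request; the client is already warm.
        messages = [{"role": "system", "content": self.system_prompt}]
        messages += chat_history  # assumed to be role/content dictionaries
        messages.append({"role": "user", "content": question})
        response = self.client.chat.completions.create(
            model="gpt-35-turbo",  # placeholder deployment name
            messages=messages,
        )
        return response.choices[0].message.content
```

The flow's YAML entry would then point at chat_flow:ChatFlow, with the constructor parameters declared under its init section as described above.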
9

DAG Flows: Building from YAML

3m 52s

This episode covers defining logic explicitly using flow.dag.yaml files. Listeners will learn how to connect functions and tools via input/output dependencies and utilize visual editors.

Hi, this is Alex from DEV STORIES DOT EU. Prompt Flow: The Complete Guide, episode 9 of 14. Sometimes staring at a wall of code is not enough to understand a complex application. When prompts, scripts, and API calls interact in dozens of ways, you need to physically see the data moving between them to spot the bottlenecks. DAG Flows are the answer for teams that want explicit architectural clarity. DAG stands for Directed Acyclic Graph. In Prompt Flow, a DAG flow is a method of building AI applications by linking different tools together as nodes. Instead of writing a single large script, you define your application structure in a file named flow dot dag dot yaml. This file acts as the master blueprint. It declares the starting inputs, the individual steps, and the final outputs of your application. Every step in a DAG flow is called a node. A node represents a specific tool performing a single task. You might have a node that runs a Python snippet, another node that formats a prompt, and a third node that calls a Large Language Model. The YAML file describes how these nodes relate to one another through input and output dependencies. This dependency mapping is what makes the flow work. You do not manually tell the system what order to execute the steps in. Instead, you specify that node B requires the output of node A. Because node B cannot start until node A finishes, an execution order is naturally formed. If you add a node C that only depends on the initial user input, Prompt Flow will recognize that it does not need to wait for node A or B. It will run node C in parallel automatically. The engine reads the yaml file, resolves the graph, and handles the orchestration. Writing and maintaining this yaml file by hand can become difficult as your application grows. That is why most developers use the Prompt flow VS Code extension. This extension reads your flow dot dag dot yaml file and renders a visual, drag-and-drop user interface. You can view your application as a literal map of connected blocks. Consider a concrete scenario. You are building an application to read financial reports and write short executive summaries. In the visual editor, you create a Python tool node and name it Extract Text. Next to it, you add an LLM tool node named Summarize. Rather than writing code to manage the state between these two operations, you use the interface. You click the output port on the Extract Text node and drag a line to the input port on the Summarize node. You have just visually wired the data path. The extension instantly updates the underlying yaml file to record that connection. You get the speed and clarity of a visual low-code builder, but you still generate a plain text configuration file that you can track in version control. This visual approach forces architectural discipline. The graph is acyclic, which means data can only flow forward. There are no infinite loops. Data comes in, passes through your sequence of tools, and exits. This strict directional flow makes debugging straightforward. If a final summary looks wrong, you can open the visual graph, click on the connection between the extraction and summarization nodes, and inspect the exact string of text that was passed across that wire. You know exactly where the data came from and where it went. The real advantage of a DAG flow is that your system architecture diagram and your executable application are the exact same thing. That is all for this one. Thanks for listening, and keep building!
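To make the episode concrete, here is roughly what the flow.dag.yaml behind the report-summarizer example might look like. Node names, file paths, and the connection name are illustrative; the structure follows the DAG flow schema.

```yaml
# flow.dag.yaml — hypothetical two-node report summarizer.
inputs:
  report_url:
    type: string
outputs:
  summary:
    type: string
    reference: ${summarize.output}
nodes:
- name: extract_text
  type: python
  source:
    type: code
    path: extract_text.py
  inputs:
    url: ${inputs.report_url}
- name: summarize
  type: llm
  source:
    type: code
    path: summarize.jinja2
  inputs:
    deployment_name: gpt-35-turbo     # placeholder deployment
    text: ${extract_text.output}      # the wire drawn in the visual editor
  connection: main_language_model     # placeholder connection name
  api: chat
```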
10

Tracing LLM Interactions

3m 29s

This episode covers tracking and debugging LLM calls using the promptflow-tracing package. Listeners will learn how to implement OpenTelemetry specification tracing to get deep visibility into execution latency and inputs.

Hi, this is Alex from DEV STORIES DOT EU. Prompt Flow: The Complete Guide, episode 10 of 14. When a language model returns a hallucinated garbage answer, how do you know if the model itself failed, or if your prompt formatting logic had a bug? You can scatter print statements throughout your application, but that becomes impossible to maintain as your application grows. Tracing LLM interactions fixes this by capturing the exact context of every execution. Tracing turns a black-box LLM application into a transparent, debuggable sequence of events. It records precisely what data goes into a function, what comes out, and how long the execution takes. In this ecosystem, you handle this with the promptflow-tracing package. The data it generates is based on the OpenTelemetry specification, meaning your execution records follow an industry-standard format for observability. To capture the baseline interactions with a model, you use a function called start trace. If you call this function at the very beginning of your Python script, Prompt Flow automatically instruments supported model clients. For example, if you are using the standard OpenAI Python package, you do not need to modify your API calls at all. The tracer quietly intercepts the interaction, logging the system message, the user prompt, the specific model parameters, and the final response string. Capturing the API call is only half the battle. The logic leading up to that call is usually where bugs hide. To trace your own application logic, you apply the trace decorator to your custom functions. Consider a function named math to code. This function takes a user describing a math problem, fetches necessary context, constructs a prompt, and finally requests Python code from GPT-4. By placing the trace decorator directly above the math to code function definition, you tell Prompt Flow to record every execution of this specific block of logic. It will log the user input string, the final returned code, and the latency of the entire operation. Because you also ran start trace at the top of your file, the system understands the relationship between your code and the model call. It builds a hierarchical record. Your math to code function becomes the parent span, and the internal OpenAI call becomes a child span nested inside it. You view this hierarchy using the local Trace UI. When you run a traced script, Prompt Flow starts a local server that you can open in your web browser. This interface gives you a visual timeline of your application flow. You can select the parent math to code span to verify the arguments it received. Then, you can select the child OpenAI span to inspect the exact raw prompt string that was successfully sent to the model over the network. If your context assembly logic missed a variable, you will see the empty space in the raw prompt immediately. The real power of tracing is not just collecting execution logs, but definitively proving what happened at the exact boundary between your application code and the external model. That is all for this one. Thanks for listening, and keep building!
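A compact sketch of the pattern described, assuming the promptflow-tracing and openai packages are installed and configured; the model name and prompt are illustrative.

```python
# Sketch: instrument a script, then trace a custom function plus its model call.
from promptflow.tracing import start_trace, trace
from openai import OpenAI

start_trace()  # call once at the start; supported clients get instrumented

client = OpenAI()  # assumes API credentials are configured in the environment


@trace
def math_to_code(question: str) -> str:
    """Parent span: build the prompt and ask the model for Python code."""
    prompt = f"Write Python code that solves: {question}"
    completion = client.chat.completions.create(  # captured as a child span
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content


if __name__ == "__main__":
    print(math_to_code("Sum the integers from 1 to 100."))
```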
11

Advanced Tracing: LangChain and AutoGen

3m 25s

This episode covers how Prompt Flow tracing integrates with third-party orchestration libraries. Listeners will learn how to gain execution visibility into LangChain and AutoGen scripts without a massive rewrite.

Hi, this is Alex from DEV STORIES DOT EU. Prompt Flow: The Complete Guide, episode 11 of 14. You just spent three months building a complex application using LangChain. The logic mostly works, but when an agent occasionally spirals into a bad loop or drops context, finding out exactly where it derailed is a nightmare, and rewriting the whole thing just to get better logs is completely out of the question. Advanced Tracing for LangChain and AutoGen is how you solve this directly. When you build with orchestration frameworks, you gain speed through abstraction. The framework hides the messy details of prompt sequencing, tool execution, and parsing. But abstraction inherently creates a black box. When your agent returns a confusing answer, you need to see its internal chain of thought. You need to know the exact sub-prompt it generated, the raw API response it received, and which specific tool call failed. Prompt Flow includes a standalone tracing capability designed exactly for this. It does not require you to use Prompt Flow for your logic or execution. You keep your existing LangChain or AutoGen code exactly as it is. You are just attaching Prompt Flow's tracking system to your external application. Think about your existing Python script running a LangChain agent. To get full visibility, you do not touch your chain definitions. You just go to the very top of your main execution file. You import the trace setup function from the prompt flow module, and you call it once before your agent starts doing its work. That single command is the entire integration. When you call that function, it instruments your environment. Because Prompt Flow tracing is built on OpenTelemetry standards, it knows how to listen for the specific events triggered by these popular frameworks. As your LangChain application runs, the instrumentation automatically intercepts the underlying API calls. It captures the inputs, execution duration, outputs, and token usage for every step. This applies seamlessly to AutoGen as well. Multi-agent setups in AutoGen are notoriously difficult to debug because multiple agents send messages back and forth autonomously. Tracking who passed what context to whom usually means digging through massive walls of terminal text. By initializing the trace at the top of your AutoGen script, every single message exchange is automatically captured and structured. Once your script finishes executing, you open the local Prompt Flow Trace UI. Instead of scrolling through console print statements, you get a visual timeline. You see a clear execution tree. You click on the top-level agent run, expand the node, and see the sequence of nested LLM calls and tool executions. If an agent hallucinated on step four, you can click directly on step four to read the exact, unformatted text that was sent to the model. You gain complete visibility into an opaque framework without having to migrate a single line of your actual business logic. The true advantage here is architectural freedom; you can orchestrate your application using whatever framework fits your team best, while keeping your debugging and observability centralized in one clean, visual interface. As always, thanks for listening. See you in the next episode.
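Here is a sketch of the single-line integration, with a deliberately tiny LangChain chain standing in for your real application. The LangChain classes and model name are illustrative assumptions, not something taken from the episode.

```python
# Sketch: attach Prompt Flow tracing to an existing LangChain script.
from promptflow.tracing import start_trace

start_trace()  # the one added line: instrument the environment before the app runs

# --- existing LangChain code stays exactly as it is (illustrative example) ---
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(model="gpt-4o-mini")  # placeholder model
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a concise assistant."),
    ("user", "{question}"),
])
chain = prompt | llm
print(chain.invoke({"question": "What is a DAG?"}).content)
```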
12

Scaling Up: Batch Runs with Data

4m 16s

This episode covers running flows against large datasets using JSONL files. Listeners will learn how to map inputs to data columns and execute batch processes to validate their prompts against edge cases.

Hi, this is Alex from DEV STORIES DOT EU. Prompt Flow: The Complete Guide, episode 12 of 14. Your prompt worked perfectly on your test question, but what happens when you run it against ten thousand real user queries? It usually breaks. Testing a prompt manually with two examples is easy, but proving it works across hundreds of edge cases requires something more robust. That is where Scaling Up: Batch Runs with Data comes in. A batch run takes your single flow and executes it across a large dataset. This moves you from building the logic to verifying it at scale. Instead of typing inputs into a user interface, you trigger this process using the command line interface. The core command is pf run create. By executing this, you are instructing Prompt Flow to spin up a new run instance, read a specific flow directory, and feed it a file containing all your test records. The engine processes these records by executing your entire flow graph independently for every single line in your dataset. The required data format for this input file is JSONL. JSONL stands for JSON Lines. If you are used to standard JSON, you might expect a single large array wrapping all your objects. JSONL drops the array. Every single line in the text file is its own valid, standalone JSON object representing exactly one test case. This format is heavily favored in machine learning and data pipelines because it is lightweight and streams easily. The engine can read it line by line without loading a massive file into memory all at once. Consider a web-classification flow. Your flow is designed to take a web address, scrape the content, and use a language model to categorize it. The flow graph expects one specific input string named url. You want to test this logic against one hundred different websites to see how it handles edge cases. You build your JSONL file. Each line contains a JSON object with a key, perhaps named web link, holding the target address. To start the batch run, you type pf run create. You specify your flow directory using the flow flag, and you point to your JSONL file using the data flag. But there is an immediate structural mismatch. Your flow logic strictly requires an input named url, while your data file provides a field named web link. If you run the command right now, it will fail. The engine does not guess which data fields belong to which flow inputs. You bridge this gap using the column mapping flag. This flag tells the execution engine exactly how to connect the keys in your JSONL file to the specific input nodes of your flow. In your command line arguments, you write the flow input name, an equals sign, and then a reference to the data column. Prompt Flow uses a specific binding syntax for these references. You write a dollar sign, followed by the word data, a dot, and then the column name from your file. For the web classification scenario, you map it by writing url equals dollar sign data dot web link. This explicit instruction binds the data field to the flow requirement. The engine will now extract the web link string from line one of your JSONL file and inject it into the url input of your flow. Then it repeats that exact process for line two, line three, and so on until the entire file is processed. You can map multiple inputs this way. If your flow expects a url and a user id, you simply add another mapping instruction to the same command. 
Decoupling your test data from your flow inputs using explicit column mapping means you never have to alter your core code just to run a new dataset. Thanks for spending a few minutes with me. Until next time, take it easy.
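The commands described above, sketched end to end. The flow path and the web_link column name are illustrative; note that the data reference is wrapped in ${...} when written out.

```bash
# data.jsonl — one standalone JSON object per line, e.g.:
#   {"web_link": "https://example.com/sports-news"}
#   {"web_link": "https://example.com/cooking-blog"}

# Batch run: map the flow input `url` to the data column `web_link`.
# Paths are illustrative; flag names follow the pf CLI described in the episode.
pf run create \
  --flow ./web-classification \
  --data ./data.jsonl \
  --column-mapping url='${data.web_link}'
```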
13

The Evaluation Paradigm

3m 39s

This episode covers using evaluation flows to compute metrics on the outputs of a batch run. Listeners will learn how to transition from traditional unit testing to statistical grading of stochastic LLM responses.

Hi, this is Alex from DEV STORIES DOT EU. Prompt Flow: The Complete Guide, episode 13 of 14. You cannot write a standard unit test asserting that a large language model response equals an exact string. If the output changes every time, how do you mathematically prove your application actually works before deploying it? The answer is The Evaluation Paradigm. In traditional software engineering, unit testing is deterministic. You pass a specific input to a function, and you expect an exact output. Large language models are stochastic. They are inherently unpredictable. Ask a model to classify a webpage, and it might output the word sports, or it might say the category is sports, or it might just give you a related emoji. A standard test assertion will fail on two out of three of those, even though the model fundamentally got the answer right. To prove release readiness, you have to move past unit tests and embrace statistical evaluation. In Prompt Flow, this is handled by an Evaluation Flow. An evaluation flow is just a normal flow, but instead of generating text for an end user, its entire job is to grade the output of another flow. It takes the predictions your main application generated, compares them against a known set of correct answers, and outputs calculated metrics. Let us make this concrete. Say you just finished a batch run of a web classification flow. It processed a hundred website addresses and output a predicted category for each one. Now you need to know exactly how accurate those predictions are. You take the outputs of that base run and feed them into a new, dedicated evaluation flow designed to calculate classification accuracy. You trigger this from the command line using the run create command. You specify your evaluation flow, but instead of just giving it a static file of new inputs, you pass it a reference to your previous batch run. The evaluation flow needs two pieces of information to do its job. It needs the model prediction, and it needs the ground truth, which is the historically correct answer. You provide a data mapping that connects these dots. You tell the system to map the evaluation flow prediction input to the category output of your base run. Then, you map the ground truth input to the actual true label column located in your original test dataset. This is the part that matters. The evaluation flow executes row by row, stepping through the entire history of the previous run. For each row, it looks at what your main flow predicted and compares it against the ground truth. It assigns a score for that specific interaction. In our classification scenario, that might be a simple one for a match and a zero for a miss. Other evaluation flows might use a language model to judge the quality of a summary on a scale of one to five. Once the flow finishes grading every single row, it aggregates those individual scores into a final, overall metric. You do not just get a massive log of ones and zeros. You get a definitive, top-level number, like an overall accuracy of ninety-two percent. You can then view these aggregated metrics to decide if the current version of your application is ready for production. You no longer have to guess if a prompt tweak improved your system; you have statistical proof of exactly how much better or worse it performs across your entire dataset. Thanks for tuning in. Until next time!
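A sketch of the evaluation run described above. The flow path, base run name, and the prediction and groundtruth input names are illustrative; the show-metrics call at the end is one way to read the aggregated result.

```bash
# Evaluation run: grade the outputs of a previous batch run (names illustrative).
# ${run.outputs.category} references the base run; ${data.answer} the original dataset.
pf run create \
  --flow ./eval-classification-accuracy \
  --data ./data.jsonl \
  --run web_classification_base_run \
  --column-mapping prediction='${run.outputs.category}' groundtruth='${data.answer}'

# Inspect the aggregated metrics once grading finishes.
pf run show-metrics --name <evaluation-run-name>
```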
14

Taking Flows to Production

3m 51s

This final episode covers the myriad deployment options available for a completed flow. Listeners will learn how a flow serves as a production-ready artifact that can be deployed to Docker, Kubernetes, or App Services.

Hi, this is Alex from DEV STORIES DOT EU. Prompt Flow: The Complete Guide, episode 14 of 14. You have built, traced, and evaluated your language model flow, and it works perfectly. But right now, it is just sitting in your development environment, completely isolated from your actual users. Getting it out of your IDE and turning it into a stable, reachable service is exactly what we are covering today by taking flows to production. At rest, a prompt flow is simply a folder. It contains a configuration YAML file and your Python scripts or Jinja templates. To make that folder do anything in the real world, you need an execution environment to load those files and listen for incoming HTTP requests. During development, you probably used the local built-in server. You run a simple command, and it spins up a local endpoint to test inputs and outputs. This is fine for verifying logic, but it is not designed to handle concurrent production traffic, manage memory efficiently, or survive a system crash. To reach production, you need containerization. Prompt Flow is built to integrate directly with Docker. Instead of writing your own web server from scratch, you use Prompt Flow tooling to export your flow as a Docker container. When you run the export process, the system generates a Dockerfile. This file defines a base image, copies your flow directory into it, installs the exact Python dependencies listed in your requirements file, and sets up a production-ready web server to serve the flow as a REST API. This is the part that matters. Once your flow is in a Docker container, it stops being a special machine learning experiment and becomes a standard software artifact. You can treat it like any other microservice. Take a concrete scenario. You have a validated customer support chat flow. You build the Docker image and push it to your private container registry. Next, you deploy that image to a Kubernetes cluster. You write a standard deployment configuration telling Kubernetes to spin up three replicas of your chat flow container, and you place a load balancer in front of them. Now, your frontend application sends standard HTTP requests to the load balancer, which distributes the traffic across your containers. If your chat flow gets slammed with user requests, Kubernetes simply scales up more instances of your flow. If you do not want to manage a Kubernetes cluster, that same Docker image works perfectly on managed platform services. You can deploy it directly to Azure App Service or any other cloud provider that hosts containers. The flow does not care where it runs, as long as it has a container runtime. There is one alternative deployment pattern. If you are not building a web service, you can distribute a flow as an executable application. This packages the flow and its runtime dependencies into a standalone executable file. This is useful if you need to run the logic directly on a client machine or an edge device, entirely bypassing the need to install Python or set up a server. Moving a prompt flow to production does not require proprietary hosting or specialized machine learning infrastructure. By exporting to Docker, your language model flows integrate seamlessly into the exact same continuous delivery pipelines and container orchestrators you already use for your traditional backend services. 
If you want to take this further, I encourage you to read through the official Prompt Flow documentation, try deploying a container yourself, or visit devstories dot eu to suggest topics you want to see in future series. I would like to take a moment to thank you for listening — it helps us a lot. Have a great one!
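A sketch of the Docker path described in this episode. The flow path, image name, port, and the connection environment variable are illustrative and depend on your own flow and connection names.

```bash
# Export the flow folder as a Docker build context (Dockerfile, flow files, deps).
pf flow build --source ./chat-flow --output ./dist --format docker

# Build and run the generated image like any other microservice.
# The env var name is derived from your connection name; value is a placeholder.
docker build -t support-chat-flow ./dist
docker run -p 8080:8080 -e MAIN_LANGUAGE_MODEL_API_KEY=<key> support-chat-flow
```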