Back to catalog

Season 53 10 Episodes 36 min 2026

NVIDIA NeMo Guardrails

v0.21 — 2026 Edition. A technical audio course on securing agentic AI applications with NVIDIA NeMo Guardrails. Learn to implement content safety, topic control, PII masking, and jailbreak prevention. (v0.21 - 2026)

AI Safety LLM Orchestration AI/ML Frameworks

🌐 English 🇪🇸 Español 🇫🇷 Français 🇵🇹 Português 🇮🇹 Italiano 🇵🇱 Polski 🇩🇪 Deutsch 🇷🇴 Română

Now Playing

Click play to start

0:00

The AI Guardrails Imperative: Core Abstractions

Discover why raw LLM APIs are dangerous for production and how to orchestrate safety. This episode introduces the five-stage pipeline of NeMo Guardrails.

3m 49s

Configuration and the Colang 2.0 State Machine

Learn how to separate safety logic from business logic using configuration files. We explore Colang 2.0 and how it builds event-driven dialogue flows.

3m 30s

Specialized Content Safety with Nemotron NIM

Explore how to offload moderation to specialized, high-speed models. We cover using the Nemotron Safety Guard 8B model for catching unsafe prompts.

3m 33s

Enforcing Domain Boundaries with Topic Control

Prevent PR disasters by keeping your bots strictly on topic. Learn how to implement Topic Control Input Rails to block unauthorized conversations.

3m 34s

Dynamic PII Detection and Masking

Protect sensitive user data across inputs, outputs, and retrievals. This episode details dynamic PII masking using GLiNER and Presidio integrations.

3m 50s

Jailbreak Detection via Perplexity Heuristics

Defend against adversarial prompt injections using mathematical heuristics. Learn how perplexity scoring catches jailbreaks before they hit the LLM.

4m 03s

Securing Agentic Workflows with Execution Rails

Protect the tools your autonomous agents use from exploitation. We break down YARA rules and Execution Rails for blocking code and SQL injections.

3m 42s

Grounding RAG: Hallucinations and Fact-Checking

Ensure your RAG applications don't invent facts. Learn how to configure output rails to verify responses against retrieved knowledge chunks.

3m 45s

Multimodal Content Safety

Text filters fail when users upload screenshots of malicious prompts. Discover how to use Vision models as judges to secure multimodal applications.

3m 25s

Enterprise Integration Patterns

Scale your guardrails across the enterprise. We review integration via the Python SDK, LangChain Runnables, and the standalone API Server.

3m 17s

Episodes

The AI Guardrails Imperative: Core Abstractions

3m 49s

Discover why raw LLM APIs are dangerous for production and how to orchestrate safety. This episode introduces the five-stage pipeline of NeMo Guardrails.

Download

Hi, this is Alex from DEV STORIES DOT EU. NVIDIA NeMo Guardrails, episode 1 of 10. You would never connect a raw database directly to the public internet, yet many applications expose unconstrained language models directly to end users. Relying solely on a system prompt to maintain security is a fragile architecture. Today we cover The AI Guardrails Imperative: Core Abstractions. NeMo Guardrails acts as a programmable intermediary layer sitting between your user application and the language model. It does not replace your model. Instead, it creates a separate pipeline of discrete safety checks called rails. Rather than begging the model to behave through complex prompt engineering, you orchestrate safety through this deterministic layer. Consider a customer service bot. A user sends a message, the system retrieves relevant support articles, the model drafts an answer, and it might trigger a backend action like processing a refund. You need different types of protection at each stage. To handle this, Guardrails defines five distinct rail types. First, the user message hits the Input Rail. This rail inspects the text before the core language model ever sees it. If a user attempts a prompt injection attack or submits highly toxic text, the input rail intercepts the message immediately. It stops the pipeline and returns a predefined refusal. The main model never processes the malicious text, saving compute and preventing an exploit. If the input is safe, the system evaluates Dialog Rails. These manage the expected flow of the conversation. If a user asks the support bot for an opinion on a competitor, a dialog rail identifies the topic. It forces the bot to follow a predetermined path, perhaps replying that it only discusses its own products. Dialog rails prevent the model from wandering off topic or answering questions it has no business addressing. Next, when your bot searches your knowledge base to ground its answer, Retrieval Rails take over. These inspect the chunks of text pulled from the database before they are appended to the model prompt. If a poorly configured search accidentally grabs an internal human resources document instead of a public manual, the retrieval rail detects the sensitive information and strips it out of the context window. If the conversation requires the bot to perform a task, Execution Rails step in. They control which custom actions the model is allowed to trigger. When the model requests to execute code or call an external tool, the execution rail verifies if that specific action is permitted given the current state of the conversation. It blocks unauthorized commands from executing. Finally, we have Output Rails. This is the last line of defense. After the model generates a response, the output rail evaluates the text before it reaches the user. It checks for hallucinated facts, inappropriate tone, or sensitive data leaks. If the text fails the check, the output rail intercepts it and alters or blocks the final message. This architecture fundamentally changes how you build generative applications. You stop relying on a probabilistic engine to police its own behavior, and instead build a deterministic safety net that controls inputs, logic, and outputs independently. By the way, if you find these episodes helpful and want to support the show, you can search for DevStoriesEU on Patreon. That is all for this one. Thanks for listening, and keep building!

Configuration and the Colang 2.0 State Machine

3m 30s

Learn how to separate safety logic from business logic using configuration files. We explore Colang 2.0 and how it builds event-driven dialogue flows.

Download

Hi, this is Alex from DEV STORIES DOT EU. NVIDIA NeMo Guardrails, episode 2 of 10. Writing raw Python to manage complex, branching dialogue state machines is an absolute nightmare. The moment your user deviates from a rigid script, your hardcoded logic breaks. The configuration architecture in NeMo Guardrails fixes this by treating conversations as event-driven flows instead of static code. NeMo Guardrails separates the mechanics of your application from the logic of your conversation. This happens across two distinct configuration pieces. First, you have your YAML configuration file. This handles all the model wiring. It is where you declare your main language model, define your embedding models, and register custom application actions. The YAML file connects the underlying infrastructure. It provides the engine, but it knows absolutely nothing about what the user will actually say. The actual conversation logic is governed by Colang 2.0. Colang is an event-driven interaction modeling language. Instead of writing standard imperative code with endless conditional statements to track where the user is in a conversation, you define flows. A flow models a sequence of interactions. When a user sends a message, it generates an event. The state machine catches this event, looks for an active flow that matches the interaction, and dictates the assistant's next move. This is the part that matters. Colang uses Natural Language Descriptions to bridge the gap between strict code and human ambiguity. Instead of writing complex regular expressions to parse a user message, you instruct the system using human language right inside your flow logic. You pair these descriptions with the generation operator, which is written simply as three dots. When the state machine encounters those three dots, it temporarily pauses execution. It hands the current context and your natural language description over to the underlying language model, asking it to generate or extract the exact value you need. Let us look at a concrete scenario like booking a flight ticket. You need to know when the user wants to travel. In your Colang file, you define a flight booking flow. Inside that flow, you tell the system to wait for the user to speak. Once they do, you need to extract the date. You declare a context variable, perhaps called flight date. You assign it the value of a natural language description, literally writing out the phrase "the date the user wants to fly", immediately followed by the generation operator, those three dots. When the user says "I need a ticket for next Tuesday," the Colang state machine captures the event. It hits your variable assignment. It passes the user message and your natural language instruction to the language model. The model reads the context, identifies "next Tuesday" as the target value, and returns it. The generation operator resolves, and your context variable now securely holds the extracted date. Your flow then proceeds to the next step, which might involve calling an external booking API defined over in your YAML configuration. The real power of this architecture is that you stop fighting dialogue states with rigid code. You let a YAML file lock down the static infrastructure, and you let Colang use the reasoning capacity of the language model to navigate the unpredictable reality of human conversation. I would like to take a moment to thank you for listening — it helps us a lot. Have a great one!

Specialized Content Safety with Nemotron NIM

3m 33s

Explore how to offload moderation to specialized, high-speed models. We cover using the Nemotron Safety Guard 8B model for catching unsafe prompts.

Download

Hi, this is Alex from DEV STORIES DOT EU. NVIDIA NeMo Guardrails, episode 3 of 10. Relying on a massive seventy billion parameter model to do basic content moderation is a massive waste of compute. It is slow, expensive, and takes processing power away from generating actual answers. The fix is Specialized Content Safety with Nemotron NIM. Instead of asking your primary application model to both write code and check for toxicity, you split the workload. The main model handles the complex reasoning and generation. A second, much smaller model handles the security. Specifically, we are looking at the Llama 3 point 1 Nemotron Safety Guard 8B V3. This is an eight billion parameter model fine-tuned entirely for one job, which is evaluating content safety. It runs as a standalone microservice, or NIM, which NeMo Guardrails calls out to over an API. To set this up, you define the Nemotron NIM in your guardrails configuration as a distinct model. You label its type as content safety, while your primary model remains the main type. This distinction is critical because Guardrails routes traffic differently based on these labels. Once configured, you activate the safety guard as an input rail and an output rail. When a user sends a prompt, Guardrails intercepts it before it ever reaches your main model. It sends the prompt to the Nemotron safety NIM. The NIM evaluates the text against twenty-three specific unsafe content categories. These categories cover everything from hate speech and violence to sexual content and criminal planning. Consider a multilingual application where a user submits a prompt in French, asking for step-by-step instructions on how to hotwire a car. The input rail catches this. Guardrails ships the French text to the Nemotron NIM. Because the safety model is trained on multilingual safety data, it understands the intent regardless of the language. It flags the request as falling under the criminal advice category and returns an unsafe signal back to Guardrails. Guardrails then immediately halts the process and returns a standard refusal message to the user. Your main model never even sees the prompt, saving you the inference cost of processing a toxic request. The exact same logic applies in reverse for output rails. If a seemingly harmless prompt somehow tricks the main model into generating an unsafe response, Guardrails intercepts that generated text before it reaches the user. It sends the output to the safety NIM, checks it against those same twenty-three categories, and blocks it if it violates the rules. Setting this up requires updating your configuration files. You declare the Nemotron model in your models list, pointing it to your NIM endpoint. Then, you enable the default self-check input and self-check output flows, explicitly telling Guardrails to use your content safety model for these checks. You do not need to write custom prompts instructing the model on how to evaluate toxicity. The Nemotron model expects a very specific prompt format to evaluate text, and NeMo Guardrails formats that API call automatically under the hood. You just point the rails at the NIM and let it do the classification. Decoupling your moderation logic into a dedicated, smaller model ensures your primary application model spends its cycles generating value, while a specialized guard efficiently handles the perimeter security. That is all for this one. Thanks for listening, and keep building!

Enforcing Domain Boundaries with Topic Control

3m 34s

Prevent PR disasters by keeping your bots strictly on topic. Learn how to implement Topic Control Input Rails to block unauthorized conversations.

Download

Hi, this is Alex from DEV STORIES DOT EU. NVIDIA NeMo Guardrails, episode 4 of 10. The easiest way to prevent your chatbot from creating a public relations disaster is to ensure it simply refuses to talk about the disaster. You cannot rely on a general-purpose language model to reliably decline every irrelevant or risky topic on its own. That is why we use Enforcing Domain Boundaries with Topic Control. General language models are eager to please. If a user asks a clever question, the model will try to answer it. If you build a customer service bot, you only want it talking about customer service. You need a strict boundary. To create this boundary, you use the Llama 3.1 NemoGuard 8B TopicControl NIM. This is a specialized model designed for one specific task. It does not generate conversational responses. It evaluates. Consider a telecom support bot. Its job is to help users with phone bills, network outages, and data plans. A user connects to the chat and asks the bot for its opinion on a recent political election. Without guardrails, your main model receives the prompt, processes it, and might generate an inappropriate response. With NeMo Guardrails, you configure an Input Rail. An Input Rail intercepts the user request before anything else happens. The main language model never even sees the request. When the user asks about the election, the guardrail routes the input directly to the TopicControl model. You control how this model behaves by defining strict guidelines inside its system prompt. For your telecom bot, your system prompt states that acceptable topics are billing, network status, and account management. The TopicControl model takes the user prompt and measures it against those exact guidelines. It then outputs a rigid classification. It returns either an on-topic or off-topic assessment. If the user asks why their roaming charges are high, the TopicControl model reads the prompt, checks the guidelines, and returns on-topic. The guardrail opens the gate, and the user prompt passes through to your main conversational model to generate a helpful answer. When the user asks about the political election, the TopicControl model evaluates the prompt against the telecom guidelines. It recognizes the mismatch and returns off-topic. The guardrail immediately halts the pipeline. It blocks the request from reaching your main model. Instead, the guardrail triggers a predefined, static refusal response. The bot tells the user it is only equipped to handle telecom services. Using a dedicated model for topic control separates the evaluation logic from the conversational logic. You are not wasting expensive compute cycles asking a massive, general-purpose model to figure out if it is allowed to answer a question. You use a smaller, highly optimized eight-billion parameter model to act as a bouncer at the door. This keeps your domain boundaries strictly enforced without requiring complex, brittle prompt engineering on your primary model. The most secure way to handle an out-of-bounds request is to ensure your primary reasoning engine remains entirely unaware the request was ever made. That is all for this one. Thanks for listening, and keep building!

Dynamic PII Detection and Masking

3m 50s

Protect sensitive user data across inputs, outputs, and retrievals. This episode details dynamic PII masking using GLiNER and Presidio integrations.

Download

Hi, this is Alex from DEV STORIES DOT EU. NVIDIA NeMo Guardrails, episode 5 of 10. If your retrieval-augmented generation system ingests an unredacted employee database, you have already committed a compliance violation before the language model even generates a single word. Once sensitive data enters the prompt context, you lose control over where it goes. Dynamic PII Detection and Masking is how you intercept that data in transit. Let us look at an internal human resources bot. A manager asks the bot a general question about the company performance review policy. The bot searches the internal knowledge base. The search returns the relevant policy document, but the retrieved text chunks accidentally include a specific employee record appended to the file, complete with a name, a home address, and an email. If the system passes those retrieved chunks directly to the language model, that sensitive data becomes part of the context window. NeMo Guardrails handles this by sitting between the components of your application and filtering the text stream. It provides two distinct approaches. Detection involves identifying personally identifiable information and taking a hard action, such as blocking the prompt entirely or throwing an error. Masking is more flexible. It finds the sensitive information and replaces it with a generic placeholder on the fly. The text john@example.com becomes a bracketed string saying EMAIL. The underlying database remains completely untouched. To execute this, the guardrails system relies on specialized external tools. You configure the framework to call an engine like Microsoft Presidio or GLiNER. Presidio typically uses pattern matching, regular expressions, and rule-based logic to spot standard formats like phone numbers or credit cards. GLiNER, which stands for Generalist and Lightweight Indicator for Named Entity Recognition, uses a small machine learning model to identify entities based on the surrounding context. You define an array of entity types you care about, and the selected engine handles the extraction. This protection operates across three specific points in the application flow. The first is the input rail. If a user types their own social security number into the chat window, the input rail scans the incoming string and masks the number before the language model ever receives the prompt. The model processes the request using the placeholder, completely blind to the actual number. The second point is the retrieval rail. This is the part that matters. When your system queries a vector database and pulls back raw text chunks, the guardrail intercepts those chunks before they are injected into the final prompt template. It scans the retrieved text, strips out the real names and addresses, and substitutes the masked tags. Your retrieval mechanism can pull from messy, unredacted data sources, but the language model is shielded from the sensitive details. The third point is the output rail. If the language model generates sensitive data, either by hallucinating it or because a piece of data somehow slipped past the earlier rails, the output rail acts as a final checkpoint. It scans the generated response and masks the sensitive text before it reaches the user screen. Because all of this happens dynamically in memory during the execution of a single request, your data architecture does not have to change. You avoid the massive engineering overhead of maintaining duplicate, pre-redacted databases just to run a chat application. The most secure way to handle sensitive data in a language model application is to ensure the model never computes on the actual data in the first place. That is all for this one. Thanks for listening, and keep building!

Jailbreak Detection via Perplexity Heuristics

4m 03s

Defend against adversarial prompt injections using mathematical heuristics. Learn how perplexity scoring catches jailbreaks before they hit the LLM.

Download

Hi, this is Alex from DEV STORIES DOT EU. NVIDIA NeMo Guardrails, episode 6 of 10. Some of the most devastating prompt injection attacks do not look like clever social engineering at all. To a human, they just look like complete gibberish. The user is not trying to trick the bot with a riddle, but rather overload it mathematically. Catching this requires Jailbreak Detection via Perplexity Heuristics. Consider a specific attack scenario. A malicious user wants your application to output something harmful, but they know you have safety instructions in your system prompt. To bypass those instructions, they take their harmful request and pad it with a massive string of random characters, obscure symbols, or nonsense words. When the language model processes this input, the attention mechanisms get dragged down by the sheer volume of unpredictable tokens. The model loses track of its underlying safety alignment and simply answers the malicious request. You cannot catch this attack with standard keyword filters because the padding is random. You could use another language model to evaluate the incoming prompt for adversarial patterns, but that adds a massive latency penalty to every single user request. You need a filter that is fast, mathematical, and runs locally. This is where perplexity comes in. Perplexity is a standard metric that measures how predictable a piece of text is to a language model. Normal human sentences follow predictable patterns, resulting in low perplexity. A string of random characters, or a bizarre sequence of unrelated words, is highly unpredictable. That generates high perplexity. By calculating the perplexity of an incoming prompt, you get a statistical measure of its randomness. NeMo Guardrails uses this concept to block gibberish attacks through two main heuristics. The first is Length per Perplexity. Adversarial attacks usually require a long string of garbage text to successfully derail the main model. A short string of nonsense is typically ignored by the attention mechanism. This heuristic calculates a ratio using the total length of the prompt and its overall perplexity score. If a prompt is both unusually long and highly unpredictable, it crosses a predefined mathematical threshold. The guardrail intercepts the input, flags it as a potential jailbreak, and blocks the request before your main model ever sees it. The second heuristic handles a more surgical variation of the same attack. Sometimes an attacker hides the gibberish. They write a completely normal, low-perplexity sentence in the middle of their prompt, but they append a dense block of random tokens at the very end. If you calculate the perplexity across the entire prompt, the normal text might dilute the average score enough to slip past the first filter. To counter this, NeMo Guardrails uses Prefix and Suffix Perplexity. Instead of looking at the whole prompt, this heuristic isolates a fixed number of tokens at the very beginning and the very end of the user input. It calculates the perplexity of those boundaries independently. If the user attaches an adversarial payload of random characters to the end of a normal question, the suffix perplexity score spikes. The guardrail detects the anomaly at the boundary and drops the request. This entire process happens without making an external API call to a massive language model. It relies on a smaller, local model to calculate the perplexity scores, which keeps the latency overhead to an absolute minimum. By stripping away the semantic meaning of the text, you stop analyzing what the user is trying to say and start analyzing the mathematical shape of their words. Perplexity heuristics allow you to block complex adversarial attacks using raw statistics, securing your application at the speed of math. Thanks for listening, happy coding everyone!

Securing Agentic Workflows with Execution Rails

3m 42s

Protect the tools your autonomous agents use from exploitation. We break down YARA rules and Execution Rails for blocking code and SQL injections.

Download

Hi, this is Alex from DEV STORIES DOT EU. NVIDIA NeMo Guardrails, episode 7 of 10. Giving a language model access to execute database queries is like handing the keys to your database to a stranger. You tell the model to fetch user data, but a clever prompt tricks it into appending a command to drop a table. Securing agentic workflows with execution rails is how you stop autonomous agents from compromising your backend. Most guardrails focus on the chat response. They stop the model from using bad language or going off-topic. Execution rails do something completely different. They protect the tools the agent uses. When an agent decides to use a tool to run a generated Python script or execute a SQL query, the execution rail intercepts the payload exactly at the handoff point, right before the tool actually runs. NeMo Guardrails handles this validation using YARA rules. YARA is a pattern-matching engine traditionally used by cybersecurity analysts to identify malware based on specific textual or binary signatures. In this framework, NVIDIA provides a predefined catalog of YARA patterns built specifically to catch language model vulnerabilities. The guardrails engine passes the tool input through these YARA rules to look for known attack signatures. This catalog targets four specific threats. First is SQL injection, where the model generates database commands containing malicious modifications. Second is code injection, where the agent tries to execute unauthorized system commands inside a dynamically generated Python script. Third is template injection, where an attacker manipulates server-side rendering engines. Finally, it detects cross-site scripting, blocking payloads designed to execute malicious scripts in a web browser. Consider an agent tasked with taking a user request, writing a Python script to process some local data, and running it. If a user hides a system command to expose environment variables inside their prompt, the language model might blindly include that command in the final Python script. The execution rail catches this. It scans the generated Python code, matches the YARA rule for code injection, and flags the payload before the execution engine ever sees it. When the system detects a malicious pattern, it takes action based on your configuration. You have two choices. The first action is reject. This completely blocks the tool execution. The workflow stops, the malicious code never runs, and the system returns a safe response. The second action is omit. This tells the guardrail to strip the specific malicious string out of the payload but allow the tool to execute whatever is left. Choosing between reject and omit depends entirely on the tool. If your agent is running a SQL query against a database, reject is the only safe choice. Omitting a string from a complex SQL query will likely result in malformed syntax or unpredictable data access. Omit is generally reserved for basic text processing tasks where removing a bad string still leaves a usable, safe payload. You configure these protections directly in your YAML files by enabling the agentic security rails and mapping them to specific tools. The underlying YARA engine does the heavy lifting, matching the predefined patterns without requiring you to write custom regular expressions. Execution rails treat the language model as an untrusted user, forcing you to validate every single command it generates before it touches your infrastructure. Thanks for listening, happy coding everyone!

Grounding RAG: Hallucinations and Fact-Checking

3m 45s

Ensure your RAG applications don't invent facts. Learn how to configure output rails to verify responses against retrieved knowledge chunks.

Download

Hi, this is Alex from DEV STORIES DOT EU. NVIDIA NeMo Guardrails, episode 8 of 10. A bot saying it does not know the answer is mildly annoying. A financial bot confidently declaring your company missed its third quarter revenue target by fifty million dollars, when it actually beat the target, is a disaster. To stop your Retrieval-Augmented Generation system from making up numbers and destroying your credibility, you need Grounding RAG: Hallucinations and Fact-Checking. Let us clear up a common confusion right away. Fact-checking and hallucination detection are two different things. Hallucination detection typically involves asking the language model the same question multiple times. If the resulting samples contradict each other, the model is likely hallucinating. Fact-checking evaluates a single generated answer against a specific set of trusted documents. NeMo Guardrails uses fact-checking to keep your RAG pipelines grounded. Language models predict text. They do not intrinsically know what is true. In a standard RAG setup, you retrieve text chunks from your database, feed them to the model, and ask it to answer based solely on that text. Sometimes the model still hallucinates, blending its pre-trained knowledge with your data, or simply inventing a plausible sounding metric. NeMo Guardrails stops this using an output rail called self check facts. When active, this rail intercepts the model's generated response before it goes to the user. The system treats this unverified response as a hypothesis. It then pulls the exact text snippets that were retrieved from your database. These snippets are stored automatically in a context variable named relevant chunks. This variable is the hard evidence. The guardrail then runs an evaluation. It passes the hypothesis and the evidence to an evaluator model. The evaluator checks if the evidence strictly entails the hypothesis. Returning to our financial bot, if the generated text claims a specific revenue number, but that number does not exist anywhere in the relevant chunks variable, the evaluator flags it as ungrounded. The guardrail immediately blocks the hallucinated response. Instead, it serves a safe fallback message, admitting it does not have enough information. Using a general purpose language model to check facts is easy to set up, but it comes with a penalty. It adds latency, burns tokens, and the evaluator model can sometimes get confused by dense context. You are not forced to use the default setup. You can replace the standard prompt based checking with specialized external tools. Approaches using models like AlignScore or Patronus Lynx are highly effective here. These are purpose built models trained exclusively to detect inconsistencies between a source text and a generated claim. When you route the relevant chunks and the hypothesis through one of these specialized models, you bypass the standard language model evaluator entirely. This provides a faster, cheaper, and often more rigorous verdict on whether your bot is telling the truth. This is the part that matters. Grounding a RAG pipeline is not about making your generative model inherently more truthful. It is about treating every single generated word as a suspect claim that must survive strict cross-examination against your exact source data before it ever sees the light of day. If you want to help us keep making these episodes, you can support the show by searching for DevStoriesEU on Patreon. That is all for this one. Thanks for listening, and keep building!

Multimodal Content Safety

3m 25s

Text filters fail when users upload screenshots of malicious prompts. Discover how to use Vision models as judges to secure multimodal applications.

Download

Hi, this is Alex from DEV STORIES DOT EU. NVIDIA NeMo Guardrails, episode 9 of 10. You spend weeks tuning your input filters to block malicious text prompts. Then a user bypasses all of your hard work just by taking a screenshot of their bad prompt and uploading it as an image. Text filters are completely blind to this. To catch it, you need Multimodal Content Safety. Consider a user who uploads a picture of a customized assault rifle and types the question, how do I modify this to fire faster. A standard text guardrail only sees an inquiry about modifying a generic object. It has no idea the object is a weapon. The malicious intent only exists when you combine the image and the text. We handle this by using a Vision Language Model acting as an LLM-as-a-judge. In NeMo Guardrails, you configure this at the input rail stage. Before the user request ever reaches your main conversational model, the guardrail intercepts the multimodal payload. The image data usually arrives in one of two formats. It is either a Base64 encoded string embedded directly in the API call, or it is a direct URL. Base64 strings make the immediate request payload much larger but guarantee the image is available. URLs keep the initial payload light but require the evaluating model to have outbound network access to fetch the file. Either way, you configure the input rail with a specific prompt template designed for multimodal evaluation. This template contains a strict rubric defining unsafe content categories, like violence, illegal acts, self-harm, or weapons. The guardrail constructs an evaluation request. It packages your safety rubric, the user text, and the user image. It sends this combined package to the vision model and forces the model to reply with a structured output, typically a simple safe or unsafe label. The model acts strictly as a judge. It analyzes the weapon in the photo, reads the request about modifying it, and checks the combined meaning against your restricted categories. If the vision model detects a violation, it returns an unsafe verdict. The input rail flags the request, blocks it from proceeding, and triggers a predefined refusal message. Your primary application model never even sees the malicious request. Pay attention to this bit regarding context limits. Vision models process images by converting them into tokens. Depending on the model architecture and the image resolution, a single picture can consume thousands of tokens. Your Base64 string or image URL, combined with the user text and your multi-category safety rubric, must fit entirely within the vision model context window. If you pass massive, high-resolution images alongside long prompt text, you will exceed the token limit. The guardrail evaluation will fail, or the model will truncate the input and miss the violation entirely. You must implement preprocessing to resize or compress images before they reach the guardrail, keeping the token footprint predictable. Evaluating text and images in isolation is no longer sufficient, because modern abuse hides exactly in the space where those two modalities intersect. Thanks for spending a few minutes with me. Until next time, take it easy.

Enterprise Integration Patterns

3m 17s

Scale your guardrails across the enterprise. We review integration via the Python SDK, LangChain Runnables, and the standalone API Server.

Download

Hi, this is Alex from DEV STORIES DOT EU. NVIDIA NeMo Guardrails, episode 10 of 10. Building safety rules directly inside a Python script is great for testing, but terrible for scaling across a polyglot enterprise. If your frontend uses Node JS and your backend is a mix of Python and Go, you cannot duplicate your safety logic in every language. The solution is using the three Enterprise Integration Patterns for NeMo Guardrails. Take a standard migration scenario. You built a customer service bot prototype using LangChain. You added guardrails to stop it from talking about competitors. It works perfectly on your laptop. Now you need to migrate this prototype into a scalable production microservice architecture where multiple applications will query it. You have three primary ways to integrate the guardrails to make this happen. The first method is the native Python SDK, specifically an object called LLMRails. This is the core engine. If you are building a custom Python backend from scratch without relying on orchestration frameworks, you use this. You instantiate LLMRails, point it to your configuration directory containing your Colang and YAML files, and use it to process inputs. You pass in a list of message dictionaries representing the conversation history, and it returns the evaluated response. It is direct and gives you raw access to the underlying guardrails mechanics. Since you already have a LangChain prototype, rewriting everything to use the raw SDK is wasted effort. This is where the second method applies, which is the LangChain integration using RunnableRails. NeMo Guardrails integrates natively into the LangChain Expression Language. You create a RunnableRails instance loaded with your configuration and attach it directly to your existing chain. If your chain takes a prompt, retrieves documents, and calls a language model, you wrap that entire flow with the guardrail runnable. The guardrail intercepts the input before it hits your chain and evaluates the output after your chain generates a response. Your core application code barely changes, but your LangChain logic is now protected. This is the part that matters at scale. Consider the broader enterprise. Another team wants to use your exact same safety policies, but they are building a web gateway in Node JS or a high performance router in Go. They cannot import a Python LangChain object. For this, you use the third method, the standalone API Server. NeMo Guardrails includes a built-in server you can run via the command line or deploy as a container. You start the server and point it at your configuration folder. It immediately exposes REST endpoints that mimic the standard OpenAI API. To your Go or Node JS applications, the guardrails server looks exactly like a standard language model. They send standard JSON requests to the chat completions endpoint. The server handles the input guardrails, communicates with the actual underlying model, processes the output guardrails, and returns the clean text. Decoupling your rules from your application logic using a standalone server is the only reliable way to enforce consistent safety policies across an entire organization. Check out the official documentation to try these integration patterns hands-on, or visit DEV STORIES DOT EU to suggest topics for future series. Thanks for spending a few minutes with me. Until next time, take it easy.