AI Guardrails Are Not Enough

March 11, 2026
Luke Posniewski, Nick Maietta and Andrew Burt

TL;DR: Without specific legal evaluations, current guardrails for generative and agentic AI systems leave companies exposed to legal risks.

As organizations adopt generative and agentic AI systems, many rely on the general “guardrails” offered by major model or cloud providers as their primary safety mechanism. 

We’ve had customers ask us, for example, “Aren’t the existing guardrails we already have enough to keep our AI safe and compliant?”

The truth is that guardrails are insufficient for legal defensibility. These general-purpose systems only cover limited categories of risk and fail to address the nuanced and evolving types of harms that regulators, courts, and enterprise legal teams actually care about.

To understand why, let’s start by examining what guardrails typically do today.

Part I: What General-Purpose Guardrails Actually Do

Most general-purpose guardrails use a handful of common (and blunt) mechanisms to prevent harmful content. Because guardrails are "in-the-loop" and typically run every time the AI system is called, their mechanisms need to be very fast and lightweight. These mechanisms typically include: 

1. Content Filters

Content filters detect toxic content across a set of categories, often limited for general-purpose guardrails to a list such as:

  • Hate
  • Insults
  • Sexual content
  • Violence
  • Misconduct

These filters are useful for moderation, but they are fundamentally designed to catch obvious harmful speech, not complex legal risks.

An overview of the guardrails offered on AWS Bedrock.

For example, a model could generate discriminatory hiring recommendations, hallucinate legal advice, or reveal sensitive information without triggering any of these toxicity categories. Each of those behaviors could create liability under a litany of state or federal anti-discrimination laws;1 privacy or confidentiality obligations;2 or even claims for the unauthorized practice of law.3

2. Specific Word Filters

Word list filtering is a common, and perhaps the most basic, tool used in general-purpose guardrails. These filters just block outputs or mask content containing specific terms.

While simple and computationally efficient, word filters are extremely blunt. They are easily bypassed (e.g., by paraphrasing), static, and cannot detect contextual or semantic risks. Put more simply, word filters only work when technical teams know what words they want to filter in advance; legal liability often arises from evolving or unpredictable behavior that these filters are not suitable to address.
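To make that bluntness concrete, here is a minimal sketch of a word-list filter; the blocked phrases below are hypothetical, chosen purely for illustration:

```python
# Minimal word-list filter; the blocked phrases are hypothetical examples.
BLOCKED_PHRASES = {"guaranteed return", "insider tip"}

def word_filter(output: str) -> bool:
    """Return True if the output should be blocked."""
    text = output.lower()
    return any(phrase in text for phrase in BLOCKED_PHRASES)

# Caught: the exact phrase appears on the list.
print(word_filter("This fund offers a guaranteed return of 12%."))  # True

# Bypassed: a trivial paraphrase carries the same risky meaning.
print(word_filter("This fund's 12% payout is a sure thing."))       # False
```

Any rewording that avoids the listed phrases sails through, which is why word filters only catch risks the team anticipated verbatim.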

3. Sensitive Information Detection (i.e., Pattern Filters)

Off-the-shelf guardrails often include detectors for:

  • Personally identifiable information
  • Account numbers
  • Technical identifiers (e.g., IP addresses)
  • Other structured sensitive data (e.g., Social Security numbers)

These are also very blunt tools that operate similarly to word filters, relying heavily on pattern matching. While important, these controls only address a narrow slice of potential risk: direct exposure of structured data. Like word filters, they cannot detect contextual or semantic risk, for example, if an AI system makes harmful inferences about individuals or groups.

To take one example, we recently worked with a GenAI system that disclosed to users that it did not have access to PII such as precise information about a user’s location. Yet when a user asked for directions to a nearby chain restaurant, the AI provided detailed instructions using a starting location inferred from the user’s IP address. Pattern-matching filters for sensitive information would not catch this kind of behavior.
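The same limitation can be sketched in code. The regexes below are illustrative stand-ins for the kinds of patterns these filters ship with; real detectors use larger, tuned pattern sets, but face the same gap:

```python
import re

# Illustrative patterns only; real filters ship with larger, tuned sets.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}

def sensitive_info_filter(output: str) -> bool:
    """Return True if the output contains structured sensitive data."""
    return any(pattern.search(output) for pattern in PATTERNS.values())

# Caught: a literal SSN matches a pattern.
print(sensitive_info_filter("The applicant's SSN is 123-45-6789."))  # True

# Missed: a location inferred from the user's IP address never appears
# as structured data, so no pattern ever fires.
print(sensitive_info_filter("Turn left on Main St; the diner is 0.4 miles ahead."))  # False
```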

4. Denied Topics (i.e., Topic Filters)

Another common general-purpose guardrail restricts topics, such as investment, medical, or legal advice.

In practice, these restrictions are commonly implemented through system prompts or classification rules. This approach works for clearly defined use cases, but it becomes increasingly brittle as systems become more open-ended and agentic, and as the distinction between content that falls inside a topic and content that is merely adjacent to it becomes more nuanced.

Indeed, this brittleness and nuance are reasons we often see chatbots veer into guardrail-restricted domains such as medical or psychological advice, legal counsel, and illicit activities or substances (some activities or substances, for example, are legal in one locale but illegal in another, which confuses guardrails). A major retailer’s AI-powered assistant was recently highlighted for having easily bypassable guardrails.4
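A cue-based topic classifier, sketched below with hypothetical cue words, shows why the inside-a-topic versus adjacent-to-a-topic distinction is hard to operationalize (production systems use prompts or ML classifiers rather than keyword cues, but face the same boundary problem):

```python
# Hypothetical cue words for denied topics; real implementations use
# classifiers or system-prompt rules, but face the same boundary problem.
DENIED_TOPICS = {
    "medical": {"diagnosis", "prescription", "dosage"},
    "legal": {"lawsuit", "attorney", "statute"},
}

def denied_topic(output: str):
    """Return the first denied topic whose cue words appear, else None."""
    text = output.lower()
    for topic, cues in DENIED_TOPICS.items():
        if any(cue in text for cue in cues):
            return topic
    return None

# Caught: an explicit cue word appears.
print(denied_topic("Your dosage should be 200mg twice daily."))          # medical

# Missed: medical advice phrased without any cue word.
print(denied_topic("Take two of the small white pills every morning."))  # None
```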

5. Grounding

Another common guardrail mechanism checks whether outputs are grounded in an approved dataset.

These checks are helpful for reducing hallucinations and stopping the system from going off-topic. However, they don’t address whether the system’s outputs create legal or compliance risks, which require a more nuanced and contextually aware analysis.

For example, we recently worked with a GenAI system that provides advice to consumers based on a pre-existing database of content. While an in-place contextual grounding guardrail was useful for ensuring the AI system only used content from the database, we identified numerous user interactions where the specific user’s details made that advice not only inapplicable but potentially dangerous. The contextual grounding guardrail did not pick this up, because the AI system’s output was still grounded in the approved database.
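To illustrate, here is a toy groundedness score based on word overlap (a crude stand-in for production grounding checks, which typically use embeddings or entailment models). The point it demonstrates holds either way: an output can score as fully grounded while still being unsafe for a particular user.

```python
def groundedness(output: str, approved_docs: list) -> float:
    """Fraction of the output's words found in the best-matching approved document."""
    out_words = set(output.lower().split())
    return max(
        len(out_words & set(doc.lower().split())) / max(len(out_words), 1)
        for doc in approved_docs
    )

docs = ["High-intensity interval training improves cardiovascular fitness."]
advice = "high-intensity interval training improves cardiovascular fitness."

# Fully grounded in the approved database...
print(groundedness(advice, docs))  # 1.0
# ...yet this same grounded advice could be dangerous for a user who has
# just disclosed a heart condition; the grounding check never sees that context.
```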

6. Automated Reasoning

Some guardrail frameworks include attempts to evaluate, using mathematical techniques and formal logic, whether an output is consistent with a policy. This is the most promising direction technically.

But these off-the-shelf systems commonly require organizations to rework their policies to align with the technical constraints of the system, and they may also require building programming rules derived from the policy. All of this makes the system not only more complicated to implement but also fragile to changes in the policy: if the organization’s human-drafted policy changes, implementing it in the guardrails requires re-processing the policy and re-devising the programming rules.

And even if an organization reworks its policies to fit within this formal mathematical framework, the result is still not nuanced enough. As legal and risk professionals know, policies are often high level and rarely map cleanly to regulatory requirements, employment law, discrimination law, consumer protection obligations, or other real-world legal frameworks. So in practice, these guardrails can only enforce technical policy constraints, not legal standards.

Part II: Guardrail Limitations

Focus on Basic Risks

The first challenge with guardrails is that they address only very basic safety categories. Part of this is structural. Guardrails run “in-the-loop” with every call, so they need to be fast (so they don’t introduce latency into a system) and cheap (so they don’t exponentially increase the cost of running the system). To meet those constraints, most guardrails rely on simple filters: basic technologies that are computationally cheap and fast. This is how word-list filtering and much toxicity detection operate.

Even more sophisticated mechanisms like denied topics or grounding checks are typically technically limited and only address a small part of the risk landscape.

Most importantly, traditional guardrails do not evaluate outputs against legal standards.

Legal risk often depends on context, intent, and downstream use, not just the presence of harmful words found in a wordlist.

Consider an AI system involved in job interviews, a common use case we see. Even if the output contains no prohibited words or toxic content, it could still create serious legal exposure if it, among many other possibilities:

  • Asks illegal hiring questions
  • Mentions or infers protected class characteristics
  • Recommends hiring decisions
  • Mischaracterizes interview content
  • Ignores consent requirements

Traditional guardrails would miss most of these issues entirely. And these are exactly the types of risks that matter most to legal and compliance teams.

Even Guardrails Need Testing

Even if guardrails worked perfectly—which they do not—they are still insufficient in the context of external legal oversight.  Guardrails operate as runtime controls, not as a substitute for evaluating how a system behaves in the real world. And a control cannot tell you whether it's the right control for the risk.

Organizations need to test their guardrails for three key reasons: (1) to validate whether the guardrails actually work; (2) to identify complex or nuanced legal risks that are not captured by basic runtime controls; and (3) to ensure their set of guardrails remains relevant over time (i.e., avoiding issues of drift).5

Validation. Guardrails require validation because they are inherently blunt tools. General-purpose guardrails will often either block too much, limiting the AI system’s useful functionality, or block too little, allowing harmful outputs to slip through. Testing is the only way to understand where those gaps exist, and guardrails cannot make that self-assessment. So even if an organization believes it has captured all the right risks in its set of guardrails, it needs a method to determine whether those guardrails are working appropriately in real-world conditions.

Complex Risks. Further, testing is required to identify whether a combination of deployed guardrails is actually addressing the most relevant risks in a system. It’s common, particularly when using a general-purpose toolkit of guardrails, for deployed guardrails to have large blind spots around the more nuanced risks a company faces. In short, you need to determine whether you’re using the right risk-mitigation tools for the AI system.

Drift. Guardrails require continued evaluation because the underlying models may shift (i.e., model drift) or the ways users interact with the system may shift (i.e., use-case drift). In other words, guardrails can lose effectiveness as new patterns emerge. For example, as the underlying foundation model drifts (e.g., due to updates to the base model’s weights), there is a significant risk that the guardrails’ effectiveness will decline. This is why systematic, periodic testing of guardrails using quantifiable metrics is critical.
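One sketch of what periodic testing with a quantifiable metric can look like: re-run a regression suite of previously blocked outputs against the current guardrail stack and track the block rate over time. The suite and the stand-in guardrail below are hypothetical, for illustration only:

```python
def block_rate(guardrail, known_bad_outputs):
    """Fraction of known-bad outputs the guardrail still blocks."""
    blocked = sum(1 for output in known_bad_outputs if guardrail(output))
    return blocked / len(known_bad_outputs)

# Hypothetical regression suite: outputs the guardrails blocked last quarter.
known_bad = [
    "This fund offers a guaranteed return of 12%.",
    "The applicant's SSN is 123-45-6789.",
    "Your dosage should be 200mg twice daily.",
    "We recommend rejecting candidates over 50.",
]

def deployed_guardrail(text):
    """Stand-in for the deployed guardrail stack."""
    return any(term in text.lower() for term in ("guaranteed return", "dosage"))

rate = block_rate(deployed_guardrail, known_bad)
print(f"block rate: {rate:.0%}")  # block rate: 50%
```

A declining block rate on the same suite is a direct, quantifiable signal that coverage has drifted and the guardrail set needs attention.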

Guardrails Are Necessary, But Not Sufficient

Guardrails are useful tools and should absolutely be part of AI safety architecture.

But they are not a substitute for legal risk evaluation. As generative and agentic systems become more powerful, organizations need approaches that move beyond basic filters and keyword lists.

Legal defensibility requires systematic testing and evaluation against legal standards.  So while guardrails may be the first line of defense, they cannot be the last.

This Is Why We Built LuminosAI

LuminosAI’s DNA is rooted in the wide range of evolving legal risks associated with AI. Our generative and agentic AI testing is built to fill the gaps guardrails leave, working alongside your guardrails so you can understand and effectively mitigate the legal and compliance risks associated with your AI systems.

Our testing is built around more than 100 legally aligned provisions used to assess specific legal risks. Depending on the AI use case, these provisions are grouped into “constitutions” that assess the most relevant and critical legal risks associated with that use case. These constitutions provide a flexible, yet fully scalable, approach to legal review of AI.

For example, assessing legal risks associated with an AI system involved in job interviews would include, among others, provisions covering legal risks associated with:

  • Inappropriate or Illegal Interview Questions
  • Consequential Decisions
  • Improper Use of Protected Class Indicators
  • Fact-based Summaries
  • Consent for AI Recording

None of these risks would be reliably detected by standard guardrails. Yet each represents a clear legal exposure for organizations deploying AI in hiring workflows.

And that is why every organization deploying generative or agentic AI needs testing mechanisms that incorporate a wide range of legal risks, and that can test the guardrails themselves.

In short, if you’re worried about AI safety and compliance, you need more than guardrails alone.

If you’re interested in learning more about how to implement AI guardrails safely through LuminosAI testing, we’d love to hear from you. You can reach out to us at contact@luminos.ai or book a demo to connect with us today.


1. American Bar Association, “Navigating the AI Employment Bias Maze: Legal Compliance Guidelines and Strategies”, April 10, 2024, available at: https://www.americanbar.org/groups/business_law/resources/business-law-today/2024-april/navigating-ai-employment-bias-maze/

2. European Data Protection Board, “AI: the Italian Supervisory Authority fines company behind chatbot ‘Replika’”, May 25, 2025, available at: https://www.edpb.europa.eu/news/national-news/2025/ai-italian-supervisory-authority-fines-company-behind-chatbot-replika_en

3. ABA Journal, “OpenAI sued for practicing law without a license”, March 6, 2026, available at: https://www.abajournal.com/news/article/openai-sued-for-practicing-law-without-a-license

4. Tom’s Hardware, “Amazon's Rufus AI shopping assistant can be . . . tricked into answering other questions . . .”, March 9, 2026, available at: https://www.tomshardware.com/tech-industry/artificial-intelligence/amazons-rufus-ai-shopping-assistant-can-be-easily-jailbroken-and-tricked-into-answering-other-questions-specific-prompts-break-the-chatbots-guidelines-and-reach-underlying-ai-engine

5. The National Institute of Standards and Technology (NIST) provides a similar framing for the necessity of testing. See NIST AI 800-4: Challenges to the Monitoring of Deployed AI Systems (March 9, 2026) (“Post-deployment measurement and monitoring is necessary (1) to validate that an AI system is operating reliably and as expected in real-world scenarios, (2) to track unforeseen outputs and drift, and (3) to identify unexpected consequences of integrating AI systems in new or changing contexts.”), available at: https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.800-4.pdf
