Nine times out of ten, clients ask this question before launch: "What if your AI tells a customer something off the rails?" The honest answer: with no protection, it will. With three layers of protection, it almost never does. Here is how it works technically and what we guarantee.
Layer 1 — Prompt rules and white-list scope
The first layer isn't a "setting" — it's an architectural constraint. Any company's AI agent gets a system prompt along the lines of: "You are a sales assistant. Your job is to qualify the inbound request and book a meeting." Then come the hard prohibitions — the agent is NOT allowed to:
- quote a specific price without pulling the price from the price list via API;
- promise delivery timelines without pulling stock status via API;
- confirm a discount above 5% without escalating to a manager;
- answer questions outside the sales scope (claims, technical support, legal questions) — escalate.
If a question is outside the scope, the agent answers "I'll pass this question to a colleague" and creates a task in the CRM.
What this gives you: 70-80% of potential mistakes never happen, because the agent refuses to answer without confirmation from the system. It doesn't invent a price — it asks the API for the real one. It doesn't invent a date — it asks the calendar. This works because LLMs (Claude Opus 4.7, GPT-5) are good at instruction-following when the constraints are clearly spelled out.
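One way to picture Layer 1 is as guard functions that sit between the model and the customer, so the prohibitions are enforced in code as well as in the prompt. The sketch below is illustrative only: the names (`quote_price`, `confirm_discount`, `route_topic`, `Decision`), the topic labels, and the `fetch_price` callable standing in for the price-list API are all assumptions, not the actual Grow2.ai implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAX_AUTO_DISCOUNT_PCT = 5.0  # from the rule above: anything higher needs a manager
OUT_OF_SCOPE_TOPICS = {"claims", "technical_support", "legal"}  # hypothetical labels
HANDOFF_REPLY = "I'll pass this question to a colleague."


@dataclass
class Decision:
    reply: Optional[str]  # text that may go to the customer, if any
    escalate: bool        # True means: create a CRM task for a human
    reason: str           # short explanation, useful for the audit log


def quote_price(sku: str, fetch_price: Callable[[str], Optional[float]]) -> Decision:
    """Quote a price only when the price-list lookup actually returned one."""
    price = fetch_price(sku)  # stand-in for the real price-list API call
    if price is None:
        return Decision(HANDOFF_REPLY, True, "price not found in price list")
    return Decision(f"The current price for {sku} is {price:.2f}.", False, "price from API")


def confirm_discount(requested_pct: float) -> Decision:
    """Confirm small discounts; hand anything above the cap to a manager."""
    if requested_pct > MAX_AUTO_DISCOUNT_PCT:
        return Decision(HANDOFF_REPLY, True, "discount above 5% needs a manager")
    return Decision(f"I can confirm a {requested_pct:g}% discount.", False, "within limit")


def route_topic(topic: str) -> Decision:
    """Escalate anything outside the sales scope; otherwise let the agent proceed."""
    if topic in OUT_OF_SCOPE_TOPICS:
        return Decision(HANDOFF_REPLY, True, f"out of scope: {topic}")
    return Decision(None, False, "in scope")
```

For example, with a toy price list `{"A-100": 49.0}`, calling `quote_price("Z-999", price_list.get)` escalates instead of producing a number, which is exactly the "refuse without confirmation from the system" behavior described above.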
Layer 2 — LLM supervisor (a second model checks the first)
The second layer is a smaller, faster model that checks the first one's answer before it goes to the customer. At Grow2.ai the architecture looks like this:
- The Agent (Claude Opus 4.7 or GPT-5) generates a draft answer.
- The Supervisor (Claude Haiku 4.5 or GPT-5-mini) receives the original request + the draft + the rules.
- The Supervisor returns a JSON verdict (approve: true or false) with a reason.
- If approve=false, the draft is discarded and the agent regenerates or escalates.
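The four steps above can be sketched as a small control loop. This is a minimal sketch under stated assumptions: `generate` and `supervise` are placeholder callables standing in for the two model calls (no real SDK is used here), the JSON shape `{"approve": ..., "reason": ...}` is inferred from the description, and the limit of two attempts is an illustrative choice.

```python
import json
from typing import Callable, Optional

MAX_ATTEMPTS = 2  # assumption: one regeneration before handing off to a human


def parse_verdict(raw: str) -> dict:
    """Parse the supervisor's JSON; treat anything malformed as a rejection."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"approve": False, "reason": "supervisor returned invalid JSON"}
    if not isinstance(data, dict) or not isinstance(data.get("approve"), bool):
        return {"approve": False, "reason": "supervisor verdict missing 'approve'"}
    return {"approve": data["approve"], "reason": str(data.get("reason", ""))}


def answer_with_supervision(
    request: str,
    rules: str,
    generate: Callable[[str], str],              # hypothetical: the Agent model call
    supervise: Callable[[str, str, str], str],   # hypothetical: the Supervisor model call
) -> dict:
    """Generate a draft, have it checked, then send, regenerate, or escalate."""
    last_reason: Optional[str] = None
    for _ in range(MAX_ATTEMPTS):
        draft = generate(request)
        verdict = parse_verdict(supervise(request, draft, rules))
        if verdict["approve"]:
            return {"status": "send", "reply": draft, "reason": verdict["reason"]}
        last_reason = verdict["reason"]  # rejected draft is discarded
    return {"status": "escalate", "reply": None, "reason": last_reason}
```

The design choice worth noting is that the parser fails closed: if the supervisor's output cannot be read as a valid verdict, the draft is treated as rejected and never reaches the customer.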
What the supervisor checks: numbers (price against the price list), dates (a realistic meeting date), tone (brand voice), and promises (the agent didn't commit to something the company can't deliver). Cost: because the supervisor is a smaller model, it adds roughly $0.001-0.005 per request. At 10K requests/month that's an extra $10-50, far cheaper than one bad incident with a VIP customer.
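The cost estimate is simple enough to recompute for your own volume. The helper below is a sketch; `monthly_overhead` is a hypothetical name, and the per-request range is the one quoted above rather than a measured figure.

```python
def monthly_overhead(requests_per_month: int, cost_low: float, cost_high: float) -> tuple:
    """Return the (low, high) monthly supervisor cost in USD, rounded to cents."""
    return (
        round(requests_per_month * cost_low, 2),
        round(requests_per_month * cost_high, 2),
    )


low, high = monthly_overhead(10_000, 0.001, 0.005)
print(f"Supervisor overhead: ${low:.0f}-{high:.0f} per month")
# prints: Supervisor overhead: $10-50 per month
```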
Layer 3 — Human-in-the-loop (escalation + audit)
The third layer is a guaranteed human control point in two scenarios.
- Scenario A: the AI escalates on its own. If the confidence score falls below the threshold (usually 0.7) or the supervisor returns approve=false, the agent creates a CRM task tagged "manual review needed" and hands it to a manager with the full context attached.
- Scenario B: VIP segment and critical fields. Predefined segments always go through a human. The agent prepares a draft answer, the manager reviews it for 30 seconds, then sends or edits it.
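The two scenarios reduce to a small routing decision. The sketch below assumes the draft arrives with a confidence score, the supervisor's verdict, and a customer segment label. The `Draft` fields, the segment names, and the three route labels are hypothetical, and "critical fields" are left out for brevity.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.7             # "usually 0.7" per the text; tune per deployment
ALWAYS_HUMAN_SEGMENTS = {"vip"}        # hypothetical segment labels


@dataclass
class Draft:
    text: str
    confidence: float          # agent's confidence score in [0, 1]
    supervisor_approved: bool  # Layer 2 verdict
    customer_segment: str


def route(draft: Draft) -> str:
    """Decide whether a draft is sent automatically or goes to a human."""
    # Scenario A: low confidence or a supervisor rejection goes to manual review.
    if draft.confidence < CONFIDENCE_THRESHOLD or not draft.supervisor_approved:
        return "manual_review"
    # Scenario B: predefined segments always pass through a manager.
    if draft.customer_segment in ALWAYS_HUMAN_SEGMENTS:
        return "manager_approval"
    return "auto_send"
```

Scenario A is checked first, so a rejected draft for a VIP customer lands in manual review rather than in the quick 30-second approval queue.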
Audit: every agent answer is stored with a full log — the original request, the system prompt, the supervisor response, the final decision, and who confirmed it and how. If a customer writes "your bot told me 50%, where's the discount?", we find the full trail in 30 seconds.
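A record along these lines would carry the fields listed above. This is a sketch, not the actual storage schema: the field names, the JSON-lines format, and the `find_trail` lookup by `conversation_id` are assumptions chosen to keep the example self-contained.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Iterable, List


@dataclass
class AuditRecord:
    conversation_id: str       # hypothetical key used to look up a trail
    request: str               # the original customer request
    system_prompt: str         # the prompt the agent ran with
    supervisor_response: dict  # the Layer 2 verdict, stored as-is
    final_decision: str        # e.g. "auto_send", "manual_review"
    confirmed_by: str          # who confirmed the answer
    confirmed_via: str         # how it was confirmed
    timestamp: str             # ISO 8601, UTC


def now_utc() -> str:
    """Timestamp helper so every record uses the same format."""
    return datetime.now(timezone.utc).isoformat()


def to_json_line(record: AuditRecord) -> str:
    """Serialize one record as a single JSON line for append-only storage."""
    return json.dumps(asdict(record), ensure_ascii=False)


def find_trail(lines: Iterable[str], conversation_id: str) -> List[dict]:
    """Return every stored record for one conversation, in stored order."""
    records = (json.loads(line) for line in lines if line.strip())
    return [r for r in records if r["conversation_id"] == conversation_id]
```

With records stored this way, the "your bot told me 50%" complaint becomes a lookup by conversation that returns the request, the supervisor's verdict, and who confirmed the answer.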
What happens when it slips up anyway
Honestly: the agent handles 2-5% of requests suboptimally. Not "making up a price" — that's blocked by Layers 1-2 — but giving a templated answer where the customer expected personalization, or stalling on an unusual request. This isn't an "error" in the engineering sense — it's a drop in quality compared to your best manager. What we do about it: a weekly review for the first two months, a customer feedback loop, A/B testing on contentious fields. This isn't "set it and forget it" — it's an ongoing process.
What the protection does NOT give you
The anti-hype part. None of the three protection layers guarantees:
- empathy on an emotional request ("my father died today, I can't come in for the viewing" — the AI will understand the context and escalate, but it isn't a human response);
- flexibility on a non-standard offer ("let me pay 6 months upfront for a 30% discount" — that isn't in the prompt, so it escalates);
- intuition on "hot" signals (when a customer writes with nuances a human salesperson reads instantly and the AI misses).
An AI agent with three protection layers is a safety net, not magic. It gives you confidence that the basic mistakes are blocked. The hard part is still your team's work.