Most public "hallucination prevention" advice for LLMs is prompt-engineering folklore: tell the model to "only answer based on the provided context" and hope. Grow2.ai has run that approach in production and measured it. It works roughly 92% of the time, which is another way of saying it fails in about 8% of cases. We needed a hallucination rate below 0.4%, so we built a retrieval pipeline that makes hallucination structurally hard, not just discouraged.
The five stages
- Source-typed retrieval. Every retrieved chunk carries a structured source descriptor: { doc_id, section_id, version, last_modified, source_type }. The agent never sees naked text; it sees text plus provenance. (A TypeScript sketch of this shape follows the list.)
- Quote-or-defer rule. The agent is instructed (in code, not just prompt) that any factual claim must either quote a chunk or explicitly say "I don't have a source for that — let me check with the team." The escalation path is real: those messages route to a human, who answers and writes a new chunk. (Sketched below, together with the citation marker and the gap log.)
- Citation rendering. Every reply that contains a quoted claim renders an inline citation marker (see the combined sketch after this list). To the user, this can be styled invisibly, but the marker is in the message thread, and it becomes evidence if a dispute arises later.
- Post-hoc validation. A second, smaller model reads the agent's reply and the chunks it cited. If the reply contains a factual claim not covered by the chunks, the validator flags the reply (a sketch of this gate also follows the list). We sample 100 conversations a week through this gate; any flag becomes an eval-set entry.
- Knowledge-gap log. Every "I don't have a source for that" deferral writes a row to a gaps table. Ops triages weekly. New chunks get added. The agent gets quietly smarter, on a per-client basis, without retraining anything.
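For concreteness, here is a minimal TypeScript sketch of the stage-1 source descriptor and the chunk wrapper the agent sees. The five descriptor fields come from the list above; the SourceType union members and the RetrievedChunk name are illustrative assumptions, not the production schema.

```ts
// Stage 1: provenance attached to every retrieved chunk.
// The union members below are illustrative; real deployments define their own.
type SourceType = "help_center" | "internal_wiki" | "policy_doc" | "human_answer";

interface SourceDescriptor {
  doc_id: string;
  section_id: string;
  version: string;
  last_modified: string; // ISO-8601 timestamp
  source_type: SourceType;
}

// The agent never receives naked text: retrieval always returns
// the text paired with its descriptor.
interface RetrievedChunk {
  text: string;
  source: SourceDescriptor;
}
```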
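Next, a hedged sketch of the quote-or-defer rule (stage 2), the inline citation marker (stage 3), and the knowledge-gap write (stage 5). The function name, the verbatim-substring test for "did the draft quote a chunk", the citation marker format, and the gaps-table shape are all assumptions standing in for richer production rules.

```ts
// Stages 2, 3 and 5 in one pass. Names and checks here are illustrative.
// Uses the RetrievedChunk type from the source-descriptor sketch.
const DEFERRAL =
  "I don't have a source for that — let me check with the team.";

interface GapRow {
  client_id: string;
  question: string;
  logged_at: string;
}

// Decide what to send. If the draft quotes a retrieved chunk verbatim,
// append an inline citation marker (stage 3) and send it. Otherwise defer
// (stage 2) and log a knowledge gap (stage 5).
async function quoteOrDefer(
  clientId: string,
  question: string,
  draft: string,
  chunks: RetrievedChunk[],
  logGap: (row: GapRow) => Promise<void>,
): Promise<string> {
  // Crude stand-in for the real "does the reply quote a chunk" check.
  const quoted = chunks.find((c) => draft.includes(c.text));

  if (quoted) {
    const s = quoted.source;
    // Inline citation marker; can be styled invisibly in the UI,
    // but it stays in the message thread as evidence.
    return `${draft} [${s.doc_id}/${s.section_id}@${s.version}]`;
  }

  await logGap({
    client_id: clientId,
    question,
    logged_at: new Date().toISOString(),
  });
  return DEFERRAL; // routes to a human, whose answer becomes a new chunk
}
```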
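And a sketch of the post-hoc validation gate (stage 4). The validator prompt, the NONE convention, and the flag shape are assumptions; the only contract taken from the text above is that a second, smaller model reads the reply plus its cited chunks and flags any unsupported claim.

```ts
// Stage 4: a second, smaller model audits the reply against its cited chunks.
// `validatorModel` is an assumed abstraction over whichever model is used.
// Uses the RetrievedChunk type from the source-descriptor sketch.
interface ValidatorFlag {
  conversation_id: string;
  unsupported_claim: string;
}

async function validateReply(
  conversationId: string,
  reply: string,
  citedChunks: RetrievedChunk[],
  validatorModel: (prompt: string) => Promise<string>,
): Promise<ValidatorFlag | null> {
  const prompt = [
    "Below is an agent reply and the source chunks it cited.",
    "Quote one factual claim in the reply that is NOT supported by the chunks, or answer NONE.",
    `REPLY:\n${reply}`,
    `CHUNKS:\n${citedChunks.map((c) => c.text).join("\n---\n")}`,
  ].join("\n\n");

  const verdict = (await validatorModel(prompt)).trim();
  if (verdict === "NONE") return null;

  // Any flag becomes an eval-set entry; that routing lives outside this sketch.
  return { conversation_id: conversationId, unsupported_claim: verdict };
}
```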
The numbers across 14 pilots
- Hallucination rate (validator-flagged): 0.31% across 71,200 sampled conversations.
- Deferral rate ("let me check with the team"): 6.4% of conversations contain at least one deferral; 88% of those resolve without further escalation once the human supplies an answer.
- Knowledge-gap closure: median time from gap-logged to chunk-added is 4 days, with weekly triage.
What it doesn't solve
Citation-first does not stop the model from getting tone wrong. It does not stop the model from selecting the wrong chunk when two chunks contradict (we have a separate "chunk-conflict" flag for that). It does not stop the model from answering a question the user didn't actually ask — the classic LLM failure where the reply is technically correct but addresses a different problem.
Hallucination is only a small part of agent quality. Grow2.ai treats it as a solved problem at the architecture level so we can spend prompt-engineering time on the harder problems: tone, escalation timing, and conversational repair when the user is upset.
Why we don't open-source the code
We get asked. The honest answer: the pipeline is ~600 lines of TypeScript and SQL, and it's not the interesting part. The interesting parts are the per-client chunk schemas, the deferral rules, and the eval set. None of those are extractable into a standalone library — they live and breathe with the workflow they serve. Grow2.ai will walk a serious technical buyer through the pipeline on a call. We will not ship a generic version.