Prospective clients ask Grow2.ai about model choice on the first call. The honest answer — "it depends, and the answer changes every six weeks" — is unsatisfying, so we keep getting asked for a published matrix. Here is why we still refuse to ship one.
The shelf-life problem
Grow2.ai ran the same 14-day pilot in February 2024 and February 2026. Same client type (dental clinic), same module (M-02 Booking Daemon), same evaluation harness. The 2024 stack: GPT-4 + Pinecone + Twilio. The 2026 stack: Claude Sonnet + pgvector + Twilio. If we had published a matrix in February 2024, half of it would have been wrong by Q3 2024, and embarrassing by 2026.
The signal-to-noise problem
Model benchmarks at the frontier change weekly. Our internal evaluations on our own conversations shift monthly as model providers ship updates. A "definitive" matrix on a public page sets an expectation we cannot maintain at the cadence the technology actually moves. The only honest version is "we re-evaluate per pilot, and your answer may differ from the one your competitor's pilot produced last month."
What we say instead
When a client asks "which model are you using?" on the first call, Grow2.ai answers directly: "Today, Claude Sonnet for replies, GPT-4.1 for some specialised tasks. We re-evaluate per pilot. The system prompt and integration are yours; switching the underlying model is a four-hour job in the contract you own."
That last sentence is what actually matters. The model is not the lock-in. The prompt, the evaluation harness, and the integration plumbing are. We hand all three over on Day 14 — which means the client could fire us, switch providers, and run a different model with the same agent shape on Day 15.
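The reason a model swap stays a four-hour job is structural: the agent owns the prompt and the integration, and the model sits behind a single adapter. A minimal sketch of that shape, with hypothetical names and stubbed adapters standing in for real provider SDK calls (this is illustrative, not Grow2.ai's actual code):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Agent:
    # The client-owned parts: the prompt travels with the agent, not the model.
    system_prompt: str
    # The only model-specific seam: (system_prompt, user_msg) -> reply.
    complete: Callable[[str, str], str]

    def reply(self, user_msg: str) -> str:
        return self.complete(self.system_prompt, user_msg)

# Two interchangeable adapters. Real ones would call the provider's SDK;
# these stubs just tag the reply so the swap is visible.
def claude_adapter(system_prompt: str, user_msg: str) -> str:
    return f"[claude] {user_msg}"

def gpt_adapter(system_prompt: str, user_msg: str) -> str:
    return f"[gpt] {user_msg}"

agent = Agent(system_prompt="You book dental appointments.",
              complete=claude_adapter)
print(agent.reply("Any slots Tuesday?"))  # served by one provider today

agent.complete = gpt_adapter              # the "four-hour job": swap one adapter
print(agent.reply("Any slots Tuesday?"))  # same prompt, same agent shape
```

Everything the client keeps on Day 14 lives above the adapter line; everything that changes in a provider switch lives below it.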