Field notes from the studio. Things we learned, written down so we don't forget.
Not a blog. A working file Grow2.ai kept on purpose.
Things Grow2.ai got wrong, fixes that survived, evaluations of new models, prompts that earned their keep, and the occasional opinion we'd defend in writing. Posted when worth posting; never on a schedule. Mostly written for the version of us that will run into the same problem in eight months.
Why we stopped fine-tuning, and started writing better prompts in fewer words
Fine-tuning a 70B model on six months of customer messages cost us €4,200 and produced an agent measurably worse than Claude Sonnet with a 1,400-word system prompt. Here's the full evaluation, the cost ledger, and the framework we now use to decide.
Per-language tone profiles — what they actually look like, with examples
A walkthrough of the tone document we co-write with the client's actual receptionist before the agent goes live, plus the three things that always go wrong on the first draft.
Why we don't publish a model selection matrix
Tempting to ship a "we use Claude for X, GPT for Y, Llama for Z" page. We refuse. The reasoning behind that, and what we say when clients ask.
The 4-hour no-show recovery loop · Clinica Via post-mortem
Why T-72h / T-24h / T-2h beats T-48h / T-2h for Ukrainian dental patients, and the opt-out rate trade-off we measured before settling.
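The cadence reads naturally as a tiny scheduling helper. This is a minimal sketch of that idea; the function name and datetime-based shape are our illustration here, not Grow2.ai's actual implementation:

```python
from datetime import datetime, timedelta

# The T-72h / T-24h / T-2h cadence from the note, as hours before the appointment.
CADENCE_HOURS = (72, 24, 2)

def reminder_times(appointment: datetime, cadence=CADENCE_HOURS) -> list[datetime]:
    """Return the timestamps at which each reminder should fire."""
    return [appointment - timedelta(hours=h) for h in cadence]

# A 9:00 appointment on Jan 10 yields reminders on Jan 7 09:00,
# Jan 9 09:00, and Jan 10 07:00.
print(reminder_times(datetime(2025, 1, 10, 9, 0)))
```

Swapping the tuple to `(48, 2)` reproduces the T-48h / T-2h cadence the note compares against, which is what made the A/B measurement cheap to run.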
When to walk away from a pilot — three real conversations
Three engagements we said no to in 2025, why, and the four-question gate we now use before signing the brief. Including the one we should have walked away from but didn't.
Hostaway → agent: how we built the per-unit knowledge wiki for Atlas
Six weeks of integration work, two days of human annotation, one stubbornly broken kettle in unit 47. The structure of an apartment's knowledge entry.
Real estate agents and AI agents — a Berlin field report
Three weeks shadowing a Berlin short-let operator. What surprised us about the operations layer, the messages real guests send at 2am, and the bias toward formality we accidentally trained.
Citation-first answers — the workflow, not the prompt
Citations aren't a prompt instruction; they're a retrieval architecture. The five-stage pipeline that gets us below 0.4% hallucination, with code-shaped pseudocode.
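To make "citations are architecture, not a prompt" concrete: a toy sketch in which the answer is assembled only from retrieved passages, each carrying a source id, so every claim arrives with its citation attached. The stage names and all identifiers below are our illustrative assumptions; the note only says the real pipeline has five stages.

```python
from dataclasses import dataclass

@dataclass
class Passage:
    source_id: str  # hypothetical knowledge-base id, e.g. "kb:47"
    text: str

def answer_with_citations(question: str, corpus: list[Passage]) -> str:
    # 1. retrieve: naive keyword overlap stands in for a real retriever
    scored = [(sum(w in p.text.lower() for w in question.lower().split()), p)
              for p in corpus]
    # 2. rank: order by match score, drop non-matches
    ranked = [p for s, p in sorted(scored, key=lambda x: -x[0]) if s > 0]
    # 3. select: cap the evidence set the answer may draw on
    evidence = ranked[:2]
    # 4. compose: quote evidence verbatim instead of free generation
    # 5. cite: attach the source id to every quoted span
    return " ".join(f"{p.text} [{p.source_id}]" for p in evidence)

corpus = [
    Passage("kb:47", "The kettle in unit 47 needs a firm press on the lid."),
    Passage("kb:12", "Checkout is at 11am."),
]
print(answer_with_citations("why is the kettle broken", corpus))
```

The point of the shape: the model (here, a string join) never sees text without a source id, so an uncited sentence is structurally impossible rather than merely discouraged by a prompt.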
Speaking Polish to a Polish person — what cross-language tone work costs
A small case study on the per-language tone profile cost: ~3 hours of native-speaker time, ~€340 in our model, the smallest line item with the largest CSAT impact.
A tale of two CRMs — Bitrix24 vs. KeyCRM, what to expect
Two of the most common Ukrainian CRMs on our integration list. Where each one helps the agent, where each one fights us, and the field-mapping pattern we use to keep both reliable.
How dental clinics actually book — and why no-show recovery is mostly listening
Six weeks of front-desk shadowing across three Ukrainian clinics. Why the right cadence is unsexy, why the daemon's tone matters more than the cadence, and what the human still has to do.
The "we paid for it once" promise — what it costs us, what it buys you
Our hand-off package — architecture, prompts, integration credentials. The maintenance cost of "you can fire us and keep the agent." Why we keep eating it.
Auto Pivdenny's eight rules — the qualification ruleset, in full
The actual eight qualification rules we co-wrote with Pavlo on Day 1 of his pilot, why each one is there, and the two we tried to add later that didn't work.
Field notes — why we're writing them at all
A short note on what this journal is, what it's not (a blog), and why anything we publish here is something we'd defend in a courtroom.
Found a note worth quoting? Bring it to the call.
Grow2.ai writes field notes for itself first. If one of them describes your situation — that's usually a good sign we should talk.