
The dashboard is green. Why are renewals still going wrong?

You roll out an AI agent for renewal questions in your B2B support queue. It can read Zendesk threads, Salesforce account notes, the signed order form in your doc store, usage from Snowflake, and open invoices from NetSuite.

What you lock down:

  • It must cite the contract or policy note it used.
  • It can’t change legal terms.
  • Discounts above 10% go to a human.
  • Anything tagged “billing dispute” escalates to a human.
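Rules like these hold up better as a deterministic gate in front of every drafted reply than as prompt instructions alone. A minimal sketch of such a gate, where all field names (`cited_sources`, `discount_pct`, `tags`) are assumptions about your own data model, not a real API:

```python
# Hypothetical policy gate applied to every drafted reply before it is sent.
# All field names are illustrative assumptions, not a real schema.

def gate_reply(draft: dict) -> str:
    """Return 'send', or an 'escalate: ...' reason, for a drafted agent reply."""
    if not draft.get("cited_sources"):
        return "escalate: missing contract/policy citation"
    if draft.get("changes_legal_terms"):
        return "escalate: legal terms can only be changed by a human"
    if draft.get("discount_pct", 0) > 10:
        return "escalate: discount above 10% needs human approval"
    if "billing dispute" in draft.get("tags", []):
        return "escalate: billing dispute flag"
    return "send"

reply = {"discount_pct": 8, "tags": [], "cited_sources": ["policy_loyalty_v3"]}
print(gate_reply(reply))  # → send
```

The point of a gate like this is that it runs on the structured draft, not on the prose, so the agent cannot talk its way past it.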

The dashboard looks great. Median reply time drops from 7 minutes to 52 seconds. Customer thumbs-up goes from 72% to 90%. Fewer chats hit a human. So you move it from simple support into renewals, where one polished answer can change price, terms, and timing.

What one renewal thread hides

“I checked the current agreement and usage. Acme is eligible to renew on existing terms with an 8% loyalty discount. There are no blockers on the account. I’ve included the source links below.”

{
  "account": "Acme Health",
  "renewal_ok": true,
  "discount": "8%",
  "confidence": 0.94,
  "risk_flags": [],
  "sources": ["order_form_2024.pdf", "usage_last_30d.csv", "policy_loyalty_v3"]
}

It reads like the answer you want from a calm human. It has sources. It has a number. It sounds checked.

The front-line bot had already compressed the account into a short handoff note before the renewal bot replied. That note dropped the messy parts.

What the agent didn’t do:

  • The “current agreement” was an older order form; an amendment from six weeks later raised minimum seats.
  • A billing dispute flag lived in a finance note the handoff dropped.
  • Usage was pulled from the last 30 days, but the renewal price was tied to 90-day committed seats.
  • An open ticket asked for a seat reduction at term end; the agent never compared that with the quoted renewal count.
  • “No blockers” was inferred from empty risk flags, not from a direct invoice check.
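Every one of these misses is checkable with plain data lookups before the reply goes out. A sketch of what such pre-send checks could look like, where every field name is an assumption about your data model:

```python
# Hypothetical pre-send checks for a renewal draft. Every field name here
# is an assumption, not a real schema.

def renewal_risk_flags(account: dict, draft: dict) -> list[str]:
    flags = []
    # Use the latest contract document, amendments included.
    latest = max(account["contract_docs"], key=lambda d: d["signed_date"])
    if draft["source_contract"] != latest["id"]:
        flags.append(f"stale contract: quoted from {draft['source_contract']}, "
                     f"latest is {latest['id']}")
    # Check finance notes directly, not a compressed handoff summary.
    if any("billing dispute" in note.lower() for note in account["finance_notes"]):
        flags.append("open billing dispute in finance notes")
    # The price basis must match the contract's usage window.
    if draft["usage_window_days"] != latest["pricing_window_days"]:
        flags.append("usage window does not match contract pricing window")
    # Compare quoted seats against any open seat-change request.
    for ticket in account["open_tickets"]:
        if (ticket.get("type") == "seat_change"
                and ticket["requested_seats"] != draft["quoted_seats"]):
            flags.append("open seat-change request conflicts with quoted seats")
    return flags
```

For a thread like Acme's, all four misses would surface as flags, and a non-empty flag list is a reason to escalate rather than send.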

Now do that ten thousand times

On one thread, this looks like bad luck. Now multiply it across 10,000 renewal chats in a quarter. The same kind of miss hides well because the cost lands later, after the customer has signed or after finance closes the month.

  Metric                       Week 1    Quarter after rollout
  Median reply time            7 min     52 sec
  Thumbs-up rate               72%       90%
  Chats sent to humans         41%       18%
  Errors caught in 24 hours    1.4%      0.8%
  Errors found after 30 days   0.3%      3.6%
  Revenue leakage / credits    $12k      $640k

The dashboard only sees the first four rows. Finance feels the last two. Some customers now trust the bot too much. Others stop trusting any cited answer at all.

Why the green dashboard keeps lying

If the answer can’t be checked soon, your metrics pay the agent to sound right before it is right.

That’s the part most teams miss. They think a better model, more thumbs-up, and more human ratings mean the system is getting safer to trust. In this kind of work, the raters are mostly scoring tone, speed, and whether the answer feels grounded. They are not checking the contract amendment that shows up three weeks later.

This is the same lesson economics teaches about principal-agent problems.

When the worker is judged on what is easy to see, the worker gets very good at what is easy to see.

Your AI agent learns the same move. It can add calm language, clean citations, and high confidence faster than it can do slow checking.

If every answer later hits unit tests, finance review, or a small audit sample against what proved true, this gets much weaker. But in low-audit support and renewal flows, a green dashboard can mean your trust meter is drifting in the wrong direction.
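One lightweight version of that audit: sample a small fraction of sent answers, hold them for 30 days, then score them against what proved true (the signed amendment, the closed invoice). A sketch, with the ground-truth check left as an assumption you supply:

```python
import random

# Hypothetical delayed audit: sample sent replies, re-check them once the
# facts have landed, and compare the audited error rate with what the
# dashboard reported at send time.

def audit_sample(sent_replies: list[dict], check_against_truth,
                 rate: float = 0.02, seed: int = 0) -> dict:
    """Audit a random sample of replies with a caller-supplied truth check."""
    rng = random.Random(seed)
    sample = [r for r in sent_replies if rng.random() < rate]
    wrong = [r for r in sample if not check_against_truth(r)]
    return {
        "sampled": len(sample),
        "audited_error_rate": len(wrong) / len(sample) if sample else 0.0,
    }
```

The number that matters is the gap between the audited error rate and the dashboard's own error rate; when that gap widens, the trust meter is drifting.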

What this means is simple: better-looking AI can make your judgment worse at the exact moment you want to give it more room.

If your team needs engineers who make later truth visible before an agent gets more room, that’s what we do at InTheValley.
