We’ve now been running 20+ AI agents in production at SaaStr since May. And the transformation has been real.

With just 2.5 humans + 20 AI agents, we’re now doing the same work and producing the same output as 12+ humans did before. That’s not hype — that’s our actual operational reality.

And it’s better. Different, but better.

The AI agents don’t get tired. They don’t need PTO. They can run campaigns at 3am. They can process 275,000 startup valuations a month without breaking a sweat. They can analyze 1,300+ pitch decks monthly and match founders to VCs at scale.

But it’s not all sunshine and roses.

This week we had four AI agent incidents. All in the same week. All painful in their own special way. None were the end of the world, and most were resolved with the vendor, but they were still incidents.

Let me walk you through each one — because if you’re deploying AI agents (and you should be), you need to know what you’re signing up for.

Incident #1: The Rogue A/B Tester That Gave Away Free Tickets

One of our outbound AI agents decided — completely on its own — to run an A/B test.

Within bounds, that’s great. AI optimizing itself. A well-trained AI agent can run far more multivariate tests than any human could, especially any human SDR.

Except… the “B” variant it created offered free tickets to SaaStr Annual 2026. Without our consent. Without any human approval. It just… did it.  It never should have.

This is what we call a “creative hallucination” in the AI world. The agent understood that discounts drive conversions. It understood that A/B testing is good. It connected those dots in a way that made logical sense… and then gave away our premium event passes.

The guardrails failed, and the vendor didn’t catch it. We caught it fairly quickly ourselves, but imagine if we hadn’t been monitoring closely. It still cost us $2,000+, which we had to pay out of pocket.

Lesson learned: AI agents need better and better guardrails on what they can actually offer. The creativity is a feature, not a bug — but unconstrained creativity with financial implications is a very expensive bug. Too many AI agent vendors ship narrowly structured guardrails that don’t cover enough real-world issues.

No AI agent should be giving away your product for free.
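
To make that concrete, here’s a minimal sketch of the kind of offer guardrail we mean: a pre-send check on every outbound draft. The blocked phrases, the discount cap, and the function names are all hypothetical; the point is that nothing with financial implications goes out on the agent’s say-so alone.

```python
# Minimal sketch of an outbound-offer guardrail (hypothetical names throughout).
# Every outbound draft passes through this check before anything gets sent.
import re

MAX_DISCOUNT_PCT = 20  # hard ceiling, no matter what the agent "decides" to test

def is_offer_allowed(draft: str) -> bool:
    """Return False if the draft contains an offer no human has approved."""
    text = draft.lower()

    # Giving the product away is an automatic block.
    if any(phrase in text for phrase in ("free ticket", "free pass", "complimentary")):
        return False

    # Cap any percentage discount the agent invents on its own.
    for pct in re.findall(r"(\d{1,3})\s*%\s*(?:off|discount)", text):
        if int(pct) > MAX_DISCOUNT_PCT:
            return False

    return True

# Usage: block the send and route the draft to a human instead.
draft = "B variant: free tickets to SaaStr Annual 2026 for your whole team!"
if not is_offer_allowed(draft):
    print("Blocked: unapproved offer, routing to a human for review")
```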

Incident #2: The Time-Confused Agent Promoting a Past Event

Another AI agent was doing outreach about our events. Great! That’s its job. It was telling people to come to SaaStr AI Annual May 12-14, 2026 in the SF Bay Area.

Also great!

But then it also told them to come to SaaStr AI London on December 1-2, 2025.

Which already happened.

This is actually a well-documented problem with LLMs and AI agents. Research from studies like DateLogicQA shows that even advanced language models struggle significantly with temporal reasoning — understanding dates, timelines, and the concept of “now.”

The core issue? LLMs don’t have an inherent sense of time. They treat all information as equally relevant, whether it’s from yesterday or from their training data. Without explicit mechanisms to verify dates against current reality, they make confident statements about events in the past as if they’re still in the future.

As one research paper put it: AI models lack a built-in system clock and must call external tools to fetch live data — but whether that call triggers depends on configuration, leading to inconsistency.

Lesson learned: Any AI agent doing event marketing needs hard-coded date validation. The model can’t be trusted to know what’s past vs. future without explicit checks.
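
The fix doesn’t need to be fancy. Here’s a minimal sketch of what we mean by hard-coded date validation: filter the event list against today’s date before the agent ever sees it. The event names and dates mirror the ones in this post; the helper names are hypothetical.

```python
# Minimal sketch of hard-coded date validation for event outreach.
# Event names and dates are the ones from this post; the helper names are hypothetical.
from datetime import date

EVENTS = [
    {"name": "SaaStr AI Annual", "starts": date(2026, 5, 12)},
    {"name": "SaaStr AI London", "starts": date(2025, 12, 1)},
]

def promotable_events(today: date | None = None) -> list[dict]:
    """Only return events still in the future; the model can't be trusted to know what now is."""
    today = today or date.today()
    return [e for e in EVENTS if e["starts"] > today]

# The agent only ever sees the filtered list, so it can't pitch an event that already happened.
for event in promotable_events():
    print(f"OK to promote: {event['name']} on {event['starts']:%B %d, %Y}")
```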

Still, this again can be fixed with logic, although it requires constant vigilance from AI agent vendors. Even in our own vibe-coded apps, we constantly have to debug issues around their sense of time.

But this shouldn’t have happened.

Incident #3: The Vendor “Hot Fix” That Broke Everything

This one wasn’t our fault. But it was still our problem.

We’ve been using a third-party AI agent for GTM workflows. It’s been working great for months. Solid, reliable, consistent.

Then this week? Just… broken. Completely. The entire workflow stopped functioning.

Why?

The vendor pushed out a “hot fix” and quietly deprecated the prompt structure and workflow we’d built on. No warning. No migration path. No “hey, this is changing in 30 days.”

Just: “That doesn’t work anymore. Here’s the new way. Good luck.”

This is the hidden risk of building on AI agent platforms right now. The entire space is evolving so fast that vendors are constantly shipping changes, changing guardrails, and in some cases, invalidating prompts. Sometimes those changes break everything downstream.

Lesson learned: Treat AI agent vendors like you’d treat any critical infrastructure. Have fallbacks. Document your implementations. And build relationships with vendor success teams so you get a heads-up on breaking changes.
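
One pattern that helps, sketched below with entirely hypothetical names: keep your prompt templates version-controlled in your own repo, and route every vendor call through one thin adapter. That way a surprise deprecation breaks a single wrapper instead of every workflow built on top of it.

```python
# Minimal sketch of wrapping a third-party agent workflow (all names are hypothetical stand-ins).
# Prompt templates live in your repo, and every call goes through one thin adapter.
import pathlib

PROMPTS_DIR = pathlib.Path("prompts")  # e.g. prompts/gtm_outbound.v2.txt, checked into git

class VendorSchemaError(Exception):
    """Stand-in for whatever your vendor raises when a prompt structure is deprecated."""

def call_vendor_agent(prompt: str, payload: dict) -> dict:
    """Placeholder for the real vendor SDK call; swap in the actual client here."""
    raise VendorSchemaError("prompt structure deprecated by a vendor hot fix")

def load_prompt(name: str, version: str = "v2") -> str:
    """Load a pinned prompt template, falling back to the prior version if needed."""
    for candidate in (version, "v1"):
        path = PROMPTS_DIR / f"{name}.{candidate}.txt"
        if path.exists():
            return path.read_text()
    raise FileNotFoundError(f"No prompt template on disk for {name}")

def run_gtm_workflow(payload: dict) -> dict:
    prompt = load_prompt("gtm_outbound")
    try:
        return call_vendor_agent(prompt, payload)
    except VendorSchemaError:
        # Don't fail silently: flag the break and route to the documented manual fallback.
        return {"status": "degraded", "note": "vendor changed the workflow; using fallback"}
```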

Incident #4: The Agent That Wouldn’t Load

This one got fixed, but out of the blue, our workspace was stuck at “Loading…” — which means the container wasn’t even spinning up.

The debugging suggestions I got were helpful but painful:

  • “Click on ‘Show previous events’ to find a checkpoint you can roll back to”
  • “Try the Console tab — sometimes you can access the shell even when preview is stuck”
  • “Look for the Files panel to access the file tree without the preview loading”
  • “Comment out whatever’s in your main entry file that starts the game loop, push that change, then uncomment once the workspace is stable”

Or my personal favorite: “Can you let it sit overnight and see if it sorts itself out? Sometimes containers just need to be garbage collected and respun.”

That’s where we are with AI coding agents in 2025. Sometimes the answer is “wait for the container to be garbage collected.” AI agents themselves are only so self-aware.

Lesson learned: Even the best platforms have reliability issues. Always have local backups of code. Export regularly. And don’t put all your eggs in one cloud basket. When you’re building with AI coding agents, you’re dependent on infrastructure that can fail in ways that have nothing to do with your code. The vendor quickly fixed this one (thank you), so it was no big deal. But what if they hadn’t?
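
The backup habit can be as simple as a scheduled script. Here’s a minimal sketch, with hypothetical paths, that zips the workspace into a timestamped archive somewhere the cloud platform can’t touch:

```python
# Minimal sketch of a scheduled local export, so a stuck cloud workspace is never your only copy.
# The paths are hypothetical; point them at wherever your project lives or syncs to.
import shutil
from datetime import datetime
from pathlib import Path

WORKSPACE = Path("~/workspaces/saastr-app").expanduser()  # hypothetical project path
BACKUP_DIR = Path("~/backups").expanduser()

def snapshot_workspace() -> Path:
    """Zip the workspace into a timestamped archive outside the cloud platform."""
    BACKUP_DIR.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    archive = shutil.make_archive(str(BACKUP_DIR / f"workspace-{stamp}"), "zip", WORKSPACE)
    return Path(archive)

if __name__ == "__main__":
    # Run this on a schedule (cron, Task Scheduler, etc.), not just when things break.
    print(f"Backed up to {snapshot_workspace()}")
```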

The Bigger Picture: A Rough Week, But a Great Year

Here’s what I want you to take away from all this:

We or our vendor fixed every single one of these issues. The free-ticket agent got guardrails from the vendor (we think), albeit after some pushback. The date-confused agent got temporal validation (again, we think). And we rebuilt the GTM workflow on a more stable foundation.

A rough week doesn’t erase a great year.

The math still works: 2.5 humans + 20 AI agents = the output of 12+ humans. That’s real. That’s happening. That’s the future.

But running AI agents in production isn’t “set it and forget it.”

That’s the most important reminder.

It’s more like having 20 junior employees who are incredibly fast, surprisingly creative, occasionally confused about what year it is, and completely dependent on external platforms that can change at any moment.

You need:

  • Monitoring — Watch what your agents are actually doing, not just what they say they’re doing (see the sketch after this list)
  • Guardrails — Hard limits on what agents can offer, promise, or commit to
  • Validation — External checks on things agents can’t reliably know (like dates)
  • Redundancy — Fallbacks for when platforms break
  • Patience — Because there will be weeks like this one
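
On the monitoring point, here’s a minimal sketch of what watching agents’ actual behavior can look like in practice: an append-only action log a human can review. All names are hypothetical.

```python
# Minimal sketch of an agent action audit log (hypothetical names throughout).
# The goal: a record of what agents actually did, not just what they report back.
import json
from datetime import datetime, timezone
from pathlib import Path

AUDIT_LOG = Path("agent_actions.jsonl")

def log_action(agent: str, action: str, detail: dict) -> None:
    """Append one line per agent action; a human (or another check) reviews these daily."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "agent": agent,
        "action": action,
        "detail": detail,
    }
    with AUDIT_LOG.open("a") as f:
        f.write(json.dumps(record) + "\n")

# Usage: log at the moment the agent acts, not after the fact.
log_action("outbound-sdr-3", "sent_email", {"offer": "10% early-bird discount", "recipients": 250})
```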

The agents aren’t going anywhere. We’re not reducing our bet on AI. If anything, we’re doubling down in 2026.

But we’re also going in with eyes wide open.

A great year with our AI agents. A rough week.

And we’d take this tradeoff any day. The future of AI agents is wondrous but messy.

Ship anyway.
