Amelia and I just shipped the first episode of our new podcast, The Agents. The premise is simple: we run 20+ AI agents at SaaStr, three humans, revenue went from -19% to +47% YoY, and we’re going to talk every week about what actually works, what breaks, and what’s coming.
No hype. No roadmap theater. Just what we’re running, what we’re seeing in the portfolio, and what B2B + AI founders should actually be doing about it.
Here are the top 10 learnings from Episode #001.
1. Nobody Talks About Who Maintains Your Vibe-Coded Apps. We Do. Daily.
Everyone is having a blast building on Replit, Lovable, and v0. The demos are magical. The first working version ships in an afternoon. And then Monday comes.
Here’s the thing nobody puts in the tweet thread: vibe-coded apps need someone to maintain them. Daily. Not once a quarter. Not when something breaks. Daily.
At SaaStr we’ve shipped 10+ production apps with 750K+ combined uses. Every single one needs a product-savvy human checking on it. Not because the tools are bad. Because agents drift, data changes, models update, integrations flake, and all of that happens whether you’re watching or not.
If your “AI strategy” is “we built some apps on Replit,” you’ve done the fun part. The job starts now. Figure out who owns maintenance for each app before you ship the next one.
2. Hallucinations Aren’t Gone. They’re Daily Maintenance.
The industry narrative right now is that hallucinations are mostly solved. In our production environment, this is not what we see.
10K, our AI VP of Marketing, compared the wrong year in an analysis this week. In a different analysis, it made up a data point entirely. These aren’t catastrophic. They aren’t rare either. They’re daily maintenance items, and if nobody’s reviewing outputs, they ship to customers as facts.
The good news: we catch them quickly now, because we’ve built the muscle. The bad news: anyone running AI agents in production without a daily review loop is publishing hallucinated data to their customers right now and doesn’t know it.
The fix isn’t a better model. The fix is a process. Someone reviews outputs every day. No exceptions.
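The human review is the fix, but you can give the reviewer a head start with a crude automated first pass. Here’s a minimal sketch (entirely hypothetical, not how 10K works internally) that flags years and figures in an agent’s output that never appear in the source data it analyzed:

```python
import re

def flag_unsupported_figures(agent_output: str, source_text: str) -> list[str]:
    """Flag years and numeric figures in the agent's output that never
    appear in the source data it was asked to analyze.

    A crude first pass only -- a human still reviews every output."""
    number_pattern = re.compile(r"\d+(?:[.,]\d+)*%?")
    source_figures = set(number_pattern.findall(source_text))
    return [fig for fig in number_pattern.findall(agent_output)
            if fig not in source_figures]

# Example: the source covers 2024, but the agent cites 2023 and invents 38%.
source = "Q3 2024 pipeline grew 21% on 1,450 qualified leads."
output = "In 2023, pipeline grew 38% on 1,450 qualified leads."
print(flag_unsupported_figures(output, source))  # → ['2023', '38%']
```

It won’t catch subtle hallucinations, but it turns “read everything cold” into “read everything, starting with the flagged items,” which is what makes a daily loop sustainable.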
3. Model Regressions Silently Break Working Apps.
This one caught us by surprise. Our pitch deck analyzer, which has graded 4,000+ decks, suddenly started producing anomalous results. Nothing changed on our end. The app was fine last week. Then the underlying model updated, and the outputs went sideways.
This is a new failure mode most teams haven’t thought about. You don’t choose when your model regresses. You don’t get a changelog. You just notice the outputs look weird, and you have to figure out whether it’s your prompt, your data, your integration, or the model provider’s update you never asked for.
Every production agent needs a quality baseline you measure against regularly. If outputs drift, you know fast. Without that, you only find out when a customer tells you something is broken, which is the wrong way to find out.
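A quality baseline doesn’t need to be elaborate. One sketch of the idea, using made-up grading scores on a 0-100 scale (not our actual analyzer’s data): freeze a score distribution from a week the app was known-good, then alert when a new batch drifts too far from it.

```python
from statistics import mean, stdev

def drift_alert(baseline_scores, new_scores, z_threshold=2.0):
    """Compare a new batch of agent output scores against a frozen
    baseline; return True when the batch mean drifts past the threshold.

    Hypothetical scores on a 0-100 grading scale."""
    mu, sigma = mean(baseline_scores), stdev(baseline_scores)
    z = abs(mean(new_scores) - mu) / sigma
    return z > z_threshold

# Frozen baseline from a week when the analyzer was known-good...
baseline = [72, 68, 75, 70, 71, 69, 74, 73, 70, 72]
# ...versus a new batch after a silent model update.
suspect_batch = [88, 91, 85, 90, 87]
if drift_alert(baseline, suspect_batch):
    print("Grading distribution drifted -- check model version and prompts")
```

Run it on every batch, and a silent model regression shows up as a failed check within a day instead of a customer complaint within a month.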
4. The Upsell Trap: Clay’s Agent Recommended a Model 2-5x More Expensive Than Needed.
This is the sharpest learning from the episode and the one most founders haven’t processed yet.
We were using Clay. Their agent recommended an approach that would have required their most expensive model, at 2-5x the cost of the cheaper option that would have worked just fine for our use case. And this happened right around the time Clay announced a price increase.
Was this intentional? Almost certainly not in the way it sounds. The agent wasn’t properly trained on the new pricing structure, so it defaulted to recommending the premium path. But the effect on the customer is identical whether the upsell is intentional or accidental: you get pushed toward the more expensive option without knowing a cheaper one solves your problem.
Every B2B + AI founder needs to audit this right now. When your agent recommends your own product, is it recommending the right tier? Or is it quietly steering customers to the most expensive option because that’s what the training data rewards?
Trust, once you lose it this way, doesn’t come back.
5. Agents Blame Other Tools When They Break. You Need Humans to Call BS.
Here’s a pattern we see constantly: something breaks in one of our vibe-coded apps, and the agent helping us fix it immediately blames a third-party service. “Oh, it’s a Stripe issue.” “The Airtable API is down.” “OpenAI must be rate-limiting you.”
Sometimes true. Often not. Agents will confidently point at another tool rather than dig into the actual root cause, especially when the root cause is their own code or configuration.
This is why maintenance requires product-savvy humans, not just technical ones. You need someone who can look at the agent’s explanation and say “I don’t buy that, check again.” Without that skepticism, you waste hours chasing problems that don’t exist in the tools the agent blamed.
The operator’s instinct is worth a lot in this new world. Don’t automate it away.
6. No Lead Left Behind: This Is the Real Unlock.
This is the positive thesis underneath everything we’re building: every lead, prospect, and customer engaged in real time. No drop-offs. No “we’ll follow up next week” that turns into never.
Agents make this possible for the first time. They work 24/7. They don’t have bad days. They don’t judge a lead as “probably not ready.” They don’t skip follow-ups because they’re behind on quota. They just keep engaging, every lead, every prospect, every customer, every day.
Our 72% open rate on 1,000 leads our human team had ghosted for six months isn’t a fluke. It’s what happens when an agent picks up what a human team didn’t have the capacity to handle.
Here’s the catch: even with agents, you can leave capacity on the table. If your volume is low, your agent sits idle. Most companies aren’t using agents to their full capacity because they haven’t built the top of funnel to feed them. The bottleneck shifts from “can we handle the leads” to “can we generate enough leads for the agents to work on.”
That’s a good problem. Work it.
7. Salesforce Buying Qualified Is the Smartest GTM M&A in Years. Almost Nobody Noticed.
Salesforce quietly acquired Qualified. Qualified has been our #1 inbound BDR agent, with over $1M in closed revenue through it. Now it lives on Salesforce’s own homepage as a 3D avatar, which is the tell that this is now a Salesforce product, not a Salesforce integration.
Think about what Salesforce now owns: Qualified for inbound, Agentforce for outbound, Data Cloud as the data layer, and the core CRM. That’s the only end-to-end AI GTM stack in the market that actually works in production today.
Every competing CRM now has a real gap to close. Every AI SDR point solution now has a question to answer about why they exist. And every B2B company running Salesforce should be rethinking their GTM roadmap in light of what this stack can do.
The rule: pick your CRM based on where the agents are heading, not where the features are today.
8. The Salesforce + 10K Integration Was Brutal Until We Built a Custom Object.
Real technical war story from this episode. We spent months trying to integrate 10K with Salesforce. The tokens expired every 24 hours. Every day, the integration broke. Every day, someone had to re-auth.
The unlock: we built a custom object in Salesforce for 10K. Now the token refreshes once a year instead of daily. The integration finally runs clean. 10K pulls its own pipeline data, runs its own analysis, finds its own breakdowns.
This is the jump most teams haven’t made yet: from “AI that reports on what happened” to “AI that diagnoses what’s wrong and tells you what to do about it.” The integration work is tedious. The payoff is enormous. Don’t skip it.
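We won’t pretend to document the custom-object internals here, but the client-side pattern it enables is worth sketching, because it’s the same pattern any integration fighting short-lived tokens needs. All names below are hypothetical; `fetch_token` stands in for whatever auth flow your stack uses:

```python
import time

class CachedToken:
    """Cache an access token and refresh only when it nears expiry.

    `fetch_token` is a hypothetical callable standing in for your
    auth flow; it returns (token, lifetime_in_seconds)."""
    def __init__(self, fetch_token, refresh_margin=3600):
        self.fetch_token = fetch_token
        self.refresh_margin = refresh_margin  # refresh this early, in seconds
        self._token = None
        self._expires_at = 0.0

    def get(self):
        # Only hit the auth endpoint when the cached token is missing
        # or about to expire.
        if self._token is None or time.time() > self._expires_at - self.refresh_margin:
            self._token, lifetime = self.fetch_token()
            self._expires_at = time.time() + lifetime
        return self._token

# With a 24-hour token this refreshes daily; with a year-long
# credential the refresh branch almost never runs.
calls = []
def fake_auth():
    calls.append(1)
    return ("token-%d" % len(calls), 365 * 24 * 3600)

creds = CachedToken(fake_auth)
creds.get(); creds.get(); creds.get()
print(len(calls))  # → 1: a single auth round-trip
```

The design choice that mattered for us wasn’t the caching, it was the credential lifetime: once the token lives for a year instead of a day, the daily re-auth breakage simply disappears.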
9. QB Localized Into Chinese and Spanish in 20 Minutes. Total.
QB (our AI VP of Customer Success) needed to work for non-English-speaking sponsors at SaaStr AI Annual. The old way: hire a localization firm, budget three months, pray.
The new way: Replit plus OpenAI, 20 minutes, done. Both Chinese and Spanish, fully localized, shipped.
This is one of the most under-appreciated shifts in B2B + AI right now. Localization used to be a major strategic decision gated on resources. Now it’s something you do between meetings. Every market is accessible. Every sponsor gets a tailored experience. The old excuse (“we don’t have bandwidth to support international”) just evaporated.
If your product isn’t localized yet, the question isn’t “should we?” It’s “why haven’t we shipped it this week?”
10. QB Catches What Humans Miss. And Responds Faster Than Humans Can.
The concrete win from this episode: sponsors were uploading placeholder images and incomplete graphics to their SaaStr Annual assets. Classic “I’ll fix it later” behavior that humans either miss or are too uncomfortable to push back on.
QB caught it. Every time. And auto-emailed the sponsor contact with specific, neutral feedback about what needed to be fixed.
Two things to notice here. First, the catch rate is higher than a human team would hit, because QB checks everything the same way, every time, without getting tired or distracted. Second, the feedback landed better than when humans delivered it. Neutral AI feedback, delivered consistently, is often more effective than the human equivalent. There’s no relationship dynamic, no ego, no “did I say it right.” Just the message.
This is the pattern across most of what we’re seeing work. Agents aren’t replacing humans. They’re doing the jobs humans were doing badly, inconsistently, or not at all.
What Comes Next
The theme across all 10: we’re past the “is AI real” phase and into the “how do you actually run it” phase. Maintenance matters more than magic. Process beats model choice. Product-savvy humans supervising agents beats either one alone.
Episode #002 drops next week. We’ll go deeper on what we’re seeing across our AI Agents, the state of AI SDRs, why 60% Solutions aren’t enough, and a few more AI operator lessons we learned the hard way.
The Agents. Every week. Three humans, 20+ agents, one real 8-figure B2B + AI company.
