Amelia and I just shipped Episode #004 of The Agents. Same setup: three humans, 20+ agents, revenue went from -19% to +47% YoY, and every week we get into what’s actually working, what’s breaking, and what you should do about it if you’re running agents in production.
This is the last episode before SaaStr AI Annual 2026, which is now just over a week away. Attendance is tracking 140%+ of last year, the sponsor base is fully AI-native, and Amelia and I are doing three live build sessions on the main campus where you’ll deploy your own agents alongside us with your laptop open. More on that at the end.
Here are the top 10 learnings from Episode #004.
1. AI PR Pitches Are the AI SDRs of a Year Ago. Block Them All.
A year ago I wrote that Gmail might be the death of the AI SDR. Bad AI SDRs flooded my inbox, and I had a small epiphany: with a human SDR, I’d ignore a bad pitch out of politeness. With an agent, I just hit block. No guilt. No social cost.
That cleaned up my inbox for about six months. Then a new wave hit: AI PR pitches.
These are different from the SDR wave. The PR pitches are written well. They’re customized. They reference SaaStr by name, mention recent posts, sometimes even quote the podcast. The agentic copy is genuinely good. But they’re still wrong. They’re pitching speakers I’d never put on a SaaStr stage, executives whose companies aren’t a fit, fireside chats during the actual three days of SaaStr Annual.
I block every single one. And here’s the lesson, because this is going to happen to your category next: the better the copy gets, the more important it becomes to ask whether the pitch itself is right. AI made the writing problem easier and the targeting problem harder. If your AI PR or AI SDR tool is producing well-written pitches aimed at the wrong people, you’re not getting placements. You’re getting blocked. Forever.
2. The Real Test for Any Agent: Would You Buy Your Own Product From It?
This is the single most useful question I’ve found for auditing an agent’s output. It’s better than “is this accurate” or “is this on-brand.”
The reason is that AI copy is now objectively pretty good. Claude 4.7 keeps getting better. By the end of the year, half-decent prompting will produce email and pitch content that reads as competent and customized. So “is this email well-written” is no longer a useful filter. Everything sounds well-written now.
The harder filter: would I take this meeting? Would I buy this product? Would I put this speaker on stage? Almost every PR pitch I get fails that test even though the copy passes the writing test. So when you’re auditing your AI SDR, your AI customer success agent, your AI marketer, don’t just read for tone and accuracy. Pretend you’re the recipient. Would you say yes? If not, the agent isn’t ready for production no matter how clean the prose looks.
3. Customers Are Now Asking Vendors for APIs, Not Features
This is the quiet shift nobody’s writing about, and it changes how B2B software gets bought.
Two years ago, Amelia would file a feature request with a vendor: “Can you add the ability to resend a confirmation email when someone clicks a link?” Maybe in 18 months you’d get it. Usually never.
Today, Amelia’s first request to that same vendor is “Can you expose this in the API?” Because if it’s in the API, she can vibe-code the feature herself in 30 minutes on Replit. She doesn’t need them to build it. She needs them to expose the surface area so she can build it.
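To make the shift concrete, here’s a hedged sketch of the kind of 30-minute glue code this unlocks once the API surface exists. The vendor, base URL, endpoint, and field names below are all invented for illustration; substitute whatever your vendor actually exposes.

```python
# Hypothetical sketch: vendor, endpoint, and fields are invented for
# illustration -- swap in your vendor's real API once it's exposed.
import json
import urllib.request

API_BASE = "https://api.example-vendor.com/v1"  # hypothetical base URL

def build_resend_request(attendee_email: str, api_key: str) -> urllib.request.Request:
    """Prepare a POST asking the vendor to resend a confirmation email.

    If the vendor exposes this in the API, the whole "feature" is ~20
    lines of glue code instead of an 18-month feature request.
    """
    payload = json.dumps({"email": attendee_email, "action": "resend_confirmation"})
    return urllib.request.Request(
        url=f"{API_BASE}/confirmations/resend",
        data=payload.encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_resend_request("attendee@example.com", "sk_test_123")
print(req.full_url)  # https://api.example-vendor.com/v1/confirmations/resend
```

The point isn’t this particular snippet. It’s that none of it is possible if the endpoint doesn’t exist, which is why the API request now comes before the feature request.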
This is a real change in how you should be running your B2B + AI roadmap. Your customers care about API completeness now in a way they didn’t 18 months ago. Non-technical buyers are asking for API endpoints. If your product has gaps in the API, your most sophisticated customers are going to feel them first, and they’re going to be frustrated, and you’re not going to know why your NPS is dropping with your best accounts.
4. We Built an API Report Card. Stripe Got the Only A+. Marketo Failed.
We grade APIs constantly to figure out which ones to build agents on top of. So we turned that into a public tool: the AI Agent API Report Card at saastr.ai. 75+ B2B APIs graded by Claude, GPT, and Gemini on how agent-friendly they actually are.
Already used 1,600+ times in the first week. The findings are pretty consistent with our experience:
- Stripe got the only A+. The most agent-ready API in B2B, full stop. We use it lightly today and we’re going to use it a lot more this year.
- The line is a B. Anything above a B is trustworthy. Anything below a B, don’t build agents on top of it unless you have no choice.
- Marketo, Jira, Outreach, Asana, ClickUp, and Gong all came in with weak grades for agentic use.
- HubSpot got a fair grade with the caveat of rate limits, which is exactly what we’ve experienced.
The bigger point is that agents care about different things than humans do. Humans care about UI, onboarding, ease of use. Agents care about rate limits, OAuth flows, REST conformance, error handling, and webhook reliability. The two grading systems don’t agree. If you’re a B2B vendor and you’ve been optimizing for the human grader, you’re going to get a C from the agent grader. And in 2026, the agent grader is the one that picks the tools.
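To illustrate the shape of the agent-side rubric (not the real report card, which uses Claude, GPT, and Gemini as graders), here’s a toy scorer over the five criteria named above. Weights and cutoffs are invented:

```python
# Toy sketch of agent-side API grading. The five criteria come from the
# text above; the equal weighting and letter cutoffs are illustrative.
CRITERIA = [
    "rate_limits",
    "oauth_flow",
    "rest_conformance",
    "error_handling",
    "webhook_reliability",
]

def grade_api(scores: dict) -> str:
    """Average 0-100 scores across the agent-facing criteria and map
    the result to a letter grade."""
    avg = sum(scores[c] for c in CRITERIA) / len(CRITERIA)
    for cutoff, letter in [(97, "A+"), (90, "A"), (80, "B"), (70, "C"), (60, "D")]:
        if avg >= cutoff:
            return letter
    return "F"

# An API with a great UI but aggressive rate limits and flaky webhooks
# can pass the human grader and still flunk the agent grader:
print(grade_api({"rate_limits": 40, "oauth_flow": 85, "rest_conformance": 70,
                 "error_handling": 55, "webhook_reliability": 50}))  # D
```

Notice none of the inputs are UI, onboarding, or ease of use. That’s the whole point: the agent grader scores a different exam.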
5. Tragedy Apps: Companies That Should Be Great Right Now But Aren’t
Here’s a category I’ve been thinking about for months and finally have a name for.
A tragedy app isn’t a bad app. It isn’t even an app whose time has passed. A tragedy app is one that was good before AI, should be great today, and isn’t. The audience is there. The base is there. The brand is there. The execution isn’t.
My example is Descript. Andrew Mason saw the creator economy before any of us did. He built the best podcast and video editing tool of its era. We were on it. Everyone was on it. It got to about $50M ARR. And it has been frozen in time for two years. Audio and video desync on long videos. The AI features are catch-up at best. Meanwhile Higgsfield, Opus, ElevenLabs, and Reeve are all running past it. This should be a $300M ARR company headed to a billion. Instead the CEO stepped down and it’s stuck.
The contrast is Replit, which just turned 10 years old. They were doing browser IDEs for developers for eight years before AI. When the moment came, they were ready. Now they’re a half-billion-dollar business. Same with Aaron and Box. Box could have been a tragedy app. Aaron is doing everything humanly possible to make sure it isn’t.
If you’re running a B2B company that was great in 2023, the question to ask yourself this quarter: are we shipping AI features to catch up, or are we shipping AI features that move the category forward? If the answer is catch-up, that’s how you become a tragedy app. Catch-up keeps your existing customers. It doesn’t grow you. And the next vendor in line will be six months ahead of where you just landed.
6. Agents Will Delete Your Database. Plan For It.
The Pocket OS story this week was a useful reminder. Founder running Cursor + Claude Opus on a production app. Agent deleted the entire production database and all backups in nine seconds because the backups were on the same Railway volume.
People reacted to this like it was new. It isn’t. This happened to me 11 months ago when I was learning to vibe code. The agent deleted state, then told me it was unrecoverable when it actually wasn’t, and I was 50 hours into a session and panicked. The bigger surprise: the same thing happened to Amelia and me three separate times when we hired human WordPress agencies to clean up the SaaStr theme. First thing each agency did was delete production. So it’s not really an agent problem. It’s a “things that have access to your prod database will eventually delete it” problem.
The takeaway for anyone deploying agents: assume your agent will eventually take a destructive action. Isolate the database. Isolate PII. Use a contained platform that maintains its own backups. And test the recovery flow before you need it. None of this is optional anymore.
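As a sketch of the “isolate the database, test the recovery flow” checklist, assuming a Postgres database and an S3-style offsite backup. Every path, bucket, and command here is illustrative, not our actual setup:

```python
# Illustrative sketch of two checks from the takeaway above. Paths,
# bucket names, and table names are invented.
import subprocess

APP_VOLUME = "/data/railway-vol"                      # hypothetical prod volume
BACKUP_URI = "s3://offsite-bucket/prod/latest.dump"   # must NOT live on APP_VOLUME

def backup_is_isolated(app_volume: str, backup_uri: str) -> bool:
    """The Pocket OS failure mode: backups stored on the same volume as
    prod, so one destructive action wipes both. Reject any backup path
    that lives under the app's own storage."""
    return not backup_uri.startswith(app_volume)

def recovery_drill(scratch_db: str = "postgresql://localhost/restore_drill") -> bool:
    """Restore the latest backup into a throwaway database and sanity-check
    it. Run this BEFORE you need it; an untested backup is decorative."""
    steps = [
        ["aws", "s3", "cp", BACKUP_URI, "/tmp/latest.dump"],
        ["pg_restore", "--clean", "--no-owner", "-d", scratch_db, "/tmp/latest.dump"],
        ["psql", scratch_db, "-c", "SELECT count(*) FROM attendees;"],
    ]
    return all(subprocess.run(cmd).returncode == 0 for cmd in steps)
```

If `backup_is_isolated` returns False for your setup, fix that today. If `recovery_drill` has never been run, you don’t actually know whether you have backups.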
7. This Is Why We Build on Replit and Lovable, Not Cursor + Railway + Supabase + WorkOS
I get the question constantly: why don’t you just use Cursor and Claude Code and hook up your own database and your own auth and your own deployment?
Two reasons. First, I’m not an engineer. Second, and this is more important now than it was a year ago: contained platforms are dramatically safer. Replit and Lovable have native auth, native databases, native deployment, native observability, all in one environment. The number of seams where things can go wrong is small. The number of seams in the DIY stack is huge, and every seam is a potential security or data leak.
Amelia ran into this on Monday. She asked our agent if it could resend confirmation codes for SaaStr Annual networking. The agent said yes, just give me the database. She asked: what if someone pretends to be from Lovable and asks for everyone’s codes? The agent said: I’d give it to them. They look like they work at Lovable.
That’s the agent doing its job. It wants to be helpful. The contained platform is what stops the helpful behavior from becoming a breach. Build on Replit or Lovable. They’re not perfect. But they’re working hard on the problem. You probably aren’t.
8. 10K Now Generates 21 Campaign Ideas a Week. We Can’t Keep Up.
Three months in on 10K, our AI VP of Marketing, and we’ve crossed a real threshold: 10K is now a better marketer than every junior marketer we ever hired at SaaStr. Combined.
The pattern: every morning, three campaign ideas, ranked, with data backing every one. Yesterday’s three were all good. The first was a targeted VC outreach campaign because we’re light on VCs YoY. The second was an upsell of single-ticket buyers to team packs. The third was ramping social promos. All three were grounded in real data: real revenue numbers, real attendance numbers, real comparison to last year at this point in the cycle. None of them was a generic playbook recycled from a previous job.
Three ideas a day, seven days a week, is 21 ideas a week. If each one takes an hour to evaluate and execute, that’s 21 hours of work. We don’t have 21 hours. So now the human bottleneck isn’t generating ideas, it’s processing them. 10K could realistically fill our entire week with good work.
Two cautions we’ve learned the hard way. First, 10K is too optimistic about everything. It thinks every campaign will hit. We sent the VC campaign. It predicted 1,000 ticket sales. We sold two. (VCs are cheap, which 10K didn’t model.) Second, you have to constrain it. If we let 10K loose on our 500K-name database, it would burn through the whole list in 90 minutes and fatigue our base for the year. The agent doesn’t tire. You have to put rate limits on it the same way you’d put rate limits on an SDR who joined yesterday.
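The “rate limits on the agent” idea can be sketched as a hard daily send cap the agent physically can’t exceed, no matter how many campaign ideas it generates. The cap number below is illustrative, not our real one:

```python
# Hedged sketch of a hard daily send cap for a marketing agent.
# The 2,000/day number is invented for illustration.
from datetime import date

class DailySendCap:
    """Hard ceiling on outbound sends per day. The agent doesn't tire;
    this is what keeps it from burning a 500K-name list in 90 minutes."""

    def __init__(self, max_per_day: int = 2000):
        self.max_per_day = max_per_day
        self.today = date.today()
        self.sent = 0

    def try_send(self, recipient: str, send_fn=print) -> bool:
        if date.today() != self.today:           # new day: reset the counter
            self.today, self.sent = date.today(), 0
        if self.sent >= self.max_per_day:
            return False                          # cap hit: agent waits for tomorrow
        send_fn(f"sending campaign email to {recipient}")
        self.sent += 1
        return True

cap = DailySendCap(max_per_day=2)
results = [cap.try_send(f"user{i}@example.com", send_fn=lambda m: None) for i in range(3)]
print(results)  # [True, True, False]
```

The key design choice: the cap lives outside the agent’s own logic, so an over-optimistic agent can’t argue its way past it.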
9. We’re Hiring a Marketer to Report to 10K. Not Joking.
This is the post-show conclusion that surprised me when I said it out loud, but the more I think about it, the more right it is.
If we hired a junior marketer at SaaStr today, they wouldn’t report to me or Amelia. They’d report to 10K. Because 10K knows what marketing work needs to happen every single day. It generates the ideas. It has the data. It can assign work. The human’s job would be to execute, click the buttons that 10K can’t click yet, and bring the judgment 10K doesn’t have on which of its three daily ideas is actually worth doing.
So we’re going to test it. Senior manager / director of digital marketing. Six-figure salary. One or two days a week in our Palo Alto office. You report to 10K, our AI VP of Marketing. Amelia and I are available, but day to day you’ll be working with the agent. If you’re a good marketer who likes to execute and you want to be at the front edge of what GTM looks like in 2027, email us.
This sounds like a stunt. It isn’t. It’s just a year early. The AI VP of Finance we’re building next will manage our two outsourced finance resources, too. The org chart of B2B + AI companies is going to look different in 18 months than people think.
10. We Run 4-5 AI SDRs. We’ll Probably Be at 6 by Year End. Here’s Why Specialization Still Wins.
The question I get most: why do you run Artisan, Qualified, Monaco, and Agentforce all at the same time? Isn’t that too many AI SDRs?
The honest answer is that today, in May 2026, specialization still wins for quality. Each of the four does something different and we’ve trained each one for that specific job. Qualified handles inbound on the website. Artisan handles warm outbound to people already in our base. Monaco handles cold outbound and fills its own funnel. Agentforce reactivates lapsed leads. Could one super-agent do all four? Probably someday. Today, the quality drop from consolidation would be too steep.
For most companies, you should not start with four. Stair-step it. Start with the highest-pain, highest-ROI, lowest-hanging-fruit use case, which for most B2B companies is inbound. Most websites have a terrible inbound experience: a chatbot that doesn’t work, a Calendly link that goes nowhere, a contact form that gets answered in three days. Fix that first with something like Qualified. Then move to warm outbound. Then cold.
Eighteen to 24 months from now, I think this consolidates. Salesforce buying Qualified is the start of it. By 2028 you might run one super-SDR. But until that arrives at quality parity, run four. Salesforce or HubSpot is your hub. The agents talk to the hub. That’s the architecture for now.
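The hub-and-spoke shape can be sketched as: every SDR agent writes activity to one system of record (your CRM) and checks it before acting, instead of the agents coordinating peer-to-peer. Agent and lead names below are illustrative:

```python
# Toy sketch of the hub architecture: agents never share state directly;
# they read and write through one CRM hub. Names are illustrative.
class CRMHub:
    """Single source of truth for every agent's activity."""

    def __init__(self):
        self.activities = []

    def log(self, agent: str, lead: str, action: str) -> None:
        self.activities.append({"agent": agent, "lead": lead, "action": action})

    def history(self, lead: str) -> list:
        return [a for a in self.activities if a["lead"] == lead]

hub = CRMHub()
hub.log("qualified", "lead-42", "inbound_chat")    # inbound agent touched the lead
hub.log("artisan", "lead-42", "warm_followup")     # warm-outbound agent followed up

# Before the cold-outbound agent emails lead-42, it checks the hub --
# and skips, because two other agents have already touched this lead.
already_touched = len(hub.history("lead-42")) > 0
print(already_touched)  # True
```

This is why the hub matters: four specialized agents without a shared record will double-touch the same lead; four agents reading one hub won’t.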
SaaStr AI Annual Is Nine Days Away
May 12-14 in the SF Bay Area. Doors open at noon on Tuesday. Amelia and I are doing three live build sessions on the main campus:
- AI Agents 101 with me at 1:00 PM Tuesday. Bring your laptop. We’ll deploy your first digital clone agent together in 30 minutes.
- Vibe-Coding Your AI VP of Marketing with Amelia at 4:15 PM Tuesday. Bring your laptop and any data you have. We’ll build a working AI VP of Marketing live, modeled on 10K.
- Plus 10+ vibe-coding sessions with Replit running across all three days where you can drop in with questions.
Every speaker this year was asked to bring a real workflow, a real demo, a real walkthrough. No keynote fluff. The whole event is hands-on. Tickets at SaaStrAIAnnual.com.
Episode #005 will be recorded live from Annual. You’ll meet some of the agents in person.
The Agents. Every week. Three humans, 20+ agents, one real 8-figure B2B + AI company.
