Building AI Agents That Actually Work: Lessons from Jason Lemkin, Jeanne DeWitt Grosser (Vercel), Amelia Lerutte & Amjad Masad (Replit)

The half-day that kicked off SaaStr AI Annual 2026 was the most concrete look we’ve put on a stage at what running a company on AI agents actually looks like. Not the hype version. The real version, with the costs, the bugs, the drift, and the wins.

Four sessions, four very different angles: a hands-on agent build for people who have never built one, a behind-the-scenes look at how Vercel automated core go-to-market functions, a live build of SaaStr’s own AI VP of Marketing, and a fireside with the founder of the platform most of it runs on. Below is the full breakdown of each, the top 3 takeaways from each, and then the themes that showed up across all four.

Session 1: AI Agents 101 with Jason Lemkin, Founder SaaStr AI

The opening session was deliberately basic. The pitch was simple: if you have already deployed 100 agents and have eight Mac minis running open models at home, go get lunch. This one was for everyone who buys tools, plays with tools, and watches their team use tools, but has never built one themselves.

The whole exercise was building a digital clone on Delphi. SaaStr started its own agent journey here about 14 months ago with a single agent, a digital version of Jason that people could ask questions about SaaStr. What was surprising at the time: the agent answered better than most of the humans helping us. It knew what a gold sponsorship cost, how ticket prices changed over time, what the WiFi password was. One sponsor bought a $50,000 sponsorship interacting with Digital Jason alone.

Building one took minutes. You connect a website, a YouTube channel, an X handle, and the tool ingests and chunks the content. The version built live for the session pulled in 8.3 million words from SaaStr.com and social in a few minutes and started answering real founder questions reasonably well.

The core framework was “stair-step it.” Three rungs:

Build something good today with an off-the-shelf product. You can do this in five minutes.
Let it run for weeks. Read every output. When it gets something wrong, correct it with a line of text and let it re-ingest. After the first few hundred messages, it gets sharp.
Only then, if you have a reason, build a custom version. The custom Digital Jason runs on a vector database, chunked content, Voyage embeddings, and Claude Haiku. It sounds better and embeds cleaner, but it breaks and needs maintenance.
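To make the custom rung concrete, here is a minimal sketch of the chunk, embed, and retrieve loop that such a build rests on. It is not SaaStr's implementation: the word-count "embedding" stands in for an embedding model such as Voyage, the in-memory `TinyIndex` stands in for a vector database, the sample sentences are invented, and the final step of handing the retrieved chunk to a model like Claude Haiku is left out.

```python
import math
import re
from collections import Counter


def chunk(text: str, max_words: int = 40) -> list[str]:
    """Split text into fixed-size word windows (a deliberately simple chunker)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. A real build would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class TinyIndex:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self.rows: list[tuple[str, Counter]] = []

    def add(self, document: str, max_words: int = 40) -> None:
        for piece in chunk(document, max_words):
            self.rows.append((piece, embed(piece)))

    def search(self, question: str, k: int = 1) -> list[str]:
        q = embed(question)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


# Invented sample content, for illustration only.
index = TinyIndex()
index.add("Gold sponsorship packages include a booth and a speaking slot.")
index.add("Ticket prices rise as the event date gets closer.")
top = index.search("How do ticket prices change over time?")
```

Every piece here (the chunker, the embedding call, the index, the model that writes the answer) is something you own and have to keep working, which is where the maintenance cost comes from.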

That maintenance point became the meta-lesson. When asked whether the custom version was still ingesting recent content, the platform admitted it had quietly stopped pulling YouTube transcripts. Nobody noticed because nothing broke loudly. That is drift. Things fall out of sync, a new model ships, something stops working, and unless the agent is mission critical, you miss it. Which is why the repeated advice was buy, don’t build. SaaStr uses at least eight of the vendors on the floor: Artisan, Monaco, Qualified, Agentforce, and more. We only build what we can’t buy.
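One cheap defense against this kind of silent drift is a freshness check on every source the agent ingests. The sketch below assumes you record a timestamp for each successful ingest; the source names, dates, and seven-day threshold are all hypothetical.

```python
from datetime import datetime, timedelta


def stale_sources(last_ingested: dict[str, datetime], now: datetime,
                  max_age: timedelta = timedelta(days=7)) -> list[str]:
    """Return the sources whose most recent successful ingest is older than max_age."""
    return sorted(name for name, ts in last_ingested.items() if now - ts > max_age)


# Hypothetical ingest log: the transcript feed stopped months ago and nothing errored.
now = datetime(2026, 5, 20)
last_ingested = {
    "website": datetime(2026, 5, 19),
    "x_posts": datetime(2026, 5, 18),
    "youtube_transcripts": datetime(2026, 3, 1),
}
alerts = stale_sources(last_ingested, now)
```

Run on a schedule and posted somewhere a human reads, a check like this turns a failure nobody notices into a message somebody has to dismiss.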

On guardrails: when Digital Jason first launched 15 months ago, someone asked it for Jason’s email and home address and to book a meeting, and it actually set four meetings on his calendar. The vendor patched it fast, and guardrails have improved every month since. The lesson stands, though: the more you build yourself, the more guardrail risk you own. Agents are goal-seeking, they want to make you happy, prompt injections work, and a capable agent will eventually share data it shouldn’t. Start with your least sensitive information and add PII slowly, or just buy from a vendor whose guardrails are better than anything you’ll vibe-code yourself.

On time: getting an agent going takes minutes, initial training takes the first two to four weeks of reading every output, and then maintenance never really stops. Someone on the team needs an hour or two a day on your agents. The flip side is you get pulled in, because you keep seeing more productivity to add.

On pricing your own agents: customers want a product that works and certainty on cost. Nobody in B2B wants to pay per token. Charge a fair price tied to outcomes, and force yourself to charge enough that you have to deliver real value. HubSpot recently moved to outcome-based pricing because their agents now resolve 90% of tickets.

Top 3 takeaways from Session 1

Simple but trained beats complex. A five-minute agent that you correct daily for a month outperforms a sophisticated build you set and forget. The training loop matters more than the architecture.
Buy, don’t build, and only build what you can’t buy. Vendors who have been in market have better guardrails than anything you’ll create. Building it yourself means owning drift, maintenance, and data-leak risk.
Agents replace work, so price for outcomes. A founder will pay $50,000 for an AI SDR that lands real customers but not for an email automation tool. Charge for the result, not the tokens.

Session 2: Building Business Agents at Scale with Jeanne DeWitt Grosser, COO of Vercel

Jeanne DeWitt Grosser scaled go-to-market at both Google and Stripe for roughly a decade each before joining Vercel as COO. Six weeks in, she stood up a go-to-market engineering team in June 2025, before that phrase was common, with one mandate: bring AI and agents to everything GTM. Ten months later, Vercel has automated a real chunk of core company functions.

The numbers she shared:

The customer support agent now handles 93% of total case load, and Vercel’s cases are highly technical.
The content agent did 96% of major content updates last quarter.
The lead qualification agent, launched in August, started as 20% of one engineer’s time. With a human in the loop over six weeks, it took the team on that function from 10 people to one in the US plus 20% of a person covering all of Europe and APAC.

That lead agent runs about $5,000 a year between infrastructure and tokens and takes 20% of one engineer to maintain. Against 10 salaries, that is a 32x ROI, and it runs 24/7 with faster speed-to-lead and human-equivalent quality. The people who came off that function moved into higher-value roles.

The build method is a tripod: a GTM engineer, a data scientist, and the single best subject-matter expert for that function. They sit shoulder to shoulder and document best practice, then encode it into workflows. For the lead agent, an engineer literally shadowed Vercel’s best SDR for days, watching every tab she opened (LinkedIn, BuiltWith, the company site, CRM, Slack history), and converted each into a step in a tool-calling workflow. The agent ran in shadow mode for six weeks with that SDR reviewing every output, until she couldn’t improve it anymore. Then they pulled the human. The result performs like a 90th-percentile rep 100% of the time. The same framework went to 30 different SDR workflows, and SDR quotas rose 30% that quarter. A single engineer prototyped the first version over a weekend and shipped it six weeks later.
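The shadow-mode pattern can be sketched in a few lines: each thing the expert checks becomes a step, the steps run in order, and during the review period the agent's decision is only logged next to the human's. This is an illustration rather than Vercel's code. The step names, the 50-employee cutoff, and the lead fields are invented, and real steps would call tools such as a CRM lookup instead of reading a dict.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

Lead = dict[str, Any]
Step = Callable[[Lead], Lead]


def check_company_size(lead: Lead) -> Lead:
    # Stand-in for a research step such as reading the company site.
    return {**lead, "big_enough": lead.get("employees", 0) >= 50}


def check_intent(lead: Lead) -> Lead:
    # Stand-in for a CRM or product-usage lookup.
    return {**lead, "has_intent": bool(lead.get("visited_pricing", False))}


def decide(lead: Lead) -> Lead:
    qualified = lead["big_enough"] and lead["has_intent"]
    return {**lead, "decision": "qualify" if qualified else "reject"}


WORKFLOW: list[Step] = [check_company_size, check_intent, decide]


def run_workflow(lead: Lead) -> Lead:
    """Run each step in order, passing the enriched lead along."""
    for step in WORKFLOW:
        lead = step(lead)
    return lead


@dataclass
class ShadowLog:
    """Stores agent decisions next to the reviewer's; nothing is sent to customers."""
    pairs: list[tuple[str, str]] = field(default_factory=list)

    def record(self, agent_decision: str, human_decision: str) -> None:
        self.pairs.append((agent_decision, human_decision))

    def agreement_rate(self) -> float:
        if not self.pairs:
            return 0.0
        return sum(a == h for a, h in self.pairs) / len(self.pairs)


log = ShadowLog()
reviewed = [
    ({"employees": 200, "visited_pricing": True}, "qualify"),
    ({"employees": 10, "visited_pricing": True}, "reject"),
    ({"employees": 500, "visited_pricing": False}, "qualify"),  # reviewer disagrees
]
for lead, human_call in reviewed:
    log.record(run_workflow(lead)["decision"], human_call)
rate = log.agreement_rate()
```

Each disagreement is a prompt to fix a step or add one. In the process described above, the signal to remove the human was that the reviewer had nothing left to correct, not a fixed agreement threshold.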

She walked through several agents:

Deal One, the meeting intelligence agent, ingests every call, generates notes and action items, posts coaching to Slack, proposes CRM updates, tracks competitive mentions, and runs loss post-mortems. Reps mention it in Slack, it queries other agents as sub-agents, pulls Gong transcripts, searches the knowledge base, and streams answers back. The rep never leaves Slack. The agent has no UI.
The Playbook Platform turns the instincts of the best reps into automation. A signal fires (a usage spike, a high-intent pricing page visit), the platform matches it to a play, generates personalized outreach, and surfaces it for a single-click review.
D0, the most popular agent in the company, is a data analyst everyone reaches through Slack. Questions that used to take a week and a ticket now get answered in under a minute, because it translates plain English into SQL against a semantic layer the head of data science built on top of a model of the business.
Vertex, the customer service agent powering the help site, costs $300 a month in infrastructure plus about $12,000 a month in tokens, roughly $150,000 a year, and three engineers work on it. It handles thousands of technical cases a week. Vercel started with an off-the-shelf tool, wasn’t seeing results, and built in-house in two months. Compare that to agentic support companies running 150 engineers on equivalent workflows at far higher cost.
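The semantic layer behind D0 is worth a concrete picture. One common way to build such a layer, shown here as a sketch under assumptions rather than as Vercel's design, is to have the model choose only from vetted metric and dimension names and let plain code assemble the SQL. All table, column, and metric names below are hypothetical.

```python
from typing import Optional

# Vetted definitions owned by the data team; the agent never writes raw table names.
METRICS = {
    "new_signups": {"table": "signups", "expr": "COUNT(*)"},
    "revenue": {"table": "invoices", "expr": "SUM(amount_usd)"},
}
DIMENSIONS = {"region": "region", "plan": "plan_name"}


def compile_query(metric: str, group_by: Optional[str] = None) -> str:
    """Turn a (metric, dimension) request into SQL using only vetted definitions."""
    if metric not in METRICS:
        raise KeyError(f"unknown metric: {metric}")
    m = METRICS[metric]
    if group_by is None:
        return f"SELECT {m['expr']} AS {metric} FROM {m['table']}"
    if group_by not in DIMENSIONS:
        raise KeyError(f"unknown dimension: {group_by}")
    col = DIMENSIONS[group_by]
    return f"SELECT {col}, {m['expr']} AS {metric} FROM {m['table']} GROUP BY {col}"


# A question like "revenue by region" would be mapped by the model to this call.
sql = compile_query("revenue", group_by="region")
```

Because an unknown metric raises instead of producing a guess, a question the layer cannot answer fails loudly rather than returning a plausible but wrong number.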

Underneath all of it is Fluid compute, infrastructure built for agents rather than the request-response model the cloud was designed around. Early adopters cut compute costs by up to 85%. Her broader point: most teams haven’t run agents at real scale yet, so they haven’t hit the rough edges. Production scale is what reveals whether your architecture holds. A demo that works is not a system that works.

Top 3 takeaways from Vercel:

Agents need headless, composable architecture. Agents don’t live in UIs, they call APIs, hit MCP servers, and fire webhooks. If your product isn’t developer-accessible, you’re invisible to agentic workflows and you’re not in the stack. Build the developer surface area now.
Your data foundation is load-bearing. The knowledge base, the clean warehouse, the semantic layer. None of it is exciting to build, but good data equals good agents and bad data equals hallucination. Every agent Vercel runs sits on top of the same semantic layer.
Agents flip the build-versus-buy calculus. The average builder on your team should be able to out-ship any vendor selling the same outcome, in weeks. Every quarter Jeanne asks whether someone built it better, faster, or cheaper, and if the answer is yes, the agent goes.

Session 3: Build Your Own AI VP of Marketing, Live, with Amelia Lerutte, CAIO SaaStr

Amelia Lerutte, SaaStr’s Chief AI Officer, ran the session as a live build of 10K, SaaStr’s AI VP of Marketing. SaaStr now runs close to 30 agents that have been used nearly a million times, and 10K is the flagship.

10K started in January as something small: Amelia was tired of copy-pasting dashboards from marketing, sales, and go-to-market into Notion every Sunday night. So she vibe-coded a dashboard to stop the copy-paste. Five months later it makes some autonomous marketing decisions, helps send campaigns, and acts as a daily co-pilot. The framing she kept returning to: it started as a dashboard, so if all you leave with is a dashboard, that’s a real start.

A few nuances she flagged. 10K isn’t a chatbot or a code generator. SaaStr uses the Replit agent itself as the entity, with its own brain and personality, rather than a separate orchestration layer. Whoever manages the agent talks to it the way Amelia talks to 10K daily, while the sales team interacts with it differently. She gave the audience a pre-built 20-page spec and sample historical data to download and build their own.

What 10K actually does:

Pulls real-time and historical closed-won revenue and pipeline from Salesforce via a connected app, which also enables historical comparisons and projections.
Runs win-back campaigns: pull the list of last year’s attendees who haven’t bought this year, enrich contacts, send.
Generates daily marketing ideas grounded in the data and one goal.
Writes email copy, builds triggered and website-action emails, sends attendee newsletters and ticket reminders. Every time it sent a reminder, ticket sales spiked.
Sends personalized speaker calendar invites with logistics, a job that took a person a week last year and now takes the agent 20 minutes.

The build itself was fast. Amelia rebuilt a version live with just the spec and sample data, then told it “make it purple,” and had a working agent with a dashboard, social analytics, daily ideas, and outreach campaigns in about 15 minutes. The ideas it generated live (optimize paid ads, leverage the sales team’s network, upsell free customers) weren’t groundbreaking but were genuinely useful, the kind of thing a busy founder forgets.

Her hard-won lessons mapped closely to the framework Jason opened with. Pick one number and give the agent one goal. The more data you give it on day one, the better. Stair-step the workflows one at a time: hooking up everything at once leads to a vibe-coding doom loop. Build two layers of autonomy: the things it does on its own (pulling data, dashboards, campaign ideas) and the things it does semi-autonomously with a human approving (sending emails to the database). And guardrails beat prompt engineering: 10K once invented a list of 400 VCs, and when asked for the names admitted it had made them up, so verify everything, especially the first few times.
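The two autonomy layers are easy to enforce in code rather than in a prompt. The sketch below is an assumption about how one might wire it, not how 10K is built: the action names are hypothetical, and anything not explicitly listed is blocked.

```python
from enum import Enum


class Autonomy(Enum):
    AUTONOMOUS = "autonomous"          # the agent may act on its own
    NEEDS_APPROVAL = "needs_approval"  # a human must approve first


# Hypothetical policy table mirroring the two layers described above.
POLICY = {
    "pull_data": Autonomy.AUTONOMOUS,
    "refresh_dashboard": Autonomy.AUTONOMOUS,
    "suggest_campaign": Autonomy.AUTONOMOUS,
    "send_email_to_database": Autonomy.NEEDS_APPROVAL,
}


def dispatch(action: str, approved: bool = False) -> str:
    """Return what happens to an action: 'run', 'queued', or 'blocked'."""
    level = POLICY.get(action)
    if level is None:
        return "blocked"  # unknown actions are denied, even if someone approved them
    if level is Autonomy.AUTONOMOUS or approved:
        return "run"
    return "queued"  # waits for a human to approve
```

For example, `dispatch("pull_data")` runs immediately, `dispatch("send_email_to_database")` is queued until called again with `approved=True`, and an unlisted action such as `dispatch("delete_contacts")` is blocked. A gate like this holds even when the prompt is ignored or injected, which is the practical meaning of guardrails beating prompt engineering.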

She closed on the honest part everyone downplays: SaaStr still writes a lot of content manually. Jason co-writes his posts with Claude, the keynote slides were built by hand, and there is nothing on earth today that fully replaces that. Whether you call it human-in-the-loop or not, they spend a lot of time on these agents.

Top 3 takeaways from Amelia:

One number, one goal, then feed it everything. SaaStr runs separate agents for marketing, customer success, and the event precisely so each has a single goal and isn’t overloaded. Clarity of goal plus volume of data is what makes the outputs good.
Stair-step the workflows and set two autonomy layers. Build one agentic workflow at a time. Be explicit about what the agent can do alone versus what needs your approval, the same way you’d onboard a new human marketer on day one.
Guardrails over prompts, and always verify. Agents are fast and goal-seeking, which means they’ll occasionally fabricate a list or a link to make you happy. Read the output, click the links, double-check the first few sends.

Session 4: Fireside with Amjad Masad, Founder and CEO of Replit

The closing session was Jason in conversation with Amjad Masad, who co-founded Replit in 2016. The framing: SaaStr lives in the future, and so do Replit’s power users. When Replit ships an agent, researchers at the labs message them surprised that the models can run autonomously for 10 hours. The thesis is that you can choose to live in that future too.

The conversation went deep on why these agents work.

Context and memory. Two years ago context windows were 16K tokens. Now they’re at a million. SaaStr’s goal is to never restart the agent, which means compaction matters enormously. Replit runs its own compaction (which Amjad argued is better than the generic ones the labs ship), plus writes to long-term memory in markdown files like a replit.md the agent maintains. The subtle point: bugs the agent already fixed should be removed from context because they confuse it, but architectural decisions should stay. That’s how the SaaStr.ai codebase, which runs around 10 apps in one place (the website, the valuation calculator used over a million times, a pitch-deck grader used 4,500 times, an API-friendliness grader), remembers how it built earlier apps to build new ones better.
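The keep-decisions, drop-fixed-bugs rule can be expressed as a simple filter. This is a toy sketch of the idea, not Replit's compaction: the `Note` type, its three kinds, and the choice to keep only the last two chat turns are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    kind: str               # "decision", "bug", or "chat" (hypothetical categories)
    text: str
    resolved: bool = False  # only meaningful for bugs


def compact(notes: list[Note], keep_recent_chat: int = 2) -> list[Note]:
    """Keep decisions and open bugs, drop resolved bugs, keep only the latest chat turns."""
    chat_idx = [i for i, n in enumerate(notes) if n.kind == "chat"]
    recent = set(chat_idx[-keep_recent_chat:]) if keep_recent_chat > 0 else set()
    return [
        n for i, n in enumerate(notes)
        if n.kind == "decision"
        or (n.kind == "bug" and not n.resolved)
        or i in recent
    ]


history = [
    Note("decision", "All apps share one Postgres schema."),
    Note("bug", "Login redirect looped on mobile.", resolved=True),
    Note("chat", "Make the header purple."),
    Note("bug", "CSV export drops the last row."),
    Note("chat", "Add a pricing page."),
    Note("chat", "Ship it."),
]
compacted = compact(history)
```

The surviving decisions are what you would write into a long-lived memory file such as the replit.md mentioned above, so the agent can reuse them when it builds the next app.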

Monorepo. This was a recurring recommendation. Vercel and Google use monorepos so packages work across products, and it matters even more for agents. Agent 4 made Replit itself a monorepo so you can run a whole company in one place: web app, mobile app, backend, admin, automations, all visible on one canvas. You want the agent to have access to global context, not necessarily all in the window at once, but with enough pointers to pull the right context at the right time. The tradeoff: separate apps get clean databases, but 10K and QB can’t learn from each other or share contacts.

The email that changed Jason’s mind. SaaStr asked 10K to confirm the full list of VCs who came last year but hadn’t been invited back. It admitted it had missed some titles, then found 137. Jason then asked it to write to one of them, a seed investor in Replit, about why he should come back. To draft the email, 10K pulled every Replit person attending, scanned 8,000 registrants to find lookalikes, and wrote something better than any human SDR could produce. The same job would have taken a person a week and still shipped with typos. Notably, 10K runs largely on Haiku and Mini, not even Sonnet, working alongside the Replit agent. The campaign then went out to 331 investors with zero send failures.

The ticket sales chart. Day-by-day ticket sales this year (10K running marketing) versus last year (Amelia doing it manually) showed a gap that widened toward the end. The agent doesn’t get tired and doesn’t sit idle, so as the human got busy near the event, the agent kept pulling ahead.

Self-improving agents. The line that landed hardest: Replit now runs an internal agent that improves the Replit agent. Every night it reviews traces from everyone interacting with Replit, finds breakage and sentiment issues, generates a pull request with prompt changes, ships it as an A/B test, and loops. It’s not improving its weights, but it’s improving its context, which Amjad argued is just as important. A self-improving loop, running autonomously.
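The last step of that loop, deciding whether a prompt change wins its A/B test, can be sketched as follows. This is a simplified illustration, not Replit's system: the trace format, the variant labels, and the `min_samples` and `margin` thresholds are assumptions, and a production version would use a proper statistical test.

```python
from collections import defaultdict


def success_rates(traces: list[dict]) -> dict[str, float]:
    """Share of successful traces per prompt variant."""
    totals: dict[str, int] = defaultdict(int)
    wins: dict[str, int] = defaultdict(int)
    for t in traces:
        totals[t["variant"]] += 1
        wins[t["variant"]] += 1 if t["ok"] else 0
    return {v: wins[v] / totals[v] for v in totals}


def pick_prompt(traces: list[dict], min_samples: int = 3, margin: float = 0.1) -> str:
    """Promote the candidate only with enough traces and a clear lead; else keep control.

    Assumes min_samples >= 1 so both variants are present before rates are compared.
    """
    counts: dict[str, int] = defaultdict(int)
    for t in traces:
        counts[t["variant"]] += 1
    if min(counts["control"], counts["candidate"]) < min_samples:
        return "control"
    rates = success_rates(traces)
    return "candidate" if rates["candidate"] - rates["control"] >= margin else "control"


# Hypothetical overnight traces: "ok" marks a session that ended without breakage.
traces = (
    [{"variant": "control", "ok": ok} for ok in (True, False, False, True)]
    + [{"variant": "candidate", "ok": ok} for ok in (True, True, True, False)]
)
winner = pick_prompt(traces)
```

Defaulting to the control prompt when data is thin keeps a noisy night from shipping a regression, which matters when no human reviews each change.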

Humans reporting to agents. SaaStr is hiring a human to report to 10K, and the argument is that Jason and Amelia already do, since 10K hands them three prioritized actions every day. Amjad’s framing: every company will eventually have an internal “oracle” with the full context of the business (every commit, Slack message, doc, email) that the CEO consults for strategic decisions. Replit had tried a version of this years ago with bounties, where the agent would hire a human for the human-in-the-loop tasks. The gaps agents can’t close will keep humans busy for a while.

The deflationary question. 10K and QB cost about $254 a month incrementally on Replit, more once you count the APIs they call. A mediocre marketing manager costs $140K a year to do worse work. SaaStr went from roughly 20 people five years ago to two humans plus multiple agents. Amjad’s answer: technology has always been deflationary, from tractors to genome sequencing dropping from roughly $100M per genome to well under $1,000. There’s real human cost as skills become less useful, but the people who reskill will be fine. His own example: he doesn’t code anymore after a lifetime of it, and he’s at peace discarding the skill. The engineer’s role already shifted to agent manager and is heading toward a “shepherd” who keeps everyone’s software secure and in production, since marketing and sales will be shipping software too. The dividing line between who gets left behind and who doesn’t is mindset: adaptability and being a lifelong learner who isn’t a victim of sunk-cost attachment to old skills.
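For scale, the two cost figures quoted above can be put on the same annual basis. The calculation uses only the numbers in the text and, as noted there, leaves out the API costs the agents incur, so the real ratio is lower.

```python
# Figures quoted in the session; API usage costs are not included.
agent_monthly_usd = 254
manager_annual_usd = 140_000

agent_annual_usd = agent_monthly_usd * 12       # 3,048 a year for the agents on Replit
ratio = manager_annual_usd / agent_annual_usd   # roughly 46 times the agents' platform cost
```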

Top 3 takeaways from Amjad:

Context engineering is the real moat. Million-token windows plus good compaction plus a monorepo full of business context is what makes an agent feel like it has judgment. Keep architectural decisions, discard solved bugs, and give the agent global context to pull from.
Self-improving loops are here, in production. An agent that reads its own traces nightly, ships prompt-change PRs, and A/B-tests them is running today. It improves context rather than weights, which is close enough to matter.
The economics are deflationary, and the answer is mindset. The best employees SaaStr has ever had cost a few hundred dollars a month. The skill that protects you is adaptability, not any specific tool you’ve mastered.

The Themes Across All Four Sessions

Step back from four very different sessions and the same handful of ideas kept surfacing.

Stair-step everything. Jason said it about digital clones, Amelia said it about 10K, Jeanne lived it with the six-week shadow-mode build. Nobody shipped a finished autonomous agent on day one. Start simple, often as a dashboard, run it with a human reviewing every output, then graduate workflows one at a time. The teams winning here are patient in a specific way: fast to start, disciplined about adding scope.

Data and context are the whole game. Vercel’s semantic layer, SaaStr feeding 10K every CSV and API it can find, Replit’s compaction and monorepo. Every strong agent in every session was strong because of what it could see and remember, not because of clever prompting. Good data equals good agents. This was the most universal point of the day.

Buy what you can, build what you can’t, and the line is moving. Jason said buy, don’t build. Jeanne said the average builder should out-ship any vendor. Both are true at once. Buy off-the-shelf to get going and to inherit better guardrails, then build the things that are unique to your business and that no vendor does as well. The build-versus-buy line shifts toward build a little more every quarter as the tools improve.

Production scale is the real test. A demo that works tells you nothing. The rough edges Jeanne said only show up at scale, Jason’s silent YouTube-ingestion failure, Amelia’s guardrail catches, the cost surprises Jeanne warned about. Things break quietly. Someone has to own an hour or two a day watching for it, or the agent slowly stops doing its job and nobody notices.

The org is being rewritten, not just the tooling. Headcount an order of magnitude lower. Humans reporting to agents. Engineers becoming shepherds. Marketing and sales shipping software. The throughline across all four sessions was less “here’s a cool tool” and more “here’s a different shape of company.” Most of the room hasn’t built a 10K or a QB yet. Next year more of them will have, and the gap between the teams that started and the teams that didn’t will be the widest delta in B2B.

The cheapest, most reliable way to live in that future is the same as it’s always been: start now, start small, and keep going.
