A lot of folks ask why the SaaStr AI on our homepage is so good.
Look, I get it. Everyone’s launching AI tools these days. Most of them are pretty mediocre. Some are downright terrible. But ours actually works. Really works.
So what’s the secret?
The Foundation: 18 Million Words of Training Data
First, the obvious part. Our AI has been trained on 18,000,000 words of SaaStr content. That’s not just blog posts. That’s:
- Every single SaaStr post and answer from the past 12 years
- Every single SaaStr Annual transcript from the past 12 years
- Thousands of interviews with SaaS founders and executives
- Deep-dive case studies on companies from $1M to $100M ARR
- All our playbooks, frameworks, and tactical content
- Every single tweet and YouTube video from me and SaaStr
- Years of Q&A sessions with the community
But here’s the thing everyone misses: data alone isn’t enough.
We launched it without any further training and QA — and with that amount of data to underpin it, it was often good. But it also made too many errors.
The worst one? It kept telling folks the wrong dates for the next SaaStr events. Why? Because we hadn’t announced the dates yet. So it kept making them up.
The Real Secret: Daily QA for 60+ Days
So for the first 60 days after we launched, I personally QA’d our AI every single day. Not once a week. Not “when I had time.” Every. Single. Day.
15-20 minutes. Every morning. Like clockwork.
I’d review 100+ of the questions folks asked the day before. And I’d ask it tricky questions. I’d test edge cases. I’d see where it hallucinated or gave suboptimal answers. And then I’d just fix it.
I’d write the correct answer, and then input that and the question into the training section of the AI. Every day, again and again, and again.
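That daily loop can be sketched in a few lines of code. This is a minimal illustration, not our actual pipeline — the `Correction` record and the chat-style example format are assumptions, loosely modeled on how most fine-tuning APIs accept Q&A pairs:

```python
# Hypothetical sketch of the daily QA loop: review yesterday's bad
# answers, write the correction, and store each corrected Q&A pair
# as a training example. All names here are illustrative.
import json
from dataclasses import dataclass

@dataclass
class Correction:
    question: str
    bad_answer: str
    corrected_answer: str

def build_training_examples(corrections):
    """Turn human corrections into chat-style training examples."""
    examples = []
    for c in corrections:
        examples.append({
            "messages": [
                {"role": "user", "content": c.question},
                {"role": "assistant", "content": c.corrected_answer},
            ]
        })
    return examples

corrections = [
    Correction(
        question="When is the next SaaStr Annual?",
        bad_answer="It's on May 1st.",  # hallucinated: dates weren't announced
        corrected_answer="The dates haven't been announced yet -- check saastr.com.",
    )
]

examples = build_training_examples(corrections)
print(json.dumps(examples[0], indent=2))
```

The point isn't the code — it's that every correction becomes a durable training example, so the same mistake doesn't come back tomorrow.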

By Day 60, I scaled this back to about once a week.
Learnings From Other AI Leaders
I dug into how the best AI companies handle this. From Harvey to Palantir to Scale AI to OpenAI. And guess what? They all do the same thing.
Harvey’s Approach: Custom-Trained Models + Human QA
Harvey partnered with OpenAI to create a custom-trained case law model, adding the equivalent of 10 billion tokens of data to power it. But the real magic? They worked with 10 of the largest law firms and gave attorneys side-by-sides of the custom case law model's output versus GPT‑4's output for the same question. 97% of the time, the lawyers preferred the output from the case law model.
Harvey didn’t just dump data into a model and call it good. They continuously tested with real experts. They iterated based on actual user feedback. They fine-tuned based on what worked in practice.
Sound familiar?
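That kind of side-by-side evaluation is simple to operationalize. Here's a toy sketch — the judgment data is made up, and the function just computes the preference rate across expert reviews:

```python
# Illustrative sketch of a side-by-side (A/B) preference test like
# Harvey's: experts compare two models' answers to the same question,
# record a winner, and we compute the preference rate. Toy data only.

def preference_rate(judgments, model):
    """judgments: list of (question_id, winning_model) pairs."""
    wins = sum(1 for _, winner in judgments if winner == model)
    return wins / len(judgments)

judgments = [
    ("q1", "case-law-model"),
    ("q2", "case-law-model"),
    ("q3", "gpt-4"),
    ("q4", "case-law-model"),
]

rate = preference_rate(judgments, "case-law-model")
print(f"{rate:.0%} preferred the custom model")
```

With enough judgments from real experts, this one number tells you whether your custom training is actually beating the base model.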
Palantir’s Forward Deployed Engineers
Palantir basically invented the “Forward Deployed Engineer” model. Forward Deployed AI Engineers work directly with customers, owning Gen AI strategy and implementation: building end-to-end workflows, taking them to production, and solving real-world problems at the largest scale.
These aren’t just engineers. They’re AI trainers in the field. They work with customers to understand their specific use cases, then iterate on the AI until it actually works for real-world scenarios.
For the better part of the last decade, it’s been broadly assumed that product-led growth (PLG) is superior to implementation-heavy enterprise software. But AI has changed that. Enterprise software companies that tackle complex workflows are regularly growing from $0 to $5 million, $10 million, or beyond $20 million in ARR in their first two years.
How? Daily hands-on training and QA.
Scale AI’s Data Engine
Scale AI gets this better than anyone. Scale’s Data Engine lets you integrate your enterprise data into these models, providing the base for long-term strategic differentiation.
But it’s not just about the data. Forward Deployed Engineers (FDEs) work directly with industry-leading Generative AI companies, building and owning robust, production-grade data integrations that power advanced foundation models.
They have dedicated engineers whose job is to make AI work for each specific customer. Daily. Hands-on. Until it’s right.
Gorgias’ AI for 18,000 SMBs in eCommerce
Gorgias has the same challenge as the rest — but with an added twist. Their average customer pays less than $10k a year for support and marketing in the Shopify ecosystem. They leaned in heavily on building an autonomous AI, but still — the first 30 days of training are crucial. There can’t be any errors when the AI is helping customers buy the right product at the right time.
The OpenAI Reality
Even OpenAI, the godfather of AI, does this. As of this writing, 22 of the 311 open roles on OpenAI’s career page fall into two categories: forward deployed engineers and solutions engineers.
Why? Because if OpenAI is just a company that sells token predictions, executives don’t know what to do with it. And if it’s a company that answers customer support questions, moderates forums, helps write code, summarizes emails, and so on, its use cases are too broad to connect to any specific customer without hands-on help.
Why This Matters More Than You Think
See, here’s what I learned: Untrained AIs are incredibly good at being consistently mediocre.
They’ll give you answers that sound right. That pass the sniff test. That would probably get a B+ in business school.
But B+ answers don’t build unicorns.
The magic happens when you train your AI on the *right* answers. Not the obvious ones. The counterintuitive ones. The ones that only come from years of experience and thousands of conversations with operators.
Fine-tuning is a powerful approach in natural language processing (NLP) and generative AI, allowing businesses to tailor pre-trained large language models (LLMs) for specific tasks. This process involves updating the model’s weights to improve its performance on targeted applications.
But the key insight from my research? To achieve optimal results, having a clean, high-quality dataset is of paramount importance. A well-curated dataset forms the foundation for successful fine-tuning.
That’s what daily QA gives you. Clean, high-quality, curated data about what actually works.
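What does "curated" actually mean in practice? At minimum: deduplicating questions and dropping thin answers before anything goes into fine-tuning. A minimal sketch, assuming a simple list of Q&A records (the field names and thresholds are illustrative):

```python
# Sketch of a simple dataset-curation pass before fine-tuning:
# deduplicate questions and drop low-quality (too-short) answers.
# Field names and the length threshold are assumptions.

def curate(records, min_answer_len=40):
    seen = set()
    clean = []
    for r in records:
        q = r["question"].strip().lower()
        if q in seen:
            continue  # drop duplicate questions
        if len(r["answer"]) < min_answer_len:
            continue  # drop thin, low-signal answers
        seen.add(q)
        clean.append(r)
    return clean

records = [
    {"question": "What is a good NRR?",
     "answer": "Best-in-class SMB NRR is around 100%+; enterprise is closer to 120%."},
    {"question": "what is a good nrr?",  # duplicate, different casing
     "answer": "Roughly 100% for SMB and 120% for enterprise is a strong benchmark."},
    {"question": "Short?", "answer": "too short"},
]

clean_records = curate(records)
print(len(clean_records))
```

Real curation goes further (factual review, freshness checks), but even this basic pass keeps junk out of the training set.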
The Forward Deployed Engineer Reality
You know why every AI company talks about “Forward Deployed Engineers” now?
Because B2B AIs don’t magically work out of the box.
Enterprises buying AI are like your grandma getting an iPhone: they want to use it, but they need you to set it up.
I work with or have invested in dozens of AI companies. Want to know the dirty secret? Almost every B2B AI company I work with or hold in my portfolio has a roughly 30-day onboarding process, driven by humans. And that’s *after* the AI ingests all your data.
The companies that are winning? They’re the ones doing the work.
Take a company like Decagon, whose agents automate customer support. The company has a sizable team of human “Agent Product Managers” who work closely with customers to stand up AI support agents.
Another pattern I found: Forward Deployed Engineers work directly with businesses to understand their unique processes, challenges, and expert knowledge. Their goal? To capture this valuable expertise and transform it into something AI systems can understand and learn from.
This is exactly what I was doing with our daily QA sessions. Capturing the nuanced knowledge that makes the difference between generic answers and genuinely useful insights.
The SMB Problem (And Opportunity)
If your ACV is $50K+, this is solvable. You hire Forward Deployed Engineers. They do the training. They handle the edge cases. They make your AI actually work.
But what if you’re selling to SMBs? What if your ACV is $5K? Who’s going to do 30 days of training for each customer?
This is where the magic happens. For AI startups, early hires with a baseline of domain knowledge will be key to successfully capturing expert reasoning.
The companies that figure out how to systematize this training process will win the SMB market. They’ll capture the expert knowledge once, then deploy it at scale.
That’s exactly what we did with SaaStr AI. I did the manual training for 60+ days. Now that knowledge is baked into the system and works for thousands of users.
What We’re Doing Now
We’re piloting 2 more AI tools right now. Both are great. Both are relatively early.
And guess what? Same process. Every single day.
My team knows the drill. Every morning, someone is QAing the outputs. Looking for errors. Finding edge cases. Training the model on better answers.
It’s not glamorous. It’s not scalable. But it works.
Thorough testing is non-negotiable when it comes to AI model deployment. This goes beyond simple accuracy metrics to include stress testing, edge case analysis, and robustness evaluations.
We’ve learned from the best:
- Harvey’s 97% preference rate came from continuous expert validation
- Palantir’s Forward Deployed Engineers ensure real-world success
- Scale AI’s Data Engine integrates customer-specific knowledge
- OpenAI’s enterprise focus requires human-guided implementation
The Bottom Line
Can an AI QA itself? Sure, to some extent. But at least today, not enough. Not to get you to where you really want to be.
If you want your AI to be truly great, you need to:
- Train it on the right data (not just any data)
- QA it every single day for at least 60 days
- Hunt out the errors obsessively
- Fix the hallucinations immediately
- Train it on the *right* answers (not just correct ones)
- Capture expert reasoning like the Forward Deployed Engineers do
- Iterate based on real user feedback like Harvey did with law firms
Do this, and your AI gets better fast. Really fast.
Skip it, and you’ll have another mediocre AI tool that nobody really wants to use.
The forward deployed model of top-down selling is upending the SaaS GTM playbooks that were written for CRUD apps with zero marginal costs in the 2010s.
The choice is yours.
But don’t kid yourself about the work required. Great AI isn’t magic. It’s just really, really good training.
And if you’re worried the daily QA work and the forward deployed model will kill your margins? If Palantir can attain best-in-class software margins, why wouldn’t AI-native startups be able to do the same, especially when the quality of the outcomes depends so heavily on the inputs from the forward deployed approach?
The companies that understand this will build the next generation of AI unicorns. The ones that don’t will wonder why their AI never quite works as well as they hoped.
