ICONIQ’s latest State of AI report has a chart that summarizes a question many B2B founders and boards think about constantly: How Much Should I Spend on AI Inference?
At scaling-stage AI companies, model inference now averages 23% of total AI product costs. Talent sits at 26%. Infrastructure and cloud at 17%.
Think about that. Inference is almost as expensive as your entire AI team.
And it doesn’t go down as you scale. At pre-launch, inference is 20% of costs. By the time you hit scale, it’s 23%. Meanwhile talent drops from 32% to 26%.
The math is the math. As you grow, you need ever more inference. You can’t cut it without degrading the product. And importantly, your well-funded competitors are spending more and more on inference to make their products ever better. If you cut back, and they don’t … you fall behind. Quickly.
So where does the money for all this inference spend come from?
- Option 1: Smaller teams. This is already happening. Talent as a percentage of costs drops 6 points from pre-launch to scaling. AI is replacing headcount, and that headcount savings is funding inference. Shopify has held headcount flat three years in a row, even with top-tier growth at scale. We are in large part replacing human budget with inference budget.
- Option 2: Inference becomes your marketing budget. If inference spend makes your product so good that it attracts customers on its own, you can cut traditional marketing and sales spend. PLG funded by inference. But the bar here is high. Your product has to be genuinely remarkable, not just good. Still, the very best AI and AI + B2B companies often spend almost nothing on traditional marketing, and run smaller sales teams than their pre-AI peers.
- Option 3: Better pricing. 37% of companies in the survey plan to change their AI pricing model in the next 12 months. Outcome-based pricing jumped from 2% to 18% in six months. Usage-based from 19% to 35%. Companies are trying to pass inference costs through to the customers who get the most value (see the pricing sketch after this list).
- Option 4: Model routing and efficiency. The report notes enterprise CDAOs are increasingly focused on “shifting to a cost-efficient model stack.” Frontier models aren’t necessary for most tasks. Route the majority of requests to smaller models and escalate only high-complexity cases to expensive ones (see the routing sketch after this list). But this is also just table stakes. Products are getting better so quickly, and consuming so many more tokens, that mere routing doesn’t necessarily reduce overall net costs. It just helps contain them.
- Option 5: For now, venture funding. That’s OK for a select few, if growth is insane as a result. If you’re growing 3x and burning through inference costs to get there, investors will fund it. But this only works if you’re in the top decile of growth. For everyone else, it’s just deferred pain.
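
To make the pass-through idea in Option 3 concrete, here is a minimal usage-based pricing sketch. The per-token rates, token counts, and target margin below are assumptions for illustration, not figures from the ICONIQ survey.

```python
# Hypothetical usage-based pricing: bill a multiple of raw inference cost so
# inference lands at a target share of what the customer pays.
# All rates, token counts, and margins here are assumptions.

def inference_cost(input_tokens: int, output_tokens: int,
                   in_rate_per_m: float = 3.00, out_rate_per_m: float = 15.00) -> float:
    """Raw model cost for one request, in dollars, at assumed per-million-token rates."""
    return input_tokens / 1e6 * in_rate_per_m + output_tokens / 1e6 * out_rate_per_m

def usage_based_price(input_tokens: int, output_tokens: int,
                      target_gross_margin: float = 0.70) -> float:
    """Price the request so inference cost is (1 - margin) of the bill."""
    return inference_cost(input_tokens, output_tokens) / (1.0 - target_gross_margin)

# At these assumed numbers, an 8k-in / 2k-out request costs ~$0.054 and bills
# at ~$0.18, so heavy users fund their own inference instead of eroding margin.
print(round(usage_based_price(8_000, 2_000), 2))
```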
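
And for Option 4, a minimal routing sketch, assuming a toy complexity heuristic and made-up per-token prices. The model names, rates, and escalation threshold are illustrative, not from the report.

```python
# Hypothetical model routing: send most requests to a cheap model and escalate
# only high-complexity ones to a frontier model. Names, prices, and the
# complexity heuristic are all assumptions.

from dataclasses import dataclass

@dataclass
class ModelTier:
    name: str
    cost_per_m_tokens: float  # assumed blended $ per million tokens

CHEAP = ModelTier("small-fast-model", 0.50)
FRONTIER = ModelTier("frontier-model", 15.00)

def complexity_score(prompt: str) -> float:
    """Toy heuristic: long, multi-step, heavily structured prompts score higher."""
    score = min(len(prompt) / 4000, 1.0)
    if "step by step" in prompt.lower() or prompt.count("\n") > 20:
        score += 0.3
    return min(score, 1.0)

def route(prompt: str, escalation_threshold: float = 0.7) -> ModelTier:
    """Use the cheap tier unless the request looks genuinely hard."""
    return FRONTIER if complexity_score(prompt) >= escalation_threshold else CHEAP

def blended_cost(prompts: list[str], avg_tokens_per_request: int = 10_000) -> float:
    """Rough blended $ cost for a batch, to compare routing against all-frontier."""
    return sum(avg_tokens_per_request / 1e6 * route(p).cost_per_m_tokens for p in prompts)
```

At these assumed prices, if roughly 80% of traffic stays on the cheap tier, the blended rate works out to about $3.40 per million tokens instead of $15. Though, as noted above, rising token volume per product can eat those savings right back.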
There’s no easy answer. But 23% is the number. That’s your baseline for inference spend: 23% of your total AI product costs.
If you’re spending materially more, you’re probably inefficient. If you’re spending materially less, your competitors might be building a better product.
Every B2B+AI company will have to figure this out. The ones who do will have a structural advantage. The ones who don’t will watch their margins disappear.

