OpenAI (or at least, The Information reporting on OpenAI) just dropped a bombshell: their “compute margin” — the share of revenue left after paying for the massive server costs to run ChatGPT — hit 70% in October 2025.
In January 2024, that number was 35%. They’ve essentially doubled their compute margin in under two years.
If you’re a B2B founder or investor, you know exactly why this matters. A 35% gross margin is a services business. A 70% gross margin is starting to look like… software.
So the question everyone’s asking: Have AI gross margins actually turned the corner? Or is this just creative accounting on a burning pile of cash?
And the harder question for B2B founders: Does any of this actually help you?
The honest answer: probably not as much as you’d hope.
Let’s dig in.
The Headline Numbers Look Great. The Reality Is More Complicated.
Traditional SaaS is beautiful from a unit economics standpoint. You build the software once, host it cheaply, and the marginal cost of each new customer approaches zero. That’s how you get to 75-80% gross margins that make investors swoon.
AI blew that up.
Every single AI inference — every ChatGPT response, every Copilot code suggestion — burns actual compute. GPUs aren’t cheap. Electricity isn’t cheap. And when you’re processing billions of queries, the math gets ugly fast.
Here’s what we were looking at in late 2023 and early 2024:
- OpenAI: ~35% compute margins (early 2024)
- Anthropic: negative 94% to negative 109% gross margins in 2024 (yes, negative — they lost more on infrastructure than they made in revenue)
- GitHub Copilot: Microsoft reportedly losing $20+ per user per month at the $10/month price point
And now? OpenAI’s compute margin jumped from around 35% in January 2024 to roughly 70% by October 2025. Anthropic expects its gross profit margin to reach 50% this year and 77% in 2028, up from negative 94% last year.
So margins are improving at the foundation layer. That’s genuinely good news.
But here’s the catch that nobody’s talking about clearly enough: the inference cost declines are happening on older models. Frontier models are getting more expensive, not cheaper.
The Treadmill Problem: Why B2B Startups Aren’t Benefiting
This is where the optimistic narrative falls apart for anyone building applications.
“Rather than falling as expected, the cost of some of the latest AI models has risen, as they use more time and computational resources to handle complicated, multistep tasks.”
The rise of agentic workflows has caused token consumption per task to jump 10x-100x since December 2023. Models like o3, DeepSeek R1, and Grok 4 introduced multi-step reasoning processes that generate massive reasoning outputs — and you pay for every token.
One analysis found that when comparing the same coding task, an aggressive reasoning model generated 603 tokens where a simpler model generated 60 — a 10x cost jump for identical results, purely due to token bloat.
Read that again. Per-token costs are falling. But total costs per task are rising.
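To make that concrete, here's a back-of-the-envelope sketch. The flat per-token price is an illustrative assumption, not any provider's actual rate; the 60 and 603 output token counts come from the coding-task comparison above:

```python
# Back-of-the-envelope: per-task cost can jump even at a fixed
# per-token price. The price below is an illustrative assumption.

PRICE_PER_1K_TOKENS = 0.002  # hypothetical flat rate, both models

def task_cost(output_tokens: int, price_per_1k: float = PRICE_PER_1K_TOKENS) -> float:
    """Cost of one task, driven entirely by output token count."""
    return output_tokens / 1000 * price_per_1k

terse_model = task_cost(60)       # simple model: 60-token answer
reasoning_model = task_cost(603)  # reasoning model: 603 tokens, same result

print(f"terse:     ${terse_model:.5f} per task")
print(f"reasoning: ${reasoning_model:.5f} per task")
print(f"multiple:  {reasoning_model / terse_model:.0f}x")  # ~10x from token bloat alone
```

And if the reasoning model also charges a higher per-token rate, those multiples compound.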
This is the treadmill problem. As a B2B startup, you’re constantly pressured to deliver better results. Better results require better models. Better models require more reasoning tokens. And reasoning tokens are expensive.
Here’s a concrete example: One SaaStr Fund portfolio company at $100M ARR is modeling adding another $6M in incremental inference costs over the next 12 months — not because their current product is broken, but because they need to leapfrog the competition. That’s 6 points of margin they’re voluntarily giving up just to stay ahead.
This is the dynamic nobody talks about: competition raises the bar as much as token costs lower it. Yes, GPT-4-level performance got cheaper. But your competitors aren’t shipping GPT-4-level products anymore — they’re shipping reasoning models, agentic workflows, and features that consume 10x the tokens. Stand still and you die.
OpenAI’s renewed focus on reasoning models is also a risky flex. The systems behind its Thinking and Deep Research modes are more expensive to run than standard chatbots because they chew through more compute.
AI coding assistants are under particular pressure to offer the newest, most advanced, and most expensive LLMs, because model makers are fine-tuning their latest releases specifically for coding and adjacent tasks like debugging.
So yes, GPT-3.5-level performance is 1000x cheaper than it was three years ago. But who’s shipping products on GPT-3.5? Your users want Claude Opus. They want GPT-5.2 Thinking. They want reasoning that actually works.
And you’re going to pay for it.
The Numbers That Actually Matter for B2B Startups
Let’s look at what’s happening at the application layer, where real companies are building real products.
Bessemer’s 2025 dataset shows fast-ramping AI “Supernovas” averaging about 25% gross margin early on, while steadier “Shooting Stars” trend closer to 60%. They also note that many of the AI Supernovas have negative gross margins, something we don’t tend to see often in software.
Let me put that in context: the traditional SaaS benchmark for a “good” gross margin is 75%+. If you show up to a Series B with 55% gross margins, you’re going to have an uncomfortable conversation about whether you’re actually a software company or a services business.
On average, these AI Supernovas have only 25% gross margins, trading profit for distribution in the short term.
That’s not a rounding error. That’s a fundamentally different business model.
Here’s the brutal math: Reports emerged that Copilot was costing Microsoft (GitHub’s parent) up to $80 per user per month in compute/model fees for heavy users, averaging a ~$20 loss per user in early 2023. In other words, for each $10 subscriber, Microsoft was eating perhaps $30 of cost on average, and much more for power users.
Microsoft can absorb those losses. Can you?
The Cursor Evolution: From 100% API Costs to Building Their Own Models
Here’s a case study that shows both the peril and the potential path forward.
In mid-2025, Cursor (made by Anysphere) was the poster child for the “thin wrapper” problem. One investment firm ran the numbers and found Cursor was paying approximately $650 million annually to Anthropic while generating roughly $500 million in revenue — a negative 30% gross margin. Their AWS bills doubled from $6.2 million to $12.6 million in a single month when Anthropic launched Priority Service Tiers.
The company’s response? Build their own models.
In October 2025, Cursor launched “Composer” — their first proprietary coding LLM. It’s a reinforcement-learned mixture-of-experts model trained specifically for agentic coding workflows, running 4x faster than comparable frontier models while maintaining similar quality. Research scientist Sasha Rush described it as training “a big MoE model to be really good at real-world coding, and also very fast.”
The benchmarks show Composer matching “mid-frontier” systems (think GPT-5 and Claude Sonnet 4.5 territory) while generating at 250 tokens per second — twice as fast as leading fast-inference models. Cursor still offers Anthropic, OpenAI, and Google models, but increasingly routes traffic to their own infrastructure.
The result? By November 2025, Cursor crossed $1 billion in annualized revenue at a $29.3 billion valuation. One analysis projects gross margins improving from 74% to 85% by 2027 as they migrate to a mix of open-source and proprietary models. As of December 2025, the company has single-digit monthly cash burn and $1 billion in cash reserves.
The lesson: Cursor survived the margin squeeze by doing what most startups can’t — investing hundreds of millions in building their own model infrastructure. Their CEO confirmed their in-house models “now generate more code than almost any other LLMs in the world.”
But here’s the uncomfortable truth: This path required $3.5 billion in total funding and a willingness to burn cash for years while building proprietary AI infrastructure. It’s not a playbook most B2B startups can replicate.
The Real Framework: Why This Is Different From Traditional SaaS Economics
AI startups face a unique economic challenge: compute costs that scale super-linearly with model size and usage. While traditional software companies see marginal costs approach zero as they scale, AI companies face GPU bills that can grow faster than revenue.
Traditional SaaS had variable costs too — hosting, payment processing, customer support. But those costs were modest relative to revenue, and they scaled sub-linearly with usage. More users meant better economics.
AI flips that. A company with $10M ARR and $15M in compute costs looks identical to one with $10M ARR and $2M in compute costs when you’re applying a 20x revenue multiple. But their fundamental value is completely different.
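Worked through, the gap is stark. Both sets of figures are the hypotheticals from the paragraph above:

```python
# Two companies that look identical under a 20x revenue multiple.
# Figures are the hypotheticals from the example above.

def gross_margin(arr: float, compute_cost: float) -> float:
    """Gross margin treating compute as the dominant cost of revenue."""
    return (arr - compute_cost) / arr

company_a = gross_margin(10_000_000, 15_000_000)  # loses money serving every customer
company_b = gross_margin(10_000_000, 2_000_000)   # classic software economics

print(f"Company A gross margin: {company_a:.0%}")       # -50%
print(f"Company B gross margin: {company_b:.0%}")       # 80%
print(f"Both valued at 20x ARR: ${20 * 10_000_000:,}")  # $200,000,000
```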
Three primary forces are compressing SaaS margins this year: rising cloud costs, the expenses associated with AI inference, and increasing support salaries.
And here’s the kicker: SaaS companies that once boasted 85% margins are now adjusting to 60-70% margins, or charging separately for AI features to claw back toward 80% on a blended basis.
The AI tax is compressing margins across the entire software industry, not just AI-native companies.
What Actually Works: The Companies Getting Margin Math Right
So what separates the companies making this work from the ones burning cash?
1. Intelligent Model Routing
The companies winning on margins aren’t using frontier models for everything. They’re building routing layers that send simple queries to cheap models and complex queries to expensive ones.
The key question is simple: does your use case require the top model on every request, or do you only need to clear a quality bar? If a bar is enough, routing lets you send most traffic to cheaper models and burst to the frontier when needed.
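Here's a minimal sketch of what that layer can look like. The difficulty heuristic, the model names, and the `call_model` stub are hypothetical stand-ins for whatever classifier and provider SDK you actually use:

```python
# Minimal model-routing sketch. Everything here is a placeholder:
# swap in your own difficulty classifier and provider client.

CHEAP_MODEL = "small-model"        # hypothetical cheap workhorse
FRONTIER_MODEL = "frontier-model"  # hypothetical expensive frontier model

def call_model(model: str, prompt: str) -> str:
    """Stand-in for your provider SDK call."""
    return f"[{model}] response to: {prompt[:40]}"

def classify_difficulty(prompt: str) -> float:
    """Hypothetical heuristic scoring 0.0 (trivial) to 1.0 (hard).
    In practice this might be a tiny classifier, or rules based on
    task type, prompt length, and historical failure rates."""
    return min(len(prompt) / 4000, 1.0)  # crude length-based proxy

def route(prompt: str, quality_bar: float = 0.7) -> str:
    """Send easy traffic to the cheap model; burst to the frontier
    only when estimated difficulty crosses the quality bar."""
    if classify_difficulty(prompt) < quality_bar:
        return call_model(CHEAP_MODEL, prompt)
    return call_model(FRONTIER_MODEL, prompt)

print(route("Fix the typo in this docstring"))         # stays on the cheap model
print(route("Refactor the whole auth module " * 200))  # escalates to the frontier
```

In production you'd also want an escalation path: if the cheap model's answer fails a quality check, retry on the frontier before returning it to the user.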
2. Usage-Based Pricing That Actually Works
A 2025 industry report found 92% of AI software companies now use mixed pricing models — combining subscriptions with usage fees, or offering different tiers for heavy usage — precisely to tackle the margin issue.
The “unlimited” model is dead. By mid-2025, GitHub announced that the formerly “unlimited” Copilot would include a generous allowance of AI requests, but beyond that, custom pricing would apply.
If you’re still offering unlimited AI usage at a flat price, you’re subsidizing power users with money you probably don’t have.
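Here's a minimal sketch of the mixed model, using illustrative numbers rather than any vendor's actual pricing: a flat subscription bundles a usage allowance, and anything past it is metered.

```python
# Mixed pricing sketch: subscription + included allowance + metered
# overage. All numbers are illustrative assumptions.

SUBSCRIPTION = 30.00        # flat monthly fee
INCLUDED_REQUESTS = 500     # allowance bundled into the subscription
OVERAGE_PER_REQUEST = 0.04  # metered price past the allowance

def monthly_bill(requests_used: int) -> float:
    """Flat fee plus metered overage beyond the included allowance."""
    overage = max(requests_used - INCLUDED_REQUESTS, 0)
    return SUBSCRIPTION + overage * OVERAGE_PER_REQUEST

print(monthly_bill(200))   # light user: $30.00, high margin
print(monthly_bill(500))   # right at the allowance: $30.00
print(monthly_bill(5000))  # power user: $210.00 instead of a subsidized $30
```

Light users stay high-margin; power users pay for the compute they actually burn.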
3. Building Value Beyond Token Markup
Look at Replit’s evolution. They charge $25/month for Core ($20/month when billed annually), which includes $25 in usage credits. But here’s the key insight: according to the Replit team, it costs them approximately $4 to host the average customer’s website. That’s a roughly 80%+ margin on the hosting layer — classic SaaS economics.
The real margin comes from what happens beyond the base AI assistance — hosting, deployments, storage, and bandwidth that users consume once their projects are live. Replit’s Bounties marketplace takes a 10% fee from posters, which is a clean non-inference take rate.
Replit’s overall gross margins have fluctuated wildly — reaching a reported 36% by late 2025, up from negative 14% at the start of the year — driven by the cost of accessing large language models for their AI Agent. But by layering subscription revenue with high-margin hosting infrastructure and marketplace fees, they’ve built a model where AI is the hook but infrastructure is the margin. They switched from flat 25-cent pricing per coding task to “effort-based pricing” that can reach $2 per complex task — directly passing the cost variability to users while keeping the predictable hosting revenue for themselves.
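Here's a rough sketch of what effort-based pricing can look like. The 25-cent floor and $2 cap echo the Replit figures above; the markup multiplier is purely an assumption:

```python
# Effort-based pricing sketch: price scales with the compute a task
# actually consumed, clamped to a band for predictability. The floor
# and cap echo the Replit figures; the markup is an assumption.

PRICE_FLOOR = 0.25  # simple task
PRICE_CAP = 2.00    # complex, multi-step task
MARKUP = 2.5        # hypothetical multiplier over inference cost

def effort_price(inference_cost: float) -> float:
    """Charge a markup on actual inference cost, within a band."""
    return min(max(inference_cost * MARKUP, PRICE_FLOOR), PRICE_CAP)

print(effort_price(0.05))  # cheap task  -> $0.25 (floor)
print(effort_price(0.30))  # mid task    -> $0.75
print(effort_price(1.50))  # heavy task  -> $2.00 (capped)
```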
The companies surviving the margin squeeze are the ones building enough product depth that they’re not just marking up API calls.
4. Proprietary Models (The Nuclear Option)
The Cursor playbook shows what’s possible: build your own models to escape the margin squeeze. Their Composer model — trained on real software engineering tasks using reinforcement learning — now handles most of their inference volume at a fraction of what Anthropic charges.
But let’s be honest about what this requires: Cursor built custom reinforcement learning infrastructure using PyTorch and Ray across thousands of NVIDIA GPUs. They developed specialized MoE kernels and hybrid sharded data parallelism. This isn’t a weekend project — it’s a nine-figure R&D commitment.
For most startups, the realistic version is more modest: fine-tune open-source models (Qwen, Llama, Mistral) for your specific use case, then route as much traffic as possible to those cheaper models while reserving frontier models for edge cases.
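A quick back-of-the-envelope on why that mix matters. The 90/10 traffic split and the per-million-token prices are assumptions for illustration only:

```python
# Blended cost of the fine-tune-and-route strategy. Prices and the
# traffic split are illustrative assumptions, not real rates.

FINETUNED_PRICE_PER_M = 0.50  # hypothetical self-hosted open-source model
FRONTIER_PRICE_PER_M = 15.00  # hypothetical frontier API model

def blended_cost_per_m(frontier_share: float) -> float:
    """Average cost per million tokens given the share of traffic
    that escalates to the frontier model."""
    return (frontier_share * FRONTIER_PRICE_PER_M
            + (1 - frontier_share) * FINETUNED_PRICE_PER_M)

print(blended_cost_per_m(1.00))  # everything on frontier: $15.00/M tokens
print(blended_cost_per_m(0.10))  # 90% on the fine-tune:   $1.95/M, ~87% cheaper
```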
Is VC Subsidizing a Structurally Unprofitable Industry?
We’re living through a period where AI is effectively subsidized. Even as inference becomes 50–100× cheaper every few years, prices remain below true economic cost, propped up by Big Tech, leading labs, and their backers.
OpenAI is still burning $8 billion a year. Anthropic is burning billions. The AI coding startups losing money on every user are still raising at billion-dollar valuations.
The optimistic view: this is the “invest to win the market” phase, and unit economics will improve.
The skeptical view: “That’s what everyone’s banking on,” said Erik Nordlander, a general partner at Google Ventures. “The inference cost today, that’s the most expensive it’s ever going to be.” It’s not entirely clear how true that is.
The Bottom Line for B2B Founders
Here’s what I’d tell any founder building an AI-native B2B product right now:
The good news is real, but limited. OpenAI and Anthropic improving their margins means the infrastructure layer is becoming more sustainable. That’s genuinely important. It means these companies are more likely to survive and continue providing APIs you depend on.
But the benefits don’t flow down automatically. Inference costs falling 50x per year sounds amazing until you realize that (a) those declines are on older models, (b) frontier models are actually getting more expensive, and (c) user expectations force you to use frontier models.
The treadmill is real. You will be pressured to use better models. Better models will cost more per task even if they cost less per token. Your margins will stay compressed unless you actively fight the dynamic.
The survival playbook is clear:
- Don’t build thin wrappers. Build deep workflow products with multiple revenue streams.
- Don’t offer unlimited usage. Price for the cost curve you actually have.
- Don’t assume your API provider is your friend. They’re building competing products.
- Don’t ignore routing and optimization. The companies winning are religious about cost efficiency.
The most important metric isn’t your gross margin today — it’s your gross margin trajectory. Are you getting more efficient as you scale, or are you on a treadmill where every product improvement costs you margin?
OpenAI going from 35% to 70% is a proof point that the treadmill can be escaped at the foundation layer. The open question is whether application-layer companies can do the same — or whether they’re permanently caught between falling API costs and rising user expectations.
My honest take: some will win big. Most will struggle. And a lot of venture dollars are going to evaporate in the process.
Welcome to the real economics of AI in B2B.