The $20/month AI subscription is a gamble, not a business model. Anthropic, OpenAI, and Google are all betting that most subscribers won’t actually use the product they’re paying for. Sam Altman confirmed in January 2025 that even the $200/month ChatGPT Pro tier was losing money, at a price he personally set and “thought we would make some money” on. When $200/month can’t cover costs, the math at $20 is substantially worse.
This isn’t a startup growing pain. It’s structural, and only one of the three major frontier labs has solved it.
What tokens actually cost to serve
The gap between what labs charge and what they pay to run inference reveals the real state of the business. As of March 2026, Claude Sonnet 4.5/4.6 charges $15/M output tokens against an estimated actual cost of roughly $9/M, about 40% gross margin. GPT-4o at $10/M output earns 46–70% gross margin, a wide range because OpenAI’s accounting of cost of goods sold varies significantly depending on what’s included. Gemini 2.5 Pro at $10/M output likely exceeds 50%; Gemini 2.5 Flash at $2.50/M probably clears 60%.
Anthropic’s margin trajectory is the most dramatic in the industry. In 2024, it reported a negative 94% gross margin, losing nearly a dollar for every dollar earned. By 2025, that had swung to approximately 40%, though still below its 50% internal target after cloud infrastructure costs came in 23% higher than expected. The company targets 77% by 2028.
Google’s advantage is structural, not marginal. Midjourney’s migration from Nvidia H100s to Google TPU v6e cut its monthly inference bill from $2.1M to under $700K, a 67% reduction. Google processes over 10 billion tokens per minute and earns 30% operating margins in its Cloud segment. Competitors paying GPU rates cannot reach those numbers, full stop.

The subscription math doesn’t close for heavy users
A light user sending 30 queries daily on Sonnet or GPT-4o costs providers roughly $9/month to serve. At $20, that works. A moderate user at 80 daily queries in mixed-model sessions costs $15–45/month. A heavy user at 200+ messages daily (particularly with reasoning models like o3 or Claude extended thinking) costs $50–200+ per month to serve. The subscription doesn’t price that.
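That per-user math can be sketched in a few lines. The tokens-per-query and blended serving-cost figures below are assumptions chosen to be consistent with the article’s $9/month light-user estimate, not provider disclosures:

```python
# Rough monthly cost to serve one subscriber. Assumed inputs: ~1,500
# billable tokens per query and a ~$7/M blended serving cost for a
# Sonnet/GPT-4o-class model (neither figure is disclosed by any lab).
TOKENS_PER_QUERY = 1_500
BLENDED_COST_PER_M = 7.00  # dollars per million tokens, assumed

def monthly_serve_cost(queries_per_day: float, token_multiplier: float = 1.0) -> float:
    """Estimated provider cost to serve one subscriber for 30 days."""
    tokens = queries_per_day * TOKENS_PER_QUERY * token_multiplier * 30
    return tokens / 1_000_000 * BLENDED_COST_PER_M

light = monthly_serve_cost(30)        # comfortably under the $20 price
heavy = monthly_serve_cost(200, 3.0)  # reasoning sessions burn several times more tokens
print(f"light user: ${light:.2f}/mo, heavy user: ${heavy:.2f}/mo")
```

On these assumptions the light user costs about $9.45/month to serve and the heavy user roughly $189/month, squarely inside the $50–200+ range above.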
Usage caps exist precisely because the math fails without them. ChatGPT Plus limits GPT-4o to 150 messages per 3-hour window and o3 to 100 per week. Claude Pro caps at roughly 45 messages per 5-hour window. Gemini Advanced silently downgrades heavy users to cheaper Flash models mid-session. These aren’t product constraints. They’re the mechanism that makes flat-rate pricing viable at all.
The underlying data makes this harder to wave off. OpenAI’s research with Harvard found that early-adopter cohorts now send 40% more messages daily than two years ago, and per-user message volume grew 5.8× while the user base grew 3.2×. Existing users consume more over time, not less. The cross-subsidy required to cover heavy users gets larger, not smaller.

Three layers of cross-subsidy
The flat-rate model works only if three subsidy layers hold simultaneously.
Light paid users cover heavy paid users. A subscriber using Claude for 10 casual queries daily generates roughly $15 in gross margin, offsetting a power user’s $100+ monthly cost. Anthropic’s usage cap affects fewer than 5% of Claude Pro subscribers, meaning 95% of the user base is net positive.

Paid users cover free users entirely. 95% of ChatGPT’s 800M+ users pay nothing, but free users consume only 18% of compute.

Enterprise customers cover consumer pricing. Anthropic earns roughly 80% of its revenue from enterprise and API sales, not consumer subscriptions. That insulates it from the full weight of this dynamic. OpenAI earns 75% from consumer subscriptions. That contrast explains much of the difference in their trajectories.
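The first subsidy layer is easy to make concrete. The cohort shares and per-cohort serve costs below are illustrative assumptions, anchored only to the article’s $20 price point and its under-5% cap-hit figure:

```python
# Blended economics per subscriber under an illustrative cohort split.
# Shares and serve costs are assumptions, not provider data.
PRICE = 20.0
cohorts = [  # (label, share of subscribers, est. monthly serve cost in $)
    ("light",    0.70,   5.0),
    ("moderate", 0.25,  15.0),
    ("heavy",    0.05, 120.0),
]
blended_cost = sum(share * cost for _, share, cost in cohorts)
heavy_cost = next(s * c for name, s, c in cohorts if name == "heavy")
heavy_share_of_cost = heavy_cost / blended_cost
print(f"blended serve cost: ${blended_cost:.2f}")
print(f"blended gross margin: {(PRICE - blended_cost) / PRICE:.0%}")
print(f"heavy cohort's share of total cost: {heavy_share_of_cost:.0%}")
```

On these numbers the 5% heavy cohort consumes about 45% of total serving cost, which is why small shifts in cohort mix move the blended margin sharply.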
OpenAI’s head of ChatGPT has called current pricing “accidental.” Leaked plans suggest a $100/month “Pro Lite” tier, and Altman has hinted at usage-based pricing to replace flat subscriptions. The all-you-can-eat model at $20 may not survive contact with IPO requirements.
Token costs are collapsing, but usage grows faster
Epoch AI found LLM inference prices falling between 9× and 900× per year, with a median decline of 50× annually. After January 2024, that accelerated to 200× per year. A16z documented GPT-3-quality inference dropping from $60/M tokens in November 2021 to $0.06/M by late 2024, a 1,000× reduction in three years. Stanford HAI confirmed a 280× decline for GPT-3.5-level performance in just 18 months.
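Those headline multiples are easier to compare once annualised, using only the figures in the paragraph above:

```python
# Implied annual decline factors from the article's reported multiples.
def annual_factor(total_decline: float, years: float) -> float:
    """Constant yearly factor that compounds to `total_decline` over `years`."""
    return total_decline ** (1 / years)

gpt3_rate = annual_factor(1_000, 3)  # $60/M -> $0.06/M over three years
hai_rate = annual_factor(280, 1.5)   # Stanford HAI: 280x in 18 months
print(f"GPT-3 era: ~{gpt3_rate:.0f}x/year; GPT-3.5 era: ~{hai_rate:.0f}x/year")
```

Both annualised rates (roughly 10× and 43× per year) sit below Epoch AI’s post-January-2024 figure of 200× per year, which is the acceleration the paragraph describes.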
SemiAnalysis provided the hardware-level detail: Nvidia’s B200 GPUs now hit $0.02 per million tokens on open-source models, a 5× improvement in two months through software optimisation alone. The GB200 NVL72 system turns a $5M hardware investment into $75M in token revenue, a 15× ROI when serving DeepSeek R1.
The problem is that usage grows faster than cost falls. OpenAI’s inference costs hit $8.4B in 2025, more than double the $3.8B spent in 2024, against $13.1B in revenue. Reasoning models (o3, Claude extended thinking) consume 10–50× more tokens per task than standard completions. Output tokens for o1 still cost $60/M, comparable to GPT-3 at launch. Cheaper commodity inference doesn’t solve premium frontier costs, which is where product differentiation lives.

Three companies, three trajectories
Google Cloud posted 30% operating margins in Q4 2025, growing at 48% year-over-year. Its TPU advantage eliminates Nvidia’s GPU margin premium from the cost stack entirely, a 2–4× cost advantage that compounds with scale. AI products attached to a $403B revenue ecosystem don’t need to be independently profitable. The $99/year promotional Gemini subscriptions reflect confidence, not desperation.
Anthropic’s revenue grew from $1B ARR in December 2024 to roughly $19B ARR by March 2026, driven by Claude Code ($2.5B annualised) and enterprise API sales. Its $21B TPU purchase agreement with Google is a deliberate move toward the same infrastructure economics that make Google profitable. Cash-flow breakeven is targeted for 2028.
OpenAI projects cumulative losses of roughly $74B by 2028 and doesn’t expect cash-flow positivity until 2029–2030. Its 2025 inference costs of $8.4B are projected to reach $14.1B in 2026, and it owes Microsoft 20% of all revenue through 2032. A custom inference chip, due to tape out in mid-2026, is an acknowledgment that GPU-dependent economics are unsustainable at this scale.

The counterargument worth taking seriously
Sequoia Capital’s David Cahn calculated a $600B annual revenue gap between what AI companies need to earn to justify infrastructure spending and what they actually generate. Barclays estimated the industry needs 12,000 ChatGPT-sized products to justify current CapEx. Deloitte called 2026 “the year of Inference Famine.” These aren’t fringe views.
But cost curves have consistently surprised to the downside, faster and further than any analyst projected. If the 200× annual decline rate holds for three more years, per-token economics look entirely different. Usage-based pricing, in early testing at both OpenAI and Anthropic, could resolve the flat-rate cross-subsidy problem without requiring most users to stop showing up.
The unresolved question is whether platform dynamics matter more than unit economics. Google captured search before it had a clear ad model. If one lab captures the coordination layer for AI applications (agents, codegen pipelines, enterprise workflows), per-token margins become secondary. That’s the bet Anthropic is making with Claude Code. It’s the bet OpenAI is making with everything at once.
For now, the economics are what they are. Every frontier lab except Google loses money serving active users. The subscriptions work because most people don’t show up. When they do, the math changes. The pricing will too.
The “improving trajectory” story for AI economics relies on a compound bet: that inference costs keep falling at their current extraordinary rate, that usage growth moderates, that enterprise revenue scales without margin compression, and that company projections hold. Each of those assumptions has already been wrong at least once. Holding all four simultaneously is what the 2028 and 2029 breakeven forecasts require.
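The compound-bet framing is worth putting in numbers. The probabilities below are purely illustrative, not estimates; the point is only that four assumptions which must all hold multiply down quickly:

```python
# Illustrative only: generous odds on each assumption still compound
# into a much smaller joint probability.
assumptions = {
    "cost decline holds at current rates":          0.80,
    "usage growth moderates":                       0.70,
    "enterprise scales without margin compression": 0.75,
    "company projections hold":                     0.70,
}
joint = 1.0
for p in assumptions.values():
    joint *= p
print(f"joint probability all four hold: {joint:.0%}")
```

Even with each assumption at 70–80%, the joint probability lands below one in three.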
That is not impossible. It is not a safe assumption either.
The numbers look better than the business is
Anthropic’s swing from -94% to +40% gross margin in a year is genuinely impressive. It is also, if you stop and hold the number, a company that still spends 60 cents of every revenue dollar just serving its product, keeping only 40 cents of gross profit before R&D, safety research, training costs, or salaries. The improvement is real. The conclusion that profitability is therefore on track is a bigger leap than the numbers support.
OpenAI’s margin figures are particularly hard to interpret. Its “compute margin” (revenue after Azure inference costs for paying users only) reached roughly 70% by October 2025. Its full GAAP-style gross margin is 33–48%, depending on what you count. The 37-point spread between the lowest figure and the compute margin is not analysis. It is a negotiation with accounting methodology. And neither figure accounts for the $8.4B in 2025 inference costs against $13.1B in revenue, or the 20% Microsoft revenue share that runs through 2032.
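Most of that spread is a question of what counts as cost of revenue, which the article’s own top-line figures can roughly reproduce. The $3.9B paid-user inference number below is an inferred split consistent with a ~70% compute margin, not an OpenAI disclosure:

```python
# Same revenue, two cost-of-revenue definitions.
REVENUE = 13.1       # $B, 2025 revenue (from the article)
ALL_INFERENCE = 8.4  # $B, 2025 total inference spend (from the article)
PAID_ONLY = 3.9      # $B, assumed paid-user share; inferred, not disclosed

def gross_margin(revenue: float, cogs: float) -> float:
    return (revenue - cogs) / revenue

print(f"compute margin (paid-user inference only): {gross_margin(REVENUE, PAID_ONLY):.0%}")
print(f"all-inference margin:                      {gross_margin(REVENUE, ALL_INFERENCE):.0%}")
```

One definition lands around 70%, the other around 36%: the same business, separated by nothing but the cost line.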
A company projecting $74B in cumulative losses by 2028 is not on a trajectory that the phrase “improving unit economics” adequately describes.

Cost deflation cannot outrun usage growth indefinitely
The 1,000× reduction in GPT-3-equivalent inference costs over three years is real and remarkable. A16z’s “LLMflation” data, Epoch AI’s 200× annual decline rate, SemiAnalysis’s hardware benchmarks: the cost side of the ledger is moving in the right direction at an extraordinary pace.
The problem is the denominator. OpenAI’s inference costs hit $8.4B in 2025, more than double the $3.8B spent in 2024, against $13.1B in revenue. Inference costs are projected to reach $14.1B in 2026. The cost-per-token is falling; total compute spend is rising. These two facts are not in contradiction. Usage is growing faster than efficiency improves, and that dynamic has persisted despite every efficiency gain.
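The denominator point is simple arithmetic: total spend is unit cost times usage, so both facts hold at once whenever usage grows faster than cost falls. Using the article’s 2024 and 2025 spend figures and an assumed efficiency factor:

```python
# total_spend = unit_cost * usage. If spend rose ~2.2x while unit cost
# fell 10x, usage grew ~22x. The 10x factor is an assumption for
# illustration, not a measured figure.
spend_2024, spend_2025 = 3.8, 8.4  # $B, from the article
unit_cost_drop = 10.0              # assumed year-over-year efficiency gain
spend_growth = spend_2025 / spend_2024
implied_usage_growth = spend_growth * unit_cost_drop
print(f"spend grew {spend_growth:.1f}x; implied usage grew ~{implied_usage_growth:.0f}x")
```

Any plausible value for the efficiency factor tells the same story: usage growth is an order of magnitude ahead of spend growth.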
Reasoning models make this worse. o3 and Claude extended thinking consume 10–50× more tokens per task than standard completions. Output tokens for o1 cost $60/M today, comparable to GPT-3 at launch. The frontier of the product, where differentiation lives and customers pay premium prices, is not deflating at the same rate as commodity inference. The more capable the model, the more compute it burns per task. Every improvement in model capability pushes inference costs back up.

The subscription model is more fragile than usage averages suggest
The cross-subsidy arithmetic that makes flat-rate subscriptions viable depends on usage patterns staying roughly where they are today. They are not. OpenAI’s research with Harvard found that early-adopter cohorts now send 40% more messages daily than two years ago, and per-user message volume grew 5.8× while the user base grew only 3.2×. Existing users consume more over time, not less.
This is a structural trend, not a cohort quirk. As AI becomes embedded in daily work, the “light user who generates margin” increasingly becomes the “moderate user who breaks even.” Anthropic’s 5% cap-hit rate is a 2025 figure. If the usage trend holds, that percentage rises, and so does the cost to serve the median subscriber.
Usage caps are the safety valve, not the solution. When a customer hits a cap mid-task and gets downgraded to a slower model, that is a product failure. Doing it more often as usage grows is not a path to healthy economics. It is a path to subscriber churn.
The revenue gap is larger than the trajectory accounts for
Sequoia Capital’s David Cahn calculated a $600B annual revenue gap between what AI companies need to earn to justify their infrastructure spending and what they generate. Barclays estimated 12,000 ChatGPT-sized products are needed to close the gap. Deloitte called 2026 “the year of Inference Famine,” with demand growing faster than cost reductions can offset.
These are not fringe analyses. They reflect the fundamental tension between what has been built and what the market currently pays. Google alone plans to spend $175–185B on infrastructure in 2026; the cumulative CapEx across the industry bets on a market that does not yet exist at the required scale. If enterprise adoption grows as projected, the math eventually works. If it plateaus or compresses margins through competition, the timeline extends indefinitely.
Anthropic’s 2028 breakeven target already assumes its infrastructure costs come in on plan, the same forecast revised down from 50% to 40% gross margin after cloud costs came in 23% higher than expected. One such revision in the wrong direction pushes the target to 2030. Two pushes it beyond the planning horizon.
Google’s advantage may not be the template it appears
Google Cloud’s 30% operating margins are real and significant. But Google’s structural advantage (custom TPU silicon at 2–4× lower inference cost) is not replicable by the independent labs on any near-term timeline. Anthropic’s $21B TPU purchase agreement with Google helps, but it also creates a dependency on the competitor it is racing against. The advantage that makes Google profitable is the same advantage that makes it a more attractive infrastructure provider than any alternative.
The antitrust risk is not negligible. Google already operates under scrutiny on search and advertising. An AI infrastructure market where Google Cloud becomes the de facto provider for competitors who cannot match TPU economics will draw attention. There is no guarantee that resolves in Google’s favour on a five-year horizon.

The unit economics of frontier AI are improving. The pace of that improvement is real. What the optimistic read misses is how far the starting point was from sustainable, how many assumptions the trajectory relies on holding simultaneously, and how many of those assumptions have already needed revision once.
The race to profitability has clear waypoints. Whether the labs hit them depends on variables none of them fully control.

The unit economics of frontier AI are being solved. Not eventually, not in theory. In the numbers right now. Anthropic swung from -94% gross margin to +40% in a single year. Google Cloud is already posting 30% operating margins on AI infrastructure. Inference costs are falling 200× per year. Anyone reading the current economics as a permanent condition is reading the wrong signal.
The question isn’t whether these businesses reach profitability. It’s who gets there first and how wide the gap becomes.
The margin trajectory is the story
Anthropic’s gross margin went from negative 94% in 2024 to approximately 40% in 2025. That is the fastest margin improvement of any independent AI lab, driven by a disciplined strategic choice: 80% of revenue from enterprise and API sales, not from consumer subscriptions with unpredictable usage. The company targets 77% gross margin by 2028 and cash-flow breakeven the same year. Those targets are not aspirational. They are the logical extension of a trajectory that has already outpaced expectations.
GPT-4o earns 46–70% gross margin at current pricing. Claude Sonnet earns approximately 40%. These are real margins on every token served. The business model works at the token level. What has been missing is volume and cost structure, and both are improving simultaneously.
Google is proof the model works at scale. Custom TPU silicon eliminates Nvidia’s GPU margin premium from the cost stack, delivering a 2–4× cost advantage that compounds with every hardware generation. Midjourney cut its monthly inference bill by 67% migrating to TPU v6e. Google Cloud posted 30% operating margins in Q4 2025, growing at 48% year-over-year. This is not an outlier. It is the destination.

Cost deflation is not slowing
Epoch AI documented LLM inference prices falling between 9× and 900× per year, with a median decline accelerating to 200× per year after January 2024. GPT-3-quality inference dropped from $60/M tokens in 2021 to $0.06/M by 2024, a 1,000× reduction in three years. SemiAnalysis found Nvidia’s B200 GPUs reaching $0.02 per million tokens on open-source models, a 5× improvement in just two months through software optimisation alone.
The GB200 NVL72 system turns a $5M hardware investment into $75M in token revenue, a 15× ROI when serving DeepSeek R1. These numbers matter because they extend the trend forward: if the cost structure improves at anything close to the current rate, the economics of even the most demanding workloads flip within a few years.
The objection that reasoning models are expensive is correct but incomplete. o3 and Claude extended thinking cost more per task today. They also represent a small fraction of total volume. Commodity inference, which makes up the bulk of usage, is getting dramatically cheaper, and that trend is not slowing.

The subscription model is more durable than the critics think
The $20/month subscription is frequently cited as evidence that AI economics are broken. The actual data tells a different story. Anthropic’s usage cap affects fewer than 5% of Claude Pro subscribers. 95% of paying users generate healthy gross margins on every interaction. ChatGPT’s free tier represents 62% of active accounts but consumes only 18% of compute. The cross-subsidy structure works precisely because most users’ natural usage patterns fall within the profitable range.
Sam Altman’s comment about losing money on the $200/month Pro tier has been widely circulated as a sign of structural failure. It reflects a specific cohort: power users who signed up for an unlimited-access tier explicitly to use it heavily. That cohort is not representative. The light-to-moderate usage that defines the median subscriber is exactly where the economics are solid.
Usage caps function as automatic margin protection. They are not a sign the model is failing. They are the mechanism that makes it work at scale.

Enterprise revenue is the margin engine
Anthropic’s B2B focus is not a compromise. It is the correct bet. Enterprise customers pay premium API rates with predictable, bounded usage patterns. They do not behave like the heavy-use outliers that make consumer subscriptions expensive. The 80% enterprise revenue mix insulates Anthropic from the worst dynamics of consumer economics and funds the margin improvement that makes 2028 breakeven credible.
The lesson OpenAI is learning slowly is one Anthropic built in from the beginning. 75% of OpenAI’s revenue comes from consumer subscriptions, the segment with the thinnest margins and the most unpredictable usage patterns. That is why OpenAI’s cumulative loss projection through 2028 is $74B while Anthropic targets breakeven the same year on a fraction of the revenue.
OpenAI’s head of ChatGPT has called current pricing “accidental.” Usage-based pricing is already in testing. When that shift lands, the economics of consumer AI will look far more like enterprise economics, and the cross-subsidy problem largely resolves itself.
The labs on the right trajectory will win
Google Cloud is profitable now. Anthropic has a credible path to 2028. The companies that understand their cost structure, lean into enterprise revenue, and position their infrastructure for the next hardware generation are not waiting for profitability. They are executing toward it.
The $20 subscription era is ending not because the model failed but because it succeeded well enough to create the user base that justifies moving to pricing that reflects actual value. That is not a problem. That is a business maturing.
The unit economics of frontier AI are genuinely hard, and the numbers are not yet where they need to be for most labs. But the trajectory is unambiguous. The labs reading this as a crisis are behind. The labs reading it as a transition problem with a known solution are right.
