sdlcnext.com

Frontier AI Is Profitable Only If You Don't Show Up

Every frontier AI lab except Google loses money serving active users. The $20/month subscription works only because most subscribers barely use it. Here's what the numbers actually say.


Viewpoint

The $20/month AI subscription is a gamble, not a business model. Anthropic, OpenAI, and Google are all betting that most subscribers won’t actually use the product they’re paying for. Sam Altman confirmed in January 2025 that even the $200/month ChatGPT Pro tier was losing money, at a price he personally set and one on which he “thought we would make some money.” When $200/month can’t cover costs, the math at $20 is substantially worse.

This isn’t a startup growing pain. It’s structural, and only one of the three major frontier labs has solved it.

What tokens actually cost to serve

The gap between what labs charge and what they pay to run inference reveals the real state of the business. As of March 2026, Claude Sonnet 4.5/4.6 charges $15/M output tokens against an estimated actual cost of roughly $9/M, about 40% gross margin. GPT-4o at $10/M output earns 46–70% gross margin, a wide range because OpenAI’s accounting of cost of goods sold varies significantly depending on what’s included. Gemini 2.5 Pro at $10/M output likely exceeds 50%; Gemini 2.5 Flash at $2.50/M probably clears 60%.
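The margin figures above follow from a simple identity: gross margin = (price − cost) / price. A quick sketch using the article’s estimated serving costs; the Flash cost is an assumed figure chosen to be consistent with “probably clears 60%,” not vendor data:

```python
# Gross margin from list price vs. estimated serving cost ($/M output tokens).
# Cost estimates come from the article; Flash's cost is an assumption.

def gross_margin(price_per_m: float, cost_per_m: float) -> float:
    """Gross margin as a fraction of revenue."""
    return (price_per_m - cost_per_m) / price_per_m

models = {
    "Claude Sonnet 4.5/4.6": (15.00, 9.00),  # ~40%, per the estimate above
    "Gemini 2.5 Flash":      (2.50, 0.90),   # assumed cost, not vendor data
}

for name, (price, cost) in models.items():
    print(f"{name}: {gross_margin(price, cost):.0%}")
```

The wide GPT-4o range (46–70%) falls out of the same formula: at a $10/M list price, the margin swings that far depending on whether the cost basis is closer to $3/M or $5.40/M.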

Anthropic’s margin trajectory is the most dramatic in the industry. In 2024, it reported a negative 94% gross margin, losing nearly a dollar for every dollar earned. By 2025, that had swung to approximately 40%, though still below its 50% internal target after cloud infrastructure costs came in 23% higher than expected. The company targets 77% by 2028.

Google’s advantage is structural, not marginal. Midjourney’s migration from Nvidia H100s to Google TPU v6e cut its monthly inference bill from $2.1M to under $700K, a 67% reduction. Google processes over 10 billion tokens per minute and earns 30% operating margins in its Cloud segment. Competitors paying GPU rates cannot reach those numbers, full stop.

Chart: API pricing and estimated gross margins across frontier models, March 2026

The subscription math doesn’t close for heavy users

A light user sending 30 queries daily on Sonnet or GPT-4o costs providers roughly $9/month to serve. At $20, that works. A moderate user at 80 daily queries in mixed-model sessions costs $15–45/month. A heavy user at 200+ messages daily (particularly with reasoning models like o3 or Claude extended thinking) costs $50–200+ per month to serve. The subscription doesn’t price that.
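These per-tier figures are easy to reproduce. A back-of-envelope sketch, where the tokens-per-query averages are hypothetical and the prices are Sonnet-class rates ($15/M output from above; the $3/M input price is an assumption):

```python
# Back-of-envelope monthly serve cost per subscriber. Tokens-per-query are
# hypothetical averages; prices are Sonnet-class ($15/M out; $3/M in assumed).

def monthly_cost(queries_per_day: float, in_tokens: float, out_tokens: float,
                 in_price: float, out_price: float, days: int = 30) -> float:
    """Dollars per month; prices in $/M tokens."""
    per_query = (in_tokens * in_price + out_tokens * out_price) / 1e6
    return queries_per_day * days * per_query

light = monthly_cost(30, 500, 600, 3.0, 15.0)     # short chat queries
heavy = monthly_cost(200, 1500, 1500, 3.0, 15.0)  # long, context-heavy sessions
print(f"light: ${light:.2f}/mo  heavy: ${heavy:.2f}/mo")  # → light: $9.45/mo  heavy: $162.00/mo
```

With reasoning models consuming 10–50× more output tokens per task, the heavy figure scales accordingly, which is where the $200+ tail comes from.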

Usage caps exist precisely because the math fails without them. ChatGPT Plus limits GPT-4o to 150 messages per 3-hour window and o3 to 100 per week. Claude Pro caps at roughly 45 messages per 5-hour window. Gemini Advanced silently downgrades heavy users to cheaper Flash models mid-session. These aren’t product constraints. They’re the mechanism that makes flat-rate pricing viable at all.

The underlying data makes this harder to wave off. OpenAI’s research with Harvard found that early-adopter cohorts now send 40% more messages daily than two years ago, and per-user message volume grew 5.8× while the user base grew 3.2×. Existing users consume more over time, not less. The cross-subsidy required to cover heavy users gets larger, not smaller.

Chart: Consumer subscription cost to serve vs. $20/month across light, moderate, and heavy usage

Three layers of cross-subsidy

The flat-rate model works only if three subsidy layers hold simultaneously.

Light paid users cover heavy paid users. A subscriber using Claude for 10 casual queries daily generates roughly $15 in gross margin, offsetting a power user’s $100+ monthly cost. Anthropic’s usage cap affects fewer than 5% of Claude Pro subscribers, meaning 95% of the user base is net positive.

Paid users cover free users entirely. 95% of ChatGPT’s 800M+ users pay nothing, but free users consume only 18% of compute.

Enterprise customers cover consumer pricing. Anthropic earns roughly 80% of its revenue from enterprise and API sales, not consumer subscriptions, which insulates it from the full weight of this dynamic. OpenAI earns 75% from consumer subscriptions. That contrast explains much of the difference in their trajectories.
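The whole structure reduces to a blended-margin calculation. The cohort shares and serve costs below are illustrative assumptions in the spirit of the article’s figures, not reported data:

```python
# Blended gross margin across subscriber cohorts at a $20 flat rate.
# Cohort shares and serve costs are illustrative assumptions, not reported data.

def blended_margin(price: float, cohorts: list[tuple[float, float]]) -> float:
    """cohorts: (share_of_users, monthly_cost_to_serve). Margin as a fraction."""
    avg_cost = sum(share * cost for share, cost in cohorts)
    return (price - avg_cost) / price

cohorts = [
    (0.70, 5.0),    # light users: cheap to serve, carry the model
    (0.25, 18.0),   # moderate users: near break-even at $20
    (0.05, 100.0),  # heavy (capped) users: individually loss-making
]
print(f"{blended_margin(20.0, cohorts):.0%}")  # → 35%
```

The model only holds while the heavy cohort stays small: shift it from 5% to 15% of users in this sketch and the blended margin goes negative, which is exactly why the caps exist.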

OpenAI’s head of ChatGPT has called current pricing “accidental.” Leaked plans suggest a $100/month “Pro Lite” tier, and Altman has hinted at usage-based pricing to replace flat subscriptions. The all-you-can-eat model at $20 may not survive contact with IPO requirements.

Token costs are collapsing, but usage grows faster

Epoch AI found LLM inference prices falling between 9× and 900× per year, with a median decline of 50× annually. After January 2024, that accelerated to 200× per year. A16z documented GPT-3-quality inference dropping from $60/M tokens in November 2021 to $0.06/M by late 2024, a 1,000× reduction in three years. Stanford HAI confirmed a 280× decline for GPT-3.5-level performance in just 18 months.
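These decline rates can be cross-checked against one another. The A16z series, for instance, implies roughly a 10× annual factor, at the low end of Epoch’s 9×–900× range:

```python
# Cross-check: $60/M (Nov 2021) to $0.06/M (late 2024) for GPT-3-quality tokens.
price_2021, price_2024, years = 60.00, 0.06, 3
total_drop = price_2021 / price_2024       # 1,000x over three years
annual_factor = total_drop ** (1 / years)  # compound annual decline
print(f"{total_drop:.0f}x total, ~{annual_factor:.0f}x per year")  # → 1000x total, ~10x per year
```

Stanford HAI’s 280× in 18 months works out to roughly 42× annualised (280 squared is ~78,400× per three years), which sits comfortably next to Epoch’s 50× median.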

SemiAnalysis provided the hardware-level detail: Nvidia’s B200 GPUs now hit $0.02 per million tokens on open-source models, a 5× improvement in two months through software optimisation alone. The GB200 NVL72 system turns a $5M hardware investment into $75M in token revenue, a 15× ROI when serving DeepSeek R1.

The problem is that usage grows faster than cost falls. OpenAI’s inference costs hit $8.4B in 2025, more than double the $3.8B spent in 2024, against $13.1B in revenue. Reasoning models (o3, Claude extended thinking) consume 10–50× more tokens per task than standard completions. o1 output tokens still cost $60/M, comparable to GPT-3 at launch. Cheaper commodity inference doesn’t solve premium frontier costs, which is where product differentiation lives.
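The dynamic is easiest to see in a toy model where per-token cost falls by one factor each year while token demand grows by another: total spend multiplies by their ratio. All parameters here are illustrative, not forecasts:

```python
# Toy model: per-token cost falls cost_drop x/yr while token demand grows
# usage_growth x/yr, so spend multiplies by (usage_growth / cost_drop) annually.
# Parameters are illustrative, not forecasts.

def total_spend(base_spend: float, cost_drop: float, usage_growth: float,
                years: int) -> float:
    return base_spend * (usage_growth / cost_drop) ** years

# Demand growing 3x/yr against cost falling 2x/yr: spend still rises 1.5x/yr.
print(total_spend(8.4e9, cost_drop=2.0, usage_growth=3.0, years=2))  # → 18900000000.0
```

The $8.4B base is OpenAI’s reported 2025 inference spend; the growth and deflation factors are placeholders, but the structural point survives any plausible choice: deflation per token does not imply deflation in total.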

Chart: Token price deflation, GPT-3 equivalent inference cost from 2021 to 2026

Three companies, three trajectories

Google Cloud posted 30% operating margins in Q4 2025, growing at 48% year-over-year. Its TPU advantage eliminates Nvidia’s GPU margin premium from the cost stack entirely, a 2–4× cost advantage that compounds with scale. AI products attached to a $403B revenue ecosystem don’t need to be independently profitable. The $99/year promotional Gemini subscriptions reflect confidence, not desperation.

Anthropic’s revenue grew from $1B ARR in December 2024 to roughly $19B ARR by March 2026, driven by Claude Code ($2.5B annualised) and enterprise API sales. Its $21B TPU purchase agreement with Google is a deliberate move toward the same infrastructure economics that make Google profitable. Cash-flow breakeven is targeted for 2028.

OpenAI projects cumulative losses of roughly $74B by 2028 and doesn’t expect cash-flow positivity until 2029–2030. Its 2025 inference costs of $8.4B are projected to reach $14.1B in 2026, and it owes Microsoft 20% of all revenue through 2032. A custom inference chip, taped out mid-2026, is an acknowledgment that GPU-dependent economics are unsustainable at this scale.

Chart: Paths to cash-flow breakeven, Google (now), Anthropic (2028), OpenAI (2029–30)

The counterargument worth taking seriously

Sequoia Capital’s David Cahn calculated a $600B annual revenue gap between what AI companies need to earn to justify infrastructure spending and what they actually generate. Barclays estimated the industry needs 12,000 ChatGPT-sized products to justify current CapEx. Deloitte called 2026 “the year of Inference Famine.” These aren’t fringe views.

But cost curves have consistently surprised to the downside, faster and further than any analyst projected. If the 200× annual decline rate holds for three more years, per-token economics look entirely different. Usage-based pricing, in early testing at both OpenAI and Anthropic, could resolve the flat-rate cross-subsidy problem without requiring most users to stop showing up.
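For scale, a sustained 200× annual decline compounds brutally fast; held for three years, it would cut today’s $60/M reasoning-token price by a factor of eight million. A hypothetical extrapolation, not a forecast:

```python
# Hypothetical extrapolation: 200x/yr price decline sustained for 3 years.
rate, years, price_today = 200, 3, 60.0  # $/M tokens (o1-class, per the text)
print(price_today / rate ** years)       # $/M tokens after three years
```

At that price, a heavy user’s entire month of usage would cost less than a second of GPU time does today, and the flat-rate cross-subsidy problem simply evaporates.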

The unresolved question is whether platform dynamics matter more than unit economics. Google captured search before it had a clear ad model. If one lab captures the coordination layer for AI applications (agents, codegen pipelines, enterprise workflows), per-token margins become secondary. That’s the bet Anthropic is making with Claude Code. It’s the bet OpenAI is making with everything at once.

For now, the economics are what they are. Every frontier lab except Google loses money serving active users. The subscriptions work because most people don’t show up. When they do, the math changes. The pricing will too.
