OpenAI's and Anthropic's inference costs are consuming half their revenue

Both companies are projecting profitability to investors while obscuring a structural problem: the computational cost of running their models after training — inference — now exceeds 50% of revenue, compressing margins to unsustainable levels at scale. This explains the simultaneous push for inference optimization, longer context windows to reduce repeat queries, and cache-heavy architectures: not product features, but operational necessity. The gap between board presentations and the physics of their cost structure points to one of two outcomes within 18–24 months: a dramatic breakthrough in inference efficiency, or a reset in pricing expectations.
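A back-of-the-envelope sketch of the margin arithmetic makes the squeeze concrete. All figures below are hypothetical assumptions for illustration, not disclosed numbers from either company:

```python
# Illustrative unit economics for a frontier-model provider.
# Every number here is a hypothetical assumption, not company data.

def gross_margin(revenue: float, inference_cost: float) -> float:
    """Gross margin after inference (serving) costs, as a fraction of revenue."""
    return (revenue - inference_cost) / revenue

# Hypothetical annual figures, in billions of dollars.
revenue = 4.0
inference_cost = 2.2  # >50% of revenue, per the article's claim

print(f"Inference cost share: {inference_cost / revenue:.0%}")          # 55%
print(f"Margin left after inference: {gross_margin(revenue, inference_cost):.0%}")  # 45%

# Training runs, research, and staff still have to come out of that
# remaining margin, which is why a >50% inference cost share is
# structurally hard to sustain at current prices.
```

Under these assumptions, less than half of each revenue dollar survives serving costs, before a single dollar of training compute or payroll is counted.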