Source: Marcus on AI
The AI industry's obsession with scaling token counts has hit diminishing returns. Builders are rethinking model architecture and reasoning capabilities instead of simply adding more data and compute. Consumer products that rely on token bloat alone struggle noticeably on hard reasoning tasks. Serious applications such as reasoning, code, and domain expertise require different approaches than content generation does. Casual users are indifferent to raw parameter size. The winners will optimize for latency, cost-efficiency, and task performance rather than chase headline model sizes. This is a reset after three years of "bigger is always better."