A local Qwen 35B model beat Claude Opus 4.5 at building a Stripe Tempo blockchain payment app: 2 minutes versus 6 minutes 24 seconds. Opus 4.5 scores roughly 20% higher on benchmarks and is likely 50 times larger. It lost anyway. When Claude itself scored both outputs, the local model's app earned 6.5 to Opus's 4.5.
The speed gap created a compounding advantage. While Claude was still processing its first planning step at 55 seconds, Qwen had already completed that step in 20.9 seconds and run a full self-critique cycle in another 16.5. Three times faster responses meant one extra revision loop fit inside the same wall-clock window. The tortoise ran an additional lap while the hare was thinking.
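The arithmetic behind that extra lap is worth making concrete. A minimal sketch, using the timings reported in the piece (the loop structure itself is an assumption for illustration, not the author's actual harness): count how many full step-plus-critique cycles fit inside a fixed wall-clock budget.

```python
def loops_within(budget_s: float, step_s: float, critique_s: float) -> int:
    """Count full step + self-critique cycles that complete within budget_s."""
    elapsed = 0.0
    loops = 0
    while elapsed + step_s + critique_s <= budget_s:
        elapsed += step_s + critique_s
        loops += 1
    return loops

# Qwen: planning step 20.9 s + self-critique 16.5 s = 37.4 s per cycle.
# While Opus was still on its first planning step at the 55 s mark:
print(loops_within(55.0, 20.9, 16.5))   # → 1 full extra cycle fits

# Over Opus's whole 6 min 24 s (384 s) run, Qwen's cadence allows:
print(loops_within(384.0, 20.9, 16.5))  # → 10 cycles
```

The design point is that the advantage is multiplicative over the session, not a one-off head start: every cycle the faster model completes is another chance to catch its own mistakes before the clock runs out.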
The original piece is worth reading for the step-by-step timing table across five discrete tasks and for the argument about when raw intelligence stops being the bottleneck. The author draws a clean line between agentic coding workflows, where slower and more careful may win, and everyday tasks, where tighter feedback loops drive better outcomes. The data is specific, the test is reproducible, and the conclusion challenges a default assumption most teams are still acting on.
[READ ORIGINAL →]