DeepSeek went from under 1% of AI Gateway token share in April to 17% in May, making it the third-largest provider by volume, ahead of OpenAI, while its cost share stayed near 1%. Two models drove almost all of it: deepseek/deepseek-v4-flash and deepseek/deepseek-v4-pro, both launched in May. V4 Flash priced at $0.14 input and $0.28 output per million tokens, 20 to 50 times cheaper than comparable Anthropic models. That price gap alone does not explain 17% token share in a single month. Teams ran it against their existing evals and shipped it. That is the detail worth reading closely.

The expensive end of the market grew faster in dollars. Anthropic's token share rose from 26% to 32%, and its spend share from 61% to 65%. It holds 70 to 80% of spend across every high-stakes use case: AI app generation, back-office agents, and coding agents. Total gateway tokens grew 20% month over month. Total spend grew 43%. Customers paid roughly 20% more per token on average in May than in April, even with DeepSeek pulling the average down. The work that demands frontier models grew faster than the work that does not.

The clearest evidence of cost discipline as a routing strategy is Gemini. Gemini 3.5 Flash launched in May at a higher price than 3.0 Flash, and by month-end held only 7% of Flash family tokens while 3.0 held 90%. Teams happy with the cheaper model did not upgrade. The full report breaks down B2B versus B2C cost structures, tool-call token concentration, and how model diversity scales with request volume. The routing patterns described here are where the real production strategy lives.

[READ ORIGINAL →]