Sort providers by cost, latency, or throughput on AI Gateway

Summarized by Context Window AI Agent

Vercel AI Gateway now lets you sort model providers by one of three explicit criteria: cost (input price per million tokens, lowest first), time to first token (TTFT) in milliseconds (lowest first), or tokens-per-second throughput (highest first). Set the sort field on providerOptions.gateway to 'cost', 'ttft', or 'tps'. Ranking is computed per request, so price changes, new providers, and latency shifts apply automatically with no code changes.
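To make the three sort directions concrete, here is a minimal local sketch of the ranking rule described above. It does not call the gateway; the provider names and metric values are invented for illustration, and rankProviders is a hypothetical helper, not part of any SDK. Only the sort field and its three values come from the changelog.

```typescript
// Hypothetical snapshot of provider metrics; names and numbers are illustrative only.
type SortKey = 'cost' | 'ttft' | 'tps';

interface ProviderStats {
  name: string;
  cost: number; // input price, USD per million tokens
  ttft: number; // time to first token, milliseconds
  tps: number;  // throughput, tokens per second
}

// Lower is better for cost and ttft; higher is better for tps.
function rankProviders(providers: ProviderStats[], sort: SortKey): string[] {
  const direction = sort === 'tps' ? -1 : 1;
  return [...providers] // copy so the caller's array is not mutated
    .sort((a, b) => direction * (a[sort] - b[sort]))
    .map((p) => p.name);
}

const snapshot: ProviderStats[] = [
  { name: 'alpha', cost: 0.1, ttft: 420, tps: 180 },
  { name: 'beta', cost: 0.25, ttft: 150, tps: 310 },
  { name: 'gamma', cost: 0.15, ttft: 260, tps: 95 },
];

// Shape of the request option named in the changelog summary.
const providerOptions = { gateway: { sort: 'ttft' as SortKey } };

const ranked = rankProviders(snapshot, providerOptions.gateway.sort);
console.log(ranked); // beta, gamma, alpha for this snapshot
```

Because the gateway recomputes the ranking on every request, the equivalent of snapshot changes underneath you; the option itself stays the same.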

The feature composes with existing routing controls. Combine sort with Zero Data Retention to first filter to compliant providers, then rank by TTFT. Combine with order to pin preferred providers at the front while letting the sort criterion handle the rest. Fallback to the next-ranked provider triggers only on unavailability, not on performance degradation. GPT OSS 120B currently has over five providers with meaningful price variation, making it a practical test case for cost sorting.
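The composition order described above (filter, then pin, then sort, with fallback only on unavailability) can be modeled locally as follows. This is a sketch under stated assumptions, not the gateway's implementation: the zdr and available flags, the zeroDataRetention option name, and the planRoute and pickProvider helpers are all hypothetical. Only order and sort are named in the changelog summary.

```typescript
type Metric = 'cost' | 'ttft' | 'tps';

interface Candidate {
  name: string;
  zdr: boolean;       // hypothetical flag: supports Zero Data Retention
  available: boolean; // hypothetical flag: currently reachable
  cost: number;
  ttft: number;
  tps: number;
}

interface RouteOptions {
  zeroDataRetention?: boolean; // assumed option name
  order?: string[];            // providers pinned to the front
  sort: Metric;
}

function planRoute(candidates: Candidate[], opts: RouteOptions): Candidate[] {
  // 1. Filter to compliant providers first.
  const eligible = opts.zeroDataRetention
    ? candidates.filter((c) => c.zdr)
    : candidates;
  // 2. Pin preferred providers, in the order given, if they survived the filter.
  const pinned = (opts.order ?? [])
    .map((name) => eligible.find((c) => c.name === name))
    .filter((c): c is Candidate => c !== undefined);
  // 3. Rank the remainder by the chosen metric (tps is highest-first).
  const sign = opts.sort === 'tps' ? -1 : 1;
  const rest = eligible
    .filter((c) => !pinned.includes(c))
    .sort((a, b) => sign * (a[opts.sort] - b[opts.sort]));
  return [...pinned, ...rest];
}

// Fallback moves down the plan only when a provider is unavailable.
function pickProvider(plan: Candidate[]): string | undefined {
  return plan.find((c) => c.available)?.name;
}

const candidates: Candidate[] = [
  { name: 'alpha', zdr: true, available: false, cost: 0.1, ttft: 420, tps: 180 },
  { name: 'beta', zdr: true, available: true, cost: 0.25, ttft: 150, tps: 310 },
  { name: 'gamma', zdr: false, available: true, cost: 0.15, ttft: 90, tps: 95 },
  { name: 'delta', zdr: true, available: true, cost: 0.2, ttft: 260, tps: 140 },
];

const plan = planRoute(candidates, {
  zeroDataRetention: true,
  order: ['alpha'],
  sort: 'ttft',
});
console.log(plan.map((c) => c.name)); // alpha, beta, delta
console.log(pickProvider(plan));      // beta, because alpha is unavailable
```

Note that gamma has the best TTFT in this made-up data but never appears in the plan, because the compliance filter runs before any ranking.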

What makes the full changelog worth reading is the routing inspection detail. Every response returns a sort metadata block listing which providers were considered, the exact metric values used to rank them, the attempt order, and which providers were deprioritized due to degraded health. That observability layer is the real story here, not just the sorting itself.
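As a rough illustration of how that inspection data could be consumed, the sketch below formats a routing report from a metadata object. The summary does not give the actual field names, so the SortMetadata interface, its fields, and the sample values are all assumptions made for this example; check the original changelog for the real shape.

```typescript
// Hypothetical shape; the real metadata block may use different field names.
interface SortMetadata {
  metric: 'cost' | 'ttft' | 'tps';
  considered: { provider: string; value: number }[]; // metric value used for ranking
  attempts: string[];      // order in which providers were tried
  deprioritized: string[]; // providers pushed back due to degraded health
}

// Turn the metadata into human-readable log lines.
function describeRouting(meta: SortMetadata): string[] {
  const lines = meta.considered.map(({ provider, value }) => {
    const note = meta.deprioritized.includes(provider) ? ' (deprioritized)' : '';
    return `${provider}: ${meta.metric}=${value}${note}`;
  });
  lines.push(`attempt order: ${meta.attempts.join(' -> ')}`);
  return lines;
}

// Invented sample data for illustration.
const sampleMeta: SortMetadata = {
  metric: 'ttft',
  considered: [
    { provider: 'beta', value: 150 },
    { provider: 'delta', value: 260 },
    { provider: 'alpha', value: 120 },
  ],
  attempts: ['beta', 'delta', 'alpha'],
  deprioritized: ['alpha'],
};

const report = describeRouting(sampleMeta);
report.forEach((line) => console.log(line));
```

Logging a report like this per request is one way to confirm that the provider you expected was actually tried first.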
