A single frontier model prompt costs around $2. A standard HTTP request costs a fraction of a cent. That gap is why inference theft exists and why it is, as Vercel puts it, one of the highest-margin businesses an attacker can run. On April 12, 2026, Vercel's own docs AI chat endpoint hit 1,300 requests per minute at peak, driven by residential proxy traffic across two days. The projected cost: over $10,000 per day on Anthropic's Claude Haiku 4.5 alone.

Attackers build OpenAI- or Anthropic-compatible adapters on top of victim endpoints, a one-time engineering cost that lets stolen inference drop into any standard SDK or coding agent. The project Chipotlai Max is a live example: a forked coding agent that wraps Chipotle's customer-support chatbot as an OpenAI-compatible endpoint, with the repo actively soliciting ports to Home Depot, Lowe's, Target, and Starbucks. IP rate limits and auth walls fail here because the attacker's users authenticate to the attacker's adapter, not to your API. By the time a call reaches your endpoint, it has already cleared the boundary you built.
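One way to see why per-IP limits struggle against traffic spread across residential proxies is a small simulation. The sketch below is illustrative only: the PerIpLimiter class, the limit of 20 requests per window, and the pool of 100 addresses are all assumptions made for the example, not details from Vercel's post. Only the 1,300 figure echoes the peak rate quoted above.

```typescript
// Hypothetical fixed-window limiter keyed by client IP.
// This is a teaching model, not any vendor's implementation.
class PerIpLimiter {
  private counts = new Map<string, number>();
  private limitPerWindow: number;

  constructor(limitPerWindow: number) {
    this.limitPerWindow = limitPerWindow;
  }

  // Returns true if this IP is still under its limit for the current window.
  allow(ip: string): boolean {
    const seen = this.counts.get(ip) ?? 0;
    if (seen >= this.limitPerWindow) return false;
    this.counts.set(ip, seen + 1);
    return true;
  }
}

// Sends `total` requests round-robin across `ipCount` addresses within one
// window and returns how many the limiter lets through.
function allowedRequests(total: number, ipCount: number, limitPerWindow: number): number {
  const limiter = new PerIpLimiter(limitPerWindow);
  let allowed = 0;
  for (let i = 0; i < total; i++) {
    // 198.51.100.0/24 is a documentation range; the addresses are placeholders.
    const ip = `198.51.100.${i % ipCount}`;
    if (limiter.allow(ip)) allowed++;
  }
  return allowed;
}

const singleIp = allowedRequests(1300, 1, 20);   // 20: one address is throttled quickly
const rotating = allowedRequests(1300, 100, 20); // 1300: each address sends only 13 requests
console.log({ singleIp, rotating });
```

Under these assumed numbers the limiter blocks almost everything from a single address and nothing from the rotating pool, because no individual address ever looks abusive.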

Vercel's fix is per-request verification using their own BotID with deep analysis, powered by Kasada, running inside the route handler before any AI call is made. The piece walks through the specific implementation: checkBotId() called server-side, the client-side declaration required for challenge headers, and the next.config.ts wrapper. The key architectural argument is that any gate running at session start amortizes the attacker's bypass cost across every subsequent call. A per-request gate removes that amortization: every call requires its own bypass, so the ratio of bypasses to calls is one. The full post covers the endpoint risk taxonomy, the adapter mechanics, and the complete BotID setup code.
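The amortization argument can be made concrete with a toy model. This is a minimal sketch, not the BotID setup from the post: the Verifier type, the two handler factories, and the counting verifier are hypothetical names, and the synchronous verifier is only a stand-in for the server-side checkBotId() call the post describes. The verifier always answers "not a bot", which models an attacker who has already learned how to defeat the check, so each invocation counts as one bypass the attacker had to perform.

```typescript
// Hypothetical stand-in for a bot check; each invocation is one bypass
// the attacker must pay for.
type Verdict = { isBot: boolean };
type Verifier = () => Verdict;
type Handler = (prompt: string) => number; // returns an HTTP-style status code

// Builds a verifier that always passes and records how often it was consulted.
function makeCountingVerifier(): { verify: Verifier; checks: () => number } {
  let count = 0;
  return {
    verify: () => {
      count++;
      return { isBot: false };
    },
    checks: () => count,
  };
}

// Gate at session start: verify once, then trust every later call.
function sessionGated(verify: Verifier): Handler {
  let verified = false;
  return (_prompt) => {
    if (!verified) {
      if (verify().isBot) return 403;
      verified = true;
    }
    return 200; // the model call would happen here
  };
}

// Gate per request: verify before every model call.
function perRequestGated(verify: Verifier): Handler {
  return (_prompt) => (verify().isBot ? 403 : 200);
}

// Runs `calls` prompts through a handler and reports bypasses per call.
function bypassRatio(makeHandler: (v: Verifier) => Handler, calls: number): number {
  const counter = makeCountingVerifier();
  const handler = makeHandler(counter.verify);
  for (let i = 0; i < calls; i++) handler(`prompt ${i}`);
  return counter.checks() / calls;
}

const sessionRatio = bypassRatio(sessionGated, 1000);       // 0.001
const perRequestRatio = bypassRatio(perRequestGated, 1000); // 1
console.log({ sessionRatio, perRequestRatio });
```

In this model the session gate lets one bypass pay for a thousand calls, while the per-request gate charges one bypass per call, which is the ratio the post argues for.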
