A single frontier-model prompt costs around $2; a standard HTTP request costs about $0.000002. That gap makes inference theft one of the highest-margin attacks on the internet right now. Vercel documented it happening to the AI chat endpoint on its own docs site on April 12, 2026: traffic to Claude Haiku 4.5 spiked to 1,300 requests per minute, a run rate of more than $10,000 per day, routed through residential proxies that made per-IP rate limits useless.
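The numbers above can be checked with a quick back-of-envelope calculation, sketched here in TypeScript. The cost figures, the 1,300 requests per minute, and the $10,000-per-day floor come from the text. The 60-requests-per-minute per-IP limit is a hypothetical value, chosen only to show how few proxy IPs it takes to keep every one of them under such a limit.

```typescript
// Figures quoted in the text.
const PROMPT_COST_USD = 2;
const HTTP_REQUEST_COST_USD = 0.000002;
const REQUESTS_PER_MINUTE = 1_300;
const DAILY_COST_FLOOR_USD = 10_000; // "a run rate of more than $10,000 per day"

interface Economics {
  promptToRequestCostRatio: number; // how many HTTP requests cost as much as one prompt
  requestsPerDay: number;           // the spike rate sustained for 24 hours
  minCostPerRequestUsd: number;     // average cost per request implied by the daily floor
  proxyIpsNeeded: number;           // IPs needed so no single IP exceeds the per-IP limit
}

function inferenceTheftEconomics(perIpLimitRpm: number): Economics {
  const requestsPerDay = REQUESTS_PER_MINUTE * 60 * 24;
  return {
    // Rounded because 0.000002 is not exactly representable in binary floating point.
    promptToRequestCostRatio: Math.round(PROMPT_COST_USD / HTTP_REQUEST_COST_USD),
    requestsPerDay,
    minCostPerRequestUsd: DAILY_COST_FLOOR_USD / requestsPerDay,
    // Spread the load evenly so every IP stays at or under the limit.
    proxyIpsNeeded: Math.ceil(REQUESTS_PER_MINUTE / perIpLimitRpm),
  };
}

// Hypothetical per-IP limit of 60 requests per minute.
const econ = inferenceTheftEconomics(60);
console.log(`cost ratio: ${econ.promptToRequestCostRatio}`);
console.log(`requests/day: ${econ.requestsPerDay}`);
console.log(`min cost/request: $${econ.minCostPerRequestUsd.toFixed(4)}`);
console.log(`proxy IPs needed: ${econ.proxyIpsNeeded}`);
```

The ratio comes out to one million, and the spike works out to 1,872,000 requests per day at an average of at least about half a cent each. That is far below the $2 frontier-prompt figure, because Haiku is Anthropic's smaller model tier. Under the hypothetical 60 rpm limit, 22 IPs are enough to stay under it. A residential proxy pool is typically far larger than that, so each IP's share of the traffic sits well below any plausible threshold.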
The attack architecture is worth understanding in detail. Tools like Chipotlai Max, a publicly available fork of a coding agent, wrap a victim's custom AI endpoint in an OpenAI-compatible adapter. This turns stolen inference into a drop-in backend for any standard SDK or coding agent. Building the adapter is a one-time engineering cost. Reselling access at 5 to 10 percent of list price, with zero marginal inference cost, is a profitable business. Auth walls and session-level rate limits fail here because the attacker authenticates once and amortizes that bypass cost across hundreds of thousands of subsequent calls. The check has to run per request, not per session.
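The amortization argument can be made concrete with a toy profit model, sketched in TypeScript. Only the 5 percent resale fraction and the "hundreds of thousands of calls" scale come from the text. The per-call list price, the attacker's cost to pass one check, and the names `Gate`, `checksRun`, and `attackerProfitUsd` are all hypothetical. The model assumes the attacker can pass any single check at a fixed cost, so the only difference between the two gates is how many times that cost is paid.

```typescript
type Gate = "per-session" | "per-request";

interface AttackScenario {
  calls: number;                 // inference calls the attacker resells
  listPricePerCallUsd: number;   // what the victim pays the model provider per call
  resaleFraction: number;        // share of list price the attacker charges buyers
  bypassCostPerCheckUsd: number; // attacker's cost to get past one bot/auth check
}

// How many times the defender's check actually runs for `calls` requests.
function checksRun(gate: Gate, calls: number): number {
  return gate === "per-session" ? 1 : calls;
}

// Attacker profit: resale revenue minus the cost of getting past the checks.
// Inference itself costs the attacker nothing, because the victim is billed for it.
function attackerProfitUsd(gate: Gate, s: AttackScenario): number {
  const revenue = s.calls * s.listPricePerCallUsd * s.resaleFraction;
  const bypassSpend = checksRun(gate, s.calls) * s.bypassCostPerCheckUsd;
  return revenue - bypassSpend;
}

const scenario: AttackScenario = {
  calls: 100_000,
  listPricePerCallUsd: 0.005,  // hypothetical
  resaleFraction: 0.05,        // low end of the 5 to 10 percent quoted above
  bypassCostPerCheckUsd: 0.01, // hypothetical
};

for (const gate of ["per-session", "per-request"] as const) {
  console.log(gate, attackerProfitUsd(gate, scenario).toFixed(2));
}
```

With these inputs, the session-gated endpoint yields roughly $25 of resale revenue against $0.01 of bypass spend. Per-request gating raises the bypass spend to $1,000 and makes the business unprofitable. The exact break-even point depends on the made-up prices. What matters is the structure: one payment per session versus one payment per call.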
Vercel's fix calls BotID with deep analysis (powered by Kasada) inside the route handler, before each AI request executes. BotID classifies requests with client-side machine learning and shows no visible challenge, which is what makes per-request gating practical. It blocked over 10,000 bot requests in the first minutes of the spike and brought traffic back to normal within 24 hours. The original article's full implementation details, including the required client-side route declaration and the next.config.ts wrapper, are the main reason to read it in full. The economics will not change: inference stays expensive, requests stay cheap, and the resale margin holds.
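Here is a minimal sketch of the per-request gate. It assumes a Next.js-style POST route handler built on the standard `Request`/`Response` types. Per Vercel's BotID documentation at the time of writing, the real server-side call is `checkBotId()` from `botid/server`, which resolves to an object with an `isBot` flag. So that the sketch runs without Next.js or a Vercel deployment, the classifier is injected as a parameter and stubbed with a hypothetical header check that bears no resemblance to how BotID classifies traffic. `makeChatHandler`, `runInference`, and the `x-demo-human` header are hypothetical names. The client-side route declaration and the next.config.ts wrapper are not shown.

```typescript
// Shape of the classifier's verdict, modeled on the `isBot` flag in BotID's
// server-side result. The classifier itself is injected (and stubbed below).
interface BotVerdict {
  isBot: boolean;
}

type Classifier = (req: Request) => Promise<BotVerdict>;
type InferenceCall = (prompt: string) => Promise<string>;

function jsonResponse(body: unknown, status = 200): Response {
  return new Response(JSON.stringify(body), {
    status,
    headers: { "content-type": "application/json" },
  });
}

// Hypothetical factory for a route-handler-style POST function. The bot check
// runs on every request, before the body is parsed or the model is called.
function makeChatHandler(classify: Classifier, runInference: InferenceCall) {
  return async function POST(req: Request): Promise<Response> {
    const verdict = await classify(req);
    if (verdict.isBot) {
      // Rejected before any tokens are billed.
      return jsonResponse({ error: "Access denied" }, 403);
    }
    const { prompt } = (await req.json()) as { prompt: string };
    return jsonResponse({ reply: await runInference(prompt) });
  };
}

// --- Usage with hypothetical stubs (NOT how BotID classifies traffic) ---
let modelCalls = 0;
const POST = makeChatHandler(
  async (req) => ({ isBot: req.headers.get("x-demo-human") !== "yes" }),
  async (prompt) => {
    modelCalls++; // stands in for a paid inference call
    return `echo: ${prompt}`;
  },
);
```

The ordering is the design choice that matters. The check sits before `req.json()` and before the model call, so a blocked bot costs the defender one cheap HTTP round trip instead of a billed inference call. And because the check runs inside the handler rather than once per session, passing it on one request gives the caller nothing for the next one.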