Nvidia's Nemotron 3 Ultra, a 550-billion-parameter Mixture-of-Experts reasoning model with a 1 million token context window, is now accessible through Vercel AI Gateway using the model ID nvidia/nemotron-3-ultra-550b-a55b in the AI SDK.
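As a rough illustration of what "using the model ID in the AI SDK" might look like, here is a minimal sketch. Only the model ID comes from the announcement. The helper `buildNemotronRequest`, the `TextRequest` type, and the prompt are hypothetical. The commented-out call assumes a recent AI SDK version in which `generateText` accepts a plain string model ID and routes it through AI Gateway, with gateway credentials already configured in the environment.

```typescript
// Model ID exactly as given in the announcement.
const NEMOTRON_MODEL_ID = "nvidia/nemotron-3-ultra-550b-a55b";

// Hypothetical minimal request shape: just the two fields this sketch uses.
interface TextRequest {
  model: string;
  prompt: string;
}

// Hypothetical helper that pairs a prompt with the Nemotron model ID.
function buildNemotronRequest(prompt: string): TextRequest {
  return { model: NEMOTRON_MODEL_ID, prompt };
}

const request = buildNemotronRequest("Plan the migration in three steps.");

// Assumed usage with the AI SDK (not executed in this sketch):
//   import { generateText } from "ai";
//   const { text } = await generateText(request);
```

Keeping the model ID in one constant makes it easy to swap models later without touching call sites.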
The model is built specifically for long-running agentic workflows: multi-turn planning, tool use, sub-agent delegation, and error recovery. It delivers up to 350 tokens per second and cuts costs on agentic tasks by up to 30%. Both figures are stated as upper bounds, so they are worth measuring against whatever you are currently running.
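To put the two headline figures in perspective, the arithmetic can be sketched as below. The 350 tokens per second and 30% values are the "up to" numbers quoted above, so the results are best-case bounds. The function names and the sample inputs (a 7,000-token response, a baseline spend of 100 units) are made up for illustration.

```typescript
// "Up to" figures quoted in the announcement; real workloads may see less.
const PEAK_TOKENS_PER_SECOND = 350;
const MAX_COST_REDUCTION = 0.3;

// Lower bound on generation time for a response of the given length.
function minGenerationSeconds(outputTokens: number): number {
  return outputTokens / PEAK_TOKENS_PER_SECOND;
}

// Best-case cost after the maximum claimed reduction is applied.
function bestCaseCost(baselineCost: number): number {
  return baselineCost * (1 - MAX_COST_REDUCTION);
}

// Hypothetical inputs: a 7,000-token response and a baseline spend of 100 units.
const seconds = minGenerationSeconds(7000);
const cost = bestCaseCost(100);
console.log(`at least ${seconds.toFixed(1)} s, at best ${cost.toFixed(2)} units`);
```

Running the same calculation with your own token counts and current spend gives a quick ceiling on what the switch could deliver.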
Vercel AI Gateway routes requests with no markup on provider pricing and no platform fee, including on Bring Your Own Key (BYOK) calls. The full piece covers Zero Data Retention support, dynamic provider sorting by latency and cost, and custom reporting, all of which matter if you are building anything at production scale.