Nemotron 3 Ultra now available on AI Gateway

Summarized by Context Window AI Agent

NVIDIA's Nemotron 3 Ultra, a 550-billion-parameter Mixture-of-Experts reasoning model with a 1-million-token context window, is now available through Vercel AI Gateway. In the AI SDK, it is selected with the model ID nvidia/nemotron-3-ultra-550b-a55b.
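To make the model ID concrete, here is a minimal TypeScript sketch that builds, but does not send, a chat request carrying that ID. Only the model ID comes from the announcement. The endpoint URL (an OpenAI-compatible route on AI Gateway), the AI_GATEWAY_API_KEY variable name, and the buildChatRequest helper are assumptions made for illustration and should be checked against the Vercel documentation before use.

```typescript
// Model ID exactly as given in the announcement.
const MODEL_ID = "nvidia/nemotron-3-ultra-550b-a55b";

// ASSUMPTION: OpenAI-compatible chat endpoint for AI Gateway.
// Confirm the URL against the Vercel AI Gateway docs.
const GATEWAY_URL = "https://ai-gateway.vercel.sh/v1/chat/completions";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface GatewayRequest {
  url: string;
  init: {
    method: string;
    headers: Record<string, string>;
    body: string;
  };
}

// Hypothetical helper: assembles the request without performing any I/O,
// so the payload can be inspected or logged before it is sent.
function buildChatRequest(apiKey: string, messages: ChatMessage[]): GatewayRequest {
  return {
    url: GATEWAY_URL,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model: MODEL_ID, messages }),
    },
  };
}

// Usage (performs a network call, so it is left commented out):
// const req = buildChatRequest(process.env.AI_GATEWAY_API_KEY ?? "", [
//   { role: "user", content: "Plan the migration steps." },
// ]);
// const res = await fetch(req.url, req.init);
// console.log(await res.json());
```

Keeping request construction separate from the network call makes it easy to swap the model ID when comparing against whatever model you run today.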

The model is built for long-running agentic workflows: multi-turn planning, tool use, sub-agent delegation, and error recovery. According to the announcement, it delivers up to 350 tokens per second and cuts costs on agentic tasks by up to 30%. Both figures are upper bounds, and this summary does not state the baseline for the cost comparison, so treat them as targets to benchmark against whatever you are currently running.
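One way to hold those figures against an existing setup is to turn them into best-case bounds. The sketch below is illustrative arithmetic only: the two constants come from the figures quoted above, while the function names and any workload numbers you pass in are hypothetical. Because both claims are "up to" values, the results are limits, not predictions.

```typescript
// Figures quoted in the announcement; both are stated as upper bounds.
const MAX_TOKENS_PER_SECOND = 350;
const MAX_COST_REDUCTION = 0.3;

// Shortest possible time to generate `outputTokens` at the quoted peak rate.
function minGenerationSeconds(outputTokens: number): number {
  return outputTokens / MAX_TOKENS_PER_SECOND;
}

// Lowest possible cost if the full quoted reduction applied to your baseline.
// ASSUMPTION: the reduction is measured against your own current spend,
// which the summary does not confirm.
function bestCaseCost(baselineCost: number): number {
  return baselineCost * (1 - MAX_COST_REDUCTION);
}

// Example with hypothetical workload numbers:
// minGenerationSeconds(7000) gives the floor on generation time in seconds.
// bestCaseCost(100) gives the floor on cost for a baseline spend of 100.
```

If your measured throughput or cost already beats these bounds, the headline numbers alone are not a reason to switch.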

Vercel AI Gateway routes requests with no markup on provider pricing and no platform fee, including on Bring Your Own Key calls. The full piece also covers Zero Data Retention support, dynamic provider sorting by latency and cost, and custom reporting, all of which matter if you are building at production scale.
