NVIDIA CEO Jensen Huang Says the Company Produces the World’s Lowest Cost Tokens as Full Stack AI Strategy Takes Center Stage
NVIDIA is once again making clear that its AI strategy is about more than selling powerful chips. Speaking at CadenceLIVE 2026, NVIDIA CEO Jensen Huang argued that the company leads on token economics, saying NVIDIA produces "the lowest cost tokens in the world." He also emphasized that the future of AI belongs to full stack companies that control hardware, software, systems, and application level optimization together. NVIDIA's own event coverage and Cadence's announcement of the expanded partnership both reinforce that this message was central to the discussion of AI infrastructure and engineering workflows.
At #CadenceLIVE, our CEO Jensen Huang joined @Cadence CEO @OfficialADevgan to discuss one of the biggest challenges in AI infrastructure: tokens per watt. Hear how the NVIDIA-Cadence partnership is helping produce the lowest cost tokens in the world.
— NVIDIA AI (@NVIDIAAI) April 21, 2026
That statement goes to the heart of how NVIDIA wants the market to think about AI performance. In the current AI era, raw hardware power alone is no longer the only metric that matters. Token generation speed is important, but Huang’s argument is that real leadership comes from how efficiently a company can generate those tokens in terms of cost and power. NVIDIA’s social post from Cadence Live specifically framed the discussion around “tokens per watt,” while Huang has separately highlighted token cost as one of the company’s biggest advantages, crediting what he calls extreme codesign between software and silicon.
This is where NVIDIA’s full stack position becomes central. At CadenceLIVE, Huang said the future of the world is going to be full stack, meaning companies need to understand the software stack, the systems it runs on, and the applications that sit above it. Cadence’s official release around the event also highlighted how its partnership with NVIDIA spans agentic AI, digital twins, physics based simulation, and AI factory infrastructure, showing that NVIDIA continues to push the idea that AI leadership comes from controlling the entire pipeline rather than competing at the chip layer alone.
That message also lines up with how NVIDIA has been presenting its roadmap more broadly in 2026. During GTC 2026, NVIDIA said its token cost is the best in the world thanks to “extreme codesign,” and Huang described Vera Rubin not as a single chip launch but as a vertically integrated full stack computing platform optimized end to end as one giant system. That framing matters because it shows NVIDIA is trying to move the discussion away from sticker price and toward overall output, efficiency, and total operating economics.
In practical terms, Huang’s point is straightforward even if the hardware itself is extremely expensive. Systems based on Blackwell and upcoming Rubin platforms may cost millions of dollars, but NVIDIA’s position is that these machines generate such massive throughput, with such tight software and system level optimization, that the cost per token ends up being lower than competing approaches. NVIDIA has already made similar claims around Rubin, saying the platform is designed to deliver much stronger inference performance and substantially lower cost per token versus the previous generation.
This is also why token cost is becoming a more important AI infrastructure metric than headline throughput alone. A system can push large numbers by brute force, but that does not necessarily make it efficient or commercially attractive. NVIDIA’s argument is that what really matters is how many useful tokens a platform can produce for each dollar spent and each watt consumed. The company is increasingly tying its competitive story to those economics, especially as inference and agentic AI workloads become a much larger part of the market.
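The two metrics in play here are simple to state. A rough sketch of the arithmetic is below; all figures are illustrative assumptions for the sake of the example, not NVIDIA numbers, and real deployments would also account for networking, cooling, utilization, and software licensing.

```python
def cost_per_token(capex_usd, lifetime_years, power_kw, usd_per_kwh,
                   tokens_per_second):
    """Amortized cost (USD) of producing one token: hardware price plus
    lifetime energy cost, divided by total tokens generated."""
    seconds = lifetime_years * 365 * 24 * 3600
    energy_cost = power_kw * (seconds / 3600) * usd_per_kwh  # lifetime kWh * rate
    total_tokens = tokens_per_second * seconds
    return (capex_usd + energy_cost) / total_tokens

def tokens_per_watt(tokens_per_second, power_kw):
    """Sustained token throughput per watt of power draw."""
    return tokens_per_second / (power_kw * 1000)

# Hypothetical comparison: an expensive, tightly optimized system (A)
# versus a cheaper, slower one (B). Throughput advantage can outweigh
# a ~3x higher sticker price on both metrics.
a_cost = cost_per_token(3_000_000, 4, 120, 0.08, 1_000_000)
b_cost = cost_per_token(1_000_000, 4, 60, 0.08, 200_000)
print(f"System A: ${a_cost:.2e}/token, {tokens_per_watt(1_000_000, 120):.1f} tok/W")
print(f"System B: ${b_cost:.2e}/token, {tokens_per_watt(200_000, 60):.1f} tok/W")
```

With these assumed numbers, system A comes out ahead on both cost per token and tokens per watt despite the higher upfront price, which is the shape of the argument Huang is making.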
The timing of this messaging is not accidental. Agentic AI is quickly becoming the next major battleground, and NVIDIA knows that future competition will not only come from chip makers but also from hyperscalers, custom silicon efforts, and alternative software ecosystems. By stressing full stack integration, token cost, and token per watt efficiency, Huang is effectively arguing that NVIDIA’s advantage is not just its GPU architecture but the entire surrounding platform built over years through CUDA, libraries, networking, system design, and deployment tooling. Cadence’s language around joint workflows, agentic AI, and digital twin driven engineering further supports that broader platform story.
For the AI market, this is an important shift in language. The conversation is moving beyond who has the fastest chip and toward who can deliver the most economically efficient intelligence at scale. NVIDIA wants that conversation to revolve around token cost, token per watt, and full stack integration, because those are the areas where it believes its software and infrastructure lead turns premium hardware pricing into a long term advantage.
Do you think NVIDIA’s full stack approach will keep it ahead in the AI race, or will rivals eventually close the gap with cheaper alternatives and custom platforms?
