Tensordyne’s 3nm Napier AI Chip Targets NVIDIA Blackwell And Rubin With Big Inference Claims
Tensordyne says its new Napier platform can deliver far higher token throughput than NVIDIA Blackwell while challenging Rubin class infrastructure, but the numbers still need real world validation.
Tensordyne has announced the successful tape out of its Napier AI processor, a 3nm inference chip designed to attack one of the biggest problems in modern AI datacenters: token generation cost. The company says Napier is now in production at TSMC and will sit at the center of the Tensordyne Napier TDN system. The platform is being built in partnership with Broadcom and HPE Juniper Networks, combining logarithmic AI math, HBM, SRAM, and a high speed scale up interconnect. According to Wccftech, the Napier chip features 138 billion transistors, 144GB of HBM3E, 256MB of SRAM, 2.1 PFLOPs of dense FP8 compute, and a 300W power rating.
Tensordyne claims its full rack system can deliver 13x more tokens per second and 17x more tokens per watt than an NVIDIA GB300 NVL72 Blackwell system. It also claims up to US$33 million more annual revenue per rack.
The core idea behind Napier is Tensordyne’s logarithmic math approach. Instead of relying heavily on large multiplication operations, the company says its TDN Math replaces those operations with simpler addition based computation.
That design is meant to improve performance per watt, reduce compute area, and free more silicon space for tensor engines, SRAM, HBM, and interconnect logic.
The TDN AIP processor combines HBM3E and SRAM to reduce idle compute cycles, while TDN Link provides an any to any scale up fabric with sub microsecond latency. Tensordyne says the interconnect offers 1TB per second of bandwidth and is designed to make a 72 chip pod behave like one tightly connected inference system.
"By optimising math, compute, memory, and networking from first principles, Napier delivers affordable inference without compromising on speed."
— Marc Bolitho, Tensordyne CEO
Tensordyne’s main system is the TDN72 pod, which uses 72 Napier chips. A full Napier rack combines 4 TDN72 pods for a total of 288 chips.
The rack is claimed to deliver 608 PFLOPs of FP8 compute, 74GB of SRAM, 42TB of HBM3E memory, and a 120kW power rating while remaining fully air cooled.
Tensordyne says the system can support multi trillion parameter models and deliver more than 1,000 tokens per second per user in a single rack configuration. The company claims that matching this throughput would require at least 9 NVIDIA Rubin plus Groq LPX racks.
That comparison is aggressive, but it also needs context. Tensordyne’s own product endnotes say its throughput results are based on internal simulations. That means these figures should not be treated the same as independent third party benchmark results yet.
Napier’s tape out is an important milestone, but rack scale AI hardware still has a long path between announcement and deployment.
As noted by ServeTheHome, Tensordyne is targeting beta programs in Q1 2027, with system shipments expected by the end of Q2 2027. By then, NVIDIA Rubin, AMD Instinct, hyperscaler internal silicon, Groq, Cerebras, and other inference platforms will also be moving forward.
NVIDIA has already positioned Rubin as its next major rack scale AI platform, claiming up to 10x lower inference token cost compared with Blackwell and stronger support for agentic AI, reasoning, and mixture of experts workloads.
That means Napier is not entering a static market. It is entering a race where every major player is already optimizing power, memory, networking, software, and rack level economics.
Napier is one of the more interesting AI chip announcements because Tensordyne is not only claiming a faster accelerator. It is trying to change the math behind inference.
That is what makes this story worth watching. The AI industry is running into power limits, memory limits, cooling limits, and infrastructure cost limits. A chip that can deliver more tokens inside a lower power envelope would be extremely valuable if the claims hold up.
Tensordyne’s numbers are bold, but they are still company claims based on internal simulations. The real test will come when customers run production models, measure accuracy, compare latency, evaluate software maturity, and check whether the platform can operate reliably at rack scale.
NVIDIA’s advantage is not just hardware. It is CUDA, networking, software libraries, developer trust, hyperscaler adoption, and a full rack scale ecosystem. Tensordyne will need more than impressive charts to compete with that.
Still, Napier points to where AI infrastructure is heading. The next battle is not only about who has the biggest chip. It is about who can generate the most useful tokens per watt, per rack, and per dollar.
Do you think new AI chip startups like Tensordyne can seriously challenge NVIDIA in inference, or is NVIDIA’s software and datacenter ecosystem still too strong?
