Etched Emerges From Stealth With Working TSMC N4P Chip and $1 Billion in AI Contracts
AI hardware startup Etched has emerged from stealth with working A0 silicon, its first rack-scale inference systems, $800 million in funding, and more than $1 billion in signed customer contracts. The company is developing a vertically integrated alternative to conventional GPU infrastructure by designing its processors, memory architecture, networking, cooling, software, circuit boards, and manufacturing processes as one coordinated platform.
According to the official Etched announcement, the company’s first silicon came back earlier in 2026 after being fabricated on TSMC’s N4P process. Etched says the A0 design worked on its first pass and is now being used to validate its initial rack-scale product with customers ahead of shipments planned for summer 2026.
Etched (@Etched) announced the launch in a post on June 30, 2026:
“We're coming out of stealth. We've built our first racks after a successful A0 tapeout, $1B+ in customer contracts, and $800m raised. Early customer tests show us achieving SOTA throughput, latency, and power efficiency on inference workloads. Our first racks ship this summer.”
Etched was founded in 2022 and has assembled a team of more than 400 engineers with experience from NVIDIA, Google’s Tensor Processing Unit division, Broadcom, SK Hynix, TSMC, Intel, and major quantitative trading firms. Its leadership team includes former engineers responsible for NVIDIA HGX and DGX systems, Google TPU software, NVIDIA H100 architecture, Broadcom custom silicon, and large consumer electronics production programs.
The company has raised $800 million across four previously undisclosed financing rounds. Its latest round reportedly brought in $500 million at a $5 billion valuation in December 2025. Investors include VentureTech Alliance, Peter Thiel, Jane Street, Hudson River Trading, Jump Trading, Two Sigma, Stripes, Ribbit Capital, Radical Ventures, Primary Venture Partners, and Positive Sum.
Etched is targeting what it calls frontier inference clusters: systems designed specifically to run extremely large mixture-of-experts models, long-context workloads, autonomous agents, and other applications that require sustained inference across many processors. The company is optimizing its infrastructure for both prefill, where the system processes a user’s prompt and context, and decode, where the model generates output tokens one at a time.
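The difference between the two phases can be illustrated with a toy sketch. It is not Etched's software or any real inference engine; the function names and the list standing in for a key/value cache are assumptions made purely for illustration. It only counts model passes and cache reads to show why prefill is a single parallel step while decode is a sequential loop that rereads a growing cache.

```python
# Toy illustration only: counts passes and cache reads, performs no real inference.
# All names here are hypothetical and not taken from any Etched software.

def prefill(prompt_tokens):
    """Process the whole prompt in one parallel pass and build a cache."""
    kv_cache = list(prompt_tokens)  # stand-in for cached attention state
    passes = 1                      # all prompt tokens are handled together
    return kv_cache, passes


def decode(kv_cache, new_tokens):
    """Generate tokens one at a time; every step reads the entire cache."""
    passes = 0
    cache_reads = 0
    for i in range(new_tokens):
        cache_reads += len(kv_cache)   # each step attends over all prior tokens
        kv_cache.append(f"<gen{i}>")   # the new token joins the cache
        passes += 1
    return passes, cache_reads


prompt = ["The", "quick", "brown", "fox"]
cache, prefill_passes = prefill(prompt)
decode_passes, cache_reads = decode(cache, new_tokens=3)
print(prefill_passes, decode_passes, cache_reads)  # 1 3 15
```

In this simplified picture, prefill work scales with prompt length but happens in one pass, whereas decode cost is dominated by repeated memory reads, which is why the two phases stress hardware differently.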
The first major technology revealed by Etched is Low Voltage Inference. The company argues that current AI processors can struggle to maintain their theoretical peak compute performance because rising power consumption and temperature force clocks to decrease during sustained workloads.
Etched says its architecture can operate its mathematical processing blocks at less than half the voltage used by most existing AI accelerators. This requires specialized circuit techniques, splittable compute arrays, power delivery systems, voltage regulation, scheduling software, packaging, and liquid cooling designed together from the transistor level to the full rack.
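Why voltage matters so much can be seen from the standard first-order model of CMOS switching power, in which dynamic power scales with capacitance, frequency, and the square of the supply voltage. The sketch below applies that textbook relation to hypothetical numbers; the 0.8 V and 0.4 V figures are illustrative assumptions, not Etched specifications, and the model ignores leakage power and the fact that lower voltage usually limits achievable clock speed.

```python
# First-order CMOS dynamic power model: P = activity * C * V^2 * f.
# The voltages below are hypothetical examples, not figures published by Etched.

def dynamic_power(capacitance, voltage, frequency, activity=1.0):
    """Return switching power in arbitrary units for the given operating point."""
    return activity * capacitance * voltage ** 2 * frequency


baseline = dynamic_power(capacitance=1.0, voltage=0.8, frequency=1.0)
low_voltage = dynamic_power(capacitance=1.0, voltage=0.4, frequency=1.0)

# Halving the voltage at the same frequency cuts switching power to about a quarter.
ratio = low_voltage / baseline
print(f"{ratio:.2f}")  # 0.25
```

Under these assumptions, the quadratic term is what makes low-voltage operation attractive, and it also explains why the power delivery, regulation, and cooling have to be designed around the lower operating point.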
The company claims this approach allows its processor to run trillion-parameter sparse mixture-of-experts models at more than 80% of peak floating-point performance without thermal throttling. Etched has not yet published detailed specifications, model configurations, power measurements, pricing, or independently verified benchmark results, so those performance claims should currently be treated as company-supplied data.
The second major component is Cluster Scale Memory, or CSM. Etched says conventional processors using High Bandwidth Memory can provide substantial capacity and throughput, but memory hierarchies and networking can increase latency during token generation. Designs relying heavily on SRAM can reduce latency but usually sacrifice memory capacity, compute density, manufacturing yield, or cost efficiency.
Etched’s CSM architecture combines HBM and SRAM through a proprietary high-bandwidth, low-latency interconnect. The goal is to create a shared memory pool across the complete scale-up domain, allowing model data and expert routing traffic to move between processors without repeatedly passing through deeper memory and networking layers.
This approach is particularly important for mixture of experts models. These systems activate only selected parts of a model for each request, but routing tokens between different experts can produce significant communication and memory overhead. Etched believes its shared memory design can improve both response latency and large batch throughput without requiring customers to choose between fast token generation and efficient infrastructure utilization.
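The communication overhead of expert routing can be sketched with a small example. This is a generic top-k routing illustration, not Etched's design; the expert placement, router scores, and function names are all hypothetical. It counts how many token-to-expert dispatches land on a different device from the one holding the token.

```python
# Generic top-k mixture-of-experts routing sketch with hypothetical data.
# It only counts cross-device dispatches; it does not run any model.

def top_k_experts(scores, k):
    """Return the indices of the k highest-scoring experts for one token."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def count_remote_dispatches(token_scores, token_device, expert_device, k=2):
    """Count dispatches where the chosen expert lives on another device."""
    remote = 0
    for scores, home in zip(token_scores, token_device):
        for expert in top_k_experts(scores, k):
            if expert_device[expert] != home:
                remote += 1
    return remote


expert_device = [0, 0, 1, 1]       # four experts spread over two devices
token_device = [0, 0, 1]           # device currently holding each token
token_scores = [
    [0.9, 0.1, 0.8, 0.2],          # token 0 picks experts 0 and 2
    [0.1, 0.2, 0.7, 0.6],          # token 1 picks experts 2 and 3
    [0.5, 0.4, 0.1, 0.0],          # token 2 picks experts 0 and 1
]

remote = count_remote_dispatches(token_scores, token_device, expert_device)
print(remote)  # 5
```

In this toy case, five of the six dispatches cross a device boundary, which illustrates why the interconnect and memory layout can matter as much as raw compute for these models.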
The company says its current systems are already running models including DeepSeek, Qwen, Mamba, and Llama. Early customer testing has reportedly demonstrated competitive throughput, latency, and power efficiency, although Etched plans to disclose more detailed performance and roadmap information later in summer 2026.
“Production is the product.”
— Robert Wachen, Etched Cofounder
Etched has started production to fulfill its signed contracts and says its first racks will ship during summer 2026. The company has opened a factory in Taiwan to support continuous engineering and manufacturing work, while its San Jose headquarters now contains a 2 MW data center, test facility, and new product introduction prototyping laboratory. Etched says this infrastructure provides a path toward gigawatt scale deployment in 2027.
The announcement arrives as AI infrastructure competition moves beyond individual accelerator specifications and toward complete rack-scale systems. NVIDIA’s GB300 NVL72 similarly combines processors, networking, memory, cooling, and software into a unified platform designed for agentic and large-model inference. This infrastructure shift is also visible in cloud deployments: Microsoft Azure is already using NVIDIA GB300 systems to run Anthropic’s latest Claude models, as detailed in earlier coverage of Claude running on Blackwell Ultra. Etched is entering the market with a more specialized architecture focused almost entirely on inference rather than training and general-purpose acceleration.
Etched’s announcement is notable because the company is presenting more than a conceptual chip or future roadmap. It has working first-pass silicon, assembled racks, production infrastructure, and more than $1 billion in signed contracts. Those milestones give the startup greater credibility than many AI accelerator companies that announce ambitious performance targets before hardware becomes available.
Its main challenge will be software adoption. NVIDIA’s strength comes not only from GPU performance, but from CUDA, mature development tools, optimized libraries, large cloud deployments, and years of application compatibility. Etched must demonstrate that customers can deploy existing models without extensive rewriting while maintaining reliability across large production clusters.
The Low Voltage Inference design is technically compelling because power and cooling are becoming primary constraints for AI data centers. Sustaining more than 80% of peak performance could significantly improve infrastructure economics, but the claim will need transparent testing across real models, batch sizes, context lengths, and power limits.
Cluster Scale Memory may ultimately be the more important differentiator. Modern inference is increasingly limited by memory movement and communication rather than raw arithmetic capability. A successful HBM and SRAM hybrid could improve time to first token and output speed while supporting models that are too large for SRAM-focused accelerators.
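A rough back-of-the-envelope bound shows why memory movement tends to dominate token generation. The sketch below assumes a single request (batch size 1), counts only the active weights read per token, and ignores key/value cache traffic and communication. The parameter count, precision, and bandwidth are hypothetical round numbers, not figures from Etched or any specific product.

```python
# Simplified memory-bound estimate for decode speed at batch size 1.
# All inputs are hypothetical round numbers chosen for illustration.

def decode_tokens_per_second_bound(active_params, bytes_per_param, bandwidth_bytes_per_s):
    """Upper bound on tokens/s if every active weight is read once per token."""
    bytes_per_token = active_params * bytes_per_param
    return bandwidth_bytes_per_s / bytes_per_token


bound = decode_tokens_per_second_bound(
    active_params=40e9,           # assumed active parameters per token
    bytes_per_param=1,            # assumed 8-bit weights
    bandwidth_bytes_per_s=4e12,   # assumed 4 TB/s of memory bandwidth
)
print(bound)  # 100.0
```

Under these assumptions, no amount of additional arithmetic throughput raises the per-request ceiling; only more bandwidth, fewer bytes per token, or larger batches do, which is the gap a faster shared memory pool is meant to address.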
Etched now needs to convert its first pass success into volume manufacturing, stable software, dependable yields, and repeatable customer performance. Should it deliver on those requirements, the company could become one of the most serious challengers yet to NVIDIA’s dominance in frontier AI inference.
Can Etched’s specialized inference architecture compete with NVIDIA’s mature GPU ecosystem, or will software compatibility remain the deciding factor?
