OpenAI Unveils Jalapeño, Its First Custom AI Chip Built for LLM Inference

OpenAI and Broadcom have unveiled Jalapeño, the artificial intelligence company’s first custom Intelligence Processor and the opening product in a planned multigeneration computing platform. Designed specifically for large language model inference, the accelerator represents OpenAI’s most significant move yet toward controlling the complete infrastructure behind ChatGPT, Codex, its API platform, and future agentic products.

According to the official OpenAI announcement, Jalapeño was designed from the ground up around the workloads, kernels, memory movement, networking demands, and serving patterns observed across OpenAI’s production systems. The company describes it as a blank-slate design for modern LLM inference rather than a general-purpose accelerator adapted from earlier artificial intelligence workloads. Its objective is to combine the throughput of leading AI accelerators with latency closer to that of highly specialized inference processors, making it suitable for interactive products operating at massive scale.

OpenAI chief executive officer Sam Altman and president Greg Brockman received the processor from Broadcom chief executive officer Hock Tan and Semiconductor Solutions president Charlie Kawwas. Although Broadcom played a major role in bringing Jalapeño into production, it does not physically manufacture the silicon. The work is divided as follows: OpenAI created the accelerator architecture; Broadcom provides silicon implementation, connectivity, and networking, including its Tomahawk technology; and Celestica contributes board design, rack integration, and production system expertise. According to Reuters, TSMC is responsible for manufacturing the chip.

"Jalapeño was designed from the ground up for LLM inference using detailed insights from our close collaboration with OpenAI researchers."
— Richard Ho

OpenAI says the processor moved from initial design to manufacturing tape-out in only nine months, which the company believes may be the fastest development cycle completed for a high-performance custom ASIC. OpenAI’s own models were used to accelerate parts of the engineering, design, and optimization process, creating a development loop in which the same artificial intelligence systems that will eventually run on Jalapeño also contributed to building the hardware.

Engineering samples are already operating inside OpenAI laboratories at their target production frequency and power while running workloads including GPT 5.3 Codex Spark. OpenAI has not yet disclosed the manufacturing process, memory standard, total memory capacity, package power, networking bandwidth, or exact inference performance. Early internal testing reportedly shows substantially stronger performance per watt than current leading hardware, but a detailed technical report is not expected until the coming months.

The architecture is designed to reduce unnecessary data movement while balancing compute, memory, and networking resources more closely. This is particularly important for modern inference and agentic AI, where systems must generate tokens quickly while managing long contexts, tool execution, memory retrieval, parallel agents, and complex software workflows. Better utilization could allow OpenAI to serve more intelligence from the same amount of power and infrastructure while reducing latency and operating costs.

Jalapeño is intended to work with LLMs beyond OpenAI’s own model family, although its design is informed by the company’s direct experience operating ChatGPT, Codex, and the API. The first platforms are scheduled for deployment before the end of 2026, with broader expansion planned over several hardware generations. This deployment sits within the previously announced 10 GW OpenAI and Broadcom infrastructure partnership, which aims to deploy custom accelerators and Ethernet-based rack systems between 2026 and 2029.

The custom silicon program does not mean OpenAI is abandoning NVIDIA. OpenAI and NVIDIA have also announced a separate 10 GW infrastructure partnership involving millions of graphics processors and the Vera Rubin platform. This broader strategy reflects the enormous demand created by frontier models and agentic artificial intelligence. NVIDIA remains the dominant provider of general-purpose AI infrastructure, but custom processors give companies greater control over performance, power efficiency, availability, and long-term costs. Google has followed this strategy for years through its TPU platform, while Amazon, Microsoft, Meta, and other technology companies are also investing heavily in specialized silicon.

The arrival of Jalapeño shows that OpenAI now wants to optimize every major layer of its platform, including models, software, kernels, memory systems, networking, accelerators, racks, and data center deployment. That level of vertical integration could allow future ChatGPT and Codex services to respond faster, support more complex agent workflows, and remain available during periods of extreme demand.

Jalapeño is not designed to replace every GPU inside OpenAI’s infrastructure. Its strategic value comes from handling predictable, high-volume inference workloads more efficiently while NVIDIA systems continue supporting training, general acceleration, and other demanding applications.

The most important claim is not raw performance but utilization. Modern AI chips can offer enormous theoretical computing power while losing efficiency through memory transfers, networking delays, and poorly matched workloads. OpenAI designed Jalapeño using direct knowledge of how its models behave in production, giving it an opportunity to align the hardware more closely with the software.
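To make the utilization point concrete, here is a minimal roofline-style sketch in Python. It is not based on any disclosed Jalapeño specification: the peak compute and memory bandwidth figures are hypothetical placeholders, the function names are invented for illustration, and the arithmetic-intensity formula is a common approximation for a dense model's decode step. It only shows why a chip with large theoretical throughput can sit mostly idle when each generated token requires streaming weights from memory.

```python
def attainable_flops(peak_flops, mem_bandwidth, intensity):
    """Roofline bound: throughput is capped by compute or by memory traffic."""
    return min(peak_flops, mem_bandwidth * intensity)


def decode_intensity(batch_size, bytes_per_param=2.0):
    """Approximate FLOPs per byte for one decode step of a dense model.

    Assumption: each weight is read once per step and used for one
    multiply-add (2 FLOPs) per sequence in the batch.
    """
    return 2.0 * batch_size / bytes_per_param


def utilization(peak_flops, mem_bandwidth, batch_size, bytes_per_param=2.0):
    """Fraction of peak compute that the roofline bound allows."""
    intensity = decode_intensity(batch_size, bytes_per_param)
    return attainable_flops(peak_flops, mem_bandwidth, intensity) / peak_flops


if __name__ == "__main__":
    PEAK = 1.0e15  # hypothetical peak compute, FLOP/s (not a real chip spec)
    BW = 4.0e12    # hypothetical memory bandwidth, bytes/s (not a real chip spec)
    for batch in (1, 8, 64, 512):
        print(f"batch={batch:4d}  utilization={utilization(PEAK, BW, batch):.1%}")
```

Under these made-up numbers, small batches leave the compute units almost entirely unused, and only large batches reach the compute limit. That gap between theoretical and delivered throughput is the kind of inefficiency a design balanced around real serving patterns would aim to close.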

However, OpenAI has not yet published enough technical information for an independent comparison against NVIDIA Blackwell, Google TPU, or other custom accelerators. The upcoming performance report will need to disclose real throughput, latency, energy consumption, memory specifications, and scaling behavior before Jalapeño’s competitive position can be properly evaluated.


Could OpenAI’s custom Jalapeño processor reduce its dependence on NVIDIA, or will GPUs remain essential to its infrastructure strategy?

Angel Morales

Founder and lead writer at Duck-IT Tech News, dedicated to delivering the latest news, reviews, and insights in the world of technology, gaming, and AI. Angel brings experience in the tech and business sectors, combining a deep passion for technology with a talent for clear and engaging writing.
