Amazon Tripled CPU Server Capacity but Cloud Supply Is Still Tight as Agentic AI Pushes a New Infrastructure Crunch
The artificial intelligence infrastructure race may be entering a new phase. After years of GPU shortages and continued pressure on memory supply, attention is shifting toward CPUs as agentic AI workloads consume far more general-purpose compute than many cloud providers originally planned for. The latest market discussion, driven by comments from SemiAnalysis CEO Dylan Patel and amplified across the industry, suggests that cloud CPU availability is becoming a serious constraint as AI inference stacks grow more complex and more dependent on orchestration, database calls, retrieval, simulation, and other CPU-intensive operations. Official reporting and company statements support the broader trend: CPU demand is rising sharply as AI systems become more workflow-driven and inference-heavy. Some of the most dramatic claims about total sellouts, however, should still be treated as market commentary rather than formally disclosed figures.
Ivan Burazin (@ivanburazin), April 13, 2026:
Dylan Patel says GPUs are no longer the biggest bottleneck.
According to @dylan522p, now CPUs are the constraint.
In the early AI era, CPUs were the laggers. You used them for storage, checkpointing, pre-processing, etc. (pretty light workloads)
The models weren't agentic and… pic.twitter.com/yR6g6hh74F
The core reason behind this shift is straightforward. Early large-scale AI deployments leaned heavily on GPUs for model training and simple inference, while CPUs played a supporting role. That balance is changing as agentic AI systems become more interactive and more operationally complex. These newer systems do not just generate text or images. They call tools, query structured and unstructured databases, coordinate tasks, handle control logic, manage I/O, and in some cases support workloads tied to simulation or code execution. AMD said in a March 2026 blog that agentic AI is bringing “a new level of importance” to CPUs because inference is increasingly a multistep workflow that drives more CPU demand across scheduling, memory, I/O, and control flow. SemiAnalysis has also described 2026 as a year in which datacenter CPUs are returning to strategic importance as hyperscalers expand both custom Arm and x86 capacity to support this jump in traffic and orchestration demand.
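To make the "multistep workflow" point concrete, here is a minimal toy sketch of an agent loop in Python. Everything in it is hypothetical: the function names, the canned model replies, and the step labels are invented for illustration and do not come from AMD, SemiAnalysis, or any real agent framework. It only shows structurally how each model call tends to be wrapped in CPU-side work such as prompt assembly, reply parsing, tool dispatch, and database lookups. The step counts are an artifact of the toy, not a measurement.

```python
from dataclasses import dataclass, field


@dataclass
class StepLog:
    """Records which kind of hardware each step of a task would lean on."""
    cpu_steps: list[str] = field(default_factory=list)
    accelerator_steps: list[str] = field(default_factory=list)


def fake_model_call(prompt: str, log: StepLog) -> str:
    # Stand-in for accelerator-backed inference (hypothetical, no real model).
    # It asks for a tool once, then finishes when it sees a tool result.
    log.accelerator_steps.append("model_inference")
    return "done" if "result:" in prompt else "lookup:orders"


def run_agent_task(question: str) -> StepLog:
    log = StepLog()
    prompt = question
    for _ in range(5):  # hard cap so the toy loop always terminates
        log.cpu_steps.append("build_prompt")    # CPU: assemble context
        reply = fake_model_call(prompt, log)    # accelerator: inference
        log.cpu_steps.append("parse_reply")     # CPU: control flow
        if reply == "done":
            break
        log.cpu_steps.append("dispatch_tool")   # CPU: orchestration
        log.cpu_steps.append("query_database")  # CPU: I/O and retrieval
        prompt = f"{question}\nresult: 3 rows"  # canned tool output
    return log


if __name__ == "__main__":
    log = run_agent_task("How many open orders are there?")
    print("CPU-side steps:", log.cpu_steps)
    print("Accelerator steps:", log.accelerator_steps)
```

Even in this trivial loop, a single tool call adds several CPU-side steps around each inference call, which is the qualitative pattern the vendors are describing.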
That broader industry backdrop makes the recent Amazon commentary more believable, even if some of the more dramatic wording remains anecdotal. In his 2026 shareholder letter, Amazon CEO Andy Jassy revealed that two large AWS customers had asked whether they could buy all of AWS’s Graviton instance capacity for 2026, a striking signal that CPU demand is no longer a background issue in the cloud. Amazon said it could not accept those requests because of other customers’ needs, but the fact that the requests were made at all shows how tight compute planning has become. Jassy also said Amazon’s chip business, which includes Graviton, Trainium, and Nitro, is now running at more than $20 billion annually, while Reuters and other outlets reported that AWS is sharply increasing capital spending on AI infrastructure this year.
The agentic AI angle is especially important because it changes the mix of hardware that cloud builders need to deploy. In the earlier AI cycle, the market narrative was dominated by GPU availability. Now the discussion is widening to balanced system design. Accelerators still matter enormously, but CPUs are once again becoming a gating factor for real-world deployment, especially in environments where AI tools need to interact with business systems, developer workflows, and external services. Intel’s recent partnership messaging around Google reflects the same trend, with both companies emphasizing that AI infrastructure increasingly needs balanced systems in which CPUs and infrastructure processors support the evolving demands of inference and agent-based computing. Tom’s Hardware also reported this week on a new heterogeneous AI inference platform that uses Xeon CPUs to orchestrate agent tasks such as code execution and workload distribution, underscoring how central the CPU is becoming in production AI stacks.
One part of the circulating discussion that deserves more caution is the link being made between cloud CPU shortages and GitHub instability. GitHub has indeed experienced real availability incidents in 2026, including one in March where request failures on github.com reached roughly 40% and GitHub API failures reached about 43% during a period of degraded service. However, GitHub’s own reporting attributed those incidents to service failures and architectural issues, not to any publicly confirmed shortage of CPUs. That distinction matters. It is fair to say that cloud infrastructure pressure is intensifying, but it would be too strong to state as fact that GitHub outages were caused by Microsoft running out of CPUs based on the available public evidence.
MacroValue (@pradeeepk), April 14, 2026:
OpenAI ported their entire code base to ARM so that they can use $AMZN graviton CPUs as CPUs capacity is very short
source @dylan522p pic.twitter.com/pOyole5BFQ
What does look increasingly credible is the possibility of a broader supply squeeze across server CPUs if current AI demand keeps accelerating. SemiAnalysis has already noted that hyperscalers are carrying out major buildouts of both custom Arm CPUs and x86 general-purpose servers, and Amazon’s own disclosures show that customer appetite for Graviton is intense. If that continues, cloud providers will likely compete even more aggressively for both Arm and x86 server silicon. That could benefit suppliers such as AMD and Intel on the datacenter side, while also reinforcing the market’s focus on NVIDIA’s rack-level strategy, where CPUs, GPUs, and large memory footprints are packaged together as a single platform play. The concern for the wider industry is that, just as DRAM supply has already been pulled hard toward AI, CPU allocation could increasingly favor higher-margin AI deployments over mainstream enterprise and consumer channels.
If that scenario plays out, the implications could reach well beyond cloud operators. Higher datacenter CPU demand would tighten supply planning, raise pricing pressure, and make allocation more selective across the supply chain. In practical terms, AI infrastructure customers with the strongest budgets and the biggest long-term contracts would continue to get priority, while the rest of the market deals with longer lead times and less predictable pricing. It is not yet clear whether this becomes a full-scale CPU shortage comparable to the GPU scarcity the market has experienced, but the direction is becoming harder to ignore. The AI boom is no longer just about accelerators. It is now pulling on every critical component in the server stack, and CPUs are quickly becoming the next battleground.
Do you think CPUs will become the next major bottleneck in AI infrastructure, or will cloud providers scale fast enough to prevent another full supply crisis?
