AMD Brings Day 0 Gemma 4 Support to Radeon, Instinct, and Ryzen AI Platforms

AMD has officially rolled out Day 0 support for Google’s Gemma 4 family across its AI-capable hardware portfolio, giving developers and local AI enthusiasts a much broader set of deployment options from the start. According to AMD’s official announcement, support now spans AMD Instinct GPUs for datacenter and cloud workloads, AMD Radeon GPUs for workstations and local AI use, and AMD Ryzen AI processors for AI PCs, with software support tied into several of the most widely used inference and local model tools on the market.

That makes this launch notable for more than simple compatibility. Gemma 4 is one of Google’s latest open-weights model families, and AMD is positioning itself as ready across nearly every major usage tier at once, from large-scale server deployment to local workstation inference and even NPU-accelerated AI PC scenarios. AMD says support includes integration with LM Studio, along with open source ecosystem support for vLLM, SGLang, llama.cpp, Ollama, and Lemonade.

The most important angle here is the breadth of the rollout. In the datacenter and high-throughput space, AMD says Gemma 4 can be deployed on supported AMD GPUs through vLLM, including multiple generations of both Instinct and Radeon hardware. The company also notes that support is available in the Gemma 4 launch build of upstream vLLM and in future nightly builds, with further backend optimizations for MI300 and MI350 series GPUs planned to follow.

For higher-performance serving on the enterprise side, AMD is also enabling Gemma 4 through SGLang on MI300X, MI325X, and MI35X GPUs. AMD says SGLang supports the full Gemma 4 lineup, including the dense models and the 26B A4B MoE variant, and adds that a full Gemma 4 model can fit on a single MI300X with 192 GB of HBM at full context length when running at tensor parallel 1. That is a meaningful point for AI infrastructure readers because it reinforces AMD’s effort to make its accelerator stack practical for compact and mid-sized open models, not just giant frontier workloads.
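The single-GPU claim is easy to sanity-check with rough arithmetic: weight memory is roughly parameter count times bytes per parameter. A minimal sketch, assuming bf16 weights for the 26B-parameter variant cited above and ignoring KV-cache and activation overhead (the byte width and the use of the total parameter count here are illustrative assumptions, not AMD figures):

```python
# Back-of-envelope estimate of model weight memory versus HBM capacity.
# All numbers are illustrative assumptions drawn from the article, not
# official AMD or Google specifications.

PARAMS = 26e9          # 26B total parameters (the MoE variant cited above)
BYTES_PER_PARAM = 2    # bf16/fp16 weights; quantized formats would use less
HBM_GB = 192           # MI300X HBM capacity per the article

weight_gb = PARAMS * BYTES_PER_PARAM / 1e9
print(f"approx. weight memory: {weight_gb:.0f} GB of {HBM_GB} GB HBM")
# The remaining capacity is what the KV cache draws on at long context
# lengths, which is why tensor parallel 1 is plausible here.
```

Even under these rough assumptions, the weights occupy well under a third of the card’s HBM, leaving substantial headroom for long-context KV cache, which is consistent with AMD’s tensor parallel 1 claim.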

On the consumer and local AI side, AMD is making a strong push around simplicity. The company says Gemma 4 can be deployed locally through llama.cpp and LM Studio on supported systems including Ryzen AI, Ryzen AI Max, Radeon, and Radeon PRO hardware, paired with the latest AMD Software: Adrenalin Edition drivers. For users who want a more straightforward local workflow, this is likely to be one of the most appealing parts of the announcement because it lowers the barrier between supported AMD hardware and practical model usage.

AMD is also backing Gemma 4 deployment through Lemonade Server, an open source local LLM server with OpenAI-compatible APIs. On the GPU side, Lemonade supports acceleration on Radeon and Radeon PRO graphics through ROCm. On the AI PC side, AMD says Lemonade also supports deployment on Ryzen AI processors using the XDNA 2 NPU, although the company notes that NPU support for Gemma 4 E2B and E4B will arrive with the next Ryzen AI software update. AMD says that update will be integrated into Lemonade and also exposed directly to developers through ONNX Runtime APIs.
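Because the server exposes OpenAI-compatible APIs, any OpenAI-style client can talk to a local Lemonade instance. A minimal stdlib sketch that builds (but does not send) a chat-completions request; the port, endpoint path, and model identifier below are assumptions for illustration, not documented Lemonade values:

```python
import json
from urllib import request

# Hypothetical local endpoint; the actual Lemonade host, port, and path
# may differ on your system.
URL = "http://localhost:8000/api/v1/chat/completions"

payload = {
    "model": "gemma-4",  # placeholder model id, not a confirmed identifier
    "messages": [
        {"role": "user", "content": "Summarize ROCm in one line."},
    ],
}

req = request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# Once a server is actually running, request.urlopen(req) would send it;
# here we only verify the request is well formed.
print(req.full_url, req.get_method())
```

The practical point is portability: code written against the OpenAI chat-completions shape can target Lemonade, vLLM, or a cloud endpoint by changing only the base URL.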

For developers, this is the bigger message. AMD is not treating Gemma 4 as a single platform story. It is trying to make the model family accessible through nearly every layer of its current AI stack, from cloud accelerators to desktop GPUs to NPUs inside AI PCs. That matters because the competitive conversation around AI hardware is no longer just about raw silicon. It is increasingly about ecosystem readiness, deployment flexibility, and whether developers can move between local experimentation and production serving without having to abandon their preferred hardware path.

The official support article, available through AMD’s developer resource page, also outlines deployment through vLLM, Docker images, and SGLang documentation. That matters for AMD because it shows the company is continuing to build around the software frameworks developers actually use, rather than relying only on closed platform messaging.

From a market perspective, this is another sign that AMD wants to strengthen its relevance across the full AI continuum. Instinct remains the headline play for cloud and enterprise acceleration, but Radeon and Ryzen AI are becoming increasingly important to the company’s local inference and AI PC strategy. Supporting Gemma 4 across all of them gives AMD a more unified story at a time when compact open models are becoming more useful for local agents, edge workflows, and private deployment.

For users already inside AMD’s ecosystem, the appeal is straightforward. If you are running an AI workstation, an Instinct-powered server, or a Ryzen AI machine, AMD wants Gemma 4 to be available immediately through the tools you are most likely already using. That kind of launch-day readiness may not be as flashy as a new chip reveal, but in practice it is exactly the kind of ecosystem progress that helps determine which hardware platforms developers actually stay with over time.

Do you think wide day one model support like this is becoming one of the most important advantages in the AI hardware race, especially for local AI and workstation users?

Angel Morales

Founder and lead writer at Duck-IT Tech News, dedicated to delivering the latest news, reviews, and insights in the world of technology, gaming, and AI. He combines experience in the tech and business sectors with a deep passion for technology and a talent for clear and engaging writing.
