Skymizer Unveils HTX301 PCIe AI Card With 384 GB Memory, 700B Model Support, and Just 240W Power

Taiwan-based Skymizer has introduced a new PCIe AI accelerator, the HTX301, and on paper it is one of the more disruptive on-premises inference announcements of the year. The company says the card can run 700B-parameter-class LLM inference on a single PCIe card, pairing six HTX301 chips with 384 GB of memory while drawing around 240 W. If those claims hold up in real deployments, the HTX301 could challenge the assumption that ultra-large-model inference always requires large GPU clusters, expensive interconnects, and heavy cooling infrastructure.

According to Skymizer, the HTX301 is the first reference chip built on its HyperThought platform and uses the company’s next-generation LPU IP to target inference-dominant AI workloads. Rather than chasing the brute-force path of traditional hyperscale GPU deployments, Skymizer says its architecture separates prefill and decode workloads, combining decode-first silicon with a software orchestration layer to improve utilization, latency, and power efficiency in practical inference environments. That makes this launch feel less like a conventional accelerator announcement and more like a direct pitch against the current structure of enterprise AI infrastructure.

The headline specification is the memory footprint. Skymizer says each card integrates 384 GB of memory across its six chips, enough to host 700B-parameter-class models locally on a single add-in card. The company is not using exotic memory, either: it relies on LPDDR4 and LPDDR5 DRAM, which suggests a deliberate focus on cost and deployment practicality rather than maximum bandwidth alone. Combined with its compression approach, Skymizer appears to be optimizing for efficient inference delivery rather than mirroring the design philosophy of top-tier GPU accelerators that depend on HBM and massive board power.
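A quick back-of-envelope check (our arithmetic, not Skymizer's published math) shows why 384 GB is a plausible fit for a 700B-parameter model only with aggressive weight quantization, which lines up with the company's emphasis on compression:

```python
# Back-of-envelope weight footprint estimate (assumption: this is our
# illustration, not vendor data). A 700B-parameter model at FP16 needs
# far more than 384 GB; at ~4-bit quantization it just fits.

def weight_footprint_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate model weight footprint in GB (10^9 bytes)."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

fp16 = weight_footprint_gb(700, 16)  # 1400.0 GB -- far too large for the card
int4 = weight_footprint_gb(700, 4)   # 350.0 GB  -- fits within 384 GB
print(f"FP16: {fp16:.0f} GB, INT4: {int4:.0f} GB, card: 384 GB")
```

At roughly 4 bits per weight, 700B parameters come to about 350 GB, leaving some headroom in 384 GB for activations and KV cache. Anything near FP16 precision would be out of reach.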

Skymizer is also publishing some aggressive performance claims. The company says its Octa-Core LPU can achieve 240 tokens per second in Llama2 7B prefill, and that multiple chips can scale that figure to 1,200 tokens per second for the same model class. It also states that the platform can deliver 30 tokens per second with only 0.5 TOPS of compute at 100 GB/s of memory bandwidth, a very different framing from the raw-TOPS arms race common in GPU marketing. Instead of leading with theoretical peak numbers, Skymizer argues that architectural efficiency and model-specific orchestration can matter more for real-world LLM inference.
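The 30-tokens-per-second figure is at least internally consistent with the common observation that single-batch LLM decode is memory-bandwidth bound rather than compute bound. A rough roofline sketch, under our own assumption of a 7B-class model quantized to about 4 bits (~3.5 GB of weights), lands near that number:

```python
# Roofline-style decode estimate (our assumptions, not Skymizer's math):
# each generated token requires streaming roughly the full weight set
# from memory, so tokens/s ~= bandwidth / bytes read per token.

def decode_tokens_per_sec(bandwidth_gbps: float, weights_gb: float) -> float:
    """Bandwidth-bound upper estimate of single-batch decode throughput."""
    return bandwidth_gbps / weights_gb

# Assumption: a 7B model at ~4-bit weights -> ~3.5 GB read per token.
tps = decode_tokens_per_sec(100, 3.5)
print(f"~{tps:.1f} tokens/s at 100 GB/s")  # ~28.6, close to the ~30 claim
```

This is only a sanity check, but it suggests the claim is about bandwidth efficiency at the decode stage rather than raw compute, which matches Skymizer's decode-first framing.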

The power story is arguably the biggest attention-grabber. Skymizer says the HTX301 runs at about 240 W, far below some of the largest PCIe AI cards currently on the market. By comparison, AMD’s Instinct MI350P is listed at up to 600 W total board power, while NVIDIA’s RTX PRO 6000 Blackwell Server Edition is also described by partners and product listings as reaching up to 600 W maximum board power, depending on configuration. That gives Skymizer a major efficiency talking point, especially for smaller enterprise deployments, labs, and local inference environments where rack power and cooling are hard limits rather than theoretical concerns.


Some of the key HTX301 highlights include:

  • 700B parameter class inference on a single PCIe card

  • 6 chips per card

  • 384 GB memory

  • Around 240W board power

  • Purpose built decode acceleration

  • Unified prefill and decode orchestration

  • On-premises deployment with data sovereignty and deterministic latency

Skymizer is also leaning heavily into model efficiency. The company says its weight compression outperforms the open-source llama.cpp baseline by 9% to 17.8%, while its KV cache compression reportedly keeps perplexity loss in a range from under 0.06% to 3.52%. Those figures matter because the broader enterprise AI market is increasingly concerned with serving cost, memory footprint, and latency stability rather than peak training horsepower alone. A platform that reduces memory overhead while preserving inference quality would be highly attractive if independently validated.
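To see why KV-cache compression is worth marketing at all, it helps to estimate how fast the cache grows. A short sketch using the public Llama-2-7B dimensions (32 layers, hidden size 4096, FP16 cache) shows the cache consuming gigabytes per sequence at long contexts:

```python
# KV-cache footprint sketch (our illustration; dimensions are the public
# Llama-2-7B config, not Skymizer data). Each token stores one key and
# one value vector of size `hidden` per layer.

def kv_cache_gb(n_layers: int, hidden: int, seq_len: int,
                bytes_per_elem: int) -> float:
    """Approximate KV-cache size in GB for one sequence."""
    return 2 * n_layers * hidden * seq_len * bytes_per_elem / 1e9

fp16_4k = kv_cache_gb(32, 4096, 4096, 2)
print(f"FP16 KV cache at 4k context: {fp16_4k:.2f} GB")  # ~2.15 GB per sequence
```

Roughly 2 GB of cache per 4k-token sequence, multiplied across concurrent users, explains why compressing the cache with minimal perplexity loss directly lowers serving cost on a fixed-memory card.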

The broader strategic angle is what makes this especially interesting. Skymizer is effectively arguing that the next phase of AI adoption should not be reserved for hyperscalers or giant infrastructure buyers. Its message is that enterprises should be able to run advanced models locally with predictable costs, clear data control, and fixed infrastructure requirements. That pitch fits well with organizations concerned about privacy, compliance, sovereign deployment, or long-term cloud inference costs. For Taiwan, it is also notable to see a domestic company enter the AI hardware conversation with a product that aims to compete through architectural efficiency rather than sheer scale.

That said, this is still a launch built on vendor claims, so the most important next step is validation. Skymizer announced the HTX301 on April 23, 2026, and the product is at a stage where real-world benchmarks, software maturity, workload compatibility, latency behavior, and sustained throughput under enterprise conditions will determine whether it becomes a breakthrough or merely an interesting concept. The company’s numbers are ambitious enough that hands-on testing will matter a great deal. Until then, the HTX301 stands out as one of the most intriguing AI accelerator ideas of the year, especially for businesses that want serious local inference without committing to full-scale GPU clusters.

Do you think cards like the HTX301 could reshape enterprise AI by making ultra large local inference practical without hyperscale infrastructure?

Angel Morales

Founder and lead writer at Duck-IT Tech News, dedicated to delivering the latest news, reviews, and insights in the world of technology, gaming, and AI. With experience in the tech and business sectors, Angel combines a deep passion for technology with a talent for clear and engaging writing.
