QNAP QAI h1290FX Brings Edge AI NAS Power With 16 Zen 2 Cores and Up to 96GB of RTX PRO 6000 Blackwell VRAM

QNAP has introduced the QAI h1290FX, a new edge AI NAS platform built for local LLM, RAG, and broader generative AI workflows. The system blends a server-class AMD EPYC platform with NVIDIA RTX PRO Blackwell graphics, targeting businesses and developers who want on-premises AI acceleration without shifting sensitive workloads to the cloud. QNAP positions the system as a GPU-ready AI storage server with 12 U.2 NVMe bays, built-in 25GbE connectivity, and support for modern AI deployment environments such as Container Station.

One of the more interesting aspects of this launch is the processor choice. QNAP pairs the system with the AMD EPYC 7302P, a Zen 2-era chip with 16 cores and 32 threads. While it is not based on AMD's newest server architecture, it still offers solid multithreaded throughput for inference orchestration, storage handling, virtualization, and edge compute roles. In practice, this gives the QAI h1290FX a mix of old and new, with a proven server CPU on one side and Blackwell-class workstation GPU acceleration on the other. QNAP also lists 128GB of RDIMM DDR4 ECC memory as a baseline configuration, with expansion up to 1TB.

On the graphics side, QNAP supports NVIDIA RTX PRO Blackwell options, including the RTX PRO 4500 Blackwell with 32GB of GDDR7 ECC memory and the flagship RTX PRO 6000 Blackwell Max-Q Workstation with 96GB of GDDR7 ECC memory. QNAP says the RTX PRO 6000 configuration is designed for large-scale models of 70B parameters and above, while the RTX PRO 4500 is aimed at mid-sized deployments up to around 30B parameters. That gives the QAI h1290FX clear flexibility depending on whether the buyer is building a more focused RAG appliance or a heavier local inference server for larger generative AI workloads.
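Those VRAM tiers line up with a common back-of-envelope rule: quantized weight memory is roughly parameter count times bits per weight, plus headroom for KV cache and activations. The sketch below is illustrative, not QNAP's sizing method; the ~4.5 bits/weight figure (Q4_K_M averages a bit above 4 bits) and the flat 4GB overhead are assumptions.

```python
# Rough VRAM sizing for quantized LLM weights. Assumes weights dominate
# memory use, with a flat allowance for KV cache and activations.

def estimate_vram_gb(params_billion: float, bits_per_weight: float,
                     overhead_gb: float = 4.0) -> float:
    """Weight memory = params * bits/8, plus a flat overhead allowance."""
    weights_gb = params_billion * 1e9 * bits_per_weight / 8 / 1e9
    return weights_gb + overhead_gb

# A 70B model at ~4.5 bits/weight lands in the low 40s of GB,
# comfortably inside the RTX PRO 6000's 96GB but beyond the 4500's 32GB.
print(round(estimate_vram_gb(70, 4.5), 1))
# A 32B model at the same quantization fits the RTX PRO 4500's 32GB.
print(round(estimate_vram_gb(32, 4.5), 1))
```

The results track QNAP's own usage figures further down reasonably well, which is why the 30B boundary between the two GPU options makes sense.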

Storage and connectivity are equally important here. QNAP equips the platform with 12 U.2 NVMe SSD bays that also accept SATA SSDs, allowing businesses to tune the system around speed, capacity, or cost. Networking is another strong part of the package, with dual 25GbE and dual 2.5GbE ports built in, while PCIe expansion allows optional upgrades such as 100GbE cards. QNAP also highlights compatibility with its JBOD expansion lineup, which positions this system as more than a compact AI inference box: it is also framed as a scalable storage and compute node for growing AI datasets and enterprise pipelines.
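To put those link speeds in context, here is a rough calculation of how long staging a large model over the network would take. The speeds are nominal line rates, the 85% efficiency factor is an assumption to account for protocol overhead, and the model size is taken from QNAP's throughput table further down.

```python
# Back-of-envelope: time to move a model from networked storage to the
# AI server over the built-in 25GbE vs. an optional 100GbE upgrade.

def transfer_seconds(size_gb: float, gbit_per_s: float,
                     efficiency: float = 0.85) -> float:
    """Time to move size_gb over a link of gbit_per_s at given efficiency."""
    return size_gb * 8 / (gbit_per_s * efficiency)

model_gb = 63  # e.g. the ~63GB gpt-oss 120B footprint QNAP reports
print(f"25GbE:  {transfer_seconds(model_gb, 25):.0f} s")
print(f"100GbE: {transfer_seconds(model_gb, 100):.0f} s")
```

Even at 25GbE, cold-loading a 63GB model takes well under a minute, which is why the combination of fast networking and local NVMe matters for swapping between large models.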

From a software and deployment perspective, QNAP is emphasizing usability. The company says the QAI h1290FX supports containerized AI environments with GPU resource management, enabling users to deploy AI applications through Docker- and LXD-based workflows without heavy command-line setup. QNAP also promotes local deployment for private chat assistants, knowledge bases, and document search platforms, which is a strong value proposition for companies concerned with privacy, compliance, and cloud cost control.
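QNAP does not publish the exact commands Container Station generates, but the kind of GPU-backed container workload it manages can be sketched with standard Docker flags, following Ollama's published Docker instructions. The `/share/ai-models` path is a hypothetical example of a NAS share; the image name, port, and `--gpus` flag are standard.

```python
# A minimal sketch of launching a GPU-backed Ollama container, the kind
# of workload a NAS container manager handles through its GUI.
import subprocess

def ollama_run_cmd(model_dir: str = "/share/ai-models") -> list[str]:
    """Build a `docker run` command exposing the NVIDIA GPU to Ollama."""
    return [
        "docker", "run", "-d",
        "--gpus", "all",                      # pass the GPU through
        "-v", f"{model_dir}:/root/.ollama",   # persist models on NAS storage
        "-p", "11434:11434",                  # Ollama's default API port
        "--name", "ollama",
        "ollama/ollama",
    ]

cmd = ollama_run_cmd()
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # only on a host with Docker + NVIDIA toolkit
```

The appeal of the Container Station approach is that this plumbing (GPU passthrough, volume mapping, port exposure) is handled in the GUI rather than typed by hand.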

QNAP’s own benchmark figures also paint a clear picture of what the RTX PRO 6000 96GB configuration is aiming to deliver. In Ollama-based inference tests, the company reports up to 172 tokens per second depending on model size and quantization. The smallest tested models post the highest throughput, while larger models remain within practical local-deployment territory thanks to the large VRAM pool.

| Model | Tokens per second | VRAM usage |
| --- | --- | --- |
| gpt-oss 120B (MXFP4) | 90 | ~63GB |
| DeepSeek-R1 70B (Q4_K_M) | 24 | ~41GB |
| Qwen3 32B (Q4_K_M) | 46 | ~21GB |
| Gemma 3 27B (Q4_K_M) | 54 | ~19GB |
| DeepSeek-R1 8B (Q4_K_M) | 140 | ~7GB |
| Qwen3 8B (Q4_K_M) | 172 | ~7GB |
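QNAP does not detail its measurement method, but Ollama's generate API reports `eval_count` (tokens generated) and `eval_duration` (nanoseconds), and tokens per second is simply their ratio. A minimal sketch with an illustrative response payload:

```python
# Deriving tokens/s the way Ollama-style benchmarks typically do: from the
# eval_count and eval_duration fields of an /api/generate response.

def tokens_per_second(response: dict) -> float:
    """eval_duration is reported in nanoseconds."""
    return response["eval_count"] / (response["eval_duration"] / 1e9)

# Illustrative sample: 860 tokens generated in 5 seconds.
sample = {"eval_count": 860, "eval_duration": 5_000_000_000}
print(round(tokens_per_second(sample), 1))  # 172.0
```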

QNAP also shared concurrent vLLM inference numbers for DeepSeek-R1-Distill-Qwen-7B and gpt-oss 20B. The DeepSeek result scales from 79 tokens per second at a single thread to 850 tokens per second at 50 threads, although average throughput per thread naturally drops as concurrency rises. For gpt-oss 20B, QNAP reports a peak of 1045 total tokens per second at 5 threads before performance tapers off at higher thread counts. These figures suggest that the system is being marketed not only as a single-user local AI server, but also as a multi-user inference appliance for internal enterprise services.

DeepSeek-R1-Distill-Qwen-7B:

| Threads | Total tokens per second | Avg tokens per thread per second |
| --- | --- | --- |
| 1 | 79 | 79 |
| 2 | 166 | 83 |
| 5 | 410 | 82 |
| 10 | 688 | 68.8 |
| 20 | 810 | 40.5 |
| 50 | 850 | 17 |

gpt-oss 20B:

| Threads | Total tokens per second | Avg tokens per thread per second |
| --- | --- | --- |
| 1 | 218 | 218 |
| 2 | 340 | 170 |
| 5 | 1045 | 209 |
| 10 | 880 | 88 |
| 20 | 600 | 30 |
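The per-thread averages in QNAP's tables are simply total throughput divided by thread count; recomputing them from the reported totals makes the saturation pattern explicit, with aggregate throughput flattening out while each individual session slows down.

```python
# Recompute per-thread averages from QNAP's reported DeepSeek totals to
# show how aggregate throughput saturates as concurrency rises.

deepseek_totals = {1: 79, 2: 166, 5: 410, 10: 688, 20: 810, 50: 850}

for threads, total in deepseek_totals.items():
    per_thread = total / threads
    print(f"{threads:>2} threads: {total:>4} tok/s total, "
          f"{per_thread:>5.1f} tok/s per thread")
```

Going from 20 to 50 threads adds only 40 tokens per second in aggregate, which is the practical signal for where a single-GPU deployment of this model stops scaling.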

QNAP backs the system with a 5-year warranty as standard, which helps reinforce its enterprise positioning. Based on the pricing information shared, the QAI h1290FX starts at $8,999 for the 64GB version, rises to $13,499 for the 128GB model, and reaches $15,999 for the 256GB variant. RAM, expansion cards, and some configuration options are sold separately, which means real-world deployment costs can scale notably depending on network, storage, and GPU requirements.

Overall, the QAI h1290FX is a fascinating product because it does not chase the newest server CPU platform, yet still looks compelling thanks to its modern GPU options, fast all-flash design, and strong enterprise I/O. For companies exploring private AI infrastructure, it presents a practical bridge between traditional NAS functionality and dedicated local inference hardware. In a market where AI deployments are increasingly shaped by privacy, latency, and storage demands, QNAP’s latest system looks like a serious attempt to turn edge AI into a more deployable and centralized business platform.

Would you consider a dedicated AI NAS like this for local LLM workflows, or would you still rather build a custom AI server from separate parts?

Angel Morales

Founder and lead writer at Duck-IT Tech News, dedicated to delivering the latest news, reviews, and insights in technology, gaming, and AI. With experience in the tech and business sectors, he combines a deep passion for technology with a talent for clear and engaging writing.
