QNAP QAI-h1290FX Brings Edge AI NAS Power With 16 Zen 2 Cores and Up to 96GB of RTX PRO 6000 Blackwell VRAM
QNAP has introduced the QAI-h1290FX, a new edge AI NAS platform built for local LLM, RAG, and broader generative AI workflows. The system blends a server-class AMD EPYC platform with NVIDIA RTX PRO Blackwell graphics, creating a machine that targets businesses and developers who want on-premises AI acceleration without shifting sensitive workloads to the cloud. QNAP positions the system as a GPU-ready AI storage server with 12 U.2 NVMe bays, built-in 25GbE connectivity, and support for modern AI deployment environments such as Container Station.
One of the more interesting aspects of this launch is the processor choice. QNAP pairs the system with the AMD EPYC 7302P, a Zen 2-era chip with 16 cores and 32 threads. While it is not based on AMD’s newest server architecture, it still offers solid multithreaded throughput for inference orchestration, storage handling, virtualization, and edge compute roles. In practice, this gives the QAI-h1290FX a mix of old and new: a proven server CPU on one side and Blackwell-class workstation GPU acceleration on the other. QNAP also lists 128GB of DDR4 ECC RDIMM memory as one platform configuration, with expansion up to 1TB.
On the graphics side, QNAP supports NVIDIA RTX PRO Blackwell options including the RTX PRO 4500 Blackwell with 32GB of GDDR7 ECC memory and the flagship RTX PRO 6000 Blackwell Max-Q Workstation Edition with 96GB of GDDR7 ECC memory. QNAP says the RTX PRO 6000 configuration is designed for large-scale models of 70B parameters and above, while the RTX PRO 4500 is aimed at mid-sized deployments of up to around 30B parameters. That gives the QAI-h1290FX clear flexibility depending on whether the buyer is building a more focused RAG appliance or a heavier local inference server for larger generative AI workloads.
Storage and connectivity are equally important here. QNAP equips the platform with 12 U.2 NVMe SSD bays that also accept SATA SSDs, allowing businesses to tune the system for speed, capacity, or cost. Networking is also a strong part of the package, with dual 25GbE and dual 2.5GbE ports built in, while PCIe expansion allows optional upgrades such as 100GbE cards. QNAP also highlights compatibility with its JBOD expansion lineup, which positions this system as more than a compact AI inference box: it is also framed as a scalable storage and compute node for growing AI datasets and enterprise pipelines.
From a software and deployment perspective, QNAP is emphasizing usability. The company says the QAI-h1290FX supports containerized AI environments with GPU resource management, enabling users to deploy AI applications through Docker- and LXD-based workflows without heavy command-line setup. QNAP also promotes local deployment of private chat assistants, knowledge bases, and document search platforms, which is a strong value proposition for companies concerned with privacy, compliance, and cloud cost control.
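QNAP does not publish an exact deployment recipe, but the end result of a containerized setup like this is typically a standard HTTP endpoint on the NAS itself. As a rough illustration only, assuming an Ollama container exposed on its default port (the hostname, port, and model tag below are placeholders, not QNAP-documented values), a private chat assistant could be queried like this:

```python
import requests

# Hypothetical endpoint: an Ollama container running on the NAS,
# exposing the standard Ollama HTTP API on its default port 11434.
OLLAMA_URL = "http://qai-nas.local:11434/api/chat"

def ask(prompt: str, model: str = "qwen3:8b") -> str:
    """Send a single chat turn to the locally hosted model and return its reply."""
    resp = requests.post(OLLAMA_URL, json={
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # request one complete JSON response instead of a token stream
    }, timeout=300)
    resp.raise_for_status()
    return resp.json()["message"]["content"]

print(ask("Summarize our onboarding policy in three bullet points."))
```

The appeal of this model is that nothing in the exchange leaves the local network, which is exactly the privacy and compliance story QNAP is leaning on.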
QNAP’s own benchmark figures also paint a clear picture of what the RTX PRO 6000 96GB configuration is aiming to deliver. In Ollama-based inference tests, the company reports up to 172 tokens per second depending on model size and quantization. The smallest tested models post the highest throughput, while larger models still remain within practical local deployment territory thanks to the large VRAM pool; a minimal sketch of how such figures can be measured follows the table.
| Model | Tokens per second | VRAM usage |
|---|---|---|
| gpt-oss-120b MXFP4 | 90 | ~63GB |
| deepseek-r1 70b Q4_K_M | 24 | ~41GB |
| qwen3 32b Q4_K_M | 46 | ~21GB |
| gemma3 27b Q4_K_M | 54 | ~19GB |
| deepseek-r1 8b Q4_K_M | 140 | ~7GB |
| qwen3 8b Q4_K_M | 172 | ~7GB |
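QNAP has not detailed its test methodology, but single-stream numbers like these are straightforward to reproduce against any Ollama endpoint, because the API reports the generated-token count and generation time with every response. A minimal sketch, assuming a reachable Ollama server and model tags matching the table (the URL and prompt are placeholders):

```python
import requests

# Assumed Ollama endpoint; replace with the NAS address in practice.
URL = "http://localhost:11434/api/generate"

def tokens_per_second(model: str, prompt: str) -> float:
    r = requests.post(URL, json={"model": model, "prompt": prompt, "stream": False},
                      timeout=600)
    r.raise_for_status()
    data = r.json()
    # Ollama returns the number of generated tokens ("eval_count") and the
    # generation time in nanoseconds ("eval_duration") alongside the text.
    return data["eval_count"] / (data["eval_duration"] / 1e9)

for tag in ("qwen3:8b", "gemma3:27b"):
    print(tag, round(tokens_per_second(tag, "Explain RAG in two sentences."), 1), "tok/s")
```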
QNAP also shared concurrent vLLM inference numbers for DeepSeek-R1-Distill-Qwen-7B and gpt-oss-20b. The DeepSeek result scales from 79 tokens per second on a single thread to 850 tokens per second at 50 threads, although average throughput per thread naturally drops as concurrency rises. For gpt-oss-20b, QNAP reports a peak of 1045 total tokens per second at 5 threads before performance tapers off at higher thread counts. These figures suggest the system is being marketed not only as a single-user local AI server but also as a multi-user inference appliance for internal enterprise services; a sketch of this kind of concurrency sweep follows the tables.
| DeepSeek-R1-Distill-Qwen-7B | Total tokens per second | Avg tokens per thread per second |
|---|---|---|
| 1 thread | 79 | 79 |
| 2 threads | 166 | 83 |
| 5 threads | 410 | 82 |
| 10 threads | 688 | 68.8 |
| 20 threads | 810 | 40.5 |
| 50 threads | 850 | 17 |
| gpt-oss-20b | Total tokens per second | Avg tokens per thread per second |
|---|---|---|
| 1 thread | 218 | 218 |
| 2 threads | 340 | 170 |
| 5 threads | 1045 | 209 |
| 10 threads | 880 | 88 |
| 20 threads | 600 | 30 |
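Again, QNAP's exact harness is not published, but a concurrency sweep like this can be approximated against any vLLM server through its OpenAI-compatible completions endpoint, which reports completion-token counts per request. A minimal sketch, assuming a locally served model (the URL, model name, and prompt are placeholders):

```python
import time
import requests
from concurrent.futures import ThreadPoolExecutor

# Assumed vLLM OpenAI-compatible server (e.g. started with `vllm serve`);
# the model name must match whatever the server was launched with.
URL = "http://localhost:8000/v1/completions"
MODEL = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"

def one_request(_) -> int:
    r = requests.post(URL, json={
        "model": MODEL,
        "prompt": "Write a short product description for an edge AI NAS.",
        "max_tokens": 256,
    }, timeout=600)
    r.raise_for_status()
    # The OpenAI-style response includes how many tokens were generated.
    return r.json()["usage"]["completion_tokens"]

for threads in (1, 2, 5, 10, 20, 50):
    start = time.time()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        tokens = sum(pool.map(one_request, range(threads)))  # one request per thread
    total = tokens / (time.time() - start)
    print(f"{threads:>2} threads: {total:7.1f} total tok/s, {total / threads:6.1f} per thread")
```

The shape of QNAP's curves is typical for batched inference: total throughput climbs with concurrency while per-thread throughput falls as requests share the GPU.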
QNAP backs the system with a 5-year warranty as standard, which helps reinforce its enterprise positioning. Based on the pricing information shared, the QAI-h1290FX starts at $8,999 for the 64GB version, rises to $13,499 for the 128GB model, and reaches $15,999 for the 256GB variant. RAM, expansion cards, and some configuration options are sold separately, which means real-world deployment costs can scale notably depending on network, storage, and GPU requirements.
Overall, the QAI-h1290FX is a fascinating product because it does not chase the newest server CPU platform, yet still looks compelling thanks to its modern GPU options, fast all-flash design, and strong enterprise I/O. For companies exploring private AI infrastructure, it presents a practical bridge between traditional NAS functionality and dedicated local inference hardware. In a market where AI deployments are increasingly shaped by privacy, latency, and storage demands, QNAP’s latest system looks like a serious attempt to turn edge AI into a more deployable and centralized business platform.
Would you consider a dedicated AI NAS like this for local LLM workflows, or would you still rather build a custom AI server from separate parts?
