Tiiny AI Unveils Pocket Lab Featuring ARMv9.2 Cores and a 190 TOPS NPU, Delivering On-Device Performance for 120B LLMs
Edge AI continues to accelerate as consumers and researchers look for ways to deploy powerful models locally without relying on cloud infrastructure or enterprise-tier hardware. Nvidia's DGX Spark remains out of reach for many users, with pricing approaching four thousand dollars. Tiiny AI, a new startup in the edge compute space, aims to disrupt this landscape with an ultra-compact and cost-effective alternative. The company has introduced what it claims to be the world's smallest supercomputer: the Tiiny AI Pocket Lab, a device small enough to fit in a pocket but powerful enough to run 120-billion-parameter models entirely on device.
The Pocket Lab measures only 14.2 × 8 × 2.53 centimeters and weighs roughly 300 grams, yet Tiiny AI states that it can execute LLMs capable of PhD-level reasoning, multi-step analysis, and deep contextual understanding. These capabilities position the device as an accessible tool for both enthusiasts and professionals who want to experiment with advanced local inference without the high cost of traditional AI workstations.
At the core of the Pocket Lab is a custom heterogeneous compute architecture anchored by a twelve-core ARMv9.2 processor and a discrete neural processing unit delivering approximately 190 TOPS. Tiiny AI pairs this with eighty gigabytes of LPDDR5X memory and a one-terabyte SSD, enabling aggressive quantization strategies that allow a 120-billion-parameter model to run in a fully offline environment. The chip is rated at a thirty-watt TDP, with typical system power near sixty-five watts, maintaining energy efficiency despite its computational output.
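A quick back-of-the-envelope calculation shows why aggressive quantization is essential at this scale. This is a sketch only; Tiiny AI has not disclosed its actual quantization scheme, so the bit widths below are illustrative:

```python
# Back-of-the-envelope weight memory for a 120B-parameter model.
# Real deployments also need room for the KV cache, activations, and the
# OS, so usable headroom is smaller than these raw numbers suggest.

PARAMS = 120e9  # 120 billion parameters

for bits in (16, 8, 4, 3):
    gib = PARAMS * bits / 8 / 1024**3
    print(f"{bits:>2}-bit weights: {gib:6.1f} GiB")

# 16-bit weights:  223.5 GiB  -> far beyond 80 GB of RAM
#  8-bit weights:  111.8 GiB  -> still does not fit
#  4-bit weights:   55.9 GiB  -> fits, leaving headroom for KV cache
#  3-bit weights:   41.9 GiB  -> fits comfortably
```

At four bits per weight, the model shrinks to roughly 56 GiB, which fits within the Pocket Lab's 80 GB of LPDDR5X with room to spare for the KV cache and runtime.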
| Category | Specification |
|---|---|
| Processor | ARMv9.2 12-core CPU |
| AI Compute Power | Custom heterogeneous module (SoC + dNPU), ≈ 190 TOPS |
| Memory & Storage | 80GB LPDDR5X RAM + 1TB SSD |
| Model Capacity | Runs up to 120B-parameter LLMs fully on-device |
| Power Efficiency | 30W TDP, ~65W typical system power |
| Dimensions & Weight | 14.2 × 8 × 2.53 cm, ~300g (pocket-sized) |
| Ecosystem | One-click deployment for dozens of open-source LLMs & agent frameworks |
| Connectivity | Fully offline operation — no internet or cloud required |
According to Tiiny AI, the Pocket Lab supports LLMs from the GPT-OSS, Llama, Qwen, DeepSeek, Mistral, and Phi families. Achieving seamless 120-billion-parameter inference in a device this small is enabled by two key technologies developed or adopted by the company:
- **TurboSparse** provides neuron-level sparse activation, enabling more efficient inference without degrading model intelligence (the sketch below illustrates the core idea).
- **PowerInfer**, an open-source heterogeneous inference engine with more than eight thousand GitHub stars, dynamically distributes the workload between the CPU and NPU, achieving server-grade throughput at a fraction of the cost and power consumption.
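The core idea behind neuron-level sparse activation can be illustrated in a few lines. The NumPy sketch below is a conceptual example, not TurboSparse or PowerInfer code: it assumes the set of active neurons is already known (systems like PowerInfer use small learned predictors to estimate it per token) and verifies that skipping neurons whose ReLU output is zero leaves the block's output unchanged while reducing the matrix work:

```python
import numpy as np

def dense_ffn(x, W_up, W_down):
    """Standard feed-forward block: every neuron is computed."""
    h = np.maximum(W_up @ x, 0.0)  # ReLU activations
    return W_down @ h

def sparse_ffn(x, W_up, W_down, active_idx):
    """Sparse-activation variant: only neurons expected to fire are computed.

    `active_idx` stands in for the output of an activation predictor;
    the real systems learn a small model to estimate this set per token.
    """
    h_active = np.maximum(W_up[active_idx] @ x, 0.0)
    return W_down[:, active_idx] @ h_active

rng = np.random.default_rng(0)
d_model, d_ff = 512, 2048
x = rng.standard_normal(d_model)
W_up = rng.standard_normal((d_ff, d_model))
W_down = rng.standard_normal((d_model, d_ff))

# With ReLU, inactive neurons contribute exactly zero, so skipping them is
# lossless. Here we compute the truly active set to verify that; in real
# LLM FFNs only a small fraction of neurons fire for any given token.
active_idx = np.flatnonzero(W_up @ x > 0)
print(f"active neurons: {len(active_idx)}/{d_ff}")
print("outputs match:",
      np.allclose(dense_ffn(x, W_up, W_down),
                  sparse_ffn(x, W_up, W_down, active_idx)))
```

On real hardware, the savings come from never touching the cold neurons' weights at all, which is also what allows a heterogeneous engine like PowerInfer to place frequently firing neurons on the fast accelerator and rarely firing ones on the CPU.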
Together, these technologies allow the Pocket Lab to reach performance levels previously reserved for high-end GPUs and enterprise inference cards. Tiiny AI positions its solution as a breakthrough for portable AI research, developer experimentation, and secure air-gapped environments requiring offline operation.
Tiiny AI is preparing to showcase the Pocket Lab publicly at CES 2026. Pricing and retail availability have not yet been announced, but interest is expected to surge as the device targets a rapidly growing segment of users seeking practical edge AI hardware without enterprise-scale investment.
The Tiiny AI Pocket Lab could mark a turning point in local AI accessibility, offering new possibilities for creators, engineers, and researchers leveraging powerful models on the go.
Would you consider using a pocket-sized AI workstation to run 120-billion-parameter models locally? What use case would you explore first?
