Unigen’s Amaretti AI Module Brings 60 TOPS, Up to 32GB Memory, and 10W Power to a Tiny E1.S Form Factor

As local AI workloads continue moving closer to the endpoint, more hardware vendors are looking for ways to deliver meaningful inference performance without the size, thermals, or power demands of a full accelerator card. Unigen is the latest company to push into that space with the new Amaretti E1.S AI module, a compact on-premises GenAI accelerator built around EdgeCortix silicon and designed for low-power inference deployments. According to Unigen, the module is aimed at secure local AI use cases, including LLM and VLM deployment, while staying inside a roughly 10W power envelope.

At the center of the Amaretti module is the SAKURA II AI accelerator from EdgeCortix. Official specifications from EdgeCortix list the chip at 60 TOPS of INT8 performance and 30 TFLOPS of BF16 compute, with dual 64-bit LPDDR4x memory support, 20MB of on-chip SRAM, and up to 68GB per second of memory bandwidth. EdgeCortix also states that SAKURA II is designed to run multi-billion-parameter generative AI models in a typical 8W power envelope for the chip itself, which gives Unigen a strong foundation for a compact module rated at roughly 10W.
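Those published numbers lend themselves to a couple of quick sanity checks. The Python sketch below is a back-of-the-envelope estimate, not a benchmark. It derives peak TOPS per watt from the quoted 60 TOPS and 8W figures, and it estimates a ceiling on token generation speed by assuming that every generated token requires streaming all model weights from DRAM once. That bandwidth-bound rule of thumb, the function names, and the 7GB example model size are illustrative assumptions, not figures from EdgeCortix or Unigen.

```python
# Back-of-the-envelope figures derived from the SAKURA II specs quoted above.
INT8_TOPS = 60.0           # peak INT8 throughput quoted by EdgeCortix
TYPICAL_POWER_W = 8.0      # typical chip power quoted by EdgeCortix
MEM_BANDWIDTH_GBPS = 68.0  # peak memory bandwidth in GB/s


def tops_per_watt(tops: float, watts: float) -> float:
    """Peak efficiency: quoted throughput divided by quoted power."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return tops / watts


def max_decode_tokens_per_s(weight_gb: float,
                            bandwidth_gbps: float = MEM_BANDWIDTH_GBPS) -> float:
    """Upper bound on tokens/s for single-stream decoding.

    ASSUMPTION: each generated token reads all weights from DRAM once,
    so the rate cannot exceed bandwidth / weight size.
    """
    if weight_gb <= 0:
        raise ValueError("weight_gb must be positive")
    return bandwidth_gbps / weight_gb


if __name__ == "__main__":
    print(f"Peak efficiency: {tops_per_watt(INT8_TOPS, TYPICAL_POWER_W)} TOPS/W")
    # Hypothetical model with 7 GB of weights (for example, 7B parameters at INT8).
    print(f"Decode ceiling: {max_decode_tokens_per_s(7.0):.1f} tokens/s")
```

Real throughput will depend on the compiler, quantization, batch size, and how much of the model stays in on-chip SRAM, so these values are best read as ceilings rather than predictions.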

What makes the Amaretti module particularly interesting is how much memory Unigen is pairing with that accelerator. The company says the module will be available in 16GB and 32GB configurations, which is notable for such a small form factor and highly relevant for local AI inference. More memory directly expands the range of models and workloads that can be handled on device, especially in edge and on-premises environments where low power and compact deployment matter more than chasing datacenter-class scale. Unigen’s announcement says the module is intended for GenAI and enterprise AI uses, while EdgeCortix separately notes that SAKURA II supports up to 32GB of DRAM in total.
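To make the memory point concrete, here is a minimal Python sketch of how model size and weight precision map onto the two announced capacities. It counts weights only, in decimal gigabytes, and reserves a share of DRAM for the KV cache, activations, and runtime. The 80% usable fraction, the function names, and the example model sizes are assumptions chosen for illustration, not Unigen or EdgeCortix figures.

```python
from typing import Optional

MODULE_CAPACITIES_GB = (16, 32)  # configurations announced by Unigen


def weight_footprint_gb(params_billions: float, bits_per_weight: int) -> float:
    """Weights-only footprint in decimal GB (1e9 params * bits / 8 bytes)."""
    return params_billions * bits_per_weight / 8


def smallest_fitting_module(params_billions: float,
                            bits_per_weight: int,
                            usable_fraction: float = 0.8) -> Optional[int]:
    """Return the smallest capacity (GB) that holds the weights, else None.

    ASSUMPTION: only `usable_fraction` of DRAM is available for weights;
    the rest is left for KV cache, activations, and runtime overhead.
    """
    needed = weight_footprint_gb(params_billions, bits_per_weight)
    for capacity in MODULE_CAPACITIES_GB:
        if needed <= capacity * usable_fraction:
            return capacity
    return None


if __name__ == "__main__":
    # Illustrative model sizes and precisions, not vendor-validated cases.
    for params, bits in [(7, 16), (20, 8), (20, 4), (20, 16)]:
        fit = smallest_fitting_module(params, bits)
        label = f"{fit}GB module" if fit else "does not fit"
        print(f"{params}B @ {bits}-bit: "
              f"{weight_footprint_gb(params, bits):.0f} GB weights -> {label}")
```

Under these assumptions, a 20-billion-parameter model needs 8-bit or lower precision to fit on a single 32GB module, which shows why quantization and memory capacity go hand in hand on hardware of this class.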

That positioning is important because this product is not trying to compete with full-size workstation GPUs or large PCIe AI cards. Instead, it targets a very different part of the market. Amaretti is about making AI acceleration possible in smaller systems, edge appliances, and secure local environments where power efficiency, deployment density, and physical footprint are more important than maximum raw throughput. Unigen says the platform is meant for on-premises GenAI, while EdgeCortix frames SAKURA II as an edge AI solution for models such as Llama 2, Stable Diffusion, DETR, and ViT.

The small physical footprint is one of the biggest reasons this stands out. EdgeCortix already offers SAKURA II in M.2 module form, and Unigen’s Amaretti implementation packages the accelerator into an E1.S module with a pre-installed heatsink for deployment in denser server and enterprise environments. That means the core idea is not simply “tiny AI hardware,” but compact AI hardware that can be integrated into more standard enterprise and edge system designs rather than requiring a specialized full-size accelerator slot. In practical terms, it opens the door to more modular AI expansion in systems that already prioritize compact storage and dense compute layouts.

Unigen is also pitching the platform around operational efficiency. The company says Amaretti can support GenAI LLMs up to 20B parameters, offers lead times as short as 14 weeks, and can scale up to 1920 TOPS of inference performance in air-cooled dual-CPU server deployments when multiple modules are used together. Those are ambitious claims, but they help explain who this product is really for: enterprises and system builders looking for a more efficient path to on-premises AI serving rather than consumers looking for a simple laptop upgrade.
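The 1920 TOPS figure is easier to read once it is broken down per module. The short Python sketch below works out how many 60 TOPS modules that total implies and what the accelerators alone would draw at roughly 10W each. The module count and power total are derived here, not stated by Unigen, and the calculation assumes peak TOPS simply add up across modules, which real multi-module workloads may not achieve.

```python
import math

TOPS_PER_MODULE = 60        # SAKURA II INT8 figure quoted by EdgeCortix
MODULE_POWER_W = 10         # approximate per-module envelope quoted by Unigen
CLAIMED_SERVER_TOPS = 1920  # Unigen's air-cooled dual-CPU server figure


def modules_for_target(target_tops: float,
                       tops_per_module: float = TOPS_PER_MODULE) -> int:
    """Modules needed to reach a peak TOPS target (assumes linear scaling)."""
    return math.ceil(target_tops / tops_per_module)


def accelerator_power_w(module_count: int,
                        watts_per_module: float = MODULE_POWER_W) -> float:
    """Power drawn by the modules alone; excludes CPUs, fans, and storage."""
    return module_count * watts_per_module


if __name__ == "__main__":
    count = modules_for_target(CLAIMED_SERVER_TOPS)
    print(f"Modules implied by {CLAIMED_SERVER_TOPS} TOPS: {count}")
    print(f"Accelerator power at ~{MODULE_POWER_W}W each: "
          f"{accelerator_power_w(count)}W")
```

On these numbers the claim corresponds to 32 modules drawing about 320W in total for the accelerators, which helps explain why an air-cooled chassis is a plausible target.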

Software support is another key part of the pitch. Unigen says the module supports major AI ecosystems including TensorFlow, PyTorch, ONNX, and Hugging Face, which matters because compact AI hardware only becomes relevant when deployment friction is low enough for developers and enterprise teams to actually use it. EdgeCortix also emphasizes its MERA compiler and framework as part of the SAKURA II platform, positioning the hardware as part of a broader software-plus-silicon stack rather than just a standalone inference chip.

There is still one major missing piece, and that is price. Unigen has not announced pricing for the Amaretti module, which makes it difficult to judge how aggressively it can compete against small-form-factor AI cards, NPUs in newer client processors, or broader edge inference solutions. Still, the product itself is a strong sign of where this corner of the market is heading. As local agents, private AI workflows, and secure on-premises inference become more attractive, compact accelerator modules like Amaretti could become a much more visible part of enterprise and edge system design.

The bigger takeaway is that AI hardware is continuing to diversify fast. Not every deployment wants a power-hungry GPU or a large accelerator card. Some want something much smaller, cooler, and easier to integrate. With Amaretti, Unigen is making the case that there is real demand for AI acceleration that can fit into compact enterprise footprints while still delivering serious local inference capability.


Would you rather see local AI acceleration added through compact modules like this, or do you think full-size GPUs will remain the only practical answer for most serious AI users?

Angel Morales

Founder and lead writer at Duck-IT Tech News, dedicated to delivering the latest news, reviews, and insights in the world of technology, gaming, and AI. Combines experience in the tech and business sectors with a deep passion for technology and a talent for clear, engaging writing.
