HighPoint and Hailo cram 160 TOPS of AI into a single PCIe slot

AI companies keep talking about giant datacenters packed with expensive GPUs, but that is not always practical for real-world deployments. If you are building robotics systems, industrial automation hardware, or localized AI appliances, shoving a power-hungry GPU into every box can get expensive fast. That is where the new HighPoint Rocket 1604L platform comes in.

This week, HighPoint Technologies announced a partnership with Hailo to create what it calls an Enterprise Edge AI Compute Platform. The setup combines HighPoint’s Rocket 1604L PCIe Gen5 expansion card with Hailo AI accelerator modules to create a compact, high-density inference platform.

To be clear, this is not an entirely new hardware platform built from scratch. The announcement is really about combining existing products into a validated edge AI solution. HighPoint already sells the Rocket 1604L expansion card, while Hailo already offers the Hailo-8 and Hailo-10H M.2 AI accelerator modules. What is new is that the companies are formally packaging the combination as a scalable edge AI compute platform for system integrators and enterprise deployments.

Hailo is the company making the actual AI compute hardware. Its Hailo-8 and Hailo-10H M.2 modules handle the AI inference workloads themselves, including object detection, robotics processing, computer vision, and sensor analysis. Think of those modules as compact dedicated AI processors built specifically for neural network inference.

HighPoint, meanwhile, is not making the AI chips. Instead, its Rocket 1604L card acts as the infrastructure layer that allows multiple Hailo modules to operate together efficiently inside a single PCIe Gen5 slot.

The Rocket 1604L supports four dedicated Gen5 x4 M.2 connections and uses an Astera Labs PCIe Gen5 retimer to maintain signal integrity between the CPU and attached accelerator modules. In simpler terms, the HighPoint card functions as a high-speed communication bridge between the host system and the Hailo accelerators, with the retimer regenerating the signal so that data moves reliably and with little added latency.

According to the companies, a card populated with four modules can deliver as much as 160 TOPS of INT4 AI performance, which works out to 40 TOPS per module, and exceed 4,100 FPS running YOLOv8n inference workloads. That is a lot of performance packed into a surprisingly compact footprint.

What makes the Rocket 1604L interesting is that it is not just a passive adapter card. Generic PCIe bifurcation cards already exist, but PCIe Gen5 runs at 32 GT/s per lane, a speed at which signal integrity becomes a major challenge. HighPoint says its integrated retimer actively cleans and regenerates the PCIe signal before sending it to each attached module, helping avoid the instability and performance bottlenecks that can happen with passive expansion solutions.
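To put the 32 GT/s figure in context, here is a rough back-of-the-envelope sketch of the usable bandwidth behind each M.2 connection and behind the card as a whole. It assumes the standard 128b/130b line encoding PCIe has used since Gen3 and ignores packet and protocol overhead, so real-world throughput will be lower. The constant and function names are my own, and the article does not say what link speed the Hailo modules themselves negotiate, so treat these numbers as the ceiling the card offers rather than what any module actually uses.

```python
# Rough PCIe Gen5 bandwidth estimate for the card's M.2 links.
# Assumptions (not from the announcement): 128b/130b encoding only,
# no TLP/flow-control overhead, one direction of a full-duplex link.

GEN5_GT_PER_S = 32.0   # raw transfer rate per lane, as cited in the article
ENCODING = 128 / 130   # 128b/130b line encoding efficiency
LANES_PER_SLOT = 4     # each M.2 connection is Gen5 x4
SLOTS = 4              # four M.2 connections on the card


def lane_gbytes_per_s(gt_per_s: float = GEN5_GT_PER_S) -> float:
    """Usable one-direction bandwidth of a single lane, in GB/s."""
    return gt_per_s * ENCODING / 8  # 8 bits per byte


def link_gbytes_per_s(lanes: int) -> float:
    """Usable one-direction bandwidth of a link with the given lane count."""
    return lanes * lane_gbytes_per_s()


per_slot = link_gbytes_per_s(LANES_PER_SLOT)
per_card = link_gbytes_per_s(LANES_PER_SLOT * SLOTS)
print(f"per M.2 slot: {per_slot:.2f} GB/s, per card: {per_card:.2f} GB/s")
# prints: per M.2 slot: 15.75 GB/s, per card: 63.02 GB/s
```

Roughly 15.75 GB/s per slot in each direction is the headroom a clean Gen5 x4 link can offer, and it is that headroom a marginal signal path puts at risk.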

That might sound like boring engineering jargon, but it matters quite a bit once you start packing multiple high-speed AI accelerators into a compact chassis.

HighPoint says benchmark testing showed near-perfect linear scaling as additional Hailo modules were added. That means each additional accelerator contributes close to its full expected performance instead of running into bandwidth or latency limits.

The company shared benchmark figures including 4,144 FPS using YOLOv8n and 2,173 FPS using YOLOv5s across four Hailo modules. HighPoint also claims variance between modules remained below 0.01 percent during testing.
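As a quick sanity check on those numbers, the sketch below divides the published four-module totals into per-module averages and shows how scaling efficiency would be computed against a single-module baseline. The totals come from the figures above. The single-module FPS value is a hypothetical placeholder I made up purely to demonstrate the formula, since the announcement as reported here does not include one, and the function names are my own.

```python
# Per-module averages from the published four-module totals, plus a helper
# for computing scaling efficiency against a single-module baseline.

MODULES = 4
TOTAL_FPS = {"YOLOv8n": 4144, "YOLOv5s": 2173}  # totals quoted by HighPoint
TOTAL_TOPS = 160                                # headline INT4 figure


def per_module(total: float, modules: int = MODULES) -> float:
    """Average share of a total across identical modules."""
    return total / modules


def scaling_efficiency(total_fps: float, single_fps: float,
                       modules: int = MODULES) -> float:
    """Ratio of measured throughput to ideal linear scaling (1.0 = perfect)."""
    return total_fps / (single_fps * modules)


for model, fps in TOTAL_FPS.items():
    print(f"{model}: {per_module(fps):.2f} FPS per module on average")
print(f"TOPS per module: {per_module(TOTAL_TOPS):.0f}")

# HYPOTHETICAL single-module result, NOT from the article. Swap in a real
# one-module measurement to get a meaningful efficiency figure.
hypothetical_single_fps = 1040.0
eff = scaling_efficiency(TOTAL_FPS["YOLOv8n"], hypothetical_single_fps)
print(f"scaling efficiency with placeholder baseline: {eff:.1%}")
```

The per-module averages (1,036 FPS for YOLOv8n and 543.25 FPS for YOLOv5s) follow directly from the published totals. The efficiency number only becomes meaningful once a measured one-module result replaces the placeholder.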

One thing I actually like about this announcement is that it feels grounded in practical deployment rather than pure AI hype. Not every AI workload needs a giant NVIDIA GPU consuming massive amounts of power. Many edge deployments care more about thermals, reliability, physical size, and efficiency than flashy benchmark slides.

That is especially true for industrial inspection systems, robotics, retail analytics, smart surveillance, and sensor fusion workloads where localized inference can make more sense than constantly bouncing data back and forth to the cloud.

The Rocket 1604L itself is physically compact too. HighPoint says the card is roughly 40 percent shorter than competing four-bay M.2 cards, allowing it to fit inside tighter 1U and 2U edge systems where full-size GPUs may not fit at all.

Cooling is another focus. The card includes a full-length aluminum heatsink, integrated cooling fan, thermal padding, and a ventilated PCIe bracket designed to reduce thermal throttling during sustained workloads.

Of course, AI marketing language is everywhere right now, and companies are slapping “AI-ready” labels on nearly everything. Still, modular edge inference hardware is one of the few AI categories that actually feels useful beyond keynote presentations and investor decks.

The HighPoint Rocket 1604L is available now through the company’s distributor network. Hailo accelerator modules are sold separately through Hailo’s marketplace and channel partners.

Written by Brian Fagioli

Brian Fagioli is a technology journalist and founder of NERDS.xyz. A former BetaNews writer, he has spent over a decade covering Linux, hardware, software, cybersecurity, and AI with a no-nonsense approach for real nerds.
