Scaling AI is not just about adding GPUs. The AMD Pensando Pollara 400 AI NIC powers the RoCE-based RDMA networking behind Crusoe's hosted AMD Instinct MI355X GPU instances, enabling multi-node AI workloads to scale across the cluster. Read more ⬇️
Virtualizing AMD Instinct™ MI355X GPUs isn't just a matter of swapping out hardware. It means a completely different networking stack, a different memory registration path, and a set of failure modes nobody had written down yet.

In early 2026, Crusoe became one of the first cloud providers to offer virtualized AMD Instinct™ MI355X GPU instances. Engineers Shubham Chakrawar and Andrew Carp documented exactly what it took:
▪️ Three non-obvious bugs
▪️ Real root causes
👉 And the result: 368 GB/s all-reduce bus bandwidth across two 8-GPU VMs with zero errors, matching AMD's own bare-metal targets.

If you're working on distributed GPU workloads on AMD hardware, this one's worth the read: https://bit.ly/4v0TvyP