Multipath Reliable Connection (MRC) is a new networking technology that makes AI data centres run faster and more reliably. Think of it as a smarter highway for data: instead of sending all traffic down a single road, this foundational networking protocol for gigascale AI clusters spreads it across many routes at once, so congestion, a slowdown, or even a broken link does not bring everything to a halt.
AI clusters are only as strong as their weakest link and the network is often that weak link. MRC fixes that by keeping data flowing smoothly between hundreds of thousands of AI chips, even when parts of the network fail or get busy. This enables AI training jobs to finish faster, GPUs to stay busy, and enterprises to get more models out of the same hardware.
Built into systems such as NVIDIA’s Spectrum‑X Ethernet AI fabric, MRC is backed by a rare cross‑industry alliance that includes AMD, Broadcom, Intel, Microsoft, NVIDIA, and OpenAI.
The protocol is now being standardised under the Open Compute Project (OCP), signaling its intent to become a shared AI‑native Ethernet fabric rather than a proprietary extension.
MRC is a next‑generation RDMA transport protocol that builds on RoCE (RDMA over Converged Ethernet) and extends SRv6 source routing to split a single data transmission across hundreds of network paths.
By distributing traffic in this way, MRC reduces core‑fabric congestion, minimises latency variation, and lets clusters bypass link or switch failures in microseconds while maintaining high effective bandwidth.
For large‑scale training jobs, MRC dynamically detects congestion and outages, then re‑routes and retransmits only the affected packets, keeping GPU utilisation high and avoiding long stalls on multi‑day runs. This is especially critical as AI factories scale toward hundreds of thousands of GPUs, where even brief network hiccups can compound into costly idle time.
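The mechanics are easier to see in a toy model. The sketch below is a simplified illustration of the idea rather than the MRC wire protocol: a single logical message is sprayed across many paths, the sender learns which packets were lost, and only those packets are re‑sent over the paths that are still healthy. The Path class, the random spraying policy, and the failure model are assumptions made purely for illustration.

```python
import random

class Path:
    """One physical route through the fabric; a broken link drops packets."""
    def __init__(self, path_id, healthy=True):
        self.path_id = path_id
        self.healthy = healthy

    def deliver(self, seq):
        return self.healthy

def spray(packets, paths):
    """Send each packet over a randomly chosen path; return the ones lost."""
    lost = []
    for seq in packets:
        path = random.choice(paths)
        if not path.deliver(seq):
            lost.append(seq)  # in an MRC-like design the receiver flags these gaps
    return lost

paths = [Path(i) for i in range(16)]
paths[5].healthy = False                   # simulate one failed link in the fabric

lost = spray(range(1000), paths)           # first pass: spray the whole message
healthy = [p for p in paths if p.healthy]  # telemetry steers around the failure
still_lost = spray(lost, healthy)          # retransmit only the affected packets

print(f"retransmitted {len(lost)} packets; {len(still_lost)} still missing")
```

In practice this spraying, loss detection, and selective retransmission happens in NIC and switch hardware at line rate; the point of the sketch is only that the whole message never has to be resent when a single path misbehaves.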
NVIDIA’s Spectrum‑X and MRC
MRC is a core capability of NVIDIA’s Spectrum‑X Ethernet platform, a purpose‑built, AI‑native Ethernet fabric for gigascale clusters. It runs natively on NVIDIA ConnectX SuperNICs and Spectrum‑X switches, using deep telemetry and hardware‑accelerated load balancing to spread traffic across multiple physical paths with predictable latency.
MRC on Spectrum‑X improves GPU utilisation by keeping every GPU fed with data even under heavy congestion, while Spectrum‑X’s multi‑plane architecture further boosts resilience and scale without sacrificing performance.
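As a rough illustration of how telemetry can drive path choice, the snippet below picks the least‑loaded plane from per‑path queue‑occupancy readings; the mapping shape and the occupancy metric are assumptions made for the example, not NVIDIA’s actual interface, and the real decision is made in switch and NIC hardware rather than host software.

```python
# Illustrative only: assumes per-path queue-occupancy telemetry is available
# as a mapping of path id -> load (0.0 idle, 1.0 saturated).

def pick_path(telemetry: dict[int, float]) -> int:
    """Return the id of the least congested path."""
    return min(telemetry, key=telemetry.get)

telemetry = {0: 0.12, 1: 0.87, 2: 0.05, 3: 0.40}
print(pick_path(telemetry))  # -> 2, the least loaded plane
```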
NVIDIA has already deployed MRC in frontier‑scale AI production environments and is now releasing the MRC specification via OCP to broaden its ecosystem use.
AMD’s involvement in MRC
AMD helped co‑author and co‑lead the MRC specification. Its AI NICs (including 400G and 800G designs) have been used to implement and test MRC in large‑scale AI clusters.
The company has demonstrated deployment‑ready performance with test clusters at a major cloud provider.
“As GPUs and CPUs continue to drive compute, the real bottleneck in scaling AI is the network. AMD, alongside OpenAI and Microsoft, announced MRC, marking a major step forward for the industry. The programmability from AMD enables us to rapidly turn innovations like this into real-world performance at scale, where consistent, resilient throughput matters more than theoretical peak bandwidth,” said Krishna Doddapaneni, CVP of Engineering at AMD’s NTSG.
The broader ecosystem
OpenAI and Microsoft are central to the announcement, with OpenAI driving the need for highly resilient, large‑scale training fabrics and Microsoft contributing both cloud‑scale requirements and deployment experience.
Broadcom and Intel are also contributors to the MRC specification, showing that multiple silicon and networking vendors are aligning around a common AI‑Ethernet transport layer.
By contributing MRC to the Open Compute Project, the consortium is adopting a “standard on the wire, differentiated in silicon and software” playbook similar to what NVIDIA has used for InfiniBand. This lets hyperscalers, cloud providers and enterprise AI builders adopt a shared AI‑networking substrate while still competing on implementation quality, performance and telemetry.
For enterprises, MRC offers a way to avoid being locked into a single, expensive proprietary network and instead build scalable, cost‑effective AI factories on a shared, modern networking layer.
