The battle for MLPerf benchmark supremacy is intensifying. NVIDIA and AMD have both released results that showcase their AI computing prowess.
NVIDIA’s Blackwell Ultra architecture, which powers its GB300 NVL72 rack-scale system, has set new records on the MLPerf Inference v5.1 suite, delivering up to 1.4 times more DeepSeek-R1 inference throughput than its prior Blackwell-based GB200 NVL72 systems.
Compared with Blackwell, the architecture offers 1.5 times more NVFP4 AI compute and twice the attention-layer acceleration, along with up to 288GB of HBM3e memory per GPU.
NVIDIA’s full-stack co-design, which includes the proprietary NVFP4 4-bit floating point format and TensorRT Model Optimizer software, enables highly efficient quantisation without sacrificing accuracy.
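To make the quantisation idea concrete, the sketch below shows block-scaled 4-bit quantisation in general terms. It is a minimal NumPy illustration only, not the NVFP4 specification and not the TensorRT Model Optimizer API; the E2M1-style grid values, the block size of 16 and the function names are assumptions made for illustration.

```python
"""
Toy illustration of block-scaled 4-bit floating-point quantisation.

This is NOT the NVFP4 spec or the TensorRT Model Optimizer API; it is a
minimal NumPy sketch of the general idea behind such formats: values are
grouped into small blocks, each block gets its own scale factor, and the
scaled values are snapped to a tiny 4-bit (E2M1-style) grid.
"""
import numpy as np

# Magnitudes representable by a sign + 2-exponent-1-mantissa 4-bit float (assumed grid).
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
BLOCK = 16  # per-block scaling granularity (assumption for illustration)

def quantize_blockwise_fp4(x: np.ndarray):
    """Quantise a 1-D tensor block by block; return codes, scales and a dequantised copy."""
    x = x.reshape(-1, BLOCK)                         # split into blocks
    scales = np.abs(x).max(axis=1, keepdims=True) / FP4_GRID[-1]
    scales[scales == 0] = 1.0                        # avoid divide-by-zero on all-zero blocks
    scaled = x / scales                              # bring each block into the 4-bit range
    # Snap each scaled magnitude to the nearest grid point, keeping the sign.
    idx = np.abs(np.abs(scaled)[..., None] - FP4_GRID).argmin(axis=-1)
    dequant = np.sign(scaled) * FP4_GRID[idx] * scales
    return idx.astype(np.uint8), scales, dequant.reshape(-1)

if __name__ == "__main__":
    weights = np.random.randn(4096).astype(np.float32)
    codes, scales, approx = quantize_blockwise_fp4(weights)
    print(f"mean abs quantisation error: {np.abs(weights - approx).mean():.4f}")
```

The per-block scale is what lets a 4-bit grid with only a handful of magnitudes track weights of very different ranges, which is why such formats can preserve accuracy while shrinking memory traffic.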
The result of this co-design is top performance on new benchmarks such as DeepSeek-R1, Llama 3.1 405B Interactive and Whisper, while maintaining record-holding per-GPU results across the MLPerf data center benchmarks.
NVIDIA’s MLPerf results with Blackwell Ultra demonstrate a significant leap in AI inference throughput and efficiency, highlighting its leadership in delivering high-performance, cost-effective AI infrastructure for large-scale generative models and complex AI workloads.
Focus on efficiency and scalability
Meanwhile, AMD’s Instinct GPU portfolio, led by the MI355X GPU built on the CDNA 4 architecture, focused on efficiency, scalability and real-world deployment flexibility.
AMD introduced FP4 precision on the MI355X, which achieved a 2.7x throughput increase on the Llama 2 70B Server benchmark compared to FP8 results on its MI325X GPU, all while maintaining accuracy.
It also showcased structured pruning technology that delivers up to a 90 percent inference throughput uplift on a 33 percent pruned Llama 3.1 405B model without losing accuracy.
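AMD has not published the full recipe behind that result, but the toy PyTorch sketch below illustrates what structured pruning means in general: whole hidden neurons are removed from one layer, along with the matching weights of the next layer, so the pruned network is physically smaller rather than merely containing zeroed weights. The layer sizes, keep ratio and norm-based importance heuristic are illustrative assumptions, not AMD’s method.

```python
"""
Toy illustration of structured pruning on a two-layer MLP block.

This is a generic sketch, not AMD's pruning technology: it keeps the
highest-importance hidden neurons of the first linear layer and shrinks
the second layer to match, producing a smaller, faster block.
"""
import torch
import torch.nn as nn

def prune_hidden_neurons(fc1: nn.Linear, fc2: nn.Linear, keep_ratio: float = 0.67):
    """Keep the highest-L2-norm hidden neurons of fc1 and shrink fc2 to match."""
    n_keep = int(fc1.out_features * keep_ratio)
    # Rank hidden neurons by the L2 norm of their incoming weight rows (assumed heuristic).
    importance = fc1.weight.detach().norm(dim=1)
    keep = importance.topk(n_keep).indices.sort().values

    new_fc1 = nn.Linear(fc1.in_features, n_keep)
    new_fc2 = nn.Linear(n_keep, fc2.out_features)
    with torch.no_grad():
        new_fc1.weight.copy_(fc1.weight[keep])       # drop pruned rows
        new_fc1.bias.copy_(fc1.bias[keep])
        new_fc2.weight.copy_(fc2.weight[:, keep])    # drop the matching columns
        new_fc2.bias.copy_(fc2.bias)
    return new_fc1, new_fc2

if __name__ == "__main__":
    fc1, fc2 = nn.Linear(1024, 4096), nn.Linear(4096, 1024)
    p1, p2 = prune_hidden_neurons(fc1, fc2, keep_ratio=0.67)  # roughly a third of neurons removed
    x = torch.randn(8, 1024)
    print(p2(torch.relu(p1(x))).shape)  # same output shape, smaller hidden layer
```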
The MI355X GPU demonstrated smooth multi-node scaling from a single GPU to eight-node clusters, with linear performance gains and stable efficiency.
Maturing AI hardware landscape
NVIDIA’s approach exemplifies a full-stack, highly optimised platform that leverages custom data formats and software to push peak inference throughput, making it ideal for scenarios demanding cutting-edge speed and accuracy.
On the other hand, AMD is advancing generative AI deployment through practical efficiency innovations such as FP4 precision and structured pruning that help lower costs and ease scaling complexities for ultra-large models.
Together, these results signal a maturing AI hardware landscape where raw performance, energy efficiency and flexible deployment models are equally vital.
The competitive dynamic is certainly good news for AI developers and enterprises seeking the best balance of performance, cost efficiency and scalability for their growing AI workloads.
