NVIDIA Blackwell Ultra has swept all seven tests in the MLPerf Training v5.1 benchmark.
The Blackwell Ultra-powered GB300 NVL72 rack-scale system set new records, delivering more than four times faster pretraining of Llama 3.1 405B and nearly five times faster LoRA fine-tuning of Llama 2 70B than NVIDIA’s previous Hopper-generation platform.
Key architectural advances include enhanced Tensor Cores delivering 15 petaflops of NVFP4 AI compute per GPU, twice the attention-layer throughput of the original Blackwell architecture, and 279GB of HBM3e memory per GPU, with multi-node scaling enabled by NVIDIA’s Quantum-X800 InfiniBand networking.
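To put those per-GPU figures in rack-scale terms, the back-of-the-envelope arithmetic below multiplies them by the 72 GPUs implied by the NVL72 name; treat the totals as illustrative estimates rather than published specifications.

```python
# Back-of-the-envelope rack totals for a GB300 NVL72, using the per-GPU figures
# quoted above; the results are rough estimates, not official NVIDIA specs.
gpus_per_rack = 72                 # "NVL72" refers to 72 GPUs in the rack-scale system
nvfp4_pflops_per_gpu = 15          # NVFP4 AI compute per GPU, petaflops (quoted above)
hbm3e_gb_per_gpu = 279             # HBM3e capacity per GPU, GB (quoted above)

rack_pflops = gpus_per_rack * nvfp4_pflops_per_gpu      # ~1,080 PF, i.e. roughly 1.1 exaflops NVFP4
rack_hbm_tb = gpus_per_rack * hbm3e_gb_per_gpu / 1000   # ~20 TB of HBM3e across the rack

print(f"GB300 NVL72 aggregate: ~{rack_pflops} PF NVFP4, ~{rack_hbm_tb:.0f} TB HBM3e")
```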
Developed by the MLCommons consortium, MLPerf is the industry-standard benchmark suite for evaluating AI system performance on real-world workloads such as large language models, vision, recommendation, and graph neural networks. Its open, peer-reviewed testing enables like-for-like comparisons that help enterprises make informed AI infrastructure investments.
A major factor behind this performance leap is NVIDIA’s pioneering use of NVFP4, a 4-bit floating-point format that Blackwell Ultra can execute at up to three times its FP8 rate, multiplying effective compute throughput while still meeting the accuracy standards MLPerf requires.
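To make the idea of block-scaled 4-bit arithmetic concrete, the following minimal Python sketch quantizes values onto the E2M1 (FP4) grid with one scale per small block. The block size, scale handling, and rounding are simplifying assumptions for illustration and do not reproduce NVIDIA’s NVFP4 recipe or its hardware data path.

```python
# Illustrative sketch only: toy block-scaled 4-bit quantization in the spirit of
# NVFP4 (E2M1 values plus a per-block scale). Block size, scale format, and
# rounding here are simplifying assumptions, not NVIDIA's implementation.
import numpy as np

# Magnitudes representable by the E2M1 (FP4) format.
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4_blocks(x, block_size=16):
    """Quantize a 1-D float array to signed E2M1 values with one scale per block."""
    x = np.asarray(x, dtype=np.float32)
    pad = (-len(x)) % block_size
    blocks = np.pad(x, (0, pad)).reshape(-1, block_size)
    # Choose each block's scale so its largest magnitude maps to the top grid value (6.0).
    scales = np.maximum(np.abs(blocks).max(axis=1, keepdims=True), 1e-12) / E2M1_GRID[-1]
    scaled = blocks / scales
    # Snap each magnitude to the nearest representable E2M1 value, then restore the sign.
    idx = np.abs(np.abs(scaled)[..., None] - E2M1_GRID).argmin(axis=-1)
    return np.sign(scaled) * E2M1_GRID[idx], scales

def dequantize_fp4_blocks(quantized, scales):
    """Reconstruct approximate float values from quantized blocks and their scales."""
    return quantized * scales

if __name__ == "__main__":
    weights = np.random.randn(64).astype(np.float32)
    q, s = quantize_fp4_blocks(weights)
    approx = dequantize_fp4_blocks(q, s).reshape(-1)[: len(weights)]
    print("mean absolute quantization error:", np.abs(weights - approx).mean())
```

The per-block scale is what lets a grid with only eight magnitudes track values of widely varying size, which is the basic trade block-scaled low-precision formats make to preserve accuracy.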
That precision strategy contributed to record-breaking results such as training the Llama 3.1 405B model in just 10 minutes on more than 5,000 Blackwell GPUs, nearly three times faster than the best prior Blackwell submission, as well as efficient scaling, shown by a 45 percent speed improvement with 2,560 GPUs over the previous round.
The v5.1 round also introduced two new tests representing the cutting edge, Llama 3.1 8B and the FLUX.1 text-to-image generation model, and NVIDIA again set the fastest training times submitted on both.
Broad ecosystem participation, with submissions from 20 organisations including major OEMs and research institutions, underlines the vitality and maturity of the AI computing progress these benchmarks drive.
