NVIDIA has launched the NVIDIA HGX H200, equipped with advanced memory designed to handle colossal volumes of data for generative AI and high-performance computing (HPC) workloads.
The NVIDIA H200 incorporates HBM3e, the next generation of high-bandwidth memory, delivering 141GB of memory at 4.8 terabytes per second, nearly double the capacity and 2.4 times the bandwidth of its predecessor, the NVIDIA A100.
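For context, a quick back-of-the-envelope check shows where the "nearly double" and "2.4 times" figures come from. The A100 reference values below (roughly 80GB of memory at about 2 terabytes per second) are widely published specs assumed for this sketch rather than figures stated in this article:

```python
# Rough sanity check of the H200-vs-A100 memory claims.
# A100 figures (80 GB, ~2.0 TB/s) are assumed reference specs, not from this article.
h200_capacity_gb, h200_bandwidth_tbs = 141, 4.8
a100_capacity_gb, a100_bandwidth_tbs = 80, 2.0

print(f"Capacity ratio:  {h200_capacity_gb / a100_capacity_gb:.2f}x")     # ~1.76x, "nearly double"
print(f"Bandwidth ratio: {h200_bandwidth_tbs / a100_bandwidth_tbs:.1f}x")  # 2.4x
```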
Systems powered by the H200 GPU from leading server manufacturers and cloud service providers are expected to hit the market in the second quarter of 2024 and are anticipated to revolutionise the landscape of AI and HPC applications.
“To create intelligence with generative AI and HPC applications, vast amounts of data must be efficiently processed at high speed using large, fast GPU memory. With NVIDIA H200, the industry’s leading end-to-end AI supercomputing platform just got faster to solve some of the world’s most important challenges,” said Ian Buck, Vice President of Hyperscale and HPC at NVIDIA.
The H200 promises substantial performance improvements, nearly doubling the inference speed on Llama 2, a 70 billion-parameter Large Language Model (LLM), compared to the H100. Future software updates are expected to further solidify its performance leadership.
Available in four- and eight-way configurations on NVIDIA HGX H200 server boards, the GPU is compatible with both the hardware and software of HGX H100 systems. It also features in the NVIDIA GH200 Grace Hopper Superchip with HBM3e. This versatility allows deployment in various data centre environments, including on-premises, cloud, hybrid cloud, and edge.
Major cloud service providers, including Amazon Web Services, Google Cloud, Microsoft Azure, and Oracle Cloud Infrastructure, are slated to deploy H200-based instances in the coming year. NVIDIA’s global ecosystem of partner server makers is prepared to update existing systems with the H200, ensuring its widespread adoption.
Equipped with NVIDIA NVLink and NVSwitch high-speed interconnects, the H200 delivers peak performance on various application workloads, including LLM training and inference for models beyond 175 billion parameters. An eight-way HGX H200 provides more than 32 petaflops of FP8 deep learning compute and 1.1TB of aggregate high-bandwidth memory, solidifying its position as a powerhouse in generative AI and HPC applications.
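A similar rough sketch shows how the eight-way aggregate figures follow from the per-GPU numbers quoted above. The per-GPU FP8 value here is simply the eight-way total divided by eight, an illustration rather than an official specification:

```python
# Back-of-the-envelope aggregation for an eight-way HGX H200 board,
# using only figures quoted in this article.
gpus = 8
memory_per_gpu_gb = 141      # HBM3e capacity per H200
total_fp8_petaflops = 32     # eight-way FP8 deep learning compute (article figure)

aggregate_memory_tb = gpus * memory_per_gpu_gb / 1000
fp8_per_gpu = total_fp8_petaflops / gpus  # illustrative per-GPU share, not an official spec

print(f"Aggregate HBM3e: ~{aggregate_memory_tb:.2f} TB")  # ~1.13 TB, matching the 1.1TB figure
print(f"FP8 per GPU:     ~{fp8_per_gpu:.0f} petaflops")
```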
When paired with NVIDIA Grace CPUs featuring an ultra-fast NVLink-C2C interconnect, the H200 forms the GH200 Grace Hopper Superchip with HBM3e — an integrated module designed to cater to giant-scale HPC and AI applications.
NVIDIA’s accelerated computing platform, supported by powerful software tools, including the NVIDIA AI Enterprise suite, enables developers and enterprises to build and accelerate production-ready applications from AI to HPC.
The NVIDIA H200 is expected to be available from global system manufacturers and cloud service providers in Q2 of 2024.
