Rise of accelerated computing in data centres

It comes as little surprise that NVIDIA has responded to Google’s claim that its custom ASIC, the Tensor Processing Unit (TPU), was up to 30 times faster than CPUs and NVIDIA’s K80 GPU for inferencing workloads.

NVIDIA pointed out that Google’s TPU paper has drawn a clear conclusion – without accelerated computing, the scale-out of AI is simply not practical.

The role of data centres has changed considerably in today’s economy. Instead of just serving web pages, advertising and video content, data centres are now recognising voices, detecting images in video streams and connecting users with information they need when they need it.

Increasingly, those capabilities are enabled by a form of artificial intelligence (AI) called deep learning. Deep learning is an algorithm that learns from massive amounts of data to create software that can tackle such challenges as translating languages, diagnosing cancer and teaching autonomous cars to drive. The change brought about by AI is accelerating at a pace never seen before in our industry.

A pioneering researcher of deep learning, Geoffrey Hinton, told The New Yorker recently, “Take any old classification problem where you have a lot of data, and it’s going to be solved by deep learning. There’s going to be thousands of applications of deep learning.”

Unreasonably effective results
Take Google. Its application of groundbreaking work in deep learning has captured the world’s attention: The startling precision of its Google Now service; the landmark victory over the world’s greatest Go player; Google Translate’s ability to operate in 100 different languages.

Deep learning has achieved unreasonably effective results. But the approach demands that computers process vast seas of data at precisely the time when Moore’s law is slowing. Deep learning is a new computing model that has required the invention of a new computing architecture.

This changing architecture of the AI compute model has occupied NVIDIA for some time. In 2010, Dan Ciresan, a researcher at Professor Juergen Schmidhuber’s Swiss AI Lab, discovered that NVIDIA GPUs can be used to train deep neural networks and achieved a speedup of 50 times over CPUs.

A year later, Schmidhuber’s lab used GPUs to develop the first pure deep neural networks that won international contests in handwriting recognition and computer vision. Then, in 2012, Alex Krizhevsky, then a grad student at the University of Toronto, won the now-famous annual ImageNet large-scale image recognition competition using a pair of GPUs. (Schmidhuber has chronicled a comprehensive history of the impact of GPU deep learning on modern computer vision.)

Optimised for deep learning
AI researchers all over the world have discovered that the GPU-accelerated computing model NVIDIA had pioneered for computer graphics and supercomputing applications is ideal for deep learning. Deep learning – like 3D graphics, medical imaging, molecular dynamics, quantum chemistry and weather simulations – is a linear-algebra algorithm that requires massively parallel computation of tensors, or multi-dimensional vectors.
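To make the linear-algebra point concrete, here is a minimal sketch of one fully connected neural-network layer written as a matrix-vector product. The function name, shapes and values are invented for illustration and are not taken from NVIDIA or Google material. Each output element is an independent dot product, which is why this kind of work spreads well across many parallel arithmetic units.

```python
# Illustrative sketch only: one fully connected layer computed as y = W x + b.
# The names, shapes and values below are made up for this example.

def dense_layer(weights, bias, inputs):
    """Return W x + b, with the weight matrix W given as a list of rows."""
    outputs = []
    for row, b in zip(weights, bias):
        # Each output is an independent dot product, so every row could be
        # evaluated at the same time on hardware with many arithmetic units.
        outputs.append(sum(w * x for w, x in zip(row, inputs)) + b)
    return outputs


weights = [[1.0, 2.0, 3.0],
           [0.5, -1.0, 0.0]]   # 2 outputs, 3 inputs
bias = [0.5, 1.0]
inputs = [2.0, 1.0, 0.0]

layer_output = dense_layer(weights, bias, inputs)
print(layer_output)
```

A real network stacks many such layers with far larger matrices, so the same multiply-and-add pattern is repeated billions of times.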

And while NVIDIA’s Kepler-generation GPU, architected in 2009, helped awaken the world to the possibility of using GPU-accelerated computing in deep learning, it was never specifically optimised for that task.

NVIDIA started developing new generations of GPU architecture, first Maxwell, and then Pascal, which included many architecture advances specifically for deep learning. Introduced just four years after the Kepler-based Tesla K80, the Pascal-based Tesla P40 Inferencing Accelerator delivers 26 times its deep-learning inferencing performance, far outstripping Moore’s law.
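As a rough check on that comparison, the sketch below sets the claimed 26-fold gain against what a simple doubling rule would give over the same four years. The two-year doubling period is an assumption: it is a common popular statement of Moore’s law, which strictly describes transistor counts rather than performance.

```python
# Back-of-the-envelope check, not a benchmark.
# ASSUMPTION: Moore's law is taken as "doubling every two years".
DOUBLING_PERIOD_YEARS = 2.0   # assumed doubling period
YEARS_ELAPSED = 4.0           # Tesla K80 to Tesla P40, as stated in the article
CLAIMED_SPEEDUP = 26.0        # NVIDIA's stated inferencing gain


def moores_law_factor(years, doubling_period):
    """Growth factor expected from doubling once per doubling period."""
    return 2.0 ** (years / doubling_period)


expected = moores_law_factor(YEARS_ELAPSED, DOUBLING_PERIOD_YEARS)
ratio = CLAIMED_SPEEDUP / expected

print(f"Expected from doubling alone: {expected:.1f}x")
print(f"Claimed gain relative to that: {ratio:.1f}x")
```

Under this assumption, doubling alone would give a fourfold gain in four years, so the claimed figure is about six and a half times larger.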

During this time, Google designed a custom accelerator chip called the tensor processing unit, or TPU, specifically to handle inferencing, which it deployed in 2015.

Its team released technical information about the benefits of TPUs this past week. It asserts, among other things, that the TPU has 13 times the inferencing performance of the K80. However, it doesn’t compare the TPU to the current generation Pascal-based P40.

Updating Google’s comparison
To update Google’s comparison, NVIDIA created the chart below to quantify the performance leap from K80 to P40, and to show how the TPU compares to current NVIDIA technology.

The P40 balances computational precision and throughput, on-chip memory and memory bandwidth to achieve unprecedented performance for training, as well as inferencing. For training, P40 has 10 times the bandwidth and 12 teraflops of 32-bit floating point performance. For inferencing, P40 has high-throughput 8-bit integer arithmetic and high memory bandwidth.
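To show why 8-bit integers can be enough for inferencing, here is a minimal sketch of one common approach, symmetric linear quantisation. It is a generic illustration and does not describe how the P40 or the TPU implements 8-bit arithmetic. The function names, scales and values are all hypothetical.

```python
# Generic illustration of 8-bit integer inference arithmetic.
# ASSUMPTION: symmetric linear quantisation with one scale per vector.
# This is not a description of any specific chip's implementation.

def quantise(values, scale):
    """Map floats to integers in [-127, 127] by rounding values / scale."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def int8_dot(q_a, q_b, scale_a, scale_b):
    """Dot product accumulated in integers, rescaled to a float once."""
    accumulator = sum(a * b for a, b in zip(q_a, q_b))
    return accumulator * scale_a * scale_b


weights = [0.4, -1.0, 0.25]       # made-up trained weights
activations = [1.0, 3.0, 4.0]     # made-up layer inputs

# Pick each scale so the largest magnitude maps to 127.
w_scale = max(abs(w) for w in weights) / 127
a_scale = max(abs(a) for a in activations) / 127

q_weights = quantise(weights, w_scale)
q_activations = quantise(activations, a_scale)

approx = int8_dot(q_weights, q_activations, w_scale, a_scale)
exact = sum(w * a for w, a in zip(weights, activations))

print(f"float result: {exact:.4f}")
print(f"int8 result:  {approx:.4f}")
```

The integer result stays close to the floating-point one, while each value takes a quarter of the storage of a 32-bit float. That trade is why 8-bit throughput and memory bandwidth matter for inferencing.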

Data based on “In-Datacenter Performance Analysis of a Tensor Processing Unit,” Jouppi et al. [Jou17], and NVIDIA internal benchmarking. K80 to TPU performance ratios are based on the average of CNN0 and CNN1 acceleration ratios from [Jou17], which compared performance to a half-enabled K80. K80 to P40 performance ratios are based on GoogLeNet, a publicly available CNN model with similar performance properties.

While Google and NVIDIA chose different development paths, there were several themes common to both approaches. Specifically:

AI requires accelerated computing. Accelerators meet the significant data processing demands of deep learning in an era when Moore’s law is slowing.
Tensor processing is at the core of delivering performance for deep learning training and inference.
Tensor processing is a major new workload enterprises must consider when building modern data centres.
Accelerating tensor processing can dramatically reduce the cost of building modern data centres.

The technology world is in the midst of a historic transformation already being referred to as the AI Revolution. The place where its impact is most obvious today is in the hyperscale data centres of Alibaba, Amazon, Baidu, Facebook, Google, IBM, Microsoft, Tencent, and others. They need to accelerate AI workloads without having to spend billions of dollars building and powering new data centres with CPU nodes. Without accelerated computing, the scale-out of AI is simply not practical.

At the upcoming GPU Technology Conference on May 8 to 11, in San Jose, California, AI pioneers will talk about their groundbreaking discoveries. Participants will also learn about the latest advances in GPU computing and how they are revolutionising industries.
