Google claims to break AI performance records by building world’s fastest training supercomputer


Today, technology giant Google reported that it has built the world’s fastest machine learning (ML) training supercomputer, breaking AI performance records in six of eight industry-leading MLPerf benchmarks. Recent ML-enabled advances include more helpful search results and a single ML model that can translate 100 different languages, the company claims.

The latest results from the industry-standard MLPerf benchmark competition demonstrate, according to Google, that it has built the world’s fastest ML training supercomputer. Using this supercomputer, along with its latest Tensor Processing Unit (TPU) chip, Google set performance records in six of the eight MLPerf benchmarks.


According to Google, it took more than three weeks to train one of these models on the most advanced hardware accelerator available in 2015. Just five years later, Google’s latest Tensor Processing Unit (TPU) chip can train the same model almost five orders of magnitude faster.
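As a back-of-the-envelope check on that claim, the arithmetic works out as follows; the three-week figure comes from the article, while the exact 10^5 speedup factor is an illustrative assumption standing in for “almost five orders of magnitude”:

```python
# Back-of-the-envelope check of the "five orders of magnitude" claim.
# The three-week 2015 training time is from the article; the 1e5 factor
# is an illustrative stand-in for "almost five orders of magnitude".
weeks_2015 = 3
seconds_2015 = weeks_2015 * 7 * 24 * 3600   # ~1.8 million seconds
speedup = 1e5
seconds_2020 = seconds_2015 / speedup
print(f"{seconds_2015} s -> {seconds_2020:.1f} s")  # weeks shrink to tens of seconds
```

In other words, a training run that once took weeks would, at that speedup, finish in well under a minute.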

These machine learning models were selected to be representative of cutting-edge machine learning workloads that are common throughout industry and academia.

Google also claims that the supercomputer it built for this MLPerf training round is four times larger than the Cloud TPU v3 Pod that set three records in the previous competition. The system includes 4,096 TPU v3 chips and hundreds of CPU host machines, all connected via an ultra-fast, ultra-large-scale custom interconnect. In total, the system delivers over 430 PFLOPs of peak performance.
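Those headline numbers are roughly self-consistent. A minimal sketch, assuming a per-chip TPU v3 peak of about 105 TFLOPs (the per-chip figure is an assumption for illustration, not stated in the article):

```python
# Consistency check: 4096 TPU v3 chips vs. the quoted ~430 PFLOPs peak.
# The per-chip figure (~105 TFLOPs) is an assumption used for illustration.
chips = 4096
tflops_per_chip = 105
total_pflops = chips * tflops_per_chip / 1000   # TFLOPs -> PFLOPs
print(f"{total_pflops:.2f} PFLOPs")             # matches the quoted ~430 PFLOPs
```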

Google says its MLPerf Training v0.7 submissions demonstrate its commitment to advancing machine learning research and engineering at scale, and to delivering those advances to users through open-source software, Google’s products, and Google Cloud.


“Google’s fourth-generation TPU ASIC offers more than double the matrix multiplication TFLOPs of TPU v3,” Google engineer Naveen Kumar detailed in the announcement post. Matrix multiplication is the mathematical operation AI models use most heavily to process data, and a TFLOP is a trillion floating-point operations per second. For perspective, the third-generation TPU v3 against which the new chip was compared can manage 420 trillion operations per second.
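To see why matrix-multiplication throughput is the headline metric: a dense n×n matrix multiply costs roughly 2n³ floating-point operations, so even very large multiplies finish in milliseconds at the rates quoted above. A rough sketch (the matrix size here is an arbitrary example, not from the article):

```python
# A dense n x n matrix multiply costs ~2 * n^3 floating-point operations.
def matmul_flops(n: int) -> int:
    return 2 * n ** 3

n = 8192                         # arbitrary example size
flops = matmul_flops(n)          # ~1.1e12 FLOPs
tpu_v3_tflops = 420              # peak rate quoted in the article
seconds = flops / (tpu_v3_tflops * 1e12)
print(f"{flops:.2e} FLOPs -> {seconds * 1e3:.2f} ms at peak")
```

Training runs chain billions of such operations, which is why doubling matmul throughput translates directly into faster training.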

What do you think about Google’s supercomputer? Share your views in the comment section below. For more news on tech and cybersecurity, stay tuned to Android Rookies by subscribing to our newsletter.

