
With more than 4,000 chips strung together, Google says its supercomputers are faster and more energy efficient than Nvidia's

Update time : 2023-04-18 12:00:50
        Alphabet Inc.'s Google on Tuesday unveiled new details about the supercomputers it uses to train artificial intelligence models, saying the systems are faster and more power-efficient than comparable Nvidia-based systems.
        Google designs its own chips, called Tensor Processing Units (TPUs), for training artificial intelligence models; they handle more than 90 percent of the company's AI training work, for tasks such as answering questions in natural language or generating images. The TPUs are now in their fourth generation. In a scientific paper published on Tuesday, Google detailed how it uses its own custom-developed optical switches to link more than 4,000 of the chips together into a supercomputer.
 
        Improving these connections has become a key point of competition among companies building AI supercomputers, because the so-called large language models that power technologies like Google's Bard or OpenAI's ChatGPT have exploded in size and are far too big to fit on a single chip. 
        Instead, a model must be partitioned across thousands of chips, which then work in tandem for weeks or more to train it. Google's PaLM model - its largest publicly disclosed language model to date - was trained over 50 days by splitting it across two 4,000-chip supercomputers. 
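A back-of-envelope calculation shows why such models cannot live on one chip. The figures below are illustrative assumptions, not from the article: a PaLM-scale parameter count of 540 billion, 16-bit weights, and 32 GB of memory per accelerator.

```python
# Back-of-envelope: why a large language model must be partitioned.
# Assumed figures (illustrative, not from the article): 540B parameters,
# 16-bit weights, 32 GB of high-bandwidth memory per chip.

PARAMS = 540e9          # assumed parameter count (PaLM-scale)
BYTES_PER_PARAM = 2     # 16-bit (bf16/fp16) weights
CHIP_MEMORY_GB = 32     # assumed per-chip memory

model_gb = PARAMS * BYTES_PER_PARAM / 1e9
chips_needed = -(-model_gb // CHIP_MEMORY_GB)  # ceiling division

print(f"Weights alone: {model_gb:.0f} GB")
print(f"Minimum chips just to hold the weights: {chips_needed:.0f}")
# → Weights alone: 1080 GB
# → Minimum chips just to hold the weights: 34
```

In practice the chip count is far higher than this minimum, since training also needs memory for optimizer state, activations, and gradients.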
        Google says its supercomputers can easily reconfigure the connections between the chips in real time, helping to avoid problems and improve performance. 
        In a blog post about the system, Google researcher Norm Jouppi and Google Distinguished Engineer David Patterson wrote: "Circuit switching made it easy for us to bypass faulty components. This flexibility even allows us to change the topology of the supercomputer interconnect to accelerate the performance of ML (machine learning) models." 
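The idea of routing around faulty components can be sketched in miniature. The following is an illustrative model only, not Google's actual mechanism: chips are nodes in a small 2D torus, and a breadth-first search finds a shortest path that skips a failed chip.

```python
# Illustrative sketch (not Google's actual algorithm): a reconfigurable
# interconnect can route traffic around a faulty chip. We model chips as
# nodes in an assumed 4x4 torus and search for a path avoiding failures.
from collections import deque

N = 4  # assumed torus dimension, purely for illustration

def neighbors(node):
    """The four torus neighbors of a chip at grid position (x, y)."""
    x, y = node
    return [((x + 1) % N, y), ((x - 1) % N, y),
            (x, (y + 1) % N), (x, (y - 1) % N)]

def route(src, dst, faulty):
    """Breadth-first search for a shortest path that skips faulty chips."""
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in neighbors(path[-1]):
            if nxt not in seen and nxt not in faulty:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # destination unreachable

path = route((0, 0), (2, 2), faulty={(1, 1)})
print(path)  # a shortest detour around the failed chip at (1, 1)
```

The same search also illustrates the topology-change point: altering which links exist (the `neighbors` function here) changes the available routes without touching the chips themselves.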
        Although Google is only now disclosing details of the supercomputer, the system has been running internally since 2020 in a data centre in Mayes County, Oklahoma (USA). Google said the startup Midjourney used the system to train its model, which generates images from text prompts. 
        In its paper, Google said that for a comparably sized system, its supercomputer was 1.7 times faster and 1.9 times more energy-efficient than one based on Nvidia's A100 chip. Google said it did not compare its fourth-generation product with Nvidia's current flagship H100 chip because the H100 came to market after Google's chip and was built with newer technology. Google hinted that it may be working on a new TPU to compete with the H100.


 