Industry News

Cerebras Systems outperforms Nvidia in cost performance: 20 times faster at one-fifth the price

Updated: 2024-08-29 14:34:31
According to WLS Electronic Components on August 28, Cerebras Systems, an Nvidia competitor and maker of the world's largest AI chip, has announced Cerebras Inference, which it claims is the world's fastest AI inference solution: 20 times faster than Nvidia's current-generation GPUs, with processor memory bandwidth 7,000 times that of Nvidia's, at one-fifth the price of GPU-based solutions, for a claimed 100-fold improvement in cost performance. Cerebras Inference also offers multiple service tiers, including free, developer, and enterprise, covering needs from small-scale development to large-scale enterprise deployment.

Comparison with Nvidia:

Surpassing Nvidia in cost performance: 20 times faster at one-fifth the price

AI inference is the process of using a trained AI model to make predictions or decisions on new data. Inference performance and efficiency are critical for real-time applications such as autonomous vehicles, live translation, and online customer-service chatbots. Cerebras Inference (hereinafter "the Cerebras inference service") is a service focused on AI inference, built to support these latency-sensitive scenarios.
The Cerebras inference service is powered by the Cerebras CS-3 system and its third-generation wafer-scale engine (WSE-3). The WSE-3, released in March, improves on the WSE-2 launched in 2021 and offers memory bandwidth of up to 21 PB/s, 7,000 times that of Nvidia's H100 GPU. This extremely high memory bandwidth greatly reduces data-transfer time, improving the speed and efficiency of model inference.
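To see why memory bandwidth matters so much here, note that decoding each token requires streaming roughly all of a model's weights through the processor once, so bandwidth sets a hard ceiling on token throughput. Below is a back-of-the-envelope sketch in Python; the H100 bandwidth figure and the 16-bit weight assumption are illustrative, not vendor-verified numbers.

```python
# Rough estimate: decoding one token streams (roughly) all model weights
# through the processor once, so peak tokens/s is bounded by
# memory_bandwidth / model_size_in_bytes.
# The figures below are illustrative assumptions, not vendor numbers.

def max_tokens_per_second(params_billion: float,
                          bytes_per_param: float,
                          bandwidth_bytes_per_s: float) -> float:
    model_bytes = params_billion * 1e9 * bytes_per_param
    return bandwidth_bytes_per_s / model_bytes

H100_BW = 3e12   # ~3 TB/s HBM bandwidth (the figure implied by the 7,000x claim)
WSE3_BW = 21e15  # 21 PB/s on-chip bandwidth (vendor claim from the article)

for name, bw in [("H100", H100_BW), ("WSE-3", WSE3_BW)]:
    # Llama 3.1 70B at 16 bits (2 bytes) per parameter
    print(f"{name}: ~{max_tokens_per_second(70, 2, bw):,.0f} tokens/s ceiling")
```

Real systems fall well short of these ceilings (batching, KV-cache traffic, and compute limits all intervene), but the sketch shows why a 7,000-fold bandwidth gap translates into a large inference-speed advantage on paper.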
According to the official website, the Cerebras inference service delivers 1,800 tokens per second for the Llama 3.1 8B model at 10 cents per million tokens, and 450 tokens per second for the Llama 3.1 70B model at 60 cents per million tokens. That is 20 times faster than hyperscale cloud solutions based on Nvidia GPUs.
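Working from those published figures, here is a minimal sketch of what a given workload would cost and how long generation would take; the 10-million-token job size is made up for illustration.

```python
# Published figures from the article: price per million tokens and
# tokens/s for each model on Cerebras Inference.
PRICING = {
    "llama-3.1-8b":  {"usd_per_m_tokens": 0.10, "tokens_per_s": 1800},
    "llama-3.1-70b": {"usd_per_m_tokens": 0.60, "tokens_per_s": 450},
}

def job_cost_and_time(model: str, tokens: int) -> tuple[float, float]:
    """Return (cost in USD, single-stream generation time in seconds)."""
    p = PRICING[model]
    cost = tokens / 1e6 * p["usd_per_m_tokens"]
    seconds = tokens / p["tokens_per_s"]
    return cost, seconds

# Example: generating 10 million tokens with the 70B model
cost, secs = job_cost_and_time("llama-3.1-70b", 10_000_000)
print(f"${cost:.2f}, about {secs / 3600:.1f} hours of single-stream generation")
```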

A clear tiered access system, with a free tier for users to try

Based on user needs and usage scenarios, the Cerebras inference service offers three tiers:
Free tier: provides free API access with relatively generous usage limits to all logged-in users, so anyone can try the service at no cost.
Developer tier: designed for flexible serverless deployment, this tier gives users an API endpoint (see the sketch after this list) at a cost far below most solutions on the market: 10 cents per million tokens for Llama 3.1 8B and 60 cents for Llama 3.1 70B. Cerebras plans to add support for more models over time.
Enterprise tier: provides fine-tuned models, custom service-level agreements, and dedicated support, and is suited to sustained workloads. Enterprises can access the service through a Cerebras-managed private cloud or an on-premises deployment, with pricing negotiated to fit their needs.
This tiered system is designed to cover needs ranging from small-scale development to large-scale enterprise deployment.
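As a concrete illustration of the developer tier, here is a minimal sketch of calling a hosted inference endpoint from Python. The endpoint URL, environment-variable name, model identifier, and OpenAI-compatible request shape are all assumptions for illustration, not confirmed details of Cerebras' API.

```python
import os
import requests

# Hypothetical sketch: assumes an OpenAI-compatible chat-completions endpoint.
API_URL = "https://api.cerebras.ai/v1/chat/completions"  # assumed endpoint
API_KEY = os.environ["CEREBRAS_API_KEY"]                 # assumed env var

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "llama3.1-70b",  # assumed model identifier
        "messages": [
            {"role": "user", "content": "Summarize wafer-scale inference."}
        ],
        "max_tokens": 200,
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

Serverless tiers like this are billed per token rather than per GPU-hour, which is what makes the per-million-token prices above directly comparable across providers.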

Promoting multi-party strategic cooperation to build a one-stop AI development service

In promoting strategic partnerships for AI development, Cerebras Systems is collaborating with a series of industry leaders to jointly build the future ecosystem of AI applications. These companies provide key technologies and services in their respective fields. For example, Docker uses containerization to make AI application deployment more convenient and consistent, LangChain provides a rapid development framework for language-model applications, and Weights & Biases offers an MLOps platform where AI developers train and fine-tune models.
"LiveKit is delighted to collaborate with Cerebras to help developers build the next generation of multimodal AI applications. Combining Cerebras' computing power and models with LiveKit's global edge network, the developed voice and video AI applications will achieve ultra-low latency and be closer to human characteristics." said Russell D'sa, CEO and co-founder of LiveKit, a company focused on building and scaling voice and video applications.
Cerebras Systems was founded in 2016. Its team consists of computer architects, computer scientists, deep learning researchers, and engineers of many disciplines. The company is known for its innovative wafer-scale chips (Wafer Scale Engine, WSE), which are designed specifically for AI computing and are enormous in both size and performance. This chip unicorn is backed by several well-known investors, including OpenAI co-founder Sam Altman and former AMD CTO Fred Weber. As of November 2021, the company had completed a $250 million Series F financing round at a valuation of $4 billion.
However, Nvidia still holds the advantage in ecosystem maturity, breadth of model support, and market recognition. Compared with Cerebras, Nvidia has a larger user base and richer developer tools and support. And although Cerebras supports mainstream models such as Llama 3.1, Nvidia's GPUs support a far wider range of deep-learning frameworks and models. For users already deeply invested in Nvidia's ecosystem, Cerebras may fall short in the breadth and flexibility of its model support.
Please subscribe to WLS Electronic Components so that we can keep sharing and analyzing the latest chip-industry news for your reference!