Industry News

Cerebras Systems outperforms Nvidia in cost performance: 20 times faster at one-fifth the price

Updated: 2024-08-29 14:34:31
According to WLS Electronic Components on August 28, Cerebras Systems, an Nvidia competitor and maker of the world's largest AI chip, has announced Cerebras Inference, which it claims is the world's fastest AI inference solution: 20 times faster than Nvidia's current-generation GPUs, with processor memory bandwidth 7,000 times that of Nvidia's, at one-fifth the price of GPU-based solutions, for a claimed 100-fold improvement in cost performance. Cerebras Inference also offers multiple service tiers, including free, developer, and enterprise, covering needs from small-scale development to large-scale enterprise deployment.

Comparison with Nvidia:

Surpassing Nvidia in cost performance: 20 times faster at one-fifth the price

AI inference is the process of using a trained AI model to make predictions or decisions on new data. Inference performance and efficiency are critical for real-time applications such as autonomous vehicles, live translation, and online customer-service chatbots. Cerebras Inference (hereinafter "the Cerebras inference service") is a service focused on AI inference, built to support these latency-sensitive scenarios.
The Cerebras inference service is powered by the Cerebras CS-3 system and its third-generation wafer-scale engine (WSE-3). The WSE-3, released in March, improves on the WSE-2 launched in 2021 and offers memory bandwidth of up to 21 PB/s, 7,000 times that of Nvidia's H100 GPU. This extremely high memory bandwidth greatly reduces data-transfer time, improving the speed and efficiency of model inference.
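To see why memory bandwidth matters so much here, note that decoding each token requires streaming roughly all of a model's weights through the processor once, so bandwidth sets a hard ceiling on token throughput. Below is a back-of-the-envelope sketch in Python; the H100 bandwidth figure and the 16-bit weight assumption are illustrative, not vendor-verified numbers.

```python
# Rough estimate: decoding one token streams (roughly) all model weights
# through the processor once, so peak tokens/s is bounded by
# memory_bandwidth / model_size_in_bytes.
# The figures below are illustrative assumptions, not vendor numbers.

def max_tokens_per_second(params_billion: float,
                          bytes_per_param: float,
                          bandwidth_bytes_per_s: float) -> float:
    model_bytes = params_billion * 1e9 * bytes_per_param
    return bandwidth_bytes_per_s / model_bytes

H100_BW = 3e12   # ~3 TB/s HBM bandwidth (the figure implied by the 7,000x claim)
WSE3_BW = 21e15  # 21 PB/s on-chip bandwidth (vendor claim from the article)

for name, bw in [("H100", H100_BW), ("WSE-3", WSE3_BW)]:
    # Llama 3.1 70B at 16 bits (2 bytes) per parameter
    print(f"{name}: ~{max_tokens_per_second(70, 2, bw):,.0f} tokens/s ceiling")
```

Real systems fall well short of these ceilings (batching, KV-cache traffic, and compute limits all intervene), but the sketch shows why a 7,000-fold bandwidth gap translates into a large inference-speed advantage on paper.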
According to the official website, the Cerebras inference service delivers 1,800 tokens per second for the Llama 3.1 8B model at 10 cents per million tokens, and 450 tokens per second for the Llama 3.1 70B model at 60 cents per million tokens. That is 20 times faster than hyperscale cloud solutions based on Nvidia GPUs.
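Working from those published figures, here is a minimal sketch of what a given workload would cost and how long generation would take; the 10-million-token job size is made up for illustration.

```python
# Published figures from the article: price per million tokens and
# tokens/s for each model on Cerebras Inference.
PRICING = {
    "llama-3.1-8b":  {"usd_per_m_tokens": 0.10, "tokens_per_s": 1800},
    "llama-3.1-70b": {"usd_per_m_tokens": 0.60, "tokens_per_s": 450},
}

def job_cost_and_time(model: str, tokens: int) -> tuple[float, float]:
    """Return (cost in USD, single-stream generation time in seconds)."""
    p = PRICING[model]
    cost = tokens / 1e6 * p["usd_per_m_tokens"]
    seconds = tokens / p["tokens_per_s"]
    return cost, seconds

# Example: generating 10 million tokens with the 70B model
cost, secs = job_cost_and_time("llama-3.1-70b", 10_000_000)
print(f"${cost:.2f}, about {secs / 3600:.1f} hours of single-stream generation")
```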

A clear tiered access system, with a free tier for users to try

Based on user needs and usage scenarios, the Cerebras inference service offers three tiers:
Free tier: provides free API access with relatively generous usage limits to all logged-in users, so anyone can try the service at no cost.
Developer tier: designed for flexible serverless deployment, this tier gives users an API endpoint (see the sketch after this list) at a cost far below most solutions on the market: 10 cents per million tokens for Llama 3.1 8B and 60 cents for Llama 3.1 70B. Cerebras plans to add support for more models over time.
Enterprise tier: provides fine-tuned models, custom service-level agreements, and dedicated support, and is suited to sustained workloads. Enterprises can access the service through a Cerebras-managed private cloud or an on-premises deployment, with pricing negotiated to fit their needs.
This tiered system is designed to cover needs ranging from small-scale development to large-scale enterprise deployment.
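As a concrete illustration of the developer tier, here is a minimal sketch of calling a hosted inference endpoint from Python. The endpoint URL, environment-variable name, model identifier, and OpenAI-compatible request shape are all assumptions for illustration, not confirmed details of Cerebras' API.

```python
import os
import requests

# Hypothetical sketch: assumes an OpenAI-compatible chat-completions endpoint.
API_URL = "https://api.cerebras.ai/v1/chat/completions"  # assumed endpoint
API_KEY = os.environ["CEREBRAS_API_KEY"]                 # assumed env var

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "llama3.1-70b",  # assumed model identifier
        "messages": [
            {"role": "user", "content": "Summarize wafer-scale inference."}
        ],
        "max_tokens": 200,
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

Serverless tiers like this are billed per token rather than per GPU-hour, which is what makes the per-million-token prices above directly comparable across providers.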

Promoting multi-party strategic cooperation to build a one-stop AI development service

In promoting strategic partnerships for AI development, Cerebras Systems is collaborating with a series of industry leaders to jointly build the future ecosystem of AI applications. These companies provide key technologies and services in their respective fields. For example, Docker uses containerization to make AI application deployment more convenient and consistent, LangChain provides a rapid development framework for language-model applications, and Weights & Biases offers an MLOps platform where AI developers train and fine-tune models.
"LiveKit is delighted to collaborate with Cerebras to help developers build the next generation of multimodal AI applications. Combining Cerebras' computing power and models with LiveKit's global edge network, the developed voice and video AI applications will achieve ultra-low latency and be closer to human characteristics." said Russell D'sa, CEO and co-founder of LiveKit, a company focused on building and scaling voice and video applications.
Cerebras Systems was founded in 2016. Its team consists of computer architects, computer scientists, deep learning researchers, and engineers of many disciplines. The company is known for its innovative wafer-scale chips (Wafer Scale Engine, WSE), which are designed specifically for AI computing and are enormous in both size and performance. This chip unicorn is backed by several well-known investors, including OpenAI co-founder Sam Altman and former AMD CTO Fred Weber. As of November 2021, the company had completed a $250 million Series F financing round at a valuation of $4 billion.
However, Nvidia still holds the advantage in ecosystem maturity, breadth of model support, and market recognition. Compared with Cerebras, Nvidia has a larger user base and richer developer tools and support. And although Cerebras supports mainstream models such as Llama 3.1, Nvidia's GPUs support a far wider range of deep-learning frameworks and models. For users already deeply invested in Nvidia's ecosystem, Cerebras may fall short in the breadth and flexibility of its model support.
Please subscribe to WLS Electronic Components so that we can keep sharing and analyzing the latest chip-industry news for your reference!