Nvidia and CoreWeave reveal new AI chip trend: smaller clusters, faster training

Nvidia's Blackwell chips have more than doubled AI training speeds compared to Hopper, completing a massive training task in just 27 minutes with fewer chips. CoreWeave's Chetan Kapoor emphasizes a shift toward modular chip clusters, signaling a new era in efficient AI model training.

Sources: The Hindu, Times of India
Nvidia and CoreWeave have unveiled a new trend in AI chip deployment, emphasizing smaller clusters and faster training times. Nvidia's latest Blackwell chips are more than twice as fast per chip as the previous Hopper generation, significantly enhancing AI training efficiency.

In benchmark tests, a cluster of 2,496 Blackwell chips completed a training task in just 27 minutes, showcasing the chips' remarkable performance. This leap reduces the number of chips needed for training large language models such as Llama 3.1 405B.

CoreWeave's Chief Product Officer, Chetan Kapoor, highlighted a broader industry shift during a press conference, noting, "There has been a trend in the AI industry toward stringing together smaller groups of chips into subsystems for separate AI training tasks." This modular approach allows for more efficient and flexible AI training workflows.

The collaboration between Nvidia and CoreWeave underscores the evolving landscape of AI hardware, where speed and scalability are critical. The Blackwell chips' performance cements Nvidia's dominance in AI training technology, enabling faster development cycles and potentially lowering costs.

Benchmark results show Blackwell chips are more than twice as fast as the previous Hopper generation, underscoring both the scale of the technological leap and Nvidia's continued dominance in AI training.

This trend toward smaller, faster clusters could reshape how AI models are trained, making high-performance AI more accessible and efficient across the industry.
Key Facts
  • Nvidia released Blackwell chips as the successor to the Hopper generation, designed specifically for AI training.
  • Benchmarks from MLCommons show Blackwell chips are more than twice as fast per chip compared to the previous Hopper generation. (The Hindu, Times of India)
  • A cluster of 2,496 Blackwell chips completed a large AI training task in 27 minutes, demonstrating substantial efficiency gains. (The Hindu, Times of India)
  • Blackwell chips reduce the number of chips needed for training large language models such as Llama 3.1 405B, improving training efficiency. (Times of India)
  • CoreWeave, collaborating with Nvidia, highlighted an industry trend of using smaller chip groups for separate AI training tasks, enabled by Blackwell's performance. (The Hindu, Times of India)
Key Stats at a Glance
  • Speed improvement per Blackwell chip compared to Hopper: 100%+ (The Hindu)
  • Number of Blackwell chips in cluster: 2,496 chips (The Hindu, Times of India)
  • Training time for 2,496 Blackwell chip cluster: 27 minutes (The Hindu, Times of India)
  • Size of Llama model benefiting from chip reduction: 405B parameters (Times of India)