"There has been a trend in the AI industry toward stringing together smaller groups of chips into subsystems for separate AI training tasks."
Chetan Kapoor
Chief Product Officer at CoreWeave
Key Facts
- Nvidia released Blackwell chips as the successor to the Hopper generation, designed specifically for AI training.
- Benchmarks published by MLCommons show Blackwell chips are more than twice as fast per chip as the previous Hopper generation.
- A cluster of 2,496 Blackwell chips completed a large AI training task in 27 minutes, demonstrating substantial efficiency gains.
- Blackwell chips reduce the number of chips needed for training large language models such as Llama 3.1 405B, improving training efficiency.
- CoreWeave, collaborating with Nvidia, highlighted an industry trend of using smaller chip groups for separate AI training tasks, enabled by Blackwell's performance.
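The reported figures imply a simple back-of-the-envelope comparison. A minimal sketch, assuming a hypothetical linear scaling model (real MLPerf results do not scale perfectly linearly with cluster size), using only the numbers cited above:

```python
# Back-of-the-envelope arithmetic from the reported benchmark figures.
# The linear scaling model below is an illustrative assumption, not
# part of the MLCommons results.

blackwell_chips = 2496    # cluster size from the MLCommons result
blackwell_minutes = 27    # reported training time for that cluster
per_chip_speedup = 2.0    # lower bound from "more than twice as fast"

# Total compute consumed by the Blackwell run, in chip-minutes.
blackwell_chip_minutes = blackwell_chips * blackwell_minutes

# Under the linear model, an equally sized Hopper cluster would need
# at least twice as long on the same task.
hopper_minutes_same_cluster = blackwell_minutes * per_chip_speedup

print(blackwell_chip_minutes)       # 67392
print(hopper_minutes_same_cluster)  # 54.0
```

Equivalently, hitting the same 27-minute target on Hopper would require at least twice as many chips, which is the chip-count reduction the article attributes to Blackwell.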
Key Stats at a Glance
- Speed improvement per Blackwell chip vs. Hopper: 100%+ (more than 2x)
- Blackwell chips in the benchmark cluster: 2,496
- Training time for the 2,496-chip cluster: 27 minutes
- Llama model size benefiting from the chip reduction: 405B parameters