STOCK TITAN

NVIDIA Ethernet Networking Accelerates World’s Largest AI Supercomputer, Built by xAI

Rhea-AI Impact: Low
Rhea-AI Sentiment: Neutral
Tags: AI

NVIDIA announced that xAI's Colossus supercomputer, featuring 100,000 NVIDIA Hopper Tensor Core GPUs, has achieved massive scale using the NVIDIA Spectrum-X Ethernet networking platform. The system, located in Memphis, Tennessee, is being used to train xAI's Grok language models and is currently being expanded to 200,000 GPUs. The supercomputer was built in just 122 days and achieved 95% data throughput with zero application latency degradation. The system utilizes NVIDIA's Spectrum SN5600 Ethernet switch, supporting speeds up to 800Gb/s, paired with BlueField-3 SuperNICs for optimal performance.

Positive
  • Successful deployment of world's largest AI supercomputer with 100,000 NVIDIA GPUs
  • System expansion in progress to double capacity to 200,000 GPUs
  • Achieved 95% data throughput, significantly outperforming standard Ethernet's 60%
  • Rapid deployment completed in 122 days versus typical timeframe of months to years
Negative
  • None.

Insights

The deployment of a 100,000 NVIDIA Hopper GPU system, with plans to double to 200,000 GPUs, represents a significant technological milestone in AI infrastructure. The system's exceptional 95% data throughput and zero latency degradation demonstrate remarkable efficiency improvements over standard Ethernet's 60% throughput.

The rapid 122-day construction timeframe and 19-day deployment to training initiation showcase unprecedented speed in supercomputer implementation. The Spectrum-X platform's 800Gb/s port speeds and advanced features like adaptive routing position NVIDIA to capture substantial market share in the growing AI infrastructure sector. This partnership with xAI validates NVIDIA's dominance in both AI hardware and networking solutions, strengthening their competitive moat in the AI ecosystem.

This development significantly strengthens NVIDIA's market position in the AI infrastructure space. By powering xAI's Colossus, the world's largest AI supercomputer, NVIDIA demonstrates its ability to deliver end-to-end solutions for large-scale AI deployments. The successful implementation could accelerate adoption of NVIDIA's Spectrum-X platform among other major AI companies and hyperscalers.

The partnership with Elon Musk's xAI adds considerable prestige and validation to NVIDIA's networking solutions, potentially driving increased demand for their integrated GPU-networking packages. This could lead to higher margins and revenue growth as companies seek to replicate xAI's success in large-scale AI deployments.

NVIDIA Spectrum-X Makes Colossal NVIDIA Hopper 100,000-GPU System Possible

SANTA CLARA, Calif., Oct. 28, 2024 (GLOBE NEWSWIRE) -- NVIDIA today announced that xAI’s Colossus supercomputer cluster comprising 100,000 NVIDIA Hopper Tensor Core GPUs in Memphis, Tennessee, achieved this massive scale by using the NVIDIA Spectrum-X™ Ethernet networking platform, which is designed to deliver superior performance to multi-tenant, hyperscale AI factories using standards-based Ethernet, for its Remote Direct Memory Access (RDMA) network.

Colossus, the world’s largest AI supercomputer, is being used to train xAI’s Grok family of large language models, with chatbots offered as a feature for X Premium subscribers. xAI is in the process of doubling the size of Colossus to a combined total of 200,000 GPUs.

The supporting facility and state-of-the-art supercomputer were built by xAI and NVIDIA in just 122 days; systems of this size typically take many months to years. It took 19 days from the time the first rack rolled onto the floor until training began.

While training the extremely large Grok model, Colossus achieves unprecedented network performance. Across all three tiers of the network fabric, the system has experienced zero application latency degradation or packet loss due to flow collisions. It has maintained 95% data throughput enabled by Spectrum-X congestion control.

This level of performance cannot be achieved at scale with standard Ethernet, which creates thousands of flow collisions while delivering only 60% data throughput.

“AI is becoming mission-critical and requires increased performance, security, scalability and cost-efficiency,” said Gilad Shainer, senior vice president of networking at NVIDIA. “The NVIDIA Spectrum-X Ethernet networking platform is designed to provide innovators such as xAI with faster processing, analysis and execution of AI workloads, and in turn accelerates the development, deployment and time to market of AI solutions.”

“Colossus is the most powerful training system in the world,” said Elon Musk on X. “Nice work by xAI team, NVIDIA and our many partners/suppliers.”

“xAI has built the world’s largest, most-powerful supercomputer,” said a spokesperson for xAI. “NVIDIA’s Hopper GPUs and Spectrum-X allow us to push the boundaries of training AI models at a massive-scale, creating a super-accelerated and optimized AI factory based on the Ethernet standard.”

At the heart of the Spectrum-X platform is the Spectrum SN5600 Ethernet switch, which supports port speeds of up to 800Gb/s and is based on the Spectrum-4 switch ASIC. xAI chose to pair the Spectrum-X SN5600 switch with NVIDIA BlueField-3 SuperNICs for unprecedented performance.
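To put the throughput figures in perspective, the 95% and 60% numbers quoted above can be applied to the 800Gb/s peak port speed. The sketch below is illustrative arithmetic only: it assumes the percentages are fractions of a single port's line rate, which the release does not state, and the function and constant names are hypothetical.

```python
def effective_gbps(port_speed_gbps: float, throughput_pct: float) -> float:
    """Usable bandwidth, ASSUMING throughput_pct is a share of the port's line rate."""
    if not 0 <= throughput_pct <= 100:
        raise ValueError("throughput_pct must be between 0 and 100")
    return port_speed_gbps * throughput_pct / 100


# Figures quoted in the release; how they combine is an assumption of this sketch.
PORT_SPEED_GBPS = 800        # peak SN5600 port speed
SPECTRUM_X_PCT = 95          # data throughput reported for Colossus
STANDARD_ETHERNET_PCT = 60   # data throughput cited for standard Ethernet

spectrum_x = effective_gbps(PORT_SPEED_GBPS, SPECTRUM_X_PCT)
standard = effective_gbps(PORT_SPEED_GBPS, STANDARD_ETHERNET_PCT)

print(f"Spectrum-X:        {spectrum_x:.0f} Gb/s per port")
print(f"Standard Ethernet: {standard:.0f} Gb/s per port")
print(f"Ratio:             {spectrum_x / standard:.2f}x")
```

Under that assumption, the gap between the two throughput figures works out to roughly 1.6 times the usable bandwidth per port. Real-world results depend on traffic patterns and fabric topology, which this calculation ignores.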

Spectrum-X Ethernet networking for AI brings advanced features that deliver highly effective and scalable bandwidth with low latency and short tail latency, previously exclusive to InfiniBand. These features include adaptive routing with NVIDIA Direct Data Placement technology, congestion control, as well as enhanced AI fabric visibility and performance isolation, all key requirements for multi-tenant generative AI clouds and large enterprise environments.

About NVIDIA
NVIDIA (NASDAQ: NVDA) is the world leader in accelerated computing.

For further information, contact:
Alex Shapiro
NVIDIA Corporation
+1-415-608-5044
ashapiro@nvidia.com

Certain statements in this press release including, but not limited to, statements as to: the benefits, impact, and performance of NVIDIA’s products, services, and technologies, including NVIDIA Hopper Tensor Core GPUs, NVIDIA Spectrum-X Ethernet networking platform, NVIDIA Spectrum SN5600 Ethernet switch, Spectrum-4 switch ASIC, and NVIDIA BlueField-3 SuperNICs; features of xAI’s Colossus supercomputer cluster; xAI being in the process of doubling the size of Colossus to a combined total of 200,000 NVIDIA Hopper GPUs; the NVIDIA Spectrum-X Ethernet networking platform being designed to provide innovators such as xAI with faster processing, analysis and execution of AI workloads, and in turn accelerating the development, deployment and time to market of AI solutions; NVIDIA’s Hopper GPUs and Spectrum-X allowing xAI to push the boundaries of training AI models at a massive scale, creating a super-accelerated and optimized AI factory based on the Ethernet standard are forward-looking statements that are subject to risks and uncertainties that could cause results to be materially different than expectations.
Important factors that could cause actual results to differ materially include: global economic conditions; our reliance on third parties to manufacture, assemble, package and test our products; the impact of technological development and competition; development of new products and technologies or enhancements to our existing product and technologies; market acceptance of our products or our partners’ products; design, manufacturing or software defects; changes in consumer preferences or demands; changes in industry standards and interfaces; unexpected loss of performance of our products or technologies when integrated into systems; as well as other factors detailed from time to time in the most recent reports NVIDIA files with the Securities and Exchange Commission, or SEC, including, but not limited to, its annual report on Form 10-K and quarterly reports on Form 10-Q. Copies of reports filed with the SEC are posted on the company’s website and are available from NVIDIA without charge. These forward-looking statements are not guarantees of future performance and speak only as of the date hereof, and, except as required by law, NVIDIA disclaims any obligation to update these forward-looking statements to reflect future events or circumstances.

© 2024 NVIDIA Corporation. All rights reserved. NVIDIA, the NVIDIA logo, NVIDIA Spectrum-X and BlueField are trademarks and/or registered trademarks of NVIDIA Corporation in the U.S. and other countries. Other company and product names may be trademarks of the respective companies with which they are associated. Features, pricing, availability and specifications are subject to change without notice.

A photo accompanying this announcement is available at


FAQ

How many NVIDIA GPUs does the xAI Colossus supercomputer currently use?

The xAI Colossus supercomputer currently uses 100,000 NVIDIA Hopper Tensor Core GPUs and is being expanded to 200,000 GPUs.

What is the data throughput achieved by NVIDIA's Spectrum-X in the Colossus supercomputer?

NVIDIA's Spectrum-X achieved 95% data throughput with zero application latency degradation or packet loss due to flow collisions.

How long did it take to build the xAI Colossus supercomputer using NVIDIA technology?

The supercomputer was built in just 122 days, with training beginning 19 days after the first rack installation.

What is the maximum port speed supported by NVIDIA's Spectrum SN5600 Ethernet switch?

The NVIDIA Spectrum SN5600 Ethernet switch supports port speeds of up to 800Gb/s.
