Network Innovation in the AI Era: Challenges in Large Model Training and Three Development Directions
The Importance of the Network and Innovation Directions in the AI Era
Networks play a key role in the era of large AI models. As model scale grows rapidly, multi-server clusters have become the main way to train models, and this is the basis for the network's rising importance in the AI era. In the past, the network mainly carried data between machines; now it is increasingly used to synchronize model parameters between graphics cards (GPUs), which places higher demands on network density and capacity.
Large model training faces three major challenges:
Growing model size: Training time is positively correlated with the number of model parameters and the volume of training data, and negatively correlated with computational speed. Improving computational efficiency is therefore key to shortening training time, and effective computing power is determined directly by the number of devices and how efficiently they run in parallel.
Complex multi-card synchronization: Once the model is split across individual cards, the cards must synchronize after each computation step. Collective operations such as All-to-All place higher demands on network transmission and switching.
Increasingly expensive failures: Training a large model often lasts for months, and an interruption can mean several days of retraining, resulting in significant losses. Modern AI networks have become a showcase of systems engineering, comparable in complexity to airplanes and aircraft carriers.
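The relationship in the first challenge can be sketched with a back-of-the-envelope estimate. The snippet below is a minimal illustration, not a method taken from this article: it assumes the widely used rule of thumb that training a dense model costs roughly 6 floating-point operations per parameter per token, and the function name, parameters, and sample numbers are all hypothetical.

```python
SECONDS_PER_DAY = 86_400.0


def estimate_training_days(params, tokens, num_gpus, peak_flops_per_gpu, utilization):
    """Rough training time in days.

    Assumption (not from the article): total compute is about
    6 * params * tokens FLOPs, a common rule of thumb for dense models.
    `utilization` is the fraction of peak throughput actually achieved,
    which is where parallel (and therefore network) efficiency shows up.
    """
    if num_gpus <= 0 or peak_flops_per_gpu <= 0:
        raise ValueError("num_gpus and peak_flops_per_gpu must be positive")
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")

    total_flops = 6.0 * params * tokens
    effective_flops_per_second = num_gpus * peak_flops_per_gpu * utilization
    return total_flops / effective_flops_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical inputs chosen only to show the trend, not real hardware figures.
    for utilization in (0.3, 0.5):
        days = estimate_training_days(
            params=100e9,
            tokens=1e12,
            num_gpus=4096,
            peak_flops_per_gpu=300e12,
            utilization=utilization,
        )
        print(f"utilization={utilization:.0%}: about {days:.1f} days")
```

Under these assumptions, training time scales linearly with parameters and tokens and inversely with both device count and achieved utilization, so a network that keeps cards busy shortens training as surely as adding more cards does.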
Network innovation mainly revolves around three directions:
Evolution of communication media: Optical modules, copper cables, and silicon-based interconnects each have their own advantages, and work continues on reducing cost and improving performance.
Competition among network protocols: Inter-chip communication protocols are strongly tied to the graphics card, while for inter-node communication the competition is mainly between InfiniBand (IB) and Ethernet.
Changes in network architecture: The Leaf-Spine architecture faces challenges, and new architectures such as Dragonfly and rail-only are expected to become the evolution path for ultra-large clusters.
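One way to see why Leaf-Spine is strained at very large scale is to count how many hosts it can attach. The sketch below is illustrative only and rests on stated assumptions: every switch has the same number of ports, the fabric is non-blocking (each leaf splits its ports evenly between hosts and spines), and each leaf has exactly one link to each spine. The function names are hypothetical.

```python
def leaf_spine_capacity(ports_per_switch):
    """Maximum size of a two-tier, non-blocking leaf-spine fabric.

    Assumptions: identical switches, half of each leaf's ports face hosts,
    the other half face spines, one link per leaf-spine pair.
    """
    if ports_per_switch < 2 or ports_per_switch % 2 != 0:
        raise ValueError("ports_per_switch must be an even number >= 2")

    hosts_per_leaf = ports_per_switch // 2   # downlinks on each leaf
    spines = ports_per_switch // 2           # one uplink per spine
    leaves = ports_per_switch                # each spine port serves one leaf
    return {
        "leaves": leaves,
        "spines": spines,
        "hosts": leaves * hosts_per_leaf,
    }


def three_tier_fat_tree_hosts(ports_per_switch):
    """Host count of a classic three-tier fat-tree built from k-port switches (k^3 / 4)."""
    if ports_per_switch < 2 or ports_per_switch % 2 != 0:
        raise ValueError("ports_per_switch must be an even number >= 2")
    return ports_per_switch ** 3 // 4


if __name__ == "__main__":
    for k in (32, 64, 128):
        two_tier = leaf_spine_capacity(k)
        print(
            f"{k}-port switches: two-tier hosts={two_tier['hosts']}, "
            f"three-tier hosts={three_tier_fat_tree_hosts(k)}"
        )
```

With 64-port switches, for example, the two-tier fabric tops out at 2,048 hosts under these assumptions. Going beyond that means adding a third tier, with more switches, more optics, and more hops, which is the kind of cost pressure that motivates alternative topologies.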
Investors should focus on companies in the core and innovative segments of communication systems. Overall, network innovation in the AI era will revolve around "cost reduction", "openness", and a balance with the scale of computing power, continuously driving communication technology forward.