Huawei CloudMatrix (MatrixLink) combines optical interconnects with clusters of up to 160,000 chips to accelerate AI training, cut costs, and challenge Nvidia’s H800 and GB200 in real-world benchmarks.

Huawei CloudMatrix


Huawei Cloud’s CloudMatrix, also known as MatrixLink, is an AI-native cloud architecture built to handle today’s enormous training workloads. Designed for ultra-large models, it connects thousands of processors at optical speeds, delivering higher throughput, better reliability, and more efficient resource use. But does it actually outperform Nvidia’s flagship AI systems? Let’s break it down.

What is Huawei CloudMatrix (MatrixLink)?

CloudMatrix, also known as MatrixLink, is Huawei’s next-generation AI-native cloud infrastructure. Instead of treating CPUs and NPUs as independent resources, it combines compute, storage, and networking into a peer-to-peer, matrix-style architecture. A single supernode joins 384 Ascend NPUs and 192 CPUs over fully optical links, enabling direct all-to-all communication. Unlike traditional hierarchical clusters, this design avoids interconnect bottlenecks, yielding a scalable rack-level AI powerhouse.

Key Features and How It Works

Huawei has invested heavily in optical networking. Each CloudMatrix rack uses 800G optical transceivers, giving the system a total internal bandwidth of 5.5 Pbps, far beyond traditional systems. Clusters can scale to 160,000 cards, enabling training at ultra-large model scale.
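As a sanity check on those figures, a quick back-of-envelope calculation shows how many 800G links would need to be aggregated to reach 5.5 Pbps. The per-link rate is from the article; the resulting link count is derived arithmetic, not an official Huawei spec:

```python
# Back-of-envelope only: how many 800 Gbps optical links aggregate
# to the quoted 5.5 Pbps of internal bandwidth. The link count below
# is derived from the article's figures, not an official spec.
link_gbps = 800                        # one 800G optical transceiver
total_pbps = 5.5                       # quoted internal bandwidth
total_gbps = total_pbps * 1_000_000    # 1 Pbps = 10^6 Gbps

links = total_gbps / link_gbps
print(f"{links:.0f} links x 800G ≈ {total_pbps} Pbps")
```

That works out to several thousand optical links per system, which is why the optical transceiver count (rather than copper cabling) dominates the interconnect design.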

Another highlight is intelligent resource planning. CloudMatrix dynamically assigns compute and bandwidth depending on whether a model requires tensor, pipeline, or expert parallelism, reducing congestion and speeding up training. Its built-in cloud brain proactively checks for hardware failures and switches to backups within minutes, ensuring continuous operation.
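To make the idea concrete, here is a minimal sketch of parallelism-aware allocation. Everything in it — the strategy names, the bandwidth tiers, and the `plan` function — is a hypothetical illustration of the behavior described above, not Huawei’s actual API:

```python
# Hypothetical sketch of parallelism-aware resource planning, loosely
# modeled on the behavior described in the article. Strategy names and
# bandwidth tiers are illustrative assumptions, not Huawei's API.
from dataclasses import dataclass

@dataclass
class Allocation:
    npus: int
    interconnect_gbps_per_npu: int

def plan(strategy: str, npus: int) -> Allocation:
    # Tensor parallelism exchanges activations on every layer, so it
    # gets the widest links; pipeline parallelism only passes data at
    # stage boundaries; expert parallelism sits in between (all-to-all
    # routing, but sparse). The Gbps values are made up for illustration.
    bandwidth = {"tensor": 800, "expert": 400, "pipeline": 100}
    return Allocation(npus, bandwidth[strategy])

print(plan("tensor", 384))
```

The point of the sketch is simply that the scheduler matches link bandwidth to communication pattern, rather than giving every job the same fixed slice of the fabric.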

Performance and Benefits

Huawei claims that CloudMatrix accelerates model training by 20% or more, doubles usable compute utilization, and can save organizations over $4 million per year in training costs. It also reports a 50% increase in inference throughput, with token-processing performance surpassing Nvidia’s H800 in DeepSeek’s R1 model tests.

On raw compute, CloudMatrix 384 delivers around 300 PFLOPs in BF16, nearly double that of Nvidia’s GB200 NVL72. It also offers 3.6x the HBM memory capacity and 5.3x the scale-out bandwidth, thanks to its optical connectivity.

However, there is one catch: power efficiency. Huawei’s system draws 559 kW versus Nvidia’s 145 kW, roughly 3.9 times more. China’s abundant energy supply makes this workable, but it underscores a “brute-force” approach.
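Using only the figures quoted above (300 PFLOPs vs. roughly half that, 559 kW vs. 145 kW), a quick calculation makes the efficiency gap concrete. Treat it as a rough sketch: the GB200 compute figure is inferred from the “nearly doubling” claim rather than an official spec.

```python
# Rough per-watt comparison from the article's own numbers.
# Assumption: GB200 NVL72 at ~half of CloudMatrix's BF16 compute,
# inferred from the "nearly doubling" claim (not an official spec).
cm_pflops, cm_kw = 300, 559        # CloudMatrix 384 (quoted)
gb_pflops, gb_kw = 300 / 2, 145    # GB200 NVL72 (inferred / quoted)

cm_eff = cm_pflops / cm_kw         # PFLOPs per kW
gb_eff = gb_pflops / gb_kw

print(f"CloudMatrix: {cm_eff:.2f} PFLOPs/kW")
print(f"GB200 NVL72: {gb_eff:.2f} PFLOPs/kW")
print(f"Nvidia per-watt advantage: {gb_eff / cm_eff:.1f}x")
```

Under these assumptions, Nvidia retains roughly a 2x advantage in compute per watt even while losing on absolute throughput, which is exactly the trade-off the next section’s comparison highlights.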

Huawei CloudMatrix vs Nvidia

  • Performance: Higher PFLOPs, memory, and bandwidth than Nvidia GB200.
  • Reliability: Smart recovery system reduces downtime to ~20 minutes.
  • Scalability: Supports clusters of up to 160,000 cards; Nvidia’s NVL72 scale-up domain tops out at 72 GPUs.
  • Power Draw: 3.9x higher system power consumption, less efficient per FLOP.

This comparison shows Huawei trading efficiency for brute-force scale, a practical choice given export restrictions.

Who Should Use It and Final Take

CloudMatrix is well suited to Chinese firms, research institutes, and hyperscalers that want to train large AI models in-house. It can serve as the computing backbone for AI-native applications, though its high energy cost may discourage adoption elsewhere.

In short, Huawei CloudMatrix demonstrates that China can build world-class AI infrastructure without Nvidia, but long-term success depends on third-party validation and on balancing efficiency with scale.

FAQ Section

Q: How does Huawei CloudMatrix differ from Nvidia’s GB200 or H800?
A: CloudMatrix offers nearly 2x the BF16 compute, far more HBM capacity, and faster optical interconnects, but it draws 3.9x more power, making it less energy efficient.

Q: What makes Huawei CloudMatrix AI-native?
A: Unlike traditional clouds, CloudMatrix pools CPUs, NPUs, memory, and storage into one peer-to-peer network, dynamically assigning resources for maximum AI training efficiency.

Q: Can CloudMatrix replace Nvidia GPUs globally?
A: Not entirely. It gives China a domestic alternative to Nvidia’s GPUs, but due to power trade-offs and lack of third-party benchmarks, global adoption may remain limited.