Microsoft has deployed the industry’s first at-scale production cluster of NVIDIA GB300 NVL72 systems, featuring more than 4,600 NVIDIA Blackwell Ultra GPUs interconnected through the latest generation of NVIDIA InfiniBand networking. This is only the first step in a broader rollout that will eventually see hundreds of thousands of Blackwell Ultra GPUs deployed across Microsoft’s global AI datacenter network. Clusters at this scale will transform AI development timelines, compressing model training cycles from months into weeks while delivering high throughput for inference workloads. Microsoft will also become the first cloud provider to support training models containing hundreds of trillions of parameters, unlocking larger and more capable AI systems.
This achievement was made possible through the combined efforts of experts from many fields, including hardware engineering, systems architecture, supply chain management, and facilities operations, alongside a close partnership with NVIDIA.
From GB200 to GB300
Earlier this year, Azure introduced ND GB200 v6 virtual machines accelerated by NVIDIA’s Blackwell architecture. These systems rapidly became essential infrastructure for some of the most computationally intensive AI workloads in the industry. Organizations including OpenAI and Microsoft have already deployed massive clusters of GB200 NVL72 on Azure to train and operate frontier models.

The latest generation, ND GB300 v6 VMs, delivers a significant advance over its predecessor and has been optimized for reasoning models, agentic AI systems, and multimodal generative AI applications. Built on a rack-scale architecture, each rack contains 18 VMs spanning 72 NVIDIA Blackwell Ultra GPUs paired with 36 NVIDIA Grace CPUs. Each GPU gets 800 gigabits per second of cross-rack scale-out bandwidth through next-generation NVIDIA Quantum-X800 InfiniBand networking, double the bandwidth of GB200 NVL72. Within each rack, NVIDIA NVLink provides 130 terabytes per second of bandwidth connecting 37 terabytes of fast memory, and each rack delivers up to 1,440 petaflops of FP4 Tensor Core performance.
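For a quick sanity check, the headline numbers above can be broken down per VM and per GPU. The short Python sketch below does only that arithmetic; the derived figures are back-of-envelope estimates, not published specifications:

```python
# Back-of-envelope arithmetic using only the figures quoted above.
GPUS_PER_RACK = 72
VMS_PER_RACK = 18
SCALE_OUT_GBPS_PER_GPU = 800      # Quantum-X800 InfiniBand, per GPU
FAST_MEMORY_TB_PER_RACK = 37      # GPU HBM plus Grace memory, pooled over NVLink
FP4_PFLOPS_PER_RACK = 1_440

gpus_per_vm = GPUS_PER_RACK // VMS_PER_RACK                               # 4
scale_out_tbps_per_rack = GPUS_PER_RACK * SCALE_OUT_GBPS_PER_GPU / 1_000  # 57.6
fp4_pflops_per_gpu = FP4_PFLOPS_PER_RACK / GPUS_PER_RACK                  # 20.0
fast_memory_gb_per_gpu = FAST_MEMORY_TB_PER_RACK * 1_000 / GPUS_PER_RACK  # ~514

print(f"{gpus_per_vm} GPUs per VM")
print(f"{scale_out_tbps_per_rack:.1f} Tb/s aggregate scale-out bandwidth per rack")
print(f"{fp4_pflops_per_gpu:.0f} PFLOPS of FP4 per GPU")
print(f"~{fast_memory_gb_per_gpu:.0f} GB of fast memory per GPU")
```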
Developing infrastructure capable of supporting frontier AI research requires rethinking every component of the technology stack as an integrated system. This encompasses computing resources, memory architecture, networking fabric, datacenter design, cooling systems, and power distribution. The ND GB300 v6 VMs exemplify this holistic approach, resulting from years of collaborative engineering across silicon development, systems design, and software optimization.
At the rack level, NVLink and NVSwitch technologies eliminate traditional memory and bandwidth bottlenecks, enabling up to 130 terabytes per second of data transfer within each rack while connecting 37 terabytes of fast memory. This architecture transforms each rack into a tightly integrated unit capable of delivering higher inference throughput with reduced latency, particularly for larger models and extended context windows. These characteristics make agentic and multimodal AI systems substantially more responsive and scalable.
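To make the long-context benefit concrete, consider a rough sizing of the key-value (KV) cache an inference server must hold per session. The model shape and FP8 precision below are assumptions chosen for illustration, not details of any particular model, and the estimate ignores the memory needed for the model weights themselves:

```python
# Illustrative KV-cache sizing for a hypothetical dense transformer.
# None of these model parameters come from the announcement.
layers = 128
kv_heads = 8           # grouped-query attention
head_dim = 128
bytes_per_value = 1    # FP8 KV cache

def kv_cache_bytes(context_tokens: int, batch: int = 1) -> int:
    """Bytes of KV cache: 2 (K and V) * layers * kv_heads * head_dim per token."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens * batch

rack_fast_memory = 37e12  # 37 TB within one NVLink domain
one_m_context = kv_cache_bytes(1_000_000)
print(f"KV cache for a 1M-token context: {one_m_context / 1e9:.0f} GB")
print(f"Concurrent 1M-token sessions per rack (memory only, no weights): "
      f"{rack_fast_memory // one_m_context:.0f}")
```

Under these assumptions a single million-token context consumes roughly a quarter terabyte of KV cache, which is why a rack-wide 37-terabyte fast-memory domain matters for long-context and multimodal serving.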
To scale beyond a single rack, Azure adopted a full fat-tree, non-blocking network built on NVIDIA Quantum-X800 InfiniBand, currently the fastest networking fabric available. This design lets customers efficiently scale training of ultra-large models across tens of thousands of GPUs with minimal communication overhead, improving end-to-end training throughput. Reduced synchronization overhead also maximizes GPU utilization, so researchers can iterate more rapidly and cost-effectively despite the substantial computational demands of AI training. Azure’s co-engineered infrastructure stack, incorporating custom protocols, collective libraries, and in-network computing capabilities, keeps the network reliable and fully utilized. Features such as NVIDIA SHARP accelerate collective operations and effectively double usable bandwidth by performing mathematical operations inside the network switches, improving the efficiency and reliability of large-scale training and inference.
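The "effectively double bandwidth" figure follows from a simple traffic count: a classic ring all-reduce moves roughly 2(N-1)/N times the gradient size per GPU, while an in-network reduction sends each gradient into the fabric once and receives the reduced result once. The sketch below compares the two under that idealized model; the sizes and GPU count are arbitrary examples, not measurements:

```python
# Per-GPU traffic for an all-reduce of S bytes across N GPUs:
# standard ring all-reduce vs. an idealized in-network (SHARP-style) reduction.

def ring_allreduce_bytes(size_bytes: float, n_gpus: int) -> float:
    # Reduce-scatter + all-gather: each GPU sends 2 * (N-1)/N * S bytes.
    return 2 * (n_gpus - 1) / n_gpus * size_bytes

def in_network_allreduce_bytes(size_bytes: float) -> float:
    # The switch performs the reduction: each GPU sends S up, receives S down.
    return size_bytes  # bytes sent per GPU

S = 10e9     # 10 GB of gradients, an arbitrary example size
N = 4_608    # an illustrative GPU count

print(f"Ring all-reduce:      {ring_allreduce_bytes(S, N) / 1e9:.2f} GB sent per GPU")
print(f"In-network reduction: {in_network_allreduce_bytes(S) / 1e9:.2f} GB sent per GPU")
# As N grows, the ring approaches 2x the traffic of the in-network scheme,
# which is the sense in which SHARP 'effectively doubles' usable bandwidth.
```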
Azure’s advanced cooling infrastructure employs standalone heat exchanger units combined with facility-level cooling systems to minimize water consumption while maintaining thermal stability for dense, high-performance clusters like the GB300 NVL72. Microsoft continues developing and deploying innovative power distribution models designed to support the high energy density and dynamic load balancing requirements of ND GB300 v6 VM-class GPU clusters.
Additionally, Microsoft has reengineered its software infrastructure for storage, orchestration, and scheduling to fully leverage computing, networking, storage, and datacenter resources at supercomputing scale.
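As one simplified illustration of what topology-aware orchestration can mean at this scale, the sketch below packs a job’s GPUs into as few NVLink domains (racks) as possible before spanning racks over InfiniBand. The names and greedy policy are hypothetical and do not describe Azure’s actual scheduler:

```python
# Hypothetical topology-aware placement: prefer keeping a job inside full
# NVLink domains so cross-rack traffic over InfiniBand is minimized.
from dataclasses import dataclass

@dataclass
class Rack:
    name: str
    free_gpus: int = 72  # one GB300 NVL72 NVLink domain

def place_job(racks: list[Rack], gpus_needed: int) -> dict[str, int]:
    """Greedy placement: take from the emptiest-first? No -- take from the
    racks with the most free GPUs first, so the job spans the fewest racks."""
    placement: dict[str, int] = {}
    for rack in sorted(racks, key=lambda r: r.free_gpus, reverse=True):
        if gpus_needed == 0:
            break
        take = min(rack.free_gpus, gpus_needed)
        if take:
            placement[rack.name] = take
            rack.free_gpus -= take
            gpus_needed -= take
    if gpus_needed:
        raise RuntimeError("not enough free GPUs in the cluster")
    return placement

racks = [Rack(f"rack-{i}") for i in range(4)]
print(place_job(racks, 144))  # fills exactly two NVLink domains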
Microsoft’s multi-year investment in AI infrastructure has positioned the company to rapidly adopt and deploy emerging technologies. This foundation explains why Azure can uniquely deliver GB300 NVL72 infrastructure at production scale with such speed, meeting the demanding requirements of frontier AI research and development today.
As Azure accelerates GB300 deployments globally, customers will gain the ability to train and deploy advanced models in a fraction of the time required by previous-generation systems. The ND GB300 v6 VMs are positioned to establish a new benchmark for AI infrastructure.
Additional updates and performance benchmarks will be released as Azure expands production deployment of NVIDIA GB300 NVL72 systems worldwide.