Grace Hopper Servers - Configure Online

The NVIDIA Grace Hopper™ architecture combines the NVIDIA Hopper™ GPU and the NVIDIA Grace™ CPU into one superchip. The two are joined by the NVIDIA® NVLink® Chip-2-Chip (C2C) interconnect, which provides high bandwidth and memory coherence between them. This unified architecture maximises performance and efficiency, letting the GPU and CPU work together on a wide range of computing tasks.

The World’s Most Versatile Computing Platform

NVIDIA NVLink-C2C is a memory-coherent, high-bandwidth, low-latency interconnect for superchips. At the core of the GH200 Grace Hopper Superchip, it provides up to 900 gigabytes per second (GB/s) of bandwidth, 7 times that of the PCIe Gen5 links commonly used in accelerated systems. NVLink-C2C allows applications to use both GPU and CPU memory efficiently. With up to 480GB of LPDDR5X CPU memory per GH200 Grace Hopper Superchip, the GPU has direct access to 7X more fast memory than HBM3 alone, or almost 8X more fast memory with HBM3e. GH200 can be used in standard servers to run a variety of inference, data analytics, and other compute- and memory-intensive workloads. GH200 can also be combined with the NVIDIA NVLink Switch System, so that GPU threads running on up to 256 NVLink-connected GPUs can access up to 144 terabytes (TB) of memory at high bandwidth.
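The capacity figures above follow from simple arithmetic on the per-superchip memory sizes. The Python sketch below reproduces them under stated assumptions: the 96GB HBM3 and 144GB HBM3e capacities come from the specification table on this page, the 80GB baseline for the "7X" and "8X" ratios is an assumption (the page does not name the baseline GPU), the 144TB figure assumes binary units (1TB = 1024GB), and all names are illustrative.

```python
# Per-superchip memory capacities in GB, as quoted on this page.
LPDDR5X_GB = 480   # CPU-attached LPDDR5X
HBM3_GB = 96       # GPU memory, HBM3 variant
HBM3E_GB = 144     # GPU memory, HBM3e variant

# ASSUMPTION: the "7X / almost 8X" ratios are measured against an 80GB GPU.
BASELINE_GPU_GB = 80


def fast_memory_gb(hbm_gb: int, lpddr_gb: int = LPDDR5X_GB) -> int:
    """Memory the GPU can reach directly on one superchip (HBM + LPDDR5X)."""
    return hbm_gb + lpddr_gb


def pooled_memory_tb(gpus: int, per_chip_gb: int, gb_per_tb: int = 1024) -> float:
    """Aggregate memory across NVLink-connected superchips, in binary TB."""
    return gpus * per_chip_gb / gb_per_tb


hbm3_total = fast_memory_gb(HBM3_GB)            # 576
hbm3e_total = fast_memory_gb(HBM3E_GB)          # 624
cluster_tb = pooled_memory_tb(256, hbm3_total)  # 144.0

print(hbm3_total, hbm3e_total, cluster_tb)
print(hbm3_total / BASELINE_GPU_GB, hbm3e_total / BASELINE_GPU_GB)  # 7.2 7.8
```

Under these assumptions, 576GB per superchip across 256 GPUs gives the 144TB pool, and 480GB plus 144GB of HBM3e gives the 624GB figure quoted for inference further down the page.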

Power and Efficiency With the Grace CPU

The NVIDIA Grace CPU offers twice the performance per watt of traditional x86-64 platforms and stands as the fastest Arm® data center CPU worldwide. It is designed for high single-threaded performance, high memory bandwidth, and outstanding data-movement capabilities. By combining 72 Neoverse V2 Armv9 cores with up to 480GB of server-grade LPDDR5X memory with ECC, the Grace CPU achieves an optimal balance between bandwidth, energy efficiency, capacity, and cost. Compared to an eight-channel DDR5 design, the Grace CPU's LPDDR5X memory system delivers 53 percent more bandwidth while using only one-eighth the power per gigabyte per second.
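The bandwidth comparison can be checked with a short calculation. The page does not say which DDR5 speed grade the eight-channel baseline uses, so the sketch below takes the transfer rate as a parameter and tries two common grades; the 64-bit channel width, the two speed grades, and the function names are all assumptions made for illustration. The 512GB/s LPDDR5X figure is the one listed in the Grace CPU specifications on this page.

```python
GRACE_LPDDR5X_GBS = 512  # Grace CPU memory bandwidth quoted on this page


def ddr5_bandwidth_gbs(channels: int, mega_transfers: int, bus_bytes: int = 8) -> float:
    """Peak DDR5 bandwidth in GB/s.

    ASSUMPTION: each channel is 64 bits (8 bytes) wide and runs at
    `mega_transfers` MT/s; real sustained bandwidth is lower.
    """
    return channels * mega_transfers * bus_bytes / 1000


def percent_more(a: float, b: float) -> float:
    """How much larger a is than b, as a percentage."""
    return (a / b - 1) * 100


# ASSUMPTION: DDR5-4800 and DDR5-5200 are candidate baselines, not stated in the source.
for rate in (4800, 5200):
    ddr5 = ddr5_bandwidth_gbs(8, rate)
    gain = percent_more(GRACE_LPDDR5X_GBS, ddr5)
    print(f"DDR5-{rate}: {ddr5:.1f} GB/s, LPDDR5X is {gain:.1f}% higher")
```

With these assumptions, an eight-channel DDR5-5200 system peaks at 332.8GB/s, which puts the 512GB/s LPDDR5X figure roughly 54 percent higher, close to the quoted 53 percent; a DDR5-4800 baseline would give a larger gap.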

Performance and Speed With the Hopper H100 GPU

The H100 Tensor Core GPU is NVIDIA’s latest data center GPU, offering a significant performance boost for large-scale AI and HPC compared to the previous A100 Tensor Core GPU. Built on the new Hopper GPU architecture, the NVIDIA H100 introduces several innovations:

New fourth-generation Tensor Cores perform faster matrix computations than ever before, handling a wider range of AI and HPC tasks.
A new Transformer Engine enables H100 to deliver AI training speeds up to 9 times faster and AI inference speeds up to 30 times faster than the previous GPU generation.
Secure Multi-Instance GPU (MIG) partitions the GPU into separate, appropriately sized instances to improve quality of service (QoS) for smaller workloads.

Class-Leading Performance for HPC and AI Workloads

The GH200 Grace Hopper Superchip is the first true heterogeneous accelerated platform tailored for HPC tasks. It accelerates any application by using the strengths of both GPUs and CPUs, while offering the simplest and most efficient heterogeneous programming model yet. This allows scientists and engineers to concentrate on tackling the world's most pressing issues. For AI inference workloads, GH200 Grace Hopper Superchips combine with NVIDIA networking technologies to offer the most cost-effective scaling solutions, letting users handle larger datasets, more complex models, and new workloads with access to up to 624GB of high-speed memory. For AI training, up to 256 NVLink-connected GPUs can access up to 144TB of memory at high bandwidth for large language model (LLM) or recommender system training.

Grace Hopper Comparison

	1U with Grace Hopper	1U with Grace Hopper LC	1U 2-Node with Grace Hopper	1U 2-Node with Grace CPU	2U with Grace CPU	2U with X86 DP
Model	ARS-111GL-NHR	ARS-111GL-NHR-LCC	ARS-111GL-DNHR-LCC	ARS-121L-DNR	ARS-221GL-NR	SYS-221GE-NR
CPU	72-core Grace Arm Neoverse V2 CPU + H100 Tensor Core GPU in a single chip	72-core Grace Arm Neoverse V2 CPU + H100 Tensor Core GPU in a single chip	72-core Grace Arm Neoverse V2 CPU + H100 Tensor Core GPU in a single chip per node	144-core Grace Arm Neoverse V2 CPU Superchip per node (total of 288 cores in one system)	144-core Grace Arm Neoverse V2 CPU Superchip	4th or 5th Generation Intel Xeon Scalable Processors
Cooling	Air-cooled	Liquid-cooled	Liquid-cooled	Air-cooled	Air-cooled	Air-cooled
GPU Support	NVIDIA H100 Tensor Core GPU with 96GB of HBM3	NVIDIA H100 Tensor Core GPU with 96GB of HBM3	NVIDIA H100 Tensor Core GPU with 96GB of HBM3 per node	Please contact for possible configurations	Up to 4 double-width GPUs including NVIDIA H100 PCIe, H100 NVL, L40S	Up to 4 double-width GPUs including NVIDIA H100 PCIe, H100 NVL, L40S
Memory	CPU: 480GB integrated LPDDR5X with ECC; GPU: 96GB HBM3	CPU: 480GB integrated LPDDR5X with ECC; GPU: 96GB HBM3	CPU: 480GB integrated LPDDR5X with ECC per node; GPU: 96GB HBM3 per node	Up to 480GB of integrated LPDDR5X with ECC and up to 1TB/s of memory bandwidth per node	Up to 480GB of integrated LPDDR5X with ECC and up to 1TB/s of memory bandwidth per node	Up to 2TB, 32x DIMM Slots, ECC DDR5-4800 DIMM
Networking	3x PCIe 5.0 x16 slots supporting NVIDIA BlueField-3 or ConnectX-7	3x PCIe 5.0 x16 slots supporting NVIDIA BlueField-3 or ConnectX-7	2x PCIe 5.0 x16 slots per node supporting NVIDIA BlueField-3 or ConnectX-7	3x PCIe 5.0 x16 slots per node supporting NVIDIA BlueField-3 or ConnectX-7	3x PCIe 5.0 x16 slots supporting NVIDIA BlueField-3 or ConnectX-7	3x PCIe 5.0 x16 slots supporting NVIDIA BlueField-3 or ConnectX-7
Storage	8x Hot-swap E1.S drives & 2x M.2 NVMe drives	8x Hot-swap E1.S drives & 2x M.2 NVMe drives	4x Hot-swap E1.S drives & 2x M.2 NVMe drives per node	4x Hot-swap E1.S drives & 2x M.2 NVMe drives per node	8x Hot-swap E1.S drives & 2x M.2 NVMe drives	8x Hot-swap E1.S drives & 2x M.2 NVMe drives
Power Supplies	2x 2000W Titanium Level	2x 2000W Titanium Level	2x 2700W Titanium Level	2x 2700W Titanium Level	3x 2000W Titanium Level	3x 2000W Titanium Level

Low-Power High-Bandwidth Memory Subsystem: LPDDR5X Data Center Enhanced Memory

Optimal balance between bandwidth, energy efficiency and capacity
Up to 1TB/s of raw bidirectional bandwidth
One-eighth the power per GB/s of conventional DDR memory
Similar cost per bit to conventional DDR memory
Data center class memory with error-correcting code (ECC)

Feature	Grace CPU
CPU core count	72 Arm Neoverse V2 cores
L1 Cache	64KB i-cache + 64KB d-cache
L2 Cache	1MB per core
L3 Cache	114MB
Base frequency | all-core single instruction, multiple data (SIMD) frequency	3.1GHz | 3.0GHz
LPDDR5X size	Up to 480GB
Memory bandwidth	Up to 512GB/s
PCIe links	Up to 4x PCIe x16 (Gen 5)

Feature	Hopper H100 GPU
FP64	34 teraFLOPS
FP64 Tensor Core	67 teraFLOPS
FP32	67 teraFLOPS
TF32 Tensor Core	989 teraFLOPS | 494 teraFLOPS
BFLOAT16 Tensor Core	1,979 teraFLOPS | 990 teraFLOPS
FP16 Tensor Core	1,979 teraFLOPS | 990 teraFLOPS
FP8 Tensor Core	3,958 teraFLOPS | 1,979 teraFLOPS
INT8 Tensor Core	3,958 TOPS | 1,979 TOPS
High-bandwidth memory (HBM) size	Up to 96GB HBM3 | 144GB HBM3e
Memory bandwidth	Up to 4TB/s HBM3 | Up to 4.9TB/s HBM3e
NVIDIA NVLink-C2C CPU-to-GPU bandwidth	900GB/s bidirectional
Module thermal design power (TDP)	Programmable from 450W to 1000W (CPU + GPU + memory)
Form Factor	Superchip module
Thermal solution	Air cooled or liquid cooled
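
One way to read the throughput and memory-bandwidth rows together is to divide peak compute by peak memory bandwidth. The result is the arithmetic intensity (operations per byte moved) above which a kernel is limited by compute rather than by memory. The Python sketch below is illustrative only: it assumes the second figure in each pair is the dense (non-sparse) rate, which the table does not label, it uses the 4TB/s HBM3 bandwidth figure, and the function names are hypothetical.

```python
def ridge_point(peak_tflops: float, bandwidth_tbs: float) -> float:
    """Arithmetic intensity (FLOP/byte) where compute and memory limits meet.

    TFLOP/s divided by TB/s leaves FLOP per byte.
    """
    return peak_tflops / bandwidth_tbs


def attainable_tflops(intensity: float, peak_tflops: float, bandwidth_tbs: float) -> float:
    """Upper bound on throughput for a kernel with the given FLOP/byte ratio."""
    return min(peak_tflops, intensity * bandwidth_tbs)


HBM3_TBS = 4.0  # "Up to 4TB/s" from the table

# ASSUMPTION: 990 teraFLOPS is the dense FP16 Tensor Core rate.
print(ridge_point(34, HBM3_TBS))            # FP64: 8.5 FLOP/byte
print(ridge_point(990, HBM3_TBS))           # FP16 Tensor Core: 247.5 FLOP/byte
print(attainable_tflops(10, 990, HBM3_TBS)) # kernel at 10 FLOP/byte: 40.0 teraFLOPS
```

Under these assumptions, a kernel that performs only 10 operations per byte is capped at about 40 teraFLOPS regardless of the Tensor Core peak, which is why the memory bandwidth rows matter as much as the throughput rows.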

Industry-Leading Scalable Compute Unit Designed for Large Language Models

Foundations for Large-Scale AI Inference in the Cloud

In the AI era, a unit of computing power isn't just about the number of servers anymore. Today's artificial intelligence relies on interconnected GPUs, CPUs, memory, and storage across multiple nodes in racks. This infrastructure demands high-speed, low-latency network connections, along with efficient cooling and power delivery to maintain peak performance and effectiveness in various data center settings. Supermicro's SuperCluster solution offers essential components for rapidly advancing Generative AI and Large Language Models (LLMs). This all-inclusive data center solution speeds up delivery time for critical business needs and removes the complexity of building large clusters, which used to require extensive design and optimisation work.

Plug-and-Play Cluster

Supermicro's SuperCluster design for NVIDIA MGX Systems, featuring NVIDIA GH200, incorporates 400Gb/s networking fabrics with a non-blocking architecture. This setup enables each 32-node rack (32 GPUs) and the full 256-node cluster to function as a unified compute unit, offering a cohesive pool of high-bandwidth memory that is crucial for large language model (LLM) workloads with high batch sizes and for large-scale inference. Whether building a cloud-scale inference infrastructure for LLMs or optimising large models for training, the spine-and-leaf network topology allows the cluster to scale from 256 nodes to thousands of nodes. Supermicro's rigorous testing procedures thoroughly validate the cluster's operational effectiveness before shipment, ensuring customers receive plug-and-play units at the rack or multi-rack cluster level for rapid deployment.
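
The rack and cluster figures above can be turned into a rough sizing calculation. The sketch below is a minimal illustration, assuming one GH200 (one GPU with 96GB of HBM3, per the server comparison table on this page) in every node; the function names are hypothetical, and it counts only HBM3 capacity, not LPDDR5X.

```python
import math

NODES_PER_RACK = 32    # from the text: each rack holds 32 nodes (32 GPUs)
HBM3_PER_GPU_GB = 96   # ASSUMPTION: every node carries the 96GB HBM3 variant


def racks_needed(nodes: int, nodes_per_rack: int = NODES_PER_RACK) -> int:
    """Number of racks required to house the given node count."""
    return math.ceil(nodes / nodes_per_rack)


def pooled_hbm_gb(nodes: int, hbm_gb: int = HBM3_PER_GPU_GB) -> int:
    """Total HBM3 capacity across all nodes, in GB."""
    return nodes * hbm_gb


print(racks_needed(256))   # 8 racks for the 256-node cluster
print(pooled_hbm_gb(32))   # 3072 GB of HBM3 per rack
print(pooled_hbm_gb(256))  # 24576 GB of HBM3 per 256-node cluster
```

Scaling beyond 256 nodes changes only the node count passed in; the network design needed to keep the fabric non-blocking at that size is outside what this sketch models.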

German / English:	+49 89 1208 5600
French / English:	+33 1 72 81 06 78
