The Mangue system is an AMD Linux cluster based on 5 SuperMicro server nodes, each outfitted with four 16-core AMD Opteron(TM) 6276 (Interlagos) processors and two NVIDIA Tesla C2075 GPGPUs (Fermi architecture), plus 9 Intel Xeon server nodes, each outfitted with two Intel Xeon E5606 processors.
The peak performance of each AMD node is 294 GFLOPS [source], while the NVIDIA accelerators deliver an additional peak performance of 1.03 TFLOPS (single precision) and 515 GFLOPS (double precision) per card [source]. The system also includes a login node that provides management and file system services.
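As a back-of-the-envelope check, the aggregate peak figures follow directly from the per-node numbers quoted above. The short sketch below (plain Python; all counts and per-unit peaks are taken from this section) tallies them:

```python
# Peak-performance tally for the Mangue AMD nodes, using the figures
# quoted in this section: 5 nodes, 294 GFLOPS CPU peak per node, and
# two C2075 cards per node at 515 GFLOPS (DP) / 1.03 TFLOPS (SP) each.
NODES = 5
CPU_PEAK_PER_NODE_GF = 294.0         # GFLOPS, all four Opteron 6276 sockets
GPUS_PER_NODE = 2
GPU_DP_PER_CARD_GF = 515.0           # GFLOPS, double precision
GPU_SP_PER_CARD_GF = 1030.0          # GFLOPS, single precision (1.03 TFLOPS)

cpu_total = NODES * CPU_PEAK_PER_NODE_GF                    # 1470 GFLOPS
gpu_dp_total = NODES * GPUS_PER_NODE * GPU_DP_PER_CARD_GF   # 5150 GFLOPS
gpu_sp_total = NODES * GPUS_PER_NODE * GPU_SP_PER_CARD_GF   # 10300 GFLOPS

print(f"CPU aggregate peak: {cpu_total:.0f} GFLOPS")
print(f"GPU aggregate peak: {gpu_dp_total:.0f} GFLOPS (DP), "
      f"{gpu_sp_total:.0f} GFLOPS (SP)")
```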
One of the important design considerations for Mangue was to create an accelerated cyberinfrastructure resource offering large data-transfer and GPU capabilities for data-intensive, accelerated computing and molecular dynamics. Because the compute-intensive nodes within the system are themselves augmented with GPUs, data does not need to be moved between separate resources for data-intensive and GPGPU computing.
Compute Nodes: All nodes are configured with four 16-core AMD processors and two NVIDIA Tesla C2075 GPGPUs (on PCIe cards). These compute nodes are configured with 12GB of "host" memory, with an additional 5GB of memory on each NVIDIA card. Each GPGPU has error-correcting code (ECC) memory enabled.
File Systems: The Mangue system supports an 8TB global file system. Each node contains a local 2TB disk. A 19TB archival system is also accessible from the login node, but not from the execution nodes.
Interconnect: Nodes are interconnected with QLogic FDR InfiniBand technology in a direct-connection topology.
All Mangue nodes run CentOS 6.4 and are managed with batch services through SGE 6u5. Global $HOME and $SCRATCH storage areas are supported by a single file system. Inter-node communication (MPI) runs over an FDR QLogic InfiniBand network. (The network configuration for the compute nodes is shown in Figure 1.4.)
The five compute nodes are housed in one rack, along with one 8-port Mellanox leaf switch and a 3COM Gigabit switch. Each node has four 16-core AMD processors and two NVIDIA C2075 GPGPU cards, each connected by an x16 PCIe bus. The host and accelerator are configured with 12GB of DDR3 and 5GB of GDDR5 memory, respectively.
The configuration and features for the compute nodes, interconnect and I/O systems are described below, and summarized in Tables 1.1 through 1.4.
The Intel Xeon nodes use E5606 processors running at 2.13 GHz with an 8192 KB cache.
Table 1.1 System Configuration and Performance
|Node||four 16-core AMD Opteron(TM) 6276 processors, two NVIDIA C2075 GPGPU cards||5 Nodes|
|Memory||Distributed, 12 GB/node||60 GB (aggregate)|
|Shared Disk||xfs file system||8 TB|
|Local Disk||ext4 file system||2 TB|
|Interconnect||InfiniBand Mellanox switch/HCA||40 Gb/s|
A compute node consists of a single sled in a 2-rack-unit chassis holding 4 other sleds. Each node runs CentOS 6.4 with the 2.6.32 x86_64 Linux kernel and contains four 16-core 64-bit AMD Opteron(TM) 6276 processors (64 cores in all) on a single board, operating as an SMP unit. The core frequency is 2.3 GHz, supporting 8 floating-point operations per clock period, for a peak performance of 73.5 GFLOPS/processor or 294 GFLOPS/node. Each node contains 12GB of host memory. The memory subsystem has 4 channels from each processor's memory controller to 4 DDR3 ECC DIMMs, each rated at 1600 MT/s (51.2GB/s across all four channels of a socket). The processor interconnect runs at 6.4 GT/s between sockets. Each of the two NVIDIA Tesla C2075 cards has 448 CUDA cores, with a peak performance of 515 GFLOPS/card in double precision or 1.03 TFLOPS/card in single precision. Each GPGPU contains 5GB of GDDR5 memory.
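The per-node figures above can be cross-checked with simple arithmetic. The sketch below (plain Python, using only values stated in this paragraph) reproduces the node peak and the per-socket memory bandwidth:

```python
# Cross-check of the compute-node peak and memory-bandwidth figures
# from the values stated above.
SOCKETS_PER_NODE = 4
PEAK_PER_PROCESSOR_GF = 73.5               # GFLOPS per Opteron 6276 socket
node_peak_gf = SOCKETS_PER_NODE * PEAK_PER_PROCESSOR_GF   # 294 GFLOPS/node

# DDR3-1600 memory: 1600 MT/s x 8 bytes per transfer per channel,
# over the 4 channels of one socket's memory controller.
CHANNELS = 4
TRANSFER_RATE_MT_S = 1600
BYTES_PER_TRANSFER = 8
socket_bw_gb_s = CHANNELS * TRANSFER_RATE_MT_S * BYTES_PER_TRANSFER / 1000.0

print(f"Node CPU peak: {node_peak_gf} GFLOPS")          # 294.0
print(f"Per-socket memory bandwidth: {socket_bw_gb_s} GB/s")  # 51.2
```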
Table 1.2 Compute Node Configuration and Performance
|Sockets per node / Cores per socket||4 / 16 AMD Opteron(TM) 6276 (2.3 GHz)|
|NVIDIA GPGPU cards per node||2 Tesla C2075|
|Memory per host||12 GB, 4-channel DDR3 1600 MHz|
|Memory per GPGPU||5 GB GDDR5|
|Interconnect, processor-processor||6.4 GT/s|
|Interconnect, processor-GPGPU||x16 PCIe|
|Local disk||2 TB, 7.5K RPM SATA|
|1 login node|
|Sockets per node/Cores per socket|