WHITE PAPER
AI WORKLOADS AT SCALE
Kubernetes Cluster with Supermicro's Systems with
AMD EPYC 7002 Series processors
Executive Summary
The Deep Learning (DL) benchmark results in the previous white paper [1]
clearly show that a DL workload in Docker containers performs the same as on
bare metal. Building an on-premises Kubernetes cluster with GPU workers and AI
framework-specific Docker containers can help an organization run projects or
production workloads on a highly reliable and scalable platform. In this white
paper, Supermicro AMD-based WIO systems, AS-1114S-WTRT, are introduced as the
Kubernetes admin and master nodes. Together with the AS-2023US-TR4, we build an
NVIDIA GPU-capable Kubernetes cluster that uses cloud-native Ceph storage for
persistent volumes, and we demonstrate how a DL workload can scale on the
Kubernetes cluster.
[1] White paper: SUPERMICRO® SYSTEM COMBINES AMD EPYC™ PROCESSORS AND NVIDIA GPUS TO ACHIEVE
CONSISTENT DEEP LEARNING PERFORMANCE WITH LINEAR SCALING
Supermicro AMD based WIO System
SUPERMICRO
Supermicro (Nasdaq: SMCI), the leading innovator in high-performance, high-efficiency server and storage
technology, is a premier provider of advanced server Building Block Solutions® for Enterprise Data Center,
Cloud Computing, Artificial Intelligence, and Edge Computing Systems worldwide. Supermicro is committed to
protecting the environment through its “We Keep IT Green®” initiative and provides customers with the most
energy-efficient, environmentally-friendly solutions available on the market.
TABLE OF CONTENTS
Executive Summary
System Configuration
Introduction to Kubernetes
Kubernetes Cluster Deployment
Scale-up With Kubernetes Cluster
Conclusion
AI Workloads At Scale March 2021
System Configuration
The AS-1114S-WTRT is one of Supermicro's AMD EPYC 7002 series-based WIO servers, which offer a wide range of I/O options
to deliver systems optimized for specific requirements. For more detailed system information, refer to the Supermicro
product documentation for this system. Customers can choose among the storage and networking alternatives to accelerate
performance and find the best fit for their applications. In our case, it is used as the VMware host and Kubernetes
master nodes. Figure 1 and Figure 2 provide an overview of the system:
Table 1 provides a sample configuration for the AS-1114S-WTRT:

Part Number              QTY  Part Description
-----------------------  ---  ----------------------------------------------------------------------------
AS-1114S-WTRT            1    H12 SSW-NT,CSV116TS-R504WBP
PSE-ROM7552              1    AMD EPYC™ 7552 DP/UP 48C/96T 2.2G 192M 200W 4094
MEM-DR416L-HL01-ER32     8    16GB DDR4-3200 2Rx8 ECC REG DIMM
HDS-X2A-XS7680TE70004    2    [NR]Seagate Lange 7.68TB SAS 12Gb/s, 15mm, 2.5", 0.8DWPD SSD, HF
HDS-SMN1-MZ1LB3T8HMLA07  2    Samsung PM983 3.84TB NVMe PCIe3x4 V4 M.2 22x110mm (1.3 DWPD)
MCX4121A-ACAT            2    Standard Low-profile Mellanox 25GbE card with 2x SFP28 ports
AOC-S3008L-L8i           1    AOC-S3008L-L8i Retail Pack
CBL-SAST-0593            1    Internal Mini-SAS HD to Mini-SAS HD 60cm,30AWG,12Gb/s
SFT-DCMS-SINGLE          1    Supermicro System Management Software Suite node license, HF, RoHS/REACH, PBF
Table 1
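As a quick sanity check on the sample configuration, the per-node totals implied by Table 1 can be worked out from the quantities and per-unit sizes. The sketch below is illustrative only: the `Part` class, its field names, and the `summarize` helper are hypothetical and not part of any Supermicro tooling. The storage figures are raw capacities before any RAID or file-system overhead, and the network figure is the nominal aggregate line rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    """One row of Table 1 (hypothetical helper type for this sketch)."""
    part_number: str
    qty: int
    per_unit: float  # GB per DIMM, TB per drive, or ports per NIC


# Quantities and per-unit sizes transcribed from Table 1.
MEMORY = Part("MEM-DR416L-HL01-ER32", 8, 16)        # 16 GB DIMMs
SAS_SSD = Part("HDS-X2A-XS7680TE70004", 2, 7.68)    # 7.68 TB SAS SSDs
NVME = Part("HDS-SMN1-MZ1LB3T8HMLA07", 2, 3.84)     # 3.84 TB NVMe M.2 drives
NIC = Part("MCX4121A-ACAT", 2, 2)                   # 2 SFP28 ports per card

PORT_SPEED_GBPS = 25  # each SFP28 port is 25GbE per Table 1


def total(part: Part) -> float:
    """Quantity times per-unit size."""
    return part.qty * part.per_unit


def summarize() -> dict:
    """Return raw per-node totals derived from Table 1."""
    ports = total(NIC)
    return {
        "memory_gb": total(MEMORY),
        "sas_ssd_tb": round(total(SAS_SSD), 2),
        "nvme_tb": round(total(NVME), 2),
        "nic_ports": ports,
        "nic_aggregate_gbps": ports * PORT_SPEED_GBPS,
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value}")
```

Under these assumptions, one node carries 128 GB of memory, 15.36 TB of raw SAS SSD capacity, 7.68 TB of raw NVMe capacity, and four 25GbE ports.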
Figure 1
Figure 2