TABLE OF CONTENTS
EXECUTIVE SUMMARY
SUPERMICRO AI / ML SOLUTION
AI / ML REFERENCE ARCHITECTURE
HOW SUPERMICRO SOLUTION IS DEPLOYED
DEPLOYMENT FOR IT ADMIN
DEPLOYMENT FOR DATA SCIENTISTS AND DEVOPS
HOW SUPERMICRO SOLUTION FLOW WORKS
SYSTEM DETAILS
CONFIGURATION
BENCHMARK RESULTS
SUPPORT AND SERVICES
CONCLUSION
WHITE PAPER
SUPERMICRO® ARTIFICIAL INTELLIGENCE / MACHINE LEARNING READY SOLUTION
Meet all your AI/ML application needs with Supermicro optimized GPU server solutions
March 2020
Super Micro Computer, Inc.
980 Rock Avenue
San Jose, CA 95131 USA
www.supermicro.com
EXECUTIVE SUMMARY
The rapid expansion of Artificial Intelligence (AI) and Machine Learning (ML) applications into all aspects of business and everyday life is generating an explosion in Big Data. This advancement comes with a price, however: the need for frequent training, retraining, and hyperparameter tuning makes for longer processing times than were previously the norm. In addition, AI/ML requires enormous amounts of processing power for model training.
Compute-intensive Machine Learning algorithms take extended times to complete when run on hardware without acceleration features, resulting in poor overall application performance and reduced ROI. To meet this growing demand for AI/ML applications, enterprise data centers must work within their budget, space, and IT resource constraints while also shortening this training-time bottleneck.
With no end in sight to expanding datasets, nor to compute- and memory-intensive applications, data center managers must rapidly secure the necessary processing horsepower and matching AI/ML platforms to satisfy their business needs. With the proper selection of vendors, these hardware-plus-application solutions will help users identify trends and patterns, improving throughput and training times and leading to a positive cycle of advancement. This paper describes one such AI/ML solution from Supermicro.
SUPERMICRO AI / ML SOLUTION
GENERAL DESCRIPTION
As Articial Intelligence and Machine Learning solutions become more accessible and more mature,
global organizations will come to realize the value that these solutions can deliver to solve the
advanced business challenges.
The Supermicro AI/ML solution features a best-in-class hardware platform with the enterprise-ready
Canonical Distribution of Kubernetes (CDK) and software-dened storage capabilities from Ceph.
The solution through its reference architecture integrates network, compute, and storage. The
recommended starting implementation includes a single rack with capabilities to scale to many racks
as required.
AI / ML REFERENCE ARCHITECTURE
The reference architecture is a ready-to-deploy, end-to-end AI / ML solution that includes the AI software stack, orchestration, and containers. The optimized reference design fits both machine learning training and inference applications. At a high level, the architecture comprises software, network switches, control, compute, storage, and support services.
The reference design shown in Figure 1 contains two data switches, two management switches, three infrastructure nodes that act as foundation nodes for MAAS / JUJU, and six cloud nodes. It is built on the Kubernetes platform and provides Canonical-hardened packages for Kubernetes containers and Ceph. Kubeflow provides a machine learning toolkit for Kubernetes.
KEY CUSTOMER BENEFITS
• Pre-validated reference architectures
• Certified components
• Scale-out to multiple racks
• TCO optimization for best performance / watt / $ / ft²
• Start like a professional by leveraging Supermicro expertise, support, and services
• Supports TensorFlow, Kubeflow, and Kubernetes (see the sketch after this list)
• Sharable resources with higher utilization
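As a simple illustration of the TensorFlow support noted above, the sketch below shows the kind of check a data scientist might run inside a GPU-enabled container scheduled on this cluster (for example, from a Kubeflow notebook). It assumes only a standard TensorFlow 2.x GPU image and is not a Supermicro-specific API.

# Minimal sketch: confirm TensorFlow sees the GPUs exposed to the container.
# Assumes a TensorFlow 2.x GPU image running on a cloud node of the cluster.
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
print(f"Visible GPUs: {len(gpus)}")

# Tiny matrix multiply placed explicitly on the first GPU, if one is available.
if gpus:
    with tf.device("/GPU:0"):
        a = tf.random.normal((1024, 1024))
        b = tf.random.normal((1024, 1024))
        c = tf.matmul(a, b)
    print("Sample GPU op completed:", c.shape)
else:
    print("No GPU visible; check the pod's nvidia.com/gpu resource request.")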
SOLUTION CONFIGURATION
• Up to 216 compute cores
• Up to 3072 GB system memory
• Up to 36 TB storage
• Up to 40GbE data networking
• 19U height
• High-performance caching utilizing NVMe flash storage
Figure 1. Supermicro AI / ML Reference Architecture
RACK1
Availibility Zone-1
Data Switch(es)
MGMT Switch(es)
Foundation Node
(MAAS/JUJU)
Cloud Nodes
Kubernetes