NVIDIA DGX-1 Supercomputer, Artificial Intelligence, and Deep Learning

TAIPEI | SEP. 21-22, 2016

Eric Kang 康勝閔, Sep. 21 2016

GPU Computing

NVIDIA

Computing for the Most Demanding Users

Computing Human Imagination

Computing Human Intelligence

DEEP LEARNING EVERYWHERE

INTERNET & CLOUD

Image Classification, Speech Recognition, Language Translation, Language Processing, Sentiment Analysis, Recommendation

MEDIA & ENTERTAINMENT

Video Captioning, Video Search, Real Time Translation

AUTONOMOUS MACHINES

Pedestrian Detection, Lane Tracking, Recognize Traffic Sign

SECURITY & DEFENSE

Face Detection, Video Surveillance, Satellite Imagery

MEDICINE & BIOLOGY

Cancer Cell Detection, Diabetic Grading, Drug Discovery

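DEEP LEARNING APPROACH

Train: a DNN is shown labeled images (Dog, Cat, Honey badger); its outputs (Dog, Cat, Raccoon) are compared with the labels and the errors are fed back into the DNN.

Deploy: the trained DNN labels a new image (Dog).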

“SUPERHUMAN” RESULTS SPARK HYPERSCALE ADOPTION

ImageNet Accuracy %

[Chart: ImageNet accuracy by year. 2010: 72%, 2011: 74%, 2012: 84%, 2013: 88%, 2014: 93%, 2015: 96%. Series labels: Hand-coded CV, Deep Learning, Human. Additional data labels: 74%, 76%.]

Cloud Services with AI Powered by NVIDIA

Alibaba/Aliyun, Amazon, Baidu, eBay, Facebook, Flickr, Google, iFLYTEK, iQIYI, JD.com, Orange, Periscope, Pinterest, Qihoo 360, Shazam, Skype, Sogou, Twitter, Yahoo Supermarket, Yandex, Yelp

Source: IDC Worldwide Big Data and Analytics 2016 Predictions, November 2015; IDC FutureScape: Worldwide Digital Strategy Consulting 2016 Predictions, November 2015.

“By 2020, 80% of Big Data and Analytics deployments will need distributed micro analytics and 40% of all business analytics software will incorporate prescriptive analytics built on cognitive computing functionality. Both of these trends require a dramatic increase in processing power that could be enabled by GPUs.”

— IDC

“By 2018, over 50% of developer teams will embed cognitive services in their apps (vs 1% today) providing U.S. enterprises with over $60 billion annual savings by 2020.”

— IDC

AI — THE NEXT TRILLION $ IT OPPORTUNITY

NVIDIA DGX-1

The Essential Tool of Deep Learning Scientists

Deep Learning is a massive opportunity

Data Scientist productivity is vital

NVIDIA is the choice of the deep learning world

DGX-1 is fast, instantly productive

170 TFLOPS | 8x Tesla P100 16GB | NVLink Hybrid Cube Mesh

2x Xeon | 8 TB RAID 0 | Quad IB 100Gbps, Dual 10GbE | 3U
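The 170 TFLOPS headline can be checked against the GPU count with simple arithmetic. The sketch below assumes NVIDIA's published peak figures for the NVLink version of the Tesla P100 (21.2 TFLOPS FP16, 10.6 TFLOPS FP32, 5.3 TFLOPS FP64); those per-GPU numbers do not appear on the slide, and the function name is illustrative only.

```python
# Assumed per-GPU peak throughput for Tesla P100 (NVLink version), in TFLOPS.
# These values come from NVIDIA's published P100 specifications, not this deck.
P100_PEAK_TFLOPS = {"fp16": 21.2, "fp32": 10.6, "fp64": 5.3}

GPUS_PER_DGX1 = 8  # from the slide: 8x Tesla P100


def aggregate_peak_tflops(precision, gpus=GPUS_PER_DGX1):
    """Return the theoretical peak for the whole system at one precision."""
    return P100_PEAK_TFLOPS[precision] * gpus


if __name__ == "__main__":
    for precision in ("fp16", "fp32", "fp64"):
        total = aggregate_peak_tflops(precision)
        print(f"{precision}: {total:.1f} TFLOPS peak across {GPUS_PER_DGX1} GPUs")
```

Under these assumptions the FP16 total is 169.6 TFLOPS, which rounds to the quoted 170. This is a theoretical peak, so sustained application throughput will be lower.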

TESLA P100 WITH NVLINK

New GPU Architecture to Enable the World’s Fastest Compute Node

Pascal Architecture: Highest Compute Performance

NVLink: GPU Interconnect for Maximum Scalability

CoWoS HBM2: Unifying Compute & Memory in Single Package

Page Migration Engine: Simple Parallel Programming with Virtually Unlimited Memory

[Diagram labels: PCIe Switch, CPU, Tesla P100, Unified Memory]

NVIDIA DGX-1

WORLD’S FIRST DEEP LEARNING SUPERCOMPUTER

Engineered for deep learning | 170TF FP16 | 8x Tesla P100

NVLink hybrid cube mesh | Accelerates major AI frameworks

NVIDIA DEEP LEARNING SDK

High Performance GPU-Acceleration for Deep Learning

APPLICATIONS

COMPUTER VISION: Object Detection, Image Classification

SPEECH AND AUDIO: Voice Recognition, Translation

BEHAVIOR: Recommendation Engines, Sentiment Analysis

FRAMEWORKS

Mocha.jl

DEEP LEARNING SDK

DEEP LEARNING: cuDNN

MATH LIBRARIES: cuBLAS, cuSPARSE, cuFFT

MULTI-GPU: NCCL
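NCCL appears in the SDK as the multi-GPU component; it provides collective operations such as all-reduce, which data-parallel training uses to sum gradients across GPUs. The sketch below is a plain-Python simulation of one common all-reduce scheme, the ring all-reduce, on lists standing in for device buffers. It does not call NCCL, and the slide does not say which algorithm NCCL uses on DGX-1; the function name and data layout are hypothetical.

```python
def ring_allreduce(buffers):
    """Sum equal-length lists element-wise across simulated devices.

    Each inner list stands in for one GPU's gradient buffer. Returns a new
    list of buffers in which every device holds the full element-wise sum.
    """
    n = len(buffers)
    length = len(buffers[0])
    data = [list(b) for b in buffers]
    # Split each buffer into n contiguous chunks (some may be empty).
    bounds = [(i * length) // n for i in range(n + 1)]

    def chunk(c):
        return range(bounds[c], bounds[c + 1])

    # Phase 1, reduce-scatter: at step s, device r sends chunk (r - s) % n
    # to its right-hand neighbour, which adds it to its own copy.
    for s in range(n - 1):
        sends = [(r, (r - s) % n) for r in range(n)]
        payloads = [[data[r][i] for i in chunk(c)] for r, c in sends]
        for (r, c), payload in zip(sends, payloads):
            dst = (r + 1) % n
            for i, value in zip(chunk(c), payload):
                data[dst][i] += value
    # Device r now holds the fully reduced chunk (r + 1) % n.

    # Phase 2, all-gather: at step s, device r forwards chunk (r + 1 - s) % n
    # to its right-hand neighbour, which overwrites its own copy.
    for s in range(n - 1):
        sends = [(r, (r + 1 - s) % n) for r in range(n)]
        payloads = [[data[r][i] for i in chunk(c)] for r, c in sends]
        for (r, c), payload in zip(sends, payloads):
            dst = (r + 1) % n
            for i, value in zip(chunk(c), payload):
                data[dst][i] = value
    return data


if __name__ == "__main__":
    grads = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
    for rank, result in enumerate(ring_allreduce(grads)):
        print(rank, result)
```

In the ring scheme each device sends and receives only one chunk per step, so the per-device traffic stays roughly constant as the number of devices grows.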

NVIDIA CUDNN

Building blocks for accelerating deep neural networks on GPUs

High performance deep neural network training and inference

Accelerates Caffe, CNTK, TensorFlow, Theano, Torch

Performance continues to improve over time

“NVIDIA has improved the speed of cuDNN with each release while extending the interface to more operations and devices at the same time.”

— Evan Shelhamer, Lead Caffe Developer, UC Berkeley

developer.nvidia.com/cudnn

[Chart: relative AlexNet training throughput, axis 0x to 12x. 2014: K40 (cuDNN v1); 2015: M40 (cuDNN v3); 2016: Pascal (cuDNN v5).]

AlexNet training throughput based on 20 iterations, CPU: 1x E5-2680v3 12 Core 2.5GHz.

NVIDIA DIGITS

Interactive Deep Learning GPU Training System

Process Data | Configure DNN | Monitor Progress | Visualize Layers

Test Image

developer.nvidia.com/digits

github.com/NVIDIA/DIGITS

DGX STACK

Fully integrated Deep Learning platform

Instant productivity: plug-and-play, supports every AI framework

Performance optimized across the entire stack

Always up-to-date via the cloud

Mixed framework environments: containerized

Direct access to NVIDIA experts

NVIDIA DOCKER ON GITHUB

NVIDIA IMAGES

Prebuilt and ready to use

DGX-1 CONTAINER LAUNCH FLOW

Customer data stays on premise

Web Browser

Node Management

User Authentication

Docker Image push/pull

Scheduler UI

HW/SW Metrics

LOCAL LAN

All Application Data

NFS Storage

DIGITS UI

Interactive Sessions

compute.nvidia.com

1. User schedules containers to run

3. User interacts with application

DIGITS FOR DGX-1

A complete GPU-accelerated deep learning workflow

MANAGE | TRAIN | DEPLOY

MANAGE / AUGMENT

DIGITS: TRAIN, TEST

MODEL ZOO

GPU INFERENCE ENGINE: DATA CENTER, AUTOMOTIVE, EMBEDDED

BUILT FOR THE DATA CENTER

24/7 Uptime: Maximize reliability

Data Center Ready: Simplify system operations

Scalable Performance: Boost data center throughput

END-TO-END DESIGN FOR SYSTEM UPTIME

Differentiated Engineering

Low Operating Voltage for Long Term Reliability

Large Guard-band for Guaranteed Quality

Error Correction Code (ECC) for Data Integrity

Extensive Qualification & Testing

Long Burn-in Testing

Zero Error Tolerance at Aggressive Clocks

Even with Differentiated Engineering, 5% of GPUs are screened out

Guaranteed Quality

System Qual. Tests: Thermal, Stress, Airflow rate, Shock & Vibe

System Monitoring and Management for Tesla only

Dedicated Technical Staff for Failure Analysis

DYNAMIC PAGE RETIREMENT MAXIMIZES UPTIME

GPU without Dynamic Page Retirement (DPR)

GPU MEMORY: weak memory is still active

Uncorrectable Data Error causes application to crash

1. Users lose productivity as jobs continue to crash

2. IT Managers need to physically open up the server and remove the bad GPU

3. Customer satisfaction risk with RMA process

Tesla GPU with Dynamic Page Retirement

GPU MEMORY: weak memory page is retired

1. Removes bad memory with simple reboot

2. No physical work required for IT

3. Negligible impact: <0.01% of memory is retired
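To put the "<0.01% of memory is retired" figure in absolute terms, the short calculation below converts it to a size. It assumes a 16 GB Tesla P100 (the capacity quoted for DGX-1 earlier in the deck) and treats GB as GiB; both are assumptions made for illustration, and the function name is hypothetical.

```python
def retired_memory_upper_bound_mib(total_gib, retired_fraction=0.0001):
    """Upper bound on retired memory in MiB, given a capacity in GiB.

    retired_fraction defaults to 0.01% (0.0001), the bound quoted on the slide.
    """
    total_mib = total_gib * 1024
    return total_mib * retired_fraction


if __name__ == "__main__":
    bound = retired_memory_upper_bound_mib(16)  # assumed 16 GB Tesla P100
    print(f"At most about {bound:.2f} MiB of 16 GiB would be retired")
```

Under these assumptions the bound works out to about 1.64 MiB per GPU, which is why the slide describes the impact as negligible.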

DATA CENTER QUALIFIED BY SERVER OEMS

Server with Tesla GPU

Designed for max airflow through GPU

Supports airflow front-to-back & back-to-front

Lower power consumption

GPU Temp Running Linpack: 54C

Server with Unqualified GPU

Works against server airflow

Higher power consumption

Lower reliability

GPU Temp Running Linpack: 71C

SCALE-OUT PERFORMANCE IN THE DATA CENTER

Application Performance at Scale with GPUDirect RDMA

GPUDirect RDMA

Direct transfers between GPUs

67% Lower GPU-to-GPU Latency

5x Higher GPU-to-GPU MPI Bandwidth

Up to 2x Faster

[Chart: HOOMD-Blue Application, LJ Liquid Benchmark, 256K Particles. Y-axis: Time-steps per Sec (0 to 2000). X-axis: # of Nodes (8, 16, 32, 64, 96). Series: without RDMA, with RDMA.]

NVLINK DELIVERS SCALABLE PERFORMANCE

More than 45x Faster with 8x P100 Interconnected with NVLink

[Chart: Speed-up vs Dual Socket Haswell, axis 0x to 50x. Applications: Caffe/Alexnet, VASP, HOOMD-Blue, COSMO, MILC, Amber, HACC. Series: 2x Haswell CPU, 2x K80 (M40 for Alexnet), 2x P100, 4x P100, 8x P100.]

DATA CENTER GPU MANAGEMENT

Enterprise-Grade Management Tool for Operating the Data Center

Device Management (All GPUs Supported)

Per GPU Configuration & Monitoring

• Device Identification

• Board Monitoring

• Clock Management

Data Center GPU Manager (Tesla GPUs Only)

Active Health Monitoring: Runtime Health Checks, Prologue Checks, Epilogue Checks

Diagnostics & System Validation: Deep HW Diagnostics, System Validation Tests

Policy & Group Config Management: Pre-configured policies, Job level accounting, Stateful configuration

Power & Clock Mgmt.: Dynamic Power Capping, Synchronous Clock Boost
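The per-GPU monitoring items on this slide (device identification, board monitoring, clocks) are the kind of data commonly collected by scripting the `nvidia-smi` command-line tool. The sketch below builds a `--query-gpu` command and parses its CSV output. It is not Data Center GPU Manager and does not use its API; the helper names are hypothetical, and the sample text is made-up output in the expected shape, not a measurement from a DGX-1.

```python
import csv
import io

# Standard nvidia-smi --query-gpu properties used in this sketch.
FIELDS = ["index", "name", "temperature.gpu", "power.draw", "clocks.sm"]


def build_query_command(fields=FIELDS):
    """Return the argv list for an nvidia-smi CSV query (not executed here)."""
    return [
        "nvidia-smi",
        "--query-gpu=" + ",".join(fields),
        "--format=csv,noheader,nounits",
    ]


def parse_gpu_csv(text, fields=FIELDS):
    """Turn nvidia-smi CSV output into a list of dicts keyed by field name."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if record:  # skip blank lines
            rows.append(dict(zip(fields, (value.strip() for value in record))))
    return rows


def hottest_gpu(rows):
    """Return the row with the highest reported GPU temperature."""
    return max(rows, key=lambda row: float(row["temperature.gpu"]))


# Hypothetical sample output, for illustration only.
SAMPLE = (
    "0, Tesla P100-SXM2-16GB, 54, 180.25, 1328\n"
    "1, Tesla P100-SXM2-16GB, 57, 176.10, 1328\n"
)

if __name__ == "__main__":
    print(" ".join(build_query_command()))
    gpus = parse_gpu_csv(SAMPLE)
    print("Hottest GPU index:", hottest_gpu(gpus)["index"])
```

On a machine with the NVIDIA driver installed, the command list could be passed to `subprocess.run(..., capture_output=True, text=True)` and its `stdout` fed to `parse_gpu_csv`.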

DATA CENTER GPU MANAGER

Integrated into Leading Industry Tools for HPC

3rd Party Software

Moab Cluster Suite

TORQUE

PBS Professional

IBM Platform HPC

IBM Platform LSF

Bright Cluster Manager

StackIQ Boss for HPC with CUDA Pallet

Grid Engine

THANK YOU
