NVIDIA DGX-1 Supercomputer, Artificial Intelligence, and Deep Learning

TAIPEI | SEP. 21-22, 2016 | Eric Kang 康勝閔, Sep. 21 2016 | NVIDIA DGX-1 Supercomputer, Artificial Intelligence, and Deep Learning

Upload: nvidia-taiwan

Post on 09-Jan-2017


Category:

Technology



TRANSCRIPT

Page 1: NVIDIA DGX-1 Supercomputer, Artificial Intelligence, and Deep Learning

TAIPEI | SEP. 21-22, 2016

Eric Kang 康勝閔, Sep. 21 2016

NVIDIA DGX-1 Supercomputer, Artificial Intelligence, and Deep Learning

Page 2: NVIDIA DGX-1 Supercomputer, Artificial Intelligence, and Deep Learning

GPU Computing

NVIDIA

Computing for the Most Demanding Users

Computing Human Imagination

Computing Human Intelligence

Page 3: NVIDIA DGX-1 Supercomputer, Artificial Intelligence, and Deep Learning

DEEP LEARNING EVERYWHERE

INTERNET & CLOUD

Image Classification
Speech Recognition
Language Translation
Language Processing
Sentiment Analysis
Recommendation

MEDIA & ENTERTAINMENT

Video Captioning
Video Search
Real Time Translation

AUTONOMOUS MACHINES

Pedestrian Detection
Lane Tracking
Recognize Traffic Sign

SECURITY & DEFENSE

Face Detection
Video Surveillance
Satellite Imagery

MEDICINE & BIOLOGY

Cancer Cell Detection
Diabetic Grading
Drug Discovery

Page 4: NVIDIA DGX-1 Supercomputer, Artificial Intelligence, and Deep Learning

DEEP LEARNING APPROACH

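[Figure. Train: labeled images (Dog, Cat, Honey badger) pass through a DNN; its outputs (Dog, Cat, Raccoon) are compared with the labels and the errors are fed back. Deploy: the trained DNN labels a new image (Dog).]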
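The train-then-deploy loop on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: a single perceptron stands in for the DNN, and the feature vectors, labels, and function names are made up for the example rather than taken from the talk.

```python
# Toy stand-in for the slide's train/deploy loop. A single perceptron
# replaces the DNN; the data and names below are hypothetical.


def predict(weights, bias, features):
    """Deploy step: return 1 ("dog") or 0 ("cat") for one input."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def train(samples, epochs=20, lr=1.0):
    """Train step: feed prediction errors back into the weights."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in samples:
            error = label - predict(weights, bias, features)  # -1, 0, or +1
            weights = [w + lr * error * x for w, x in zip(weights, features)]
            bias += lr * error
    return weights, bias


# Made-up 2-feature inputs: label 1 = "dog", label 0 = "cat".
TRAINING_SET = [
    ([2.0, 1.0], 1),
    ([3.0, 2.0], 1),
    ([-1.0, -2.0], 0),
    ([-2.0, -1.0], 0),
]

weights, bias = train(TRAINING_SET)
training_errors = sum(predict(weights, bias, f) != y for f, y in TRAINING_SET)
print("misclassified training samples:", training_errors)
```

Once training stops producing errors, `predict` is the only piece needed at deployment time, which mirrors the split between the two halves of the slide.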

Page 5: NVIDIA DGX-1 Supercomputer, Artificial Intelligence, and Deep Learning

“SUPERHUMAN” RESULTS SPARK HYPERSCALE ADOPTION

[Chart: ImageNet accuracy %, 2010 to 2015. Data labels: 72%, 74%, 84%, 88%, 93%, 96%. Series: Hand-coded CV, Deep Learning, Human. Additional data labels: 74%, 76%.]

Cloud Services with AI Powered by NVIDIA

Alibaba/Aliyun, Amazon, Baidu, eBay, Facebook, Flickr, Google, iFLYTEK, iQIYI, JD.com, Orange, Periscope, Pinterest, Qihoo 360, Shazam, Skype, Sogou, Twitter, Yahoo Supermarket, Yandex, Yelp
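The accuracy figures on this slide are easier to compare as error rates. The sketch below assumes the six percentages pair, in order, with the six years on the chart axis; the transcript does not state that pairing explicitly, and the variable names are illustrative.

```python
# ImageNet accuracy figures from the slide, paired in order with the
# years on the chart axis (the pairing is an assumption).
YEARS = [2010, 2011, 2012, 2013, 2014, 2015]
ACCURACY_PCT = [72, 74, 84, 88, 93, 96]

# Error rate is the complement of accuracy.
ERROR_PCT = {year: 100 - acc for year, acc in zip(YEARS, ACCURACY_PCT)}

# Year-over-year accuracy gain in percentage points.
GAINS = {
    YEARS[i]: ACCURACY_PCT[i] - ACCURACY_PCT[i - 1]
    for i in range(1, len(YEARS))
}

for year in YEARS:
    print(f"{year}: error {ERROR_PCT[year]}%")

# Ratio of the first error rate to the last one.
ERROR_REDUCTION = ERROR_PCT[2010] / ERROR_PCT[2015]
print(f"error reduced {ERROR_REDUCTION:.0f}x between 2010 and 2015")
```

Under that pairing the error rate falls from 28% to 4%, and the largest single-year gain (10 points) lands in 2012.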

Page 6: NVIDIA DGX-1 Supercomputer, Artificial Intelligence, and Deep Learning

Source: IDC Worldwide Big Data and Analytics 2016 Predictions, November 2015; IDC FutureScape: Worldwide Digital Strategy Consulting 2016 Predictions, November 2015.

“By 2020, 80% of Big Data and Analytics deployments will need distributed micro analytics and 40% of all business analytics software will incorporate prescriptive analytics built on cognitive computing functionality. Both of these trends require a dramatic increase in processing power that could be enabled by GPUs.”

— IDC

“By 2018, over 50% of developer teams will embed cognitive services in their apps (vs 1% today) providing U.S. enterprises with over $60 billion annual savings by 2020.”

— IDC

AI — THE NEXT TRILLION $ IT OPPORTUNITY

Page 7: NVIDIA DGX-1 Supercomputer, Artificial Intelligence, and Deep Learning

Deep Learning is a massive opportunity

Data Scientist productivity is vital

NVIDIA is the choice of the deep learning world

DGX-1 is fast, instantly productive

NVIDIA DGX-1
The Essential Tool of Deep Learning Scientists

170 TFLOPS | 8x Tesla P100 16GB | NVLink Hybrid Cube Mesh
2x Xeon | 8 TB RAID 0 | Quad IB 100Gbps, Dual 10GbE | 3U
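The headline 170 TFLOPS figure can be reproduced from the GPU count. The per-GPU value of 21.2 TFLOPS used below is NVIDIA's published peak FP16 rate for the NVLink version of the Tesla P100; it is not stated on this slide, so treat it as an outside assumption.

```python
# Aggregate figures for the DGX-1 spec line.
NUM_GPUS = 8                   # "8x Tesla P100", from the slide
MEMORY_PER_GPU_GB = 16         # "16GB", from the slide
FP16_TFLOPS_PER_GPU = 21.2     # assumption: published P100 (NVLink) FP16 peak

TOTAL_FP16_TFLOPS = NUM_GPUS * FP16_TFLOPS_PER_GPU
TOTAL_GPU_MEMORY_GB = NUM_GPUS * MEMORY_PER_GPU_GB

print(f"peak FP16: {TOTAL_FP16_TFLOPS:.1f} TFLOPS "
      f"(rounds to {round(TOTAL_FP16_TFLOPS)})")
print(f"total GPU memory: {TOTAL_GPU_MEMORY_GB} GB")
```

This is a peak, half-precision number; sustained throughput on a real training job will be lower.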

Page 8: NVIDIA DGX-1 Supercomputer, Artificial Intelligence, and Deep Learning

TESLA P100 WITH NVLINK
New GPU Architecture to Enable the World’s Fastest Compute Node

Pascal Architecture: Highest Compute Performance

NVLink: GPU Interconnect for Maximum Scalability

CoWoS HBM2: Unifying Compute & Memory in Single Package

Page Migration Engine: Simple Parallel Programming with Virtually Unlimited Memory

[Diagram labels: PCIe Switch, CPU, Unified Memory, Tesla P100]

Page 9: NVIDIA DGX-1 Supercomputer, Artificial Intelligence, and Deep Learning

Engineered for deep learning | 170TF FP16 | 8x Tesla P100

NVLink hybrid cube mesh | Accelerates major AI frameworks

NVIDIA DGX-1
WORLD’S FIRST DEEP LEARNING SUPERCOMPUTER
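The slide names the NVLink hybrid cube mesh without showing its wiring. The sketch below models one commonly described layout for the 8-GPU DGX-1: two fully connected groups of four GPUs, with each GPU also linked to its counterpart in the other group. That edge list is an assumption, not something this deck specifies, and the function names are illustrative.

```python
from collections import deque
from itertools import combinations

NUM_GPUS = 8


def hybrid_cube_mesh():
    """Return the NVLink edges of the assumed 8-GPU layout."""
    edges = set()
    for quad in (range(0, 4), range(4, 8)):
        edges.update(combinations(quad, 2))  # all-to-all inside each quad
    edges.update((gpu, gpu + 4) for gpu in range(4))  # one link across quads
    return edges


def max_hops(edges):
    """Longest shortest path between any two GPUs (breadth-first search)."""
    neighbors = {gpu: set() for gpu in range(NUM_GPUS)}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    worst = 0
    for start in range(NUM_GPUS):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        worst = max(worst, max(dist.values()))
    return worst


EDGES = hybrid_cube_mesh()
LINKS_PER_GPU = {g: sum(g in e for e in EDGES) for g in range(NUM_GPUS)}
print("links:", len(EDGES), "per GPU:", LINKS_PER_GPU[0], "max hops:", max_hops(EDGES))
```

In this model every GPU uses four links and any GPU can reach any other in at most two hops.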

Page 10: NVIDIA DGX-1 Supercomputer, Artificial Intelligence, and Deep Learning

NVIDIA DEEP LEARNING SDK
High Performance GPU-Acceleration for Deep Learning

APPLICATIONS

COMPUTER VISION: Object Detection, Image Classification
SPEECH AND AUDIO: Voice Recognition, Translation
BEHAVIOR: Recommendation Engines, Sentiment Analysis

FRAMEWORKS

Mocha.jl

DEEP LEARNING SDK

DEEP LEARNING: cuDNN
MATH LIBRARIES: cuBLAS, cuSPARSE, cuFFT
MULTI-GPU: NCCL

Page 11: NVIDIA DGX-1 Supercomputer, Artificial Intelligence, and Deep Learning

NVIDIA CUDNN

Building blocks for accelerating deep neural networks on GPUs

High performance deep neural network training and inference

Accelerates Caffe, CNTK, TensorFlow, Theano, Torch

Performance continues to improve over time

“NVIDIA has improved the speed of cuDNN with each release while extending the interface to more operations and devices at the same time.”

— Evan Shelhamer, Lead Caffe Developer, UC Berkeley

developer.nvidia.com/cudnn

[Chart: speed-up (axis 0x to 12x) by year, 2014 to 2016: K40 (cuDNN v1), M40 (cuDNN v3), Pascal (cuDNN v5). AlexNet training throughput based on 20 iterations, CPU: 1x E5-2680v3 12 Core 2.5GHz.]

Page 12: NVIDIA DGX-1 Supercomputer, Artificial Intelligence, and Deep Learning

NVIDIA DIGITS
Interactive Deep Learning GPU Training System

Test Image

Monitor Progress | Configure DNN | Process Data | Visualize Layers

developer.nvidia.com/digits
github.com/NVIDIA/DIGITS

Page 13: NVIDIA DGX-1 Supercomputer, Artificial Intelligence, and Deep Learning

Instant productivity — plug-and-play, supports every AI framework

Performance optimized across the entire stack

Always up-to-date via the cloud

Mixed framework environments, containerized

Direct access to NVIDIA experts

DGX STACK
Fully integrated Deep Learning platform

Page 14: NVIDIA DGX-1 Supercomputer, Artificial Intelligence, and Deep Learning

NVIDIA DOCKER ON GITHUB

Page 15: NVIDIA DGX-1 Supercomputer, Artificial Intelligence, and Deep Learning

NVIDIA IMAGES
Prebuilt and ready to use
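As a rough sketch of how a prebuilt image is used, the snippet below assembles the command line for the `nvidia-docker` wrapper published on GitHub at the time of this talk. The helper name `build_launch_command` is hypothetical; `nvidia/cuda` is the public CUDA base image and `nvidia-smi` simply lists the GPUs visible inside the container. The deck itself does not show any commands, so this is an illustration rather than the DGX-1 launch procedure.

```python
import shlex


def build_launch_command(image, command, runtime_cli="nvidia-docker"):
    """Build the argv for running `command` inside a GPU-enabled container."""
    return [runtime_cli, "run", "--rm", image] + shlex.split(command)


# nvidia/cuda is a public CUDA base image; nvidia-smi lists visible GPUs.
ARGV = build_launch_command("nvidia/cuda", "nvidia-smi")
print(" ".join(shlex.quote(part) for part in ARGV))

# To actually launch it on a machine with nvidia-docker installed:
#   import subprocess
#   subprocess.run(ARGV, check=True)
```

Building the argument list separately from running it keeps the example testable on machines without a GPU or Docker.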

Page 16: NVIDIA DGX-1 Supercomputer, Artificial Intelligence, and Deep Learning

DGX-1 CONTAINER LAUNCH FLOW
Customer data stays on premise

Web Browser

Node Management

User Authentication

Docker Image push/pull

Scheduler UI

HW/SW Metrics

LOCAL LAN

All Application Data

NFS Storage

DIGITS UI

Interactive Sessions

compute.nvidia.com

1. User schedules containers to run

3. User interacts with application

Page 17: NVIDIA DGX-1 Supercomputer, Artificial Intelligence, and Deep Learning

DIGITS FOR DGX-1
A complete GPU-accelerated deep learning workflow

[Workflow diagram. Stages: MANAGE, TRAIN, DEPLOY. Labels: DIGITS; MANAGE / AUGMENT; TRAIN; TEST; GPU INFERENCE ENGINE; MODEL ZOO. Deployment targets: DATA CENTER, AUTOMOTIVE, EMBEDDED.]

Page 18: NVIDIA DGX-1 Supercomputer, Artificial Intelligence, and Deep Learning

BUILT FOR THE DATA CENTER

24/7 Uptime: Maximize reliability

Scalable Performance: Boost data center throughput

Data Center Ready: Simplify system operations

Page 19: NVIDIA DGX-1 Supercomputer, Artificial Intelligence, and Deep Learning

END-TO-END DESIGN FOR SYSTEM UPTIME

Differentiated Engineering

Low Operating Voltage for Long Term Reliability
Large Guard-band for Guaranteed Quality
Error Correction Code (ECC) for Data Integrity

Extensive Qualification & Testing

Long Burn-in Testing
Zero Error Tolerance at Aggressive Clocks
Even with Differentiated Engineering, 5% of GPUs are screened out

Guaranteed Quality

System Qual. Tests: Thermal, Stress, Airflow rate, Shock & Vibe
System Monitoring and Management for Tesla only
Dedicated Technical Staff for Failure Analysis

Page 20: NVIDIA DGX-1 Supercomputer, Artificial Intelligence, and Deep Learning

DYNAMIC PAGE RETIREMENT MAXIMIZES UPTIME

GPU MEMORY

GPU without Dynamic Page Retirement (DPR)

Weak memory is still active

Uncorrectable Data Error causes application to crash

1. Users lose productivity as jobs continue to crash

2. IT Managers need to physically open up the server and remove the bad GPU

3. Customer satisfaction risk with RMA process

Tesla GPU with Dynamic Page Retirement

Weak memory page is retired

1. Removes bad memory with simple reboot

2. No physical work required for IT

3. Negligible impact: <0.01% of memory is retired
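To put the "<0.01% of memory is retired" bound in absolute terms, the arithmetic below applies it to a 16 GB Tesla P100. Treating 16 GB as 16 GiB is a simplifying assumption, and the variable names are illustrative.

```python
# Upper bound on memory lost to dynamic page retirement on one GPU.
GPU_MEMORY_GIB = 16            # Tesla P100 16GB, from the DGX-1 spec line
RETIRED_FRACTION_MAX = 0.0001  # "<0.01%" expressed as a fraction

memory_mib = GPU_MEMORY_GIB * 1024
retired_mib_max = memory_mib * RETIRED_FRACTION_MAX
usable_mib_min = memory_mib - retired_mib_max

print(f"at most about {retired_mib_max:.2f} MiB retired out of {memory_mib} MiB")
```

Under that assumption the bound works out to less than 2 MiB per GPU, which is why the slide calls the impact negligible.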


Page 21: NVIDIA DGX-1 Supercomputer, Artificial Intelligence, and Deep Learning

DATA CENTER QUALIFIED BY SERVER OEMS

Server with Tesla GPU

Designed for max airflow through GPU
Supports airflow front-to-back & back-to-front
Lower power consumption
GPU Temp Running Linpack: 54C

Server with Unqualified GPU

Works against server airflow
Higher power consumption
Lower reliability
GPU Temp Running Linpack: 71C

Page 22: NVIDIA DGX-1 Supercomputer, Artificial Intelligence, and Deep Learning

SCALE-OUT PERFORMANCE IN THE DATA CENTER

Application Performance at Scale with GPUDirect RDMA

GPUDirect RDMA

Direct transfers between GPUs
67% Lower GPU-to-GPU Latency
5x Higher GPU-to-GPU MPI Bandwidth

[Chart: HOOMD-Blue application, LJ Liquid Benchmark, 256K Particles. Time-steps per second (axis 0 to 2000) vs. number of nodes (8, 16, 32, 64, 96), without RDMA and with RDMA. Up to 2x Faster.]
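The two GPUDirect RDMA figures can be combined into a rough estimate of how much a single GPU-to-GPU transfer shrinks. The model below is a deliberate simplification and not from the deck: it assumes a transfer's baseline time splits into a latency-bound share and a bandwidth-bound share, and the function name is hypothetical.

```python
LATENCY_REDUCTION_PCT = 67   # "67% Lower GPU-to-GPU Latency"
BANDWIDTH_FACTOR = 5         # "5x Higher GPU-to-GPU MPI Bandwidth"

remaining_latency_pct = 100 - LATENCY_REDUCTION_PCT  # share of baseline latency left
latency_speedup = 100 / remaining_latency_pct        # roughly 3x


def relative_transfer_time(latency_share):
    """Transfer time with RDMA relative to baseline (1.0 = unchanged).

    latency_share is the fraction of the baseline transfer time spent on
    latency; the rest is assumed to be bandwidth-bound.
    """
    latency_part = latency_share * remaining_latency_pct / 100
    bandwidth_part = (1 - latency_share) / BANDWIDTH_FACTOR
    return latency_part + bandwidth_part


for share in (0.0, 0.5, 1.0):
    print(f"latency share {share:.1f}: {relative_transfer_time(share):.3f}x baseline time")
```

Whole-application gains are smaller than these per-transfer ratios because only part of each time step is spent communicating.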

Page 23: NVIDIA DGX-1 Supercomputer, Artificial Intelligence, and Deep Learning

NVLINK DELIVERS SCALABLE PERFORMANCE

More than 45x Faster with 8x P100 Interconnected with NVLink

[Chart: speed-up vs. dual-socket Haswell (2x Haswell CPU), axis 0x to 50x, for Caffe/Alexnet, VASP, HOOMD-Blue, COSMO, MILC, Amber, HACC. Series: 2x K80 (M40 for Alexnet), 2x P100, 4x P100, 8x P100.]

Page 24: NVIDIA DGX-1 Supercomputer, Artificial Intelligence, and Deep Learning

DATA CENTER GPU MANAGEMENT

Enterprise-Grade Management Tool for Operating the Data Center

All GPUs Supported

Device Management (Per GPU Configuration & Monitoring)

• Device Identification

• Board Monitoring

• Clock Management

Data Center GPU Manager (Tesla GPUs Only)

Active Health Monitoring

• Runtime Health Checks

• Prologue Checks

• Epilogue Checks

Diagnostics & System Validation

• Deep HW Diagnostics

• System Validation Tests

Policy & Group Config Management

• Pre-configured policies

• Job level accounting

• Stateful configuration

Power & Clock Mgmt.

• Dynamic Power Capping

• Synchronous Clock Boost
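A job prologue or runtime health check of the kind listed on this slide can be approximated with plain `nvidia-smi` output. The sketch below does not use Data Center GPU Manager itself: it parses the CSV that `nvidia-smi --query-gpu=index,temperature.gpu,power.draw --format=csv,noheader,nounits` prints and flags GPUs over a limit. The thresholds, function names, and sample readings are all hypothetical.

```python
import csv
import io


def parse_gpu_report(text):
    """Parse nvidia-smi CSV output (index, temperature, power) per GPU."""
    rows = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [
        {"index": int(r[0]), "temperature_c": float(r[1]), "power_w": float(r[2])}
        for r in rows
        if r  # skip blank lines
    ]


def failed_checks(gpus, max_temp_c=85.0, max_power_w=300.0):
    """Return indices of GPUs that exceed the (hypothetical) limits."""
    return [
        g["index"]
        for g in gpus
        if g["temperature_c"] > max_temp_c or g["power_w"] > max_power_w
    ]


# Made-up sample output for two GPUs; not measured data.
SAMPLE = "0, 54, 180.25\n1, 91, 210.00\n"
UNHEALTHY = failed_checks(parse_gpu_report(SAMPLE))
print("GPUs failing the check:", UNHEALTHY)
```

A scheduler prologue could run the query, pass its output to `parse_gpu_report`, and refuse to start the job when `failed_checks` returns a non-empty list. Fields that `nvidia-smi` reports as unsupported would need extra handling that this sketch omits.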

Page 25: NVIDIA DGX-1 Supercomputer, Artificial Intelligence, and Deep Learning

DATA CENTER GPU MANAGER

Integrated into Leading Industry Tools for HPC

3rd Party Software

Moab Cluster Suite
TORQUE
PBS Professional
IBM Platform HPC
IBM Platform LSF
Bright Cluster Manager
StackIQ Boss for HPC with CUDA Pallet
Grid Engine

Page 26: NVIDIA DGX-1 Supercomputer, Artificial Intelligence, and Deep Learning

TAIPEI | SEP. 21-22, 2016

THANK YOU