![Page 1: GPUs for Deep Learning - G-DEP|GPUコンピュー … for Deep Learning Jerry Chen & Brent Oster April 2015 PC DATA CENTER MOBILE ENTERPRISE VIRTUALIZATION AUTONOMOUS MACHINES HPC](https://reader035.vdocuments.pub/reader035/viewer/2022081521/5abd3a7e7f8b9a3a428b824d/html5/thumbnails/1.jpg)
GPUs for Deep Learning
Jerry Chen & Brent Oster
April 2015
![Page 2](https://reader035.vdocuments.pub/reader035/viewer/2022081521/5abd3a7e7f8b9a3a428b824d/html5/thumbnails/2.jpg)
The World Leader in Visual Computing
PC • DATA CENTER • MOBILE
GAMING • DESIGN • ENTERPRISE VIRTUALIZATION • HPC & CLOUD SERVICE PROVIDERS • AUTONOMOUS MACHINES
![Page 3](https://reader035.vdocuments.pub/reader035/viewer/2022081521/5abd3a7e7f8b9a3a428b824d/html5/thumbnails/3.jpg)
Performance Continues to Accelerate
[Chart: Peak Double Precision FLOPS (GFLOPS, axis 0 to 3500), 2008-2014. NVIDIA GPU (M1060, M2090, K20, K40, K80) vs. x86 CPU (Westmere, Sandy Bridge, Ivy Bridge, Haswell)]
[Chart: Peak Memory Bandwidth (GB/s, axis 0 to 600), 2008-2014. NVIDIA GPU (M1060, M2090, K20, K40, K80) vs. x86 CPU (Westmere, Sandy Bridge, Ivy Bridge, Haswell)]
![Page 4](https://reader035.vdocuments.pub/reader035/viewer/2022081521/5abd3a7e7f8b9a3a428b824d/html5/thumbnails/4.jpg)
US to Build Two Flagship Supercomputers
150-300 PFLOPS Peak Performance
IBM POWER9 CPU + NVIDIA Volta GPU
NVLink High Speed Interconnect
40 TFLOPS per Node, >3,400 Nodes
SUMMIT and SIERRA, 2017
![Page 5](https://reader035.vdocuments.pub/reader035/viewer/2022081521/5abd3a7e7f8b9a3a428b824d/html5/thumbnails/5.jpg)
A Brief History of CIFAR-10 (2010-2012)
10-class image classification problem
60,000 32x32 images
Slide courtesy of Adam Coates, Baidu Research
Results on the 2010-2012 timeline:
• RBM [Krizhevsky, ‘09]: 64.8%
• 3-way factored RBM [Ranzato et al., ‘10]: 65.3%
• MC-RBM [Ranzato et al., ‘10]: 71.0%
• Improved LCC [Yu & Zhang, ‘10]: 74.5%
• Conv. RBM [Krizhevsky, ‘10]: 78.9%
• K-means [Coates et al., ‘11]: 81.5%
• Multi-column DNN [Ciresan et al., ‘12]: 88.8%
![Page 6](https://reader035.vdocuments.pub/reader035/viewer/2022081521/5abd3a7e7f8b9a3a428b824d/html5/thumbnails/6.jpg)
Natural image recognition
1.2M training images
1000 classes
ImageNet Large Scale Visual Recognition Challenge
http://www.image-net.org/challenges/LSVRC/
![Page 7](https://reader035.vdocuments.pub/reader035/viewer/2022081521/5abd3a7e7f8b9a3a428b824d/html5/thumbnails/7.jpg)
ImageNet Large Scale Visual Recognition Challenge
The ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) started in 2010.
It is the best-known annual benchmark for image classification and object detection.
A classifier supplies 5 predictions out of 1,000 categories. A classification counts as correct when any one of the five guesses matches the ground truth (top-5 scoring).
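The top-5 rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the official ILSVRC evaluation code: the function name `top5_error` and the toy scores (8 categories rather than 1,000) are assumptions made for the example.

```python
def top5_error(predictions, labels):
    """Return the top-5 error rate in percent.

    predictions: list of per-image score lists (one score per category).
    labels: list of ground-truth category indices.
    """
    wrong = 0
    for scores, truth in zip(predictions, labels):
        # Indices of the five highest-scoring categories for this image.
        top5 = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:5]
        if truth not in top5:
            wrong += 1
    return 100.0 * wrong / len(labels)


# Toy data (hypothetical): 2 images, 8 categories.
scores = [
    [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6],  # top-5 indices: 1, 3, 5, 7, 6
    [0.9, 0.8, 0.7, 0.6, 0.5, 0.1, 0.2, 0.3],  # top-5 indices: 0, 1, 2, 3, 4
]
labels = [6, 7]  # first image is a top-5 hit, second is a miss

print(top5_error(scores, labels))  # prints 50.0
```

The percentages reported for each ILSVRC entry are this quantity computed over the full test set.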
ILSVRC Top-5 Classification Error [%]
• ILSVRC 2010 (NEC): 28.20
• ILSVRC 2011 (Xerox): 25.80
• ILSVRC 2012 (AlexNet): 16.40
• ILSVRC 2013 (Clarifai): 11.70
• ILSVRC 2014 (GoogLeNet): 6.70
• Jan 2015 (Baidu): 5.33
• Feb 2015 (Microsoft): 4.94
• Feb 2015 (Google): 4.82
Deep Learning & GPUs
![Page 8](https://reader035.vdocuments.pub/reader035/viewer/2022081521/5abd3a7e7f8b9a3a428b824d/html5/thumbnails/8.jpg)
Deep learning improves with scale
[Chart: Performance (vertical axis) vs. Data & Compute (horizontal axis, marked Past, Present, Future). Curves: Deep Learning; Many previous methods]
Slide courtesy of Adam Coates, Baidu Research
![Page 9](https://reader035.vdocuments.pub/reader035/viewer/2022081521/5abd3a7e7f8b9a3a428b824d/html5/thumbnails/9.jpg)
3 Drivers for Deep Learning
More Data • Better Models • Powerful GPU Accelerators
![Page 10](https://reader035.vdocuments.pub/reader035/viewer/2022081521/5abd3a7e7f8b9a3a428b824d/html5/thumbnails/10.jpg)
[Lee, Ranganath & Ng, 2007]
Why are GPUs good for deep learning?
GPUs deliver:
• same or better prediction accuracy
• faster results
• smaller footprint
• lower power
Neural Networks vs. GPUs (rows of the comparison table):
• Inherently Parallel
• Matrix Operations
• FLOPS
• Bandwidth
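To make the "inherently parallel" and "matrix operations" rows concrete, here is a minimal pure-Python sketch of a fully connected layer computed as a matrix multiply. It is an illustration only: the names `dense_forward` and `matmul_flops` and the layer sizes are assumptions, not taken from the slides, and a real GPU implementation would call a library such as cuBLAS rather than loop in Python.

```python
def dense_forward(inputs, weights):
    """Multiply a (batch x n_in) matrix by an (n_in x n_out) matrix."""
    batch, n_in, n_out = len(inputs), len(weights), len(weights[0])
    outputs = [[0.0] * n_out for _ in range(batch)]
    for b in range(batch):
        for j in range(n_out):
            # Each (b, j) output depends only on row b and column j,
            # so all of them could be computed at the same time.
            outputs[b][j] = sum(inputs[b][i] * weights[i][j] for i in range(n_in))
    return outputs


def matmul_flops(batch, n_in, n_out):
    # One multiply and one add per (b, i, j) triple.
    return 2 * batch * n_in * n_out


x = [[1.0, 2.0], [3.0, 4.0]]            # 2 samples, 2 input features
w = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]  # 2 inputs -> 3 outputs
y = dense_forward(x, w)
print(y)  # prints [[1.0, 2.0, 3.0], [3.0, 4.0, 7.0]]

# Hypothetical layer: batch of 256, 4096 inputs, 4096 outputs.
print(matmul_flops(256, 4096, 4096))  # prints 8589934592
```

A single layer of that hypothetical size already needs about 8.6 GFLOP per forward pass, which is why peak FLOPS and memory bandwidth matter for training.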
![Page 11](https://reader035.vdocuments.pub/reader035/viewer/2022081521/5abd3a7e7f8b9a3a428b824d/html5/thumbnails/11.jpg)
DEEP LEARNING VISUALIZED
![Page 12](https://reader035.vdocuments.pub/reader035/viewer/2022081521/5abd3a7e7f8b9a3a428b824d/html5/thumbnails/12.jpg)
Example Use Cases
• Image Classification, Object Detection, Localization
• Face Recognition
• Speech & Natural Language Processing
• Medical Imaging & Interpretation
• Seismic Imaging & Interpretation
• Recommendation
![Page 13](https://reader035.vdocuments.pub/reader035/viewer/2022081521/5abd3a7e7f8b9a3a428b824d/html5/thumbnails/13.jpg)
Deep learning revolutionizing medical research
• Detecting Mitosis in Breast Cancer Cells (IDSIA)
• Predicting the Toxicity of New Drugs (Johannes Kepler University)
• Understanding Gene Mutation to Prevent Disease (University of Toronto)
![Page 14](https://reader035.vdocuments.pub/reader035/viewer/2022081521/5abd3a7e7f8b9a3a428b824d/html5/thumbnails/14.jpg)
Neuronal Tissue Segmentation
![Page 15](https://reader035.vdocuments.pub/reader035/viewer/2022081521/5abd3a7e7f8b9a3a428b824d/html5/thumbnails/15.jpg)
Reinforcement Learning
![Page 16](https://reader035.vdocuments.pub/reader035/viewer/2022081521/5abd3a7e7f8b9a3a428b824d/html5/thumbnails/16.jpg)
Unsupervised Learning
“Now You Can Build Google’s $1M Artificial Brain on the Cheap”
GOOGLE DATACENTER
• 1,000 CPU Servers (2,000 CPUs • 16,000 cores)
• 600 kWatts
• $5,000,000
• Building High-Level Features Using Large Scale Unsupervised Learning. Q. Le, M. Ranzato, R. Monga, M. Devin, K. Chen, G. Corrado, J. Dean, A. Ng. ICML 2012
STANFORD AI LAB
• 3 GPU-Accelerated Servers (12 GPUs • 18,432 cores)
• 4 kWatts
• $33,000
• Deep learning with COTS HPC systems. A. Coates, B. Huval, T. Wang, D. Wu, A. Ng, B. Catanzaro. ICML 2013
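The size of the gap between the two systems is easier to see as ratios. The short Python sketch below only restates the slide's own figures; the dictionary keys and variable names are illustrative. Note that CPU cores and GPU cores are not equivalent units, so the core counts are shown for reference rather than compared directly.

```python
# Figures as quoted on the slide.
google = {"servers": 1000, "cores": 16000, "kwatts": 600, "usd": 5_000_000}
stanford = {"servers": 3, "gpus": 12, "cores": 18432, "kwatts": 4, "usd": 33_000}

power_ratio = google["kwatts"] / stanford["kwatts"]      # how much more power
cost_ratio = google["usd"] / stanford["usd"]             # how much more money
server_ratio = google["servers"] / stanford["servers"]   # how many more machines
cores_per_gpu = stanford["cores"] // stanford["gpus"]    # CUDA cores on each GPU

print(f"power: {power_ratio:.0f}x, cost: {cost_ratio:.1f}x, servers: {server_ratio:.0f}x")
print(f"CUDA cores per GPU: {cores_per_gpu}")
```

By these figures the GPU cluster uses roughly 150 times less power and costs roughly 150 times less.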
![Page 17](https://reader035.vdocuments.pub/reader035/viewer/2022081521/5abd3a7e7f8b9a3a428b824d/html5/thumbnails/17.jpg)
GPU-Accelerated deep learning
START-UPS
![Page 18](https://reader035.vdocuments.pub/reader035/viewer/2022081521/5abd3a7e7f8b9a3a428b824d/html5/thumbnails/18.jpg)
DIGITS: DEEP GPU TRAINING SYSTEM FOR DATA SCIENTISTS
• Design DNNs
• Visualize activations
• Manage multiple trainings
Stack, top to bottom:
• USER INTERFACE: Process Data, Configure DNN, Monitor Progress, Visualize Layers
• Theano, Torch, Caffe
• cuDNN, cuBLAS
• CUDA
• GPU HW: GPU, Multi-GPU, GPU Cluster, Cloud
![Page 19](https://reader035.vdocuments.pub/reader035/viewer/2022081521/5abd3a7e7f8b9a3a428b824d/html5/thumbnails/19.jpg)
DIGITS
Test Image
Monitor Progress • Configure DNN • Process Data • Visualize Layers
![Page 20](https://reader035.vdocuments.pub/reader035/viewer/2022081521/5abd3a7e7f8b9a3a428b824d/html5/thumbnails/20.jpg)
DIGITS
Deep Learning GPU Training System
Who it is for
Deep learning researchers
Automotive
Medical Researchers
Defense
Intelligent Video Analytics
Web Companies
Startups
![Page 21](https://reader035.vdocuments.pub/reader035/viewer/2022081521/5abd3a7e7f8b9a3a428b824d/html5/thumbnails/21.jpg)
DIGITS Demo
![Page 22](https://reader035.vdocuments.pub/reader035/viewer/2022081521/5abd3a7e7f8b9a3a428b824d/html5/thumbnails/22.jpg)
Deep learning with cuDNN
cuDNN is a library for deep learning primitives
Stack, top to bottom: Applications • Frameworks • cuDNN • GPUs (Tesla, TX-1, Titan)
![Page 23](https://reader035.vdocuments.pub/reader035/viewer/2022081521/5abd3a7e7f8b9a3a428b824d/html5/thumbnails/23.jpg)
cuDNN design goals
• Basic Deep Learning Subroutines: allow users to create DNN applications without writing any custom CUDA code
• Flexible Layout: handle any data layout
• Memory-Performance Tradeoff: good performance with minimal memory, great performance with more memory
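One common way to "handle any data layout" is to describe a 4-D tensor by its dimensions plus a stride per dimension, so the same logical element (n, c, h, w) can be found in differently ordered buffers. The Python sketch below illustrates that idea only; it is not the cuDNN API, and the helper names `strides_nchw`, `strides_nhwc` and `offset` are hypothetical.

```python
def strides_nchw(shape):
    """Strides (in elements) for a buffer stored batch, channel, row, column."""
    _, c, h, w = shape
    return (c * h * w, h * w, w, 1)


def strides_nhwc(shape):
    """Strides for a buffer stored batch, row, column, channel.

    Returned in (n, c, h, w) order so the same index tuple works for both layouts.
    """
    _, c, h, w = shape
    return (h * w * c, 1, w * c, c)


def offset(index, strides):
    """Linear position of logical element `index` in the flat buffer."""
    return sum(i * s for i, s in zip(index, strides))


shape = (2, 3, 4, 5)   # N=2 images, C=3 channels, H=4, W=5 (toy sizes)
index = (0, 1, 2, 3)   # image 0, channel 1, row 2, column 3

print(offset(index, strides_nchw(shape)))  # prints 33
print(offset(index, strides_nhwc(shape)))  # prints 40
```

A routine written against strides like these does not need a separate code path per layout, which is the practical meaning of the flexible-layout goal.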
![Page 24](https://reader035.vdocuments.pub/reader035/viewer/2022081521/5abd3a7e7f8b9a3a428b824d/html5/thumbnails/24.jpg)
cuDNN Version 2
• Accelerates key routines to improve performance of neural net training
• Up to 1.8x faster on AlexNet than a baseline GPU implementation
• New support for 3D convolutions
• Integrated into all major Deep Learning frameworks: Caffe, Theano, Torch
Speedup, Baseline (GPU) vs. With cuDNN:
• Caffe (GoogLeNet): 1.0x baseline, 1.6x with cuDNN
• Torch (OverFeat): 1.0x baseline, 1.2x with cuDNN
Images Trained Per Day (Caffe AlexNet), in millions of images:
• 16 Core CPU (E5-2698 v3 @ 2.3GHz / 3.6GHz Turbo): 2.5M
• GTX Titan: 18M
• Titan Black, cuDNN v1: 23M
• Titan X, cuDNN v2: 43M
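The images-per-day figures can be turned into speedups over the CPU baseline and into images per second. The Python sketch below does only that arithmetic; it assumes the chart values pair with the bar labels in the order they appear on the slide (2.5M, 18M, 23M, 43M), and the variable names are illustrative.

```python
# Images trained per day (Caffe AlexNet), as read from the slide's chart.
images_per_day = {
    "16 Core CPU": 2.5e6,
    "GTX Titan": 18e6,
    "Titan Black, cuDNN v1": 23e6,
    "Titan X, cuDNN v2": 43e6,
}

SECONDS_PER_DAY = 24 * 60 * 60
baseline = images_per_day["16 Core CPU"]

# Speedup relative to the 16-core CPU, and throughput in images per second.
speedup = {name: rate / baseline for name, rate in images_per_day.items()}
per_second = {name: rate / SECONDS_PER_DAY for name, rate in images_per_day.items()}

for name in images_per_day:
    print(f"{name}: {speedup[name]:.1f}x CPU, {per_second[name]:.0f} images/s")
```

Under that reading, Titan X with cuDNN v2 trains about 17 times as many images per day as the 16-core CPU.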
![Page 25](https://reader035.vdocuments.pub/reader035/viewer/2022081521/5abd3a7e7f8b9a3a428b824d/html5/thumbnails/25.jpg)
TITAN X: THE WORLD’S FASTEST GPU
• 8 Billion Transistors
• 3,072 CUDA Cores
• 7 TFLOPS SP / 0.2 TFLOPS DP
• 12GB Memory
![Page 26](https://reader035.vdocuments.pub/reader035/viewer/2022081521/5abd3a7e7f8b9a3a428b824d/html5/thumbnails/26.jpg)
Titan X for deep learning
[Chart: Training AlexNet, time in days (axis 0 to 7). Bars: 16-core Xeon CPU (off scale, axis break labeled ~43), TITAN, TITAN Black with cuDNN, TITAN X with cuDNN]
![Page 27](https://reader035.vdocuments.pub/reader035/viewer/2022081521/5abd3a7e7f8b9a3a428b824d/html5/thumbnails/27.jpg)
GPUs for training
Workstation
• 2x NVIDIA Tesla K40 Accelerator
• 2x CPU
• 64 GB System Memory
Server
• 4x NVIDIA Tesla K40/K80 Accelerator
• 2x CPU
• 256 GB System Memory
Upgrade Options:
• 8x GPUs, OR
• 6x GPUs + 2x IB FDR Cards
![Page 28](https://reader035.vdocuments.pub/reader035/viewer/2022081521/5abd3a7e7f8b9a3a428b824d/html5/thumbnails/28.jpg)
GPUs for inference
Online Classification (“commodity servers”)
• NVIDIA Tesla Accelerators
Offline Classification (Embedded / Mobile)
• NVIDIA TK1 / TX1
![Page 29](https://reader035.vdocuments.pub/reader035/viewer/2022081521/5abd3a7e7f8b9a3a428b824d/html5/thumbnails/29.jpg)
Pascal: Next Generation Tesla GPU
Peak Performance
• >3 TeraFLOPS
Stacked Memory
• 4x Higher Bandwidth (~1 TB/s)
• Larger Capacity (16 GB)
NVLink High-Speed Interconnect
• 80 GB/sec
• POWER CPU & GPU-to-GPU Interconnect
Unified Memory
• Single Memory Space
• Lower Developer Effort
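As a rough feel for these figures, the Python sketch below computes how long it would take to move 16 GB at each quoted rate. It is back-of-the-envelope arithmetic at peak rates only: the "prior" bandwidth of about 250 GB/s is an inference from the "4x Higher Bandwidth (~1 TB/s)" claim rather than a number on the slide, and the function name `sweep_ms` is made up for the example.

```python
GB = 1e9  # decimal gigabyte, matching the GB/s units on the slide

capacity_bytes = 16 * GB     # "Larger Capacity (16 GB)"
stacked_bw = 1000 * GB       # "~1 TB/s" stacked memory
prior_bw = stacked_bw / 4    # assumption: implied by "4x Higher Bandwidth"
nvlink_bw = 80 * GB          # "80 GB/sec" NVLink


def sweep_ms(nbytes, bytes_per_second):
    """Milliseconds to move nbytes at the given peak rate."""
    return 1e3 * nbytes / bytes_per_second


print(f"stacked memory: {sweep_ms(capacity_bytes, stacked_bw):.0f} ms")
print(f"implied prior memory: {sweep_ms(capacity_bytes, prior_bw):.0f} ms")
print(f"over NVLink: {sweep_ms(capacity_bytes, nvlink_bw):.0f} ms")
```

Real transfers run below peak, so these times are lower bounds.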
![Page 30](https://reader035.vdocuments.pub/reader035/viewer/2022081521/5abd3a7e7f8b9a3a428b824d/html5/thumbnails/30.jpg)
Phenomenal Memory Bandwidth for Applications
NVLINK: GPU high speed interconnect
3D Stacked Memory
• 4x Higher Bandwidth (~1 TB/s)
• 3x Larger Capacity
• 4x More Energy Efficient per bit
![Page 31](https://reader035.vdocuments.pub/reader035/viewer/2022081521/5abd3a7e7f8b9a3a428b824d/html5/thumbnails/31.jpg)
Thank you!
Developer Zone: https://developer.nvidia.com/deeplearning
GPU Technology Conference: http://www.gputechconf.com/
cuDNN Download: https://developer.nvidia.com/cuDNN
DIGITS Download: https://developer.nvidia.com/digits
DIGITS Source: https://www.github.com/nvidia/digits