NVIDIA SDK in Higher Education
Hou Yutao (侯宇涛)
Director of GPU Application Marketing, NVIDIA
> Founded in 1993
> Jensen Huang, Founder & CEO
> 11,000 employees
> $123B market cap; $6.9B revenue in FY17
“World’s Most Admired Companies”— Fortune
“50 Smartest Companies: #1”— MIT Tech Review
“#3 Top CEO in the World”— Harvard Business Review
“Most Innovative Companies”— Fast Company
NVIDIA
Artificial Intelligence · Computer Graphics · GPU Computing
NVIDIA: “THE AI COMPUTING COMPANY”
NVIDIA TESLA PLATFORM SAVES MONEY: Game-Changing Inference Performance
Inference workload: image recognition using ResNet-50
• 160 CPU servers: 45,000 images/sec, 65 kW
• 1 HGX server: 45,000 images/sec, 3 kW
Same throughput · 1/20 the space · 1/22 the power
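The power ratio on this slide follows directly from the quoted figures. A minimal Python sketch that uses only the numbers stated above (45,000 images/sec at 65 kW versus 3 kW); the helper name `images_per_joule` is hypothetical:

```python
def images_per_joule(images_per_sec: float, watts: float) -> float:
    """Energy efficiency: images processed per joule (1 W = 1 J/s)."""
    return images_per_sec / watts

cpu = images_per_joule(45_000, 65_000)   # 160 CPU servers drawing 65 kW
hgx = images_per_joule(45_000, 3_000)    # 1 HGX server drawing 3 kW

print(f"CPU cluster: {cpu:.2f} images/J")
print(f"HGX server:  {hgx:.2f} images/J")
# 65 / 3 is about 21.7, which the slide rounds to "1/22 the power".
print(f"Efficiency gain: {hgx / cpu:.1f}x")
```

At equal throughput, the efficiency gain is simply the ratio of the two power draws.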
Top 10 Supercomputers
• 80% heterogeneous
• 50% NVIDIA GPU
• 30% Intel Xeon Phi
Heterogeneous Parallel Computing
• Latency-optimized (CPU): fast serial processing, runs Logic()
• Throughput-optimized (GPU): fast parallel processing, runs Compute()
CPU Pizza Delivery
Process: the delivery truck delivers one pizza and then moves to the next house.
NVIDIA GPU Pizza Delivery
Process: many deliveries to many houses.
Original idea by Jedox (www.jedox.com)
Accelerated Computing: Multi-core plus Many-cores
• CPU: optimized for serial tasks
• GPU accelerator: optimized for many parallel tasks
10x performance, 5x energy efficiency
How GPU Acceleration Works
Application code is split between the two processors:
• GPU: the compute-intensive functions, about 5% of the code
• CPU: the rest of the sequential code
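The “5% of code” figure measures code size, not runtime. The overall speedup depends on what fraction of *runtime* the offloaded functions account for, and Amdahl's law is the standard way to reason about it. A minimal sketch: `p` (fraction of runtime offloaded) and `s` (GPU speedup on that part) are illustrative inputs, not figures from the slide:

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of runtime is accelerated by a factor s.

    Amdahl's law: S = 1 / ((1 - p) + p / s). The serial remainder (1 - p)
    stays on the CPU and caps the achievable speedup at 1 / (1 - p).
    """
    if not 0.0 <= p <= 1.0 or s <= 0:
        raise ValueError("p must be in [0, 1] and s must be positive")
    return 1.0 / ((1.0 - p) + p / s)

# Hypothetical case: hot functions take 90% of runtime and run 20x faster on the GPU.
print(round(amdahl_speedup(0.90, 20), 2))    # 1 / (0.1 + 0.045), about 6.9
# Even an effectively infinite GPU speedup leaves the 10% serial part, capping S at 10x.
print(round(amdahl_speedup(0.90, 1e12), 2))
```

This is why accelerating a small but hot portion of the code can pay off, and also why the remaining CPU code eventually becomes the bottleneck.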
What Is CUDA?
CUDA™ is a parallel computing platform and programming model that enables dramatic increases in computing performance by harnessing the power of the graphics processing unit (GPU).
CUDA stands for Compute Unified Device Architecture.
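In the CUDA programming model, a kernel is launched over a grid of thread blocks. Each thread typically computes its global index as `blockIdx.x * blockDim.x + threadIdx.x` to select the element it processes. The Python sketch below only *emulates* that index mapping sequentially for SAXPY (`y = a*x + y`); it does not run on a GPU, and the function names `saxpy_kernel` and `launch` are hypothetical:

```python
import math

def saxpy_kernel(block_idx, block_dim, thread_idx, n, a, x, y):
    """Body of one emulated CUDA thread: y[i] = a * x[i] + y[i]."""
    i = block_idx * block_dim + thread_idx  # global thread index, as in CUDA C
    if i < n:                               # guard: the last block may be only partly used
        y[i] = a * x[i] + y[i]

def launch(n, a, x, y, block_dim=256):
    """Emulate a 1-D launch <<<grid_dim, block_dim>>> with sequential loops."""
    grid_dim = math.ceil(n / block_dim)     # enough blocks to cover all n elements
    for b in range(grid_dim):               # on a GPU, blocks run concurrently
        for t in range(block_dim):          # and so do the threads within a block
            saxpy_kernel(b, block_dim, t, n, a, x, y)

x = [float(i) for i in range(1000)]
y = [1.0] * 1000
launch(len(x), 2.0, x, y)
print(y[:3], y[-1])   # [1.0, 3.0, 5.0] 1999.0
```

With 1,000 elements and 256 threads per block, four blocks (1,024 threads) are launched, and the bounds check discards the 24 surplus threads.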
CUDA DEVELOPMENT ECOSYSTEM
Built on CUDA: programming model, GPU architecture, system architecture
From ease of use to specialized performance:
• Frameworks and applications: GPU users
• Libraries: domain specialists
• Directives and standard languages: problem specialists
• Extended standard languages (CUDA-C++, CUDA Fortran): new algorithm developers and optimization experts
INTRODUCING CUDA 10.0
• Turing and new systems: new GPU architecture, Tensor Cores, NVSwitch fabric
• CUDA platform: CUDA Graphs, Vulkan & DX12 interop, warp matrix
• Libraries: GPU-accelerated hybrid JPEG decoding, symmetric eigenvalue solvers, FFT scaling
• Developer tools: new Nsight products, Nsight Systems and Nsight Compute
Scientific Computing
CUDA 10.0 PLATFORM SUPPORT: New OS and Host Compilers
Platform | OS version | Compilers
Linux | Ubuntu 18.04.1 LTS, 16.04.5 LTS, 14.04.5 LTS; RHEL/CentOS 7.5; RHEL 7.5 (POWER LE); SLES 15; Fedora 27; OpenSUSE Leap 15 | GCC 7.x, PGI 18.x, Clang 6.0.x, ICC 18, XLC 16.1.x (POWER)
Windows | Windows Server 2016, Windows Server 2012 R2 | Microsoft Visual Studio 2017 (15.x)
Mac | macOS 10.13.6 | Xcode 9.4
POWERING THE DEEP LEARNING ECOSYSTEM: NVIDIA SDK accelerates every major framework
• Computer vision: object detection, image classification
• Speech & audio: voice recognition, language translation
• Natural language processing: recommendation engines, sentiment analysis
Deep learning frameworks, built on the NVIDIA Deep Learning SDK and CUDA
developer.nvidia.com/deep-learning-software
RISE OF NVIDIA GPU COMPUTING
[Figure: 40 Years of CPU Trend Data, 1980 to 2020, log scale, with GPU-computing performance overlaid. Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten; new plot and data collected for 2010 to 2015 by K. Rupp.]
• Single-threaded perf: 1.5X per year, then 1.1X per year
• GPU-computing perf: 1.5X per year, 1000X by 2025
CUDA: Domain-Specific Computing Architecture, 10X in 5 Years
[Figure: FLOPS per transistor across Kepler, Maxwell, Pascal, and Volta]
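The growth rates on this slide compound annually, so the time needed to reach a given gain is log(gain) / log(rate). The short check below uses only the rates quoted above. The slide does not state a base year for its “1000X by 2025” projection, so the sketch computes only how many years each rate would need:

```python
import math

def years_to_reach(gain: float, rate_per_year: float) -> float:
    """Years of compounding at rate_per_year needed to multiply performance by gain."""
    return math.log(gain) / math.log(rate_per_year)

for rate in (1.5, 1.1):
    print(f"{rate}X per year -> 1000X after {years_to_reach(1000, rate):.1f} years")
# 1.5X per year needs about 17 years; 1.1X per year needs about 72.5 years.

# Conversely, "10X in 5 years" corresponds to about 1.58X per year.
print(f"{10 ** (1 / 5):.2f}X per year")
```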
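University Program
• GPU Educator
• DLI Ambassador
GPU Education Centers: GPU Educators in Universities
No. | University | Instructor | Course | Students
1 | Ocean University of China | Wang Shengke (王胜科) | Computer Vision | 18
2 | | | Introduction to Computing | 180
3 | Tsinghua University Graduate School at Shenzhen | Yuan Bo (袁博) | Advanced Computing Technologies and Applications | 20
4 | Hangzhou Dianzi University | Du Peng (杜鹏) | 3D Graphics and Imaging Design | 30
5 | | | Object-Oriented and Parallel Techniques | 9
6 | Huazhong University of Science and Technology | Lu Zhihong (路志宏) | Parallel Computing | 45
7 | | | Modeling and Simulation of Fluid Computation | 30
8 | Zhejiang University | Tang Min (唐敏) | GPU Computing and Engineering Applications | 64
9 | | | GPU Special-Effects Rendering | 24
10 | Lanzhou University | Zhou Qingguo (周庆国) | Robot-Based Practical Methods | 70
11 | | | GPU Parallel Computing Programming | 100
12 | University of Chinese Academy of Sciences | Liu Ying (刘莹) | Parallel and Distributed Computing | 67
13 | Harbin Institute of Technology | Su Tonghua (苏统华) | Introduction to Deep Learning + Embedded Systems + Parallel Program Design + IoT Intelligent Information Processing | 50
14 | | | DLI Deep Learning Training | 70
15 | Nanjing University | Yu Ying (于莹) | Parallel Computing Program Design | 60
16 | | | GPUs and Artificial Intelligence | 50
17 | Beijing Normal University | Sun Bo (孙波) | Data Visualization | 40
18 | | | Scientific Visualization | 30
19 | Nankai University | Wang Gang / Ren Mingming / Li Tao (王刚/任明明/李涛) | Parallel Program Design | 27
20 | | | Parallel Computing | 72
21 | | | IoT System Timing | 15
22 | | | Big Data Computing and Applications | 116
23 | | | Web Data Mining | 96
24 | Chengdu University | Han Qiyi (韩祺祎) | Overview of NVIDIA GPU Development | 29
25 | Yangtze University | Zhang Gong (张宫) | GPU Programming | 15
26 | | | High-Performance Computing and Artificial Intelligence | 150
27 | Jiangsu University of Science and Technology | Liu Zhen (刘镇) | GPU Parallel Computing | 120
28 | | | GPUs and Parallel Embedded Systems | 20
29 | | | New Information Processing Technologies | 30
30 | Southwest Petroleum University | Peng Bo (彭博) | High-Performance Computing | 35
31 | Wenzhou University | Zhao Hanli (赵汉理) | High-Performance Parallel Computing | 48
32 | Shanghai University | Zhang Xu (张旭) | Visual Inspection | 80
33 | Sun Yat-sen University | Zhang Yongdong (张永东) | High-Performance Computing | 120
34 | Guilin University of Electronic Technology | He Qian (何倩) | Advanced Computer Architecture | 39
35 | Tongji University | Zhang Yichao (张毅超) | Parallel Programming Principles and Practice | 30
Total students: 1999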
GPU Education Courses
Robotics · Accelerated Computing · Deep Learning
Course Overview
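Developed together with Prof. Yann LeCun and his team at New York University (NYU), the Deep Learning Institute (DLI) teaching kit covers both introductory and advanced deep learning content, including:
▪ Introduction to machine learning and deep learning
▪ Image classification applications
▪ Object detection applications
▪ Convolutional neural networks
▪ Image segmentation applications
▪ Energy-based learning
▪ Unsupervised learning
▪ Generative adversarial networks
▪ Recurrent neural networks
▪ Natural language processing
▪ And more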
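Developed together with Prof. John Seng and his team at California Polytechnic State University (Cal Poly), the robotics teaching kit covers introductory and advanced multidisciplinary content:
• Introduction to robotics and Jetson products
• ROS (Robot Operating System)
• Sensors
• Computer vision
• Machine learning
• Dead reckoning
• Path planning
• And more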
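Path planning is one of the kit's topics. As a minimal, hypothetical illustration (not material from the kit itself), breadth-first search finds a shortest 4-connected path on a small occupancy grid. Practical robot planners more often use A* or sampling-based methods over cost maps:

```python
from collections import deque

def bfs_path(grid, start, goal):
    """Shortest path on a 4-connected grid where '#' marks obstacles; None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    prev = {start: None}                    # doubles as visited set and back-pointer map
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:                    # rebuild the path by following back-pointers
            path = []
            while cell is not None:
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != '#' and (nr, nc) not in prev:
                prev[(nr, nc)] = cell
                queue.append((nr, nc))
    return None

grid = ["....",
        ".##.",
        "...."]
path = bfs_path(grid, (0, 0), (2, 3))
print(len(path) - 1, path)   # number of moves, then the cells visited
```

Because BFS expands cells in order of distance from the start, the first time it reaches the goal is along a path with the fewest moves.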
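Developed together with Prof. Wen-mei Hwu and his team at the University of Illinois at Urbana-Champaign (UIUC), the accelerated computing teaching kit covers introductory and advanced accelerated parallel computing:
• Introduction to CUDA C
• Memory and data locality
• Memory access performance
• Parallel computation patterns
• Histogram, stencil, reduction, scan
• Programming models for efficient host-device data transfer
• OpenACC, MPI, OpenCL
• And more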
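Reduction and scan are two of the parallel patterns listed above. The Python sketch below emulates their step structure sequentially. The tree reduction doubles its stride each step, so it needs about log2(n) steps, and the Hillis-Steele inclusive scan doubles its offset each step. Each inner loop stands for work that GPU threads would do in parallel. This is an illustration, not code from the UIUC kit:

```python
def tree_reduce(values):
    """Pairwise (tree) sum: each step combines elements `stride` apart, as in a GPU reduction."""
    data = list(values)
    stride = 1
    while stride < len(data):
        # All iterations of this loop are independent, so they could run as parallel threads.
        for i in range(0, len(data) - stride, 2 * stride):
            data[i] += data[i + stride]
        stride *= 2
    return data[0] if data else 0

def hillis_steele_scan(values):
    """Inclusive prefix sum: at each step, element i adds the element `offset` positions back."""
    data = list(values)
    offset = 1
    while offset < len(data):
        prev = data[:]   # read the previous step's values so updates in one step don't interfere
        for i in range(offset, len(data)):
            data[i] = prev[i] + prev[i - offset]
        offset *= 2
    return data

xs = [3, 1, 7, 0, 4, 1, 6, 3]
print(tree_reduce(xs))          # 25
print(hillis_steele_scan(xs))   # [3, 4, 11, 11, 15, 16, 22, 25]
```

Hillis-Steele does O(n log n) additions in total. Work-efficient scans such as Blelloch's trade extra steps for O(n) work.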
DLI Teaching Kit
Lecture 1.1 – Course Introduction
Deep Learning Teaching Kit
Lecture Materials
Module 1 - Introduction to Machine Learning
  1.1 - Course Introduction
  1.2 - Introduction to Machine Learning
  1.3 - Introduction to Neural Networks
Module 2 - Introduction to Deep Learning
  2.1 - Introduction to Deep Learning
  2.2 - Deep Supervised Learning (modular approach) – Part 1
  2.3 - Deep Supervised Learning (modular approach) – Part 2
Module 3 - Convolutional Neural Networks
  3.1 - History of Convolutional Networks
  3.2 - Convolutional Networks and Computer Vision, Audio and Other Domains
  3.3 - Structural Prediction and Natural Language Processing
Module 4 - Energy-based Learning
  4.1 - Energy-based Learning
  4.2 - Unsupervised Learning
  4.3 - Sparse Coding
Module 5 - Optimization Techniques
  5.1 - Efficient Learning and Second-order Methods
Module 6 - Learning with Memory
  6.1 - Recurrent Neural Network Basics
  6.2 - Advanced Recurrent Neural Networks
  6.3 - Sequence Modeling with Deep Learning
  6.4 - Embedding Methods for NLP: Unsupervised and Supervised Embeddings
  6.5 - Embedding Methods for NLP: Embeddings for Multi-relational Data
  6.6 - Deep Natural Language Processing
Module 7 - Future Challenges
  7.1 - Future Challenges
Lab Exercises
NVIDIA DLI Online Qwiklab 1: Image Classification with NVIDIA DIGITS
Lab 1
  1.1 - Backpropagation: logistic regression, softmax expression
  1.2 - MNIST Handwritten Digit Recognition (Torch) (programming)
NVIDIA DLI Online Qwiklab 2: Object Detection with NVIDIA DIGITS
Lab 2A
  2A.1 - More Backpropagation
  2A.2 - STL10: Semi-supervised Image Recognition (Torch) (programming): visualizing filters and augmentations, t-SNE
Lab 2B
  2B.1 - Backpropagation: nonlinear activation functions, softmax
  2B.2 - Techniques: optimization, reducing overfitting, initialization
  2B.3 - MNIST: Semi-supervised Image Recognition (PyTorch) (programming)
NVIDIA DLI Online Qwiklab 3: Image Segmentation with TensorFlow
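Lab 1.1 covers backpropagation for logistic regression with a softmax output. With softmax followed by cross-entropy loss, the gradient with respect to the logits reduces to p - y (predicted probabilities minus the one-hot target). In multinomial logistic regression, this term is then multiplied by the input to give the weight gradient. The stdlib-only sketch below computes the logit gradient and checks it against a finite-difference estimate. It is an illustrative example, not code from the teaching kit:

```python
import math

def softmax(z):
    """Numerically stable softmax: subtract max(z) before exponentiating."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(z, target):
    """Loss for logits z and integer class label target: -log(softmax(z)[target])."""
    return -math.log(softmax(z)[target])

def grad_logits(z, target):
    """Analytic gradient of the loss w.r.t. the logits: softmax(z) - one_hot(target)."""
    p = softmax(z)
    return [p_i - (1.0 if i == target else 0.0) for i, p_i in enumerate(p)]

z, target = [2.0, 1.0, 0.1], 0
analytic = grad_logits(z, target)

# Central finite differences as an independent check of the backpropagated gradient.
eps = 1e-6
numeric = []
for i in range(len(z)):
    z_plus, z_minus = z[:], z[:]
    z_plus[i] += eps
    z_minus[i] -= eps
    numeric.append((cross_entropy(z_plus, target) - cross_entropy(z_minus, target)) / (2 * eps))

print([round(g, 4) for g in analytic])
print(max(abs(a - n) for a, n in zip(analytic, numeric)) < 1e-6)  # True
```

The gradient sums to zero and is negative only at the true class, so a gradient-descent step raises that logit and lowers the others.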
The Traditional Deep Learning Programming Workflow
• DL frameworks such as Caffe are aimed at computer scientists, not data scientists
• Juggle multiple files & windows
• Handcrafted visualizations
• Manual log file parsing
• Manual experiment logging
• Model editing in Lua IDE files
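NVIDIA DIGITS: Interactive Deep Learning GPU Training System
Dataset processing · Deep neural network configuration · Result visualization and process monitoring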
Image Classification Without Writing Code (Deep Learning Training for Absolute Beginners)
Hou Yutao (侯宇涛), Developer Marketing Director; Certified Instructor, NVIDIA Deep Learning Institute; NVIDIA China
HANDWRITTEN DIGIT RECOGNITION
HELLO WORLD of machine learning?
DIGITS: Training Dataset Format for Image Classification
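A common layout for image-classification training data, and one that DIGITS can import, is one subfolder per class, with each folder name serving as the label. Assuming that layout, the sketch below lists the images and assigns integer labels in sorted folder order. The helper `list_labeled_images`, the accepted file extensions, and the "0"/"1" folder names are illustrative assumptions, not DIGITS internals:

```python
import os
import tempfile

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".bmp"}   # assumed set of image file types

def list_labeled_images(root):
    """Assume root/<class_name>/<image files>; return (class names, [(path, label_index)])."""
    classes = sorted(d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d)))
    samples = []
    for label, name in enumerate(classes):
        class_dir = os.path.join(root, name)
        for fname in sorted(os.listdir(class_dir)):
            if os.path.splitext(fname)[1].lower() in IMAGE_EXTS:   # skip non-image files
                samples.append((os.path.join(class_dir, fname), label))
    return classes, samples

# Build a tiny throwaway tree with hypothetical MNIST-style class folders "0" and "1".
with tempfile.TemporaryDirectory() as root:
    for cls, files in {"0": ["a.png", "b.png"], "1": ["c.png", "notes.txt"]}.items():
        os.makedirs(os.path.join(root, cls))
        for f in files:
            open(os.path.join(root, cls, f), "w").close()
    classes, samples = list_labeled_images(root)
    for path, label in samples:
        print(os.path.relpath(path, root), label)
```

Sorting the folder names keeps the label indices stable between runs, which matters if a trained model's output indices are later mapped back to class names.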
NVIDIA DIGITS Cloud
Mobile
How to get DIGITS
Simple way:
➢ OS – Ubuntu 14.04
➢ Download link: https://developer.nvidia.com/digits
Others (from source code):
➢ Download NVIDIA-Caffe: https://github.com/NVIDIA/caffe
➢ Download DIGITS: https://github.com/NVIDIA/DIGITS
Recommended HW/SW environment:
➢ GPU with Compute Capability 3.0 or higher (Kepler and later), cuDNN v5
➢ OS – Ubuntu 14.04
Robotics Teaching Kit with ‘Jet’
Robot Overview: Block Diagram
[Figure: block diagram with a Jetson TK1 as the CPU, an Arduino Mega with an H-Bridge shield, left and right motors, a camera, an accelerometer/gyro (GY-521), three sonar modules, and encoder readings; links labeled USB, USB, and I2C.]
Jet Robotics Teaching Kit: Module Goals
Learn interdisciplinary, GPU-accelerated, autonomous robotics
Technical subjects: sensors, computer vision, machine learning, dead reckoning, path planning, localization, control, obstacle avoidance
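Dead reckoning estimates the robot's pose by integrating wheel-encoder readings. For a differential-drive robot with left and right drive motors, such as the Jet robot, a standard odometry update is sketched below. `wheel_base` and the per-step wheel distances are assumed inputs, and the conversion from encoder ticks to distance is omitted:

```python
import math

def odometry_step(x, y, theta, d_left, d_right, wheel_base):
    """Update pose (x, y, theta) from distances travelled by the left and right wheels.

    Translates along the midpoint heading, a common first-order approximation.
    """
    d_center = (d_left + d_right) / 2.0          # distance travelled by the robot's center
    d_theta = (d_right - d_left) / wheel_base    # heading change in radians, CCW positive
    x += d_center * math.cos(theta + d_theta / 2.0)
    y += d_center * math.sin(theta + d_theta / 2.0)
    theta = (theta + d_theta + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
    return x, y, theta

pose = (0.0, 0.0, 0.0)
pose = odometry_step(*pose, d_left=1.0, d_right=1.0, wheel_base=0.2)   # drive straight 1 m
pose = odometry_step(*pose, d_left=-0.05 * math.pi, d_right=0.05 * math.pi, wheel_base=0.2)  # spin in place
print(tuple(round(v, 3) for v in pose))   # (1.0, 0.0, 1.571)
```

Because each step adds a small error, the pose estimate drifts over time. That drift is why the kit pairs dead reckoning with localization using other sensors.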
Kit Contents
Peking University
Lanzhou University
Nanjing University
Northwestern Polytechnical University
Hackathon at Shanghai Jiao Tong University
ISAAC WORKFLOW: Simulate, then Deploy
• ISAAC Sim (simulate): the ISAAC Sim engine (physics, graphics, AI) with a world model and a robot model, driving virtual sensors and virtual actuators
• ISAAC SDK (deploy): the Isaac Framework (perception, navigation, manipulation) on JetPack, TensorRT, and CUDA, with sensor I/O and actuator control for real sensors and actuators
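The workflow's key idea is that the same robot application runs against virtual sensors and actuators in simulation and against real ones after deployment. The Python sketch below illustrates that separation with a hypothetical `RangeSensor`/`DriveActuator` interface and a simple sonar-based obstacle-avoidance rule. None of these names come from the Isaac SDK, whose actual APIs differ:

```python
from typing import Protocol

class RangeSensor(Protocol):
    def read_m(self) -> float: ...          # distance to the nearest obstacle, in meters

class DriveActuator(Protocol):
    def set_speed(self, left: float, right: float) -> None: ...

def avoid_obstacles(sensor: RangeSensor, drive: DriveActuator, stop_dist_m: float = 0.3) -> str:
    """One control step: drive forward, or turn in place when an obstacle is close."""
    if sensor.read_m() < stop_dist_m:
        drive.set_speed(-0.2, 0.2)          # rotate left in place
        return "turn"
    drive.set_speed(0.5, 0.5)               # go straight
    return "forward"

class SimSonar:
    """Virtual sensor: replays distances produced by a simulated world."""
    def __init__(self, distances):
        self._it = iter(distances)
    def read_m(self) -> float:
        return next(self._it)

class SimDrive:
    """Virtual actuator: records commands instead of driving motors."""
    def __init__(self):
        self.commands = []
    def set_speed(self, left: float, right: float) -> None:
        self.commands.append((left, right))

sonar, drive = SimSonar([1.2, 0.8, 0.25]), SimDrive()
actions = [avoid_obstacles(sonar, drive) for _ in range(3)]
print(actions)   # ['forward', 'forward', 'turn']
```

On hardware, classes that wrap the real sonar and motor driver would replace `SimSonar` and `SimDrive`, and `avoid_obstacles` would stay unchanged.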
ANNOUNCING: JETSON XAVIER, Computer for Autonomous Machines
AI server performance in 30 W | 15 W | 10 W
• 512 Volta CUDA cores, 2x NVDLA
• 8-core CPU
• 30 DL TOPS
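More Resources
Get the latest DLI courses and developer resources as soon as they are released:
Follow the WeChat official account NVIDIA 开发者社区 (NVIDIA Developer Community)
Learn more DLI courses:
• Self-paced online training: www.nvidia.cn/DLIonline
• Instructor-led training: www.nvidia.cn/DLI
Join the developer community: https://developer.nvidia.com/join
Choose the right software, GPUs, and resources to accelerate your applications: www.nvidia.com/deeplearning/developer
DLI training inquiries: add the WeChat contact DLIChina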
END