NVIDIA SDK in Higher Education

侯宇涛

Director of GPU Application Marketing, NVIDIA

> Founded in 1993

> Jensen Huang, Founder & CEO

> 11,000 employees

> $123B market cap; $6.9B revenue in FY17

“World’s Most Admired Companies”— Fortune

“50 Smartest Companies: #1”— MIT Tech Review

“#3 Top CEO in the World”— Harvard Business Review

“Most Innovative Companies”— Fast Company

NVIDIA

Artificial Intelligence, Computer Graphics, GPU Computing

NVIDIA: "THE AI COMPUTING COMPANY"

NVIDIA TESLA PLATFORM SAVES MONEY

Game-Changing Inference Performance

INFERENCE WORKLOAD: Image recognition using ResNet-50

160 CPU Servers: 45,000 images/sec, 65 kW

1 HGX Server: 45,000 images/sec, 3 kW

SAME THROUGHPUT

1/20 THE SPACE

1/22 THE POWER
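The power claim can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted on the slide (server counts, images per second, kilowatts); the dictionary layout and function name are illustrative choices, not part of any NVIDIA tool.

```python
# Figures quoted on the slide: same throughput, different server count and power.
cpu = {"servers": 160, "images_per_sec": 45_000, "kilowatts": 65}
gpu = {"servers": 1, "images_per_sec": 45_000, "kilowatts": 3}


def images_per_sec_per_kw(system):
    """Throughput delivered per kilowatt of power drawn."""
    return system["images_per_sec"] / system["kilowatts"]


# 65 kW / 3 kW is about 21.7, which the slide rounds to "1/22 the power".
power_ratio = cpu["kilowatts"] / gpu["kilowatts"]

# With equal throughput, the efficiency gain equals the power ratio.
efficiency_gain = images_per_sec_per_kw(gpu) / images_per_sec_per_kw(cpu)

print(f"power ratio: {power_ratio:.1f}x")
print(f"CPU servers: {images_per_sec_per_kw(cpu):.0f} images/sec per kW")
print(f"HGX server:  {images_per_sec_per_kw(gpu):.0f} images/sec per kW")
```

Because both configurations deliver the same 45,000 images/sec, the gain in images per kilowatt is the same number as the power ratio.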

Top 10 Supercomputers

• 80% heterogeneous

• 50% NVIDIA GPU

• 30% Intel Xeon Phi

Heterogeneous Parallel Computing

Latency-Optimized: Fast Serial Processing, Logic()

Throughput-Optimized: Fast Parallel Processing, Compute()

CPU Pizza Delivery

Process: Delivery truck delivers one pizza and then moves to the next house

Original Idea by Jedox www.jedox.com

NVIDIA GPU Pizza Delivery

Process: Many deliveries to many houses

Original Idea by Jedox www.jedox.com

Accelerated Computing: Multi-core plus Many-cores

CPU: Optimized for Serial Tasks

GPU Accelerator: Optimized for Many Parallel Tasks

10x Performance, 5x Energy Efficiency

How GPU Acceleration Works

Application Code

GPU: Compute-Intensive Functions (5% of Code)

+

CPU: Rest of Sequential CPU Code
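How much the whole application gains from offloading depends on how much of the runtime those compute-intensive functions account for, not on how many lines of code they are. A minimal sketch using Amdahl's law follows. The slide does not mention Amdahl's law or any runtime share, so the runtime fractions and the 20x kernel speedup below are hypothetical values chosen purely for illustration.

```python
def overall_speedup(offloaded_runtime_fraction, kernel_speedup):
    """Amdahl's law: whole-application speedup when only part of it is accelerated."""
    if not 0.0 <= offloaded_runtime_fraction <= 1.0:
        raise ValueError("offloaded_runtime_fraction must be between 0 and 1")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    serial_part = 1.0 - offloaded_runtime_fraction
    return 1.0 / (serial_part + offloaded_runtime_fraction / kernel_speedup)


# Hypothetical runtime shares for the offloaded functions (not from the slide).
for fraction in (0.5, 0.9, 0.99):
    speedup = overall_speedup(fraction, kernel_speedup=20.0)
    print(f"{fraction:.0%} of runtime offloaded -> {speedup:.2f}x overall")
```

The sequential CPU portion bounds the total gain, which is why offloading pays off most when a small amount of code dominates the runtime.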

What is CUDA

CUDA™ is a parallel computing platform and programming model that enables dramatic increases in computing performance by harnessing the power of the graphics processing unit (GPU).

Compute Unified Device Architecture

CUDA DEVELOPMENT ECOSYSTEM

CUDA: Programming Model, GPU Architecture, System Architecture

From Ease of use to Specialized Performance:

Frameworks, Applications, Libraries, Directives and Standard Languages, Extended Standard Languages (CUDA-C++, CUDA Fortran)

Audiences: GPU Users, Domain Specialists, Problem Specialists, New Algorithm Developers and Optimization Experts

INTRODUCING CUDA 10.0

TURING AND NEW SYSTEMS: New GPU Architecture, Tensor Cores, NVSwitch Fabric

CUDA PLATFORM: CUDA Graphs, Vulkan & DX12 Interop, Warp Matrix

LIBRARIES: GPU-accelerated hybrid JPEG decoding, Symmetric Eigenvalue Solvers, FFT Scaling

DEVELOPER TOOLS: New Nsight Products – Nsight Systems and Nsight Compute

Scientific Computing

CUDA 10.0 PLATFORM SUPPORT

New OS and Host Compilers

PLATFORM | OS VERSION | COMPILERS
Linux | 18.04.1 LTS, 16.04.5 LTS, 14.04.5 LTS; 7.5; 7.5 POWER LE; SLES 15; 27; Leap 15 | GCC 7.x, PGI 18.x, Clang 6.0.x, ICC 18, XLC 16.1.x (POWER)
Windows | Windows Server 2016, 2012 R2 | Microsoft Visual Studio 2017 (15.x)
Mac | macOS 10.13.6 | Xcode 9.4

POWERING THE DEEP LEARNING ECOSYSTEM

NVIDIA SDK accelerates every major framework

COMPUTER VISION

OBJECT DETECTION IMAGE CLASSIFICATION

SPEECH & AUDIO

VOICE RECOGNITION LANGUAGE TRANSLATION

NATURAL LANGUAGE PROCESSING

RECOMMENDATION ENGINES SENTIMENT ANALYSIS

DEEP LEARNING FRAMEWORKS

NVIDIA DEEP LEARNING SDK and CUDA

developer.nvidia.com/deep-learning-software

RISE OF NVIDIA GPU COMPUTING

[Figure: 40 Years of CPU Trend Data, 1980 to 2020, logarithmic scale. Single-threaded perf: 1.5X per year, then 1.1X per year. GPU-Computing perf: 1.5X per year, 1000X by 2025.]

[Figure: CUDA – Domain Specific Computing Architecture, 10X in 5 Years. FLOPS/Transistor for Kepler, Maxwell, Pascal, Volta.]

Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten. New plot and data collected for 2010-2015 by K. Rupp.
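The per-year rates quoted on this slide compound quickly, and a short calculation makes the gap concrete. The sketch below uses only the rates that appear in the text (1.5X per year, 1.1X per year, 10X in 5 years, 1000X). The 10-year horizon is an arbitrary choice for illustration, and the function names are hypothetical.

```python
import math


def compound_growth(rate_per_year, years):
    """Total improvement factor after compounding a yearly rate."""
    return rate_per_year ** years


def years_to_reach(factor, rate_per_year):
    """Years of compounding needed to reach a given total factor."""
    return math.log(factor) / math.log(rate_per_year)


def annual_rate(total_factor, years):
    """Equivalent yearly rate for a total factor achieved over some years."""
    return total_factor ** (1.0 / years)


horizon = 10  # illustrative horizon, not taken from the slide
print(f"1.5X per year over {horizon} years: {compound_growth(1.5, horizon):.1f}x")
print(f"1.1X per year over {horizon} years: {compound_growth(1.1, horizon):.1f}x")
print(f"10X in 5 years is about {annual_rate(10, 5):.2f}X per year")
print(f"1000X at 1.5X per year takes about {years_to_reach(1000, 1.5):.1f} years")
```

At 1.5X per year a decade yields well over fifty-fold improvement, while 1.1X per year yields less than three-fold.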

University Program

• GPU Educator

• DLI Ambassador

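GPU Education Center: GPU Educators in Universities

No. | University | Instructor | Course | Students
1 | Ocean University of China | 王胜科 | Computer Vision | 18
2 | | | Introduction to Computing | 180
3 | Graduate School at Shenzhen, Tsinghua University | 袁博 | Advanced Computing Technologies and Applications | 20
4 | Hangzhou Dianzi University | 杜鹏 | 3D Graphics and Imaging Design | 30
5 | | | Object-Oriented and Parallel Techniques | 9
6 | Huazhong University of Science and Technology | 路志宏 | Parallel Computing | 45
7 | | | Modeling and Simulation for Fluid Computation | 30
8 | Zhejiang University | 唐敏 | GPU Computing and Engineering Applications | 64
9 | | | GPU Special-Effects Rendering | 24
10 | Lanzhou University | 周庆国 | Robot-Based Practical Methods | 70
11 | | | GPU Parallel Computing Programming | 100
12 | University of Chinese Academy of Sciences | 刘莹 | Parallel and Distributed Computing | 67
13 | Harbin Institute of Technology | 苏统华 | Introduction to Deep Learning + Embedded Systems + Parallel Programming + IoT Intelligent Information Processing | 50
14 | | | DLI Deep Learning Training | 70
15 | Nanjing University | 于莹 | Parallel Computing Programming | 60
16 | | | GPU and Artificial Intelligence | 50
17 | Beijing Normal University | 孙波 | Data Visualization | 40
18 | | | Scientific Visualization | 30
19 | Nankai University | 王刚 / 任明明 / 李涛 | Parallel Programming | 27
20 | | | Parallel Computing | 72
21 | | | IoT System Timing | 15
22 | | | Big Data Computing and Applications | 116
23 | | | Web Data Mining | 96
24 | Chengdu University | 韩祺祎 | Overview of NVIDIA GPU Development | 29
25 | Yangtze University | 张宫 | GPU Programming | 15
26 | | | High-Performance Computing and Artificial Intelligence | 150
27 | Jiangsu University of Science and Technology | 刘镇 | GPU Parallel Computing | 120
28 | | | GPU and Parallel Embedded Systems | 20
29 | | | New Technologies in Information Processing | 30
30 | Southwest Petroleum University | 彭博 | High-Performance Computing | 35
31 | Wenzhou University | 赵汉理 | High-Performance Parallel Computing | 48
32 | Shanghai University | 张旭 | Visual Inspection | 80
33 | Sun Yat-sen University | 张永东 | High-Performance Computing | 120
34 | Guilin University of Electronic Technology | 何倩 | Advanced Computer Architecture | 39
35 | Tongji University | 张毅超 | Principles and Practice of Parallel Programming | 30
Total Students | 1999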

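GPU Education Courses

Course Introduction

DEEP LEARNING

The Deep Learning Institute Teaching Kit, developed with Professor Yann LeCun of New York University (NYU) and his team, covers introductory and advanced deep learning topics, including:

▪ Introduction to machine learning and deep learning
▪ Image classification
▪ Object detection
▪ Convolutional neural networks
▪ Image segmentation
▪ Energy-based learning
▪ Unsupervised learning
▪ Generative adversarial networks
▪ Recurrent neural networks
▪ Natural language processing
▪ Others

ROBOTICS

The Robotics Teaching Kit, developed with Professor John Seng of California Polytechnic State University (Cal Poly) and his team, covers introductory and advanced multidisciplinary topics:

• Introduction to robotics and Jetson products
• ROS (Robot Operating System)
• Sensors
• Computer vision
• Machine learning
• Dead reckoning
• Path planning
• And more

ACCELERATED COMPUTING

The Accelerated Computing Teaching Kit, developed with Professor Wen-Mei Hwu of the University of Illinois (UIUC) and his team, covers introductory and advanced accelerated parallel computing topics:

• Introduction to CUDA C
• Memory and data locality
• Memory access performance
• Parallel computation patterns
• Histogram, stencil, reduction, scan
• Programming models for efficient host-device data transfer
• OpenACC, MPI, OpenCL
• And more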
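The four parallel computation patterns named in the accelerated computing kit (histogram, stencil, reduction, scan) can be sketched in plain Python to show their data-access structure. This is an illustrative sketch, not material from the teaching kit: the loops below run sequentially, and the comments mark where a GPU implementation would assign independent threads. All function names are hypothetical.

```python
def histogram(values, num_bins):
    """Count occurrences of each integer bin index in values."""
    bins = [0] * num_bins
    for v in values:
        bins[v] += 1  # on a GPU this update would need an atomic add
    return bins


def stencil_3pt(values):
    """Replace each interior element with the sum of itself and its two neighbours."""
    out = list(values)
    for i in range(1, len(values) - 1):  # every i is independent of the others
        out[i] = values[i - 1] + values[i] + values[i + 1]
    return out


def tree_reduce(values):
    """Sum values pairwise; each pass combines elements `stride` apart."""
    data = list(values)
    if not data:
        return 0
    stride = 1
    while stride < len(data):
        for i in range(0, len(data) - stride, 2 * stride):  # independent pairs
            data[i] += data[i + stride]
        stride *= 2
    return data[0]


def inclusive_scan(values):
    """Prefix sums computed in log2(n) passes (Hillis-Steele style)."""
    data = list(values)
    offset = 1
    while offset < len(data):
        previous = list(data)  # read old values, write new ones (double buffering)
        for i in range(offset, len(data)):  # every i is independent within a pass
            data[i] = previous[i] + previous[i - offset]
        offset *= 2
    return data


print(histogram([0, 1, 1, 3], num_bins=4))
print(stencil_3pt([1, 2, 3, 4]))
print(tree_reduce([1, 2, 3, 4, 5]))
print(inclusive_scan([1, 2, 3, 4]))
```

The reduction and scan finish in a logarithmic number of passes, which is what makes them attractive when each pass can be spread across many threads.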

Registration and Download

https://developer.nvidia.com/educators

DLI Teaching Kit

Lecture 1.1 – Course Introduction

Deep Learning Teaching Kit

Lecture Slides

Module 1 - Introduction to Machine Learning
1.1 - Course Introduction
1.2 - Introduction to Machine Learning
1.3 - Introduction to Neural Networks

Module 2 - Introduction to Deep Learning
2.1 - Introduction to Deep Learning
2.2 - Deep Supervised Learning (modular approach) – Part 1
2.3 - Deep Supervised Learning (modular approach) – Part 2

Module 3 - Convolutional Neural Networks
3.1 - History of Convolutional Networks
3.2 - Convolutional Networks and Computer Vision, Audio and Other Domains
3.3 - Structural Prediction and Natural Language Processing

Module 4 - Energy-based Learning
4.1 - Energy-based Learning
4.2 - Unsupervised Learning
4.3 - Sparse Coding

Module 5 - Optimization Techniques
5.1 - Efficient Learning and Second-order Methods

Module 6 - Learning with Memory
6.1 - Recurrent Neural Network Basics
6.2 - Advanced Recurrent Neural Networks
6.3 - Sequences Modeling with Deep Learning
6.4 - Embedding Methods for NLP: Unsupervised and Supervised Embeddings
6.5 - Embedding Methods for NLP: Embeddings for Multi-relational Data
6.6 - Deep Natural Language Processing

Module 7 - Future Challenges
7.1 - Future Challenges

Teaching Labs

NVIDIA DLI Online Qwiklab 1: Image Classification with NVIDIA DIGITS

Lab 1
1.1 - Backpropagation
  - Logistic regression
  - Softmax expression
1.2 - MNIST Handwritten Digit Recognition (Torch) (programming)
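As a rough illustration of the Lab 1 topics (the softmax expression and the gradient that backpropagation starts from), here is a minimal sketch in plain Python. The lab itself uses Torch; this is not the lab's code, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_loss(logits, target_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(softmax(logits)[target_index])


def cross_entropy_grad(logits, target_index):
    """Gradient of the loss w.r.t. the logits: probabilities minus the one-hot target."""
    probs = softmax(logits)
    return [p - (1.0 if i == target_index else 0.0) for i, p in enumerate(probs)]


scores = [1.0, 2.0, 3.0]
print(softmax(scores))
print(cross_entropy_loss(scores, target_index=2))
print(cross_entropy_grad(scores, target_index=2))
```

With two classes, softmax reduces to the logistic (sigmoid) function used in logistic regression, which is why the two topics are taught together.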

NVIDIA DLI Online Qwiklab 2: Object Detection with NVIDIA DIGITS

Lab 2A
2A.1 - More Backpropagation
2A.2 - STL10: Semi-supervised Image Recognition (Torch) (programming)
  - Visualizing filters and augmentations
  - t-SNE

Lab 2B
2B.1 - Backpropagation
  - Nonlinear activation functions
  - Softmax
2B.2 - Techniques
  - Optimization
  - Reducing overfitting
  - Initialization
2B.3 - MNIST: Semi-supervised Image Recognition (PyTorch) (programming)

NVIDIA DLI Online Qwiklab 3: Image Segmentation with TensorFlow

The Traditional Deep Learning Programming Workflow

• DL frameworks (Caffe, etc.) are aimed at computer scientists, not data scientists

• Juggle multiple files & windows

• Handcrafted visualizations

• Manual log file parsing

• Manual experiment logging

• Model editing in Lua IDE files

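NVIDIA DIGITS

Interactive Deep Learning GPU Training System

Dataset processing, deep neural network configuration, result visualization, process monitoring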

Image Classification Without Writing Code (Deep Learning Training for Absolute Beginners)

侯宇涛

Developer Marketing Director

Certified Instructor, NVIDIA Deep Learning Institute

NVIDIA China

HANDWRITTEN DIGIT RECOGNITION

HELLO WORLD of machine learning?

DIGITS: Training Dataset Format for Image Classification

NVIDIA DIGITS Cloud

Mobile

How to get DIGITS

Simple way:

➢ OS: Ubuntu 14.04

➢ Download link: https://developer.nvidia.com/digits

Others (from source code):

➢ Download NVIDIA-Caffe: https://github.com/NVIDIA/caffe

➢ Download DIGITS: https://github.com/NVIDIA/DIGITS

Recommended HW/SW environment:

➢ GPU Compute Capability 3.0 or higher (Kepler and later), cuDNN v5

➢ OS: Ubuntu 14.04

Robotics Teaching Kit

With 'Jet'

Robot Overview: Block Diagram

[Block diagram. Components: Jetson TK1 (CPU), Arduino Mega, H-Bridge Shield, Left Motor, Right Motor, Camera, Accel/Gyro (GY-521), three Sonar Modules. Connection labels: USB, USB, I2C, Encoder readings.]

JET Robotics Teaching Kit: Module Goals

Learn interdisciplinary, GPU-accelerated, autonomous Robotics

Technical subjects

• Sensors
• Computer Vision
• Machine Learning
• Dead Reckoning
• Path Planning
• Localization
• Control
• Obstacle Avoidance
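Of the technical subjects listed, dead reckoning is the easiest to show in a few lines: the robot estimates its pose purely from wheel encoder readings. The sketch below assumes a two-wheel differential-drive robot, and the parameter values (`ticks_per_meter`, `wheel_base_m`) are hypothetical, not specifications of the Jet kit.

```python
import math


def dead_reckon(pose, left_ticks, right_ticks, ticks_per_meter, wheel_base_m):
    """Advance an (x, y, heading) pose using encoder tick counts from both wheels."""
    x, y, heading = pose
    left_m = left_ticks / ticks_per_meter    # distance travelled by the left wheel
    right_m = right_ticks / ticks_per_meter  # distance travelled by the right wheel
    distance = (left_m + right_m) / 2.0      # distance travelled by the robot centre
    turn = (right_m - left_m) / wheel_base_m  # change in heading, in radians
    # Approximate motion along the arc by using the heading at its mid-point.
    mid_heading = heading + turn / 2.0
    x += distance * math.cos(mid_heading)
    y += distance * math.sin(mid_heading)
    return (x, y, heading + turn)


# Hypothetical robot parameters for illustration only.
pose = (0.0, 0.0, 0.0)
pose = dead_reckon(pose, 1000, 1000, ticks_per_meter=1000, wheel_base_m=0.2)
print(pose)  # both wheels moved equally, so the robot drove straight ahead
pose = dead_reckon(pose, -100, 100, ticks_per_meter=1000, wheel_base_m=0.2)
print(pose)  # wheels moved in opposite directions, so the robot turned in place
```

Encoder errors accumulate with every update, which is why dead reckoning is usually combined with the other listed subjects, such as localization from sensors.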

Kit Contents

Peking University

Lanzhou University

Nanjing University

Northwestern Polytechnical University

Hackathon at Shanghai Jiao Tong University

ISAAC WORKFLOW

Simulate (ISAAC Sim): ISAAC Sim engine (Physics, Graphics, AI); World Model, Robot Model; Virtual Sensors, Virtual Actuators

Deploy (ISAAC SDK): Isaac Framework (Perception, Navigation, Manipulation); Jetpack, TensorRT, CUDA; Sensor I/O, Actuator Control; Sensors, Actuators

ANNOUNCING: JETSON XAVIER

Computer for Autonomous Machines

AI Server Performance in 30W 15W 10W

512 Volta CUDA Cores 2x NVDLA

8 core CPU

30 DL TOPS

More Resources

Stay up to date with the latest DLI courses and developer resources: follow the WeChat official account NVIDIA 开发者社区 (NVIDIA Developer Community)

Learn more DLI courses

Self-paced online training: www.nvidia.cn/DLIonline

Instructor-led training: www.nvidia.cn/DLI (also http://www.nvidia.com/dli)

Join the developer community: https://developer.nvidia.com/join

Choose the right software, GPUs, and resources to accelerate your applications: www.nvidia.com/deeplearning/developer

DLI training inquiries: add the WeChat contact DLIChina

END
