Deep Learning Engine for Automotive Embedded Systems: NVIDIA DRIVE PX
馬路 徹 (Toru Baji), Technical Advisor and GPU Evangelist
NVIDIA DRIVE PX2: A Platform for Automotive Deep Learning and Autonomous Driving
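Agenda
• NVIDIA's automotive business
• Advanced image recognition with deep learning
• GPU: an engine for deep learning and massively parallel processing
• DRIVE PX2: an in-vehicle platform for deep learning and massively parallel processing
• DriveWorks: a SW framework for ADAS and autonomous driving
• Visualizing autonomous driving in operation
• Recent autonomous driving applications (public information)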
NVIDIA's Automotive Business
Automotive Experience: 10 Years
Units Shipped: 10+ M
Car Models: 80
NVIDIA SDK (SOFTWARE DEVELOPMENT KIT)
The Essential Resource for OEM, Tier 1, and Ecosystem Proliferation
developer.nvidia.com | Available Now
NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.
THE NEW REALIZATION
"Modules, modules and more modules. There's
so many modules there. If we were to strip off
this car, we'd probably have a basketful of
modules -- little black boxes that do something.
It's getting out of control. They're very
expensive. They're tough to package. They're
very complex.
"I'd like to see a monster module that controls
the entire vehicle and that's easier to upgrade."
Ralph Gilles, Fiat Chrysler Automobiles
Global Design Chief
Automotive News, February 28, 2016
Localization
Planning
Visualization
Perception
Self-Driving Software
AI - Speech
SurroundView
Smart Mirror
GPU Virt
Cockpit Software
THE FUTURE OF CAR COMPUTERS: ONLY TWO MAIN INTEGRATED MODULES
Cockpit Computer: DRIVE CX
Self-Driving Computer: DRIVE PX
• Two computers replace many ECUs
• Both have access to cameras/sensors
• Multiple OSs, displays
• Powered by artificial intelligence
• Upgradeable SW replaces HW ECUs
• One architecture
• Higher performance
• Lower total cost
Advanced Image Recognition with Deep Learning
DL REVOLUTIONIZES CAR COMPUTER VISION
CONVENTIONAL
• Required separate algorithms/apps
  - Pedestrian: HOG etc.
  - Traffic sign: Hough transform + character recognition etc.
• Only simple context recognition
  - Pedestrian Y/N only (no additional info)
  - Speed limit signs only
DEEP NEURAL NETWORK
(…)
• One deep neural net app can detect various objects
  - Pedestrians, cars, traffic signs, lanes
  - Also with many attributes (car: police car, van, sedan, truck, ambulance…)
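As a rough illustration of the "one network, many object types and attributes" point, the sketch below turns the raw output scores of a single hypothetical network into an object label plus a car attribute. The class lists mirror the slide; the scores, function names, and two-head layout are assumptions for illustration only and do not describe DRIVENet's actual structure.

```python
import math

# Class lists taken from the slide; the two-head layout is an assumption.
OBJECT_CLASSES = ["pedestrian", "car", "traffic sign", "lane"]
CAR_ATTRIBUTES = ["police car", "van", "sedan", "truck", "ambulance"]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pick(names, scores):
    """Return the most probable name and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return names[best], probs[best]


def describe(object_scores, car_scores):
    """Combine the object head and the (hypothetical) car-attribute head."""
    obj, _ = pick(OBJECT_CLASSES, object_scores)
    if obj == "car":
        attribute, _ = pick(CAR_ATTRIBUTES, car_scores)
        return f"{obj} ({attribute})"
    return obj


if __name__ == "__main__":
    # Made-up scores standing in for one forward pass of the network.
    print(describe([0.1, 2.5, 0.3, -1.0], [0.2, 0.1, 1.8, 0.4, -0.3]))
```

A conventional pipeline would need a separate detector per object type; here adding a class only means adding an output to the same network.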
KITTI Dataset: Object Detection, NVIDIA DRIVENet
[Chart: top score on KITTI object detection by month, 7/2015 to 12/2015 (y-axis 30% to 100%); labeled values 39%, 55%, 72%, 88%]
VERY SHORT TIME TO GET TOP-CLASS SCORE
EVERYBODY USING GPU! (Not the latest ranking)
Courtesy of Cityscape
Courtesy of Daimler
Courtesy of Audi
“Using NVIDIA DIGITS deep
learning platform, in less than
four hours we achieved over 96%
accuracy using Ruhr University
Bochum’s traffic sign database.
While others invested years of
development to achieve similar
levels of perception with
classical computer vision
algorithms, we have been able
to do it at the speed of light.”
Matthias Rudolph, Director of Architecture,
Driver Assistance Systems, Audi
GPU: An Engine for Deep Learning and Massively Parallel Processing
NVIDIA GPU: BIG CONTRIBUTION TO SUPERCOMPUTERS
USING CUDA (GPU Massive Parallel Computing)
CUDA: Compute Unified Device Architecture
From SC TOP500 November 2015
LEAPS IN SUPERCOMPUTER GPU ADOPTION
[Chart: number of accelerated systems in the TOP500 list, Nov 2013, Nov 2014, Nov 2015 (y-axis 0 to 120)]
Accelerated Systems x2 from 2013 to 2015
96% of New Systems using NVIDIA GPU
CUDA: A Massively Parallel Programming Environment
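Representative CUDA-enabled libraries
cuDNN: deep learning
cuBLAS: matrix operations (dense matrices)
cuSPARSE: matrix operations (sparse matrices)
cuFFT: Fourier transforms
cuRAND: random number generation
NPP: image processing primitives
cuSOLVER: matrix solvers (y=Ax)
Thrust: C++ template library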
…
https://developer.nvidia.com/gpu-accelerated-libraries
CUDA (Compute Unified Device Architecture)
SOLID GPU ROADMAP
[Chart: SGEMM / W (y-axis, 0 to 72) by year (x-axis, 2008 to 2018) for the Tesla, Fermi, Kepler, Maxwell, Pascal, and Volta GPU architectures]
Pascal: Mixed Precision, Double Precision, 3D Memory, NVLink
NVIDIA ONE-ARCHITECTURE: FROM SUPERCOMPUTER TO AUTOMOTIVE SOC
Tesla: in supercomputers
Quadro: in workstations
GeForce: in PCs
Mobile GPU: in Tegra
Automotive Tegra
PARALLEL PROCESSING AND AI/DL EVERYWHERE, WITH ONE-ARCHITECTURE OVER ALL PRODUCTS/PLATFORMS
TITAN X/Graphics Card
NVIDIA Tegra/Jetson
NVIDIA Tesla/Supercomputer, HPC
NVIDIA Tegra/DRIVE PX
DRIVE PX AUTO-PILOT CAR COMPUTER
NVIDIA GPU DEEP LEARNING SUPERCOMPUTER
Trained Neural Net Model
Classified Object
WHAT TRULY SCALABLE GPU ARCHITECTURE ENABLES: TIME-CONSUMING TRAINING ON SERVER & REAL-TIME RECOGNITION ON EMBEDDED SYSTEM
Camera Inputs
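The split between training on a server and recognition on an embedded system can be sketched in a few lines. This is a minimal, CPU-only illustration under stated assumptions: a tiny logistic-regression model stands in for the neural network, JSON stands in for the model file, and all names and data are hypothetical. It is not how DIGITS or DRIVE PX package models.

```python
import json
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, epochs=500, lr=0.5):
    """'Server side': the slow, iterative part (gradient descent)."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = pred - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return {"weights": weights, "bias": bias}


def export_model(model):
    """Serialize the trained parameters for deployment."""
    return json.dumps(model)


def load_model(blob):
    """'Embedded side': load the parameters, no training code needed."""
    return json.loads(blob)


def infer(model, x):
    """'Embedded side': a single cheap forward pass per input."""
    z = sum(w * xi for w, xi in zip(model["weights"], x)) + model["bias"]
    return 1 if sigmoid(z) >= 0.5 else 0


# Toy, made-up training data: two features per sample, two classes.
TRAIN_X = [[0.0, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8]]
TRAIN_Y = [0, 0, 1, 1]

if __name__ == "__main__":
    deployed = load_model(export_model(train(TRAIN_X, TRAIN_Y)))
    print(infer(deployed, [0.1, 0.1]), infer(deployed, [0.95, 0.9]))
```

The point of the split is that only `load_model` and `infer` have to run in the car; the expensive loop in `train` stays on the server.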
DRIVE PX2: An In-Vehicle Platform for Deep Learning and Massively Parallel Processing
DRIVE PX2 ENGAGEMENTS >100
Passenger Car OEMs
~25 ~10 ~20
Commercial Car OEMs
~10 ~50
TAAS
(Transportation As A Service)
Tier 1s
Eco System Partners
(R&D, Universities, OS, Sensor, ISV etc)
DL: VERY FAST DEVELOPMENT SPEED
TOWARDS TOP SCORE(1)
DRIVE PX PLATFORM SOLUTION
• DRIVE PX is a computing platform for ADAS / autonomous driving
• End-to-end platform optimized for deep learning (supercomputer to DRIVE PX)
• Open and scalable SW stack: DriveWorks
• Scalable architecture from ADAS to autonomous driving (one Tegra to 2 x Tegra + 2 x discrete GPU)
DL Training Workstation/Supercomputer
DRIVE PX
Proprietary & Confidential
All Information Subject to Change
DRIVE PX
Camera Inputs
Dual Tegra X1
8 CPU Cores
Maxwell GPU
850GFLOPS (FP32)
12 simultaneous LVDS camera inputs
2 LVDS display ports
Display Ports
Car Connector
DRIVE PX HARNESS FROM CAR CONNECTOR
CAN, LIN, FlexRay and Ethernet Supported
48-pin Automotive Grade
Vehicle Harness
CAN 2.0 (x6)
FlexRay (x2)
LIN (x4)
UART (x1)
Ethernet (x1)
1x Power
DRIVE PX2
Dual Next Generation Tegra
Dual Discrete GPUs
12 CPU Cores
Pascal GPU
8 TFLOPS (FP32)
24 DL TOPS
12 simultaneous LVDS camera inputs
Dual Tegras on Top
Dual Discrete GPUs on the Bottom
Liquid Cooled if All Devices used
DRIVE PX2 COMPUTATION ENGINES
[Block diagram: two identical Tegra + discrete GPU pairs linked by 1Gb Ethernet. Each Tegra: 2 x Denver and 4 x A57 CPU cores, Pascal integrated GPU, 8GB LPDDR4 (128bit, UMA). Each Pascal discrete GPU: 4GB GDDR5, attached via PCIe x4.]
GPU TOTAL PERFORMANCE
- 8 TFLOPS (FP32)
- 24 DL TOPS
HIGH PERFORMANCE 12 CPUs
- 2 x Quad ARM A57
- 2 x Dual Denver (ARM 64b compatible)
SCALABLE
- Scalable platform: max 2 Tegras + 2 dGPUs, min 1 Tegra
REDUNDANCY
- For functional safety
DEDICATED MEMORY for each GPU
TEGRA A PASCAL A
TEGRA B PASCAL B
DRIVE PX2 INTERFACES
Sensor Fusion Interfaces
GMSL Camera, CAN, GbE, BroadR-Reach,
FlexRay, LIN, GPIO
Displays/Cockpit Computer Interfaces
HDMI, FPDLink III and GMSL
Development and Debug Interfaces
HDMI, GbE, 10GbE, USB3,
USB 2 (UART/debug), JTAG
70 Gigabits per second of I/O
[Block diagram: TEGRA A / PASCAL A and TEGRA B / PASCAL B connected by Gb Ethernet, with an ASIL-D safety MCU. Auto-grade connectors: Gb Ethernet, camera, BroadR-Reach, CAN, GPIOs, display, LIN, FlexRay. Debug/lab interfaces: USB 3.0, USB 2.0, Gb Ethernet, JTAG, 10Gb Ethernet, display (HDMI).]
DRIVE PX2 SOFTWARE
NVIDIA Vibrante Linux & Comprehensive BSP
Rich Autonomous Driving DRIVE Works SDK
SDK, Samples and more
A full stack of rich software components
DRIVE PX ANALYSIS AS AN SEooC (SAFETY ELEMENT OUT OF CONTEXT)
NVIDIA DRIVE PX as an SEooC is developed based on "assumptions on use in vehicles," including external interfaces.
Safety Manual, FMEDA: NVIDIA, as the developer of this SEooC, will provide the assumptions to the Tier 1s and OEMs.
To have a complete safety case, these "assumptions" are validated by the OEMs and Tier 1s in the context of the actual vehicle system.
If the NVIDIA SEooC does not fulfill the vehicle requirements, "a modification needs to be made" to either the vehicle or the SEooC.
Quantitative Analysis: FMEDA/FTA
SEooC Done
SEooC: Safety Element out of Context
HARA: Hazard Analysis and Risk Assessment
FMEDA: Failure Modes, Effects and Diagnostic Analysis
FTA: Fault Tree Analysis
DriveWorks: A SW Framework for ADAS and Autonomous Driving
NVIDIA DRIVEWORKS
COMPUTEWORKS
Detection Localization HD Maps
GAMEWORKS VRWORKS DESIGNWORKS DRIVEWORKS JETPACK
Sensor Fusion
and other technologies such as Driving, Planning
AI/DL is now used in Detection (Perception)
Other Features are accelerated by CUDA (GPU Massive-Parallel Computing)
AND OTHER SUPPORTING SDKS
DIGITS Workflow VisionWorks
and other technologies such as:
GIE (GPU Inference Engine), System Trace, Visual Profiler
Deep Learning SDK
The NVIDIA DriveWorks SDK gives developers a foundation to build applications across the self-driving pipeline — perception, localization, planning and visualization. And we can bring all of these technologies together into a beautiful cockpit visualization to give the driver confidence that the car is accurately seeing the world around him.
“As a leading provider of graphical hardware for gamers and researchers alike, NVIDIA has a lot of expertise in building systems that can make sense of video input and make it something understandable.”
— Business Insider
Localization
Planning
Visualization
Perception
DRIVEWORKS
Visualizing Autonomous Driving in Operation
NEW AI DRIVING
Training on DGX-1
Driving with DriveWorks
KALDI
LOCALIZATION
MAPPING
DRIVENET
DAVENET
NVIDIA DGX-1 NVIDIA DRIVE PX
Recent Autonomous Driving Applications (Public Information)
As part of its Drive Me project, Volvo will run 100 autonomous driving test cars in 2017.
These cars will be equipped with NVIDIA's deep learning car computer, DRIVE PX2.
WORLD’S FIRST AUTONOMOUS CAR RACE
10 teams, 20 identical cars
DRIVE PX 2: The “brain” of every car
2016/17 Formula E season
HIGH-SPEED RACING ALGORITHM ALREADY THERE
• The optimized trajectory is calculated as the weighted average of 2,560 different trajectories (each looking 2.5 sec ahead), computed in parallel on the monster NVIDIA GPU 60 times per second.
• Using just one sampled trajectory would be very jerky, so the 2,560 trajectories are combined by weighted averaging.
• The dynamics model is a linear function of 25 features based on an analytical vehicle model.
• The on-car GPU is an NVIDIA GTX 750 Ti (640 cores, 1,305 GFLOPS).
Georgia Tech MPPI (Model Predictive Path Integral control) algorithm
Performed by the car itself: counter steering, power slides, and more. Max speed 100 km/h.
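The sample-and-average idea behind MPPI can be sketched on a CPU as follows. This is a simplified illustration, not the Georgia Tech implementation: a 1-D point mass replaces the 25-feature vehicle model, only 256 rollouts are used instead of 2,560, and the time step, noise level, acceleration limit, temperature, cost function, and target are all assumed values. The published algorithm also includes weighting terms that are omitted here.

```python
import math
import random

NUM_SAMPLES = 256    # slide: 2,560 rollouts on a GPU; reduced for a CPU sketch
HORIZON_S = 2.5      # slide: each trajectory looks 2.5 sec ahead
DT = 0.1             # assumed integration step (s)
STEPS = round(HORIZON_S / DT)
NOISE_STD = 1.0      # assumed exploration noise on acceleration (m/s^2)
MAX_ACCEL = 3.0      # assumed actuator limit (m/s^2)
TEMPERATURE = 50.0   # assumed; lower values approach "pick the single best"
TARGET = 10.0        # hypothetical goal position (m)


def step(state, accel):
    """Toy point-mass dynamics, standing in for the real vehicle model."""
    pos, vel = state
    vel += accel * DT
    pos += vel * DT
    return pos, vel


def rollout_cost(state, controls):
    """Simulate one candidate trajectory and accumulate its cost."""
    cost = 0.0
    for accel in controls:
        state = step(state, accel)
        pos, vel = state
        cost += (pos - TARGET) ** 2 + 0.1 * vel ** 2
    return cost


def mppi_update(state, nominal, rng):
    """Sample noisy control sequences and return their cost-weighted average."""
    samples, costs = [], []
    for _ in range(NUM_SAMPLES):
        controls = [
            max(-MAX_ACCEL, min(MAX_ACCEL, u + rng.gauss(0.0, NOISE_STD)))
            for u in nominal
        ]
        samples.append(controls)
        costs.append(rollout_cost(state, controls))
    best = min(costs)  # subtracting the minimum keeps exp() well scaled
    weights = [math.exp(-(c - best) / TEMPERATURE) for c in costs]
    total = sum(weights)
    return [
        sum(w * s[t] for w, s in zip(weights, samples)) / total
        for t in range(STEPS)
    ]


def run(iterations=60, seed=0):
    """Receding-horizon loop: replan, apply the first control, shift the plan."""
    rng = random.Random(seed)
    state = (0.0, 0.0)
    nominal = [0.0] * STEPS
    for _ in range(iterations):
        nominal = mppi_update(state, nominal, rng)
        state = step(state, nominal[0])
        nominal = nominal[1:] + [0.0]
    return state


if __name__ == "__main__":
    pos, vel = run()
    print(f"final position {pos:.2f} m, velocity {vel:.2f} m/s")
```

Each rollout in `mppi_update` is independent of the others, which is why the slide's 2,560 trajectories map naturally onto GPU threads; the exponential weighting is what smooths out the jerkiness of any single sampled trajectory.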
THANK YOU