NVIDIA's New Image Processing and Image Recognition Program Development Environment for Advanced Driver Assistance Systems (ADAS) and Other Applications
Toru Baji (馬路 徹), Senior Solution Architect, NVIDIA
Tesla in supercomputers
Quadro in workstations
GeForce in PCs
Mobile Kepler in Tegra
192 CUDA Cores
Tegra K1
• Kepler Architecture
• ISA compatible with GeForce, Quadro, Tesla
• 64kB L1 Cache and Shared Memory
• 128kB L2 Cache
• CPU diagram labels: four A15 cores and one LP A15 core
TEGRA K1 DEVELOPMENT PLATFORMS
JETSON X3 (TK1 PRO)
• GigE, USB 3.0, HDMI, CAN bus
• Running Vibrante Linux
• Automotive grade
JETSON TK1
• GigE, USB 3.0, HDMI
• Running Linux4Tegra
Coming to Android K1 devices soon
VISIONWORKS MOTIVATION
Advanced Silicon + VisionWorks = Widespread vision processing in embedded, mobile and automotive devices and applications
VisionWorks
• Simplify vision programming
• Fully optimized and accelerated
• Modular and extensible
OUR APPROACH
Open Platform & Easy to Program
Multi-Camera based complex systems
Ecosystem Leverage
VISIONWORKS SOFTWARE STACK
Layers, top to bottom:
• Application Code
• Vision Pipeline Samples: Object Detection, SLAM, 3rd Party Pipelines, …
• VisionWorks Primitives: Classifier, Corner Detection, 3rd Party, …
• VisionWorks Framework
• CUDA Libraries
• Tegra K1
TEGRA K1 CUDA DEVELOPMENT
CUDA-Aware Editor
Automated CPU to GPU code refactoring
Semantic highlighting of CUDA code
Integrated code samples & docs
Nsight Debugger
Simultaneous debugging of CPU and GPU
Inspect variables across CUDA threads
Use breakpoints & single-step debugging
Nsight Profiler
Quickly identifies performance issues
Integrated expert system
Source line correlation
Cross platform development
Native memcheck, GDB, nvprof
VisionWorks, OpenCV
CUDA LIBRARIES: NPP, CUFFT, CUBLAS, CUDA Math Lib
OpenVX and OpenCV (They Are Complementary)
Governance
  OpenCV: Community-driven open source with no formal specification
  OpenVX: Formal specification defined and implemented by hardware vendors
Conformance
  OpenCV: No conformance tests for consistency, and every vendor implements a different subset
  OpenVX: Full conformance test suite/process creates a reliable acceleration platform
Portability
  OpenCV: APIs can vary depending on processor
  OpenVX: Hardware abstracted for portability
Scope
  OpenCV: Very wide; 1000s of imaging and vision functions; multiple camera APIs/interfaces
  OpenVX: Tight focus on hardware-accelerated functions for mobile vision; uses an external camera API
Efficiency
  OpenCV: Memory-based architecture; each operation reads and writes memory
  OpenVX: Graph-based execution; optimizable computation and data transfer
Use Case
  OpenCV: Rapid experimentation
  OpenVX: Production development & deployment
OPENVX GRAPHS – THE KEY TO EFFICIENCY
Directed graphs for processing power and efficiency
• Each node can be implemented in software or accelerated hardware
• Nodes may be fused to eliminate memory transfers
• Processing can be tiled to keep data entirely in local memory/cache
EGLStreams route data from the camera and to the application
Can extend with "VisionWorks" nodes using CUDA
Example OpenVX Graph (diagram labels): Native Camera Control, OpenVX Node, VisionWorks Node, OpenVX Node, VisionWorks Node, Application
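The node-fusion point above can be illustrated with a toy model. The sketch below is not the OpenVX API. `run_unfused`, `run_fused`, `invert`, and `threshold` are hypothetical names, and an image is simplified to a flat list of 8-bit values. It only shows why composing per-pixel nodes into a single pass avoids writing intermediate images to memory.

```python
from functools import reduce

# Hypothetical per-pixel "nodes" (not OpenVX kernels): each maps one
# 8-bit pixel value to another.
def invert(p):
    return 255 - p

def threshold(p, t=128):
    return 255 if p > t else 0

def run_unfused(image, nodes):
    """Run each node over the whole image, materializing every intermediate."""
    buffers_written = 0
    current = image
    for node in nodes:
        current = [node(p) for p in current]  # one full image written per node
        buffers_written += 1
    return current, buffers_written

def run_fused(image, nodes):
    """Compose all nodes into one per-pixel function and make a single pass."""
    def fused(p):
        return reduce(lambda acc, node: node(acc), nodes, p)
    return [fused(p) for p in image], 1  # only the final image is written

image = [0, 100, 127, 128, 200, 255]
nodes = [invert, threshold]
unfused_out, unfused_writes = run_unfused(image, nodes)
fused_out, fused_writes = run_fused(image, nodes)
```

Both paths produce the same pixels. The unfused path writes one full buffer per node, while the fused path writes only the result, which is the kind of saving a graph compiler can look for when it sees the whole pipeline up front.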
OPENVX-BASED PRIMITIVES
Absolute Difference
Accumulate
Accumulate Squared
Accumulate Weighted
Arithmetic Addition
Arithmetic Subtraction
Bitwise And
Bitwise Exclusive Or
Bitwise Inclusive Or
Bitwise Not
Box Filter
Canny Edge Detector
Channel Combine
Channel Extract
Color Convert
Convert Bit Depth
Dilate Image
Equalize Histogram
Erode Image
Gaussian Filter
Histogram
Image Pyramid
Magnitude
Mean and Standard Deviation
Median Filter
Min, Max Location
Optical Flow Pyramid (LK)
Phase
Pixel-wise Multiplication
Remap
Scale Image
Table Lookup
Thresholding
Warp Affine
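As an illustration of how two of the primitives listed above compose, here is a minimal pure-Python sketch of frame differencing: Absolute Difference followed by Thresholding. The function names and the list-of-rows image representation are assumptions made for this example; they are not the OpenVX or VisionWorks API.

```python
def absolute_difference(a, b):
    """Per-pixel |a - b| for two equally sized 8-bit images (lists of rows)."""
    return [[abs(pa - pb) for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]

def threshold(image, t):
    """Binary threshold: 255 where the pixel exceeds t, else 0."""
    return [[255 if p > t else 0 for p in row] for row in image]

# Two tiny hypothetical frames; only two pixels change significantly.
prev_frame = [[10, 10, 10],
              [10, 10, 10]]
curr_frame = [[10, 60, 10],
              [12, 10, 200]]

diff = absolute_difference(prev_frame, curr_frame)
motion_mask = threshold(diff, t=20)
```

The small change of 2 at position (row 1, column 0) falls below the threshold and is suppressed, so the mask marks only the two pixels that changed by more than 20.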
VISIONWORKS PRIMITIVES – JAN 2014
Sobel
Convolve
Bilateral Filter
Integral Image
Integral Histogram
Corner Harris
Corner FAST
Image Pyramid
Optical Flow PyrLK
Optical Flow Farneback
Warp Perspective
Hough Lines
Fast NLM Denoising
Stereo Block Matching
IME (Iterative Motion Estimation)
HOG (Histogram of Oriented Gradients)
Soft Cascade Detector
Object Tracker
TLD Object Tracker
SLAM
Path Estimator
MedianFlow Estimator
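To make one of the primitives above concrete, the following is a minimal pure-Python sketch of an Integral Image (summed-area table). The names `integral_image` and `box_sum` and the zero-padded layout are choices made for this example, not the VisionWorks interface.

```python
def integral_image(image):
    """Summed-area table with an extra zero row and column.

    sat[y][x] holds the sum of all pixels above and to the left of (x, y),
    exclusive, so sat has shape (h + 1) x (w + 1).
    """
    h, w = len(image), len(image[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += image[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat

def box_sum(sat, x0, y0, x1, y1):
    """Sum of pixels in the half-open rectangle [x0, x1) x [y0, y1)."""
    return sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
sat = integral_image(image)
whole = box_sum(sat, 0, 0, 3, 3)         # all nine pixels
lower_right = box_sum(sat, 1, 1, 3, 3)   # 5 + 6 + 8 + 9
```

Once the table is built, any rectangular sum costs four lookups regardless of the rectangle size, which is why integral images are commonly paired with box-feature detectors.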
VISIONWORKS PIPELINES (V0.10)
Structure From Motion/SLAM
Pedestrian Detection
Vehicle Detection
Object Tracking
Dense Optical Flow
Active Shape Model
Denoising
VISIONWORKS – LOOKING FORWARD
Enable multi-camera applications
Depth sensor fusion
3D world interpretation
Additional performance optimization
Conformance with OpenVX once the specification is finalized
STRUCTURE FROM MOTION
Pipeline stages: Corner Detection → Image Pyramid Generation → Optical Flow → Triangulation & Pose Estimation
Diagram labels: Previous pose, Rough estimate, True pose
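The image pyramid stage named above can be sketched in a few lines. This is a simplified illustration, not the VisionWorks implementation: it downsamples by averaging 2x2 blocks, whereas production pyramids typically apply Gaussian smoothing first. `downsample_2x` and `build_pyramid` are hypothetical names, and the image dimensions are assumed to be divisible by 2 at every level.

```python
def downsample_2x(image):
    """Halve width and height by averaging each 2x2 block (integer average)."""
    h, w = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * y][2 * x] + image[2 * y][2 * x + 1]
             + image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1]) // 4
            for x in range(w)
        ]
        for y in range(h)
    ]

def build_pyramid(image, levels):
    """Return [full resolution, 1/2, 1/4, ...] with `levels` entries."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(downsample_2x(pyramid[-1]))
    return pyramid

# A 4x4 gradient test image: pixel value is (x + y) * 10.
image = [[(x + y) * 10 for x in range(4)] for y in range(4)]
pyramid = build_pyramid(image, levels=3)
```

Pyramidal Lucas-Kanade optical flow uses such a stack coarse to fine: motion is estimated at the smallest level first and refined at each larger level, which lets a small search window follow large displacements.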
STRUCTURE FROM MOTION (SFM) BENCHMARKING
Chart: Speedup (x), GPU vs ARM code on T124* (axis ticks: 1, 10, 100)
Categories: Image Pyramid, Fast Corner Detection, Harris Corner Detection, Optical Flow
Bar values as extracted: 8.8, 21.05, 84.04, 21.25
NVIDIA Confidential and Proprietary Information
FEATURE TRACKING VIDEO
AUDI AUTO PILOTED DRIVING
SUMMARY
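The Kepler GPU built into Tegra K1 is a scalable GPU that shares its architecture with Tesla, Quadro, and GeForce.
As a result, the CUDA software assets and development environment that have matured on Tesla, Quadro, and GeForce can be used as they are. In addition, running the training process that recognition requires on a large GPU of the same architecture, such as Tesla, can shorten training time substantially.
On top of this CUDA foundation, NVIDIA has built VisionWorks, a program development environment for image processing and image recognition.
In addition to the long-established and widely used OpenCV library, VisionWorks also provides the OpenVX and VisionWorks libraries, which excel in efficiency, portability, and specification management.
A number of ADAS applications that are close to practical use have already been developed on Tegra K1.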
Thank you