Page 1 (source: on-demand.gputechconf.com/gtc/2014/jp/sessions/5002.pdf)

NVIDIA NEW IMAGE PROCESSING/RECOGNITION PROGRAM DEVELOPMENT ENVIRONMENT FOR ADAS AND OTHER APPLICATIONS

Toru Baji (馬路 徹), Senior Solution Architect, NVIDIA

Page 2

Tesla In Super Computers

Quadro In Work Stations

GeForce In PCs

Mobile Kepler In Tegra

Tegra K1: 192 CUDA Cores

• Kepler Architecture

• ISA Compatible to GeForce, Quadro, Tesla

• 64kB L1 Cache and Shared Memory

• 128kB L2 Cache

[Figure: Tegra K1 block diagram showing A15 CPU cores and an LP (low-power) A15 core]

Page 3

TEGRA K1 DEVELOPMENT PLATFORMS

JETSON X3 (TK1 PRO)

• GigE, USB 3.0, HDMI, CAN bus

• Running Vibrante Linux

• Automotive grade

JETSON TK1

• GigE, USB 3.0, HDMI

• Running Linux4Tegra

Coming to Android K1 devices soon.

Page 4

VISIONWORKS MOTIVATION

Advanced Silicon + VisionWorks = Widespread vision processing in embedded, mobile and automotive devices and applications

VisionWorks:

• Simplify vision programming

• Fully optimized and accelerated

• Modular and extensible

Page 5

OUR APPROACH

Open Platform & Easy to Program

Multi-Camera based complex systems

Ecosystem Leverage

Page 6

VISIONWORKS SOFTWARE STACK

[Figure: software stack diagram with the following blocks]

• Application Code

• Vision Pipeline Samples: Object Detection, SLAM, 3rd Party Pipelines …

• VisionWorks Primitives: Classifier, Corner Detection, 3rd Party

• VisionWorks Framework

• CUDA Libraries

• Tegra K1

Page 7

TEGRA K1 CUDA DEVELOPMENT

CUDA-Aware Editor

• Automated CPU to GPU code refactoring

• Semantic highlighting of CUDA code

• Integrated code samples & docs

Nsight Debugger

• Simultaneous debugging of CPU and GPU

• Inspect variables across CUDA threads

• Use breakpoints & single-step debugging

Nsight Profiler

• Quickly identifies performance issues

• Integrated expert system

• Source line correlation

Cross-platform development

• Native memcheck, GDB, nvprof

Page 8

CUDA LIBRARIES

• VisionWorks

• OpenCV

• NPP

• CUFFT

• CUBLAS

• CUDA Math Lib

Page 9

OPENVX AND OPENCV (THEY ARE COMPLEMENTARY)

Governance
  OpenCV: Community driven open source with no formal specification
  OpenVX: Formal specification defined and implemented by hardware vendors

Conformance
  OpenCV: No conformance tests for consistency, and every vendor implements a different subset
  OpenVX: Full conformance test suite / process creates a reliable acceleration platform

Portability
  OpenCV: APIs can vary depending on processor
  OpenVX: Hardware abstracted for portability

Scope
  OpenCV: Very wide. 1000s of imaging and vision functions. Multiple camera APIs/interfaces
  OpenVX: Tight focus on hardware accelerated functions for mobile vision. Use external camera API

Efficiency
  OpenCV: Memory-based architecture. Each operation reads and writes memory
  OpenVX: Graph-based execution. Optimizable computation, data transfer

Use Case
  OpenCV: Rapid experimentation
  OpenVX: Production development & deployment

Page 10

OPENVX GRAPHS: THE KEY TO EFFICIENCY

Directed graphs for processing power and efficiency

— Each Node can be implemented in software or accelerated hardware

— Nodes may be fused to eliminate memory transfers

— Processing can be tiled to keep data entirely in local memory/cache

EGLStreams route data from camera and to application

Can extend with “VisionWorks” nodes using CUDA

[Figure: Example OpenVX Graph. Labels: Native Camera Control, OpenVX Node, VisionWorks Node, OpenVX Node, VisionWorks Node, Application]
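The point about fusing nodes to eliminate memory transfers can be illustrated with a toy model. The sketch below is plain Python, not the OpenVX or VisionWorks API: the `Graph` class, its methods, and the two pixel-wise nodes are hypothetical names made up for this example, and an image is assumed to be a flat list of 8-bit values. It only shows why running a chain of pixel-wise nodes in one pass writes fewer intermediate buffers than running them one at a time.

```python
# Toy model of graph execution with node fusion (illustrative only).
# Graph, add_node, process_unfused and process_fused are hypothetical names,
# not part of OpenVX or VisionWorks.

class Graph:
    def __init__(self):
        self.nodes = []  # pixel-wise functions, in execution order

    def add_node(self, fn):
        self.nodes.append(fn)
        return self  # allow chained calls

    def process_unfused(self, image):
        """Run node by node; every node writes a full output buffer."""
        buffers_written = 0
        for fn in self.nodes:
            image = [fn(p) for p in image]
            buffers_written += 1
        return image, buffers_written

    def process_fused(self, image):
        """Apply all nodes to each pixel in one pass; only the final buffer is written."""
        def fused(p):
            for fn in self.nodes:
                p = fn(p)
            return p
        return [fused(p) for p in image], 1


graph = (Graph()
         .add_node(lambda p: abs(p - 128))           # absolute difference against a constant
         .add_node(lambda p: 255 if p > 32 else 0))  # thresholding

pixels = [0, 100, 128, 200, 255]
unfused_result, unfused_buffers = graph.process_unfused(pixels)
fused_result, fused_buffers = graph.process_fused(pixels)
```

Both paths produce the same output image; the fused path simply never materializes the intermediate difference image. A real implementation would also have to handle nodes with neighborhoods (filters), where tiling rather than per-pixel fusion applies.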

Page 11

OPENVX-BASED PRIMITIVES

Absolute Difference

Accumulate

Accumulate Squared

Accumulate Weighted

Arithmetic Addition

Arithmetic Subtraction

Bitwise And

Bitwise Exclusive Or

Bitwise Inclusive Or

Bitwise Not

Box Filter

Canny Edge Detector

Channel Combine

Channel Extract

Color Convert

Convert Bit Depth

Dilate Image

Equalize Histogram

Erode Image

Gaussian Filter

Histogram

Image Pyramid

Magnitude

Mean and Standard Deviation

Median Filter

Min, Max Location

Optical Flow Pyramid (LK)

Phase

Pixel-wise Multiplication

Remap

Scale Image

Table Lookup

Thresholding

Warp Affine
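Many of the primitives listed above are simple per-pixel formulas. As one illustration, Accumulate Weighted is commonly defined as a running average, accum = (1 - alpha) * accum + alpha * input. The sketch below is a minimal plain-Python version under that assumption; the function name `accumulate_weighted` and the flat-list image representation are hypothetical and are not taken from the OpenVX headers.

```python
# Minimal sketch of an "Accumulate Weighted" style primitive (illustrative only).
# Assumes images are flat lists of floats of equal length.

def accumulate_weighted(accum, frame, alpha):
    """Return the running average (1 - alpha) * accum + alpha * frame, per pixel."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if len(accum) != len(frame):
        raise ValueError("images must have the same number of pixels")
    return [(1.0 - alpha) * a + alpha * f for a, f in zip(accum, frame)]


background = [100.0, 0.0]
new_frame = [200.0, 0.0]
updated = accumulate_weighted(background, new_frame, 0.25)
```

A small alpha makes the accumulator change slowly, which is why this kind of primitive is often used for background estimation.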

Page 12

VISIONWORKS PRIMITIVES – JAN 2014

Sobel

Convolve

Bilateral Filter

Integral Image

Integral Histogram

Corner Harris

Corner FAST

Image Pyramid

Optical Flow PyrLK

Optical Flow Farneback

Warp Perspective

Hough Lines

Fast NLM Denoising

Stereo Block Matching

IME (Iterative Motion Estimation)

HOG (Histogram of Oriented Gradients)

Soft Cascade Detector

Object Tracker

TLD Object Tracker

SLAM

Path Estimator

MedianFlow Estimator
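To make one of the listed primitives concrete, the sketch below shows the standard integral image (summed-area table) idea in plain Python. It is not the VisionWorks implementation; the names `integral_image` and `box_sum` are hypothetical, and an image is assumed to be a list of equal-length rows of integers.

```python
# Minimal integral image sketch (illustrative only, not the VisionWorks API).

def integral_image(img):
    """Return a (h+1) x (w+1) table where ii[y][x] is the sum of img[0:y][0:x]."""
    h, w = len(img), len(img[0])
    # The extra zero row and column remove special cases in box_sum.
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, x0, y0, x1, y1):
    """Sum of pixels with x in [x0, x1) and y in [y0, y1), using four lookups."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
table = integral_image(image)
lower_right_2x2 = box_sum(table, 1, 1, 3, 3)  # 5 + 6 + 8 + 9
```

Once the table is built, any rectangular sum costs four lookups regardless of the rectangle size, which is what makes the primitive useful for box filters and cascade-style detectors.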

Page 13

VISIONWORKS PIPELINES (V0.10)

Structure From Motion/SLAM

Pedestrian detection

Vehicle detection

Object tracking

Dense optical flow

Active Shape Model

Denoising

Page 14

VISIONWORKS – LOOKING FORWARD

Enable multi-camera applications

Depth sensor fusion

3D world interpretation

Additional performance optimization

Conformance with OpenVX once specification finalized

Page 15

STRUCTURE FROM MOTION

Pipeline stages shown in the diagram:

• Corner Detection

• Image Pyramid Generation

• Optical Flow

• Triangulation & Pose Estimation

[Figure labels: True pose, Rough estimate, Previous pose]
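As a rough illustration of the image pyramid stage, the sketch below builds a pyramid by repeatedly halving the resolution. It is an assumption-laden simplification: it averages each 2x2 block, whereas production pyramids typically apply a Gaussian filter before subsampling, and the names `downsample_2x` and `build_pyramid` are hypothetical rather than VisionWorks functions. Images are assumed to be lists of equal-length rows.

```python
# Minimal image pyramid sketch using 2x2 block averaging (illustrative only).

def downsample_2x(img):
    """Halve width and height by averaging each 2x2 block of pixels."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1] +
              img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)]
            for y in range(h)]


def build_pyramid(img, levels):
    """Return [level 0 (full size), level 1 (half size), ...], up to `levels` entries."""
    pyramid = [img]
    for _ in range(levels - 1):
        top = pyramid[-1]
        if len(top) < 2 or len(top[0]) < 2:
            break  # cannot halve a 1-pixel-wide or 1-pixel-high image
        pyramid.append(downsample_2x(top))
    return pyramid


image = [[0, 2, 4, 6],
         [2, 4, 6, 8],
         [4, 6, 8, 10],
         [6, 8, 10, 12]]
pyramid = build_pyramid(image, 5)
```

Pyramidal optical flow methods such as the pyramidal Lucas-Kanade tracker named earlier in the deck estimate motion on the coarsest level first and refine it on finer levels, which is why pyramid generation precedes optical flow in the stage list.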

Page 16

STRUCTURE FROM MOTION (SFM) BENCHMARKING

[Figure: bar chart, logarithmic axis from 1 to 100, titled "Speedup (x) GPU vs ARM code on T124*". Bar labels: Image Pyramid, Fast Corner Detection, Harris Corner Detection, Optical Flow. Values shown: 8.8, 21.05, 84.04, 21.25]

Page 17

NVIDIA Confidential and Proprietary Information

Page 18

Page 19

Page 20

FEATURE TRACKING VIDEO

Page 21

Page 22

Page 23

AUDI AUTO PILOTED DRIVING

Page 24

SUMMARY

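• The Kepler GPU built into Tegra K1 is a scalable GPU that shares its architecture with Tesla, Quadro, and GeForce.

• As a result, the CUDA software assets and development environment that have matured on Tesla, Quadro, and GeForce can be used. In particular, running the training process that is essential for recognition on a large GPU of the same architecture, such as Tesla, can greatly shorten training time.

• On top of this CUDA foundation, NVIDIA has built VisionWorks, a development environment for image processing and image recognition programs.

• In addition to the widely used OpenCV library, VisionWorks also provides the OpenVX and VisionWorks libraries, which excel in efficiency, portability, and specification management.

• A number of ADAS applications that are close to practical use have already been developed on Tegra K1.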

Page 25

Thank you