"Low-Power Embedded Vision: A Face Tracker Case Study," a presentation from Synopsys

Copyright © 2015 Synopsys Inc.

Pierre Paulin

May 12, 2015

Low-Power Embedded Vision: A Face Tracker Case Study


Outline

• Embedded Vision Algorithm Pipeline
• Face Tracker Application
• Target Embedded Vision Processor & Programming Tools
• Mapping Face Tracker to Multi-core Heterogeneous EV Processor
• Lessons Learned


Synopsys at a Glance

• Embedded vision processor leverages many silicon-proven IPs
• DesignWare®: ARC® HS processor, AXI, DMA, Memory Compiler, …
• HAPS® FPGA-based rapid prototyping system

• >5,300 Masters/PhD degrees
• >2,300 IP designers
• >1,500 applications engineers
• >$2.2B FY14 revenue
• 32% of revenue spent on R&D
• >9,300 employees


Vision Algorithm Pipeline

Pre-processing
• Noise reduction
• Color space conversion
• Image scaling
• Gaussian pyramid

Selecting Areas of Interest
• Object detection
• Background subtraction
• Feature extraction
• Image segmentation
• Connected component labeling

Precise Processing of Selected Areas
• Object recognition
• Tracking
• Feature matching
• Gesture recognition

Decision Making
• Motion analysis
• Match/no match
• Flag events

CNN


Vision Pipeline Example

Video surveillance pipeline:

Grayscale & Image Pyramid → Face Detection → Tracking & Detection Cascade → Fusion & Learning
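The first block of this pipeline, grayscale conversion followed by an image pyramid, can be sketched in a few lines. This is an illustration only, not the presenter's code: the BT.601 luma weights, the 2x2 box averaging (a simple stand-in for Gaussian filtering), and all function names are assumptions.

```python
def to_grayscale(rgb):
    """Convert rows of (r, g, b) tuples to luma values (BT.601 weights, an assumption)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]

def downsample(gray):
    """Halve both dimensions by averaging each 2x2 block (stand-in for a Gaussian filter)."""
    h, w = len(gray) // 2, len(gray[0]) // 2
    return [[(gray[2 * y][2 * x] + gray[2 * y][2 * x + 1] +
              gray[2 * y + 1][2 * x] + gray[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]

def image_pyramid(gray, levels):
    """Return [full resolution, 1/2, 1/4, ...] with the requested number of levels."""
    pyramid = [gray]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid

# Tiny synthetic 8x8 frame with a horizontal gradient (made-up data).
frame = [[(x * 32, x * 32, x * 32) for x in range(8)] for _ in range(8)]
pyr = image_pyramid(to_grayscale(frame), levels=3)
sizes = [(len(level), len(level[0])) for level in pyr]
print(sizes)
```

Each pyramid level lets a fixed-size detection window cover a larger region of the original frame, which is why the pyramid precedes face detection in the pipeline.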


OpenTLD Block Diagram

• Open source implementation of TLD (Tracking, Learning, Detection)

Implementation: Georg Nebehay, based on the TLD (Predator) algorithm developed by Zdenek Kalal


OpenTLD Detection Cascade

• Measure of uniformity: subwindows with variance lower than a given threshold are discarded
• Based on random ferns: pairwise comparisons of pixel intensity, yielding a probability based on the learned object model
• Measuring the distance between the subwindow under evaluation and the learned positive and negative templates
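The three cascade stages described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the OpenTLD implementation: it uses a single fern and a single positive and negative template, computes window variance from integral images, uses normalized cross-correlation as the template similarity, and every name, threshold, and data value is hypothetical.

```python
import math

def integral(img, power=1):
    """Summed-area table with a zero border; entry [y][x] sums img[:y][:x] ** power."""
    h, w = len(img), len(img[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x] ** power
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, x, y, w, h):
    """Sum over a w x h window at (x, y) using four table lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def window_variance(ii, ii2, x, y, w, h):
    """Stage 1: variance of a subwindow, computed as E[p^2] - E[p]^2."""
    n = w * h
    mean = box_sum(ii, x, y, w, h) / n
    return box_sum(ii2, x, y, w, h) / n - mean * mean

def fern_code(img, x, y, pairs):
    """Stage 2: each pairwise intensity comparison contributes one bit of the code."""
    code = 0
    for (x1, y1, x2, y2) in pairs:
        code = (code << 1) | int(img[y + y1][x + x1] > img[y + y2][x + x2])
    return code

def ncc(a, b):
    """Stage 3 similarity: normalized cross-correlation of two equal-length pixel lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    den = math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))
    return num / den if den else 0.0

def accepts(img, ii, ii2, win, pairs, posterior, pos_tpl, neg_tpl,
            var_thr=100.0, fern_thr=0.5):
    """Run one subwindow through the three stages, rejecting as early as possible."""
    x, y, w, h = win
    if window_variance(ii, ii2, x, y, w, h) < var_thr:
        return False                      # too uniform
    if posterior.get(fern_code(img, x, y, pairs), 0.0) < fern_thr:
        return False                      # unlikely under the learned model
    patch = [img[y + j][x + i] for j in range(h) for i in range(w)]
    return ncc(patch, pos_tpl) > ncc(patch, neg_tpl)

img = [[10, 10, 200, 50],
       [10, 10, 60, 220],
       [10, 10, 10, 10],
       [10, 10, 10, 10]]
ii, ii2 = integral(img), integral(img, power=2)
pairs = [(0, 0, 1, 0), (1, 1, 0, 1)]      # hypothetical comparison pairs
posterior = {0b11: 0.9}                   # hypothetical learned posterior table
pos_tpl, neg_tpl = [190, 60, 70, 210], [50, 200, 220, 60]

flat = accepts(img, ii, ii2, (0, 0, 2, 2), pairs, posterior, pos_tpl, neg_tpl)
textured = accepts(img, ii, ii2, (2, 0, 2, 2), pairs, posterior, pos_tpl, neg_tpl)
print(flat, textured)
```

The stages are ordered from cheapest to most expensive, so the uniform window is rejected after four table lookups and never reaches the fern or template stages.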


Face Detection and Tracking with CNN

• Adaptation of OpenTLD to leverage CNN
• Less complex tracker that tracks context around face
• Face detection CNN to quickly discard areas without faces


CNN-based Detection Cascade

• Use of face detection CNN to quickly discard areas without faces
• Measure of uniformity: subwindows with variance lower than a given threshold are discarded
• Based on random ferns: pairwise comparisons of pixel intensity, yielding a probability based on the learned object model
• Measuring the distance between the subwindow under evaluation and the learned positive and negative templates
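The effect of placing a CNN filter in front of the existing stages can be illustrated by counting how many subwindows each stage has to evaluate. The sketch below uses made-up window scores and hypothetical stage names and thresholds; the `cnn_face` flag merely stands in for the output of a face detection CNN.

```python
def run_cascade(windows, stages):
    """Apply stages in order; a window rejected by one stage never reaches the next."""
    evaluated = {}
    survivors = list(windows)
    for name, accept in stages:
        evaluated[name] = len(survivors)          # work done by this stage
        survivors = [w for w in survivors if accept(w)]
    return survivors, evaluated

# Synthetic subwindows (made-up scores, for illustration only).
windows = [
    {"id": i,
     "cnn_face": i < 20,                          # stand-in for the CNN face / no-face output
     "variance": 500.0 if i % 2 == 0 else 50.0,
     "fern_prob": 0.9 if i < 5 else 0.3,
     "similarity": 0.8}
    for i in range(100)
]

cnn_stage = ("cnn", lambda w: w["cnn_face"])
tld_stages = [
    ("variance", lambda w: w["variance"] >= 100.0),
    ("ferns", lambda w: w["fern_prob"] >= 0.5),
    ("templates", lambda w: w["similarity"] >= 0.6),
]

hits_without, work_without = run_cascade(windows, tld_stages)
hits_with, work_with = run_cascade(windows, [cnn_stage] + tld_stages)
print(work_without)
print(work_with)
```

In this toy data set both orderings accept the same windows, but the variance and fern stages see far fewer candidates once the CNN stage has run. In the deck the CNN stage executes on the dedicated CNN engine, so it is the reduced load on the remaining stages that lowers the RISC requirement.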


Target Embedded Vision Processor & Tools

[Block diagram: the Embedded Vision Processor combines a RISC cluster (four 32-bit RISC cores) and a CNN Object Detection Engine (an array of PEs) with shared memory, DMA, and an AXI interconnect. The Embedded Vision Programming Tools take user kernels (Ui, Uj, Uk), kernel library functions (K1 … Kn), and CNN kernels (C1 … Cm) as inputs.]


Mapping OpenTLD Face Tracker to Homogeneous RISC Multi-core EV Processor

• Face tracker application, 30 images, single face tracking
• RISC computation requirements for OpenTLD

[Block diagram annotations, per-block RISC load: 1.5 GOPS, 0.2 GOPS, 0.0003 GOPS, 4.5 GOPS, 0.5 GOPS, 0.1 GOPS]

RISC scalar: 6.8 GOPS for VGA @ 30 fps
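As a quick check, the per-block figures on this slide add up to the quoted total, and dividing by the VGA pixel rate gives a rough operations-per-pixel budget. The sketch below assumes VGA means 640x480. Only two blocks are named, based on the annotations on the CNN mapping slide (1.5 GOPS for the KLT tracker, 4.5 GOPS for the detection cascade); the other block labels are placeholders.

```python
# Per-block RISC loads from the slide, in GOPS.
block_gops = {
    "tracker (KLT)": 1.5,
    "detection cascade": 4.5,
    "block A (unlabeled)": 0.2,
    "block B (unlabeled)": 0.0003,
    "block C (unlabeled)": 0.5,
    "block D (unlabeled)": 0.1,
}
total_gops = sum(block_gops.values())

width, height, fps = 640, 480, 30            # VGA at 30 fps (640x480 is an assumption)
pixels_per_second = width * height * fps
ops_per_pixel = total_gops * 1e9 / pixels_per_second

print(f"total: {total_gops:.1f} GOPS")
print(f"ops per input pixel: {ops_per_pixel:.0f}")
```

The detection cascade accounts for roughly two thirds of the total, which is why it is the main target for the CNN-based filtering introduced earlier in the deck.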


Mapping Face Tracker with CNN to Heterogeneous Multi-core EV Processor

• Face tracker application, 30 images, single face tracking
• RISC computation requirements for face tracker with CNN

[Block diagram annotations, per-block RISC load: 1 GOPS (reduced from 4.5 GOPS due to CNN filtering), 0.2 GOPS, N/A, 0.5 GOPS (was 1.5 GOPS for KLT tracker), 0.5 GOPS, 0.1 GOPS, 0.4 GOPS, 0.0003 GOPS]

RISC scalar: 2.7 GOPS for VGA @ 30 fps

CNN engine: 25 ‘GOPS’
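The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below sums the per-block RISC loads and derives the reduction factors implied by the "reduced from" and "was" annotations; the variable names are illustrative, and the CNN engine's 25 ‘GOPS’ is deliberately kept out of the RISC sum because the slide reports it separately.

```python
# Per-block RISC loads with the CNN in place, in GOPS (the "N/A" block contributes nothing).
risc_blocks_gops = [1.0, 0.2, 0.5, 0.5, 0.1, 0.4, 0.0003]
total_risc_gops = sum(risc_blocks_gops)

cascade_reduction = 4.5 / 1.0    # detection cascade, before vs. after CNN filtering
tracker_reduction = 1.5 / 0.5    # KLT tracker vs. the simpler replacement tracker
overall_reduction = 6.8 / 2.7    # total RISC load, OpenTLD vs. face tracker with CNN

print(f"RISC total: {total_risc_gops:.1f} GOPS")
print(f"cascade: {cascade_reduction:.1f}x, tracker: {tracker_reduction:.1f}x, "
      f"overall: {overall_reduction:.2f}x")
```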


OpenVX Capture of Face Tracker with CNN

[OpenVX graph nodes: Grayscale Conversion, Image Pyramid, Integral Image, CNN Wrapper, Tracking, Detection Cascade, Non-max Suppression + Fusion + Learning + Draw Box]

[Callout: face detection on CNN engine]

[Node-to-core mapping labels: RISC 1 (1 GOPS), RISC 2 (1 GOPS), RISC 3 (0.7 GOPS), CNN Engine]


OpenTLD vs Face Tracking Algorithm with CNN

Configuration            RISC GOPS   #RISC Cores   CNN ‘GOPS’   #CNN PEs   Area (mm²)   Power (mW)
OpenTLD                  6.8         7             0            0          2.8          210
Face Tracking with CNN   2.7         3             25           2          1.8          108

[Bar charts: Area (mm²) and Power (mW), OpenTLD vs. Face Tracking with CNN]
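The savings implied by the table can be recomputed directly from its values. The sketch below uses only the numbers shown above; the dictionary keys are illustrative names. Because the table values are rounded, the results are approximate.

```python
configs = {
    "OpenTLD": {
        "risc_gops": 6.8, "risc_cores": 7, "cnn_pes": 0, "area_mm2": 2.8, "power_mw": 210,
    },
    "Face Tracking with CNN": {
        "risc_gops": 2.7, "risc_cores": 3, "cnn_pes": 2, "area_mm2": 1.8, "power_mw": 108,
    },
}
base, cnn = configs["OpenTLD"], configs["Face Tracking with CNN"]

area_saving = 1 - cnn["area_mm2"] / base["area_mm2"]     # fraction of area saved
power_ratio = base["power_mw"] / cnn["power_mw"]         # how many times less power
gops_per_core = {name: c["risc_gops"] / c["risc_cores"] for name, c in configs.items()}

print(f"area saving: {area_saving:.0%}, power ratio: {power_ratio:.2f}x")
print({name: round(v, 2) for name, v in gops_per_core.items()})
```

With these rounded inputs the area saving comes to about 36% and the power ratio to about 1.9x, in line with the roughly one-third area reduction and 2X power reduction quoted in the Lessons Learned slide. Both configurations load each RISC core with a little under 1 GOPS, which is consistent with the core counts in the table.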


Lessons Learned

• An accurate face detection CNN makes it possible to filter out regions of the image where no face is present
  • Over 4x reduction of detection cascade computation cost
• The CNN makes it possible to replace a more complex feature tracker (PKLT) with a simpler one
  • Over 3x reduction of tracker computation cost
• Tracking context around faces is useful for distinguishing between faces detected by the CNN
  • Keeps the bounding box on the “right” face in crowded scenes
  • Rejects CNN detections that do not correspond to the target
• Area and power of the heterogeneous solution using a CNN are significantly lower than those of the homogeneous multi-core RISC solution
  • 33% area reduction
  • 2X power reduction


Resources

• TLD algorithm
  • Z. Kalal, K. Mikolajczyk, J. Matas. 2012. Tracking-Learning-Detection. IEEE Trans. Pattern Anal. Mach. Intell. 34, 7 (July 2012), 1409-1422.
• OpenTLD thesis
  • Georg Nebehay. Robust Object Tracking Based on Tracking-Learning-Detection. MA thesis. TU Vienna, 2012.
• Yann LeCun CNN Presentation at EV Summit 2014
  • http://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/may-2014-embedded-vision-summit-facebook-keynote
• Stanford class CS231n: Convolutional Neural Networks for Visual Recognition
  • http://cs231n.github.io/
• Synopsys Embedded Vision Processor website
  • http://www.synopsys.com/dw/ipdir.php?ds=ev52-ev54
• Please come by to see our demo at the Synopsys booth: Tables 3 & 4 in the Technology Showcase (Mission City Ballroom)

https://www.youtube.com/watch?v=wDu1nto531E

























