Empowering Visual Categorization with the GPU

Presented by 陳群元



Outline


Introduction
Overview of visual categorization
  Image feature extraction
  Category model learning
  Test image classification
GPU-accelerated categorization
Experimental setup
Results

Introduction


Use the GPU to accelerate the quantization and classification components of a visual categorization architecture.

The algorithms and their implementations should push the state-of-the-art in categorization accuracy.

Visual categorization must be decomposable into components to locate bottlenecks.

Given the same input, implementations of a component on various hardware architectures must give the same output.

Overview


Visual categorization system


Image Feature Extraction: Point Sampling Strategy, Descriptor Computation, Bag-of-Words
Category Model Learning
Test Image Classification

Visual categorization system


Image Feature Extraction: Point Sampling Strategy, Descriptor Computation, Bag-of-Words
Category Model Learning
Test Image Classification
(Next: Point Sampling Strategy)

Point sampling strategy


Dense sampling: typically, around 10,000 points are sampled per image (see the sketch below)
Salient point methods:
  Harris-Laplace salient point detector [29]
  Difference-of-Gaussians detector [28]
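
A minimal sketch of dense sampling on a regular grid (Python with NumPy). The grid spacing of 6 pixels is an illustrative assumption chosen so a 640x480 image yields on the order of 10,000 points; it is not a parameter from the paper.

```python
import numpy as np

def dense_sample_points(width, height, step=6):
    """Sample (x, y) points on a regular grid over the image.

    With step=6, a 640x480 image gives about 107 * 80 = 8,560 points,
    i.e. roughly the 10,000 points per image quoted above.
    """
    xs = np.arange(step // 2, width, step)
    ys = np.arange(step // 2, height, step)
    grid_x, grid_y = np.meshgrid(xs, ys)
    return np.stack([grid_x.ravel(), grid_y.ravel()], axis=1)

points = dense_sample_points(640, 480)
print(points.shape)  # (8560, 2)
```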

Visual categorization system


Image Feature Extraction: Point Sampling Strategy, Descriptor Computation, Bag-of-Words
Category Model Learning
Test Image Classification
(Next: Descriptor Computation)

Descriptors


SIFT descriptor -> 128 dimensions; 10 frames per second for 640x480 images on the GPU
SURF descriptor: 100 frames per second for 640x480 images on the GPU
ColorSIFT descriptor -> 384 dimensions; a triple of SIFT descriptors (see the sketch below)
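
The 384 dimensions follow from concatenating a 128-dimensional SIFT descriptor per color channel. A minimal sketch of that "triple of SIFT" structure; the per-channel descriptor computation is left abstract, and the channel names are placeholders (the actual channels depend on the ColorSIFT variant used).

```python
import numpy as np

def colorsift_from_channels(sift_c1, sift_c2, sift_c3):
    """Stack three per-channel 128-dim SIFT descriptors into one
    384-dim ColorSIFT descriptor ("triple of SIFT").

    The channel arguments are placeholders; the actual channels depend on
    the ColorSIFT variant (e.g. opponent color channels).
    """
    for d in (sift_c1, sift_c2, sift_c3):
        assert d.shape == (128,)
    return np.concatenate([sift_c1, sift_c2, sift_c3])  # shape: (384,)
```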

Visual categorization system


Image Feature Extraction: Point Sampling Strategy, Descriptor Computation, Bag-of-Words
Category Model Learning
Test Image Classification
(Next: Bag-of-Words)

Bag-of-words


Vector quantization is computationally the most expensive part of the bag-of-words model.

Bag -> the set of features in an image; Words -> the quantized features (visual words)

Bag-of-words


With n descriptors of length d per image and a codebook with m elements, quantization costs O(ndm) per image.
A tree-based codebook reduces this to O(nd log(m)), which is real-time on the GPU [25] (see the sketch below).
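
A minimal NumPy sketch of the flat O(ndm) quantization step described above, with the bag-of-words histogram built from the resulting assignments. The sizes in the usage example are tiny, and the tree-based variant is not shown.

```python
import numpy as np

def quantize_flat(descriptors, codebook):
    """Brute-force vector quantization, O(n*d*m) per image.

    descriptors: (n, d) array of image descriptors.
    codebook:    (m, d) array of codebook elements.
    Returns the index of the nearest codebook element for each descriptor.
    """
    diffs = descriptors[:, None, :] - codebook[None, :, :]  # (n, m, d)
    d2 = (diffs ** 2).sum(axis=2)                            # (n, m) squared distances
    return d2.argmin(axis=1)

def bag_of_words(descriptors, codebook):
    """Histogram of visual word occurrences for one image."""
    assignments = quantize_flat(descriptors, codebook)
    return np.bincount(assignments, minlength=codebook.shape[0])

# Tiny example: n=5 descriptors, d=4 dimensions, m=3 codebook elements.
rng = np.random.default_rng(0)
hist = bag_of_words(rng.random((5, 4)), rng.random((3, 4)))
print(hist)  # length-3 word histogram summing to 5
```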


Visual categorization system


Image Feature Extraction: Point Sampling Strategy, Descriptor Computation, Bag-of-Words
Category Model Learning
Test Image Classification
(Next: Category Model Learning)

Category model learning


Precompute kernel function values, then learn with a kernel-based SVM algorithm.


Support Vector Machines

Kernel Support Vector Machines
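
A minimal sketch of learning a category model from precomputed kernel values. It assumes scikit-learn's SVC with kernel='precomputed' as a stand-in for the kernel SVM solver; the paper's own solver is not specified here.

```python
from sklearn.svm import SVC

def train_category_model(K_train, y_train):
    """Train a kernel SVM directly on a precomputed n_train x n_train kernel matrix."""
    model = SVC(kernel="precomputed")
    model.fit(K_train, y_train)
    return model

def score_test_images(model, K_test):
    """K_test holds kernel values between test and training images
    (shape n_test x n_train); higher scores mean stronger category presence."""
    return model.decision_function(K_test)
```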

Visual categorization system


Image Feature Extraction: Point Sampling Strategy, Descriptor Computation, Bag-of-Words
Category Model Learning
Test Image Classification
(Next: Test Image Classification)

Test image classification


Outline


Introduction
Overview of visual categorization
  Image feature extraction
  Category model learning
  Test image classification
GPU-accelerated categorization
  Parallel Programming on the GPU and CPU
  GPU-Accelerated Vector Quantization
  GPU-Accelerated Kernel Value Precomputation
Experimental setup
Results

Parallel Programming on the GPU and CPU


SIMD instructions perform the same operation on multiple data elements at the same time (illustrated below).
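
A small illustration of the same-operation-on-many-elements pattern. NumPy's vectorized operations are used here only as an analogy: they are not literal SIMD instructions, though they typically map to SIMD loops on the CPU, and on the GPU the same pattern becomes one thread per element.

```python
import numpy as np

a = np.arange(8, dtype=np.float32)
b = np.full(8, 2.0, dtype=np.float32)

# Scalar style: one element processed per step.
out_scalar = np.empty_like(a)
for i in range(a.shape[0]):
    out_scalar[i] = a[i] * b[i]

# Data-parallel style: one multiply applied to all eight elements at once.
out_parallel = a * b

assert np.allclose(out_scalar, out_parallel)
```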


GPU-Accelerated Vector Quantization


The most expensive computational step in vector quantization is the calculation of the n x m distance matrix.

A: n x d matrix with all image descriptors as rows

B: m x d matrix with all codebook elements as rows

GPU-Accelerated Vector Quantization (cont.)


GPU-Accelerated Vector Quantization (cont.)


Compute the dot products between all rows of A and B (line 7).

Matrix multiplication is a building block for many algorithms; highly optimized BLAS linear algebra libraries containing this operation exist for both the CPU and the GPU.
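
A minimal sketch of the distance-matrix computation whose dominant term is that dot-product matrix, written in NumPy. The `A @ B.T` call dispatches to an optimized BLAS GEMM on the CPU; on the GPU the analogous call would go to cuBLAS (e.g. via CuPy).

```python
import numpy as np

def distance_matrix(A, B):
    """Squared Euclidean distances between rows of A (n x d) and B (m x d).

    Uses ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2, so the expensive part is
    the single n x m matrix multiplication A @ B.T handled by BLAS.
    """
    a2 = (A ** 2).sum(axis=1)[:, None]   # (n, 1) squared norms of descriptors
    b2 = (B ** 2).sum(axis=1)[None, :]   # (1, m) squared norms of codebook elements
    return a2 - 2.0 * (A @ B.T) + b2     # (n, m) distance matrix

def quantize(A, B):
    """Assign every descriptor (row of A) to its nearest codebook element (row of B)."""
    return distance_matrix(A, B).argmin(axis=1)
```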


GPU-Accelerated Kernel Value Precomputation


To compute kernel function values, we use a kernel function based on the distance between feature vectors F and F'.

Distance between feature vectors F and F'

Kernel function based on this distance
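
The exact distance and kernel formulas appeared as equations on the original slide. The sketch below assumes the chi-square distance and the exponential chi-square kernel commonly used with bag-of-words histograms, which may differ in detail from the paper's definitions.

```python
import numpy as np

def chi_square_distance(F, Fp, eps=1e-10):
    """Assumed distance between feature vectors F and F' (word histograms):
    d(F, F') = 1/2 * sum_i (F_i - F'_i)^2 / (F_i + F'_i)."""
    return 0.5 * np.sum((F - Fp) ** 2 / (F + Fp + eps))

def kernel_value(F, Fp, gamma=1.0):
    """Assumed kernel function based on this distance:
    k(F, F') = exp(-d(F, F') / gamma)."""
    return np.exp(-chi_square_distance(F, Fp) / gamma)
```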

GPU-Accelerated Kernel Value Precomputation (cont.)


Multiple input features

For kernel value precomputation, memory usage is an important problem: for a dataset with 50,000 images, the input data is 12 GB and the output data is 19 GB. To avoid holding all data in memory simultaneously, we divide the processing into evenly sized chunks (1024x1024).
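
A minimal sketch of the chunked evaluation: the n x n kernel matrix is filled tile by tile so only one 1024 x 1024 tile (plus the two feature chunks it needs) is in memory at a time. The memory-mapped output file here stands in for whatever storage the real pipeline uses.

```python
import numpy as np

def precompute_kernel_matrix(features, kernel_fn, chunk=1024,
                             out_path="kernel_matrix.f32"):
    """Fill the n x n matrix of kernel values in chunk x chunk tiles.

    features:  (n, d) array of image features (e.g. word histograms).
    kernel_fn: function mapping two feature vectors to a kernel value.
    """
    n = features.shape[0]
    K = np.memmap(out_path, dtype=np.float32, mode="w+", shape=(n, n))
    for i in range(0, n, chunk):
        Fi = features[i:i + chunk]
        for j in range(0, n, chunk):
            Fj = features[j:j + chunk]
            # Compute one tile of kernel values and write it out.
            tile = np.array([[kernel_fn(a, b) for b in Fj] for a in Fi],
                            dtype=np.float32)
            K[i:i + len(Fi), j:j + len(Fj)] = tile
    K.flush()
    return K
```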

GPU-Accelerated Kernel Value Precomputation (cont.)


Experimental setup


Experiment 1: Vector Quantization Speed
  The CPU implementation is SIMD-optimized
  Codebook of size m = 4,000; 20,000 descriptors per image
  Descriptor lengths of d = 128 (SIFT) and d = 384 (ColorSIFT)
Experiment 2: Kernel Value Precomputation Speed
  Uses the large Mediamill Challenge training set of 30,993 frames
Experiment 3: Visual Categorization Throughput
  Comparison between the quad-core Core i7 920 CPU (2.66 GHz) and the GeForce GTX260 GPU (27 cores)

Results


Experiment 1: Vector Quantization Speed
Experiment 2: Kernel Value Precomputation Speed
Experiment 3: Visual Categorization Throughput

Results: Experiment 1 (Vector Quantization Speed)

Vector Quantization Speed (SIFT)


Vector Quantization Speed (ColorSIFT)


Results: Experiment 2 (Kernel Value Precomputation Speed)

Kernel Value Precomputation Speed


Results: Experiment 3 (Visual Categorization Throughput)

Visual Categorization Throughput


Other applications


Application 1: k-means Clustering (see the sketch below)
Application 2: Bag-of-Words Model for Text Retrieval
Application 3: Multi-Frame Processing for Video Retrieval
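
For the first application, the assignment step of k-means is the same descriptor-to-center distance computation accelerated above, so the GEMM-based distance matrix applies directly. A minimal plain-NumPy sketch, not the paper's implementation:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means; the assignment step reuses the GEMM-based distance matrix."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(X.shape[0], size=k, replace=False)].astype(np.float64)
    for _ in range(iters):
        # Assignment step: nearest center per point, dominated by X @ centers.T.
        d2 = ((X ** 2).sum(axis=1)[:, None]
              - 2.0 * (X @ centers.T)
              + (centers ** 2).sum(axis=1)[None, :])
        labels = d2.argmin(axis=1)
        # Update step: each center moves to the mean of its assigned points.
        for c in range(k):
            members = X[labels == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)
    return centers, labels
```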

Conclusions


This paper provides an efficiency analysis of a state-of-the-art visual categorization pipeline based on the bag-of-words model.

Two large bottlenecks were identified: the vector quantization step in image feature extraction and the kernel value computation in category classification.

Compared to a multi-threaded implementation on a quad-core CPU, the GPU is 4.8 times faster.

The end


Thank you!
