
Deep Learning Implementation Techniques "using Caffe"

2016-11-11

DGIST, Future Automotive Convergence Research Center

Heechul Jung

Tutorial, 2016 Fall Conference of the Institute of Embedded Engineering of Korea (IEMEK)

Learning Deep Learning (research-oriented)

- Install Linux (Ubuntu, Fedora, …): 1 day
- Read papers (arXiv, CVPR, ICCV, NIPS, ICML, ICLR): several weeks
- Install a deep learning tool (Caffe, Torch, TensorFlow, …): 1 day
- Learn the deep learning tool: 1 day
- Reproduce state-of-the-art algorithms (baseline)
- Implement your idea: a few weeks
- Write a paper?

Real-time Object Recognition

CPU: Intel i5-4690 @ 3.5 GHz / RAM: 18 GB / GPU: NVIDIA GeForce GTX 770

How to use Caffe.

What is Caffe?
Open framework, models, and worked examples for deep learning
- < 2 years
- 600+ citations, 100+ contributors, 6,000+ stars
- 3,400+ forks
- focus has been vision, but branching out: sequences, reinforcement learning, speech + text

Prototype → Train → Deploy

What is Caffe?
Open framework, models, and worked examples for deep learning
- Pure C++ / CUDA architecture for deep learning
- Command line, Python, MATLAB interfaces

- Fast, well-tested code

- Tools, reference models, demos, and recipes

- Seamless switch between CPU and GPU

Prototype → Train → Deploy

Caffe is a Community: project pulse

Open Model Collection

The Caffe Model Zoo
open collection of deep models to share innovation
- VGG ILSVRC14 + Devil models in the zoo
- Network-in-Network / CCCP model in the zoo
- MIT Places scene recognition model in the zoo

help disseminate and reproduce research

bundled tools for loading and publishing models

Share Your Models! with your citation + license of course

Recipe for Brewing

Buy NVIDIA graphics cards.
Install Caffe.
Convert the data to Caffe format:
lmdb, leveldb, hdf5 / .mat, list of images, etc.
Define the Net (prototxt).
Configure the Solver (prototxt).
caffe train -solver solver.prototxt -gpu 0

Examples are your friends:
caffe/examples/mnist, cifar10, imagenet

caffe/examples/*.ipynb

caffe/models/*
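The training step can also be driven from Python instead of the command line; a minimal pycaffe sketch, assuming Caffe was built with the Python interface and that a solver.prototxt like the one above exists:

import caffe

caffe.set_mode_gpu()        # or caffe.set_mode_cpu()
caffe.set_device(0)         # same GPU id as "-gpu 0"

solver = caffe.get_solver('solver.prototxt')
solver.solve()              # runs the whole optimization, like "caffe train -solver solver.prototxt"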

Choose your graphics card.

NVIDIA K40
Performance is best with ECC off and boost clock enabled. While ECC makes a negligible difference in speed, disabling it frees 1 GB of GPU memory.
Best settings with ECC off and maximum clock speed in standard Caffe:
• Training is 26.5 secs / 20 iterations (5,120 images)
• Testing is 100 secs / validation set (50,000 images)
Best settings with Caffe + cuDNN acceleration:
• Training is 19.2 secs / 20 iterations (5,120 images)
• Testing is 60.7 secs / validation set (50,000 images)

NVIDIA Titan
Training: 26.26 secs / 20 iterations (5,120 images). Testing: 100 secs / validation set (50,000 images).
cuDNN Training: 20.25 secs / 20 iterations (5,120 images). cuDNN Testing: 66.3 secs / validation set (50,000 images).

NVIDIA K20
Training: 36.0 secs / 20 iterations (5,120 images). Testing: 133 secs / validation set (50,000 images).

NVIDIA GTX 770
Training: 33.0 secs / 20 iterations (5,120 images). Testing: 129 secs / validation set (50,000 images).
cuDNN Training: 24.3 secs / 20 iterations (5,120 images). cuDNN Testing: 104 secs / validation set (50,000 images).

Installation
detailed documentation:

http://caffe.berkeleyvision.org/installation.html

required packages:
- CUDA, OpenCV
- BLAS (Basic Linear Algebra Subprograms): operations like matrix multiplication and matrix addition, with implementations for both CPU (cBLAS) and GPU (cuBLAS); provided by MKL (Intel), ATLAS, OpenBLAS, etc.
- Boost: a C++ library. > Caffe uses some of its math functions and shared_ptr.
- glog, gflags: provide logging & command line utilities. > Essential for debugging.
- leveldb, lmdb: database I/O for your program. > Need to know this for preparing your own data.
- protobuf: an efficient and flexible way to define data structures. > Need to know this for defining new layers.

Define your task
Dog? Cat?

Next step
Preparing data => LevelDB, LMDB
Model Definition (train_val.prototxt)
Solver (solver.prototxt)

Pipeline: Download Image Data → LevelDB, LMDB → train_val.prototxt → solver.prototxt

Preparing data
If you want to run a CNN on another dataset:
Caffe reads data in a standard database format.
You have to convert your data to leveldb/lmdb manually.

Creating an image set (e.g. for the ImageNet dataset), using LMDB:

./convert_imageset --resize_height=256 --resize_width=256 --shuffle ./data/imagenet ./data/imagenet/train.txt ./ilsvrc12_train_lmdb --backend=lmdb
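convert_imageset is the standard tool; the same kind of LMDB can also be written from Python. A rough sketch, assuming the lmdb package and pycaffe are installed and that images arrive as uint8 arrays in (channels, height, width) order; write_lmdb and its arguments are illustrative names, not Caffe tools:

import lmdb
from caffe.proto import caffe_pb2

def write_lmdb(db_path, samples):
    # samples: iterable of (image, label), image being a uint8 array of shape (C, H, W)
    env = lmdb.open(db_path, map_size=int(1e12))
    with env.begin(write=True) as txn:
        for idx, (img, label) in enumerate(samples):
            datum = caffe_pb2.Datum()
            datum.channels, datum.height, datum.width = img.shape
            datum.data = img.tobytes()
            datum.label = int(label)
            txn.put('{:08d}'.format(idx).encode('ascii'), datum.SerializeToString())
    env.close()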

Define your network (train_val.prototxt)

Define data layer.

Define specific layers.

Convolution layer.

Fully connected layer (Inner product layer)

Pooling layer.

Activation function layer.

Define loss function.

Define your network (train_val.prototxt)

LogReg ↑
LeNet → examples/mnist/lenet_train.prototxt
ImageNet, Krizhevsky 2012 →

net:
name: "dummy-net"
layers { name: "data" …}
layers { name: "conv" …}
layers { name: "pool" …}
… more layers …
layers { name: "loss" …}

blue: layers you need to define
yellow: data blobs

Define your network (train_val.prototxt)

Data Layer

layer {
  name: "cifar"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  transform_param {
    mean_file: "examples/cifar10/mean.binaryproto"
  }
  data_param {
    source: "examples/cifar10/cifar10_train_lmdb"
    batch_size: 100
    backend: LMDB
  }
}

layer {
  name: "cifar"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TEST }
  transform_param {
    mean_file: "examples/cifar10/mean.binaryproto"
  }
  data_param {
    source: "examples/cifar10/cifar10_test_lmdb"
    batch_size: 100
    backend: LMDB
  }
}
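The same kind of data layer can be generated with the pycaffe NetSpec interface instead of writing the prototxt by hand; a small sketch under that assumption (TRAIN phase only, mirroring the values above):

import caffe
from caffe import layers as L, params as P

n = caffe.NetSpec()
n.data, n.label = L.Data(source='examples/cifar10/cifar10_train_lmdb',
                         backend=P.Data.LMDB, batch_size=100, ntop=2,
                         transform_param=dict(mean_file='examples/cifar10/mean.binaryproto'),
                         include=dict(phase=caffe.TRAIN))
print(str(n.to_proto()))   # prints the generated "layer { ... }" text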

Define your network (train_val.prototxt)

Conv Layer

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.0001
    }
    bias_filler {
      type: "constant"
    }
  }
}


Convolution
• Layer type: Convolution
• CPU implementation: ./src/caffe/layers/convolution_layer.cpp
• CUDA GPU implementation: ./src/caffe/layers/convolution_layer.cu
• Parameters (ConvolutionParameter convolution_param)
  • Required
    • num_output (c_o): the number of filters
    • kernel_size (or kernel_h and kernel_w): specifies height and width of each filter
  • Strongly Recommended
    • weight_filler [default type: 'constant' value: 0]
  • Optional
    • bias_term [default true]: specifies whether to learn and apply a set of additive biases to the filter outputs
    • pad (or pad_h and pad_w) [default 0]: specifies the number of pixels to (implicitly) add to each side of the input
    • stride (or stride_h and stride_w) [default 1]: specifies the intervals at which to apply the filters to the input
    • group (g) [default 1]: if g > 1, we restrict the connectivity of each filter to a subset of the input. Specifically, the input and output channels are separated into g groups, and the ith output group channels will be only connected to the ith input group channels.
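As a sketch, the conv1 definition above written with the pycaffe layer interface (same parameters; the Input layer and its 1x3x32x32 shape are just illustrative stand-ins for the data layer):

import caffe
from caffe import layers as L

n = caffe.NetSpec()
n.data = L.Input(input_param=dict(shape=dict(dim=[1, 3, 32, 32])))
n.conv1 = L.Convolution(n.data, num_output=32, pad=2, kernel_size=5, stride=1,
                        param=[dict(lr_mult=1), dict(lr_mult=2)],   # weight / bias learning-rate multipliers
                        weight_filler=dict(type='gaussian', std=0.0001),
                        bias_filler=dict(type='constant'))
print(str(n.to_proto()))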

Define your network (train_val.prototxt)

Fully connected layer

layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool3"
  top: "ip1"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param {
    num_output: 64
    weight_filler {
      type: "gaussian"
      std: 0.1
    }
    bias_filler {
      type: "constant"
    }
  }
}

Inner Product
• Layer type: InnerProduct
• CPU implementation: ./src/caffe/layers/inner_product_layer.cpp
• CUDA GPU implementation: ./src/caffe/layers/inner_product_layer.cu
• Parameters (InnerProductParameter inner_product_param)
  • Required
    • num_output (c_o): the number of outputs
  • Strongly recommended
    • weight_filler [default type: 'constant' value: 0]
  • Optional
    • bias_filler [default type: 'constant' value: 0]
    • bias_term [default true]: specifies whether to learn and apply a set of additive biases to the filter outputs

Define your network (train_val.prototxt)

Pooling layer

Pooling
• Layer type: Pooling
• CPU implementation: ./src/caffe/layers/pooling_layer.cpp
• CUDA GPU implementation: ./src/caffe/layers/pooling_layer.cu
• Parameters (PoolingParameter pooling_param)
  • Required
    • kernel_size (or kernel_h and kernel_w): specifies height and width of each filter
  • Optional
    • pool [default MAX]: the pooling method. Currently MAX, AVE, or STOCHASTIC
    • pad (or pad_h and pad_w) [default 0]: specifies the number of pixels to (implicitly) add to each side of the input
    • stride (or stride_h and stride_w) [default 1]: specifies the intervals at which to apply the filters to the input

layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}

Define your network (train_val.prototxt)

Activation function

layer {
  name: "relu1"
  type: "ReLU"
  bottom: "pool1"
  top: "pool1"
}

ReLU / Rectified-Linear and Leaky-ReLU
• Layer type: ReLU
• CPU implementation: ./src/caffe/layers/relu_layer.cpp
• CUDA GPU implementation: ./src/caffe/layers/relu_layer.cu
• Parameters (ReLUParameter relu_param)
  • Optional
    • negative_slope [default 0]: specifies whether to leak the negative part by multiplying it with the slope value rather than setting it to 0.

Define your network (train_val.prototxt)

Loss layer

layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}

Classification: SoftmaxWithLoss, HingeLoss
Linear Regression: EuclideanLoss
Attributes / Multiclassification: SigmoidCrossEntropyLoss
Others…
New Task: New Loss
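Putting the layer types above together, a minimal NetSpec sketch that emits a small train_val-style net (data, convolution, pooling, ReLU, inner product, softmax loss); the exact layer sizes are illustrative, not the full CIFAR-10 model:

import caffe
from caffe import layers as L, params as P

n = caffe.NetSpec()
n.data, n.label = L.Data(source='examples/cifar10/cifar10_train_lmdb',
                         backend=P.Data.LMDB, batch_size=100, ntop=2,
                         transform_param=dict(mean_file='examples/cifar10/mean.binaryproto'))
n.conv1 = L.Convolution(n.data, num_output=32, pad=2, kernel_size=5, stride=1,
                        weight_filler=dict(type='gaussian', std=0.0001))
n.pool1 = L.Pooling(n.conv1, pool=P.Pooling.MAX, kernel_size=3, stride=2)
n.relu1 = L.ReLU(n.pool1, in_place=True)
n.ip1 = L.InnerProduct(n.relu1, num_output=64, weight_filler=dict(type='gaussian', std=0.1))
n.ip2 = L.InnerProduct(n.ip1, num_output=10)
n.loss = L.SoftmaxWithLoss(n.ip2, n.label)

with open('train_val.prototxt', 'w') as f:
    f.write(str(n.to_proto()))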

The network does not need to be linear.

linear network: a single chain, e.g. Data → Convolve → Pool → Rectify → … → Inner Prod → Predict, joined with the Label at the Loss.

directed acyclic graph: layers can branch, and their outputs can be merged (e.g., by a Sum) before the Loss.

→ a little more about the network


Solving: Training a Net (solver.prototxt)
Optimization, like model definition, is configuration.

train_net: "lenet_train.prototxt"
base_lr: 0.01

momentum: 0.9

weight_decay: 0.0005

max_iter: 10000

snapshot_prefix: "lenet_snapshot"

All you need to run things on the GPU.

> caffe train -solver lenet_solver.prototxt -gpu 0

Stochastic Gradient Descent (SGD) + momentum · Adaptive Gradient (ADAGRAD) · Nesterov's Accelerated Gradient (NAG)

Solving: Training a Net (solver.prototxt)

net: "train_val.prototxt"                          # model definition file
test_iter: 100                                     # iterations per test pass
test_interval: 500                                 # test interval
base_lr: 0.001                                     # learning rate
momentum: 0.9                                      # momentum
weight_decay: 0.004                                # weight decay
lr_policy: "fixed"                                 # learning rate policy
display: 100                                       # print interval
max_iter: 4000                                     # max iteration number for training
snapshot: 4000                                     # save interval
snapshot_prefix: "examples/cifar10/cifar10_quick"  # save filename
solver_mode: GPU                                   # CPU/GPU
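The same solver configuration can also be generated programmatically from caffe.proto rather than typed by hand; a sketch, assuming pycaffe is importable:

from caffe.proto import caffe_pb2

s = caffe_pb2.SolverParameter()
s.net = 'train_val.prototxt'
s.test_iter.append(100)            # test_iter is a repeated field
s.test_interval = 500
s.base_lr = 0.001
s.momentum = 0.9
s.weight_decay = 0.004
s.lr_policy = 'fixed'
s.display = 100
s.max_iter = 4000
s.snapshot = 4000
s.snapshot_prefix = 'examples/cifar10/cifar10_quick'
s.solver_mode = caffe_pb2.SolverParameter.GPU

with open('solver.prototxt', 'w') as f:
    f.write(str(s))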

Additional details
Download Caffe (https://github.com/BVLC/caffe)

Install dependencies:
sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libboost-all-dev libhdf5-serial-dev libgflags-dev libgoogle-glog-dev liblmdb-dev protobuf-compiler libatlas-base-dev

cuDNN (optional): download from NVIDIA, then
sudo cp lib* /usr/local/cuda/lib64/
sudo cp cudnn.h /usr/local/cuda/include/

Change the config file name. In the caffe folder: "Makefile.config.example" → "Makefile.config"

To use cuDNN (optional), in "Makefile.config": uncomment "# USE_CUDNN := 1" → "USE_CUDNN := 1"

Compile. In the caffe folder: make all -j8 (-j8 builds in parallel, which is faster)

Download data. Execute in the folder "caffe/data/cifar10": sh get_cifar10.sh

Create data. Move the file "caffe/examples/cifar10/create_cifar10.sh" → "caffe/create_cifar10.sh"
Execute in the folder "caffe/": sh create_cifar10.sh
This produces:

cifar10_test_lmdb

cifar10_train_lmdb

mean.binaryproto

Train. Move the file "caffe/examples/cifar10/train_quick.sh" → "caffe/train_quick.sh"
Execute in the folder "caffe/": sh train_quick.sh
Final test accuracy: 75.11%

Distribute your network.
".caffemodel": weight parameters.
"_deploy.prototxt": model definition file.

layer {name: "ip1"type: "InnerProduct"bottom: "pool3"top: "ip1"param {lr_mult: 1decay_mult: 250

}param {lr_mult: 2decay_mult: 0

}inner_product_param {num_output: 10

}}layer {name: "prob"type: "Softmax"bottom: "ip1"top: "prob"

}

name: "CIFAR10_full_deploy"input: "data"input_dim: 1input_dim: 3input_dim: 32input_dim: 32layer {name: "conv1"type: "Convolution"bottom: "data"top: "conv1"param {lr_mult: 1

}param {lr_mult: 2

}convolution_param {num_output: 32pad: 2kernel_size: 5stride: 1

}}
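A minimal sketch of using the two distributed files for inference from Python; the file names are hypothetical, the input is a placeholder array, and pycaffe is assumed:

import numpy as np
import caffe

caffe.set_mode_gpu()
net = caffe.Net('cifar10_full_deploy.prototxt',   # model definition (hypothetical file name)
                'cifar10_full.caffemodel',        # weight parameters (hypothetical file name)
                caffe.TEST)

# stand-in for a preprocessed input image: float32, shape (3, 32, 32), mean-subtracted
image = np.zeros((3, 32, 32), dtype=np.float32)
net.blobs['data'].data[0, ...] = image
out = net.forward()
print('predicted class:', out['prob'][0].argmax())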

Finetuning models
Example: ImageNet dataset => Style dataset
Different DB, different number of classes.

Finetuning models

● Simply change a few lines in the layer definition: new name = new params

→ What if you want to transfer the weights of an existing model to finetune on another dataset / task?

Input: a different source
Last layer: a different classifier

ImageNet (original):

layers {
  name: "data"
  type: "Data"
  data_param {
    source: "ilsvrc12_train_leveldb"
    mean_file: "./data/ilsvrc12"
    ...
  }
  ...
}
...
layers {
  name: "fc8"
  type: "InnerProduct"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  inner_product_param {
    num_output: 1000
    ...
  }
}

Style (finetuned):

layers {
  name: "data"
  type: "Data"
  data_param {
    source: "style_leveldb"
    mean_file: "./data/ilsvrc12"
    ...
  }
  ...
}
...
layers {
  name: "fc8-style"
  type: "InnerProduct"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  inner_product_param {
    num_output: 20
    ...
  }
}

old caffe:
> finetune_net.bin solver.prototxt model_file

new caffe:
> caffe train -solver models/finetune_flickr_style/solver.prototxt -weights bvlc_reference_caffenet.caffemodel

in code:
net = new Caffe::Net("style_solver.prototxt");
net.CopyTrainedNetFrom(pretrained_model);
solver.Solve(net);
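The same finetuning run can be driven from pycaffe; a small sketch, assuming the solver and pretrained weights named above:

import caffe

caffe.set_mode_gpu()
solver = caffe.get_solver('models/finetune_flickr_style/solver.prototxt')
# initialize from the pretrained model; layers whose names changed (e.g. fc8-style)
# keep their fresh initialization and are trained from scratch
solver.net.copy_from('bvlc_reference_caffenet.caffemodel')
solver.solve()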

Finetuning models

Making Deep Residual Network.

Revolution of Depth
ImageNet classification top-5 error (%):
- ILSVRC'15 ResNet (152 layers, ultra deep): 3.57
- ILSVRC'14 GoogleNet (22 layers, very deep): 6.7
- ILSVRC'14 VGG (19 layers, very deep): 7.3
- ILSVRC'13 (8 layers): 11.7
- ILSVRC'12 AlexNet (8 layers): 16.4
- ILSVRC'11 (shallow): 25.8
- ILSVRC'10 (shallow): 28.2

Deep Residual Network

Making .prototxt

layer {name: "cifar"type: "Data"top: "data"top: "label"include {phase: TRAIN

}transform_param {mean_file: "examples/cifar10/mean.binaryproto"

}data_param {source: "examples/cifar10/cifar10_train_lmdb"batch_size: 100backend: LMDB

}}

2734 lines/ 44 layers

1202 layers??????2734/44*1202

=74687.909… lines?If 1 line = 1 sec,

it takes approximately 20 hours.

Building block: CONV → Batch Normalization → Scale Layer → ReLU

PYTHON Implementation

from caffe import layers as L

def conv_bn_cifar10(bottom, nout, ks=3, stride=1, pad=0, is_test=True, learn=True):
    # The convolution has a single learnable blob here (bias_term=False), so one param entry.
    if learn:
        param = [dict(lr_mult=1, decay_mult=1)]
    else:
        param = [dict(lr_mult=0, decay_mult=0)]
    conv = L.Convolution(bottom, kernel_size=ks, stride=stride, num_output=nout, pad=pad,
                         param=param, weight_filler=dict(type="msra"), bias_term=False)
    # BatchNorm statistics are not updated by SGD (lr_mult=0); use_global_stats selects
    # the stored statistics at test time vs. mini-batch statistics during training.
    bn = L.BatchNorm(conv, param=[dict(lr_mult=0), dict(lr_mult=0), dict(lr_mult=0)],
                     batch_norm_param=dict(use_global_stats=is_test))
    scale = L.Scale(bn)
    relu = L.ReLU(scale)
    return conv, bn, scale, relu
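A sketch of how such a helper can be combined with NetSpec to emit a very deep prototxt without typing thousands of lines; the layer names, the block count, and the missing residual (shortcut) connections are illustrative only:

import caffe
from caffe import layers as L, params as P

n = caffe.NetSpec()
n.data, n.label = L.Data(source='examples/cifar10/cifar10_train_lmdb',
                         backend=P.Data.LMDB, batch_size=100, ntop=2)

bottom = n.data
for i in range(10):   # a real ResNet repeats the block far more often
    conv, bn, scale, relu = conv_bn_cifar10(bottom, nout=16, pad=1, is_test=False)
    setattr(n, 'conv%d' % i, conv)
    setattr(n, 'bn%d' % i, bn)
    setattr(n, 'scale%d' % i, scale)
    setattr(n, 'relu%d' % i, relu)
    bottom = relu

with open('resnet_like_train.prototxt', 'w') as f:
    f.write(str(n.to_proto()))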


Issues when Training Neural Networks

Most slides were obtained from Stanford CS231n (Prof. Fei-Fei Li).

ConvNets need a lot of data to train.

1. Train on ImageNet (ImageNet data)
2. Finetune the network on your own data (your data)

1. Train on ImageNet.
2. If you have a small dataset: fix all the weights (treat the CNN as a fixed feature extractor) and retrain only the classifier, i.e. swap the Softmax layer at the end.
3. If you have a medium-sized dataset, "finetune" instead: use the old weights as initialization and train the full network or only some of the higher layers; retrain a bigger portion of the network, or even all of it.

Transfer Learning

E.g. Caffe Model Zoo: lots of pretrained ConvNets, https://github.com/BVLC/caffe/wiki/Model-Zoo

...

Tuning Learning Rate.

Double check that the loss is reasonable:
- run the network once with regularization disabled (it returns the loss and the gradient for all parameters);
- loss ~2.3, which is "correct" for 10 classes (a softmax at chance gives -ln(1/10) ≈ 2.3).

Double check that the loss is reasonable:
- crank up regularization;
- the loss went up, good. (sanity check)

Let's try to train now…

Tip: Make sure that you can overfit a very small portion of the training data. The check:
- take the first 20 examples from CIFAR-10
- turn off regularization (reg = 0.0)
- use simple vanilla 'sgd'

Very small loss, train accuracy 1.00, nice!

Let's try to train now…

I like to start with small regularization and find a learning rate that makes the loss go down.

Loss barely changing: the learning rate is probably too low.
Notice that the train/val accuracy still goes to 20%, though; what's up with that? (Remember this is softmax: the correct-class scores can become relatively larger, improving the accuracy, while the loss barely moves.)

Okay, now let's try learning rate 1e6. What could possibly go wrong?
cost: NaN almost always means a high learning rate…

Loss exploding: learning rate too high. 3e-3 is still too high, the cost explodes.

=> The rough range for the learning rate we should be cross-validating is somewhere in [1e-3 … 1e-5].

Monitor and visualize the loss curve

Monitor and visualize the accuracy:
- big gap between training and validation accuracy = overfitting => increase regularization strength?
- no gap => increase model capacity?
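In Caffe these curves can be collected by stepping the solver from Python; a minimal sketch, assuming the net defines 'loss' and 'accuracy' tops and the solver has a test net:

import caffe

caffe.set_mode_gpu()
solver = caffe.get_solver('solver.prototxt')

train_loss, test_acc = [], []
for it in range(1000):
    solver.step(1)                                          # one SGD iteration
    train_loss.append(float(solver.net.blobs['loss'].data))
    if it % 100 == 0:
        solver.test_nets[0].forward()
        test_acc.append(float(solver.test_nets[0].blobs['accuracy'].data))
# plot train_loss and test_acc to get the curves discussed above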

Squeezing out the last few percent.

VGG Net (Oxford)
Second place in the ILSVRC classification task. Stacked 3x3 filters, no LRN layers, stride 1.

The power of small filters

Suppose we stack two CONV layers with receptive field size 3x3 (and stride 1).
=> Each neuron in the 1st CONV sees a 3x3 region of the input.
Q: What region of the input does each neuron in the 2nd CONV see?
Answer: [5x5]

Suppose we stack three CONV layers with receptive field size 3x3.
Q: What region of the input does each neuron in the 3rd CONV see?
Answer: [7x7]

The power of small filters

Suppose the input has depth C and we want output depth C as well.

1x CONV with 7x7 filters: number of weights = C*(7*7*C) = 49 C^2
3x CONV with 3x3 filters: number of weights = C*(3*3*C) + C*(3*3*C) + C*(3*3*C) = 3 * 9 * C^2 = 27 C^2

Fewer parameters and more nonlinearities = GOOD.
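A short numeric check of the counts above, for an example depth C (biases ignored, as in the slide):

C = 64                              # example channel depth
one_7x7 = C * (7 * 7 * C)           # 49 * C^2
three_3x3 = 3 * (C * (3 * 3 * C))   # 27 * C^2
print(one_7x7, three_3x3)           # 200704 110592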

The power of small filters

"More non-linearities" and "deeper" usually give better performance.
=> 1x1 CONV! (Usually follows a normal CONV, e.g. [3x3 CONV - 1x1 CONV].)
[Network in Network, Lin et al. 2013]

The 3x3 CONV looks at a 3x3 patch of the input; the following 1x1 CONV looks at a single position of the 3x3 CONV's output, mixing its channels.

The power of small filters

[Very Deep Convolutional Networks for Large-Scale Image Recognition, Simonyan et al., 2014]
=> Evidence that using 3x3 instead of 1x1 works better.

Data Augmentation

• i.e. simulating "fake" data
• explicitly encoding image transformations that shouldn't change object identity

What the computer sees

1. Flip horizontally

2. Random crops/scales
Sample these during training (also helps a lot at test time); e.g. it is common to see even up to 150 crops used.

3. Random mix/combinations of:
- translation
- rotation
- stretching
- shearing
- lens distortions, … (go crazy)

4. Color jittering (maybe even contrast jittering, etc.)
- Simple: change the contrast in small amounts, jitter the color distributions, etc.
- Vignette, … (go crazy)

Fancy PCA way:
1. Compute PCA on all [R,G,B] pixel values in the training data.
2. Sample some color offset along the principal components at each forward pass.
3. Add the offset to all pixels of a training image.
(As seen in [Krizhevsky et al. 2012])
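A small numpy sketch of this fancy PCA color jitter; the function names, sigma, and the way pixels are gathered are illustrative:

import numpy as np

def fit_pca_color_jitter(pixels):
    # pixels: N x 3 array of [R, G, B] values gathered from the training set
    cov = np.cov(pixels, rowvar=False)       # 3x3 covariance of the color channels
    eigvals, eigvecs = np.linalg.eigh(cov)   # principal components of the color distribution
    def jitter(image, sigma=0.1):
        # image: H x W x 3; add one random offset along the principal components to every pixel
        alphas = np.random.normal(0.0, sigma, size=3)
        return image + eigvecs.dot(alphas * eigvals)
    return jitter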

Notice the more general theme:
1. Introduce a form of randomness in the forward pass.
2. Marginalize over the noise distribution during prediction.
Examples: Data Augmentation, Dropout, DropConnect, Model Ensembles.

Real-time application using Caffe.

ImageNet Competition
Total 1000 classes. Each class has about 1,300 images for training (1,300 x 1,000 = 1,300,000).
It takes about one week to train a CNN model.

Deep Learning vs. hand-crafted features

Real-time Object Recognition: VGG Net (Oxford)
Second place in the ILSVRC classification task. Stacked 3x3 filters, no LRN layers, stride 1.

Real-time Object Recognition (contd.)

CPU: Intel i5-4690 @ 3.5 GHz / RAM: 18 GB / GPU: NVIDIA GeForce GTX 770

Implementation Detail

Step 1. Windows 7 64-bit / NVIDIA graphics card (optional)

Step 2. Install CUDA 6.5

Step 3. Download Caffe-windows (http://github.com/niuzhiheng/caffe)

Step 4. Download 3rd party libraries (http://github.com/niuzhiheng/caffe)

Step 5. Download VGG pre-trained weights / architecture

(http://www.robots.ox.ac.uk/~vgg/research/very_deep/)

Step 6. Implement Code

Implementation Detail (contd.)

Pipeline: Camera → Frame (image) → Resizing (OpenCV) → CNN forward propagation (Caffe / CUDA) → Result (Top-5)

Implementation Detail (contd.)

// Test mode
Caffe::set_phase(Caffe::TEST);

// mode setting - CPU/GPU
Caffe::set_mode(Caffe::GPU);

// gpu device number
int device_id = 0;
Caffe::SetDevice(device_id);

// prototxt
Net<float> caffe_test_net("VGG_ILSVRC_19_layers_deploy.prototxt");

// caffemodel (weights)
caffe_test_net.CopyTrainedLayersFrom("VGG_ILSVRC_19_layers.caffemodel");

name: "VGG_ILSVRC_19_layers"input: "data"input_dim: 1input_dim: 3input_dim: 224input_dim: 224layers {bottom: "data"top: "conv1_1"name: "conv1_1"type: CONVOLUTIONconvolution_param {

num_output: 64pad: 1kernel_size: 3

}}layers {bottom: "conv1_1"top: "conv1_1"name: "relu1_1"type: RELU

}

<prototxt> http://caffe.berkeleyvision.org/

// copy the resized image into the input blob channel by channel, subtracting the per-channel mean
for (k = 0; k < 3; k++) {
  for (i = 0; i < IMAGE_SIZE; i++) {
    for (j = 0; j < IMAGE_SIZE; j++) {
      blob.mutable_cpu_data()[blob.offset(0, k, i, j)] =
          (float)(unsigned char)small_image->imageData[i * small_image->widthStep + j * small_image->nChannels + k] - mean_val[k];
    }
  }
}
input_vec.push_back(&blob);

// forward propagation
float loss;
const vector<Blob<float>*>& result = caffe_test_net.Forward(input_vec, &loss);

// copy the output (1000 class scores)
for (i = 0; i < 1000; i++) {
  output[i] = result[0]->cpu_data()[i];
}

Thank You!! E-mail: heechul@dgist.ac.kr
