TensorFlow Deep Learning Crash Course: Computer Vision Applications

TensorFlow Deep Learning Crash Course

Part 3: Computer Vision Applications

By Mark Chang

• Introduction to computer vision
• Model selection and parameter tuning
• Image recognition implementation

Introduction to Computer Vision

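Computer Vision
• Computer vision is the science of how to make machines "see".
• It uses computers in place of human eyes to recognize, track, and measure targets (machine vision), and then carries out further image processing.
• https://zh.wikipedia.org/wiki/%E8%AE%A1%E7%AE%97%E6%9C%BA%E8%A7%86%E8%A7%89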

Image Recognition

http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf

Object Detection

http://papers.nips.cc/paper/5207-deep-neural-networks-for-object-detection.pdf

Image Completion

http://arxiv.org/abs/1601.06759

Artistic Creation

http://arxiv.org/abs/1508.06576

Convolutional Neural Networks

Image Recognition
• The same digit may appear in different parts of an image.
• Yet all of these images represent the same digit.

Local Connectivity

Each neuron sees only a small patch of the image.

Parameter Sharing

Neurons of the same "kind" share the same weights.

Parameter Sharing

Neurons of different "kinds" have different weights.
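Local connectivity and parameter sharing together amount to a 2-D convolution: each output value looks at one small patch, and every position reuses the same kernel weights. Below is a minimal pure-Python sketch, assuming a single channel, no padding, and stride 1; the function name conv2d_single and the toy image are hypothetical illustrations, not TensorFlow code. Like tf.nn.conv2d, it computes a cross-correlation (the kernel is not flipped).

```python
def conv2d_single(image, kernel):
    """Slide one shared kernel over a 2-D image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Local connectivity: only the kh x kw patch at (i, j) contributes.
            # Parameter sharing: the same kernel weights are used at every (i, j).
            s = 0
            for a in range(kh):
                for b in range(kw):
                    s += image[i + a][j + b] * kernel[a][b]
            row.append(s)
        out.append(row)
    return out

image = [
    [0, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
]
kernel = [[0, 1], [0, 1]]  # responds to vertical strokes
print(conv2d_single(image, kernel))  # [[1, 2, 0], [1, 2, 0], [0, 2, 0]]
```

Because the kernel is shared, the strongest response (2) follows the vertical stroke wherever it sits in the image.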

Convolutional Neural Networks • Convolutional Layer

[Figure: a convolutional layer arranged as a width × height × depth volume; neurons at different positions share the same weights]

Convolutional Neural Networks • Stride • Padding

[Figure: panels comparing Stride = 1 with Stride = 2, and Padding = 0 with Padding = 1]
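For an input of width W, a filter of width F, stride S, and P zeros of padding on each side, the output width is floor((W - F + 2P) / S) + 1. The small helper below (its name, conv_output_size, is hypothetical) shows how stride and padding change that number:

```python
def conv_output_size(w, f, stride=1, padding=0):
    """Output width when a window of width f slides over width w."""
    return (w - f + 2 * padding) // stride + 1

# A 3-wide filter over a 5-wide input
print(conv_output_size(5, 3, stride=1, padding=0))  # 3
print(conv_output_size(5, 3, stride=2, padding=0))  # 2: a larger stride shrinks the output
print(conv_output_size(5, 3, stride=1, padding=1))  # 5: padding 1 preserves the width
```

The same formula applies to height; for square inputs and filters, one call covers both dimensions.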

Visual Cognition

http://www.nature.com/neuro/journal/v8/n8/images/nn0805-975-F1.jpg

Feature Extraction

Convolutional Neural Networks • Pooling Layer

A 2x2 pooling window with no overlap, no padding, and no weights, applied to a single channel (depth = 1):

Input:
1 3 2 4
5 7 6 8
0 0 4 4
6 6 0 0

Maximum Pooling:
7 8
6 4

Average Pooling:
4 5
3 2
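Both pooling results on this 4x4 example can be reproduced with a short sketch. It assumes non-overlapping 2x2 windows and even input dimensions; the function name pool2x2 is hypothetical.

```python
def pool2x2(matrix, reduce):
    """Non-overlapping 2x2 pooling (stride 2, no padding, no weights)."""
    out = []
    for i in range(0, len(matrix), 2):
        row = []
        for j in range(0, len(matrix[0]), 2):
            window = [matrix[i][j], matrix[i][j + 1],
                      matrix[i + 1][j], matrix[i + 1][j + 1]]
            row.append(reduce(window))
        out.append(row)
    return out

x = [
    [1, 3, 2, 4],
    [5, 7, 6, 8],
    [0, 0, 4, 4],
    [6, 6, 0, 0],
]
print(pool2x2(x, max))                        # [[7, 8], [6, 4]]
print(pool2x2(x, lambda w: sum(w) / len(w)))  # [[4.0, 5.0], [3.0, 2.0]]
```

Passing the reduction as a function makes clear that max and average pooling differ only in how each window is summarized; neither has trainable weights.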

Convolutional Neural Networks

[Figure: alternating Convolutional Layers and Pooling Layers stacked on the Input Layer, with the Receptive Fields of successive layers marked]

Convolutional Neural Networks

[Figure: Input Image (Input Layer) → Convolutional Layer → Max-pooling Layer, with receptive fields of Width = 3, Height = 3, showing the filter responses of each layer]
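Stacking layers widens the receptive field: a layer with kernel size k adds (k - 1) × j input pixels, where j is the product of the strides of all earlier layers. The sketch below applies that standard recurrence; the function name is hypothetical, and the last example assumes a 5x5 conv / 2x2 pool / 5x5 conv / 2x2 pool stack like the MNIST network implemented later in these slides.

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride) pairs, ordered from input to output."""
    r = 1      # receptive field size, in input pixels
    jump = 1   # distance between adjacent units of the current layer, in input pixels
    for k, s in layers:
        r += (k - 1) * jump
        jump *= s
    return r

print(receptive_field([(3, 1)]))                          # 3
print(receptive_field([(3, 1), (3, 1)]))                  # 5
print(receptive_field([(5, 1), (2, 2), (5, 1), (2, 2)]))  # 16
```

Pooling layers contribute little on their own, but their stride multiplies the growth contributed by every later layer.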

Image Recognition Implementation

Convolutional Neural Network Implementation: https://github.com/ckmarkoh/ntc_deeplearning_tensorflow/blob/master/sec3/convnet.ipynb

MNIST
• Digit recognition
• Multi-class classification: digits 0 to 9

https://www.tensorflow.org/versions/r0.7/images/MNIST.png

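Create Variables & Operators

```python
import tensorflow as tf  # TensorFlow 1.x API; under TF 2.x use tf.compat.v1 with eager execution disabled

# Weights: small random values from a truncated normal distribution
def weight_variable(shape):
    return tf.Variable(tf.truncated_normal(shape, stddev=0.1))

# Biases: a small positive constant, so ReLU units start out active
def bias_variable(shape):
    return tf.Variable(tf.constant(0.1, shape=shape))

# 2-D convolution with stride 1; 'SAME' padding keeps the height and width
def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

# 2x2 max pooling with stride 2, halving the height and width
def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
```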

Computational Graph

```python
# Inputs: flattened 28x28 images and one-hot labels
x_ = tf.placeholder(tf.float32, [None, 784], name="x_")
y_ = tf.placeholder(tf.float32, [None, 10], name="y_")
x_image = tf.reshape(x_, [-1, 28, 28, 1])

# Convolutional layer 1: 32 filters of 5x5, then 2x2 max pooling
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

# Convolutional layer 2: 64 filters of 5x5 over 32 channels, then 2x2 max pooling
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

# Fully connected layer: 7*7*64 features -> 1024 units, followed by dropout
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

# Output layer: softmax over the 10 digit classes
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
y = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
```
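Counting the trainable parameters of this graph shows where parameter sharing pays off. The sketch below multiplies out each weight shape used in the graph and adds one bias per output unit; the helper name layer_params is hypothetical.

```python
def layer_params(weight_shape):
    """Number of weights in the shape, plus one bias per output unit (last dimension)."""
    n = 1
    for d in weight_shape:
        n *= d
    return n + weight_shape[-1]

shapes = {
    "conv1": [5, 5, 1, 32],
    "conv2": [5, 5, 32, 64],
    "fc1": [7 * 7 * 64, 1024],
    "fc2": [1024, 10],
}
counts = {name: layer_params(s) for name, s in shapes.items()}
print(counts)  # {'conv1': 832, 'conv2': 51264, 'fc1': 3212288, 'fc2': 10250}
print(sum(counts.values()))  # 3274634
```

The two convolutional layers together hold about 52 thousand parameters because each filter is reused at every position, while the first fully connected layer alone holds over 3.2 million.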

Convolutional Neural Networks

Tensor shapes through the network (n = batch size):
x_image: nx28x28x1
h_conv1: nx28x28x32
h_pool1: nx14x14x32
h_conv2: nx14x14x64
h_pool2: nx7x7x64
h_fc1: nx1024
y: nx10
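These shapes follow from one rule: with padding='SAME', a conv or pooling layer of stride s maps a spatial size of d to ceil(d / s). A pure-Python trace under that rule is sketched below; the function names are hypothetical, and the batch dimension n is left symbolic.

```python
import math

def same_out(size, stride):
    """Spatial size after a 'SAME'-padded conv/pool layer: ceil(size / stride)."""
    return math.ceil(size / stride)

def trace_shapes(height=28, width=28):
    h, w, c = height, width, 1
    shapes = [("x_image", (h, w, c))]
    for conv, pool, channels in [("h_conv1", "h_pool1", 32),
                                 ("h_conv2", "h_pool2", 64)]:
        h, w, c = same_out(h, 1), same_out(w, 1), channels  # 5x5 conv, stride 1
        shapes.append((conv, (h, w, c)))
        h, w = same_out(h, 2), same_out(w, 2)               # 2x2 pool, stride 2
        shapes.append((pool, (h, w, c)))
    shapes.append(("h_fc1", (1024,)))
    shapes.append(("y", (10,)))
    return shapes

for name, shape in trace_shapes():
    print(name, "n x " + " x ".join(map(str, shape)))
```

The printed trace should match the list of shapes above, ending with h_pool2 at n x 7 x 7 x 64, which is why the fully connected layer expects 7*7*64 inputs.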

Reshape: x_image = tf.reshape(x_, [-1,28,28,1])

[Figure: x_ of shape n x 784 is reshaped into x_image of shape n x 28 x 28 x 1]
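A reshape only reinterprets the same values in row-major order, and a -1 tells TensorFlow to infer that one dimension from the total size. The pure-Python sketch below mimics both ideas; reshape_flat and infer_batch are hypothetical helper names, not TensorFlow functions.

```python
def reshape_flat(flat, height, width, channels):
    """Row-major reshape of a flat list into nested [height][width][channels] lists."""
    assert len(flat) == height * width * channels
    return [[[flat[(r * width + col) * channels + ch] for ch in range(channels)]
             for col in range(width)]
            for r in range(height)]

def infer_batch(total, *dims):
    """The size a -1 stands for: total element count divided by the other dimensions."""
    size = 1
    for d in dims:
        size *= d
    assert total % size == 0
    return total // size

pixels = list(range(784))                 # one flattened 28x28 image
img = reshape_flat(pixels, 28, 28, 1)
print(img[1][0][0])                       # 28: row 1 starts at flat index 28
print(infer_batch(3 * 784, 28, 28, 1))    # 3: the -1 becomes the batch size n
```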

Convolutional Layer: W_conv1 = weight_variable([5, 5, 1, 32]), b_conv1 = bias_variable([32])

[Figure: W_conv1 holds 32 filters, each 5x5 over 1 input channel (shape [5, 5, 1, 32]); b_conv1 holds one bias per filter (32 values)]

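Convolutional Layer: tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME') + b

[Figure: a 5x5 filter slides over the 28x28x1 input one pixel at a time (strides=1); padding='SAME' pads the border with zeros so the output stays 28x28. Each entry of strides corresponds to one input dimension, in the order [ batch, in_height, in_width, in_channels ].]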
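How much zero padding does padding='SAME' add? TensorFlow's documented rule is that the output size is ceil(in / stride) and the total padding is whatever the window needs to reach that size, split as evenly as possible with any extra zero going after. A sketch of that rule (the helper name same_padding is hypothetical):

```python
import math

def same_padding(in_size, k, stride):
    """Output size and (before, after) zero padding for padding='SAME'."""
    out = math.ceil(in_size / stride)
    pad_total = max((out - 1) * stride + k - in_size, 0)
    pad_before = pad_total // 2
    return out, (pad_before, pad_total - pad_before)

print(same_padding(28, 5, 1))  # (28, (2, 2)): the 5x5 conv keeps 28x28
print(same_padding(28, 2, 2))  # (14, (0, 0)): the 2x2 pool halves the size
print(same_padding(7, 2, 2))   # (4, (0, 1)): odd sizes get the extra zero at the end
```

For the 5x5 convolutions in this network, two zeros on each side are exactly what keeps the 28x28 (and later 14x14) size unchanged.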

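Convolutional Layer: tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME') + b

[Figure: an nx28x28x1 input becomes an nx28x28x32 output; the 28x28 size is preserved and each of the 32 filters produces one output channel]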

ReLU

h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)

$$
\mathrm{ReLU}(n_{in}) = \begin{cases} n_{in} & \text{if } n_{in} > 0 \\ 0 & \text{otherwise} \end{cases}
$$

Input:
-0.5 0.2 0.3 -0.1
0.2 -0.3 -0.4 -1.1
2.1 -2.1 0.1 1.2
0.2 3.0 -0.3 0.5

Output after ReLU:
0 0.2 0.3 0
0.2 0 0 0
2.1 0 0.1 1.2
0.2 3.0 0 0.5
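The element-wise ReLU on this 4x4 example takes one line of Python; the sketch below (function name relu is only illustrative, not tf.nn.relu) reproduces the output matrix exactly.

```python
def relu(matrix):
    """Element-wise ReLU: keep positive values, replace everything else with 0."""
    return [[v if v > 0 else 0 for v in row] for row in matrix]

n_in = [
    [-0.5, 0.2, 0.3, -0.1],
    [0.2, -0.3, -0.4, -1.1],
    [2.1, -2.1, 0.1, 1.2],
    [0.2, 3.0, -0.3, 0.5],
]
for row in relu(n_in):
    print(row)
```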

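Pooling Layer: tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

[Figure: ksize=[1, 2, 2, 1] takes the maximum over a 2x2 window in height and width, with size 1 in the batch and channel dimensions]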

Pooling Layer: tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

[Figure: with strides=2 and padding='SAME', the 2x2 window moves two pixels at a time, so a 28x28 input becomes 14x14]

Pooling Layer: h_pool1 = max_pool_2x2(h_conv1)

[Figure: h_conv1 of shape nx28x28x32 becomes h_pool1 of shape nx14x14x32]

Reshape: h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])

[Figure: h_pool2 of shape nx7x7x64 is flattened into h_pool2_flat of shape n x 3136 (7*7*64)]
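Flattening keeps the height, width, channel order, with the channel index changing fastest, so element (i, j, c) of a 7x7x64 volume lands at flat position (i * 7 + j) * 64 + c. The sketch below checks that on a hypothetical volume whose entries record their own positions; flatten_hwc is an illustrative name.

```python
def flatten_hwc(volume):
    """Flatten a nested [height][width][channels] list in row-major order."""
    return [v for row in volume for pixel in row for v in pixel]

h, w, c = 7, 7, 64
# Hypothetical volume: each entry is its own (i, j, ch) position
volume = [[[(i, j, ch) for ch in range(c)] for j in range(w)] for i in range(h)]
flat = flatten_hwc(volume)
print(len(flat))                  # 3136 = 7*7*64
print(flat[(2 * w + 3) * c + 5])  # (2, 3, 5)
```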

GoogLeNet Image Recognition: https://github.com/ckmarkoh/ntc_deeplearning_tensorflow/blob/master/sec3/googlenet.ipynb

GoogLeNet

http://www.cs.unc.edu/~wliu/papers/GoogLeNet.pdf

A 22-layer deep network

Training Data
• ILSVRC 2014 Classification Challenge: http://www.image-net.org/challenges/LSVRC/2014/
• Dataset: 1000 categories
  – Training: 1,200,000 images
  – Validation: 50,000 images
  – Testing: 100,000 images

Inception Module

Load Computational Graph

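```python
import numpy as np
import tensorflow as tf  # TensorFlow 1.x API

model_fn = 'tensorflow_inception_graph.pb'
graph = tf.Graph()
sess = tf.InteractiveSession(graph=graph)
# The .pb file is a binary serialized GraphDef, so open it in 'rb' mode
with open(model_fn, 'rb') as f:
    graph_def = tf.GraphDef.FromString(f.read())
# Image placeholder; subtract the mean pixel value and add a batch dimension
t_input = tf.placeholder(np.float32, name='input')
imagenet_mean = 139
t_preprocessed = tf.expand_dims(t_input - imagenet_mean, 0)
# Import the pretrained graph, feeding its 'input' from the preprocessed tensor
tf.import_graph_def(graph_def, {'input': t_preprocessed})
t_output = graph.get_tensor_by_name("import/output2:0")
```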

Load Labels

```python
import json

# label.json maps class indices (as strings) to class names
with open("label.json") as f:
    labels = json.load(f)
```

1: "kit fox, Vulpes macrotis", 2: "English setter", 3: "Siberian husky", 4: "Australian terrier", ...... 998: "stole", 999: "carbonara", 1000: "dumbbell"

Run Computational Graph

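```python
import numpy as np
import PIL.Image

# Load an image as RGB and resize it to 224x224 before feeding the graph
def load_image(imgfile):
    return np.float32(PIL.Image.open(imgfile).convert('RGB').resize((224, 224)))

# Run the graph and look up the label of the highest-scoring class
def get_class(image):
    scores = sess.run(t_output, {t_input: load_image(image)})
    return labels[str(np.argmax(scores))]

print(get_class('img/img1.jpg'))
```

Output:

leaf beetle, chrysomelid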
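The last step of get_class is an argmax followed by a dictionary lookup. Since JSON object keys are always strings, the integer index must go through str() before it can match a key in label.json. The stdlib-only sketch below uses a hypothetical three-entry label file and made-up scores in place of real network output; whether the graph's output indices line up with the file's 1-based keys depends on the model and label file, and is not something this sketch checks.

```python
import json

# Hypothetical label file in the same {"index": "name"} shape as label.json
labels = json.loads('{"1": "kit fox, Vulpes macrotis", "2": "English setter", "3": "Siberian husky"}')

def argmax(scores):
    """Index of the largest score (the first one wins on ties), like np.argmax on a 1-D list."""
    return max(range(len(scores)), key=lambda i: scores[i])

def get_class_from_scores(scores):
    # JSON keys are strings, so convert the integer index before the lookup
    return labels[str(argmax(scores))]

print(get_class_from_scores([0.05, 0.10, 0.80, 0.05]))  # English setter (index 2)
```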

Instructor Information

Mark Chang

• Email: ckmarkoh at gmail dot com
• Blog: http://cpmarkchang.logdown.com
• Github: https://github.com/ckmarkoh
• Facebook: https://www.facebook.com/ckmarkoh.chang
• Slideshare: http://www.slideshare.net/ckmarkohchang
• Linkedin: https://www.linkedin.com/pub/mark-chang/85/25b/847
