TensorFlow Deep Learning Quick-Start Class: Computer Vision Applications
TRANSCRIPT
TensorFlow Deep Learning Quick-Start Class
Part 3: Computer Vision Applications
By Mark Chang
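• Introduction to computer vision • Model selection and parameter tuning • Image recognition in practice
Introduction to Computer Vision
Computer vision • Computer vision is the science of making machines "see" • It uses computers in place of human eyes for machine vision tasks such as recognizing, tracking, and measuring targets, followed by further image processing. • https://zh.wikipedia.org/wiki/%E8%AE%A1%E7%AE%97%E6%9C%BA%E8%A7%86%E8%A7%89
Image recognition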
http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf
Object detection
http://papers.nips.cc/paper/5207-deep-neural-networks-for-object-detection.pdf
Image completion
http://arxiv.org/abs/1601.06759
Artistic creation
http://arxiv.org/abs/1508.06576
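Convolutional Neural Networks
Image recognition • The same digit may appear in different parts of an image • but those images still represent the same digit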
Local Connectivity
Each neuron sees only a small patch of the image
Parameter Sharing
Neurons of the same "type" share the same weights
Parameter Sharing
Neurons of different "types" have different weights
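The two ideas above, Local Connectivity and Parameter Sharing, can be sketched in plain Python. This is a minimal illustration and not the code from the slides: the function name `conv2d_valid`, the 2x2 kernel, and the two 4x4 test images are all hypothetical. Like `tf.nn.conv2d`, it computes a sliding dot product (cross-correlation) without flipping the kernel.

```python
def conv2d_valid(image, kernel):
    """Slide one shared kernel over a 2-D image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Local connectivity: this output only sees a kh x kw patch.
            # Parameter sharing: every position reuses the same kernel.
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Hypothetical detector for a small diagonal pattern.
kernel = [[1, 0],
          [0, 1]]

# The same pattern placed in two different corners of the image.
top_left = [[1, 0, 0, 0],
            [0, 1, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0]]
bottom_right = [[0, 0, 0, 0],
                [0, 0, 0, 0],
                [0, 0, 1, 0],
                [0, 0, 0, 1]]

resp_a = conv2d_valid(top_left, kernel)
resp_b = conv2d_valid(bottom_right, kernel)
print(resp_a)  # [[2, 0, 0], [0, 1, 0], [0, 0, 0]]
print(resp_b)  # [[0, 0, 0], [0, 1, 0], [0, 0, 2]]
```

The strongest response (2) has the same value in both cases and only its position moves, which is why one set of shared weights can recognize a digit wherever it appears in the image.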
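Convolutional Neural Networks • Convolutional Layer
[Figure: a convolutional layer. The input and output are volumes with width, height, and depth; the weights are shared across positions.]
Convolutional Neural Networks • Stride • Padding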
Stride = 1
Stride = 2
Padding = 0
Padding = 1
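How Stride and Padding change the output size can be checked with a short calculation. The sketch below uses the standard formula for explicit zero padding on both sides, which the slides do not state; the input size of 5 and filter size of 3 are hypothetical values chosen only for illustration.

```python
def conv_output_size(in_size, filter_size, stride, padding):
    """Output length along one axis, with `padding` zeros added on each side."""
    return (in_size + 2 * padding - filter_size) // stride + 1


# Hypothetical 5-wide input and 3-wide filter.
for stride in (1, 2):
    for padding in (0, 1):
        size = conv_output_size(5, 3, stride, padding)
        print("stride=%d padding=%d -> output size %d" % (stride, padding, size))
# stride=1 padding=0 -> output size 3
# stride=1 padding=1 -> output size 5
# stride=2 padding=0 -> output size 2
# stride=2 padding=1 -> output size 3
```

A larger stride shrinks the output, while padding compensates for the border positions the filter would otherwise lose.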
Visual perception
http://www.nature.com/neuro/journal/v8/n8/images/nn0805-975-F1.jpg
Feature extraction
Convolutional Neural Networks • Pooling Layer
Input (4x4, depth = 1):
1 3 2 4
5 7 6 8
0 0 4 4
6 6 0 0
2x2 pooling: no overlap, no padding, no weights
Maximum Pooling:
7 8
6 4
Average Pooling:
4 5
3 2
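The 4x4 pooling example above can be reproduced with a few lines of plain Python. This is a minimal sketch; the helper names `pool_2x2` and `average` are made up for illustration and are not part of TensorFlow.

```python
def pool_2x2(matrix, reduce_fn):
    """Apply reduce_fn to each non-overlapping 2x2 window (no padding)."""
    pooled = []
    for i in range(0, len(matrix), 2):
        row = []
        for j in range(0, len(matrix[0]), 2):
            window = [matrix[i][j], matrix[i][j + 1],
                      matrix[i + 1][j], matrix[i + 1][j + 1]]
            row.append(reduce_fn(window))
        pooled.append(row)
    return pooled


def average(values):
    return sum(values) / float(len(values))


x = [[1, 3, 2, 4],
     [5, 7, 6, 8],
     [0, 0, 4, 4],
     [6, 6, 0, 0]]

max_pooled = pool_2x2(x, max)
avg_pooled = pool_2x2(x, average)
print(max_pooled)  # [[7, 8], [6, 4]]
print(avg_pooled)  # [[4.0, 5.0], [3.0, 2.0]]
```

Pooling has no weights to learn: it only summarizes each window, so the layer adds no parameters to the model.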
Convolutional Neural Networks
[Figure: an Input Layer followed by alternating Convolutional Layers and Pooling Layers, with the Receptive Fields of the layers marked.]
Convolutional Neural Networks
[Figure: Input Image, Input Layer, Convolutional Layer with Receptive Fields and its Filter Responses, Max-pooling Layer with Width = 3, Height = 3 and its Filter Responses.]
Image Recognition in Practice
Convolutional neural network implementation https://github.com/ckmarkoh/ntc_deeplearning_tensorflow/blob/master/sec3/convnet.ipynb
MNIST • Digit recognition • Multi-class classification: 0~9
https://www.tensorflow.org/versions/r0.7/images/MNIST.png
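Create Variables & Operators
import tensorflow as tf

# Weights start as small random values from a truncated normal distribution.
def weight_variable(shape):
    return tf.Variable(tf.truncated_normal(shape, stddev=0.1))

# Biases start at a small positive constant.
def bias_variable(shape):
    return tf.Variable(tf.constant(0.1, shape=shape))

# Convolution with stride 1; 'SAME' padding keeps the width and height.
def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

# 2x2 max pooling with stride 2; halves the width and height.
def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')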
Computational Graph
# Inputs: flattened 28x28 images and one-hot labels for the 10 digits.
x_ = tf.placeholder(tf.float32, [None, 784], name="x_")
y_ = tf.placeholder(tf.float32, [None, 10], name="y_")
x_image = tf.reshape(x_, [-1,28,28,1])

# First convolution + pooling block: 1 input channel, 32 filters of size 5x5.
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

# Second convolution + pooling block: 32 input channels, 64 filters of size 5x5.
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

# Fully connected layer on the flattened 7x7x64 feature maps.
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

# Dropout, then the softmax output layer over the 10 classes.
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
y = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
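Convolutional Neural Networks
Tensor shapes through the network (n = batch size):
x_image: n x 28 x 28 x 1
h_conv1: n x 28 x 28 x 32
h_pool1: n x 14 x 14 x 32
h_conv2: n x 14 x 14 x 64
h_pool2: n x 7 x 7 x 64
h_fc1: n x 1024
y: n x 10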
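The shapes listed for x_image through h_pool2 can be derived by hand. The sketch below assumes TensorFlow's rule for padding='SAME', where each spatial dimension becomes ceil(input / stride); the helper names `same_out`, `conv_same`, and `pool_same` are hypothetical and only track shapes, not values.

```python
import math


def same_out(size, stride):
    """Spatial output size under padding='SAME': ceil(size / stride)."""
    return int(math.ceil(size / float(stride)))


def conv_same(shape, out_channels, stride=1):
    """Shape after a 'SAME' convolution: depth becomes the number of filters."""
    height, width, _ = shape
    return (same_out(height, stride), same_out(width, stride), out_channels)


def pool_same(shape, stride=2):
    """Shape after 'SAME' pooling: depth is unchanged."""
    height, width, depth = shape
    return (same_out(height, stride), same_out(width, stride), depth)


x_image = (28, 28, 1)              # per-example shape, batch dimension omitted
h_conv1 = conv_same(x_image, 32)   # (28, 28, 32)
h_pool1 = pool_same(h_conv1)       # (14, 14, 32)
h_conv2 = conv_same(h_pool1, 64)   # (14, 14, 64)
h_pool2 = pool_same(h_conv2)       # (7, 7, 64)
flat_size = h_pool2[0] * h_pool2[1] * h_pool2[2]
print(h_conv1, h_pool1, h_conv2, h_pool2, flat_size)
```

The final product, 7 * 7 * 64 = 3136, is the input width of the first fully connected layer.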
Reshape
x_image = tf.reshape(x_, [-1,28,28,1])
[Figure: x_ of shape n x 784 is reshaped into x_image of shape n x 28 x 28 x 1.]
Convolutional Layer
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
[Figure: W_conv1 holds 32 filters of size 5x5 over 1 input channel; b_conv1 holds 32 biases, one per filter.]
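The weight and bias shapes determine how many trainable parameters each layer has. The sketch below counts them for the four layers defined in the computational graph; the helper `num_params` and the layer names in the list are hypothetical labels used only for this tally.

```python
def num_params(weight_shape, bias_shape):
    """Number of trainable values in one layer's weight and bias tensors."""
    weights = 1
    for dim in weight_shape:
        weights *= dim
    biases = 1
    for dim in bias_shape:
        biases *= dim
    return weights + biases


# (label, weight shape, bias shape), copied from the graph definition.
layers = [
    ("conv1", [5, 5, 1, 32], [32]),
    ("conv2", [5, 5, 32, 64], [64]),
    ("fc1", [7 * 7 * 64, 1024], [1024]),
    ("fc2", [1024, 10], [10]),
]

counts = dict((name, num_params(w, b)) for name, w, b in layers)
total = sum(counts.values())
for name, _, _ in layers:
    print(name, counts[name])
print("total", total)
# conv1 832
# conv2 51264
# fc1 3212288
# fc2 10250
# total 3274634
```

Because of parameter sharing, the two convolutional layers together hold under 2% of the parameters; almost all of them sit in the first fully connected layer.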
Convolutional Layer
tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME') + b
[Figure: a 5x5 filter moves over the 28x28 input in 1x1 steps (strides=1); with padding='SAME' the output is also 28x28.]
Input layout: [ batch, in_height, in_width, in_channels ]
Convolutional Layer
tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME') + b
[Figure: input of shape n x 28 x 28 x 1, output of shape n x 28 x 28 x 32.]
ReLU
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
$$\mathrm{ReLU}(n_{in}) = \begin{cases} n_{in} & \text{if } n_{in} > 0 \\ 0 & \text{otherwise} \end{cases}$$
Input:
-0.5 0.2 0.3 -0.1
0.2 -0.3 -0.4 -1.1
2.1 -2.1 0.1 1.2
0.2 3.0 -0.3 0.5
Output:
0 0.2 0.3 0
0.2 0 0 0
2.1 0 0.1 1.2
0.2 3.0 0 0.5
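The ReLU example above can be checked element by element. This is a minimal pure-Python sketch; `relu` here is a hypothetical helper that mirrors the definition, not the `tf.nn.relu` implementation.

```python
def relu(matrix):
    """Keep positive entries and replace everything else with 0."""
    return [[value if value > 0 else 0 for value in row] for row in matrix]


n_in = [[-0.5, 0.2, 0.3, -0.1],
        [0.2, -0.3, -0.4, -1.1],
        [2.1, -2.1, 0.1, 1.2],
        [0.2, 3.0, -0.3, 0.5]]

n_out = relu(n_in)
for row in n_out:
    print(row)
# [0, 0.2, 0.3, 0]
# [0.2, 0, 0, 0]
# [2.1, 0, 0.1, 1.2]
# [0.2, 3.0, 0, 0.5]
```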
Pooling Layer
tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
[Figure: ksize=[1, 2, 2, 1] is a 1x2x2x1 window: 2x2 in height and width, 1 along the batch and channel dimensions.]
Pooling Layer
tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
[Figure: the 2x2 window moves with strides=2 and padding='SAME', so a 28x28 input becomes a 14x14 output.]
Pooling Layer
h_pool1 = max_pool_2x2(h_conv1)
[Figure: input of shape n x 28 x 28 x 32, output of shape n x 14 x 14 x 32.]
Reshape
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
[Figure: h_pool2 of shape n x 7 x 7 x 64 is flattened into h_pool2_flat of shape n x 7*7*64.]
GoogLeNet image recognition https://github.com/ckmarkoh/ntc_deeplearning_tensorflow/blob/master/sec3/googlenet.ipynb
GoogLeNet
http://www.cs.unc.edu/~wliu/papers/GoogLeNet.pdf
22 layers deep network
Training data
• ILSVRC 2014 Classification Challenge – http://www.image-net.org/challenges/LSVRC/2014/
• Dataset: 1000 categories – Training: 1,200,000 – Validation: 50,000 – Testing: 100,000
Inception Module
Load Computational Graph
import numpy as np
import tensorflow as tf

model_fn = 'tensorflow_inception_graph.pb'
graph = tf.Graph()
sess = tf.InteractiveSession(graph=graph)
# Read the serialized graph definition as bytes.
graph_def = tf.GraphDef.FromString(open(model_fn, 'rb').read())
t_input = tf.placeholder(np.float32, name='input')
imagenet_mean = 139
# Subtract the mean pixel value and add a batch dimension.
t_preprocessed = tf.expand_dims(t_input - imagenet_mean, 0)
tf.import_graph_def(graph_def, {'input': t_preprocessed})
t_output = graph.get_tensor_by_name("import/output2:0")
Load Label
import json

f = open("label.json")
labels = json.loads("".join(f.readlines()))
f.close()
Sample entries:
1: "kit fox, Vulpes macrotis",
2: "English setter",
3: "Siberian husky",
4: "Australian terrier",
......
998: "stole",
999: "carbonara",
1000: "dumbbell"
Run Computational Graph
import PIL.Image

def load_image(imgfile):
    # Resize to 224x224 and convert to a float32 array.
    return np.float32(PIL.Image.open(imgfile).resize((224,224)))

def get_class(image):
    # Look up the label of the highest-scoring class.
    return labels[str(np.argmax(sess.run([t_output], {t_input: load_image(image)})))]

print(get_class('img/img1.jpg'))
Output:
leaf beetle, chrysomelid
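The lookup inside get_class converts the argmax index to a string because JSON object keys are always strings. The sketch below shows that step in isolation, under stated assumptions: the three-class label table and the score list are hypothetical stand-ins, not the contents of label.json or real network output, and `argmax` is a pure-Python substitute for `np.argmax`.

```python
import json


def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    best = 0
    for i in range(1, len(values)):
        if values[i] > values[best]:
            best = i
    return best


# Hypothetical label table; keys become strings after JSON parsing.
labels = json.loads('{"0": "cat", "1": "dog", "2": "bird"}')

# Hypothetical class scores for one image.
scores = [0.1, 0.7, 0.2]

index = argmax(scores)
predicted = labels[str(index)]
print(index, predicted)  # 1 dog
```

Indexing with the integer directly (`labels[index]`) would raise a KeyError, which is why the string conversion is needed.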
About the Instructor
Mark Chang
• Email: ckmarkoh at gmail dot com
• Blog: http://cpmarkchang.logdown.com
• Github: https://github.com/ckmarkoh
• Facebook: https://www.facebook.com/ckmarkoh.chang
• Slideshare: http://www.slideshare.net/ckmarkohchang
• Linkedin: https://www.linkedin.com/pub/mark-chang/85/25b/847