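ntc_tensor flow Deep Learning Quick-Start Class, part 2: Deep Learning

TensorFlow Deep Learning Quick-Start Class

Part 2: Deep Learning

By Mark Chang

•  Principles of deep learning
•  Model selection and parameter tuning
•  Multilayer perceptron implementation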

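Principles of Deep Learning

Machine Learning

Supervised Learning

Unsupervised Learning

Reinforcement Learning

Deep Learning

Deep Learning
•  A machine learning method
•  Uses a computer to simulate the structure of the human nervous system
•  Lets the computer learn to do what the human brain can do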

Neurons and Action Potentials

http://humanphisiology.wikispaces.com/file/view/neuron.png/216460814/neuron.png

http://upload.wikimedia.org/wikipedia/commons/thumb/4/4a/Action_potential.svg/1037px-Action_potential.svg.png

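Simulating a Neuron

(Diagram: a neuron n receives inputs x1 and x2 through weights w1 and w2, plus a bias input b through weight wb, and produces output y.)

$$n_{in} = w_1 x_1 + w_2 x_2 + w_b$$

$$n_{out} = \frac{1}{1 + e^{-n_{in}}}$$

$$y = \frac{1}{1 + e^{-(w_1 x_1 + w_2 x_2 + w_b)}}$$

(Figure: n_out plotted over the (x1, x2) plane, with levels labeled n_out = 1, n_out = 0.5, and n_out = 0, and the origin (0,0) marked.)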

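Simulating a Neuron

$$n_{in} = w_1 x_1 + w_2 x_2 + w_b$$

$$n_{out} = \frac{1}{1 + e^{-n_{in}}}$$

(Figure: the line below splits the (x1, x2) plane into two regions.)

$$w_1 x_1 + w_2 x_2 + w_b = 0 \quad \text{(boundary)}$$

$$w_1 x_1 + w_2 x_2 + w_b > 0 \quad \text{(output close to 1)}$$

$$w_1 x_1 + w_2 x_2 + w_b < 0 \quad \text{(output close to 0)}$$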
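The neuron equations on these slides can be sketched in a few lines of plain Python. The weights below are hypothetical values chosen only for illustration; they are not from the slides.

```python
import math

def neuron(x1, x2, w1, w2, wb):
    """Return the sigmoid output of one neuron with two inputs and a bias weight."""
    n_in = w1 * x1 + w2 * x2 + wb
    return 1.0 / (1.0 + math.exp(-n_in))

# Hypothetical weights, chosen only for illustration.
w1, w2, wb = 1.0, 1.0, -1.0

on_boundary = neuron(0.5, 0.5, w1, w2, wb)      # n_in = 0
positive_side = neuron(3.0, 3.0, w1, w2, wb)    # n_in = 5
negative_side = neuron(-2.0, -2.0, w1, w2, wb)  # n_in = -5
```

A point on the line n_in = 0 gives an output of exactly 0.5, and points further from the line on either side push the output toward 1 or 0.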

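Binary Classification: AND Gate

x1  x2  y
0   0   0
0   1   0
1   0   0
1   1   1

(Figure: the four input points (0,0), (0,1), (1,0), (1,1) on the (x1, x2) plane, with regions labeled 0 and 1.)

(Diagram: neuron n with weight 20 on x1, weight 20 on x2, and bias weight -30, producing y.)

$$y = \frac{1}{1 + e^{-(20x_1 + 20x_2 - 30)}}$$

$$20x_1 + 20x_2 - 30 = 0$$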
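As a quick check, the AND neuron can be evaluated on all four inputs. This is a minimal sketch using the slide's weights (20, 20, and bias weight -30); the function names and the rounding step are assumptions added for illustration.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def and_gate(x1, x2):
    # Weights 20, 20 and bias weight -30, as on the slide.
    return sigmoid(20 * x1 + 20 * x2 - 30)

# Round each output to the nearest integer to read it as a class label.
and_outputs = {(x1, x2): round(and_gate(x1, x2))
               for x1 in (0, 1) for x2 in (0, 1)}
```

Only the input (1, 1) lands on the positive side of the line 20x1 + 20x2 - 30 = 0, so only that input rounds to 1.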

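XOR Gate?

(Figure: the four input points (0,0), (0,1), (1,0), (1,1) on the (x1, x2) plane, labeled 0 or 1.)

x1  x2  y
0   0   0
0   1   1
1   0   1
1   1   0

No single straight line separates the 1s from the 0s, so one neuron is not enough.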

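Binary Classification: XOR Gate

(Diagram: neuron n1 with weights 20 and 20 on x1 and x2 and bias weight -30; neuron n2 with weights 20 and 20 on x1 and x2 and bias weight -10; output neuron n with weights -20 and 20 and bias weight -10, producing y. To agree with the table below, -20 is the weight on n1 and 20 is the weight on n2.)

(Figure: three panels of the four input points (0,0), (0,1), (1,0), (1,1), showing the regions labeled 0 and 1 for each neuron.)

x1  x2  n1  n2  y
0   0   0   0   0
0   1   0   1   1
1   0   0   1   1
1   1   1   1   0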
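The two-layer construction can be checked numerically. This is a sketch under one assumption: the slide lists the output neuron's weights as -20, 20, and bias weight -10 without saying which input each belongs to, and the assignment below (-20 on n1, 20 on n2) is the one that reproduces the table.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def xor_gate(x1, x2):
    n1 = sigmoid(20 * x1 + 20 * x2 - 30)  # fires only for (1, 1), like AND
    n2 = sigmoid(20 * x1 + 20 * x2 - 10)  # fires unless both inputs are 0, like OR
    # Assumed assignment: -20 on n1, 20 on n2, bias weight -10.
    return sigmoid(-20 * n1 + 20 * n2 - 10)

# Round each output to the nearest integer to read it as a class label.
xor_outputs = {(x1, x2): round(xor_gate(x1, x2))
               for x1 in (0, 1) for x2 in (0, 1)}
```

The output neuron fires when n2 is on and n1 is off, which is exactly the XOR pattern.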

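Neural Network

(Diagram: a fully connected network with three layers.)

Input Layer: x, y, and a bias b
Hidden Layer: n11, n12, and a bias b
Output Layer: n21, n22, compared with the targets z1, z2

Weights into the Hidden Layer: W11,x, W11,y, W11,b, W12,x, W12,y, W12,b
Weights into the Output Layer: W21,11, W21,12, W21,b, W22,11, W22,12, W22,b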

Visual Cognition

http://www.nature.com/neuro/journal/v8/n8/images/nn0805-975-F1.jpg

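Training a Neural Network
•  Initialize the model parameters w with random values
•  Forward Propagation
   – Compute an answer using the current model parameters
•  Compute the error (using the Error Function)
•  Backward Propagation
   – Use the error to correct the model

Long-Term Memory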

http://www.pnas.org/content/102/49/17846/F7.large.jpg

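Training a Neural Network

(Diagram: training data → machine learning model → output value. The output value is checked against the correct answer; if the answer is wrong, the model is corrected.)

Initialization → Forward Propagation → Error Function → Backward Propagation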

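Initialization
•  Set every W to a random number between -N and N
•  The W values in each layer must not all be the same

(Diagram: the same three-layer network as on the Neural Network slide, with inputs x, y, hidden neurons n11, n12, output neurons n21, n22, and targets z1, z2.)

$$N = \frac{\sqrt{6}}{\sqrt{L_{k-1} + L_k}}$$

$L_{k-1}$: size of the previous layer
$L_k$: size of the current layer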
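The initialization rule can be sketched with the standard library. The function name, the fixed seed, and the layer sizes (784 and 200, borrowed from the MNIST model later in this deck) are assumptions made for illustration.

```python
import math
import random

def init_weights(prev_size, size, rng):
    """Return a prev_size x size weight matrix drawn uniformly from [-N, N]."""
    n = math.sqrt(6.0) / math.sqrt(prev_size + size)
    return [[rng.uniform(-n, n) for _ in range(size)]
            for _ in range(prev_size)]

rng = random.Random(0)            # fixed seed so the sketch is repeatable
W = init_weights(784, 200, rng)   # hypothetical layer sizes
bound = math.sqrt(6.0) / math.sqrt(784 + 200)
```

Drawing each weight independently keeps the weights from being identical, so the neurons in a layer do not all receive the same update.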

Forward Propagation

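Error Function

$$J = -\left(z_1 \log n_{21(out)} + (1 - z_1)\log(1 - n_{21(out)})\right) - \left(z_2 \log n_{22(out)} + (1 - z_2)\log(1 - n_{22(out)})\right)$$

(Diagram: the outputs n21 and n22 are compared with the targets z1 and z2.)

$$n_{out} \approx 0 \text{ and } z = 0 \Rightarrow J \approx 0$$

$$n_{out} \approx 1 \text{ and } z = 1 \Rightarrow J \approx 0$$

$$n_{out} \approx 0 \text{ and } z = 1 \Rightarrow J \approx \infty$$

$$n_{out} \approx 1 \text{ and } z = 0 \Rightarrow J \approx \infty$$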
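A small numeric example may make the error function's behavior concrete. This is a sketch, with the function name and the sample outputs chosen only for illustration. Outputs of exactly 0 or 1 are avoided because log(0) is undefined.

```python
import math

def cross_entropy(outputs, targets):
    """Sum the binary cross-entropy over all output neurons."""
    return -sum(z * math.log(n) + (1 - z) * math.log(1 - n)
                for n, z in zip(outputs, targets))

# Hypothetical outputs for the targets z1 = 0, z2 = 1.
good = cross_entropy([0.01, 0.99], [0, 1])  # outputs agree with the targets
bad = cross_entropy([0.99, 0.01], [0, 1])   # outputs contradict the targets
```

The cost stays near 0 when each output is close to its target and grows without bound as an output approaches the wrong extreme.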

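Gradient Descent

(Figure: the error surface over the weights w0 and w1. Each step moves in the direction of the negative gradient, $\left(-\frac{\partial J}{\partial w_0}, -\frac{\partial J}{\partial w_1}\right)$.)

$$w_{21,11} \leftarrow w_{21,11} - \eta \frac{\partial J}{\partial w_{21,11}}$$
$$w_{21,12} \leftarrow w_{21,12} - \eta \frac{\partial J}{\partial w_{21,12}}$$
$$w_{21,b} \leftarrow w_{21,b} - \eta \frac{\partial J}{\partial w_{21,b}}$$
$$w_{22,11} \leftarrow w_{22,11} - \eta \frac{\partial J}{\partial w_{22,11}}$$
$$w_{22,12} \leftarrow w_{22,12} - \eta \frac{\partial J}{\partial w_{22,12}}$$
$$w_{22,b} \leftarrow w_{22,b} - \eta \frac{\partial J}{\partial w_{22,b}}$$
$$w_{11,x} \leftarrow w_{11,x} - \eta \frac{\partial J}{\partial w_{11,x}}$$
$$w_{11,y} \leftarrow w_{11,y} - \eta \frac{\partial J}{\partial w_{11,y}}$$
$$w_{11,b} \leftarrow w_{11,b} - \eta \frac{\partial J}{\partial w_{11,b}}$$
$$w_{12,x} \leftarrow w_{12,x} - \eta \frac{\partial J}{\partial w_{12,x}}$$
$$w_{12,y} \leftarrow w_{12,y} - \eta \frac{\partial J}{\partial w_{12,y}}$$
$$w_{12,b} \leftarrow w_{12,b} - \eta \frac{\partial J}{\partial w_{12,b}}$$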
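The update rule can be tried on a toy error surface with two weights. The quadratic J below is a hypothetical stand-in for the network's real error function, and the learning rate and step count are arbitrary choices for illustration.

```python
def J(w0, w1):
    # Hypothetical bowl-shaped error surface with its minimum at (3, -1).
    return (w0 - 3.0) ** 2 + (w1 + 1.0) ** 2

def grad_J(w0, w1):
    # Partial derivatives of J with respect to w0 and w1.
    return 2.0 * (w0 - 3.0), 2.0 * (w1 + 1.0)

eta = 0.1           # learning rate
w0, w1 = 0.0, 0.0   # starting point
for _ in range(100):
    g0, g1 = grad_J(w0, w1)
    w0 -= eta * g0  # w0 <- w0 - eta * dJ/dw0
    w1 -= eta * g1  # w1 <- w1 - eta * dJ/dw1
```

Each step moves against the gradient, so the weights settle at the bottom of the bowl.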


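$$\delta_{21(out)} = \frac{\partial J}{\partial n_{21(out)}}, \qquad \delta_{21(in)} = \frac{\partial J}{\partial n_{21(in)}} = \frac{\partial J}{\partial n_{21(out)}} \frac{\partial n_{21(out)}}{\partial n_{21(in)}}$$

$$\begin{aligned}
\frac{\partial J}{\partial w_{21,11}}
&= \frac{\partial J}{\partial n_{21(out)}} \frac{\partial n_{21(out)}}{\partial n_{21(in)}} \frac{\partial n_{21(in)}}{\partial w_{21,11}} \\
&= \delta_{21(in)} \frac{\partial n_{21(in)}}{\partial w_{21,11}} \\
&= n_{11(out)}\, \delta_{21(in)}
\end{aligned}$$

$$w_{21,11} \leftarrow w_{21,11} - \eta \frac{\partial J}{\partial w_{21,11}}$$

$$w_{21,11} \leftarrow w_{21,11} - \eta\, n_{11(out)}\, \delta_{21(in)}$$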


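Backward Propagation

$$w_{11,x} \leftarrow w_{11,x} - \eta \frac{\partial J}{\partial w_{11,x}}$$

$$w_{11,x} \leftarrow w_{11,x} - \eta\, \delta_{11(in)}\, x$$

$$\begin{aligned}
\delta_{11(in)} = \frac{\partial J}{\partial n_{11(in)}}
&= \frac{\partial J}{\partial n_{21(out)}} \frac{\partial n_{21(out)}}{\partial n_{11(in)}} + \frac{\partial J}{\partial n_{22(out)}} \frac{\partial n_{22(out)}}{\partial n_{11(in)}} \\
&= \frac{\partial J}{\partial n_{21(out)}} \frac{\partial n_{21(out)}}{\partial n_{21(in)}} \frac{\partial n_{21(in)}}{\partial n_{11(out)}} \frac{\partial n_{11(out)}}{\partial n_{11(in)}} + \frac{\partial J}{\partial n_{22(out)}} \frac{\partial n_{22(out)}}{\partial n_{22(in)}} \frac{\partial n_{22(in)}}{\partial n_{11(out)}} \frac{\partial n_{11(out)}}{\partial n_{11(in)}} \\
&= \left( \frac{\partial J}{\partial n_{21(out)}} \frac{\partial n_{21(out)}}{\partial n_{21(in)}} \frac{\partial n_{21(in)}}{\partial n_{11(out)}} + \frac{\partial J}{\partial n_{22(out)}} \frac{\partial n_{22(out)}}{\partial n_{22(in)}} \frac{\partial n_{22(in)}}{\partial n_{11(out)}} \right) \frac{\partial n_{11(out)}}{\partial n_{11(in)}} \\
&= \left( \delta_{21(in)} w_{21,11} + \delta_{22(in)} w_{22,11} \right) \frac{\partial n_{11(out)}}{\partial n_{11(in)}}
\end{aligned}$$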
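The derivation can be checked numerically on the 2-2-2 network from the slides. This is a sketch, not the author's code: it assumes sigmoid neurons with the cross-entropy error function shown earlier, and the weights, input, and targets are hypothetical values. A finite-difference estimate of the gradient for w11,x is compared with the backpropagated value.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def forward(W1, W2, x, y):
    """W1[i] = (w_x, w_y, w_b) for hidden neuron i;
    W2[k] = (w_1, w_2, w_b) for output neuron k."""
    hidden = [sigmoid(w[0] * x + w[1] * y + w[2]) for w in W1]
    out = [sigmoid(w[0] * hidden[0] + w[1] * hidden[1] + w[2]) for w in W2]
    return hidden, out

def cost(W1, W2, x, y, z):
    _, out = forward(W1, W2, x, y)
    return -sum(t * math.log(o) + (1 - t) * math.log(1 - o)
                for o, t in zip(out, z))

def backward(W1, W2, x, y, z):
    hidden, out = forward(W1, W2, x, y)
    # For a sigmoid output with cross-entropy,
    # dJ/dn_out * dn_out/dn_in simplifies to (out - z).
    delta_out = [o - t for o, t in zip(out, z)]
    # delta_1i(in) = (delta_21(in) w_21,1i + delta_22(in) w_22,1i) * dn_1i(out)/dn_1i(in)
    delta_hidden = [
        sum(delta_out[k] * W2[k][i] for k in range(2))
        * hidden[i] * (1 - hidden[i])
        for i in range(2)
    ]
    grad_W2 = [[d * hidden[0], d * hidden[1], d] for d in delta_out]
    grad_W1 = [[d * x, d * y, d] for d in delta_hidden]
    return grad_W1, grad_W2

# Hypothetical weights, input, and targets, chosen only for illustration.
W1 = [[0.1, -0.2, 0.05], [0.3, 0.4, -0.1]]
W2 = [[0.2, -0.3, 0.1], [-0.4, 0.5, 0.0]]
x, y, z = 0.5, -1.0, [1.0, 0.0]

grad_W1, grad_W2 = backward(W1, W2, x, y, z)

# Finite-difference estimate of dJ/dw_11,x for comparison.
eps = 1e-6
W1_plus = [row[:] for row in W1]
W1_plus[0][0] += eps
W1_minus = [row[:] for row in W1]
W1_minus[0][0] -= eps
numeric = (cost(W1_plus, W2, x, y, z) - cost(W1_minus, W2, x, y, z)) / (2 * eps)
```

If the two values agree, the chain-rule expression for the hidden-layer delta is consistent with the error function.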


http://cpmarkchang.logdown.com/posts/277349-neural-network-backward-propagation

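Model Selection and Parameter Tuning

Model Types
•  Nonlinear transformations

(Diagram: a neuron n with inputs x1 and x2 through weights w1 and w2, plus a bias input b through weight wb.)

$$n_{in} = w_1 x_1 + w_2 x_2 + w_b$$

Sigmoid:

$$n_{out} = \frac{1}{1 + e^{-n_{in}}}$$

tanh:

$$n_{out} = \frac{1 - e^{-2n_{in}}}{1 + e^{-2n_{in}}}$$

ReLU:

$$n_{out} = \begin{cases} n_{in} & \text{if } n_{in} > 0 \\ 0 & \text{otherwise} \end{cases}$$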
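The three nonlinear transformations can be written directly from their formulas. This is a minimal sketch; the function names and sample inputs are chosen for illustration.

```python
import math

def sigmoid(n_in):
    return 1.0 / (1.0 + math.exp(-n_in))

def tanh(n_in):
    return (1.0 - math.exp(-2.0 * n_in)) / (1.0 + math.exp(-2.0 * n_in))

def relu(n_in):
    return n_in if n_in > 0 else 0.0

# Evaluate each function at a negative, zero, and positive input.
values = [(t, sigmoid(t), tanh(t), relu(t)) for t in (-2.0, 0.0, 2.0)]
```

Sigmoid maps its input into (0, 1), tanh maps it into (-1, 1), and ReLU passes positive inputs through unchanged while zeroing the rest.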

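Model Types
•  Hidden Layer

(Figure: a smaller Hidden Layer compared with a larger Hidden Layer, and a single Hidden Layer compared with multiple Hidden Layers.)

Model Complexity
•  The number of parameters in the model (the number of weights and biases)

(Figure: model complexity, from low to high.)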
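Counting parameters for a fully connected network takes only a few lines. This sketch assumes every layer is fully connected with one bias per neuron; the function name is hypothetical, and the layer sizes are taken from the networks used in this deck.

```python
def count_parameters(layer_sizes):
    """Count the weights and biases of a fully connected network."""
    total = 0
    for prev_size, size in zip(layer_sizes, layer_sizes[1:]):
        total += prev_size * size  # one weight per connection
        total += size              # one bias per neuron
    return total

small = count_parameters([2, 2, 2])       # the 2-2-2 network from the earlier slides
mnist = count_parameters([784, 200, 10])  # the MNIST model later in the deck
```

The 2-2-2 network has 12 parameters, matching the 12 weight updates listed on the Gradient Descent slide, while the MNIST model has 159,010.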

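Underfitting and Overfitting

Tensorflow Playground http://playground.tensorflow.org/

(Figure: a data distribution and three fits to it: well trained, underfit, and overfit.)

Underfitting
•  Causes:
   – Learning Rate too large or too small
   – Training time too short
   – Model complexity too low

(Figure: plot over training time t.)

Overfitting
•  Causes:
   – Too much noise
   – Too little training data
   – Training time too long
   – Model complexity too high

(Figure: plot over training time t.)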

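Validation Data

(Diagram: the dataset is split into training data, validation data, and test data. The training data trains Model 1, Model 2, and so on. The validation data is used for model selection, parameter selection, and controlling training time. The test data gives the final result.)

Cross Validation

(Diagram: in each round, a different part of the data serves as validation data and the rest serves as training data: round 1, round 2, ……, round N.)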
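The rotation of validation data across rounds can be sketched as an index split. This is a minimal sketch assuming contiguous, unshuffled folds; the function name and the example sizes are hypothetical.

```python
def k_fold_splits(num_examples, num_folds):
    """Yield (train_indices, validation_indices) for each round."""
    indices = list(range(num_examples))
    fold_size = num_examples // num_folds
    for fold in range(num_folds):
        start = fold * fold_size
        # The last fold absorbs any remainder.
        end = num_examples if fold == num_folds - 1 else start + fold_size
        validation = indices[start:end]
        train = indices[:start] + indices[end:]
        yield train, validation

splits = list(k_fold_splits(10, 5))
```

Every example is used for validation in exactly one round and for training in all the others.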

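Solutions
•  Underfitting
   – Adjust the Learning Rate
   – Increase training time
   – Increase model complexity
•  Overfitting
   – Add training data
   – Reduce noise
   – Reduce training time
   – Reduce model complexity

Adjusting the Learning Rate
•  Adjust the Learning Rate value

(Figure: three panels: Learning Rate moderate, Learning Rate too small, Learning Rate too large.)

Adjusting the Learning Rate
•  Adjust the Learning Rate dynamically:
   – AdagradOptimizer
   – RMSPropOptimizer
   – ……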

Adjusting Training Time
•  Early Stop

(Figure: Validation Loss and Training Loss plotted over training time t, with a marker labeled "stop training".)
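One common way to implement Early Stop is to watch the validation loss and stop once it has not improved for a few epochs. The sketch below is an assumption about how this could be done, not the slide's method: the `patience` parameter and the loss values are hypothetical.

```python
def early_stop_epoch(validation_losses, patience):
    """Return the epoch with the best validation loss, stopping the scan
    once the loss has failed to improve for `patience` epochs in a row."""
    best_epoch = 0
    best_loss = float("inf")
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break
    return best_epoch

# Hypothetical validation losses: they fall, then rise as overfitting sets in.
losses = [0.50, 0.40, 0.33, 0.30, 0.31, 0.34, 0.38, 0.45]
stop_at = early_stop_epoch(losses, patience=2)
```

In practice the model parameters saved at the best epoch would be the ones kept.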

Adjusting Model Complexity
•  Adjust the width or the number of Hidden Layers
•  Regularization
•  Dropout

Hidden Layer Width

(Figure: Validation Loss and Training Loss, from 0 to 0.5, plotted against Hidden Layer width, from 0 to 9, with the optimal width marked.)

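Regularization
•  Add the sum of the squared weights to the cost function
•  Keeps the absolute values of the weights from growing too large
•  Reduces model complexity

Cost Function:

$$J = \text{cross entropy} + \lambda \sum_{i,j} w_{i,j}^2$$

The larger λ is, the lower the model complexity.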
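The regularized cost can be computed by hand for a small weight matrix. This is a sketch with hypothetical values for the cross entropy, the weights, and λ.

```python
def regularized_cost(cross_entropy, weights, lam):
    """Return cross entropy plus lambda times the sum of squared weights."""
    penalty = sum(w * w for row in weights for w in row)
    return cross_entropy + lam * penalty

# Hypothetical values, chosen only for illustration.
weights = [[1.0, -2.0], [0.5, 0.0]]
cost_small_lambda = regularized_cost(0.3, weights, 0.01)
cost_large_lambda = regularized_cost(0.3, weights, 1.0)
```

With a larger λ the penalty term dominates the cost, so training is pushed harder toward small weights.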

Regularization

(Figure: Validation Loss and Training Loss, from 0 to 0.5, plotted against λ on a logarithmic axis from 0.01 to 10, with the optimal λ value marked.)

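Dropout
•  During training, randomly remove neurons from the Hidden Layer
•  Reduces model complexity
•  e.g. a 25% Dropout Rate

(Diagram: two copies of a network with units x(1)1, x(1)2, x(2)1, x(2)2 and outputs y(1), y(2): the full network and the network with some units removed.)

Dropout
•  At test time, use all of the neurons
   – Multiply every weight by (1 – dropout_rate)

(Diagram: units x(2)1 and x(2)2 feeding the output y(1).)

$$w \leftarrow w\,(1 - \text{dropout rate})$$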
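The reason for scaling the weights at test time can be seen in a small simulation. This sketch follows the scheme on these slides (drop units during training, scale weights by 1 - dropout_rate at test time). The function names, activations, weights, and trial count are hypothetical.

```python
import random

def dropout_train(hidden, dropout_rate, rng):
    """Zero each hidden activation with probability dropout_rate."""
    return [0.0 if rng.random() < dropout_rate else h for h in hidden]

def output_test(hidden, weights, dropout_rate):
    """Use every neuron, with each weight multiplied by (1 - dropout_rate)."""
    return sum(h * w * (1.0 - dropout_rate) for h, w in zip(hidden, weights))

rng = random.Random(0)               # fixed seed so the sketch is repeatable
hidden = [1.0, 1.0, 1.0, 1.0]        # hypothetical hidden activations
weights = [0.5, 0.5, 0.5, 0.5]       # hypothetical outgoing weights
dropout_rate = 0.25

# Average the training-time weighted sum over many random masks.
trials = 20000
average_train = sum(
    sum(h * w for h, w in zip(dropout_train(hidden, dropout_rate, rng), weights))
    for _ in range(trials)
) / trials
test_output = output_test(hidden, weights, dropout_rate)
```

The scaled test-time output matches the average of what the next layer saw during training, which is the point of the correction.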

Dropout

(Figure: Validation Error and Training Error, from 0 to 0.5, plotted against 1 - dropout rate, from 0 to 1, with the optimal dropout rate marked.)


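Hands-On: Model Selection and Parameter Tuning
•  Tensorflow Playground
   – http://playground.tensorflow.org/

Hands-On: Model Selection and Parameter Tuning
•  Underfitting

Hands-On: Model Selection and Parameter Tuning
•  Overfitting

Multilayer Perceptron Implementation

Multilayer Perceptron Implementation https://github.com/ckmarkoh/ntc_deeplearning_tensorflow/blob/master/sec2/multilayer_perceptron.ipynb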

MNIST
•  Digit recognition
•  Multiclass classification: 0~9

https://www.tensorflow.org/versions/r0.7/images/MNIST.png

Model
•  Multilayer perceptron

Input Layer Size: 784

Hidden Layer Size: 200

Output Layer Size: 10

Computational Graph

    import tensorflow as tf

    x_ = tf.placeholder(tf.float32, [None, 784], name="x_")
    y_ = tf.placeholder(tf.float32, [None, 10], name="y_")
    # input -> Hidden
    W1 = tf.Variable(tf.truncated_normal([784,200], stddev=0.1), name="W1")
    b1 = tf.Variable(tf.zeros([200]), name="b1")
    h1 = tf.nn.sigmoid(tf.matmul(x_, W1) + b1)
    # Hidden -> Output
    W2 = tf.Variable(tf.truncated_normal([200,10], stddev=0.1), name="W2")
    b2 = tf.Variable(tf.zeros([10]), name="b2")
    y = tf.nn.softmax(tf.matmul(h1, W2) + b2)
    cross_entropy = -tf.reduce_sum(y_ * tf.log(y))
    optimizer = tf.train.GradientDescentOptimizer(0.01)
    trainer = optimizer.minimize(cross_entropy)
    init = tf.initialize_all_variables()

Layer 1

    W1 = tf.Variable(tf.truncated_normal([784,200], stddev=0.1), name="W1")

(Figure: histogram of the initial W1 values, with counts from 0 to 7000 and values between -0.2 and 0.2, centered at 0.)

Layer 1

    W1 = tf.Variable(tf.truncated_normal([784,200], stddev=0.1), name="W1")
    b1 = tf.Variable(tf.zeros([200]), name="b1")
    h1 = tf.nn.sigmoid(tf.matmul(x_, W1) + b1)

(Diagram: x (n × 784) × W1 (784 × 200) + b1 (200) = h1 (n × 200).)

Layer 2

    W2 = tf.Variable(tf.truncated_normal([200,10], stddev=0.1), name="W2")
    b2 = tf.Variable(tf.zeros([10]), name="b2")
    y = tf.nn.softmax(tf.matmul(h1, W2) + b2)

(Diagram: h1 (n × 200) × W2 (200 × 10) + b2 (10) = y (n × 10).)

Regularization

    lambda_ = tf.placeholder(tf.float32, name="lambda")
    regularizer = tf.reduce_sum(tf.square(W1)) + tf.reduce_sum(tf.square(W2))
    cost = cross_entropy + lambda_*regularizer

Cost Function:

$$J = \text{cross entropy} + \lambda \sum_{i,j} w_{i,j}^2$$

Regularization https://github.com/ckmarkoh/ntc_deeplearning_tensorflow/blob/master/sec2/regularization.ipynb

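dropout

    keep_prob = tf.placeholder(tf.float32, name="keep_prob")
    h1_drop = tf.nn.dropout(h1, keep_prob)
    y = tf.nn.softmax(tf.matmul(h1_drop, W2) + b2)

(Diagram: h1 is combined with a Dropout Mask of 1s and 0s, e.g. 1, 0, 1, 0, and the values that are kept are scaled as follows.)

$$h_{1,drop} = \frac{h_1}{\text{keep prob}}$$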

dropout https://github.com/ckmarkoh/ntc_deeplearning_tensorflow/blob/master/sec2/dropout.ipynb

Saving and Loading the Model
•  Save the model parameters

    saver = tf.train.Saver(max_to_keep=10)
    saver.save(sess, "model.ckpt")

•  Load the model parameters

    saver = tf.train.Saver()
    saver.restore(sess, "model.ckpt")

About the Instructor

Mark Chang

•  Email: ckmarkoh at gmail dot com
•  Blog: http://cpmarkchang.logdown.com
•  Github: https://github.com/ckmarkoh
•  Facebook: https://www.facebook.com/ckmarkoh.chang
•  Slideshare: http://www.slideshare.net/ckmarkohchang
•  Linkedin: https://www.linkedin.com/pub/mark-chang/85/25b/847
