[tf2017] day4 jwkang_pub


Page 1: [Tf2017] day4 jwkang_pub

누구나 TensorFlow! (TensorFlow for Everyone)
Module 4: Machine Learning with Neural Networks

Jaewook Kang, Ph.D. ([email protected])
Soundlly Inc.
Sep. 2017

© 2017 Jaewook Kang. All Rights Reserved.

Page 2: [Tf2017] day4 jwkang_pub

About the Speaker: Jaewook Kang (강재욱)

GIST EEC Ph.D. (2015)
Signal processing researcher and relentless tinkerer

Things I enjoy:
- Statistical signal processing / wireless communication signal processing
- Implementing embedded audio DSP libraries in C/C++
- Machine-learning-based audio signal processing algorithms
- Learning things in order to teach them to others

Selected publications:
- Jaewook Kang, et al., "Bayesian Hypothesis Test using Nonparametric Belief Propagation for Noisy Sparse Recovery," IEEE Trans. on Signal Processing, Feb. 2015
- Jaewook Kang, et al., "Fast Signal Separation of 2D Sparse Mixture via Approximate Message-Passing," IEEE Signal Processing Letters, Nov. 2015

Page 3: [Tf2017] day4 jwkang_pub

Schedule (session / target hours / details)

Module 3: Separating data with a straight line - Logistic classification
- Introduction to Linear Classification
- Naïve Bayes (NB)
- Linear Discriminant Analysis (LDA)
- Logistic Regression (LR)
- NB vs LDA vs LR
- LAB5: Linear Classification in TensorFlow

Module 4: Neural networks, the ancestor of deep learning (target: 4 hours)
- Expressing a neuron mathematically
- Feed-Forward Neural Networks
- Limits of the linear neuron and activation functions
- Gradient Descent revisited
- Backpropagation algorithm
- LAB6: Multi-layer neural net with Backpropagation in TensorFlow

Page 4: [Tf2017] day4 jwkang_pub

GitHub links

GitHub link (all public):
- https://github.com/jwkanggist/EveryBodyTensorFlow

Another GitHub link (not mine):
- https://github.com/aymericdamien/TensorFlow-Examples

Page 5: [Tf2017] day4 jwkang_pub

1. Neural Networks, the Ancestor of Deep Learning

The story of the researchers who kept digging the same well until it became deep learning

- Expressing a neuron mathematically
- Feed-Forward Neural Networks
- Limits of the linear neuron and activation functions
- Gradient Descent revisited
- Backpropagation algorithm
- LAB6: 2-layer neural net in TensorFlow

Page 6: [Tf2017] day4 jwkang_pub

Reference:

Fundamentals of Deep Learning
Nikhil Buduma, 1st Edition, O'Reilly, 2017

Page 7: [Tf2017] day4 jwkang_pub

Excellent related Korean-language blogs

Jinseop's blog
- https://mathemedicine.github.io/deep_learning.html

Solaris's AI Research Lab
- http://solarisailab.com/archives/1206

Terry's blog
- http://slownews.kr/41461

Page 8: [Tf2017] day4 jwkang_pub

The Neuron

The most basic unit of the brain
- The brain is formed from the interconnection of more than 10,000 neurons

Image source: http://ib.bioninja.com.au/standard-level/topic-6-human-physiology/65-neurons-and-synapses/neurons.html

Page 9: [Tf2017] day4 jwkang_pub

The Neuron

The most basic unit of the brain

Signal flow: signal input -> amplification -> combination -> transformation -> signal output

Image source: http://ib.bioninja.com.au/standard-level/topic-6-human-physiology/65-neurons-and-synapses/neurons.html

Page 10: [Tf2017] day4 jwkang_pub

The Neuron

Artificial Neuron (1958)

Signal flow: signal input -> amplification -> combination -> transformation -> signal output, with a bias b

Image source: https://hackernoon.com/overview-of-artificial-neural-networks-and-its-applications-2525c1addff7

Page 11: [Tf2017] day4 jwkang_pub

The Neuron

Artificial Neuron (1958)

Signal flow: signal input -> amplification -> combination -> transformation -> signal output

(Figure: a neuron with inputs x1, x2, x3, weights w1, w2, w3, bias b, and activation f(·) producing the output y.)

Image source: https://hackernoon.com/overview-of-artificial-neural-networks-and-its-applications-2525c1addff7

Page 12: [Tf2017] day4 jwkang_pub

The Neuron

Artificial Neuron (1958)

$y = f(Z)$, where $Z = XW + b$

Page 13: [Tf2017] day4 jwkang_pub

The Neuron

Artificial Neuron (1958)

$y = f(Z)$, where $Z = XW + b$
- y: activation (the neuron's output)
- f(·): activation function
- Z: logit
- X: input
- W: neuron weights
- b: neuron bias
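As a quick illustration (not from the slides), a single artificial neuron is just a weighted sum followed by an activation; the example inputs, weights, and the choice of a sigmoid f(·) below are assumptions:

import numpy as np

def sigmoid(z):
    # activation function f(.)
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b, f=sigmoid):
    # logit Z = x . w + b, activation y = f(Z)
    z = np.dot(x, w) + b
    return f(z)

x = np.array([1.0, 2.0, 3.0])   # inputs x1, x2, x3
w = np.array([0.1, 0.2, 0.3])   # weights w1, w2, w3
b = 0.5                         # bias b
print(neuron(x, w, b))          # output of one artificial neuron

Swapping the function passed as f changes the neuron's behavior without touching the weighted sum.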

Page 14: [Tf2017] day4 jwkang_pub

The Neuron

What it means to train a neuron

(Figure: a two-input neuron with inputs x1, x2, weights w1, w2, bias b, and output y.)

Page 15: [Tf2017] day4 jwkang_pub

The Neuron

What it means to train a neuron
- The push-and-pull (밀당) example: what mix of pushing and pulling leads to success in dating?
- Y: probability of success
- X: the effort each action takes
- W: the mix of actions that leads to success

(Figure: the same two-input neuron; the inputs are labeled push (밀) and pull (당).)

Page 16: [Tf2017] day4 jwkang_pub

The Neuron

What it means to train a neuron
- The push-and-pull (밀당) example: what mix of pushing and pulling leads to success in dating?
- Assume a linear activation function: $y = f(z) = z$
- Data: t = 1.0, x1 = 2.0, x2 = 3.0
- Cost: $e = \frac{1}{2}(t - y)^2$, with b = 0
- Find w1 and w2

(Figure: the same two-input neuron; the inputs are labeled push (밀) and pull (당).)

Page 17: [Tf2017] day4 jwkang_pub

The Neuron

What it means to train a neuron
- The push-and-pull (밀당) example: what mix of pushing and pulling leads to success in dating?
- Assume a linear activation function: $y = f(z) = z$
- Data: t = 1.0, x1 = 2.0, x2 = 3.0
- Cost: $e = \frac{1}{2}(t - y)^2$, with b = 0
- Find w1 and w2

What's your answer?

Model: $y = w_1 x_1 + w_2 x_2$

Cost: $e = \frac{1}{2}(t - y)^2$

$\frac{\partial e}{\partial w_1} = -x_1(t - y), \quad \frac{\partial e}{\partial w_2} = -x_2(t - y)$

Setting both derivatives to zero:

$-x_1(t - w_1 x_1 - w_2 x_2) = 0, \quad -x_2(t - w_1 x_1 - w_2 x_2) = 0$

$(w_1, w_2) = ?$

(Both conditions reduce to the single equation $w_1 x_1 + w_2 x_2 = t$, i.e. $2w_1 + 3w_2 = 1$, so one data point does not pin down a unique pair $(w_1, w_2)$.)
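A minimal NumPy sketch (my own illustration, not part of the slides) of answering the question with gradient descent; since a single data point leaves the problem underdetermined, the run converges to just one of the many pairs satisfying 2*w1 + 3*w2 = 1:

import numpy as np

x = np.array([2.0, 3.0])      # x1, x2
t = 1.0                       # target
w = np.zeros(2)               # w1, w2 (b = 0, linear activation y = z)
alpha = 0.01                  # learning rate

for _ in range(1000):
    y = np.dot(w, x)          # y = w1*x1 + w2*x2
    grad = -x * (t - y)       # de/dw_k = -x_k (t - y)
    w = w - alpha * grad      # gradient descent step

print(w, np.dot(w, x))        # one of the many solutions; w1*x1 + w2*x2 -> 1.0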

Page 18: [Tf2017] day4 jwkang_pub

The Neuron

What it means to train a neuron
- Given (X, Y) data pairs, find the values of W and b
- That is, learn how much weight to give each input when combining them

(Figure: the same two-input neuron, with the weights w1, w2 left as unknowns.)

Page 19: [Tf2017] day4 jwkang_pub

Activation Functions

How should we model the activation of the stimulus (the logit Z)?

(Figure: the two-input neuron with the activation function left as an unknown block.)

Page 20: [Tf2017] day4 jwkang_pub

Activation Functions

Sigmoid function
- Maps the logit Z into [0, 1]
- Used when the logit Z should be mapped to a probability, e.g. Logistic Regression

$f(z) = \frac{1}{1 + \exp(-z)}$

(Plot: f(z) versus the logit z.)

Page 21: [Tf2017] day4 jwkang_pub

Activation Functions

Tanh
- Maps the logit Z into [-1, +1]
- The activation is centered at 0, so when stacking multiple layers the hidden-layer activations do not develop an offset (bias)

$f(z) = \tanh(z)$

(Plot: f(z) versus the logit z.)

Page 22: [Tf2017] day4 jwkang_pub

Activation Functions

ReLU (Rectified Linear Unit)
- The sigmoid and tanh functions saturate: as the input approaches either extreme, the gradient approaches 0, which leads to the vanishing gradient problem (TBU)

$f(z) = \max(0, z)$

(Plot: f(z) versus the logit z.)
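For reference, the three activation functions above can be written in a few lines of NumPy (an illustrative sketch, not from the slides); in TensorFlow the corresponding ops are tf.sigmoid, tf.tanh, and tf.nn.relu:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # maps z to (0, 1)

def tanh(z):
    return np.tanh(z)                 # maps z to (-1, +1), zero-centered

def relu(z):
    return np.maximum(0.0, z)         # f(z) = max(0, z)

z = np.linspace(-5.0, 5.0, 11)
print(sigmoid(z), tanh(z), relu(z))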

Page 23: [Tf2017] day4 jwkang_pub

Feed-Forward Neural Networks

Let's wire those neurons together and stack them in layers
- The human brain also has a hierarchical structure

(Figure: a layered network with input X = [x1, x2, x3, x4], output Y = [y1, y2, y3, y4], and weight matrices W1, W2.)

Rules:
- No connections within the same layer
- No backward connections

Mathematical model: given on the next slide.

Page 24: [Tf2017] day4 jwkang_pub

Feed-Forward Neural Networks

Let's wire those neurons together and stack them in layers
- The human brain also has a hierarchical structure

(Figure: the same layered network with input X = [x1, x2, x3, x4], output Y = [y1, y2, y3, y4], and weight matrices W1, W2.)

Rules:
- No connections within the same layer
- No backward connections

Mathematical model:

$Y = f(W_2 f(W_1 X + b_1) + b_2)$
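A small NumPy sketch (my illustration; the 4-4-4 layer sizes and random weights are assumptions chosen to match the figure) showing that the whole network is just the composition Y = f(W2 f(W1 X + b1) + b2):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feed_forward(X, W1, b1, W2, b2, f=sigmoid):
    # Y = f(W2 f(W1 X + b1) + b2)
    hidden = f(np.dot(W1, X) + b1)
    return f(np.dot(W2, hidden) + b2)

# a random 4-4-4 network matching the slide's X = [x1..x4], Y = [y1..y4]
rng = np.random.RandomState(0)
W1, b1 = rng.randn(4, 4), np.zeros(4)
W2, b2 = rng.randn(4, 4), np.zeros(4)
X = np.array([1.0, 0.5, -0.5, -1.0])
print(feed_forward(X, W1, b1, W2, b2))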

Page 25: [Tf2017] day4 jwkang_pub

Feed-Forward Neural Networks

Let's wire those neurons together and stack them in layers

Input Layer:
- The layer that receives the input data X
- The place where tf.placeholder() is attached

(Figure: the same layered network, highlighting the input layer feeding W1.)

Page 26: [Tf2017] day4 jwkang_pub

Feed-Forward Neural Networks

Let's wire those neurons together and stack them in layers

Output Layer:
- The layer that emits the output data Y
- The place where tf.placeholder() is attached (for the target labels during training)

(Figure: the same layered network, highlighting the output layer after W2.)

Page 27: [Tf2017] day4 jwkang_pub

Feed-Forward Neural Networks

Let's wire those neurons together and stack them in layers

Hidden Layer:
- Every layer between the input layer and the output layer
- Learns the features needed for training from X on its own
- Produces feature maps as intermediate representations
- The more hidden layers, the finer the features that can be extracted

(Figure: the same layered network with input X = [x1, x2, x3, x4], output Y = [y1, y2, y3, y4], highlighting the hidden layer between W1 and W2.)

Page 28: [Tf2017] day4 jwkang_pub

Feed-Forward Neural Networks

Google's good example site
- http://playground.tensorflow.org/
- Playing with it will make things much clearer!
  - Logistic regression (1-layer neural net classification)
  - Neural Net

Page 29: [Tf2017] day4 jwkang_pub

How to Train a Neural Net?

How should we train a neural net?
- Existing approach 1: Maximum likelihood estimation + analytical solution
  - In many cases, no analytical solution exists
  - The non-linearity of the activation function means there is no closed-form solution
- Existing approach 2: Maximum likelihood estimation + numerical solver
  - Example: Logistic-regression-based classification
    - Cost: cross-entropy function (non-linear)
    - Solvers:
      - Gradient descent solvers: keep moving down the steepest slope of the cost (first-order method)
      - Newton-Raphson solvers: find the point where the gradient of the cost is 0 (second-order method, good for convex problems)

Page 30: [Tf2017] day4 jwkang_pub

Gradient Descent Revisited

Let's take another look at Gradient Descent

$W^{n+1} = W^n - \alpha \nabla J(W^n)$, where $J(W)$ is the error cost

(Plot: the error cost surface being descended step by step.)

Page 31: [Tf2017] day4 jwkang_pub

Gradient Descent Revisited

Let's take another look at Gradient Descent

Just remember these two things!
- Finding the direction of the step: the delta rule
- Finding the size of the step: the learning rate

$W^{n+1} = W^n - \alpha \nabla J(W^n)$, where $J(W)$ is the error cost, $\alpha$ is the step size (learning rate), and $\nabla J(W^n)$ gives the step direction (gradient)

Page 32: [Tf2017] day4 jwkang_pub

Gradient Descent Revisited

Let's take another look at Gradient Descent
- Finding the direction of the step: the delta rule
- How far should we move along each component of W?
  - Take the partial derivative of the error cost with respect to each weight
  - For a sum-of-squares cost with a linear activation:

$-\nabla J(W) = [\Delta w_1, \Delta w_2, \ldots, \Delta w_M]$

$\Delta w_k = -\frac{\partial J(W)}{\partial w_k} = -\frac{\partial}{\partial w_k}\left( \sum_i \frac{1}{2}\left(t^{(i)} - y^{(i)}\right)^2 \right) = \sum_i x_k^{(i)}\left(t^{(i)} - y^{(i)}\right)$
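A short NumPy sketch of the delta rule above for a linear neuron with a sum-of-squares cost (my own illustration; the training data and learning rate are assumptions):

import numpy as np

# batch delta rule for a linear neuron: sum-of-squares cost, y = x . w
X = np.array([[2.0, 3.0],
              [1.0, 1.0],
              [4.0, 0.5]])          # rows: training inputs x^(i)
t = np.array([1.0, 0.7, 1.3])       # targets t^(i)
w = np.zeros(2)
alpha = 0.01

for _ in range(2000):
    y = X.dot(w)                    # predictions y^(i)
    delta_w = X.T.dot(t - y)        # delta_w_k = sum_i x_k^(i) (t^(i) - y^(i))
    w = w + alpha * delta_w         # move along -grad J(W)

print(w)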

Page 33: [Tf2017] day4 jwkang_pub

Gradient Descent Revisited

How should we train a neural net?
- Finding the direction of the step: the delta rule
- How far should we move along each component of W?
  - Take the partial derivative of the error cost with respect to each weight
  - For a cross-entropy cost with a sigmoid activation:

$-\nabla J(W) = [\Delta w_1, \Delta w_2, \ldots, \Delta w_M]$

(For this combination the algebra works out to the same update form, $\Delta w_k = \sum_i x_k^{(i)}(t^{(i)} - y^{(i)})$.)

Page 34: [Tf2017] day4 jwkang_pub

Gradient Descent Revisited

How should we train a neural net?
- Finding the size of the step: the learning rate
  - Too large: the iteration diverges
  - Too small: it takes forever, and the amount of computation grows

Page 35: [Tf2017] day4 jwkang_pub

Training a Neural Net: the limits of classical ML estimation + Gradient Descent
- As the number of hidden layers grows, the dimension of the parameter W that must be learned grows enormously
- There are too many unknown parameters (W) to learn with the "ML estimation + numerical solver" recipe
  - The complexity blows up

(Figure: a shallow neural network (Input, Hidden, Output; W1, W2) next to a deep neural network (Input, three Hidden layers, Output; W1, W2, W3, W4).)

Page 36: [Tf2017] day4 jwkang_pub

Error Back Propagation

Basic idea:
- Propagate the error derivative of the previous layer to compute the error derivative of the current layer

Page 37: [Tf2017] day4 jwkang_pub

Error Back Propagation

Basic algorithm:
- STEP 1) Initialize all weights
- STEP 2) Forward Propagation: compute the predicted activations y from the input X
- STEP 3) Error Back Propagation: compute the weight updates (Δw) from the error derivatives
- STEP 4) Update all weights and go to STEP 2

Image source: Bishop's book, Chapter 5
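The four steps fit in a dozen lines of Python. The sketch below is my own simplification (a single sigmoid neuron with a squared-error cost on OR-gate data, not the slides' two-layer example), shown purely to make the STEP 1 to STEP 4 loop structure concrete:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
t = np.array([0.0, 1.0, 1.0, 1.0])             # OR-gate targets
rng = np.random.RandomState(1)

# STEP 1: initialize all weights (and bias) randomly
w, b, alpha = rng.randn(2), 0.0, 0.5

for step in range(5000):
    # STEP 2: forward propagation, y = f(Xw + b)
    y = sigmoid(X.dot(w) + b)
    # STEP 3: error derivatives -> weight updates for J = 0.5*sum((t-y)^2)
    err = t - y
    grad_w = -X.T.dot(err * y * (1 - y))
    grad_b = -np.sum(err * y * (1 - y))
    # STEP 4: update all weights and repeat from STEP 2
    w, b = w - alpha * grad_w, b - alpha * grad_b

print(np.round(sigmoid(X.dot(w) + b), 2))      # approaches the OR targets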

Page 38: [Tf2017] day4 jwkang_pub

Error Back Propagation

Toy example: a two-layer small neural network

STEP 1) Initialize all weights
- Cross-entropy cost
- Sigmoid activation

(Figure: a 2-2-2 network. Input layer: x1, x2. Hidden layer: logits z11, z12 with activations y11, y12. Output layer: logits z21, z22 with activations y21, y22 and targets t1, t2. Hidden weights W1 = {w11, w12, w13, w14}, output weights W2 = {w21, w22, w23, w24}, with a bias b feeding each layer.)

$f(z) = \frac{1}{1 + \exp(-z)}$

For the sigmoid activation, $y = f(z)$ and $\frac{\partial y}{\partial z} = y(1 - y)$.

Page 39: [Tf2017] day4 jwkang_pub

Error Back Propagation

Toy example: a two-layer small neural network

STEP 1) Initialize all weights
- In a random manner:

Initial weights:

$W_1 = \begin{bmatrix} w_{11} & w_{12} \\ w_{13} & w_{14} \end{bmatrix} = \begin{bmatrix} 0.15 & 0.20 \\ 0.25 & 0.30 \end{bmatrix}, \quad W_2 = \begin{bmatrix} w_{21} & w_{22} \\ w_{23} & w_{24} \end{bmatrix} = \begin{bmatrix} 0.40 & 0.45 \\ 0.50 & 0.55 \end{bmatrix}$

Training data:

$X = \begin{bmatrix} 0.05 \\ 0.10 \end{bmatrix}, \quad T = \begin{bmatrix} 0.01 \\ 0.99 \end{bmatrix}$

Biases:

$b_1 = \begin{bmatrix} 0.35 \\ 0.35 \end{bmatrix}, \quad b_2 = \begin{bmatrix} 0.60 \\ 0.60 \end{bmatrix}$

(Figure: the same 2-2-2 network as on the previous slide.)

Page 40: [Tf2017] day4 jwkang_pub

Error Back Propagation

Toy example: a two-layer small neural network

STEP 2) Forward Propagation

$Y_1 = f(Z_1) = f(W_1 X + b_1) = f\left( \begin{bmatrix} 0.15 & 0.20 \\ 0.25 & 0.30 \end{bmatrix} \begin{bmatrix} 0.05 \\ 0.10 \end{bmatrix} + \begin{bmatrix} 0.35 \\ 0.35 \end{bmatrix} \right) = \begin{bmatrix} 0.5933 \\ 0.5969 \end{bmatrix}$

$Y_2 = f(Z_2) = f(W_2 Y_1 + b_2) = f\left( \begin{bmatrix} 0.40 & 0.45 \\ 0.50 & 0.55 \end{bmatrix} \begin{bmatrix} 0.5933 \\ 0.5969 \end{bmatrix} + \begin{bmatrix} 0.60 \\ 0.60 \end{bmatrix} \right) = \begin{bmatrix} 0.7514 \\ 0.7729 \end{bmatrix}$

(Figure: the same 2-2-2 network as before.)
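The forward pass above can be checked numerically with a short NumPy sketch (my own illustration using the slide's W1, W2, b1, b2, and X):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = np.array([[0.15, 0.20], [0.25, 0.30]])
W2 = np.array([[0.40, 0.45], [0.50, 0.55]])
b1 = np.array([0.35, 0.35])
b2 = np.array([0.60, 0.60])
X  = np.array([0.05, 0.10])

Y1 = sigmoid(W1.dot(X) + b1)    # hidden activations
Y2 = sigmoid(W2.dot(Y1) + b2)   # output activations
print(Y1)   # approximately [0.5933, 0.5969]
print(Y2)   # approximately [0.7514, 0.7729]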

Page 41: [Tf2017] day4 jwkang_pub

Error Back Propagation

Toy example: a two-layer small neural network

STEP 3) Error Back Propagation for W2
- 3-1: calculate the error derivative with respect to y21 and y22

$\frac{\partial J(W_2)}{\partial y_{21}} = \frac{y_{21} - t_1}{y_{21}(1 - y_{21})}, \quad \frac{\partial J(W_2)}{\partial y_{22}} = \frac{y_{22} - t_2}{y_{22}(1 - y_{22})}$

(Figure: the same 2-2-2 network as before.)

Page 42: [Tf2017] day4 jwkang_pub

Error Back Propagation

Toy example: a two-layer small neural network

STEP 3) Error Back Propagation for W2
- 3-2: calculate the error derivative with respect to W2

$\Delta w_{21} = -\alpha \frac{\partial J(W_2)}{\partial y_{21}} \frac{\partial y_{21}}{\partial z_{21}} \frac{\partial z_{21}}{\partial w_{21}} = \alpha (t_1 - y_{21}) y_{11}$

$\Delta w_{22} = -\alpha \frac{\partial J(W_2)}{\partial y_{21}} \frac{\partial y_{21}}{\partial z_{21}} \frac{\partial z_{21}}{\partial w_{22}} = \alpha (t_1 - y_{21}) y_{12}$

$\Delta w_{23} = -\alpha \frac{\partial J(W_2)}{\partial y_{22}} \frac{\partial y_{22}}{\partial z_{22}} \frac{\partial z_{22}}{\partial w_{23}} = \alpha (t_2 - y_{22}) y_{11}$

$\Delta w_{24} = -\alpha \frac{\partial J(W_2)}{\partial y_{22}} \frac{\partial y_{22}}{\partial z_{22}} \frac{\partial z_{22}}{\partial w_{24}} = \alpha (t_2 - y_{22}) y_{12}$

(Figure: the same 2-2-2 network as before.)
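Plugging the forward-pass numbers into these four formulas gives concrete W2 updates; the NumPy sketch below is my own illustration, and alpha = 0.5 is an assumed learning rate:

import numpy as np

alpha = 0.5                        # assumed learning rate
y11, y12 = 0.5933, 0.5969          # hidden activations from the forward pass
y21, y22 = 0.7514, 0.7729          # output activations
t1, t2   = 0.01, 0.99              # targets

dW2 = alpha * np.array([[(t1 - y21) * y11, (t1 - y21) * y12],
                        [(t2 - y22) * y11, (t2 - y22) * y12]])
print(dW2)   # [[dw21, dw22], [dw23, dw24]]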

Page 43: [Tf2017] day4 jwkang_pub

Error Back Propagation

Toy example: a two-layer small neural network

STEP 3) Error Back Propagation for W1 (Important!!)
- 3-3: calculate the error derivative with respect to y11 and y12

$\frac{\partial J(W_1)}{\partial y_{11}} = w_{21} \frac{\partial J(W_2)}{\partial y_{21}} \frac{\partial y_{21}}{\partial z_{21}} + w_{23} \frac{\partial J(W_2)}{\partial y_{22}} \frac{\partial y_{22}}{\partial z_{22}}$

$\frac{\partial J(W_1)}{\partial y_{12}} = w_{22} \frac{\partial J(W_2)}{\partial y_{21}} \frac{\partial y_{21}}{\partial z_{21}} + w_{24} \frac{\partial J(W_2)}{\partial y_{22}} \frac{\partial y_{22}}{\partial z_{22}}$

(Figure: the same 2-2-2 network as before.)

Page 44: [Tf2017] day4 jwkang_pub

Error Back Propagation

Toy example: a two-layer small neural network

STEP 3) Error Back Propagation for W1 (Important!!)
- 3-3: calculate the error derivative with respect to y11 and y12 (substituting the sigmoid derivative y(1 - y))

$\frac{\partial J(W_1)}{\partial y_{11}} = w_{21} \frac{\partial J(W_2)}{\partial y_{21}} y_{21}(1 - y_{21}) + w_{23} \frac{\partial J(W_2)}{\partial y_{22}} y_{22}(1 - y_{22})$

$\frac{\partial J(W_1)}{\partial y_{12}} = w_{22} \frac{\partial J(W_2)}{\partial y_{21}} y_{21}(1 - y_{21}) + w_{24} \frac{\partial J(W_2)}{\partial y_{22}} y_{22}(1 - y_{22})$

(Figure: the same 2-2-2 network as before.)

Page 45: [Tf2017] day4 jwkang_pub

Error Back Propagation

Toy example: a two-layer small neural network

STEP 3) Error Back Propagation for W1
- 3-4: calculate the error derivative with respect to W1

$\Delta w_{11} = -\alpha \frac{\partial J(W_1)}{\partial y_{11}} \frac{\partial y_{11}}{\partial z_{11}} \frac{\partial z_{11}}{\partial w_{11}} = -\alpha \frac{\partial J(W_1)}{\partial y_{11}} y_{11}(1 - y_{11}) x_1$

$\Delta w_{12} = -\alpha \frac{\partial J(W_1)}{\partial y_{11}} \frac{\partial y_{11}}{\partial z_{11}} \frac{\partial z_{11}}{\partial w_{12}} = -\alpha \frac{\partial J(W_1)}{\partial y_{11}} y_{11}(1 - y_{11}) x_2$

$\Delta w_{13} = -\alpha \frac{\partial J(W_1)}{\partial y_{12}} \frac{\partial y_{12}}{\partial z_{12}} \frac{\partial z_{12}}{\partial w_{13}} = -\alpha \frac{\partial J(W_1)}{\partial y_{12}} y_{12}(1 - y_{12}) x_1$

$\Delta w_{14} = -\alpha \frac{\partial J(W_1)}{\partial y_{12}} \frac{\partial y_{12}}{\partial z_{12}} \frac{\partial z_{12}}{\partial w_{14}} = -\alpha \frac{\partial J(W_1)}{\partial y_{12}} y_{12}(1 - y_{12}) x_2$

(Figure: the same 2-2-2 network as before.)

Page 46: [Tf2017] day4 jwkang_pub

Error Back Propagation

Toy example: a two-layer small neural network

STEP 4) Update all the weights and go to STEP 2
- Iterate forward propagation and error back propagation

(Figure: the same 2-2-2 network as before.)
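Putting STEP 2 through STEP 4 together, the sketch below is my own vectorized NumPy version of the per-weight formulas from the previous slides (alpha = 0.5 is an assumed learning rate). It uses the standard simplification that, for a cross-entropy cost with sigmoid outputs, the output-layer error derivative with respect to the logit is simply Y2 - T:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# toy network from the slides; alpha is an assumed learning rate
W1 = np.array([[0.15, 0.20], [0.25, 0.30]]); b1 = np.array([0.35, 0.35])
W2 = np.array([[0.40, 0.45], [0.50, 0.55]]); b2 = np.array([0.60, 0.60])
X  = np.array([0.05, 0.10]); T = np.array([0.01, 0.99])
alpha = 0.5

for step in range(1000):
    # STEP 2: forward propagation
    Y1 = sigmoid(W1.dot(X) + b1)
    Y2 = sigmoid(W2.dot(Y1) + b2)
    # STEP 3: error back propagation (cross-entropy + sigmoid => dJ/dZ2 = Y2 - T)
    dZ2 = Y2 - T                          # error at the output logits
    dW2 = np.outer(dZ2, Y1)
    dZ1 = W2.T.dot(dZ2) * Y1 * (1 - Y1)   # error propagated to the hidden layer
    dW1 = np.outer(dZ1, X)
    # STEP 4: update all the weights, then repeat from STEP 2
    W2 -= alpha * dW2; b2 -= alpha * dZ2
    W1 -= alpha * dW1; b1 -= alpha * dZ1

print(np.round(Y2, 3))   # approaches the targets T = [0.01, 0.99]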

Page 47: [Tf2017] day4 jwkang_pub

LAB6: Multi-layer neural net in TensorFlow

Cluster-in-cluster data
- https://github.com/jwkanggist/EveryBodyTensorFlow/blob/master/lab6_runTFMultiANN_clusterinclusterdata.py

Page 48: [Tf2017] day4 jwkang_pub

LAB6: Multi-layer neural net in TensorFlow

Two-spiral data
- https://github.com/jwkanggist/EveryBodyTensorFlow/blob/master/lab6_runTFMultiANN_spiraldata.py
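For orientation only, here is a minimal TensorFlow 1.x sketch in the spirit of LAB6; it is not the actual lab code from the repository, and the layer sizes, learning rate, and the synthetic cluster-in-cluster style data are my assumptions:

import numpy as np
import tensorflow as tf

# synthetic 2-class data: points inside vs. outside the unit circle (assumption)
X_data = np.random.randn(500, 2).astype(np.float32)
labels = (np.linalg.norm(X_data, axis=1) > 1.0).astype(np.int64)
Y_data = np.eye(2)[labels].astype(np.float32)          # one-hot targets

X = tf.placeholder(tf.float32, [None, 2])               # input layer
Y = tf.placeholder(tf.float32, [None, 2])               # target labels
hidden = tf.layers.dense(X, 10, activation=tf.nn.relu)  # hidden layer
logits = tf.layers.dense(hidden, 2)                     # output layer (logits)

cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=Y, logits=logits))
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(cost)
accuracy = tf.reduce_mean(tf.cast(
    tf.equal(tf.argmax(logits, 1), tf.argmax(Y, 1)), tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(2000):
        sess.run(train_op, feed_dict={X: X_data, Y: Y_data})
    print(sess.run(accuracy, feed_dict={X: X_data, Y: Y_data}))  # training accuracy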