[tf2017] day3 jwkang_pub

TRANSCRIPT

Page 1: [Tf2017] day3 jwkang_pub

TensorFlow for Everyone!

TensorFlow for Everyone – Module 3: Machine Learning with Linear Classification

Jaewook Kang, Ph.D. ([email protected])

Soundlly Inc.

Sep. 2017

© 2017 Jaewook Kang. All Rights Reserved.

Page 2: [Tf2017] day3 jwkang_pub

About the speaker: Jaewook Kang (강재욱)

GIST EEC Ph.D. (2015)

Signal processing researcher, relentless tinkerer

Things I like:

- Statistical signal processing / wireless communication signal processing
- Implementing embedded audio DSP libraries in C/C++
- Machine-learning-based audio signal processing algorithms
- Learning things and then teaching them to others

Selected publications:

Jaewook Kang, et al., "Bayesian Hypothesis Test using Nonparametric Belief Propagation for Noisy Sparse Recovery," IEEE Transactions on Signal Processing, Feb. 2015

Jaewook Kang, et al., "Fast Signal Separation of 2D Sparse Mixture via Approximate Message-Passing," IEEE Signal Processing Letters, Nov. 2015

Page 3: [Tf2017] day3 jwkang_pub

Course Schedule

Module 3: Separating data with a straight line – Logistic classification
- Introduction to Linear Classification
- Naïve Bayes (NB)
- Linear Discriminant Analysis (LDA)
- Logistic Regression (LR)
- NB vs LDA vs LR
- LAB5: Linear Classification in TensorFlow

Module 4: Neural networks, the ancestors of deep learning
- Expressing a neuron mathematically
- Feed-Forward Neural Networks
- The limits of linear neurons, and activation functions
- Gradient descent revisited
- Backpropagation algorithm
- LAB6: Two-layer neural net with Backpropagation in TensorFlow

Page 4: [Tf2017] day3 jwkang_pub

GitHub link (all public): https://github.com/jwkanggist/EveryBodyTensorFlow

Page 5: [Tf2017] day3 jwkang_pub

1. Separating data with a straight line – Linear classification: deep learning is not the answer to everything!

- What is classification?
- Linearly separable data
- Naïve Bayes (NB)
- Linear Discriminant Analysis (LDA)
- Logistic Regression (LR)
- NB vs LDA vs LR

Page 6: [Tf2017] day3 jwkang_pub

What is classification?

Definition – teaching a computer how to tell classes of data apart, using given data (experience)!

– Supervised learning!

Image source: http://amanullahtariq.com/male-and-female-classification/

Page 7: [Tf2017] day3 jwkang_pub

Classification vs. Regression

Classification: finding a model that separates data with labels

Regression: finding a model that captures and predicts the trend of the data

Clustering: grouping similar data points by shared attributes (labeling)

Image source: http://www.csie.ntnu.edu.tw/~u91029/Classification.html

Page 8: [Tf2017] day3 jwkang_pub

Feature Space

What is a feature space? – The space in which the system's input data X is defined!

– Ex: suppose a spam mail is described by the following features:

• Feature 1: words used in the mail

• Feature 2: length of the mail subject

• Feature 3: language used in the mail

• Feature 4: length of the mail body

– The input data can be composed of many different combinations of such features.

– However, the possible inputs X are limited to combinations of these features!

– The feature space is this restricted space in which the input data X can be defined.
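To make this concrete, here is a tiny Python sketch (the spam-mail features and their numeric encoding are hypothetical, chosen only to illustrate one point in a 4-dimensional feature space):

```python
# Hypothetical encoding of one mail as a point in a 4-dimensional feature space.
# Each axis corresponds to one of the features listed above.
mail = {
    "spammy_word_count": 7,   # Feature 1: (count of) words used in the mail
    "subject_length": 42,     # Feature 2: length of the mail subject
    "language_id": 0,         # Feature 3: language used (e.g., 0 = English)
    "body_length": 830,       # Feature 4: length of the mail body
}

# The input data X is one point in this restricted 4D space.
X = [mail["spammy_word_count"], mail["subject_length"],
     mail["language_id"], mail["body_length"]]
print(X)  # [7, 42, 0, 830]
```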

Page 9: [Tf2017] day3 jwkang_pub

Linearly Separable Data

Cases where data points with different labels can be separated in the feature space by a straight line (hyperplane)!

– Example: 2D feature space, X = [x1, x2]

[Figure: two classes of points in the (x1, x2) plane, separated by a straight line]

Page 10: [Tf2017] day3 jwkang_pub

Linearly Separable Data

Even if data is not linearly separable in a low-dimensional feature space, it can be made linearly separable by raising the feature dimension (see the sketch below).

– Ex) Support Vector Machine

Image source: https://www.researchgate.net/publication/235698156_Support_vector_machine_for_multi-classification_of_mineral_prospectivity_areas/figures?lo=1&utm_source=google&utm_medium=organic
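As a minimal illustration of this idea (the XOR data and the lifted feature (x1 − x2)² are a textbook example, not from the lab code):

```python
import numpy as np

# XOR-labeled points are NOT linearly separable in the original 2D space.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
t = np.array([0, 1, 1, 0])  # XOR labels

# Lift to 3D by adding the feature x3 = (x1 - x2)^2.
x3 = (X[:, 0] - X[:, 1]) ** 2
phi = np.column_stack([X, x3])

# In the lifted space the hyperplane x3 = 0.5 separates the classes perfectly.
pred = (phi[:, 2] > 0.5).astype(int)
print(pred)               # [0 1 1 0]
print((pred == t).all())  # True
```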

Page 11: [Tf2017] day3 jwkang_pub

Binary Class vs. Multi Class

How many different classes can we distinguish?

– There are so many kinds of fruit…

Image source: http://www.mdpi.com/1424-8220/12/9/12489

Page 12: [Tf2017] day3 jwkang_pub

Binary Class vs. Multi Class

How many different classes can we distinguish?

– And so many kinds of animals…

Image source: https://www.youtube.com/watch?v=mRidGna-V4E

Page 13: [Tf2017] day3 jwkang_pub

Binary Class vs. Multi Class

[Figure: binary vs. multi-class decision regions in a 2D feature space, X = [x1, x2]]

Page 14: [Tf2017] day3 jwkang_pub

Some Background, Revisited

- Linear basis function model
- Additive measurement model
- Bayes rule

Page 15: [Tf2017] day3 jwkang_pub

Linear Modeling

Linear basis function model – assumes the system's input and output (X, Y) are linearly related

Additive measurement model – assumes the additive measurement error follows a specific statistical distribution

[Diagram: X → basis function φ(X) → system W → Y; with additive error E, the measured output is T = Y + E]

Page 16: [Tf2017] day3 jwkang_pub

Linear Modeling

Linear basis function model:

Y = Y(X, W) = Wφ(X) + B

Measurement model:

T = Y(X, W) + E
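A minimal NumPy sketch of these two models, assuming the identity basis function φ(X) = X and Gaussian error E (the sizes and parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (not from the slides): M samples, 2D features, scalar output.
M = 100
X = rng.normal(size=(M, 2))

# Linear basis function model Y = W*phi(X) + B, here with phi(X) = X.
W = np.array([1.5, -0.7])
B = 0.3
Y = X @ W + B

# Additive measurement model T = Y + E, with Gaussian error E.
E = rng.normal(scale=0.1, size=M)
T = Y + E
```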

Page 17: [Tf2017] day3 jwkang_pub

Bayes Rule

A rule for probabilistically combining "prior information" and "observed information".

When we want to find the model W from training data (X, T):

– Prior: probabilistic information about W known before seeing the data

– Likelihood: probabilistic information learned by observing the data

– Posterior: probabilistic information about W that combines the prior and the likelihood

p(W | X, T) = p(W) p(X, T | W) / p(X, T)

(posterior ∝ prior × likelihood)
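A numeric sketch of this rule with made-up numbers (a hypothetical spam-mail feature, not from the slides):

```python
# Minimal numeric sketch of the Bayes rule. Binary hypothesis: spam (T=1) or not (T=0).
prior_spam = 0.3            # p(T=1): prior info before seeing the mail
lik_word_given_spam = 0.8   # p(word | T=1): likelihood learned from data
lik_word_given_ham = 0.1    # p(word | T=0)

# Evidence p(word) via the law of total probability.
evidence = (lik_word_given_spam * prior_spam
            + lik_word_given_ham * (1.0 - prior_spam))

# Posterior p(T=1 | word) = prior * likelihood / evidence.
posterior_spam = prior_spam * lik_word_given_spam / evidence
print(posterior_spam)  # ~0.774: observing the word raised 0.3 to 0.77
```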

Page 18: [Tf2017] day3 jwkang_pub

Bayes Rule

A rule for probabilistically combining "prior information" and "observed information".

Example: telling men and women apart from photos

– Likelihood:

• A large dataset of photos of men and women

• Photos of men with long hair

• Photos of women with facial hair

– Prior:

• Women are likely to have long hair; men, short hair.

• Facial hair makes "man" more likely.

• Makeup makes "woman" more likely.

• There are more men than women in the population.

– Posterior:

• The probabilistic information that combines the two above to tell men and women apart

Page 19: [Tf2017] day3 jwkang_pub

Bayes Rule for Classification

Dataset definition: T = {T1, ..., TM}, X = {X1, ..., XM}, Y = {Y1, ..., YN}, with labels T ∈ {0, 1}

Class(-conditional) likelihood density p(X | T, W)

– the probabilistic information about each class told by the data

Class prior density p(T)

– the probability distribution over classes known from prior information

Class posterior density:

Posterior ∝ Likelihood × Prior

Page 20: [Tf2017] day3 jwkang_pub

Bayes Rule for Classification

Classification of class Cn, given W and the data (X, T):

– Prior: probabilistic information about Cn known before seeing the data

– Likelihood: probabilistic information learned by observing the data

– Posterior: probabilistic information about Cn that combines the prior and the likelihood

p(Cn | X, W) ∝ p(Cn) p(X | Cn, W)

(class posterior ∝ class prior × likelihood)

Page 21: [Tf2017] day3 jwkang_pub

Linear Classifier

Draw a straight line in the feature space to separate the classes!

How do we find that line?

– Method 1) Statistically model the system input-output relationship T = Y(X, W) + E, and find the classifier (W) by ML estimation.

• Naïve Bayes classifier

• Linear Discriminant Analysis classifier

– Method 2) Map the system output directly to a probability value in [0, 1] using a function, and find the classifier (W) by ML estimation.

• Logistic Regression classifier

Page 22: [Tf2017] day3 jwkang_pub

Classifier 1: Naïve Bayes

A linear classifier that applies the Bayes rule directly.

– Assume all axes of X are independent → likelihood density modeling

Page 23: [Tf2017] day3 jwkang_pub

Classifier 1: Naïve Bayes

A linear classifier that applies the Bayes rule directly.

– Assume all axes of X are independent → likelihood density modeling:

p(T | X, W) = ∏_i p(t_i | x_i, W)

– Find W via ML estimation → estimate the parameters of the likelihood.

– Apply the Bayes rule to obtain the class posterior density.

– Obtain the binary classifier γ.

– The classifier chooses T = 1 when the class posterior for T = 1 is the larger one.
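A minimal sketch of this pipeline, assuming per-axis Gaussian likelihoods and scikit-learn's GaussianNB on synthetic data (not the lab dataset):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Synthetic 2D data: two Gaussian blobs (illustrative only).
X0 = rng.normal(loc=[0, 0], scale=1.0, size=(200, 2))  # class T = 0
X1 = rng.normal(loc=[3, 3], scale=1.0, size=(200, 2))  # class T = 1
X = np.vstack([X0, X1])
T = np.array([0] * 200 + [1] * 200)

# GaussianNB fits one independent 1D Gaussian per feature axis (the NB
# independence assumption), then classifies via the Bayes rule.
clf = GaussianNB().fit(X, T)
print(clf.predict([[2.0, 2.0]]))        # predicted class
print(clf.predict_proba([[2.0, 2.0]]))  # class posterior probabilities
```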

Page 24: [Tf2017] day3 jwkang_pub

Classifier 2: Linear Discriminant Analysis

Unlike Naïve Bayes, LDA accounts for the correlation between the axes of X.

– Statistically model the feature X with a Gaussian distribution.

– Take the correlation between the axes of X into account.

• The axes of the feature X are not all independent!

• The class likelihood density includes a covariance matrix Σ.

• All classes share the same covariance matrix Σ.

• Using the covariance information, find the axis along which the two classes are most separated, and build the classifier on that axis.

Page 25: [Tf2017] day3 jwkang_pub

Classifier 2: Linear Discriminant Analysis

Same as Naïve Bayes, except for how the likelihood is obtained.

– Assume X is a correlated Gaussian → likelihood density modeling:

p(X | T = k, W) = 1 / ((2π)^(M/2) |Σ|^(1/2)) · exp{ -(1/2) (X − μ_k)ᵀ Σ⁻¹ (X − μ_k) }

– W can be expressed in terms of the likelihood parameters (μ_{k=1}, μ_{k=0}, Σ). (Messy!)

– Find the parameters (μ_{k=1}, μ_{k=0}, Σ) via ML estimation.

– Apply the Bayes rule to obtain the class posterior density.

– Obtain the binary classifier γ.

– The classifier chooses T = 1 when the class posterior for T = 1 is the larger one.
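A minimal NumPy sketch of the resulting closed-form LDA classifier, assuming equal class priors and synthetic correlated-Gaussian data (not the lab code):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic correlated-Gaussian data with a shared covariance Σ (illustrative).
cov = np.array([[1.0, 0.6], [0.6, 1.0]])
X0 = rng.multivariate_normal([0, 0], cov, 200)  # class k = 0
X1 = rng.multivariate_normal([3, 3], cov, 200)  # class k = 1
X = np.vstack([X0, X1])
T = np.array([0] * 200 + [1] * 200)

# ML estimates of the likelihood parameters (μ_0, μ_1, Σ).
mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
Sigma = (np.cov(X0.T) + np.cov(X1.T)) / 2  # pooled (shared) covariance

# LDA decision rule from the Bayes rule with equal priors:
# choose T = 1 when w·x + b > 0, with w = Σ⁻¹(μ1 − μ0).
w = np.linalg.solve(Sigma, mu1 - mu0)
b = -0.5 * (mu1 + mu0) @ w
pred = (X @ w + b > 0).astype(int)
print("training accuracy:", (pred == T).mean())
```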

Page 26: [Tf2017] day3 jwkang_pub

LDA vs. Naïve Bayes

Naïve Bayes

– The distribution contours are circular → correlation between feature axes is NOT considered

[Figure: circular class-likelihood contours in the (x1, x2) plane, X = [x1, x2]]

Page 27: [Tf2017] day3 jwkang_pub

LDA vs. Naïve Bayes

LDA

– The distribution contours are tilted ellipses → correlation between feature axes IS considered

– Finds the axis along which the variance is minimized

[Figure: tilted elliptical class-likelihood contours in the (x1, x2) plane, X = [x1, x2]]

Page 28: [Tf2017] day3 jwkang_pub

Classifier 3: Logistic Regression

Find the classifier without statistical modeling.

– Statistically modeling the relationship T = Y(X, W) + E is tedious and difficult!

– Statistically modeling the feature distribution is hard.

– There are too many parameters to estimate by ML.

Page 29: [Tf2017] day3 jwkang_pub

Classifier 3: Logistic Regression

Find the classifier without statistical modeling.

– Use the sigmoid function to map the system measurement "Wx + B" directly to a probability value!

– Here, the sigmoid function is called the "activation function".

Y = σ(Wφ(X) + B) ∈ [0, 1]

[Figure: 2D scatter of two classes (+ and O) over (x1, x2); the linear score Wφ(X) + B is passed through the sigmoid σ(·) to produce Y]

Page 30: [Tf2017] day3 jwkang_pub

Classifier 3: Logistic Regression

Treat the value of Y directly as the likelihood probability, and find the classifier from it:

p(T | Cn, X, W) = Y = σ(Wφ(X) + B)

Because T ∈ {0, 1}, MSE cannot be used as the cost function!

→ Use cross-entropy, a measure of probabilistic uncertainty:

cost = − Σ_{j=1}^{N} { t_j log y_j + (1 − t_j) log(1 − y_j) }

Find the W that minimizes the cross-entropy.

– No closed form

– A numerical solver is required! → Gradient descent!
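A minimal TensorFlow 1.x sketch of this model and cost, matching the course's 2017-era API (variable names and sizes are illustrative; the actual lab code is linked in LAB 5):

```python
import tensorflow as tf  # TensorFlow 1.x API, matching the 2017 course

# Placeholders for a 2D feature X and a binary target T (sizes illustrative).
X = tf.placeholder(tf.float32, shape=[None, 2])
T = tf.placeholder(tf.float32, shape=[None, 1])

# Model: Y = sigmoid(XW + B), i.e. phi(X) = X.
W = tf.Variable(tf.zeros([2, 1]))
B = tf.Variable(tf.zeros([1]))
Y = tf.sigmoid(tf.matmul(X, W) + B)

# Cross-entropy cost, averaged over the batch (a common variant of the sum
# in the formula above); the small epsilon guards against log(0).
eps = 1e-7
cost = -tf.reduce_mean(T * tf.log(Y + eps) + (1 - T) * tf.log(1 - Y + eps))

# No closed form -> numerical solver: gradient descent.
train_op = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(cost)
```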

Page 31: [Tf2017] day3 jwkang_pub

NB vs LDA vs LR: Comparison

Naïve Bayes

– Pros: simple; low complexity

– Cons: ignores correlation between features

– Assumption: all features are independent

Linear Discriminant Analysis

– Pros: provides maximum separation of classes; Bayesian optimality

– Cons: only for Gaussian-like data; complexity grows with the feature dimension; vulnerable to outliers

– Assumption: all features are Gaussian

Logistic Regression

– Pros: simple; low complexity; robust to outliers

– Cons: external numerical solver required

– Assumption: none

Page 32: [Tf2017] day3 jwkang_pub

LAB 5: Logistic Regression in TensorFlow

Let's implement logistic regression in TensorFlow and compare its performance! (A data-generation sketch follows below.)

– Data generation:

• C0: X ~ N(mean = [0, 0], var = 1), Y = 0 XOR E

• C1: X ~ N(mean = [3, 3], var = 2), Y = 1 XOR E, where E is 1 with p = 0.1

• Total_size = 5000

• Training_size = 4000, Validation_size = 1000

• ML Model = sigmoid(XW + B)

• Optimizer: GradientDescent

• Cost: cross-entropy

– https://github.com/jwkanggist/EveryBodyTensorFlow/blob/master/lab5_runTFLogisticReg.py
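A best-effort NumPy sketch of the data generation described above (the authoritative version is in lab5_runTFLogisticReg.py):

```python
import numpy as np

rng = np.random.default_rng(0)

total_size = 5000
n0 = n1 = total_size // 2

# Two Gaussian classes as specified above: C0 ~ N([0,0], var=1), C1 ~ N([3,3], var=2).
X0 = rng.normal(loc=[0, 0], scale=np.sqrt(1.0), size=(n0, 2))
X1 = rng.normal(loc=[3, 3], scale=np.sqrt(2.0), size=(n1, 2))
X = np.vstack([X0, X1])

# Noisy labels: Y = class XOR E, where E = 1 with probability 0.1.
y_clean = np.array([0] * n0 + [1] * n1)
E = (rng.random(total_size) < 0.1).astype(int)
Y = y_clean ^ E

# Shuffle, then split 4000 / 1000 into training and validation sets.
perm = rng.permutation(total_size)
X, Y = X[perm], Y[perm]
X_train, Y_train = X[:4000], Y[:4000]
X_val, Y_val = X[4000:], Y[4000:]
```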