
Chapter 4: Artificial Neural Networks

Upload: jason-cox

Post on 18-Jan-2016



Page 1: Chapter 4: Artificial Neural Networks. Artificial neural network(ANN)  General, practical method for learning real- valued, discrete-valued, vector-valued

Chapter 4: Artificial Neural Networks

Page 2

Artificial neural network (ANN): a general, practical method for learning real-valued, discrete-valued, and vector-valued functions from examples

BACKPROPAGATION algorithm: uses gradient descent to tune network parameters to best fit a training set of input-output pairs

ANN learning is robust to errors in the training examples. Applications: interpreting visual scenes, speech recognition, learning robot control strategies

Page 3

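Biological motivation

Similarities with biological neurons: parallel computing, distributed representation

Differences from biological neurons: the output of the processing unit (neuron). An ANN unit outputs a single constant value, whereas a biological neuron outputs a complex time series of spikes.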

Page 4

ALVINN system

Page 5

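Problems appropriate for neural network learning

Instances are represented by many attributes.

The target output may take whatever kind of value suits the problem (discrete-valued, real-valued, or vector-valued).

The training examples may contain errors (noise).

Long training times are acceptable.

Fast evaluation of the learned result is required.

Humans do not need to understand the learned result.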

Page 6

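Perceptrons

Input: a vector of real-valued inputs. Parameters: weights and a threshold. Output: $o(x_1, \ldots, x_n) = 1$ if $w_0 + w_1 x_1 + \cdots + w_n x_n > 0$, and $-1$ otherwise.

Learning: choosing values for the weights $w_0, \ldots, w_n$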

Page 7

Hypothesis space of perceptron learning

n: dimension of the input vector

$$H = \{\, \vec{w} \mid \vec{w} \in \mathbb{R}^{(n+1)} \,\}$$

Page 8

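Representational power of perceptrons

A perceptron represents a hyperplane decision surface, so it can correctly classify only linearly separable examples.

It can represent many Boolean functions (but not XOR), including m-of-n functions. Representing an arbitrary Boolean function in disjunctive normal form requires multiple units.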
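As an illustration of the hyperplane decision surface, here is a minimal Python sketch of a thresholded perceptron with hand-chosen weights for two-input AND and OR, assuming inputs encoded as 0/1 and outputs as 1/-1. The function name `perceptron_output` and the specific weights are illustrative choices, not taken from the slides; no single unit of this form can compute XOR.

```python
def perceptron_output(w, x):
    """Thresholded perceptron: 1 if w0 + w1*x1 + ... + wn*xn > 0, else -1.

    w[0] is the weight on a constant input x0 = 1, i.e. the (negated) threshold.
    """
    net = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if net > 0 else -1


# Illustrative weights: each defines the line w0 + 0.5*x1 + 0.5*x2 = 0 in the input plane.
W_AND = [-0.8, 0.5, 0.5]
W_OR = [-0.3, 0.5, 0.5]

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "AND:", perceptron_output(W_AND, (x1, x2)),
              "OR:", perceptron_output(W_OR, (x1, x2)))
```

Only the bias weight differs between the two functions: it shifts the same line so that either one or two active inputs are needed to cross it.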

Page 9

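Perceptron rule

$$w_i \leftarrow w_i + \Delta w_i, \qquad \Delta w_i = \eta\, (t - o)\, x_i$$

Conditions for finding correct weights after a finite number of iterations: the training examples are linearly separable, and the learning rate is sufficiently small.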
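A minimal sketch of the perceptron training rule, assuming targets in {-1, 1} and the hypothetical two-input OR data below, which is linearly separable, so the loop stops after finitely many passes with η = 0.1. `train_perceptron` and its `max_epochs` safety cap are illustrative names, not from the slides.

```python
def perceptron_output(w, x):
    """1 if w0 + w1*x1 + ... + wn*xn > 0, else -1."""
    net = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if net > 0 else -1


def train_perceptron(examples, eta=0.1, max_epochs=100):
    """examples: list of (x, t) pairs with t in {-1, 1}. Returns [w0, ..., wn]."""
    n = len(examples[0][0])
    w = [0.0] * (n + 1)
    for _ in range(max_epochs):
        errors = 0
        for x, t in examples:
            o = perceptron_output(w, x)
            if o != t:
                errors += 1
            # Perceptron rule: w_i <- w_i + eta * (t - o) * x_i, with x_0 = 1
            xs = [1.0] + list(x)
            w = [wi + eta * (t - o) * xi for wi, xi in zip(w, xs)]
        if errors == 0:  # a full pass with no misclassification: stop
            break
    return w


OR = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w = train_perceptron(OR)
print(w, [perceptron_output(w, x) for x, _ in OR])
```

When the data are not linearly separable, the loop simply runs until `max_epochs`; the rule itself gives no convergence guarantee in that case.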

Page 10

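Gradient descent & delta rule

Used when the training examples are not linearly separable. The delta rule trains an unthresholded (linear) unit, $o = \vec{w} \cdot \vec{x}$, by minimizing the training error

$$E(\vec{w}) = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2$$

where $o_d$, the unit's output for training example $d$, is a function of $\vec{w}$.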

Page 11

Hypothesis space

Page 12

Gradient descent

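Gradient: the direction of steepest increase in E, so gradient descent moves the weights in the opposite direction:

$$\Delta w_i = -\eta \frac{\partial E}{\partial w_i} = \eta \sum_{d \in D} (t_d - o_d)\, x_{id}$$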
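A minimal sketch of batch gradient descent for a single linear unit, assuming hypothetical noise-free toy data generated from t = 1 + 2x. Each pass accumulates η(t_d - o_d)x_id over all of D before any weight changes, as in the update above. The names `linear_output`, `training_error`, and `gradient_descent` are illustrative.

```python
def linear_output(w, x):
    """Unthresholded linear unit: o = w0 + w1*x1 + ... + wn*xn."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def training_error(w, examples):
    """E(w) = 1/2 * sum over d in D of (t_d - o_d)^2."""
    return 0.5 * sum((t - linear_output(w, x)) ** 2 for x, t in examples)


def gradient_descent(examples, eta=0.05, epochs=1000):
    """Batch gradient descent: sum Delta w_i over all of D, then update once per pass."""
    n = len(examples[0][0])
    w = [0.0] * (n + 1)
    for _ in range(epochs):
        delta = [0.0] * (n + 1)
        for x, t in examples:
            o = linear_output(w, x)
            xs = [1.0] + list(x)  # x0 = 1 pairs with the bias weight w0
            for i in range(n + 1):
                delta[i] += eta * (t - o) * xs[i]
        w = [wi + di for wi, di in zip(w, delta)]
    return w


# Hypothetical noise-free data from t = 1 + 2x.
D = [((0.0,), 1.0), ((1.0,), 3.0), ((2.0,), 5.0), ((3.0,), 7.0)]
w = gradient_descent(D)
print(w, training_error(w, D))
```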

Page 13
Page 14

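Gradient descent (cont'd)

The error surface of a linear unit has a single global minimum, so gradient descent converges to it whether or not the training examples are linearly separable, provided the learning rate is small enough.

If the learning rate is too large, the search may overstep the minimum; a common remedy is to reduce the learning rate gradually.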
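The overstepping problem can be seen on the simplest possible error surface. This sketch assumes a hypothetical one-dimensional error E(w) = w², whose gradient is 2w, so each step multiplies w by (1 - 2η). The schedule 1.1 / (1 + k) is an illustrative decay, not one prescribed by the slides.

```python
def descend(eta_schedule, w0=1.0, steps=50):
    """Gradient descent on the toy error E(w) = w**2 (gradient 2w).
    eta_schedule(k) returns the learning rate used at step k."""
    w = w0
    for k in range(steps):
        w -= eta_schedule(k) * 2.0 * w
    return w


w_small = descend(lambda k: 0.1)            # factor 0.8 per step: steady convergence
w_large = descend(lambda k: 1.1)            # factor -1.2 per step: overshoots and diverges
w_decay = descend(lambda k: 1.1 / (1 + k))  # same initial rate, reduced as k grows
print(w_small, w_large, w_decay)
```

With the decaying schedule the first step still overshoots, but once η(k) < 1 every later step shrinks w toward the minimum.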

Page 15

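Stochastic approximation to gradient descent

Gradient descent can be used when the hypothesis space is continuously parameterized and the error is differentiable with respect to the hypothesis parameters.

Drawbacks of gradient descent: convergence can take a long time, and when there are multiple local minima it is not guaranteed to find the global minimum.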

Page 16

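Stochastic approximation to gradient descent (cont'd)

Computes the error for a single training example and updates the weights immediately.

Approximates true gradient descent; uses a smaller learning rate than standard (batch) gradient descent; may avoid getting stuck in local minima, because it follows the gradients of the individual per-example errors rather than the gradient of the overall error.

Delta rule:

$$\Delta w_i = \eta\, (t - o)\, x_i$$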
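For comparison with batch gradient descent, here is a minimal sketch of the stochastic version (the delta rule), assuming the same hypothetical toy data from t = 1 + 2x. The only change is that the weights are updated right after each example instead of once per pass. The function names are illustrative.

```python
def linear_output(w, x):
    """Unthresholded linear unit: o = w0 + w1*x1 + ... + wn*xn."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def stochastic_gradient_descent(examples, eta=0.05, epochs=1000):
    """Delta rule: w_i <- w_i + eta * (t - o) * x_i after every single example."""
    n = len(examples[0][0])
    w = [0.0] * (n + 1)
    for _ in range(epochs):
        for x, t in examples:
            o = linear_output(w, x)  # output under the current, already-updated weights
            xs = [1.0] + list(x)
            w = [wi + eta * (t - o) * xi for wi, xi in zip(w, xs)]
    return w


D = [((0.0,), 1.0), ((1.0,), 3.0), ((2.0,), 5.0), ((3.0,), 7.0)]
print(stochastic_gradient_descent(D))
```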

Page 17

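Remark

Perceptron rule: thresholded output; converges to weights that classify the training examples perfectly, provided they are linearly separable.

Delta rule: unthresholded output; converges asymptotically toward the minimum-error weights, even when the examples are not linearly separable.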

Page 18

Multilayer networks

Can represent nonlinear decision surfaces

Page 19

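Differentiable threshold unit

Sigmoid function: nonlinear and differentiable

$$\sigma(y) = \frac{1}{1 + e^{-y}}, \qquad \frac{d\sigma(y)}{dy} = \sigma(y)\,(1 - \sigma(y))$$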
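A minimal sketch of the sigmoid unit's activation and its derivative. The derivative is written in terms of the unit's own output, which is what makes the backpropagation formulas cheap to evaluate. Function names are illustrative, and there is no guard against overflow in `math.exp` for very large negative inputs.

```python
import math


def sigmoid(y):
    """sigma(y) = 1 / (1 + e^(-y)): maps any real net input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-y))


def sigmoid_derivative(y):
    """d sigma / dy = sigma(y) * (1 - sigma(y))."""
    s = sigmoid(y)
    return s * (1.0 - s)


for y in (-2.0, 0.0, 2.0):
    print(y, sigmoid(y), sigmoid_derivative(y))
```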

Page 20

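[Figure: feedforward network with input layer i (x1, x2, x3), hidden layer j (h), and output layer k; labels show inputs to unit j (x21, x22, x23), weights w_ji (w21, w22, w23), weighted sums net1 to net3, and outputs o1 to o3]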

Page 21
Page 22

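BACKPROPAGATION algorithm

The error is redefined to sum over all network output units:

$$E(\vec{w}) = \frac{1}{2} \sum_{d \in D} \sum_{k \in outputs} (t_{kd} - o_{kd})^2$$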

Page 23

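BACKPROPAGATION algorithm (cont'd)

The error surface may have multiple local minima.

Termination: after a fixed number of iterations; once the training error falls below a threshold; or once the error on a separate validation set meets some criterion.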

Page 24

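BACKPROPAGATION algorithm (cont'd)

Adding momentum: the weight update from the previous iteration influences the current one, with momentum constant $0 \le \alpha < 1$:

$$\Delta w_{ji}(n) = \eta\, \delta_j\, x_{ji} + \alpha\, \Delta w_{ji}(n-1)$$

Learning in arbitrary acyclic networks: $Downstream(r)$ is the set of units whose immediate inputs include the output of unit r. The error term of an internal unit r is

$$\delta_r = o_r (1 - o_r) \sum_{s \in Downstream(r)} w_{sr}\, \delta_s$$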
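A minimal sketch of the momentum update, assuming the plain backpropagation step η δ_j x_ji has already been computed for each weight and is passed in as `grad_step` (a hypothetical parameter name, as is `momentum_update`). With a constant step g, Δw approaches g / (1 - α), which is how momentum gradually increases the effective step size where the gradient does not change.

```python
def momentum_update(w, grad_step, prev_delta, alpha=0.9):
    """Return (new weights, Delta w(n)), where
    Delta w(n) = grad_step + alpha * Delta w(n - 1)
    and grad_step[i] is the plain step eta * delta_j * x_ji for weight i."""
    delta = [g + alpha * p for g, p in zip(grad_step, prev_delta)]
    new_w = [wi + di for wi, di in zip(w, delta)]
    return new_w, delta


# Hypothetical constant step of 0.1 per iteration (a long, uniform slope of E).
w, delta = [0.0], [0.0]
for n in range(200):
    w, delta = momentum_update(w, [0.1], delta, alpha=0.9)
print(delta[0])  # approaches 0.1 / (1 - 0.9) = 1.0
```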

Page 25

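BACKPROPAGATION rule

For each training example d, each weight $w_{ji}$ (associated with the i-th input $x_{ji}$ to unit j) is updated by gradient descent on the error for that example:

$$\Delta w_{ji} = -\eta \frac{\partial E_d}{\partial w_{ji}}, \qquad E_d(\vec{w}) = \frac{1}{2} \sum_{k \in outputs} (t_k - o_k)^2$$

Because $w_{ji}$ influences the network only through $net_j = \sum_i w_{ji} x_{ji}$:

$$\frac{\partial E_d}{\partial w_{ji}} = \frac{\partial E_d}{\partial net_j} \frac{\partial net_j}{\partial w_{ji}} = \frac{\partial E_d}{\partial net_j}\, x_{ji}$$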

Page 26

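BACKPROPAGATION rule (cont'd)

Training rule for output unit weights:

$$\frac{\partial E_d}{\partial net_j} = \frac{\partial E_d}{\partial o_j} \frac{\partial o_j}{\partial net_j}$$

Only the $k = j$ term of the sum depends on $o_j$:

$$\frac{\partial E_d}{\partial o_j} = \frac{\partial}{\partial o_j} \frac{1}{2} \sum_{k \in outputs} (t_k - o_k)^2 = \frac{\partial}{\partial o_j} \frac{1}{2} (t_j - o_j)^2 = \frac{1}{2} \cdot 2\, (t_j - o_j) \frac{\partial (t_j - o_j)}{\partial o_j} = -(t_j - o_j)$$

Since $o_j = \sigma(net_j)$:

$$\frac{\partial o_j}{\partial net_j} = \frac{\partial \sigma(net_j)}{\partial net_j} = o_j (1 - o_j)$$

Therefore:

$$\Delta w_{ji} = -\eta \frac{\partial E_d}{\partial w_{ji}} = \eta\, (t_j - o_j)\, o_j (1 - o_j)\, x_{ji}$$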
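A quick numerical check of the output-unit rule: this sketch compares η (t_j - o_j) o_j (1 - o_j) x_ji with -η ∂E_d/∂w_ji estimated by central differences for a single sigmoid unit. The weights, inputs, and target are hypothetical values chosen for illustration, and x[0] = 1 serves as the bias input.

```python
import math


def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))


def unit_output(w, x):
    """o_j = sigma(net_j), with net_j = sum_i w_ji * x_ji."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def output_unit_error(w, x, t):
    """E_d = 1/2 (t_j - o_j)^2 for a single sigmoid output unit j."""
    return 0.5 * (t - unit_output(w, x)) ** 2


def output_unit_delta_w(w, x, t, eta):
    """Delta w_ji = eta * (t_j - o_j) * o_j * (1 - o_j) * x_ji."""
    o = unit_output(w, x)
    return [eta * (t - o) * o * (1 - o) * xi for xi in x]


# Hypothetical values; central differences estimate -eta * dE_d/dw_ji.
w, x, t, eta, h = [0.2, -0.4, 0.7], [1.0, 0.5, -1.5], 0.9, 0.1, 1e-6
analytic = output_unit_delta_w(w, x, t, eta)
numeric = []
for i in range(len(w)):
    w_plus, w_minus = w[:], w[:]
    w_plus[i] += h
    w_minus[i] -= h
    numeric.append(-eta * (output_unit_error(w_plus, x, t)
                           - output_unit_error(w_minus, x, t)) / (2 * h))
print(analytic)
print(numeric)
```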

Page 27

[Figure: the network diagram from Page 20, repeated: input layer i, hidden layer j (h), output layer k]

Page 28

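BACKPROPAGATION rule (cont'd)

Training rule for hidden unit weights: a hidden unit j can influence the error only through the units in $Downstream(j)$. Writing $\delta_k = -\frac{\partial E_d}{\partial net_k}$:

$$
\begin{aligned}
\frac{\partial E_d}{\partial net_j} &= \sum_{k \in Downstream(j)} \frac{\partial E_d}{\partial net_k} \frac{\partial net_k}{\partial net_j} = \sum_{k \in Downstream(j)} -\delta_k \frac{\partial net_k}{\partial net_j} \\
&= \sum_{k \in Downstream(j)} -\delta_k \frac{\partial net_k}{\partial o_j} \frac{\partial o_j}{\partial net_j} = \sum_{k \in Downstream(j)} -\delta_k\, w_{kj}\, o_j (1 - o_j)
\end{aligned}
$$

$$\delta_j = -\frac{\partial E_d}{\partial net_j} = o_j (1 - o_j) \sum_{k \in Downstream(j)} \delta_k\, w_{kj}, \qquad \Delta w_{ji} = \eta\, \delta_j\, x_{ji}$$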
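Putting the output-unit and hidden-unit rules together, here is a minimal sketch of one stochastic BACKPROPAGATION step for a network with one hidden layer of sigmoid units, followed by a numerical gradient check. The 2-2-2 layer sizes, the random weights, the example (x, t), and all function names are hypothetical choices for illustration; index 0 of every weight vector is the bias weight on a constant input of 1.

```python
import math
import random


def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))


def forward(W_h, W_o, x):
    """W_h[j][i]: weight from input i to hidden unit j; W_o[k][j]: from hidden unit j to output k."""
    xs = [1.0] + list(x)
    hs = [1.0] + [sigmoid(sum(w * xi for w, xi in zip(wj, xs))) for wj in W_h]
    o = [sigmoid(sum(w * hi for w, hi in zip(wk, hs))) for wk in W_o]
    return xs, hs, o


def error(W_h, W_o, x, t):
    """E_d = 1/2 * sum over output units k of (t_k - o_k)^2."""
    _, _, o = forward(W_h, W_o, x)
    return 0.5 * sum((tk - ok) ** 2 for tk, ok in zip(t, o))


def backprop_step(W_h, W_o, x, t, eta):
    """Weight changes (dW_h, dW_o) for one training example."""
    xs, hs, o = forward(W_h, W_o, x)
    # Output unit k: delta_k = o_k (1 - o_k) (t_k - o_k)
    delta_o = [ok * (1 - ok) * (tk - ok) for ok, tk in zip(o, t)]
    # Hidden unit j: delta_j = o_j (1 - o_j) * sum over downstream k of w_kj * delta_k
    # (hs[j + 1] is hidden unit j's output because hs[0] is the bias input)
    delta_h = [hs[j + 1] * (1 - hs[j + 1])
               * sum(W_o[k][j + 1] * delta_o[k] for k in range(len(W_o)))
               for j in range(len(W_h))]
    # Delta w_ji = eta * delta_j * x_ji for every weight
    dW_o = [[eta * delta_o[k] * hi for hi in hs] for k in range(len(W_o))]
    dW_h = [[eta * delta_h[j] * xi for xi in xs] for j in range(len(W_h))]
    return dW_h, dW_o


random.seed(0)
W_h = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]
W_o = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]
x, t, eta, h = [0.3, -0.8], [0.9, 0.1], 0.1, 1e-6
dW_h, dW_o = backprop_step(W_h, W_o, x, t, eta)


def numeric_step(W, i, j):
    """-eta * dE_d/dW[i][j] by central differences; W must be W_h or W_o."""
    orig = W[i][j]
    W[i][j] = orig + h
    e_plus = error(W_h, W_o, x, t)
    W[i][j] = orig - h
    e_minus = error(W_h, W_o, x, t)
    W[i][j] = orig  # restore the weight
    return -eta * (e_plus - e_minus) / (2 * h)


max_diff = max(abs(dW_h[i][j] - numeric_step(W_h, i, j)) for i in range(2) for j in range(3))
print("largest hidden-weight discrepancy:", max_diff)
```

In stochastic training, these weight changes would be added to `W_h` and `W_o` after each example before moving on to the next.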

Page 29
Page 30

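Convergence and local minima

Backpropagation is guaranteed to converge only toward a local minimum. In practice this problem is not severe, and the algorithm is highly effective.

The more weights a network has, the less severe the local minima problem: a local minimum with respect to one weight is unlikely to be a local minimum with respect to all the other weights at once.

Weights are initialized to values near 0, so early in training the network represents a nearly linear, smooth function.

Remedies: momentum, stochastic gradient descent, training multiple networks from different initial weights.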

Page 31

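Representational power of feedforward networks

Boolean functions: every Boolean function can be represented by a network with two layers of units, using disjunctive normal form with one hidden unit per input vector.

Bounded continuous functions: can be approximated with arbitrarily small error by a network with two layers of units.

Arbitrary functions: can be approximated to arbitrary accuracy by a network with three layers of units, as a linear combination of many small, localized functions.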
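XOR, which a single perceptron cannot represent, makes a small example of the two-layer Boolean construction. This sketch assumes 0/1 threshold units (unlike the -1/1 perceptron convention earlier) and uses a slightly leaner variant of the construction: one hidden unit per input vector for which XOR is true, each acting as a conjunction, and an output unit acting as an OR. The weights and names are hand-picked for illustration.

```python
def threshold_unit(w, x):
    """0/1 threshold unit: 1 if w0 + w1*x1 + ... + wn*xn > 0, else 0."""
    net = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if net > 0 else 0


# Hidden unit 1 fires only for (1, 0); hidden unit 2 only for (0, 1).
HIDDEN = [[-0.5, 1.0, -1.0], [-0.5, -1.0, 1.0]]
# Output unit: OR of the hidden units.
OUTPUT = [-0.5, 1.0, 1.0]


def xor_net(x):
    h = [threshold_unit(w, x) for w in HIDDEN]
    return threshold_unit(OUTPUT, h)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net((a, b)))
```

For n inputs this construction can need up to 2^n hidden units, which is why the Boolean result is about representability rather than efficiency.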

Page 32

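Hypothesis space search: the hypothesis space is continuous, which (together with a differentiable error) gives the search a more useful structure than a discrete hypothesis space.

Inductive bias: difficult to characterize precisely; roughly, smooth interpolation between data points.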

Page 33

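Hidden layer representation

The network can discover useful properties of the inputs by itself and represent them in the hidden layer.

This is more flexible than using only features predefined by a human, and it is useful for capturing properties that cannot be known in advance.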

Page 34
Page 35
Page 36
Page 37
Page 38

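Generalization, overfitting, stopping criterion

Terminating condition: stopping once the training error falls below a threshold is risky, because the network may overfit the training examples.

Generalization accuracy must be taken into account.

Approaches: weight decay, a separate validation set, cross-validation (e.g. k-fold cross-validation)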
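A minimal sketch of k-fold splitting over m training examples, assuming contiguous folds (in practice the examples are usually shuffled first). In the ANN setting, one common use is to train on k - 1 folds, record the iteration count with the lowest error on the held-out fold, and average that count across folds before a final training run on all the data. The function names are hypothetical.

```python
def k_fold_indices(m, k):
    """Partition indices 0..m-1 into k disjoint, contiguous folds of nearly equal size."""
    folds, start = [], 0
    for i in range(k):
        size = m // k + (1 if i < m % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def k_fold_splits(m, k):
    """Yield (train_indices, validation_indices); each fold is held out exactly once."""
    folds = k_fold_indices(m, k)
    for i, val in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val


for train, val in k_fold_splits(10, 3):
    print("train:", train, "validation:", val)
```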

Page 39
Page 40

Face recognition


Page 41

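Input image: 120×128 reduced to 30×32 to lower computational cost; each reduced pixel is the mean of the corresponding original pixels (cf. ALVINN).

1-of-n output encoding: uses more weights, but helps resolve ambiguity. Target values such as <0.9, 0.1, 0.1, 0.1> are used instead of 1 and 0, because sigmoid units cannot produce 0 or 1 with finite weights.

Two-layer network with 3 hidden units: 90% success; the learned hidden unit weights can be inspected.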
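A sketch of the two preprocessing ideas above, assuming the image is a list of pixel rows and that the 120×128 to 30×32 reduction averages non-overlapping 4×4 blocks (the slide states only that mean values are used). The names `downsample_mean` and `one_of_n_target` are hypothetical; the 0.9/0.1 values follow the slide's example target vector.

```python
def downsample_mean(image, factor):
    """Shrink a rows x cols image (list of rows) by averaging each factor x factor block."""
    rows, cols = len(image), len(image[0])
    return [
        [
            sum(image[r * factor + i][c * factor + j]
                for i in range(factor) for j in range(factor)) / factor ** 2
            for c in range(cols // factor)
        ]
        for r in range(rows // factor)
    ]


def one_of_n_target(index, n, high=0.9, low=0.1):
    """1-of-n target vector: `high` for the correct class, `low` for all others."""
    return [high if i == index else low for i in range(n)]


# Hypothetical 120 x 128 image with pixel value r * 128 + c.
image = [[r * 128 + c for c in range(128)] for r in range(120)]
small = downsample_mean(image, 4)
print(len(small), len(small[0]), small[0][0])
print(one_of_n_target(0, 4))
```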

Page 42

Alternative error functions

Used to add new constraints to the weight-tuning rule:

Penalty term for weight magnitude: reduces the risk of overfitting.

Derivatives of the target function: fit the training derivatives as well as the target values.

Minimizing cross-entropy: for learning probabilistic functions.

Weight sharing: e.g. in speech recognition.
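A sketch of the weight-magnitude penalty, assuming the common form E(w) = ½ Σ_d Σ_k (t_kd - o_kd)² + γ Σ w_ji², where γ is a small constant chosen by the user. Differentiating the penalty adds 2γw to each gradient component, so every gradient step first shrinks each weight by the factor (1 - 2ηγ). The function names and argument layout are illustrative.

```python
def error_with_weight_decay(targets, outputs, weights, gamma):
    """E = 1/2 * sum_d sum_k (t_kd - o_kd)^2 + gamma * sum of squared weights.
    targets/outputs: one output vector per example; weights: flat list of all weights."""
    sse = 0.5 * sum((t - o) ** 2
                    for t_d, o_d in zip(targets, outputs)
                    for t, o in zip(t_d, o_d))
    return sse + gamma * sum(w * w for w in weights)


def decay_step(w, grad_sse, eta, gamma):
    """w <- w - eta * (dE_sse/dw + 2 * gamma * w) = (1 - 2*eta*gamma) * w - eta * dE_sse/dw."""
    return [(1 - 2 * eta * gamma) * wi - eta * gi for wi, gi in zip(w, grad_sse)]


print(error_with_weight_decay([[1.0, 0.0]], [[0.5, 0.5]], [1.0, -2.0], gamma=0.1))
print(decay_step([1.0, -2.0], [0.0, 0.0], eta=0.1, gamma=0.5))
```

With a zero error gradient, the second call shows the pure decay effect: each weight is scaled by 0.9.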

Page 43

Cross-entropy:

$$-\sum_{d \in D} \left( t_d \log o_d + (1 - t_d) \log (1 - o_d) \right)$$
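A minimal sketch that evaluates the cross-entropy expression for targets t_d in {0, 1} and outputs o_d strictly between 0 and 1 (an output of exactly 0 or 1 would make `math.log` raise an error). The function name is illustrative. Outputs closer to their targets give a smaller value.

```python
import math


def cross_entropy(targets, outputs):
    """-sum over d of [t_d * log(o_d) + (1 - t_d) * log(1 - o_d)], for 0 < o_d < 1."""
    return -sum(t * math.log(o) + (1 - t) * math.log(1 - o)
                for t, o in zip(targets, outputs))


print(cross_entropy([1, 0], [0.5, 0.5]))  # 2 * ln 2
print(cross_entropy([1, 0], [0.9, 0.1]))  # smaller: outputs closer to targets
```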

Page 44

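Alternative error minimization procedures

Line search: the direction is the same as in backpropagation; the distance is chosen by finding the minimum of the error function along this line, so a step may be very large or very small.

Conjugate gradient: each new search direction is chosen so that the component of the error gradient along the previous direction remains zero.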
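The conjugate gradient idea can be illustrated on the simplest case, a quadratic error E(w) = ½ wᵀAw - bᵀw with A symmetric positive definite, where the exact line-search distance has a closed form. This is a sketch of the classical linear conjugate gradient method, not of the nonlinear variants used for training networks (whose error surfaces are not quadratic); A, b, and all function names are illustrative assumptions. On an n-dimensional quadratic it reaches the minimum in at most n steps, up to rounding.

```python
def matvec(A, v):
    return [sum(a * vi for a, vi in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(A, b, w, steps):
    """Minimize E(w) = 1/2 w^T A w - b^T w for symmetric positive definite A.
    The residual r = b - A w is the negative gradient of E."""
    r = [bi - ai for bi, ai in zip(b, matvec(A, w))]
    p = r[:]  # first direction: steepest descent, as in backpropagation
    for _ in range(steps):
        rr = dot(r, r)
        if rr < 1e-20:  # gradient is (numerically) zero: done
            break
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)  # exact line search: minimum of E along w + alpha * p
        w = [wi + alpha * pi for wi, pi in zip(w, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        beta = dot(r, r) / rr  # Fletcher-Reeves coefficient
        p = [ri + beta * pi for ri, pi in zip(r, p)]  # next direction, conjugate to the previous
    return w


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
print(conjugate_gradient(A, b, [0.0, 0.0], steps=2))  # minimum is A^-1 b = [1/11, 7/11]
```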

Page 45

Recurrent networks

Page 46

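Dynamically modifying network structure

Goal: improve generalization accuracy and training efficiency

Growing (starting from a network without hidden units): CASCADE-CORRELATION; reduces training time, though overfitting can be a problem

Pruning: "optimal brain damage"; reduces training time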