Learning in Neural and Belief Networks
- Feed Forward Neural Network
March 28, 2001
20013329 안순길
Contents
- How the Brain Works
- Neural Networks
- Perceptrons
Introduction
Two viewpoints in this chapter:
- Computational viewpoint: representing functions using networks
- Biological viewpoint: a mathematical model of the brain
- Neuron: computing element
- Neural network: a collection of interconnected neurons
How the Brain Works
- Cell body (soma): provides the support functions and structure of the cell
- Axon: a branching fiber that carries signals away from the neuron
- Synapse: converts an electrical signal into a chemical signal
- Dendrites: more branching fibers that receive signals from other nerve cells
- Action potential: an electrical pulse
- Synapses can be excitatory (increasing potential) or inhibitory (decreasing potential); synaptic connections exhibit plasticity
A collection of simple cells can lead to thoughts, action, and consciousness.
Comparing brains with digital computers
They perform quite different tasks and have different properties.
- Speed: in raw switching speed a computer is about a million times faster, but through massive parallelism the brain is effectively far faster overall
- The brain can perform complex tasks
- The brain is more fault-tolerant: graceful degradation
- The brain can be trained using an inductive learning algorithm
Neural Networks
- NN: nodes (units) and links; each link has a numeric weight
- Learning: updating the weights
- Two computational components:
  - Linear component: the input function
  - Nonlinear component: the activation function
Notation
Simple computing elements
Total weighted input: in_i = Σ_j W_j,i a_j
Applying the activation function g: a_i = g(in_i) = g(Σ_j W_j,i a_j)
Three activation functions
- Threshold: the input level needed to cause the neuron to fire; it can be replaced with an extra input weight
- If the total input is greater than the threshold, output 1; otherwise output 0
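The activation functions mentioned above can be sketched as follows (a minimal sketch; the step/sign/sigmoid trio is the usual choice, and the threshold parameter t is shown explicitly for the step function):

```python
import math

def step(x, t=0.0):
    # Step function: output 1 when the total weighted input reaches threshold t
    return 1 if x >= t else 0

def sign(x):
    # Sign function: +1 for non-negative input, -1 otherwise
    return 1 if x >= 0 else -1

def sigmoid(x):
    # Sigmoid: a smooth, differentiable squashing of the input into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))
```

Folding the threshold into an extra weight, as noted above, amounts to using step with t=0 and a dummy input.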
Applying Neural Networks to Logic Gates
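As a sketch of the idea (the weights and thresholds below are hypothetical values chosen by hand), single threshold units can implement the basic logic gates:

```python
def unit(inputs, weights, threshold):
    # A single threshold unit: fires (1) when the weighted sum reaches the threshold
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

def and_gate(a, b):
    return unit([a, b], [1, 1], threshold=1.5)

def or_gate(a, b):
    return unit([a, b], [1, 1], threshold=0.5)

def not_gate(a):
    return unit([a], [-1], threshold=-0.5)
```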
Network structures(I)
Feed-forward networks
- Unidirectional links, no cycles: a DAG (directed acyclic graph)
- No links between units in the same layer, no links backward to a previous layer, no links that skip a layer
- Uniform processing from input units to output units
- No internal state
- Units are divided into input units, output units, and hidden units
- Perceptron: no hidden units
- Multilayer networks: one or more layers of hidden units
- Specific parameterized structure: fixed structure and activation functions
- Nonlinear regression: when g is a nonlinear function
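The uniform, stateless processing from input units to output units can be sketched as a forward pass (the sigmoid activation and the hand-picked weights below are illustrative assumptions; they happen to compute XOR, which needs a hidden layer):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, layers):
    # Propagate activations through successive layers; no cycles, no internal state.
    # Each layer is a list of (weights, bias) pairs, one pair per unit.
    a = list(inputs)
    for layer in layers:
        a = [sigmoid(sum(w * x for w, x in zip(weights, a)) + bias)
             for weights, bias in layer]
    return a

# Hypothetical network: 2 inputs -> 2 hidden units -> 1 output unit
hidden = [([20.0, 20.0], -10.0), ([-20.0, -20.0], 30.0)]
output = [([20.0, 20.0], -30.0)]
```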
Network Structures(II)
Recurrent networks
- The brain is more like a recurrent network: it has backward links
- Recurrent networks have internal state, stored in the activation levels
- They can be unstable, oscillate, or exhibit chaotic behavior
- Long computation times; require more advanced mathematical methods
Network Structures(III)
Examples
- Hopfield networks
  - Bidirectional connections with symmetric weights
  - Associative memory: retrieves the stored pattern that most closely resembles the new stimulus
- Boltzmann machines
  - Stochastic (probabilistic) activation function
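A minimal pure-Python sketch of a Hopfield-style associative memory, assuming Hebbian training and synchronous threshold updates (the function names are illustrative):

```python
def train_hopfield(patterns):
    # Hebbian rule: symmetric weights W[i][j] = sum over patterns of p[i]*p[j],
    # with a zero diagonal (no self-connections)
    n = len(patterns[0])
    W = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += p[i] * p[j]
    return W

def recall(W, stimulus, steps=5):
    # Repeatedly threshold the weighted input; the state settles on the
    # stored pattern that most closely resembles the stimulus
    s = list(stimulus)
    n = len(s)
    for _ in range(steps):
        s = [1 if sum(W[i][j] * s[j] for j in range(n)) >= 0 else -1
             for i in range(n)]
    return s
```

Patterns here use +1/-1 units; with a single stored pattern, a stimulus with one flipped bit is corrected in one update.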
Optimal Network Structure(I)
- Too small a network: incapable of representing the function
- Too big a network: does not generalize well; overfitting occurs when there are too many parameters
- A feed-forward NN with one hidden layer can approximate any continuous function
- A feed-forward NN with two hidden layers can approximate any function
Optimal Network Structures(II)
- NERF (Network Efficiently Representable Functions): functions that can be approximated with a small number of units
- Using a genetic algorithm: running the whole NN training protocol
- Hill-climbing search (modifying an existing network structure)
  - Start with a big network: optimal brain damage (removing weights from a fully connected model)
  - Start with a small network: the tiling algorithm (start with a single unit and add subsequent units)
- Cross-validation techniques
Perceptrons
Perceptron: single-layer, feed-forward network
- Each output unit is independent of the others
- Each weight only affects one of the outputs
- O = Step_t(Σ_j W_j I_j), where Step_t(x) = 1 if x ≥ t and 0 otherwise
What perceptrons can represent
- Boolean functions: AND, OR, and NOT
- Majority function: W_j = 1, t = n/2 → one unit with n weights, whereas a decision tree would need O(2^n) nodes
- A perceptron can only represent linearly separable functions; it cannot represent XOR
Examples of Perceptrons
- The entire input space is divided in two along a boundary defined by Σ_j W_j I_j = t
- In Figure 19.9(a): n = 2; in Figure 19.10(a): n = 3
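The two representability claims can be checked directly: one unit with W_j = 1 and t = n/2 computes the majority function, while no single unit computes XOR. A sketch (the exhaustive check only scans a grid of candidate weights; the impossibility holds for all real weights, per Minsky and Papert):

```python
def perceptron(inputs, weights, t):
    # Output 1 when the weighted sum exceeds the threshold t
    return 1 if sum(w * x for w, x in zip(weights, inputs)) > t else 0

def majority(bits):
    # Majority of n binary inputs: one unit, n weights of 1, threshold n/2
    n = len(bits)
    return perceptron(bits, [1] * n, n / 2)

def can_represent_xor(candidates):
    # Return True if any (w1, w2, t) in candidates reproduces the XOR truth table
    xor_table = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
    return any(
        all(perceptron([a, b], [w1, w2], t) == out
            for (a, b), out in xor_table.items())
        for w1, w2, t in candidates)
```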
Learning linearly separable functions(I)
Bad news: not many problems are in this set
Good news: given enough training examples, there is a perceptron learning algorithm that learns them
Neural network learning algorithm
- Current-best-hypothesis (CBH) scheme
- Hypothesis: a network defined by the current values of the weights
- Initial network: randomly assigned weights in [-0.5, 0.5]
- Repeat the update phase until convergence
- Each epoch: update all the weights for all the examples
Learning linearly separable functions(II)
Learning
- The error: Err = T - O (T: target output, O: actual output)
- Weight update rule (Rosenblatt, 1960): W_j ← W_j + α × I_j × Err, where α is the learning rate
- If the error is positive, we need to increase O
- If the error is negative, we need to decrease O
Algorithm
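A runnable sketch of the algorithm described above, with the threshold folded into an extra weight via a dummy input of 1 (the OR training set at the end is an illustrative assumption):

```python
import random

def predict(w, inputs):
    # Threshold unit with the threshold folded into w[0] via a dummy input of 1
    x = [1] + list(inputs)
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) > 0 else 0

def train_perceptron(examples, n_inputs, alpha=0.1, epochs=100):
    random.seed(0)  # fixed seed so the sketch is reproducible
    # Initial network: randomly assigned weights in [-0.5, 0.5]
    w = [random.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
    for _ in range(epochs):  # each epoch updates the weights for all examples
        for inputs, target in examples:
            x = [1] + list(inputs)
            err = target - predict(w, inputs)  # Err = T - O
            w = [wj + alpha * xj * err for wj, xj in zip(w, x)]  # Rosenblatt update
    return w

# Learn the linearly separable OR function
examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w = train_perceptron(examples, n_inputs=2)
```

Because OR is linearly separable, the convergence theorem guarantees the rule settles on correct weights given enough epochs.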
Perceptrons (Minsky and Papert, 1969): the limits of linearly separable functions
- Gradient descent search through weight space
- The weight space has no local minima
Difference between NNs and other attribute-based methods such as decision trees: real numbers in some fixed range vs. a discrete set of values
Dealing with discrete sets:
- Local encoding: a single input unit; discrete attribute values are mapped to points, e.g. None = 0.0, Some = 0.5, Full = 1.0 (the WillWait example)
- Distributed encoding: one input unit for each attribute value
Example
Summary(I)
- Neural networks are modeled on the human brain
- Despite the computer's advantage in raw switching speed, the brain remains superior overall and is more fault-tolerant
- Neural network: nodes (units) and links; each link has a numeric weight
- Learning: updating the weights
- Two computational components: a linear input function and a nonlinear activation function
Summary(II)
In this text, we only consider feed-forward networks
- Unidirectional links, no cycles: a DAG (directed acyclic graph)
- No links between units in the same layer, no links backward to a previous layer, no links that skip a layer
- Uniform processing from input units to output units
- No internal state
Summary(III)
- Network size determines representational power
- Overfitting occurs when there are too many parameters
- A feed-forward NN with one hidden layer can approximate any continuous function
- A feed-forward NN with two hidden layers can approximate any function
Summary(IV)
Perceptron: single-layer, feed-forward network
- Each output unit is independent of the others
- Each weight only affects one of the outputs
- Applicable only to linearly separable functions
- If the problem space is flat (linearly separable), a neural network is very effective; in other words, problems that are easy from an algorithmic perspective are also easy for a neural network
- Basically, back-propagation only guarantees local optimality in a neural network
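The local-optimality caveat can be illustrated with plain gradient descent on a one-dimensional function with two minima (the function and step size below are illustrative assumptions; back-propagation is gradient descent in weight space):

```python
def grad_descent(df, x0, alpha=0.01, steps=2000):
    # Follow the negative gradient; the end point depends on where we start,
    # so only a local minimum is guaranteed
    x = x0
    for _ in range(steps):
        x -= alpha * df(x)
    return x

# f(x) = x^4 - 3x^2 + x has a global minimum near x = -1.30
# and a worse local minimum near x = 1.13; f'(x) = 4x^3 - 6x + 1
df = lambda x: 4 * x**3 - 6 * x + 1

x_from_left = grad_descent(df, x0=-2.0)   # reaches the global minimum
x_from_right = grad_descent(df, x0=2.0)   # stuck in the worse local minimum
```

Starting on different sides of the hump between the minima yields different answers, which is exactly the behavior back-propagation can exhibit in weight space.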