CHEE825 Fall 2005 J. McLellan 1
Nonlinear Empirical Models
CHEE825 Fall 2005 J. McLellan 2
Neural Network Models of Process Behaviour
• generally modeling input-output behaviour
• empirical models - no attempt to model physical structure
• estimated from plant data
CHEE825 Fall 2005 J. McLellan 3
Neural Networks...
• structure motivated by physiological structure of brain
• individual nodes or cells - “neurons” - sometimes called “perceptrons”
• neuron characteristics - notion of “firing” or threshold behaviour
CHEE825 Fall 2005 J. McLellan 4
Stages of Neural Network Model Development
• data collection - training set, validation set
• specification / initialization - structure of network, initial values
• “learning” or training - estimation of parameters
• validation - ability to predict new data set collected under same conditions
CHEE825 Fall 2005 J. McLellan 5
Data Collection
• expected range and point of operation
• size of input perturbation signal
• type of input perturbation signal
  - random input sequence? (see the sketch below)
  - number of levels (two or more?)
• validation data set
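As a concrete illustration of a two-level random input perturbation sequence, here is a minimal Python sketch; the amplitude, switching probability, and record lengths are illustrative assumptions, not course-prescribed values.

```python
import random

def random_binary_sequence(n_samples, amplitude=1.0, switch_prob=0.2, seed=0):
    """Two-level random input: hold the current level, switching sign
    with probability switch_prob at each sample (values are assumptions)."""
    rng = random.Random(seed)
    u = [amplitude]
    for _ in range(n_samples - 1):
        u.append(-u[-1] if rng.random() < switch_prob else u[-1])
    return u

u_train = random_binary_sequence(700)           # training record
u_valid = random_binary_sequence(200, seed=1)   # separate validation record
```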
CHEE825 Fall 2005 J. McLellan 6
Model Structure
• numbers and types of nodes
• input, “hidden”, output
• depends on type of neural network
  - e.g., Feedforward Neural Network
  - e.g., Recurrent Neural Network
• types of neuron functions - threshold behaviour
  - e.g., sigmoid function, ordinary differential equation
CHEE825 Fall 2005 J. McLellan 7
“Learning” (Training)
• estimation of network parameters - weights, thresholds and bias terms
• nonlinear optimization problem
• objective function - typically sum of squares of output prediction error
• optimization algorithm - gradient-based method or variation
CHEE825 Fall 2005 J. McLellan 8
Validation
• use estimated NN model to predict outputs for new data set
• if prediction unacceptable, “re-train” NN model with modifications - e.g., number of neurons
• diagnostics (see the sketch below)
  - sum of squares of prediction error
  - R² - coefficient of determination
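A minimal sketch of the two diagnostics named above, computed on a validation record; the plain-Python implementation is an illustrative choice.

```python
def validation_diagnostics(y, y_hat):
    """Sum of squares of prediction error and R^2 (coefficient of
    determination) for measured outputs y and NN predictions y_hat."""
    sse = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return sse, 1.0 - sse / sst
```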
CHEE825 Fall 2005 J. McLellan 9
Feedforward Neural Networks
• signals flow forward from input through hidden nodes to output
  - no internal feedback
• input nodes - receive external inputs (e.g., controls) and scale to [0,1] range
• hidden nodes - collect weighted sums of inputs from other nodes and act on the sum with a nonlinear function
CHEE825 Fall 2005 J. McLellan 10
Feedforward Neural Networks (FNN)
• output nodes - similar to hidden nodes BUT they produce signals leaving the network (outputs)
• FNN has one input layer, one output layer, and can have many hidden layers
CHEE825 Fall 2005 J. McLellan 11
FNN - Neuron Model
• ith neuron in layer l+1
$$y_i^{l+1} = f\left( \sum_{j=1}^{N_l} w_{ij}^{l+1} \, y_j^l + \theta_i^{l+1} \right)$$

where $y_i^{l+1}$ is the state of the neuron, $f$ is the activation function, $w_{ij}^{l+1}$ is a weight, and $\theta_i^{l+1}$ is the threshold value.
CHEE825 Fall 2005 J. McLellan 12
FNN parameters
• weights $w_{ij}^{l+1}$ - weight on output from jth neuron in layer l entering neuron i in layer l+1
• threshold - determines value of function when inputs to neuron are zero
• bias - provision for additional constants to be added
CHEE825 Fall 2005 J. McLellan 13
FNN Activation Function
• typically sigmoidal function
$$f(x) = \frac{1}{1 + e^{-x}}$$
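A minimal Python sketch combining the neuron equation from slide 11 with this sigmoidal activation; the list-of-lists layout for the weights and thresholds is an illustrative assumption.

```python
import math

def sigmoid(x):
    """Sigmoidal activation f(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron_output(w_i, y_prev, theta_i):
    """State of the ith neuron in layer l+1: weighted sum of the previous
    layer's outputs plus the threshold, passed through the activation."""
    return sigmoid(sum(w * y for w, y in zip(w_i, y_prev)) + theta_i)

def layer_output(W, theta, y_prev):
    """Outputs of all neurons in layer l+1 (W[i][j] = w_ij^{l+1})."""
    return [neuron_output(w_i, y_prev, th) for w_i, th in zip(W, theta)]
```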
CHEE825 Fall 2005 J. McLellan 14
FNN Structure
[Figure: network diagram showing an input layer, a hidden layer, and an output layer]
CHEE825 Fall 2005 J. McLellan 15
Mathematical Basis
• approximation of functions
• e.g., Cybenko, 1989 - J. of Mathematics of Control, Signals and Systems
• approximation to arbitrary degree given sufficiently large number of nodes - sigmoidal
CHEE825 Fall 2005 J. McLellan 16
Training FNN’s
• calculate sum of squares of output prediction error
• take current iterates of the parameters, propagate forward through the network, and calculate E
• update estimates of weights working backwards - “backpropagation”
$$E = \sum_j (y_j - \hat{y}_j)^2$$
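A minimal sketch of one backpropagation step for a single-hidden-layer FNN with this squared-error objective; the particular architecture (sigmoidal hidden layer, single linear output node) and fixed learning rate are illustrative assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def backprop_step(W1, th1, w2, th2, x, y, rate=0.1):
    """One gradient-descent step on E = (y - y_hat)^2 for an FNN with
    sigmoidal hidden neurons and a linear output node.  Weights and
    thresholds are updated in place, except th2, which is returned."""
    # forward pass: hidden layer, then output node
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + t)
         for row, t in zip(W1, th1)]
    y_hat = sum(w * hj for w, hj in zip(w2, h)) + th2
    e = y_hat - y                                  # dE/dy_hat = 2*e
    # backward pass ("backpropagation"): output layer first, then hidden
    for j, hj in enumerate(h):
        delta = 2.0 * e * w2[j] * hj * (1.0 - hj)  # sigmoid slope h*(1-h)
        w2[j] -= rate * 2.0 * e * hj               # output-layer weight
        for i, xi in enumerate(x):
            W1[j][i] -= rate * delta * xi          # hidden-layer weights
        th1[j] -= rate * delta                     # hidden threshold
    return e ** 2, th2 - rate * 2.0 * e            # squared error, new th2
```

Repeating this step over the training records while monitoring E gives the simplest gradient-based training loop; practical implementations use variations such as the Levenberg-Marquardt method on the next slide.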
CHEE825 Fall 2005 J. McLellan 17
Estimation
• typically using a gradient-based optimization method
• make adjustments proportional to
$$\frac{\partial E}{\partial w_{ij}^{l+1}}$$
• issues - highly over-parameterized models - potential for singularity
• e.g., Levenberg-Marquardt algorithm
CHEE825 Fall 2005 J. McLellan 18
How to use FNN for modeling dynamic behaviour?
• structure of FNN suggests a static model
• model dynamic behaviour as a nonlinear difference equation
• essentially a NARMAX model
CHEE825 Fall 2005 J. McLellan 19
Linear discrete time transfer function
• transfer function
$$y_{k+1} = G(z^{-1}) u_k = \frac{1 + b z^{-1}}{1 - a z^{-1}} u_k$$
• equivalent difference equation
$$y_{k+1} = a y_k + u_k + b u_{k-1}$$
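A few lines of Python simulating the equivalent difference equation; the zero initial conditions are assumptions.

```python
def simulate_first_order(u, a, b, y0=0.0):
    """Simulate y_{k+1} = a*y_k + u_k + b*u_{k-1}, taking u_{-1} = 0."""
    y, u_prev = [y0], 0.0
    for u_k in u[:-1]:
        y.append(a * y[-1] + u_k + b * u_prev)
        u_prev = u_k
    return y     # y[k] for k = 0 .. len(u)-1
```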
CHEE825 Fall 2005 J. McLellan 20
FNN Structure - 1st order linear example
[Figure: FNN with input-layer nodes $y_k$, $u_k$, $u_{k-1}$, one hidden layer, and output-layer node $y_{k+1}$]
CHEE825 Fall 2005 J. McLellan 21
FNN model for 1st order linear example
• essentially modelling algebraic relationship between past and present inputs and outputs
• nonlinear activation function not required
• weights required - correspond to coefficients in discrete transfer function
CHEE825 Fall 2005 J. McLellan 22
Applications of FNN’s
• process modeling - bioreactors, pulp and paper, ...
• nonlinear control
• data reconciliation
• fault detection
• some industrial applications - many academic (simulation) studies
CHEE825 Fall 2005 J. McLellan 23
“Typical dimensions”
• Dayal et al., 1994 - 3-state jacketed CSTR as a basis
• 700 data points in training set
• 6 inputs, 1 hidden layer with 6 nodes, 1 output
CHEE825 Fall 2005 J. McLellan 24
Advantages of Neural Net Models
• limited process knowledge required - but be careful (e.g., Dayal et al. paper)
• flexible - can model difficult relationships directly (e.g., inverse of a nonlinear control problem)
CHEE825 Fall 2005 J. McLellan 25
Disadvantages
• potential for large computational requirements - implications for real-time application
• highly over-parameterized
• limited insight into process structure
• amount of data required
• limited to range of data collection
CHEE825 Fall 2005 J. McLellan 26
Recurrent Neural Networks
• neurons contain differential equation model - 1st order linear + nonlinearity (see the sketch below)
• contain feedback and feedforward components
• can represent continuous dynamics
• e.g., You and Nikolaou, 1993
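A minimal explicit-Euler sketch of a network whose neurons obey a first-order linear ODE plus a sigmoidal nonlinearity; this particular state equation, the time constant, and the integration scheme are illustrative assumptions rather than the exact You and Nikolaou (1993) formulation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def recurrent_step(x, W, theta, u, tau=1.0, dt=0.01):
    """One Euler step of  tau * dx_i/dt = -x_i + sum_j w_ij*f(x_j) + theta_i + u_i
    where f is sigmoidal; W carries both feedback and feedforward terms
    (the state equation itself is an assumed sketch of the idea)."""
    f = [sigmoid(xj) for xj in x]
    return [xi + (dt / tau) * (-xi + sum(w * fj for w, fj in zip(row, f))
                               + th + ui)
            for xi, row, th, ui in zip(x, W, theta, u)]
```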
CHEE825 Fall 2005 J. McLellan 27
Nonlinear Empirical Model Representations
• Volterra Series (continuous and discrete)
• Nonlinear Auto-Regressive Moving Average with Exogenous Inputs (NARMAX)
• Cascade Models
CHEE825 Fall 2005 J. McLellan 28
Volterra Series Models
• higher-order convolution models• continuous
$$y(t) = \int_{-\infty}^{\infty} h_1(\tau)\, u(t-\tau)\, d\tau
       + \iint h_2(\tau_1, \tau_2)\, u(t-\tau_1)\, u(t-\tau_2)\, d\tau_1 d\tau_2
       + \iiint h_3(\tau_1, \tau_2, \tau_3)\, u(t-\tau_1)\, u(t-\tau_2)\, u(t-\tau_3)\, d\tau_1 d\tau_2 d\tau_3
       + \cdots$$
CHEE825 Fall 2005 J. McLellan 29
Volterra Series Model
• discrete time
$$y(k) = \sum_{j_1=1}^{\infty} h_1(j_1)\, u(k-j_1)
       + \sum_{j_1}\sum_{j_2} h_2(j_1, j_2)\, u(k-j_1)\, u(k-j_2)
       + \sum_{j_1}\sum_{j_2}\sum_{j_3} h_3(j_1, j_2, j_3)\, u(k-j_1)\, u(k-j_2)\, u(k-j_3)
       + \cdots$$
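A minimal sketch evaluating a second-order, finite-memory truncation of this series; the memory length M and the kernel storage layout are illustrative assumptions.

```python
def volterra_second_order(u, h1, h2):
    """y(k) from a second-order discrete Volterra model with memory
    M = len(h1); h1[j] stores h1(j+1) and h2[j1][j2] stores h2(j1+1, j2+1),
    so the sums run over lags 1..M as on the slide."""
    M = len(h1)
    y = []
    for k in range(len(u)):
        # first-order (linear convolution) term
        yk = sum(h1[j] * u[k - j - 1]
                 for j in range(M) if k - j - 1 >= 0)
        # second-order term with kernel h2
        yk += sum(h2[j1][j2] * u[k - j1 - 1] * u[k - j2 - 1]
                  for j1 in range(M) for j2 in range(M)
                  if k - j1 - 1 >= 0 and k - j2 - 1 >= 0)
        y.append(yk)
    return y
```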
CHEE825 Fall 2005 J. McLellan 30
Volterra Series models...
• can be estimated directly from data or derived from state space models
• causality - limits of sum or integration
• functions $h_i$ - referred to as the ith order kernel
• applications - typically second-order (e.g., Pearson et al., 1994 - binder)
CHEE825 Fall 2005 J. McLellan 31
NARMAX models
• nonlinear difference equation models
• typical form
$$y(k+1) = f(y(k), y(k-1), \ldots, u(k), u(k-1), \ldots)$$
• dependence on lagged y’s - autoregressive
• dependence on lagged u’s - moving average
CHEE825 Fall 2005 J. McLellan 32
NARMAX examples
• with products, cross-products
$$y(k+1) = a_1 y(k) y(k-1) + a_2 y(k) u(k) + a_3 u(k) u(k-1)$$
• 2nd order Volterra model
  - as NARMAX model in u only, with second order terms
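A minimal Python sketch recursively evaluating this example model; the initial conditions are assumptions.

```python
def narmax_example(u, a1, a2, a3, y0=0.0):
    """Evaluate y(k+1) = a1*y(k)*y(k-1) + a2*y(k)*u(k) + a3*u(k)*u(k-1),
    assuming y(-1) = y(0) = y0 and u(-1) = 0."""
    y_prev, y_curr, u_prev = y0, y0, 0.0
    out = [y0]
    for u_k in u:
        y_next = a1 * y_curr * y_prev + a2 * y_curr * u_k + a3 * u_k * u_prev
        out.append(y_next)
        y_prev, y_curr, u_prev = y_curr, y_next, u_k
    return out       # y(0) .. y(len(u))
```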
CHEE825 Fall 2005 J. McLellan 33
Nonlinear Cascade Models
• made from serial and parallel arrangements of static nonlinear and linear dynamic elements
• e.g., 1st order linear dynamic element fed into a “squaring” element (see the sketch below)
  - obtain products of lagged inputs
  - cf. second order Volterra term
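A minimal sketch of the squaring cascade described above; the first-order element's coefficients and zero initial state are illustrative assumptions.

```python
def cascade_square(u, a, b, x0=0.0):
    """First-order linear element x_{k+1} = a*x_k + b*u_k feeding a static
    'squaring' element y_k = x_k**2; since x_k is a weighted sum of lagged
    inputs, squaring it yields products of lagged inputs, cf. a
    second-order Volterra term."""
    x, y = x0, []
    for u_k in u:
        y.append(x ** 2)        # static nonlinear (squaring) element
        x = a * x + b * u_k     # linear dynamic element
    return y
```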