psichogios_aiche

7/28/2019 psichogios_AICHE

1/13

A H y b r i d N e u r a l N e t w o r k - F i r s t P r i n c i p l e sA p p r o a c h t o Proc es s Mode l i ngDimitris C. Psichogios and Lyle H. UngarDept. of Chemical Engineering, University of Pennsylvania, Philadelphia, PA 19104

A hybrid neural network-first principles modeling scheme is developed and usedto model afedbatch bioreactor. The hybrid model combinesa partial irst principlesmodel, which incorporates the available prior knowledge about the process beingmodeled, with a neural network which serves as an estimator of unmeasuredprocessparameters that are difficult to model from irst principles. This hybrid model hasbetter properties than standard black-boxneural network models in that it is ableto interpolate and extrapolate much more accurately, is easier to analyze and in-terpret, and requires significantly ewer training examples. Two alternative state andparameter estimation strategies, extended Kalman filtering and NLP optimization,are also considered. When no a priori known model of the unobserved processparameters is available, the hybrid network model gives better estimates of theparameters, when compared to these methods. By providing a model of these un-measured parameters, the hybrid network can also make predictions and hence canbe used for process optimization. These results apply both when full and partialstate measurements are available, but in the latter case a state reconstruction methodmust be used for the first principles component of the hybrid model.

In t roduct ionThe term artificial neural netw orks is a generic descriptionfor a wide class of connectionist representations inspired bythe models for brain activity. Th e most com mon task of thesemodels is to perform a mapping from an input space to anoutput space. A typical multilayered feedforward neural net-work (Rumelhart et al., 1986) is shown in Figure 1. It consists

of massively interconnected simple processing elements (neu-rons or nodes) arranged in a layered structure, where thestrength of each connection is given by an assigned weight;these weights are the internal parameters of the network. Theinput neurons are connected to the output neurons throughlayers of hidden nodes. Each neuron receives information inthe form of inputs from other neurons or the world and proc-esses it throu gh some-typically nonlinea r-function (the ac -tivation function); in this way the network can perform anonlinear mapping. It has been shown that, under some mildassumptions, such networks, if sufficiently large, can approx -imate any nonlinear continuous function arbitrarily accurately(Stinchcombe and White, 1989).Correspondence concerning this article should be ad dressed to L. H . Ungar.

These connectionist models have the ability to learn thefrequently complex dynamic behavior of a physical system.Learning is the process where the network approximates thefunction mapping from system inputs to outpu ts, given a setof observations of its inputs and corresponding outp uts. Thisis done by adjusting the networks internal parameters, typi-cally in such a way as to minimize the squared error betweenthe networks outputs and the desired outputs. One suchmethod is the error back-propagation algorithm (Werbos, 1974;Rumelhart et al., 1986), which is essentially a first-order gra-dient descent method. The ability to approximate unknownfunctions through presentation of their instances makes neuralnetworks a useful and potentially powerful tool for modelingin engineering applications.Neural networks have typically been used as black-boxtools, that is, no prior knowledge about the process was as-sumed; the goal was t o develop a process model based onlyon observations of its input-ou tput behavior. M odeling with-out using apriori knowledge has ofte n proved successful (Bhatand McAvoy, 1990; Psichogios and Ung ar, 1991; Willis et al.,1991) an d is the only possible method w hen no process knowl-

AIChE Journal October 1992 Vol. 38, No. 10 1499


2/13

IntermediateInputs nodes outputs(Observations) (Conclusions)n

Figure 1. Multilayered feedforward neural network.

edge is available. The ability of neural networks to learn non-parametric (structure-free) approximations to arbitraryfunctions is their strength, but it is also a weakness. A typicalneural network involves hundreds of internal parameters, whichcan lead to overfitting-fitting of the noise as well as theunderlying function-and poor generalization. Furthermore,interpretation of such models is difficult (Mavrovouniotis,1992).As a result, there has been an increasing interest in devel-oping modeling methods that address these problems. Sinceredundancy (excess degrees of freedom) may result in poormodels, one approach has been to decrease the redundancy ofthe neural network model by developing algorithms thatprune the weights that have no significant effect on thenetworks performance (McAvoy and Bhat, 1990; Karnin,1990; Mozer and Smolensky, 1989). These methods either pen-alize model complexity or examine the sensitivity of the pre-diction error to the networks weights, and eliminate theseweights (connections) that least affect the fit. However, theydo not address the issue of lack of internal model structureand do not use prior knowledge about the process being mod-eled.A different approach has focused on imposing internal struc-ture in the neural network model, typically by using some priorknowledge about the process. The common feature of methodsthat follow this approach is that clearly identifiable differentparts of the resulting network model perform different tasks,and it is this interpretation that we give to the term internalstructure in the remainder of the article. One possibility is tocreate a structured network that combines a known linearmodel with a nonlinear neural network (Haesloop and Holt,1990). The basic idea behind this technique is that the nonlinearpart of the network will model the process nonlinearities, thusenabling the complete model to capture more complex dynamicbehavior than the linear part of the network alone. An alter-native is to construct a neural network model which can beconsidered as a hierarchical, sparsely connected, network ofsmaller subnetworks that perform some local calculation. Thetask that each of these smaller networks is assigned to perform,as well as the connectivity among them, is based on empirical1500 October 1992

guidelines and analysis of the systems behavior (Mavrovou-niotis, 1992).We believe that it is advantageous to a priori structure theneural network models; in machine learning terminology, thisis characterized as imposing inductive bias on the finalmodel. In this article we follow a variation of this approachand develop a modeling strategy that combines first principlesknowledge, in the form of equations such as mass and energybalances, with neural networks as nonparametric estimatorsof important process parameters. This approach provides hy-brid models with internal structure, where each part of thefinal model performs a different task. These clearly identifiableparts are the process parameter estimator (neural net) and thepartial (first principles) model. The partial model provides abetter starting point than black-box neural networks and,at the same time, allows for both structural and parametricuncertainty. Since neural networks can approximate arbitraryfunctions for which no a priori parameterization is known,they can ideally complement the basic model and account forthe uncertainty. The resulting models can be thought of asstructured neural networks which contain some known con-straints, such as mass and energy balances; alternatively, theycan be thought of as equations which contain process pa-rameters whose dependence on state variables is modeled byneural networks. Our goal is to develop hybrid neural networkprocess models which are more flexible than classical parameterestimation schemes and which generalize and extrapolate betterthan classical black-box neural networks, in addition tobeing more reliable and easier to interpret.We evaluate this hybrid neural network modeling schemeby comparing its prediction accuracy with standard neuralnetworks. We also discuss and compare the performance ofother estimation methods, such as extended Kalman filteringand nonlinear programming (NLP) techniques, under the as-sumption that proper parameterization of the process param-eters is not apriori available. Both Kalman filtering and NLPschemes are used to directly estimate the unknown parametersof the known first principles model, either in a stochasticproblem formulat ion (Kalman filter) or in a least-squares min-imization approach (NLP estimation).This article comprises three parts: first, we compare thestandard (black-box) and hybrid neural network modelingmethods; secondly, we compare the hybrid modeling methodto NLP estimation and Kalman fi ltering; and thirdly, we extendthe concept of hybrid neural network modeling to the partialstate information case. All competing methods were tested onthe modeling of a fedbatch bioreactor. The bioreactor is dis-cussed as well as the challenges and inherent difficulties in-volved in modeling such systems. Then, the hybrid neuralnetwork and the standard neural network modeling approachesare explained, as well as the generation of a training data setand the training procedures fo r both methods. Subsequentlypresented are the method of parameter and state estimationthrough nonlinear optimization, and extended Kalman filteringcombined with least-squares parameter estimation whose re-sults are compared with our approach. The assumption of fullstate accessibility is relaxed, and a nonlinear exponential ob-server is developed and incorporated in the hybrid neural net-work model, and results are discussed. Afterward, theapplication of nonlinear optimization modeling and extended

Kalman filtering to the incomplete measurement case are pre-Vol. 38 , No. 10 AIChE Journal


3/13

sented, followed by an application of the hybrid model toprocess optimization, specifically to t he calculation o f optimalfeed policies for fedbatch reactors. Further insight on the es-timation methods considered in this article and a summary ofour results is given in the final section.

function of the biochemical, biological, and physicochemicalvariables of the system. As a result, a large number of modelshas been proposed to describe these kinetics an d so the choiceof a growth model for a particular fedbatch ferme ntation proc-ess is not at all straig htforw ard. In th e following we will assumethat the true but unknown and unmeasured growth rate isdescribed by the Haldane model:Mod el ing Problem: Fedbatc h BioreactorBiological reactors exhibit a wide range of dynamic behav-iors and offer many challenges to modeling, as a result of th epresence of living organisms (cells) whose growth rate is de-scribed by complex kinetic expressions. We will illustrate thehybrid modeling method on the identification of such a system.Consider the dynamic system which can be described by th efollowing general representation:

where x denotes the sta te vector of the system, u the controlvector an d p a vector of process parameters. These parametersp essentially represent the process kinetics an d a re related tothe system variables through the set of equations g( ). Thefunctionality tha t relates the process parameters t o sta te vari-ables and control variables, such as the reactor pH and tem-perature (R ivera and K arim, 1990), is difficult to derive fro mfirst principles reasoning an d typically unknow n; however, itis the presence of this complex unknown functionality thatmakes biological reactions highly nonlinear systems. In othersystems, the parameters p might be reaction kinetics or vis-cosities which vary in complex ways with temperature andchemical composition.Bioreactors operating in a fedbatch (nonstationary) modecan achieve high product concentrations (Gostomski et al.,1990) and are quite difficult to model, since their operationinvolves microbial growth under constantly changing condi-tions. N evertheless, knowledge of process parameters (such asgrowth rate kinetics) under a wide range of operating condi-tions is very important in efficiently designing optimal rea ctoroperation policies.A fedbatch stirred bioreactor can be described by the fol-lowing equations (Dochain and Bastin, 1990):

(3 )

(4)dS-= - , p ( t ) X ( t )+%.[Sin( t )- ( t ) ]dt V ( )d V- = F ( t )d t

where X ( ) is the biomass concentration and S ( ) s the sub-strate concentration. These mass balances on th e reacting spe-cies provide a partial model. The kinetics of the process arelumped in the term p ( t ) which accounts for the conversion ofsubstrate to biomass. Th is term, known as specific growth rateis, as noted by Dochain and Bastin (1990), typically a complex

This expression will only be used to simulate the tru e processmodel; for all modeling techniques described in the remainderof the article, th e above expression describing the cell growthrate will be completely unknown. Furthermore, the inlet sub-strate feed concentration Sinwill be the manipulated input,and the flow rate F ( t ) will be held constant.Standard and Hybr id Neural Network Mo dels

Neural networks have been successfully used as blac k-boxmodels of dynamic systems and , more specifically, as processvariable estimators in bioreactor modeling applications (Lantet al., 1990; Thibault et al., 1990). In these effort s the processwas operating in a continuous mode; however, identificationof batch processes is much more difficult, since a wide rangeof operating regimes is involved and less data may be available.This section discusses the adv antages of structured neural net-work modeling and describes the development of both a stan d-ard and a hybrid neural network m odel of the bioreactor system.

As discussed in the previous section, it is quite straightfor-ward to derive an approximate model of the bioreactor (Eqs.3-5) from simple first principles considerations such as massbalances on the process variables. How ever, the critical factorin determining the dynamic behavior of the process is theunknown kinetics (growth rate model) of the conversion ofsubstrate to biomass. The central idea of this article, then, isto integrate the available approximate model with a neuralnetwork which approximates the unknown kinetics, in orderto fo rm a combined model structure w hich can be characterizedas a hybrid (or structured) neural network process model.This approach offers significant advantages over a black-box neural network modeling methodology. T he hybrid neuralnetwork m odel has internal sti uctu re which clearly determinesthe interaction among process variables and process parame-ters, and as a result is easier to analyze than standard neuralnetworks. The first principles partial model specifies processvariable interactions from physical considerations; the neuralnetwork complements this model by estimating unmeasuredprocess parameters in such a way as to satisfy the first principlesconstraints; nonparametric estimation is needed since noknowledge is available abo ut these parameters. Such structuredmodels are expected to perform better tha n black-box neuralnetwork models in process identification tasks, since gener-alization and extrapolation are confined only to the uncertainparts of the process while the basic model is always consistentwith first principles and does not allow aphysical variable in-teractions.Furthermore, in data reconciliation and adaptive modelingand control it is very impor tan t to be able to correctly identify



4/13

I fFirst PrinciplesY ModelFigure2. Hybrid (structured) neural network model; theneural network component estimates theprocess parametersp , which are used as inputto the fir st principles model.

which part (or parts) of the process model are responsible forerroneous predictions and thus need to be updated. Traditionalneural network process models can be adapted as new databecome available, but the generality of such an adapted modelis questionable (Hernandez and Arkun, 1990) because all ofthe models internal parameters are updated since all are con-sidered partially responsible for the error. In contrast, theinternal structure of a hybrid neural network model clearlyidentifies the contribution of each part of the model to itspredictions. As a result, the number of potential error sourcescan be drastically reduced and the adaptation can be morefocused.A schematic representation of the hybrid neural networkmodel is shown in Figure 2. The neural network componentreceives as inputs the process variables and provides an estimateof the current parameter values, in this case the cell growthrate. The networks output serves as an input to the first prin-ciples component, which produces as output the values of theprocess variables at the end of each sampling time. The com-bination of these two building blocks yields a complete hybridneural network model of the bioreaction system.

For the standard neural network modeling approach, de-velopment of a process model for the bioreactor is straight-forward. Given as inputs observations of the process variables(state) and the manipulated inputs, the neural network modelpredicts the state of the system at the next sampling instant.Since a set of target outputs is available for every setof inputspresented to the neural network model, a supervised trainingmethod can be used to calculate the error signal used to changethe models internal parameters (weights).

However, for the hybrid neural network model target out-puts are not directly available, as the cell growth rate is notmeasured. In this case, the known partial process model canbe used to calculate a suitable error signal that can be used toupdate the networks weights. The observed error between thestructured models predictions and the actual state variablemeasurements can be back-propagated through the knownset of equations, essentially by using the partial models Ja-cobian, and translated into an error signal for the neural net-work component. The intuition behind this is that the processparameters should be changed proportionally to their effect1502 October 1992

on the state variable predictions, multiplied by the observederror in the state predictions.

The generation of a training data set and details of thetraining procedure for both structured and standard neuralnetwork models will be addressed in the following section.

Training of Standard and Hybrid Neural NetworkModelsA standard neural network model was developed which,given as inputs observations of the state and manipulated vari-ables, predicted the state of the system at the next samplingtime. The state variables, particularly the biomass concentra-tion, undergo changes of over an order of magnitude. Thislarge variation can cause problems when using neural net-works, so we defined dimensionless biomass concentration xand dimensionless substrate concentration IJ as:

where S, , is the average value of the substrate feed concen-tration Sin.Note that the scaling is linear, and should thereforehave no effect on a squared error criterion. The inputs to theneural network were the natural logarithm of the biomassconcentration X and the substrate concentration S , and a scaledvalue of the manipulated variable Sin.The desired networksoutputs were the dimensionless biomass concentration and sub-strate concentration x and u respectively. A sigmoid with out-put ranging from (- 1, +1) as given by

was chosen as activation function. Here, o& represents theoutput of neuron k , Wjk the weight from neuron j to neuronk which multiplies neurons j output, and bk the bias of neuronk. The training method was the error back-propagation al-gorithm as described in Rumelhart et al. (1986):A set of inputswas presented to the network, and a set of the networksoutputs was obtained by propagating these inputs through thelayers of the network (shown in Figure 1). An error signal wasobtained by comparing the networks outputs with the actualprocess outputs that corresponded to this set of inputs, andthis error signal was used to change the networks weights.The errors for each input-output example were accumulatedand the weights were updated after each complete presentationof the training data set to the neural network. To avoid ov-erfitting of the training data, at frequent intervals during thetraining session the networks weights were frozen and themean square prediction error, on a separate testing data set,was calculated. Training was stopped when it was determinedthat the networks prediction accuracy would deteriorate uponcontinued training.

For the hybrid neural network model, the squared predictionerror over both process variables (biomass and substrate con-centration) and for all training patterns N was minimized aswith the standard neural network model:

Vol. 38, No. 10 AIChE Journal


5/13

l N* i

M S E = - [ ( x ~ - x , ) ~ +u , - u ~ ) ~ ]

The neural networks output (cell growth rate p ) does notappear explicitly in the above expression. However, if it isconsidered constant for each sampling insta nt, the gradient ofthe structured models output with respect to this internalparameter can be calculated through integration of the sen-sitivity equations (C aracotsios and Stewart, 1985)

- l p t ) G I t )-F( f )c2 ()- l X ( ) (12)dG2dt V ( t )-=

with initial cond itions G,(t)=0, G 2( ) =0. Thus th e gradientof the structured models outp ut with respect to the networksoutput can be calculated through use of Eqs. 11-13 and, as aresult, the gradient of the performance measure (Eq. 10) withrespect to the networks o utpu t is readily available. This gra-dient information will generate an error signal that is used toupdate the networks weights.Both state variables (biomass and substrate concentration)were used as inputs to the neural network model componentof Figure 2. Obviously, since the true kinetics only dependon the substrate concentration (Eq. 6), the biomass concen-tration input is merely a noisy input to the n etwork; however,it is interesting to examine the behavior of the hybrid neuralnetwork model under these conditions. T he inputs were scaledin the same way as for the standard neural network model(natural logarithm of biomass and substrate concentrationtaken), and the networks outp ut was an estimate of the growthrate for the current sampling time. This output was scaled asp = 2.pmax.ji,where i~ as the networks output and pmaxconstant, and was used as input to the first principles modelcomponent (Eqs. 3-5) in order to predict the systems statefor the next sampling time. T he outp uts of this hybrid modelwere dedimensionalized according t o Eqs. 7-8 to allow a directcomparison with the standard black-box neural networkmodel.The hybrid neural network model was trained as follows.A set of inputs was presented t o the model, namely th e currentvalues of the biomass (X), substrate (S) and substrate feed(Sin). caled values of X and S were propagated through thenetwork part of the structured model; its output was an es-timate of the growth rate p , which along with X , S and Sinwere propagated through the first principles part of th ehybrid model (Eqs. 3-5) to obtain a n estimate of the processvariables for the next sampling time. A t the sam e time, th e setof the sensitivity equations w as integrated, an d an error signal(ES) was calculated acc ording to the following relationship:

where

that is, g represents a dimensionless gradient and subscript istands for the i-th input-output pattern in the training dataset. The error signals ESi for the neural network output weresummed over all inpu t-output examples an d the weights wereupdated using back-propa gation afte r a complete presentationof the training da ta set.Equations 3- 6 with parameter values given by Dochain an dBastin (1990) were used to develop a training data set. Thesubstrate feed inlet concentration was randomly perturbedwithin 50%of a nominal value of 60 g/ k, following a uniformdistribution. The initial state of the process was also chosenin such a way as to explore the sta te space as much as possible.For each of the two state variables (biomass concentration Xand substrate S) hree levels of low (=0.1 g/lt), m edium (=0.5g/lt), and high (=0.9 gllt) concentrations were considered,and th e initial state of the process comprised all possible com-binations of these initial conditions. Th e initial reactor volumeof the reactor was set equal to a nom inal value of lOlt in allsimulated batch runs. D ata were sampled every 0.2 hours, andthe batch policy consisted of a feeding period of 15 hours an da subsequent quenching period of 5 hours, where the sub-strate feed Sin nd flow rate F were set equal to zero. A totalof nine data sets, each consisting of 100 data points, werecreated in this way; two additional runs with shorter feedingtime (5 hours) and different initial conditions were created,each consisting of 50 data points. All measurements were cor-rupted with normal white Gaussian noise N(0, 0.01) added tothe dimensional s tate variables.The presence of noise in the measurements of process vari-ables X an d S corrupts the gradient information obtainedthroug h integration of the sensitivity equations with colorednoise. A practical advantage of the batch mode of learning,implemented in the training of the hybrid neural network model,is that it provides some degree of noise smoothing: Misleadingindividual error signals are lumped togeth er with all othe r errorsignals providing a total error signal that more closely repre-sents the true gradient.

Compar ison of Hybr id and Standard Neural Net-worksAn im portant consideration in neural network modeling ishow the size of the available training data set affects the ac-curacy of the learned model. For sufficiently large data sets,a standard neural network should perform arbitrarily well inapproximating the dynamic system. When the training dataset size is small then th e sta te space is not sampled sufficientlydensely, and a traditional neural network relies heavily oninterpolation to approximate the dynamic system. Further-more, a standard neural network relies only on the data toinfer the complete process model. On the contrary, a hybridneural network already contains a partial model, w hich is also

used to reduce the error signal to a subspace that can be ex-AIChE Journal October 1992 Vol. 38, NO. 10 1503


6/13

1.51.0

'B5 0.5 '!?t::51

2I . I .. ..J0 200 400 600 800 1000

Numbcr of m n i n g darapainis0 5 10 I5 M

t (hours)igure 3. Logarithm of Mean Squared Error (MSE) onboth state variables (Eq. 10) vs. number ofavailable training examples, fo r the hybr id andstandard neural networks.Figure 4a. Biomass concentration X ) predic tion vs. timefor the standard network process model.

Vertical bars indicate standard devia tion, calculated from 10train-ing sessions.

plored with fewer training examples. In other words, the net-work component of the hybrid model relies on the data toapproximate on ly part of the model (the unmeasured param-eters).

To investigate this argument we compared the approxima-tion accuracy of the standard and hybrid neural networks asa function of the training data set size. Five cases were con-sidered, with training data set sizes of 50, 100, 250, 500 and1,000 data points respectively. For all but the last case thecorresponding number of data points was randomly drawnfrom the 1,000 training examples available (see the previoussection). The networks were trained as described above, witheach of the five data sets randomly partitioned (with a 70%:30%ratio) into two smaller subsets used for training and testingrespectively. Figure 3 shows that the mean squared predictionerror of the hybrid network on both state variables is an orderof magnitude lower than that of the standard network modeland is relatively unaffected by the training data set size. How-ever, the hybrid network's prediction error increases as thedata set becomes very small (see Figure 3). The performanceof the standard network, as expected, improves significantlyas the size of the data set increases and should asymptoticallyapproach the hybrid network's prediction error as the data setbecomes very large.

The hybrid neural network model should also extrapolatebetter than the standard network model, as a result of thepartial first principles model it contains. Poor extrapolationhas been the plague of traditional neural networks (Leonardet al., 1991), and is even more important for processes suchas batch reactors which operate in a nonstationary mode. Thesuperior extrapolation of the hybrid neural network comparedto the standard neural network is shown in Figure 4, whereboth were required to predict the state of the system when itoperated in a state space regime that was not sampled by thetraining data set. The standard network model fails to ex-trapolate correctly, whereas the hybrid model gives quite ac-curate predictions. The hybrid model also interpolates better,as illustrated in Figure 5 .

In conclusion, the hybrid network model appears to reject

6

4 ti

1 8 10 5 10 l i 23

I (hours)Figure 4b. Biomass concentration X ) prediction vs. timefor the hybrid network process model.

the noisy input (biomass concentration, which does not affectthe growth kinetics), as well as the noise in the process variablesmeasurements, and gives very good predictions of the system'sstate. Furthermore, these predictions are achievable even withonly a few data points available for training.State and Parameter Est imat ion Using an NLPTechnique

Hybrid neural networks offer a method of estimating unob-served process parameters and process variables, as demon-strated in the previous sections; however, a variety of othermethods has been extensively applied to the problem of stateand parameter estimation. In recent years, optimization meth-ods have come to be increasingly used in such problems (Be-quette and Sistu, 1990; Brengel and Seider, 1989; Jang et al.,1986). These methods use a process model and typically assumesome parameterization (model with fixed coefficients) of theprocess parameters. The unknown fixed coefficients are treatedas decision variables that can be estimated in such a way asto minimize prediction error. However, as discussed in theintroduction, proper a priori parameterization is not alwayspossibIe.

1504 October 1992 Vol. 38. No. 10 AIChE Journal


7/13

formulation. The apparent conclusion is that the NLP ap-proach is not suitable for directly estimating nonconstant proc-2.5 - ess parameters of dynamic systems such as the fedbatchbioreactor considered here. An alternate approach in this casewould be to experiment with different parameterized modelsfor the unmeasured process parameter, in order to determinewhich is most suitable for the specific problem at hand. Theadvantage of the hybrid neural network model is that, byproviding a very general parameterization of the process pa-rameter (neural network component), it helps avoid this ex-perimentation.0 5 -

00 -

3 F-r---2.5 1

o) 2' o. 5 11 0 -

0.5 -

0 0 -

the Kalman filter. The discrete linear Kalman filter providesoptimal (in the sense of maximum likelihood) estimates of alinear dynamic system's states in the presence of additive whiteGaussian noise (Anderson and Moore, 1981). Extended Kal-man filtering applies this method to nonlinear time-varyingsystems. For a nonlinear dynamic system with the discrete timerepresentation:

x k + l = f k ( x k t P k , u k ) + w k (17)(18)k =h k x k ) + k

where the noise vectors wk and v p represent white Gaussianstate and measurement noise processes, with

We investigated the performance of the NLP estimationapproach on the bioreactor problem under the assumption thatthe functional relationship describing the cell growth rate iscompletely unknown. Thus the (discretized) growth rate wastreated as a time-varying unknown parameter to be directlyestimated; the decision variables represented the estimates ofthe growth rate for the corresponding sampling times. A gra-dient-based optimization strategy, Successive Quadratic Pro-gramming (Cuthrell and Biegler, 1985), was used to solve theproblem. The gradient of the objective function-squared pre-diction error-can be analytically calculated through integra-tion of the sensitivity equations, as described in the section oncomparing hybrid and standard neural networks. However,this requires substantial computation, and a simpler alternativeof using numerical derivatives was used.Under this formulation, this parameter estimation methodessentially performs local fitting of the unknown cell growthrate. As a result, the presence of noise in the process variablesmeasurements degrades the estimator's performance and pro-duces erroneous parameter estimate values. The parameterestimates were also significantly affected by the magnitude ofstable in estimating the time-varying growth rate under this

wherethe upper bound, indicating that the method is essentially un- aJk - l= :it::)- l / k - I (27)AIChE Journal October 1992 Vol. 38, No. 10 1505


8/13


9/13

I 1

0.3

1.5 1

. .... .

1. 0 t= IX i- crual valuehybrid nc1sprcdict~on.....

0 5 10 15 Mt (hours)

Figure 7a. Substrate concentrat ion (S) redic t ion vs .t ime for the hybr id neural network processmodel .

i0 5 10 15 20

t (hours)

Figure 7b. Biomass conc entrat ion (X ) predic t ion vs. t imefor the h ybr id neural network p rocess model .is much improved. In contrast, the growth rate prediction ofthe hybrid model is in general more accurate.The Kalman filters performance depends on a number oftuning parame ters which have to be carefully selected forgood performance. The initial state estimate and the initialstate covariance affect its response; the values of the meas-urement noise and process state noise also affect performanc e.In addition, the estimator Eqs. 20-26 have been derived withthe assumption of white Gaussian noise; performance underdifferent and unknown noise statistics may well deteriorate.In contrast, the hybrid neural network model needs no suchtuning parameters, makes n o assumption abo ut the noise sta-tistics, and is not affected by irrelevant inputs (biomass con-centration) to the neural network component of the model.The only cumbersome step in the use of the hybrid model isthe training time.Incomplete State MeasurementsHybrid network model

In the development of the hybrid neural network modeldiscussed in the section on standard and hybrid neural networkmodels, th e availability of the gradient of the state with respectto process parameters (growth rate) is essential. When thesensitivity equations are a function of nonmeasured state vari-ables, as is the case here, gradient information is not directlyavailable. As a result, the derivative of the performance meas-

ure (squared prediction erro r) of the hybrid m odel with respectto the neural networks (unmeasured) output cannot be ana-lytically calculated.One way to overcome this problem is to estimate the un-measured state variables using state observers, which computeestimates of the state of a dynam ic system given estimates ofthe initial statexo open-loop observers). When process out putmeasurements are available, then the prediction error can beused as a feedback term, multiplied with an observer gain, toprovide better state estimates. This results in a closed-loopobserver, which has desirable characteristics (Kravaris andChung, 1987; Banks, 1981; Bestle and Zeitz, 1983). Followingthe analysis of Kou et al . (1975), we can construct an expo-nential closed-loop observer for the fedbatch bioreactor sys-tem, as discussed previously. Based on Eqs. 3-5 and theavailability of sub strat e concentration measurem ents, we have:

- I

V h = [ O 11 (33)With a choice of the observers gain matrix B such that

0.0 t0 5 10 15 20

I (hours)Figure 8a. Cel l growth rate estimate for the extendedKalman f i l ter com bined wi th sequent ial least-squares estimation.

0 5 10 15 Mt (hours)

Figure 8b. Cel l grow th rate estim ate for the hybrid net-work.AIChE Journal October 1992 Vol. 38, No. 10 1507


10/13

(34)

it can be shown that the matrix

is stable and, as a result, the closed-loop observer describedby the equations:

X(0)=x,, 0) =so (38)is an exponential observer for the fedbatch bioreactor. Equa-tions 36-38 comprise the first principles part of the hybridneural network model; the other component consistsof a neuralnetwork that estimates the microbial growth rate p , given asinput only measurements of the substrate concentration. Asin the section on the training of standard and hybrid neuralnetwork models, we can derive a set of observer sensitivityequations from Eqs. 36-37, which provide the gradient of theperformance measure with respect to the networks output.The performance measure in this case is the squared predictionerror of the substrate concentration only. Thus Eq. 14 has tobe modified so that the error signal used to update the net-works weights involves only the prediction error on the sub-strate concentration. The training procedure is otherwise thesame as the one discussed earlier.

The prediction5 for the substrate concentration and the cellgrowth rate of a hybrid neural network model trained usingonly substrate concentration measurements is shown in Figure9, for the same simulated batch run as before. Partial stateaccessibility does not noticeably degrade the hybrid modelsability to estimate the growth rate; this is not true, as will beseen in the following, for extended Kalman filtering. In mostof the operating regime the prediction accuracy is quite good,with the possible exception of the regime where the substrateconcentration approaches zero. However, it should be em-phasized-in defense of the method-that the signal-to-noiseratio for the substrate concentration measurements is very lowin this regime. This problem was also apparent in the full statehybrid network model. More importantly, this partial statehybrid network model, like the full state hybrid network, givesestimates for the unmeasured process variables (biomass con-centration). In contrast, standard neural networks can onlypredict the measured process output (ARMA type models).

0 5 10 15 20t (hours)

Figure 9a. Substrate concentrat ion (S) predic t ion vs.t ime for the hyb r id network m odel us ing par-t ial state measurements.

0 5 10 15 20t (hours)

Figure 9b. Cell grow th rate est imate for the h ybr id net-work us ing p ar t ial s tate measurements .

NLP estimation and extended Kalman filteringThe NLP optimization method can be applied to the problem

of state and process parameter estimation of the fedbatchbioreactor, in the case when only substrate concentration meas-urements are available, and the basic problem formulation andsolution methods remain the same as presented in the sectionon state and parameter estimation using an NLP technique.However, the objective function involves only the error of themeasured output over the process monitoring window. Wefound the estimators performance to be similar to the onepreviously discussed using full state measurements. Further-more, the observations made in the earlier section about themethods deficiencies when estimating time-varying processparameters, for which properly parameterized models are notavailable, also apply here.

The extended Kalman filtering method combined with asequential least-squares parameter estimation technique canalso be applied to the bioreactor modeling problem, when onlysubstrate concentration measurements are available. The es-timators formulation, as described in the previous section(Eqs. 20-26), remains the same, with the only difference thatthe measurement vector zk is now a scalar. This simplifies theexpressions that are used to update the parameter estimate(Eqs. 29-31) since r and \k now represent scalar quantities.The performance of the partial state Kalman filter estimatoris shown in Figure 10 for the same batch run as in the previous

1508 October 1992 Vol. 38, No. 10 AIChE Journal


11/13

1.5

1.0-ccc0.5

0.0

T

A Ah

7 , , ,0 5 10 15 M

I hours)

Figure 10a. Substrate concentration (S) prediction vs.time for the extended Kalman filter, com-bined with sequential leas1-squares param-eter estimation, using partial statemeasurements.

0 5 10 15 20t (hours)

Figure lo b. Cell growth rate estimate for the extendedKalman filter, combined with sequentialleast-squares parameter estimation, usingpartial state measurements.section. It is evident that the prediction accuracy decreaseswhen only the substrate concentration is measured; the estimateof the growth rate is inaccurate in most of the operating regime.The hybrid neural network model gives more accurate predic-tions with partial state measurements than the Kalman filter,particularly for the (unmeasured) bicmass concentration.

Process Opera t ion Schedu l ing us ing a Hybr idNeural Network ModelIt was emphasized above that an important feature of the

hybrid neural network model is that it provides a model forthe unobserved growth kinetics. A practical application inshowing the usefulness of such a model is the optimization of(fed)batch process operating schedules or, more specifically,determining the substrate feeding policy that maximizes prod-uct yield. One way to obtain such an optimal substrate feedpolicy is to formulate the problem as an optimal control prob-lem and calculate (analytically if the growth rate kinetics arecompletely known) the substrate profile. An alternative methodis to use optimization techniques to maximize a suitable costfunction.AIChE Journal October 1992

We formulated the problem of determining the optimal feedpolicy for the bioreactor following the latter approach, as:

subject to Eqs. 3- 5 and

This formulation implies that the objective function to bemaximized is the cell mass in the reactor after the end of somefeeding period T . Equations 3-5 describe the process model,and T ( ) represents the neural network model of the growthrate; together these equations comprise the hybrid networkmodel. Equation 41 simply implies that the substrate feed isheld constant for three consecutive time intervals. A simulationfor T = 15 hours, followed by a subsequent quenching pe-riod of 5 hours where no substrate is fed in the reactor, wasperformed. With a sampling time of0.2 hours, this formulationleads to an optimization problem with 25 decision variables.Upper and lower bounds were imposed on the substrate feed,and numerical derivatives were used to determine the gradientof the objective function; the problem was solved using SQP.

The substrate feed optimal profile, for a batch run withinitial conditions of X = O . 5 g/lt, S=O.1 g/lt and V = 0 It, isshown in Figure 11; also, for comparison, the optimal profileobtained if the same problem is solved using Eqs. 3-6 (that is,the true growth rate model) instead of the hybrid networkprocess model. The substrate feed is initially set at a high value,so that the substrate concentration can be rapidly increased toa value of about lg/lt which is the value that maximizes thecell growth rate. Subsequently, the substrate feed is initiallydecreased and progressively increased in order to regulate thesubstrate concentration to this maximum growth rate value.It can be seen that the hybrid models predictions for theoptimal policy are in very good agreement with the predictionsusing the true model. This suggests that the hybrid neuralnetwork model can be used in the design of high product yield

14t (hours)

Figure 11. Comparison of optimal substrate feed poli-ties for the fedbatch bioreactor, as calcu-lated by using he actual process model(Eqs.3-6) and the hybrid network process model.

Vol. 38, No. 10 1509


12/13

batch runs. Furthermore, with its predictive capability, it canalso be used for on-line multistep predictive control when theprocess is required to follow an optimal feed policy which hasbeen previously calculated off-line, or in gain scheduling con-trol.

DiscussionA hybrid neural network modeling approach was presentedand used to model a fedbatch bioreactor. This hybrid modelis comprised of two parts including a partial first principlesmodel, which reflects the a priori knowledge, and a neuralnetwork component, which serves as a nonparametric ap-proximator of difficult-to-model process parameters. This formof hybrid neural network is useful for modeling processeswhere a partial model can be derived from simple physicalconsiderations (for example, mass and energy balances), butwhich also includes terms that are difficult (or even infeasible)to model from first principles. For example, the neural networkcomponent of the hybrid model may be used to approximateunknown reaction kinetics, or predict product properties whosecorrelation with the process variables is difficult to determinefrom first principles, such as the solution viscosity in poly-merization reactors. The bioreactor used to demonstrate thishybrid modeling method only contained one process param-eter, but it is in principle straightforward to extend this methodwhen models for multiple parameters are to be learned. Weare currently studying such problems.

Such hybrid neural networks have distinct advantages overstandard black-box neural networks. As was argued above,the hybrid model uses its internal structure to restrict the in-teractions among process variables to be consistent with phys-ical models. This produces combined models which are morereliable and which generalize and extrapolate more accuratelythan standard neural networks. Furthermore, interpreting astandard network is difficult, as inferred knowledge of theprocess is spread among its many internal parameters. In con-trast, the nonparametric approximation in the hybrid neuralnetwork model is restricted to modeling terms for which apriori models are difficult to obtain. Of equal importance,significantly less data are required for training hybrid neuralnetworks. As was argued, the partial model projects theerror signal to a subspace that is easier to sufficiently explorewith a small number of training points. Put differently, useof the partial first principles model reduces the number offunctions that the neural network has to choose from to ap-proximate the process parameters. Thus, hybrid networkmodels give far better approximation accuracy-for the samenumber of data points-than standard neural networks.

The hybrid modeling approach assumes that the partial firstprinciples model has a reasonably correct structure, therebyreducing the identification problem to that of estimating un-measured process parameters. If the partial models structureis highly uncertain, or contains a large number of very complexprocess parameters, then hybrid modeling may not provide asignificant advantage over standard black-box neural net-works. This may also be true when modeling processes op-erating in a continuous mode, where the behavior of interestis in a small region around a nominal operating point.

It is possible to use a simple parameterized model (for ex-ample, linear or quadratic) in place of the neural network and1510 October 1992

estimate its unknown coefficients by using the available processdata. This approach has been widely used in certain applica-tions and works, to the extent that the correct parameterizationis chosen for the process parameter model. However, it is oftennot possible to a priori choose a parameterization that canclosely approximate the true process parameter model. Forexample, any linear parameter model will perform poorly whentrying to approximate the growth rate kinetics used in thisarticle. Often the true parameter model is quite complex andextensive experimentation is required to get an acceptable ap-proximation. For example, polymerization reactors are typi-cally described by kinetic expressions which are quite complexand involve a large number of adjustable coefficients, andbiological reactions often have growth kinetics described bycomplex nonlinear expressions, with exponential dependencieson the state variables. Use of the hybrid modeling approachin such problems provides a general and versatile parameter-ization (neural network) which can approximate arbitrarilycomplex parameter models and can help avoid the lengthyexperimentation to determine a properIy parameterized model.

Extended Kalman filtering and NLP estimation were con-sidered as alternative methods of estimating unknown processparameters in a known first principles model, under the as-sumption that a priori parameterization of the process param-eter model is not possible. The hybrid network modeloutperformed both methods in estimating the unobservedprocess parameter, especially when only partial state meas-urements were available. In the latter case, a hybrid model canbe developed by using a state reconstruction scheme, as shownin the section on the hybrid network model. If a sufficientlydetailed process parameter model is available so that the un-known coefficients which it contains are almost constant (suchas reaction rate activation energies), use of a hybrid neuralnetwork is not appropriate and extended Kalman filtering orNLP estimation are more suitable. However, for process pa-rameters which are rapidly time-varying and are not easilydescribed by a parameterized model, the hybrid neural networkgives superior performance.

Notat ionE ( Y ) = expectation of a random variable YF =I =k , =

K m , K , =P =P =R =s =S,, =S,=v =V =

W =x =y =p =

w =

y . =i/ J

Y, =z =

inlet flow rateidentity matrixsubstrate to cell conversion coefficient (=1)Haldane growth rate model constants (=10 an d 0.1 re-spectively)state covariance matrix (Kalman filter)process state noise covariance matrixmeasurement noise covariance matrixsubstrate concentrationmean value of Sin ver the batch runinlet feed concentrationprocess measurement noise vectorreactors volumestate variable noise vectorKalman filters gain matrixbiomass concentrationmeasured value of the variable Ypredicted value of variable Y (Kalman filter, exponentialobserver)value at time i given information up to time j (Kalmanfilter)value of variable Y at time kvector of process measurements

Vol. 38, No. 10 AIChE Journal


13/13

Greek lettersI = parameter vectors covariance matrix (Kalman filter)p = constant (=5)

,urn= = maximum growth rate (= /21)p ( t ) = cell growth rateu = dimensionless substra te concentrationx = dimensionless biomass concentration

Literature CitedAnderson, B. D. O., and J . B. Moore, Optimal Filtering, Prentice-Hall, Englewood Cliffs, NJ (1979).Banks, S. P., A Note on Nonlinear Observers, Int . J. Control, 34 ,185 (1981).Bestle, D., and M. Zeitz, Canonical Form Observer Design for Non-Linear Time-Variable Systems, Int . J . Co ntro l ,38, 19 (1983).Bhat, N., and T. McAvoy, Use of Neural Nets fo r Dynamic Modelingand Control of Chemical Processes, Co mp . Che m. Eng . , 14, 573(1990).Bhat, N., and T. McAvoy, Information Retrieval From A StrippedBack-propagation Network, AICh E Meeting (1990).Brengel, D. D., and W. D. Seider, Multistep Nonlinear Predictive

Controller, Ind. Eng. Chem. Res. , 28, 1812 (1989).Caracotsios, M., and W. E. Stewart, Sensitivity Analysis of InitialValue Problems With Mixed ODES and Algebraic Equations,Com p. Chem. Eng. , 9, 359 (1985).Cuthrell, J . E., and L. T. Biegler, Improved Infeasible Path Opti-mization for Sequential Modular Simulators: 11. The OptimizationAlgorithm, Comp. Chem. Eng. , 9, 257 (1985).Dochain, D., and G . Bastin, Adaptive Control of Fedbatch Bio-reactors, Chem. Eng. Comm un., 87, 67 (1990).Goodwin, G . C., and K. S. Sin, Adaptive Filtering Prediction andControl, Prentice-Hall, Englewood Cliffs, NJ (1984).Gostomski, P. A., B. W. Bequette, and H. R. Bungay, MultivariableControl of a Continuous Culture, AICh E M eeting (1990).Haesloop, D., and B. R. Holt, A Neural Network Structure forSystem Identification, Proc . of the American Con trol Conf., 2460(1990).Hernandez, E., and Y. Arkun, Neural Network Modeling and anExtended DMC Algorithm to Control Nonlinear Systems, Proc .of the American Control Conf., 2454 (1990).Jang, S . S . , B. Joseph, and H. Mukai, Comparison of Two Ap-proaches to On-Line Parameter and State Estimation of NonlinearSystems, Ind. Eng. Chem . Process Des. Dev ., 25, 809 (1986).Karnin, E. D., A Simple Procedure for Pruning Back-Propagation

Trained Neural Networks, IEEE Trans. on Neural Networ ks, 1,239 (1990).Kou, S. R., D. L. Elliot, and T. J . Tarn, Exponential Observers forNonlinear Dynamic Systems, Inf. and Control, 29, 204 (1975).Kravaris, C., and C. B. Chung, Nonlinear State Feedback Synthesisby Global Input/Output Linearization, AIChE J . , 33, 592 (1987).Lant, P. A., M. J . Willis, Ci . A. Montague, M. T. Tham, and A. J .Morris, A Comparison of Adaptive Estimation With Neural BasedTechniques for Bioprocess Application, Proc . of the AmericanControl Conf., 173 (1990).Leonard, J . A., M. A. Kramer, and L. H. Ungar, A Neural NetworkArchitecture That Computes Its Own Reliability, Co mp . Che m.Eng . , submitted (1991).Mavrovouniotis, M. L., and S. Chang, Hierarchical Neural Networksfor Process Monitoring, Co mp . Che m. Eng . , 16(4), 347 (1992).Mozer, M. C., and P. Smolensky, Skeletonization: A Technique ForTrimming The Fat From A Network via Relevance Assessment,in Advan ces in Neural Information Processing, D. S. Tourestzky,ed., Morgan Kaufmann, San Mateo, CA (1989).Psichogios, D. C., and L. H. Ungar, Direct and Indirect ModelBased Control Using Artificial Neural Networks, Ind. Eng. Chem.R e x , 30 , 2564 (1991).Rivera, S. L., and M. N. Karim, Application of Dynamic Program-ming for Fermentative Ethanol Production by Zymornonas rnobi-lis, Proc. of the American Control Conf., 144 (1990).

Rumelhart, D., G . Hinton, and R. Williams, Learning Internal Rep-resentations by Error Propagation, Parallel Distribu ted Process-ing: Explorations in the Microstructures of Cognition: I .Foundations, MIT Press (1986).Sistu, P., and W. Bequette, Process Identification Using NonlinearProgramming Techniques, Proc . of the American C ontrol Con f.,1534 (1990).Stinchcombe, M., and H. White, Universal Approximation UsingFeedforward Networks With Non-Sigmoid Hidden Layer ActivationFunctions, Proc . of the Int. Joint Conf.on Neural Networks , 613(1989).Thibault, J . , V. Van Berugesen, and A. ChCruy, On-Line Predictionof Fermentation Variables Using Neural Networks, Biotech. andBioeng., 36, 1041 (1990).Wells, C. H. , Application of Modern Estimation and IdentificationTechniques to Chemical Processes, AIChE J . , 17, 966 (1971).Werbos, P., Beyond Regression: New Tools for Prediction and Anal-ysis in Behavioral Sciences, PhD Thesis, Harvard University (1974).Willis, M. J . , C. DiMassimo, G . A. Montague, M . T. Tham, and A.

J . Morris, Artificial Neural Networks in Process Engineering,IEE Proc .-D , 138, 256 (1991).Manuscript received Dec. 5 , 1991, and revision received MQY 13, 1992.


psichogios_aiche

Documents