
    This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the author's institution and sharing with colleagues.

    Other uses, including reproduction and distribution, or selling or licensing copies, or posting to personal, institutional or third party websites are prohibited.

    In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository. Authors requiring further information regarding Elsevier's archiving and manuscript policies are encouraged to visit:

    http://www.elsevier.com/copyright

    Author's personal copy

    Scour depth modelling by a multi-objective evolutionary paradigm

    Daniele Laucelli a,*, Orazio Giustolisi b

    a Dept. of Civil and Environmental Engineering, Technical University of Bari, v. E. Orabona 4, 70123 Bari, Italy
    b Engineering Faculty of Taranto, Technical University of Bari, v.le del Turismo 8, 74100 Taranto, Italy

    Article info

    Article history:
    Received 13 July 2010
    Received in revised form 10 October 2010
    Accepted 17 October 2010
    Available online 13 November 2010

    Keywords:
    Evolutionary polynomial regression
    Evolutionary computation
    Regression analysis
    Multi-objective optimization
    Genetic algorithms
    Local scouring

    Abstract

    Local scour modelling is an important issue in environmental engineering in order to prevent degradation of the river bed and safeguard the stability of grade-control structures. Many empirical formulations can be retrieved from the literature to predict the equilibrium scour depth, which is usually assumed as representative of the phenomenon. These empirical equations have mostly been constructed by applying regression procedures to experimental data, usually laboratory observations (thus from small/medium scale experiments). Laboratory data are more accurate measurements but generally not completely representative of the actual conditions in real-world cases, which are often much more complex than those schematized by the laboratory equipment. This is the main reason why some of the literature expressions were not adequate when used for practical applications in large-scale examples. This work deals with the application of an evolutionary modelling paradigm, named Evolutionary Polynomial Regression (EPR), to this problem. The technique was originally presented as a classical approach, used to achieve a single model for each analysis, and has recently been updated by implementing a multi-modelling approach (i.e., to obtain a set of optimal candidate solutions/models) where a multi-objective genetic algorithm is used to get optimal models in terms of parsimony of mathematical expressions vs. fitting to data. A wide database of field and laboratory observations is used for predicting the equilibrium scour depth as a function of a set of variables characterizing the flow, the sediments and the dimension of the grade-control structure. Results are discussed considering two regressive models available in the literature that have been trained on the same data used for EPR. The proposed modelling paradigm proved to be a useful tool for data analysis and, in the particular case study, able to find feasible explicit models featuring an appreciable generalization performance.

    © 2010 Elsevier Ltd. All rights reserved.

    1. Introduction

    Analysis and modelling of the interaction between engineering structures and environmental systems are key elements for an appropriate and effective design of these structures, while preventing excessive degradation of the environment. This manuscript considers one effect of Grade-Control Structures (GCSs): the local scour of alluvial bed channels. Structures of this kind (aprons, spillways, bed sills, weirs, check dams, etc.) are intended to limit the excessive degradation of streams and rivers by reducing their slope, limiting the solid transport, and reducing the flow velocity and bank erosion. Their positive effects are, however, counterbalanced by local scouring, which occurs downstream of them and could affect the stability of the structures themselves, and eventually of other neighbouring structures such as bridge piers and embankments.

    Local scouring is caused by the presence of the structure itself, which modifies the water flow directions, generating jets with velocity and impact angle quite different from those in the main stream. As a result, shear stresses are exerted on the soil particles of the river bed downstream of the structure (see Fig. 1), causing their removal. It is worth noting that for live-bed conditions (i.e., those that usually occur in real-world cases) there is a transport of bed material from upstream; the scour depth increases rapidly and, due to the interaction between erosion and deposition, it tends to fluctuate around an equilibrium value (D'Agostino and Ferro, 2004). The consequence of this phenomenon is the creation of a scour pool (also named impinging pool), which is often characterized by its maximum depth. Therefore, the amount of erosion can be directly correlated to the maximum depth of the scour pool, which will henceforth be named the equilibrium scour depth.

    * Corresponding author. Tel.: +39 (0) 99 4733210; fax: +39 (0) 99 4733229. E-mail address: [email protected] (D. Laucelli).

    Contents lists available at ScienceDirect

    Environmental Modelling & Software

    journal homepage: www.elsevier.com/locate/envsoft

    1364-8152/$ - see front matter © 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.envsoft.2010.10.013

    Environmental Modelling & Software 26 (2011) 498-509

    This brief description of the phenomenon under consideration helps in listing the components playing a role in its definition: the dimensional features of the structure, the geological and geophysical features of the channel bed, and the hydraulics of flow, without omitting the temporal evolution of the phenomenon (Pagliara et al., 2008). The resulting portrait is a very complex phenomenon due to the wide range of physical parameters that need to be defined/measured for its complete definition.

    For these reasons, researchers have mainly concentrated their efforts on developing empirical formulae based on laboratory (usually) and field (rarely) measurements to predict the equilibrium (or quasi-equilibrium) scour depth under various flow conditions and structure configurations, such as those reported in Veronese (1937), Mason and Arumugam (1985), Hoffmans (1998) and Dargahi (2003). In general, these are regressive models defined on their own experimental observations; thus they are mainly effective on data within the range of the training data, while often showing problems if applied outside this range (e.g., in real-scale applications). This is basically due to the difference in the scale of representation (from laboratory scale to real-world scale), the presence of errors (in the field data) and the unavailability, in real-scale contexts, of some input variables necessary to define the model, which eventually need to be estimated (from tabled values, for example). This, in turn, introduces uncertainty/errors (Ettema et al., 2000; D'Agostino and Ferro, 2004; Azmathullah et al., 2005).

    Trying to define a more general formulation for equilibrium scour depth prediction, Bormann and Julien (1991) theoretically derived an equation based on the concepts of jet diffusion and particle stability in scour pools downstream of grade-control structures, testing the equation on prototype experiments. In this direction, researchers have recently preferred to use dimensionless variables for characterizing the phenomenon, trying to overcome scaling problems. For example, D'Agostino and Ferro (2004) studied the scour pattern downstream of a grade-control structure using dimensionless groups, by applying the incomplete self-similarity theory. They tested the procedure on published and unpublished data collected at different scales and characterized by different bed grain-size distributions, producing two relationships for predicting the maximum scour depth.

    In recent years, in the light of the growing availability of computational power, some researchers have tried to assess local scouring by means of pure data-driven approaches, such as artificial neural networks (Liriano and Day, 2001; Azmathullah et al., 2005; Guven and Gunal, 2008) and fuzzy logic (Uyumaz et al., 2006; Azamathulla et al., 2009; Farhoudi et al., 2010). All these studies were aimed at increasing the generalization capacity of the returned models, that is, their performance on unseen data (i.e., out of their training range). However, limitations of pure data-driven approaches affected the final results in a few ways. (i) The relationship between explanatory variables and scour depth is sought as a single mathematical expression, under prior assumptions about the influencing factors and its mathematical form, thus motivating the adoption of trial & error approaches to select good models. (ii) More often than not the retrieved expression/model is not parsimonious (in terms of number of variables involved and complexity of mathematical structures) (e.g., artificial neural networks), thus being quite accurate on training data but showing poor generalization ability if applied to real-scale problems. In fact, unnecessary complexity of such models is often a symptom of over-fitting to experimental data and scarce generalization capabilities; moreover, complex formulas are usually difficult to evaluate from a physical point of view. (iii) The modelling strategy usually provides one model, considering as single objective the maximization of fitting to data without explicitly accounting for model parsimony (useful for its generalization ability). Azamathulla et al. (2010) applied the Genetic Programming (GP) modelling paradigm (Babovic and Keijzer, 2000). It provides explicit formulations of data models (symbolic structures of models) by performing a global exploration of the model space, although this approach shows some drawbacks. For example, it is not very powerful in finding constants and, more importantly, it tends to produce models with a very complex formulation (Giustolisi and Savic, 2006), which are formally symbolic structures but are difficult to analyze.

    In summary, all models are supported by physical knowledge of the phenomenon, which has guided scientists in collecting data and defining the mathematical structure of the model expression. Such models have usually been calibrated by numerical regression or alternative data-driven techniques using measured data. This undoubtedly affects the portability of formulas to different data, which can often be incomplete or affected by errors. For example, leaving aside data of a different scale, a model over-fitted on laboratory data may show scarce accuracy when applied to data from the same scale but affected by errors. Therefore, while modelling local scouring by fitting data (maximization of accuracy), it could be useful to introduce another objective that controls the complexity of the resulting mathematical expression, thus leading to improved generalization of the model. In fact, the so-called principle of parsimony (Ockham's razor; William of Ockham, 1300-1349) states that for a set of otherwise equivalent models of a given phenomenon one should choose the simplest one to explain a dataset; this in turn helps prevent over-fitting to training data (Young et al., 1996; Crout et al., 2009). However, this modelling strategy calls for a trade-off between model complexity and fitness to data while developing the model itself.

    The paper is focused on the implementation of a modelling strategy recently proposed by Giustolisi and Savic (2006, 2009): Evolutionary Polynomial Regression (EPR). It is a hybrid modelling technique which combines linear regression and evolutionary search for mathematical model structures. A multi-objective evolutionary optimization paradigm is employed to find a set of mathematical expressions that can be assumed as optimal given some objectives according to the Pareto dominance criterion (Pareto, 1896). EPR is used here to model the equilibrium scour depth downstream of grade-control structures by analyzing a large database, ranging from real observations to laboratory scale data (D'Agostino and Ferro, 2004; Bormann and Julien, 1991). EPR generates a number of eligible models for predicting the equilibrium scour depth (i.e., the Pareto front of models) which have symbolic/explicit formulations, thus allowing the evaluation of physical insight about (in this case) the local scouring process. EPR has already proved reliable in many applications in civil and environmental engineering. For further details on EPR applications the reader is referred to the EPR (2009) webpage.

    2. Introduction to evolutionary polynomial regression

    EPR can be defined as a non-linear global stepwise regression, providing symbolic formulae of models. It is global since the search for the optimal model structure is based on the exploration of the entire space of models by leveraging a flexible coding of model structure. Moreover, EPR generalizes the original stepwise regression of Draper and Smith (1998) by considering non-linear model structures (i.e., pseudo-polynomials), although they are linear with respect to regression parameters.

    Fig. 1. Definition sketch for scour downstream of a grade-control structure.

    Although the details about the main EPR paradigm are explained in the reference works (Giustolisi and Savic, 2006; Giustolisi et al., 2007), a concise description is reported here. One of the general model structures that EPR can manage is reported in Eq. (1):

    $$Y = a_0 + \sum_{j=1}^{m} a_j \cdot (X_1)^{ES(j,1)} \cdot \ldots \cdot (X_k)^{ES(j,k)} \cdot f\left( (X_1)^{ES(j,k+1)} \cdot \ldots \cdot (X_k)^{ES(j,2k)} \right) \tag{1}$$

    where $m$ is the number of terms, $a_j$ are parameters to be estimated, $X_i$ are candidate explanatory variables, $ES(j,z)$ ($z = 1, \ldots, 2k$) is the exponent of the $z$th input within the $j$th term in Eq. (1), and $f$ is a function selected by the user among a set of possible alternatives (including no function selection). In brief, the search for model structure is performed by exploring the combinatorial space of exponents to be assigned to each candidate input of Eq. (1) among a user-defined set of possible values (which should include 0). Thus, although exponent values could be any real number, they are coded as integers during the search procedure. It is worth noting that, when an exponent is equal to zero, the relevant input $X_i$ is basically deselected from the resulting equation (i.e., the exponent $ES(j,k) = 0$ and the relevant input $(X_k)^0 = 1$).

    On these premises, EPR is a hybrid data-mining modelling technique whose main features are explicitly stated in its name. It is Evolutionary, because it employs a population-based strategy for searching optimal models by mimicking the evolution of the fittest individuals in nature. In particular, it employs a Genetic Algorithm (GA) (Goldberg, 1989) to find the optimal sets of exponents in Eq. (1) within the combinatorial search space, as defined by the user-defined set of exponents. It is Polynomial, because EPR mathematical structures, e.g., Eq. (1), are linear with respect to their parameters although not necessarily linear in their attributes (due to both exponents different from 1 and possible selection of function $f$). EPR is actually a Regression technique since the model parameters of any pseudo-polynomial expression are computed from data.

    Fig. 2. Flow-chart of EPR working phases (Giustolisi and Savic, 2006).

    The parameter estimation is solved as a linear inverse problem in order to guarantee a two-way (i.e., unique) relationship between each model structure and its parameters (Giustolisi and Savic, 2006). From a regressive standpoint, EPR may produce a non-linear mapping of data (like that achievable by Artificial Neural Networks (Haykin, 1999)), although with few constants to estimate and using linear regression for parameter estimation. These features, in turn, help in avoiding over-fitting to training data, especially when the dataset is not large. Furthermore, EPR employs an automatic model construction (avoiding a prior rigid selection of mathematical form and number of parameters) and a readable model formulation.
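    The pseudo-polynomial structure of Eq. (1) and the linear estimation of its parameters can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions, not the EPR implementation: the function f is omitted (the "no function" option), the parameters are found by ordinary least squares through the normal equations, and the names `build_terms`, `solve_linear` and `fit_least_squares`, as well as the toy data, are hypothetical.

```python
import math

def build_terms(X, ES):
    """Design matrix for Eq. (1) without the function f (assumption).

    Each row is [1, T_1, ..., T_m] with T_j = prod_i X_i ** ES[j][i];
    the leading 1 carries the bias a0. An exponent of 0 deselects an input.
    """
    return [[1.0] + [math.prod(x ** e for x, e in zip(row, exps)) for exps in ES]
            for row in X]

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def fit_least_squares(Z, y):
    """Ordinary least squares through the normal equations (Z^T Z) a = Z^T y."""
    n = len(Z[0])
    ZtZ = [[sum(r[i] * r[j] for r in Z) for j in range(n)] for i in range(n)]
    Zty = [sum(r[i] * t for r, t in zip(Z, y)) for i in range(n)]
    return solve_linear(ZtZ, Zty)

# Toy data (made up for illustration): two candidate inputs, two terms.
X = [[1.0, 2.0], [2.0, 1.0], [4.0, 2.0], [9.0, 3.0], [16.0, 4.0]]
ES = [[1, 0],       # term 1: X1 (X2 is deselected by the exponent 0)
      [0.5, -1]]    # term 2: sqrt(X1) / X2
true_a = [0.5, 2.0, 3.0]                      # a0, a1, a2 used to build y
Z = build_terms(X, ES)
y = [sum(a * t for a, t in zip(true_a, row)) for row in Z]
a_hat = fit_least_squares(Z, y)
print([round(a, 6) for a in a_hat])
```

    Because the structure is linear in its parameters, fixing the exponent matrix reduces the fit to a single linear solve, which is what lets the evolutionary search concentrate on the exponents alone.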

    Fig. 2 illustrates the EPR working sequence as a flow-chart. The initial population of models is randomly created by resorting to a Latin Hypercube sampling of the search space (McKay et al., 1979). During the search for models, mathematical structures are created by assigning exponents to relevant inputs, then parameters are estimated (e.g., through least squares or non-negative least squares) in order to get the complete mathematical expression. All expressions corresponding to the current population of exponents are then evaluated in terms of fitness to data and model complexity. The steps on the left side of Fig. 2 describe the main operations performed within the GA. The criterion for the EPR run to stop is represented by the maximum number of GA generations.

    2.1. Multi-objective EPR framework

    Although the original EPR methodology proved to be effective, it used only one objective (i.e., the accuracy of data fitting) for exploring the space of solutions (Giustolisi et al., 2007). However, the single-objective EPR methodology has shown some shortcomings, such as: (i) an exponentially decreasing performance as the number of polynomial terms increases; (ii) returned models that are often difficult to interpret and inevitably prone to some subjective judgment in their selection (i.e., the selection process is often biased by the analyst's experience rather than being purely based on mathematical/statistical criteria) (Young et al., 1996); (iii) while searching for models with $m$ terms, it often happens that more parsimonious formulations (e.g., those having $m - 1$ terms, which in this search run are degenerate cases of the targeted models) are abandoned because there could be less parsimonious formulae that fit the training data better; (iv) the introduction of unnecessary complexity (i.e., addition of new terms or combinations of inputs) that fits mostly random noise rather than the underlying phenomenon (i.e., a problem of over-fitting (Crout et al., 2009)).

    Therefore, the version of EPR presented here (Giustolisi and Savic, 2009), labelled MO-EPR from this point forward, implements an evolutionary multi-objective optimization strategy based on the Pareto dominance criterion (Pareto, 1896; Van Veldhuizen and Lamont, 2000). MO-EPR explores the space of m-term formulae using two or three of the following (conflicting) objectives: (i) the maximization of model accuracy, (ii) the minimization of the number of model coefficients and (iii) the minimization of the number of actually used model inputs (i.e., those whose exponent is not 0 in the resulting model structure). The last two objectives are measures of model parsimony; thus MO-EPR finally obtains a set of optimal solutions (i.e., the Pareto front) that can be considered as trade-offs between structural complexity and accuracy (Reed et al., 2007). The application of a multi-objective strategy in the MO-EPR algorithm leads to many advantages with respect to single-objective EPR (Giustolisi and Savic, 2006): (i) improved computational efficiency; (ii) improved exploration of the space of solutions/formulae; (iii) automatic ranking of returned formulae according to structural complexity; (iv) comparisons among formulae according to different criteria (e.g., selected inputs, structure, etc.); (v) the possibility of searching for different models for different goals; and (vi) a trade-off between complexity and accuracy of the returned models (Giustolisi and Savic, 2009). At the end of the MO-EPR run the optimal models can be evaluated using a test set of data (i.e., unseen during model construction) to assess generalization performance.
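    The Pareto dominance criterion applied to the three objectives listed above can be sketched as follows. This is a generic non-dominated filter, not the OPTIMOGA ranking itself; accuracy is recast as an error to be minimized (here 1 - CoD, an assumption), and the candidate models and their objective values are made up purely for illustration.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized):
    a is no worse than b everywhere and strictly better at least once."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(models):
    """Keep the models that no other model dominates."""
    return [m for m in models
            if not any(dominates(o["obj"], m["obj"]) for o in models if o is not m)]

# Hypothetical candidates: obj = (1 - CoD, number of coefficients, number of inputs)
candidates = [
    {"name": "M1", "obj": (0.40, 1, 1)},
    {"name": "M2", "obj": (0.25, 2, 2)},
    {"name": "M3", "obj": (0.30, 2, 3)},  # worse than M2 in error and inputs
    {"name": "M4", "obj": (0.20, 3, 4)},
    {"name": "M5", "obj": (0.25, 3, 2)},  # same error as M2 with one more coefficient
]

front = pareto_front(candidates)
print([m["name"] for m in front])
```

    The surviving models are mutually non-dominated: each step towards higher accuracy is paid for with additional coefficients or inputs, which is the trade-off the analyst is then asked to inspect.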

    Fig. 3 shows the decision support framework for model selection allowed by MO-EPR. After setting the MO-EPR search options (candidate model attributes, candidate exponents for attributes, maximum number of parameters, etc.), as in classical EPR, and once a Pareto set of optimal models is obtained with respect to the MOGA search objectives, the analyst can perform a further model selection. Such selection can be accomplished by considering: (i) the model structure with respect to physical insight related to the problem; (ii) similarities of mathematical structures among the MO-EPR Pareto set of models, as sorted according to model complexity; (iii) recurrent groups of variables in different MO-EPR models; (iv) generalization performance of models as assessed in terms of both statistical analysis and mathematical parsimony.

    Overall, the user is allowed to evaluate a set of models looking at different key aspects which encompass his/her knowledge of the physical phenomenon, the reliability of the experimental data used to build the model and/or the final purpose of the model itself. This eventually results in a more robust selection.

    Finally, in order to solve the combinatorial optimization problem of model construction by resorting to a multi-objective approach, MO-EPR employs a Multi-Objective Genetic Algorithm (MOGA) strategy (Goldberg, 1989). In more detail, MO-EPR uses a MOGA strategy named OPTImized Multi-Objective Genetic Algorithm (OPTIMOGA), whose details are provided in Appendix A.

    However, it is worth noting that any population-based strategy can be used within EPR. Furthermore, due to the integer coding of decision variables adopted in the EPR paradigm, the space of decision variables is discrete and purely combinatorial, as is the space of the objective functions related to the number of parameters and actually used variables. In addition, the number of models on the Pareto front is usually not huge, and any strategy to specify the precision of objective functions, such as the ε-dominance in Kollat and Reed (2006), is not strictly necessary in MO-EPR for the above-mentioned reasons. Additionally, in MO-EPR too the initialization of OPTIMOGA is based on sampling the space of the decision variables by means of a Latin Hypercube sampling technique (McKay et al., 1979). Owing to this initialization strategy, and to the observation that the search space of EPR models is generally not complex, experiments showed that MO-EPR results are not significantly influenced by either initialization or MOGA search settings (e.g., reproduction, initialization, mutation rate, crossover rate, etc.), which are unchanged from those reported in Giustolisi and Savic (2006).
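    A Latin Hypercube initialization over integer-coded decision variables can be sketched as below. This is one plausible reading of the idea, not the OPTIMOGA code: each gene is assumed to be the index of a candidate exponent, each individual is assumed to carry one gene per term and input, and the function name `latin_hypercube_population` and its arguments are hypothetical.

```python
import random

def latin_hypercube_population(n_individuals, n_genes, n_levels, seed=0):
    """Sample integer chromosomes so that, gene by gene, the candidate
    levels are covered evenly (one draw per stratum of [0, 1))."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_genes):
        # One uniform draw inside each of n_individuals equal-width strata.
        strata = [(i + rng.random()) / n_individuals for i in range(n_individuals)]
        rng.shuffle(strata)  # decouple the strata across genes
        # Map each draw to the index of a candidate exponent.
        columns.append([min(int(u * n_levels), n_levels - 1) for u in strata])
    # Transpose: one row (chromosome) per individual.
    return [list(individual) for individual in zip(*columns)]

# Example: 18 individuals, 6 genes (e.g., 3 terms x 2 inputs), 9 candidate exponents.
population = latin_hypercube_population(n_individuals=18, n_genes=6, n_levels=9)
print(population[0])
```

    When the population size is a multiple of the number of candidate exponents, every exponent appears equally often in each gene of the initial population, which spreads the starting models across the discrete search space.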

    As for the version presented in Giustolisi et al. (2007), MO-EPR also has a MATLAB user interface, through which all options and settings can be easily accessed. MO-EPR is fully implemented in the MATLAB environment, and a free (stand-alone) copy of the MO-EPR software together with a user guide is available on the EPR (2009) webpage.

    3. Data collection

    In this work, the models obtained by applying MO-EPR to local scouring downstream of GCSs have been compared with those obtained by D'Agostino and Ferro (2004). For this reason, MO-EPR employs the same database used therein, and thus the same explanatory variables. In particular, the database reported in D'Agostino and Ferro (2004) can be divided into seven different sets coming from different experimental layouts. In the following application, they will be referred to by the relevant publications (Veronese, 1937; Bormann and Julien, 1991; Mossa, 1998; D'Agostino, 1994; Falciai and Giacomin, 1978; Lenzi and Comiti, 2003), as summarized in Table 1, where all the symbols refer to Fig. 1 and to Appendix C. It is worth noting that the databases of Veronese (1937), Mossa (1998), D'Agostino (1994) and Lenzi and Comiti (2003) were collected at laboratory scale, the databases of Falciai and Giacomin (1978) and Missiaga River were collected in the field (i.e., real-scale) and the database of Bormann and Julien (1991) was collected at prototype scale (i.e., with a quasi real-scale laboratory setting). All the studies used can be referred to the 2D experimental layout outlined in Fig. 1, considering the 2D flow-field generated by a wide rectangular GCS as dominant over the 3D character of the flow (Bormann and Julien, 1991; Pagliara et al., 2008). In fact, previous studies show that both prototype and laboratory model flow-fields are not exactly two-dimensional because of sidewall effects. The three-dimensional character of the flow can, however, be considered small if compared to the dominant two-dimensional flow-field generated by wide rectangular grade-control structures (Bormann and Julien, 1991). Finally, D'Agostino and Ferro (2004) combined real and laboratory scale examples in order to improve the reliability of the retrieved formulae for a potential final use on real-scale problems. For this reason, the same strategy will be used here for model analysis and comparison of results.

    4. Modelling strategy

    The selection of the explanatory variables is an important issue in environmental modelling. Usually it is performed based on the physical insight of the analyst and influenced by the availability of measurements/observations. In this particular context, for example, D'Agostino and Ferro (2004) selected five dimensionless groups (those on the left of the line in Eq. (2)), assuming them to be the most important factors based on the physics of the scour phenomenon, validated by a multiple-regression analysis. The reader is referred to the Notation list for the meaning of symbols. Similarly, the present work will assume three further variables (those on the right of the line in Eq. (2)) which can be assumed to be as physically relevant as those in D'Agostino and Ferro (2004), as will be clarified later in the text. In this way a general functional relationship for the equilibrium scour depth model (based on dimensionless groups) can be written as follows:

    $$\frac{s}{z} = F\left( \frac{b}{z},\; A_{50},\; \frac{d_{90}}{d_{50}},\; \frac{b}{B},\; \frac{h_0}{z} \;\middle|\; A_{90},\; \frac{h_0}{h_0 + z},\; \frac{H}{b} \right) \tag{2}$$

    Fig. 3. Decision support framework for model selection based on MO-EPR strategy.

    Moreover, the selection of the explanatory variables can also be influenced by the modelling technique used for defining the relationship between them and the target. For example, the definition of function F in Eq. (2) can be based on a physically-based procedure, such as the incomplete self-similarity theory used by D'Agostino and Ferro (2004), or on the concepts of jet diffusion and particle stability in scour holes applied in the work authored by Bormann and Julien (1991), or on multiple linear regression, which is widely used in environmental applications (Pires et al., 2008). However, the multiple linear regression approach can encounter some shortcomings due to high correlations between input variables. In this case, principal component regression (a method that combines linear regression and principal component analysis) can select input variables that correspond to significant regression parameters (i.e., principal components) by means of some statistical procedures, thus avoiding the inclusion in the models of input variables less correlated with the output variable (Pires et al., 2008).

    In the present work the approach is different, due to the ability of MO-EPR to optimize the parsimony of the returned expressions within the multi-objective paradigm described above. The modelling strategy adopted herein neither fixes a certain set of significant explanatory variables, selected a priori by means of some procedure, nor rigidly defines the structure of the function F in Eq. (1), as explained above. It simply assumes that the user can define a (roughly) wide set of hypotheses (explanatory variables) that deserve to be considered and the main features of the pseudo-polynomial structure, thus leaving the burden of the combinatorial search to the evolutionary multi-objective optimization paradigm described in Appendix A (thus avoiding the usual trial & error approach). At the end of the run, MO-EPR returns a set of optimal model formulations (i.e., the Pareto front of models), which are the best in fitting training data (s/z, in this case) at different levels of parsimony (i.e., number of coefficients and dimensionless groups). Therefore, the selection of inputs and the definition of the relationship between inputs and outputs can be accomplished from different points of view (see Fig. 3): the fitting to test data (evaluated by means of the Coefficient of Determination, CoD; see Eq. (B.1)), the parsimony of the model and the presence of physically relevant groups of inputs, according to the physical insight of the user. This last point is due to the fact that MO-EPR can return symbolic expressions, in spite of its data-driven nature.

    In the particular application reported herein, additional dimensionless groups have been used as explanatory variables with respect to those in D'Agostino and Ferro (2004), see Eq. (2). They are $H/b$, $h_0/(h_0 + z)$ and the parameter $A_{90}$ that, like $A_{50}$, can be expressed as a function of the densimetric Froude number according to the grain size chosen as representative of channel bed gradation, here represented by $d_{50}$ or $d_{90}$ (D'Agostino and Ferro, 2004). The training set used to develop the models is the same as that of D'Agostino and Ferro (2004) (datasets of Veronese, 1937; Mossa, 1998; D'Agostino, 1994 and Bormann and Julien, 1991), as is the test set (datasets of Falciai and Giacomin (1978), Lenzi and Comiti (2003) and Missiaga River), see Table 1.

    From the modelling standpoint, MO-EPR models were selected to have the following type of pseudo-polynomial structure (see Giustolisi et al., 2007):

    $$\hat{Y} = G\left[ a_0 + \sum_{j=1}^{m} a_j \cdot (X_1)^{ES(j,1)} \cdot \ldots \cdot (X_k)^{ES(j,k)} \right] \tag{3}$$

    where the meaning of the symbols is in Appendix C. In particular, the function $G$ was selected to be the exponential function, the maximum number of terms is $m = 3$, the range of exponents from which EPR will define the matrix $ES(j,z)$ is [-2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5, 2] in order to explore linear, quadratic, inverse linear, square root, etc., and the bias $a_0$ was neglected. The assumed objective functions are the three described in Section 2.1. Moreover, during the exploration of the solution space, the search for the model coefficients $a_j$ is constrained to positive values only ($a_j > 0$, i.e., the non-negative least squares method by Lawson and Hanson, 1974), as described in Giustolisi et al. (2007). This is useful for two general reasons: (i) positive constants are generally consistent with the physical meaning of the models; and (ii) the alternation of negative and positive constants, in models developed from data, sometimes serves to fit errors and generates over-fitting to data (i.e., poor generalization performance).
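    The model form of Eq. (3) under these settings (exponential G, no bias, nine candidate exponents from -2 to 2 in steps of 0.5) can be sketched in Python. The function name `predict`, the example coefficient and the exponent row are hypothetical, and the count of coded structures assumes three terms and the eight candidate dimensionless groups of Eq. (2); it is an upper bound, since it also counts permutations of terms and terms with all-zero exponents.

```python
import math

# Candidate exponents as listed in the text (nine values).
EXPONENTS = [-2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5, 2]

def predict(x, coeffs, ES):
    """Eq. (3) with G = exp and a0 = 0:
    Y = exp( sum_j a_j * prod_i x_i ** ES[j][i] )."""
    total = sum(a * math.prod(xi ** e for xi, e in zip(x, exps))
                for a, exps in zip(coeffs, ES))
    return math.exp(total)

# Hypothetical one-term model on two inputs: Y = exp(0.5 * sqrt(x1) / x2)
y_hat = predict([4.0, 2.0], coeffs=[0.5], ES=[[0.5, -1]])
print(y_hat)

# Size of the coded search space for m = 3 terms and k = 8 candidate inputs.
m, k = 3, 8
n_structures = len(EXPONENTS) ** (m * k)
print(n_structures)
```

    The size of this space, even for only three terms, is the reason an evolutionary search is used instead of an exhaustive enumeration of exponent combinations.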

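    In the remainder, the Pareto front models returned by MO-EPR will be analyzed with respect to their main characteristics, and then statistically compared (in order to analyze their generalization performance) with the following two models proposed in D'Agostino and Ferro (2004):

    $$\frac{s}{z} = 0.975 \left( \frac{h_0}{z} \right)^{0.863} \tag{4}$$

    $$\frac{s}{z} = 0.54 \left( \frac{b}{z} \right)^{0.593} \left( \frac{h}{H} \right)^{-0.126} A_{50}^{0.544} \left( \frac{d_{90}}{d_{50}} \right)^{-0.856} \left( \frac{b}{B} \right)^{-0.751} \tag{5}$$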

    where the symbols are reported in Fig. 1 and detailed in Appendix C. The comparison is also performed using unseen data (i.e., the test set) and the following statistical measures: CoD; Mean Relative Deviation (MRD), see Eq. (B.2); Root Mean Squared Error (RMSE), see Eq. (B.3); and Maximum Relative Error (MRE), see Eq. (B.4).

    Table 1
    The significant features of the databases used for EPR applications.

    | Set | Authors | # data | B [m] | b/B | d50 [mm] | d90 [mm] | ρs/ρ | Q/b [m^2/s] | h [m] | s [m] |
    |---|---|---|---|---|---|---|---|---|---|---|
    | Training | Bormann and Julien (1991) | 88 | 0.91 | 1 | [0.3; 0.45] | [1.58; 1.71] | 2.7 | [0.29 ÷ 2.47] | [0.15 ÷ 0.38] | [0.10 ÷ 1.52] |
    | Training | Mossa (1998) | 19 | 0.30 | 1 | 2.0 (c) | 2.0 (c) | 2.65 | [0.0045 ÷ 0.015] | [0.025 ÷ 0.086] | [0.035 ÷ 0.145] |
    | Training | Veronese (1937) | 36 | 0.50 | 1 | [36.2; 21.0; 14.2; 9.1] (b) | [36.2; 21.0; 14.2; 9.1] (b) | 2.7 | [0.012 ÷ 0.083] | [0.05 ÷ 0.25] | [0.05 ÷ 0.22] |
    | Training | D'Agostino (1994) | 114 | 0.50 | [0.3; 0.6] | [4.1; 11.5] | [7.0; 17.6] | 2.7 | [0.0167 ÷ 0.167] | [0.083 ÷ 0.435] | [0.045 ÷ 0.285] |
    | Test | Missiaga River (a) | 13 | [7.5; 10.8] | 1 | 60.0 | 155.0 | - | [0.369 ÷ 0.531] | [0.482 ÷ 0.669] | [0.15 ÷ 0.65] |
    | Test | Falciai and Giacomin (1978) | 29 | [3.3; 25.0] | 1 | [19 ÷ 100] (d) | [19 ÷ 100] (d) | - | [1.2 ÷ 13.4] | [1.128 ÷ 4.996] | [0.4 ÷ 3.5] |
    | Test | Lenzi and Comiti (2003) | 13 | 0.6 | 1 | 8.5 (c) | 8.5 (c) | 2.63 | [0.0067 ÷ 0.029] | [0.030 ÷ 0.077] | [0.016 ÷ 0.053] |

    (a) This database was unpublished before its inclusion in the work of D'Agostino and Ferro (2004).
    (b) These are the representative mean diameters dm for the cohesionless granular material used, which consisted of roughly uniform mixtures.
    (c) The channel bed was set using a mixture having a quasi-uniform grain-size distribution represented by d50 and here assumed equal to d90.
    (d) Bed grain-size distribution was obtained by a weight sieve analysis, giving a value of d50 varying in this range and here assumed equal to d90.

    D. Laucelli, O. Giustolisi / Environmental Modelling & Software 26 (2011) 498–509

    Table 2 reports the 14 EPR models of equilibrium scour depth normalized by z (second column) and the equation numbers (first column). Note that the equations in Table 2 can be referred to the general structure in Eq. (3). In the same table, the values of the three objective functions used for model development (training phase) are reported: CoD (third column); total number of selected dimensionless variables (fourth column); and the number of model constants (fifth column). Finally, the last column of Table 2 reports a measure of generalization performance based on the test data, without splitting the database into the three subsets related to the Falciai and Giacomin (1978), Lenzi and Comiti (2003) and Missiaga River datasets.

    5. Modelling discussion of MO-EPR results

    The aim of this section is to discuss MO-EPR results from the modelling point of view, emphasizing its main features (e.g., ability to provide symbolic expressions, ranking of formulae according to their accuracy vs. parsimony, etc.). As evident from Table 2, the optimal solutions range from a very parsimonious model, the average model in Eq. (6), which is however the least accurate, up to the best fitting model, Eq. (19), which conversely contains the highest number of terms and explanatory variables. Note that the CoD values in the last column of Table 2 have been calculated on the test set, and thus are not involved in the evaluation of the fitness used to get the Pareto front of models. This means that the last column in Table 2 helps the user by showing the generalization ability of the candidate solutions.

    From the models obtained it is evident that MO-EPR helped in discovering further explanatory variables (i.e., dimensionless groups) with respect to the hypotheses in Eq. (2). In fact, Eqs. (13)–(19) can be simplified by introducing the new dimensionless term H/z. Therefore, the number of selected inputs is reduced accordingly (i.e., the new value in brackets in the fourth column of Table 2 contains the number of dimensionless groups in the relevant equation).
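The simplification is a one-line cancellation of the weir width b. As a worked instance, taking the last term of Eq. (14) as it appears in Table 2:

```latex
\sqrt{\frac{H}{b}\cdot\frac{b}{z}} = \sqrt{\frac{H}{z}}
\quad\Rightarrow\quad
0.2054\,\sqrt{\frac{H}{b}\cdot\frac{b}{z}} = 0.2054\,\sqrt{\frac{H}{z}}
```

The two selected groups H/b and b/z therefore act as the single group H/z, which is why the bracketed count is one less than the raw count for each of Eqs. (13)–(19).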

    Therefore, considering the hypotheses in Eq. (2), it is possible to discuss the importance of the dimensionless groups:

    - One equation uses h0/z and the exponent is the square root;
    - Ten equations use h0/(h0 + z) and the exponent is often 2, except for two models;
    - Five equations use A50 and the exponent is the square root. This group is generally used by the most parsimonious models;
    - Six equations use d90/d50 and the exponent is the square root. This variable is generally used by the least parsimonious models;
    - Four equations use b/B, with different exponents. In fact, the value of this group in the database is usually equal to one, except for a few cases;
    - Four equations use b/z and the exponent is the square root. Moreover, this group is present another 7 times when the new dimensionless group H/z is created, always appearing under the square root.

    The dimensionless groups H/b and A90 are not used as such to model the data. Actually, H/b is present in all the models in which the new dimensionless group H/z has been generated. Therefore, the MO-EPR suggestion can be that the main dimensionless groups for equilibrium scour depth modelling are: h0/(h0 + z), H/z, d90/d50, A50, b/B and b/z. This evidence is fairly consistent with the literature (Mason and Arumugam, 1985; Bormann and Julien, 1991; Hoffmans, 1998; Ettema et al., 2000; D'Agostino and Ferro, 2004; Pagliara, 2007).

    In particular, considering the work of D'Agostino and Ferro (2004), similar exponents can be found for the dimensionless groups b/z, A50, h0/z and partially for b/B, thus showing basically the same correlations with s/z. The dimensionless group d90/d50, considered as a surrogate of the non-uniformity parameter in Pagliara (2007), is present in different MO-EPR models among the least parsimonious. This fact is consistent with the study of Pagliara (2007), who found that the equilibrium scour depth is an increasing function of the non-uniformity parameter. In Mason and Arumugam (1985) and Bormann and Julien (1991) a direct dependence (close to square root) of equilibrium scour depth on the hydraulic jump H was found. EPR found the same direct dependence, selecting the square root among the candidate exponents for H/z. Moreover, EPR models show a square root dependence of equilibrium scour depth on the densimetric Froude number (represented by A50). This fact is consistent with the findings of Ettema et al. (2000) and Pagliara (2007), for example. Moreover, the most selected input, h0/(h0 + z), can be inversely related to the jet deflection angle β0 (Bormann and Julien, 1991). The equilibrium scour depth increases with 1/β0, as found among others by Hoffmans (1998) and Pagliara et al. (2008). MO-EPR results are consistent with the previous finding because the group h0/(h0 + z) always appears with positive exponents.

    Table 2
    Equations contained in the Pareto front returned by EPR.

    | Eq. | Equation on the Pareto front | CoD (training set) | # of dimensionless groups | # of model constants | CoD (test set) |
    |---|---|---|---|---|---|
    | (6) | $s/z = e^{0.7052}$ | -0.065 | 0 | 1 | -0.2104 |
    | (7) | $s/z = e^{0.6437\sqrt{h_0/z}}$ | 0.806 | 1 | 1 | 0.5288 |
    | (8) | $s/z = e^{0.6250\sqrt{\frac{h_0/z}{b/B}}}$ | 0.831 | 2 | 1 | 0.6191 |
    | (9) | $s/z = e^{0.9555\,\frac{h_0}{h_0+z} + 0.1511\sqrt{A_{50}}}$ | 0.865 | 2 | 2 | 0.6016 |
    | (10) | $s/z = e^{1.7606\left(\frac{h_0}{h_0+z}\right)^{2} + 0.0067\,A_{50} + 0.1778}$ | 0.874 | 2 | 3 | 0.5545 |
    | (11) | $s/z = e^{0.4787\sqrt{\frac{h_0/z}{b/B}} + 0.0240\left(\frac{d_{90}}{d_{50}}\right)^{2}}$ | 0.870 | 3 | 2 | 0.7517 |
    | (12) | $s/z = e^{0.7646\left(\frac{h_0}{h_0+z}\right)^{2} + 0.1047\sqrt{A_{50}} + 0.2168\sqrt{b/z}}$ | 0.880 | 3 | 3 | -4.6132 |
    | (13) | $s/z = e^{0.4250\,\frac{d_{90}}{d_{50}}\left(\frac{h_0}{h_0+z}\right)^{2} + 0.3020\sqrt{\frac{H}{b}\cdot\frac{b}{z}}}$ | 0.875 | 4 (3) | 2 | 0.2284 |
    | (14) | $s/z = e^{1.2355\left(\frac{h_0}{h_0+z}\right)^{2} + 0.0925\sqrt{A_{50}} + 0.2054\sqrt{\frac{H}{b}\cdot\frac{b}{z}}}$ | 0.896 | 4 (3) | 3 | 0.5031 |
    | (15) | $s/z = e^{0.3062\sqrt{\frac{H}{b}\cdot\frac{b}{z}} + 0.4229\left(\frac{b}{B}\right)^{2}\left(\frac{h_0}{h_0+z}\right)^{2}\frac{d_{90}}{d_{50}}}$ | 0.875 | 5 (4) | 2 | 0.2368 |
    | (16) | $s/z = e^{1.3168\left(\frac{h_0}{h_0+z}\right)^{2} + 0.0368\sqrt{A_{50}\cdot\frac{d_{90}}{d_{50}}} + 0.2322\sqrt{\frac{H}{b}\cdot\frac{b}{z}}}$ | 0.900 | 5 (4) | 3 | 0.7520 |
    | (17) | $s/z = e^{0.2061\sqrt{\frac{H}{b}\cdot\frac{b}{z}} + 1.0538\left(\frac{h_0}{h_0+z}\right)^{1.5} + 0.1640\left(\frac{h_0}{h_0+z}\right)^{2}\sqrt{\frac{d_{90}}{d_{50}}\cdot\frac{b}{z}}}$ | 0.903 | 6 (5) | 3 | 0.1336 |
    | (18) | $s/z = e^{0.1796\sqrt{\frac{H}{b}\cdot\frac{b}{z}\cdot\frac{b}{B}} + 1.3932\left(\frac{h_0}{h_0+z}\right)^{2} + 0.0617\left(\frac{h_0}{h_0+z}\right)^{2}\frac{d_{90}}{d_{50}}\sqrt{\frac{b}{z}}}$ | 0.908 | 7 (6) | 3 | 0.7135 |
    | (19) | $s/z = e^{0.1777\sqrt{\frac{H}{b}\cdot\frac{b}{z}\cdot\frac{b}{B}} + 1.3969\left(\frac{h_0}{h_0+z}\right)^{2} + 0.0617\left(\frac{h_0}{h_0+z}\right)^{2}\frac{d_{90}}{d_{50}}\cdot\frac{b}{B}\sqrt{\frac{b}{z}}}$ | 0.908 | 8 (7) | 3 | 0.7134 |

    Finally, considering the generalization performance, the worst model was Eq. (12) in Table 2 (the average model in Eq. (6) cannot be expected to perform well). The models in Eqs. (9), (10) and (14), which have a structure similar to that of Eq. (12), indicate that the group b/z in Eq. (12) is not informative for equilibrium scour depth prediction. In fact, it was generally used to generate the new variable H/z, as reported above.

    6. Statistical comparison using literature formulae

    To evaluate the generalization performance of EPR, the models in Eqs. (8) and (16) have been selected, see Table 2. The model in Eq. (8) is very parsimonious and for this reason it is expected to have a reliable generalization performance on equilibrium scour prediction at different scales, as demonstrated by D'Agostino and Ferro (2004) for the model in Eq. (4). The model in Eq. (16) has a generalization performance similar to that of the models in Eqs. (18) and (19), see the last column in Table 2, but it is more parsimonious. Therefore, it has been selected for comparison with the most complex model of D'Agostino and Ferro (2004), see Eq. (5). It is worth noting that a more complex model is generally less reliable with regard to generalization performance, because the complexity could be related to the specific database used for model development, as found for Eq. (5) in D'Agostino and Ferro (2004). Table 3 reports the statistical indicators of generalization performance (see Appendix B). They are separately evaluated on the databases of Falciai and Giacomin (1978), Lenzi and Comiti (2003) and Missiaga River, here used as the test set (i.e., unseen data).

    Firstly, it is important to compare the ranges of validity of Eqs. (4), (5), (8) and (16) with those characterizing the databases in the test set. In fact, Table 1 shows that the Lenzi and Comiti (2003) database falls in the same intervals, while the other two are basically out-of-scale with respect to the training datasets (for Missiaga River only the equilibrium scour depth and the unit discharge are in the range). Therefore, the Missiaga River and Falciai and Giacomin (1978) datasets are useful to test the extrapolation performance of the models to other ranges, while the Lenzi and Comiti (2003) database is generally useful to test the interpolation performance.
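The interpolation versus extrapolation distinction can be checked mechanically against Table 1. The sketch below is an illustration under stated assumptions: the dictionary and function names are hypothetical, and the training envelope is simply the minimum and maximum of each variable over the four training datasets as read off Table 1, which ignores any gaps between datasets inside that envelope.

```python
# Overall training-set envelope per variable, read off Table 1
# (minimum and maximum over the four training datasets).
TRAINING_RANGES = {
    "B": (0.30, 0.91),    # channel width [m]
    "q": (0.0045, 2.47),  # unit discharge Q/b [m^2/s]
    "h": (0.025, 0.435),  # tail water depth [m]
    "s": (0.035, 1.52),   # equilibrium scour depth [m]
}

# Missiaga River test-set ranges, also from Table 1.
MISSIAGA_RANGES = {
    "B": (7.5, 10.8),
    "q": (0.369, 0.531),
    "h": (0.482, 0.669),
    "s": (0.15, 0.65),
}


def within(test_range, train_range):
    """True if the whole test interval lies inside the training interval."""
    low, high = test_range
    return train_range[0] <= low and high <= train_range[1]


def coverage(test_ranges, train_ranges):
    """Map each variable name to whether its test range is covered."""
    return {name: within(test_ranges[name], train_ranges[name])
            for name in test_ranges}


missiaga_coverage = coverage(MISSIAGA_RANGES, TRAINING_RANGES)
```

For the Missiaga River data this flags only the unit discharge and the scour depth as covered by the training envelope, in line with the remark in the text.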

    Table 3 demonstrates that the generalization performance of the MO-EPR models is more reliable than that obtained by numerical regression. In particular, both MO-EPR models fit the Lenzi and Comiti (2003) dataset better than their corresponding models in Eqs. (4) and (5) (this is true according to all the indicators). The most parsimonious EPR model, Eq. (8), performs slightly better than the model in Eq. (16). Therefore, a parsimonious model is sufficient for predicting data with the same accuracy as Lenzi and Comiti (2003). All the analyzed models show comparable statistical performance on the Missiaga River dataset, with the most parsimonious model of D'Agostino and Ferro (2004), Eq. (4), showing accuracy indicators slightly better than the others. This is the only dataset on which Eq. (4) has a reliable performance.

    MO-EPR has a reliable generalization performance on the Falciai and Giacomin (1978) dataset, whose variables are definitely out of the training ranges in Table 1. In the same case, both the models in Eqs. (4) and (5) fail. Fig. 4 gives a pictorial representation of what is reported in Table 3 for the test set.

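    Table 3
    Accuracy measures evaluated for Eqs. (4), (5), (8) and (16) on the test set.

    | Dataset | Measure | D'Agostino and Ferro (2004), Eq. (4) | D'Agostino and Ferro (2004), Eq. (5) | Eq. (8) | Eq. (16) |
    |---|---|---|---|---|---|
    | Missiaga River | CoD | 0.574 | 0.307 | 0.479 | 0.367 |
    | Missiaga River | MRD | 0.346 | 0.602 | 0.405 | 0.394 |
    | Missiaga River | RMSE | 0.200 | 0.255 | 0.221 | 0.244 |
    | Missiaga River | MRE | 1.579 | 3.110 | 1.672 | 1.359 |
    | Falciai and Giacomin (1978) | CoD | 0.098 | -31.716 | 0.575 | 0.756 |
    | Falciai and Giacomin (1978) | MRD | 0.735 | 3.186 | 0.692 | 0.578 |
    | Falciai and Giacomin (1978) | RMSE | 0.723 | 4.351 | 0.496 | 0.375 |
    | Falciai and Giacomin (1978) | MRE | 2.891 | 8.537 | 2.993 | 2.491 |
    | Lenzi and Comiti (2003) | CoD | 0.488 | -33.388 | 0.861 | 0.811 |
    | Lenzi and Comiti (2003) | MRD | 0.212 | 1.910 | 0.102 | 0.150 |
    | Lenzi and Comiti (2003) | RMSE | 0.281 | 2.304 | 0.147 | 0.171 |
    | Lenzi and Comiti (2003) | MRE | 0.717 | 3.781 | 0.417 | 0.360 |

    Fig. 4. Comparison among Eqs. (4), (5), (8) and (16) on the test set.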


    7. Summary and conclusions

    The paper presents the application of Multi-Objective Evolutionary Polynomial Regression, a new modelling technique that combines numerical regression and evolutionary computing, to the modelling of local scouring downstream of GCS. MO-EPR performs an evolutionary-based multi-objective optimization in the space of solutions, using three conflicting objective functions describing accuracy and parsimony of the candidate models. MO-EPR returns a set of optimal data models (Pareto front) of different accuracy and complexity, all showing physical consistency with the literature. MO-EPR proved to be able to discover a new explanatory variable with respect to the starting hypotheses in Eq. (2), due to the explicit structure of the models.

    Fig. 5. Flow-chart of OPTIMOGA search strategy.

    After a discussion on the modelling features of the optimal solutions obtained by the MO-EPR paradigm, two models in the Pareto front, Eqs. (8) and (16) in Table 2, have been selected and compared with two equations from the literature (D'Agostino and Ferro, 2004). Such literature equations were calibrated on the same data by a forward stepwise regression. MO-EPR models proved more reliable in predicting unseen data for both extrapolation (i.e., on real-scale data) and interpolation (i.e., on laboratory scale data).

    This can be basically related to: (i) the global search performed by means of the GA paradigm within the space of pseudo-polynomial structures; (ii) the multi-objective optimization scheme that, besides maximizing accuracy (as done by all the data-driven modelling techniques, from Linear Regression to Artificial Neural Networks), also considers the optimization of model structure (i.e., the number of explanatory variables and the number of terms); and (iii) the symbolic nature of the returned models, which can also be selected according to their consistency with the physical knowledge of the user.

    Therefore, considering the particular application of EPR reported herein, the authors' suggestion is to use the model of Eq. (8) (a compact formula depending only on two variables, h0/z and b/B, which are often easy to obtain) when there is a lack of field data. On the contrary, when more data are available (e.g., on sediment gradation, etc.) the model of Eq. (16) can be used, even if it is always recommended to check results using the most parsimonious model. In addition, the authors' advice is to employ these formulations inside the ranges for which they have been developed, considering the main dimensional variables (e.g., s, B, d50, d90, q, h) summarized in Table 1, and to use them with care outside those ranges, although MO-EPR shows feasible performance also in out-of-scale case studies.

    Appendix A. Introduction to OPTIMOGA

    MO-EPR (Giustolisi and Savic, 2009) implements an evolutionary multi-objective search and optimization algorithm named OPTIMOGA (Giustolisi et al., 2004). Fig. 5 shows all the different phases of the OPTIMOGA search strategy as a flow-chart.

    The OPTIMOGA algorithm favours an explorative strategy in the first phase of the search and an exploitative approach once the Pareto front is achieved. This is accomplished by using two archives: a static archive (Pold), which stores those individuals no longer involved in the evolution, and a dynamic archive (Parch), which stores the individuals that are randomly involved in the evolution. The archives are both functional to the exploitation phase of the evolution, when the population is entirely constituted of non-dominated solutions and is quickly growing, while they are not involved in the evolution as long as the dimension of the Pareto front of solutions (PFdim) is less than Pmax for the dynamic archive and Pstop for the static archive. Moreover, OPTIMOGA works with three populations (minimum Pmin, maximum Pmax, stopping Pstop), whose sizes are user defined. The initial population Pini has the same size as Pmin, and its individuals are generated by a Latin Hypercube sampling technique (McKay et al., 1979). During the early exploration phase, the population in evolution (Pact) has size equal to Pmin and it is not constituted of rank-1 solutions; in particular, when the size of the Pareto front reaches the minimum size (Pmin), this means that the exploration has reached a feasible area of the search space. Now OPTIMOGA evolves the rank-1 solutions only and the exploration phase is combined with the exploitation one. Once the exploration phase is finished, the population size (now of rank-1 solutions, i.e., the Pareto front) increases up to the maximum size (Pmax), and the exploitation phase is reached.
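The Latin Hypercube initialization mentioned above can be illustrated with a short sketch. This is a generic textbook version of the technique written for the unit hypercube, not the OPTIMOGA code: the function name is hypothetical, and the mapping from the unit interval to the actual gene values (for example, indices into the list of candidate exponents) is left out because the text does not specify it.

```python
import random


def latin_hypercube(n_samples, n_dims, rng=None):
    """Return n_samples points in [0, 1)^n_dims, one per stratum per dimension."""
    rng = rng or random.Random()
    columns = []
    for _ in range(n_dims):
        # One random point inside each of the n_samples equal-width strata.
        points = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so that strata are paired at random across dimensions.
        rng.shuffle(points)
        columns.append(points)
    # Transpose: one row per individual of the initial population.
    return [list(row) for row in zip(*columns)]


population = latin_hypercube(n_samples=5, n_dims=3, rng=random.Random(0))
```

Compared with plain random sampling, every stratum of every variable is represented exactly once, which suits a small initial population of size Pmin.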

    At this point, the dynamic archive starts working and growing in size generation by generation, storing the non-dominated solutions. Note that the dynamic archive is useful for increasing the number of available non-dominated solutions without drastically decreasing the computational performance. In fact, Parch is involved in the evolution, i.e., solutions from the archive are randomly used to support the exploitation of the Pareto front.

    Finally, the stopping population (Pstop), usually equal to or double the maximum one, is useful for stopping the evolution of the older individuals which, from a probabilistic point of view, have been involved more times in the reproduction. These individuals are not killed but frozen in the static archive. They are used at the end of the evolution for the constitution of the final population (Pfinal), which is composed of the best solutions from those in the dynamic archive (Parch), those in evolution (Pact) and those in the static archive (Pold).

    The genetic operators used in the first stage of the search (exploration) are multi-point crossover (probability 40%), with a number of potential swapping points equal to the number of chromosomes, and global mutation (probability 10%), in which the mutated value is in the range of definition of genes. In addition, a sort of local mutation is implemented during the exploitative phase, when it is not advisable to scatter the solutions in the objective space. In this case, the mutated value of a gene is ±1 of the original one (always satisfying the range of definition).
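The difference between the two mutation operators can be sketched as follows. Several points here are assumptions made for illustration only: genes are taken to be integers (for instance, indices into the list of candidate exponents), the mutation probability is applied gene by gene, the local mutation is read as a shift of plus or minus one, the same default rate is reused for the local operator, and values leaving the range of definition are clipped back to its bounds. The function names are hypothetical, and crossover is not shown.

```python
import random


def global_mutation(genes, low, high, rate=0.10, rng=random):
    """Replace each gene, with probability `rate`, by any integer in [low, high]."""
    return [rng.randint(low, high) if rng.random() < rate else g for g in genes]


def local_mutation(genes, low, high, rate=0.10, rng=random):
    """Shift each gene, with probability `rate`, by +1 or -1 within [low, high]."""
    mutated = []
    for g in genes:
        if rng.random() < rate:
            # Clipping keeps the gene inside its range of definition.
            g = min(high, max(low, g + rng.choice((-1, 1))))
        mutated.append(g)
    return mutated


# Nine candidate exponents would give gene indices 0..8 (illustrative encoding).
parent = [0, 4, 8]
explored = global_mutation(parent, 0, 8, rng=random.Random(1))
refined = local_mutation(parent, 0, 8, rate=1.0, rng=random.Random(1))
```

The global operator can move a gene anywhere in its range, which scatters solutions during exploration, while the local operator only reaches neighbouring values and so keeps offspring close to the current Pareto front.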

    As regards diversity preservation, different criteria have been developed, basically requiring the estimation of a distance measure; for more details see Silverman (1986). By contrast, a key idea in OPTIMOGA consists in avoiding the estimation of a distance measure, shifting the problem of diversity preservation into the fitness assignment. In fact, in OPTIMOGA diversity preservation is generally achieved by pursuing the criterion that the greater the density of individuals in the neighbourhood of an individual, the lower its probability of being selected.

    Therefore, the selection of the mating pool (i.e., the fitness assignment) is performed by rank (Fonseca and Fleming, 1993). This phase has an important role in the OPTIMOGA algorithm because it helps in diversity preservation during the explorative phase and increases the computational efficiency in the evaluation of the individuals' membership. The rank/fitness of the solution S at the generation t is assigned as

    $$\mathrm{rank}(S, t) = 1 + p(t)$$

    where p(t) is the number of individuals that dominate S. Note that a non-dominated individual has p(t) = 0, thus its rank is one, in agreement with the common practice of assigning rank-1 to the non-dominated solutions (Kollat and Reed, 2006). This kind of rank assignment works according to the position of each solution/individual with respect to the other solutions. Thus, when a new individual is added to an already ranked population, its rank is obtained by evaluating its position with respect to the population. In this way, the fitness/rank evaluation is sped up, especially when dealing with large populations (Giustolisi et al., 2004).
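The rank assignment can be written down directly from its definition. The sketch below assumes that every objective is expressed as a quantity to be minimised (so accuracy enters as 1 - CoD); the function names are hypothetical, and the example mixes three training-set rows of Table 2 with one made-up dominated point.

```python
def dominates(f_a, f_b):
    """True if objective vector f_a Pareto-dominates f_b (all minimised)."""
    no_worse = all(a <= b for a, b in zip(f_a, f_b))
    strictly_better = any(a < b for a, b in zip(f_a, f_b))
    return no_worse and strictly_better


def rank_population(objectives):
    """rank(S, t) = 1 + p(t), with p(t) the number of individuals dominating S."""
    return [1 + sum(dominates(other, f) for other in objectives)
            for f in objectives]


# (1 - CoD, number of dimensionless groups, number of constants)
candidates = [
    (1 - 0.806, 1, 1),  # Eq. (7), training values from Table 2
    (1 - 0.831, 2, 1),  # Eq. (8)
    (1 - 0.865, 2, 2),  # Eq. (9)
    (0.200, 2, 2),      # hypothetical candidate, dominated by the three above
]
ranks = rank_population(candidates)
```

The three Pareto front models receive rank 1 because none dominates another, while the hypothetical candidate is dominated by all three and receives rank 4.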

    Finally, it should be emphasized that OPTIMOGA can deal with differently codified numerical problems, encompassing binary, integer and real variables. Moreover, it is characterized by few parameters; this means that the user is not required to specifically tune the GA's settings, while the population parameters (minimum, maximum, stopping) are chosen considering the purpose and the predicted amount of solutions of the problem. However, it has been experienced during the development of OPTIMOGA that the algorithm is not sensitive to the variation of its settings. For instance, the same initial population size, together with the same crossover and mutation rates, proved to be suited to different problems. In this scenario, the key issue is avoiding as far as possible the introduction of deterministic procedures in a stochastic search, which is driven by the fitness. The effectiveness of the MOGA algorithm used has been assessed on six widely used continuous test problems presented in Deb et al. (2002), see Giustolisi et al. (2004) for details.

    Appendix B. Used accuracy measures.

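    Coefficient of determination, used by Giustolisi and Savic (2006) for EPR:

    $$\mathrm{CoD} = 1 - \frac{\sum_{N} \left(\hat{y} - y_{exp}\right)^{2}}{\sum_{N} \left(y_{exp} - \mathrm{avg}(y_{exp})\right)^{2}} \qquad (B.1)$$

    where N is the number of data; avg(yexp) is the average value of observations; ŷ is the value predicted by the model and yexp is the corresponding observation.

    Mean Relative Deviation, used by D'Agostino and Ferro (2004):

    $$\mathrm{MRD} = \frac{1}{N} \sum_{N} \frac{\left|y_{exp} - \hat{y}\right|}{y_{exp}} \qquad (B.2)$$

    Root Mean Square Error, as used in Azmathullah et al. (2005):

    $$\mathrm{RMSE} = \left[\frac{\sum_{N} \left(y_{exp} - \hat{y}\right)^{2}}{N}\right]^{1/2} \qquad (B.3)$$

    Maximum Relative Error, defined as:

    $$\mathrm{MRE} = \max_{N} \left(\frac{\left|y_{exp} - \hat{y}\right|}{y_{exp}}\right) \qquad (B.4)$$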
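The four measures translate directly into code. The following is a minimal sketch in plain Python; the function names and the three-point example are illustrative and do not come from the datasets. It assumes strictly positive observations, as required by the relative measures.

```python
import math


def cod(y_exp, y_hat):
    """Coefficient of determination, Eq. (B.1)."""
    mean = sum(y_exp) / len(y_exp)
    sse = sum((p - o) ** 2 for o, p in zip(y_exp, y_hat))
    sst = sum((o - mean) ** 2 for o in y_exp)
    return 1.0 - sse / sst


def mrd(y_exp, y_hat):
    """Mean Relative Deviation, Eq. (B.2)."""
    return sum(abs(o - p) / o for o, p in zip(y_exp, y_hat)) / len(y_exp)


def rmse(y_exp, y_hat):
    """Root Mean Square Error, Eq. (B.3)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(y_exp, y_hat)) / len(y_exp))


def mre(y_exp, y_hat):
    """Maximum Relative Error, Eq. (B.4)."""
    return max(abs(o - p) / o for o, p in zip(y_exp, y_hat))


observed = [1.0, 2.0, 4.0]   # illustrative values only
predicted = [1.0, 2.5, 3.0]
scores = {name: fn(observed, predicted)
          for name, fn in (("CoD", cod), ("MRD", mrd), ("RMSE", rmse), ("MRE", mre))}
```

Note that CoD as defined in Eq. (B.1) is not bounded below: a model that predicts worse than the average of the observations yields a negative value, and a constant prediction can never score above zero.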

    Appendix C. Notation list

    The following symbols are used in this note:

    Axx: parameter that represents the behaviour of bed particles to the impinging water, related to a dxx;
    aj, a0: constant coefficients in the general EPR formulation;
    avg(yexp): average value of observations;
    B: channel width;
    b: weir width;
    dm: median bed particle size;
    dxx: bed grain size for which xx% of sampled particles are finer;
    ES(j,k): vector of exponents linked to the jth term of the EPR expression, for the kth explanatory variable;
    H: difference in height from water level upstream of the weir to the tail water level;
    g: gravitational acceleration;
    h: tail water depth or water depth above the non-eroded bed level;
    h0: total head above the weir crest;
    m: the length (number of terms) of the polynomially structured EPR expression;
    N: total number of samples over the available dataset;
    PFdim: dimension of the Pareto front of solutions (in OPTIMOGA);
    Pact: population of individuals in evolution (in OPTIMOGA);
    Parch: dynamic archive of non-dominated solutions (used by OPTIMOGA);
    Pfinal: final population of individuals in OPTIMOGA;
    Pini: initial population of individuals (in OPTIMOGA);
    Pmax: evolving population of individuals with maximum size (in OPTIMOGA);
    Pmin: minimum size of the evolving population of individuals (in OPTIMOGA);
    Pold: static archive of older non-dominated solutions (used by OPTIMOGA);
    Pstop: stopping population of individuals (in OPTIMOGA);
    p(t): the number of individuals by which the solution S is dominated;
    Q: water discharge;
    q: water discharge per unit weir width (Q/b);
    rank(S,t): rank/fitness of the solution S at the generation t (in OPTIMOGA);
    S: generic solution in the space of m-term EPR formulae;
    s: equilibrium scour depth;
    Xk: kth input column vector;
    Ŷ: the vector of model predictions;
    ŷ: estimated output of the system/process;
    yexp: vector of the experimental observations;
    z: weir height over the upstream bed;
    β0: jet deflection angle (jet angle near bed hole);
    λ: downstream face angle of the grade-control structure;
    ρ: mass density of water;
    ρs: mass density of sediments;
    G(·): invertible function defined by the user for the EPR modelling process.

    References

    Azamathulla, H.Md., Ab Ghani, A., Zakaria, N.A., 2009. ANFIS based approach for predicting maximum scour location of spillway. Water Manage. 162, 399–407.
    Azamathulla, H.Md., Ab Ghani, A., Zakaria, N.A., Guven, A., 2010. Genetic programming to predict bridge pier scour. J. Hydraul. Eng. 136, 165–169.
    Azmathullah, H.Md., Deo, M.C., Deolalikar, P.B., 2005. Neural networks for estimation of scour downstream of a ski-jump bucket. J. Hydraul. Eng. 131, 898–908.
    Babovic, V., Keijzer, M., 2000. Genetic programming as a model induction engine. J. Hydroinformatics 2, 35–60.
    Bormann, N.E., Julien, P.Y., 1991. Scour downstream of grade-control structures. J. Hydraul. Eng. 117, 579–594.
    Crout, N.M.J., Tarsitano, D., Wood, A.T., 2009. Is my model too complex? Evaluating model formulation using model reduction. Environ. Model. Software 24, 1–7.
    D'Agostino, V., 1994. Indagine sullo scavo a valle di opere trasversali mediante modello fisico a fondo mobile. Energ. Elettr. 71, 37–51 (in Italian).
    D'Agostino, V., Ferro, V., 2004. Scour on alluvial bed downstream of grade-control structures. J. Hydraul. Eng. 130, 24–37.
    Dargahi, B., 2003. Scour development downstream of a spillway. J. Hydraul. Res. 41, 417–426.
    Deb, K., Thiele, L., Zitzler, E., 2002. Scalable multi-objective optimization test problems. In: Proceedings of the IEEE Congress on Evolutionary Computation (CEC 2002). IEEE Press, Piscataway, New Jersey, USA.
    Draper, N.R., Smith, H., 1998. Applied Regression Analysis. John Wiley & Sons, New York.
    EPR, 2009. Available at: .
    Ettema, R., Melville, B.W., Barkdoll, B., 2000. Scale effect in pier-scour experiments. J. Hydraul. Eng. 124, 639–642.
    Falciai, M., Giacomin, A., 1978. Indagine sui gorghi che si formano a valle delle traverse torrentizie. Italia Forestale Mont. 23, 111–123 (in Italian).
    Farhoudi, J., Hosseini, S.M., Sedghi-Asl, M., 2010. Application of neuro-fuzzy model to estimate the characteristics of local scour downstream of stilling basins. J. Hydroinformatics 12, 201–211.
    Fonseca, C.M., Fleming, P.J., 1993. Genetic algorithms for multi-objective optimization: formulation, discussion and generalization. In: Proceedings of the 5th International Conference on Genetic Algorithms. Morgan Kaufmann, San Mateo, California, USA.
    Giustolisi, O., Savic, D.A., 2006. A symbolic data-driven technique based on evolutionary polynomial regression. J. Hydroinformatics 8, 207–222.
    Giustolisi, O., Savic, D.A., 2009. Advances in data-driven analyses and modelling using EPR-MOGA. J. Hydroinformatics 11, 225–236.
    Giustolisi, O., Doglioni, A., Savic, D.A., Laucelli, D., 2004. A Proposal for an Effective Multi-objective Non-dominated Genetic Algorithm: The OPTimised Multi-Objective Genetic Algorithm: OPTIMOGA. Centre for Water Systems, Report 7/2004. University of Exeter, UK.
    Giustolisi, O., Doglioni, A., Savic, D.A., Webb, B., 2007. A multi-model approach to analysis of environmental phenomena. Environ. Model. Software 22, 674–682.
    Goldberg, D.E., 1989. Genetic Algorithms in Search, Optimization and Machine Learning. Addison Wesley, London, UK.
    Guven, A., Gunal, M., 2008. Prediction of scour downstream of grade-control structures using neural networks. J. Hydraul. Eng. 134, 1656–1660.
    Haykin, S., 1999. Neural Networks: a Comprehensive Foundation, second ed. Prentice-Hall Inc., Englewood Cliffs, New Jersey, USA.
    Hoffmans, G.J.C.M., 1998. Jet scour in equilibrium phase. J. Hydraul. Eng. 124, 430–437.
    Kollat, J.B., Reed, P.M., 2006. Comparing state-of-the-art evolutionary multi-objective algorithms for long-term groundwater monitoring design. Adv. Water Resour. 29, 792–807.
    Lawson, C.L., Hanson, R.J., 1974. Solving Least Squares Problems. Prentice-Hall. Chap. 23, 161.
    Lenzi, M.A., Comiti, F., 2003. Local scouring and morphological adjustments in steep channels with check-dam sequences. Geomorphology 55, 97–109.
    Liriano, S.L., Day, R.A., 2001. Prediction of scour depth at culvert outlets using neural networks. J. Hydroinformatics 3, 231–238.
    Mason, P.J., Arumugam, K., 1985. Free jet scour below dams and flip buckets. J. Hydraul. Eng. 111, 220–235.
    McKay, M.D., Conover, W.J., Beckman, R.J., 1979. A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 211, 239–245.
    Mossa, M., 1998. Experimental study on the scour downstream of grade-control structures. In: Proc. 26th Convegno di Idraulica e Costruzioni Idrauliche. CSDU, Catania, Italy, pp. 581–594.
    Pagliara, S., 2007. Influence of sediment gradation on scour downstream of block ramps. J. Hydraul. Eng. 133, 1241–1248.
    Pagliara, S., Hager, W.H., Unger, J., 2008. Temporal evolution of plunge pool scour. J. Hydraul. Eng. 134, 1630–1638.
    Pareto, V., 1896. Cours D'Economie Politique, vols. I and II. Rouge and Cic, Lausanne, Switzerland.
    Pires, J.C.M., Martins, F.G., Sousa, S.I.V., Alvim-Ferraz, M.C.M., Pereira, M.C., 2008. Selection and validation of parameters in multiple linear and principal component regressions. Environ. Model. Software 23, 50–55.
    Reed, P., Kollat, J.B., Devireddy, V.K., 2007. Using interactive archives in evolutionary multi-objective optimization: a case study for long-term groundwater monitoring design. Environ. Model. Software 22, 683–692.
    Silverman, B.W., 1986. Density Estimation for Statistics and Data Analysis. Chapman and Hall, London, UK.
    Uyumaz, A., Altunkaynak, A., Özger, M., 2006. Fuzzy logic model for equilibrium scour downstream of a dam's vertical gate. J. Hydraul. Eng. 132, 1069–1075.
    Van Veldhuizen, D.A., Lamont, G.B., 2000. Multi-objective evolutionary algorithms: analyzing the state-of-the-art. Evol. Comput. 8, 125–144.
    Veronese, A., 1937. Erosion of a Bed Downstream from an Outlet. Colorado A&M College, Fort Collins, CO.
    Young, P., Parkinson, S., Lees, M., 1996. Simplicity out of complexity in environmental modelling: Occam's razor revisited. J. Appl. Stat. 2, 165–210.
