

Single- and Multi-Objective Genetic Programming: New Bounds for Weighted ORDER and MAJORITY

Anh Nguyen
Evolutionary Computation Group, School of Computer Science
The University of Adelaide, Adelaide, SA 5005, Australia

Tommaso Urli
Dipartimento di Ingegneria Elettrica, Gestionale e Meccanica
Università degli Studi di Udine, 33100 Udine, Italy

Markus Wagner
Evolutionary Computation Group, School of Computer Science
The University of Adelaide, Adelaide, SA 5005, Australia

    ABSTRACT

We consolidate the existing computational complexity analysis of genetic programming (GP) by bringing together sound theoretical proofs and empirical analysis. In particular, we address computational complexity issues arising when coupling algorithms using variable length representation, such as GP itself, with different bloat-control techniques. In order to accomplish this, we first introduce several novel upper bounds for two single- and multi-objective GP algorithms on the generalised Weighted ORDER and MAJORITY problems. To obtain these, we employ well-established computational complexity analysis techniques such as fitness-based partitions and, for the first time, additive and multiplicative drift.

The bounds we identify depend on two measures, the maximum tree size and the maximum population size, which arise during the optimization run and have a key relevance in determining the runtime of the studied GP algorithms. In order to understand the impact of these measures on a typical run, we study their magnitude experimentally, and we discuss the obtained findings.

Categories and Subject Descriptors

F.2 [Theory of Computation]: Analysis of Algorithms and Problem Complexity

    General Terms

    Theory, Algorithms, Performance

    Keywords

Genetic Programming, Multi-objective Optimization, Theory, Runtime Analysis

1. INTRODUCTION

In the last decade, genetic programming (GP) has found various applications (see Poli et al., 2008) in a number of domains.


As other paradigms based on variable length representation, however, GP can be subject to bloating. Bloating occurs when a solution's growth in complexity does not correspond to a growth in quality, and it causes the optimization process to diverge and slow down. Since bloat control is a key factor for the efficient functioning of GP, its impact on the computational complexity has already been studied for simple problems, in Durrett et al. (2011) and Neumann (2012).

The algorithms that have been considered are a stochastic hill-climber called (1+1)-GP, and a population-based multi-objective genetic programming algorithm called SMO-GP; the latter considers the trade-offs between the complexity C of a solution and its fitness with respect to a problem F. These algorithms have been analysed on problems with isolated program semantics taken from Goldberg and O'Reilly (1998), namely ORDER and MAJORITY, which can be seen as the analogue of the linear pseudo-Boolean functions of Droste et al. (2002) that are known from the computational complexity analysis of evolutionary algorithms working with fixed length binary representations. Both problems are simple enough to be analysed thoroughly, and they represent different aspects of problems solved through genetic programming, that is, including components in the correct order (ORDER) and including the correct set of components in a solution (MAJORITY).

Additional recent computational complexity results are those of Kötzing et al. (2012) on the MAX problem, and of Wagner and Neumann (2012) on the SORTING problem.

The results provided in Durrett et al. (2011); Neumann (2012) raise several questions that remain unanswered in both papers. In particular, for different combinations of algorithms and problems no (or no exact) runtime bounds are given. These works suggest that two measures, namely the maximum tree size Tmax and the maximum population size Pmax obtained during the run, play a role in determining the expected optimization time of the investigated algorithms. Urli et al. (2012) have made a significant effort to study the order of growth of these quantities, and to conjecture runtime bounds for both problems. However, their results are based purely on extensive experimental investigations, potentially neglecting problematic cases. Even though Urli et al. (2012) conjecture runtimes based on their observations, the impact of both quantities on the runtime is still unclear from the theory side.

In this paper, we address these questions. We use multiplicative drift (Doerr et al., 2010) on the fitness values to bound the runtime of (1+1)-GP on the Weighted ORDER (WORDER) problem. Subsequently, we consider (1+1)-GP on the multi-objective formulation of WORDER, which considers the complexity as well. There, we apply drift analysis


on the solution sizes in order to bound (with high probability) the maximum tree sizes encountered. Lastly, we consider the multi-objective SMO-GP algorithm, and bound the runtimes using fitness-based partitions. In the cases where Tmax and Pmax are part of the asymptotic bound, we augment the results with experimental observations.

Note that our investigations focus on the weighted variants WORDER and WMAJORITY, which both allow for exponentially many different fitness values. Very few runtime bounds were known for both problems so far.

The paper is structured as follows. In Section 2, we introduce the analysed problems and algorithms. In Section 3, we summarize the previous computational complexity results from Durrett et al. (2011); Neumann (2012). In Sections 4 and 5, we present several new theoretical upper bounds, and we complement the analyses with experimental results whenever a term of the bound is not under the control of the user. In the final section, we summarise the existing known bounds and ours, and we point out open questions.

2. PRELIMINARIES

In our theoretical and experimental investigations, we will treat the algorithms and problems analyzed in Durrett et al. (2011); Neumann (2012). We consider tree-based genetic programming where a possible solution is represented by a syntax tree. The inner nodes of such a tree are labelled with function symbols from a set F, and the leaves of the tree are labelled with terminals from a set L.

We examine the problems Weighted ORDER (WORDER) and Weighted MAJORITY (WMAJORITY). In these problems, the only function symbol is the join (denoted by J), which is binary. The terminal set is a set of 2n variables, where x̄i is considered the complement of xi. Hence, F := {J} and L := {x1, x̄1, x2, x̄2, ..., xn, x̄n}.

In WORDER and WMAJORITY, each variable xi is assigned a weight wi ∈ R, 1 ≤ i ≤ n, so that the variables can differ in their contributions to the fitness of a tree. Without loss of generality, we assume that w1 ≥ w2 ≥ w3 ≥ ... ≥ wn > 0. We get ORDER and MAJORITY as the specific cases of WORDER and WMAJORITY where wi = 1, 1 ≤ i ≤ n.

For a given solution X, the fitness value is computed by parsing the represented tree inorder. For WORDER, the weight wi of a variable xi contributes to the fitness of X iff xi is visited in the inorder parse before all the x̄i in the tree. For WMAJORITY, the weight of xi contributes to the fitness of X iff the number of occurrences of xi in the tree is at least one and not less than the number of occurrences of x̄i (see Figures 2 and 3). We call a variable redundant if it occurs multiple times in the tree; in this case the variable contributes only once to the fitness value. The goal in the WORDER and WMAJORITY problems is to maximize the function value. We illustrate both problems by an example (see Figure 1).

MO-WORDER and MO-WMAJORITY are variants of the above-described problems which take the complexity C of a syntax tree (computed as the number of leaves of the tree) as a second objective:

MO-WORDER(X) = (WORDER(X), C(X))
MO-WMAJORITY(X) = (WMAJORITY(X), C(X))

Optimization algorithms can then use this to cope with the bloat problem: if two solutions have the same fitness value, then the solution of lower complexity can be preferred.

Figure 1: Example of evaluation according to WORDER and WMAJORITY. Let n = 5 and w1 = 15, w2 = 14, w3 = 12, w4 = 7, and w5 = 2. For the shown tree X, we get (after inorder parsing) l = (x1, x̄5, x4, x̄2, x2). For WORDER, we get S = (x1, x4) and WORDER(X) = w1 + w4 = 22. For WMAJORITY, we get S = (x1, x4, x2) and WMAJORITY(X) = w1 + w4 + w2 = 36.

Input: a syntax tree X
Init: l is an empty leaf list, S is an empty statement list.

1. Parse X inorder and insert each leaf at the rear of l as it is visited.

2. Generate S by parsing l front to rear and adding ("expressing") a leaf to S only if it or its complement is not yet in S (i.e. has not yet been expressed).

3. WORDER(X) = Σ_{xi ∈ S} wi

Figure 2: Computation of WORDER(X)

Input: a syntax tree X
Init: l is an empty leaf list, S is an empty statement list.

1. Parse X inorder and insert each leaf at the rear of l as it is visited.

2. For 1 ≤ i ≤ n: if count(xi ∈ l) ≥ 1 and count(xi ∈ l) ≥ count(x̄i ∈ l), add xi to S.

3. WMAJORITY(X) = Σ_{xi ∈ S} wi

Figure 3: Computation of WMAJORITY(X)

In the special case where wi = 1 holds for all 1 ≤ i ≤ n, we have the problems:

MO-ORDER(X) = (ORDER(X), C(X))
MO-MAJORITY(X) = (MAJORITY(X), C(X))
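The procedures in Figures 2 and 3 depend only on the inorder leaf sequence of the tree. The following minimal Python sketch (function and variable names are our own, not from the paper) evaluates both fitness functions directly on such a leaf sequence and reproduces the values of the example in Figure 1.

```python
# Sketch: evaluate WORDER and WMAJORITY on the inorder leaf sequence of a tree.
# A leaf is encoded as (i, positive): variable index i in 1..n, positive=False for the complement.
from collections import Counter

def worder(leaves, weights):
    expressed = set()           # variable indices whose first occurrence has been seen
    value = 0
    for i, positive in leaves:  # front-to-rear pass over the inorder leaf list
        if i not in expressed:
            expressed.add(i)
            if positive:        # only positive variables contribute their weight
                value += weights[i - 1]
    return value

def wmajority(leaves, weights):
    pos = Counter(i for i, positive in leaves if positive)
    neg = Counter(i for i, positive in leaves if not positive)
    return sum(weights[i - 1] for i in pos if pos[i] >= 1 and pos[i] >= neg[i])

if __name__ == "__main__":
    w = [15, 14, 12, 7, 2]                                        # weights from Figure 1
    l = [(1, True), (5, False), (4, True), (2, False), (2, True)]  # x1, x̄5, x4, x̄2, x2
    print(worder(l, w))      # 22  (w1 + w4)
    print(wmajority(l, w))   # 36  (w1 + w4 + w2)
```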

In this paper, all algorithms only use the mutation operator HVL-Prime to generate offspring. HVL-Prime produces a new tree by making changes to the original tree via three basic operators: insertion, deletion and substitution. A more detailed explanation of this operator can be found in Durrett et al. (2011). In each step of the algorithms, k mutations are applied to the selected solution. For the single-operation variants of the algorithms, k = 1 holds. For the multi-operation variants, the number of operations performed is drawn each time from the distribution k = 1 + Pois(1), where Pois(1) is the Poisson distribution with parameter 1.

3. THEORETICAL RESULTS


(1+1)-GP on F(X), results from Durrett et al. (2011):
  ORDER       k=1: O(n Tmax)               k=1+Pois(1): O(n Tmax)
  WORDER      k=1: ?                       k=1+Pois(1): ?
  MAJORITY    k=1: O(n² Tmax log n)        k=1+Pois(1): ?
  WMAJORITY   k=1: ?                       k=1+Pois(1): ?

(1+1)-GP on MO-F(X), results from Neumann (2012):
  ORDER       k=1: O(Tinit + n log n)      k=1+Pois(1): ?
  WORDER      k=1: O(Tinit + n log n)      k=1+Pois(1): ?
  MAJORITY    k=1: O(Tinit + n log n)      k=1+Pois(1): ?
  WMAJORITY   k=1: O(Tinit + n log n)      k=1+Pois(1): ?

SMO-GP on MO-F(X), results from Neumann (2012):
  ORDER       k=1: O(n Tinit + n² log n)   k=1+Pois(1): O(n Tinit + n² log n)
  WORDER      k=1: O(n³) ⋆                 k=1+Pois(1): ?
  MAJORITY    k=1: O(n Tinit + n² log n)   k=1+Pois(1): O(n Tinit + n² log n)
  WMAJORITY   k=1: O(n³) ⋆                 k=1+Pois(1): ?

Table 1: Computational complexity results from Durrett et al. (2011); Neumann (2012)

Mutate Y by applying HVL-Prime k times. Each time, randomly choose either insert, substitute or delete.

Insert: Choose a variable u ∈ L uniformly at random and select a node v ∈ Y uniformly at random. Replace v by a join node whose children are u and v, in an order chosen randomly.

Substitute: Replace a randomly chosen leaf v ∈ Y by a randomly chosen leaf u ∈ L.

Delete: Choose a leaf node v ∈ Y randomly, with parent p and sibling u. Replace p by u and delete p and v.

Figure 4: HVL-Prime mutation operator
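Because WORDER and WMAJORITY only look at the inorder sequence of leaves, the three operations of Figure 4 can be sketched directly on that sequence: an insertion next to a randomly chosen node corresponds to inserting a random terminal at a random position of the sequence, while substitution and deletion act on a random position. The following Python sketch uses this simplification; the names and the list-based representation are ours, not part of the original operator definition, which acts on the syntax tree itself.

```python
# Sketch of HVL-Prime acting on the inorder leaf sequence of a tree (a
# simplification of the tree-based operator in Figure 4).
import math
import random

def random_leaf(n):
    # Terminal set L = {x1, x̄1, ..., xn, x̄n}: (i, True) encodes xi, (i, False) encodes x̄i.
    return (random.randint(1, n), random.random() < 0.5)

def hvl_prime(leaves, n):
    """Apply one uniformly chosen insert/substitute/delete operation."""
    leaves = list(leaves)
    op = random.choice(["insert", "substitute", "delete"])
    if op == "insert" or not leaves:   # on the empty tree only insertion is possible
        leaves.insert(random.randint(0, len(leaves)), random_leaf(n))
    elif op == "substitute":
        leaves[random.randrange(len(leaves))] = random_leaf(n)
    else:
        del leaves[random.randrange(len(leaves))]
    return leaves

def poisson_1():
    # Knuth's method for sampling Pois(1).
    threshold = math.exp(-1.0)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1

def mutate(leaves, n, multi=False):
    """k = 1 (single-operation) or k = 1 + Pois(1) (multi-operation) applications."""
    k = 1 + (poisson_1() if multi else 0)
    for _ in range(k):
        leaves = hvl_prime(leaves, n)
    return leaves

if __name__ == "__main__":
    random.seed(0)
    tree = []                          # start from the empty tree
    for _ in range(10):
        tree = mutate(tree, n=5, multi=True)
    print(tree)
```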

The computational complexity analysis of genetic programming analyses the expected number of fitness evaluations until the algorithm has produced an optimal solution for the first time. This is called the expected optimization time. In the case of multi-objective optimization, the definition of the expected optimization time is slightly different and considers the number of fitness evaluations until the whole Pareto front, i.e. the set of optimal trade-offs between the objectives, has been computed.

The existing bounds from Durrett et al. (2011); Neumann (2012) are listed in Table 1. As can be seen, all results for (1+1)-GP take into account tree sizes of some kind: either the maximum tree size Tmax found during the search or the size of the initial tree Tinit plays a role in determining the bound. While Tinit is something which can be decided in advance, Tmax is a result of the interactions between the fitness function, the set of mutations and the selection process. As mutations involve a degree of randomness, in some cases such a measure is very difficult to control.

A similar problem arises when dealing with multi-objective algorithms, such as SMO-GP. As we will see, the maximum population size reached during optimization, Pmax, is a fundamental term in most bounds, and it is again very difficult to tackle when a random number of mutations is involved. This is the main reason why, to date, bounds are known only for the single-mutation variants. Lastly, note that the upper bounds marked with ⋆ hold only if the algorithm has been initialized in the particular, i.e. non-redundant, way described in Neumann (2012).

Because of the direct relation between these measures and the bounds, investigating their magnitude is key to a fundamental understanding of the meaning of the bounds.

4. (1+1)-GP

The algorithm (1+1)-GP starts with an initial solution X; in each generation, a new offspring Y is produced by mutating X.

Algorithm 1: (1+1)-GP algorithm

1. Choose an initial solution X.
2. Repeat:
   - Set Y := X.
   - Apply mutation to Y.
   - If the selection favours Y over X, then X := Y.

Let X, Y be the solutions and F be the fitness function in (1+1)-GP.

(1+1)-GP on F: Favour Y over X iff F(Y) ≥ F(X).

(1+1)-GP on MO-F: Favour Y over X iff (F(Y) > F(X)) ∨ ((F(Y) = F(X)) ∧ (C(Y) ≤ C(X))).

Figure 5: Selection mechanism of (1+1)-GP
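For concreteness, the selection rules of Figure 5 can be written as predicates and plugged into Algorithm 1. The sketch below (our own naming, not from the paper) takes the fitness, complexity and mutation routines as parameters, so it can be combined with the evaluation and HVL-Prime sketches given earlier.

```python
# Sketch of the (1+1)-GP loop (Algorithm 1) with the two selection rules of Figure 5.
from typing import Callable, TypeVar

T = TypeVar("T")

def favour_f(fy: float, cy: int, fx: float, cx: int) -> bool:
    # (1+1)-GP on F: favour Y over X iff F(Y) >= F(X); the complexity is ignored.
    return fy >= fx

def favour_mo_f(fy: float, cy: int, fx: float, cx: int) -> bool:
    # (1+1)-GP on MO-F: favour Y iff F(Y) > F(X), or F(Y) = F(X) and C(Y) <= C(X).
    return fy > fx or (fy == fx and cy <= cx)

def one_plus_one_gp(x: T,
                    fitness: Callable[[T], float],
                    complexity: Callable[[T], int],
                    mutate: Callable[[T], T],
                    favour: Callable[[float, int, float, int], bool],
                    generations: int) -> T:
    """Run the (1+1)-GP loop for a fixed number of generations."""
    fx, cx = fitness(x), complexity(x)
    for _ in range(generations):
        y = mutate(x)
        fy, cy = fitness(y), complexity(y)
        if favour(fy, cy, fx, cx):
            x, fx, cx = y, fy, cy
    return x
```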

Y replaces X if it is favoured by the selection mechanism. Different from WORDER and WMAJORITY, where solutions of equal fitness can replace each other unconditionally, the selection on MO-WORDER and MO-WMAJORITY favours solutions with higher fitness value, or with smaller complexity value when two solutions have the same fitness value.

However, since in WORDER and WMAJORITY the complexity of a syntax tree is not taken into account, there is no mechanism for handling the bloat problem. Given that the maximum tree size Tmax of a run is unknown in advance, it would be preferable to have runtime bounds based on the size of the initial tree Tinit, which the user can control. Using a different approach, in which the selection from Figure 5 is used, Neumann (2012) proved that the expected optimisation time of (1+1)-GP-single on both MO-WORDER and MO-WMAJORITY is O(Tinit + n log n). In the following subsections, (1+1)-GP with one and with more than one mutation per step is denoted by (1+1)-GP-single and (1+1)-GP-multi, respectively.

4.1 (1+1)-GP on F(X) Problems

In previous work, Durrett et al. (2011) showed that the expected runtime of (1+1)-GP on ORDER is upper bounded by O(nTmax). First, we show that this bound immediately carries over to (1+1)-GP-single on WORDER.

THEOREM 1. The expected optimization time of (1+1)-GP-single on WORDER is O(nTmax).

PROOF. As (1+1)-GP-single performs only a single mutation operation at a time, it is not possible (1) to increase the


WORDER value of a tree and (2) to decrease the number of expressed variables at the same time. Hence, once a variable is expressed, it will stay expressed for the rest of the optimisation process. This, however, means that the weights associated with the variables are effectively irrelevant. Thus, the upper bound on the runtime is identical to that of (1+1)-GP-single on the unweighted ORDER problem, and the bound immediately carries over.

The bound for the multi-operation variant is obtained by applying the multiplicative drift theorem of Doerr et al. (2010), which is stated as follows.

THEOREM 2 (MULTIPLICATIVE DRIFT, DOERR ET AL. (2010)). Let S ⊆ R be a finite set of positive numbers with minimum smin. Let {X(t)}t∈N be a sequence of random variables over S ∪ {0}. Let T be the random variable that denotes the first point in time t ∈ N for which X(t) = 0. Suppose that there exists a constant δ > 0 such that

$$E\big[X^{(t)} - X^{(t+1)} \mid X^{(t)} = s\big] \;\ge\; \delta s \qquad (1)$$

holds for all s ∈ S with Pr[X(t) = s] > 0. Then for all s0 ∈ S with Pr[X(0) = s0] > 0,

$$E\big[T \mid X^{(0)} = s_0\big] \;\le\; \frac{1 + \ln(s_0/s_{\min})}{\delta}. \qquad (2)$$
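To illustrate how the theorem is typically read: if in every step the remaining distance to the target shrinks, in expectation, by at least a constant fraction, the expected hitting time of 0 is logarithmic in the ratio s0/smin. A small instantiation of our own, not from the paper:

```latex
% Illustration: if every step removes, in expectation, at least half of the
% remaining value, i.e. E[X^{(t)} - X^{(t+1)} | X^{(t)} = s] >= s/2,
% then delta = 1/2 and Theorem 2 yields
E\big[T \mid X^{(0)} = s_0\big] \;\le\; \frac{1 + \ln(s_0/s_{\min})}{1/2} \;=\; 2\bigl(1 + \ln(s_0/s_{\min})\bigr).
```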

In the following theorem we employ Theorem 2 to provide an upper bound on the expected runtime of (1+1)-GP-multi on F(X) = WORDER when starting from an arbitrary solution.

THEOREM 3. The expected optimization time of (1+1)-GP-multi on WORDER is O(nTmax(log n + log wmax)).

PROOF. In order to prove the stated bound we need to instantiate Theorem 2 on our problem. The theorem is applicable if we can identify a finite set S ⊂ R from which the random variables X(t) are drawn. In our case the drift is defined on the (at most exponentially many) values of the fitness function. Also, since the result holds under the assumption that the underlying problem is a minimization problem, we first need to define an auxiliary problem WORDERmin which is minimized whenever WORDER is maximized:

$$\mathrm{WORDER}_{\min}(x) = \sum_{i=1}^{n} w_i \;-\; \mathrm{WORDER}(x).$$

Note that while WORDER(x) is the sum of the weights of the expressed variables in x, WORDERmin(x) is the sum of the weights of the unexpressed variables. We also recall that ORDER(x) denotes the number of expressed variables in x.

Now, given a solution xt at time t, let st = WORDERmin(xt) be the WORDERmin-value of this solution and m = n − ORDER(xt) the number of unexpressed variables in xt. Since an improvement can be achieved by inserting an unexpressed variable at the beginning of the tree, the expected decrease, i.e. the drift, of WORDERmin is lower bounded by

$$E\big[X^{(t)} - X^{(t+1)} \mid X^{(t)} = s_t\big] \;\ge\; \frac{\mathrm{WORDER}_{\min}(x_t)}{m} \cdot \frac{m}{6enT_{max}} \;=\; \frac{\mathrm{WORDER}_{\min}(x_t)}{6enT_{max}},$$

since the probability of expressing any of the m unexpressed variables at the beginning of the tree is at least (m/(2n)) · (1/(3eTmax)) = m/(6enTmax) for both the single- and the multi-operation case, and the expected improvement when performing such a step is WORDERmin(xt)/m (because the missing weights are distributed over m variables).

Given this drift, we have (in the terminology of Theorem 2)

$$\delta = \frac{1}{6enT_{max}}, \qquad s = \mathrm{WORDER}_{\min}(x_t).$$

Also, from the definition of WORDER, we have smin = wmin. From Theorem 2 it follows that, for any initial value s0 = WORDERmin(x0),

$$E\big[T \mid X^{(0)} = s_0\big] \;\le\; \big(1 + \ln(s_0/w_{\min})\big)\, 6enT_{max}.$$

Since we are interested in an upper bound on the expected optimization time, we must consider starting from the worst possible initial solution, i.e. one in which none of the variables is expressed. Let wmax be the largest weight in the set; then the maximum distance from the optimal solution is nwmax ≥ s0. Hence, assuming w.l.o.g. that wmin ≥ 1, we have

$$E[T] \;\le\; \big(1 + \ln(nw_{max}/w_{\min})\big)\, 6enT_{max} \;\le\; \big(1 + \ln(nw_{max})\big)\, 6enT_{max} = O(nT_{max}\log(nw_{max})) = O(nT_{max}(\log n + \log w_{max})),$$

which states that the expected optimization time T of (1+1)-GP-multi on WORDERmin, and hence on WORDER, starting from any solution is bounded by O(nTmax(log n + log wmax)).
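To get a feel for the weight-dependent term, one can instantiate the bound of Theorem 3 with the exponentially decaying weights wi = 2^(n−i) that appear later in Section 5.2; this instantiation is ours and only illustrates the order of magnitude.

```latex
% With w_i = 2^{n-i} (so w_{max} = 2^{n-1} and w_{min} = 1):
\log w_{max} = (n-1)\log 2 = \Theta(n)
\;\;\Longrightarrow\;\;
O\bigl(nT_{max}(\log n + \log w_{max})\bigr) = O\bigl(n^{2} T_{max}\bigr).
```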

The dependence of this bound, and of the bound for ORDER introduced in Durrett et al. (2011), on Tmax is easily explained by the fact that, in order to guarantee a single-step improvement, one must perform a beneficial insertion, i.e. one which expresses one of the unexpressed variables, at the beginning of the tree. This involves selecting one out of at most Tmax nodes in the tree. Unfortunately, Tmax can potentially grow arbitrarily large, since F(X) does not control the solution complexity.

We have investigated this measure experimentally in order to understand what its typical magnitude is. The experiments were performed on AMD Opteron 250 CPUs (2.4 GHz), on Debian GNU/Linux 5.0.8, with Java SE RE 1.6, and were given a maximum runtime of 3 hours and a budget of 10^9 evaluations. Furthermore, each experiment (involving two different initialization schemes, respectively with 0 and 2n leaves in the initial tree) has been repeated 400 times, which results in a standard error of the mean of 1/√400 = 5%. The empirical distributions of the maximum tree sizes for (1+1)-GP on F(X) on all ORDER variants are shown as box-plots in Figure 6. The blue-toned line plots show the median Tmax divided by n (solid line) and by n log(n) (dashed line); the nearly constant behaviour of the solid line suggests that Tmax typically behaves linearly for all ORDER variants, at least for the tested values of n and the employed initialization schemes.

4.2 Algorithms on MO-F(X) Problems

For the multi-operation variants, a single HVL-Prime application can perform more than one elementary mutation with constant probability. This can lead to situations where the complexity increases faster than the fitness value, i.e. an improvement of the fitness of the tree can be accompanied by several increases of the complexity. We start our analysis by showing an upper bound on the maximum tree size Tmax encountered during the run,


Figure 6: Distribution, as green box-plots, of the maximum tree sizes observed for (1+1)-GP on F(X) on the ORDER variants. The blue-toned lines are the median Tmax values divided by the corresponding polynomial (see the legend) and suggest the asymptotic behavior of the measure. In this case the almost horizontal line for n shows that the maximum tree size is very close to linear, at least for the tested input sizes.

and then use this fact to bound the expected optimisation time of (1+1)-GP-multi on MO-WORDER.

THEOREM 4 (OLIVETO AND WITT, 2011). Let Xt, t ≥ 0, be the random variables describing a Markov process over a finite state space S ⊆ R⁺₀, and denote Δt(i) := (Xt+1 − Xt | Xt = i) for i ∈ S and t ≥ 0. Suppose there exist an interval [a, b] in the state space, two constants δ, ε > 0 and, possibly depending on l := b − a, a function r(l) satisfying 1 ≤ r(l) = o(l/log(l)), such that for all t ≥ 0 the following two conditions hold:

1. E(Δt(i)) ≥ ε for a < i < b,
2. Prob(|Δt(i)| ≥ j) ≤ r(l)/(1+δ)^j for i > a and j ∈ N0.

Then there is a constant c* > 0 such that for T* := min{t ≥ 0 : Xt ≤ a | X0 ≥ b} it holds that Prob(T* ≤ 2^(c*l/r(l))) = 2^(−Ω(l/r(l))).

In the conference version, r(l) was only allowed to be a constant, i.e., r(l) = O(1). In this case, the last statement simplifies to Prob(T* ≤ 2^(c*l)) = 2^(−Ω(l)).

THEOREM 5. Let Tinit ≤ 19n be the complexity of the initial solution. Then, with high probability, the maximum tree size encountered by (1+1)-GP-multi on MO-WORDER in less than exponential time is at most 20n.

Note that the state space S (of the tree sizes) is finite for both problems, as at most 2n fitness improvements are accepted. Thus, at most 2n + 1 different tree sizes can be attained.

PROOF. Theorem 4 was formulated for showing that the value of a process does not drop below a lower bound in less than exponential time. For showing that the value does not exceed an upper bound, conditions 1 and 2 of Theorem 4 are changed to:

1. E(Δt(i)) ≤ −ε for a < i < b and ε > 0,
2. Prob(|Δt(i)| ≥ j) ≤ r(l)/(1+δ)^j for i > a and j ∈ N0.

Let a = 19n, b = 20n, let k be the number of expressed variables, and let s be the number of leaves of the current tree. For condition 1 to hold, the expected drift of the size of a syntax tree in the interval [a, b] = [19n, 20n] must be at most a negative constant. This drift is

$$E(\Delta_t(i)) = \sum_{j \in \mathbb{Z}} j \, P(\Delta_t(i) = j) = -\sum_{j \in \mathbb{N}} j \, P(\Delta_t(i) = -j) + \sum_{j \in \mathbb{N}} j \, P(\Delta_t(i) = j).$$

When Δt(i) = −j for j ∈ N, the size of the tree decreases by j. In order to decrease the size of the tree by j using t ≥ j operations, j of the operations must be deletions of redundant variables and the other t − j must be neutral moves. Since the three mutations are selected with equal probability, there are at least 3^(t−j) possible combinations of mutations that lead to a decrease of j; among these, at least one is made of substitutions that do not decrease the fitness, and the probability of obtaining it is at least 1/3^(t−j). Let s = cn with 19 < c < 20. Then

$$\sum_{j \in \mathbb{N}} j \, P(\Delta_t(i) = -j) \;\ge\; \sum_{j \in \mathbb{N}} j \sum_{t=j}^{\infty} \left(\frac{1}{3}\right)^{j} \binom{s-k}{j} \left(\frac{1}{s}\right)^{j} \frac{1}{3^{t-j}} \, \mathrm{Pois}(x = t)$$
$$\;=\; \frac{1}{e} \sum_{j \in \mathbb{N}} j \left(\frac{1}{3}\right)^{j} \binom{s-k}{j} \left(\frac{1}{s}\right)^{j} \sum_{t=j}^{\infty} \frac{1}{3^{t-j}} \cdot \frac{1}{t!} \;\ge\; \frac{1}{e} \sum_{j \in \mathbb{N}} j \left(\frac{1}{3}\right)^{j} \left(\frac{c-1}{c}\right)^{j} \left(\frac{1}{j}\right)^{j} \sum_{t=j}^{\infty} \frac{1}{3^{t-j}} \cdot \frac{1}{t!},$$

where (1/3)^j comes from the fact that we need to perform j deletions, (1/s)^j is related to selecting the right leaf out of s leaves j times, and the binomial coefficient counts the possible choices of redundant variables.

When Δt(i) = +j for j ∈ N, the size of the tree increases by j. In order to provide an upper bound on the expected increase of the tree size, we need to consider all the possible combinations of t mutations which lead to an increase of j. Since we need to increase the tree size by j, at least j of the mutations must be insertions; the remaining t − j mutations can be arranged in 3^(t−j) different ways. Note that this is an over-count, as many of these combinations will not lead to an increase of j. This means that the probability of performing a suitable number of insertions is upper bounded by 3^(t−j)/3^t = 1/3^j. Also, in order for a larger tree size to be accepted, the fitness must increase as well. In the worst case, this can be accomplished by inserting one unexpressed variable in a position where it increases the fitness of the tree, and inserting j − 1 random variables in positions where they do not decrease the fitness of the tree. Thus the expected increase of the tree size is bounded by

$$\sum_{j \in \mathbb{N}} j \, P(\Delta_t(i) = j) \;\le\; \sum_{j \in \mathbb{N}} j \, \frac{1}{3^{j}} \cdot \frac{n-k}{2n} \cdot P_1 P_2^{\,j-1} \sum_{t=j}^{\infty} \mathrm{Pois}(x = t) \;\le\; \frac{1}{e} \sum_{j \in \mathbb{N}} j \cdot \frac{1}{2} \cdot \frac{1}{3^{j}} \sum_{t=j}^{\infty} \frac{1}{t!},$$

where the 1/3^j term comes from the fact that we want to perform j insertions, (n−k)/(2n) comes from the fact that we want to insert one of the unexpressed variables, P1 is the probability of inserting the unexpressed variable in a position where it determines an increase of the fitness, and P2^(j−1) is the probability of putting the j − 1 (possibly redundant) random variables (whose probability of being selected, at most 1, has been omitted) at positions where they do not determine a decrease of the fitness. Note also that P1, P2 ≤ 1 and (n−k)/(2n) ≤ 1/2.

Therefore, we have E(Δt(i)) ≤ (1/e) A with A = Σ_{j∈N} Aj, where

$$A_j = \frac{j}{3^{j}} \left[ \frac{1}{2} \sum_{t=j}^{\infty} \frac{1}{t!} \;-\; \left(\frac{c-1}{c}\right)^{j} \frac{1}{j^{j}} \sum_{t=j}^{\infty} \frac{1}{3^{t-j}} \cdot \frac{1}{t!} \right],$$

in which we use Σ_{t=j}^{∞} (1/3^(t−j)) (1/t!) ≥ 1/j! + 1/(3(j+1)!). For 19 < c < 20 we have

$$A_1 < \frac{1}{3}\left[\frac{e-1}{2} - \frac{19}{20}\left(1 + \frac{1}{6}\right)\right] = -0.083064,$$
$$A_2 < \frac{2}{9}\left[\frac{e-2}{2} - \left(\frac{19}{20}\right)^{2} \frac{1}{4}\left(\frac{1}{2} + \frac{1}{18}\right)\right] = 0.051954,$$
$$A_3 < \frac{1}{9}\left[\frac{e-2.5}{2} - \left(\frac{19}{20}\right)^{3} \frac{1}{27}\left(\frac{1}{6} + \frac{1}{72}\right)\right] = 0.011490.$$

Hereby,

$$A = A_1 + A_2 + A_3 + \sum_{j \ge 4} A_j \;<\; -0.019620 + \sum_{j \ge 4} A_j.$$

C(Y), X is then replaced by Y in the population,

4. inserting a redundant variable into X, thus generating a new solution with the same fitness and higher complexity, which is not accepted by SMO-GP-single,

5. substituting a variable xi in X by another variable xj:

   - If xj is a non-redundant variable and wi < wj, then F(X) < F(Y) ∧ C(X) = C(Y), and Y replaces X in the population.
   - If xj is a non-redundant variable and wi > wj, then F(X) > F(Y) ∧ C(X) = C(Y), and Y is discarded.
   - If xj is a redundant variable, then F(X) > F(Y) ∧ C(X) = C(Y), and Y is discarded.

As shown above, the number of redundant variables in a solution will not increase during the run of the algorithm. Let Ri be the number of redundant variables in solution i; then Ri ≤ Rinit holds for all solutions in the population.

Let X ∈ P be a solution in the population P. The complexity of X is the number of its redundant variables plus the number of its expressed variables. Since the former is at most Rinit and the latter is at most n, the complexity of a solution in P is upper bounded by Rinit + n ≤ Tinit + n.

The population size only increases when a new solution with a different fitness value is generated. This only happens when the complexity increases or decreases by 1. Because the highest complexity that a solution can reach is Tinit + n, and because the empty tree (which is Pareto optimal with F(X) = C(X) = 0) can be in the population as well, the maximum size the population can reach is Tinit + n + 1.

LEMMA 1. Starting with an arbitrary initial solution of size Tinit, the expected time until the population of SMO-GP-single on MO-WORDER and MO-WMAJORITY contains the empty tree is O(T²init + nTinit).

PROOF. We bound the time by considering repeated deletions of randomly chosen leaf nodes in the solution of lowest complexity, until the empty tree has been reached. Let X be the solution in the population with the lowest complexity. The probability of choosing X for mutation is at least 1/(Tinit + n + 1), while the probability of performing a deletion is 1/3. In each step, because any variable in X can be deleted, the probability of choosing a suitable leaf is 1. The probability of deleting a leaf of X in a given step is therefore bounded from below by (1/3) · 1/(Tinit + n + 1), and


the expected time to make such a step is at most 3(Tinit + n + 1). Since Tinit is the number of leaves in the initial tree, at most Tinit repetitions are required to delete all the leaves. This implies that the expected time until the empty tree is included in the population is upper bounded by

Tinit · 3(Tinit + n + 1) = O(T²init + nTinit).

THEOREM 8. Starting with a single arbitrary initial solution of size Tinit, the expected optimization time of SMO-GP-single on MO-WORDER and MO-WMAJORITY is O(T²init + n²Tinit + n³).

PROOF. In the following we bound the time needed to discover the whole Pareto front once the empty solution has been introduced into the population. As shown in Lemma 1, the empty tree is included in the population after O(T²init + nTinit) steps. The empty tree is then a Pareto optimal solution with complexity C(X) = 0. Note that a solution of complexity j, 0 ≤ j ≤ n, is Pareto optimal in MO-WORDER and MO-WMAJORITY if it contains only the j variables of largest weight.

Let us assume that the population contains all Pareto optimal solutions with complexities j, 0 ≤ j ≤ i. Then a population which includes all Pareto optimal solutions with complexities j, 0 ≤ j ≤ i + 1, can be obtained by producing a solution Y that is Pareto optimal and has complexity i + 1. Y can be obtained from a Pareto optimal solution X with C(X) = i and F(X) = Σ_{j=1}^{i} wj by inserting the leaf xi+1 (with associated weight wi+1) at any position. This operation produces from a solution of complexity i a solution of complexity i + 1.

Based on this idea we can bound the expected optimization time once we bound the probability of such a step. Choosing X for mutation has probability at least 1/(Tinit + n + 1), as the population size is upper bounded by Tinit + n + 1. Next, the insertion operation of the mutation operator is chosen with probability 1/3. As a specific element out of a total of n elements has to be inserted at a randomly chosen position, the probability to do so is 1/n. Thus, the total probability of such a generation is lower bounded by (1/(Tinit + n + 1)) · (1/3) · (1/n).

Now, we use the method of fitness-based partitions (Wegener, 2002) according to the n + 1 different Pareto front sizes i. As there are only n Pareto-optimal improvements possible once the empty solution has been introduced into the population, the expected time until all Pareto optimal solutions have been generated is at most

$$\sum_{i=0}^{n-1} \left( \frac{1}{T_{init}+n+1} \cdot \frac{1}{3} \cdot \frac{1}{n} \right)^{-1} = 3n^{2}(T_{init}+n+1) = O(n^{2}T_{init} + n^{3}).$$

Summing up the number of steps needed to generate the empty tree and all other Pareto optimal solutions, the expected optimisation time is O(T²init + n²Tinit + n³).

5.2 SMO-GP-multi

In principle, when considering WORDER and WMAJORITY in SMO-GP, it is possible to generate an exponential number of trade-offs between solution fitness and complexity. For instance, if wi = 2^(n−i), 1 ≤ i ≤ n, then one can easily generate 2^n different individuals by considering different subsets of variables. These solutions are dominated by the true Pareto front, and our experimental analysis suggests that many of them get discarded early in the optimization process.

In this section, we employ the Pmax measure collected in our experimental analysis (see Figure 7) to consolidate some of the theoretical bounds on WORDER and WMAJORITY presented in Neumann (2012), and we introduce new bounds for their multi-operation variants.

LEMMA 2. The expected time before SMO-GP, initialized with Tinit leaves, adds the empty tree to its population is bounded by O(TinitPmax).

PROOF. If the empty tree is the initial solution, i.e. Tinit = 0, then Lemma 2 follows immediately.

If Tinit > 0, then we need to perform Tinit deletions in order to reach the empty tree. Let X be the individual of lowest complexity; the probability of selecting it for mutation is at least 1/Pmax. The probability of performing a single deletion is lower bounded by 1/(3e) for both the single- and the multi-operation variant of the HVL-Prime operator. Therefore, the total probability of selecting the solution of lowest complexity and deleting one of its nodes is Ω(1/Pmax), and the expected time to make such a step is O(Pmax). Since we need to delete Tinit leaves, the total expected time to add the empty tree to the population is bounded by the time for these repeated deletions, i.e. O(TinitPmax).

Note that, contrary to Lemma 1, this lemma holds for both the single- and the multi-mutation variant of SMO-GP. However, it depends on Pmax, which is not under the control of the user.

We now state an upper bound for SMO-GP on WORDER and WMAJORITY, in both the single- and the multi-operation variant.

THEOREM 9. The expected optimization time of SMO-GP on WORDER and WMAJORITY is O(Pmax(Tinit + n²)).

PROOF. This proof follows the structure of the proof of Theorem 8. First, due to Lemma 2 we can assume that the empty tree has been added to the population after O(TinitPmax) steps. We recall that a solution of complexity j, 0 ≤ j ≤ n, is Pareto optimal for MO-WORDER and MO-WMAJORITY if it contains, for the j largest weights, exactly one positive variable each.

Similar to the second step of the proof of Theorem 8, we assume that the population already contains the i Pareto optimal solutions with complexities j, 0 ≤ j ≤ i. Then a population which includes all Pareto optimal solutions with complexities j, 0 ≤ j ≤ i + 1, can be obtained by producing a solution Y that is Pareto optimal and has complexity i + 1. Y can be obtained from a Pareto optimal solution X with C(X) = i and F(X) = Σ_{j=1}^{i} wj by inserting the leaf xi+1 (with associated weight wi+1) at any position. This operation produces from a solution of complexity i a solution of complexity i + 1.

Again, based on this idea, we can bound the expected optimization time once we bound the probability of such a step. The probability of choosing X for mutation is lower bounded by 1/Pmax, the insertion operation of the mutation operator is chosen with probability at least 1/(3e), and, as a specific element out of a total of up to 2n elements has to be inserted at a randomly chosen position, the probability to do so is 1/(2n). Thus,


Figure 7: Maximum population size distributions (as box-plots) reached before finding an optimal solution in SMO-GP. The blue lines represent the medians divided by the corresponding polynomials and give an indication of the order of magnitude of the Pmax measure.

the total probability of such a generation is lower bounded by 1/(6enPmax).

Now, using the method of fitness-based partitions according to the n + 1 different Pareto front sizes i, the expected time until all Pareto optimal solutions have been generated (once the empty tree has been introduced) is bounded by

$$\sum_{i=0}^{n-1} \left( \frac{1}{P_{max}} \cdot \frac{1}{3e} \cdot \frac{1}{2n} \right)^{-1} = 6en^{2}P_{max} = O(n^{2}P_{max}).$$

Therefore, the total expected optimization time when starting from an individual with Tinit leaves is O(TinitPmax + n²Pmax), which concludes the proof.

As mentioned before, to date there is no theoretical understanding of how Pmax grows during an optimization run and, in principle, its order of growth could be exponential in n. For this reason, similarly to what we did for Tmax, we have investigated this measure experimentally. The experimental setup is the same as described previously, and the empirical distributions of Pmax are shown as box-plots in Figure 7. The blue-toned line plots show the median Pmax divided by n and by log(n). Again, the nearly constant behavior of the solid line suggests a magnitude of Pmax which depends linearly on n and is, in general, very close to n, as the factors of at most 1.1 suggest.
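The following Python sketch shows one way such a measurement could be reproduced on MO-WORDER. It reuses the inorder leaf-list simplification from the earlier sketches and assumes a GSEMO-style population update (keep only mutually non-dominated solutions, replace on equal objective vectors); this update rule is our assumption, as the formal definition of SMO-GP is given in Neumann (2012) and is not reproduced in this transcript. All names are ours.

```python
# Sketch: run an SMO-GP-style single-operation algorithm on MO-WORDER and record
# the maximum population size P_max reached before the full Pareto front is found.
import random

def worder(leaves, weights):
    expressed, value = set(), 0
    for i, positive in leaves:
        if i not in expressed:
            expressed.add(i)
            if positive:
                value += weights[i - 1]
    return value

def mo_worder(leaves, weights):
    # Objectives: (fitness to maximize, complexity = number of leaves to minimize).
    return (worder(leaves, weights), len(leaves))

def dominates(a, b):
    # a dominates b iff a is at least as good in both objectives and differs from b.
    return a[0] >= b[0] and a[1] <= b[1] and a != b

def hvl_prime(leaves, n):
    leaves = list(leaves)
    op = random.choice(["insert", "substitute", "delete"])
    if op == "insert" or not leaves:
        leaves.insert(random.randint(0, len(leaves)),
                      (random.randint(1, n), random.random() < 0.5))
    elif op == "substitute":
        leaves[random.randrange(len(leaves))] = (random.randint(1, n),
                                                 random.random() < 0.5)
    else:
        del leaves[random.randrange(len(leaves))]
    return leaves

def smo_gp_pmax(n, weights, max_evals=10**6):
    """Return (P_max, evaluations used) for one run; weights must be sorted decreasingly."""
    pareto_front = {(sum(weights[:j]), j) for j in range(n + 1)}  # optimal trade-offs
    pop = [[]]                                                    # start from the empty tree
    p_max = 1
    for evals in range(1, max_evals + 1):
        parent = random.choice(pop)
        child = hvl_prime(parent, n)
        obj = mo_worder(child, weights)
        if not any(dominates(mo_worder(q, weights), obj) for q in pop):
            pop = [q for q in pop
                   if not dominates(obj, mo_worder(q, weights))
                   and mo_worder(q, weights) != obj]
            pop.append(child)
        p_max = max(p_max, len(pop))
        if {mo_worder(q, weights) for q in pop} == pareto_front:
            return p_max, evals
    return p_max, max_evals

if __name__ == "__main__":
    random.seed(1)
    n = 8
    w = [2 ** (n - i) for i in range(1, n + 1)]   # exponential weights from Section 5.2
    print(smo_gp_pmax(n, w))
```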

    6. CONCLUSIONS

In this paper, we carried out theoretical investigations that complement the recent results on the runtime of two genetic programming algorithms (Durrett et al., 2011; Neumann, 2012; Urli et al., 2012). Crucial measures in the theoretical analyses are the maximum tree size Tmax that is attained during the run of the algorithms, as well as the maximum population size Pmax when dealing with multi-objective models.

We introduced several new bounds for different GP variants on the different versions of the problems WORDER and WMAJORITY. Tables 2 and 3 summarise our results and the existing known bounds for the investigated problems.

Despite the significant theoretical and experimental effort, the following open challenges still remain:

1. Almost no theoretical results for the multi-operation variants are known to date. A major reason for this is that it is not clear how to bound the number of accepted operations when employing the multi-operation variants on the weighted cases.

2. Bounding the runtime on WMAJORITY seems particularly difficult. There, in order to achieve a fitness increment, the incrementing operation needs to be preceded by a number of operations that reduce the difference between the numbers of positive and negative occurrences of a variable. This difference is currently difficult to control, as it can both increase and decrease during a run.


3. Several bounds rely on the measures Tmax or Pmax, which are not set in advance and whose probability to increase or decrease during the optimization is hard to predict. This is because their growth (or decrease) depends on the content of the individual and of the population at a specific time step. Solving these problems would allow bounds which only depend on parameters the user can set initially. Observe that limiting Tmax and Pmax is not an option, since we are considering variable length representation algorithms, and populations that represent all the trade-offs between the objectives.

4. Finally, it is not known to date how tight these bounds are.

    References

Doerr, B., Johannsen, D., and Winzen, C. (2010). Multiplicative drift analysis. In Pelikan, M. and Branke, J., editors, GECCO, pages 1449–1456. ACM.

Droste, S., Jansen, T., and Wegener, I. (2002). On the analysis of the (1+1) evolutionary algorithm. Theoretical Computer Science, 276:51–81.

Durrett, G., Neumann, F., and O'Reilly, U.-M. (2011). Computational complexity analysis of simple genetic programming on two problems modeling isolated program semantics. In FOGA, pages 69–80. ACM.

Goldberg, D. E. and O'Reilly, U.-M. (1998). Where does the good stuff go, and why? How contextual semantics influences program structure in simple genetic programming. In EuroGP, volume 1391 of LNCS, pages 16–36. Springer.

Kötzing, T., Sutton, A. M., Neumann, F., and O'Reilly, U.-M. (2012). The max problem revisited: the importance of mutation in genetic programming. In GECCO '12, pages 1333–1340. ACM.

Neumann, F. (2012). Computational complexity analysis of multi-objective genetic programming. In GECCO '12, pages 799–806. ACM.

Oliveto, P. S. and Witt, C. (2011). Simplified drift analysis for proving lower bounds in evolutionary computation. Algorithmica, 59(3):369–386.

Poli, R., Langdon, W. B., and McPhee, N. F. (2008). A Field Guide to Genetic Programming. lulu.com.

Urli, T., Wagner, M., and Neumann, F. (2012). Experimental supplements to the computational complexity analysis of genetic programming for problems modelling isolated program semantics. In PPSN. Springer. (to be published).

Wagner, M. and Neumann, F. (2012). Parsimony pressure versus multi-objective optimization for variable length representations. In PPSN. Springer. (to be published).

Wegener, I. (2002). Methods for the analysis of evolutionary algorithms on pseudo-Boolean functions. In Evolutionary Optimization, pages 349–369. Kluwer.


(1+1)-GP on F(X):
  ORDER       k=1: O(n Tmax) [Durrett et al. (2011)]          k=1+Pois(1): O(n Tmax) [Durrett et al. (2011)]
              conjecture (Urli et al., 2012): O(Tinit + n log n) for both
  WORDER      k=1: O(n Tmax) [this paper]                     k=1+Pois(1): O(n Tmax (log n + log wmax)) [this paper]
              conjecture (Urli et al., 2012): O(Tinit + n log n) for both
  MAJORITY    k=1: O(n² Tmax log n) [Durrett et al. (2011)]   k=1+Pois(1): ?
              conjecture (Urli et al., 2012): O(Tinit + n log n) for both
  WMAJORITY   k=1: ?                                          k=1+Pois(1): ?
              conjecture (Urli et al., 2012): O(Tinit + n log n) for both

(1+1)-GP on MO-F(X):
  ORDER       k=1: O(Tinit + n log n) [Neumann (2012)]        k=1+Pois(1): O(n² log n) [this paper]
              conjecture (Urli et al., 2012): O(Tinit + n log n) for both
  WORDER      k=1: O(Tinit + n log n) [Neumann (2012)]        k=1+Pois(1): ?
              conjecture (Urli et al., 2012): O(Tinit + n log n) for both
  MAJORITY    k=1: O(Tinit + n log n) [Neumann (2012)]        k=1+Pois(1): ?
              conjecture (Urli et al., 2012): O(Tinit + n log n) for both
  WMAJORITY   k=1: O(Tinit + n log n) [Neumann (2012)]        k=1+Pois(1): ?
              conjecture (Urli et al., 2012): O(Tinit + n log n) for both

Table 2: Summary of our bounds (marked [this paper]) and the existing theoretical upper bounds from Table 1. The average-case conjectures from Urli et al. (2012) are listed to indicate tightness. Note that (?) marks the cases for which no theoretical bounds are known, and bounds marked with ⋆ require special initialisations.

SMO-GP on MO-F(X):
  ORDER       k=1: O(n Tinit + n² log n) [Neumann (2012)]        k=1+Pois(1): O(n Tinit + n² log n) [Neumann (2012)]
              conjecture (Urli et al., 2012): O(n Tinit + n² log n) for both
  WORDER      k=1: O(T²init + n² Tinit + n³) [this paper], O(n³) ⋆ [Neumann (2012)]
              k=1+Pois(1): O(Tinit Pmax + n² Pmax) [this paper]
              conjecture (Urli et al., 2012): O(n Tinit + n² log n) for both
  MAJORITY    k=1: O(n Tinit + n² log n) [Neumann (2012)]        k=1+Pois(1): O(n Tinit + n² log n) [Neumann (2012)]
              conjecture (Urli et al., 2012): O(n Tinit + n² log n) for both
  WMAJORITY   k=1: O(T²init + n² Tinit + n³) [this paper], O(n³) ⋆ [Neumann (2012)]
              k=1+Pois(1): O(Tinit Pmax + n² Pmax) [this paper]
              conjecture (Urli et al., 2012): O(n Tinit + n² log n) for both

Table 3: Summary of our bounds (marked [this paper]) and the existing theoretical upper bounds from Table 1. The average-case conjectures from Urli et al. (2012) are listed to indicate tightness. Note that the bounds marked with ⋆ require special initialisations.