On Pathological Fitness Landscapes for Constrained Combinatorial Optimization

Population-based search methods such as evolutionary algorithms follow gradients in the fitness landscape under the assumption that high quality solutions will lead to even better ones. Most real-world optimisation problems, however, have constraints which lead to infeasible solutions that may disrupt these gradients. As a result, high quality solutions may lie in regions that are often unreachable from regions in the fitness landscape where the preponderance of feasible solutions lie. In such cases, the make-up of the initial population as well as critical aspects of the search strategy become the crucial factors in determining whether or not high quality regions are ever reached. In this paper, we present examples of pathological landscapes that arise by considering the constrained component deployment optimisation problem for which standard evolutionary algorithms are almost certain to fail to reach the regions where high quality solutions lie. We indicate how some simple modifications can help alleviate this problem.


I. INTRODUCTION
The typical oral presentation of an evolutionary algorithm paper might include a fitness landscape slide such as the one shown in Figure 1. This tends to lull the listener into thinking that standard exploitation and exploration techniques will successfully explore the landscape, encountering some number of local minima and maxima and, hopefully, eventually a global minimum or maximum. Of course, this will not be true in general, but this trust becomes particularly misleading for problems where there are regions in the search space where fitness is undefined. The objective of this paper is to present examples, inspired by a constrained combinatorial optimisation problem, that highlight some of the pathologies that can arise in such situations and to suggest some simple "fixes" which might avoid these difficulties.

A. Evolutionary Algorithms
In combinatorial optimisation, evolutionary algorithms are iterative methods that evolve a population of solutions determined by genomes through the use of mutation, crossover, and selection operators. The optimisation process starts with a set of randomly generated genomes as an initial population. At every iteration, mutation and crossover operators are applied to some portion of the population.
The proportion of the population that is used for "reproduction" first undergoes crossover, which combines parts of two parent genomes to create two new genomes. A 1-point crossover splits both genomes at one point and combines the respective genes. It is possible to split solutions at more than one position, known as k-point crossover, and interleave the results. The points where the crossovers occur are selected at random. Next, the newly created solutions are mutated at a certain rate. There are various types of mutation operators: 1-point mutation mutates only one gene in the genome, uniform mutation mutates each gene with a certain probability, and transposition mutation operators swap two different genes in the same genome. Note that while 1-point and uniform mutation operators may alter existing genes, the transposition mutation operator can only change the position of genes. Hence, only 1-point and uniform mutation operators can introduce new genes not already in the population.
The selection operator decides which solutions will survive to the next iteration and adds new genomes as determined above in order to maintain the specified population size. Depending on the evolutionary approach, the selection can be based on elitism (only the best solutions survive), quality proportionate (the probability that a solution survives is based on its fitness), or random. We refer to mutation, crossover, and selection as search operators.

B. Fitness Landscapes
The suitability of an evolutionary method for solving an optimisation problem instance depends on the structure of the fitness landscape of that instance. A fitness landscape in the context of constrained combinatorial optimisation problems is a setting comprising all of the following:
• a search space $S$ of all possible genomes
• an ordered set $V$ of fitness values
• a fitness function $F : S \to V$
• a set of constraints $\Omega$
• a feasible set $S' \subset S$ satisfying all $\omega \in \Omega$
• an infeasible set $I \subset S$ violating at least one $\omega \in \Omega$
• a neighbourhood relation $N(s) \subset S$ for each $s \in S$
An example of a neighbourhood relation determined by 1-point mutation is the relation which assigns as neighbours to a genome all genomes that differ in one gene. A neighbourhood relation could also be specified by applying crossover with another genome, usually a genome restricted to lie in some subset of $S$, followed by mutation.

C. The Software Deployment Problem
Aleti [1] considers a combinatorial optimisation problem that seeks to assign $n \ge 3$ software components $c_1, \ldots, c_n$ to $m \ge 3$ hardware devices $h_1, \ldots, h_m$ subject to certain constraints in such a way that a fitness function that measures "reliability" is maximized once a deployment function $d : C \to H$, where $C$ is the set of components and $H$ is the set of hardware units, is specified. Thus, subject to the constraints, once $c_1$ has been deployed to $d(c_1)$, $c_2$ to $d(c_2)$, and so forth, fitness can be evaluated. But because of the constraints, not all candidate deployment functions $d : C \to H$ are valid, and thus the domain of feasible solutions for the fitness landscape has an unknown (and possibly unknowable) topology.
This context provided the inspiration for considering how difficult it might be to come up with a simple instance where "holes" in the domain might guarantee that absolute maxima would never be found using (standard) evolutionary search methods. In other words, we are looking for what would essentially be a minimal counterexample. Our attempts are described in the following sections.

III. A MINIMAL COUNTEREXAMPLE
For a positive integer $v$, let $Z_v$ denote the set $\{1, \ldots, v\}$. We modify the formulation of the component deployment optimisation problem slightly by writing the deployment function as $a : Z_n \to Z_m$ so that $c_1$ gets assigned to $h_{a(1)}$, $c_2$ to $h_{a(2)}$, and so forth. In this way our fitness functions can be viewed as being defined on genomes that are vectors with $n$ components, i.e., on $n$-tuples of the form $(a(1), \ldots, a(n))$, where $1 \le a(i) \le m$ for all $i$. This convention will facilitate counting in the sequel. Note that, as $n$-tuples, genotypes can be visualized as paths on the bounded region of the integer lattice given by $\{(x, y) \in \mathbb{Z} \times \mathbb{Z} \mid 1 \le x \le n, 1 \le y \le m\}$ by representing $(a(1), \ldots, a(n))$ as the path connecting the sequence of points $(1, a(1)), \ldots, (n, a(n))$.
Our constraints will be: $c_1$ cannot be deployed to $h_m$, or equivalently $a(1) < m$; $c_n$ cannot be deployed to $h_1$, or equivalently $a(n) > 1$; and $c_1$ can be deployed to $h_1$ if and only if $c_n$ is deployed to $h_m$, or equivalently $a(1) = 1$ if and only if $a(n) = m$. This last constraint is the critical constraint used to isolate a subset of assignment functions where maximal fitness solutions will lie. Our constraints are listed in Table I.
Our minimal counterexample takes $n = m = 3$, the first nontrivial case, so that of the twenty-seven possible 3-tuples that are potential candidates for $a$ only six satisfy the constraints, namely those of the form $(2, *, 2)$ or $(1, *, 3)$, where $*$ represents a "wild card" character that can assume any value chosen from the set $Z_3 = \{1, 2, 3\}$. Suppose the fitness function $F$ satisfies $F((1, *, 3)) = 4$, $F((2, *, 2)) = 1$, and is undefined for any of the remaining twenty-one 3-tuples that do not satisfy the constraints. The six feasible solutions when represented as paths are shown in Figure 2.
Let our evolutionary method have population size $s = 3$, and suppose that none of the three 3-tuples that have maximal fitness make it into the initial population. That is, the initial population contains only genomes of the form $(2, *, 2)$. Then, regardless of whether one is using 1-point or 2-point crossover, no genome produced by recombining two genomes in the current population will ever have maximal fitness. Further, if we use as a mutation operator single point mutation (i.e., we change only one of the components) or we use a swap operator that interchanges two components, this is still the case. In fact, in a population of $(2, *, 2)$ genomes, crossover followed by either a one-point mutation or a swap will also fail to ever yield a maximal fitness genome of the form $(1, *, 3)$. In order for evolutionary computation to succeed for our toy problem when the initial population consists of only $(2, *, 2)$ genomes, it must implement an operator that yields a swap combined with a one-point mutation, or a mutation operator that perturbs two or more components, i.e., a two-point mutation operator. It is also curious to note that if a global maximum is obtained by, say, a swap combined with a one-point mutation, then the genome must be $(1, 2, 3)$. Interestingly, the evolutionary algorithm used for the software deployment problem in Sabar and Aleti [2] does implement a swap followed by a point mutation; however, it checks the validity of the genome after the swap, which would cause it to fail for our toy problem.
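The claims made for the $n = m = 3$ case are small enough to check exhaustively. The following is a minimal sketch, not the authors' implementation: all function and variable names are introduced here for illustration, and crossover is assumed to cut between adjacent components.

```python
from itertools import product

M = N = 3  # minimal case from the text: n = m = 3


def feasible(a):
    # a(1) < m, a(n) > 1, and a(1) = 1 if and only if a(n) = m
    return a[0] < M and a[-1] > 1 and ((a[0] == 1) == (a[-1] == M))


def high_quality(a):
    # genomes of the form (1, *, 3)
    return a[0] == 1 and a[-1] == M


def point_mutations(a):
    # every genome obtained by resetting one component to a value in Z_m
    return {a[:i] + (v,) + a[i + 1:] for i in range(N) for v in range(1, M + 1)}


def swaps(a):
    # every genome obtained by interchanging two components
    out = set()
    for i in range(N):
        for j in range(i + 1, N):
            b = list(a)
            b[i], b[j] = b[j], b[i]
            out.add(tuple(b))
    return out


def crossover_children(a, b):
    # all 1-point and 2-point crossover children (assumed cut convention)
    kids = set()
    for c in range(1, N):
        kids |= {a[:c] + b[c:], b[:c] + a[c:]}
        for d in range(c + 1, N):
            kids |= {a[:c] + b[c:d] + a[d:], b[:c] + a[c:d] + b[d:]}
    return kids


genomes = list(product(range(1, M + 1), repeat=N))
feasible_set = [a for a in genomes if feasible(a)]
low_quality = [a for a in feasible_set if not high_quality(a)]

# Everything reachable from a (2, *, 2) population by crossover alone.
children = set(low_quality)
for a in low_quality:
    for b in low_quality:
        children |= crossover_children(a, b)

# Crossover followed by ONE point mutation or ONE swap.
one_step = set()
for c in children:
    one_step |= point_mutations(c) | swaps(c)
lifted_by_one_step = {g for g in one_step if high_quality(g)}

# Crossover followed by a swap and then a point mutation.
lifted_by_swap_then_mutation = {
    g for c in children for s in swaps(c) for g in point_mutations(s) if high_quality(g)
}
```

Under these assumptions, `lifted_by_one_step` is empty, while `lifted_by_swap_then_mutation` contains only `(1, 2, 3)`, in agreement with the remarks above.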

IV. SCALING THE COUNTEREXAMPLE
The reason our counterexample is so intriguing is that it generalizes and scales to a constrained combinatorial optimization problem with more plausible parameters. Using the same constraints, assume now that $m, n \ge 3$. Then the domain of candidate assignment functions consists of the $m^n$ $n$-tuples with entries in $Z_m = \{1, \ldots, m\}$. We shall be fluid in referring to elements in this search space as $n$-tuples, candidates, solutions, genomes or points. We partition the set of candidates into three disjoint subsets: high quality feasible candidates $Q$, low quality feasible candidates $B$, and infeasible candidates $I$. $Q$ consists of $n$-tuples of the form $(1, *, \ldots, *, m)$, of which there are $m^{n-2}$. $B$ consists of $n$-tuples of the form $(X, *, \ldots, *, Y)$, where $1 < X, Y < m$, of which there are $(m-2)^2 m^{n-2}$. $I$ consists of the remaining $n$-tuples. $I$ also decomposes into disjoint sets. These disjoint sets together with their cardinalities are shown in Table II. We can check that this decomposition is correct by observing that the cardinalities of $Q$, $B$, and $I$ sum to $m^n$, so that all $n$-tuples are accounted for:
$$|Q| + |B| + |I| = m^{n-2} + (m-2)^2 m^{n-2} + (4m-5)\, m^{n-2} = m^n.$$
Our counting also tells us that when choosing an $n$-tuple at random, the probability of getting a feasible solution is $[(m-2)^2 + 1]/m^2$ and the probability of getting a candidate from $B$ or $I$, a candidate that does not satisfy the condition $a(1) = 1$ and $a(n) = m$, is $1 - 1/m^2$. This makes it easy to determine the probability of randomly selecting genomes one at a time and winding up with an initial population of genomes lying exclusively in $B \cup I$ (see below).
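The cardinalities of $Q$ and $B$ stated above can be confirmed by brute-force enumeration for small $m$ and $n$. The sketch below uses hypothetical helper names; the total $(4m-5)\,m^{n-2}$ for $I$ is obtained here by subtracting the sizes of $Q$ and $B$ from $m^n$, rather than from the individual subsets of the decomposition.

```python
from itertools import product


def classify(a, m):
    # Q: (1, *, ..., *, m); B: (X, *, ..., *, Y) with 1 < X, Y < m; I: the rest
    if a[0] == 1 and a[-1] == m:
        return "Q"
    if 1 < a[0] < m and 1 < a[-1] < m:
        return "B"
    return "I"


def counts(m, n):
    # enumerate all m**n genomes and tally the three classes
    tally = {"Q": 0, "B": 0, "I": 0}
    for a in product(range(1, m + 1), repeat=n):
        tally[classify(a, m)] += 1
    return tally


def expected_counts(m, n):
    # closed forms; the I entry is derived as m**n - |Q| - |B|
    base = m ** (n - 2)
    return {"Q": base, "B": (m - 2) ** 2 * base, "I": (4 * m - 5) * base}
```

For instance, `counts(4, 3)` and `expected_counts(4, 3)` should both give 4, 16, and 44 genomes for $Q$, $B$, and $I$, which sum to $4^3 = 64$.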
It is more difficult to obtain a closed form expression for the probability that an initial population, formed by selecting $n$-tuples one by one and discarding infeasible solutions until a population (possibly with duplicates) of size $s$ is obtained, contains only feasible solutions from $B$, i.e., only feasible solutions that don't have $a(1) = 1$ and $a(n) = m$. Let $p_Q$, $p_B$ and $p_I$ be the probabilities that a randomly chosen genome lies in $Q$, $B$ and $I$ respectively. We know
$$p_Q = \frac{1}{m^2}, \qquad p_B = \frac{(m-2)^2}{m^2}, \qquad p_I = 1 - p_Q - p_B = \frac{4m-5}{m^2}.$$
For fixed $j \ge s$, let $p_j$ be the probability of getting a pool with $s$ genomes from $B$ after randomly choosing exactly $j$ genomes. Then we know the last genome must have been from $B$ and $s - 1$ genomes from $B$ must have shown up in the previous $j - 1$ selections. Since there are $\binom{j-1}{s-1}$ ways for genomes from $B$ to get selected, knowing the remaining $j - s$ choices all came from $I$, we have
$$p_j = \binom{j-1}{s-1}\, p_B^{\,s}\, p_I^{\,j-s},$$
whence the desired probability is
$$\sum_{j=s}^{\infty} p_j = \sum_{j=s}^{\infty} \binom{j-1}{s-1}\, p_B^{\,s}\, p_I^{\,j-s}.$$
If we are willing to accept infeasible solutions in the initial population, but require our fitness functions to assign positive values for feasible solutions and zero for infeasible solutions so that they will immediately be removed from the initial population, then we can assert that the probability of an initial population not having a feasible solution from $Q$ (i.e., not having a genome of the form $(1, *, \ldots, *, m)$) is $(1 - 1/m^2)^s$. Note that when $m = 10$ and $s = 100$ this probability already exceeds one-third. We assume an initial population of this type for the remainder of this paper. This assumption, coupled with more realistic parameter values, for example $s = 100$ and $m, n \ge 10$, allows us to formulate some additional pathological fitness landscape examples. We first digress to a discussion of search operators.
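Both probabilities discussed above are easy to evaluate numerically. The sketch below assumes $p_B = (m-2)^2/m^2$ and $p_I = (4m-5)/m^2$, which follow from the counts of $B$ and $I$, and it approximates the infinite sum over $j$ by truncating it after a fixed number of terms. The function names and the `terms` parameter are illustrative choices, not part of the original formulation.

```python
from math import comb


def prob_no_high_quality(m, s):
    # probability that s uniformly random genomes all miss Q: (1 - 1/m^2)^s
    return (1 - 1 / m**2) ** s


def prob_only_low_quality_after_rejection(m, s, terms=1000):
    # Truncated sum over j >= s of C(j-1, s-1) * p_B^s * p_I^(j-s), i.e. the
    # chance that the first s feasible genomes drawn all come from B when
    # infeasible genomes are discarded. Truncation is an approximation.
    p_B = (m - 2) ** 2 / m**2
    p_I = (4 * m - 5) / m**2
    return sum(
        comb(j - 1, s - 1) * p_B**s * p_I ** (j - s)
        for j in range(s, s + terms)
    )
```

With `m = 10` and `s = 100`, `prob_no_high_quality` evaluates to roughly 0.366, consistent with the remark that the probability exceeds one-third.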

A. Search operators
For notation, we let $X_i$ denote the 1-point crossover that occurs at position $i$, where $1 < i < n$. Formally, for genomes $a_1$ and $a_2$, this means that $X_i$ splits both genomes at position $i$ and exchanges the resulting segments. For $1 \le i \le n$ and $1 \le j \le m$, we let $P_{i,j}$ denote the point mutation operator that assigns $a(i)$ to be $j$. For $1 \le i < j \le n$ we define $T_{i,j}$ to be the transposition operator that swaps $a(i)$ with $a(j)$.
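As an illustration, the three operators can be written as small functions on tuples. This is a sketch with hypothetical function names. Positions are 1-based to match the notation, and the crossover convention used here (the segments starting at position $i$ are exchanged) is an assumption, since other cut conventions are equally plausible.

```python
def crossover(a1, a2, i):
    # X_i: ASSUMED convention, exchange the tails that start at 1-based position i
    k = i - 1
    return a1[:k] + a2[k:], a2[:k] + a1[k:]


def point_mutation(a, i, j):
    # P_{i,j}: set component a(i) to the value j
    return a[: i - 1] + (j,) + a[i:]


def transposition(a, i, j):
    # T_{i,j}: swap components a(i) and a(j)
    b = list(a)
    b[i - 1], b[j - 1] = b[j - 1], b[i - 1]
    return tuple(b)
```

For example, `point_mutation((2, 1, 2), 1, 1)` returns `(1, 1, 2)` and `transposition((2, 1, 3), 1, 3)` returns `(3, 1, 2)`.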

B. Simple Scaling
Assume $m, n \ge 3$, the initial population contains solutions only from $B$ and $I$, and the fitness function $F$ is defined by setting $F((a(1), \ldots, a(n)))$ equal to $0$, $1$, and $m + 1$ for solutions from $I$, $B$, and $Q$ respectively. Then, assuming that solutions from $I$ are immediately removed from the population and only low quality solutions from $B$ remain available for recombination and selection to form the next generation, remarks similar to the minimal counterexample apply. No single application of any $P_{i,j}$ or $T_{i,j}$ to a genome can produce a high quality solution from a low quality solution. This is still true even if they are applied to an intermediate genome arising from composition of a sequence of crossover operators.
However, such a "lifting" from $B$ to $Q$ will occur whenever $P_{1,1} \circ P_{n,m}$ is applied to a genome in $B$. If we posit a typical scenario where uniform point mutation is used, which is interpreted to mean that components are considered one by one so that, independently, each has probability $p$ that a point mutation operator is applied, then for fixed $i$ and $j$, the $i$-th component has probability $p/m$ of having $P_{i,j}$ applied, and the probability that $P_{1,1}$ and $P_{n,m}$ are both applied is $(p/m)^2$. Note that for $p = 0.05$ and $m = 10$ this probability is $0.000025$. Another possible way for such a lifting to occur is if a genome has $a(i) = m$ and $a(j) = 1$ where $1 < i, j < n$ (assume this occurs with probability $\rho$) and the composition of $T_{1,j}$ with $T_{i,n}$ (they are disjoint transpositions, so the order doesn't matter) is applied. Since there are $n(n-1)/2$ transposition operators, if there is some (small) probability $\epsilon$ of applying two swaps to a genome of the desired type, then the probability of a successful lifting is $4\rho\epsilon/(n^2 - n)$.
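The claim that no single point mutation or transposition lifts a genome from $B$ to $Q$ can be checked exhaustively for small parameters, and the double-mutation probability is a one-line computation. The following is a sketch with hypothetical names, intended only as a sanity check under the stated definitions of $B$ and $Q$.

```python
from itertools import product


def in_Q(a, m):
    return a[0] == 1 and a[-1] == m


def in_B(a, m):
    return 1 < a[0] < m and 1 < a[-1] < m


def single_step_neighbours(a, m):
    # all results of one point mutation P_{i,j} or one transposition T_{i,j}
    n = len(a)
    for i in range(n):
        for v in range(1, m + 1):
            yield a[:i] + (v,) + a[i + 1:]
    for i in range(n):
        for j in range(i + 1, n):
            b = list(a)
            b[i], b[j] = b[j], b[i]
            yield tuple(b)


def any_single_step_lift(m, n):
    # True if some genome in B reaches Q with a single operator application
    return any(
        in_Q(b, m)
        for a in product(range(1, m + 1), repeat=n)
        if in_B(a, m)
        for b in single_step_neighbours(a, m)
    )


def double_point_lift_probability(p, m):
    # probability that P_{1,1} and P_{n,m} are both applied: (p/m)^2
    return (p / m) ** 2
```

Here `any_single_step_lift(4, 4)` evaluates to `False`, and `double_point_lift_probability(0.05, 10)` is approximately `2.5e-05`.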

C. Biasing Low Quality Solutions
We can make it harder for liftings from $B$ to $Q$ to occur by arranging it so that the percentage of genomes with 1's and $m$'s in interior components within a population consisting of genomes only from $B$ decreases as the evolutionary algorithm progresses. That is, over time we can try to lower the value of $\rho$. Define the target $t$ to be $\lfloor (m+1)/2 \rfloor$. Note that $t = 2$ when $m = 3$, and that $1 < t < m$ for $m \ge 3$.
Assume the initial population consists of only candidates from B and I. As before, let genomes in I and Q have fitness 0 and m + 1, but now for (a(1), . . ., a(n)) ∈ B set Observe that these fitness values all lie strictly between 1 and 2. This biases the evolutionary algorithm such that as evolution proceeds an initial population consisting of genomes only from B and I converges to one consisting entirely of the unique local minimum solution (t, . . ., t) which has fitness 1 + 1/2 n .

D. Biasing High Quality Solutions
Finally, we can immediately expel all but a select few high quality genomes that do creep into the population as a result of liftings from B to Q by lowering their fitness as follows.

V. DISCUSSION -PART 1
It is of course possible to simplify the fitness functions in our examples. We decided to use more involved fitness terms, terms that without closer inspection might more easily pass for those one might expect to encounter in "real-world" applications, to try and promote plausibility.
The critical constraint we rely on may seem far-fetched, but in highly constrained scheduling problems with large numbers of variables that are overseen by systems using evolutionary techniques, it can certainly be the case that a complex set of constraints winds up inducing simple or unusual constraints like ours without anyone consciously realizing it. Our best effort at formulating a problem instance where our constraints might make sense runs as follows. Assume mission critical or fail safe software processes $c_1, \ldots, c_n$ are currently running on hardware processors $h_2, \ldots, h_{m-1}$ in a real time system. Suppose that processors $h_1$ and $h_m$ are now to be brought online while processes $c_1$ and $c_n$ are upgraded such that for load balancing purposes $c_1$ should migrate to $h_1$ but cannot migrate to $h_m$ and, similarly, $c_n$ should migrate to $h_m$ but cannot migrate to $h_1$.
If one does accept our thesis that fitness landscapes with pathological topologies can lead to situations where standard evolutionary algorithms are bound to fail, then as a byproduct of our examples we have an argument in favor of adopting a richer set of mutation operators, especially those that promote repeated applications of point mutation or transposition.

VI. RELATED WORK
In this section we consider related work on the relationship between fitness landscapes and optimisation problems. Unfortunately, most of the work considers only unconstrained optimisation. For comprehensive surveys of empirical approaches to characterising fitness landscapes see Malan [3] and Pitzer [4].

A. Models
Several different models for fitness landscapes have been proposed in the literature including additive fitness landscapes [5], random fitness landscapes [6], the block model [7], and the NK model [8]. Additive fitness landscapes are single peaked. In contrast, in a random fitness landscape there is no correlation between the fitnesses of mutational neighbours, hence such landscapes are considered rugged and tend to have many peaks. In the block model, the genotype is composed of blocks of genes which independently contribute to the overall fitness, i.e., the fitness of the genotype depends on the contribution from each block. The NK model comprises genotypes with $N$ genes. It depends on the parameter $K$, where $K \le N - 1$, signalling that the fitness contribution of each gene depends on its interactions with a block of $K$ other genes. Thus $K$ also serves as an indicator of the ruggedness of the fitness landscape.

B. Time to Convergence
Another related avenue of research is using Markov chain theory to analyse the behaviour of evolutionary algorithms and predict how long it will take for a Markov chain representing the different states reached in the search space (e.g., the search space history) to achieve stationarity. Hernandez et al. [9] use coupling from the past to detect time to convergence, while Propp et al. [10] propose a sampling algorithm based on the idea of coupling. Since, in theory, reaching stationarity requires infinite time, Propp and Wilson provide an algorithm that can detect when stationarity has been reached in finite time. Their work was later extended by Hernandez [9].

C. Problem Hardness
Much of the theoretical work relating fitness landscapes to problem hardness has taken place within the context of biological or evolutionary landscapes. Organismal biologists seek to understand the physical, biochemical and physiological basis of genotype to phenotype mappings, while evolutionary biologists study evolutionary causes and consequences. In these situations what matters most is whether the landscape is rugged or smooth and the degree of epistasis (the interaction between genes that are not alleles [11]) occurring in genomes.
In combinatorial optimisation, features of the fitness landscape that may have an impact on problem hardness have been estimated empirically using fitness landscape characterisation metrics [12]. These features pertain to local optima, global optima, and plateaus. Assuming the optimization objective is maximisation, given a search space $S$ and a neighbourhood relation $N$, a local optimum occurs at a point $s_l \in S$ if for any solution $s_n \in N(s_l)$, $F(s_l) \ge F(s_n)$. A global optimum occurs at a point $s_g \in S$ if for all $s \in S$, $F(s_g) \ge F(s)$. A plateau is defined as a set $P \subseteq S$ such that for all $s_p \in P$, $F(s_p) = k$, where $k$ is a constant. (A technical condition for ensuring connectedness is also needed, but it will not concern us here.) A plateau indicates that the landscape is neutral, and the progress of a gradient-based search algorithm, such as an evolutionary algorithm, potentially stagnates. Counteracting such stagnation requires special measures (see, for example, Barnett [13]).
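The definitions of local and global optima translate directly into predicates. The sketch below uses a hypothetical one-dimensional toy landscape, introduced here purely for illustration, in which each point's neighbours are the adjacent integers.

```python
def is_local_optimum(s, fitness, neighbours):
    # F(s) >= F(t) for every neighbour t in N(s), assuming maximisation
    return all(fitness(s) >= fitness(t) for t in neighbours(s))


def is_global_optimum(s, fitness, space):
    # F(s) >= F(t) for every point t in the search space S
    return all(fitness(s) >= fitness(t) for t in space)


# Hypothetical toy landscape on the points 0..4.
values = [1, 3, 2, 5, 4]
space = range(len(values))


def fitness(s):
    return values[s]


def neighbours(s):
    return [t for t in (s - 1, s + 1) if 0 <= t < len(values)]


local_optima = [s for s in space if is_local_optimum(s, fitness, neighbours)]
global_optima = [s for s in space if is_global_optimum(s, fitness, space)]
```

In this toy landscape the local optima occur at points 1 and 3, and only point 3 is a global optimum.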
Landscape modality also figures into problem hardness. Modality is a feature of fitness landscapes that encompasses the number of local optima, the distribution of the points where they occur, and the nature of their respective basins of attraction [14]. In a search space $S$ equipped with a neighbourhood relation and a local optimum $s_l$, the basin of attraction for $s_l$ is defined as the set of all $s \in S$ such that there is a hill climb starting at $s$ that ends at $s_l$. More precisely, a path in $S$ is a finite sequence $s_1, \ldots, s_k$ in $S$ such that $s_{i+1} \in N(s_i)$ for $1 \le i < k$, and a hill climb from $s$ to $s_l$ is a path such that $s_1 = s$, $s_k = s_l$ and $F(s_{i+1}) \ge F(s_i)$ for $1 \le i < k$. The number of basins of attraction and their relative sizes in a multi-modal landscape have been found to determine how difficult it is for a gradient-based search algorithm to find a global optimum among all the local optima it encounters (see Horn [15]). While on one hand finding global optima in unimodal problems can be difficult if plateaus dominate the landscape, on the other hand in some highly multi-modal landscapes it can be easy to find global optima for both hill-climbing algorithms and evolutionary algorithms if, for example, the modes themselves "lean" towards a global optimum.
Ruggedness is another feature that has been found to affect the performance of gradient-following algorithms. An optimisation problem is considered easier to solve using either local search or an evolutionary algorithm if highly correlated parts of the landscape form easy-to-follow gradients to the optima [16]. As mentioned previously, in rugged landscapes neighbouring solutions have uncorrelated fitnesses, which makes it harder for a search method to infer a search direction from previous solution quality. When the landscape is smoother and the correlation between the fitnesses of neighbouring solutions is high, there are persistent gradients (i.e., long paths) for the solver to follow. Because there is little correlation between neighbouring solutions, gradients in a rugged fitness landscape are not persistent which, in turn, suggests numerous local optima.

VII. DISCUSSION -PART 2
The literature in the previous section on theoretical and empirical investigations of fitness landscapes and their relationship to optimisation problems is focused on problems without constraints. It provides a backdrop for providing further insight into our examples. Our search space $S$ is a finite set of $n$-tuples and the neighborhood relation of interest is 1-point mutation, so $N(s) = \{P_{i,j}(s) \mid 1 \le i \le n, 1 \le j \le m\}$. This equips $S$ with the edit distance metric, where two points are a distance $k$ apart if they differ in exactly $k$ positions or, in our notation, if one can be transformed into the other using a $k$-fold composition of 1-point mutations.
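A short sketch of the neighbourhood relation and the associated distance may help fix ideas. The function names are hypothetical. Note that, as defined, $N(s)$ contains $s$ itself, since $P_{i,j}$ leaves $s$ unchanged when $j = a(i)$.

```python
def edit_distance(a, b):
    # number of positions in which two equal-length genomes differ
    return sum(x != y for x, y in zip(a, b))


def neighbourhood(s, m):
    # N(s): all genomes P_{i,j}(s) for 1 <= i <= n and 1 <= j <= m
    return {
        s[:i] + (v,) + s[i + 1:]
        for i in range(len(s))
        for v in range(1, m + 1)
    }
```

For `s = (2, 2, 2)` and `m = 3`, the neighbourhood has seven elements: `s` itself and the six genomes at edit distance one. The genomes `(2, 1, 2)` and `(1, 1, 3)` are at edit distance two.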
For our minimal counterexample the high quality solutions $Q$ of the form $(1, *, 3)$ and the low quality solutions $B$ of the form $(2, *, 2)$ are plateaus. Their genomes can be viewed as parallel lines in the $3 \times 3 \times 3$ lattice cube. These lines are edit distance two apart. The four parallel lines that are edit distance one from $(2, *, 2)$ (viz. $(2, *, 1)$, $(2, *, 3)$, $(1, *, 2)$, $(3, *, 2)$) all lie in the infeasible region $I$. Figure 3 shows a schematic of this. More importantly, in all of our examples $k$-fold crossover is closed on both $Q$ and $B$. That is, every $k$-fold crossover operator takes $Q \times Q$ to itself and $B \times B$ to itself. Hence searching in any direction from $B$ does not reach genomes in $Q$ unless search operators such as 2-fold mutation operators (e.g., 2-point mutation) or doubly transitive permutation operators (e.g., 2-fold transposition or a transposition composed with a 1-point mutation) are introduced.
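The closure of $Q$ and $B$ under crossover can be verified by enumeration for small parameters. The sketch below, with hypothetical names, checks a slightly stronger property: it enumerates every child that takes each component from one of the two parents at the same position, a set that includes all $k$-point crossover children.

```python
from itertools import product


def in_Q(a, m):
    return a[0] == 1 and a[-1] == m


def in_B(a, m):
    return 1 < a[0] < m and 1 < a[-1] < m


def mask_children(a, b):
    # Each child takes every component from one parent at the same position.
    # Every k-point crossover child arises from some mask, so this set is a
    # superset of the children of all k-point crossovers.
    for mask in product((0, 1), repeat=len(a)):
        yield tuple(x if bit == 0 else y for x, y, bit in zip(a, b, mask))


def closed_under_crossover(member, m, n):
    # True if every child of two genomes in the class stays in the class
    pool = [a for a in product(range(1, m + 1), repeat=n) if member(a, m)]
    return all(
        member(c, m) for a in pool for b in pool for c in mask_children(a, b)
    )
```

Both `closed_under_crossover(in_Q, 4, 4)` and `closed_under_crossover(in_B, 4, 4)` evaluate to `True`, because the first and last components of a child always come from a parent in the same class.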
Our three scaled examples increase the size of $B$ relative to $Q$ and also do a better job of filling out the search space with feasible solutions, meaning that as $m$ and $n$ increase, the ratio of the size of $Q \cup B$ to the size of $I$ increases. Genomes in $Q$ and $B$ continue to remain at least edit distance two apart. In example IV-B, $Q$ and $B$ remain plateaus. In example IV-C, $Q$ remains a plateau while $B$ becomes a basin of attraction for the local minimum $(t, t, \ldots, t)$ where $t = \lfloor (m+1)/2 \rfloor$. In example IV-D, $B$ remains a plateau while $Q$ has a global maximum occurring when the genome is $(1, m, \ldots, m)$. The situation in IV-D is a bit more complicated than that. There is a putative local maximum of $1/2$ at all genomes of the form $(1, Z, *, \ldots, *, m)$ except $(1, m, \ldots, m)$, where $2 \le Z \le m$, and a basin of attraction for one of them has been "punctured" in such a way that $(1, m, \ldots, m)$ yields the isolated global maximum. Since $B$ is a plateau every genome in $B$ yields a local maximum, so another way to phrase what is happening is to say that all the local maxima, putative or otherwise, of $Q$ save one are smaller than all the local maxima of $B$.
Finally, if our quest was for a minimal counterexample, the reader may wonder why we didn't use just the constraint $a(1) = 1$ if and only if $a(3) = 3$ which, using our lines notation, would enlarge the pool of base solutions from $B = \{(2, *, 2)\}$ to $B = \{(2, *, 2), (2, *, 1), (3, *, 1), (3, *, 2)\}$. There are two reasons. First, this would admit the possibility of the transposition operator $T_{1,3}$ lifting genomes of the form $(3, *, 1)$ from $B$ to $Q$. Second, this would increase the number of "subspaces" invariant under crossover so that populations with genomes restricted to $B$, any one of the lines in $B$, or any pair of lines in $B'$ that agree in one coordinate (e.g., $\{(2, *, 2), (3, *, 2)\}$) would all be invariant under crossover.

VIII. CONCLUSION
We have considered how pathological fitness landscapes affect the success of evolutionary algorithms in finding global optima in constrained optimisation problems. We formulated examples to show how constraints can shape fitness landscapes in such a way that regions of high quality solutions become unreachable from regions of lower quality solutions when using the standard search operators. Our examples stem from the software deployment problem. We presented a minimal counterexample and generalized it to provide several examples with real-world parameters as well as a more plausible narrative for the problem instances. The unexpected byproduct is that our examples provide a compelling argument for including iterated transposition and iterated point mutation among the set of search operators when using evolutionary algorithms to find solutions to highly constrained optimization problems.

Fig. 1. A typical presentation slide for visualizing what the underlying fitness landscape for a combinatorial optimisation problem might look like.

Fig. 3. A schematic showing the endpoints in the xz-plane of the nine parallel lines of the form $(x_0, *, z_0)$ for the $n = m = 3$ case. The filled circles are the two lines of feasible solutions (upper left is $Q$, center is $B$). The squares are the seven lines of infeasible solutions. The filled squares show the four lines that are edit distance one from $B$.

TABLE II. DECOMPOSITION OF THE SET I OF INFEASIBLE n-TUPLE GENOMES INTO DISJOINT SUBSETS. X AND Y ASSUME VALUES BETWEEN 2 AND m − 1, WHILE * IS A WILD CARD CHARACTER INDICATING THAT ANY VALUE BETWEEN 1 AND m INCLUSIVE IS ALLOWED.