The Revised Stochastic Simplex Bisection Algorithm and Particle Swarm Optimization

—The stochastic simplex bisection (SSB) algorithm is evaluated against particle swarm optimization (PSO) on a prominent test set. The original SSB algorithm performs on par with the PSO algorithm and a revised version of the SSB algorithm outperforms both of them. Detailed analysis of the performance on select objective functions brings to light key properties of the three algorithms. The core SSB algorithm is here viewed as a sampling tool for an outer loop that employs statistical pattern recognition. This opens the door for a host of other schemes.


I. INTRODUCTION
S TOCHASTIC optimization [1] involves introducing some degree of randomness, when searching for optima.Particularly successful approaches include genetic algorithms [2], particle swarm optimization [3], and ant-colony optimization [4].Such schemes are often applied to continuous optimization problems, especially when the gradient and Hessian of the objective functions are not readily available.In addition to this they have proven effective when optimizing highly multimodal functions, i.e., function with a large number of optima.
We here compare the performance of the stochastic simplex bisection (SSB) algorithm-first proposed in [5] and first evaluated against other optimization algorithms in [6]-with a particle swarm optimizer (PSO).The former employs a common stochastic optimization scheme, but unlike other stochastic approaches, it applies the scheme to search space regions, rather than to individual points.The latter is a wellknown global optimization scheme, which has spawned a number of related schemes, such as firefly optimization [7] and fish school search [8].This is the first time the SSB algorithm is tested against another state-of-the-art global optimizer.
Two versions of the SSB algorithm were tested: the original one from [5], [6] and a new one, which uses the same core SSB algorithm and a modified outer loop that clusters a shrinking set of arguments with relatively low function values.
The rest of the article is organized as follows.Section II describes the new and old SSB algorithms.Section III details the PSO optimizer used.Section IV discusses the experimental setup, and reports and analyses the findings.

II. THE STOCHASTIC SIMPLEX BISECTION ALGORITHM
Consider the simple problem where we wish to minimize f (x), which is strictly convex on R. Assume further that we have found x 1 < x 3 < x 5 , where f (x 1 ) ≥ f (x 3 ) ≤ f (x 5 ), i.e., that we have found an interval that has an interior point x 3 with a smaller function value than its end points. 1lgorithm for Convex Functions Given x 1 < x 3 < x 5 with f (x 1 ) ≥ f (x 3 ) and f (x 3 ) ≤ f (x 5 ) Recurse on x 1 , x 2 , x 3 4: else if f (x 4 ) ≤ f (x 3 ) then 5: Recurse on x 3 , x 4 , x 5 6: else 7: Recurse on x 2 , x 3 , x 4 8: end if We thus recurse on the subinterval that has an interior point with a smaller function value than its end points. 2

A. The Core SSB Algorithm
The SSB algorithm generalizes this to non-convex functions in n dimensions.An interval is generalized to a simplex, rather than to a hyperbox, The latter has 2 n corners, and bisecting it requires computing the function value in 2 n−1 new corners.The former has only n + 1 corners, and bisecting it only requires calculating the function value in one new point.
Core SSB Step Given a set {T k ′ } of non-overlapping simplexes, 1: Select the next simplex T k to bisect at random with probability 2: Select a bisection point at random roughly in the middle of the longest edge of T k .3: Bisect T k , yielding two new simplexes.4: Replace T k in {T k ′ } with its two offspring.The simplex score s k , which is defined in Section II-A1 below, requires the function value in the simplex midpoint.Thus only three new function values need be calculated for each bisection: that of the bisection point and those of the midpoints of the two new simplexes, see Section II-A2.
The core SSB algorithm starts with a simplex T 0 and maintains a partition of it by repeatedly performing the core SSB step, until some termination criterion is met.It then returns the best point found thus far.The algorithm is complete in the sense that no portion of the search space is ever discarded, and it avoids redundancy by using non-overlapping simplexes.These are two common pitfalls of stochastic optimization.The algorithm still reaps the benefits of stochastic search in exploring more promising regions earlier, in average, while granting also less promising regions a non-zero chance of being explored.
1) The Simplex Score: We define the simplex scores s k as follows.Let } be a collection of ndimensional simplexes with (dropping the index k for clarity), where x is the midpoint and f is the average function value over the corners and the midpoint.We then set which is a combined measure of the lowest function value f − and an estimate δ of how much it might potentially decrease, judging by the average function value f and this lowest value.We next make f ⋆ offset-and scale-invariant through minmax normalization, using the lowest function value f vb in the very best point found this far, and some relatively high, fixed function value f w found early in the search.
Thus, all simplexes must be rescored whenever a new very best point is found.As this happens rather seldom, it incurs very little overhead in practice.The simplex score s is then defined as where l is the length of its longest edge: The parameter λ 0 , which defaults to one, controls the tradeoff between exploration and exploitation.A small λ 0 will make the simplexes with different f ⋆ more equiprobable, thus promoting exploration, a large one will do the opposite.
2) Simplex Bisection: Each round only creates two new simplexes: the longest edge of the selected simplex is bisected.All corners remain the same, save one of the two connected by this edge.Let these corners be x (i) and x (j) .The edge bisection point x′ -the new corner-is also randomized to counter-act any symmetries of the objective function f (x) in its argument x: x′ = (0.5 + θ)x (i) + (0.5 − θ)x (j) for θ ∼ U(−α, α) An empirically good choice is α = 0.05 (max 10% randomness).x′ replaces x (i) in one offspring simplex and x (j) in the other one.Thus only three new function values need be calculated for each bisection: f (x ′ ) and the function values of the midpoints of the two new simplexes.

B. Outer Loop: Maintaining a Hyperbox
It is often desirable to start from a hyperbox.For example, constraints often take the form of bounds on the individual variables of each dimension.The tested SSB algorithms repeatedly restart an SSB algorithm from a hyperbox created in a previous iteration.They use an outer loop over epochs that maintains the hyperbox, and an inner loop over rounds, each of which performs the core SSB step.Note that although partitioning a hyperbox into a set of simplexes is trivial in two dimensions, it is a challenging and time-consuming task in higher ones, see [9] and [10].
The SSB algorithms thus consist of two nested loops.The outer loop over epochs maintains a hyperbox.Each epoch runs the inner loop over rounds, where each round contains one simplex bisection, which replaces one simplex with two new ones.Each epoch is divided into two phases, see below.In the first phase, the simplexes are processed as a first-in-first-out (FIFO) queue to create an initial grid.In the second phase, the next simplex to bisect is selected as in the core SSB step: at random, with probabilities proportional to the simplex scores.
Inner Loop Given A hyperbox H. Run Core SSB step, but select as T k the first simplex of the FIFO queue and add its offspring to the end of the queue.end if {This yields three new objective function val- Rescore all simplexes. 13: end if For f w , we use the highest function value in a corner of the bounding box.The manner in which the hyperbox is modified after each epoch depends on the sequence of points sampled.

1) Old Scheme:
The sequence order matters in the old scheme, and a single hyperbox is created.To this end, we need some terminology.A best point is any point found during the second phase of an epoch that is the best this far in that epoch.The best points thus start over each epoch.The very best point, on the other hand, is the globally best point found in any round of any epoch.Its objective function value is f vb .The first phase of each epoch consists of the first quarter of its rounds.The rest of its rounds constitute the second phase.Other choices than one quarter have been tested and found effective in other settings, see [5], [6].
Initializing other variables 1: Let f b be the best function value found when scoring the simplexes partitioning the hyperbox.2: BestPoints ← ∅ ; Updating other variables Add x a/b/c to BestPoints.

5:
end if 6: end if If the elapsed epoch had enough best points, the next hyperbox is one that contains all best points and the very best point as interior points.It turns out that the simple scheme of updating the lower and higher bounds of the hyperbox in each dimension, for each new best point, works well in practice.Otherwise, the previous hyperbox is increased in size and recentered around the current very best point.In the tested SSB algorithm, the box is padded by 100 percent of the interval length in each dimension, when there are enough best points, and the old interval length is quadrupled, when there aren't.
Creating a new hyperbox Multiply the length of each side of H by some fixed margin factor (e.g., 1.1).5: else 6: Set the length of each side of H to a fixed factor (e.g., 2) times that the old hyperbox.if the side length is zero then 10: Set that side length of H to a fixed factor (e.g., 2) times that of the old hyperbox.

11:
end if 12: end for The old scheme loop may seem somewhat ad hoc.It captures the idea, that if there has been non-trivial local improvements, search should focus on these improvements, yet also consider the globally best point found.We note that any new best point must be a bisection point, or the midpoint of a simplex, with a lower function value than its corner points.In one dimension, these cases coincide.For convex functions, such an interval must contain the minimum, which our introductory algorithm exploits.For non-convex functions, or in several dimensions, the region surrounding such a point merits further investigation.
2) New Scheme: The next hyperbox is in the new scheme created by drawing inferences from the set of sample points using statistical pattern recognition techniques.In its current incarnation, it selects all sample points with a function value below a certain threshold.We call these low points.To generate more low points, they are clustered using k-means clustering, see, e.g., [11], pp.424-430.Points far away from the cluster means are pruned from the clusters.A hyperbox is then created for each cluster and the core SSB algorithm is run on this hyperbox.This will add new sample points and thus presumably additional low points, and is done to search a region containing this cluster for an optimum.In addition, the midpoints between clusters are evaluated and added as sample points.This is done to encourage merging clusters that in fact belong to the same optimum.Since sample points are retained between epochs, their number increases steadily.
The number of rounds in each epoch, where each round requires three function evaluations, is constant and distributed between running the core SSB algorithm on a hyperbox derived from the entire set of low points and on the hyperboxes derived from individual clusters.We found that using half of the rounds for each purpose, and thus one half divided by the number of viable clusters for each of the latter, was appropriate.Thus, compared to the old scheme, fewer rounds are used for global exploration.
The final hyperbox used in the next epoch contains the set of low points.To gradually reduce it in size, the threshold is reduced from epoch to epoch to shrink the relative number of low points.In the first epoch, n i points, say all points, are included; in the last one, only n f , say three points, are.For intermediate epochs we calculate the rank of the highest function value to be included as It is clear that r(0) = n i and that r(1) = n f , independent of the power that x is raised to; four was found empirically to be a good choice.This successively reduces the number of low points, and hopefully forces the algorithm to eventually commit to one cluster containing the global optimum.Thus, the algorithm initially performs global exploration, but gradually resorts to searching a shrinking number of low points and clusters, and eventually focuses on exploiting a small number of low points and very few-if not a single-clusters, to converge on a potential global optimum.
This is an example of using a level curve-aka a contour curve, called a level (hyper-)surface in higher dimensions-of the function.In this case, we select all points inside it.Current CHRISTER SAMUELSSON: THE REVISED STOCHASTIC SIMPLEX BISECTION ALGORITHM AND PARTICLE SWARM OPTIMIZATION work includes applying statistical recognition of level surfaces to identify regions that warrant further investigation, as well as the use of more sophisticated clustering techniques.

C. Related Work
The SSB algorithm uses a typical stochastic optimization scheme.It maintains a set of elements, each with a positive score; randomly selects some elements based on the scores; uses these elements to explore the search space, often creating new elements in the process; and updates the set of elements and their scores according to the findings.The scheme is however here applied to regions of the search space, not to points in it, as in, e.g., PSO [12], shuffled complex evolution [13], covariance matrix adaptation evolution strategy [14], controlled random search [15], differential evolution [16], and firefly optimization [7].
Convergent optimization via most-promising-area stochastic search (COMPASS) [17] sounds similar, but it is a technique for discrete optimization via simulation, where "there is no explicit form of the objective function, and function evaluations are stochastic and computationally expensive."Predictably, it always samples the most promising area next, rather than select it probabilistically using scores, and stochastic search refers to uniform sampling within that area.
The simplex method [18], Chapter 9, doesn't actually use simplexes.To create new points, controlled random search [15] generates random simplexes.These may overlap and are not guaranteed to cover the search space.Nor are they subdivided.DIRECT [19] uses hyperboxes that partition the search space.It avoids the 2 n complexity by directional search from their midpoints, ignoring their corners.Each hyperbox potentially containing the global minimum is trisected, rather than bisected, along each dimension in turn.There is no randomized selection.
Whereas [20] uses stochastic optimization to improve kmeans clustering by avoiding local optima, the new SSB scheme applies k-means clustering to data generated by the core SSB algorithm to distinguish different optima and coordinate disparate data points corresponding to the same one.

III. THE PARTICLE SWARM OPTIMIZER
The employed PSO optimizer follows [1], p 121, with a few modifications.
The swarm consists of a number, typically 20-40, of boids-a word play on bots, birds, and droids-each possessing a position x and a velocity v.The objective function value f (x) is also recorded.In each iteration, the position and velocity of each boid are updated as follows.
1 is the best point historically of the boid in question, while x b 2 is the best current point of all boids. 4Thus, x b i − x is the vector from the current point x to the best point x b i . 4Best here means "with the lowest objective function value." The idea is that the velocity v is attracted to both these two best points, with acceleration, or attraction coefficients, c i u i .The first of these two terms of the velocity update is called the cognitive component, depending only on the history of the individual boid, and the second one is called the social component, depending on the swarm as a whole.
In [1], p 121, u i is in fact a random scalar u i , drawn uniformly from [0, 1], and the multiplication is scalar multiplication of the difference vector.This guarantees that this acceleration term is along the line connecting x and x b i .We instead use the Hadamard product, i.e., the component-wise product, denoted •, and let u 1 and u 2 be vectors where each element is independently uniform on [0, 1].This is the method used in [21], p 219.The acceleration term is then in a cone or pyramid centered around x b i − x.We set both the two scalars c 1 and c 2 to 2, which is standard practice and which means that each component of c i u i has expectation 1.
The first term, θv, where θ is called the inertia weight, is simply the previous velocity for θ = 1.With θ > 1, this term accelerates the boid in the direction it is currently travelling, preferring exploration over exploitation.With θ < 1, it instead dampens the velocity, making it pay more heed to the cognitive and social acceleration components, thus preferring exploitation over exploration.It thus makes sense to start with an inertia weight above one, and then successively reduce it, ending the search with an inertia weight below one.To this end, we use dynamic inertia weighting to bring down θ geometrically from 1.4 in the first iteration to 0.3 in the last, cf.[1], pp 127-128.
As the velocities are initially self-accelerating, it is important to restrain them.Each velocity component is capped to have an absolute value that does not exceed a given maximum v max .This is another key parameter for balancing exploration and exploitation.The boids must stay within the feasibility domain, which is here a bounding hyperbox.Any boid that attempts to leave it has the offending x coordinate set to the boundary point.The corresponding v coordinate is set to the negative of its value.This leads to hard (elastic) reflection in the boundary, much like billiard balls.

A. Experimental Setup
We tested all two-dimensional objective functions of Figure 1 (penultimate page), most of which are from [22].Function 0 comes from [5], Function 20 from [23], and Functions 21, 22, 24, and 25 are of our own design.All functions have unique global minima, except Function 0, due to symmetry in x and x + y, and Functions 12, 14, and 17, which have four global minima, due to symmetry in ±x, ±y.Functions 16 and 17 were corrected using [24].
We tested the optimizers on the three domains: were made asymmetric in x and y, since many test functions have their global minimum in x = 0.In each trial, each optimizer was allowed 50 000 function evaluations.Success was defined as finding any argument with a function value within 10 −13 of the known global minimal value.The optimal value of v max was determined for the PSO in each test domain.It used 20 boids and 2500 iterations to achieve the limit of 50 000 function evaluations.Fewer iterations result in failure to converge to within 10 −13 of the minimum, whereas fewer boids fail to explore the search space effectively.Both SSB schemes used 40 epochs of 415 rounds each (40 × 415 × 3 = 49 800), which deviates from the 60 epochs and 276 rounds (60 × 276 × 3 = 49 680) used in [5].This was done to accommodate the revised SSB scheme, where each epoch requires running the core SSB algorithm on a set of hyperboxes, one for each cluster, in addition to the hyperbox of the current epoch, see Section II-B, New Scheme.

B. Experimental Results
Table I shows their respective success rates in 1000 trials.We first and foremost note that the original SSB algorithm holds its own against the PSO algorithm and that the revised SSB scheme performs significantly better.
Ignoring ties, the revised SSB scheme is better than the PSO algorithm in 10 cases of 13 in the smallest domain and 10 of 14 cases in the other two domains.The pairwise sign test [25], which is not a very powerful test, yields p-values of 0.046 in the smallest domain, and 0.09 in the other two domains.Similarily, the revised SSB scheme is better than the original SSB scheme 18 times of 20 in the smallest domain and 15 times of 20 in two larger domains, yielding yields pvalues of 0.0002 and 0.021, respectively.We used the sign test, rather than, say, the Wilcoxon signed-rank test [26], which is more powerful, since the objective functions vary greatly in difficulty.Thus, the difference pairs cannot be seen as drawn from the same distribution.
To further analyze these results, we need to consider the nature of each test objective function.These can be classified into: • unimodal quadratic forms, Fcns 2, 6, 8; • oligo-modal5 polynomials, Fcns 3, 4, 5, 10, 18; • multimodal damped trigonometrics, Fcns 0, 1, 12-17, 25; • mixed trigonometrics and quadratics, Fcns 9, 11, 20, 24; • other (unimodal) functions, Fcns 7, 21, 22.The first group is simple there to see that the optimizers are sane; any deviation from a perfect score signals a fundamental problem.Functions 9, 10, 16, and 21 behave similarly.For these functions, the PSO and the new SSB perform flawlessly, whereas the old SSB fails in a small number of the trials.Inspection reveals that the reason for this is that the old SSB prematurely commits to a hyperbox that does not contain the global optimum.Subsequent epochs tend to repair this by increasing the hyperbox size, but this does not happen fast enough to allow convergence to within 10 −13 of the optimum value.Conversely, this behavior pays dividends in the largest domain and for the hardest functions-0,11,and 14-where the old SSB algorithm prevails.
Function 7 is very hard, and defeats all optimizers.It has a concave valley, kinks, and a very anisotropic variable coupling.The gradient is ill-defined and unbounded in the valley bottom, and especially ill-behaved in the optimum.It is included as a reminder that our successes are relative.Memento mori.
Functions 1,11,14 are essentially constant at some distance to the optimum.This makes it harder for the exploration aspect of the algorithms to locate the target optimum.As the domain size increases, performance drops from full score, for at least some optimizer, to essentially total failure for all of them.Functions 12, 14, and 17 have multiple global optima due to symmetries.This profits the old SSB scheme and leaves the PSO scheme in the dust.In conjunction with the very localized variation of function 14, this defeats the new SSB scheme, as it does not have enough rounds for exploration.
The only functions for which the old SSB scheme does better than the new one are Functions 0, 4, 12, 14, and 17.We have just discussed the latter three, which have multiple global optima due to symmetries.Function 0 has very many local optima and the only real remedy is exploration.Function 4 is interesting in that it is a case where the clustering fails to some extent.Better clustering methods raise the success rate from 0.435 to 0.864.This is however beyond the scope of the current article.

V. SUMMARY AND CONCLUSIONS
We evaluated the stochastic simplex bisection (SSB) algorithm against a particle swarm optimizer (PSO) on a prominent test set.The former employs a common stochastic optimization scheme, but unlike other stochastic approaches, it applies the scheme to search space regions, rather than to individual points.The latter is a well-known workhorse for stochastic optimization.This is the first evaluation of the SSB algorithm against a state-of-the-art global optimizer.
The original SSB scheme holds its own against the PSO.The revised SSB scheme is better at exploitation than the old one, allowing it to significantly outperform both the PSO and old SSB schemes in all three domains.
The key difference between the new and original SSB schemes is that the new one applies statistical pattern recognition to the data points sampled using the core SSB algorithm.This opens the door for a host of other schemes that view stochastic optimization not as a random walk, but as statistical inference.Current work includes using more sophisticated statistical pattern recognition techniques to identify regions that warrant further investigation.
The PSO algorithm has been in extensive use since 1995, and it comes with a large body of experience and knowhow.The SSB algorithm was first published in 2015 and it is still very much evolving.It already outperforms the PSO algorithm.There is every reason to expect rapid progress in its performance.Objective Functions Fcns 21, 22, 24, 25 are novel; Fcn 0 from [5]; Fcn 20 from [23]; remainder from [22].Fcns 16 and 17 corrected using [24].

Fig. 1
Fig. 1 List of objective functions 7: end if 8: for all sides of the new hyperbox H do

TABLE I SUCCESS
RATE IN 1000 TRIALS OF THE PSO THE SSB ALGORITHMS.