New Measures of Algorithms Quality for Permutation Flow-Shop Scheduling Problem

The permutation flow-shop scheduling problem (PFSP) is an important problem in the production industry. It has been the subject of much research, and various algorithms for solving it have been developed over the years. Newly developed algorithms are usually tested on the Taillard and VRF benchmarks, and their results are compared using various measures that assess the size of the error made by an algorithm and its computation time. In this paper, we propose two new measures to assess the quality of results of algorithms for solving PFSP with the makespan criterion. The first measure, ARD.NEH, gives results similar to those of the well-known ARPD measure but is robust to updates of the best known solutions of benchmark problems. The second measure, ARID, is an interval-based measure that can assess whether the good quality of an algorithm's results stems from good behavior on a few instances or on most instances. The computational experiments confirm the usefulness of the proposed quality measures.


I. INTRODUCTION
THE permutation flow-shop scheduling problem (PFSP) is one of the most studied combinatorial optimization problems, rooted in the manufacturing industry. It can be defined as follows: given a finite set of $m$ machines $\{M_1, \ldots, M_m\}$ and a finite set of $n$ jobs $\{J_1, \ldots, J_n\}$, each of which must go through all $m$ machines in the same order, the goal is to order the jobs so as to minimize the assumed optimization criterion (e.g., makespan, total tardiness, flow time, cost, energy consumption).
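To make the makespan criterion concrete, the following minimal Python sketch evaluates a job permutation. The data layout `p[j][k]` (processing time of job `j` on machine `k`), the function name `makespan` and the sample data are assumptions made for illustration; they do not come from the paper.

```python
from typing import Sequence


def makespan(p: Sequence[Sequence[int]], order: Sequence[int]) -> int:
    """Completion time of the last job on the last machine.

    p[j][k] is the processing time of job j on machine k (hypothetical layout).
    """
    m = len(p[0])
    done = [0] * m  # done[k]: completion time of the latest scheduled job on machine k
    for j in order:
        prev = 0  # completion time of job j on the previous machine
        for k in range(m):
            # Job j starts on machine k when both the machine and the job are free.
            prev = max(prev, done[k]) + p[j][k]
            done[k] = prev
    return done[-1]


# Made-up instance: 3 jobs, 2 machines.
p = [[3, 2], [1, 4], [2, 1]]
print(makespan(p, [0, 1, 2]))  # 10
print(makespan(p, [1, 0, 2]))  # 8
```

The two calls show that the order of the jobs alone changes the makespan, which is what a PFSP algorithm tries to exploit.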
The PFSP with the makespan criterion, commonly referred to as $Fm|prmu|C_{\max}$ [1], is undoubtedly the most frequently investigated scheduling problem. Garey and Johnson [2] proved that $Fm|prmu|C_{\max}$ is NP-hard if $m \geq 3$. Therefore, various heuristics have been developed to solve this problem in a reasonable amount of time. Among them, the Nawaz, Enscore and Ham (NEH) construction heuristic [3] plays an important role; for a long time NEH has been regarded as the best heuristic for solving $Fm|prmu|C_{\max}$.
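For readers unfamiliar with NEH, the sketch below follows its usual textbook description: jobs are sorted by non-increasing total processing time and then inserted one by one at the position that yields the smallest partial makespan. It is a plain, unaccelerated version; the tie-breaking rule (first best position), the data layout `p[j][k]` and the sample data are assumptions, not details taken from the paper.

```python
from typing import List, Sequence


def makespan(p: Sequence[Sequence[int]], order: Sequence[int]) -> int:
    """Makespan of a job order; p[j][k] is the time of job j on machine k."""
    done = [0] * len(p[0])
    for j in order:
        prev = 0
        for k in range(len(done)):
            prev = max(prev, done[k]) + p[j][k]
            done[k] = prev
    return done[-1]


def neh(p: Sequence[Sequence[int]]) -> List[int]:
    """Unaccelerated NEH sketch: priority order followed by best insertion."""
    # Step 1: sort jobs by non-increasing total processing time (stable sort).
    jobs = sorted(range(len(p)), key=lambda j: -sum(p[j]))
    seq: List[int] = []
    # Step 2: insert each job where the partial makespan is smallest.
    for j in jobs:
        candidates = [seq[:pos] + [j] + seq[pos:] for pos in range(len(seq) + 1)]
        seq = min(candidates, key=lambda s: makespan(p, s))  # first minimum wins ties
    return seq


# Made-up instance: 3 jobs, 2 machines.
p = [[3, 2], [1, 4], [2, 1]]
order = neh(p)
print(order, makespan(p, order))
```

Each insertion step here recomputes the makespan from scratch; faster implementations avoid this, but the resulting schedule is the same for a fixed tie-breaking rule.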
Since optimal solutions are not known for some instances, the only way to assess the results of new methods is to compare them with the best solutions known so far. The well-known measure of solution quality, initially referred to as the increase over optimum (IOO) [4] and later as the relative percentage deviation (RPD) [5], is defined as:
$$\mathrm{RPD} = \frac{S - \mathit{Best}}{\mathit{Best}} \cdot 100\%, \tag{1}$$
where $S$ is the solution of the evaluated algorithm and $\mathit{Best}$ is the best solution known so far for a given instance of the problem. For a group of instances, a synthetic solution quality measure, called the average relative percentage deviation (ARPD), is calculated as:
$$\mathrm{ARPD} = \frac{1}{I}\sum_{i=1}^{I}\frac{S_i - \mathit{Best}_i}{\mathit{Best}_i} \cdot 100\%, \tag{2}$$
where $I$ is the number of instances, $S_i$ is the solution of the evaluated algorithm on the instance $i$ of a given size, and $\mathit{Best}_i$ is the best solution known so far for this instance.
The quality of solutions is obviously not the only aspect of algorithm evaluation; the running time is also an important feature (we often face a trade-off between the quality of results and the computational time). A review of the literature shows that the computational time is often reported in time units (usually milliseconds) [6]; sometimes, especially for simpler algorithms, the computational complexity is provided instead. Given several algorithms to be compared and various instances, the computational effort is usually measured by the average CPU time (ACPU), computed as follows:
$$\mathrm{ACPU}_j = \frac{1}{I}\sum_{i=1}^{I}\mathit{CPU}_{i,j}, \tag{3}$$
where $\mathit{CPU}_{i,j}$ is the CPU time consumed by algorithm $j$ on instance $i$. However, the running time of scheduling algorithms strongly depends on the size of the problem instance; therefore, Fernandez-Viagas and Framinan [7] proposed to measure the average relative percentage time (ARPT) consumed by algorithm $j$:
$$\mathrm{ARPT}'_j = \frac{1}{I}\sum_{i=1}^{I}\mathrm{RPT}_{i,j}, \tag{4}$$
where $\mathrm{RPT}_{i,j}$ (the relative percentage computation time of algorithm $j$ for instance $i$) is computed as:
$$\mathrm{RPT}_{i,j} = \frac{\mathit{CPU}_{i,j} - \mathrm{ACPU}_i}{\mathrm{ACPU}_i}, \tag{5}$$
with $\mathrm{ACPU}_i$ denoting the average CPU time of all compared algorithms on instance $i$. Since $\mathrm{ARPT}'_j$ can yield negative values ($\mathrm{ARPT}'_j > -1$), Fernandez-Viagas and Framinan [8] proposed to compute $\mathrm{ARPT} = \mathrm{ARPT}' + 1$, which allows the results to be plotted on a logarithmic scale.
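The measures described above can be sketched in a few lines of Python. The sketch assumes that RPD is the deviation from the best known solution relative to that solution, in percent, and that the relative time of an algorithm on an instance is taken with respect to the mean CPU time of all compared algorithms on that instance. The function names and all numbers are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def rpd(s: float, best: float) -> float:
    """Relative percentage deviation of one solution from the best known one."""
    return 100.0 * (s - best) / best


def arpd(solutions: Sequence[float], bests: Sequence[float]) -> float:
    """Average RPD over a group of instances."""
    return mean(rpd(s, b) for s, b in zip(solutions, bests))


def arpt(cpu: Sequence[Sequence[float]]) -> List[float]:
    """ARPT_j = ARPT'_j + 1 for every algorithm j; cpu[i][j] is the time of j on i."""
    n_alg = len(cpu[0])
    out = []
    for j in range(n_alg):
        # RPT of algorithm j on instance i, relative to the mean time on that instance.
        rpt = [(row[j] - mean(row)) / mean(row) for row in cpu]
        out.append(mean(rpt) + 1.0)
    return out


# Made-up makespans and CPU times, for illustration only.
print(arpd([110, 210], [100, 200]))    # 7.5
print(arpt([[1.0, 3.0], [2.0, 6.0]]))  # [0.5, 1.5]
```

Note that the ARPT values depend on which algorithms are included in the comparison, since the per-instance mean changes whenever an algorithm is added or removed.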
The above-described features of ACPU and ARPT make these two measures not very authoritative and quite cumbersome in practice. In [9], we proposed the ART.NEH (Average Relative Time over NEH) indicator, defined by the following formula:
$$\mathrm{ART.NEH} = \frac{1}{I}\sum_{i=1}^{I}\frac{\mathit{CPU}_i}{\mathit{CPU}_{i,\mathrm{NEH}}}, \tag{6}$$
where $I$ is the number of considered instances, $\mathit{CPU}_i$ is the CPU time of the considered algorithm for the instance $i$, and $\mathit{CPU}_{i,\mathrm{NEH}}$ is the CPU time of NEH for the instance $i$. ART.NEH indicates how many times, on average, the evaluated algorithm is faster (ART.NEH < 1) or slower (ART.NEH > 1) than the classical NEH.
Following this idea, we propose in Section II two new measures to compare the quality of results produced by algorithms for solving PFSP with the makespan criterion. Numerical experiments showing the usefulness of the proposed measures are described in Section III. The paper ends with concluding remarks.
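The difference between an absolute and a NEH-relative time measure can be illustrated with a small Python sketch. It assumes that ART.NEH is the mean, over instances, of the ratio between the CPU time of the evaluated algorithm and the CPU time of NEH on the same instance; the function names and timings are made up.

```python
from typing import Sequence


def acpu(cpu: Sequence[float]) -> float:
    """Average CPU time of one algorithm over all instances."""
    return sum(cpu) / len(cpu)


def art_neh(cpu: Sequence[float], cpu_neh: Sequence[float]) -> float:
    """Average ratio of the algorithm's CPU time to NEH's CPU time, per instance."""
    return sum(c / c_neh for c, c_neh in zip(cpu, cpu_neh)) / len(cpu)


# Made-up timings (seconds): one small and one large instance.
cpu_alg = [0.02, 90.0]
cpu_neh = [0.01, 30.0]
print(acpu(cpu_alg))              # dominated by the large instance
print(art_neh(cpu_alg, cpu_neh))  # each instance contributes one ratio
```

In this example the average CPU time is determined almost entirely by the large instance, whereas the ratio-based value weights both instances equally.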

II. NEW MEASURES OF ALGORITHMS EFFICIENCY
New algorithms are expected to be better than existing ones, but a fair comparison of algorithms is quite difficult (because of differences in implementation and hardware). However, most papers on solving PFSP with the makespan criterion report the results produced by NEH. It therefore seems natural to use this well-known heuristic as a computational benchmark.
The ARPD indicator given by formula (2) is by far the most popular measure for assessing the quality of scheduling algorithms in terms of the size of the error. It has, however, some drawbacks, which have led to the development of alternative measures. An important drawback that we want to emphasize is that the value of ARPD can change when a better solution is found for an analyzed instance. As a result, the ARPD value of an algorithm can change significantly over the years. A good example is the best-known PFSP benchmark, Taillard's benchmark [10], published in 1993. Although 30 years have passed since its publication, better solutions are still being found for various instances [11]. Since the terms of ARPD change, the whole measure changes as well. It is then difficult to compare new results with existing (published) ones because of the different reference values (the results of such a comparison may not be reliable). To remove this drawback, we propose a new measure, ARD.NEH (Average Relative Deviation over NEH), which does not change over time because it uses the NEH results as reference results. The proposed ARD.NEH measure is computed from the following formula:
$$\mathrm{ARD.NEH} = \frac{1}{I}\sum_{i=1}^{I}\frac{\mathit{NEH}_i - S_i}{\mathit{NEH}_i} \cdot 100\%, \tag{7}$$
where $I$ is the number of instances, $S_i$ is the solution of the evaluated algorithm on the instance $i$, and $\mathit{NEH}_i$ is the solution obtained using the NEH algorithm for this instance.
The main reason for developing this measure was to make it easier to compare the results produced by new algorithms with the results available in the literature. The advantage of ARD.NEH over ARPD is that it does not change over time, because it depends not on the best solutions known so far but on the results of NEH. The measure is therefore especially useful for problems for which the optimal solution is not yet known. Since ARD.NEH indicates how far the results of an algorithm are from the results of NEH, the greater the value of ARD.NEH, the better.
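The robustness of ARD.NEH to updates of the best known solutions can be checked with a short Python sketch. It assumes that ARD.NEH is the mean improvement over the NEH makespan relative to the NEH makespan, in percent (so positive values mean better than NEH). All makespans below are made up.

```python
from typing import Sequence


def arpd(sol: Sequence[float], best: Sequence[float]) -> float:
    """Average relative percentage deviation from the best known solutions."""
    return sum(100.0 * (s - b) / b for s, b in zip(sol, best)) / len(sol)


def ard_neh(sol: Sequence[float], neh: Sequence[float]) -> float:
    """Average relative deviation over NEH, in percent (larger is better)."""
    return sum(100.0 * (n - s) / n for s, n in zip(sol, neh)) / len(sol)


# Made-up makespans for two instances.
sol = [95, 190]        # evaluated algorithm
neh = [100, 200]       # NEH reference
best_old = [94, 188]   # best known solutions at publication time
best_new = [93, 188]   # a better solution for instance 0 is found later

print(arpd(sol, best_old), arpd(sol, best_new))  # ARPD changes with the update
print(ard_neh(sol, neh))                         # 5.0, unaffected by the update
```

The same published makespans thus yield two different ARPD values depending on when they are evaluated, while the NEH-based value stays fixed as long as the NEH results are reproduced identically.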
Another new measure for assessing the quality of results that we propose in this paper is the ARID(inf, sup) (Average Relative Interval Deviation) measure. ARID(inf, sup) differs from existing quality measures in that it is based on the interval [inf, sup] (it is assumed that the interval [inf, sup] can be improper) instead of a single value (reference point). By taking different intervals, we can obtain various quality measures. The concept behind this measure is to equalize the impact of each benchmark instance on the final value of the evaluation measure. The value of ARID(inf, sup) is computed from the following formula:
$$\mathrm{ARID}(\mathit{inf}, \mathit{sup}) = \frac{1}{I}\sum_{i=1}^{I}\frac{\mathit{sup}_i - S_i}{\mathit{sup}_i - \mathit{inf}_i} \cdot 100\%, \tag{8}$$
where $I$ is the number of instances, $S_i$ is the solution for the instance $i$, and $\mathit{inf}_i$ and $\mathit{sup}_i$ are the bounds of the interval for this instance.
Proposition 1: The ARPD and ARD.NEH measures are special cases of the ARID measure.
Proof: Let $I$ be the number of instances, and let $\mathit{Best}_i$, $\mathit{NEH}_i$, and $S_i$ be the best known solution, the solution produced by NEH, and the evaluated solution for the instance $i$, respectively. It holds that
$$\mathrm{ARID}(0, \mathit{NEH}) = \frac{1}{I}\sum_{i=1}^{I}\frac{\mathit{NEH}_i - S_i}{\mathit{NEH}_i - 0} \cdot 100\% = \mathrm{ARD.NEH}$$
and, for the improper interval $[2\,\mathit{Best}, \mathit{Best}]$,
$$\mathrm{ARID}(2\,\mathit{Best}, \mathit{Best}) = \frac{1}{I}\sum_{i=1}^{I}\frac{\mathit{Best}_i - S_i}{\mathit{Best}_i - 2\,\mathit{Best}_i} \cdot 100\% = \frac{1}{I}\sum_{i=1}^{I}\frac{S_i - \mathit{Best}_i}{\mathit{Best}_i} \cdot 100\% = \mathrm{ARPD}.$$
In what follows, we set inf = Best and sup = NEH, where Best means that we use the best solutions (makespans) known so far for the benchmark instances, and NEH means that we use the solutions produced by NEH for the respective instances. ARID(Best, NEH) (hereafter referred to simply as ARID), like ARPD, uses the best known solutions, so ARID is recommended for problems with Best = Opt. ARID makes it possible to equalize the impact of different instances on the final result. For example, the ARPD value for the Taillard benchmark is most influenced by the instances whose best solutions are far from the optimum and least influenced by the instances whose best solutions are close to the optimum. Giving each instance a comparable impact on the final evaluation of an algorithm makes it possible to compare different algorithms in terms of the stability of their results relative to a dynamically determined value, which in this case is the result of NEH.
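The interval-based measure and the two special cases of Proposition 1 can be checked numerically with the following Python sketch. The form of the ARID formula used here, the distance from `sup` to the solution divided by the length of the interval, in percent, is an assumption inferred from the description in the text (a value of 50% corresponds to a solution halfway between NEH and Best); the intervals `[0, NEH]` and `[2 Best, Best]` are likewise assumptions that follow from that form. All makespans are made up.

```python
import math
from typing import Sequence


def arid(sol: Sequence[float], inf: Sequence[float], sup: Sequence[float]) -> float:
    """Average relative interval deviation in percent (assumed form, see lead-in)."""
    return sum(100.0 * (u - s) / (u - l) for s, l, u in zip(sol, inf, sup)) / len(sol)


def arpd(sol: Sequence[float], best: Sequence[float]) -> float:
    return sum(100.0 * (s - b) / b for s, b in zip(sol, best)) / len(sol)


def ard_neh(sol: Sequence[float], neh: Sequence[float]) -> float:
    return sum(100.0 * (n - s) / n for s, n in zip(sol, neh)) / len(sol)


# Made-up makespans for two instances.
sol = [95, 190]
best = [90, 188]
neh = [100, 200]

# ARID(Best, NEH): share of the NEH-to-Best gap closed by the algorithm.
print(arid(sol, best, neh))

# Special cases: the interval [0, NEH] gives ARD.NEH, and the improper
# interval [2*Best, Best] gives ARPD.
zeros = [0.0] * len(sol)
twice_best = [2 * b for b in best]
print(math.isclose(arid(sol, zeros, neh), ard_neh(sol, neh)))
print(math.isclose(arid(sol, twice_best, best), arpd(sol, best)))
```

In the example, the algorithm closes half of the gap on the first instance and five sixths on the second, so each instance contributes a value on the same 0 to 100 scale regardless of how large the gap between NEH and Best is.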
The result of NEH can therefore be considered a kind of assessment of the difficulty of a given instance. The stability of an algorithm should be understood here as the ability to obtain better results than NEH for as many instances as possible. Note that the value of ARID, like the value of ARD.NEH, should be maximized. The next section presents experiments that aim to show the usefulness of the proposed measures of algorithm quality.

III. COMPUTATIONAL EXPERIMENT
The measures proposed in Section II were used to assess the results of various algorithms on the Taillard benchmark [10] and the VRF Large benchmark instances [12]. The best solutions provided by the authors of the benchmarks were updated with the recent results presented in [11] (Taillard) and [13] (VRF Large).
Tables I and II show the values of the ARPD, ARD.NEH and ARID measures obtained for the Taillard and VRF Large benchmarks, respectively, by selected deterministic algorithms for solving PFSP (cf. [14], [15], [16], [17], [18], [7], [8], [19], [20], [21], [22], [9]). As can be seen from the tables, only two algorithms (RAER and RAER-di) achieved negative values of the ARD.NEH measure, which means that their average results were worse than the average result of NEH. It can also be seen that only the FRB algorithms and the algorithms based on the N-list technique (hereafter referred to as N-algorithms) achieved results better than those of NEH by more than 1 percent on both benchmarks. As for the ARID measure, only the FRB algorithms and N-algorithms achieved values greater than 15%. Moreover, only 3 algorithms (on the Taillard benchmark) and 2 algorithms (on the VRF Large benchmark instances) achieved values greater than 50%, which means that only these algorithms were able to improve the results of NEH by, on average, more than half the distance between the solution produced by NEH and the best solution known so far for a given instance.
Figures 1 and 2 show the rank (y-axis) of each algorithm with respect to the specific quality measure. As we can see, the ranks with respect to ARD.NEH and ARPD coincide for all algorithms. This means that the ARPD measure can be successfully replaced with the ARD.NEH measure. If we take a look at the ARID measure, we can see that this measure ranks the algorithms in a different manner than the other two measures. Those algorithms that are ranked below the

IV. CONCLUSION
This work proposes two new measures for assessing the quality of results produced by algorithms for solving permutation flow-shop problems with the makespan criterion. The first measure, ARD.NEH, has the very useful feature of eliminating the dependence of the quality assessment on the best known results, which, as shown by the performed analysis, change over time and therefore make the comparison of new results with older ones cumbersome. The second measure, ARID, is to the best of our knowledge the first interval-based measure. It is worth emphasizing that the ARID measure with properly selected intervals is equivalent to the ARPD or ARD.NEH measure. The proposed measures have been tested on 42 selected deterministic algorithms for solving PFSP, run on the Taillard and VRF Large benchmarks. Based on the obtained results, it can be concluded that the ARPD and ARD.NEH measures coincide, i.e., they rank the algorithms in a very similar manner. The ARID measure, in turn, is useful in assessing the stability of the algorithms, i.e., it indicates whether a good (average) quality of results stems from good results for a few instances or from good results for most instances. The numerical experiments show that the proposed measures are useful for a more reliable comparison of algorithms for solving PFSP with the makespan criterion.

Fig. 1. Ranking of algorithms based on ARPD, ARD.NEH and ARID values for Taillard benchmark

Proceedings of the 18th Conference on Computer Science and Intelligence Systems, pp. 1107-1111

TABLE I. ARPD, ARD.NEH AND ARID VALUES FOR TAILLARD BENCHMARK

TABLE II. ARPD, ARD.NEH AND ARID VALUES FOR VRF LARGE INSTANCES