A Comparison between Different Chess Rating Systems for Ranking Evolutionary Algorithms

Niki Veček, Marjan Mernik, Matej Črepinšek, Dejan Hrnčič

DOI: http://dx.doi.org/10.15439/2014F33

Citation: Proceedings of the 2014 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 2, pages 511–518 (2014)


Abstract. Chess Rating System for Evolutionary Algorithms (CRS4EAs) is a novel method for comparing evolutionary algorithms that evaluates and ranks them using the formulae of the Glicko-2 chess rating system. It has been shown empirically that CRS4EAs is comparable to the standard method for comparing algorithms, null hypothesis significance testing. This paper examines the application of chess rating systems other than Glicko-2. The results of 15 evolutionary algorithms on 20 minimisation problems obtained using the Glicko-2 system were empirically compared with those obtained using the Elo rating system, the Chessmetrics rating system, and the German Evaluation Number (DWZ). The experiment showed that Glicko-2 is the most appropriate choice for evaluating and ranking evolutionary algorithms. Whilst the main benefit of the other three systems is their simple formulae, the ratings in Glicko-2 were shown to be more reliable, the detected significant differences are supported by confidence intervals, inflation or deflation of ratings is easily detected, and the weight of individual results is set dynamically.
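
To illustrate the "simple formulae" of the systems compared against Glicko-2, the following is a minimal sketch of a standard Elo update applied to a pairwise comparison of two algorithms on a minimisation problem. It is not the paper's implementation: the function names, the K-factor of 32, the starting rating of 1500, and the draw threshold `eps` are assumptions chosen for illustration only.

```python
def expected_score(r_a, r_b):
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def game_score(fit_a, fit_b, eps=1e-6):
    """Score of algorithm A in one 'game' on a minimisation problem.

    Assumption: results closer than eps count as a draw (0.5);
    otherwise the lower fitness value wins (1.0) or loses (0.0).
    """
    if abs(fit_a - fit_b) < eps:
        return 0.5
    return 1.0 if fit_a < fit_b else 0.0


def elo_update(r_a, r_b, score_a, k=32.0):
    """Return the updated ratings (A, B) after one game.

    k is the K-factor; 32 is a common choice, assumed here.
    """
    e_a = expected_score(r_a, r_b)
    new_a = r_a + k * (score_a - e_a)
    # B's actual and expected scores are the complements of A's.
    new_b = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b


# Hypothetical usage: two algorithms start at 1500; A reaches the lower fitness.
score = game_score(0.012, 0.340)
rating_a, rating_b = elo_update(1500.0, 1500.0, score)
```

Unlike Glicko-2, this update keeps only a single number per player, so it carries no rating deviation from which a confidence interval could be derived.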