A Non-Deterministic Strategy for Searching Optimal Number of Trees Hyperparameter in Random Forest
Kennedy Senagi, Nicolas Jouandeau
DOI: http://dx.doi.org/10.15439/2018F202
Citation: Proceedings of the 2018 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 15, pages 73–80 (2018)
Abstract. In this paper, we present a non-deterministic strategy for searching for the optimal number-of-trees hyperparameter in Random Forest (RF). Hyperparameter tuning in Machine Learning (ML) algorithms is essential: it optimizes the predictability of an ML algorithm and/or improves the utilization of computer resources. However, hyperparameter tuning is a complex optimization task and is time-consuming. We set up experiments with the goals of maximizing predictability, minimizing the number of trees, and minimizing execution time. Compared to the deterministic algorithm, the non-deterministic algorithm of this research recorded an average accuracy of approximately 98%, an average improvement of 44.64% in the number of trees, an average improvement ratio of 212.79 in execution time, and an average reduction of 93% in iterations. Moreover, evaluations using Jackknife Estimation show stable and reliable results across several runs of the non-deterministic strategy. The non-deterministic approach to selecting this hyperparameter achieves significant accuracy and better utilization of computer resources (i.e., CPU and memory time). This approach can be adopted widely in hyperparameter tuning and in conserving computer resources, i.e., green computing.
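To illustrate the general idea (this is a minimal sketch, not the authors' exact algorithm), the snippet below randomly samples a handful of candidate values for RF's `n_estimators` hyperparameter instead of sweeping a grid deterministically, then keeps the most accurate candidate, breaking ties toward fewer trees. The dataset (`load_digits`), the search bounds (10–500 trees), and the budget of 8 sampled candidates are illustrative assumptions.

```python
# A minimal sketch of non-deterministic (random) search over the
# number-of-trees hyperparameter; all concrete values are assumptions.
import time
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)            # stand-in dataset
rng = np.random.default_rng(0)

search_space = np.arange(10, 501, 10)          # candidate tree counts
candidates = rng.choice(search_space, size=8, replace=False)

best = None
start = time.perf_counter()
for n_trees in candidates:
    clf = RandomForestClassifier(n_estimators=int(n_trees), random_state=0)
    acc = cross_val_score(clf, X, y, cv=3).mean()
    # Prefer higher accuracy; break ties toward fewer trees.
    if best is None or (acc, -n_trees) > (best[0], -best[1]):
        best = (acc, n_trees)
elapsed = time.perf_counter() - start

print(f"best accuracy={best[0]:.3f} with n_estimators={best[1]} "
      f"({len(candidates)} iterations, {elapsed:.1f}s)")
```

Likewise, a leave-one-out Jackknife over per-run accuracies is one way to quantify the run-to-run stability the abstract reports; the accuracy values below are made up for illustration.

```python
# Jackknife estimate of the mean accuracy and its standard error
# across repeated runs (accuracy values below are fabricated examples).
import numpy as np

accs = np.array([0.97, 0.98, 0.98, 0.99, 0.97])   # accuracies from repeated runs
n = len(accs)
loo_means = np.array([np.delete(accs, i).mean() for i in range(n)])
jk_mean = loo_means.mean()
jk_se = np.sqrt((n - 1) / n * ((loo_means - jk_mean) ** 2).sum())
print(f"jackknife mean={jk_mean:.4f}, standard error={jk_se:.4f}")
```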