Reinforcement Learning Algorithms for Online Single-Machine Scheduling

Yuanyuan Li; Edoardo Fadda; Daniele Manerba; Roberto Tadei; Olivier Terzo

Reinforcement Learning Algorithms for Online Single-Machine Scheduling

Yuanyuan Li, Edoardo Fadda, Daniele Manerba, Roberto Tadei, Olivier Terzo

DOI: http://dx.doi.org/10.15439/2020F100

Citation: Proceedings of the 2020 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 21, pages 277–283 (2020)

Full text

Abstract. Online scheduling has been an attractive field of research for over three decades. Some recent developments suggest that Reinforcement Learning (RL) techniques have the potential to deal with online scheduling issues effectively. Driven by an industrial application, in this paper we apply four of the most important RL techniques, namely Q-learning, Sarsa, Watkins's Q(λ), and Sarsa(λ), to the online single-machine scheduling problem. Our main goal is to provide insights on how such techniques perform. The numerical results show that Watkins's Q(λ) performs best in minimizing the total tardiness of the scheduling process.

References

P. Brucker, Scheduling Algorithms, 5th ed. Springer Publishing Company, Incorporated, 2010.
S. C. Graves, “A review of production scheduling,” Operations Research, vol. 29, no. 4, pp. 646–675, 1981.
Y. Li, S. Carabelli, E. Fadda, D. Manerba, R. Tadei, and O. Terzo, “Machine learning and optimization for production rescheduling in industry 4.0,” The International Journal of Advanced Manufacturing Technology, pp. 1–19, 2020.
T. Gabel and M. Riedmiller, “Adaptive reactive job-shop scheduling with reinforcement learning agents,” International Journal of Information Technology and Intelligent Computing, vol. 24, no. 4, pp. 14–18, 2008.
H. Sharma and S. Jain, “Online learning algorithms for dynamic scheduling problems,” in 2011 Second International Conference on Emerging Applications of Information Technology, 2011, pp. 31–34.
T. Zhang, S. Xie, and O. Rose, “Real-time job shop scheduling based on simulation and markov decision processes,” in 2017 Winter Simulation Conference (WSC). IEEE, 2017, pp. 3899–3907.
P. Castrogiovanni, E. Fadda, G. Perboli, and A. Rizzo, “Smartphone data classification technique for detecting the usage of public or private transportation modes,” IEEE Access, vol. 8, pp. 58 377–58 391, 2020.
E. Fadda, P. Plebani, and M. Vitali, “Optimizing monitorability of multi-cloud applications,” 06 2016, pp. 411–426.
E. Fadda, G. Perboli, and G. Squillero, “Adaptive batteries exploiting on-line steady-state evolution strategy,” in Applications of Evolutionary Computation, G. Squillero and K. Sim, Eds. Cham: Springer International Publishing, 2017, pp. 329–341.
E. Fadda, D. Manerba, R. Tadei, P. Camurati, and G. Cabodi, “KPIs for Optimal Location of charging stations for Electric Vehicles: the Biella case-study,” in Proceedings of the 2019 Federated Conference on Computer Science and Information Systems, ser. Annals of Computer Science and Information Systems, M. Ganzha, L. Maciaszek, and M. Paprzycki, Eds., vol. 18. IEEE, 2019, pp. 123–126. [Online]. Available: http://dx.doi.org/10.15439/2019F171
E. Fadda, D. Manerba, G. Cabodi, P. Camurati, and R. Tadei, “Comparative analysis of models and performance indicators for optimal service facility location,” Transportation Research part E: Logistics and Transportation Reviews (submitted), 2020.
R. Giusti, C. Iorfida, Y. Li, D. Manerba, S. Musso, G. Perboli, R. Tadei, and S. Yuan, “Sustainable and de-stressed international supply-chains through the synchro-net approach,” Sustainability, vol. 11, p. 1083, 02 2019.
K. Takadama and H. Fujita, “Toward guidelines for modeling learning agents in multiagent-based simulation: Implications from q-learning and sarsa agents,” in International Workshop on Multi-Agent Systems and Agent-Based Simulation. Springer, 2004, pp. 159–172.
V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, “Playing atari with deep reinforcement learning,” arXiv preprint https://arxiv.org/abs/1312.5602, 2013.
R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction. MIT press, 2018.
C. J. C. H. Watkins, Learning from delayed rewards. Thesis Submitted for Ph.D., King’s College, Cambridge, 1989.
A. Kaban, Z. Othman, and D. Rohmah, “Comparison of dispatching rules in job-shop scheduling problem using simulation: a case study,” International Journal of Simulation Modelling, vol. 11, no. 3, pp. 129–140, 2012.
H. Suwa and H. Sandoh, Online scheduling in manufacturing: A cumulative delay approach. Springer Science & Business Media, 2012.
R. L. Graham, “Bounds for certain multiprocessing anomalies,” Bell System Technical Journal, vol. 45, no. 9, pp. 1563–1581, 1966.
S. S. Panwalkar and W. Iskander, “A survey of scheduling rules,” Operations Research, vol. 25, no. 1, pp. 45–61, 1977.
J. R. Correa and M. R. Wagner, “Lp-based online scheduling: from single to parallel machines,” Mathematical Programming, vol. 119, no. 1, pp. 109–136, 2009.
X. Lu, R. Sitters, and L. Stougie, “A class of on-line scheduling algorithms to minimize total completion time,” Operations Research Letters, vol. 31, no. 3, pp. 232–236, 2003.
S. Xie, T. Zhang, and O. Rose, “Online single machine scheduling based on simulation and reinforcement learning,” in Simulation in Produktion und Logistik 2019. Simulation in Produktion und Logistik 2019, 2019.
S. Singh, T. Jaakkola, M. L. Littman, and C. Szepesvári, “Convergence results for single-step on-policy reinforcement-learning algorithms,” Machine learning, vol. 38, no. 3, pp. 287–308, 2000.
V. François-Lavet, R. Fonteneau, and D. Ernst, “How to discount deep reinforcement learning: Towards new dynamic strategies,” arXiv preprint https://arxiv.org/abs/1512.02011, 2015.
V. Cerone, E. Fadda, and D. Regruto, “A robust optimization approach to kernel-based nonparametric error-in-variables identification in the presence of bounded noise,” in 2017 American Control Conference (ACC). IEEE, may 2017. [Online]. Available: https://doi.org/10.23919