Adaptive Supervisor: Method of Reinforcement Learning Fault Elimination by Application of Supervised Learning
Mateusz Krzysztoń
DOI: http://dx.doi.org/10.15439/2018F236
Citation: Proceedings of the 2018 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 15, pages 139–143 (2018)
Abstract. Reinforcement Learning (RL) is a popular approach for solving an increasing number of problems. However, the standard RL approach has many deficiencies. In this paper, multiple approaches for addressing those deficiencies by incorporating Supervised Learning are discussed, and a new approach, Reinforcement Learning with Adaptive Supervisor, is proposed. In this model, actions chosen by the RL method are rated by the supervisor and may be replaced with safer ones. The supervisor observes the result of each action and on that basis learns about the safety of actions in various states. This helps to overcome one of the deficiencies of Reinforcement Learning: the risk of executing a wrong action. The new approach is designed for domains where failures are very expensive. The architecture was evaluated on a model of a car intersection, where the proposed method eliminated around 50% of failures.
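To make the control flow concrete, the following is a minimal Python sketch of the loop the abstract describes: the RL component proposes an action, the supervisor rates it and may substitute a safer one, and the supervisor updates its safety knowledge from the observed outcome. Everything here is illustrative rather than taken from the paper: the class names, the frequency-based safety estimate (a simple stand-in for a trained supervised model), the `safety_threshold` parameter, and the `env.execute` interface are all assumptions.

```python
import random
from collections import defaultdict

class QLearningAgent:
    """Plain tabular Q-learning: the RL component that proposes actions."""
    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)          # (state, action) -> value
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, state):
        # epsilon-greedy action selection
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])

class AdaptiveSupervisor:
    """Supervised component: learns from observed outcomes how risky each
    (state, action) pair is, and vetoes actions it rates as unsafe."""
    def __init__(self, actions, safety_threshold=0.5):
        self.stats = defaultdict(lambda: [0, 0])  # (state, action) -> [failures, trials]
        self.actions = actions
        self.threshold = safety_threshold

    def failure_rate(self, state, action):
        failures, trials = self.stats[(state, action)]
        return failures / trials if trials else 0.0  # optimistic about unseen pairs

    def rate_and_maybe_replace(self, state, proposed):
        # If the proposed action looks too risky, substitute the safest known one.
        if self.failure_rate(state, proposed) > self.threshold:
            return min(self.actions, key=lambda a: self.failure_rate(state, a))
        return proposed

    def observe(self, state, action, failed):
        failures, trials = self.stats[(state, action)]
        self.stats[(state, action)] = [failures + int(failed), trials + 1]

def step(env, agent, supervisor, state):
    """One interaction step: RL proposes, the supervisor rates and possibly
    replaces, and both learners are updated from the observed outcome.
    `env.execute` is a hypothetical interface returning (next_state, reward, failed)."""
    proposed = agent.choose(state)
    action = supervisor.rate_and_maybe_replace(state, proposed)
    next_state, reward, failed = env.execute(action)
    agent.update(state, action, reward, next_state)
    supervisor.observe(state, action, failed)
    return next_state
```

A frequency count is used here only to keep the sketch self-contained; in a setting with large or continuous state spaces, the supervisor would presumably generalize across states with a proper supervised classifier instead of a lookup table.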