Polish Information Processing Society

Annals of Computer Science and Information Systems, Volume 15

Proceedings of the 2018 Federated Conference on Computer Science and Information Systems

Adaptive Supervisor: Method of Reinforcement Learning Fault Elimination by Application of Supervised Learning

DOI: http://dx.doi.org/10.15439/2018F236

Citation: Proceedings of the 2018 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 15, pages 139–143


Abstract. Reinforcement Learning (RL) is a popular approach for solving an increasing number of problems. However, the standard RL approach has many deficiencies. In this paper, several approaches that address these deficiencies by incorporating Supervised Learning are discussed, and a new approach, Reinforcement Learning with Adaptive Supervisor, is proposed. In this model, actions chosen by the RL method are rated by the supervisor and may be replaced with safer ones. The supervisor observes the result of each action and on that basis learns how safe each action is in a given state. This helps to overcome one of the deficiencies of Reinforcement Learning: the risk of executing a wrong action. The new approach is designed for domains where failures are very expensive. The architecture was evaluated on a car intersection model, where the proposed method eliminated around 50% of failures.
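The supervisor mechanism described in the abstract (rate the RL agent's proposed action, substitute a safer one if needed, and learn action safety from observed outcomes) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the class name, the risk threshold, and the tabular frequency-count risk estimate are assumptions; the paper's supervisor uses a supervised learning model trained on observed action results.

```python
from collections import defaultdict

class AdaptiveSupervisor:
    """Illustrative supervisor: estimates P(failure | state, action)
    from observed outcomes and vetoes actions deemed too risky."""

    def __init__(self, actions, risk_threshold=0.3):
        self.actions = list(actions)
        self.risk_threshold = risk_threshold
        self.failures = defaultdict(int)  # (state, action) -> failure count
        self.trials = defaultdict(int)    # (state, action) -> trial count

    def risk(self, state, action):
        # Empirical failure rate; untried pairs are optimistically rated 0.0
        # so the supervisor does not block exploration of unknown actions.
        n = self.trials[(state, action)]
        return self.failures[(state, action)] / n if n else 0.0

    def filter(self, state, proposed):
        """Return the RL agent's proposed action if it looks safe enough,
        otherwise the lowest-risk alternative known for this state."""
        if self.risk(state, proposed) <= self.risk_threshold:
            return proposed
        return min(self.actions, key=lambda a: self.risk(state, a))

    def observe(self, state, action, failed):
        """Update safety knowledge from the outcome of an executed action."""
        self.trials[(state, action)] += 1
        if failed:
            self.failures[(state, action)] += 1

# Usage: after repeated failures of action 0 in state "s",
# the supervisor replaces it with the safer action 1.
sup = AdaptiveSupervisor(actions=[0, 1])
for _ in range(3):
    sup.observe("s", 0, failed=True)
print(sup.filter("s", 0))
print(sup.filter("s", 1))
```

In the full architecture, the RL method would call `filter` before executing each chosen action and `observe` after seeing its result, so the supervisor's safety knowledge grows alongside the agent's policy.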

