Modular Multi-Objective Deep Reinforcement Learning with Decision Values
Tomasz Tajmajer
DOI: http://dx.doi.org/10.15439/2018F231
Citation: Proceedings of the 2018 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 15, pages 85–93 (2018)
Abstract. In this work we present a method for using Deep Q-Networks (DQNs) in multi-objective environments. Deep Q-Networks achieve remarkable performance in single-objective problems, learning from high-level visual state representations. However, in many scenarios (e.g., in robotics or games), the agent needs to pursue multiple objectives simultaneously. We propose an architecture in which separate DQNs are used to control the agent's behaviour with respect to particular objectives. In this architecture we introduce decision values to improve the scalarization of multiple DQNs into a single action. Our architecture enables the decomposition of the agent's behaviour into controllable and replaceable sub-behaviours learned by distinct modules. Moreover, it allows the priorities of particular objectives to be changed post-learning while preserving the overall performance of the agent. To evaluate our solution we used a game-like simulator in which an agent, provided with high-level visual input, pursues multiple objectives in a 2D world.
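The abstract describes combining the outputs of per-objective DQNs, weighted by decision values and user-set priorities, into a single action. The snippet below is a minimal sketch of one plausible scalarization step in Python; the function name, the Q-value normalization, and the linear weighting by priority and decision value are illustrative assumptions based on the abstract, not the paper's exact formulation.

```python
import numpy as np

def select_action(q_values_per_objective, decision_values, priorities):
    """Scalarize per-objective Q-values into a single action index.

    q_values_per_objective: list of arrays of shape (n_actions,), one per objective DQN
    decision_values: per-objective scalars indicating how relevant each
        objective considers the current state (assumed to lie in [0, 1])
    priorities: per-objective weights that can be adjusted after learning
    """
    combined = np.zeros_like(q_values_per_objective[0], dtype=float)
    for q, d, w in zip(q_values_per_objective, decision_values, priorities):
        # Normalize Q-values so objectives with different reward scales are comparable
        # (an assumption for this sketch, not necessarily the paper's approach).
        q_norm = (q - q.min()) / (q.ptp() + 1e-8)
        combined += w * d * q_norm
    return int(np.argmax(combined))

# Example: two objectives, three actions; lowering a priority post-learning
# shifts the chosen action without retraining either DQN.
q_objs = [np.array([1.0, 0.2, 0.1]), np.array([0.0, 0.5, 2.0])]
print(select_action(q_objs, decision_values=[0.9, 0.4], priorities=[1.0, 1.0]))
print(select_action(q_objs, decision_values=[0.9, 0.4], priorities=[0.1, 1.0]))
```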