An environment model in multi-agent reinforcement learning with decentralized training
Rafał Niedziółka-Domański, Jarosław Bylina
DOI: http://dx.doi.org/10.15439/2024F1840
Citation: Proceedings of the 19th Conference on Computer Science and Intelligence Systems (FedCSIS), M. Bolanowski, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 39, pages 661–666 (2024)
Abstract. In multi-agent reinforcement learning, independent learning, in which each agent learns solely from its own observations, is often preferred over centralized training for its scalability and simplicity. However, it faces significant challenges because the environment is non-stationary from each agent's perspective: the other agents' policies keep changing as they learn.
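To make the independent-learning setting concrete, here is a minimal sketch (not from the paper): two tabular Q-learners in a repeated two-action coordination game, each updating from only its own action and the shared reward. The payoff matrix, hyperparameters, and game are illustrative assumptions; the point is that each agent treats the other as part of the environment, which is exactly the source of non-stationarity.

```python
import numpy as np

# Illustrative payoff matrix (assumed, not from the paper):
# both agents receive reward 1 when they pick the same action.
PAYOFF = np.array([[1.0, 0.0],
                   [0.0, 1.0]])

rng = np.random.default_rng(0)
n_actions = 2
alpha, eps, episodes = 0.1, 0.1, 5000

# One independent Q-vector per agent (the repeated game is stateless).
q = [np.zeros(n_actions), np.zeros(n_actions)]

def select(qvals):
    # epsilon-greedy over this agent's own action values only
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(qvals))

for _ in range(episodes):
    a0, a1 = select(q[0]), select(q[1])
    r = PAYOFF[a0, a1]  # shared reward for coordinating
    for i, a in ((0, a0), (1, a1)):
        # Bandit-style update from local information: the other agent
        # is folded into the environment, so the learning target drifts
        # whenever that agent's policy changes.
        q[i][a] += alpha * (r - q[i][a])

print("agent 0 Q-values:", q[0])
print("agent 1 Q-values:", q[1])
```

With these settings the two learners typically converge on one of the two coordinated actions, but the trajectory depends on the random seed, since each agent's "environment" shifts under it during training.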