An environment model in multi-agent reinforcement learning with decentralized training
Rafał Niedziółka-Domański, Jarosław Bylina
DOI: http://dx.doi.org/10.15439/2024F1840
Citation: Proceedings of the 19th Conference on Computer Science and Intelligence Systems (FedCSIS), M. Bolanowski, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 39, pages 661–666 (2024)
Abstract. In multi-agent reinforcement learning, independent learning, in which each agent learns solely from its own observations, is often preferred over centralized training for its scalability and simplicity. However, it faces significant challenges because the environment is non-stationary from each agent's perspective: the other agents' policies keep changing as they learn.
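To make the independent-learning setting concrete, here is a minimal sketch (not from the paper): two tabular Q-learners in a repeated two-action coordination game, each updating from only its own action and the shared reward. The payoff matrix, hyperparameters, and game are illustrative assumptions; the point is that each agent treats the other as part of the environment, which is exactly the source of non-stationarity.

```python
import numpy as np

# Illustrative payoff matrix (assumed, not from the paper):
# both agents receive reward 1 when they pick the same action.
PAYOFF = np.array([[1.0, 0.0],
                   [0.0, 1.0]])

rng = np.random.default_rng(0)
n_actions = 2
alpha, eps, episodes = 0.1, 0.1, 5000

# One independent Q-vector per agent (the repeated game is stateless).
q = [np.zeros(n_actions), np.zeros(n_actions)]

def select(qvals):
    # epsilon-greedy over this agent's own action values only
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(qvals))

for _ in range(episodes):
    a0, a1 = select(q[0]), select(q[1])
    r = PAYOFF[a0, a1]  # shared reward for coordinating
    for i, a in ((0, a0), (1, a1)):
        # Bandit-style update from local information: the other agent
        # is folded into the environment, so the learning target drifts
        # whenever that agent's policy changes.
        q[i][a] += alpha * (r - q[i][a])

print("agent 0 Q-values:", q[0])
print("agent 1 Q-values:", q[1])
```

With these settings the two learners typically converge on one of the two coordinated actions, but the trajectory depends on the random seed, since each agent's "environment" shifts under it during training.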