Proceedings of the 19th Conference on Computer Science and Intelligence Systems (FedCSIS)

Annals of Computer Science and Information Systems, Volume 39

An environment model in multi-agent reinforcement learning with decentralized training

DOI: http://dx.doi.org/10.15439/2024F1840

Citation: Proceedings of the 19th Conference on Computer Science and Intelligence Systems (FedCSIS), M. Bolanowski, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 39, pages 661–666 (2024)

Abstract. In multi-agent reinforcement learning scenarios, independent learning, where agents learn independently based on their observations, is often preferred for its scalability and simplicity compared to centralized training. However, it faces significant challenges due to the non-stationary nature of the environment from each agent's perspective.
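For intuition, a minimal sketch of independent learning and the non-stationarity it induces follows. This is an illustrative toy example, not the paper's algorithm: the two-agent matrix game, the payoff values, and the hyperparameters (PAYOFF, ALPHA, EPSILON, EPISODES) are all assumptions made for the sketch. Each agent updates its own Q-values from its own reward alone, treating the other agent as part of the environment.

    # Illustrative sketch (assumed toy setup, not the paper's method):
    # two independent Q-learners in a cooperative 2x2 matrix game.
    import random

    # Hypothetical shared payoff: both agents receive the same reward.
    PAYOFF = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0}
    ALPHA, EPSILON, EPISODES = 0.1, 0.1, 5000

    def epsilon_greedy(q):
        # Explore with probability EPSILON, otherwise act greedily.
        if random.random() < EPSILON:
            return random.randrange(len(q))
        return max(range(len(q)), key=q.__getitem__)

    q1, q2 = [0.0, 0.0], [0.0, 0.0]  # one value per action (stateless game)
    for _ in range(EPISODES):
        a1, a2 = epsilon_greedy(q1), epsilon_greedy(q2)
        r = PAYOFF[(a1, a2)]
        # Independent updates: each agent sees only its own action and
        # reward, with the other agent folded into the environment.
        q1[a1] += ALPHA * (r - q1[a1])
        q2[a2] += ALPHA * (r - q2[a2])

    print("agent 1 Q-values:", q1)
    print("agent 2 Q-values:", q2)

From agent 1's perspective, the expected reward of an action depends on agent 2's current policy; as agent 2 learns, that expectation drifts, which is exactly the non-stationarity the abstract describes and which violates the stationarity assumption behind single-agent Q-learning convergence guarantees.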
